Multilingual Transformer-Based Summarization for Low-Resource Nepali News Articles
DOI:
https://doi.org/10.3126/jhcoe.v2i1.91518

Keywords:
Abstractive, Low-Resource Language, MT5, NLP, Transformer

Abstract
This study develops an abstractive Nepali news summarization system using natural language processing (NLP). The powerful multilingual T5 (mT5) model was fine-tuned on the collected dataset. Pre-processing steps, including tokenization, punctuation removal, and special character removal, were applied to enhance performance. The model was trained using supervised learning, with measures taken to reduce overfitting. Evaluation was conducted using the ROUGE metric to assess the quality of the generated summaries. Lengthy texts are then presented to users as concise, meaningful summaries that preserve the core meaning of the original content. News articles were also retrieved through an API, and the corresponding summaries are displayed accordingly. This paper highlights the potential of transformer-based models for low-resource languages such as Nepali. Moving forward, the plan is to secure more powerful computational resources and to improve the scalability of summary generation.
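As a rough illustration of the pipeline described above, the sketch below shows how a fine-tuned mT5 checkpoint could be used to generate a summary of a Nepali news article with the Hugging Face Transformers library. The checkpoint name "google/mt5-small", the generation parameters, and the truncated example article are assumptions for illustration, not the authors' actual fine-tuned model or settings.

```python
# Minimal sketch (assumed setup): abstractive summarization of a Nepali
# article with an mT5 checkpoint via Hugging Face Transformers.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder base model; the paper fine-tunes mT5 on its own Nepali dataset.
model_name = "google/mt5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# A Nepali news article (truncated here for illustration).
article = "काठमाडौं। ..."

# Tokenize the input, truncating long articles to the model's input window.
inputs = tokenizer(
    article,
    return_tensors="pt",
    max_length=512,
    truncation=True,
)

# Generate a concise summary with beam search.
summary_ids = model.generate(
    **inputs,
    max_length=128,
    num_beams=4,
    no_repeat_ngram_size=2,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

In practice, the pretrained base checkpoint above would first be fine-tuned on article-summary pairs, and the generated outputs scored against reference summaries with a ROUGE implementation, as the abstract describes.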