Automated News Classification using N-gram Model and Key Features of Nepali Language

Authors

  • Dinesh Dangol, Nepal Engineering College, Changunarayan, Bhaktapur, Nepal
  • Rupesh Dahi Shrestha, Nepal Engineering College, Changunarayan, Bhaktapur, Nepal
  • Arun Timalsina, Institute of Engineering, Pulchowk Campus, Tribhuvan University, Nepal

DOI:

https://doi.org/10.3126/scitech.v13i1.23504

Keywords:

Document Similarity, Nepali Text Classification, Morphological Analysis, Vector Space Model, Bag-of-words Model, N-gram, Bi-gram, Nepali News Classification

Abstract

With the increasing trend of publishing news online, automatic text processing is becoming more and more important. Automatic text classification has been a focus of researchers working in different languages for decades. There is a large body of research on the features of the English language and their use in automated text processing. This research implements key features of the Nepali language for automatic text classification of Nepali news. In particular, studying the impact of Nepali language based features, which are extremely different from those of English, is more challenging because of the higher level of complexity to be resolved. The research experiment, using a vector space model, an n-gram model and key-feature-based processing specific to the Nepali language, shows promising results compared to the bag-of-words model for the task of automated Nepali news classification.
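As a rough illustration of the general idea named in the abstract (an n-gram vector space model compared by document similarity), the following minimal Python sketch may help. It is not the authors' implementation: the whitespace tokenizer, the raw term-frequency weighting, the nearest-document decision rule, the function names and the toy headlines are all assumptions made for this example, and the Nepali-specific morphological processing described in the paper is omitted entirely.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the word n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vectorize(text, max_n=2):
    """Map a text to a sparse count vector of all 1..max_n word n-grams."""
    tokens = text.split()  # assumption: plain whitespace tokenization
    features = []
    for n in range(1, max_n + 1):
        features.extend(ngrams(tokens, n))
    return Counter(features)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, labelled_docs, max_n=2):
    """Return the label of the labelled document most similar to text."""
    query = vectorize(text, max_n)
    best = max(labelled_docs,
               key=lambda pair: cosine(query, vectorize(pair[0], max_n)))
    return best[1]


# Hypothetical toy data, not taken from the paper's corpus.
training = [
    ("नेपाल क्रिकेट टोली विजयी", "sports"),
    ("शेयर बजार मूल्य वृद्धि", "economy"),
]
print(classify("क्रिकेट टोली खेल", training))
```

Setting `max_n=1` reduces the feature set to single words, which corresponds to a plain bag-of-words baseline; with `max_n=2` the bi-gram ("क्रिकेट", "टोली") becomes an additional shared feature between the query and the sports headline.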

Author Biographies

Dinesh Dangol, Nepal Engineering College, Changunarayan, Bhaktapur, Nepal

Assistant Professor and Head, Department of Computer Science and Engineering

Rupesh Dahi Shrestha, Nepal Engineering College, Changunarayan, Bhaktapur, Nepal

Assistant Professor and Head, Department of Electronics and Communication Engineering

Arun Timalsina, Institute of Engineering, Pulchowk Campus, Tribhuvan University, Nepal

Assistant Professor, Department of Electronics and Computer Engineering and Deputy Director of Center for Applied Research and Development

Published

2018-09-30

How to Cite

Dangol, D., Shrestha, R. D., & Timalsina, A. (2018). Automated News Classification using N-gram Model and Key Features of Nepali Language. SCITECH Nepal, 13(1), 64–69. https://doi.org/10.3126/scitech.v13i1.23504

Section

Articles