Political Profiling of Nepali Twitter Users based on Space Vector Model
DOI: https://doi.org/10.3126/jost.v4i1.74558
Keywords: Nepali language, political profiling, vector space model, TF-IDF, word embedding, Doc2Vec, cosine similarity
Abstract
Every day, people on social networks create a huge amount of data in the form of posts, blogs, tweets, articles, and comments, as text, images, audio, and video. The number of social media users, and the volume of data they upload to the cloud, is growing rapidly. People all over the globe, across different regions, cultures, languages, and levels of education, including public figures, post and blog about their views and opinions. Researchers and businesses now use these microblogs to analyze behaviour, sentiment, daily consumption habits, and spending capacity. In this paper, we classify a Nepali Twitter user into one of several predefined classes, each representing a political party in Nepal, using a vector space model. In this approach, each political party is represented by a document class consisting of a set of words. Several text-preprocessing steps, based on the morphological structure of the Nepali language, are applied to improve the results. TF-IDF and Doc2Vec are used to extract features from the terms used in tweets. Cosine similarity is then used as the classifier: the Twitter user's profile is compared with each political party's class, and the user is assigned to the class with the maximum similarity. Finally, we compare the results of TF-IDF and Doc2Vec to determine which is more effective for Nepali-language tweets.
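The TF-IDF and cosine-similarity classification step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each party class is a short, hypothetical keyword list (`PARTY_CLASSES`, with placeholder names `party_a` to `party_c`). It replaces the paper's Nepali morphological preprocessing with simple whitespace tokenization, computes a smoothed IDF over the party class documents only, and builds a user's profile by concatenating all of their tweets. The user is then assigned to the party whose TF-IDF vector has the highest cosine similarity to that profile.

```python
import math
from collections import Counter

# HYPOTHETICAL party classes: each party is a "document" of characteristic
# words. The names and word lists are illustrative placeholders only.
PARTY_CLASSES = {
    "party_a": "समाजवाद मजदुर किसान समानता",
    "party_b": "लोकतन्त्र संविधान स्वतन्त्रता चुनाव",
    "party_c": "राजतन्त्र हिन्दु राष्ट्र परम्परा",
}


def tokenize(text):
    # Stand-in for Nepali preprocessing (stop-word removal, stemming, etc.);
    # this sketch only splits on whitespace.
    return text.split()


def idf_table(docs):
    # Smoothed inverse document frequency (an assumed variant):
    # idf(t) = log((1 + N) / (1 + df(t))) + 1
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(tokens, idf):
    # Term frequency normalized by document length, weighted by IDF.
    # Terms absent from the class vocabulary are dropped.
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(user_tweets, classes=PARTY_CLASSES):
    # Vectorize each party class using IDF computed over the classes.
    class_tokens = {p: tokenize(text) for p, text in classes.items()}
    idf = idf_table(list(class_tokens.values()))
    class_vecs = {p: tfidf(toks, idf) for p, toks in class_tokens.items()}
    # The user profile is the concatenation of all of the user's tweets.
    profile = tfidf(tokenize(" ".join(user_tweets)), idf)
    scores = {p: cosine(profile, vec) for p, vec in class_vecs.items()}
    # Assign the class with maximum similarity (ties go to the first class).
    return max(scores, key=scores.get), scores
```

For example, `classify(["लोकतन्त्र र संविधान", "चुनाव आउँदैछ"])` returns `"party_b"` with a similarity of about 0.866 (√3/2), because the profile shares three of that class's four terms and none with the other classes. For the Doc2Vec variant, the sparse TF-IDF dictionaries would be replaced by dense document embeddings, while the maximum-cosine-similarity decision rule stays the same.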