Balancing Privacy and Accuracy in Nepali Sentiment Analysis: Fine-Tuning NepaliBERT with Differential Privacy
DOI: https://doi.org/10.3126/jacem.v12i01.93900

Keywords: Differential Privacy, DP-SGD, Low-Resource NLP, Membership Inference Attack, Nepali Sentiment Analysis, NepaliBERT, Privacy–Utility Trade-off, Privacy-Preserving Machine Learning

Abstract
The increasing volume of user-generated Nepali text has enabled the development of sentiment analysis systems, but training large language models on real data introduces significant privacy risks, including potential exposure through membership inference attacks. This study examines the balance between accuracy and privacy in Nepali sentiment analysis by fine-tuning NepaliBERT with and without Differential Privacy. A high-performing non-private baseline model was trained on approximately 7,000 labeled samples, achieving near-perfect classification performance (Accuracy up to 99.88–100% and Macro F1 up to 1.00), and was subsequently evaluated for vulnerability using membership inference and canary-based privacy assessments. To mitigate privacy risks, Differentially Private Stochastic Gradient Descent was applied under varying privacy budgets (ε), and the resulting models were systematically analyzed to measure performance degradation and resistance to privacy attacks. The findings establish an empirical benchmark for the privacy–utility trade-off in low-resource Nepali NLP and provide practical guidance for building sentiment analysis systems that are both accurate and privacy-preserving.
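The abstract does not include the authors' implementation, but the core DP-SGD mechanism it refers to — clipping each example's gradient to a fixed norm and adding Gaussian noise before the update — can be sketched in plain Python. This is a minimal illustration under assumed parameter names (`clip_norm`, `noise_multiplier`, `lr`); a real fine-tuning run on NepaliBERT would typically use a DP library such as Opacus, which also tracks the privacy budget ε.

```python
import math
import random


def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0,
                lr=0.1, rng=None):
    """One illustrative DP-SGD update (not the paper's code):
    clip each per-example gradient to clip_norm, sum, add Gaussian
    noise with std = noise_multiplier * clip_norm, average over the
    batch, and return the resulting parameter update (delta)."""
    rng = rng or random.Random(0)

    # 1. Per-example clipping bounds each sample's influence.
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / (norm + 1e-12))
        clipped.append([x * scale for x in g])

    # 2. Sum clipped gradients and add calibrated Gaussian noise.
    dim = len(per_example_grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]

    # 3. Average over the batch and apply a plain SGD step.
    batch = len(per_example_grads)
    return [-lr * x / batch for x in noisy]
```

Larger `noise_multiplier` values correspond to smaller (stricter) privacy budgets ε, which is the privacy–utility trade-off the study measures: more noise degrades accuracy but weakens membership inference.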
License
JACEM reserves the copyright for the published papers. Authors retain the right to use the content of the published paper, in part or in full, in their own work.