Nepali Speech Emotion Recognition Using Variational Quantum Circuits
DOI: https://doi.org/10.3126/jacem.v12i01.93927
Keywords: Data Reuploading Classifier, Nepali Emotion Dataset, SER, VQC
Abstract
Speech emotion recognition (SER) is an active area of research, yet existing work has focused almost exclusively on high-resource languages. This has left Nepali, which has over 32 million first- and second-language speakers worldwide, without any published SER study or emotional-speech corpus. This paper addresses that gap along two dimensions. First, we construct a Nepali emotional-speech dataset of 600 utterances across three emotion classes (happy, sad, neutral), validated by 117 native listeners whose mean recognition accuracy is 91.5%. Second, on this corpus, we evaluate a fully quantum data-reuploading variational quantum circuit (VQC) classifier with trainable SU(2) encoding on nine qubits. We compare it directly against two classical baselines, a random forest and a multilayer perceptron (MLP), using the same PCA(27) feature pipeline and the same stratified 480/120 train/test split. A staged hyperparameter search over circuit depth, learning rate, optimizer, and batch size identifies an optimal VQC configuration of ten layers and 540 trainable parameters, which attains 90.83% test accuracy and a macro-F1 of 0.908. Gradient-norm analysis confirms the absence of barren plateaus during training. Both classical baselines outperform the VQC under this protocol (random forest 95.00%, MLP 99.17%). However, a leave-one-speaker-out robustness check shows that classical accuracy drops by approximately one-third under that evaluation, indicating that a substantial portion of the classical advantage reflects speaker-level information leakage.
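The abstract reports ten layers, nine qubits, PCA(27) inputs, and 540 trainable parameters. One accounting consistent with those numbers, assumed here rather than taken from the paper, is that each layer re-uploads three of the 27 features onto each qubit through a general SU(2) rotation with angles w∘x + b, giving 9 × 3 × 2 = 54 parameters per layer and 540 over ten layers. The sketch below simulates that forward pass with a small NumPy statevector. The CZ-ring entangler, the readout (⟨Z⟩ on the first three qubits as class scores), the random placeholder input, and all function names are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

# Sizes reported in the abstract; the per-qubit feature split is an assumption.
N_QUBITS = 9
N_LAYERS = 10
N_FEATURES = 27                           # PCA(27) feature vector
FEATS_PER_QUBIT = N_FEATURES // N_QUBITS  # 3 = one angle per rotation axis


def rot(phi, theta, omega):
    """General SU(2) rotation Rz(omega) @ Ry(theta) @ Rz(phi)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([
        [np.exp(-0.5j * (phi + omega)) * c, -np.exp(0.5j * (phi - omega)) * s],
        [np.exp(-0.5j * (phi - omega)) * s, np.exp(0.5j * (phi + omega)) * c],
    ])


def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q of a (2,)*n state tensor."""
    state = np.tensordot(gate, state, axes=([1], [q]))
    return np.moveaxis(state, 0, q)


def apply_cz(state, q1, q2):
    """Controlled-Z: flip the sign of amplitudes where q1 = q2 = 1."""
    state = state.copy()
    idx = [slice(None)] * state.ndim
    idx[q1] = idx[q2] = 1
    state[tuple(idx)] *= -1
    return state


def expval_z(weights, biases, x):
    """Data-reuploading forward pass; returns <Z> on every qubit."""
    state = np.zeros((2,) * N_QUBITS, dtype=complex)
    state[(0,) * N_QUBITS] = 1.0          # start in |00...0>
    xq = x.reshape(N_QUBITS, FEATS_PER_QUBIT)
    for layer in range(N_LAYERS):
        # Re-upload the same features each layer with trainable scale and shift.
        angles = weights[layer] * xq + biases[layer]   # shape (9, 3)
        for q in range(N_QUBITS):
            state = apply_1q(state, rot(*angles[q]), q)
        for q in range(N_QUBITS):                      # CZ ring (assumed)
            state = apply_cz(state, q, (q + 1) % N_QUBITS)
    probs = np.abs(state) ** 2
    # <Z_q> = P(qubit q = 0) - P(qubit q = 1)
    return np.array([probs.take(0, axis=q).sum() - probs.take(1, axis=q).sum()
                     for q in range(N_QUBITS)])


rng = np.random.default_rng(0)
weights = rng.normal(size=(N_LAYERS, N_QUBITS, FEATS_PER_QUBIT))
biases = rng.normal(size=(N_LAYERS, N_QUBITS, FEATS_PER_QUBIT))
n_params = weights.size + biases.size
print(n_params)  # → 540

x = rng.normal(size=N_FEATURES)  # placeholder vector, not a real utterance
z = expval_z(weights, biases, x)
scores = z[:3]                   # hypothetical readout: happy, sad, neutral
print(["happy", "sad", "neutral"][int(np.argmax(scores))])
```

Training is omitted here. In practice the gradients would come from automatic differentiation or the parameter-shift rule in a quantum-ML framework rather than from this hand-rolled simulator, which is only meant to make the parameter count and the re-uploading structure concrete.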
License
JACEM reserves the copyright for the published papers. The author has the right to use the content of the published paper, in part or in full, for their own work.