Video Captioning in Nepali Using Encoder Decoder
DOI: https://doi.org/10.3126/jacem.v9i1.71424
Keywords: MSVD, Encoder, Decoder, LSTM, GRU
Abstract
Video captioning is a challenging task, as it requires accurately transforming visual understanding into natural language descriptions. The challenge is further compounded for Nepali, given the lack of existing academic work in this domain. This study develops an encoder-decoder paradigm for Nepali video captioning to address this gap. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) sequence-to-sequence models are used to produce relevant textual descriptions from features extracted from video frames with Convolutional Neural Networks (CNNs). Additionally, a Nepali video captioning dataset is created by adapting the Microsoft Research Video Description Corpus (MSVD) through Google Translate, followed by manual post-editing. The effectiveness of the models for Nepali video captioning is evaluated using the BLEU, METEOR, and ROUGE metrics.
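The paper itself does not include code; the following is a minimal sketch of the kind of CNN-feature encoder with LSTM/GRU decoder the abstract describes, assuming PyTorch, pre-extracted per-frame CNN features, and illustrative names and dimensions (VideoCaptioner, feat_dim, vocab_size, etc.) that are not taken from the paper.

```python
# Minimal sketch, not the authors' implementation: an RNN encoder over CNN
# frame features feeding an RNN decoder that emits Nepali caption tokens.
import torch
import torch.nn as nn

class VideoCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, vocab_size=8000,
                 embed_dim=300, rnn_type="lstm"):
        super().__init__()
        rnn = nn.LSTM if rnn_type == "lstm" else nn.GRU  # LSTM or GRU variant
        # Encoder: summarizes the sequence of CNN frame features into a state.
        self.encoder = rnn(feat_dim, hidden_dim, batch_first=True)
        # Decoder: generates the caption token by token from that state.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = rnn(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frame_feats, captions):
        # frame_feats: (batch, num_frames, feat_dim); captions: (batch, seq_len) token ids
        _, state = self.encoder(frame_feats)       # keep only the final hidden state
        dec_in = self.embed(captions[:, :-1])      # teacher forcing on the shifted caption
        dec_out, _ = self.decoder(dec_in, state)   # decoder initialized from encoder state
        return self.out(dec_out)                   # (batch, seq_len - 1, vocab_size) logits

# Example: 8 clips, 20 frames each, captions of length 15 (token ids).
model = VideoCaptioner()
feats = torch.randn(8, 20, 2048)
caps = torch.randint(0, 8000, (8, 15))
logits = model(feats, caps)  # train with cross-entropy against caps[:, 1:]
```

Swapping rnn_type between "lstm" and "gru" gives the two model variants the abstract compares; generated captions would then be scored against the translated MSVD references with BLEU, METEOR, and ROUGE.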