Video Captioning in Nepali Using Encoder Decoder

Kabita Parajuli; Shashidhar R Joshi

doi:10.3126/jacem.v9i1.71424

Authors

Kabita Parajuli, Department of Electronics and Computer Engineering, Pulchowk Campus, Tribhuvan University, Lalitpur, Nepal
Shashidhar R Joshi, Department of Electronics and Computer Engineering, Pulchowk Campus, Tribhuvan University, Lalitpur, Nepal

DOI:

https://doi.org/10.3126/jacem.v9i1.71424

Keywords:

MSVD, Encoder, Decoder, LSTM, GRU

Abstract

Video captioning is challenging because it requires translating visual understanding into accurate natural-language descriptions. The challenge is compounded for Nepali, where little prior academic work exists. This study develops an encoder-decoder framework for Nepali video captioning to address this gap. Features are extracted from video frames using Convolutional Neural Networks (CNNs), and Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) sequence-to-sequence models generate textual descriptions from those features. Additionally, a Nepali video captioning dataset is created by translating the Microsoft Research Video Description Corpus (MSVD) with Google Translate, followed by manual post-editing. The effectiveness of the model for Nepali video captioning is demonstrated using BLEU, METEOR, and ROUGE scores.
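To illustrate how a metric such as BLEU scores a generated caption against a reference, the following is a minimal, unsmoothed sentence-level sketch in plain Python. It is not the evaluation code used in the study, which the abstract does not specify. The function names, the whitespace tokenization, and the two Nepali captions are assumptions made up for illustration and are not taken from the dataset.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: a candidate n-gram is credited at most as
    many times as it occurs in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    if not candidate:
        return 0.0
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Hypothetical captions (illustrative only), tokenized on whitespace.
candidate = "एउटा मानिस गितार बजाउँदैछ".split()
reference = "एउटा मानिस गितार बजाउँछ".split()

# Unigram precision is 3/4 and bigram precision is 2/3, so BLEU-2 = sqrt(1/2).
print(round(sentence_bleu(candidate, [reference], max_n=2), 4))  # 0.7071
```

Because short captions often share no 4-gram with the reference, unsmoothed BLEU-4 is frequently zero at the sentence level. Corpus-level aggregation or smoothing is commonly used in practice for that reason.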
