Deep Fake Audio Detection Using a Hybrid CNN-BiLSTM Model with Attention Mechanism

Authors

  • Shubham Chapagain Department of Computer and Electronics Engineering, Kantipur Engineering College, Dhapakhel, Lalitpur, Nepal
  • Bishal Thapa Department of Computer and Electronics Engineering, Kantipur Engineering College, Dhapakhel, Lalitpur, Nepal
  • Shubham Man Singh Baidhya Department of Computer and Electronics Engineering, Kantipur Engineering College, Dhapakhel, Lalitpur, Nepal
  • Smriti B.K Department of Computer and Electronics Engineering, Kantipur Engineering College, Dhapakhel, Lalitpur, Nepal
  • Shrawan Thapa Department of Computer and Electronics Engineering, Kantipur Engineering College, Dhapakhel, Lalitpur, Nepal

DOI:

https://doi.org/10.3126/injet.v2i2.78619

Keywords:

Deepfake Audio Detection, CNN-BiLSTM Hybrid Model, Synthetic Speech, Audio Authenticity

Abstract

The increasing sophistication of deepfake technologies has raised significant concerns about the authenticity of audio recordings. To address this challenge, we propose a deepfake audio detection system that employs a hybrid CNN-BiLSTM model to identify synthetic speech. The system converts audio files into Mel-spectrograms, which capture the spectral features that distinguish real voices from fake ones. The processed data is then fed into the CNN-BiLSTM model, which combines the strengths of Convolutional Neural Networks (CNNs) for spatial pattern recognition with Bidirectional Long Short-Term Memory networks (BiLSTMs) for capturing long-term dependencies in the temporal sequence of the audio. The model, trained on a labeled dataset of real and synthetic audio, achieves an accuracy of 95%, detecting subtle irregularities indicative of deepfake audio. The system provides users with a comprehensive analysis, including a confidence score and detailed insights into the authenticity of the audio, offering an effective tool for distinguishing real from fake recordings. By pairing this detection pipeline with a user-friendly interface, the system remains both effective and accessible for practical deepfake-detection applications. Unlike existing systems that rely solely on either CNN or RNN architectures, our approach integrates both to improve detection accuracy, particularly for complex and subtle manipulations. Additionally, we introduce a confidence-scoring mechanism and insight analysis that give users transparent reasoning behind each detection.
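The Mel-spectrogram preprocessing step described above can be sketched in plain NumPy. This is a minimal illustration, not the authors' implementation: the frame size, hop length, and number of mel bands below are assumed placeholder values, and published toolkits (e.g. librosa) would normally be used instead.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, apply a Hann window, take the power spectrum,
    # then project onto the mel filterbank and log-compress.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2   # (n_frames, n_fft//2+1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    return np.log(mel + 1e-10)

# Example: one second of a synthetic 440 Hz tone (hypothetical input).
sr = 16000
t = np.arange(sr) / sr
spec = mel_spectrogram(np.sin(2 * np.pi * 440 * t), sr=sr)
print(spec.shape)  # (61, 40): time frames x mel bands
```

The resulting time-by-frequency matrix is what a CNN front end would treat as an image, while the frame axis supplies the temporal sequence consumed by the BiLSTM.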


Published

2025-05-19

How to Cite

Chapagain, S., Thapa, B., Baidhya, S. M. S., B.K, S., & Thapa, S. (2025). Deep Fake Audio Detection Using a Hybrid CNN-BiLSTM Model with Attention Mechanism. International Journal on Engineering Technology, 2(2), 204–214. https://doi.org/10.3126/injet.v2i2.78619

Issue

Section

Articles