Deep Fake Audio Detection Using a Hybrid CNN-BiLSTM Model with Attention Mechanism

Authors

  • Shubham Chapagain Department of Computer and Electronics Engineering, Kantipur Engineering College, Dhapakhel, Lalitpur, Nepal
  • Bishal Thapa Department of Computer and Electronics Engineering, Kantipur Engineering College, Dhapakhel, Lalitpur, Nepal
  • Shubham Man Singh Baidhya Department of Computer and Electronics Engineering, Kantipur Engineering College, Dhapakhel, Lalitpur, Nepal
  • Smriti B.K Department of Computer and Electronics Engineering, Kantipur Engineering College, Dhapakhel, Lalitpur, Nepal
  • Shrawan Thapa Department of Computer and Electronics Engineering, Kantipur Engineering College, Dhapakhel, Lalitpur, Nepal

DOI:

https://doi.org/10.3126/injet.v2i2.78619

Keywords:

Deepfake Audio Detection, CNN-BiLSTM Hybrid Model, Synthetic Speech, Audio Authenticity

Abstract

The increasing sophistication of deepfake technologies has raised significant concerns about the authenticity of audio recordings. To address this challenge, we propose a deepfake audio detection system that employs a hybrid CNN-BiLSTM model to identify synthetic speech. The system converts audio files into Mel-spectrograms, which capture the spectral features that distinguish real voices from fake ones. The processed data is then fed into the CNN-BiLSTM model, which combines the strengths of Convolutional Neural Networks (CNNs) for spatial pattern recognition with Bidirectional Long Short-Term Memory networks (BiLSTMs) for capturing long-term dependencies in the temporal sequence of the audio. The model, trained on a labeled dataset of real and synthetic audio, achieves an accuracy of 95%, detecting subtle irregularities indicative of deepfake audio. The system provides users with a comprehensive analysis, including a confidence score and detailed insights into the authenticity of the audio, offering an effective tool for distinguishing real from fake recordings. By pairing this detection pipeline with a user-friendly interface, the system remains both effective and accessible for practical deepfake-detection applications. Unlike existing systems that rely solely on either CNN or RNN architectures, our approach integrates both to improve detection accuracy, particularly for complex and subtle manipulations. Additionally, we introduce a confidence-scoring mechanism and insight analysis that give users transparent reasoning behind each detection.
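The Mel-spectrogram preprocessing step described above can be sketched in plain NumPy. This is a minimal illustration, not the authors' implementation: the frame size, hop length, and number of mel bands below are assumed placeholder values, and published toolkits (e.g. librosa) would normally be used instead.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, apply a Hann window, take the power spectrum,
    # then project onto the mel filterbank and log-compress.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2   # (n_frames, n_fft//2+1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    return np.log(mel + 1e-10)

# Example: one second of a synthetic 440 Hz tone (hypothetical input).
sr = 16000
t = np.arange(sr) / sr
spec = mel_spectrogram(np.sin(2 * np.pi * 440 * t), sr=sr)
print(spec.shape)  # (61, 40): time frames x mel bands
```

The resulting time-by-frequency matrix is what a CNN front end would treat as an image, while the frame axis supplies the temporal sequence consumed by the BiLSTM.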


Published

2025-05-19

How to Cite

Chapagain, S., Thapa, B., Baidhya, S. M. S., B.K, S., & Thapa, S. (2025). Deep Fake Audio Detection Using a Hybrid CNN-BiLSTM Model with Attention Mechanism. International Journal on Engineering Technology, 2(2), 204–214. https://doi.org/10.3126/injet.v2i2.78619

Issue

Section

Articles