Hybrid GAN-Transformer For Synthetic Medical Text Generation To Address Data Scarcity In Healthcare

Rajesh Raskoti; Kobid Karkee; Shreekrishna Timilsina

doi:10.3126/jacem.v12i01.93910

Authors

Rajesh Raskoti, Thapathali Campus, Institute of Engineering, Tribhuvan University, Kathmandu, Nepal
Kobid Karkee, Department of Electronics and Computer Engineering, Thapathali Campus, Institute of Engineering, Tribhuvan University, Kathmandu, Nepal
Shreekrishna Timilsina, Department of Electronics and Computer Engineering, Himalaya College Of Engineering, Tribhuvan University, Lalitpur, Nepal


Keywords:

Generative Adversarial Network, Synthetic data, Transformer

Abstract

Medical research often struggles to access patient data because privacy laws such as HIPAA and GDPR protect sensitive health information. This study proposes a system that generates realistic synthetic medical notes without risking patient privacy. We developed a hybrid model that combines two neural networks: DistilGPT-2 as the text generator and BioBERT as the evaluator (discriminator). BioBERT was trained on clinical notes from the MIMIC-III database. Training proceeded in three phases. First, the generator learned the conventions of clinical note writing, reaching a best validation perplexity of 8.87 (down from an initial 9.34) while its training perplexity fell from 19.55 to 6.90. Second, we intentionally weakened the evaluator to keep the two models balanced. Third, both networks were trained at full strength with additional controls. However, the evaluator became too accurate, which destabilized adversarial training: the generator's validation perplexity rose to 47.05, and text diversity (Distinct-1) fell from 0.271 at the start of Phase 2 to a minimum of 0.110 at Phase 3, epoch 8.
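The abstract reports results with two metrics, perplexity and Distinct-1, without defining them. The sketch below shows their common textbook definitions; it is illustrative only and is not the authors' evaluation code. Assumptions: perplexity is the exponential of the mean per-token negative log-likelihood in natural-log units (the usual exp(loss) convention), and Distinct-n is computed at corpus level over lowercased whitespace tokens. The paper may tokenize differently or average Distinct-n per sample instead. The function names are hypothetical.

```python
import math
from typing import List


def perplexity_from_token_nll(token_nlls: List[float]) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood, natural log).

    Assumption: token_nlls holds -ln p(token | context) for every predicted token.
    """
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def distinct_n(texts: List[str], n: int = 1) -> float:
    """Distinct-n: unique n-grams divided by total n-grams, pooled over all samples.

    Assumption: lowercased whitespace tokenization, corpus-level pooling.
    """
    total = 0
    unique = set()
    for text in texts:
        tokens = text.lower().split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0


if __name__ == "__main__":
    samples = ["patient denies chest pain", "patient reports chest pain"]
    print(distinct_n(samples, 1))  # 5 unique / 8 total = 0.625
    print(perplexity_from_token_nll([math.log(9.0)] * 4))  # approximately 9.0
```

Under the exp(loss) convention, a validation perplexity of 8.87 corresponds to a mean token cross-entropy of about 2.18 nats. A falling Distinct-1 means the generator repeats the same words more often across its samples, which is consistent with the collapse in diversity described above.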


Author Biographies

Rajesh Raskoti, Thapathali Campus, Institute of Engineering, Tribhuvan University, Kathmandu, Nepal

MSc in Informatics and Intelligent Systems Engineering

Kobid Karkee, Department of Electronics and Computer Engineering, Thapathali Campus, Institute of Engineering, Tribhuvan University, Kathmandu, Nepal

Assistant Professor

Shreekrishna Timilsina, Department of Electronics and Computer Engineering, Himalaya College Of Engineering, Tribhuvan University, Lalitpur, Nepal

Lecturer
