Hybrid GAN-Transformer For Synthetic Medical Text Generation To Address Data Scarcity In Healthcare

Authors

  • Rajesh Raskoti, Thapathali Campus, Institute of Engineering, Tribhuvan University, Kathmandu, Nepal
  • Kobid Karkee, Department of Electronics and Computer Engineering, Thapathali Campus, Institute of Engineering, Tribhuvan University, Kathmandu, Nepal
  • Shreekrishna Timilsina, Department of Electronics and Computer Engineering, Himalaya College of Engineering, Tribhuvan University, Lalitpur, Nepal

DOI:

https://doi.org/10.3126/jacem.v12i01.93910

Keywords:

Generative Adversarial Network, Synthetic data, Transformer

Abstract

Medical research often struggles to access patient data because privacy laws such as HIPAA and GDPR protect sensitive health information. This study proposes a system that generates realistic synthetic medical notes without risking patient privacy. We developed a hybrid model built on two neural networks: DistilGPT-2 as the text generator and BioBERT as the evaluator. BioBERT was trained on clinical notes from the MIMIC-III database. Training proceeded in three phases. First, the generator learned basic medical writing, reaching a best validation perplexity of 8.87 (down from an initial 9.34) while training perplexity fell from 19.55 to 6.90. Second, we intentionally weakened the evaluator to keep the two models balanced. Third, both networks were trained at full strength with added controls. In this third phase, however, the evaluator became too accurate, which destabilized training: the generator’s validation perplexity rose to 47.05, and text diversity (Distinct-1) fell from 0.271 (Phase 2 start) to a minimum of 0.110 (Phase 3, epoch 8).
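
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how they are commonly computed. It is not the authors' evaluation code. The function names, the whitespace tokenization, and the use of natural-log per-token losses are all assumptions made for illustration; the paper may tokenize and aggregate differently.

```python
import math
from typing import Iterable, List


def distinct_n(texts: Iterable[str], n: int = 1) -> float:
    """Ratio of unique n-grams to total n-grams across generated texts.

    Assumption: lowercased whitespace tokenization. Lower values mean
    the generator repeats itself more (less diverse output).
    """
    total = 0
    unique = set()
    for text in texts:
        tokens = text.lower().split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0


def perplexity(token_nlls: List[float]) -> float:
    """Perplexity as exp of the mean per-token negative log-likelihood.

    Assumption: losses are in nats (natural log), as produced by a
    standard cross-entropy loss.
    """
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Toy usage with made-up inputs (not data from the study).
sample = ["the patient has the flu"]
print(distinct_n(sample, n=1))  # 4 unique unigrams / 5 total = 0.8
print(perplexity([2.0, 2.5, 2.1]))
```

Under these assumptions, a validation perplexity of 8.87 corresponds to a mean per-token loss of about 2.18 nats, and a drop in Distinct-1 means a smaller share of the generated words are unique.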

Author Biographies

Rajesh Raskoti, Thapathali Campus, Institute of Engineering, Tribhuvan University, Kathmandu, Nepal

MSc in Informatics and Intelligent Systems Engineering

Kobid Karkee, Department of Electronics and Computer Engineering, Thapathali Campus, Institute of Engineering, Tribhuvan University, Kathmandu, Nepal

Asst. Professor

Shreekrishna Timilsina, Department of Electronics and Computer Engineering, Himalaya College of Engineering, Tribhuvan University, Lalitpur, Nepal

Lecturer

Published

2026-05-12

How to Cite

Raskoti, R., Karkee, K., & Timilsina, S. (2026). Hybrid GAN-Transformer For Synthetic Medical Text Generation To Address Data Scarcity In Healthcare. Journal of Advanced College of Engineering and Management, 12(01), 105–115. https://doi.org/10.3126/jacem.v12i01.93910

Section

Articles