Hybrid GAN-Transformer For Synthetic Medical Text Generation To Address Data Scarcity In Healthcare
DOI:
https://doi.org/10.3126/jacem.v12i01.93910
Keywords:
Generative Adversarial Network, Synthetic data, Transformer
Abstract
Medical research often struggles to access patient data because privacy laws like HIPAA and GDPR protect sensitive health information. This study proposes a system that generates realistic synthetic medical notes without risking patient privacy. We developed a hybrid model using two neural networks: DistilGPT-2 as the text generator and BioBERT as the evaluator. BioBERT was trained on clinical notes from the MIMIC-III database. Training was completed in three phases. First, the generator learned basic medical writing and achieved a best validation perplexity of 8.87 (from an initial 9.34), with training perplexity reducing from 19.55 to 6.90. Second, we intentionally weakened the evaluator to maintain balance between the models. Third, both networks were trained at full strength with added controls. However, the evaluator became too accurate, which disrupted the training process. The generator’s validation perplexity increased to 47.05, and text diversity (Distinct-1) decreased from 0.271 (Phase 2 start) to a minimum of 0.110 (Phase 3, epoch 8).
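The two metrics reported in the abstract, perplexity and Distinct-1, can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function names, the whitespace tokenization, and the corpus-level pooling of n-grams are assumptions made for illustration, since the abstract does not state how either metric was computed.

```python
import math


def distinct_n(texts, n=1):
    """Ratio of unique n-grams to total n-grams, pooled over all texts.

    Assumption: lowercase whitespace tokenization; the paper may use a
    model tokenizer instead.
    """
    total = 0
    unique = set()
    for text in texts:
        tokens = text.lower().split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0


def perplexity(token_nlls):
    """Perplexity as exp of the mean per-token negative log-likelihood.

    Assumption: the losses are in nats (natural log), as produced by a
    standard cross-entropy loss.
    """
    if not token_nlls:
        raise ValueError("token_nlls must not be empty")
    return math.exp(sum(token_nlls) / len(token_nlls))


if __name__ == "__main__":
    # Hypothetical generated notes, used only to exercise the functions.
    notes = [
        "patient denies chest pain",
        "patient denies shortness of breath",
    ]
    print("Distinct-1:", round(distinct_n(notes, n=1), 3))
    print("Perplexity:", round(perplexity([2.1, 2.3, 2.2]), 2))
```

Under this definition, a falling Distinct-1 means the generator is reusing a smaller share of distinct tokens, which matches the loss of diversity the abstract reports for Phase 3.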
License
JACEM reserves the copyright for published papers. Authors retain the right to use the content of their published paper, in part or in full, in their own work.