A comparative analysis on synthetic data generation of electronic health records using CTGAN, REaLTabFormer and TabDDPM
DOI:
https://doi.org/10.3126/jiee.v9i1.82519
Keywords:
CTGAN, Diffusion models, GAN, Synthetic data generation, Transformers
Abstract
The increasing importance of Electronic Health Records (EHR) for medical research and clinical applications necessitates the generation of high-quality synthetic data that preserves patient privacy. This study evaluates and compares the performance of the Conditional Tabular Generative Adversarial Network (CTGAN), a Transformer-based model (REaLTabFormer), and a diffusion model (TabDDPM) across multiple medical datasets. Our findings demonstrate that TabDDPM consistently outperforms the other models in generating synthetic data that closely mirrors real-world distributions, effectively preserving statistical properties and feature relationships. Its ability to maintain complex dependencies and capture variations in the data makes it the most reliable choice for synthetic EHR generation. CTGAN proves to be a strong alternative and excels on certain datasets, but its performance is less stable across different distributions, leading to occasional deviations from real data characteristics. REaLTabFormer shows potential in specific cases but struggles to maintain statistical integrity and to generalize across diverse datasets, which limits its effectiveness in some scenarios.
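The abstract does not name the metrics behind "statistical properties" and "feature relationships", so the following is only an illustrative sketch of how such fidelity is commonly checked, not the authors' evaluation protocol. It assumes numeric columns, uses a two-sample Kolmogorov-Smirnov statistic for per-column distribution similarity and the largest gap in pairwise Pearson correlation for feature relationships. The column names (`age`, `sbp`) and the toy values are hypothetical.

```python
import math
from bisect import bisect_right


def ks_statistic(real, synth):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    real_sorted, synth_sorted = sorted(real), sorted(synth)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        gap = max(gap, abs(cdf_real - cdf_synth))
    return gap


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_gap(real_cols, synth_cols):
    """Largest absolute difference in pairwise correlation, real vs. synthetic."""
    names = list(real_cols)
    worst = 0.0
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            diff = pearson(real_cols[a], real_cols[b]) - pearson(synth_cols[a], synth_cols[b])
            worst = max(worst, abs(diff))
    return worst


# Hypothetical toy tables: the synthetic marginals match the real ones exactly,
# but the relationship between the two columns is reversed.
real = {"age": [30, 40, 50, 60], "sbp": [110, 120, 130, 140]}
synth = {"age": [30, 40, 50, 60], "sbp": [140, 130, 120, 110]}

per_column_ks = {name: ks_statistic(real[name], synth[name]) for name in real}
relationship_gap = correlation_gap(real, synth)
```

In this toy case every per-column KS statistic is zero while the correlation gap is large, which shows why matching marginal distributions alone does not establish that feature relationships are preserved.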
License
Copyright (c) 2026 JIEE and the authors

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Upon acceptance of an article, the copyright for the published work remains with JIEE, Thapathali Campus, and the authors.