A comparative analysis on synthetic data generation of electronic health records using CTGAN, REaLTabFormer and TabDDPM

Authors

  • Girban Adhikari, Department of Electronics and Computer Engineering, Thapathali Campus, IOE, Tribhuvan University, Nepal
  • Arjan Sapkota, Department of Electronics and Computer Engineering, Thapathali Campus, IOE, Tribhuvan University, Nepal
  • Jivan Acharya, Department of Electronics and Computer Engineering, Thapathali Campus, IOE, Tribhuvan University, Nepal
  • Subarna Ghimire, Department of Electronics and Computer Engineering, Thapathali Campus, IOE, Tribhuvan University, Nepal
  • Umesh Kanta Ghimire, Department of Electronics and Computer Engineering, Thapathali Campus, IOE, Tribhuvan University, Nepal

DOI:

https://doi.org/10.3126/jiee.v9i1.82519

Keywords:

CTGAN, Diffusion models, GAN, Synthetic data generation, Transformers

Abstract

The increasing importance of Electronic Health Records (EHR) for medical research and clinical applications necessitates the generation of high-quality synthetic data that preserves patient privacy. This study evaluates and compares the performance of the Conditional Tabular Generative Adversarial Network (CTGAN), a Transformer-based model (REaLTabFormer), and a diffusion model (TabDDPM) across multiple medical datasets. Our findings demonstrate that TabDDPM consistently outperforms the other models in generating synthetic data that closely mirrors real-world distributions, effectively preserving statistical properties and feature relationships. Its ability to maintain complex dependencies and capture variations in the data makes it the most reliable choice for synthetic EHR generation. CTGAN proves to be a strong alternative and excels on certain datasets, but its performance is less stable across different distributions, leading to occasional deviations from real data characteristics. REaLTabFormer shows potential in specific cases but struggles to maintain statistical integrity and to generalize across diverse datasets, which limits its effectiveness in some scenarios.

Published

2026-06-01

How to Cite

Adhikari, G., Sapkota, A., Acharya, J., Ghimire, S., & Ghimire, U. K. (2026). A comparative analysis on synthetic data generation of electronic health records using CTGAN, REaLTabFormer and TabDDPM. Journal of Innovations in Engineering Education, 9(1), 173–181. https://doi.org/10.3126/jiee.v9i1.82519

Section

Articles