Diabetes Prediction Using Random Forest and XGBoost Machine Learning Algorithm
DOI:
https://doi.org/10.3126/joetp.v6i1.87829
Keywords:
Diabetes, Machine Learning, Prediction, Random Forest, XGBoost Classifier
Abstract
Diabetes mellitus is a prevalent chronic disease with serious global health implications, and timely identification is crucial for effective management and intervention. Accurate prediction of diabetes can improve patient care by enabling prompt medical responses. In recent years, machine learning techniques have gained attention in the healthcare domain for disease prediction and prognosis. This study investigates the application of Random Forest (RF) and XGBoost (XGB) classifiers for predicting diabetes using the PIMA Indian Diabetes dataset. Data preprocessing methods, including missing value imputation, normalization, feature selection, and upsampling, were applied to improve data quality and model accuracy. Hyperparameter tuning was also conducted to further optimize model performance. To enhance predictive capability, a soft voting ensemble integrating RF and XGB was developed; it achieved an AUC of 0.91, an accuracy of 0.84, a precision of 0.80, and a recall of 0.92, indicating strong predictive ability and reliability. SHAP (SHapley Additive exPlanations) value analysis revealed that glucose, age, and BMI were the most influential factors contributing to diabetes risk. The results highlight the potential of ensemble learning methods in healthcare analytics. This study contributes to the use of interpretable machine learning for early disease detection and informed clinical decision-making.
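The soft voting step and the reported metrics can be illustrated with a minimal, library-free sketch. It is not the authors' implementation: the abstract does not state which libraries, weights, or decision threshold were used, so the equal weights, the 0.5 threshold, the function names `soft_vote` and `classification_metrics`, and the toy probabilities below are all assumptions made for illustration. Soft voting here means averaging the positive-class probabilities of the two models before thresholding.

```python
def soft_vote(prob_a, prob_b, weights=(0.5, 0.5), threshold=0.5):
    """Combine two models' positive-class probabilities by weighted averaging.

    prob_a, prob_b: per-sample probabilities of the positive class
    (e.g. from an RF and an XGB model). Equal weights and a 0.5
    threshold are assumptions, not values taken from the paper.
    """
    w_a, w_b = weights
    total = w_a + w_b
    averaged = [(w_a * a + w_b * b) / total for a, b in zip(prob_a, prob_b)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall from binary confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy, made-up probabilities standing in for the two models' outputs.
rf_probs = [0.9, 0.4, 0.2, 0.7, 0.35]
xgb_probs = [0.8, 0.7, 0.1, 0.4, 0.45]
y_true = [1, 1, 0, 0, 1]

averaged, y_pred = soft_vote(rf_probs, xgb_probs)
metrics = classification_metrics(y_true, y_pred)
print(y_pred)
print(metrics)
```

In practice the same averaging is available in scikit-learn as `VotingClassifier(voting="soft")`, which could wrap fitted-on-demand RF and XGB estimators; whether the study used that class is not stated in the abstract.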
License
Copyright © Faculty of Engineering, Far Western University