Diabetes Prediction Using Stacking Ensemble and MLP: A Comparative Study with Hybrid Deep Learning Models
Keywords:
Madhesh Province; Rice Productivity; Climate Extremes; Adaptive Capacity; Nepal Agriculture; Food Security.
Abstract
Diabetes mellitus is one of the most prevalent chronic diseases globally, and accurate predictive tools are needed to support early clinical intervention. This paper evaluates two machine learning architectures, a Stacking Ensemble Classifier (CatBoost and LightGBM base learners with a Logistic Regression meta-learner) and a seven-layer Multilayer Perceptron (MLP) neural network, on the Kaggle Diabetes Prediction Dataset of approximately 100,000 patient records, which exhibits a severe class imbalance of 91.5% non-diabetic versus 8.5% diabetic cases. The Stacking Classifier achieves an accuracy of 97.59%, precision of 97.61%, recall of 97.59%, F1-Score of 97.59%, and AUC-ROC of 0.9974. The MLP achieves an accuracy of 95.48% and an AUC of 0.9933. Both models are benchmarked against three hybrid deep learning models: RF+NN (96.81%), XGBoost+NN (96.75%), and Autoencoder+RF (96.58%). The Stacking Classifier shows the clearest gains, particularly in the balance between recall and F1-Score. SMOTE-based oversampling, 5-fold stratified cross-validation, and LIME explainability are integrated throughout. Confusion matrices, ROC curves, precision-recall curves, multi-metric comparisons, and LIME feature attribution plots derived from the experimental results are presented in detail.
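To illustrate why the abstract reports precision, recall, and F1-Score alongside accuracy, the following minimal sketch computes these metrics from confusion-matrix counts under the stated 91.5% / 8.5% class split. The evaluation-set size of 1,000 records, the function name `metrics`, and the counts given for the second classifier are hypothetical and chosen only for illustration; they are not results from the paper, and the paper may use a different averaging scheme for its reported figures.

```python
def metrics(tn, fp, fn, tp):
    """Accuracy plus precision, recall and F1 for the positive (diabetic) class."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical evaluation set of 1,000 records with the abstract's class ratio.
N_NEG, N_POS = 915, 85

# Trivial baseline that labels every record as non-diabetic.
baseline = metrics(tn=N_NEG, fp=0, fn=N_POS, tp=0)

# Illustrative classifier with made-up counts (not taken from the paper).
example = metrics(tn=900, fp=15, fn=17, tp=68)

for name, result in (("baseline", baseline), ("example", example)):
    print(name, {k: round(v, 4) for k, v in result.items()})
```

The always-negative baseline reaches 91.5% accuracy while detecting no diabetic cases at all, which is why accuracy alone says little on a dataset this imbalanced and why minority-class recall and F1-Score carry most of the information.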
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.




