Abstract:
Timely detection and diagnosis of medical conditions are essential for
preventing severe health complications and improving the efficiency of
healthcare. Machine Learning, a branch of Artificial Intelligence, has
considerable potential for predictive analysis when combined with Data
Mining. The objective of our study is to develop a streamlined approach for
the prompt and accurate identification of Type 2 diabetes using the widely
recognized Pima dataset, which comprises eight clinical parameters. To ensure
that all features contribute on a comparable scale, we apply the "Standard
scaler" technique (z-score standardization) for feature scaling. Our primary
focus is improving the accuracy of diabetes prediction with four supervised
machine learning methods: Decision Tree, Random Forest, Gradient Boosting, and
Support Vector Machine. Performance is evaluated with several metrics,
including accuracy, F1-score, and the Matthews correlation coefficient (MCC).
Random Forest achieves the highest accuracy, 95.24%. To check for overfitting,
we also perform 5-fold cross-validation, which yields an accuracy of 92.55%.
Our proposed models predict diabetes mellitus more accurately than previous
studies that used the Pima dataset.
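The feature-scaling and 5-fold cross-validation steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes "Standard scaler" means per-feature z-score standardization using the population standard deviation (the convention of scikit-learn's `StandardScaler`), and that folds are contiguous and unshuffled. The helper names `fit_standard_scaler`, `transform`, and `k_fold_indices` are hypothetical. Scaling statistics are fit on the training folds only, so nothing from the held-out fold leaks into the scaler.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Row = Sequence[float]


def fit_standard_scaler(rows: List[Row]) -> Tuple[List[float], List[float]]:
    """Return per-feature (means, stds) computed from the given (training) rows."""
    n = len(rows)
    n_features = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_features)]
    stds = []
    for j in range(n_features):
        # Population variance (divide by n), as in z-score standardization.
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        # Assumption: a constant feature gets std 1.0 to avoid division by zero.
        stds.append(math.sqrt(var) or 1.0)
    return means, stds


def transform(rows: List[Row], means: List[float], stds: List[float]) -> List[List[float]]:
    """Apply z = (x - mean) / std feature by feature."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def k_fold_indices(n_samples: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) for k contiguous folds.

    The first n_samples % k folds receive one extra sample.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

In a cross-validation loop, `fit_standard_scaler` would be called on the training rows of each fold and `transform` applied to both the training and held-out rows before fitting a classifier. With the commonly distributed 768-record Pima file, `k_fold_indices(768, 5)` produces held-out folds of 154, 154, 154, 153, and 153 records.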
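Accuracy, F1-score, and MCC can all be computed from the binary confusion matrix. The sketch below assumes labels are encoded as 1 (diabetic) and 0 (non-diabetic), as in the Pima dataset's outcome column. The function names are hypothetical, and returning 0.0 when a denominator is zero is one common convention rather than something the study specifies.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """(tp + tn) / total."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2*tp / (2*tp + fp + fn), the harmonic mean of precision and recall."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For example, with `y_true = [1, 1, 1, 0, 0, 0, 0, 1]` and `y_pred = [1, 1, 0, 0, 0, 1, 0, 1]` the counts are tp = 3, tn = 3, fp = 1, fn = 1, giving an F1-score of 6/8 = 0.75 and an MCC of (9 - 1)/16 = 0.5. Because MCC uses all four cells of the confusion matrix, it is less flattering than accuracy alone when the positive class is the minority, as it is in Pima.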