A Comparative Study of Diabetes Prediction Based on Lifestyle Factors Using Machine Learning

Abstract

Diabetes is a prevalent chronic disease that imposes significant health and economic burdens worldwide. Early prediction and diagnosis can support effective management and help prevent complications. This study explores the use of machine learning models to predict diabetes from lifestyle factors, using data from the Behavioral Risk Factor Surveillance System (BRFSS) 2015 survey. The dataset consists of 21 lifestyle and health-related features capturing aspects such as physical activity, diet, mental health, and socioeconomic status. Three classification models, namely Decision Tree, K-Nearest Neighbors (KNN), and Logistic Regression, are implemented and evaluated to determine their predictive performance. The models are trained and tested on a balanced dataset, and their performance is assessed using accuracy, precision, recall, and F1-score. The Decision Tree, KNN, and Logistic Regression models achieve accuracies of 0.74, 0.72, and 0.75, respectively, with varying strengths in precision and recall. These findings highlight the potential of machine learning for diabetes prediction and suggest that future work could improve performance through feature selection and ensemble learning techniques.
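The four evaluation metrics named above all derive from the counts in a binary confusion matrix. The following is a minimal sketch, assuming labels encoded as 1 = diabetic and 0 = non-diabetic (this encoding is an assumption, not taken from the study). The function name `binary_metrics` and the toy labels are hypothetical and do not reproduce the study's data or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true: list[int], y_pred: list[int]) -> BinaryMetrics:
    """Compute accuracy, precision, recall and F1 for labels in {0, 1}.

    Assumes (hypothetically) that 1 marks the positive, diabetic class.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a model never predicts,
    # or the data never contain, the positive class.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy, balanced labels for illustration only (not BRFSS data).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
    m = binary_metrics(y_true, y_pred)
    print(f"accuracy={m.accuracy:.3f} precision={m.precision:.3f} "
          f"recall={m.recall:.3f} f1={m.f1:.3f}")
```

In this toy case the classifier misses two of four positive cases, so recall (0.5) falls below precision (about 0.667) even though accuracy is 0.625. Because a balanced test set keeps accuracy from being dominated by the majority class, the remaining differences between models tend to show up in this precision/recall trade-off, which is what "varying strengths in precision and recall" refers to.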