Cardiovascular diseases (CVD) are a leading cause of death and morbidity worldwide. This study evaluates data balancing techniques (SMOTE, ENN, SMOTE-ENN, and SMOTE-Tomek) and machine learning (ML) algorithms for predicting CVD risk from big data. The 2021 CDC Behavioral Risk Factor Surveillance System (BRFSS) dataset, comprising 308,854 records, was preprocessed by removing missing values and irrelevant data. The dataset was then split into training (80%) and testing (20%) subsets. ML models, including logistic regression, random forest, LightGBM, XGBoost, and CatBo
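To make the balancing step concrete, the following is a minimal, dependency-free sketch of SMOTE and SMOTE-ENN applied after an 80/20 split. It is illustrative only and is not the study's implementation. Several choices are assumptions: the split is stratified, only the training portion is rebalanced, ENN uses Wilson's majority-vote rule with k = 3 (library implementations may use different defaults), and the two-feature toy data stands in for BRFSS features. All function names (`stratified_split`, `smote`, `enn`) are hypothetical.

```python
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def stratified_split(X, y, test_frac=0.2, seed=0):
    """Hypothetical helper: train/test split that preserves each class's share (assumed stratified)."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, lbl in enumerate(y) if lbl == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


def smote(X, y, minority_label, k=5, seed=0):
    """Add synthetic minority samples until the minority count matches the majority.

    Each synthetic point is a random interpolation between a minority sample
    and one of its k nearest minority-class neighbours (the core SMOTE idea).
    """
    rng = random.Random(seed)
    minority = [x for x, lbl in zip(X, y) if lbl == minority_label]
    counts = Counter(y)
    n_new = max(counts.values()) - counts[minority_label]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: euclidean(base, m))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position in [0, 1) along the segment base -> partner
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return X + synthetic, y + [minority_label] * n_new


def enn(X, y, k=3):
    """Edited Nearest Neighbours: drop samples whose k nearest neighbours mostly disagree."""
    kept_X, kept_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: euclidean(xi, X[j]))[:k]
        vote = Counter(y[j] for j in nearest).most_common(1)[0][0]
        if vote == yi:
            kept_X.append(xi)
            kept_y.append(yi)
    return kept_X, kept_y


# Hypothetical toy data standing in for BRFSS features: 90 "no CVD" (0), 10 "CVD" (1).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

# Assumption: balance only the training portion so the test set keeps the real class ratio.
X_train, X_test, y_train, y_test = stratified_split(X, y, test_frac=0.2)
X_sm, y_sm = smote(X_train, y_train, minority_label=1)  # SMOTE
X_se, y_se = enn(X_sm, y_sm)                            # SMOTE followed by ENN cleaning
print(Counter(y_train), Counter(y_sm), Counter(y_se))
```

Rebalancing only the training subset is a common safeguard against evaluating on synthetic records. SMOTE-Tomek would follow the same pattern but replace the ENN step with removal of Tomek-link pairs.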
