Machine Learning–Based Prediction of Breast Cancer in Women: Insights From Feature Selection of Clinical and Lifestyle Data

Science Communicator Platform

Allahqoli L; Behzadi MH; Aghamohammadi SZ; Hakimi S; Salehiniya H; Fallahi A; Rahmani A; Shahabinia Z; Mazidimoradi A; Momenimovahed Z; Ghiyasvand M

Source: International Journal of Breast Cancer. Published: 2026

Abstract

Objective: This study aimed to develop and evaluate a machine learning–based model for breast cancer classification using integrated clinical, demographic, reproductive, and lifestyle data.

Methods: A retrospective machine learning framework was developed using data from a case–control study conducted in Tehran. The dataset included demographic, clinical, reproductive, lifestyle, and screening-related variables. Preprocessing was performed inside the machine learning pipelines to ensure data quality and prevent data leakage. Duplicate records and noninformative variables were removed. Missing values were imputed with the median for numerical variables and with the most frequent value for categorical variables. Categorical features were encoded appropriately, and Min–Max normalization was applied where required. Feature selection used mutual information (MI) and analysis of variance (ANOVA) within a stratified cross-validation framework. The data were split into training (80%) and test (20%) sets. Seven supervised learning algorithms were trained and evaluated: Gaussian Naive Bayes (GNB), K-nearest neighbors (KNN), decision tree (DT), random forest (RF), support vector machine (SVM), logistic regression (LR), and artificial neural network (ANN). Model performance was assessed using accuracy, precision, recall (sensitivity), F1-score, and area under the receiver operating characteristic curve (ROC-AUC), with stratified 5-fold cross-validation and a final evaluation on an independent test set.

Results: Breast cancer patients and healthy controls differed significantly across multiple demographic and clinical variables. Patients were generally older and more likely to be widowed, to belong to higher socioeconomic classes, and to be housewives, whereas higher education levels and employment were more frequent among healthy individuals (p < 0.001).
Reproductive factors, including age at first marriage and breastfeeding duration, also showed significant differences. Feature selection reduced 414 initial variables to 40 key predictors. The most influential features included genetic factors (BRCA1/2 mutations and family history), reproductive and hormonal characteristics (age at menarche, menopause, and infertility), lifestyle behaviors (dietary patterns and physical activity), anthropometric measures (BMI and weight at age 30), and screening-related variables (mammography, ultrasound, and biopsy). All models demonstrated strong and stable performance with minimal differences between cross-validation and test results, indicating good generalization. RF achieved the highest performance (accuracy: 0.9897, precision: 0.9946, recall: 0.9840, F1-score: 0.9892), followed by SVM and LR, whereas ANN showed the lowest overall performance.

Conclusion: Machine learning models can effectively classify breast cancer using multidimensional patient data. Ensemble methods, particularly RF, demonstrated superior accuracy and robustness, highlighting their ability to capture complex nonlinear relationships. The identified predictors are consistent with established clinical and epidemiological risk factors, supporting the validity of the proposed models. These findings suggest that machine learning approaches hold strong potential for personalized risk assessment and early detection of breast cancer; however, external validation across diverse populations is necessary to confirm generalizability.

Copyright © 2026 Leila Allahqoli et al. International Journal of Breast Cancer published by John Wiley & Sons Ltd.
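
The Methods describe a stratified 80/20 split with imputation and Min–Max normalization performed so that no information leaks from the test set. The abstract does not give the authors' code, so the following is only a minimal standard-library sketch of that idea under stated assumptions: the function names (`stratified_split`, `fit_preprocessor`, `transform`) and the two-column toy data are hypothetical, and only numeric columns are handled. The key point it illustrates is that medians and scaling bounds are learned from the training rows and then reused unchanged on the test rows.

```python
import random
from statistics import median


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling each class separately."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_preprocessor(rows):
    """Learn a per-column median and min/max from the given (training) rows."""
    params = []
    for column in zip(*rows):
        observed = [v for v in column if v is not None]
        med = median(observed)
        filled = [med if v is None else v for v in column]
        params.append((med, min(filled), max(filled)))
    return params


def transform(rows, params):
    """Impute with stored medians, then Min-Max scale with stored bounds."""
    out = []
    for row in rows:
        scaled = []
        for value, (med, lo, hi) in zip(row, params):
            value = med if value is None else value
            scaled.append(0.0 if hi == lo else (value - lo) / (hi - lo))
        out.append(scaled)
    return out


# Hypothetical toy data: columns are (age, BMI); None marks a missing value.
X = [[52, 27.1], [61, None], [45, 24.3], [38, 22.0], [70, 30.5],
     [33, None], [58, 28.8], [41, 23.5], [66, 31.2], [29, 21.4]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

train_idx, test_idx = stratified_split(y, test_fraction=0.2, seed=0)

# Statistics come from the training rows only.
params = fit_preprocessor([X[i] for i in train_idx])
X_train = transform([X[i] for i in train_idx], params)

# Test rows reuse the training statistics, so scaled test values
# may fall slightly outside [0, 1].
X_test = transform([X[i] for i in test_idx], params)
```

In a cross-validation setting the same rule applies per fold: `fit_preprocessor` would be called on each fold's training portion, never on the full dataset.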

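As a quick consistency check on the reported random forest metrics, the F1-score can be recomputed as the harmonic mean of precision and recall. This sketch uses only the rounded values quoted in the abstract, so a small discrepancy from rounding is expected; the helper name `f1_from` is hypothetical.

```python
def f1_from(precision, recall):
    """F1-score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded values reported for the random forest in the abstract.
rf_precision = 0.9946
rf_recall = 0.9840
rf_f1_reported = 0.9892

rf_f1_recomputed = f1_from(rf_precision, rf_recall)
gap = abs(rf_f1_recomputed - rf_f1_reported)
print(f"recomputed F1 = {rf_f1_recomputed:.4f}, gap = {gap:.4f}")
```

The recomputed value agrees with the reported F1-score to well within 0.001, which is what one would expect if the three figures were rounded independently.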
Citation: Allahqoli L, Behzadi MH, Aghamohammadi SZ, Hakimi S, Salehiniya H, Fallahi A, Rahmani A, Shahabinia Z, Mazidimoradi A, Momenimovahed Z, Ghiyasvand M (2026). Machine Learning–Based Prediction of Breast Cancer in Women: Insights From Feature Selection of Clinical and Lifestyle Data. International Journal of Breast Cancer, 2026(1).
