PREDICTING EARLY-STAGE CERVICAL CANCER USING MACHINE LEARNING: INTEGRATING COLPOSCOPY FINDINGS AND CLINICAL DATA
By
Rakesh Kumar Saini1, Neeraj Dubey2, Arvind Kumar Yadav3 and Shailendra Jain4*
1,3Department of Physics, Maharaja Chhatrasal Bundelkhand University, Chhatarpur, Madhya Pradesh, India 471001
2OSD, Higher education, Additional Director, Sagar Division, Sagar, Madhya Pradesh, India 470001
4Eklavya University, Near Toll Plaza Sagar Road, Damoh, Madhya Pradesh, India 470661
Email: rakeshsainidec79@gmail.com, drnd9024@gmail.com, ary4861@gmail.com, shailendra.jain@eklavyauniversity.ac.in
(Received: October 09, 2023; In format: May 21, 2023; Revised: June 27, 2025; Accepted: June 30, 2025)
DOI: https://doi.org/10.58250/jnanabha_SI.2025.55108
Abstract
Cervical cancer is a major global health challenge, with early detection playing a critical role in reducing morbidity and mortality. This study investigates the application of machine learning models for the early-stage detection of cervical cancer, using colposcopic findings and clinical data. To predict cervical cancer outcomes, we compared the performance of two popular machine learning algorithms, Decision Tree and Random Forest. The models were trained on a dataset containing demographic, clinical, and colposcopic data, including factors such as HPV status, lesion grade, and lesion size. The performance of both models was evaluated using accuracy, precision, recall, F1-score, and area under the curve (AUC). The Random Forest model outperformed the Decision Tree in all key metrics, achieving an accuracy of 89.6%, precision of 87.2%, recall of 84.5%, F1-score of 85.8%, and AUC of 0.92. In contrast, the Decision Tree model showed an accuracy of 81.4%, precision of 75.3%, recall of 72.8%, F1-score of 74.0%, and AUC of 0.85. The results highlight that the Random Forest model is more effective at minimizing false negatives and false positives, offering improved predictive power for early cervical cancer detection. These findings suggest that machine learning, particularly ensemble methods like Random Forest, can enhance clinical decision-making and improve early detection, thereby reducing unnecessary procedures and improving patient outcomes, especially in low-resource settings. Further research is needed to incorporate additional data and refine these models for even greater accuracy and applicability in clinical practice.
2020 Mathematical Sciences Classification: 93B45.
Keywords and Phrases: Cervical cancer, colposcopy, early detection, machine learning, predictive modeling, Pap smear, HPV
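The F1-scores quoted in the abstract can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The following is a minimal sketch of that check, not the authors' evaluation code; the function name `f1_score` and the dictionary `reported` are illustrative names introduced here, and only the numeric values come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (converted to fractions).
reported = {
    "Random Forest": (0.872, 0.845),
    "Decision Tree": (0.753, 0.728),
}

for model, (p, r) in reported.items():
    print(f"{model}: F1 = {100 * f1_score(p, r):.1f}%")
```

Under these inputs the computed values agree with the reported F1-scores of 85.8% and 74.0% to one decimal place.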