Comparing Different KNN Parameters Based on Woman Risk Factors to Predict the Cervical Cancer
DOI: https://doi.org/10.30871/jaic.v9i5.10746
Keywords: health informatics, KNN, Cervical Cancer, Minkowski, classification
Abstract
Cervical cancer remains a major cause of mortality among women, particularly in low-resource regions where access to conventional screening is limited. Early detection through predictive modeling offers a low-cost and non-invasive alternative to clinical diagnostics. This study aims to evaluate the effectiveness of the k-Nearest Neighbors algorithm for predicting cervical cancer risk using behavioral and psychosocial attributes. The research utilized the publicly available Sobar cervical cancer behavioral dataset comprising 72 instances with 18 input features and a binary target label. Data preprocessing included removal of incomplete records, encoding of categorical variables, and normalization. The algorithm was tested across varying numbers of neighbors and distance metrics, with performance evaluated using 10-fold cross-validation and multiple classification metrics. The optimal configuration was achieved with three neighbors and the Manhattan distance metric, yielding an accuracy of 93.06%, sensitivity of 93.10%, specificity of 85.90%, precision of 93.10%, F1-score of 92.90%, and an area under the curve of 0.8952. This performance surpassed the reported baseline of a probabilistic classifier and demonstrated the algorithm’s capability to capture complex behavioral patterns associated with cervical cancer risk. These findings confirm the feasibility of applying optimized instance-based learning to behavioral data for early cancer risk assessment. The approach offers potential for integration into community health programs to support early detection and prevention strategies.
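The evaluation procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation. The data are synthetic (72 rows by 18 features, chosen only to mirror the dataset's shape), min-max scaling is an assumed choice of normalization, and the function names (`min_max`, `minkowski`, `knn_predict`, `cross_val_confusion`, `metrics`) are hypothetical. The Minkowski distance with p = 1 gives the Manhattan metric and p = 2 the Euclidean metric. The area under the curve is omitted for brevity.

```python
import random
from collections import Counter

def min_max(X):
    """Scale every column to [0, 1] (assumed normalization; applied to all rows for brevity)."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in X]

def minkowski(a, b, p):
    """Minkowski distance of order p: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_predict(train_X, train_y, query, k, p):
    """Return the majority label among the k training points nearest to `query`."""
    order = sorted(range(len(train_X)),
                   key=lambda i: minkowski(train_X[i], query, p))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def cross_val_confusion(X, y, k, p, folds=10, seed=0):
    """Pool predictions over `folds` held-out splits; label 1 is the positive class."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    tp = fp = tn = fn = 0
    for f in range(folds):
        test = idx[f::folds]                 # every `folds`-th shuffled index
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        tr_X = [X[i] for i in train]
        tr_y = [y[i] for i in train]
        for i in test:
            pred = knn_predict(tr_X, tr_y, X[i], k, p)
            if pred == 1 and y[i] == 1:
                tp += 1
            elif pred == 1:
                fp += 1
            elif y[i] == 0:
                tn += 1
            else:
                fn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Derive the usual classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0
    sens = ratio(tp, tp + fn)
    spec = ratio(tn, tn + fp)
    prec = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sens,
        "specificity": spec,
        "precision": prec,
        "f1": ratio(2 * prec * sens, prec + sens),
    }

# Synthetic stand-in data: NOT the Sobar dataset; the class sizes are arbitrary.
rng = random.Random(42)
y = [1] * 24 + [0] * 48
X = min_max([[rng.gauss(0.65 if label else 0.35, 0.15) for _ in range(18)]
             for label in y])

# Grid over the number of neighbors and the distance order, as in the study design.
for k in (1, 3, 5, 7):
    for p, name in ((1, "Manhattan"), (2, "Euclidean")):
        m = metrics(*cross_val_confusion(X, y, k, p))
        print(f"k={k} {name}: " + ", ".join(f"{key}={val:.3f}" for key, val in m.items()))
```

Only odd values of k are tried here so that a two-class majority vote cannot tie. Scaling before splitting, as this sketch does, lets the held-out rows influence the scaling range; a stricter evaluation would fit the scaler on each training fold only.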
Copyright (c) 2025 Maria Claudia Saletia, Mochammad Anshori, M Syauqi Haris

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.