Optimization of K-Nearest Neighbors Algorithm with Cross Validation Techniques for Diabetes Prediction with Streamlit
Abstract
K-Nearest Neighbors, when applied as a classification algorithm, frequently overfits the data it processes. Cross-validation techniques can mitigate this when evaluating the model and minimize overfitting. However, the accuracy of diabetes prediction using the K-Nearest Neighbors algorithm combined with cross-validation has not yet been established. The data used come from the National Institute of Diabetes and Digestive and Kidney Diseases in 2021. The case study examines whether initial screening for diabetes can be supported by the algorithm's accuracy results and by a real-time, Streamlit-based application for users. The purpose of this study was to optimize accuracy on diabetes data by combining the K-Nearest Neighbors algorithm with a cross-validation technique. The resulting model was then deployed in a Streamlit-based interactive web application that lets users test it and see the probability that they have diabetes. The results showed that optimizing the K-Nearest Neighbors model with cross-validation worked well. The confusion matrix results obtained with cross-validation are more accurate, reflecting the advantages of the technique itself. The classification report value of 95% is therefore more accurate than the accuracy value of 92%, because cross-validation minimizes overfitting, and the Streamlit-based interactive web application for user testing ran well.
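The combination of K-Nearest Neighbors and cross-validation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic two-class points rather than the diabetes dataset, and the function names (`knn_predict`, `cross_val_accuracy`, `select_k`, `make_toy_data`), the five folds, and the candidate values of k are all assumptions chosen for illustration. The sketch uses only the Python standard library.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Classify x by majority vote among its k nearest training points."""
    nearest = sorted(
        range(len(train_X)), key=lambda i: math.dist(train_X[i], x)
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds=5, seed=0):
    """Return one accuracy per fold for k-NN under n_folds-fold cross-validation."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Every sample lands in exactly one held-out fold.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in held_out
        )
        scores.append(correct / len(held_out))
    return scores


def select_k(X, y, candidates=(1, 3, 5, 7, 9)):
    """Pick the candidate k with the highest mean cross-validated accuracy."""
    mean_scores = {}
    for k in candidates:
        fold_scores = cross_val_accuracy(X, y, k)
        mean_scores[k] = sum(fold_scores) / len(fold_scores)
    best_k = max(mean_scores, key=mean_scores.get)
    return best_k, mean_scores


def make_toy_data(n_per_class=60, seed=1):
    """Two synthetic Gaussian clusters (hypothetical data, NOT the study's dataset)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, (0.0, 0.0)), (1, (4.0, 4.0))):
        for _ in range(n_per_class):
            X.append((rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    best_k, mean_scores = select_k(X, y)
    print("mean accuracy per k:", mean_scores)
    print("selected k:", best_k)
```

Because each sample is scored only by a model that never saw it during training, the mean over folds is a less optimistic estimate than accuracy measured on the training data, which is the sense in which cross-validation guards against overfitting. On real clinical features with different units, the features would normally be standardized inside each training fold before computing distances; the sketch omits that step because both synthetic features share the same scale.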
Copyright (c) 2022 Aditya Budi Prasetyo, Tri Ginanjar Laksana
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.