Optimization of K-Nearest Neighbors Algorithm with Cross Validation Techniques for Diabetes Prediction with Streamlit

  • Aditya Budi Prasetyo Institut Teknologi Telkom Purwokerto
  • Tri Ginanjar Laksana Institut Teknologi Telkom Purwokerto
Keywords: Cross Validation, Diabetes, K-Nearest Neighbors, Optimization, Streamlit

Abstract

A common problem when applying K-Nearest Neighbors as a classification algorithm is overfitting. Cross-validation addresses this by evaluating the model on held-out folds, which helps to detect and minimize overfitting. However, the accuracy of diabetes prediction using the K-Nearest Neighbors algorithm combined with cross-validation has not been established. The data used come from the National Institute of Diabetes and Digestive and Kidney Diseases in 2021. The case study concerns initial screening for diabetes, supported by the algorithm's accuracy results and a real-time, Streamlit-based application for users. The purpose of this study is to optimize the accuracy of the K-Nearest Neighbors algorithm on diabetes data using cross-validation. The method is the K-Nearest Neighbors algorithm supported by cross-validation to obtain optimal accuracy, together with an interactive Streamlit-based web application in which a user can view the predicted probability of having diabetes. The results show that optimizing the K-Nearest Neighbors model with cross-validation worked well. The confusion matrix obtained with cross-validation is considered more reliable because cross-validation minimizes overfitting. Accordingly, the classification report value of 95% is regarded as more accurate than the accuracy value of 92%, and the Streamlit-based interactive web application performed well in user testing.
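The evaluation idea in the abstract, scoring K-Nearest Neighbors on held-out folds and choosing the number of neighbors by mean fold accuracy, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the data are synthetic Gaussian blobs standing in for patient features, and the helper names (`knn_predict`, `cross_val_accuracy`, `make_synthetic_data`), the candidate values of k, and the fold count are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: math.dist(train_X[i], query),
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds=5, seed=0):
    """Mean accuracy of k-NN over `n_folds` shuffled, disjoint folds."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        # Train on every sample that is not in the held-out fold.
        train_idx = [i for i in indices if i not in held]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in held_out
        )
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def make_synthetic_data(n_per_class=60, seed=42):
    """Two Gaussian blobs (hypothetical stand-in for real patient data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            X.append((rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)))
            y.append(label)
    return X, y


X, y = make_synthetic_data()
candidate_ks = [1, 3, 5, 7, 9]  # odd values avoid tied votes with two classes
cv_scores = {k: cross_val_accuracy(X, y, k) for k in candidate_ks}
best_k = max(cv_scores, key=cv_scores.get)
print(cv_scores, "best k:", best_k)
```

Because every sample is held out exactly once, the mean fold accuracy is less sensitive to a lucky or unlucky single train/test split than one accuracy figure would be. On real clinical features with different units, the inputs would normally be scaled before computing distances; that step is omitted here for brevity.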

Author Biography

Tri Ginanjar Laksana, Institut Teknologi Telkom Purwokerto

Teknik Informatika, Fakultas Informatika, Institut Teknologi Telkom Purwokerto

Published
2022-12-08
How to Cite
[1]
A. Prasetyo and T. Laksana, “Optimization of K-Nearest Neighbors Algorithm with Cross Validation Techniques for Diabetes Prediction with Streamlit”, JAIC, vol. 6, no. 2, pp. 194-204, Dec. 2022.
Section
Articles