Bagging Nearest Neighbor and its Enhancement for Machinery Predictive Maintenance
Abstract
K-nearest neighbor (KNN) is a simple machine learning algorithm for classification, a task that plays a valuable role in understanding big data. However, its classification performance can fall short on datasets with certain characteristics. This study therefore adopts several enhancement methods to improve the performance of the nearest-neighbor model, focusing on a binary classification task on an extremely imbalanced machine-failure dataset. First, new features are created from the machinery dataset through feature engineering, and the chosen numerical features are scaled by standardization. Next, a modified under-sampling method reduces the majority class to 4.75 times the size of the minority class. Grid-search tuning is then applied to find the right parameter combination for the nearest-neighbor model. Finally, these pre-processing steps are combined with a bagging method. The resulting bagged KNN achieves an accuracy of 0.971, a precision of 0.555, a recall of 0.781, an F1-score of 0.649, an area under the ROC curve (AUC) of 0.95, and an area under the precision-recall curve of 0.702.
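The pipeline summarized in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. All function names and default values (k, n_estimators, seed) are hypothetical. Plain random under-sampling stands in for the study's "modified" under-sampling method, whose details the abstract does not give, and Euclidean distance is assumed. Feature engineering and grid-search tuning are omitted.

```python
import math
import random
from collections import Counter


def standardize(rows):
    """Scale every column to zero mean and unit variance (z-score)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by 0.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def undersample(X, y, ratio=4.75, seed=0):
    """Keep all minority samples and at most ratio * n_minority majority samples.

    Assumption: plain random selection, not the paper's modified method.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    keep = min(len(majority_idx), int(ratio * len(minority_idx)))
    chosen = sorted(minority_idx + rng.sample(majority_idx, keep))
    return [X[i] for i in chosen], [y[i] for i in chosen]


def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(X_train)),
                     key=lambda i: math.dist(X_train[i], x))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]


def bagged_knn_predict(X_train, y_train, x, k=3, n_estimators=11, seed=0):
    """Bagging: vote over KNN models, each built on a bootstrap resample."""
    rng = random.Random(seed)
    n = len(X_train)
    votes = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        votes.append(knn_predict([X_train[i] for i in idx],
                                 [y_train[i] for i in idx], x, k))
    return Counter(votes).most_common(1)[0][0]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

In use, the features would be standardized first, the training set under-sampled, and each new sample classified with bagged_knn_predict. As a consistency check, f1_score(0.555, 0.781) rounds to 0.649, which matches the F1-score reported in the abstract.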
Copyright (c) 2024 Muhammad Irfan Arisani, Muljono Muljono
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.