Bagging Nearest Neighbor and its Enhancement for Machinery Predictive Maintenance

  • Muhammad Irfan Arisani Universitas Dian Nuswantoro
  • Muljono Muljono Universitas Dian Nuswantoro
Keywords: Bagging Method, Binary Classification, Machine Learning, Nearest Neighbor

Abstract

K-nearest neighbor (KNN) is a simple machine learning algorithm for classification, a task that plays a valuable role in making sense of big data. However, its classification performance can be poor on datasets with certain characteristics. This study therefore applies several enhancement methods to improve a nearest-neighbor model on a binary classification task: predicting machine failure from an extremely imbalanced dataset. First, new features are created from the machinery dataset through feature engineering, and the selected numerical features are standardized so that they share a proper scale. Then, a modified under-sampling method reduces the majority class to 4.75 times the size of the minority class. Next, grid-search tuning finds a suitable parameter combination for the nearest-neighbor model. These pre-processing steps are then combined with a bagging method. The resulting bagged KNN achieves an accuracy of 0.971, a precision of 0.555, a recall of 0.781, an F1-score of 0.649, an area under the ROC curve of 0.95, and an area under the precision-recall curve of 0.702.
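As a rough illustration of two of the steps named in the abstract (under-sampling the majority class to 4.75 times the minority class, then bagging a nearest-neighbor classifier), the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy two-cluster data, the plain random under-sampling, the Euclidean distance, and the values `k=3` and `n_estimators=11` are all assumptions made for this example. Feature engineering, standardization, and grid search are omitted.

```python
import math
import random
from collections import Counter


def undersample(X, y, ratio=4.75, minority=1, seed=0):
    """Randomly keep at most ratio * (minority count) majority rows.

    Assumption: plain random under-sampling; the abstract does not
    describe what its "modified" method changes.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    keep = min(len(maj_idx), int(ratio * len(min_idx)))
    chosen = sorted(min_idx + rng.sample(maj_idx, keep))
    return [X[i] for i in chosen], [y[i] for i in chosen]


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    return Counter(y_train[i] for i in order[:k]).most_common(1)[0][0]


def bagged_knn_predict(X_train, y_train, x, n_estimators=11, k=3, seed=0):
    """Fit KNN on bootstrap resamples and take a majority vote."""
    rng = random.Random(seed)
    n = len(X_train)
    votes = []
    for _ in range(n_estimators):
        # Bootstrap: draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [X_train[i] for i in idx]
        yb = [y_train[i] for i in idx]
        votes.append(knn_predict(Xb, yb, x, k))
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy data: 40 "no failure" points near (0, 0)
# and 4 "failure" points near (5, 5).
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(40)]
X += [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(4)]
y = [0] * 40 + [1] * 4

X_bal, y_bal = undersample(X, y)          # 19 majority rows + 4 minority rows
print(Counter(y_bal))
print(bagged_knn_predict(X_bal, y_bal, (0.0, 0.0)))
```

With 4 minority rows, the 4.75 ratio keeps `int(4.75 * 4) = 19` majority rows, so the balanced set has 23 rows. An odd `n_estimators` avoids tied votes in the binary case.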

Published
2024-08-13
How to Cite
[1]
M. Arisani and M. Muljono, “Bagging Nearest Neighbor and its Enhancement for Machinery Predictive Maintenance”, JAIC, vol. 8, no. 2, pp. 248-256, Aug. 2024.