Balancing CICIoV2024 Dataset with RUS for Improved IoV Attack Detection

Muhammad David Firmansyah; Ifan Rizqa; Fauzi Adi Rafrastara

doi:10.30871/jaic.v9i2.9079

Authors

Muhammad David Firmansyah Teknik Informatika, Fakultas Ilmu Komputer, Universitas Dian Nuswantoro, Semarang
Ifan Rizqa Teknik Informatika, Fakultas Ilmu Komputer, Universitas Dian Nuswantoro, Semarang
Fauzi Adi Rafrastara Teknik Informatika, Fakultas Ilmu Komputer, Universitas Dian Nuswantoro, Semarang

DOI:

https://doi.org/10.30871/jaic.v9i2.9079

Keywords:

Internet of Things, Internet of Vehicles, Imbalanced Dataset, Machine Learning, Random Under-Sampling.

Abstract

This study addresses cybersecurity challenges in the Internet of Vehicles (IoV) by examining how well Random Under-Sampling (RUS) balances the class distribution of the CICIoV2024 dataset for intrusion detection. IoV technology connects vehicles to digital infrastructure, which supports communication and improves safety but also exposes vehicles to cyber threats such as Denial of Service (DoS) and spoofing attacks. RUS was applied to mitigate the class imbalance in CICIoV2024, a problem that often impedes effective threat detection by machine learning models. Four machine learning classifiers (Random Forest, AdaBoost, Gradient Boosting, and XGBoost) were evaluated on both the imbalanced and the balanced dataset to compare their performance. The results showed that RUS significantly improved accuracy, precision, recall, and F1-score, with all four classifiers reaching perfect scores after balancing. RUS also substantially reduced training and testing times, improving computational efficiency. These findings underscore the potential of RUS for addressing data imbalance in IoV cybersecurity and provide a foundation for future research on safeguarding IoV systems against evolving cyber threats.
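
As an illustration of the balancing step described in the abstract, the following is a minimal sketch of random under-sampling using only the Python standard library. It is not the authors' pipeline: the function name `random_under_sample`, the class labels, and the toy row counts are assumptions made for this example, and the data is invented rather than drawn from CICIoV2024.

```python
import random
from collections import Counter, defaultdict


def random_under_sample(features, labels, seed=0):
    """Return a class-balanced subset by randomly discarding majority-class rows.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Every class is reduced to the size of the smallest class.
    target = min(len(idxs) for idxs in by_class.values())

    kept = []
    for idxs in by_class.values():
        # Sample without replacement, so no row is duplicated.
        kept.extend(rng.sample(idxs, target))
    kept.sort()  # keep the surviving rows in their original order

    return [features[i] for i in kept], [labels[i] for i in kept]


# Toy, made-up data: 90 benign rows, 8 DoS rows, 2 spoofing rows.
labels = ["BENIGN"] * 90 + ["DOS"] * 8 + ["SPOOFING"] * 2
features = [[i] for i in range(len(labels))]

X_bal, y_bal = random_under_sample(features, labels, seed=42)

print("before:", Counter(labels))
print("after: ", Counter(y_bal))
```

Because every class is cut down to the size of the smallest one, the balanced set is much smaller than the original, which is consistent with the shorter training and testing times the abstract reports. The trade-off is that most majority-class rows are discarded.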
