Evaluation of the Decision Tree Model for Air Condition Classification on the Global Air Pollution Dataset
Abstract
Air pollution is an urgent global environmental problem with significant impacts on public health and ecosystem stability. This research develops an air quality classification model using the Global Air Pollution dataset from Kaggle, which consists of 23,463 rows and 12 features, including key variables such as the Air Quality Index (AQI), PM2.5, NO2, and O3. The Decision Tree, Random Forest, and Support Vector Machine (SVM) algorithms are applied to the classification task, with a focus on hyperparameter tuning to improve model accuracy. The results show that the Decision Tree performs best, reaching an accuracy of 99.89% after hyperparameter tuning with Grid Search. The accuracy of the SVM model improved from 94.89% to 99.32% after tuning, while Random Forest reached 96.87% accuracy, with no significant improvement after tuning. Feature importance analysis identified PM2.5 and AQI as the dominant factors influencing air quality, with PM2.5 having the highest importance value (0.93). This research confirms that machine learning can be an effective tool for integrating and classifying air pollution data. Integrating this model into a real-time air quality monitoring system could support more responsive and precise decisions in dealing with air pollution problems.
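The abstract reports Grid Search tuning of a Decision Tree and an impurity-based feature ranking but does not spell out either procedure. The following is a minimal, dependency-free Python sketch of both steps under stated assumptions: it runs on synthetic data rather than the Kaggle dataset, and the feature names (`pm25`, `no2`, `o3`), value ranges, three class labels, parameter grid, and 3-fold cross-validation are hypothetical choices for illustration, not the authors' settings or tooling. It grows a CART-style tree with Gini impurity, selects `max_depth` and `min_samples_split` by exhaustive search scored with cross-validated accuracy, and reports each feature's normalized impurity decrease.

```python
import random
from collections import Counter
from itertools import product

# Hypothetical stand-ins for the dataset's pollutant sub-index columns.
FEATURES = ["pm25", "no2", "o3"]

def make_data(n, seed=0):
    """Synthetic rows (NOT the Kaggle data). The label comes from the largest
    sub-index, loosely mimicking how an overall AQI is formed."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        # Ranges are assumptions; rounding to 10 keeps candidate thresholds few.
        row = [round(rng.uniform(0, hi), -1) for hi in (300, 60, 120)]
        worst = max(row)
        y.append("Good" if worst <= 50 else "Moderate" if worst <= 100 else "Unhealthy")
        X.append(row)
    return X, y

def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k squared."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def best_split(X, y):
    """Exhaustive search for the (feature, threshold) with the largest Gini decrease."""
    n, parent, best = len(y), gini(y), None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X})[:-1]:
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or gain > best[2]:
                best = (f, t, gain)
    return best

def build(X, y, depth, max_depth, min_samples_split, imp):
    """Grow a CART-style tree. Leaves are labels; internal nodes are
    (feature, threshold, left, right). imp[f] accumulates N_t * Gini decrease."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_samples_split or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None or split[2] <= 0:
        return majority
    f, t, gain = split
    imp[f] += gain * len(y)
    def grow(idx):
        return build([X[i] for i in idx], [y[i] for i in idx],
                     depth + 1, max_depth, min_samples_split, imp)
    return (f, t, grow([i for i in range(len(y)) if X[i][f] <= t]),
            grow([i for i in range(len(y)) if X[i][f] > t]))

def fit(X, y, max_depth, min_samples_split):
    """Return the tree and importances normalized to sum to 1."""
    imp = [0.0] * len(X[0])
    tree = build(X, y, 0, max_depth, min_samples_split, imp)
    total = sum(imp)
    return tree, [v / total if total else 0.0 for v in imp]

def predict(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def accuracy(tree, X, y):
    return sum(predict(tree, r) == label for r, label in zip(X, y)) / len(y)

def grid_search(X, y, grid, k=3):
    """Try every parameter combination; score each by mean k-fold CV accuracy."""
    folds = [list(range(i, len(y), k)) for i in range(k)]
    best_params, best_score = None, -1.0
    for values in product(*grid.values()):
        params = dict(zip(grid, values))
        scores = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            tree, _ = fit([X[i] for i in train], [y[i] for i in train], **params)
            scores.append(accuracy(tree, [X[i] for i in fold], [y[i] for i in fold]))
        if sum(scores) / k > best_score:
            best_params, best_score = params, sum(scores) / k
    return best_params, best_score

# Demo: hold out 25% for testing, tune on the rest, then refit with the best params.
X, y = make_data(240)
X_train, y_train, X_test, y_test = X[:180], y[:180], X[180:], y[180:]
grid = {"max_depth": [2, 4, 6], "min_samples_split": [2, 20]}
params, cv_acc = grid_search(X_train, y_train, grid)
tree, importances = fit(X_train, y_train, **params)
print("best params:", params, f"(CV accuracy {cv_acc:.3f})")
print(f"test accuracy: {accuracy(tree, X_test, y_test):.3f}")
for name, value in zip(FEATURES, importances):
    print(f"{name} importance: {value:.2f}")
```

Because each synthetic label is set by the largest sub-index and the PM2.5 column has the widest range, PM2.5 should take the largest importance share, which mirrors the direction of the abstract's finding without reproducing its numbers. The importance measure (Gini decrease at each split, weighted by node size and normalized to sum to 1) follows the same mean-decrease-in-impurity idea behind scikit-learn's `feature_importances_`; in practice, scikit-learn's `DecisionTreeClassifier` combined with `GridSearchCV` would replace this hand-written version.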
Copyright (c) 2024 Cindy Dinda Sabella, Yoga Pristyanto
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.