Evaluation of the Decision Tree Model for Air Condition Classification on the Global Air Pollution Dataset

  • Cindy Dinda Sabella, Universitas Amikom Yogyakarta
  • Yoga Pristyanto, Universitas Amikom Yogyakarta
Keywords: Decision Tree, Air Quality Index, Air Pollution, Machine Learning, Classification

Abstract

Air pollution is an urgent global environmental problem with significant impacts on public health and ecosystem stability. This research develops an air quality classification model using the Global Air Pollution dataset from Kaggle, which consists of 23,463 rows and 12 features, including key variables such as the Air Quality Index (AQI), PM2.5, NO2, and O3. Decision Tree, Random Forest, and Support Vector Machine (SVM) algorithms are applied to the classification task, with a focus on hyperparameter tuning to increase model accuracy. The results show that the Decision Tree performs best, with an accuracy of 99.89% after hyperparameter tuning using the Grid Search method. The SVM model improved from 94.89% to 99.32% accuracy, while Random Forest recorded an accuracy of 96.87% with no significant improvement after tuning. Feature importance analysis identified PM2.5 and AQI as the dominant factors influencing air quality, with PM2.5 having the highest importance value of 0.93. This research confirms that machine learning can be an effective tool for classifying air pollution. It is hoped that integrating this model into a real-time air quality monitoring system can support more responsive and precise decisions in dealing with air pollution problems.
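The tuning workflow summarized in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn, which the abstract does not name, and it uses synthetic data in place of the Kaggle dataset. The column names, the labelling rule, and the parameter grid are all hypothetical and chosen only for illustration; the authors' actual search space and preprocessing are not given here.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the dataset: three pollutant readings per row.
# Column names are hypothetical, not the dataset's real headers.
rng = np.random.default_rng(0)
feature_names = ["pm25", "no2", "o3"]
X = rng.uniform(0, 300, size=(600, 3))

# Hypothetical labelling rule: three classes decided entirely by the
# first column, so that one feature dominates the importance scores.
y = np.digitize(X[:, 0], bins=[100, 200])  # class 0, 1, or 2

# Hold out a stratified test split for the final accuracy estimate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Illustrative grid only; not the search space used in the paper.
param_grid = {
    "max_depth": [2, 4, 8],
    "min_samples_split": [2, 10],
    "criterion": ["gini", "entropy"],
}

# Exhaustive grid search with 5-fold cross-validation on the training split.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Evaluate the refitted best tree on the held-out data.
best_tree = search.best_estimator_
test_accuracy = best_tree.score(X_test, y_test)

# Impurity-based importances, one value per input column.
importances = dict(zip(feature_names, best_tree.feature_importances_))

print("best params:", search.best_params_)
print("test accuracy:", round(test_accuracy, 4))
print("importances:", importances)
```

On real data the same pattern would apply after loading the CSV and encoding the target category. The accuracy and importance values printed here come from the synthetic rule above and say nothing about the figures reported in the abstract.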

Published
2024-11-14
How to Cite
C. Sabella and Y. Pristyanto, “Evaluation of the Decision Tree Model for Air Condition Classification on the Global Air Pollution Dataset,” JAIC, vol. 8, no. 2, pp. 478–486, Nov. 2024.