Improving News Text Classification Using a Hybrid C5.0-KNN Model
DOI:
https://doi.org/10.30871/jaic.v9i6.11478
Keywords:
News Topic Classification, C5.0 Decision Tree, K-Nearest Neighbours, Low-Resource Languages, Clustering, Kmeans, Tableau, Perkebunan Indonesia
Abstract
In the digital era, the volume of online news far exceeds readers’ ability to filter information manually, which makes automated text classification necessary. However, achieving high classification accuracy remains challenging, especially in low-resource languages such as Indonesian. The C5.0 decision tree and K-Nearest Neighbors (KNN) offer complementary strengths but have not yet been used jointly for Indonesian news classification; this study therefore proposes a hybrid C5.0–KNN model designed to improve news classification performance. A dataset of 1,700 articles was collected from four Indonesian online news outlets (CNN Indonesia, Okezone, Tribun Jakarta, and Tribun Jabar), covering five topical categories: economy/ekonomi, technology/teknologi, sport/olahraga, entertainment/hiburan, and lifestyle/gaya hidup. The data underwent preprocessing and TF-IDF weighting before classification with the hybrid model. In this approach, C5.0 first generates interpretable decision rules, and KNN then refines borderline cases, combining rule-based and instance-based methods. The hybrid model achieved its highest accuracy, 0.8847, with 25% test data and k=5, outperforming standalone C5.0 (0.7426) and standalone KNN (0.8735). Notably, it attained 100% recall for “sport/olahraga” and an F1-score of 0.89 for “entertainment/hiburan”. These results point to the model’s efficiency and strong potential for real-world news classification in low-resource language contexts, offering practical value for journalists, analysts, and media monitoring systems.
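The two-stage idea described in the abstract (rules first, KNN for borderline cases) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hand-written `RULES` list only stands in for a rule set that C5.0 would learn, the confidence `threshold` used to decide what counts as "borderline" is an assumption, the smoothed IDF formula is one common variant, and the toy sentences are invented for the example. All function and variable names are hypothetical.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc.lower().split()))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(text, idf):
    """L2-normalised TF-IDF vector as a dict; terms unseen in training are dropped."""
    tf = Counter(t for t in text.lower().split() if t in idf)
    vec = {t: freq * idf[t] for t, freq in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def knn_predict(vec, train_vecs, train_labels, k):
    """Majority label among the k most similar training vectors."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(vec, pair[0]), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

def rule_predict(text, rules):
    """First matching (term, label, confidence) rule, or (None, 0.0) if none fires."""
    tokens = set(text.lower().split())
    for term, label, confidence in rules:
        if term in tokens:
            return label, confidence
    return None, 0.0

def hybrid_predict(text, rules, train_vecs, train_labels, idf, k=5, threshold=0.8):
    """Accept a confident rule; otherwise treat the case as borderline and use KNN."""
    label, confidence = rule_predict(text, rules)
    if label is not None and confidence >= threshold:
        return label, "rules"
    return knn_predict(tfidf(text, idf), train_vecs, train_labels, k), "knn"

# Toy training data (invented for illustration, not taken from the study's dataset).
TRAIN = [
    ("bank sentral menaikkan suku bunga", "ekonomi"),
    ("harga saham bank naik tajam", "ekonomi"),
    ("tim sepak bola menang di final", "olahraga"),
    ("pelatih tim memuji pemain muda", "olahraga"),
    ("ponsel baru rilis dengan kamera canggih", "teknologi"),
    ("aplikasi ponsel mendapat pembaruan", "teknologi"),
]
# Hypothetical stand-in for learned C5.0 rules: (term, label, confidence).
RULES = [("bola", "olahraga", 0.95), ("saham", "ekonomi", 0.90), ("ponsel", "teknologi", 0.60)]

IDF = fit_idf([doc for doc, _ in TRAIN])
VECS = [tfidf(doc, IDF) for doc, _ in TRAIN]
LABELS = [label for _, label in TRAIN]

for text in ("pertandingan sepak bola malam ini", "bank menurunkan suku bunga"):
    print(text, "->", hybrid_predict(text, RULES, VECS, LABELS, IDF, k=3))
```

Each prediction is returned together with the stage that produced it, so the share of cases handed to KNN can be inspected. A real pipeline would replace the whitespace tokeniser with the study's preprocessing and learn the rules from data rather than writing them by hand.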
Copyright (c) 2025 Liza Wikarsa, Algy Ngenget, Andrew Tumewu, Miracle Kalempouw, Edgard Oley

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.