Optimizing Sentiment Classification Models for TikTok Comments using Emotion-Based Preprocessing and Grid Search
DOI:
https://doi.org/10.30871/jaic.v10i1.11742
Keywords:
Emotion-Based Preprocessing, Grid Search, Hyperparameter Tuning, Machine Learning, Sentiment Analysis
Abstract
TikTok has become one of the social media platforms with a significant influence on public opinion formation in Indonesia. However, the linguistic characteristics of user comments, which are expressive, concise, and rich in emotional markers such as emojis, emoticons, and excessive capitalization, pose challenges for sentiment analysis. This research aims to optimize a sentiment classification model for TikTok comments using emotion-based preprocessing and hyperparameter optimization via Grid Search. The dataset comprises 4,500 comments from three different time periods discussing the Minister of Finance, Purbaya Yudhi Sadewa. Three testing scenarios were conducted: common preprocessing, emotion-based preprocessing, and a combination of emotion-based preprocessing with Grid Search. The results indicate that emotion-based preprocessing improved model accuracy by 4–5%, while Grid Search optimization provided an additional increase of up to 3%, achieving a peak F1-score of 0.92 with the LightGBM model. Analysis by time period shows that sentiment remained predominantly positive across all three periods. The integration of emotion-based preprocessing and hyperparameter tuning proved effective in enhancing the model's ability to capture emotional variation in text and to map periodic changes in public sentiment on Indonesian-language social media.
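To make the contrast between the first two scenarios concrete, the following is a minimal Python sketch of what an emotion-aware preprocessing step might look like next to a common one. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the small `EMOTION_TOKENS` lexicon, and the `emo_*` placeholder tokens are all hypothetical, and the abstract does not specify how emojis, emoticons, or capitalization were actually encoded.

```python
import re

# Hypothetical lexicon mapping emojis and emoticons to placeholder tokens.
# The mapping used in the paper is not given in the abstract.
EMOTION_TOKENS = {
    "😂": "emo_laugh",
    "😍": "emo_love",
    "😡": "emo_angry",
    "😭": "emo_cry",
    "👍": "emo_approve",
    ":)": "emo_smile",
    ":(": "emo_sad",
}
_PLACEHOLDERS = set(EMOTION_TOKENS.values())
_WORD = re.compile(r"\w+")


def common_preprocess(text):
    """Baseline: lowercase and keep word tokens only, so emotional cues are dropped."""
    return _WORD.findall(text.lower())


def emotion_preprocess(text):
    """Assumed variant: keep emojis, emoticons, and shouting as explicit tokens."""
    # Replace each emoji or emoticon with a space-padded placeholder token.
    for symbol, token in EMOTION_TOKENS.items():
        text = text.replace(symbol, f" {token} ")

    tokens = []
    for raw in text.split():
        if raw in _PLACEHOLDERS:
            tokens.append(raw)
            continue
        for word in _WORD.findall(raw):
            # A fully uppercase word of two or more letters counts as shouting.
            shouted = len(word) > 1 and word.isupper()
            tokens.append(word.lower())
            if shouted:
                tokens.append("emo_caps")
    return tokens


if __name__ == "__main__":
    comment = "MANTAP pak 😍😍 :)"
    print(common_preprocess(comment))
    print(emotion_preprocess(comment))
```

In this sketch the baseline reduces the sample comment to two plain words, while the emotion-aware variant keeps the capitalization and emoji signals as features that a downstream vectorizer and classifier could use. Hyperparameter tuning with Grid Search would then be applied on top of such features.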
License
Copyright (c) 2026 Bagas Restya Ermawan, Mahendra Bayu Prayoga, Akmal Rafi Fadhillah, Ema Utami

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).








