Forecasting Topic, Word, and Hashtag Popularity on X (Twitter) Using LightGBM for Digital Marketing Optimization
DOI: https://doi.org/10.30871/jaic.v10i2.12233
Keywords: BERTopic, LightGBM, Optuna, Time Series Forecasting, Social Media Analytics
Abstract
This study presents a machine learning-based framework for forecasting the popularity of topics, words, and hashtags on X (Twitter) to support data-driven digital marketing optimization. Using one year of Indonesian promotional tweets, BERTopic was applied for topic modeling, while LightGBM tuned with Optuna was used to forecast temporal dynamics from engineered time series features. Evaluation with RMSE, MAE, and RMSSE, complemented by comparisons with an ARIMA baseline, indicates that the proposed LightGBM model performs competitively and is consistently superior to the baseline. Although word-level spikes caused by noise and event-driven behavior remain difficult to predict, the model effectively captures the underlying trend patterns. The proposed approach supports improved campaign timing, content planning, and ROI. The full implementation is publicly available on GitHub, including detailed documentation of the dataset, preprocessing steps, and experimental pipeline, to support reproducibility and further research.
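The abstract names RMSE, MAE, and RMSSE as evaluation metrics and engineered time series features as model inputs, but does not define them on this page. The following is a minimal sketch in plain Python, assuming the RMSSE definition popularized by the M5 forecasting competition (forecast mean squared error scaled by the in-sample one-step naive error) and assuming simple lag features. The function names, lag choices, and sample values are hypothetical illustrations and are not taken from the authors' repository.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over the forecast horizon."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error over the forecast horizon."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmsse(train, actual, predicted):
    """Root mean squared scaled error (assumed M5-style definition).

    The forecast MSE is divided by the MSE of a one-step naive forecast
    (y[t-1] predicts y[t]) computed on the training series.
    """
    naive_mse = sum(
        (train[t] - train[t - 1]) ** 2 for t in range(1, len(train))
    ) / (len(train) - 1)
    if naive_mse == 0:
        raise ValueError("training series is constant; RMSSE is undefined")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse / naive_mse)


def lag_features(series, lags=(1, 7)):
    """Build (features, target) rows where each feature is a lagged value.

    The lag set is a hypothetical choice; rows start at the largest lag so
    every feature is available.
    """
    max_lag = max(lags)
    return [
        ([series[t - k] for k in lags], series[t])
        for t in range(max_lag, len(series))
    ]


# Hypothetical daily counts for one hashtag, followed by a two-day horizon.
train = [10, 12, 11, 15, 14]
actual = [16, 13]
predicted = [14, 14]

scores = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "RMSSE": rmsse(train, actual, predicted),
}
rows = lag_features(train, lags=(1, 2))
```

An RMSSE below 1 means the forecast beats the naive one-step benchmark on that series, which makes scores comparable across topics, words, and hashtags with very different volumes. Rows like those returned by `lag_features` are the kind of tabular input a gradient-boosted model such as LightGBM can be trained on.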
References
[1] C. E. Sitanggang, D. A. Firda, R. Ramadhini, J. M. Panjaitan, Sofwan and M. Sholeh, "Studi Literatur: Penggunaan Media Sosial Sebagai Alat Promosi Usaha," Jurnal Ilmiah Ekonomi Dan Bisnis Universitas Multi Data Palembang, vol. 14, pp. 23-29, 2024.
[2] A. D. Ade, M. Rizan and I. Febrilia, "Pengaruh Aktivitas Pemasaran Media Sosial Media Sosial Terhadap Citra Merek, Loyalitas Merek, Dan Niat Beli Ulang Pada Social Commerce Tiktok Shop," Jurnal Masharif al-Syariah: Jurnal Ekonomi dan Perbankan Syariah, vol. 9, no. 4, pp. 2399-2416, 2024.
[3] M. Febiansyah, Jondri and Indwiarti, "Prediksi Retweet Berdasarkan Konten Dan Pengguna Dengan Metode Classifier Selection," Smart Comp: Jurnalnya Orang Pintar Komputer, vol. 14, no. 1, pp. 123-129, 2025.
[4] The Global Statistics, "The Global Statistics," The Data Expert, 12 March 2025. [Online]. Available: https://www.theglobalstatistics.com/indonesia-social-media-statistics. [Accessed 3 April 2025].
[5] H. Kwak, C. Lee, H. Park and S. Moon, "What is Twitter, a Social Network or a News Media?," in WWW '10: Proceedings of the 19th international conference on World wide web, North Carolina, 2010.
[6] P. A. Riyantoko and A. Muhaimin, "A Simple Data Sentiment Analysis using Bjorka phenomenon on Twitter," in 7th International Seminar of Research Month 2022, Surabaya, 2023.
[7] K. Darvidou, "Content Marketing Strategy and Development," Technium Business and Management (TBM), vol. 10, pp. 55-67, 2024.
[8] Y. Wang, J. Callan and B. Zheng, "Should We Use the Sample? Analyzing Datasets Sampled from Twitter’s Stream API," ACM Transactions on the Web (TWEB), vol. 9, no. 3, pp. 13-35, 2015.
[9] S. S. S. Ramesh, C. Raghavaraju, S. L. P and A. T. Navis, "Exploratory Analysis and Predictive Modeling of Social Media Data by Decoding Twitter," Research Square, 2024.
[10] C.-C. Hsu, C.-M. Lee, X.-Y. Hou and C.-H. Tsai, "Gradient Boost Tree Network based on Extensive Feature Analysis for Popularity Prediction of Social Posts," in MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, 2023.
[11] A. Jafar and M. Lee, "Comparative Performance Evaluation of State-of-the-Art Hyperparameter Optimization Frameworks," The Transactions of the Korea Institute of Electrical Engineers, vol. 72, no. 5, pp. 607-619, 2023.
[12] K. Ng and P. Lei, "A Lightweight Method using LightGBM Model with Optuna in MOOCs Dropout Prediction," in ICEMT '22: Proceedings of the 6th International Conference on Education and Multimedia Technology, Guangzhou, 2022.
[13] R.-S. Constantin, A. A. Davidescu and E. M. Manta, "Time Series Forecasting with LightGBM under Data Scarcity: An Application to Romania's Inland Gas Consumption," Proceedings of the International Conference on Business Excellence, vol. 19, no. 1, pp. 1518-1531, 2025.
[14] Y. Chen, X. Xie, Z. Pei, W. Yi, C. Wang, W. Zhang and Z. Ji, "Development of a Time Series E-Commerce Sales Prediction Method for Short-Shelf-Life Products Using GRU-LightGBM," Applied Sciences, vol. 14, no. 2, pp. 1-16, 2024.
[15] M. Thoriqulhaq, M. Idhom and K. M. Hindrayani, "Implementasi Algoritma LightGBM untuk Prediksi Status Gizi Bayi dan Balita di Desa Doko Kabupaten Kediri," J-TETA : Jurnal Teknik Terapan, vol. 4, no. 1, pp. 65-73, 2025.
[16] T. Kee and W. K. Ho, "Optimizing Machine Learning Models for Urban Sciences: A Comparative Analysis of Hyperparameter Tuning Methods," Urban Science, vol. 9, no. 9, pp. 1-24, 2025.
[17] M. Mendonça and Á. Figueira, "Topic Extraction: BERTopic’s Insight into the 117th Congress’s Twitterverse," Informatics, vol. 11, no. 1, pp. 1-34, 2024.
[18] E. Zhu, "BERTopic-Driven Stock Market Predictions: Unraveling Sentiment Insights," arXiv, New York, 2024.
[19] F. Koto, J. H. Lau and T. Baldwin, "IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization," in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, 2021.
[20] Karnisih, Sunarno, Iqbal, Djuniadi and F. S. Pribadi, "Penerapan Algoritma Linear Regression dan Support Vector Regression dalam Prediksi Temperatur Udara di Malang," Techno.COM, vol. 24, no. 1, pp. 218-229, 2025.
[21] R. C. Ganzevoort and J. H. v. Vuuren, "A two-phased cluster-based approach towards ranked forecast-model selection," Machine Learning with Applications, vol. 13, pp. 1-14, 2023.
[22] X. Zhang and W. Cao, "Research on Time Series Forecasting Method Based on Autoregressive Integrated Moving Average Model with Zonotopic Kalman Filter," Sustainability (MDPI), vol. 17, no. 7, pp. 1-18, 2025.
[23] M. d. Groot, M. Aliannejadi and M. R. Haas, "Experiments on Generalizability of BERTopic on Multi-Domain Short Text," arXiv, 2022.
[24] https://github.com/deannisasp/twitter-forecasting-lightgbm-
License
Copyright (c) 2026 Deannisa Syafira Putri, Amri Muhaimin, Mohammad Idhom

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).








