Forecasting Topic, Word, and Hashtag Popularity on X (Twitter) Using LightGBM for Digital Marketing Optimization

Authors

  • Deannisa Syafira Putri, Universitas Pembangunan Nasional "Veteran" Jawa Timur
  • Amri Muhaimin, Universitas Pembangunan Nasional "Veteran" Jawa Timur
  • Mohammad Idhom, Universitas Pembangunan Nasional "Veteran" Jawa Timur

DOI:

https://doi.org/10.30871/jaic.v10i2.12233

Keywords:

BERTopic, LightGBM, Optuna, Time Series Forecasting, Social Media Analytics

Abstract

This study presents a machine learning-based framework for forecasting the popularity of topics, words, and hashtags on X (Twitter) to support data-driven digital marketing optimization. Using one year of Indonesian promotional tweets, BERTopic was applied for topic modeling, while LightGBM, tuned with Optuna, was used to forecast temporal dynamics from engineered time series features. Evaluation with RMSE, MAE, and RMSSE, together with a comparison against an ARIMA baseline, indicates that the proposed LightGBM model performs competitively and consistently better than the baseline. Although word-level spikes remain difficult to predict because of noise and event-driven behavior, the model effectively captures the underlying trend patterns. The proposed approach supports improved campaign timing, content planning, and ROI. The full implementation is publicly available on GitHub, including detailed documentation of the dataset, preprocessing steps, and experimental pipeline, to support reproducibility and further research.
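The three error metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are assumptions: RMSE and MAE in their standard form, and RMSSE in the common formulation that scales the forecast error by the in-sample error of a one-step naive forecast. The function names and the toy numbers are hypothetical and are not taken from the paper or its repository.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmsse(train, actual, predicted):
    """Root mean squared scaled error (assumed formulation).

    The forecast MSE is divided by the MSE of a one-step naive forecast
    (previous value) computed on the training series.
    """
    naive_mse = sum(
        (train[i] - train[i - 1]) ** 2 for i in range(1, len(train))
    ) / (len(train) - 1)
    forecast_mse = sum(
        (a - p) ** 2 for a, p in zip(actual, predicted)
    ) / len(actual)
    return math.sqrt(forecast_mse / naive_mse)


# Toy daily counts for one hashtag (illustrative values only).
train_counts = [10, 12, 11, 15, 14]
actual_counts = [16, 13]
predicted_counts = [14, 14]

print("RMSE :", rmse(actual_counts, predicted_counts))
print("MAE  :", mae(actual_counts, predicted_counts))
print("RMSSE:", rmsse(train_counts, actual_counts, predicted_counts))
```

Under this formulation, an RMSSE below 1 means the forecast has a lower squared error than the naive previous-value forecast had on the training data, which makes the metric comparable across series with very different volumes, such as topics, words, and hashtags.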

References

[1] C. E. Sitanggang, D. A. Firda, R. Ramadhini, J. M. Panjaitan, Sofwan and M. Sholeh, "Studi Literatur: Penggunaan Media Sosial Sebagai Alat Promosi Usaha," Jurnal Ilmiah Ekonomi Dan Bisnis Universitas Multi Data Palembang, vol. 14, pp. 23-29, 2024.

[2] A. D. Ade, M. Rizan and I. Febrilia, "Pengaruh Aktivitas Pemasaran Media Sosial Terhadap Citra Merek, Loyalitas Merek, Dan Niat Beli Ulang Pada Social Commerce Tiktok Shop," Jurnal Masharif al-Syariah: Jurnal Ekonomi dan Perbankan Syariah, vol. 9, no. 4, pp. 2399-2416, 2024.

[3] M. Febiansyah, Jondri and Indwiarti, "Prediksi Retweet Berdasarkan Konten Dan Pengguna Dengan Metode Classifier Selection," Smart Comp: Jurnalnya Orang Pintar Komputer, vol. 14, no. 1, pp. 123-129, 2025.

[4] The Global Statistics, "The Global Statistics," The Data Expert, 12 Maret 2025. [Online]. Available: https://www.theglobalstatistics.com/indonesia-social-media-statistics. [Accessed 3 April 2025].

[5] H. Kwak, C. Lee, H. Park and S. Moon, "What is Twitter, a Social Network or a News Media?," in WWW '10: Proceedings of the 19th international conference on World wide web, North Carolina, 2010.

[6] P. A. Riyantoko and A. Muhaimin, "A Simple Data Sentiment Analysis using Bjorka phenomenon on Twitter," in 7st International Seminar of Research Month 2022, Surabaya, 2023.

[7] K. Darvidou, "Content Marketing Strategy and Development," Technium Business and Management (TBM), vol. 10, pp. 55-67, 2024.

[8] Y. Wang, J. Callan and B. Zheng, "Should We Use the Sample? Analyzing Datasets Sampled from Twitter’s Stream API," ACM Transactions on the Web (TWEB), vol. 9, no. 3, pp. 13-35, 2015.

[9] S. S. S. Ramesh, C. Raghavaraju, S. L. P and A. T. Navis, "Exploratory Analysis and Predictive Modeling of Social Media Data by Decoding Twitter," Research Square, 2024.

[10] C.-C. Hsu, C.-M. Lee, X.-Y. Hou and C.-H. Tsai, "Gradient Boost Tree Network based on Extensive Feature Analysis for Popularity Prediction of Social Posts," in MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, 2023.

[11] A. Jafar and M. Lee, "Comparative Performance Evaluation of State-of-the-Art Hyperparameter Optimization Frameworks," The Transactions of the Korea Institute of Electrical Engineers, vol. 72, no. 5, pp. 607-619, 2023.

[12] K. Ng and P. Lei, "A Lightweight Method using LightGBM Model with Optuna in MOOCs Dropout Prediction," in ICEMT '22: Proceedings of the 6th International Conference on Education and Multimedia Technology, Guangzhou, 2022.

[13] R.-S. Constantin, A. A. Davidescu and E. M. Manta, "Time Series Forecasting with LightGBM under Data Scarcity: An Application to Romania's Inland Gas Consumption," Proceedings of the International Conference on Business Excellence, vol. 19, no. 1, pp. 1518-1531, 2025.

[14] Y. Chen, X. Xie, Z. Pei, W. Yi, C. Wang, W. Zhang and Z. Ji, "Development of a Time Series E-Commerce Sales Prediction Method for Short-Shelf-Life Products Using GRU-LightGBM," Applied Sciences, vol. 14, no. 2, pp. 1-16, 2024.

[15] M. Thoriqulhaq, M. Idhom and K. M. Hindrayani, "Implementasi Algoritma LightGBM untuk Prediksi Status Gizi Bayi dan Balita di Desa Doko Kabupaten Kediri," J-TETA : Jurnal Teknik Terapan, vol. 4, no. 1, pp. 65-73, 2025.

[16] T. Kee and W. K. Ho, "Optimizing Machine Learning Models for Urban Sciences: A Comparative Analysis of Hyperparameter Tuning Methods," Urban Science, vol. 9, no. 9, pp. 1-24, 2025.

[17] M. Mendonça and Á. Figueira, "Topic Extraction: BERTopic’s Insight into the 117th Congress’s Twitterverse," Informatics, vol. 11, no. 1, pp. 1-34, 2024.

[18] E. Zhu, "BERTopic-Driven Stock Market Predictions: Unraveling Sentiment Insights," arXiv, New York, 2024.

[19] F. Koto, J. H. Lau and T. Baldwin, "IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization," in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, 2021.

[20] Karnisih, Sunarno, Iqbal, Djuniadi and F. S. Pribadi, "Penerapan Algoritma Linear Regression dan Support Vector Regression dalam Prediksi Temperatur Udara di Malang," Techno.COM, vol. 24, no. 1, pp. 218-229, 2025.

[21] R. C. Ganzevoort and J. H. v. Vuuren, "A two-phased cluster-based approach towards ranked forecast-model selection," Machine Learning with Applications, vol. 13, pp. 1-14, 2023.

[22] X. Zhang and W. Cao, "Research on Time Series Forecasting Method Based on Autoregressive Integrated Moving Average Model with Zonotopic Kalman Filter," Sustainability (MDPI), vol. 17, no. 7, pp. 1-18, 2025.

[23] M. d. Groot, M. Aliannejadi and M. R. Haas, "Experiments on Generalizability of BERTopic on Multi-Domain Short Text," arXiv, 2022.

[24] https://github.com/deannisasp/twitter-forecasting-lightgbm-

Published

2026-04-22

How to Cite

[1]
D. S. Putri, A. Muhaimin, and M. Idhom, “Forecasting Topic, Word, and Hashtag Popularity on X (Twitter) Using LightGBM for Digital Marketing Optimization”, JAIC, vol. 10, no. 2, pp. 1707–1718, Apr. 2026.

Section

Articles
