An Applied Data Science Approach for Detecting Depression Symptoms in Indonesian Social Media Text Using Transformer Models

Authors

  • Winson Winson, Universitas Bunda Mulia
  • Puguh Hiskiawan, Universitas Bunda Mulia

DOI:

https://doi.org/10.30871/jaic.v10i3.12644

Keywords:

Applied Data Science, Depression Detection, IndoBERT, XLM-RoBERTa, Natural Language Processing

Abstract

Depression is a mental health disorder that often remains undetected due to limited access to mental health services and persistent social stigma. Social media platforms provide an alternative source for identifying depressive symptoms through linguistic expressions shared by users in textual posts. This study proposes an applied data science approach for detecting depression symptoms in Indonesian social media text using Transformer-based models. The dataset was constructed by combining the DEPTWEET dataset with social media posts collected through keyword-based scraping guided by PHQ-9 indicators. The proposed framework consists of dataset construction, text preprocessing, Transformer-based modeling, and performance evaluation. Two pre-trained language models, IndoBERT and XLM-RoBERTa, were evaluated under two preprocessing configurations, namely normal preprocessing and light preprocessing. Experimental results show that preprocessing strategies significantly influence classification performance. Light preprocessing consistently improves contextual representation and leads to better results than normal preprocessing. XLM-RoBERTa combined with light preprocessing achieves the best overall performance, with a test accuracy of 0.77 and an F1-score of 0.77. Additional robustness analysis and pairwise model agreement evaluation further indicate that both models maintain relatively stable predictions when processing noisy social media text. These findings demonstrate the effectiveness of Transformer-based models for multi-class depression detection in Indonesian social media environments. The proposed framework provides insights into how applied data science techniques can support large-scale analysis of mental health signals on online platforms and contribute to the development of data-driven approaches for early detection of depression symptoms.

Published

2026-06-08

How to Cite

W. Winson and P. Hiskiawan, “An Applied Data Science Approach for Detecting Depression Symptoms in Indonesian Social Media Text Using Transformer Models,” JAIC, vol. 10, no. 3, pp. 2115–2127, Jun. 2026.
