Improving Panic Disorder Classification Using SMOTE and Random Forest
Abstract
Panic disorder is a serious anxiety disorder that can significantly impair an individual's mental health. Left undetected, it can disrupt daily life, social relationships, and overall quality of life, so early detection and intervention are essential for managing the disorder and improving the well-being of those affected. Technology supports early detection through data-driven approaches that use algorithms to identify patterns of behavior or symptoms associated with panic disorder, and accurate classification is a prerequisite for effective diagnosis and treatment. However, machine learning models trained on imbalanced datasets, such as those containing panic disorder patients, are prone to overfitting and therefore generalize poorly. This study investigates the effectiveness of the Synthetic Minority Oversampling Technique (SMOTE) in addressing overfitting when a panic disorder dataset is classified with the Random Forest algorithm. The results show that SMOTE substantially improves the classification performance of Random Forest: accuracy rises from 82% without SMOTE to 97% with SMOTE, a gain of 15 percentage points, which the study attributes to reduced overfitting and better generalization to unseen data. The findings underscore the promise of SMOTE as a tool for boosting the performance of machine learning algorithms in classifying panic disorder from imbalanced data.
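The oversampling step that SMOTE performs can be illustrated with a minimal sketch. The abstract does not describe the implementation or the dataset used in the study, so everything below is an assumption made for illustration: the function names `smote` and `balance`, the parameters `k` and `seed`, and the binary-label setting are hypothetical. The sketch follows the general idea of SMOTE (Chawla et al.): each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature tuples (minority class only).
    Hypothetical helper for illustration, not the study's implementation.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have equal size.

    Assumes a binary problem: every label other than minority_label
    is treated as the majority class.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    deficit = max(n_majority - len(minority), 0)
    new_points = smote(minority, deficit, k=k, seed=seed)
    return list(X) + new_points, list(y) + [minority_label] * len(new_points)
```

In a pipeline of the kind the abstract describes, a step like `balance` would normally be applied to the training split only, so that the held-out data used to measure accuracy contains real samples exclusively, and the balanced training set would then be passed to a Random Forest classifier. Library implementations such as `SMOTE` in imbalanced-learn and `RandomForestClassifier` in scikit-learn are common choices for these two steps, though the abstract does not state which tools the study used.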
Copyright (c) 2024 Dini Nurmalasari
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.