Baby Cry Classification Using Ensemble Learning and Whisper Method Comparison

Authors

  • I Putu Yogi Prasetya Dharmawan, Program Studi Teknologi Informasi, Universitas Udayana
  • I Made Agus Dwi Suarjaya, Program Studi Teknologi Informasi, Universitas Udayana
  • Wayan Oger Vihikan, Program Studi Teknologi Informasi, Universitas Udayana

DOI:

https://doi.org/10.30871/jaic.v9i2.9167

Keywords:

Audio Classification, Baby Cry Classification, Ensemble Learning, Whisper Model, Machine Learning

Abstract

Baby cry classification is an important Machine Learning topic, particularly in healthcare, because crying is the primary way infants communicate their needs and conditions. Many inexperienced parents interpret baby cries in a limited way, even though each cry has distinct characteristics that signal a specific need such as hunger, discomfort, sleepiness, flatulence, or abdominal pain. Baby cries can now be identified automatically by AI-based applications, but such implementations remain limited. This study compares two ensemble learning methods, Random Forest and XGBoost, with the Whisper model for classifying baby cries. The Whisper-small model performed best, with a precision of 0.9115 and a recall of 0.9007. XGBoost ranked second, although its performance degraded slightly after hyperparameter optimization, and Random Forest showed the lowest performance of the three models. Transformer-based models such as Whisper-small proved superior to tree-based models at capturing the complex patterns of infant cries. These findings indicate the strong potential of accurate and reliable models to help parents understand infants' needs more effectively, thereby improving the quality of infant care.
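
The abstract reports precision and recall for a multi-class problem, but this page does not state how the per-class scores were averaged. As a minimal sketch, assuming macro averaging (every class counted equally), the two metrics could be computed as follows. The function name `precision_recall` and the toy labels are hypothetical illustrations, not the study's code or data.

```python
from collections import Counter


def precision_recall(y_true, y_pred, average="macro"):
    """Return (precision, recall) averaged over classes.

    average="macro" weights every class equally; average="weighted"
    weights each class by its number of true samples.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)

    # Per-class scores; a class with no predictions (or no samples) scores 0.
    prec = {c: tp[c] / pred_count[c] if pred_count[c] else 0.0 for c in labels}
    rec = {c: tp[c] / true_count[c] if true_count[c] else 0.0 for c in labels}

    if average == "macro":
        weights = {c: 1 / len(labels) for c in labels}
    elif average == "weighted":
        weights = {c: true_count[c] / len(y_true) for c in labels}
    else:
        raise ValueError("average must be 'macro' or 'weighted'")
    return (sum(prec[c] * weights[c] for c in labels),
            sum(rec[c] * weights[c] for c in labels))


# Toy labels only; these are not the study's data.
y_true = ["hunger", "hunger", "discomfort", "sleepiness", "flatulence", "abdominal pain"]
y_pred = ["hunger", "discomfort", "discomfort", "sleepiness", "flatulence", "hunger"]
p, r = precision_recall(y_true, y_pred)
print(f"macro precision={p:.4f}, macro recall={r:.4f}")
```

Passing `average="weighted"` instead weights each class by its sample count, which can differ noticeably from the macro figure when the cry classes are imbalanced.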

Published

2025-03-10

How to Cite

I. P. Y. P. Dharmawan, I. M. A. D. Suarjaya, and W. O. Vihikan, “Baby Cry Classification Using Ensemble Learning and Whisper Method Comparison,” JAIC, vol. 9, no. 2, pp. 273–283, Mar. 2025.
