Comparative Evaluation of MFCC and Mel-spectrogram Features for CNN-Based Respiratory Abnormality Detection
DOI: https://doi.org/10.30871/jaic.v10i2.12355

Keywords: Convolutional Neural Networks (CNNs), Respiratory sounds, MFCC, Mel-spectrogram, Medical diagnosis

Abstract
Automated respiratory sound analysis addresses critical limitations in traditional clinical auscultation, particularly high inter-observer variability and limited specialist access in resource-constrained settings. This study rigorously compares Mel-Frequency Cepstral Coefficients (MFCC) and Mel-spectrogram representations for classifying respiratory abnormalities using convolutional neural networks. Using the ICBHI 2017 dataset (920 recordings, 6,898 cycles from 126 patients), we implemented identical CNN architectures differing only in input features. Class imbalance was addressed through Synthetic Minority Over-sampling Technique applied exclusively to training data. The MFCC model achieved 83% accuracy with superior sensitivity for normal sounds (97% recall), while Mel-spectrograms reached 82% accuracy with higher precision (95%). MFCC demonstrated better crackle detection (76% vs 73% recall) and wheeze precision (75% vs 71%), attributed to enhanced transient spectral capture through discrete cosine transformation. Both models showed strong discrimination (AUC > 0.90). MFCC offers computational efficiency advantages for screening applications, while Mel-spectrograms provide interpretability for diagnostic contexts. This controlled comparison provides evidence-based guidance for computer-aided respiratory diagnostic system design, particularly relevant for resource-limited healthcare environments.
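The two feature types compared above differ in one step: an MFCC is the discrete cosine transform (DCT-II) of the log mel-band energies, which compacts and decorrelates the Mel-spectrogram. As a rough illustration only (a numpy-only sketch; the frame length, hop, mel-band count, and sample rate below are placeholder assumptions, not the authors' parameters), the relationship can be shown as:

```python
import numpy as np

def mel_spectrogram(y, sr=4000, n_fft=256, hop=128, n_mels=20):
    """Power spectrogram projected onto a triangular mel filterbank."""
    frames = np.lib.stride_tricks.sliding_window_view(y, n_fft)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = mel2hz(np.linspace(0.0, hz2mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        fb[i, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)  # rising slope
        fb[i, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)  # falling slope
    return spec @ fb.T                          # shape: (frames, n_mels)

def mfcc(y, sr=4000, n_mfcc=13, **kw):
    """MFCCs: orthonormal DCT-II of the log mel energies."""
    log_mel = np.log(mel_spectrogram(y, sr, **kw) + 1e-10)
    n_bands = log_mel.shape[1]
    k = np.arange(n_bands)[:, None]
    n = np.arange(n_bands)[None, :]
    basis = np.cos(np.pi / n_bands * (n + 0.5) * k)       # DCT-II basis
    scale = np.full(n_bands, np.sqrt(2.0 / n_bands))
    scale[0] = np.sqrt(1.0 / n_bands)
    return ((log_mel @ basis.T) * scale)[:, :n_mfcc]      # keep low-order coeffs
```

Truncating to the low-order DCT coefficients is what gives MFCC its compactness (and hence its computational-efficiency edge noted above), while the untransformed Mel-spectrogram preserves the band-by-band layout that makes it easier to inspect visually. The paper itself used librosa and TensorFlow for this pipeline.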
License
Copyright (c) 2026 Simboni Simboni Tege, Kafunda Katalay Pierre, Oshasha Oshasha Fiston, Sylvestre Frey, Albert Ntumba Nkongolo, Biaba Kuya Jirince

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.