Comparative Analysis of 1D CNN Architectures for Guitar Chord Recognition from Static Hand Landmarks

Authors

  • Rafi Abhista Naya, Universitas Ciputra
  • Evan Tanuwijaya, Universitas Ciputra

DOI:

https://doi.org/10.30871/jaic.v9i6.11339

Keywords:

Guitar Chord Recognition, Hand Landmarks, 1D CNN, MediaPipe, Computer Vision, Music Technology

Abstract

Vision-based guitar chord recognition offers a promising alternative to traditional audio-driven methods, particularly for silent practice, classroom environments, and interactive learning applications. While existing research predominantly relies on full-frame image analysis using 2D convolutional networks, the use of structured hand landmarks remains underexplored despite their advantages in robustness and computational efficiency. This study presents a comprehensive comparative analysis of three one-dimensional convolutional neural network architectures—CNN-1D, ResNet-1D, and Inception-1D—for classifying seven guitar chord types using 63-dimensional static hand-landmark vectors extracted via MediaPipe Hands. The methodology encompasses extensive dataset preprocessing, targeted landmark augmentation, Bayesian hyperparameter optimization, and stratified 5-fold cross-validation. Results show that CNN-1D achieves the highest mean accuracy (97.61%), outperforming both ResNet-1D and Inception-1D, with statistical tests confirming significant improvements over ResNet-1D. Robustness experiments further demonstrate that CNN-1D maintains superior resilience under Gaussian noise, landmark occlusion, and geometric scaling. Additionally, CNN-1D provides the fastest inference and most stable computational performance, making it highly suitable for real-time or mobile deployment. These findings highlight that, for structured and low-dimensional landmark data, simpler convolutional architectures outperform deeper or multi-branch designs, offering an efficient and reliable solution for vision-based guitar chord recognition.
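To make the pipeline described above concrete, the sketch below shows how a 63-dimensional static landmark vector (21 MediaPipe hand landmarks with x, y, z coordinates) can be extracted from a frame, optionally perturbed in the spirit of the robustness experiments, and classified by a plain 1D CNN over seven chord classes. This is a minimal illustration assuming a PyTorch implementation; the layer widths, kernel sizes, chord label set, perturbation parameters, and file name are placeholders, not the authors' exact configuration.

```python
# Minimal sketch of the pipeline in the abstract: MediaPipe Hands landmarks ->
# 63-dim vector -> plain 1D CNN over 7 chord classes, plus the kinds of
# perturbations used in the robustness experiments. Layer widths, kernel sizes,
# labels, and file names are illustrative assumptions only.
import cv2
import mediapipe as mp
import numpy as np
import torch
import torch.nn as nn

CHORDS = ["A", "B", "C", "D", "E", "F", "G"]  # assumed 7-class chord label set


def extract_landmark_vector(image_bgr: np.ndarray) -> np.ndarray | None:
    """Return a 63-dim vector (21 landmarks x (x, y, z)) from one frame, or None."""
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        result = hands.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None  # no hand detected in this frame
    pts = result.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in pts], dtype=np.float32).flatten()


def perturb(vec: np.ndarray, noise_std: float = 0.0, occlude: int = 0,
            scale: float = 1.0, rng: np.random.Generator | None = None) -> np.ndarray:
    """Gaussian noise, random landmark occlusion (zeroing), and geometric scaling."""
    rng = rng or np.random.default_rng()
    out = vec * scale + rng.normal(0.0, noise_std, size=vec.shape)
    if occlude:
        out = out.reshape(21, 3)
        out[rng.choice(21, size=occlude, replace=False)] = 0.0  # drop whole landmarks
        out = out.flatten()
    return out.astype(np.float32)


class ChordCNN1D(nn.Module):
    """Plain 1D CNN over the 63-dim landmark vector (illustrative widths)."""

    def __init__(self, n_classes: int = len(CHORDS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # keeps the head independent of input length
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 63) -> (batch, 1, 63) so Conv1d sees a single input channel
        return self.classifier(self.features(x.unsqueeze(1)).squeeze(-1))


if __name__ == "__main__":
    vec = extract_landmark_vector(cv2.imread("chord_frame.jpg"))  # hypothetical image
    if vec is not None:
        noisy = perturb(vec, noise_std=0.01)  # robustness-style perturbation
        logits = ChordCNN1D()(torch.from_numpy(noisy).unsqueeze(0))
        print("predicted chord:", CHORDS[int(logits.argmax(dim=1))])
```

The shallow stack and adaptive pooling keep the parameter count and per-sample compute small, which is consistent with the abstract's point that a simple CNN-1D over low-dimensional landmark input is fast enough for real-time or mobile deployment.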




Published

2025-12-17

How to Cite

[1] R. A. Naya and E. Tanuwijaya, “Comparative Analysis of 1D CNN Architectures for Guitar Chord Recognition from Static Hand Landmarks”, JAIC, vol. 9, no. 6, pp. 3904–3918, Dec. 2025.
