Comparative Analysis of 1D CNN Architectures for Guitar Chord Recognition from Static Hand Landmarks
DOI: https://doi.org/10.30871/jaic.v9i6.11339

Keywords: Guitar Chord Recognition, Hand Landmarks, 1D CNN, MediaPipe, Computer Vision, Music Technology

Abstract
Vision-based guitar chord recognition offers a promising alternative to traditional audio-driven methods, particularly for silent practice, classroom environments, and interactive learning applications. While existing research predominantly relies on full-frame image analysis using 2D convolutional networks, the use of structured hand landmarks remains underexplored despite their advantages in robustness and computational efficiency. This study presents a comprehensive comparative analysis of three one-dimensional convolutional neural network architectures—CNN-1D, ResNet-1D, and Inception-1D—for classifying seven guitar chord types using 63-dimensional static hand-landmark vectors extracted via MediaPipe Hands. The methodology encompasses extensive dataset preprocessing, targeted landmark augmentation, Bayesian hyperparameter optimization, and stratified 5-fold cross-validation. Results show that CNN-1D achieves the highest mean accuracy (97.61%), outperforming both ResNet-1D and Inception-1D, with statistical tests confirming significant improvements over ResNet-1D. Robustness experiments further demonstrate that CNN-1D maintains superior resilience under Gaussian noise, landmark occlusion, and geometric scaling. Additionally, CNN-1D provides the fastest inference and most stable computational performance, making it highly suitable for real-time or mobile deployment. These findings highlight that, for structured and low-dimensional landmark data, simpler convolutional architectures outperform deeper or multi-branch designs, offering an efficient and reliable solution for vision-based guitar chord recognition.
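The pipeline the abstract describes — MediaPipe's 21 hand landmarks flattened into a 63-dimensional vector and classified by a 1D CNN — can be sketched in miniature as follows. This is an illustrative NumPy-only sketch, not the authors' implementation: the wrist-relative normalization, kernel size, and number of filters are assumptions chosen for clarity, and a single convolution-plus-ReLU stands in for the full CNN-1D stack.

```python
import numpy as np

def landmarks_to_vector(landmarks):
    """Flatten 21 (x, y, z) hand landmarks into a 63-dim feature vector,
    translated so the wrist (landmark 0) sits at the origin."""
    pts = np.asarray(landmarks, dtype=np.float32).reshape(21, 3)
    pts = pts - pts[0]          # wrist-relative coordinates (assumed scheme)
    return pts.reshape(-1)      # shape (63,)

def conv1d(x, kernels, stride=1):
    """Valid-mode 1D convolution of a single-channel signal with several
    kernels, followed by ReLU — the basic building block of a CNN-1D."""
    k = kernels.shape[1]
    n = (x.shape[0] - k) // stride + 1
    out = np.empty((kernels.shape[0], n), dtype=np.float32)
    for i in range(n):
        window = x[i * stride : i * stride + k]
        out[:, i] = kernels @ window
    return np.maximum(out, 0.0)  # ReLU activation

# Toy example: random landmarks -> 63-dim vector -> 8 feature maps.
rng = np.random.default_rng(0)
vec = landmarks_to_vector(rng.random((21, 3)))
feats = conv1d(vec, rng.standard_normal((8, 5)))
print(vec.shape, feats.shape)   # (63,) and (8, 59)
```

In the full system, several such convolutional layers would be followed by pooling and a 7-way softmax over the chord classes; the 63-dimensional input is small enough that this forward pass stays cheap, which is consistent with the paper's real-time deployment claim.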
License
Copyright (c) 2025 Rafi Abhista Naya, Evan Tanuwijaya

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) ) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).