A Hybrid Approach to Music Recommendations Based on Audio Similarity Using Autoencoder and LightGBM
DOI:
https://doi.org/10.30871/jaic.v9i6.10516
Keywords:
Music Recommendation System, Audio Feature, Autoencoder, PCA, LightGBM
Abstract
Music recommendation systems help users navigate large music collections by suggesting songs aligned with their preferences. However, conventional methods often overlook the depth of audio content, limiting personalization and accuracy. This study proposes a hybrid approach that uses PCA and Autoencoder to extract audio embeddings. These embeddings are processed using K-Nearest Neighbors to find similar tracks, followed by a reranking step with LightGBM based on predicted relevance. The system achieved strong results: 98% accuracy, 0.96 precision, 0.96 recall, and 0.96 F1-score for the Similar class, with 0.99 precision and recall for Not Similar. Cross-validation confirmed model robustness, with an average accuracy of 97.99%, precision of 0.9577, recall of 0.9624, and F1-score of 0.9600, all with low standard deviations. These outcomes show that combining deep audio features with machine learning ranking enhances recommendation quality. Future improvements may involve incorporating metadata and genre-based visualizations for more diverse and interpretable results.
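The retrieve-then-rerank flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional toy vectors stand in for the PCA and Autoencoder embeddings, the `toy_scores` table stands in for the relevance that a trained LightGBM model would predict, and the function names `knn_candidates` and `rerank` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_candidates(embeddings, query_id, k):
    """Return the k track ids closest to query_id in embedding space."""
    query = embeddings[query_id]
    distances = [
        (euclidean(query, vec), track_id)
        for track_id, vec in embeddings.items()
        if track_id != query_id  # never recommend the query track itself
    ]
    distances.sort()
    return [track_id for _, track_id in distances[:k]]


def rerank(candidates, relevance):
    """Order candidates by descending predicted relevance."""
    return sorted(candidates, key=relevance, reverse=True)


# Toy embeddings (assumption: real ones would come from PCA + Autoencoder).
embeddings = {
    "track_a": [0.0, 0.0],
    "track_b": [0.1, 0.0],
    "track_c": [0.0, 0.3],
    "track_d": [5.0, 5.0],
}

# Hypothetical relevance scores standing in for a LightGBM prediction.
toy_scores = {"track_b": 0.4, "track_c": 0.9, "track_d": 0.1}

# Stage 1: nearest-neighbour retrieval. Stage 2: rerank by relevance.
candidates = knn_candidates(embeddings, "track_a", k=2)
recommendations = rerank(candidates, toy_scores.get)
print(recommendations)
```

In this toy run, retrieval narrows the catalogue to the two nearest tracks, and the reranker can then reorder them, which is why the final order may differ from the pure distance order.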
License
Copyright (c) 2025 Winda Ardelia Aristawidya, Majid Rahardi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).








