Z-Score Based Initialization for K-Medoids Clustering: Application on QSAR Toxicity Data
DOI: https://doi.org/10.30871/jaic.v9i5.10448

Keywords: K-Medoids, Medoid Initialization, Z-Score, QSAR, Clustering Evaluation

Abstract
The efficiency of clustering algorithms depends significantly on initialization quality, especially when unsupervised learning is applied to complex datasets. This study introduces an enhanced K-Medoids clustering approach that uses Z-Score-based medoid initialization to improve convergence speed and cluster validity. The method was evaluated on the QSAR Fish Toxicity dataset, which consists of 908 instances and seven numerical features. Initial medoids were selected based on standardized Z-Score values, reducing convergence from an average of six iterations to just two. Clustering performance was assessed with three internal validation metrics: the Davies-Bouldin Index (DBI), the Silhouette Coefficient (SC), and the Calinski-Harabasz Index (CHI). The DBI decreased from 1.7328 to 0.8768, indicating improved cluster compactness and separation. In parallel, the SC increased from 0.327 to 0.619 and the CHI rose from 214.75 to 562.43, confirming more coherent and well-separated clusters. These results demonstrate that Z-Score-based initialization substantially improves the robustness of K-Medoids, offering a simple yet effective strategy for unsupervised partitioning, particularly in toxicological and biochemical data analysis.
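To make the approach concrete, the sketch below standardizes each feature with Z-Scores, selects initial medoids from the standardized values, runs a plain alternating K-Medoids loop, and reports DBI, SC, and CHI with scikit-learn. This is a minimal sketch under stated assumptions, not the paper's exact procedure: the selection rule (points nearest evenly spaced quantiles of a per-instance mean Z-Score), the cluster count k = 3, the synthetic stand-in data, and the helper names zscore_init_medoids and k_medoids are all illustrative.

```python
# Minimal sketch of Z-Score-based medoid initialization for K-Medoids.
# The quantile-based selection rule, k = 3, and the random stand-in data
# are assumptions for illustration, not the paper's exact procedure.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics import (
    davies_bouldin_score,
    silhouette_score,
    calinski_harabasz_score,
)


def zscore_init_medoids(X, k):
    """Pick k initial medoid indices from Z-Score-standardized data (illustrative rule)."""
    z = (X - X.mean(axis=0)) / X.std(axis=0)       # Z-Score each feature
    score = z.mean(axis=1)                         # one summary Z-Score per instance
    targets = np.quantile(score, np.linspace(0.1, 0.9, k))
    # take the instance whose summary score lies closest to each target quantile
    return np.array([np.abs(score - t).argmin() for t in targets])


def k_medoids(X, medoid_idx, max_iter=100):
    """Plain alternating K-Medoids: assign points, then re-pick each cluster's medoid."""
    dist = cdist(X, X)                             # pairwise Euclidean distances
    for _ in range(max_iter):
        labels = dist[:, medoid_idx].argmin(axis=1)          # nearest medoid
        new_idx = medoid_idx.copy()
        for c in range(len(medoid_idx)):
            members = np.where(labels == c)[0]
            # medoid = member with the smallest total distance to its own cluster
            new_idx[c] = members[dist[np.ix_(members, members)].sum(axis=1).argmin()]
        if np.array_equal(new_idx, medoid_idx):    # medoids stable -> converged
            break
        medoid_idx = new_idx
    return labels


# Synthetic stand-in for the QSAR Fish Toxicity features (908 rows, 7 columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(908, 7))
k = 3  # assumed cluster count; the abstract does not state it

labels = k_medoids(X, zscore_init_medoids(X, k))
print("DBI:", davies_bouldin_score(X, labels))
print("Silhouette:", silhouette_score(X, labels))
print("CHI:", calinski_harabasz_score(X, labels))
```

On the real dataset the initial medoids would be drawn from the seven standardized molecular descriptors rather than random numbers; the three metrics come directly from scikit-learn, so the before/after comparison reported in the abstract can be reproduced by swapping this Z-Score-based initialization against a random one.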
License
Copyright (c) 2025 Nurdin Nurdin, Nova Amalia, Fajriana Fajriana

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.