Z-Score Based Initialization for K-Medoids Clustering: Application on QSAR Toxicity Data

Authors

  • Nurdin Nurdin, Universitas Malikussaleh
  • Nova Amalia, Universitas Malikussaleh
  • Fajriana Fajriana, Universitas Malikussaleh

DOI:

https://doi.org/10.30871/jaic.v9i5.10448

Keywords:

K-Medoids, Medoid Initialization, Z-Score, QSAR, Clustering Evaluation

Abstract

The performance of clustering algorithms depends strongly on initialization quality, especially in unsupervised learning applied to complex datasets. This study introduces an enhanced K-Medoids clustering approach that uses Z-Score-based medoid initialization to improve convergence speed and cluster validity. The method was evaluated on the QSAR Fish Toxicity dataset, which consists of 908 instances and seven numerical features. Selecting the initial medoids from standardized Z-Score values reduced the average number of iterations to convergence from 6 to 2. Clustering quality was assessed with three internal validation metrics: the Davies-Bouldin Index (DBI), the Silhouette Coefficient (SC), and the Calinski-Harabasz Index (CHI). The DBI decreased from 1.7328 to 0.8768 (lower is better), indicating improved cluster compactness and separation. In parallel, the SC increased from 0.327 to 0.619 and the CHI rose from 214.75 to 562.43 (higher is better for both), confirming more coherent and well-separated clusters. These results indicate that Z-Score-based initialization lets K-Medoids converge in fewer iterations and yield better-separated clusters, offering a simple yet effective strategy for unsupervised partitioning, particularly in toxicological and biochemical data analysis.
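
The abstract does not spell out how the Z-Score values are turned into initial medoids, so the following Python sketch is only one plausible reading, not the authors' procedure. It assumes that each feature is standardized, that points are ranked by their mean Z-Score, and that k medoids are taken at evenly spaced ranks; the function names (zscore_columns, zscore_init, k_medoids) and the toy data are hypothetical and are not drawn from the QSAR Fish Toxicity dataset. The iteration count returned by k_medoids corresponds to the convergence measure reported above.

```python
import math
import statistics


def zscore_columns(rows):
    """Standardize each feature column to mean 0 and population std 1."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def zscore_init(z_rows, k):
    """ASSUMED rule: rank points by mean z-score, pick k evenly spaced ranks."""
    order = sorted(range(len(z_rows)), key=lambda i: statistics.fmean(z_rows[i]))
    # Midpoint of each of k equal-sized slices of the ranking.
    return [order[(2 * j + 1) * len(order) // (2 * k)] for j in range(k)]


def k_medoids(z_rows, medoids, max_iter=100):
    """Alternating K-Medoids; returns (medoid indices, labels, iterations)."""
    labels = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest medoid.
        labels = [
            min(range(len(medoids)), key=lambda c: math.dist(p, z_rows[medoids[c]]))
            for p in z_rows
        ]
        # Update step: the member with the smallest total distance to the
        # other members of its cluster becomes the new medoid.
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # empty cluster: keep the previous medoid
                new_medoids.append(old)
                continue
            new_medoids.append(min(
                members,
                key=lambda i: sum(math.dist(z_rows[i], z_rows[j]) for j in members),
            ))
        if new_medoids == medoids:  # converged: medoids did not move
            return medoids, labels, iteration
        medoids = new_medoids
    return medoids, labels, max_iter


# Toy data (hypothetical): two well-separated groups in two dimensions.
data = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.1], [8.0, 9.0], [8.2, 9.1], [7.9, 8.8]]
z = zscore_columns(data)
init = zscore_init(z, k=2)
medoids, labels, iterations = k_medoids(z, init)
```

Because the assumed ranking spreads the starting medoids across the range of standardized values, each one tends to start inside a different dense region, which is one way an initialization of this kind could shorten the run compared with random starts.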

Published

2025-10-08

How to Cite

[1]
N. Nurdin, N. Amalia, and F. Fajriana, “Z-Score Based Initialization for K-Medoids Clustering: Application on QSAR Toxicity Data”, JAIC, vol. 9, no. 5, pp. 2410–2417, Oct. 2025.
