Household Clustering in West Java Based on Stunting Risk Factors Using K-Modes and K-Prototypes Algorithms

Muhammad Yusran; Siti Nuradilla; Mega Ramatika Putri; Anwar Fitrianto; Rachmat Bintang Yudhianto

doi:10.30871/jaic.v9i6.11508

Authors

Muhammad Yusran, IPB University
Siti Nuradilla, IPB University
Mega Ramatika Putri, IPB University
Anwar Fitrianto, IPB University
Rachmat Bintang Yudhianto, IPB University

Keywords:

Stunting, Clustering, K-Modes, K-Prototypes

Abstract

Stunting remains one of Indonesia’s most persistent public health challenges, with West Java contributing the highest number of cases due to its large population and regional disparities in household welfare. Identifying household groups vulnerable to stunting is essential for designing targeted interventions that integrate nutrition, sanitation, and socio-economic development. This study introduces a data-driven clustering framework using the K-Modes and K-Prototypes algorithms to group 22,161 households in West Java based on 26 indicators from the March 2024 National Socioeconomic Survey (SUSENAS), encompassing food security, sanitation, drinking water access, economic conditions, social assistance, and demographics. The K-Modes algorithm was applied to categorical data, while K-Prototypes integrated numerical and categorical variables, with parameter optimization performed using a grid search and the Elbow method. Clustering performance was evaluated through the Silhouette Score, Calinski–Harabasz (CH) Index, and Davies–Bouldin Index (DBI), followed by a bootstrapped stability analysis employing the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI). Results show that K-Prototypes outperformed K-Modes, yielding a higher Silhouette Score (0.6681 compared to 0.2922), a higher CH Index (13,890.6 compared to 3,976.1), and a lower DBI (0.4607 compared to 1.5274), indicating superior compactness and separation. Stability testing confirmed strong robustness, with mean ARI = 0.959 and mean NMI = 0.932 across 50 bootstrap replications. The optimal five-cluster structure identified distinct socioeconomic groups, with the highest stunting risk found among households with low income, limited housing space, inadequate sanitation, and more children under five. The findings highlight the effectiveness of K-Prototypes in modeling mixed-type data and support the design of evidence-based, regionally adaptive stunting reduction strategies aligned with Presidential Regulation No. 72/2021 on the Acceleration of Stunting Reduction.
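
As a minimal illustration of the two ideas the abstract relies on, the sketch below shows (a) the simple-matching dissimilarity commonly used by K-Modes and the mixed numeric-plus-categorical dissimilarity commonly used by K-Prototypes, and (b) the Adjusted Rand Index used to compare two partitions in a stability analysis. It is not the authors' code: the function names, the `gamma` weight value, and the toy household records are assumptions made purely for illustration, and the study's actual preprocessing and parameter choices are not reproduced here.

```python
from collections import Counter
from math import comb


def matching_dissimilarity(cat_a, cat_b):
    """K-Modes style distance: count of categorical attributes that differ."""
    return sum(x != y for x, y in zip(cat_a, cat_b))


def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """K-Prototypes style distance: squared Euclidean distance on the numeric
    part plus gamma times the number of categorical mismatches.
    gamma is an illustrative weight, not a value taken from the study."""
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    return numeric_part + gamma * matching_dissimilarity(cat_a, cat_b)


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items
    (assumes at least two items)."""
    n = len(labels_a)
    # Pairs of items placed together in both partitions.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:  # degenerate case: both partitions are trivial
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Hypothetical toy records: (scaled numeric features, categorical features).
household_1 = ([1.0, 2.0], ["piped_water", "own_toilet"])
household_2 = ([1.0, 4.0], ["piped_water", "shared_toilet"])

d_cat = matching_dissimilarity(household_1[1], household_2[1])
d_mixed = mixed_dissimilarity(*household_1, *household_2, gamma=0.5)
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])  # same grouping, relabeled
```

In a bootstrap stability check of the kind described, a score like `adjusted_rand_index` would be computed between the reference partition and the partition obtained on each resample, then averaged over the replications; values close to 1 indicate that cluster membership is largely preserved.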
