Data Streaming Pipeline Model Using DBSTREAM-Based Online Machine Learning for E-Commerce User Segmentation

Muhammad Adin Musababa; Muhammad Fachrie

doi:10.30871/jaic.v9i6.11522

Authors

Muhammad Adin Musababa Universitas Teknologi Yogyakarta
Muhammad Fachrie Universitas Teknologi Yogyakarta

DOI:

https://doi.org/10.30871/jaic.v9i6.11522

Keywords:

Data Streaming, DBSTREAM, Online Machine Learning, Pipeline, Segmentation

Abstract

The rapid development of information technology has driven major transformations in digital business, particularly e-commerce. E-commerce consumers differ in their characteristics, behaviors, and needs, and analyzing each consumer's behavior manually is impractical, so an automated system is needed to identify behavior patterns adaptively. However, most customer segmentation approaches still rely on batch learning over static data and therefore cannot adapt quickly to changes in user behavior. This study designs a streaming data pipeline based on Online Machine Learning (OML) that integrates the Density-Based Clustering for Data Streams (DBSTREAM) algorithm to produce adaptive e-commerce user segmentation. The system was developed in Python, with RabbitMQ as a real-time data stream simulator, MongoDB for storing results, and Streamlit as the visualization interface. Clustering was performed incrementally with DBSTREAM and then stabilized with Hierarchical Agglomerative Clustering (HAC) to avoid over-segmentation. Evaluation using the Silhouette Coefficient and the Davies-Bouldin Index (DBI) shows that the model performs best with a cluster threshold between 0.6 and 0.8 and a fading factor of 0.0005 or smaller, such as 0.0003. The evaluation yielded a Silhouette value of -0.1125 and a DBI of 0.2796. These results indicate that integrating DBSTREAM into an OML pipeline can segment consumer behavior efficiently and adapt to continuous, real-time changes in streaming data.
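For readers unfamiliar with the two evaluation metrics named in the abstract, the following is a minimal pure-Python sketch of how the Silhouette Coefficient and the Davies-Bouldin Index can be computed from labeled points. It is not the authors' implementation: the function names, the toy points, and both label assignments are hypothetical and chosen only for illustration. The Silhouette Coefficient lies between -1 and 1 (higher is better, and a negative value means points are on average closer to a neighboring cluster than to their own), while the DBI is non-negative (lower is better).

```python
import math
from collections import defaultdict


def group_by_label(points, labels):
    """Collect points into a dict keyed by cluster label."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    return clusters


def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    clusters = group_by_label(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def davies_bouldin(points, labels):
    """Davies-Bouldin Index; assumes at least two distinct centroids."""
    clusters = group_by_label(points, labels)
    centroids = {
        key: tuple(sum(axis) / len(members) for axis in zip(*members))
        for key, members in clusters.items()
    }
    # scatter: mean distance of a cluster's members to its centroid
    scatter = {
        key: sum(math.dist(p, centroids[key]) for p in members) / len(members)
        for key, members in clusters.items()
    }
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst_ratios) / len(worst_ratios)


# Hypothetical toy data: two tight groups of three 2-D points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
separated = [0, 0, 0, 1, 1, 1]  # labels that match the two groups
mixed = [0, 1, 0, 1, 0, 1]      # labels that cut across the groups

for name, labels in (("separated", separated), ("mixed", mixed)):
    print(name, silhouette(points, labels), davies_bouldin(points, labels))
```

On this toy data, the labeling that matches the two groups gives a silhouette close to 1 and a small DBI, while the labeling that cuts across them gives a slightly negative silhouette and a much larger DBI. In a streaming setting, such metrics would typically be computed on a recent window of points rather than the full history, which is an assumption here and not a detail stated in the abstract.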
