Data Streaming Pipeline Model Using DBSTREAM-Based Online Machine Learning for E-Commerce User Segmentation
DOI:
https://doi.org/10.30871/jaic.v9i6.11522
Keywords:
Data Streaming, DBSTREAM, Online Machine Learning, Pipeline, Segmentation
Abstract
The rapid development of information technology has driven major transformations in the digital business sector, particularly e-commerce. E-commerce shoppers differ in their characteristics, behaviors, and needs, and analyzing each consumer's behavior manually is impractical, so an automated system is needed that can identify behavior patterns adaptively. Most customer segmentation approaches, however, still rely on batch learning over static data and therefore cannot adapt quickly to changes in user behavior. This study designs a streaming data pipeline based on Online Machine Learning (OML), integrated with the Density-Based Clustering for Data Streams (DBSTREAM) algorithm, to produce adaptive e-commerce user segmentation. The system was developed in Python, with RabbitMQ as a real-time data stream simulator, MongoDB for storing results, and Streamlit as the visualization interface. Clustering was performed incrementally with DBSTREAM and then stabilized with Hierarchical Agglomerative Clustering (HAC) to avoid over-segmentation. Evaluation with the Silhouette Coefficient and the Davies-Bouldin Index (DBI) shows that the model performs best with a clustering threshold between 0.6 and 0.8 and a fading factor of 0.0005 or smaller, such as 0.0003. The evaluation yielded a Silhouette value of -0.1125 and a DBI of 0.2796. These results indicate that integrating OML with DBSTREAM can segment consumer behavior efficiently and adapt to continuous, real-time changes in streaming data.
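To help read the two evaluation metrics named in the abstract, the following is a minimal pure-Python sketch of the standard Silhouette Coefficient and Davies-Bouldin Index definitions on a toy data set. It is not the authors' implementation, which is not shown on this page; the function names (`group_by_label`, `silhouette`, `davies_bouldin`) and the four sample points are hypothetical and chosen only for illustration. The sketch assumes Euclidean distance and at least two clusters.

```python
import math

def group_by_label(points, labels):
    """Collect points into a dict keyed by cluster label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters

def silhouette(points, labels):
    """Mean silhouette over all points. Range [-1, 1]; higher is better."""
    clusters = group_by_label(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so divide by len - 1).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def davies_bouldin(points, labels):
    """Davies-Bouldin Index. Always >= 0; lower is better."""
    clusters = group_by_label(points, labels)
    centroids = {
        key: tuple(sum(axis) / len(members) for axis in zip(*members))
        for key, members in clusters.items()
    }
    # Scatter: mean distance from each member to its cluster centroid.
    scatter = {
        key: sum(math.dist(p, centroids[key]) for p in members) / len(members)
        for key, members in clusters.items()
    }
    # For each cluster, take its worst (largest) similarity ratio to another.
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)

# Hypothetical toy data: two tight, well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]  # labels that match the visible groups
bad_labels = [0, 1, 0, 1]   # labels that cut across the groups

for name, labels in (("good", good_labels), ("bad", bad_labels)):
    print(name, silhouette(points, labels), davies_bouldin(points, labels))
```

On this toy data the matching labels give a silhouette close to 1 and a small DBI, while the mismatched labels give a negative silhouette and a much larger DBI. In general, a negative silhouette means points lie, on average, closer to a neighboring cluster than to their own, whereas a low DBI means compact clusters with well-separated centroids, so the two metrics can disagree on the same clustering.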
License
Copyright (c) 2025 Muhammad Adin Musababa, Muhammad Fachrie

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.