From Web Extraction to Collaborative Filtering: An End-to-End Architecture for Reliable Recommendation Systems

Lamanabwe Epus Hervé; Blaise Muhala Luhepa; Herman MATONDO MANANGA; Dieuleveut Nianga Kaya-Kaya; Benjamin Consolant Majegeza

doi:10.30871/jaic.v10i3.12592

Authors

Lamanabwe Epus Hervé University of Kinshasa
Blaise Muhala Luhepa University of Kinshasa
Herman MATONDO MANANGA University of Kinshasa
Dieuleveut Nianga Kaya-Kaya University of Kinshasa
Benjamin Consolant Majegeza University of Kinshasa


Keywords:

Recommender Systems, Data Pipeline Quality, Web Data Extraction, Collaborative Filtering, Hybrid Recommendation

Abstract

The growth of digital platforms has generated large volumes of Web-derived interaction data, but these data are often noisy, duplicated, incomplete, and temporally unstable. Recommendation quality therefore depends not only on the ranking model but also on how extraction, validation, and temporal control are integrated upstream. This paper presents an end-to-end architecture in which Web extraction, schema normalization, cleaning, deduplication, anomaly quarantine, recency-aware processing, and recommendation generation are treated as a single operational pipeline. The contribution is not hybrid recommendation itself, which is already common, but the explicit integration of these quality-control stages with temporally valid offline evaluation and system-level monitoring. Four recommendation strategies are studied within the same pipeline: global popularity, recency-weighted popularity, implicit matrix factorization, and a hybrid method that combines collaborative filtering with a recency-based fallback for cold-start users with sparse histories. Experiments are conducted on a realistic e-commerce dataset of approximately 50,000 users, 18,000 items, and 1.2 million interactions under a strict chronological 80/20 train/test split. Evaluation covers Precision@K, Recall@K, NDCG@K, Coverage@K, a sparse-user cold-start analysis, and system indicators. Under this protocol, the hybrid approach achieves the best observed aggregate ranking performance, improves robustness for sparse users (Recall@10 = 0.158), maintains broad catalog coverage (38.9%), and remains operationally stable under the tested conditions (p95 latency = 48 ms; uptime = 99.7%). These findings support assessing recommendation quality as a property of the full data-to-recommendation pipeline rather than of the ranking algorithm alone.
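The evaluation protocol summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the split granularity, the recency-decay form, the sparse-user threshold, or how relevance is defined, so the sketch assumes a global chronological split over (user_id, item_id, timestamp) tuples, exponential decay with a half-life, a hypothetical `min_history` threshold for routing users to the fallback, and binary relevance (an item counts as relevant if the user interacts with it in the test period). The collaborative-filtering component is left as a placeholder callable (`cf_recommend`) rather than an implicit matrix factorization model, and every function name below is hypothetical.

```python
import math
from collections import defaultdict

# Illustrative sketch only. Interactions are assumed to be (user_id, item_id, timestamp)
# tuples; the global split, decay form and thresholds are assumptions, not the paper's.


def chronological_split(interactions, train_frac=0.8):
    """Sort by timestamp; the earliest train_frac of events train, the rest test."""
    events = sorted(interactions, key=lambda e: e[2])
    cut = int(len(events) * train_frac)
    return events[:cut], events[cut:]


def group_items(events):
    """Map each user to the set of items they interacted with."""
    by_user = defaultdict(set)
    for user, item, _ in events:
        by_user[user].add(item)
    return by_user


def recency_weighted_popularity(train, now, half_life):
    """Rank items by decayed counts: each event weighs 0.5 ** (age / half_life)."""
    scores = defaultdict(float)
    for _, item, ts in train:
        scores[item] += 0.5 ** ((now - ts) / half_life)
    # Highest score first; ties broken by item id for a deterministic order.
    return sorted(scores, key=lambda i: (-scores[i], i))


def hybrid_recommend(user, seen, cf_recommend, fallback_ranking, k, min_history=5):
    """Users with at least min_history training items get CF output; sparse users
    get the recency-weighted ranking minus items they have already seen.
    cf_recommend and min_history are hypothetical placeholders."""
    history = seen.get(user, set())
    if len(history) >= min_history:
        return cf_recommend(user, k)
    return [i for i in fallback_ranking if i not in history][:k]


def precision_recall_ndcg_at_k(ranked, relevant, k):
    """Binary-relevance Precision@K, Recall@K and NDCG@K for a single user."""
    hits = [1 if item in relevant else 0 for item in ranked[:k]]
    precision = sum(hits) / k
    recall = sum(hits) / len(relevant) if relevant else 0.0
    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    idcg = sum(1 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return precision, recall, (dcg / idcg if idcg > 0 else 0.0)


def coverage_at_k(rec_lists, catalog_size, k):
    """Fraction of the catalog that appears in at least one user's top-K list."""
    recommended = set()
    for recs in rec_lists.values():
        recommended.update(recs[:k])
    return len(recommended) / catalog_size


if __name__ == "__main__":
    data = [("u1", "a", 1), ("u1", "b", 2), ("u2", "a", 3), ("u2", "c", 4),
            ("u3", "a", 5), ("u1", "c", 6), ("u3", "b", 7), ("u2", "b", 8),
            ("u3", "c", 9), ("u1", "d", 10)]
    train, test = chronological_split(data)
    seen, truth = group_items(train), group_items(test)
    ranking = recency_weighted_popularity(train, now=8, half_life=4.0)
    # No CF model here: a stub returning nothing stands in for implicit MF.
    recs = {u: hybrid_recommend(u, seen, lambda user, k: [], ranking, k=2) for u in truth}
    for user, items in recs.items():
        print(user, items, precision_recall_ndcg_at_k(items, truth[user], k=2))
    print("Coverage@2:", coverage_at_k(recs, catalog_size=4, k=2))
```

With a single global cut point, every training event occurs no later than every test event, which avoids the temporal leakage that random splits can introduce; a per-user chronological split is a common alternative with different trade-offs.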


