From Web Extraction to Collaborative Filtering: An End-to-End Architecture for Reliable Recommendation Systems
DOI:
https://doi.org/10.30871/jaic.v10i3.12592
Keywords:
Recommender Systems, Data Pipeline Quality, Web Data Extraction, Collaborative Filtering, Hybrid Recommendation
Abstract
The growth of digital platforms has generated large volumes of Web-derived interaction data, but these data are often noisy, duplicated, incomplete, and temporally unstable. Recommendation quality therefore depends not only on the ranking model, but also on how extraction, validation, and temporal control are integrated upstream. This paper presents an end-to-end architecture in which Web extraction, schema normalization, cleaning, deduplication, anomaly quarantine, recency-aware processing, and recommendation generation are treated as a single operational pipeline. The contribution is not the use of hybrid recommendation alone, which is already common, but the explicit integration of these quality-control stages with temporally valid offline evaluation and system-level monitoring. Four recommendation strategies are studied within the same pipeline: global popularity, recency-weighted popularity, implicit matrix factorization, and a hybrid method that combines collaborative filtering with a recency-based fallback for sparse-user cold-start situations. Experiments are conducted on a realistic e-commerce dataset comprising approximately 50,000 users, 18,000 items, and 1.2 million interactions under a strict chronological 80/20 split. Evaluation includes Precision@K, Recall@K, NDCG@K, Coverage@K, sparse-user cold-start analysis, and system indicators. Results indicate that the hybrid approach achieves the best observed aggregate ranking performance under the present protocol, improves sparse-user robustness (Recall@10 = 0.158), maintains broad catalog coverage (38.9%), and remains operationally stable under the tested evaluation conditions (p95 latency = 48 ms; uptime = 99.7%). These findings support assessing recommendation quality as a property of the full data-to-recommendation pipeline rather than of the ranking algorithm alone.
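The evaluation protocol summarized in the abstract (a strict chronological 80/20 split, top-K ranking metrics, and a recency-based fallback for sparse users) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `(user_id, item_id, timestamp)` tuple layout, the binary-relevance form of NDCG, and the `min_history` threshold are all assumptions introduced here for illustration, since the abstract does not specify them.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Assumed record layout (hypothetical): (user_id, item_id, timestamp)
Interaction = Tuple[str, str, float]


def chronological_split(
    interactions: Sequence[Interaction], train_fraction: float = 0.8
) -> Tuple[List[Interaction], List[Interaction]]:
    """Sort by timestamp and cut once, so no test event precedes a train event."""
    ordered = sorted(interactions, key=lambda row: row[2])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k recommendations that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def coverage_at_k(
    recommendations: Dict[str, Sequence[str]], catalog_size: int, k: int
) -> float:
    """Fraction of the catalog that appears in at least one user's top-k list."""
    shown = {item for ranked in recommendations.values() for item in ranked[:k]}
    return len(shown) / catalog_size


def hybrid_recommend(
    history_size: int,
    cf_ranked: Sequence[str],
    recency_ranked: Sequence[str],
    min_history: int = 5,  # hypothetical sparsity threshold, not from the paper
    k: int = 10,
) -> List[str]:
    """Use the collaborative-filtering list unless the user is too sparse."""
    if history_size < min_history:
        return list(recency_ranked[:k])
    return list(cf_ranked[:k])
```

In this sketch the fallback is a hard switch on history length; a blended score between the two lists would be an equally plausible reading of "hybrid", and the abstract does not say which the authors use.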
License
Copyright (c) 2026 Lamanabwe Epus Hervé, Blaise Muhala Luhepa, Herman MATONDO MANANGA, Dieuleveut Nianga Kaya-Kaya, Benjamin Consolant Majegeza

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).








