Knowledge Discovery on E-Commerce Customer Churn Using Interpretable Machine Learning: A Comparative Study of SHAP-Based Classifiers

Authors

  • Dhita Amanda Ardhani, Universitas Sriwijaya
  • Ken Ditha Tania, Universitas Sriwijaya

DOI:

https://doi.org/10.30871/jaic.v9i5.10811

Keywords:

Customer Churn, E-Commerce, Machine Learning, SHapley Additive exPlanations

Abstract

Customer churn remains one of the most pressing issues in the e-commerce sector, as it directly erodes revenue and reduces customer lifetime value. This study proposes an interpretable machine learning approach designed not only to predict churn but also to uncover practical insights that can inform retention strategies. The analysis draws on a publicly available dataset containing customer behavior and transaction records. Data preparation involved handling missing values, applying label encoding, and addressing class imbalance with SMOTE. Five classification models—Logistic Regression, Random Forest, XGBoost, Support Vector Machine, and Gradient Boosting—were trained on an 80:20 stratified split, with performance assessed through accuracy, precision, recall, F1-score, and AUC. Among these, XGBoost delivered the most consistent results, achieving 96% accuracy, 95% precision, 92% recall, and a near-perfect AUC of 0.999, followed closely by Random Forest. Logistic Regression produced the lowest AUC at 0.886. To ensure transparency in decision-making, SHAP (SHapley Additive exPlanations) was applied, revealing Tenure, Complain, and CashbackAmount as the most influential predictors. Longer customer relationships were linked to reduced churn risk, while frequent complaints and higher cashback usage indicated a greater likelihood of leaving. These findings contribute knowledge by blending robust predictive performance with interpretability, enabling e-commerce businesses to design more targeted and proactive customer retention measures.
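
To make the interpretability step concrete, the sketch below computes exact Shapley values, the quantity that SHAP estimates, for a single customer scored by a toy logistic model. It is an illustration only and not the authors' pipeline: the feature names come from the abstract, but the weights, bias, baseline profile, and customer profile are invented for this example, and the signs of the weights were chosen merely to mirror the directions reported above. The `shap` library is not used here; in practice it relies on faster model-specific or sampling-based estimators rather than enumerating every feature subset.

```python
import math
from itertools import combinations

# Feature names taken from the abstract's top predictors.
FEATURES = ["Tenure", "Complain", "CashbackAmount"]

# ASSUMPTION: hypothetical weights for a toy logistic scorer, not fitted to any data.
WEIGHTS = {"Tenure": -0.15, "Complain": 1.2, "CashbackAmount": 0.004}
BIAS = -1.0


def churn_probability(customer):
    """Toy model: logistic function of a weighted sum of the features."""
    z = BIAS + sum(WEIGHTS[f] * customer[f] for f in FEATURES)
    return 1.0 / (1.0 + math.exp(-z))


def coalition_value(subset, customer, baseline):
    """Model output when features in `subset` take the customer's values
    and all remaining features are held at the baseline values."""
    mixed = {f: (customer[f] if f in subset else baseline[f]) for f in FEATURES}
    return churn_probability(mixed)


def shapley_values(customer, baseline):
    """Exact Shapley attribution by enumerating every subset of the other features."""
    n = len(FEATURES)
    phi = {}
    for feature in FEATURES:
        others = [g for g in FEATURES if g != feature]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k: k! (n - k - 1)! / n!
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                gain = (coalition_value(s | {feature}, customer, baseline)
                        - coalition_value(s, customer, baseline))
                total += weight * gain
        phi[feature] = total
    return phi


# ASSUMPTION: invented reference profile and customer, for illustration only.
baseline = {"Tenure": 10.0, "Complain": 0.0, "CashbackAmount": 180.0}
customer = {"Tenure": 1.0, "Complain": 1.0, "CashbackAmount": 250.0}

phi = shapley_values(customer, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.4f}")
```

The attributions sum to the difference between the customer's predicted churn probability and the baseline's, which is the additivity property that lets SHAP values be read as per-feature contributions to a single prediction. Exhaustive enumeration is only practical for a handful of features, since the number of subsets doubles with each feature added.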

Published

2025-10-16

How to Cite

[1]
D. Amanda Ardhani and K. D. Tania, “Knowledge Discovery on E-Commerce Customer Churn Using Interpretable Machine Learning: A Comparative Study of SHAP-Based Classifiers”, JAIC, vol. 9, no. 5, pp. 2695–2702, Oct. 2025.
