Optimized Machine Learning Model for Credit Card Fraud Detection Using SMOTE-Tomek and Feature Engineering

  • Mochammad Abdurrochman Ari Wibowo Universitas Dian Nuswantoro
  • De Rosal Ignatius Moses Setiadi Universitas Dian Nuswantoro
Keywords: Credit Card, Feature Engineering, Machine Learning, Oversampling, SMOTE-Tomek

Abstract

In today’s digital economy, credit cards are essential, and both credit card usage and credit card theft have increased significantly in recent years. Machine learning models trained on suspicious transaction histories can be used to classify credit card fraud. However, credit card data is often imbalanced, so these models become biased toward the majority class and perform poorly on data such as the publicly available Kaggle credit card fraud dataset. To address this, we balance the class distribution using a hybrid resampling strategy, SMOTE-Tomek, which combines synthetic minority oversampling with Tomek-link cleaning. The results show that a random forest model combined with oversampling, feature engineering, and cross-validation achieves scores above 99% on all evaluation metrics, outperforming three other models: decision tree, gradient boosting, and XGBoost. We conclude that feature engineering, cross-validation, and oversampling are useful approaches for handling imbalanced credit card data and can ultimately help prevent credit card transaction fraud.
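Why imbalance makes a single high score misleading can be shown with a little arithmetic. Assuming the dataset is the widely used ULB one on Kaggle, whose page reports 492 frauds out of 284,807 transactions (about 0.172%), the sketch below scores a trivial model that labels every transaction as legitimate. The helper `evaluate` is hypothetical and only computes the standard metrics from confusion-matrix counts; treating an undefined precision as 0 is a convention chosen here.

```python
def evaluate(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts (fraud = positive class)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Precision is undefined when nothing is flagged as fraud; report 0 by convention.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


FRAUD, TOTAL = 492, 284_807  # class counts reported on the Kaggle dataset page

# Majority-class baseline: predict "legitimate" for every transaction.
baseline = evaluate(tp=0, fp=0, fn=FRAUD, tn=TOTAL - FRAUD)
for name, value in baseline.items():
    print(f"{name:>9}: {value:.4f}")
```

The baseline reaches an accuracy of about 0.9983 while its precision, recall, and F1 for the fraud class are all 0, which is why a result above 99% is only informative when it holds across every metric rather than for accuracy alone.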
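The hybrid resampling idea can be sketched without any ML library, as below. This is a minimal illustration of the general SMOTE-Tomek technique, not the authors' code. The function names (`smote`, `tomek_links`, `smote_tomek`), k = 5 neighbours, balancing to a 1:1 ratio, and removing both points of every Tomek link are all assumptions made for the example; some implementations remove only the majority-class point of each link. In practice, the imbalanced-learn library offers this combination as `imblearn.combine.SMOTETomek` with a `fit_resample(X, y)` method.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Classic SMOTE: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority-class neighbours."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        n = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, n)))
    return synthetic


def tomek_links(X, y):
    """Index pairs (i, j), i < j, that are each other's nearest neighbour
    but carry different class labels."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def smote_tomek(X, y, minority_label=1, k=5, seed=0):
    """Hypothetical helper: oversample the minority class to a 1:1 ratio with
    SMOTE, then drop both points of every Tomek link (binary labels assumed)."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    # Step 1: synthesize minority points until both classes are the same size.
    new = smote(minority, n_majority - len(minority), k=k, rng=rng)
    X_res = list(X) + new
    y_res = list(y) + [minority_label] * len(new)
    # Step 2: clean the class boundary by removing Tomek-link pairs.
    drop = {idx for pair in tomek_links(X_res, y_res) for idx in pair}
    X_out = [p for i, p in enumerate(X_res) if i not in drop]
    y_out = [lab for i, lab in enumerate(y_res) if i not in drop]
    return X_out, y_out


if __name__ == "__main__":
    gen = random.Random(42)
    # Toy 2-D data: 40 "legitimate" points near (0, 0), 5 "fraud" points near (2, 2).
    X = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(40)]
    X += [(gen.gauss(2, 0.7), gen.gauss(2, 0.7)) for _ in range(5)]
    y = [0] * 40 + [1] * 5
    X_res, y_res = smote_tomek(X, y)
    print("before:", y.count(0), y.count(1), "after:", y_res.count(0), y_res.count(1))
```

Because every Tomek link joins one point from each class, removing both members keeps the resampled classes exactly balanced. When resampling is combined with cross-validation, the usual practice is to resample only the training portion of each fold, so that synthetic points derived from test samples never reach the training set.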

Published
2024-11-25
How to Cite
M. A. A. Wibowo and D. R. I. M. Setiadi, “Optimized Machine Learning Model for Credit Card Fraud Detection Using SMOTE-Tomek and Feature Engineering,” JAIC, vol. 8, no. 2, pp. 580–588, Nov. 2024.
Section
Articles