Optimized Machine Learning Model for Credit Card Fraud Detection Using SMOTE-Tomek and Feature Engineering
Abstract
In today’s digital economy, credit cards are essential, and both credit card usage and credit card theft have increased significantly in recent years. Machine learning models can detect credit card fraud by learning from the history of suspicious transactions. However, credit card data is often imbalanced, so models become biased toward the majority class and perform poorly on publicly accessible Kaggle credit card classification datasets. To address this difficulty, we balance the class distribution in the dataset using a hybrid synthetic minority oversampling strategy (SMOTE-Tomek). The findings show that a random forest model combined with oversampling, feature engineering, and cross-validation yields the best results, exceeding 99% on all evaluation metrics. It outperforms three other models: decision tree, gradient boosting, and XGBoost. We conclude that feature engineering, cross-validation, and oversampling are useful approaches for handling imbalanced credit card data and can ultimately help prevent credit card transaction fraud.
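The hybrid resampling idea named in the title can be illustrated with a minimal sketch. This is not the authors' implementation: it is a pure-Python toy version of SMOTE followed by Tomek-link removal, run on synthetic two-dimensional points rather than the Kaggle data. The function names `nearest`, `smote`, and `remove_tomek_links`, the neighbour count `k=3`, and the random seeds are all assumptions made for illustration. It covers only the balancing step, not feature engineering, cross-validation, or the random forest.

```python
import math
import random


def nearest(idx, points, candidates, k=1):
    """Return the k indices in `candidates` closest to points[idx] (excluding idx)."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(X, y, minority=1, k=3, seed=0):
    """Add synthetic minority points until both classes have equal counts.

    Each synthetic point lies on the segment between a random minority
    point and one of its k nearest minority neighbours (binary labels assumed).
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_needed = (len(y) - len(min_idx)) - len(min_idx)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        i = rng.choice(min_idx)
        j = rng.choice(nearest(i, X, min_idx, k))
        gap = rng.random()  # interpolation factor in [0, 1)
        X_new.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
        y_new.append(minority)
    return X_new, y_new


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link.

    A Tomek link is a pair of points with different labels that are
    each other's nearest neighbour.
    """
    everyone = range(len(X))
    nn = [nearest(i, X, everyone)[0] for i in everyone]
    drop = {i for i in everyone if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in everyone if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy imbalanced data (hypothetical): 60 "legitimate" points, 6 "fraud" points.
data_rng = random.Random(42)
X = [(data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)) for _ in range(60)]
X += [(data_rng.gauss(1.5, 0.5), data_rng.gauss(1.5, 0.5)) for _ in range(6)]
y = [0] * 60 + [1] * 6

X_bal, y_bal = smote(X, y)                    # oversample the minority class
X_res, y_res = remove_tomek_links(X_bal, y_bal)  # clean overlapping pairs

print("original:", y.count(0), y.count(1))
print("after SMOTE:", y_bal.count(0), y_bal.count(1))
print("after Tomek cleaning:", y_res.count(0), y_res.count(1))
```

In practice a maintained library would normally be used instead of hand-written code; for example, imbalanced-learn provides a `SMOTETomek` class in `imblearn.combine`. When resampling is combined with cross-validation, a common precaution is to resample only the training folds so that synthetic points do not leak into the evaluation data.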
Copyright (c) 2024 Mochammad Abdurrochman Ari Wibowo, De Rosal Ignatius Moses Setiadi
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.