Optimized Machine Learning Model for Credit Card Fraud Detection Using SMOTE-Tomek and Feature Engineering

Mochammad Abdurrochman Ari Wibowo; De Rosal Ignatius Moses Setiadi

doi:10.30871/jaic.v8i2.8732

Authors

Mochammad Abdurrochman Ari Wibowo, Universitas Dian Nuswantoro
De Rosal Ignatius Moses Setiadi, Universitas Dian Nuswantoro

Keywords:

Credit Card, Feature Engineering, Machine Learning, Oversampling, SMOTE-Tomek

Abstract

In today's digital economy, credit cards are essential, and both credit card usage and credit card theft have increased significantly in recent years. Machine learning models can classify transactions as fraudulent or legitimate using data from suspicious transaction histories. However, credit card data is often imbalanced, so models become biased toward the majority class and perform poorly, as seen on the publicly accessible Kaggle credit card dataset. To address this difficulty, we balance the class distribution in the dataset using a hybrid synthetic minority oversampling strategy (SMOTE-Tomek). The findings show that a random forest model combined with oversampling, feature engineering, and cross-validation yields the best results, exceeding 99% on all evaluation metrics. It outperforms three other models: decision tree, gradient boosting, and XGBoost. We conclude that feature engineering, cross-validation, and oversampling are useful approaches for handling imbalanced credit card data and ultimately help prevent credit card transaction fraud.
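
As an illustration of the hybrid resampling idea named in the abstract, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes a simplified SMOTE step (interpolating between a minority sample and one of its nearest minority neighbours) followed by removal of Tomek links (opposite-class pairs that are each other's nearest neighbour). The function names, the toy data, and the choice to drop both members of each link are assumptions made for this example; in practice a library such as imbalanced-learn, which provides `SMOTETomek`, would normally be used instead.

```python
import random
from math import dist


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def nearest(i, X):
    """Index of the nearest neighbour of X[i], excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: dist(X[i], X[j]))


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link (an assumption of this sketch)."""
    nn = [nearest(i, X) for i in range(len(X))]
    drop = {i for i in range(len(X)) if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class up to the majority size, then clean."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(X) - len(minority)) - len(minority)
    new_points = smote(minority, n_new, k, rng)
    X_res = list(X) + new_points
    y_res = list(y) + [minority_label] * len(new_points)
    return remove_tomek_links(X_res, y_res)


# Hypothetical toy data: 8 legitimate (0) and 2 fraudulent (1) transactions.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.4, 0.1),
     (0.1, 0.4), (0.3, 0.0), (0.2, 0.3), (2.0, 2.0), (2.2, 2.1)]
y = [0] * 8 + [1] * 2
X_res, y_res = smote_tomek(X, y, k=1)
print(len(X_res), y_res.count(0), y_res.count(1))
```

When resampling is combined with cross-validation, it is generally applied only to the training folds so that synthetic samples do not leak into the data used for evaluation.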

Author Biography

Mochammad Abdurrochman Ari Wibowo, Universitas Dian Nuswantoro

Program Studi Teknik Informatika, Fakultas Ilmu Komputer, Universitas Dian Nuswantoro, Semarang, 50131, Indonesia
