Detecting Financial Fraud Using Random Forest Machine Learning

Peta Kahiomba Esther; Mabela Matendo Rostin; Kafunda Katalay Pierre; Mbuyi Mukendi Eugene; Mitelezi Mbila Jonathan; Albert Ntumba

doi:10.30871/jaic.v10i3.12539

Authors

Peta Kahiomba Esther, Department of Mathematics, Statistics and Computer Science, Faculty of Science and Technology, University of Kinshasa, Kinshasa, DR Congo
Mabela Matendo Rostin, Department of Mathematics, Statistics and Computer Science, Faculty of Science and Technology, University of Kinshasa, Kinshasa, DR Congo
Kafunda Katalay Pierre, Department of Mathematics, Statistics and Computer Science, Faculty of Science and Technology, University of Kinshasa, Kinshasa, DR Congo
Mbuyi Mukendi Eugene, Department of Mathematics, Statistics and Computer Science, Faculty of Science and Technology, University of Kinshasa, Kinshasa, DR Congo
Mitelezi Mbila Jonathan, Department of Mathematics, Statistics and Computer Science, Faculty of Science and Technology, University of Kinshasa, Kinshasa, DR Congo
Albert Ntumba, Department of Mathematics, Statistics and Computer Science, Faculty of Science and Technology, University of Kinshasa, Kinshasa, DR Congo

Keywords:

Fraud Detection, Artificial Intelligence, Decision Tree, Data Science

Abstract

Financial fraud detection is a critical challenge for banking institutions facing increasingly sophisticated threats in digital transaction environments. This study investigates the application of the Random Forest algorithm for detecting fraudulent credit card transactions using the publicly available benchmark dataset from the Université Libre de Bruxelles (284,807 transactions, 0.172% fraud prevalence). Preprocessing includes QuantileTransformer normalization and SMOTE oversampling applied exclusively to the training set to address class imbalance. The model (n_estimators = 200) is validated using a stratified 70/30 split combined with 10-fold cross-validation to assess robustness and guard against overfitting. The model achieves an accuracy of 97%, ROC-AUC of 97%, precision of 95%, recall of 78%, and F1-score of 86%. Comparative evaluation against Logistic Regression, Support Vector Machine, and Gradient Boosting confirms that Random Forest provides the best balance between detection performance and computational efficiency (training: 45 s; inference: 0.3 ms per transaction). Feature importance analysis identifies transaction amount and PCA components V14 and V17 as the most discriminative variables. Confusion matrix analysis reveals 68 False Negatives and 142 False Positives out of 85,443 test samples. Limitations include reduced feature interpretability due to PCA transformation, potential geographic data bias, and challenges in real-time production deployment. This work confirms the relevance of Random Forest for financial fraud detection and motivates future work on hybrid deep learning and graph-based architectures.
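
As a small illustration of how the reported metrics relate to one another, the sketch below recomputes the F1-score as the harmonic mean of the precision (95%) and recall (78%) stated in the abstract, and estimates the number of fraudulent transactions implied by the stated prevalence. The function and variable names are hypothetical and chosen only for this example; this is not the authors' evaluation code.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1-score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
reported_precision = 0.95
reported_recall = 0.78

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # about 0.857, in line with the reported 86%

# Approximate fraud count implied by 0.172% prevalence over 284,807 transactions.
total_transactions = 284_807
fraud_prevalence = 0.00172
approx_fraud_count = round(total_transactions * fraud_prevalence)
print(f"Approximate fraudulent transactions: {approx_fraud_count}")
```

With so few positive cases relative to the total, accuracy alone says little about detection quality, which is why precision, recall, and F1 are reported alongside it.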
