Comparative Analysis of the Performance of Decision Tree and Random Forest Algorithms in SQL Injection Attack Detection

  • Alfatarizky Budi Aulianoor, Universitas Amikom Yogyakarta
  • Muhammad Koprawi, Universitas Amikom Yogyakarta
Keywords: Machine Learning, Decision Tree, Random Forest, SQL Injection, Database

Abstract

This study compares the performance of two machine learning algorithms, Decision Tree and Random Forest, in detecting SQL Injection attacks. SQL Injection attacks continue to threaten web applications because they exploit vulnerabilities by injecting malicious code into SQL statements that are executed on database servers. Machine learning algorithms are therefore used to identify these attacks. The dataset consists of 33,761 random query inputs stored in a tabular CSV file with sentence and label columns. The research was carried out using Google Colaboratory and Microsoft Edge. The research proceeds in stages. Collect Data covers data collection. Preprocessing handles missing values, deletes duplicate rows, and removes identical queries that carry different labels. Train and Test prepares the data used to build the models and the test data used to evaluate them. Build and Compile constructs the Decision Tree and Random Forest models. The final stage evaluates both models to determine which performs better. The results show that the Random Forest algorithm performs slightly better than the Decision Tree algorithm, achieving an accuracy of 99.81%, a precision of 99.79%, a recall of 99.65%, and an average F1-score of 99.72%.
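
The preprocessing stage described above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code. It assumes the CSV header names `Sentence` and `Label` (hypothetical; the abstract only mentions sentence and label columns) and treats empty or whitespace-only cells as missing values. The helper name `clean_rows` and the sample rows are likewise made up for illustration.

```python
import csv
import io
from collections import defaultdict


def clean_rows(rows):
    """Apply the three cleaning steps to an iterable of CSV rows (dicts).

    Assumes hypothetical column names "Sentence" and "Label".
    Returns a list of (sentence, label) tuples.
    """
    # Step 1: handle missing values by skipping rows with an empty query or label.
    complete = []
    for row in rows:
        sentence = (row.get("Sentence") or "").strip()
        label = (row.get("Label") or "").strip()
        if sentence and label:
            complete.append((sentence, label))

    # Step 2: delete duplicate rows, keeping the first occurrence in file order.
    unique = list(dict.fromkeys(complete))

    # Step 3: drop every query that appears with more than one distinct label,
    # since such rows give the classifier contradictory supervision.
    labels_per_query = defaultdict(set)
    for sentence, label in unique:
        labels_per_query[sentence].add(label)
    return [(s, lab) for s, lab in unique if len(labels_per_query[s]) == 1]


if __name__ == "__main__":
    # Tiny made-up sample: a duplicate, a conflicting pair, and a missing query.
    sample_csv = (
        "Sentence,Label\n"
        "SELECT * FROM users WHERE id = 1,0\n"
        "SELECT * FROM users WHERE id = 1,0\n"
        "' OR '1'='1,1\n"
        "' OR '1'='1,0\n"
        "admin' --,1\n"
        ",1\n"
    )
    cleaned = clean_rows(csv.DictReader(io.StringIO(sample_csv)))
    print(len(cleaned), cleaned)
```

On this sample, the duplicate row is collapsed, both rows of the conflicting query are discarded, and the row with a missing query is skipped, leaving two rows. The abstract does not say whether conflicting queries were dropped entirely or resolved to one label; dropping both is the conservative choice assumed here.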

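The Train and Test, Build and Compile, and evaluation stages can be sketched with scikit-learn, a common library choice in Google Colaboratory. This is a hedged sketch under several assumptions that the abstract does not state: queries are converted into features with character n-gram TF-IDF, the data is split 80/20 with stratification, labels are 0 (benign) and 1 (injection), and both models use default hyperparameters apart from a fixed `random_state`. The function name `compare_models` and the toy queries are hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def compare_models(queries, labels, test_size=0.2, seed=42):
    """Train a Decision Tree and a Random Forest on the same split; return test metrics."""
    # Train and Test: hold out a stratified test set (assumed 80/20 split).
    X_train, X_test, y_train, y_test = train_test_split(
        queries, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    # Assumed feature extraction: character n-grams capture SQL metacharacters
    # such as quotes, comment markers, and operators.
    vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3))
    X_train_vec = vectorizer.fit_transform(X_train)  # fit on training data only
    X_test_vec = vectorizer.transform(X_test)

    # Build: one single tree versus an ensemble of 100 trees.
    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }

    # Evaluate: precision, recall, and F1 are computed for the injection class (label 1).
    results = {}
    for name, model in models.items():
        model.fit(X_train_vec, y_train)
        y_pred = model.predict(X_test_vec)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, zero_division=0),
            "recall": recall_score(y_test, y_pred, zero_division=0),
            "f1": f1_score(y_test, y_pred, zero_division=0),
        }
    return results


if __name__ == "__main__":
    # Made-up toy data; the real study used 33,761 labelled queries.
    benign = [f"SELECT name FROM products WHERE id = {i}" for i in range(20)]
    malicious = [f"{i}' OR '1'='1' --" for i in range(20)]
    scores = compare_models(benign + malicious, [0] * 20 + [1] * 20)
    for name, metrics in scores.items():
        print(name, {k: round(v, 4) for k, v in metrics.items()})
```

Training and testing both models on the same split and the same features means any score difference comes from the algorithms rather than from different data. A Random Forest combines many trees fit on bootstrap samples with random feature subsets, which usually reduces the variance of a single tree; this is consistent with, though not proof of, the small margin reported in the abstract. Passing `average="macro"` to the metric functions would average over both classes instead; the abstract does not say which averaging produced its "average F1-score". As a consistency check, the harmonic mean of the reported precision and recall, 2(0.9979)(0.9965)/(0.9979 + 0.9965), is approximately 0.9972, which matches the reported F1-score of 99.72%.
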
Published: 2024-07-25
How to Cite: [1] A. Aulianoor and M. Koprawi, “Comparative Analysis of the Performance of Decision Tree and Random Forest Algorithms in SQL Injection Attack Detection”, JAIC, vol. 8, no. 1, pp. 194-202, Jul. 2024.
Section: Articles