Random Forest-based Hepatocellular Carcinoma Liver Disease Classification Model with LDA Feature Selection on Patient Medical Records

Nurul Istiqamah; Arif Iman Anshori; Novita Rahmayuna; Umi Meganinditya Wulandari

doi:10.30871/jaic.v10i2.11573

Authors

Nurul Istiqamah, Institut Teknologi dan Bisnis Nobel Indonesia
Arif Iman Anshori, Nahdlatul Ulama Institute of Technology and Science, Pekalongan
Novita Rahmayuna, Bina Nusantara University
Umi Meganinditya Wulandari, Nahdlatul Ulama Institute of Technology and Science, Pekalongan

Keywords:

Hepatocellular Carcinoma, Random Forest, Feature Selection, Classification, LDA

Abstract

Hepatocellular carcinoma (HCC) is one of the leading causes of cancer-related mortality worldwide, and early detection remains challenging because of the complexity of clinical indicators. This study investigates a Random Forest-based classification model for HCC using patient medical record data, with Linear Discriminant Analysis (LDA) applied as a feature selection approach. The dataset consists of 100 clinical records with 39 attributes. A stratified 80:20 train–test split and cross-validation were used to evaluate model stability. The baseline Random Forest model achieved an accuracy of 85% but an AUC of only 0.69, indicating moderate discrimination performance. Applying LDA-based feature selection before classification did not improve predictive performance on this dataset. Although LDA helped identify clinically relevant variables such as bilirubin markers and viral infection indicators, dimensionality reduction did not enhance overall classification results. These findings suggest that Random Forest provides relatively stable performance for HCC classification on small datasets, while LDA-based feature selection contributes mainly to interpretability rather than predictive gain. However, the results should be interpreted cautiously because of the small sample size and class imbalance. Future work should use larger datasets and more rigorous validation strategies to improve generalization.
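The abstract reports a stratified 80:20 split, class imbalance, and a gap between accuracy and AUC. The following is a minimal sketch, not the authors' code, of why those two metrics can diverge on a small imbalanced test set. The helper names (`stratified_split`, `accuracy`, `roc_auc`) are hypothetical, and the 80/20 class ratio is an assumption made purely for illustration, since the abstract does not state the actual class distribution. The numbers it prints do not reproduce the study's results.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling the test fraction within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels: 100 records, 20 positive (assumed ratio, not from the study).
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(labels)
y_test = [labels[i] for i in test_idx]

# A trivial "model" that always predicts the majority class with one constant score.
y_pred = [0] * len(y_test)
scores = [0.0] * len(y_test)

acc = accuracy(y_test, y_pred)
auc = roc_auc(y_test, scores)
print(f"test size={len(y_test)} positives={sum(y_test)} accuracy={acc:.2f} auc={auc:.2f}")
```

Under these assumptions the trivial predictor reaches 0.80 accuracy with an AUC of 0.50, which is why a ranking metric such as AUC is worth reading alongside accuracy when classes are imbalanced and the held-out set has only 20 records.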
