Random Forest-based Hepatocellular Carcinoma Liver Disease Classification Model with LDA Feature Selection on Patient Medical Records
DOI: https://doi.org/10.30871/jaic.v10i2.11573
Keywords: Hepatocellular Carcinoma, Random Forest, Feature Selection, Classification, LDA
Abstract
Hepatocellular carcinoma (HCC) is one of the leading causes of liver cancer mortality worldwide, and early detection remains challenging due to the complexity of clinical indicators. This study investigates a Random Forest-based classification model for HCC using patient medical record data, with Linear Discriminant Analysis (LDA) applied as a feature selection approach. The dataset consists of 100 clinical records comprising 39 attributes. A stratified 80:20 train–test split and cross-validation were employed to evaluate model stability. The baseline Random Forest model achieved an accuracy of 85% with an AUC of 0.69, indicating moderate discrimination performance. When LDA-based feature selection was applied prior to classification, predictive performance did not improve under the current dataset conditions. Although LDA contributed to identifying clinically relevant variables such as bilirubin markers and viral infection indicators, dimensionality reduction did not enhance overall classification results. These findings suggest that Random Forest provides relatively stable performance for HCC classification within limited datasets, while LDA-based feature selection primarily contributes to interpretability rather than predictive gain. However, the results should be interpreted cautiously due to the small sample size and class imbalance. Future work should involve larger datasets and rigorous validation strategies to improve generalization capability.
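The workflow summarized in the abstract (stratified 80:20 split, cross-validation, a Random Forest baseline, and the same classifier preceded by LDA-based feature selection) can be sketched roughly as follows. This is an illustrative sketch, not the authors' code. Everything specific in it is an assumption: the data are synthetic (generated with `make_classification` to mimic 100 records, 39 attributes, and class imbalance), features are ranked by absolute LDA coefficient through scikit-learn's `SelectFromModel`, and the values `n_selected=10`, `n_estimators=200`, the 5 folds, and the random seeds are arbitrary choices. The numbers it prints are not expected to match the reported results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for the patient records
# (100 rows, 39 attributes, imbalanced binary target).
X, y = make_classification(
    n_samples=100, n_features=39, n_informative=5, n_redundant=0,
    weights=[0.8, 0.2], random_state=0,
)

# Stratified 80:20 train-test split, as described in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def build_model(n_selected=None):
    """Random Forest, optionally preceded by LDA-based feature selection."""
    steps = []
    if n_selected is not None:
        # Standardize so that LDA coefficients are comparable across features.
        steps.append(("scale", StandardScaler()))
        # ASSUMPTION: keep the n_selected features with the largest |LDA coefficient|.
        steps.append((
            "lda_select",
            SelectFromModel(
                LinearDiscriminantAnalysis(),
                max_features=n_selected,
                threshold=-np.inf,  # select purely by rank, not by a cutoff
            ),
        ))
    steps.append(("rf", RandomForestClassifier(n_estimators=200, random_state=0)))
    return Pipeline(steps)


def evaluate(model):
    """Cross-validate on the training split, then score once on the held-out split."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    cv_auc = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    return {
        "cv_auc_mean": float(cv_auc.mean()),
        "test_accuracy": float(accuracy_score(y_test, model.predict(X_test))),
        "test_auc": float(roc_auc_score(y_test, proba)),
    }


baseline = evaluate(build_model())
with_lda = evaluate(build_model(n_selected=10))
print("Random Forest only:", baseline)
print("LDA selection + RF:", with_lda)
```

Placing the selector inside the pipeline means it is refit on each cross-validation training fold, so the feature ranking never sees the validation fold. Reporting AUC alongside accuracy matters here because, with an imbalanced target, a classifier can reach high accuracy while discriminating poorly between classes.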
License
Copyright (c) 2026 Nurul Istiqamah, Arif Iman Anshori, Novita Rahmayuna, Umi Meganinditya Wulandari

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).








