Classification of Tumor and Normal Tissue Gene Expression in Lung Adenocarcinoma Using Support Vector Machine and Gaussian Process Classification
DOI:
https://doi.org/10.30871/jaic.v9i6.11763
Keywords:
Biomarker, Gene expression, GPC, Lung adenocarcinoma, SVM
Abstract
Lung adenocarcinoma (LUAD) is a major cause of cancer-related mortality worldwide. This study aims to identify potential LUAD biomarkers and develop robust classification models using the GSE151101 microarray dataset. Preprocessing included RMA normalization, ComBat batch-effect correction, and feature filtering based on annotation completeness, variability, and statistical significance. Support Vector Machine (SVM) and Gaussian Process Classification (GPC) models were constructed, with the polynomial-kernel GPC model achieving the best performance (accuracy 97.92%; F1-score 97.96%). Repeated 10-fold cross-validation confirmed its stability (mean accuracy 96.88%, SD ±1.97%, coefficient of variation 2.03%), and it outperformed linear SVM, GPC-RBF, and Multiple Kernel Learning (MKL). Functional enrichment analysis showed that the key discriminative genes (CDH13, CDKN2A, BCL2L11, MYL9, COL1A1, and AKT3) were enriched in pathways related to epithelial–mesenchymal transition, extracellular matrix remodelling, focal adhesion, PI3K/AKT signalling, and cell-cycle regulation, all of which are central to LUAD progression. Overall, polynomial-kernel GPC is a stable and effective approach for classifying transcriptomes and ranking candidate biomarkers. Nevertheless, the translational potential of these signatures requires further validation in independent and clinically controlled cohorts.
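The evaluation scheme described in the abstract (a polynomial-kernel GPC compared with a linear SVM under repeated 10-fold cross-validation, summarized by mean, SD, and coefficient of variation) can be illustrated with a minimal sketch. The abstract does not state which software was used, so everything here is an assumption: the use of scikit-learn, the synthetic data standing in for the filtered GSE151101 expression matrix, the degree-2 kernel, the fixed kernel hyperparameters, the number of repeats, and the helper names `summarize` and `repeated_cv_accuracy`. The printed accuracies are for the synthetic data only and say nothing about the study's results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import DotProduct
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def summarize(scores):
    """Return mean, sample SD, and coefficient of variation, all in percent."""
    scores = np.asarray(scores, dtype=float)
    mean = 100.0 * scores.mean()
    sd = 100.0 * scores.std(ddof=1)  # ddof=1 is an assumption; the abstract does not say
    return mean, sd, 100.0 * sd / mean


def repeated_cv_accuracy(model, X, y, n_repeats=3, seed=0):
    """Accuracy per fold under repeated stratified 10-fold cross-validation."""
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=n_repeats, random_state=seed)
    return cross_val_score(model, X, y, cv=cv, scoring="accuracy")


# Synthetic stand-in for a samples-by-genes matrix with tumor/normal labels.
X, y = make_classification(n_samples=96, n_features=10, n_informative=5,
                           random_state=0)

# Polynomial kernel of degree 2 built as (sigma_0^2 + x . x')^2.
poly_kernel = DotProduct(sigma_0=1.0) ** 2

models = {
    # optimizer=None keeps the kernel hyperparameters fixed (faster, deterministic).
    "GPC (polynomial)": make_pipeline(
        StandardScaler(),
        GaussianProcessClassifier(kernel=poly_kernel, optimizer=None, random_state=0),
    ),
    "SVM (linear)": make_pipeline(StandardScaler(), SVC(kernel="linear")),
}

for name, model in models.items():
    mean, sd, cv = summarize(repeated_cv_accuracy(model, X, y))
    print(f"{name}: mean={mean:.2f}%  SD={sd:.2f}%  CV={cv:.2f}%")

# Coefficient of variation implied by the abstract's reported mean and SD.
abstract_cv = round(100.0 * 1.97 / 96.88, 2)
print(f"CV from reported values: {abstract_cv}%")
```

The last two lines recompute the coefficient of variation from the reported mean accuracy (96.88%) and SD (1.97%), which reproduces the 2.03% quoted in the abstract. Scaling is placed inside the pipeline so that it is refit on each training fold rather than on the full dataset.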
License
Copyright (c) 2025 Rahmadi Yotenka, Adhitya Ronnie Effendie, Rohmatul Fajriyah

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.