The Impact of the L1/L2 Ratio on Selection Stability and Solution Sparsity along the Elastic Net Regularization Path in High-Dimensional Genomic Data
DOI: https://doi.org/10.30871/jaic.v10i1.12059

Keywords: Elastic Net, LASSO, High-Dimensional Data, Feature Selection Stability, Sparsity, Regularization Path

Abstract
High-dimensional genomic datasets (p > n) pose persistent challenges for predictive modeling and biomarker-oriented feature selection due to multicollinearity and the instability of selected feature sets under resampling. Although Elastic Net is widely used to address correlated predictors via combined L1/L2 regularization, the L1/L2 mixing ratio (α) is often treated as a secondary tuning choice driven primarily by predictive accuracy. This study investigates how varying α shapes the trade-off among selection stability, solution sparsity, and predictive performance along the Elastic Net regularization path. Experiments were conducted on the publicly available METABRIC breast cancer cohort (n = 1,964) with 21,113 gene expression features and a binary overall survival status outcome. Logistic regression with an Elastic Net penalty was fitted across a grid of α values, with the regularization strength (λ) selected by cross-validation. Feature selection stability was evaluated under repeated resampling using the Jaccard index, Dice coefficient, and Adjusted Rand Index (ARI); sparsity was summarized by the average number of non-zero coefficients; and predictive performance was assessed using AUC, accuracy, and F1-score. Results show a monotonic decline in stability as α increases: α = 0.2 yields the highest stability (Jaccard 0.324, Dice 0.487, ARI 0.434), whereas LASSO (α = 1.0) produces the lowest (Jaccard 0.278, Dice 0.431, ARI 0.400). In contrast, predictive performance varies only marginally across α (AUC 0.696–0.704; accuracy 0.666–0.671; F1-score 0.738–0.742), while sparsity changes substantially (110–204 selected features on average). Coefficient path analyses further illustrate abrupt shrinkage under LASSO versus smoother, group-preserving shrinkage under Elastic Net, consistent with improved reproducibility at lower-to-moderate α. Frequency-of-selection analysis highlights genes repeatedly selected across resampling runs, supporting the interpretability of stable configurations without claiming causal biomarker validity. Overall, the findings demonstrate that α is a substantive modeling choice that materially affects stability and sparsity even when accuracy is similar, motivating stability-aware tuning for high-dimensional genomic prediction and reproducible feature discovery.
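The authors' code is not included on this page. As a minimal sketch of the fitting step described above, assuming scikit-learn (where glmnet's mixing ratio α corresponds to `l1_ratio` and the strength λ to the inverse of `C`), the α grid, placeholder data, and hyperparameter choices below are illustrative assumptions, not the study's actual configuration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# Placeholder data standing in for the METABRIC expression matrix
# (n samples x p genes) and the binary overall-survival outcome.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))
y = rng.integers(0, 2, size=200)

alphas = [0.2, 0.4, 0.6, 0.8, 1.0]  # L1/L2 mixing ratios; 1.0 is LASSO

results = {}
for alpha in alphas:
    # l1_ratio plays the role of alpha; C is approximately 1/lambda,
    # so cross-validating over a grid of Cs selects the penalty strength.
    clf = LogisticRegressionCV(
        penalty="elasticnet",
        solver="saga",      # the only sklearn solver supporting elastic net
        l1_ratios=[alpha],
        Cs=10,              # grid of inverse regularization strengths
        cv=5,
        scoring="roc_auc",
        max_iter=5000,
    )
    clf.fit(X, y)
    n_nonzero = int(np.sum(clf.coef_ != 0))  # sparsity at the CV-selected lambda
    results[alpha] = (clf, n_nonzero)
    print(f"alpha={alpha}: {n_nonzero} non-zero coefficients")
```

On real expression data the features should be standardized before fitting, as the saga solver (unlike glmnet) does not standardize internally.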
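Likewise, a hedged sketch of the stability evaluation: assuming the reported Jaccard/Dice/ARI values are averages over all pairs of resampled selected-feature sets (a common convention, not stated in the abstract), and treating ARI as applied to the selected/not-selected partition of the p genes (one plausible reading), the metrics can be computed as follows. `selected_sets` is a hypothetical list of non-zero-coefficient index sets, one per resampling run:

```python
from itertools import combinations

import numpy as np
from sklearn.metrics import adjusted_rand_score


def jaccard(a: set, b: set) -> float:
    """Jaccard index between two selected-feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dice(a: set, b: set) -> float:
    """Dice coefficient between two selected-feature sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def ari_on_indicators(a: set, b: set, p: int) -> float:
    """ARI between two runs, viewing selected/not-selected as a
    two-cluster partition of the p features (an assumed reading)."""
    ia = np.isin(np.arange(p), list(a)).astype(int)
    ib = np.isin(np.arange(p), list(b)).astype(int)
    return adjusted_rand_score(ia, ib)


def pairwise_stability(selected_sets, p):
    """Average pairwise Jaccard/Dice/ARI over all resampling runs."""
    assert len(selected_sets) >= 2, "need at least two resampling runs"
    pairs = list(combinations(selected_sets, 2))
    jac = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
    dic = sum(dice(a, b) for a, b in pairs) / len(pairs)
    ari = sum(ari_on_indicators(a, b, p) for a, b in pairs) / len(pairs)
    return jac, dic, ari

# The frequency-of-selection analysis follows from a simple count of how
# often each gene index appears across the sets in selected_sets.
```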
Copyright (c) 2026 Fani Fahira, Kusman Sadik, Cici Suhaeni, Agus M Soleh. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) License.