The Impact of the L1/L2 Ratio on Selection Stability and Solution Sparsity along the Elastic Net Regularization Path in High-Dimensional Genomic Data

Authors

  • Fani Fahira, School of Data Science, Mathematics and Informatics, IPB University
  • Kusman Sadik, School of Data Science, Mathematics and Informatics, IPB University
  • Cici Suhaeni, School of Data Science, Mathematics and Informatics, IPB University
  • Agus M Soleh, School of Data Science, Mathematics and Informatics, IPB University

DOI:

https://doi.org/10.30871/jaic.v10i1.12059

Keywords:

Elastic Net, LASSO, High-Dimensional Data, Feature Selection Stability, Sparsity, Regularization Path

Abstract

High-dimensional genomic datasets (p>n) pose persistent challenges for predictive modeling and biomarker-oriented feature selection due to multicollinearity and instability of selected feature sets under resampling. Although Elastic Net is widely used to address correlated predictors via combined L1/L2 regularization, the practical role of the L1/L2 mixing ratio (α) is often treated as a secondary tuning choice driven primarily by predictive accuracy. This study investigates how varying α shapes the trade-off among selection stability, solution sparsity, and predictive performance along the Elastic Net regularization path. Experiments were conducted using the publicly available METABRIC breast cancer cohort (n = 1,964) with 21,113 gene expression features and a binary overall survival status outcome. Logistic regression with Elastic Net penalty was fitted across a grid of α values, with the regularization strength (λ) selected by cross-validation. Feature selection stability was evaluated under repeated resampling using the Jaccard index, Dice coefficient, and Adjusted Rand Index (ARI), while sparsity was summarized by the average number of non-zero coefficients; predictive performance was assessed using AUC, accuracy, and F1-score. Results show a monotonic decline in stability as α increases: α = 0.2 yields the highest stability (Jaccard 0.324, Dice 0.487, ARI 0.434), whereas LASSO (α = 1.0) produces the lowest stability (Jaccard 0.278, Dice 0.431, ARI 0.400). In contrast, predictive performance varies only marginally across α (AUC 0.696–0.704; accuracy 0.666–0.671; F1-score 0.738–0.742), while sparsity changes substantially (average selected features 110–204). Coefficient path analyses further illustrate abrupt shrinkage under LASSO versus smoother, group-preserving shrinkage under Elastic Net, consistent with improved reproducibility under lower-to-moderate α. Frequency-of-selection analysis highlights genes repeatedly selected across resampling, supporting interpretability of stable configurations without claiming causal biomarker validity. Overall, the findings demonstrate that α is a substantive modeling choice that materially affects stability and sparsity even when accuracy is similar, motivating stability-aware tuning for high-dimensional genomic prediction and reproducible feature discovery.
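
The tuning workflow described above can be sketched compactly. The snippet below is a minimal illustration, not the authors' code: a synthetic p >> n matrix stands in for METABRIC, scikit-learn's LogisticRegressionCV is used (its l1_ratio plays the role of the mixing ratio α and its inverse regularization strength C the role of 1/λ), and pairwise Jaccard overlap across stratified subsamples serves as the stability summary. The grid values, resample counts, and all identifiers are illustrative assumptions.

```python
# Minimal sketch of a stability-aware alpha study (assumptions noted above;
# this is not the authors' implementation).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic stand-in for a high-dimensional expression matrix (p >> n).
X, y = make_classification(n_samples=200, n_features=1000,
                           n_informative=40, random_state=0)

def selected_features(X_tr, y_tr, l1_ratio):
    """Fit Elastic Net logistic regression with lambda chosen by CV and
    return the index set of non-zero coefficients."""
    model = LogisticRegressionCV(Cs=10, cv=5, penalty="elasticnet",
                                 solver="saga", l1_ratios=[l1_ratio],
                                 scoring="roc_auc", max_iter=5000)
    model.fit(X_tr, y_tr)
    return set(np.flatnonzero(model.coef_.ravel()))

def jaccard(a, b):
    """Jaccard overlap of two selected-feature index sets."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

# Repeated resampling: refit on subsamples and compare selections pairwise.
splitter = StratifiedShuffleSplit(n_splits=5, train_size=0.8, random_state=0)
for l1_ratio in (0.2, 0.5, 1.0):  # illustrative alpha grid; 1.0 recovers LASSO
    sets = [selected_features(X[idx], y[idx], l1_ratio)
            for idx, _ in splitter.split(X, y)]
    pairs = [jaccard(sets[i], sets[j])
             for i in range(len(sets)) for j in range(i + 1, len(sets))]
    print(f"alpha={l1_ratio}: mean Jaccard={np.mean(pairs):.3f}, "
          f"mean selected={np.mean([len(s) for s in sets]):.1f}")
```

The Dice coefficient (2|A∩B| / (|A| + |B|)) and the ARI reported in the abstract can be computed from the same selected-index sets; only the overlap summary changes, so the resampling loop is reused as-is.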




Published

2026-02-04

How to Cite

[1] F. Fahira, K. Sadik, C. Suhaeni, and A. M Soleh, “The Impact of the L1/L2 Ratio on Selection Stability and Solution Sparsity along the Elastic Net Regularization Path in High-Dimensional Genomic Data”, JAIC, vol. 10, no. 1, pp. 273–283, Feb. 2026.
