Principal Component Analysis (PCA) for Interval-Valued Symbolic Data: A Comparison of the Center and Vertex (TOPS) Methods

Benjamin Boono; Mabela Rostin Makengo; Kikomba Kahungu Michael; Mbuyi Lunkondo Patience; Kipulu Ngimbi Serge

doi:10.30871/jaic.v10i2.12381

Authors

Benjamin Boono, Department of Mathematics, Statistics and Computer Science, University of Kinshasa
Mabela Rostin Makengo, Department of Mathematics, Statistics and Computer Science, University of Kinshasa
Kikomba Kahungu Michael, Department of Exact Sciences, Higher Pedagogical Institute of Gombe–Kinshasa
Mbuyi Lunkondo Patience, Center for Interdisciplinary Research, National Pedagogical University (CRIDUPN)
Kipulu Ngimbi Serge, Computer Science Section, Higher Pedagogical and Technical Institute of Kinshasa

Keywords:

Symbolic Data Analysis (SDA), Principal Component Analysis (PCA), Centers, Vertices (TOPS), Dimensionality Reduction

Abstract

Classical dimensionality reduction techniques, such as Principal Component Analysis (PCA), are widely used to explore the structure of multivariate datasets. However, these methods are traditionally restricted to situations in which each variable takes a single numerical value per individual. Symbolic data, and interval-valued data in particular, lift this restriction: a variable may take a range of possible values for one individual, reflecting either measurement uncertainty or the intrinsic variability of the observation. Such data provide a more faithful representation of the complexity of observed phenomena, but they require specifically adapted analytical methods. This paper compares two PCA variants for interval-valued symbolic data: the Center method, in which each interval is represented by its midpoint, and the Vertex (TOPS) method, in which the lower and upper bounds of each interval are jointly exploited. We formally define interval-valued variables, present the algorithmic steps of both methods, analyze their computational complexity, and introduce evaluation metrics including explained variance, reconstruction error, and sensitivity to interval width. The objective is to assess how well each approach preserves the information contained within intervals and to determine which method is more appropriate for a given dataset. Using a biomedical dataset (n = 1021 individuals, p = 7 interval-valued variables), we show that while the Center method provides strong dimensional condensation and interpretability, the TOPS method more faithfully preserves the geometry of intervals in the presence of high variability. This study clarifies the theoretical differences between the two approaches and proposes a systematic evaluation framework for interval-valued symbolic PCA methods.
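
The contrast between the two methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`pca`, `centers_pca`, `vertices_pca`), the use of a correlation-matrix PCA, and the synthetic data are all assumptions made for illustration. The Center variant runs classical PCA on the n × p matrix of midpoints, while the Vertex variant stacks the 2^p corner points of each individual's hyperrectangle (for n = 1021 and p = 7, that is 1021 × 128 = 130,688 rows) and then reads off an interval of scores per individual on each component.

```python
import itertools
import numpy as np

def pca(X):
    """Classical PCA on standardized columns (assumes no constant column)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = (Z.T @ Z) / Z.shape[0]
    eigvals, eigvecs = np.linalg.eigh(corr)      # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1]            # re-sort to descending variance
    return eigvals[order], eigvecs[:, order], Z

def centers_pca(lower, upper):
    """Center method sketch: PCA on interval midpoints."""
    centers = (lower + upper) / 2.0
    eigvals, eigvecs, Z = pca(centers)
    return eigvals / eigvals.sum(), Z @ eigvecs  # explained-variance ratio, point scores

def vertices_pca(lower, upper):
    """Vertex method sketch: PCA on all 2**p corners of each hyperrectangle."""
    n, p = lower.shape
    # Each row of `choices` picks lower (0) or upper (1) for every variable.
    choices = np.array(list(itertools.product([0, 1], repeat=p)))   # (2**p, p)
    vertices = np.where(choices[None, :, :] == 1,
                        upper[:, None, :], lower[:, None, :])       # (n, 2**p, p)
    eigvals, eigvecs, Z = pca(vertices.reshape(n * 2**p, p))
    scores = (Z @ eigvecs).reshape(n, 2**p, p)
    # Each individual is summarized by the min and max score of its vertices.
    return eigvals / eigvals.sum(), scores.min(axis=1), scores.max(axis=1)

# Synthetic interval data (hypothetical, not the paper's biomedical dataset).
rng = np.random.default_rng(0)
mid = rng.normal(size=(20, 3))
half_width = rng.uniform(0.1, 1.0, size=(20, 3))
lower, upper = mid - half_width, mid + half_width

ratio_c, scores_c = centers_pca(lower, upper)
ratio_v, score_lo, score_hi = vertices_pca(lower, upper)
print("Center explained variance:", ratio_c)
print("Vertex explained variance:", ratio_v)
```

In this sketch the Center method returns one point per individual, so interval width plays no role in the fitted axes, whereas the Vertex method returns a score interval per individual at the cost of a 2^p-fold larger data matrix.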
