Principal Component Analysis (PCA) for Interval-Valued Symbolic Data: A Comparison of the Center and Vertex (TOPS) Methods
DOI:
https://doi.org/10.30871/jaic.v10i2.12381
Keywords:
Symbolic Data Analysis (SDA), Principal Component Analysis (PCA), Centers, Vertices (TOPS), Dimensionality Reduction
Abstract
Classical dimensionality reduction techniques, such as Principal Component Analysis (PCA), are widely used to explore the structure of multivariate datasets. However, these methods are traditionally restricted to situations in which each variable is represented by a single numerical value per individual. The emergence of symbolic data, particularly interval-valued data, has introduced new challenges in the field of data science. In this framework, a single variable may take multiple possible values for the same individual, reflecting either measurement uncertainty or intrinsic variability of the observation. Such data therefore provide a more faithful representation of the complexity of observed phenomena, but they require specifically adapted analytical methodologies. This paper compares two PCA variants applied to interval-valued symbolic data: the Center method, in which each interval is represented by its midpoint, and the Vertex (TOPS) method, in which the lower and upper bounds of each interval are jointly exploited. We formally define interval-valued variables, present the algorithmic steps of both the Center and TOPS methods, analyze their computational complexity, and introduce evaluation metrics including explained variance, reconstruction error, and sensitivity analysis with respect to interval width. The objective is to assess the extent to which these approaches preserve the information contained within intervals and to determine which method is more appropriate for a given dataset. Using a biomedical dataset (n = 1021 individuals, p = 7 interval-valued variables), we show that while the Center method provides strong dimensional condensation and interpretability, the TOPS method more faithfully preserves the geometry of intervals in the presence of high variability. This study clarifies the theoretical differences between the two approaches and proposes a systematic evaluation framework for interval-valued symbolic PCA methods.
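The two constructions named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`fit_pca`, `build_vertices`, `center_pca`, `vertex_pca`, `project_intervals`) are hypothetical, the data are synthetic, and the use of the covariance matrix (rather than a correlation matrix on standardized variables) is an assumption, since this page does not say which one the paper uses.

```python
import itertools
import numpy as np


def fit_pca(points):
    """Classical PCA on a (rows, p) array via the covariance matrix (assumed choice)."""
    mean = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder by decreasing variance
    return mean, eigvals[order], eigvecs[:, order]


def build_vertices(lower, upper):
    """Stack the 2**p corners of every individual's hyperrectangle: shape (n * 2**p, p)."""
    p = lower.shape[1]
    pick_upper = np.array(list(itertools.product([False, True], repeat=p)))
    corners = np.where(pick_upper[None, :, :], upper[:, None, :], lower[:, None, :])
    return corners.reshape(-1, p)


def center_pca(lower, upper):
    """Center method: classical PCA on the interval midpoints."""
    return fit_pca((lower + upper) / 2.0)


def vertex_pca(lower, upper):
    """Vertex method: classical PCA on the stacked hyperrectangle corners."""
    return fit_pca(build_vertices(lower, upper))


def project_intervals(lower, upper, mean, components):
    """Interval-valued scores: min and max of each component over the hyperrectangle."""
    a = (lower - mean)[:, :, None] * components[None, :, :]   # shape (n, p, k)
    b = (upper - mean)[:, :, None] * components[None, :, :]
    # A linear score is extremal at a corner, so the bounds separate per variable.
    return np.minimum(a, b).sum(axis=1), np.maximum(a, b).sum(axis=1)


# Synthetic intervals, made up for illustration only (not the paper's dataset).
rng = np.random.default_rng(0)
centers = rng.normal(size=(50, 3))
half_width = rng.uniform(0.1, 1.0, size=(50, 3))
lower, upper = centers - half_width, centers + half_width

mean_c, eig_c, vec_c = center_pca(lower, upper)
mean_v, eig_v, vec_v = vertex_pca(lower, upper)
print("center explained variance ratio:", eig_c / eig_c.sum())
print("vertex explained variance ratio:", eig_v / eig_v.sum())
score_lo, score_hi = project_intervals(lower, upper, mean_v, vec_v)
```

In this sketch the vertex matrix grows as n · 2^p rows. For the dimensions quoted in the abstract (n = 1021, p = 7) that would be 1021 × 128 = 130,688 rows, which is one reason the computational cost of the two methods differs.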
License
Copyright (c) 2026 Benjamin Boono, Mabela Rostin Makengo, Kikomba Kahungu Michael, Mbuyi Lunkondo Patience, Kipulu Ngimbi Serge

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.