The high-dimension, low-sample-size geometric representation holds under mild conditions. (English) Zbl 1135.62039
Summary: High-dimension, low-sample-size datasets have different geometrical properties from those of traditional low-dimensional data. In their asymptotic study of increasing dimensionality with a fixed sample size, P. Hall et al. [J. R. Stat. Soc., Ser. B 67, No. 3, 427–444 (2005; Zbl 1069.62097)] showed that the data vectors are approximately located at the vertices of a regular simplex in a high-dimensional space. A perhaps unappealing aspect of their result is the underlying assumption, which requires the variables, viewed as a time series, to be almost independent. We establish an equivalent geometric representation under much milder conditions using asymptotic properties of sample covariance matrices. We discuss implications of the results, such as the use of principal components analysis in a high-dimensional space, the extension to the case of non-independent samples, and the binary classification problem.
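The simplex geometry described in the summary can be illustrated numerically in the simplest possible setting. The sketch below is not taken from the paper: it assumes i.i.d. standard normal coordinates (far stronger than the mild conditions the summary refers to), and the function name `scaled_pairwise_distances` and its parameters are hypothetical. It draws a few vectors in a high dimension d and returns their pairwise distances divided by sqrt(d). Under this assumption the scaled distances should all be close to sqrt(2), which is what points at the vertices of a regular simplex would give.

```python
import numpy as np


def scaled_pairwise_distances(n=5, d=20_000, seed=0):
    """Pairwise Euclidean distances between n Gaussian vectors, divided by sqrt(d).

    Assumption (illustration only): coordinates are i.i.d. N(0, 1).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))  # n samples, d variables, with n much smaller than d

    # Squared distances from the Gram matrix:
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 <x_i, x_j>
    gram = X @ X.T
    sq_norms = np.diag(gram)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram

    # Keep each unordered pair once (strict upper triangle).
    upper = np.triu_indices(n, k=1)
    # Clip tiny negative values caused by floating-point rounding before the square root.
    return np.sqrt(np.maximum(sq_dists[upper], 0.0)) / np.sqrt(d)


if __name__ == "__main__":
    dist = scaled_pairwise_distances()
    print("scaled pairwise distances:", np.round(dist, 3))
    print("reference value sqrt(2):  ", round(float(np.sqrt(2)), 3))
```

Rerunning with a smaller `d` (for example `d=10`) should show a visibly larger spread among the distances, so that the near-equilateral configuration only emerges as the dimension grows with the sample size held fixed.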
##### MSC:
 62H05 Characterization and structure theory (Multivariate analysis)
 62H25 Factor analysis and principal components; correspondence analysis