
The analysis of multivariate data using semi-definite programming. (English) Zbl 1331.62293

Summary: A model is presented for analyzing general multivariate data. The model takes as its prime objective the dimensionality reduction of the multivariate problem. The only requirement of the model is that the input to the statistical analysis be a covariance matrix, a correlation matrix, or, more generally, a positive semi-definite matrix. The model is parameterized by a scale parameter and a shape parameter, both of which take on non-negative values smaller than unity. We first prove a well-known heuristic for minimizing rank and establish the conditions under which rank can be replaced with trace. This result allows us to solve our rank minimization problem as a Semi-Definite Programming (SDP) problem using any of a number of available solvers. We then apply the model to four case studies dealing with four well-known problems in multivariate analysis. The first problem is to determine the number of underlying factors in factor analysis (FA) or the number of retained components in principal component analysis (PCA). It is shown that our model determines the number of factors or components more efficiently than the commonly used methods. The second example deals with a problem that has received much attention in recent years due to its wide applications: sparse principal components and variable selection in PCA. When applied to a data set known in the literature as the pitprop data, our approach yields PCs with larger variances than PCs derived from other approaches. The third problem concerns sensitivity analysis of multivariate models, a topic not widely researched due to its difficulty. Finally, we apply the model to a difficult problem in PCA known as lack of scale invariance in the solutions of PCA: the solutions derived from analyzing the covariance matrix in PCA are generally different from (and not linearly related to) the solutions derived from analyzing the correlation matrix. Using our model, we obtain the same solution whether we analyze the correlation matrix or the covariance matrix, since the analysis utilizes only the signs of the correlations/covariances, not their values. This leads us to introduce a new type of PCA, called Sign PCA, whose applications in the social sciences and other fields we speculate on.
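The rank-to-trace replacement mentioned in the summary rests on a basic fact about positive semi-definite matrices: the rank is the number of nonzero eigenvalues, while the trace is their sum, so for PSD matrices the trace coincides with the nuclear norm, the standard convex surrogate for rank. A minimal numerical sketch of this relationship (generic illustration, not the paper's specific formulation):

```python
import numpy as np

# Build a random rank-2 positive semi-definite 5x5 matrix.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 2))
X = B @ B.T  # X = B B^T is PSD with rank 2 (almost surely)

eigvals = np.linalg.eigvalsh(X)  # real eigenvalues, ascending

# For a PSD matrix: rank = count of nonzero eigenvalues,
# trace = sum of eigenvalues = nuclear norm (all eigenvalues >= 0).
rank_X = int(np.sum(eigvals > 1e-10))
trace_X = np.trace(X)

print(rank_X)                               # 2
print(np.isclose(trace_X, eigvals.sum()))   # True
```

Because the trace is linear (hence convex) while rank is not, minimizing the trace subject to the problem's constraints yields a tractable SDP that any standard solver can handle, which is the computational payoff the summary alludes to.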
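The scale-invariance claim for Sign PCA follows from the identity corr(i,j) = cov(i,j) / (s_i s_j) with positive standard deviations s_i, s_j, so the correlation and covariance matrices always agree entrywise in sign. A small hypothetical check (the data and variable names are illustrative; the actual Sign PCA procedure is defined in the paper):

```python
import numpy as np

# Simulated data with an induced correlation between columns 0 and 1.
rng = np.random.default_rng(1)
data = rng.standard_normal((100, 4))
data[:, 1] += 0.5 * data[:, 0]

R = np.corrcoef(data, rowvar=False)  # correlation matrix
S = np.cov(data, rowvar=False)       # covariance matrix

# corr(i,j) = cov(i,j) / (s_i * s_j) with s_i, s_j > 0, so the two
# matrices have identical signs; any analysis using only the signs
# therefore gets the same input from either matrix.
sign_R = np.sign(R)
sign_S = np.sign(S)
print(np.array_equal(sign_R, sign_S))  # True
```

This is why the summary's model is indifferent to whether the correlation or the covariance matrix is analyzed: the sign pattern, the only information it consumes, is identical in both.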

MSC:

62H25 Factor analysis and principal components; correspondence analysis
62-07 Data analysis (statistics) (MSC2010)
90C90 Applications of mathematical programming
