
zbMATH — the first resource for mathematics

An exposition of multivariate analysis with the singular value decomposition in R. (English) Zbl 06983898
Summary: ExPosition is a new comprehensive R package that provides crisp graphics and implements multivariate analysis methods based on the singular value decomposition (SVD). The core techniques implemented in ExPosition are principal components analysis, (metric) multidimensional scaling, correspondence analysis, and several of their recent extensions, such as barycentric discriminant analyses (e.g., discriminant correspondence analysis), multi-table analyses (e.g., multiple factor analysis, STATIS, and DISTATIS), and non-parametric resampling techniques (e.g., permutation tests and the bootstrap). Several examples highlight the major differences between ExPosition and similar packages. Finally, the future directions of ExPosition are discussed.
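Every core technique in ExPosition rests on the SVD. The following is a minimal NumPy sketch of that idea. It is not ExPosition's R code or API. It shows how principal components analysis and simple correspondence analysis each reduce to a single SVD call: PCA on the column-centered data matrix, and CA on the matrix of standardized residuals of a contingency table. The function names `pca_svd` and `ca_svd` are hypothetical. The sketch assumes the contingency table has no empty rows or columns, and it applies none of the additional normalizations, graphics, or resampling the package offers.

```python
import numpy as np


def pca_svd(X):
    """PCA via the SVD of the column-centered data (hypothetical helper).

    Returns factor scores F, loadings V, and eigenvalues (component
    variances, computed with an n - 1 denominator).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    F = U * s                               # factor scores: U diag(s) = Xc V
    eigenvalues = s**2 / (X.shape[0] - 1)
    return F, Vt.T, eigenvalues


def ca_svd(N):
    """Simple correspondence analysis via the SVD (hypothetical helper).

    Assumes every row and column of the table N has a positive total.
    """
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                         # correspondence matrix
    r = P.sum(axis=1)                       # row masses
    c = P.sum(axis=0)                       # column masses
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U * s) / np.sqrt(r)[:, None]       # row principal coordinates
    G = (Vt.T * s) / np.sqrt(c)[:, None]    # column principal coordinates
    return F, G, s**2                       # s**2 are the principal inertias


# Usage on small illustrative inputs (random data and a made-up table).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
F, V, eig = pca_svd(X)
print(np.allclose(eig, F.var(axis=0, ddof=1)))   # True

table = np.array([[10, 5, 3], [4, 12, 6], [2, 3, 15]])
Fr, Gc, inertia = ca_svd(table)
print(inertia.sum())                        # total inertia = chi-square / n
```

Only the preprocessing of the matrix passed to `np.linalg.svd` differs between the two methods. The thin SVD (`full_matrices=False`) keeps the factors no larger than the data require.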
MSC:
62-XX Statistics