zbMATH — the first resource for mathematics

Weighted Euclidean biplots. (English) Zbl 1364.62151
Summary: We construct a weighted Euclidean distance that approximates any distance or dissimilarity measure between individuals that is based on a rectangular cases-by-variables data matrix. In contrast to regular multidimensional scaling methods for dissimilarity data, our approach leads to biplots of individuals and variables while preserving all the good properties of dimension-reduction methods that are based on the singular-value decomposition. The main benefits are the decomposition of variance into components along principal axes, which provide the numerical diagnostics known as contributions, and the estimation of nonnegative weights for each variable. The idea is inspired by the distance functions used in correspondence analysis and in principal component analysis of standardized data, where the normalizations inherent in the distances can be considered as differential weighting of the variables. In weighted Euclidean biplots, we allow these weights to be unknown parameters, which are estimated from the data to maximize the fit to the chosen distances or dissimilarities. These weights are estimated using a majorization algorithm. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing the matrix and displaying its rows and columns in biplots.
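The pipeline described in the summary (weight the variables, fit the weights to a target dissimilarity, then decompose by SVD) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `weighted_euclidean`, `fit_weights`, and `biplot_coordinates` are hypothetical, rows are given equal mass, and the weight-fitting step uses plain projected gradient descent on squared distances as a simple stand-in. It is not the majorization algorithm the summary refers to, which fits the distances themselves.

```python
import numpy as np


def weighted_euclidean(X, w):
    """Pairwise distances d_ij = sqrt(sum_k w_k * (x_ik - x_jk)^2)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2 * w).sum(axis=2))


def fit_weights(X, target, n_iter=2000):
    """Estimate nonnegative variable weights from target dissimilarities.

    Assumption: least squares on SQUARED distances, solved by projected
    gradient descent. This is a stand-in for illustration, not the
    majorization algorithm of the paper.
    """
    n, p = X.shape
    iu = np.triu_indices(n, k=1)                    # each pair once
    A = ((X[:, None, :] - X[None, :, :]) ** 2)[iu]  # (pairs, p) squared differences
    b = target[iu] ** 2                             # squared target dissimilarities
    step = 1.0 / np.linalg.norm(A, 2) ** 2          # 1 / Lipschitz constant of the gradient
    w = np.ones(p)
    for _ in range(n_iter):
        # gradient step on 0.5 * ||A w - b||^2, then projection onto w >= 0
        w = np.maximum(w - step * (A.T @ (A @ w - b)), 0.0)
    return w


def biplot_coordinates(X, w, ndim=2):
    """SVD of the centered, column-weighted matrix (equal row masses assumed)."""
    S = (X - X.mean(axis=0)) * np.sqrt(w)
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = U[:, :ndim] * s[:ndim]          # row principal coordinates
    cols = Vt[:ndim].T                     # column standard coordinates
    explained = s ** 2 / (s ** 2).sum()    # share of variance per principal axis
    return rows, cols, explained
```

With all dimensions retained, Euclidean distances between the row coordinates reproduce the weighted Euclidean distances exactly, which is why the fitted weights can be folded into an ordinary SVD-based biplot. A call such as `fit_weights(X, D)` takes a cases-by-variables array `X` and a square dissimilarity array `D` computed on the same cases.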

MSC:
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H25 Factor analysis and principal components; correspondence analysis
Software: PROXSCAL; R; sedaR; smacof