Horseshoes in multidimensional scaling and local kernel methods. (English) Zbl 1149.62316

Summary: Classical multidimensional scaling (MDS) is a method for visualizing high-dimensional point clouds by mapping to low-dimensional Euclidean spaces. This mapping is defined in terms of eigenfunctions of a matrix of interpoint dissimilarities. We analyze in detail multidimensional scaling applied to a specific dataset: the 2005 United States House of Representatives roll call votes. Certain MDS and kernel projections output “horseshoes” that are characteristic of dimensionality reduction techniques. We show that, in general, a latent ordering of the data gives rise to these patterns when one only has local information. That is, when only the interpoint distances for nearby points are known accurately. Our results provide a rigorous set of results and insight into manifold learning in the special case where the manifold is a curve.


62H99 Multivariate analysis
91C15 One- and multidimensional scaling in the social and behavioral sciences
91F10 History, political science
Full Text: DOI arXiv


[1] Albouy, A. (2004). Mutual distances in celestial mechanics. Lectures at Nankai Institute, Tianjin, China. Available at http://www.imcce.fr/fr/presentation/equipes/ASD/preprints/prep.2004/Albouy_Nankai09_2004.pdf.
[2] Americans for Democratic Action (2005). ADA Congressional voting record-U.S. House of Representatives. Available at http://www.adaction.org.
[3] Bogomolny, E., Bohigas, O. and Schmit, C. (2003). Spectral properties of distance matrices. J. Phys. A : Math. Gen. 36 3595-3616. Available at http://www.citebase.org/abstract?id=oai:arXiv.org:nlin/0301044. · Zbl 1057.15027 · doi:10.1088/0305-4470/36/12/341
[4] Borchardt, C. W. (1866). Ueber die aufgabe des maximum, welche der bestimmung des tetraeders von grösstem volumen bei gegebenem flächeninhalt der seitenflächen für mehr als drei dimensionen entspricht. Math. Abhand. Akad. Wiss. Berlin 121-155.
[5] Borg, I. and Groenen, P. (1997). Modern Multidimensional Scaling : Theory and Applications . Springer, New York. · Zbl 0862.62052
[6] Burden, B. C., Caldeira, G. A. and Groseclose, T. (2000). Measuring the ideologies of U. S. senators: The song remains the same. Legislative Studies Quarterly 25 237-258.
[7] Cantoni, A. and Butler, P. (1976). Eigenvalues and eigenvectors of symmetric centrosymmetrlc matrices. Linear Algebra Appl. 13 275-288. · Zbl 0326.15007 · doi:10.1016/0024-3795(76)90101-4
[8] Clinton, J., Jackman, S. and Rivers, D. (2004). The statistical analysis of roll call data. American Political Science Review 355-370.
[9] Coombs, C. H. (1964). A Theory of Data . Wiley, New York.
[10] Cox, T. F. and Cox, M. A. A. (2000). Multidimensional Scaling . Chapman and Hall, London. · Zbl 1147.68460
[11] Diaconis, P., Goel, S. and Holmes, S. (2008). Supplement to “Horseshoes in multidimensional scaling and local kernel methods.” DOI: 10.1214/08-AOAS165SUPP. · Zbl 1149.62316
[12] de Leeuw, J. (2005). Multidimensional unfolding. In Encyclopedia of Statistics in Behavioral Science . Wiley, New York.
[13] de Leeuw, J. (2007). A horseshoe for multidimensional scaling. Technical report, UCLA.
[14] Dufrene, M. and Legendre, P. (1991). Geographic structure and potential ecological factors in Belgium. J. Biogeography . Available at http://links.jstor.org/sici?sici=0305-0270(199105)18
[15] Good, I. J. (1970). The inverse of a centrosymmetric matrix. Technometrics 12 925-928. · Zbl 0194.05903 · doi:10.2307/1267339
[16] Guttman, L. (1968). A general nonmetric technique for finding the smallest coordinate space for a configuration of \cdots . Psychometrika . Available at http://www.springerlink.com/index/AG2018142W42704L.pdf. · Zbl 0205.49302
[17] Heckman, J. J. and Snyder, J. M. (1997). Linear probability models of the demand for attributes with an empirical application to estimating the preferences of legislators. RAND J. Economics 28 S142-S189.
[18] Hill, M. O. and Gauch, H. G. (1980). Detrended correspondence analysis, an improved ordination technique. Vegetatio 42 47-58.
[19] Iwatsubo, S. (1984). The analytical solutions of an eigenvalue problem in the case of applying optimal scoring method to some types of data. In Data Analysis and Informatics III 31-40. North-Holland, Amsterdam. · Zbl 0566.62046
[20] Kendall, D. G. (1970). A mathematical approach to seriation. Phil. Trans. Roy. Soc. London 269 125-135.
[21] Mardia, K., Kent, J. and Bibby, J. (1979). Multivariate Analysis . Academic Press, New York. · Zbl 0432.62029
[22] Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15 1373-1396. Available at http://www.mitpressjournals.org/doi/abs/10.1162/089976603321780317. · Zbl 1085.68119 · doi:10.1162/089976603321780317
[23] Office of the Clerk-U.S. House of Representatives. (2005). U.S. House of Representatives roll call votes 109th Congress-1st session. Available at http://clerk.house.gov.
[24] Palmer, M. (2008). Ordination methods for ecologists. Available at http://ordination.okstate.edu/.
[25] Parlett, B. N. (1980). The Symmetric Eigenvalue Problem . Prentice Hall, Englewood Cliffs, NJ. · Zbl 0431.65017
[26] Podani, J. and Miklos, I. (2002). Resemblance coefficients and the horseshoe effect in principal coordinates analysis. Ecology 3331-3343.
[27] Roweis, S. T. and Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science 2323-2326.
[28] Schoenberg, I. J. (1935). Remarks to Maurice Fréchet’s article “Sur la définition axiomatique d’une classe d’espace distanciés vectoriellement applicable sur l’espace de Hilbert.” Ann. of Math. ( 2 ) 36 724-732. · Zbl 0012.30703 · doi:10.2307/1968654
[29] Schölkopf, B., Smola, A. and Muller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation 10 1299-1319. Available at http://www.mitpressjournals.org/doi/abs/10.1162/089976698300017467.
[30] Shepard, R. N. (1962). The analysis of proximities: Multidimensional scaling with an unknown distance function. I. Psychometrika 27 125-140. · Zbl 0129.12103 · doi:10.1007/BF02289630
[31] Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence 22 888-905. Available at citeseer.ist.psu.edu/shi97normalized.html.
[32] Tenenbaum, J. B., de Silva, V. and Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science 2319-2323.
[33] ter Braak, C. (1985). Correspondence analysis of incidence and abundance data: Properties in terms of a unimodal response \cdots . Biometrics . Available at http://links.jstor.org/sici?sici=0006-341X(198512)41
[34] ter Braak, C. J. F. (1987). Ordination. In Data Analysis in Community and Landscape Ecology 81-173. Center for Agricultural Publishing and Documentation. Wageningen, The Netherlands.
[35] ter Braak, C. and Prentice, I. (1988). A theory of gradient analysis. Advances in Ecological Research . Available at http://cat.inist.fr/?aModele=afficheN&cpsidt=7248779.
[36] Torgerson, W. S. (1952). Multidimensional scaling. I. Theory and method. Psychometrika 17 401-419. · Zbl 0049.37603 · doi:10.1007/BF02288916
[37] von Luxburg, U., Belkin, M. and Bousquet, O. (2008). Consistency of spectral clustering. Ann. Statist. 36 555-586. · Zbl 1133.62045 · doi:10.1214/009053607000000640
[38] Wartenberg, D., Ferson, S. and Rohlf, F. (1987). Putting things in order: A critique of detrended correspondence analysis. The American Naturalist . Available at http://links.jstor.org/sici?sici=0003-0147(198703)129
[39] Weaver, J. R. (1985). Centrosymmetric (cross-symmetric) matrices, their basic properties, eigenvalues, and eigenvectors. Amer. Math. Monthly 92 711-717. · Zbl 0619.15021 · doi:10.2307/2323222
[40] Williams, C. K. (2000). On a connection between kernel PCA and metric multidimensional scaling. In NIPS 675-681.
[41] Young, G. and Householder, A. S. (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika 3 19-22. · JFM 64.1302.04
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.