×

Euclidean distance matrix completion and point configurations from the minimal spanning tree. (English) Zbl 1384.15005

Summary: The paper introduces a special case of the Euclidean distance matrix completion problem of interest in statistical data analysis where only the minimal spanning tree distances are given and the matrix completion must preserve the minimal spanning tree. Two solutions are proposed: one an adaptation of a more general method based on a dissimilarity parameterized formulation and the other an entirely novel method which constructs the point configuration directly through a guided random search. These methods as well as three standard edcmp methods are described and compared experimentally on real and synthetic data. It is found that the constructive method given by the guided random search algorithm clearly outperforms all others considered here. Notably, standard methods including the adaptation force peculiar and generally unwanted geometric structure on the point configurations their completions produce.

MSC:

15A83 Matrix completion problems
05C50 Graphs and linear algebra (matrices, eigenvalues, etc.)
05C05 Trees
62-07 Data analysis (statistics) (MSC2010)

Software:

vegan; Qhull; LBFGS-B
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] A. Y. Alfakih, A. Khandani, and H. Wolkowicz, {\it Solving Euclidean distance matrix completion problems via semidefinite programming}, Comput. Optim. Appl., 12 (1999), pp. 13-30. · Zbl 1040.90537
[2] E. Anderson, {\it The irises of the Gaspe Peninsula}, Bull. Amer. Iris Soc., 59 (1935), pp. 2-5.
[3] C. B. Barber, D. P. Dobkin, and H. Huhdanpaa, {\it The quickhull algorithm for convex hulls}, ACM Trans. Math. Softw., 22 (1996), pp. 469-483, . · Zbl 0884.65145
[4] R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, {\it A limited memory algorithm for bound constrained optimization}, SIAM J. Sci. Comput., 16 (1995), pp. 1190-1208. · Zbl 0836.65080
[5] I. Dokmanic, R. Parhizkar, J. Ranieri, and M. Vetterli, {\it Euclidean distance matrices: Essential theory, algorithms, and applications}, Signal Process. Mag., 32 (2015), pp. 12-30.
[6] H. Edelsbrunner, {\it Alpha shapes}—{\it A survey}, Tessellations Sci., 27 (2010), pp. 1-25.
[7] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, {\it A density-based algorithm for discovering clusters in large spatial databases with noise}, in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, 1996, pp. 226-231.
[8] H. Fang and D. P. O’Leary, {\it Euclidean distance matrix completion problems}, Optim. Methods Softw., 27 (2012), pp. 695-717. · Zbl 1252.49046
[9] R. A. Fisher, {\it The use of multiple measurements in taxonomic problems}, Ann. Eugenics, 7 (1936), pp. 179-188.
[10] J. H. Friedman and L. C. Rafsky, {\it Multivariate Generalizations of the Wald-Wolfowitz and Smirnov two-sample tests}, Ann. Statisti., 7 (1979), pp. 697-717, . · Zbl 0423.62034
[11] J. H. Friedman and L. C. Rafsky, {\it Graphics for the multivariate two-sample problem}, J. Amer. Statist. Assoc., 76 (1981), pp. 277-287, .
[12] J. C. Gower and G. B. Dijksterhuis, {\it Procrustes Problems}, vol. 30, Oxford University Press, Oxford, 2004. · Zbl 1057.62044
[13] J. C. Gower and G. Ross, {\it Minimum spanning trees and single linkage cluster analysis}, Appl. Statist., 1969, pp. 54-64.
[14] I. G. Grooms, R. M. Lewis, and M. W. Trosset, {\it Molecular embedding via a second order dissimilarity parameterized approach}, SIAM J. Sci. Comput., 31 (2009), pp. 2733-2756. · Zbl 1193.92039
[15] C. R. Johnson and P. Tarazaga, {\it Connections between the real positive semidefinite and distance matrix completion problems}, Linear Algebra Appl., 223 (1995), pp. 375-391. · Zbl 0827.15032
[16] N. Krislock and H. Wolkowicz, {\it Explicit sensor network localization using semidefinite representations and facial reductions}, SIAM J. Optim., 20 (2010), pp. 2679-2708. · Zbl 1229.90250
[17] N. Krislock and H. Wolkowicz, {\it Euclidean distance matrices and applications}, in Handbook on Semidefinite, Cone and Polynomial Optimization: Theory, Algorithms, Software and Applications, M. Anjos and J. Lasserre, eds., vol. 166 of International Series in Operations Research and Management Science, Springer Science & Business Media, New York, 2011, pp. 879-914. · Zbl 1334.90109
[18] J. Oksanen, F. G. Blanchet, M. Friendly, R. Kindt, P. Legendre, D. McGlinn, P. R. Minchin, R. B. O’Hara, G. L. Simpson, P. Solymos, M. H. H. Stevens, E. Szoecs, and H. Wagner, {\it Vegan: Community Ecology Package}, R package version 2.4-3 ed., 2017, .
[19] F. Schmid and R. Schmidt, {\it Multivariate extensions of Spearman’s rho and related statistics}, Statist. Probab. Lett., 77 (2007), pp. 407-416. · Zbl 1108.62056
[20] C. Spearman, {\it The proof and measurement of association between two things}, Amer. J. Psychol., 15 (1904), pp. 72-101.
[21] W. Stuetzle, {\it Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample}, J. Classification, 20 (2003), pp. 25-47. · Zbl 1055.62075
[22] J. B. Tenenbaum, V. De Silva, and J. C. Langford, {\it A global geometric framework for nonlinear dimensionality reduction}, Science, 290 (2000), pp. 2319-2323.
[23] M. W. Trosset, {\it Computing distances between convex sets and subsets of the positive semidefinite matrices}, Technical Report 97-3 MS 134, Rice University, Houston, TX, 1997.
[24] M. W. Trosset, {\it Distance matrix completion by numerical optimization}, Comput. Optim. Appl., 17 (2000), pp. 11-22. · Zbl 0964.65047
[25] L. Wilkinson, A. Anand, and R. Grossman, {\it Graph-theoretic scagnostics}, in IEEE Symposium on Information Visualization, Washington, DC, IEEE Computer Society, 2005, pp. 157-164.
[26] C. T. Zahn, {\it Graph-theoretical methods for detecting and describing gestalt clusters}, IEEE Trans. Comput., 100 (1971), pp. 68-86. · Zbl 0264.68040
[27] C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal, {\it Lbfgs-b: Fortran subroutines for large-scale bound constrained optimization}, Report NAM-11, Northwestern University, Evanston, IL, 1994. · Zbl 0912.65057
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.