×

Approximating snowflake metrics by trees. (English) Zbl 1393.68180

Summary: Tree metrics are encountered throughout pure and applied mathematics. Their simple structure makes them a convenient choice of metric in many applications from machine learning and computer science. At the same time, there is an elegant theory of harmonic analysis with respect to tree metrics that parallels the classical theory.
A basic question in this field, which is of both theoretical and practical interest, is how to design efficient algorithms for building trees with good metric properties. In particular, given a finite metric space, we seek a random family of dominating tree metrics approximating the underlying metric in expectation. For general metrics, this problem has been solved: on the one hand, there are finite metric spaces that cannot be approximated by trees without incurring a distortion logarithmic in the size of the space, while the tree construction of J. Fakcharoenphol et al. [in: Proceedings of the 35th annual ACM symposium on theory of computing, STOC’03. New York, NY: ACM Press. 448–455 (2003; Zbl 1192.68977)] shows how to achieve such a logarithmic error for arbitrary metrics. Since a distortion that grows even logarithmically with the size of the set may be too large for practical use in many settings, one naturally asks if there is a more restricted class of metrics where one can do better. The main result of this paper is that certain random family of trees already studied in the computer science literature, including the FRT trees, can be used to approximate snowflake metrics (metrics raised to a power less than 1) with expected distortion bounded by its doubling dimension and the degree of snowflaking. We also show that without snowflaking, the metric distortion can be bounded by a term logarithmic in the distance being approximated and linear in the dimension.
We also present an optimal algorithm for building a single FRT tree, whose running time is bounded independently of all problem parameters other than the number of points. We conclude by demonstrating our theoretical results on a numerical example, and applying them to the approximation of the Earth Mover’s Distance between probability distributions.

MSC:

68W05 Nonnumerical algorithms
05C05 Trees
54E35 Metric spaces, metrizability
68W25 Approximation algorithms

Citations:

Zbl 1192.68977

Software:

EMD
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Gavish, M.; Coifman, R., Harmonic analysis of digital databases, (Cohen, J.; Zayed, A. I., Wavelets and Multiscale Analysis, (2011), Birkhäuser), 161-197 · Zbl 1250.68096
[2] Gavish, M.; Coifman, R. R., Sampling, denoising and compression of matrices by coherent matrix organization, Appl. Comput. Harmon. Anal., 33, 3, 354-369, (2012) · Zbl 1256.65036
[3] Gavish, M.; Nadler, B.; Coifman, R. R., Multiscale wavelets on trees, graphs and high dimensional data: theory and applications to semi supervised learning, (Proceedings of the 27th International Conference on Machine Learning, ICML-10, (2010)), 367-374
[4] Bartal, Y., Probabilistic approximation of metric spaces and its algorithmic applications, (37th Annual Symposium on Foundations of Computer Science, (1996), IEEE), 184-193
[5] Bartal, Y., On approximating arbitrary metrics by tree metrics, (Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, (1998), ACM Press), 161-168 · Zbl 1029.68951
[6] Alon, N.; Karp, R. M.; Peleg, D.; West, D., A graph-theoretic game and its application to the k-server problem, SIAM J. Comput., 24, 1, 78-100, (1995) · Zbl 0818.90147
[7] Charikar, M.; Chekuri, C.; Goel, A.; Guha, S.; Plotkin, S., Approximating a finite metric by a small number of tree metrics, (Proceedings of the 39th Annual Symposium on Foundations of Computer Science, (1998), IEEE), 379-388
[8] Gupta, A., Steiner points in tree metrics don’t (really) help, (Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms, (2001), Society for Industrial and Applied Mathematics), 220-227 · Zbl 0990.05033
[9] Fakcharoenphol, J.; Rao, S.; Talwar, K., A tight bound on approximating arbitrary metrics by tree metrics, (Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, (2003), ACM), 448-455 · Zbl 1192.68977
[10] Charikar, M. S., Similarity estimation techniques from rounding algorithms, (Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, (2002)), 380-388 · Zbl 1192.68226
[11] Marinai, S.; Miotti, B.; Soda, G., Using Earth Mover’s distance in the bag-of-visual-words model for mathematical symbol retrieval, (2011 International Conference on Document Analysis and Recognition, (2011)), 1309-1313
[12] Rubner, Y.; Tomasi, C.; Guibas, L. J., The Earth Mover’s distance as a metric for image retrieval, Int. J. Comput. Vis., 40, 2, 99-121, (2000) · Zbl 1012.68705
[13] Sandler, R.; Lindenbaum, M., Nonnegative matrix factorization with Earth Mover’s distance metric for image analysis, IEEE Trans. Pattern Anal. Mach. Intell., 33, 8, 1590-1602, (2011)
[14] Wan, X., A novel document similarity measure based on Earth Mover’s distance, Inform. Sci., 177, 18, 3718-3730, (2007)
[15] Heinonen, J., Lectures on analysis on metric spaces, (2001), Springer-Verlag · Zbl 0985.46008
[16] Gupta, A.; Krauthgamer, R.; Lee, J. R., Bounded geometries, fractals, and low-distortion embeddings, (Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, (2003))
[17] Assouad, P., Plongements lipschitziens dans \(\mathbb{R}^n\), Bull. Soc. Math. France, 111, 4, 429-448, (1983) · Zbl 0597.54015
[18] Naor, A.; Neiman, O., Assouad’s theorem with dimension independent of the snowflaking, Rev. Mat. Iberoam., 28, 4, 1-21, (2012) · Zbl 1260.46016
[19] Karger, D. R.; Ruhl, M., Finding nearest neighbors in growth-restricted metrics, (Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, (2002)), 741-750 · Zbl 1192.68750
[20] Deng, D. G.; Han, Y., Harmonic analysis on spaces of homogeneous type, (2009), Springer · Zbl 1158.43002
[21] Gupta, A.; Krauthgamer, R.; Lee, J. R., Bounded geometries, fractals, and low-distortion embeddings, (Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, (2003), IEEE), 534-543
[22] Beygelzimer, A.; Kakade, S.; Langford, J., Cover trees for nearest neighbor, (Proceedings of the 23rd International Conference on Machine Learning, ICML-10, (2006)), 97-104
[23] Chan, T.-H. H.; Dhamdhere, K.; Gupta, A.; Kleinberg, J.; Slivkins, A., Metric embeddings with relaxed guarantees, SIAM J. Comput., 38, 6, 2303-2329, (2009) · Zbl 1191.68348
[24] Hildrum, K.; Kubiatowicz, J.; Ma, S.; Rao, S., A note on the nearest neighbor in growth-restricted metrics, (Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, (2004), Society for Industrial and Applied Mathematics), 560-561 · Zbl 1318.68069
[25] Abraham, I.; Gavoille, C.; Goldberg, A. V.; Malkhi, D., Routing in networks with low doubling dimension, (Proceedings of the 26th IEEE International Conference on Distributed Computing Systems, (2006), IEEE), 1-10
[26] Chan, H. T.-H.; Gupta, A.; Maggs, B. M.; Zhou, S., On hierarchical routing in doubling metrics, (Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, (2005)), 762-771 · Zbl 1297.68090
[27] Dasgupta, S.; Freund, Y., Random projection trees and low dimensional manifolds, (Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, (2008), ACM), 537-546 · Zbl 1231.68114
[28] Belkin, M.; Niyogi, P., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput., 15, 6, 1373-1396, (2003) · Zbl 1085.68119
[29] Coifman, R. R.; Lafon, S., Diffusion maps, Appl. Comput. Harmon. Anal., 21, 1, 5-30, (2006) · Zbl 1095.68094
[30] Brudnyi, A.; Brudnyi, Y., Methods of geometric analysis in extension and trace problems, vol. 1, (2012), Springer · Zbl 1253.46001
[31] Semmes, S., On the nonexistence of bilipschitz parameterizations and geometric problems about \(A_\infty\)-weights, Rev. Mat. Iberoam., 12, 2, 337-410, (1996) · Zbl 0858.46017
[32] Meyer, Y., Wavelets and operators, (1992), Cambridge University Press · Zbl 0776.42019
[33] Lee, J. R.; Naor, A., Extending Lipschitz functions via random metric partitions, Invent. Math., 1, 59-95, (2005) · Zbl 1074.46004
[34] Shirdhonkar, S.; Jacobs, D. W., Approximate Earth Mover’s distance in linear time, (IEEE Conference on Computer Vision and Pattern Recognition, (2008)), 1-8
[35] Pele, O.; Werman, M., Fast and robust Earth Mover’s distances, (2009 IEEE 12th International Conference on Computer Vision, (2009), IEEE), 460-467
[36] Leeb, W., The mixed Lipschitz space and its dual for tree metrics, Appl. Comput. Harmon. Anal., (2016)
[37] Leeb, W.; Coifman, R., Hölder-Lipschitz norms and their duals on spaces with semigroups, with applications to Earth Mover’s distance, J. Fourier Anal. Appl., 22, 4, 910-953, (2016) · Zbl 1358.46025
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.