×

Randomized partition trees for nearest neighbor search. (English) Zbl 1311.68148

Summary: The \(k\)-d tree was one of the first spatial data structures proposed for nearest neighbor search. Its efficacy is diminished in high-dimensional spaces, but several variants, with randomization and overlapping cells, have proved to be successful in practice. We analyze three such schemes. We show that the probability that they fail to find the nearest neighbor, for any data set and any query point, is directly related to a simple potential function that captures the difficulty of the point configuration. We then bound this potential function in several situations of interest: when the data are drawn from a doubling measure; when the data and query distributions are identical and are supported on a set of bounded doubling dimension; and when the data are documents from a topic model.

MSC:

68T20 Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)
68P20 Information storage and retrieval of data
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Ailon, N., Chazelle, B.: The fast Johnson-Lindenstrauss transform and approximate nearest neighbors. SIAM J. Comput. 39, 302-322 (2009) · Zbl 1185.68327 · doi:10.1137/060673096
[2] Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117-122 (2008) · doi:10.1145/1327452.1327494
[3] Arya, S., Mount, D., Netanyahu, N., Silverman, R., Wu, A.: An optimal algorithm for approximate nearest neighbor searching. J. ACM 45, 891-923 (1998) · Zbl 1065.68650 · doi:10.1145/293347.293348
[4] Assouad, P.: Plongements lipschitziens dans \[{\mathbb{R}}^nRn\]. Bull. Soc. Math. France 111(4), 429-448 (1983) · Zbl 0597.54015
[5] Bentley, J.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509-517 (1975) · Zbl 0306.68061 · doi:10.1145/361002.361007
[6] Beygelzimer, A., Kakade, S., Langford, J.: Cover trees for nearest neighbor. In: Proceedings of the 23rd International Conference on Machine Learning (2006) · Zbl 0994.54501
[7] Cayton, L., Dasgupta, S.: A learning framework for nearest-neighbor search. In: Advances in Neural Information Processing Systems (2007) · Zbl 0306.68061
[8] Clarkson, K.: Nearest neighbor queries in metric spaces. Discret. Comput. Geom. 22, 63-93 (1999) · Zbl 0994.54501 · doi:10.1007/PL00009449
[9] Clarkson, K.; Darrell, T. (ed.); Indyk, P. (ed.), Nearest-neighbor searching and metric space dimensions (2005), Cambridge
[10] Dasgupta, S., Freund, Y.: Random projection trees and low dimensional manifolds. In: ACM Symposium on Theory of, Computing, pp. 537-546 (2008) · Zbl 1231.68114
[11] Dasgupta, S., Sinha, K.: Randomized partition trees for exact nearest neighbor search. In: 26th Annual Conference on Learning Theory (2013) · Zbl 1311.68148
[12] Gupta, A., Krauthgamer, R., Lee, J.R.: Bounded geometries, fractals, and low-distortion embeddings. In: 44th Annual IEEE Symposium on Foundations of Computer, Science, pp. 534-543 (2003)
[13] Karger, D., Ruhl, M.: Finding nearest neighbors in growth-restricted metrics. In: ACM Symposium on Theory of, Computing, pp. 741-750 (2002) · Zbl 1192.68750
[14] Kleinberg, J.: Two algorithms for nearest-neighbor search in high dimensions. In: 29th ACM Symposium on Theory of, Computing (1997) · Zbl 0963.68049
[15] Krauthgamer, R., Lee, J.: Navigating nets: simple algorithms for proximity search. In: ACM-SIAM Symposium on Discrete Algorithms (2004) · Zbl 1318.68071
[16] Liu, T., Moore, A., Gray, A., Yang, K.: An investigation of practical approximate nearest neighbor algorithms. In: Advances in Neural Information Processing Systems (2004)
[17] Maneewongvatana, S., Mount, D.: The analysis of a probabilistic approach to nearest neighbor searching. In: Seventh International Worshop on Algorithms and Data Structures, pp. 276-286 (2001) · Zbl 0997.68521
[18] McFee, B., Lanckriet, G.: Large-scale music similarity search with spatial trees. In: 12th Conference of the International Society for Music Retrieval (2011) · Zbl 1280.68181
[19] Stone, C.: Consistent nonparametric regression. Ann. Stat. 5, 595-645 (1977) · Zbl 0366.62051 · doi:10.1214/aos/1176343886
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.