Multiple choice tries and distributed hash tables.

*(English)*Zbl 1172.68067Summary: We consider tries built from \(n\) strings such that each string can be chosen from a pool of \(k\) strings, each of them generated by a discrete i.i.d. source. Three cases are considered: \(k=2\), \(k\) is large but fixed, and \(k\sim c\log n\). The goal in each case is to obtain tries as balanced as possible. Various parameters such as height and fill-up level are analyzed. It is shown that for two-choice tries a 50% reduction in height is achieved when compared with ordinary tries. In a greedy online construction when the string that minimizes the depth of insertion for every pair is inserted, the height is only reduced by 25%. To further reduce the height by another 25%, we design a more refined online algorithm. The total computation time of the algorithm is \(O(n\log n)\). Furthermore, when we choose the best among \(k\geq 2\) strings, then for large but fixed \(k\) the height is asymptotically equal to the typical depth in a trie.

Finally, we show that further improvement can be achieved if the number of choices for each string is proportional to \(\log n\). In this case highly balanced trees can be constructed by a simple greedy algorithm for which the difference between the height and the fill-up level is bounded by a constant with high probability. This, in turn, has implications for distributed hash tables, leading to a randomized ID management algorithm in peer-to-peer networks such that, with high probability, the ratio between the maximum and the minimum load of a processor is \(O(1)\).

Finally, we show that further improvement can be achieved if the number of choices for each string is proportional to \(\log n\). In this case highly balanced trees can be constructed by a simple greedy algorithm for which the difference between the height and the fill-up level is bounded by a constant with high probability. This, in turn, has implications for distributed hash tables, leading to a randomized ID management algorithm in peer-to-peer networks such that, with high probability, the ratio between the maximum and the minimum load of a processor is \(O(1)\).

##### Keywords:

random tries; random data structures; probabilistic analysis of algorithms; algorithms on sequences; distributed hash tables##### Software:

PATRICIA
PDF
BibTeX
XML
Cite

\textit{L. Devroye} et al., Random Struct. Algorithms 34, No. 3, 337--367 (2009; Zbl 1172.68067)

Full Text:
DOI

##### References:

[1] | I. Abraham, B. Awerbuch, Y. Azar, Y. Bartal, D. Malkhi, and E. Pavlov, A generic scheme for building overlay networks in adversarial scenarios, In Proceedings of the international parallel and distributed processing symposium (IPDPS 2003), Nice, France, 2003. |

[2] | M. Adler, E. Halperin, R. M. Karp, and V. V. Vazirani, A stochastic process on the hypercube with applications to peer-to-peer networks, In Proceedings of the 35th ACM symposium on theory of computing (STOC 2003), San Diego, CA, 2003, pp. 575-584. · Zbl 1192.68018 |

[3] | Y. Azar, A. Z. Broder, A. R. Karlin, and E. Upfal, Balanced allocations (extended abstract), In Proceedings of the 26th ACM symposium on the theory of computing, Montréal, Québec, Canada, 1994, pp. 593-602. |

[4] | Azar, Balanced allocations, SIAM J Comput 29 pp 180– (1999) · Zbl 0937.68053 |

[5] | Balakrishnan, Looking up data in P2P systems, Commun ACM 1946 pp 43– (2003) |

[6] | Balakrishnan, Lecture Notes in Computer Science 3640 pp 104– (2005) |

[7] | Bondy, Graph theory with applications (1976) · Zbl 1226.05083 · doi:10.1007/978-1-349-03521-2 |

[8] | Boucheron, A sharp concentration inequality with applications in random combinatorics and learning, Random Struct Algorithms 16 pp 277– (2000) |

[9] | Boucheron, Concentration inequalities using the entropy method, Ann Probab 31 pp 1583– (2003) · Zbl 1051.60020 |

[10] | Chung, On the application of the Borel-Cantelli lemma, Trans Am Math Soc 72 pp 179– (1952) · Zbl 0046.35203 |

[11] | Coffman, File structures using hashing functions, Commun ACM 13 pp 427– (1970) · Zbl 0216.24203 |

[12] | Csiszár, On generalized entropy, Studia Scientiarium Mathematicarum Hungarica 4 pp 401– (1969) |

[13] | A. Czumaj, and V. Stemann, Randomized allocation processes, In Proceedings of the 38th IEEE symposium on foundations of computer science (FOCS’97), October 19-22, Miami Beach, FL, 1997, pp. 194-203. · Zbl 1011.68177 |

[14] | F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica, Wide-area cooperative storage with CFS, In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP 2001), Banff, Alberta, Canada, 2001, pp. 202-215. |

[15] | Deheuvels, On the Erdös-Rényi theorem for random fields and sequences and its relationships with the theory of runs and spacings, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 70 pp 91– (1985) |

[16] | Dembo, Large deviations techniques and applications (1998) · doi:10.1007/978-1-4612-5320-4 |

[17] | Devroye, A study of trie-like structures under the density model, Ann App Probab 2 pp 402– (1992) · Zbl 0758.68051 |

[18] | Devroye, Laws of large numbers and tail inequalities for random tries and Patricia trees, J Comput Appl Math 142 pp 27– (2002) · Zbl 1005.60032 |

[19] | Erdös, On a new law of large numbers, J Anal Math 22 pp 103– (1970) |

[20] | Fredkin, Trie memory, Commun ACM 3 pp 490– (1960) |

[21] | Hall, On representatives of subsets, J London Math Soc 10 pp 26– (1935) · Zbl 0010.34503 |

[22] | Jacquet, Lecture notes in computer science 214 pp 196– (1986) |

[23] | Jacquet, Analysis of digital tries with Markovian dependency, IEEE Trans Info Theory 37 pp 1470– (1991) |

[24] | Janson, Random graphs (2000) · doi:10.1002/9781118032718 |

[25] | D. R. Karger, and M. Ruhl, New algorithms for load balancing in peer-to-peer systems, IRIS Student Workshop, Cambridge, MA, 2003. · Zbl 1115.68018 |

[26] | Knessl, On the number of full levels in tries, Random Struct Algorithms 25 pp 247– (2004) · Zbl 1077.68022 |

[27] | Knuth, Sorting and searching 3 (1973) |

[28] | Konheim, A note on growing binary trees, Discrete Math 4 pp 57– (1973) · Zbl 0247.05105 |

[29] | Lévy, Sur la division d’un segment par des points choisis au hasard, Comptes Rendus Acad Sci Paris 208 pp 147– (1939) · JFM 65.0553.02 |

[30] | Mahmoud, Evolution of random search trees (1992) · Zbl 0762.68033 |

[31] | D. Malkhi, M. Naor, and D. Ratajczak, Viceroy: A scalable and dynamic emulation of the butterfly, In Proceedings of the 21st ACM symposium on principles of distributed computing (PODC 2002), Monterey, California, 2002, pp. 183-192. · Zbl 1292.68015 |

[32] | Mallows, An inequality involving multinomial probabilities, Biometrika 55 pp 422– (1968) · Zbl 0162.50707 |

[33] | G. S. Manku, M. Bawa, and P. Raghavan, Symphony: Distributed hashing in a small world, In Proceedings of the fourth USENIX symposium on internet technologies and systems (USITS 2003), Seattle, WA, 2003, pp. 127-140 |

[34] | G. S. Manku, Balanced binary trees for ID management and load balance in distributed hash tables, In 23rd ACM symposium on principles of distributed computing (PODC 2004), St. John’s, Newfoundland, Canada, 2004, pp. 197-205. · Zbl 1321.68227 |

[35] | Morrison, PATRICIA - Practical algorithm to retrieve information coded in alphanumeric, J ACM 15 pp 514– (1968) |

[36] | M. Naor, and U. Wieder, Novel architectures for P2P applications: The continuous-discrete approach, In Proceedings of the 15th ACM symposium on parallelism in algorithms and architectures (SPAA 2003), San Diego, CA, 2003, pp. 50-59. · Zbl 1192.68050 |

[37] | Novak, On the Erdös-Rényi maximum of partial sums, Theory Probab Appl 42 pp 254– (1995) |

[38] | R. Pagh, and F. F. Rodler, Cuckoo hashing, BRICS Report Series RS-01-32, Department of Computer Science, University of Aarhus, 2001. · Zbl 1006.68522 |

[39] | Pittel, Asymptotical growth of a class of random trees, Ann Probab 13 pp 414– (1985) · Zbl 0563.60010 |

[40] | Pittel, Path in a random digital tree: limiting distributions, Adv Appl Probab 18 pp 139– (1986) · Zbl 0588.60012 |

[41] | Pyke, Spacings, J Royal Stat Soc 27 pp 395– (1965) |

[42] | S. Ramabhadran, S. Ratnasamy, J. M. Hellerstein, and S. Shenker, Prefix hash tree: An indexing data structure over distributed hash tables, IRB Technical Report, 2004. |

[43] | S. Ratnasamy, P. Francis, M. Handley, and R. M. Karp, A scalable content-addressable network, In Proceedings of the ACM SIGCOMM 2001, San Diego, CA, 2001, pp. 161-172. |

[44] | Régnier, New results on the size of tries, IEEE Trans Info Theory IT-35 pp 203– (1989) · Zbl 0684.68038 |

[45] | Rényi, On the dimension and entropy of probability distributions, Acta Mathematica Academiae Sci Hungarica 10 pp 193– (1959) |

[46] | Szpankowski, Some results on V -ary asymmetric tries, J Algorithms 9 pp 224– (1988) · Zbl 0637.68072 |

[47] | Szpankowski, Average case analysis of algorithms on sequences (2001) · Zbl 0968.68205 · doi:10.1002/9781118032770 |

[48] | Tarjan, Data structures and network algorithms (1983) · Zbl 0584.68077 · doi:10.1137/1.9781611970265 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.