Howard, Steven R.; Ramdas, Aaditya Sequential estimation of quantiles with applications to A/B testing and best-arm identification. (English) Zbl 07526603 Bernoulli 28, No. 3, 1704-1728 (2022). Summary: We design confidence sequences – sequences of confidence intervals which are valid uniformly over time – for quantiles of any distribution over a complete, fully-ordered set, based on a stream of i.i.d. observations. We give methods both for tracking a fixed quantile and for tracking all quantiles simultaneously. Specifically, we provide explicit expressions with small constants for intervals whose widths shrink at the fastest possible \(\sqrt{{t^{-1}}\log \log t}\) rate, along with a nonasymptotic concentration inequality for the empirical distribution function which holds uniformly over time with the same rate. The latter strengthens Smirnov’s empirical process law of the iterated logarithm and extends the Dvoretzky-Kiefer-Wolfowitz inequality to hold uniformly over time. We give a new algorithm and sample complexity bound for selecting an arm with an approximately best quantile in a multi-armed bandit framework. In simulations, our method needs fewer samples than existing methods by a factor of five to fifty. Cited in 4 Documents MSC: 62Gxx Nonparametric inference 60Gxx Stochastic processes 62Lxx Sequential statistical methods Keywords:best-arm identification; confidence sequences; Dvoretzky-Kiefer-Wolfowitz inequality; empirical process; quantile estimation PDFBibTeX XMLCite \textit{S. R. Howard} and \textit{A. Ramdas}, Bernoulli 28, No. 3, 1704--1728 (2022; Zbl 07526603) Full Text: DOI arXiv Link References: [1] Anderson, C.W. (1984). Large deviations of extremes. In Statistical Extremes and Applications (Vimeiro, 1983). NATO Adv. Sci. Inst. Ser. C: Math. Phys. Sci. 131 325-340. Dordrecht: Reidel. · Zbl 0551.60029 [2] Arnold, B.C., Balakrishnan, N. and Nagaraja, H.N. (2008). A First Course in Order Statistics. Classics in Applied Mathematics 54. Philadelphia, PA: SIAM. 10.1137/1.9780898719062 · Zbl 1172.62017 [3] Boucheron, S., Lugosi, G. and Massart, P. (2013). Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford: Oxford Univ. Press. 10.1093/acprof:oso/9780199535255.001.0001 · Zbl 1279.60005 [4] Boucheron, S. and Thomas, M. (2012). Concentration inequalities for order statistics. Electron. Commun. Probab. 17 no. 51, 12. 10.1214/ECP.v17-2210 · Zbl 1349.60021 [5] Bubeck, S., Munos, R. and Stoltz, G. (2009). Pure exploration in multi-armed bandits problems. In Algorithmic Learning Theory. Lecture Notes in Computer Science 5809 23-37. Berlin: Springer. 10.1007/978-3-642-04414-4_7 · Zbl 1262.68061 [6] Corless, R.M., Gonnet, G.H., Hare, D.E.G., Jeffrey, D.J. and Knuth, D.E. (1996). On the Lambert \(W\) function. Adv. Comput. Math. 5 329-359. 10.1007/BF02124750 · Zbl 0863.65008 [7] Darling, D.A. and Robbins, H. (1967a). Confidence sequences for mean, variance, and median. Proc. Natl. Acad. Sci. USA 58 66-68. 10.1073/pnas.58.1.66 · Zbl 0173.21002 [8] Darling, D.A. and Robbins, H. (1967b). Iterated logarithm inequalities. Proc. Natl. Acad. Sci. USA 57 1188-1192. 10.1073/pnas.57.5.1188 · Zbl 0167.46902 [9] Darling, D.A. and Robbins, H. (1968). Some nonparametric sequential tests with power one. Proc. Natl. Acad. Sci. USA 61 804-809. 10.1073/pnas.61.3.804 · Zbl 0241.62033 [10] David, Y. and Shimkin, N. (2016). Pure exploration for max-quantile bandits. In Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science 556-571. Springer. [11] Dekkers, A.L.M. and de Haan, L. (1989). On the estimation of the extreme-value index and large quantile estimation. Ann. Statist. 17 1795-1832. 10.1214/aos/1176347396 · Zbl 0699.62028 [12] Drees, H. (1998). On smooth statistical tail functionals. Scand. J. Stat. 25 187-210. 10.1111/1467-9469.00097 · Zbl 0923.62032 [13] Drees, H., de Haan, L. and Li, D. (2003). On large deviation for extremes. Statist. Probab. Lett. 64 51-62. 10.1016/S0167-7152(03)00130-5 · Zbl 1113.62329 [14] Dudley, R.M. (1967). The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Funct. Anal. 1 290-330. 10.1016/0022-1236(67)90017-1 · Zbl 0188.20502 [15] Dvoretzky, A., Kiefer, J. and Wolfowitz, J. (1956). Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Ann. Math. Stat. 27 642-669. 10.1214/aoms/1177728174 · Zbl 0073.14603 [16] Even-Dar, E., Mannor, S. and Mansour, Y. (2002). PAC bounds for multi-armed bandit and Markov decision processes. In Computational Learning Theory (Sydney, 2002). Lecture Notes in Computer Science 2375 255-270. Berlin: Springer. 10.1007/3-540-45435-7_18 · Zbl 1050.68059 [17] Giné, E. and Nickl, R. (2016). Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge Series in Statistical and Probabilistic Mathematics, [40]. New York: Cambridge Univ. Press. 10.1017/CBO9781107337862 · Zbl 1358.62014 [18] Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13-30. · Zbl 0127.10602 · doi:10.1080/01621459.1963.10500830 [19] Howard, S.R. and Ramdas, A. (2022). Supplement to “Sequential estimation of quantiles with applications to A/B testing and best-arm identification.” 10.3150/21-BEJ1388SUPP [20] Howard, S.R., Ramdas, A., McAuliffe, J. and Sekhon, J. (2020). Time-uniform Chernoff bounds via nonnegative supermartingales. Probab. Surv. 17 257-317. 10.1214/18-PS321 · Zbl 1456.60054 [21] Howard, S.R., Ramdas, A., McAuliffe, J. and Sekhon, J. (2021). Time-uniform, nonparametric, nonasymptotic confidence sequences. Ann. Statist. 49 1055-1080. 10.1214/20-aos1991 · Zbl 1476.62170 [22] James, B.R. (1975). A functional law of the iterated logarithm for weighted empirical distributions. Ann. Probab. 3 762-772. 10.1214/aop/1176996263 · Zbl 0347.60030 [23] Jamieson, K. and Nowak, R. (2014). Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting. In 48th Annual Conference on Information Sciences and Systems (CISS) 1-6. [24] Jamieson, K., Malloy, M., Nowak, R. and Bubeck, S. (2014). Lil’ UCB: An optimal exploration algorithm for multi-armed bandits. In Proceedings of the 27th Conference on Learning Theory. Proceedings of Machine Learning Research 35 423-439. [25] Kalogerias, D.S., Nikolakakis, K.E., Sarwate, A.D. and Sheffet, O. (2020). Best-Arm Identification for Quantile Bandits with Privacy. arXiv:2006.06792. [26] Kalyanakrishnan, S., Tewari, A., Auer, P. and Stone, P. (2012). PAC subset selection in stochastic multi-armed bandits. In Proceedings of the 29th International Conference on Machine Learning 655-662. [27] Kaufmann, E., Cappé, O. and Garivier, A. (2016). On the complexity of best-arm identification in multi-armed bandit models. J. Mach. Learn. Res. 17 Paper No. 1, 42. · Zbl 1360.62433 [28] Mannor, S. and Tsitsiklis, J.N. (2003/04). The sample complexity of exploration in the multi-armed bandit problem. J. Mach. Learn. Res. 5 623-648. 10.1007/978-3-540-45167-9_31 · Zbl 1222.68099 [29] Massart, P. (1990). The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Ann. Probab. 18 1269-1283. · Zbl 0713.62021 [30] Schreuder, N., Brunel, V.-E. and Dalalyan, A. (2020). A nonasymptotic law of iterated logarithm for general M-estimators. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, PMLR 108 1331-1341. [31] Shorack, G.R. and Wellner, J.A. (1986). Empirical Processes with Applications to Statistics. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. New York: Wiley. · Zbl 1170.62365 [32] Smirnov, N.V. (1944). Approximate laws of distribution of random variables from empirical data. Uspekhi Mat. Nauk 10 179-206. · Zbl 0063.07087 [33] Stephanou, M., Varughese, M. and Macdonald, I. (2017). Sequential quantiles via Hermite series density estimation. Electron. J. Stat. 11 570-607. 10.1214/17-EJS1245 · Zbl 1392.62243 [34] Szörényi, B., Busa-Fekete, R., Weng, E. and Hüllermeier, P. (2015). Qualitative multi-armed bandits: A quantile-based approach. In Proceedings of the 32nd International Conference on Machine Learning 1660-1668. [35] Talagrand, M. (2005). The Generic Chaining: Upper and Lower Bounds of Stochastic Processes. Springer Monographs in Mathematics. Berlin: Springer. · Zbl 1075.60001 [36] Torossian, L., Garivier, A. and Picheny, V. (2019). X-armed bandits: Optimizing quantiles, CVaR and other risks. In ‘Asian Conference on Machine Learning’, PMLR 252-267. [37] Khintchine, A. (1924). Über einen Satz der Wahrscheinlichkeitsrechnung. Fund. Math. 6 9-20. · JFM 50.0344.02 [38] Ville, J. (1939). Étude Critique de la Notion de Collectif. Gauthier-Villars, Paris. · Zbl 0021.14601 [39] Yu, J.Y. and Nikolova, E. (2013). Sample complexity of risk-averse bandit-arm selection. In Twenty-Third International Joint Conference on Artificial Intelligence. [40] Zhao, S., Zhou, E., Sabharwal, A. and Ermon, S. (2016). Adaptive concentration inequalities for sequential decision problems. In 30th Conference on Neural Information Processing Systems This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.