Bootstrapping max statistics in high dimensions: near-parametric rates under weak variance decay and application to functional and multinomial data. (English) Zbl 1464.62266

The authors study a bootstrap based on Gaussian multipliers for “max-type” statistics of high-dimensional data, with applications to functional and multinomial data. They show that this Gaussian bootstrap can approximate the distribution of such statistics at a rate close to the parametric rate under some assumptions, e.g. independence of the observed vectors and decaying variances of the components. The variance decay has the effect that, with high probability, the maximum is attained by one of relatively few components with high variance. Under this variance decay, the rate does not depend on the number of components \(p\). In a simulation study, the Gaussian multiplier bootstrap is compared to a method based on principal components [H. Choi and M. Reimherr, J. R. Stat. Soc., Ser. B, Stat. Methodol. 80, No. 1, 239–260 (2018; Zbl 1381.62143)].
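To make the idea concrete, the following is a minimal sketch of a generic Gaussian multiplier bootstrap for the statistic \(\max_j n^{-1/2}\sum_i (X_{ij}-\mu_j)\). It is not the authors' exact procedure: any standardization of the components used in the paper is omitted. The function names, the polynomial variance-decay model in `simulate_data`, and all parameter values are assumptions chosen purely for illustration.

```python
import math
import random


def multiplier_bootstrap_max(data, n_boot=500, seed=0):
    """Return bootstrap replicates of max_j n^{-1/2} sum_i xi_i (X_ij - mean_j).

    `data` is a list of n observations, each a list of p floats.
    The multipliers xi_i are i.i.d. standard Gaussian.
    """
    rng = random.Random(seed)
    n = len(data)
    p = len(data[0])
    # Center each component by its sample mean.
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    replicates = []
    for _ in range(n_boot):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sums = [sum(xi[i] * centered[i][j] for i in range(n)) for j in range(p)]
        replicates.append(max(sums) / math.sqrt(n))
    return replicates


def empirical_quantile(values, level):
    """Smallest order statistic whose empirical CDF value is at least `level`."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(level * len(ordered)) - 1))
    return ordered[index]


def simulate_data(n, p, decay=1.0, seed=1):
    """Hypothetical example data: independent mean-zero Gaussian components
    whose standard deviations decay like (j + 1) ** (-decay)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, (j + 1) ** (-decay)) for j in range(p)]
            for _ in range(n)]


if __name__ == "__main__":
    sample = simulate_data(n=50, p=20)
    reps = multiplier_bootstrap_max(sample, n_boot=200)
    print("bootstrap 95% quantile of the max statistic:",
          empirical_quantile(reps, 0.95))
```

Conditionally on the data, each replicate is the maximum of a centered Gaussian vector whose covariance is the sample covariance, so the empirical quantile of the replicates serves as a critical value for the max statistic. With decaying standard deviations, the maximum tends to be attained among the first few components, which is the mechanism the review describes.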

MSC:

62G09 Nonparametric statistical resampling methods
62G15 Nonparametric tolerance and confidence regions
62G05 Nonparametric estimation
62R10 Functional data analysis

Citations:

Zbl 1381.62143

References:

[1] Agresti, A. (2002). Categorical Data Analysis, 2nd ed. Wiley Series in Probability and Statistics. Wiley Interscience, New York. · Zbl 1018.62002
[2] Arlot, S., Blanchard, G. and Roquain, E. (2010a). Some nonasymptotic results on resampling in high dimension. I. Confidence regions. Ann. Statist. 38 51-82. · Zbl 1180.62066 · doi:10.1214/08-AOS667
[3] Arlot, S., Blanchard, G. and Roquain, E. (2010b). Some nonasymptotic results on resampling in high dimension. II. Multiple tests. Ann. Statist. 38 83-99. · Zbl 1181.62055 · doi:10.1214/08-AOS668
[4] Balakrishnan, S. and Wasserman, L. (2018). Hypothesis testing for high-dimensional multinomials: A selective review. Ann. Appl. Stat. 12 727-749. · Zbl 1405.62061 · doi:10.1214/18-AOAS1155SF
[5] Balakrishnan, S. and Wasserman, L. (2019). Hypothesis testing for densities and high-dimensional multinomials: Sharp local minimax rates. Ann. Statist. 47 1893-1927. · Zbl 1466.62307 · doi:10.1214/18-AOS1729
[6] Belloni, A., Chernozhukov, V., Chetverikov, D., Hansen, C. and Kato, K. (2018). High-dimensional econometrics and regularized GMM. arXiv:1806.01888.
[7] Bénasséni, J. (2012). A new derivation of eigenvalue inequalities for the multinomial distribution. J. Math. Anal. Appl. 393 697-698. · Zbl 1308.60020
[8] Benko, M., Härdle, W. and Kneip, A. (2009). Common functional principal components. Ann. Statist. 37 1-34. · Zbl 1169.62057 · doi:10.1214/07-AOS516
[9] Bentkus, V. (2003). On the dependence of the Berry-Esseen bound on dimension. J. Statist. Plann. Inference 113 385-402. · Zbl 1017.60023 · doi:10.1016/S0378-3758(02)00094-0
[10] Bunea, F. and Xiao, L. (2015). On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to fPCA. Bernoulli 21 1200-1230. · Zbl 1388.62173 · doi:10.3150/14-BEJ602
[11] Cao, G., Yang, L. and Todem, D. (2012). Simultaneous inference for the mean function based on dense functional data. J. Nonparametr. Stat. 24 359-377. · Zbl 1241.62119 · doi:10.1080/10485252.2011.638071
[12] Chafaï, D. and Concordet, D. (2009). Confidence regions for the multinomial parameter with small sample size. J. Amer. Statist. Assoc. 104 1071-1079. · Zbl 1388.62062 · doi:10.1198/jasa.2009.tm08152
[13] Chang, J., Yao, Q. and Zhou, W. (2017). Testing for high-dimensional white noise using maximum cross-correlations. Biometrika 104 111-127. · Zbl 1506.62307 · doi:10.1093/biomet/asw066
[14] Chen, X. (2018). Gaussian and bootstrap approximations for high-dimensional U-statistics and their applications. Ann. Statist. 46 642-678. · Zbl 1396.62019 · doi:10.1214/17-AOS1563
[15] Chen, Y.-C., Genovese, C. R. and Wasserman, L. (2015). Asymptotic theory for density ridges. Ann. Statist. 43 1896-1928. · Zbl 1327.62303 · doi:10.1214/15-AOS1329
[16] Chen, D. and Müller, H.-G. (2012). Nonlinear manifold representations for functional data. Ann. Statist. 40 1-29. · Zbl 1246.62146
[17] Chernozhukov, V., Chetverikov, D. and Kato, K. (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Statist. 41 2786-2819. · Zbl 1292.62030 · doi:10.1214/13-AOS1161
[18] Chernozhukov, V., Chetverikov, D. and Kato, K. (2014). Anti-concentration and honest, adaptive confidence bands. Ann. Statist. 42 1787-1818. · Zbl 1305.62161 · doi:10.1214/14-AOS1235
[19] Chernozhukov, V., Chetverikov, D. and Kato, K. (2017). Central limit theorems and bootstrap in high dimensions. Ann. Probab. 45 2309-2352. · Zbl 1377.60040 · doi:10.1214/16-AOP1113
[20] Choi, H. and Reimherr, M. (2016). R package ‘fregion’. https://github.com/hpchoi/fregion.
[21] Choi, H. and Reimherr, M. (2018). A geometric approach to confidence regions and bands for functional parameters. J. R. Stat. Soc. Ser. B. Stat. Methodol. 80 239-260. · Zbl 1381.62143 · doi:10.1111/rssb.12239
[22] Cressie, N. and Read, T. R. C. (1984). Multinomial goodness-of-fit tests. J. Roy. Statist. Soc. Ser. B 46 440-464. · Zbl 0571.62017 · doi:10.1111/j.2517-6161.1984.tb01318.x
[23] Degras, D. A. (2011). Simultaneous confidence bands for nonparametric regression with functional data. Statist. Sinica 21 1735-1765. · Zbl 1225.62052 · doi:10.5705/ss.2009.207
[24] Deng, H. and Zhang, C. H. (2017). Beyond Gaussian approximation: Bootstrap for maxima of sums of independent random vectors. arXiv:1705.09528.
[25] Dezeure, R., Bühlmann, P. and Zhang, C.-H. (2017). High-dimensional simultaneous inference with the bootstrap. TEST 26 685-719. · Zbl 06833591 · doi:10.1007/s11749-017-0554-2
[26] Fan, J., Shao, Q.-M. and Zhou, W.-X. (2018). Are discoveries spurious? Distributions of maximum spurious correlations and their applications. Ann. Statist. 46 989-1017. · Zbl 1402.62097 · doi:10.1214/17-AOS1575
[27] Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer Series in Statistics. Springer, New York. · Zbl 1119.62046
[28] Fienberg, S. E. and Holland, P. W. (1973). Simultaneous estimation of multinomial cell probabilities. J. Amer. Statist. Assoc. 68 683-691. · Zbl 0267.62030 · doi:10.1080/01621459.1973.10481405
[29] Fitzpatrick, S. and Scott, A. (1987). Quick simultaneous confidence intervals for multinomial proportions. J. Amer. Statist. Assoc. 82 875-878. · Zbl 0623.62024 · doi:10.1080/01621459.1987.10478511
[30] Goodman, L. A. (1965). On simultaneous confidence intervals for multinomial proportions. Technometrics 7 247-254. · Zbl 0131.17701 · doi:10.1080/00401706.1965.10490252
[31] Hoeffding, W. (1965). Asymptotically optimal tests for multinomial distributions. Ann. Math. Stat. 36 369-408. · Zbl 0135.19706 · doi:10.1214/aoms/1177700150
[32] Holst, L. (1972). Asymptotic normality and efficiency for certain goodness-of-fit tests. Biometrika 59 137-145. · Zbl 0235.62008 · doi:10.1093/biomet/59.1.137
[33] Horváth, L. and Kokoszka, P. (2012). Inference for Functional Data with Applications. Springer Series in Statistics. Springer, New York. · Zbl 1279.62017
[34] Horváth, L., Kokoszka, P. and Reeder, R. (2013). Estimation of the mean of functional time series and a two-sample problem. J. R. Stat. Soc. Ser. B. Stat. Methodol. 75 103-122. · Zbl 07555440
[35] Hsing, T. and Eubank, R. (2015). Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. Wiley Series in Probability and Statistics. Wiley, Chichester. · Zbl 1338.62009
[36] Johnson, W. B., Schechtman, G. and Zinn, J. (1985). Best constants in moment inequalities for linear combinations of independent and exchangeable random variables. Ann. Probab. 13 234-253. · Zbl 0564.60020 · doi:10.1214/aop/1176993078
[37] Jung, S., Lee, M. H. and Ahn, J. (2018). On the number of principal components in high dimensions. Biometrika 105 389-402. · Zbl 07072419 · doi:10.1093/biomet/asy010
[38] Koltchinskii, V., Löffler, M. and Nickl, R. (2020). Efficient estimation of linear functionals of principal components. Ann. Statist. 48 464-490. · Zbl 1440.62232 · doi:10.1214/19-AOS1816
[39] Koltchinskii, V. and Lounici, K. (2017a). Concentration inequalities and moment bounds for sample covariance operators. Bernoulli 23 110-133. · Zbl 1366.60057 · doi:10.3150/15-BEJ730
[40] Koltchinskii, V. and Lounici, K. (2017b). Normal approximation and concentration of spectral projectors of sample covariance. Ann. Statist. 45 121-157. · Zbl 1367.62175 · doi:10.1214/16-AOS1437
[41] Lopes, M. E., Lin, Z. and Müller, H.-G. (2020). Supplement to “Bootstrapping max statistics in high dimensions: Near-parametric rates under weak variance decay and application to functional and multinomial data.” https://doi.org/10.1214/19-AOS1844SUPP.
[42] Lounici, K. (2014). High-dimensional covariance matrix estimation with missing observations. Bernoulli 20 1029-1058. · Zbl 1320.62124 · doi:10.3150/12-BEJ487
[43] Naumov, A., Spokoiny, V. and Ulyanov, V. (2019). Bootstrap confidence sets for spectral projectors of sample covariance. Probab. Theory Related Fields 174 1091-1132. · Zbl 1420.62073 · doi:10.1007/s00440-018-0877-2
[44] Paninski, L. (2008). A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Trans. Inform. Theory 54 4750-4755. · Zbl 1322.62082 · doi:10.1109/TIT.2008.928987
[45] Quesenberry, C. P. and Hurst, D. C. (1964). Large sample simultaneous confidence intervals for multinomial proportions. Technometrics 6 191-195. · Zbl 0129.32605 · doi:10.1080/00401706.1964.10490163
[46] Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis, 2nd ed. Springer, New York. · Zbl 1079.62006
[47] Reiß, M. and Wahl, M. (2020). Nonasymptotic upper bounds for the reconstruction error of PCA. Ann. Statist. 48 1098-1123. · Zbl 1450.62070
[48] Rice, J. A. (2007). Mathematical Statistics and Data Analysis. Duxbury Press, Pacific Grove CA.
[49] Sison, C. P. and Glaz, J. (1995). Simultaneous confidence intervals and sample size determination for multinomial proportions. J. Amer. Statist. Assoc. 90 366-369. · Zbl 0820.62028 · doi:10.1080/01621459.1995.10476521
[50] Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing 210-268. Cambridge Univ. Press, Cambridge.
[51] Wang, H. (2008). Exact confidence coefficients of simultaneous confidence intervals for multinomial proportions. J. Multivariate Anal. 99 896-911. · Zbl 1136.62328 · doi:10.1016/j.jmva.2007.05.003
[52] Wang, J.-L., Chiou, J.-M. and Müller, H.-G. (2016). Functional data analysis. Annu. Rev. Stat. Appl. 3 257-295.
[53] Wasserman, L., Kolar, M. and Rinaldo, A. (2014). Berry-Esseen bounds for estimating undirected graphs. Electron. J. Stat. 8 1188-1224. · Zbl 1298.62089 · doi:10.1214/14-EJS928
[54] Zelterman, D. (1987). Goodness-of-fit tests for large sparse multinomial distributions. J. Amer. Statist. Assoc. 82 624-629. · Zbl 0641.62037 · doi:10.1080/01621459.1987.10478475
[55] Zhang, X. and Cheng, G. (2017). Simultaneous inference for high-dimensional linear models. J. Amer. Statist. Assoc. 112 757-768.
[56] Zhang, J.-T., Cheng, M.-Y., Wu, H.-T. and Zhou, B. (2019). A new test for functional one-way ANOVA with applications to ischemic heart screening. Comput. Statist. Data Anal. 132 3-17. · Zbl 1507.62204 · doi:10.1016/j.csda.2018.05.004
[57] Zheng, S. · Zbl 1367.62151 · doi:10.1080/01621459.2013.866899
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.