Nonparametric distributed learning under general designs. (English) Zbl 1466.62364

Summary: This paper focuses on distributed learning in the nonparametric regression framework. With sufficient computational resources, the efficiency of distributed algorithms improves as the number of machines increases. We aim to analyze how the number of machines affects statistical optimality. We establish an upper bound on the number of machines under which statistical minimax optimality is achieved in two settings: nonparametric estimation and hypothesis testing. Our framework is more general than existing work. We build a unified framework for distributed inference that covers various regression problems, including thin-plate splines and additive regression, under random designs: univariate, multivariate, and diverging-dimensional. The main tool is a tight bound on an empirical process, obtained by introducing the Green function for equivalent kernels. Thorough numerical studies back the theoretical findings.
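
The divide-and-conquer idea behind the summary can be illustrated with a minimal sketch. It is not the paper's procedure, and several things in it are assumptions: the base estimator is a Nadaraya-Watson kernel smoother, swapped in for simplicity in place of the penalized estimators (such as thin-plate splines) that the summary mentions; the regression function, noise level, bandwidth, and the names `nw_fit`, `distributed_fit`, and `demo` are all hypothetical. The sample is dealt out to several machines, each machine fits its own subsample, and the fits are averaged.

```python
import math
import random


def nw_fit(xs, ys, bandwidth):
    """Nadaraya-Watson smoother fitted on one machine's subsample (assumed base estimator)."""
    def predict(t):
        weights = [math.exp(-0.5 * ((t - x) / bandwidth) ** 2) for x in xs]
        total = sum(weights)
        if total == 0.0:  # all weights underflowed: fall back to the subsample mean
            return sum(ys) / len(ys)
        return sum(w * y for w, y in zip(weights, ys)) / total
    return predict


def distributed_fit(xs, ys, n_machines, bandwidth):
    """Deal the sample out to n_machines, fit each part, and average the fits."""
    parts = [(xs[i::n_machines], ys[i::n_machines]) for i in range(n_machines)]
    fits = [nw_fit(px, py, bandwidth) for px, py in parts]
    return lambda t: sum(f(t) for f in fits) / len(fits)


def demo(n=600, seed=0):
    """Return {number of machines: mean squared error on a test grid}."""
    rng = random.Random(seed)
    truth = lambda t: math.sin(2.0 * math.pi * t)  # hypothetical regression function
    xs = [rng.random() for _ in range(n)]          # univariate random design on [0, 1]
    ys = [truth(x) + 0.3 * rng.gauss(0.0, 1.0) for x in xs]
    grid = [0.05 + 0.9 * k / 49 for k in range(50)]
    errors = {}
    for s in (1, 4, 20, 150):
        f_bar = distributed_fit(xs, ys, s, bandwidth=0.05)
        errors[s] = sum((f_bar(t) - truth(t)) ** 2 for t in grid) / len(grid)
    return errors


if __name__ == "__main__":
    for s, err in demo().items():
        print(f"machines={s:4d}  grid MSE={err:.5f}")
```

Running the script prints the grid error for each machine count, so the effect of splitting the same sample across more machines can be inspected directly. The toy example only illustrates the averaging mechanism; it does not reproduce the paper's bound on the number of machines or its numerical studies.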

MSC:

62J02 General nonlinear regression
62J07 Ridge regression; shrinkage estimators (Lasso)
62G08 Nonparametric regression and quantile regression
62G10 Nonparametric hypothesis testing
62C20 Minimax procedures in statistical decision theory
62K20 Response surface designs

Software:

gss
