zbMATH — the first resource for mathematics

Adaptive Bayesian nonparametric regression using a kernel mixture of polynomials with application to partial linear models. (English) Zbl 1437.62155
Summary: We propose a kernel mixture of polynomials prior for Bayesian nonparametric regression. The regression function is modeled by local averages of polynomials with kernel mixture weights. We obtain the minimax-optimal contraction rate of the full posterior distribution up to a logarithmic factor by estimating metric entropies of certain function classes. Under the assumption that the degree of the polynomials is larger than the unknown smoothness level of the true function, the posterior contraction behavior can adapt to this smoothness level provided an upper bound is known. We also provide a frequentist sieve maximum likelihood estimator with a near-optimal convergence rate. We further investigate the application of the kernel mixture of polynomials to partial linear models and obtain both the near-optimal rate of contraction for the nonparametric component and the Bernstein-von Mises limit (i.e., asymptotic normality) of the parametric component. The proposed method is illustrated with numerical examples and shows superior performance in terms of computational efficiency, accuracy, and uncertainty quantification compared to the local polynomial regression, DiceKriging, and the robust Gaussian stochastic process.
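As a rough illustration of "local averages of polynomials with kernel mixture weights", the sketch below evaluates a one-dimensional function of the form f(x) = sum_k w_k(x) p_k(x - mu_k), where the weights w_k(x) are normalized kernel values. The summary does not state the exact construction, so the Gaussian kernel, the fixed centers `centers`, the shared `bandwidth`, and the function name `kernel_mixture_of_polynomials` are all assumptions made for illustration, not the authors' specification.

```python
import numpy as np


def kernel_mixture_of_polynomials(x, centers, coeffs, bandwidth):
    """Evaluate f(x) = sum_k w_k(x) * p_k(x - mu_k) for 1-D inputs.

    Assumptions (not taken from the summary): a Gaussian kernel, fixed
    centers mu_k, and one shared bandwidth. coeffs[k] holds the coefficients
    of the k-th local polynomial in increasing degree.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    centers = np.asarray(centers, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)      # shape (K, degree + 1)

    diff = x[:, None] - centers[None, :]          # shape (n, K)

    # Normalized Gaussian kernel weights; subtracting the row maximum
    # before exponentiating avoids underflow for points far from all centers.
    log_kernel = -0.5 * (diff / bandwidth) ** 2
    log_kernel -= log_kernel.max(axis=1, keepdims=True)
    weights = np.exp(log_kernel)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1

    # Local polynomials p_k evaluated at (x - mu_k).
    degrees = np.arange(coeffs.shape[1])
    powers = diff[:, :, None] ** degrees           # shape (n, K, degree + 1)
    local_polys = (powers * coeffs[None, :, :]).sum(axis=2)

    # Kernel-weighted average of the local polynomial values.
    return (weights * local_polys).sum(axis=1)


# Hypothetical usage: five centers on [0, 1], each with a local linear term.
centers = np.linspace(0.0, 1.0, 5)
coeffs = np.column_stack([np.sin(2 * np.pi * centers),
                          2 * np.pi * np.cos(2 * np.pi * centers)])
grid = np.linspace(0.0, 1.0, 11)
values = kernel_mixture_of_polynomials(grid, centers, coeffs, bandwidth=0.15)
```

Because the weights sum to one at every point, the function is a convex combination of the local polynomial values; in particular, identical constant polynomials yield that same constant everywhere. In a Bayesian treatment the coefficients (and possibly the centers and bandwidth) would carry priors, which this sketch does not attempt to model.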
62G08 Nonparametric regression and quantile regression
62J02 General nonlinear regression
60G15 Gaussian processes