Bayesian manifold regression. (English) Zbl 1341.62196

Summary: There is increasing interest in the problem of nonparametric regression with high-dimensional predictors. When the number of predictors \(D\) is large, estimating a \(D\)-dimensional surface from limited data is a daunting problem. Fortunately, in many applications the support of the data is concentrated on a \(d\)-dimensional subspace with \(d\ll D\). Manifold learning attempts to estimate this subspace. Our focus is on developing computationally tractable and theoretically supported Bayesian nonparametric regression methods in this context. When the subspace corresponds to a locally Euclidean compact Riemannian manifold, we show that a Gaussian process regression approach yields the minimax optimal adaptive rate for estimating the regression function under some conditions. The proposed model bypasses the need to estimate the manifold and can be implemented using standard algorithms for posterior computation in Gaussian processes. Finite sample performance is illustrated in a data analysis example.
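The computational claim in the summary is that a Gaussian process prior placed directly on the ambient predictors, fitted with ordinary GP posterior computation, needs no estimate of the manifold. The following is a minimal sketch of that idea in plain Python, assuming a squared-exponential covariance \(\exp(-a^2\|x-y\|^2)\) with a fixed, hand-picked inverse bandwidth `a` and a known noise variance. The synthetic curve, the regression function and all numeric settings are hypothetical illustrations, not the paper's data analysis. Adaptive procedures of this type place a prior on the bandwidth instead (compare [35]), which this sketch omits.

```python
import math
import random


def embed(t, ambient_dim):
    """Hypothetical example: map angle t to a closed curve (d = 1) in R^D, D >= 4."""
    point = [math.cos(t), math.sin(t), 0.5 * math.cos(2 * t), 0.5 * math.sin(2 * t)]
    return point + [0.0] * (ambient_dim - 4)


def sq_exp_kernel(x, y, a):
    """Squared-exponential covariance exp(-a^2 ||x - y||^2), Euclidean distance in R^D."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-(a ** 2) * sq_dist)


def cholesky(A):
    """Lower-triangular L with A = L L^T, for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward, then backward, substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):  # forward: L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward: L^T x = z
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior_mean(X, y, X_new, a, noise_var):
    """Posterior mean k(x*, X) (K + noise_var * I)^{-1} y of a zero-mean GP."""
    n = len(X)
    K = [[sq_exp_kernel(X[i], X[j], a) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve_cholesky(cholesky(K), y)
    return [sum(sq_exp_kernel(x, X[i], a) * alpha[i] for i in range(n)) for x in X_new]


def truth(t):
    """Hypothetical regression function, well defined on the curve."""
    return math.sin(3 * t)


# Hypothetical settings: D = 50 ambient coordinates, n = 60 noisy samples.
random.seed(0)
D, n = 50, 60
t_train = [2 * math.pi * i / n for i in range(n)]
X_train = [embed(t, D) for t in t_train]
y_train = [truth(t) + random.gauss(0.0, 0.1) for t in t_train]

# Predict halfway between training angles; no manifold or dimension estimate is used.
t_test = [2 * math.pi * (i + 0.5) / n for i in range(n)]
X_test = [embed(t, D) for t in t_test]
preds = gp_posterior_mean(X_train, y_train, X_test, a=2.0, noise_var=0.01)
rmse = math.sqrt(sum((p - truth(t)) ** 2 for p, t in zip(preds, t_test)) / n)
print(f"test RMSE: {rmse:.3f}")
```

The manifold enters only through pairwise Euclidean distances in \(\mathbb{R}^D\); no chart, embedding or dimension estimate is computed. A practical implementation would use a linear-algebra library and infer `a` and the noise variance rather than fix them. For context, the classical optimal rate for \(s\)-smooth regression functions of \(d\) variables is \(n^{-s/(2s+d)}\) (Stone [28]); as we read the summary, the point is that the attainable rate is governed by the intrinsic \(d\) rather than \(D\).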


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G08 Nonparametric regression and quantile regression
62G99 Nonparametric inference
68T05 Learning and adaptive systems in artificial intelligence




[1] Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Amer. Math. Soc. 68 337-404. · Zbl 0037.20701
[2] Belkin, M. and Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15 1373-1396. · Zbl 1085.68119
[3] Bhattacharya, A., Pati, D. and Dunson, D. (2014). Anisotropic function estimation using multi-bandwidth Gaussian processes. Ann. Statist. 42 352-381. · Zbl 1360.62168
[4] Bickel, P. J. and Kleijn, B. J. K. (2012). The semiparametric Bernstein-von Mises theorem. Ann. Statist. 40 206-237. · Zbl 1246.62081
[5] Bickel, P. J. and Li, B. (2007). Local polynomial regression on unknown manifolds. In Complex Datasets and Inverse Problems. Institute of Mathematical Statistics Lecture Notes-Monograph Series 54 177-186. IMS, Beachwood, OH.
[6] Binev, P., Cohen, A., Dahmen, W. and DeVore, R. (2007). Universal algorithms for learning theory. II. Piecewise polynomial functions. Constr. Approx. 26 127-152. · Zbl 1191.62067
[7] Binev, P., Cohen, A., Dahmen, W., DeVore, R. and Temlyakov, V. (2005). Universal algorithms for learning theory. I. Piecewise constant functions. J. Mach. Learn. Res. 6 1297-1321. · Zbl 1191.62068
[8] Camastra, F. and Vinciarelli, A. (2002). Estimating the intrinsic dimension of data with a fractal-based method. IEEE Trans. Pattern Anal. Mach. Intell. 24 1404-1407.
[9] Carter, K. M., Raich, R. and Hero, A. O. III (2010). On local intrinsic dimension estimation and its applications. IEEE Trans. Signal Process. 58 650-663. · Zbl 1392.94122
[10] Castillo, I., Kerkyacharian, G. and Picard, D. (2013). Thomas Bayes’ walk on manifolds. Probab. Theory Related Fields 158 665-710. · Zbl 1285.62028
[11] Chen, M., Silva, J., Paisley, J., Wang, C., Dunson, D. and Carin, L. (2010). Compressive sensing on manifolds using a nonparametric mixture of factor analyzers: Algorithm and performance bounds. IEEE Trans. Signal Process. 58 6140-6155. · Zbl 1392.94139
[12] Farahmand, A. M., Szepesvári, C. and Audibert, J.-Y. (2007). Manifold-adaptive dimension estimation. In ICML 2007 265-272. ACM Press, New York.
[13] Ghosal, S., Ghosh, J. K. and van der Vaart, A. W. (2000). Convergence rates of posterior distributions. Ann. Statist. 28 500-531. · Zbl 1105.62315
[14] Ghosal, S. and van der Vaart, A. (2007). Convergence rates of posterior distributions for non-i.i.d. observations. Ann. Statist. 35 192-223. · Zbl 1114.62060
[15] Giné, E. and Nickl, R. (2011). Rates of contraction for posterior distributions in \(L^{r}\)-metrics, \(1\leq r\leq\infty\). Ann. Statist. 39 2883-2911. · Zbl 1246.62095
[16] Kpotufe, S. (2009). Escaping the curse of dimensionality with a tree-based regressor. In COLT 2009: The 22nd Conference on Learning Theory, June 18-21. Montreal, QC.
[17] Kpotufe, S. and Dasgupta, S. (2012). A tree-based regressor that adapts to intrinsic dimension. J. Comput. System Sci. 78 1496-1515. · Zbl 1435.62143
[18] Kundu, S. and Dunson, D. B. (2011). Latent factor models for density estimation. Available at arXiv:1108.2720v2. · Zbl 1334.62051
[19] Lawrence, N. D. (2003). Gaussian process latent variable models for visualisation of high dimensional data. Neural Information Processing Systems 16 329-336.
[20] Levina, E. and Bickel, P. (2004). Maximum likelihood estimation of intrinsic dimension. In Advances in Neural Information Processing Systems 17. MIT Press, Cambridge, MA.
[21] Lin, L. and Dunson, D. B. (2014). Bayesian monotone regression using Gaussian process projection. Biometrika 101 303-317. · Zbl 1452.62285
[22] Little, A. V., Lee, J., Jung, Y. M. and Maggioni, M. (2009). Estimation of intrinsic dimensionality of samples from noisy low-dimensional manifolds in high dimensions with multiscale SVD. In 2009 IEEE/SP 15th Workshop on Statistical Signal Processing 85-88. IEEE, Cardiff.
[23] Nene, S. A., Nayar, S. K. and Murase, H. (1996). Columbia object image library (COIL-100). Technical report, Columbia Univ., New York.
[24] Page, G., Bhattacharya, A. and Dunson, D. (2013). Classification via Bayesian nonparametric learning of affine subspaces. J. Amer. Statist. Assoc. 108 187-201. · Zbl 06158335
[25] Reich, B. J., Bondell, H. D. and Li, L. (2011). Sufficient dimension reduction via Bayesian mixture modeling. Biometrics 67 886-895. · Zbl 1226.62023
[26] Roweis, S. T. and Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science 290 2323-2326.
[27] Savitsky, T., Vannucci, M. and Sha, N. (2011). Variable selection for nonparametric Gaussian process priors: Models and computational strategies. Statist. Sci. 26 130-149. · Zbl 1222.65017
[28] Stone, C. J. (1982). Optimal global rates of convergence for nonparametric regression. Ann. Statist. 10 1040-1053. · Zbl 0511.62048
[29] Tenenbaum, J. B., de Silva, V. and Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science 290 2319-2323.
[30] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[31] Tokdar, S. T., Zhu, Y. M. and Ghosh, J. K. (2010). Bayesian density regression with logistic Gaussian process and subspace projection. Bayesian Anal. 5 319-344. · Zbl 1330.62182
[32] van de Geer, S. (2000). Empirical Processes in M-Estimation . Cambridge Univ. Press, Cambridge. · Zbl 1179.62073
[33] van der Vaart, A. and van Zanten, H. (2011). Information rates of nonparametric Gaussian process methods. J. Mach. Learn. Res. 12 2095-2119. · Zbl 1280.68228
[34] van der Vaart, A. W. and van Zanten, J. H. (2008). Reproducing kernel Hilbert spaces of Gaussian priors. In Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh. Inst. Math. Stat. Collect. 3 200-222. IMS, Beachwood, OH.
[35] van der Vaart, A. W. and van Zanten, J. H. (2009). Adaptive Bayesian estimation using a Gaussian random field with inverse gamma bandwidth. Ann. Statist. 37 2655-2675. · Zbl 1173.62021
[36] Yang, Y. and Dunson, D. B. (2015). Supplement to “Bayesian manifold regression.”
[37] Ye, G.-B. and Zhou, D.-X. (2008). Learning and approximation by Gaussians on Riemannian manifolds. Adv. Comput. Math. 29 291-310. · Zbl 1156.68045
[38] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 301-320. · Zbl 1069.62054