×

Learning non-parametric basis independent models from point queries via low-rank methods. (English) Zbl 1373.68336

Summary: We consider the problem of learning multi-ridge functions of the form \(f(\mathbf x)=g(\mathbf {Ax})\) from point evaluations of \(f\). We assume that the function \(f\) is defined on an \(\ell_2\)-ball in \(\mathbb R^d\), \(g\) is twice continuously differentiable almost everywhere, and \(\mathbf A\in\mathbb R^{k\times d}\) is a rank \(k\) matrix, where \(k\ll d\). We propose a randomized, polynomial-complexity sampling scheme for estimating such functions. Our theoretical developments leverage recent techniques from low rank matrix recovery, which enables us to derive a polynomial time estimator of the function \(f\) along with uniform approximation guarantees. We prove that our scheme can also be applied for learning functions of the form: \(f(\mathbf x)=\sum_{i=1}^k g_i(\mathbf a_i^T\mathbf x)\), provided \(f\) satisfies certain smoothness conditions in a neighborhood around the origin. We also characterize the noise robustness of the scheme. Finally, we present numerical examples to illustrate the theoretical bounds in action.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
41A46 Approximation by arbitrary nonlinear expressions; widths and entropy

Software:

hgam; ADMiRA
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Bishop, C. M., Neural Networks for Pattern Recognition (1995), Oxford University Press: Oxford University Press USA
[2] Candès, E. J., Harmonic analysis of neural networks, Appl. Comput. Harmon. Anal., 6, 2, 197-218 (1999) · Zbl 0931.68104
[3] Candès, E. J., Ridgelets: Estimating with ridge functions, Ann. Statist., 31, 5, 1561-1599 (2003) · Zbl 1046.62037
[4] Candès, E. J.; Donoho, D. L., Ridgelets: a key to higher dimensional intermittency?, Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci., 357, 1760, 2495-2509 (1999) · Zbl 1082.42503
[5] Candès, E. J.; Li, X.; Ma, Y.; Wright, J., Robust principal component analysis?, J. ACM, 58, 3, 1-37 (2011) · Zbl 1327.62369
[6] Candès, E. J.; Plan, Y., Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements (2010)
[7] Candès, E. J.; Recht, B., Exact matrix completion via convex optimization, Found. Comput. Math., 9, 6, 717-772 (2009) · Zbl 1219.90124
[8] Candès, E. J.; Tao, T., The power of convex relaxation: near-optimal matrix completion, IEEE Trans. Inform. Theory, 56, 2053-2080 (May 2010)
[9] Cohen, A.; Daubechies, I.; DeVore, R. A.; Kerkyacharian, G.; Picard, D., Capturing ridge functions in high dimensions from point queries, Constr. Approx., 35, 2, 225-243 (2012) · Zbl 1318.62286
[10] DeVore, R.; Lorentz, G. G., Constructive Approximation (1993) · Zbl 0797.41016
[11] Donoho, D. L.; Johnstone, I. M., Projection based regression and a duality with kernel methods, Ann. Statist., 17, 58-106 (1989) · Zbl 0699.62067
[12] Fornasier, M.; Schnass, K.; Vybíral, J., Learning functions of few arbitrary linear parameters in high dimensions, Found. Comput. Math., 12, 2, 229-262 (2012) · Zbl 1252.65036
[13] Friedman, J. H.; Stuetzel, W., Projection pursuit regression, J. Amer. Statist. Assoc., 76, 817-823 (1981)
[14] Hall, P.; Li, K. C., On almost linearity of low dimensional projections from high dimensional data, Ann. Statist., 867-889 (1993) · Zbl 0782.62065
[15] Hardle, W., Applied Nonparametric Regression, vol. 26 (1990), Cambridge Univ. Press
[16] Huber, P. J., Projection pursuit, Ann. Statist., 13, 435-475 (1985) · Zbl 0595.62059
[17] Koltchinskii, V.; Yuan, M., Sparsity in multiple kernel learning, Ann. Statist., 38, 6, 3660-3695 (2010) · Zbl 1204.62086
[18] Laurent, B.; Massart, P., Adaptive estimation of a quadratic functional by model selection, Ann. Statist., 28, 5, 1302-1338 (2000) · Zbl 1105.62328
[19] Lee, K.; Bresler, Y., ADMiRA: Atomic decomposition for minimum rank approximation, IEEE Trans. Inform. Theory, 56, 9, 4402-4416 (2010) · Zbl 1366.94112
[20] Li, K. C., Sliced inverse regression for dimension reduction, J. Amer. Statist. Assoc., 316-327 (1991) · Zbl 0742.62044
[21] Li, Q.; Racine, J., Nonparametric Econometrics: Theory and Practice (2007), Princeton University Press: Princeton University Press Princeton, NJ · Zbl 1183.62200
[22] Lin, Y.; Zhang, H. H., Component selection and smoothing in multivariate nonparametric regression, Ann. Statist., 34, 5, 2272-2297 (2006) · Zbl 1106.62041
[23] Logan, B. F.; Shepp, L. A., Optimal reconstruction of a function from its projections, Duke Math. J., 42, 645-659 (1975) · Zbl 0343.41020
[24] Meier, L.; Van De Geer, S.; Bühlmann, P., High-dimensional additive modeling, Ann. Statist., 37, 6B, 3779-3821 (2009) · Zbl 1360.62186
[25] Meka, R.; Jain, P.; Dhillon, I. S., Guaranteed rank minimization via singular value projection (2009)
[26] Muller-Gronbach, Th.; Ritter, K., Minimal errors for strong and weak approximation of stochastic differential equations, (Monte Carlo and Quasi-Monte Carlo Methods (2008)), 53-82 · Zbl 1140.65305
[27] Novak, E.; Wozniakowski, H., Approximation of infinitely differentiable multivariate functions is intractable, J. Complexity, 25, 398-404 (August 2009)
[28] Pinkus, A., Approximation theory of the MLP model in neural networks, Acta Numer., 8, 143-195 (1999) · Zbl 0959.68109
[29] Raskutti, G.; Wainwright, M. J.; Yu, B., Minimax-optimal rates for sparse additive models over kernel classes via convex programming (2010), Technical report
[30] Ravikumar, P.; Lafferty, J.; Liu, H.; Wasserman, L., Sparse additive models, J. R. Stat. Soc. Ser. B Stat. Methodol., 71, 5, 1009-1030 (2009) · Zbl 1411.62107
[31] Recht, B.; Fazel, M.; Parrilo, P. A., Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Rev., 52, 471-501 (2010) · Zbl 1198.90321
[32] Rohn, J., A handbook of results on interval linear problems (2005), Czech Academy of Sciences: Czech Academy of Sciences Prague, Czech Republic, Technical report
[33] Rudin, W., Function Theory in the Unit Ball of \(C^n (1980)\), Springer Verlag: Springer Verlag New York/Berlin · Zbl 0495.32001
[34] Traub, J. F.; Wasilkowski, G. W.; Wozniakowski, H., Information-Based Complexity (1988), Academic Press: Academic Press New York · Zbl 0654.94004
[36] Waters, Andrew E.; Sankaranarayanan, Aswin C.; Baraniuk, Richard G., SPARCS: Recovering low-rank and sparse matrices from compressive measurements, (Neural Information Processing Systems (NIPS) (2011))
[37] Wedin, P. A., Perturbation bounds in connection with singular value decomposition, BIT, 12, 99-111 (1972) · Zbl 0239.15015
[38] Weyl, H., Das asymptotische verteilungsgesetz der eigenwerte linearer partieller differentialgleichungen (mit einer anwendung auf die theorie der hohlraumstrahlung), Math. Ann., 71, 441-479 (1912) · JFM 43.0436.01
[39] So, W., Rank one perturbation and its application to the Laplacian spectrum of a graph, Linear Multilinear Algebra, 46, 193-198 (1999) · Zbl 0935.05065
[40] Xia, Y., A multiple-index model and dimension reduction, J. Amer. Statist. Assoc., 103, 484, 1631-1640 (2008) · Zbl 1286.62021
[41] Xia, Y.; Tong, H.; Li, W. K.; Zhu, L. X., An adaptive estimation of dimension reduction space, J. R. Stat. Soc. Ser. B Stat. Methodol., 64, 3, 363-410 (2002) · Zbl 1091.62028
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.