zbMATH — the first resource for mathematics

Solving equations of random convex functions via anchored regression. (English) Zbl 1422.62235
Summary: We consider the question of estimating a solution to a system of equations that involve convex nonlinearities, a problem that is common in machine learning and signal processing. Because of these nonlinearities, conventional estimators based on empirical risk minimization generally involve solving a non-convex optimization program. We propose anchored regression, a new approach based on convex programming that amounts to maximizing a linear functional (perhaps augmented by a regularizer) over a convex set. The proposed convex program is formulated in the natural space of the problem, and avoids the introduction of auxiliary variables, making it computationally favorable. Working in the native space also provides great flexibility as structural priors (e.g., sparsity) can be seamlessly incorporated. For our analysis, we model the equations as being drawn from a fixed set according to a probability law. Our main results provide guarantees on the accuracy of the estimator in terms of the number of equations we are solving, the amount of noise present, a measure of statistical complexity of the random equations, and the geometry of the regularizer at the true solution. We also provide recipes for constructing the anchor vector (that determines the linear functional to maximize) directly from the observed data.
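As a rough illustration of "maximizing a linear functional over a convex set", the sketch below treats a two-dimensional toy case. Everything specific in it is an assumption made for illustration and is not taken from the paper: the convex functions are taken to be f_i(x) = |<a_i, x>| with Gaussian a_i, the observations are noiseless, no regularizer is used, the anchor is a hand-picked rough guess rather than one built from data, and the function name `anchored_regression_2d` is hypothetical. The program is solved by brute-force vertex enumeration, which is adequate only for a toy of this size; a realistic problem would be handed to a linear or convex programming solver.

```python
import itertools
import math
import random


def anchored_regression_2d(A, y, anchor, tol=1e-9):
    """Maximize <anchor, x> subject to |<a_i, x>| <= y_i for all i, with x in R^2.

    The feasible set is a bounded polygon (given two non-parallel a_i), so the
    maximum of a linear functional is attained at a vertex, i.e. at the
    intersection of two constraint boundary lines.
    """
    # Each constraint |<a, x>| <= y has two boundary lines: <a, x> = +y and <a, x> = -y.
    lines = [(a, s * yi) for a, yi in zip(A, y) for s in (1.0, -1.0)]
    best, best_val = None, -math.inf
    for (a, b), (c, d) in itertools.combinations(lines, 2):
        det = a[0] * c[1] - a[1] * c[0]
        if abs(det) < 1e-12:
            continue  # (nearly) parallel lines do not define a vertex
        # Cramer's rule for the 2x2 system <a, x> = b, <c, x> = d.
        x = ((b * c[1] - a[1] * d) / det, (a[0] * d - b * c[0]) / det)
        # Keep the candidate only if it satisfies every convex constraint.
        if all(abs(ai[0] * x[0] + ai[1] * x[1]) <= yi + tol for ai, yi in zip(A, y)):
            val = anchor[0] * x[0] + anchor[1] * x[1]
            if val > best_val:
                best, best_val = x, val
    return best


# Toy data (all values are illustrative assumptions).
rng = random.Random(0)
x_true = (1.0, -2.0)
A = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]        # random equations
y = [abs(a[0] * x_true[0] + a[1] * x_true[1]) for a in A]          # noiseless observations
anchor = (0.8, -2.3)  # hypothetical anchor: a rough guess correlated with x_true

x_hat = anchored_regression_2d(A, y, anchor)
print("estimate:", x_hat)
```

Note that the constraints alone cannot distinguish x from -x; in this toy it is the anchor that selects which of the two the linear functional favors.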

62J02 General nonlinear regression
62F10 Point estimation
90C25 Convex programming