Estimation from nonlinear observations via convex programming with application to bilinear regression. (English) Zbl 1420.62088

Summary: We propose a computationally efficient estimator, formulated as a convex program, for a broad class of nonlinear regression problems that involve difference-of-convex (DC) nonlinearities. The proposed method can be viewed as a significant extension of the “anchored regression” method formulated and analyzed in [S. Bahmani and J. Romberg, Found. Comput. Math. 19, No. 4, 813–841 (2019; Zbl 1422.62235)] for regression with convex nonlinearities. Our main assumption, in addition to other mild statistical and computational assumptions, is the availability of a certain approximation oracle for the average of the gradients of the observation functions at the ground truth. Under this assumption, and using a PAC-Bayesian analysis, we show that the proposed estimator produces an accurate estimate with high probability. As a concrete example, we study the proposed framework in the bilinear regression problem with Gaussian factors and quantify a sufficient sample complexity for exact recovery. Furthermore, we describe a computationally tractable scheme that provably produces the required approximation oracle in this bilinear regression problem.
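To make the anchored-regression template concrete, the following minimal sketch (an illustration by this reviewer, not the authors' code) treats the simplest instance of the idea: magnitude-only observations y_j = |⟨x_j, θ*⟩|, for which each nonlinear constraint |⟨x_j, θ⟩| ≤ y_j splits into two linear inequalities and the anchored program "maximize ⟨a, θ⟩ subject to the observation constraints" reduces to a linear program (this is the convex-nonlinearity setting of the cited anchored-regression work, in the spirit of PhaseMax, rather than the paper's DC extension). The anchor vector `a` is assumed given, as the method requires; the problem sizes and anchor noise level are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 10, 200
theta_star = rng.standard_normal(n)        # ground-truth signal
X = rng.standard_normal((m, n))            # Gaussian measurement vectors x_j
y = np.abs(X @ theta_star)                 # magnitude-only observations

# Assumed anchor: a direction positively correlated with theta_star,
# standing in for the approximation oracle the method requires.
a = theta_star + 0.5 * rng.standard_normal(n)

# Anchored regression here is the LP:
#   maximize <a, theta>  subject to  -y_j <= x_j^T theta <= y_j
res = linprog(
    c=-a,                                  # linprog minimizes, so negate
    A_ub=np.vstack([X, -X]),               # encodes |x_j^T theta| <= y_j
    b_ub=np.concatenate([y, y]),
    bounds=[(None, None)] * n,             # theta is unconstrained
)
theta_hat = res.x
```

With enough measurements and a sufficiently correlated anchor, the LP solution aligns with the ground truth (up to the global sign ambiguity inherent in magnitude observations, which the anchor resolves).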


62F10 Point estimation
90C25 Convex programming
62J02 General nonlinear regression


Zbl 1422.62235
Full Text: DOI arXiv Euclid


[1] A. Aghasi, A. Ahmed, and P. Hand. BranchHull: Convex bilinear inversion from the entrywise product of signals with known signs. Preprint, arXiv:1702.04342 [cs.IT], 2017.
[2] A. A. Ahmadi and G. Hall. DC decomposition of nonconvex polynomials with algebraic techniques. Mathematical Programming, Series B, 2017. · Zbl 1390.90418
[3] A. Ahmed, B. Recht, and J. Romberg. Blind deconvolution using convex programming. IEEE Transactions on Information Theory, 60(3):1711-1732, March 2014. · Zbl 1360.94057
[4] P. Alquier. PAC-Bayesian bounds for randomized empirical risk minimizers. Mathematical Methods of Statistics, 17(4):279-304, Dec. 2008. · Zbl 1260.62038
[5] P. Alquier and K. Lounici. PAC-Bayesian bounds for sparse regression estimation with exponential weights. Electronic Journal of Statistics, 5:127-145, 2011. · Zbl 1274.62463
[6] J.-Y. Audibert and O. Catoni. Robust linear least squares regression. The Annals of Statistics, 39(5):2766-2794, Oct. 2011. · Zbl 1231.62126
[7] S. Bahmani and J. Romberg. Lifting for blind deconvolution in random mask imaging: Identifiability and convex relaxation. SIAM Journal on Imaging Sciences, 8(4):2203-2238, 2015. · Zbl 1330.94006
[8] S. Bahmani and J. Romberg. Phase retrieval meets statistical learning theory: A flexible convex relaxation. In A. Singh and J. Zhu, editors, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54 of Proceedings of Machine Learning Research, pages 252-260, Fort Lauderdale, FL, USA, 20-22 Apr 2017a. PMLR. · Zbl 1408.62032
[9] S. Bahmani and J. Romberg. A flexible convex relaxation for phase retrieval. Electronic Journal of Statistics, 11(2):5254-5281, 2017b. · Zbl 1408.62032
[10] S. Bahmani and J. Romberg. Solving equations of random convex functions via anchored regression. Foundations of Computational Mathematics, 19(4):813-841, 2019. · Zbl 1422.62235
[11] O. Bousquet, V. Koltchinskii, and D. Panchenko. Some local measures of complexity of convex hulls and generalization bounds. In J. Kivinen and R. H. Sloan, editors, Computational Learning Theory, pages 59-73, Berlin, Heidelberg, 2002. Springer Berlin Heidelberg. · Zbl 1050.68055
[12] E. J. Candès, X. Li, and M. Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985-2007, Apr. 2015. · Zbl 1359.94069
[13] O. Catoni. PAC-Bayesian supervised classification: the thermodynamics of statistical learning, volume 56 of Lecture Notes-Monograph Series. Institute of Mathematical Statistics, Beachwood, OH, USA, 2007. · Zbl 1277.62015
[14] O. Catoni and I. Giulini. Dimension-free PAC-Bayesian bounds for matrices, vectors, and linear least squares regression. arXiv preprint, Dec. 2017. arXiv:1712.02747 [math, stat].
[15] R. Y. Chen, A. Gittens, and J. A. Tropp. The masked sample covariance estimator: An analysis using matrix concentration inequalities. Information and Inference: A Journal of the IMA, 1(1):2-20, 2012. · Zbl 06242993
[16] Y. Chen and E. Candès. Solving random quadratic systems of equations is nearly as easy as solving linear systems. In Advances in Neural Information Processing Systems 28, pages 739-747. Curran Associates, Inc., 2015.
[17] P. Germain, A. Lacasse, F. Laviolette, and M. Marchand. PAC-Bayesian learning of linear classifiers. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, pages 353-360, New York, NY, USA, 2009. ACM.
[18] E. Giné and V. Koltchinskii. Concentration inequalities and asymptotic results for ratio type empirical processes. Annals of Probability, 34(3):1143-1216, May 2006. · Zbl 1152.60021
[19] E. Giné, V. Koltchinskii, and J. A. Wellner. Ratio limit theorems for empirical processes. In E. Giné, C. Houdré, and D. Nualart, editors, Stochastic Inequalities and Applications, pages 249-278, Basel, 2003. Birkhäuser Basel. · Zbl 1055.60019
[20] T. Goldstein and C. Studer. Convex phase retrieval without lifting via PhaseMax. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1273-1281, International Convention Centre, Sydney, Australia, 06-11 Aug 2017. PMLR.
[21] T. Goldstein and C. Studer. PhaseMax: Convex phase retrieval via basis pursuit. IEEE Transactions on Information Theory, 64(4):2675-2689, April 2018. · Zbl 1390.94194
[22] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 2.1, Mar. 2014. URL http://cvxr.com/cvx.
[23] Gurobi Optimization, Inc. Gurobi optimizer reference manual, 2016. URL http://www.gurobi.com.
[24] P. Hartman. On functions representable as a difference of convex functions. Pacific J. Math., 9(3):707-713, 1959. · Zbl 0093.06401
[25] M. Junge and Q. Zeng. Noncommutative Bennett and Rosenthal inequalities. Annals of Probability, 41(6):4287-4316, Nov. 2013. · Zbl 1290.46056
[26] V. Koltchinskii. Rademacher penalties and structural risk minimization. IEEE Transactions on Information Theory, 47(5):1902-1914, 2001. · Zbl 1008.62614
[27] V. Koltchinskii. Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Lecture Notes in Mathematics: École d’Été de Probabilités de Saint-Flour XXXVIII-2008. Springer-Verlag Berlin Heidelberg, 2011. · Zbl 1223.91002
[28] V. Koltchinskii and S. Mendelson. Bounding the smallest singular value of a random matrix without concentration. International Mathematics Research Notices, 2015(23):12991-13008, 2015. · Zbl 1331.15027
[29] V. Koltchinskii and D. Panchenko. Rademacher processes and bounding the risk of function learning. In E. Giné, D. M. Mason, and J. A. Wellner, editors, High Dimensional Probability II, pages 443-457, Boston, MA, 2000. Birkhäuser Boston. · Zbl 1106.68385
[30] J. Langford and J. Shawe-Taylor. PAC-Bayes & margins. In Advances in Neural Information Processing Systems, pages 439-446, 2003.
[31] M. Ledoux and M. Talagrand. Probability in Banach Spaces: Isoperimetry and Processes. Springer Science & Business Media, 2013. · Zbl 1226.60003
[32] X. Li, S. Ling, T. Strohmer, and K. Wei. Rapid, robust, and reliable blind deconvolution via nonconvex optimization. Applied and Computational Harmonic Analysis, 2018. In press.
[33] S. Ling and T. Strohmer. Regularized gradient descent: A non-convex recipe for fast joint blind deconvolution and demixing. Information and Inference: A Journal of the IMA, 2018. In press.
[34] W. Luo, W. Alghamdi, and Y. M. Lu. Optimal spectral initialization for signal recovery with applications to phase retrieval. Preprint, arXiv:1811.04420 [cs.IT], 2018.
[35] C. Ma, K. Wang, Y. Chi, and Y. Chen. Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval, matrix completion and blind deconvolution. Preprint, arXiv:1711.10467 [cs.LG], 2017.
[36] L. Mackey, M. I. Jordan, R. Y. Chen, B. Farrell, and J. A. Tropp. Matrix concentration inequalities via the method of exchangeable pairs. Annals of Probability, 42(3):906-945, May 2014. · Zbl 1294.60008
[37] D. McAllester and T. Akinbiyi. PAC-Bayesian Theory, pages 95-103. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.
[38] D. A. McAllester. Some PAC-Bayesian theorems. Machine Learning, 37(3):355-363, Dec. 1999. · Zbl 0945.68157
[39] S. Mendelson. Learning without concentration. In Proceedings of the 27th Conference on Learning Theory (COLT), volume 35 of JMLR W&CP, pages 25-39, 2014. · Zbl 1333.68232
[40] S. Mendelson. Learning without concentration. Journal of the ACM, 62(3):21:1-21:25, June 2015. ISSN 0004-5411. · Zbl 1333.68232
[41] M. Mondelli and A. Montanari. Fundamental limits of weak recovery with applications to phase retrieval. In Proceedings of the 31st Conference On Learning Theory (COLT), volume 75 of Proceedings of Machine Learning Research, pages 1445-1450. PMLR, 2018. · Zbl 1464.62296
[42] P. Netrapalli, P. Jain, and S. Sanghavi. Phase retrieval using alternating minimization. In Advances in Neural Information Processing Systems 26, pages 2796-2804. Curran Associates, Inc., 2013. · Zbl 1394.94421
[43] R. I. Oliveira. The lower tail of random quadratic forms with applications to ordinary least squares. Probability Theory and Related Fields, 166(3):1175-1194, Dec. 2016. · Zbl 1360.60075
[44] Y. Plan and R. Vershynin. Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach. IEEE Transactions on Information Theory, 59(1):482-494, Jan. 2013. · Zbl 1364.94153
[45] Y. Plan and R. Vershynin. The generalized lasso with non-linear observations. IEEE Transactions on Information Theory, 62(3):1528-1537, March 2016. · Zbl 1359.94153
[46] A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer Series in Statistics. Springer, 1996. · Zbl 0862.60002
[47] V. N. Vapnik. Statistical Learning Theory. Wiley, 1998. · Zbl 0935.62007
[48] V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability & Its Applications, 16(2):264-280, 1971. · Zbl 0247.60005
[49] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. In G. Kutyniok and Y. Eldar, editors, Compressed Sensing, Theory and Applications, pages 210-268. Cambridge University Press, 2012.
[50] Y. Yu, T. Wang, and R. J. Samworth. A useful variant of the Davis-Kahan theorem for statisticians. Biometrika, 102(2):315-323, 2015. · Zbl 1452.15010