Qi, Zhengling; Cui, Ying; Liu, Yufeng; Pang, Jong-Shi Asymptotic properties of stationary solutions of coupled nonconvex nonsmooth empirical risk minimization. (English) Zbl 07592368 Math. Oper. Res. 47, No. 3, 2034-2064 (2022). Summary: This paper has two main goals: (a) establish several statistical properties – consistency, asymptotic distributions, and convergence rates – of stationary solutions and values of a class of coupled nonconvex and nonsmooth empirical risk-minimization problems and (b) validate these properties by a noisy amplitude-based phase-retrieval problem, the latter being of much topical interest. Derived from available data via sampling, these empirical risk-minimization problems are the computational workhorse of a population risk model that involves the minimization of an expected value of a random functional. When these minimization problems are nonconvex, the computation of their globally optimal solutions is elusive. Together with the fact that the expectation operator cannot be evaluated for general probability distributions, it becomes necessary to justify whether the stationary solutions of the empirical problems are practical approximations of the stationary solution of the population problem. When these two features, general distribution and nonconvexity, are coupled with nondifferentiability that often renders the problems “non-Clarke regular,” the task of the justification becomes challenging. Our work aims to address such a challenge within an algorithm-free setting. The resulting analysis is, therefore, different from much of the analysis in the recent literature that is based on local search algorithms. Furthermore, supplementing the classical global minimizer-centric analysis, our results offer a promising step to close the gap between computational optimization and asymptotic analysis of coupled, nonconvex, nonsmooth statistical estimation problems, expanding the former with statistical properties of the practically obtained solution and providing the latter with a more practical focus pertaining to computational tractability. MSC: 62F12 Asymptotic properties of parametric estimators 90C26 Nonconvex programming, global optimization 49J52 Nonsmooth analysis Keywords:asymptotic distribution; consistency; convergence rates; directional stationarity; nonconvexity; nonsmoothness; phase-retrieval problem; statistical analysis Software:Wirtinger Flow PDFBibTeX XMLCite \textit{Z. Qi} et al., Math. Oper. Res. 47, No. 3, 2034--2064 (2022; Zbl 07592368) Full Text: DOI arXiv References: [1] [1] Attouch H (1984) Variational Convergence for Functions and Operators (Pitman Press, Boston).Google Scholar · Zbl 0561.49012 [2] [2] Bryc W (2012) The Normal Distribution: Characterizations with Applications (Springer Science & Business Media, New York).Google Scholar [3] [3] Candes EJ, Li X, Soltanolkotabi M (2015) Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Trans. Inform. Theory 61(4):1985-2007.Crossref, Google Scholar · Zbl 1359.94069 [4] [4] Chen Y, Chi Y, Fan J, Ma C (2019) Gradient descent with random initialization: Fast global convergence for nonconvex phase retrieval. Math. Programming 176:5-37.Crossref, Google Scholar · Zbl 1415.90086 [5] [5] Chernoff H (1954) On the distribution of the likelihood ratio. Ann. Math. Statist. 25(3):573-578.Crossref, Google Scholar · Zbl 0056.37102 [6] [6] Chung KL (1954) On a stochastic approximation method. Ann. Math. Statist. 25(3):463-483.Crossref, Google Scholar · Zbl 0059.13203 [7] [7] Clarke FH (1990) Optimization and Nonsmooth Analysis. Classics in Applied Mathematics. SIAM, vol. 5, (Reprint from John Wiley, New York).Crossref, Google Scholar · Zbl 0696.49002 [8] [8] Cui Y, Pang JS, Sen B (2018) Composite difference-max programs for modern statistical estimation problems. SIAM J. Optim. 28(4):3344-3374.Crossref, Google Scholar · Zbl 1407.62250 [9] [9] Cui Y, Chang TH, Hong M, Pang JS (2020) A study of piecewise linear-quadratic programs. J. Optim. Theory Appl. 186:523-553.Crossref, Google Scholar · Zbl 1491.90110 [10] [10] Davis D, Drusvyatskiy D (2018) Graphical convergence of subgradients in nonconvex optimization and learning. Preprint, submitted October 17, https://arxiv.org/abs/1810.07590.Google Scholar [11] [11] Davis D, Drusvyatskiy D, Paquette C (2018) The nonsmooth landscape of phase retrieval. IMA J. Numerical Anal. 40(4):2652-2695.Crossref, Google Scholar · Zbl 1464.65063 [12] [12] Davis D, Drusvyatskiy D, Kakade S, Lee JD (2020) Stochastic subgradient method converges on tame functions. Foundations Comput. Math. 20(1):119-154.Crossref, Google Scholar · Zbl 1433.65141 [13] [13] Dentcheva D, Penev S, Ruszczynski A (2017) Statistical estimation of composite risk functionals and risk optimization problems. Ann. Inst. Statist. Math. 69:737-760.Crossref, Google Scholar · Zbl 1447.62119 [14] [14] Duchi J, Ruan F (2018) Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval. Inform. Inference 8(3):471-529.Crossref, Google Scholar · Zbl 1478.90084 [15] [15] Dunson D, Hannah LA (2013) Multivariate convex regression with adaptive partitioning. J. Machine Learn. Res. 14:3261-3294.Google Scholar · Zbl 1318.62225 [16] [16] Dupačová J, Wets RJ-B (1988) Asymptotic behavior of statistical estimators and of optimal solutions of stochastic optimization problems. Ann. Statist. 16(4):1517-1549.Crossref, Google Scholar · Zbl 0667.62018 [17] [17] Facchinei F, Pang JS (2003) Finite-Dimensional Variational Inequalities and Complementarity Problems (Springer, New York).Google Scholar · Zbl 1062.90001 [18] [18] Ferguson TS (1996) A Course in Large Sample Theory (Routledge).Crossref, Google Scholar · Zbl 0871.62002 [19] [19] Fisher RA (1922) On the mathematical foundations of theoretical statistics. Philos. Trans. R. Soc. A 222:594-604.Google Scholar [20] [20] Fisher RA (1925) Theory of statistical estimation. Math. Proc. Cambridge Philos. Soc. 22:700-725.Crossref, Google Scholar · JFM 51.0385.01 [21] [21] Geyer CJ (1994) On the asymptotics of constrained M-estimation. Ann. Statist. 22(4):1993-2010.Crossref, Google Scholar · Zbl 0829.62029 [22] [22] Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. Proc. 14th Internat. Conf. Artificial Intelligence Statist. 315-323.Google Scholar [23] [23] Gürkan G, Yonca Özge A, Robinson S (1999) Sample-path solution of stochastic variational inequalities. Math. Programming 84:313-333.Crossref, Google Scholar · Zbl 0972.90079 [24] [24] Huber PJ (1967) The behavior of maximum likelihood estimates under nonstandard conditions. Le Cam LM, Neyman J, eds. Proc. Fifth Berkeley Sympos. Math. Statist. Probab., vol. 1 (University of California Press), 221-233.Google Scholar · Zbl 0212.21504 [25] [25] Jin C, Liu L, Ge R, Jordan M (2018) On the local minima of the empirical risk. Proc. Neural Inform. Processing Systems (NIPS).Google Scholar [26] [26] King AJ (1993) Asymptotic behaviour of solutions in stochastic optimization: nonsmooth analysis and the derivation of non-normal limit distributions. Unpublished PhD dissertation, University of Washington, Seattle.Google Scholar [27] [27] King A, Rockafellar RT (1993) Asymptotic theory for solutions in statistical estimation and stochastic programming. Math. Oper. Res. 18(1):148-162.Link, Google Scholar · Zbl 0798.90115 [28] [28] LeCam L (1970) On the assumptions used to prove asymptotic normality of maximum likelihood estimates. Ann. Math. Statist. 41(3):802-828.Crossref, Google Scholar · Zbl 0246.62039 [29] [29] Le Thi HA, Pham DT (2005) The DC programming and DCA revised with DC models of real world nonconvex optimization problems. Ann. Oper. Res. 133:25-46.Google Scholar · Zbl 1116.90122 [30] [30] Loh PL (2017) Statistical consistency and asymptotic normality for high-dimensional robust M-estimators. Ann. Statist. 45(2):866-896.Crossref, Google Scholar · Zbl 1371.62023 [31] [31] Loh PL, Wainwright M (2012) High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity. Ann. Statist. 40(3):1637-1664.Crossref, Google Scholar · Zbl 1257.62063 [32] [32] Loh PL, Wainwright M (2015) Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima. J. Machine Learn. Res. 16(1):559-616.Google Scholar · Zbl 1360.62276 [33] [33] Lu Z, Zhou Z, Sun Z (2019) Enhanced proximal DC algorithms with extrapolation for a class of structured nonsmooth DC minimization. Math. Programming 176:369-401.Crossref, Google Scholar · Zbl 1415.90091 [34] [34] Luedtke AR, Van Der Laan MJ (2016) Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy. Ann. Statist. 44(2):713-742.Crossref, Google Scholar · Zbl 1338.62089 [35] [35] Ma J, Xu J, Maleki A (2019) Optimization-based AMP for phase retrieval: The impact of initialization and ℓ2 regularization. IEEE Trans. Inform. Theory 65(6):3600-3629.Crossref, Google Scholar · Zbl 1432.94032 [36] [36] Mei S, Bai Y, Montanari A (2018) The landscape of empirical risk for non-convex losses. Ann. Statist. 46:2747-2774.Crossref, Google Scholar · Zbl 1409.62117 [37] [37] Molchanov I (2005) Theory of Random Sets, vol. 19, no. 2 (Springer, London).Google Scholar [38] [38] Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. Proc. 27th Internat. Conf. Machine Learn., 807-814.Google Scholar [39] [39] Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574-1609.Crossref, Google Scholar · Zbl 1189.90109 [40] [40] Pang JS, Razaviyayn M, Alvarado A (2016) Computing B-stationary points of nonsmooth DC programs. Math. Oper. Res. 42(1):95-118.Link, Google Scholar · Zbl 1359.90106 [41] [41] Polyak B (1990) New stochastic approximation type procedures. Avtomatica i Telemekhanika 51(7):98-107.Google Scholar [42] [42] Polyak B, Juditsky A (1992) Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4):838-855.Crossref, Google Scholar · Zbl 0762.62022 [43] [43] Robbins H, Monro S (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400-407.Crossref, Google Scholar · Zbl 0054.05901 [44] [44] Rockafellar RT (1970) Convex Analysis (Princeton University Press, Princeton).Crossref, Google Scholar · Zbl 0932.90001 [45] [45] Rockafellar RT, Wets RJ-B (1998) Variational Analysis (Springer, New York).Crossref, Google Scholar · Zbl 0888.49001 [46] [46] Royset J (2020) Approximations of semicontinuous functions with applications to stochastic optimization and statistical estimation. Math. Programming Ser. A 184:289-318.Crossref, Google Scholar · Zbl 1451.90108 [47] [47] Royset J, Wets RJ-B (2020) Variational analysis of constrained M-estimators. Ann. Statist. 48:2759-2790.Crossref, Google Scholar · Zbl 1460.62067 [48] [48] Scholtes S (2002) Introduction to Piecewise Differentiable Equations (Springer).Google Scholar [49] [49] Shapiro A (1989) Asymptotic properties of statistical estimators in stochastic programming. Ann. Statist. 17(2):841-858.Crossref, Google Scholar · Zbl 0688.62025 [50] [50] Shapiro A (2003) Monte Carlo sampling methods. Rusczyński A, Shapiro A, eds. Stochastic Programming, Handbooks in OR & MS, vol. 10 (NorthHolland Publishing Company, Amsterdam), 353-425.Crossref, Google Scholar [51] [51] Shapiro A, Xu H (2007) Uniform laws of large numbers for set-valued mappings and subdifferentials of random functions. J. Math. Anal. Appl. 325(2):1390-1399.Crossref, Google Scholar · Zbl 1109.60030 [52] [52] Shapiro A, Dentcheva D, Ruszczyński A (2009) Lectures on Stochastic Programming: Modeling and Theory (SIAM Publications, Philadelphia).Crossref, Google Scholar · Zbl 1183.90005 [53] [53] Shechtman Y, Eldar Y, Cohen O, Chapman H, Miao J, Segev M (2015) Phase retrieval with application to optical imaging: A contemporary overview. IEEE Signal Processing Magazine 32(3):87-109.Crossref, Google Scholar [54] [54] Sun J, Qu Q, Wright J (2018) A geometric analysis of phase retrieval. Foundations Comput. Math. 18:1131-1198.Crossref, Google Scholar · Zbl 1401.94049 [55] [55] van de Geer S (2000) Empirical Processes in M-estimation, vol. 6 (Cambridge University Press).Google Scholar · Zbl 0953.62049 [56] [56] van der Vaart AW (1998) Asymptotic Statistics, vol. 3 (Cambridge University Press).Crossref, Google Scholar · Zbl 0943.62002 [57] [57] van der Vaart AW, Wellner J (1996) Weak Convergence and Empirical Processes (Springer).Crossref, Google Scholar · Zbl 0862.60002 [58] [58] Xu H (2010) Sample average approximation methods for a class of stochastic variational inequality problems. Asia-Pacific J. Oper. Res. 27(1):103-119.Crossref, Google Scholar · Zbl 1186.90083 [59] [59] Xu H, Zhang D (2009) Smooth sample average approximation of stationary points in nonsmooth stochastic optimization and applications. Math. Programming 119:371-401.Crossref, Google Scholar · Zbl 1168.90009 [60] [60] Wald A (1949) Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist. 20(4):595-601.Crossref, Google Scholar · Zbl 0034.22902 [61] [61] Wang G, Giannakis GB, Eldar YC (2018) Solving systems of random quadratic equations via truncated amplitude flow. IEEE Trans. Inform. Theory 64(2):773-794.Crossref, Google Scholar · Zbl 1390.90451 [62] [62] Wets RJB (1979) A statistical approach to the solution of stochastic programs with (convex) simple recourse. Working paper, University of Kentucky, Lexington.Google Scholar This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.