Taylor, Jonathan
The geometry of least squares in the 21st century. (English)
Zbl 1402.62157
Bernoulli 19, No. 4, 1449-1464 (2013).

Summary: It has been over 200 years since Gauss’s and Legendre’s famous priority dispute over who discovered the method of least squares. Nevertheless, we argue that the normal equations are still relevant in many facets of modern statistics, particularly in the domain of high-dimensional inference. Even today, we are still learning new things about the law of large numbers, first described in Bernoulli’s Ars Conjectandi 300 years ago, as it applies to high-dimensional inference. Another insight the normal equations provide is the asymptotic Gaussianity of the least squares estimators. Gaussian processes, the general form of the Gaussian distribution, are another tool used in modern high-dimensional inference. The Gaussian distribution also arises via the central limit theorem in describing weak convergence of the usual least squares estimators. In terms of high-dimensional inference, we are still missing the right notion of weak convergence. In this mostly expository work, we try to describe how both the normal equations and the theory of Gaussian processes, what we refer to as the “geometry of least squares,” apply to many questions of current interest.

Cited in 4 Documents

MSC:
62J05 Linear regression; mixed models
60G15 Gaussian processes
62-03 History of statistics
01A61 History of mathematics in the 21st century
62J07 Ridge regression; shrinkage estimators (Lasso)

Keywords: convex analysis; Gaussian processes; least squares; penalized regression

Software: softImpute; covTest; glmnet; NESTA; glasso; hierNet
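As a pointer for readers, the normal equations invoked in the summary are the first-order optimality conditions of the least squares problem; in standard notation (ours, not taken from the paper itself):

```latex
\[
\hat\beta \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p}
  \|y - X\beta\|_2^2
\quad\Longleftrightarrow\quad
X^{\top} X \hat\beta = X^{\top} y,
\]
% and, when X has full column rank,
\[
\hat\beta = (X^{\top} X)^{-1} X^{\top} y .
\]
```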