zbMATH — the first resource for mathematics

OR forum: An algorithmic approach to linear regression. (English) Zbl 1338.90272
Summary: Linear regression models are traditionally built through trial and error to balance many competing goals such as predictive power, interpretability, significance, robustness to error in data, and sparsity, among others. This problem lends itself naturally to a mixed integer quadratic optimization (MIQO) approach but has not been modeled this way because of the belief in the statistics community that MIQO is intractable for large scale problems. However, in the last 25 years (1991–2015), algorithmic advances in integer optimization combined with hardware improvements have resulted in an astonishing 450 billion factor speedup in solving mixed integer optimization problems. We present an MIQO-based approach for designing high quality linear regression models that explicitly addresses various competing objectives and demonstrate the effectiveness of our approach on both real and synthetic data sets.

90C11 Mixed integer programming
90C20 Quadratic programming
62J05 Linear regression; mixed models
Full Text: DOI
[1] Bach FR (2008) Consistency of the group Lasso and multiple kernel learning. J. Machine Learn. Res. 9:1179-1225. · Zbl 1225.68147
[2] Bache K, Lichman M (2014) UCI machine learning repository. Accessed August 20, 2014, http://archive.ics.uci.edu/ml.
[3] Ben-Tal A, El Ghaoui L, Nemirovski A (2009) Robust Optimization (Princeton University Press, Princeton, NJ). CrossRef
[4] Bertsimas D, Copenhaver M (2014) Characterization of the equivalence of robustification and regularization in linear, median, and matrix regression. Submitting for publication. · Zbl 1403.62040
[5] Bertsimas D, Fertis A (2009) On the equivalence of robust optimization and regularization in statistics. Technical report, http://www.mit.edu/bertsim/papers.html.
[6] Bertsimas D, Freund R (2004) Data, Models, and Decisions: The Fundamentals of Management Science (Dynamic Ideas Press, Belmont, MA).
[7] Bertsimas D, Mazumder R (2014) Least quantile regression via modern optimization. Ann. Statist. 42(6):2494-2525. CrossRef · Zbl 1302.62154
[8] Bertsimas D, Weismantel R (2005) Optimization Over Integers, Vol. 13 (Dynamic Ideas Press, Belmont, MA).
[9] Bertsimas D, Brown DB, Caramanis C (2011) Theory and applications of robust optimization. SIAM Rev. 53(3):464-501. CrossRef · Zbl 1233.90259
[10] Bertsimas D, King A, Mazumder R (2015) Best subset selection via a modern optimization lens. Ann. Statist. Forthcoming. · Zbl 1335.62115
[11] Bezanson J, Karpinski S, Shah VB, Edelman A (2012) Julia: A fast dynamic language for technical computing. arXiv:1411.1607.
[12] Bixby RE (2012) A brief history of linear and mixed-integer programming computation. Documenta Mathematica, Extra Volume: Optim. Stories107-121. · Zbl 1270.90003
[13] Bühlmann P, Van De Geer S (2011) Statistics for High-Dimensional Data: Methods, Theory and Applications (Springer, Berlin). CrossRef
[14] Chatterjee S, Hadi AS, Price B (2012) Regression Analysis by Example, 5th ed. (John Wiley & Sons, New York).
[15] Chen SS, Donoho DL, Saunders MA (1998) Atomic decomposition by basis pursuit. SIAM J. Scientific Comput. 20(1):33-61. CrossRef · Zbl 0919.94002
[16] DeCock D (2011) Ames, Iowa: Alternative to the Boston housing data as an end of semester regression project. J. Statist. Ed. 19(3):1-15.
[17] DiCiccio TJ, Efron B (1996) Bootstrap confidence intervals. Statist. Sci.189-212. · Zbl 0955.62574
[18] Donoho DL, Huber PJ (1983) The notion of breakdown point. A Festschrift for Erich L. Lehmann157-184.
[19] Draper NR, Smith H (1998) Applied Regression Analysis, 3rd ed. (John Wiley & Sons, New York). CrossRef
[20] Efron B (1979) Bootstrap methods: Another look at the jackknife. Ann. Statist. 7(1):1-26. CrossRef · Zbl 0406.62024
[21] Eldar YC, Kutyniok G (2012) Compressed Sensing: Theory and Applications (Cambridge University Press, London). CrossRef
[22] Furnival GM, Wilson RW (1974) Regressions by leaps and bounds. Technometrics 16(4):499-511. CrossRef · Zbl 0294.62079
[23] Greenshtein E (2006) Best subset selection, persistence in high-dimensional statistical learning and optimization under l1 constraint. Ann. Statist. 34(5):2367-2386. CrossRef · Zbl 1106.62022
[24] Gurobi Inc. (2014) Gurobi optimizer reference manual. Accessed August 20, 2014, http://www.gurobi.com.
[25] Hampel FR (1971) A general qualitative definition of robustness. Ann. Math. Statist. 42(6):1887-1896. CrossRef · Zbl 0229.62041
[26] Hastie T (2015) Trevor Hastie lectures and talks. Accessed February 11, 2015, http://www-stat.stanford.edu/astie/TALKS/glmnet_webinar_Rsession.tgz.
[27] Jacob L, Obozinski G, Vert J-P (2009) Group Lasso with overlap and graph lasso. Proc. 26th Ann. Internat. Conf. Machine Learn. (ACM, New York), 433-440. CrossRef
[28] Javanmard A, Montanari A (2013) Confidence intervals and hypothesis testing for high-dimensional regression. arXiv preprint arXiv:1306.3171. · Zbl 1319.62145
[29] Lockhart R, Taylor J, Tibshirani RJ, Tibshirani R (2014) A significance test for the Lasso. Ann. Statist. 42(2):413-468. CrossRef · Zbl 1305.62254
[30] Lubin M, Dunning I (2015) Computing in operations research using Julia. INFORMS J. Comput. 27(2):238-248. Link · Zbl 1331.90001
[31] Ma S, Song X, Huang J (2007) Supervised group lasso with applications to microarray data analysis. BMC Bioinformatics 8(1):60. CrossRef
[32] Massy WF (1965) Principal components regression in exploratory statistical research. J. Amer. Statist. Assoc. 60(309):234-256. CrossRef
[33] Mazumder R, Friedman JH, Hastie T (2011) Sparsenet: Coordinate descent with nonconvex penalties. J. Amer. Statist. Assoc. 106(495):1125-1138. CrossRef · Zbl 1229.62091
[34] Miller A (1990) Subset Selection in Regression (CRC Press, Boca Raton, FL). CrossRef
[35] Nemhauser G (2013) Integer programming: The global impact. EURO, INFORMS, Rome, Italy, http://euro2013.org/wp-content/uploads/Nemhauser_EuroXXVI.pdf.
[36] R Core Team (2014) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org/.
[37] Raskutti G, Wainwright MJ, Yu B (2011) Minimax rates of estimation for high-dimensional linear regression over-balls. IEEE Trans. Inform. Theory 57(10):6976-6994. CrossRef · Zbl 1365.62276
[38] Rhee SY, Taylor J, Wadhera G, Ben-Hur A, Brutlag DL, Shafer RW (2006) Genotypic predictors of human immunodeficiency virus type 1 drug resistance. Proc. Natl. Acad. Sci. 103(46):17355-17360. CrossRef
[39] Rousseeuw PJ (1984) Least median of squares regression. J. Amer. Statist. Assoc. 79(388):871-880. CrossRef · Zbl 0547.62046
[40] Ryan TP (2008) Modern Regression Methods, Vol. 655 (John Wiley & Sons, New York).
[41] Seber GA, Lee AJ (2003) Linear Regression Analysis, 2nd ed. (John Wiley & Sons, New York). CrossRef
[42] Shen X, Pan W, Zhu Y, Zhou H (2013) On constrained and regularized high-dimensional regression. Ann. Institute Statist. Math. 65(5):807-832. CrossRef · Zbl 1329.62307
[43] Tabachnick BG, Fidell LS (2001) Using Multivariate Statistics4th ed. (Allyn and Bacon, Boston).
[44] Tibshirani R (1996) Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B (Methodological) 58(1):267-288. · Zbl 0850.62538
[45] Top500.org (2013) Top500 Supercomputer Sites, Directory page for Top500 lists. Result for each list since June 1993. Accessed December 4, 2013, http://www.top500.org/statistics/sublist/.
[46] Torgo L (2014) Regression data sets. Accessed August 20, 2014, http://www.dcc.fc.up.pt/torgo/Regression/DataSets.html. · Zbl 1032.62065
[47] Tsanas A, Xifara A (2012) Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools. Energy and Buildings 49:560-567. CrossRef
[48] Weisberg S (2014) Applied Linear Regression, 4th ed. (John Wiley & Sons, New York).
[49] Winner L (2014) Miscellaneous data sets. Accessed August 20, 2014, http://www.stat.ufl.edu/inner/datasets.html.
[50] Wold S, Ruhe A, Wold H, Dunn W III (1984) The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses. SIAM J. Scientific Statist. Comput. 5(3):735-743. CrossRef · Zbl 0545.62044
[51] Xu H, Caramanis C, Mannor S (2009) Robustness and regularization of support vector machines. J. Machine Learn. Res. 10:1485-1510. · Zbl 1235.68209
[52] Yeh IC (1998) Modeling of strength of high performance concrete using artificial neural networks. Cement and Concrete Res. 28(12):1797-1808. CrossRef
[53] Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J. Roy. Statist. Soc.: Ser. B (Statist. Methodology) 68(1):49-67. CrossRef · Zbl 1141.62030
[54] Zhang C-H, Zhang T (2012) A general theory of concave regularization for high-dimensional sparse estimation problems. Statist. Sci. 27(4):576-593. CrossRef · Zbl 1331.62353
[55] Zhang Y, Wainwright MJ, Jordan MI (2014) Lower bounds on the performance of polynomial-time algorithms for sparse linear regression. J. Machine Learning Research: Workshop and Conf. Proc. 35:1-28.
[56] Zhao P, Rocha G, Yu B (2009) The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Statist. 37(6A):3468-3497. CrossRef · Zbl 1369.62164
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.