The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. With comments by Ronald A. Thisted and M. R. Osborne and a rejoinder by the authors. (English) Zbl 0955.62608

Summary: Since the time of Gauss, it has been generally accepted that \(l_2\)-methods of combining observations by minimizing sums of squared errors have significant computational advantages over earlier \(l_1\)-methods based on minimization of absolute errors advocated by Boscovich, Laplace and others. However, \(l_1\)-methods are known to have significant robustness advantages over \(l_2\)-methods in many applications, and related quantile regression methods provide a useful, complementary approach to classical least-squares estimation of statistical models. Combining recent advances in interior point methods for solving linear programs with a new statistical preprocessing approach for \(l_1\)-type problems, we obtain a 10- to 100-fold improvement in computational speeds over current (simplex-based) \(l_1\)-algorithms in large problems, demonstrating that \(l_1\)-methods can be made competitive with \(l_2\)-methods in terms of computational speed throughout the entire range of problem sizes. Formal complexity results suggest that \(l_1\)-regression can be made faster than least-squares regression for \(n\) sufficiently large and \(p\) modest.
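The contrast the summary draws can be made concrete: the \(l_1\) (least absolute deviations) problem is a linear program, while the \(l_2\) problem reduces to solving the normal equations. The following is a minimal sketch, assuming SciPy's `linprog` with its HiGHS solver and simulated heavy-tailed data; it is not the paper's interior-point-plus-preprocessing algorithm, only the textbook LP formulation that such algorithms accelerate:

```python
# l1 regression posed as a linear program:
#   min 1'(u + v)  s.t.  X b + u - v = y,  u, v >= 0,  b free,
# compared with ordinary least squares. Illustrative sketch only;
# the data and parameter names are invented for this example.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.standard_t(df=2, size=n)  # heavy-tailed noise

# Decision vector is (b, u, v); only the residual parts enter the objective.
c = np.concatenate([np.zeros(p), np.ones(2 * n)])
A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
bounds = [(None, None)] * p + [(0, None)] * (2 * n)
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
beta_l1 = res.x[:p]

# l2 fit for comparison.
beta_l2, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_l1, beta_l2)
```

On heavy-tailed errors such as these, the \(l_1\) fit is typically much less perturbed by outliers than the \(l_2\) fit, which is the robustness advantage the summary mentions; the price, historically, was the cost of solving the LP.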


62J05 Linear regression; mixed models


Algorithm 478
Full Text: DOI


[1] Anderson, E., Bai, Z., Bischof, C., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., Ostrouchov, S. and Sorensen, D. (1995). LAPACK Users’ Guide. SIAM, Philadelphia.
[2] Barrodale, I. and Roberts, F. D. K. (1974). Solution of an overdetermined system of equations in the \(l_1\) norm. Communications of the ACM 17 319-320.
[3] Bartels, R. and Conn, A. (1980). Linearly constrained discrete \(l_1\) problems. ACM Trans. Math. Software 6 594-608. · Zbl 0448.49017
[4] Bloomfield, P. and Steiger, W. L. (1983). Least Absolute Deviations: Theory, Applications, and Algorithms. Birkhäuser, Boston. · Zbl 0536.62049
[5] Buchinsky, M. (1994). Changes in US wage structure 1963-87: an application of quantile regression. Econometrica 62 405-458. · Zbl 0800.90235
[6] Buchinsky, M. (1995). Quantile regression, the Box-Cox transformation model and U.S. wage structure 1963-1987. J. Econometrics 65 109-154. · Zbl 0825.62956
[7] Chamberlain, G. (1994). Quantile regression, censoring and the structure of wages. In Advances in Econometrics (C. Sims, ed.). North-Holland, Amsterdam.
[8] Chambers, J. M. (1992). Linear models. In Statistical Models in S (J. M. Chambers and T. J. Hastie, eds.) 95-144. Wadsworth, Pacific Grove, CA.
[9] Charnes, A., Cooper, W. W. and Ferguson, R. O. (1955). Optimal estimation of executive compensation by linear programming. Management Science 1 138-151. · Zbl 0995.90590
[10] Chaudhuri, P. (1992). Generalized regression quantiles. In Proceedings of the Second Conference on Data Analysis Based on the L1 Norm and Related Methods 169-186. North-Holland, Amsterdam.
[11] Chen, S. and Donoho, D. L. (1995). Atomic decomposition by basis pursuit. SIAM J. Sci. Stat. Comp. · Zbl 0919.94002
[12] Dikin, I. I. (1967). Iterative solution of problems of linear and quadratic programming. Soviet Math. Dokl. 8 674-675. · Zbl 0189.19504
[13] Edgeworth, F. Y. (1887). On observations relating to several quantities. Hermathena 6 279-285.
[14] Edgeworth, F. Y. (1888). On a new method of reducing observations relating to several quantities. Philosophical Magazine 25 184-191. · JFM 20.0219.01
[15] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman and Hall, London. · Zbl 0873.62037
[16] Fiacco, A. V. and McCormick, G. P. (1968). Nonlinear Programming: Sequential Unconstrained Minimization Techniques. Wiley, New York. · Zbl 0193.18805
[17] Floyd, R. W. and Rivest, R. L. (1975). Expected time bounds for selection. Communications of the ACM 18 165-173. · Zbl 0296.68049
[18] Frisch, R. (1956). La résolution des problèmes de programme linéaire par la méthode du potentiel logarithmique. Cahiers du Séminaire d’Econometrie 4 7-20.
[19] Gauss, C. F. (1821). Theoria combinationis observationum erroribus minimis obnoxiae: pars prior. [Translated (1995) by G. W. Stewart as Theory of the Combination of Observations Least Subject to Error. SIAM, Philadelphia.]
[20] Gill, P., Murray, W., Saunders, M., Tomlin, T. and Wright, M. (1986). On projected Newton barrier methods for linear programming and an equivalence to Karmarkar’s projective method. Math. Programming 36 183-209. · Zbl 0624.90062
[21] Gonzaga, C. C. (1992). Path-following methods for linear programming. SIAM Rev. 34 167-224. · Zbl 0763.90063
[22] Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models. Chapman and Hall, London. · Zbl 0832.62032
[23] Gutenbrunner, C. and Jurečková, J. (1992). Regression quantile and regression rank score process in the linear model and derived statistics. Ann. Statist. 20 305-330. · Zbl 0759.62015
[24] Gutenbrunner, C., Jurečková, J., Koenker, R. and Portnoy, S. (1993). Tests of linear hypotheses based on regression rank scores. J. Nonparametric Statist. 2 307-333. · Zbl 1360.62216
[25] Hall, P. and Sheather, S. (1988). On the distribution of a studentized quantile. J. Roy. Statist. Soc. Ser. B 50 381-391. · Zbl 0674.62034
[26] Karmarkar, N. (1984). A new polynomial time algorithm for linear programming. Combinatorica 4 373-395. · Zbl 0557.90065
[27] Koenker, R. (1994). Confidence intervals for regression quantiles. In Asymptotic Statistics, Proceedings of the Fifth Prague Symposium (P. Mandl and M. Hušková, eds.) 349-359. Springer, Heidelberg.
[28] Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica 46 33-50. · Zbl 0373.62038
[29] Koenker, R. and d’Orey, V. (1987). Computing regression quantiles. J. Roy. Statist. Soc. Ser. C 36 383-393.
[30] Koenker, R. and d’Orey, V. (1993). Computing dual regression quantiles and regression rank scores. J. Roy. Statist. Soc. Ser. C 43 410-414.
[31] Koenker, R., Ng, P. and Portnoy, S. (1994). Quantile smoothing splines. Biometrika 81 673-680. · Zbl 0810.62040
[32] Laplace, P.-S. (1789). Sur quelques points du système du monde. Mémoires de l’Académie des Sciences de Paris. (Reprinted in Œuvres complètes 11 475-558. Gauthier-Villars, Paris.)
[33] Lustig, I. J., Marsden, R. E. and Shanno, D. F. (1992). On implementing Mehrotra’s predictor-corrector interior-point method for linear programming. SIAM J. Optim. 2 435-449. · Zbl 0771.90066
[34] Lustig, I. J., Marsden, R. E. and Shanno, D. F. (1994). Interior point methods for linear programming: computational state of the art (with discussion). ORSA J. Comput. 6 1-36. · Zbl 0798.90100
[35] Manning, W., Blumberg, L. and Moulton, L. H. (1995). The demand for alcohol: the differential response to price. J. Health Economics 14 123-148.
[36] Mehrotra, S. (1992). On the implementation of a primal-dual interior point method. SIAM J. Optim. 2 575-601. · Zbl 0773.90047
[37] Meketon, M. S. (1986). Least absolute value regression. Technical report, Bell Labs, Holmdel, NJ.
[38] Mizuno, S., Todd, M. J. and Ye, Y. (1993). On adaptive-step primal-dual interior point algorithms for linear programming. Math. Oper. Res. 18 964-981. · Zbl 0810.90091
[39] Oja, H. (1983). Descriptive statistics for multivariate distributions. Statist. Probab. Lett. 1 327-332. · Zbl 0517.62051
[40] Portnoy, S. (1991). Asymptotic behavior of the number of regression quantile breakpoints. SIAM J. Sci. Statist. Comput. 12 867-883. · Zbl 0736.62061
[41] Powell, J. L. (1986). Censored regression quantiles. J. Econometrics 32 143-155. · Zbl 0605.62139
[42] Renegar, J. (1988). A polynomial-time algorithm based on Newton’s method for linear programming. Math. Programming 40 59-93. · Zbl 0654.90050
[43] Shamir, R. (1993). Probabilistic analysis in linear programming. Statist. Sci. 8 57-64. · Zbl 0768.90054
[44] Siddiqui, M. (1960). Distribution of quantiles in samples from a bivariate population. J. Res. Nat. Bur. Stand. B 64 145-150. · Zbl 0096.13402
[45] Sonnevend, G., Stoer, J. and Zhao, G. (1991). On the complexity of following the central path of linear programs by linear extrapolation II. Math. Programming 52 527-553. · Zbl 0742.90056
[46] Stigler, S. M. (1984). Boscovich, Simpson and a 1760 manuscript note on fitting a linear relation. Biometrika 71 615-620. · Zbl 0555.62003
[47] Stigler, S. M. (1986). The History of Statistics: Measurement of Uncertainty before 1900. Harvard Univ. Press. · Zbl 0656.62005
[48] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[49] Vanderbei, R. J., Meketon, M. J. and Freedman, B. A. (1986). A modification of Karmarkar’s linear programming algorithm. Algorithmica 1 395-407. · Zbl 0626.90056
[50] Wagner, H. M. (1959). Linear programming techniques for regression analysis. J. Amer. Statist. Assoc. 54 206-212. · Zbl 0088.35702
[51] Welsh, A. H. (1996). Robust estimation of smooth regression and spread functions and their derivatives. Statist. Sinica 6 347-366. · Zbl 0884.62047
[52] Wright, M. H. (1992). Interior methods for constrained optimization. Acta Numerica 1 341-407. · Zbl 0766.65053
[53] Zhang, Y. (1992). Primal-dual interior point approach for computing \(l_1\)-solutions and \(l_\infty\)-solutions of overdetermined linear systems. J. Optim. Theory Appl. 77 323-341. · Zbl 0796.49029
[55] ... (Osborne, 1985). It can be computed by the fast median algorithm of Bloomfield and Steiger, for example. The Barrodale-Roberts approach is equivalent to using a comparison sort in this context and seems already sufficient to explain the \(O(n^2)\) behavior observed. Recently, Osborne and Watson (1996) have observed that the secant algorithm can be applied here and interpreted as an alternative to the usual median-of-three partitioning in the fast median computation. The improvement over Bloomfield and Steiger can be staggering in problems which arise in fitting a deterministic model in the presence of noise. For the record, the code distributed by Bartels, Conn and Sinclair used a heap sort in the linesearch implementation and was perhaps the first to improve on the \(O(n^2)\) asymptotics. It would seem to be time that S-PLUS used a more modern implementation. 3. There is at least some folklore concerning the inferior performance of interior point methods when compared with simplex-style methods in postoptimality computations. However, this is the type of computation employed when stud...
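The fast-median point in the fragment above can be sketched briefly: partition-based selection (the idea behind Floyd-Rivest and median-of-three quickselect) finds a median in expected linear time, whereas a full comparison sort in each linesearch step is what produces the \(O(n^2)\) behavior. The following toy quickselect is an illustration of that idea only, not any of the implementations discussed:

```python
# Expected-linear-time selection via random-pivot quickselect
# (a stand-in for the more refined Floyd-Rivest scheme).
import random

def quickselect(a, k):
    """Return the k-th smallest element of a (0-indexed), expected O(n)."""
    a = list(a)
    lo, hi = 0, len(a) - 1
    while True:
        if lo == hi:
            return a[lo]
        pivot = a[random.randint(lo, hi)]
        i, j = lo, hi
        # Hoare partition around the pivot value.
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        if k <= j:
            hi = j
        elif k >= i:
            lo = i
        else:
            return a[k]  # k lies in the run of pivot-equal elements

def median(xs):
    n = len(xs)
    mid = quickselect(xs, n // 2)
    if n % 2:
        return mid
    return 0.5 * (mid + quickselect(xs, n // 2 - 1))

print(median([3.0, 1.0, 4.0, 1.5, 9.0]))  # → 3.0
```

A comparison sort does the same job in \(O(n \log n)\) per linesearch; over \(O(n)\) simplex-style steps that difference compounds, which is the asymptotic point the comment makes.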
[56] Güler, O., den Hertog, D., Roos, C. and Terlaky, T. (1993). Degeneracy in interior point methods for linear programming: a survey. Ann. Oper. Res. 46 107-138. · Zbl 0785.90067
[57] Kennedy, W. and Gentle, J. E., Jr. (1980). Statistical Computing. Dekker, New York. · Zbl 0435.62003
[58] Monteiro, R. D. C. and Mehrotra, S. (1996). A general parametric analysis approach and its implications to sensitivity analysis in interior point methods. Math. Programming 72 65-82. · Zbl 0853.90083
[59] Osborne, M. R. (1985). Finite Algorithms in Optimization and Data Analysis. Wiley, New York. · Zbl 0573.65044
[60] Osborne, M. R. and Watson, G. A. (1996). Aspects of M-estimation and \(l_1\) fitting. In Numerical Analysis (D. F. Griffiths and G. A. Watson, eds.). World Scientific, Singapore.
[61] Press, W., Flannery, B., Teukolsky, S. and Vetterling, W. Numerical Recipes: The Art of Scientific Computing. Cambridge Univ. Press. · Zbl 0587.65003
[62] Thisted, R. A. (1988). Elements of Statistical Computing. Chapman and Hall, London. · Zbl 0663.62001