
Self-concordant analysis for logistic regression. (English) Zbl 1329.62324

Summary: Most of the non-asymptotic theoretical work in regression is carried out for the square loss, where estimators can be obtained through closed-form expressions. In this paper, we use and extend tools from the convex optimization literature, namely self-concordant functions, to provide simple extensions of theoretical results for the square loss to the logistic loss. We apply the extension techniques to logistic regression with regularization by the \(\ell_2\)-norm and regularization by the \(\ell_1\)-norm, showing that new results for binary classification through logistic regression can be easily derived from corresponding results for least-squares regression.
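The core property behind the analysis can be stated in a short display (a minimal sketch in the spirit of the paper; the symbols \(\varphi\), \(J\), \(w\), \(R\) are illustrative notation, not quoted from the text). The logistic loss is not self-concordant in the classical sense, but its third derivative is bounded by its second:
\[
\varphi(u) = \log\bigl(1 + e^{-u}\bigr), \qquad |\varphi'''(u)| \le \varphi''(u) \quad \text{for all } u \in \mathbb{R}.
\]
For the empirical risk \(J(w) = \frac{1}{n}\sum_{i=1}^{n} \varphi(y_i\, w^\top x_i)\) with labels \(y_i \in \{-1,1\}\) and covariates satisfying \(\|x_i\|_2 \le R\), this yields
\[
\bigl|J'''(w)[\delta,\delta,\delta]\bigr| \le R\,\|\delta\|_2\, J''(w)[\delta,\delta],
\qquad
J''(w+\delta) \succeq e^{-R\|\delta\|_2}\, J''(w),
\]
i.e., the Hessian is controlled up to an explicit exponential factor in a neighborhood of any point, which is what lets local quadratic (square-loss) arguments carry over to the logistic loss.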

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62J02 General nonlinear regression
90C20 Quadratic programming

Software:

Bolasso; gss; PLCP

References:

[1] A. W. van der Vaart. Asymptotic Statistics. Cambridge University Press, 1998.
[2] P. Massart. Concentration Inequalities and Model Selection: Ecole d'été de Probabilités de Saint-Flour 23. Springer, 2003.
[3] S. A. van de Geer. High-dimensional generalized linear models and the Lasso. Annals of Statistics, 36(2):614-645, 2008. · Zbl 1138.62323
[4] C. Gu. Adaptive spline smoothing in non-Gaussian regression models. Journal of the American Statistical Association, 85:801-807, 1990.
[5] F. Bunea. Honest variable selection in linear and logistic regression models via \(\ell_1\) and \(\ell_1 + \ell_2\) penalization. Electronic Journal of Statistics, 2:1153-1194, 2008. · Zbl 1320.62170
[6] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999. · Zbl 1015.90077
[7] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2003. · Zbl 1058.90049
[8] Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial Algorithms in Convex Programming. SIAM Studies in Applied Mathematics, 1994. · Zbl 0824.90112
[9] R. Christensen. Log-Linear Models and Logistic Regression. Springer, 1997. · Zbl 0880.62073
[10] D. W. Hosmer and S. Lemeshow. Applied Logistic Regression. Wiley-Interscience, 2004. · Zbl 0967.62045
[11] C. Houdré and P. Reynaud-Bouret. Exponential inequalities, with constants, for U-statistics of order two. In Stochastic Inequalities and Applications, Progress in Probability, 56, pages 55-69. Birkhäuser, 2003. · Zbl 1036.60015
[12] P. Zhao and B. Yu. On model selection consistency of Lasso. Journal of Machine Learning Research, 7:2541-2563, 2006. · Zbl 1222.62008
[13] M. Yuan and Y. Lin. On the non-negative garrotte estimator. Journal of the Royal Statistical Society, Series B, 69(2):143-161, 2007. · Zbl 1120.62052
[14] H. Zou. The adaptive Lasso and its oracle properties. Journal of the American Statistical Association, 101:1418-1429, 2006. · Zbl 1171.62326
[15] M. J. Wainwright. Sharp thresholds for noisy and high-dimensional recovery of sparsity using \(\ell_1\)-constrained quadratic programming. IEEE Transactions on Information Theory, 55(5):2183-2202, 2009. · Zbl 1367.62220
[16] P. Bickel, Y. Ritov, and A. Tsybakov. Simultaneous analysis of Lasso and Dantzig selector. Annals of Statistics, 37(4):1705-1732, 2009. · Zbl 1173.62022
[17] J. F. Bonnans, J. C. Gilbert, C. Lemaréchal, and C. A. Sagastizábal. Numerical Optimization: Theoretical and Practical Aspects. Springer, 2003.
[18] J. Abernethy, E. Hazan, and A. Rakhlin. Competing in the dark: An efficient algorithm for bandit linear optimization. In Proceedings of the 21st Annual Conference on Learning Theory (COLT), pages 263-274, 2008.
[19] P. McCullagh and J. A. Nelder. Generalized Linear Models. Chapman & Hall/CRC, 1989. · Zbl 0744.62098
[20] B. Efron. The estimation of prediction error: Covariance penalties and cross-validation. Journal of the American Statistical Association, 99(467):619-633, 2004. · Zbl 1117.62324
[21] P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138-156, 2006. · Zbl 1118.62330
[22] G. Wahba. Spline Models for Observational Data. SIAM, 1990. · Zbl 0813.62001
[23] G. S. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33:82-95, 1971. · Zbl 0201.39702
[24] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1996. · Zbl 0865.65009
[25] C. Gu. Smoothing Spline ANOVA Models. Springer, 2002. · Zbl 1051.62034
[26] K. Sridharan, N. Srebro, and S. Shalev-Shwartz. Fast rates for regularized objectives. In Advances in Neural Information Processing Systems (NIPS), 2008.
[27] I. Steinwart, D. Hush, and C. Scovel. A new concentration result for regularized risk minimizers. High Dimensional Probability: Proceedings of the Fourth International Conference, 51:260-275, 2006. · Zbl 1127.68090
[28] S. Arlot and F. Bach. Data-driven calibration of linear estimators with minimal penalties. In Advances in Neural Information Processing Systems (NIPS), 2009.
[29] T. J. Hastie and R. J. Tibshirani. Generalized Additive Models. Chapman & Hall, 1990. · Zbl 0747.62061
[30] Z. Harchaoui, F. R. Bach, and E. Moulines. Testing for homogeneity with kernel Fisher discriminant analysis. Technical Report 00270806, HAL, 2008.
[31] R. Shibata. Statistical aspects of model selection. In From Data to Model, pages 215-240. Springer, 1989.
[32] H. Bozdogan. Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52(3):345-370, 1987. · Zbl 0627.62005
[33] P. Liang, F. Bach, G. Bouchard, and M. I. Jordan. An asymptotic analysis of smooth regularizers. In Advances in Neural Information Processing Systems (NIPS), 2009.
[34] P. Craven and G. Wahba. Smoothing noisy data with spline functions. Estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik, 31(4):377-403, 1978/79. · Zbl 0377.65007
[35] K.-C. Li. Asymptotic optimality for \(C_p\), \(C_L\), cross-validation and generalized cross-validation: discrete index set. Annals of Statistics, 15(3):958-975, 1987. · Zbl 0653.62037
[36] F. Bach. Consistency of the group Lasso and multiple kernel learning. Journal of Machine Learning Research, 9:1179-1225, 2008. · Zbl 1225.68147
[37] C. L. Mallows. Some comments on \(C_p\). Technometrics, 15:661-675, 1973. · Zbl 0269.62061
[38] F. O’Sullivan, B. S. Yandell, and W. J. Raynor Jr. Automatic smoothing of regression functions in generalized linear models. Journal of the American Statistical Association, 81:96-103, 1986.
[39] R. Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58(1):267-288, 1996. · Zbl 0850.62538
[40] T. Zhang. Some sharp performance bounds for least squares regression with \(\ell_1\) regularization. Annals of Statistics, 37(5):2109-2144, 2009. · Zbl 1173.62029
[41] A. Juditsky and A. S. Nemirovski. On verifiable sufficient conditions for sparse signal recovery via \(\ell_1\) minimization. Technical Report 0809.2650, arXiv, 2008. · Zbl 1211.90333
[42] A. d’Aspremont and L. El Ghaoui. Testing the nullspace property using semidefinite programming. Technical Report 0807.3520, arXiv, 2008. · Zbl 1211.90167
[43] P. Chaudhuri and P. A. Mykland. Nonlinear experiments: Optimal design and inference based on likelihood. Journal of the American Statistical Association, 88(422):538-546, 1993. · Zbl 0774.62079
[44] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68(1):49-67, 2006. · Zbl 1141.62030
[45] F. Bach. Bolasso: model consistent Lasso estimation through the bootstrap. In Proceedings of the International Conference on Machine Learning (ICML), 2008.
[46] N. Meinshausen and P. Bühlmann. Stability selection. Technical Report 0809.2932, arXiv, 2008.
[47] N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34(3):1436-1462, 2006. · Zbl 1113.62082
[48] O. Banerjee, L. El Ghaoui, and A. d’Aspremont. Model selection through sparse maximum likelihood estimation. Journal of Machine Learning Research, 9:485-516, 2008.
[49] J. M. Borwein and A. S. Lewis. Convex Analysis and Nonlinear Optimization. Number 3 in CMS Books in Mathematics. Springer, 2000.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.