×

A fast hybrid algorithm for large-scale \(l_{1}\)-regularized logistic regression. (English) Zbl 1242.62078

Summary: \(l_{1}\)-regularized logistic regression, also known as sparse logistic regression, is widely used in machine learning, computer vision, data mining, bioinformatics and neural signal processing. The use of \(l_{1}\) regularization attributes attractive properties to the classifier, such as feature selection, robustness to noise, and as a result, classifier generality in the context of supervised learning. When a sparse logistic regression problem has large-scale data in high dimensions, it is computationally expensive to minimize the non-differentiable \(l_{1}\)-norm in the objective function. Motivated by recent work [K. Koh, S.-J. Kim and S. Boyd, ibid. 8, 1519–1555 (2007; Zbl 1222.62092); E.T. Hale, W. Yin and Y. Zhang, SIAM J. Optim. 19, No. 3, 1107–1130 (2008; Zbl 1180.65076)], we propose a novel hybrid algorithm based on combining two types of optimization iterations: one being very fast and memory friendly while the other being slower but more accurate. Called hybrid iterative shrinkage (HIS), the resulting algorithm is comprised of a fixed point continuation phase and an interior point phase. The first phase is based completely on memory efficient operations such as matrix-vector multiplications, while the second phase is based on a truncated Newton’s method. Furthermore, we show that various optimization techniques, including line search and continuation, can significantly accelerate convergence. The algorithm has global convergence at a geometric rate (a Q-linear rate in optimization terminology). We present a numerical comparison with several existing algorithms, including an analysis using benchmark data from the UCI machine learning repository, and show our algorithm is the most computationally efficient without loss of accuracy.

MSC:

62J12 Generalized linear models (logistic models)
68T05 Learning and adaptive systems in artificial intelligence
65K05 Numerical mathematical programming methods
65C60 Computational problems in statistics (MSC2010)

Software:

UCI-ml; RCV1
PDFBibTeX XMLCite
Full Text: Link