Stochastic reformulations of linear systems: algorithms and convergence theory. (English) Zbl 1440.65045

Summary: We develop a family of reformulations of an arbitrary consistent linear system into a stochastic problem. The reformulations are governed by two user-defined parameters: a positive definite matrix defining a norm, and an arbitrary discrete or continuous distribution over random matrices. Our reformulation has several equivalent interpretations, allowing researchers from various communities to leverage their domain-specific insights. In particular, it can be viewed equivalently as a stochastic optimization problem, a stochastic linear system, a stochastic fixed point problem, and a probabilistic intersection problem. We prove sufficient, and necessary and sufficient, conditions for the reformulation to be exact. Further, we propose and analyze three stochastic algorithms for solving the reformulated problem (basic, parallel, and accelerated methods), all with global linear convergence rates. The rates can be interpreted as condition numbers of a matrix that depends on the system matrix and on the reformulation parameters. This gives rise to a new phenomenon which we call stochastic preconditioning: the problem of finding parameters (matrix and distribution) leading to a sufficiently small condition number. Our basic method can be equivalently interpreted as stochastic gradient descent, a stochastic Newton method, a stochastic proximal point method, a stochastic fixed point method, and a stochastic projection method, each with fixed stepsize (relaxation parameter), applied to the reformulations.
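To make the basic method concrete, below is a minimal Python sketch of its simplest special case: taking the norm-defining matrix B = I and drawing the random matrix S as a unit coordinate vector e_i (sampled here with probability proportional to the squared norm of row i), the fixed-stepsize iteration reduces to the relaxed randomized Kaczmarz update x_{k+1} = x_k - omega * (a_i^T x_k - b_i) / ||a_i||^2 * a_i. The function name and the row-sampling scheme are illustrative choices, not prescribed by the paper.

import numpy as np

def basic_method_kaczmarz(A, b, num_iters=5000, omega=1.0, seed=0):
    # Basic method specialized to B = I and S = e_i (randomized Kaczmarz).
    # Illustrative sketch only; not the paper's reference implementation.
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms2 = np.einsum("ij,ij->i", A, A)   # squared row norms ||a_i||^2
    probs = row_norms2 / row_norms2.sum()      # row-norm sampling (one valid distribution)
    x = np.zeros(n)
    for _ in range(num_iters):
        i = rng.choice(m, p=probs)             # draw a random row index
        residual = A[i] @ x - b[i]             # a_i^T x_k - b_i
        x -= omega * (residual / row_norms2[i]) * A[i]  # relaxed projection onto a_i^T x = b_i
    return x

# Consistent test system: b lies in the range of A, as the summary assumes.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
b = A @ rng.standard_normal(10)
x = basic_method_kaczmarz(A, b)
print(np.linalg.norm(A @ x - b))  # residual should be close to zero

With omega = 1 each step is an exact projection onto the sampled hyperplane; other choices of B and of the distribution over S yield the broader family of methods, and hence the stochastic preconditioning question, described above.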

MSC:

65F10 Iterative numerical methods for linear systems
15A06 Linear equations (linear algebraic aspects)
15B52 Random matrices (algebraic aspects)
68W20 Randomized algorithms
65N75 Probabilistic methods, particle methods, etc. for boundary value problems involving PDEs
65Y20 Complexity and performance of numerical algorithms
68Q25 Analysis of algorithms and problem complexity
68W40 Analysis of algorithms
90C20 Quadratic programming
