Faster kernel ridge regression using sketching and preconditioning.

*(English)*Zbl 1379.65008##### MSC:

65C60 | Computational problems in statistics (MSC2010) |

65F08 | Preconditioners for iterative methods |

62G08 | Nonparametric regression and quantile regression |

##### References:

[1] | N. Ailon and E. Liberty, Fast dimension reduction using Rademacher series on dual BCH codes, Discrete Comput. Geom., 42 (2008), pp. 615–630, . · Zbl 1180.94083 |

[2] | A. E. Alaoui and M. W. Mahoney, Fast randomized kernel ridge regression with statistical guarantees, in Proceedings of Neural Information Processing Systems (NIPS), 2015. |

[3] | H. Avron, K. L. Clarkson, and D. P. Woodruff, Sharper bounds for regularized data fitting, in Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, Schloss Dagstuhl, Dagstuhl, Germany, 2017, pp. 27:1–27:22. |

[4] | H. Avron, P. Maymounkov, and S. Toledo, Blendenpik: Supercharging LAPACK’s least-squares solver, SIAM J. Sci. Comput., 32 (2010), pp. 1217–1236, . · Zbl 1213.65069 |

[5] | H. Avron, H. Nguyen, and D. Woodruff, Subspace embeddings for the polynomial kernel, Advances in Neural Information Processing Systems 27 (NIPS 2014). |

[6] | H. Avron and V. Sindhwani, High-performance kernel machines with implicit distributed optimization and randomization, Technometrics, 58 (2016), pp. 341–349, . |

[7] | F. R. Bach, Sharp analysis of low-rank kernel matrix approximations, in Proceedings of the 26th Annual Conference on Learning Theory (COLT), Princeton, NJ, 2013, pp. 185–209. |

[8] | G. Blanchard and N. Krämer, Optimal learning rates for kernel conjugate gradient regression, Advances in Neural Information Processing Systems 23 (NIPS 2010). |

[9] | L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, Large-Scale Kernel Machines (Neural Information Processing), MIT Press, Cambridge, MA, 2007. |

[10] | M. Charikar, K. Chen, and M. Farach-Colton, Finding frequent items in data streams, Theoret. Comput. Sci., 312 (2004), pp. 3–15. · Zbl 1071.68020 |

[11] | J. Chen, H. Avron, and V. Sindhwani, Hierarchically compositional kernels for scalable nonparametric learning, J. Mach. Learn. Res., 18 (2017), 66. · Zbl 1434.68401 |

[12] | J. Chen, L. Wang, and M. Anitescu, A fast summation tree code for matérn kernel, SIAM J. Sci. Comput., 36 (2014), pp. A289–A309, . |

[13] | J. Chen, L. Wu, K. Audhkhasi, B. Kingsbury, and B. Ramabhadrari, Efficient one-vs-one kernel ridge regression for speech recognition, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 2454–2458, . |

[14] | M. B. Cohen, J. Nelson, and D. P. Woodruff, Optimal approximate matrix product in terms of stable rank, in Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming, 2016, pp. 11:1–11:14, . · Zbl 1404.65032 |

[15] | K. Cutajar, M. Osborne, J. Cunningham, and M. Filippone, Preconditioning kernel matrices, in Proceedings of the 33nd International Conference on Machine Learning (ICML), New York, 2016, pp. 2529–2538; also available online from . |

[16] | B. Dai, B. Xie, N. He, Y. Liang, A. Raj, M.-F. Balcan, and L. Song, Scalable kernel methods via doubly stochastic gradients, Advances in Neural Information Processing Systems 27 (NIPS 2014). |

[17] | P. Drineas, M. W. Mahoney, S. Muthukrishnan, and T. Sarlós, Faster least squares approximation, Numer. Math., 117 (2011), pp. 219–249, . · Zbl 1218.65037 |

[18] | G. Fung and O. L. Mangasarian, Proximal support vector machine classifiers, in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2001. · Zbl 1101.68758 |

[19] | A. Gittens and M. Mahoney, Revisiting the Nyström method for improved large-scale machine learning, in Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta, GA, 28 (2013), pp. 567–575. |

[20] | R. Hamid, Y. Xiao, A. Gittens, and D. DeCoste, Compact random feature maps, in Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, China, 32 (2014), pp. 19–27. |

[21] | P. Huang, H. Avron, T. Sainath, V. Sindhwani, and B. Ramabhadran, Kernel methods match deep neural networks on TIMIT, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014, pp. 205–209. |

[22] | S. Kumar, M. Mohri, and A. Talwalkar, Sampling methods for the Nyström method, J. Mach. Learn. Res., 13 (2012), pp. 981–1006. · Zbl 1283.68292 |

[23] | Z. Lu, A. May, K. Liu, A. Bagheri Garakani, D. Guo, A. Bellet, L. Fan, M. Collins, B. Kingsbury, M. Picheny, and F. Sha, How to Scale Up Kernel Methods to Be as Good as Deep Neural Nets, ArXiv e-prints, 2014, . |

[24] | X. Meng, M. A. Saunders, and M. W. Mahoney, LSRN: A parallel iterative solver for strongly over- or underdetermined systems, SIAM J. Sci. Comput., 36 (2014), pp. C95–C118, . · Zbl 1298.65053 |

[25] | V. I. Morariu, B. V. Srinivasan, V. C. Raykar, R. Duraiswami, and L. S. Davis, Automatic online tuning for fast Gaussian summation, Advances in Neural Information Processing Systems 21 (NIPS 2008). |

[26] | R. Pagh, Compressed matrix multiplication, ACM Trans. Comput. Theory, 5 (2013), pp. 9:1–9:17, . · Zbl 1322.65055 |

[27] | J. Poulson, B. Marker, R. A. van de Geijn, J. R. Hammond, and N. A. Romero, Elemental: A new framework for distributed memory dense matrix computations, ACM Trans. Math. Software, 39 (2013), 13, . · Zbl 1295.65137 |

[28] | A. Rahimi and B. Recht, Random features for large-scale kernel machines, Advances in Neural Information Processing Systems 20 (NIPS 2007). |

[29] | V. C. Raykar and R. Duraiswami, Fast large scale Gaussian process regression using approximate matrix-vector products, in Proceedings of the Learning workshop, 2007. |

[30] | C. Saunders, A. Gammerman, and V. Vovk, Ridge regression learning algorithm in dual variables, in Proceedings of the 15th International Conference on Machine Learning (ICML), Madison, WI, (1998), pp. 515–521. |

[31] | J. R. Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, Technical report, Carnegie Mellon University, Pittsburgh, PA, 1994. |

[32] | B. V. Srinivasan, Q. Hu, N. A. Gumerov, R. Murtugudde, and R. Duraiswami, Preconditioned Krylov solvers for kernel regression, ArXiv e-prints, 2014, . |

[33] | S. Tu, R. Roelofs, S. Venkataraman, and B. Recht, Large scale kernel learning using block coordinate descent, ArXiv e-prints, 2016, . |

[34] | C. K. I. Williams and M. Seeger, Using the Nyström method to speed up kernel machines, Advances in Neural Information Processing Systems 13 (NIPS 2000). |

[35] | D. P. Woodruff, Sketching as a tool for numerical linear algebra, Found. Trends Theor. Comput. Sci., 10 (2014), pp. 1–157, . |

[36] | J. Yang, X. Meng, and M. W. Mahoney, Implementing randomized matrix algorithms in parallel and distributed environments, Proc. IEEE, 104 (2016), pp. 58–92, . |

[37] | Y. Yang, M. Pilanci, and M. J. Wainwright, Randomized sketches for kernels: Fast and optimal nonparametric regression, Ann. Statist., 45 (2017), pp. 991–1023, . · Zbl 1371.62039 |

[38] | T. Zhang, Learning bounds for kernel regression using effective data dimensionality, Neural Comput., 17 (2005), pp. 2077–2098, . · Zbl 1080.68044 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.