Fast approximation of matrix coherence and statistical leverage.

*(English)*Zbl 1437.65030Summary: The statistical leverage scores of a matrix \(A\) are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recently-popular problems such as matrix completion and Nyström-based low-rank matrix approximation as well as in large-scale statistical data analysis applications more generally; moreover, they are of interest since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary \(n \times d\) matrix \(A\), with \(n >> d\), and that returns as output relative-error approximations to all \(n\) of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of \(n\) and \(d\)) in \(O(nd \log n)\) time, as opposed to the \(O(nd^{2})\) time required by the naïve algorithm that involves computing an orthogonal basis for the range of \(A\). Our analysis may be viewed in terms of computing a relative-error approximation to an underconstrained least-squares approximation problem, or, relatedly, it may be viewed as an application of Johnson-Lindenstrauss type ideas. Several practically-important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with \(n \approx d\), and the extension to streaming environments.