Zbl 1111.68148

Summary: In many applications, the data consist of (or may be naturally formulated as) an $m\times n$ matrix $A$. It is often of interest to find a low-rank approximation to $A$, i.e., an approximation $D$ to the matrix $A$ of rank not greater than a specified rank $k$, where $k$ is much smaller than $m$ and $n$. Methods such as the Singular Value Decomposition (SVD) may be used to find an approximation to $A$ which is the best in a well-defined sense. These methods require memory and time which are superlinear in $m$ and $n$; for many applications in which the data sets are very large this is prohibitive. Two simple and intuitive algorithms are presented which, when given an $m\times n$ matrix $A$, compute a description of a low-rank approximation $D^*$ to $A$, and which are qualitatively faster than the SVD. Both algorithms have provable bounds for the error matrix $A-D^*$. For any matrix $X$, let $\|X\|_F$ and $\|X\|_2$ denote its Frobenius norm and its spectral norm, respectively. In the first algorithm, $c$ columns of $A$ are randomly chosen. If the $m\times c$ matrix $C$ consists of those $c$ columns of $A$ (after appropriate rescaling), then it is shown that approximations to the top singular values and corresponding singular vectors of $A$ may be computed from $C^TC$. From the computed singular vectors a description $D^*$ of the matrix $A$ may be computed such that $\mathrm{rank}(D^*)\le k$ and such that

$$\|A-D^*\|_\xi^2 \le \min_{D:\,\mathrm{rank}(D)\le k}\|A-D\|_\xi^2 + \mathrm{poly}(k,1/c)\,\|A\|_F^2$$

holds with high probability for both $\xi=2,F$. This algorithm may be implemented without storing the matrix $A$ in random access memory (RAM), provided it can make two passes over the matrix stored in external memory and use $O(cm+c^2)$ additional RAM. The second algorithm is similar, except that it further approximates the matrix $C$ by randomly sampling $r$ rows of $C$ to form an $r\times c$ matrix $W$. Thus, it has additional error, but it can be implemented in three passes over the matrix using only constant additional RAM. To achieve an additional error (beyond the best rank-$k$ approximation) that is at most $\epsilon\|A\|_F^2$, both algorithms take time which is polynomial in $k$, $1/\epsilon$, and $\log(1/\delta)$, where $\delta>0$ is a failure probability; the first takes time linear in $\max(m,n)$ and the second takes time independent of $m$ and $n$. Our bounds improve previously published results with respect to the rank parameter $k$ for both the Frobenius and spectral norms. In addition, the proofs of the error bounds use a novel method that makes important use of matrix perturbation theory. The probability distribution over columns of $A$ and the rescaling are crucial features of the algorithms and must be chosen judiciously.
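To make the first algorithm concrete, the following is a minimal in-memory sketch in Python/NumPy. It assumes the squared-column-norm sampling probabilities $p_j=|A^{(j)}|^2/\|A\|_F^2$ and the rescaling factor $1/\sqrt{cp_j}$; the function name is illustrative, and the sketch deliberately holds $A$ in RAM rather than reproducing the two-pass external-memory implementation described above.

```python
import numpy as np

def linear_time_svd(A, c, k, rng=None):
    """Illustrative sketch of the first (column-sampling) algorithm.

    Samples c columns of A with probability proportional to their squared
    Euclidean norms, rescales them to form C, and recovers approximate
    top-k singular values/vectors of A from the small c x c matrix C^T C.
    Assumes the top k singular values of C are nonzero.
    """
    rng = np.random.default_rng(rng)
    m, n = A.shape
    # Sampling probabilities p_j = |A^(j)|^2 / ||A||_F^2: the "judicious"
    # column distribution referred to in the summary.
    col_norms_sq = np.sum(A**2, axis=0)
    p = col_norms_sq / col_norms_sq.sum()
    idx = rng.choice(n, size=c, replace=True, p=p)
    # Rescale each sampled column by 1/sqrt(c * p_j) so that C C^T
    # approximates A A^T in expectation.
    C = A[:, idx] / np.sqrt(c * p[idx])
    # Eigendecomposition of the c x c matrix C^T C; eigh returns
    # eigenvalues in ascending order, so reverse to get the top k.
    w, y = np.linalg.eigh(C.T @ C)
    w, y = w[::-1][:k], y[:, ::-1][:, :k]
    sigma = np.sqrt(np.maximum(w, 0.0))
    # Approximate top-k left singular vectors of A: h_t = C y_t / sigma_t.
    H = C @ y / sigma
    # Rank-at-most-k description D* = H H^T A.
    return H @ (H.T @ A)
```

By construction $\mathrm{rank}(D^*)\le k$, since $D^*=HH^TA$ projects $A$ onto the span of the $k$ approximate left singular vectors.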
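A similarly hedged sketch of the second algorithm, which further samples $r$ rows of $C$ to form the $r\times c$ matrix $W$ so that the dense eigendecomposition touches only $W$. Again this in-memory version is only illustrative: the paper's three-pass, constant-additional-RAM implementation returns a compact description rather than materializing $D^*$, and its exact bookkeeping (e.g., which approximate singular values are retained) is omitted here.

```python
import numpy as np

def constant_time_svd(A, c, r, k, rng=None):
    """Illustrative sketch of the second (row-and-column-sampling) algorithm.

    As in the first algorithm, samples and rescales c columns of A to form
    C, then samples and rescales r rows of C to form the r x c matrix W;
    approximate singular vectors are computed from W^T W instead of C^T C,
    which introduces additional error.
    """
    rng = np.random.default_rng(rng)
    m, n = A.shape
    # Column sampling and rescaling, exactly as in the first algorithm.
    col_norms_sq = np.sum(A**2, axis=0)
    p = col_norms_sq / col_norms_sq.sum()
    idx = rng.choice(n, size=c, replace=True, p=p)
    C = A[:, idx] / np.sqrt(c * p[idx])
    # Row sampling of C with squared-row-norm probabilities, each sampled
    # row rescaled by 1/sqrt(r * q_i), giving the small r x c matrix W.
    row_norms_sq = np.sum(C**2, axis=1)
    q = row_norms_sq / row_norms_sq.sum()
    ridx = rng.choice(m, size=r, replace=True, p=q)
    W = C[ridx, :] / np.sqrt(r * q[ridx])[:, None]
    # Eigenvectors of the c x c matrix W^T W approximate the right
    # singular vectors of C; use them in place of those of C^T C.
    w, y = np.linalg.eigh(W.T @ W)
    w, y = w[::-1][:k], y[:, ::-1][:, :k]
    sigma = np.sqrt(np.maximum(w, 0.0))
    H = C @ y / sigma
    return H @ (H.T @ A)
```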

[For Part I see the authors, ibid. 36, No. 1, 132–157 (2006; Zbl 1111.68147), reviewed above.]