Low-rank approximation of a matrix

  • Low-rank approximations
  • Link with PCA

Low-rank approximations

We consider a matrix A \in \mathbf{R}^{m \times n}, with SVD given as in the SVD theorem:

A=U \tilde{S} V^T, \quad \tilde{S}=\operatorname{diag}\left(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0\right),

where the singular values are ordered in decreasing order, \sigma_1 \geq \ldots \geq \sigma_r>0. In many applications, it can be useful to approximate A with a low-rank matrix.

Example: Assume that A contains the log-returns of n assets over m time periods, so that each column of A is the time series of a particular asset. Approximating A by a rank-one matrix of the form pq^T, with p \in \mathbf{R}^m and q \in \mathbf{R}^n, amounts to modeling the assets’ movements as all following the same time profile p, with each asset’s movements scaled by the corresponding component of q. Indeed, the (t,i) component of A, which is the log-return of asset i at time t, is then p(t)q(i).
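A minimal numerical sketch of this rank-one model, using NumPy; the data, the profile p, the scalings q, and the noise level are all hypothetical, chosen only for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 250, 6                       # m time periods, n assets (hypothetical sizes)
    p = 0.01 * rng.standard_normal(m)   # common time profile
    q = rng.uniform(0.5, 1.5, size=n)   # per-asset scaling factors

    # Rank-one model of the log-returns, plus small idiosyncratic noise
    A = np.outer(p, q) + 1e-3 * rng.standard_normal((m, n))

    # The best rank-one approximation is sigma_1 u_1 v_1^T (see the theorem below)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A1 = s[0] * np.outer(U[:, 0], Vt[0, :])

    print("relative error of the rank-one fit:",
          np.linalg.norm(A - A1) / np.linalg.norm(A))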

We consider the low-rank approximation problem

\min _X\|A-X\|_F: \operatorname{rank}(X)=k,

where k (1 \leq k<r=\operatorname{rank}(A)) is given. In the above, we measure the error in the approximation using the Frobenius norm; using the largest singular value norm leads to the same set of solutions X.

Theorem: Low-rank approximation

A best rank-k approximation \hat{A}_k is obtained by zeroing out the r-k trailing singular values of A, that is

\hat{A}_k=U \hat{S}_k V^T, \quad \hat{S}_k=\operatorname{diag}\left(\sigma_1, \ldots, \sigma_k, 0, \ldots, 0\right) .

The minimal error is given by the Euclidean norm of the singular values that have been zeroed out in the process:

\left\|A-\hat{A}_k\right\|_F=\sqrt{\sigma_{k+1}^2+\ldots+\sigma_r^2} .

Sketch of proof: The proof rests on the fact that the Frobenius norm is invariant under rotations of the input and output spaces, that is, \left\|U^T B V\right\|_F=\|B\|_F for any matrix B and orthogonal matrices U, V of appropriate sizes. Since the rank is also invariant under such transformations, we can reduce the problem to the case A=\tilde{S}; one then checks that the best rank-k approximation of a diagonal matrix with decreasing non-negative diagonal entries is obtained by keeping its k largest entries.

Example: low-rank approximation of a 4 \times 5 matrix.
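A NumPy sketch of such an example, on a hypothetical 4 \times 5 matrix (the values below are arbitrary); it builds the rank-2 truncation \hat{A}_2 and checks the error formula of the theorem:

    import numpy as np

    # A hypothetical 4 x 5 matrix; the entry A[0, 0] is perturbed so that A has rank 3
    A = np.arange(20, dtype=float).reshape(4, 5)
    A[0, 0] = 7.0

    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 2
    # Best rank-k approximation: keep the k leading singular triplets
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # The error equals the Euclidean norm of the discarded singular values
    print(np.linalg.norm(A - A_k, "fro"))      # ||A - A_2||_F
    print(np.sqrt(np.sum(s[k:] ** 2)))         # same value, from sigma_{k+1}, ..., sigma_r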

Link with Principal Component Analysis

Principal Component Analysis operates on the covariance matrix of the data, which is proportional to A A^T, and sets the principal directions to be the eigenvectors of that (symmetric) matrix. As it turns out, the eigenvectors of A A^T are simply the left singular vectors of A. Hence both methods, the above approximation method and PCA, rely on the same tool, the SVD. The SVD is a more complete approach, as it also provides the eigenvectors of A^T A (the right singular vectors of A), which can be useful if we want to analyze the data in terms of rows instead of columns.
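As a quick numerical check (a sketch, reusing the hypothetical 4 \times 5 matrix from the previous snippet), the eigenvalues of A A^T are the squared singular values of A, and its eigenvectors coincide with the left singular vectors up to sign and ordering:

    import numpy as np

    A = np.arange(20, dtype=float).reshape(4, 5)
    A[0, 0] = 7.0

    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Eigen-decomposition of the (scaled) covariance matrix A A^T
    evals, evecs = np.linalg.eigh(A @ A.T)
    order = np.argsort(evals)[::-1]            # eigh returns eigenvalues in increasing order

    print(np.allclose(evals[order], s ** 2))                  # True
    print(np.allclose(np.abs(evecs[:, order]), np.abs(U)))    # True (signs may differ)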

In particular, we can express the explained variance directly in terms of the singular values. In the context of visualization, the explained variance is simply the ratio of the total amount of variance in the projected data to that in the original data. More generally, when we approximate a data matrix by a low-rank matrix, the explained variance compares the variance in the approximation to that in the original data. We can also interpret it geometrically, as the ratio of the squared Frobenius norm of the approximation to that of the original matrix:

\frac{\left\|\hat{A}_k\right\|_F^2}{\|A\|_F^2}=\frac{\sigma_1^2+\ldots+\sigma_k^2}{\sigma_1^2+\ldots+\sigma_r^2} .
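A sketch of this computation (again on the hypothetical matrix A used above), comparing the squared-norm ratio with the singular-value formula:

    import numpy as np

    A = np.arange(20, dtype=float).reshape(4, 5)
    A[0, 0] = 7.0

    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 2
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    ratio_norms = np.linalg.norm(A_k, "fro") ** 2 / np.linalg.norm(A, "fro") ** 2
    ratio_sigmas = np.sum(s[:k] ** 2) / np.sum(s ** 2)
    print(ratio_norms, ratio_sigmas)           # equal up to rounding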
