Low-rank approximation of a matrix
- Low-rank approximations
- Link with PCA
Low-rank approximations
We consider an $m \times n$ matrix $A$, with SVD given as in the SVD theorem:

$$A = U \tilde{S} V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T,$$

where $r$ is the rank of $A$ and the singular values are ordered in decreasing order, $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$. In many applications, it can be useful to approximate $A$ with a low-rank matrix.
Example: Assume that $A$ contains the log-returns of $n$ assets over $m$ time periods, so that each column of $A$ is a time series for a particular asset. Approximating $A$ by a rank-one matrix of the form $u v^T$, with $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$, amounts to modeling the assets' movements as all following the same pattern, given by the time-profile $u$, with each asset's movements scaled by the corresponding component of $v$. Indeed, the $(t,j)$ component of $A$, which is the log-return of asset $j$ at time $t$, is then modeled as $u_t v_j$.
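As a concrete illustration, here is a minimal numpy sketch of this rank-one picture; the returns matrix is synthetic and the variable names are ours, not from the text.

```python
import numpy as np

# Synthetic stand-in for a log-returns matrix: m time periods x n assets.
rng = np.random.default_rng(0)
m, n = 200, 10
time_profile = rng.standard_normal(m)            # common pattern u
asset_scales = rng.uniform(0.5, 2.0, size=n)     # per-asset scaling v
A = np.outer(time_profile, asset_scales) + 0.1 * rng.standard_normal((m, n))

# Best rank-one approximation sigma_1 * u_1 * v_1^T from the SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])

# Entry (t, j) of A1 is (s[0] * U[t, 0]) * Vt[0, j]: one shared time-profile,
# scaled differently for each asset.
print("relative error of rank-1 fit:", np.linalg.norm(A - A1) / np.linalg.norm(A))
```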
We consider the low-rank approximation problem

$$\min_{X} \ \|A - X\|_F \ \text{ such that } \ \operatorname{rank}(X) = k,$$

where $k$ ($1 \le k \le r$) is given. In the above, we measure the error in the approximation using the Frobenius norm; using the largest singular value norm leads to the same set of solutions.
A best rank-$k$ approximation is given by zeroing out the $r-k$ trailing singular values of $A$, that is,

$$A_k := \sum_{i=1}^{k} \sigma_i u_i v_i^T.$$

The minimal error is given by the Euclidean norm of the singular values that have been zeroed out in the process:

$$\|A - A_k\|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}.$$
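Both facts are easy to check numerically; the following sketch (on a random matrix, with an arbitrary choice of $k$) truncates the SVD and compares the Frobenius error with the norm of the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
k = 2

# Truncated SVD: keep only the k leading singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

err = np.linalg.norm(A - A_k, "fro")
err_formula = np.sqrt(np.sum(s[k:] ** 2))   # Euclidean norm of the zeroed-out singular values
print(err, err_formula)                     # the two numbers agree (up to rounding)
```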
Sketch of proof: The proof rests on the fact that the Frobenius norm is invariant under rotations of the input and output spaces, that is, $\|U B V\|_F = \|B\|_F$ for any matrix $B$ and orthogonal matrices $U$, $V$ of appropriate sizes. Since the rank is also invariant under such transformations, we can reduce the problem to the case when $A = \tilde{S}$ is diagonal.
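In equation form, the reduction step reads as follows (a one-line sketch, keeping the notation $A = U \tilde{S} V^T$ from above): for any $X$,

$$\|A - X\|_F = \|U^T (A - X) V\|_F = \|\tilde{S} - U^T X V\|_F,$$

and since $\operatorname{rank}(U^T X V) = \operatorname{rank}(X)$, the problem is equivalent to approximating the diagonal matrix $\tilde{S}$ by a rank-$k$ matrix.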
Example: low-rank approximation of a matrix.
Link with Principal Component Analysis
Principal Component Analysis operates on the covariance matrix of the data, which is proportional to $A A^T$, and sets the principal directions to be the eigenvectors of that (symmetric) matrix. As noted here, the eigenvectors of $A A^T$ are simply the left singular vectors of $A$. Hence both methods, the approximation method above and PCA, rely on the same tool, the SVD. The SVD is the more complete approach, as it also provides the eigenvectors of $A^T A$ (the right singular vectors), which can be useful if we want to analyze the data in terms of rows instead of columns.
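As a quick numerical check of that connection (a sketch with a random matrix, not part of the original text), the eigenvectors of $A A^T$ indeed coincide, up to sign, with the left singular vectors of $A$:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Eigen-decomposition of the symmetric matrix A A^T, sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(A @ A.T)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Nonzero eigenvalues are the squared singular values ...
print(np.allclose(eigvals[: len(s)], s ** 2))
# ... and their eigenvectors match the left singular vectors up to a sign flip.
print(np.allclose(np.abs(eigvecs[:, : len(s)].T @ U), np.eye(len(s)), atol=1e-8))
```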
In particular, we can express the explained variance directly in terms of the singular values. In the context of visualization, the explained variance is simply the ratio of the total amount of variance in the projected data to that in the original data. More generally, when we approximate a data matrix by a low-rank matrix, the explained variance compares the variance in the approximation to that in the original data. We can also interpret it geometrically, as the ratio of the squared norm of the approximation matrix to that of the original matrix:

$$\frac{\|A_k\|_F^2}{\|A\|_F^2} = \frac{\sigma_1^2 + \cdots + \sigma_k^2}{\sigma_1^2 + \cdots + \sigma_r^2}.$$
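A minimal numpy sketch of this ratio (the helper name explained_variance is ours, for illustration):

```python
import numpy as np

def explained_variance(A, k):
    """Ratio ||A_k||_F^2 / ||A||_F^2 for the best rank-k approximation A_k of A."""
    s = np.linalg.svd(A, compute_uv=False)   # singular values, in decreasing order
    return np.sum(s[:k] ** 2) / np.sum(s ** 2)

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 20))
for k in (1, 5, 10, 20):
    print(k, round(explained_variance(A, k), 3))
```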