Singular value decomposition (SVD) theorem

Theorem: Singular Value Decomposition (SVD)
An arbitrary matrix A \in \mathbf{R}^{m \times n} admits a decomposition of the form

    \[A=\sum_{i=1}^r \sigma_i u_i v_i^T=U \tilde{S} V^T, \quad \tilde{S}:=\left(\begin{array}{cc} S & 0 \\ 0 & 0 \end{array}\right)\]

where U \in \mathbf{R}^{m \times m}, V \in \mathbf{R}^{n \times n} are both orthogonal matrices, and the matrix S is diagonal:

    \[S=\operatorname{diag}\left(\sigma_1, \ldots, \sigma_r\right),\]

where the positive numbers \sigma_1 \geq \ldots \geq \sigma_r>0 are unique, and are called the singular values of A. The number r \leq \min (m, n) is equal to the rank of A, and the triplet (U, \tilde{S}, V) is called a singular value decomposition (SVD) of A. The first r columns of U: u_i, i=1, \ldots, r (resp. V: v_i, i=1, \ldots, r ) are called left (resp. right) singular vectors of A, and satisfy

    \[A v_i=\sigma_i u_i, \quad u_i^T A=\sigma_i v_i^T, \quad i=1, \ldots, r .\]
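
As a numerical illustration (not part of the statement above), the following NumPy sketch computes a full SVD of a small, arbitrarily chosen example matrix and checks the relations A v_i=\sigma_i u_i and A^T u_i=\sigma_i v_i, as well as the factorization A=U \tilde{S} V^T; the tolerance 1e-12 used to count nonzero singular values is an arbitrary choice.

    import numpy as np

    A = np.array([[1., 2., 0.],
                  [2., 0., 1.]])          # an arbitrary 2 x 3 example matrix
    U, s, Vt = np.linalg.svd(A)           # full SVD: U is 2 x 2, Vt is 3 x 3
    V = Vt.T
    r = int(np.sum(s > 1e-12))            # numerical rank = number of nonzero singular values

    # Check A v_i = sigma_i u_i and A^T u_i = sigma_i v_i for i = 1, ..., r.
    for i in range(r):
        assert np.allclose(A @ V[:, i], s[i] * U[:, i])
        assert np.allclose(A.T @ U[:, i], s[i] * V[:, i])

    # Check the full factorization A = U S~ V^T.
    S_tilde = np.zeros(A.shape)
    S_tilde[:r, :r] = np.diag(s[:r])
    assert np.allclose(A, U @ S_tilde @ Vt)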

Proof: The n \times n matrix A^T A is real and symmetric. According to the spectral theorem, it admits an eigenvalue decomposition of the form A^T A=V \Lambda V^T, with V an n \times n matrix whose columns form an orthonormal basis (that is, V^T V=V V^T=I_n ), and \Lambda=\operatorname{diag}\left(\lambda_1, \ldots, \lambda_r, 0, \ldots, 0\right), where the eigenvalues are sorted so that \lambda_1 \geq \ldots \geq \lambda_r>0. Here, r is the rank of A^T A (if r=n then there are no trailing zeros in \Lambda ). Since A^T A is positive semi-definite, the \lambda_j's are non-negative, and we can define the positive quantities \sigma_j:=\sqrt{\lambda_j}, j=1, \ldots, r.
Note that when j>r, A v_j=0, since then \left\|A v_j\right\|_2^2=v_j^T A^T A v_j=\lambda_j v_j^T v_j=0.
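
This first step of the proof can be mirrored numerically. The sketch below, again with an arbitrary example matrix, uses NumPy's eigh routine to diagonalize A^T A, sorts the eigenvalues in decreasing order, sets \sigma_j=\sqrt{\lambda_j}, and checks that A v_j=0 for j>r; the rank tolerance is again an arbitrary choice.

    import numpy as np

    A = np.array([[1., 2., 0.],
                  [2., 0., 1.]])          # same illustrative 2 x 3 matrix
    lam, V = np.linalg.eigh(A.T @ A)      # spectral theorem: A^T A = V diag(lam) V^T
    order = np.argsort(lam)[::-1]         # reorder so that lambda_1 >= lambda_2 >= ...
    lam, V = lam[order], V[:, order]
    lam = np.clip(lam, 0.0, None)         # A^T A is PSD; clip tiny negative round-off
    r = int(np.sum(lam > 1e-12))          # rank of A^T A (= rank of A)
    sigma = np.sqrt(lam[:r])              # singular values sigma_1 >= ... >= sigma_r > 0

    # For j > r the eigenvector v_j lies in the nullspace of A: ||A v_j||_2 = 0.
    for j in range(r, A.shape[1]):
        assert np.allclose(A @ V[:, j], 0.0)
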
Let us construct an m \times m orthogonal matrix U as follows. We set

    \[u_i=\frac{1}{\sigma_i} A v_i, \quad i=1, \ldots, r .\]

These m-vectors are unit-norm and mutually orthogonal, since the v_j's are eigenvectors of A^T A: for i, j \leq r, u_i^T u_j=\frac{1}{\sigma_i \sigma_j} v_i^T A^T A v_j=\frac{\lambda_j}{\sigma_i \sigma_j} v_i^T v_j, which equals 1 if i=j and 0 otherwise. Using (say) the Gram-Schmidt orthogonalization procedure, we can complete this set of vectors (if necessary, that is, when r<m ) with u_{r+1}, \ldots, u_m in order to form an orthogonal matrix U:=\left(u_1, \ldots, u_m\right) \in \mathbf{R}^{m \times m}.
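
This construction of U can also be mirrored numerically. In the hypothetical sketch below, a QR factorization of a randomly padded matrix plays the role of Gram-Schmidt, and a tall example matrix (with r<m ) is used so that the completion step is actually exercised.

    import numpy as np

    A = np.array([[1., 2.],
                  [2., 0.],
                  [0., 1.]])              # a tall 3 x 2 example, so r < m
    m, n = A.shape
    lam, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(lam)[::-1]
    lam, V = np.clip(lam[order], 0.0, None), V[:, order]
    r = int(np.sum(lam > 1e-12))
    sigma = np.sqrt(lam[:r])

    U_r = A @ V[:, :r] / sigma            # columns u_i = A v_i / sigma_i (unit norm, orthogonal)
    if r < m:
        # Complete u_1, ..., u_r to an orthonormal basis of R^m; QR of a randomly
        # padded matrix plays the role of Gram-Schmidt here.
        rng = np.random.default_rng(0)
        Q, _ = np.linalg.qr(np.hstack([U_r, rng.standard_normal((m, m - r))]))
        U = np.hstack([U_r, Q[:, r:]])    # keep u_1, ..., u_r; take the complement from Q
    else:
        U = U_r
    assert np.allclose(U.T @ U, np.eye(m))  # U is orthogonal
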
Let us check that U, V satisfy the conditions of the theorem, by showing that U^T A V=\tilde{S}:=\operatorname{diag}\left(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0\right). We have

    \[\left(U^T A V\right)_{i j}=u_i^T A v_j= \begin{cases}\sigma_j u_i^T u_j & \text { if } j \leq r \\ 0 & \text { otherwise, }\end{cases}\]

where the second case stems from the fact that A v_j=0 when j>r. Since the columns of U are orthonormal, u_i^T u_j equals 1 if i=j and 0 otherwise, so U^T A V=\tilde{S}, as claimed.
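
Putting the pieces together, the condensed sketch below (same hypothetical example and construction as above) checks numerically that U^T A V=\tilde{S}, and hence that A=U \tilde{S} V^T.

    import numpy as np

    A = np.array([[1., 2.],
                  [2., 0.],
                  [0., 1.]])              # same tall 3 x 2 example
    m, n = A.shape
    lam, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(lam)[::-1]
    lam, V = np.clip(lam[order], 0.0, None), V[:, order]
    r = int(np.sum(lam > 1e-12))
    sigma = np.sqrt(lam[:r])
    U_r = A @ V[:, :r] / sigma
    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(np.hstack([U_r, rng.standard_normal((m, m - r))]))
    U = np.hstack([U_r, Q[:, r:]]) if r < m else U_r

    S_tilde = np.zeros((m, n))
    S_tilde[:r, :r] = np.diag(sigma)
    assert np.allclose(U.T @ A @ V, S_tilde)   # U^T A V = S~, hence A = U S~ V^T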
