[latexpage]
Theorem: Singular Value Decomposition (SVD)
An arbitrary matrix $A \in \mathbb{R}^{m \times n}$ admits a decomposition of the form
$$
A=U \tilde{S} V^T, \quad \tilde{S}:=\left(\begin{array}{cc} S & 0 \\ 0 & 0 \end{array}\right),
$$
where $U \in \mathbb{R}^{m \times m}, V \in \mathbb{R}^{n \times n}$ are both orthogonal matrices, and the matrix $S$ is diagonal:
$$
S=\mathbf{diag}\left(\sigma_1, \ldots, \sigma_r\right),
$$
where the positive numbers $\sigma_1 \geq \ldots \geq \sigma_r>0$ are unique, and are called the singular values of $A$. The number $r \leq \min (m, n)$ is equal to the rank of $A$, and the triplet $(U, \tilde{S}, V)$ is called a singular value decomposition (SVD) of $A$. The first $r$ columns of $U$: $u_i$, $i=1, \ldots, r$ (resp. $V$: $v_i$, $i=1, \ldots, r$) are called left (resp. right) singular vectors of $A$, and satisfy
$$
A v_i=\sigma_i u_i, \quad A^T u_i=\sigma_i v_i, \quad i=1, \ldots, r.
$$
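Before turning to the proof, here is a minimal numerical sketch of the statement (not part of the proof; the random $5 \times 3$ test matrix is an arbitrary choice), using NumPy's library routine `np.linalg.svd` to check the properties above:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))          # arbitrary m x n test matrix

# full_matrices=True returns square U (m x m) and V^T (n x n).
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Assemble the m x n matrix S-tilde with the singular values on its diagonal.
S_tilde = np.zeros_like(A)
S_tilde[:len(s), :len(s)] = np.diag(s)

assert np.allclose(A, U @ S_tilde @ Vt)            # A = U S-tilde V^T
assert np.allclose(U.T @ U, np.eye(5))             # U orthogonal
assert np.allclose(Vt @ Vt.T, np.eye(3))           # V orthogonal
assert np.all(s[:-1] >= s[1:]) and np.all(s >= 0)  # sigma_1 >= ... >= 0

# Singular-vector relations: A v_i = sigma_i u_i and A^T u_i = sigma_i v_i.
for i in range(len(s)):
    assert np.allclose(A @ Vt[i], s[i] * U[:, i])
    assert np.allclose(A.T @ U[:, i], s[i] * Vt[i])
```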
Proof: The $n \times n$ matrix $A^T A$ is real and symmetric. According to the spectral theorem, it admits an eigenvalue decomposition in the form $A^T A=V \Lambda V^T$, with $V$ a $n \times n$ matrix whose columns form an orthonormal basis (that is, $V^T V=V V^T=I_n$ ), and $\Lambda=\mathbf{diag}\left(\lambda_1, \ldots, \lambda_r, 0, \ldots, 0\right)$, with the eigenvalues sorted in decreasing order. Here, $r$ is the rank of $A^T A$ (if $r=n$ then there are no trailing zeros in $\Lambda$ ). Since $A^T A$ is positive semi-definite, the $\lambda_j$'s are non-negative, and we can define the non-zero quantities $\sigma_j:=\sqrt{\lambda_j}, j=1, \ldots, r$.
Note that when $j>r, A v_j=0$, since then $\left\|A v_j\right\|_2^2=v_j^T A^T A v_j=\lambda_j v_j^T v_j=0$.
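The following short NumPy sketch mirrors this first step (the rank-deficient test matrix, built as a product of random factors, is an arbitrary choice): it computes the eigendecomposition of $A^T A$ with `np.linalg.eigh`, reads off the $\sigma_j$'s as square roots of the eigenvalues, and checks that $A v_j=0$ for $j>r$:

```python
import numpy as np

rng = np.random.default_rng(0)
# A rank-2 example (product of 5 x 2 and 2 x 3 factors), so r = 2 < n = 3.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))

# A^T A is symmetric PSD; eigh returns ascending eigenvalues, so reverse
# to get lambda_1 >= ... >= lambda_n >= 0.
lam, V = np.linalg.eigh(A.T @ A)
lam, V = lam[::-1], V[:, ::-1]

sigma = np.sqrt(np.clip(lam, 0, None))   # clip guards tiny negative round-off
print(sigma)                             # last entry ~ 0, since r = 2 here

# For j > r, A v_j = 0 (here j = 3, the third column of V):
print(np.linalg.norm(A @ V[:, 2]))       # ~ 1e-15
```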
Let us construct an $m \times m$ orthogonal matrix $U$ as follows. We set
$$
u_i=\frac{1}{\sigma_i} A v_i, \quad i=1, \ldots, r .
$$
These $m$-vectors are unit-norm, and mutually orthogonal, since the $v_j$'s are orthonormal eigenvectors of $A^T A$: indeed, for $i, j \leq r$,
$$
u_i^T u_j=\frac{1}{\sigma_i \sigma_j} v_i^T A^T A v_j=\frac{\lambda_j}{\sigma_i \sigma_j} v_i^T v_j,
$$
which equals $1$ if $i=j$, and $0$ otherwise. Using (say) the Gram-Schmidt orthogonalization procedure, we can complete (if necessary, that is in the case $r<m$) this set of vectors by $u_{r+1}, \ldots, u_m$ in order to form an orthogonal matrix $U:=\left(u_1, \ldots, u_m\right) \in \mathbb{R}^{m \times m}$.
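Continuing the numerical sketch (same arbitrary rank-2 test matrix; the `1e-10` rank threshold is also an arbitrary choice), one can build $U$ exactly as in the proof, with `scipy.linalg.null_space` standing in for the Gram-Schmidt completion:

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))
m, n = A.shape

lam, V = np.linalg.eigh(A.T @ A)
lam, V = lam[::-1], V[:, ::-1]
r = int(np.sum(lam > 1e-10 * lam.max()))  # numerical rank (here r = 2)
sigma = np.sqrt(lam[:r])

# u_i = (1/sigma_i) A v_i, i = 1, ..., r: unit-norm and mutually orthogonal.
U_r = A @ V[:, :r] / sigma                # divides column i by sigma_i

# Complete to an orthonormal basis of R^m: null_space(U_r.T) returns an
# orthonormal basis of the orthogonal complement of span(u_1, ..., u_r).
U = np.hstack([U_r, null_space(U_r.T)])

assert np.allclose(U.T @ U, np.eye(m))    # U is indeed m x m orthogonal
```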
Let us check that $U, V$ satisfy the conditions of the theorem, by showing that
$$
U^T A V=\tilde{S}=\mathbf{diag}\left(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0\right).
$$
We have
$$
\left(U^T A V\right)_{i j}=u_i^T A v_j= \begin{cases}\sigma_j u_i^T u_j & \text { if } j \leq r \\ 0 & \text { otherwise, }\end{cases}
$$
where the second case stems from the fact that $A v_j=0$ when $j>r$. Since the columns of $U$ are orthonormal, $\sigma_j u_i^T u_j$ equals $\sigma_j$ if $i=j$, and $0$ otherwise. Thus, $U^T A V=\tilde{S}$, as claimed.
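To close the loop on the numerical sketch (same arbitrary test matrix and rank threshold as before), one can verify this identity directly:

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))
m, n = A.shape

# Rebuild V, sigma, and U exactly as in the previous sketches.
lam, V = np.linalg.eigh(A.T @ A)
lam, V = lam[::-1], V[:, ::-1]
r = int(np.sum(lam > 1e-10 * lam.max()))
sigma = np.sqrt(lam[:r])
U_r = A @ V[:, :r] / sigma
U = np.hstack([U_r, null_space(U_r.T)])

# U^T A V is diag(sigma_1, ..., sigma_r, 0, ..., 0), as the proof claims.
S_tilde = U.T @ A @ V
print(np.round(S_tilde, 12))              # zeros outside the leading r-block

assert np.allclose(S_tilde[:r, :r], np.diag(sigma))
assert np.allclose(A, U @ S_tilde @ V.T)  # recovers A = U S-tilde V^T
```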