Optimal set of Least-Squares via SVD
The optimal set of the OLS problem

$$p^* = \min_x \|Ax - y\|_2, \qquad A \in \mathbb{R}^{m \times n}, \ y \in \mathbb{R}^m,$$

can be expressed as

$$\mathbf{X}^{\rm opt} = A^\dagger y + \mathcal{N}(A),$$

where $A^\dagger$ is the pseudo-inverse of $A$, and $A^\dagger y$ is the minimum-norm point in the optimal set. If $A$ is full column rank, the solution is unique, and equal to

$$x^* = A^\dagger y = (A^T A)^{-1} A^T y.$$
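As a quick numerical illustration of this statement (the random data and the NumPy routines below are my own choices, not part of the source), the following sketch builds a rank-deficient $A$ and checks that the pseudo-inverse route and NumPy's least-squares solver return the same minimum-norm optimal point:

```python
import numpy as np

rng = np.random.default_rng(0)

# Deliberately rank-deficient A (rank 2, shape 6 x 4) and an arbitrary target y.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
y = rng.standard_normal(6)

x_pinv = np.linalg.pinv(A) @ y                     # A^† y, the minimum-norm optimal point
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)    # also returns the minimum-norm solution

print(np.allclose(x_pinv, x_lstsq))        # True: same point
print(np.linalg.norm(A @ x_pinv - y))      # the optimal residual p*
```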
Proof: The following proof relies on the SVD of $A$, and on the rotational invariance of the Euclidean norm.
Optimal value of the problem
Using the SVD of $A$, we can find the optimal set of the least-squares optimization problem

$$p^* = \min_x \|Ax - y\|_2.$$

Indeed, if $A = U\tilde{S}V^T$ is an SVD of $A$, the problem can be written

$$p^* = \min_x \left\| U\tilde{S}V^T x - y \right\|_2 = \min_x \left\| \tilde{S}V^T x - U^T y \right\|_2,$$

where we have exploited the fact that the Euclidean norm is invariant under the orthogonal transformation $U^T$. With $\tilde{y} := U^T y$, $\tilde{x} := V^T x$, and changing the variable from $x$ to $\tilde{x}$, we express the above as

$$p^* = \min_{\tilde{x}} \left\| \tilde{S}\tilde{x} - \tilde{y} \right\|_2, \qquad \tilde{S} = \begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix}, \quad S = \mathbf{diag}(\sigma_1,\ldots,\sigma_r),$$

with $r$ the rank of $A$.
Expanding the terms, and using the partitioned notations $\tilde{x} = (\tilde{x}_1, \tilde{x}_2)$, $\tilde{y} = (\tilde{y}_1, \tilde{y}_2)$, with $\tilde{x}_1, \tilde{y}_1 \in \mathbb{R}^r$, we obtain

$$(p^*)^2 = \min_{\tilde{x}_1,\,\tilde{x}_2} \left\| S\tilde{x}_1 - \tilde{y}_1 \right\|_2^2 + \left\| \tilde{y}_2 \right\|_2^2.$$

Since $S$ is invertible, we can reduce the first term in the objective to zero with the choice $\tilde{x}_1 = S^{-1}\tilde{y}_1$. Hence the optimal value is

$$p^* = \left\| \tilde{y}_2 \right\|_2.$$
We observe that the optimal value is zero if and only if $\tilde{y}_2 = 0$, which is exactly the same as $y \in \mathcal{R}(A)$.
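As a hedged sanity check (on the same kind of random, rank-deficient data as in the earlier sketch, chosen purely for illustration), the closed-form value $p^* = \|\tilde{y}_2\|_2$ matches the residual actually achieved by a least-squares solution, and it vanishes when $y$ lies in the range of $A$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # rank r = 2
y = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(A)             # full SVD: U is 6x6, Vt is 4x4
r = int(np.sum(s > 1e-10 * s[0]))       # numerical rank

y_tilde = U.T @ y
p_star = np.linalg.norm(y_tilde[r:])    # || y_tilde_2 ||_2

x_opt = np.linalg.pinv(A) @ y
print(np.isclose(p_star, np.linalg.norm(A @ x_opt - y)))    # True

# If y is in the range of A, the optimal value is (numerically) zero.
y_in_range = A @ rng.standard_normal(4)
print(np.linalg.norm(U[:, r:].T @ y_in_range) < 1e-10)      # True
```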
Optimal set
Let us detail the optimal set for the problem. The variable $\tilde{x}$ is partly determined, via its first $r$ components: $\tilde{x}_1 = S^{-1}\tilde{y}_1$. The remaining $n - r$ variables contained in $\tilde{x}_2$ are free, as $\tilde{x}_2$ does not appear in the objective function of the above problem.
Thus, optimal points are of the form $x = V\tilde{x}$, with $\tilde{x} = (\tilde{x}_1, \tilde{x}_2)$, $\tilde{x}_1 = S^{-1}\tilde{y}_1$, and $\tilde{x}_2 \in \mathbb{R}^{n-r}$ free.
To express this in terms of the original SVD of $A$, we observe that $\tilde{y} = U^T y$ means that

$$\tilde{y}_1 = U_1^T y, \qquad \tilde{y}_2 = U_2^T y,$$

where $U$ is partitioned as $U = (U_1, U_2)$, with $U_1 \in \mathbb{R}^{m \times r}$ and $U_2 \in \mathbb{R}^{m \times (m-r)}$. Similarly, the vector $x = V\tilde{x}$ can be expressed as $x = V_1\tilde{x}_1 + V_2\tilde{x}_2$, with $V_1$ formed with the first $r$ columns of $V$, and $V_2$ with the last $n - r$. Thus, any element $x$ in the optimal set is of the form

$$x = x_{\rm LS} + V_2\tilde{x}_2,$$

where $x_{\rm LS} := V_1 S^{-1} U_1^T y$. (We will soon explain the acronym appearing in the subscript.) The free components $\tilde{x}_2$ correspond to the degrees of freedom allowed to $x$ by the nullspace of $A$.
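To make the structure of the optimal set concrete, here is a small illustrative sketch (random data again assumed, as before): every point of the form $x_{\rm LS} + V_2\tilde{x}_2$ attains the same residual, whatever $\tilde{x}_2$ we pick, because $AV_2 = 0$.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # rank r = 2
y = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10 * s[0]))
U1, V1, V2 = U[:, :r], Vt[:r].T, Vt[r:].T

x_ls = V1 @ np.diag(1.0 / s[:r]) @ U1.T @ y     # particular solution x_LS
res_ls = np.linalg.norm(A @ x_ls - y)

# Moving along V2 (the nullspace directions) leaves the residual unchanged.
for _ in range(3):
    x2_tilde = rng.standard_normal(A.shape[1] - r)
    x = x_ls + V2 @ x2_tilde
    print(np.isclose(np.linalg.norm(A @ x - y), res_ls))        # True each time
```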
Minimum-norm optimal point
The particular solution to the problem, $x_{\rm LS}$, is the minimum-norm solution, in the sense that it is the element of $\mathbf{X}^{\rm opt}$ that has the smallest Euclidean norm. This is best understood in the space of $\tilde{x}$-variables.
Indeed, the particular choice $\tilde{x}_2 = 0$ corresponds to the element of the optimal set that has the smallest Euclidean norm. The norm of $x$ is the same as that of its rotated version, $\tilde{x} = V^T x$. The first $r$ elements of $\tilde{x}$ are fixed at $\tilde{x}_1 = S^{-1}\tilde{y}_1$, and since $\|\tilde{x}\|_2^2 = \|\tilde{x}_1\|_2^2 + \|\tilde{x}_2\|_2^2$, we see that the minimal norm is obtained with $\tilde{x}_2 = 0$.
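A brief numerical confirmation of the minimum-norm property, under the same illustrative random setup: perturbing $x_{\rm LS}$ along the nullspace directions can only increase the norm, since those directions are orthogonal to $x_{\rm LS}$.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # rank r = 2
y = rng.standard_normal(6)

x_ls = np.linalg.pinv(A) @ y            # candidate minimum-norm optimal point

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10 * s[0]))
V2 = Vt[r:].T                           # columns span the nullspace of A

# Any other optimal point x_ls + V2 @ z is no shorter: V2 @ z is orthogonal to x_ls.
for _ in range(3):
    z = rng.standard_normal(A.shape[1] - r)
    print(np.linalg.norm(x_ls + V2 @ z) >= np.linalg.norm(x_ls))   # True each time
```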
Optimal set via the pseudo-inverse
The matrix $V_1 S^{-1} U_1^T$, which appears in the expression of the particular solution $x_{\rm LS}$ mentioned above, is nothing else than the pseudo-inverse of $A$, which is denoted $A^\dagger$. Indeed, we can express the pseudo-inverse in terms of the SVD as

$$A^\dagger = V \begin{pmatrix} S^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^T = V_1 S^{-1} U_1^T.$$

With this convention, the minimum-norm optimal point is $x_{\rm LS} = A^\dagger y$. Recall that the last $n - r$ columns of $V$ (that is, the columns of $V_2$) form a basis for the nullspace of $A$. Hence the optimal set of the problem is

$$\mathbf{X}^{\rm opt} = A^\dagger y + \mathcal{N}(A) = \left\{ A^\dagger y + V_2 z ~:~ z \in \mathbb{R}^{n-r} \right\}.$$
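The identity $A^\dagger = V_1 S^{-1} U_1^T$ can be checked against NumPy's pinv, again on illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # rank r = 2

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10 * s[0]))
U1, V1 = U[:, :r], Vt[:r].T
A_dagger = V1 @ np.diag(1.0 / s[:r]) @ U1.T     # V1 S^{-1} U1^T

print(np.allclose(A_dagger, np.linalg.pinv(A)))   # True
```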
When $A$ is full column rank ($r = n$), the optimal set reduces to a singleton, as the nullspace is $\{0\}$. The unique optimal point is

$$x_{\rm LS} = A^\dagger y = (A^T A)^{-1} A^T y.$$
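Finally, a sketch of the full-column-rank case (a tall random $A$, full column rank with probability one, assumed only for illustration): the pseudo-inverse solution coincides with the normal-equations formula $(A^T A)^{-1} A^T y$.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 3))         # tall matrix, full column rank (almost surely)
y = rng.standard_normal(8)

x_pinv = np.linalg.pinv(A) @ y
x_normal_eq = np.linalg.solve(A.T @ A, A.T @ y)    # (A^T A)^{-1} A^T y

print(np.allclose(x_pinv, x_normal_eq))    # True: the unique least-squares solution
```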