Optimal set of Least-Squares via SVD

Theorem: optimal set of ordinary least-squares

The optimal set of the OLS problem

p^*:= \min\limits_{x} ||Ax-y||_2^2

can be expressed as

X^{opt} = A^{\dagger} y + {\bf N}(A),

where A^{\dagger} is the pseudo-inverse of A, and A^{\dagger} y is the minimum-norm point in the optimal set. If A is full column rank, the solution is unique, and equal to

x^* = A^{\dagger} y = (A^TA)^{-1} A^T y.
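As a quick numerical sanity check of the full-column-rank case, the sketch below (using NumPy, with a randomly generated A and y chosen purely for illustration) verifies that the pseudo-inverse solution coincides with the normal-equations formula (A^TA)^{-1}A^Ty:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))  # tall Gaussian matrix: full column rank almost surely
y = rng.standard_normal(8)

x_pinv = np.linalg.pinv(A) @ y                   # x^* = A^dagger y
x_ne = np.linalg.solve(A.T @ A, A.T @ y)         # x^* = (A^T A)^{-1} A^T y

assert np.allclose(x_pinv, x_ne)
```

Both expressions agree because A^TA is invertible whenever A has full column rank.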

Proof: The following proof relies on the SVD of A, and the rotational invariance of the Euclidean norm.

Optimal value of the problem

Using the SVD, we can find the optimal set of the least-squares optimization problem

p^*:= \min\limits_{x} ||Ax-y||_2^2

Indeed, if (U, \tilde{S}, V) is an SVD of A, the problem can be written

p^*=\min\limits_x\left\|U\left(\begin{array}{ll} S & 0 \\ 0 & 0 \end{array}\right) V^T x-y\right\|_2^2=\min\limits_x\left\|\left(\begin{array}{ll} S & 0 \\ 0 & 0 \end{array}\right) V^T x-U^T y\right\|_2^2,

where we have exploited the fact that the Euclidean norm is invariant under the orthogonal transformation U^T. With \tilde{x}:=V^Tx and \tilde{y}:=U^Ty, changing the variable from x to \tilde{x}, we express the above as

p^*= \min\limits_{\tilde{x}}\left\|\left(\begin{array}{ll} S & 0 \\ 0 & 0 \end{array}\right) \tilde{x}- \tilde{y}\right\|_2^2.

Expanding the terms, and using the partitioned notations \tilde{x} = (\tilde{x}_r, \tilde{x}_{n-r}), \tilde{y} = (\tilde{y}_r, \tilde{y}_{m-r}), we obtain

p^* = \min\limits_{\tilde{x}_r} ||S\tilde{x}_r-\tilde{y}_{r}||_2^2 + ||\tilde{y}_{m-r}||_2^2.

Since S is invertible, we can reduce the first term in the objective to zero with the choice \tilde{x}_r = S^{-1}\tilde{y}_r. Hence the optimal value is

p^* = ||\tilde{y}_{m-r}||_2^2.

We observe that the optimal value is zero if and only if \tilde{y}_{m-r} = 0, which is exactly the condition y \in {\bf R}(A).
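The expression for the optimal value can be checked numerically. The sketch below (NumPy; a rank-deficient A is constructed as a product of Gaussian factors, purely for illustration) compares ||\tilde{y}_{m-r}||_2^2 against the residual of a least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 6, 4, 2
# Build an m x n matrix of rank r as a product of two random factors
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
y = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A)       # full SVD: U is m x m
y_tilde = U.T @ y
p_star = np.sum(y_tilde[r:] ** 2)  # ||y~_{m-r}||_2^2: energy outside R(A)

x_ls = np.linalg.lstsq(A, y, rcond=None)[0]
assert np.isclose(np.sum((A @ x_ls - y) ** 2), p_star)
```

The last m-r components of \tilde{y} = U^T y carry exactly the part of y that no choice of x can fit.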

Optimal set

Let us detail the optimal set for the problem. The variable \tilde{x} is partly determined, via its first r components: \tilde{x}_r^* = S^{-1}\tilde{y}_r. The remaining n-r variables, contained in \tilde{x}_{n-r}, are free, as \tilde{x}_{n-r} does not appear in the objective function of the above problem.

Thus, optimal points are of the form x= V\tilde{x}, with \tilde{x} = (\tilde{x}^*_r, \tilde{x}_{n-r}), \tilde{x}_r^* = S^{-1}\tilde{y}_r, and \tilde{x}_{n-r} free.

To express this in terms of the original SVD of A, we observe that x= V\tilde{x}=V(\tilde{x}_r, \tilde{x}_{n-r}) means that

x=V_r \tilde{x}_r+V_{n-r} \tilde{x}_{n-r}=V_r S^{-1} \tilde{y}_r+V_{n-r} \tilde{x}_{n-r}

where V is partitioned as V= (V_r, V_{n-r}), with V_r \in \mathbb{R}^{n\times r} and V_{n-r} \in \mathbb{R}^{n\times (n-r)}. Similarly, the vector \tilde{y}_r can be expressed as \tilde{y}_r = U_r^T y, with U_r formed with the first r columns of U. Thus, any element x^* in the optimal set is of the form

x^* = x_{MN} + V_{n-r} z, \quad z \in \mathbb{R}^{n-r},

where x_{MN}:= V_r S^{-1} U_r^T y. (We will soon explain the acronym appearing in the subscript.) The free components \tilde{x}_{n-r} correspond to the degrees of freedom allowed by the nullspace of A.
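The parametrization of the optimal set above can be verified numerically: every point x_{MN} + V_{n-r}z achieves the same residual. Below is a sketch (NumPy; random rank-deficient A and y, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 6, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r matrix
y = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A)
Ur, Vr = U[:, :r], Vt[:r].T          # first r left/right singular vectors
V_null = Vt[r:].T                    # last n-r columns of V: basis of N(A)
x_mn = Vr @ ((Ur.T @ y) / s[:r])     # x_MN = V_r S^{-1} U_r^T y

res0 = np.linalg.norm(A @ x_mn - y)
for _ in range(5):
    z = rng.standard_normal(n - r)
    x = x_mn + V_null @ z            # any point of the optimal set
    assert np.isclose(np.linalg.norm(A @ x - y), res0)
```

Adding V_{n-r}z leaves Ax unchanged because A V_{n-r} = 0, so the residual is constant over the whole optimal set.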

Minimum-norm optimal point

The particular solution to the problem, x_{MN}, is the minimum-norm solution, in the sense that it is the element of X^{opt} that has the smallest Euclidean norm. This is best understood in the space of \tilde{x}-variables.

Indeed, the particular choice \tilde{x} = (\tilde{x}_r^*, 0) corresponds to the element in the optimal set that has the smallest Euclidean norm. The norm of x is the same as that of its rotated version \tilde{x}. The first r elements \tilde{x}_r^* are fixed, and since ||\tilde{x}||_2^2 = ||\tilde{x}_r||_2^2 + ||\tilde{x}_{n-r}||_2^2, the minimal norm is obtained with \tilde{x}_{n-r} = 0.
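The minimum-norm property can be illustrated numerically: among all optimal points x_{MN} + V_{n-r}z, the choice z = 0 has the smallest norm. A sketch, assuming a random rank-deficient A and y as before:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 6, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r matrix
y = rng.standard_normal(m)

x_mn = np.linalg.pinv(A) @ y       # minimum-norm least-squares point
U, s, Vt = np.linalg.svd(A)
V_null = Vt[r:].T                  # basis of N(A)

for _ in range(5):
    z = rng.standard_normal(n - r)
    # x_mn is orthogonal to N(A), so adding V_null @ z can only grow the norm
    assert np.linalg.norm(x_mn) <= np.linalg.norm(x_mn + V_null @ z) + 1e-12
```

The inequality follows from ||x_{MN} + V_{n-r}z||_2^2 = ||x_{MN}||_2^2 + ||z||_2^2, since x_{MN} lies in the row space of A, orthogonal to the nullspace.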

Optimal set via the pseudo-inverse

The matrix V_r S^{-1} U_r^T, which appears in the expression of the particular solution x_{MN} mentioned above, is nothing else than the pseudo-inverse of A, which is denoted A^{\dagger}. Indeed, we can express the pseudo-inverse in terms of the SVD as

A^{\dagger} = V\left(\begin{array}{ll} S^{-1} & 0 \\ 0 & 0 \end{array}\right) U^T.

With this convention, the minimum-norm optimal point is A^{\dagger} y. Recall that the last n-r columns of V form a basis for the nullspace of A. Hence the optimal set of the problem is

X^{opt} = A^{\dagger}y + {\bf N}(A).
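The SVD expression of the pseudo-inverse can be confirmed against a library implementation. A sketch (NumPy; random rank-deficient A for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, r = 6, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r matrix

U, s, Vt = np.linalg.svd(A)
Ur, Vr = U[:, :r], Vt[:r].T
A_dagger = Vr @ np.diag(1.0 / s[:r]) @ Ur.T   # V_r S^{-1} U_r^T

assert np.allclose(A_dagger, np.linalg.pinv(A))
```

Note that np.linalg.pinv truncates singular values below a tolerance, which here matches inverting only the r nonzero singular values.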

When A is full column rank (r = n \leq m, A^TA \succ 0), the optimal set reduces to a singleton, as the nullspace is \{0\}. The unique optimal point can be written as

x_{LS} = A^{\dagger}y = (A^TA)^{-1}A^Ty.

License

Hyper-Textbook: Optimization Models and Applications Copyright © by L. El Ghaoui. All Rights Reserved.
