"
Theorem: optimal set of ordinary least-squares
 

The optimal set of the OLS problem

[latex]\begin{align*} p^*:= \min\limits_{x} ||Ax-y||_2^2, \end{align*}[/latex]

can be expressed as

[latex]\begin{align*} \mathbf{X}^\mathrm{opt} = A^{\dagger} y + {\bf N}(A), \end{align*}[/latex]

where [latex]A^{\dagger}[/latex] is the pseudo-inverse of [latex]A[/latex], and [latex]A^{\dagger} y[/latex] is the minimum-norm point in the optimal set. If [latex]A[/latex] is full column rank, the solution is unique, and equal to

[latex]\begin{align*} x^* = A^{\dagger} y = (A^TA)^{-1} A^T y. \end{align*}[/latex]
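As a numerical sanity check, here is a minimal NumPy sketch on assumed random data (the specific matrix, vector, and sizes are illustrative assumptions, not part of the theorem): it compares [latex]A^{\dagger}y[/latex] with a generic least-squares solver and, since the example matrix has full column rank, with the normal-equation formula.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))   # tall matrix; full column rank with probability one (assumed data)
y = rng.standard_normal(8)

x_pinv = np.linalg.pinv(A) @ y                    # A^† y
x_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]    # generic least-squares solver
x_normal = np.linalg.solve(A.T @ A, A.T @ y)      # (A^T A)^{-1} A^T y, valid since rank(A) = n

print(np.allclose(x_pinv, x_lstsq), np.allclose(x_pinv, x_normal))   # True True
```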

Proof: The following proof relies on the SVD of [latex]A[/latex], and the rotational invariance of the Euclidean norm.

Optimal value of the problem

Using the SVD, we can find the optimal set of the least-squares optimization problem

[latex]\begin{align*} p^*:= \min\limits_{x} ||Ax-y||_2^2. \end{align*}[/latex]

Indeed, if [latex](U, \tilde{S}, V)[/latex] is an SVD of [latex]A[/latex], the problem can be written

[latex]\begin{align*} p^*=\min\limits_x\left\|U\left(\begin{array}{ll} S & 0 \\ 0 & 0 \end{array}\right) V^T x-y\right\|_2^2=\min\limits_x\left\|\left(\begin{array}{ll} S & 0 \\ 0 & 0 \end{array}\right) V^T x-U^T y\right\|_2^2, \end{align*}[/latex]

where we have exploited the fact that the Euclidean norm is invariant under the orthogonal transformation [latex]U^T[/latex]. With [latex]\tilde{x}:=V^Tx[/latex] and [latex]\tilde{y}:=U^Ty[/latex], and changing the variable from [latex]x[/latex] to [latex]\tilde{x}[/latex], we can express the above as

[latex]\begin{align*} p^*= \min\limits_{\tilde{x}}\left\|\left(\begin{array}{ll} S & 0 \\ 0 & 0 \end{array}\right) \tilde{x}- \tilde{y}\right\|_2^2. \end{align*}[/latex]
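As a quick numerical check of this change of variables (a NumPy sketch with assumed random data), the residual computed with [latex](A, x, y)[/latex] coincides with the one computed in the rotated variables [latex](\tilde{x}, \tilde{y})[/latex].

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 6, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r example matrix (assumed data)
x = rng.standard_normal(n)
y = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A)             # full SVD: U is m x m, Vt is n x n
S_tilde = np.zeros((m, n))
S_tilde[:s.size, :s.size] = np.diag(s)  # the m x n matrix with S in its top-left block

x_t, y_t = Vt @ x, U.T @ y              # tilde variables
print(np.isclose(np.linalg.norm(A @ x - y)**2,
                 np.linalg.norm(S_tilde @ x_t - y_t)**2))   # True: the norm is rotation-invariant
```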

Expanding the terms in this objective, and using the partitioned notation [latex]\tilde{x} = (\tilde{x}_r, \tilde{x}_{n-r})[/latex], [latex]\tilde{y} = (\tilde{y}_r, \tilde{y}_{m-r})[/latex], where [latex]r[/latex] is the rank of [latex]A[/latex], we obtain

[latex]\begin{align*} p^* = \min\limits_{\tilde{x}_r} ||S\tilde{x}_r-\tilde{y}_{r}||_2^2 + ||\tilde{y}_{m-r}||_2^2. \end{align*}[/latex]

Since [latex]S[/latex] is invertible, we can reduce the first term in the objective to zero with the choice [latex]\tilde{x}_r = S^{-1}\tilde{y}_r[/latex]. Hence the optimal value is

[latex]\begin{align*} p^* = ||\tilde{y}_{m-r}||_2^2. \end{align*}[/latex]

We observe that the optimal value is zero if and only if [latex]y \in {\bf R}(A)[/latex], which is exactly the condition [latex]\tilde{y}_{m-r} = 0[/latex].
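The sketch below (NumPy, assumed random rank-deficient data) illustrates this identity: the minimal residual of the least-squares problem equals [latex]||\tilde{y}_{m-r}||_2^2[/latex], the energy of [latex]y[/latex] outside the range of [latex]A[/latex].

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 6, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank r < n (assumed data)
y = rng.standard_normal(m)

U, s, _ = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10 * s[0]))              # numerical rank, equals r here
y_t = U.T @ y

x_star = np.linalg.lstsq(A, y, rcond=None)[0]     # any optimal point
p_star = np.linalg.norm(A @ x_star - y)**2
print(np.isclose(p_star, np.linalg.norm(y_t[rank:])**2))   # True
```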

Optimal set

Let us detail the optimal set for the problem. The variable [latex]\tilde{x}[/latex] is partly determined, via its first [latex]r[/latex] components: [latex]\tilde{x}_r^* = S^{-1}\tilde{y}_r[/latex]. The remaining [latex]n-r[/latex] variables, contained in [latex]\tilde{x}_{n-r}[/latex], are free, as [latex]\tilde{x}_{n-r}[/latex] does not appear in the objective function of the above problem.

Thus, optimal points are of the form [latex]x= V\tilde{x}[/latex], with [latex]\tilde{x} = (\tilde{x}^*_r, \tilde{x}_{n-r})[/latex], [latex]\tilde{x}_r^* = S^{-1}\tilde{y}_r[/latex], and [latex]\tilde{x}_{n-r}[/latex] free.

To express this in terms of the original SVD of [latex]A[/latex], we observe that [latex]x= V\tilde{x}=V(\tilde{x}_r, \tilde{x}_{n-r})[/latex] means that

[latex]\begin{align*} x=V_r \tilde{x}_r+V_{n-r} \tilde{x}_{n-r}=V_r S^{-1} \tilde{y}_r+V_{n-r} \tilde{x}_{n-r} \end{align*}[/latex]

where [latex]V[/latex] is partitioned as [latex]V= (V_r, V_{n-r})[/latex], with [latex]V_r \in \mathbb{R}^{n\times r}[/latex] and [latex]V_{n-r} \in \mathbb{R}^{n\times (n-r)}[/latex]. Similarly, the vector [latex]\tilde{y}_r[/latex] can be expressed as [latex]\tilde{y}_r = U_r^T y[/latex], with [latex]U_r[/latex] formed with the first [latex]r[/latex] columns of [latex]U[/latex]. Thus, any element [latex]x^*[/latex] in the optimal set is of the form

[latex]\begin{align*} x^* = x_\mathrm{MN} + V_{n-r} z, \quad z \in \mathbb{R}^{n-r}, \end{align*}[/latex]

where [latex]x_\mathrm{MN}:= V_r S^{-1} U_r^T y[/latex]. (We will soon explain the acronym appearing in the subscript.) The free components [latex]\tilde{x}_{n-r}[/latex] correspond to the degrees of freedom allowed by the nullspace of [latex]A[/latex].
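The following NumPy sketch (assumed random, rank-deficient data) forms [latex]x_\mathrm{MN} = V_r S^{-1} U_r^T y[/latex] and checks that moving along a nullspace direction [latex]V_{n-r}z[/latex] leaves the residual unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 6, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r example (assumed data)
y = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A)
Vr, Vnr = Vt.T[:, :r], Vt.T[:, r:]       # first r columns of V, and a basis of N(A)
x_mn = Vr @ ((U[:, :r].T @ y) / s[:r])   # V_r S^{-1} U_r^T y

z = rng.standard_normal(n - r)
x_other = x_mn + Vnr @ z                 # another point of the optimal set
print(np.isclose(np.linalg.norm(A @ x_mn - y),
                 np.linalg.norm(A @ x_other - y)))   # True: same residual
```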

Minimum-norm optimal point

The particular solution to the problem, [latex]x_\mathrm{MN}[/latex], is the minimum-norm solution, in the sense that it is the element of [latex]\mathbf{X}^\mathrm{opt}[/latex] that has the smallest Euclidean norm. This is best understood in the space of [latex]\tilde{x}[/latex]-variables.

Indeed, the particular choice [latex]\tilde{x} = (\tilde{x}_r^*, 0)[/latex] corresponds to the element of the optimal set that has the smallest Euclidean norm. The norm of [latex]x[/latex] is the same as that of its rotated version [latex]\tilde{x}[/latex]. The first [latex]r[/latex] components, [latex]\tilde{x}_r^*[/latex], are fixed, and since [latex]||\tilde{x}||_2^2 = ||\tilde{x}_r||_2^2 + ||\tilde{x}_{n-r}||_2^2[/latex], the minimal norm is obtained with [latex]\tilde{x}_{n-r} = 0[/latex].
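A short numerical illustration of the minimum-norm property (NumPy, assumed random data): perturbing [latex]x_\mathrm{MN}[/latex] along the nullspace can only increase the Euclidean norm.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, r = 6, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r example (assumed data)
y = rng.standard_normal(m)

x_mn = np.linalg.pinv(A) @ y                 # minimum-norm optimal point
Vnr = np.linalg.svd(A)[2].T[:, r:]           # basis of the nullspace N(A)

z = rng.standard_normal(n - r)
x_other = x_mn + Vnr @ z                     # any other optimal point
print(np.linalg.norm(x_mn) <= np.linalg.norm(x_other))   # True
```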

Optimal set via the pseudo-inverse

The matrix [latex]V_r S^{-1} U_r^T[/latex], which appears in the expression of the particular solution [latex]x_\mathrm{MN}[/latex] mentioned above, is nothing other than the pseudo-inverse of [latex]A[/latex], which is denoted [latex]A^{\dagger}[/latex]. Indeed, we can express the pseudo-inverse in terms of the SVD as

[latex]\begin{align*} A^{\dagger} = V\left(\begin{array}{ll} S^{-1} & 0 \\ 0 & 0 \end{array}\right) U^T. \end{align*}[/latex]

With this convention, the minimum-norm optimal point is [latex]A^{\dagger} y[/latex]. Recall that the last [latex]n-r[/latex] columns of [latex]V[/latex] form a basis for the nullspace of [latex]A[/latex]. Hence the optimal set of the problem is

[latex]\begin{align*} \mathbf{X}^\mathrm{opt} = A^{\dagger}y + {\bf N}(A). \end{align*}[/latex]
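As a sketch on assumed random data, the matrix [latex]V_r S^{-1} U_r^T[/latex] assembled from the SVD can be compared with NumPy's built-in pseudo-inverse.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, r = 6, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r example (assumed data)

U, s, Vt = np.linalg.svd(A)
A_dag = Vt.T[:, :r] @ np.diag(1.0 / s[:r]) @ U[:, :r].T   # V_r S^{-1} U_r^T

print(np.allclose(A_dag, np.linalg.pinv(A)))   # True
```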

When [latex]A[/latex] is full column rank [latex](r = n \leq m,\; A^TA \succ 0)[/latex], the optimal set reduces to a singleton (a set with only one element), as the nullspace is [latex]\{0\}[/latex]. The unique optimal point is

[latex]\begin{align*} x_\mathrm{LS} = A^{\dagger}y = (A^TA)^{-1}A^Ty. \end{align*}[/latex]

License


This work (Đại số tuyến tính by Tony Tin) is free of known copyright restrictions.