37
37.1. Set of solutions
The following theorem provides all the solutions (optimal set) of a least-squares problem.
Theorem: optimal set of ordinary least-squares
The optimal set of the OLS problem \[ p^* = \min_x \|A x - y\|_2 \] can be expressed as \[ \mathbf{X}^{\text{opt}} = A^{\dagger} y + \mathbf{N}(A), \] where $A^{\dagger}$ is the pseudo-inverse of $A$ and $\mathbf{N}(A)$ is the nullspace of $A$. The particular solution $A^{\dagger} y$ is the minimum-norm point in the optimal set. If $A$ is full column rank, the solution is unique, and equal to \[ x^* = A^{\dagger} y = (A^T A)^{-1} A^T y. \]
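As a quick numerical check of the theorem, here is a small NumPy sketch (the matrix and data are random placeholders, not from the text): it builds a rank-deficient $A$, computes the minimum-norm point $A^{\dagger} y$, and verifies that adding a nullspace component leaves the residual unchanged while increasing the norm of the solution.

```python
# Sketch (illustrative, random data): the optimal set is A^† y + N(A).
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 8, 5, 3                        # A will have rank r < n
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
y = rng.standard_normal(m)

x_min = np.linalg.pinv(A) @ y            # minimum-norm point A^† y of the optimal set

# Any element of X_opt is A^† y plus a vector in N(A); the residual is unchanged.
null_basis = np.linalg.svd(A)[2][r:].T   # columns (numerically) span N(A)
x_other = x_min + null_basis @ rng.standard_normal(n - r)

print(np.linalg.norm(A @ x_min - y))     # optimal value p*
print(np.linalg.norm(A @ x_other - y))   # same optimal value
print(np.linalg.norm(x_min) <= np.linalg.norm(x_other))  # True: minimum norm
```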
37.2. Sensitivity analysis
We consider the situation where
\[ y + \delta y = A x, \]
with
- $A \in \mathbb{R}^{m \times n}$ the data matrix (known), with $A$ full column rank (hence $m \geq n$).
- $y \in \mathbb{R}^m$ is the measurement (known).
- $x \in \mathbb{R}^n$ is the vector to be estimated (unknown).
- $\delta y \in \mathbb{R}^m$ is a measurement noise or error (unknown).
We can use OLS to provide an estimate $\hat{x}_{\mathrm{LS}}$ of $x$. The idea is to seek the smallest vector $\delta y$ that makes the above equation feasible, that is,
\[ \min_{x, \delta y} \|\delta y\|_2 : y + \delta y = A x. \]
This leads to the OLS problem:
\[ \min_x \|A x - y\|_2. \]
Since $A$ is full column rank, its SVD can be expressed as
\[ A = U \begin{pmatrix} \Sigma \\ 0 \end{pmatrix} V^T, \]
where $\Sigma=\mathbf{diag}\left(\sigma_1, \ldots, \sigma_n\right)$ contains the singular values of $A$, with $\sigma_1 \geq \ldots \geq \sigma_n>0$.
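The block form of this SVD can be checked numerically; the following sketch uses an arbitrary random matrix (full column rank with probability one) and reconstructs $A$ from $U$, the stacked block $\begin{pmatrix} \Sigma \\ 0 \end{pmatrix}$, and $V^T$.

```python
# Sketch (random data): the full SVD of a full-column-rank A has the block form
# A = U [Σ; 0] V^T with Σ = diag(σ_1, ..., σ_n) and σ_n > 0.
import numpy as np

rng = np.random.default_rng(5)
m, n = 7, 3
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=True)   # U: m×m, s: length n, Vt: n×n
Sigma_block = np.vstack([np.diag(s), np.zeros((m - n, n))])
print(np.allclose(A, U @ Sigma_block @ Vt))       # True
print(s[-1] > 0)                                  # True: full column rank
```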
Since $A$ is full column rank, the solution $\hat{x}_{\mathrm{LS}}$ to the OLS problem is unique, and can be written as a linear function of the measurement vector $y$:
\[ \hat{x}_{\mathrm{LS}} = A^{\dagger} y, \]
with $A^{\dagger}$ the pseudo-inverse of $A$. Again, since $A$ is full column rank,
\[ A^{\dagger} = (A^T A)^{-1} A^T = V \begin{pmatrix} \Sigma^{-1} & 0 \end{pmatrix} U^T. \]
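The expressions above give the same unique solution, as the following NumPy sketch illustrates on random data (the variable names are illustrative, not from the text): it compares a generic least-squares solver, the normal-equations formula $(A^TA)^{-1}A^Ty$, and the SVD-based formula $V\Sigma^{-1}U^Ty$.

```python
# Sketch (random data): for a full-column-rank A, three equivalent ways of
# computing the OLS solution agree.
import numpy as np

rng = np.random.default_rng(1)
m, n = 10, 4
A = rng.standard_normal((m, n))          # full column rank with probability 1
y = rng.standard_normal(m)

x_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]
x_normal = np.linalg.solve(A.T @ A, A.T @ y)        # (A^T A)^{-1} A^T y
U, s, Vt = np.linalg.svd(A, full_matrices=False)    # thin SVD: A = U diag(s) V^T
x_svd = Vt.T @ ((U.T @ y) / s)                      # V Σ^{-1} U^T y

print(np.allclose(x_lstsq, x_normal), np.allclose(x_lstsq, x_svd))  # True True
```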
The OLS formulation provides an estimate $\hat{x}$ of the input $x$ such that the residual error vector $Ax-y$ is minimized in norm. We are interested in analyzing the impact of perturbations in the vector $y$ on the resulting solution $\hat{x}_{\mathrm{LS}}$. We begin by analyzing the absolute errors in the estimate and then turn to the analysis of relative errors.
Set of possible errors
Let us assume a simple model of potential perturbations: we assume that $\delta y$ belongs to a ball of radius $\alpha$: $\|\delta y\|_2 \leq \alpha$, where $\alpha>0$ is given. We will assume $\alpha=1$ for simplicity; the analysis is easily extended to any $\alpha>0$.
We have
\[ \begin{aligned} \delta x &:= x - \hat{x} \\ &= x - A^{\dagger} y \\ &= x - A^{\dagger}(A x - \delta y) \\ &= (I - A^{\dagger} A) x + A^{\dagger} \delta y \\ &= A^{\dagger} \delta y. \end{aligned} \]
In the above, we have exploited the fact that $A^{\dagger}$ is a left inverse of $A$, that is, $A^{\dagger} A=I_n$.
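The identity $\delta x = A^{\dagger}\delta y$ is easy to verify numerically; in the sketch below, $x$ and $\delta y$ are arbitrary random vectors and $y$ is constructed so that $y + \delta y = Ax$ holds exactly.

```python
# Sketch (random data): verifying δx = A^† δy for the model y + δy = A x
# with A full column rank.
import numpy as np

rng = np.random.default_rng(6)
m, n = 9, 3
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
dy = rng.standard_normal(m)
y = A @ x - dy                          # so that y + δy = A x holds
x_hat = np.linalg.pinv(A) @ y
print(np.allclose(x - x_hat, np.linalg.pinv(A) @ dy))   # True: δx = A^† δy
```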
The set of possible errors on the solution $\delta x$ is then given by
\[ \mathbf{E} = \left\{ A^{\dagger} \delta y : \|\delta y\|_2 \leq 1 \right\}, \]
which is an ellipsoid centered at zero, whose principal semi-axis lengths are the singular values of $A^{\dagger}$ (that is, the reciprocals $1/\sigma_i$ of the singular values of $A$). This ellipsoid can be interpreted as an ellipsoid of confidence for the estimate $\hat{x}$, with size and shape determined by the matrix $A^{\dagger}(A^{\dagger})^T$.
We can draw several conclusions from this analysis:
- The largest absolute error in the solution that can result from a unit-norm, additive perturbation on $y$ is of the order of $1 / \sigma_n$, where $\sigma_n$ is the smallest singular value of $A$.
- The largest relative error is $\sigma_1 / \sigma_n$, the condition number of $A$. (The absolute-error bound is illustrated numerically in the sketch following this list.)
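Here is a short sketch supporting the first bullet above (random data, illustrative only): random unit-norm perturbations never produce an error larger than $1/\sigma_n$, and the bound is attained when $\delta y$ is aligned with the last left singular vector of $A$.

```python
# Sketch (random data): a unit-norm perturbation δy yields a solution error
# ||A^† δy|| of at most 1/σ_n, attained along the last left singular vector.
import numpy as np

rng = np.random.default_rng(2)
m, n = 10, 4
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = np.linalg.pinv(A)

# Random unit-norm perturbations never exceed the bound 1/σ_n ...
for _ in range(5):
    dy = rng.standard_normal(m)
    dy /= np.linalg.norm(dy)
    assert np.linalg.norm(A_pinv @ dy) <= 1.0 / s[-1] + 1e-12

# ... and the worst case is attained for δy along the last left singular vector:
dy_worst = U[:, -1]
print(np.linalg.norm(A_pinv @ dy_worst), 1.0 / s[-1])   # equal
```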
37.3. BLUE property
We now return to the case of an OLS with full column rank matrix $A$.
Unbiased linear estimators
Consider the family of linear estimators, which are of the form
\[ \hat{x} = B y, \]
where $B \in \mathbb{R}^{n \times m}$. To this estimator, we associate the error
\[ \begin{aligned} \delta x &= x - \hat{x} \\ &= x - B y \\ &= x - B(A x - \delta y) \\ &= (I - B A) x + B \delta y. \end{aligned} \]
We say that the estimator (as determined by matrix $B$) is unbiased if the first term is zero:
\[ B A = I. \]
Unbiased estimators only exist when the above equation is feasible, that is, $A$ has a left inverse. This is equivalent to our condition that $A$ be full column rank. Since $A^{\dagger}$ is a left-inverse of $A$, the OLS estimator is a particular case of an unbiased linear estimator.
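To make this concrete, the sketch below (random data, illustrative only) checks that $A^{\dagger}A = I_n$ and constructs a second, different left inverse $B$ of $A$, showing that the OLS estimator is not the only unbiased linear estimator.

```python
# Sketch (random data): A^† is one left inverse of a full-column-rank A,
# but any B with B A = I defines an unbiased linear estimator.
import numpy as np

rng = np.random.default_rng(3)
m, n = 10, 4
A = rng.standard_normal((m, n))
A_pinv = np.linalg.pinv(A)
print(np.allclose(A_pinv @ A, np.eye(n)))      # True: A^† A = I_n

# Another left inverse: perturb A^† by a matrix whose rows are orthogonal to
# the range of A, so that the product with A is unaffected.
U = np.linalg.svd(A, full_matrices=True)[0]    # columns of U[:, n:] ⟂ range(A)
C = rng.standard_normal((n, m - n))
B = A_pinv + C @ U[:, n:].T
print(np.allclose(B @ A, np.eye(n)))           # True: B is also a left inverse
```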
Best unbiased linear estimator
The above analysis leads to the following question: which is the best unbiased linear estimator? One way to formulate this problem is to assume that the perturbation vector $\delta y$ is bounded in some way, and try to minimize the possible impact of such bounded errors on the solution.
Let us assume that $\delta y$ belongs to a unit ball: $\|\delta y\|_2 \leq 1$. The set of resulting errors on the solution $\delta x$ is then
\[ \mathbf{E} = \left\{ B \delta y : \|\delta y\|_2 \leq 1 \right\}, \]
which is an ellipsoid centered at zero, whose principal semi-axis lengths are the singular values of $B$. This ellipsoid can be interpreted as an ellipsoid of confidence for the estimate $\hat{x}$, with size and shape determined by the matrix $BB^T$.
It can be shown that the OLS estimator is optimal in the sense that it provides the smallest ellipsoid of confidence among all unbiased linear estimators. Specifically, for any $B$ satisfying $BA = I$:
\[ B B^T \succeq A^{\dagger} (A^{\dagger})^T. \]
This optimality of the LS estimator is referred to as the BLUE (Best Linear Unbiased Estimator) property.
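A numerical sanity check of this property (again on random, illustrative data): for a left inverse $B$ of $A$ other than $A^{\dagger}$, the matrix $BB^T - A^{\dagger}(A^{\dagger})^T$ should be positive semidefinite.

```python
# Sketch (random data): for any left inverse B of A, the gap B B^T - A^† (A^†)^T
# is positive semidefinite, i.e. its smallest eigenvalue is (numerically) >= 0.
import numpy as np

rng = np.random.default_rng(4)
m, n = 10, 4
A = rng.standard_normal((m, n))
A_pinv = np.linalg.pinv(A)

U = np.linalg.svd(A, full_matrices=True)[0]
B = A_pinv + rng.standard_normal((n, m - n)) @ U[:, n:].T   # another left inverse

gap = B @ B.T - A_pinv @ A_pinv.T
print(np.linalg.eigvalsh(gap).min() >= -1e-10)   # True: the gap is PSD
```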