Definition
For a vector [latex]z \in \mathbb{R}^m[/latex], the sample variance [latex]\sigma^2[/latex] measures the average squared deviation of its coefficients from the sample average [latex]\hat{z}[/latex]:
[latex]\begin{align*} \hat{z} &:= \frac{1}{m}(z(1)+\ldots+z(m)), \quad \sigma^2 := \frac{1}{m}\left((z(1)-\hat{z})^2+\ldots+(z(m)-\hat{z})^2\right). \end{align*}[/latex]
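As a quick illustration of the two formulas above, here is a minimal NumPy sketch. The vector `z` and its values are made up for the example; the only point is that the variance is normalized by [latex]m[/latex], which is also what `np.var` does by default.

```python
import numpy as np

# Hypothetical example vector z in R^m (values chosen only for illustration).
z = np.array([2.0, 4.0, 4.0, 6.0])
m = z.size

# Sample average: (z(1) + ... + z(m)) / m.
z_hat = z.sum() / m

# Sample variance with the 1/m normalization used in the definition.
sigma2 = ((z - z_hat) ** 2).sum() / m

# np.var uses the same 1/m normalization by default (ddof=0).
assert np.isclose(sigma2, np.var(z))
print(z_hat, sigma2)
```

Passing `ddof=1` to `np.var` would switch to the [latex]1/(m-1)[/latex] normalization, which is not the convention used here.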
Now consider a matrix [latex]X = [x_1, \cdots, x_m] \in \mathbb{R}^{n\times m}[/latex], where each column [latex]x_i[/latex] represents a data point in [latex]\mathbb{R}^n[/latex]. We are interested in describing the amount of variance in this data set. To this end, we look at the numbers [latex]u^Tx_k[/latex], [latex]k=1,\ldots,m[/latex], obtained by projecting the data points along a line defined by the direction [latex]u \in \mathbb{R}^n[/latex]. These numbers form the following vector in [latex]\mathbb{R}^m[/latex]:
[latex]\begin{align*} z &= \begin{pmatrix} u^Tx_1 \\ \vdots \\ u^T x_m \end{pmatrix} = X^T u \in \mathbb{R}^m. \end{align*}[/latex]
The corresponding sample mean and variance are
[latex]\begin{align*} \hat{z} &= u^T \hat{x}, \quad \sigma^2(u) := \frac{1}{m} \sum\limits_{k=1}^m (u^Tx_k - u^T \hat{x})^2, \end{align*}[/latex]
where [latex]\hat{x} := \displaystyle\frac{1}{m}(x_1 + \cdots + x_m) \in \mathbb{R}^n[/latex] is the sample mean of the vectors [latex]x_1, \cdots, x_m[/latex].
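The projection step can be sketched numerically as follows. The data matrix `X` is random and the direction `u` is arbitrary; both are assumptions made for illustration, as is the choice to normalize `u` to unit length (the text does not require it).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n = 3 features, m = 50 points, one point per column.
n, m = 3, 50
X = rng.normal(size=(n, m))

# An arbitrary direction u in R^n, normalized here by choice.
u = np.array([1.0, 2.0, -1.0])
u = u / np.linalg.norm(u)

# Projected data: z(k) = u^T x_k, i.e. z = X^T u.
z = X.T @ u

# Sample mean of the data points (average over columns).
x_hat = X.mean(axis=1)

# Sample mean and variance of the projected values.
z_hat = z.mean()
sigma2_u = np.mean((z - z_hat) ** 2)

# The mean of the projections equals the projection of the mean.
assert np.isclose(z_hat, u @ x_hat)
```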
The sample variance along direction [latex]u[/latex] can be expressed as a quadratic form in [latex]u[/latex]:
[latex]\begin{align*} \sigma^2(u) &= \frac{1}{m} \sum_{k=1}^m [u^T(x_k-\hat{x})]^2 = u^T\Sigma u, \end{align*}[/latex]
where [latex]\Sigma[/latex] is an [latex]n \times n[/latex] symmetric matrix, called the sample covariance matrix of the data points:
[latex]\begin{align*} \Sigma &= \frac{1}{m} \sum_{k=1}^m (x_k-\hat{x})(x_k - \hat{x})^T. \end{align*}[/latex]
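The quadratic-form identity follows from writing each squared term as [latex][u^T v]^2 = u^T (vv^T) u[/latex] with [latex]v = x_k - \hat{x}[/latex]. A small numerical check, under the assumption of randomly generated data and an arbitrary direction `u`, might look like this:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: n = 4 features, m = 200 points, one point per column.
n, m = 4, 200
X = rng.normal(size=(n, m))
u = rng.normal(size=n)  # arbitrary direction, not normalized

# Center the data by subtracting the sample mean from every column.
x_hat = X.mean(axis=1, keepdims=True)
Xc = X - x_hat

# Sample covariance matrix: (1/m) * sum_k (x_k - x_hat)(x_k - x_hat)^T.
Sigma = (Xc @ Xc.T) / m

# Variance of the projected data, computed directly (1/m normalization)...
sigma2_direct = np.var(X.T @ u)

# ...and through the quadratic form u^T Sigma u.
sigma2_quadratic = u @ Sigma @ u

assert np.isclose(sigma2_direct, sigma2_quadratic)

# np.cov treats rows as variables by default; bias=True selects 1/m.
assert np.allclose(Sigma, np.cov(X, bias=True))
```

Forming `Sigma` as a single matrix product of the centered data is equivalent to the sum of outer products in the definition and avoids an explicit loop over the data points.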
Properties
The covariance matrix satisfies the following properties:
- The sample covariance matrix allows finding the variance along any direction in data space.
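- The diagonal elements of [latex]\Sigma[/latex] give the variances of the individual coordinates of the data: [latex]\Sigma_{ii}[/latex] is the sample variance of the [latex]i[/latex]-th row of [latex]X[/latex], that is, the variance along the [latex]i[/latex]-th coordinate axis.
- The trace of [latex]\Sigma[/latex] gives the sum of these coordinate-wise variances, that is, the total variance of the data.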
- The matrix [latex]\Sigma[/latex] is positive semi-definite, since the associated quadratic form [latex]u \mapsto u^T\Sigma u = \sigma^2(u)[/latex] is a sample variance and hence non-negative for every [latex]u[/latex].
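The properties listed above can be checked numerically. The sketch below uses made-up random data, with rows scaled differently so the coordinate variances differ; the tolerance `1e-12` is an arbitrary allowance for floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: n = 3 features with different scales, m = 100 points.
n, m = 3, 100
scales = np.array([[1.0], [2.0], [0.5]])
X = rng.normal(size=(n, m)) * scales

# Sample covariance matrix with 1/m normalization.
Xc = X - X.mean(axis=1, keepdims=True)
Sigma = (Xc @ Xc.T) / m

# Variance of each coordinate (each row of X) across the data points.
coord_var = X.var(axis=1)

# Diagonal entries are the coordinate-wise variances.
assert np.allclose(np.diag(Sigma), coord_var)

# The trace is the sum of those variances.
assert np.isclose(np.trace(Sigma), coord_var.sum())

# Symmetric and positive semi-definite: eigenvalues are real and >= 0
# (up to rounding error).
assert np.allclose(Sigma, Sigma.T)
eigvals = np.linalg.eigvalsh(Sigma)
assert np.all(eigvals >= -1e-12)

# Variance along any direction u is the quadratic form u^T Sigma u.
u = rng.normal(size=n)
assert np.isclose(np.var(X.T @ u), u @ Sigma @ u)
```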