Sample covariance matrix
Definition
For a vector $z \in \mathbb{R}^m$, the sample variance $\sigma^2$ measures the average squared deviation of its coefficients around the sample average $\hat{z}$:
$$\hat{z} := \frac{1}{m}(z(1)+\ldots+z(m)), \quad \sigma^2 := \frac{1}{m}\left((z(1)-\hat{z})^2+\ldots+(z(m)-\hat{z})^2\right).$$
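As a quick illustration of the definition, the following minimal sketch computes the sample average and sample variance of a small vector. The vector `z` is a made-up example, not data from the text, and the normalization is by the number of coefficients, as in the definition.

```matlab
% Hypothetical example vector (not from the text).
z = [2 4 4 6];
m = numel(z);                      % number of coefficients
zhat = sum(z)/m;                   % sample average
sigma2 = sum((z - zhat).^2)/m;     % sample variance, normalized by m
% The built-in var with weight 1 also normalizes by m (not m-1).
sigma2_builtin = var(z, 1);
```

Here the deviations from the average are -2, 0, 0 and 2, so both `sigma2` and `sigma2_builtin` should equal 2.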
Now consider a matrix $X = [x_1, \ldots, x_m] \in \mathbb{R}^{n \times m}$, where each column $x_k$ represents a data point in $\mathbb{R}^n$. We are interested in describing the amount of variance in this data set. To this end, we look at the numbers we obtain by projecting the data along a line defined by the direction $u \in \mathbb{R}^n$. This corresponds to the (row) vector in $\mathbb{R}^m$
$$z = (u^Tx_1, \ldots, u^Tx_m) = u^TX \in \mathbb{R}^m.$$
The corresponding sample mean and variance are
$$\hat{z} = u^T\hat{x}, \quad \sigma^2(u) := \frac{1}{m}\sum_{k=1}^m (u^Tx_k - u^T\hat{x})^2,$$
where $\hat{x} := \frac{1}{m}(x_1+\ldots+x_m) \in \mathbb{R}^n$ is the sample mean of the vectors $x_1, \ldots, x_m$.
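The identity for the projected mean follows from linearity of the projection. A short derivation, writing $\hat{x}$ for the average of the data points $x_1, \ldots, x_m$:

```latex
\hat{z}
  = \frac{1}{m}\sum_{k=1}^m u^T x_k
  = u^T\left(\frac{1}{m}\sum_{k=1}^m x_k\right)
  = u^T\hat{x}.
```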
The sample variance along direction $u$ can be expressed as a quadratic form in $u$:
$$\sigma^2(u) = \frac{1}{m}\sum_{k=1}^m \left[u^T(x_k-\hat{x})\right]^2 = u^T\Sigma u,$$
where $\Sigma$ is an $n \times n$ symmetric matrix, called the sample covariance matrix of the data points:
$$\Sigma := \frac{1}{m}\sum_{k=1}^m (x_k-\hat{x})(x_k-\hat{x})^T.$$
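The step from the sum of squares to the quadratic form can be spelled out as follows. It uses only the fact that the scalar $u^T(x_k-\hat{x})$ equals its own transpose, with $\Sigma$ denoting the sample covariance matrix and $m$ the number of data points:

```latex
\begin{aligned}
\sigma^2(u)
  &= \frac{1}{m}\sum_{k=1}^m \left[u^T(x_k-\hat{x})\right]^2 \\
  &= \frac{1}{m}\sum_{k=1}^m u^T(x_k-\hat{x})\,(x_k-\hat{x})^T u \\
  &= u^T\left(\frac{1}{m}\sum_{k=1}^m (x_k-\hat{x})(x_k-\hat{x})^T\right)u
   = u^T\Sigma u.
\end{aligned}
```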
Properties
The covariance matrix satisfies the following properties.
- The sample covariance matrix allows finding the variance along any direction in data space.
- The diagonal elements of $\Sigma$ give the variances of each coordinate of the data, that is, of each row of $X$.
- The trace of $\Sigma$ gives the sum of all the variances.
- The matrix $\Sigma$ is positive semi-definite, since the associated quadratic form $u \mapsto u^T\Sigma u$ is non-negative everywhere.
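The properties above can be checked numerically. The sketch below uses a small made-up data matrix and an arbitrarily chosen direction `u`. Both are assumptions for illustration, as are the variable names `var_direct`, `var_quad`, `coord_vars`, `trace_gap` and `min_eig`.

```matlab
% Hypothetical data: n = 2 coordinates, m = 4 points stored as columns.
X = [1 2 3 4;
     2 1 4 3];
[n, m] = size(X);
xhat = mean(X, 2);                  % sample mean of the data points
Xc = X - xhat*ones(1, m);           % centered data matrix
Sigma = (1/m)*(Xc*Xc');             % n-by-n sample covariance matrix

% Variance along an (assumed) direction u, computed two ways.
u = [1; 1]/sqrt(2);
z = u'*X;                           % projected data, a row vector in R^m
var_direct = var(z, 1);             % sample variance of the projections
var_quad = u'*Sigma*u;              % quadratic form in u

% Diagonal entries versus per-coordinate variances (rows of X).
coord_vars = var(X, 1, 2);          % variance of each row, normalized by m
trace_gap = abs(trace(Sigma) - sum(coord_vars));

% Positive semi-definiteness: the smallest eigenvalue should be
% non-negative up to rounding error.
min_eig = min(eig(Sigma));
```

If the properties hold, `var_direct` and `var_quad` agree, `diag(Sigma)` matches `coord_vars`, `trace_gap` is zero up to rounding, and `min_eig` is non-negative.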
Matlab syntax
The following Matlab syntax assumes that the data points in $\mathbb{R}^n$ are collected in an $n \times m$ matrix $X$: $X = [x_1, \ldots, x_m]$.
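>> m = size(X,2); % number of data points (columns of X)
>> xhat = mean(X,2); % sample mean of the data points (average over the columns of X)
>> Xc = X-xhat*ones(1,m); % centered data matrix
>> Sigma = (1/m)*(Xc*Xc'); % n-by-n sample covariance matrix
>> Sigma = cov(X',1); % built-in command produces the same thing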
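As a sanity check, the manual computation can be compared against the built-in command on a small example. This is a minimal sketch: the data matrix is made up, the names `Sigma_manual`, `Sigma_builtin` and `gap` are illustrative, and the centering step relies on implicit expansion, which assumes Matlab R2016b or later.

```matlab
% Hypothetical data: n = 3 coordinates, m = 5 points stored as columns.
X = [1 2 3 4 5;
     2 4 1 3 5;
     5 3 4 1 2];
m = size(X, 2);
Xc = X - mean(X, 2);              % implicit expansion (assumes R2016b or later)
Sigma_manual = (1/m)*(Xc*Xc');    % n-by-n matrix, normalized by m
Sigma_builtin = cov(X', 1);       % rows of X' are observations; weight 1 normalizes by m
gap = max(abs(Sigma_manual(:) - Sigma_builtin(:)));
```

The transpose is needed because `cov` treats rows as observations, whereas here the data points are stored as columns. The value of `gap` should be zero up to rounding error.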