Covariance Matrix
Starting from this section, we consider multivariate models. First of all, we need a way to measure
how variables depend on each other.
The covariance between two random variables \(X\) and \(Y\) is defined as
\[
\text{Cov }[X, Y] = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])].
\]
Here, \((X - \mathbb{E}[X])\) and \((Y - \mathbb{E}[Y])\) are mean deviations, which represent
how far the random variables \(X\) and \(Y\) deviate from their respective expected values (means). So, the covariance
measures how these deviations vary together (joint variation).
Note: \(\text{Cov }[X, Y] = \mathbb{E}[XY]- \mathbb{E}[X]\mathbb{E}[Y]\). So, if \(X\) and \(Y\) are independent,
then \(\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]\) and thus \(\text{Cov }[X, Y] = 0\). However, in general, the converse is
NOT true: zero covariance only tells us there is no linear dependence. A positive (negative) covariance
indicates a positive (negative) linear association.
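As a quick numerical illustration, here is a minimal sketch (assuming NumPy is available; the sample size and the distributions are arbitrary choices) that estimates \(\text{Cov }[X, Y]\) from samples, checks the identity \(\mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]\), and shows a dependent pair whose covariance is nearly zero:
```python
# Minimal sketch (assumes NumPy): sample covariance as an estimate of Cov[X, Y],
# plus a dependent pair with (near-)zero covariance.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Positively associated pair: Y = X + noise.
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))      # E[(X - E[X])(Y - E[Y])]
print(cov_xy, np.mean(x * y) - x.mean() * y.mean())    # same value: E[XY] - E[X]E[Y]

# Dependent but uncorrelated: Y = X^2 with X symmetric about 0.
y2 = x**2
print(np.mean((x - x.mean()) * (y2 - y2.mean())))      # close to 0, yet y2 depends on x
```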
In practice, we often extend this idea to \(n\) random variables. Consider a random vector \(x \in \mathbb{R}^n\) whose
entries are the random variables \(X_1, \cdots, X_n\), and let \(\mu = \mathbb{E}[x]\) denote its mean vector.
The population covariance matrix of the vector \(x\) is defined as
\[
\begin{align*}
\Sigma &= \text{Cov }[x] \\\\
&= \mathbb{E}[(x - \mathbb{E}[x])(x - \mathbb{E}[x])^T] \\\\
&= \begin{bmatrix}
\text{Var }[X_1] & \text{Cov }[X_1, X_2] & \cdots & \text{Cov }[X_1, X_n] \\
\text{Cov }[X_2, X_1] & \text{Var }[X_2] & \cdots & \text{Cov }[X_2, X_n] \\
\vdots & \vdots & \ddots & \vdots \\
\text{Cov }[X_n, X_1] & \text{Cov }[X_n, X_2] & \cdots & \text{Var }[X_n]
\end{bmatrix} \\\\
&= \mathbb{E }[xx^T] - \mu \mu^T
\end{align*}
\]
Each diagonal entry \(\Sigma_{ii}\) represents the variance of \(X_i\) because
\[
\text{Cov }[X_i, X_i] = \mathbb{E}[(X_i - \mathbb{E}[X_i])^2] = \text{Var }[X_i].
\]
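The same sample estimate works for a random vector. Below is a minimal sketch (assuming NumPy; the mean vector and covariance used to generate the samples are arbitrary) that builds the sample covariance matrix as \(\mathbb{E}[xx^T] - \mu\mu^T\), compares it with `np.cov`, and checks that the diagonal entries are the variances \(\text{Var }[X_i]\):
```python
# Minimal sketch (assumes NumPy): covariance matrix of a random vector from samples,
# once via E[xx^T] - mu mu^T and once with np.cov.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
samples = rng.multivariate_normal(mean=[0.0, 1.0, -1.0],          # arbitrary example distribution
                                  cov=[[2.0, 0.6, 0.0],
                                       [0.6, 1.0, 0.3],
                                       [0.0, 0.3, 0.5]],
                                  size=n)                          # each row is one draw of x^T

mu = samples.mean(axis=0)                                          # estimate of E[x]
sigma = samples.T @ samples / n - np.outer(mu, mu)                 # E[xx^T] - mu mu^T
print(np.allclose(sigma, np.cov(samples.T, bias=True)))            # True: same estimator
print(np.allclose(np.diag(sigma), samples.var(axis=0)))            # diagonal entries are Var[X_i]
```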
Also, the total variance of \(\Sigma \) is defined as the trace of \(\Sigma \):
\[
\text{tr } (\Sigma ) = \sum_{i=1}^n \text{Var }[X_i].
\]
The covariance matrix is symmetric because the covariance itself is symmetric:
\[
\Sigma_{ij} = \text{Cov }[X_i, X_j] = \text{Cov }[X_j, X_i] = \Sigma_{ji}.
\]
(Alternatively, for any matrix \(A\), \(AA^T\) is symmetric because \((AA^T)^T = (A^T)^TA^T = AA^T\), and \(\Sigma\) is an expectation of outer products of this form with \(A = x - \mathbb{E}[x]\).)
Since \(\Sigma\) is symmetric, it is always orthogonally diagonalizable:
\[
\Sigma = P D P^T
\]
where \(P\) is an orthogonal matrix whose columns are unit eigenvectors of \(\Sigma\) and \(D\) is a diagonal matrix
whose diagonal entries are the eigenvalues of \(\Sigma\) corresponding to those eigenvectors.
Note: The total variance of \(\Sigma\) is equal to the sum of its eigenvalues (ONLY the total!), because the trace is invariant under an orthogonal change of basis, \(\text{tr }(PDP^T) = \text{tr }(DP^TP) = \text{tr }(D)\):
\[
\text{tr } (\Sigma ) = \sum_{i=1}^n \text{Var }[X_i] = \text{tr } (D) = \sum_{i=1}^n \lambda_i
\]
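A minimal sketch of the diagonalization (assuming NumPy; the matrix below is just an arbitrary symmetric positive semi-definite example): `np.linalg.eigh` returns the eigenvalues and an orthogonal matrix of unit eigenvectors, and the trace identity above can be checked directly:
```python
# Minimal sketch (assumes NumPy): orthogonal diagonalization Sigma = P D P^T
# and the identity tr(Sigma) = sum of eigenvalues.
import numpy as np

sigma = np.array([[2.0, 0.6, 0.0],
                  [0.6, 1.0, 0.3],
                  [0.0, 0.3, 0.5]])              # arbitrary symmetric PSD example

eigvals, P = np.linalg.eigh(sigma)               # eigh is for symmetric matrices
D = np.diag(eigvals)
print(np.allclose(sigma, P @ D @ P.T))           # Sigma = P D P^T
print(np.allclose(P.T @ P, np.eye(3)))           # P is orthogonal
print(np.isclose(np.trace(sigma), eigvals.sum()))  # total variance = sum of eigenvalues
```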
Moreover, the covariance matrix is always positive semi-definite:
For any vector \(v \in \mathbb{R}^n\),
\[
\begin{align*}
v^T \Sigma v &= v^T\mathbb{E}[(x - \mathbb{E}[x])(x - \mathbb{E}[x])^T] v \\\\
&= \mathbb{E}[v^T(x - \mathbb{E}[x])(x - \mathbb{E}[x])^T v] \\\\
&= \mathbb{E}[(v^T(x - \mathbb{E}[x]))^2] \geq 0
\end{align*}
\]
Here, the last equality uses the fact that \(v^T(x - \mathbb{E}[x])\) is a scalar, so \((x - \mathbb{E}[x])^T v = v^T(x - \mathbb{E}[x])\).
Thus, the diagonal entries of \(D\) are non-negative (for a unit eigenvector \(p_i\), \(0 \leq p_i^T \Sigma p_i = \lambda_i\)). In other words, the eigenvalues of \(\Sigma\)
always satisfy:
\[
\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0.
\]
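A minimal sketch of the positive semi-definiteness (assuming NumPy; the data-generating step is just an arbitrary way to obtain correlated columns): the quadratic form \(v^T \Sigma v\) is non-negative for random \(v\), and so are the eigenvalues of the sample covariance matrix:
```python
# Minimal sketch (assumes NumPy): v^T Sigma v >= 0 for any v, and the eigenvalues
# of a covariance matrix are non-negative (up to floating-point error).
import numpy as np

rng = np.random.default_rng(2)
samples = rng.normal(size=(50_000, 4)) @ rng.normal(size=(4, 4))   # correlated columns
sigma = np.cov(samples.T)                                          # sample covariance matrix

for _ in range(5):
    v = rng.normal(size=4)
    print(v @ sigma @ v >= 0)                                      # quadratic form is non-negative

print(np.linalg.eigvalsh(sigma) >= -1e-12)                         # eigenvalues are non-negative
```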
Side note: \(\mathbb{E}[xx^T]\) is called the autocorrelation matrix, denoted \(R_{xx}\):
\[
\begin{align*}
R_{xx}
&= \mathbb{E }[xx^T] \\\\
&= \begin{bmatrix}
\mathbb{E }[X_1^2] & \mathbb{E }[X_1 X_2] & \cdots & \mathbb{E }[X_1 X_n] \\
\mathbb{E }[X_2 X_1] & \mathbb{E }[X_2^2] & \cdots & \mathbb{E }[X_2 X_n] \\
\vdots & \vdots & \ddots & \vdots \\
\mathbb{E }[X_n X_1] & \mathbb{E }[X_n X_2] & \cdots & \mathbb{E }[X_n^2]
\end{bmatrix}
\end{align*}
\]
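Since \(\Sigma = \mathbb{E}[xx^T] - \mu \mu^T\), the two matrices are related by \(R_{xx} = \Sigma + \mu \mu^T\). Here is a minimal sketch (assuming NumPy; the distribution is an arbitrary example) checking this relation on samples:
```python
# Minimal sketch (assumes NumPy): autocorrelation matrix E[xx^T] from samples
# and the relation R_xx = Sigma + mu mu^T.
import numpy as np

rng = np.random.default_rng(3)
samples = rng.multivariate_normal(mean=[1.0, -2.0],                # arbitrary example distribution
                                  cov=[[1.0, 0.4],
                                       [0.4, 2.0]],
                                  size=100_000)

r_xx = samples.T @ samples / len(samples)                          # estimate of E[xx^T]
mu = samples.mean(axis=0)
sigma = np.cov(samples.T, bias=True)
print(np.allclose(r_xx, sigma + np.outer(mu, mu)))                 # R_xx = Sigma + mu mu^T
```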