Symmetric Matrices
A symmetric matrix is a square matrix \(A\) such that
\[A^T= A\]
For example, in the context of least-squares solutions, \(\hat{\beta}\) satisfies the normal equations \(X^TX\hat{\beta} = X^TY\).
In this expression, \(X^TX\) is always symmetric. To confirm, let \(X\) be an \(m \times n\) matrix. Then
\[(X^TX)^T = X^T(X^T)^T = X^TX.\]
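As a quick numerical sanity check of this fact, here is a minimal NumPy sketch (the matrix \(X\) below is just an arbitrary example):

```python
import numpy as np

# Arbitrary example matrix X; any m x n matrix works.
X = np.random.default_rng(0).normal(size=(5, 3))
G = X.T @ X  # the Gram matrix X^T X

# X^T X equals its own transpose, i.e. it is symmetric.
print(np.allclose(G, G.T))  # True
```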
An \(n \times n\) matrix \(A\) is said to be orthogonally diagonalizable if there exists
an orthogonal matrix \(P\) and a diagonal matrix \(D\) such that:
\[
A = PDP^T = PDP^{-1}
\]
For this diagonalization, \(A\) must have \(n\) linearly independent, orthonormal eigenvectors.
Importantly, in such cases, \(A\) is symmetric because
\[
\begin{align*}
A^T = (PDP^T)^T &= (P^T)^T D^T P^T \\\\
&= PDP^T \\\\
&= A.
\end{align*}
\]
The converse also holds: if \(A\) is symmetric, it is guaranteed to be orthogonally diagonalizable.
Theorem 1:
An \(n \times n\) matrix \(A\) is orthogonally diagonalizable if and only if \(A\) is
a symmetric matrix.
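In practice, Theorem 1 is realized by symmetric eigensolvers. A minimal NumPy sketch (the random symmetric matrix is just an arbitrary example): `np.linalg.eigh` returns orthonormal eigenvectors for a symmetric input, so it produces the orthogonal diagonalization directly.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
A = (M + M.T) / 2            # an arbitrary symmetric matrix

# eigh is intended for symmetric/Hermitian matrices; the columns of P
# are orthonormal eigenvectors and eigvals are the eigenvalues.
eigvals, P = np.linalg.eigh(A)
D = np.diag(eigvals)

print(np.allclose(P.T @ P, np.eye(4)))  # P is orthogonal, so P^T = P^{-1}
print(np.allclose(P @ D @ P.T, A))      # A = P D P^T
```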
In Eigenvalues & Eigenvectors, we considered the example:
\[
\begin{align*}
A &= \begin{bmatrix} 4 & 1 & 1\\ 1 & 4 & 1 \\ 1 & 1 & 4 \\ \end{bmatrix} \\\\
&= PDP^{-1} \\\\
&= \begin{bmatrix} -1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ \end{bmatrix}
\begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 6 \\ \end{bmatrix}
\begin{bmatrix} \frac{-1}{3} & \frac{-1}{3} & \frac{2}{3} \\
\frac{-1}{3} & \frac{2}{3} & \frac{-1}{3} \\
\frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix}.
\end{align*}
\]
Since \(A\) is symmetric, it must be orthogonally diagonalizable. To make \(P\) orthogonal, we need
orthonormal eigenvectors, which can be obtained by applying the Gram-Schmidt algorithm to the columns of \(P\).
Since only \(v_1 \cdot v_2 \neq 0\) needs adjustment,
\[
\begin{align*}
v'_2 &= v_2 - \frac{v_2 \cdot v_1}{v_1 \cdot v_1}v_1 \\\\
&= \begin{bmatrix} -1\\ 1\\ 0\\ \end{bmatrix} - \frac{1}{2}\begin{bmatrix} -1\\ 0\\ 1\\ \end{bmatrix} \\\\
&= \begin{bmatrix} \frac{-1}{2}\\ 1 \\ \frac{-1}{2} \end{bmatrix}.
\end{align*}
\]
After normalization, the orthonormal eigenvectors are:
\[
u_1 = \begin{bmatrix} \frac{-1}{\sqrt{2}} \\ 0\\ \frac{1}{\sqrt{2}} \\ \end{bmatrix}
,\quad
u_2 = \begin{bmatrix} \frac{-1}{\sqrt{6}}\\ \frac{2}{\sqrt{6}}\\ \frac{-1}{\sqrt{6}}\\ \end{bmatrix}
,\quad \text{and} \quad
u_3 = \begin{bmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \end{bmatrix}
\]
Therefore,
\[
\begin{align*}
A &= PDP^T \\\\
&= \begin{bmatrix} \frac{-1}{\sqrt{2}} & \frac{-1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\
0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}}\\
\frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix}
\begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 6 \\ \end{bmatrix}
\begin{bmatrix} \frac{-1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\
\frac{-1}{\sqrt{6}} & \frac{2}{\sqrt{6}} & \frac{-1}{\sqrt{6}} \\
\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{bmatrix} \\\\
&= PDP^{-1}
\end{align*}
\]
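The hand-computed factorization can be verified numerically; a minimal NumPy sketch that simply checks the orthonormal \(P\) and \(D\) above reproduce \(A\):

```python
import numpy as np

A = np.array([[4., 1., 1.],
              [1., 4., 1.],
              [1., 1., 4.]])

P = np.column_stack([
    np.array([-1., 0., 1.]) / np.sqrt(2),   # u_1
    np.array([-1., 2., -1.]) / np.sqrt(6),  # u_2
    np.array([1., 1., 1.]) / np.sqrt(3),    # u_3
])
D = np.diag([3., 3., 6.])

print(np.allclose(P.T @ P, np.eye(3)))  # P^T P = I, so P^T = P^{-1}
print(np.allclose(P @ D @ P.T, A))      # A = P D P^T
```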
Quadratic Forms
A quadratic form on \(\mathbb{R}^n\) is a function
\[
Q(x) = x^TAx \, \in \mathbb{R}
\]
where \(A\) is an \(n\times n\) symmetric matrix.
Let's use our symmetric matrix again:
\[
A = \begin{bmatrix} 4 & 1 & 1\\ 1 & 4 & 1 \\ 1 & 1 & 4 \\ \end{bmatrix}
\]
Then:
\[
\begin{align*}
Q(x) &= x^TAx \\\\
&= 4x_1^2 +4 x_2^2 + 4x_3^2 + 2x_1x_2 + 2x_2x_3 + 2x_3x_1
\end{align*}
\]
We can transform this quadratic form into one with no cross-product terms.
Let \(x = Py\) be a change of variable where \(P\) is an invertible matrix and \(y\) is a
new variable in \(\mathbb{R}^n\). Since \(A\) is symmetric, by Theorem 1, \(A\) is
orthogonally diagonalizable. Thus
\[
\begin{align*}
x^TAx &= (Py)^TA(Py) \\\\
&= y^TP^TAPy \\\\
&= y^T(P^TAP)y \\\\
&= y^T(P^TPDP^TP)y \\\\
&= y^TDy
\end{align*}
\]
where \(y = P^{-1}x = P^Tx\).
Since \(P^T A P = D\), where \(D\) is a diagonal matrix, we get:
\[
\begin{align*}
x^T A x &= y^T D y \\\\
&= \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2
\end{align*}
\]
where \(\lambda_1, \lambda_2, \ldots, \lambda_n\) are the eigenvalues of \(A\).
For our example, \(Q(x)\) becomes:
\[
3y_1^2 + 3y_2^2 + 6 y_3^2
\]
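A minimal NumPy sketch of this change of variable, reusing \(P\) from the example above (the test vector \(x\) is arbitrary):

```python
import numpy as np

A = np.array([[4., 1., 1.],
              [1., 4., 1.],
              [1., 1., 4.]])
P = np.column_stack([
    np.array([-1., 0., 1.]) / np.sqrt(2),
    np.array([-1., 2., -1.]) / np.sqrt(6),
    np.array([1., 1., 1.]) / np.sqrt(3),
])

x = np.array([1., -2., 3.])  # arbitrary test vector
y = P.T @ x                  # change of variable y = P^T x = P^{-1} x

q_original = x @ A @ x                                # x^T A x
q_diagonal = 3 * y[0]**2 + 3 * y[1]**2 + 6 * y[2]**2  # sum of lambda_i y_i^2
print(np.isclose(q_original, q_diagonal))  # True
```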
Thus, the eigenvalues of \(A\) are directly connected to its quadratic form. First,
we define the classification of quadratic forms.
A quadratic form \(Q(x)\) is classified as:
- Positive definite:
\(\qquad \qquad A \succ 0\) if \(\, \forall x \neq 0, \, Q(x) > 0\).
- Positive semidefinite:
\(\qquad \qquad A \succeq 0\) if \(\, \forall x, \, Q(x) \geq 0\).
- Negative definite:
\(\qquad \qquad A \prec 0\) if \(\, \forall x \neq 0, \, Q(x) < 0\).
- Negative semidefinite:
\(\qquad \qquad A \preceq 0\) if \(\, \forall x, \, Q(x) \leq 0\).
- Indefinite:
\(\qquad \qquad\) otherwise.
Theorem 2:
Let \(A\) be an \(n \times n\) symmetric matrix. Then a quadratic form \(x^TAx\) is
- positive definite iff the eigenvalues of \(A\) are all positive.
- negative definite iff the eigenvalues of \(A\) are all negative.
As mentioned above, an orthogonal change of variable \(x = Py\) transforms \(x^TAx\) into
\[
Q(x) = y^T D y = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2.
\]
where \(\lambda_1, \ldots, \lambda_n\) are the eigenvalues of \(A\).
There is a one-to-one correspondence between all \(x \neq 0\) and \(y \neq 0\) because \(P\) is an
invertible matrix. So, the classification of \(Q(x)\) is determined by the signs of the eigenvalues (the spectrum) of \(A\).
The columns of \(P\) are called the principal axes of the quadratic form \(x^TAx\).
Note: To check whether a matrix \(A\) is positive definite, an effective approach is to verify the existence of a
Cholesky factorization \(A = R^TR\), where \(R\) is an upper triangular matrix with positive
diagonal entries. This factorization is a modified version of the LU factorization.
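A minimal sketch of that check with NumPy, whose `np.linalg.cholesky` returns a lower triangular factor \(L\) with \(A = LL^T\) (so \(R = L^T\)) and raises an error when \(A\) is not positive definite; the helper name below is our own:

```python
import numpy as np

def is_positive_definite(A: np.ndarray) -> bool:
    """Hypothetical helper: attempt a Cholesky factorization of A."""
    try:
        np.linalg.cholesky(A)  # succeeds only for positive definite A
        return True
    except np.linalg.LinAlgError:
        return False

A = np.array([[4., 1., 1.],
              [1., 4., 1.],
              [1., 1., 4.]])
B = np.array([[1., 2.],
              [2., 1.]])

print(is_positive_definite(A))  # True: eigenvalues 3, 3, 6 are all positive
print(is_positive_definite(B))  # False: eigenvalues 3 and -1
```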
Singular Value Decomposition (SVD)
Let \(A\) be an \(m \times n\) matrix. The symmetric matrix \(A^TA\) is orthogonally diagonalizable.
Let \(\{v_1, \cdots, v_n\}\) be an orthonormal basis for \(\mathbb{R}^n\) consisting of eigenvectors
of \(A^TA\) with corresponding eigenvalues \(\lambda_1, \cdots, \lambda_n\).
For \(1 \leq i \leq n\), the following holds:
\[
\begin{align*}
\| Av_i \|^2 &= (Av_i)^TAv_i = v_i^T(A^TAv_i) \\\\
&= v_i^T(\lambda_iv_i) = \lambda_i v_i^Tv_i \\\\
&= \lambda_i \geq 0
\end{align*}
\]
since each \(v_i\) is a unit vector, so \(v_i^Tv_i = 1\).
We assume the eigenvalues are arranged in descending order:
\[
\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n
\]
For \(1 \leq i \leq n\), the singular values of \(A\) are defined as
\[
\sigma_i = \sqrt{\lambda_i} = \| Av_i \| \geq 0.
\]
If \(A\) has \(r\) nonzero singular values, then \(\{Av_1, \cdots, Av_r\}\) forms an orthogonal basis
for \(\text{Col }A\) and
\[\text{rank } A = \dim \text{ Col }A = r.
\]
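A minimal NumPy sketch of these definitions, using an arbitrary rank-2 matrix: the singular values are the square roots of the eigenvalues of \(A^TA\), and the rank equals the number of nonzero singular values.

```python
import numpy as np

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])   # rank 2: the third column is col1 + col2

# Eigenvalues of A^T A, sorted in descending order.
# (In practice, library SVD routines avoid forming A^T A explicitly.)
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.sqrt(np.clip(eigvals, 0.0, None)))  # sigma_i = sqrt(lambda_i)

print(np.linalg.svd(A, compute_uv=False))    # the same values (up to rounding)
print(np.linalg.matrix_rank(A))              # 2 = number of nonzero sigma_i
```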
Theorem 3: Singular Value Decomposition
Let \(A\) be an \(m \times n\) matrix with rank \(r\). Then there exist an \(m \times n\)
matrix \(\Sigma = \begin{bmatrix} D & 0 \\ 0 & 0 \\ \end{bmatrix}\), where \(D\) is an \(r \times r\)
diagonal matrix whose diagonal entries are the \(r\) nonzero singular values of \(A\), \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0\),
an \(m \times m\) orthogonal matrix \(U\), and an \(n \times n\) orthogonal matrix \(V\) such that
\[A = U\Sigma V^T\]
Proof:
Normalizing the orthogonal basis \(\{Av_1, \cdots, Av_r\}\) for \(\text{Col }A\), we obtain an orthonormal basis
\(\{u_1, \cdots, u_r\}\), where
\[
u_i = \frac{1}{\| Av_i \|}Av_i = \frac{1}{\sigma_i}Av_i \quad 1 \leq i \leq r
\]
or,
\[
Av_i = \sigma_i u_i \tag{1}
\]
We extend \(\{u_1, \cdots, u_r\}\) to an orthonormal basis for \(\mathbb{R}^m\), \(\{u_1, \cdots, u_r, u_{r+1}, \cdots, u_m\}\).
Let \(U = \begin{bmatrix}u_1 & \cdots & u_m \end{bmatrix}\) and \(V = \begin{bmatrix}v_1 & \cdots & v_n \end{bmatrix}\).
Clearly, by construction, \(U\) and \(V\) are orthogonal matrices. Also, by Equation (1),
\[AV = \begin{bmatrix} \sigma_1u_1 & \cdots & \sigma_ru_r & 0 & \cdots & 0 \end{bmatrix}.\]
Let \(D\) be a diagonal matrix with diagonal entries \(\sigma_1, \cdots, \sigma_r\) and
let \(\Sigma =\begin{bmatrix} D & 0 \\ 0 & 0 \\ \end{bmatrix}\). Then
\[
\begin{align*}
U\Sigma &= \begin{bmatrix}\sigma_1u_1 & \cdots & \sigma_ru_r & 0 & \cdots & 0\end{bmatrix} \\\\
&= AV
\end{align*}
\]
Finally, since \(V\) is an orthogonal matrix,
\[(U\Sigma ) V^T = (AV)V^T = AI = A.\]
Steps for SVD:
- Compute the orthogonal diagonalization of \(A^TA\):
Compute the eigenvalues and eigenvectors of \(A^TA\).
Note: In practice, avoid directly computing \(A^TA\) to prevent numerical errors.
- Construct \(V\) and \(\Sigma\):
Arrange the eigenvalues in descending order; their square roots are the singular values, which fill the diagonal block \(D\) of \(\Sigma\).
Use the corresponding unit eigenvectors as the columns of \(V\).
- Construct \(U\):
For nonzero singular values, compute \(u_k = \frac{1}{\sigma_k}Av_k\).
If there are fewer than \(m\) such vectors, extend them to an orthonormal basis for \(\mathbb{R}^m\) (see the sketch after this list).
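A minimal NumPy sketch of these steps on a small example (the matrix is arbitrary and has \(r = m\), so no basis extension is needed); a library call such as `np.linalg.svd` performs all of this, more carefully, in one step:

```python
import numpy as np

A = np.array([[3., 2., 2.],
              [2., 3., -2.]])   # 2 x 3, rank 2, singular values 5 and 3

# Step 1: orthogonal diagonalization of A^T A
# (fine for a tiny example; production SVD avoids forming A^T A).
eigvals, V = np.linalg.eigh(A.T @ A)

# Step 2: sort in descending order; columns of V are right singular vectors.
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]
sigma = np.sqrt(np.clip(eigvals, 0.0, None))

Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, sigma)

# Step 3: u_k = (1/sigma_k) A v_k for each nonzero singular value.
# Here r = m = 2, so these u_k already fill U; if r < m, the set would
# have to be extended to an orthonormal basis for R^m.
r = 2
U = A @ V[:, :r] / sigma[:r]

print(sigma[:r])                        # [5. 3.]
print(np.allclose(U @ Sigma @ V.T, A))  # A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(r)))  # columns of U are orthonormal
```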
Interactive SVD Visualization
This interactive demo visualizes the Singular Value Decomposition process for 2×2 matrices. SVD expresses any matrix \(A\) as a product \(A = U\Sigma V^T\), which can be understood geometrically as a composition of three transformations:
- \(V^T\): Rotation/reflection using right singular vectors (orthogonal)
- \(\Sigma\): Scaling along the axes (singular values)
- \(U\): Rotation/reflection using left singular vectors (orthogonal)
Use the visualization to see how each component of SVD contributes to the overall transformation:
Try different matrices to observe how SVD works with various transformations (a numerical sketch of the 2×2 case follows this list):
- Identity: No transformation occurs
- Scaling: Only the singular values differ from 1
- Rotation: U and V are identical rotation matrices
- Symmetric: U = V (special case for symmetric matrices)
- Shear: Combination of rotation and non-uniform scaling
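As a complement to the demo, here is a minimal NumPy sketch of the same decomposition for a 2×2 shear (the particular matrix values are our own example, not necessarily the demo's preset):

```python
import numpy as np

A = np.array([[1., 1.],
              [0., 1.]])   # a simple 2 x 2 shear

U, s, Vt = np.linalg.svd(A)

# A acts as: rotate/reflect by V^T, scale axis i by s[i], rotate/reflect by U.
print(s)                                    # the two singular values
print(np.allclose(U @ np.diag(s) @ Vt, A))  # A = U Sigma V^T
```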
SVD with Fundamental Subspaces
Consider an \(m \times n\) matrix \(A = U\Sigma V^T\).
Let \(u_1, \cdots, u_m\) be the left singular vectors, \(v_1, \cdots, v_n\) be
the right singular vectors, \(\sigma_1, \cdots, \sigma_n\) be the singular values, and \(r\) be the rank of \(A\).
\(\{Av_1, \cdots, Av_r\} = \{\sigma_1u_1, \cdots, \sigma_ru_r\} \) is an orthogonal basis for \(\text{Col }A\), so
\(\{u_1, \cdots, u_r\}\) is an orthonormal basis for \(\text{Col }A\). Also, \((\text{Col }A)^{\perp} = \text{Nul }A^T \).
Thus, \(\{u_{r+1}, \cdots, u_m\}\) is an orthonormal basis for \(\text{Nul }A^T\).
Since \(\| Av_i \| = \sigma_i = 0\) iff \(i > r \,\), the vectors \(v_{r+1}, \cdots, v_n\) span a subspace of \(\text{Nul }A\)
of dimension \(n-r\), and we know that \(\dim \text{Nul }A = n - \text{rank } A = n - r\).
Therefore, \(\{v_{r+1}, \cdots, v_n\}\) is an orthonormal basis for \(\text{Nul }A\).
Finally, since \((\text{Nul }A)^{\perp} = \text{Col }A^T = \text{Row }A \), \(\{v_1, \cdots, v_r\}\) is an orthonormal basis for \(\text{Row }A\).
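A minimal NumPy sketch that reads these four bases off an SVD (the matrix is an arbitrary rank-2 example so that every subspace is nontrivial):

```python
import numpy as np

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])          # rank 2

U, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s.max()
r = int(np.sum(s > tol))              # numerical rank (2 here)

col_A  = U[:, :r]      # orthonormal basis for Col A
nul_At = U[:, r:]      # orthonormal basis for Nul A^T
row_A  = Vt[:r, :].T   # orthonormal basis for Row A
nul_A  = Vt[r:, :].T   # orthonormal basis for Nul A

print(np.allclose(A @ nul_A, 0))         # A sends Nul A to zero
print(np.allclose(A.T @ nul_At, 0))      # A^T sends Nul A^T to zero
print(np.allclose(col_A.T @ nul_At, 0))  # Col A is orthogonal to Nul A^T
```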
Now, assume \(A\) is an \(n \times n\) matrix. From the above facts, we can derive additional statements for the Invertible Matrix Theorem (See: invertible matrix):
- \((\text{Col }A)^{\perp} = \{0\}\).
- \((\text{Nul }A)^{\perp} = \mathbb{R}^n\).
- \(\text{Row }A = \mathbb{R}^n\).
- \(A\) has \(n\) nonzero singular values.