Diagonal Matrix
A diagonal matrix is an \(n \times n\) square matrix whose entries outside the main diagonal are all zero. For example:
\[AB = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{bmatrix}
= \begin{bmatrix} 2 & 6 & 12 \\ 8 & 15 & 24 \\ 14 & 24 & 36 \end{bmatrix}
\]
\[BA = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
= \begin{bmatrix} 2 & 4 & 6 \\ 12 & 15 & 18 \\ 28 & 32 & 36 \end{bmatrix}
\]
As you can see, multiplying on the right by the diagonal matrix \(B\) scales each column
of \(A\) by the corresponding diagonal entry of \(B\). Conversely, multiplying on
the left by \(B\) scales each row of \(A\).
A diagonal matrix with 1's on the main diagonal is called the identity matrix, denoted by \(I_n\).
As the example above shows, \(AB \neq BA\) in general, but clearly \(AI = IA\), and the resulting matrix is just the original matrix \(A\).
For instance,
\[AI = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}.
\]
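For readers who like to check these facts numerically, here is a minimal NumPy sketch of the column- and row-scaling behavior, using the matrices from the example above:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
B = np.diag([2, 3, 4])  # diagonal matrix with entries 2, 3, 4

print(A @ B)  # right-multiplication scales the columns of A by 2, 3, 4
print(B @ A)  # left-multiplication scales the rows of A by 2, 3, 4
print(np.array_equal(A @ np.eye(3), A))  # multiplying by the identity leaves A unchanged -> True
```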
Transpose
The transpose of an \(m \times n\) matrix \(A\) is the \(n \times m\) matrix
obtained by interchanging its rows and columns, and is denoted by \(A^T\).
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \qquad
A^T = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix}
\]
Note that the transpose operation does not change the main diagonal entries.
There are many properties related to the transpose operation. Let me introduce some of them:
- \((A^T)^T = A\)
- \((A+B)^T = A^T + B^T\)
- \((AB)^T = B^TA^T\)
For the third property, we can state the following:
Theorem 1:
The transpose of the product of matrices is the product of their transposes in reverse order.
Proof:
Let \(n\) be the number of columns of \(A\) (which equals the number of rows of \(B\)). Then,
\[
((AB)^T)_{ij} = (AB)_{ji} =\sum_{k=1}^n a_{jk}b_{ki}.
\]
Also,
\[
(B^TA^T)_{ij} = \sum_{k=1}^n (B^T)_{ik}(A^T)_{kj} = \sum_{k=1}^n b_{ki}a_{jk} = \sum_{k=1}^n a_{jk}b_{ki}.
\]
Since the \((i, j)\) entries agree for all \(i\) and \(j\), we conclude \((AB)^T = B^TA^T\). This property extends to the product of more than two matrices.
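As a sanity check (not part of the proof), the three properties above can be verified numerically for the example matrices; a minimal NumPy sketch:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
B = np.diag([2, 3, 4])

print(np.array_equal(A.T.T, A))              # (A^T)^T = A
print(np.array_equal((A + B).T, A.T + B.T))  # (A + B)^T = A^T + B^T
print(np.array_equal((A @ B).T, B.T @ A.T))  # (AB)^T = B^T A^T
```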
Inverse of a Matrix
An \(n \times n\) matrix \(A\) is said to be invertible
if there exists an \(n \times n\) matrix \(B\) such that \[AB = BA = I.\]
The matrix \(B\) is called the inverse of \(A\) and is denoted by \(A^{-1}\). A matrix that is not invertible is
called a singular matrix.
By definition, we can say that \((A^{-1})^{-1} = A \, \) and also \((AB)^{-1} = B^{-1}A^{-1}\) because
\[
\begin{align*}
(AB)(B^{-1}A^{-1}) &= A(BB^{-1})A^{-1} \\\\
&= AIA^{-1} \\\\
&= AA^{-1} \\\\
&= I.
\end{align*}
\]
and similarly \((B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}IB = I\).
Using Theorem 1, we obtain the following result:
Theorem 2:
If \(A\) is invertible, then so is \(A^T\) and \((A^T)^{-1} = (A^{-1})^T \).
Proof:
Suppose \(A\) is invertible. By Theorem 1,
\[(A^{-1})^TA^T = (AA^{-1})^T = I^T = I\]
and similarly,
\[A^T(A^{-1})^T = (A^{-1}A)^T = I^T = I.\]
Thus, \(A^T\) is invertible and its inverse is \((A^{-1})^T\).
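These identities are easy to confirm numerically as well. Below is a small NumPy sketch; the two invertible matrices are arbitrary choices for illustration (any invertible matrices of the same size would do):

```python
import numpy as np

A = np.array([[1.0, 9.0],
              [8.0, 2.0]])
B = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# (AB)^{-1} = B^{-1} A^{-1}
print(np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A)))

# (A^T)^{-1} = (A^{-1})^T  (Theorem 2)
print(np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T))
```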
There is a simple formula for the inverse of a \(2 \times 2\) matrix.
Let \(A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\).
If the determinant of \(A\), \(\det (A) = ad - bc\) is nonzero, then \(A\) is
invertible and
\[
A^{-1} = \frac{1}{\det (A)} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
\]
Conversely, if \(\det (A) = 0\), then \(A\) is not invertible.
For example,
\[
\begin{align*}
\begin{bmatrix} 1 & 9 \\ 8 & 2 \end{bmatrix}
\frac{1}{(2 - 72)}
\begin{bmatrix} 2 & -9 \\ -8 & 1 \end{bmatrix}
&=
\begin{bmatrix} 1 & 9 \\ 8 & 2 \end{bmatrix}
\begin{bmatrix} -\frac{1}{35} & \frac{9}{70} \\ \frac{4}{35} & -\frac{1}{70} \end{bmatrix} \\\\
&=
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\end{align*}
\]
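A direct translation of the \(2 \times 2\) formula into code looks like the sketch below; the function name `inverse_2x2` is just for this illustration. It reproduces the example above and is compared against `np.linalg.inv`:

```python
import numpy as np

def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the (ad - bc) formula; raises if singular."""
    a, b = M[0]
    c, d = M[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (det = 0)")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[1.0, 9.0],
              [8.0, 2.0]])
print(inverse_2x2(A))                                 # [[-1/35, 9/70], [4/35, -1/70]]
print(np.allclose(inverse_2x2(A), np.linalg.inv(A)))  # True
```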
Now, when \(A\) is invertible, we can solve the matrix equation \(Ax = b\) for the vector \(x\) by using the inverse:
\[
\begin{align*}
Ax=b &\Rightarrow A^{-1}Ax= A^{-1}b \\\\
&\Rightarrow Ix = A^{-1}b \\\\
&\Rightarrow x = A^{-1}b
\end{align*}
\]
Note: In practice, solving by row reduction is often faster than finding \(A^{-1}\) and can be more accurate.
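In NumPy terms, the note above corresponds to preferring `np.linalg.solve` over explicitly forming \(A^{-1}\); a minimal sketch using the \(2 \times 2\) example matrix and an arbitrary right-hand side chosen for illustration:

```python
import numpy as np

A = np.array([[1.0, 9.0],
              [8.0, 2.0]])
b = np.array([1.0, 2.0])   # arbitrary right-hand side for illustration

x_via_inverse = np.linalg.inv(A) @ b   # x = A^{-1} b
x_via_solve = np.linalg.solve(A, b)    # solves Ax = b without forming A^{-1}

print(np.allclose(x_via_inverse, x_via_solve))  # True; solve is generally faster and more accurate
```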
Finally, let's consider when a linear transformation \(T:\mathbb{R}^n \to \mathbb{R}^n\) is invertible.
\(T\) is invertible if
\[\exists S: \mathbb{R}^n \to \mathbb{R}^n \text{ such that } \forall x \in \mathbb{R}^n, \, S(T(x))=x \text{ and } T(S(x))=x.\]
Elementary Matrices
An elementary matrix represents a single elementary row operation applied to an identity matrix.
Let
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\]
and
\[
E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1\\ 0 & 1 & 0 \end{bmatrix}, \quad
E_2 = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix}
\]
Then:
\[
\begin{align*}
&E_1A = \begin{bmatrix} 1 & 2 & 3 \\ 7 & 8 & 9 \\ 4 & 5 & 6 \end{bmatrix} \\\\
&E_2A = \begin{bmatrix} 2 & 4 & 6 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \\\\
&E_3A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ -5 & -7 & -9 \end{bmatrix}
\end{align*}
\]
\(E_1\) interchanges Row 2 and Row 3, \(E_2\) scales Row 1 by 2,
and \(E_3\) adds \(-3\) times Row 2 to Row 3. Moreover, we can store a "sequence" of
row operations in a single matrix. For example:
\[
\begin{align*}
E_3E_2E_1A &= \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & -3 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \\\\
&= \begin{bmatrix} 2 & 4 & 6 \\ 7 & 8 & 9 \\-17 & -19 & -21 \end{bmatrix}\\\\
&= B
\end{align*}
\]
Since elementary row operations are "reversible", every elementary matrix has a corresponding inverse. Thus,
using the inverses of the elementary matrices, we can recover the original matrix \(A\) from \(B\).
\[(E_1^{-1}E_2^{-1}E_3^{-1})E_3E_2E_1A = (E_1^{-1}E_2^{-1}E_3^{-1})B\]
Thus:
\[
\begin{align*}
A &= \begin{bmatrix} 0.5 & 0 & 0 \\ 0 & 3 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 2 & 4 & 6 \\ 7 & 8 & 9 \\ -17 & -19 & -21 \end{bmatrix}\\\\
&= \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\end{align*}
\]
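The whole computation with the elementary matrices can be replayed numerically; a short NumPy check of \(E_3E_2E_1A = B\) and of recovering \(A\) from \(B\):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
E1 = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]])    # swap Row 2 and Row 3
E2 = np.array([[2, 0, 0], [0, 1, 0], [0, 0, 1]])    # scale Row 1 by 2
E3 = np.array([[1, 0, 0], [0, 1, 0], [0, -3, 1]])   # add -3 * Row 2 to Row 3

B = E3 @ E2 @ E1 @ A
print(B)  # [[2, 4, 6], [7, 8, 9], [-17, -19, -21]]

# Undo the row operations in reverse order to recover A
A_recovered = np.linalg.inv(E1) @ np.linalg.inv(E2) @ np.linalg.inv(E3) @ B
print(np.allclose(A_recovered, A))  # True
```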
Even though \(A\) in the above example is not invertible, the process of elementary row operations gives insight
into how to compute the inverse \(A^{-1}\) of a matrix when it exists.
If an \(n \times n\) matrix \(A\) is invertible, the reduced row echelon form of the matrix must be \(I_n\).
Each step of the row reduction can be represented by an elementary matrix.
\[
\begin{align*}
&(E_k \cdots E_1)A = I_n \tag{1} \\\\
&\Longrightarrow A = (E_k \cdots E_1)^{-1}I_n = (E_k \cdots E_1)^{-1} \\\\
&\Longrightarrow A^{-1} = ((E_k \cdots E_1)^{-1})^{-1} = E_k \cdots E_1 = (E_k \cdots E_1 )I_n \tag{2}
\end{align*}
\]
Equations (1) and (2) show that the same sequence of elementary matrices transforms \(A\) into \(I_n\)
and \(I_n\) into \(A^{-1}\).
Therefore, row reduction on the augmented matrix \(\begin{bmatrix} A & I \end{bmatrix}\)
gives us \(\begin{bmatrix} I & A^{-1} \end{bmatrix}\) if \(A\) is invertible.
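The \(\begin{bmatrix} A & I \end{bmatrix} \to \begin{bmatrix} I & A^{-1} \end{bmatrix}\) procedure can be carried out with exact arithmetic using SymPy's `rref`; a minimal sketch, using the invertible \(2 \times 2\) matrix from the earlier example:

```python
import sympy as sp

A = sp.Matrix([[1, 9],
               [8, 2]])
n = A.shape[0]

# Row reduce the augmented matrix [A | I]
augmented = A.row_join(sp.eye(n))
rref_matrix, _ = augmented.rref()

A_inv = rref_matrix[:, n:]   # the right half is A^{-1}
print(A_inv)                 # Matrix([[-1/35, 9/70], [4/35, -1/70]])
print(A_inv == A.inv())      # True
```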
Partitions
A matrix can be partitioned into smaller blocks (submatrices) by cutting it along rows and columns. For example:
\[
\begin{align*}
&A =
\left[\begin{array}{ccc|cc}
1 & 2 & -1 & 3 & 4 \\
5 & 3 & 0 & -2 & 1 \\ \hline
-1 & 3 & 1 & 7 & 1 \\
\end{array}\right]
=
\left[\begin{array}{c|c}
A_{11}& A_{12} \\ \hline
A_{21} & A_{22} \\
\end{array}\right] \\\\
&B =
\left[\begin{array}{rr}
3 & 2 \\
3 & -5 \\
-1 & 6 \\ \hline
4 & 7 \\
1 & 1 \\
\end{array}\right]
=
\left[\begin{array}{c}
B_1 \\ \hline
B_2 \\
\end{array}\right]
\end{align*}
\]
The partitions of \(A\) and \(B\) are conformable for block multiplication. Partitioning is particularly
useful when matrices are very large, as it lets a computer work on smaller submatrices a few at a time,
improving efficiency and computational feasibility.
\[
AB
=
\left[\begin{array}{c}
A_{11}B_1 + A_{12}B_2 \\ \hline
A_{21}B_1 + A_{22}B_2 \\
\end{array}\right]
=
\left[\begin{array}{rr}
26 & 11 \\
17 & -18 \\ \hline
34 & 39 \\
\end{array}\right]
\]
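The block product can also be checked numerically by slicing the matrices into the same blocks; a NumPy sketch:

```python
import numpy as np

A = np.array([[ 1, 2, -1,  3, 4],
              [ 5, 3,  0, -2, 1],
              [-1, 3,  1,  7, 1]])
B = np.array([[ 3,  2],
              [ 3, -5],
              [-1,  6],
              [ 4,  7],
              [ 1,  1]])

A11, A12 = A[:2, :3], A[:2, 3:]
A21, A22 = A[2:, :3], A[2:, 3:]
B1, B2 = B[:3, :], B[3:, :]

top = A11 @ B1 + A12 @ B2
bottom = A21 @ B1 + A22 @ B2
block_product = np.vstack([top, bottom])

print(block_product)                         # [[26, 11], [17, -18], [34, 39]]
print(np.array_equal(block_product, A @ B))  # True
```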
LU Factorization
A matrix factorization expresses a matrix as a product of two or more matrices.
Since matrix multiplication corresponds to the composition of linear transformations, a factorization is
fundamental for analyzing the properties of the original matrix (or the observed data).
In applied mathematics, matrix factorization is often used as a crucial preprocessing step
to enable more efficient computations.
An \(m \times n\) matrix \(A\) that can be reduced to echelon form without row interchanges can be factorized into the form
\[A = LU\]
where \(L \in \mathbb{R}^{m \times m}\) is a lower triangular matrix with 1's on its main diagonal and
\(U \in \mathbb{R}^{m \times n}\) is an echelon form of \(A\).
For example, let
\[A = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & -2\\ -3 & 1 & 1 \end{bmatrix}.\]
An echelon form of \(A\) is
\[U = \begin{bmatrix} 1 & 2 & -1 \\ 0 & -3 & 0 \\ 0 & 0 & -2 \end{bmatrix}\]
We can track the row operations used to reduce \(A\) into \(U\) as elementary matrices:
\[E_1 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}
\qquad
E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0\\ 3 & 0 & 1 \end{bmatrix}
\qquad
E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0\\ 0 & \frac{7}{3} & 1 \end{bmatrix}
\]
Since \(E_3E_2E_1A = U \Longrightarrow A = (E_1^{-1}E_2^{-1}E_3^{-1})U\), it follows that:
\[L = E_1^{-1}E_2^{-1}E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -3 & \frac{-7}{3} & 1 \end{bmatrix} \]
Thus,
\[
\begin{align*}
A = LU
&= \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -3 & \frac{-7}{3} & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & -1 \\ 0 & -3 & 0 \\ 0 & 0 & -2 \end{bmatrix} \\\\
&= \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & -2\\ -3 & 1 & 1 \end{bmatrix}.
\end{align*}
\]
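It is worth confirming the factorization numerically. Note that `scipy.linalg.lu` uses partial pivoting and therefore returns a permuted factorization \(A = PLU\) that generally differs from the hand-computed one above, so the sketch below simply multiplies the hand-computed factors:

```python
import numpy as np

A = np.array([[ 1, 2, -1],
              [ 2, 1, -2],
              [-3, 1,  1]])
L = np.array([[ 1,      0, 0],
              [ 2,      1, 0],
              [-3, -7 / 3, 1]])
U = np.array([[1,  2, -1],
              [0, -3,  0],
              [0,  0, -2]])

print(np.allclose(L @ U, A))  # True: the hand-computed factors reproduce A
```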
With the \(LU\) factorization, solving \(Ax = b\) becomes more efficient:
\[Ax = b \Longrightarrow LUx = b\]
Let \(Ux = y\) and solve the pair of equations sequentially:
- Solve \(Ly = b\) for \(y\)
- Solve \(Ux = y\) for \(x\)
If \(A\) is an \(n \times n\) dense matrix with large \(n\), using the \(LU\) factorization is
faster than using \(A^{-1}\) to solve \(Ax = b\). This is especially useful when solving the
equation multiple times with different \(b\) vectors. Furthermore, \(LU\) factorization reduces the risk of
numerical errors that can arise from computing both \(A^{-1}\) and \(A^{-1}b\).
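In SciPy this two-step solve is packaged as `lu_factor` / `lu_solve` (which use partial pivoting); a minimal sketch that reuses one factorization for several right-hand sides, chosen arbitrarily for illustration:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[ 1.0, 2.0, -1.0],
              [ 2.0, 1.0, -2.0],
              [-3.0, 1.0,  1.0]])

lu, piv = lu_factor(A)  # factor A once (with partial pivoting)

# Reuse the same factorization for several right-hand sides
for b in (np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, -1.0])):
    x = lu_solve((lu, piv), b)       # two triangular solves: one forward, one backward
    print(x, np.allclose(A @ x, b))  # each solution satisfies Ax = b
```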