Diagonal Matrix
A diagonal matrix is an \(n \times n\) square matrix whose entries outside the main diagonal are all zero. For example:
\[AB = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{bmatrix}
= \begin{bmatrix} 2 & 6 & 12 \\ 8 & 15 & 24 \\ 14 & 24 & 36 \end{bmatrix}
\]
\[BA = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
= \begin{bmatrix} 2 & 4 & 6 \\ 12 & 15 & 18 \\ 28 & 32 & 36 \end{bmatrix}
\]
As you can see, multiplying on the right by the diagonal matrix \(B\) scales each column
of \(A\) by the corresponding diagonal entry of \(B\). Conversely, multiplying on
the left by \(B\) scales each row of \(A\).
A diagonal matrix with 1's on the main diagonal is called the identity matrix, denoted by \(I_n\).
As the example above shows, \(AB \neq BA\) in general, but clearly \(AI = IA\), and the resulting matrix is just the original matrix \(A\).
For instance,
\[AI = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}.
\]
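For readers who like to check these facts numerically, here is a minimal NumPy sketch of the column- and row-scaling behavior, using the matrices from the example above:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
B = np.diag([2, 3, 4])  # diagonal matrix with entries 2, 3, 4

print(A @ B)  # right-multiplication scales the columns of A by 2, 3, 4
print(B @ A)  # left-multiplication scales the rows of A by 2, 3, 4
print(np.array_equal(A @ np.eye(3), A))  # multiplying by the identity leaves A unchanged -> True
```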
Transpose
The transpose of an \(m \times n\) matrix \(A\) is the \(n \times m\) matrix
obtained by interchanging its rows and columns, and is denoted by \(A^T\).
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \qquad
A^T = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix}
\]
Note that the transpose operation does not change the main diagonal entries.
There are many properties related to the transpose operation. Let me introduce some of them:
- \((A^T)^T = A\)
- \((A+B)^T = A^T + B^T\)
- \((AB)^T = B^TA^T\)
For the third property, we can state the following:
Theorem 1:
The transpose of the product of matrices is the product of their transposes in reverse order.
Proof:
Let \(n\) be the number of columns of \(A\) (which equals the number of rows of \(B\)). Then,
\[
((AB)^T)_{ij} = (AB)_{ji} =\sum_{k=1}^n a_{jk}b_{ki}.
\]
Also,
\[
(B^TA^T)_{ij} = \sum_{k=1}^n (B^T)_{ik}(A^T)_{kj} = \sum_{k=1}^n b_{ki}a_{jk} = \sum_{k=1}^n a_{jk}b_{ki}.
\]
Since the \((i, j)\) entries agree for all \(i\) and \(j\), we conclude \((AB)^T = B^TA^T\). This property extends to the product of more than two matrices.
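As a sanity check (not part of the proof), the three properties above can be verified numerically for the example matrices; a minimal NumPy sketch:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
B = np.diag([2, 3, 4])

print(np.array_equal(A.T.T, A))              # (A^T)^T = A
print(np.array_equal((A + B).T, A.T + B.T))  # (A + B)^T = A^T + B^T
print(np.array_equal((A @ B).T, B.T @ A.T))  # (AB)^T = B^T A^T
```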
Inverse of a Matrix
An \(n \times n\) matrix \(A\) is said to be invertible
if there exists an \(n \times n\) matrix \(B\) such that \[AB = BA = I.\]
The matrix \(B\) is called the inverse of \(A\) and is denoted by \(A^{-1}\). A matrix that is not invertible is
called a singular matrix.
By definition, we can say that \((A^{-1})^{-1} = A \, \) and also \((AB)^{-1} = B^{-1}A^{-1}\) because
\[
\begin{align*}
(AB)(B^{-1}A^{-1}) &= A(BB^{-1})A^{-1} \\\\
&= AIA^{-1} \\\\
&= AA^{-1} \\\\
&= I.
\end{align*}
\]
and similarly \((B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}IB = I\).
Using Theorem 1, we obtain the following result:
Theorem 2:
If \(A\) is invertible, then so is \(A^T\) and \((A^T)^{-1} = (A^{-1})^T \).
Proof:
Suppose \(A\) is invertible. By Theorem 1,
\[(A^{-1})^TA^T = (AA^{-1})^T = I^T = I\]
and similarly,
\[A^T(A^{-1})^T = (A^{-1}A)^T = I^T = I.\]
Thus, \(A^T\) is invertible and its inverse is \((A^{-1})^T\).
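These identities are easy to confirm numerically as well. Below is a small NumPy sketch; the two invertible matrices are arbitrary choices for illustration (any invertible matrices of the same size would do):

```python
import numpy as np

A = np.array([[1.0, 9.0],
              [8.0, 2.0]])
B = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# (AB)^{-1} = B^{-1} A^{-1}
print(np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A)))

# (A^T)^{-1} = (A^{-1})^T  (Theorem 2)
print(np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T))
```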
There is a simple formula for the inverse of a \(2 \times 2\) matrix.
Let \(A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\).
If the determinant of \(A\), \(\det (A) = ad - bc\) is nonzero, then \(A\) is
invertible and
\[
A^{-1} = \frac{1}{\det (A)} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
\]
Conversely, if \(\det (A) = 0\), then \(A\) is not invertible.
For example,
\[
\begin{align*}
\begin{bmatrix} 1 & 9 \\ 8 & 2 \end{bmatrix}
\frac{1}{(2 - 72)}
\begin{bmatrix} 2 & -9 \\ -8 & 1 \end{bmatrix}
&=
\begin{bmatrix} 1 & 9 \\ 8 & 2 \end{bmatrix}
\begin{bmatrix} -\frac{1}{35} & \frac{9}{70} \\ \frac{4}{35} & -\frac{1}{70} \end{bmatrix} \\\\
&=
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\end{align*}
\]
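A direct translation of the \(2 \times 2\) formula into code looks like the sketch below; the function name `inverse_2x2` is just for this illustration. It reproduces the example above and is compared against `np.linalg.inv`:

```python
import numpy as np

def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the (ad - bc) formula; raises if singular."""
    a, b = M[0]
    c, d = M[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (det = 0)")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[1.0, 9.0],
              [8.0, 2.0]])
print(inverse_2x2(A))                                 # [[-1/35, 9/70], [4/35, -1/70]]
print(np.allclose(inverse_2x2(A), np.linalg.inv(A)))  # True
```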
Now, when \(A\) is invertible, we can solve the matrix equation \(Ax = b\) for the vector \(x\) by using the inverse:
\[
\begin{align*}
Ax=b &\Rightarrow A^{-1}Ax= A^{-1}b \\\\
&\Rightarrow Ix = A^{-1}b \\\\
&\Rightarrow x = A^{-1}b
\end{align*}
\]
Note: In practice, solving by row reduction is often faster than finding \(A^{-1}\) and can be more accurate.
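In NumPy terms, the note above corresponds to preferring `np.linalg.solve` over explicitly forming \(A^{-1}\); a minimal sketch using the \(2 \times 2\) example matrix and an arbitrary right-hand side chosen for illustration:

```python
import numpy as np

A = np.array([[1.0, 9.0],
              [8.0, 2.0]])
b = np.array([1.0, 2.0])   # arbitrary right-hand side for illustration

x_via_inverse = np.linalg.inv(A) @ b   # x = A^{-1} b
x_via_solve = np.linalg.solve(A, b)    # solves Ax = b without forming A^{-1}

print(np.allclose(x_via_inverse, x_via_solve))  # True; solve is generally faster and more accurate
```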
Finally, let's consider when a linear transformation \(T:\mathbb{R}^n \to \mathbb{R}^n\) is invertible.
\(T\) is invertible if
\[\exists S: \mathbb{R}^n \to \mathbb{R}^n \text{ such that } \forall x \in \mathbb{R}^n, \, S(T(x))=x \text{ and } T(S(x))=x.\]
Elementary Matrices
An elementary matrix represents a single elementary row operation applied to an identity matrix.
Let
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\]
and
\[
E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1\\ 0 & 1 & 0 \end{bmatrix}, \quad
E_2 = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix}
\]
Then:
\[
\begin{align*}
&E_1A = \begin{bmatrix} 1 & 2 & 3 \\ 7 & 8 & 9 \\ 4 & 5 & 6 \end{bmatrix} \\\\
&E_2A = \begin{bmatrix} 2 & 4 & 6 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \\\\
&E_3A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ -5 & -7 & -9 \end{bmatrix}
\end{align*}
\]
\(E_1\) interchanges Row 2 and Row 3, \(E_2\) scales Row 1 by 2,
and \(E_3\) adds \(-3\) times Row 2 to Row 3. Moreover, we can store a "sequence" of
row operations in a single matrix. For example:
\[
\begin{align*}
E_3E_2E_1A &= \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & -3 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \\\\
&= \begin{bmatrix} 2 & 4 & 6 \\ 7 & 8 & 9 \\-17 & -19 & -21 \end{bmatrix}\\\\
&= B
\end{align*}
\]
Since elementary row operations are "reversible", every elementary matrix has a corresponding inverse. Thus,
using the inverses of the elementary matrices, we can recover the original matrix \(A\) from \(B\).
\[(E_1^{-1}E_2^{-1}E_3^{-1})E_3E_2E_1A = (E_1^{-1}E_2^{-1}E_3^{-1})B\]
Thus:
\[
\begin{align*}
A &= \begin{bmatrix} 0.5 & 0 & 0 \\ 0 & 3 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 2 & 4 & 6 \\ 7 & 8 & 9 \\ -17 & -19 & -21 \end{bmatrix}\\\\
&= \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\end{align*}
\]
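The whole computation with the elementary matrices can be replayed numerically; a short NumPy check of \(E_3E_2E_1A = B\) and of recovering \(A\) from \(B\):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
E1 = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]])    # swap Row 2 and Row 3
E2 = np.array([[2, 0, 0], [0, 1, 0], [0, 0, 1]])    # scale Row 1 by 2
E3 = np.array([[1, 0, 0], [0, 1, 0], [0, -3, 1]])   # add -3 * Row 2 to Row 3

B = E3 @ E2 @ E1 @ A
print(B)  # [[2, 4, 6], [7, 8, 9], [-17, -19, -21]]

# Undo the row operations in reverse order to recover A
A_recovered = np.linalg.inv(E1) @ np.linalg.inv(E2) @ np.linalg.inv(E3) @ B
print(np.allclose(A_recovered, A))  # True
```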
Even though \(A\) in the above example is not invertible, the process of elementary row operations gives insight
into how to compute the inverse \(A^{-1}\) of a matrix when it exists.
If an \(n \times n\) matrix \(A\) is invertible, the reduced row echelon form of the matrix must be \(I_n\).
Each step of the row reduction can be represented by an elementary matrix.
\[
\begin{align*}
&(E_k \cdots E_1)A = I_n \tag{1} \\\\
&\Longrightarrow A = (E_k \cdots E_1)^{-1}I_n = (E_k \cdots E_1)^{-1} \\\\
&\Longrightarrow A^{-1} = ((E_k \cdots E_1)^{-1})^{-1} = E_k \cdots E_1 = (E_k \cdots E_1 )I_n \tag{2}
\end{align*}
\]
Equations (1) and (2) show that the same sequence of elementary matrices transforms \(A\) into \(I_n\)
and \(I_n\) into \(A^{-1}\).
Therefore, row reduction on the augmented matrix \(\begin{bmatrix} A & I \end{bmatrix}\)
gives us \(\begin{bmatrix} I & A^{-1} \end{bmatrix}\) if \(A\) is invertible.
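The \(\begin{bmatrix} A & I \end{bmatrix} \to \begin{bmatrix} I & A^{-1} \end{bmatrix}\) procedure can be carried out with exact arithmetic using SymPy's `rref`; a minimal sketch, using the invertible \(2 \times 2\) matrix from the earlier example:

```python
import sympy as sp

A = sp.Matrix([[1, 9],
               [8, 2]])
n = A.shape[0]

# Row reduce the augmented matrix [A | I]
augmented = A.row_join(sp.eye(n))
rref_matrix, _ = augmented.rref()

A_inv = rref_matrix[:, n:]   # the right half is A^{-1}
print(A_inv)                 # Matrix([[-1/35, 9/70], [4/35, -1/70]])
print(A_inv == A.inv())      # True
```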
Partitions
A matrix can be partitioned into smaller blocks (submatrices) by cutting it along rows and columns. For example:
\[
\begin{align*}
&A =
\left[\begin{array}{ccc|cc}
1 & 2 & -1 & 3 & 4 \\
5 & 3 & 0 & -2 & 1 \\ \hline
-1 & 3 & 1 & 7 & 1 \\
\end{array}\right]
=
\left[\begin{array}{c|c}
A_{11}& A_{12} \\ \hline
A_{21} & A_{22} \\
\end{array}\right] \\\\
&B =
\left[\begin{array}{rr}
3 & 2 \\
3 & -5 \\
-1 & 6 \\ \hline
4 & 7 \\
1 & 1 \\
\end{array}\right]
=
\left[\begin{array}{c}
B_1 \\ \hline
B_2 \\
\end{array}\right]
\end{align*}
\]
The partitions of \(A\) and \(B\) are conformable for block multiplication. Partitioning is particularly
useful when matrices are very large, as it lets a computer work on smaller submatrices a few at a time,
improving efficiency and computational feasibility.
\[
AB
=
\left[\begin{array}{c}
A_{11}B_1 + A_{12}B_2 \\ \hline
A_{21}B_1 + A_{22}B_2 \\
\end{array}\right]
=
\left[\begin{array}{rr}
26 & 11 \\
17 & -18 \\ \hline
34 & 39 \\
\end{array}\right]
\]
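The block product can also be checked numerically by slicing the matrices into the same blocks; a NumPy sketch:

```python
import numpy as np

A = np.array([[ 1, 2, -1,  3, 4],
              [ 5, 3,  0, -2, 1],
              [-1, 3,  1,  7, 1]])
B = np.array([[ 3,  2],
              [ 3, -5],
              [-1,  6],
              [ 4,  7],
              [ 1,  1]])

A11, A12 = A[:2, :3], A[:2, 3:]
A21, A22 = A[2:, :3], A[2:, 3:]
B1, B2 = B[:3, :], B[3:, :]

top = A11 @ B1 + A12 @ B2
bottom = A21 @ B1 + A22 @ B2
block_product = np.vstack([top, bottom])

print(block_product)                         # [[26, 11], [17, -18], [34, 39]]
print(np.array_equal(block_product, A @ B))  # True
```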
LU Factorization
A matrix factorization expresses a matrix as a product of two or more matrices.
Since matrix multiplication corresponds to the composition of linear transformations, a factorization is
fundamental for analyzing the properties of the original matrix (or the observed data).
In applied mathematics, matrix factorization is often used as a crucial preprocessing step
to enable more efficient computations.
An \(m \times n\) matrix \(A\) that can be reduced to echelon form without row interchanges can be factorized into the form
\[A = LU\]
where \(L \in \mathbb{R}^{m \times m}\) is a lower triangular matrix with 1's on its main diagonal and
\(U \in \mathbb{R}^{m \times n}\) is an echelon form of \(A\).
For example, let
\[A = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & -2\\ -3 & 1 & 1 \end{bmatrix}.\]
An echelon form of \(A\) is
\[U = \begin{bmatrix} 1 & 2 & -1 \\ 0 & -3 & 0 \\ 0 & 0 & -2 \end{bmatrix}\]
We can track the row operations used to reduce \(A\) into \(U\) as elementary matrices:
\[E_1 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}
\qquad
E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0\\ 3 & 0 & 1 \end{bmatrix}
\qquad
E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0\\ 0 & \frac{7}{3} & 1 \end{bmatrix}
\]
Since \(E_3E_2E_1A = U \Longrightarrow A = (E_1^{-1}E_2^{-1}E_3^{-1})U\), it follows that:
\[L = E_1^{-1}E_2^{-1}E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -3 & \frac{-7}{3} & 1 \end{bmatrix} \]
Thus,
\[
\begin{align*}
A = LU
&= \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -3 & \frac{-7}{3} & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & -1 \\ 0 & -3 & 0 \\ 0 & 0 & -2 \end{bmatrix} \\\\
&= \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & -2\\ -3 & 1 & 1 \end{bmatrix}.
\end{align*}
\]
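It is worth confirming the factorization numerically. Note that `scipy.linalg.lu` uses partial pivoting and therefore returns a permuted factorization \(A = PLU\) that generally differs from the hand-computed one above, so the sketch below simply multiplies the hand-computed factors:

```python
import numpy as np

A = np.array([[ 1, 2, -1],
              [ 2, 1, -2],
              [-3, 1,  1]])
L = np.array([[ 1,      0, 0],
              [ 2,      1, 0],
              [-3, -7 / 3, 1]])
U = np.array([[1,  2, -1],
              [0, -3,  0],
              [0,  0, -2]])

print(np.allclose(L @ U, A))  # True: the hand-computed factors reproduce A
```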
With the \(LU\) factorization, solving \(Ax = b\) becomes more efficient:
\[Ax = b \Longrightarrow LUx = b\]
Let \(Ux = y\) and solve the pair of equations sequentially:
- Solve \(Ly = b\) for \(y\)
- Solve \(Ux = y\) for \(x\)
If \(A\) is an \(n \times n\) dense matrix with large \(n\), using the \(LU\) factorization is
faster than using \(A^{-1}\) to solve \(Ax = b\). This is especially useful when solving the
equation multiple times with different \(b\) vectors. Furthermore, \(LU\) factorization reduces the risk of
numerical errors that can arise from computing both \(A^{-1}\) and \(A^{-1}b\).
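In SciPy this two-step solve is packaged as `lu_factor` / `lu_solve` (which use partial pivoting); a minimal sketch that reuses one factorization for several right-hand sides, chosen arbitrarily for illustration:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[ 1.0, 2.0, -1.0],
              [ 2.0, 1.0, -2.0],
              [-3.0, 1.0,  1.0]])

lu, piv = lu_factor(A)  # factor A once (with partial pivoting)

# Reuse the same factorization for several right-hand sides
for b in (np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, -1.0])):
    x = lu_solve((lu, piv), b)       # two triangular solves: one forward, one backward
    print(x, np.allclose(A @ x, b))  # each solution satisfies Ax = b
```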