System of Linear Equations
A linear equation can be expressed in the form
\[a_1x_1 + a_2x_2 + \cdots + a_nx_n = b\]
where \(x_1 \cdots x_n \) are variables and \(b\) and the coefficients \(a_1 \cdots a_n\) are real or complex numbers.
In practice, \(n\) may be in the thousands or more.
We refer to a collection of linear equations involving the same variables as a system of linear equations.
A system is called consistent if it has either exactly one solution or infinitely many solutions.
If a system of linear equations has no solution, it is called inconsistent.
We can record the essential information of a system, such as the value of each coefficient, the number of equations, and
the number of variables, in a rectangular array called a matrix.
For example, consider the linear system:
\[
\begin{align}
x_1\phantom{+a_20x_2}-3x_3 + \phantom{0}x_4 &= 10 \\
4x_1 -5x_2 + 6x_3 -2x_4 &= -8 \\
\phantom{a_3x_1+} 2x_2 + 2x_3 +5x_4 &=6
\end{align}
\]
This system can be represented as the coefficient matrix:
\[
\begin{bmatrix} 1 & 0 & -3 & 1\\ 4 & -5 & 6 & -2\\ 0 & 2 & 2 & 5 \end{bmatrix}
\]
This is a \(3 \times 4\) (read "3 by 4") matrix. Note: The number of rows always comes first.
Alternatively, we can represent it as the augmented matrix:
\[
\begin{bmatrix} 1 & 0 & -3 & 1 & 10\\ 4 & -5 & 6 & -2 & -8\\ 0 & 2 & 2 & 5 & 6 \end{bmatrix}
\]
This is a \(3 \times 5\) matrix which includes the constants from the right-hand side of the equations.
Here, the number of rows is 3, and the number of columns is 5.
How can we solve the linear system? The key idea is to replace the system with an equivalent system
that is easier to solve. What does "equivalent" mean here? Two matrices are called row equivalent
if there is a sequence of elementary row operations that transforms one matrix into the other,
and two linear systems whose augmented matrices are row equivalent have the same solution set.
The three elementary row operations are as follows.
- Replacing one row by the sum of itself and a multiple of another row.
- Swapping two rows.
- Multiplying all entries in a row by a nonzero scalar.
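The three operations above are easy to express in code. The following Python sketch (an illustrative helper, not a library API) uses `fractions.Fraction` for exact arithmetic and applies the first operation to the augmented matrix of our example:

```python
from fractions import Fraction

# Augmented matrix of the example system, stored with exact rational entries.
M = [[Fraction(v) for v in row] for row in
     [[1, 0, -3, 1, 10],
      [4, -5, 6, -2, -8],
      [0, 2, 2, 5, 6]]]

def replace_row(M, i, j, c):
    """Row_i <- Row_i + c * Row_j (replacement)."""
    M[i] = [a + c * b for a, b in zip(M[i], M[j])]

def swap_rows(M, i, j):
    """Row_i <-> Row_j (interchange)."""
    M[i], M[j] = M[j], M[i]

def scale_row(M, i, c):
    """Row_i <- c * Row_i, with c nonzero (scaling)."""
    assert c != 0
    M[i] = [c * a for a in M[i]]

# First step of the reduction below: Row2 <- Row2 - 4*Row1.
replace_row(M, 1, 0, -4)
assert M[1] == [0, -5, 18, -6, -48]
```

Exact fractions avoid the rounding error that floating-point elimination would introduce.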
First, we want to transform our augmented matrix into row echelon form, which satisfies the
following conditions:
- All nonzero rows are above any rows of all zeros.
- The first nonzero entry in each row appears further to the right than the first nonzero
entry in the row directly above it.
- All entries in a column below a leading entry are zero.
We apply elementary row operations to the matrix:
\[
\begin{align*}
&\begin{bmatrix}
1 & 0 & -3 & 1 & 10 \\
4 & -5 & 6 & -2 & -8 \\
0 & 2 & 2 & 5 & 6
\end{bmatrix} \\\\
&\begin{bmatrix}
1 & 0 & -3 & 1 & 10 \\
0 & -5 & 18 & -6 & -48 \\
0 & 2 & 2 & 5 & 6
\end{bmatrix} \quad (\text{Row2 $\leftarrow$ Row2 - 4*Row1}) \\\\
&\begin{bmatrix}
1 & 0 & -3 & 1 & 10 \\
0 & -5 & 18 & -6 & -48 \\
0 & 0 & \frac{46}{5} & \frac{13}{5} & -\frac{66}{5}
\end{bmatrix} \quad (\text{Row3 $\leftarrow$ Row3 + (2/5)*Row2})
\end{align*}
\]
The matrix is now in row echelon form. Next, we transform it into reduced row echelon form (RREF).
A matrix in row echelon form is in RREF if it additionally satisfies the following conditions:
- The leading entry in each nonzero row is 1.
- Each leading 1 is the only nonzero entry in its column.
\[
\begin{align*}
&\begin{bmatrix}
1 & 0 & -3 & 1 & 10\\
0 & 1 & -\frac{18}{5} & \frac{6}{5} & \frac{48}{5}\\
0 & 0 & \frac{46}{5} & \frac{13}{5} & -\frac{66}{5}\\
\end{bmatrix} \quad (\text{Row2 $\leftarrow$ Row2 / (-5)}) \\\\
&\begin{bmatrix}
1 & 0 & -3 & 1 & 10\\
0 & 1 & -\frac{18}{5} & \frac{6}{5} & \frac{48}{5}\\
0 & 0 & 1 & \frac{13}{46} & -\frac{33}{23}\\
\end{bmatrix} \quad (\text{Row3 $\leftarrow$ Row3 * (5/46)}) \\\\
&\begin{bmatrix}
1 & 0 & 0 & \frac{85}{46} & \frac{131}{23}\\
0 & 1 & -\frac{18}{5} & \frac{6}{5} & \frac{48}{5}\\
0 & 0 & 1 & \frac{13}{46} & -\frac{33}{23}\\
\end{bmatrix} \quad (\text{Row1 $\leftarrow$ Row1 + 3*Row3}) \\\\
&\begin{bmatrix}
1 & 0 & 0 & \frac{85}{46} & \frac{131}{23}\\
0 & 1 & 0 & \frac{51}{23} & \frac{102}{23}\\
0 & 0 & 1 & \frac{13}{46} & -\frac{33}{23}\\
\end{bmatrix} \quad (\text{Row2 $\leftarrow$ Row2 + (18/5)*Row3})
\end{align*}
\]
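The whole Gauss-Jordan reduction can also be carried out programmatically. The Python sketch below (an illustrative implementation, not a library routine) uses exact rational arithmetic via `fractions.Fraction` and reproduces the RREF derived above:

```python
from fractions import Fraction

def rref(rows):
    """Reduce a matrix (given as a list of rows) to reduced row echelon form."""
    M = [[Fraction(v) for v in row] for row in rows]
    nrows, ncols = len(M), len(M[0])
    r = 0
    for c in range(ncols):
        # Find a pivot in column c at or below row r.
        pivot = next((i for i in range(r, nrows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]           # interchange rows
        M[r] = [x / M[r][c] for x in M[r]]        # scale pivot row to a leading 1
        for i in range(nrows):                    # zero out column c elsewhere
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == nrows:
            break
    return M

R = rref([[1, 0, -3, 1, 10], [4, -5, 6, -2, -8], [0, 2, 2, 5, 6]])
# R contains the same entries as the final matrix above, e.g. 85/46 and 131/23.
```

Because every entry stays a `Fraction`, the result matches the hand computation exactly, fraction for fraction.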
Finally, we get the solution of the linear system as follows.
\[
\begin{align}
x_1 &= -\frac{85}{46}x_4 + \frac{131}{23}\\\\
x_2 &= -\frac{51}{23}x_4 + \frac{102}{23}\\\\
x_3 &= -\frac{13}{46}x_4 - \frac{33}{23}\\\\
x_4 & \text{ is a free variable.}
\end{align}
\]
In this system, any solution is determined by a choice of the free variable \(x_4\).
The Matrix Equation
To get a new perspective on the system of linear equations, we would like to
describe it using the concept of vectors.
Given vectors \(v_1, v_2,\cdots, v_p \in \mathbb{R}^n \) and scalars \(c_1, c_2,\cdots, c_p\),
the vector \(y\) defined by
\[y = c_1v_1 + c_2v_2 + \cdots + c_pv_p\]
is called a linear combination of \(v_1, v_2,\cdots, v_p\) with
weights \(c_1, c_2,\cdots, c_p\).
Let's reconsider the augmented matrix
\[\begin{bmatrix} 1 & 0 & -3 & 1 & 10\\ 4 & -5 & 6 & -2 & -8\\ 0 & 2 & 2 & 5 & 6 \end{bmatrix}.\]
Define its column vectors:
\(a_1 = \begin{bmatrix} 1 \\ 4 \\ 0 \end{bmatrix},
a_2 = \begin{bmatrix} 0 \\ -5 \\ 2 \end{bmatrix},
a_3 = \begin{bmatrix} -3 \\ 6 \\ 2 \end{bmatrix},
a_4 = \begin{bmatrix} 1 \\ -2 \\ 5 \end{bmatrix}\), and
\(b =\begin{bmatrix} 10 \\ -8 \\ 6 \end{bmatrix} \in \mathbb{R}^3\).
Then \(b\) is generated ("spanned") by a linear combination of \(a_1, \cdots, a_4\) because there exist
weights \(x_1, \cdots, x_4\) such that
\[x_1a_1 + x_2a_2 + x_3a_3 + x_4a_4 = b\]
These weights are exactly the solutions we found earlier using the augmented matrix.
In other words, the vector \(b\) lies in the span of \(\{a_1, a_2, a_3, a_4\}\), which is
a subset of \( \mathbb{R}^3\).
\[b \in \text{Span } \{a_1, a_2, a_3, a_4\}\]
Here, we reinterpret the linear combination of vectors as the product of a
matrix \(A \in \mathbb{R}^{3\times4}\) and a vector \(x\in\mathbb{R}^4\).
\[
Ax = \begin{bmatrix} a_1 & a_2 & a_3 & a_4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= x_1a_1 + x_2a_2 + x_3a_3 + x_4a_4 = b
\]
Hence, the matrix equation \(Ax = b\) has a solution if and only if \(b\) is a linear combination of
the columns of \(A\).
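The column view of \(Ax\) can be checked numerically. The short Python sketch below (illustrative, using exact fractions from the standard library) plugs in the particular solution found earlier, with \(x_4 = 0\), and recovers \(b\):

```python
from fractions import Fraction

# Columns a1..a4 of A, and the particular solution x (free variable x4 = 0).
cols = [[1, 4, 0], [0, -5, 2], [-3, 6, 2], [1, -2, 5]]
x = [Fraction(131, 23), Fraction(102, 23), Fraction(-33, 23), Fraction(0)]

# Ax computed as the weighted column sum x1*a1 + x2*a2 + x3*a3 + x4*a4.
Ax = [sum(xj * aj[i] for xj, aj in zip(x, cols)) for i in range(3)]
assert Ax == [10, -8, 6]   # equals b, so x solves Ax = b
```

This is the same arithmetic as the row-by-row matrix-vector product, just grouped by columns instead of rows.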
Linear Independence
A set of vectors \(\{v_1, v_2, \cdots, v_p\}\) in \( \mathbb{R}^n \) is called linearly independent if
\[x_1v_1 + x_2v_2 + \cdots + x_pv_p = 0\]
has "only" the trivial solution (the zero vector in \(\mathbb{R}^p\)).
Consider a matrix \(A\) with columns \(\{v_1, v_2, \cdots, v_p\}\). The matrix equation \(Ax =0\) is called the homogeneous equation,
and it always has at least one solution: the trivial solution \(x=0\). By definition, the columns of \(A\) are linearly independent
if and only if \(Ax =0\) has "only" the trivial solution.
On the other hand, the equation \(Ax =0\) has a nontrivial solution if and only if
the columns of \(A\) are linearly dependent. When are the columns of \(A\) linearly dependent? Remember, if a linear system has a free variable,
basic variables "depend" on the choice of the free variable. Thus, the homogeneous equation \(Ax =0\) has a nontrivial solution
if and only if the equation has at least one free variable, which means the columns of \(A\) are linearly dependent.
Warning:
Consider the vectors
\[
u = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \qquad
v = \begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix} \qquad
w = \begin{bmatrix} 7 \\ -9 \\ 7 \end{bmatrix}
\]
The set \(\{u, v, w\}\) is a linearly "dependent" set because \(v = 2u\), but the vector \(w\) is NOT
a linear combination of \(u\) and \(v\). In other words, \(w \notin \text{Span}\{u, v\}\).
Even in a linearly dependent set, some vectors might not be expressible as a combination of the others.
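The warning can be confirmed with a short computation (an illustrative sketch in plain Python):

```python
from fractions import Fraction

u, v, w = [1, 2, 3], [2, 4, 6], [7, -9, 7]

# {u, v, w} is dependent: 2*u + (-1)*v + 0*w = 0 is a nontrivial relation.
assert all(2 * ui - vi == 0 for ui, vi in zip(u, v))

# Yet w is not in Span{u, v} = Span{u}: the componentwise ratios w_i / u_i
# are not all equal, so no single scalar c gives w = c*u.
ratios = {Fraction(wi, ui) for ui, wi in zip(u, w)}
assert len(ratios) > 1   # w is not a multiple of u
```

The ratio test only works here because \(u\) has no zero entries and \(\text{Span}\{u, v\}\) collapses to a line; in general one would row reduce the augmented matrix \([\,u \; v \mid w\,]\) and check consistency.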
Nonhomogeneous System vs Homogeneous System
You might wonder about the relationship between a homogeneous equation \(Ax = 0\) and a
nonhomogeneous equation \(Ax = b\), where \(b\) is a nonzero vector.
Indeed, there is an important connection between them. Let's revisit the solution of our nonhomogeneous system:
\[\begin{align}
x_1 &= -\frac{85}{46}x_4 + \frac{131}{23}\\\\
x_2 &= -\frac{51}{23}x_4 + \frac{102}{23}\\\\
x_3 &= -\frac{13}{46}x_4 - \frac{33}{23}\\\\
x_4 & \text{ is a free variable.}
\end{align}\]
As a "vector", the general solution of \(Ax = b\) can be written in
parametric vector form as:
\[
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} \frac{131}{23}\\ \frac{102}{23}\\ - \frac{33}{23}\\ 0 \end{bmatrix}
+ x_4\begin{bmatrix} -\frac{85}{46}\\ -\frac{51}{23} \\ -\frac{13}{46}\\ 1 \end{bmatrix}
= p + x_4v
\]
where \(p\) is a particular solution to \(Ax = b\) (when \(x_4 =0\)), and \(x_4v\) represents the
general solution to the corresponding homogeneous system \(Ax = 0\).
Let's solve the homogeneous system \(Ax = 0\) using the same matrix \(A\).
\[
\begin{bmatrix} 1 & 0 & -3 & 1 & 0\\ 4 & -5 & 6 & -2 & 0\\ 0 & 2 & 2 & 5 & 0 \end{bmatrix}
\xrightarrow{\text{rref}}
\begin{bmatrix} 1 & 0 & 0 & \frac{85}{46} & 0\\ 0 & 1 & 0 & \frac{51}{23} & 0\\ 0 & 0 & 1 & \frac{13}{46}& 0 \end{bmatrix}
\]
Then the solution is
\[
\begin{align}
x_1 &= -\frac{85}{46}x_4 \\\\ x_2 &= -\frac{51}{23}x_4 \\\\ x_3 &= -\frac{13}{46}x_4 \\\\ x_4 & \text{ is a free variable.}
\end{align}
\]
In parametric vector form, this solution becomes:
\[x = x_4\begin{bmatrix} -\frac{85}{46}\\ -\frac{51}{23} \\ -\frac{13}{46}\\ 1 \end{bmatrix}
= x_4v
\]
This is the general solution of \(Ax = 0\); it has the same parametric form as before, but with \(p = 0\).
So, the solutions of the nonhomogeneous system \(Ax =b\) can be obtained by adding the particular solution \(p\)
to the general solution of the homogeneous system \(Ax =0\).
Geometrically, you can imagine that the solution set of \(Ax = 0\) and \(Ax = b\) are "parallel."
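These relationships are easy to verify numerically. The sketch below (illustrative Python with exact fractions) checks that \(Ap = b\), \(Av = 0\), and hence \(A(p + tv) = b\) for any value \(t\) of the free variable:

```python
from fractions import Fraction

A = [[1, 0, -3, 1], [4, -5, 6, -2], [0, 2, 2, 5]]
b = [10, -8, 6]
p = [Fraction(131, 23), Fraction(102, 23), Fraction(-33, 23), Fraction(0)]
v = [Fraction(-85, 46), Fraction(-51, 23), Fraction(-13, 46), Fraction(1)]

def matvec(A, x):
    """Matrix-vector product, row by row."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

assert matvec(A, p) == b            # p is a particular solution
assert matvec(A, v) == [0, 0, 0]    # v solves the homogeneous system

t = Fraction(7)                     # arbitrary value of the free variable x4
w = [pi + t * vi for pi, vi in zip(p, v)]
assert matvec(A, w) == b            # p + t*v also solves Ax = b
```

By linearity, \(A(p + tv) = Ap + tAv = b + 0\), which is exactly what the assertions confirm.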
This relationship leads us to an important result:
Theorem 1:
Suppose the system \(Ax = b\) is consistent for some vector \(b\).
Let \(p\) be a solution of the system. Then the solution set of \(Ax = b\) is the set of
all vectors of the form \(w = p + v_h\), where \(v_h\) is any solution of
the homogeneous system \(Ax = 0\).
Proof:
Let \(p\) be a particular solution of the nonhomogeneous system \(Ax = b\) and let \(v_h\) be
any solution of the homogeneous system \(Ax = 0\). Also, let \(w = p + v_h\).
\[ Aw = A(p+v_h) = Ap + Av_h = b + 0 = b \]
Thus \(w = p + v_h\) is a solution of \(Ax = b\).
Conversely, let \(w\) be any solution of \(Ax = b\), and define \(v_h = w -p\). Then
\[ Av_h = A(w-p) = Aw - Ap = b -b = 0.\]
Thus \(v_h = w -p\) satisfies \(Ax = 0\).
Therefore, every solution of \(Ax = b\) has the form \(w = p + v_h\).
This theorem is not only crucial in understanding linear systems but is also highly useful
in solving differential equations.