Linear Transformation


Linear Transformation

A transformation (or mapping) \(T: \mathbb{R}^n \rightarrow \mathbb{R}^m\) is linear if
for all \(u, v \in \mathbb{R}^n\) and any scalar \(c\), the following two properties hold: \[T(u+v) = T(u) +T(v) \] \[T(cu) = cT(u)\]
For example, consider a matrix \(A \in \mathbb{R}^{m \times 3}\) with columns \(a_1, a_2, a_3\) and a vector \(x \in \mathbb{R}^3 \). Then the matrix-vector product \(Ax\) defines a linear transformation \(T: x \mapsto Ax\). To verify this, let \(u, v \in \mathbb{R}^3 \) and let \(c\) be a scalar. Let's check the two conditions:

Vector addition: \[ \begin{align*} A(u+v) &= \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix} \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ u_3 + v_3 \end{bmatrix} \\\\ &= (u_1 + v_1)a_1 + (u_2 + v_2)a_2 + (u_3 + v_3)a_3 \\\\ &= (u_1a_1 + u_2a_2 + u_3a_3) + (v_1a_1 + v_2a_2 + v_3a_3) \\\\ &= Au + Av \end{align*} \]
Scalar multiplication: \[ \begin{align*} A(cu) &= \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix} \begin{bmatrix} cu_1\\ cu_2 \\ cu_3 \end{bmatrix} \\\\ &= c(u_1a_1)+c(u_2a_2)+c(u_3a_3) \\\\ &= c(u_1a_1 + u_2a_2 + u_3a_3) \\\\ &= c(Au). \end{align*} \]
In general, the operations of vector addition and scalar multiplication are preserved under linear transformations.
In addition, if a transformation \(T\) is linear, then \[T(0)=0.\] Also, for any scalars \(a, b\) and any vectors \(u, v\) in the domain of \(T\), \[T(au + bv)=aT(u)+bT(v).\]
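The following NumPy sketch is a minimal numerical check of these properties for \(T(x) = Ax\); the random matrix, vectors, and scalars are arbitrary choices for illustration.

```python
import numpy as np

# Numerically check the linearity properties of T(x) = Ax.
# A, u, v, and the scalars below are arbitrary illustrative choices.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # an m x 3 matrix with m = 4
u = rng.standard_normal(3)
v = rng.standard_normal(3)
c, a, b = 2.5, -1.0, 3.0

print(np.allclose(A @ (u + v), A @ u + A @ v))                        # T(u+v) = T(u) + T(v)
print(np.allclose(A @ (c * u), c * (A @ u)))                          # T(cu) = cT(u)
print(np.allclose(A @ np.zeros(3), np.zeros(4)))                      # T(0) = 0
print(np.allclose(A @ (a * u + b * v), a * (A @ u) + b * (A @ v)))    # T(au+bv) = aT(u) + bT(v)
```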

You have probably seen the following concepts somewhere in terms of functions \(f: \mathbb{R} \to \mathbb{R}\). Now, we would like to extend the definitions to higher dimensions.

A mapping \(T: \mathbb{R}^n \to \mathbb{R}^m\) is said to be onto (surjective) \(\mathbb{R}^m\) if \[ \begin{align*} &\forall b \in \mathbb{R}^m, \\\\ &\exists \, x \in \mathbb{R}^n \text{ s.t. } T(x) = b. \end{align*} \] Equivalently, the range of \(T\) (the set of all outputs) is equal to the codomain \(\mathbb{R}^m\).

Theorem 1: For a matrix transformation (a linear transformation) \(T: \mathbb{R}^n \rightarrow \mathbb{R}^m \) given by \(T(x) = Ax\),

\(T\) maps \(\mathbb{R}^n\) onto \(\mathbb{R}^m \) iff the columns of \(A\) span \(\mathbb{R}^m\).

Note: This means that for each \(b \in \mathbb{R}^m\), the equation \(Ax =b \) has at least one solution.
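In computations, this condition is typically checked through the rank: the columns of \(A\) span \(\mathbb{R}^m\) exactly when \(\operatorname{rank}(A) = m\). Here is a small NumPy sketch with arbitrarily chosen example matrices.

```python
import numpy as np

# Theorem 1 in computational form: the columns of A span R^m
# exactly when rank(A) = m. Both matrices are arbitrary examples.
A_onto = np.array([[1.0, 0.0, 2.0],
                   [0.0, 1.0, 3.0]])    # 2 x 3, rank 2 -> onto R^2
A_not_onto = np.array([[1.0, 2.0],
                       [2.0, 4.0],
                       [3.0, 6.0]])     # 3 x 2, rank 1 -> not onto R^3

for A in (A_onto, A_not_onto):
    m = A.shape[0]
    print(np.linalg.matrix_rank(A) == m)    # True, then False
```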

A mapping \(T: \mathbb{R}^n \to \mathbb{R}^m\) is said to be one-to-one (injective) if \[ \begin{align*} &\forall b \in \mathbb{R}^m, \, \forall u, v \in \mathbb{R}^n, \\\\ &T(u) = T(v) = b \Rightarrow u = v. \end{align*} \] Equivalently, for each \(b \in \mathbb{R}^m\), the equation \(T(x) = b\) has either a unique solution or no solution at all.
Theorem 2: For a matrix transformation (a linear transformation) \(T: \mathbb{R}^n \rightarrow \mathbb{R}^m \) given by \(T(x) = Ax\),

\(T\) is one-to-one iff the columns of \(A\) are linearly independent (equivalently, the homogeneous equation \(Ax =0\) has only the trivial solution).

Note: A linear transformation \(T\) is one-to-one iff \(T(x)=0\) has only the trivial solution \(x = 0\).
Since \(T\) is linear, \(T(0)=0\). If \(T\) is one-to-one, then \(x = 0\) is the only vector mapped to \(0\), so \(T(x)=0\) has only the trivial solution. Conversely, if \(T\) is not one-to-one, then there exist distinct vectors \(u, v \in \mathbb{R}^n\) with \( T(u) = T(v)\). Since \(T\) is linear, \(T(u-v)=T(u)-T(v) = 0\), and the vector \(u-v\) is nonzero because \(u \neq v\). Thus \(T(x)=0\) has a nontrivial solution.
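As with Theorem 1, this condition can be checked through the rank: the columns of \(A\) are linearly independent exactly when \(\operatorname{rank}(A) = n\). A short NumPy sketch with arbitrary example matrices:

```python
import numpy as np

# Theorem 2 in computational form: T(x) = Ax is one-to-one
# exactly when rank(A) = n. Both matrices are arbitrary examples.
A_one_to_one = np.array([[1.0, 0.0],
                         [0.0, 1.0],
                         [1.0, 1.0]])       # 3 x 2, rank 2 -> one-to-one
A_not = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])         # 2 x 3, rank 1 -> not one-to-one

for A in (A_one_to_one, A_not):
    n = A.shape[1]
    print(np.linalg.matrix_rank(A) == n)    # True, then False
```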

Linearity in Mathematics

Linearity is not only about vectors and matrices. The linear transformation is one of the most fundamental ideas in mathematics, and you have encountered this concept many times outside linear algebra. Let's briefly look at a few examples.

A function \(f(x) = mx\) is a linear transformation \(f:\mathbb{R} \rightarrow \mathbb{R}\): \[ \begin{align*} f(ax+by) &= m(ax+by) \\\\ &= a(mx)+b(my) \\\\ &= af(x) + bf(y). \end{align*} \] Note: the graph of a function that satisfies linearity must pass through the origin, since linear transformations must satisfy \(T(0) = 0\). In particular, \(f(x) = mx + b\) with \(b \neq 0\) is not linear in this sense.
If you have just started to learn calculus, you might know that \[\frac{d}{dx}\big(af(x)+bg(x)\big) = a\frac{d}{dx}f(x)+b\frac{d}{dx}g(x).\] For example, \[ \begin{align*} \frac{d}{dx}(5x^3 +4x^2) &= 15x^2 + 8x \\\\ &= 5\frac{d}{dx}(x^3)+4\frac{d}{dx}(x^2), \end{align*} \] so differentiation is a linear operator (and in fact, so is integration).
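If you want to confirm the derivative example by machine, here is a short SymPy sketch (SymPy is just one convenient choice for symbolic differentiation).

```python
import sympy as sp

# Differentiation is linear: d/dx(5x^3 + 4x^2) = 5 d/dx(x^3) + 4 d/dx(x^2).
x = sp.symbols('x')
lhs = sp.diff(5 * x**3 + 4 * x**2, x)
rhs = 5 * sp.diff(x**3, x) + 4 * sp.diff(x**2, x)
print(lhs)                      # 15*x**2 + 8*x
print(sp.simplify(lhs - rhs))   # 0
```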

If you have studied statistics, you know that the expected value is also linear: for any random variables \(X\) and \(Y\), and constants \(a\) and \(b\), \[\mathbb{E}[aX+bY]=a\mathbb{E}[X]+b\mathbb{E}[Y].\] You can see linearity everywhere in mathematics, and this is why "linear" algebra is so powerful.
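Linearity of expectation is also easy to see by simulation. The distributions and constants in this sketch are arbitrary choices; the identity itself holds even when \(X\) and \(Y\) are dependent.

```python
import numpy as np

# Monte Carlo check of E[aX + bY] = aE[X] + bE[Y].
# The distributions and constants are arbitrary illustrative choices.
rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=1_000_000)       # E[X] = 2
Y = rng.normal(loc=5.0, scale=1.0, size=1_000_000)   # E[Y] = 5
a, b = 3.0, -2.0

print(np.mean(a * X + b * Y))            # approximately 3*2 + (-2)*5 = -4
print(a * np.mean(X) + b * np.mean(Y))   # approximately the same value
```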

Matrix Multiplication

Suppose \(A\) is an \(m \times n\) matrix and \(B\) is an \(n \times p\) matrix. Then the product \(AB\) is the \(m \times p\) matrix \[ AB = \begin{bmatrix} Ab_1 & Ab_2 & \cdots & Ab_p \end{bmatrix} \] where \(b_1, b_2, \cdots, b_p\) are the columns of \(B\).

So, each column of \(AB\) is a linear combination of the columns of \(A\) with weights from the corresponding column of \(B\). Let's see an example: \[ \begin{align*} AB &= \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \begin{bmatrix} -1 & -2 \\ 0 & -3 \\ -4 & 0 \end{bmatrix} \\\\ &= \begin{bmatrix} (-1+0-12) & (-2-6+0) \\ (-4+0-24) & (-8-15+0) \\ (-7+0-36) & (-14-24+0) \end{bmatrix} \\\\ &= \begin{bmatrix} -13 & -8 \\ -28 & -23 \\ -43 & -38 \end{bmatrix} \end{align*} \] You can see that the \((i,j)\)-entry of \(AB\) is \[(AB)_{ij} = \sum_{k=1}^n a_{ik}b_{kj}.\] Also, note that in this example, \(BA\) is NOT defined because the number of columns of \(B\) does not match the number of rows of \(A\). More generally, \(AB \neq BA\) in most cases, even when both products are defined. Similarly, \(AB = AC\) does NOT guarantee that \(B = C\), and \(AB = 0\) does NOT imply \(A=0\) or \(B=0\) in general.
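The same example in NumPy, computed both with the built-in product and column by column:

```python
import numpy as np

# Reproduce the example: each column of AB is A times the
# corresponding column of B.
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
B = np.array([[-1, -2],
              [ 0, -3],
              [-4,  0]])

AB = A @ B
print(AB)
# [[-13  -8]
#  [-28 -23]
#  [-43 -38]]

# Column-by-column view: column j of AB is A @ B[:, j].
cols = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])
print(np.array_equal(AB, cols))   # True
```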

You may wonder why matrix multiplication is not simply an entrywise operation, like sums and scalar multiples of matrices. The key idea is the linear transformation: matrix multiplication is the composition of linear transformations. Say a vector \(x\) is multiplied by a matrix \(B\), which maps \(x\) to the new vector \(Bx\). Next, a matrix \(A\) multiplies the vector \(Bx\), producing the resulting vector \(A(Bx)\). Since the composition of functions is associative, \[(AB)x = A(Bx),\] which allows us to calculate \(Bx\) (matrix \(\times\) vector) first, instead of \(AB\) (matrix \(\times\) matrix). This is significant in numerical computation. For example, if the matrices \(A, B, C, D,\) and \(E\) are each one million by one million and we want to compute \(ABCDEx\), where \(x\) is a column vector, we should start from the rightmost product \(Ex\), which gives a new column vector. Then we can avoid huge matrix-matrix multiplications: computing \(A(B(C(D(Ex))))\) reduces to a sequence of matrix-vector multiplications, which avoids the computational cost of forming any full matrix product.
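Here is a small NumPy sketch of this idea; the sizes are modest stand-ins for the huge matrices in the text, but the effect is already visible.

```python
import time
import numpy as np

# Compare evaluation orders for A B C D E x.
# Forming matrix-matrix products first costs O(n^3) per product;
# evaluating from the right uses only O(n^2) matrix-vector products.
n = 500
rng = np.random.default_rng(0)
A, B, C, D, E = (rng.standard_normal((n, n)) for _ in range(5))
x = rng.standard_normal(n)

t0 = time.perf_counter()
y_matrix_first = (((A @ B) @ C) @ D @ E) @ x        # matrix-matrix products first
t1 = time.perf_counter()
y_vector_first = A @ (B @ (C @ (D @ (E @ x))))      # matrix-vector products only
t2 = time.perf_counter()

print(np.allclose(y_matrix_first, y_vector_first))  # True, by associativity
print(f"matrix-first: {t1 - t0:.4f} s, vector-first: {t2 - t1:.4f} s")
```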

Interactive Linear Transformation Visualizer

Linear transformations are often best understood through visual intuition. In two dimensions, every linear transformation can be represented by a matrix multiplication that reshapes, rotates, reflects, or scales the plane.

The following interactive visualizer allows you to explore how different \(2 \times 2\) matrices affect a 2D grid and various shapes. Try modifying the entries of the matrix and observe how the transformation changes the geometry.


For example, the above 90° clockwise rotation for the square shape is given by \[ \begin{align*} &\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 5 & 5 & 1 \\ 5 & 5 & 1 & 1 \end{bmatrix} \\\\ &= \begin{bmatrix} 5 & 5 & 1 & 1 \\ -1 & -5 & -5 & -1 \end{bmatrix}. \end{align*} \]
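The same calculation in NumPy, with each column of the second matrix holding one vertex \((x, y)\) of the square:

```python
import numpy as np

# Apply the 90-degree clockwise rotation to the square's corner points.
R = np.array([[0, 1],
              [-1, 0]])
square = np.array([[1, 5, 5, 1],
                   [5, 5, 1, 1]])

print(R @ square)
# [[ 5  5  1  1]
#  [-1 -5 -5 -1]]
```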