Vectorization
The vectorization operation, denoted by \(\text{vec }(\cdot)\), takes a matrix and stacks its columns
into a single column vector.
For example, let \(x \in \mathbb{R}^n\), and consider the outer product \(xx^\top \in \mathbb{R}^{n \times n}\):
\[
\begin{align*}
\text{vec }(xx^\top) &= \begin{bmatrix}x_1x_1 \\ x_2x_1 \\ \vdots \\ x_nx_1 \\ x_1x_2 \\ x_2x_2 \\ \vdots \\ x_nx_2 \\ \vdots \\
x_1x_n \\ x_2x_n \\ \vdots \\ x_nx_n
\end{bmatrix} \\ \\
&= x \otimes x
\end{align*}
\]
The notation \(\otimes\) denotes the Kronecker product (more generally, the tensor product).
This operation is widely used in fields such as machine learning, statistics, and optimization to represent
higher-order interactions or matrix manipulations in a compact form.
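As a quick sanity check, here is a minimal NumPy sketch (the variable names are illustrative) verifying that \(\text{vec }(xx^\top) = x \otimes x\). Note that NumPy flattens arrays in row-major order by default, so we pass order='F' to stack columns:
import numpy as np
x = np.array([1.0, 2.0, 3.0])
outer = np.outer(x, x)              # the outer product x x^T, an n x n matrix
vec = outer.flatten(order='F')      # stack the columns into one long vector
kron = np.kron(x, x)                # x ⊗ x (np.kron of two 1-D arrays)
print(np.allclose(vec, kron))       # True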
Note: Conversely, we can reshape a vector into a matrix. Consider a vector
\[
a = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 & 6 \end{bmatrix}^\top
\]
- row-major order (e.g., used by C, C++, and Python)
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}
\]
- column-major order (e.g., used by Julia, MATLAB, R, and Fortran)
\[
A = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}
\]
import numpy as np
vector = np.array([1, 2, 3, 4, 5, 6])
# In Python (NumPy), you can choose either order; the default is row-major:
matrix_r = vector.reshape((2, 3), order='C')  # 'C' as in the C programming language (row-major)
matrix_c = vector.reshape((2, 3), order='F')  # 'F' as in Fortran (column-major)
print(matrix_r)  # [[1 2 3] [4 5 6]]
print(matrix_c)  # [[1 3 5] [2 4 6]]
Kronecker Product
Let \(A \in \mathbb{R}^{m \times n}\) and \(B \in \mathbb{R}^{p \times q}\). The Kronecker product \(A \otimes B\) is the
\((mp) \times (nq)\) matrix:
\[
A\otimes B = \begin{bmatrix}
a_{11}B & a_{12}B & \cdots & a_{1n} B \\
a_{21}B & a_{22}B & \cdots & a_{2n} B \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1}B & a_{m2}B & \cdots & a_{mn} B \\
\end{bmatrix}.
\]
Each element \(a_{ij}\) of \(A\) is multiplied by the entire matrix \(B\), resulting in blocks of size \(p \times q\).
For example,
\[
\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ \end{bmatrix}
\otimes
\begin{bmatrix} 5 & 6 \\ 7 & 8 \\ \end{bmatrix}
=
\begin{bmatrix}
5 & 6 & 10 & 12 \\
7 & 8 & 14 & 16 \\
15 & 18 & 20 & 24 \\
21 & 24 & 28 & 32 \\
\end{bmatrix}.
\]
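We can check this example numerically with NumPy's np.kron (a minimal sketch; the matrices are those from the example above):
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(np.kron(A, B))
# [[ 5  6 10 12]
#  [ 7  8 14 16]
#  [15 18 20 24]
#  [21 24 28 32]]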
Useful Properties of the Kronecker Product
- Mixed-product property (assuming the products \(AC\) and \(BD\) are defined)
\[
(A \otimes B)(C \otimes D) = (AC) \otimes (BD)
\]
- Transpose
\[
(A \otimes B)^\top = A^\top \otimes B^\top
\]
- Inverse (provided \(A\) and \(B\) are invertible)
\[
(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}
\]
- Trace (for square \(A\) and \(B\))
\[
\text{Tr }(A \otimes B) = \text{Tr }(A) \text{Tr }(B)
\]
- Determinant
\[
\det (A \otimes B) = \det(A)^n \det(B)^m
\]
where \(A \in \mathbb{R}^{m \times m}\) and \(B \in \mathbb{R}^{n \times n}\).
- Eigenvalues
Suppose \(A \in \mathbb{R}^{m \times m}\) and \(B \in \mathbb{R}^{n \times n}\). Then the eigenvalues of \(A \otimes B\) are the \(mn\) products
\[
\lambda_i \mu_j,
\]
where \(\lambda_i \ (i = 1, \cdots, m)\) and \(\mu_j \ (j = 1, \cdots, n)\) are the eigenvalues of \(A\)
and \(B\), respectively.
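All of the properties above can be verified numerically. Here is a minimal sketch with random square matrices (the sizes and seed are arbitrary choices):
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A, C = rng.standard_normal((m, m)), rng.standard_normal((m, m))
B, D = rng.standard_normal((n, n)), rng.standard_normal((n, n))

# Mixed-product property: (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))
# Transpose
print(np.allclose(np.kron(A, B).T, np.kron(A.T, B.T)))
# Inverse (random Gaussian matrices are invertible with probability 1)
print(np.allclose(np.linalg.inv(np.kron(A, B)),
                  np.kron(np.linalg.inv(A), np.linalg.inv(B))))
# Trace
print(np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B)))
# Determinant
print(np.isclose(np.linalg.det(np.kron(A, B)),
                 np.linalg.det(A)**n * np.linalg.det(B)**m))
# Eigenvalues: the sorted products λ_i μ_j match the eigenvalues of A ⊗ B
products = np.sort_complex(np.outer(np.linalg.eigvals(A), np.linalg.eigvals(B)).flatten())
print(np.allclose(np.sort_complex(np.linalg.eigvals(np.kron(A, B))), products))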
Now, we are ready to discuss an important concept in machine learning.
Tensor
A tensor is a generalization of a 2-dimensional array to more than 2 dimensions. So far, we have seen
the following tensors in mathematics:
- A scalar is a 0-dimensional tensor (single number, \(a \in \mathbb{R}\)).
- A vector is a 1-dimensional tensor (array of numbers, \(\vec{a} \in \mathbb{R}^n\)).
- A matrix is a 2-dimensional tensor (table of numbers, \(A \in \mathbb{R}^{m \times n}\)).
We can apply this notion to higher-order tensors. For example, image data is naturally a 3-dimensional tensor,
because real-world images usually include color information, which requires multiple channels. So, for any image \(I\),
\[
I \in \mathbb{R}^{H \times W \times C}
\]
where \(H\) is the height, \(W\) is the width, and \(C\) is the number of channels (e.g., RGB: \(C = 3\)) of the image.
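For instance, a minimal sketch (the image here is random data, and the sizes are illustrative):
import numpy as np
H, W, C = 4, 6, 3                 # height, width, channels (RGB)
image = np.random.rand(H, W, C)   # a 3-dimensional tensor
print(image.shape)                # (4, 6, 3)
print(image[0, 0])                # the RGB values of the top-left pixel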
Formally, a tensor is an element of a tensor product of vector spaces.
Consider vector spaces \(V\) and \(W\) over a field \(\mathbb{F}\). The tensor product of \(V\) and \(W\), denoted \(V \otimes W\),
is a new vector space (the tensor product space) whose elements are linear combinations of tensor products of vectors:
\[
v \otimes w \quad \text{where } v \in V, \, w \in W.
\]
A rank-\(r\) tensor \(T\) is an element of a tensor product of \(r\) vector spaces:
\[
T \in V_1 \otimes V_2 \otimes \cdots \otimes V_r.
\]
In this context,
- A scalar is a rank-0 tensor, which is an element of \(\mathbb{F}\).
- A vector is a rank-1 tensor, which is an element of the vector space \(V\) or \(W\).
- A matrix is a rank-2 tensor, which is an element of the tensor product space \(V \otimes W\).
So, the Kronecker product is just the tensor product specialized to matrices, while
the tensor product is a more general mathematical operation that applies to multilinear maps.
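This relationship can be made concrete in NumPy (a minimal sketch, reusing the matrices from the earlier example): the tensor product of two matrices is a 4-dimensional array, and grouping its axes in the right order recovers the Kronecker product.
import numpy as np
A = np.array([[1, 2], [3, 4]])    # m x n
B = np.array([[5, 6], [7, 8]])    # p x q
m, n = A.shape
p, q = B.shape
# Tensor product: T[i, k, j, l] = A[i, j] * B[k, l]
T = np.einsum('ij,kl->ikjl', A, B)
# Merging the (i, k) and (j, l) axis pairs yields the Kronecker product
print(np.allclose(T.reshape(m * p, n * q), np.kron(A, B)))   # True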