Vectorization
The vectorization operation, denoted by \(\text{vec }(\cdot)\), takes a matrix and stacks its columns
into a single column vector.
For example, let \(x \in \mathbb{R}^n\), and consider the outer product \(xx^\top \in \mathbb{R}^{n \times n}\):
\[
\begin{align*}
\text{vec }(xx^\top) &= \begin{bmatrix}x_1x_1 \\ x_2x_1 \\ \vdots \\ x_nx_1 \\ x_1x_2 \\ x_2x_2 \\ \vdots \\ x_nx_2 \\ \vdots \\
x_1x_n \\ x_2x_n \\ \vdots \\ x_nx_n
\end{bmatrix} \\ \\
&= x \otimes x
\end{align*}
\]
The notation \(\otimes\) denotes the Kronecker product (more generally, the tensor product).
This operation is widely used in fields such as machine learning, statistics, and optimization to represent
higher-order interactions or matrix manipulations in a compact form.
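As a quick sanity check, here is a minimal NumPy sketch (the variable names are illustrative) verifying that \(\text{vec }(xx^\top) = x \otimes x\). Note that NumPy flattens arrays in row-major order by default, so we pass order='F' to stack columns:
import numpy as np
x = np.array([1.0, 2.0, 3.0])
outer = np.outer(x, x)              # the outer product x x^T, an n x n matrix
vec = outer.flatten(order='F')      # stack the columns into one long vector
kron = np.kron(x, x)                # x ⊗ x (np.kron of two 1-D arrays)
print(np.allclose(vec, kron))       # True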
Note: Conversely, we can reshape a vector into a matrix. Consider a vector
\[
a = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 & 6 \end{bmatrix}^\top
\]
- row-major order (e.g., used by C, C++, and Python)
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}
\]
- column-major order (e.g., used by Julia, MATLAB, R, and Fortran)
\[
A = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}
\]
import numpy as np
vector = np.array([1, 2, 3, 4, 5, 6])
# In Python (NumPy), you can choose either order; the default is row-major:
matrix_r = vector.reshape((2, 3), order='C')  # 'C' as in the C programming language (row-major)
matrix_c = vector.reshape((2, 3), order='F')  # 'F' as in Fortran (column-major)
print(matrix_r)  # [[1 2 3] [4 5 6]]
print(matrix_c)  # [[1 3 5] [2 4 6]]
Kronecker Product
Let \(A \in \mathbb{R}^{m \times n}\) and \(B \in \mathbb{R}^{p \times q}\). The Kronecker product \(A \otimes B\) is the
\((mp) \times (nq)\) matrix:
\[
A\otimes B = \begin{bmatrix}
a_{11}B & a_{12}B & \cdots & a_{1n} B \\
a_{21}B & a_{22}B & \cdots & a_{2n} B \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1}B & a_{m2}B & \cdots & a_{mn} B \\
\end{bmatrix}.
\]
Each element \(a_{ij}\) of \(A\) is multiplied by the entire matrix \(B\), resulting in blocks of size \(p \times q\).
For example,
\[
\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ \end{bmatrix}
\otimes
\begin{bmatrix} 5 & 6 \\ 7 & 8 \\ \end{bmatrix}
=
\begin{bmatrix}
5 & 6 & 10 & 12 \\
7 & 8 & 14 & 16 \\
15 & 18 & 20 & 24 \\
21 & 24 & 28 & 32 \\
\end{bmatrix}.
\]
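We can check this example numerically with NumPy's np.kron (a minimal sketch; the matrices are those from the example above):
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(np.kron(A, B))
# [[ 5  6 10 12]
#  [ 7  8 14 16]
#  [15 18 20 24]
#  [21 24 28 32]]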
Useful Properties of the Kronecker Product
- Mixed-product property (assuming the products \(AC\) and \(BD\) are defined)
\[
(A \otimes B)(C \otimes D) = (AC) \otimes (BD)
\]
- Transpose
\[
(A \otimes B)^\top = A^\top \otimes B^\top
\]
- Inverse (provided \(A\) and \(B\) are invertible)
\[
(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}
\]
- Trace (for square \(A\) and \(B\))
\[
\text{Tr }(A \otimes B) = \text{Tr }(A) \text{Tr }(B)
\]
- Determinant
\[
\det (A \otimes B) = \det(A)^n \det(B)^m
\]
where \(A \in \mathbb{R}^{m \times m}\) and \(B \in \mathbb{R}^{n \times n}\).
- Eigenvalues
Suppose \(A \in \mathbb{R}^{m \times m}\) and \(B \in \mathbb{R}^{n \times n}\). Then the eigenvalues of \(A \otimes B\) are the \(mn\) products
\[
\lambda_i \mu_j,
\]
where \(\lambda_i \ (i = 1, \cdots, m)\) and \(\mu_j \ (j = 1, \cdots, n)\) are the eigenvalues of \(A\)
and \(B\), respectively.
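All of the properties above can be verified numerically. Here is a minimal sketch with random square matrices (the sizes and seed are arbitrary choices):
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A, C = rng.standard_normal((m, m)), rng.standard_normal((m, m))
B, D = rng.standard_normal((n, n)), rng.standard_normal((n, n))

# Mixed-product property: (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))
# Transpose
print(np.allclose(np.kron(A, B).T, np.kron(A.T, B.T)))
# Inverse (random Gaussian matrices are invertible with probability 1)
print(np.allclose(np.linalg.inv(np.kron(A, B)),
                  np.kron(np.linalg.inv(A), np.linalg.inv(B))))
# Trace
print(np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B)))
# Determinant
print(np.isclose(np.linalg.det(np.kron(A, B)),
                 np.linalg.det(A)**n * np.linalg.det(B)**m))
# Eigenvalues: the sorted products λ_i μ_j match the eigenvalues of A ⊗ B
products = np.sort_complex(np.outer(np.linalg.eigvals(A), np.linalg.eigvals(B)).flatten())
print(np.allclose(np.sort_complex(np.linalg.eigvals(np.kron(A, B))), products))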
Now, we are ready to discuss an important concept in machine learning.
Tensor
A tensor is a generalization of a 2-dimensional array to more than 2 dimensions. So far, we have seen
the following tensors in mathematics:
- A scalar is a 0-dimensional tensor (single number, \(a \in \mathbb{R}\)).
- A vector is a 1-dimensional tensor (array of numbers, \(\vec{a} \in \mathbb{R}^n\)).
- A matrix is a 2-dimensional tensor (table of numbers, \(A \in \mathbb{R}^{m \times n}\)).
We can apply this notion to higher-order tensors. For example, image data is naturally a 3-dimensional tensor,
because real-world images usually include color information, which requires multiple channels. So, for any image \(I\),
\[
I \in \mathbb{R}^{H \times W \times C}
\]
where \(H\) is the height, \(W\) is the width, and \(C\) is the number of channels (e.g., RGB: \(C = 3\)) of the image.
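For instance, a minimal sketch (the image here is random data, and the sizes are illustrative):
import numpy as np
H, W, C = 4, 6, 3                 # height, width, channels (RGB)
image = np.random.rand(H, W, C)   # a 3-dimensional tensor
print(image.shape)                # (4, 6, 3)
print(image[0, 0])                # the RGB values of the top-left pixel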
Formally, a tensor is an element of a tensor product of vector spaces.
Consider vector spaces \(V\) and \(W\) over a field \(\mathbb{F}\). The tensor product of \(V\) and \(W\), denoted \(V \otimes W\),
is a new vector space (the tensor product space) whose elements are linear combinations of tensor products of vectors:
\[
v \otimes w \quad \text{where } v \in V, \, w \in W.
\]
A rank-\(r\) tensor \(T\) is an element of a tensor product of \(r\) vector spaces:
\[
T \in V_1 \otimes V_2 \otimes \cdots \otimes V_r.
\]
In this context,
- A scalar is a rank-0 tensor, which is an element of \(\mathbb{F}\).
- A vector is a rank-1 tensor, which is an element of the vector space \(V\) or \(W\).
- A matrix is a rank-2 tensor, which is an element of the tensor product space \(V \otimes W\).
So, the Kronecker product is just the tensor product specialized to matrices, while
the tensor product is a more general mathematical operation that applies to multilinear maps.
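This relationship can be made concrete in NumPy (a minimal sketch, reusing the matrices from the earlier example): the tensor product of two matrices is a 4-dimensional array, and grouping its axes in the right order recovers the Kronecker product.
import numpy as np
A = np.array([[1, 2], [3, 4]])    # m x n
B = np.array([[5, 6], [7, 8]])    # p x q
m, n = A.shape
p, q = B.shape
# Tensor product: T[i, k, j, l] = A[i, j] * B[k, l]
T = np.einsum('ij,kl->ikjl', A, B)
# Merging the (i, k) and (j, l) axis pairs yields the Kronecker product
print(np.allclose(T.reshape(m * p, n * q), np.kron(A, B)))   # True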