Matrix Calculus - MATH-CS COMPASS

Derivatives of square matrix functions

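So far, we have discussed familiar derivatives such as gradients and Jacobian matrices. Here, we consider the derivative of a matrix-valued function \(f\) with respect to a matrix input \(X\). \[ df = \frac{\partial f}{\partial X}dX \] Here \(\frac{\partial f}{\partial X}\) is understood as a linear operator acting on the perturbation \(dX\), not as a matrix that simply multiplies \(dX\).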
For example, consider \(f(X) = X^3\) where \(X \in \mathbb{R}^{n \times n}\). We want to find the most general symbolic expression for the derivative of this function using differential notation.
By the product rule, \[ \begin{align*} df &= (dX) X X + X(dX)X +X X (dX) \\\\ &= (dX)X^2 + X(dX)X + X^2(dX) \tag{1} \end{align*} \] Let's verify this expression using the definition of the total differential \(df\): \[ \begin{align*} df &= f(X + dX) - f(X) \\\\ &= (X+dX)^3 - X^3 \\\\ &= (X+dX)(X+dX)(X+dX) - X^3 \\\\ &= X^3 + X^2(dX) + X(dX)X + X(dX)^2 + (dX)X^2 + (dX)X(dX) + (dX)^2X + (dX)^3 -X^3 \end{align*} \] The \(X^3\) terms cancel, and the terms of second and higher order in \(dX\) are negligible as \(dX \to 0\). Only the three first-order terms remain, which gives expression (1).
Note: Matrix multiplication is not commutative, so the three terms in (1) cannot in general be combined into \(3X^2(dX)\).
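As a quick numerical sanity check of expression (1), the following NumPy sketch compares the actual change \(f(X+dX) - f(X)\) with the three-term differential and with the scalar-style guess \(3X^2(dX)\). The matrix size, random seed, and perturbation scale are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed, chosen only for reproducibility
n = 4                               # illustrative matrix size
X = rng.standard_normal((n, n))
dX = 1e-6 * rng.standard_normal((n, n))   # small perturbation


def f(A):
    """Matrix cube A^3 (matrix products, not elementwise powers)."""
    return A @ A @ A


# Actual change of f under the perturbation
exact_change = f(X + dX) - f(X)

# Expression (1): (dX)X^2 + X(dX)X + X^2(dX)
df = dX @ X @ X + X @ dX @ X + X @ X @ dX

# Scalar-style rule that ignores non-commutativity
naive = 3 * X @ X @ dX

err_df = np.linalg.norm(exact_change - df)
err_naive = np.linalg.norm(exact_change - naive)

print("error of expression (1):", err_df)
print("error of 3 X^2 dX      :", err_naive)
```

The error of expression (1) should be of second order in \(dX\), while the naive rule is typically off at first order, because \(X\) and \(dX\) do not commute in general.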

Next, consider \(f(X) = X^{-1}\), where \(X\) is an invertible matrix.
Since \(X^{-1}X = I\) and the identity matrix is constant, \[ d(X^{-1}X) = d(I) = 0 \, \in \mathbb{R}^{n \times n}. \] Then by the product rule, \[ \begin{align*} &d(X^{-1}X) = d(X^{-1})X + X^{-1}(dX) = 0 \\\\ &\Longrightarrow d(X^{-1})X = - X^{-1}(dX). \end{align*} \] Multiplying both sides on the right by \(X^{-1}\) gives \[ df = d(X^{-1}) = - X^{-1}(dX)X^{-1}. \]
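The formula \(d(X^{-1}) = -X^{-1}(dX)X^{-1}\) can be checked numerically in the same spirit. The sketch below is only an illustration: the test matrix is a random matrix shifted by a multiple of the identity (an assumption made here to keep it comfortably invertible), and the perturbation scale is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)      # fixed seed, chosen only for reproducibility
n = 4                               # illustrative matrix size

# Shift a small random matrix by n*I so that X is safely invertible
X = 0.5 * rng.standard_normal((n, n)) + n * np.eye(n)
dX = 1e-6 * rng.standard_normal((n, n))   # small perturbation

X_inv = np.linalg.inv(X)

# Actual change of the inverse under the perturbation
exact_change = np.linalg.inv(X + dX) - X_inv

# First-order prediction: d(X^{-1}) = -X^{-1} (dX) X^{-1}
d_inv = -X_inv @ dX @ X_inv

rel_err = np.linalg.norm(exact_change - d_inv) / np.linalg.norm(d_inv)
print("relative error of the differential:", rel_err)
```

The relative error should shrink roughly in proportion to the size of \(dX\), as expected for a first-order approximation.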

Derivative of the LU decomposition matrix

In the LU decomposition, a square matrix \(X\) is factorized into a product of a lower triangular matrix \(L\) with unit diagonal entries, and an upper triangular matrix \(U\).
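By the product rule: \[ d(X) = d(LU) = (dL)U + L(dU) \] Note: The differentials inherit the triangular structure. \(dU\) is upper triangular, and \(dL\) is strictly lower triangular, because the unit diagonal entries of \(L\) are constant and so their differentials are zero.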
For example, let \(L = \begin{bmatrix} 1 & 0 \\ L_{21} & 1 \end{bmatrix}\) and \(U = \begin{bmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{bmatrix}\). Then we obtain their differentials: \(dL = \begin{bmatrix} 0 & 0 \\ d(L_{21}) & 0 \end{bmatrix}\) and \(dU = \begin{bmatrix} d(U_{11}) & d(U_{12}) \\ 0 & d(U_{22}) \end{bmatrix}\).
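Thus, \[ \begin{align*} d(X) &= \begin{bmatrix} 0 & 0 \\ d(L_{21}) & 0 \end{bmatrix}\begin{bmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ L_{21} & 1 \end{bmatrix}\begin{bmatrix} d( U_{11}) & d(U_{12}) \\ 0 & d(U_{22}) \end{bmatrix}\\\\ &= \begin{bmatrix} d(U_{11}) & d(U_{12}) \\ d(L_{21})U_{11} + L_{21}d(U_{11}) & d(L_{21})U_{12} + L_{21}d(U_{12}) + d(U_{22}) \end{bmatrix} \end{align*} \] This agrees with differentiating \(X = LU = \begin{bmatrix} U_{11} & U_{12} \\ L_{21}U_{11} & L_{21}U_{12} + U_{22} \end{bmatrix}\) entry by entry.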
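The \(2 \times 2\) case is small enough to check by direct computation. The following NumPy sketch evaluates \((dL)U + L(dU)\), compares it with the result of expanding that product entry by entry, and compares both with the actual change of the product \(LU\). All numeric values for the entries and their perturbations are made up for illustration.

```python
import numpy as np

# Illustrative entries of L and U (hypothetical values)
L21, U11, U12, U22 = 0.5, 2.0, -1.0, 3.0
# Illustrative small perturbations of those entries
dL21, dU11, dU12, dU22 = 1e-6, 2e-6, -3e-6, 4e-6

L = np.array([[1.0, 0.0], [L21, 1.0]])
U = np.array([[U11, U12], [0.0, U22]])
dL = np.array([[0.0, 0.0], [dL21, 0.0]])    # strictly lower triangular
dU = np.array([[dU11, dU12], [0.0, dU22]])  # upper triangular

# Product rule: dX = (dL) U + L (dU)
dX_product_rule = dL @ U + L @ dU

# The same differential written out entry by entry
dX_entrywise = np.array([
    [dU11, dU12],
    [dL21 * U11 + L21 * dU11, dL21 * U12 + L21 * dU12 + dU22],
])

# Actual change of X = L U; it differs from dX only by the term (dL)(dU)
exact_change = (L + dL) @ (U + dU) - L @ U
second_order_gap = np.linalg.norm(exact_change - dX_product_rule)

print(dX_product_rule)
print("gap to the exact change:", second_order_gap)
```

The second row of the result mixes \(dL_{21}\) with the entries of \(U\) and \(dU\), which is why the lower-left entry of \(dX\) is not simply \(dL_{21}\).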

