The Derivative of \(f:\mathbb{R}^{n \times n} \rightarrow \mathbb{R}^{n \times n}\)


Derivative of the square matrix functions

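So far, we have discussed familiar derivatives such as gradients and Jacobian matrices. Here, we consider the derivative of a matrix-valued function \(f\) with respect to its matrix input \(X\). \[ df = \frac{\partial f}{\partial X}dX \] In this setting, \(\frac{\partial f}{\partial X}\) denotes a linear operator acting on \(dX\); it is not, in general, a matrix that simply multiplies \(dX\).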
For example, consider \(f(X) = X^3\) where \(X \in \mathbb{R}^{n \times n}\). We want to find the most general symbolic expression for the derivative of this function using differential notation.
By the product rule, \[ \begin{align*} df &= (dX) X X + X(dX)X +X X (dX) \\\\ &= (dX)X^2 + X(dX)X + X^2(dX) \tag{1} \end{align*} \] Let's verify this expression using the definition of the total differential \(df\): \[ \begin{align*} df &= f(X + dX) - f(X) \\\\ &= (X+dX)^3 - X^3 \\\\ &= (X+dX)(X+dX)(X+dX) - X^3 \\\\ &= X^3 + X^2(dX) + X(dX)X + X(dX)^2 + (dX)X^2 + (dX)X(dX) + (dX)^2X + (dX)^3 -X^3 \end{align*} \] The \(X^3\) terms cancel, and every term containing two or more factors of \(dX\) is of higher order and therefore negligible as \(dX \to 0\). The three remaining terms are exactly the expression (1).
Note: Matrix multiplication is not commutative, so the three terms in (1) cannot be combined into \(3X^2(dX)\) as in the scalar case.
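As a quick numerical sanity check of expression (1), the following is a minimal NumPy sketch. The helper names `cube` and `d_cube`, the matrix size, the random seed, and the perturbation scale are all illustrative assumptions, not part of the derivation. It compares the actual change \(f(X+dX) - f(X)\) with the linearized change from (1), and with the scalar-style guess \(3X^2(dX)\).

```python
import numpy as np

def cube(X):
    """f(X) = X^3 as a matrix product (not an elementwise power)."""
    return X @ X @ X

def d_cube(X, dX):
    """Linearized change from expression (1): (dX)X^2 + X(dX)X + X^2(dX)."""
    X2 = X @ X
    return dX @ X2 + X @ dX @ X + X2 @ dX

rng = np.random.default_rng(0)                # arbitrary seed (assumption)
n = 4                                         # arbitrary size (assumption)
X = rng.standard_normal((n, n))
dX = 1e-6 * rng.standard_normal((n, n))       # small perturbation

exact_change = cube(X + dX) - cube(X)
linear_change = d_cube(X, dX)
naive_change = 3 * X @ X @ dX                 # scalar-style guess, ignores ordering

scale = np.linalg.norm(exact_change)
rel_err = np.linalg.norm(exact_change - linear_change) / scale
naive_rel_err = np.linalg.norm(exact_change - naive_change) / scale

print("relative error of expression (1):", rel_err)
print("relative error of 3 X^2 dX:      ", naive_rel_err)
```

The first error should shrink roughly in proportion to the size of \(dX\), while the second does not, since it ignores the ordering of the factors.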

Next, consider \(f(X) = X^{-1}\), where \(X\) is an invertible matrix.
Since \(X^{-1}X = I\), \[ d(X^{-1}X) = d(I) = 0 \, \in \mathbb{R}^{n \times n}. \] Then by the product rule, \[ \begin{align*} &d(X^{-1}X) = d(X^{-1})X + X^{-1}(dX) = 0 \\\\ &\Longrightarrow d(X^{-1})X = - X^{-1}(dX). \end{align*} \] Multiplying both sides on the right by \(X^{-1}\) gives \[ df = d(X^{-1}) = - X^{-1}(dX)X^{-1}. \]
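The result \(d(X^{-1}) = -X^{-1}(dX)X^{-1}\) can be checked in the same way. This is a minimal NumPy sketch. The test matrix (a scaled random matrix plus a multiple of the identity, chosen so that it is comfortably invertible), the seed, and the perturbation size are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)                # arbitrary seed (assumption)
n = 4                                         # arbitrary size (assumption)

# Illustrative test matrix: the diagonal shift keeps X well away from singular.
X = n * np.eye(n) + 0.5 * rng.standard_normal((n, n))
dX = 1e-6 * rng.standard_normal((n, n))       # small perturbation

X_inv = np.linalg.inv(X)
exact_change = np.linalg.inv(X + dX) - X_inv  # f(X + dX) - f(X)
linear_change = -X_inv @ dX @ X_inv           # -X^{-1} (dX) X^{-1}

rel_err = (np.linalg.norm(exact_change - linear_change)
           / np.linalg.norm(exact_change))
print("relative error:", rel_err)
```

The two changes should agree up to terms of second order in \(dX\).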

Derivative of the LU decomposition matrix

In the LU decomposition, a square matrix \(X\) is factorized into a product of a lower triangular matrix \(L\) with unit diagonal entries, and an upper triangular matrix \(U\).
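By the product rule: \[ d(X) = d(LU) = (dL)U + L(dU) \] Note: \(dL\) and \(dU\) are also triangular. Because the unit diagonal of \(L\) is constant, \(dL\) is strictly lower triangular (its diagonal is zero), while \(dU\) is upper triangular.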
For example, let \(L = \begin{bmatrix} 1 & 0 \\ L_{21} & 1 \end{bmatrix}\) and \(U = \begin{bmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{bmatrix}\). Then we obtain their differentials: \(dL = \begin{bmatrix} 0 & 0 \\ d(L_{21}) & 0 \end{bmatrix}\) and \(dU = \begin{bmatrix} d(U_{11}) & d(U_{12}) \\ 0 & d(U_{22}) \end{bmatrix}\).
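Thus, \[ \begin{align*} d(X) &= \begin{bmatrix} 0 & 0 \\ d(L_{21}) & 0 \end{bmatrix}\begin{bmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ L_{21} & 1 \end{bmatrix}\begin{bmatrix} d( U_{11}) & d(U_{12}) \\ 0 & d(U_{22}) \end{bmatrix}\\\\ &= \begin{bmatrix} d( U_{11}) & d(U_{12}) \\ d(L_{21})U_{11} + L_{21}d(U_{11}) & d(L_{21})U_{12} + L_{21}d(U_{12}) + d(U_{22}) \end{bmatrix} \end{align*} \] This agrees with differentiating \(X = LU\) entry by entry: for instance, \(X_{21} = L_{21}U_{11}\) gives \(d(X_{21}) = d(L_{21})U_{11} + L_{21}d(U_{11})\).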
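The product rule \(d(X) = (dL)U + L(dU)\) can also be checked numerically in the \(2 \times 2\) case. The sketch below is only an illustration. `lu_2x2` is a hypothetical helper written for this example (LU without pivoting, assuming the top-left entry is nonzero), and the matrices `X` and `dX` are arbitrary choices. It factors \(X\) and \(X + dX\), forms \(dL\) and \(dU\) as differences of the factors, and compares \((dL)U + L(dU)\) with \(dX\).

```python
import numpy as np

def lu_2x2(X):
    """LU factors of a 2x2 matrix without pivoting (assumes X[0, 0] != 0)."""
    l21 = X[1, 0] / X[0, 0]
    L = np.array([[1.0, 0.0],
                  [l21, 1.0]])
    U = np.array([[X[0, 0], X[0, 1]],
                  [0.0, X[1, 1] - l21 * X[0, 1]]])
    return L, U

X = np.array([[4.0, 3.0],
              [6.0, 3.0]])                    # arbitrary example (assumption)
dX = 1e-6 * np.array([[1.0, -2.0],
                      [0.5, 3.0]])            # small perturbation (assumption)

L, U = lu_2x2(X)
L_new, U_new = lu_2x2(X + dX)
dL, dU = L_new - L, U_new - U                 # changes in the factors

# Product rule; the neglected second-order term is (dL)(dU).
product_rule = dL @ U + L @ dU

# (2,1) entry obtained by multiplying out the two products by hand.
entry_21 = dL[1, 0] * U[0, 0] + L[1, 0] * dU[0, 0]

print("dL =\n", dL)                           # only the (2,1) entry is nonzero
print("max |(dL)U + L(dU) - dX| =", np.abs(product_rule - dX).max())
```

The printed difference should be many orders of magnitude smaller than the entries of `dX`, since it consists only of the second-order term \((dL)(dU)\) and rounding error.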