Derivatives of square matrix functions
So far, we have discussed familiar derivatives such as gradients and Jacobian matrices. Here, we consider
the derivative of a matrix-valued function \(f\) with respect to its matrix input \(X\), where \(\frac{\partial f}{\partial X}\) denotes a linear operator acting on the perturbation \(dX\):
\[
df = \frac{\partial f}{\partial X}dX
\]
For example, consider \(f(X) = X^3\) where \(X \in \mathbb{R}^{n \times n}\). We want to find the most general
symbolic expression for the derivative of this function using differential notation.
By the product rule,
\[
\begin{align*}
df &= (dX) X X + X(dX)X +X X (dX) \\\\
&= (dX)X^2 + X(dX)X + X^2(dX) \tag{1}
\end{align*}
\]
Let's verify this expression using the definition of the total differential \(df\):
\[
\begin{align*}
df &= f(X + dX) - f(X) \\\\
&= (X+dX)^3 - X^3 \\\\
&= (X+dX)(X+dX)(X+dX) - X^3 \\\\
&= X^3 + X^2(dX) + X(dX)X + X(dX)^2 + (dX)X^2 + (dX)X(dX) + (dX)^2X + (dX)^3 -X^3
\end{align*}
\]
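Since the higher-order terms (those containing two or more factors of \(dX\)) are negligible as \(dX \to 0\), only the terms linear in \(dX\) remain after \(X^3\) cancels, and we recover expression (1).
Note: Matrix multiplication is not commutative, so the three terms in (1) cannot, in general, be combined into \(3X^2(dX)\).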
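As a quick numerical sanity check of expression (1), the following minimal sketch compares \(f(X + dX) - f(X)\) with the linear part \((dX)X^2 + X(dX)X + X^2(dX)\) for a small random perturbation. The matrix size, random seed, perturbation scale, and variable names are all illustrative assumptions, not part of the derivation.

```python
import numpy as np

# Assumed setup: a random 4x4 matrix and a small random perturbation.
rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n))
dX = 1e-6 * rng.standard_normal((n, n))

def f(A):
    """Matrix cube A^3 (matrix products, not elementwise powers)."""
    return A @ A @ A

# Exact change in f, including the higher-order terms in dX.
exact_change = f(X + dX) - f(X)

# Linear part from expression (1): (dX)X^2 + X(dX)X + X^2(dX).
linear_part = dX @ X @ X + X @ dX @ X + X @ X @ dX

# Scalar-style guess 3 X^2 dX, which ignores non-commutativity.
naive = 3 * X @ X @ dX

scale = np.linalg.norm(exact_change)
rel_err = np.linalg.norm(exact_change - linear_part) / scale
rel_err_naive = np.linalg.norm(exact_change - naive) / scale

print("relative error of expression (1):", rel_err)
print("relative error of 3 X^2 dX:     ", rel_err_naive)
```

The first error should shrink roughly in proportion to the size of \(dX\), because only the higher-order terms are left over, while the scalar-style guess does not improve as \(dX\) shrinks.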
Next, consider \(f(X) = X^{-1}\), where \(X\) is an invertible matrix.
Since \(X^{-1}X = I\),
\[
d(X^{-1}X) = d(I) = 0 \, \in \mathbb{R}^{n \times n}.
\]
Then by the product rule,
\[
\begin{align*}
&d(X^{-1}X) = d(X^{-1})X + X^{-1}(dX) = 0 \\\\
&\Longrightarrow d(X^{-1})X = - X^{-1}(dX).
\end{align*}
\]
Multiplying both sides on the right by \(X^{-1}\) gives
\[
df = d(X^{-1}) = - X^{-1}(dX)X^{-1}.
\]
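The same kind of numerical check can be sketched for the inverse. The example below compares \((X + dX)^{-1} - X^{-1}\) with \(-X^{-1}(dX)X^{-1}\). The test matrix (a small random perturbation of the identity, chosen only so that it is safely invertible), the seed, and the variable names are assumptions made for illustration.

```python
import numpy as np

# Assumed setup: identity plus a small random matrix, so X is well conditioned.
rng = np.random.default_rng(1)
n = 4
X = np.eye(n) + 0.1 * rng.standard_normal((n, n))
dX = 1e-6 * rng.standard_normal((n, n))

X_inv = np.linalg.inv(X)

# Exact change in the inverse under the perturbation dX.
exact_change_inv = np.linalg.inv(X + dX) - X_inv

# Linear part from the derivation: d(X^{-1}) = -X^{-1} (dX) X^{-1}.
linear_part_inv = -X_inv @ dX @ X_inv

rel_err_inv = (np.linalg.norm(exact_change_inv - linear_part_inv)
               / np.linalg.norm(linear_part_inv))

print("relative error of -X^{-1} dX X^{-1}:", rel_err_inv)
```

Note that \(dX\) sits between the two factors of \(X^{-1}\). The scalar formula \(-x^{-2}\,dx\) is the special case \(n = 1\), where the order of the factors no longer matters.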