Automatic Differentiation

Analytic Example of Reverse-Mode AD

To make the process of automatic differentiation concrete, let's walk through an analytic example using a composite scalar-valued function of two variables. We'll decompose the function into primitive operations, represent it as a computational graph, and compute its gradients using reverse-mode automatic differentiation (i.e., backpropagation).

Consider a function \[ f(x_1, x_2) = \log \left((x_1 + x_2)^2 + \sin(x_1 x_2) \right). \] We decompose this into primitive operations: \[ \begin{align*} &x_3 = x_1 + x_2 \\\\ &x_4 = x_3^2 \\\\ &x_5 = x_1 x_2 \\\\ &x_6 = \sin(x_5) \\\\ &x_7 = x_4 + x_6 \\\\ &x_8 = \log(x_7) = f \\\\ \end{align*} \]
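For instance, at the point \((x_1, x_2) = (1, 0.5)\) that the sample code below also uses, the forward pass evaluates these primitives to approximately \[ x_3 = 1.5,\quad x_4 = 2.25,\quad x_5 = 0.5,\quad x_6 = \sin(0.5) \approx 0.4794,\quad x_7 \approx 2.7294,\quad x_8 = \log(2.7294) \approx 1.0041. \]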

[Figure: Computational graph for f(x₁, x₂) = log((x₁ + x₂)² + sin(x₁x₂)). The inputs x₁ and x₂ feed the intermediate nodes x₃ = x₁ + x₂ and x₅ = x₁x₂, which feed x₄ = x₃² and x₆ = sin(x₅); these combine into x₇ = x₄ + x₆ and the output x₈ = log(x₇) = f. Forward-pass edges run from inputs to output, and the gradient flows back along them: ∂f/∂x₇ = 1/x₇, ∂f/∂x₄ = ∂f/∂x₆ = 1/x₇, ∂f/∂x₃ = 2x₃/x₇, ∂f/∂x₅ = cos(x₅)/x₇.]

This computational graph is a directed acyclic graph (DAG): every node depends only on values computed before it, so a single forward sweep evaluates the function and a single reverse sweep propagates gradients from the output back to the inputs.
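For concreteness, the same structure can be written down in code as a plain adjacency list; this is only a hypothetical bookkeeping sketch for illustration, not something the derivation below depends on:

    # Hypothetical adjacency-list view of the graph: each node maps to the
    # nodes that consume its value; the fan-out of x1 and x2 is visible directly.
    graph = {
        "x1": ["x3", "x5"],   # x3 = x1 + x2,  x5 = x1 * x2
        "x2": ["x3", "x5"],
        "x3": ["x4"],         # x4 = x3 ** 2
        "x5": ["x6"],         # x6 = sin(x5)
        "x4": ["x7"],         # x7 = x4 + x6
        "x6": ["x7"],
        "x7": ["x8"],         # x8 = log(x7) = f
    }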

Starting from the output and working backwards: \[ \begin{align*} \frac{\partial f}{\partial x_8} &= 1 \\\\ \frac{\partial f}{\partial x_7} &= \frac{\partial f}{\partial x_8} \cdot \frac{\partial x_8}{\partial x_7} \\\\ &= 1 \cdot \frac{1}{x_7} = \frac{1}{x_7} \\\\ \frac{\partial f}{\partial x_4} &= \frac{\partial f}{\partial x_7} \cdot \frac{\partial x_7}{\partial x_4} \\\\ &= \frac{1}{x_7} \cdot 1 = \frac{1}{x_7} \\\\ \frac{\partial f}{\partial x_6} &= \frac{\partial f}{\partial x_7} \cdot \frac{\partial x_7}{\partial x_6} \\\\ &= \frac{1}{x_7} \cdot 1 = \frac{1}{x_7} \\\\ \frac{\partial f}{\partial x_3} &= \frac{\partial f}{\partial x_4} \cdot \frac{\partial x_4}{\partial x_3} \\\\ &= \frac{1}{x_7} \cdot 2 x_3\\\\ \frac{\partial f}{\partial x_5} &= \frac{\partial f}{\partial x_6} \cdot \frac{\partial x_6}{\partial x_5} \\\\ &= \frac{1}{x_7} \cdot \cos(x_5)\\\\ \end{align*} \]

Notice that the input variables \(x_1\) and \(x_2\) each contribute to multiple intermediate nodes: \(x_1\) appears in both \(x_3 = x_1 + x_2\) and \(x_5 = x_1 x_2\), and the same holds for \(x_2\).

This means that, when computing the final derivatives, we must sum the gradient contributions from every path leading back to each input:

For \(\frac{\partial f}{\partial x_1}\): \[ \begin{align*} \frac{\partial f}{\partial x_1} &= \frac{\partial f}{\partial x_3} \cdot \frac{\partial x_3}{\partial x_1} + \frac{\partial f}{\partial x_5} \cdot \frac{\partial x_5}{\partial x_1} \\\\ &= \frac{2x_3}{x_7} \cdot 1 + \frac{\cos(x_5)}{x_7} \cdot x_2 \\\\ &= \frac{1}{x_7} \left[2x_3 + x_2 \cos(x_5)\right] \end{align*} \]

For \(\frac{\partial f}{\partial x_2}\): \[ \begin{align*} \frac{\partial f}{\partial x_2} &= \frac{\partial f}{\partial x_3} \cdot \frac{\partial x_3}{\partial x_2} + \frac{\partial f}{\partial x_5} \cdot \frac{\partial x_5}{\partial x_2} \\\\ &= \frac{2x_3}{x_7} \cdot 1 + \frac{\cos(x_5)}{x_7} \cdot x_1 \\\\ &= \frac{1}{x_7} \left[2x_3 + x_1 \cos(x_5)\right] \end{align*} \]

Finally, replacing the intermediate variables with their expressions in terms of \(x_1\) and \(x_2\) gives the derivatives:

\[ \boxed{ \begin{align*} \frac{\partial f}{\partial x_1} &= \frac{2(x_1 + x_2) + x_2 \cos(x_1 x_2)}{(x_1 + x_2)^2 + \sin(x_1 x_2)} \\\\ \frac{\partial f}{\partial x_2} &= \frac{2(x_1 + x_2) + x_1 \cos(x_1 x_2)}{(x_1 + x_2)^2 + \sin(x_1 x_2)} \end{align*} } \]
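As a quick sanity check on these boxed expressions, the derivatives can also be computed symbolically; the snippet below is a minimal sketch assuming SymPy is available:

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    f = sp.log((x1 + x2)**2 + sp.sin(x1 * x2))

    # Symbolic partial derivatives; these should be algebraically
    # equal to the boxed expressions above.
    print(sp.simplify(sp.diff(f, x1)))
    print(sp.simplify(sp.diff(f, x2)))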

The power of automatic differentiation lies in its systematic approach:

  1. Decompose complex functions into simple primitive operations
  2. Apply the chain rule mechanically through the computational graph
  3. Sum gradients when variables contribute through multiple paths

This process can be fully automated, making it the backbone of modern deep learning frameworks.
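To show how little machinery steps 2 and 3 need, here is a minimal, self-contained sketch of a scalar reverse-mode engine in plain Python; the Var class and the helper functions are hypothetical names invented for this illustration (real frameworks implement the same idea for tensors):

    import math

    class Var:
        """Scalar value that records its parent nodes and the local derivatives to them."""
        def __init__(self, value, parents=()):
            self.value = value
            self.parents = parents   # sequence of (parent Var, d(self)/d(parent))
            self.grad = 0.0

        def __add__(self, other):
            return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

        def __mul__(self, other):
            return Var(self.value * other.value, [(self, other.value), (other, self.value)])

    def square(a):
        return Var(a.value ** 2, [(a, 2.0 * a.value)])

    def sin(a):
        return Var(math.sin(a.value), [(a, math.cos(a.value))])

    def log(a):
        return Var(math.log(a.value), [(a, 1.0 / a.value)])

    def backward(output):
        """Reverse pass: accumulate d(output)/d(node) into node.grad for every node."""
        order, seen = [], set()
        def visit(node):                      # topological order via depth-first search
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)
        visit(output)
        output.grad = 1.0
        for node in reversed(order):          # walk from the output back to the inputs
            for parent, local in node.parents:
                parent.grad += node.grad * local   # chain rule, summed over all paths

    # The function from the worked example, evaluated at (1.0, 0.5)
    x1, x2 = Var(1.0), Var(0.5)
    f = log(square(x1 + x2) + sin(x1 * x2))
    backward(f)
    print(f.value, x1.grad, x2.grad)

Each primitive only records its local derivatives during the forward pass; the reverse sweep applies the chain rule and sums over paths, reproducing the hand-derived gradients at \((x_1, x_2) = (1, 0.5)\).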

Applications of AD

Automatic differentiation is a core component of modern computational systems that require efficient and accurate derivatives. In particular, it powers essentially every major deep learning framework, including PyTorch, TensorFlow, and JAX.

These systems rely on automatic differentiation to:

  1. Train neural networks by computing gradients of loss functions with respect to millions (or billions) of parameters
  2. Optimize black-box functions in physics simulation, robotics, and finance
  3. Perform end-to-end differentiation through control flow, dynamic loops, and even solver calls (e.g., differentiable physics)
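
As a small illustration of the third point, reverse-mode frameworks can differentiate straight through ordinary Python control flow. The sketch below uses PyTorch; the iterative update rule and step count are made up purely for illustration:

    import torch

    def rollout(theta, steps=10):
        """Toy iterative 'simulation'; each step depends on the parameter theta."""
        x = torch.tensor(1.0)
        for _ in range(steps):
            x = x + theta * torch.sin(x)   # every iteration is recorded on the autograd tape
        return x

    theta = torch.tensor(0.3, requires_grad=True)
    loss = (rollout(theta) - 2.0) ** 2
    loss.backward()        # the gradient flows back through all loop iterations
    print(theta.grad)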

Sample Code

    import numpy as np

    class AutoDiffNode:
        """Node in the computational graph for automatic differentiation.

        Illustrative stub only: the worked example below unrolls the graph by hand
        instead of building it out of these nodes.
        """
        def __init__(self, value, grad=0.0):
            self.value = value
            self.grad = grad
            self.children = []  # Nodes that depend on this node
            self.local_gradients = []  # Local gradients to children

    def manual_autodiff_example(x1_val, x2_val):
        """
        Manual implementation of automatic differentiation for:
        f(x1, x2) = log((x1 + x2)^2 + sin(x1 * x2))

        This demonstrates the forward and backward pass explicitly.
        """
        print(f"Computing f({x1_val}, {x2_val}) = log((x1 + x2)² + sin(x1 * x2))")
        print("="*60)

        # Forward Pass - Compute function value
        print("FORWARD PASS:")
        x1 = x1_val
        x2 = x2_val
        print(f"x1 = {x1}")
        print(f"x2 = {x2}")

        x3 = x1 + x2
        print(f"x3 = x1 + x2 = {x3}")

        x4 = x3**2
        print(f"x4 = x3² = {x4}")

        x5 = x1 * x2
        print(f"x5 = x1 * x2 = {x5}")

        x6 = np.sin(x5)
        print(f"x6 = sin(x5) = {x6}")

        x7 = x4 + x6
        print(f"x7 = x4 + x6 = {x7}")

        x8 = np.log(x7)
        f = x8
        print(f"x8 = log(x7) = {f}")
        print(f"\nFunction value: f = {f}")

        # Backward Pass - Compute gradients
        print("\n" + "="*60)
        print("BACKWARD PASS:")

        # Initialize gradient
        df_dx8 = 1.0
        print(f"∂f/∂x8 = {df_dx8}")

        # x8 = log(x7)
        df_dx7 = df_dx8 * (1.0 / x7)
        print(f"∂f/∂x7 = ∂f/∂x8 * ∂x8/∂x7 = {df_dx8} * (1/{x7}) = {df_dx7}")

        # x7 = x4 + x6
        df_dx4 = df_dx7 * 1.0
        df_dx6 = df_dx7 * 1.0
        print(f"∂f/∂x4 = ∂f/∂x7 * ∂x7/∂x4 = {df_dx7} * 1 = {df_dx4}")
        print(f"∂f/∂x6 = ∂f/∂x7 * ∂x7/∂x6 = {df_dx7} * 1 = {df_dx6}")

        # x4 = x3²
        df_dx3 = df_dx4 * (2 * x3)
        print(f"∂f/∂x3 = ∂f/∂x4 * ∂x4/∂x3 = {df_dx4} * 2*{x3} = {df_dx3}")

        # x6 = sin(x5)
        df_dx5 = df_dx6 * np.cos(x5)
        print(f"∂f/∂x5 = ∂f/∂x6 * ∂x6/∂x5 = {df_dx6} * cos({x5}) = {df_dx5}")

        # Now accumulate gradients for x1 and x2
        # x3 = x1 + x2
        df_dx1_from_x3 = df_dx3 * 1.0
        df_dx2_from_x3 = df_dx3 * 1.0

        # x5 = x1 * x2
        df_dx1_from_x5 = df_dx5 * x2
        df_dx2_from_x5 = df_dx5 * x1

        # Sum gradients from all paths
        df_dx1 = df_dx1_from_x3 + df_dx1_from_x5
        df_dx2 = df_dx2_from_x3 + df_dx2_from_x5

        print(f"\n∂f/∂x1 = ∂f/∂x3 * ∂x3/∂x1 + ∂f/∂x5 * ∂x5/∂x1")
        print(f"       = {df_dx3} * 1 + {df_dx5} * {x2}")
        print(f"       = {df_dx1_from_x3} + {df_dx1_from_x5}")
        print(f"       = {df_dx1}")

        print(f"\n∂f/∂x2 = ∂f/∂x3 * ∂x3/∂x2 + ∂f/∂x5 * ∂x5/∂x2")
        print(f"       = {df_dx3} * 1 + {df_dx5} * {x1}")
        print(f"       = {df_dx2_from_x3} + {df_dx2_from_x5}")
        print(f"       = {df_dx2}")

        # Verify with the closed form
        print("\n" + "="*60)
        print("VERIFICATION WITH CLOSED FORM:")
        expected_df_dx1 = (2*(x1 + x2) + x2*np.cos(x1*x2)) / ((x1 + x2)**2 + np.sin(x1*x2))
        expected_df_dx2 = (2*(x1 + x2) + x1*np.cos(x1*x2)) / ((x1 + x2)**2 + np.sin(x1*x2))

        print(f"Expected ∂f/∂x1 = {expected_df_dx1}")
        print(f"Expected ∂f/∂x2 = {expected_df_dx2}")
        print(f"Error in ∂f/∂x1: {abs(df_dx1 - expected_df_dx1)}")
        print(f"Error in ∂f/∂x2: {abs(df_dx2 - expected_df_dx2)}")

        return f, df_dx1, df_dx2


    def pytorch_autodiff_example(x1_val, x2_val):
        """
        PyTorch implementation showing how modern autodiff frameworks handle this
        """
        import torch

        print("\n" + "="*60)
        print("PYTORCH AUTOMATIC DIFFERENTIATION:")

        # Create tensors with gradient tracking
        x1 = torch.tensor(x1_val, requires_grad=True, dtype=torch.float32)
        x2 = torch.tensor(x2_val, requires_grad=True, dtype=torch.float32)

        # Define the function
        f = torch.log((x1 + x2)**2 + torch.sin(x1 * x2))

        # Compute gradients
        f.backward()

        print(f"f({x1_val}, {x2_val}) = {f.item()}")
        print(f"∂f/∂x1 = {x1.grad.item()}")
        print(f"∂f/∂x2 = {x2.grad.item()}")

        return f.item(), x1.grad.item(), x2.grad.item()


    def gradient_check(x1, x2, epsilon=1e-7):
        """
        Numerical gradient checking using central finite differences
        """
        # Compute analytical gradients (the function value itself is not needed here)
        _, df_dx1, df_dx2 = manual_autodiff_example(x1, x2)

        print("\n" + "="*60)
        print("NUMERICAL GRADIENT CHECK:")

        def eval_f(x1_val, x2_val):
            return np.log((x1_val + x2_val)**2 + np.sin(x1_val * x2_val))

        # Numerical gradient for x1
        f_plus_x1 = eval_f(x1 + epsilon, x2)
        f_minus_x1 = eval_f(x1 - epsilon, x2)
        numerical_df_dx1 = (f_plus_x1 - f_minus_x1) / (2 * epsilon)

        # Numerical gradient for x2
        f_plus_x2 = eval_f(x1, x2 + epsilon)
        f_minus_x2 = eval_f(x1, x2 - epsilon)
        numerical_df_dx2 = (f_plus_x2 - f_minus_x2) / (2 * epsilon)

        print(f"Analytical ∂f/∂x1: {df_dx1}")
        print(f"Numerical  ∂f/∂x1: {numerical_df_dx1}")
        print(f"Difference: {abs(df_dx1 - numerical_df_dx1)}")

        print(f"\nAnalytical ∂f/∂x2: {df_dx2}")
        print(f"Numerical  ∂f/∂x2: {numerical_df_dx2}")
        print(f"Difference: {abs(df_dx2 - numerical_df_dx2)}")


    if __name__ == "__main__":
        # Test with specific values
        x1 = 1.0
        x2 = 0.5

        # Manual implementation
        f_manual, grad_x1_manual, grad_x2_manual = manual_autodiff_example(x1, x2)

        # PyTorch implementation
        f_pytorch, grad_x1_pytorch, grad_x2_pytorch = pytorch_autodiff_example(x1, x2)

        # Numerical gradient check
        gradient_check(x1, x2)