Automatic Differentiation

Recall the chain rule

Any computer program can be thought of as a composition of functions, i.e.

\[ output = f(g(h(input))) = f(g(h(w_0))) = f(g(w_1)) = f(w_2) = w_3 \]

where

\[\begin{align*} w_0 &= input \\ w_1 &= h(w_0) \\ w_2 &= g(w_1) \\ w_3 &= f(w_2) = output \end{align*}\]

If we want to compute

\[ \frac{d}{d(input)}(output) \]

we can use the chain rule, i.e.

\[ \frac{d}{d(input)}(output) = \frac{d(output)}{dw_2}\frac{dw_2}{dw_1}\frac{dw_1}{d(input)} \]

The computer can do this for us!
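To see the chain rule in action before automating it, here is a minimal sketch that applies it by hand to a concrete composition (the functions h1, g1, f1 and the evaluation point are illustrative, named to avoid clashing with the f defined below):

h1(x) = x^2      # w1 = h1(w0)
g1(x) = x + 1    # w2 = g1(w1)
f1(x) = 3 * x    # w3 = f1(w2), so output = 3 * (input^2 + 1)

# Analytic derivative of each stage, multiplied outermost to innermost.
dh1(x) = 2 * x
dg1(x) = 1.0
df1(x) = 3.0

w0 = 2.0
w1 = h1(w0)
w2 = g1(w1)
@show df1(w2) * dg1(w1) * dh1(w0);   # 12.0, matching d/dx 3(x^2 + 1) = 6x at x = 2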

Forward vs. Reverse Mode AD

  • Forward AD traverses the chain rule from inside to outside (a minimal dual-number sketch follows this list)

    • Easy to implement

    • Lower memory footprint

  • Reverse AD traverses the chain rule from outside to inside

    • Requires far fewer operations than forward AD for functions with many inputs and few outputs (e.g. gradients)

    • More difficult to implement

    • Larger memory footprint, since intermediate values from the forward pass must be stored
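To make forward mode concrete, here is a minimal sketch of forward-mode AD via dual numbers. It is illustrative only, not how ForwardDiff.jl is implemented internally; the Dual type and its methods are assumptions of this sketch:

struct Dual
  val::Float64   # value of the expression
  der::Float64   # derivative of the expression w.r.t. the input
end

# Each primitive operation propagates the derivative alongside the value,
# working from the inside of the composition outward.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.:*(a::Real, b::Dual) = Dual(a * b.val, a * b.der)
Base.:^(a::Dual, n::Integer) = Dual(a.val^n, n * a.val^(n - 1) * a.der)

# Seed the input with derivative 1, then read the derivative off the output.
x = Dual(3.0, 1.0)
y = 3 * x^2
@show y.der;   # 18.0, i.e. 6x at x = 3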

ForwardDiff.jl

Forward mode AD in Julia

Let’s start with the example \(f(x) = 3 x^2\), which of course has the analytic derivative

\[ f'(x) = 6 x \]

using ForwardDiff

f(x) = 3*x^2

ForwardDiff.derivative(f, 3)
18
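As a quick check, the AD result agrees with the analytic derivative 6x at x = 3:

ForwardDiff.derivative(f, 3) ≈ 6 * 3   # true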

In-place functions

ForwardDiff also handles functions that write their results into a preallocated output array:

function f!(y, x)
  # write the results into the preallocated output y rather than allocating
  y[1] = 3.0 * x^2
  y[2] = x^4
  nothing
end

y = [0.0, 0.0]
f!(y, 3)
@show y;
y = [27.0, 81.0]
y = [0.0, 0.0]
ForwardDiff.derivative(f!, y, 3)
2-element Vector{Float64}:
  18.0
 108.0
y = [0.0, 0.0]
dy = [0.0, 0.0]
ForwardDiff.derivative!(dy, f!, y, 3)
@show dy;
dy = [18.0, 108.0]

Gradients

Recall the gradient of a scalar function is

\[ \nabla f(x_1, x_2) = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}\right] \]

g(x) = 3.0 * x[1]^2 + x[2]^4
g([1.0, 2.0])
19.0
ForwardDiff.gradient(g, [1.0, 2.0])
2-element Vector{Float64}:
  6.0
 32.0
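There is also an in-place variant that writes the gradient into a preallocated vector, mirroring derivative! above (a minimal sketch; the name dgrad is ours):

dgrad = [0.0, 0.0]
ForwardDiff.gradient!(dgrad, g, [1.0, 2.0])
@show dgrad;   # dgrad = [6.0, 32.0]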
If the function also takes parameters, fix them with an anonymous function (closure) so that the gradient is taken with respect to x alone:

h(x, p) = p[1] * x[1]^2 + p[2] * x[2]^4

h([1.0, 2.0], [3.0, 1.0])
19.0
ForwardDiff.gradient(x -> h(x, [3.0, 1.0]), [1.0, 2.0])
2-element Vector{Float64}:
  6.0
 32.0
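The same closure trick differentiates with respect to the parameters instead (an illustrative sketch):

# Fix x and treat p as the variable of differentiation.
ForwardDiff.gradient(p -> h([1.0, 2.0], p), [3.0, 1.0])   # expected [1.0, 16.0], i.e. [x[1]^2, x[2]^4]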

Jacobian

Recall the Jacobian of a vector function is

\[\begin{split} \nabla \vec{f}(x_1, x_2) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} \end{bmatrix} \end{split}\]

function f2(x)
  [3.0 * x[1]^2, x[1] * x[2]^4]
end

f2([3.0, 1.0])
2-element Vector{Float64}:
 27.0
  3.0
ForwardDiff.jacobian(f2, [3.0, 1.0])
2×2 Matrix{Float64}:
 18.0   0.0
  1.0  12.0
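As with gradients, the Jacobian can be written into a preallocated matrix with jacobian! (a minimal sketch; the name J is ours):

J = zeros(2, 2)
ForwardDiff.jacobian!(J, f2, [3.0, 1.0])
@show J;   # J now holds the same 2×2 matrix as above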

Other AD packages in Julia

  • Zygote.jl - Reverse-mode AD, source-to-source (a usage sketch follows this list)

  • Diffractor.jl - Next-generation AD package with an API similar to Zygote's
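For comparison, the derivative from the first example computed with Zygote's reverse mode looks like this (a minimal sketch, assuming Zygote.jl is installed):

using Zygote

# gradient returns a tuple with one entry per argument.
Zygote.gradient(x -> 3 * x^2, 3.0)   # (18.0,)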