Automatic Differentiation
Recall the chain rule
Any computer program can be thought of as a composition of functions, i.e.
\[
output = f(g(h(input))) = f(g(h(w_0))) = f(g(w_1)) = f(w_2) = w_3
\]
where
\[\begin{align*}
w_0 &= input \\
w_1 &= h(w_0) \\
w_2 &= g(w_1) \\
w_3 &= f(w_2) = output
\end{align*}\]
If we want to compute
\[
\frac{d}{d(input)}(output)
\]
we can use the chain rule, i.e.
\[
\frac{d}{d(input)}(output) = \frac{d(output)}{dw_2}\frac{dw_2}{dw_1}\frac{dw_1}{d(input)}
\]
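For a concrete instance (an illustrative choice, not part of the general argument), take \(h(x) = x^2\), \(g(w) = \sin(w)\), and \(f(w) = 3w\), so that \(output = 3\sin(x^2)\). The chain rule then gives
\[
\frac{d}{dx}\, 3\sin(x^2) = \underbrace{3}_{f'(w_2)} \cdot \underbrace{\cos(x^2)}_{g'(w_1)} \cdot \underbrace{2x}_{h'(w_0)} = 6x\cos(x^2)
\]
and automatic differentiation builds exactly this product of local derivatives, one factor per elementary operation.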
The computer can do this for us!
Forward vs. Reverse Mode AD
- Forward AD traverses the chain rule from inside to outside (a minimal dual-number sketch follows this list)
  - Easy to implement
  - Lower memory footprint
- Reverse AD traverses the chain rule from outside to inside
  - Requires half the operations of Forward AD
  - More difficult to implement
  - Larger memory footprint
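To make the forward mode idea concrete, here is a toy dual-number sketch. The Dual type and the operator overloads below are illustrative only, not ForwardDiff.jl's actual internals: each value carries its derivative along, and every arithmetic operation updates both.

# A toy dual number: carries a value and its derivative together
struct Dual
    val::Float64   # function value
    der::Float64   # derivative with respect to the input
end

import Base: +, *, ^
+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)  # product rule
*(a::Number, b::Dual) = Dual(a * b.val, a * b.der)
^(a::Dual, n::Integer) = Dual(a.val^n, n * a.val^(n - 1) * a.der)         # power rule

# Seed the input with derivative 1, then run the program as usual
x = Dual(3.0, 1.0)
3 * x^2   # Dual(27.0, 18.0): the value and the derivative in one forward pass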
ForwardDiff.jl
ForwardDiff.jl implements forward mode AD in Julia.
Let’s start with the example \(f(x) = 3 x^2\), which of course has the analytic derivative
\[
f'(x) = 6 x
\]
using ForwardDiff
f(x) = 3*x^2
ForwardDiff.derivative(f, 3)
18
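Since ForwardDiff.derivative differentiates at a single point, a comprehension (a quick usage sketch) checks it against the analytic derivative \(f'(x) = 6 x\) at several points:

[ForwardDiff.derivative(f, x) for x in 0.0:3.0]
# expected: [0.0, 6.0, 12.0, 18.0], matching f'(x) = 6x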
In-place functions
function f!(y, x)
    y[1] = 3.0 * x^2
    y[2] = x^4
    nothing
end
y = [0.0, 0.0]
f!(y, 3)
@show y;
y = [27.0, 81.0]
y = [0.0, 0.0]
ForwardDiff.derivative(f!, y, 3)
2-element Vector{Float64}:
18.0
108.0
y = [0.0, 0.0]
dy = [0.0, 0.0]
ForwardDiff.derivative!(dy, f!, y, 3)
@show dy;
dy = [18.0, 108.0]
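Because derivative! writes into a preallocated array, the same buffers can be reused across many evaluation points without allocating. A minimal sketch, reusing y and dy from above:

for x in 1.0:3.0
    ForwardDiff.derivative!(dy, f!, y, x)
    println("x = $x, dy = $dy")   # dy should equal [6x, 4x^3]
end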
Gradients
Recall that the gradient of a scalar function is
\[
\nabla f(x_1, x_2) = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}\right]
\]
g(x) = 3.0 * x[1]^2 + x[2]^4
g([1.0, 2.0])
19.0
ForwardDiff.gradient(g, [1.0, 2.0])
2-element Vector{Float64}:
6.0
32.0
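As with derivative!, there is an in-place gradient! variant that writes the result into a preallocated array (a minimal sketch):

dg = [0.0, 0.0]
ForwardDiff.gradient!(dg, g, [1.0, 2.0])
@show dg;   # expected: dg = [6.0, 32.0]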
h(x, p) = p[1] * x[1]^2 + p[2] * x[2]^4
h([1.0, 2.0], [3.0, 1.0])
19.0
ForwardDiff.gradient(x -> h(x, [3.0, 1.0]), [1.0, 2.0])
2-element Vector{Float64}:
6.0
32.0
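The same closure trick works for differentiating with respect to the parameters instead, holding x fixed (a sketch):

ForwardDiff.gradient(p -> h([1.0, 2.0], p), [3.0, 1.0])
# expected: [1.0, 16.0], i.e. [x[1]^2, x[2]^4] at x = [1.0, 2.0]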
Jacobian
Recall that the Jacobian of a vector-valued function is
\[\begin{split}
\nabla \vec{f}(x_1, x_2) =
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} \\
\frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2}
\end{bmatrix}
\end{split}\]
function f2(x)
    [3.0 * x[1]^2, x[1] * x[2]^4]
end
f2([3.0, 1.0])
2-element Vector{Float64}:
27.0
3.0
ForwardDiff.jacobian(f2, [3.0, 1.0])
2×2 Matrix{Float64}:
18.0 0.0
1.0 12.0
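jacobian also has an in-place form mirroring the derivative examples above; here is a sketch using f2!, a hypothetical in-place version of f2 defined just for illustration:

function f2!(y, x)
    y[1] = 3.0 * x[1]^2
    y[2] = x[1] * x[2]^4
    nothing
end

y = [0.0, 0.0]
ForwardDiff.jacobian(f2!, y, [3.0, 1.0])
# expected: the same 2×2 matrix as above, with y overwritten by f2([3.0, 1.0])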
Other AD packages in Julia
- Zygote.jl - Reverse mode AD, source-to-source (see the sketch below)
- Diffractor.jl - Next-generation AD package with an API similar to Zygote
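For comparison, Zygote computes the same gradient in reverse mode. A sketch, assuming Zygote.jl is installed; note that Zygote.gradient returns a tuple with one entry per argument:

using Zygote
Zygote.gradient(g, [1.0, 2.0])[1]
# expected: [6.0, 32.0], matching ForwardDiff.gradient above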