Given a design point \(\mathbf{x}_0\), the aim is to find \(\nabla f(\mathbf{x}_0)\) for a sufficiently smooth objective function \(f\). The gradient of \(f\) with respect to \(\mathbf{x} = [x_1, \ldots, x_n]^T\) is given by
\[\nabla f(\mathbf{x}) = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right]^{T}.\]
Even though we will look at various algorithms for calculating the full gradient \(\nabla f(\mathbf{x})\), some applications only require the directional derivative of \(f\) along a descent direction \(\mathbf{d}\), namely \(\nabla f(\mathbf{x}) \cdot \mathbf{d}\). Computing a single directional derivative is much cheaper than computing the entire gradient.
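As a rough sketch (using a hypothetical test function, not one from the text), a forward difference gives the directional derivative with a single extra function evaluation, regardless of the number of design variables, whereas a forward-difference gradient needs one extra evaluation per variable:

using LinearAlgebra   # for norm and dot

# Arbitrary smooth test function (illustrative only)
f(x) = sum(x .^ 2) + sin(x[1])

x = [1.0, 2.0, 3.0]
d = [1.0, 0.0, 1.0]; d /= norm(d)   # a unit direction
h = 1e-6

# Directional derivative: one extra evaluation of f, whatever length(x) is
dir_deriv = (f(x + h*d) - f(x)) / h

# Full forward-difference gradient: one extra evaluation per component
unit(i, n) = [j == i ? 1.0 : 0.0 for j in 1:n]
g = [(f(x + h*unit(i, length(x))) - f(x)) / h for i in 1:length(x)]

# The two estimates agree up to truncation error
println(dir_deriv, "  vs  ", dot(g, d))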
We outline the most popular methods for gradient calculation and compare them in terms of computational cost and accuracy.
Finite Difference Methods
The finite difference method is the most common way to calculate derivatives of a function. It is based on the Taylor series expansion of the function about a point \(x_0\):
\[f(x_0 + \Delta x) = f(x_0) + f'(x_0)\,\Delta x + \frac{f''(x_0)}{2!}\,\Delta x^2 + \frac{f'''(x_0)}{3!}\,\Delta x^3 + \cdots\]
Truncating this series and rearranging gives the standard difference schemes:

| Scheme | Approximation to \(f'(x_0)\) | Order of accuracy |
|---|---|---|
| Forward difference | \(\dfrac{f(x_0+\Delta x)-f(x_0)}{\Delta x}\) | \(O(\Delta x)\) |
| Backward difference | \(\dfrac{f(x_0)-f(x_0-\Delta x)}{\Delta x}\) | \(O(\Delta x)\) |
| Central difference | \(\dfrac{f(x_0+\Delta x)-f(x_0-\Delta x)}{2\Delta x}\) | \(O(\Delta x^2)\) |

The third column in the table lists the order of accuracy of each scheme. More accurate approximations can be obtained by retaining more terms of the series (i.e. using more function evaluations per difference), but this makes each derivative more expensive to compute. Another limitation arises in practical PDE problems that contain discontinuities (e.g. heat flow across the interface between two media), where the Taylor series approximation is no longer valid.
# A comparison between forward, central and backward difference schemes
using Plots, ForwardDiff

function plotPerformance(f, x₀, pt)
    n = 15
    Δx = zeros(n, 1)
    for i = 1:n
        Δx[i] = 10.0^-i
    end
    ϵ = zeros(n, 3)
    ∇f = ForwardDiff.derivative(f, x₀)   # reference derivative
    for j = 1:n
        ϵ[j, 1] = abs((f(x₀ + Δx[j]) - f(x₀)) / Δx[j] - ∇f)            # forward
        ϵ[j, 2] = abs((f(x₀) - f(x₀ - Δx[j])) / Δx[j] - ∇f)            # backward
        ϵ[j, 3] = abs((f(x₀ + Δx[j]) - f(x₀ - Δx[j])) / (2Δx[j]) - ∇f) # central
    end
    pt = plot!(pt, Δx, ϵ, xaxis=:log, yaxis=:log,
               xlabel="Step Size", ylabel="Error", markershape=:auto,
               label=["Forward difference" "Backward difference" "Central difference"])
    return pt
end

pt1 = plotPerformance(x -> sin(x), π/100, plot(title="Performance at x=π/100"))
pt2 = plotPerformance(x -> sin(x), π/4, plot(title="Performance at x=π/4"))
pt3 = plotPerformance(x -> sin(x), π/2, plot(title="Performance at x=π/2"))
plot(pt1, pt2, pt3, layout=(3, 1), size=(800, 800))
Sensitivity to step-size
An inappropriate step size can harm both the stability and the efficiency of the optimization.
The main sources of error in finite difference methods are truncation error and subtractive cancellation error. Truncation error is the error made by truncating an infinite sum (e.g. the Taylor series) and approximating it by a finite sum. As the plot shows, the total error first decreases with the step size until it reaches a minimum; as the step size is reduced further, subtractive cancellation error due to finite precision arithmetic begins to dominate.
The optimal step size can be determined from a sensitivity analysis that minimizes the total numerical error, but this introduces an additional computational burden.
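As a sketch of the usual rule of thumb (not an analysis from the text): balancing truncation error against cancellation error suggests \(\Delta x^* \approx \sqrt{\epsilon}\) for the forward difference and \(\Delta x^* \approx \epsilon^{1/3}\) for the central difference, where \(\epsilon\) is the machine precision and \(f\) and its derivatives are assumed to be of order one. The snippet below checks these step sizes for \(f(x)=\sin x\):

# Rule-of-thumb optimal step sizes, assuming f and its derivatives are O(1)
ϵ = eps(Float64)        # machine precision ≈ 2.2e-16
Δx_fwd = sqrt(ϵ)        # ≈ 1.5e-8, forward difference
Δx_cen = cbrt(ϵ)        # ≈ 6.1e-6, central difference

f(x)  = sin(x)
x₀    = π/4
exact = cos(x₀)

err_fwd = abs((f(x₀ + Δx_fwd) - f(x₀)) / Δx_fwd - exact)
err_cen = abs((f(x₀ + Δx_cen) - f(x₀ - Δx_cen)) / (2Δx_cen) - exact)
println("forward: ", err_fwd, "   central: ", err_cen)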
Complex Variable Method (CVM)
This method was first introduced by @squire1998.
Given a function that is infinitely differentiable and can be smoothly extended into the complex plane, we can write it as \(F(z)\) and expand it about \(x_0\) using the Taylor series with a purely imaginary step \(ih\):
\[F(x_0 + ih) = F(x_0) + ih\,F'(x_0) - \frac{h^2}{2!}F''(x_0) - \frac{ih^3}{3!}F'''(x_0) + \cdots\]
Taking the imaginary part of both sides and dividing by \(h\) gives
\[F'(x_0)=\frac{\operatorname{Im}\left[F(x_0+ih)\right]}{h}+O(h^2)\]
Evaluating \(F\) at the complex argument \(x_0+ih\), taking the imaginary part, and dividing by \(h\) therefore gives an approximation to the derivative \(F'(x_0)\) that is accurate to order \(O(h^2)\). Because no difference of nearly equal quantities is involved, there is no subtractive cancellation, so \(h\) can be chosen extremely small.
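Before the Python examples below, here is a minimal check of this formula in Julia (using \(F(x)=\sin x\) as an assumed test function): even with an absurdly small step the estimate matches \(\cos x_0\) to machine precision, precisely because no cancellation occurs.

# Complex step derivative of sin(x): imag(sin(x₀ + im*h)) / h
x₀ = π/4
h  = 1e-200                      # far smaller than any usable finite difference step
cs = imag(sin(x₀ + im*h)) / h
println(abs(cs - cos(x₀)))       # difference from cos(x₀) is at machine-precision level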
The following code implements the complex step method in Python, first for an explicit function and then for an implicit function \(f(x) = u^2\), where \(u\) satisfies \(u^3 + u = x\).
Explicit function
Code
import numpy as np

def f(x):
    return x**3 + x**4

def df(x, h):
    # Complex step derivative: evaluate f at x + ih and take the imaginary part
    return np.imag(f(x + 1j*h) / h)

# Test the function
x = 1.0
h = 1e-8
print(df(x, h))
Implicit function
Code
# Given input x, this function calculates f(x). Since f(x) = u^2, we need to
# solve u^3 + u = x to get u. The cubic is solved with a hand-written
# Newton-Raphson iteration rather than a library routine.
import numpy as np

def f(x):
    # Initial guess for u
    u = 0.5
    # Newton-Raphson iterations for u^3 + u - x = 0
    for i in range(100):
        u = u - (u**3 + u - x) / (3*u**2 + 1)
    return u**2

x = 2.0
print(f(x))
1.0
Now we calculate the derivative of the same implicit function using the complex step method; the Newton-Raphson iteration works unchanged with complex arguments.
Code
# Complex step derivative of the implicit function f defined above
def df(x, h):
    return np.imag(f(x + 1j*h) / h)

# Test the function
x = 2.0
h = 1e-8
print(df(x, h))
0.5
As can be seen, no extra effort is required to make the function handle complex variables. However, things are not so straightforward in compiled languages like Fortran and C.
Fortran Example
program complex_step
    implicit none
    real(8) :: x, h
    x = 1.0d0
    h = 1.0d-8
    print *, df(x, h)
contains
    ! The function must be rewritten to accept and return complex values
    complex(8) function f(z)
        complex(8), intent(in) :: z
        f = z**3 + z**4
    end function f

    ! Complex step derivative: take the imaginary part and divide by h
    real(8) function df(x, h)
        real(8), intent(in) :: x, h
        df = aimag(f(cmplx(x, h, kind=8))) / h
    end function df
end program complex_step
A more complete treatment of the subject can be found in this paper by Martins.
Automatic Differentiation (AD)
AD is the most promising method for calculating gradients and Hessians. AD goes through the source code of a function and differentiates it line by line analytically, propagating the perturbations of intermediate variables computed in earlier lines. AD tools are available for almost all languages, with varying degrees of sophistication; it is easier to build an AD tool for lower-level languages such as Fortran and C. Packages like Tapenade have matured to the point that AD has been applied successfully to industrial CFD software.
AD tools for higher-level languages tend to be less robust. Julia has many experimental AD packages that use
Source transformation
Operator overloading
Dual numbers
Hyper-dual numbers
to calculate \(n^{th}\) order derivatives for native Julia functions.
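For example, the ForwardDiff package (already used above to provide reference derivatives) implements forward-mode AD with dual numbers for native Julia functions; a minimal sketch:

using ForwardDiff

# An ordinary Julia function of several variables
rosenbrock(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2

x = [1.5, 2.0]
g = ForwardDiff.gradient(rosenbrock, x)   # forward-mode gradient via dual numbers
H = ForwardDiff.hessian(rosenbrock, x)    # nested dual numbers give second derivatives
println(g)
println(H)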
AD can be applied in two modes:
Forward mode (Tangent mode)
Reverse mode (Adjoint mode)
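Forward mode propagates derivatives from the inputs towards the outputs, so its cost grows with the number of inputs; reverse mode propagates sensitivities from the outputs back to the inputs, so its cost grows with the number of outputs, which makes it the natural choice for a scalar objective with many design variables. A minimal Julia sketch (assuming the Zygote package for reverse mode, which the text does not mention):

using ForwardDiff, Zygote

f(x) = sum(sin.(x)) + x[1] * x[2]        # scalar function of many inputs

x = randn(1000)
g_forward = ForwardDiff.gradient(f, x)   # work grows with length(x)
g_reverse = Zygote.gradient(f, x)[1]     # a single reverse sweep regardless of length(x)
println(g_forward ≈ g_reverse)           # both modes give the same gradient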
Here we will demonstrate the AD capabilities of Tapenade to give a flavour of the process, using a simple subroutine simpleTest.f that computes out = SIN(in1) + in2 (as can be read off the generated code below).
tapenade -forward -head function -output function -vars "in1, in2, out" -outvars "out" simpleTest.f
tapenade -reverse -head function -output function -vars "in1, in2, out" -outvars "out" simpleTest.f
Forward Differentiated code simpleTest_d.f
C  Generated by TAPENADE (INRIA, Tropics team)
C  Tapenade 3.9 (r5096) - 24 Feb 2014 16:53
C
C  Differentiation of function in forward (tangent) mode:
C   variations of useful results: out
C   with respect to varying inputs: in1 in2
C   RW status of diff variables: out:out in1:in in2:in
      SUBROUTINE FUNCTION_D(in1, in1d, in2, in2d, out, outd)
      IMPLICIT NONE
      REAL*8 in1, in2, out
      REAL*8 in1d, in2d, outd
      REAL*8 tmp1, tmp2
      REAL*8 tmp1d
      INTRINSIC SIN
      tmp1d = in1d*COS(in1)
      tmp1 = SIN(in1)
      outd = tmp1d + in2d
      out = tmp1 + in2
C
      RETURN
      END
Reverse Differentiated code simpleTest_b.f
C  Generated by TAPENADE (INRIA, Tropics team)
C  Tapenade 3.9 (r5096) - 24 Feb 2014 16:53
C
C  Differentiation of function in reverse (adjoint) mode:
C   gradient of useful results: out
C   with respect to varying inputs: out in1 in2
C   RW status of diff variables: out:in-zero in1:out in2:out
      SUBROUTINE FUNCTION_B(in1, in1b, in2, in2b, out, outb)
      IMPLICIT NONE
      REAL*8 in1, in2, out
      REAL*8 in1b, in2b, outb
      REAL*8 tmp1, tmp2
      REAL*8 tmp1b
      INTRINSIC SIN
C
      tmp1b = outb
      in2b = outb
      in1b = COS(in1)*tmp1b
      outb = 0.0
      END
Tapenade can also be used online here to differentiate Fortran, C and (a limited form of) C++ codes.
Python Example
Code
# Python example using autograd to differentiate an explicit function of two
# variables with a single output. autograd does not support in-place array
# mutation, so the function avoids item assignment such as x[1] = 2*x[1].
import autograd.numpy as np
from autograd import grad

def f(x):
    x1 = 2 * x[1]
    return x[0]**2 + x1**2

grad_f = grad(f)

x = np.array([1.0, 2.0])
print(grad_f(x))

Note that if the function mutates its argument in place (e.g. x[1] = 2*x[1]), autograd fails with TypeError: 'ArrayBox' object does not support item assignment, which is an example of the robustness issues mentioned above.