Gradient Calculation Methods

Keywords

MDO, finite difference, complex step, hyperdual numbers, gradient calculation



Given a design point \(\mathbf{x}_0\), the aim is to find \(\nabla f(\mathbf{x}_0)\) for a sufficiently smooth objective function \(f\). The gradient of a function \(f(\mathbf{x})\) with respect to \(\mathbf{x}\) is given by

\[ \nabla f(\mathbf{x})=\frac{\partial f}{\partial \mathbf{x}}=\begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix} \]

Although we will look at algorithms that compute the full gradient \(\nabla f(\mathbf{x})\), some applications only require the directional derivative \(\nabla f(\mathbf{x}) \cdot \mathbf{d}\) of \(f\) along a descent direction \(\mathbf{d}\). The computational cost of a directional derivative is much less than that of the entire gradient.
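For example, a single extra evaluation of \(f\) approximates \(\nabla f(\mathbf{x}_0)\cdot\mathbf{d}\) regardless of the dimension \(n\). A minimal Julia sketch follows; the quadratic test function and the direction \(\mathbf{d}\) are assumed purely for illustration.

# Directional derivative ∇f(x₀)⋅d from a single extra function evaluation,
# compared against the full gradient (computed here with ForwardDiff for reference)
using ForwardDiff, LinearAlgebra

f(x) = sum(x .^ 2) + x[1] * x[2]         # assumed test objective
x₀ = [1.0, 2.0, 3.0]
d  = normalize([-1.0, -1.0, -0.5])       # assumed direction (a descent direction here)

h = 1e-6
dd_fd   = (f(x₀ + h * d) - f(x₀)) / h          # one extra evaluation of f
dd_full = dot(ForwardDiff.gradient(f, x₀), d)  # cost grows with n

println((dd_fd, dd_full))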

We now outline the most popular methods for gradient calculation and compare them in terms of computational cost and accuracy.

Finite Difference Methods

The finite difference (FD) method is the most common approach for calculating derivatives of a function. It is based on the Taylor series expansion of \(f(x)\) about a point \(x_0\):

\[f(x)=f(x_0)+\frac{f'(x_0)}{1!}(x-x_0)+\frac{f''(x_0)}{2!}(x-x_0)^2+...\]

If \(\Delta x = x-x_0\), then the three most widely used FD methods are:

| Scheme | Approximation | Order of accuracy |
|---|---|---|
| Forward difference | \(\frac{df}{dx}\approx\frac{f(x_0+\Delta x)-f(x_0)}{\Delta x}\) | \(O(\Delta x)\) |
| Backward difference | \(\frac{df}{dx}\approx\frac{f(x_0)-f(x_0-\Delta x)}{\Delta x}\) | \(O(\Delta x)\) |
| Central difference | \(\frac{df}{dx}\approx\frac{f(x_0+\Delta x)-f(x_0-\Delta x)}{2\Delta x}\) | \(O((\Delta x)^2)\) |

The third column in the table lists the order of accuracy of each scheme. More accurate approximations can be obtained from higher-order difference formulas that use additional function evaluations, but at the cost of making each derivative more expensive to compute. Another limitation arises in practical PDE problems that contain discontinuities (e.g. heat flow across the interface between two media): there the Taylor-series expansion underlying these formulas is no longer valid.

# A comparison between forward, backward and central difference schemes
using Plots, ForwardDiff

function plotPerformance(f,x₀,pt)
    n = 15
    Δx=zeros(n,1)
    for i = 1:n
        Δx[i]=10.0^-i
    end

    ϵ = zeros(n,3)
    ∇f = ForwardDiff.derivative(f,x₀)
    for j=1:n
        ϵ[j,1]=abs( (f(x₀+Δx[j])-f(x₀))/Δx[j] - ∇f )           # forward difference
        ϵ[j,2]=abs( (f(x₀)-f(x₀-Δx[j]))/Δx[j] - ∇f )           # backward difference
        ϵ[j,3]=abs( (f(x₀+Δx[j])-f(x₀-Δx[j]))/(2Δx[j]) - ∇f )  # central difference
    end

    pt = plot!(pt,Δx,ϵ,
        xaxis=:log,
        yaxis=:log,
        xlabel="Step Size",
        ylabel="Error",
        markershape=:auto,
        label=["Forward difference" "Central difference" "Backward difference"])
    return pt
end

pt1 = plotPerformance(x->sin(x), π/100, plot(title="Performance at x=π/100"))
pt2 = plotPerformance(x->sin(x), π/4, plot(title="Performance at x=π/4"))
pt3 = plotPerformance(x->sin(x), π/2, plot(title="Performance at x=π/2"))

plot(pt1,pt2,pt3,layout=(3,1),size=(800,800))

Sensitivity to step-size

An inappropriate step size degrades the accuracy of the computed gradient and, in turn, can affect the stability and efficiency of the optimization.

The main sources of error in finite difference methods are truncation error and subtractive cancellation error. Truncation error is the error made by truncating an infinite sum (here, the Taylor series) and approximating it by a finite sum; it decreases as the step size is reduced, which is why the curves in the plot initially fall and reach a minimum. As the step size is reduced further, subtractive cancellation error due to finite-precision arithmetic takes over and the total error grows again.

An optimal step size that minimizes the total numerical error can be estimated from a sensitivity analysis, but this introduces an additional computational burden.
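For a forward difference, balancing the \(O(\Delta x)\) truncation error against round-off of order \(\varepsilon/\Delta x\) (with \(\varepsilon\) the machine precision) places the optimal step near \(\sqrt{\varepsilon}\); for a central difference the balance point is near \(\varepsilon^{1/3}\). The sketch below applies this common rule of thumb; the test function and evaluation point are assumed for illustration.

# Heuristic step sizes that roughly balance truncation and round-off error
# (a common rule of thumb, not an exact optimum for every function)
f(x) = sin(x)
x₀ = π / 4

h_forward = sqrt(eps(Float64)) * max(1.0, abs(x₀))   # ≈ 1.5e-8, for O(Δx) schemes
h_central = cbrt(eps(Float64)) * max(1.0, abs(x₀))   # ≈ 6e-6,  for O(Δx²) schemes

df_forward = (f(x₀ + h_forward) - f(x₀)) / h_forward
df_central = (f(x₀ + h_central) - f(x₀ - h_central)) / (2h_central)

println((df_forward, df_central, cos(x₀)))           # exact derivative is cos(x₀)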

Complex Variable Method (CVM)

This method was first introduced by @squire1998.

Consider a function that is infinitely differentiable and can be extended smoothly into the complex plane; write it as \(F(z)\). Such a function can be expanded about \(x_0\) with a purely imaginary step \(ih\) using the Taylor series:

\[F(x_0+ih)=F(x_0)+ihF'(x_0)-\frac{h^2F''(x_0)}{2!}-i\frac{h^3F^{(3)}(x_0)}{3!}+\dots\]

Take the imaginary part of both sides and divide by h.

\[F'(x_0)=\frac{\operatorname{Im}\left[F(x_0+ih)\right]}{h}+O(h^2)\]

Evaluating the function \(F\) at the complex argument \(x_0+ih\), taking the imaginary part and dividing by \(h\) therefore gives an approximation to the derivative \(F'(x_0)\) that is accurate to order \(O(h^2)\). Because no difference of nearly equal numbers is formed, there is no subtractive cancellation, so \(h\) can be made extremely small without loss of accuracy.

using Plots
using ForwardDiff

N = 30
x₀ = 1.0
f(x) = x^3*cos(x) + 2*x^2 + 3*x + 4
∇f = ForwardDiff.derivative(f,x₀)

∇f_cs(x, Δx) = imag(f(x + im * Δx)) / Δx   # complex-step derivative
∇f_cd(x, Δx) = (f(x + Δx) - f(x - Δx)) / (2Δx)

ϵ = zeros(N, 2)
Δx = 10.0 .^ collect(-1:-1:-N)
for j in 1:N
    ϵ[j,1] = abs(∇f_cs(x₀, Δx[j]) - ∇f)
    ϵ[j,2] = abs(∇f_cd(x₀, Δx[j]) - ∇f)
    if ϵ[j,1] < 1e-16
        ϵ[j,1] = 1e-16   # floor at 1e-16 so the value can be shown on the log-log plot
    end
end

plot(Δx, ϵ[:,1], label="Complex Step Differentiation", xaxis=:log, yaxis=:log, grid=true, legend=:topright)
plot!(Δx, ϵ[:,2], label="Central difference", xaxis=:log, yaxis=:log)
xlabel!("Step Size")
ylabel!("Error")
title!("Error Vs Step Size for complex variable trick and central difference methods")

Converting a real function to a complex function

Consider the function \(f(x) = u^2\), where \(u\) satisfies \(u^3 + u = x\). The following code first implements an explicit function in Python and differentiates it with the complex-step method, and then repeats the exercise for the implicit function above.

Explicit function

import numpy as np

# Explicit test function
def f(x):
    return x**3 + x**4

# Complex-step derivative: perturb x along the imaginary axis
def df(x, h):
    return np.imag(f(x + 1j*h) / h)

# Test the function: f'(x) = 3x^2 + 4x^3, so f'(1) = 7
x = 1.0
h = 1e-8
print(df(x, h))

Implicit function

# Given input x, this function calculates f(x). Since f(x) = u^2, we need to solve u^3 + u = x for u.

import numpy as np

def f(x):
    # Residual of the implicit equation u^3 + u - x = 0
    def residual(u):
        return u**3 + u - x

    # Initial guess for u
    u = 0.5

    # Newton-Raphson iterations to solve the implicit equation
    for i in range(100):
        u = u - residual(u) / (3 * u**2 + 1)

    return u**2

x = 2.0
print(f(x))
1.0

Because Python's arithmetic operators already work on complex numbers, the same function can be evaluated with a complex argument; we now use it to calculate the derivative with the complex-step method.

# Complex-step derivative of the implicit function f defined above

def df(x, h):
    return np.imag(f(x + 1j*h) / h)

# Test the function

x = 2.0
h = 1e-8
print(df(x,h))
0.5

As can be seen, no extra effort is required to convert the function to handle complex variables. However, things are not so straightforward in compiled languages like Fortran and C.

Fortran Example

program complex_step
    implicit none
    real(8) :: x, h, dfdx

    x = 1.0d0
    h = 1.0d-8
    dfdx = df(x, h)
    print *, dfdx

contains

    ! The function must be rewritten to accept a complex argument so that it
    ! can be evaluated with the complex step x + ih
    complex(8) function f(z)
        complex(8), intent(in) :: z
        f = z**3 + z**4
    end function f

    ! Complex-step derivative approximation
    real(8) function df(x, h)
        real(8), intent(in) :: x, h
        df = aimag(f(cmplx(x, h, kind=8))) / h
    end function df
end program complex_step

A more complete treatment of the subject can be found in the paper on the complex-step derivative approximation by Martins, Sturdza, and Alonso.

Automatic Differentiation (AD)

AD is the most promising method for calculating gradients and Hessians. An AD tool goes through the source code of a function and differentiates it line by line analytically, carrying the derivatives of intermediate variables forward from one statement to the next. AD is available for almost all languages with varying degrees of sophistication. It is easier to build an AD tool for low-level languages like Fortran and C, and packages such as Tapenade have matured to the point that AD has been successfully applied to industrial CFD software.

AD methods for higher-level languages are not yet as robust. Julia has many experimental AD packages that use

  • source transformation
  • operator overloading
  • dual numbers
  • hyper-dual numbers

to calculate \(n^{th}\)-order derivatives of native Julia functions. A minimal dual-number sketch is given below.
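
To give a flavour of the operator-overloading and dual-number approach, here is a minimal sketch; it is not one of the production packages, and only a handful of operations are overloaded.

# Minimal forward-mode AD via dual numbers (illustrative sketch only):
# a Dual carries a value together with the derivative of that value.
struct Dual
    val::Float64   # function value
    der::Float64   # derivative with respect to the input
end

import Base: +, *, sin
+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)

# Differentiate g(x) = x*sin(x) + x at x = 1 by seeding der = 1
g(x) = x * sin(x) + x
println(g(Dual(1.0, 1.0)))   # the der field equals sin(1) + cos(1) + 1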

AD can be applied in two modes:

  • Forward mode (Tangent mode)
  • Reverse mode (Adjoint mode)

The cost of forward mode grows with the number of inputs, while the cost of reverse mode grows with the number of outputs; for the gradient of a scalar objective with many design variables, reverse mode is therefore usually preferred. A short comparison of the two modes in Julia is sketched below.
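
# Forward vs reverse mode AD on a scalar objective of n inputs
# (assumes the ForwardDiff and Zygote packages are installed)
using ForwardDiff, Zygote

rosen(x) = sum(100 .* (x[2:end] .- x[1:end-1] .^ 2) .^ 2 .+ (1 .- x[1:end-1]) .^ 2)

x = randn(50)
g_fwd = ForwardDiff.gradient(rosen, x)   # forward (tangent) mode: cost grows with n
g_rev = Zygote.gradient(rosen, x)[1]     # reverse (adjoint) mode: roughly one extra pass

println(maximum(abs.(g_fwd .- g_rev)))   # both are exact up to round-off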

Here we will demonstrate the AD capabilities of Tapenade to give a flavour of the process.

Simple Fortran Example Using Tapenade

simpleTest.f

      subroutine function(in1, in2, out)
          real*8 in1, in2, out
          real*8 tmp1, tmp2
          
          tmp1 = sin(in1)
          out = tmp1 + in2

          return
      end

Commands to differentiate the code

tapenade -forward
         -head function
         -output function
         -vars "in1, in2, out"
         -outvars "out"
         simpleTest.f

tapenade -reverse
         -head function
         -output function
         -vars "in1, in2, out"
         -outvars "out"
         simpleTest.f

Forward Differentiated code simpleTest_d.f

C        Generated by TAPENADE     (INRIA, Tropics team)
C  Tapenade 3.9 (r5096) - 24 Feb 2014 16:53
C
C  Differentiation of function in forward (tangent) mode:
C   variations   of useful results: out
C   with respect to varying inputs: in1 in2
C   RW status of diff variables: out:out in1:in in2:in
      SUBROUTINE FUNCTION_D(in1, in1d, in2, in2d, out, outd)
      IMPLICIT NONE
      REAL*8 in1, in2, out
      REAL*8 in1d, in2d, outd
      REAL*8 tmp1, tmp2
      REAL*8 tmp1d
      INTRINSIC SIN
      tmp1d = in1d*COS(in1)
      tmp1 = SIN(in1)
      outd = tmp1d + in2d
      out = tmp1 + in2
C
      RETURN
      END

Reverse Differentiated code simpleTest_b.f

C        Generated by TAPENADE     (INRIA, Tropics team)
C  Tapenade 3.9 (r5096) - 24 Feb 2014 16:53
C
C  Differentiation of function in reverse (adjoint) mode:
C   gradient     of useful results: out
C   with respect to varying inputs: out in1 in2
C   RW status of diff variables: out:in-zero in1:out in2:out
      SUBROUTINE FUNCTION_B(in1, in1b, in2, in2b, out, outb)
      IMPLICIT NONE
      REAL*8 in1, in2, out
      REAL*8 in1b, in2b, outb
      REAL*8 tmp1, tmp2
      REAL*8 tmp1b
      INTRINSIC SIN
C
      tmp1b = outb
      in2b = outb
      in1b = COS(in1)*tmp1b
      outb = 0.0
      END

Tapenade can also be accessed online to differentiate Fortran, C and (a limited form of) C++ code.

Python Example

# Python example using autograd: an explicit function of two variables with a single output.
# Note: autograd traces array operations, so the function must not mutate its input
# in place (e.g. x[1] = 2*x[1] raises a TypeError on autograd's ArrayBox type).

import autograd.numpy as np
from autograd import grad

def f(x):
    return x[0]**2 + (2*x[1])**2

grad_f = grad(f)

x = np.array([1.0, 2.0])
print(grad_f(x))   # analytically, the gradient is [2*x[0], 8*x[1]] = [2., 16.]

Summary

|                          | FDM | CVM  | AD   |
|--------------------------|-----|------|------|
| Accuracy                 | Low | High | High |
| Black-box code           | Yes | No   | No   |
| Source code              | Yes | Yes  | Yes  |
| Smooth functions         | Yes | Yes  | Yes  |
| Non-smooth functions     | No  | No   | Yes  |
| Discontinuous functions  | No  | No   | Yes  |
| Higher-order derivatives | Yes | No   | Yes  |