graph TB
  A[Start] --> B[Initial Guess]
  B --> C[Compute Gradient]
  C --> D[Check Convergence]
  D -->|Yes| E[Stop]
  D -->|No| F[Choose Descent Direction]
  F --> G[Choose Step Size]
  G --> H[Update Design]
  H --> C
Gradient Descent Methods
optimisation, mdo
Gradient Methods
The basic algorithm for all gradient methods was outlined earlier. Here we recap it with the mermaid diagram above, which shows the generic structure of a gradient method.
Note that the diagram shows a single convergence check. In practice, we may use several convergence criteria, for example:
- maximum number of iterations,
- maximum number of function evaluations,
- maximum number of gradient evaluations,
- maximum number of Hessian evaluations,
- maximum number of line searches,
- maximum number of iterations without improvement,
- magnitude of the change in the design per iteration, and
- magnitude of the change in the objective function per iteration.
This is not an exhaustive list, and several of these criteria can be combined; a sketch of how some of them can be expressed in Optim.jl is shown below.
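As a concrete illustration, the sketch below passes a few such criteria to Optim.jl. The keyword names (`iterations`, `f_calls_limit`, `g_calls_limit`, `g_tol`) reflect my understanding of `Optim.Options` and the tolerance values are arbitrary assumptions for this example; check the Optim.jl documentation for the options available in your version.

```julia
using Optim

# Illustrative sketch: stop on whichever criterion triggers first —
# iteration limit, evaluation limits, or a small gradient norm.
# Keyword names and values here are assumptions, not prescriptions.
opts = Optim.Options(iterations    = 200,    # maximum number of iterations
                     f_calls_limit = 1000,   # maximum number of function evaluations
                     g_calls_limit = 500,    # maximum number of gradient evaluations
                     g_tol         = 1e-8,   # stop when the gradient norm is small
                     store_trace   = true)

rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
res = optimize(rosenbrock, [3.0, 10.0], GradientDescent(), opts)
```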
Various methods differ in the way they choose the new point \(x_1\) such that \(f(x_1)<f(x_0)\). There are two steps involved in choosing this new point:

1. Choose the direction in which to move from \(x_0\). This is known as the descent direction.
2. Choose the distance to move along that direction. This is known as the step size.
In the present one-dimensional case (\(x \in \mathbb{R}\)), the first step is simple: we move in the direction of the negative gradient. The second step is more involved. In higher dimensions the choice of descent direction also matters, and we will discuss various methods for choosing it here.
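In general, each iteration of a gradient method takes a step of length \(\alpha_k\) along a descent direction \(p_k\):

\[x_{k+1} = x_k + \alpha_k\, p_k,\]

where \(p_k = -\nabla f(x_k)\) for steepest descent, and \(\alpha_k > 0\) is the step size.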
Steepest Descent (Cauchy’s Method)
Cauchy proposed the method of steepest descent in 1847. He was a French mathematician who made major contributions to analysis and number theory. He was the first to prove the Cauchy integral theorem. He also made important contributions to mechanics and optics.
The steepest descent method is the simplest gradient method: it uses only first-order (gradient) information. It is also known as Cauchy’s method. The method proceeds as follows:
1. Start with an initial guess \(x_0\).
2. Evaluate \(f'(x)\) at \(x_0\). If \(f'(x_0)=0\), then \(x_0\) is the optimum (a stationary point). If \(f'(x_0)\neq 0\), choose a new point \(x_1\) such that \(f(x_1)<f(x_0)\).
3. Repeat the process until \(f'(x_n)=0\).
The new point is obtained by moving along the negative gradient, \(x_1 = x_0 - \alpha\, f'(x_0)\), with the step size \(\alpha>0\) chosen so that \(f(x_1)<f(x_0)\).
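To make the recipe concrete, here is a minimal steepest-descent sketch with a fixed step size. The function, gradient, step size, and tolerance below are assumptions chosen for illustration; this is not the implementation used in the examples that follow (those rely on Optim.jl).

```julia
using LinearAlgebra   # for norm

# Minimal steepest-descent sketch with a fixed step size α (illustrative assumptions).
function steepest_descent(f, ∇f, x0; α = 0.01, tol = 1e-6, maxiter = 10_000)
    x = copy(x0)
    for k in 1:maxiter
        g = ∇f(x)
        norm(g) < tol && return x, k   # stop when the gradient is (almost) zero
        x = x .- α .* g                # move along the negative gradient
    end
    return x, maxiter
end

# Example: a simple quadratic bowl with its minimum at the origin.
quad(x)  = x[1]^2 + 40 * x[2]^2
∇quad(x) = [2 * x[1], 80 * x[2]]
xmin, iters = steepest_descent(quad, ∇quad, [2.0, 1.0])
```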
Rosenbrock Problem
The Rosenbrock function is a non-convex function used as a performance test problem for optimization algorithms.
\[f(x,y)=(a-x)^{2}+b(y-x^{2})^{2}\]
The global minimum lies inside a long, narrow, parabolic-shaped flat valley. The partial derivatives are
\[\begin{aligned} f_{x} &= -2(a-x) - 4bx(y-x^{2}),\\ f_{y} &= 2b(y-x^{2}), \end{aligned}\]
and setting them to zero gives the analytical solution \((x,y)=(a,a^{2})\), i.e. \((1,1)\) for \(a=1\). The numerical solution, however, poses a particular challenge.
using Optim, Plots, PlutoUI
pyplot()
plot([1.0], [1.0], seriestype=:scatter, label="Minima")
f(x::Vector) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2   # Rosenbrock with a = 1, b = 100
x₁ = -5:0.05:5; x₂ = -5:0.05:5
z = [f([xi; yi]) for yi in x₂, xi in x₁]   # rows indexed by x₂, as contour expects
plt = contour!(x₁,x₂,z,levels=50,
xlabel="x₁",
ylabel="x₂",
title="Rosenbrock Function",
titlefontsize=10)
niter = 100
x̄₀ = [3.0, 10.0]
# Steepest Descent
xsd = ones(niter,2)
xsd[1,:] = x̄₀
res = optimize(f, x̄₀, GradientDescent(),
               Optim.Options(iterations=niter, store_trace=true, extended_trace=true))
function plot_optim_trace(plt, res)
    tmp = Optim.x_trace(res)      # iterates recorded by Optim (may be shorter than niter)
    n = length(tmp)
    path = ones(n, 2)
    for i in 1:n
        path[i,1] = tmp[i][1]
        path[i,2] = tmp[i][2]
    end
    plot!(plt, path[:,1], path[:,2], seriestype=:scatter, label = "Steepest Descent")
end
savefig(joinpath(@OUTPUT,"ex1.svg"))
plt1 = plot()
plot_optim_trace(plt, res)
savefig(joinpath(@OUTPUT,"ex2.svg"))
Optim.x_trace(res)
Optim.minimizer(res)
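Besides the trace and the minimizer, the result object exposes a few other accessors that are handy for inspecting a run. The calls below are standard Optim.jl accessors to the best of my knowledge, though the exact set may vary by version.

```julia
Optim.minimum(res)     # best objective value found
Optim.converged(res)   # did any convergence criterion trigger?
Optim.iterations(res)  # number of iterations performed
```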
function plot_optim_trace(plt, res, label_string)
    niter = length(Optim.x_trace(res))
    path = ones(niter, 2)
    tmp = Optim.x_trace(res)
    for i in 1:niter
        path[i,1] = tmp[i][1]
        path[i,2] = tmp[i][2]
    end
    plot!(plt, path[:,1], path[:,2], seriestype=:line, markershape=:circle, lw=2, markersize=4, label=label_string)
end
function run_and_plot_method(f, x̄₀, niter, method, plt)
    if method == "GradientDescent"
        res = optimize(f, x̄₀, GradientDescent(),
                       Optim.Options(iterations=niter,
                                     store_trace=true,
                                     extended_trace=true);
                       autodiff = :forward)
    elseif method == "ConjugateGradient"
        res = optimize(f, x̄₀, ConjugateGradient(),
                       Optim.Options(iterations=niter,
                                     store_trace=true,
                                     extended_trace=true);
                       autodiff = :forward)
    end
    plot_optim_trace(plt, res, method)
end
using Optim, Plots
# Elementary example of an Ellipse
niter = 50
x̄₀ = [2.0, 1.0]
f(x::Vector) = x'*[1 0; 0 40]*x
plt = plot([0.0], [0.0], seriestype=:scatter, label="Minima")
# Steepest Descent
run_and_plot_method(f, x̄₀, niter, "GradientDescent", plt)
#Conjugate Gradient
run_and_plot_method(f, x̄₀, niter, "ConjugateGradient", plt)
x₁ = -2.5:0.05:2.5; x₂ = -1.5:0.05:1.5   # cover the start point [2.0, 1.0] and the minimum at the origin
z = [f([xi; yi]) for yi in x₂, xi in x₁]   # rows indexed by x₂, as contour expects
plt = contour!(x₁,x₂,z,levels=50, xlabel="x₁", ylabel="x₂", title="Elliptic Quadratic Function", titlefontsize=10)
# Classic Example of Rosenbrock Function
niter = 10
x̄₀ = [3.0, 1.5]
plt = plot([1.0], [1.0], seriestype=:scatter, label="Minima")
f(x::Vector) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2   # Rosenbrock with a = 1, b = 100
x₁ = -5:0.05:5; x₂ = -5:0.05:5
z = [f([xi; yi]) for yi in x₂, xi in x₁]   # rows indexed by x₂, as contour expects
plt = contour!(x₁,x₂,z,levels=50, xlabel="x₁", ylabel="x₂", title="Rosenbrock Function", titlefontsize=10)
# Steepest Descent
xsd = ones(niter,2)
xsd[1,:] = x̄₀
res = optimize(f, x̄₀, GradientDescent(),
               Optim.Options(iterations=niter, store_trace=true, extended_trace=true))
plot_optim_trace(plt, res, "Steepest Descent")
#Conjugate Gradient
xcg = ones(niter,2)
xcg[1,:] = x̄₀
res = optimize(f, x̄₀, ConjugateGradient(),
               Optim.Options(iterations=niter, store_trace=true, extended_trace=true))
plot_optim_trace(plt, res, "Conjugate Gradient")