Study Guide
Reading Guide
Topic | Reference |
---|---|
Unconstrained Optimisation | Jasbir Arora (Ch. 10 & 11) |
Optimality Conditions | Jasbir Arora (Ch. 4) |
Genetic Algorithms | Kalyanmoy Deb (Ch. 6) |
Simulated Annealing | Kalyanmoy Deb (Ch. 6) |
Multi-objective Optimisation | Kalyanmoy Deb (Ch. 1, 2, 3); Jasbir Arora (Ch. 17) |
Gradient Calculation Methods | Alonso Notes |
Robust Optimisation | Class Notes |
Surrogate Modelling | |
Gradient methods
- Line search methods
  - Gradient descent algorithms
    - Steepest descent
    - Coordinate descent
    - Stochastic gradient descent
    - Conjugate gradient
- Trust region methods
Gradient descent algorithms
- Select a descent direction
- Select a step-size (\(\alpha\)) along the descent direction
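A minimal sketch of this two-step loop in Python, using steepest descent as the direction and a fixed step size purely for illustration; the quadratic test function and the value of `alpha` are assumptions, not from the notes:

```python
import numpy as np

def quad(x):
    """Illustrative quadratic objective (assumed, not from the notes)."""
    return x[0]**2 + 10 * x[1]**2

def grad_quad(x):
    """Analytical gradient of the quadratic above."""
    return np.array([2 * x[0], 20 * x[1]])

def gradient_descent(f, grad, x0, alpha=0.01, tol=1e-6, max_iter=10000):
    """Generic descent loop: pick a direction, then take a step of size alpha along it."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = -grad(x)                 # step 1: choose a descent direction (here, steepest descent)
        if np.linalg.norm(d) < tol:  # stop once the gradient is (nearly) zero
            break
        x = x + alpha * d            # step 2: move a distance alpha along the direction
    return x

print(gradient_descent(quad, grad_quad, x0=[5.0, 3.0]))  # approaches the minimiser [0, 0]
```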
Step-size Calculation
Step-size calculation methods can be classified as:
- Exact step-size calculation
- Iterative step-size calculation
Step-size calculation usually happens in two steps:
- Bracketing the minimum
  - Equal interval search
- Reduction in the interval of uncertainty (or the bracket) iteratively
  - Equal interval search (1-point and 2-point methods)
  - Golden-section search (see the sketch after this list)
  - Polynomial interpolation
    - Quadratic curve fitting
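A minimal golden-section search sketch, assuming the minimum has already been bracketed in \([a, b]\) and the one-dimensional function `phi` of the step size is unimodal on that interval; the example function is illustrative, not from the notes:

```python
import math

def golden_section(phi, a, b, tol=1e-6):
    """Iteratively shrink the bracket [a, b] by the golden-ratio factor."""
    inv_phi = (math.sqrt(5) - 1) / 2       # ~0.618
    while (b - a) > tol:
        c = b - inv_phi * (b - a)          # interior point closer to a
        d = a + inv_phi * (b - a)          # interior point closer to b
        if phi(c) < phi(d):
            b = d                          # minimum lies in [a, d]
        else:
            a = c                          # minimum lies in [c, b]
    return 0.5 * (a + b)

# Illustrative 1-D function of the step size; the true minimiser is alpha = 2.
alpha_star = golden_section(lambda t: (t - 2.0) ** 2 + 1.0, a=0.0, b=5.0)
print(alpha_star)  # ~2.0
```

Each iteration shrinks the bracket by the golden-ratio factor (about 0.618); a production implementation would reuse one of the two interior evaluations per iteration instead of recomputing both.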
Inexact line search
- Armijo’s rule
Armijo’s rule only promises to find an \(\alpha\) such that \(f(\bx^{(k+1)}) < f(\bx^{(k)})\); it does not try to locate the exact minimiser along the search direction.
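A backtracking sketch of Armijo's rule, assuming the usual sufficient-decrease form \(f(\bx + \alpha d) \le f(\bx) + c_1 \alpha \nabla f(\bx)^T d\); the constants `c1` and `rho` and the test function are illustrative choices, not values prescribed in the notes:

```python
import numpy as np

def armijo_backtracking(f, grad_f, x, d, alpha0=1.0, rho=0.5, c1=1e-4):
    """Shrink alpha until the Armijo (sufficient-decrease) condition holds."""
    alpha = alpha0
    fx = f(x)
    slope = grad_f(x) @ d                  # directional derivative; negative for a descent direction
    while f(x + alpha * d) > fx + c1 * alpha * slope:
        alpha *= rho                       # backtrack: reduce the step size
    return alpha

# Illustrative use with a simple quadratic and the steepest-descent direction.
f = lambda x: x[0]**2 + 10 * x[1]**2
grad_f = lambda x: np.array([2 * x[0], 20 * x[1]])
x = np.array([5.0, 3.0])
d = -grad_f(x)
alpha = armijo_backtracking(f, grad_f, x, d)
print(alpha, f(x + alpha * d) < f(x))  # True: the accepted step decreases f
```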
Drawbacks of steepest descent method
- Poor performance on ill-conditioned problems (the elongated ellipse in the lecture) as opposed to well-conditioned problems (the circle in the lecture); see the experiment after this list.
- The method is fast far away from the optimum but becomes slow in the neighbourhood of the optimum.
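A small experiment illustrating the ill-conditioning point: steepest descent with a simple backtracking line search needs far more iterations on an elongated quadratic (elliptical contours) than on a well-conditioned one (circular contours). The test functions and starting point are illustrative assumptions:

```python
import numpy as np

def steepest_descent(f, grad_f, x0, tol=1e-6, max_iter=10000):
    """Steepest descent with simple backtracking; returns the iteration count."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            return k
        d = -g
        alpha, fx = 1.0, f(x)
        while f(x + alpha * d) > fx + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        x = x + alpha * d
    return max_iter

well = lambda x: x[0]**2 + x[1]**2              # circular contours
ill  = lambda x: x[0]**2 + 100 * x[1]**2        # elongated elliptical contours
grad_well = lambda x: np.array([2 * x[0], 2 * x[1]])
grad_ill  = lambda x: np.array([2 * x[0], 200 * x[1]])

print(steepest_descent(well, grad_well, [5.0, 3.0]))  # converges in very few iterations
print(steepest_descent(ill,  grad_ill,  [5.0, 3.0]))  # takes many more iterations (zigzagging)
```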
Stoppage criteria
- Similar criteria are used for gradient as well as non-gradient methods
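A sketch of the checks that are commonly combined into a stopping test; the tolerances are illustrative assumptions, and the gradient-based check applies only to gradient methods:

```python
import numpy as np

def should_stop(x_new, x_old, f_new, f_old, grad=None,
                eps_x=1e-8, eps_f=1e-8, eps_g=1e-6, k=0, max_iter=1000):
    """Common stopping checks: small step, small change in f, small gradient, iteration budget."""
    if k >= max_iter:                                      # iteration budget exhausted
        return True
    if np.linalg.norm(np.asarray(x_new) - np.asarray(x_old)) < eps_x:  # design barely changing
        return True
    if abs(f_new - f_old) < eps_f:                         # objective barely changing
        return True
    if grad is not None and np.linalg.norm(grad) < eps_g:  # first-order optimality (gradient methods only)
        return True
    return False
```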
Algorithm Performance
Computational complexity (space and time)
Rate of convergence
Consider any gradient algorithm. We get a series of design points \(\bx^{(0)}, \bx^{(1)}, \dots, \bx^{(k)}, \bx^{(k+1)}, \dots, \bx^{(p)}\). We can treat these design points as a sequence. The algorithm is converging if this sequence converges to the actual optimum \(\bx^{*}\). We define the convergence ratio (\(\beta\)) as follows:
\[\beta = \frac{\norm{\bx^{(k+1)} - \bx^*} }{\norm{\bx^{(k)} - \bx^*}^r}\]
where \(r\) is the rate of convergence.
Convergence | Condition |
---|---|
Linear | \(r=1\) and \(\beta < 1\) (Only first order information required) |
Superlinear | \(r=1\) and \(\beta \rightarrow 0\) |
Superlinear | \(1 < r<2\) and \(\beta < 1\) (Achieved using gradient information from previous steps) |
Quadratic | \(r=2\) and \(\beta<1\) (Hessian information required) |
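One way to check these definitions numerically is to estimate the error ratios from the last few iterates of a run; the Newton iteration for \(\sqrt{2}\) below is an illustrative example, not from the notes:

```python
import numpy as np

def convergence_diagnostics(xs, x_star):
    """Print error ratios e_{k+1}/e_k and e_{k+1}/e_k^2 for a sequence of iterates."""
    errs = [np.linalg.norm(np.asarray(x) - np.asarray(x_star)) for x in xs]
    for k in range(len(errs) - 1):
        linear_ratio = errs[k + 1] / errs[k]          # estimate of beta if r = 1
        quad_ratio = errs[k + 1] / errs[k] ** 2       # estimate of beta if r = 2
        print(f"k={k}: linear ratio={linear_ratio:.3e}, quadratic ratio={quad_ratio:.3e}")

# Newton's iteration for solving x^2 = 2 converges quadratically: the quadratic ratio
# settles to a constant while the linear ratio tends to zero.
xs, x = [], 2.0
for _ in range(5):
    xs.append(x)
    x = x - (x * x - 2.0) / (2.0 * x)    # Newton step for x^2 - 2 = 0
convergence_diagnostics(xs, np.sqrt(2.0))
```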
Non-gradient methods
Any optimisation algorithm that does not require gradient information is called a non-gradient method.
Any optimisation algorithm that does not get trapped in a local optimum and instead searches the entire design space for the global optimum is called a global method.
Any optimisation algorithm that works on populations of candidate designs and evolves them towards the optimum is called an evolutionary method.
Any optimisation algorithm that mimics (or is inspired by) natural systems or animal behaviour is called a nature-inspired method.
There is a lot of overlap between these categories.
For example, the Nelder–Mead algorithm is a non-gradient algorithm, but it is neither global, evolutionary, nor nature-inspired. The genetic algorithm is a non-gradient, global, evolutionary algorithm inspired by the evolutionary process found in all living beings.
This general area of optimisation is relatively new (compared to gradient methods), and hence the terminology has not yet settled.
Calculation of Gradient (and Hessian)
- Finite Difference
- Complex variable trick
- Automatic Differentiation
- Hyperdual numbers (not discussed in the class)
Finite Difference
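A sketch of forward- and central-difference gradient approximations, with the complex variable trick shown alongside for comparison; the step sizes and test function are typical illustrative choices, not values from the notes:

```python
import numpy as np

def forward_diff_grad(f, x, h=1e-6):
    """Forward difference: (f(x + h e_i) - f(x)) / h, with O(h) truncation error."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    g = np.zeros_like(x)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += h
        g[i] = (f(xp) - fx) / h
    return g

def central_diff_grad(f, x, h=1e-5):
    """Central difference: (f(x + h e_i) - f(x - h e_i)) / (2h), with O(h^2) error."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[i] += h
        xm[i] -= h
        g[i] = (f(xp) - f(xm)) / (2 * h)
    return g

def complex_step_grad(f, x, h=1e-20):
    """Complex variable trick: Im(f(x + i h e_i)) / h, free of subtractive cancellation."""
    x = np.asarray(x, dtype=float)
    g = np.zeros(x.size)
    for i in range(x.size):
        xc = x.astype(complex)
        xc[i] += 1j * h
        g[i] = f(xc).imag / h
    return g

# Illustrative test: f(x) = x1^2 sin(x2); exact gradient is [2 x1 sin(x2), x1^2 cos(x2)].
f = lambda x: x[0]**2 * np.sin(x[1])
x0 = np.array([1.5, 0.7])
print(forward_diff_grad(f, x0), central_diff_grad(f, x0), complex_step_grad(f, x0))
```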
Surrogate Modelling
A standard way to find a degree-\((n-1)\) interpolating polynomial through \(n\) data points is the Lagrange polynomial. A brief introduction to this for \(f(x):\R \rightarrow \R\) can be found here.
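A minimal sketch of evaluating the Lagrange interpolating polynomial through \(n\) sample points; the sampled function and nodes are illustrative:

```python
import numpy as np

def lagrange_eval(x_data, y_data, x):
    """Evaluate the degree-(n-1) Lagrange polynomial through (x_data, y_data) at x."""
    n = len(x_data)
    total = 0.0
    for i in range(n):
        # Basis polynomial L_i(x): equals 1 at x_data[i] and 0 at every other node.
        L_i = 1.0
        for j in range(n):
            if j != i:
                L_i *= (x - x_data[j]) / (x_data[i] - x_data[j])
        total += y_data[i] * L_i
    return total

# Illustrative surrogate of f(x) = sin(x) built from 4 samples (a degree-3 interpolant).
x_data = np.array([0.0, 1.0, 2.0, 3.0])
y_data = np.sin(x_data)
print(lagrange_eval(x_data, y_data, 1.5), np.sin(1.5))  # interpolant vs true value
```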