Study Guide (mdo-2021)

Published January 12, 2024

Reading Guide

| Topic                        | Reference                                              |
|------------------------------|--------------------------------------------------------|
| Unconstrained Optimisation   | Jasbir Arora (Ch. 10 & 11)                             |
| Optimality Conditions        | Jasbir Arora (Ch. 4)                                   |
| Genetic Algorithms           | Kalyanmoy Deb (Ch. 6)                                  |
| Simulated Annealing          | Kalyanmoy Deb (Ch. 6)                                  |
| Multi-objective Optimisation | Kalyanmoy Deb (Ch. 1, 2, 3); Jasbir Arora (Ch. 17)     |
| Gradient Calculation Methods | Alonso Notes                                           |
| Robust Optimisation          | Class Notes                                            |
| Surrogate Modelling          |                                                        |

Gradient methods

  1. Line search methods
    • Gradient descent algorithms
      • Steepest descent
      • Coordinate descent
      • Stochastic gradient descent
      • Conjugate gradient
  2. Trust region methods

Gradient descent algorithms

  • Select a descent direction
  • Select a step-size (\(\alpha\)) along the descent direction
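The two steps above can be sketched as a short loop. This is a minimal illustration (the function names `steepest_descent`, `f`, and `grad` are my own, not from the notes), using the negative gradient as the descent direction and, for simplicity, a fixed step-size rather than a line search:

```python
import numpy as np

def steepest_descent(f, grad, x0, alpha=0.1, tol=1e-8, max_iter=1000):
    """Minimise f with a fixed step-size alpha along the negative gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = -grad(x)                    # step 1: descent direction
        if np.linalg.norm(d) < tol:     # stop when the gradient is (nearly) zero
            break
        x = x + alpha * d               # step 2: move alpha along d
    return x

# Example: f(x, y) = x^2 + 2 y^2, whose optimum is the origin
f = lambda x: x[0]**2 + 2.0 * x[1]**2
grad = lambda x: np.array([2.0 * x[0], 4.0 * x[1]])
x_star = steepest_descent(f, grad, [3.0, 2.0])
```

In practice the fixed `alpha` is replaced by one of the step-size calculation methods discussed next.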

Step-size Calculation

Step-size calculation methods can be classified as:

  • Exact step-size calculation
  • Iterative step-size calculation

Step-size calculation usually happens in two steps:

  • Bracketing the minima
    • Equal interval search
  • Reduction in the interval of uncertainty (or the bracket) iteratively
    • Equal interval search (1-point and 2-point methods)
    • Golden-section search
    • Polynomial interpolation
      • Quadratic curve fitting
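As one concrete instance of interval reduction, here is a sketch of golden-section search (the function name `golden_section` is my own). Each iteration shrinks the bracket by the constant factor \(1/\phi \approx 0.618\) while reusing one of the two interior function evaluations:

```python
def golden_section(f, a, b, tol=1e-8):
    """Reduce the bracket [a, b] by the golden ratio until it is narrower than tol."""
    inv_phi = (5 ** 0.5 - 1) / 2        # 1/phi ~ 0.618
    c = b - inv_phi * (b - a)           # interior points splitting the bracket
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):                 # minimum must lie in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                           # minimum must lie in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

# Example: minimise (t - 1.5)^2 over the bracket [0, 4]
alpha = golden_section(lambda t: (t - 1.5) ** 2, 0.0, 4.0)
```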

Drawbacks of steepest descent method

  • Poor performance on ill-conditioned problems (the ellipse from the lecture) as opposed to well-conditioned problems (the circle from the lecture).
  • The method is fast far from the optimum but becomes slow in the neighbourhood of the optimum.
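The ill-conditioning drawback can be demonstrated numerically. The sketch below (the helper `sd_iterations` is my own) runs steepest descent with the exact line search for a quadratic \(f(\bx) = \tfrac{1}{2}\bx^T A \bx\), whose minimising step-size has the closed form \(\alpha = (g^T g)/(g^T A g)\), and counts iterations on a circular versus an elongated elliptical bowl:

```python
import numpy as np

def sd_iterations(A, x0, tol=1e-8, max_iter=100000):
    """Steepest descent with exact line search on f(x) = 0.5 x^T A x.
    Returns the number of iterations until the gradient norm drops below tol."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = A @ x                        # gradient of the quadratic
        if np.linalg.norm(g) < tol:
            return k
        alpha = (g @ g) / (g @ A @ g)    # exact minimising step-size
        x = x - alpha * g
    return max_iter

well = sd_iterations(np.diag([1.0, 1.0]), [3.0, 2.0])    # circle: condition number 1
ill = sd_iterations(np.diag([1.0, 100.0]), [3.0, 2.0])   # ellipse: condition number 100
```

The well-conditioned case converges in a single step, while the ill-conditioned case zigzags for hundreds of iterations.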

Stopping criteria

  • Similar criteria are used for gradient as well as non-gradient methods

Algorithm Performance

  • Computational complexity (space and time)
  • Rate of convergence

Consider any gradient algorithm. We get a series of design points \(\bx^{(0)}, \bx^{(1)}, \ldots, \bx^{(k)}, \bx^{(k+1)}, \ldots, \bx^{(p)}\). We can treat these design points as a sequence. The algorithm is converging if this sequence converges to the actual optimum \(\bx^{*}\). We define the convergence ratio (\(\beta\)) as follows:

\[\beta = \frac{\norm{\bx^{(k+1)} - \bx^*} }{\norm{\bx^{(k)} - \bx^*}^r}\]

where \(r\) is the rate of convergence.

| Convergence | Condition                                                                         |
|-------------|-----------------------------------------------------------------------------------|
| Linear      | \(r=1\) and \(\beta < 1\) (only first-order information required)                 |
| Superlinear | \(r=1\) and \(\beta \rightarrow 0\)                                               |
| Superlinear | \(1 < r < 2\) and \(\beta < 1\) (achieved using gradient information from previous steps) |
| Quadratic   | \(r=2\) and \(\beta < 1\) (Hessian information required)                          |
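The ratio \(\beta\) can be estimated from an observed sequence of iterates. As an illustration (a constructed example, not from the notes): steepest descent on \(f(x) = x^2\) with a fixed step-size of 0.25 gives \(x^{(k+1)} = x^{(k)} - 0.25 \cdot 2x^{(k)} = 0.5\,x^{(k)}\), so the sequence converges linearly (\(r=1\)) to \(x^* = 0\) with \(\beta = 0.5\):

```python
# Generate the iterates x_{k+1} = 0.5 * x_k starting from x_0 = 1
xs = [1.0]
for _ in range(10):
    xs.append(xs[-1] - 0.25 * 2.0 * xs[-1])

# Estimate beta for r = 1: ratio of successive distances to the optimum x* = 0
x_star = 0.0
betas = [abs(xs[k + 1] - x_star) / abs(xs[k] - x_star) for k in range(len(xs) - 1)]
```

Every ratio in `betas` equals 0.5, confirming linear convergence with \(\beta = 0.5 < 1\).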

Non-gradient methods

  • Any optimisation algorithm that does not require gradient information is called a non-gradient method.

  • Any optimisation algorithm that does not get trapped in a local optimum and searches the entire design space to find the global optimum is called a global method.

  • Any optimisation algorithm that works on populations of candidate designs and evolves them towards the optimum is called an evolutionary method.

  • Any optimisation algorithm that mimics (or is inspired by) natural systems or animal behaviour is called a nature-inspired method.

There is a lot of overlap among these terms.

For example, the Nelder-Mead algorithm is a non-gradient method, but it is not global, evolutionary, or nature-inspired. The genetic algorithm is a non-gradient, global, evolutionary method inspired by the evolution process found in all living beings.

This general area of optimisation is relatively new (compared to gradient methods), and hence the terminology has not yet settled.

Calculation of Gradient (and Hessian)

  • Finite Difference
  • Complex variable trick
  • Automatic Differentiation
  • Hyperdual numbers (not discussed in the class)

Finite Difference

Surrogate Modelling

A standard way to find the degree-\((n-1)\) interpolating polynomial through \(n\) data points is the Lagrange polynomial. A brief introduction to this for \(f(x):\R \rightarrow \R\) can be found here.
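A direct evaluation of the Lagrange polynomial can be sketched as follows (the function name `lagrange_eval` is my own). Each basis polynomial \(L_i\) equals 1 at its own node and 0 at every other node, so the weighted sum passes through all \(n\) points:

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the degree-(n-1) Lagrange polynomial through (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # i-th basis polynomial: product of (x - xj)/(xi - xj) over all j != i
        L = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                L *= (x - xj) / (xi - xj)
        total += yi * L
    return total

# Three points sampled from f(x) = x^2: the degree-2 interpolant recovers f exactly
xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 4.0]
```

Since the interpolant through three points on \(f(x) = x^2\) is \(x^2\) itself, evaluating at \(x = 1.5\) returns 2.25.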