Study Guide
Reading Guide
Topic | Reference |
---|---|
Unconstrained Optimisation | Jasbir Arora (Ch. 10 & 11) |
Optimality Conditions | Jasbir Arora (Ch. 4) |
Genetic Algorithms | Kalyanmoy Deb (Ch. 6) |
Simulated Annealing | Kalyanmoy Deb (Ch. 6) |
Multi-objective Optimisation | Kalyanmoy Deb (Ch. 1, 2, 3); Jasbir Arora (Ch. 17) |
Gradient Calculation Methods | Alonso Notes |
Robust Optimisation | Class Notes |
Surrogate Modelling | |
Gradient methods
- Line search methods
  - Gradient descent algorithms
    - Steepest descent
    - Coordinate descent
    - Stochastic gradient descent
    - Conjugate gradient
- Trust region methods
Gradient descent algorithms
- Select a descent direction
- Select a step-size (\(\alpha\)) along the descent direction
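A minimal sketch of this two-step loop in Python, using steepest descent as the direction and a fixed step size purely for illustration; the quadratic test function and the value of `alpha` are assumptions, not from the notes:

```python
import numpy as np

def quad(x):
    """Illustrative quadratic objective (assumed, not from the notes)."""
    return x[0]**2 + 10 * x[1]**2

def grad_quad(x):
    """Analytical gradient of the quadratic above."""
    return np.array([2 * x[0], 20 * x[1]])

def gradient_descent(f, grad, x0, alpha=0.01, tol=1e-6, max_iter=10000):
    """Generic descent loop: pick a direction, then take a step of size alpha along it."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = -grad(x)                 # step 1: choose a descent direction (here, steepest descent)
        if np.linalg.norm(d) < tol:  # stop once the gradient is (nearly) zero
            break
        x = x + alpha * d            # step 2: move a distance alpha along the direction
    return x

print(gradient_descent(quad, grad_quad, x0=[5.0, 3.0]))  # approaches the minimiser [0, 0]
```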
Step-size Calculation
Step-size calculation methods can be classified as:
- Exact step-size calculation
- Iterative step-size calculation
Step-size calculation usually happens in two steps:
- Bracketing the minimum
  - Equal interval search
- Reduction in the interval of uncertainty (or the bracket) iteratively
  - Equal interval search (1-point and 2-point methods)
  - Golden-section search (see the sketch after this list)
  - Polynomial interpolation
    - Quadratic curve fitting
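A minimal golden-section search sketch, assuming the minimum has already been bracketed in \([a, b]\) and the one-dimensional function `phi` of the step size is unimodal on that interval; the example function is illustrative, not from the notes:

```python
import math

def golden_section(phi, a, b, tol=1e-6):
    """Iteratively shrink the bracket [a, b] by the golden-ratio factor."""
    inv_phi = (math.sqrt(5) - 1) / 2       # ~0.618
    while (b - a) > tol:
        c = b - inv_phi * (b - a)          # interior point closer to a
        d = a + inv_phi * (b - a)          # interior point closer to b
        if phi(c) < phi(d):
            b = d                          # minimum lies in [a, d]
        else:
            a = c                          # minimum lies in [c, b]
    return 0.5 * (a + b)

# Illustrative 1-D function of the step size; the true minimiser is alpha = 2.
alpha_star = golden_section(lambda t: (t - 2.0) ** 2 + 1.0, a=0.0, b=5.0)
print(alpha_star)  # ~2.0
```

Each iteration shrinks the bracket by the golden-ratio factor (about 0.618); a production implementation would reuse one of the two interior evaluations per iteration instead of recomputing both.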
Inexact line search
- Armijo’s rule
Armijo’s rule only promises to find an \(\alpha\) such that \(f(\bx^{(k+1)}) < f(\bx^{(k)})\); it does not try to locate the exact minimiser along the search direction.
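A backtracking sketch of Armijo's rule, assuming the usual sufficient-decrease form \(f(\bx + \alpha d) \le f(\bx) + c_1 \alpha \nabla f(\bx)^T d\); the constants `c1` and `rho` and the test function are illustrative choices, not values prescribed in the notes:

```python
import numpy as np

def armijo_backtracking(f, grad_f, x, d, alpha0=1.0, rho=0.5, c1=1e-4):
    """Shrink alpha until the Armijo (sufficient-decrease) condition holds."""
    alpha = alpha0
    fx = f(x)
    slope = grad_f(x) @ d                  # directional derivative; negative for a descent direction
    while f(x + alpha * d) > fx + c1 * alpha * slope:
        alpha *= rho                       # backtrack: reduce the step size
    return alpha

# Illustrative use with a simple quadratic and the steepest-descent direction.
f = lambda x: x[0]**2 + 10 * x[1]**2
grad_f = lambda x: np.array([2 * x[0], 20 * x[1]])
x = np.array([5.0, 3.0])
d = -grad_f(x)
alpha = armijo_backtracking(f, grad_f, x, d)
print(alpha, f(x + alpha * d) < f(x))  # True: the accepted step decreases f
```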
Drawbacks of steepest descent method
- Poor performance on ill-conditioned problems (the elongated ellipse in the lecture) as opposed to well-conditioned problems (the circle in the lecture); see the experiment after this list.
- The method is fast far away from the optimum but becomes slow in the neighbourhood of the optimum.
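A small experiment illustrating the ill-conditioning point: steepest descent with a simple backtracking line search needs far more iterations on an elongated quadratic (elliptical contours) than on a well-conditioned one (circular contours). The test functions and starting point are illustrative assumptions:

```python
import numpy as np

def steepest_descent(f, grad_f, x0, tol=1e-6, max_iter=10000):
    """Steepest descent with simple backtracking; returns the iteration count."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            return k
        d = -g
        alpha, fx = 1.0, f(x)
        while f(x + alpha * d) > fx + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        x = x + alpha * d
    return max_iter

well = lambda x: x[0]**2 + x[1]**2              # circular contours
ill  = lambda x: x[0]**2 + 100 * x[1]**2        # elongated elliptical contours
grad_well = lambda x: np.array([2 * x[0], 2 * x[1]])
grad_ill  = lambda x: np.array([2 * x[0], 200 * x[1]])

print(steepest_descent(well, grad_well, [5.0, 3.0]))  # converges in very few iterations
print(steepest_descent(ill,  grad_ill,  [5.0, 3.0]))  # takes many more iterations (zigzagging)
```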
Stoppage criteria
- Similar criteria are used for gradient as well as non-gradient methods
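A sketch of the checks that are commonly combined into a stopping test; the tolerances are illustrative assumptions, and the gradient-based check applies only to gradient methods:

```python
import numpy as np

def should_stop(x_new, x_old, f_new, f_old, grad=None,
                eps_x=1e-8, eps_f=1e-8, eps_g=1e-6, k=0, max_iter=1000):
    """Common stopping checks: small step, small change in f, small gradient, iteration budget."""
    if k >= max_iter:                                      # iteration budget exhausted
        return True
    if np.linalg.norm(np.asarray(x_new) - np.asarray(x_old)) < eps_x:  # design barely changing
        return True
    if abs(f_new - f_old) < eps_f:                         # objective barely changing
        return True
    if grad is not None and np.linalg.norm(grad) < eps_g:  # first-order optimality (gradient methods only)
        return True
    return False
```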
Algorithm Performance
Computational complexity (space and time)
Rate of convergence
Consider any gradient algorithm. We get a series of design points \(\bx^{(0)}, \bx^{(1)}, \dots, \bx^{(k)}, \bx^{(k+1)}, \dots, \bx^{(p)}\). We can treat these design points as a sequence. The algorithm is converging if this sequence converges to the actual optimum \(\bx^{*}\). We define the convergence ratio (\(\beta\)) as follows:
\[\beta = \frac{\norm{\bx^{(k+1)} - \bx^*} }{\norm{\bx^{(k)} - \bx^*}^r}\]
where \(r\) is the rate of convergence.
Convergence | Condition |
---|---|
Linear | \(r=1\) and \(\beta < 1\) (Only first order information required) |
Superlinear | \(r=1\) and \(\beta \rightarrow 0\) |
Superlinear | \(1 < r<2\) and \(\beta < 1\) (Achieved using gradient information from previous steps) |
Quadratic | \(r=2\) and \(\beta<1\) (Hessian information required) |
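One way to check these definitions numerically is to estimate the error ratios from the last few iterates of a run; the Newton iteration for \(\sqrt{2}\) below is an illustrative example, not from the notes:

```python
import numpy as np

def convergence_diagnostics(xs, x_star):
    """Print error ratios e_{k+1}/e_k and e_{k+1}/e_k^2 for a sequence of iterates."""
    errs = [np.linalg.norm(np.asarray(x) - np.asarray(x_star)) for x in xs]
    for k in range(len(errs) - 1):
        linear_ratio = errs[k + 1] / errs[k]          # estimate of beta if r = 1
        quad_ratio = errs[k + 1] / errs[k] ** 2       # estimate of beta if r = 2
        print(f"k={k}: linear ratio={linear_ratio:.3e}, quadratic ratio={quad_ratio:.3e}")

# Newton's iteration for solving x^2 = 2 converges quadratically: the quadratic ratio
# settles to a constant while the linear ratio tends to zero.
xs, x = [], 2.0
for _ in range(5):
    xs.append(x)
    x = x - (x * x - 2.0) / (2.0 * x)    # Newton step for x^2 - 2 = 0
convergence_diagnostics(xs, np.sqrt(2.0))
```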
Non-gradient methods
Any optimisation algorithm that does not require gradient information is called a non-gradient method.
Any optimisation algorithm that does not get trapped in a local optimum and instead searches the entire design space for the global optimum is called a global method.
Any optimisation algorithm that works on populations of candidate designs and evolves them towards the optimum is called an evolutionary method.
Any optimisation algorithm that mimics (or is inspired by) natural systems or animal behaviour is called a nature-inspired method.
There is a lot of overlap between these categories.
For example, the Nelder–Mead algorithm is a non-gradient algorithm, but it is neither global, evolutionary, nor nature-inspired. The genetic algorithm is a non-gradient, global, evolutionary algorithm inspired by the evolutionary process found in all living beings.
This general area of optimisation is relatively new (compared to gradient methods), and hence the terminology has not yet settled.
Calculation of Gradient (and Hessian)
- Finite Difference
- Complex variable trick
- Automatic Differentiation
- Hyperdual numbers (not discussed in the class)
Finite Difference
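A sketch of forward- and central-difference gradient approximations, with the complex variable trick shown alongside for comparison; the step sizes and test function are typical illustrative choices, not values from the notes:

```python
import numpy as np

def forward_diff_grad(f, x, h=1e-6):
    """Forward difference: (f(x + h e_i) - f(x)) / h, with O(h) truncation error."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    g = np.zeros_like(x)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += h
        g[i] = (f(xp) - fx) / h
    return g

def central_diff_grad(f, x, h=1e-5):
    """Central difference: (f(x + h e_i) - f(x - h e_i)) / (2h), with O(h^2) error."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[i] += h
        xm[i] -= h
        g[i] = (f(xp) - f(xm)) / (2 * h)
    return g

def complex_step_grad(f, x, h=1e-20):
    """Complex variable trick: Im(f(x + i h e_i)) / h, free of subtractive cancellation."""
    x = np.asarray(x, dtype=float)
    g = np.zeros(x.size)
    for i in range(x.size):
        xc = x.astype(complex)
        xc[i] += 1j * h
        g[i] = f(xc).imag / h
    return g

# Illustrative test: f(x) = x1^2 sin(x2); exact gradient is [2 x1 sin(x2), x1^2 cos(x2)].
f = lambda x: x[0]**2 * np.sin(x[1])
x0 = np.array([1.5, 0.7])
print(forward_diff_grad(f, x0), central_diff_grad(f, x0), complex_step_grad(f, x0))
```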
Surrogate Modelling
A standard way to find a degree-\((n-1)\) interpolating polynomial through \(n\) data points is the Lagrange polynomial. A brief introduction to this for \(f(x):\R \rightarrow \R\) can be found here.
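A minimal sketch of evaluating the Lagrange interpolating polynomial through \(n\) sample points; the sampled function and nodes are illustrative:

```python
import numpy as np

def lagrange_eval(x_data, y_data, x):
    """Evaluate the degree-(n-1) Lagrange polynomial through (x_data, y_data) at x."""
    n = len(x_data)
    total = 0.0
    for i in range(n):
        # Basis polynomial L_i(x): equals 1 at x_data[i] and 0 at every other node.
        L_i = 1.0
        for j in range(n):
            if j != i:
                L_i *= (x - x_data[j]) / (x_data[i] - x_data[j])
        total += y_data[i] * L_i
    return total

# Illustrative surrogate of f(x) = sin(x) built from 4 samples (a degree-3 interpolant).
x_data = np.array([0.0, 1.0, 2.0, 3.0])
y_data = np.sin(x_data)
print(lagrange_eval(x_data, y_data, 1.5), np.sin(1.5))  # interpolant vs true value
```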