List six desirable properties of an optimisation algorithm.
Solution
Accuracy
Low Computational Cost
Simplicity (implementation)
Finds the global minimum
Able to handle discrete variables
Able to handle discontinuous functions
Able to handle discontinuous design space
State and prove the first-order necessary optimality condition for unconstrained optimisation problem over a continuous domain.
What is a convex function?
Is the function \(f(x) = x(1-x)\) convex?
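A one-line second-derivative check (a sketch, not the expected written answer): \(f(x) = x - x^2\) gives \(f''(x) = -2 < 0\) everywhere, so \(f\) is strictly concave rather than convex.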
For most real-life problems, proving the design problem’s convexity is impossible. In such a scenario, how can the use of gradient algorithms be justified?
How is the rate of convergence of an optimisation algorithm defined, and what is its significance? Show that the sequence defined by \(x^{(k)} = 3^{-k^2}\) exhibits superlinear convergence.
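As a quick check against the definition (an outline, not the full argument): the sequence converges to \(x^* = 0\), and the successive-error ratio is
\[
\frac{|x^{(k+1)} - x^*|}{|x^{(k)} - x^*|} = \frac{3^{-(k+1)^2}}{3^{-k^2}} = 3^{-(2k+1)} \longrightarrow 0 \quad \text{as } k \to \infty,
\]
which is exactly the condition for superlinear convergence.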
Consider the objective function \(f(x_1, x_2) = x_1^4 + x_2^4\). This is clearly a nonlinear optimisation problem. But if I change my design variables to \(y_1 = x_1^4,\ y_2 = x_2^4\), then the resulting optimisation problem becomes \(f(y_1, y_2) = y_1 + y_2\), which is clearly linear.
Can the problem \(f(x_1, x_2) = x_1^2 x_2^2 + x_1 x_2^2 + x_1^2 x_2 + x_1 x_2\) be converted to a linear optimisation form using this trick?
What are the advantages of using this trick to convert a nonlinear problem to linear problem?
Given \(f(x_1, x_2, x_3) = x_1^2 + 2x_2^2 + 3x_1 x_3\), calculate the Hessian.
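For an independent check of the hand calculation, the Hessian can also be obtained symbolically (an illustrative sketch using sympy, not part of the expected answer):

# Sketch: verify the Hessian of f = x1^2 + 2*x2^2 + 3*x1*x3 symbolically (illustration only).
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = x1**2 + 2*x2**2 + 3*x1*x3

# hessian() returns the matrix of second partial derivatives.
H = sp.hessian(f, (x1, x2, x3))
print(H)  # expected: Matrix([[2, 0, 3], [0, 4, 0], [3, 0, 0]])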
We have an objective function \(f(x_1, x_2)\), such that \(\nabla f = [\beta_0, \beta_1]^T\), where \(\beta_i \ge 0\) are constants. For \(x_i \in [0, \infty)\), what is the minimum of the function?
Gradient algorithms
Draw the direction of the gradient at the point (1,1).
Code
# This code writes the quadratic form as 0.5 x^T A x + b^T x + c, where A is
# assembled from its eigenvectors and eigenvalues as A = V L V^{-1}, and then
# plots the contours.
import numpy as np
import matplotlib.pyplot as plt

V = np.array([[1, 1], [1, -1]])   # eigenvectors as columns
l = np.array([[1, 0], [0, -2]])   # eigenvalues on the diagonal
A = np.dot(V, np.dot(l, np.linalg.inv(V)))
b = np.array([1, 1])
c = 0

x = np.linspace(-2, 2, 100)
y = np.linspace(-2, 2, 100)
X, Y = np.meshgrid(x, y)
Z = 0.5 * (A[0, 0] * X**2 + A[1, 1] * Y**2 + 2 * A[0, 1] * X * Y) + b[0] * X + b[1] * Y + c

contours = plt.contour(X, Y, Z, 20)
plt.clabel(contours, inline=True, fontsize=8)
plt.colorbar()
plt.xlabel('$x_1$')
plt.ylabel('$x_2$')
plt.plot(1, 1, 'ro')
# Save the figure as pdf
plt.savefig('quadratic_contour.pdf')
plt.show()
Write the complete algorithm for the steepest descent method. All the steps with the relevant mathematical formulation are expected. A brief discussion of the algorithm’s parameters and their usual range of values is also expected.
Write the complete algorithm for the conjugate gradient method. All the steps with the relevant mathematical formulation are expected. A brief discussion of the algorithm’s parameters and their usual range of values is also expected.
Write the complete algorithm for the Newton-Raphson method. All the steps with the relevant mathematical formulation are expected. A brief discussion of the algorithm’s parameters and their usual range of values is also expected.
Write the complete algorithm for the quasi-Newton method. All the steps with the relevant mathematical formulation are expected. A brief discussion of the algorithm’s parameters and their usual range of values is also expected.
Write the complete algorithm for the BFGS method. All the steps with the relevant mathematical formulation are expected. A brief discussion of the algorithm’s parameters and their usual range of values is also expected.
Write the complete algorithm for the DFP method. All the steps with the relevant mathematical formulation are expected. A brief discussion of the algorithm’s parameters and their usual range of values is also expected.
Write the complete algorithm for the Levenberg-Marquardt method. All the steps with the relevant mathematical formulation are expected. A brief discussion of the algorithm’s parameters and their usual range of values is also expected.
Given an objective function \(f(x) = \frac{1}{2} x^T A x - b^T x,~x \in \Re^n\), prove that the conjugate gradient method reaches the optima \(x^{*}\) in \(n\) iterations.
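A small numerical illustration of the claim (a sketch only, not a substitute for the proof), using the linear conjugate gradient recurrences on a \(3 \times 3\) symmetric positive-definite \(A\); in exact arithmetic the residual vanishes after \(n = 3\) iterations:

# Sketch: linear conjugate gradient on f(x) = 0.5 x^T A x - b^T x with an SPD A.
# For n = 3 the residual should (up to round-off) vanish after 3 iterations.
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 2.0, 3.0])

x = np.zeros(3)
r = b - A @ x        # residual = negative gradient of f
d = r.copy()         # initial search direction
for k in range(3):
    alpha = (r @ r) / (d @ A @ d)     # exact line search step length
    x = x + alpha * d
    r_new = r - alpha * (A @ d)
    beta = (r_new @ r_new) / (r @ r)  # conjugacy (Fletcher-Reeves) coefficient
    d = r_new + beta * d
    r = r_new
    print(k + 1, np.linalg.norm(r))   # residual norm ~ 0 at iteration 3

print(x, np.linalg.solve(A, b))       # both should agree with x*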
Write a short note comparing the relative strengths and weaknesses of gradient and non-gradient optimisation algorithms.
Please write a short note on various numerical step-size calculation methods by clearly outlining their advantages and disadvantages.
Write down the complete algorithm for Marquardt's method, either using a flowchart or a pseudocode. Please add extra text at the bottom of the algorithm to clarify each step further (if needed). Do not neglect to mention even simple (obvious) steps such as "Calculate the gradient \(c\)".
Show that the BFGS formula given below is symmetric and satisfies the secant condition,
\[
B^{(k+1)} = B^{(k)} + \frac{y^{(k)} \left(y^{(k)}\right)^T}{\left(y^{(k)}\right)^T s^{(k)}} - \frac{B^{(k)} s^{(k)} \left(s^{(k)}\right)^T B^{(k)}}{\left(s^{(k)}\right)^T B^{(k)} s^{(k)}},
\]
where \(c\) is the gradient vector, \(d\) is the descent direction, \(y^{(k)} = c^{(k+1)} - c^{(k)}\), \(s^{(k)} = x^{(k+1)} - x^{(k)}\) and \(B\) is an approximation of the Hessian matrix \(H\).
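For intuition, the two properties can also be checked numerically on arbitrary data with \((y^{(k)})^T s^{(k)} \neq 0\) (a sketch assuming the update written above; the random data are for illustration only):

# Sketch: numerically check that the BFGS update of B stays symmetric and
# satisfies the secant condition B_new s = y (random data, illustration only).
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)   # symmetric positive definite B^(k)
s = rng.standard_normal(n)    # plays the role of s^(k) = x^(k+1) - x^(k)
y = rng.standard_normal(n)    # plays the role of y^(k) = c^(k+1) - c^(k)

B_new = (B
         + np.outer(y, y) / (y @ s)
         - np.outer(B @ s, B @ s) / (s @ B @ s))

print(np.allclose(B_new, B_new.T))   # symmetry
print(np.allclose(B_new @ s, y))     # secant condition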
List the advantages of gradient descent methods. What is the basic criterion for a direction to be called a descent direction? A mathematical statement is expected here.
We will investigate Newton’s method, BFGS and Levenberg-Marquardt algorithm in this question. We will consider the standard nonlinear problem \[\min_{\bar{x}} f(\bar{x}),\ \ \forall\ \bar{x} \in \Re^n\]
Write the update formula for the Newton’s method. Use standard notation.
What is the computational cost of each iteration of Newton’s method?
What is the computational cost of each iteration of BFGS method?
If researchers came up with a Hessian calculation algorithm that is \(\mathcal{O}(1)\) (which is very fast and impossible in real life), would you prefer Newton's method or BFGS?
Under which conditions is the Levenberg-Marquardt algorithm better than Newton's method?
Solution
When the objective function is cubic or of higher-order nonlinearity and the initial guess is far away from the minimum.
Non-gradient algorithms
Write two principal advantages and disadvantages of non-gradient optimisation methods over gradient methods.
Write the complete algorithm for the Simulated Annealing method. All the steps with the relevant mathematical formulation are expected. A brief discussion of the algorithm’s parameters and their usual range of values is also expected.
Explain the particle swarm algorithm with the help of a flowchart or a pseudocode.
What are the various parameters and operators of particle swarm optimisation? Discuss their influence on the performance of the algorithm.
Describe at least two different types of scaling and selection methods used in genetic algorithms.
On what basis are specific scaling and selection methods selected for an application?
What is the advantage of real-coded GAs over binary-coded GAs?
How do the values of crossover and mutation probabilities affect the performance of GA?
Illustrate the uniform crossover and mutation operators in genetic programming with an example.
Which parameter should a designer change in GA to make the algorithm perform more exploration and less exploitation?
What governs the length of the chromosome in binary GA?
Outline two disadvantages and advantages of GA over gradient optimisation algorithms.
What is the primary advantage of MOEA algorithms over the gradient-based multi-objective scalarisation algorithms?
Point out any two disadvantages of VEGA with respect to NSGA algorithms.
A genetic algorithm with a population size of \(n=5\) is used for minimising \(f(x_1, x_2) = 10x_1 + 2x_2^2\), where \(x_1\) and \(x_2\) are integers. The population at the \(k^{th}\) iteration is \([(1,1), (1, 2), (2,1), (2,2), (1,0)]\). Assume that each design variable is encoded with four bits.
Which is the fittest individual in this population?
Assuming one member elitism, no mutation and roulette wheel selection methodology, calculate the next generation.
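One way to set up the fitness evaluation and roulette-wheel step numerically (a sketch; the fitness scaling \(F_i = f_{\max} - f_i + \epsilon\) used here is an assumption, since the question does not prescribe one, and the selected pool is stochastic):

# Sketch: fitness evaluation and roulette-wheel selection for the given population.
# Minimisation is converted to a maximisation fitness via F_i = f_max - f_i + eps
# (this particular scaling is an assumption, not prescribed by the question).
import numpy as np

def f(x1, x2):
    return 10 * x1 + 2 * x2**2

population = [(1, 1), (1, 2), (2, 1), (2, 2), (1, 0)]
fvals = np.array([f(*p) for p in population], dtype=float)
print(fvals)                              # [12. 18. 22. 28. 10.]
print(population[int(np.argmin(fvals))])  # fittest individual: (1, 0)

eps = 1.0
fitness = fvals.max() - fvals + eps       # larger is better
probs = fitness / fitness.sum()           # roulette-wheel probabilities

rng = np.random.default_rng(1)
parents = rng.choice(len(population), size=len(population), p=probs)
print([population[i] for i in parents])   # selected mating pool (stochastic)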
Gradient Calculation
A CFD calculation is performed over an aircraft defined by \(n\) design parameters. The CFD solution thus obtained is post-processed to calculate \(m\) objective functions of interest (like lift, drag, and pitching moment). Assuming that the computational cost of one CFD run is \(C\), write down the computational cost of the entire Jacobian matrix calculation for the forward step finite difference, central finite difference, complex step method and adjoint method.
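As a point of reference (these are the commonly quoted counts, assuming one CFD run of cost \(C\) yields all \(m\) post-processed objectives and that a complex-valued or adjoint solve costs roughly the same as a real flow solve):
\[
\text{forward FD: } (n+1)\,C, \qquad
\text{central FD: } 2n\,C, \qquad
\text{complex step: } n\,C, \qquad
\text{adjoint: } \approx (1+m)\,C,
\]
so only the adjoint cost is essentially independent of the number of design variables \(n\).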
Compare the performance of the finite difference method (FDM) with the complex variable method (CVM) regarding the accuracy, computational cost and ease of implementation. Can CVM be implemented for all analysis tools (CFD/FEM codes/packages) used in optimisation?
How many function evaluations will be required to approximate the entire Hessian matrix for the objective function \(\mathbf{f}(\mathbf{x})\) where \(\mathbf{f} \in \Re^m\) and \(\mathbf{x} \in \Re^n\) using central finite difference approximation?
You are given a choice between two CFD solvers, where the first solver is open-source. Which gradient calculation method can be used with each solver? Discuss the relative advantages and disadvantages.
A CFD calculation is performed over an aircraft defined by \(n\) design parameters. The CFD solution thus obtained is post-processed to calculate \(m\) objective functions of interest (like lift, drag and pitching moment). Assuming that the computational cost of one CFD run is \(C\), write down the computational cost of the entire Jacobian matrix calculation for forward step finite difference, central finite difference and complex step method.
Surrogate Modelling
Write down the basis functions for a third-order polynomial regression fit for design variables \(x = [x_1~x_2]^T\).
Computer simulations of flow over an airfoil are deterministic. Multiple CFD runs with the same parameters (mesh size, convergence criteria and turbulence model) are expected to yield the same lift values. Hence, it is not reasonable to use a least squares fit (a non-interpolating fit) to generate surrogate models for such cases. Please comment.
A linear regression model is constructed from \(x \in \Re^n\) to an objective function \(f \in \Re\) using \(N\) data points, where \(N > n+1\). The gradient descent technique was used to obtain the regression coefficients \(w_i\). The \(w_i\) thus obtained are unique. Please comment.
What is the aliasing error? How is it overcome during sampling for surrogate models?
Write down the explicit solution for a linear polynomial regression problem with \(m\) basis functions and \(n\) sample points. Clearly define all the variables used in the expression as well as their dimensions.
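A minimal numerical sketch of the normal-equations solution \(\mathbf{w} = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{f}\) (the sample data and the choice of basis functions below are made up purely for illustration):

# Sketch: least-squares regression coefficients via the normal equations.
# Phi is the n x m matrix of basis functions evaluated at the n sample points,
# f is the n-vector of observed objective values, w the m regression coefficients.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])       # n = 5 sample points (made up)
f = np.array([1.1, 2.9, 9.2, 19.1, 32.8])     # observed values (made up)

# m = 3 basis functions: 1, x, x^2
Phi = np.column_stack([np.ones_like(x), x, x**2])

w = np.linalg.solve(Phi.T @ Phi, Phi.T @ f)   # normal equations
print(w)
print(np.linalg.lstsq(Phi, f, rcond=None)[0]) # same answer, numerically more stable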
Assume that the computational cost of an optimisation algorithm is \(\alpha\) per iteration. Therefore, if \(n\) iterations are required for convergence, then the total cost is \(\alpha n\). Also, assume that it is known from prior experience that the given problem requires \(m\) sampling points to create an accurate surrogate model of the entire design space. Under what conditions would you use surrogate-based optimisation, purely from the consideration of computational cost?
For the sampling data, \(x = [1\ 2\ 3]^T\) the value of the objective function is \(f(x) = [4\ 10\ 13]^T\). Write the nonlinear regression problem to solve for \(\beta\)’s to fit the model \(\hat{f}(x) = \beta_0 + \beta_1^2 x\). No need to solve the resulting problem.
Multi-objective Optimisation
Define Pareto optimality for multi-objective optimisation.
Outline the \(\epsilon\)-Constraint method.
List the advantages of the \(\epsilon\)-Constraint method.
Using the weighted \(l_2\) distance metric, find the Pareto-optimal solutions corresponding to the following weight vectors: \((w_1, w_2) = [(1,0), (0.5,0.5), (0,1)]\)
Using the weight vector \(\mathbf{w} = (w, 1-w)^T\), find the Pareto-optimal solutions in terms of \(w\).
What is the relationship between \(f_1\) and \(f_2\) for the Pareto-optimal solutions?
What is the Pareto-optimal solution corresponding to \(w = 0.5\)?
Show that the weighted-sum approach will not find half of the Pareto-optimal front.
Is a randomly selected point in the design space of an unconstrained multi-objective optimisation problem always feasible? If yes, justify. If no, then give one example of a point that is not feasible.
Write the advantages and disadvantages of the weighted sum algorithm for multi-objective optimisation.
A team of engineers is solving a complex multi-objective optimisation problem. Their estimation shows that getting the entire Pareto front is nearly impossible. What would you advise them as an optimisation expert? To make the problem a little concrete, let us assume that there are three objective functions. The team ranks the importance of the objective functions as \(f_1 > f_2 > f_3\). The computational cost for each objective function is \(C_{f_1} > C_{f_2} > C_{f_3}\).
Solution
Keep \(f_1\) in the objective and move \(f_2\) and \(f_3\) to the constraints.
Computational cost plays no role in the choice of solution methodology, since all the objective functions have to be calculated in every computational approach.
Uncertainty quantification and propagation
Given \(y = f(x)\), derive the equations for \(\mu_y\) and \(\sigma_y\) using the first-order moment method of uncertainty propagation in terms of \(\mu_x\) and \(\sigma_x\). The complete derivation is expected.
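The derivation starts from a first-order Taylor expansion of \(f\) about the mean \(\mu_x\) (an outline of the standard argument, not the full expected derivation):
\[
y \approx f(\mu_x) + \left.\frac{df}{dx}\right|_{\mu_x} (x - \mu_x)
\;\;\Rightarrow\;\;
\mu_y \approx f(\mu_x), \qquad
\sigma_y^2 \approx \left( \left.\frac{df}{dx}\right|_{\mu_x} \right)^2 \sigma_x^2,
\]
where taking the expectation eliminates the term in \((x - \mu_x)\) because \(E[x - \mu_x] = 0\).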
Compare four relative advantages and disadvantages of moment methods and the Monte Carlo method for uncertainty propagation.
An aircraft is to be designed to fly over a range of cruise velocities from \(v_1\) to \(v_2\). Can this problem be posed as a robust design problem? If yes, how? If not, why not?
Given \(y = f(x)\), it is known that \(\mu_y = f(\mu_x)\). Write the most general form of function \(f\) for which this is possible.
MDO Architectures
A given optimisation problem involves three disciplines. Disciplines two and three are coupled, and disciplines one and three are coupled. Assume standard notation for all the variables. Also, assume that the objective function and constraints depend on state variables from all three disciplines. In other words, there are no local objective functions or constraints.
Write the mathematical definition of the problem using IDF and MDF architectures in standard notation.
Draw the XDSM for the IDF architecture.
Draw the XDSM for the MDF architecture.
What is the necessity for using multi-disciplinary optimisation methodology when designing complex systems like aircraft?
Draw an XDSM for the IDF and MDF architectures for the standard aeroelasticity optimisation problem. The aeroelastic wing optimisation problem is the optimisation of a flexible wing (characterised by design variables \(x \in \Re^n\)) to maximise the aerodynamic efficiency (\(E = \frac{C_L}{C_D}\)) using a gradient optimiser. Gradient calculation is performed using the central finite difference method. Assume a global set of constraints (from the aerodynamics and structures disciplines) \(g(x) \le 0\). Identify the coupling variables between the two disciplines.
Write a short note on the need for distributed MDO architectures over the monolithic MDO architectures.
Draw separate XDSM diagrams for the multi-disciplinary analysis of the following problem using Jacobi and Gauss-Seidel iterations.
Explain why the Individual Discipline Feasible (IDF) approach should be faster than the Multi-Disciplinary Feasible (MDF) approach if we have a global low cost ideal optimiser. Assume that the optimiser is given to you by GOD and has all the nice properties you wish for.
Applications
A standard scalar nonlinear optimisation problem has been solved using the same optimiser (same algorithm and same implementation, e.g. Matlab's fmincon) by multiple people. They have all reported slightly different minima. What could be the reason? List at least three possible sources of difference.
Solution
Initial guess
Gradient calculation method
Stoppage criteria
Consider an aircraft optimisation problem. Apart from all the geometric parameters, one of the design variables is \(n_e\), the number of engines. Would you recommend using the steepest descent algorithm for this problem? If yes, why not Newton's method? If no, please explain.
A team of engineers is performing aerodynamic wing optimisation using Ansys software. They found out that the gradient optimisation algorithm is not converging because of errors in the gradient calculation. Suggest two ways to overcome this problem.
Solution
Non-gradient algorithms
Surrogate models
A complex optimisation problem has been given to two different teams. They have come up with two different answers. Team A has reported \(\bar{x}_A\) as the minimum using the BFGS algorithm with the finite difference gradient calculation method. Team B has reported \(\bar{x}_B\) as the minimum using the DFP (another quasi-Newton method) algorithm with the complex step gradient calculation method.
State two likely sources of error resulting in the two different minima.
Devise a methodology to identify which team's answer is likely to be correct. The methodology should use the least amount of computation.
Solution 1
Initial guess
Gradient calculation error
Solution 2
Ask Team A to use the complex step method with their algorithm. If the answers match, then the problem is with the gradient calculation using FD. If the answers still do not match, ask them to use the same initial guess as Team B. If the answers then match, great. If not, then the problem is with the quasi-Newton algorithms, and we cannot be certain whose answer is correct.
Ask both teams to start from the same initial guess, and ask Team A to run DFP + FDM. If they report different minima, then the problem is in the gradient calculation. If they get the same point, then we need to investigate whether the problem has two local minima.