One-dimensional Design Space

Published

January 13, 2025

Keywords

optimisation, mdo



Problem Statement

The general problem statement is far too complex to tackle at present, so we will start our exploration with the simplest case: a real-valued, single-variable, unconstrained minimisation (or maximisation) problem whose objective function is smooth enough.

Throughout these notes, I will keep referring to a function being smooth enough. A function is said to be smooth if it is continuous and differentiable, and \(n\) times differentiable if its \(n\)th derivative exists. The term smooth enough has to be interpreted in the context of the method. For zeroth-order (gradient-free) methods, smooth enough means simply continuous; for first-order methods like gradient descent, it means differentiable; and for second-order methods like Newton's method, it means at least twice differentiable.

\[\min_{x} f(x),\quad x \in \mathbb{R}\]

where \(f(x)\) is a scalar function of a scalar variable \(x\).

We need to answer the following questions:

  1. Does the optimum exist?
  2. Is the optimum unique?
  3. What are the necessary and sufficient conditions for the optimum?
  4. How to find the optimum?

The answer to the last question is quite simple if we have the explicit form of the function: we find the roots of the derivative \(f'(x)\) and evaluate the function at those points to locate the optimum. Depending on the form of the objective function, a numerical algorithm may be required to find all the roots of \(f'(x)\).
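
As a sketch of this indirect approach, consider the illustrative quartic \(f(x) = x^4 - 2x^2 + x\) (this objective, the hand-coded derivative, and the root brackets are all my choices for the example; the brackets were found by inspecting the sign of \(f'\)):

```julia
# Indirect search: locate the stationary points of f by finding the
# roots of f'(x) with bisection, then keep the one with the lowest f.
# The objective and the root brackets are illustrative choices.

f(x)  = x^4 - 2x^2 + x
df(x) = 4x^3 - 4x + 1          # derivative of f, computed by hand

# Minimal bisection root-finder; assumes g changes sign on [a, b]
function bisect(g, a, b; tol=1e-10)
    ga = g(a)
    while b - a > tol
        m = (a + b) / 2
        if sign(g(m)) == sign(ga)
            a, ga = m, g(m)    # root lies in the right half
        else
            b = m              # root lies in the left half
        end
    end
    (a + b) / 2
end

# f' changes sign on each of these intervals (checked by inspection)
roots = [bisect(df, -2.0, -1.0), bisect(df, 0.0, 0.5), bisect(df, 0.5, 1.0)]
xstar = roots[argmin(f.(roots))]   # stationary point with the lowest f
```

The three stationary points are candidates; comparing \(f\) at each of them picks out the minimum near \(x \approx -1.11\).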

However, in most engineering problems we do not have the explicit form of the function. We only have it as an implicit function or a black box that can be evaluated at any point. In such cases, we need to use iterative methods to find the optimum. We will discuss these methods in detail later.

Existence of Optimum

The existence of an optimum is not guaranteed. Consider the following function:

\[f(x) = \frac{-1}{x^2}\]

The function is defined for all real numbers except \(x=0\). It is continuous and differentiable on its domain, but it is unbounded below as \(x \to 0\); therefore the minimum does not exist. Continuity and differentiability alone do not guarantee an optimum; even \(f(x)=x\) has no minimum over \(\mathbb{R}\). By the Weierstrass extreme value theorem, however, a continuous function attains its minimum and maximum on a closed, bounded interval. Most engineering optimisation problems satisfy this condition, but care must be taken to ensure that the problem is not posed in a way that causes such pathologies.

Uniqueness of Optimum

Assuming that we are able to find an optimum, its uniqueness is not guaranteed.

Consider the following famous function:

#| echo: true
#| code-fold: true

using Plots
using Optim

# Define the function
f(x) = (6x - 2)^2 * sin(12x - 4)

# Generate a range of x values
x = 0:0.01:1

# Generate the y values
y = f.(x)

# Create a new plot
p = plot(x, y, label="f(x)", size=(250, 200))
xlabel!("x")
ylabel!("f(x)")
# Set the plot limits
xlims!(0, 1)
ylims!(-10, 10)

# Find the global optimum on [0, 1]
global_optima = optimize(f, 0.0, 1.0)

# Add the optimum to the plot
scatter!(p, [Optim.minimizer(global_optima)], [Optim.minimum(global_optima)], label="Global Optimum", color=:red)

# Restrict the search interval to isolate the local optimum
local_optima = optimize(f, 0.0, 0.5)
scatter!(p, [Optim.minimizer(local_optima)], [Optim.minimum(local_optima)], label="Local Optimum", color=:blue)
# Display the plot
display(p)

\[f(x) = (6x-2)^2 \sin(12x-4),\quad x \in [0,1]\]

The function is continuous and differentiable, and it is bounded below on \([0,1]\). We can see from the plot that it has one local minimum (near \(x \approx 0.14\)) and one global minimum (near \(x \approx 0.76\)). Therefore, the optimum is not unique.

Global Optimum

The point \(x^*\) is the global optimum if \(f(x^*)\leq f(x)\) for all \(x \in \mathbb{D}\) where \(\mathbb{D}\) is the design space. The global optimum is also known as the absolute optimum.

Local Optimum

The point \(x^*\) is a local optimum if there exists a \(\delta>0\) such that \(f(x^*)\leq f(x)\) for all \(x\) such that \(|x-x^*|<\delta\).

We can prove that a local optimum is also the global optimum if the function is convex, and that it is unique if the function is strictly convex.

Convex function

A twice-differentiable function \(f(x)\) is convex if \(f''(x)\geq 0\) for all \(x\), and strictly convex if \(f''(x)>0\) for all \(x\).

A more intuitive definition of convexity is that the line segment joining any two points on the graph of the function lies on or above the graph. Mathematically,

\[f(\lambda x_1 + (1-\lambda)x_2) \leq \lambda f(x_1) + (1-\lambda)f(x_2) \quad \forall\, \lambda \in (0,1)\]

#| echo: true
#| code-fold: true
#| fig-cap: "We can see a convex function $f(x) = x^2$ and a non-convex function $f(x) = x^3$. The line joining any two points on the convex function lies above the function. This is not true for the non-convex function. Also note that if the region of interest is restricted to $x \\in [0, 1]$ (where $f''(x) \\geq 0$), $x^3$ is also convex. This gives us an important insight that a function can be convex in one region and non-convex in another region. We will elaborate on this insight later for the multidimensional case."

convex_function(x) = x^2
non_convex_function(x) = x^3

using Plots

# Generate a range of x values
x = -1:0.01:1

# Generate the y values
y_convex = convex_function.(x)
y_non_convex = non_convex_function.(x)

# Create a new plot
p = plot(x, y_convex, label="Convex Function", size=(250, 200))
xlabel!("x")
ylabel!("f(x)")
# Set the plot limits
xlims!(-1, 1)
ylims!(-1, 1)

plot!(p, x, y_non_convex, label="Non-convex Function", color=:red)
# Chord of the non-convex function x^3 from (-1, -1) to (1, 1)
plot!(p, [-1, 1], [-1, 1], label="", color=:black)
# Chord of the convex function x^2 from (-1, 1) to (1, 1)
plot!(p, [-1, 1], [1, 1], label="", color=:black)
# Display the plot
display(p)
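
The chord inequality can also be spot-checked numerically. A minimal sketch, where the test points and the \(\lambda\) grid are arbitrary choices:

```julia
# Numeric spot-check of the chord (convexity) inequality.
# The test points and the lambda grid are arbitrary choices.

convex_function(x) = x^2
non_convex_function(x) = x^3

# True if the chord between (x1, f(x1)) and (x2, f(x2)) stays on or
# above the graph of f at every sampled lambda
chord_holds(f, x1, x2) = all(0.0:0.05:1.0) do λ
    f(λ*x1 + (1 - λ)*x2) <= λ*f(x1) + (1 - λ)*f(x2) + 1e-12
end

chord_holds(convex_function, -0.8, 0.9)      # holds for x^2
chord_holds(non_convex_function, -1.0, 1.0)  # fails for x^3 on [-1, 1]
```

Such a check is of course only a sample, not a proof of convexity, but it quickly exposes non-convexity when a violating \(\lambda\) is sampled.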

Now we will prove that a convex function has a unique optimum.

Theorem: Let \(f(x)\) be a convex function of a single variable \(x \in \mathbb{D}\subseteq \mathbb{R}\). Then every local minimum of \(f(x)\) is a global minimum. If, in addition, \(f(x)\) is strictly convex, the minimiser is unique.

Proof. Proof by contradiction.

First, suppose \(x^*\) is a local minimum of the convex function \(f(x)\) but not a global one, so that \(f(x_2) < f(x^*)\) for some point \(x_2\). By the definition of a convex function, for any \(\lambda \in (0,1)\),

\[f(\lambda x_2 + (1-\lambda)x^*) \leq \lambda f(x_2) + (1-\lambda)f(x^*) < f(x^*)\]

As \(\lambda \to 0\), the point \(\lambda x_2 + (1-\lambda)x^*\) approaches \(x^*\), so every neighbourhood of \(x^*\) contains points where \(f\) is strictly smaller than \(f(x^*)\). This contradicts the assumption that \(x^*\) is a local minimum. Therefore every local minimum is a global minimum.

Next, suppose \(f(x)\) is strictly convex and has two distinct global minimisers \(x_1 \neq x_2\) with \(f(x_1) = f(x_2) = m\). By strict convexity,

\[f\left(\frac{x_1 + x_2}{2}\right) < \frac{f(x_1) + f(x_2)}{2} = m\]

which contradicts the minimality of \(m\). Therefore the minimiser is unique.

Also note that the converse of the above theorem is not true. A function may have a unique global minimum, but may not be convex. For example, consider the following function:

\[f(x) = x^4 - 2x^2 + x\]

The function has a unique global minimum at \(x \approx -1.11\), the smallest root of \(f'(x) = 4x^3 - 4x + 1 = 0\) (it also has a second, purely local minimum near \(x \approx 0.84\)). However, the function is not convex, as \(f''(x) = 12x^2 - 4\) is negative for \(x \in (-1/\sqrt{3}, 1/\sqrt{3})\).

Once an optimum has been found numerically, we need to check whether it is a local or a global optimum. A simple heuristic is to evaluate the function at a number of other points spread across the design space. If the function value at all these points is higher than at the candidate, the candidate may well be the global optimum. If the function value at some point is lower, then the candidate is at best a local optimum.

The above method is not foolproof. It is possible that the function value at all the sampled points is higher than at the candidate, yet there exists an unsampled point where the function value is lower. In such cases, we need more sophisticated methods to prove that the optimum is a global optimum.

To prove that the optimum is a local optimum, we need to check whether it satisfies the necessary and sufficient conditions for an optimum.

Necessary and Sufficient Conditions for Optimum

The necessary and sufficient conditions for an optimum are given by the following theorems.

Theorem: Let \(f(x)\) be a function of a single variable \(x\). Let \(x^*\) be a local minimum of \(f(x)\). If \(f(x)\) is differentiable at \(x^*\), then \(f'(x^*)=0\).

Proof. Let \(x^*\) be a local minimum of \(f(x)\). Then there exists a \(\delta>0\) such that \(f(x^*)\leq f(x)\) for all \(x\) with \(|x-x^*|<\delta\). For \(x^* < x < x^*+\delta\), the denominator of the difference quotient is positive, so

\[\frac{f(x) - f(x^*)}{x-x^*} \geq 0 \quad\Rightarrow\quad \lim_{x\to x^{*+}} \frac{f(x) - f(x^*)}{x-x^*} \geq 0\]

For \(x^*-\delta < x < x^*\), the denominator is negative, so

\[\frac{f(x) - f(x^*)}{x-x^*} \leq 0 \quad\Rightarrow\quad \lim_{x\to x^{*-}} \frac{f(x) - f(x^*)}{x-x^*} \leq 0\]

Since \(f(x)\) is differentiable at \(x^*\), both one-sided limits equal \(f'(x^*)\), and therefore \(f'(x^*) = 0\).

The same argument applies when \(x^*\) is a local maximum, so \(f'(x^*)=0\) there as well; the first-order condition alone does not distinguish minima from maxima.

Theorem: Let \(f(x)\) be a function of a single variable \(x\). Let \(x^*\) be a local minimum of \(f(x)\). If \(f(x)\) is twice differentiable at \(x^*\), then \(f''(x^*)\geq 0\).

Proof. Let \(x^*\) be a local minimum of \(f(x)\). By the previous theorem, \(f'(x^*)=0\). Taylor's theorem then gives, for \(x\) near \(x^*\),

\[f(x) = f(x^*) + \frac{1}{2}f''(x^*)(x-x^*)^2 + o\left((x-x^*)^2\right)\]

Since \(f(x) \geq f(x^*)\) in a neighbourhood of \(x^*\), dividing by \((x-x^*)^2 > 0\) gives

\[\frac{1}{2}f''(x^*) + \frac{o\left((x-x^*)^2\right)}{(x-x^*)^2} \geq 0\]

and letting \(x \to x^*\) yields \(f''(x^*) \geq 0\).

Similarly, if \(x^*\) is a local maximum of \(f(x)\), then \(f''(x^*)\leq 0\).

These first- and second-order conditions are necessary but not sufficient. The corresponding sufficient condition is: if \(f'(x^*)=0\) and \(f''(x^*)>0\), then \(x^*\) is a strict local minimum. If \(f'(x^*)=0\) and \(f''(x^*)=0\), the test is inconclusive, as \(f(x)=x^3\) (inflection point) and \(f(x)=x^4\) (minimum) show at \(x^*=0\).

Finding the Optimum

  1. Brute Force Search: Evaluate \(f(x)\) at a large number of points and choose the point with the lowest value of \(f(x)\).

  2. Indirect search: Find the roots of \(f'(x)=0\). Evaluate \(f(x)\) at the roots and choose the point with the lowest value of \(f(x)\).

  3. Iterative Gradient Methods

    Start with an initial guess \(x_0\). Evaluate \(f(x)\) at \(x_0\). If \(f'(x_0)=0\), then \(x_0\) is the optimum. If \(f'(x_0)\neq 0\), then choose a new point \(x_1\) such that \(f(x_1)<f(x_0)\). Repeat the process until \(f'(x_n)=0\) (in practice, until \(|f'(x_n)|\) falls below a tolerance).

    Gradient methods have a rich history with contributions from Newton, Leibniz, Euler, Cauchy, Gauss, Jacobi, Lagrange, and others.

  4. Iterative Non-gradient Methods

    These methods are generally used when the gradient of the objective function is not available (e.g. if \(x \in \mathbb{N}\)) or is too expensive to compute (e.g. an objective such as \(C_L/C_D\) that must be calculated from a CFD solution). Examples of non-gradient methods are the Genetic Algorithm, Simulated Annealing, and the Nelder-Mead method.
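
A simple derivative-free method in one dimension is golden-section search, which only compares function values. A minimal sketch; the bracket \([0.5, 1]\) (chosen so the example function is unimodal on it) and the tolerance are illustrative choices:

```julia
# Golden-section search: a derivative-free method for a function that
# is unimodal on [a, b]. The bracket and tolerance are illustrative.

invphi = (sqrt(5) - 1) / 2            # inverse golden ratio, ~0.618

function golden_section(f, a, b; tol=1e-8)
    c = b - invphi * (b - a)          # left interior point
    d = a + invphi * (b - a)          # right interior point
    while b - a > tol
        if f(c) < f(d)
            b, d = d, c               # minimum lies in [a, d]
            c = b - invphi * (b - a)
        else
            a, c = c, d               # minimum lies in [c, b]
            d = a + invphi * (b - a)
        end
    end
    (a + b) / 2
end

f(x) = (6x - 2)^2 * sin(12x - 4)
xstar = golden_section(f, 0.5, 1.0)   # bracket around the global minimum
```

Each iteration shrinks the bracket by the constant factor \(\approx 0.618\), so the cost is predictable regardless of how expensive the derivative would have been.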

Gradient methods are also known as descent methods, as they follow the direction of the negative gradient. They may use only first-order information (\(\nabla f\)) or also second-order information (\(\nabla^2 f\)).

Since these methods use only local information, they are also known as local methods; they do not guarantee global optimality.

These methods are also known as deterministic methods, as they do not use any random numbers.
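
As a preview of these methods, here is a minimal fixed-step gradient descent sketch. The objective, starting point, step size, and tolerance are all illustrative choices; practical implementations select the step with a line search:

```julia
# Minimal fixed-step gradient descent in one dimension.
# Objective, start point, step size, and tolerance are illustrative.

f(x)  = x^4 - 2x^2 + x
df(x) = 4x^3 - 4x + 1                  # hand-coded derivative of f

function gradient_descent(df, x0; alpha=0.05, tol=1e-8, maxiter=10_000)
    x = x0
    for _ in 1:maxiter
        g = df(x)
        abs(g) < tol && return x       # stationary point reached
        x -= alpha * g                 # step along the negative gradient
    end
    x
end

xstar = gradient_descent(df, -2.0)     # converges to the minimum near -1.11
```

Note that the method stops at whichever stationary point the iterates fall into; started elsewhere, it may converge to the other, merely local, minimum of this function.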

In the next section we will discuss the iterative gradient methods in detail.