Quadratic Forms and Taylor Series Expansion
Quadratic Forms
A quadratic form is a nonlinear function containing only second-order terms. For example:
\[ F(\mathbf{x}) = x_1^2 + 2x_2^2 + 7x_3^2 + 2x_1x_2 + 4x_2x_3 + 14x_3x_1 \]
is a quadratic form. The general expression for a quadratic form can be written as: \[ F(\mathbf{x}) = \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n p_{ij} x_i x_j \]
Expanding this yields:
\[ \begin{aligned} F(\mathbf{x}) &= \frac{1}{2} \big[ \big(p_{11}x_1^2 + p_{12}x_1x_2 + \cdots + p_{1n}x_1x_n\big) \\ &+ \big(p_{21}x_2x_1 + p_{22}x_2^2 + \cdots + p_{2n}x_2x_n\big) \\ &+ \cdots + \big(p_{n1}x_nx_1 + \cdots + p_{nn}x_n^2\big) \big]. \end{aligned} \]
If \(p_{ij}\) and \(x_i\) are given, then \(F(\mathbf{x})\) is a scalar and is called a quadratic form because all terms are second-order terms.
The quadratic form can be written in matrix notation. To do so, consider
\[ \mathbf{y}=\mathbf{P x} \]
Here \(\mathbf{P}\) is an \(n \times n\) matrix and \(\mathbf{x}\) and \(\mathbf{y}\) are \(n \times 1\) vectors. Now (1.19) can be written as:
\[ F(\mathbf{x})=\frac{1}{2} \sum_{i=1}^n x_i\left(\sum_{j=1}^n p_{i j} x_j\right)=\underbrace{\frac{1}{2} \sum_{i=1}^n x_i y_i}_{\text{dot product}} \]
Combining the above two equations, we can write
\[ F(\mathbf{x})=\frac{1}{2} \mathbf{x}^{\mathrm{T}} \mathbf{y} \Rightarrow F(\mathbf{x})=\frac{1}{2} \mathbf{x}^{\mathrm{T}} \mathbf{P} \mathbf{x} \]
Matrix \(\mathbf{P}\) is called the matrix of the quadratic form and contains information about its mathematical nature. \(F(\mathbf{x})\) in (1.20) can be rewritten as shown below: \[ \begin{aligned} F(\mathbf{x})= & \frac{1}{2}\left\{\left[p_{11} x_1^2+p_{22} x_2^2+\cdots+p_{n n} x_n^2\right]\right. \\ & +\left[\left(p_{12}+p_{21}\right) x_1 x_2+\left(p_{13}+p_{31}\right) x_1 x_3+\cdots+\left(p_{1 n}+p_{n 1}\right) x_1 x_n\right] \\ & +\left[\left(p_{23}+p_{32}\right) x_2 x_3+\left(p_{24}+p_{42}\right) x_2 x_4+\cdots+\left(p_{2 n}+p_{n 2}\right) x_2 x_n\right] \\ & \left.+\cdots+\left[\left(p_{n-1, n}+p_{n, n-1}\right) x_{n-1} x_n\right]\right\} \end{aligned} \]
Here we have separated the square terms, which correspond to the diagonal entries of the \(\mathbf{P}\) matrix, from the cross terms. For \(i \neq j\), the coefficient of \(x_i x_j\) is \(\left(p_{i j}+p_{j i}\right)\). Define \[ a_{i j}=\frac{1}{2}\left(p_{i j}+p_{j i}\right) \Rightarrow a_{i j}+a_{j i}=p_{i j}+p_{j i} \] for all \(i\) and \(j\), where the \(a_{ij}\) are the entries of a matrix \(\mathbf{A}\). Replacing the \(p_{ij}\) by the \(a_{ij}\), we can write \[ F(\mathbf{x})=\frac{1}{2} \mathbf{x}^{\mathrm{T}} \mathbf{P} \mathbf{x}=\frac{1}{2} \mathbf{x}^{\mathrm{T}} \mathbf{A} \mathbf{x} \]
From (1.26), we can see that the value of the quadratic form does not change when \(\mathbf{P}\) is replaced by \(\mathbf{A}\). However, \(\mathbf{A}\) is always symmetric, since \[ \begin{aligned} a_{i j} & =\frac{1}{2}\left(p_{i j}+p_{j i}\right) \\ a_{j i} & =\frac{1}{2}\left(p_{j i}+p_{i j}\right) \end{aligned} \] that is, \(a_{i j}=a_{j i}\) for all \(i\) and \(j\).
There can be many \(\mathbf{P}\) matrices for a given quadratic form, but there is only one symmetric \(\mathbf{A}\) matrix. Asymmetric matrices offer little mathematical insight into the problem; the symmetric matrix \(\mathbf{A}\) reveals far more about the nature of the quadratic form.
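As a numerical check, the sketch below builds one possible non-symmetric \(\mathbf{P}\) for the quadratic form of the opening example, forms the unique symmetric \(\mathbf{A}\), and confirms that both give the same value of \(F(\mathbf{x})\). The upper-triangular \(\mathbf{P}\) and the test vector are arbitrary choices made for illustration.

```python
import numpy as np

# Quadratic form of the opening example, written as F(x) = (1/2) x^T P x:
#   F(x) = x1^2 + 2 x2^2 + 7 x3^2 + 2 x1 x2 + 4 x2 x3 + 14 x3 x1
# One of many valid (non-symmetric) choices for P:
P = np.array([[2.0, 4.0, 28.0],
              [0.0, 4.0,  8.0],
              [0.0, 0.0, 14.0]])

# The unique symmetric matrix of the quadratic form: a_ij = (p_ij + p_ji) / 2
A = 0.5 * (P + P.T)

x = np.array([1.0, -2.0, 3.0])   # arbitrary test vector
F_P = 0.5 * x @ P @ x
F_A = 0.5 * x @ A @ x
F_direct = (x[0]**2 + 2*x[1]**2 + 7*x[2]**2
            + 2*x[0]*x[1] + 4*x[1]*x[2] + 14*x[2]*x[0])
print(F_P, F_A, F_direct)        # all three give 86.0
```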
The nature of a quadratic form is revealed by the values (positive and/or negative) that it can take. Important properties such as the positive definiteness of a matrix can be deduced from the quadratic form.
Matrix \(\mathbf{A}\) can be classified according to the sign of the quadratic form. The matrix is:
- Negative semidefinite if \((1 / 2) \mathbf{x}^{\mathrm{T}} \mathbf{A} \mathbf{x} \leq 0\) for all \(\mathbf{x}\), with \(\mathbf{x}^{\mathrm{T}} \mathbf{A} \mathbf{x}=0\) for at least one \(\mathbf{x} \neq \mathbf{0}\).
- Negative definite if \((1 / 2) \mathbf{x}^{\mathrm{T}} \mathbf{A} \mathbf{x}<0\) for all \(\mathbf{x} \neq \mathbf{0}\).
- Positive semidefinite if \((1 / 2) \mathbf{x}^{\mathrm{T}} \mathbf{A} \mathbf{x} \geq 0\) for all \(\mathbf{x}\), with \(\mathbf{x}^{\mathrm{T}} \mathbf{A} \mathbf{x}=0\) for at least one \(\mathbf{x} \neq \mathbf{0}\).
- Positive definite if \((1 / 2) \mathbf{x}^{\mathrm{T}} \mathbf{A} \mathbf{x}>0\) for all \(\mathbf{x} \neq \mathbf{0}\).
- Indefinite if \((1 / 2) \mathbf{x}^{\mathrm{T}} \mathbf{A} \mathbf{x}\) is positive for some \(\mathbf{x}\) and negative for others.
Many physical quantities such as energy are often positive, and therefore quadratic forms representing these quantities yield positive definite matrices.
Finding the form of a matrix from its quadratic form requires algebraic manipulations and can be cumbersome. A direct approach to evaluate the form of a matrix comes from its eigenvalues.
Eigenvalue check for the form of a matrix
Let \(\lambda_i\), \(i=1,2, \ldots, n\), be the \(n\) eigenvalues of a symmetric \(n \times n\) matrix \(\mathbf{A}\) associated with the quadratic form \(F(\mathbf{x})=(1 / 2) \mathbf{x}^{\mathrm{T}} \mathbf{A} \mathbf{x}\). Then:
1. \(\mathbf{A}\) is positive definite if all \(\lambda_i>0\).
2. \(\mathbf{A}\) is positive semidefinite if all \(\lambda_i \geq 0\).
3. \(\mathbf{A}\) is negative definite if all \(\lambda_i<0\).
4. \(\mathbf{A}\) is negative semidefinite if all \(\lambda_i \leq 0\).
5. \(\mathbf{A}\) is indefinite if some \(\lambda_i<0\) and some other \(\lambda_j>0\).
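A minimal sketch of this eigenvalue check, assuming \(\mathbf{A}\) is symmetric (the tolerance is an implementation detail, not part of the mathematical definitions):

```python
import numpy as np

def classify_form(A, tol=1e-10):
    """Classify the quadratic form (1/2) x^T A x from the eigenvalues of symmetric A."""
    lam = np.linalg.eigvalsh(A)            # eigenvalues of a symmetric matrix
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam >= -tol):
        return "positive semidefinite"
    if np.all(lam < -tol):
        return "negative definite"
    if np.all(lam <= tol):
        return "negative semidefinite"
    return "indefinite"
```

Applied to the two matrices of Example 1.6 below, this routine returns "indefinite" for \(\mathbf{A}\) and "positive semidefinite" for \(\mathbf{B}\).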
Example 1.6 Determine the form of the following matrix using the eigenvalue approach.
(a) \(\mathbf{A}=\left[\begin{array}{ccc}-2 & 4 & -7 \\ 4 & -4 & 0 \\ -7 & 0 & -1\end{array}\right]\) (b) \(\mathbf{B}=\left[\begin{array}{ccc}1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1\end{array}\right]\)
Solution: (a) Eigenvalues of \(\mathbf{A}\) are given by
\[ \begin{aligned} & \left|\begin{array}{ccc} -2-\lambda & 4 & -7 \\ 4 & -4-\lambda & 0 \\ -7 & 0 & -1-\lambda \end{array}\right|=0 \\ & \lambda_1=6.28, \quad \lambda_2=-10.05, \quad \lambda_3=-3.22 \end{aligned} \]
The form is indefinite. (b) Eigenvalues of \(\mathbf{B}=\left[\begin{array}{ccc}1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1\end{array}\right]\) are obtained using
\[ \begin{aligned} \left|\begin{array}{ccc} 1-\lambda & -1 & 0 \\ -1 & 1-\lambda & 0 \\ 0 & 0 & 1-\lambda \end{array}\right|=0 \\ \lambda_1=0, \lambda_2=1, \lambda_3=2 . \end{aligned} \]
Thus, the form is positive semidefinite: all the eigenvalues are nonnegative and one of them is zero.
1.4.3 Differentiation of the quadratic form
\[ \begin{aligned} F(\mathbf{x}) & =\frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n a_{i j} x_i x_j \\ \frac{\partial F(\mathbf{x})}{\partial x_i} & =\sum_{j=1}^n a_{i j} x_j \Rightarrow \nabla F(\mathbf{x})=\mathbf{A} \mathbf{x} \\ \frac{\partial^2 F(\mathbf{x})}{\partial x_j \partial x_i} & =a_{i j} \Rightarrow \mathbf{H}=\mathbf{A} \end{aligned} \]
Components \(a_{i j}\) of the matrix \(\mathbf{A}\) are the components of the Hessian matrix associated with the quadratic form.
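The relation \(\nabla F(\mathbf{x})=\mathbf{A} \mathbf{x}\) is easy to verify numerically. The sketch below reuses the symmetric matrix of the opening example; the test point and finite-difference step are arbitrary choices:

```python
import numpy as np

# Symmetric matrix of the opening example's quadratic form F(x) = (1/2) x^T A x
A = np.array([[ 2.0, 2.0, 14.0],
              [ 2.0, 4.0,  4.0],
              [14.0, 4.0, 14.0]])
x = np.array([1.0, -2.0, 3.0])   # arbitrary test point

F = lambda v: 0.5 * v @ A @ v
grad_analytic = A @ x            # gradient of the quadratic form

# Central finite-difference check of the gradient
h = 1e-6
grad_fd = np.array([(F(x + h*e) - F(x - h*e)) / (2.0*h) for e in np.eye(3)])
print(np.allclose(grad_analytic, grad_fd, atol=1e-4))   # True
```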
With the basic concepts of the gradient, Hessian, Taylor series expansion and quadratic forms behind us, we are in a position to lay out the necessary and sufficient conditions for optimization.
Figure 1.2 Minimum point of a single-variable function.
1.5 Optimality Criteria for Unconstrained Optimization
Necessary conditions must be satisfied at the optimum point. A point cannot be optimum if it does not satisfy the necessary conditions. However, satisfaction of necessary conditions does not guarantee an optimum. If a candidate optimum point satisfies the sufficient conditions, we have an optimum point.
The problem we are addressing is: minimize \(f(\mathbf{x})\) without any constraints on \(\mathbf{x}\). We first look at the single-variable case and then at the multivariable case.
1.5.1 Necessary conditions for optimality of functions of one variable
Let \(x^*\) be a local minimum point of the function shown in Figure 1.2, and let \(x\) be another point in its vicinity, with \(d=x-x^*\). Expanding the function about \(x^*\) in a Taylor series:
\[ \begin{aligned} f(x) & =f\left(x^*\right)+f^{\prime}\left(x^*\right) d+\frac{1}{2} f^{\prime \prime}\left(x^*\right) d^2+R \\ \Delta f & =f^{\prime}\left(x^*\right) d+\frac{1}{2} f^{\prime \prime}\left(x^*\right) d^2+R \end{aligned} \]
where \(R\) is the remainder containing higher-order terms.
By definition, the function value at the minimum point must be lower than that at a nearby point. Then, we can write:
\[ \Delta f=f(x)-f\left(x^*\right) \geq 0 \]
Assuming \(d\) is small (local approximation), the first-order term in (1.32) dominates, \[ f^{\prime}\left(x^*\right) d \geq 0 \]
However, \(d=x-x^*\) can be either positive or negative. Therefore, for (1.34) to be true, \[ f^{\prime}\left(x^*\right)=0 \]
Points satisfying \(f^{\prime}\left(x^*\right)=0\) can be local minimum, maximum or neither maximum nor minimum (inflection or saddle points). In general, they are called stationary points.
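As a simple illustration (the function below is not from the text), \(f(x)=x^3-3x\) has stationary points at \(x=\pm 1\); evaluating \(f''\) at each one shows which is a local minimum and which is a local maximum:

```python
# Illustrative single-variable example (not from the text): f(x) = x^3 - 3x
f   = lambda x: x**3 - 3.0*x
df  = lambda x: 3.0*x**2 - 3.0      # f'(x)
d2f = lambda x: 6.0*x               # f''(x)

# Stationary points satisfy f'(x*) = 0, i.e. x* = -1 and x* = +1
for x_star in (-1.0, 1.0):
    kind = "local minimum" if d2f(x_star) > 0 else "local maximum"
    print(x_star, df(x_star), kind)
# x* = -1 is a local maximum (f'' < 0); x* = +1 is a local minimum (f'' > 0)
```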
The update formulas must satisfy the secant condition, \(\mathbf{B}^{(k+1)} \mathbf{s}^{(k)}=\mathbf{y}^{(k)}\). For example, consider a simple rank-one update formula: \({ }^{[2]}\) \[ \mathbf{B}^{(k+1)}=\mathbf{B}^{(k)}+\frac{\left(\mathbf{y}^{(k)}-\mathbf{B}^{(k)} \mathbf{s}^{(k)}\right)\left(\mathbf{y}^{(k)}-\mathbf{B}^{(k)} \mathbf{s}^{(k)}\right)^{\mathrm{T}}}{\left(\mathbf{y}^{(k)}-\mathbf{B}^{(k)} \mathbf{s}^{(k)}\right)^{\mathrm{T}} \mathbf{s}^{(k)}} \]
This formula satisfies the secant condition: \[ \mathbf{B}^{(k+1)} \mathbf{s}^{(k)}=\mathbf{B}^{(k)} \mathbf{s}^{(k)}+\frac{\left(\mathbf{y}^{(k)}-\mathbf{B}^{(k)} \mathbf{s}^{(k)}\right)\left(\mathbf{y}^{(k)}-\mathbf{B}^{(k)} \mathbf{s}^{(k)}\right)^{\mathrm{T}} \mathbf{s}^{(k)}}{\left(\mathbf{y}^{(k)}-\mathbf{B}^{(k)} \mathbf{s}^{(k)}\right)^{\mathrm{T}} \mathbf{s}^{(k)}}=\mathbf{y}^{(k)} \]
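A quick numerical check of this property, with an identity \(\mathbf{B}^{(k)}\) and randomly generated \(\mathbf{s}^{(k)}\) and \(\mathbf{y}^{(k)}\) chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
B = np.eye(n)                              # current approximation B^(k)
s = rng.standard_normal(n)                 # change in design,   s^(k)
y = rng.standard_normal(n)                 # change in gradient, y^(k)

r = y - B @ s                              # residual y^(k) - B^(k) s^(k)
B_new = B + np.outer(r, r) / (r @ s)       # the rank-one update above

print(np.allclose(B_new @ s, y))           # True: B^(k+1) s^(k) = y^(k)
```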
The update formula shows that the approximate Hessian can be built from only two pieces of information: the change in the gradient and the change in the design. To start the method, the initial approximation \(\mathbf{B}^{(0)}=\mathbf{I}\) can be used. However, better performance may result from a more realistic starting approximation; for example, the Hessian could be computed at the starting design and then used within the quasi-Newton method.
All direct quasi-Newton methods are based on the formula, \[ \mathbf{B}^{(k+1)}=\mathbf{B}^{(k)}+\Delta \mathbf{B}^{(k)} \]
Here \(\Delta \mathbf{B}^{(k)}\) is an update to \(\mathbf{B}^{(k)}\). One of the most powerful direct-update methods is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method.
4.5 Broyden-Fletcher-Goldfarb-Shanno Method
The BFGS algorithm was proposed in 1970 and gives an update formula that is robust to inaccuracy in the line search. This makes the BFGS method computationally efficient, since function calls can be saved by performing somewhat inexact line searches. The BFGS algorithm is summarized below; a code sketch of the loop follows the steps. \({ }^{[1]}\)
4.5.1 BFGS algorithm
1. Estimate \(\mathbf{x}^{(0)}\). Choose a positive definite \(\mathbf{B}^{(0)}\); if no information is available, select \(\mathbf{B}^{(0)}=\mathbf{I}\). Choose a convergence tolerance \(\varepsilon\). Set \(k=0\). Compute the gradient \(\mathbf{c}^{(0)}\).
2. Check for the optimum point: calculate \(\left\|\mathbf{c}^{(k)}\right\|\). If \(\left\|\mathbf{c}^{(k)}\right\|<\varepsilon\), stop; otherwise, continue.
3. Find the search direction \(\mathbf{d}^{(k)}\) by solving
\[ \mathbf{B}^{(k)} \mathbf{d}^{(k)}=-\mathbf{c}^{(k)} \]
4. Compute the step size \(\alpha_k=\alpha\) to minimize \(f\left(\mathbf{x}^{(k)}+\alpha \mathbf{d}^{(k)}\right)\).
5. Update the design: \[ \mathbf{x}^{(k+1)}=\mathbf{x}^{(k)}+\alpha_k \mathbf{d}^{(k)} \]
6. Compute \(\mathbf{c}^{(k+1)}\) and update the Hessian approximation: \[ \begin{aligned} \mathbf{s}^{(k)} & =\alpha_k \mathbf{d}^{(k)} \\ \mathbf{y}^{(k)} & =\mathbf{c}^{(k+1)}-\mathbf{c}^{(k)} \end{aligned} \]
\[ \mathbf{B}^{(k+1)}=\mathbf{B}^{(k)}+\frac{\mathbf{y}^{(k)} \mathbf{y}^{(k) \mathrm{T}}}{\mathbf{y}^{(k) \mathrm{T}} \mathbf{s}^{(k)}}+\frac{\mathbf{c}^{(k)} \mathbf{c}^{(k) \mathrm{T}}}{\mathbf{c}^{(k) \mathrm{T}} \mathbf{d}^{(k)}} \]
7. Set \(k=k+1\) and go to Step 2.
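The loop above can be written compactly as follows. This is a minimal sketch, assuming SciPy's minimize_scalar for the one-dimensional search in Step 4 and omitting the safeguards discussed below for the case where \(\mathbf{B}^{(k)}\) loses positive definiteness:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def bfgs(f, grad, x0, eps=1e-6, max_iter=100):
    """Sketch of the BFGS steps above (no safeguard against an indefinite B)."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(len(x))                                   # Step 1: B^(0) = I
    c = grad(x)                                          #         c^(0)
    for _ in range(max_iter):
        if np.linalg.norm(c) < eps:                      # Step 2: optimality check
            break
        d = np.linalg.solve(B, -c)                       # Step 3: B d = -c
        alpha = minimize_scalar(lambda a: f(x + a*d)).x  # Step 4: 1D line search
        x_new = x + alpha * d                            # Step 5: design update
        c_new = grad(x_new)
        s, y = alpha * d, c_new - c                      # Step 6: Hessian update
        B = B + np.outer(y, y) / (y @ s) + np.outer(c, c) / (c @ d)
        x, c = x_new, c_new                              # Step 7: next iteration
    return x

# Quick check on a simple convex function (illustrative only)
f = lambda x: x[0]**2 + 2.0*x[1]**2
grad = lambda x: np.array([2.0*x[0], 4.0*x[1]])
print(bfgs(f, grad, [1.0, 2.0]))   # approximately [0, 0]
```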
The BFGS update formula keeps the Hessian approximation positive definite. In numerical calculations, difficulties may sometimes arise due to inexact line search and round-off and truncation errors, in which case the Hessian approximation can become singular or indefinite. In such a situation, an approach similar to the one used for Newton's method can be used: the updated Hessian is modified to be positive definite so that the search direction remains a descent direction.
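The text does not detail the modification procedure here; a commonly used safeguard, shown below as an illustrative sketch rather than the book's exact recipe, is to add a multiple of the identity to the updated Hessian until a Cholesky factorization succeeds:

```python
import numpy as np

def make_positive_definite(B, beta=1e-3, scale=10.0):
    """Add tau*I to B until a Cholesky factorization succeeds (illustrative safeguard)."""
    tau = 0.0 if np.min(np.diag(B)) > 0.0 else beta - np.min(np.diag(B))
    while True:
        try:
            np.linalg.cholesky(B + tau * np.eye(B.shape[0]))
            return B + tau * np.eye(B.shape[0])
        except np.linalg.LinAlgError:
            tau = max(scale * tau, beta)
```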
Example 4.8 Show that the BFGS update formula maintains symmetry and satisfies the secant condition.
Solution: The BFGS formula is given by the following equation: \[ \mathbf{B}^{(k+1)}=\mathbf{B}^{(k)}+\frac{\mathbf{y}^{(k)} \mathbf{y}^{(k) \mathrm{T}}}{\mathbf{y}^{(k) \mathrm{T}} \mathbf{s}^{(k)}}+\frac{\mathbf{c}^{(k)} \mathbf{c}^{(k) \mathrm{T}}}{\mathbf{c}^{(k) \mathrm{T}} \mathbf{d}^{(k)}} \]
Taking the transpose of the matrices on both sides yields the result given below. \[ \mathbf{B}^{(k+1) \mathrm{T}}=\mathbf{B}^{(k) \mathrm{T}}+\frac{\left(\mathbf{y}^{(k)} \mathbf{y}^{(k) \mathrm{T}}\right)^{\mathrm{T}}}{\mathbf{y}^{(k) \mathrm{T}} \mathbf{s}^{(k)}}+\frac{\left(\mathbf{c}^{(k)} \mathbf{c}^{(k) \mathrm{T}}\right)^{\mathrm{T}}}{\mathbf{c}^{(k) \mathrm{T}} \mathbf{d}^{(k)}} \]
where the fact has been used that the denominator terms on the right-hand side are scalars and so are unchanged by transposition. Assuming \(\mathbf{B}^{(k)}\) is symmetric, we get: \[ \mathbf{B}^{(k+1) \mathrm{T}}=\mathbf{B}^{(k)}+\frac{\left(\mathbf{y}^{(k) \mathrm{T}}\right)^{\mathrm{T}} \mathbf{y}^{(k) \mathrm{T}}}{\mathbf{y}^{(k) \mathrm{T}} \mathbf{s}^{(k)}}+\frac{\left(\mathbf{c}^{(k) \mathrm{T}}\right)^{\mathrm{T}} \mathbf{c}^{(k) \mathrm{T}}}{\mathbf{c}^{(k) \mathrm{T}} \mathbf{d}^{(k)}}=\mathbf{B}^{(k)}+\frac{\mathbf{y}^{(k)} \mathbf{y}^{(k) \mathrm{T}}}{\mathbf{y}^{(k) \mathrm{T}} \mathbf{s}^{(k)}}+\frac{\mathbf{c}^{(k)} \mathbf{c}^{(k) \mathrm{T}}}{\mathbf{c}^{(k) \mathrm{T}} \mathbf{d}^{(k)}} \]
In other words: \[ \mathbf{B}^{(k+1) \mathrm{T}}=\mathbf{B}^{(k+1)} \]
where we use the fact that for any matrix \(\mathbf{A}\), the relationship \(\left(\mathbf{A}^{\mathrm{T}}\right)^{\mathrm{T}}=\mathbf{A}\) holds true. Thus the BFGS formula maintains symmetry.
To see whether the formula satisfies the secant condition, we multiply both sides by \(\mathbf{s}^{(k)}\): \[ \mathbf{B}^{(k+1)} \mathbf{s}^{(k)}=\mathbf{B}^{(k)} \mathbf{s}^{(k)}+\frac{\mathbf{y}^{(k)} \mathbf{y}^{(k) \mathrm{T}} \mathbf{s}^{(k)}}{\mathbf{y}^{(k) \mathrm{T}} \mathbf{s}^{(k)}}+\frac{\mathbf{c}^{(k)} \mathbf{c}^{(k) \mathrm{T}} \mathbf{s}^{(k)}}{\mathbf{c}^{(k) \mathrm{T}} \mathbf{d}^{(k)}} \]
The vectors \(\mathbf{c}\), \(\mathbf{s}\), and \(\mathbf{d}\) obey the relation: \[ \mathbf{c}^{(k) \mathrm{T}} \mathbf{s}^{(k)}=\mathbf{c}^{(k) \mathrm{T}} \alpha_k \mathbf{d}^{(k)}=\alpha_k \mathbf{c}^{(k) \mathrm{T}} \mathbf{d}^{(k)} \]
Therefore, the matrix equation may be simplified to read as shown below: \[ \mathbf{B}^{(k+1)} \mathbf{s}^{(k)}=\mathbf{B}^{(k)} \mathbf{s}^{(k)}+\mathbf{y}^{(k)}+\alpha_k \mathbf{c}^{(k)} \]
Since \(\mathbf{B}^{(k)} \mathbf{s}^{(k)}=\mathbf{B}^{(k)} \alpha_k \mathbf{d}^{(k)}=-\alpha_k \mathbf{c}^{(k)}\), we get \[ \mathbf{B}^{(k+1)} \mathbf{s}^{(k)}=\mathbf{y}^{(k)} \]
and the secant condition is satisfied.
Example 4.9 Minimize the following function using the BFGS method. \[ f(\mathbf{x})=x_1^2+2 x_2^2+\mathrm{e}^{-x_1-x_2} \]
Perform two iterations starting from the point \((1,2)\).
Solution: The BFGS method requires the gradient of the function. To start the algorithm, we take \(\mathbf{B}^{(0)}=\mathbf{I}\) as the initial Hessian approximation.
For the first iteration about point \((1,2)\), \[ \mathbf{c}^{(0)}=\left\{\begin{array}{l} 2 x_1-\mathrm{e}^{-x_1-x_2} \\ 4 x_2-\mathrm{e}^{-x_1-x_2} \end{array}\right\}=\left\{\begin{array}{l} 1.9502 \\ 7.9502 \end{array}\right\}, \quad \mathbf{B}^{(0)}=\left[\begin{array}{ll} 1 & 0 \\ 0 & 1 \end{array}\right] \]
The search direction is then obtained using the relation given below: \[ \mathbf{B}^{(0)} \mathbf{d}^{(0)}=-\mathbf{c}^{(0)} \]
Solving the equation, we get an expression for \(d\) : \[ \mathbf{d}^{(0)}=\left\{\begin{array}{l} -1.9502 \\ -7.9502 \end{array}\right\} \]
We numerically find \(\alpha=0.23779\) to minimize \(f\left(\mathbf{x}^{(0)}+\alpha \mathbf{d}^{(0)}\right)\). The new design is then given below:
\[ \mathbf{x}^{(1)}=\mathbf{x}^{(0)}+\alpha \mathbf{d}^{(0)}=\left\{\begin{array}{l} 0.5362 \\ 0.1095 \end{array}\right\} \]
We now need to update the Hessian approximation for the next iteration. We calculate the various vectors needed:
\[ \begin{aligned} & \mathbf{s}^{(0)}=\alpha \mathbf{d}^{(0)}=\left\{\begin{array}{l} -0.4637 \\ -1.8905 \end{array}\right\}, \\ & \mathbf{y}^{(0)}=\mathbf{c}^{(1)}-\mathbf{c}^{(0)}=\left\{\begin{array}{c} 0.5481 \\ -0.0863 \end{array}\right\}-\left\{\begin{array}{l} 1.9502 \\ 7.9502 \end{array}\right\}=\left\{\begin{array}{c} -1.4021 \\ -8.0365 \end{array}\right\} . \end{aligned} \]
Then, \[ \mathbf{B}^{(1)}=\mathbf{B}^{(0)}+\frac{\mathbf{y}^{(0)} \mathbf{y}^{(0) \mathrm{T}}}{\mathbf{y}^{(0) \mathrm{T}} \mathbf{s}^{(0)}}+\frac{\mathbf{c}^{(0)} \mathbf{c}^{(0) \mathrm{T}}}{\mathbf{c}^{(0) \mathrm{T}} \mathbf{d}^{(0)}}=\left[\begin{array}{ll} 1.0673 & 0.4798 \\ 0.4798 & 4.1332 \end{array}\right] \]
We can now proceed with the second iteration: \[ \mathbf{x}^{(1)}=\left\{\begin{array}{l} 0.5362 \\ 0.1095 \end{array}\right\}, \quad \mathbf{c}^{(1)}=\left\{\begin{array}{c} 0.5481 \\ -0.0863 \end{array}\right\} \]
The search direction can be found using the following equation. \[ \mathbf{B}^{(1)} \mathbf{d}^{(1)}=-\mathbf{c}^{(1)} \]
Solving this yields the result: \[ \mathbf{d}^{(1)}=\left\{\begin{array}{c} -0.5517 \\ 0.0849 \end{array}\right\} \]
We again find \(\alpha\) using the 1D search. It comes to 0.40549 . The new design point is as shown: \[ \mathbf{x}^{(2)}=\mathbf{x}^{(1)}+\alpha \mathbf{d}^{(1)}=\left\{\begin{array}{l} 0.3123 \\ 0.1440 \end{array}\right\} \]
At this point, the gradient has decreased to the following: \[ \mathbf{c}^{(2)}=\left\{\begin{array}{l} -0.009 \\ -0.056 \end{array}\right\} \]
We again need to update the Hessian approximation. We compute the intermediate vectors: \[ \mathbf{s}^{(1)}=\alpha_1 \mathbf{d}^{(1)}=\left\{\begin{array}{c} -0.2239 \\ 0.0345 \end{array}\right\} \] and \[ \mathbf{y}^{(1)}=\mathbf{c}^{(2)}-\mathbf{c}^{(1)}=\left\{\begin{array}{c} -0.5571 \\ 0.0303 \end{array}\right\} \]
The updated Hessian approximation can be written as shown below: \[ \mathbf{B}^{(2)}=\mathbf{B}^{(1)}+\frac{\mathbf{y}^{(1)} \mathbf{y}^{(1) \mathrm{T}}}{\mathbf{y}^{(1) \mathrm{T}} \mathbf{s}^{(1)}}+\frac{\mathbf{c}^{(1)} \mathbf{c}^{(1) \mathrm{T}}}{\mathbf{c}^{(1) \mathrm{T}} \mathbf{d}^{(1)}}=\left[\begin{array}{cc} 2.5662 & 0.50619 \\ 0.50619 & 4.1157 \end{array}\right] \]
Out of curiosity, we check this with the actual Hessian: \[ \mathbf{H}^{(2)}=\left[\begin{array}{cc} 2+\mathrm{e}^{-x_1-x_2} & \mathrm{e}^{-x_1-x_2} \\ \mathrm{e}^{-x_1-x_2} & 4+\mathrm{e}^{-x_1-x_2} \end{array}\right]=\left[\begin{array}{cc} 2.6336 & 0.6336 \\ 0.6336 & 4.6336 \end{array}\right] \]
It is clear that the BFGS method approximates the Hessian quite well in just two iterations, even though we started with the identity matrix as the first guess.
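The two iterations above can be reproduced with a short script; this sketch again assumes SciPy's minimize_scalar for the one-dimensional searches, and small differences in the last digit are expected from rounding:

```python
import numpy as np
from scipy.optimize import minimize_scalar

f = lambda x: x[0]**2 + 2.0*x[1]**2 + np.exp(-x[0] - x[1])
grad = lambda x: np.array([2.0*x[0] - np.exp(-x[0] - x[1]),
                           4.0*x[1] - np.exp(-x[0] - x[1])])

x = np.array([1.0, 2.0])   # starting design x^(0)
B = np.eye(2)              # B^(0) = I
c = grad(x)                # c^(0)
for k in range(2):         # the two iterations of Example 4.9
    d = np.linalg.solve(B, -c)
    alpha = minimize_scalar(lambda a: f(x + a*d)).x
    x_new = x + alpha * d
    c_new = grad(x_new)
    s, y = alpha * d, c_new - c
    B = B + np.outer(y, y) / (y @ s) + np.outer(c, c) / (c @ d)
    x, c = x_new, c_new
    print(k + 1, x, B)
# The printed values should match x^(1), B^(1), x^(2), and B^(2) above to rounding.
```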