Quasi-Newton Methods
optimisation, mdo
BFGS
The BFGS method is named after the initials of its inventors, Broyden, Fletcher, Goldfarb, and Shanno. Here is the update formula for the BFGS method, where \(B^{(k)}\) is the approximation of the Hessian at the \(k\)-th iteration:
\[ B^{(k+1)} = B^{(k)} + \frac{y^{(k)}y^{(k)T}}{y^{(k)T}s^{(k)}} - \frac{B^{(k)}s^{(k)}s^{(k)T}B^{(k)}}{s^{(k)T}B^{(k)}s^{(k)}} \]
where \(y^{(k)} = \nabla f(x^{(k+1)}) - \nabla f(x^{(k)})\) and \(s^{(k)} = x^{(k+1)} - x^{(k)}\).
The initial approximation \(B^{(0)}\) is usually taken to be the identity matrix or another positive definite matrix.
A derivation of the BFGS update formula can be found in the book by Nocedal and Wright (Nocedal and Wright 2006).
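The update is constructed so that the new approximation satisfies the secant condition \(B^{(k+1)}s^{(k)} = y^{(k)}\). As a quick sanity check, a small NumPy sketch (with made-up vectors, not taken from any particular problem) can verify this:

```python
import numpy as np

# Hypothetical numbers, just to illustrate the rank-two update and the secant condition.
B = np.eye(2)                      # current approximation B^(k)
s = np.array([1.0, 2.0])           # step s^(k)
y = np.array([2.0, 1.0])           # gradient difference y^(k), with y^T s > 0
Bs = B @ s
B_new = B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)
print(np.allclose(B_new @ s, y))   # True: the update satisfies B^(k+1) s^(k) = y^(k)
```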
Here we present the pseudocode for the BFGS method.

1. Start with an initial guess \(x^{(0)}\) and an initial approximation of the Hessian \(B^{(0)}\).
2. Compute the gradient \(\nabla f(x^{(k)})\).
3. Compute the search direction \(d^{(k)}\) by solving \(B^{(k)}d^{(k)} = -\nabla f(x^{(k)})\) (this step requires the solution of a linear system).
4. Perform a line search to find the step size \(\alpha^{(k)}\).
5. Calculate the next iterate \(x^{(k+1)} = x^{(k)} + \alpha^{(k)}d^{(k)}\).
6. Compute the gradient \(\nabla f(x^{(k+1)})\).
7. Compute the differences \(y^{(k)} = \nabla f(x^{(k+1)}) - \nabla f(x^{(k)})\) and \(s^{(k)} = x^{(k+1)} - x^{(k)}\).
8. Update the approximation of the Hessian using the BFGS update formula.
9. If the stopping criterion is not satisfied, go to step 3.
Here is a Python implementation of the BFGS method (using SciPy helpers for the gradient approximation and the line search).
```python
import numpy as np
from scipy.optimize import approx_fprime, line_search

def bfgs(f, x0, tol=1e-6, max_iter=1000):
    """Minimise f starting from x0 with the BFGS method."""
    def grad(z):
        return approx_fprime(z, f, 1e-8)       # finite-difference gradient of f

    x = np.asarray(x0, dtype=float)
    B = np.eye(len(x))                         # initial Hessian approximation B^(0)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:            # stop when the gradient norm is small enough
            break
        p = -np.linalg.solve(B, g)             # solve B^(k) d^(k) = -grad f(x^(k))
        alpha = line_search(f, grad, x, p)[0]  # line search for the step size alpha^(k)
        if alpha is None:                      # fall back to a unit step if the search fails
            alpha = 1.0
        s = alpha * p                          # s^(k) = x^(k+1) - x^(k)
        x_new = x + s
        y = grad(x_new) - g                    # y^(k) = grad f(x^(k+1)) - grad f(x^(k))
        if y @ s > 0:                          # curvature condition keeps B positive definite
            Bs = B @ s
            B += np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)
        x = x_new
    return x
```
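As a quick check, the routine can be tried on an assumed quadratic test function (not one from the original discussion):

```python
def f(x):
    # Assumed test problem: anisotropic quadratic with minimum at the origin.
    return x[0] ** 2 + 10 * x[1] ** 2

x_star = bfgs(f, np.array([3.0, -4.0]))
print(x_star)   # expected to be close to [0, 0]
```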
L-BFGS (available in Matlab's fminunc) is a limited-memory version of the BFGS method. It is particularly useful for large-scale optimisation problems where the Hessian matrix is too large to store in memory: instead of the full matrix, L-BFGS stores only a few vectors to approximate it. The method is also known as the low-memory BFGS method.
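In Python, a limited-memory BFGS implementation is available through SciPy's minimize with method="L-BFGS-B" (the bound-constrained variant, which also handles unconstrained problems); a minimal sketch using the same assumed quadratic:

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    # Same assumed quadratic test problem as above.
    return x[0] ** 2 + 10 * x[1] ** 2

result = minimize(f, np.array([3.0, -4.0]), method="L-BFGS-B")
print(result.x)   # expected to be close to [0, 0]
```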