Optimization

Introduction

Pierre de Fermat

  • Pierre de Fermat, 1601 (Beaumont-de-Lomagne, near Montauban) – 1665 (Castres)
  • Method de maximis et minimis (method of maxima and minima)
  • Early developments that led to infinitesimal calculus

The problem is to solve

\[(P)\left\{\begin{array}{l} Min\; f(x)\\ x\in\R^n \end{array}\right. \]

Automatic differentiation

Code
using LinearAlgebra
using ForwardDiff
A = [-1 2;1 4]; b=[1,1]; c=1
f(x) = 0.5*x'*A*x + b'*x +c

analytic_∇f(x) = 0.5*(A' + A)*x+b
analytic_∇²f(x) = 0.5*(A' + A)

∇f(x) = ForwardDiff.gradient(f, x)
∇²f(x) = ForwardDiff.hessian(f, x)

x0 = [1,-1]
println("∇f(x0) = ", ∇f(x0))
println("analytic_∇f(x0) = ", analytic_∇f(x0))
println("analytic_∇f(x0) - ∇f(x0) = ", analytic_∇f(x0)- ∇f(x0))

println("∇²f(x0) = ", ∇²f(x0))
println("analytic_∇²f(x0) = ", analytic_∇²f(x0))
println("analytic_∇²f(x0) - ∇²f(x0) = ", analytic_∇²f(x0)- ∇²f(x0))
∇f(x0) = [-1.5, -1.5]
analytic_∇f(x0) = [-1.5, -1.5]
analytic_∇f(x0) - ∇f(x0) = [0.0, 0.0]
∇²f(x0) = [-1.0 1.5; 1.5 4.0]
analytic_∇²f(x0) = [-1.0 1.5; 1.5 4.0]
analytic_∇²f(x0) - ∇²f(x0) = [0.0 0.0; 0.0 0.0]

Newton’s Algorithm

Solve \(\nabla f(x) = 0\) by Newton’s method

Require \(f \colon \R^n \to \R\), \(x_{0} \in \R^n\) (initial point)

  • \(k \gets 0\)
  • continue = true
  • While continue \(\#\) See Section 3.1
    • \(d_k\) solution of \(\nabla^2 f(x_k)\, d = -\nabla f(x_k)\) \(\#\) Newton’s direction
    • \(x_{k+1} \gets x_{k} + d_{k}\) \(\#\) Update the iterate
    • \(k \gets k + 1\)
    • continue = stop_function(\(\nabla f_k\),\(x_k\),\(x_{k+1}\),\(f_k\),\(f_{k+1}\),AbsTol,RelTol,\(\varepsilon\))
  • EndWhile
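
As an illustration, here is a minimal Julia sketch of this loop, using ForwardDiff for the derivatives and a plain gradient-norm test in place of the full stop_function; the name newton_sketch and its defaults are assumptions, not the course’s algo_Newton.

Code
using LinearAlgebra
using ForwardDiff

function newton_sketch(f, x0; AbsTol=abs(eps()), RelTol=abs(eps()), max_iter=100)
    xₖ = float.(x0)
    ∇f0 = ForwardDiff.gradient(f, xₖ)
    ∇fₖ = ∇f0
    k = 0
    while norm(∇fₖ) ≥ max(RelTol*norm(∇f0), AbsTol) && k < max_iter
        dₖ = -(ForwardDiff.hessian(f, xₖ) \ ∇fₖ)   # Newton's direction: ∇²f(xₖ) d = -∇f(xₖ)
        xₖ = xₖ + dₖ                               # update the iterate
        ∇fₖ = ForwardDiff.gradient(f, xₖ)
        k += 1
    end
    return xₖ, f(xₖ), ∇fₖ, k
end

# e.g. newton_sketch(f, [1.0, -1.0]) reaches the stationary point of the quadratic f above in one step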

Stop criteria

The stopping criteria and their formulas are listed below; the stagnation criteria are made more restrictive by the factor \(\varepsilon\) (\(\varepsilon = 0.01\), for example).
  • Smallness of the gradient (\(\norme{\nabla f(x_{k+1})}\approx 0\)): \(\norme{\nabla f(x_{k+1})} < \mathrm{max}(\mathrm{RelTol}\norme{\nabla f(x_0)},\mathrm{AbsTol})\)
  • Stagnation of the iterate: \(\norme{x_{k+1}-x_k} < \varepsilon\,\mathrm{max}(\mathrm{RelTol}\norme{x_{k+1}},\mathrm{AbsTol})\)
  • Stagnation of the function: \(\abs{f(x_{k+1})-f(x_k)} < \varepsilon\,\mathrm{max}(\mathrm{RelTol}\abs{f(x_{k+1})},\mathrm{AbsTol})\)
  • Maximum number of iterations: \(k+1 = \mathrm{max\_iter}\)
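
For instance, these tests could be combined as follows; this is only a hedged sketch (the name stop_criteria, the flag numbering and the returned pair are assumptions, the course’s stop_function may differ).

Code
using LinearAlgebra

function stop_criteria(∇f_next, ∇f0, xₖ, xₖ₊₁, fₖ, fₖ₊₁, k;
                       AbsTol=abs(eps()), RelTol=abs(eps()), ε=0.01, max_iter=100)
    if norm(∇f_next) < max(RelTol*norm(∇f0), AbsTol)
        return false, 0        # the gradient is small enough
    elseif norm(xₖ₊₁ - xₖ) < ε*max(RelTol*norm(xₖ₊₁), AbsTol)
        return false, 1        # stagnation of the iterate
    elseif abs(fₖ₊₁ - fₖ) < ε*max(RelTol*abs(fₖ₊₁), AbsTol)
        return false, 2        # stagnation of the function
    elseif k+1 == max_iter
        return false, 3        # maximum number of iterations reached
    end
    return true, -1            # no criterion met: keep iterating
end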

Exercise

Code
using LinearAlgebra
using ForwardDiff

"""
   Solve by Newton's algorithm the optimization problem Min f(x)
   Case where f is a function from R^n to R
"""


function algo_Newton(f,x0::Vector{<:Real};AbsTol= abs(eps()), RelTol = abs(eps()), ε=0.01, nbit_max = 0)
# to complete
    
    # flag = 0 if the program stop on the first criteria
    # flag = 2 if the program stop on the second criteria
    # ...
    return xₖ, flag, fₖ, ∇fₖ , k  
end
algo_Newton (generic function with 1 method)
Code
include("assets/julia/MyOptims.jl")

A = [1 0 ; 0 9/2]
b = [0,0]; c=0.
f1(x) = 0.5*x'*A*x + b'*x +c
x0 = [1000,-20]
println("Results for Newton on f1 : ", algo_Newton(f1,x0))
println("eigen value of 0.5(A^T+A) = ", 0.5*eigen(A'+A).values)
using Plots;
x=range(-10,stop=10,length=100)
y=range(-10,stop=10,length=100)
f11(x,y) = f1([x,y])
p1 = plot(x,y,f11,st=:contourf)

A = [-1 0 ; 0 3]
f2(x) = 0.5*x'*A*x + b'*x +c
println("Results for Newton on f2 : ", algo_Newton(f2,x0))
println("eigen value of 0.5(A^T+A) = ", 0.5*eigen(A'+A).values)
f21(x,y) = f2([x,y])
p2 = plot(x,y,f21,st=:contourf)
# Rosenbrock function
x=range(-1.5,stop=2,length=100)
y=range(-0.5,stop=3.,length=100)
f3(x) = (1-x[1])^2 + 100*(x[2]-x[1]^2)^2
f31(x,y) = f3([x,y])
p3 = plot(x,y,f31,st=:contourf)
x0 = [1.1,1.2]

println("Results for Newton on f3 : ", algo_Newton(f3,[1.1,1.2]))

println("Results for Newton on f3 : ", algo_Newton(f3,[3.,0.]))
plot(p1,p2,p3)
Results for Newton on f1 : ([0.0, 0.0], 0, 0.0, [0.0, 0.0], 1)
eigen value of 0.5(A^T+A) = [1.0, 4.5]
Results for Newton on f2 : ([0.0, 0.0], 0, 0.0, [0.0, 0.0], 1)
eigen value of 0.5(A^T+A) = [-1.0, 3.0]
Results for Newton on f3 : ([1.0, 1.0], 0, 0.0, [-0.0, 0.0], 7)
Results for Newton on f3 : ([1.0, 1.0], 0, 0.0, [-0.0, 0.0], 5)

Steepest descent

Algorithm

Require \(f \colon \R^n \to \R\), \(x_{0} \in \R^n\) (initial point)

  • \(k \gets 0\)
  • continue = true
  • While continue \(\#\) See Section 3.1
    • \(d_k = -\nabla f(x_k)\)
    • \(\alpha_k = argmin_{\alpha>0}\{f(x_k+\alpha d_k)\}\)
    • \(x_{k+1} \gets x_{k} + \alpha_kd_{k}\)
    • \(k \gets k + 1\)
    • continue = stop_function(\(\nabla f_k\),\(x_k\),\(x_{k+1}\),\(f_k\),\(f_{k+1}\),AbsTol,RelTol,\(\varepsilon\))
  • EndWhile
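
As an illustration, a minimal sketch of this loop in which the choice of \(\alpha_k\) is delegated to a line-search routine; the name steepest_descent_sketch, its defaults and the linesearch argument are assumptions (the next subsections give two ways to compute \(\alpha_k\)).

Code
using LinearAlgebra
using ForwardDiff

# Generic steepest-descent driver: `linesearch(f, xₖ, dₖ)` must return the step αₖ
function steepest_descent_sketch(f, x0, linesearch; AbsTol=1e-8, nbit_max=1000)
    xₖ = float.(x0)
    ∇fₖ = ForwardDiff.gradient(f, xₖ)
    k = 0
    while norm(∇fₖ) > AbsTol && k < nbit_max
        dₖ = -∇fₖ                          # steepest-descent direction
        αₖ = linesearch(f, xₖ, dₖ)         # exact minimization or backtracking
        xₖ = xₖ + αₖ*dₖ
        ∇fₖ = ForwardDiff.gradient(f, xₖ)
        k += 1
    end
    return xₖ, ∇fₖ, k
end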

Steepest descent for a quadratic function

Exercise:

Let \(f(x)=x^TAx + b^Tx + c\), where \(A\) is symmetric and positive-definite.

If \(x_k\) is a fixed vector and \(d_k\) a nonzero vector, compute \(\alpha^*\), the solution of \[Min\; g(\alpha)=f(x_k+\alpha d_k)\]

\[\alpha^* = -\frac{2x_k^TAd_k + b^Td_k}{2d_k^TAd_k}\]
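
As a quick numerical check of this formula (a sketch with an arbitrary symmetric positive-definite \(A\) and arbitrary \(b\), \(x_k\), \(d_k\); it only verifies that the derivative of \(g\) vanishes at \(\alpha^*\)):

Code
using ForwardDiff

let
    A = [2.0 1.0; 1.0 3.0]; b = [-1.0, 2.0]; c = 0.0   # arbitrary data, A symmetric positive-definite
    f(x) = x'*A*x + b'*x + c
    xₖ = [1.0, -2.0]
    dₖ = -(2*A*xₖ + b)                                  # steepest-descent direction -∇f(xₖ)
    αstar = -(2*xₖ'*A*dₖ + b'*dₖ) / (2*dₖ'*A*dₖ)        # the formula above
    g(α) = f(xₖ + α*dₖ)
    ForwardDiff.derivative(g, αstar)                    # ≈ 0: α* is a stationary point of g
end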

Exercise:

Complete the steepest_descent_quad function, execute the following script, and explain why Newton’s algorithm converges in one iteration.

A = [1.0 0.0; 0.0 9.0]
b = [-4.0, -18.0]
n= 5

Result with Newton's algorithm : 
xsol = [2.0, 1.0]
flag = 0
fsol = 0.0
∇f_xsol = [0.0, 0.0]
nb_iter = 1
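
One possible shape for such a function is sketched below; the name steepest_descent_quad_sketch, its arguments and the simple gradient-norm test are assumptions, not the expected solution of the exercise.

Code
using LinearAlgebra

# Steepest descent with the exact step α* derived above, for f(x) = xᵀAx + bᵀx + c
function steepest_descent_quad_sketch(A, b, x0; AbsTol=1e-10, nbit_max=1000)
    xₖ = float.(x0)
    ∇fₖ = 2*A*xₖ + b
    k = 0
    while norm(∇fₖ) > AbsTol && k < nbit_max
        dₖ = -∇fₖ                                      # steepest-descent direction
        αₖ = -(2*xₖ'*A*dₖ + b'*dₖ) / (2*dₖ'*A*dₖ)       # exact line search on the quadratic
        xₖ = xₖ + αₖ*dₖ
        ∇fₖ = 2*A*xₖ + b
        k += 1
    end
    return xₖ, ∇fₖ, k
end

# e.g. steepest_descent_quad_sketch([1.0 0.0; 0.0 9.0], [-4.0, -18.0], [5.0, 5.0]) converges to [2.0, 1.0]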

Line search: Armijo’s condition and backtracking

Let \(x_k\) be the current iterate and \(d_k=-\nabla f(x_k)\). We write \(g_k(\alpha)=f(x_k+\alpha d_k)\). We want to find an \(\alpha>0\) that gives a sufficient decrease of the function \(g_k\), i.e. that satisfies Armijo’s condition: \(g_k(\alpha)=f(x_k+\alpha d_k)\le f_k+c_1\alpha\nabla f_k^Td_k=\tilde{g}_k(\alpha)\)

  • Initialization
    • \(\alpha_0=1\)
    • \(\rho\in]0,1[\) (\(\rho=0.8\))
    • \(c_1\in]0,1[\) (\(c_1=10^{-4}\))
    • \(j=0\)
  • While \(g_{k}(\alpha_j)>\tilde{g}_k(\alpha_j)=f_k+c_1\alpha_j\nabla f_k^Td_k\)
    • \(\alpha_{j+1}:=\rho\alpha_j\)
    • \(j:=j+1\)
  • end
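
A minimal sketch of this backtracking loop on its own (the name backtracking_sketch and its defaults are assumptions; the descent_backtrac function of the next exercise would call something of this shape at every iteration):

Code
using LinearAlgebra
using ForwardDiff

# Shrink α until Armijo's condition f(xₖ + α dₖ) ≤ fₖ + c₁ α ∇fₖᵀ dₖ holds
# (dₖ is assumed to be a descent direction, e.g. dₖ = -∇f(xₖ), otherwise the loop may not terminate)
function backtracking_sketch(f, xₖ, dₖ; α0=1.0, ρ=0.8, c₁=1e-4)
    fₖ  = f(xₖ)
    ∇fₖ = ForwardDiff.gradient(f, xₖ)
    α = α0
    while f(xₖ + α*dₖ) > fₖ + c₁*α*(∇fₖ'*dₖ)
        α = ρ*α                 # not enough decrease: reduce the step
    end
    return α
end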

Exercise: Complete the descent_backtrac function and execute the following script.

Code
using Plots
x = range(-10*(n/2)-xsol[1],stop=10*(n/2)-xsol[1],length=100)
y = range(-9*(n/2)-xsol[2],stop=9*(n/2)-xsol[2],length=100)
f_contour(x,y) = f([x,y])
z = @. f_contour(x', y)
nb_levels = 7
x0 = (n/2)*[9,1]
xₖ = x0
Xsol = zeros(nb_levels+1, 2); Xsol[1,:] = x0   # store the iterates (used below for the contour levels)
p1 = plot()
for nbit in 1:nb_levels
  xsol, flag, fsol, ∇f_xsol , nb_iter  = descent_backtrac(f,x0,nbit_max=nbit)
  #xsol, flag, fsol, ∇f_xsol , nb_iter  = my_descent(f,x0,nbit_max=nbit)
  println("xsol = ", xsol)
  println("flag = ", flag)
  println("fsol = ", fsol)
  println("∇f_xsol = ", ∇f_xsol)
  println("nb_iter = ", nb_iter)
  plot!(p1,[xₖ[1],xsol[1]],[xₖ[2],xsol[2]],arrow=true)
  Xsol[nbit+1,:] = xsol   # record the new iterate
  xₖ = xsol
end
levels = [f(Xsol[k,:]) for k in nb_levels+1:-1:1]   # increasing contour levels through the iterates
contour!(p1,x,y,z,levels=levels,cbar=false,color=:turbo)
plot(p1,legend=false)
xsol = [11.752095999999996, -4.577888000000002]
flag = 3
fsol = 375.1188872581122
∇f_xsol = [19.504191999999993, -100.40198400000004]
nb_iter = 1
xsol = [9.657849330627375, 6.2026929433378895]
flag = 3
fsol = 302.25478113451095
∇f_xsol = [15.31569866125475, 93.648472980082]
nb_iter = 2
xsol = [8.013338708990371, -3.8527352759069133]
flag = 3
fsol = 248.1015993513241
∇f_xsol = [12.026677417980743, -87.34923496632445]
nb_iter = 3
xsol = [6.721984054246145, 5.52631741186767]
flag = 3
fsol = 206.68507722534292
∇f_xsol = [9.44396810849229, 81.47371341361807]
nb_iter = 4
xsol = [5.70794569998511, -3.221855953011487]
flag = 3
fsol = 174.16547050584487
∇f_xsol = [7.415891399970221, -75.99340715420678]
nb_iter = 5
xsol = [4.911670424146116, 4.93787400796178]
flag = 3
fsol = 148.03948998207602
∇f_xsol = [5.823340848292233, 70.88173214331205]
nb_iter = 6
xsol = [4.286393961724215, -2.6729940280221545]
flag = 3
fsol = 126.64556351718664
∇f_xsol = [4.5727879234484305, -66.11389250439878]
nb_iter = 7