5.4 Constrained Optimization and Lagrange Multipliers

 


In unconstrained optimization we considered the problem of finding the minimum of a function

$\min_x \quad f(x)$ where $f : \mathbb{R}^D \to \mathbb{R}$

In constrained optimization we have additional constraints. That is, for real-valued functions $g_i : \mathbb{R}^D \to \mathbb{R}$, $i = 1,\ldots,m$, we consider the constrained optimization problem

$\min_x \quad f(x)$
subject to $g_i(x) \le 0$ for all $i = 1,\ldots,m$.

In general, the functions $f$ and $g_i$ could be non-convex.




One obvious, but not very practical, way of converting the constrained problem into an unconstrained one is to use an indicator function

$J(x)=f(x)+ \sum_{i=1}^{m} 1(g_i(x))$

where $1(z)$ is an infinite step function
$1(z)=\begin{cases}
0 & \text{if } z \le 0\\
\infty & \text{otherwise}
\end{cases}$

This gives an infinite penalty if a constraint is violated and hence yields the same solution. However, this infinite step function is equally difficult to optimize. We can overcome this difficulty by introducing Lagrange multipliers. The idea of Lagrange multipliers is to replace the step function with a linear function.

We associate with this problem the Lagrangian, introducing a Lagrange multiplier $\lambda_i \ge 0$ corresponding to each inequality constraint, so that

$L(x,\lambda)=f(x)+\sum_{i=1}^m \lambda_i g_i(x)=f(x)+\lambda^T g(x)$

where the constraints are collected into the vector $g(x) = [g_1(x),\ldots,g_m(x)]^T$ and the multipliers into $\lambda \in \mathbb{R}^m$.

We now introduce the idea of Lagrangian duality. In general, duality in optimization is the idea of converting an optimization problem in one set of variables $x$ (called the primal variables) into another optimization problem in a different set of variables $\lambda$ (called the dual variables). We introduce two different approaches to duality:
Lagrangian duality and Legendre–Fenchel duality.

Lagrangian Duality
The problem
$\min_x \quad f(x)$
subject to $g_i(x) \le 0$ for all $i = 1,\ldots,m$
is known as the primal problem and corresponds to the primal variable $x$. The associated Lagrangian dual problem is given by

$\max \quad D(\lambda),\quad \lambda \in \mathbb{R}^m$
subject to $\lambda \ge 0$

where $\lambda$ are the dual variables and $D(\lambda)=\min_{x \in \mathbb{R}^D} L(x,\lambda)$

Assuming that $f(\cdot)$ and $g(\cdot)$ are differentiable, we find the Lagrangian dual problem by differentiating the Lagrangian with respect to $x$, setting the derivative to zero, and solving for the optimal value.
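As a minimal numerical sketch of this recipe (assuming numpy and scipy are available; the toy problem $\min x^2$ subject to $1-x \le 0$ is an illustration chosen here, not taken from the text above), the dual function can be evaluated by an inner minimization over $x$ and then maximized over $\lambda \ge 0$:

```python
# Sketch: evaluating the dual function D(lambda) = min_x L(x, lambda)
# for the toy problem  min x^2  subject to  1 - x <= 0  (i.e. x >= 1).
# Analytically: L = x^2 + lam*(1 - x) is minimized at x = lam/2, so
# D(lam) = lam - lam^2/4, maximized at lam = 2 with D = 1,
# matching the primal optimum f(1) = 1 (no duality gap for this convex problem).
import numpy as np
from scipy.optimize import minimize_scalar

def f(x):
    return x**2

def g(x):               # constraint g(x) <= 0
    return 1.0 - x

def D(lam):             # dual function: inner minimization over x
    res = minimize_scalar(lambda x: f(x) + lam * g(x))
    return res.fun

lams = np.linspace(0.0, 4.0, 401)
vals = [D(l) for l in lams]
best = lams[int(np.argmax(vals))]
print(f"dual maximum ~ {max(vals):.4f} at lambda ~ {best:.2f}")  # ~1.0 at ~2.0
```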

Equality constraints
Consider
$\min_x \quad f(x)$
subject to $g_i(x) \le 0$ for all $i = 1,\ldots,m$
and $h_j(x) = 0$ for all $j=1,\ldots,n$.
We can model equality constraints by replacing each of them with two inequality constraints.
That is, for each equality constraint $h_j(x) = 0$ we equivalently replace it by the two constraints $h_j(x) \le 0$ and $h_j(x) \ge 0$. It turns out that the resulting Lagrange multipliers are then unconstrained. Therefore, we constrain the Lagrange multipliers corresponding to the inequality constraints to be non-negative, and leave the Lagrange multipliers corresponding to the equality constraints unconstrained:
$L(x,\lambda,v)=f(x)+\sum_{i=1}^{m} \lambda_i g_i(x)+ \sum_{j=1}^{n} v_j h_j(x)$
The optimal value of the dual problem is less than or equal to the optimal value of the primal problem (weak duality):

$D(\lambda,v) = \inf_{x} L(x,\lambda,v) \le p^*$, where $p^*$ is the optimal value of the primal problem.
Example:
Least squares problem (find the dual)
Minimize $\|x\|^2$
subject to $Ax=b$
$L(x,v)=\|x\|^2+\sum_i v_i (Ax-b)_i$
$L(x,v)=x^Tx+v^T(Ax-b)$
$\nabla_x L(x,v)=2x + A^Tv=0$
$x^\star=-\frac{1}{2}A^Tv$
So the Lagrangian dual is
$D(v)=L(x^\star,v)=\frac{1}{4}v^TAA^Tv-\frac{1}{2}v^TAA^Tv-v^Tb$
$\qquad =-\frac{1}{4}v^TAA^Tv-v^Tb$
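A quick numerical check of this derivation (a sketch assuming numpy; the random $A$ and $b$ below are arbitrary illustrations):

```python
# Verify the least-squares dual numerically: the dual maximizer v* of
# D(v) = -(1/4) v^T A A^T v - v^T b  satisfies  A A^T v* = -2b,
# and x* = -(1/2) A^T v* should equal the minimum-norm solution of Ax = b.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))   # 3 equality constraints, 5 variables
b = rng.standard_normal(3)

v_star = np.linalg.solve(A @ A.T, -2.0 * b)   # stationarity of D(v)
x_star = -0.5 * A.T @ v_star                  # recover primal from dual
x_pinv = np.linalg.pinv(A) @ b                # minimum-norm solution directly

print(np.allclose(x_star, x_pinv))            # True
print(np.allclose(A @ x_star, b))             # constraint satisfied
# Strong duality: the dual optimum equals the primal optimum ||x*||^2
D_opt = -0.25 * v_star @ A @ A.T @ v_star - v_star @ b
print(np.isclose(D_opt, x_star @ x_star))     # True
```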

Example:
Dual of a general LP.
Find the dual function of the LP 
minimize $c^Tx$ 
subject to 
$Ax=b$ 
$x \ge 0$
The Lagrangian is
$L(x, \lambda, v) = c^T x + \lambda^T(-x) + v^T (Ax - b)$
$L(x, \lambda, v) = c^Tx - \lambda^Tx + v^TAx - v^Tb$
$L(x, \lambda, v) = (c^T-\lambda^T+v^TA)x - v^Tb$
The dual problem is
maximize $D(\lambda,v)$ subject to $\lambda \ge 0$.
Since the Lagrangian is linear in $x$, there are two cases:
$D(\lambda,v)=-v^Tb \qquad \text{if} \quad c^T-\lambda^T+v^TA=0$
$D(\lambda,v)=-\infty \qquad \text{if} \quad c^T-\lambda^T+v^TA \ne 0$ (the Lagrangian is then unbounded below in $x$)
So the minimization LP becomes a maximization problem in standard LP form:
Maximize $-b^Tv$
subject to
$c+A^Tv \ge 0$
(this is just the condition $\lambda = c+A^Tv$ together with $\lambda \ge 0$).
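This primal–dual pair is easy to check numerically; a minimal sketch with scipy.optimize.linprog (the small LP below is an arbitrary illustration):

```python
# Check LP duality numerically: solve the primal
#   min c^T x  s.t.  Ax = b, x >= 0
# and the dual, rewritten in minimization form as
#   min b^T v  s.t.  -A^T v <= c      (i.e.  A^T v + c >= 0).
# Strong duality: primal optimum = -(dual minimum).
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

primal = linprog(c, A_eq=A, b_eq=b, bounds=(0, None))
dual = linprog(b, A_ub=-A.T, b_ub=c, bounds=(None, None))

print(primal.fun)        # 1.0   (x = [1, 0])
print(-dual.fun)         # 1.0   (v = [-1], value of maximize -b^T v)
```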

Example:
Minimize the function $f(x,y)=x^2+y^2$
Subject to the constraint $x+2y-5=0$
$L(x,y,\lambda)=f(x,y)+\lambda g(x,y)$
$L(x,y,\lambda)=x^2+y^2+\lambda (x+2y-5)$
$\frac{\partial L}{\partial x}=0,\quad 2x+\lambda=0$
$\frac{\partial L}{\partial y}=0,\quad 2y+2\lambda=0$
$\frac{\partial L}{\partial \lambda}=0,\quad x+2y-5=0$
From equation 1, $\lambda=-2x$, and substituting into equation 2 gives $y=-\lambda=2x$.
From equation 3,
$x+2y-5=0$
$x+4x-5=0$
$x=1$, so $y=2$
So the minimum is $f(x,y)=x^2+y^2=f(1,2)=1^2+2^2=5$
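Stationarity systems like this one are easy to hand to a computer algebra system; a minimal sympy sketch of this example (the same pattern can be used to verify the remaining worked examples below):

```python
# Solve the stationarity conditions of L(x, y, lam) = x^2 + y^2 + lam*(x + 2y - 5)
# symbolically and evaluate f at the critical point.
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x**2 + y**2
g = x + 2*y - 5
L = f + lam * g

sols = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
for s in sols:
    print(s, 'f =', f.subs(s))   # {x: 1, y: 2, lam: -2} f = 5
```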
Example:
Minimize the function
$f(x,y)=x^2-10x+y^2-14y+28$
subject to the constraint $x+y=8$
$L(x,y,\lambda)=f(x,y)+\lambda g(x,y)$
$L(x,y,\lambda)=x^2-10x+y^2-14y+28+\lambda (x+y-8)$
$\frac{\partial L}{\partial x}=0,\quad 2x-10+\lambda=0$
$\frac{\partial L}{\partial y}=0,\quad 2y-14+\lambda=0$
$\frac{\partial L}{\partial \lambda}=0,\quad x+y-8=0$
Subtracting equation 2 from equation 1:
$2x-2y+4=0$
$x-y=-2$
$y=x+2$
Using equation 3:
$x+y-8=0$
$x+x+2-8=0$
$x=3$
so $y=5$
The minimum value is $f(3,5)=-38$.
Example:
Use the method of Lagrange multipliers to find the point that represents the minimum value of
$f(x,y)=x^2+y^2$ subject to the constraint $x+4y=20$

$L(x,y,\lambda)=f(x,y)+\lambda g(x,y)$
$L(x,y,\lambda)=x^2+y^2+\lambda (x+4y-20)$
$\frac{\partial L}{\partial x}=0,\quad 2x+\lambda=0$
$\frac{\partial L}{\partial y}=0,\quad 2y+4\lambda=0$
$\frac{\partial L}{\partial \lambda}=0,\quad x+4y-20=0$
From equations 1 and 2, $\lambda=-2x$ and $y=-2\lambda=4x$.
Using equation 3:
$x+16x-20=0$
$x=\frac{20}{17}$, so $y=\frac{80}{17}$
The minimum point is $(20/17, 80/17)$ and the minimum value is $f(20/17,80/17)=\frac{6800}{289}=\frac{400}{17}$.
Example:
Maximize $f(x,y)=2x+2xy+y$ subject to the constraint $2x+y=100$
$L(x,y,\lambda)=f(x,y)+\lambda g(x,y)$
$L(x,y,\lambda)=2x+2xy+y+\lambda (2x+y-100)$
$\frac{\partial L}{\partial x}=0,\quad 2+2y+2\lambda=0$
$\frac{\partial L}{\partial y}=0,\quad 2x+1+\lambda=0$
$\frac{\partial L}{\partial \lambda}=0,\quad 2x+y-100=0$
From equation 1, $\lambda=-(1+y)$; from equation 2, $\lambda=-(2x+1)$. Equating the two gives $y=2x$.
From equation 3, $2x+2x-100=0$, so
$x=25$
and
$y=50$
The maximum point is $(25,50)$ and the value is $f(25,50)=50+2500+50=2600$.

Example:
Maximize the function $f(x,y)=2xy$ subject to the constraint $x+y=6$

$L(x,y,\lambda)=f(x,y)+\lambda g(x,y)$
$L(x,y,\lambda)=2xy+\lambda (x+y-6)$
$\frac{\partial L}{\partial x}=0,\quad 2y+\lambda=0$
$\frac{\partial L}{\partial y}=0,\quad 2x+\lambda=0$
$\frac{\partial L}{\partial \lambda}=0,\quad x+y-6=0$
from the first two equations
$2y-2x=0$
$y=x$
from equation 3
$2x-6=0$
$x=3$
So  $y=3$
So the maximum occurs at $(3,3)$ and the value is $f(3,3)=2 \cdot 3 \cdot 3=18$
Example:
Optimize $f(x,y)=xy+1$ subject to the constraint
$g(x,y)=x^2+y^2-1=0$
$L(x,y,\lambda)=f(x,y)+\lambda g(x,y)$
$L(x,y,\lambda)=xy+1+\lambda (x^2+y^2-1)$
$\frac{\partial L}{\partial x}=0,\quad y+\lambda 2x=0$
$\frac{\partial L}{\partial y}=0,\quad x+\lambda 2y=0$
$\frac{\partial L}{\partial \lambda}=0,\quad x^2+y^2-1=0$
From the first equation (assuming $\lambda \ne 0$; $\lambda=0$ would force $x=y=0$, which violates the constraint)
$x=\frac{-y}{2 \lambda}$
Substituting into the second:
$\frac{-y}{2 \lambda}+\lambda 2y=0$
$-y+4 \lambda^2 y=0$
$y(4\lambda^2-1)=0$
This means that $y=0$ or $4\lambda^2-1=0$.
If $y=0$, the second equation forces $x=0$, which contradicts equation 3, so this branch yields no critical point.
Therefore
$\lambda= \pm \frac{1}{2}$
From the first equation, $\lambda=-\frac{1}{2}$ gives $y=x$ and $\lambda=\frac{1}{2}$ gives $y=-x$.
Using equation 3:
$x^2+x^2-1=0$
$2x^2=1$
$x=\pm \frac{1}{\sqrt{2}}$
So we have four points
$(\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}}),(\frac{1}{\sqrt{2}},\frac{-1}{\sqrt{2}}),(\frac{-1}{\sqrt{2}},\frac{1}{\sqrt{2}}),(\frac{-1}{\sqrt{2}},\frac{-1}{\sqrt{2}})$

$f(x,y)=f(\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}})=\frac{1}{4}+1=\frac{5}{4}$
$f(x,y)=f(\frac{1}{\sqrt{2}},\frac{-1}{\sqrt{2}})=\frac{-1}{4}+1=\frac{3}{4}$
$f(x,y)=f(\frac{-1}{\sqrt{2}},\frac{1}{\sqrt{2}})=\frac{-1}{4}+1=\frac{3}{4}$
$f(x,y)=f(\frac{-1}{\sqrt{2}},\frac{-1}{\sqrt{2}})=\frac{1}{4}+1=\frac{5}{4}$
$f(x,y)=f(1,0)=1$
$f(x,y)=f(-1,0)=1$
From this maximum and minimum can be found very easily.
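Because the constraint is the unit circle, the parameterization $x=\cos t$, $y=\sin t$ gives an independent check, since then $f = 1 + \frac{1}{2}\sin 2t$ (a sketch assuming numpy):

```python
# Independent check for f(x, y) = xy + 1 on the unit circle:
# parameterize x = cos(t), y = sin(t), so f = 1 + 0.5*sin(2t).
import numpy as np

t = np.linspace(0.0, 2.0 * np.pi, 100001)
f = np.cos(t) * np.sin(t) + 1.0

print(f.max())   # ~1.5 = 3/2, at t = pi/4 and 5pi/4, i.e. ±(1/√2, 1/√2)
print(f.min())   # ~0.5 = 1/2, at t = 3pi/4 and 7pi/4, i.e. ±(-1/√2, 1/√2)
```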

Example:
Maximize the function $f(x, y, z) = xy + yz + xz$ on the unit sphere $g(x, y, z) = x^2 + y^2 +z^2=1$
$L(x,y,z,\lambda)=xy+yz+xz+\lambda(x^2+y^2+z^2-1)$
$\frac{\partial L}{\partial x}=0,\quad y+z+\lambda 2x=0$
$\frac{\partial L}{\partial y}=0,\quad x+z+\lambda 2y=0$
$\frac{\partial L}{\partial z}=0,\quad y+x+\lambda 2z=0$
$\frac{\partial L}{\partial \lambda}=0,\quad x^2+y^2+z^2-1=0$
Subtracting the first equation from the second:
$x-2\lambda x-y+2\lambda y=0$
$x(1-2 \lambda)-y(1-2 \lambda)=0$
$(1-2 \lambda)(x-y)=0$
When $x-y=0$ we have $x=y$. Applying the same argument to the second and third equations gives $(1-2\lambda)(y-z)=0$, so if $\lambda \ne \frac{1}{2}$ we also get $y=z$. With $x=y=z$, equation 4 gives
$3x^2=1$
$x=y=z=\pm \frac{1}{\sqrt{3}}$
and at both of these points
$f(x,y,z)=3 \cdot \frac{1}{3}=1$
When $\lambda = \frac{1}{2}$, adding the first three equations gives $(2+2\lambda)(x+y+z)=3(x+y+z)=0$, so $z=-x-y$. Then
$f(x,y,z)=xy+y(-x-y)+x(-x-y)$
$f(x,y,z)=-x^2-xy-y^2$
and the constraint becomes $x^2+y^2+(x+y)^2=1$, i.e. $x^2+xy+y^2=\frac{1}{2}$, so $f=-\frac{1}{2}$ on this entire family of points.
Hence the maximum value is $1$, attained at $\pm \frac{1}{\sqrt{3}}(1,1,1)$, and the minimum value is $-\frac{1}{2}$.
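Since $f$ is a quadratic form, its extremes on the unit sphere are the extreme eigenvalues of the associated symmetric matrix (the stationarity conditions above read exactly as an eigenvalue problem), which gives a one-line numerical check (a sketch assuming numpy):

```python
# f(x, y, z) = xy + yz + zx = p^T S p with S the symmetric matrix below.
# On the unit sphere the extremes of a quadratic form are the extreme
# eigenvalues of S: the Lagrange conditions grad f + lam * grad g = 0
# read S p = -lam * p, an eigenvalue problem.
import numpy as np

S = 0.5 * (np.ones((3, 3)) - np.eye(3))   # zeros on diagonal, 1/2 off-diagonal
w = np.linalg.eigvalsh(S)
print(w)            # [-0.5, -0.5, 1.0]: minimum -1/2, maximum 1
```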

Example:
Find the extreme values of the function $f(x,y)=x^2+2y^2$ on the circle $x^2+y^2=1.$
Using Lagrange multipliers, 
$f_x = \lambda g_x$ 
$f_y = \lambda g_y$

which, together with the constraint, results in the following equations:

$2x=2\lambda x \qquad (1)$
$4y=2\lambda y \qquad (2)$
$x^2+y^2=1 \qquad (3)$

From equation (1), either $x=0$ or $\lambda=1$.
From equation (3), when $x=0$, $y=\pm 1$.
This gives the candidate points $(0,1)$ and $(0,-1)$.
When $\lambda=1$, equation (2) gives $y=0$,
which gives the candidate points $(1,0)$ and $(-1,0)$.

Evaluating $f$ at these four points, we find that 
$ f(0,1)=2 $
$ f(0,-1)=2 $
$ f(1,0)=1 $
$ f(-1,0)=1$
Therefore the maximum value of $f$ on the circle $x^2+y^2=1$ is $f(0,\pm 1)=2$ and the minimum value is $f(\pm 1,0)=1$.
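The same system can also be solved symbolically in the $\nabla f = \lambda \nabla g$ form; a minimal sympy sketch that recovers all four candidate points:

```python
# Solve  2x = 2*lam*x,  4y = 2*lam*y,  x^2 + y^2 = 1  symbolically;
# sympy returns all the branches found by hand above.
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x**2 + 2*y**2
eqs = [2*x - 2*lam*x, 4*y - 2*lam*y, x**2 + y**2 - 1]
for s in sp.solve(eqs, [x, y, lam], dict=True):
    print(s, 'f =', f.subs(s))
# (±1, 0) with lam = 1 -> f = 1 (minimum); (0, ±1) with lam = 2 -> f = 2 (maximum)
```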

Example:
Find the extreme values of $f(x,y,z)=x^2+2y^2+3z^2$ on the unit sphere $x^2+y^2+z^2=1$.

The Lagrangian is

$L(x,y,z,\lambda)=x^2+2y^2+3z^2+\lambda(x^2+y^2+z^2-1)$

Setting the gradient to zero:

$\frac{\partial L}{\partial x}=0,\quad 2x+2\lambda x=0$
$\frac{\partial L}{\partial y}=0,\quad 4y+2\lambda y=0$
$\frac{\partial L}{\partial z}=0,\quad 6z+2\lambda z=0$
$\frac{\partial L}{\partial \lambda}=0,\quad x^2+y^2+z^2-1=0$

The first three equations read $x(1+\lambda)=0$, $y(2+\lambda)=0$ and $z(3+\lambda)=0$; no single value of $\lambda$ makes more than one factor vanish, so exactly one coordinate is nonzero. This gives the solutions $(\pm 1,0,0)$ with $f=1$, $(0,\pm 1,0)$ with $f=2$, and $(0,0,\pm 1)$ with $f=3$. So the minimum value is $1$ and the maximum value is $3$.


Example:

Find the extrema of $f(x, y) = x$ subject to $g(x, y) = x^2 + 2y^2 =3.$

$f_x=\lambda g_x$
$1=\lambda 2x$
$x=\frac{1}{2\lambda}$

$f_y=\lambda g_y$
$0=\lambda 4y$
Since $1=2\lambda x$ forces $\lambda \ne 0$, we must have $y=0$.

When $y=0$, the constraint gives $x=\pm \sqrt{3}$.
So
$f(\sqrt{3},0)=\sqrt{3}$
$f(-\sqrt{3},0)=-\sqrt{3}$

so the maximum is $\sqrt{3}$ and the minimum is $-\sqrt{3}$

Example:
Find the maximum value of $f(x,y,z) = xyz$ given that $g(x,y,z) = x + y + z = 3$ and $x, y, z \ge 0$.
$L(x,y,z,\lambda)=xyz+\lambda(x+y+z-3)$
$\frac{\partial L}{\partial x}=0,\quad yz+\lambda =0$
$\frac{\partial L}{\partial y}=0,\quad xz+\lambda =0$
$\frac{\partial L}{\partial z}=0,\quad xy+\lambda =0$
$\frac{\partial L}{\partial \lambda}=0,\quad x+y+z-3=0$
The first three equations imply that
$yz=xz=xy$
(each equals $-\lambda$). If any coordinate is zero then $f=0$, which cannot be the maximum, so assume $x,y,z>0$.
From $xz=yz$:
$z(x-y)=0$
so
$x=y$
Similarly, from $xz=xy$:
$x(z-y)=0$
so $z=y$
This implies that $x=y=z$.
Substituting into equation 4, $x+y+z-3=0$:
$3x-3=0$
So $x=1,y=1,z=1$ gives the maximum and the value is $f(1,1,1)=1 \cdot 1 \cdot 1=1$
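For problems like this one, a general-purpose constrained solver gives a quick numerical confirmation; a minimal sketch with scipy.optimize.minimize using SLSQP (the starting point below is an arbitrary feasible guess):

```python
# Numerical confirmation: maximize xyz subject to x + y + z = 3, x, y, z >= 0,
# by minimizing -xyz with an equality constraint and nonnegativity bounds.
import numpy as np
from scipy.optimize import minimize

res = minimize(
    lambda p: -p[0] * p[1] * p[2],                 # maximize xyz
    x0=np.array([0.5, 1.0, 1.5]),                  # arbitrary feasible start
    method='SLSQP',
    bounds=[(0, None)] * 3,
    constraints=[{'type': 'eq', 'fun': lambda p: p.sum() - 3.0}],
)
print(res.x, -res.fun)   # ~[1, 1, 1], ~1.0
```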
