
5.4 Constrained Optimization and Lagrange Multipliers

 


In unconstrained optimization we considered the problem of finding the minimum of a function

$$\min_x f(x), \quad \text{where } f:\mathbb{R}^D \to \mathbb{R}.$$

In constrained optimization we have additional constraints. That is, for real-valued functions $g_i:\mathbb{R}^D \to \mathbb{R}$, $i=1,\ldots,m$, we consider the constrained optimization problem

$$\min_x f(x)$$
$$\text{subject to } g_i(x) \le 0 \text{ for all } i=1,\ldots,m.$$

In general, the functions $f$ and $g_i$ may be non-convex.




One obvious, but not very practical, way of converting the constrained problem into an unconstrained one is to use an indicator function

$$J(x) = f(x) + \sum_{i=1}^{m} \mathbf{1}(g_i(x)),$$

where $\mathbf{1}(z)$ is an infinite step function

$$\mathbf{1}(z) = \begin{cases} 0 & \text{if } z \le 0 \\ \infty & \text{otherwise.} \end{cases}$$

This gives an infinite penalty whenever a constraint is not satisfied, and hence yields the same solution as the constrained problem. However, this infinite step function is equally difficult to optimize. We can overcome this difficulty by introducing Lagrange multipliers. The idea of Lagrange multipliers is to replace the step function with a linear function.
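As a quick illustration, here is a minimal Python sketch (the toy problem $\min x^2$ subject to $1 - x \le 0$ and the helper names `f`, `g`, `J`, `lagrangian` are ours, chosen for illustration) comparing the indicator-penalty objective with its Lagrangian relaxation:

```python
import numpy as np

# Toy problem: minimize f(x) = x^2 subject to g(x) = 1 - x <= 0.
f = lambda x: x**2
g = lambda x: 1.0 - x

def J(x):
    # Indicator-penalty objective: f(x) plus an infinite penalty if infeasible.
    return f(x) + (0.0 if g(x) <= 0 else np.inf)

def lagrangian(x, lam):
    # Lagrangian relaxation: the step penalty replaced by a linear term.
    return f(x) + lam * g(x)

xs = np.linspace(-1.0, 3.0, 9)
print([J(x) for x in xs])                # inf on the infeasible side, f(x) elsewhere
print([lagrangian(x, 2.0) for x in xs])  # smooth everywhere, minimized at x = 1
```

The indicator objective jumps to infinity at the constraint boundary, while the Lagrangian is smooth everywhere; at $\lambda = 2$ its minimizer $x = 1$ coincides with the constrained optimum.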

We associate to the problem its Lagrangian by introducing Lagrange multipliers $\lambda_i \ge 0$ corresponding to each inequality constraint, so that

$$L(x,\lambda) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) = f(x) + \lambda^T g(x),$$

where $g(x) = [g_1(x), \ldots, g_m(x)]^T$ and $\lambda = [\lambda_1, \ldots, \lambda_m]^T$.

We now introduce the idea of Lagrangian duality. In general, duality in optimization is the idea of converting an optimization problem in one set of variables $x$ (called the primal variables) into another optimization problem in a different set of variables $\lambda$ (called the dual variables). There are two different approaches to duality: Lagrangian duality and Legendre-Fenchel duality.

Lagrangian Duality
The problem
$$\min_x f(x)$$
$$\text{subject to } g_i(x) \le 0 \text{ for all } i=1,\ldots,m$$
is known as the primal problem, corresponding to the primal variables $x$. The associated Lagrangian dual problem is given by

$$\max_{\lambda \in \mathbb{R}^m} D(\lambda)$$
$$\text{subject to } \lambda \ge 0,$$

where $\lambda$ are the dual variables and $D(\lambda) = \min_{x \in \mathbb{R}^D} L(x,\lambda)$.

Assuming that $f(\cdot)$ and the $g_i(\cdot)$ are differentiable, we find the Lagrangian dual problem by differentiating the Lagrangian with respect to $x$, setting the derivative to zero, and solving for the optimal value.
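Continuing the toy problem above, here is a small sketch (function names ours) that computes $D(\lambda)$ by an inner minimization and then maximizes it over $\lambda \ge 0$ with SciPy:

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

# Dual function for the toy problem min x^2 s.t. 1 - x <= 0:
# D(lam) = min_x [x^2 + lam*(1 - x)], which works out to lam - lam^2/4.
def D(lam):
    inner = minimize(lambda x: x[0]**2 + lam * (1.0 - x[0]), x0=[0.0])
    return inner.fun

# Maximize D over lam >= 0 by minimizing -D on a bounded interval.
res = minimize_scalar(lambda lam: -D(lam), bounds=(0.0, 10.0), method="bounded")
print(res.x, -res.fun)  # approx 2.0 and 1.0
```

Here $D(\lambda) = \lambda - \lambda^2/4$, maximized at $\lambda^* = 2$ with $D(\lambda^*) = 1$, which equals the primal optimum (the problem is convex, so strong duality holds).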

Equality constraints
Consider
$$\min_x f(x)$$
$$\text{subject to } g_i(x) \le 0 \text{ for all } i=1,\ldots,m$$
$$\qquad\qquad\quad h_j(x) = 0 \text{ for all } j=1,\ldots,n.$$
We can model equality constraints by replacing each of them with two inequality constraints. That is, for each equality constraint $h_j(x)=0$ we equivalently replace it by the two constraints $h_j(x) \le 0$ and $h_j(x) \ge 0$. It turns out that the resulting Lagrange multipliers are then unconstrained. Therefore, we constrain the Lagrange multipliers corresponding to the inequality constraints to be non-negative, and leave the Lagrange multipliers corresponding to the equality constraints unconstrained:
$$L(x,\lambda,\nu) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) + \sum_{j=1}^{n} \nu_j h_j(x).$$
The optimal value of the dual problem is less than or equal to the optimal value of the primal problem (weak duality), where the dual function is

$$D(\lambda,\nu) = \inf_x L(x,\lambda,\nu).$$
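To see why weak duality holds: for any feasible $x$ (so $g_i(x) \le 0$ and $h_j(x) = 0$) and any $\lambda \ge 0$,

$$D(\lambda,\nu) = \inf_{\tilde{x}} L(\tilde{x},\lambda,\nu) \le L(x,\lambda,\nu) = f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^n \nu_j h_j(x) \le f(x),$$

since each $\lambda_i g_i(x) \le 0$ and each $\nu_j h_j(x) = 0$; minimizing the right-hand side over feasible $x$ gives $D(\lambda,\nu) \le \min f(x)$.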
Example:
Least squares problem (find the dual):
$$\text{minimize } \|x\|^2$$
$$\text{subject to } Ax = b.$$
$$L(x,\nu) = \|x\|^2 + \nu^T(Ax - b) = x^T x + \nu^T(Ax - b)$$
$$\nabla_x L(x,\nu) = 2x + A^T\nu = 0 \implies x_0 = -\tfrac{1}{2}A^T\nu$$
So the Lagrangian dual is
$$D(\nu) = L(x_0,\nu) = \tfrac{1}{4}\nu^T A A^T \nu - \tfrac{1}{2}\nu^T A A^T \nu - \nu^T b$$
$$= -\tfrac{1}{4}\nu^T A A^T \nu - \nu^T b.$$
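A minimal numerical check of this derivation (assuming $A$ has full row rank so that $AA^T$ is invertible; all variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 5))   # full row rank (almost surely)
b = rng.normal(size=3)

# Primal solution: minimum-norm x with Ax = b, i.e. x* = A^T (A A^T)^{-1} b.
x_star = A.T @ np.linalg.solve(A @ A.T, b)

# Dual maximizer: grad D = -1/2 A A^T nu - b = 0  =>  nu* = -2 (A A^T)^{-1} b.
nu_star = -2.0 * np.linalg.solve(A @ A.T, b)

D = -0.25 * nu_star @ A @ A.T @ nu_star - nu_star @ b
print(np.allclose(D, x_star @ x_star))            # True: dual optimum = primal optimum
print(np.allclose(x_star, -0.5 * A.T @ nu_star))  # recovers x* from nu*
```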

Example:
Dual of a general LP.
Find the dual function of the LP
$$\text{minimize } c^T x$$
$$\text{subject to}$$
$$Ax = b$$
$$x \ge 0.$$
1. The Lagrangian is (introducing $\lambda \ge 0$ for the inequality constraint $-x \le 0$)
$$L(x,\lambda,\nu) = c^T x - \lambda^T x + \nu^T(Ax - b)$$
$$L(x,\lambda,\nu) = c^T x - \lambda^T x + \nu^T A x - \nu^T b$$
$$L(x,\lambda,\nu) = (c^T - \lambda^T + \nu^T A)x - \nu^T b.$$
The dual problem is to maximize $D(\lambda,\nu)$ subject to $\lambda \ge 0$, and $D(\lambda,\nu) = \inf_x L(x,\lambda,\nu)$ has two cases:
$$-\nu^T b \quad \text{if } c^T - \lambda^T + \nu^T A = 0$$
$$-\infty \quad \text{if } c^T - \lambda^T + \nu^T A \ne 0 \text{ (the linear term is unbounded below).}$$
So, eliminating $\lambda = c + A^T\nu$ and using $\lambda \ge 0$, the minimization LP becomes a maximization problem in standard LP form:
$$\text{maximize } -\nu^T b$$
$$\text{subject to}$$
$$c + A^T\nu \ge 0.$$
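A small sketch verifying this primal-dual pair with `scipy.optimize.linprog` on an instance of our own choosing:

```python
import numpy as np
from scipy.optimize import linprog

# A small instance (our choice): min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# Primal LP.
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 2)

# Dual LP: maximize -b^T nu  s.t.  c + A^T nu >= 0, nu free.
# As a minimization for linprog: min b^T nu  s.t.  -A^T nu <= c.
dual = linprog(b, A_ub=-A.T, b_ub=c, bounds=[(None, None)])

print(primal.fun)   # 1.0  (x* = (1, 0))
print(-dual.fun)    # 1.0  (strong duality for feasible LPs)
```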

Example:
Minimize the function $f(x,y) = x^2 + y^2$
subject to the constraint $x + 2y - 5 = 0$.
$$L(x,y,\lambda) = f(x,y) + \lambda g(x,y)$$
$$L(x,y,\lambda) = x^2 + y^2 + \lambda(x + 2y - 5)$$
$$L_x = 0: \quad 2x + \lambda = 0$$
$$L_y = 0: \quad 2y + 2\lambda = 0$$
$$L_\lambda = 0: \quad x + 2y - 5 = 0$$
From equation 1, $\lambda = -2x$; so from equation 2, $y = -\lambda = 2x$.
From equation 3:
$$x + 2y - 5 = 0$$
$$x + 4x - 5 = 0$$
$$x = 1, \text{ so } y = 2.$$
So the minimum is $f(1,2) = 1^2 + 2^2 = 5$.
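A minimal SymPy sketch that reproduces this example by solving the stationarity equations (variable names ours):

```python
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)
L = x**2 + y**2 + lam * (x + 2*y - 5)   # the Lagrangian from this example

# Stationarity: set all partial derivatives of L to zero and solve.
eqs = [sp.diff(L, v) for v in (x, y, lam)]
print(sp.solve(eqs, (x, y, lam), dict=True))
# [{x: 1, y: 2, lam: -2}]  ->  f(1, 2) = 5
```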
Example:
Minimize the function
$$f(x,y) = x^2 - 10x + y^2 - 14y + 28$$
subject to the constraint $x + y = 8$.
$$L(x,y,\lambda) = f(x,y) + \lambda g(x,y)$$
$$L(x,y,\lambda) = x^2 - 10x + y^2 - 14y + 28 + \lambda(x + y - 8)$$
$$L_x = 0: \quad 2x - 10 + \lambda = 0$$
$$L_y = 0: \quad 2y - 14 + \lambda = 0$$
$$L_\lambda = 0: \quad x + y - 8 = 0$$
Subtracting equation 2 from equation 1:
$$2x - 2y + 4 = 0$$
$$x - y = -2$$
$$y = x + 2$$
Using equation 3:
$$x + y - 8 = 0$$
$$x + x + 2 - 8 = 0$$
$$x = 3$$
so $y = 5$.
The minimum value is $f(3,5) = 9 - 30 + 25 - 70 + 28 = -38$.
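Because it is easy to mis-track signs in this computation, here is a quick numerical check with SciPy's SLSQP solver (starting point and names ours):

```python
import numpy as np
from scipy.optimize import minimize

f = lambda p: p[0]**2 - 10*p[0] + p[1]**2 - 14*p[1] + 28
con = {"type": "eq", "fun": lambda p: p[0] + p[1] - 8}

res = minimize(f, x0=[0.0, 0.0], constraints=[con], method="SLSQP")
print(res.x, res.fun)   # approx [3. 5.] and -38.0
```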
Example:
Use the method of Lagrange multipliers to find the point that gives the minimum value of
$f(x,y) = x^2 + y^2$ subject to the constraint $x + 4y = 20$.

$$L(x,y,\lambda) = f(x,y) + \lambda g(x,y)$$
$$L(x,y,\lambda) = x^2 + y^2 + \lambda(x + 4y - 20)$$
$$L_x = 0: \quad 2x + \lambda = 0$$
$$L_y = 0: \quad 2y + 4\lambda = 0$$
$$L_\lambda = 0: \quad x + 4y - 20 = 0$$
Solving equations 1 and 2 gives $\lambda = -2x$ and $y = -2\lambda = 4x$; using equation 3, $x + 16x = 20$, so
$$x = \frac{20}{17}, \quad y = \frac{80}{17}.$$
The minimum point is $(20/17, 80/17)$ and the minimum value is $f(20/17, 80/17) = 400/17$.
Example:
Maximize $f(x,y) = 2x + 2xy + y$ subject to the constraint $2x + y = 100$.
$$L(x,y,\lambda) = f(x,y) + \lambda g(x,y)$$
$$L(x,y,\lambda) = 2x + 2xy + y + \lambda(2x + y - 100)$$
$$L_x = 0: \quad 2 + 2y + 2\lambda = 0$$
$$L_y = 0: \quad 2x + 1 + \lambda = 0$$
$$L_\lambda = 0: \quad 2x + y - 100 = 0$$
Equations 1 and 2 give $\lambda = -1 - y$ and $\lambda = -1 - 2x$ respectively, so $y = 2x$. Substituting into equation 3, $4x = 100$, so
$$x = 25, \quad y = 50.$$
The maximum point is $(25, 50)$ and the value is $f(25,50) = 50 + 2500 + 50 = 2600$.

Example:
Maximize the function $f(x,y) = 2xy$ subject to the constraint $x + y = 6$.

$$L(x,y,\lambda) = f(x,y) + \lambda g(x,y)$$
$$L(x,y,\lambda) = 2xy + \lambda(x + y - 6)$$
$$L_x = 0: \quad 2y + \lambda = 0$$
$$L_y = 0: \quad 2x + \lambda = 0$$
$$L_\lambda = 0: \quad x + y - 6 = 0$$
From the first two equations,
$$2y - 2x = 0$$
$$y = x.$$
From equation 3,
$$2x - 6 = 0$$
$$x = 3,$$
so $y = 3$.
So the maximum occurs at $(3,3)$ and the value is $f(3,3) = 2 \cdot 3 \cdot 3 = 18$.
Example:
Optimize $f(x,y) = xy + 1$ subject to the constraint
$g(x,y) = x^2 + y^2 - 1 = 0$.
$$L(x,y,\lambda) = f(x,y) + \lambda g(x,y)$$
$$L(x,y,\lambda) = xy + 1 + \lambda(x^2 + y^2 - 1)$$
$$L_x = 0: \quad y + 2\lambda x = 0$$
$$L_y = 0: \quad x + 2\lambda y = 0$$
$$L_\lambda = 0: \quad x^2 + y^2 - 1 = 0$$
From the first equation,
$$x = -\frac{y}{2\lambda}.$$
Substituting into the second:
$$-\frac{y}{2\lambda} + 2\lambda y = 0$$
$$-y + 4\lambda^2 y = 0$$
$$y(4\lambda^2 - 1) = 0$$
This means that $y = 0$ or $4\lambda^2 - 1 = 0$, i.e.
$$\lambda = \pm\frac{1}{2}.$$
If $y = 0$, the second equation forces $x = 0$, which contradicts equation 3, so there is no critical point with $y = 0$.
When $\lambda = -\frac{1}{2}$ we get $y = x$, and when $\lambda = \frac{1}{2}$ we get $y = -x$.
So, using equation 3:
$$x^2 + x^2 = 1$$
$$2x^2 = 1$$
$$x = \pm\frac{1}{\sqrt{2}}$$
So we have four points:
$$\left(\tfrac{1}{\sqrt 2},\tfrac{1}{\sqrt 2}\right), \left(-\tfrac{1}{\sqrt 2},-\tfrac{1}{\sqrt 2}\right), \left(\tfrac{1}{\sqrt 2},-\tfrac{1}{\sqrt 2}\right), \left(-\tfrac{1}{\sqrt 2},\tfrac{1}{\sqrt 2}\right)$$

$$f\left(\tfrac{1}{\sqrt 2},\tfrac{1}{\sqrt 2}\right) = f\left(-\tfrac{1}{\sqrt 2},-\tfrac{1}{\sqrt 2}\right) = \tfrac{1}{2} + 1 = \tfrac{3}{2}$$
$$f\left(\tfrac{1}{\sqrt 2},-\tfrac{1}{\sqrt 2}\right) = f\left(-\tfrac{1}{\sqrt 2},\tfrac{1}{\sqrt 2}\right) = -\tfrac{1}{2} + 1 = \tfrac{1}{2}$$
From this the maximum value $\frac{3}{2}$ and the minimum value $\frac{1}{2}$ can be read off directly.
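A SymPy check of this case analysis (names ours); note that no solution with $y = 0$ appears:

```python
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)
L = x*y + 1 + lam * (x**2 + y**2 - 1)

sols = sp.solve([sp.diff(L, v) for v in (x, y, lam)], (x, y, lam), dict=True)
for s in sols:
    print(s[x], s[y], (x*y + 1).subs(s))
# Four critical points with values 3/2 (y = x) and 1/2 (y = -x).
```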

Example:
Maximize the function $f(x,y,z) = xy + yz + xz$ on the unit sphere $g(x,y,z) = x^2 + y^2 + z^2 = 1$.
$$L(x,y,z,\lambda) = xy + yz + xz + \lambda(x^2 + y^2 + z^2 - 1)$$
$$L_x = 0: \quad y + z + 2\lambda x = 0$$
$$L_y = 0: \quad x + z + 2\lambda y = 0$$
$$L_z = 0: \quad y + x + 2\lambda z = 0$$
$$L_\lambda = 0: \quad x^2 + y^2 + z^2 - 1 = 0$$
Subtracting the second equation from the first:
$$(y - x) + 2\lambda(x - y) = 0$$
$$(1 - 2\lambda)(x - y) = 0$$
When $x - y = 0$, we have $x = y$; applying the same argument to the first and third equations gives $x = z$ as well (unless $\lambda = \frac12$, the other case), so $x = y = z$. From equation 4 we have
$$3x^2 = 1$$
$$x = \pm\frac{1}{\sqrt{3}}.$$
So the points $\pm\left(\frac{1}{\sqrt 3},\frac{1}{\sqrt 3},\frac{1}{\sqrt 3}\right)$ give $f(x,y,z) = 3x^2 = 1$, the maximum value of $f$.
When $x \ne y$ we have $1 - 2\lambda = 0$, so $\lambda = \frac{1}{2}$,
which turns the first equation into $x + y + z = 0$, so $z = -x - y$:
$$f(x,y,z) = xy + y(-x-y) + x(-x-y)$$
$$f(x,y,z) = xy - xy - y^2 - x^2 - xy$$
$$f(x,y,z) = -x^2 - xy - y^2,$$
which is always negative. In fact, on the sphere $xy + yz + zx = \frac{(x+y+z)^2 - (x^2+y^2+z^2)}{2} = \frac{0 - 1}{2} = -\frac{1}{2}$, so all of these points give the minimum value $-\frac{1}{2}$.
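Since $f$ is a quadratic form, $f(v) = v^T M v$ with $M$ having zero diagonal and $\frac12$ off-diagonal entries, its extreme values on the unit sphere are the extreme eigenvalues of $M$ (the Rayleigh quotient view). A quick NumPy check:

```python
import numpy as np

# f(v) = v^T M v with M = (ones - I)/2.
M = (np.ones((3, 3)) - np.eye(3)) / 2.0
vals, vecs = np.linalg.eigh(M)
print(vals)         # [-0.5, -0.5, 1.0]: min f = -1/2, max f = 1
print(vecs[:, -1])  # proportional to (1, 1, 1)/sqrt(3)
```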

Example:
Find the extreme values of the function $f(x,y) = x^2 + 2y^2$ on the circle $x^2 + y^2 = 1$.
Using Lagrange multipliers,
$$f_x = \lambda g_x$$
$$f_y = \lambda g_y,$$

which results in the following equations:

$$2x = 2\lambda x \quad (1)$$
$$4y = 2\lambda y \quad (2)$$
$$x^2 + y^2 = 1 \quad (3)$$

From equation (1), either $x = 0$ or $\lambda = 1$ (when $x \ne 0$).
From equation (3), when $x = 0$, $y = \pm 1$.
This gives the possible extreme points $(0,1)$ and $(0,-1)$.
When $\lambda = 1$, equation (2) gives $y = 0$,
which gives the extreme points $(1,0)$ and $(-1,0)$.

Evaluating $f$ at these four points, we find that
$$f(0,1) = 2$$
$$f(0,-1) = 2$$
$$f(1,0) = 1$$
$$f(-1,0) = 1$$
Therefore the maximum value of $f$ on the circle $x^2 + y^2 = 1$ is $f(0,\pm1) = 2$ and the minimum value is $f(\pm1,0) = 1$.

Example:
Find the extreme values of $f(x,y,z) = x^2 + 2y^2 + 3z^2$ on the unit sphere $x^2 + y^2 + z^2 = 1$.

We can write the Lagrangian as
$$L(x,y,z,\lambda) = x^2 + 2y^2 + 3z^2 + \lambda(x^2 + y^2 + z^2 - 1).$$

Setting the gradient equal to zero:
$$2x + 2\lambda x = 0, \quad 4y + 2\lambda y = 0, \quad 6z + 2\lambda z = 0, \quad x^2 + y^2 + z^2 - 1 = 0,$$

and we have these solutions $(x,y,z,\lambda)$: $(\pm1,0,0,-1)$, $(0,\pm1,0,-2)$ and $(0,0,\pm1,-3)$. Evaluating $f$ gives $f(\pm1,0,0) = 1$, $f(0,\pm1,0) = 2$ and $f(0,0,\pm1) = 3$, so the minimum value is 1 and the maximum value is 3.
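A SymPy sketch that recovers all six critical points and the corresponding multipliers (names ours):

```python
import sympy as sp

x, y, z, lam = sp.symbols("x y z lam", real=True)
L = x**2 + 2*y**2 + 3*z**2 + lam * (x**2 + y**2 + z**2 - 1)

sols = sp.solve([sp.diff(L, v) for v in (x, y, z, lam)], (x, y, z, lam), dict=True)
for s in sols:
    print((s[x], s[y], s[z], s[lam]), (x**2 + 2*y**2 + 3*z**2).subs(s))
# (±1,0,0,-1) -> 1,  (0,±1,0,-2) -> 2,  (0,0,±1,-3) -> 3
```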


Example:

Find the extrema of $f(x,y) = x$ subject to $g(x,y) = x^2 + 2y^2 = 3$.

$$f_x = \lambda g_x$$
$$1 = 2\lambda x$$
$$x = \frac{1}{2\lambda}$$

$$f_y = \lambda g_y$$
$$0 = 4\lambda y$$
$$y = 0,$$

since $\lambda = 0$ would contradict $1 = 2\lambda x$.
When $y = 0$, the constraint gives $x = \pm\sqrt{3}$.
So
$$f(\sqrt{3}, 0) = \sqrt{3}$$
$$f(-\sqrt{3}, 0) = -\sqrt{3},$$

so the maximum is $\sqrt{3}$ and the minimum is $-\sqrt{3}$.

Example
Find the maximum value of $f(x,y,z) = xyz$ given that $g(x,y,z) = x + y + z = 3$ and $x,y,z \ge 0$.
$$L(x,y,z,\lambda) = xyz + \lambda(x + y + z - 3)$$
$$L_x = 0: \quad yz + \lambda = 0$$
$$L_y = 0: \quad xz + \lambda = 0$$
$$L_z = 0: \quad xy + \lambda = 0$$
$$L_\lambda = 0: \quad x + y + z - 3 = 0$$
The first 3 equations imply that
$$xz = yz = xy.$$
From $xz = yz$,
$$z(x - y) = 0,$$
so $x = y$ (a boundary point with $z = 0$ gives $f = 0$, which is not the maximum).
With $x = y$ we have $xz = xy = x \cdot x$,
so $x(z - x) = 0$ and hence $x = z$ (again, $x = 0$ gives $f = 0$).
This implies that $x = y = z$.
Putting it into equation 4, $x + y + z - 3 = 0$:
$$3x - 3 = 0$$
$$3(x - 1) = 0$$
So $x = 1, y = 1, z = 1$ gives the maximum, and the value is $f(1,1,1) = 1 \cdot 1 \cdot 1 = 1$.
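A quick numerical cross-check with SciPy (starting point and names ours); we minimize $-xyz$ under the same constraints:

```python
import numpy as np
from scipy.optimize import minimize

# Maximize xyz on the simplex x+y+z = 3, x,y,z >= 0 (minimize the negative).
res = minimize(lambda p: -p[0]*p[1]*p[2],
               x0=[0.5, 1.0, 1.5],
               constraints=[{"type": "eq", "fun": lambda p: p.sum() - 3}],
               bounds=[(0, None)] * 3,
               method="SLSQP")
print(res.x, -res.fun)   # approx [1. 1. 1.] and 1.0
```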
