
5.5 Convex Optimization



Convex optimization problems are a particularly useful class of optimization problems, because we can guarantee global optimality. When $f(\cdot)$ is a convex function, and when the constraint sets defined by $g(\cdot)$ and $h(\cdot)$ are convex, this is called a convex optimization problem. In this setting, we have strong duality: the optimal value of the dual problem is the same as the optimal value of the primal problem.

A set $C$ is a convex set if for any $x, y \in C$ and for any scalar $\theta$ with $0 \le \theta \le 1$, we have 
$\theta x + (1 - \theta)y \in C $.

Convex sets are sets such that the straight line segment connecting any two elements of the set lies inside the set. Figures 7.5 and 7.6 illustrate convex and non-convex sets, respectively.

Convex functions are functions such that the straight line segment joining any two points on the graph of the function lies above (or on) the graph.

Figures: a non-convex function and a convex function.

Definition
Let $f : \mathbb{R}^D \to \mathbb{R}$ be a function whose domain is a convex set. The function $f$ is a convex function if for all $x,y$ in the domain of $f$, and for any scalar $\theta$ with $0 \le \theta \le 1$, we have 
$f(\theta x + (1 - \theta)y) \le \theta f(x) + (1 -\theta)f(y)$.

Remark. A concave function is the negative of a convex function.

One way to see the relation between convex functions and convex sets is to consider the set obtained by “filling in” a convex function. A convex function is a bowl-like object, and we can imagine pouring water into it to fill it up. The resulting filled-in set, called the epigraph of the convex function, is a convex set.

If a function $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable, we can specify convexity in terms of its gradient $\nabla_x f(x)$. A function $f(x)$ is convex if and only if for any two points $x, y$ in its domain it holds that
$f(y) \ge f(x)+ \nabla_x f(x)^T (y- x)$

If we further know that a function $f(x)$ is twice differentiable, that is, the Hessian $\nabla_x^2 f(x)$ exists for all values of $x$ in the domain of $f$, then the function $f(x)$ is convex if and only if $\nabla_x^2 f(x)$ is positive semidefinite.

Example

Let us consider the negative entropy function $f(x) = x\log_2 x$, which is convex for $x > 0$; its bowl-shaped graph already suggests this.
To illustrate the previous definitions of convexity, let us check the calculations for the two points $x = 2$ and $y = 4$. Note that to prove convexity of $f(x)$ we would need to check the inequality for all pairs of points in the domain, not just this one.

As per the definition of convexity, for $0 \le \theta \le 1$ we need
$f(\theta x + (1 - \theta)y) \le \theta f(x) + (1 -\theta)f(y)$.
Consider $\theta=0.5$, $x=2$ and $y=4$.
LHS:
$f(0.5 \times 2 + (1 - 0.5) \times 4)=f(3)=3\log_2 3\approx 4.75$
RHS:
$0.5 \times 2\log_2 2 + (1 -0.5)\times 4\log_2 4=1+4=5$.

So the definition for convexity is satisfied.
Since $f(x)$ is differentiable, we can alternatively use the derivative of $f(x)$ together with the first-order condition
$f(y) \ge f(x)+ \nabla_x f(x)^T (y- x)$.
The derivative is
$\nabla_x (x \log_2 x)=1\cdot\log_2 x+x\cdot\frac{1}{x \log_e 2}=\log_2 x+\frac{1}{\log_e 2}$
Using the same test points $x=2$ and $y=4$:
The LHS is
$f(4)=4\log_2 4=8$
The RHS is
$f(x)+\nabla_x f(x)^T (y-x)=f(2)+\nabla f(2)\cdot(4-2)$
$\quad \qquad=2+\left(1+\frac{1}{\log_e 2}\right)\times 2\approx 6.9$
Since $8 \ge 6.9$, the first-order condition is also satisfied.
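The same two checks can be repeated numerically. Below is a minimal Python sketch (assuming numpy is available); the points $x=2$, $y=4$ and $\theta=0.5$ are the ones used above.

```python
import numpy as np

def f(x):
    # negative entropy f(x) = x * log2(x), defined for x > 0
    return x * np.log2(x)

def grad_f(x):
    # derivative: log2(x) + 1/ln(2)
    return np.log2(x) + 1.0 / np.log(2)

x, y, theta = 2.0, 4.0, 0.5

# definition of convexity: f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y)
lhs = f(theta * x + (1 - theta) * y)          # f(3) ~ 4.75
rhs = theta * f(x) + (1 - theta) * f(y)       # 1 + 4 = 5
print(lhs, "<=", rhs, ":", lhs <= rhs)

# first-order condition: f(y) >= f(x) + f'(x) * (y - x)
print(f(y), ">=", f(x) + grad_f(x) * (y - x))  # 8 >= ~6.89
```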

The inequality
$f(\theta x + (1 - \theta)y) \le \theta f(x) + (1 -\theta)f(y)$
is sometimes called Jensen's inequality. In fact, a whole class of inequalities obtained by taking nonnegative weighted sums of convex functions is also called Jensen's inequality.

In summary, a constrained optimization problem is called a convex optimization problem if it has the form
$\min_x f(x)$
subject to
$g_i(x) \le 0$ for all $i=1,\ldots,m$
$h_j(x)=0$ for all $j=1,\ldots,n$

where $f(x)$ and all $g_i(x)$ are convex functions, and each equality constraint $h_j(x)=0$ describes a convex set (which is the case when $h_j$ is an affine function).
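A problem in this standard form can be handed directly to a generic convex solver. The sketch below uses the cvxpy library (this choice, and the particular objective and constraints, are illustrative assumptions, not from the text): it minimizes a convex quadratic subject to one convex inequality constraint and one affine equality constraint.

```python
import numpy as np
import cvxpy as cp

# decision variable x in R^2
x = cp.Variable(2)

# convex objective: f(x) = ||x - (1, 2)||^2
objective = cp.Minimize(cp.sum_squares(x - np.array([1.0, 2.0])))

constraints = [
    cp.sum(x) - 1 <= 0,   # g(x) <= 0 : convex inequality
    x[0] - x[1] == 0,     # h(x) = 0  : affine equality
]

prob = cp.Problem(objective, constraints)
prob.solve()
print("optimal value:", prob.value)
print("optimal x:", x.value)
```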

Consider whether the following statements are true or false:
a. The intersection of any two convex sets is convex.
b. The union of any two convex sets is convex.
c. The difference of a convex set A from another convex set B is convex.
a. true
b. false
c. false

Consider whether the following statements are true or false:
a. The sum of any two convex functions is convex.
b. The difference of any two convex functions is convex.
c. The product of any two convex functions is convex.
d. The maximum of any two convex functions is convex.
a. true
b. false
c. false
d. true
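For item (c) above, a single midpoint check already gives a counterexample: $f(x)=x^2$ and $g(x)=(x-1)^2$ are both convex, but their product violates the convexity inequality. A quick Python check (the particular functions are chosen only for illustration):

```python
# f(x) = x^2 and g(x) = (x - 1)^2 are convex; is their product h convex?
h = lambda x: (x ** 2) * ((x - 1) ** 2)

x1, x2, theta = 0.0, 1.0, 0.5
mid = theta * x1 + (1 - theta) * x2

lhs = h(mid)                               # h(0.5) = 0.0625
rhs = theta * h(x1) + (1 - theta) * h(x2)  # 0.5*0 + 0.5*0 = 0
print(lhs <= rhs)                          # False, so h is not convex
```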
Show that every affine function $f(x) = ax + b, x \in R$ is convex. 

Solution. 

$f((1 − \lambda)x_1 + \lambda x_2)= a((1 − \lambda)x_1 + \lambda x_2) + b$ 
$= a((1 − \lambda)x_1 + \lambda x_2) + ((1 − \lambda) + \lambda)b $
$= (1 − \lambda)(ax_1 + b) + \lambda(ax_2 + b) $
$= (1 − \lambda)f(x_1) + \lambda f(x_2) $

Show that $f(x) = x^2 , x \in R$ is strictly convex
Solution.
 Pick $x_1, x_2$ so that $x_1 \ne  x_2$, and pick $\lambda \in (0, 1)$
$ f((1 − \lambda)x_1 + \lambda x_2) = ((1 − \lambda)x_1 + \lambda x_2)^2$
$ = (1 − \lambda)^2x_1^2 + \lambda^ 2x_2^2 + 2(1 − \lambda)\lambda x_1x_2$
What to do from here? Since $x_1 \ne x_2$, we have $(x_1-x_2)^2 > 0$.
Expanding, this means that $x_1^2+x_2^2 > 2x_1x_2$. Multiplying both sides by $(1-\lambda)\lambda > 0$, this means that

$(1 - \lambda)^2x_1^2 + \lambda^2x_2^2 + 2(1 - \lambda)\lambda x_1x_2 < (1 - \lambda)^2x_1^2 + \lambda^2x_2^2 + (1 - \lambda)\lambda(x_1^2 + x_2^2)$
$= (1 - 2\lambda + \lambda^2 + \lambda - \lambda^2)x_1^2 + (\lambda^2 + \lambda - \lambda^2)x_2^2$
$= (1 - \lambda)x_1^2 + \lambda x_2^2 = (1 - \lambda)f(x_1) + \lambda f(x_2)$, which proves strict convexity.

This last example shows that proving convexity can be difficult and non-intuitive, even for simple functions like $x^2$. The good news is that oftentimes there are simpler conditions that we can check. These conditions involve the first and second derivatives of a function.
Note that for $f(x) = x^2$ the second derivative is $2$, which is positive everywhere; this also proves (strict) convexity.
Example:
Let $f : R^n \to R$ be defined as $f(x) = \frac{1}{2} x^TAx + b^T x + c$ where $A$ is a symmetric matrix in $R^{n \times n}$, $b \in R^n$ and $c \in R$. The Hessian of $f$ is $A$ at any $x \in R^n$, so
$f$ is convex iff $A$ is positive semi-definite.
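In other words, checking convexity of such a quadratic reduces to an eigenvalue test on $A$. A minimal numpy sketch (the matrix $A$ below is an arbitrary symmetric example chosen for illustration):

```python
import numpy as np

# an arbitrary symmetric matrix, for illustration only
A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

eigvals = np.linalg.eigvalsh(A)               # eigenvalues of a symmetric matrix
print(eigvals)                                # [1. 3.]
print("f is convex:", np.all(eigvals >= 0))   # all eigenvalues >= 0  <=>  A is positive semi-definite
```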


Let $f(x) = x \log x$ be defined on $C = \{x \in R : x > 0\}$.
$f'(x) = 1 + \log x$ and $f''(x) = \frac{1}{x} > 0 \;\forall x \in C$. So, $f(x)$ is convex.

Let $f(x) = \frac{1}{2} \|Ax − b\|^2$, or equivalently $f(x) =\frac{1}{2} (Ax − b)^T (Ax − b)$.
$f''(x) = A^TA$, which is positive semi-definite since $x^T A^T A x = \|Ax\|^2 \ge 0$ for all $x$.
Therefore $f(x)$ is convex.
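A quick numerical check of this fact with a randomly generated $A$ (the particular matrix is only illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))          # any real matrix A

H = A.T @ A                              # Hessian of 0.5 * ||Ax - b||^2
print(np.linalg.eigvalsh(H).min())       # >= 0 (up to rounding): A^T A is positive semi-definite
```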

$f(x) = log(x)$ defined on $C = \{x \in R : x > 0\}$. 
$f'(x) = \frac{1}{x}$ and
$f''(x) = -\frac{1}{x^2} < 0 \;\forall x \in C$. So, $f$ is concave.

A critical point of a function $f : R^n \to R$ is any point $x$ at which $∇f(x) = 0.$ Compute all of the critical points of the following functions. If no critical point exists, explain why. 
$(a) f(x) = x_1^2 − 4x_1 + 2x_2^2 + 7$
$(b) f(x) = e^{ −||x||^2}$
$ (c) f(x) = x_1^2 − 2x_1x_2 + \frac{1}{3} x_2^3 − 8x_2 $
$(d)\; f(x) = (2x_1 − x_2)^2 + (x_2 − x_3)^2 + (x_3 − 1)^2$

Solution 
$(a)\; ∇f(x) = [2x_1 − 4,\; 4x_2]^T = 0$, so $x = (2, 0)$.
$(b)\; ∇f(x) = −2e^{−\|x\|^2}\, x = 0$, so $x = 0$.
$(c)\; ∇f(x) = [2x_1 − 2x_2,\; −2x_1 + x_2^2 − 8]^T = 0$, so $x = (−2, −2)$ or $x = (4, 4)$.
$(d)\; ∇f(x) = [4(2x_1 − x_2),\; −2(2x_1 − x_2) + 2(x_2 − x_3),\; −2(x_2 − x_3) + 2(x_3 − 1)]^T = 0$, so $x = (0.5, 1, 1)$.
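The gradients above can also be solved symbolically. A minimal sympy sketch for part (c), assuming sympy is available:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**2 - 2*x1*x2 + sp.Rational(1, 3)*x2**3 - 8*x2

grad = [sp.diff(f, v) for v in (x1, x2)]     # gradient components
crit = sp.solve(grad, [x1, x2], dict=True)   # solve grad = 0
print(crit)   # [{x1: -2, x2: -2}, {x1: 4, x2: 4}] (in some order)
```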

Find the local minimizers and maximizers for the following functions if they exist: 
$(a)\; f(x) = x^2 + \cos x$
$(b) f(x_1, x_2) = x_1^2 − 4x_1 + 2x_2^2 + 7$
$(c) f(x_1, x_2) = e ^{−(x_1^2+x_2^2)}$
$(d) f(x_1, x_2, x_3) = (2x_1 − x_2)^2 + (x_2 − x_3)^2 + (x_3 − 1)^2$
Solution 
$(a)\; x = 0$ is a local minimizer; in fact it is the global minimizer, since $f''(x) = 2 − \cos x > 0$ makes $f$ convex;
$(b)\; (x_1, x_2)^T = (2, 0)^T$ is a local (global, since $f$ is convex) minimizer;
$(c)\; (x_1, x_2) = (0, 0)$ is a local maximizer; in fact it is the global maximizer, since $f(x_1, x_2) = e^{−(x_1^2+x_2^2)} \le 1 = f(0, 0)$;
$(d)\; (x_1, x_2, x_3) = (1/2, 1, 1)$ is a local (global, since $f$ is convex) minimizer.


Is the function $f(x, y) = 2x^2 + y^2 + 6xy - x + 3y - 7$ convex, concave, or neither? Justify your answer.

$\frac{\partial f}{\partial x}=4x+6y-1$
$\frac{\partial f}{\partial y}=2y+6x+3$
$\frac{\partial^2 f}{\partial x^2}=4$
$\frac{\partial^2 f}{\partial y^2}=2$
$\frac{\partial^2 f}{\partial x \partial y}=6$
$\frac{\partial^2 f}{\partial y \partial x}=6$

So the Hessian is
$\begin{bmatrix}
4 & 6 \\
6 & 2
\end{bmatrix}$

How do you determine if it is concave or convex?

For a function of a single variable, look at the second derivative: if it is nonnegative everywhere, the function is convex; if it is nonpositive everywhere, the function is concave.

For a function of several variables, the Hessian plays the role of the second derivative: if it is positive semidefinite everywhere the function is convex, if it is negative semidefinite everywhere the function is concave, and if it is indefinite the function is neither.
It is noted that Hessians are symmetric matrices, so their eigenvalues are real.

If the Hessian at a given point has all positive eigenvalues, it is said to be a positive-definite matrix. This is the multivariable analogue of a positive second derivative ("convex").
If all of the eigenvalues are negative, it is said to be a negative-definite matrix. This is the analogue of "concave".
If some eigenvalue is 0, then you will need more information (possibly a graph or table) to see what is going on. And if the eigenvalues are mixed (one positive, one negative), you have a saddle point: the function curves upward in one direction and downward in another.

A matrix is positive definite if it’s symmetric and all its pivots are positive. 

By Sylvester's criterion, if all upper-left $k \times k$ determinants (the leading principal minors) of a symmetric matrix are positive, the matrix is positive definite.

A matrix is positive definite if $x^TAx \gt 0$ for all vectors $x \ne 0$.     

We can use any one of these tests. For the Hessian above, the determinant is $4 \times 2 − 6 \times 6 = −28 < 0$, so one eigenvalue is positive and one is negative; the Hessian is indefinite (in particular, not positive semidefinite). Hence $f$ is neither convex nor concave.
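A quick numpy check of the eigenvalues confirms this:

```python
import numpy as np

H = np.array([[4.0, 6.0],
              [6.0, 2.0]])

print(np.linalg.eigvalsh(H))   # approx [-3.08, 9.08]: one negative, one positive, so indefinite
```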

Check for convexity $f(x_1, x_2) = x_1x_2$ on $R^2$

 Solution. The Hessian of $f$ is 
$∇^2 f(x) = \begin{bmatrix}
0 &1 \\
1& 0
\end{bmatrix}$ , which is neither positive semi definite nor negative semi definite. Therefore, $f$ is neither convex nor concave.

Check for convexity: $f(x_1, x_2) = \frac{1}{x_1x_2}$ on $\{(x_1, x_2) : x_1 > 0,\, x_2 > 0\}$.

Solution. The Hessian of $f$ is
$∇^2 f(x) =\frac{1}{x_1x_2} \begin{bmatrix}
\frac{2}{x_1^2} &\frac{1}{x_1x_2} \\
\frac{1}{x_1x_2}& \frac{2}{x_2^2}
\end{bmatrix}$, which is positive definite for $x_1, x_2 > 0$. Therefore, $f$ is convex on this domain.

Check for convexity: $f(x_1, x_2) = \frac{x_1}{x_2}$ on $\{(x_1, x_2) : x_2 > 0\}$.

 Solution. The Hessian of $f$ is 
$∇^2 f(x) = \begin{bmatrix}
0 &\frac{-1}{x_2^2} \\
\frac{-1}{x_2^2}& \frac{2x_1}{x_2^3}
\end{bmatrix}$, whose determinant is $−\frac{1}{x_2^4} < 0$, so it is neither positive semidefinite nor negative semidefinite. Therefore, $f$ is neither convex nor concave.

Check for convexity: $f(x_1, x_2) = \frac{x_1^2}{x_2}$ on $\{(x_1, x_2) : x_2 > 0\}$.

 Solution. The Hessian of $f$ is 
$∇^2 f(x) = \begin{bmatrix}
\frac{2}{x_2} &\frac{-2x_1}{x_2^2} \\
\frac{-2x_1}{x_2^2}& \frac{2x_1^2}{x_2^3}
\end{bmatrix}$, which is positive semidefinite for $x_2 > 0$ (its diagonal entries are nonnegative and its determinant is $0$). Therefore, $f$ is convex.
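These Hessians can also be generated and classified symbolically. A minimal sympy sketch for this last example, assuming sympy is available:

```python
import sympy as sp

x1 = sp.symbols('x1', real=True)
x2 = sp.symbols('x2', positive=True)   # domain restriction x2 > 0
f = x1**2 / x2

H = sp.hessian(f, (x1, x2))
print(H)                                              # Matrix([[2/x2, -2*x1/x2**2], [-2*x1/x2**2, 2*x1**2/x2**3]])
print(sp.simplify(H.det()), sp.simplify(H.trace()))   # det = 0, trace > 0  =>  positive semi-definite
```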
