
5.5 Convex Optimization



Convex optimization problems are a particularly useful class of optimization problems, for which we can guarantee global optimality. When $f(\cdot)$ is a convex function, and when the constraints involving $g(\cdot)$ and $h(\cdot)$ are convex sets, this is called a convex optimization problem. In this setting, we have strong duality: the optimal solution of the dual problem is the same as the optimal solution of the primal problem.

A set $C$ is a convex set if for any $x, y \in C$ and for any scalar $\theta$ with $0 \le \theta \le 1$, we have 
$\theta x + (1 - \theta)y \in C $.

Convex sets are sets such that the straight line segment connecting any two elements of the set lies entirely inside the set. [Figures illustrating a convex set and a non-convex set are omitted here.]

Convex functions are functions such that the straight line segment between any two points on the graph of the function lies above (or on) the graph.

[Figures: a non-convex function and a convex function]

Definition
Let $f : \mathbb{R}^D \to \mathbb{R}$ be a function whose domain is a convex set. The function $f$ is a convex function if for all $x, y$ in the domain of $f$, and for any scalar $\theta$ with $0 \le \theta \le 1$, we have
$f(\theta x + (1 - \theta)y) \le \theta f(x) + (1 -\theta)f(y)$.

Remark. A concave function is the negative of a convex function.
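As a quick numerical sanity check of this definition (a sketch, not a proof; the helper `check_convexity_definition` and the test functions $e^x$ and $\sin x$ are chosen only for illustration), we can sample random points $x, y$ and values of $\theta$ and test the inequality:

```python
import numpy as np

def check_convexity_definition(f, n_trials=10_000, seed=0):
    """Numerically test f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y)
    on random samples. Passing does not prove convexity; one failure disproves it."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, n_trials)
    y = rng.uniform(-5, 5, n_trials)
    theta = rng.uniform(0, 1, n_trials)
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    return bool(np.all(lhs <= rhs + 1e-9))

print(check_convexity_definition(np.exp))   # True: e^x is convex
print(check_convexity_definition(np.sin))   # False: sin x is not convex on [-5, 5]
```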

One way to see the relation between convex functions and convex sets is to consider the set obtained by “filling in” a convex function. A convex function is a bowl-like object, and we imagine pouring water into it to fill it up. The resulting filled-in set, called the epigraph of the convex function, is a convex set.

If a function $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable, we can specify convexity in terms of its gradient $\nabla_x f(x)$. A function $f(x)$ is convex if and only if for any two points $x, y$ in its domain it holds that
$f(y) \ge f(x) + \nabla_x f(x)^T (y - x)$
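As a numerical illustration of this first-order condition (a sketch; the log-sum-exp function used here is a standard convex example and not part of the original text), we can spot-check the inequality at random points:

```python
import numpy as np

def logsumexp(x):
    # f(x) = log(sum_i exp(x_i)), a standard example of a convex function
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def grad_logsumexp(x):
    # the gradient of log-sum-exp is the softmax of x
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Spot-check the first-order condition f(y) >= f(x) + grad f(x)^T (y - x).
rng = np.random.default_rng(1)
ok = True
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    ok &= logsumexp(y) >= logsumexp(x) + grad_logsumexp(x) @ (y - x) - 1e-9
print(ok)   # True, consistent with log-sum-exp being convex
```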

If we further know that a function $f(x)$ is twice differentiable, that is, the Hessian exists for all values in the domain of $f$, then the function $f(x)$ is convex if and only if $\nabla_x^2 f(x)$ is positive semidefinite.

Example

Let us show that the negative entropy function $f(x) = x\log_2 x$ is convex for $x > 0$. (A plot of this function for $x > 0$ already suggests that it is convex.)
To illustrate the previous definitions of convexity, let us check the calculations for two points $x = 2$ and $x = 4$. Note that to prove convexity of $f(x)$ we would need to check the inequality for all points in the domain, not just these two.

As per the definition of convexity, for $0 \le \theta \le 1$ we require
$f(\theta x + (1 - \theta)y) \le \theta f(x) + (1 -\theta)f(y)$.
Consider $\theta = 0.5$.
LHS:
$f(0.5 \times 2 + (1 - 0.5) \times 4) = f(3) = 3\log_2 3 \approx 4.75$
RHS:
$0.5 \times 2\log_2 2 + (1 - 0.5)\times 4\log_2 4 = 0.5 \times 2 + 0.5 \times 8 = 5$

Since $4.75 \le 5$, the definition of convexity is satisfied at these points.
Since $f(x)$ is differentiable, we can alternatively use its derivative and the condition
$f(y) \ge f(x) + \nabla_x f(x)^T (y - x)$
to check convexity. The derivative is
$\nabla_x (x \log_2 x) = 1 \cdot \log_2 x + x \cdot \frac{1}{x \log_e 2} = \log_2 x + \frac{1}{\log_e 2}$
Using the same test points $x = 2$ and $y = 4$:
The LHS is
$f(4) = 8$
The RHS is
$f(x) + \nabla_x f(x)^T (y - x) = f(2) + \nabla f(2) \cdot (4 - 2)$
$\quad \qquad = 2 + \left(1 + \frac{1}{\log_e 2}\right) \times 2 \approx 6.9$
Since $8 \ge 6.9$, the first-order condition also holds at these points.
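The numbers in this worked example can be reproduced with a few lines of code (a sketch; `np.log2` and `np.log` are the base-2 and natural logarithms):

```python
import numpy as np

f = lambda x: x * np.log2(x)                # negative entropy, x > 0
df = lambda x: np.log2(x) + 1 / np.log(2)   # its derivative

# Definition of convexity at x = 2, y = 4, theta = 0.5
lhs = f(0.5 * 2 + 0.5 * 4)                  # f(3) = 3*log2(3) ≈ 4.75
rhs = 0.5 * f(2) + 0.5 * f(4)               # 0.5*2 + 0.5*8 = 5
print(lhs, rhs, lhs <= rhs)                 # 4.754..., 5.0, True

# First-order condition with x = 2, y = 4
print(f(4), f(2) + df(2) * (4 - 2))         # 8.0 and ≈ 6.885, so 8 >= 6.885
```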

The inequality
$f(\theta x + (1 - \theta)y) \le \theta f(x) + (1 -\theta)f(y)$.
is sometimes called Jensen's inequality. In fact, a whole class of inequalities obtained by taking non-negative weighted sums of convex functions are all called Jensen's inequality.

In summary, a constrained optimization problem is called a convex optimization problem if it has the form
$\min_x f(x)$
subject to
$g_i(x)  \le 0$ for all $i=1,\ldots,m$
$h_j(x)=0$ for all $j=1,\ldots,n$

where all the functions $f(x)$ and $g_i(x)$ are convex functions, and all the constraint sets $h_j(x)=0$ are convex sets.
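As an illustration of this general form, a small convex problem can be solved numerically. The sketch below uses `scipy.optimize.minimize` with the SLSQP method on a made-up instance (minimize $x_1^2 + x_2^2$ subject to $g(x) = 1 - x_1 - x_2 \le 0$); note that SciPy expects inequality constraints written as `fun(x) >= 0`, so $g(x) \le 0$ is passed as $-g(x) \ge 0$.

```python
import numpy as np
from scipy.optimize import minimize

# Convex problem: minimize ||x||^2  subject to  g(x) = 1 - x1 - x2 <= 0.
objective = lambda x: x[0] ** 2 + x[1] ** 2
constraints = [{"type": "ineq", "fun": lambda x: x[0] + x[1] - 1.0}]  # -g(x) >= 0

result = minimize(objective, x0=np.zeros(2), method="SLSQP",
                  constraints=constraints)
print(result.x)   # ≈ [0.5, 0.5]; for a convex problem this local solution is global
```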

Consider whether the following statements are true or false:
a. The intersection of any two convex sets is convex.
b. The union of any two convex sets is convex.
c. The difference of a convex set A from another convex set B is convex.
Answers:
a. True
b. False
c. False

Consider whether the following statements are true or false:
a. The sum of any two convex functions is convex.
b. The difference of any two convex functions is convex.
c. The product of any two convex functions is convex.
d. The maximum of any two convex functions is convex.
Answers:
a. True
b. False
c. False
d. True
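Numerical counterexamples help for the false statements above (a sketch; the particular functions $x^2 - 2x^2$ and $x^2 e^{-x}$ are chosen only for illustration):

```python
import numpy as np

def violates_convexity(f, x, y, theta=0.5):
    """Return True if the convexity inequality fails at this particular triple."""
    return f(theta * x + (1 - theta) * y) > theta * f(x) + (1 - theta) * f(y)

# (b) difference of convex functions: x^2 - 2x^2 = -x^2 is concave, not convex
diff = lambda x: x**2 - 2 * x**2
print(violates_convexity(diff, -1.0, 1.0))    # True -> not convex

# (c) product of convex functions: x^2 * e^(-x) fails the midpoint test
prod = lambda x: x**2 * np.exp(-x)
print(violates_convexity(prod, 0.0, 4.0))     # True -> not convex
```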
Show that every affine function $f(x) = ax + b, x \in R$ is convex. 

Solution. 

$f((1 − \lambda)x_1 + \lambda x_2)= a((1 − \lambda)x_1 + \lambda x_2) + b$ 
$= a((1 − \lambda)x_1 + \lambda x_2) + ((1 − \lambda) + \lambda)b $
$= (1 − \lambda)(ax_1 + b) + \lambda(ax_2 + b) $
$= (1 − \lambda)f(x_1) + \lambda f(x_2) $

Show that $f(x) = x^2, x \in R$ is strictly convex.
Solution.
Pick $x_1, x_2$ so that $x_1 \ne x_2$, and pick $\lambda \in (0, 1)$.
$ f((1 − \lambda)x_1 + \lambda x_2) = ((1 − \lambda)x_1 + \lambda x_2)^2$
$ = (1 − \lambda)^2x_1^2 + \lambda^ 2x_2^2 + 2(1 − \lambda)\lambda x_1x_2$
What to do from here? Since $x_1 \ne x_2$, we have $(x_1 - x_2)^2 > 0$.
Expanding, this means that $x_1^2 + x_2^2 > 2x_1x_2$. Since $(1 - \lambda)\lambda > 0$, it follows that

$(1 - \lambda)^2x_1^2 + \lambda^2x_2^2 + 2(1 - \lambda)\lambda x_1x_2 < (1 - \lambda)^2x_1^2 + \lambda^2x_2^2 + (1 - \lambda)\lambda(x_1^2 + x_2^2)$
$= (1 - 2\lambda + \lambda^2 + \lambda - \lambda^2)x_1^2 + (\lambda^2 + \lambda - \lambda^2)x_2^2$
$= (1 - \lambda)x_1^2 + \lambda x_2^2 = (1 - \lambda)f(x_1) + \lambda f(x_2),$ which proves strict convexity.

This last example shows that proving convexity directly from the definition can be tedious and unintuitive, even for simple functions like $x^2$. The good news is that there are often simpler conditions we can check. These conditions involve the first and second derivatives of a function.
Note that the second derivative of $x^2$ is $2$, which is positive; this also proves convexity.
Example:
Let $f : R^n \to R$ be defined as $f(x) = \frac{1}{2} x^TAx + b^T x + c$, where $A$ is a symmetric matrix in $R^{n \times n}$, $b \in R^n$ and $c \in R$. The Hessian matrix of $f$ at any $x \in R^n$ is $A$, so
$f$ is convex if and only if $A$ is positive semidefinite.
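A numerical way to apply this criterion (a sketch; the helper name `quadratic_is_convex` is illustrative) is to check the eigenvalues of the symmetric matrix $A$:

```python
import numpy as np

def quadratic_is_convex(A, tol=1e-10):
    """f(x) = 0.5 x^T A x + b^T x + c is convex iff the symmetric A is
    positive semidefinite, i.e. all its eigenvalues are >= 0."""
    eigvals = np.linalg.eigvalsh(A)      # eigvalsh is for symmetric matrices
    return bool(np.all(eigvals >= -tol))

print(quadratic_is_convex(np.array([[2.0, 0.0], [0.0, 3.0]])))   # True
print(quadratic_is_convex(np.array([[1.0, 2.0], [2.0, 1.0]])))   # False (eigenvalues 3 and -1)
```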


Let $f(x) = x\log x$ be defined on $C = \{x \in R : x > 0\}$.
$f'(x) = 1 + \log x$ and $f''(x) = \frac{1}{x} > 0$ for all $x \in C$. So $f(x)$ is convex.

Let $f(x) = \frac{1}{2} \|Ax - b\|^2$, or equivalently $f(x) = \frac{1}{2}(Ax - b)^T(Ax - b)$.
The Hessian is $\nabla^2 f(x) = A^TA$, which is positive semidefinite.
Therefore $f(x)$ is convex.
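For any particular matrix $A$, the claim that $A^TA$ is positive semidefinite can be checked numerically (a sketch with a random matrix; the underlying reason is $x^TA^TAx = \|Ax\|^2 \ge 0$):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))       # any real matrix, not necessarily square
H = A.T @ A                       # Hessian of 0.5 * ||Ax - b||^2
print(np.linalg.eigvalsh(H))      # all eigenvalues are >= 0 (up to rounding)
```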

$f(x) = \log(x)$ defined on $C = \{x \in R : x > 0\}$.
$f'(x) = \frac{1}{x}$ and
$f''(x) = \frac{-1}{x^2} < 0$ for all $x \in C$. So, $f$ is concave.

A critical point of a function $f : R^n \to R$ is any point $x$ at which $∇f(x) = 0.$ Compute all of the critical points of the following functions. If no critical point exists, explain why. 
(a) $f(x) = x_1^2 - 4x_1 + 2x_2^2 + 7$
(b) $f(x) = e^{-\|x\|^2}$
(c) $f(x) = x_1^2 - 2x_1x_2 + \frac{1}{3} x_2^3 - 8x_2$
(d) $f(x) = (2x_1 - x_2)^2 + (x_2 - x_3)^2 + (x_3 - 1)^2$

Solution 
(a) $\nabla f(x) = [2x_1 - 4, 4x_2]^T = 0$, so $x = (2, 0)$.
(b) $\nabla f(x) = -2e^{-\|x\|^2} x = 0$, so $x = 0$.
(c) $\nabla f(x) = [2x_1 - 2x_2, -2x_1 + x_2^2 - 8]^T = 0$, so $x = (-2, -2)$ or $x = (4, 4)$.
(d) $\nabla f(x) = [4(2x_1 - x_2), -2(2x_1 - x_2) + 2(x_2 - x_3), -2(x_2 - x_3) + 2(x_3 - 1)]^T = 0$, so $x = (0.5, 1, 1)$.
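Critical points such as those in part (c) can also be recovered symbolically (a sketch using SymPy; the variable names are illustrative):

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
f = x1**2 - 2*x1*x2 + sp.Rational(1, 3)*x2**3 - 8*x2     # part (c)

grad = [sp.diff(f, v) for v in (x1, x2)]                  # gradient components
critical_points = sp.solve(grad, (x1, x2), dict=True)
print(critical_points)   # two critical points: {x1: -2, x2: -2} and {x1: 4, x2: 4}
```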

Find the local minimizers and maximizers for the following functions if they exist: 
(a) $f(x) = x^2 + \cos x$
(b) $f(x_1, x_2) = x_1^2 - 4x_1 + 2x_2^2 + 7$
(c) $f(x_1, x_2) = e^{-(x_1^2 + x_2^2)}$
(d) $f(x_1, x_2, x_3) = (2x_1 - x_2)^2 + (x_2 - x_3)^2 + (x_3 - 1)^2$
Solution 
(a) $x = 0$ is a local minimizer (in fact global, since $f''(x) = 2 - \cos x > 0$, so $f$ is convex);
(b) $(x_1, x_2)^T = (2, 0)^T$ is a local minimizer (global, since $f$ is convex);
(c) $(x_1, x_2) = (0, 0)$ is a local maximizer (in fact global, since $f(x_1, x_2) \le 1 = f(0, 0)$ everywhere);
(d) $(x_1, x_2, x_3) = (1/2, 1, 1)$ is a local minimizer (global, since $f$ is convex).


Is the function $f(x, y) = 2x^2 + y^2 + 6xy - x + 3y - 7$ convex, concave, or neither? Justify your answer.

$\frac{\partial f}{\partial x} = 4x + 6y - 1$
$\frac{\partial f}{\partial y} = 2y + 6x + 3$
$\frac{\partial^2 f}{\partial x^2} = 4$
$\frac{\partial^2 f}{\partial y^2} = 2$
$\frac{\partial^2 f}{\partial x \partial y} = 6$
$\frac{\partial^2 f}{\partial y \partial x} = 6$

So the Hessian is
$\begin{bmatrix}
4 & 6 \\
6 & 2
\end{bmatrix}$

How do you determine if it is concave or convex?

For a single-variable function, look at the second derivative: if it is non-negative everywhere, the function is convex; if it is non-positive everywhere, the function is concave.

For a multivariable function, if the Hessian is positive semidefinite everywhere, the function is convex; if it is negative semidefinite everywhere, the function is concave; otherwise it is neither.
Note that Hessians are symmetric matrices, so their eigenvalues are real.

If the Hessian at a given point has all positive eigenvalues, it is said to be a positive-definite matrix. This is the multivariable analogue of a positive second derivative ("convex").
If all of the eigenvalues are negative, it is said to be a negative-definite matrix. This is the analogue of "concave".
If some eigenvalue is 0, then you will need more information (possibly a graph or table of values) to see what is going on. And if the eigenvalues are mixed (one positive, one negative), you have a saddle point: the function curves upward along one direction and downward along another.

A matrix is positive definite if it’s symmetric and all its pivots are positive. 

If all upper-left $k \times k$ determinants (the leading principal minors) of a symmetric matrix are positive, the matrix is positive definite.

A matrix is positive definite if $x^TAx \gt 0$ for all vectors $x \ne 0$.     

We can use any one of these tests. The given Hessian is not positive definite: its determinant is $4 \times 2 - 6 \times 6 = -28 < 0$, so one eigenvalue is positive and one is negative. Hence $f$ is neither convex nor concave.
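The eigenvalue test for this Hessian can also be carried out numerically (a minimal sketch):

```python
import numpy as np

H = np.array([[4.0, 6.0],
              [6.0, 2.0]])
print(np.linalg.eigvalsh(H))   # ≈ [-3.08, 9.08]: one negative, one positive eigenvalue
# The Hessian is indefinite, so f is neither convex nor concave, as concluded above.
```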

Check for convexity: $f(x_1, x_2) = x_1x_2$ on $R^2$.

 Solution. The Hessian of $f$ is 
$∇^2 f(x) = \begin{bmatrix}
0 &1 \\
1& 0
\end{bmatrix}$, which is neither positive semidefinite nor negative semidefinite (its eigenvalues are $+1$ and $-1$). Therefore, $f$ is neither convex nor concave.

Check for convexity: $f(x_1, x_2) = \frac{1}{x_1x_2}$ on the domain $x_1 > 0,\ x_2 > 0$.

 Solution. The Hessian of $f$ is 
$∇^2 f(x) =\frac{1}{x_1x_2} \begin{bmatrix}
\frac{2}{x_1^2} &\frac{1}{x_1x_2} \\
\frac{1}{x_1x_2}& \frac{2}{x_2^2}
\end{bmatrix}$, which is positive definite for $x_1 > 0,\ x_2 > 0$. Therefore, $f$ is convex on this domain.

Check for convexity: $f(x_1, x_2) = \frac{x_1}{x_2}$ on the domain $x_2 > 0$.

 Solution. The Hessian of $f$ is 
$∇^2 f(x) = \begin{bmatrix}
0 &\frac{-1}{x_2^2} \\
\frac{-1}{x_2^2}& \frac{2x_1}{x_2^3}
\end{bmatrix}$, which is neither positive semidefinite nor negative semidefinite (its determinant is $-1/x_2^4 < 0$). Therefore, $f$ is neither convex nor concave.

Check for convexity: $f(x_1, x_2) = \frac{x_1^2}{x_2}$ on the domain $x_2 > 0$.

 Solution. The Hessian of $f$ is 
$∇^2 f(x) = \begin{bmatrix}
\frac{2}{x_2} &\frac{-2x_1}{x_2^2} \\
\frac{-2x_1}{x_2^2}& \frac{2x_1^2}{x_2^3}
\end{bmatrix}$, which is positive semidefinite for $x_2 > 0$ (its determinant is $0$ and its trace is positive). Therefore, $f$ is convex on this domain.
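Hessians like the ones in these examples can be computed and checked symbolically (a sketch using SymPy for this last example, $f = x_1^2/x_2$ with $x_2 > 0$):

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", positive=True)   # restrict to the domain x1, x2 > 0
f = x1**2 / x2

H = sp.hessian(f, (x1, x2))
print(H)                        # Matrix([[2/x2, -2*x1/x2**2], [-2*x1/x2**2, 2*x1**2/x2**3]])
print(sp.simplify(H.det()))     # 0 -> the determinant vanishes
print(sp.simplify(H.trace()))   # 2/x2 + 2*x1**2/x2**3, positive for x2 > 0
# With det = 0 and trace > 0, the eigenvalues are 0 and the trace,
# so the Hessian is positive semidefinite and f is convex on this domain.
```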
