Skip to main content

5.5 Convex Optimization



Convex optimization problems are a particularly useful class of optimization problems, where we can guarantee global optimality. When $f(.)$ is a convex function, and when the constraints involving $g(.)$ and $h(.)$ are convex sets, this is called a convex optimization problem. In this setting, we have strong duality: The optimal solution of the dual problem is the same as the optimal solution of the primal problem.

A set $C$ is a convex set if for any $x, y \in C$ and for any scalar $\theta$ with $0 \le \theta \le 1$, we have 
$\theta x + (1 - \theta)y \in C $.

Convex sets are sets such that a straight line connecting any two elements of the set lie inside the set. Figures 7.5 and 7.6 illustrate convex and non convex sets, respectively.

Convex functions are functions such that a straight line between any two points of the function lie above the function. 

Non convex function

Convex Function

Definition
Let function $f : \mathbb{R}^D \to  \mathbb{R}$ be a function whose domain is a convex set. The function $f$ is a convex function if for all $x,y$ in the domain of $f$, and for any scalar $\theta$ with $0 \le \theta \le 1$, we have 
$f(\theta x + (1 - \theta)y) \le \theta f(x) + (1 -\theta)f(y)$.

Remark. A concave function is the negative of a convex function

relation between convex functions and convex sets is to consider the set obtained by “filling in” a convex function. A convex function is a bowl-like object, and we imagine pouring water into it to fill it up. This resulting filled-in set, called the epigraph of the convex function and  is a convex set.

If a function $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable, we can specify convexity in terms of its gradient $\bigtriangledown_x f(x)$.A function $f(x)$ is convex if and only if for any two points $x, y$ it holds that
$f(y) \ge f(x)+ \bigtriangledown_x f(x)^T (y- x)$

If we further know that a function $f(x)$ is twice differentiable, that is, the Hessian  exists for all values in the domain of $x$, then the function $f(x)$ is convex if and only if $\bigtriangledown_x^2 f(x)$ is positive semi definite

Example

Lets consider the  negative entropy function $f(x) = x log_2x$ is convex for $x > 0$.The following figure shows this function.From the figure we can see that the function is convex.
To illustrate the previous definitions of convexity, let us check the calculations for two points $x = 2$ and $x = 4$. Note that to prove convexity of $f(x)$ we would need to check for all points $x \in \mathbb{R}$.

As per definition of convexity 
$0 \le \theta \le 1$, we have 
$f(\theta x + (1 - \theta)y) \le \theta f(x) + (1 -\theta)f(y)$.
Consider $\theta=0.5$
LHS
$f(0.5 \times 2 + (1 - 0.5) \times 4)=f(3)=2.log_23\approx 4.75$
RHS
$0.5 \times 2log_22 + (1 -0.5)\times 4log_24=5$.

So the definition for convexity is satisfied.
Since $f(x)$ is differentiable,alternatively we can  use the derivative of $f(x)$ and use the equation
$f(y) \ge f(x)+ \bigtriangledown_x f(x)^T (y- x)$
to obtain
$ \bigtriangledown_x (x log_2 x)=1.log_2x+x.\frac{1}{x log_e2}=log_2x+\frac{1}{log_e2}$
Using the same test point $x=2$ and $x=4$
The LHS is
$f(4)=8$
The RHS is
$f(x)+\bigtriangledown_x^T (y-x)=f(2)+\bigtriangledown f(2).(4-2)$
$\quad \qquad=2+(1+\frac{1}{log_e2}).2\approx 6.9$

The equality
$f(\theta x + (1 - \theta)y) \le \theta f(x) + (1 -\theta)f(y)$.
is sometimes called Jensen's inequality.In fact a whole class of inequalities for taking non negative weighted sums of convex functions are called Jensen's in equality.

In summary a constrained optimization problem is called a convex optimization problem if
$min_x f(x)$
subject to 
$g_i(x)  \le 0$ for all $i=1,\ldots,m$
$h_j(x)=0$ for all $j=1,\ldots,n$

where all functions $f(x)$ and $g_i(x)$ are convex functions, and all $h_j(x)=0$ are convex sets.

Consider whether the following statements are true or false:
a. The intersection of any two convex sets is convex.
b. The union of any two convex sets is convex.
c. The difference of a convex set A from another convex set B is convex.
a. true
b. false
c. false

Consider whether the following statements are true or false:
a. The sum of any two convex functions is convex.
b. The difference of any two convex functions is convex.
c. The product of any two convex functions is convex.
d. The maximum of any two convex functions is convex.
a. true
b. false
c. false
d. true
Show that every affine function $f(x) = ax + b, x \in R$ is convex. 

Solution. 

$f((1 − \lambda)x_1 + \lambda x_2)= a((1 − \lambda)x_1 + \lambda x_2) + b$ 
$= a((1 − \lambda)x_1 + \lambda x_2) + ((1 − \lambda) + \lambda)b $
$= (1 − \lambda)(ax_1 + b) + \lambda(ax_2 + b) $
$= (1 − \lambda)f(x_1) + \lambda f(x_2) $

Show that $f(x) = x^2 , x \in R$ is strictly convex
Solution.
 Pick $x_1, x_2$ so that $x_1 \ne  x_2$, and pick $\lambda \in (0, 1)$
$ f((1 − \lambda)x_1 + \lambda x_2) = ((1 − \lambda)x_1 + \lambda x_2)^2$
$ = (1 − \lambda)^2x_1^2 + \lambda^ 2x_2^2 + 2(1 − \lambda)\lambda x_1x_2$
 What to do from here? Since $x_1 \ne x_2, (x_1−x_2)^2 > 0$
 Expanding, this means that $x_1^2+x_2^2 \ge 2x_1x_2$. This means that 

$(1 − λ)^2x_1^2 + λ^2x_2^2 + 2(1 − λ)λx_1x_2 < (1 − λ)^2x_1^2 + λ^2x_2^2 + (1 − λ)(λ)(x_1^2 + x_2^2 )$
$ = (1 − 2λ + λ^2 + λ - λ 2 )x_1^2 + (λ − λ^ 2 + λ^ 2 )x_2^2$
$ = (1 − λ)x _1^2 + λx_2^2 = (1 − λ)f(x_1) + λf(x_2)$ which proves strict convexity. 

This last example shows that proving convexity can be difficult and non intuitive, even for simple functions like $x^2$ . The good news is that oftentimes there are simpler conditions that we can check. These conditions involve the first and second derivatives of a function.
Note that the second derivative is 2 and is positive which also proves the convexity.
Example:
Let $f : R^n \to R$ be defined as $f(x) = \frac{1}{2} x^TAx + b^T x + c$ where $A$ is a symmetric matrix in $R^n, b \in R^n$ and $c \in R$. The Hessian matrix of $f$ is $A$ at any$x\in R^n$ 
$f$ is convex iff $A$ is positive semi-definite. 


Let $f(x) = x log x$ be defined on $C = \{x \in R : x > 0\}$. 
$f ' (x) = 1 + log x$ and  $f ''(x) = \frac{1}{ x} > 0 ∀ x \in C$ So, f(x) is convex. 

Let $f(x) = \frac{1}{2} ||Ax − b||^ 2$ Or, $f(x) =\frac{1}{2} (Ax − b)^ T (Ax − b)$
$f''(x) = A^ TA$ which is positive semi-definite. 
Therefore $f(x)$ is convex.

$f(x) = log(x)$ defined on $C = \{x \in R : x > 0\}$. 
$f '(x) = \frac{1}{x}$ and 
$f ''(x) = \frac{− 1}{x^2} < 0  \forall x \in C$. So, $f$ is concave. 

A critical point of a function $f : R^n \to R$ is any point $x$ at which $∇f(x) = 0.$ Compute all of the critical points of the following functions. If no critical point exists, explain why. 
$(a) f(x) = x_1^2 − 4x_1 + 2x_2^2 + 7$
$(b) f(x) = e^{ −||x||^2}$
$ (c) f(x) = x_1^2 − 2x_1x_2 + \frac{1}{3} x_2^3 − 8x_2 $
$(d) f(x) = (2x_1 − x_2)^ 2 + (x_2 − x_3) ^2 + (x3 − 1)^2$ 

Solution 
$(a) ∇f(x) = [2x_1 − 4, 4x_2] ^T = 0, then \quad x = (2, 0).$ 
$(b) ∇f(x) = −2e^{ −||x||^2} x = 0, then \quad x = 0.$
$ (c) ∇f(x) = [2x_1 − 2x_2, −2x_1 + x_2^2 − 8]^T = 0, then \quad x = (−2, −2) or \quad x = (4, 4)$
$(d) ∇f(x) = [4(2x_1 −x_2), −2(2x_1 −x_2) + 2(x_2 −x_3), −2(x_2 −x_3) + 2(x_3 −1)]^T = 0, then \quad x = (0.5, 1, 1)$

Find the local minimizers and maximizers for the following functions if they exist: 
$(a) f(x) = x_2 + cos x $
$(b) f(x_1, x_2) = x_1^2 − 4x_1 + 2x_2^2 + 7$
$(c) f(x_1, x_2) = e ^{−(x_1^2+x_2^2)}$
$(d) f(x_1, x_2, x_3) = (2x_1 − x_2)^2 + (x_2 − x_3)^2 + (x_3 − 1)^2$
Solution 
$(a) x = 0$ is local (global, since f is convex) minimizer; 
$(b) (x_1, x_2)^T = (2, 0)^T$ is local (global, since f is convex) minizer; 
$(c) (x_1, x_2) = (0, 0)$ is local (global, since f is convex) maximizer; 
$(d) (x_1, x_2, x_3) = ( 1/2 , 1, 1)$ is local (global, since f is convex) minimizer.


Is the function $f(x, y) = 2x^2 + y^2 + 6xy - x + 3y - 7$ convex, concave, or neither? Justify your answer.

$\frac{\mathrm{d} f}{\mathrm{d} x}=4x+6y-1$
$\frac{\mathrm{d} f}{\mathrm{d} y}=2y+6y+3$
$\frac{\mathrm{d}^2 f}{\mathrm{d} x^2}=4$
$\frac{\mathrm{d}^2 f}{\mathrm{d} y^2}=2$
$\frac{\mathrm{d}^2 f}{\mathrm{d} x \mathrm{d} y}=6$
$\frac{\mathrm{d}^2 f}{\mathrm{d} y \mathrm{d} x}=6$

So the Hessian is
$\begin{bmatrix}
4 & 6 \\
6 & 2
\end{bmatrix}$

How do you determine if it is concave or convex?

To find out if it is concave or convex, look at the second derivative. If the result is positive, it is convex. If it is negative, then it is concave.( for single variable)

If the Hessian is positive semi definite it is convex else concave.
It is noted that Hessian are symmetric matrices whose eigen values are real.

If the Hessian at a given point has all positive eigenvalues, it is said to be a positive-definite matrix. This is the multivariable equivalent of "convex".
If all of the eigenvalues are negative, it is said to be a negative-definite matrix. This is like "concave".
If either eigenvalue is 0, then you will need more information (possibly a graph or table) to see what is going on. And, if the eigenvalues are mixed (one positive, one negative), you have a saddle point:(convex at one point and concave at another point).

A matrix is positive definite if it’s symmetric and all its pivots are positive. 

if all upper left $k \times k$ determinants of a symmetric matrix are positive, the matrix is positive definite.

A matrix is positive definite if $x^TAx \gt 0$ for all vectors $x \ne 0$.     

We can use any one of the test.It is noted that the given matrix is not positive definite.Hence not convex.One eigen value is positive and one is negative.So it is neither convex nor concave.

Check for convexity $f(x_1, x_2) = x_1x_2$ on $R^2$

 Solution. The Hessian of $f$ is 
$∇^2 f(x) = \begin{bmatrix}
0 &1 \\
1& 0
\end{bmatrix}$ , which is neither positive semi definite nor negative semi definite. Therefore, $f$ is neither convex nor concave.

Check for convexity $f(x_1, x_2) = \frac{1}{x_1x_2}$ on $R^2$

 Solution. The Hessian of $f$ is 
$∇^2 f(x) =\frac{1}{x_1x_2} \begin{bmatrix}
\frac{2}{x_1^2} &\frac{1}{x_1x_2} \\
\frac{1}{x_1x_2}& \frac{2}{x_2^2}
\end{bmatrix}$ , which is  positive  definite . Therefore, $f$ is  convex.

Check for convexity $f(x_1, x_2) = \frac{x_1}{x_2}$ on $R^2$

 Solution. The Hessian of $f$ is 
$∇^2 f(x) = \begin{bmatrix}
0 &\frac{-1}{x_2^2} \\
\frac{-1}{x_2^2}& \frac{2x_1}{x_2^3}
\end{bmatrix}$ , which is neither positive semi definite nor negative semi definite. Therefore, $f$ is neither convex nor concave.

Check for convexity $f(x_1, x_2) = \frac{1}{x_1x_2}$ on $R^2$

 Solution. The Hessian of $f$ is 
$∇^2 f(x) = \begin{bmatrix}
\frac{2}{x_2} &\frac{-2x_1}{x_2^2} \\
\frac{-2x_1}{x_2^2}& \frac{2x_1^2}{x_2^3}
\end{bmatrix}$ , which is  positive  definite . Therefore, $f$ is  convex.

Comments

Popular posts from this blog

Mathematics for Machine Learning- CST 284 - KTU Minor Notes - Dr Binu V P

  Introduction About Me Syllabus Course Outcomes and Model Question Paper University Question Papers and Evaluation Scheme -Mathematics for Machine learning CST 284 KTU Overview of Machine Learning What is Machine Learning (video) Learn the Seven Steps in Machine Learning (video) Linear Algebra in Machine Learning Module I- Linear Algebra 1.Geometry of Linear Equations (video-Gilbert Strang) 2.Elimination with Matrices (video-Gilbert Strang) 3.Solving System of equations using Gauss Elimination Method 4.Row Echelon form and Reduced Row Echelon Form -Python Code 5.Solving system of equations Python code 6. Practice problems Gauss Elimination ( contact) 7.Finding Inverse using Gauss Jordan Elimination  (video) 8.Finding Inverse using Gauss Jordan Elimination-Python code Vectors in Machine Learning- Basics 9.Vector spaces and sub spaces 10.Linear Independence 11.Linear Independence, Basis and Dimension (video) 12.Generating set basis and span 13.Rank of a Matrix 14.Linear Mapping...

Vectors in Machine Learning

As data scientists we work with data in various formats such as text images and numerical values We often use vectors to represent data in a structured and efficient manner especially in machine learning applications In this blog post we will explore what vectors are in terms of machine learning their significance and how they are used What is a Vector? In mathematics, a vector is a mathematical object that has both magnitude and direction. In machine learning, a vector is a mathematical representation of a set of numerical values. Vectors are usually represented as arrays or lists of numbers, and each number in the list represents a specific feature or attribute of the data. For example, suppose we have a dataset of houses, and we want to predict their prices based on their features such as the number of bedrooms, the size of the house, and the location. We can represent each house as a vector, where each element of the vector represents a specific feature of the house, such as the nu...

2.14 Singular Value Decomposition

The Singular Value Decomposition ( SVD) of a matrix is a central matrix decomposition method in linear algebra.It can be applied to all matrices,not only to square matrices and it always exists.It has been referred to as the 'fundamental theorem of linear algebra'( strang 1993). SVD Theorem: Let $A^{m \times n}$ be a rectangular matrix of rank $r \in [0,min(m,n)]$. The SVD of A is a decomposition of the form. $A= U \Sigma V^T $ with an orthogonal matrix $U \in \mathbb{R}^{m \times m}$ with column vectors $u_i, i=1,\ldots,m$ and an orthogonal matrix $V \in \mathbb{R}^{n \times n}$ with column vectors $v_j, j=1,\ldots,n$.Moreover, $\Sigma$ is an $m \times n$ matrix with $\sum_{ii} = \sigma \ge 0$ and $\sigma_{ij}=0, i \ne j$. The diagonal entries $\Sigma_i=1,\ldots,r$ of $\sigma$ are called singular values . $u_i$ are called left singular vectors , and $v_j$ are called right singular vectors .By convention singular values are ordered ie; $\sigma_1 \ge \sigma_2 \ldots \sigma_r \...