3.9 Higher Order Derivatives - Hessian

Sometimes we are interested in derivatives of higher order, e.g., when we want to use Newton's method for optimization, which requires second-order derivatives (Nocedal and Wright, 2006).

Consider a function $f : \mathbb{R}^2 \to \mathbb{R}$ of two variables $x,y$. We use the following notation for higher-order partial derivatives (and for gradients):

$\frac{\partial ^2f}{\partial x^2}$ is the second partial derivative of $f$ with respect to $x$.

$\frac{\partial ^nf}{\partial x^n}$ is the $n$th partial derivative of $f$ with respect to $x$.

$\frac{\partial ^2f}{\partial y \partial x}=\frac{\partial }{\partial y}(\frac{\partial f}{\partial x})$

is the partial derivative obtained by first differentiating with respect to $x$ and then with respect to $y$.

$\frac{\partial ^2f}{\partial x \partial y}$ is the partial derivative obtained by first differentiating with respect to $y$ and then with respect to $x$.

The Hessian is the collection of all second-order partial derivatives.

If $f(x,y)$ is a twice continuously differentiable function, then

$\frac{\partial ^2f}{\partial x \partial y}=\frac{\partial ^2f}{\partial y \partial x}$

i.e., the order of differentiation does not matter, and the corresponding Hessian matrix

$H=\begin{bmatrix}
\frac{\partial ^2f}{\partial x^2} &\frac{\partial ^2f}{\partial x \partial y} \\
\frac{\partial ^2f}{\partial x \partial y}& \frac{\partial ^2f}{\partial y^2}
\end{bmatrix}$

is symmetric. The Hessian is denoted as $\triangledown ^2_{x,y}f(x,y)$. In general, for $x \in \mathbb{R}^n$ and $f:\mathbb{R}^n \to \mathbb{R}$, the Hessian is an $n \times n$ matrix. The Hessian measures the curvature of the function locally around $(x,y)$.
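As a quick check of this symmetry, take the illustrative function $g(x,y)=x^3y^2$ (an example chosen here, not one used elsewhere in these notes). Differentiating in either order gives the same mixed partial derivative:

$$\begin{aligned}
\frac{\partial g}{\partial x} &= 3x^2y^2, & \frac{\partial^2 g}{\partial y \partial x} &= 6x^2y,\\
\frac{\partial g}{\partial y} &= 2x^3y, & \frac{\partial^2 g}{\partial x \partial y} &= 6x^2y.
\end{aligned}$$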

General Hessian Matrix

$H=\begin{bmatrix}
\frac{\partial ^2f}{\partial x_1^2} &\frac{\partial ^2f}{\partial x_1 \partial x_2}& \cdots & \frac{\partial ^2f}{\partial x_1 \partial x_n} \\
\frac{\partial ^2f}{\partial x_2 \partial x_1} &\frac{\partial ^2f}{\partial x_2^2}& \cdots & \frac{\partial ^2f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial ^2f}{\partial x_n \partial x_1} &\frac{\partial ^2f}{\partial x_n \partial x_2}& \cdots & \frac{\partial ^2f}{\partial x_n^2}
\end{bmatrix}$
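When analytic derivatives are inconvenient, the entries of this matrix can be approximated numerically. The following is a minimal sketch using central finite differences in plain Python; the helper name `numerical_hessian`, the step size `h`, and the three-variable test function `g` are illustrative assumptions rather than anything defined above.

```python
def numerical_hessian(f, x, h=1e-4):
    """Approximate the n x n Hessian of f: R^n -> R at the point x (list of floats)."""
    n = len(x)

    def f_shifted(i, j, si, sj):
        # Evaluate f at x + si*h*e_i + sj*h*e_j (e_i is the i-th unit vector).
        p = list(x)
        p[i] += si * h
        p[j] += sj * h
        return f(p)

    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Central-difference stencil for d^2 f / (dx_i dx_j).
            # For i == j it reduces to a second difference with step 2h.
            H[i][j] = (f_shifted(i, j, 1, 1) - f_shifted(i, j, 1, -1)
                       - f_shifted(i, j, -1, 1) + f_shifted(i, j, -1, -1)) / (4 * h * h)
    return H


def g(p):
    # Hypothetical test function g(x1, x2, x3) = x1*x2 + x2*x3^2.
    return p[0] * p[1] + p[1] * p[2] ** 2


H3 = numerical_hessian(g, [1.0, 2.0, 3.0])
for row in H3:
    print([round(v, 4) for v in row])
```

The stencil is symmetric in `i` and `j`, so the approximation is symmetric up to floating-point rounding. It costs four function evaluations per entry, which is acceptable for small $n$ but grows as $n^2$.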

Hessian of a vector field: if $f:\mathbb{R}^n \to \mathbb{R}^m$ is a vector field, the Hessian is an $(m \times n \times n)$ tensor.
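Written entrywise, with component functions $f_1,\dots,f_m$ and index names $k,i,j$ chosen here for illustration, this tensor stacks one $n \times n$ Hessian per output component:

$$H_{kij}=\frac{\partial ^2 f_k}{\partial x_i \partial x_j},\qquad k=1,\dots,m,\quad i,j=1,\dots,n.$$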

Hessians are used in machine learning to determine local minima and local maxima when solving optimization problems.

Conditions for minima, maxima and saddle points

Let $f$ be a twice differentiable function with Hessian $\triangledown ^2_{x,y}f(x,y)$, and let $(x_0,y_0)$ be one of its stationary points. Then:

If $\triangledown ^2_{x,y}f(x_0,y_0)>0$, i.e., the Hessian is positive definite, $(x_0,y_0)$ is a point of local minimum.

If $\triangledown ^2_{x,y}f(x_0,y_0)<0$, i.e., the Hessian is negative definite, $(x_0,y_0)$ is a point of local maximum.

If $\triangledown ^2_{x,y}f(x_0,y_0)$ has eigenvalues of both signs, i.e., it is indefinite, $(x_0,y_0)$ is a saddle point.
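For a $2 \times 2$ symmetric Hessian, definiteness can be read off from the two eigenvalues, which have a closed form. The sketch below assumes the Hessian is passed as its three distinct entries; the function names and the tolerance `tol` are hypothetical choices for illustration.

```python
import math


def eigenvalues_sym_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean - radius, mean + radius


def classify_by_definiteness(a, b, d, tol=1e-12):
    """Classify a stationary point from its Hessian [[a, b], [b, d]]."""
    lo, hi = eigenvalues_sym_2x2(a, b, d)
    if lo > tol:
        return "local minimum"   # all eigenvalues positive: positive definite
    if hi < -tol:
        return "local maximum"   # all eigenvalues negative: negative definite
    if lo < -tol and hi > tol:
        return "saddle point"    # eigenvalues of both signs: indefinite
    return "inconclusive"        # some eigenvalue is (numerically) zero


print(classify_by_definiteness(2, 0, 2))
print(classify_by_definiteness(128, 0, -36))
print(classify_by_definiteness(-64, 0, -36))
```

The "inconclusive" branch covers semidefinite Hessians, where second-order information alone does not settle the nature of the point.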

For a function of two variables, the determinant of the Hessian matrix, $D=\frac{\partial ^2f}{\partial x^2}\frac{\partial ^2f}{\partial y^2}-\left(\frac{\partial ^2f}{\partial x \partial y}\right)^2$, can be used to determine whether a critical point is a local minimum, a local maximum, or a saddle point. Here's how it works:

Local Minimum or Maximum Test:

If $D$ is positive at a critical point, the function curves the same way (upward or downward) in every direction through that point, so the point is either a local minimum or a local maximum.

If $D$ is negative, the function curves upward in some directions and downward in others, so the critical point is a saddle point.

Determining Minima or Maxima:

To decide whether a critical point with $D>0$ is a local minimum or a local maximum, examine the signs of the second-order partial derivatives $\frac{\partial^2{f}}{\partial{x^2}}$ and $\frac{\partial^2{f}}{\partial{y^2}}$. When $D>0$ these two always have the same sign, so checking one of them is enough.

If $\frac{\partial^2{f}}{\partial{x^2}}$ and $\frac{\partial^2{f}}{\partial{y^2}}$ are positive at the critical point (so that, together with $D>0$, the leading principal minors of the Hessian matrix are positive), then it's a local minimum.

If $\frac{\partial^2{f}}{\partial{x^2}}$ and $\frac{\partial^2{f}}{\partial{y^2}}$ are negative at the critical point, then it's a local maximum.

If one of them is positive while the other is negative, then $D<0$ and the point is a saddle point.

Please note that this test applies to functions of two variables, i.e., $f(x,y)$; in higher dimensions the classification relies on the definiteness of the full Hessian. Additionally, when $D=0$, the test is inconclusive, and further analysis is needed.
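The two-variable test above translates directly into a few lines of Python. This is a minimal sketch; the function name `second_derivative_test` and its argument names are illustrative, and the caller is assumed to supply the second partial derivatives already evaluated at a critical point.

```python
def second_derivative_test(fxx, fyy, fxy):
    """Classify a critical point of f(x, y) from its second partial derivatives."""
    D = fxx * fyy - fxy ** 2  # determinant of the 2x2 Hessian
    if D > 0:
        # fxx and fyy share a sign here, so checking fxx alone is enough.
        return "local minimum" if fxx > 0 else "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"  # D == 0: the test gives no information


print(second_derivative_test(2, 2, 0))
print(second_derivative_test(128, -36, 0))
print(second_derivative_test(-64, -36, 0))
print(second_derivative_test(0, 0, 0))
```

With floating-point inputs, an exact comparison `D == 0` is fragile; comparing against a small tolerance would be a reasonable variation.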

Example:

Let $f(x,y)= x^2+y^2$. Its second-order partial derivatives exist and are continuous throughout the domain. Find the Hessian matrix.

$\frac{\partial ^2f}{\partial x^2} =2$

$\frac{\partial ^2f}{\partial y^2} =2$

$\frac{\partial ^2f}{\partial x \partial y} =0$

$\frac{\partial ^2f}{\partial y \partial x} =0$

$H=\begin{bmatrix}
\frac{\partial ^2f}{\partial x^2} &\frac{\partial ^2f}{\partial x \partial y} \\
\frac{\partial ^2f}{\partial x \partial y}& \frac{\partial ^2f}{\partial y^2}
\end{bmatrix}$

$H=\begin{bmatrix}
2&0 \\
0& 2
\end{bmatrix}$
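As a quick illustration, the determinant test described earlier can be applied to this Hessian. The only stationary point is $(0,0)$, since $\frac{\partial f}{\partial x}=2x=0$ and $\frac{\partial f}{\partial y}=2y=0$ there, and

$$D = 2\cdot 2 - 0^2 = 4 > 0,\qquad \frac{\partial ^2f}{\partial x^2}=2>0,$$

so $H$ is positive definite and $(0,0)$ is a local minimum.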

Suppose a function is defined by $f(x,y)=x^4-32x^2+y^4-18y^2$. Find the maximum and minimum values of the function, if they exist. Justify your answer.

$\frac{\partial ^2f}{\partial x^2} =12x^2-64$

$\frac{\partial ^2f}{\partial y^2} =12y^2-36$

$\frac{\partial ^2f}{\partial x \partial y} =0$

$\frac{\partial ^2f}{\partial y \partial x} =0$

$H=\triangledown ^2_{x,y}f(x,y)=\begin{bmatrix}
12x^2-64&0 \\
0& 12y^2-36
\end{bmatrix}$

We find the stationary points of $f(x,y)$ by setting its partial derivatives $\frac{\partial{f}}{\partial{x}}=4x^3-64x$ and $\frac{\partial{f}}{\partial{y}}=4y^3-36y$ to 0.

$$4x(x^2-16)=0 \implies x=\pm4,0$$

$$4y(y^2-9)=0 \implies y=\pm3,0$$

The possible pairings give the critical points $(\pm4,\pm3), (\pm4,0), (0,\pm3), (0,0)$.

Because the entries of the Hessian are even functions of $x$ and $y$, the sign of each coordinate does not affect the result, which saves a lot of effort: we only need to check the pairs $(4,3),(4,0),(0,3),(0,0)$.

$\triangledown ^2_{x,y}f(4,3)=\begin{bmatrix}
128&0 \\
0& 72
\end{bmatrix}$

This matrix is positive definite, so $(4,3)$ is a local minimum of the function.

$\triangledown ^2_{x,y}f(4,0)=\begin{bmatrix}
128&0 \\
0& -36
\end{bmatrix}$

It is indefinite, so $(4,0)$ is a saddle point and is ruled out.

$\triangledown ^2_{x,y}f(0,3)=\begin{bmatrix}
-64&0 \\
0&72
\end{bmatrix}$

It is indefinite, so $(0,3)$ is a saddle point and is ruled out.

$\triangledown ^2_{x,y}f(0,0)=\begin{bmatrix}
-64&0 \\
0&-36
\end{bmatrix}$

This is negative definite making it a local maximum of the function.

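So near the origin $f(0,0)=0\ge f(x,y)$, and everywhere $f(x,y)\ge f(\pm4,\pm3)=-337$.

Thus the points of local minimum are $(\pm4,\pm3)$, where $f$ attains its minimum value $-337$, and the point of local maximum is $(0,0)$, where $f=0$. Since $f(x,y)\to\infty$ as $|x|$ or $|y|$ grows, the function is bounded below but not above, so it has no global maximum.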
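The classification of all nine critical points can be reproduced with a short Python sketch. The helper names below are illustrative, and the shortcut in `classify` relies on the Hessian of this particular function being diagonal with nonzero entries at every critical point.

```python
from itertools import product


def f(x, y):
    return x**4 - 32 * x**2 + y**4 - 18 * y**2


def hessian_diagonal(x, y):
    # The mixed partials vanish, so only the diagonal entries are needed.
    return 12 * x**2 - 64, 12 * y**2 - 36


def classify(x, y):
    fxx, fyy = hessian_diagonal(x, y)
    if fxx > 0 and fyy > 0:
        return "local minimum"   # diagonal Hessian, both entries positive
    if fxx < 0 and fyy < 0:
        return "local maximum"   # diagonal Hessian, both entries negative
    return "saddle point"        # entries of opposite sign


# Roots of 4x(x^2 - 16) = 0 and 4y(y^2 - 9) = 0, paired in every combination.
critical_points = list(product([-4, 0, 4], [-3, 0, 3]))
results = {p: (classify(*p), f(*p)) for p in critical_points}

for point, (kind, value) in results.items():
    print(point, kind, value)
```

Checking all nine points explicitly confirms the symmetry argument: the four points $(\pm4,\pm3)$ share the same classification and the same function value.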

Find the local minima and maxima of $f(x,y)=x^3-12x+y^3-75y+91$.

We'll follow these steps:

Calculate the first-order partial derivatives.
Find the critical points by setting both partial derivatives equal to zero.
Use the second-order partial derivatives to classify these critical points.

Step 1: Calculate the first-order partial derivatives:

$\frac{\partial{f}}{\partial{x}}=3x^2-12$

$\frac{\partial{f}}{\partial{y}}=3y^2-75$

Step 2: Find the critical points by setting both partial derivatives equal to zero:

$$\begin{aligned} 3x^{2} - 12 & = 0 \quad \text{(Equation 1)} \\ 3y^{2} - 75 & = 0 \quad \text{(Equation 2)} \end{aligned}$$
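Steps 2 and 3 can be sketched in Python as follows. The roots $x=\pm2$ and $y=\pm5$ come from solving the two equations by hand, and the Hessian entries $6x$ and $6y$ from differentiating the first-order partials once more; the helper names are illustrative.

```python
from itertools import product


def f(x, y):
    return x**3 - 12 * x + y**3 - 75 * y + 91


# Step 2: 3x^2 - 12 = 0 gives x = -2 or 2; 3y^2 - 75 = 0 gives y = -5 or 5.
critical_points = list(product([-2, 2], [-5, 5]))


def classify(x, y):
    # Step 3: the mixed partials are zero, so the Hessian is diag(6x, 6y).
    fxx, fyy = 6 * x, 6 * y
    D = fxx * fyy  # determinant of the diagonal Hessian
    if D > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"


results = {p: (classify(*p), f(*p)) for p in critical_points}
for point, (kind, value) in results.items():
    print(point, kind, value)
```

Under these assumptions, $(2,5)$ is classified as a local minimum, $(-2,-5)$ as a local maximum, and the two mixed-sign points as saddle points.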


Mathematics for Machine Learning with Python- CST284 KTU Minor - Dr. Binu V P -9847390760
