3.3 Partial Differentiation and Gradients-Jacobian

The generalization of the derivative to functions of several variables is the gradient. Here the function $f$ depends on one or more variables $x \in R^n$, e.g., $f(x) = f(x_1,x_2)$.

We find the gradient of the function $f$ with respect to $x$ by varying one variable at a time and keeping the others constant. The gradient is then the collection of these partial derivatives.

Definition Partial Derivative

For a function $f: R^n \to R$, $x \to f(x), x \in R^n$ of $n$ variables $x_1,x_2,\ldots,x_n$, we define partial derivatives as

$\frac{\partial f}{\partial x_1}=\lim_{h \to 0}\frac{f(x_1+h,x_2,\ldots,x_n)-f(x)}{h}$

$\vdots$

$\frac{\partial f}{\partial x_n}=\lim_{h \to 0}\frac{f(x_1,x_2,\ldots,x_n+h)-f(x)}{h}$

and collect them in a row vector. This row vector is called the gradient of $f$, or the Jacobian, and it generalizes the derivative of a function of one variable.
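The limit in the definition can be approximated numerically by choosing a small but finite $h$. The following is a minimal sketch of that idea, not a production routine: the helper names `partial_derivative` and `gradient`, the step `h=1e-6`, and the test function are all assumptions made for illustration.

```python
def partial_derivative(f, x, i, h=1e-6):
    """Forward-difference estimate of the i-th partial derivative of f at x."""
    shifted = list(x)
    shifted[i] += h  # vary only coordinate i, keep the others constant
    return (f(shifted) - f(x)) / h


def gradient(f, x, h=1e-6):
    """Collect all partial derivative estimates in a row vector (a plain list)."""
    return [partial_derivative(f, x, i, h) for i in range(len(x))]


# Hypothetical test function f(x1, x2) = 3*x1 + x2^2 with exact gradient [3, 2*x2].
def f(x):
    return 3 * x[0] + x[1] ** 2


approx = gradient(f, [1.0, 2.0])
print(approx)  # approximately [3.0, 4.0]
```

Because $h$ is finite, the result only approximates the true partial derivatives; a smaller $h$ reduces the truncation error but eventually increases floating-point rounding error.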

Example:

If $f(x_1,x_2)=x_1^2x_2+x_1x_2^3 \in R$, then the partial derivatives of $f$ with respect to $x_1$ and $x_2$ are

$\frac{\partial f(x_1,x_2)}{\partial x_1}=2x_1x_2+x_2^3$

$\frac{\partial f(x_1,x_2)}{\partial x_2}=x_1^2+3x_1x_2^2$

and the gradient is then

$\frac{\mathrm{d}f }{\mathrm{d} x}=\left [ \frac{\partial f(x_1,x_2)}{\partial x_1}\quad \frac{\partial f(x_1,x_2)}{\partial x_2}\right ]=\left[ 2x_1x_2+x_2^3 \quad x_1^2+3x_1x_2^2 \right ] \in R^{1 \times 2}$
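A quick way to gain confidence in a hand-computed gradient is to compare it with central differences at a sample point. The sketch below does this for the example above; the evaluation point $(1, 2)$ and the step size are arbitrary choices, not part of the example.

```python
def f(x1, x2):
    return x1**2 * x2 + x1 * x2**3


def grad_f(x1, x2):
    # Analytic gradient from the example: [2*x1*x2 + x2^3, x1^2 + 3*x1*x2^2]
    return [2 * x1 * x2 + x2**3, x1**2 + 3 * x1 * x2**2]


def numeric_grad(x1, x2, h=1e-5):
    # Central differences: perturb one variable at a time in both directions.
    d1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    d2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    return [d1, d2]


analytic = grad_f(1.0, 2.0)       # [2*1*2 + 8, 1 + 3*1*4] = [12.0, 13.0]
numeric = numeric_grad(1.0, 2.0)  # should agree to several decimal places
print(analytic, numeric)
```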

The gradient gives the direction of steepest ascent (and its negative the direction of steepest descent)

Each component of the gradient tells you how fast the function is changing along one of the standard basis directions. It is natural to ask how fast the function changes along some arbitrary direction. Letting $\vec{v}$ denote a unit vector, we can project the gradient onto this direction via the dot product $\nabla f(a) \cdot \vec{v}$. This is a fairly common definition of the directional derivative.

We can then ask in which direction this quantity is maximal. Recall that
$\nabla f(a) \cdot \vec{v}=|\nabla f(a)||\vec{v}|\cos(\theta)$

where $\theta$ is the angle between $\nabla f(a)$ and $\vec{v}$. Since $\vec{v}$ is a unit vector, this equals $|\nabla f(a)|\cos(\theta)$, which is maximal when $\cos(\theta)=1$, that is, when $\vec{v}$ points in the same direction as the gradient at $a$.
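The argument can be illustrated by brute force: sweep unit vectors around the circle, evaluate the directional derivative for each, and see which angle wins. This sketch assumes the gradient $[12, 13]$, which is the gradient of the earlier example $f(x_1,x_2)=x_1^2x_2+x_1x_2^3$ at the arbitrarily chosen point $(1,2)$; the one-degree grid is also an assumption.

```python
import math

grad = [12.0, 13.0]  # assumed gradient at the point a = (1, 2)


def directional_derivative(grad, v):
    """Dot product of the gradient with a unit direction v."""
    return sum(g * vi for g, vi in zip(grad, v))


best_angle, best_value = None, -math.inf
for deg in range(360):
    theta = math.radians(deg)
    v = (math.cos(theta), math.sin(theta))  # unit vector at angle theta
    value = directional_derivative(grad, v)
    if value > best_value:
        best_angle, best_value = deg, value

grad_norm = math.hypot(grad[0], grad[1])
grad_angle = math.degrees(math.atan2(grad[1], grad[0]))
print(best_angle, grad_angle)  # the best grid angle lies next to the gradient's angle
print(best_value, grad_norm)   # the maximal value is close to the gradient's length
```

The best direction on the grid is the one closest to the gradient's own direction, and the maximal directional derivative is bounded by the length of the gradient.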

Basic Rules of Partial Differentiation

Product Rule:

$\frac{\partial (f(x)g(x))}{\partial x}=\frac{\partial f}{\partial x} \cdot g(x)+f(x) \cdot \frac{\partial g}{\partial x}$

Sum Rule:

$\frac{\partial }{\partial x}(f(x)+g(x))=\frac{\partial f}{\partial x}+ \frac{\partial g}{\partial x}$

Chain Rule:

$\frac{\partial }{\partial x}(g \circ f)(x)=\frac{\partial }{\partial x}(g(f(x)))=\frac{\partial g }{\partial f}\frac{\partial f }{\partial x}$

Example Partial derivatives using chain rule:

If $f(x,y)=(x+2y^3)^2$, we obtain the partial derivatives

$\frac{\partial f(x,y)}{\partial x}=2(x+2y^3)\frac{\partial }{\partial x}(x+2y^3)=2(x+2y^3)$

$\frac{\partial f(x,y)}{\partial y}=2(x+2y^3)\frac{\partial }{\partial y}(x+2y^3)=12(x+2y^3)y^2$
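The same computation can be written as "outer derivative times inner derivative" in code. The following sketch splits $f$ into an inner function `u` and the outer square, then checks both partial derivatives against central differences at the arbitrarily chosen point $(1, 2)$; the names `u`, `df_dx`, and `df_dy` are illustrative.

```python
def u(x, y):
    """Inner function u(x, y) = x + 2*y^3."""
    return x + 2 * y**3


def f(x, y):
    """Outer function applied to the inner one: f = u^2."""
    return u(x, y) ** 2


def df_dx(x, y):
    return 2 * u(x, y) * 1           # outer derivative 2u times du/dx = 1


def df_dy(x, y):
    return 2 * u(x, y) * 6 * y**2    # outer derivative 2u times du/dy = 6*y^2


x0, y0, h = 1.0, 2.0, 1e-6
num_dx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)
num_dy = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)
print(df_dx(x0, y0), num_dx)  # both close to 34
print(df_dy(x0, y0), num_dy)  # both close to 816
```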

Chain Rule

The chain rule is widely used in machine learning, especially in neural networks during backpropagation.

Consider a function $f: R^2 \to R$ of two variables $x_1,x_2$. Furthermore, $x_1(t)$ and $x_2(t)$ are themselves functions of $t$. To compute the gradient of $f$ with respect to $t$, we apply the chain rule for multivariate functions as

$\frac{\mathrm{d}f }{\mathrm{d} t}=\left [ \frac{\partial f }{\partial x_1} \quad \frac{\partial f }{\partial x_2}\right ] \begin{bmatrix}
\frac{\partial x_1(t)}{\partial t} \\\frac{\partial x_2(t)}{\partial t}
\end{bmatrix}=\frac{\partial f }{\partial x_1}\frac{\partial x_1 }{\partial t}+\frac{\partial f }{\partial x_2}\frac{\partial x_2 }{\partial t}$

where $d$ denotes the gradient and $\partial$ denotes partial derivatives.

Example:

Consider $f(x_1,x_2)=x_1^2+2x_2$, where $x_1=\sin(t)$ and $x_2=\cos(t)$, then

$\frac{\mathrm{d}f }{\mathrm{d} t}=\frac{\partial f }{\partial x_1}\frac{\partial x_1 }{\partial t}+\frac{\partial f }{\partial x_2}\frac{\partial x_2 }{\partial t}$

$=2x_1\cos(t)+2\cdot(-\sin(t))$

$=2\sin(t)\cos(t)-2\sin(t)$

$=2\sin(t)(\cos(t)-1)$

is the corresponding derivative of $f$ with respect to $t$.
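The three forms of this derivative can be compared numerically: the chain-rule sum, the simplified closed form, and a finite-difference estimate of the composed function. This is only a sanity check at one sample value, $t = 0.7$, which is an arbitrary choice.

```python
import math


def f_of_t(t):
    """f(x1, x2) = x1^2 + 2*x2 with x1 = sin(t), x2 = cos(t)."""
    x1, x2 = math.sin(t), math.cos(t)
    return x1**2 + 2 * x2


def df_dt_chain(t):
    """Chain rule: df/dx1 * dx1/dt + df/dx2 * dx2/dt."""
    x1 = math.sin(t)
    df_dx1, df_dx2 = 2 * x1, 2.0
    dx1_dt, dx2_dt = math.cos(t), -math.sin(t)
    return df_dx1 * dx1_dt + df_dx2 * dx2_dt


def df_dt_closed(t):
    """Simplified result 2*sin(t)*(cos(t) - 1)."""
    return 2 * math.sin(t) * (math.cos(t) - 1)


t0, h = 0.7, 1e-6
numeric = (f_of_t(t0 + h) - f_of_t(t0 - h)) / (2 * h)
print(df_dt_chain(t0), df_dt_closed(t0), numeric)  # all three should agree closely
```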

If $f(x_1,x_2)$ is a function of $x_1$ and $x_2$, where $x_1(s,t)$ and $x_2(s,t)$ are themselves functions of two variables $s$ and $t$, the chain rule yields the partial derivatives

$\frac{\partial f }{\partial s}=\frac{\partial f }{\partial x_1}\frac{\partial x_1}{\partial s}+\frac{\partial f }{\partial x_2}\frac{\partial x_2 }{\partial s}$

$\frac{\partial f }{\partial t}=\frac{\partial f }{\partial x_1}\frac{\partial x_1}{\partial t}+\frac{\partial f }{\partial x_2}\frac{\partial x_2 }{\partial t}$

and the gradient is obtained by the matrix multiplication

$\frac{\mathrm{d}f }{\mathrm{d}(s,t)}=\frac{\partial f }{\partial x}\frac{\partial x}{\partial (s,t)}=\begin{bmatrix}
\frac{\partial f }{\partial x_1} & \frac{\partial f }{\partial x_2}\end{bmatrix}
\begin{bmatrix}
\frac{\partial x_1 }{\partial s} & \frac{\partial x_1}{\partial t}\\
\frac{\partial x_2 }{\partial s} & \frac{\partial x_2}{\partial t}
\end{bmatrix}$
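The matrix form is easy to try out on a concrete case. The sketch below assumes $f(x_1,x_2)=x_1^2+2x_2$ together with the hypothetical inner functions $x_1=st$ and $x_2=s+t^2$ (these are not from the text, they are chosen only so the result can be checked by hand), and multiplies the $1 \times 2$ gradient by the $2 \times 2$ Jacobian.

```python
def x_of(s, t):
    """Hypothetical inner functions x1 = s*t, x2 = s + t^2."""
    return [s * t, s + t**2]


def grad_f_x(x):
    """Row vector [df/dx1, df/dx2] for f(x1, x2) = x1^2 + 2*x2."""
    return [2 * x[0], 2.0]


def jacobian_x(s, t):
    """2 x 2 matrix of partial derivatives of (x1, x2) with respect to (s, t)."""
    return [[t, s],
            [1.0, 2 * t]]


def chain_rule_gradient(s, t):
    """Row vector times matrix: [df/ds, df/dt]."""
    row = grad_f_x(x_of(s, t))
    J = jacobian_x(s, t)
    return [sum(row[k] * J[k][j] for k in range(2)) for j in range(2)]


s0, t0 = 1.5, -0.5
result = chain_rule_gradient(s0, t0)
# Substituting directly gives f(s, t) = s^2*t^2 + 2*s + 2*t^2, so
# df/ds = 2*s*t^2 + 2 and df/dt = 2*s^2*t + 4*t.
direct = [2 * s0 * t0**2 + 2, 2 * s0**2 * t0 + 4 * t0]
print(result, direct)
```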

Example Problems

Find the gradient of $f(x,y)=x^2y$ at the point $(3,2)$.

The gradient is just the vector of partial derivatives

$\frac{\partial f }{\partial x}= 2xy$

$\frac{\partial f }{\partial y}=x^2$

The gradient is

$\begin{bmatrix}
2xy & x^2
\end{bmatrix}$

The gradient at $(3,2)$ is

$\begin{bmatrix}
12 & 9
\end{bmatrix}$

Let $f(x,y,z)=xye^{x^2+z^2-5}$. Calculate the gradient of $f$ at the point $(1,3,-2)$.

$\bigtriangledown f(x,y,z)=\left[\frac{\partial f}{\partial x}\quad \frac{\partial f}{\partial y} \quad \frac{\partial f}{\partial z}\right]$

$\frac{\partial f}{\partial x}=y(x \cdot e^{x^2+z^2-5} \cdot 2x + e^{x^2+z^2-5})=(y+2x^2y)e^{x^2+z^2-5}$

$\frac{\partial f}{\partial y}=xe^{x^2+z^2-5}$

$\frac{\partial f}{\partial z}=2xyze^{x^2+z^2-5}$

Therefore

$\bigtriangledown f(1,3,-2)=\left[9\quad 1 \quad -12\right]$
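Plugging the point into the three partial derivatives is a one-liner in code. A small sketch, using only the formulas derived above:

```python
import math


def grad_f(x, y, z):
    """Gradient of f(x, y, z) = x*y*exp(x^2 + z^2 - 5) from the partials above."""
    e = math.exp(x**2 + z**2 - 5)
    return [(y + 2 * x**2 * y) * e, x * e, 2 * x * y * z * e]


g = grad_f(1, 3, -2)  # the exponent is 1 + 4 - 5 = 0, so the exponential factor is 1
print(g)  # [9.0, 1.0, -12.0]
```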

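Let $g(x,y)=\frac{x^2y}{x^2+y^2}$ if $(x,y) \neq (0,0)$, taking $g(0,0)=0$. Find the partial derivatives of $g(x,y)$ at $(0,0)$.

Since $g(h,0)=0$ and $g(0,h)=0$ for every $h \neq 0$, the difference quotients in the definition vanish, so the partial derivatives are

$\frac{\partial g}{\partial x}(0,0)=0$

$\frac{\partial g}{\partial y}(0,0)=0$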
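At a point where the formula for $g$ does not apply, the partial derivatives have to come from the limit definition. The sketch below evaluates the difference quotients for a few shrinking values of $h$; it assumes $g(0,0)=0$, a value the result above requires, and the list of step sizes is arbitrary.

```python
def g(x, y):
    if x == 0 and y == 0:
        return 0.0  # assumed value at the origin
    return x**2 * y / (x**2 + y**2)


steps = (1e-1, 1e-3, 1e-6)
# Difference quotients along the x-axis and the y-axis at the origin.
quotients_x = [(g(h, 0.0) - g(0.0, 0.0)) / h for h in steps]
quotients_y = [(g(0.0, h) - g(0.0, 0.0)) / h for h in steps]
print(quotients_x)  # every entry is 0.0 because g(h, 0) = 0
print(quotients_y)  # every entry is 0.0 because g(0, h) = 0
```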

For a scalar function $f(x, y, z ) = x^2 +3y^2 +2z^2$, find the gradient and its magnitude at the point $(1,2,-1)$ (university question).

$\frac{\partial f}{\partial x}=2x$

$\frac{\partial f}{\partial y}=6y$

$\frac{\partial f}{\partial z}=4z$

The gradient is

$\bigtriangledown f(x,y,z)=\left[\frac{\partial f}{\partial x}\quad \frac{\partial f}{\partial y} \quad \frac{\partial f}{\partial z}\right]$

Therefore

$\bigtriangledown f(x,y,z)=\left[2x \quad 6y \quad 4z \right]$

$\bigtriangledown f(1,2,-1)=\left[2\quad 12 \quad -4\right]$

and its magnitude is

$|\bigtriangledown f(1,2,-1)|=\sqrt{2^2+12^2+(-4)^2}=\sqrt{164}\approx 12.81$
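The question also asks for the magnitude of the gradient, which is the Euclidean length of the row vector. A minimal sketch of both steps:

```python
import math


def grad_f(x, y, z):
    """Gradient of f(x, y, z) = x^2 + 3*y^2 + 2*z^2."""
    return [2 * x, 6 * y, 4 * z]


g = grad_f(1, 2, -1)                          # [2, 12, -4]
magnitude = math.sqrt(sum(c**2 for c in g))   # sqrt(4 + 144 + 16) = sqrt(164)
print(g, magnitude)  # magnitude is roughly 12.81
```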

Suppose you were trying to minimize $f(x, y) = x^2+ 2y + 2y^2$. Along what vector should you travel from $(5, 12)$?

To minimize $f$, we should travel in the direction of the negative gradient.

$\frac{\partial f}{\partial x}=2x$

$\frac{\partial f}{\partial y}=2+4y$

$\bigtriangledown f(x,y)=\left[2x \quad 2+4y \right]$

The gradient at $(5,12)$ is

$\bigtriangledown f(5,12)=\left[10\quad 50 \right]$

To minimize $f$, travel in the direction $- \left[10\quad 50 \right]=\left[-10\quad -50 \right]$
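Repeatedly taking a small step along the negative gradient is the idea behind gradient descent. The sketch below applies it to this function starting from $(5, 12)$; the step size `0.1` and the iteration count `100` are assumptions chosen so that this particular example converges, not general recommendations.

```python
def f(x, y):
    return x**2 + 2 * y + 2 * y**2


def grad_f(x, y):
    return [2 * x, 2 + 4 * y]


point = [5.0, 12.0]
step_size = 0.1  # assumed; too large a value would make the iteration diverge

first_direction = [-g for g in grad_f(*point)]  # [-10.0, -50.0], as computed above

history = [f(*point)]
for _ in range(100):
    gx, gy = grad_f(*point)
    # Move a small step against the gradient.
    point = [point[0] - step_size * gx, point[1] - step_size * gy]
    history.append(f(*point))

print(first_direction)
print(point)        # approaches (0, -0.5), where both partial derivatives vanish
print(history[-1])  # approaches the minimum value -0.5
```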

A skier is on a mountain with equation
$z = 100 - 0.4x^2 - 0.3y^2$
where $z$ denotes height.

(a) The skier is located at the point with xy-coordinates $(1,1)$, and wants to ski downhill along the steepest possible path. In which direction (indicated by a vector $(a, b)$ in the xy-plane) should the skier begin skiing?

Solution:

The direction of greatest rate of decrease is opposite to the direction of the gradient. With $g(x,y)=100 - 0.4x^2 - 0.3y^2$,

$\bigtriangledown g(x,y)=\left[ -0.8x \quad -0.6y \right]$

$\bigtriangledown g(1,1)=\left[ -0.8\quad -0.6 \right]$

The gradient vector has magnitude $\sqrt{0.8^2+0.6^2}=1$, so the unit vector in the opposite direction is

$u= -\bigtriangledown g(1,1)=\left[ 0.8 \quad 0.6 \right]$

(b) The skier begins skiing in the direction given by the xy-vector $(a ,b)$ you found in part (a), so the skier heads in a direction in space given by the vector $(a , b , c)$. Find the value of $c$.

Solution:

The directional derivative

$D_ug(1,1) = \bigtriangledown g(1, 1) \cdot u = (-u) \cdot u = -1$

gives the slope, which is the ratio of vertical change to horizontal change. In the direction of the vector $(a,b,c)$, this ratio is $\frac{c}{\sqrt{a^2+b^2}}$. So

$D_ug(1, 1) =\frac{c}{\sqrt{a^2+b^2}}=\frac{c}{1}= c$

$c=-1$
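Both parts of the skier problem follow the same recipe: normalize the negative gradient to get $(a, b)$, then take the directional derivative along it to get the slope $c$. A small sketch of that recipe, with the height function written as `height` (a name chosen here for illustration):

```python
import math


def height(x, y):
    return 100 - 0.4 * x**2 - 0.3 * y**2


def grad_height(x, y):
    return [-0.8 * x, -0.6 * y]


grad = grad_height(1.0, 1.0)
norm = math.hypot(grad[0], grad[1])          # length of the gradient, here about 1
u = [-grad[0] / norm, -grad[1] / norm]       # steepest-descent direction (a, b)
slope = grad[0] * u[0] + grad[1] * u[1]      # directional derivative along u
c = slope * math.hypot(u[0], u[1])           # vertical change per unit horizontal step
print(u, c)  # u is close to [0.8, 0.6] and c is close to -1
```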

Find the direction of greatest increase of the function $f(x,y)=4x^2+y^2+2y$ at the point $P(1,2)$ (university question)

$\frac{\partial f}{\partial x}=8x$

$\frac{\partial f}{\partial y}=2y+2$

$\bigtriangledown f(x,y)=\left[8x \quad 2y+2 \right]$

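The gradient at $(1,2)$ is

$\bigtriangledown f(1,2)=\left[8\quad 6 \right]=8i+6j$

so the direction of greatest increase at $P$ is along $8i+6j$, i.e. along the unit vector $0.8i+0.6j$.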

Find the partial derivatives and gradient of the function $f(x,y,z)=x^5e^{2z}/y$ (university question)

$\frac{\partial f}{\partial x}=\frac{5x^4e^{2z}}{y}$

$\frac{\partial f}{\partial y}=-\frac{x^5e^{2z}}{y^2}$

$\frac{\partial f}{\partial z}=\frac{2x^5e^{2z}}{y}$

$\bigtriangledown f(x,y,z)=\left[\frac{5x^4e^{2z}}{y} \quad -\frac{x^5e^{2z}}{y^2} \quad \frac{2x^5e^{2z}}{y} \right]$

Find the gradient of the function $f(x,y)=x^2+y^2$ at the point $(x,y)=(1,5)$ (university question)

$\frac{\partial f}{\partial x}=2x$

$\frac{\partial f}{\partial y}=2y$

$\bigtriangledown f(x,y)=\left[2x \quad 2y \right]$

The gradient at $(1,5)$ is

$\bigtriangledown f(1,5)=\left[2\quad 10 \right]$

For more example problems, see

https://mathinsight.org/partial_derivative_examples

Mathematics for Machine Learning- CST 284 - KTU Minor Notes - Dr Binu V P
