The generalization of the derivative to functions of several variables is the gradient. Here the function $f$ depends on one or more variables $x \in R^n$, e.g., $f(x) = f(x_1,x_2)$.
We find the gradient of the function $f$ with respect to $x$ by varying one variable at a time and keeping the others constant. The gradient is then the collection of these partial derivatives.
Definition: Partial Derivative
For a function $f: R^n \to R$, $x \mapsto f(x)$, $x \in R^n$, of $n$ variables $x_1,x_2,\ldots,x_n$, we define the partial derivatives as
$\frac{\partial f}{\partial x_1}=\lim_{h \to 0}\frac{f(x_1+h,x_2,\ldots,x_n)-f(x)}{h}$
$\vdots$
$\frac{\partial f}{\partial x_n}=\lim_{h \to 0}\frac{f(x_1,x_2,\ldots,x_n+h)-f(x)}{h}$
and collect them in a row vector. This row vector is called the gradient of $f$ (or the Jacobian) and is the generalization of the derivative to several variables.
Example:
If $f(x_1,x_2)=x_1^2x_2+x_1x_2^3 \in R$, then the partial derivatives of $f$ with respect to $x_1$ and $x_2$ are
$\frac{\partial f(x_1,x_2)}{\partial x_1}=2x_1x_2+x_2^3$
$\frac{\partial f(x_1,x_2)}{\partial x_2}=x_1^2+3x_1x_2^2$
and the gradient is then
$\frac{\mathrm{d}f }{\mathrm{d} x}=\left [ \frac{\partial f(x_1,x_2)}{\partial x_1}\quad \frac{\partial f(x_1,x_2)}{\partial x_2}\right ]=\left[ 2x_1x_2+x_2^3 \quad x_1^2+3x_1x_2^2 \right ] \in R^{1 \times 2}$
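As a quick sanity check, these partial derivatives can be reproduced symbolically. A minimal sketch using Python's sympy, applied to the function from the example above:

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = x1**2 * x2 + x1 * x2**3

# Collect the partial derivatives into a 1x2 row vector (the gradient/Jacobian).
grad = sp.Matrix([[sp.diff(f, x1), sp.diff(f, x2)]])
print(grad)  # Matrix([[2*x1*x2 + x2**3, x1**2 + 3*x1*x2**2]])
```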
The gradient points in the direction of steepest ascent (and its negative in the direction of steepest descent). For a unit vector $\vec{v}$, the directional derivative of $f$ at a point $a$ satisfies
$\mathrm{grad}(f(a)) \cdot \vec{v} = |\mathrm{grad}(f(a))|\,|\vec{v}|\cos(\theta)$
Since $\vec{v}$ is a unit vector, this equals $|\mathrm{grad}(f(a))|\cos(\theta)$, which is maximal when $\cos(\theta)=1$, i.e., when $\vec{v}$ points in the same direction as $\mathrm{grad}(f(a))$.
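This can also be checked numerically: among unit vectors, the increase in $f$ is largest along the gradient direction. A small sketch (the point $a=(1,1)$ and the random comparison directions are illustrative assumptions):

```python
import numpy as np

def f(x):
    return x[0]**2 * x[1] + x[0] * x[1]**3

def grad_f(x):
    return np.array([2*x[0]*x[1] + x[1]**3, x[0]**2 + 3*x[0]*x[1]**2])

a = np.array([1.0, 1.0])                    # illustrative point
u = grad_f(a) / np.linalg.norm(grad_f(a))   # unit vector along the gradient

eps = 1e-6
rng = np.random.default_rng(0)
for _ in range(3):                          # a few random unit directions for comparison
    v = rng.normal(size=2)
    v /= np.linalg.norm(v)
    print((f(a + eps * v) - f(a)) / eps)
print((f(a + eps * u) - f(a)) / eps)        # largest: the steepest-ascent rate |grad f(a)|
```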
Basic Rules of Partial Differentiation
Product rule:
$\frac{\partial (f(x)g(x))}{\partial x}=\frac{\partial f}{\partial x}g(x)+f(x)\frac{\partial g}{\partial x}$
Sum Rule:
$\frac{\partial }{\partial x}(f(x)+g(x))=\frac{\partial f}{\partial x}+ \frac{\partial g}{\partial x}$
Chain Rule:
$\frac{\partial }{\partial x}(g \circ f)(x)=\frac{\partial }{\partial x}(g(f(x)))=\frac{\partial g }{\partial f}\frac{\partial f }{\partial x}$
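All three rules are easy to verify symbolically. A minimal sketch with sympy, using $f(x)=\sin(x)$ and $g(x)=x^3$ as illustrative choices (not from the text):

```python
import sympy as sp

x = sp.symbols("x")
f, g = sp.sin(x), x**3  # illustrative choices

# Product rule: (fg)' == f'g + fg'
print(sp.simplify(sp.diff(f * g, x) - (sp.diff(f, x) * g + f * sp.diff(g, x))))  # 0

# Sum rule: (f + g)' == f' + g'
print(sp.simplify(sp.diff(f + g, x) - (sp.diff(f, x) + sp.diff(g, x))))  # 0

# Chain rule: (g o f)' == g'(f) * f'
print(sp.simplify(sp.diff(g.subs(x, f), x) - sp.diff(g, x).subs(x, f) * sp.diff(f, x)))  # 0
```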
Example: Partial derivatives using the chain rule.
If $f(x,y)=(x+2y^3)^2$, we obtain the partial derivatives
$\frac{\partial f(x,y)}{\partial x}=2(x+2y^3)\frac{\partial }{\partial x}(x+2y^3)=2(x+2y^3)$
$\frac{\partial f(x,y)}{\partial y}=2(x+2y^3)\frac{\partial }{\partial y}(x+2y^3)=12(x+2y^3)y^2$
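The same results follow from sympy, which applies the chain rule automatically; a minimal check:

```python
import sympy as sp

x, y = sp.symbols("x y")
f = (x + 2*y**3)**2

print(sp.factor(sp.diff(f, x)))  # 2*(x + 2*y**3)
print(sp.factor(sp.diff(f, y)))  # 12*y**2*(x + 2*y**3)
```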
Chain Rule
The chain rule is widely used in machine learning, especially in neural networks during backpropagation.
Consider a function $f: R^2 \to R$ of two variables $x_1,x_2$. Furthermore, $x_1(t)$ and $x_2(t)$ are themselves functions of $t$. To compute the gradient of $f$ with respect to $t$, we apply the chain rule for multivariate functions:
$\frac{\mathrm{d}f }{\mathrm{d} t}=\left [ \frac{\partial f }{\partial x_1} \quad \frac{\partial f }{\partial x_2}\right ] \begin{bmatrix}
\frac{\partial x_1(t)}{\partial t} \\\frac{\partial x_2(t)}{\partial t}
\end{bmatrix}=\frac{\partial f }{\partial x_1}\frac{\partial x_1 }{\partial t}+\frac{\partial f }{\partial x_2}\frac{\partial x_2 }{\partial t}$
where $d$ denotes the gradient and $\partial$ denotes partial derivatives.
Example:
Consider $f(x_1,x_2)=x_1^2+2x_2$, where $x_1=\sin(t)$ and $x_2=\cos(t)$. Then
$\frac{\mathrm{d}f }{\mathrm{d} t}=\frac{\partial f }{\partial x_1}\frac{\partial x_1 }{\partial t}+\frac{\partial f }{\partial x_2}\frac{\partial x_2 }{\partial t}$
$=2x_1\cos(t)+2\cdot(-\sin(t))$
$=2\sin(t)\cos(t)-2\sin(t)$
$=2\sin(t)(\cos(t)-1)$
is the corresponding derivative of $f$ with respect to $t$.
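A short symbolic check of this result (sympy differentiates directly through the substitution):

```python
import sympy as sp

t = sp.symbols("t")
x1, x2 = sp.sin(t), sp.cos(t)
f = x1**2 + 2*x2  # f expressed directly in terms of t

df_dt = sp.diff(f, t)
# The difference with the hand-derived result should simplify to zero.
print(sp.simplify(df_dt - 2*sp.sin(t)*(sp.cos(t) - 1)))  # 0
```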
If $f(x_1,x_2)$ is a function of $x_1$ and $x_2$, where $x_1(s,t)$ and $x_2(s,t)$ are themselves functions of two variables $s$ and $t$, the chain rule yields the partial derivatives:
$\frac{\partial f}{\partial s}=\frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial s}+\frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial s}$
$\frac{\partial f}{\partial t}=\frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial t}+\frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial t}$
and the gradient is obtained by the matrix multiplication
$\frac{\mathrm{d}f}{\mathrm{d}(s,t)}=\frac{\partial f}{\partial x}\frac{\partial x}{\partial (s,t)}=\begin{bmatrix}
\frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2}
\end{bmatrix}
\begin{bmatrix}
\frac{\partial x_1}{\partial s} & \frac{\partial x_1}{\partial t}\\
\frac{\partial x_2}{\partial s} & \frac{\partial x_2}{\partial t}
\end{bmatrix}$
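A sketch verifying this matrix form, with illustrative inner functions $x_1 = st$ and $x_2 = s + t^2$ (assumptions for the demo, not from the text):

```python
import sympy as sp

s, t, u1, u2 = sp.symbols("s t u1 u2")

f = u1**2 + 2*u2          # outer function f(x1, x2)
x1, x2 = s*t, s + t**2    # illustrative inner functions x1(s,t), x2(s,t)

# Row vector df/dx, evaluated at x(s, t).
df_dx = sp.Matrix([[sp.diff(f, u1), sp.diff(f, u2)]]).subs({u1: x1, u2: x2})

# 2x2 Jacobian dx/d(s,t) of the inner map.
dx_dst = sp.Matrix([x1, x2]).jacobian(sp.Matrix([s, t]))

chain = df_dx * dx_dst  # chain-rule product, a 1x2 row vector
direct = sp.Matrix([[f.subs({u1: x1, u2: x2})]]).jacobian(sp.Matrix([s, t]))
print(sp.simplify(chain - direct))  # zero matrix: both routes agree
```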
Example Problems
Find the gradient of $f(x,y)=x^2y$ at the point $(3,2)$.
The gradient is the row vector of partial derivatives:
$\begin{bmatrix}
2xy & x^2 \\
\end{bmatrix}$
The gradient at $(3,2)$ is
$\begin{bmatrix}
12 & 9 \\
\end{bmatrix}$
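A one-line numeric confirmation of this result:

```python
import numpy as np

grad = lambda x, y: np.array([2*x*y, x**2])
print(grad(3, 2))  # [12  9]
```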
As a further example, consider a surface whose height is given by
$z = 100 - 0.4x^2 - 0.3y^2$
where $z$ denotes height. Its gradient, $\begin{bmatrix} -0.8x & -0.6y \end{bmatrix}$, gives the direction of steepest ascent at any point $(x,y)$.
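A minimal numeric sketch (the evaluation point $(1,1)$ is an illustrative assumption):

```python
import numpy as np

z = lambda x, y: 100 - 0.4*x**2 - 0.3*y**2        # height function
grad_z = lambda x, y: np.array([-0.8*x, -0.6*y])  # its gradient

print(z(1.0, 1.0))            # 99.3, the height at (1, 1)
g = grad_z(1.0, 1.0)
print(g)                      # [-0.8 -0.6]
print(g / np.linalg.norm(g))  # unit direction of steepest ascent at (1, 1)
```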