We will encounter situations where we need to take gradients of matrices with respect to vectors (or other matrices), which results in a multidimensional tensor. We can think of this tensor as a multidimensional array that collects partial derivatives. For example, if we compute the gradient of an $m \times n$ matrix $A$ with respect to a $p \times q$ matrix $B$, the resulting Jacobian would be $(m \times n) \times (p \times q)$, i.e., a four-dimensional tensor $J$ whose entries are given as $J_{ijkl}=\frac{\partial A_{ij}}{\partial B_{kl}}$.
Since matrices represent linear mappings, we can exploit the fact that there is a vector-space isomorphism (a linear, invertible mapping) between the space $\mathbb{R}^{m \times n}$ of $m \times n$ matrices and the space $\mathbb{R}^{mn}$ of vectors of length $mn$. Therefore, we can reshape our matrices into vectors of lengths $mn$ and $pq$, respectively. The gradient using these vectors results in a Jacobian matrix of size $mn \times pq$. The following figure visualizes both approaches.
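As a rough numerical illustration of the two views, here is a minimal NumPy sketch. The map $A(B) = CBD$, the matrices `C` and `D`, and the sizes are assumptions chosen only for this example; they are not part of the text above. For this particular map the partial derivatives are $\frac{\partial A_{ij}}{\partial B_{kl}} = C_{ik} D_{lj}$, so the four-dimensional tensor can be built directly and then reshaped into an $mn \times pq$ matrix.

```python
import numpy as np

# Hypothetical example map (an assumption for this sketch): A(B) = C @ B @ D,
# so A is m x n and B is p x q.
m, n, p, q = 2, 3, 4, 5
rng = np.random.default_rng(0)
C = rng.standard_normal((m, p))
D = rng.standard_normal((q, n))
B = rng.standard_normal((p, q))

# Tensor view: J[i, j, k, l] = dA_ij / dB_kl = C[i, k] * D[l, j].
J = np.einsum("ik,lj->ijkl", C, D)

# Flattened view: reshape (row-major) into an (mn) x (pq) Jacobian matrix.
J_flat = J.reshape(m * n, p * q)

# The map is linear, so the flattened Jacobian applied to the flattened B
# should reproduce the flattened A.
A = C @ B @ D
print(J.shape, J_flat.shape)
print(np.allclose(J_flat @ B.ravel(), A.ravel()))
```

Both arrays hold the same $mn \cdot pq$ numbers; the reshape only changes how they are indexed, which is the isomorphism described above in action.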
Example Problems:
Compute the derivatives $\frac{\mathrm{d} f}{\mathrm{d} x}$ of the following functions by using the chain rule. Provide the dimensions of every single partial derivative. Describe your steps in detail.
$f(z)=\sin(z), \quad z=Ax+b, \quad A \in \mathbb{R}^{E \times D}, x \in \mathbb{R}^D, b \in \mathbb{R}^E$
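One possible way to check a hand-derived answer to this problem is a finite-difference comparison. The sketch below assumes $\sin$ is applied elementwise, so that $\frac{\partial f}{\partial z} = \mathrm{diag}(\cos z) \in \mathbb{R}^{E \times E}$ and $\frac{\partial z}{\partial x} = A \in \mathbb{R}^{E \times D}$, giving $\frac{\mathrm{d} f}{\mathrm{d} x} \in \mathbb{R}^{E \times D}$. The sizes and random values are arbitrary choices for illustration.

```python
import numpy as np

# Arbitrary sizes and random data, chosen only for this check.
E, D = 3, 4
rng = np.random.default_rng(0)
A = rng.standard_normal((E, D))
b = rng.standard_normal(E)
x = rng.standard_normal(D)

def f(x):
    # sin is assumed to act elementwise on z = A x + b.
    return np.sin(A @ x + b)

# Chain rule: df/dz = diag(cos(z)) is E x E, dz/dx = A is E x D.
z = A @ x + b
df_dz = np.diag(np.cos(z))
dz_dx = A
df_dx = df_dz @ dz_dx  # E x D

# Central finite differences, one column per input coordinate.
eps = 1e-6
df_dx_num = np.stack(
    [(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(D)],
    axis=1,
)
print(df_dx.shape, np.allclose(df_dx, df_dx_num, atol=1e-6))
```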
Compute the derivatives $\frac{\mathrm{d} f}{\mathrm{d} x}$ of the following functions by using the chain rule. Provide the dimensions of every single partial derivative. Describe your steps in detail.
$f(z)=\exp\left(-\frac{1}{2}z\right)$
$z=g(y)=y^TS^{-1}y$
$y=h(x)=x-\mu$
where $x,\mu \in \mathbb{R}^D$, $S \in \mathbb{R}^{D \times D}$
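A similar finite-difference check can be sketched for this problem. Applying the chain rule gives $\frac{\partial f}{\partial z} = -\frac{1}{2}\exp\left(-\frac{1}{2}z\right) \in \mathbb{R}^{1 \times 1}$, $\frac{\partial z}{\partial y} = y^T(S^{-1} + S^{-T}) \in \mathbb{R}^{1 \times D}$, and $\frac{\partial y}{\partial x} = I \in \mathbb{R}^{D \times D}$, so $\frac{\mathrm{d} f}{\mathrm{d} x} \in \mathbb{R}^{1 \times D}$. The code assumes a symmetric positive definite $S$ purely so that the demo is well conditioned; the problem statement only requires $S$ to be invertible.

```python
import numpy as np

# Arbitrary size and random data, chosen only for this check.
D = 4
rng = np.random.default_rng(0)
M = rng.standard_normal((D, D))
S = M @ M.T + D * np.eye(D)  # assumption: symmetric positive definite S
mu = rng.standard_normal(D)
x = rng.standard_normal(D)
S_inv = np.linalg.inv(S)

def f(x):
    y = x - mu
    return np.exp(-0.5 * (y @ S_inv @ y))

# Chain rule, factor by factor.
y = x - mu
z = y @ S_inv @ y
df_dz = -0.5 * np.exp(-0.5 * z)      # 1 x 1 (a scalar)
dz_dy = y @ (S_inv + S_inv.T)        # 1 x D
dy_dx = np.eye(D)                    # D x D
df_dx = (df_dz * dz_dy) @ dy_dx      # 1 x D, stored as a length-D array

# Central finite differences, one entry per input coordinate.
eps = 1e-6
df_dx_num = np.array(
    [(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(D)]
)
print(df_dx.shape, np.allclose(df_dx, df_dx_num, atol=1e-7))
```

When $S$ is symmetric, $S^{-1} + S^{-T} = 2S^{-1}$ and the result simplifies to $-\exp\left(-\frac{1}{2}z\right) y^T S^{-1}$.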