3.5 Gradients of Matrices

We will encounter situations where we need to take gradients of matrices with respect to vectors (or other matrices), which results in a multidimensional tensor. We can think of this tensor as a multidimensional array that collects partial derivatives. For example, if we compute the gradient of an $m \times n$ matrix $A$ with respect to a $p \times q$ matrix $B$, the resulting Jacobian is $(m \times n) \times (p \times q)$, i.e., a four-dimensional tensor $J$ whose entries are given as $J_{ijkl}=\frac{\partial A_{ij}}{\partial B_{kl}}$.
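To make the four-dimensional Jacobian concrete, here is a small NumPy sketch (NumPy is our choice; the notes do not use any library) that approximates every entry $J_{ijkl}$ with central differences. The function $A = BW$, the fixed matrix $W$ and the helper `jacobian_tensor` are hypothetical choices for illustration. For this particular function the exact answer is $J_{ijkl}=\delta_{ik}W_{lj}$, which the sketch uses as a check.

```python
import numpy as np

def jacobian_tensor(F, B, eps=1e-6):
    """Approximate J[i, j, k, l] = dF(B)[i, j] / dB[k, l] by central differences."""
    m, n = F(B).shape
    p, q = B.shape
    J = np.zeros((m, n, p, q))
    for k in range(p):
        for l in range(q):
            E = np.zeros_like(B)
            E[k, l] = eps
            # The slice J[:, :, k, l] holds dA_ij / dB_kl for every (i, j)
            J[:, :, k, l] = (F(B + E) - F(B - E)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
p, q, n = 3, 2, 4
W = rng.standard_normal((q, n))  # hypothetical fixed matrix
B = rng.standard_normal((p, q))

def F(X):
    return X @ W  # A = XW is (p x n), so m = p here

J = jacobian_tensor(F, B)
print(J.shape)  # (3, 4, 3, 2)

# Exact answer for this F: dA_ij / dB_kl = delta_ik * W_lj
J_exact = np.einsum("ik,lj->ijkl", np.eye(p), W)
print(np.allclose(J, J_exact))  # True
```

The loop perturbs one entry $B_{kl}$ at a time, so each pass fills one $m \times n$ slice of the tensor.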

Since matrices represent linear mappings, we can exploit the fact that there is a vector-space isomorphism (a linear, invertible mapping) between the space $\mathbb{R}^{m \times n}$ of $m \times n$ matrices and the space $\mathbb{R}^{mn}$ of $mn$ vectors. Therefore, we can reshape our matrices into vectors of lengths $mn$ and $pq$, respectively. The gradient with respect to these vectors is a Jacobian matrix of size $mn \times pq$. The following Figure visualizes both approaches.
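The reshaping approach can be sketched in the same assumed setting $A = BW$. Assuming row-major flattening (NumPy's default), $A_{ij}$ maps to position $in+j$ and $B_{kl}$ to position $kq+l$, so a single `reshape` turns the $(m \times n)\times(p \times q)$ tensor into an $mn \times pq$ matrix. Stacking columns instead would give a permuted arrangement of the same entries. The matrix $W$ and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n = 3, 2, 4
m = p                              # A = BW has as many rows as B
W = rng.standard_normal((q, n))    # hypothetical fixed matrix

# 4-D Jacobian tensor: J[i, j, k, l] = dA_ij / dB_kl = delta_ik * W_lj
J = np.einsum("ik,lj->ijkl", np.eye(p), W)

# Flatten row by row (C order): A_ij -> row i*n + j, B_kl -> column k*q + l
J_mat = J.reshape(m * n, p * q)
print(J_mat.shape)  # (12, 6)

# For A = BW under row-major vectorisation the Jacobian equals kron(I_p, W^T)
print(np.allclose(J_mat, np.kron(np.eye(p), W.T)))  # True

# Both forms describe the same linear map: a change dB produces the same dA
dB = rng.standard_normal((p, q))
dA_tensor = np.einsum("ijkl,kl->ij", J, dB)
dA_matrix = (J_mat @ dB.reshape(-1)).reshape(m, n)
print(np.allclose(dA_tensor, dA_matrix), np.allclose(dA_tensor, dB @ W))  # True True
```

The tensor form keeps the index structure visible, while the matrix form lets the usual matrix-vector product apply the Jacobian.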

Example Problems:

Compute the derivatives $\frac{\mathrm{d} f}{\mathrm{d} x}$ of the following functions by using the chain rule. Provide the dimensions of every single partial derivative. Describe your steps in detail.

$f(z)=\sin(z), \quad z=Ax+b, \quad A \in \mathbb{R}^{E \times D}, \ x \in \mathbb{R}^D, \ b \in \mathbb{R}^E$, where $\sin$ is applied to every component of $z$.
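For $f(z)=\sin(z)$ with $z=Ax+b$, the chain rule gives $\frac{\mathrm{d} f}{\mathrm{d} x}=\frac{\partial f}{\partial z}\frac{\partial z}{\partial x}$, where $\frac{\partial f}{\partial z}=\operatorname{diag}(\cos z)\in\mathbb{R}^{E\times E}$ (diagonal because $\sin$ acts on each component separately) and $\frac{\partial z}{\partial x}=A\in\mathbb{R}^{E\times D}$, so $\frac{\mathrm{d} f}{\mathrm{d} x}=\operatorname{diag}(\cos(Ax+b))\,A\in\mathbb{R}^{E\times D}$. The NumPy sketch below, with hypothetical helper names and random test data, compares this formula with a finite-difference Jacobian.

```python
import numpy as np

def df_dx_sin(A, b, x):
    z = A @ x + b                  # z in R^E
    df_dz = np.diag(np.cos(z))     # E x E, diagonal since sin is elementwise
    dz_dx = A                      # E x D
    return df_dz @ dz_dx           # E x D

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of a vector function f at x."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for d in range(x.size):
        e = np.zeros_like(x)
        e[d] = eps
        J[:, d] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

rng = np.random.default_rng(2)
E, D = 3, 5
A = rng.standard_normal((E, D))
b = rng.standard_normal(E)
x = rng.standard_normal(D)

J = df_dx_sin(A, b, x)
J_num = numerical_jacobian(lambda v: np.sin(A @ v + b), x)
print(J.shape)                            # (3, 5)
print(np.allclose(J, J_num, atol=1e-6))   # True
```

Building the full diagonal matrix is only for readability; `np.cos(z)[:, None] * A` gives the same $E \times D$ result without the $E \times E$ intermediate.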

$f(z)=\exp\left(-\frac{1}{2}z\right), \quad z=g(y)=y^\top S^{-1}y, \quad y=h(x)=x-\mu$,

where $x,\mu \in \mathbb{R}^D$ and $S \in \mathbb{R}^{D \times D}$.
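For $f(z)=\exp(-\frac{1}{2}z)$ with $z=y^\top S^{-1}y$ and $y=x-\mu$, the chain rule gives $\frac{\mathrm{d} f}{\mathrm{d} x}=\frac{\partial f}{\partial z}\frac{\partial g}{\partial y}\frac{\partial h}{\partial x}$ with $\frac{\partial f}{\partial z}=-\frac{1}{2}\exp(-\frac{1}{2}z)\in\mathbb{R}^{1\times 1}$, $\frac{\partial g}{\partial y}=y^\top(S^{-1}+S^{-\top})\in\mathbb{R}^{1\times D}$ and $\frac{\partial h}{\partial x}=I_D\in\mathbb{R}^{D\times D}$, so $\frac{\mathrm{d} f}{\mathrm{d} x}=-\frac{1}{2}\exp\left(-\frac{1}{2}(x-\mu)^\top S^{-1}(x-\mu)\right)(x-\mu)^\top(S^{-1}+S^{-\top})\in\mathbb{R}^{1\times D}$. A possible numerical check follows; the helper names and the deliberately non-symmetric, unit upper-triangular test matrix $S$ are assumptions made for illustration.

```python
import numpy as np

def df_dx_gauss(x, mu, S):
    S_inv = np.linalg.inv(S)
    y = x - mu                          # h(x), shape (D,)
    z = y @ S_inv @ y                   # g(y), a scalar
    df_dz = -0.5 * np.exp(-0.5 * z)     # 1 x 1
    dg_dy = y @ (S_inv + S_inv.T)       # 1 x D (stored as shape (D,))
    dh_dx = np.eye(x.size)              # D x D identity
    return df_dz * (dg_dy @ dh_dx)      # 1 x D

def f(x, mu, S):
    y = x - mu
    z = y @ np.linalg.solve(S, y)       # y^T S^{-1} y without forming S^{-1}
    return np.exp(-0.5 * z)

def numerical_grad(fun, x, eps=1e-6):
    """Central-difference gradient of a scalar function fun at x."""
    g = np.zeros_like(x)
    for d in range(x.size):
        e = np.zeros_like(x)
        e[d] = eps
        g[d] = (fun(x + e) - fun(x - e)) / (2 * eps)
    return g

rng = np.random.default_rng(3)
D = 4
x, mu = rng.standard_normal(D), rng.standard_normal(D)
# Hypothetical test matrix: unit upper-triangular, hence invertible but not symmetric
S = np.eye(D) + 0.5 * np.triu(rng.standard_normal((D, D)), 1)

grad = df_dx_gauss(x, mu, S)
grad_num = numerical_grad(lambda v: f(v, mu, S), x)
print(grad.shape)                               # (4,)
print(np.allclose(grad, grad_num, atol=1e-6))   # True
```

If $S$ is symmetric, as a covariance matrix would be, then $S^{-1}+S^{-\top}=2S^{-1}$ and the gradient simplifies to $-\exp(-\frac{1}{2}z)\,(x-\mu)^\top S^{-1}$.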

Mathematics for Machine Learning with Python- CST284 KTU Minor - Dr. Binu V P -9847390760
