We will encounter situations where we need to take gradients of matrices with respect to vectors (or other matrices), which results in a multidimensional tensor. We can think of this tensor as a multidimensional array that collects partial derivatives. For example, if we compute the gradient of an $m \times n$ matrix $A$ with respect to a $p \times q$ matrix $B$, the resulting Jacobian would be $(m \times n) \times (p \times q)$, i.e., a four-dimensional tensor $J$ whose entries are given as $J_{ijkl} = \partial A_{ij} / \partial B_{kl}$.
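To make the index convention concrete, here is a minimal NumPy sketch that estimates the four-dimensional tensor by central differences. The helper `jacobian_tensor`, the map A = M B N, the matrices M and N, and the sizes are all assumptions chosen for illustration; none of them come from the text above. For this particular linear map the exact entries are M[i, k] * N[l, j], which the sketch uses as a check.

```python
import numpy as np

def jacobian_tensor(f, B, eps=1e-6):
    """Central-difference estimate of J[i, j, k, l] = dA_ij / dB_kl for A = f(B)."""
    A = f(B)
    J = np.zeros(A.shape + B.shape)  # shape (m, n, p, q)
    for k in range(B.shape[0]):
        for l in range(B.shape[1]):
            step = np.zeros_like(B)
            step[k, l] = eps
            # Perturb only B[k, l] and record how every entry of A responds.
            J[:, :, k, l] = (f(B + step) - f(B - step)) / (2 * eps)
    return J

# Hypothetical sizes and map, chosen only for illustration: A = M @ B @ N.
m, n, p, q = 2, 3, 4, 5
rng = np.random.default_rng(0)
M = rng.standard_normal((m, p))
N = rng.standard_normal((q, n))
B = rng.standard_normal((p, q))

J = jacobian_tensor(lambda X: M @ X @ N, B)
print(J.shape)  # (2, 3, 4, 5)

# For this linear map the exact entries are J[i, j, k, l] = M[i, k] * N[l, j].
J_exact = np.einsum("ik,lj->ijkl", M, N)
print(np.allclose(J, J_exact, atol=1e-6))
```

The first two axes index the entry of A and the last two index the entry of B, matching the ordering of the subscripts in the definition of J.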
Since matrices represent linear mappings, we can exploit the fact that there is a vector-space isomorphism (linear, invertible mapping) between the space $\mathbb{R}^{m \times n}$ of $m \times n$ matrices and the space $\mathbb{R}^{mn}$ of $mn$-dimensional vectors. Therefore, we can reshape our matrices into vectors of lengths $mn$ and $pq$, respectively. The gradient using these vectors results in a Jacobian matrix of size $mn \times pq$. The following figure visualizes both approaches.
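The equivalence of the two approaches can be sketched in NumPy as follows. The map A = M B N, the matrices M and N, the sizes, and the choice of row-major flattening are assumptions made for illustration only. The sketch builds the four-dimensional tensor, reshapes it into an mn × pq matrix, and checks that this matrix acts on the flattened B just as the original map acts on B.

```python
import numpy as np

# Hypothetical sizes and map, chosen only for illustration: A = M @ B @ N.
m, n, p, q = 2, 3, 4, 5
rng = np.random.default_rng(0)
M = rng.standard_normal((m, p))
N = rng.standard_normal((q, n))
B = rng.standard_normal((p, q))

# Approach 1: four-dimensional tensor, J[i, j, k, l] = dA_ij / dB_kl = M[i, k] * N[l, j].
J_tensor = np.einsum("ik,lj->ijkl", M, N)   # shape (m, n, p, q)

# Approach 2: flatten A and B (row-major) and collect the same entries in a matrix.
J_matrix = J_tensor.reshape(m * n, p * q)   # shape (mn, pq)

# The Jacobian matrix maps the flattened B to the flattened A.
A = M @ B @ N
assert np.allclose(J_matrix @ B.reshape(-1), A.reshape(-1))

# With row-major flattening, this Jacobian equals the Kronecker product of M and N^T.
assert np.allclose(J_matrix, np.kron(M, N.T))

print(J_tensor.shape, J_matrix.shape)  # (2, 3, 4, 5) (6, 20)
```

Both objects hold the same mn · pq partial derivatives; the reshape only changes how they are indexed, which is why the matrix form can be used with ordinary matrix-vector products.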
Example Problems:
Compute the derivatives $\mathrm{d}f/\mathrm{d}x$ of the following functions by using the chain rule. Provide the dimensions of every single partial derivative. Describe your steps in detail.
$f(z) = \sin(z)$, $z = Ax + b$, $A \in \mathbb{R}^{E \times D}$, $x \in \mathbb{R}^{D}$, $b \in \mathbb{R}^{E}$
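One way to check a solution numerically is sketched below, assuming that sin is applied elementwise, so that f maps R^E to R^E. Under that assumption the chain rule gives df/dz = diag(cos(z)) of size E × E, dz/dx = A of size E × D, and therefore df/dx = diag(cos(Ax + b)) A of size E × D. The concrete sizes and random values are hypothetical and serve only to run the check.

```python
import numpy as np

# Hypothetical sizes and values, chosen only to run the check.
E, D = 3, 4
rng = np.random.default_rng(0)
A = rng.standard_normal((E, D))
x = rng.standard_normal(D)
b = rng.standard_normal(E)

def f(x):
    # Assumes sin acts elementwise on z = A x + b.
    return np.sin(A @ x + b)

# Chain rule: df/dx = (df/dz) (dz/dx).
z = A @ x + b
df_dz = np.diag(np.cos(z))   # E x E, diagonal because f_i depends only on z_i
dz_dx = A                    # E x D
df_dx = df_dz @ dz_dx        # E x D

# Central-difference estimate of the same Jacobian, one column per input x_d.
eps = 1e-6
numeric = np.zeros((E, D))
for d in range(D):
    step = np.zeros(D)
    step[d] = eps
    numeric[:, d] = (f(x + step) - f(x - step)) / (2 * eps)

print(df_dz.shape, dz_dx.shape, df_dx.shape)  # (3, 3) (3, 4) (3, 4)
print(np.allclose(df_dx, numeric, atol=1e-6))
```

Because df/dz is diagonal, the product can also be written as `np.cos(z)[:, None] * A`, which avoids forming the E × E matrix explicitly.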