In many machine learning applications, we find good model parameters by performing gradient descent, which relies on the fact that we can compute the gradient of a learning objective with respect to the parameters of the model. For a given objective function, we can obtain the gradient with respect to the model parameters using calculus and the chain rule. We have already seen the gradient of a squared loss with respect to the parameters of a linear regression model.

Consider the function

$f(x)=\sqrt{x^2+\exp(x^2)}+\cos\left(x^2+\exp(x^2)\right)$

By application of the chain rule, and noting that differentiation is linear, we compute the gradient

$\frac{\mathrm{d} f}{\mathrm{d} x}=\frac{2x + 2x\exp(x^2)}{2\sqrt{x^2+\exp(x^2)}}-\sin\left(x^2+\exp(x^2)\right)\left(2x + 2x\exp(x^2)\right)$

Writing out the gradient in this explicit way is often impractical, since it results in a very lengthy expression for the derivative. In practice, it means that, if we are not careful, the implementation of the gradient could be significantly more expensive than computing the function, which is an unnecessary overhead.
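As a quick check of the hand-derived gradient, here is a minimal sketch in Python using SymPy (the choice of SymPy is an assumption; any symbolic differentiation tool would do). It differentiates $f(x)$ symbolically and confirms that the result agrees with the expression above.

```python
# A minimal sketch, assuming SymPy is available, to verify the
# hand-derived gradient of f(x) = sqrt(x^2 + exp(x^2)) + cos(x^2 + exp(x^2)).
import sympy as sp

x = sp.symbols('x')
u = x**2 + sp.exp(x**2)      # the repeated inner expression
f = sp.sqrt(u) + sp.cos(u)

# Gradient obtained by SymPy's symbolic differentiation
df_auto = sp.diff(f, x)

# Gradient derived by hand via the chain rule (as in the text)
df_hand = (2*x + 2*x*sp.exp(x**2)) / (2*sp.sqrt(x**2 + sp.exp(x**2))) \
          - sp.sin(x**2 + sp.exp(x**2)) * (2*x + 2*x*sp.exp(x**2))

# The difference simplifies to 0, confirming the two expressions agree
print(sp.simplify(df_auto - df_hand))
```

Even for this small example, note how lengthy the explicit derivative is compared with $f(x)$ itself; this is the overhead that motivates more careful gradient computation.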