Computing the gradient can be very time-consuming. However, it is often possible to find a “cheap” approximation of the gradient. An approximate gradient is still useful as long as it points in roughly the same direction as the true gradient. Stochastic gradient descent (often shortened to SGD) is a stochastic approximation of the gradient descent method for minimizing an objective function that is written as a sum of differentiable functions. The word stochastic here refers to the fact that we do not know the gradient precisely, but only a noisy approximation to it. By constraining the probability distribution of the approximate gradients, we can still theoretically guarantee that SGD will converge. In machine learning, given $N$ data points indexed by $n = 1,\ldots,N$, we often consider objective functions that are the sum of the losses $L_n(\theta)$ incurred by each example $n$. In mathematical notation, we have the form $L(\theta)=\sum_{n=1}^N ...
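To make the idea of a cheap, noisy gradient concrete, here is a minimal Python sketch of mini-batch SGD. It is only an illustration under stated assumptions: the per-example loss is taken to be the squared error of a linear model, $L_n(\theta) = \tfrac{1}{2}(x_n^\top\theta - y_n)^2$, and the function name `sgd_least_squares`, the step size `lr`, the batch size, and the synthetic data are all hypothetical choices made for this example, not part of the course material.

```python
import numpy as np

def sgd_least_squares(X, y, lr=0.05, epochs=50, batch_size=10, seed=0):
    """Mini-batch SGD for an assumed squared-error loss (illustrative only)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    theta = np.zeros(D)
    for _ in range(epochs):
        # Visit the data in a fresh random order each epoch.
        perm = rng.permutation(N)
        for start in range(0, N, batch_size):
            idx = perm[start:start + batch_size]
            residual = X[idx] @ theta - y[idx]
            # Average gradient over the mini-batch: a noisy estimate of the
            # full gradient, up to a constant factor absorbed by the step size.
            grad = X[idx].T @ residual / len(idx)
            theta -= lr * grad
    return theta

# Synthetic data (assumed for the demo): y = X @ true_theta + small noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
true_theta = np.array([2.0, -3.0])
y = X @ true_theta + 0.1 * rng.normal(size=200)

theta_hat = sgd_least_squares(X, y)
print(theta_hat)  # should be close to true_theta
```

Each update uses only `batch_size` examples rather than all $N$, so a step is much cheaper than a full gradient step; the price is noise in the direction, which the small step size averages out over many updates.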
This blog is written for the following two KTU courses, using Python: CST284 Mathematics for Machine Learning (KTU minor course) and CST294 Computational Fundamentals for Machine Learning (KTU honors course). Queries can be sent to Dr Binu V P, 9847390760.