
Mathematics for Machine Learning - CST 284 - KTU Minor Notes - Dr Binu V P

Recent posts

Vectors in Machine Learning

As data scientists, we work with data in various formats such as text, images, and numerical values. We often use vectors to represent data in a structured and efficient manner, especially in machine learning applications. In this blog post we will explore what vectors are in the context of machine learning, their significance, and how they are used.

What is a Vector?

In mathematics, a vector is an object that has both magnitude and direction. In machine learning, a vector is a mathematical representation of a set of numerical values. Vectors are usually represented as arrays or lists of numbers, and each number in the list represents a specific feature or attribute of the data. For example, suppose we have a dataset of houses and we want to predict their prices based on features such as the number of bedrooms, the size of the house, and the location. We can represent each house as a vector, where each element of the vector encodes a specific feature of the house, such as the number of bedrooms.
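As a rough sketch of this idea (the feature values below are invented purely for illustration), each house can be stored as a NumPy array, and a small dataset becomes a matrix whose rows are feature vectors:

```python
import numpy as np

# One house described by three features:
# [number of bedrooms, size in square metres, distance to city centre in km]
house = np.array([3, 120.0, 5.5])

# A small dataset: each row is the feature vector of one house
houses = np.array([
    [3, 120.0, 5.5],
    [2,  75.0, 2.0],
    [4, 200.0, 12.3],
])

print(house.shape)   # (3,)   -> one vector with 3 features
print(houses.shape)  # (3, 3) -> 3 houses, 3 features each
```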

Difference between Batch Gradient Descent and Stochastic Gradient Descent

Gradient Descent

Gradient Descent is a generic optimization algorithm capable of finding optimal solutions to a wide range of problems. The general idea is to tweak parameters iteratively in order to minimize the cost function. An important parameter of Gradient Descent (GD) is the size of the steps, determined by the learning rate hyperparameter. If the learning rate is too small, the algorithm will need many iterations to converge, which takes a long time; if it is too high, we may jump over the optimal value.

Note: When using Gradient Descent, we should ensure that all features have a similar scale (e.g. using Scikit-Learn's StandardScaler class), or else it will take much longer to converge.

Types of Gradient Descent: Typically, there are three types of Gradient Descent: Batch Gradient Descent, Stochastic Gradient Descent, and Mini-batch Gradient Descent. A sketch comparing the batch and stochastic update rules is given below.

Stochastic Gradient Descent (SGD): The word 'stochastic' means a system or process linked with a random probability.
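To make the difference concrete, here is a minimal NumPy sketch of the two update rules on a made-up linear-regression problem (illustrative only; the data, learning rates, and iteration counts are assumptions, not taken from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
X = np.c_[np.ones(m), rng.uniform(0, 2, m)]        # bias column + one feature
y = 4 + 3 * X[:, 1] + rng.normal(0, 0.5, m)        # y = 4 + 3x + noise

# Batch Gradient Descent: each step uses the gradient over the whole training set
theta_batch = np.zeros(2)
eta = 0.1
for _ in range(1000):
    grad = (2 / m) * X.T @ (X @ theta_batch - y)
    theta_batch -= eta * grad

# Stochastic Gradient Descent: each step uses one randomly chosen instance
theta_sgd = np.zeros(2)
eta = 0.01                                         # smaller step to tame the noise
for epoch in range(50):
    for _ in range(m):
        i = rng.integers(m)
        xi, yi = X[i:i + 1], y[i:i + 1]
        grad = 2 * xi.T @ (xi @ theta_sgd - yi)
        theta_sgd -= eta * grad

print("batch GD:", theta_batch)   # both should land near [4, 3]
print("SGD     :", theta_sgd)
```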

Orthogonal Subspace

We said that two vectors $v$ and $w$ are orthogonal if their dot product, $v \cdot w$, is $0$. In $\mathbb{R}^2$ or $\mathbb{R}^3$ this matches our geometric understanding of orthogonality, and in higher dimensions the idea still applies, even though we can't visualize it. Consider the vectors $a=\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}, b=\begin{pmatrix} 2 \\ 1 \\ 0 \\ -1 \end{pmatrix}$. These two vectors are orthogonal because their dot product is $1\cdot 2+2\cdot 1+3\cdot 0+4\cdot (-1)=0$. Now we can extend these definitions to subspaces of a vector space.

Definition: Two subspaces $V$ and $W$ of a vector space are orthogonal if every vector $v \in V$ is perpendicular to every vector $w \in W$.

As a simple example, in $\mathbb{R}^2$ the span of $\begin{pmatrix}1 \\ 0 \end{pmatrix}$ is the set of all vectors of the form $\begin{pmatrix}c \\ 0 \end{pmatrix}$, where $c$ is some real number, while the span of $\begin{pmatrix}0 \\ 1 \end{pmatrix}$ is the set of all vectors of the form $\begin{pmatrix}0 \\ d \end{pmatrix}$, where $d$ is some real number.
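A quick numerical check of the example above, as a small NumPy sketch (not part of the original post):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([2, 1, 0, -1])

# Two vectors are orthogonal when their dot product is zero
print(np.dot(a, b))            # 1*2 + 2*1 + 3*0 + 4*(-1) = 0

# The subspace example in R^2: every vector (c, 0) is
# perpendicular to every vector (0, d)
c, d = 3.7, -1.2
print(np.dot([c, 0], [0, d]))  # 0.0, for any choice of c and d
```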

5.7 Quadratic Programming

Consider the case of a convex quadratic objective function with affine constraints, i.e., $\min_{x \in \mathbb{R}^d} \quad \frac{1}{2}x^TQx+c^Tx$ subject to $Ax \le b$, where $A \in \mathbb{R}^{m \times d}$, $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^d$. The square symmetric matrix $Q \in \mathbb{R}^{d \times d}$ is positive definite, and therefore the objective function is convex. This is known as a quadratic program. Observe that it has $d$ variables and $m$ linear constraints.

Consider the quadratic program in two variables $\min_{x \in \mathbb{R}^2}\frac{1}{2}\begin{bmatrix} x_1\\ x_2 \end{bmatrix}^T\begin{bmatrix} 2 & 1\\ 1 & 4 \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix}+\begin{bmatrix} 5\\ 3 \end{bmatrix}^T\begin{bmatrix} x_1\\ x_2 \end{bmatrix}$ subject to $\begin{bmatrix} 1 & 0\\ -1& 0\\ 0& 1\\ 0& -1 \end{bmatrix}\begin{bmatrix} x_1\\ x_2 \end{bmatrix} \le \begin{bmatrix} 1\\ 1\\ 1\\ 1 \end{bmatrix}$. The four inequality constraints simply box the solution into $-1 \le x_1 \le 1$ and $-1 \le x_2 \le 1$; this is illustrated in the figure.
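A small numerical sketch of this two-variable example, assuming the CVXPY library is available (any QP solver would do; the code is illustrative rather than part of the original notes):

```python
import numpy as np
import cvxpy as cp

# Problem data from the example above
Q = np.array([[2.0, 1.0],
              [1.0, 4.0]])
c = np.array([5.0, 3.0])
A = np.array([[ 1.0,  0.0],
              [-1.0,  0.0],
              [ 0.0,  1.0],
              [ 0.0, -1.0]])
b = np.ones(4)                      # Ax <= b boxes x into [-1, 1] x [-1, 1]

x = cp.Variable(2)
objective = cp.Minimize(0.5 * cp.quad_form(x, Q) + c @ x)
problem = cp.Problem(objective, [A @ x <= b])
problem.solve()

print("optimal x:", x.value)
print("optimal value:", problem.value)
```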

5.6 Linear Programming

In mathematics, linear programming is a method of optimising an objective subject to constraints. The main objective of linear programming is to maximize or minimize a numerical value. It deals with linear functions subject to constraints in the form of linear equations or inequalities. Linear programming is considered an important technique for finding the optimum use of resources. The term "linear programming" consists of two words, linear and programming: the word "linear" describes relationships between variables of degree one, and the word "programming" refers to the process of selecting the best solution from various alternatives. Linear programming is widely used in mathematics and in fields such as economics, business, telecommunication, and manufacturing. In this article, let us discuss the definition of linear programming, its components, and different methods to solve linear programming problems.

What is Linear Programming?
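As a minimal illustration of solving a linear program in Python, here is a sketch using scipy.optimize.linprog on a made-up two-variable problem (the profit coefficients and constraints are hypothetical; linprog minimizes, so a maximization objective is negated):

```python
from scipy.optimize import linprog

# Hypothetical example: maximize profit 40*x1 + 30*x2
# subject to  x1 + x2 <= 12,  2*x1 + x2 <= 16,  x1 >= 0,  x2 >= 0.
# linprog minimizes, so we negate the objective coefficients.
c = [-40, -30]
A_ub = [[1, 1],
        [2, 1]]
b_ub = [12, 16]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x)        # optimal (x1, x2)
print(-res.fun)     # maximum value of the original objective
```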

5.5 Convex Optimization

Convex optimization problems are a particularly useful class of optimization problems, where we can guarantee global optimality. When $f(\cdot)$ is a convex function, and when the constraints involving $g(\cdot)$ and $h(\cdot)$ are convex sets, this is called a convex optimization problem. In this setting we have strong duality: the optimal value of the dual problem is the same as the optimal value of the primal problem.

A set $C$ is a convex set if for any $x, y \in C$ and for any scalar $\theta$ with $0 \le \theta \le 1$, we have $\theta x + (1 - \theta)y \in C$. In other words, convex sets are sets such that a straight line segment connecting any two elements of the set lies inside the set. Figures 7.5 and 7.6 illustrate convex and non-convex sets, respectively. Convex functions are functions such that a straight line between any two points of the function lies above the function.

(Figure: a non-convex function and a convex function.)

Definition: Let $f : \mathbb{R}^D \to \mathbb{R}$ be a function whose domain is a convex set. The function $f$ is a convex function if, for all $x, y$ in its domain and for any scalar $\theta$ with $0 \le \theta \le 1$, we have $f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta)f(y)$.
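As a small numerical illustration of this definition (a sketch, not from the original post), we can sample random points and test the inequality $f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta)f(y)$ for a convex function like $x^2$ and a non-convex one like $\sin(x)$:

```python
import numpy as np

def satisfies_convexity_inequality(f, n_trials=10_000, lo=-5.0, hi=5.0, seed=0):
    """Check f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y) on random samples.
    Passing the check does not prove convexity, but a single failure proves non-convexity."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, n_trials)
    y = rng.uniform(lo, hi, n_trials)
    theta = rng.uniform(0.0, 1.0, n_trials)
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    return bool(np.all(lhs <= rhs + 1e-12))

print(satisfies_convexity_inequality(lambda x: x**2))   # True  (x^2 is convex)
print(satisfies_convexity_inequality(np.sin))           # False (sin is not convex)
```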