NumPy arrays in Machine Learning

Arrays are the main data structure used in machine learning. In Python, arrays from the NumPy
library, called N-dimensional arrays or the ndarray, are used as the primary data structure for
representing data. 

NumPy (Numerical Python) is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors.
To use the numpy package in your program, import it as follows:
import numpy as np
Arrays
A numpy array is a grid of values, all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.
We can initialize numpy arrays from nested Python lists and access elements using square brackets.
Array indexing starts from 0.
Example:
import numpy as np
a = np.array([1, 2, 3]) # Create a rank 1 array 
print(type(a)) # Prints "<class 'numpy.ndarray'>" 
print(a.shape) # Prints "(3,)" 
print(a[0], a[1], a[2]) # Prints "1 2 3" 
a[0] = 5 # Change an element of the array 
print(a) # Prints "[5 2 3]"
print(a.size) # prints 3 
b = np.array([[1,2,3],[4,5,6]]) # Create a rank 2 array 
print(b.shape) # Prints "(2, 3)" 
print(b.ndim) # Prints 2
print(b[0, 0], b[0, 1], b[1, 0]) # Prints "1 2 4"

Numpy also provides many functions for intrinsic array  creation:
import numpy as np 
a = np.zeros((2,2)) # Create an array of all zeros 
print(a) # Prints "[[ 0. 0.]  [ 0. 0.]]" 
b = np.ones((1,2)) # Create an array of all ones
print(b) # Prints "[[ 1. 1.]]" 
c = np.full((2,2), 7) # Create a constant array 
print(c) # Prints "[[7 7]  [7 7]]" (integers, because the fill value 7 is an int)
d = np.eye(2) # Create a 2x2 identity matrix 
print(d) # Prints "[[ 1. 0.] [ 0. 1.]]"
e = np.random.random((2,2)) # Create an array filled with random values 
print(e) # Might print "[[ 0.91940167    0.08143941] [ 0.68744134   0.87236687]]"
>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.arange(2, 10, dtype=float)
array([2., 3., 4., 5., 6., 7., 8., 9.])
>>> np.arange(2, 3, 0.1)
array([2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9])
>>> b = np.arange(12).reshape(4,3) # 2d array
>>> print(b)
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
>>> from numpy import pi
>>> np.linspace( 0, 2, 9 ) # 9 numbers from 0 to 2
array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])
>>> x = np.linspace( 0, 2*pi, 100 ) # useful to evaluate function at lots of points
>>> f = np.sin(x)
Array Indexing, Slicing and Iterating
One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.
>>> a = np.arange(10)**3
>>> a
array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])
>>> a[2]
8
>>> a[2:5]
array([ 8, 27, 64])
>>> # equivalent to a[0:6:2] = 1000;
>>> # from start to position 6, exclusive, set every 2nd element to 1000
>>> a[:6:2] = 1000
>>> a
array([1000,    1, 1000,   27, 1000,  125,  216,  343,  512,  729])
>>> a[::-1] # reversed a
array([ 729,  512,  343,  216,  125, 1000,   27, 1000,    1, 1000])
>>> for i in a:
...     print(i**(1/3.))
...
9.999999999999998
1.0
9.999999999999998
3.0
9.999999999999998
4.999999999999999
5.999999999999999
6.999999999999999
7.999999999999999
8.999999999999998
>>> a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
>>> # Use slicing to pull out the subarray consisting of the first 2 rows
>>> # and columns 1 and 2; b is the following array of shape (2, 2):
>>> b = a[:2, 1:3]
>>> b
array([[2, 3],
       [6, 7]])
There are two ways of accessing the data in the middle row of the array. Mixing integer indexing with slices yields an array of lower rank, while using only slices yields an array of the same rank as the original array:
row_r1 = a[1, :] # Rank 1 view of the second row of a
row_r2 = a[1:2, :] # Rank 2 view of the second row of a
print(row_r1, row_r1.shape) # Prints "[5 6 7 8] (4,)"
print(row_r2, row_r2.shape) # Prints "[[5 6 7 8]] (1, 4)"
# We can make the same distinction when accessing columns of an array:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
One useful trick with integer array indexing is selecting or mutating one element from each row of a matrix:
import numpy as np
# Create a new array from which we will select elements
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
print(a) # Prints "[[ 1  2  3]  [ 4  5  6]  [ 7  8  9]  [10 11 12]]"
# Create an array of indices
b = np.array([0, 2, 0, 1])
# Select one element from each row of a using the indices in b
print(a[np.arange(4), b]) # Prints "[ 1  6  7 11]"
# Mutate one element from each row of a using the indices in b
a[np.arange(4), b] += 10
print(a) # Prints "[[11  2  3]  [ 4  5 16]  [17  8  9]  [10 21 12]]"
>>> b = np.array([[ 0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33], [40, 41, 42, 43]])
>>> b[2,3]
23
>>> b[0:5, 1] # each row in the second column of b
array([ 1, 11, 21, 31, 41])
>>> b[:, 1] # equivalent to the previous example
array([ 1, 11, 21, 31, 41])
>>> b[1:3, :] # each column in the second and third row of b
array([[10, 11, 12, 13],
       [20, 21, 22, 23]])

When fewer indices are provided than the number of axes, the missing indices are considered complete slices:
>>> b[-1] # the last row. Equivalent to b[-1,:]
array([40, 41, 42, 43])
The expression within brackets in b[i] is treated as an i followed by as many instances of : as needed to represent the remaining axes. NumPy also allows you to write this using dots as b[i,...].
The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is an array with 5 axes, then x[1,2,...] is equivalent to x[1,2,:,:,:]
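The equivalence described above can be checked directly. The following is a minimal sketch; the 5-axis array x and its shape are made up purely for illustration.

```python
import numpy as np

# Hypothetical 5-axis array, used only for illustration
x = np.arange(2 * 3 * 4 * 5 * 6).reshape(2, 3, 4, 5, 6)

full = x[1, 2, :, :, :]   # explicit colons for the remaining axes
dots = x[1, 2, ...]       # the ellipsis fills in the remaining axes
last = x[..., 0]          # the ellipsis can also stand for the leading axes

print(dots.shape)                  # (4, 5, 6)
print(np.array_equal(full, dots))  # True
print(last.shape)                  # (2, 3, 4, 5)
```

Only one ellipsis may appear in a single index expression, since NumPy could not otherwise tell how many axes each one stands for.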

Indexing with Arrays of Indices
import numpy as np
a = np.array([10,11,12,13,14,15,16,17,18,19,20])
i=np.array([3,4,5])
print(a[i]) # will print [13 14 15]
j = np.array([[3, 4], [5, 6]]) # a two-dimensional array of indices
print(a[j]) # will print [[13 14] [15 16]], the same shape as j
a = np.array([[10,11,12],[13,14,15],[16,17,18],[19,20,21]])
i=np.array([1,2])
print(a[i]) # will print [[13 14 15][16 17 18]]
a = np.array([[10,11,12],[13,14,15],[16,17,18],[19,20,21]])
i=np.array([[1,2],[2,3]])
print(a[i])
#will print
[[[13 14 15]
[16 17 18]]

[[16 17 18]
[19 20 21]]]
We can also give indexes for more than one dimension. The arrays of indices for each dimension must have the same shape.
a = np.array([[10,11,12],[13,14,15],[16,17,18],[19,20,21]])
i=np.array([[1,2],[2,3]])
j=np.array([[1,1],[2,2]])
print(a[i,j])
#will print
[[14 17]
[18 21]]
Another common use of indexing with arrays is the search of the maximum value of time-dependent series:
>>> time = np.linspace(20, 145, 5) # time scale
>>> data = np.sin(np.arange(20)).reshape(5,4) # 4 time-dependent series 
>>> time 
array([ 20. , 51.25, 82.5 , 113.75, 145. ]) 
>>> data 
array([[ 0. , 0.84147098, 0.90929743, 0.14112001], 
           [-0.7568025 , -0.95892427, -0.2794155 , 0.6569866 ], 
           [ 0.98935825, 0.41211849, -0.54402111, -0.99999021], 
           [-0.53657292, 0.42016704, 0.99060736, 0.65028784], 
           [-0.28790332, -0.96139749, -0.75098725, 0.14987721]]) 
>>> # index of the maxima for each series
>>> ind = data.argmax(axis=0)
>>> ind
array([2, 0, 3, 1])
>>> # times corresponding to the maxima
>>> time_max = time[ind]
>>> data_max = data[ind, range(data.shape[1])] # => data[ind[0],0], data[ind[1],1]... 
 >>> time_max 
array([ 82.5 , 20. , 113.75, 51.25]) 
>>> data_max 
array([0.98935825, 0.84147098, 0.99060736, 0.6569866 ])
You can also use indexing with arrays as a target to assign to:
>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[1,3,4]] = 0
>>> a
array([0, 0, 2, 0, 0])
However, when the list of indices contains repetitions, the assignment is done several times, leaving behind the last value:
>>> a = np.arange(5)
>>> a[[0,0,2]]=[1,2,3] 
>>> a 
array([2, 1, 3, 3, 4])
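The same "last write wins" behaviour affects in-place operators such as +=, which can be surprising when counting occurrences. The sketch below is an illustration rather than part of the original example; it uses np.add.at, which applies the operation once per occurrence of each index.

```python
import numpy as np

a = np.zeros(5, dtype=int)
a[[0, 0, 2]] += 1                 # index 0 is repeated, but is incremented only once
print(a)                          # [1 0 1 0 0]

counts = np.zeros(5, dtype=int)
np.add.at(counts, [0, 0, 2], 1)   # unbuffered: every occurrence is applied
print(counts)                     # [2 0 1 0 0]
```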
Boolean array indexing
Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:
import numpy as np
a = np.array([[1,2], [3, 4], [5, 6]]) 
bool_idx = (a > 2) 
# Find the elements of a that are bigger than 2;
# this returns a numpy array of Booleans of the same
# shape as a, where each slot of bool_idx tells
# whether that element of a is > 2.
print(bool_idx) # Prints "[[False False]  [ True  True]  [ True  True]]"
# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True values
# of bool_idx
print(a[bool_idx]) # Prints "[3 4 5 6]"
Let's see another example:
>>> a = np.arange(12).reshape(3,4)
>>> b1 = np.array([False,True,True]) # first dim selection
>>> a[b1,:] # selecting rows
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
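Boolean masks can also be combined and used as assignment targets. The following is a small sketch of both ideas; the names mask and clipped are chosen here for illustration only.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Combine conditions with & (and), | (or), ~ (not); the parentheses are required
mask = (a > 2) & (a % 2 == 0)
print(a[mask])        # [ 4  6  8 10]

# A boolean mask can also be the target of an assignment
clipped = a.copy()
clipped[clipped > 8] = 8
print(clipped[2])     # [8 8 8 8]
```

Python's own `and` and `or` keywords do not work elementwise on arrays, which is why the bitwise operators are used here.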
Iterating over multidimensional arrays is done with respect to the first axis:
>>> a = np.array([[1,2], [3, 4], [5, 6]])
>>> for row in a:
...     print(row)
...
[1 2]
[3 4]
[5 6]

However, if one wants to perform an operation on each element in the array, one can use the flat attribute, which is an iterator over all the elements of the array:
>>> for element in a.flat:
...     print(element)
...
1
2
3
4
5
6
Datatypes
Every numpy array is a grid of elements of the same type. Numpy provides a large set of numeric datatypes that you can use to construct arrays. Numpy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype. Here is an example:
import numpy as np
x = np.array([1, 2]) # Let numpy choose the datatype
print(x.dtype) # Prints "int64" (on most 64-bit platforms)
x = np.array([1.0, 2.0]) # Let numpy choose the datatype
print(x.dtype) # Prints "float64"
x = np.array([1, 2], dtype=np.int64) # Force a particular datatype
print(x.dtype) # Prints "int64"
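The dtype argument is also accepted by the array creation functions, and an existing array can be converted with astype. This is a minimal sketch with made-up values, not part of the original example:

```python
import numpy as np

z = np.zeros((2, 2), dtype=np.int32)   # dtype argument on a creation function
print(z.dtype)                         # int32

x = np.array([1.7, 2.2, -0.5])
y = x.astype(np.int64)                 # astype returns a converted copy
print(y)                               # [1 2 0]
print(x.dtype, y.dtype)                # float64 int64
```

Note that converting floats to integers truncates toward zero rather than rounding, and that x itself is left unchanged.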
Basic Operations
Arithmetic operators on arrays apply element wise. A new array is created and filled with the result.
>>> a = np.array( [20,30,40,50] )
>>> b = np.arange( 4 )
>>> b
array([0, 1, 2, 3])
>>> c = a-b
>>> c
array([20, 29, 38, 47])
>>> b**2
array([0, 1, 4, 9])
>>> 10*np.sin(a)
array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])
>>> a<35
array([ True,  True, False, False])
Basic Arithmetic
import numpy as np
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64) 
# Elementwise sum; both produce the array  [[ 6.0 8.0]  [10.0 12.0]] 
print(x + y) 
print(np.add(x, y))
# Element wise difference; both produce the array  [[-4.0 -4.0]  [-4.0 -4.0]]
print(x - y) 
print(np.subtract(x, y))
# Elementwise product; both produce the array [[ 5.0 12.0]  [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))
# Element wise division; both produce the array  [[ 0.2 0.33333333]  [ 0.42857143 0.5 ]] 
print(x / y) 
print(np.divide(x, y))
# Elementwise square root; produces the array  [[ 1. 1.41421356]  [ 1.73205081 2. ]]
print(np.sqrt(x))
Dot product
v = np.array([9,10]) 
w = np.array([11, 12])
 # Inner product of vectors; both produce 219 
print(v.dot(w)) 
print(np.dot(v, w))
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])
# Matrix / vector product; both produce the rank 1 array [29 67] 
print(x.dot(v)) 
print(np.dot(x, v)) 
# Matrix / matrix product; all three produce the rank 2 array [[19 22]  [43 50]]
print(x.dot(y))
print(np.dot(x, y))
print(x @ y) # the @ operator requires Python 3.5 or later
Numpy provides many useful functions for performing computations on arrays; one of the most useful is sum:
x = np.array([[1,2],[3,4]]) 
print(np.sum(x)) # Compute sum of all elements; prints "10" 
print(np.sum(x, axis=0)) # Compute sum of each column; prints "[4 6]" 
print(np.sum(x, axis=1)) # Compute sum of each row; prints "[3 7]"
Transpose
print(x.T) # Prints "[[1 3] [2 4]]"
Min and Max
print(x.min()) # Prints "1"; min is a method, so it must be called
print(x.max()) # Prints "4"
Universal Functions
NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called "universal functions" (ufunc). Within NumPy, these functions operate elementwise on an array, producing an array as output.
>>> B = np.arange(3)
>>> B
array([0, 1, 2])
>>> np.exp(B)
array([1.        , 2.71828183, 7.3890561 ])
>>> np.sqrt(B)
array([0.        , 1.        , 1.41421356])
>>> C = np.array([2., -1., 4.])
>>> np.add(B, C)
array([2., 0., 6.])
The ix_() function
The ix_ function can be used to combine different vectors so as to obtain the result for each n-tuple. For example, to compute a+b*c for all the triplets taken from the vectors a, b and c:
>>> a = np.array([2,3,4,5])
>>> b = np.array([8,5,4])
>>> c = np.array([5,4,6,8,3])
>>> ax,bx,cx = np.ix_(a,b,c)
>>> result = ax+bx*cx
>>> result[3,2,4] # a[3] + b[2]*c[4] = 5 + 4*3
17
Structured arrays
Structured arrays are ndarrays whose datatype is a composition of simpler datatypes organized as a sequence of named fields. For example:
# creating a structured array
x = np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)], dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
# printing the array
print(x)
# printing the second record (structure); index 0 would be the first
print(x[1])
# printing the names in all records
print(x['name'])
# printing the name in the second record
print(x[1]['name'])
Note: Here, string indexing (by field name) is used to access the data.
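Because each field behaves like an ordinary array, fields can be used for filtering and can be updated in place. The following is a small sketch built on the same two records; the name older and the age threshold are illustrative choices.

```python
import numpy as np

x = np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)],
             dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

# A field is an ordinary array, so it can build a boolean mask
older = x[x['age'] > 5]
print(older['name'])          # ['Rex']

# Fields can be assigned to; this updates every record
x['age'] += 1
print(x['age'])               # [10  4]
print(x['weight'].mean())     # 54.0
```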
Vector Stacking
How do we construct a 2D array from a list of equally-sized row vectors? If x and y are two vectors of the same length, NumPy does this via the functions column_stack, dstack, hstack and vstack, depending on the dimension in which the stacking is to be done. For example:
>>> x = np.arange(0,10,2)
>>> y = np.arange(5)
>>> m = np.vstack([x,y])
>>> m
array([[0, 2, 4, 6, 8],
       [0, 1, 2, 3, 4]])
>>> xy = np.hstack([x,y])
>>> xy
array([0, 2, 4, 6, 8, 0, 1, 2, 3, 4])
>>> a = np.array([4.,2.])
>>> b = np.array([3.,8.])
>>> np.column_stack((a,b)) # returns a 2D array
array([[4., 3.],
       [2., 8.]])
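Comparing the result shapes is a quick way to see how the four stacking functions differ. This sketch reuses the vectors x and y from the example above; the variable names v, c, h and d are introduced here only for illustration.

```python
import numpy as np

x = np.arange(0, 10, 2)   # shape (5,)
y = np.arange(5)          # shape (5,)

v = np.vstack([x, y])         # one vector per row
c = np.column_stack([x, y])   # one vector per column
h = np.hstack([x, y])         # joined end to end
d = np.dstack([x, y])         # stacked along a third axis

print(v.shape)   # (2, 5)
print(c.shape)   # (5, 2)
print(h.shape)   # (10,)
print(d.shape)   # (1, 5, 2)
```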
Splitting one array into smaller ones
Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return, or by specifying the columns after which the division should occur:
import numpy as np
x=np.array([[1,2,3,4],[5,6,7,8]])
y,z=np.hsplit(x,2) #splitting into 2
print(y)
print(z)
[[1 2]
[5 6]]

[[3  4]
[7 8]]
x=np.array([[1,2,3,4],[5,6,7,8]])
y,z,k=np.hsplit(x,(1,3))
print(y)
print(z)
print(k)
[[1]
[5]] 

[[2 3]
[6 7]] 

[[4]
[8]]
Copies and Views
When operating on and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is often a source of confusion for beginners. There are three cases:
No copy at all
x=np.array([1,2,3,4])
y=x
print(id(x))
print(id(y))
x[1]=5
print(x,y)
32677040
32677040
[1 5 3 4] [1 5 3 4]
Note that in the above case both x and y refer to the same array object, and hence the same memory location.
View or Shallow Copy
Different array objects can share the same data. The view method creates a new array object that looks at the same data.
x=np.array([1,2,3,4])
y=x.view()
print(id(x))
print(id(y))
x[1]=5
print(x,y)
30054832
29911504
[1 5 3 4] [1 5 3 4]
Note that slicing an array also returns a view (shallow copy):
y=x[:]
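Whether two arrays share data can be checked rather than guessed. The sketch below uses np.shares_memory and the base attribute; the names s, f and c are illustrative. It also shows that indexing with a list of indices returns a copy, unlike slicing.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
s = x[1:3]         # basic slicing: a view
f = x[[1, 2]]      # indexing with a list of indices: a copy
c = x.copy()       # explicit deep copy

print(np.shares_memory(x, s))   # True
print(np.shares_memory(x, f))   # False
print(s.base is x)              # True

s[0] = 99          # writing through the view changes x
print(x)           # [ 1 99  3  4]
print(f, c)        # [2 3] [1 2 3 4]
```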
Deep Copy
The copy method makes a complete copy of the array and its data.
x=np.array([1,2,3,4])
y=x.copy()
print(id(x))
print(id(y))
x[1]=5
print(x,y)
30146912
30196080
[1 5 3 4] [1 2 3 4]
Array Broadcasting
Broadcasting is the name given to the method that NumPy uses to allow array arithmetic between arrays with a different shape or size. Although the technique was developed for NumPy, it has also been adopted more broadly in other numerical computational libraries, such as Theano, TensorFlow, and Octave. Broadcasting solves the problem of arithmetic between arrays of differing shapes by in effect replicating the smaller array along the last mismatched dimension.
For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this:
import numpy as np
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = np.empty_like(x) # Create an empty matrix with the same shape as x
# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v
print(y)
[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]
This works; however when the matrix x is very large, computing an explicit loop in Python could be slow. Note that adding the vector v to each row of the matrix x is equivalent to forming a matrix vv by stacking multiple copies of v vertically, then performing element wise summation of x and vv. We could implement this approach like this:
import numpy as np
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
vv = np.tile(v, (4, 1)) # Stack 4 copies of v on top of each other
print(vv)
# Prints "[[1 0 1]  [1 0 1]  [1 0 1]  [1 0 1]]"
y = x + vv # Add x and vv elementwise
print(y) # Prints "[[ 2  2  4]  [ 5  5  7]  [ 8  8 10]  [11 11 13]]"
Numpy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:
import numpy as np
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v # Add v to each row of x using broadcasting
print(y)
[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]
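Consider another example, in which a column vector is added to each column of a matrix:
import numpy as np
x=np.array([[1,2],[3,4]])
y=np.array([1,0])
print(x+y.reshape(2,1)) # y.reshape(2,1) is [[1] [0]]: 1 is added to the first row, 0 to the second
[[2 3]
 [3 4]]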
Broadcasting typically makes your code more concise and faster, so you should strive to use it where possible.
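A common machine learning use of broadcasting is centering data. The sketch below is an illustration under assumed data (the same 4x3 matrix as above, stored as floats); it subtracts per-column and per-row means, and shows what happens when shapes cannot be aligned.

```python
import numpy as np

x = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.], [10., 11., 12.]])

col_mean = x.mean(axis=0)                  # shape (3,)
row_mean = x.mean(axis=1, keepdims=True)   # shape (4, 1)

centered_cols = x - col_mean   # (4, 3) - (3,)   -> (4, 3)
centered_rows = x - row_mean   # (4, 3) - (4, 1) -> (4, 3)

print(centered_cols.mean(axis=0))   # every column mean is now 0
print(centered_rows.mean(axis=1))   # every row mean is now 0

# Shapes that cannot be aligned raise a ValueError
try:
    x + np.array([1., 2.])          # (4, 3) + (2,)
    failed = False
except ValueError as err:
    failed = True
    print("broadcast failed:", err)
```

Shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. That is why keepdims=True is needed for the row means: it keeps the shape (4, 1) instead of (4,).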
Simple Linear Algebra  Operations
>>> import numpy as np
>>> a = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> print(a)
[[1. 2.]
 [3. 4.]]
>>> a.transpose()
array([[1., 3.],
       [2., 4.]])
>>> np.linalg.inv(a)
array([[-2. ,  1. ],
       [ 1.5, -0.5]])
>>> u = np.eye(2) # unit 2x2 matrix; "eye" represents "I"
>>> u
array([[1., 0.],
       [0., 1.]])
>>> j = np.array([[0.0, -1.0], [1.0, 0.0]])
>>> j @ j        # matrix product
array([[-1.,  0.],
       [ 0., -1.]])
>>> np.trace(u)  # trace
2.0
>>> y = np.array([[5.], [7.]])
>>> np.linalg.solve(a, y)
array([[-3.],
       [ 4.]])
>>> np.linalg.eig(j)
(array([0.+1.j, 0.-1.j]), array([[0.70710678+0.j        , 0.70710678-0.j        ],
       [0.        -0.70710678j, 0.        +0.70710678j]]))
np.linalg.eig returns two arrays, here called w and v:
    w: the eigenvalues, each repeated according to its multiplicity.
    v: the normalized (unit "length") eigenvectors, such that the
    column v[:,i] is the eigenvector corresponding to the
    eigenvalue w[i].
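The relationship between the two returned arrays can be verified numerically. This is a minimal sketch using the matrix j from the session above; the names w and v follow the description of the return values.

```python
import numpy as np

j = np.array([[0.0, -1.0], [1.0, 0.0]])
w, v = np.linalg.eig(j)

# Check j @ v[:, i] == w[i] * v[:, i] for every eigenpair
for i in range(len(w)):
    print(np.allclose(j @ v[:, i], w[i] * v[:, i]))   # True

# Each eigenvector column has unit length
print(np.allclose(np.linalg.norm(v, axis=0), 1.0))    # True
```

np.allclose is used instead of == because the results are floating-point (and here complex) values that match only up to rounding error.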

Solving a system of linear equations
Let
     2x1+3x2+5x3=10
     3x1-2x2+x3=3
     x1+5x2+7x3=8
The matrix representation is
Ax=b
where
A=[[ 2,  3, 5],
   [ 3, -2, 1],
   [ 1,  5, 7]]
b=[10,3,8]
The following is the Python code to solve the problem:
import numpy as np
A=np.array([[ 2 , 3, 5],
     [ 3,  -2 ,1],
     [ 1,  5 ,  7  ]])
 
b=np.array([10,3,8])
x=np.linalg.solve(A,b)
print(x)
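It is good practice to check a computed solution by substituting it back into the system. The following sketch repeats the setup so that it runs on its own; the determinant check is an added illustration of why the solution is unique.

```python
import numpy as np

A = np.array([[2, 3, 5],
              [3, -2, 1],
              [1, 5, 7]])
b = np.array([10, 3, 8])

x = np.linalg.solve(A, b)

# Substituting x back into the system should reproduce b
print(np.allclose(A @ x, b))   # True

# A non-zero determinant means the system has exactly one solution
print(np.linalg.det(A))        # approximately -13
```

np.linalg.solve raises LinAlgError when A is singular, so a successful call already implies a unique solution.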
