NumPy arrays in Machine Learning

Arrays are the main data structure used in machine learning. In Python, arrays from the NumPy library, called N-dimensional arrays or the ndarray, are used as the primary data structure for representing data.

NumPy (Numerical Python) is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors.

To use the NumPy package in your program, import it as follows:

import numpy as np

Arrays

A NumPy array is a grid of values, all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

We can initialize NumPy arrays from nested Python lists and access elements using square brackets. Array indexing starts from 0.

Example:

import numpy as np

a = np.array([1, 2, 3]) # Create a rank 1 array

print(type(a)) # Prints "<class 'numpy.ndarray'>"

print(a.shape) # Prints "(3,)"

print(a[0], a[1], a[2]) # Prints "1 2 3"

a[0] = 5 # Change an element of the array

print(a) # Prints "[5 2 3]"

print(a.size) # prints 3

b = np.array([[1,2,3],[4,5,6]]) # Create a rank 2 array

print(b.shape) # Prints "(2, 3)"

print(b.ndim) # Prints 2

print(b[0, 0], b[0, 1], b[1, 0]) # Prints "1 2 4"

NumPy also provides many functions for creating arrays:

import numpy as np

a = np.zeros((2,2)) # Create an array of all zeros

print(a) # Prints "[[ 0. 0.] [ 0. 0.]]"

b = np.ones((1,2)) # Create an array of all ones

print(b) # Prints "[[ 1. 1.]]"

c = np.full((2,2), 7) # Create a constant array

print(c) # Prints "[[7 7] [7 7]]" (the dtype follows the integer fill value)

d = np.eye(2) # Create a 2x2 identity matrix

print(d) # Prints "[[ 1. 0.] [ 0. 1.]]"

e = np.random.random((2,2)) # Create an array filled with random values

print(e) # Might print "[[ 0.91940167 0.08143941] [ 0.68744134 0.87236687]]"

>>> np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> np.arange(2, 10, dtype=float)

array([ 2., 3., 4., 5., 6., 7., 8., 9.])

>>> np.arange(2, 3, 0.1)

array([ 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9])

>>> b = np.arange(12).reshape(4,3) # 2d array

>>> print(b)

[[ 0 1 2]

[ 3 4 5]

[ 6 7 8]

[ 9 10 11]]

>>> from numpy import pi

>>> np.linspace( 0, 2, 9 ) # 9 numbers from 0 to 2

array([0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. ])

>>> x = np.linspace( 0, 2*pi, 100 ) # useful to evaluate function at lots of points

>>> f = np.sin(x)

Array Indexing, Slicing and Iterating

One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.

>>> a = np.arange(10)**3

>>> a

array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729])

>>> a[2]

8

>>> a[2:5]

array([ 8, 27, 64])

# equivalent to a[0:6:2] = 1000: from start to position 6, exclusive, set every 2nd element to 1000

>>> a[:6:2] = 1000

>>> a

array([1000, 1, 1000, 27, 1000, 125, 216, 343, 512, 729])

>>> a[ : :-1] # reversed a

array([ 729, 512, 343, 216, 125, 1000, 27, 1000, 1, 1000])

>>> for i in a:

...     print(i**(1/3.))

...

9.999999999999998

1.0

9.999999999999998

3.0

9.999999999999998

4.999999999999999

5.999999999999999

6.999999999999999

7.999999999999999

8.999999999999998

a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Use slicing to pull out the subarray consisting of the first 2 rows and columns 1 and 2; b is the following array of shape (2, 2):

>>> b = a[:2, 1:3]

>>> print(b)

[[2 3]

 [6 7]]

There are two ways of accessing the data in the middle row of the array. Mixing integer indexing with slices yields an array of lower rank, while using only slices yields an array of the same rank as the original array:

row_r1 = a[1, :] # Rank 1 view of the second row of a

row_r2 = a[1:2, :] # Rank 2 view of the second row of a

print(row_r1, row_r1.shape) # Prints "[5 6 7 8] (4,)"

print(row_r2, row_r2.shape) # Prints "[[5 6 7 8]] (1, 4)"

# We can make the same distinction when accessing columns of an array:

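col_r1 = a[:, 1] # Rank 1 view of the second column of a, shape (3,)

col_r2 = a[:, 1:2] # Rank 2 view of the second column of a, shape (3, 1)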

One useful trick with integer array indexing is selecting or mutating one element from each row of a matrix:

import numpy as np

# Create a new array from which we will select elements

a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])

print(a) # Prints "[[ 1 2 3] [ 4 5 6] [ 7 8 9] [10 11 12]]"

# Create an array of indices

b = np.array([0, 2, 0, 1])

# Select one element from each row of a using the indices in b

print(a[np.arange(4), b]) # Prints "[ 1 6 7 11]"

# Mutate one element from each row of a using the indices in b

a[np.arange(4), b] += 10

print(a) # Prints "[[11 2 3] [ 4 5 16] [17 8 9] [10 21 12]]"

>>> b = np.array([[ 0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33], [40, 41, 42, 43]])

>>> b[2,3]

23

>>> b[0:5, 1] # each row in the second column of b

array([ 1, 11, 21, 31, 41])

>>> b[ : ,1] # equivalent to the previous example

array([ 1, 11, 21, 31, 41])

>>> b[1:3, : ] # each column in the second and third row of b

array([[10, 11, 12, 13], [20, 21, 22, 23]])

When fewer indices are provided than the number of axes, the missing indices are considered complete slices:

>>> b[-1] # the last row. Equivalent to b[-1,:]

array([40, 41, 42, 43])

The expression within brackets in b[i] is treated as an i followed by as many instances of : as needed to represent the remaining axes. NumPy also allows you to write this using dots as b[i,...].

The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is an array with 5 axes, then x[1,2,...] is equivalent to x[1,2,:,:,:].
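
The equivalence between the dots and explicit colons can be checked directly. The following is a small sketch; the 5-axis array x and its shape (2, 3, 4, 5, 6) are made up purely for illustration.

```python
import numpy as np

# A hypothetical array with 5 axes, used only to illustrate the notation.
x = np.arange(2 * 3 * 4 * 5 * 6).reshape(2, 3, 4, 5, 6)

# The dots expand to as many full slices as needed.
left = x[1, 2, ...]
explicit = x[1, 2, :, :, :]
print(left.shape)                       # (4, 5, 6)
print(np.array_equal(left, explicit))   # True

# The dots may also appear at the front or in the middle.
print(x[..., 3].shape)      # (2, 3, 4, 5)
print(x[0, ..., 3].shape)   # (3, 4, 5)
```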

Indexing with Arrays of Indices
import numpy as np
a = np.array([10,11,12,13,14,15,16,17,18,19,20])
i=np.array([3,4,5])
print(a[i]) # will print [13 14 15]

j = np.array([[3, 4], [5, 6]]) # a two-dimensional array of indices

print(a[j]) # will print [[13 14] [15 16]], the same shape as j

a = np.array([[10,11,12],[13,14,15],[16,17,18],[19,20,21]])
i=np.array([1,2])
print(a[i]) # will print [[13 14 15][16 17 18]]

a = np.array([[10,11,12],[13,14,15],[16,17,18],[19,20,21]])
i=np.array([[1,2],[2,3]])
print(a[i])

#will print

[[[13 14 15]
[16 17 18]]

[[16 17 18]
[19 20 21]]]

We can also give indexes for more than one dimension. The arrays of indices for each dimension must have the same shape.

a = np.array([[10,11,12],[13,14,15],[16,17,18],[19,20,21]])

i=np.array([[1,2],[2,3]])

j=np.array([[1,1],[2,2]])

print(a[i,j])

#will print

[[14 17]
[18 21]]

Another common use of indexing with arrays is the search of the maximum value of time-dependent series:

>>> time = np.linspace(20, 145, 5) # time scale

>>> data = np.sin(np.arange(20)).reshape(5,4) # 4 time-dependent series

>>> time

array([ 20. , 51.25, 82.5 , 113.75, 145. ])

>>> data

array([[ 0. , 0.84147098, 0.90929743, 0.14112001],

[-0.7568025 , -0.95892427, -0.2794155 , 0.6569866 ],

[ 0.98935825, 0.41211849, -0.54402111, -0.99999021],

[-0.53657292, 0.42016704, 0.99060736, 0.65028784],

[-0.28790332, -0.96139749, -0.75098725, 0.14987721]])

# index of the maxima for each series

>>> ind = data.argmax(axis=0)

>>> ind

array([2, 0, 3, 1])

# times corresponding to the maxima

>>> time_max = time[ind]

>>> data_max = data[ind, range(data.shape[1])] # => data[ind[0],0], data[ind[1],1]...

>>> time_max

array([ 82.5 , 20. , 113.75, 51.25])

>>> data_max

array([0.98935825, 0.84147098, 0.99060736, 0.6569866 ])

You can also use indexing with arrays as a target to assign to:

>>> a = np.arange(5)

>>> a

array([0, 1, 2, 3, 4])

>>> a[[1,3,4]] = 0

>>> a

array([0, 0, 2, 0, 0])

However, when the list of indices contains repetitions, the assignment is done several times, leaving behind the last value:

>>> a = np.arange(5)

>>> a[[0,0,2]]=[1,2,3]

>>> a

array([2, 1, 3, 3, 4])
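
The same behaviour affects in-place operators such as +=, which can be surprising. The sketch below contrasts += with np.add.at, a NumPy function that applies the operation once per index occurrence; the arrays a and b are example values chosen for illustration.

```python
import numpy as np

a = np.arange(5)
a[[0, 0, 2]] += 1            # index 0 is incremented only once
print(a)                     # [1 1 3 3 4]

b = np.arange(5)
np.add.at(b, [0, 0, 2], 1)   # unbuffered: index 0 is incremented twice
print(b)                     # [2 1 3 3 4]
```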

Boolean array indexing

Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:

import numpy as np

a = np.array([[1,2], [3, 4], [5, 6]])

# Find the elements of a that are bigger than 2;

# this returns a NumPy array of Booleans of the same

# shape as a, where each slot of bool_idx tells

# whether that element of a is > 2.

bool_idx = (a > 2)

print(bool_idx) # Prints "[[False False] [ True True] [ True True]]"

# We use boolean array indexing to construct a rank 1 array

# consisting of the elements of a corresponding to the True values

# of bool_idx

print(a[bool_idx]) # Prints "[3 4 5 6]"

Let's see another example:

>>> a = np.arange(12).reshape(3,4)

>>> b1 = np.array([False,True,True]) # first dim selection

>>> a[b1,:] # selecting rows

array([[ 4, 5, 6, 7], [ 8, 9, 10, 11]])
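
A Boolean mask can also be used as the target of an assignment, and several conditions can be combined into one mask. The following is a minimal sketch; the condition (greater than 2 and even) and the replacement value -1 are arbitrary choices for illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Combine conditions with & (and), | (or), ~ (not); the parentheses are required.
mask = (a > 2) & (a % 2 == 0)
print(a[mask])        # [ 4  6  8 10]

# Assign through the mask: set every matching element to -1.
a[mask] = -1
print(a)
# [[ 0  1  2  3]
#  [-1  5 -1  7]
#  [-1  9 -1 11]]

# Count how many elements satisfy a condition.
print(np.count_nonzero(a < 0))   # 4
```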

Iterating over multidimensional arrays is done with respect to the first axis:

>>> a = np.array([[1,2], [3, 4], [5, 6]])

>>> for row in a:

...     print(row)

...

[1 2]

[3 4]

[5 6]

However, if one wants to perform an operation on each element in the array, one can use the flat attribute which is an iterator over all the elements of the array:

>>> for element in a.flat:

...     print(element)

...

1

2

3

4

5

6

Datatypes

Every numpy array is a grid of elements of the same type. Numpy provides a large set of numeric datatypes that you can use to construct arrays. Numpy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype. Here is an example:

import numpy as np

x = np.array([1, 2]) # Let numpy choose the datatype

print(x.dtype) # Prints "int64" (the default integer type is platform dependent)

x = np.array([1.0, 2.0]) # Let numpy choose the datatype

print(x.dtype) # Prints "float64"

x = np.array([1, 2], dtype=np.int64) # Force a particular datatype

print(x.dtype) # Prints "int64"
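
The dtype argument is accepted by most array constructors, and an existing array can be converted with astype. The sketch below uses made-up example values to show both, together with the type promotion that happens when dtypes are mixed.

```python
import numpy as np

# Most constructors accept a dtype argument.
z = np.zeros(3, dtype=np.int32)
r = np.arange(3, dtype=np.float32)
print(z.dtype, r.dtype)          # int32 float32

# astype returns a converted copy; float -> int truncates toward zero.
f = np.array([1.7, -2.3, 3.9])
i = f.astype(np.int64)
print(i)                         # [ 1 -2  3]

# Mixing dtypes in arithmetic promotes to a type that can hold both.
print((z + r).dtype)             # float64
```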
Basic Operations
Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.
>>> a = np.array( [20,30,40,50] )

>>> b = np.arange( 4 )

>>> b

array([0, 1, 2, 3])

>>> c = a-b

>>> c

array([20, 29, 38, 47])

>>> b**2

array([0, 1, 4, 9])

>>> 10*np.sin(a)

array([ 9.12945251, -9.88031624, 7.4511316 , -2.62374854])

>>> a<35

array([ True, True, False, False])

Basic Arithmetic

import numpy as np

x = np.array([[1,2],[3,4]], dtype=np.float64)

y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both produce the array [[ 6.0 8.0] [10.0 12.0]]

print(x + y)

print(np.add(x, y))

# Elementwise difference; both produce the array [[-4.0 -4.0] [-4.0 -4.0]]

print(x - y)

print(np.subtract(x, y))

# Elementwise product; both produce the array [[ 5.0 12.0] [21.0 32.0]]

print(x * y)

print(np.multiply(x, y))

# Elementwise division; both produce the array [[ 0.2 0.33333333] [ 0.42857143 0.5 ]]

print(x / y)

print(np.divide(x, y))

# Elementwise square root; produces the array [[ 1. 1.41421356] [ 1.73205081 2. ]]

print(np.sqrt(x))

Dot product

v = np.array([9,10])

w = np.array([11, 12])

# Inner product of vectors; both produce 219

print(v.dot(w))

print(np.dot(v, w))

x = np.array([[1,2],[3,4]])

y = np.array([[5,6],[7,8]])

# Matrix / vector product; both produce the rank 1 array [29 67]

print(x.dot(v))

print(np.dot(x, v))

# Matrix / matrix product; all three produce the rank 2 array [[19 22] [43 50]]

print(x.dot(y))

print(np.dot(x, y))

print(x @ y) # the @ operator requires Python 3.5 or later

Numpy provides many useful functions for performing computations on arrays; one of the most useful is sum:
x = np.array([[1,2],[3,4]])

print(np.sum(x)) # Compute sum of all elements; prints "10"

print(np.sum(x, axis=0)) # Compute sum of each column; prints "[4 6]"

print(np.sum(x, axis=1)) # Compute sum of each row; prints "[3 7]"

Transpose

print(x.T) # Prints "[[1 3] [2 4]]"

Min and Max

print(x.min()) # Prints "1"

print(x.max()) # Prints "4"

Universal Functions

NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called “universal functions” (ufunc). They operate elementwise on an array, producing an array as output.

>>> B = np.arange(3)

>>> B

array([0, 1, 2])

>>> np.exp(B)

array([1. , 2.71828183, 7.3890561 ])

>>> np.sqrt(B)

array([0. , 1. , 1.41421356])

>>> C = np.array([2., -1., 4.])

>>> np.add(B, C)

array([2., 0., 6.])

The ix_() function

The ix_ function can be used to combine different vectors so as to obtain the result for each n-tuple. For example, if you want to compute a+b*c for all the triplets taken from each of the vectors a, b and c:

>>> a = np.array([2,3,4,5])

>>> b = np.array([8,5,4])

>>> c = np.array([5,4,6,8,3])

>>> ax,bx,cx = np.ix_(a,b,c)

>>> result = ax+bx*cx

>>> result[3,2,4]

17
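
To see why this works, it helps to look at the shapes that ix_ returns. The following sketch repeats the example and compares it with a hand-built version using reshape; the name manual is introduced here only for illustration.

```python
import numpy as np

a = np.array([2, 3, 4, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])

# Each vector is placed on its own axis so that broadcasting forms the full grid.
ax, bx, cx = np.ix_(a, b, c)
print(ax.shape, bx.shape, cx.shape)   # (4, 1, 1) (1, 3, 1) (1, 1, 5)

result = ax + bx * cx
print(result.shape)                   # (4, 3, 5)

# Each entry matches the scalar formula for the corresponding triplet.
print(result[3, 2, 4], a[3] + b[2] * c[4])   # 17 17

# The same grid built by hand with reshape.
manual = a.reshape(4, 1, 1) + b.reshape(1, 3, 1) * c.reshape(1, 1, 5)
print(np.array_equal(result, manual))        # True
```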

Structured arrays
Structured arrays are ndarrays whose datatype is a composition of simpler datatypes organized as a sequence of named fields. For example

# creating a structured array

x = np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)], dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

# printing the array
print(x)

# printing the second record (structure); indexing starts from 0
print(x[1])

# printing the names in all records
print(x['name'])

# printing the name in the second record
print(x[1]['name'])

Note: string (field name) indexing is used here to access the data.

Vector Stacking

How do we construct a 2D array from a list of equally sized row vectors? If x and y are two vectors of the same length, NumPy does this with the functions column_stack, dstack, hstack and vstack, depending on the dimension in which the stacking is to be done. For example:

>>> x = np.arange(0,10,2)

>>> y = np.arange(5)

>>> m = np.vstack([x,y])

>>> m

array([[0, 2, 4, 6, 8],

[0, 1, 2, 3, 4]])

>>> xy = np.hstack([x,y])

>>> xy

array([0, 2, 4, 6, 8, 0, 1, 2, 3, 4])

>>> a = np.array([4.,2.])

>>> b = np.array([3.,8.])

>>> np.column_stack((a,b)) # returns a 2D array

array([[4., 3.],

       [2., 8.]])
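
The four stacking functions differ mainly in the axis along which the vectors are joined, which is easiest to see from the resulting shapes. The sketch below applies each of them to two 1D vectors of length 5.

```python
import numpy as np

x = np.arange(0, 10, 2)   # [0 2 4 6 8]
y = np.arange(5)          # [0 1 2 3 4]

print(np.vstack([x, y]).shape)        # (2, 5)     one vector per row
print(np.hstack([x, y]).shape)        # (10,)      joined end to end
print(np.column_stack([x, y]).shape)  # (5, 2)     one vector per column
print(np.dstack([x, y]).shape)        # (1, 5, 2)  stacked along a third axis
```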
Splitting one array into smaller ones
Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return, or by specifying the columns after which the division should occur:
import numpy as np
x=np.array([[1,2,3,4],[5,6,7,8]])
y,z=np.hsplit(x,2) #splitting into 2
print(y)
print(z)
[[1 2]
[5 6]]

[[3 4]
[7 8]]

x=np.array([[1,2,3,4],[5,6,7,8]])

y,z,k=np.hsplit(x,(1,3))

print(y)

print(z)

print(k)

[[1]
[5]]

[[2 3]
[6 7]]

[[4]
[8]]

Copies and Views

When operating on and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is often a source of confusion for beginners. There are three cases:

No copy at all

x=np.array([1,2,3,4])

y=x

print(id(x))

print(id(y))

x[1]=5

print(x,y)

32677040
32677040
[1 5 3 4] [1 5 3 4]
Note that in the above case both x and y refer to the same array object, so they share the same memory location.

View or Shallow Copy
Different array objects can share the same data. The view method creates a new array object that looks at the same data.

x=np.array([1,2,3,4])

y=x.view()

print(id(x))

print(id(y))

x[1]=5

print(x,y)

30054832
29911504
[1 5 3 4] [1 5 3 4]

Note that slicing an array also returns a view (shallow copy):
y=x[:]

Deep Copy
The copy method makes a complete copy of the array and its data.

x=np.array([1,2,3,4])

y=x.copy()

print(id(x))

print(id(y))

x[1]=5

print(x,y)

30146912
30196080
[1 5 3 4] [1 2 3 4]
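
Rather than comparing id values, the three cases can also be told apart programmatically. The following is a small sketch using the base attribute and np.shares_memory; the variable names alias, view, sliced and deep are chosen here for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

alias = x           # no copy: same object
view = x.view()     # new object, shared data
sliced = x[1:3]     # slicing also returns a view
deep = x.copy()     # new object, independent data

print(alias is x)                   # True
print(view is x, view.base is x)    # False True
print(np.shares_memory(x, sliced))  # True
print(np.shares_memory(x, deep))    # False

sliced[0] = 99      # writes through to x, alias and view
print(x, deep)      # [ 1 99  3  4] [1 2 3 4]
```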
Array Broadcasting

Broadcasting is the name given to the method that NumPy uses to allow array arithmetic between arrays with a different shape or size. Although the technique was developed for NumPy, it has also been adopted more broadly in other numerical computational libraries, such as Theano, TensorFlow, and Octave. Broadcasting solves the problem of arithmetic between arrays of differing shapes by in effect replicating the smaller array along the last mismatched dimension.

For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this:
import numpy as np

# We will add the vector v to each row of the matrix x,

# storing the result in the matrix y

x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])

v = np.array([1, 0, 1])

y = np.empty_like(x) # Create an empty matrix with the same shape as x

# Add the vector v to each row of the matrix x with an explicit loop

for i in range(4):

    y[i, :] = x[i, :] + v

print(y)

[[ 2 2 4]

[ 5 5 7]

[ 8 8 10]

[11 11 13]]

This works; however, when the matrix x is very large, computing an explicit loop in Python could be slow. Note that adding the vector v to each row of the matrix x is equivalent to forming a matrix vv by stacking multiple copies of v vertically, then performing elementwise summation of x and vv. We could implement this approach like this:

import numpy as np

# We will add the vector v to each row of the matrix x,

# storing the result in the matrix y

x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])

v = np.array([1, 0, 1])

vv = np.tile(v, (4, 1)) # Stack 4 copies of v on top of each other

print(vv)

# Prints "[[1 0 1] [1 0 1] [1 0 1] [1 0 1]]"

y = x + vv # Add x and vv elementwise

print(y) # Prints "[[ 2 2 4] [ 5 5 7] [ 8 8 10] [11 11 13]]"

Numpy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:
import numpy as np

# We will add the vector v to each row of the matrix x,

# storing the result in the matrix y

x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])

v = np.array([1, 0, 1])

y = x + v # Add v to each row of x using broadcasting

print(y)

[[ 2 2 4]

[ 5 5 7]

[ 8 8 10]

[11 11 13]]
Consider another example, in which y is reshaped into a column so that y[i] is added to every element of row i of x:
import numpy as np
x=np.array([[1,2],[3,4]])
y=np.array([1,0])
print(x+y.reshape(2,1))

[[2 3]

 [3 4]]

Broadcasting typically makes your code more concise and faster, so you should strive to use it where possible.
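
Two shapes can be broadcast together when, compared from the trailing axis backwards, each pair of sizes is either equal or one of them is 1. The sketch below shows one compatible and one incompatible pair; the arrays col and row are made-up values for illustration.

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,)

# Shapes are compared from the trailing axis: (3, 1) and (4,) give (3, 4).
table = col + row
print(table.shape)   # (3, 4)
print(table)
# [[ 1  2  3  4]
#  [11 12 13 14]
#  [21 22 23 24]]

# Incompatible shapes raise a ValueError: trailing sizes 4 and 3 do not match.
try:
    np.ones((3, 4)) + np.ones(3)
except ValueError as err:
    print("cannot broadcast:", err)
```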

Simple Linear Algebra Operations

>>> import numpy as np

>>> a = np.array([[1.0, 2.0], [3.0, 4.0]])

>>> print(a)

[[1. 2.]

[3. 4.]]

>>> a.transpose()

array([[1., 3.],

[2., 4.]])

>>> np.linalg.inv(a)

array([[-2. , 1. ],

[ 1.5, -0.5]])

>>> u = np.eye(2) # unit 2x2 matrix; "eye" represents "I"

>>> u

array([[1., 0.],

[0., 1.]])

>>> j = np.array([[0.0, -1.0], [1.0, 0.0]])

>>> j @ j # matrix product

array([[-1., 0.],

[ 0., -1.]])

>>> np.trace(u) # trace

2.0

>>> y = np.array([[5.], [7.]])

>>> np.linalg.solve(a, y)

array([[-3.],

[ 4.]])

>>> np.linalg.eig(j)

(array([0.+1.j, 0.-1.j]), array([[0.70710678+0.j , 0.70710678-0.j ],

[0. -0.70710678j, 0. +0.70710678j]]))

np.linalg.eig returns two values: the eigenvalues w, each repeated according to its multiplicity, and the normalized (unit "length") eigenvectors v, such that the column v[:,i] is the eigenvector corresponding to the eigenvalue w[i].
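
The relationship between the two returned values can be verified numerically. The following sketch recomputes the eigen-decomposition of the matrix j used above and checks, for each pair, that j times the eigenvector equals the eigenvalue times the eigenvector.

```python
import numpy as np

j = np.array([[0.0, -1.0], [1.0, 0.0]])
w, v = np.linalg.eig(j)

# Check j @ v[:, i] == w[i] * v[:, i] for every eigenpair.
for i in range(len(w)):
    print(np.allclose(j @ v[:, i], w[i] * v[:, i]))   # True

# Each eigenvector column has unit length.
print(np.allclose(np.linalg.norm(v, axis=0), 1.0))    # True
```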

Solving a system of linear equations
Consider the system
2x1 + 3x2 + 5x3 = 10
3x1 - 2x2 + x3 = 3
x1 + 5x2 + 7x3 = 8
The matrix representation is
Ax = b
where
A = [[2, 3, 5],
     [3, -2, 1],
     [1, 5, 7]]
b = [10, 3, 8]
The following Python code solves the problem:
import numpy as np
A=np.array([[ 2 , 3, 5],
[ 3, -2 ,1],
[ 1, 5 , 7 ]])

b=np.array([10,3,8])
x=np.linalg.solve(A,b)
print(x)
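
It is good practice to check a computed solution by substituting it back into the equations. The sketch below does this for the same A and b, and also looks at the determinant, since a unique solution exists only when it is non-zero.

```python
import numpy as np

A = np.array([[2, 3, 5],
              [3, -2, 1],
              [1, 5, 7]])
b = np.array([10, 3, 8])

# The determinant is close to -13, so A is invertible.
print(np.linalg.det(A))

x = np.linalg.solve(A, b)

# Substituting x back should reproduce b up to rounding error.
print(np.allclose(A @ x, b))   # True
```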

Mathematics for Machine Learning with Python- CST284 KTU Minor - Dr. Binu V P -9847390760
