Arrays are the main data structure used in machine learning. In Python, arrays from the NumPy
library, called N-dimensional arrays or the ndarray, are used as the primary data structure for
representing data.
NumPy (Numerical Python)is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors.
To use the numpy package in your program, you have to import the package as follows
import numpy as np
Arrays
A numpy array is a grid of values, all of the same type, and is indexed by a tuple of non negative integers. In NumPy dimensions are called axes.The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.
We can initialize numpy arrays from nested Python lists, and access elements using square brackets:
Python array indexing start from 0.
Example:
import numpy as np
a = np.array([1, 2, 3]) # Create a rank 1 array
print(type(a)) # Prints "<class 'numpy.ndarray'>"
print(a.shape) # Prints "(3,)"
print(a[0], a[1], a[2]) # Prints "1 2 3"
a[0] = 5 # Change an element of the array
print(a) # Prints "[5, 2, 3]"
print(a.size) # prints 3
b = np.array([[1,2,3],[4,5,6]]) # Create a rank 2 array
print(b.shape) # Prints "(2, 3)"
print(b.ndim) # Prints 2
print(b[0, 0], b[0, 1], b[1, 0]) # Prints "1 2 4"
Numpy also provides many functions for intrinsic array creation:
import numpy as np
a = np.zeros((2,2)) # Create an array of all zeros
print(a) # Prints "[[ 0. 0.] [ 0. 0.]]"
b = np.ones((1,2)) # Create an array of all ones
print(b) # Prints "[[ 1. 1.]]"
c = np.full((2,2), 7) # Create a constant array
print(c) # Prints "[[ 7. 7.] [ 7. 7.]]"
d = np.eye(2) # Create a 2x2 identity matrix
print(d) # Prints "[[ 1. 0.] [ 0. 1.]]"
e = np.random.random((2,2)) # Create an array filled with random values
print(e) # Might print "[[ 0.91940167 0.08143941] [ 0.68744134 0.87236687]]"
>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.arange(2, 10, dtype=float)
array([ 2., 3., 4., 5., 6., 7., 8., 9.])
>>> np.arange(2, 3, 0.1)
array([ 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9])
>>> b = np.arange(12).reshape(4,3) # 2d array
>>> print(b)
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
>>> from numpy import pi
>>> np.linspace( 0, 2, 9 ) # 9 numbers from 0 to 2 array([0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. ]) >>> x = np.linspace( 0, 2*pi, 100 ) # useful to evaluate function at lots of points
>>> f = np.sin(x)
Array Indexing, Slicing and Iterating
One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.
>>> a = np.arange(10)**3
>>> a array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729])
>>> a[2]
8
>>> a[2:5]
array([ 8, 27, 64])
# equivalent to a[0:6:2] = 1000; # from start to position 6, exclusive, set every 2nd element to 1000
>>> a[:6:2] = 1000
>>> a
array([1000, 1, 1000, 27, 1000, 125, 216, 343, 512, 729])
>>> a[ : :-1] # reversed a
array([ 729, 512, 343, 216, 125, 1000, 27, 1000, 1, 1000])
>>> for i in a:
... print(i**(1/3.))
...
9.999999999999998
1.0
9.999999999999998
3.0
9.999999999999998
4.999999999999999
5.999999999999999
6.999999999999999
7.999999999999999
8.999999999999998
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
# Use slicing to pull out the sub array consisting of the first 2 rows and columns 1 and 2; b is the following array of shape (2, 2):
>>>b = a[:2, 1:3]
>>>b
[[2 3]
[6 7]]
Two ways of accessing the data in the middle row of the array. Mixing integer indexing with slices yields an array of lower rank, while using only slices yields an array of the same rank as the original array:
>>>row_r1 = a[1, :] # Rank 1 view of the second row of a row_
>>>r2 = a[1:2, :] # Rank 2 view of the second row of a
print(row_r1, row_r1.shape) # Prints "[5 6 7 8] (4,)"
print(row_r2, row_r2.shape) # Prints "[[5 6 7 8]] (1, 4)"
# We can make the same distinction when accessing columns of an array:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
One useful trick with integer array indexing is selecting or mutating one element from each row of a matrix:
import numpy as np # Create a new array from which we will select elements
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
print(a) # prints "array([[ 1, 2, 3], [ 4, 5, 6], [ 7, 8, 9], [10, 11, 12]])"
# Create an array of indices
b = np.array([0, 2, 0, 1])
# Select one element from each row of a using the indices in b
print(a[np.arange(4), b]) # Prints "[ 1 6 7 11]"
# Mutate one element from each row of a using the indices in b
a[np.arange(4), b] += 10
print(a) # prints "array([[11, 2, 3], [ 4, 5, 16], [17, 8, 9], [10, 21, 12]])
b= array([[ 0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33], [40, 41, 42, 43]])
>>> b[2,3]
23
>>> b[0:5, 1] # each row in the second column of b
array([ 1 11 21 31 41])
>>> b[ : ,1] # equivalent to the previous example array([ 1, 11, 21, 31, 41])
>>> b[1:3, : ] # each column in the second and third row of b
array([[10 11 12 13], [20 21 22 23]])
When fewer indices are provided than the number of axes, the missing indices are considered complete slices:
>>>>>> b[-1] # the last row. Equivalent to b[-1,:]
array([40 41 42 43])
The expression within brackets in b[i] is treated as an i followed by as many instances of : as needed to represent the remaining axes. NumPy also allows you to write this using dots as b[i,...].
The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is an array with 5 axes, then x[1,2,...] is equivalent to x[1,2,:,:,:]
Indexing with Arrays of Indices
import numpy as np
a = np.array([10,11,12,13,14,15,16,17,18,19,20])
i=np.array([3,4,5])
print(a[i]) # will print 13,14,15
import numpy as np
a = np.array([10,11,12,13,14,15,16,17,18,19,20])
i=np.array([3,4,5])
print(a[i]) # will print 13,14,15
j = np.array([[3, 4], [5, 6]) # a bidimensional array of indices
printf( a[j]) # will print array([[ 13 14] [15 16]]) the same shape as j
a = np.array([[10,11,12],[13,14,15],[16,17,18],[19,20,21]])
i=np.array([1,2])
print(a[i]) # will print [[13 14 15][16 17 18]]
i=np.array([1,2])
print(a[i]) # will print [[13 14 15][16 17 18]]
a = np.array([[10,11,12],[13,14,15],[16,17,18],[19,20,21]])
i=np.array([[1,2],[2,3]])
print(a[i])
i=np.array([[1,2],[2,3]])
print(a[i])
#will print
[[[13 14 15]
[16 17 18]]
[[16 17 18]
[19 20 21]]]
[16 17 18]]
[[16 17 18]
[19 20 21]]]
We can also give indexes for more than one dimension. The arrays of indices for each dimension must have the same shape.
a = np.array([[10,11,12],[13,14,15],[16,17,18],[19,20,21]])
i=np.array([[1,2],[2,3]])
j=np.array([[1,1],[2,2]])
print(a[i,j])
#will print
[[14 17]
[18 21]]
[18 21]]
Another common use of indexing with arrays is the search of the maximum value of time-dependent series:
>>>>>> time = np.linspace(20, 145, 5) # time scale
>>> data = np.sin(np.arange(20)).reshape(5,4) # 4 time-dependent series
>>> time
array([ 20. , 51.25, 82.5 , 113.75, 145. ])
>>> data
array([[ 0. , 0.84147098, 0.90929743, 0.14112001],
[-0.7568025 , -0.95892427, -0.2794155 , 0.6569866 ],
[ 0.98935825, 0.41211849, -0.54402111, -0.99999021],
[-0.53657292, 0.42016704, 0.99060736, 0.65028784],
[-0.28790332, -0.96139749, -0.75098725, 0.14987721]])
# index of the maxima for each series
>>> ind = data.argmax(axis=0)
>>> ind
array([2, 0, 3, 1]) # times corresponding to the maxima
>>> time_max = time[ind]
>>> data_max = data[ind, range(data.shape[1])] # => data[ind[0],0], data[ind[1],1]...
>>> time_max
array([ 82.5 , 20. , 113.75, 51.25])
>>> data_max
array([0.98935825, 0.84147098, 0.99060736, 0.6569866 ])
You can also use indexing with arrays as a target to assign to:>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[1,3,4]] = 0
>>> a
array([0, 0, 2, 0, 0])
However, when the list of indices contains repetitions, the assignment is done several times, leaving behind the last value:
>>>>>> a = np.arange(5)
>>>>>> a = np.arange(5)
>>> a[[0,0,2]]=[1,2,3]
>>> a
array([2, 1, 3, 3, 4])
Boolean array indexing
Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:
import numpy as np
a = np.array([[1,2], [3, 4], [5, 6]])
bool_idx = (a > 2)
# Find the elements of a that are bigger than 2
# this returns a numpy array of Booleans of the same # shape as a, where each slot of bool_idx tells
# whether that element of a is > 2.
print(bool_idx) # Prints "[[False False] [ True True] # [ True True]]"
# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True values # of bool_idx
print(a[bool_idx]) # Prints "[3 4 5 6]"
Lets see another example:
>>>a=np.arange(12).reshape(3,4)
>>> b1= np.array([False,True,True]) # first dim selection
>>> b1= np.array([False,True,True]) # first dim selection
>>> a[b1,:] # selecting rows
array([[ 4, 5, 6, 7], [ 8, 9, 10, 11]])
Iterating over multidimensional arrays is done with respect to the first axis:
a = np.array([[1,2], [3, 4], [5, 6]])
>>> for row in a:
... print(row)
...
[1 2]
[3 4]
[5 6]
However, if one wants to perform an operation on each element in the array, one can use the flat attribute which is an iterator over all the elements of the array:
>>>for element in a.flat:
... print(element)
...
1
2
3
4
5
6
Datatypes
Every numpy array is a grid of elements of the same type. Numpy provides a large set of numeric datatypes that you can use to construct arrays. Numpy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype. Here is an example:
import numpy as np
x = np.array([1, 2]) # Let numpy choose the datatype
print(x.dtype) # Prints "int64"
x = np.array([1.0, 2.0]) # Let numpy choose the datatype
print(x.dtype) # Prints "float64"
x = np.array([1, 2], dtype=np.int64) # Force a particular datatype
print(x.dtype) # Prints "int64"
Basic Operations
Arithmetic operators on arrays apply element wise. A new array is created and filled with the result.
>>> a = np.array( [20,30,40,50] )
Basic Operations
Arithmetic operators on arrays apply element wise. A new array is created and filled with the result.
>>> a = np.array( [20,30,40,50] )
>>> b = np.arange( 4 )
>>> b
array([0 1 2 3])
>>> c = a-b
>>> c array([20 29 38 47])
>>> b**2
array([0 1 4 9])
>>> 10*np.sin(a)
array([ 9.12945251 -9.88031624 7.4511316 -2.62374854])
>>> a<35
array([ True True False False])
Basic Arithmetic
import numpy as np
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)
# Elementwise sum; both produce the array [[ 6.0 8.0] [10.0 12.0]]
print(x + y)
print(np.add(x, y))
# Element wise difference; both produce the array [[-4.0 -4.0] [-4.0 -4.0]]
print(x - y)
print(np.subtract(x, y))
# Element wise product both produce the array [[ 5.0 12.0] [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))
# Element wise division; both produce the array [[ 0.2 0.33333333] [ 0.42857143 0.5 ]]
print(x / y)
print(np.divide(x, y))
# Elementwise square root; produces the array [[ 1. 1.41421356] [ 1.73205081 2. ]]
print(np.sqrt(x))
Dot product
v = np.array([9,10])
w = np.array([11, 12])
# Inner product of vectors; both produce 219
print(v.dot(w))
print(np.dot(v, w))
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])
# Matrix / vector product; both produce the rank 1 array [29 67]
print(x.dot(v))
print(np.dot(x, v))
# Matrix / matrix product; both produce the rank 2 array # [[19 22] [43 50]]
print(x.dot(y))
print(np.dot(x, y))
print(x @ y) #only in python 3.5 or later
Numpy provides many useful functions for performing computations on arrays; one of the most useful is sum:
x = np.array([[1,2],[3,4]])
x = np.array([[1,2],[3,4]])
print(np.sum(x)) # Compute sum of all elements; prints "10"
print(np.sum(x, axis=0)) # Compute sum of each column; prints "[4 6]"
print(np.sum(x, axis=1)) # Compute sum of each row; prints "[3 7]"
Transpose
print(x.T) # Prints "[[1 3] [2 4]]"
Min and Max
print(a.min) # print 1
print(a.max) #print 4
Universal Functions
NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called “universal functions”(ufunc). Within NumPy, these functions operate elementwise on an array, producing an array as output.
>>> B = np.arange(3)
>>> B
array([0 1 2])
>>> np.exp(B)
array([1. 2.71828183 7.3890561 ])
>>> np.sqrt(B)
array([0. 1. 1.41421356])
>>> C = np.array([2. -1. 4.])
>>> np.add(B, C)
array([2. 0. 6.])
The ix_() function
The ix_ function can be used to combine different vectors so as to obtain the result for each n-uplet. For example, if you want to compute all the a+b*c for all the triplets taken from each of the vectors a, b and c:
>>> a = np.array([2,3,4,5])
>>> b = np.array([8,5,4])
>>> c = np.array([5,4,6,8,3])
>>> ax,bx,cx = np.ix_(a,b,c)
>>> result = ax+bx*cx
>>>results[3,2,4]
17
Structured arrays
Structured arrays are ndarrays whose datatype is a composition of simpler datatypes organized as a sequence of named fields. For example
Structured arrays are ndarrays whose datatype is a composition of simpler datatypes organized as a sequence of named fields. For example
#creating a structured array
x=np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)],dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
#printing the array
print(x)
print(x)
#printing the first record(structure)
print(x[1])
print(x[1])
#printing the names in all records
print(x['name'])
print(x['name'])
#printing the name in first record
print(x[1]['name'])
print(x[1]['name'])
Note: Here the string indexing is used for accessing the data
Vector Stacking
How do we construct a 2D array from a list of equally-sized row vectors. if x and y are two vectors of the same length then in NumPy this works via the functions column_stack, dstack, hstack and vstack, depending on the dimension in which the stacking is to be done. For example
>>> x = np.arange(0,10,2)
>>> y = np.arange(5)
>>> m = np.vstack([x,y])
>>> m
array([[0, 2, 4, 6, 8],
[0, 1, 2, 3, 4]])
>>> xy = np.hstack([x,y])
>>> xy
array([0, 2, 4, 6, 8, 0, 1, 2, 3, 4])
>>> a = np.array([4.,2.])
>>> a = np.array([4.,2.])
>>> b = np.array([3.,8.])
>>> np.column_stack((a,b)) # returns a 2D array array([[4., 3.], [2., 8.]])
Splitting one array into smaller ones
Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return, or by specifying the columns after which the division should occur:
import numpy as np
x=np.array([[1,2,3,4],[5,6,7,8]])
y,z=np.hsplit(x,2) #splitting into 2
print(y)
print(z)
[[1 2]
[5 6]]
Splitting one array into smaller ones
Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return, or by specifying the columns after which the division should occur:
import numpy as np
x=np.array([[1,2,3,4],[5,6,7,8]])
y,z=np.hsplit(x,2) #splitting into 2
print(y)
print(z)
[[1 2]
[5 6]]
[[3 4]
[7 8]]
[7 8]]
x=np.array([[1,2,3,4],[5,6,7,8]])
y,z,k=np.hsplit(x,(1,3))
print(y)
print(z)
print(k)
[[1]
[5]]
[5]]
[[2 3]
[6 7]]
[[4]
[8]]
Copies and Views
When operating and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is often a source of confusion for beginners. There are three cases:
No copy at all
x=np.array([1,2,3,4])
y=x
print(id(x))
print(id(y))
x[1]=5
print(x,y)
3267704032677040
[1 5 3 4] [1 5 3 4]
it is noted that in the above case both x and y refer to the same memory location
Different array objects can share the same data. The view method creates a new array object that looks at the same data.
x=np.array([1,2,3,4])
y=x.view()
print(id(x))
print(id(y))
x[1]=5
print(x,y)
29911504
[1 5 3 4] [1 5 3 4]
It is noted that slicing an array will return the view or shallow copy
y=x[:]
y=x[:]
Deep Copy
The copy method makes a complete copy of the array and its data.
The copy method makes a complete copy of the array and its data.
x=np.array([1,2,3,4])
y=x.copy()
print(id(x))
print(id(y))
x[1]=5
print(x,y)
30146912
30196080
[1 5 3 4] [1 2 3 4]
Array Broadcasting
import numpy as np
30196080
[1 5 3 4] [1 2 3 4]
Array Broadcasting
Broadcasting is the name given to the method that NumPy uses to allow array arithmeticbetween arrays with a di erent shape or size. Although the technique was developed for NumPy,it has also been adopted more broadly in other numerical computational libraries, such asTheano, TensorFlow, and Octave. Broadcasting solves the problem of arithmetic between arrays of differing shapes by in effect replicating the smaller array along the last mismatched dimension.
For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this:import numpy as np
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = np.empty_like(x) # Create an empty matrix with the same shape as x
# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
y [i, :] = x[i, :] + v
print(y)
[[ 2 2 4]
[ 5 5 7]
[ 8 8 10]
[11 11 13]]
This works; however when the matrix x is very large, computing an explicit loop in Python could be slow. Note that adding the vector v to each row of the matrix x is equivalent to forming a matrix vv by stacking multiple copies of v vertically, then performing element wise summation of x and vv. We could implement this approach like this:
import numpy as np # We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
vv = np.tile(v, (4, 1)) # Stack 4 copies of v on top of each other
print(vv)
# Prints "[[1 0 1] [1 0 1] [1 0 1] [1 0 1]]"
y = x + vv # Add x and vv elementwise
print(y) # Prints "[[ 2 2 4 [ 5 5 7] [ 8 8 10] [11 11 13]]"
Numpy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:
import numpy as np
import numpy as np
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v # Add v to each row of x using broadcasting
print(y)
[[ 2 2 4]
[ 5 5 7]
[ 8 8 10]
[11 11 13]]
consider another example
import numpy as np
x=np.array([[1,2],[3,4]])
y=np.array([1,0])
print(x+y.reshape(2,1))
consider another example
import numpy as np
x=np.array([[1,2],[3,4]])
y=np.array([1,0])
print(x+y.reshape(2,1))
[[2 3]
[4 5]]
Broadcasting typically makes your code more concise and faster, so you should strive to use it where possible.
Solving system of linear equations
let 2x1+3x2 +5x3= 10
3x1-2x2+x3=3
x1+5x2+7x3=8
the matrix representation is
Ax=b
where
A=[[ 2 , 3, 5],
[ 3, -2 ,1],
[ 1, 5 , 7 ]])
b=[10,3,8]
The following is the python code to solve the problem
import numpy as np
A=np.array([[ 2 , 3, 5],
[ 3, -2 ,1],
[ 1, 5 , 7 ]])
b=np.array([10,3,8])
x=np.linalg.solve(A,b)
print(x)
Simple Linear Algebra Operations
>>> import numpy as np
>>> a = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> print(a)
[[1. 2.]
[3. 4.]]
>>> a.transpose()
array([[1., 3.],
[2., 4.]])
>>> np.linalg.inv(a)
array([[-2. , 1. ],
[ 1.5, -0.5]])
>>> u = np.eye(2) # unit 2x2 matrix; "eye" represents "I"
>>> u
array([[1., 0.],
[0., 1.]])
>>> j = np.array([[0.0, -1.0], [1.0, 0.0]])
>>> j @ j # matrix product
array([[-1., 0.],
[ 0., -1.]])
>>> np.trace(u) # trace
2.0
>>> y = np.array([[5.], [7.]])
>>> np.linalg.solve(a, y)
array([[-3.],
[ 4.]])
>>> np.linalg.eig(j)
(array([0.+1.j, 0.-1.j]), array([[0.70710678+0.j , 0.70710678-0.j ],
[0. -0.70710678j, 0. +0.70710678j]]))
Returns
The eigenvalues, each repeated according to its multiplicity.
The normalized (unit "length") eigenvectors, such that the
column ``v[:,i]`` is the eigenvector corresponding to the
eigenvalue ``w[i]``
let 2x1+3x2 +5x3= 10
3x1-2x2+x3=3
x1+5x2+7x3=8
the matrix representation is
Ax=b
where
A=[[ 2 , 3, 5],
[ 3, -2 ,1],
[ 1, 5 , 7 ]])
b=[10,3,8]
The following is the python code to solve the problem
import numpy as np
A=np.array([[ 2 , 3, 5],
[ 3, -2 ,1],
[ 1, 5 , 7 ]])
b=np.array([10,3,8])
x=np.linalg.solve(A,b)
print(x)
Comments
Post a Comment