Matrix Decomposition - LU, QR, Cholesky, Eigen, SVD

Matrix Decomposition
Many complex matrix operations cannot be solved efficiently or with stability using the limited precision of computers. Matrix decompositions are methods that reduce a matrix into constituent parts that make it easier to calculate more complex matrix operations. Matrix decomposition methods, also called matrix factorization methods, are a foundation of linear algebra in computers, even for basic operations such as solving systems of linear equations, calculating the inverse, and calculating the determinant of a matrix. In this tutorial, you will discover matrix decompositions and how to calculate them in Python.
It is an approach that can simplify complex matrix operations by performing them on the decomposed parts rather than on the original matrix itself. A common analogy for matrix decomposition is the factoring of numbers, such as the factoring of 10 into 2 x 5. For this reason, matrix decomposition is also called matrix factorization. Like factoring real values, there are many ways to decompose a matrix, hence there is a range of different matrix decomposition techniques. Two simple and widely used matrix decomposition methods are the LU matrix decomposition and the QR matrix decomposition.
LU Decomposition
LU decomposition of a matrix is the factorization of a given square matrix into two triangular matrices, one upper triangular matrix and one lower triangular matrix, such that the product of these two matrices gives the original matrix. It was introduced by Alan Turing in 1948, who also created the Turing machine.
The LU decomposition is for square matrices and decomposes a matrix into L and U components.
A = L x U
Or, written more compactly:
A = LU
Where A is the square matrix that we wish to decompose, L is the lower triangle matrix and U is the upper triangle matrix.
The LU decomposition is found using an iterative numerical process and can fail for matrices that cannot be decomposed, or can only be decomposed with difficulty. A variation of this decomposition that is numerically more stable in practice is called the LUP decomposition, or the LU decomposition with partial pivoting.
A = P x L x U
The rows of the parent matrix are re-ordered to simplify the decomposition process, and the additional P matrix specifies a way to permute the result, or return the result to the original order. There are also other variations of the LU decomposition. The LU decomposition is often used to simplify the solving of systems of linear equations, such as finding the coefficients in a linear regression, as well as in calculating the determinant and inverse of a matrix.
The LU decomposition can be implemented in Python with the lu() function. More specifically, this function calculates an LU decomposition with partial pivoting, returning P, L and U such that A = P x L x U. The example below first defines a 3 x 3 square matrix. The LU decomposition is calculated, then the original matrix is reconstructed from the components.
# LU decomposition
from numpy import array
from scipy.linalg import lu
# define a square matrix
A = array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])
print(A)
# factorize
P, L, U = lu(A)
print(P)
print(L)
print(U)
# reconstruct
B = P.dot(L).dot(U)
print(B)
[[1 2 3]
[4 5 6]
[7 8 9]]

[[ 0. 1. 0.]
[ 0. 0. 1.]
[ 1. 0. 0.]]

[[ 1. 0. 0. ]
[ 0.14285714 1. 0. ]
[ 0.57142857 0.5 1. ]]

[[ 7.00000000e+00 8.00000000e+00 9.00000000e+00]
[ 0.00000000e+00 8.57142857e-01 1.71428571e+00]
[ 0.00000000e+00 0.00000000e+00 -1.58603289e-16]]

[[ 1. 2. 3.]
[ 4. 5. 6.]
[ 7. 8. 9.]]
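As noted above, the LU decomposition can also be used to calculate the determinant: det(A) = det(P) x det(L) x det(U), and since L has a unit diagonal this reduces to det(P) (which is +1 or -1) times the product of the diagonal of U. The following is a minimal sketch of this idea; the 3 x 3 matrix used here is an arbitrary non-singular example, not the (singular) matrix from the listing above.
# determinant from the LU factors (sketch)
from numpy import array, diag, prod
from numpy.linalg import det
from scipy.linalg import lu
# an arbitrary non-singular matrix for illustration
A = array([[2, 1, 1], [4, -6, 0], [-2, 7, 2]])
P, L, U = lu(A)
# L has a unit diagonal, so det(L) = 1
d = det(P) * prod(diag(U))
print(d)
# compare with the determinant computed directly
print(det(A))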
See an example of how LU decomposition can be used to solve a system of linear equations:
https://www.geeksforgeeks.org/l-u-decomposition-system-linear-equations/
The given system of equations is A X = C. We substitute A = L U. Thus, we have L U X = C.
We put Z = U X, where Z is a vector of artificial variables, and solve L Z = C first, then solve U X = Z to find X, the required values of the variables.

The following sample code illustrates this with the system of equations
x1+x2+x3=1
4x1+3x2-x3=6
3x1+5x2+3x3=4

import numpy as np
import scipy.linalg as sp
A=np.array([[1,1,1],[4,3,-1],[3,5,3]])
C=np.array([1,6,4])
#solution
X=np.linalg.inv(A).dot(C.T)
print(X)
#applying LU decomposition
P,L,U=sp.lu(A)
L=P.dot(L) #absorb the permutation into L so that A = L.U
print(L)
print(U)
#solve LZ=C 
Z=np.linalg.inv(L).dot(C.T)
print(Z)
#solve Z=UX
X=np.linalg.inv(U).dot(Z.T)
print(X)
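As a more idiomatic alternative (a sketch for the same system as above), the two triangular systems can be solved by forward and back substitution with scipy.linalg.solve_triangular, which avoids computing explicit inverses and is cheaper and numerically safer.
# solve A X = C using the LU factors and triangular solves (no explicit inverses)
import numpy as np
from scipy.linalg import lu, solve_triangular
A = np.array([[1, 1, 1], [4, 3, -1], [3, 5, 3]], dtype=float)
C = np.array([1, 6, 4], dtype=float)
P, L, U = lu(A)  # A = P.L.U
# forward substitution: L Z = P^T C
Z = solve_triangular(L, P.T.dot(C), lower=True)
# back substitution: U X = Z
X = solve_triangular(U, Z)
print(X)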
QR Decomposition
The QR decomposition is for m x n matrices (not limited to square matrices) and decomposes a matrix into Q and R components.
A = Q x R
Or, written more compactly:
A = QR
Where A is the m x n matrix that we wish to decompose, Q is an orthogonal matrix of size m x m, and R is an upper triangular matrix of size m x n. The QR decomposition is found using an iterative numerical method that can fail for matrices that cannot be decomposed, or can only be decomposed with difficulty. Like the LU decomposition, the QR decomposition is often used to solve systems of linear equations, although it is not limited to square matrices.
The QR decomposition can be implemented in NumPy using the qr() function. By default, the function returns the Q and R matrices with smaller, reduced dimensions, which is more economical. We can change this to return the expected sizes of m x m for Q and m x n for R by specifying the mode argument as 'complete', although this is not required for most applications. The example below defines a 3 x 2 matrix, calculates the QR decomposition, then reconstructs the original matrix from the decomposed elements.
# QR decomposition
from numpy import array
from numpy.linalg import qr
# define rectangular matrix
A = array([[1, 2],[3, 4],[5, 6]])
print(A)
# factorize
Q, R = qr(A, 'complete')
print(Q)
print(R)
# reconstruct
B = Q.dot(R)
print(B)
o/p:
[[1 2]
[3 4]
[5 6]]

[[-0.16903085 0.89708523 0.40824829]
[-0.50709255 0.27602622 -0.81649658]
[-0.84515425 -0.34503278 0.40824829]]

[[-5.91607978 -7.43735744]
[ 0. 0.82807867]
[ 0. 0. ]]

[[ 1. 2.]
[ 3. 4.]
[ 5. 6.]]
Learn more about QR decomposition from here https://en.wikipedia.org/wiki/QR_decomposition
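To illustrate the claim that QR can be used to solve linear systems that are not square: for a tall matrix A with full column rank, the least squares solution of A x = b satisfies R x = Q^T b with the reduced factors. The sketch below uses the same 3 x 2 matrix as above; the right-hand side b is an arbitrary example.
# least squares via QR: solve R x = Q^T b (sketch)
import numpy as np
A = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
b = np.array([1, 2, 4], dtype=float)
Q, R = np.linalg.qr(A)  # reduced factors: Q is 3 x 2, R is 2 x 2
x = np.linalg.solve(R, Q.T.dot(b))  # R is small and upper triangular
print(x)
# compare with NumPy's least squares solver
print(np.linalg.lstsq(A, b, rcond=None)[0])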
Cholesky Decomposition
The Cholesky decomposition is for square symmetric matrices where all eigenvalues are greater than zero, so-called positive definite matrices. For our interests in machine learning, we will focus on the Cholesky decomposition for real-valued matrices and ignore the cases when working with complex numbers. The decomposition is defined as follows:
A = L x L^T
Where A is the matrix being decomposed, L is the lower triangular matrix and L^T is the transpose of L. The decomposition can also be written as the product of an upper triangular matrix and its transpose, for example:
A = U^T x U
Where U is the upper triangular matrix. The Cholesky decomposition is used for solving linear least squares for linear regression, as well as simulation and optimization methods. When decomposing symmetric matrices, the Cholesky decomposition is nearly twice as efficient as the LU decomposition and should be preferred in these cases.
While symmetric, positive definite matrices are rather special, they occur quite frequently in some applications, so their special factorization, called the Cholesky decomposition, is good to know about. When you can use it, the Cholesky decomposition is about a factor of two faster than alternative methods for solving linear equations.
The Cholesky decomposition can be implemented in NumPy by calling the cholesky() function. The function only returns L as we can easily access the L transpose as needed.
# Cholesky decomposition
from numpy import array
from numpy.linalg import cholesky
# define symmetrical matrix
A = array([
[2, 1, 1],
[1, 2, 1],
[1, 1, 2]])
print(A)
# factorize
L= cholesky(A)
print(L)
# reconstruct
B = L.dot(L.T)
print(B)
o/p:
[[2 1 1]
[1 2 1]
[1 1 2]]

[[ 1.41421356 0. 0. ]
[ 0.70710678 1.22474487 0. ]
[ 0.70710678 0.40824829 1.15470054]]

[[ 2. 1. 1.]
[ 1. 2. 1.]
[ 1. 1. 2.]]
Learn Cholesky decomposition from here https://www.geeksforgeeks.org/cholesky-decomposition-matrix-decomposition/
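To connect this with the point above about solving linear systems, the following is a minimal sketch using SciPy's cho_factor() and cho_solve() on the same symmetric positive definite matrix; the right-hand side b is an arbitrary example. The factor is computed once and can be reused for any right-hand side.
# solve A x = b using the Cholesky factor (sketch)
import numpy as np
from scipy.linalg import cho_factor, cho_solve
A = np.array([[2, 1, 1], [1, 2, 1], [1, 1, 2]], dtype=float)
b = np.array([1, 2, 3], dtype=float)
c, low = cho_factor(A)      # factor A once
x = cho_solve((c, low), b)  # reuse the factor for this right-hand side
print(x)
# compare with a direct solve
print(np.linalg.solve(A, b))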

Eigen Decomposition
Matrix decompositions are a useful tool for reducing a matrix to its constituent parts in order to simplify a range of more complex operations. Perhaps the most used type of matrix decomposition is the eigen decomposition, which decomposes a square matrix into eigenvectors and eigenvalues. This decomposition also plays a role in methods used in machine learning, such as the Principal Component Analysis method, or PCA. A vector is an eigenvector of a matrix if it satisfies the following equation.
A . v = λ . v
where λ is the eigenvalue and v is the eigenvector.
A matrix could have one eigenvector and eigenvalue for each dimension of the parent matrix. Not all square matrices can be decomposed into eigenvectors and eigenvalues, and some can only be decomposed in a way that requires complex numbers. The parent matrix can be shown to be a product of the eigenvectors and eigenvalues.
A = Q Λ Q^-1
where Q is the matrix with the eigenvectors as columns, Λ is the diagonal matrix of eigenvalues, and Q^-1 is the inverse of Q.
However, we often want to decompose matrices into their eigenvalues and eigenvectors. Doing so can help us to analyze certain properties of the matrix, much as decomposing an integer into its prime factors can help us understand the behavior of that integer.
Eigen is not a name, e.g. the method is not named after "Eigen"; eigen (pronounced eye-gan) is a German word that means own or innate, as in belonging to the parent matrix. A decomposition operation does not result in a compression of the matrix; instead, it breaks it down into constituent parts to make certain operations on the matrix easier to perform. Like other matrix decomposition methods, eigen decomposition is used as an element to simplify the calculation of other more complex matrix operations.
Almost all vectors change direction when they are multiplied by A. Certain exceptional vectors x are in the same direction as Ax. Those are the "eigenvectors". Multiply an eigenvector by A, and the vector Ax is the number λ times the original x. The eigenvalue λ tells whether the special vector x is stretched or shrunk or reversed or left unchanged when it is multiplied by A.
Eigen decomposition can also be used to calculate the principal components of a matrix in the Principal Component Analysis method or PCA that can be used to reduce the dimensionality of data in machine learning.
Decomposing a matrix in terms of its eigenvalues and its eigenvectors gives valuable insights into the properties of the matrix. Certain matrix calculations, like computing the power of the matrix, become much easier when we use the eigen decomposition of the matrix.
Eigenvectors are unit vectors, which means that their length or magnitude is equal to 1.0. They are often referred to as right vectors, which simply means a column vector (as opposed to a row vector or a left vector).
The eigen decomposition can be calculated in NumPy using the eig() function. The example below first defines a 3 x 3 square matrix. The eigen decomposition is calculated on the matrix returning the eigenvalues and eigenvectors.
#eigen decomposition
from numpy import array
from numpy.linalg import eig
# define matrix
A = array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])
print(A)
# factorize
values, vectors = eig(A)
print(values)
print(vectors)
o/p:
[[1 2 3]
[4 5 6]
[7 8 9]]
[ 1.61168440e+01 -1.11684397e+00 -9.75918483e-16]
[[-0.23197069 -0.78583024 0.40824829]
[-0.52532209 -0.08675134 -0.81649658]
[-0.8186735 0.61232756 0.40824829]]
We can confirm that a vector is indeed an eigenvector of a matrix. The example below multiplies the original matrix with the first eigenvector and compares it to the first eigenvector multiplied by the first eigenvalue.
# confirm eigen vector
from numpy import array
from numpy.linalg import eig
# define matrix
A = array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# factorize
values, vectors = eig(A)
# confirm first eigenvector
B = A.dot(vectors[:, 0])
print(B)
C = vectors[:, 0] * values[0]
print(C)
[ -3.73863537 -8.46653421 -13.19443305]
[ -3.73863537 -8.46653421 -13.19443305]
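Related to the note above that eigenvectors are returned as unit vectors, the following is a small check (a sketch using the same 3 x 3 matrix) that the Euclidean norm of each returned eigenvector is 1.0.
# confirm the eigenvectors are unit length (sketch)
from numpy import array
from numpy.linalg import eig, norm
A = array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
values, vectors = eig(A)
# each column of vectors is an eigenvector; its norm should be 1.0
print(norm(vectors, axis=0))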
Reconstruct Matrix
We can reverse the process and reconstruct the original matrix given only the eigenvectors and eigenvalues. First, the eigenvectors must be taken together as a matrix, where each eigenvector becomes a column (this is the form returned by eig()). The eigenvalues need to be arranged into a diagonal matrix. The NumPy diag() function can be used for this. Next, we need to calculate the inverse of the eigenvector matrix, which we can achieve with the inv() NumPy function. Finally, these elements need to be multiplied together with the dot() function.
# reconstruct matrix
from numpy import diag
from numpy.linalg import inv
from numpy import array
from numpy.linalg import eig
# define matrix
A = array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])
print(A)
# factorize
values, vectors = eig(A)
# create matrix from eigenvectors
Q = vectors
# create inverse of eigenvectors matrix
R = inv(Q)
# create diagonal matrix from eigenvalues
L = diag(values)
# reconstruct the original matrix
B = Q.dot(L).dot(R)
print(B)
o/p:
[[1 2 3]
[4 5 6]
[7 8 9]]
[[ 1. 2. 3.]
[ 4. 5. 6.]
[ 7. 8. 9.]]
Note: the eigen decomposition can be used to find the inverse and powers of large matrices efficiently, giving the decomposition of A into a similarity transformation involving P and D:

A = P D P^-1

where P is the square matrix whose columns are the eigenvectors of A and D is the diagonal matrix of eigenvalues. The fact that this decomposition is always possible for a square matrix A, as long as the eigenvector matrix P is square and invertible, is known as the eigen decomposition theorem. Squaring both sides of the equation gives

A^2 = P D P^-1 P D P^-1 = P D^2 P^-1

and, by induction, it follows that for general positive integer powers

A^k = P D^k P^-1

The inverse of A is

A^-1 = P D^-1 P^-1

where the inverse of the diagonal matrix D is obtained by taking the reciprocal of each diagonal entry.

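The following is a minimal sketch of the power and inverse formulas above; the 2 x 2 symmetric matrix is chosen arbitrarily so that it is diagonalizable and non-singular.
# matrix power and inverse via the eigen decomposition A = P D P^-1 (sketch)
import numpy as np
A = np.array([[2, 1], [1, 2]], dtype=float)
values, P = np.linalg.eig(A)
P_inv = np.linalg.inv(P)
# A^3 = P D^3 P^-1
A_cubed = P.dot(np.diag(values ** 3)).dot(P_inv)
print(A_cubed)
print(np.linalg.matrix_power(A, 3))  # comparison
# A^-1 = P D^-1 P^-1
A_inv = P.dot(np.diag(1.0 / values)).dot(P_inv)
print(A_inv)
print(np.linalg.inv(A))  # comparison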
Singular Value Decomposition (SVD)
Matrix decomposition, also known as matrix factorization, involves describing a given matrix using its constituent elements. Perhaps the most known and widely used matrix decomposition method is the Singular-Value Decomposition, or SVD. All matrices have an SVD, which makes it more stable than other methods, such as the eigen decomposition. As such, it is often used in a wide array of applications including compression, denoising, and data reduction. The Singular-Value Decomposition, or SVD for short, is a matrix decomposition method for reducing a matrix to its constituent parts in order to make certain subsequent matrix calculations simpler. For simplicity, we will focus on the SVD for real-valued matrices and ignore the case of complex numbers.
A = U x Σ x V^T
Where A is the real m x n matrix that we wish to decompose, U is an m x m matrix, Σ is an m x n diagonal matrix, and V^T is the transpose of an n x n matrix V.
The Singular Value Decomposition is a highlight of linear algebra. The diagonal values in the Σ matrix are known as the singular values of the original matrix A. The columns of the U matrix are called the left-singular vectors of A, and the columns of V are called the right-singular vectors of A. Every rectangular matrix has a singular value decomposition, although the resulting matrices may contain complex numbers and the limitations of floating point arithmetic may cause some matrices to fail to decompose neatly.
The singular value decomposition (SVD) provides another way to factorize a matrix, into singular vectors and singular values. The SVD allows us to discover some of the same kind of information as the eigen decomposition; however, the SVD is more generally applicable. The singular value decomposition has numerous applications in statistics, machine learning, and computer science. Applying the SVD to a matrix is like looking inside it with X-ray vision. The SVD is widely used both in the calculation of other matrix operations, such as the matrix inverse, and as a data reduction method in machine learning. The SVD can also be used in least squares linear regression, image compression, and denoising data.
Calculate Singular-Value Decomposition
The SVD can be calculated by calling the svd() function. The function takes a matrix and returns the U, Σ and V^T elements. The Σ diagonal matrix is returned as a vector of singular values. The V matrix is returned in a transposed form, e.g. V^T. The example below defines a 3 x 2 matrix and calculates the singular-value decomposition.
# singular-value decomposition
from numpy import array
from scipy.linalg import svd
# define a matrix
A = array([[1, 2],[3, 4],[5, 6]])
print(A)
# factorize
U, s, V = svd(A)
print(U)
print(s)
print(V)
[[1 2]
[3 4]
[5 6]]
[[-0.2298477 0.88346102 0.40824829]
[-0.52474482 0.24078249 -0.81649658]
[-0.81964194 -0.40189603 0.40824829]]

[ 9.52551809 0.51430058]

[[-0.61962948 -0.78489445]
[-0.78489445 0.61962948]]
The original matrix can be reconstructed from the U, s and V elements. The U, s, and V elements returned from svd() cannot be multiplied directly. The s vector must be converted into a diagonal matrix using the diag() function. By default, this function will create a square matrix that is n x n, relative to our original matrix. This causes a problem as the sizes of the matrices do not fit the rules of matrix multiplication, where the number of columns in a matrix must match the number of rows in the subsequent matrix. We can fix this by creating a new Sigma matrix of all zero values that is m x n (e.g. more rows) and populating the first n x n part of the matrix with the square diagonal matrix calculated via diag().
# reconstruct rectangular matrix from svd
from numpy import array
from numpy import diag
from numpy import zeros
from scipy.linalg import svd
# define matrix
A = array([[1, 2],[3, 4],[5, 6]])
print(A)
# factorize
U, s, V = svd(A)
# create m x n Sigma matrix
Sigma = zeros((A.shape[0], A.shape[1]))
# populate Sigma with n x n diagonal matrix
Sigma[:A.shape[1], :A.shape[1]] = diag(s)
# reconstruct matrix
B = U.dot(Sigma.dot(V))
print(B)
o/p
[[1 2]
[3 4]
[5 6]]

[[ 1. 2.]
[ 3. 4.]
[ 5. 6.]]
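As an aside (a sketch, not the approach used above), the zero-padding step can be avoided entirely by requesting the reduced (economy) factorization with full_matrices=False, in which case diag(s) already has compatible dimensions.
# reconstruct a rectangular matrix using the reduced SVD (sketch)
from numpy import array, diag
from scipy.linalg import svd
A = array([[1, 2], [3, 4], [5, 6]])
U, s, V = svd(A, full_matrices=False)  # U is 3 x 2, V is 2 x 2
B = U.dot(diag(s)).dot(V)
print(B)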
The diagonal matrix can be used directly when reconstructing a square matrix, as follows.
# reconstruct square matrix from svd
from numpy import array
from numpy import diag
from scipy.linalg import svd
# define matrix
A = array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])
print(A)
# factorize
U, s, V = svd(A)
# create n x n Sigma matrix
Sigma = diag(s)
# reconstruct matrix
B = U.dot(Sigma.dot(V))
print(B)
[[1 2 3]
[4 5 6]
[7 8 9]]
[[ 1. 2. 3.]
[ 4. 5. 6.]
[ 7. 8. 9.]]
Note that the U and V matrices can also be obtained by computing the eigenvectors of A.A^T and A^T.A respectively. The following program illustrates this process.
import numpy as np
A=np.array([[2,2],[1,1]])
#applying svd
U,S,Vt=np.linalg.svd(A)
print(U)
print(S)
print(Vt)
#U contains the eigen vectors of A.(A.T); the eigenvalues are the squares of S
ev,EU=np.linalg.eig(A.dot(A.T))
print(EU,ev)
#V contains the eigen vectors of (A.T).A
ev,EV=np.linalg.eig(A.T.dot(A))
#it is noted that U and Vt are orthogonal matrices (Q.Q^T = I)
print(U.dot(U.T))
print(Vt.dot(Vt.T))
S=np.diag(S)
#reconstruct the original matrix from the factors
Ar=U.dot(S).dot(Vt)
print(Ar)

[[-0.89442719 -0.4472136 ]
[-0.4472136 0.89442719]]
[ 3.16227766 0. ]
[[-0.70710678 -0.70710678]
[-0.70710678 0.70710678]]
[[ 0.89442719 -0.4472136 ]
[ 0.4472136 0.89442719]] [ 10. 0.]
[[ 1. 0.]
[ 0. 1.]]
[[ 1. 0.]
[ 0. 1.]]
[[ 2. 2.]
[ 1. 1.]]
Applications of SVD
Pseudo inverse
The pseudo inverse is the generalization of the matrix inverse for square matrices to rectangular
matrices where the number of rows and columns are not equal. It is also called the Moore-Penrose
Inverse after two independent discoverers of the method or the Generalized Inverse.
Matrix inversion is not defined for matrices that are not square. When A has more columns than rows, then solving a linear equation using the pseudo inverse provides one of the many possible solutions.
The pseudo inverse of a matrix A is calculated using the singular value decomposition of A:
A^+ = V x D x U^T
where U and V are the matrices obtained from the SVD of A and D is the pseudo inverse of the diagonal matrix Σ. D can be obtained by calculating the reciprocal of each non-zero element in Σ and taking the transpose if the original matrix was rectangular.
The pseudo inverse provides one way of solving the linear regression equation, specifically when there are more rows than there are columns, which is often the case. NumPy provides the function pinv() for calculating the pseudo inverse of a rectangular matrix. The example below defines a 4 x 2 matrix and calculates the pseudo inverse.
# pseudo inverse
from numpy import array
from numpy.linalg import pinv
# define matrix
A = array([
[0.1, 0.2],
[0.3, 0.4],
[0.5, 0.6],
[0.7, 0.8]])
print(A)
# calculate pseudo inverse
B = pinv(A)
print(B)
[[ 0.1 0.2]
[ 0.3 0.4]
[ 0.5 0.6]
[ 0.7 0.8]]
[[ -1.00000000e+01 -5.00000000e+00 9.04289323e-15 5.00000000e+00]
[ 8.50000000e+00 4.50000000e+00 5.00000000e-01 -3.50000000e+00]]
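The following is a minimal sketch of the formula above, computing the pseudo inverse manually from the SVD of the same 4 x 2 matrix and comparing it with pinv().
# pseudo inverse from the SVD: A+ = V . D . U^T (sketch)
import numpy as np
A = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]])
U, s, Vt = np.linalg.svd(A)
# D is the n x m pseudo inverse of Sigma: reciprocals of the non-zero singular values
D = np.zeros((A.shape[1], A.shape[0]))
D[:len(s), :len(s)] = np.diag(1.0 / s)
B = Vt.T.dot(D).dot(U.T)
print(B)
# compare with NumPy's pinv()
print(np.linalg.pinv(A))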

Dimensionality Reduction
A popular application of SVD is dimensionality reduction. Data with a large number of features, such as more features (columns) than observations (rows), may be reduced to a smaller subset of features that are most relevant to the prediction problem. The result is a matrix with a lower rank that is said to approximate the original matrix. To do this, we can perform an SVD operation on the original data and select the top k largest singular values in Σ. These columns can be selected from Σ and the rows selected from V^T. An approximation B of the original matrix A can then be reconstructed.
B = U x Σ_k x V_k^T
In natural language processing, this approach can be used on matrices of word occurrences or word frequencies in documents and is called Latent Semantic Analysis or Latent Semantic Indexing. In practice, we can retain and work with a descriptive subset of the data called T.
This is a dense summary of the matrix or a projection.
T = U x Σ_k
Further, this transform can be calculated and applied to the original matrix A as well as
other similar matrices.
T = A x V_k
The example below demonstrates data reduction with the SVD. First a 3 x 10 matrix is defined, with more columns than rows. The SVD is calculated and only the first two features are selected. The elements are recombined to give an accurate reproduction of the original matrix. Finally, the transform is calculated in two different ways.
# data reduction with svd
from numpy import array
from numpy import diag
from numpy import zeros
from scipy.linalg import svd
# define matrix
A = array([
[1,2,3,4,5,6,7,8,9,10],
[11,12,13,14,15,16,17,18,19,20],
[21,22,23,24,25,26,27,28,29,30]])
print(A)
# factorize
U, s, V = svd(A)
# create m x n Sigma matrix
Sigma = zeros((A.shape[0], A.shape[1]))
# populate Sigma with the m x m diagonal matrix of singular values
Sigma[:A.shape[0], :A.shape[0]] = diag(s)
# select
n_elements = 2
Sigma = Sigma[:, :n_elements]
V = V[:n_elements, :]
# reconstruct
B = U.dot(Sigma.dot(V))
print(B)
# transform
T = U.dot(Sigma)
print(T)
T = A.dot(V.T)
print(T)
[[ 1 2 3 4 5 6 7 8 9 10]
[11 12 13 14 15 16 17 18 19 20]
[21 22 23 24 25 26 27 28 29 30]]

[[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
[ 11. 12. 13. 14. 15. 16. 17. 18. 19. 20.]
[ 21. 22. 23. 24. 25. 26. 27. 28. 29. 30.]]

[[-18.52157747 6.47697214]
[-49.81310011 1.91182038]
[-81.10462276 -2.65333138]]

[[-18.52157747 6.47697214]
[-49.81310011 1.91182038]
[-81.10462276 -2.65333138]]
scikit-learn provides a TruncatedSVD class that implements this capability directly. When creating the TruncatedSVD class you must specify the number of desirable features or components to select, e.g. 2. Once created, you can fit the transform (e.g. calculate V_k^T) by calling the fit() function, then apply it to the original matrix by calling the transform() function. The result is the transform of A called T above. The example below demonstrates the TruncatedSVD class.
# svd data reduction in scikit-learn
from numpy import array
from sklearn.decomposition import TruncatedSVD
# define matrix
A = array([
[1,2,3,4,5,6,7,8,9,10],
[11,12,13,14,15,16,17,18,19,20],
[21,22,23,24,25,26,27,28,29,30]])
print(A)
# create transform
svd = TruncatedSVD(n_components=2)
# fit transform
svd.fit(A)
# apply transform
result = svd.transform(A)
print(result)
[[ 1 2 3 4 5 6 7 8 9 10]
[11 12 13 14 15 16 17 18 19 20]
[21 22 23 24 25 26 27 28 29 30]]

[[ 18.52157747 6.47697214]
[ 49.81310011 1.91182038]
[ 81.10462276 -2.65333138]]
Note: we can expect some instability in the sign of the components, given the nature of the calculations involved and the differences in the underlying libraries and methods used. This sign instability should not be a problem in practice as long as the fitted transform is reused.
