Skip to main content

1.1 Solving system of equations using Gauss Elimination Method

Elementary Transformations

Key to solving a system of linear equations are elementary transformations that keep the solution set the same, but that transform the equation system into a simpler form:

Exchange of two equations (rows in the matrix representing the system of equations)
Multiplication of an equation (row) with a constant 
Addition of two equations (rows)
Add a scalar multiple of one row to the other.

Row Echelon Form

A matrix is in row-echelon form if

All rows that contain only zeros are at the bottom of the matrix; correspondingly,all rows that contain at least one nonzero element are on top of rows that contain only zeros.

Looking at nonzero rows only, the first nonzero number from the left pivot (also called the pivot or the leading coefficient) is always strictly to the right of the  pivot of the row above it.

The row-echelon form is where the leading (first non-zero) entry of each row has only zeroes below it. These leading entries are called pivots

Example:
$\begin{bmatrix}
1 & 1 & 2\\
0 & 1& 3\\
0 & 0 & 5
\end{bmatrix}$
Reduced Row Echelon Form

An equation system is in reduced reduced row-echelon form (also: row-reduced echelon form or row canonical form) if 

It is in row-echelon form.
Every pivot is 1.
The pivot is the only nonzero entry in its column.

The reduced row-echelon form will play an important role  because it allows us to determine the general solution of a system of linear equations in a straightforward way.

$\begin{bmatrix}
1 & 3 & 0 & 0 & 3\\
0 & 0 & 1 & 0 & 9\\
0& 0 & 0 & 1 & -4
\end{bmatrix}$

  • Two matrices are row equivalent if one can be obtained from the other by a sequence of elementary row operations.
  • The matrix in reduced row echelon form that is row equivalent to A is denoted by rref(A)
  • The rank of a matrix A is the number of rows in rref(A).

Gaussian Elimination

Gaussian elimination is an algorithm that performs elementary transformations to bring a system of linear equations into row-echelon form(Gauss) or reduced row-echelon form(Gauss Jordan elimination).This transformation is necessary for solving a system of linear equations.

Consider the set of equations of the form $Ax=b$

Write the augmented matrix for the linear equations.$[A|b]$
Use elementary row operations on the augmented matrix $[A|b]$ to transform it into reduced row echelon form.
If a zero is on the diagonal, switch the rows until a nonzero is in its place.
Use back substitution to find the solution.

[Gauss-Jordan Elimination] For a given system of linear equations, we can find a solution as follows.
This procedure is called Gauss-Jordan elimination.

  • Write the augmented matrix of the system of linear equations.
  • Use elementaray row operations to reduce the augmented matrix into (reduced) row echelon form.
  • Write the system of linear equation corresponding to the matrix in row echelon form.
  • Solve the system using back substitution.

Solve the system by Gaussian Elimination:
Example 1:
$2x+y−z=8$
$−3x−y+2z=−11$
$−2x+y+2z=−3$

Write the augmented matrix:
$\begin{bmatrix} 2 & 1 & -1 & | & 8\\ -3 & -1 & 2 & | & -11\\ -2& 1 & 2 & | & -3 \end{bmatrix}$

Use elementary row operations to reduce the matrix to reduced row echelon form:
$\begin{bmatrix} 1 & 0 & 0 & | & 2\\ 0 & 1 & 0 & | & 3\\ 0& 0 & 1 & | & -1 \end{bmatrix}$

Using elementary row operations to obtain reduced row echelon form .The solution to the system is revealed in the last column: $x=2,y=3,z=−1$

Note that this system of equation has unique solution.

Example 2:
Consider the system of equation
$x+y−3z=4$
$2x+y−z=2$
$3x+2y-4z=7$
Write the augmented matrix:
$\begin{bmatrix}
1 & 1 & -3 & | & 4\\
2 & 1 & -1& | & 2\\
3& 2 & -4 & | & 7
\end{bmatrix}$
Use elementary row operations to reduce the matrix to reduced row echelon form:
$\begin{bmatrix}
1 & 1 & -3 & | & 4\\
0 & -1 & 5& | & -6\\
0& 0 & 0 & | & 1
\end{bmatrix}$

The last row indicates that $0x+0y+0z=1$.An equation that cannot be satisfied by any values of $x, y$, and $z$. The process stops: this system has no solutions.Gaussian elimination reveals that the given system of equations produce an  inconsistent system.

Example 3:
Consider the system of equation
$x+y−3z=0$
$2x+y−z=0$
$3x+2y-4z=0$

A system such as this one, where the constant term on the right‐hand side of every equation is 0, is called a homogeneous system. In matrix form it reads $A x = 0$. Since every homogeneous system is consistent—because $x = 0$ is always a solution—a homogeneous system has either exactly one solution (the trivial solution, $x = 0$) or infinitely many. It is not necessary to explicitly augment the coefficient matrix with the column $b = 0$, since no elementary row operation can affect these zeros. That is, if $A′ $ is an echelon form of $A$, then elementary row operations will transform $[ A| 0]$ into $[ A′| 0]$.

From the reduced echelon form matrix 
$\begin{bmatrix}
1 & 1 & -3 & | & 0\\
0 & -1 & 5& | & 0\\
0& 0 & 0 & | & 0
\end{bmatrix}$
The variable $z$ is free variable which can take any value let it be $t$
$z=t$
$-y+5t=0$
$y=5t$
$x+y-3t=0$
$x=3t-y$
$x=3t-5t$
$x=-2t$
So the solution is $ t(-2,5,1)$

Example 4:
Consider the system of equation
$x+y−3z=4$
$2x+y−z=2$
$3x+2y-4z=6$

Write the augmented matrix:
$\begin{bmatrix} 1 & 1 & -3 & | & 4\\ 2& 1 & -1& | & 2\\ 3& 2 & -4 & | & 7 \end{bmatrix}$

Use elementary row operations to reduce the matrix to reduced row echelon form:
$\begin{bmatrix}
1 & 1 & -3 & | & 4\\
0& 1 & -5& | & 6\\
0& 0 & 0 & | & 0
\end{bmatrix}$

Here, the third row translates into $0 x + 0 y + 0 z = 0$, an equation which is satisfied by any $x, y,$ and $z$. Since this offer no constraint on the unknowns, there are not three conditions on the unknowns, only two (represented by the two nonzero rows in the final augmented matrix). Since there are 3 unknowns but only 2 constraints, $3 − 2 =1$ of the unknowns, $z$ say, is arbitrary; this is called a free variable. Let $z = t$, where $t$ is any real number. Back‐substitution of $z = t$ into the second row  gives
using row 2
$y - 5 z = 6$
So
 $y = 5 z + 6 = 5 t + 6$

using row 1
$x + 2 z = -2$
$x = -2 -2t$
We can further do row reduction can make reduced row echelon form. The non pivot column represents free variables.The system has two equations and 3 unknowns so there are infinitely many solutions.
$ \begin{bmatrix}
1 & 0 & 2 & | & -2\\
0& 1 & -5& | & 6\\
0& 0 & 0 & | & 0
\end{bmatrix}$

the particular solution when $z=0$ is 
$x=-2 , y=6 , z=0$

An immediate solution can be found by taking -2 times the first column plus 6 times the second column plus 0 times the third column.This solution is called particular solution or special solution.How ever this is not the only solution of the system of linear equations.All the solutions of the system $Ax=0$ can also be added to the particular solution.

-2 times first column +  5 times second column + third column will give 0

so $(-2,5,1)$ is the solution for $Ax=0$
putting it together we obtain all the solutions of the system of equations called general solutions

$(x,y,z) =  (-2,6,0) + (-2,5,1) t$

The general approach we followed consisted of the following three steps
1.Find a particular solution to $Ax=b$ ( by setting free variables to 0)
2.Find all solutions to $Ax=0$
3.Combine the solutions from step 1 and step 2 to the general solutions

Neither the general nor the particular solution is unique.

Linear system with fewer equations than unknowns, if consistent, has infinitely many solutions. The condition “fewer equations than unknowns” means that the number of rows in the coefficient matrix is less than the number of unknowns. This implies that there must be at least one free variable. Since such a variable can, by definition, take on infinitely many values so the system will have infinitely many solutions.

The general solutions to a consistent non homogeneous linear system, $A x = b$, is equal to the general solution of the corresponding homogeneous system, $A x = 0$, plus a particular solution of the non homogeneous system. That is, if $x = x h$ represents the general solution of $A x = 0$, then $x = x h + x$ represents the general solution of $A x + b$, where $x$ is any particular solution of the (consistent) non homogeneous system $A x = b$.

[Technical note: Theorem C, which concerns a linear system, has a counterpart in the theory of linear differential equations. Let $L$ be a linear differential operator; then the general solution of a solvable non homogeneous linear differential equation, $L(y) = d$ (where d ≢ 0), is equal to the general solution of the corresponding homogeneous equation, $L(y) = 0$, plus a particular solution of the non homogeneous equation. That is, if $y = yh$ represents the general solution of $L(y) = 0$, then $y = yh + y$ represents the general solution of $L(y) = d$, where $y$ is any particular solution of the (solvable) non homogeneous linear equation $L(y) = d$.]

Example 5:
Let $b=(b_1,b_2,b_3)^T $ be a vector and $A$ be a matrix
$\begin{bmatrix}
2 & 1 & -1\\
-1& -3 & 1\\
1&8 & -2
\end{bmatrix}$

For what values of $b1, b2,$ and $b3$ will the system $A x = b$ be consistent?
The augmented matrix is
$\begin{bmatrix}
2 & 1 & -1&|& b_1\\
-1& -3 & 1 & | & b_2\\
1&8 & -2 & | & b_3
\end{bmatrix}$
After row reduction the row echelon form is
$\begin{bmatrix}
-1 & -3& 1&|& b_2\\
0& -5& 1 & | & b_1+2b_2\\
0&0 & 0 & | & b_1+3b_2+b_3
\end{bmatrix}$
The bottom row now implies that $b_1 + 3 b_2 + b_3$ must be zero if this system is to be consistent. Therefore, the given system has solutions (infinitely many, in fact) only for those column vectors 
$b = ( b1, b2, b3)^T$ for which $b1 + 3 b2 + b3 = 0.$

Example 6:
Find all solutions to the system of linear equations ( university question)
$-4x+5z=-2$
$-3x-3y+5z=3$
$-x+2y+2z=-1$

Write the augmented matrix:
$\begin{bmatrix} -4 & 0 & 5 & | & -2\\ -3 & -3 & 5 & | & 3\\ -1&2& 2 & | & -1\end{bmatrix}$

exchange $R_1$ and $R_3$ also multiply all rows with -1
$\begin{bmatrix} 1 & -2 & -2 & | & 1\\ 3 & 3 & -5 & | & -3\\ 4 & 0 & -5 & | & 2\end{bmatrix}$

now do $R_2=R_2+R_1* -3$ and $R_3=R_3+R_1*-4$
$\begin{bmatrix} 1 & -2 & -2 & | & 1\\ 0 & 9 & 1 & | & 0\\ 0 & 8 & 3 & | & -2\end{bmatrix}$

now do $R_3=R_3+R_2*-3$
$\begin{bmatrix} 1 & -2 & -2 & | & 1\\ 0 & 9 & 1 & | & 0\\ 0 & -19 & 0 & | & -2\end{bmatrix}$

So
$-19y=2$, $y=2/19$

$9y+z=0$, $z=-9y$, $z=-18/19$

$x-2y-2z=1$, $x=1+2y+2z$.$x=1+4/19-36/19$, $x=-13/19$

Example 7: ( university question)
Find all solutions in $x=\begin{bmatrix}
x_1\\
x_2\\
x_3
\end{bmatrix} \in \mathbb{R}^3$ of the equation system $Ax=12x$, where
$A=\begin{bmatrix}
6 & 4 & 3\\
6& 0& 9\\
0& 8 &0
\end{bmatrix}$
and $\sum_{i=1}^3 x_i=1$

We start by rephrasing the problem into solving a homogeneous system of linear equations. Let $x$ be in $\mathbb{R}^3$. We notice that $Ax = 12x$ is equivalent to $(A - 12 I)x = 0$, which can be rewritten as the homogeneous system $\tilde{A}x = 0$, where we define

$\tilde{A}=\begin{bmatrix}
-6 & 4 & 3\\
6& -12& 9\\
0& 8 & -12
\end{bmatrix}$

The constraint $\sum_{i=1}^3 x_i=1$ can be represented as fourth equation which leads to an augmented system

$\tilde{A}=\begin{bmatrix}
-6 & 4 & 3 & | & 0\\
6& -12& 9 & | & 0\\
0& 8 & -12 & | & 0\\
1 & 1 & 1 & | & 1
\end{bmatrix}$

now convert this into reduced echelon form $R_2=R_2+R_1$

$\tilde{A}=\begin{bmatrix}
-6 & 4 & 3 & | & 0\\
0& -8& 12 & | & 0\\
0& 8 & -12 & | & 0\\
1 & 1 & 1 & | & 1
\end{bmatrix}$

$R_3=R_3+R_2$
$\tilde{A}=\begin{bmatrix}
-6 & 4 & 3 & | & 0\\
0& -8& 12 & | & 0\\
0& 0 & 0 & | & 0\\
1 & 1 & 1 & | & 1
\end{bmatrix}$

$\tilde{A}=\begin{bmatrix}
1 & 1 & 1 & | & 1\\
0& -8& 12 & | & 0\\
-6 & 4 & 3 & | & 0\\
0& 0 & 0 & | & 0\\
\end{bmatrix}$

$R_3=R_1*6+R_3$

$\tilde{A}=\begin{bmatrix}
1 & 1 & 1 & | & 1\\
0& -8& 12 & | & 0\\
0 & 10 & 9 & | & 6\\
0& 0 & 0 & | & 0\\
\end{bmatrix}$

$R_2=R_2/-8$
$\tilde{A}=\begin{bmatrix}
1 & 1 & 1 & | & 1\\
0& 1& -3/2 & | & 0\\
0 & 10 & 9 & | & 6\\
0& 0 & 0 & | & 0\\
\end{bmatrix}$

$R_3=R_3-10 R_2$
$\tilde{A}=\begin{bmatrix}
1 & 1 & 1 & | & 1\\
0& 1& -3/2 & | & 0\\
0 & 0 & 24 & | & 6\\
0& 0 & 0 & | & 0\\
\end{bmatrix}$

$R_3=R_3/24$
$\tilde{A}=\begin{bmatrix}
1 & 1 & 1 & | & 1\\
0& 1& -3/2 & | & 0\\
0 & 0 & 1 & | & 1/4\\
0& 0 & 0 & | & 0\\
\end{bmatrix}$

$x_3=1/4$
$x_2=3/8$
$x_1=3/8$

so the solution is 
$x=\frac{1}{8}\begin{bmatrix}
3\\
3\\
2
\end{bmatrix}$

Example 8: (university question)

Find the solution of the system of equations $Ax=b$, where $A$ and $b$ are defined as
$A=\begin{bmatrix}
1 & -1 & 0 & 0 & 1\\
1 & 1 & 0 & -3 & 0\\
2 & -1& 0 & 1& -1\\
-1 & 2 & 0 & -2 & -1\\
\end{bmatrix}$

$b=\begin{bmatrix}
3\\
6\\
5\\
-1
\end{bmatrix}$

We start again by writing down the augmented matrix and apply Gaussian elimination to obtain the reduced row echelon form:
$\begin{bmatrix}
1 & -1 & 0 & 0 & 1 & | & 3\\
1 & 1 & 0 & -3 & 0 &|& 6\\
2 & -1& 0 & 1& -1& |&5\\
-1 & 2 & 0 & -2 & -1&|&-1\\
\end{bmatrix}$

$R_2=R_2-R_1, R_3=R_3-2*R_1, R_4=R_4+R_1$

$\begin{bmatrix}
1 & -1 & 0 & 0 & 1 & | & 3\\
0 & 2 & 0 & -3 & -1 &|& 3\\
0 & 1& 0 & 1& -3& |&-1\\
0 & 1 & 0 & -2 & 0&|&2\\
\end{bmatrix}$

$R_1=R_1+R_3, R_2=R_2-2*R_3, R_4=R_4-R_3$ swap $R_2$ and $R_3$
$\begin{bmatrix}
1 & 0 & 0 & 1& -2 & | & 2\\
0 & 1 & 0 & 1 & -3 &|& -1\\
0 & 0& 0 & -5& 5& |&5\\
0 & 0 & 0 & -3 & 3&|&3\\
\end{bmatrix}$

$R_3=R_3/-5, R_4=R_4/-3 , R_4=R_4-R_3$
$\begin{bmatrix}
1 & 0 & 0 & 1& -2 & | & 2\\
0 & 1 & 0 & 1 & -3 &|& -1\\
0 & 0& 0 & 1& -1& |&-1\\
0 & 0 & 0 & 0 & 0&|&0\\
\end{bmatrix}$

$R_1=R_1-R_3, R_2=R_2-R_3$

$\begin{bmatrix}
1 & 0 & 0 & 0& -1 & | & 3\\
0 & 1 & 0 & 0 & -2 &|& 0\\
0 & 0& 0 & 1& -1& |&-1\\
0 & 0 & 0 & 0 & 0&|&0\\
\end{bmatrix}$

From the reduced row echelon form we apply the “Minus-1 Trick” in order to get the following system:

$\begin{bmatrix}
1 & 0 & 0 & 0& -1 & | & 3\\
0 & 1 & 0 & 0 & -2 &|& 0\\
0 & 0 & -1 & 0 & 0 &|& 0\\
0 & 0& 0 & 1& -1& |&-1\\
0 & 0 & 0 & 0 & -1&|&0\\
\end{bmatrix}$

The right-hand side of the system yields us a particular solution while the columns corresponding to the $-1$ pivots are the directions of the solution space. We then obtain the set of all possible solutions as
$x=\begin{bmatrix}
3\\
0\\
0\\
-1\\
0
\end{bmatrix}+ \lambda_1 \begin{bmatrix}
0\\
0\\
-1\\
0\\
0
\end{bmatrix}+ \lambda _2 \begin{bmatrix}
-1\\
-2\\
0\\
-1\\
-1
\end{bmatrix}$

Example 9: (university question)
Find for what value of a, is the system 
$-2x+4y-2z-w+4v=-3$
$4x-8y+3z-3w+v=2$
$x-2y+z-w+v=0$
$x-2y+0z-3w+4v=a$, consistent. Hence solve the same.

The augmented matrix is 
$\begin{bmatrix}
-2 & 4 & -2& -1 & 4 & | & -3\\
4 & -8 & 3 & -3 & 1 &|& 2\\
1 & -2& 1 & -1& 1& |&0\\
1 & -2 & 0 & -3 & 4&|&a\\
\end{bmatrix}$

The row  reduced echelon form is
$\begin{bmatrix}
1 & -2 & 0& 0 & -2 & | & 2\\
0 & 0 & 1 & 0 & 1 &|& -1\\
0 & 0& 0 & 1&-2& |&1\\
0 & 0 & 0 & 0 & 0&|&a+1\\
\end{bmatrix}$

The system is consistent when  a =-1 

The particular solution is 
$\begin{bmatrix}
2\\
0\\
-1\\
1\\
0
\end{bmatrix}$

General Solution is
$\begin{bmatrix}
2\\
1\\
0\\
0\\
0\\
\end{bmatrix} y +\begin{bmatrix}2\\
0 \\
-1\\
2\\
1\\
\end{bmatrix} v $

Find all the solutions to the following system of linear equations ( university question)
$x_1+x_2+x_3=10$
$x_2+x_3=3$
$2x_1+2x_2+x_3=5$

The augmented matrix is 
$\begin{bmatrix}
1 & 1 & 1&|&10 \\
0& 1 & 1 & |&3 \\
2 & 2& 1 &|& 5\\
\end{bmatrix}$

$R3=R1 \times -2 + R3$
$\begin{bmatrix}
1 & 1 & 1&|&10 \\
0& 1 & 1 & |&3 \\
0 & 0&-1 &|& -15\\
\end{bmatrix}$

$R1=R1 - R2$
$\begin{bmatrix}
1 & 0 & 0&|&7 \\
0& 1 & 1 & |&3 \\
0 & 0&-1 &|& -15\\
\end{bmatrix}$

$R2=R2 + R3$
$\begin{bmatrix}
1 & 0 & 0&|&7 \\
0& 1 & 0 & |&12 \\
0 & 0&-1 &|& -15\\
\end{bmatrix}$

$R3=R3 \times -1$
$\begin{bmatrix}
1 & 0 & 0&|&7 \\
0& 1 & 0 & |&-12 \\
0 & 0& 1 &|& 15\\
\end{bmatrix}$

So the solution is $(x_1,x_2,x_3)=(7,-12,15)$

Find all the solutions to the following system of linear equations ( university question)
$x_1+x_2=4$
$2x_1+x_2+x_3=10$
$2x_1-3x_2+2x_3=3$

The augmented matrix is 
$\begin{bmatrix}
1 & 1 & 0&|&4 \\
2& 1 & 1 & |&10 \\
2 & -3& 2 &|& 3\\
\end{bmatrix}$
$R2=R1 \times -2 + R2$
$R3=R1 \times -2 + R3$
$\begin{bmatrix}
1 & 1 & 0&|&4 \\
0& -1 & 1 & |&2 \\
0 & -5&2 &|& -5\\
\end{bmatrix}$

$R2=R2 \times -1$
$\begin{bmatrix}
1 & 1 & 0&|&4 \\
0& 1 & -1 & |&-2 \\
0 & -5&2 &|& -5\\
\end{bmatrix}$

$R3=R3 + R2 \times 5$
$\begin{bmatrix}
1 & 1 & 0&|&4 \\
0& 1 & -1 & |&-2 \\
0 &0&-3 &|& -15\\
\end{bmatrix}$

$R3=R3 / -3$
$\begin{bmatrix}
1 & 1 & 0&|&4 \\
0& 1 & -1 & |&-2 \\
0 &0&1 &|& 5\\
\end{bmatrix}$

$R2=R3+R2$
$\begin{bmatrix}
1 & 1 & 0&|&4 \\
0& 1 & 0 & |&3 \\
0 &0&1 &|& 5\\
\end{bmatrix}$

$R1=R1-R2$
$\begin{bmatrix}
1 & 0 & 0&|&1 \\
0& 1 & 0 & |&3 \\
0 &0&1 &|& 5\\
\end{bmatrix}$

So the solution is $(x_1,x_2,x_3)=(1,3,5)$

Here, the third row translates into 0 x + 0 y + 0 z = 0, an equation which is satisfied by any x, y, and z. Since this offer no constraint on the unknowns, there are not three conditions on the unknowns, only two (represented by the two nonzero rows in the final augmented matrix). Since there are 3 unknowns but only 2 constrants, 3 − 2 =1 of the unknowns, z say, is arbitrary; this is called a free variable. Let z = t, where t is any real number. Back‐substitution of z = t into the second row (− y + 5 z = −6) gives [100201030011]Here, the third row translates into 0 x + 0 y + 0 z = 0, an equation which is satisfied by any x, y, and z. Since this offer no constraint on the unknowns, there are not three conditions on the unknowns, only two (represented by the two nonzero rows in the final augmented matrix). Since there are 3 unknowns but only 2 constrants, 3 − 2 =1 of the unknowns, z say, is arbitrary; this is called a free variable. Let z = t, where t is any real number. Back‐substitution of z = t into the second row (− y + 5 z = −6) gives from

Comments

Popular posts from this blog

Mathematics for Machine Learning- CST 284 - KTU Minor Notes - Dr Binu V P

  Introduction About Me Syllabus Course Outcomes and Model Question Paper University Question Papers and Evaluation Scheme -Mathematics for Machine learning CST 284 KTU Overview of Machine Learning What is Machine Learning (video) Learn the Seven Steps in Machine Learning (video) Linear Algebra in Machine Learning Module I- Linear Algebra 1.Geometry of Linear Equations (video-Gilbert Strang) 2.Elimination with Matrices (video-Gilbert Strang) 3.Solving System of equations using Gauss Elimination Method 4.Row Echelon form and Reduced Row Echelon Form -Python Code 5.Solving system of equations Python code 6. Practice problems Gauss Elimination ( contact) 7.Finding Inverse using Gauss Jordan Elimination  (video) 8.Finding Inverse using Gauss Jordan Elimination-Python code Vectors in Machine Learning- Basics 9.Vector spaces and sub spaces 10.Linear Independence 11.Linear Independence, Basis and Dimension (video) 12.Generating set basis and span 13.Rank of a Matrix 14.Linear Mapping...

4.3 Sum Rule, Product Rule, and Bayes’ Theorem

 We think of probability theory as an extension to logical reasoning Probabilistic modeling  provides a principled foundation for designing machine learning methods. Once we have defined probability distributions corresponding to the uncertainties of the data and our problem, it turns out that there are only two fundamental rules, the sum rule and the product rule. Let $p(x,y)$ is the joint distribution of the two random variables $x, y$. The distributions $p(x)$ and $p(y)$ are the corresponding marginal distributions, and $p(y |x)$ is the conditional distribution of $y$ given $x$. Sum Rule The addition rule states the probability of two events is the sum of the probability that either will happen minus the probability that both will happen. The addition rule is: $P(A∪B)=P(A)+P(B)−P(A∩B)$ Suppose $A$ and $B$ are disjoint, their intersection is empty. Then the probability of their intersection is zero. In symbols:  $P(A∩B)=0$  The addition law then simplifies to: $P(...

5.1 Optimization using Gradient Descent

Since machine learning algorithms are implemented on a computer, the mathematical formulations are expressed as numerical optimization methods.Training a machine learning model often boils down to finding a good set of parameters. The notion of “good” is determined by the objective function or the probabilistic model. Given an objective function, finding the best value is done using optimization algorithms. There are two main branches of continuous optimization constrained and unconstrained. By convention, most objective functions in machine learning are intended to be minimized, that is, the best value is the minimum value. Intuitively finding the best value is like finding the valleys of the objective function, and the gradients point us uphill. The idea is to move downhill (opposite to the gradient) and hope to find the deepest point. For unconstrained optimization, this is the only concept we need,but there are several design choices. For constrained optimization, we need to intr...