
4.0 Basics of Probability

Sample Spaces, Events, and Their Probabilities

Sample Spaces and Events

Rolling an ordinary six-sided die is a familiar example of a random experiment, an action for which all possible outcomes can be listed, but for which the actual outcome on any given trial of the experiment cannot be predicted with certainty. In such a situation we wish to assign to each outcome, such as rolling a two, a number, called the probability of the outcome, that indicates how likely it is that the outcome will occur. Similarly, we would like to assign a probability to any event, or collection of outcomes, such as rolling an even number, which indicates how likely it is that the event will occur if the experiment is performed.

Definition
A random experiment is a mechanism that produces a definite outcome that cannot be predicted with certainty. The sample space associated with a random experiment is the set of all possible outcomes. An event is a subset of the sample space.

An event $E$ is said to occur on a particular trial of the experiment if the outcome observed is an element of the set $E$.

Example
The sample space for the experiment that consists of tossing a single coin is
$S=\{H,T\}$

For the experiment that consists of rolling a single die, the outcomes can be labeled according to the number of dots on the top face of the die. Then the sample space is the set
$S=\{1,2,3,4,5,6\}$.

To find the event that corresponds to the phrase “an even number is rolled,” note that the outcomes that are even are 2, 4, and 6, so the event is the set $\{2,4,6\}$, which it is natural to denote by the letter $E$. We write
$E=\{2,4,6\}$.
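The set notation above maps directly onto Python's built-in sets. The following is only an illustrative sketch; the helper name `occurs` is chosen here and does not come from the text.

```python
# Sample space for rolling a single die.
S = {1, 2, 3, 4, 5, 6}

# The event "an even number is rolled" is the subset of even outcomes.
E = {outcome for outcome in S if outcome % 2 == 0}

def occurs(event, outcome):
    """An event occurs on a trial if the observed outcome is an element of it."""
    return outcome in event

print(sorted(E))     # [2, 4, 6]
print(E <= S)        # True: an event is a subset of the sample space
print(occurs(E, 4))  # True: rolling a 4 means E occurred
print(occurs(E, 5))  # False: rolling a 5 means E did not occur
```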

Probability

Definition
The probability of an outcome $e$ in a sample space $S$ is a number $p$ between 0 and 1 that measures the likelihood that $e$ will occur on a single trial of the corresponding random experiment. The value $p = 0$ corresponds to the outcome $e$ being impossible and the value $p = 1$ corresponds to the outcome $e$ being certain.

Definition

The probability of an event $A$ is the sum of the probabilities of the individual outcomes of which it is composed. It is denoted $P(A)$.

If an event $E$ is $E=\{e_1,e_2,\ldots,e_k\}$, then

$P(E)=P(e_1)+P(e_2)+ \cdots +P(e_k)$

Since the whole sample space $S$ is an event that is certain to occur, the sum of the probabilities of all the outcomes must be the number 1.
In ordinary language probabilities are frequently expressed as percentages. For example, we would say that there is a 70% chance of rain tomorrow, meaning that the probability of rain is 0.70. We will use this practice here, but in all the computational formulas that follow we will use the form 0.70 and not 70%.

Example:
A die is called “balanced” or “fair” if each side is equally likely to land on top. Assign a probability to each outcome in the sample space for the experiment that consists of tossing a single fair die. Find the probabilities of the events E: “an even number is rolled” and T: “a number greater than two is rolled.”

Solution:

With outcomes labeled according to the number of dots on the top face of the die, the sample space is the set $S=\{1,2,3,4,5,6\}$. Since there are six equally likely outcomes, which must add up to 1, each is assigned probability 1/6.

Since $E=\{2,4,6\}$
$P(E)=1∕6+1∕6+1∕6=3∕6=1∕2.$

Since $T=\{3,4,5,6\}$
$ P(T)=4∕6=2∕3$.
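The same computation can be sketched in Python by storing one probability per outcome and summing over the outcomes of an event. The names `p` and `prob` are illustrative choices, and `fractions.Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}

# Fair die: each of the six outcomes gets the same probability 1/6.
p = {outcome: Fraction(1, len(S)) for outcome in S}

def prob(event):
    """P(event) is the sum of the probabilities of its outcomes."""
    return sum((p[o] for o in event), Fraction(0))

E = {o for o in S if o % 2 == 0}  # an even number is rolled
T = {o for o in S if o > 2}       # a number greater than two is rolled

print(prob(E), prob(T))  # 1/2 2/3
print(prob(S))           # 1, the whole sample space is certain
```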
Example
If a coin is tossed 3 times, find the probability that no two successive tosses show the same face.

Solution:
When the coin is tossed 3 times, the sample space is
$S = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$,
which contains 8 outcomes. Assuming the coin is fair, these are equally likely.
The favorable outcomes, in which no two successive tosses show the same face, are
$\{HTH, THT\}$, a set of 2 outcomes.
The probability that no two successive tosses show the same face is therefore
$P=2/8=1/4$
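For a sample space this small, the count can be checked by brute-force enumeration. This is a sketch that assumes a fair coin, so all eight sequences are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All sequences of three tosses: 2**3 = 8 outcomes.
S = ["".join(tosses) for tosses in product("HT", repeat=3)]

# Favorable outcomes: every pair of successive tosses differs.
favorable = [s for s in S if all(a != b for a, b in zip(s, s[1:]))]

print(len(S))                            # 8
print(favorable)                         # ['HTH', 'THT']
print(Fraction(len(favorable), len(S)))  # 1/4
```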

Example
A sample space is $S=\{a,b,c,d,e\}$. Identify two events as $U=\{a,b,d\}$ and $V=\{b,c,d\}$. Suppose $P(a)$ and $P(b)$ are each 0.2 and $P(c)$ and $P(d)$ are each 0.1.

1. Determine what $P(e)$ must be.
2. Find $P(U)$.
3. Find $P(V)$.

Solution:
$P(a)+P(b)+P(c)+P(d)+P(e)=1$
$P(e)=1-0.2-0.2-0.1-0.1=0.4$

$P(U)=P(a)+P(b)+P(d)=0.2+0.2+0.1=0.5$

$P(V)=P(b)+P(c)+P(d)=0.2+0.1+0.1=0.4$
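When outcomes are not equally likely, a dictionary from outcome to probability is a natural representation. The sketch below reproduces the three answers; the names `p` and `prob` are illustrative, and exact fractions are used so that 0.2 and 0.1 do not pick up floating-point error.

```python
from fractions import Fraction

# Given probabilities: P(a) = P(b) = 0.2 and P(c) = P(d) = 0.1.
p = {
    "a": Fraction(2, 10),
    "b": Fraction(2, 10),
    "c": Fraction(1, 10),
    "d": Fraction(1, 10),
}

# All five probabilities must sum to 1, which determines P(e).
p["e"] = 1 - sum(p.values())

def prob(event):
    """P(event) is the sum of the probabilities of its outcomes."""
    return sum(p[o] for o in event)

U = {"a", "b", "d"}
V = {"b", "c", "d"}

print(p["e"], prob(U), prob(V))  # 2/5 1/2 2/5
```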

Complements, Intersections, and Unions

Complements

Definition

The complement of an event $A$ in a sample space $S$, denoted $A^c$, is the collection of all outcomes in $S$ that are not elements of the set $A$. It corresponds to negating any description in words of the event $A$.

Example:

Two events connected with the experiment of rolling a single die are E: “the number rolled is even” and T: “the number rolled is greater than two.” Find the complement of each.

Solution:

In the sample space $S=\{1,2,3,4,5,6\}$ the corresponding sets of outcomes are $E=\{2,4,6\}$ and $T=\{3,4,5,6\}$. The complements are $E^c=\{1,3,5\}$ and $T^c=\{1,2\}$.

In words the complements are described by “the number rolled is not even” and “the number rolled is not greater than two.” Of course easier descriptions would be “the number rolled is odd” and “the number rolled is less than three.”

If there is a 60% chance of rain tomorrow, what is the probability of fair weather? The obvious answer, 40%, is an instance of the following general rule.

Probability Rule for Complements
$P(A^c)=1−P(A)$

This formula is particularly useful when finding the probability of an event directly is difficult.

Example

Find the probability that at least one heads will appear in five tosses of a fair coin.

Solution:

Identify outcomes by lists of five hs and ts, such as $tthtt$ and $hhttt$. Although it is tedious to list them all, it is not difficult to count them: each of the five tosses has two possible results, so there are $2^5=32$ outcomes.
Let $O$ denote the event “at least one heads.” There are many ways to obtain at least one heads, but only one way to fail to do so: all tails. Thus although it is difficult to list all the outcomes that form $O$, it is easy to write $O^c=\{ttttt\}$. Since the 32 outcomes are equally likely, each has probability 1/32, so
$P(O^c)=\frac{1}{32}$, hence $P(O)=1−1∕32=31∕32≈0.97$, or about a 97% chance.
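A computer has no trouble listing all 32 outcomes, so the complement rule can be compared against a direct count. This is a sketch with illustrative variable names.

```python
from fractions import Fraction
from itertools import product

# All sequences of five tosses: 2**5 = 32 equally likely outcomes.
S = ["".join(tosses) for tosses in product("ht", repeat=5)]

O = [s for s in S if "h" in s]        # at least one heads
O_c = [s for s in S if "h" not in s]  # complement: no heads at all

p_direct = Fraction(len(O), len(S))
p_complement = 1 - Fraction(len(O_c), len(S))

print(len(S), O_c)             # 32 ['ttttt']
print(p_direct, p_complement)  # 31/32 31/32
```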


Intersection of Events

Definition
The intersection of events $A$ and $B$, denoted $A ∩ B$, is the collection of all outcomes that are elements of both of the sets $A$ and $B$. It corresponds to combining descriptions of the two events using the word “and.”

Example:
In the experiment of rolling a single die, find the intersection $E ∩ T$ of the events $E$: “the number rolled is even” and $T$: “the number rolled is greater than two.”

Solution:

The sample space is $S=\{1,2,3,4,5,6\}$. Since the outcomes that are common to $E=\{2,4,6\}$ and $T=\{3,4,5,6\}$ are 4 and 6, $E∩T=\{4,6\}$.

In words the intersection is described by “the number rolled is even and is greater than two.” The only numbers between one and six that are both even and greater than two are four and six, corresponding to $E ∩ T$ given above.

Suppose the die has been “loaded” so that $P(1)=1∕12$, $P(6)=3∕12$, and the remaining four outcomes are equally likely with one another. Now find the probability that the number rolled is both even and greater than two.

The information on the probabilities of the six outcomes that we have so far is

outcome        1       2    3    4    5    6
probability    1/12    p    p    p    p    3/12

Since the six probabilities must add up to 1, the four unknown probabilities together account for $1-(1/12+3/12)=1-1/3=2/3$, that is,
$4p=2/3$

So $p=1/6$

$P(E ∩ T)=P(\{4,6\})=P(4)+P(6)=1/6+3/12=5/12$
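The loaded-die calculation can be sketched as follows: fill in the two given probabilities, split what is left equally among the other four outcomes, then sum over the intersection. The names `share` and `p_E_and_T` are illustrative.

```python
from fractions import Fraction

# Given probabilities for the loaded die.
p = {1: Fraction(1, 12), 6: Fraction(3, 12)}

# The remaining four outcomes share the leftover probability equally.
remaining = [2, 3, 4, 5]
share = (1 - sum(p.values())) / len(remaining)
for outcome in remaining:
    p[outcome] = share

E = {2, 4, 6}     # the number rolled is even
T = {3, 4, 5, 6}  # the number rolled is greater than two

# P(E ∩ T) is the sum over the outcomes common to both events.
p_E_and_T = sum(p[o] for o in E & T)

print(share)      # 1/6
print(p_E_and_T)  # 5/12
```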

Mutually Exclusive Events
Definition
Events A and B are mutually exclusive if they have no elements in common.

For A and B to have no outcomes in common means precisely that it is impossible for both A and B to occur on a single trial of the random experiment. This gives the following rule.

Probability Rule for Mutually Exclusive Events

Events A and B are mutually exclusive if and only if
$P(A∩B)=0$

Any event $A$ and its complement $A^c$ are mutually exclusive, but $A$ and $B$ can be mutually exclusive without being complements.

Union of Events


Definition

The union of events $A$ and $B$, denoted $A ∪ B$, is the collection of all outcomes that are elements of one or the other of the sets $A$ and $B$, or of both of them. It corresponds to combining descriptions of the two events using the word “or.”

Example:

In the experiment of rolling a single die, find the union of the events E: “the number rolled is even” and T: “the number rolled is greater than two.”

Solution:

Since the outcomes that are in either E={2,4,6} or T={3,4,5,6} (or both) are 2, 3, 4, 5, and 6, E∪T={2,3,4,5,6}. Note that an outcome such as 4 that is in both sets is still listed only once (although strictly speaking it is not incorrect to list it twice).

In words the union is described by “the number rolled is even or is greater than two.” Every number between one and six except the number one is either even or is greater than two, corresponding to E ∪ T given above.

The following Additive Rule of Probability is a useful formula for calculating the probability of $A∪B$.
Additive Rule of Probability
$P(A∪B)=P(A)+P(B)−P(A∩B)$
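As a quick check of the additive rule, the sketch below evaluates both sides for the events E and T of the fair-die examples. The helper `prob` is an illustrative name and assumes equally likely outcomes.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
E = {2, 4, 6}     # the number rolled is even
T = {3, 4, 5, 6}  # the number rolled is greater than two

def prob(event):
    """Fair die: probability is (number of outcomes in event) / 6."""
    return Fraction(len(event), len(S))

lhs = prob(E | T)                      # P(E ∪ T) computed directly
rhs = prob(E) + prob(T) - prob(E & T)  # additive rule

print(lhs, rhs)  # 5/6 5/6
```

Subtracting $P(E∩T)$ is what prevents the outcomes 4 and 6, which lie in both events, from being counted twice.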

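Example:
Two fair dice are thrown. Find the probabilities of the following events:
1. both dice show a four
2. at least one die shows a four

Solution:
The sample space consists of the following 36 equally likely outcomes, each written as a pair of digits, one per die:

11 21 31 41 51 61
12 22 32 42 52 62
13 23 33 43 53 63
14 24 34 44 54 64
15 25 35 45 55 65
16 26 36 46 56 66

Let $A$ denote the event “the first die shows a four” and $B$ the event “the second die shows a four.” Each of these events contains 6 of the 36 outcomes.
$P(A∩B)=P(A)⋅P(B)=\frac{6}{36}⋅\frac{6}{36}=\frac{1}{36}$, which agrees with counting the single outcome 44 among the 36.
$P(A∪B)=P(A)+P(B)-P(A∩B)=\frac{6}{36}+\frac{6}{36}-\frac{1}{36}=\frac{11}{36}$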
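Both answers can be confirmed by enumerating the 36 ordered pairs. This is a sketch with illustrative names; each pair is stored as `(first die, second die)`.

```python
from fractions import Fraction
from itertools import product

# 36 equally likely ordered pairs (first die, second die).
S = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] == 4}  # first die shows a four
B = {s for s in S if s[1] == 4}  # second die shows a four

print(prob(A & B))                      # 1/36, both dice show a four
print(prob(A | B))                      # 11/36, at least one four
print(prob(A) + prob(B) - prob(A & B))  # 11/36, via the additive rule
```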

Conditional Probability and Independent Events

Suppose a fair die has been rolled and you are asked to give the probability that it was a five. There are six equally likely outcomes, so your answer is 1/6. But suppose that before you give your answer you are given the extra information that the number rolled was odd. Since there are only three odd numbers that are possible, one of which is five, you would certainly revise your estimate of the likelihood that a five was rolled from 1/6 to 1/3.

In general, the revised probability that an event A has occurred, taking into account the additional information that another event B has definitely occurred on this trial of the experiment, is called the conditional probability of A given B and is denoted by $P(A|B)$.

Definition

The conditional probability of $A$ given $B$, denoted $P(A|B)$, is the probability that event $A$ has occurred in a trial of a random experiment for which it is known that event $B$ has definitely occurred. It may be computed by means of the following formula:

Rule for Conditional Probability
$P(A|B)=\frac{P(A∩B)}{P(B)}$

Example:
A fair die is rolled.
1. Find the probability that the number rolled is a five, given that it is odd.
2. Find the probability that the number rolled is odd, given that it is a five.

Solution:
The sample space for this experiment is the set S={1,2,3,4,5,6} consisting of six equally likely outcomes. Let F denote the event “a five is rolled” and let O denote the event “an odd number is rolled,” so that
F={5} and O={1,3,5}

1. This is the introductory example, so we already know that the answer is 1/3. To use the formula in the definition to confirm this we must replace $A$ in the formula (the event whose likelihood we seek to estimate) by $F$ and replace $B$ (the event we know for certain has occurred) by $O$:

$P(F|O)=\frac{P(F∩O)}{P(O)}$

Since $F∩O=\{5\}∩\{1,3,5\}=\{5\}$,

$P(F∩O)=1∕6.$

Since $O=\{1,3,5\}$, $P(O)=3∕6.$

Thus
$P(F|O)=\frac{P(F∩O)}{P(O)}=\frac{1∕6}{3∕6}=1/3$

2. This is the same problem, but with the roles of $F$ and $O$ reversed. Since we are given that the number that was rolled is five, which is odd, the probability in question must be 1. To apply the formula to this case we must now replace $A$ (the event whose likelihood we seek to estimate) by $O$ and $B$ (the event we know for certain has occurred) by $F$:

$P(O|F)=\frac{P(O∩F)}{P(F)}$

Obviously $P(F)=1∕6$.
In part 1 we found that $P(F∩O)=1∕6.$ Thus
$P(O|F)=\frac{P(O∩F)}{P(F)}=\frac{1∕6}{1∕6}=1$
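The rule for conditional probability translates into a one-line function. The sketch below assumes a fair die, and the name `cond_prob` is an illustrative choice. It also guards against $P(B)=0$, where the ratio is undefined.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Fair die: probability is (number of outcomes in event) / 6."""
    return Fraction(len(event), len(S))

def cond_prob(A, B):
    """P(A|B) = P(A ∩ B) / P(B), defined only when P(B) > 0."""
    if prob(B) == 0:
        raise ValueError("P(B) must be positive")
    return prob(A & B) / prob(B)

F = {5}        # a five is rolled
O = {1, 3, 5}  # an odd number is rolled

print(cond_prob(F, O))  # 1/3
print(cond_prob(O, F))  # 1
```

The two results differ, which illustrates that $P(A|B)$ and $P(B|A)$ are in general not the same quantity.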

Independent Events

Although typically we expect the conditional probability P(A|B) to be different from the probability P(A) of A, it does not have to be different from P(A). When P(A|B)=P(A), the occurrence of B has no effect on the likelihood of A. Whether or not the event A has occurred is independent of the event B.

Using algebra it can be shown that the equality P(A|B)=P(A) holds if and only if the equality P(A∩B)=P(A)⋅P(B) holds, which in turn is true if and only if P(B|A)=P(B). This is the basis for the following definition.

Definition

Events A and B are independent if
$P(A∩B)=P(A)⋅P(B)$

If A and B are not independent then they are dependent.

The formula in the definition has two practical but exactly opposite uses:

In a situation in which we can compute all three probabilities P(A), P(B) and P(A∩B), it is used to check whether or not the events A and B are independent:

If P(A∩B)=P(A)⋅P(B), then A and B are independent.
If P(A∩B)≠P(A)⋅P(B), then A and B are not independent.

In a situation in which each of P(A) and P(B) can be computed and it is known that A and B are independent, we can compute P(A∩B) by multiplying together P(A) and P(B),
that is, P(A∩B)=P(A)⋅P(B).

Example:

A single fair die is rolled. Let A={3} and B={1,3,5}. Are A and B independent?

Solution:
In this example we can compute all three probabilities 
P(A)=1∕6, P(B)=1∕2, and P(A∩B)=P({3})=1∕6.
Since the product P(A)⋅P(B)=(1∕6)(1∕2)=1∕12 is not the same number as  P(A∩B)=1∕6, the events A and B are not independent.
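The independence check is a direct comparison of $P(A∩B)$ with $P(A)⋅P(B)$. The sketch below applies it to the pair from the example. For contrast it also tests a second pair, $\{2,4,6\}$ and $\{1,2\}$, which is not from the text and was chosen here because it does turn out to be independent.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Fair die: probability is (number of outcomes in event) / 6."""
    return Fraction(len(event), len(S))

def independent(A, B):
    """A and B are independent when P(A ∩ B) equals P(A) * P(B)."""
    return prob(A & B) == prob(A) * prob(B)

print(independent({3}, {1, 3, 5}))     # False: 1/6 differs from 1/12
print(independent({2, 4, 6}, {1, 2}))  # True: 1/6 equals (1/2)(1/3)
```

Exact fractions matter here: with floating-point numbers an equality test like this can fail because of rounding.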

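Example:

A jar contains 10 marbles, 7 black and 3 white. Two marbles are drawn without replacement, which means that the first one is not put back before the second one is drawn.
1. What is the probability that both marbles are black?
2. What is the probability that exactly one marble is black?
3. What is the probability that at least one marble is black?

Solution:

1. Let $B_1$ be the event that the first marble is black and $B_2$ the event that the second marble is black. Once a black marble has been removed, 6 of the remaining 9 marbles are black, so
$P(B_1∩B_2)=P(B_1)⋅P(B_2|B_1)=\frac{7}{10}⋅\frac{6}{9}=\frac{7}{15}≈0.47$

2. Let $W_1$ be the event that the first marble is white and $W_2$ the event that the second marble is white. Exactly one marble is black if the draw is black then white, or white then black:
$P(B_1∩W_2)=\frac{7}{10}⋅\frac{3}{9}=\frac{7}{30}≈0.23$
$P(W_1∩B_2)=\frac{3}{10}⋅\frac{7}{9}=\frac{7}{30}≈0.23$
These two events are mutually exclusive, so
$P((B_1∩W_2)∪(W_1∩B_2))=\frac{7}{30}+\frac{7}{30}=\frac{7}{15}≈0.47$

3. At least one marble is black if the first is black, or the second, or both. This event is the union of the three mutually exclusive events from parts 1 and 2:
$P(B_1∩W_2)=\frac{7}{30}≈0.23$
$P(W_1∩B_2)=\frac{7}{30}≈0.23$
$P(B_1∩B_2)=\frac{7}{15}≈0.47$
Final probability: $\frac{7}{30}+\frac{7}{30}+\frac{7}{15}=\frac{14}{15}≈0.93$

This is the same as 1 minus the probability of choosing two white marbles:
$P(W_1∩W_2)=\frac{3}{10}⋅\frac{2}{9}=\frac{1}{15}≈0.07$, so $1−\frac{1}{15}=\frac{14}{15}≈0.93$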