4.1b Probability Distributions for Continuous Random Variable

For a discrete random variable $X$ the probability that $X$ assumes one of its possible values on a single trial of the experiment makes good sense. This is not the case for a continuous random variable.

For example, suppose $X$ denotes the length of time a commuter just arriving at a bus stop has to wait for the next bus. If buses run every 30 minutes without fail, then the set of possible values of X is the interval denoted [0,30], the set of all decimal numbers between 0 and 30. But although the number 7.211916 is a possible value of $X$, there is little or no meaning to the concept of the probability that the commuter will wait precisely 7.211916 minutes for the next bus. If anything the probability should be zero, since if we could meaningfully measure the waiting time to the nearest millionth of a minute it is practically inconceivable that we would ever get exactly 7.211916 minutes. More meaningful questions are those of the form: What is the probability that the commuter's waiting time is less than 10 minutes, or is between 5 and 10 minutes? In other words, with continuous random variables one is concerned not with the event that the variable assumes a single particular value, but with the event that the random variable assumes a value in a particular interval.

Definition

The probability distribution of a continuous random variable $X$ is an assignment of probabilities to intervals of decimal numbers using a function $f(x)$, called a density function, in the following way: the probability that $X$ assumes a value in the interval $[a,b]$ is equal to the area of the region that is bounded above by the graph of the equation $y=f(x)$, bounded below by the x-axis, and bounded on the left and right by the vertical lines through $a$ and $b$. The total area under the curve is 1.

Every density function $f(x)$ must satisfy the following two conditions:

1. For all numbers $x$, $f(x)≥0$, so that the graph of $y=f(x)$ never drops below the x-axis.
2. The area of the region under the graph of $y=f(x)$ and above the x-axis is 1.

Because the area of a line segment is 0, the definition of the probability distribution of a continuous random variable implies that for any particular decimal number $a$, the probability that $X$ assumes the exact value $a$ is 0. It follows that including or excluding the endpoints of an interval does not change the probability of the interval.

For any continuous random variable X:

$P(a≤X≤b)=P(a<X≤b)=P(a≤X<b)=P(a<X<b)$
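The area interpretation above can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes a hypothetical density $f(x)=2x$ on $[0,1]$ (chosen only for illustration) and approximates areas with a midpoint Riemann sum. The names `area_under` and `example_density` are made up for this example.

```python
def area_under(f, a, b, n=10_000):
    """Approximate the area under f between a and b with a midpoint Riemann sum."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def example_density(x):
    """Hypothetical density used only for illustration: f(x) = 2x on [0, 1], 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


# Condition 2: the total area under the density is 1.
total_area = area_under(example_density, 0.0, 1.0)

# P(0.25 <= X <= 0.5) is the area over the interval [0.25, 0.5].
prob_interval = area_under(example_density, 0.25, 0.5)

# A single point is an interval of width 0, so its probability is 0.
prob_point = area_under(example_density, 0.5, 0.5)

print(total_area, prob_interval, prob_point)
```

Since a single point contributes no area, adding or removing an endpoint leaves the probability of an interval unchanged, which is what the chain of equalities above states.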

Example

A random variable $X$ has the uniform distribution on the interval $[0,1]$: the density function is $f(x)=1$ if $x$ is between 0 and 1, and $f(x)=0$ for all other values of $x$, as shown in the figure.

1. Find $P(X > 0.75)$, the probability that $X$ assumes a value greater than 0.75.
2. Find $P(X ≤ 0.2)$, the probability that $X$ assumes a value less than or equal to 0.2.
3. Find $P(0.4 < X < 0.7)$, the probability that $X$ assumes a value between 0.4 and 0.7.

1. $P(X > 0.75)$ is the area of the rectangle of height 1 and base length $1−0.75=0.25$, hence is $base×height=(0.25)⋅(1)=0.25$.

2. $P(X ≤ 0.2)$ is the area of the rectangle of height 1 and base length $0.2−0=0.2$, hence is $base×height=(0.2)⋅(1)=0.2$.

3. $P(0.4 < X < 0.7)$ is the area of the rectangle of height 1 and base length $0.7−0.4=0.3$, hence is $base×height=(0.3)⋅(1)=0.3$.
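The three rectangle areas in this example can be reproduced with a few lines of Python. This is only a sketch of the base-times-height computation; the helper name `uniform_prob` and its parameters `low` and `high` are assumptions introduced here, not something from the notes.

```python
def uniform_prob(a, b, low=0.0, high=1.0):
    """P(a <= X <= b) for X uniform on [low, high], computed as base times height."""
    # Only the part of [a, b] that lies inside [low, high] carries probability.
    left = max(a, low)
    right = min(b, high)
    base = max(right - left, 0.0)
    height = 1.0 / (high - low)  # the total area must be 1
    return base * height


p1 = uniform_prob(0.75, float("inf"))   # P(X > 0.75)
p2 = uniform_prob(float("-inf"), 0.2)   # P(X <= 0.2)
p3 = uniform_prob(0.4, 0.7)             # P(0.4 < X < 0.7)

print(p1, p2, p3)
```

The results agree with 0.25, 0.2 and 0.3 up to floating-point rounding. Strict and non-strict inequalities are treated the same way because the endpoints contribute no area.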

A man arrives at a bus stop at a random time (that is, with no regard for the scheduled service) to catch the next bus. Buses run every 30 minutes without fail, hence the next bus will come any time during the next 30 minutes with evenly distributed probability (a uniform distribution). Find the probability that a bus will come within the next 10 minutes.

The graph of the density function is a horizontal line above the interval from 0 to 30 and coincides with the x-axis everywhere else. Since the total area under the curve must be 1 and the base of the rectangle is 30, the height of the horizontal line is $1/30$. The probability sought is $P(0≤X≤10)$. By definition, this probability is the area of the rectangular region bounded above by the horizontal line $f(x)=1/30$, bounded below by the x-axis, bounded on the left by the vertical line at 0 (the y-axis), and bounded on the right by the vertical line at 10. This is the shaded region. Its area is the base of the rectangle times its height, $10⋅(1/30)=1/3$.

Thus $P(0≤X≤10)=1/3$.
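As a sanity check on this answer, the bus example can also be simulated. The sketch below is an illustration rather than part of the original solution: it assumes the waiting time can be modelled by Python's `random.Random.uniform` on $[0, 30]$, and the function name, trial count and seed are arbitrary choices.

```python
import random


def simulate_wait_probability(limit=10.0, period=30.0, trials=100_000, seed=0):
    """Estimate P(wait <= limit) when the wait is uniform on [0, period]."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if rng.uniform(0.0, period) <= limit)
    return hits / trials


estimate = simulate_wait_probability()
print(estimate)  # should be close to 1/3
```

The estimate fluctuates slightly around $1/3$; a larger number of trials brings it closer to the exact area of the rectangle.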

Normal Distribution

Most people have heard of the “bell curve.” It is the graph of a specific density function $f(x)$ that describes the behavior of continuous random variables as different as the heights of human beings, the amount of a product in a container that was filled by a high-speed packing machine, or the velocities of molecules in a gas. The formula for $f(x)$ contains two parameters $\mu$ and $\sigma$ that can be assigned any specific numerical values, so long as $\sigma$ is positive. We will not need to know the formula for $f(x)$, but for those who are interested it is

$f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

The probability distribution corresponding to the density function for the bell curve with parameters $\mu$ and $\sigma$ is called the normal distribution with mean $\mu$ and standard deviation $\sigma$.

A continuous random variable whose probabilities are described by the normal distribution with mean $\mu$ and standard deviation $\sigma$ is called a normally distributed random variable, or a normal random variable for short, with mean $\mu$ and standard deviation $\sigma$.

The density curve for the normal distribution is symmetric about the mean.
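The density formula and the symmetry property can be checked numerically. The following sketch writes the formula out by hand as a function named `normal_pdf` (a name chosen here for illustration) and compares it with `statistics.NormalDist.pdf` from the Python standard library. The values $\mu=10$ and $\sigma=2.5$ are arbitrary test values.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


mu, sigma = 10.0, 2.5

# The hand-written formula should agree with the standard library.
by_hand = normal_pdf(12.0, mu, sigma)
by_library = NormalDist(mu, sigma).pdf(12.0)

# Symmetry about the mean: points equally far from mu have the same density.
left = normal_pdf(mu - 2.0, mu, sigma)
right = normal_pdf(mu + 2.0, mu, sigma)

print(by_hand, by_library, left, right)
```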

Standard normal distribution
A standard normal random variable is a normally distributed random variable with mean $\mu = 0$ and standard deviation $\sigma = 1$. It will always be denoted by the letter $Z$.

Probabilities of the form $P(Z < z)$ can be read from a table of cumulative probabilities for the standard normal distribution.
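When a printed table is not at hand, the same cumulative probabilities can be computed. One common way, sketched below, uses the error function `math.erf` from the Python standard library through the identity $P(Z ≤ z)=\frac{1}{2}\left(1+\operatorname{erf}(z/\sqrt{2})\right)$. The function name `standard_normal_cdf` is an assumption made for this example.

```python
import math


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal random variable Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Values rounded to four decimals, as in a typical table.
for z in (-0.80, 0.0, 1.60):
    print(z, round(standard_normal_cdf(z), 4))
```

Rounded to four decimal places, the results correspond to table entries such as 0.2119 for $z=-0.80$ and 0.9452 for $z=1.60$.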

Probability Computations for General Normal Random Variables

If $X$ is a normally distributed random variable with mean $\mu$ and standard deviation $\sigma$, then
$P(a<X<b)=P\left(\frac{a-\mu}{\sigma}<Z<\frac{b-\mu}{\sigma}\right)$
where $Z$ denotes a standard normal random variable. Here $a$ can be any decimal number or $-\infty$; $b$ can be any decimal number or $\infty$.

The new endpoints $\frac{a-\mu}{\sigma}$ and $\frac{b-\mu}{\sigma}$ are the $z$-scores of $a$ and $b$.

Example

Let $X$ be a normal random variable with mean $\mu = 10$ and standard deviation $\sigma = 2.5$. Compute the following probabilities.
1.$P(X < 14)$.
2.$P(8<X<14)$.

1.$P(X<14)=P(Z<\frac{14-\mu}{\sigma})$

$\quad =P(Z<\frac{14-10}{2.5})$

$=P(Z<1.60)$

$=0.9452$

2.$P(8<X<14)$

$= P(\frac{8-10}{2.5} < Z < \frac{14-10}{2.5})$

$=P(-0.80 < Z < 1.60)$

$=0.9452-0.2119$

$=0.7333$
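The two answers above can be reproduced in Python. This is a minimal sketch using `statistics.NormalDist` from the standard library (Python 3.8 or later); the helper `z_score` is a name introduced here for illustration. It standardizes the endpoints first, as in the worked solution, and then compares the result with a direct computation on $X$.

```python
from statistics import NormalDist

mu, sigma = 10.0, 2.5
X = NormalDist(mu, sigma)  # the normal random variable from the example
Z = NormalDist()           # standard normal: mean 0, standard deviation 1


def z_score(x):
    """Standardize x using the mean and standard deviation of X."""
    return (x - mu) / sigma


# 1. P(X < 14) = P(Z < 1.60)
p_less_14 = Z.cdf(z_score(14))

# 2. P(8 < X < 14) = P(-0.80 < Z < 1.60)
p_between = Z.cdf(z_score(14)) - Z.cdf(z_score(8))

# The same probability computed directly from X, without standardizing.
p_between_direct = X.cdf(14) - X.cdf(8)

print(round(p_less_14, 4), round(p_between, 4))
```

Both routes give the same value up to floating-point rounding, which is the content of the standardization formula for general normal random variables.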

Mathematics for Machine Learning with Python- CST284 KTU Minor - Dr. Binu V P -9847390760
