
4.1b Probability Distributions for Continuous Random Variables

For a discrete random variable $X$, the probability that $X$ assumes one of its possible values on a single trial of the experiment makes good sense. This is not the case for a continuous random variable.

For example, suppose $X$ denotes the length of time a commuter just arriving at a bus stop has to wait for the next bus. If buses run every 30 minutes without fail, then the set of possible values of $X$ is the interval $[0,30]$, the set of all decimal numbers between 0 and 30. But although the number 7.211916 is a possible value of $X$, there is little or no meaning to the concept of the probability that the commuter will wait precisely 7.211916 minutes for the next bus. If anything, the probability should be zero, since if we could meaningfully measure the waiting time to the nearest millionth of a minute it is practically inconceivable that we would ever get exactly 7.211916 minutes. More meaningful questions are those of the form: What is the probability that the commuter's waiting time is less than 10 minutes, or is between 5 and 10 minutes? In other words, with continuous random variables one is concerned not with the event that the variable assumes a single particular value, but with the event that the random variable assumes a value in a particular interval.

Definition

The probability distribution of a continuous random variable $X$ is an assignment of probabilities to intervals of decimal numbers using a function $f(x)$, called a density function, in the following way: the probability that $X$ assumes a value in the interval $[a,b]$ is equal to the area of the region that is bounded above by the graph of the equation $y=f(x)$, bounded below by the x-axis, and bounded on the left and right by the vertical lines through $a$ and $b$. The total area under the curve is 1.


Every density function $f(x)$ must satisfy the following two conditions:

  1. For all numbers $x$, $f(x) \ge 0$, so that the graph of $y=f(x)$ never drops below the x-axis.
  2. The area of the region under the graph of $y=f(x)$ and above the x-axis is 1.

Because the area of a line segment is 0, the definition of the probability distribution of a continuous random variable implies that for any particular decimal number, say $a$, the probability that $X$ assumes the exact value $a$ is 0. This property implies that whether or not the endpoints of an interval are included makes no difference to the probability of the interval.

For any continuous random variable X:
$P(a≤X≤b)=P(a<X≤b)=P(a≤X<b)=P(a<X<b)$
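The definition can be checked numerically: the probability of an interval is the area (integral) of the density over that interval, and the area over a single point is 0. The following is a minimal sketch, assuming scipy is available; the helper name uniform_density is ours, written here for the bus-stop waiting time used as a running example.

```python
# A minimal sketch of the definition: P(a <= X <= b) is the area under the
# density f between a and b, computed here by numerical integration.
from scipy.integrate import quad

def uniform_density(x, lo=0.0, hi=30.0):
    """Density of the bus-stop waiting time: constant 1/(hi - lo) on [lo, hi]."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

total_area, _ = quad(uniform_density, 0, 30)            # must equal 1
p_5_to_10, _ = quad(uniform_density, 5, 10)             # P(5 <= X <= 10)
p_point, _ = quad(uniform_density, 7.211916, 7.211916)  # P(X = 7.211916)

print(total_area)   # ~1.0
print(p_5_to_10)    # ~0.1667
print(p_point)      # 0.0 -- a single value carries no area, hence no probability
```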

Example
A random variable $X$ has the uniform distribution on the interval $[0,1]$: the density function is $f(x)=1$ if $x$ is between 0 and 1, and $f(x)=0$ for all other values of $x$. Its graph is a horizontal line of height 1 above the interval $[0,1]$ and coincides with the x-axis everywhere else.

  1. Find $P(X > 0.75)$, the probability that X assumes a value greater than 0.75.
  2. Find $P(X ≤ 0.2)$, the probability that X assumes a value less than or equal to 0.2.
  3. Find $P(0.4 < X < 0.7)$, the probability that X assumes a value between 0.4 and 0.7.
1. $P(X > 0.75)$ is the area of the rectangle of height 1 and base length $1-0.75=0.25$, hence is $\text{base}\times\text{height}=(0.25)(1)=0.25$.

2. $P(X \le 0.2)$ is the area of the rectangle of height 1 and base length $0.2-0=0.2$, hence is $\text{base}\times\text{height}=(0.2)(1)=0.2$.

3. $P(0.4 < X < 0.7)$ is the area of the rectangle of height 1 and base length $0.7-0.4=0.3$, hence is $\text{base}\times\text{height}=(0.3)(1)=0.3$.
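These three areas can be double-checked with scipy, which represents the uniform distribution on $[\text{loc}, \text{loc}+\text{scale}]$ through its cumulative distribution function. This is a sketch, assuming scipy is installed:

```python
# A quick numerical check of parts 1-3 using scipy's uniform distribution.
from scipy.stats import uniform

U = uniform(loc=0, scale=1)            # uniform distribution on [0, 1]

print(1 - U.cdf(0.75))                 # P(X > 0.75)       -> 0.25
print(U.cdf(0.2))                      # P(X <= 0.2)       -> 0.20
print(U.cdf(0.7) - U.cdf(0.4))         # P(0.4 < X < 0.7)  -> 0.30
```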

Example

A man arrives at a bus stop at a random time (that is, with no regard for the scheduled service) to catch the next bus. Buses run every 30 minutes without fail, so the next bus will arrive at some time during the next 30 minutes, with all arrival times equally likely (a uniform distribution). Find the probability that a bus will come within the next 10 minutes.

The graph of the density function is a horizontal line above the interval from 0 to 30 and coincides with the x-axis everywhere else. Since the total area under the curve must be 1, the height of the horizontal line is $1/30$. The probability sought is $P(0\le X\le 10)$. By definition, this probability is the area of the rectangular region bounded above by the horizontal line $f(x)=1/30$, bounded below by the x-axis, bounded on the left by the vertical line at 0 (the y-axis), and bounded on the right by the vertical line at 10. Its area is the base of the rectangle times its height, $10\cdot(1/30)=1/3$.

Thus $P(0\le X\le 10)=1/3$.
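The same answer can be reproduced with the cumulative distribution function of a uniform random variable on $[0,30]$; a small sketch, again assuming scipy is available:

```python
from scipy.stats import uniform

wait = uniform(loc=0, scale=30)        # waiting time, uniform on [0, 30] minutes
print(wait.cdf(10) - wait.cdf(0))      # P(0 <= X <= 10) = 10 * (1/30) ≈ 0.3333
```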



Normal Distribution
Most people have heard of the “bell curve.” It is the graph of a specific density function $f(x)$ that describes the behavior of continuous random variables as different as the heights of human beings, the amount of a product in a container that was filled by a high-speed packing machine, or the velocities of molecules in a gas. The formula for $f(x)$ contains two parameters $\mu$ and $\sigma$ that can be assigned any specific numerical values, so long as $\sigma$ is positive. We will not need to know the formula for $f(x)$, but for those who are interested it is

$f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

The probability distribution corresponding to the density function for the bell curve with parameters $\mu$ and $\sigma$ is called the normal distribution with mean $\mu$ and standard deviation $\sigma$.
A continuous random variable whose probabilities are described by the normal distribution with mean $\mu$ and standard deviation $\sigma$ is called a normally distributed random variable, or a normal random variable for short, with mean $\mu$ and standard deviation $\sigma$.

The density curve for the normal distribution is symmetric about the mean.
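As a quick check on the formula above, it can be coded directly and compared with scipy's normal density. This is a sketch assuming numpy and scipy are installed; the helper name bell_density is ours, not a library function.

```python
import numpy as np
from scipy.stats import norm

def bell_density(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

x = np.linspace(-4, 4, 9)
# Agrees with the library's normal density to numerical precision.
print(np.allclose(bell_density(x, 0, 1), norm.pdf(x, loc=0, scale=1)))   # True
```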




Standard Normal Distribution
A standard normal random variable is a normally distributed random variable with mean $\mu = 0$ and standard deviation $\sigma = 1$. It will always be denoted by the letter $Z$.


Probabilities for $Z$ are obtained from the standard normal (z) table, which lists the cumulative probabilities $P(Z<z)$.
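If a z-table is not at hand, the same cumulative probabilities can be read off with scipy; a minimal sketch:

```python
from scipy.stats import norm

print(norm.cdf(1.60))     # P(Z < 1.60)  ≈ 0.9452
print(norm.cdf(-0.80))    # P(Z < -0.80) ≈ 0.2119
```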

Probability Computations for General Normal Random Variables

If $X$ is a normally distributed random variable with mean $\mu$ and standard deviation $\sigma$, then
$P(a<X<b)=P\left(\frac{a-\mu}{\sigma}<Z<\frac{b-\mu}{\sigma}\right)$
where $Z$ denotes a standard normal random variable. Here $a$ can be any decimal number or $-\infty$; $b$ can be any decimal number or $\infty$.

The new endpoints $\frac{a-\mu}{\sigma}$ and $\frac{b-\mu}{\sigma}$ are the $z$-scores of $a$ and $b$.


Example

Let $X$ be a normal random variable with mean $\mu = 10$ and standard deviation $\sigma = 2.5$. Compute the following probabilities.
  1. $P(X < 14)$
  2. $P(8 < X < 14)$

1. $P(X<14)=P\left(Z<\frac{14-\mu}{\sigma}\right)$
$\quad =P\left(Z<\frac{14-10}{2.5}\right)$
$\quad =P(Z<1.60)$
$\quad =0.9452$

2. $P(8<X<14)=P\left(\frac{8-10}{2.5}<Z<\frac{14-10}{2.5}\right)$
$\quad =P(-0.80<Z<1.60)$
$\quad =P(Z<1.60)-P(Z<-0.80)$
$\quad =0.9452-0.2119$
$\quad =0.7333$
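Both parts of the example can be verified numerically; a sketch assuming scipy is available:

```python
from scipy.stats import norm

X = norm(loc=10, scale=2.5)            # normal random variable with mu = 10, sigma = 2.5

print(X.cdf(14))                       # P(X < 14)     ≈ 0.9452
print(X.cdf(14) - X.cdf(8))            # P(8 < X < 14) ≈ 0.7333
```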

