Bernoulli Distribution
The Bernoulli distribution, named after the Swiss mathematician Jacob Bernoulli, is the discrete probability distribution of a random variable which takes the value 1 with probability $\mu$ and the value 0 with probability $1-\mu$. Less formally, it can be thought of as a model for the set of possible outcomes of any single experiment that asks a yes–no question. Such questions lead to outcomes that are Boolean-valued: a single bit whose value is success/yes/true/one with probability $\mu$ and failure/no/false/zero with probability $1-\mu$. The Bernoulli distribution is a special case of the binomial distribution in which a single trial is conducted (so $N = 1$ for such a binomial distribution).
The Bernoulli distribution is a distribution for a single binary random variable $X$ with state $x \in \{0,1\}$. It is governed by a single continuous parameter $\mu \in [0, 1] $ that represents the probability of $X = 1$. The Bernoulli distribution $Ber(\mu)$ is defined as
$p(x |\mu) = \mu^x(1-\mu)^{1-x}$ , $x \in \{0, 1\}$
$E[x]=\mu$
This follows because, for a Bernoulli-distributed random variable $X$ with $\Pr(X=1)=\mu$ and $\Pr(X=0)=1-\mu$, we find
$E [X]=\Pr(X=1)\cdot 1+\Pr(X=0)\cdot 0=\mu \cdot 1+ (1-\mu)\cdot 0=\mu$
The variance of $X$ is $V[X]=E[X^2]-(E[X])^2$. Since $X$ takes only the values 0 and 1, we have $X^2=X$ and therefore $E[X^2]=E[X]=\mu$, so
$V[X]=\mu-\mu^2=\mu(1-\mu)$
where $E[X]$ and $V[X]$ are the mean and variance of the binary random variable $X$.
An example where the Bernoulli distribution can be used is when we are interested in modeling the probability of “heads” when flipping a coin.
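As a quick check of these formulas, here is a minimal Python sketch (assuming SciPy is available) that evaluates the Bernoulli pmf, mean, and variance for a coin with $\mu = 0.3$; the value 0.3 is just an illustrative choice.

```python
from scipy.stats import bernoulli

mu = 0.3  # illustrative probability of "heads" (X = 1)

# pmf: p(x | mu) = mu^x * (1 - mu)^(1 - x) for x in {0, 1}
print(bernoulli.pmf(1, mu))   # 0.3
print(bernoulli.pmf(0, mu))   # 0.7

# mean and variance match E[X] = mu and V[X] = mu * (1 - mu)
print(bernoulli.mean(mu))     # 0.3
print(bernoulli.var(mu))      # 0.21
```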
Binomial Distribution
The binomial distribution with parameters $N$ and $\mu$ is the discrete probability distribution of the number of successes in a sequence of $N$ independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: success (with probability $\mu$) or failure (with probability $1-\mu$).
The Binomial distribution is a generalization of the Bernoulli distribution to a distribution over integers. In particular, the Binomial can be used to describe the probability of observing $m$ occurrences of $X = 1$ in a set of $N$ samples from a Bernoulli distribution where $p(X = 1) = \mu \in [0, 1]$. The Binomial distribution $Bin(N, \mu)$ is defined as
$p(m|N,\mu)=\binom{N}{m}\mu^m(1-\mu)^{N-m}$,
$\binom{N}{m}$ is the binomial coefficient, equal to $\binom{N}{m}=\frac{N!}{(N-m)!\,m!}$, which is where the distribution gets its name.
Example:
Suppose a biased coin comes up heads with probability 0.3 when tossed. The probability of seeing exactly 4 heads in 6 tosses is
$p(4|6,0.3)=\binom{6}{4}0.3^4(1-0.3)^{6-4}=0.059535$
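This number can be reproduced with a short sketch (again assuming SciPy), both directly from the formula and from scipy.stats.binom:

```python
from math import comb
from scipy.stats import binom

N, m, mu = 6, 4, 0.3

# directly from the formula: C(6, 4) * 0.3^4 * 0.7^2
by_hand = comb(N, m) * mu**m * (1 - mu)**(N - m)
print(by_hand)              # 0.059535...

# the same value from SciPy's pmf
print(binom.pmf(m, N, mu))  # 0.059535...
```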
Writing $X = X_1 + X_2 + \cdots + X_N$ as the sum of $N$ independent Bernoulli($\mu$) random variables, the mean follows from linearity of expectation:
$E[X]=E[X_1]+E[X_2]+\cdots+E[X_N]=\mu+\mu+\cdots+\mu$
$E[X]=N\mu$
Because the trials are independent, the variance of the sum is the sum of the variances $V[X_i]=\mu(1-\mu)$:
$V[X]=V[X_1]+V[X_2]+\cdots+V[X_N]$
$V[X]=N\mu(1-\mu)$
An example where the Binomial could be used is if we want to describe the probability of observing $m$ “heads” in $N$ coin-flip experiments, if the probability for observing head in a single experiment is $\mu$.
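A small sketch (illustrative values only, SciPy assumed) confirms the $N\mu$ and $N\mu(1-\mu)$ formulas against SciPy's built-in moments:

```python
from scipy.stats import binom

N, mu = 6, 0.3

# E[X] = N * mu and V[X] = N * mu * (1 - mu), checked against SciPy
print(N * mu, binom.mean(N, mu))            # 1.8   1.8
print(N * mu * (1 - mu), binom.var(N, mu))  # 1.26  1.26
```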
Beta Distribution
The beta distribution is a family of continuous probability distributions defined on the interval [0, 1] parameterized by two positive shape parameters, denoted by α and β, that appear as exponents of the random variable and control the shape of the distribution.
We may wish to model a continuous random variable on a finite interval. The Beta distribution is a distribution over a continuous random variable $\mu \in [0, 1]$, which is often used to represent the probability of some binary event (e.g., the parameter governing the Bernoulli distribution). The Beta distribution $Beta(\alpha,\beta)$ is itself governed by two parameters $\alpha > 0$, $\beta > 0$ and is defined as
$p(\mu|\alpha,\beta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\mu^{\alpha-1}(1-\mu)^{\beta-1}$
$E[\mu]=\frac{\alpha}{\alpha+\beta}, V[\mu]=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
where $\Gamma(\cdot)$ is the Gamma function, defined as
$\Gamma(t)=\int_0^\infty x^{t-1}\exp(-x)\,dx$, $t>0$,
which satisfies the recurrence $\Gamma(t+1)=t\Gamma(t)$.
Let's ignore the coefficient $\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}$ for now; it is just a normalizing constant that makes the density integrate to 1.
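To make the definition concrete, here is a sketch (SciPy assumed) that evaluates the density from the Gamma-function formula, compares it against scipy.stats.beta, and checks numerically that the normalizing constant makes it integrate to 1; $\alpha = 2$, $\beta = 8$ are just example values.

```python
from math import gamma
from scipy.integrate import quad
from scipy.stats import beta

a, b = 2.0, 8.0  # example shape parameters

def beta_pdf(mu):
    # p(mu | a, b) = Gamma(a + b) / (Gamma(a) * Gamma(b)) * mu^(a-1) * (1 - mu)^(b-1)
    return gamma(a + b) / (gamma(a) * gamma(b)) * mu ** (a - 1) * (1 - mu) ** (b - 1)

# the hand-written density agrees with scipy.stats.beta
print(beta_pdf(0.2), beta.pdf(0.2, a, b))

# the normalizing constant makes the density integrate to 1 over [0, 1]
print(quad(beta_pdf, 0, 1)[0])  # ~1.0

# mean and variance formulas
print(a / (a + b), beta.mean(a, b))                          # 0.2
print(a * b / ((a + b) ** 2 * (a + b + 1)), beta.var(a, b))  # ~0.0145
```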
The difference between the binomial and the Beta is that the former models the number of successes ($m$), while the latter models the probability ($\mu$) of success.
In other words, the probability is a parameter in the binomial; in the Beta, the probability is a random variable.
You can choose the $\alpha$ and $\beta$ parameters to reflect your prior belief. If you think the probability of success is very high, say 90%, set $\alpha$ to 90 and $\beta$ to 10. If you think otherwise, set $\beta$ to 90 and $\alpha$ to 10.
As $\alpha$ becomes larger (more successful events), the peak of the probability distribution shifts towards the right, whereas an increase in $\beta$ moves the distribution towards the left (more failures). Also, the distribution narrows if both $\alpha$ and $\beta$ increase, because we are more certain.
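One quick way to see this behavior is to print the peak location, $(\alpha-1)/(\alpha+\beta-2)$ for $\alpha,\beta > 1$, and the standard deviation for a few $(\alpha, \beta)$ pairs; the values below are only illustrative.

```python
from scipy.stats import beta

# peak (mode) is (alpha - 1) / (alpha + beta - 2) when alpha, beta > 1
for a, b in [(2, 8), (8, 2), (20, 80), (90, 10)]:
    mode = (a - 1) / (a + b - 2)
    print(a, b, round(mode, 3), round(beta.std(a, b), 3))
# larger alpha pushes the peak right, larger beta pushes it left,
# and increasing both shrinks the spread
```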
Example: probability of probability
Let’s say the probability that someone agrees to go on a date with you follows a Beta distribution with $\alpha = 2$ and $\beta = 8$. What is the probability that your success rate will be greater than 50%?
$P(X>0.5) = 1- CDF(0.5) = 0.01953$
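The 0.01953 figure can be reproduced from the CDF of Beta(2, 8), or equivalently its survival function, assuming SciPy:

```python
from scipy.stats import beta

# P(success rate > 0.5) under Beta(alpha=2, beta=8)
print(1 - beta.cdf(0.5, 2, 8))  # ~0.01953
print(beta.sf(0.5, 2, 8))       # same value via the survival function
```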
Dr. Bognar at the University of Iowa built a calculator for the Beta distribution, which I found useful and beautiful. You can experiment with different values of α and β and visualize how the shape changes.
Why do we use the Beta distribution? If we just want a probability distribution to model a probability, any distribution over (0, 1) would work, and creating one is easy: take any function that stays positive and doesn’t blow up anywhere between 0 and 1, integrate it from 0 to 1, and divide the function by that result. You now have a probability distribution that can be used to model a probability. In that case, why do we insist on the Beta distribution over an arbitrary one?
The Beta distribution is the conjugate prior for the Bernoulli, binomial, negative binomial and geometric distributions (seems like those are the distributions that involve success & failure) in Bayesian inference.
Computing a posterior using a conjugate prior is very convenient, because you can avoid expensive numerical computation involved in Bayesian Inference.
If we choose to use the beta distribution as a prior, during the modeling phase, we already know the posterior will also be a beta distribution. Therefore, after carrying out more experiments (asking more people to go on a date with you), you can compute the posterior simply by adding the number of acceptances and rejections to the existing parameters α, β respectively, instead of multiplying the likelihood with the prior distribution.
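The update rule really is just counting. A minimal sketch (the prior and the new counts below are hypothetical, purely for illustration):

```python
from scipy.stats import beta

# prior belief about the acceptance rate
alpha_prior, beta_prior = 2, 8

# hypothetical new data: 3 acceptances and 7 rejections out of 10 further asks
accepted, rejected = 3, 7

# conjugacy: the posterior is again a Beta, with the counts added to the parameters
alpha_post = alpha_prior + accepted   # 5
beta_post = beta_prior + rejected     # 15

print(beta.mean(alpha_post, beta_post))  # posterior mean = 5 / 20 = 0.25
```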
Poisson distribution
In probability theory and statistics, the Poisson distribution, named after the French mathematician Siméon Denis Poisson, is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, if these events occur with a known constant mean rate and independently of the time since the last event.
For instance, a call center receives an average of 180 calls per hour, 24 hours a day. The calls are independent; receiving one does not change the probability of when the next one will arrive. The number of calls received during any minute then has a Poisson distribution with mean 3: the most likely counts are 2 and 3, but 1 and 4 are also likely, there is a small probability of it being as low as zero, and a very small probability it could be 10.
A discrete random variable $X$ is said to have a Poisson distribution with parameter $\lambda > 0$ if it has a probability mass function given by:
$f(k;\lambda)=\Pr(X=k)=\frac{\lambda^k e^{-\lambda}}{k!}$
where
$k$ is the number of occurrences ($k=0,1,2,\dots$)
$e$ is Euler's number ($e \approx 2.71828$)
The positive real number $\lambda$ is equal to the expected value of $X$ and also to its variance:
$\lambda=E(X)=Var(X)$
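Returning to the call-center example, $\lambda = 3$ calls per minute (180 per hour). A small sketch (SciPy assumed) evaluates the pmf from the formula, compares it with scipy.stats.poisson, and confirms that 2 and 3 are the most likely counts and that the mean equals the variance:

```python
from math import exp, factorial
from scipy.stats import poisson

lam = 3.0  # 180 calls per hour = 3 calls per minute

# pmf from the formula and from SciPy, for k = 0..5
for k in range(6):
    by_hand = lam ** k * exp(-lam) / factorial(k)
    print(k, round(by_hand, 4), round(poisson.pmf(k, lam), 4))
# k = 2 and k = 3 share the highest probability (~0.224)

print(poisson.mean(lam), poisson.var(lam))  # both equal lambda = 3.0
```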
The Poisson distribution can be applied to systems with a large number of possible events, each of which is rare. The number of such events that occur during a fixed time interval is, under the right circumstances, a random number with a Poisson distribution.
The equation can be adapted if, instead of the average number of events $\lambda$, we are given the rate $r$ at which events occur per unit of time. Over an interval of length $t$ we then have $\lambda = rt$, and
$P(k \text{ events in interval } t)=\frac{(rt)^{k}e^{-rt}}{k!}$
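For example, with $r = 3$ calls per minute, the probability of exactly 10 calls in a 2-minute window uses $\lambda = rt = 6$; the sketch below (SciPy assumed, numbers illustrative) evaluates it:

```python
from scipy.stats import poisson

r, t = 3.0, 2.0   # 3 events per minute, over a 2-minute interval
lam = r * t       # lambda = r * t = 6

print(poisson.pmf(10, lam))  # P(10 events in 2 minutes) ~ 0.0413
```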