
Posts

4.7 Conjugacy and Exponential Family

Conjugacy

According to Bayes' theorem, the posterior is proportional to the product of the prior and the likelihood. Specifying the prior can be tricky for two reasons: first, the prior should encapsulate our knowledge about the problem before we see any data, which is often difficult to describe; second, it is often not possible to compute the posterior distribution analytically. However, some priors are computationally convenient and are called conjugate priors. In Bayesian probability theory, if the posterior distribution $p(\theta \mid x)$ is in the same probability distribution family as the prior distribution $p(\theta)$, the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior for the likelihood function $p(x \mid \theta)$. A conjugate prior is an algebraic convenience, giving a closed-form expression for the posterior; otherwise numerical integration may be necessary. Further, conjugate priors can give intuition by showing more transparently how a likelihood function updates a prior distribution.
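To make the closed-form update concrete, here is a minimal sketch (my own illustration, not from the notes) of the classic Beta-Bernoulli conjugate pair: a Beta(a, b) prior combined with Bernoulli observations yields a Beta posterior whose parameters are just the prior parameters plus the observed counts. The helper name and the example data are assumptions made for the sketch.

```python
# Sketch of conjugacy: Beta prior + Bernoulli likelihood -> Beta posterior.
# The helper name `beta_bernoulli_update` and the data are illustrative only.
from scipy import stats

def beta_bernoulli_update(a, b, data):
    """Return the Beta posterior parameters after observing 0/1 data.

    Prior:      theta ~ Beta(a, b)
    Likelihood: x_i   ~ Bernoulli(theta)
    Posterior:  theta | data ~ Beta(a + #ones, b + #zeros)
    """
    ones = sum(data)
    zeros = len(data) - ones
    return a + ones, b + zeros

data = [1, 0, 1, 1, 0, 1]                               # six coin flips
a_post, b_post = beta_bernoulli_update(2.0, 2.0, data)  # Beta(2, 2) prior
posterior = stats.beta(a_post, b_post)
print(posterior.mean())  # posterior mean of theta, available in closed form
```

Because the posterior stays in the Beta family, quantities such as its mean and credible intervals are available in closed form, with no numerical integration.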

4.6 Bernoulli, Binomial, Beta and Poisson Distributions

Bernoulli Distribution

The Bernoulli distribution, named after the Swiss mathematician Jacob Bernoulli, is the discrete probability distribution of a random variable that takes the value 1 with probability $\mu$ and the value 0 with probability $1-\mu$. Less formally, it can be thought of as a model for the set of possible outcomes of any single experiment that asks a yes–no question. Such questions lead to Boolean-valued outcomes: a single bit whose value is success/yes/true/one with probability $\mu$ and failure/no/false/zero with probability $1-\mu$. The Bernoulli distribution is a special case of the binomial distribution in which a single trial is conducted (so $n = 1$ for that binomial distribution). It is a distribution for a single binary random variable $X$ with state $x \in \{0,1\}$, governed by a single continuous parameter $\mu \in [0, 1]$ that represents the probability of $X = 1$. The Bernoulli distribution $\mathrm{Ber}(\mu)$ is defined as $p(x \mid \mu) = \mu^{x}(1-\mu)^{1-x}, \quad x \in \{0,1\}$.
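As a small illustrative sketch (not part of the notes), the pmf above can be evaluated directly and checked against scipy.stats.bernoulli; the value of $\mu$ and the sample size below are arbitrary choices.

```python
# Sketch: evaluating the Bernoulli pmf p(x | mu) = mu^x (1 - mu)^(1 - x).
from scipy import stats

mu = 0.3

def bernoulli_pmf(x, mu):
    # direct implementation of Ber(mu) for x in {0, 1}
    return mu**x * (1 - mu)**(1 - x)

print(bernoulli_pmf(1, mu), bernoulli_pmf(0, mu))   # 0.3, 0.7
print(stats.bernoulli(mu).pmf([0, 1]))              # same values via scipy

samples = stats.bernoulli(mu).rvs(size=10_000, random_state=0)
print(samples.mean())                               # close to mu
```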

4.5 Gaussian Distribution

The Gaussian distribution is the most well-studied probability distribution for continuous-valued random variables. It is also referred to as the normal distribution. Its importance originates from the fact that it has many computationally convenient properties. The Gaussian distribution arises naturally when we consider sums of independent and identically distributed random variables; this is the content of the central limit theorem. It appears in many areas of machine learning, such as linear regression, density estimation, and reinforcement learning. For a univariate random variable, the Gaussian distribution has the density $p(x \mid \mu,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$. The multivariate Gaussian distribution is fully characterized by a mean vector $\mu$ and a covariance matrix $\Sigma$ and is defined as $p(x \mid \mu,\Sigma)=(2\pi)^{-\frac{D}{2}}|\Sigma|^{-\frac{1}{2}}\exp\left(-\frac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)$, where $x \in \mathbb{R}^{D}$.
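The densities above can be evaluated directly with numpy and compared against scipy.stats as a sanity check; the following sketch (not from the notes) uses arbitrary example values for $\mu$, $\sigma^2$, the mean vector, and the covariance matrix.

```python
# Sketch: evaluating the univariate and multivariate Gaussian densities.
import numpy as np
from scipy import stats

# univariate N(mu, sigma^2)
mu, sigma2, x = 1.0, 4.0, 0.5
p_uni = 1.0 / np.sqrt(2 * np.pi * sigma2) * np.exp(-(x - mu)**2 / (2 * sigma2))
print(p_uni, stats.norm(mu, np.sqrt(sigma2)).pdf(x))   # should agree

# multivariate N(mu_vec, Sigma) with D = 2
mu_vec = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x_vec = np.array([0.3, 0.8])
D = len(mu_vec)
diff = x_vec - mu_vec
p_multi = (2 * np.pi)**(-D / 2) * np.linalg.det(Sigma)**(-0.5) \
          * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))
print(p_multi, stats.multivariate_normal(mu_vec, Sigma).pdf(x_vec))  # should agree
```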

4.4 Summary Statistics and Independence

We are often interested in summarizing sets of random variables and in comparing pairs of random variables. A statistic of a random variable is a deterministic function of that random variable. The summary statistics of a distribution provide one useful view of how a random variable behaves and, as the name suggests, give numbers that summarize and characterize the distribution. The mean and the variance are two well-known summary statistics. There are two ways to compare a pair of random variables: first, how to say that two random variables are independent, and second, how to compute an inner product between them.

Means and Covariances

The mean and the (co)variance are often useful to describe properties of probability distributions (expected values and spread). There is a useful family of distributions, called the exponential family, where the statistics of the random variable capture all possible information. The concept of the expected value is central to machine learning, and the foundational concepts of machine learning itself can be derived from it.
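As a brief illustrative sketch (the sample data below are assumed, not from the notes), empirical means, variances, and covariances can be computed with numpy as follows.

```python
# Sketch: empirical summary statistics (mean, variance, covariance) with numpy.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # samples of one random variable
y = 0.5 * x + rng.normal(size=1000)             # a second, correlated variable

print(x.mean(), x.var(ddof=1))   # sample mean and (unbiased) sample variance
print(np.cov(x, y))              # 2x2 sample covariance matrix
print(np.corrcoef(x, y)[0, 1])   # correlation coefficient in [-1, 1]
```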

Syllabus: Mathematics for Machine Learning - CST 284 - KTU

Syllabus

Module 1 LINEAR ALGEBRA: Systems of Linear Equations – Matrices, Solving Systems of Linear Equations. Vector Spaces – Vector Spaces, Linear Independence, Basis and Rank. Linear Mappings – Matrix Representation of Linear Mappings, Basis Change, Image and Kernel.

Module 2 ANALYTIC GEOMETRY, MATRIX DECOMPOSITIONS: Norms, Inner Products, Lengths and Distances, Angles and Orthogonality, Orthonormal Basis, Orthogonal Complement, Orthogonal Projections – Projection into One-Dimensional Subspaces, Projection onto General Subspaces, Gram-Schmidt Orthogonalization. Determinant and Trace, Eigenvalues and Eigenvectors, Cholesky Decomposition, Eigendecomposition and Diagonalization, Singular Value Decomposition, Matrix Approximation.

Module 3 VECTOR CALCULUS: Differentiation of Univariate Functions – Partial Differentiation and Gradients, Gradients of Vector-Valued Functions, Gradients of Matrices, Useful Identities for Computing Gradients. Backpropagation and Automatic Differentiation

4.3 Sum Rule, Product Rule, and Bayes’ Theorem

We think of probability theory as an extension of logical reasoning. Probabilistic modelling provides a principled foundation for designing machine learning methods. Once we have defined probability distributions corresponding to the uncertainties of the data and our problem, it turns out that there are only two fundamental rules: the sum rule and the product rule. Let $p(x,y)$ be the joint distribution of the two random variables $x, y$. The distributions $p(x)$ and $p(y)$ are the corresponding marginal distributions, and $p(y \mid x)$ is the conditional distribution of $y$ given $x$.

Sum Rule

The addition rule states that the probability that at least one of two events occurs is the sum of their individual probabilities minus the probability that both occur: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. Suppose $A$ and $B$ are disjoint, so that their intersection is empty; then the probability of their intersection is zero, $P(A \cap B) = 0$, and the addition law simplifies to $P(A \cup B) = P(A) + P(B)$, which holds whenever $A$ and $B$ are mutually exclusive.
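As an illustrative sketch (the joint table below is an assumed toy example, not from the notes), the sum rule (marginalization), the product rule, and Bayes' theorem can all be verified numerically on a small discrete joint distribution.

```python
# Sketch: sum rule, product rule, and Bayes' theorem on a toy discrete joint p(x, y).
import numpy as np

# joint distribution p(x, y) over x in {0,1,2} (rows) and y in {0,1} (columns)
p_xy = np.array([[0.10, 0.20],
                 [0.25, 0.05],
                 [0.15, 0.25]])
assert np.isclose(p_xy.sum(), 1.0)

p_x = p_xy.sum(axis=1)               # sum rule: p(x) = sum_y p(x, y)
p_y = p_xy.sum(axis=0)               # marginal p(y)
p_y_given_x = p_xy / p_x[:, None]    # product rule rearranged: p(y | x) = p(x, y) / p(x)

# Bayes' theorem: p(x | y) = p(y | x) p(x) / p(y)
p_x_given_y = p_y_given_x * p_x[:, None] / p_y[None, :]
print(p_x_given_y[:, 0])    # posterior over x given y = 0
print(p_xy[:, 0] / p_y[0])  # same result computed directly from the joint
```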