3  Standard distributions

Having introduced probability theory, it is natural to look at some concrete example distributions. These distributions form the backbone of our modeling exercises.

One of the first uses of probability distributions in modeling is density estimation: we assume the data come from some distribution and then determine the parameters of that distribution that best fit the data. Density estimation is fundamentally an ill-posed problem, because any distribution whose support includes the observed data could in principle have produced them, however slim the probability.
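
As a concrete illustration, here is a minimal Python sketch of this procedure, assuming a Gaussian model; the synthetic data, seed, and parameter values are my own choices, not from the text:

```python
# Density estimation by maximum likelihood (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
observations = rng.normal(loc=2.0, scale=0.5, size=1000)  # stand-in data

# Assume a Gaussian model; its ML parameter estimates are the sample
# mean and the (biased, divide-by-N) sample variance.
mu_ml = observations.mean()
var_ml = observations.var()

def density(x):
    """Fitted Gaussian density evaluated at x."""
    return np.exp(-0.5 * (x - mu_ml) ** 2 / var_ml) / np.sqrt(2.0 * np.pi * var_ml)

print(f"mu_ml = {mu_ml:.3f}, var_ml = {var_ml:.4f}")
print(f"estimated density at x = 2: {density(2.0):.3f}")
```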

Parametric distributions can fall short for complex real-world data. Nonparametric methods still have parameters, but the parameters serve a different purpose: in a parametric distribution they determine the shape of the distribution, whereas in a nonparametric method they control the model complexity. With nonparametric methods, however, the model complexity can grow rapidly with the size of the data set, as the sketch below illustrates.
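
Here is a minimal sketch of the contrast (the data and bandwidth are my own illustrative assumptions): the Gaussian fit always has exactly two parameters, while the kernel density estimate keeps one kernel per data point, so it grows with the data set.

```python
# Parametric vs. nonparametric density estimation (illustrative sketch).
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=200)

# Parametric: a Gaussian has exactly two parameters, however large the data set.
mu, var = data.mean(), data.var()

def gaussian_pdf(x, mu, var):
    """Density of N(mu, var) evaluated at x."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Nonparametric: a kernel density estimate keeps one Gaussian kernel per
# data point; its parameter, the bandwidth h, controls smoothness (model
# complexity) rather than the shape of a fixed family.
def kde(x, data, h=0.3):
    """Gaussian kernel density estimate at x; cost grows with len(data)."""
    z = (x - data) / h
    return np.mean(np.exp(-0.5 * z ** 2)) / (h * np.sqrt(2.0 * np.pi))

print("parametric density at 0:", gaussian_pdf(0.0, mu, var))
print("KDE at 0:", kde(0.0, data))
```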

Deep neural nets offer the best of both worlds: like nonparametric methods they can flexibly fit complex data, yet like parametric models their complexity is fixed in advance rather than growing with the size of the data set.

3.1 Discrete variables

3.1.1 Bernoulli distribution

3.1.2 Binomial distribution

3.1.3 Multinomial distribution

3.2 The multivariate Gaussian

3.2.1 Geometry of the Gaussian

3.2.2 Moments

3.2.3 Limitations

3.2.4 Conditional distribution

3.2.5 Marginal distribution

3.2.6 Bayes’ theorem

3.2.7 Maximum likelihood

3.2.8 Sequential estimation

3.2.9 Mixture of Gaussians

3.3 Periodic variables

3.3.1 von Mises distribution

3.4 The exponential family

\[ p(\mathbf{x}|\boldsymbol{\eta}) = h(\mathbf{x})\, g(\boldsymbol{\eta}) \exp\{\boldsymbol{\eta}^{\mathrm{T}} \mathbf{u}(\mathbf{x})\} \]

The four key elements of the exponential family are:

  • the base measure \( h(\mathbf{x}) \)
  • the partition function, whose reciprocal \( g(\boldsymbol{\eta}) \) normalizes the density
  • the natural parameter \( \boldsymbol{\eta} \)
  • the sufficient statistics \( \mathbf{u}(\mathbf{x}) \)
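
To make these elements concrete, the Bernoulli distribution can be cast into this form (a standard manipulation, sketched here rather than derived in full):

\[ p(x|\mu) = \mu^{x}(1-\mu)^{1-x} = (1-\mu)\exp\left\{ x \ln\frac{\mu}{1-\mu} \right\} \]

so that \( h(x) = 1 \), \( u(x) = x \), the natural parameter is \( \eta = \ln\{\mu/(1-\mu)\} \) (the logit), and \( g(\eta) = 1 - \mu = 1/(1 + e^{\eta}) \) supplies the normalization.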

3.4.1 Sufficient statistics

Because the data enter the exponential-family density only through \( \mathbf{u}(\mathbf{x}) \), the likelihood of a data set depends on it only via the sufficient statistics; any information in the data beyond these statistics is irrelevant for estimating \( \boldsymbol{\eta} \).
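
As an illustrative sketch (the data sets and values of \( \mu \) are my own): for a Bernoulli model the sufficient statistic is the count of ones, so two data sets with the same count but different orderings yield identical likelihoods for every \( \mu \):

```python
# Sufficiency sketch for the Bernoulli model (illustrative assumptions).
import numpy as np

def log_likelihood(mu, data):
    """Bernoulli log-likelihood; it sees the data only through sum and length."""
    n1 = data.sum()              # sufficient statistic: number of ones
    n0 = len(data) - n1
    return n1 * np.log(mu) + n0 * np.log(1.0 - mu)

a = np.array([1, 1, 0, 0, 1, 0, 1, 0])  # four ones out of eight
b = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # different ordering, same statistic

for mu in (0.3, 0.5, 0.7):
    assert np.isclose(log_likelihood(mu, a), log_likelihood(mu, b))
print("identical likelihoods: only the sufficient statistic matters")
```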

3.5 Nonparametric methods

3.5.1 Histograms

3.5.2 Kernel density

3.5.3 Nearest neighbors