Chapter 3

Probability Distributions

A probability distribution specifies the relative likelihoods of all possible outcomes.

Random Variables

Formally, a random variable is a function that assigns a real number to each outcome in the probability space. Define your own discrete random variable for the uniform probability space on the right and sample to find the empirical distribution.

Click and drag to select sections of the probability space, choose a real number value, then press "Submit."

Sample from the probability space to generate the empirical distribution of your random variable.
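
The same experiment can be sketched in code. This is a minimal illustration, not the page's implementation: the 4x4 grid of outcomes, the function name `X`, and the choice of values are all assumptions made for the example.

```python
import random
from collections import Counter

# Assumed probability space: a 4x4 grid of equally likely outcomes.
outcomes = [(row, col) for row in range(4) for col in range(4)]

def X(outcome):
    """A random variable: a function from outcomes to real numbers.

    Arbitrary choice for illustration: the top row maps to 1.0,
    every other outcome maps to 0.0.
    """
    row, _ = outcome
    return 1.0 if row == 0 else 0.0

def empirical_distribution(rv, space, n_samples, seed=0):
    """Sample outcomes uniformly and return the relative frequency of each value."""
    rng = random.Random(seed)
    counts = Counter(rv(rng.choice(space)) for _ in range(n_samples))
    return {value: count / n_samples for value, count in sorted(counts.items())}

dist = empirical_distribution(X, outcomes, 10_000)
print(dist)  # relative frequencies; the true probabilities are 0.75 and 0.25
```

With more samples, the relative frequencies should settle closer to the true probabilities of each value.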

Discrete and Continuous

There are two major classes of probability distributions: discrete and continuous.

A discrete random variable has a finite or countably infinite number of possible values.

If $ X $ is a discrete random variable, then there exist unique nonnegative functions $ f(x) $ and $ F(x) $ such that the following are true:

$$\begin{align*}P(X = x) &= f(x)\\P(X \leq x) &= F(x)\end{align*}$$

Choose one of the following major discrete distributions to visualize. The probability mass function $ f(x) $ is shown in yellow and the cumulative distribution function $ F(x) $ in orange (controlled by the slider).
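
To make the relationship between the two functions concrete, here is a small sketch using a fair six-sided die as an assumed example (the die and the names `pmf` and `cdf` are illustrative). The cumulative distribution function accumulates the probability mass function over every value up to and including $x$.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, so f(x) = 1/6 for x = 1, ..., 6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(x):
    """F(x) = P(X <= x): sum the PMF over every value not exceeding x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

print(cdf(4))  # 2/3
print(cdf(0))  # 0
print(cdf(6))  # 1
```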

A Bernoulli random variable takes the value 1 with probability $p$ and the value 0 with probability $1-p$. It is frequently used to represent binary experiments, such as a coin toss.

A binomial random variable is the sum of $n$ independent Bernoulli random variables with parameter $p$. It is frequently used to model the number of successes in a specified number of identical binary experiments, such as the number of heads in five coin tosses.
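
The "sum of Bernoulli variables" description can be checked by simulation. The sketch below, with illustrative helper names and an arbitrary seed, builds binomial samples by adding up $n = 5$ Bernoulli draws with $p = 0.5$ and compares the observed frequencies with the formula $\binom{n}{x}p^{x}(1-p)^{n-x}$.

```python
import random
from math import comb

def bernoulli(p, rng):
    """Return 1 with probability p and 0 otherwise."""
    return 1 if rng.random() < p else 0

def binomial_sample(n, p, rng):
    """One binomial draw, built as the sum of n independent Bernoulli draws."""
    return sum(bernoulli(p, rng) for _ in range(n))

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

rng = random.Random(42)
n, p, trials = 5, 0.5, 20_000
samples = [binomial_sample(n, p, rng) for _ in range(trials)]

# Columns: number of heads, empirical frequency, theoretical probability.
for x in range(n + 1):
    print(x, round(samples.count(x) / trials, 3), round(binomial_pmf(x, n, p), 3))
```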

A negative binomial random variable counts the number of successes in a sequence of independent Bernoulli trials with parameter $p$ before $r$ failures occur. For example, this distribution could be used to model the number of heads that are flipped before three tails are observed in a sequence of coin tosses.
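
As a rough check of the coin-toss example, the following sketch counts heads until three tails have appeared. The function name and the seed are illustrative; the comparison value uses the mean $pr/(1-p)$, which equals 3 for $p = 0.5$ and $r = 3$.

```python
import random
from statistics import mean

def successes_before_r_failures(r, p, rng):
    """Count successes in independent Bernoulli(p) trials until r failures occur."""
    successes = failures = 0
    while failures < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return successes

rng = random.Random(3)
r, p = 3, 0.5
heads = [successes_before_r_failures(r, p, rng) for _ in range(40_000)]

# Empirical average next to the theoretical mean p * r / (1 - p).
print(round(mean(heads), 2), p * r / (1 - p))
```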

A geometric random variable counts the number of trials that are required to observe a single success, where each trial is independent and has success probability $p$. For example, this distribution can be used to model the number of times a die must be rolled in order for a six to be observed.
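
The die example can be simulated directly. In this sketch (function name and seed are illustrative), each experiment rolls a fair die until a six appears and records how many rolls that took; the average should be near $1/p = 6$.

```python
import random
from statistics import mean

def rolls_until_six(rng):
    """Roll a fair die until a six appears; return the number of rolls needed."""
    rolls = 0
    while True:
        rolls += 1
        if rng.randint(1, 6) == 6:
            return rolls

rng = random.Random(1)
counts = [rolls_until_six(rng) for _ in range(50_000)]
avg = mean(counts)

print(round(avg, 2))  # compare with the theoretical mean 1 / p = 6
```

Because the count includes the successful roll, the smallest possible value is 1, not 0.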

A Poisson random variable counts the number of events occurring in a fixed interval of time or space, given that these events occur with an average rate $\lambda$. This distribution has been used to model events such as meteor showers and goals in a soccer match.

The uniform distribution is a continuous distribution such that all intervals of equal length on the distribution's support have equal probability. For example, this distribution might be used to model people's full birth dates, where it is assumed that all times in the calendar year are equally likely.

The normal (or Gaussian) distribution has a bell-shaped density function and is used in the sciences to represent real-valued random variables that are assumed to be additively produced by many small effects. For example, the normal distribution is used to model people's height, since height can be assumed to be the result of many small genetic and environmental factors.

Student's t-distribution, or simply the t-distribution, arises when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown.

A chi-squared random variable with $k$ degrees of freedom is the sum of the squares of $k$ independent standard normal random variables. It is often used in hypothesis testing and in the construction of confidence intervals.
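
This construction is easy to simulate. The sketch below (illustrative names, arbitrary seed, $k = 5$) sums squared standard normal draws and compares the sample mean and variance with the theoretical values $k$ and $2k$.

```python
import random
from statistics import mean, variance

def chi_squared_sample(k, rng):
    """One chi-squared draw: the sum of k squared standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(7)
k = 5
draws = [chi_squared_sample(k, rng) for _ in range(50_000)]

print(round(mean(draws), 2), k)          # sample mean vs. k
print(round(variance(draws), 2), 2 * k)  # sample variance vs. 2k
```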

The exponential distribution is the continuous analogue of the geometric distribution. It is often used to model waiting times.
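
One way to see the analogy is a limiting argument. As a sketch, assume time is divided into steps of length $1/n$ and each step independently produces a success with probability $p = \lambda/n$ (the symbols $n$ and $T$ are introduced here only for illustration). If $X$ is the geometric number of steps up to and including the first success, the waiting time is $T = X/n$, and

$$\begin{align*}P(T > t) &= P(X > nt) = \left(1 - \frac{\lambda}{n}\right)^{\lfloor nt \rfloor}\\ &\longrightarrow e^{-\lambda t} \quad \text{as } n \to \infty,\end{align*}$$

which is the probability that an exponential random variable with rate $\lambda$ exceeds $t$.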

The F-distribution, also known as the Fisher–Snedecor distribution, arises frequently as the null distribution of a test statistic, most notably in the analysis of variance.

The gamma distribution is a general family of continuous probability distributions. The exponential and chi-squared distributions are special cases of the gamma distribution.
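
The two special cases can be written out using the shape-scale density $f(x; k,\theta) = \dfrac{1}{\Gamma(k)\theta^{k}}x^{k-1}e^{-x/\theta}$ for $x > 0$. To avoid a clash with the shape parameter $k$, the chi-squared degrees of freedom are denoted $\nu$ here, a symbol used only in this sketch.

$$\begin{align*}k = 1,\ \theta = 1/\lambda:&\quad f(x) = \lambda e^{-\lambda x} \quad \text{(exponential with rate } \lambda\text{)}\\ k = \nu/2,\ \theta = 2:&\quad f(x) = \dfrac{1}{\Gamma(\nu/2)\,2^{\nu/2}}x^{\nu/2 - 1}e^{-x/2} \quad \text{(chi-squared with } \nu \text{ degrees of freedom)}\end{align*}$$

As a consistency check, the gamma mean $k\theta$ and variance $k\theta^{2}$ give $1/\lambda$ and $1/\lambda^{2}$ in the first case and $\nu$ and $2\nu$ in the second, matching the exponential and chi-squared values.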

The beta distribution is a general family of continuous probability distributions supported on the interval from 0 to 1. The beta distribution is frequently used as a conjugate prior distribution in Bayesian statistics.

Distribution	PMF, PDF, or construction	Mean	Variance
Bernoulli	$f(x;p) = \begin{cases} p & \text{if } x = 1 \\ 1-p & \text{if } x = 0 \end{cases}$	$p$	$p(1-p)$
Binomial	$f(x; n,p) = \binom{n}{x}p^{x}(1-p)^{n-x}$	$np$	$np(1-p)$
Negative binomial	$f(x; r,p) = \binom{x + r -1}{x}p^{x}(1-p)^{r}$	$\dfrac{pr}{1-p}$	$\dfrac{pr}{(1-p)^{2}}$
Geometric	$f(x; p) = (1-p)^{x-1}p$	$\dfrac{1}{p}$	$\dfrac{1-p}{p^{2}}$
Poisson	$f(x;\lambda) = \dfrac{\lambda^{x}e^{-\lambda}}{x!}$	$\lambda$	$\lambda$
Uniform	$f(x;a,b) = \begin{cases} \dfrac{1}{b-a} & \text{if } x \in [a,b] \\ 0 & \text{otherwise} \end{cases}$	$\dfrac{a+b}{2}$	$\dfrac{(b-a)^{2}}{12}$
Normal	$f(x;\mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^{2}}} e^{-\dfrac{(x-\mu)^{2}}{2\sigma^{2}}}$	$\mu$	$\sigma^{2}$
Student's t	$\dfrac{Z}{\sqrt{U/k}} \qquad \begin{array}{l} Z \sim N(0,1)\\ U \sim \chi^{2}_{k} \end{array}$	$0$ (for $k > 1$)	$\dfrac{k}{k-2}$ (for $k > 2$)
Chi-squared	$\sum_{i=1}^{k}Z_{i}^{2} \qquad Z_{i} \overset{i.i.d.}{\sim} N(0,1)$	$k$	$2k$
Exponential	$f(x;\lambda) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \geq 0 \\ 0 & \text{otherwise} \end{cases}$	$\dfrac{1}{\lambda}$	$\dfrac{1}{\lambda^{2}}$
F	$\dfrac{U_{1}/d_{1}}{U_{2}/d_{2}} \qquad \begin{array}{l} U_{1} \sim \chi^{2}_{d_{1}}\\ U_{2} \sim \chi^{2}_{d_{2}} \end{array}$	$\dfrac{d_{2}}{d_{2}-2}$ (for $d_{2} > 2$)	$\dfrac{2d_{2}^{2}(d_{1} + d_{2} -2)}{d_{1}(d_{2}-2)^{2}(d_{2}-4)}$ (for $d_{2} > 4$)
Gamma	$f(x; k,\theta) = \dfrac{1}{\Gamma(k)\theta^{k}}x^{k-1}e^{-\dfrac{x}{\theta}}$	$k\theta$	$k\theta^{2}$
Beta	$f(x;\alpha,\beta) = \dfrac{\Gamma(\alpha + \beta)x^{\alpha - 1}(1-x)^{\beta - 1}}{\Gamma(\alpha)\Gamma(\beta)}$	$\dfrac{\alpha}{\alpha + \beta}$	$\dfrac{\alpha\beta}{(\alpha + \beta)^{2}(\alpha + \beta + 1)}$

Central Limit Theorem

The Central Limit Theorem (CLT) states that the sample mean of a sufficiently large number of i.i.d. random variables with finite variance is approximately normally distributed. The larger the sample, the better the approximation.

Change the parameters $\alpha$ and $\beta$ to change the distribution from which to sample.

Choose the sample size and how many sample means should be computed (draw number), then press "Sample." Check the box to display the true distribution of the sample mean.
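
The experiment can also be sketched in code. This is an illustration under stated assumptions rather than the visualization's own implementation: it assumes the sampled distribution is a beta distribution with parameters $\alpha$ and $\beta$, and the sample size, number of draws, seed, and function name are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

def sample_means(alpha, beta, sample_size, draws, rng):
    """Return `draws` sample means, each over `sample_size` i.i.d. beta variates."""
    return [
        mean([rng.betavariate(alpha, beta) for _ in range(sample_size)])
        for _ in range(draws)
    ]

alpha, beta = 1.0, 1.0          # assumed parameters; beta(1, 1) is uniform on [0, 1]
sample_size, draws = 30, 5_000  # illustrative choices
rng = random.Random(0)
means = sample_means(alpha, beta, sample_size, draws, rng)

# Mean and variance of the assumed beta distribution.
mu = alpha / (alpha + beta)
var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))

# The CLT suggests the sample mean is roughly N(mu, var / sample_size).
predicted_sd = sqrt(var / sample_size)
within = sum(abs(m - mu) <= 1.96 * predicted_sd for m in means) / draws

print(round(mean(means), 3), round(mu, 3))             # centre of the sample means
print(round(stdev(means), 4), round(predicted_sd, 4))  # spread vs. CLT prediction
print(round(within, 3))  # compare with 0.95 for a normal distribution
```

Raising `sample_size` shrinks the spread of the sample means in proportion to $1/\sqrt{n}$ and should bring the histogram of `means` closer to a normal curve.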

This visualization was adapted from Philipp Plewa's fantastic visualization of the central limit theorem.
