Probability distributions

A probability distribution describes how the probabilities are distributed over the possible values of a variable. These distributions are theoretical, in the sense that they are based on a set of logical and mathematical assumptions.

Main types of probability distributions

  • Discrete probability distributions
  • Continuous probability distributions
Discrete probability distributions
  • Discrete values: Data can only take certain (countable) values within a range.
  • Mass function: A probability can be assigned to any specific value in the distribution.
Example
When we flip a (fair) coin, we have two possible outcomes: Head (H) and Tail (T).
The probability of getting H is 1/2 = 0.50.
The probability of getting T is 1/2 = 0.50.

Suppose that we flip the coin two times. We have four possible outcomes: HH, HT, TH, and TT. Now suppose that we want to calculate the probability of getting different numbers of T.
The probability of getting 0 T is 1/4 = 0.25.
The probability of getting 1 T is 2/4 = 0.50.
The probability of getting 2 T is 1/4 = 0.25.
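This calculation can be scripted. Below is a small Python sketch (Python is our own choice of illustration here, not something the guide relies on) that reproduces the probabilities above; it is, in fact, an instance of the binomial distribution listed later in this section.

```python
# Probability of getting k tails (T) in n fair coin flips:
# P(k) = C(n, k) * p^k * (1 - p)^(n - k)
from math import comb

n = 2    # number of flips
p = 0.5  # probability of T on a single flip

for k in range(n + 1):
    prob = comb(n, k) * p**k * (1 - p)**(n - k)
    print(f"P({k} T) = {prob:.2f}")

# Prints: P(0 T) = 0.25, P(1 T) = 0.50, P(2 T) = 0.25
```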
Continuous probability distributions
  • Continuous values: Data can take any value within a range (infinitely many possible values).
  • Density function: The probability of any particular value is zero. Therefore, these distributions are usually described in terms of probability density, i.e., the probability that a value will fall within a certain range.
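To make the idea of probability as an area concrete, here is a minimal Python sketch (using scipy, which is not part of this guide, and the standard normal distribution introduced further below as the example density):

```python
from scipy.stats import norm  # standard normal distribution

# The probability of falling within a range is the area under the
# density curve: P(-1 < X < 1) = F(1) - F(-1), where F is the
# cumulative distribution function.
print(norm.cdf(1) - norm.cdf(-1))  # about 0.6827

# By contrast, the probability of any one exact value is zero:
print(norm.cdf(1) - norm.cdf(1))   # 0.0
```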

Note
In case you have forgotten what was meant by discrete and continuous values, you can read more about this under Types of values.

We will not provide any comprehensive review of the different types of continuous and discrete probability distributions. Instead, below, we will focus on the specific types of distributions that you will encounter later on in the guide:

  • Normal distribution
  • T-distribution
  • Chi-squared distribution
  • F-distribution
  • Binomial distribution

Normal distribution

The normal distribution – sometimes also called the Gaussian distribution – is a common type of continuous probability distribution.

While it is a theoretical distribution, many observed variables approximately follow it. One example is height, as seen in the figure below.

Note
In this setting, “normal” should be interpreted as something that is typical (or regular), not as something that is natural.
Key characteristics of the normal distribution
  • It is unimodal. This means it only has one peak (representing the mean).
  • Its tails are asymptotic. This means that they approach but do not intersect with the horizontal axis (x-axis).
  • The area under the curve represents the probabilities and sums to one.
  • Values are symmetrically distributed around the mean, with no skew. Symmetrical means that the tails on each side of the mean are equally large. In other words, 50% of the values are on one side of the mean, and 50% of the values are on the other side of the mean.
The standard normal distribution

The standard normal distribution – also referred to as the z-distribution – is a special type of normal distribution that has a mean of 0 and a standard deviation of 1. It is sometimes called the bell curve due to its shape.

In the figure below, the black line in the middle shows the mean (0). Each of the light-grey lines denotes one standard deviation from the mean (ranging from -3 standard deviations on the left-hand side of the mean to +3 standard deviations on the right-hand side of the mean).

Note
Any normal distribution can be standardized by converting the values into z-scores (see Standardization: z-scores). This allows us to calculate the probability of certain values occurring and to make comparisons between different samples.
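As an illustration, here is a minimal Python sketch of this standardization (the weight values are invented for the example):

```python
import numpy as np
from scipy.stats import norm

values = np.array([62.0, 70.0, 78.0])  # hypothetical weights in kilos
mean, sd = values.mean(), values.std()

# z = (value - mean) / standard deviation
z_scores = (values - mean) / sd
print(z_scores)  # roughly [-1.22, 0.00, 1.22]

# Once standardized, probabilities can be read off the z-distribution,
# e.g. the probability of observing a value below z = 1:
print(norm.cdf(1))  # about 0.8413
```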
Other normal distributions

Normal distributions can look quite different. The curves below are all examples of normal distributions. Some have more pronounced peaks whereas others have flatter curves. Some have their peak (and mean) below 0 whereas others have their peak (and mean) above 0.

This is possible since the position and shape of a normal distribution are defined by both the mean value and the standard deviation!

What, then, is a standard deviation?

A simple definition of standard deviation is that it expresses how much variation exists from the mean for a given variable (see Variation for further discussion).

A small standard deviation suggests that the individuals in our data have values close to the mean, whereas a large standard deviation indicates that the values are spread out over a wider range of values.
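The contrast can be illustrated with a short Python sketch (the two variables and their values are invented):

```python
import numpy as np

tight = np.array([69, 70, 70, 71])    # values close to the mean
spread = np.array([55, 65, 75, 85])   # values far from the mean

# Same mean (70), very different standard deviations
print(tight.mean(), tight.std(ddof=1))    # 70.0, about 0.8
print(spread.mean(), spread.std(ddof=1))  # 70.0, about 12.9
```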

The central limit theorem

This theorem states that the distribution of a sample mean approximates a normal distribution as the sample size increases, regardless of the shape of the distribution in the population.
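A small simulation can illustrate the theorem. The Python sketch below draws samples from a deliberately skewed (exponential) population; the chosen sample sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)

for n in (2, 30, 500):
    # 10,000 sample means, each based on a sample of size n
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(f"n={n:3d}: mean={means.mean():.3f}, sd={means.std():.3f}")

# As n grows, the sample means cluster ever more tightly around the
# population mean (1.0), and their histogram looks increasingly like
# a normal curve, even though the population itself is heavily skewed.
```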

Empirical rule

The empirical rule of normal distributions tells us the following:

  • 68% of all values fall within -1 and +1 standard deviations of the mean.
  • 95% of all values fall within -2 and +2 standard deviations of the mean.
  • Nearly 100% of all values (about 99.7%) fall within -3 and +3 standard deviations of the mean.
Example
The empirical rule can also be applied to observed distributions.

Let us assume that we have collected information about weight for a sample of individuals. If the mean weight in this sample is 70 kilos and the standard deviation is 5 kilos, the empirical rule would give us the following information:
 
68% of the individuals have a weight of 65-75 kilos:
Lower limit: 70 kilos – (5 kilos*1); upper limit: 70 kilos + (5 kilos*1)
 
95% of the individuals have a weight of 60-80 kilos:
Lower limit: 70 kilos – (5 kilos*2); upper limit: 70 kilos + (5 kilos*2)
 
Nearly 100% of the individuals have a weight of 55-85 kilos:
Lower limit: 70 kilos – (5 kilos*3); upper limit: 70 kilos + (5 kilos*3)

As long as we have information about the mean value and the standard deviation, it is possible to do the same calculation for all normally distributed variables.
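The same calculation is easy to script. The Python sketch below reuses the mean and standard deviation from the example and also prints the exact shares under the normal curve behind the rule:

```python
from scipy.stats import norm

mean, sd = 70, 5  # kilos, from the example above

for k in (1, 2, 3):
    lower, upper = mean - k * sd, mean + k * sd
    share = norm.cdf(k) - norm.cdf(-k)  # exact area within ±k sd
    print(f"±{k} sd: {lower}-{upper} kilos ({share:.1%})")

# ±1 sd: 65-75 kilos (68.3%)
# ±2 sd: 60-80 kilos (95.4%)
# ±3 sd: 55-85 kilos (99.7%)
```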
Skewness

What if the values are not symmetrically distributed around the mean? When data are asymmetrically distributed around the mean, we can refer to the distribution as skewed.

The skew can be either positive (right tail longer) or negative (left tail longer).

Example
Empirical examples of a positively skewed distribution are: number of hospital visits, number of days in unemployment, number of telephone calls during a day. Most individuals will have the value zero or a low value, whereas a few will have increasingly high values.
Example
Empirical examples of a negatively skewed distribution are: age at retirement, or scores on a very easy test. Most individuals will have high values, and then a few will have very low values.
Measuring skewness

It is possible to measure the skewness of a distribution (see Summarize and Tabstat).

  • A normal distribution has a skewness of 0.
  • Negative skewness value = Longer tail to the left.
  • Positive skewness value = Longer tail to the right.

Note
There is no universally accepted rule of thumb when it comes to skewness values. Some argue that a skewness value between -1 and +1 means that the distribution is sufficiently normal whereas others argue that a value between -2 and +2 is acceptable.
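To sketch what such values look like, the Python example below computes skewness for two simulated variables (the guide itself measures skewness with Stata's summarize and tabstat commands; the data here are simulated):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)

symmetric = rng.normal(size=10_000)          # no skew
right_skewed = rng.exponential(size=10_000)  # long right tail

print(skew(symmetric))     # close to 0
print(skew(right_skewed))  # clearly positive (around 2)
```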

T-distribution

Closely related to the normal distribution is the t-distribution. The two distributions look very similar, inasmuch as they both are bell-shaped and have values symmetrically distributed around the mean.

However, the t-distribution has heavier tails. This means that, in comparison to the normal distribution, a larger share of the values is located in the tails and a smaller share in the center.

It is nonetheless worth noting that the larger the sample size (and thus the degrees of freedom), the more similar the t-distribution will be to a normal distribution.
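Both points can be illustrated with a short Python sketch, comparing the probability of landing beyond +2 under t-distributions with a few (arbitrarily chosen) degrees of freedom and under the standard normal:

```python
from scipy.stats import norm, t

for df in (2, 10, 100):
    # sf() gives the probability of a value beyond the given point
    print(f"df={df:3d}: t tail={t.sf(2, df):.4f}  normal tail={norm.sf(2):.4f}")

# With few degrees of freedom the t-distribution puts clearly more
# probability in the tail; as df grows, it approaches the normal value.
```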

Measuring kurtosis

Kurtosis is a measure that can be used to capture how heavy the tails of a distribution are.

  • A normal distribution has an excess kurtosis of 0 = Mesokurtic distribution.
  • Excess kurtosis above 0 = Leptokurtic distribution (sharper peak and longer/fatter tails).
  • Excess kurtosis below 0 = Platykurtic distribution (rounder peak and shorter/thinner tails).
Note
Stata reports kurtosis on a scale where a normal distribution has a value of 3, so you have to subtract 3 from the kurtosis value to obtain the excess kurtosis used above. While there is no generally accepted rule of thumb, most consider values between -2 and +2 to be acceptable.
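As an illustration, the Python sketch below computes excess kurtosis for three simulated variables (scipy's kurtosis function already subtracts 3 by default, matching the scale used above):

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(2)

normal_data = rng.normal(size=10_000)            # mesokurtic: near 0
heavy_tails = rng.standard_t(df=5, size=10_000)  # leptokurtic: above 0
flat = rng.uniform(size=10_000)                  # platykurtic: below 0

for name, data in [("normal", normal_data), ("t(5)", heavy_tails), ("uniform", flat)]:
    print(name, round(kurtosis(data), 2))
```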

Chi-squared distribution

Coming soon!

F-distribution

Coming soon!

Binomial distribution

Coming soon!