Normal distribution

The normal distribution is a common type of continuous probability distribution.

Note
In this setting, “normal” should be interpreted as something that is typical (or regular), not as something that is natural.

In a normal distribution, values are symmetrically distributed around the mean, with no skew.

Symmetrical means that the two tails on either side of the mean are mirror images of each other. As a consequence, 50% of the values lie on one side of the mean and 50% lie on the other side.

While it is a theoretical distribution, many real-life variables approximately follow it. One example is height, as seen in the figure below.

The standard normal distribution

The standard normal distribution – also referred to as the z-distribution – is a special type of normal distribution that has a mean of 0 and a standard deviation of 1.

In the figure below, the dark-grey line in the middle shows the mean (0). Each light-grey line marks one standard deviation from the mean, ranging from -3 standard deviations on the left-hand side of the mean to +3 standard deviations on the right-hand side.

The normal distribution (standard or not) is also called the Gaussian distribution, and its curve is often called a bell curve because of its shape.

Note
Any normal distribution can be standardized by converting the values into z-scores (see Standardization: z-scores). This allows us to calculate the probability of values falling within a given range and to make comparisons between different samples.
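As a sketch of what standardization does in practice, the snippet below converts a value to a z-score, z = (x - mean) / standard deviation, and then uses the standard normal cumulative distribution function to find the share of values below it. The numbers (a variable with mean 170 and standard deviation 10) and the function names are hypothetical illustrations, not taken from the text.

```python
import math


def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def standard_normal_cdf(z: float) -> float:
    """P(Z <= z) for a standard normal variable Z, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Hypothetical example values (not from the text): mean 170, standard deviation 10.
z = z_score(185, mean=170, sd=10)
print(z)                                  # 1.5
print(round(standard_normal_cdf(z), 4))   # 0.9332, i.e. about 93% of values lie below 185
```

Once a value is expressed as a z-score, the same standard normal table or function can be used regardless of the original mean and standard deviation.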

Other normal distributions

Normal distributions can look quite different. The curves below are all examples of normal distributions. Some have more pronounced peaks, whereas others have flatter curves. Some have their peak (and mean) below 0, whereas others have their peak (and mean) above 0.

This is possible because a normal distribution is defined by two parameters: the mean, which determines where the peak is located, and the standard deviation, which determines how wide and flat the curve is.
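The density of a normal distribution with mean μ and standard deviation σ is f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)). A minimal Python sketch of this formula (the function name `normal_pdf` and the parameter pairs are our own, chosen for illustration) shows how the two parameters shape the curve: the peak always sits at the mean, and its height, 1 / (σ√(2π)), falls as the standard deviation grows.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


# Hypothetical parameter pairs: peaks below 0, at 0 and above 0.
for mean, sd in [(-2, 0.5), (0, 1), (3, 2)]:
    # Evaluate the density at the mean, where the curve reaches its peak.
    print(f"mean={mean:>2}, sd={sd}: peak height {normal_pdf(mean, mean, sd):.3f}")
```

Doubling the standard deviation halves the height of the peak, which is why curves with larger standard deviations look flatter.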

What, then, is a standard deviation?

A simple definition of standard deviation is that it expresses how much variation exists from the mean for a given variable (see Variation for further discussion).

A small standard deviation suggests that the individuals in our data have values close to the mean, whereas a large standard deviation indicates that the values are spread out over a wider range.
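A common formula for the sample standard deviation is s = √(Σ(x - x̄)² / (n - 1)), where x̄ is the sample mean. The sketch below uses Python's standard-library `statistics.stdev`, which applies this n - 1 formula, on two hypothetical samples that share the same mean but differ in spread.

```python
import statistics

# Two hypothetical samples (not from the text) with the same mean of 70.
tight = [68, 69, 70, 71, 72]
wide = [55, 62, 70, 78, 85]

for name, values in [("tight", tight), ("wide", wide)]:
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    print(f"{name}: mean={mean}, sd={sd:.2f}")
```

Both samples have a mean of 70, but the values in `wide` lie much further from it, so its standard deviation is far larger.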

Empirical rule

The empirical rule of normal distributions tells us the following:

  • About 68% of the values fall within 1 standard deviation of the mean (between -1 and +1).
  • About 95% of the values fall within 2 standard deviations of the mean (between -2 and +2).
  • Nearly 100% (about 99.7%) of the values fall within 3 standard deviations of the mean (between -3 and +3).
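The percentages in the empirical rule are rounded. As a sketch, the exact shares for a normal distribution can be computed with the error function, because the share within k standard deviations of the mean equals erf(k / √2). The helper name `share_within` is our own.

```python
import math


def share_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    # P(-k < Z < k) = erf(k / sqrt(2)) for a standard normal variable Z
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.1%}")
```

This prints roughly 68.3%, 95.4% and 99.7%, which is where the rounded figures in the rule come from.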
Example
The empirical rule can also be applied to observed data, provided that the variable is approximately normally distributed.

Let us assume that we have collected information about weight for a sample of individuals. If the mean weight in this sample is 70 kilos and the standard deviation is 5 kilos, the empirical rule would give us the following information:

About 68% of the individuals have a weight of 65-75 kilos:
Lower limit: 70 kilos – (5 kilos*1); upper limit: 70 kilos + (5 kilos*1)

About 95% of the individuals have a weight of 60-80 kilos:
Lower limit: 70 kilos – (5 kilos*2); upper limit: 70 kilos + (5 kilos*2)

Nearly 100% of the individuals have a weight of 55-85 kilos:
Lower limit: 70 kilos – (5 kilos*3); upper limit: 70 kilos + (5 kilos*3)

As long as we have information about the mean value and the standard deviation, it is possible to do the same calculation for all normally distributed variables.
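The calculation in the weight example can be written as a small helper that works for any mean and standard deviation. This is a minimal sketch; the function name `empirical_ranges` is hypothetical.

```python
def empirical_ranges(mean: float, sd: float) -> dict[int, tuple[float, float]]:
    """Lower and upper limits at 1, 2 and 3 standard deviations from the mean."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


# Values from the weight example: mean 70 kilos, standard deviation 5 kilos.
for k, (low, high) in empirical_ranges(70, 5).items():
    print(f"±{k} SD: {low}-{high} kilos")
```

With mean 70 and standard deviation 5, this reproduces the limits 65-75, 60-80 and 55-85 kilos.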

Skewness

What if the values are not symmetrically distributed around the mean? When data are asymmetrically distributed around the mean, we can refer to the distribution as skewed.

The skew can be either positive (right tail longer) or negative (left tail longer).

Example
Empirical examples of positively skewed distributions are: number of hospital visits, number of days in unemployment, and number of telephone calls during a day. Most individuals will have the value zero or a low value, whereas a few will have very high values.

Example
Empirical examples of negatively skewed distributions are: age of retirement and scores on a very easy test. Most individuals will have high values, whereas a few will have very low values.

Measuring skewness

It is possible to measure the skewness of a distribution (see Summarize and Tabstat).

  • A normal distribution (standard or not) has a skewness of 0.
  • Negative skewness value = Longer tail to the left.
  • Positive skewness value = Longer tail to the right.
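One common way to measure skewness is the moment-based coefficient m₃ / m₂^(3/2), where m₂ and m₃ are the average squared and cubed deviations from the mean. The sketch below implements that formula; statistical software may use a slightly different, bias-adjusted version, so results can differ a little for small samples. The function name and the visit counts are hypothetical.

```python
def skewness(values: list[float]) -> float:
    """Moment-based sample skewness m3 / m2 ** 1.5 (no small-sample adjustment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical counts (e.g. hospital visits): most are low, a few are high.
visits = [0, 0, 0, 1, 1, 1, 2, 2, 3, 10]
print(round(skewness(visits), 2))   # positive: longer tail to the right
print(skewness([1, 2, 3, 4, 5]))    # symmetric data: 0.0
```

Cubing the deviations keeps their sign, so a long tail of large values on the right pushes the result above 0, and a long tail on the left pushes it below 0.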

Note
A skewness value between -2 and +2 is often considered acceptable as a rule of thumb, meaning that the distribution can be said to be approximately normal. We can then treat it as if it were normally distributed (but with due caveats).