One-way ANOVA

Quick facts

Number of variables
One group variable
One test variable

Scales of variable(s)
Group variable: categorical 
Test variable: continuous

Introduction

The one-way ANOVA is very similar to the independent samples t-test. The difference is that the one-way ANOVA allows you to have more than two categories in your group variable.

In other words, the one-way ANOVA is a parametric method for comparing the mean of one variable between two or more (unrelated) groups.

Example

  • Mean number of ice cream cones per week during May in Swedish children ages 5-10
  • Mean number of ice cream cones per week during June in Swedish children ages 5-10
  • Mean number of ice cream cones per week during July in Swedish children ages 5-10

We might be interested in knowing whether there are monthly differences in ice cream consumption among small children. Accordingly, we can compare the mean number of consumed ice cream cones across these three months.
Note
The one-way ANOVA is considered an omnibus test since it only tells us whether there are significant differences overall, not exactly which groups differ from the others. There are nonetheless post-hoc tests that can accomplish this.
Note
ANOVA stands for “analysis of variance”. The one-way ANOVA is sometimes referred to as one-factor ANOVA, one-way analysis of variance, or between-subjects ANOVA.
F-distribution

The one-way ANOVA is based on the F-distribution: under the null hypothesis, its test statistic follows an F-distribution.

The F-distribution is a continuous probability distribution (similar to the chi-square distribution). It is positively skewed and bounded at zero (i.e., it cannot go below 0).

The shape of the distribution is determined by two degrees of freedom: one for the numerator (df1) and one for the denominator (df2). The fewer the degrees of freedom, the closer the peak of the distribution lies to 0.
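To get a feel for this shape, you can plot the F density in Stata with the Fden() function. A minimal sketch; the degrees of freedom below (3 and 30) are arbitrary values chosen purely for illustration:

* Plot the F density for df1 = 3 and df2 = 30 (illustrative values)
twoway function y = Fden(3, 30, x), range(0 5) ///
    xtitle("F") ytitle("Density")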

F-statistic

An F-statistic is the ratio of two variances. As you may remember from an earlier chapter (see Variation), the variance is the average of squared deviations from the mean value.

More specifically, we need these two estimates of the variance:

Variance between groups (SSbetween): The sum of squares that represents the variation between the groups.
Variance within groups (SSwithin): The sum of squares that represents the variation within groups that is due to chance.

The F-statistic is then calculated by dividing the variance between groups by the variance within groups.
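In symbols, with k groups and N observations in total, each sum of squares is first divided by its degrees of freedom to give a mean square, and the F-statistic is the ratio of the two mean squares:

F = \frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(N-k)} = \frac{MS_{\text{between}}}{MS_{\text{within}}}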

P-value

When performing a one-way ANOVA, we want to examine whether there is sufficient evidence to reject the null hypothesis (which stipulates that there is no difference in means between the groups).

A p-value below 0.05 is conventionally taken as sufficient evidence to reject the null hypothesis.
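If you ever need to obtain the p-value from an F-statistic by hand, Stata's Ftail() function returns the upper-tail probability. A minimal sketch; the F value and degrees of freedom below are hypothetical numbers used only to show the syntax:

* Upper-tail p-value for a hypothetical F-statistic of 4.5
* with 2 numerator and 100 denominator degrees of freedom
display Ftail(2, 100, 4.5)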

Assumptions

First, you have to check your data to see that the assumptions behind the one-way ANOVA hold. If your data “passes” these assumptions, you will have a valid result.  

Below is a checklist for these assumptions.

Continuous test variable: Your test variable should be continuous (i.e. interval/ratio). For example: income, height, weight, number of years of schooling, and so on. Although they are not really continuous, it is still very common to use ratings as continuous variables, such as: “How satisfied with your income are you?” (on a scale 1-10) or “To what extent do you agree with the previous statement?” (on a scale 1-5).
Normally distributed test variable: The test variable should be approximately normally distributed. Use a histogram to check (see Histogram).
Two or more unrelated categories in the group variable: Your group variable should be categorical (i.e. nominal or ordinal) and consist of two or more groups. Unrelated means that the groups should be mutually exclusive: no individual can be in more than one of the groups. For example: low vs. medium vs. high educational level; liberal vs. conservative vs. socialist political views; or poor vs. fair vs. good vs. excellent health.
Equal variance: The variance in the test variable should be equal across the groups of the group variable.
No outliers: An outlier is an extreme (low or high) value. For example, if most individuals have a test score between 40 and 60, but one individual has a score of 96 and another has a score of 1, this will distort the test. (A sketch of Stata commands for checking some of these assumptions follows below.)
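A few Stata commands can help with these checks. A minimal sketch, using the income and educ variables from the practical example further down:

histogram income, normal      // normality: histogram with a normal curve overlaid
graph box income, over(educ)  // outliers: box plots of the test variable per group
robvar income, by(educ)       // equal variances: Levene-type robust test statistics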

Function

Basic command
oneway testvar groupvar
Useful options
oneway testvar groupvar, tab  
oneway testvar groupvar, bonferroni
Explanations
testvar: Insert the name of the test variable.
groupvar: Insert the name of the group variable.
tab: Produces a summary table of the test variable for each group.
bonferroni: Reports the results from a Bonferroni multiple-comparison test.
More information
help oneway
Note
There are many different postestimation commands that you can apply to ANOVA. These options are described here: help anova postestimation 

Practical example

Dataset
StataData1.dta
Variable name: income
Variable label: Annual salary income (Age 40, Year 2010)
Value labels: N/A

Variable name: educ
Variable label: Educational level (Age 40, Year 2010)
Value labels: 1=Compulsory; 2=Upper secondary; 3=University
oneway income educ, tab bonferroni

The first table provides some summary statistics. Here we can see the mean income for the different groups:

  • Compulsory education: 164316.86.
  • Upper secondary education: 178904.49.
  • University education: 238989.77.  
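If you want these summary statistics without running the full ANOVA, one option is tabstat. A minimal sketch:

* Mean, standard deviation, and group size of income by educational level
tabstat income, by(educ) statistics(mean sd n)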

The next table gives the F-statistic, which in this example is 331.85. Then look under Prob > F. Here we get a p-value of 0.0000. Since it is below 0.05, we can conclude that the group means are not all equal.

At the lower part of the same table, we get the results from Bartlett’s test for equal variances. The null hypothesis for this test is that the variances are equal. Since we get a p-value (next to Prob>chi2) below 0.05, it suggests that the assumption of equal variances is violated.

Note
Violations of the assumption of equal variances often happen with large datasets like the one used in this example. Also, Bartlett’s test is rather sensitive to data that are not normally distributed (the income variable used here is slightly skewed). Therefore, it might be a good idea to also perform a non-parametric test (in this case, a Kruskal-Wallis test).
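In Stata, the Kruskal-Wallis test is available through the kwallis command. A minimal sketch with the variables from this example:

* Kruskal-Wallis equality-of-populations rank test
kwallis income, by(educ)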

The fact that the F-statistic tells us that the group means are not all equal says very little about where the differences lie: which groups are different? To answer this, we can take a look at the third table, showing the results from the Bonferroni test. For each combination of groups, the first row of the entry shows the mean difference and the second row shows the Bonferroni-adjusted p-value. In this example, the p-values are all 0.000 (which is below 0.05), suggesting that there are significant differences between all three groups.
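An alternative way to obtain the same pairwise comparisons, if you also want confidence intervals, is the pwmean command (available in Stata 12 and later). A minimal sketch:

* Pairwise comparisons of mean income across educational levels,
* with Bonferroni-adjusted p-values and confidence intervals
pwmean income, over(educ) mcompare(bonferroni) effects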