*Original version written by Christoffer Åkesson
Quick facts
| Number of variables | One group variable (x); one test variable (y); one or more covariates (z) |
| Scale of variable(s) | Group variable: categorical; test variable: continuous; covariates: categorical or continuous |
As discussed in One-way ANOVA, the one-way ANOVA is a statistical procedure for comparing the means of two or more groups. ANCOVA is very similar to ANOVA. The key difference is that ANCOVA allows you to control for the effects of one or more extraneous variables, known as covariates (also see the discussion on confounding in X, y, and z). Covariates can be either categorical or continuous, but if you have a non-binary categorical covariate (i.e. one with more than two categories), you need to create dummy variables for it (see Dummy variables).
For example, you could use ANCOVA to see which diet was best for losing weight after controlling for age and body mass index at baseline (i.e. your test variable would be “weight loss”, your group variable would be “type of diet” and your covariates would be “age” and “body mass index at baseline”).
| Note In many ways, ANCOVA is equivalent to multiple linear regression (which is described in more detail in Linear regression). Why then use ANCOVA? Well, the answer depends on what you want to achieve with your analysis, but for most purposes, we would argue that linear regression is more flexible. |
Assumptions
First, you have to check your data to see that the assumptions behind ANCOVA hold. If your data ‘passes’ these assumptions, you will have a valid result.
Checklist
| Continuous and normally distributed test variable | Your test variable should be continuous (i.e. interval/ratio) and normally distributed. For example: Income, height, weight, number of years of schooling, and so on. Although they are not really continuous, it is still very common to use ratings as continuous variables, such as: “How satisfied with your income are you?” (on a scale 1-10) or “To what extent do you agree with the previous statement?” (on a scale 1-5). |
| Two or more unrelated categories in the group variable | Your group variable should be categorical (i.e. nominal or ordinal) and consist of two or more groups. Unrelated means that the groups should be mutually exclusive: no individual can be in more than one of the groups. For example: low vs. medium vs. high educational level; liberal vs. conservative vs. socialist political views; or poor vs. fair vs. good vs. excellent health; and so on. |
| Equal variance | The variance in the test variable should be equal across the groups of the group variable. |
| No outliers | An outlier is an extreme (low or high) value. For example, if most individuals have a test score between 40 and 60, but one individual has a score of 96 or another individual has a score of 1, this will distort the test. |
| Homogeneity of regression slopes | The relationship (slope) between each covariate and the test variable should be the same across all levels of the group variable. |
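Several of the checklist items can be inspected directly in Stata. The following is a minimal sketch rather than a complete diagnostic routine. It assumes the dataset StataData1.dta and the variables gpa, bullied and cognitive from the practical example further down, and that the file is located in the current working directory.

```stata
* Load the example dataset (assumed to be in the working directory)
use StataData1.dta, clear

* Normality of the test variable: histogram with a normal curve overlaid
histogram gpa, normal

* Outliers: box plot of the test variable within each group
graph box gpa, over(bullied)

* Equal variance: Levene's test (W0) and the Brown-Forsythe variants (W50, W10)
robvar gpa, by(bullied)

* Homogeneity of regression slopes: add the group-by-covariate interaction
anova gpa bullied c.cognitive bullied#c.cognitive
```

A small p-value from robvar would point to unequal variances across the groups. Likewise, a small p-value for bullied#c.cognitive would suggest that the slope of the covariate differs between the groups, in which case the plain ANCOVA model should be interpreted with caution.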
Function
| Basic command |
| anova testvar groupvar c.covariate |
| Explanations | |
| testvar | Insert the name of the test variable |
| groupvar | Insert the name of the group variable |
| covariate | Insert the name of the covariate variable |
| Note You need to tell Stata that a variable in your anova statement is continuous; otherwise, it will be treated as another categorical factor. You denote a continuous independent variable within the anova command by placing “c.” in front of its name. |
| More information: help anova |
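To illustrate the note on the “c.” prefix, this sketch runs the same model with and without it. It assumes the variables gpa, bullied and cognitive from the practical example below and that StataData1.dta is in the working directory. The capture noisily prefix is an addition made here only so that a do-file keeps running if the second command fails.

```stata
use StataData1.dta, clear

* Covariate entered as continuous: it uses a single degree of freedom
anova gpa bullied c.cognitive

* Without c., cognitive is treated as a categorical factor with one level
* per distinct value. Stata stops with an error if the values are not
* non-negative integers, hence capture noisily.
capture noisily anova gpa bullied cognitive
```

If both commands run, comparing the df column of the two tables shows the difference: one degree of freedom for the continuous covariate versus one per category (minus one) when it is treated as a factor.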
Practical example
| Dataset |
| StataData1.dta |
| Variable name | gpa |
| Variable label | Grade point average (Age 15, Year 1985) |
| Value labels | N/A |
| Variable name | bullied |
| Variable label | Exposure to bullying (Age 15, Year 1985) |
| Value labels | 0=No 1=Yes |
| Variable name | cognitive |
| Variable label | Cognitive test score (Age 15, Year 1985) |
| Value labels | N/A |
| anova gpa bullied c.cognitive |

In this example, we are interested in seeing if grade point average (gpa) differs between individuals according to whether they have been exposed to bullying or not (bullied), while controlling for cognitive test scores (cognitive). The null hypothesis is that there is no difference in gpa between unexposed and exposed.
| Note Partial SS (SS=sum of squares) refers to variation assigned to one variable while controlling for the other variable. |
As can be seen, the F statistic for bullied is 6.19. The corresponding p-value is 0.0129. Since this is below 0.05, there is a statistically significant difference in grade point average between those unexposed and those exposed to bullying when we control for cognitive test scores. In other words, we can reject the null hypothesis. We can also see that the variable cognitive is statistically significantly related to grade point average (F=4969.88, p<0.05).
Postestimation commands
There are many different postestimation commands that you can apply to ANCOVA.
| More information: help anova postestimation |
For example, we can use the postestimation command contrast to obtain the adjusted mean differences:
| contrast r.bullied, asobserved |

In the column called Contrast, you see the mean difference (-0.0487) in grade point average between those who were exposed to bullying and those who were not exposed, controlling for cognitive test scores.
We can also use the postestimation command margins, which gives us predicted means for each of the groups:
| margins bullied |

Looking at the column called Margin, we see that the predicted mean in grade point average is slightly higher for individuals who were not exposed to bullying (3.227) than for individuals who were exposed to bullying (3.179), controlling for cognitive test scores. Also note that the difference in means between the groups is around 0.0487, which is what we saw with contrast.
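As the earlier note on linear regression suggests, the same model can also be fitted with regress. The sketch below is one way to see the correspondence, again assuming that StataData1.dta is in the working directory. It is meant as an illustration and is not part of the analysis shown above.

```stata
use StataData1.dta, clear

* ANCOVA as in the practical example
anova gpa bullied c.cognitive

* The same model as a linear regression; i. marks bullied as categorical
regress gpa i.bullied c.cognitive

* Adjusted group difference obtained as a contrast of the predicted means
margins r.bullied
```

Because bullied has only two categories, the coefficient on 1.bullied in the regression output should match the adjusted mean difference reported by contrast, and its squared t statistic should equal the F statistic for bullied in the ANCOVA table.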