Chi-square test

Quick facts

Number of variables: Two
Scales of variable(s): Categorical

There are two forms of the chi-square test: (a) the multidimensional chi-square test and (b) the goodness-of-fit chi-square test. This part of the guide covers the first form; the second is discussed in other sections.

The multidimensional chi-square test assesses whether there is a relationship between two categorical variables. For example, suppose you want to see whether smoking habits differ between young women and young men. The variable gender has two categories (men and women) and, in this particular case, the variable smoking has three: no smoking, occasional smoking and frequent smoking. The test is based on a simple crosstable that displays the distribution of these two variables:

                    No smoking   Occasional smoking   Frequent smoking
Men (age 15-24)     85%          10%                  5%
Women (age 15-24)   70%          20%                  10%
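
To make the example concrete, the crosstable above can be typed straight into Stata with tabi, which takes the options of tabulate twoway. This is only a sketch: the guide gives percentages, not counts, so the cell counts below are hypothetical (200 men and 200 women, chosen so that the row percentages match the table).

```stata
* Hypothetical counts, NOT from the guide: 200 men and 200 women,
* chosen only so that the row percentages match the table above.
*            No   Occasional   Frequent
* Men       170       20          10
* Women     140       40          20
* Rows are separated by a backslash; "row" adds row percentages.
tabi 170 20 10 \ 140 40 20, chi2 row
```

With these made-up counts, Pearson's chi-square works out to about 12.9 on 2 degrees of freedom. The value depends on the sample size, so the same percentages with fewer respondents would give a weaker result.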

Assumptions

First, check that your data meet the assumptions behind the chi-square test. If they do, you can treat the result as valid.

Checklist

Two or more unrelated categories in both variables
Both variables should be categorical (i.e. nominal or ordinal) and consist of two or more groups. Unrelated means that the groups are mutually exclusive: no individual can be in more than one group. For example: low vs. medium vs. high educational level; liberal vs. conservative vs. socialist political views; or poor vs. fair vs. good vs. excellent health.
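
One way to check this assumption is to inspect each variable on its own before running the test. The sketch below assumes varname1 and varname2 are placeholders for your own variables, as in the rest of this guide.

```stata
* varname1 and varname2 are placeholders for your own variables.
* List the categories of each variable, with missing values shown
* as a separate row.
tab varname1, missing
tab varname2, missing
* Describe each variable: type, number of unique values and,
* for variables with few values, a tabulation with value labels.
codebook varname1 varname2
```

Each individual should fall into exactly one category per variable, and each variable should have at least two categories with observations.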

Function

Basic command
tab varname1 varname2, chi2
Useful options
tab varname1 varname2, chi2 exact
Explanations
varname1: Insert the name of the first variable you want to use (is included as the row variable).
varname2: Insert the name of the second variable you want to use (is included as the column variable).
chi2: Report Pearson’s chi-squared.
exact: Report Fisher’s exact test (useful if your crosstable has cells with few or no observations).
Short names
tab: Tabulate
More information
help tabulate twoway
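
Two further options of tabulate twoway are often helpful when reading the crosstable. The sketch below again uses varname1 and varname2 as placeholders; which percentages are most informative depends on how you have arranged your variables.

```stata
* varname1 and varname2 are placeholders for your own variables.
* Add column percentages, to compare the distribution of the row
* variable across the categories of the column variable.
tab varname1 varname2, chi2 column
* Add the frequencies expected under independence to each cell,
* to see which cells are sparse.
tab varname1 varname2, chi2 expected
```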

Practical example

Dataset
StataData1.dta
Variable name: marstat40
Variable label: Marital status (Age 40, Year 2010)
Value labels: 1=Married
              2=Unmarried
              3=Divorced
              4=Widowed

Variable name: earlyret
Variable label: Early retirement (Age 50, Year 2020)
Value labels: 0=No
              1=Yes

tab earlyret marstat40, chi2

The output shows the crosstable of our two variables, followed by the chi-square value (Pearson chi2) and its p-value (Pr). A p-value below 0.05 means that the association is statistically significant at the 5% level, i.e. we reject the hypothesis that the two variables are independent of one another. In this example, the p-value is reported as 0.000 (that is, below 0.0005), so there are significant differences in early retirement according to marital status (or, in principle, vice versa).
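
Because Stata rounds the p-value to three decimals in the table, it can be useful to look at the stored results as well. The following is a minimal sketch, assuming StataData1.dta is in your current working directory; the column option is an optional addition to the command above.

```stata
* Assumes StataData1.dta is in the current working directory.
use StataData1.dta, clear
* Same test as above, with column percentages added so that the share
* of early retirees can be compared across marital status groups.
tab earlyret marstat40, chi2 column
* tabulate leaves its results in r(); list them all ...
return list
* ... or display the chi-square value and the unrounded p-value.
display "chi2 = " r(chi2) ", p = " r(p)
```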