There are two forms of the chi-square test: (a) the multidimensional chi-square test (also known as the chi-square test of independence), and (b) the chi-square goodness-of-fit test. This part of the guide covers the first form. The second form is discussed in other sections.
The multidimensional chi-square test assesses whether there is a relationship between two categorical variables. For example, suppose you want to find out whether young women smoke more than young men. The variable gender has two categories (men and women). In this case the variable smoking has three categories: no smoking, occasional smoking and frequent smoking. The test is based on a simple crosstable that displays the joint distribution of these two variables (here as row percentages):
| | No smoking | Occasional smoking | Frequent smoking |
|---|---|---|---|
| Men (age 15-24) | 85% | 10% | 5% |
| Women (age 15-24) | 70% | 20% | 10% |
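As a sketch of what the `chi2` option computes, Pearson's statistic compares the observed count O_ij in each cell (row i, column j) with the count E_ij that would be expected if the two variables were independent. Here R_i and C_j are the row and column totals, N is the total number of observations, and r and c are the numbers of row and column categories. The statistic is calculated from counts (frequencies), not from percentages like those in the example table.

```latex
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
df = (r - 1)(c - 1)
```

For the smoking example, with r = 2 rows (gender) and c = 3 columns (smoking), the statistic would have df = (2 - 1)(3 - 1) = 2 degrees of freedom. The larger the gaps between observed and expected counts, the larger the chi-square value and the smaller the p-value.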
Assumptions
Before running the test, check that your data meet the assumptions behind the chi-square test. If your data "pass" these assumptions, the test result will be valid.
Checklist
Two or more unrelated categories in both variables
Both variables should be categorical (i.e. nominal or ordinal) and consist of two or more groups. Unrelated means that the groups should be mutually exclusive: no individual can belong to more than one of the groups. Examples include low vs. medium vs. high educational level; liberal vs. conservative vs. socialist political views; and poor vs. fair vs. good vs. excellent health.
Function
Basic command
tab varname1 varname2, chi2
Useful options
tab varname1 varname2, chi2 exact
Explanations
varname1
Insert the name of the first variable you want to use (it is displayed as the row variable).
varname2
Insert the name of the second variable you want to use (it is displayed as the column variable).
chi2
Report Pearson's chi-square statistic and its p-value.
exact
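Report Fisher's exact test. This is useful when your crosstable has empty cells or cells with small expected frequencies (a common rule of thumb is below 5), since the chi-square approximation can then be unreliable.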
Short names
tab
tabulate
More information
help tabulate twoway
Practical example
Dataset
StataData1.dta
Variable name
marstat40
Variable label
Marital status (Age 40, Year 2010)
Value labels
1=Married 2=Unmarried 3=Divorced 4=Widowed
Variable name
earlyret
Variable label
Early retirement (Age 50, Year 2020)
Value labels
0=No 1=Yes
tab earlyret marstat40, chi2
The output shows the crosstable of our two variables, followed by the chi-square value (Pearson chi2) and its p-value (Pr). If the p-value is below 0.05, we reject the null hypothesis that the two variables are independent of one another; in other words, there is a statistically significant association between them. In this example, the p-value is reported as 0.000 (i.e. below 0.0005), so there are significant differences in early retirement according to marital status (or, equivalently, in marital status according to early retirement).
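The degrees of freedom for this test follow from the number of categories of the two variables. Assuming all categories occur in the data, earlyret contributes 2 rows and marstat40 contributes 4 columns. The decision rule described above can then be summarised as follows, with H_0 denoting the null hypothesis of independence:

```latex
df = (2 - 1)(4 - 1) = 3,
\qquad
p < 0.05 \;\Rightarrow\; \text{reject } H_0\text{: earlyret and marstat40 are independent}
```

Under these assumptions, Stata displays the degrees of freedom in parentheses next to the statistic, as in Pearson chi2(3).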