Simple logistic regression with a categorical (non-binary) x

Written by:

Ylva B Almquist

Theoretical examples

Example 1
We want to investigate the association between educational attainment (x) and divorce (y). Educational attainment has three values: 1=Compulsory, 2=Upper secondary, and 3=University. We choose Compulsory as our reference category. Suppose we get an OR of 0.82 for upper secondary education and an OR of 0.69 for university education. Based on the direction of the estimates, we can conclude that higher levels of educational attainment are associated with lower odds of divorce.
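
As a rough aid to interpretation, an odds ratio can be converted into a percentage change in the odds relative to the reference category, using (OR - 1) × 100. The sketch below applies this to the two hypothetical ORs from Example 1. It only uses Stata as a calculator and does not require any data.

```stata
* Percentage change in the odds compared to the reference category (Compulsory)
* Upper secondary: OR = 0.82
display (0.82 - 1) * 100
* University: OR = 0.69
display (0.69 - 1) * 100
```

The results are negative, corresponding to about 18% and 31% lower odds of divorce, respectively. These are changes in odds, not in probabilities.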

Example 2
Suppose we are interested in the association between family type (x) and children’s average school marks (y). Family type has three categories: 1=Two-parent household, 2=Joint custody, and 3=Single-parent household. We choose Two-parent household as our reference category. Children’s average school marks are categorised into 0=Above average and 1=Below average. The analysis results in an OR of 1.02 for Joint custody and an OR of 1.55 for Single-parent household. Based on the direction of the estimates, this would mean that children living in family types other than two-parent households are more likely to have below-average school marks, although the OR for Joint custody is very close to 1 and thus indicates hardly any difference from two-parent households.

Practical example

Dataset

StataData1.dta

Variable name	earlyret
Variable label	Early retirement (Age 50, Year 2020)
Value labels	0=No 1=Yes

Variable name	educ
Variable label	Educational level (Age 40, Year 2010)
Value labels	1=Compulsory 2=Upper secondary 3=University

sum earlyret educ if pop_logistic==1
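
Since both variables are categorical, a cross-tabulation can be a useful complement to sum before fitting the model. The following is a minimal sketch, assuming StataData1.dta is already loaded; it shows the share of early retirement within each educational level.

```stata
* Cross-tabulate educational level by early retirement in the analytical sample
* The row option adds row percentages (share of No/Yes within each level of educ)
tab educ earlyret if pop_logistic==1, row
```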

The variable educ has three categories: 1=Compulsory, 2=Upper secondary, and 3=University. Here, the prefix ib1. specifies that the first category (Compulsory) will be the reference category.

logistic earlyret ib1.educ if pop_logistic==1

When we look at the results for educ, we see two odds ratios: one for Upper secondary and one for University. They are compared to the reference category, which in this case is Compulsory (OR=1.00). The odds ratio for Upper secondary is 0.71, meaning that those with upper secondary education have lower odds of early retirement, compared to those with compulsory education. The odds ratio for University is 0.33, which suggests that these individuals are even less likely to have retired at age 50, compared to those with compulsory education.

The dummies for educ are both significantly different from the reference category, as reflected in the p-values (both shown as 0.000, i.e. p<0.001) and the 95% confidence intervals, neither of which includes 1 (0.60-0.84 and 0.27-0.40, respectively).
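
The odds ratios are the exponentiated coefficients (log odds) of the model. As an optional check, the sketch below refits the same model with the coef option to display the coefficients, and then exponentiates them to recover the odds ratios. It assumes StataData1.dta is loaded.

```stata
* Same model as above, but reported as coefficients (log odds) instead of odds ratios
logistic earlyret ib1.educ if pop_logistic==1, coef
* Exponentiating each coefficient gives back the corresponding odds ratio
display exp(_b[2.educ])
display exp(_b[3.educ])
```

A negative coefficient corresponds to an odds ratio below 1, and a positive coefficient to an odds ratio above 1.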

Testing the overall effect

The output presented and interpreted above is based on the odds ratios for the dummy variables of educ. But what about the overall statistical effect of educ on earlyret? We can assess it with contrast, which is a postestimation command. The p. operator requests orthogonal polynomial contrasts, which test for a trend across the categories of educ.

contrast p.educ, noeffects

Here, we focus on the row for linear, which shows a p-value (P>chi2) below 0.05. This suggests that we have a statistically significant trend in earlyret according to educ.

More information
help contrast
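
The polynomial contrast above tests for a trend. A possible complement, not part of the original example, is a joint Wald test of whether the two educ dummies are simultaneously equal to zero. A minimal sketch, assuming StataData1.dta is loaded:

```stata
* Fit the model, then jointly test all dummies of educ
logistic earlyret ib1.educ if pop_logistic==1
testparm i.educ
```

Unlike the linear trend test, this joint test does not assume any ordering of the categories, so it can also be used when the categorical variable is nominal.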

We will also produce a graph of the trend. First, however, we need to apply the post-estimation command margins.

Note
The margins command can also be used for variables that are continuous or binary, but is particularly useful for categorical, non-binary (e.g. ordinal) variables.

margins educ

marginsplot

The marginsplot shows a quite clear trend: the predicted probability of early retirement decreases as educational level increases.

Note
The y-axis shows predicted probabilities (i.e. not log odds or odds ratios).

More information
help marginsplot
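
The graph can be made easier to read by adding titles. The sketch below repeats the full sequence (model, margins, plot) so that it can be run on its own, assuming StataData1.dta is loaded. The title texts are only suggestions.

```stata
* Fit the model and compute the predicted probability for each level of educ
logistic earlyret ib1.educ if pop_logistic==1
margins educ
* Plot the predicted probabilities (with 95% confidence intervals) and label the graph
marginsplot, title("Early retirement by educational level") ytitle("Predicted probability of early retirement")
```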

Summary
There is a negative association between educational level and early retirement; the higher the educational level, the lower the odds of early retirement (Upper secondary vs Compulsory: OR=0.71, 95% CI=0.60-0.84; University vs Compulsory: OR=0.33, 95% CI=0.27-0.40).