Simple Poisson regression with a categorical (non-binary) x

Theoretical examples

Example 1
We conduct a study among people who subscribe to a fishing magazine, focusing on the association between fishing experience (x) and the number of fish caught during the individual’s last fishing expedition (y). Fishing experience has three categories: 1=Low level, 2=Medium level, and 3=High level. Low level is chosen as the reference category. The number of catches ranges between 0 and 30. We find that the IRR is 1.50 for Medium level and 2.03 for High level. This means that, compared with individuals with a low level of experience, the rate of catches is 50% higher for those with a medium level and about twice as high for those with a high level.
Example 2
In this example, we examine the association between temperament (x) and the number of cigarettes smoked per week (y). Temperament is categorised as: 1=Sanguine, 2=Choleric, 3=Melancholic, and 4=Phlegmatic. Phlegmatic is chosen as the reference category. The number of cigarettes ranges from 0 to 150. We find that the IRR is 0.81 for Melancholic, 1.29 for Choleric, and 3.73 for Sanguine. In other words, individuals with melancholic temperament have a lower rate of cigarette smoking compared to the phlegmatic, whereas the opposite is true for individuals whose temperament is characterised as choleric or sanguine.
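
As a brief aside on how the IRRs in these examples are read, the sketch below assumes the standard Poisson model with a log link and one dummy variable per non-reference category. The notation (the betas, the indicator function, and the subscript k) is introduced here for illustration and does not come from the examples. It uses the align* environment from the amsmath package. The first two lines show the model for Example 1, where category 1 is the reference; the remaining lines convert each reported IRR into a percentage difference in rate relative to the reference category.

```latex
\begin{align*}
\log E(y \mid x) &= \beta_0 + \beta_2\,\mathbf{1}[x=2] + \beta_3\,\mathbf{1}[x=3] \\
\mathrm{IRR}_k &= e^{\beta_k}, \qquad
\text{percentage difference} = (\mathrm{IRR}_k - 1) \times 100 \\[4pt]
\text{Example 1:}\quad & (1.50 - 1) \times 100 = 50\%, \qquad (2.03 - 1) \times 100 = 103\% \\
\text{Example 2:}\quad & (0.81 - 1) \times 100 = -19\%, \qquad (1.29 - 1) \times 100 = 29\%, \qquad (3.73 - 1) \times 100 = 273\%
\end{align*}
```

An IRR above 1 therefore indicates a higher rate than in the reference category, and an IRR below 1 a lower rate.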

Practical example

Dataset
StataData1.dta
Variable name: children
Variable label: Number of children (Age 40, Year 2010)
Value labels: N/A

Variable name: educ
Variable label: Educational level (Age 40, Year 2010)
Value labels: 1=Compulsory, 2=Upper secondary, 3=University

sum children educ if pop_poisson==1

The variable educ has three categories: 1=Compulsory, 2=Upper secondary, and 3=University. Here, the ib1. prefix specifies that the first category (Compulsory) is the reference category.

poisson children ib1.educ if pop_poisson==1, irr

When we look at the results for the dummies for educ, we see that the incidence rate ratios are 1.17 for Upper secondary and 1.32 for University. Thus, compared with Compulsory education, a higher level of educational attainment is associated with a higher expected number of children: 17% higher for Upper secondary and 32% higher for University.

Both dummies for educ are significantly different from the reference category, as reflected in the p-values and in the 95% confidence intervals, which do not include 1.
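
The table reports each category against the reference only. As a minimal sketch, assuming StataData1.dta is in the working directory, the stored coefficients can be exponentiated by hand to reproduce the IRRs, and lincom can be used to compare University with Upper secondary directly. The choice of this extra comparison is ours, for illustration.

```stata
* Sketch: assumes StataData1.dta is in the working directory
use StataData1.dta, clear
poisson children ib1.educ if pop_poisson==1, irr

* The stored coefficients are on the log scale; exponentiating gives the IRRs
display exp(_b[2.educ])
display exp(_b[3.educ])

* Compare University with Upper secondary directly, reported as an IRR
lincom 3.educ - 2.educ, irr
```

The IRR from lincom should equal the ratio of the two IRRs shown in the regression table.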

Test the overall effect

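The output presented and interpreted above is based on the incidence rate ratios for the dummy variables of educ. Let us also assess the overall statistical effect of educ on children. We can do so with contrast, which is a postestimation command. The p. operator requests orthogonal polynomial contrasts, which test for linear and higher-order trends across the ordered categories.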

contrast p.educ, noeffects

Here, we focus on the row for linear, which shows a p-value (P>chi2) below 0.05. This suggests that we have a statistically significant trend in children according to educ.

More information
help contrast
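
The linear row tests for a trend across the ordered categories. A complementary question is whether educ matters at all, without assuming any ordering. A minimal sketch of such a joint test, assuming StataData1.dta is in the working directory, uses testparm or contrast without the p. operator:

```stata
* Sketch: assumes StataData1.dta is in the working directory
use StataData1.dta, clear
quietly poisson children ib1.educ if pop_poisson==1, irr

* Joint Wald test that both educ coefficients are zero
testparm i.educ

* Joint test of educ via contrast (no polynomial decomposition)
contrast educ
```

With three categories, both commands should report a chi-squared test with 2 degrees of freedom.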

We will also produce a graph of the trend. First, however, we need to apply the postestimation command margins.

Note
The margins command can also be used for variables that are continuous or binary, but it is particularly useful for categorical variables with more than two categories (e.g. ordinal variables).

margins educ

marginsplot

Note
The y-axis shows the predicted number of events (here, children), not log incidence rates or incidence rate ratios.

This graph quite clearly shows that the higher the level of educational attainment, the higher the number of children.

More information
help marginsplot
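
To see what the predicted numbers from margins represent, they can be set beside the observed group means. This is a minimal sketch, assuming StataData1.dta is in the working directory. Because educ is the only predictor in this model, the predicted number of children in each category should coincide with the observed mean in that category.

```stata
* Sketch: assumes StataData1.dta is in the working directory
use StataData1.dta, clear
quietly poisson children ib1.educ if pop_poisson==1, irr

* Predicted number of children by educational level
margins educ

* Observed mean number of children by educational level, same sample
tabstat children if e(sample), by(educ) statistics(mean n)
```

Dividing the predicted number for Upper secondary or University by the predicted number for Compulsory should return the corresponding incidence rate ratio from the regression table.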

Summary
At age 40, there is a clear, and statistically significant, trend in the number of children according to educational level: higher levels of education are associated with a higher expected number of children.