Simple Poisson regression with a categorical (non-binary) x
Written by:
Ylva B Almquist
Theoretical examples
Example 1: We conduct a study among subscribers to a fishing magazine, focusing on the association between fishing experience (x) and the number of fish caught during the individual’s last fishing expedition (y). Fishing experience has three categories: 1=Low level, 2=Medium level, and 3=High level. Low level is chosen as the reference category. The number of catches ranges between 0 and 30. We find that the IRR is 1.50 for Medium level and 2.03 for High level. Compared to individuals with a low level of experience, the rate of catches is thus 50% higher among those with a medium level and about twice as high among those with a high level.
Example 2: In this example, we examine the association between temperament (x) and the number of cigarettes smoked per week (y). Temperament is categorised as: 1=Sanguine, 2=Choleric, 3=Melancholic, and 4=Phlegmatic. Phlegmatic is chosen as the reference category. The number of cigarettes ranges from 0 to 150. We find that the IRR is 0.81 for Melancholic, 1.29 for Choleric, and 3.73 for Sanguine. In other words, compared to individuals with a phlegmatic temperament, the rate of cigarette smoking is 19% lower among those with a melancholic temperament, 29% higher among those with a choleric temperament, and almost four times as high among those with a sanguine temperament.
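The step from an IRR to a percentage difference in the rate can be checked with a few lines of arithmetic. The sketch below simply plugs in the IRRs quoted in the two examples above; it uses no data and fits no model.

```stata
* IRRs above 1: percentage higher rate = 100*(IRR - 1)
* Values taken from Example 1 (reference category: Low level)
display "Medium vs Low: " %4.0f 100*(1.50-1) "% higher rate"
display "High vs Low:   " %4.0f 100*(2.03-1) "% higher rate"

* IRRs below 1: percentage lower rate = 100*(1 - IRR)
* Value taken from Example 2 (reference category: Phlegmatic)
display "Melancholic vs Phlegmatic: " %4.0f 100*(1-0.81) "% lower rate"
```

An IRR of exactly 1 would correspond to no difference in the rate compared to the reference category.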
Practical example
Dataset: StataData1.dta

Variable name: children
Variable label: Number of children (Age 40, Year 2010)
Value labels: N/A

Variable name: educ
Variable label: Educational level (Age 40, Year 2010)
Value labels: 1=Compulsory, 2=Upper secondary, 3=University
sum children educ if pop_poisson==1
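Since educ is categorical, the mean reported by sum is of limited use for that variable. As a possible complement, and assuming StataData1.dta is open in memory, the sketch below tabulates the categories of educ and shows the mean number of children within each category.

```stata
* Assumes StataData1.dta is already in memory

* Number of individuals in each category of educ
tabulate educ if pop_poisson==1

* Number of observations and mean number of children, by category of educ
tabstat children if pop_poisson==1, by(educ) statistics(n mean)
```

The category means give a first impression of the pattern that the regression model will quantify.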
The variable educ has three categories: 1=Compulsory, 2=Upper secondary, and 3=University. With the prefix ib1., we specify that the first category (Compulsory) is the reference category.
poisson children ib1.educ if pop_poisson==1, irr
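When we look at the results for the dummies for educ, we see that the incidence rate ratios are 1.17 for Upper secondary and 1.32 for University. Compared to individuals with compulsory education, the expected number of children is thus 17% higher among those with upper secondary education and 32% higher among those with university education. In other words, a higher level of educational attainment is associated with a higher number of children.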
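To see where these incidence rate ratios come from, it can help to reproduce them by hand. The sketch below assumes that StataData1.dta is in memory. It relies on the fact that the model stores its coefficients on the log scale, and that in a Poisson model with a single categorical x the IRR for a category equals the ratio between that category's mean of y and the mean in the reference category. The scalar name mean_compulsory is only an illustrative choice.

```stata
* Assumes StataData1.dta is already in memory
quietly poisson children ib1.educ if pop_poisson==1

* Coefficients are stored as log incidence rate ratios; exponentiate to get IRRs
display "IRR, Upper secondary: " exp(_b[2.educ])
display "IRR, University:      " exp(_b[3.educ])

* The same quantity from the raw data: ratio of category means,
* restricted to the observations used in the model, e(sample)
quietly summarize children if e(sample) & educ==1
scalar mean_compulsory = r(mean)
quietly summarize children if e(sample) & educ==3
display "Mean ratio, University vs Compulsory: " r(mean)/scalar(mean_compulsory)
```

The last line should agree with the IRR for University up to the numerical precision of the estimation. This equivalence no longer holds once additional x-variables are included in the model.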
Both dummies for educ are significantly different from the reference category at the 5% level, as reflected in the p-values (below 0.05) and the 95% confidence intervals (which do not include 1).
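The output only compares each category to the reference category. If a comparison between two non-reference categories is of interest, one possible approach is sketched below, assuming the poisson model above is the most recently fitted model. Given the reported IRRs, the ratio for University versus Upper secondary should be roughly 1.32/1.17, i.e. about 1.13, up to rounding.

```stata
* Assumes: poisson children ib1.educ if pop_poisson==1, irr  has just been run

* IRR for University compared to Upper secondary, with confidence interval
lincom 3.educ - 2.educ, irr

* Alternative: refit the model with Upper secondary (category 2) as reference
poisson children ib2.educ if pop_poisson==1, irr
```

Both routes describe the same model; only the reference category used for presenting the results changes.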
Test the overall effect
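The output presented and interpreted above is based on the incidence rate ratios for the dummy variables of educ. Let us also assess the overall statistical effect of educ on children. We can do this with contrast, which is a postestimation command. The operator p. requests polynomial (linear and quadratic) contrasts across the categories of educ, and the option noeffects suppresses the table of individual contrast estimates so that only the tests are shown.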
contrast p.educ, noeffects
Here, we focus on the row for linear, which shows a p-value (P>chi2) below 0.05. This suggests that we have a statistically significant trend in children according to educ.
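The linear row tests for a trend. If the question is instead whether educ has any association with children at all, a joint test of all categories may be closer to what is needed. The sketch below shows two ways of obtaining such a test, assuming the poisson model above is still the most recently fitted model.

```stata
* Assumes: poisson children ib1.educ if pop_poisson==1, irr  has just been run

* Joint Wald test that all categories of educ have the same rate
contrast educ

* The same kind of joint test, requested via the dummy coefficients
testparm i.educ
```

With three categories, this joint test has two degrees of freedom and should correspond to the joint row in the output from contrast p.educ.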
More information: help contrast
We will also produce a graph of the trend. First, however, we need to apply the postestimation command margins.
Note: This command can also be used for variables that are continuous or binary, but is particularly useful for categorical, non-binary (e.g. ordinal) variables.
margins educ
marginsplot
Note: The y-axis shows the predicted number of events (i.e. not log incidence rates or incidence rate ratios).
This graph quite clearly shows that the higher the level of educational attainment, the higher the number of children.
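The graph can be made easier to read with explicit axis titles, and the predicted numbers can also be compared pairwise. The sketch below shows one way of doing this, assuming the poisson model above is still the most recently fitted model. The axis titles are only suggestions.

```stata
* Assumes: poisson children ib1.educ if pop_poisson==1, irr  has just been run

* Predicted number of children for each category of educ
margins educ

* Same graph as before, with descriptive axis titles
marginsplot, ytitle("Predicted number of children") xtitle("Educational level")

* Pairwise differences in the predicted number of children
margins educ, pwcompare(effects)
```

Note that the pairwise comparisons from margins are differences in the predicted number of children, whereas the regression output reports ratios.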
More information: help marginsplot
Summary: At age 40, there is a clear and statistically significant trend in the number of children according to educational level: higher levels of education are associated with a higher number of children.