Multiple Poisson regression

Written by:

Ylva B Almquist

Quick facts

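Number of variables
One dependent (y)
At least two independent (x)

Scales of variable(s)
Dependent: count
Independent: categorical (nominal/ordinal) or continuous (ratio/interval)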

Theoretical example

Example
Suppose we are interested in whether having young children (x), residential area (x), and income (x) are related to the number of pets owned (y). Having young children is coded as 0=No young children and 1=Young children. Residential area has the values 1=Metropolitan, 2=Smaller city, and 3=Rural. We choose Metropolitan as our reference category. Income is measured as the yearly household income from salary in thousands of SEK (ranges between 100 and 700 thousand SEK). The number of pets owned ranges between 0 and 50.

We get an IRR for Young children that is 1.23. That means that those who have young children have a 23% higher rate of pets than those who do not have young children. This association is adjusted for residential area and income.

With regard to residential area, we get an IRR of 1.30 for Smaller city and an IRR of 7.44 for Rural. This suggests that those who live in a smaller city have a 30% higher rate of pets than those living in metropolitan areas, whereas those living in rural areas have a rate that is more than seven times as high. These results are adjusted for having young children and income.

Finally, the IRR for income is 0.98. This suggests that for every unit increase in income (i.e. for every additional one thousand SEK), the rate of pets decreases by 2%. This association is adjusted for having young children as well as residential area.
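The percentages behind these interpretations follow directly from the IRRs. As a minimal sketch, using only the hypothetical values from the example above, Stata's display command can do the arithmetic. The ten-unit step for income is an assumption chosen purely for illustration.

```stata
* Percentage difference in the rate implied by an IRR: (IRR - 1) * 100
* Young children vs. no young children
display (1.23 - 1) * 100
* Rural vs. metropolitan
display (7.44 - 1) * 100
* One additional thousand SEK in income
display (0.98 - 1) * 100

* IRRs are multiplicative, so a 10-unit increase in income
* (ten thousand SEK) corresponds to 0.98 raised to the power of 10
display 0.98^10
```

The last line gives approximately 0.82, i.e. roughly an 18% lower rate of pets for every additional ten thousand SEK.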

Practical example

Dataset

StataData1.dta

Variable name	children
Variable label	Number of children (Age 40, Year 2010)
Value labels	N/A

Variable name	siblings
Variable label	Number of siblings (Age 15, Year 1985)
Value labels	N/A

Variable name	sex
Variable label	Sex
Value labels	0=Man 1=Woman

Variable name	educ
Variable label	Educational level (Age 40, Year 2010)
Value labels	1=Compulsory 2=Upper secondary 3=University

sum children siblings sex educ if pop_poisson==1

poisson children siblings sex ib1.educ if pop_poisson==1, irr

In this model, we have three x-variables: siblings, sex, and educ. When we put them together, their statistical effects on children are mutually adjusted.

The incidence rate ratios have changed slightly in comparison to the simple regression models. For example, the incidence rate ratio for siblings has increased marginally, from 1.01 to 1.02. The incidence rate ratio for sex has moved slightly closer to 1 (from 1.32 to 1.30). This is also the case for the dummies of educ: the incidence rate ratio for Upper secondary has changed from 1.17 to 1.15 and the one for University has changed from 1.32 to 1.29.

The association between siblings and children has become statistically significant (p<0.05) after mutual adjustment. However, it was already very close to being significant in the simple model (p=0.07). The associations between sex and children on the one hand, and between educ and children on the other hand, are still statistically significant.
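The incidence rate ratios discussed above are exponentiated coefficients: the irr option only changes how the results are displayed, while the model itself is estimated on the log scale. A minimal sketch of this relationship, assuming the same variables and sample as in the command above:

```stata
* Same model without irr: the output shows coefficients on the log scale
poisson children siblings sex ib1.educ if pop_poisson==1

* Exponentiating a stored coefficient reproduces the corresponding IRR
display exp(_b[siblings])
display exp(_b[2.educ])

* Replay the most recent model as incidence rate ratios, without refitting
poisson, irr
```

The two display lines should match the incidence rate ratios reported for siblings and for Upper secondary in the replayed output.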

Note
A specific incidence rate ratio from a simple Poisson regression model can increase when other x-variables are included. Usually, it is just “noise”, i.e. not any large increase, and therefore not much to be concerned about. But it can also reflect that there is something going on that we need to explore further. There are many possible explanations for increases in multiple regression models: a) We actually adjust for a confounder and thereby “reveal” the “true” statistical effect. b) There are interactions among the x-variables in their effect on the y-variable. c) There is something called collider bias (which we will not address in this guide), which basically means that both the x-variable and the y-variable cause another x-variable in the model. d) The simple regression models and the multiple regression model are based on different samples. e) It can be due to rescaling bias (see Mediation analysis).
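Explanation d) can be ruled out by estimating all models on the same observations. The following is one possible sketch, relevant only if the analysis sample is not already restricted in this way; the variable name insample is hypothetical and not part of the dataset.

```stata
* Fit the full model first and flag the observations it actually used
quietly poisson children siblings sex ib1.educ if pop_poisson==1, irr
generate byte insample = e(sample)

* Refit a simple model on exactly the same observations
poisson children siblings if insample==1, irr
```

After any estimation command, e(sample) equals 1 for the observations included in the model and 0 otherwise, so insample excludes individuals with missing values on any of the variables in the full model.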

Summary
In the fully adjusted model, it can be observed that the association between the number of siblings and the number of children at age 40 now reaches a statistically significant level (IRR=1.02; 95% CI=1.00-1.03). The associations between sex and number of children as well as between educational level and number of children have become somewhat attenuated, but remain statistically significant.

Estimates table and coefficients plot

If we have multiple models, we can facilitate comparisons between the regression models by asking Stata to construct estimates tables and coefficients plots. What we do is to run the regression models one-by-one, save the estimates after each, and then use the commands estimates table and coefplot.

The coefplot command is not part of the standard Stata program, so unless you have already added this package, you need to install it:

ssc install coefplot

As an example, we can include the three simple regression models as well as the multiple regression model. The quietly prefix is placed at the beginning of each regression command to suppress the output.

Run and save the first simple regression model:

quietly poisson children siblings if pop_poisson==1, irr

estimates store model1

Run and save the second simple regression model:

quietly poisson children sex if pop_poisson==1, irr

estimates store model2

Run and save the third simple regression model:

quietly poisson children ib1.educ if pop_poisson==1, irr

estimates store model3

Run and save the multiple regression model:

quietly poisson children siblings sex ib1.educ if pop_poisson==1, irr

estimates store model4

Produce the estimates table (include the option eform to show the incidence rate ratios):

estimates table model1 model2 model3 model4, eform
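The table can be made easier to read with a few additional options. This is a sketch of one possible layout, not a required specification; the display format chosen here is an assumption.

```stata
* Two decimals for the IRRs, significance stars, and the number of observations
estimates table model1 model2 model3 model4, eform b(%9.2f) star stats(N)
```

Here b() sets the display format of the estimates, star adds significance stars, and stats(N) adds a row with the number of observations in each model, which also makes it easy to check that the models are based on the same sample.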

Produce the coefficients plot (include the option eform to show the incidence rate ratios):

coefplot model1 model2 model3 model4, eform

Note
You can improve the graph by using the Graph Editor to delete “_cons” as well as to adjust the category and label names.
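As an alternative to the Graph Editor, similar adjustments can be made directly in the command. The sketch below assumes the stored models from above; the label texts are illustrative choices, and the /// line continuation requires that the command is run from a do-file.

```stata
* drop(_cons) removes the constant; xline(1) marks IRR=1 (no association)
* coeflabels() replaces the coefficient names with custom labels
coefplot model1 model2 model3 model4, eform drop(_cons) xline(1) ///
    coeflabels(siblings = "Number of siblings" sex = "Woman" ///
    2.educ = "Upper secondary" 3.educ = "University")
```

Since the reference line is drawn at 1, estimates whose confidence intervals do not cross it are statistically significant at the 5% level.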