Multiple multinomial regression

Written by:

Ylva B Almquist

Quick facts

Number of variables
One dependent (y)
At least two independent (x)

Scales of variable(s)
Dependent: nominal (with more than two categories)
Independent: categorical (nominal/ordinal) or continuous (ratio/interval)

Theoretical example

Example
Suppose we are interested in whether having young children (x), residential area (x), and income (x) are related to smoking (y). Having young children is coded as 0=No young children and 1=Young children. Residential area has the values 1=Metropolitan, 2=Smaller city, and 3=Rural. We choose Metropolitan as our reference category. Income is measured as the yearly household income from salary in thousands of SEK (ranging between 100 and 700 thousand SEK). Smoking has the values 1=Non-smoker, 2=Former smoker, and 3=Current smoker. We choose Non-smoker as our base outcome.

In the regression analysis, we get an RRR of 1.19 for Young children and Former smoker, suggesting that those who have young children are more likely to be former smokers than non-smokers compared to those who do not have young children. Then we get an RRR of 0.77 for Young children and Current smoker, which means that those who have young children are less likely to be current smokers than non-smokers compared to those who do not have young children. These results are adjusted for residential area and income.

The RRR for Smaller city and Former smoker is 2.09, which suggests that those who live in a smaller city are more likely to be former smokers than non-smokers compared to those who live in a metropolitan area. The RRR for Smaller city and Current smoker is 3.71, which suggests that those who live in a smaller city are more likely to be current smokers than non-smokers compared to those who live in a metropolitan area. The RRR for Rural and Former smoker is 3.59, which suggests that those who live in a rural area are more likely to be former smokers than non-smokers compared to those who live in a metropolitan area. The RRR for Rural and Current smoker is 5.01, which suggests that those who live in a rural area are more likely to be current smokers than non-smokers compared to those who live in a metropolitan area. These results are adjusted for having young children and income.

With regard to income, the RRR for Former smoker is 0.93, suggesting that for every one-unit increase in income (i.e. 1,000 SEK), the risk of being a former smoker rather than a non-smoker decreases. The RRR for Current smoker is 0.78, which means that for every one-unit increase in income, the risk of being a current smoker rather than a non-smoker also decreases. These results are adjusted for having young children and residential area.
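To make the per-unit interpretation concrete, the model behind these RRRs can be written out. The notation below (the β coefficients and the index j for the outcome category) is introduced here for illustration only and does not come from the example itself. The last line is a rough sketch of what the income RRRs would imply for a 10-unit (10,000 SEK) increase, assuming the effect is constant across the income range.

```latex
\[
\ln\frac{P(y=j)}{P(y=1)} = \beta_{0j} + \beta_{1j}\,\text{children} + \beta_{2j}\,\text{smaller city} + \beta_{3j}\,\text{rural} + \beta_{4j}\,\text{income},
\qquad j \in \{2, 3\}
\]
\[
\mathrm{RRR}_{kj} = e^{\beta_{kj}}
\]
\[
\text{Former smoker: } 0.93^{10} \approx 0.48,
\qquad
\text{Current smoker: } 0.78^{10} \approx 0.08
\]
```

Here j=2 is Former smoker, j=3 is Current smoker, and Non-smoker (j=1) is the base outcome. Because an RRR is multiplicative, the RRR for a 10-unit increase is the one-unit RRR raised to the power of 10.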

Practical example

Dataset

StataData1.dta

Variable name	marstat40
Variable label	Marital status (Age 40, Year 2010)
Value labels	1=Married 2=Unmarried 3=Divorced 4=Widowed

Variable name	gpa
Variable label	Grade point average (Age 15, Year 1985)
Value labels	N/A

Variable name	sex
Variable label	Sex
Value labels	0=Man 1=Woman

Variable name	educ
Variable label	Educational level (Age 40, Year 2010)
Value labels	1=Compulsory 2=Upper secondary 3=University

sum marstat40 gpa sex educ if pop_multinom==1

mlogit marstat40 gpa sex ib1.educ if pop_multinom==1, rrr b(1)

In this model, we have three x-variables: gpa, sex, and educ. When we put them together, their statistical effect on marstat40 is mutually adjusted.
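Since relative risk ratios are always expressed relative to the base outcome, it can sometimes help to look at adjusted predicted probabilities as well. The lines below are a minimal sketch, assuming the model above has just been run. The choice of outcome 3 (Divorced) is only an example.

```stata
* Re-run the multiple model so that its results are the active estimates
quietly mlogit marstat40 gpa sex ib1.educ if pop_multinom==1, rrr b(1)

* Adjusted predicted probability of being Divorced (outcome 3),
* by educational level, averaged over the other x-variables
margins educ, predict(outcome(3))

* Average marginal effect of gpa on the probability of being Divorced
margins, dydx(gpa) predict(outcome(3))
```

Replace the number in outcome() with 1, 2, or 4 to obtain the corresponding results for the other categories of marstat40.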

Compared with the simple regression models, the relative risk ratios have changed. For example, the relative risk ratios for gpa have decreased (become closer to 1). The relative risk ratios for sex have also decreased slightly, apart from the one for Divorced (which is a bit higher now). The relative risk ratios for the dummies of educ are also closer to 1 now, except for the ones for Widowed.

Regarding statistical significance, the same results as in the simple regression models are found here.

Note
A specific relative risk ratio from a simple multinomial regression model can increase when other x-variables are included. Usually, it is just “noise”, i.e. the increase is not large, and therefore not much to be concerned about. But it can also reflect that there is something going on that we need to explore further. There are many possible explanations for increases in multiple regression models: a) We actually adjust for a confounder and then “reveal” the “true” statistical effect. b) There are interactions among the x-variables in their effect on the y-variable. c) There is something called collider bias (which we will not address in this guide), which basically means that both the x-variable and the y-variable cause another x-variable in the model. d) The simple regression models and the multiple regression model are based on different samples. e) It can be due to rescaling bias (see Mediation analysis).
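Explanation d) can be checked directly. The lines below are a minimal sketch of one way to do this. The variable name insample is hypothetical, and it is assumed here that pop_multinom already marks the analytical sample, in which case the two indicators should coincide.

```stata
* Fit the multiple model first (quietly suppresses the output)
quietly mlogit marstat40 gpa sex ib1.educ if pop_multinom==1, rrr b(1)

* Flag the observations actually used in the estimation
* (insample is a hypothetical variable name)
generate byte insample = e(sample)

* Compare the flag with the analytical-sample indicator
tabulate insample pop_multinom, missing

* Re-run a simple model on exactly the same observations
mlogit marstat40 gpa if insample==1, rrr b(1)
```

If the simple and multiple models are based on the same observations, differences in the relative risk ratios cannot be due to differences in the samples.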

Summary
In the fully adjusted model, most differences are attenuated but the overall patterns remain the same.

Estimates table and coefficients plot

If we have multiple models, we can facilitate comparisons between the regression models by asking Stata to construct estimates tables and coefficients plots. We run the regression models one by one, save the estimates after each, and then use the commands estimates table and coefplot.

The coefplot command is not part of the standard Stata program, so unless you have already added this package, you need to install it:

ssc install coefplot

As an example, we can include the three simple regression models as well as the multiple regression model. The quietly prefix is placed at the beginning of the regression commands to suppress the output.

Run and save the first simple regression model:

quietly mlogit marstat40 gpa if pop_multinom==1, rrr b(1)

estimates store model1

Run and save the second simple regression model:

quietly mlogit marstat40 sex if pop_multinom==1, rrr b(1)

estimates store model2

Run and save the third simple regression model:

quietly mlogit marstat40 ib1.educ if pop_multinom==1, rrr b(1)

estimates store model3

Run and save the multiple regression model:

quietly mlogit marstat40 gpa sex ib1.educ if pop_multinom==1, rrr b(1)

estimates store model4

Produce the estimates table (include the option eform to show relative risk ratios):

estimates table model1 model2 model3 model4, eform
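The table can be made easier to read with a few additional options. The following is a sketch and not part of the original example; it assumes the four models have been stored as shown above.

```stata
* Same table, with relative risk ratios shown with two decimals,
* significance stars, and the number of observations for each model
estimates table model1 model2 model3 model4, eform b(%9.2f) star stats(N)
```

Note that the star option cannot be combined with the se, t, or p options of estimates table.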

Produce the coefficients plot (include the option eform to show relative risk ratios):

coefplot model1 model2 model3 model4, eform

Note
You can improve the graph by using the Graph Editor to delete “_cons” as well as to adjust the category and label names.
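As an alternative to the Graph Editor, some of these adjustments can be made directly in the command. The following is a sketch, assuming the four models have been stored as shown above; the reference line at 1 is an addition made here for illustration.

```stata
* Drop the constants from the plot and add a vertical reference line
* at 1 (the value of a relative risk ratio that indicates no difference)
coefplot model1 model2 model3 model4, eform drop(_cons) xline(1)
```

Estimates to the right of the line correspond to relative risk ratios above 1, and estimates to the left correspond to relative risk ratios below 1.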