Quick facts
Number of variables
One dependent (y)
At least two independent (x)
Scales of variable(s)
Dependent: time-to-event
Independent: categorical (nominal/ordinal) or continuous (ratio/interval)
Theoretical example
| Example In a post-apocalyptic universe, we would like to study the effects of sex, marital status, and urbanicity on cause-specific mortality during a year-long, localized epidemic. In this example, the failure event is death due to an engineered zombie virus (0=No event, 1=Event). Sex is coded as 0=Male and 1=Female, whereas marital status is coded in four categories: 1=Married, 2=Divorced, 3=Widowed, and 4=Never married. Urbanicity is coded into three categories: 0=Rural, 1=Suburban, and 2=Urban. Our reference categories are male, married, and rural, respectively. The HR for females is 0.92. Relative to males, and holding marital status and urbanicity constant, females have an estimated 8% lower mortality attributable to zombie virus. Estimating the effects of marital status on survival, we get an HR of 1.34 for divorced individuals, an HR of 0.96 for widowed individuals, and an HR of 2.28 for never-married individuals. Holding the other covariates constant, divorced and never-married individuals have a higher estimated hazard for mortality compared to married individuals. Conversely, these results indicate that widowed individuals have an estimated 4% lower risk of mortality relative to those who are married. For our measure of urbancity, the HR is 1.45 for suburban areas and 12.78 for urban areas. After controlling for sex and marital status, we estimate that individuals living in the suburbs have a 1.45 times higher risk of death relative to individuals living in rural areas. Individuals living in urban areas have a 12.78 times higher risk of mortality due to the zombie virus, relative to individuals in rural areas and holding the other covariates constant. Yikes. |
Practical example
| Dataset |
| StataData1.dta |
| Variable name | cvd |
| Variable label | Out-patient care due to CVD (Ages 41-50, Year 2011-2020) |
| Value labels | 0=No 1=Yes |
| Variable name | gpa |
| Variable label | Grade point average (Age 15, Year 1985) |
| Value labels | N/A |
| Variable name | sex |
| Variable label | Sex |
| Value labels | 0=Man 1=Woman |
| Variable name | marstat40 |
| Variable label | Marital status (Age 40, Year 2010) |
| Value labels | 1=Married 2=Unmarried 3=Divorced 4=Widowed |
sum cvd gpa sex marstat40 if pop_cox==1 |

stcox gpa sex ib1.marstat40 if pop_cox==1, noshow |

In this model, we have three x-variables: gpa, sex, and marstat40. When we put them together, their statistical effect on cvd is mutually adjusted.
When it comes to the hazard ratios, they have changed in comparison to the simple regression models. For example, the hazard ratios have become closer to 1 for both gpa and sex: they changed from 0.48 to 0.55 for gpa, and from 0.52 to 0.59 for sex. This is largely also the case for the dummies of marstat40: the hazard ratio for Unmarried has changed from 2.86 to 2.52 and the one for Divorced has changed from 3.11 to 2.96. The hazard ratio for Widowed has nevertheless increased: from 2.46 to 2.97.
All x-variables still demonstrate statistically significant associations with cvd.
| Note A specific hazard ratio from a simple Cox regression model can increase when other x-variables are included. Usually, it is just “noise”, i.e. not any large increases, and therefore not much to be concerned about. But it can also reflect that there is something going on that we need to explore further. There are many possible explanations for increases in multiple regression models: a) We actually adjust for a confounder and then “reveal” the “true” statistical effect. b) There are interactions among the x-variables in their effect on the y-variable. c) There is something called collider bias (which we will not address in this guide) which basically mean that both the x-variable and the y-variable causes another x-variable in the model. d) The simple regression models and the multiple regression model are based on different samples. e) It can be due to rescaling bias (see Mediation analysis). |
| Summary In the fully adjusted model, it can be observed that while most associations are slightly attenuated in strength, they remain largely the same as in the simple models. |
Estimates table and coefficients plot
If we have multiple models, we can facilitate comparisons between the regression models by asking Stata to construct estimates tables and coefficients plots. What we do is to run the regression models one-by-one, save the estimates after each, and than use the commands estimates table and coefplot.
The coefplot option is not part of the standard Stata program, so unless you already have added this package, you need to install it:
ssc install coefplot |
As an example, we can include the three simple regression models as well as the multiple regression model. The quietly option is included in the beginning of the regression commands to suppress the output.
Run and save the first simple regression model:
quietly stcox gpa if pop_cox==1, noshow |
estimates store model1 |
Run and save the second simple regression model:
quietly stcox sex if pop_cox==1, noshow |
estimates store model2 |
Run and save the third simple regression model:
quietly stcox ib1.marstat40 if pop_cox==1, noshow |
estimates store model3 |
Run and save the multiple regression model:
quietly stcox gpa sex ib1.marstat40 if pop_cox==1, noshow |
estimates store model4 |
Produce the estimates table (include the option eform to show hazard ratios):
estimates table model1 model2 model3 model4, eform |

Produce the coefficients plot (include the option eform to show hazard ratios):
coefplot model1 model2 model3 model4, eform |

| Note You can improve the graph by using the Graph Editor to adjust the category and label names. |