The survival and hazard functions are easy to estimate and graph with non-parametric methods, such as the Kaplan-Meier product-limit estimator for the survival function (see Kaplan-Meier curves).
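To show what the product-limit estimator actually computes, here is a minimal Python sketch (an illustration only; in Stata this is the estimate that sts list and sts graph report after the data have been stset). At each distinct event time the running survival probability is multiplied by the share of those still at risk who did not have the event. Censored subjects count as at risk up to and including their censoring time and then leave the risk set. The function name kaplan_meier and the example data are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of S(t).

    times  -- observed follow-up time for each subject
    events -- 1 if the subject had the event at that time, 0 if censored
    Returns a list of (t, S(t)) pairs at each distinct event time.
    """
    # Number of events (d) and of all exits (events + censorings) per time.
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)

    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the conditional probability
            # of surviving past t, given being at risk just before t.
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        # Everyone who exits at t (event or censored) leaves the risk set.
        at_risk -= exits[t]
    return curve


# Hypothetical data: six subjects, two of them censored (event = 0).
curve = kaplan_meier([2, 3, 3, 5, 6, 8], [1, 1, 0, 1, 0, 1])
```

For these data the estimated survival steps down to 5/6, 2/3, 4/9 and finally 0 at times 2, 3, 5 and 8. The censored subjects at times 3 and 6 reduce the risk set without producing a step.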
Alternatively, we can estimate the survival distribution with parametric regression models, many of which can be written as accelerated failure time models (also called location-scale models). These models differ in the shape they assume for the distribution (e.g. exponential, Weibull, log-normal, log-logistic, Gompertz, and generalised gamma). They are not covered in this guide, but you can explore them in Stata:
More information: help streg (parametric survival models) and help stintreg (the interval-censored counterpart)
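To make "assumed shape" concrete, the Python sketch below (an illustration, not Stata code) shows two of the listed distributions. Under an exponential model the hazard is constant, and with right-censored data its maximum-likelihood estimate has a closed form: the number of events divided by the total time at risk. The Weibull adds a shape parameter that lets the hazard rise or fall over time. The scale/shape parameterisation used here is one common convention, so other software may parameterise these distributions differently. The function names and example data are hypothetical.

```python
import math


def exponential_rate_mle(times, events):
    """MLE of a constant hazard from right-censored data.

    The log-likelihood is D*log(lam) - lam*sum(times), where D is the
    number of events; it is maximised at lam = D / total time at risk.
    """
    return sum(events) / sum(times)


def weibull_survival(t, scale, shape):
    # S(t) = exp(-(t/scale)^shape); shape == 1 gives the exponential model.
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, scale, shape):
    # h(t) = (shape/scale) * (t/scale)^(shape - 1): increasing over time
    # if shape > 1, constant if shape == 1, decreasing if shape < 1.
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical data: follow-up times and event indicators (0 = censored).
times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]
lam = exponential_rate_mle(times, events)  # 4 events / 27 time units
s5 = math.exp(-lam * 5)                    # fitted exponential S(5)
```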
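Then there is the proportional hazards model, or simply Cox regression, which is semi-parametric. Unlike non-parametric methods such as Kaplan-Meier, Cox regression can estimate the effects of several covariates (including continuous ones) at the same time. It is also more flexible than parametric models because it leaves the baseline hazard unspecified and therefore makes no assumption about the shape of the survival distribution. It does, however, assume that the hazards are proportional.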
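The "semi-parametric" part can be seen in the partial likelihood. Each event contributes exp(beta*x_i) divided by the sum of exp(beta*x_j) over everyone still at risk, so the unspecified baseline hazard cancels out. The Python sketch below fits beta for a single covariate by Newton-Raphson. It is an illustration under stated assumptions: one covariate, tied event times handled with Breslow's approximation (the default in Stata's stcox), and hypothetical function names and data. exp(beta) is the estimated hazard ratio.

```python
import math


def cox_score_info(times, events, x, beta):
    """Score and information of the Cox log partial likelihood
    for one covariate, using Breslow's approximation for ties."""
    score, info = 0.0, 0.0
    for ti, di, xi in zip(times, events, x):
        if not di:
            continue  # censored subjects only contribute to risk sets
        # Risk set: everyone still under observation at time ti.
        risk = [(xj, math.exp(beta * xj)) for tj, xj in zip(times, x) if tj >= ti]
        s0 = sum(w for _, w in risk)
        s1 = sum(xj * w for xj, w in risk)
        s2 = sum(xj * xj * w for xj, w in risk)
        mean = s1 / s0  # risk-weighted mean covariate in the risk set
        score += xi - mean
        info += s2 / s0 - mean ** 2
    return score, info


def fit_cox_1d(times, events, x, tol=1e-10, max_iter=50):
    """Newton-Raphson for the log hazard ratio; no baseline hazard needed."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(times, events, x, beta)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical data: the x = 1 group tends to have events earlier.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
x = [1, 0, 1, 1, 0, 0]
beta = fit_cox_1d(times, events, x)
hazard_ratio = math.exp(beta)
```

The groups must overlap in time (here an x = 0 subject has an event while x = 1 subjects are still at risk). Otherwise the partial likelihood has no finite maximum and the iterations would diverge.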
Note: For many (if not most) variables that capture events (i.e. case vs non-case), the observation time (time at risk) is relevant to consider. Yet this information is not always available, and even when it is, many researchers ignore it and instead use methods suited to binary outcomes, such as logistic regression.
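A small arithmetic sketch in Python shows what is lost when time at risk is ignored. The numbers are hypothetical. Two groups have the same number of cases and persons, so any analysis of case status alone (such as logistic regression) sees identical groups, yet the incidence rates differ fivefold because one group was followed much longer.

```python
def risk_and_rate(cases, persons, person_years):
    # Proportion of cases: ignores how long each person was observed.
    risk = cases / persons
    # Incidence rate: cases per person-year at risk.
    rate = cases / person_years
    return risk, rate


# Hypothetical groups: same cases and persons, but group B was
# followed five times longer than group A.
risk_a, rate_a = risk_and_rate(cases=10, persons=100, person_years=100)
risk_b, rate_b = risk_and_rate(cases=10, persons=100, person_years=500)
print(risk_a, risk_b)  # 0.1 0.1
print(rate_a, rate_b)  # 0.1 0.02
```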
Note: Although time-to-event is a continuous variable, linear regression is seldom a feasible alternative. This is primarily because linear regression cannot account for censoring, but also because time-to-event variables often have a skewed distribution.
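The censoring problem can be demonstrated with simulated data in a short Python sketch. The simulation set-up, the cut-off and the function name are all hypothetical. Event times are drawn from an exponential distribution with mean 10 and everyone still event-free at time 8 is censored. An intercept-only linear regression on the observed times estimates their plain mean, which can never exceed the cut-off. An estimator that uses the censoring indicator (here the exponential maximum-likelihood estimate, total time at risk divided by number of events) recovers the true mean, but only because the simulated data really are exponential.

```python
import random


def censoring_demo(n=10_000, true_mean=10.0, cutoff=8.0, seed=1):
    rng = random.Random(seed)
    # Hypothetical event times, administratively censored at `cutoff`.
    event_times = [rng.expovariate(1 / true_mean) for _ in range(n)]
    observed = [min(t, cutoff) for t in event_times]
    events = [1 if t <= cutoff else 0 for t in event_times]

    # What an intercept-only linear regression on observed times estimates:
    naive_mean = sum(observed) / n
    # Censoring-aware exponential MLE of the mean: total time / events.
    mle_mean = sum(observed) / sum(events)
    return naive_mean, mle_mean


naive, mle = censoring_demo()
print(f"naive mean {naive:.2f}, censoring-aware estimate {mle:.2f} (truth 10)")
```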
Note: In some instances, Poisson regression is a viable alternative to Cox regression, for example when the data are grouped (i.e. aggregated), so that only the number of events and the total time at risk in each group are known.
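With grouped data, each row holds an event count and the total person-time for a group. Poisson regression with log person-time as an offset then models the event rate, assuming the rate is constant within each group. In Stata this corresponds to poisson with the exposure() option. The Python sketch below fits such a model by Newton-Raphson for one binary covariate. The function name and the data are hypothetical. With a single binary covariate, exp(b1) reproduces the ratio of the two crude rates.

```python
import math


def poisson_grouped(cases, person_time, x, tol=1e-12, max_iter=50):
    """Poisson regression log(mu) = log(person_time) + b0 + b1*x.

    Each element is one group (aggregated data), not one individual.
    """
    # Start from the overall crude rate to keep Newton steps small.
    b0, b1 = math.log(sum(cases) / sum(person_time)), 0.0
    for _ in range(max_iter):
        mu = [pt * math.exp(b0 + b1 * xi) for pt, xi in zip(person_time, x)]
        # Score vector and (2x2) information matrix for (b0, b1).
        g0 = sum(d - m for d, m in zip(cases, mu))
        g1 = sum(xi * (d - m) for d, m, xi in zip(cases, mu, x))
        i00 = sum(mu)
        i01 = sum(xi * m for m, xi in zip(mu, x))
        i11 = sum(xi * xi * m for m, xi in zip(mu, x))
        det = i00 * i11 - i01 * i01
        # Newton step: solve I * delta = g for the 2x2 system.
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical aggregated data: events and person-years for two groups.
b0, b1 = poisson_grouped(cases=[10, 20], person_time=[1000, 500], x=[0, 1])
baseline_rate = math.exp(b0)  # about 10/1000 = 0.01 per person-year
rate_ratio = math.exp(b1)     # about (20/500) / (10/1000) = 4
```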