Non-parametric, parametric, and semi-parametric models

Written by:

Ylva B Almquist

It is easy to estimate and graph the survival function and hazard function: we can use non-parametric methods such as the Kaplan-Meier product-limit estimator (see Kaplan-Meier curves).
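As a minimal sketch of the non-parametric approach, the commands below estimate and graph both functions. The example assumes the `drugtr` dataset that ships with Stata and its variables `studytime`, `died`, and `drug`; these are chosen purely for illustration and should be replaced with your own data.

```stata
* Load an example dataset shipped with Stata (requires internet access)
webuse drugtr, clear

* Declare the survival-time structure: time variable and event indicator
stset studytime, failure(died)

* Kaplan-Meier survivor function, listed and graphed by treatment group
sts list, by(drug)
sts graph, by(drug)

* Smoothed estimate of the hazard function, by treatment group
sts graph, hazard by(drug)
```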

Alternatively, we can estimate the survival distribution based on parametric regression models – in this context often referred to as accelerated failure time models (or location-scale models). Within this framework, there are many different types, each of which assumes a different shape for the distribution of survival times (e.g. exponential, Weibull, log-normal, log-logistic, Gompertz, and generalised gamma). While these will not be covered in this guide, you can explore them in Stata if you want to:

More information
help stintreg
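For ordinary right-censored data, a parametric model could be fitted roughly as follows. This is only a sketch under stated assumptions: it uses Stata's `streg` command (the right-censored counterpart of `stintreg`, which handles interval-censored data) and the `drugtr` example dataset with its variables `studytime`, `died`, `drug`, and `age`, none of which come from this guide.

```stata
* Load an example dataset shipped with Stata (requires internet access)
webuse drugtr, clear
stset studytime, failure(died)

* Weibull model in the accelerated failure time metric
streg drug age, distribution(weibull) time
estat ic

* Log-normal model (only available in the accelerated failure time metric)
streg drug age, distribution(lognormal)
estat ic
```

The information criteria reported by `estat ic` give one rough way of comparing how well the different distributional shapes fit the same data.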

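Then we have the proportional hazards model, or simply Cox regression, which is a semi-parametric type of model: the effects of the covariates are estimated as parameters, whereas the baseline hazard is left unspecified. Unlike non-parametric methods, proportional hazards models can adjust for several covariates at the same time, including continuous ones. Additionally, they are more flexible than parametric models, since no particular shape of the survival distribution has to be assumed.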
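A Cox regression might look like the sketch below. It again assumes the `drugtr` example dataset and its variables `studytime`, `died`, `drug`, and `age`, which are used here for illustration only.

```stata
* Load an example dataset shipped with Stata (requires internet access)
webuse drugtr, clear
stset studytime, failure(died)

* Cox regression; results are reported as hazard ratios by default
stcox drug age

* Test of the proportional hazards assumption based on Schoenfeld residuals
estat phtest, detail
```

Note that no distribution is specified anywhere in the `stcox` call, which reflects the unspecified baseline hazard.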

Note
For many (if not most) variables that capture events (i.e. case vs non-case), observation time (time at risk) is relevant to consider. Yet, this information is not always available. Even when it is available, many researchers do not make use of it and instead perform analyses suitable for binary outcomes, e.g. logistic regression.

Note
Although time-to-event is a continuous variable, linear regression is seldom a feasible alternative. This is primarily because linear regression models cannot account for censoring, but also because time-to-event variables often have a skewed distribution.

Note
In some instances, Poisson regression is a viable alternative to Cox regression. This is the case, for example, when data are grouped (i.e. aggregated), so that only the number of events and the total time at risk are known for each group.
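To illustrate the grouped-data case, the sketch below first aggregates individual records into one row per group and then fits a Poisson regression with time at risk as the exposure. The `drugtr` example dataset and its variables `died`, `studytime`, and `drug` are assumptions made for the sake of the example.

```stata
* Load an example dataset shipped with Stata (requires internet access)
webuse drugtr, clear

* Aggregate to one row per treatment group:
* number of deaths and total time at risk
collapse (sum) died studytime, by(drug)

* Poisson regression with time at risk as exposure;
* the irr option displays incidence rate ratios
poisson died drug, exposure(studytime) irr
```

With only two groups, the incidence rate ratio is simply the ratio of the two crude death rates.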