Hurdle regression

Written by:

Ylva B Almquist

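Finally, we would like to make you aware that hurdle regression is a viable alternative to zero-inflated Poisson regression. Like zero-inflated Poisson regression, hurdle regression models the outcome in two steps, but the two approaches treat zeros differently. In zero-inflated Poisson regression, the first step predicts whether an observation is an excess zero, and the count step can still produce zeros of its own. In hurdle regression, the first step predicts whether the outcome is 0 or greater than 0, so all zeros are handled in the first step and the second step models positive counts only.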
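To make the distinction concrete, the two models can be written out for the Poisson case. The notation is our own shorthand rather than anything defined elsewhere in this guide: \pi is the probability of an excess zero in the zero-inflated model, p is the probability of crossing the hurdle, and \lambda is the Poisson mean. In a regression, each of these would depend on the covariates.

```latex
% Zero-inflated Poisson: zeros can come from either step
\[
P(Y = 0) = \pi + (1 - \pi)\, e^{-\lambda}, \qquad
P(Y = y) = (1 - \pi)\, \frac{e^{-\lambda} \lambda^{y}}{y!}, \quad y = 1, 2, \dots
\]

% Poisson-logit hurdle: all zeros come from the first step
\[
P(Y = 0) = 1 - p, \qquad
P(Y = y) = p\, \frac{e^{-\lambda} \lambda^{y}}{y!\,\bigl(1 - e^{-\lambda}\bigr)}, \quad y = 1, 2, \dots
\]
```

In the hurdle model, dividing by 1 - e^{-\lambda} rescales the Poisson probabilities so that they sum to one over the positive counts.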

Assume that we are interested in predicting the number of months an individual has received means-tested social assistance. Receipt of means-tested social assistance is a relatively rare outcome (at least at the population level), so the vast majority of individuals have not received any. Since the data include both recipients and non-recipients, the first model (typically a logistic regression model) predicts whether an individual has received social assistance at all. The second model (typically a count model truncated at zero) predicts the number of months in receipt of social assistance, given that the individual has received benefits (i.e. when the ‘hurdle’ has been crossed). Such an approach thus allows for testing hypotheses about whether different processes govern the occurrence and the continuation of the outcome.
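The two steps described above can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not the estimation routine of any particular package: the data are simulated, there are no covariates (so the logistic step reduces to a simple proportion), and all function names and parameter values are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_months(n, p_receive, lam, seed=1):
    """Simulate months of receipt: 0 for non-recipients, and a
    zero-truncated Poisson count for those who cross the hurdle."""
    rng = random.Random(seed)
    months = []
    for _ in range(n):
        if rng.random() < p_receive:
            count = 0
            while count == 0:  # redraw until positive (zero truncation)
                count = sample_poisson(rng, lam)
            months.append(count)
        else:
            months.append(0)
    return months


def fit_hurdle_intercept_only(months):
    """Fit an intercept-only Poisson hurdle model in two separate steps."""
    positives = [m for m in months if m > 0]

    # Step 1: probability of crossing the hurdle (any receipt at all).
    p_hat = len(positives) / len(months)

    # Step 2: zero-truncated Poisson fitted to recipients only. The maximum
    # likelihood estimate solves lam / (1 - exp(-lam)) = mean of positives.
    # The left-hand side increases in lam, so bisection is sufficient.
    mean_pos = sum(positives) / len(positives)
    lo, hi = 1e-9, mean_pos
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < mean_pos:
            lo = mid
        else:
            hi = mid
    return p_hat, (lo + hi) / 2


# Hypothetical values: 20% ever receive assistance, count parameter 3.
months = simulate_months(n=20000, p_receive=0.2, lam=3.0)
p_hat, lam_hat = fit_hurdle_intercept_only(months)
print(f"P(any receipt) = {p_hat:.3f}, truncated Poisson lambda = {lam_hat:.3f}")
```

Because the two steps use different parts of the data (everyone in step 1, recipients only in step 2), they can be estimated separately. With covariates, step 1 would become a logistic regression and step 2 a zero-truncated count regression.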

Hurdle regression comes in many versions, of which Poisson-logit and negative binomial-logit are two examples. The commands hplogit and hnblogit produce the Poisson-logit and negative binomial-logit versions of the hurdle model, respectively. These commands are not built in and must be installed separately. We will not go through these models here, but if you are interested, we suggest that you install hplogit and hnblogit and then review their help files.