Linear probability modelling

Finally, we would like to make you aware that a viable alternative to the logistic regression model is the linear probability model (LPM). Estimating an LPM means fitting an ordinary linear regression model (following the instructions in Linear regression) to your binary outcome. The coefficients (estimates) from this analysis are then interpreted as mean differences in the outcome, i.e. differences in probabilities. The coefficients can thus be interpreted as risk differences.
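As a minimal sketch of this point (simulated data, hypothetical variable names), the OLS slope from regressing a binary outcome on a binary exposure equals the difference in outcome proportions between the exposed and unexposed groups, i.e. the risk difference:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.integers(0, 2, n)        # binary exposure (0/1)
p = 0.2 + 0.3 * x                # true risk: 0.2 unexposed, 0.5 exposed
y = rng.binomial(1, p)           # binary outcome

# OLS slope via the closed form: cov(x, y) / var(x)
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Risk difference computed directly from the group means
rd = y[x == 1].mean() - y[x == 0].mean()

# With a single binary regressor these two quantities coincide exactly
print(slope, rd)
```

With a binary x this equality is exact; with covariates in the model the coefficient is an adjusted risk difference rather than a simple difference in group means.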

As long as we are interested in estimating and interpreting associations, and have a strong interest in comparing crude (i.e. unadjusted) and adjusted coefficients between models and/or across samples, the LPM has clear advantages over the logistic regression model. Apart from the fact that we do not have to bother with the interpretation of odds ratios, we avoid the potential problem of rescaling bias when performing mediation analysis (see Mediation analysis). We also retain statistical power for interaction analysis (see Interaction analysis).
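The comparability of crude and adjusted coefficients can be sketched as follows (simulated data, hypothetical variable names): because both LPM coefficients are on the probability scale, the gap between them reflects confounding by the covariate, not the rescaling that complicates comparing odds ratios across nested logistic models:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                       # confounder
x = (rng.normal(size=n) + z > 0).astype(float)   # exposure, associated with z
p = np.clip(0.3 + 0.2 * x + 0.1 * z, 0, 1)   # true risk model
y = rng.binomial(1, p)                       # binary outcome

def lpm(y, *cols):
    """Fit a linear probability model by OLS; returns (intercept, slopes...)."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

crude = lpm(y, x)[1]        # unadjusted risk difference for x
adjusted = lpm(y, x, z)[1]  # risk difference for x, adjusted for z

# Both coefficients are probability differences, so they can be
# compared directly: crude exceeds adjusted here purely because of
# confounding by z, not because of any change of scale.
print(crude, adjusted)
```

In a logistic regression, by contrast, part of the change in the odds ratio between the crude and adjusted model would stem from rescaling, making such comparisons harder to interpret.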

A clear disadvantage of the LPM, as highlighted in the introduction of this chapter, is that we might end up with predicted probabilities larger than 1 or smaller than 0. This need not be a problem if the goal of our analysis is – as mentioned above – to examine associations rather than to make predictions. Another disadvantage is that the interpretation of coefficients for continuous x-variables becomes problematic: a straight line approximates the underlying probability curve poorly at the lower and upper ends of the variable's range. Consequently, if we have a strong interest in interpreting such associations, we need to recode the continuous variables into groups and use dummy variables in our regression. An additional disadvantage is that the error term is not normally distributed, but this is really only a problem in small samples.
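Both issues can be illustrated in one short sketch (simulated data, hypothetical variable names): fitting a straight line to an outcome whose true risk follows a steep S-shaped curve in age yields predictions outside [0, 1] at the extremes, while recoding age into bands and using dummy variables gives coefficients that are well-behaved risk differences relative to the reference band:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
age = rng.uniform(20, 80, n)
# True risk rises steeply with age (logistic shape)
p = 1 / (1 + np.exp(-(age - 50) / 5))
y = rng.binomial(1, p)

# LPM with age entered as a continuous variable
b1, b0 = np.polyfit(age, y, 1)           # slope, intercept
pred = b0 + b1 * np.array([20.0, 80.0])  # fitted "probabilities" at the extremes
print(pred)  # falls below 0 at age 20 and above 1 at age 80

# Remedy for interpretation: recode age into bands and use dummy variables
bands = np.digitize(age, [35, 50, 65])   # four age bands: 0, 1, 2, 3
X = np.column_stack(
    [np.ones(n)] + [(bands == g).astype(float) for g in (1, 2, 3)]
)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[1:] are risk differences for each band relative to the youngest band
print(beta)
```

The dummy-variable coefficients trace the nonlinear risk curve band by band, which is exactly the information the single linear slope smooths over.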