Model diagnostics

Before we can trust the results of our linear regression analysis, we need to check that the model does not violate any of the fundamental assumptions of linear regression.

More information
help reg postestimation

Checklist

Continuous and normally distributed outcome: The y-variable has to be continuous. It should also be normally distributed. Check this with a histogram. If it is not normally distributed, you might need to consider an alternative. For example, you can transform your y-variable (e.g. through categorisation or log transformation).
Correct model specification: Your model should be correctly specified. This means that the x-variables included should be meaningful and contribute to the model, and that no important (confounding) variables are omitted. Omitting such variables leads to what is often referred to as omitted variable bias.
No outliers: Outliers are individuals who do not follow the overall pattern of the data. They are sometimes referred to as influential observations, although not all outliers are influential.
Homoscedasticity: The variance around the regression line should be constant across all values of the x-variable(s).
Normality: The residuals of the model should be normally distributed.
Linearity: The effect of x on y should be linear.
No multicollinearity: Multicollinearity may occur when two or more x-variables that are included simultaneously in the model are strongly correlated with one another. Strictly speaking, this does not violate the assumptions, but it does produce larger standard errors, which makes it harder to reject the null hypothesis.
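The first item on the checklist can be sketched in a few commands. This is only an illustration: it assumes the auto dataset that ships with Stata, and the choice of price as the y-variable and the name ln_price for the transformed variable are ours, not part of the guide.

```stata
* Load the example dataset shipped with Stata (assumption: any dataset works)
sysuse auto, clear

* Histogram of the outcome with a normal curve overlaid
histogram price, normal

* If the outcome looks skewed, one option is a log transformation
* (ln_price is a hypothetical variable name)
generate ln_price = ln(price)
histogram ln_price, normal
```

If the transformed variable looks closer to the normal curve, it may be a better candidate for the y-variable, although the coefficients then have to be interpreted on the log scale.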

Types of model diagnostics

Link test: Assess model specification
Residual plot: Check for linearity, homoscedasticity, and outliers
Breusch-Pagan/Cook-Weisberg test: Check for homoscedasticity
Density plot: Check for normality
Normal probability plot: Check for normality
Normal quantile plot: Check for normality
Variance inflation factor: Check for multicollinearity
Correlation matrix: Check for multicollinearity
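As a rough sketch, the diagnostics listed above could be run after a regression along the following lines. The dataset (auto, shipped with Stata), the model (price on mpg and weight) and the variable name resid are assumptions made for illustration only.

```stata
* Example model (assumption: auto dataset, price as y-variable)
sysuse auto, clear
regress price mpg weight

* Residual plot: residuals against fitted values, with a reference line at zero
rvfplot, yline(0)

* Breusch-Pagan/Cook-Weisberg test
estat hettest

* Store the residuals (resid is a hypothetical variable name)
predict resid, residuals

* Density plot of the residuals with a normal curve overlaid
kdensity resid, normal

* Normal probability plot and normal quantile plot
pnorm resid
qnorm resid

* Variance inflation factor
estat vif

* Correlation matrix of the x-variables
correlate mpg weight

* Link test, run last here so that every preceding command
* unambiguously refers to the regression above
linktest
```

Each graph command replaces the previous graph window, so it may be convenient to run the plotting commands one at a time.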