Model diagnostics

Step 9: Model diagnostics

Next, we produce model diagnostics for the multiple linear regression analysis with healthissues as the dependent variable and bullied, gradeyear, sex, bornoutswe, famtype, and numrelocations as independent variables.

Note
Remember that the analyses should only be based on those with a value of 1 for the pop variable.
Note
Remember to use factor variables (the i. prefix) when including categorical or non-binary variables.

Here, we should examine the following:

  • Model specification using a link test.
  • Normality of the residuals using a density plot, a normal probability plot, and a quantile plot.
  • Multicollinearity using the variance inflation factor and a correlation matrix of the coefficient estimates.

Before performing each diagnostic test, we need to specify which model the diagnostics refer to. Therefore, we re-run our full model before each set of diagnostics. We use the "quietly" prefix in our command to suppress the regression output.

* Re-run the full model without displaying the output
quietly reg healthissues i.bullied i.sex i.gradeyear i.bornoutswe i.famtype numrelocations if pop==1
* Link test for model specification
linktest

Here we can see that _hat is statistically significant and _hatsq is not. This looks good and suggests that our model is not incorrectly specified. It should also be noted that our research question concerns associations rather than exact predictions, so it would not necessarily have been a serious problem if the model had failed the link test.
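To see what linktest does behind the scenes, the sketch below computes the linear prediction (_hat) and its square (_hatsq) by hand and refits the outcome on both; the coefficients should match those reported by linktest. It uses Stata's built-in auto dataset and an example model (price on mpg and weight) purely as assumptions for illustration, not the thesis data, and the variable names xb and xbsq are hypothetical.

```stata
* Illustrative sketch on Stata's built-in auto dataset (not the thesis data)
sysuse auto, clear

* Fit an example model and run the built-in link test
quietly regress price mpg weight
linktest

* Replicate the link test by hand on the same estimation sample
quietly regress price mpg weight
predict double xb if e(sample), xb      // linear prediction, like _hat
generate double xbsq = xb^2             // squared prediction, like _hatsq
regress price xb xbsq                   // a significant xbsq hints at misspecification
```

If the squared term adds no explanatory power, the model's linear predictor already captures the relationship well, which is the reasoning behind reading _hatsq in the output.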

quietly reg healthissues i.bullied i.sex i.gradeyear i.bornoutswe i.famtype numrelocations if pop==1
* Save the residuals and compare their density with a normal curve
predict res if pop==1, resid
kdensity res, normal

pnorm res

qnorm res

In the density plot, we can see that the curve of the residuals follows the normal distribution curve relatively well. The probability plot shows a little deviation but is still acceptable, and the quantile plot shows a hint of an S-shape, but not enough to deem it problematic.
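As an optional numeric complement to the plots, summarize with the detail option reports the skewness and kurtosis of the residuals; for a normal distribution these are about 0 and 3, and an S-shape in the quantile plot typically shows up as kurtosis that departs from 3. The sketch assumes Stata's built-in auto dataset, an example model, and a hypothetical residual variable named resid_ex, so it illustrates the idea rather than reproducing the thesis results.

```stata
* Illustrative sketch on Stata's built-in auto dataset (not the thesis data)
sysuse auto, clear
quietly regress price mpg weight

* Residuals from the example model, restricted to the estimation sample
predict double resid_ex if e(sample), residuals

* Skewness near 0 and kurtosis near 3 are consistent with normality
summarize resid_ex, detail
display "skewness = " r(skewness) "   kurtosis = " r(kurtosis)
```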

quietly reg healthissues i.bullied i.sex i.gradeyear i.bornoutswe i.famtype numrelocations if pop==1
* Variance inflation factors for the predictors
estat vif

We want the mean VIF to be below 10 and the individual VIF values to be low. That is the case here. Then we create a correlation matrix of the coefficient estimates.
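The VIF for a predictor equals 1/(1 - R-squared), where R-squared comes from regressing that predictor on all the other predictors in the model. The sketch below checks this by hand for one predictor; it assumes Stata's built-in auto dataset and an example model (price on mpg, weight, and length) for illustration only.

```stata
* Illustrative sketch on Stata's built-in auto dataset (not the thesis data)
sysuse auto, clear
quietly regress price mpg weight length
estat vif

* VIF for weight by hand: regress weight on the other predictors,
* then VIF = 1 / (1 - R-squared)
quietly regress weight mpg length
display "VIF for weight = " 1/(1 - e(r2))
```

The value displayed should agree with the VIF that estat vif reports for weight, since both use the same 74 observations.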

estat vce, corr

We do not want to see any high correlations; preferably, all of them should be below 0.7 in absolute value. That is the case here.
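Note that estat vce, corr shows correlations between the estimated coefficients, not between the variables themselves. For comparison, the sketch below also lists pairwise correlations among the predictors in the estimation sample; it assumes Stata's built-in auto dataset and an example model, so the numbers are illustrative only.

```stata
* Illustrative sketch on Stata's built-in auto dataset (not the thesis data)
sysuse auto, clear
quietly regress price mpg weight length

* Correlations between the coefficient estimates (as in the text)
estat vce, corr

* For comparison: correlations between the predictors themselves,
* restricted to the estimation sample
correlate mpg weight length if e(sample)
```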

Note
We do not examine linearity with a residual plot here, as all our x-variables have discrete values.
Summary
Based on the model diagnostics, we see no potential problems with the assumptions behind linear regression analysis.

In your thesis, the results from your model diagnostics may look different, and you might see potential problems. What to do about them depends a lot on your data and variables, and we suggest that you discuss with your supervisor what actions should be taken regarding your analyses.

Note
For extra review, please revisit the section on Model diagnostics.
Note
When all analyses are completed, remember to save the dataset and do-file!

After performing all of the analyses and model diagnostics, it is time to decide what to include in the results section of our thesis. Let us look at this further in the next section on results.