Model diagnostics

Step 7: Model diagnostics

Here we will investigate whether the assumptions of logistic regression hold in our analysis of the data, using various model diagnostic tests.

Note
Remember that the analyses should only be based on individuals with a value of 1 for the pop variable. 
Note
Remember to use factor variables when including categorical/non-binary variables. 

We will produce model diagnostics based on the multiple logistic regression analysis with conflict_dich, sex, year, age_cat, qual, stress, and health_dich (i.e., model3).

Note
For extra review, please re-visit this section: Model diagnostics.

Here, we should investigate the following: 

  • Model specification using a link test. 
  • Multicollinearity using a correlation matrix. 
  • Model fit using a Hosmer-Lemeshow test and a ROC curve. 
Note
For extra review, please re-visit these sections: Link test, Correlation matrix, The Hosmer and Lemeshow test and ROC curve.

Before performing each test, we need to specify which model the diagnostics should be performed for. Therefore, we run the analysis of our full model before each diagnostic command. We use the “quietly” prefix in our command to suppress the output.
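As an alternative to re-running the model before every test, the results can be stored once and restored when needed. The following is a minimal sketch, assuming that model3 is free to use as the name of the stored estimates; the guide itself re-runs the model each time, and both approaches should give the same diagnostics.

```stata
* Fit the full model once and keep its results under a name (model3 is an assumed name)
quietly logistic health_dich conflict_dich i.sex i.year i.age_cat i.qual i.stress if pop==1
estimates store model3

* Later, make model3 the active model again before running a diagnostic
estimates restore model3
estat vce, corr
```

Restoring is mainly a convenience when the model is slow to fit; for a model of this size, re-running it quietly is just as practical.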

Link test

quietly logistic health_dich conflict_dich i.sex i.year i.age_cat i.qual i.stress if pop==1
linktest

The link test indicates that the model is correctly specified: _hat is statistically significant (p=0.000, i.e. below 0.05), whereas _hatsq is not statistically significant (p=0.120, i.e. above 0.05).
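To see what the link test does, it can be reproduced by hand: the linear predictor from the model and its square are used as the only predictors in a new logistic model. This is a sketch for illustration only; the variable names xb_hat and xb_hatsq are hypothetical helpers that do not exist in the dataset, and the p-values should correspond to those for _hat and _hatsq above.

```stata
* Re-fit the full model so that predict refers to it
quietly logistic health_dich conflict_dich i.sex i.year i.age_cat i.qual i.stress if pop==1

* Linear predictor (log odds), for the estimation sample only
predict double xb_hat if e(sample), xb

* Its square; missing wherever xb_hat is missing
generate double xb_hatsq = xb_hat^2

* Refit the outcome on the linear predictor and its square
logit health_dich xb_hat xb_hatsq

* Remove the helper variables again
drop xb_hat xb_hatsq
```

If the model is correctly specified, the squared term should add nothing beyond the linear predictor itself, which is why a non-significant squared term is the desired result.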

Correlation matrix

quietly logistic health_dich conflict_dich i.sex i.year i.age_cat i.qual i.stress if pop==1
estat vce, corr

The correlation matrix of the coefficient estimates shows only weak correlations, except between the dummy variables for stress. These are not relevant because the dummies reflect the same variable. Thus, there is no indication of a problem with multicollinearity.

Hosmer-Lemeshow test

quietly logistic health_dich conflict_dich i.sex i.year i.age_cat i.qual i.stress if pop==1
estat gof, group(10)
lroc

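The Hosmer-Lemeshow test compares the observed and predicted numbers of cases across groups of predicted probability. A non-significant result (p above 0.05) means that there is no evidence of poor fit, and here the test does not indicate a problem with model fit. The ROC curve shows that the model distinguishes between cases and non-cases quite well (AUC = 0.7425).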
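To look more closely at these two checks, the Hosmer-Lemeshow groups can be listed and the AUC can be reported without drawing the graph. This is a sketch, assuming the conventional choice of 10 groups; the source does not state which number of groups was used.

```stata
* Re-fit the full model so that the diagnostics refer to it
quietly logistic health_dich conflict_dich i.sex i.year i.age_cat i.qual i.stress if pop==1

* Hosmer-Lemeshow test on 10 groups, listing observed and expected counts per group
estat gof, group(10) table

* Area under the ROC curve, without drawing the graph
lroc, nograph
```

The table makes it possible to see in which groups the observed and expected counts differ the most.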

In summary, the model seems to fit the data well. That the ROC curve only shows that the model is pretty good (instead of good or very good) is not a big problem here, because we are investigating associations and not predictive ability.

Summary
Based on model diagnostics, we do not see any potential issues with the assumptions behind logistic regression analysis.

In your thesis, the results from your model diagnostics may look different and you might see potential problems. What to do about this depends a lot on your data and variables, and we suggest that you discuss with your supervisor what actions should be taken regarding your analyses.

Note
When all analyses are completed, remember to save the dataset and do-file!

After performing all of the analyses and model diagnostics, it is time to decide what to include in the results section of our thesis. Let us look at this further in the next section on results.