Multiple linear regression analysis

Written by:

Linnea Eriksson

Step 7: Multiple linear regression analyses

Having completed the simple linear regression analyses, we will now perform multiple linear regression analyses that include all of the variables: bullied, gradeyear, sex, bornoutswe, famtype, numrelocations, and healthissues. These analyses answer the second research question:

Does the relationship persist after accounting for sex, grade level, foreign background, family type, and the number of relocations?

Note
Remember that the analyses should only be based on those with a value of 1 for the pop variable.

Note
Remember to use factor-variable notation (the i. prefix) when including categorical, non-binary variables.

We will start by adding the covariates one at a time, to see step-by-step how the associations differ.

reg healthissues i.bullied i.sex if pop==1

We will also save the estimates from each model, so that we can compare the models once all analyses have been run. The estimates from this model will be called “m2”.

estimates store m2

reg healthissues i.bullied i.sex i.gradeyear if pop==1

The estimates from this model, which adds gradeyear, will be called “m3”.

estimates store m3

reg healthissues i.bullied i.sex i.gradeyear i.bornoutswe if pop==1

The estimates from this model, which adds bornoutswe, will be called “m4”.

estimates store m4

reg healthissues i.bullied i.sex i.gradeyear i.bornoutswe i.famtype if pop==1

The estimates from this model, which adds famtype, will be called “m5”.

estimates store m5

reg healthissues i.bullied i.sex i.gradeyear i.bornoutswe i.famtype numrelocations if pop==1

After running all of the models, we can see that the B-coefficient for bullied changes slightly as each variable is added to the model. For now, we will focus on the multiple regression containing all of the variables.
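As a rough sketch, the fully adjusted model can be written as the equation below. The notation is ours and does not come from the Stata output. It assumes that bullied, sex, and bornoutswe are binary, and that gradeyear and famtype each contribute one indicator per non-reference category (the sums run over those categories); the actual number of indicators depends on how the variables are coded in the data.

```latex
\begin{aligned}
\text{healthissues}_i ={}& \beta_0 + \beta_1\,\text{bullied}_i + \beta_2\,\text{sex}_i
  + \sum_{g} \gamma_g\,\mathbf{1}[\text{gradeyear}_i = g]
  + \beta_3\,\text{bornoutswe}_i \\
&+ \sum_{f} \delta_f\,\mathbf{1}[\text{famtype}_i = f]
  + \beta_4\,\text{numrelocations}_i + \varepsilon_i
\end{aligned}
```

Under these assumptions, \beta_1 corresponds to the B-coefficient for bullied: the expected difference in healthissues between those who were and were not bullied, holding the other covariates constant.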

Looking at the adjusted R-squared value, 18.6% of the variation in health issues is explained by the included variables. We can also see how the coefficients have changed compared to the simple linear regression analysis of bullied and healthissues, now that the model is adjusted for the covariates.

We then save the estimates from this model under the name “m6”.

estimates store m6
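Because a factor variable enters the model as a set of indicator variables, its categories each get their own coefficient and p-value. One possible way to assess such a covariate as a whole is a joint test. The sketch below assumes that the fully adjusted model is still the most recent estimation and that famtype has more than two categories.

```stata
* Sketch: joint test that all famtype indicators are zero in the fully adjusted model.
* Assumes the regression stored as m6 is the most recent estimation.
testparm i.famtype
```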

Then we create an estimates table based on the estimates we saved from the models (m1-m6), where m1 is the simple model of bullied and healthissues.

Note
You can include p-values in the estimates table by adding the option “p” after the comma at the end of the command.

estimates table m1 m2 m3 m4 m5 m6, p

Here we can compare the estimates for the variables in each model. We can note the differences between the models when adding the covariates, both in terms of B-coefficients and p-values.
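If a more compact overview is preferred, the same table can be displayed with rounded coefficients, significance stars, and model-level statistics. This is only a sketch of one possible layout, and it assumes that all six sets of estimates (m1-m6) are stored in the current session.

```stata
* Sketch: coefficients with two decimals and significance stars.
* Note that star cannot be combined with p, so p-values are not printed here.
* stats(N r2_a) adds the number of observations and the adjusted R-squared per model.
estimates table m1 m2 m3 m4 m5 m6, b(%9.2f) star stats(N r2_a)
```

Showing the adjusted R-squared for each model makes it easier to see how much explanatory power each added covariate contributes.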

Summary
The association between exposure to bullying and self-rated health issues is still statistically significant after adjusting for the other variables (p<0.0001). The B-coefficient has decreased slightly from B=8.14 in the simple model (m1) to B=6.52 in the fully adjusted model (m6).
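To express the change in the B-coefficient in relative terms, the two values quoted above can be plugged into a simple calculation. The line below uses Stata as a calculator and only restates the reported numbers; it does not read anything from the stored estimates.

```stata
* Relative decrease (in percent) in the B-coefficient for bullied, m1 versus m6
display %4.1f 100 * (8.14 - 6.52) / 8.14
* → 19.9
```

In other words, adjusting for the covariates reduces the coefficient by roughly a fifth, while the association itself remains.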

Note
During our statistical analysis, it is important to reflect on the covariates (gradeyear, sex, bornoutswe, famtype, numrelocations) we included in the model. We should consider whether it is reasonable to assume that all of them are underlying factors, whether some variables should be excluded, and whether there are other potential variables to include in the model.

Note
For further review of multiple linear regression analysis, please revisit the section on Multiple linear regression.

The next step in our analysis, based on our aim and research questions, is interaction analysis.