Simple linear regression analyses

Step 5: Simple linear regression with the independent variable

Because the outcome variable, in this case healthissues, is continuous, the type of regression analysis that will be used is linear regression. With this analysis we will answer the first research question:

Is there a relationship between exposure to bullying and health problems?

Note
Remember that the analyses should only be based on those with a value of 1 for the pop variable.
Note
Remember to use factor variables when including categorical/non-binary variables.

We perform a simple linear regression analysis for our independent variable both to answer the first research question and to have something to compare the multiple linear regression model to. We start with a simple linear regression analysis with bullied and healthissues.

reg healthissues i.bullied if pop==1

Here we can see that 3.8% of the variation in health issues is explained by exposure to bullying. There is a positive (B=8.14) and statistically significant (p=0.000) relationship between exposure to bullying and self-rated health issues.
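The figures quoted above can also be read from the results that regress leaves in memory. The following is a minimal sketch, assuming that bullied is coded 0/1 so that the coefficient for the exposed group is named 1.bullied; the coeflegend replay shows the names Stata actually uses.

```stata
* Re-run the model from this step
regress healthissues i.bullied if pop==1

* Replay the results with the internal coefficient names
regress, coeflegend

* R-squared expressed as a percentage
display "R-squared (%): " %4.1f 100*e(r2)

* Coefficient for the exposed group
* (the name 1.bullied is an assumption about how bullied is coded)
display "B for bullied: " %5.2f _b[1.bullied]
```

The two displayed values should match the R-squared in the output header and the coefficient in the regression table.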

Summary
There is a positive relationship between exposure to bullying and self-rated health problems. Individuals who had been exposed to bullying in school during the past year have a higher mean value of self-rated health issues than individuals not exposed to bullying.

Thereafter, we save the estimates from the model under the name “m1”. We save these estimates in order to compare the simple linear regression model to the multiple linear regression model later on.

estimates store m1
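To check that the estimates were stored, they can be listed and displayed again. This is an optional sketch; the display format and the choice of statistics are only suggestions.

```stata
* List the estimation results currently stored in memory
estimates dir

* Show the stored model in a compact table with standard errors,
* number of observations and R-squared
estimates table m1, b(%9.2f) se(%9.2f) stats(N r2)
```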

Step 6: Simple linear regression analyses with the covariates

Although the aim of our study did not include investigating the covariates and their own associations with our outcome, it may still be relevant to conduct these analyses. They may help us understand the analytical model and how the covariates relate to the outcome. Therefore, we will explore this in the remaining part of this page. Again, this is just to explore and gain a deeper understanding of our model and its variables. We first perform a simple linear regression analysis with sex and healthissues.

reg healthissues i.sex if pop==1

Here we can see that 15% of the variation in health issues is explained by gender. There is a positive (B=6.7) and statistically significant (p=0.000) relationship between gender and self-rated health issues. This means that girls have a higher mean value of self-rated health issues, compared to boys.
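With a single binary predictor, the coefficient is simply the difference between the two group means. One way to see this, sketched here as an optional check, is to tabulate the mean of healthissues by sex in the same analysis sample.

```stata
* Mean of healthissues and number of observations within each category of sex,
* restricted to the analysis sample
tabstat healthissues if pop==1, by(sex) statistics(mean n)
```

The difference between the two group means should equal the coefficient for sex in the regression table.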

The next step is to perform a simple linear regression analysis with gradeyear and healthissues.

reg healthissues i.gradeyear if pop==1

Here we can see that only 0.1% of the variation in health issues is explained by which grade the individuals are in. There is a positive (B=0.58) and statistically significant (p=0.002) relationship between grade and self-rated health issues. This means that those in the 10th grade have a higher mean value of self-rated health issues, compared to those in the 7th grade. This association is, however, quite weak, and the variance explained by grade is very small.

The next step is to perform a simple linear regression analysis with bornoutswe and healthissues.

reg healthissues i.bornoutswe if pop==1

Here we can see that 0% of the variation in health issues is explained by being born outside of Sweden. There is no statistically significant association between being born outside of Sweden and self-rated health issues (p=0.683).

The next step is to perform a simple linear regression analysis with famtype and healthissues.

reg healthissues i.famtype if pop==1

Here we can see that 0.9% of the variation in health issues is explained by family type. There is a positive and statistically significant (p=0.000) relationship between family type and self-rated health issues. The individuals who alternate between their parents have a slightly higher mean value (B=0.98) of self-rated health issues compared to the reference group, those living with both parents. The individuals who live with one parent have an even higher mean value (B=1.93) of self-rated health issues compared to the reference group.
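Because famtype has more than two categories, each coefficient only compares one category with the reference group. As an optional sketch, the categories can be tested jointly and the predicted mean for each family type can be displayed.

```stata
* Re-run the model from this step
regress healthissues i.famtype if pop==1

* Joint F test that all famtype coefficients are zero
testparm i.famtype

* Predicted mean of healthissues for each family type
margins famtype
```

In a model with famtype as the only predictor, the joint test corresponds to the overall F test shown in the regression output header.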

The next step is to perform a simple linear regression analysis with numrelocations and healthissues.

reg healthissues numrelocations if pop==1

Here we can see that 1% of the variation in health issues is explained by number of relocations. There is a positive (B=0.46) and statistically significant (p=0.000) relationship between number of relocations and self-rated health issues. This means that more relocations are associated with more self-rated health issues.
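The covariate models in this step all follow the same pattern, so they could also be run in a single loop. This is only a sketch of an alternative workflow; it produces the same output as the separate commands above.

```stata
* Run one simple linear regression per covariate.
* Categorical covariates keep the i. prefix; numrelocations is entered as continuous.
foreach x in i.sex i.gradeyear i.bornoutswe i.famtype numrelocations {
    display _newline "Covariate: `x'"
    regress healthissues `x' if pop==1
}
```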

Note
For extra review, please re-visit the section on Simple linear regression.

Now that all simple linear regression models have been performed, the next step is to perform our multiple linear regression model.