For this example, we will use Approach A to conduct an interaction analysis based on linear regression. We want to see if sex (z) moderates the association between grade point average (x) and income (y).
| Dataset | StataData1.dta |
| Variable name | income |
| Variable label | Annual salary income (Age 40, Year 2010) |
| Value labels | N/A |
| Variable name | gpa |
| Variable label | Grade point average (Age 15, Year 1985) |
| Value labels | N/A |
| Variable name | sex |
| Variable label | Sex |
| Value labels | 0=Man 1=Woman |
Define the analytical sample
We start by defining the analytical sample:
gen pop_interact1=1 if income!=. & gpa!=. & sex!=.
Let us have a quick look at the variables:
sum income gpa sex if pop_interact1==1
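As an optional check, the sketch below counts how many observations end up in the analytical sample and gives an overview of missing values in the three variables. It assumes that StataData1.dta is already loaded and that pop_interact1 has been generated as above.

```stata
* Number of observations included in the analytical sample
count if pop_interact1==1

* Overview of missing values in the three variables (full dataset)
misstable summarize income gpa sex
```

If the count is much lower than the total number of observations, the missing-value overview shows which variable accounts for most of the exclusions.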

Simple regression models
First, we will run the simple models, one for gpa and income, and one for sex and income.
reg income gpa if pop_interact1==1

reg income sex if pop_interact1==1

There are statistically significant associations in both simple models. More specifically, the B coefficient for gpa is 37281 (95% CI: 33653 to 40909) and the B coefficient for sex is -76479 (95% CI: -81299 to -71659).
Multiple regression model
Next, we run a model with both independent variables included:
reg income gpa sex if pop_interact1==1

We can note that the B coefficients increase quite a lot (i.e. become further from 0).
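To make the comparison between the simple and the multiple models easier, the estimates can be stored and shown side by side. This is a minimal sketch: the names m_gpa, m_sex and m_both are arbitrary labels chosen for illustration and are not part of the original example.

```stata
* Re-run the three models quietly and store the results
quietly reg income gpa if pop_interact1==1
estimates store m_gpa
quietly reg income sex if pop_interact1==1
estimates store m_sex
quietly reg income gpa sex if pop_interact1==1
estimates store m_both

* B coefficients and standard errors in one table
estimates table m_gpa m_sex m_both, b(%9.0f) se(%9.0f) stats(N r2)
```

Each column corresponds to one model, so the change in the B coefficients for gpa and sex can be read across the rows.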
Multiple regression model with interaction effect
In this step, we will include the interaction term using Approach A (two hashtags, ##, mean that we specify the main effects and the interaction effect at the same time):
reg income c.gpa##i.sex if pop_interact1==1

In the table above, we can see that the estimate for the interaction term has a p-value below 0.05 (displayed as 0.000, i.e. p<0.001). This suggests that there is a statistically significant interaction effect between grade point average and sex on income.
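It may help to write out the model that this command estimates. The notation below is an illustrative sketch: the symbols B_0 to B_3 and e are labels introduced here (following the "B coefficient" wording used above), not output from Stata.

```latex
\begin{align*}
\text{income} &= B_0 + B_1\,\text{gpa} + B_2\,\text{sex} + B_3\,(\text{gpa} \times \text{sex}) + e \\
\text{Men } (\text{sex}=0):\quad \text{income} &= B_0 + B_1\,\text{gpa} + e \\
\text{Women } (\text{sex}=1):\quad \text{income} &= (B_0 + B_2) + (B_1 + B_3)\,\text{gpa} + e
\end{align*}
```

In this notation, B_1 is the slope for gpa among men, and B_3 is the difference in slope between women and men. The p-value for the interaction term tests whether B_3 differs from 0.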
Interpretation
How can we understand this interaction effect that we found? Since our z-variable (sex) is binary, the easiest strategy for gaining more insight would be to do sex-specific analyses of the association between gpa and income.
We start with a model for men, and then continue with the same for women.
reg income gpa if pop_interact1==1 & sex==0

reg income gpa if pop_interact1==1 & sex==1

The sex-specific models show that the slope in income according to grade point average is steeper among men than among women.
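The same sex-specific slopes can also be obtained directly from the interaction model, without splitting the sample. The following is a sketch of one way to do this, using the margins and lincom commands after re-fitting the model.

```stata
* Re-fit the interaction model without displaying the output
quietly reg income c.gpa##i.sex if pop_interact1==1

* Slope for gpa among men (sex=0) and among women (sex=1)
margins, dydx(gpa) at(sex=(0 1))

* Slope among women as a linear combination of the coefficients
lincom gpa + 1.sex#c.gpa
```

The slopes should match the B coefficients for gpa from the two sex-specific models. The standard errors may differ somewhat, since the interaction model estimates one common residual variance for men and women.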
Illustration
In order to illustrate the interaction, we can use the margins command. The first step is to re-run the model. The quietly prefix is placed at the beginning of the command to suppress the output.
quietly reg income c.gpa##i.sex if pop_interact1==1
Then we can produce the margins (the quietly prefix is included here as well):
quietly margins sex, at(gpa=(1 5))
| Note: We specify 1 and 5 here since they represent the lowest and highest values for the variable gpa. |
And then, finally, it is time to produce the marginsplot:
marginsplot
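The plot can be adjusted with standard graph options. The sketch below is one possible variant: the titles are illustrative wording, and the numlist 1(1)5 simply requests predictions at each whole step between the lowest and highest gpa values instead of only the two end points.

```stata
* Re-fit the model and compute predicted income at gpa = 1, 2, 3, 4, 5
quietly reg income c.gpa##i.sex if pop_interact1==1
quietly margins sex, at(gpa=(1(1)5))

* Plot the predictions with custom titles
marginsplot, title("Predicted income by grade point average and sex") ///
    xtitle("Grade point average") ytitle("Predicted annual income")
```

Since the model is linear in gpa, the additional points do not change the lines themselves. They only add confidence intervals at more values of gpa.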

| Summary: There is a positive, statistically significant association between grade point average at age 15 and income at age 40. While this association exists among men and women alike, the slope is steeper among men. |