Regardless of whether you choose Approach A or B, there are two ways to generate the product term or combination variable. Doing it manually, which we call Approach 1, requires that you use gen, and sometimes recode and/or if. Approach 2 does it automatically. We like to do it manually because we feel more in control of what is happening, but doing it automatically is of course easier and faster.
| Note The manual approach creates interaction terms in the dataset, whereas the automatic approach treats interaction terms as virtual (they do not actually exist in the dataset). |
Approach 1: Manual
To illustrate what we mean by a manual approach, we will present two examples. For the first, we create a simple product term whereas, for the second, we create a combination variable.
Product term
Let us assume that we want to see the interaction effect between blood pressure and sex on some outcome. Blood pressure (bp) is a continuous variable whereas sex (sex) is a binary variable (0=Man, 1=Woman). We can simply multiply these terms: x*z
gen bp_sex=bp*sex
In our model, we would thus include the following independent variables: bp, sex, and bp_sex. This is what it would look like if we did a very basic logistic regression analysis:
logistic yvarname bp sex bp_sex
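To make the manual product term concrete, here is a minimal sketch that can be run as a do-file. The data are simulated purely for illustration: the sample size, the seed, and the coefficients used to generate the outcome are our own assumptions and do not come from any real study. Only the variable names (yvarname, bp, sex, bp_sex) follow the text above.

```stata
* Hypothetical simulated data (all values are made up for illustration)
clear
set seed 12345
set obs 500
* Binary sex: 0=Man, 1=Woman
gen sex = runiform() < 0.5
* Continuous blood pressure
gen bp = rnormal(120, 15)
* Binary outcome generated from an assumed logistic model
gen yvarname = runiform() < invlogit(-4 + 0.03*bp + 0.2*sex)

* Manual product term: equals bp for women and 0 for men
gen bp_sex = bp*sex

* Main effects plus the product term
logistic yvarname bp sex bp_sex
```

After running this, bp_sex exists as an ordinary variable in the dataset, so it can be inspected with, for example, summarize bp_sex.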
Combination variable
If one of our independent variables is ordinal or nominal (non-binary), we cannot multiply them. Instead, we have to create combinations of the variables. Let us now assume that we want to see the interaction effect between stress level and sex on some outcome. Stress level (stress) is an ordinal variable with three categories (1=Low, 2=Medium, 3=High), whereas sex (sex) is a binary variable (0=Man, 1=Woman).
In other words, there are six possible combinations. There are many ways that we can use gen, recode, and if to create the combination variable, and this is one of them:
gen stress_sex=.
recode stress_sex (.=1) if stress==1 & sex==0
recode stress_sex (.=2) if stress==2 & sex==0
recode stress_sex (.=3) if stress==3 & sex==0
recode stress_sex (.=4) if stress==1 & sex==1
recode stress_sex (.=5) if stress==2 & sex==1
recode stress_sex (.=6) if stress==3 & sex==1
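We would then include the following independent variables in the model: ib1.stress, sex, and ib1.stress_sex (the choice of reference categories is up to you). Note that because the six categories of stress_sex already capture every combination of stress and sex, Stata will omit some of these terms due to collinearity. This is what it would look like if we did a very basic logistic regression analysis: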
logistic yvarname ib1.stress sex ib1.stress_sex
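Since there are many ways to build the combination variable, here is a sketch of two shorter alternatives to the recode sequence: egen with group(), and plain arithmetic. The data are simulated and hypothetical, and the name stress_sex2 is introduced only for this comparison. Keep in mind that group() numbers only the combinations that actually occur in the data, so it reproduces the 1 to 6 coding above only when all six combinations are present.

```stata
* Hypothetical simulated data (all values are made up for illustration)
clear
set seed 12345
set obs 600
* Binary sex: 0=Man, 1=Woman
gen sex = runiform() < 0.5
* Stress level: 1=Low, 2=Medium, 3=High
gen stress = 1 + floor(3*runiform())

* Alternative 1: number the sex-by-stress combinations in sort order
egen stress_sex = group(sex stress)

* Alternative 2: arithmetic that gives 1-3 for men and 4-6 for women
gen stress_sex2 = stress + 3*sex

* Both codings should agree when all six combinations occur
assert stress_sex == stress_sex2
tabulate stress_sex
```

If either variable has missing values, both alternatives leave the combination variable missing for those observations, which matches the behaviour of the recode sequence.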
Approach 2: Automatic
To illustrate what we mean by automatic, we first need to say more about what factor variables are in Stata.
We have already shown earlier in this guide how factor variables can be used to specify the reference category of categorical (non-binary) variables that we include in regression analysis (also see Factor variables).
However, this is just one application. We can also use factor variables to denote interactions. There are five factor-variable operators that we can use:
| i. | Specify an indicator variable. |
| c. | Specify a continuous variable. |
| o. | Specify omitted levels (categories) of a variable. |
| # | Binary operator to specify an interaction. |
| ## | Binary operator to specify factorial interactions. |
Product term and combination variable
Let us assume that we want to see the interaction effect between blood pressure and sex on some outcome. Blood pressure (bp) is a continuous variable whereas sex (sex) is a binary variable (0=Man, 1=Woman). If we specify the interaction with a binary operator, it looks like this:
c.bp#i.sex
In our model, we would include the following: bp, sex, and c.bp#i.sex. This is what it would look like if we did a very basic logistic regression analysis:
logistic yvarname bp sex c.bp#i.sex
Alternatively, we could have made use of Stata’s factorial interactions:
logistic yvarname c.bp##i.sex
This would produce exactly the same output.
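One way to convince yourself that the two specifications describe the same model is to fit both and compare their log likelihoods. The sketch below does this on simulated, hypothetical data (the seed, sample size, and generating coefficients are our own assumptions). The scalar names ll_hash and ll_fact are introduced only for this example.

```stata
* Hypothetical simulated data (all values are made up for illustration)
clear
set seed 12345
set obs 500
gen sex = runiform() < 0.5
gen bp = rnormal(120, 15)
gen yvarname = runiform() < invlogit(-4 + 0.03*bp + 0.2*sex)

* Main effects written out, interaction specified with #
logistic yvarname bp sex c.bp#i.sex
scalar ll_hash = e(ll)

* Factorial interaction: ## adds the main effects for us
logistic yvarname c.bp##i.sex
scalar ll_fact = e(ll)

* The difference should be zero apart from numerical precision
display "Difference in log likelihood: " ll_hash - ll_fact
```

Note that neither command adds a new variable to the dataset; the interaction term exists only within the estimation command.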
As you have probably figured out already, we do the same if one of our independent variables is ordinal or nominal (non-binary). Let us assume that we want to see the interaction effect between stress level and sex on some outcome. Stress level (stress) is an ordinal variable with three categories (1=Low, 2=Medium, 3=High), whereas sex (sex) is a binary variable (0=Man, 1=Woman). In other words, there are six possible combinations. If we specify the interaction with a binary operator, it looks like this:
i.stress#i.sex
In our model, we would include the following: ib1.stress, sex, and i.stress#i.sex. This is what it would look like if we did a very basic logistic regression analysis:
logistic yvarname ib1.stress sex i.stress#i.sex
And, of course, we could have made use of Stata’s factorial interactions instead:
logistic yvarname i.stress##i.sex
This would produce exactly the same output.
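To see what the two operators stand for, the programming command fvexpand can list the terms that a factor-variable specification expands to. The sketch below uses simulated, hypothetical data and simply displays the expanded lists: the # version should contain only the combinations, whereas the ## version should also contain the main effects of stress and sex.

```stata
* Hypothetical simulated data (all values are made up for illustration)
clear
set seed 12345
set obs 600
gen sex = runiform() < 0.5
gen stress = 1 + floor(3*runiform())

* Terms implied by the interaction operator alone
fvexpand i.stress#i.sex
display "`r(varlist)'"

* Terms implied by the factorial operator (main effects plus interaction)
fvexpand i.stress##i.sex
display "`r(varlist)'"
```

This is only an inspection aid; in everyday use you would pass the factor-variable specification directly to the estimation command, as shown above.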
| Note It is possible also to specify the reference category (base level) of interaction terms. For example: c.bp#ib1.sex or ib2.stress##ib0.sex |
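As a brief sketch of the note above, the example below fits the same factorial model twice on simulated, hypothetical data: once with Stata's default base levels and once with Medium stress (2) and Man (0) set explicitly as reference categories. Changing the base levels changes which comparisons the individual estimates express, but not the overall fit, so the two log likelihoods should agree. The scalar names ll_default and ll_custom are our own.

```stata
* Hypothetical simulated data (all values are made up for illustration)
clear
set seed 12345
set obs 600
gen sex = runiform() < 0.5
gen stress = 1 + floor(3*runiform())
gen yvarname = runiform() < invlogit(-1 + 0.3*stress + 0.2*sex)

* Default base levels (lowest category of each variable)
logistic yvarname i.stress##i.sex
scalar ll_default = e(ll)

* Explicit base levels: stress=2 and sex=0 as reference categories
logistic yvarname ib2.stress##ib0.sex
scalar ll_custom = e(ll)

display "Difference in log likelihood: " ll_default - ll_custom
```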