Two ways of generating the interaction term

Regardless of whether you choose Approach A or B, there are two ways to generate the product term or combination variable. Doing it manually, which we here call Approach 1, requires gen and sometimes recode and/or if. Approach 2 does it automatically. We prefer the manual approach because it gives us more control over what is happening, but the automatic approach is of course easier and faster.

Note
The manual approach creates interaction terms in the dataset, whereas the automatic approach treats interaction terms as virtual (they do not actually exist in the dataset).

Approach 1: Manual

To illustrate what we mean by a manual approach, we present two examples. In the first, we create a simple product term; in the second, we create a combination variable.

Product term

Let us assume that we want to see the interaction effect between blood pressure and sex on some outcome. Blood pressure (bp) is a continuous variable, whereas sex (sex) is a binary variable (0=Man, 1=Woman). We can simply multiply the two variables (x*z in general notation):

gen bp_sex=bp*sex
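To see why the product term captures the interaction, it can help to build it on a handful of made-up observations (the values below are hypothetical, chosen only for illustration). Because sex is coded 0/1, bp_sex equals 0 for every man and equals bp for every woman:

```stata
* Hypothetical toy data: sex coded 0=Man, 1=Woman
clear
input bp sex
120 0
135 0
128 1
142 1
.   1
end

* Product term, as in the text
gen bp_sex = bp*sex

* For men the product term is always 0; for women it equals bp
assert bp_sex == 0 if sex == 0
assert bp_sex == bp if sex == 1

* A missing value in either variable gives a missing product term
list bp sex bp_sex
```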

In our model, we would thus include the following independent variables: bp, sex, and bp_sex. This is what it would look like if we did a very basic logistic regression analysis:

logistic yvarname bp sex bp_sex
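With the product term in the model, the coefficient on bp describes the effect of blood pressure among men (sex==0), while the effect among women is the sum of the bp and bp_sex coefficients. A minimal sketch on simulated data, where the outcome y, the sample size, and all parameter values are hypothetical assumptions, shows how lincom can report both effects as odds ratios:

```stata
* Simulated data for illustration only: y and all parameter values
* below are hypothetical assumptions, not results from any study
clear
set seed 2024
set obs 1000
* sex: 0=Man, 1=Woman
gen sex = runiform() < 0.5
* bp: blood pressure, normally distributed around 130
gen bp  = rnormal(130, 15)
gen y   = runiform() < invlogit(-4 + 0.03*bp - 1*sex + 0.01*bp*sex)

* Manual product term and the model from the text
gen bp_sex = bp*sex
logistic y bp sex bp_sex

* Odds ratio for a one-unit increase in bp among men (sex==0)
lincom bp, or
* Odds ratio for a one-unit increase in bp among women (sex==1)
lincom bp + bp_sex, or
```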

Combination variable

If one of our independent variables is ordinal or nominal (non-binary), we cannot simply multiply the variables. Instead, we have to create combinations of the variables. Let us now assume that we want to see the interaction effect between stress level and sex on some outcome. Stress level (stress) is an ordinal variable with three categories (1=Low, 2=Medium, 3=High), whereas sex (sex) is a binary variable (0=Man, 1=Woman).

In other words, with three stress levels and two sexes, there are six possible combinations. There are many ways to use gen, recode, and if to create the combination variable; this is one of them:

gen stress_sex=.
recode stress_sex (.=1) if stress==1 & sex==0
recode stress_sex (.=2) if stress==2 & sex==0
recode stress_sex (.=3) if stress==3 & sex==0
recode stress_sex (.=4) if stress==1 & sex==1
recode stress_sex (.=5) if stress==2 & sex==1
recode stress_sex (.=6) if stress==3 & sex==1
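The recode commands above spell out each combination explicitly. Assuming stress takes only the values 1, 2, and 3 and sex only 0 and 1 (worth checking with tabulate first), the same codes can be produced in one line, or with egen's group() function. The sketch below uses a hypothetical six-row dataset; the variable name stress_sex_alt and the value-label name stress_sex_lbl are our own illustrative choices:

```stata
* Hypothetical toy data: one observation per stress/sex combination
clear
input stress sex
1 0
2 0
3 0
1 1
2 1
3 1
end

* Arithmetic shortcut: men get codes 1-3 and women get codes 4-6,
* the same numbering as the recode commands in the text
gen stress_sex = stress + 3*sex

* egen alternative: numbers the observed combinations of sex, then stress,
* in sorted order (assumes all six combinations occur in the data)
egen stress_sex_alt = group(sex stress)
assert stress_sex == stress_sex_alt

* Value labels make the six categories readable in regression output
label define stress_sex_lbl 1 "Low, Man" 2 "Medium, Man" 3 "High, Man"
label define stress_sex_lbl 4 "Low, Woman" 5 "Medium, Woman" 6 "High, Woman", add
label values stress_sex stress_sex_lbl

* Cross-check: the cell counts should match the two-way table
tabulate stress_sex
tabulate stress sex
```

Unlike the arithmetic shortcut, the recode approach leaves any unexpected value (for example stress==4) as missing, which can be the safer behavior if the data have not been checked.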

We would then include the following independent variables in the model: ib1.stress, sex, and ib1.stress_sex (the choice of reference categories is up to you). This is what it would look like if we did a very basic logistic regression analysis:

logistic yvarname ib1.stress sex ib1.stress_sex

Approach 2: Automatic

To illustrate what we mean by automatic, we first need to say more about what factor variables are in Stata.

We showed earlier in this guide how factor variables can be used to specify the reference category of categorical (non-binary) variables that we include in regression analysis (also see Factor variables).

However, this is just one application. We can also use factor variables to specify interactions. There are five factor-variable operators that we can use: three unary operators, written as prefixes, and two binary operators:

i.    Specify an indicator variable.
c.    Specify a continuous variable.
o.    Specify omitted levels (categories) of a variable.
#     Binary operator to specify an interaction.
##    Binary operator to specify factorial interactions.
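One way to see what these operators do, and to confirm that factor-variable interactions are virtual, is Stata's fvexpand command, which lists the terms a factor-variable expression expands to without creating any variables. The six-row dataset below is hypothetical:

```stata
* Hypothetical toy data with the variables used in this section
clear
input bp sex stress
120 0 1
135 0 2
150 0 3
118 1 1
132 1 2
149 1 3
end

* fvexpand lists the virtual terms a factor-variable expression produces;
* no new variables are added to the dataset
fvexpand i.stress
display "`r(varlist)'"

fvexpand c.bp#i.sex
display "`r(varlist)'"

fvexpand i.stress##i.sex
display "`r(varlist)'"

* The dataset still holds only the three original variables
describe, short
```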

Product term and combination variable

Let us assume that we want to see the interaction effect between blood pressure and sex on some outcome. Blood pressure (bp) is a continuous variable, whereas sex (sex) is a binary variable (0=Man, 1=Woman). If we specify the interaction with the # operator, it looks like this:

c.bp#i.sex

In our model, we would include the following: bp, sex, and c.bp#i.sex. This is what it would look like if we did a very basic logistic regression analysis:

logistic yvarname bp sex c.bp#i.sex

Alternatively, we could have made use of Stata’s factorial interactions:

logistic yvarname c.bp##i.sex

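This fits the same model and gives the same estimates, because ## automatically adds the main effects of bp and sex; some terms may, however, be labeled differently in the output.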
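To check this equivalence on your own data, you can fit both specifications and compare them. The sketch below does so on simulated data (y, the sample size, and all parameter values are hypothetical assumptions), storing each model and comparing log likelihoods with reldif(), which should be essentially zero when the two specifications describe the same model:

```stata
* Simulated data for illustration only (hypothetical parameter values)
clear
set seed 2024
set obs 1000
gen sex = runiform() < 0.5
gen bp  = rnormal(130, 15)
gen y   = runiform() < invlogit(-4 + 0.03*bp - 1*sex + 0.01*bp*sex)

* Interaction specified with #, main effects typed separately
logistic y bp sex c.bp#i.sex
scalar ll_hash = e(ll)
estimates store m_hash

* Factorial interaction: ## adds the main effects automatically
logistic y c.bp##i.sex
scalar ll_factorial = e(ll)
estimates store m_factorial

* Same model, so the log likelihoods should agree
display reldif(ll_hash, ll_factorial)
* Side-by-side odds ratios (row labels may differ between the two)
estimates table m_hash m_factorial, eform
```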

As you have probably guessed, we do the same if one of our independent variables is ordinal or nominal (non-binary). Let us assume that we want to see the interaction effect between stress level and sex on some outcome. Stress level (stress) is an ordinal variable with three categories (1=Low, 2=Medium, 3=High), whereas sex (sex) is a binary variable (0=Man, 1=Woman). In other words, there are six possible combinations. If we specify the interaction with the # operator, it looks like this:

i.stress#i.sex

In our model, we would include the following: ib1.stress, sex, and i.stress#i.sex. This is what it would look like if we did a very basic logistic regression analysis:

logistic yvarname ib1.stress sex i.stress#i.sex

And, of course, we could have made use of Stata’s factorial interactions instead:

logistic yvarname i.stress##i.sex

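This fits the same model and gives the same estimates, because ## automatically adds the main effects of stress and sex; some terms may, however, be labeled differently in the output.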
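The same kind of check works for the stress example. In this sketch the simulated data, the helper variable xb (the hypothetical linear predictor), and all parameter values are assumptions made only for illustration:

```stata
* Simulated data for illustration only (hypothetical parameter values)
clear
set seed 2024
set obs 1200
gen sex    = runiform() < 0.5
* floor(3*runiform()) is 0, 1, or 2, so stress takes the values 1-3
gen stress = 1 + floor(3*runiform())
gen xb     = -0.5 + 0.3*(stress==2) + 0.6*(stress==3) + 0.2*sex + 0.4*(stress==3)*sex
gen y      = runiform() < invlogit(xb)

* Interaction with #, main effects typed separately
logistic y ib1.stress sex i.stress#i.sex
scalar ll_hash = e(ll)

* Factorial interaction with ##
logistic y i.stress##i.sex
scalar ll_factorial = e(ll)

* Both specifications describe the same model
display reldif(ll_hash, ll_factorial)
```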

Note
It is also possible to specify the reference category (base level) in interaction terms. For example: c.bp#ib1.sex or ib2.stress##ib0.sex
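A short sketch of these base-level specifications on simulated data (the variables and parameter values are hypothetical). As an alternative to typing the ib prefix in every model, fvset base sets a variable's base level for subsequent commands:

```stata
* Simulated data for illustration only (hypothetical parameter values)
clear
set seed 2024
set obs 1200
gen sex    = runiform() < 0.5
gen stress = 1 + floor(3*runiform())
gen bp     = rnormal(130, 15)
gen y      = runiform() < invlogit(-4 + 0.03*bp - 0.2*sex + 0.1*stress)

* Women (sex==1) as the reference category
logistic y c.bp##ib1.sex

* Medium stress (2) and men (0) as the reference categories
logistic y ib2.stress##ib0.sex

* Alternative: set the base level once with fvset, then use plain i.
fvset base 1 sex
logistic y c.bp##i.sex

* Restore the default base level (the lowest value) for later models
fvset clear sex
```

The fvset setting persists for later commands, so clearing it (as in the last line) avoids surprises in subsequent models.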