Regression analysis is of course about data, but it is also about design. The way in which you think your variables are related needs to be translated into an analytical strategy (or modelling strategy). A good way to start is to make a drawing with boxes and arrows: put each variable in its own box, then draw single-headed or double-headed arrows between the boxes to show how the variables are associated with one another. Remember that the analytical strategy should reflect the aim of the study.
| Example Suppose we are interested in the association between children’s cognitive ability and educational attainment in adulthood. Examining this association is thus the aim of the study. We think that this association may be confounded by parents’ educational attainment and mediated by children’s school marks. Moreover, we suspect that the association may look different depending on the child’s gender. The research questions (RQs) can thus be formulated as: RQ1. Is children’s cognitive ability associated with educational attainment in adulthood? RQ2. If so, is this association confounded by parents’ educational attainment? RQ3. To what extent is the association between children’s cognitive ability and educational attainment in adulthood mediated by school marks in childhood? RQ4. Is there any gender difference in the association between children’s cognitive ability and educational attainment in adulthood? |
Accordingly, these are the variables we need to include in our analysis:
| Role | Variable | Measurement scale |
| x | Cognitive ability in childhood | Ratio |
| y | Educational attainment in adulthood | Ordinal |
| z/confounder | Parents’ educational attainment | Ordinal |
| z/mediator | School marks in childhood | Ratio |
| z/moderator | Child’s gender | Nominal (binary) |
And this is how we may choose to illustrate our analytical strategy:

Often, we want to break down our analysis into different steps, or models. The analysis as a whole should answer our research questions.
Note that there is no “perfect” way of setting up models. It is often a matter of academic tradition and taste. Some prefer to add variables (confounders, mediators) stepwise, so that each subsequent model becomes more complex than the last. Others prefer to run a series of separate models and then finish with a “full” model.
We have a few pieces of advice:
- Always present an unadjusted analysis of your main association as well (i.e. a simple regression).
- Remember that confounders and mediators play different roles: we are supposed to get rid of the confounding, whereas the mediation could tell us something about possible explanations. In other words, make sure not to mix these up in the analysis (or in the interpretation and discussion of the results).
- Moderators are a different kind of animal, and are therefore treated and presented in a slightly different way in comparison to confounders and mediators.
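The stepwise approach can be written down as a plan before any model is fitted. The following is a minimal sketch, not a prescribed procedure: the variable names (`edu_adult`, `cognitive`, `parent_edu`, `school_marks`, `gender`) are hypothetical stand-ins for the variables in the example, and the `:` is borrowed from formula-style notation to mark a product term.

```python
# Hypothetical variable names; replace them with the columns in your own data.
OUTCOME = "edu_adult"          # y: educational attainment in adulthood
EXPOSURE = "cognitive"         # x: cognitive ability in childhood
CONFOUNDERS = ["parent_edu"]   # z: parents' educational attainment
MEDIATORS = ["school_marks"]   # z: school marks in childhood
MODERATORS = ["gender"]        # z: child's gender


def build_model_plan():
    """Return the stepwise models as {model name: right-hand-side terms}."""
    plan = {"Unadjusted": [EXPOSURE]}
    plan["Model 1"] = plan["Unadjusted"] + CONFOUNDERS
    plan["Model 2"] = plan["Model 1"] + MEDIATORS
    # The moderator enters last, together with its product term with x.
    products = [f"{EXPOSURE}:{m}" for m in MODERATORS]
    plan["Model 3"] = plan["Model 2"] + MODERATORS + products
    return plan


for name, terms in build_model_plan().items():
    print(f"{name}: {OUTCOME} ~ " + " + ".join(terms))
```

Each model contains all the terms of the previous one, which makes it easy to see what changed between two steps.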
Unadjusted model
First, start with a simple regression analysis of your main association:

We would also encourage you to do the same for your other variables:

| Note If we had had several confounders, mediators, and/or moderators, each of these would also have generated its own simple regression model. |
Model 1
We continue with multiple regression analysis, by focusing on our main association (x and y) and adding the confounding variable to the model.

Here, we are interested in whether the estimate(s) for the association between x and y change when the confounder is added. Do they become weaker (compared to the simple model)?
| Note In cases where you have several confounders, you can choose to enter them stepwise one at a time, a few at a time, or all at once. Just remember that if you enter more than one at a time, and you do see a change in the estimate for the association between x and y, you need to check which confounder(s) might be causing this change. |
Model 2
The next step is to add the mediator.

Again, we are interested in whether the estimate(s) for the association between x and y change when the mediator is added. Do they become weaker (compared to the simple model and to Model 1)?
| Note As for cases where you have several mediators: you can choose to enter them stepwise one at a time, a few at a time, or all at once. Just remember that if you enter more than one at a time, and you do see a change in the estimate for the association between x and y, you need to check which mediator(s) might be causing this change. |
| Note Remember that this kind of mediation approach might be criticised if you do a non-linear (e.g. logistic, ordinal, multinomial, Cox) regression analysis. See Mediation analysis for an alternative approach to mediation analysis. |
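The comparison of the x coefficient before and after adding the mediator can be sketched as follows. Everything here is an assumption made for illustration: the data are made up (built so that y = 0.2·x + 2·z + 0.6·m exactly), the small `ols` helper is a hand-written least-squares solver rather than a library function, and the model is linear, so the caveat about non-linear models above does not arise.

```python
def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *values] for values in zip(*columns)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Made-up toy data (illustration only).
x = [2, 4, 5, 7, 8, 10]                  # exposure (cognitive ability, rescaled)
z = [1, 1, 2, 2, 3, 3]                   # confounder (parents' education)
m = [12, 10, 13.5, 14.5, 12, 16]         # mediator (school marks)
y = [0.2 * xi + 2 * zi + 0.6 * mi for xi, zi, mi in zip(x, z, m)]

b_model1 = ols([x, z], y)[1]             # x coefficient in y ~ x + z
b_model2 = ols([x, z, m], y)[1]          # x coefficient in y ~ x + z + m
attenuation = 100 * (b_model1 - b_model2) / b_model1

print(f"Model 1: b = {b_model1:.3f}")
print(f"Model 2: b = {b_model2:.3f} ({attenuation:.0f}% weaker than Model 1)")
```

By construction, the x coefficient is 0.5 in Model 1 and 0.2 in Model 2, so the estimate is attenuated by 60% once the mediator is included.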
Model 3
And the final step is to add the moderator. As mentioned earlier, this is more complicated, and we will save the details for Interaction analysis. For now, we will just specify it as follows:
