Confounding variables

A confounder is a variable (z) that influences both the x-variable and the y-variable. It can therefore make it look as if x and y are related when the association is, partly or wholly, due to z. Put differently, the confounder distorts the analysis. Suppose we find that people who consume a lot of coffee (x) have an increased risk of lung cancer (y). A probable confounder is cigarette smoking (z): smokers drink more coffee and have a greater risk of lung cancer.
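The coffee example can be made concrete with a small sketch. The counts below are invented purely for illustration (they are not real data), and the names `counts`, `risk` and `risk_ratio` are hypothetical helpers. The counts are set up so that coffee has no effect on lung cancer within either smoking group, yet the crude comparison suggests a strong association.

```python
# Hypothetical counts, invented for illustration only (not real data).
# Key: (smoker, high_coffee) -> (number of people, number with lung cancer)
counts = {
    (True, True): (400, 80),    # smokers, high coffee: 20% risk
    (True, False): (100, 20),   # smokers, low coffee: 20% risk
    (False, True): (100, 2),    # non-smokers, high coffee: 2% risk
    (False, False): (400, 8),   # non-smokers, low coffee: 2% risk
}


def risk(groups):
    """Risk of lung cancer across the given (people, cases) groups."""
    people = sum(n for n, _ in groups)
    cases = sum(c for _, c in groups)
    return cases / people


def risk_ratio(smoker_values):
    """Risk ratio (high vs. low coffee) among the listed smoking groups."""
    high = risk([counts[(s, True)] for s in smoker_values])
    low = risk([counts[(s, False)] for s in smoker_values])
    return high / low


crude_rr = risk_ratio([True, False])   # ignores smoking
rr_smokers = risk_ratio([True])        # smokers only
rr_nonsmokers = risk_ratio([False])    # non-smokers only

print(f"Crude risk ratio:      {crude_rr:.2f}")       # 2.93
print(f"Among smokers:         {rr_smokers:.2f}")     # 1.00
print(f"Among non-smokers:     {rr_nonsmokers:.2f}")  # 1.00
```

In this constructed example, the whole crude association comes from smokers being over-represented among heavy coffee drinkers. Real data will rarely be this clean.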

We should always consider confounding, both when we conduct our own research and when we review others’ research.

Address confounding by study design

If you are about to collect your own data, there are many ways to design a study to reduce potential confounding (see Section 3.1). The most obvious solution might be to do an experimental study (e.g. a randomised controlled trial; RCT), because random assignment makes the exposure independent of potential confounders on average. Experimental studies are, however, not always feasible, and most of the time we do observational studies (e.g. cohort studies or case-control studies). In that case, it is necessary to review the scientific literature and then make sure to collect data on all potential confounders.
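Why randomisation helps can be illustrated with a rough simulation. This is a sketch under stated assumptions, not a study design recipe: the probabilities (50% smokers, 80% versus 20% chance of high coffee intake) are invented, and `smoker_share_gap` is a hypothetical helper. It compares how unevenly the confounder (smoking) is spread across exposure groups when people choose their own coffee intake and when intake is assigned by a coin flip.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible
N = 20000       # simulated people per scenario (assumed)


def smoker_share_gap(assign_coffee):
    """Share of smokers among high-coffee minus share among low-coffee people."""
    smokers = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for _ in range(N):
        smoker = random.random() < 0.5          # assumed 50% smokers
        high_coffee = assign_coffee(smoker)
        totals[high_coffee] += 1
        smokers[high_coffee] += smoker          # True counts as 1
    return smokers[True] / totals[True] - smokers[False] / totals[False]


# Observational: smokers choose high coffee more often (assumed 80% vs. 20%).
observational_gap = smoker_share_gap(
    lambda smoker: random.random() < (0.8 if smoker else 0.2)
)

# Randomised: a coin flip decides coffee intake, regardless of smoking.
randomised_gap = smoker_share_gap(lambda smoker: random.random() < 0.5)

print(f"Gap in smoking prevalence, observational: {observational_gap:.2f}")
print(f"Gap in smoking prevalence, randomised:    {randomised_gap:.2f}")
```

Under these assumptions, the observational gap should be close to 0.6, while the randomised gap should be close to zero. Smoking can then no longer explain a difference in outcome between the coffee groups.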

Address confounding in statistical analysis

Usually, we work with data that have already been collected, perhaps for purposes other than the one we are interested in. At this stage, you can use multiple regression analysis with adjustment for confounding, and also try stratified analysis and interaction analysis (see Mediation analysis). Adjust for confounding as well as the available data allow.
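As a minimal sketch of what adjustment does, the block below applies a stratified analysis to invented counts for the coffee and smoking example from the start of this section. It pools the stratum-specific estimates with the Mantel-Haenszel risk ratio, which is one common approach and not necessarily the one used elsewhere in this guide. The counts and the function names `crude_rr` and `mantel_haenszel_rr` are hypothetical.

```python
# Hypothetical counts, invented for illustration only (not real data).
# Each stratum of the confounder (smoking) holds:
# (exposed cases, exposed total, unexposed cases, unexposed total)
# where "exposed" means high coffee intake and "cases" means lung cancer.
strata = {
    "smokers": (80, 400, 20, 100),
    "non-smokers": (2, 100, 8, 400),
}


def crude_rr(strata):
    """Risk ratio after collapsing all strata, i.e. ignoring the confounder."""
    a = sum(s[0] for s in strata.values())
    n1 = sum(s[1] for s in strata.values())
    c = sum(s[2] for s in strata.values())
    n0 = sum(s[3] for s in strata.values())
    return (a / n1) / (c / n0)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel risk ratio pooled across the confounder's strata."""
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n0 in strata.values():
        total = n1 + n0
        numerator += a * n0 / total
        denominator += c * n1 / total
    return numerator / denominator


unadjusted = crude_rr(strata)
adjusted = mantel_haenszel_rr(strata)

print(f"Unadjusted risk ratio:          {unadjusted:.2f}")  # 2.93
print(f"Adjusted for smoking (M-H):     {adjusted:.2f}")    # 1.00
```

A large shift between the unadjusted and the adjusted estimate, as in this constructed example, is a typical sign that the stratifying variable was confounding the association.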

Address confounding with specific methods

In addition, there are specific statistical methods that can be used to handle confounding, such as propensity score matching. These methods are not covered in this guide, but if you are interested, we recommend that you explore them further.

More information
help teffects psmatch