X and y

If you read about a variable being “independent”, an “exposure”, or a “predictor”, what does that mean? Basically, it means that someone thinks that this variable has a (statistical) effect on another variable. For the sake of simplicity, let us just call this type of variable “x”. The other variable, the one that x is assumed to affect, is called the “dependent” variable or the “outcome”. Again, to make it simpler, we can call it “y”.

Examples
Smoking (x) -> Lung cancer (y)
Unemployment (x) -> Low income (y)
Yoga lessons (x) -> Lower stress levels (y)

The examples presented above may suggest that it is easy to know which variable is x and which is y, but this is not always the case. Sometimes the situation is more complex. As an example, let us take the association between health and educational attainment: does lower educational attainment (x) lead to worse health (y), or does poor health (x) result in lower educational attainment (y)? These kinds of issues are sometimes discussed in terms of “direction of causality” (again, see A note on causal inference for a more thorough discussion about causality). In cases like these, you need to think about which direction is more reasonable: what do previous literature and theory say about the association? Preferably, we would want to design the study in a way that solves the issue of directionality, for instance by measuring x at an earlier point in time than y.
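
To see why the data alone cannot tell us which variable is x and which is y, here is a minimal sketch in Python. The numbers are made up purely for illustration (they are not real data), and the helper functions `pearson_r` and `slope` are our own, written from scratch rather than taken from a library. The point is only that the correlation is the same whichever way around we put the variables, and that a regression "works" in both directions.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def slope(x, y):
    """Least-squares slope when y is regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov / var_x


# Hypothetical, made-up values for eight people (illustration only).
education_years = [9, 10, 12, 12, 14, 16, 17, 18]
health_score = [52, 55, 60, 58, 66, 70, 73, 75]

# The correlation does not care which variable we call x and which y.
r_education_first = pearson_r(education_years, health_score)
r_health_first = pearson_r(health_score, education_years)
print("correlation, education first:", round(r_education_first, 3))
print("correlation, health first:   ", round(r_health_first, 3))

# A regression can be fitted in either direction; both slopes are
# non-zero, so neither fit tells us which direction is the causal one.
print("health regressed on education:", round(slope(education_years, health_score), 3))
print("education regressed on health:", round(slope(health_score, education_years), 3))
```

Both directions give an association of the same strength, so the choice of which variable to treat as x has to come from theory, previous literature, or the study design, not from the numbers themselves.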