Scatterplot

Quick facts

Number of variables
Two

Scales of variable(s)
Continuous

When we had two categorical variables, we could produce a crosstable to see how these two variables were related. If we have two continuous variables, we may use something called a scatterplot instead. Each dot in the scatterplot represents one individual in our data. We may also include a reference line here, to see if we have a pattern in our data (this will be discussed later).

The scatterplot can thus be used to illustrate how two continuous variables co-vary – or “correlate” – in their pattern of values. If increasing values of one variable correspond to increasing values of another variable, it is called a positive correlation. If increasing values of one variable correspond to decreasing values of another variable, we have a negative correlation. In the graph below, different types of correlation are presented. The letter “x” stands for x-axis (horizontal axis) and the letter “y” stands for y-axis (vertical axis).

Note
While not addressed here, patterns can of course also be non-linear (in contrast to the positive and negative correlations shown in the graphs above).
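
The direction of a correlation can also be checked numerically before plotting. The lines below are a minimal sketch that assumes the auto dataset shipped with Stata, which is not the dataset used in this guide; mpg, weight and length are variables in that example dataset. Heavier cars tend to have lower mileage, so the first pair should show a negative coefficient, while weight and length should show a positive one.

```stata
* Load an example dataset that ships with Stata (assumption: used for illustration only)
sysuse auto, clear

* Correlation coefficient for a pair expected to be negatively related
correlate mpg weight

* Correlation coefficient for a pair expected to be positively related
correlate weight length

* The same two pairs shown as scatterplots
graph twoway scatter mpg weight
graph twoway scatter weight length
```

The sign of the coefficient reported by correlate matches the direction of the cloud of dots in the corresponding scatterplot.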

Function

Basic command
graph twoway scatter yvar xvar
Useful options
graph twoway (scatter yvar xvar) (lfit yvar xvar)
graph twoway (scatter yvar xvar) (lfitci yvar xvar)
Explanations
yvar: Insert the name of the first variable you want to use. This variable will be chosen for the y-axis (vertical axis).
xvar: Insert the name of the second variable you want to use. This variable will be chosen for the x-axis (horizontal axis).
lfit: Fit a regression line.
lfitci: Fit a regression line and include confidence intervals.
More information
help scatter
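
To try the commands above without any data of your own, the following sketch uses the auto dataset that ships with Stata. This dataset and its variables mpg and weight are assumptions chosen for illustration; they are not part of this guide's data.

```stata
* Load an example dataset that ships with Stata (assumption: used for illustration only)
sysuse auto, clear

* Basic scatterplot: mpg on the y-axis, weight on the x-axis
graph twoway scatter mpg weight

* Scatterplot with a fitted regression line
graph twoway (scatter mpg weight) (lfit mpg weight)

* Scatterplot with a fitted regression line and its confidence interval
graph twoway (scatter mpg weight) (lfitci mpg weight)
```

Note that the y-axis variable is always listed first and the x-axis variable second, both in scatter and in lfit/lfitci.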

Practical example

Dataset
StataData1.dta
Variable name: gpa
Variable label: Grade point average (Age 15, Year 1985)
Value labels: N/A

Variable name: cognitive
Variable label: Cognitive test score (Age 15, Year 1985)
Value labels: N/A

graph twoway (scatter gpa cognitive) (lfitci gpa cognitive)

In the scatterplot above, we display gpa on the y-axis (vertical axis) and cognitive on the x-axis (horizontal axis). We can see a quite clear positive correlation here: the higher the cognitive test scores, the higher the grade point average. This is also illustrated by the fitted regression line.
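
The visual impression can be supplemented with a correlation coefficient and clearer axis titles. The lines below are a sketch that assumes StataData1.dta is located in the current working directory; the axis titles are shortened versions of the variable labels. Here lfitci is listed before scatter so that the confidence band is drawn underneath the dots rather than on top of them.

```stata
* Open the dataset (assumption: the file is in the current working directory)
use "StataData1.dta", clear

* Correlation coefficient between grade point average and cognitive test score
correlate gpa cognitive

* Scatterplot with fitted line and confidence interval, custom axis titles, no legend
graph twoway (lfitci gpa cognitive) (scatter gpa cognitive), ///
    ytitle("Grade point average") xtitle("Cognitive test score") legend(off)
```

The `///` line continuation works in a do-file; when typing directly in the Command window, write the graph command on a single line instead.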

Note
You can use the Graph Editor (see Graphs) to further edit the scatterplot.