Defining an analytical sample in Stata is easy. There are several ways to do it; below, we describe our preferred approach.
First, determine exactly which variables are included in the analysis (i.e. all variables you use, not all variables in the dataset). They should already have been properly examined (i.e. reviewed and checked with some initial descriptive statistics) and recoded the way you want them.
In the example below, we have chosen four variables that we want to include in our study.
Practical example
Dataset: StataData1.dta

| Variable name | Variable label | Value labels |
| --- | --- | --- |
| sex | Sex | 0=Man 1=Woman |
| bullied | Exposure to bullying (Age 15, Year 1985) | 0=No 1=Yes |
| gpa | Grade point average (Age 15, Year 1985) | N/A |
| cognitive | Cognitive test score (Age 15, Year 1985) | N/A |
sum sex bullied gpa cognitive

The output shows that all variables except sex have missing values, although the number of missing values differs between them.
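If you want a more direct look at the missing values than the observation counts from sum, a sketch along the following lines may help. It uses a small made-up dataset as a stand-in for StataData1.dta (the values are purely illustrative and not taken from the real data), together with Stata's misstable command and missing() function.

```stata
* Toy data standing in for StataData1.dta (values are made up for illustration)
clear
input sex bullied gpa cognitive
0 0 3.2 21
1 1 .   18
1 . 2.8 25
0 0 4.1 .
1 0 3.6 30
end

* Number of missing values per variable
* (variables without any missing values are left out of the table)
misstable summarize sex bullied gpa cognitive

* Number of individuals with a missing value on at least one of the four variables
count if missing(sex, bullied, gpa, cognitive)
```

In this toy example, three of the five individuals lack information on at least one variable, so only two would remain in the analytical sample.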
The first step is to create a “pop” variable – “pop” stands for population – with the gen command (see P-values).
gen pop=1 if sex!=. & bullied!=. & gpa!=. & cognitive!=.
With this command, the new variable pop is assigned the value 1 for every individual who has complete information on all four variables. Individuals with a missing value on at least one of the variables get a missing value (.) on pop. Let us check what it looks like:
tab pop
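One possible variation, shown here only as a sketch, is to build a 0/1 indicator instead of a 1/missing variable, so that a tabulation also shows how many individuals fall outside the sample. The variable name pop01 and the toy data are assumptions made for illustration; missing() is a built-in Stata function that returns 1 if any of its arguments is missing.

```stata
* Toy data standing in for StataData1.dta (values are made up for illustration)
clear
input sex bullied gpa cognitive
0 0 3.2 21
1 1 .   18
1 . 2.8 25
0 0 4.1 .
1 0 3.6 30
end

* pop01 (hypothetical name): 1 = complete on all four variables,
*                            0 = missing on at least one of them
gen pop01 = !missing(sex, bullied, gpa, cognitive)

* Shows both groups: inside (1) and outside (0) the analytical sample
tab pop01
```

Whichever coding you choose, it is safest to write the condition out in full (if pop==1). Stata treats missing values as true in a bare condition such as if pop, so a 1/missing variable used that way would not exclude anyone.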

We can then restrict any command to the analytical sample by adding if pop==1. For example:
tab sex if pop==1
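The same condition can be added to other commands, so that all descriptive statistics are based on the same individuals. The sketch below again relies on made-up toy data in place of StataData1.dta. It also shows one alternative, which is an assumption about workflow rather than part of the approach described here: temporarily dropping everyone outside the sample with preserve, keep and restore.

```stata
* Toy data standing in for StataData1.dta (values are made up for illustration)
clear
input sex bullied gpa cognitive
0 0 3.2 21
1 1 .   18
1 . 2.8 25
0 0 4.1 .
1 0 3.6 30
end

gen pop=1 if sex!=. & bullied!=. & gpa!=. & cognitive!=.

* Every command restricted with pop==1 uses the same individuals
sum gpa cognitive if pop==1
tab bullied sex if pop==1

* Alternative: temporarily keep only the analytical sample
preserve
keep if pop==1
sum gpa cognitive
restore

* After restore, the full dataset is back in memory
count
```

Using if pop==1 leaves the data untouched, which makes it easy to compare the analytical sample with those who were excluded. The preserve/keep/restore route saves typing when many commands in a row should use the restricted sample.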

| Note: You do not have to call this variable “pop”; choose any name you like. |