The “pop” variable

Defining an analytical sample in Stata is easy. There are several ways to apply it; below, we describe our favourite approach.

You first need to determine exactly which variables are included in the analysis (i.e. all variables you use, not all variables in the dataset). They should have been properly examined (i.e. reviewed and checked with some initial descriptive statistics) and recoded as you want them.

In the example below, we have chosen four variables that we want to include in our study.

Practical example

Dataset

StataData1.dta

Variable name	sex
Variable label	Sex
Value labels	0=Man 1=Woman

Variable name	bullied
Variable label	Exposure to bullying (Age 15, Year 1985)
Value labels	0=No 1=Yes

Variable name	gpa
Variable label	Grade point average (Age 15, Year 1985)
Value labels	N/A

Variable name	cognitive
Variable label	Cognitive test score (Age 15, Year 1985)
Value labels	N/A

sum sex bullied gpa cognitive

The output shows that all variables except sex have missing values, in differing amounts.
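If you prefer to see the number of missing values directly, rather than inferring it from the observation counts reported by sum, one option is the misstable command. This is only a sketch and assumes a Stata version that includes misstable (Stata 11 or later):

```stata
* Report the number of missing values per variable.
* By default, variables without any missing values (here: sex) are not listed.
misstable summarize sex bullied gpa cognitive
```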

The first step is to create a “pop” variable – “pop” stands for population – with the gen command (see P-values).

gen pop=1 if sex!=. & bullied!=. & gpa!=. & cognitive!=.

This specifies that the new variable pop is assigned the value 1 if none of the four variables has missing information; for all other individuals, pop is left missing. Let us check what it looks like:

tab pop
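Because pop is missing (rather than 0) for individuals who lack information on any of the four variables, and tab leaves out missing values by default, the table above only shows the individuals with pop equal to 1. As a small sketch, the excluded individuals can be displayed by adding the missing option:

```stata
* Show the excluded individuals (pop is missing) as a separate row
tab pop, missing

* Count the individuals in the analytical sample
count if pop==1
```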

We can then apply the pop variable to anything we like, using if. For example:

tab sex if pop==1
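The same condition can be added to other commands. The lines below are only illustrations using the four variables from this example; the choice of commands and of the regression model is ours and not a recommendation for the analysis:

```stata
* Descriptive statistics for the analytical sample only
sum gpa cognitive if pop==1

* Cross-tabulation restricted to the analytical sample
tab bullied sex if pop==1

* A regression restricted to the analytical sample (illustrative model)
regress gpa bullied sex cognitive if pop==1
```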

Note
Of course, you do not have to call this variable “pop” – choose any name you like.
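As an alternative sketch, the same kind of variable can be created with the missing() function, which returns 1 if any of the listed variables is missing. The name pop2 is hypothetical and used here only so that it does not clash with the pop variable created earlier. Unlike the version above, pop2 is coded 0 (not missing) for excluded individuals, and missing() also catches extended missing values such as .a, which the condition !=. does not.

```stata
* pop2 (hypothetical name) = 1 if none of the four variables is missing, 0 otherwise
gen pop2 = !missing(sex, bullied, gpa, cognitive)

* Compare the two versions; the missing option also shows where pop is missing
tab pop pop2, missing
```

If the data contain no extended missing values, the two variables identify the same individuals, and if pop2==1 can be used in the same way as if pop==1.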