The “pop” variable

It is easy to define an analytical sample in Stata. However, there are several ways to apply it; below, we describe our favourite approach.

You first need to determine exactly which variables are included in the analysis (i.e. all variables you use, not all variables in the dataset). These variables should already have been examined properly (i.e. reviewed and checked with some initial descriptive statistics) and recoded as you want them.

In the example below, we have chosen four variables that we want to include in our study.

Practical example

Dataset: StataData1.dta

Variable name: sex
Variable label: Sex
Value labels: 0=Man, 1=Woman

Variable name: bullied
Variable label: Exposure to bullying (Age 15, Year 1985)
Value labels: 0=No, 1=Yes

Variable name: gpa
Variable label: Grade point average (Age 15, Year 1985)
Value labels: N/A

Variable name: cognitive
Variable label: Cognitive test score (Age 15, Year 1985)
Value labels: N/A

sum sex bullied gpa cognitive

Comparing the number of observations (Obs) for each variable, we can see that all variables except sex have missing values, in different amounts.
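To see exactly how many observations are missing on each variable, and which variables tend to be missing together, Stata's misstable command can complement sum. The sketch below builds a small toy dataset so that it runs on its own; the values are hypothetical and only stand in for StataData1.dta.

```stata
* Hypothetical toy data standing in for StataData1.dta (values are made up)
clear
input sex bullied gpa cognitive
0 0 3.2 101
1 1 .   95
0 .  2.8 98
1 0 3.9 110
0 1 3.1 .
1 0 3.5 104
end

* Obs column: sex has 6 observations, the other three variables have 5 each
summarize sex bullied gpa cognitive

* Missing values per variable (by default, variables without missing values are not listed)
misstable summarize sex bullied gpa cognitive

* Which combinations of variables are missing together
misstable patterns sex bullied gpa cognitive
```

In this toy example, each of bullied, gpa and cognitive is missing for a different observation, so only three of the six observations are complete.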

The first step is to create a variable called “pop” (short for population) using the gen command (see P-values).

gen pop=1 if sex!=. & bullied!=. & gpa!=. & cognitive!=.

This command assigns the new variable pop the value 1 for every observation with no missing values on any of the four variables; all other observations get a missing value (.) on pop. Let us check what it looks like:

tab pop
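One caveat is worth knowing: in Stata, x!=. only excludes the system missing value (.). Extended missing values such as .a to .z are distinct values larger than ., so they pass the test x!=.; if your data use extended missing codes, an observation could end up in pop even though one of its values is missing. The missing() function catches all of them. The sketch below uses hypothetical toy data in which cognitive is coded .a for one observation. It compares the two definitions, adds the missing option to tab so that the excluded observations also appear in the table, and shows Stata's mark and markout commands as an alternative that creates a 0/1 marker. The names pop2 and touse are our own choices.

```stata
* Hypothetical toy data; observation 5 has an extended missing value (.a)
clear
input sex bullied gpa cognitive
0 0 3.2 101
1 1 .   95
0 .  2.8 98
1 0 3.9 110
0 1 3.1 .a
1 0 3.5 104
end

* Original definition: != . does not catch .a, so observation 5 gets pop=1
gen pop=1 if sex!=. & bullied!=. & gpa!=. & cognitive!=.

* missing() treats . and .a-.z alike, so observation 5 is excluded here
gen pop2=1 if !missing(sex, bullied, gpa, cognitive)

* The missing option also lists observations outside the sample (shown as .)
tab pop, missing
tab pop2, missing

* Alternative: a 0/1 marker, set to 0 where any listed variable is missing
mark touse
markout touse sex bullied gpa cognitive
tab touse
```

Here pop contains four observations but pop2 and touse only three. With a 0/1 marker such as touse, you restrict commands with if touse==1, just as with pop.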

We can then restrict any command to the analytical sample by adding if pop==1. For example:

tab sex if pop==1
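The advantage of pop is that every command then uses the same observations. The sketch below, again with hypothetical toy data, shows that summarize gpa on its own uses every observation with a valid GPA, whereas adding if pop==1 restricts it to the analytical sample. The same if pop==1 qualifier can be appended to estimation commands such as regress.

```stata
* Hypothetical toy data (values are made up)
clear
input sex bullied gpa cognitive
0 0 3.2 101
1 1 .   95
0 .  2.8 98
1 0 3.9 110
0 1 3.1 .
1 0 3.5 104
end

gen pop=1 if sex!=. & bullied!=. & gpa!=. & cognitive!=.

* Without the restriction: all 5 observations with a non-missing gpa
summarize gpa

* With the restriction: only the 3 observations in the analytical sample
summarize gpa if pop==1

* A cross-tabulation based on the same 3 observations
tab bullied sex if pop==1
```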

Note
Of course, you do not have to call this variable “pop”; any valid variable name will do.