Variable structure

We will now use some syntax to create the variables, as specified below:

First variable
Name: id
Label: ID number
Type: int
Format: %8.0g
Value labels: N/A

Second variable
Name: sex
Label: What is your biological sex?
Type: int
Format: %8.0g
Value labels: 0=Man, 1=Woman

Third variable
Name: srh
Label: How would you rate your health?
Type: int
Format: %8.0g
Value labels: 1=Poor, 2=Good, 3=Excellent

Fourth variable
Name: income
Label: What is your annual income?
Type: int
Format: %9.0g
Value labels: N/A

Fifth variable
Name: comment
Label: Do you have any comments on the survey?
Type: strL
Format: %9s
Value labels: N/A

Step 1. Generate variables and specify type

As a first step, we generate the variables and specify their type. We also need to tell Stata what the values of the new variables should be. For the numeric variables, we use missing (denoted by “.”).

gen int id=.
gen int sex=.
gen int srh=.
gen int income=.
gen strL comment=""
Note
For the string variable, missing values are specified differently: we use a pair of double quotes with nothing between them (an empty string).
More information
help generate
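
The specification also lists a display format for each variable, which the commands above do not set explicitly. As a minimal sketch (the three-observation dataset is only a placeholder), generate assigns a default format based on the storage type, %8.0g for int and %9s for strL, so income needs a separate format command to get %9.0g. Note also that int can only store whole numbers up to 32,740; if incomes may exceed that, a larger type such as long would be needed, which is an assumption about the data rather than part of the specification.

```stata
* Placeholder dataset with three empty observations (assumption for illustration)
clear
set obs 3
gen int income = .
gen strL comment = ""

* Inspect storage type and default display format
describe income comment

* Set the display format listed in the specification
format income %9.0g
describe income
```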

Step 2. Add labels

We can now add labels for the variables:

label variable id "ID number"
label variable sex "What is your biological sex?"
label variable srh "How would you rate your health?"
label variable income "What is your annual income?"
label variable comment "Do you have any comments on the survey?"
More information
help label

Step 3. Specify value labels

The next step is to add value labels for the categorical variables. We do this by first defining a set of value labels for each variable:

label define sex 0 "Man" 1 "Woman"
label define srh 1 "Poor" 2 "Good" 3 "Excellent"
Note
For simplicity, each set of value labels has the same name as its corresponding variable. If we instead had a whole set of variables that all shared the response options “No” and “Yes”, we could create just one set of value labels and use it for all of those variables.
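
To illustrate the shared-label case, here is a minimal sketch. The variables smoker and drinker and the label set yesno are hypothetical names, not part of the dataset specified above. The label values command used at the end is the one covered in Step 4.

```stata
* Hypothetical example: two yes/no variables sharing one set of value labels
clear
set obs 3
gen int smoker = .
gen int drinker = .

* Define the set once ...
label define yesno 0 "No" 1 "Yes"

* ... and attach it to both variables in a single command
label values smoker drinker yesno
```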

Step 4. Apply value labels

Now it is time to apply our sets of value labels to the right variables:

label values sex sex
label values srh srh
Note
After label values, the first argument is the variable name and the second is the name of the set of value labels.
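
One way to confirm that the labels were attached is to list the label set and tabulate the variable. The sketch below uses three made-up observations purely for illustration.

```stata
* Made-up data for illustration only
clear
input int sex
0
1
1
end

label define sex 0 "Man" 1 "Woman"
label values sex sex

* Show the contents of the label set, then the labelled frequencies
label list sex
tabulate sex
```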

Adjust labels

Do you suddenly realize that you need to adjust your value labels? You can change them by adding a comma and then the option replace to the command:

label define sex 0 "Man" 1 "Woman", replace
label define srh 1 "Poor" 2 "Good" 3 "Excellent", replace
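
If only one label needs to change, redefining the whole set is not strictly necessary. As a sketch, label define also accepts the option modify, which changes or adds individual labels and leaves the rest untouched. The new wording “Fair” is a made-up example.

```stata
* Start from the label set defined in Step 3
clear
label define srh 1 "Poor" 2 "Good" 3 "Excellent"

* Change only the label for value 2; values 1 and 3 are left as they are
label define srh 2 "Fair", modify
label list srh
```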

You can also delete value labels by writing the following:

label drop sex
label drop srh
More information
help label
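
Note that label drop removes the label definition, while the variable keeps a reference to the name of the now-missing set. A minimal sketch of detaching the set from the variable first, using a period in place of the set name (the three empty observations are a placeholder):

```stata
* Placeholder setup mirroring the steps above
clear
set obs 3
gen int sex = .
label define sex 0 "Man" 1 "Woman"
label values sex sex

* Detach the label set from the variable; the definition itself is kept
label values sex .

* Remove the definition as well
label drop sex
```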