An example of a short questionnaire is presented below.

Before we can actually code the questionnaire responses, we need to create the variable structure in Stata.
The questionnaire shown above contains five variables in total:
- ID number
- What is your biological sex?
- How would you rate your health?
- What is your annual income?
- Do you have any comments on the survey?
In Stata, each of these variables is specified by the following attributes:
- Name
- Label
- Type
- Format
- Value Label
Name
This is the name that you choose for a variable.
- Make it short, clear, and logical.
- Avoid any spaces or special symbols.
- Underscores can be useful.
- Use lower-case letters. This is highly recommended because Stata variable names are case-sensitive: "income" and "Income" would be two different variables.
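A minimal sketch of applying these naming rules, assuming a variable arrived with a mixed-case name (for example from an imported file). The names AnnualIncome and annual_income are hypothetical; rename changes only the variable's name, not its data.

```stata
* Start from an empty dataset with one observation
clear
set obs 1

* Hypothetical variable with a mixed-case name
generate AnnualIncome = .

* Rename it: short, lower case, no spaces, underscore between words
rename AnnualIncome annual_income

* Names are case-sensitive, so the old mixed-case name no longer exists
describe
```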
Label
This is a more elaborate description of your variable. If the variable is drawn from a questionnaire, it is practical to use the question itself as the label.
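As a sketch, the label is attached with label variable. The variable name health is a hypothetical choice; the label text is the questionnaire item.

```stata
clear
set obs 1
generate health = .

* Use the questionnaire item as the variable label
label variable health "How would you rate your health?"

* describe shows the name together with its label
describe health
```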
Type
There are two different types of variables in Stata: numeric and string.
Numeric variables
Numeric variables can hold only numbers. Such variables are the basis of quantitative research, which is why categorical variables are usually “translated” into numeric variables by assigning a number to each category.
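One way to do this translation in Stata is the encode command. It assigns the numbers 1, 2, 3 and so on to the distinct string values in alphabetical order and keeps the original text as a value label. A minimal sketch with hypothetical responses and hypothetical variable names (sex_str, sex):

```stata
clear
* Hypothetical string responses
input str6 sex_str
"female"
"male"
"female"
end

* New numeric variable: female -> 1, male -> 2 (alphabetical order)
encode sex_str, generate(sex)

* sex is numeric, but list displays its text through the value label
list sex_str sex
```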
Numbers are stored as byte, int, long, float, or double. Among these, byte, int, and long can hold only integers (i.e. whole numbers) whereas float and double can handle decimals. The default storage type when you create a new variable in Stata is float.
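A sketch of choosing a storage type when creating variables with generate. The variable names and type choices below are illustrative assumptions for the questionnaire, not a prescribed setup; the last line relies on the default type still being float (it can be changed with set type).

```stata
clear
set obs 1

* Category codes are small whole numbers, so byte is enough
generate byte sex = .
generate byte health = .

* An ID number is a whole number; int holds larger integers than byte
generate int id = .

* No type given: Stata uses the default storage type (float)
generate income = .

describe
```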
String variables
String variables can hold any text (letters, digits, and other characters) but are more difficult to analyse. Therefore, they are often processed (“quantified”) in ways that make it possible to use them in statistical analysis.
Strings are stored as str#, for instance str1, str2, str3, …, or as strL. The # sign indicates the maximum length of the string, i.e. how many characters the variable can store (strictly, how many bytes; plain letters and digits take one byte each). For example, a str2 can hold the word “no”, but not the word “yes”. A strL can hold strings of up to about 2 billion bytes.
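A minimal sketch of the two string storage types. The variable names answer and comments are hypothetical; a strL is one reasonable choice for free-text survey comments whose length is not known in advance.

```stata
clear
set obs 1

* str3 holds up to 3 bytes: enough for "yes" (and "no")
generate str3 answer = "yes"

* strL for long free-text answers such as survey comments
generate strL comments = ""

describe
```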
Note: If you are worried about the size of your data files, it is worth reading up on the different storage types. If not, just keep in mind the difference between numeric and string variables, and make sure you know whether your variable holds integers (whole numbers) or not (has decimals).
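If file size is a concern, Stata's compress command can pick smaller storage types for you: it demotes each variable to the smallest type that still holds all of its values exactly. A sketch with a hypothetical id variable:

```stata
clear
set obs 50

* id = 1, 2, ..., 50 is created as float (the default type)
generate id = _n

* compress demotes variables to the smallest type that still holds their values
compress

describe id
```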
Format
The format determines how a variable's values are displayed, not how they are stored. Its default depends on the variable's type; the storage types and their default formats are:
- byte %8.0g
- int %8.0g
- long %12.0g
- float %9.0g
- double %10.0g
- str# %#s
- strL %9s
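A sketch of reading and changing a display format with the format command. The variable names and the chosen %12.2fc format (12 characters wide, 2 decimals, comma separators) are illustrative assumptions; changing the format does not change the stored values.

```stata
clear
set obs 1
generate byte sex = .
generate double income = .

* Default formats follow the storage type (byte -> %8.0g, double -> %10.0g)
describe sex income

* Display income 12 wide, with 2 decimals and comma separators
format income %12.2fc
```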
Value label
This is where you attach text labels to the numeric codes of a categorical variable, i.e. which number stands for which answer category. Value labels are therefore useful only for categorical variables, not continuous ones.
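A minimal sketch of defining and attaching value labels. The coding of the health question (1 = Good, 2 = Fair, 3 = Poor) and the label name health_lbl are hypothetical, since the questionnaire's answer options are not reproduced here.

```stata
clear
set obs 3
* Hypothetical codes 1, 2, 3 for the health question
generate byte health = _n

* Define the label set once, then attach it to the variable
label define health_lbl 1 "Good" 2 "Fair" 3 "Poor"
label values health health_lbl

* list shows the labels; with nolabel it shows the stored codes
list health
list health, nolabel
```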