The standard score – or the z-score – is very useful when we have continuous (ratio/interval) variables with different normal distributions (see Distributions for more information about distributions).
For example, if we have one variable called income (measured as annual household income in Swedish crowns) and another variable called years of schooling (measured as the total number of years spent in the educational system), these variables obviously have very different distributions.
Suppose we want to compare which one of them, income or years of schooling, has the larger statistical effect on some outcome. Because the two variables are measured on completely different scales, such a comparison is not meaningful as it stands. The solution is to standardize (i.e. calculate z-scores for) these two variables so that they become comparable.
Z-scores are expressed in terms of standard deviations from the mean.
What we do is take a variable and “rescale” it so that it has a mean of 0 and a standard deviation of 1.
Each individual’s value on the standardized variable then indicates how far that individual is from the mean of the original (unstandardized) variable, expressed in number of standard deviations.
A value of 1.5 would thus suggest that this individual has a value that is 1½ standard deviations above the mean, whereas a value of -2 would suggest that this individual has a value that is 2 standard deviations below the mean.
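Expressed as a formula, the z-score is the individual's value minus the mean of the variable, divided by the standard deviation of the variable:

z = (value - mean) / standard deviation

For example, with a variable that has a mean of 20 and a standard deviation of 4 (made-up numbers, purely for illustration), an individual with the value 26 gets the z-score (26 - 20) / 4 = 1.5, i.e. 1½ standard deviations above the mean.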
Function
Basic command
egen newvarname=std(oldvarname)
Explanations
newvarname    Insert the name of the new variable.
oldvarname    Insert the name of the old variable.
std           Standardize: the egen function that rescales the variable to z-scores (mean 0, standard deviation 1).
More information: help egen
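To connect the command to the introductory example, the lines below sketch how income and years of schooling could be standardized and then compared in a regression. The variable names income, schooling and outcome are made up for illustration and do not exist in the example dataset; replace them with the names in your own data.

* Standardize the two (hypothetical) variables
egen z_income=std(income)
egen z_schooling=std(schooling)
* Regress a (hypothetical) outcome on the standardized variables to compare their effects
regress outcome z_income z_schooling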
Practical example
Dataset: StataData1.dta

Variable name: gpa
Variable label: Grade point average (Age 15, Year 1985)
Value labels: N/A

Variable name: cognitive
Variable label: Cognitive test score (Age 15, Year 1985)
Value labels: N/A
egen z_gpa=std(gpa)
egen z_cognitive=std(cognitive)
Now you have new versions – containing z-scores – of the two variables.
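If you want to check the result, you can use the summarize command: the new variables should have a mean of (approximately) 0 and a standard deviation of 1.

summarize z_gpa z_cognitive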