Standardization: z-scores

Written by:

Ylva B Almquist

The standard score (or z-score) is very useful when we have continuous (ratio/interval) variables that follow normal distributions with different means and standard deviations (see Distributions for more information about distributions).

For example, if we have one variable called income (measured as annual household income in Swedish crowns) and another variable called years of schooling (measured as the total number of years spent in the educational system), these variables obviously have very different distributions.

Suppose we want to compare which one, income or years of schooling, has a larger statistical effect on our outcome. That is not possible with the variables in their original units, since a one-unit difference means one crown for income but one year for schooling. The solution is to standardize (i.e. calculate z-scores for) these two variables so that they are comparable.

Z-scores are expressed in terms of standard deviations from the mean.

We take a variable and “rescale” it so that it has a mean of 0 and a standard deviation of 1.

Each individual’s value on the standardized variable indicates how far that individual lies from the mean of the original (unstandardized) variable, measured in number of standard deviations.

A value of 1.5 would thus suggest that this individual has a value that is 1½ standard deviations above the mean, whereas a value of -2 would suggest that this individual has a value that is 2 standard deviations below the mean.
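
The rescaling described above can be written as a formula. In the sketch below, x_i is an individual's value on the original variable, \bar{x} is the sample mean and s is the sample standard deviation. These symbols, and the numbers in the worked example (a mean of 3.2, a standard deviation of 0.8 and an individual value of 4.4), are assumptions made up for illustration and do not come from any dataset.

```latex
% z-score of individual i on a variable x
z_i = \frac{x_i - \bar{x}}{s}

% Worked example with made-up numbers: mean 3.2, standard deviation 0.8, value 4.4
z = \frac{4.4 - 3.2}{0.8} = 1.5
```

In this made-up case the individual lies 1½ standard deviations above the mean.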

Function

Basic command

egen newvarname=std(oldvarname)

Explanations
`newvarname`	Insert the name of the new variable.
`oldvarname`	Insert the name of the old variable.
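`std`	The egen function that creates the standardized (z-score) version of the variable, with mean 0 and standard deviation 1.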

More information
help egen

Practical example

Dataset

StataData1.dta

Variable name	gpa
Variable label	Grade point average (Age 15, Year 1985)
Value labels	N/A

Variable name	cognitive
Variable label	Cognitive test score (Age 15, Year 1985)
Value labels	N/A

egen z_gpa=std(gpa)

egen z_cognitive=std(cognitive)

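Now you have new versions, containing z-scores, of the two variables. To check them, compare the original and the standardized variables: each z-score version should have a mean of (approximately) 0 and a standard deviation of 1.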

sum gpa z_gpa cognitive z_cognitive

codebook gpa z_gpa cognitive z_cognitive, compact
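
As a cross-check, the same z-scores can be computed by hand from the mean and standard deviation that summarize stores in r(mean) and r(sd). This is only a minimal sketch, assuming z_gpa has already been created as above; the variable name z_gpa_manual is hypothetical and chosen for illustration.

```stata
* Store the mean and standard deviation of gpa in r(mean) and r(sd)
quietly summarize gpa

* Manual z-score: distance from the mean in standard deviation units
* (z_gpa_manual is a hypothetical name used only for this check)
generate z_gpa_manual = (gpa - r(mean)) / r(sd)

* The egen version and the manual version should agree up to rounding error
summarize z_gpa z_gpa_manual
correlate z_gpa z_gpa_manual
```

If the two versions match, the correlation should be 1 and both should show a mean close to 0 and a standard deviation of 1.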