Recode string variables

Written by:

Ylva B Almquist

Recoding string variables follows the same principle as recoding numeric variables. However, you use the replace command instead of recode.

It is preferable to generate a copy of the old variable before you start replacing values (or expressions, which is the term used below).

Note
In this example, we are taking a sneak peek at if (which is described in more detail in Condition the data with if).

Function

Basic command

replace varname="exp2" if varname=="exp1"

Explanations
`varname`	Insert the name of the variable that you want to recode.
`exp1`	Specify the value/expression that you want to change.
`exp2`	Specify the new value/expression that should replace `exp1`.

More information
help replace
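
To make the basic command concrete, here is a minimal sketch on made-up data. The variable names colour and colour2 and their values are hypothetical and are not part of StataData1.dta; the point is only to show how replace combined with if changes one value and leaves the others alone.

```stata
* Hypothetical toy data: "colour" and its values are invented for illustration
clear
input str6 colour
"R"
"G"
"R"
end

* Work on a copy so that the original variable stays intact
generate colour2 = colour

* Change every "R" in the copy to "Red"; all other values are left untouched
replace colour2 = "Red" if colour2 == "R"

* Compare the original and the recoded copy side by side
list colour colour2
```

Note the single equals sign when assigning the new value and the double equals sign when testing for the old one.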

Practical example

Dataset

StataData1.dta

Variable name	marstat30
Variable label	Marital status (Age 30, Year 2000)
Value labels	N/A

First, let us have a look at this variable.

describe marstat30

tab marstat30

We can see that marstat30 is a string variable with four values specified (D, M, UM, and W). As it happens, we know that D=Divorced, M=Married, UM=Unmarried, and W=Widowed. This is what we want to change the values to.

We start by creating a copy of the variable.

gen marstat30num=marstat30

Now we can recode the copy.

replace marstat30num="Divorced" if marstat30num=="D"

replace marstat30num="Married" if marstat30num=="M"

replace marstat30num="Unmarried" if marstat30num=="UM"

replace marstat30num="Widowed" if marstat30num=="W"

Let us see what the variable looks like.

tab marstat30num
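
One way to double-check the recoding, assuming the commands above have been run, is to cross-tabulate the original variable against the copy. If the recoding worked as intended, each old value should line up with exactly one new value.

```stata
* Cross-tabulate the original variable against the recoded copy
* The missing option also displays observations with empty strings, if there are any
tab marstat30 marstat30num, missing
```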

That said, it would be even easier to use encode to transform marstat30 into a numeric variable while retaining the original values as value labels.
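
As a rough sketch of that alternative, assuming StataData1.dta is loaded: encode creates a new numeric variable and attaches the original strings as value labels, with codes assigned in alphabetical order of the strings. The new variable names marstat30enc and marstat30enc2 are hypothetical and chosen only for this illustration.

```stata
* Create a numeric variable from the original string variable
* marstat30enc is a hypothetical name for the new variable
encode marstat30, generate(marstat30enc)

* The new variable is numeric; the strings D, M, UM and W are now value labels
describe marstat30enc
tab marstat30enc

* Show the underlying numeric codes instead of the labels
tab marstat30enc, nolabel

* Encoding the recoded copy instead gives the full words as value labels
* (this assumes marstat30num was created and recoded as shown earlier)
encode marstat30num, generate(marstat30enc2)
tab marstat30enc2
```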