Recode string variables

Recoding string variables follows the same principle as recoding numeric variables. However, recode only works on numeric variables, so for string variables you use the replace command instead.

It is preferable to generate a copy of the old variable before you start replacing values (or expressions, which is the term used below), so that the original variable stays intact.

Note
In this example, we are taking a sneak peek at if (which is described in more detail in Condition the data with if).

Function

Basic command
replace varname="exp2" if varname=="exp1"
Explanations
varname: Insert the name of the variable that you want to recode.
exp1: Specify the value/expression that you want to change.
exp2: Specify the value/expression that you want to change to.
More information
help replace
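
To make the basic command concrete, here is a minimal sketch on a made-up dataset. The variable name city and its values are hypothetical and are not part of StataData1.dta; the point is only to show how replace combined with if changes one expression and leaves the rest alone.

```stata
* Minimal sketch on a made-up dataset; "city" and its values are hypothetical
clear
input str3 city
"STO"
"GBG"
"STO"
end

* Copy the original so that it stays untouched
generate city_full = city

* Change the expression "STO" to "Stockholm" in the copy only
replace city_full = "Stockholm" if city_full == "STO"

* Compare the original and the recoded copy side by side
list city city_full
```

Rows that do not match the if condition (here "GBG") keep their old value. Stata also widens the string storage type of the copy automatically when the new expression is longer than the old one.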

Practical example

Dataset
StataData1.dta
Variable name: marstat30
Variable label: Marital status (Age 30, Year 2000)
Value labels: N/A

First, let us have a look at this variable.

describe marstat30

tab marstat30

We can see that marstat30 is a string variable with four values specified (D, M, UM, and W). As it happens, we know that D=Divorced, M=Married, UM=Unmarried, and W=Widowed. This is what we want to change the values to.

We start by creating a copy of the variable.

gen marstat30num=marstat30

Now we can recode the copy.

replace marstat30num="Divorced" if marstat30num=="D"
replace marstat30num="Married" if marstat30num=="M"
replace marstat30num="Unmarried" if marstat30num=="UM"
replace marstat30num="Widowed" if marstat30num=="W"
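
One way to check that every value was recoded as intended is to cross-tabulate the original against the copy and then assert that only the expected values remain. The sketch below uses four made-up rows as a stand-in for marstat30 so that it runs on its own; with the real StataData1.dta you would skip the input block and run only the last two commands.

```stata
* Stand-in data: four made-up rows mimicking marstat30 (not the real StataData1.dta)
clear
input str2 marstat30
"D"
"M"
"UM"
"W"
end

* Same copy-and-recode steps as in the practical example
generate marstat30num = marstat30
replace marstat30num = "Divorced"  if marstat30num == "D"
replace marstat30num = "Married"   if marstat30num == "M"
replace marstat30num = "Unmarried" if marstat30num == "UM"
replace marstat30num = "Widowed"   if marstat30num == "W"

* Each old code should line up with exactly one new value
tabulate marstat30 marstat30num, missing

* Stops with an error if any non-empty value was left unrecoded
assert inlist(marstat30num, "Divorced", "Married", "Unmarried", "Widowed") | marstat30num == ""
```

The empty-string clause in the assert allows for missing values, which the real variable may contain.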

Let us see what the variable looks like.

tab marstat30num

Note that it would be even easier to use encode, which transforms a string variable such as marstat30 into a numeric variable while retaining the original string values as value labels.
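
As a rough sketch of that alternative, the example below applies encode to a recoded string variable. The rows are made-up stand-in data, and the new variable name marstat30enc is an assumption chosen for illustration.

```stata
* Stand-in data: made-up rows mimicking the recoded copy (not the real StataData1.dta)
clear
input str9 marstat30num
"Divorced"
"Married"
"Unmarried"
"Widowed"
"Married"
end

* Create a numeric variable; the strings become its value labels
* (marstat30enc is a hypothetical name)
encode marstat30num, generate(marstat30enc)

* With labels, then with the underlying numeric codes
tabulate marstat30enc
tabulate marstat30enc, nolabel

* Show which code belongs to which label
label list marstat30enc
```

encode assigns the codes in alphabetical order of the strings, starting at 1, and by default names the value label after the new variable.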