Date variables

Date variables: do not get us started. This is a science in itself! Nonetheless, being able to generate date variables is very useful if you later want to perform time-to-event analysis (survival analysis).

In this example, we will use three variables that specify year, month, and day, respectively, and combine them into a nicely formatted date variable.

Note
This requires that you have performed the practical example in Substring first.

Practical example

Dataset: StataData1.dta

Variable name: cvd_year_str
Variable label: Year of out-patient care due to CVD (Ages 41-50, Years 2011-2020)
Value labels: N/A

Variable name: cvd_month_str
Variable label: Month of out-patient care due to CVD (Ages 41-50, Years 2011-2020)
Value labels: N/A

Variable name: cvd_day_str
Variable label: Day of out-patient care due to CVD (Ages 41-50, Years 2011-2020)
Value labels: N/A

All three are string variables. To make things smoother, we will transform them into numeric variables using the function real(). Note that real() returns a missing value for any string it cannot read as a number.

gen cvd_year=real(cvd_year_str)
gen cvd_month=real(cvd_month_str)
gen cvd_day=real(cvd_day_str)

Just to double-check that everything worked out:

sum cvd_year cvd_month cvd_day
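
As a complementary check, the sketch below counts observations where the original string was non-empty but the conversion still produced a missing value. It only uses the variables created above and assumes that an empty string is the only legitimate reason for a missing number; a count above zero would suggest stray characters in the strings.

```stata
* real() returns missing (.) when a string cannot be read as a number.
* Count observations where the string was non-empty but conversion failed.
count if missing(cvd_year) & !missing(cvd_year_str)
count if missing(cvd_month) & !missing(cvd_month_str)
count if missing(cvd_day) & !missing(cvd_day_str)
```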

The next step is to generate the date variable.

gen cvd_date=mdy(cvd_month,cvd_day,cvd_year)
Note
mdy() is a function, not an option: it takes its arguments in the order month, day, year. The result is a special Stata date variable, stored as the number of days since 1 January 1960.
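
To get a feel for what mdy() returns, it can help to try it on a few fixed dates with display. The dates below are arbitrary illustrations and are not taken from the dataset.

```stata
* Stata dates count days from 01jan1960, which is stored as 0.
display mdy(1, 1, 1960)
display mdy(1, 2, 1960)
* Applying a date format shows the same kind of value as a readable date.
display %td mdy(3, 15, 2015)
* An impossible date, such as 30 February, gives missing (.).
display mdy(2, 30, 2015)
```

The last line is worth remembering: if any of the three components is missing or does not form a valid date, the resulting date is missing as well.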

And finally, we apply a date display format so that the variable is shown as a readable date instead of a number of days:

format %td cvd_date
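
Once the date variable exists, a short inspection might look like the sketch below. The variable cvd_days_since_2011 is a hypothetical name introduced only for illustration, and 1 January 2011 is assumed as the start date because the variable labels refer to the years 2011-2020.

```stata
* Compare the three components with the resulting date for the first rows.
list cvd_year cvd_month cvd_day cvd_date in 1/5
* Observations with missing or impossible components have a missing date.
count if missing(cvd_date)
* Hypothetical follow-up variable: days elapsed since 1 January 2011.
gen cvd_days_since_2011 = cvd_date - mdy(1, 1, 2011)
```

Because dates are stored as day counts, subtracting one date from another gives the number of days between them, which is the kind of quantity time-to-event analysis builds on.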