Date variables: do not get us started. They are a science in themselves! Still, knowing how to generate date variables can be very useful later on if you want to perform time-to-event analysis (survival analysis).
In this example, we will use three variables that specify year, month, and day, respectively, and combine them into a nicely formatted date variable.
Note: This requires that you have performed the practical example in Substring first.
Practical example
Dataset: StataData1.dta

Variable name: cvd_year_str
Variable label: Year of out-patient care due to CVD (Ages 41-50, Years 2011-2020)
Value labels: N/A

Variable name: cvd_month_str
Variable label: Month of out-patient care due to CVD (Ages 41-50, Years 2011-2020)
Value labels: N/A

Variable name: cvd_day_str
Variable label: Day of out-patient care due to CVD (Ages 41-50, Years 2011-2020)
Value labels: N/A
All three are string variables. To make things smoother, we will transform them into numeric variables, using the function real().
gen cvd_year=real(cvd_year_str)
gen cvd_month=real(cvd_month_str)
gen cvd_day=real(cvd_day_str)
Just to double-check that everything worked out:
sum cvd_year cvd_month cvd_day
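As a complementary check, here is a minimal sketch that assumes the variables were created exactly as above. The function real() returns a missing value whenever a string cannot be read as a number, so it can be worth counting the observations where the string variable contained something but the numeric version is missing. Ideally, all three counts are zero.

```stata
* Count observations where the string was non-empty
* but real() could not convert it to a number
count if missing(cvd_year) & !missing(cvd_year_str)
count if missing(cvd_month) & !missing(cvd_month_str)
count if missing(cvd_day) & !missing(cvd_day_str)
```

If any of the counts is above zero, it may help to look at the original string values (for example with tab) before moving on.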
The next step is to generate the date variable.
gen cvd_date=mdy(cvd_month,cvd_day,cvd_year)
Note: mdy() is a function, and its arguments are given in the order month, day, year. It creates a special Stata date variable, stored as the number of days since 1 January 1960.
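To get a feel for what mdy() returns, here is a small sketch. The first two lines use made-up dates purely for illustration; the last line assumes that cvd_date was generated as above. Note that mdy() returns a missing value when the combination of month, day, and year is not a valid date.

```stata
* The first day of Stata's calendar is stored as 0
display mdy(1, 1, 1960)

* An impossible date (30 February) gives a missing value (.)
display mdy(2, 30, 2015)

* Count observations where year, month, and day are all present
* but no valid date could be created
count if missing(cvd_date) & !missing(cvd_year, cvd_month, cvd_day)
```

A count above zero in the last line would suggest that some combinations of year, month, and day do not form a real date.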
And finally, we format the date variable so that Stata displays it as a readable date: