Descriptive analysis

Before we move on to Cox regression analysis, let us explore the time-to-event data properly first.  

We can start with a simple description of how the data are arranged:

stdescribe, noshow

This shows, among other things, that we have 10,000 individuals in the analytical sample, of which 518 (5.18%) have experienced the outcome (cvd).  

We also get some descriptive statistics for entry time, exit time, and time at risk. Since we have specified cvd_origin as date of birth, the values for mean/min/median/max entry time and exit time reflect age.  

In the output above, mean age at entry is 40.51. This is actually the same for all individuals since they have the same date for cvd_origin and the same date for cvd_enter, which also explains why the same value is presented for min, median, and max.  

The mean age at exit is 50.25 (min: 40.53, median: 50.50, max: 50.50). The median and max are identical because the great majority of the individuals in the sample have not experienced the event and are thus censored at the end of follow-up (which equals age 50.50).

Time at risk is here presented as years. We can see that the mean is 9.74 (min=0.03, median=10.00, max=10.00). Again, the median and max values are the same since most individuals are censored at the end of follow-up (i.e. after 10 years). 
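
The figures reported by stdescribe can be cross-checked against the variables that stset creates. The lines below are a minimal sketch, assuming that the data are already stset with one record per individual; risktime is a hypothetical variable name introduced only for illustration.

```stata
* _t0 and _t hold analysis time at entry and exit, _d flags the event,
* and _st marks the records that are included in the analysis
generate double risktime = _t - _t0 if _st == 1

* Mean, min, median, and max should correspond to the time-at-risk row
summarize risktime, detail

* Age at entry and at exit (analysis time is age since origin is date of birth)
summarize _t0 _t if _st == 1, detail

* Number of individuals with and without the event
tabulate _d if _st == 1
```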

More information
help stdescribe

Restricted mean survival time 

Another way of obtaining the mean time at risk is through stci, which produces the restricted mean survival time (in this example, the same as the mean time at risk) along with standard errors and confidence intervals.

The restricted mean (or average) survival time is calculated as the area under the survival curve, with the estimation restricted to the longest follow-up time. Below, we also include the noshow option.

stci, rmean noshow

The restricted mean survival time in this example is 9.74 (years). Because of the way that our model is specified, the restricted mean survival time is the same as the mean time at risk. 

However, the estimate has been flagged by Stata since the observation with the longest follow-up time is censored, which leads to the survivor function not reaching zero. As a consequence, the mean is underestimated. 

Extended mean survival time 

An alternative to the restricted mean survival time is to look at the extended mean survival time instead. This extends the survivor function from the last observed time to zero by using an exponential function.

stci, emean noshow

The extended mean survival time is 910 (years), which of course is a completely absurd estimate. This shows that the extended mean survival time should be used very cautiously.  

We can produce a graph to have a closer look at the issue:

stci, emean graph

The curve shows the survival probability (i.e. the probability of not experiencing out-patient care due to CVD) across analysis time (years). The height of the curve at a given time is the proportion of individuals who have not yet experienced the event, and the area under the curve is the mean survival time. In sum, it takes more than 4,500 years for the extended survival function to reach 0. Although life expectancy is indeed increasing globally, we can probably conclude that estimating the extended mean survival time is not a reasonable alternative in the context of this example. This is perhaps not completely surprising since we do not expect that the entire sample will at some point experience out-patient care due to CVD.
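
A rough back-of-the-envelope sketch can illustrate why the extended mean becomes so large. It rests on two assumptions that are not taken from the output: that the exponential extension amounts to a constant hazard chosen so that the curve falls from 1 at time 0 to the last survival estimate at the last observed time, and that this last survival estimate is approximately 1 - 0.0518. The scalar names S_end, t_end, and lambda are hypothetical.

```stata
* Assumption: survival probability at the last observed time
scalar S_end = 1 - 0.0518

* Last observed analysis time (age at the end of follow-up)
scalar t_end = 50.50

* Constant hazard of an exponential curve through (0, 1) and (t_end, S_end)
scalar lambda = -ln(S_end) / t_end

* Area under the exponential tail beyond t_end, in years
display S_end / lambda

* Approximate time, in years, at which the extended curve drops to 1%
display ln(100) / lambda
```

With a hazard this low, the tail alone contributes hundreds of years, which is the same order of magnitude as the estimate reported by stci.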

More information
help stci

Summary statistics

It is also possible to produce some summary statistics:

stsum, noshow

This shows time at risk, the incidence rate, and number of subjects, as well as survival time at the 25th, 50th, and 75th percentile. 
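
The incidence rate is the number of failures divided by the total time at risk. As a hedged sketch, the first line below approximates it by hand from the rounded figures reported earlier (518 events, 10,000 individuals, mean time at risk of 9.74 years), so it will only roughly match the stsum output. The second command, stptime, reports person-time and the incidence rate directly; the choice of 1,000 person-years as the unit is just an illustration.

```stata
* Approximate incidence rate per person-year, from rounded figures
display 518 / (10000 * 9.74)

* Person-time and incidence rate per 1,000 person-years
stptime, per(1000)
```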

Note
Survival time at the 50th percentile is the same as median survival time.

More information
help stsum

Median survival time 

Another way of obtaining the median survival time is through the following command:

stci, median noshow

In the current example, we do not get any values for survival time at the 25th, 50th, or 75th percentile since the percentage of failure (i.e. the proportion of cases with cvd) is very low (about 5%). It would nevertheless be possible to estimate survival time at the 1st to 4th percentile. Let us try out the first and last of these:

stci, p(1) noshow
stci, p(4) noshow

Since we specified origin as date of birth (well, not the exact date) when we applied stset to the data, we actually get the survival age at each of the two percentiles, i.e. the age by which 1% and 4% of the sample, respectively, have experienced the event. For the 1st percentile, the survival age is 42.64 (95% CI: 42.21-42.85), whereas it is 48.19 (95% CI: 47.55-48.71) for the 4th percentile.
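
To see directly which percentiles can be estimated, one option is to list the estimated failure function at a few ages. This is a minimal sketch, assuming the data are stset as above; the ages given in at() are arbitrary illustrative values.

```stata
* Cumulative probability of failure at selected ages
sts list, failure at(42 44 46 48 50) noshow
```

A percentile can only be estimated if the failure function reaches the corresponding percentage before the end of follow-up.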

More information
help stci