Although the Cox model assumes that the hazard function is continuous, and that there are therefore no tied failure times, ties nonetheless occur in practice. Stata provides four options for handling tied failures when calculating the partial likelihood. Each method is explained briefly below, using a hypothetical tie between two individuals.
Method
Explanation
Exact marginal calculation
This method assumes that time is continuous and that the two individuals did not really fail at the same time; our measurements are simply too imprecise to show the order in which they failed. The likelihood calculation is therefore based on the probability that the two individuals fail in either order, which is the sum of the probability that individual 1 fails first and the probability that individual 2 fails first.
Exact partial calculation
This method assumes that time is discrete and that the two individuals really did fail at the same time. The tie is therefore treated as a multinomial problem: the conditional probability that these two individuals failed, given that two failures occurred among everyone in the risk set. This method can take a long time to calculate and may produce questionable results if the risk sets are large and contain many ties.
Breslow approximation
Approximates the exact marginal calculation by using a common denominator for all tied failure events. In other words, the risk sets for the second through nth tied failures are not adjusted for the previous failures. This is the fastest method, and it works best when the number of tied failures is small relative to the size of the risk set.
Efron approximation
Also approximates the exact marginal calculation, but the risk sets for the second through nth tied failures are adjusted using probability weights. The Efron approximation is more accurate than the Breslow approximation but is somewhat slower to calculate.
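For the hypothetical tie between two individuals, the four likelihood contributions can be written out explicitly. The notation is introduced here for illustration and is an assumption, not something taken from Stata's output: r_i = exp(x_i β) is the relative hazard of individual i, R is the risk set at the tied failure time, and S is the sum of r_j over everyone in R. The expressions are a sketch of the standard forms and are stated up to constant factors that do not affect estimation.

```latex
% Assumed notation: r_i = exp(x_i \beta), S = \sum_{j \in R} r_j,
% individuals 1 and 2 are tied, R is the risk set at the tied time.
\begin{align*}
L_{\text{exact marginal}} &= \frac{r_1}{S}\cdot\frac{r_2}{S - r_1}
                           + \frac{r_2}{S}\cdot\frac{r_1}{S - r_2} \\[4pt]
L_{\text{exact partial}}  &= \frac{r_1 r_2}{\sum_{\{k,l\} \subseteq R,\; k < l} r_k r_l} \\[4pt]
L_{\text{Breslow}}        &= \frac{r_1 r_2}{S^{2}} \\[4pt]
L_{\text{Efron}}          &= \frac{r_1 r_2}{S\left(S - \tfrac{1}{2}(r_1 + r_2)\right)}
\end{align*}
```

When r_1 and r_2 are small relative to S, that is, when the tied failures are few relative to the size of the risk set, the denominators S - r_1, S - r_2, and S - (r_1 + r_2)/2 are all close to S, which is why the approximations then agree closely with the exact marginal calculation.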
In Stata, the Breslow method is the default method, and does not need to be specified. You may remember seeing “Cox regression — Breslow method for ties” in the output from your practical examples earlier in this section. If one of the other three methods is more suitable, you can specify the method after the comma per the below examples:
stcox var1…varx, efron
stcox var1…varx, exactm
stcox var1…varx, exactp
Note: If there are no ties in your data, you will obtain identical results no matter which method you select. If there are only a few ties, the results are also unlikely to differ much between methods.
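One way to see how much the choice of method matters is to fit the same model under each option and compare the coefficients side by side. The sketch below assumes Stata's example dataset drugtr (loaded with webuse and already stset) and its covariates drug and age; these are stand-ins for illustration, so substitute your own data and variables.

```stata
* Assumption: the example dataset drugtr is available through webuse
* and is already stset; drug and age are its covariates.
webuse drugtr, clear

* Fit the same model with each method for handling ties
stcox drug age
estimates store breslow

stcox drug age, efron
estimates store efron

stcox drug age, exactm
estimates store exactm

stcox drug age, exactp
estimates store exactp

* Compare coefficients and standard errors across the four methods
estimates table breslow efron exactm exactp, b(%9.4f) se(%9.4f)
```

The table reports coefficients (log hazard ratios) rather than hazard ratios; adding the eform option to estimates table would display exponentiated coefficients instead.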
You can also check the number of ties in your data in Stata.
First save your data! Then, after your data have been stset, keep only the failures:
keep if _d
Sort by time:
sort _t
Generate a count of the failures at each time:
by _t : gen number = _N
Keep one observation per failure time:
by _t : keep if _n ==1
Check the average number of failures per failure time:
summarize number
Check the frequency of the number of failures:
tab number
Note: You can use the preserve command before dropping observations, and the restore command at the end, to return your data to their original state.
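Put together, the tie check can be run as a single block that leaves the data in memory unchanged. This is a minimal sketch that assumes the data have already been stset. It uses _N, the number of observations in each by-group, so that number holds the count of failures at each time, and it writes the failure condition as _d == 1 so that observations with a missing _d are not kept by accident.

```stata
* Assumption: the data in memory have already been stset
preserve

* Keep only the failures
keep if _d == 1

* Count the failures at each distinct failure time
sort _t
by _t: generate number = _N

* Keep one observation per failure time
by _t: keep if _n == 1

* Average number of failures per failure time, then the full distribution
summarize number
tabulate number

* Return the data to their original state
restore
```

In the tabulate output, a value of 1 for number means a failure time with no ties; larger values show how many failure times have two, three, or more tied failures.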