The survival functions can be built separately for different subgroups, e.g. separately for women and men, and then compared. Such a comparison may concern two curves or more.
The window with settings for the comparison of survival curves is accessed via the menu Advanced statistics → Survival analysis → Comparison groups.
Comparisons of survival curves, at particular points of the survival time, can be made in the program with the use of three tests:
Log-rank test: the most popular test, drawing on the Mantel-Haenszel procedure for many 2 x 2 tables (Mantel and Haenszel 1959, Mantel 1966, Cox 1972),
Gehan's generalization of Wilcoxon's test: deriving from Wilcoxon's test (Breslow 1970, Gehan 1965),
Tarone-Ware test: deriving from Wilcoxon's test (Tarone and Ware 1977).
The three tests are based on the same test statistic; they differ only in the weights assigned to the particular points of the timeline on which the test statistic is based.
Log-rank test: all the points of the timeline have the same weight, which gives the later values of the timeline a greater influence on the result than in the other two tests;
Gehan's generalization of Wilcoxon's test: time moments are weighted with the number of observations in each of them, so greater weights are ascribed to the initial values of the timeline;
Tarone-Ware test: time moments are weighted with the square root of the number of observations in each of them, so the test is situated between the two tests described earlier.
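The weighting schemes listed above can be illustrated with a minimal Python sketch. The function name and the numbers at risk are hypothetical, and the "number of observations" at a time point is assumed here to mean the number of cases still at risk at that moment.

```python
import math

def time_point_weight(n_at_risk, test="log-rank"):
    """Weight of one event time, given the number of cases at risk there.

    Assumption: 'number of observations' means the number of cases at risk.
    """
    if test == "log-rank":
        return 1.0                       # every time point counts equally
    if test == "gehan":
        return float(n_at_risk)          # proportional to the number at risk
    if test == "tarone-ware":
        return math.sqrt(n_at_risk)      # square root of the number at risk
    raise ValueError(f"unknown test: {test}")

# Hypothetical numbers at risk at four consecutive event times
at_risk = [100, 64, 25, 9]
for name in ("log-rank", "gehan", "tarone-ware"):
    print(name, [time_point_weight(n, name) for n in at_risk])
```

Because the number at risk can only shrink over time, the Gehan weights fall fastest, the Tarone-Ware weights fall more slowly, and the log-rank weights stay constant.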
An important condition for using the tests above is the proportionality of hazards. Hazard, defined as the slope of the survival curve, measures how quickly a failure event takes place. Violating the principle of hazard proportionality does not completely disqualify the tests above, but it carries some risks. First of all, the location of the point at which the curves intersect along the timeline has a decisive influence on how much power particular tests lose.
EXAMPLE cont. (transplant.pqs file)
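Hypotheses:
$H_0: S_1(t)=S_2(t)=\dots=S_k(t)$ for all $t$,
$H_1:$ not all $S_i(t)$ are equal.
The chi-square test statistic used in the calculations has the form:
$\chi^2=U'V^{-1}U$
where:
$U_i=\sum_{j=1}^{\nu}w_j(d_{ij}-e_{ij})$,
$V$ - covariance matrix of dimensions $(k-1)\times(k-1)$
where:
diagonal: $\sum_{j=1}^{\nu}w_j^2\frac{n_{ij}(n_j-n_{ij})d_j(n_j-d_j)}{n_j^2(n_j-1)}$,
off diagonal: $-\sum_{j=1}^{\nu}w_j^2\frac{n_{ij}n_{lj}d_j(n_j-d_j)}{n_j^2(n_j-1)}$,
$\nu$ – number of moments in time with failure event (death),
$w_j$ – weight of the $j$-th moment of time (depending on the chosen test),
$d_j$ – observed number of failure events (deaths) in the $j$-th moment of time,
$d_{ij}$ – observed number of failure events (deaths) in the $i$-th group in the $j$-th moment of time,
$e_{ij}=\frac{n_{ij}d_j}{n_j}$ – expected number of failure events (deaths) in the $i$-th group in the $j$-th moment of time,
$n_{ij}$ – the number of cases at risk in the $i$-th group in the $j$-th moment of time,
$n_j$ – the number of cases at risk in the $j$-th moment of time.
The statistic asymptotically (for large sizes) has the Chi-square distribution with $df=k-1$ degrees of freedom.
The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:
if $p\le\alpha$, we reject $H_0$ and accept $H_1$,
if $p>\alpha$, there is no reason to reject $H_0$.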
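As a minimal sketch of the statistic described above, the following Python function handles the special case of two groups, where the covariance matrix reduces to a single variance and the statistic has 1 degree of freedom. The function name, the 0/1 group coding and the data are hypothetical; the weights are assumed to be based on the total number of cases at risk, and this is not necessarily how the program implements the test.

```python
import math

def weighted_logrank_two_groups(times, events, groups, test="log-rank"):
    """Return (chi2, p) for two groups coded 0 and 1.

    times  - follow-up time of each case
    events - 1 if the failure event was observed, 0 if the case is censored
    groups - group code (0 or 1) of each case
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    u = 0.0  # weighted sum of (observed - expected) events in group 0
    v = 0.0  # variance of u
    for t in event_times:
        n_j = sum(1 for x, _, _ in data if x >= t)               # at risk, all
        n_0j = sum(1 for x, _, g in data if x >= t and g == 0)   # at risk, group 0
        d_j = sum(1 for x, e, _ in data if x == t and e == 1)    # events, all
        d_0j = sum(1 for x, e, g in data if x == t and e == 1 and g == 0)
        if test == "log-rank":
            w = 1.0
        elif test == "gehan":
            w = float(n_j)
        elif test == "tarone-ware":
            w = math.sqrt(n_j)
        else:
            raise ValueError(f"unknown test: {test}")
        u += w * (d_0j - n_0j * d_j / n_j)
        if n_j > 1:
            v += (w ** 2) * n_0j * (n_j - n_0j) * d_j * (n_j - d_j) / (n_j ** 2 * (n_j - 1))
    if v == 0:
        raise ValueError("variance is zero; the statistic is undefined")
    chi2 = u * u / v
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Hypothetical data: group 0 fails early, group 1 fails late, no censoring
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = [0, 0, 0, 1, 1, 1]
for name in ("log-rank", "gehan", "tarone-ware"):
    print(name, weighted_logrank_two_groups(times, events, groups, name))
```

For more than two groups the same sums are collected into the vector and covariance matrix, and a matrix inverse is required, which is left out of this sketch.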
Hazard ratio
In the log-rank test the observed values of failure events (deaths) and the appropriate expected values are given.
The measure for describing the size of the difference between a pair of survival curves is the hazard ratio ($HR$), i.e. the ratio of the observed to the expected number of failure events in the first group divided by the same ratio in the second group: $HR=\frac{O_1/E_1}{O_2/E_2}$.
If the hazard ratio is greater than 1, e.g. $HR=2$, then the risk of a failure event in the first group is twice as big as in the second group. The reverse situation takes place when $HR$ is smaller than one. When $HR$ is equal to 1 both groups are equally at risk.
Note
The confidence interval for $HR$ is calculated on the basis of the standard deviation of the logarithm of $HR$ (Armitage and Berry 1994).
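The hazard ratio and its confidence interval can be sketched in Python as follows. The hazard ratio is taken as the ratio of observed to expected failure events in the first group divided by the same ratio in the second group. The standard error of the logarithm is assumed here to be $\sqrt{1/E_1+1/E_2}$, one commonly used approximation; the program may use a different formula. The function name and the counts are hypothetical.

```python
import math

def hazard_ratio(o1, e1, o2, e2, z=1.96):
    """Return (HR, lower, upper) from observed (o) and expected (e) events.

    Assumption: SE of ln(HR) is sqrt(1/e1 + 1/e2); z=1.96 gives a 95% interval.
    """
    hr = (o1 / e1) / (o2 / e2)
    se_log_hr = math.sqrt(1.0 / e1 + 1.0 / e2)
    lower = math.exp(math.log(hr) - z * se_log_hr)
    upper = math.exp(math.log(hr) + z * se_log_hr)
    return hr, lower, upper

# Hypothetical observed and expected numbers of failure events
hr, lower, upper = hazard_ratio(o1=30, e1=20.0, o2=20, e2=30.0)
print(hr, lower, upper)
print("interval contains 1:", lower <= 1.0 <= upper)
```

An interval that contains 1 is consistent with both groups being equally at risk.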
EXAMPLE cont. (transplant.pqs file)
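Hypotheses:
$H_0:$ in the studied population there is no trend in the placement of the curves $S_1(t),S_2(t),\dots,S_k(t)$,
$H_1:$ in the studied population there is a trend in the placement of the curves $S_1(t),S_2(t),\dots,S_k(t)$.
In the calculation the chi-square statistic was used, in the following form:
$\chi^2=\frac{(c'U)^2}{c'Vc}$
where:
$c=(c_1,c_2,\dots,c_k)$ – vector of the weights for the compared groups, informing about their natural order (usually the subsequent natural numbers),
$U$, $V$ – the vector and the covariance matrix defined for the test of differences among the curves, here taken over all $k$ groups.
The statistic asymptotically (for large sizes) has the Chi-square distribution with 1 degree of freedom.
The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:
if $p\le\alpha$, we reject $H_0$ and accept $H_1$,
if $p>\alpha$, there is no reason to reject $H_0$.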
In order to conduct a trend analysis in the survival curves the grouping variable must be a numerical variable in which the values of the numbers inform about the natural order of the groups. The numbers in the analysis are treated as the weights.
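A minimal Python sketch of the trend statistic, assuming the usual form $\chi^2=(c'U)^2/(c'Vc)$ with 1 degree of freedom, where $c$ holds the group weights, $U$ the weighted observed-minus-expected sums for all groups, and $V$ their covariance matrix. The function name and all numbers below are hypothetical and serve only to show the arithmetic.

```python
import math

def trend_statistic(c, U, V):
    """Return (chi2, p) for the trend test with group weights c."""
    k = len(c)
    numerator = sum(c[i] * U[i] for i in range(k)) ** 2
    denominator = sum(c[i] * V[i][j] * c[j] for i in range(k) for j in range(k))
    chi2 = numerator / denominator
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Hypothetical U (sums to zero) and V (rows sum to zero) for three groups
U = [-4.0, -1.0, 5.0]
V = [[4.0, -2.0, -2.0],
     [-2.0, 4.0, -2.0],
     [-2.0, -2.0, 4.0]]
print(trend_statistic([1, 2, 3], U, V))
print(trend_statistic([0, 1, 2], U, V))  # shifting the weights changes nothing
```

Because the elements of $U$ sum to zero, only the spacing between the weights matters, not their starting value.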
EXAMPLE cont. (transplant.pqs file)
Often, when we want to compare the survival times of two or more groups, we should keep in mind other factors which may have an impact on the result of the comparison. An adjustment (correction) of the analysis for such factors can be useful. For example, suppose that a study of rest homes compares the length of stay of people below and above 80 years of age and finds a significant difference. We know, however, that sex has a strong influence on the length of stay and on the age of the inhabitants of rest homes. That is why, when attempting to evaluate the impact of age, it would be a good idea to stratify the analysis with respect to sex.
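Hypotheses for the differences in survival curves:
$H_0: S_1^*(t)=S_2^*(t)=\dots=S_k^*(t)$ for all $t$,
$H_1:$ not all $S_i^*(t)$ are equal.
Hypotheses for the analysis of trends in survival curves:
$H_0:$ in the studied population there is no trend in the placement of the curves $S_1^*(t),S_2^*(t),\dots,S_k^*(t)$,
$H_1:$ in the studied population there is a trend in the placement of the curves $S_1^*(t),S_2^*(t),\dots,S_k^*(t)$,
where $S_1^*(t),S_2^*(t),\dots,S_k^*(t)$ are the survival curves after the correction by the variable determining the strata.
The calculations of the test statistics are based on the formulas described for the tests which do not take the strata into account, with the difference that $U$ and $V$ are replaced with the sums $\sum_{l=1}^{L}U_l$ and $\sum_{l=1}^{L}V_l$. The summation is made over the strata $l=1,2,\dots,L$ created by the variables with respect to which we adjust the analysis.
The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:
if $p\le\alpha$, we reject $H_0$ and accept $H_1$,
if $p>\alpha$, there is no reason to reject $H_0$.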
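The summation over strata described above can be sketched in Python as follows: the vector and covariance matrix are computed separately within each stratum and then added element by element before the test statistic is formed. The function name and the per-stratum values are hypothetical.

```python
def sum_over_strata(U_by_stratum, V_by_stratum):
    """Add per-stratum vectors U_l and matrices V_l element by element."""
    k = len(U_by_stratum[0])
    U = [sum(U_l[i] for U_l in U_by_stratum) for i in range(k)]
    V = [[sum(V_l[i][j] for V_l in V_by_stratum) for j in range(k)]
         for i in range(k)]
    return U, V

# Hypothetical results for two strata (e.g. two hospitals) and two groups
U_by_stratum = [[1.5, -1.5], [2.0, -2.0]]
V_by_stratum = [[[1.0, -1.0], [-1.0, 1.0]],
                [[2.0, -2.0], [-2.0, 2.0]]]
U, V = sum_over_strata(U_by_stratum, V_by_stratum)
print(U)
print(V)
```

Small effects pointing in the same direction in every stratum add up in the summed vector, which is why a stratified analysis can be significant even when no single stratum is.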
EXAMPLE cont. (transplant.pqs file)
The differences for two survival curves
Liver transplantations were made in two hospitals. We will check if the patients' survival time after transplantations depended on the hospital in which the transplantations were made. The comparisons of the survival curves for those hospitals will be made on the basis of all tests proposed in the program for such a comparison.
Hypotheses:
$H_0:$ the survival curve of the patients of hospital 1 $=$ the survival curve of the patients of hospital 2,
$H_1:$ the survival curve of the patients of hospital 1 $\ne$ the survival curve of the patients of hospital 2.
At the significance level $\alpha=0.05$, based on the obtained value p=0.6004 for the log-rank test (p=0.6959 for Gehan's and p=0.6465 for Tarone-Ware), we conclude that there is no basis for rejecting the hypothesis $H_0$. The length of life calculated for the patients of both hospitals is similar.
The same conclusion will be reached when comparing the risk of death for those hospitals by determining the hazard ratio. The obtained estimated value is HR=1.1499 and the 95% confidence interval for that value, (0.6570; 2.0126), contains 1.
Differences for many survival curves
Liver transplantations were made for people at different ages. 3 age groups were distinguished: [45 years; 50 years), [50 years; 55 years), [55 years; 60 years). We will check if the patients' survival time after transplantations depended on their age at the time of the transplantation.
Hypotheses:
$H_0:$ the survival curves of the patients in the three age groups are equal,
$H_1:$ not all of the survival curves are equal.
At the significance level $\alpha=0.05$, based on the obtained value p=0.0692 in the log-rank test (p=0.0928 for Gehan's and p=0.0779 for Tarone-Ware), we conclude that there is no basis for the rejection of the hypothesis $H_0$. The length of life calculated for the patients in the three compared age groups is similar. However, it is noticeable that the p-values are quite near to the standard significance level 0.05.
When examining the hazard values (the ratio of the observed to the expected numbers of failure events) we notice that they are a little higher with each age group (0.68, 0.93, 1.43). Although no statistically significant differences among them are seen, it is possible that a growing trend of the hazard value (a trend in the position of the survival curves) will be found.
Trend for many survival curves
If we introduce into the test the information about the ordering of the compared categories (we will use the age variable in which the age ranges are numbered, respectively, 1, 2, and 3), we will be able to check if there is a trend in the compared curves. We will study the following hypotheses:
$H_0:$ there is no trend in the placement of the survival curves of the three age groups,
$H_1:$ there is a trend in the placement of the survival curves of the three age groups.
At the significance level $\alpha=0.05$, based on the obtained value p=0.0237 in the log-rank test (p=0.0317 for Gehan's and p=0.0241 for Tarone-Ware), we conclude that the survival curves are positioned in a certain trend. On the Kaplan-Meier graph the curve for people aged [55 years; 60 years) is the lowest. Above that curve there is the curve for patients aged [50 years; 55 years). The highest curve is the one for patients aged [45 years; 50 years). Thus, the older the patient at the time of a transplantation, the lower the probability of survival over a certain period of time.
Survival curves for strata
Let us now check if the trend observed before is independent of the hospital in which the transplantation took place. For that purpose we will choose a hospital as the stratum variable.
The report contains, firstly, an analysis of the strata: both the test results and the hazard ratio. In the first stratum the growing trend of hazard is visible but not significant. In the second stratum a trend with the same direction (a result bordering on statistical significance) is observed. Accumulating those trends in a joint analysis of the strata made the trend of the survival curves significant. Thus, the older the patient at the time of a transplantation, the lower the probability of survival over a certain period of time, independently of the hospital in which the transplantation took place.
A comparative analysis of the survival curves, corrected by strata, yields a result significant for the log-rank and Tarone-Ware tests and not significant for Gehan's test, which might mean that the differences among the curves are not so visible in the initial survival periods as in the later ones. By looking at the hazard ratio of the curves compared in pairs we can localize significant differences. For the comparison of the curve of the youngest group with the curve of the oldest group the hazard ratio is the smallest, 0.53. The 95% confidence interval for that ratio, (0.26; 1.05), does contain the value 1 but is on the verge of that value, which can suggest that there are significant differences between the respective curves. In order to confirm that supposition an inquisitive researcher can, with the use of the data filter in the analysis window, compare the curves in pairs.
However, it ought to be remembered that one of the corrections for multiple comparisons should be used and the significance level should be modified. In this case, for Bonferroni's correction, with three comparisons, the significance level will be 0.017. For simplicity, we will only avail ourselves of the log-rank test.
[45 years; 50 years) vs [50 years; 55 years)
[45 years; 50 years) vs [55 years; 60 years)
[50 years; 55 years) vs [55 years; 60 years)
As expected, statistically significant differences only concern the survival curves of the youngest and oldest groups.
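The Bonferroni-corrected significance level used above is simply the nominal level divided by the number of comparisons. A minimal Python sketch, with hypothetical function names:

```python
def bonferroni_level(alpha, n_comparisons):
    """Significance level for each single comparison after Bonferroni's correction."""
    return alpha / n_comparisons

def is_significant(p_value, alpha, n_comparisons):
    """True if a single pairwise p-value passes the corrected level."""
    return p_value <= bonferroni_level(alpha, n_comparisons)

# Three pairwise comparisons of the age groups at the nominal level 0.05
print(round(bonferroni_level(0.05, 3), 3))  # 0.017
```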