Based on a coefficient and its standard error, we can infer whether the independent variable for which the coefficient was estimated has a significant effect on the dependent variable. The Wald test is used for this purpose.
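Hypotheses:
$$H_0: \beta_i = 0, \qquad H_1: \beta_i \neq 0,$$
or, equivalently:
$$H_0: HR_i = 1, \qquad H_1: HR_i \neq 1,$$
where $HR_i = e^{\beta_i}$ is the hazard ratio for the $i$-th variable.
The Wald test statistic is calculated according to the formula:
$$\chi^2 = \left(\frac{\beta_i}{SE_{\beta_i}}\right)^2,$$
where $SE_{\beta_i}$ is the standard error of the coefficient $\beta_i$.
The statistic asymptotically (for large samples) has the chi-square distribution with 1 degree of freedom.
The p-value, determined on the basis of the test statistic, is compared with the significance level $\alpha$:
if $p \le \alpha$, we reject $H_0$ in favor of $H_1$; if $p > \alpha$, there is no reason to reject $H_0$.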
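The Wald test described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the statistic is the squared ratio of the coefficient to its standard error and is referred to the chi-square distribution with 1 degree of freedom. The function name `wald_test` and the input values are hypothetical and are not taken from the example file.

```python
import math


def wald_test(beta, se):
    """Return (chi2, p) for H0: beta = 0, given a coefficient and its standard error."""
    chi2 = (beta / se) ** 2
    # Upper tail of the chi-square distribution with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Hypothetical coefficient and standard error, for illustration only
beta, se = 1.5, 0.5
chi2, p = wald_test(beta, se)
hazard_ratio = math.exp(beta)  # HR = e^beta, the equivalent scale of the hypothesis
alpha = 0.05
print(f"chi2 = {chi2:.3f}, p = {p:.4f}, HR = {hazard_ratio:.3f}")
print("reject H0" if p <= alpha else "no reason to reject H0")
```

Only the standard library is used here; a statistics package would normally supply the chi-square tail probability directly.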
A good model should fulfill two basic conditions: it should fit the data well and be as simple as possible. The quality of a Cox proportional hazards model can be evaluated with a few general measures based on:
$L_{FM}$ - the maximum value of the likelihood function of the full model (with all variables),
$L_0$ - the maximum value of the likelihood function of the null model (a model containing no independent variables),
$d$ - the observed number of failure events.
$AIC$, $AICc$, and $BIC$ are a kind of compromise between good fit and complexity. The second term of the sum in the formulas for the information criteria (the so-called penalty function) measures the simplicity of the model. It depends on the number of parameters ($k$) in the model and the number of complete observations ($d$). In every case the penalty grows with the number of parameters; for $AICc$ the growth is faster the smaller the number of observations.
An information criterion, however, is not an absolute measure: if none of the compared models describes reality well, the information criterion will not warn about it.
The Akaike information criterion is defined as
$$AIC = -2\ln L_{FM} + 2k,$$
where $L_{FM}$ is the maximum value of the likelihood function of the full model. It is an asymptotic criterion, appropriate for large sample sizes.
The corrected Akaike information criterion is defined as
$$AICc = AIC + \frac{2k(k+1)}{d-k-1}.$$
Because the correction of the Akaike information criterion depends on the sample size (the number of failure events), it is the recommended measure, also for smaller samples.
The Bayesian information criterion (Schwarz criterion) is defined as
$$BIC = -2\ln L_{FM} + k\ln(d).$$
Like the corrected Akaike criterion, it takes into account the sample size (the number of failure events); see Volinsky and Raftery (2000).
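The three information criteria can be computed from the maximized log-likelihood of the full model, the number of parameters, and the number of failure events. The sketch below assumes the commonly used forms $AIC = -2\ln L_{FM} + 2k$, $AICc = AIC + 2k(k+1)/(d-k-1)$, and $BIC = -2\ln L_{FM} + k\ln(d)$, with the number of failure events $d$ playing the role of the sample size. The function name and the numeric inputs are hypothetical.

```python
import math


def information_criteria(log_lik_full, k, d):
    """Return AIC, AICc and BIC for a model with k parameters and d failure events.

    log_lik_full is ln(L_FM), the maximized log-likelihood of the full model.
    """
    if d - k - 1 <= 0:
        raise ValueError("AICc requires d > k + 1")
    aic = -2.0 * log_lik_full + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (d - k - 1)
    bic = -2.0 * log_lik_full + k * math.log(d)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Hypothetical values: ln(L_FM) = -85, 3 parameters, 30 failure events
for name, value in information_criteria(-85.0, k=3, d=30).items():
    print(f"{name} = {value:.3f}")
```

When several candidate models are fitted to the same data, the one with the smallest value of the chosen criterion is preferred.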
The value of that coefficient falls within the range $[0, 1)$, where values close to 1 mean an excellent goodness of fit of the model and 0 means a complete lack of fit. The coefficient is calculated according to the formula:
As the coefficient does not assume the value 1 and is sensitive to the number of variables in the model, its corrected value is calculated:
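The basic tool for evaluating the significance of all variables in the model is the Likelihood Ratio test. The test verifies the hypotheses:
$$H_0: \text{all } \beta_i = 0, \qquad H_1: \text{there exists } \beta_i \neq 0.$$
The test statistic has the form presented below:
$$\chi^2 = -2\ln\left(\frac{L_0}{L_{FM}}\right) = -2\ln L_0 - \left(-2\ln L_{FM}\right),$$
where $L_{FM}$ and $L_0$ are the maximum values of the likelihood function of the full model and of the model without independent variables, respectively.
The statistic asymptotically (for large samples) has the chi-square distribution with $k$ degrees of freedom, where $k$ is the number of parameters in the model.
The p-value, determined on the basis of the test statistic, is compared with the significance level $\alpha$:
if $p \le \alpha$, we reject $H_0$ in favor of $H_1$; if $p > \alpha$, there is no reason to reject $H_0$.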
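The Likelihood Ratio test can be sketched as follows. The example assumes that the statistic is $-2\ln(L_0/L_{FM})$, computed from the log-likelihoods of the null and full models, and that it is referred to the chi-square distribution with as many degrees of freedom as the model has parameters. The helper `chi2_sf` is a hypothetical standard-library stand-in for the chi-square tail probability (a statistics package would normally provide it), and the log-likelihood values are made up for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X > x) of the chi-square distribution for integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{(df-1)/2} (x/2)^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (j + 0.5)
    return total


def likelihood_ratio_test(log_lik_null, log_lik_full, k):
    """Return (chi2, p) for H0: all coefficients are zero."""
    chi2 = -2.0 * (log_lik_null - log_lik_full)
    return chi2, chi2_sf(chi2, k)


# Hypothetical log-likelihoods of the null and full models, 3 parameters
chi2, p = likelihood_ratio_test(log_lik_null=-93.0, log_lik_full=-85.0, k=3)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```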
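Hypotheses (for the area under the ROC curve, $AUC$):
$$H_0: AUC = 0.5, \qquad H_1: AUC \neq 0.5.$$
The test statistic has the form:
$$Z = \frac{AUC - 0.5}{SE_{AUC}},$$
where:
$SE_{AUC}$ - the standard error of the area under the curve.
The statistic asymptotically (for large samples) has the normal distribution.
The p-value, determined on the basis of the test statistic, is compared with the significance level $\alpha$:
if $p \le \alpha$, we reject $H_0$ in favor of $H_1$; if $p > \alpha$, there is no reason to reject $H_0$.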
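The test above can be illustrated with a short sketch. It assumes that the tested quantity is the area under the ROC curve, that the null value is 0.5, and that the alternative is two-sided. The standard error of the area is taken as a given input, because how it is estimated is not specified here. The function name and the numbers are hypothetical.

```python
import math


def auc_z_test(auc, se):
    """Return (z, p) for H0: AUC = 0.5 against a two-sided alternative."""
    z = (auc - 0.5) / se
    # Two-sided tail of the standard normal distribution: P(|Z| > |z|)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical area under the curve and its standard error
z, p = auc_z_test(auc=0.8, se=0.1)
alpha = 0.05
print(f"Z = {z:.3f}, p = {p:.4f}")
print("reject H0" if p <= alpha else "no reason to reject H0")
```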
In addition, a proposed cut-off point value for the combination of independent variables and model parameters is given for the ROC curve.
EXAMPLE cont. (remissionLeukemia.pqs file)