Model verification

On the basis of an estimated coefficient and its standard error we can infer whether the independent variable for which the coefficient was estimated has a significant effect on the dependent variable. For that purpose we use the Wald test.

Hypotheses:

\begin{array}{cc}
\mathcal{H}_0: & \beta_i=0,\\
\mathcal{H}_1: & \beta_i\ne 0.
\end{array}

or, equivalently:

\begin{array}{cc}
\mathcal{H}_0: & HR_i=1,\\
\mathcal{H}_1: & HR_i\ne 1.
\end{array}

The Wald test statistic is calculated according to the formula:

\begin{displaymath}
\chi^2=\left(\frac{b_i}{SE_{b_i}}\right)^2
\end{displaymath}

This statistic has, asymptotically (for large sample sizes), the Chi-square distribution with $1$ degree of freedom.

The p-value, determined on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}
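
For illustration, the sketch below computes the Wald statistic and its p-value from a single coefficient and its standard error; the values of $b_i$ and $SE_{b_i}$ are assumed for this example only.

<code python>
from scipy import stats

b_i = 1.53     # assumed coefficient estimate
se_b_i = 0.41  # assumed standard error of the estimate

chi2_stat = (b_i / se_b_i) ** 2           # Wald statistic
p_value = stats.chi2.sf(chi2_stat, df=1)  # Chi-square distribution with 1 degree of freedom

print(f"chi2 = {chi2_stat:.3f}, p = {p_value:.4f}")
# Reject H0 (beta_i = 0) when p <= alpha, e.g. alpha = 0.05.
</code>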

A good model should fulfill two basic conditions: it should fit the data well and be as simple as possible. The quality of the Cox proportional hazard model can be evaluated with a few general measures based on: $L_{FM}$ - the maximum value of the likelihood function of the full model (with all variables),

$L_0$ - the maximum value of the likelihood function of the null model (a model containing only the intercept),

$d$ - the observed number of failure events.

  • Information criteria are based on the information entropy carried by the model (model uncertainty), i.e. they evaluate the information lost when a given model is used to describe the studied phenomenon. We should, therefore, choose the model with the minimum value of a given information criterion.

$AIC$, $AICc$, and $BIC$ are a kind of compromise between goodness of fit and complexity. The second term of the sum in the formulas for the information criteria (the so-called penalty function) measures the simplicity of the model. It depends on the number of parameters ($k$) in the model and the number of complete observations ($d$). In both cases the term grows with the number of parameters, and the growth is faster the smaller the number of observations.

The information criterion, however, is not an absolute measure, i.e. if all the compared models describe reality poorly, the information criterion will not warn about it.

  • Akaike information criterion

\begin{displaymath}
AIC=-2\ln L_{FM}+2k,
\end{displaymath}

It is an asymptotic criterion, appropriate for large sample sizes.

  • Corrected Akaike information criterion

\begin{displaymath}
AICc=AIC+\frac{2k(k+1)}{d-k-1},
\end{displaymath}

Because the correction of the Akaike information criterion takes the sample size (the number of failure events) into account, it is the recommended measure (also for smaller samples).

  • Bayesian information criterion or Schwarz criterion

\begin{displaymath}
BIC=-2\ln L_{FM}+k\ln(d),
\end{displaymath}

Just like the corrected Akaike criterion, it takes the sample size (the number of failure events) into account, Volinsky and Raftery (2000)1).

  • Pseudo R$^2$ - the so-called McFadden R$^2$ - is a goodness-of-fit measure of the model (an equivalent of the coefficient of multiple determination $R^2$ defined for multiple linear regression).

The value of that coefficient falls within the range $[0; 1)$, where values close to $1$ mean an excellent goodness of fit of the model and $0$ a complete lack of fit. Coefficient $R^2_{Pseudo}$ is calculated according to the formula:

\begin{displaymath}
R^2_{Pseudo}=1-\frac{\ln L_{FM}}{\ln L_0}.
\end{displaymath}

As coefficient $R^2_{Pseudo}$ never reaches the value of $1$ and is sensitive to the number of variables in the model, its corrected values are calculated (a numerical sketch of all these measures is given after the formulas below):

\begin{displaymath}
R^2_{Nagelkerke}=\frac{1-e^{-(2/d)(\ln L_{FM}-\ln L_0)}}{1-e^{(2/d)\ln L_0}} \quad \textrm{or}\quad R^2_{Cox-Snell}=1-e^{-\frac{(-2\ln L_0)-(-2\ln L_{FM})}{d}}.
\end{displaymath}
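
The sketch below computes the information criteria and the pseudo-$R^2$ coefficients defined above directly from the two log-likelihoods; the values of $\ln L_{FM}$, $\ln L_0$, $k$ and $d$ are assumed for illustration only.

<code python>
import math

lnL_FM = -83.1   # assumed maximum log-likelihood of the full model
lnL_0 = -93.7    # assumed maximum log-likelihood of the null model
k = 2            # assumed number of parameters in the model
d = 30           # assumed number of observed failure events

AIC = -2 * lnL_FM + 2 * k
AICc = AIC + (2 * k * (k + 1)) / (d - k - 1)
BIC = -2 * lnL_FM + k * math.log(d)

R2_pseudo = 1 - lnL_FM / lnL_0                                 # McFadden
R2_cox_snell = 1 - math.exp(-(2 / d) * (lnL_FM - lnL_0))       # Cox-Snell
R2_nagelkerke = R2_cox_snell / (1 - math.exp((2 / d) * lnL_0)) # Nagelkerke

print(f"AIC = {AIC:.2f}, AICc = {AICc:.2f}, BIC = {BIC:.2f}")
print(f"McFadden = {R2_pseudo:.3f}, Cox-Snell = {R2_cox_snell:.3f}, "
      f"Nagelkerke = {R2_nagelkerke:.3f}")
</code>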

The basic tool for the evaluation of the significance of all variables in the model is the Likelihood Ratio test. The test verifies the hypothesis:

\begin{array}{cc}
\mathcal{H}_0: & \textrm{all }\beta_i=0,\\
\mathcal{H}_1: & \textrm{there is }\beta_i\neq0.
\end{array}

The test statistic has the form presented below:

\begin{displaymath}
\chi^2=-2\ln(L_0/L_{FM})=-2\ln(L_0)-(-2\ln(L_{FM})).
\end{displaymath}

This statistic has, asymptotically (for large sample sizes), the Chi-square distribution with $k$ degrees of freedom.

The p-value, determined on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}
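
A minimal sketch of the likelihood ratio test is given below; the log-likelihood values and the number of estimated coefficients $k$ are assumed for illustration only.

<code python>
from scipy import stats

lnL_FM = -83.1   # assumed maximum log-likelihood of the full model
lnL_0 = -93.7    # assumed maximum log-likelihood of the null model
k = 2            # assumed number of estimated coefficients

chi2_stat = -2 * (lnL_0 - lnL_FM)         # equals -2 ln(L0 / LFM)
p_value = stats.chi2.sf(chi2_stat, df=k)  # Chi-square distribution with k degrees of freedom

print(f"chi2 = {chi2_stat:.3f}, df = {k}, p = {p_value:.4f}")
</code>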

Hypotheses for the area under the ROC curve ($AUC$):

\begin{array}{cl}
\mathcal{H}_0: & AUC=0.5, \\
\mathcal{H}_1: & AUC\neq 0.5.
\end{array}

The test statistic has the form:

\begin{displaymath}
Z=\frac{AUC-0.5}{SE_{0.5}},
\end{displaymath}

where:

$SE_{0.5}$ - the standard error of the area ($AUC$).

The statistic $Z$ has, asymptotically (for large sample sizes), a normal distribution.

The p-value, determined on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}
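
The sketch below evaluates the $Z$ statistic for an assumed $AUC$ value and group sizes; $SE_{0.5}$ is computed here from the Hanley-McNeil formula evaluated under the null hypothesis ($AUC=0.5$), which is one common choice and may differ from the exact formula used by a given package.

<code python>
import math
from scipy import stats

auc = 0.78  # assumed area under the ROC curve
n1 = 25     # assumed number of observations with the event
n2 = 30     # assumed number of observations without the event

# Hanley-McNeil standard error of the area under H0 (AUC = 0.5), where Q1 = Q2 = 1/3
se_05 = math.sqrt((0.25 + (n1 + n2 - 2) / 12) / (n1 * n2))

z = (auc - 0.5) / se_05
p_value = 2 * stats.norm.sf(abs(z))  # two-sided p-value

print(f"Z = {z:.3f}, p = {p_value:.4f}")
</code>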

In addition, for the ROC curve a proposed cut-off value of the combination of independent variables and model parameters is given.
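
As a sketch of how such a cut-off can be chosen, the code below selects the threshold of the linear predictor that maximises Youden's $J$ index; both the simulated data and the use of Youden's index are assumptions made for illustration, as the rule used by the software may differ.

<code python>
import numpy as np

rng = np.random.default_rng(0)
# assumed values of the linear combination of independent variables and
# model parameters (the linear predictor), together with event indicators
lin_pred = np.concatenate([rng.normal(0.0, 1.0, 40), rng.normal(1.2, 1.0, 40)])
event = np.concatenate([np.zeros(40, dtype=bool), np.ones(40, dtype=bool)])

# sensitivity and specificity for every candidate cut-off of the predictor
cutoffs = np.unique(lin_pred)
sens = np.array([(lin_pred[event] >= c).mean() for c in cutoffs])
spec = np.array([(lin_pred[~event] < c).mean() for c in cutoffs])

best = np.argmax(sens + spec - 1)  # Youden's J index
print(f"proposed cut-off: {cutoffs[best]:.3f}")
</code>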

EXAMPLE cont. (remissionLeukemia.pqs file)

1)
Volinsky C.T., Raftery A.E. (2000), Bayesian information criterion for censored survival models. Biometrics, 56(1):256–262