en:statpqpl:wielowympl:logistpl:werpl

Model verification

Statistical significance of particular variables in the model (significance of the Odds Ratio)

On the basis of the coefficient and its error of estimation we can infer if the independent variable for which the coefficient was estimated has a significant effect on the dependent variable. For that purpose we use Wald test.

Hypotheses:

$\begin{array}{cc} \mathcal{H}_0: & \beta_i=0,\\ \mathcal{H}_1: & \beta_i\ne 0. \end{array}$

or, equivalently:

$\begin{array}{cc} \mathcal{H}_0: & OR_i=1,\\ \mathcal{H}_1: & OR_i\ne 1. \end{array}$

The Wald test statistics is calculated according to the formula:

$\begin{displaymath} \chi^2=\left(\frac{b_i}{SE_{b_i}}\right)^2 \end{displaymath}$

The statistic asymptotically (for large sizes) has the Chi-square distribution with $1$ degree of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$ :

$\begin{array}{ccl} $ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ \mathcal{H}_1, \\ $ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\ \end{array}$

The quality of the constructed model

A good model should fulfill two basic conditions: it should fit well and be possibly simple. The quality of multiple linear regression can be evaluated can be evaluated with a few general measures based on: $L_{FM}$ - the maximum value of likelihood function of a full model (with all variables),

$L_0$ - the maximum value of the likelihood function of a model which only contains one free word,

$n$ - the sample size.

Information criteria are based on the information entropy carried by the model (model insecurity), i.e. they evaluate the lost information when a given model is used to describe the studied phenomenon. We should, then, choose the model with the minimum value of a given information criterion.

$AIC$ , $AICc$ , and $BIC$ is a kind of a compromise between the good fit and complexity. The second element of the sum in formulas for information criteria (the so-called penalty function) measures the simplicity of the model. That depends on the number of variables ( $k$ ) in the model and the sample size ( $n$ ). In both cases the element grows with the increase of the number of variables and the growth is the faster the smaller the number of observations. The information criterion, however, is not an absolute measure, i.e. if all the compared models do not describe reality well, there is no use looking for a warning in the information criterion.

Akaike information criterion

$\begin{displaymath} AIC=-2\ln L_{FM}+2k, \end{displaymath}$

It is an asymptomatic criterion, appropriate for large sample sizes.

Corrected Akaike information criterion

$\begin{displaymath} AICc=AIC+\frac{2k(k+1)}{n-k-1}, \end{displaymath}$

Because the correction of the Akaike information criterion concerns the sample size it is the recommended measure (also for smaller sizes).

Bayesian information criterion or Schwarz criterion

$\begin{displaymath} BIC=-2\ln L_{FM}+k\ln(n), \end{displaymath}$

Just like the corrected Akaike criterion it takes into account the sample size.

Pseudo R $^2$ - the so-called McFadden R $^2$ is a goodness of fit measure of the model (an equivalent of the coefficient of multiple determination $R^2$ defined for multiple linear regression).

The value of that coefficient falls within the range of $<0; 1)$ , where values close to 1 mean excellent goodness of fit of a model, $0$ - a complete lack of fit Coefficient $R^2_{Pseudo}$ is calculated according to the formula:

$\begin{displaymath} R^2_{Pseudo}=1-\frac{\ln L_{FM}}{\ln L_0}. \end{displaymath}$

As coefficient $R^2_{Pseudo}$ never assumes value 1 and is sensitive to the amount of variables in the model, its corrected value is calculated:

$\begin{displaymath} R^2_{Nagelkerke}=\frac{1-e^{-(2/n)(\ln L_{FM}-\ln L_0)}}{1-e^{(2/n)\ln L_0}} \quad \textrm{lub}\quad R^2_{Cox-Snell}=1-e^{\frac{(-2\ln L_0)-(-2\ln L_{FM})}{n}}. \end{displaymath}$

Statistical significance of all variables in the model

The basic tool for the evaluation of the significance of all variables in the model is the Likelihood Ratio test. The test verifies the hypothesis:

$\begin{array}{cc} \mathcal{H}_0: & \textrm{all }\beta_i=0,\\ \mathcal{H}_1: & \textrm{there is }\beta_i\neq0. \end{array}$

The test statistic has the form presented below:

$\begin{displaymath} \chi^2=-2\ln(L_0/L_{FM})=-2\ln(L_0)-(-2\ln(L_{FM})). \end{displaymath}$

The statistic asymptotically (for large sizes) has the Chi-square distribution with $k$ degrees of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$ :

Hosmer-Lemeshow test - The test compares, for various subgroups of data, the observed rates of occurrence of the distinguished value $O_g$ and the predicted probability $E_g$ . If $O_g$ and $E_g$ are close enough then one can assume that an adequate model has been built.

For the calculation the observations are first divided into $G$ subgroups - usually deciles ( $G=10$ ).

Hypotheses:

$\begin{array}{cl} \mathcal{H}_0: & O_g=E_g $ for all categories,$\\ \mathcal{H}_1: & O_g \neq E_g $ for at least one category.$ \end{array}$

The test statistic has the form presented below:

$\begin{displaymath} H=\sum_{g=1}^G\frac{(O_g-E_g)^2}{E_g(1-\frac{E_g}{N_g})}, \end{displaymath}$

where:

$N_g$ - the number of observations in group $g$ .

The statistic asymptotically (for large sizes) has the Chi-square distribution with $G-2$ degrees of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$ :

AUC - the area under the ROC curve - The ROC curve built on th ebasis of the value of the dependent variable, and the predicted probability of dependent variable $P$ , allows to evaluate the ability of the constructed logistic regression model to classify the cases into two groups: (1) and (0). The constructed curve, especially the area under the curve, presents the classification quality of the model. When the ROC curve overlaps with the diagonal $y = x$ , then the decision about classifying a case within a given class (1) or (0), made on the basis of the model, is as good as a random division of the studied cases into the groups. The classification quality of a model is good when the curve is much above the diagonal $y = x$ , that is when the area under the ROC curve is much larger than the area under the $y = x$ line, i.e. it is greater than $0.5$

Hypotheses:

$\begin{array}{cl} \mathcal{H}_0: & AUC=0.5, \\ \mathcal{H}_1: & AUC\neq 0.5. \end{array}$

The test statistic has the form presented below:

$\begin{displaymath} Z=\frac{AUC-0.5}{SE_{0.5}}, \end{displaymath}$

where:

$SE_{0.5}$ - area error.

Statistics $Z$ asymptotically (for large sizes) has the normal distribution.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$ :

Additionally, for ROC curve the suggested value of the cut-off point of the predicted probability is given, together with the table of sensitivity and specificity for each possible cut-off point.

Note!

More possibilities of calculating a cut-off point are offered by module **ROC curve**. The analysis is made on the basis of observed values and predicted probability obtained in the analysis of Logistic Regression.

Classification

On the basis of the selected cut-off point of predicted probability we can change the classification quality. By default the cut-off point has the value of 0.5. The user can change the value into any value from the range of $(0.1)$ , e.g. the value suggested by the ROC curve.

As a result we shall obtain the classification table and the percentage of properly classified cases, the percentage of properly classified (0) - specificity, and the percentage of properly classified (1) - sensitivity.