Model verification

On the basis of the coefficient and its error of estimation we can infer if the independent variable for which the coefficient was estimated has a significant effect on the dependent variable. For that purpose we use Wald test.

Hypotheses:

\begin{array}{cc}
\mathcal{H}_0: & \beta_i=0,\\
\mathcal{H}_1: & \beta_i\ne 0.
\end{array}

or, equivalently:

\begin{array}{cc}
\mathcal{H}_0: & OR_i=1,\\
\mathcal{H}_1: & OR_i\ne 1.
\end{array}

The Wald test statistics is calculated according to the formula:

\begin{displaymath}
\chi^2=\left(\frac{b_i}{SE_{b_i}}\right)^2
\end{displaymath}

The statistic asymptotically (for large sizes) has the Chi-square distribution with $1$ degree of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

A good model should fulfill two basic conditions: it should fit well and be possibly simple. The quality of multiple linear regression can be evaluated can be evaluated with a few general measures based on: $L_{FM}$ - the maximum value of likelihood function of a full model (with all variables),

$L_0$ - the maximum value of the likelihood function of a model which only contains one free word,

$n$ - the sample size.

  • Information criteria are based on the information entropy carried by the model (model insecurity), i.e. they evaluate the lost information when a given model is used to describe the studied phenomenon. We should, then, choose the model with the minimum value of a given information criterion.

$AIC$, $AICc$, and $BIC$ is a kind of a compromise between the good fit and complexity. The second element of the sum in formulas for information criteria (the so-called penalty function) measures the simplicity of the model. That depends on the number of variables ($k$) in the model and the sample size ($n$). In both cases the element grows with the increase of the number of variables and the growth is the faster the smaller the number of observations. The information criterion, however, is not an absolute measure, i.e. if all the compared models do not describe reality well, there is no use looking for a warning in the information criterion.

  • Akaike information criterion

\begin{displaymath}
AIC=-2\ln L_{FM}+2k,
\end{displaymath}

It is an asymptomatic criterion, appropriate for large sample sizes.

  • Corrected Akaike information criterion

\begin{displaymath}
AICc=AIC+\frac{2k(k+1)}{n-k-1},
\end{displaymath}

Because the correction of the Akaike information criterion concerns the sample size it is the recommended measure (also for smaller sizes).

  • Bayesian information criterion or Schwarz criterion

\begin{displaymath}
BIC=-2\ln L_{FM}+k\ln(n),
\end{displaymath}

Just like the corrected Akaike criterion it takes into account the sample size.

  • Pseudo R$^2$ - the so-called McFadden R$^2$ is a goodness of fit measure of the model (an equivalent of the coefficient of multiple determination $R^2$ defined for multiple linear regression).

The value of that coefficient falls within the range of $<0; 1)$, where values close to 1 mean excellent goodness of fit of a model, $0$ - a complete lack of fit Coefficient $R^2_{Pseudo}$ is calculated according to the formula:

\begin{displaymath}
R^2_{Pseudo}=1-\frac{\ln L_{FM}}{\ln L_0}.
\end{displaymath}

As coefficient $R^2_{Pseudo}$ never assumes value 1 and is sensitive to the amount of variables in the model, its corrected value is calculated:

\begin{displaymath}
R^2_{Nagelkerke}=\frac{1-e^{-(2/n)(\ln L_{FM}-\ln L_0)}}{1-e^{(2/n)\ln L_0}} \quad \textrm{lub}\quad R^2_{Cox-Snell}=1-e^{\frac{(-2\ln L_0)-(-2\ln L_{FM})}{n}}.
\end{displaymath}

The basic tool for the evaluation of the significance of all variables in the model is the Likelihood Ratio test. The test verifies the hypothesis:

\begin{array}{cc}
\mathcal{H}_0: & \textrm{all }\beta_i=0,\\
\mathcal{H}_1: & \textrm{there is }\beta_i\neq0.
\end{array}

The test statistic has the form presented below:

\begin{displaymath}
\chi^2=-2\ln(L_0/L_{FM})=-2\ln(L_0)-(-2\ln(L_{FM})).
\end{displaymath}

The statistic asymptotically (for large sizes) has the Chi-square distribution with $k$ degrees of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

For the calculation the observations are first divided into $G$ subgroups - usually deciles ($G=10$).

Hypotheses:

$
\begin{array}{cl}
\mathcal{H}_0: & O_g=E_g $ for all categories,$\\
\mathcal{H}_1: & O_g \neq E_g $ for at least one category.$
\end{array}
$

The test statistic has the form presented below:

\begin{displaymath}
H=\sum_{g=1}^G\frac{(O_g-E_g)^2}{E_g(1-\frac{E_g}{N_g})},
\end{displaymath}

where:

$N_g$ - the number of observations in group $g$.

The statistic asymptotically (for large sizes) has the Chi-square distribution with $G-2$ degrees of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

Hypotheses:

\begin{array}{cl}
\mathcal{H}_0: & AUC=0.5, \\
\mathcal{H}_1: & AUC\neq 0.5.
\end{array}

The test statistic has the form presented below:

\begin{displaymath}
Z=\frac{AUC-0.5}{SE_{0.5}},
\end{displaymath}

where:

$SE_{0.5}$ - area error.

Statistics $Z$ asymptotically (for large sizes) has the normal distribution.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

Additionally, for ROC curve the suggested value of the cut-off point of the predicted probability is given, together with the table of sensitivity and specificity for each possible cut-off point.

Note!

More possibilities of calculating a cut-off point are offered by module **ROC curve**. The analysis is made on the basis of observed values and predicted probability obtained in the analysis of Logistic Regression.

On the basis of the selected cut-off point of predicted probability we can change the classification quality. By default the cut-off point has the value of 0.5. The user can change the value into any value from the range of $(0.1)$, e.g. the value suggested by the ROC curve.

As a result we shall obtain the classification table and the percentage of properly classified cases, the percentage of properly classified (0) - specificity, and the percentage of properly classified (1) - sensitivity.