On the basis of a coefficient $b_i$ and its standard error of estimation $SE(b_i)$ we can infer whether the independent variable for which the coefficient was estimated has a significant effect on the dependent variable. For that purpose we use the t-test.
Hypotheses:
$$H_0:\ \beta_i = 0, \qquad H_1:\ \beta_i \neq 0.$$
Let us estimate the test statistic according to the formula below:
$$t = \frac{b_i}{SE(b_i)}.$$
The test statistic has the t-Student distribution with $n-k-1$ degrees of freedom, where $n$ is the sample size and $k$ is the number of independent variables in the model.
The p-value, determined on the basis of the test statistic, is compared with the significance level $\alpha$:
if $p \le \alpha$, we reject $H_0$ and accept $H_1$;
if $p > \alpha$, there is no reason to reject $H_0$.
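To make this procedure concrete, below is a minimal sketch in Python (NumPy and SciPy; the data and the names x, y, b, se_b are illustrative, not taken from any particular package) that fits a one-variable least-squares model and tests its slope coefficient:

```python
import numpy as np
from scipy import stats

# Illustrative data: one independent variable plus an intercept.
rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=30)

n, k = len(y), 1                      # sample size, number of independent variables
X = np.column_stack([np.ones(n), x])  # design matrix with intercept column

# Least-squares coefficients b and residuals e = y - y_hat.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
df = n - k - 1                        # degrees of freedom of the t-test

# Standard errors of the coefficients from residual variance * (X'X)^-1.
sigma2 = e @ e / df
se_b = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# t statistic and two-sided p-value for the slope (index 1).
t = b[1] / se_b[1]
p = 2 * stats.t.sf(abs(t), df)
print(f"t = {t:.3f}, p = {p:.4f}")    # reject H0: beta_1 = 0 when p <= alpha
```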
The measure is based on the model residuals $e_i = y_i - \hat{y}_i$, that is, on the discrepancy between the actual values of the dependent variable in the sample ($y_i$) and the values of the dependent variable predicted by the constructed model ($\hat{y}_i$). It would be best if this difference were as close to zero as possible for every unit of the sample. Therefore, for the model to be well fitted, the standard error of estimation ($SE_e$), which estimates the standard deviation of the residuals,
$$SE_e = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n-k-1}},$$
should be as small as possible.
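A minimal sketch of this computation, with illustrative (made-up) actual and predicted values of the dependent variable:

```python
import numpy as np

# Assumed illustrative values: actual y, model predictions y_hat,
# n observations and k independent variables.
y = np.array([10.0, 12.0, 14.0, 13.0, 17.0, 19.0])
y_hat = np.array([10.5, 11.6, 14.2, 13.8, 16.5, 19.1])
n, k = len(y), 1

e = y - y_hat                              # residuals
se_e = np.sqrt(np.sum(e**2) / (n - k - 1)) # standard error of estimation
print(f"SE_e = {se_e:.3f}")                # smaller means a better-fitting model
```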
The multiple correlation coefficient $R$ assesses model adequacy. The value of that coefficient falls within the range $\langle 0; 1 \rangle$, where 1 means excellent model adequacy and 0 a complete lack of adequacy. The estimation is made using the following formula:
$$R = \sqrt{\frac{SS_M}{SS_T}},$$
where:
$SS_T = \sum_{i=1}^{n}(y_i - \bar{y})^2$ – total sum of squares,
$SS_M = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$ – the sum of squares explained by the model,
$SS_E = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ – residual sum of squares.
The coefficient of determination $R^2$ is estimated from the formula:
$$R^2 = \frac{SS_M}{SS_T} = 1 - \frac{SS_E}{SS_T}.$$
It expresses the percentage of the variability of the dependent variable explained by the model.
As the value of the $R^2$ coefficient depends on model adequacy but is also influenced by the number of variables in the model and by the sample size, in some situations it can be biased. That is why a corrected (adjusted) value of that parameter is estimated:
$$R^2_{adj} = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}.$$
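The sketch below, again with made-up data, computes $R^2$ and its adjusted version from the sums of squares defined above:

```python
import numpy as np

# Illustrative actual values and model predictions.
y = np.array([10.0, 12.0, 14.0, 13.0, 17.0, 19.0])
y_hat = np.array([10.5, 11.6, 14.2, 13.8, 16.5, 19.1])
n, k = len(y), 1

ss_t = np.sum((y - y.mean())**2)  # total sum of squares
ss_e = np.sum((y - y_hat)**2)     # residual sum of squares
ss_m = ss_t - ss_e                # sum of squares explained by the model

r2 = ss_m / ss_t
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {r2_adj:.3f}")
```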
The $AIC$, $AIC_c$, and $BIC$ are a kind of trade-off between goodness of fit and complexity. The second element of the sum in the information criteria formulas (the so-called loss or penalty function) measures the simplicity of the model. It depends on the number of variables in the model ($k$) and the sample size ($n$). In both cases this element increases as the number of variables increases, and the increase is faster the smaller the number of observations.

The information criterion, however, is not an absolute measure: if all the compared models describe reality poorly, there is no point in looking to the information criterion for a warning.
Akaike information criterion
$$AIC = n \ln\!\left(\frac{SS_E}{n}\right) + 2(k+1) + \mathrm{const},$$
where the constant can be omitted because it is the same in each of the compared models.
This is an asymptotic criterion, suitable for large samples, i.e. when the number of observations is large relative to the number of estimated parameters (a common rule of thumb is $n/(k+1) > 40$). For small samples, it tends to favor models with a large number of variables.
Example of interpreting a comparison of AIC values
Suppose we determined the AIC for three models: $AIC_1 = 100$, $AIC_2 = 101.4$, $AIC_3 = 110$. Then the relative reliability of a model can be determined. This reliability is relative because it is determined with respect to another model, usually the one with the smallest AIC value. We determine it according to the formula:
$$\exp\!\left(\frac{AIC_{min} - AIC_i}{2}\right).$$
Comparing model 2 to model 1, we can say that the probability that it minimizes the information loss is about half the probability that model 1 does so (specifically, $\exp((100 - 101.4)/2) \approx 0.497$). Comparing model 3 to model 1, we can say that the probability that it minimizes the information loss is only a small fraction of the probability that model 1 does so (specifically, $\exp((100 - 110)/2) \approx 0.007$).
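The same arithmetic can be written as a short sketch (the AIC values are those of the example above):

```python
import math

# AIC values from the example; model 1 has the smallest AIC.
aic = {1: 100.0, 2: 101.4, 3: 110.0}
aic_min = min(aic.values())

# Relative reliability of each model vs. the AIC-minimizing model.
for m, a in aic.items():
    print(f"model {m}: {math.exp((aic_min - a) / 2):.3f}")
# -> model 2: ~0.497, model 3: ~0.007
```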
Corrected Akaike information criterion
The correction of Akaike's criterion relates to the sample size:
$$AIC_c = AIC + \frac{2(k+1)(k+2)}{n-k-2},$$
which makes this measure recommended also for small sample sizes.
Bayesian information criterion (or Schwarz criterion)
$$BIC = n \ln\!\left(\frac{SS_E}{n}\right) + (k+1)\ln(n) + \mathrm{const},$$
where the constant can be omitted because it is the same in each of the compared models.
Like the corrected Akaike criterion, the BIC takes the sample size into account.
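Below is a sketch that computes all three criteria from the residual sum of squares, using the least-squares forms given above with the constant dropped; the input values are made up:

```python
import numpy as np

# Assumed inputs: residual sum of squares SS_E, sample size n,
# number of independent variables k (so p = k + 1 estimated parameters).
ss_e, n, k = 42.7, 50, 3
p = k + 1

aic = n * np.log(ss_e / n) + 2 * p          # constant term omitted
aicc = aic + 2 * p * (p + 1) / (n - p - 1)  # small-sample correction
bic = n * np.log(ss_e / n) + p * np.log(n)  # penalty grows with ln(n)
print(f"AIC = {aic:.2f}, AICc = {aicc:.2f}, BIC = {bic:.2f}")
```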
MAE (mean absolute error), $MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$ – informs how much, on average, the realized values of the dependent variable deviate (in absolute value) from the forecasts.
MPE (mean percentage error), $MPE = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i - \hat{y}_i}{y_i}\cdot 100\%$ – informs what percentage of the realized values of the dependent variable the forecast errors constitute, on average.
MAPE (mean absolute percentage error), $MAPE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|\cdot 100\%$ – informs about the average size of the forecast errors expressed as a percentage of the actual values of the dependent variable. MAPE makes it possible to compare the accuracy of forecasts obtained from different models.
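A minimal sketch of the three forecast-accuracy measures, with made-up actual values and forecasts:

```python
import numpy as np

# Illustrative actual values and forecasts.
y = np.array([100.0, 120.0, 140.0, 160.0])
y_hat = np.array([98.0, 125.0, 133.0, 162.0])

mae = np.mean(np.abs(y - y_hat))               # same units as y
mpe = np.mean((y - y_hat) / y) * 100           # signed, in percent
mape = np.mean(np.abs((y - y_hat) / y)) * 100  # absolute, in percent
print(f"MAE = {mae:.2f}, MPE = {mpe:.2f}%, MAPE = {mape:.2f}%")
```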
The basic tool for the evaluation of the significance of all variables in the model is the analysis of variance test (the F-test). The test simultaneously verifies 3 equivalent hypotheses:
$H_0$: all partial regression coefficients are zero, $\beta_1 = \beta_2 = \ldots = \beta_k = 0$;
$H_0$: the coefficient of determination is zero, $R^2 = 0$;
$H_0$: there is no linear relationship between the dependent variable and the set of independent variables;
against the alternative $H_1$ that at least one of these null statements does not hold.
The test statistic has the form presented below:
$$F = \frac{MS_M}{MS_E},$$
where:
$MS_M = \frac{SS_M}{df_M}$ – the mean square explained by the model,
$MS_E = \frac{SS_E}{df_E}$ – the residual mean square,
$df_M = k$, $df_E = n - k - 1$ – the appropriate degrees of freedom.
That statistic follows the F-Snedecor distribution with $df_M$ and $df_E$ degrees of freedom.
The p-value, determined on the basis of the test statistic, is compared with the significance level $\alpha$:
if $p \le \alpha$, we reject $H_0$ and accept $H_1$;
if $p > \alpha$, there is no reason to reject $H_0$.
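A minimal sketch of the F-test computation, assuming illustrative sums of squares from a fitted model:

```python
from scipy import stats

# Assumed inputs: explained and residual sums of squares (made up),
# sample size n and number of independent variables k.
ss_m, ss_e = 380.0, 120.0
n, k = 50, 3

df_m, df_e = k, n - k - 1
ms_m, ms_e = ss_m / df_m, ss_e / df_e  # mean squares

f = ms_m / ms_e
p = stats.f.sf(f, df_m, df_e)          # right-tail p-value
print(f"F = {f:.3f}, p = {p:.5f}")     # reject H0 when p <= alpha
```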
EXAMPLE (publisher.pqs file)