The Kruskal-Wallis ANOVA

The Kruskal-Wallis one-way analysis of variance by ranks (Kruskal 1952 1); Kruskal and Wallis 1952 2)) is an extension of the Mann-Whitney U test to more than two populations. This test is used to verify the hypothesis that there is no shift in the compared distributions, i.e., most often that there are no significant differences between the medians of the analysed variable in the ($k\geq2$) populations (it must be assumed, however, that the variable distributions are of similar shape; the equality of rank variances can be checked with Conover's rank test).

Hypotheses:

\begin{array}{cl}
\mathcal{H}_0: & \phi_1=\phi_2=...=\phi_k,\\
\mathcal{H}_1: & $not all $\phi_j$ are equal $(j=1,2,...,k)$,$
\end{array}

where:

$\phi_1,\phi_2,...,\phi_k$ – distributions of the analysed variable in each of the compared populations.

The test statistic is defined by:

\begin{displaymath}
H=\frac{1}{C}\left(\frac{12}{N(N+1)}\sum_{j=1}^k\left(\frac{\left(\sum_{i=1}^{n_j}R_{ij}\right)^2}{n_j}\right)-3(N+1)\right),
\end{displaymath}

where:

$N=\sum_{j=1}^k n_j$,

$n_j$ – sample sizes $(j=1,2,...k)$,

$R_{ij}$ – ranks assigned to the values of the variable, for $(i=1,2,...n_j)$, $(j=1,2,...k)$,

$\displaystyle C=1-\frac{\sum(t^3-t)}{N^3-N}$ – correction for ties,

$t$ – number of cases included in a tie.

The formula for the test statistic $H$ includes the correction for ties $C$. This correction is applied when ties occur (if there are no ties, the correction is not calculated, since then $C=1$).

The $H$ statistic asymptotically (for large sample sizes) follows the Chi-square distribution with the number of degrees of freedom given by $df = k - 1$.

The p-value, determined on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ \mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}
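For readers who want to check the calculations by hand, the sketch below (an addition to this description, not part of the program) computes the $H$ statistic with the tie correction $C$ and the asymptotic p-value in Python using numpy and scipy; the three samples are made up purely to illustrate the arithmetic.

<code python>
import numpy as np
from scipy import stats

def kruskal_wallis_h(*groups):
    """H statistic with the tie correction C, following the formula above."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(groups)
    N = pooled.size
    ranks = stats.rankdata(pooled)          # mid-ranks are assigned to ties

    # 12/(N(N+1)) * sum_j (sum of ranks in group j)^2 / n_j  -  3(N+1)
    h, start = 0.0, 0
    for g in groups:
        r_j = ranks[start:start + g.size]
        h += r_j.sum() ** 2 / g.size
        start += g.size
    h = 12.0 / (N * (N + 1)) * h - 3 * (N + 1)

    # correction for ties: C = 1 - sum(t^3 - t) / (N^3 - N)
    _, t = np.unique(pooled, return_counts=True)
    C = 1.0 - np.sum(t ** 3 - t) / (N ** 3 - N)
    H = h / C

    # asymptotic p-value from the Chi-square distribution with k-1 df
    p = stats.chi2.sf(H, df=len(groups) - 1)
    return H, p

# three made-up samples (with ties, so that C < 1)
a = [27, 30, 31, 29, 28]
b = [32, 35, 33, 36, 34]
c = [25, 29, 27, 26, 30]

H, p = kruskal_wallis_h(a, b, c)
print(f"H = {H:.4f}, p = {p:.4f}")
print(stats.kruskal(a, b, c))   # scipy's implementation should agree
</code>

Because scipy.stats.kruskal applies the same tie correction, both printed statistics should agree.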

The POST-HOC tests

An introduction to contrasts and POST-HOC tests is given in the unit on one-way analysis of variance.

The Dunn test

For simple comparisons of both equal-size and unequal-size groups.

The Dunn test (Dunn 1964 3)) includes a correction for tied ranks (Zar 2010 4)) and is corrected for multiple testing. The Bonferroni or Sidak correction is most commonly used here, although other, newer corrections are also available; they are described in more detail in Multiple comparisons.

Example - simple comparisons (comparing 2 selected median / mean ranks with each other):

\begin{array}{cc}
\mathcal{H}_0: & \theta_j=\theta_{j+1},\\
\mathcal{H}_1: & \theta_j \neq \theta_{j+1}.
\end{array}

\begin{displaymath}
CD=Z_{\frac{\alpha}{c}}\sqrt{\frac{N(N+1)}{12}\left(\sum_{j=1}^k \frac{c_j^2}{n_j}\right)},
\end{displaymath}

where:

$\displaystyle Z_{\frac{\alpha}{c}}$ – the critical value (statistic) of the normal distribution for a given significance level $\alpha$, corrected for the number of possible simple comparisons $c$.

\begin{displaymath}
Z=\frac{\sum_{j=1}^k c_j\overline{R}_j}{\sqrt{\frac{N(N+1)}{12}\left(\sum_{j=1}^k \frac{c_j^2}{n_j}\right)}},
\end{displaymath}

where:

$\overline{R}_j$ – mean of the ranks of the $j$-th group, for $(j=1,2,...k)$,

The formula for the test statistic $Z$ includes a correction for tied ranks. This correction is applied when tied ranks are present (when there are no tied ranks this correction is not calculated because $\sum(t^3-t)=0$).

The test statistic asymptotically (for large sample sizes) follows the normal distribution, and the p-value is corrected for the number of possible simple comparisons $c$.
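The following sketch (again an addition for illustration, not part of the program) computes Dunn's $Z$ for a single pair of groups and applies a Bonferroni correction to the p-value, which is equivalent to comparing $Z$ against $Z_{\frac{\alpha}{c}}$; the tie adjustment of the variance term uses the usual Dunn (1964) form, and the data are the same made-up samples as before.

<code python>
import numpy as np
from scipy import stats

def dunn_pairwise(groups, i, j):
    """Dunn's Z for the simple comparison of groups i and j,
    with a Bonferroni-corrected two-sided p-value."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(groups)
    N = pooled.size
    ranks = stats.rankdata(pooled)

    # mean rank and size of every group
    mean_ranks, sizes, start = [], [], 0
    for g in groups:
        mean_ranks.append(ranks[start:start + g.size].mean())
        sizes.append(g.size)
        start += g.size

    # variance term N(N+1)/12, reduced for ties (usual Dunn adjustment)
    _, t = np.unique(pooled, return_counts=True)
    var = N * (N + 1) / 12.0 - np.sum(t ** 3 - t) / (12.0 * (N - 1))

    z = (mean_ranks[i] - mean_ranks[j]) / np.sqrt(var * (1.0 / sizes[i] + 1.0 / sizes[j]))

    k = len(groups)
    n_comparisons = k * (k - 1) // 2          # number of possible simple comparisons c
    p = min(1.0, 2 * stats.norm.sf(abs(z)) * n_comparisons)
    return z, p

# made-up data reused from the previous sketch
a = [27, 30, 31, 29, 28]
b = [32, 35, 33, 36, 34]
c = [25, 29, 27, 26, 30]
print(dunn_pairwise([a, b, c], 0, 1))   # compares the first and second sample
</code>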

The Conover-Iman test

The non-parametric equivalent of Fisher's LSD 5), used for simple comparisons of both equal-size and unequal-size groups.

\begin{displaymath}
CD=\sqrt{F_{\alpha,1,N-k}}\cdot\sqrt{S^2\frac{N-1-H}{N-k}\sum_{j=1}^k \frac{c_j^2}{n_j}},
\end{displaymath}

where:

$\displaystyle S^2=\frac{1}{N-1}\left(\sum_{j=1}^k\sum_{i=1}^{n_j}R_{ij}^2-N\frac{(N+1)^2}{4}\right)$

$\displaystyle F_{\alpha,1,N-k}$ – the critical value (statistic) of Snedecor's F distribution for a given significance level $\alpha$ and degrees of freedom 1 and $N-k$, respectively.

\begin{displaymath}
t=\frac{\sum_{j=1}^k c_j\overline{R}_j}{\sqrt{S^2\frac{N-1-H}{N-k}\sum_{j=1}^k \frac{c_j^2}{n_j}}},
\end{displaymath}

where:

$\overline{R}_j$ – mean of the ranks of the $j$-th group, for $(j=1,2,...k)$.

This statistic follows Student's t-distribution with $N-k$ degrees of freedom.
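A matching sketch for the Conover-Iman comparison of two groups is given below (also an addition for illustration); it follows the $t$ formula above with contrast coefficients $1$ and $-1$, takes the tie-corrected $H$ from scipy, and returns an unadjusted two-sided p-value from Student's t-distribution with $N-k$ degrees of freedom. The data are again made up.

<code python>
import numpy as np
from scipy import stats

def conover_iman_pairwise(groups, i, j):
    """Conover-Iman t statistic for the simple comparison of groups i and j,
    with an unadjusted two-sided p-value from Student's t with N - k df."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(groups)
    N, k = pooled.size, len(groups)
    ranks = stats.rankdata(pooled)

    # mean rank and size of every group
    mean_ranks, sizes, start = [], [], 0
    for g in groups:
        mean_ranks.append(ranks[start:start + g.size].mean())
        sizes.append(g.size)
        start += g.size

    # tie-corrected Kruskal-Wallis H (scipy applies the same correction C)
    H = stats.kruskal(*groups).statistic

    # S^2 = (sum of squared ranks - N(N+1)^2/4) / (N-1)
    S2 = (np.sum(ranks ** 2) - N * (N + 1) ** 2 / 4.0) / (N - 1)

    # contrast c_i = 1, c_j = -1, all other coefficients zero
    denom = np.sqrt(S2 * (N - 1 - H) / (N - k) * (1.0 / sizes[i] + 1.0 / sizes[j]))
    t = (mean_ranks[i] - mean_ranks[j]) / denom
    p = 2 * stats.t.sf(abs(t), df=N - k)
    return t, p

# made-up data reused from the earlier sketches
a = [27, 30, 31, 29, 28]
b = [32, 35, 33, 36, 34]
c = [25, 29, 27, 26, 30]
print(conover_iman_pairwise([a, b, c], 0, 1))
</code>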

The settings window with the Kruskal-Wallis ANOVA can be opened in Statistics menu → NonParametric tests → Kruskal-Wallis ANOVA or in ''Wizard''.

EXAMPLE (jobSatisfaction.pqs)

A group of 120 people was interviewed, for whom the occupation is their first job obtained after receiving appropriate education. The respondents rated their job satisfaction on a five-point scale, where:

1- unsatisfying job,

2- job giving little satisfaction,

3- job giving an average level of satisfaction,

4- job that gives a fairly high level of satisfaction,

5- job that is very satisfying.

We will test whether the level of reported job satisfaction is the same for each education category.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $the level of job satisfaction is the same for each education category,$\\
\mathcal{H}_1: & $at least one education category (one population)$\\
& $has a different level of job satisfaction.$
\end{array}$

The obtained value of p=0.001 indicates a significant difference in the level of satisfaction between the compared categories of education. Dunn's POST-HOC analysis with the Bonferroni correction shows that the significant differences are between those with primary and secondary education and between those with primary and tertiary education. Slightly more differences can be confirmed by selecting the stronger Conover-Iman POST-HOC test.

In the graph showing medians and quartiles we can see the homogeneous groups determined by the POST-HOC test. If we choose to present Dunn's results with the Bonferroni correction, we can see two homogeneous groups that are not completely distinct: group (a) – people who rate job satisfaction lower, and group (b) – people who rate job satisfaction higher. Vocational education belongs to both of these groups, which means that people with this education evaluate job satisfaction quite differently. The same description of the homogeneous groups can be found in the results of the POST-HOC tests.

We can provide a detailed description of the data by selecting descriptive statistics in the analysis window and indicating that counts and percentages should be added to the description.

We can also show the distribution of responses in a column plot.

1) , 2)
Kruskal W.H., Wallis W.A. (1952), Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47, 583–621
3)
Dunn O. J. (1964), Multiple comparisons using rank sums. Technometrics, 6, 241–252
4)
Zar J. H. (2010), Biostatistical Analysis (Fifth Edition). Pearson Education
5)
Conover W. J. (1999), Practical nonparametric statistics (3rd ed). John Wiley and Sons, New York