Kendall's coefficient of concordance and a test of its significance

Kendall's $\widetilde{W}$ coefficient of concordance is described in the works of Kendall and Babington-Smith (1939)1) and Wallis (1939)2). It is used when the results come from several sources (different raters) and concern a few ($k\geq2$) objects, and the concordance of those assessments needs to be evaluated. It is often used to measure the strength of interrater reliability, i.e. the degree of concordance among the raters' assessments.

Kendall's coefficient of concordance is calculated for data on an ordinal or an interval scale. Its value is given by the following formula:

\begin{displaymath}
\widetilde{W}=\frac{12U-3n^2k(k+1)^2}{n^2k(k^2-1)-nC},
\end{displaymath}

where:

$n$ – number of different sets of assessments (the number of raters),

$k$ – number of ranked objects,

$\displaystyle U=\sum_{j=1}^k\left(\sum_{i=1}^nR_{ij}\right)^2$,

$R_{ij}$ – ranks ascribed to the successive objects $(j=1,2,\ldots,k)$, independently by each rater $(i=1,2,\ldots,n)$,

$\displaystyle C=\sum(t^3-t)$ – a correction for ties,

$t$ – number of cases included in a tie.

The coefficient's formula includes $C$, the correction for ties. The correction matters only when ties occur; if there are no ties, $C=0$ and the denominator is left unchanged.
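The formula and its tie correction can be sketched in a few lines of Python. This is a minimal illustration, not the program's own implementation: the function name `kendall_w` and the input layout (one list of ranks per rater, with tied objects sharing a mid-rank) are assumptions made here, and $C$ is taken as the sum of $t^3-t$ over every group of tied ranks within every rater.

```python
from collections import Counter


def kendall_w(ranks):
    """Kendall's W for ranks[i][j] = rank given by rater i to object j.

    Hypothetical helper: ties are assumed to be entered as mid-ranks.
    """
    n = len(ranks)      # number of raters
    k = len(ranks[0])   # number of ranked objects

    # U: squared rank totals, summed over the k objects
    totals = [sum(row[j] for row in ranks) for j in range(k)]
    u = sum(s ** 2 for s in totals)

    # C: sum of t^3 - t over each group of tied ranks within each rater;
    # an untied rank has t = 1 and contributes 0
    c = sum(t ** 3 - t for row in ranks for t in Counter(row).values())

    numerator = 12 * u - 3 * n ** 2 * k * (k + 1) ** 2
    denominator = n ** 2 * k * (k ** 2 - 1) - n * c
    return numerator / denominator


# Made-up rankings of three objects by two raters
agree = [[1, 2, 3], [1, 2, 3]]            # identical rankings
oppose = [[1, 2, 3], [3, 2, 1]]           # reversed rankings
tied = [[1, 2.5, 2.5], [1, 2.5, 2.5]]     # identical rankings with a tie

print(kendall_w(agree), kendall_w(oppose), kendall_w(tied))
```

In the third case the two raters agree completely but tie two objects; without the $nC$ term in the denominator the coefficient would fall below 1, which is what the correction prevents.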

Note

$W$ – Kendall's coefficient in the population;

$\widetilde{W}$ – Kendall's coefficient in the sample.

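The value of $W$ lies in the interval $[0, 1]$ and should be interpreted as follows: values close to 1 indicate strong concordance of the raters' assessments, and values close to 0 indicate a lack of concordance.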

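Kendall's W coefficient of concordance vs. the Spearman coefficient: the average Spearman rank correlation coefficient $\bar{r}_s$, taken over all pairs of raters, is a linear function of $\widetilde{W}$: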

\begin{displaymath}
\bar{r}_s=\frac{n\widetilde{W}-1}{n-1}
\end{displaymath}
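The relation can be checked numerically. The sketch below assumes rankings without ties (so $C=0$ and the simple Spearman formula $1-6\sum d^2/(k(k^2-1))$ applies); the helper names and the three rankings are made up for illustration.

```python
from itertools import combinations


def kendall_w_no_ties(ranks):
    """Kendall's W for untied rankings (C = 0); hypothetical helper."""
    n, k = len(ranks), len(ranks[0])
    u = sum(sum(row[j] for row in ranks) ** 2 for j in range(k))
    return (12 * u - 3 * n ** 2 * k * (k + 1) ** 2) / (n ** 2 * k * (k ** 2 - 1))


def spearman_no_ties(a, b):
    """Spearman's rank correlation of two untied rankings."""
    k = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d2 / (k * (k ** 2 - 1))


def mean_spearman(ranks):
    """Average Spearman coefficient over all pairs of raters."""
    pairs = list(combinations(ranks, 2))
    return sum(spearman_no_ties(a, b) for a, b in pairs) / len(pairs)


# Made-up rankings of four objects by three raters
ranks = [[1, 2, 3, 4], [2, 1, 3, 4], [1, 3, 2, 4]]
n = len(ranks)

w = kendall_w_no_ties(ranks)
from_w = (n * w - 1) / (n - 1)   # the formula above
direct = mean_spearman(ranks)    # computed pair by pair

print(from_w, direct)            # the two values agree up to rounding error
```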

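Kendall's W coefficient of concordance vs. the Friedman ANOVA: both are based on the same model of ranks assigned within each rater, so the chi-square statistic of the test for Kendall's coefficient equals the chi-square statistic of the Friedman ANOVA computed on the same data.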

The chi-square test of significance for Kendall's coefficient of concordance

Basic assumptions: measurement on an ordinal or an interval scale.

Hypotheses:

\begin{displaymath}
\begin{array}{cl}
\mathcal{H}_0: &  W=0\\
\mathcal{H}_1: &  W\neq0
\end{array}
\end{displaymath}

The test statistic is defined by:

\begin{displaymath}
\chi^2=n(k-1)\widetilde{W}
\end{displaymath}

This statistic asymptotically (for large sample sizes) has the chi-square distribution with $df=k-1$ degrees of freedom.

The p-value, determined from the test statistic, is compared with the significance level $\alpha$:

\begin{displaymath}
\begin{array}{ccl}
\mbox{if } p \le \alpha & \Longrightarrow & \mbox{reject } \mathcal{H}_0 \mbox{ and accept } \mathcal{H}_1, \\
\mbox{if } p > \alpha & \Longrightarrow & \mbox{there is no reason to reject } \mathcal{H}_0. \\
\end{array}
\end{displaymath}
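The test procedure can be sketched as follows. This is an illustrative outline under stated assumptions rather than the program's implementation: `chi2_sf` and `kendall_w_test` are hypothetical names, the upper-tail probability is evaluated with closed-form expressions valid for integer degrees of freedom, and the input values ($\widetilde{W}=0.5$, $n=4$, $k=3$) are made up.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of the chi-square distribution.

    Hypothetical helper for integer df >= 1 and x >= 0.
    """
    h = x / 2.0
    if df % 2 == 0:
        # even df: exp(-h) times a finite sum of h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # odd df: start from erfc(sqrt(h)) and add the half-integer terms
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def kendall_w_test(w, n, k, alpha=0.05):
    """Return the statistic, df, p-value and the decision about H0."""
    stat = n * (k - 1) * w
    df = k - 1
    p = chi2_sf(stat, df)
    return stat, df, p, p <= alpha


# Made-up input: W = 0.5 obtained from 4 raters ranking 3 objects
stat, df, p, reject = kendall_w_test(0.5, n=4, k=3)
print(stat, df, round(p, 4), reject)
```

With these made-up values the statistic is 4.0 on 2 degrees of freedom, $p=e^{-2}\approx0.135>\alpha$, so there is no reason to reject $\mathcal{H}_0$. Because the chi-square distribution is only an asymptotic approximation, a p-value obtained this way for so few raters should be read with caution.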

The settings window with the test of Kendall's W significance can be opened in the Statistics menu → NonParametric tests → Kendall's W or in ''Wizard''.

EXAMPLE (judges.pqs file)

In the 6.0 system, dancing pairs are graded by 9 judges. The judges award points for, for example, artistic expression. They assess the dancing pairs without comparing them with one another and without placing them in a particular "podium place"; their marks nevertheless yield a ranking of the pairs for each judge. Let's check whether the judges' assessments are concordant.

\begin{tabular}{|c|c|c|c|c|c|c|}
\hline
Judges&Couple A&Couple B&Couple C&Couple D&Couple E&Couple F\\\hline
S1&3&6&2&5&4&1\\
S2&4&6&1&5&3&2\\
S3&4&6&2&5&3&1\\
S4&2&6&3&5&4&1\\
S5&2&6&1&5&4&3\\
S6&3&5&1&6&4&2\\
S7&5&4&1&6&3&2\\
S8&3&6&2&5&4&1\\
S9&2&6&3&5&4&1\\\hline
\end{tabular}

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & \mbox{a lack of concordance between the 9 judges' assessments}\\
& \mbox{in the population represented by the sample,}\\
\mathcal{H}_1: & \mbox{the 9 judges' assessments in the population represented}\\
& \mbox{by the sample are concordant.}
\end{array}$

Comparing $p<0.0001$ with the significance level $\alpha=0.05$, we conclude that the judges' assessments are statistically concordant. The concordance is strong: $\widetilde{W} = 0.83$, and so is the average Spearman's rank-order correlation coefficient: $\bar{r}_s = 0.81$. This result can be presented in a graph whose X-axis represents the successive judges. If the concordance were perfect, the lines would be parallel to the X-axis; the more the lines intersect, the weaker the concordance of the raters' evaluations.
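The coefficients reported for this example can be reproduced directly from the table. The sketch below is a plain recomputation with the formulas given earlier, not output from the program; the variable names are chosen here, and since no judge assigned tied ranks the correction $C$ is 0.

```python
# Ranks from the table: one row per judge, columns are couples A..F
judges = [
    [3, 6, 2, 5, 4, 1],  # S1
    [4, 6, 1, 5, 3, 2],  # S2
    [4, 6, 2, 5, 3, 1],  # S3
    [2, 6, 3, 5, 4, 1],  # S4
    [2, 6, 1, 5, 4, 3],  # S5
    [3, 5, 1, 6, 4, 2],  # S6
    [5, 4, 1, 6, 3, 2],  # S7
    [3, 6, 2, 5, 4, 1],  # S8
    [2, 6, 3, 5, 4, 1],  # S9
]

n = len(judges)      # 9 raters
k = len(judges[0])   # 6 ranked couples

# Rank totals per couple and U, the sum of their squares
totals = [sum(row[j] for row in judges) for j in range(k)]
u = sum(s ** 2 for s in totals)

# Kendall's W with C = 0 (no ties in the table)
w = (12 * u - 3 * n ** 2 * k * (k + 1) ** 2) / (n ** 2 * k * (k ** 2 - 1))

chi2 = n * (k - 1) * w           # test statistic, df = k - 1 = 5
mean_rs = (n * w - 1) / (n - 1)  # average Spearman coefficient

print(totals)                                     # [28, 51, 16, 47, 33, 14]
print(round(w, 2), round(mean_rs, 2), round(chi2, 2))  # 0.83 0.81 37.51
```

The rank totals also show where the agreement comes from: couples C and F collect the lowest ranks from almost every judge, while couples B and D collect the highest.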

1) Kendall M.G., Babington-Smith B. (1939), The problem of m rankings. Annals of Mathematical Statistics, 10, 275-287.
2) Wallis W.A. (1939), The correlation ratio for ranked data. Journal of the American Statistical Association, 34, 533-538.