PQStat - Knowledge Base

The Kendall's concordance coefficient and a test to examine its significance

The Kendall's $\widetilde{W}$ coefficient of concordance is described in the works of Kendall, Babington-Smith (1939)¹⁾ and Wallis (1939)²⁾. It is used when the results come from different sources (from different raters) and concern several ( $k\geq2$ ) objects, and the concordance of those assessments needs to be evaluated. It is often used to measure interrater reliability, that is, the degree to which the raters' assessments agree.

The Kendall's coefficient of concordance is calculated for data on an ordinal scale or an interval scale. Its value is calculated according to the following formula:

$\begin{displaymath} \widetilde{W}=\frac{12U-3n^2k(k+1)^2}{n^2k(k^2-1)-nC}, \end{displaymath}$

where:

$n$ – number of different sets of assessments (the number of raters),

$k$ – number of ranked objects,

$\displaystyle U=\sum_{j=1}^k\left(\sum_{i=1}^nR_{ij}\right)^2$ ,

$R_{ij}$ – ranks assigned to the successive objects $(j=1,2,...k)$ , independently by each rater $(i=1,2,...n)$ ,

$\displaystyle C=\sum(t^3-t)$ – a correction for ties,

$t$ – number of cases included in a tie.

The coefficient's formula includes $C$ – the correction for ties. This correction matters only when ties occur (if there are no ties, then $C=0$ and the correction has no effect).
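
The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than PQStat's implementation: the function name `kendall_w` and the sample data are hypothetical, and the input is assumed to hold ranks already (one row per rater, with tied objects sharing an average rank).

```python
from collections import Counter

def kendall_w(ranks):
    """Kendall's W for rank rows: n raters (rows) by k objects (columns)."""
    n = len(ranks)         # number of raters
    k = len(ranks[0])      # number of ranked objects
    # U: sum over objects of the squared rank totals
    totals = [sum(row[j] for row in ranks) for j in range(k)]
    u = sum(total * total for total in totals)
    # C: sum of (t^3 - t) over every group of equal ranks within each rater;
    # untied ranks have t = 1 and contribute 0
    c = sum(t**3 - t for row in ranks for t in Counter(row).values())
    return (12 * u - 3 * n**2 * k * (k + 1) ** 2) / (n**2 * k * (k**2 - 1) - n * c)

# Hypothetical data: 3 raters rank 4 objects; the third rater ties two objects at 2.5
example = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [1, 2.5, 2.5, 4],
]
print(round(kendall_w(example), 4))
```

When every rater gives the same untied ranking, the function returns 1, which matches the interpretation of perfect concordance.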

Note

$W$ – the Kendall's coefficient in a population;

$\widetilde{W}$ – the Kendall's coefficient in a sample.

The coefficient takes values in the range $W\in[0; 1]$ and should be interpreted in the following way:

$\widetilde{W}\approx1$ means a strong concordance in the raters' assessments;
$\widetilde{W}\approx0$ means a lack of concordance in the raters' assessments.

The Kendall's W coefficient of concordance vs. the Spearman coefficient:

When the Spearman $r_s$ correlation coefficient is calculated for all possible pairs of raters, the average of these coefficients, denoted by $\bar{r}_s$ , is a linear function of the $\widetilde{W}$ coefficient:

$\begin{displaymath} \bar{r}_s=\frac{n\widetilde{W}-1}{n-1} \end{displaymath}$
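
The relation can be checked numerically. The sketch below assumes untied ranks, so that Spearman's coefficient reduces to $1-6\sum d^2/(k(k^2-1))$; the helper names and the three-rater data set are hypothetical and serve only as an illustration.

```python
from itertools import combinations

def spearman_untied(a, b):
    """Spearman's r_s for two rankings of the same k objects (no ties assumed)."""
    k = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d2 / (k * (k**2 - 1))

def mean_pairwise_spearman(ranks):
    """Average r_s over all possible pairs of raters."""
    pairs = list(combinations(ranks, 2))
    return sum(spearman_untied(a, b) for a, b in pairs) / len(pairs)

def kendall_w_untied(ranks):
    """Kendall's W without the tie correction (C = 0)."""
    n, k = len(ranks), len(ranks[0])
    u = sum(sum(row[j] for row in ranks) ** 2 for j in range(k))
    return (12 * u - 3 * n**2 * k * (k + 1) ** 2) / (n**2 * k * (k**2 - 1))

# Hypothetical data: 3 raters rank 4 objects, no ties
ranks = [[1, 2, 3, 4], [1, 3, 2, 4], [2, 1, 3, 4]]
n = len(ranks)
w = kendall_w_untied(ranks)
print(mean_pairwise_spearman(ranks), (n * w - 1) / (n - 1))  # the two values should agree
```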

The Kendall's W coefficient of concordance vs. the Friedman ANOVA:

The Kendall's $\widetilde{W}$ coefficient of concordance and the Friedman ANOVA are based on the same mathematical model. As a result, the value of the chi-square test statistic for the Kendall's coefficient of concordance and the value of the chi-square test statistic for the Friedman ANOVA are the same.
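
The equality can be verified directly in the case without ties ( $C=0$ ). The short derivation below assumes the usual untied-rank form of the Friedman statistic, which is not given in this text, $\chi^2_F=\frac{12}{nk(k+1)}U-3n(k+1)$ , and the chi-square statistic for the Kendall's coefficient, $\chi^2=n(k-1)\widetilde{W}$ :

$\begin{displaymath} n(k-1)\widetilde{W}=n(k-1)\cdot\frac{12U-3n^2k(k+1)^2}{n^2k(k^2-1)}=\frac{12U-3n^2k(k+1)^2}{nk(k+1)}=\frac{12}{nk(k+1)}U-3n(k+1)=\chi^2_F. \end{displaymath}$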

The chi-square test of significance for the Kendall's coefficient of concordance

Basic assumptions:

measurement on an ordinal scale or on an interval scale.

Hypotheses:

$\begin{array}{cl} \mathcal{H}_0: & W=0\\ \mathcal{H}_1: & W\neq0 \end{array}$

The test statistic is defined by: $\begin{displaymath} \chi^2=n(k-1)\widetilde{W} \end{displaymath}$ Asymptotically (for large sample sizes), this statistic has the Chi-square distribution with $df=k-1$ degrees of freedom.

The p-value, determined on the basis of the test statistic, is compared with the significance level $\alpha$ :

$\begin{array}{ccl} $ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ \mathcal{H}_1, \\ $ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\ \end{array}$
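
The test procedure can be sketched as follows. This is an illustrative sketch using only the Python standard library, not PQStat's implementation: the function names are hypothetical, and the upper-tail probability of the Chi-square distribution is evaluated with the closed-form expressions that exist for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a Chi-square variable with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum exp(-y) * sum_{i < df/2} y^i / i!
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(df // 2))
    # Odd df: start from df = 1 (erfc) and add one term for each further 2 df
    tail = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return tail

def kendall_w_test(w, n, k, alpha=0.05):
    """Return (chi2, df, p, reject_h0) for a sample coefficient w."""
    chi2 = n * (k - 1) * w   # test statistic
    df = k - 1               # degrees of freedom
    p = chi2_sf(chi2, df)
    return chi2, df, p, p <= alpha

# Hypothetical input: W = 0.5 obtained from n = 4 raters and k = 3 objects
chi2, df, p, reject = kendall_w_test(0.5, n=4, k=3)
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p:.4f}, reject H0: {reject}")
```

Because the Chi-square distribution is only an asymptotic approximation here, the p-value from such a sketch should be read with caution for small numbers of raters and objects.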

The settings window for the Kendall's W significance test can be opened from the Statistics menu → NonParametric tests → Kendall's W, or through the ''Wizard''.

EXAMPLE (judges.pqs file)

In the 6.0 system, dancing couples are graded by 9 judges. The judges score, for example, artistic expression. They assess the dancing couples without comparing them with one another and without assigning them to a particular "podium place" (they create a ranking). Let's check whether the judges' assessments are concordant.

$\begin{tabular}{|c|c|c|c|c|c|c|} \hline Judges&Couple A&Couple B&Couple C&Couple D&Couple E&Couple F\\\hline S1&3&6&2&5&4&1\\ S2&4&6&1&5&3&2\\ S3&4&6&2&5&3&1\\ S4&2&6&3&5&4&1\\ S5&2&6&1&5&4&3\\ S6&3&5&1&6&4&2\\ S7&5&4&1&6&3&2\\ S8&3&6&2&5&4&1\\ S9&2&6&3&5&4&1\\\hline \end{tabular}$

Hypotheses:

$\begin{array}{cl} \mathcal{H}_0: & $a lack of concordance among the assessments of the 9 judges,$\\ & $in the population represented by the sample, $\\ \mathcal{H}_1: & $the assessments of the 9 judges in the population represented$\\ & $by the sample are concordant.$ \end{array}$
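
The coefficient and the test statistic for this example can be reproduced by hand from the table. A sketch of the arithmetic, with $n=9$ , $k=6$ and no ties ( $C=0$ ); the rank totals for couples A to F are 28, 51, 16, 47, 33 and 14:

$\begin{displaymath} U=28^2+51^2+16^2+47^2+33^2+14^2=7135, \end{displaymath}$

$\begin{displaymath} \widetilde{W}=\frac{12\cdot7135-3\cdot9^2\cdot6\cdot7^2}{9^2\cdot6\cdot(6^2-1)}=\frac{14178}{17010}\approx0.83, \end{displaymath}$

$\begin{displaymath} \bar{r}_s=\frac{9\cdot0.8335-1}{9-1}\approx0.81, \qquad \chi^2=9\cdot(6-1)\cdot0.8335\approx37.51, \quad df=5. \end{displaymath}$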

Comparing the p-value (p < 0.0001) with the significance level $\alpha=0.05$ , we conclude that the judges' assessments are statistically concordant. The strength of the concordance is high: $\widetilde{W} = 0.83$ , as is the average Spearman's rank-order correlation coefficient: $\bar{r}_s = 0.81$ . This result can be presented in a graph in which the X-axis represents the successive judges. The more intersections of the lines we see, the lower the concordance of the raters' evaluations (if the concordance were perfect, the lines would be parallel to the X-axis).