The monotonic correlation coefficients

A monotonic correlation may be monotonically increasing or monotonically decreasing. The relation between 2 features is monotonically increasing if an increase in one feature is accompanied by an increase in the other. The relation is monotonically decreasing if an increase in one feature is accompanied by a decrease in the other.

Spearman's rank-order correlation coefficient $r_s$ describes the strength of a monotonic relation between 2 features, $X$ and $Y$. It may be calculated for data on an ordinal scale or an interval one. Its value is calculated using the following formula:

\begin{displaymath} \label{rs}
r_s=1-\frac{6\sum_{i=1}^nd_i^2}{n(n^2-1)},
\end{displaymath}

where:

$d_i=R_{x_i}-R_{y_i}$ – difference between the ranks of the $i$-th observation for the features $X$ and $Y$,

$n$ – number of $d_i$ (that is, the number of observation pairs).
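The formula above can be sketched in a few lines of Python. This is only an illustration: the function names `ranks_no_ties` and `spearman_no_ties` and the height/weight data are made up for the example, and the sketch rejects tied values because the simple formula does not cover them.

```python
def ranks_no_ties(values):
    """Return 1-based ranks; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def spearman_no_ties(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)); valid only without ties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    if len(set(x)) != len(x) or len(set(y)) != len(y):
        raise ValueError("ties present; the simple formula does not apply")
    n = len(x)
    rx, ry = ranks_no_ties(x), ranks_no_ties(y)
    # d_i is the difference between the X rank and the Y rank of observation i.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical data: heights (cm) and weights (kg) of five people.
height = [150, 160, 165, 172, 181]
weight = [52, 61, 58, 70, 77]
r_s = spearman_no_ties(height, weight)
print(round(r_s, 4))  # 0.9
```

Here only the second and third observations swap places in the two rankings, so $\sum d_i^2=2$ and $r_s=1-\frac{12}{120}=0.9$.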

This formula is modified when there are ties:

\begin{displaymath}
r_s=\frac{\Sigma_X+\Sigma_Y-\sum_{i=1}^nd_i^2}{2\sqrt{\Sigma_X\Sigma_Y}},
\end{displaymath}

where:

  • $\Sigma_X=\frac{n^3-n-T_X}{12}$, $\Sigma_Y=\frac{n^3-n-T_Y}{12}$,
  • $T_X=\sum_{i=1}^s (t_{i_{(X)}}^3-t_{i_{(X)}})$, $T_Y=\sum_{i=1}^s (t_{i_{(Y)}}^3-t_{i_{(Y)}})$,
  • $t$ – number of cases included in a tie,
  • $s$ – number of ties (groups of equal values) within the given feature.
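A minimal Python sketch of the tie-corrected formula follows. It assumes that tied observations receive the average of the ranks they occupy; the helper names (`average_ranks`, `tie_term`, `spearman_tie_corrected`) and the data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def tie_term(values):
    """T = sum of (t^3 - t) over groups of equal values (size-1 groups add 0)."""
    return sum(t ** 3 - t for t in Counter(values).values())


def spearman_tie_corrected(x, y):
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    sigma_x = (n ** 3 - n - tie_term(x)) / 12
    sigma_y = (n ** 3 - n - tie_term(y)) / 12
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("a feature with all values equal has no defined r_s")
    return (sigma_x + sigma_y - sum_d2) / (2 * sqrt(sigma_x * sigma_y))


# Hypothetical data with one tie in each feature.
x = [1, 2, 2, 3, 4]
y = [10, 20, 30, 30, 50]
r_tied = spearman_tie_corrected(x, y)
print(round(r_tied, 4))  # 0.9211
```

For this data $T_X=T_Y=6$, $\Sigma_X=\Sigma_Y=9.5$ and $\sum d_i^2=1.5$, which gives $r_s=\frac{19-1.5}{19}\approx 0.9211$.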

This correction is used when ties occur. If there are no ties, $T_X=T_Y=0$ and the corrected formula reduces to the first equation, so the correction does not need to be calculated.
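The reduction can be checked directly. Setting $T_X=T_Y=0$ in the tie-corrected formula gives:

\begin{displaymath}
\Sigma_X=\Sigma_Y=\frac{n^3-n}{12}
\quad\Longrightarrow\quad
r_s=\frac{2\cdot\frac{n^3-n}{12}-\sum_{i=1}^nd_i^2}{2\cdot\frac{n^3-n}{12}}
=1-\frac{6\sum_{i=1}^nd_i^2}{n^3-n}
=1-\frac{6\sum_{i=1}^nd_i^2}{n(n^2-1)}.
\end{displaymath}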

Note

$R_s$ – the Spearman's rank correlation coefficient in a population;

$r_s$ – the Spearman's rank correlation coefficient in a sample.

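The value of $r_s\in[-1, 1]$, and it should be interpreted the following way:

  • $r_s\approx 1$ means a strong positive (increasing) monotonic correlation,
  • $r_s\approx -1$ means a strong negative (decreasing) monotonic correlation,
  • $r_s$ close to 0 means that there is no monotonic dependence between the features.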

Kendall's tau correlation coefficient (Kendall, 1938) describes the strength of a monotonic relation between 2 features. It may be calculated for data on an ordinal scale or an interval one. The value of Kendall's $\tilde{\tau}$ correlation coefficient is calculated using the following formula:

\begin{displaymath}
\tilde{\tau}=\frac{2(n_C-n_D)}{\sqrt{n(n-1)-T_X}\sqrt{n(n-1)-T_Y}},
\end{displaymath}

where:

  • $n_C$ – number of pairs of observations for which the ranks of the $X$ feature and of the $Y$ feature change in the same direction (the number of concordant pairs),
  • $n_D$ – number of pairs of observations for which the ranks of the $X$ feature change in the opposite direction to the ranks of the $Y$ feature (the number of discordant pairs),
  • $T_X=\sum_{i=1}^s (t_{i_{(X)}}^2-t_{i_{(X)}})$, $T_Y=\sum_{i=1}^s (t_{i_{(Y)}}^2-t_{i_{(Y)}})$,
  • $t$ – number of cases included in a tie,
  • $s$ – number of ties (groups of equal values) within the given feature.

The formula for the $\tilde{\tau}$ correlation coefficient includes the correction for ties. The correction only has an effect when ties occur: if there are no ties, $T_X=0$ and $T_Y=0$, and the correction terms vanish.
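The following Python sketch illustrates the formula by counting pairs directly. It compares raw values rather than ranks, which yields the same counts because ranking preserves order. It assumes that a pair tied on $X$ or on $Y$ counts as neither concordant nor discordant. The function name `kendall_tau` and the data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kendall_tau(x, y):
    """Kendall's tau with the tie correction, computed by brute-force pair counting."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    n_c = n_d = 0
    for i, j in combinations(range(n), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            n_c += 1  # both features change in the same direction
        elif direction < 0:
            n_d += 1  # features change in opposite directions
        # direction == 0: the pair is tied on X or Y and is not counted
    # T = sum of (t^2 - t) over groups of equal values.
    t_x = sum(t * t - t for t in Counter(x).values())
    t_y = sum(t * t - t for t in Counter(y).values())
    denominator = sqrt(n * (n - 1) - t_x) * sqrt(n * (n - 1) - t_y)
    if denominator == 0:
        raise ValueError("a feature with all values equal has no defined tau")
    return 2 * (n_c - n_d) / denominator


# Hypothetical data with one tie in each feature.
x = [1, 2, 2, 3, 4]
y = [10, 20, 30, 30, 50]
tau_tied = kendall_tau(x, y)
print(round(tau_tied, 4))  # 0.8889

# Hypothetical data without ties: heights (cm) and weights (kg).
tau_plain = kendall_tau([150, 160, 165, 172, 181], [52, 61, 58, 70, 77])
print(round(tau_plain, 4))  # 0.8
```

In the first data set 8 of the 10 pairs are concordant, 2 are tied, and $T_X=T_Y=2$, so $\tilde{\tau}=\frac{16}{18}\approx 0.8889$. In the second there are no ties, and the formula reduces to $\frac{2(n_C-n_D)}{n(n-1)}=\frac{2(9-1)}{20}=0.8$. The pair loop is quadratic in $n$, which is fine for an illustration but slow for large samples.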

Note

$\tau$ – the Kendall's correlation coefficient in a population;

$\tilde{\tau}$ – the Kendall's correlation coefficient in a sample.

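The value of $\tilde{\tau}\in[-1, 1]$, and it should be interpreted the following way:

  • $\tilde{\tau}\approx 1$ means a strong agreement of the ordering of ranks (an increasing monotonic correlation),
  • $\tilde{\tau}\approx -1$ means a strong disagreement of the ordering of ranks (a decreasing monotonic correlation),
  • $\tilde{\tau}$ close to 0 means that there is no monotonic dependence between the features.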

Spearman's versus Kendall's coefficient

EXAMPLE cont. (sex-height.pqs file)

1)
Kendall M.G. (1938), A new measure of rank correlation. Biometrika, 30, 81-93