
The monotonic correlation coefficients

A monotonic correlation may be monotonically increasing or monotonically decreasing. The relation between 2 features is monotonically increasing if an increase in one feature is accompanied by an increase in the other. The relation is monotonically decreasing if an increase in one feature is accompanied by a decrease in the other.

Spearman's rank-order correlation coefficient $r_s$ describes the strength of a monotonic relation between 2 features, $X$ and $Y$. It may be calculated for data on an ordinal or an interval scale. The value of Spearman's rank correlation coefficient is calculated using the following formula:

\begin{displaymath} \label{rs}
r_s=1-\frac{6\sum_{i=1}^nd_i^2}{n(n^2-1)},
\end{displaymath}

where:

$d_i=R_{x_i}-R_{y_i}$ – difference between the ranks of the $i$-th observation for the features $X$ and $Y$,

$n$ – number of $d_i$ (that is, the number of observation pairs).
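The formula above can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `ranks_no_ties` and `spearman_no_ties` and the sample data are hypothetical, and the code assumes that neither feature contains tied values.

```python
def ranks_no_ties(values):
    """Return 1-based ranks; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def spearman_no_ties(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)) for tie-free data."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    rx, ry = ranks_no_ties(x), ranks_no_ties(y)
    # d_i is the difference between the X rank and the Y rank of observation i
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical data, made up for illustration only.
height = [150, 160, 170, 180, 190]
weight = [52, 60, 58, 75, 80]
r_s = spearman_no_ties(height, weight)
print(r_s)
```

For this made-up sample only two neighbouring ranks are swapped, so the sum of squared rank differences is 2 and the coefficient is approximately 0.9.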

This formula is modified when there are ties:

\begin{displaymath}
r_s=\frac{\Sigma_X+\Sigma_Y-\sum_{i=1}^nd_i^2}{2\sqrt{\Sigma_X\Sigma_Y}},
\end{displaymath}

where:

  • $\Sigma_X=\frac{n^3-n-T_X}{12}$, $\Sigma_Y=\frac{n^3-n-T_Y}{12}$,
  • $T_X=\sum_{i=1}^s (t_{i_{(X)}}^3-t_{i_{(X)}})$, $T_Y=\sum_{i=1}^s (t_{i_{(Y)}}^3-t_{i_{(Y)}})$,
  • $t$ – number of cases included in a tie.

This correction applies only when ties occur. If there are no ties, then $T_X=T_Y=0$, so $\Sigma_X=\Sigma_Y=\frac{n^3-n}{12}$ and the corrected formula reduces to the uncorrected formula given above.
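A minimal Python sketch of the tie-corrected formula follows. The function names are hypothetical, and the code assumes that tied values receive the average of the ranks they occupy, which is the usual convention but is not stated in the text above.

```python
from collections import Counter
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions.

    Assumption: average ranks are used for ties.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the last position holding the same value
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def tie_term(values):
    """T = sum(t^3 - t) over groups of equal values (t = 1 contributes 0)."""
    return sum(t ** 3 - t for t in Counter(values).values())


def spearman_with_ties(x, y):
    """Tie-corrected r_s; undefined (division by zero) if a feature is constant."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    sigma_x = (n ** 3 - n - tie_term(x)) / 12
    sigma_y = (n ** 3 - n - tie_term(y)) / 12
    return (sigma_x + sigma_y - sum_d2) / (2 * sqrt(sigma_x * sigma_y))


# Hypothetical data with one tie in x.
print(spearman_with_ties([1, 2, 2, 3], [1, 2, 3, 4]))
```

When the data contain no ties, `tie_term` returns 0 for both features and the function gives the same value as the uncorrected formula.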

Note

$R_s$ – the Spearman's rank correlation coefficient in a population;

$r_s$ – the Spearman's rank correlation coefficient in a sample.

The value of $r_s$ lies in the interval $[-1, 1]$ and should be interpreted in the following way:

  • $r_s\approx1$ means a strong positive monotonic correlation (increasing) – when the independent variable increases, the dependent variable increases too;
  • $r_s\approx-1$ means a strong negative monotonic correlation (decreasing) – when the independent variable increases, the dependent variable decreases;
  • if $r_s$ is equal or very close to zero, there is no monotonic dependence between the analysed features (though another, non-monotonic relation may exist, for example a sinusoidal one).

Kendall's tau correlation coefficient (Kendall (1938)1)) describes the strength of a monotonic relation between features. It may be calculated for data on an ordinal or an interval scale. The value of Kendall's $\tilde{\tau}$ correlation coefficient is calculated using the following formula:

\begin{displaymath}
\tilde{\tau}=\frac{2(n_C-n_D)}{\sqrt{n(n-1)-T_X}\sqrt{n(n-1)-T_Y}},
\end{displaymath}

where:

  • $n_C$ – number of pairs of observations for which the ranks of the $X$ feature and of the $Y$ feature change in the same direction (the number of concordant pairs),
  • $n_D$ – number of pairs of observations for which the ranks of the $X$ feature change in the opposite direction to the ranks of the $Y$ feature (the number of discordant pairs),
  • $T_X=\sum_{i=1}^s (t_{i_{(X)}}^2-t_{i_{(X)}})$, $T_Y=\sum_{i=1}^s (t_{i_{(Y)}}^2-t_{i_{(Y)}})$,
  • $t$ – number of cases included in a tie.

The formula for the $\tilde{\tau}$ correlation coefficient includes the correction for ties. The correction has an effect only when ties occur: if there are no ties, $T_X=0$ and $T_Y=0$, and the denominator reduces to $n(n-1)$.
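The calculation can be sketched in Python as follows. The function name `kendall_tau` and the sample data are hypothetical. The sketch compares raw values rather than ranks, which gives the same concordant and discordant counts because ranking preserves order, and it uses a simple loop over all pairs, so it is meant for small samples only.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kendall_tau(x, y):
    """Tie-corrected Kendall's tau following the formula in the text."""
    n = len(x)
    n_c = n_d = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            n_c += 1  # both features change in the same direction
        elif product < 0:
            n_d += 1  # the features change in opposite directions
        # product == 0: the pair is tied in X or in Y and is counted in neither
    # T = sum(t^2 - t) over groups of equal values (t = 1 contributes 0)
    t_x = sum(t * t - t for t in Counter(x).values())
    t_y = sum(t * t - t for t in Counter(y).values())
    # Undefined (division by zero) if either feature is constant.
    return 2 * (n_c - n_d) / (sqrt(n * (n - 1) - t_x) * sqrt(n * (n - 1) - t_y))


# Hypothetical data, made up for illustration only.
height = [150, 160, 170, 180, 190]
weight = [52, 60, 58, 75, 80]
print(kendall_tau(height, weight))
```

In this made-up sample 9 of the 10 pairs are concordant and 1 is discordant, so the coefficient is approximately 0.8.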

Note

$\tau$ – the Kendall's correlation coefficient in a population;

$\tilde{\tau}$ – the Kendall's correlation coefficient in a sample.

The value of $\tilde{\tau}$ lies in the interval $[-1, 1]$ and should be interpreted in the following way:

  • $\tilde{\tau}\approx1$ means a strong agreement of the sequence of ranks (the increasing monotonic correlation) – when the independent variable increases, the dependent variable increases too;
  • $\tilde{\tau}\approx-1$ means a strong disagreement of the sequence of ranks (the decreasing monotonic correlation) – when the independent variable increases, the dependent variable decreases;
  • if $\tilde{\tau}$ is equal or very close to zero, there is no monotonic dependence between the analysed features (though another, non-monotonic relation may exist, for example a sinusoidal one).
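The last point can be illustrated with a small Python sketch. The data are made up: a U-shaped relation is used here instead of a sinusoidal one because it keeps the counts easy to check by hand. The features are clearly dependent, yet concordant and discordant pairs cancel out.

```python
from itertools import combinations

# Hypothetical data: y depends on x exactly, but not monotonically.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]

pairs = list(combinations(zip(x, y), 2))
n_c = sum(1 for (x1, y1), (x2, y2) in pairs if (x1 - x2) * (y1 - y2) > 0)
n_d = sum(1 for (x1, y1), (x2, y2) in pairs if (x1 - x2) * (y1 - y2) < 0)

# The numerator of Kendall's tau is 2 * (n_C - n_D).
numerator = 2 * (n_c - n_d)
print(n_c, n_d, numerator)
```

The numerator is zero, so the coefficient is zero although $Y$ is fully determined by $X$.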

Spearman's versus Kendall's coefficient

  • for an interval scale with normally distributed data, $r_s$ gives results close to $r_p$ (Pearson's linear correlation coefficient), but $\tilde{\tau}$ may be totally different from $r_p$,
  • the $\tilde{\tau}$ value is less than or equal to the $r_p$ value,
  • $\tilde{\tau}$ is an unbiased estimator of the population parameter $\tau$, while $r_s$ is a biased estimator of the population parameter $R_s$.

EXAMPLE cont. (sex-height.pqs file)

1)
Kendall M.G. (1938), A new measure of rank correlation. Biometrika, 30, 81-93
en/statpqpl/korelpl/nparpl/wsppl.txt · last modified: 2022/02/13 19:58 by admin
