
Comparison - two groups

\begin{pspicture}(0,2.5)(15,14.5)
\rput(2,14){\hyperlink{interwalowa}{Interval scale}}
\rput[tl](.1,13.4){\ovalnode{A}{\hyperlink{rozklad_normalny}{\begin{tabular}{c}Are\\the data\\normally\\distributed?\end{tabular}}}}
\rput[tl](0.1,10){\ovalnode{B}{\hyperlink{zalezne_niezalezne}{\begin{tabular}{c}Are the data\\dependent?\end{tabular}}}}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{A}{B}
\rput[tl](0.1,8){\ovalnode{C}{\hyperlink{wariancja}{\begin{tabular}{c}Are\\the variances\\equal?\end{tabular}}}}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{B}{C}
\rput[br](2.9,3){\rnode{D}{\psframebox{\hyperlink{test_t_student_niezaleny}{\begin{tabular}{c}t-test for\\independent\\groups\end{tabular}}}}}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{C}{D}

\rput(6,8.7){\psframebox{\hyperlink{test_t_student_zalezny}{\begin{tabular}{c}t-test for\\dependent\\groups\end{tabular}}}}
\rput(6.2,6.7){\psframebox{\hyperlink{test_cochran_cox}{\begin{tabular}{c}t-test with\\Cochran-Cox\\adjustment\end{tabular}}}}
\psline{->}(3.15,12.7)(6.2,12.7)
\psline{->}(3.35,9.3)(4.9,9.3)
\psline{->}(3.45,6.9)(4.9,6.9)
\rput(2.4,10.4){Y}
\rput(2.4,8.3){N}
\rput(2.3,5.2){Y}
\rput(4.8,12.5){N}
\rput(6.8,11.9){Y}
\rput(9.0,11.5){N}
\rput(12,11.4){Y}
\rput(13.8,11.5){N}
\rput(4.2,9.5){Y}
\rput(4.2,7.1){N}

\rput(8,14){\hyperlink{porzadkowa}{Ordinal scale}}
\rput[tl](6.2,13.5){\ovalnode{E}{\hyperlink{zalezne_niezalezne}{\begin{tabular}{c}Are the data\\dependent?\end{tabular}}}}
\rput[br](7.85,9.8){\rnode{F}{\psframebox{\hyperlink{test_wilcoxon_kolejnosci_par}{\begin{tabular}{c}Wilcoxon\\test for\\dependent\\groups\end{tabular}}}}}
\rput[br](10.3,8.9){\rnode{G}{\psframebox{\hyperlink{test_mann-whitney}{\begin{tabular}{c}Mann\\Whitney\\test,\\$\chi^2$ test\\for trend\end{tabular}}}}}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{E}{F}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{E}{G}

\rput(13,14){\hyperlink{nominalna}{Nominal scale}}
\rput[tl](11,13.5){\ovalnode{H}{\hyperlink{zalezne_niezalezne}{\begin{tabular}{c}Are the data\\dependent?\end{tabular}}}}
\rput[br](13.5,9.1){\rnode{I}{\psframebox{\begin{tabular}{c}\hyperlink{test_bowker_mcnemar}{Bowker-}\\\hyperlink{test_bowker_mcnemar}{-McNemar,}\\\hyperlink{test_z_dla_dwoch_zal_proporcji}{$Z$ test for}\\\hyperlink{test_z_dla_dwoch_zal_proporcji}{2 proportions}\end{tabular}}}}
\rput[br](16.1,7.7){\rnode{J}{\psframebox{\begin{tabular}{c}\hyperlink{test_chi_r_na_c}{$\chi^2$ tests,}\\\hyperlink{test_z_dla_dwoch_proporcji}{$Z$ test for 2 proportions}\end{tabular}}}}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{H}{I}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{H}{J}

\rput(6.2,5.5){(\hyperlink{test_f_snedecora}{Fisher-Snedecor test})}
\psline[linestyle=dotted]{<-}(3.6,5.9)(4.2,5.7)
\rput(9,8.4){\hyperlink{testy_normalnosci}{normality tests}}
\psline[linestyle=dotted]{<-}(3.6,11.1)(4.5,9.8)
\psline[linestyle=dotted]{-}(4.5,9.75)(7.6,9.75)
\psline[linestyle=dotted]{-}(7.6,9.75)(7.8,8.6)
\end{pspicture}

 

Parametric tests

The Fisher-Snedecor test

The Fisher-Snedecor test is based on the $F$ statistic formulated by Fisher (1924); its distribution was described by Snedecor. This test is used to verify the hypothesis about the equality of variances of an analysed variable for 2 populations.

Basic assumptions:

Hypotheses:

\begin{array}{cc}
\mathcal{H}_0: & \sigma_1^2=\sigma_2^2,\\
\mathcal{H}_1: & \sigma_1^2\ne\sigma_2^2,
\end{array}

where:

$\sigma_1^2$, $\sigma_2^2$ – variances of an analysed variable of the 1st and the 2nd population.

The test statistic is defined by: \begin{displaymath}
F=\displaystyle{\frac{sd_1^2}{sd_2^2}},
\end{displaymath}

where:

$sd_1^2$, $sd_2^2$ – variances of an analysed variable of the samples chosen randomly from the 1st and the 2nd population.

The test statistic has the Fisher-Snedecor distribution with $n_1-1$ and $n_2-1$ degrees of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}
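
For illustration, the same computation can be sketched in a few lines of Python with scipy's F distribution (the helper name fisher_snedecor_test is ours, not PQStat's):

import numpy as np
from scipy import stats

def fisher_snedecor_test(x1, x2):
    # unbiased sample variances (ddof=1) of the two groups
    sd1_sq, sd2_sq = np.var(x1, ddof=1), np.var(x2, ddof=1)
    F = sd1_sq / sd2_sq
    df1, df2 = len(x1) - 1, len(x2) - 1
    # one-sided tail probability, doubled for the two-sided hypothesis
    tail = stats.f.sf(F, df1, df2) if F >= 1 else stats.f.cdf(F, df1, df2)
    return F, min(2 * tail, 1.0)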

The settings window with the Fisher-Snedecor test can be opened in Statistics menu → Parametric tests → Fisher-Snedecor.

Note

Calculations can be based on raw data or data that are averaged like: arithmetic means, standard deviations and sample sizes.


The t-test for independent groups

The $t$-test for independent groups is used to verify the hypothesis about the equality of means of an analysed variable in 2 populations.

Basic assumptions:

Hypotheses:

\begin{array}{cc}
\mathcal{H}_0: & \mu_1=\mu_2,\\
\mathcal{H}_1: & \mu_1\ne\mu_2.
\end{array}

where:

$\mu_1$, $\mu_2$ – means of an analysed variable of the 1st and the 2nd population.

The test statistic is defined by: \begin{displaymath}
t=\frac{\displaystyle{\overline{x}_1-\overline{x}_2}}{\displaystyle{\sqrt{\frac{(n_1-1)sd_1^2+(n_2-1)sd_2^2}{n_1+n_2-2}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}},
\end{displaymath}

where:

$\overline{x}_1, \overline{x}_2 $ – means of an analysed variable of the 1st and the 2nd sample,

$n_1, n_2 $ – the 1st and the 2nd sample size,

$sd_1^2, sd_2^2 $ – variances of an analysed variable of the 1st and the 2nd sample.

The test statistic has the t-Student distribution with $df=n_1+n_2-2$ degrees of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

Note:

  • pooled standard deviation is defined by:

\begin{displaymath}
SD_p=\sqrt{\frac{(n_1-1)sd_1^2+(n_2-1)sd_2^2}{n_1+n_2-2}},
\end{displaymath}

  • standard error of difference of means is defined by:

\begin{displaymath}
SE_{\overline{x}_1-\overline{x}_2}=\displaystyle{\sqrt{\frac{(n_1-1)sd_1^2+(n_2-1)sd_2^2}{n_1+n_2-2}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}.
\end{displaymath}

Standardized effect size.

Cohen's d expresses the magnitude of the difference between the means in units of the pooled standard deviation.

\begin{displaymath}
	d=\left|\frac{\overline{x}_1-\overline{x}_2}{SD_p}\right|
\end{displaymath}

When interpreting an effect, researchers often use general guidelines proposed by Cohen 1) defining small (0.2), medium (0.5) and large (0.8) effect sizes.
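
A minimal Python sketch of the statistic, the pooled standard deviation and Cohen's d defined above (a scipy-based illustration; the function name is ours):

import numpy as np
from scipy import stats

def t_test_independent(x1, x2):
    n1, n2 = len(x1), len(x2)
    v1, v2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
    sd_p = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))  # pooled SD
    se = sd_p * np.sqrt(1 / n1 + 1 / n2)   # standard error of the mean difference
    t = (np.mean(x1) - np.mean(x2)) / se
    p = 2 * stats.t.sf(abs(t), n1 + n2 - 2)
    d = abs(np.mean(x1) - np.mean(x2)) / sd_p  # Cohen's d
    return t, p, d

The t and p values can be cross-checked against stats.ttest_ind(x1, x2, equal_var=True).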

The settings window with the t-test for independent groups can be opened in Statistics menu → Parametric tests → t-test for independent groups or in ''Wizard''.

If, in the window which contains the options related to the variances, you have chosen:

  • equal, the t-test for independent groups will be calculated,
  • different, the t-test with the Cochran-Cox adjustment will be calculated,
  • check equality, the Fisher-Snedecor test will be calculated first and, based on its result and the set significance level, the t-test for independent groups with or without the Cochran-Cox adjustment will be calculated.

Note

Calculations can be based on raw data or data that are averaged like: arithmetic means, standard deviations and sample sizes.

EXAMPLE (cholesterol.pqs file)

Five hundred subjects each were drawn from a population of women and from a population of men over 40 years of age. The study concerned the assessment of cardiovascular disease risk. Among the parameters studied was the value of total cholesterol. The purpose of this study is to compare men and women with respect to this value. We want to show that these populations differ on the level of total cholesterol, and not only on the level of cholesterol broken down into its fractions.

The distribution of cholesterol values in both groups is normal (this was checked with the Lilliefors test). The mean cholesterol value in the male group was $\overline{x}_1=201.1$ with standard deviation $sd_1=47.6$; in the female group, $\overline{x}_2=191.5$ and $sd_2=43.5$ respectively. The Fisher-Snedecor test indicates small but statistically significant ($p=0.0434$) differences in variances. The analysis will therefore use the Student's t-test with the Cochran-Cox adjustment.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $The average total cholesterol of the female population equals$\\
&$the average total cholesterol of the male population,$\\
\mathcal{H}_1: & $The average total cholesterol of the female population is different from$ \\
&$the average total cholesterol of the male population.$
\end{array}$

Comparing $p=0.0009$ with a significance level $\alpha=0.05$ we find that women and men in Poland have statistically significant differences in total cholesterol values. The average Polish man over the age of 40 has higher total cholesterol than the average Polish woman by almost 10 units.


The t-test with the Cochran-Cox adjustment

The Cochran-Cox adjustment (1957)2) relates to the t-test for independent groups and is calculated when the variances of the analysed variable in the two populations are different.

The test statistic is defined by:

\begin{displaymath}
t=\frac{\overline{x}_1-\overline{x}_2}{\sqrt{\frac{sd_1^2}{n_1}+\frac{sd_2^2}{n_2}}}.
\end{displaymath}

The test statistic has the t-Student distribution with degrees of freedom proposed by Satterthwaite (1946)3) and calculated using the formula:

\begin{displaymath}
df=\frac{\left( \frac{sd_1^2}{n_1}+\frac{sd_2^2}{n_2}\right)^2}{\left( \frac{sd_1^2}{n_1}\right)^2\cdot \frac{1}{(n_1-1)}+\left( \frac{sd_2^2}{n_2}\right)^2\cdot \frac{1}{(n_2-1)}}.
\end{displaymath}
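
A Python sketch of this statistic and the Satterthwaite degrees of freedom (an illustration only; scipy's stats.ttest_ind(x1, x2, equal_var=False) uses the same statistic and df):

import numpy as np
from scipy import stats

def t_test_cochran_cox(x1, x2):
    n1, n2 = len(x1), len(x2)
    v1, v2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
    se2 = v1 / n1 + v2 / n2                       # squared standard error
    t = (np.mean(x1) - np.mean(x2)) / np.sqrt(se2)
    # Satterthwaite (1946) degrees of freedom
    df = se2**2 / ((v1 / n1)**2 / (n1 - 1) + (v2 / n2)**2 / (n2 - 1))
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p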

The settings window with the t-test for independent groups can be opened in Statistics menu → Parametric tests → t-test for independent groups or in ''Wizard''.

If, in the window which contains the options related to the variances, you have chosen:

  • equal, the t-test for independent groups will be calculated,
  • different, the t-test with the Cochran-Cox adjustment will be calculated,
  • check equality, the Fisher-Snedecor test will be calculated first and, based on its result and the set significance level, the t-test for independent groups with or without the Cochran-Cox adjustment will be calculated.

Note Calculations can be based on raw data or data that are averaged like: arithmetic means, standard deviations and sample sizes.


The t-test for dependent groups

The $t$-test for dependent groups is used when the measurement of an analysed variable is made twice, each time under different conditions (but you should assume that the variances of the variable in both measurements are quite close to each other). We want to know the difference between the pairs of measurements ($d_i=x_{1i}-x_{2i}$). This difference is used to verify the hypothesis that the mean of the differences in the analysed population is 0.

Basic assumptions:

Hypotheses:

\begin{array}{cc}
\mathcal{H}_0: & \mu_0=0,\\
\mathcal{H}_1: & \mu_0\ne0,
\end{array}

where:

$\mu_0$ – mean of the differences $d_i$ in a population.

The test statistic is defined by:

\begin{displaymath}
t=\frac{\overline{d}}{sd_d}\sqrt{n},
\end{displaymath}

where:

$\overline{d}$ – mean of differences $d_i$ in a sample,

$sd_d $ – standard deviation of differences $d_i$ in a sample,

$n$ – number of differences $d_i$ in a sample.

The test statistic has the t-Student distribution with $n-1$ degrees of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

Note

  • standard deviation of the difference is defined by:

\begin{displaymath}
sd_d=\displaystyle{\sqrt{\frac{\sum_{i=1}^{n}(d_i-\overline{d})^2}{n-1}}},
\end{displaymath}

  • standard error of the mean of differences is defined by:

\begin{displaymath}
SEM_{d}=\displaystyle{\frac{sd_d}{\sqrt{n}}}.
\end{displaymath}

Standardized effect size.

Cohen's d expresses the magnitude of the difference between the means in units of standard deviation, taking into account the correlation of the variables.

\begin{displaymath}
	d=\frac{d_z}{\sqrt{1-r_p}},
\end{displaymath}

When interpreting an effect, researchers often use general guidelines proposed by Cohen 4) defining small (0.2), medium (0.5) and large (0.8) effect sizes.
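
A minimal Python sketch of the paired statistic (an illustration; cross-check with stats.ttest_rel(x1, x2)):

import numpy as np
from scipy import stats

def t_test_dependent(x1, x2):
    d = np.asarray(x1, float) - np.asarray(x2, float)  # differences d_i
    n = len(d)
    sd_d = np.std(d, ddof=1)          # standard deviation of the differences
    t = np.mean(d) / sd_d * np.sqrt(n)
    p = 2 * stats.t.sf(abs(t), n - 1)
    sem = sd_d / np.sqrt(n)           # standard error of the mean of differences
    return t, p, sem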

The settings window with the t-test for dependent groups can be opened in Statistics menu → Parametric tests → t-test for dependent groups or in ''Wizard''.

Note

Calculations can be based on raw data or data that are averaged like: arithmetic mean of difference, standard deviation of difference and sample size.

EXAMPLE (BMI.pqs file)

A clinic treating eating disorders studied the effect of a recommended „diet A” on weight change. A sample of 120 obese patients was put on the diet. Their BMI levels were measured twice: before the diet and after 180 days of the diet. To test the effectiveness of the diet, the obtained BMI measurements were compared.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $Mean BMI values do not change with diet,$\\
\mathcal{H}_1: & $Mean BMI values change as a result of diet.$
\end{array}$

Comparing $p<0.0001$ with a significance level $\alpha=0.05$ we find that the mean BMI level changed significantly. Before the diet, it was higher by less than 2 units on average.

The study was able to use the Student's t-test for dependent groups because the distribution of the difference between pairs of measurements was a normal distribution (Lilliefors test, $p=0.0837$).


Non-parametric tests

The Mann-Whitney U test

The Mann-Whitney $U$ test is also called the Wilcoxon Mann-Whitney test (Mann and Whitney (1947)5) and Wilcoxon (1949)6)). This test is used to verify the hypothesis that there is no shift between the compared distributions, i.e., most often, the insignificance of differences between the medians of an analysed variable in 2 populations (but you should assume that the distributions of the variable are quite similar to each other – a comparison of rank variances can be performed with the Conover rank test).

Basic assumptions:

Hypotheses:

\begin{array}{cl}
\mathcal{H}_0: & \phi_1=\phi_2,\\
\mathcal{H}_1: & \phi_1\neq\phi_2,
\end{array}

where:

$\phi_1, \phi_2$ – distributions of an analysed variable of the 1st and the 2nd population.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

Note

Depending on the sample size, the test statistic is calculated using different formulas:

  • For a small sample size:

\begin{displaymath}
U=n_1n_2+\frac{n_1(n_1+1)}{2}-R_1,
\end{displaymath}

or

\begin{displaymath}
U'=n_1n_2+\frac{n_2(n_2+1)}{2}-R_2,
\end{displaymath}

where $n_1, n_2$ are sample sizes, $R_1, R_2$ are rank sums for the samples.

This statistic has the Mann-Whitney distribution and does not contain any correction for ties. The exact probability from the Mann-Whitney distribution is calculated with accuracy up to the hundredths place.

  • For a large sample size:

\begin{displaymath}
Z=\frac{U-\frac{n_1n_2}{2}}{\sqrt{\frac{n_1n_2(n_1+n_2+1)}{12}-\frac{n_1n_2\sum (t^3-t)}{12(n_1+n_2)(n_1+n_2-1)}}},
\end{displaymath}

where:

$U$ can be replaced with $U'$,

$t$ – number of cases included in a tie.

The formula for the $Z$ statistic includes the correction for ties. This correction is used when ties occur (if there are no ties, the correction is not calculated, because $\frac{n_1n_2\sum (t^3-t)}{12(n_1+n_2)(n_1+n_2-1)}=0$).

The $Z$ statistic asymptotically (for large sample sizes) has the normal distribution.

The Mann-Whitney test with the continuity correction (Marascuilo and McSweeney (1977)7))

The continuity correction is used so that the test statistic can take all real values, in agreement with the assumption of the normal distribution. The formula for the test statistic with the continuity correction is defined as:

\begin{displaymath}
Z=\frac{\left|U-\frac{n_1n_2}{2}\right|-0.5}{\sqrt{\frac{n_1n_2(n_1+n_2+1)}{12}-\frac{n_1n_2\sum (t^3-t)}{12(n_1+n_2)(n_1+n_2-1)}}}.
\end{displaymath}
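
The large-sample $Z$ statistic, with the tie and continuity corrections defined above, can be sketched in Python as follows (an illustration; exact small-sample p-values are available via stats.mannwhitneyu(x1, x2, method="exact")):

import numpy as np
from scipy import stats

def mann_whitney_z(x1, x2, continuity=False):
    n1, n2 = len(x1), len(x2)
    pooled = np.concatenate([np.asarray(x1, float), np.asarray(x2, float)])
    ranks = stats.rankdata(pooled)                 # midranks are assigned to ties
    U = n1 * n2 + n1 * (n1 + 1) / 2 - ranks[:n1].sum()
    _, t = np.unique(pooled, return_counts=True)   # tie group sizes t
    var = (n1 * n2 * (n1 + n2 + 1) / 12
           - n1 * n2 * (t**3 - t).sum() / (12 * (n1 + n2) * (n1 + n2 - 1)))
    num = U - n1 * n2 / 2
    if continuity:
        num = abs(num) - 0.5
    Z = num / np.sqrt(var)
    return U, Z, 2 * stats.norm.sf(abs(Z))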

Standardized effect size

The distribution of the Mann-Whitney test statistic is approximated by the normal distribution, which can be converted to an effect size $r=\left|Z/\sqrt{n_1+n_2}\right|$ 8) to then obtain the Cohen's d value according to the standard conversion used for meta-analyses:

\begin{displaymath}
	d=\frac{2r}{\sqrt{1-r^2}}
\end{displaymath}

When interpreting an effect, researchers often use general guidelines proposed by Cohen 9) defining small (0.2), medium (0.5) and large (0.8) effect sizes.

The settings window with the Mann-Whitney U test can be opened in Statistics menu → NonParametric tests (ordered categories) → Mann-Whitney or in ''Wizard''.

EXAMPLE (computer.pqs file)

A hypothesis was made that at some university male math students spend more time in front of a computer screen than female math students. To verify this hypothesis, a sample of 54 people (25 women and 29 men) was drawn from the population of people who study math at this university. These persons were asked how many hours they spend in front of a computer screen daily. The following results were obtained:

(time, sex; k – female, m – male): (2, k) (2, m) (2, m) (3, k) (3, k) (3, k) (3, k) (3, m) (3, m) (4, k) (4, k) (4, k) (4, k) (4, m) (4, m) (5, k) (5, k) (5, k) (5, k) (5, k) (5, k) (5, k) (5, k) (5, k) (5, m) (5, m) (5, m) (5, m) (6, k) (6, k) (6, k) (6, k) (6, k) (6, m) (6, m) (6, m) (6, m) (6, m) (6, m) (6, m) (6, m) (7, k) (7, m) (7, m) (7, m) (7, m) (7, m) (7, m) (7, m) (7, m) (7, m) (8, k) (8, m) (8, m).

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $the median of the time spent in front of a computer screen is exactly the same both$\\
& $in the male and the female population of students, at the analysed university,$\\
\mathcal{H}_1: & $the median of the time spent in front of a computer screen is different among the $\\
& $male  population and the female population of students, at the analysed university.$
\end{array}$

Based on the assumed $\alpha=0.05$, the $Z$ statistic of the Mann-Whitney test without the continuity correction (p=0.0154) and with the correction (p=0.0158), as well as the exact $U$ statistic (p=0.0149), allow us to assume that there are statistically significant differences between female and male math students in the amount of time spent in front of the computer. The difference is that female students spend less time in front of the computer than male students. It can be described by the median, the quartiles, and the largest and smallest values, which we also see in a box-and-whisker plot. Another way to describe the difference is to present the time spent in front of the computer in a table of counts and percentages (which we run in the analysis window by setting descriptive statistics) or in a column plot.


The Wilcoxon test (matched-pairs)

The Wilcoxon matched-pairs test is also called the Wilcoxon test for dependent groups (Wilcoxon 1945 10), 1949 11)). It is used when the measurement of an analysed variable is made twice, each time under different conditions. It is the extension, to two dependent samples, of the Wilcoxon (signed-ranks) test designed for one sample. We want to know the difference between the pairs of measurements ($d_i=x_{1i}-x_{2i}$) for each of the $i$ analysed objects. This difference is used to verify the hypothesis that the median of the differences in the analysed population is 0.

Basic assumptions:

Hypotheses:

\begin{array}{cl}
\mathcal{H}_0: &  \theta_0=0, \\
\mathcal{H}_1: &  \theta_0\neq 0,
\end{array}

where:

$ \theta_0$ – median of the differences $d_i$ in a population.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

Note

Depending on the sample size, the test statistic is calculated using different formulas:

  • For a small sample size:

\begin{displaymath}
T=\min\left(\sum R_-,\sum R_+\right),
\end{displaymath}

where:

$\sum R_+$ – sum of positive ranks,

$\sum R_-$ – sum of negative ranks.

This statistic has the Wilcoxon distribution and does not contain any correction for ties.

  • For a large sample size

\begin{displaymath}
Z=\frac{T-\frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}-\frac{\sum t^3-\sum t}{48}}},
\end{displaymath}

where:

$n$ – number of ranked signs (number of the ranks),

$t$ – number of the cases included in a tie.

The formula for the $Z$ statistic includes the correction for ties. This correction is used when ties occur (if there are no ties, the correction is not calculated, because $\frac{\sum t^3-\sum t}{48}=0$).

The $Z$ statistic (for large sample sizes) asymptotically has the normal distribution.

The Wilcoxon test with the continuity correction (Marascuilo and McSweeney (1977)12))

The continuity correction is used so that the test statistic can take all real values, in agreement with the assumption of the normal distribution. The test statistic with the continuity correction is defined by:

\begin{displaymath}
Z=\frac{\left|T-\frac{n(n+1)}{4}\right|-0.5}{\sqrt{\frac{n(n+1)(2n+1)}{24}-\frac{\sum t^3-\sum t}{48}}}.
\end{displaymath}
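
A Python sketch of the large-sample $Z$ statistic with both corrections (an illustration; cross-check with stats.wilcoxon(x1, x2)):

import numpy as np
from scipy import stats

def wilcoxon_z(x1, x2, continuity=False):
    d = np.asarray(x1, float) - np.asarray(x2, float)
    d = d[d != 0]                       # zero differences are discarded
    n = len(d)
    ranks = stats.rankdata(np.abs(d))   # midranks for tied |d_i|
    T = min(ranks[d > 0].sum(), ranks[d < 0].sum())
    _, t = np.unique(np.abs(d), return_counts=True)   # tie group sizes t
    var = n * (n + 1) * (2 * n + 1) / 24 - ((t**3).sum() - t.sum()) / 48
    num = T - n * (n + 1) / 4
    if continuity:
        num = abs(num) - 0.5
    Z = num / np.sqrt(var)
    return T, Z, 2 * stats.norm.sf(abs(Z))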

Note

The median calculated for the difference column includes all pairs of results except those with a difference of 0.

Standardized effect size

The distribution of the Wilcoxon test statistic is approximated by the normal distribution, which can be converted to an effect size $r=\left|Z/\sqrt{n}\right|$ 13) to then obtain the Cohen's d value according to the standard conversion used for meta-analyses:

\begin{displaymath}
	d=\frac{2r}{\sqrt{1-r^2}}
\end{displaymath}

When interpreting an effect, researchers often use general guidelines proposed by Cohen 14) defining small (0.2), medium (0.5) and large (0.8) effect sizes.

The settings window with the Wilcoxon test for dependent groups can be opened in Statistics menu → NonParametric tests → Wilcoxon (matched-pairs) or in ''Wizard''.

EXAMPLE (pain.pqs file)

A sample of 22 patients suffering from cancer was chosen. They were examined to check the level of felt pain (a 1-10 scale, where 1 means no pain and 10 means unbearable pain). The examination was repeated after a month of treatment with a new medicine which was supposed to lower the level of felt pain. The following results were obtained:

(pain before, pain after): (2, 2) (2, 3) (3, 1) (3, 1) (3, 2) (3, 2) (3, 3) (4, 1) (4, 3) (4, 4) (5, 1) (5, 1) (5, 2) (5, 4) (5, 4) (6, 1) (6, 3) (7, 2) (7, 4) (7, 4) (8, 1) (8, 3). Now, you want to check if this treatment has any influence on the level of felt pain in the population from which the sample was chosen.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $the median of the differences between the level of pain before and after a month$\\
& $of treatment in the analysed population comes to 0,$\\
\mathcal{H}_1: & $the median of the differences between the level of pain before and after a month$\\
& $of treatment in the analysed population is different from 0.$
\end{array}$

Comparing the p-value = 0.0001 of the Wilcoxon test, based on the $T$ statistic, with the significance level $\alpha=0.05$, we conclude that there is a statistically significant difference in the level of felt pain between the 2 examinations. The difference is that the level of pain decreased (the sum of the negative ranks is significantly greater than the sum of the positive ranks). Exactly the same decision would be made on the basis of the p-value = 0.00021 or the p-value = 0.00023 of the Wilcoxon test based on the $Z$ statistic or on the $Z$ statistic with the continuity correction. We can see the differences in a box-and-whisker plot or a column plot.


The Chi-square tests

These tests are based on data collected in the form of a contingency table of 2 traits, trait $X$ and trait $Y$, the former having $r$ and the latter $c$ categories, so the resulting table has $r$ rows and $c$ columns. Therefore, we can speak of the 2×2 chi-square test (for tables with two rows and two columns) or the RxC chi-square test (for tables with multiple rows and columns).

We can read the details of the chi-square test of the two features here:

chi-square test 2x2

chi-square test RxC.

Basic assumptions:

The additional assumption for the $\chi^2$ test is a sufficiently large expected frequency in the contingency table (Cochran's condition).

General hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & O_{ij}=E_{ij} $ for all categories,$\\
\mathcal{H}_1: & O_{ij} \neq E_{ij} $ for at least one category,$
\end{array}$

where:

$O_{ij}$ – observed frequencies in a contingency table,

$E_{ij}$ – expected frequencies in a contingency table.

Hypotheses in the meaning of independence:

$\begin{array}{cl}
\mathcal{H}_0: & $there is no dependence between the analysed features of the population (the$\\
& $classifications according to the $X$ and $Y$ features are statistically independent),$\\
\mathcal{H}_1: & $there is a dependence between the analysed features of the population.$
\end{array}$

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

Additionally

  • In addition to the chi-square test, another related test may need to be determined. In the event that Cochran's condition is not satisfied, one can determine the Fisher exact test or its mid-p correction.
  • If we obtain an Rx2 table, and the R categories can be ordered, it is possible to determine the trend with the Chi-square test for trend.
  • When significant relationships or differences are found based on a test performed on a table larger than 2×2, multiple comparisons with an appropriate correction can be performed to locate these relationships/differences. This correction can be done automatically when the table has many columns; in such a case, in the test option window you should select Multiple column comparisons (RxC).
  • In the case where we want to describe the strength of the relationship between feature X and feature Y, we can determine:
  • In the case where we want to describe, for 2×2 tables, the effect size showing the impact of a risk factor, we can determine the relative risk (RR) or the odds ratio (OR).

The Chi-square test for large tables

These tests are based on the data gathered in the form of a contingency table of 2 features ($X$, $Y$). One of them has possible $r$ categories $X_1, X_2,..., X_r$ and the other one $c$ categories $Y_1, Y_2,..., Y_c$ (look at the table (\ref{tab_kontyngencji_obser})).

The $\chi^2$ test for $r\times c$ tables is also known as the Pearson's Chi-square test (Karl Pearson 1900). This test is an extension, to 2 features, of the Chi-square goodness-of-fit test.

The test statistic is defined by:

\begin{displaymath}
\chi^2=\sum_{i=1}^r\sum_{j=1}^c\frac{(O_{ij}-E_{ij})^2}{E_{ij}}.
\end{displaymath}

This statistic asymptotically (for large expected frequencies) has the Chi-square distribution with a number of degrees of freedom calculated using the formula: $df=(r-1)(c-1)$.
The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$.
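A Python sketch of the statistic (an illustration; stats.chi2_contingency(O, correction=False) returns the same result):

import numpy as np
from scipy import stats

def chi2_rxc(O):
    O = np.asarray(O, float)
    # expected frequencies E_ij from the row and column totals
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    chi2 = ((O - E)**2 / E).sum()
    df = (O.shape[0] - 1) * (O.shape[1] - 1)
    return chi2, stats.chi2.sf(chi2, df)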

The settings window with the Chi-square test (RxC) can be opened in Statistics menu → NonParametric tests → Chi-square, Fisher, OR/RR or in ''Wizard''.

EXAMPLE (country-education.pqs file)

There is a sample of 605 persons ($n=605$), for whom 2 features were analysed ($X$=country of residence, $Y$=education). The first feature occurs in 4 categories, and the second one in 3 categories ($X_1$=Country 1, $X_2$=Country 2, $X_3$=Country 3, $X_4$=Country 4, $Y_1$=primary, $Y_2$=secondary, $Y_3$=higher). The data distribution is shown below, in the contingency table:

Based on this sample, you would like to find out if there is any dependence between education and country of residence in the analysed population.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $there is no dependence between education and country of residence$\\
&$in the analysed population,$\\
\mathcal{H}_1: & $there is a dependence between education and country of residence$\\
&$in the analysed population.$
\end{array}$

Cochran's condition is satisfied.

The p-value = 0.0006. So, on the basis of the significance level $\alpha=0.05$, we can draw the conclusion that there is a dependence between education and country of residence in the analysed population. If we are interested in more precise information about the detected dependencies, we will obtain it by determining multiple comparisons through the options Fisher, Yates and others… and then Multiple column comparisons (RxC) and one of the corrections, e.g. Benjamini-Hochberg.

A closer look reveals that only the second country differs from the other countries in educational attainment in a statistically significant way.


The Chi-square test for small tables

These tests are based on the data gathered in the form of a contingency table of 2 features ($X$, $Y$), each of them has 2 possible categories $X_1, X_2$ and $Y_1, Y_2$ (look at the table (\ref{tab_kontyngencji_obser})).

The $\chi^2$ test for $2\times 2$ tables – the Pearson's Chi-square test (Karl Pearson 1900) – is a special case of the Chi-square test for $r\times c$ tables.

The test statistic is defined by:

\begin{displaymath}
\chi^2=\sum_{i=1}^2\sum_{j=1}^2\frac{(O_{ij}-E_{ij})^2}{E_{ij}}.
\end{displaymath}

This statistic asymptotically (for large expected frequencies) has the Chi-square distribution with 1 degree of freedom.

The settings window with the Chi-square test (2×2) can be opened in Statistics menu → NonParametric tests → Chi-square, Fisher, OR/RR or in ''Wizard''.

EXAMPLE (sex-exam.pqs file)

There is a sample consisting of 170 persons ($n=170$). Using this sample, you want to analyse 2 features ($X$=sex, $Y$=exam passing). Each of these features occurs in two categories ($X_1$=f, $X_2$=m, $Y_1$=yes, $Y_2$=no). Based on the sample, you want to know whether there is any dependence between sex and exam passing in the above population. The data distribution is presented in the contingency table below:

\begin{tabular}{|c|c||c|c|c|}
\hline
\multicolumn{2}{|c||}{Observed frequencies }& \multicolumn{3}{|c|}{exam passing}\\\cline{3-5}
\multicolumn{2}{|c||}{$O_{ij}$} & yes & no & total \\\hline \hline
\multirow{3}{*}{sex}& f & 50 & 40 & 90 \\\cline{2-5}
& m & 20 & 60 & 80 \\\cline{2-5}
& total & 70 & 100 & 170\\\hline
\end{tabular}

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $there is no dependence between sex and exam passing in the analysed population,$\\
\mathcal{H}_1: & $there is a dependence between sex and exam passing in the analysed population.$
\end{array}$

The table of expected frequencies contains no values less than 5, so Cochran's condition is satisfied.

At the assumed significance level of $\alpha=0.05$, all the performed tests confirmed the alternative hypothesis (these p-values are reproduced in the sketch after this list):

  • chi-square test, p=0.000053,
  • chi-square test with Yates correction, p=0.000103,
  • Fisher's exact test, p=0.000083,
  • mid-p test, p=0.000054.
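
The first three of these p-values can be reproduced with a short scipy sketch run on the contingency table above (an illustration, not PQStat itself; the mid-p variant is not built into scipy):

import numpy as np
from scipy import stats

O = np.array([[50, 40],    # f: passed / failed
              [20, 60]])   # m: passed / failed
chi2, p, df, E = stats.chi2_contingency(O, correction=False)
chi2_y, p_y, _, _ = stats.chi2_contingency(O, correction=True)   # Yates
_, p_f = stats.fisher_exact(O)                                   # exact test
print(f"{p:.6f} {p_y:.6f} {p_f:.6f}")
# p-values as listed above: 0.000053, 0.000103, 0.000083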

The Fisher's test for large tables

The Fisher test for $r\times c$ tables is also called the Fisher-Freeman-Halton test (Freeman G.H., Halton J.H. (1951)16)). This test is an extension, to $r\times c$ tables, of the Fisher's exact test. It calculates the exact probability of the occurrence of a specific distribution of counts in the table (when we know $n$ and the marginal totals are fixed).

If you define marginal sums of each row as:

\begin{displaymath}
W_i=\sum_{j=1}^cO_{ij},
\end{displaymath}

where:

$O_{ij}$ – observed frequencies in a table,

and the marginal sums of each column as:

\begin{displaymath}
K_j=\sum_{i=1}^rO_{ij}.
\end{displaymath}

then, having defined the marginal sums for the different distributions of the observed frequencies represented by $U_{ij}$, you can calculate the $P$ probabilities:

\begin{displaymath}
P=\frac{D^{-1}\prod_{j=1}^{c}K_j!}{\prod_{j=1}^{c}U_{1j}!U_{2j}!\dots U_{rj}!},
\end{displaymath}

where \begin{displaymath}
D=\frac{(W_1+W_2+\dots+W_r)!}{W_1!W_2!\dots W_r!}.
\end{displaymath}

The exact significance level $p$ is the sum of the $P$ probabilities (calculated for the possible configurations $U_{ij}$) that are smaller than or equal to the $P$ probability of the table with the initial counts $O_{ij}$.

The exact p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$.
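
The point probability $P$ of a single table is easy to compute directly from the formula above; a Python sketch follows (the helper name is ours; the exact p-value then requires enumerating all tables with the same margins):

import numpy as np
from math import factorial

def table_probability(O):
    O = np.asarray(O, dtype=int)
    num = 1
    for W in O.sum(axis=1):            # row totals W_i
        num *= factorial(int(W))
    for K in O.sum(axis=0):            # column totals K_j
        num *= factorial(int(K))
    den = factorial(int(O.sum()))      # n!
    for cell in O.ravel():             # observed frequencies O_ij
        den *= factorial(int(cell))
    return num / den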

The settings window with the Fisher exact test (RxC) can be opened in Statistics menu → NonParametric tests → Chi-square, Fisher, OR/RR or in ''Wizard''.

Info.

The process of calculation of p-values for this test is based on the algorithm published by Mehta (1986)17).

EXAMPLE (job prevention.pqs file)

In the population of people living in the rural areas of the Komorniki municipality, it was examined whether the performance of preventive health examinations depends on the type of occupational activity of the residents. A random sample of 120 people was collected and asked about their occupational activity and whether they perform preventive examinations. Complete answers were obtained from 113 persons.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $there is no correlation between performance of preventive examinations$\\
&$and the type of work performed by the residents of rural areas of the Komorniki commune,$\\
\mathcal{H}_1: & $there is a correlation between performance of preventive examinations$\\
&$and the type of work performed by the residents of rural areas of the Komorniki commune.$
\end{array}$

Cochran's condition is not satisfied, thus we should not use the chi-square test.

The p-value < 0.0001. Therefore, at the significance level $\alpha=0.05$, we can say that there is a relationship between the performance of preventive examinations and the type of work performed by residents of rural areas of the Komorniki municipality. If we are interested in more precise information about the detected relationships, we will obtain it by determining multiple comparisons through the options Fisher, Yates and others… and then Multiple column comparisons (RxC) and one of the corrections, e.g. Benjamini-Hochberg.

A closer analysis allows us to conclude that health professionals perform preventive examinations significantly more often than the other groups (100% of people in this group performed examinations), and the unemployed significantly less often (no one in this group performed an examination). Farmers, other manual workers and other white-collar workers take preventive examinations in about 50% of cases, which means that these three groups do not differ from each other in a statistically significant way. Some of the p-values in the table are marked with an asterisk; it denotes results obtained with the Fisher exact test with the Benjamini-Hochberg correction, while the values not marked with an asterisk are results of the chi-square test with the Benjamini-Hochberg correction, for which Cochran's assumptions were fulfilled.


The Chi-square test corrections for small tables

These tests are based on data collected in the form of a contingency table of 2 features ($X$, $Y$), each of which has 2 possible categories $X_1, X_2$ and $Y_1, Y_2$ (look at the table (\ref{tab_kontyngencji_obser})).

The Chi-square test with the Yates correction for continuity

The $\chi^2$ test with the Yates correction (Frank Yates (1934)18)) is a more conservative test than the Chi-square test (it rejects a null hypothesis more rarely than the $\chi^2$ test). The correction for continuity guarantees that the test statistic can take all real values, in agreement with the assumption of the $\chi^2$ distribution.

The test statistic is defined by:

\begin{displaymath}
\chi^2=\sum_{i=1}^2\sum_{j=1}^2\frac{(|O_{ij}-E_{ij}|-0.5)^2}{E_{ij}}.
\end{displaymath}

The Fisher test for (2×2) tables

The Fisher test for $2\times 2$ tables is also called the Fisher exact test (R. A. Fisher (1934)19), (1935)20)). This test enables you to calculate the exact probability of the occurrence of a particular distribution of counts in the table (knowing $n$ and the defined marginal sums).

\begin{displaymath}
P=\frac{{O_{11}+O_{21} \choose O_{11}}{O_{12}+O_{22} \choose O_{12}}}{{O_{11}+O_{12}+O_{21}+O_{22} \choose O_{11}+O_{12}}}.
\end{displaymath}

If you know each marginal sum, you can calculate the $P$ probability for various configurations of observed frequencies. The exact $p$ significance level is the sum of the probabilities that are less than or equal to the analysed probability.

The mid-p test

The mid-p is a correction to the Fisher exact test. This modified p-value is recommended by many statisticians (Lancaster 1961 21), Anscombe 1981 22), Pratt and Gibbons 1981 23), Plackett 1984 24), Miettinen 1985 25), Barnard 1989 26), Rothman 2008 27)) as a method of decreasing the conservatism of the Fisher exact test. As a result, with the mid-p the null hypothesis is rejected more readily than with the Fisher exact test. For large samples, the p-values of the $\chi^2$ test with the Yates correction and of the Fisher exact test are quite similar, while the p-value of the $\chi^2$ test without any correction corresponds to the mid-p.

The p-value of the mid-p is calculated by the transformation of the probability value for the Fisher exact test. The one-sided p-value is calculated by using the following formula:

\begin{displaymath}
p_{I(mid-p)}=p_{I(Fisher)}-0.5\cdot P_{point\;(observed\;table)},
\end{displaymath}

where:

$p_{I(mid-p)}$ – one-sided p-value of mid-p,

$p_{I(Fisher)}$ – one-sided p-value of Fisher exact test,

and the two-sided p-value is defined as a doubled value of the smaller one-sided probability: \begin{displaymath}
p_{II(mid-p)}=2p_{I(mid-p)},
\end{displaymath}

where:

$p_{II(mid-p)}$ – two-sided p-value of mid-p.
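
A Python sketch of the two-sided mid-p, built from scipy's one-sided Fisher exact test and the hypergeometric point probability of the observed table (an illustration; the helper name is ours):

import numpy as np
from scipy import stats

def mid_p_2x2(O):
    O = np.asarray(O, dtype=int)
    n, n1, K1 = O.sum(), O[0].sum(), O[:, 0].sum()
    # point probability of the observed table under fixed margins
    point = stats.hypergeom.pmf(O[0, 0], n, K1, n1)
    p_one = min(stats.fisher_exact(O, alternative="less")[1],
                stats.fisher_exact(O, alternative="greater")[1])
    return min(2 * (p_one - 0.5 * point), 1.0)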

The settings window with the chi-square test and its corrections can be opened in Statistics menu → NonParametric tests → Chi-square, Fisher, OR/RR or in ''Wizard''.


The Chi-square test for trend

The $\chi^2$ test for trend (also called the Cochran-Armitage trend test 28)29)) is used to determine whether there is a trend in proportions across particular categories of the analysed variable (feature). It is based on data gathered in a contingency table of 2 features. The first feature has $r$ possible ordered categories: $X_1, X_2,..., X_r$, and the second one has 2 categories: $G_1$, $G_2$. The contingency table of $r\times 2$ observed frequencies:

\begin{tabular}{|c|c||c|c|c|}
\hline
\multicolumn{2}{|c||}{Observed frequencies }& \multicolumn{3}{|c|}{Feature 2 (group)}\\\cline{3-5}
\multicolumn{2}{|c||}{$O_{ij}$} & $G_1$ & $G_2$ & Total \\\hline \hline
\multirow{5}{*}{Feature 1 (feature $X$)}& $X_1$& $O_{11}$ & $O_{12}$ & $W_1=O_{11}+O_{12}$  \\\cline{2-5}
& $X_2$ & $O_{21}$ & $O_{22}$ & $W_2=O_{21}+O_{22}$  \\\cline{2-5}
& ... & ... & ... & ...  \\\cline{2-5}
& $X_r$ & $O_{r1}$ & $O_{r2}$ & $W_r=O_{r1}+O_{r2}$  \\\cline{2-5}
& Total & $C_1=\sum_{i=1}^rO_{i1}$ & $C_2=\sum_{i=1}^rO_{i2}$ & $n=C_1+C_2$\\\hline
\end{tabular}

Basic assumptions:

Hypotheses:

\begin{array}{cl}
\mathcal{H}_0: & $In the analysed population the trend in a proportion of $p_1, p_2, ..., p_r$ does not exist, $\\
\mathcal{H}_1: & $There is the trend in a proportion of $p_1, p_2, ..., p_r$ in the analysed population. $
\end{array}

where:

$p_1, p_2, ..., p_r$ are the proportions $p_1=\frac{O_{11}}{W_1}$, $p_2=\frac{O_{21}}{W_2}$,…, $p_r=\frac{O_{r1}}{W_r}$.

The test statistic is defined by:

\begin{displaymath}
\chi^2=\frac{\left[\left(\sum_{i=1}^r i\cdot O_{i1}\right) -C_1\left(\sum_{i=1}^r\frac{i\cdot W_i}{n}\right)\right]^2}{\frac{C_1}{n}\left(1-\frac{C_1}{n}\right)\left[\left(\sum_{i=1}^r i^2 W_i\right)-n\left(\sum_{i=1}^r\frac{i \cdot W_i}{n}\right)^2\right]}.
\end{displaymath}

This statistic asymptotically (for large expected frequencies) has the Chi-square distribution with 1 degree of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}
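
A Python sketch of this statistic (the helper is ours; O1 holds the $G_1$ counts per ordered category and W the row totals):

import numpy as np
from scipy import stats

def chi2_for_trend(O1, W):
    O1, W = np.asarray(O1, float), np.asarray(W, float)
    i = np.arange(1, len(W) + 1)       # scores 1..r of the ordered categories
    C1, n = O1.sum(), W.sum()
    num = ((i * O1).sum() - C1 * (i * W).sum() / n) ** 2
    den = (C1 / n) * (1 - C1 / n) * ((i**2 * W).sum()
                                     - n * ((i * W).sum() / n) ** 2)
    chi2 = num / den
    return chi2, stats.chi2.sf(chi2, 1)

Run on the viewers example further below, chi2_for_trend([7, 13, 30, 24, 26], [14, 38, 88, 123, 137]) gives $\chi^2\approx12.4$ and p$\approx$0.0004.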

The settings window with the Chi-square test for trend can be opened in Statistics menu → NonParametric tests → Chi-square, Fisher, OR/RR → Chi-square for trend.

EXAMPLE (smoking-education.pqs file)

We examine whether cigarette smoking is related to the education of residents of a village. A sample of 122 people was drawn. The data were recorded in a file.

We assume that the relationship can be of two types i.e. the more educated people, the more often they smoke or the more educated people, the less often they smoke. Thus, we are looking for an increasing or decreasing trend.

Before proceeding with the analysis, we need to prepare the data, i.e., we need to indicate the order in which the education categories should appear. To do this, from the properties of the Education variable, we select Codes/Labels/Format… and assign the order by specifying consecutive natural numbers. We also assign labels.

Hypotheses:

\begin{array}{cl}
\mathcal{H}_0: & $there is no increasing/decreasing trend in the number of smokers$\\
& $with increasing education in the rural population, $\\
\mathcal{H}_1: & $there is an increasing/decreasing trend in the number of smokers$\\
& $with increasing education in the rural population. $
\end{array}

The p-value=0.0091, which, compared to the significance level $\alpha$=0.05, indicates the truth of the alternative hypothesis that a trend exists.

As the graph shows, the more educated people are, the less often they smoke. However, the result obtained by people with lower secondary education deviates from this trend. Since there are only two people with lower secondary education, it did not have much influence on the trend. Due to the very small size of this group, it was decided to repeat the analysis with the primary and lower secondary education categories combined.

A small value, p=0.0078, was obtained again, confirming a statistically significant trend.

EXAMPLE (viewers.pqs file)

Because of a decrease in the number of people watching a particular soap opera, an opinion survey was carried out. 100 persons who had recently started watching the soap opera and 300 persons who had watched it regularly from the beginning were asked about their level of preoccupation with the characters' lives. The results are written down in the table below:

\begin{tabular}{|c||c|c|c|}
\hline
Level of & \multicolumn{3}{|c|}{group}\\\cline{2-4}
commitment & group of new viewers & group of steady viewers  & total \\\hline \hline
rather small & 7 & 7 & 14  \\\hline
average & 13 & 25 & 38  \\\hline
rather high & 30 & 58 & 88  \\\hline
high& 24 & 99 & 123\\\hline
very high  & 26& 111& 137\\\hline
total & 100 & 300& 400\\\hline
\end{tabular}

The new viewers make up 25\% of all the analysed viewers. This proportion is not the same for each level of commitment, but looks like this:

\begin{tabular}{|c||c|c|c|}
\hline
Level of& \multicolumn{3}{|c|}{group}\\\cline{2-4}
commitment & group of new viewers & group of steady viewers & total \\\hline \hline
rather small & $p_1$=50.00\% & 50.00\% & 100\%  \\\hline
average & $p_2$=34.21\% & 65.79\% & 100\%  \\\hline
rather high & $p_3$=34.09\% & 65.91\% & 100\%  \\\hline
high& $p_4$=19.51\% & 80.49\% & 100\%\\\hline
very high  & $p_5$=18.98\%& 81.02\%& 100\%\\\hline
\textbf{total} & \textbf{25.00\%} & \textbf{75.00\%}& \textbf{100\%}\\\hline
\end{tabular}

Hypotheses:

\begin{array}{cl}
\mathcal{H}_0: & $in the population of the soap opera viewers, the trend in proportions of $\\
& p_1, p_2, p_3, p_4, p_5 $ does not exist,$\\
\mathcal{H}_1: & $in the population of the soap opera viewers, the trend in proportions of $\\
& p_1, p_2, p_3, p_4, p_5 $ does exist.$\\
\end{array}

The p-value=0.0004 which, compared to the significance level $\alpha$=0.05, proves the truth of the alternative hypothesis that there is a trend in the proportions $p_1, p_2, ..., p_5$. As can be seen from the contingency table of the percentages calculated from the sum of the columns, this is a decreasing trend (the more interested the group of viewers is in the fate of the characters of the series, the smaller part of it is made up of new viewers).


The Z test for 2 independent proportions

The $Z$ test for 2 independent proportions is used in situations similar to the Chi-square test (2×2), i.e. when there are 2 independent samples with total sizes $n_1$ and $n_2$, each with 2 possible results (the distinguished result occurs $m_1$ times in the first sample and $m_2$ times in the second one). For these samples it is also possible to calculate the distinguished proportions $p_1=\frac{m_1}{n_1}$ and $p_2=\frac{m_2}{n_2}$. This test is used to verify the hypothesis that the distinguished proportions $P_1$ and $P_2$ in the populations from which the samples were drawn are equal.
Basic assumptions:

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & P_1=P_2,\\
\mathcal{H}_1: & P_1\neq P_2,
\end{array}$

where:

$P_1$, $P_2$ – fractions for the first and the second population.

The test statistic is defined by:

\begin{displaymath}
Z=\frac{p_1-p_2}{\sqrt{p(1-p)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}},
\end{displaymath}

where:

$p=\frac{m_1+m_2}{n_1+n_2}$.

The test statistic modified by the continuity correction is defined by:

\begin{displaymath}
Z=\frac{\left|p_1-p_2\right|-\frac{1}{2}\left(\frac{1}{n_1}+\frac{1}{n_2} \right)}{\sqrt{p(1-p)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}.
\end{displaymath}

The $Z$ statistic with and without the continuity correction asymptotically (for large sample sizes) has the normal distribution.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

Apart from the difference between proportions, the program calculates the value of the NNT.

NNT (number needed to treat) – an indicator used in medicine to define the number of patients who have to be treated for a certain time in order to cure one person.

NNT is calculated from the formula:

\begin{displaymath}
NNT=\frac{1}{|p_1-p_2|}
\end{displaymath}

and is quoted when the difference $p_1-p_2$ is positive.

NNH (number needed to harm) – an indicator used in medicine, denotes the number of patients whose exposure to a risk over a specified period of time, results in harm to one person who would not otherwise be harmed. NNH is calculated in the same way as NNT, but is quoted when the difference $p_1-p_2$ is negative.
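
A Python sketch combining the $Z$ statistic (with the optional continuity correction) and the NNT/NNH indicator (an illustration; the function name is ours):

import numpy as np
from scipy import stats

def z_test_two_proportions(m1, n1, m2, n2, continuity=False):
    p1, p2 = m1 / n1, m2 / n2
    p = (m1 + m2) / (n1 + n2)              # pooled proportion
    se = np.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    num = p1 - p2
    if continuity:
        num = abs(num) - 0.5 * (1 / n1 + 1 / n2)
    Z = num / se
    nnt = 1 / abs(p1 - p2)                 # NNT if p1-p2 > 0, NNH if p1-p2 < 0
    return Z, 2 * stats.norm.sf(abs(Z)), nnt

For the sex-exam example below, z_test_two_proportions(50, 90, 20, 80) gives p < 0.0001.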

Confidence interval – the narrower the confidence interval, the more precise the estimate. If the confidence interval includes 0 for the difference in proportions and $\infty$ for the NNT and/or NNH, then there is an indication to treat the result as statistically insignificant.

Note

From PQStat version 1.3.0, the confidence intervals for the difference between two independent proportions are estimated on the basis of the Newcombe-Wilson method. In previous versions they were estimated on the basis of the Wald method.

The justification of the change is as follows:

Confidence intervals based on the classical Wald method are suitable for large sample sizes and for the difference between proportions far from 0 or 1. For small samples and for the difference between proportions close to those extreme values, the Wald method can lead to unreliable results (Newcombe 199830), Miettinen 198531), Beal 198732), Wallenstein 199733)). A comparison and analysis of many methods which can be used instead of the simple Wald method can be found in Newcombe's study (1998)34). The suggested method, suitable also for extreme values of proportions, is the method first published by Wilson (1927)35), extended to the intervals for the difference between two independent proportions.

Note

The confidence interval for NNT and/or NNH is calculated as the inverse of the interval for the proportion, according to the method proposed by Altman (Altman (1998)36)).

The settings window with the Z test for 2 proportions can be opened in Statistics menu → NonParametric tests → Z for 2 independent proportions.

EXAMPLE cont. (sex-exam.pqs file)

You know that $\frac{50}{90}=55.56\%$ of the women in the sample passed the exam and $\frac{20}{80}=25.00\%$ of the men in the sample passed the exam. These data can be written in two ways – as a numerator and a denominator for each sample, or as a proportion and a denominator for each sample:

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $The proportion of the men who passed the exam is the same as the proportion $\\
&$of the women who passed the exam in the analysed population,$\\
\mathcal{H}_1: & $The proportion of the men who passed the exam is different than the proportion $\\
&$of the women who passed the exam in the analysed population.$
\end{array}$

Note

It is necessary to select the appropriate area (data without headings) before the analysis begins, because usually a datasheet contains more information. You should also select the option indicating the content of the variable (frequency (numerator) or proportion). The difference between the proportions distinguished in the sample is 30.56\%, and the 95\% confidence interval for it $(15.90\%, 43.35\%)$ does not contain 0.

Based on the $Z$ test without the continuity correction as well as on the $Z$ test with the continuity correction (p-value < 0.0001), at the significance level $\alpha$=0.05, the alternative hypothesis can be accepted (similarly to the Fisher exact test, its mid-p correction, the $\chi^2$ test and the $\chi^2$ test with the Yates correction). So, the proportion of men who passed the exam is different from the proportion of women who passed the exam in the analysed population. Significantly, the exam was passed more often by women ($\frac{50}{90}=55.56\%$ of the women in the sample) than by men ($\frac{20}{80}=25.00\%$ of the men in the sample).

EXAMPLE

Let us assume that the mortality rate of a disease is 100\% without treatment and that therapy lowers the mortality rate to 50\% – that is the result of 20 years of study. We want to know how many people have to be treated to prevent 1 death in 20 years. To answer that question, two samples of 100 people were taken from the population of the diseased. In the sample without treatment there are 100 patients, of whom we know that all will die without the therapy. In the sample with therapy we also have 100 patients, of whom 50 will survive.

\begin{tabular}{|c|c||c|c|}\hline
\multicolumn{2}{|c||}{Patients -- not undergoing therapy}& \multicolumn{2}{|c|}{Patients -- undergoing therapy}\\\hline
sample numerator&sample (denominator)&sample numerator&sample (denominator)\\\hline
100&100&50&100\\\hline
\end{tabular}

We will calculate the NNT.

The difference between proportions is statistically significant ($p<0.0001$), but we are interested in the NNT – its value is 2, so treating 2 patients for 20 years will prevent 1 death. The calculated 95\% confidence interval should be rounded off to whole numbers, wherefore the NNT is 2 to 3 patients.

EXAMPLE

The value of the proportion difference in a study comparing the effectiveness of drug 1 vs drug 2 was: difference (95\%CI) = -0.08 (-0.27 to 0.11). This negative proportion difference suggests that drug 1 was less effective than drug 2, so its use put patients at risk. Because the proportion difference is negative, the determined inverse is called the NNH, and because the confidence interval contains infinity, NNH(95\%CI) = 12.5 (NNH 3.7 to ∞ to NNT 9.1), i.e. it goes from NNH to NNT, we should conclude that the result obtained is not statistically significant (Altman (1998)37)).


The Z Test for two dependent proportions

The $Z$ test for two dependent proportions is used in situations similar to the McNemar's test, i.e. when we have 2 dependent groups of measurements ($X^{(1)}$ and $X^{(2)}$), in which we can obtain 2 possible results of the studied feature ((+)(–)).

\begin{tabular}{|c|c||c|c|c|}
\hline
\multicolumn{2}{|c||}{Observed sizes}& \multicolumn{3}{|c|}{$X^{(2)}$} \\\cline{3-5}
\multicolumn{2}{|c||}{$O_{ij}$}&\textbf{(+)}&\textbf{(--)}& \textbf{Total}\\\hline \hline
\multirow{3}{*}{$X^{(1)}$} & \textbf{(+)} & $O_{11}$ & $O_{12}$ & $O_{11}+O_{12}$ \\\cline{2-5}
&\textbf{(--)}& $O_{21}$ & $O_{22}$ & $O_{21}+O_{22}$\\\cline{2-5}
&\textbf{Total} & $O_{11}+O_{21}$ & $O_{12}+O_{22}$ & $n=O_{11}+O_{12}+O_{21}+O_{22}$\\\hline
\end{tabular}

We can also calculate the distinguished proportions for those groups, $p_1=\frac{O_{11}+O_{12}}{n}$ and $p_2=\frac{O_{11}+O_{21}}{n}$. The test serves the purpose of verifying the hypothesis that the distinguished proportions $P_1$ and $P_2$ in the population from which the sample was drawn are equal.
Basic assumptions:

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & P_1-P_2=0,\\
\mathcal{H}_1: & P_1-P_2\neq 0,
\end{array}$

where:

$P_1$, $P_2$ – fractions for the first and the second measurement.

The test statistic has the form presented below:

\begin{displaymath}
Z=\frac{p_1-p_2}{\sqrt{O_{21}+O_{12}}}\cdot n,
\end{displaymath}

The $Z$ statistic asymptotically (for large sample sizes) has the normal distribution.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}
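
A Python sketch of this statistic (the helper is ours; the opinion example further below, with $O_{11}=50$, $O_{12}=4$, $O_{21}=44$, $O_{22}=54$, gives p < 0.0001):

import numpy as np
from scipy import stats

def z_test_dependent_proportions(O11, O12, O21, O22):
    n = O11 + O12 + O21 + O22
    p1 = (O11 + O12) / n        # distinguished proportion, first measurement
    p2 = (O11 + O21) / n        # distinguished proportion, second measurement
    Z = (p1 - p2) / np.sqrt(O21 + O12) * n
    return Z, 2 * stats.norm.sf(abs(Z))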

Note

Confidence interval for the difference of two dependent proportions is estimated on the basis of the Newcombe-Wilson method.

The window with settings for the Z test for two dependent proportions is accessed via the menu Statistics → NonParametric tests → Z-Test for two dependent proportions.

EXAMPLE cont. (opinion.pqs file)

When we limit the study to people who have a specific opinion about the professor (i.e. those who have only a positive or only a negative opinion) we will have 152 such students. The data for the calculations are: $O_{11}=50$, $O_{12}=4$, $O_{21}=44$, $O_{22}=54$. We know that $\frac{50+4}{152}=35.53\%$ of the students expressed a negative opinion before the exam. After the exam this percentage was $\frac{50+44}{152}=61.84\%$.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $a lack of a difference between the number of negative evaluations of $\\
&$the professor before and after the exam,$\\
\mathcal{H}_1: & $there is a difference between the number of negative evaluations of $\\
&$the professor before and after the exam.$\\
\end{array}$

The difference in the proportions distinguished in the sample is 26.32\%, and the 95\% confidence interval for it (18.07\%, 33.88\%) does not contain 0.

On the basis of the $Z$ test (p<0.0001), at the significance level $\alpha$=0.05 (similarly to the McNemar test), we accept the alternative hypothesis. Therefore, the proportion of negative evaluations before the exam differs from the proportion of negative evaluations after the exam. Indeed, after the exam there are more negative evaluations of the professor.


The McNemar test, the Bowker test of internal symmetry

Basic assumptions:

The McNemar test

The McNemar test (McNemar (1947)38)) is used to verify the hypothesis of agreement between the results of two measurements, $X^{(1)}$ and $X^{(2)}$, of an $X$ feature (between 2 dependent variables $X^{(1)}$ and $X^{(2)}$). The analysed feature can have only 2 categories (defined here as (+) and (–)). The McNemar test can be calculated on the basis of raw data or of a $2\times 2$ contingency table.

\begin{tabular}{|c|c||c|c|c|}
\hline
\multicolumn{2}{|c||}{Observed frequencies}& \multicolumn{3}{|c|}{$X^{(2)}$} \\\cline{3-5}
\multicolumn{2}{|c||}{$O_{ij}$}&\textbf{(+)}&\textbf{(--)}& \textbf{Total}\\\hline \hline
\multirow{3}{*}{$X^{(1)}$} & \textbf{(+)} & $O_{11}$ & $O_{12}$ & $O_{11}+O_{12}$ \\\cline{2-5}
&\textbf{(--)}& $O_{21}$ & $O_{22}$ & $O_{21}+O_{22}$\\\cline{2-5}
&\textbf{Total} & $O_{11}+O_{21}$ & $O_{12}+O_{22}$ & $n=O_{11}+O_{12}+O_{21}+O_{22}$\\\hline
\end{tabular}

Hypotheses:

\begin{array}{cl}
\mathcal{H}_0: & O_{12}=O_{21}, \\
\mathcal{H}_1: & O_{12}\neq O_{21}.
\end{array}

The test statistic is defined by:

\begin{displaymath}
\chi^2=\frac{(O_{12}-O_{21})^2}{O_{12}+O_{21}}.
\end{displaymath}

This statistic asymptotically (for large frequencies) has the Chi-square distribution with 1 degree of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

The Continuity correction for the McNemar test

This correction makes the test more conservative than the plain McNemar test (the null hypothesis is rejected more rarely). It compensates for approximating the discrete distribution of the test statistic with the continuous $\chi^2$ distribution. Some sources advise using the continuity correction always, others only when the frequencies in the table are small.

The test statistic with the continuity correction is defined by:

\begin{displaymath}
\chi^2=\frac{(|O_{12}-O_{21}|-1)^2}{O_{12}+O_{21}}.
\end{displaymath}
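
Both variants are easy to compute directly from the two discordant cells. A minimal sketch (Python with scipy; the function name is ours):

from scipy.stats import chi2

def mcnemar(O12, O21, continuity=False):
    # McNemar chi-square statistic, asymptotic p-value with df = 1
    num = (abs(O12 - O21) - 1)**2 if continuity else (O12 - O21)**2
    stat = num / (O12 + O21)
    return stat, chi2.sf(stat, 1)

print(mcnemar(4, 44))                   # chi2 = 33.33, p < 0.0001
print(mcnemar(4, 44, continuity=True))  # chi2 = 31.69 (more conservative)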

McNemar's exact test

A common rule for the asymptotic validity of the McNemar chi-square test is Rufibach's condition that the number of discordant pairs is at least 10: $O_{12}+O_{21}\geq10$ 39). When this condition is not satisfied, we should rely on the exact probability values of this test 40). The exact p-value of the test is based on the binomial distribution and is conservative, so the recommended mid-p value of the McNemar exact test is also given in addition to the exact value.
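
Under these definitions, the exact (conditional binomial) and mid-p values can be sketched as follows (Python with scipy; the function name is ours, and the two-sided exact p-value is taken as the doubled smaller binomial tail, capped at 1):

from scipy.stats import binom

def mcnemar_exact(O12, O21):
    # exact McNemar test: condition on n discordant pairs, Binomial(n, 0.5)
    n, k = O12 + O21, min(O12, O21)
    p_exact = min(1.0, 2 * binom.cdf(k, n, 0.5))
    p_mid = p_exact - binom.pmf(k, n, 0.5)  # observed point counted with half weight
    return p_exact, p_mid

print(mcnemar_exact(4, 44))  # both values far below 0.0001 for the example data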

Odds ratio of a result change

If the study of the same feature is carried out twice on the same objects, the odds ratio for the result change (from $(+)$ to $(-)$ and inversely) is calculated for the table.

The odds of a result change from $(+)$ to $(-)$ are represented by $O_{12}$, and the odds of a change from $(-)$ to $(+)$ by $O_{21}$.

The odds ratio ($OR$) is:

\begin{displaymath}
OR=\frac{O_{12}}{O_{21}}.
\end{displaymath}

The confidence interval for the odds ratio is calculated on the basis of the standard error:

\begin{displaymath}
SE=\sqrt{\frac{1}{O_{12}}+\frac{1}{O_{21}}}.
\end{displaymath}
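
Assuming the usual log-scale construction of the interval from this standard error (the standard approach, although not spelled out above), a minimal sketch (Python; the function name is ours):

from math import exp, log, sqrt

def change_odds_ratio(O12, O21, z=1.96):
    # OR of a result change with an asymptotic 95% CI on the log scale
    or_ = O12 / O21
    se = sqrt(1 / O12 + 1 / O21)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)

# Example below: O12=4, O21=44
print(change_odds_ratio(4, 44))   # OR = 0.09 (0.03, 0.25)
print(change_odds_ratio(44, 4))   # inverse direction: OR = 11 (3.95, 30.64)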

Note

Additionally, for small sample sizes, the exact confidence interval for the odds ratio can be determined41).

The settings window with the Bowker-McNemar test can be opened in the Statistics menu → NonParametric tests → Bowker-McNemar or in ''Wizard''.

The Bowker test of internal symmetry

The Bowker test of internal symmetry (Bowker (1948)42)) is an extension of the McNemar test to 2 variables with more than 2 categories ($c>2$). It is used to verify the hypothesis of symmetry of the results of two measurements, $X^{(1)}$ and $X^{(2)}$, of an $X$ feature (symmetry of 2 dependent variables $X^{(1)}$ and $X^{(2)}$). The analysed feature may have more than 2 categories. The Bowker test of internal symmetry can be calculated on the basis of either raw data or a $c\times c$ contingency table.

\begin{tabular}{|c|c||c|c|c|c|c|}
\hline
\multicolumn{2}{|c||}{Observed frequencies}& \multicolumn{5}{|c|}{$X^{(2)}$}\\\cline{3-7}
\multicolumn{2}{|c||}{$O_{ij}$} & $X_1^{(2)}$ & $X_2^{(2)}$ & ... & $X_c^{(2)}$ & Total \\\hline \hline
\multirow{5}{*}{$X^{(1)}$}& $X_1^{(1)}$ & $O_{11}$ & $O_{12}$ & ... & $O_{1c}$& $\sum_{j=1}^cO_{1j}$  \\\cline{2-7}
& $X_2^{(1)}$ & $O_{21}$ & $O_{22}$ & ... & $O_{2c}$& $\sum_{j=1}^cO_{2j}$   \\\cline{2-7}
& ...& ... & ... & ... & ...& ...  \\\cline{2-7}
& $X_c^{(1)}$ & $O_{c1}$ & $O_{c2}$ & ... & $O_{cc}$& $\sum_{j=1}^cO_{cj}$   \\\cline{2-7}
& Total & $\sum_{i=1}^cO_{i1}$ & $\sum_{i=1}^cO_{i2}$ & ... & $\sum_{i=1}^cO_{ic}$& $n=\sum_{i=1}^c\sum_{j=1}^cO_{ij}$\\\hline
\end{tabular}

Hypotheses:

\begin{array}{cl}
\mathcal{H}_0: & O_{ij}=O_{ji},\\
\mathcal{H}_1: & O_{ij}\neq O_{ji} $ for at least one pair $ O_{ij}, O_{ji},
\end{array}

where $j\neq i$, $j\in\{1,2,...,c\}$, $i\in\{1,2,...,c\}$, so $O_{ij}$ and $O_{ji}$ are the frequencies of the symmetrical pairs in the $c\times c$ table.

The test statistic is defined by:

\begin{displaymath}
\chi^2=\sum_{i=1}^c\sum_{j>i}\frac{(O_{ij}-O_{ji})^2}{O_{ij}+O_{ji}}.
\end{displaymath}

This statistic asymptotically (for large sample size) has the Chi-square distribution with a number of degrees of freedom calculated using the formula: $df=\frac{c(c-1)}{2}$.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}
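
A compact sketch for an arbitrary $c\times c$ table (Python with numpy and scipy; the function name is ours). For $c=2$ the statistic reduces to the McNemar statistic.

import numpy as np
from scipy.stats import chi2

def bowker(table):
    # Bowker symmetry statistic for a square c x c contingency table
    O = np.asarray(table, dtype=float)
    c = O.shape[0]
    i, j = np.triu_indices(c, k=1)       # all index pairs with j > i
    num = (O[i, j] - O[j, i])**2
    den = O[i, j] + O[j, i]
    keep = den > 0                       # skip symmetric pairs with no observations
    stat = float(np.sum(num[keep] / den[keep]))
    return stat, chi2.sf(stat, df=c * (c - 1) // 2)

print(bowker([[50, 4], [44, 54]]))       # (33.33, p < 0.0001) -- the McNemar case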

EXAMPLE (opinion.pqs file)

Two surveys were carried out to analyse students' opinions about a particular academic professor. In both surveys students could give a positive, a negative or a neutral opinion. Both surveys were carried out on the same sample of 250 students, but the first one took place the day before an exam conducted by the professor, and the second one the day after the exam. Some of the data are shown below as raw rows, and all the data in the form of a contingency table. Check whether both surveys give similar results.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $the number of students who changed their opinions is exactly the same$\\
&$for each of the possible symmetric opinion changes,$\\
\mathcal{H}_1: & $the number of students who changed their opinions is different$\\
&$for at least one of the possible symmetric opinion changes,$
\end{array}$

where, for example, changing the opinion from positive to negative one is symmetrical to changing the opinion from negative to positive one.

Comparing the p-value of the Bowker test (p<0.0001) with the significance level $\alpha=0.05$, we may assume that the students changed their opinions. Looking at the table you can see that more students changed their opinions to negative ones after the exam than to positive ones. There were also students who no longer evaluated the professor positively after the exam.

If you limit the analysis to the people having clear opinions about the professor (positive or negative ones), you can use the McNemar test:

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $the number of students who changed their opinions from negative to positive$\\
&$ is exactly the same as the number who changed their opinions from positive to negative,$\\
\mathcal{H}_1: & $the number of students who changed their opinions from negative to positive$\\
&$ is different from the number who changed their opinions from positive to negative.$
\end{array}$

If you compare the p-value calculated for the McNemar test (p<0.0001) with the significance level $\alpha=0.05$, you conclude that the students changed their opinions. There were many more students who changed their opinions to negative ones after the exam than students who changed them to positive ones. The odds of changing an opinion from positive (before the exam) to negative (after the exam) are eleven times $\left(\frac{44}{4}\right)$ greater than the odds of a change in the opposite direction $\left(\frac{4}{44}\right)$.

1) , 4) , 9) , 14)
Cohen J. (1988), Statistical Power Analysis for the Behavioral Sciences, Lawrence Erlbaum Associates, Hillsdale, New Jersey
2)
Cochran W.G. and Cox G.M. (1957), Experimental designs (2nd ed.). New York: John Wiley and Sons.
3)
Satterthwaite F.E. (1946), An approximate distribution of estimates of variance components. Biometrics Bulletin, 2, 110-114
5)
Mann H. and Whitney D. (1947), On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics, 18, 50-60
6) , 11)
Wilcoxon F. (1949), Some rapid approximate statistical procedures. Stamford, CT: Stamford Research Laboratories, American Cyanamid Corporation
7) , 12)
Marascuilo L.A. and McSweeney M. (1977), Nonparametric and distribution-free method for the social sciences. Monterey, CA: Brooks/Cole Publishing Company
8) , 13)
Fritz C.O., Morris P.E., Richler J.J.(2012), Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General., 141(1):2–18.
10)
Wilcoxon F. (1945), Individual comparisons by ranking methods. Biometrics Bulletin, 1, 80-83
15)
Cochran W.G. (1952), The chi-square goodness-of-fit test. Annals of Mathematical Statistics, 23, 315-345
16)
Freeman G.H. and Halton J.H. (1951), Note on an exact treatment of contingency, goodness of fit and other problems of significance. Biometrika 38:141-149
17)
Mehta C.R. and Patel N.R. (1986), Algorithm 643. FEXACT: A Fortran subroutine for Fisher's exact test on unordered r*c contingency tables. ACM Transactions on Mathematical Software, 12, 154–161
18)
Yates F. (1934), Contingency tables involving small numbers and the chi-square test. Journal of the Royal Statistical Society, 1, 217-235
19)
Fisher R.A. (1934), Statistical methods for research workers (5th ed.). Edinburgh: Oliver and Boyd.
20)
Fisher R.A. (1935), The logic of inductive inference. Journal of the Royal Statistical Society, Series A, 98,39-54
21)
Lancaster H.O. (1961), Significance tests in discrete distributions. Journal of the American Statistical Association 56:223-234
22)
Anscombe F.J. (1981), Computing in Statistical Science through APL. Springer-Verlag, New York
23)
Pratt J.W. and Gibbons J.D. (1981), Concepts of Nonparametric Theory. Springer-Verlag, New York
24)
Plackett R.L. (1984), Discussion of Yates' „Tests of significance for 2×2 contingency tables”. Journal of Royal Statistical Society Series A 147:426-463
25)
Miettinen O.S. (1985), Theoretical Epidemiology: Principles of Occurrence Research in Medicine. John Wiley and Sons, New York
26)
Barnard G.A. (1989), On alleged gains in power from lower p-values. Statistics in Medicine 8:1469-1477
27)
Rothman K.J., Greenland S., Lash T.L. (2008), Modern Epidemiology, 3rd ed. (Lippincott Williams and Wilkins) 221-225
28)
Cochran W.G. (1954), Some methods for strengthening the common chi-squared tests. Biometrics. 10 (4): 417–451
29)
Armitage P. (1955), Tests for Linear Trends in Proportions and Frequencies. Biometrics. 11 (3): 375–386
30) , 34)
Newcombe R.G. (1998), Interval Estimation for the Difference Between Independent Proportions: Comparison of Eleven Methods. Statistics in Medicine 17: 873-890
31)
Miettinen O.S. and Nurminen M. (1985), Comparative analysis of two rates. Statistics in Medicine 4: 213-226
32)
Beal S.L. (1987), Asymptotic confidence intervals for the difference between two binomial parameters for use with small samples. Biometrics 43: 941-950
33)
Wallenstein S. (1997), A non-iterative accurate asymptotic confidence interval for the difference between two Proportions. Statistics in Medicine 16: 1329-1336
35)
Wilson E.B. (1927), Probable Inference, the Law of Succession, and Statistical Inference. Journal of the American Statistical Association: 22(158):209-212
36) , 37)
Altman D.G. (1998), Confidence intervals for the number needed to treat. BMJ. 317(7168): 1309–1312
38)
McNemar Q. (1947), Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12, 153-157
39)
Rufibach K. (2010), Assessment of paired binary data; Skeletal Radiology, 40, 1-4
40)
Fagerland M.W., Lydersen S., and Laake P. (2013), The McNemar test for binary matched-pairs data: mid-p and asymptotic are better than exact conditional, BMC Med Res Methodol; 13: 91
41)
Liddell F.D.K. (1983), Simplified exact analysis of case-referent studies; matched pairs; dichotomous exposure. Journal of Epidemiology and Community Health; 37:82-84
42)
Bowker A.H. (1948), Test for symmetry in contingency tables. Journal of the American Statistical Association, 43, 572-574