
Comparison - more than two groups

\begin{pspicture}(0,2.5)(15,14.5)
\rput(2,14){Interval scale}
\rput[tl](.1,13.4){\ovalnode{A}{\hyperlink{rozklad_normalny}{\begin{tabular}{c}Are\\the data\\normally\\distributed?\end{tabular}}}}
\rput[tl](0.15,10){\ovalnode{B}{\hyperlink{zalezne_niezalezne}{\begin{tabular}{c}Are the data\\dependent?\end{tabular}}}}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{A}{B}
\rput[tl](4.5,9.8){\ovalnode{M}{\hyperlink{sferycznosc}{\begin{tabular}{c}Is\\sphericity\\met?\end{tabular}}}}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{B}{M}
\rput[br](10.9,8){\rnode{P}{\psframebox{\hyperlink{anova_repeated}{\begin{tabular}{c}ANOVA for\\dependent\\groups\end{tabular}}}}}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{M}{P}
\rput[br](7.7,5.5){\rnode{R}{\psframebox{\hyperlink{kor_br_sf}{\begin{tabular}{c}MANOVA\\or ANOVA\\with correction\\Epsilon, GG, H-F\end{tabular}}}}}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{M}{R}
\rput[tl](0.1,8){\ovalnode{C}{\hyperlink{wariancja}{\begin{tabular}{c}Are\\the variances\\equal?\end{tabular}}}}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{B}{C}
\rput[br](3,3){\rnode{D}{\psframebox{\hyperlink{anova_one_way}{\begin{tabular}{c}ANOVA for\\independent\\groups\end{tabular}}}}}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{C}{D}
\rput[br](7.75,3.4){\rnode{N}{\psframebox{\hyperlink{anova_one_way_cor}{\begin{tabular}{c}ANOVA for \\independent\\groups with\\ correction $F^*$ and $F''$\end{tabular}}}}}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{C}{N}



\psline{->}(3.1,12.3)(6,12.3)
\rput(2.2,10.4){Y}
\rput(2.2,8.3){N}
\rput(2.2,5.3){Y}
\rput(4.8,12.5){N}
\rput(6.8,11.4){Y}
\rput(8.9,11.4){N}
\rput(11.9,11.4){Y}
\rput(14.1,11.4){N}
\rput(4.2,9.5){Y}
\rput(4.1,6.1){N}
\rput(8.5,9.1){Y}
\rput(6.5,7.7){N}



\rput(8,14){Ordinal scale}
\rput[tl](6,13){\ovalnode{E}{\hyperlink{zalezne_niezalezne}{\begin{tabular}{c}Are the data\\dependent?\end{tabular}}}}
\rput[br](7.8,10){\rnode{F}{\psframebox{\hyperlink{anova_friedmana}{\begin{tabular}{c}Friedman\\ANOVA\end{tabular}}}}}
\rput[br](10.0,9.6){\rnode{G}{\psframebox{\hyperlink{anova_kruskal-wallis}{\begin{tabular}{c}Kruskal\\Wallis\\ANOVA\end{tabular}}}}}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{E}{F}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{E}{G}

\rput(13,14){Nominal scale}
\rput[tl](11,13){\ovalnode{H}{\hyperlink{zalezne_niezalezne}{\begin{tabular}{c}Are the data\\dependent?\end{tabular}}}}
\rput[br](13.1,10){\rnode{I}{\psframebox{\hyperlink{anova_q_cochrana}{\begin{tabular}{c}Q-Cochran\\ANOVA\end{tabular}}}}}
\rput[br](16.1,8.41){\rnode{J}{\psframebox{\hyperlink{chi_kwadrat_olbrzymi}{\begin{tabular}{c}multidimensional\\$\chi^2$ test\end{tabular}}}}}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{H}{I}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{H}{J}

\rput(3.5,4.9){\hyperlink{test_levenea}{Levene,}}
\rput(3.1,4.6){\hyperlink{test_levenea}{Brown-Forsythe}}
\psline[linestyle=dotted]{<-}(2.6,6.2)(3.5,5.0)
\rput(4,10.8){\hyperlink{testy_normalnosci}{normality tests}}
\psline[linestyle=dotted]{<-}(3.2,11.4)(3.9,11.2)
\rput(9.0,7.0){\hyperlink{sferycznosc}{Mauchly test}}
\psline[linestyle=dotted]{<-}(7.9,8.1)(9.0,7.3)
\end{pspicture}

Note

The proposed test selection scheme for the multiple group comparison is not the only possible scheme and does not include all the tests proposed in the software for this comparison.

Note

Note that the simultaneous comparison of more than two groups can NOT be replaced by multiple applications of tests comparing two groups. This follows from the need to control the type I error $\alpha$. If we choose $\alpha$ and apply the selected two-group test $k$ times, the actual error level could become much higher than the assumed $\alpha$. This error can be avoided by using the ANOVA (Analysis of Variance) and the contrasts or POST-HOC tests dedicated to it.

 

Parametric tests

The ANOVA for independent groups

The one-way analysis of variance (ANOVA for independent groups), proposed by Ronald Fisher, is used to verify the hypothesis of equality of means of an analysed variable in several ($k\geq2$) populations.

Basic assumptions:

Hypotheses:

\begin{array}{cl}
\mathcal{H}_0: & \mu_1=\mu_2=...=\mu_k,\\
\mathcal{H}_1: & $not all $\mu_j$ are equal $(j=1,2,...,k)$,$
\end{array}

where:

$\mu_1$,$\mu_2$,…,$\mu_k$ – means of an analysed variable of each population.

The test statistic is defined by:

\begin{displaymath}
F=\frac{MS_{BG}}{MS_{WG}},
\end{displaymath}

where:

$\displaystyle MS_{BG} = \frac{SS_{BG}}{df_{BG}}$ – mean square between-groups,

$\displaystyle MS_{WG} = \frac{SS_{WG}}{df_{WG}}$ – mean square within-groups,

$\displaystyle SS_{BG} = \sum_{j=1}^k{\frac{\left(\sum_{i=1}^{n_j}x_{ij}\right)^2}{n_j}}-\frac{\left(\sum_{j=1}^k{\sum_{i=1}^{n_j}x_{ij}}\right)^2}{N}$ – between-groups sum of squares,

$\displaystyle SS_{WG} = SS_{T}-SS_{BG}$ – within-groups sum of squares,

$\displaystyle SS_{T} = \left(\sum_{j=1}^k{\sum_{i=1}^{n_j}x_{ij}^2}\right)-\frac{\left(\sum_{j=1}^k{\sum_{i=1}^{n_j}x_{ij}}\right)^2}{N}$ – total sum of squares,

$df_{BG}=k-1$ – between-groups degrees of freedom,

$df_{WG}=df_{T}-df_{BG}$ – within-groups degrees of freedom,

$df_{T}=N-1$ – total degrees of freedom,

$N=\sum_{j=1}^k n_j$,

$n_j$ – sample sizes for $(j=1,2,...k)$,

$x_{ij}$ – values of a variable taken from a sample for $(i=1,2,...n_j)$, $(j=1,2,...k)$.

The F statistic has the F Snedecor distribution with $df_{BG}$ and $df_{WG}$ degrees of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

Effect size - partial $\eta^2$

This quantity indicates the proportion of the variance explained by a factor to the total variance. Thus, in a one-way ANOVA model for independent groups, it indicates what proportion of the variability in outcomes can be attributed to the factor under study determining the independent groups.

\begin{displaymath}
\eta^2=\frac{SS_{BG}}{SS_{BG}+SS_{WG}}
\end{displaymath}
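For illustration, the quantities above can be computed step by step. The following is a minimal sketch in Python (not PQStat code; the group data are hypothetical), cross-checked against SciPy's built-in one-way ANOVA:

<code python>
# Minimal sketch of the one-way ANOVA computations defined above (not PQStat code).
import numpy as np
from scipy import stats

# hypothetical data: k = 3 independent groups
groups = [np.array([27.0, 33, 25, 32, 34, 38, 31, 34]),
          np.array([38.0, 34, 33, 27, 36, 20, 37, 40]),
          np.array([34.0, 36, 31, 37, 45, 39, 36, 34])]

k = len(groups)
n = np.array([len(g) for g in groups])
N = n.sum()
grand_sum = sum(g.sum() for g in groups)

# sums of squares exactly as in the formulas above
SS_BG = sum(g.sum() ** 2 / len(g) for g in groups) - grand_sum ** 2 / N
SS_T = sum((g ** 2).sum() for g in groups) - grand_sum ** 2 / N
SS_WG = SS_T - SS_BG

df_BG, df_WG = k - 1, N - k
MS_BG, MS_WG = SS_BG / df_BG, SS_WG / df_WG

F = MS_BG / MS_WG
p = stats.f.sf(F, df_BG, df_WG)          # p-value from the F Snedecor distribution
eta2 = SS_BG / (SS_BG + SS_WG)           # effect size (partial eta squared)

print(F, p, eta2)
print(stats.f_oneway(*groups))           # cross-check with SciPy's built-in one-way ANOVA
</code>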

POST-HOC tests

Introduction to contrast and POST-HOC testing

The settings window with the One-way ANOVA for independent groups can be opened in Statistics menu→Parametric tests→ANOVA for independent groups or in ''Wizard''.

EXAMPLE (age ANOVA.pqs file)

There are 150 persons chosen randomly from the population of workers of 3 different transport companies. From each company there are 50 persons drawn to the sample. Before the experiment begins, you should check if the average age of the workers of these companies is similar, because the next step of the experiment depends on it. The age of each participant is written in years.

Age (company 1): 27, 33, 25, 32, 34, 38, 31, 34, 20, 30, 30, 27, 34, 32, 33, 25, 40, 35, 29, 20, 18, 28, 26, 22, 24, 24, 25, 28, 32, 32, 33, 32, 34, 27, 34, 27, 35, 28, 35, 34, 28, 29, 38, 26, 36, 31, 25, 35, 41, 37

Age (company 2): 38, 34, 33, 27, 36, 20, 37, 40, 27, 26, 40, 44, 36, 32, 26, 34, 27, 31, 36, 36, 25, 40, 27, 30, 36, 29, 32, 41, 49, 24, 36, 38, 18, 33, 30, 28, 27, 26, 42, 34, 24, 32, 36, 30, 37, 34, 33, 30, 44, 29

Age (company 3): 34, 36, 31, 37, 45, 39, 36, 34, 39, 27, 35, 33, 36, 28, 38, 25, 29, 26, 45, 28, 27, 32, 33, 30, 39, 40, 36, 33, 28, 32, 36, 39, 32, 39, 37, 35, 44, 34, 21, 42, 40, 32, 30, 23, 32, 34, 27, 39, 37, 35.

Before proceeding with the ANOVA analysis, the normality of the data distribution was confirmed.

In the analysis window, the assumption of equality of variances was tested, obtaining p>0.05 in both tests.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $the average age of the workers of all the analysed transport companies is the same,$\\
\mathcal{H}_1: & $at least 2 means are different.$
\end{array}$

Comparing the p-value = 0.005147 of the one-way analysis of variance with the significance level $\alpha=0.05$, you can draw the conclusion that the average ages of the workers of these transport companies are not the same. Based on the ANOVA result alone, you do not know precisely which groups differ from the others in terms of age. To gain such knowledge, one of the POST-HOC tests must be used, for example the Tukey test. To do this, you should resume the analysis by clicking the button and then, in the options window for the test, select Tukey HSD and Add graph.

The critical difference (CD) calculated for each pair of comparisons is the same (because the group sizes are equal) and equals 2.730855. The comparison of the $CD$ value with the value of the mean difference indicates that there are significant differences only between the mean age of the workers from the first and the third transport company (only when these 2 groups are compared is the $CD$ value less than the difference of the means). You draw the same conclusion if you compare the p-value of the POST-HOC test with the significance level $\alpha=0.05$. The workers of the first transport company are about 3 years younger (on average) than the workers of the third transport company. Two interlocking homogeneous groups were obtained, which are also marked on the graph.

We can provide a detailed description of the data by selecting Descriptive statistics in the analysis window.


The contrasts and the POST-HOC tests

The analysis of variance informs you only whether there are any significant differences among the populations. It does not tell you which populations differ from each other. To gain more detailed knowledge about the differences in particular parts of our complex structure, you should use contrasts (if you make planned-in-advance and usually only particular comparisons), or the procedures of multiple comparisons, the POST-HOC tests (when, having done the analysis of variance, we look for differences, usually between all the pairs).

The number of all the possible simple comparisons is calculated using the following formula:

\begin{displaymath}
c={k \choose 2}=\frac{k(k-1)}{2}
\end{displaymath}

Hypotheses:

The first example - simple comparisons (comparison of 2 selected means):

\begin{array}{cc}
\mathcal{H}_0: & \mu_1=\mu_2,\\
\mathcal{H}_1: & \mu_1 \neq \mu_2.
\end{array}

The second example - complex comparisons (comparison of combination of selected means):

\begin{array}{cc}
\mathcal{H}_0: & \mu_1=\frac{\mu_2+\mu_3}{2},\\[0.1cm]
\mathcal{H}_1: & \mu_1\neq\frac{\mu_2+\mu_3}{2}.
\end{array}

To define the selected hypothesis you should assign the contrast value $c_j$, $(j=1,2,...k)$, to each mean. The $c_j$ values are selected so that the sums for the compared sides are opposite numbers, and the values for the means which are not analysed equal 0.

  • The first example: $c_1=1$, $c_2=-1$, $c_3=0, ...c_k=0$.
  • The second example: $c_1=2$, $c_2=-1$, $c_3=-1$, $c_4=0$,…, $c_k=0$.

How to choose the proper hypothesis:

  • [i] Comparing the differences between the selected means with the critical difference (CD) calculated using the proper POST-HOC test:

\begin{array}{ccl}
$ if the differences between means  $ \ge CD & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if the differences between means  $ < CD & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

  • [ii] Comparing the p-value, determined on the basis of the test statistic of the proper POST-HOC test, with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

The LSD Fisher test

For simple and complex comparisons, equal-size groups as well as unequal-size groups, when the variances are equal.

  • [i] The value of critical difference is calculated by using the following formula:

\begin{displaymath}
CD=\sqrt{F_{\alpha,1,df_{WG}}}\cdot \sqrt{\left(\sum_{j=1}^k \frac{c_j^2}{n_j}\right)MS_{WG}},
\end{displaymath}

where:

$F_{\alpha,1,df_{WG}}$ - is the critical value (statistic) of the F Snedecor distribution for a given significance level $\alpha$ and degrees of freedom: 1 and $df_{WG}$, respectively.

  • [ii] The test statistic is defined by:

\begin{displaymath}
t=\frac{\sum_{j=1}^k c_j\overline{x}_j}{\sqrt{\left(\sum_{j=1}^k \frac{c_j^2}{n_j}\right)MS_{WG}}}.
\end{displaymath}

The test statistic has the t-Student distribution with $df_{WG}$ degrees of freedom.
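A minimal numerical sketch of these two formulas (hypothetical means, group sizes and $MS_{WG}$; not PQStat code):

<code python>
# Minimal sketch of the Fisher LSD critical difference and t statistic (not PQStat code).
import numpy as np
from scipy import stats

# quantities assumed to be available from a one-way ANOVA (hypothetical values)
means = np.array([29.5, 32.4, 34.0])     # group means
n = np.array([50, 50, 50])               # group sizes
MS_WG, df_WG = 26.2, 147                 # within-groups mean square and its df
alpha = 0.05

c = np.array([1.0, -1.0, 0.0])           # contrast: compare group 1 with group 2

se = np.sqrt((c ** 2 / n).sum() * MS_WG)
CD = np.sqrt(stats.f.ppf(1 - alpha, 1, df_WG)) * se   # critical difference
t = (c * means).sum() / se
p = 2 * stats.t.sf(abs(t), df_WG)        # two-sided p-value from the t-Student distribution

print(CD, t, p)
</code>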

The Scheffe test

For simple comparisons, equal-size groups as well as unequal-size groups, when the variances are equal.

  • [i] The value of a critical difference is calculated by using the following formula:

\begin{displaymath}
CD=\sqrt{F_{\alpha,df_{BG},df_{WG}}}\cdot \sqrt{(k-1)\left(\sum_{j=1}^k \frac{c_j^2}{n_j}\right)MS_{WG}},
\end{displaymath}

where:

$F_{\alpha,df_{BG},df_{WG}}$ - is the critical value (statistic) of the F Snedecor distribution for a given significance level $\alpha$ and $df_{BG}$ and $df_{WG}$ degrees of freedom.

  • [ii] The test statistic is defined by:

\begin{displaymath}
F=\frac{\left(\sum_{j=1}^k c_j\overline{x}_j\right)^2}{(k-1)\left(\sum_{j=1}^k \frac{c_j^2}{n_j}\right)MS_{WG}}.
\end{displaymath}

The test statistic has the F Snedecor distribution with $df_{BG}$ and $df_{WG}$ degrees of freedom.

The Tukey test.

For simple comparisons, equal-size groups as well as unequal-size groups, when the variances are equal.

  • [i] The value of a critical difference is calculated by using the following formula:

\begin{displaymath}
CD=\frac{\sqrt{2}\cdot q_{\alpha,df_{WG},k} \cdot \sqrt{\left(\sum_{j=1}^k \frac{c_j^2}{n_j}\right)MS_{WG}}}{2},
\end{displaymath}

where:

$q_{\alpha,df_{WG},k}$ - is the critical value (statistic) of the studentized range distribution for a given significance level $\alpha$ and $df_{WG}$ and $k$ degrees of freedom.

  • [ii] The test statistic is defined by:

\begin{displaymath}
q=\sqrt{2}\frac{\sum_{j=1}^k c_j\overline{x}_j}{\sqrt{\left(\sum_{j=1}^k \frac{c_j^2}{n_j}\right)MS_{WG}}}.
\end{displaymath}

The test statistic has the studentized range distribution with $df_{WG}$ and $k$ degrees of freedom.

Info.

The algorithm for calculating the p-value and the statistic of the studentized range distribution in PQStat is based on Lund's work (1983)1). Other applications or web pages may calculate slightly different values than PQStat, because they may be based on less precise or more restrictive algorithms (Copenhaver and Holland (1988), Gleason (1999)).
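A minimal sketch of the Tukey $CD$ and $q$ statistic (hypothetical input values; it uses scipy.stats.studentized_range, available in recent SciPy versions; not PQStat code):

<code python>
# Minimal sketch of the Tukey critical difference and q statistic (not PQStat code).
# Requires a SciPy version providing scipy.stats.studentized_range (1.7 or later).
import numpy as np
from scipy import stats

means = np.array([29.5, 32.4, 34.0])     # hypothetical group means
n = np.array([50, 50, 50])
MS_WG, df_WG = 26.2, 147
k, alpha = len(means), 0.05

c = np.array([1.0, -1.0, 0.0])           # simple comparison of groups 1 and 2
se = np.sqrt((c ** 2 / n).sum() * MS_WG)

q_crit = stats.studentized_range.ppf(1 - alpha, k, df_WG)
CD = np.sqrt(2) * q_crit * se / 2        # critical difference as in the formula above
q = np.sqrt(2) * (c * means).sum() / se
p = stats.studentized_range.sf(abs(q), k, df_WG)

print(CD, q, p)
</code>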

Test for trend.

The test examining the existence of a trend can be calculated in the same situation as the ANOVA for independent groups, because it is based on the same assumptions, but it captures the alternative hypothesis differently - indicating in it the existence of a trend in the mean values for successive populations. The analysis of the trend in the arrangement of means is based on contrasts of the Fisher LSD test. By building appropriate contrasts, you can study any type of trend, such as linear, quadratic, cubic, etc. Below is a table of example contrast values for selected trends.

\begin{tabular}{|cc||c|c|c|c|c|c|c|c|c|c|}
\hline
&&\multicolumn{10}{c|}{Contrast}\\\hline
Number of groups&Trends&$c_1$&$c_2$&$c_3$&$c_4$&$c_5$&$c_6$&$c_7$&$c_8$&$c_9$&$c_{10}$\\\hline\hline
\multirow{2}{*}{3}&linear&-1&0&1&&&&&&&\\
&quadratic&1&-2&1&&&&&&&\\\hline
\multirow{3}{*}{4}&linear&-3&-1&1&3&&&&&&\\
&quadratic&1&-1&-1&1&&&&&&\\
&cubic&-1&3&-3&1&&&&&&\\\hline
\multirow{3}{*}{5}&linear&-2&-1&0&1&2&&&&&\\
&quadratic&2&-1&-2&-1&2&&&&&\\
&cubic&-1&2&0&-2&1&&&&&\\\hline
\multirow{3}{*}{6}&linear&-5&-3&-1&1&3&5&&&&\\
&quadratic&5&-1&-4&-4&-1&5&&&&\\
&cubic&-5&7&4&-4&-7&5&&&&\\\hline
\multirow{3}{*}{7}&linear&-3&-2&-1&0&1&2&3&&&\\
&quadratic&5&0&-3&-4&-3&0&5&&&\\
&cubic&-1&1&1&0&-1&-1&1&&&\\\hline
\multirow{3}{*}{8}&linear&-7&-5&-3&-1&1&3&5&7&&\\
&quadratic&7&1&-3&-5&-5&-3&1&7&&\\
&cubic&-7&5&7&3&-3&-7&-5&7&&\\\hline
\multirow{3}{*}{9}&linear&-4&-3&-2&-1&0&1&2&3&4&\\
&quadratic&28&7&-8&-17&-20&-17&-8&7&28&\\
&cubic&-14&7&13&9&0&-9&-13&-7&14&\\\hline
\multirow{3}{*}{10}&linear&-9&-7&-5&-3&-1&1&3&5&7&9\\
&quadratic&6&2&-1&-3&-4&-4&-3&-1&2&6\\
&cubic&-42&14&35&31&12&-12&-31&-35&-14&42\\\hline
\end{tabular}

Linear trend

A linear trend, like other trends, can be analyzed by entering the appropriate contrast values. However, if the direction of the linear trend is known, simply use the For trend option and indicate the expected order of the populations by assigning them consecutive natural numbers.

The analysis is performed on the basis of the linear contrast, i.e. the groups indicated according to the natural order are assigned appropriate contrast values and the Fisher LSD statistics are calculated.

With the expected direction of the trend known, the alternative hypothesis is one-sided and the one-sided $p$-value is interpreted. The interpretation of the two-sided $p$-value means that the researcher does not know (does not assume) the direction of the possible trend.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}
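For illustration, a linear trend across three ordered groups can be tested with the contrast $(-1,0,1)$ and the Fisher LSD statistic; a short sketch with hypothetical values (not PQStat code):

<code python>
# Sketch: test for a linear trend across k = 3 ordered groups via the Fisher LSD contrast.
import numpy as np
from scipy import stats

means = np.array([14.2, 11.5, 9.1])      # hypothetical group means, in the expected order
n = np.array([40, 40, 40])
MS_WG, df_WG = 18.0, 117

c = np.array([-1.0, 0.0, 1.0])           # linear-trend contrast from the table above
t = (c * means).sum() / np.sqrt((c ** 2 / n).sum() * MS_WG)

# one-sided p-value, assuming the observed direction matches the expected trend
p_one_sided = stats.t.sf(abs(t), df_WG)
p_two_sided = 2 * p_one_sided
print(t, p_one_sided, p_two_sided)
</code>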

Homogeneous groups.

For each post-hoc test, homogeneous groups are constructed. Each homogeneous group represents a set of groups that are not statistically significantly different from each other. For example, suppose we divided subjects into six groups regarding smoking status: Nonsmokers (NS), Passive smokers (PS), Noninhaling smokers (NI), Light smokers (LS), Moderate smokers (MS), Heavy smokers (HS) and we examine the expiratory parameters for them. In our ANOVA we obtained statistically significant differences in exhalation parameters between the tested groups. In order to indicate which groups differ significantly and which do not, we perform post-hoc tests. As a result, in addition to the table with the results of each pair of comparisons and statistical significance in the form of $p$:

we obtain a division into homogeneous groups:

In this case we obtained 4 homogeneous groups, i.e. A, B, C and D, which indicates the possibility of conducting the study on the basis of a smaller division, i.e. instead of the six groups we studied originally, further analyses can be conducted on the basis of the four homogeneous groups determined here. The order of groups was determined on the basis of weighted averages calculated for particular homogeneous groups in such a way, that letter A was assigned to the group with the lowest weighted average, and further letters of the alphabet to groups with increasingly higher averages.

The settings window with the One-way ANOVA for independent groups can be opened in Statistics menu→Parametric tests→ANOVA for independent groups or in ''Wizard''.


The ANOVA for independent groups with $F^*$ and $F''$ corrections

The $F^*$ (Brown-Forsythe, 19742)) and $F''$ (Welch, 19513)) corrections concern the ANOVA for independent groups and are calculated when the assumption of equality of variances is not met.

The test statistic is in the form of:

\begin{displaymath}
F^*=\frac{SS_{BG}}{\sum_{j=1}^k\left(1-\frac{n_j}{N}\right)sd_j^2},
\end{displaymath}

\begin{displaymath}
F''=\frac{\frac{\sum_{j=1}^kw_j(\overline{x}_j-\widetilde{x})^2}{k-1}}{1+\frac{2(k-2)}{k^2-1}\sum_{j=1}^kh_j},
\end{displaymath}

where:

$sd_j$ – standard deviation in group $j$,

$w_j=\frac{n_j}{sd_j^2}$ – weight of group $j$,

$\widetilde{x}$ – weighted mean,

$h_j=\frac{\left(1-\frac{w_j}{\sum_{j=1}^kw_j}\right)^2}{n_j-1}$.

This statistic is subject to Snedecor's F distribution with $k-1$ and adjusted $df_{WG_k}$ degrees of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}
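A minimal sketch of the $F''$ (Welch) computation following the formulas above (hypothetical data; the adjusted denominator degrees of freedom $df_2=(k^2-1)/(3\sum h_j)$ are an assumption based on Welch's correction, as the formula for them is not spelled out above; not PQStat code):

<code python>
# Minimal sketch of the Welch F'' statistic for unequal variances (not PQStat code).
import numpy as np
from scipy import stats

groups = [np.array([12.0, 15, 11, 14, 16, 13]),       # hypothetical data
          np.array([22.0, 19, 25, 28, 18, 24, 21]),
          np.array([30.0, 27, 35, 29, 33])]

k = len(groups)
n = np.array([len(g) for g in groups])
m = np.array([g.mean() for g in groups])
v = np.array([g.var(ddof=1) for g in groups])

w = n / v                                  # group weights
xt = (w * m).sum() / w.sum()               # weighted mean
h = (1 - w / w.sum()) ** 2 / (n - 1)

F2 = ((w * (m - xt) ** 2).sum() / (k - 1)) / (1 + 2 * (k - 2) / (k ** 2 - 1) * h.sum())
df1 = k - 1
df2 = (k ** 2 - 1) / (3 * h.sum())         # assumed adjusted denominator degrees of freedom
p = stats.f.sf(F2, df1, df2)
print(F2, df2, p)
</code>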

POST-HOC Tests

Introduction to the contrasts and POST-HOC tests was given in the chapter concerning the one-way analysis of variance.

T2 Tamhane test

For simple and complex comparisons, equal-size groups as well as unequal-size groups, when the variances differ significantly (Tamhane A. C., 19774)).

  • [i] The value of critical difference is calculated by using the following formula:

\begin{displaymath}
CD=\sqrt{F_{\alpha_{Sidak},1,df_v}}\cdot \sqrt{\left(\sum_{j=1}^k \frac{c_j^2sd_j^2}{n_j}\right)},
\end{displaymath}

where:

$F_{\alpha_{Sidak},1,df_v}$ - is the critical value (statistics) of the Snedecor's F distribution for modified significance level $\alpha_{Sidak}$ and for degrees of freedom 1 and $df_{v}$ respectively,

$\alpha_{Sidak}=1-(1-\alpha)^{(1/k)}$,

$df_v=\frac{\left(\sum_{j=1}^k\frac{c_j^2sd_j^2}{n_j}\right)^2}{\sum_{j=1}^k\frac{c_j^4sd_j^4}{n_j^2(n_j-1)}}$

  • [ii] The test statistic is in the form of:

\begin{displaymath}
t=\frac{\sum_{j=1}^k c_j\overline{x}_j}{\sqrt{\left(\sum_{j=1}^k \frac{c_j^2sd_j^2}{n_j}\right)}}.
\end{displaymath}

This statistic is subject to the t-Student distribution with $df_v$ degrees of freedom, and the p-value is adjusted by the number of possible simple comparisons.

BF test (Brown-Forsythe)

For simple and complex comparisons, equal-size groups as well as unequal-size groups, when the variances differ significantly (Brown M. B. and Forsythe A. B. (1974)5)).

  • [i] The value of critical difference is calculated by using the following formula:

\begin{displaymath}
CD=\sqrt{F_{\alpha,k-1,df_v}}\cdot \sqrt{(k-1)\left(\sum_{j=1}^k \frac{c_j^2sd_j^2}{n_j}\right)},
\end{displaymath}

where:

$F_{\alpha,k-1,df_v}$ - is the critical value (statistic) of the Snedecor's F distribution for a given significance level $\alpha$ as well as $k-1$ and $df_v$ degrees of freedom.

  • [ii] The test statistic is in the form of:

\begin{displaymath}
F=\frac{\left(\sum_{j=1}^k c_j\overline{x}_j\right)^2}{(k-1)\left(\sum_{j=1}^k \frac{c_j^2sd_j^2}{n_j}\right)}.
\end{displaymath}

This statistic is subject to Snedecor's F distribution with $k-1$ and $df_v$ degrees of freedom.

GH test (Games-Howell).

For simple and complex comparisons, equal-size groups as well as unequal-size groups, when the variances differ significantly (Games P. A. and Howell J. F. 19766)).

  • [i] The value of critical difference is calculated by using the following formula:

\begin{displaymath}
CD=\frac{q_{\alpha,k,df_v} \cdot \sqrt{\left(\sum_{j=1}^k \frac{c_j^2sd_j^2}{n_j}\right)}}{\sqrt{2}},
\end{displaymath}

where:

$q_{\alpha,k,df_v}$ - is the critical value (statistic) of the studentized range distribution for a given significance level $\alpha$ as well as $k$ and $df_v$ degrees of freedom.

  • [ii] The test statistic is in the form of:

\begin{displaymath}
q=\sqrt{2}\frac{\sum_{j=1}^k c_j\overline{x}_j}{\sqrt{\left(\sum_{j=1}^k \frac{c_j^2sd_j^2}{n_j}\right)}}.
\end{displaymath}

This statistic follows the studentized range distribution with $k$ and $df_v$ degrees of freedom.

Trend test.

The test examining the presence of a trend can be calculated in the same situation as the ANOVA for independent groups with the $F^*$ and $F''$ corrections, because it is based on the same assumptions; however, it captures the alternative hypothesis differently - indicating the existence of a trend in the mean values for successive populations. The analysis of the trend of the arrangement of means is based on contrasts (T2 Tamhane). By creating appropriate contrasts you can study any type of trend, e.g. linear, quadratic, cubic, etc. A table of example contrast values for certain trends can be found in the description of the test for trend for the one-way ANOVA.

Linear trend

A linear trend, like other trends, can be analyzed by entering the appropriate contrast values. However, if the direction of the linear trend is known, simply use the Linear Trend option and indicate the expected order of the populations by assigning them consecutive natural numbers.

The analysis is performed based on linear contrast, i.e., the groups indicated according to the natural ordering are assigned appropriate contrast values and the T2 Tamhane statistic is calculated.

With the expected direction of the trend being known, the alternative hypothesis is one-sided and the one-sided value of $p$ is subject to interpretation. The interpretation of the two-sided value of $p$ means that the researcher does not know (does not assume) the direction of the possible trend.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

The settings window for the One-way ANOVA for independent groups with $F^*$ and $F''$ corrections is opened via menu Statistics→Parametric tests→ANOVA for independent groups or via the ''Wizard''.

EXAMPLE (unemployment.pqs file)

There are many factors that control the time it takes to find a job during an economic crisis. One of the most important may be the level of education. Sample data on education and time (in months) of unemployment are gathered in the file. We want to see if there are differences in average job search time for different education categories.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $average job search time is the same $\\
& $for every category of education,$\\
\mathcal{H}_1: & $at least one education category (one population)$\\
& $  has a different average job search time.$\\
\end{array}$

Due to differences in variance between populations (for Levene test $p=0.0001$ and for Brown-Forsythe test $p=0.0002$):

the analysis is performed with the correction of various variances enabled. The obtained result of the adjusted $F$ statistic is shown below.

Comparing $p<0.0001$ (for the $F^*$ test) and $p<0.0001$ (for the $F''$ test) with a significance level of $\alpha=0.05$, we find that the average job search time differs depending on the education one has. By performing one of the POST-HOC tests, designed to compare groups with different variances, we find out which education categories are affected by the differences found:

The least significant difference (LSD) determined for each pair of comparisons is not the same (even though the group sizes are equal) because the variances are not equal. Relating the LSD value to the resulting differences in mean values yields the same result as comparing the p-value with a significance level of $\alpha=0.05$. The differences are between primary and higher education, primary and secondary education, and vocational and higher education. The resulting homogeneous groups overlap. In general, however, looking at the graph, we might expect that the more educated a person is, the less time it takes them to find a job.

In order to test the stated hypothesis, it is necessary to perform the trend analysis. To do so, we reopen the analysis with the button and, in the test options window, select the Tamhane's T2 method and the Contrasts option (and set the appropriate contrast), or the For trend option (and indicate the order of education categories by specifying consecutive natural numbers).

Depending on whether the direction of the correlation between education and job search time is known to us, we use a one-sided or two-sided p-value. Both of these values are less than the given significance level. The trend we predicted is confirmed, that is, at a significance level of $\alpha=0.05$ we can say that this trend does indeed exist in the population from which the sample is drawn.


The Brown-Forsythe test and the Levene test

Both tests: the Levene test (Levene, 19607)) and the Brown-Forsythe test (Brown and Forsythe, 19748)) are used to verify the hypothesis of equality of variances of an analysed variable in several ($k\geq2$) populations.

Basic assumptions:

Hypotheses:

\begin{array}{cl}
\mathcal{H}_0: & \sigma_1^2=\sigma_2^2=...=\sigma_k^2,\\
\mathcal{H}_1: & $not all $\sigma_j^2$ are equal $(j=1,2,...,k)$,$
\end{array}

where:

$\sigma_1^2$,$\sigma_2^2$,…,$\sigma_k^2$ $-$ variances of an analysed variable of each population.

The analysis is based on calculating the absolute deviations of the measurement results from the mean (in the Levene test) or from the median (in the Brown-Forsythe test) in each of the analysed groups. These absolute deviations are then subjected to the same procedure as the analysis of variance for independent groups. Hence, the test statistic is defined by:

\begin{displaymath}
F=\frac{MS_{BG}}{MS_{WG}},
\end{displaymath}

The test statistic has the F Snedecor distribution with $df_{BG}$ and $df_{WG}$ degrees of freedom. The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

Note

The Brown-Forsythe test is less sensitive than the Levene test to violations of the assumption of normality of the distribution.
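Both tests can be sketched with SciPy's levene function, which implements exactly this deviations-from-center approach (hypothetical data; not PQStat code):

<code python>
# Sketch: Levene and Brown-Forsythe tests of equal variances with SciPy (not PQStat code).
from scipy import stats

g1 = [27, 33, 25, 32, 34, 38, 31, 34, 20, 30]     # hypothetical groups
g2 = [38, 34, 33, 27, 36, 20, 37, 40, 27, 26]
g3 = [34, 36, 31, 37, 45, 39, 36, 34, 39, 27]

# Levene test: absolute deviations from the group means
W_levene, p_levene = stats.levene(g1, g2, g3, center='mean')
# Brown-Forsythe test: absolute deviations from the group medians
W_bf, p_bf = stats.levene(g1, g2, g3, center='median')

print(W_levene, p_levene)
print(W_bf, p_bf)
</code>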

The settings window with the Levene and Brown-Forsythe tests can be opened in Statistics menu→Parametric tests→Levene, Brown-Forsythe.


The ANOVA for dependent groups

The single-factor repeated-measures analysis of variance (ANOVA for dependent groups) is used when the measurements of an analysed variable are made several times ($k\geq2$), each time in different conditions (but we need to assume that the variances of the differences between all the pairs of measurements are similar to each other).

This test is used to verify the hypothesis determining the equality of means of an analysed variable in several ($k\geq2$) populations.

Basic assumptions:

Hypotheses:

\begin{array}{cl}
\mathcal{H}_0: & \mu_1=\mu_2=...=\mu_k,\\
\mathcal{H}_1: & $not all $\mu_j$ are equal $(j=1,2,...,k)$,$
\end{array}

where:

$\mu_1$,$\mu_2$,…,$\mu_k$ – means of the analysed variable in the consecutive measurements from the examined population.

The test statistic is defined by:

\begin{displaymath}
F=\frac{MS_{BC}}{MS_{res}}
\end{displaymath}

where:

$\displaystyle MS_{BC} = \frac{SS_{BC}}{df_{BC}}$ – mean square between-conditions,

$\displaystyle MS_{res} = \frac{SS_{res}}{df_{res}}$ – mean square residual,

$\displaystyle SS_{BC} = \sum_{j=1}^k\left({\frac{\left(\sum_{i=1}^{n}x_{ij}\right)^2}{n}}\right)-\frac{\left(\sum_{j=1}^k{\sum_{i=1}^{n}x_{ij}}\right)^2}{N}$ – between-conditions sum of squares,

$\displaystyle SS_{res} = SS_{T}-SS_{BS}-SS_{BC}$ – residual sum of squares,

$\displaystyle SS_{T} = \left(\sum_{j=1}^k{\sum_{i=1}^{n}x_{ij}^2}\right)-\frac{\left(\sum_{j=1}^k{\sum_{i=1}^{n}x_{ij}}\right)^2}{N}$ – total sum of squares,

$\displaystyle SS_{BS} = \sum_{i=1}^n\left(\frac{\left(\sum_{j=1}^k x_{ij}\right)^2}{k}\right)-\frac{\left(\sum_{j=1}^k{\sum_{i=1}^{n}x_{ij}}\right)^2}{N}$ – between-subjects sum of squares,

$df_{BC}=k-1$ – between-conditions degrees of freedom,

$df_{res}=df_{T}-df_{BC}-df_{BS}$ – residual degrees of freedom,

$df_{T}=N-1$ – total degrees of freedom,

$df_{BS}=n-1$ – between-subjects degrees of freedom,

$N=nk$,

$n$ – sample size,

$x_{ij}$ – values of the variable for the $i$-th subject $(i=1,2,...n)$ in the $j$-th condition $(j=1,2,...k)$.

The test statistic has the F Snedecor distribution with $df_{BC}$ and $df_{res}$ degrees of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

Effect size - partial $\eta^2$

This quantity indicates the proportion of explained variance to total variance associated with a factor. Thus, in a repeated measures model, it indicates what proportion of the between-conditions variability in outcomes can be attributed to repeated measurements of the variable.

\begin{displaymath}
\eta^2=\frac{SS_{BC}}{SS_{BC}+SS_{res}}
\end{displaymath}
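A minimal sketch of the repeated-measures computations above (rows are subjects, columns are conditions; hypothetical data, not PQStat code):

<code python>
# Minimal sketch of the repeated-measures (dependent groups) ANOVA defined above.
import numpy as np
from scipy import stats

# hypothetical data: rows = n subjects, columns = k repeated conditions
x = np.array([[145.0, 138, 131],
              [152.0, 144, 139],
              [138.0, 136, 130],
              [160.0, 151, 145],
              [149.0, 143, 140]])
n, k = x.shape
N = n * k
corr = x.sum() ** 2 / N                        # correction term used in every SS

SS_T   = (x ** 2).sum() - corr
SS_BC  = (x.sum(axis=0) ** 2 / n).sum() - corr  # between-conditions
SS_BS  = (x.sum(axis=1) ** 2 / k).sum() - corr  # between-subjects
SS_res = SS_T - SS_BS - SS_BC

df_BC, df_res = k - 1, (N - 1) - (k - 1) - (n - 1)
F = (SS_BC / df_BC) / (SS_res / df_res)
p = stats.f.sf(F, df_BC, df_res)
eta2 = SS_BC / (SS_BC + SS_res)                # partial eta squared
print(F, p, eta2)
</code>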

POST-HOC tests

Introduction to the contrasts and the POST-HOC tests was given in the unit which relates to the one-way analysis of variance.

The LSD Fisher test

For simple and complex comparisons (frequency in particular measurements is always the same).

Hypotheses:

Example - simple comparisons (comparison of 2 selected means):

\begin{array}{cc}
\mathcal{H}_0: & \mu_j=\mu_{j+1},\\
\mathcal{H}_1: & \mu_j \neq \mu_{j+1}.
\end{array}

  • [i] The value of the critical difference is calculated by using the following formula:

\begin{displaymath}
CD=\sqrt{F_{\alpha,1,df_{res}}}\cdot \sqrt{\left(\sum_{j=1}^k \frac{c_j^2}{n}\right)MS_{res}},
\end{displaymath}

where:

$F_{\alpha,1,df_{res}}$ - is the critical value (statistic) of the F Snedecor distribution for a given significance level $\alpha$ and degrees of freedom: 1 and $df_{res}$, respectively.

  • [ii] The test statistic is defined by:

\begin{displaymath}
t=\frac{\sum_{j=1}^k c_j\overline{x}_j}{\sqrt{\left(\sum_{j=1}^k \frac{c_j^2}{n}\right)MS_{res}}}.
\end{displaymath}

The test statistic has the t-Student distribution with $df_{res}$ degrees of freedom.

Note!

For contrasts, $SE_{contrast}$ is used instead of $\sqrt{\left(\sum_{j=1}^k \frac{c_j^2}{n}\right)MS_{res}}$, and the degrees of freedom are $df_{BS}$.

The Scheffe test

For simple comparisons (frequency in particular measurements is always the same).

  • [i] The value of the critical difference is calculated by using the following formula:

\begin{displaymath}
CD=\sqrt{F_{\alpha,df_{BC},df_{res}}}\cdot \sqrt{(k-1)\left(\sum_{j=1}^k \frac{c_j^2}{n}\right)MS_{res}},
\end{displaymath}

where:

$F_{\alpha,df_{BC},df_{res}}$ - is the critical value (statistic) of the F Snedecor distribution for a given significance level $\alpha$ and $df_{BC}$ and $df_{res}$ degrees of freedom.

  • [ii] The test statistic is defined by:

\begin{displaymath}
F=\frac{\left(\sum_{j=1}^k c_j\overline{x}_j\right)^2}{(k-1)\left(\sum_{j=1}^k \frac{c_j^2}{n}\right)MS_{res}}.
\end{displaymath}

The test statistic has the F Snedecor distribution with $df_{BC}$ and $df_{res}$ degrees of freedom.

The Tukey test.

For simple comparisons (frequency in particular measurements is always the same).

  • [i] The value of the critical difference is calculated by using the following formula:

\begin{displaymath}
CD=\frac{\sqrt{2}\cdot q_{\alpha,df_{res},k} \cdot \sqrt{\left(\sum_{j=1}^k \frac{c_j^2}{n}\right)MS_{res}}}{2},
\end{displaymath}

where:

$q_{\alpha,df_{res},k}$ - is the critical value (statistic) of the studentized range distribution for a given significance level $\alpha$ and $df_{res}$ and $k$ degrees of freedom.

  • [ii] The test statistic is defined by:

\begin{displaymath}
q=\sqrt{2}\frac{\sum_{j=1}^k c_j\overline{x}_j}{\sqrt{\left(\sum_{j=1}^k \frac{c_j^2}{n}\right)MS_{res}}}.
\end{displaymath}

The test statistic has the studentized range distribution with $df_{res}$ and $k$ degrees of freedom.

Info.

The algorithm for calculating the p-value and the statistic of the studentized range distribution in PQStat is based on Lund's work (1983). Other applications or web pages may calculate slightly different values than PQStat, because they may be based on less precise or more restrictive algorithms (Copenhaver and Holland (1988), Gleason (1999)).

Test for trend.

The test that examines the existence of a trend can be calculated in the same situation as the ANOVA for dependent groups, because it is based on the same assumptions, but it captures the alternative hypothesis differently – indicating in it the existence of a trend of mean values in successive measurements. The analysis of the trend in the arrangement of means is based on contrasts of the Fisher LSD test. By building appropriate contrasts, you can study any type of trend, e.g. linear, quadratic, cubic, etc. A table of example contrast values for selected trends can be found in the description of the test for trend for the ANOVA for independent groups.

Linear trend

A linear trend, like other trends, can be analyzed by entering the appropriate contrast values. However, if the direction of the linear trend is known, simply use the Linear trend option and indicate the expected order of the populations by assigning them consecutive natural numbers.

With the expected direction of the trend known, the alternative hypothesis is one-sided and the one-sided p-value is interpreted. The interpretation of the two-sided p-value means that the researcher does not know (does not assume) the direction of the possible trend.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

The settings window with the Single-factor repeated-measures ANOVA can be opened in Statistics menu→Parametric tests→ANOVA for dependent groups or in ''Wizard''.

EXAMPLE (pressure.pqs file)


The ANOVA for dependent groups with Epsilon correction and MANOVA

The Epsilon corrections and MANOVA apply to the repeated-measures ANOVA and are calculated when the assumption of sphericity is not met or the variances of the differences between all pairs of measurements are not close to each other.

Correction of non-sphericity

The degree to which sphericity is met is represented by the value of $W$ in the Mauchly test, but also by the values of Epsilon ($\varepsilon$) calculated with corrections. $\varepsilon=1$ indicates strict adherence to the sphericity condition. The smaller the value of Epsilon is compared to 1, the more the sphericity assumption is affected. The lower limit that Epsilon can reach is $\frac{1}{k-1}$.

To minimize the effects of non-sphericity, three corrections can be used to change the number of degrees of freedom when testing from an F distribution. The simplest but weakest is the Epsilon lower bound correction. A slightly stronger but also conservative one is the Greenhouse-Geisser correction (1959)9). The strongest is the correction by Huynh-Feldt (1976)10). When sphericity is significantly affected, however, it is most appropriate to perform an analysis that does not require this assumption, namely MANOVA.

A multidimensional approach - MANOVA

MANOVA, i.e. multivariate analysis of variance, does not assume sphericity. If this assumption is not met, it is the most efficient method, so it should be chosen as a substitute for the analysis of variance for repeated measurements. For a description of this method, see the one-way MANOVA. Its use for repeated measures (without the independent groups factor) limits its application to data that are differences of adjacent measurements and provides testing of the same hypothesis as the ANOVA for dependent groups. The settings window for the ANOVA for dependent groups with Epsilon correction and MANOVA is opened via menu Statistics→Parametric tests→ANOVA for dependent groups or via ''Wizard''.

EXAMPLE (pressure.pqs file)

The effectiveness of two treatments for hypertension was analyzed. A sample of 56 patients was collected and randomly assigned to two groups: group treated with drug A and group treated with drug B. Systolic blood pressure was measured three times in each group: before treatment, during treatment and after 6 months of treatment.

Hypotheses for those treated with drug A:

$\begin{array}{cl}
\mathcal{H}_0: & $the mean systolic blood pressure is the same$\\
& $at every stage of treatment - for those treated with drug A,$\\
\mathcal{H}_1: & $at least at one stage of treatment with drug A$\\
& $  the mean systolic blood pressure is different.$\\
\end{array}$

The hypotheses for those treated with drug B read similarly.

Since the data have a normal distribution, we begin our analysis by testing the assumption of sphericity. We perform the testing for each group separately using a multiple filter.

Failure to meet the sphericity assumption by the group treated with drug B is indicated both by the observed values of the covariance and correlation matrices and by the result of the Mauchly test ($W=0.68$, $p=0.0063$).

We resume our analysis and in the test options window select the primary filter to perform a repeated-measures ANOVA - for those treated with drug A, followed by a correction of this analysis and a MANOVA statistic - for those treated with drug B.

Results for those treated with drug A:

indicate significant (at the significance level $\alpha=0.05$) differences between mean systolic blood pressure values (p<0.0001 for the repeated-measures ANOVA). More than 66% of the between-conditions variation in outcomes can be explained by the use of drug A ($\eta^2=0.66$). The differences apply to all treatment stages compared (POST-HOC results). The decrease in systolic blood pressure due to treatment is also significant (p<0.0001). Thus, we can consider drug A as an effective drug.

Results for those treated with drug B:

indicate that there are no significant differences between mean systolic blood pressure values, both when we use the Epsilon corrections and Wilks' Lambda (MANOVA). As little as 17% of the between-conditions variation in results can be explained by the use of drug B ($\eta^2=0.17$).


Mauchly's sphericity

The sphericity assumption is similar to, but stronger than, the assumption of equality of variances. It is met if the variances of the differences between all pairs of repeated measurements are the same. Usually, the simpler but more stringent compound symmetry condition is considered in place of the sphericity assumption. This can be done because meeting the compound symmetry condition entails meeting the sphericity assumption.

Compound symmetry condition assumes, symmetry in the covariance matrix, and therefore equality of variances (elements of the main diagonal of the covariance matrix) and equality of covariances (elements off the main diagonal of the covariance matrix).

Violating the assumption of sphericity or compound symmetry unduly reduces the conservatism of the F-test (makes it easier to reject the null hypothesis).

To check the sphericity assumption, the Mauchly test (1940) is used. Statistical significance ($p\le \alpha$) here implies a violation of the sphericity assumption.

Basic application conditions:

Hypotheses:

\begin{array}{cl}
\mathcal{H}_0: & \sigma_{diff(1)}=\sigma_{diff(2)}=...=\sigma_{diff(s)},\\
\mathcal{H}_1: & $not all $\sigma_{diff(i)}$ are equal $(i=1,2,...,s)$,$
\end{array}

where:

$\sigma_{diff(i)}$ - population variance of differences between $i$-th pair of repeated measurements,

$s$ - number of pairs.

Mauchly's $W$ value is defined as follows:

\begin{displaymath}
W=\frac{\prod_{j=1}^{k-1}\lambda_j}{\left(\frac{1}{k-1}\sum_{j=1}^{k-1}\lambda_j\right)^{k-1}}.
\end{displaymath}

The test statistic has the form of:

\begin{displaymath}
\chi^2=(f-1)(n-1)\ln W,
\end{displaymath}

where:

$f=\frac{2(k-1)^2+(k-1)+2}{6(k-1)(n-1)}$,

$\lambda_j$ - eigenvalue of the expected covariance matrix,

$k$ - number of variables analyzed.

This statistic has asymptotically (for large sample) Chi-square distribution with $df=\frac{k(k-1)}{2}-1$ degrees of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

A value of $W\approx1$ is an indication that the sphericity assumption is met. In interpreting the results of this test, however, it is important to note that it is sensitive to violations of the normality assumption of the distribution.
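A minimal sketch of computing $W$ and its chi-square approximation (hypothetical data; the orthonormal-contrast transformation of the sample covariance matrix is an implementation assumption; not PQStat code):

<code python>
# Sketch of Mauchly's W and its chi-square approximation (not PQStat code).
import numpy as np
from scipy import stats

# hypothetical data: rows = n subjects, columns = k repeated measurements
x = np.array([[145.0, 138, 131],
              [152.0, 144, 139],
              [138.0, 136, 130],
              [160.0, 151, 145],
              [149.0, 143, 140],
              [141.0, 139, 135]])
n, k = x.shape

S = np.cov(x, rowvar=False)                  # sample covariance matrix of the measurements

# orthonormal contrasts: k-1 directions orthogonal to the constant vector (an assumption)
A = np.hstack([np.ones((k, 1)), np.eye(k)[:, :k - 1]])
C = np.linalg.qr(A)[0][:, 1:]

lam = np.linalg.eigvalsh(C.T @ S @ C)        # eigenvalues of the transformed covariance matrix
W = np.prod(lam) / lam.mean() ** (k - 1)

f = (2 * (k - 1) ** 2 + (k - 1) + 2) / (6 * (k - 1) * (n - 1))
chi2 = (f - 1) * (n - 1) * np.log(W)         # as in the formula above
df = k * (k - 1) / 2 - 1
p = stats.chi2.sf(chi2, df)
print(W, chi2, p)
</code>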

EXAMPLE (pressure.pqs file)


Non-parametric tests

The Kruskal-Wallis ANOVA

The Kruskal-Wallis one-way analysis of variance by ranks (Kruskal 195211); Kruskal and Wallis 195212)) is an extension of the U-Mann-Whitney test to more than two populations. This test is used to verify the hypothesis that there is no shift in the compared distributions, i.e., most often, the insignificance of the differences between the medians of the analysed variable in ($k\geq2$) populations (but you need to assume that the variable distributions are similar - a comparison of rank variances can be checked using Conover's rank test).

Additional analyses:

Basic assumptions:

Hypotheses:

\begin{array}{cl}
\mathcal{H}_0: & \phi_1=\phi_2=...=\phi_k,\\
\mathcal{H}_1: & $not all $\phi_j$ are equal $(j=1,2,...,k)$,$
\end{array}

where:

$\phi_1,\phi_2,...\phi_k$ – distributions of the analysed variable of each population.

The test statistic is defined by:

\begin{displaymath}
H=\frac{1}{C}\left(\frac{12}{N(N+1)}\sum_{j=1}^k\left(\frac{\left(\sum_{i=1}^{n_j}R_{ij}\right)^2}{n_j}\right)-3(N+1)\right),
\end{displaymath}

where:

$N=\sum_{j=1}^k n_j$,

$n_j$ – sample sizes $(j=1,2,...k)$,

$R_{ij}$ – ranks ascribed to the values of a variable for $(i=1,2,...n_j)$, $(j=1,2,...k)$,

$\displaystyle C=1-\frac{\sum(t^3-t)}{N^3-N}$ – correction for ties,

$t$ – number of cases included in a tie.

The formula for the test statistic $H$ includes the correction for ties $C$. This correction is used, when ties occur (if there are no ties, the correction is not calculated, because of $C=1$).

The $H$ statistic asymptotically (for large sample sizes) has the Chi-square distribution with the number of degrees of freedom calculated using the formula: $df = (k - 1)$.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}
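A minimal sketch using SciPy's built-in Kruskal-Wallis test, which applies the ties correction automatically (hypothetical ordinal data; not PQStat code):

<code python>
# Sketch: Kruskal-Wallis ANOVA with SciPy (ties correction is applied automatically).
from scipy import stats

g1 = [3, 2, 4, 3, 5, 2, 3, 4]          # hypothetical ordinal ratings per group
g2 = [2, 1, 2, 3, 2, 1, 2, 2]
g3 = [4, 5, 4, 3, 5, 4, 5, 4]

H, p = stats.kruskal(g1, g2, g3)
print(H, p)
</code>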

The POST-HOC tests

Introduction to the contrasts and the POST-HOC tests was performed in the unit, which relates to the one-way analysis of variance.

The Dunn test

For simple comparisons, equal-size groups as well as unequal-size groups.

The Dunn test (Dunn 196413)) includes a correction for tied ranks (Zar 201014)) and is a test corrected for multiple testing. The Bonferroni or Sidak correction is most commonly used here, although other, newer corrections are also available, described in more detail in Multiple comparisons.

Example - simple comparisons (comparing 2 selected median / mean ranks with each other):

\begin{array}{cc}
\mathcal{H}_0: & \theta_j=\theta_{j+1},\\
\mathcal{H}_1: & \theta_j \neq \theta_{j+1}.
\end{array}

  • [i] The value of critical difference is calculated by using the following formula:

\begin{displaymath}
CD=Z_{\frac{\alpha}{c}}\sqrt{\frac{N(N+1)}{12}\left(\sum_{j=1}^k \frac{c_j^2}{n_j}\right)},
\end{displaymath}

where:

$\displaystyle Z_{\frac{\alpha}{c}}$ - is the critical value (statistic) of the normal distribution for a given significance level $\alpha$ corrected on the number of possible simple comparisons $c$.

  • [ii] The test statistic is defined by:

\begin{displaymath}
Z=\frac{\sum_{j=1}^k c_j\overline{R}_j}{\sqrt{\frac{N(N+1)}{12}\left(\sum_{j=1}^k \frac{c_j^2}{n_j}\right)}},
\end{displaymath}

where:

$\overline{R}_j$ – mean of the ranks of the $j$-th group, for $(j=1,2,...k)$,

The formula for the test statistic $Z$ includes a correction for tied ranks. This correction is applied when tied ranks are present (when there are no tied ranks this correction is not calculated because $\sum(t^3-t)=0$).

The test statistic asymptotically (for large sample sizes) has the normal distribution, and the p-value is corrected on the number of possible simple comparisons $c$.
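A minimal sketch of Dunn's comparisons with a Bonferroni correction (hypothetical data; the ties correction mentioned above is omitted for brevity; not PQStat code):

<code python>
# Minimal sketch of Dunn's test with a Bonferroni correction (no ties correction shown).
import numpy as np
from scipy import stats

groups = [np.array([3, 2, 4, 3, 5, 2, 3, 4.0]),   # hypothetical data
          np.array([2, 1, 2, 3, 2, 1, 2, 2.0]),
          np.array([4, 5, 4, 3, 5, 4, 5, 4.0])]
k = len(groups)
n = np.array([len(g) for g in groups])
N = n.sum()

ranks = stats.rankdata(np.concatenate(groups))          # joint ranking of all observations
idx = np.cumsum(n)[:-1]
mean_ranks = np.array([r.mean() for r in np.split(ranks, idx)])

c = k * (k - 1) // 2                                    # number of simple comparisons
for i in range(k):
    for j in range(i + 1, k):
        se = np.sqrt(N * (N + 1) / 12 * (1 / n[i] + 1 / n[j]))
        Z = (mean_ranks[i] - mean_ranks[j]) / se
        p = min(1.0, 2 * stats.norm.sf(abs(Z)) * c)     # Bonferroni-adjusted p-value
        print(i + 1, j + 1, Z, p)
</code>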

Conover-Inman test

The non-parametric equivalent of Fisher LSD15), used for simple comparisons of both groups of equal and different sizes.

  • [i] The value of critical difference is calculated by using the following formula:

\begin{displaymath}
CD=\sqrt{F_{\alpha,1,N-k}}\cdot\sqrt{S^2\frac{N-1-H}{N-k}\sum_{j=1}^k \frac{c_j^2}{n_j}},
\end{displaymath}

where:

$\displaystyle S^2=\frac{1}{N-1}\left(\sum_{j=1}^k\sum_{i=1}^{n_j}R_{ij}^2-N\frac{(N+1)^2}{4}\right)$

$\displaystyle F_{\alpha,1,N-k}$ is the critical value (statistic) of Snedecor's F distribution for a given significance level $\alpha$ and for degrees of freedom: 1 and $N-k$, respectively.

  • [ii] The test statistic is defined by:

\begin{displaymath}
t=\frac{\sum_{j=1}^k c_j\overline{R}_j}{\sqrt{S^2\frac{N-1-H}{N-k}\sum_{j=1}^k \frac{c_j^2}{n_j}}},
\end{displaymath}

where:

$\overline{R}_j$ – The mean ranks of the $j$-th group, for $(j=1,2,...k)$,

This statistic follows a t-Student distribution with $N-k$ degrees of freedom.

The settings window with the Kruskal-Wallis ANOVA can be opened in Statistics menu→NonParametric tests→Kruskal-Wallis ANOVA or in ''Wizard''.

EXAMPLE (jobSatisfaction.pqs)

A group of 120 people was interviewed, for whom the occupation is their first job obtained after receiving appropriate education. The respondents rated their job satisfaction on a five-point scale, where:

1- unsatisfying job,

2- job giving little satisfaction,

3- job giving an average level of satisfaction,

4- job that gives a fairly high level of satisfaction,

5- job that is very satisfying.

We will test whether the level of reported job satisfaction is the same for each category of education.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $the level of job satisfaction is the same for each education category,$\\
\mathcal{H}_1: & $at least one education category (one population)  $ \\
&$has different levels of job satisfaction.$
\end{array}$

The obtained value of p=0.001 indicates a significant difference in the level of satisfaction between the compared categories of education. Dunn's POST-HOC analysis with Bonferroni's correction shows that significant differences are between those with primary and secondary education and those with primary and tertiary education. Slightly more differences can be confirmed by selecting the stronger POST-HOC Conover-Iman.

In the graph showing medians and quartiles we can see homogeneous groups determined by the POST-HOC test. If we choose to present Dunn's results with Bonferroni correction we can see two homogeneous groups that are not completely distinct, i.e. group (a) - people who rate job satisfaction lower and group (b)- people who rate job satisfaction higher. Vocational education belongs to both of these groups, which means that people with this education evaluate job satisfaction quite differently. The same description of homogeneous groups can be found in the results of the POST-HOC tests.

We can provide a detailed description of the data by selecting descriptive statistics in the analysis window and indicating to add counts and percentages to the description.

We can also show the distribution of responses in a column plot.


The Jonckheere-Terpstra test for trend

The Jonckheere-Terpstra test for ordered alternatives, described independently by Jonckheere (1954)16), can be calculated in the same situation as the Kruskal-Wallis ANOVA, as it is based on the same assumptions. The Jonckheere-Terpstra test, however, captures the alternative hypothesis differently - indicating in it the existence of a trend for successive populations.

Hypotheses are simplified to medians:

\begin{array}{cl}
\mathcal{H}_0: & \theta_1=\theta_2=...=\theta_k,\\
\mathcal{H}_1: & \theta_1\geq\theta_2\geq...\geq\theta_k, $ with at least one strict inequality$
\end{array}

Note

The term: „with at least one strict inequality” written in the alternative hypothesis of this test means that at least the median of one population should be greater than the median of another population in the order specified.

The test statistic has the form:

\begin{displaymath}
Z=\frac{L-\left[\frac{N^2-\sum_{j=1}^kn_j^2}{4}\right]}{SE}
\end{displaymath}

where:

$L$ – sum of $l_{ij}$ values obtained for each pair of compared populations,

$l_{ij}$ – number of results in the group occurring later in the order that are higher than a given value in the earlier group,

$SE=\sqrt{\frac{A}{72}+\frac{B}{36N(N-1)(N-2)}+\frac{C}{8N(N-1)}}$,

$A=N(N-1)(2N+5)-\sum_{j=1}^kn_j(n_j-1)(2n_j+5)-\sum_{l=1}^gt_l(t_l-1)(2t_l+5)$,

$B=\sum_{j=1}^kn_j(n_j-1)(n_j-2)\cdot\sum_{l=1}^gt_l(t_l-1)(t_l-2)$,

$C=\sum_{j=1}^kn_j(n_j-1)\cdot\sum_{l=1}^gt_l(t_l-1)$,

$g$ – number of groups of different tied ranks,

$t_l$ – number of cases included in the tied rank,

$N=\sum_{j=1}^k n_j$,

$n_j$ – sample sizes for $(j=1,2,...k)$.

Note

To be able to perform a trend analysis, the expected order of the populations must be indicated by assigning consecutive natural numbers.

The formula for the test statistic $Z$ includes the correction for ties. This correction is applied when tied ranks are present (when there are no tied ranks the test statistic formula reduces to the original Jonckheere-Terpstra formula without this correction).

The statistic $Z$ has asymptotically (for large samples) the normal distribution.

With the expected direction of the trend known, the alternative hypothesis is one-sided and the one-sided p-value is interpreted. The interpretation of the two-sided p-value means that the researcher does not know (does not assume) the direction of the possible trend.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}
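A minimal sketch of the Jonckheere-Terpstra statistic for groups listed in the expected order (hypothetical data; the ties correction is omitted, so the no-ties variance is used; not PQStat code):

<code python>
# Minimal sketch of the Jonckheere-Terpstra statistic (no correction for ties).
import numpy as np
from scipy import stats

# hypothetical groups listed in the expected (increasing) order
groups = [np.array([1, 2, 2, 3, 2.0]),
          np.array([2, 3, 3, 4, 3.0]),
          np.array([3, 4, 5, 4, 5.0])]
n = np.array([len(g) for g in groups])
N = n.sum()

# L = sum over ordered pairs of groups of the number of "later value > earlier value" pairs
L = sum(np.sum(gj[:, None] > gi[None, :])
        for a, gi in enumerate(groups) for gj in groups[a + 1:])

mean_L = (N ** 2 - (n ** 2).sum()) / 4
var_L = (N ** 2 * (2 * N + 3) - (n ** 2 * (2 * n + 3)).sum()) / 72   # no-ties variance
Z = (L - mean_L) / np.sqrt(var_L)

p_one_sided = stats.norm.sf(Z)            # expected direction: values increase across groups
print(L, Z, p_one_sided)
</code>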

The settings window with the Jonckheere-Terpstra test for trend can be opened in Statistics menu→NonParametric tests→Kruskal-Wallis ANOVA or in ''Wizard''.

EXAMPLE cont. (jobSatisfaction.pqs file)

It is suspected that better educated people have high job demands, which may reduce the satisfaction level of the first job, which often does not meet such demands. Therefore, it is worthwhile to conduct a trend analysis.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $No indicated trend in satisfaction with first job,$\\
\mathcal{H}_1: & $There is an indicated trend in the level of satisfaction with the first job.$
\end{array}$

To do this, we resume the analysis with the button, select the Jonckheere-Terpstra trend test option, and assign successive natural numbers to the education categories.

The obtained one-sided value p<0.0001 is less than the set significance level $\alpha=0.05$, which speaks in favor of a trend actually occurring, consistent with the researcher's expectations.

We can also confirm the existence of this trend by showing the percentage distribution of responses obtained.


The Conover ranks test of variance

The Conover squared ranks test is used, similarly to the Fisher-Snedecor test (for $k=2$), the Levene test and the Brown-Forsythe test (for $k\geq2$), to verify the hypothesis of similar variation of the tested variable in several populations. It is the non-parametric counterpart of the tests indicated above, in that it does not assume normality of the data distribution and is based on ranks17). However, this test examines variation and therefore distances from the mean, so the basic condition for its use is:

Hypotheses:

\begin{array}{cc}
\mathcal{H}_0: & $the dispersion of the data in the populations being compared is the same, $\\
\mathcal{H}_1: & $at least two populations differ in the amount of data dispersion$.
\end{array}

The test statistic has the form:

\begin{displaymath}
\chi^2=\frac{1}{D^2}\left(\sum_{j=1}^k\frac{S_j^2}{n_j}-N\overline{S}^2\right)
\end{displaymath}

where:

$N=n_1+n_2+...+n_k$,

$n_j$ – individual group sizes,

$S_j$ – sum of squared ranks in the $j$-th group,

$\overline{S}=\frac{1}{N}\sum_{j=1}^k S_j$ – mean of all squared ranks,

$D^2=\frac{1}{N-1}\left(\sum_{ji=1}^NR_i^4-N\overline{S}^2\right)$,

$R_i$ – ranks for values representing the distance of the measurement from the mean of a given group.

This statistic has the Chi-square distribution with $k-1$ degrees of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}
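
As an illustration, below is a minimal Python/NumPy sketch of the squared ranks statistic evaluated directly from the formulas above; the function name and the example data are illustrative assumptions and do not come from the software.

<code python>
# A hedged sketch of the Conover squared ranks test of variance,
# implemented from the formulas above; names are illustrative.
import numpy as np
from scipy.stats import chi2, rankdata

def conover_squared_ranks(*groups):
    groups = [np.asarray(g, dtype=float) for g in groups]
    # distances of measurements from their own group mean
    deviations = [np.abs(g - g.mean()) for g in groups]
    ranks = rankdata(np.concatenate(deviations))      # R_i (mid-ranks for ties)
    sq_ranks = ranks ** 2
    n = np.array([len(g) for g in groups])
    N = n.sum()
    # split squared ranks back into groups and sum them: S_j
    S = np.array([s.sum() for s in np.split(sq_ranks, np.cumsum(n)[:-1])])
    S_bar = S.sum() / N                                # mean of all squared ranks
    D2 = (np.sum(ranks ** 4) - N * S_bar ** 2) / (N - 1)
    stat = (np.sum(S ** 2 / n) - N * S_bar ** 2) / D2
    p = chi2.sf(stat, df=len(groups) - 1)
    return stat, p

# example with three small artificial samples
print(conover_squared_ranks([3, 5, 9, 4], [12, 2, 7, 6, 8], [1, 14, 10, 11]))
</code>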

The settings window with the Conover ranks test of variance can be opened in Statistics menu→NonParametric tests→Kruskal-Wallis ANOVA, option Conover ranks test of variance, or in Statistics menu→NonParametric tests→Mann-Whitney, option Conover ranks test of variance.

EXAMPLE (surgeryMethod.pqs file)

Patients have been prepared for spinal surgery. The patients will be operated on by one of three methods. Preliminary allocation of each patient to each type of surgery has been made. At a later stage we intend to compare the condition of the patients after the surgeries, therefore we want the groups of patients to be comparable. They should be similar in terms of the height of the interbody space (WPMT) before surgery. The similarity should concern not only the average values but also the differentiation of the groups.

The distribution of the data was checked.

It is found that for the two methods, the WPMT operation exhibits deviations from normality, largely caused by skewness of the data. Further comparative analysis will be conducted using the Kruskal-Wallis test to compare whether the level of WPMT differs between the methods, and the Conover test to indicate whether the spread of WPMT scores is similar in each method.

Hypotheses for Conover's variance test:

$\begin{array}{cl}
\mathcal{H}_0: & $The diversity (scope) of WPMT is the same for each method of operation,$\\
\mathcal{H}_1: & $WPMT diversity (range) is higher/lower for at least one method of operation.$
\end{array}$

Hypotheses for Kruskal-Wallis test:

$\begin{array}{cl}
\mathcal{H}_0: & $WPMT level is the same for each operation method,$\\
\mathcal{H}_1: & $WPMT level is higher/lower for at least one method of operation.$
\end{array}$

First, the value of Conover's test of variance is interpreted, which indicates statistically significant differences in the dispersion of the groups compared (p=0.0022). From the graph, we can conclude that the differences concern mainly group 3. Since differences in the dispersion of WPMT were detected, the result of the Kruskal-Wallis test comparing the level of WPMT for these methods should be interpreted with caution, since this test is sensitive to heterogeneity of variance. Although the Kruskal-Wallis test showed no significant differences (p=0.2057), it is recommended that patients with low WPMT (who were mainly assigned to surgery with method B) be distributed more evenly, i.e. to see if they could be offered surgery with method A or C. After reassignment of patients, the analysis should be repeated.


The Friedman ANOVA

The Friedman repeated measures analysis of variance by ranks – the Friedman ANOVA – was described by Friedman (1937)18). This test is used when the measurements of an analysed variable are made several times ($k\geq2$), each time in different conditions. It is also used when we have rankings coming from different sources (from different judges) and concerning a few ($k\geq2$) objects, but we want to assess the degree of agreement of the rankings.

Iman and Davenport (1980 19)) have shown that in many cases the Friedman statistic is overly conservative and have proposed a modification of it. This modification is the non-parametric equivalent of the ANOVA for dependent groups, which is why it is now recommended for use in place of the traditional Friedman statistic.

Additional analyses:

Basic assumptions:

Hypotheses relate to the equality of the sum of ranks for successive measurements ($R_{j}$) or are simplified to medians ($\theta_j$)

\begin{array}{cl}
\mathcal{H}_0: & \theta_1=\theta_2=...=\theta_k,\\
\mathcal{H}_1: & $not all $\theta_j$ are equal $(j=1,2,...,k)$,$
\end{array}

where:

$\theta_1,\theta_2,...\theta_k$ – medians of the analysed feature in successive measurements from the examined population.

Two test statistics are determined: the Friedman statistic and the Iman-Davenport modification of this statistic.

The Friedman statistic has the form:

\begin{displaymath}
T_1=\frac{1}{C}\left(\frac{12}{nk(k+1)}\left(\sum_{j=1}^k\left(\sum_{i=1}^n R_{ij}\right)^2\right)-3n(k+1)\right),
\end{displaymath}

where:

$n$ – sample size,

$R_{ij}$ – ranks ascribed to the following measurements $(j=1,2,...k)$, separately for the analysed objects $(i=1,2,...n)$,

$\displaystyle C=1-\frac{\sum(t^3-t)}{n(k^3-k)}$ – correction for ties,

$t$ – number of cases included in a tie.

The Iman-Davenport modification of the Friedman statistic has the form:

\begin{displaymath}
T_2=\frac{(n-1)T_1}{n(k-1)-T_1}
\end{displaymath}

The formula for the test statistic $T_1$ and $T_2$ includes the correction for ties $C$. This correction is used, when ties occur (if there are no ties, the correction is not calculated, because of $C=1$).

The $T_1$ statistic has asymptotically (for large sample sizes) the Chi-square distribution with $df=k - 1$ degrees of freedom.

The statistic $T_2$ follows Snedecor's F distribution with $df_1=k-1$ and $df_2=(n-1)(k-1)$ degrees of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}
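
A minimal Python/NumPy sketch of the $T_1$ and $T_2$ statistics computed from the formulas above is given below; rows of the data matrix are objects, columns are the successive measurements, and the function name and example data are illustrative assumptions.

<code python>
# A hedged sketch of the Friedman T_1 statistic (with the tie correction C)
# and the Iman-Davenport modification T_2; names are illustrative.
import numpy as np
from scipy.stats import chi2, f, rankdata

def friedman_iman_davenport(data):
    data = np.asarray(data, dtype=float)            # shape (n, k)
    n, k = data.shape
    R = np.apply_along_axis(rankdata, 1, data)      # ranks within each object
    col_sums = R.sum(axis=0)                        # sum_i R_ij for each measurement j
    # tie correction: for every object count the sizes t of tied groups
    tie_term = 0.0
    for row in data:
        _, counts = np.unique(row, return_counts=True)
        tie_term += np.sum(counts ** 3 - counts)
    C = 1.0 - tie_term / (n * (k ** 3 - k))
    T1 = (12.0 / (n * k * (k + 1)) * np.sum(col_sums ** 2) - 3 * n * (k + 1)) / C
    T2 = (n - 1) * T1 / (n * (k - 1) - T1)
    p1 = chi2.sf(T1, df=k - 1)
    p2 = f.sf(T2, dfn=k - 1, dfd=(n - 1) * (k - 1))
    return (T1, p1), (T2, p2)

# three repeated measurements on five objects
print(friedman_iman_davenport([[1, 2, 3], [2, 1, 3], [1, 3, 2], [1, 2, 3], [2, 3, 1]]))
</code>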

The POST-HOC tests

An introduction to contrasts and POST-HOC tests is given in the section on one-way analysis of variance.

The Dunn test

Used for simple comparisons (the frequency in particular measurements is always the same).

The Dunn test (Dunn 1964 20)) is a test that includes a correction for multiple testing. The Bonferroni or Sidak correction is most commonly used here, although other, newer corrections are also available and are described in more detail in the Multiple Comparisons section.

Hypotheses:

Example - simple comparisons (comparison of 2 selected medians):

\begin{array}{cc}
\mathcal{H}_0: & \theta_j=\theta_{j+1},\\
\mathcal{H}_1: & \theta_j \neq \theta_{j+1}.
\end{array}

  • [i] The value of critical difference is calculated by using the following formula:

\begin{displaymath}
CD=Z_{\frac{\alpha}{c}}\sqrt{\frac{k(k+1)}{6n}},
\end{displaymath}

where:

$\displaystyle Z_{\frac{\alpha}{c}}$ - is the critical value (statistic) of the normal distribution for a given significance level $\alpha$ corrected on the number of possible simple comparisons $c$.

  • [ii] The test statistic is defined by:

\begin{displaymath}
Z=\frac{\sum_{j=1}^k c_j\overline{R}_j}{\sqrt{\frac{k(k+1)}{6n}}},
\end{displaymath}

where:

$\overline{R}_j$ – mean of the ranks of the $j$-th measurement, for $(j=1,2,...k)$,

The test statistic asymptotically (for large sample size) has normal distribution, and the p-value is corrected on the number of possible simple comparisons $c$.
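
The following Python sketch illustrates Dunn-type pairwise comparisons built from the CD and Z formulas above; the function name, the data layout (objects in rows, measurements in columns) and the Bonferroni handling of the two-sided p-value are assumptions made for this example.

<code python>
# A hedged sketch of Dunn pairwise comparisons after the Friedman ANOVA.
import numpy as np
from itertools import combinations
from scipy.stats import norm, rankdata

def dunn_friedman(data, alpha=0.05):
    data = np.asarray(data, dtype=float)            # shape (n objects, k measurements)
    n, k = data.shape
    mean_ranks = np.apply_along_axis(rankdata, 1, data).mean(axis=0)
    c = k * (k - 1) // 2                            # number of simple comparisons
    se = np.sqrt(k * (k + 1) / (6.0 * n))
    CD = norm.isf(alpha / c) * se                   # critical difference Z_{alpha/c} * SE, as in the formula above
    results = {}
    for j, l in combinations(range(k), 2):
        Z = (mean_ranks[j] - mean_ranks[l]) / se
        p = min(1.0, norm.sf(abs(Z)) * 2 * c)       # two-sided p-value with a Bonferroni-type correction
        results[(j + 1, l + 1)] = (Z, p)
    return CD, results
</code>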

Conover-Iman test

The non-parametric equivalent of the Fisher LSD test 21), used for simple comparisons (the counts across measurements are always the same).

  • [i] The value of critical difference is calculated by using the following formula:

\begin{displaymath}
CD=\sqrt{F_{\alpha,1,df_2}}\cdot\sqrt{\frac{2\left(nA-\sum_{j=1}^kR_j^2\right)}{(n-1)(k-1)}},
\end{displaymath}

where:

$\displaystyle A=\sum_{i=1}^{n}\sum_{j=1}^kR_{ij}^2$ – sum of squared ranks,

$\displaystyle F_{\alpha,1,df_2}$ – is the critical value (statistic) of the Snedecor's F distribution for a given significance level $\alpha$ and degrees of freedom respectively: 1 and $df_2$.

  • [ii] The test statistic is defined by:

\begin{displaymath}
t=\frac{\sum_{j=1}^k c_jR_j}{\sqrt{\frac{2\left(nA-\sum_{j=1}^kR_j^2\right)}{(n-1)(k-1)}}},
\end{displaymath}

where:

$R_j$ – the sum of ranks of $j$th measurement, for $(j=1,2,...k)$,

The test statistic has the t-Student distribution with $df_2$ degrees of freedom.
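
Analogously, a minimal sketch of the Conover-Iman comparisons based on the formulas above; it uses the identity $t_{1-\alpha/2,df_2}=\sqrt{F_{\alpha,1,df_2}}$, and the function name and data layout are illustrative assumptions.

<code python>
# A hedged sketch of Conover-Iman pairwise comparisons after the Friedman ANOVA.
import numpy as np
from itertools import combinations
from scipy.stats import rankdata, t as t_dist

def conover_iman_friedman(data, alpha=0.05):
    data = np.asarray(data, dtype=float)            # shape (n, k)
    n, k = data.shape
    R = np.apply_along_axis(rankdata, 1, data)
    Rj = R.sum(axis=0)                              # rank sums per measurement
    A = np.sum(R ** 2)                              # sum of squared ranks
    df2 = (n - 1) * (k - 1)
    se = np.sqrt(2 * (n * A - np.sum(Rj ** 2)) / df2)
    # t_{1-alpha/2, df2} equals sqrt(F_{alpha, 1, df2}) used in the CD formula above
    CD = t_dist.isf(alpha / 2, df2) * se
    results = {}
    for j, l in combinations(range(k), 2):
        stat = (Rj[j] - Rj[l]) / se
        results[(j + 1, l + 1)] = (stat, 2 * t_dist.sf(abs(stat), df2))
    return CD, results
</code>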

The settings window with the Friedman ANOVA can be opened in Statistics menu→NonParametric tests→Friedman ANOVA, trend test or in ''Wizard''.

EXAMPLE (chocolate bar.pqs file)

Quarterly sales of a certain chocolate bar were measured in 14 randomly chosen supermarkets. The study was started in January and finished in December. During the second quarter, a billboard campaign was in full swing. Let's check if the campaign had an influence on the sales of the advertised chocolate bar.

\begin{tabular}{|c|c|c|c|c|}
\hline
Shop&Quarter I&Quarter II&Quarter III&Quarter IV\\\hline
SK1&3415&4556&5772&5432\\
SK2&1593&1937&2242&2794\\
SK3&1976&2056&2240&2085\\
SK4&1526&1594&1644&1705\\
SK5&1538&1634&1866&1769\\
SK6&983&1086&1135&1177\\
SK7&1050&1209&1245&977\\
SK8&1861&2087&2054&2018\\
SK9&1714&2415&2361&2424\\
SK10&1320&1621&1624&1551\\
SK11&1276&1377&1522&1412\\
SK12&1263&1279&1350&1490\\
SK13&1271&1417&1583&1513\\
SK14&1436&1310&1357&1468\\\hline
\end{tabular}

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $there is a lack of significant difference in sale values, in the compared $\\
& $quarters, in the population represented by the whole sample, $\\
\mathcal{H}_1: & $the difference in sale values, between at least 2 quarters, is significant,$\\
& $in the population represented by the whole sample.$
\end{array}$

Comparing the p-value of the Friedman test (as well as the p-value of the Iman-Davenport correction of the Friedman test) with a significance level $\alpha=0.05$, we find that sales of the bar are not the same in each quarter. The POST-HOC Dunn analysis performed with the Bonferroni correction indicates differences in sales volumes pertaining to quarters I and III and I and IV, and an analogous analysis performed with the stronger Conover-Iman test indicates differences between all quarters except quarters III and IV.

In the graph, we presented homogeneous groups determined by the Conover-Iman test.

We can provide a detailed description of the data by selecting Descriptive statistics in the analysis window.

If the data were described by an ordinal scale with few categories, it would be useful to present it also in numbers and percentages. In our example, this would not be a good method of description.


The Page test for trend

The Page test for ordered alternatives, described in 1963 by Page E. B. 22), can be computed in the same situation as Friedman's ANOVA, since it is based on the same assumptions. However, Page's test captures the alternative hypothesis differently - indicating that there is a trend in subsequent measurements.

Hypotheses involve equality of the sum of ranks for successive measurements or are simplified to medians:

\begin{array}{cl}
\mathcal{H}_0: & \theta_1=\theta_2=...=\theta_k,\\
\mathcal{H}_1: & \theta_1\geq\theta_2\geq...\geq\theta_k, $ with at least one strict inequality$
\end{array}

Note

The term: „with at least one strict inequality” written in the alternative hypothesis of this test means that at least one median should be greater than the median of another group of measurements in the order specified.

The test statistic has the form:

\begin{displaymath}
Z=\frac{L-\left[\frac{nk(k+1)^2}{4}\right]}{\sqrt{\frac{n(k^3-k)^2}{144(k-1)}}}
\end{displaymath}

where:

$L=\sum_{j=1}^kR_jc_j$,

$R_j$ – the sum of ranks of $j$th measurement,

$c_j$ – the weight for the $j$-th measurement, informing about the natural order of this measurement among the other measurements (weights are consecutive natural numbers).

Note

In order to perform a trend analysis, the expected ordering of measurements must be indicated by assigning consecutive natural numbers to successive measurement groups. These numbers are treated as weights in the analysis $c_1$, $c_2$, …, $c_k$.

The formula for the test statistic $Z$ does not include a correction for ties, making it somewhat more conservative when tied ranks are present. However, using a correction for tied ranks for this test is not recommended.

The statistic $Z$ has asymptotically (for large sample) normal distribution.

With the expected direction of the trend known, the alternative hypothesis is one-sided and the one-sided p-value is interpreted. Interpreting a two-sided p-value means that the researcher does not know (does not assume) the direction of the possible trend.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}
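
A minimal Python/NumPy sketch of the Page statistic computed from the formula above; the columns of the data matrix are assumed to be ordered according to the expected trend, and the function name and the direction handled (rank sums increasing with the weights) are assumptions of the example.

<code python>
# A hedged sketch of the Page test for trend, following the formula above.
import numpy as np
from scipy.stats import norm, rankdata

def page_trend(data):
    data = np.asarray(data, dtype=float)            # shape (n, k), columns in the expected order
    n, k = data.shape
    Rj = np.apply_along_axis(rankdata, 1, data).sum(axis=0)   # rank sums per measurement
    c = np.arange(1, k + 1)                         # weights: consecutive natural numbers
    L = np.sum(Rj * c)
    EL = n * k * (k + 1) ** 2 / 4.0                 # expected value of L under H0
    VL = n * (k ** 3 - k) ** 2 / (144.0 * (k - 1))  # variance of L under H0
    Z = (L - EL) / np.sqrt(VL)
    # one-sided p-value for the alternative that rank sums increase with the weights
    return Z, norm.sf(Z)
</code>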

The settings window with the Page test for trend can be opened in Statistics menu→NonParametric tests→Friedman ANOVA, trend test or in ''Wizard''.

EXAMPLE cont. (chocolate bar.pqs file)

The expected result of the intensive advertising campaign conducted by the company is a steady increase in sales of the offered bar.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $no indicated trend in bar sales,$\\
\mathcal{H}_1: & $there is an indicated trend in bar sales.$
\end{array}$

Comparing a one-sided p<0.0001 with a significance level $\alpha=0.05$, we find that the campaign produced the expected trend of increased product sales.


The Durbin's ANOVA (missing data)

Durbin's analysis of variance of repeated measurements for ranks was proposed by Durbin (1951)23). This test is used when measurements of the variable under study are made several times – a situation similar to that in which Friedman's ANOVA is used. The original Durbin test and the Friedman test give the same result when we have a complete data set. However, Durbin's test has an advantage – it can also be calculated for an incomplete data set. At the same time, the missing data cannot be located arbitrarily; the data must form a so-called balanced incomplete block:

  • the number of measurements for each object is $k$ ($k \leq t$),
  • each measurement is made on $r$ objects ($r \leq b$),
  • the number of objects for which the same pair of measurements was taken simultaneously is constant and equal to $\lambda$.

where:

$t$ – total number of considered measurements,

$b$ – total number of examined objects.

Basic assumptions:

Hypotheses involve equality of the sum of ranks for successive measurements ($R_{j}$) or are simplified to medians ($\theta_j$):

\begin{array}{cl}
\mathcal{H}_0: & \theta_1=\theta_2=...=\theta_k,\\
\mathcal{H}_1: & $not all $\theta_j$ are equal $(j=1,2,...,k)$,$
\end{array}

Two test statistics of the following form are determined:

\begin{displaymath}
T_1=\frac{(t-1)\left[\sum_{j=1}^tR_j^2-rC\right]}{A-C},
\end{displaymath}

\begin{displaymath}
T_2=\frac{T_1/(t-1)}{(b(k-1)-T_1)/(bk-b-t+1)},
\end{displaymath}

where:

$R_{j}$ – sum of ranks for successive measurements $(j=1,2,...t)$,

$R_{ij}$ – ranks assigned to successive measurements, separately for each of the studied objects $(i=1,2,...b)$,

$\displaystyle A=\sum_{i=1}^b\sum_{j=1}^tR_{ij}^2$ – sum of squared ranks,

$\displaystyle C=\frac{bk(k+1)^2}{4}$ – correction coefficient.

The formula for $T_1$ and $T_2$ statistics includes a correction for tied ranks.

For complete data, the $T_1$ statistic is the same as the Friedman test. It has asymptotically (for large sample sizes) Chi-square distribution with $df=t - 1$ degrees of freedom.

The $T_2$ statistic is the equivalent of the Iman-Davenport adjustment of Friedman's ANOVA, so it follows Snedecor's F distribution with $df_1=t-1$ and $df_2=bk-b-t+1$ degrees of freedom. It is now considered to be more precise than the $T_1$ statistic and is recommended for use together with it 24).

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}
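
A minimal Python/NumPy sketch of the $T_1$ and $T_2$ statistics for a balanced incomplete block, following the formulas above (with $rC$ in the numerator of $T_1$); NaN marks missing measurements, and the function name, the data layout and the way $k$ and $r$ are read off the data (assuming a balanced design) are illustrative assumptions.

<code python>
# A hedged sketch of Durbin's ANOVA for a balanced incomplete block design.
import numpy as np
from scipy.stats import chi2, f, rankdata

def durbin(data):
    data = np.asarray(data, dtype=float)            # shape (b objects, t measurements), NaN = missing
    b, t = data.shape
    k = np.sum(~np.isnan(data[0]))                  # measurements per object (assumed constant)
    r = np.sum(~np.isnan(data[:, 0]))               # objects per measurement (assumed constant)
    R = np.full_like(data, np.nan)
    for i in range(b):                              # rank within each object, ignoring gaps
        present = ~np.isnan(data[i])
        R[i, present] = rankdata(data[i, present])
    Rj = np.nansum(R, axis=0)                       # rank sums per measurement
    A = np.nansum(R ** 2)                           # sum of squared ranks
    C = b * k * (k + 1) ** 2 / 4.0                  # correction coefficient
    T1 = (t - 1) * (np.sum(Rj ** 2) - r * C) / (A - C)
    T2 = (T1 / (t - 1)) / ((b * (k - 1) - T1) / (b * k - b - t + 1))
    p1 = chi2.sf(T1, df=t - 1)
    p2 = f.sf(T2, dfn=t - 1, dfd=b * k - b - t + 1)
    return (T1, p1), (T2, p2)
</code>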

The POST-HOC tests

An introduction to contrasts and POST-HOC tests is given in the section on one-way analysis of variance.

Conover-Iman test

Used for simple comparisons (the counts in each measurement are always the same).

Hypotheses:

Example - simple comparisons (comparing 2 selected medians / rank sums between each other):

\begin{array}{cc}
\mathcal{H}_0: & \theta_j=\theta_{j+1},\\
\mathcal{H}_1: & \theta_j \neq \theta_{j+1}.
\end{array}

  • [i] The value of critical difference is calculated by using the following formula:

\begin{displaymath}
CD=t_{1-\alpha /2, bk-b-t+1}\sqrt{\frac{(A-C)2r}{bk-b-t+1}\left(1-\frac{T_1}{b(k-1)}\right)},
\end{displaymath}

where:

$t_{1-\alpha /2, bk-b-t+1}$ – is the critical value (statistic) of the t-Student distribution for a given significance level $\alpha$ and $df=bk-b-t+1$ degrees of freedom.

  • [ii] The test statistic has the form:

\begin{displaymath}
t=\frac{\sum_{j=1}^k c_jR_j}{\sqrt{\frac{(A-C)2r}{bk-b-t+1}\left(1-\frac{T_1}{b(k-1)}\right)}},
\end{displaymath}

The test statistic has t-Student distribution with $df=bk-b-t+1$ degrees of freedom.

The settings window with the Durbin's ANOVA can be opened in Statistics menu→NonParametric tests→Friedman ANOVA, trend test or in ''Wizard''.

Note

For records with missing data to be taken into account, you must check the Accept missing data option. Empty cells and cells with non-numeric values are treated as missing data. Only records with more than one numeric value will be analyzed.

EXAMPLE (mirror.pqs file)

An experiment was conducted among 20 patients in a psychiatric hospital (Ogilvie 1965)25). This experiment involved drawing straight lines according to a presented pattern. The pattern represented 5 lines drawn at different angles ($0^o, 22.5^o, 45^o, 67.5^o, 90^o$) relative to the indicated center. The patients' task was to reproduce the lines while having their hand covered. The time at which the patient drew the line was recorded as the result of the experiment. Ideally, each patient would draw a line from all angles, but elapsed time and fatigue would have a significant impact on performance. In addition, it is difficult to keep the patient interested and willing to cooperate for an extended period of time. Therefore, the project was planned and conducted in balanced and incomplete blocks. Each of the 20 patients traced a line at two angles (there were five possible angles). Thus, each angle was drawn eight times. The time at which each patient drew a line at a given angle was recorded in the table.

\begin{tabular}{|c||c|c|c|c|c|}
\hline
patient number &$0^o$&$22.5^o$&$45^o$&$67.5^o$&$90^o$\\\hline
1&7&15&&&\\
2&20&&72&&\\
3&8&&&26&\\
4&33&&&&36\\
5&7&16&&&\\
6&&68&67&&\\
7&&33&&64&\\
8&&34&&&12\\
9&10&&96&&\\
10&&29&59&&\\
11&&&17&9&\\
12&&&100&&15\\
13&16&&&32&\\
14&&19&&32&\\
15&&&36&39&\\
16&&&&44&54\\
17&16&&&&38\\
18&&17&&&12\\
19&&&37&&11\\
20&&&&56&6\\
\hline
\end{tabular}

We want to see if the time taken to draw each line is completely random, or if there are lines that took more or less time to draw.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $there is no significant difference between the time  $\\
& $taken by patients to draw each line, $\\
\mathcal{H}_1: & $at least one line is drawn in shorter/longer time.$
\end{array}$

Comparing the p=0.0145 for the $T_2$ statistic (or the p=0.0342 for the $T_1$ statistic) with the $\alpha=0.05$ significance level, we find that the lines are not all drawn in the same amount of time. The POST-HOC analysis performed indicates that there is a difference in the time taken to draw the line at the angle of $0^o$. It is drawn faster than the lines at the angles of $22.5^o$, $45^o$ and $67.5^o$.

The graph shows homogeneous groups indicated by the post-hoc test.


The Skillings-Mack ANOVA (missing data)

The Skillings-Mack repeated measures analysis of variance for ranks was proposed by Skillings and Mack in 1981 26). It is a test that can be used when there are missing data, but the missing data need not occur in any particular arrangement; however, each object must have at least two observations. If there are no tied ranks and no data gaps, it gives the same result as the Friedman ANOVA, and if data gaps are present in a balanced arrangement, its results correspond to those of Durbin's ANOVA.

Basic assumptions:

Hypotheses relate to the equality of the sum of ranks for successive measurements ($R_{j}$) or are simplified to medians ($\theta_j$):

\begin{array}{cl}
\mathcal{H}_0: & \theta_1=\theta_2=...=\theta_k,\\
\mathcal{H}_1: & $not all $\theta_j$ are equal $(j=1,2,...,k)$,$
\end{array}

The test statistic has the form:

\begin{displaymath}
\chi^2=A\Sigma_0^{-1}A^T
\end{displaymath}

where:

$A=(A_1,A_2,...,A_{k-1})$,

$A_j=\sum_{i=1}^n\sqrt{\frac{12}{s_i+1}}\left(R_{ij}-\frac{s_i+1}{2}\right)$,

$s_i$ – number of observations for $i$-th object,

$R_{ij}$ – ranks assigned to successive measurements ($j = 1, 2, ...k$), separately for each study object ($i = 1, 2, ...n$), with ranks for missing data equal to the average rank for the object,

$\Sigma_0$ – matrix determining the covariances for $A$ at the truth of $\mathcal{H}_0$27).

When each pair of measurements occurs simultaneously for at least one observation, this statistic has asymptotically (for large sample sizes) the Chi-square distribution with $k-1$ degrees of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}

The settings window with the Skillings-Mack ANOVA can be opened in Statistics menu→NonParametric tests→Friedman ANOVA, trend test or in ''Wizard''.

Note

For records with missing data to be taken into account, you must check the Accept missing data option. Empty cells and cells with non-numeric values are treated as missing data. Only records containing more than one numeric value will be analyzed.

EXAMPLE (polling.pqs file)

A certain university teacher, wanting to improve the way he conducted his classes, decided to verify his teaching skills. In several randomly selected student groups, during the last class, he asked the students to fill in a short anonymous questionnaire. The survey consisted of six questions about how the six specified parts of the material were illustrated. The students could rate them on a 5-point scale, where 1 - the way of presenting the material was completely incomprehensible, 5 - a very clear and interesting way of illustrating the material. The data obtained in this way turned out to be incomplete because students did not answer questions about the parts of the material they were absent for. In the 30-person group completing the survey, only 15 students provided complete responses. An analysis that does not account for data gaps (in this case, a Friedman analysis) would have limited power due to cutting the group size so drastically and would not lead to the detection of significant differences. The data gaps were not planned and do not form a balanced block, so this task cannot be performed using Durbin's analysis along with its POST-HOC test.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $there is no significant difference in the evaluations of the different parts$\\
& $of the material by students, $\\
\mathcal{H}_1: & $at least one part of the material is assessed differently by students.$
\end{array}$

The results of the ANOVA Skillings-Mack analysis are presented in the following report:

The $p$ value obtained should be treated with caution due to possible tied ranks. However, for this study, the p=0.0067 is well below the accepted significance level of $\alpha=0.05$, indicating that significant differences exist. The differences in responses can be observed in the graph; however, there is no POST-HOC analysis available for this test.


The Chi-square test for multidimensional contingency tables

Basic assumptions:

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & O_{ij...}=E_{ij...} $ for all categories,$\\
\mathcal{H}_1: & O_{ij...} \neq E_{ij...} $ for at least one category,$
\end{array}$

where:

$O_{ij...}$ and $E_{ij...}$ $-$ observed frequencies in a contingency table and the corresponding expected frequencies.

The test statistic is defined by:

\begin{displaymath}
\chi^2=\sum_{i=1}^r\sum_{j=1}^c\sum...\sum\frac{(O_{ij...}-E_{ij...})^2}{E_{ij...}}.
\end{displaymath}

This statistic asymptotically (for large expected frequencies) has the Chi-square distribution with the number of degrees of freedom calculated, for 3-dimensional tables, using the formula: $df=(r-1)(c-1)(l-1)+(r-1)(c-1)+(r-1)(l-1)+(c-1)(l-1)$, where $r$, $c$ and $l$ are the numbers of categories of the three classification variables.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}
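
A minimal Python/NumPy sketch for a 3-dimensional table is given below; the expected frequencies are computed here under mutual independence of the three classification variables, which is the model to which the degrees-of-freedom formula above corresponds, and the function name is an illustrative assumption.

<code python>
# A hedged sketch of the Chi-square test for a 3-dimensional contingency table.
import numpy as np
from scipy.stats import chi2

def chi2_3d(table):
    O = np.asarray(table, dtype=float)              # observed counts, shape (r, c, l)
    N = O.sum()
    p_r = O.sum(axis=(1, 2)) / N                    # marginal proportions of each dimension
    p_c = O.sum(axis=(0, 2)) / N
    p_l = O.sum(axis=(0, 1)) / N
    # expected counts under mutual independence of the three variables
    E = N * p_r[:, None, None] * p_c[None, :, None] * p_l[None, None, :]
    stat = np.sum((O - E) ** 2 / E)
    r, c, l = O.shape
    df = (r-1)*(c-1)*(l-1) + (r-1)*(c-1) + (r-1)*(l-1) + (c-1)*(l-1)
    return stat, chi2.sf(stat, df)
</code>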

The settings window with the Chi-square (multidimensional) test can be opened in Statistics menu→NonParametric tests (unordered categories)→Chi-square (multidimensional) or in ''Wizard''.

Note

This test can be calculated only on the basis of raw data.


The Q-Cochran ANOVA

The Q-Cochran analysis of variance, based on the Q-Cochran test, was described by Cochran (1950)29). This test is an extension of the McNemar test to $k\geq2$ dependent groups. It is used to verify the hypothesis of symmetry between several measurements $X^{(1)}, X^{(2)},..., X^{(k)}$ of the feature $X$. The analysed feature can take only 2 values – for the analysis, the numbers 1 and 0 are assigned to them.

Basic assumptions:

Hypotheses:

\begin{array}{cl}
\mathcal{H}_0: & $all the "incompatible" observed frequencies are equal,$ \\
\mathcal{H}_1: & $not all the "incompatible" observed frequencies are equal,$
\end{array}

where:

„incompatible” observed frequencies – the observed frequencies calculated when the value of the analysed feature is different in several measurements.

The test statistic is defined by:

\begin{displaymath}
Q=\frac{(k-1)\left(kC-T^2\right)}{kT-R}
\end{displaymath}

where:

$T=\sum_{i=1}^n\sum_{j=1}^kx_{ij}$,

$R=\sum_{i=1}^n\left(\sum_{j=1}^kx_{ij}\right)^2$,

$C=\sum_{j=1}^k\left(\sum_{i=1}^nx_{ij}\right)^2$,

$x_{ij}$ – the value of $j$-th measurement for $i$-th object (so 0 or 1).

This statistic asymptotically (for large sample size) has the Chi-square distribution with a number of degrees of freedom calculated using the formula: $df=k-1$.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ 	\mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}
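
A minimal Python/NumPy sketch of the $Q$ statistic computed from the formula above; the data matrix holds the 0/1 values $x_{ij}$ (objects in rows, measurements in columns), and the function name and example data are illustrative assumptions.

<code python>
# A hedged sketch of the Q-Cochran statistic.
import numpy as np
from scipy.stats import chi2

def cochran_q(data):
    x = np.asarray(data, dtype=float)               # shape (n, k), values 0/1
    n, k = x.shape
    T = x.sum()                                     # grand total
    R = np.sum(x.sum(axis=1) ** 2)                  # sum of squared row totals
    C = np.sum(x.sum(axis=0) ** 2)                  # sum of squared column totals
    Q = (k - 1) * (k * C - T ** 2) / (k * T - R)
    return Q, chi2.sf(Q, df=k - 1)

# three measurements (e.g. test questions) on five objects, 1 = correct, 0 = wrong
print(cochran_q([[1, 1, 0], [0, 1, 0], [1, 1, 1], [0, 1, 0], [0, 1, 1]]))
</code>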

The POST-HOC tests

Introduction to the contrasts and the POST-HOC tests was performed in the unit, which relates to the one-way analysis of variance.

The Dunn test

Used for simple comparisons (the frequency in particular measurements is always the same).

Hypotheses:

Example - simple comparisons (for the difference in proportions in one chosen pair of measurements):

\begin{array}{cl}
\mathcal{H}_0: & $the chosen "incompatible" observed frequencies are equal,$ \\
\mathcal{H}_1: & $the chosen "incompatible" observed frequencies are different.$
\end{array}

  • [i] The value of critical difference is calculated by using the following formula:

\begin{displaymath}
CD=Z_{\frac{\alpha}{c}}\sqrt{2\frac{kT-R}{n^2k(k-1)}},
\end{displaymath}

where:

$\displaystyle Z_{\frac{\alpha}{c}}$ - is the critical value (statistic) of the normal distribution for a given significance level $\alpha$ corrected on the number of possible simple comparisons $c$.


  • [ii] The test statistic is defined by:

\begin{displaymath}
Z=\frac{\sum_{j=1}^k c_jp_j}{\sqrt{2\frac{kT-R}{n^2k(k-1)}}},
\end{displaymath}

where:

$p_j$ – the proportion in the $j$-th measurement $(j=1,2,...k)$,

The test statistic asymptotically (for large sample size) has the normal distribution, and the p-value is corrected on the number of possible simple comparisons $c$.
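
A minimal Python/NumPy sketch of these pairwise comparisons of proportions; the function name, the data layout and the Bonferroni-style handling of the two-sided p-value are assumptions made for this example.

<code python>
# A hedged sketch of Dunn pairwise comparisons after the Q-Cochran ANOVA.
import numpy as np
from itertools import combinations
from scipy.stats import norm

def dunn_cochran(data, alpha=0.05):
    x = np.asarray(data, dtype=float)               # shape (n, k), values 0/1
    n, k = x.shape
    T = x.sum()
    R = np.sum(x.sum(axis=1) ** 2)
    p = x.mean(axis=0)                              # proportions p_j
    c = k * (k - 1) // 2                            # number of simple comparisons
    se = np.sqrt(2 * (k * T - R) / (n ** 2 * k * (k - 1)))
    CD = norm.isf(alpha / c) * se                   # critical difference Z_{alpha/c} * SE, as in the formula above
    results = {}
    for j, l in combinations(range(k), 2):
        Z = (p[j] - p[l]) / se
        pval = min(1.0, norm.sf(abs(Z)) * 2 * c)    # two-sided p-value with a Bonferroni-type correction
        results[(j + 1, l + 1)] = (Z, pval)
    return CD, results
</code>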

The settings window with the Cochran Q ANOVA can be opened in Statistics menu→NonParametric tests→Cochran Q ANOVA or in ''Wizard''.

Note

This test can be calculated only on the basis of raw data.

EXAMPLE (test.pqs file)

We want to compare the difficulty of 3 test questions. To do this, we select a sample of 20 people from the analysed population. Every person from the sample answers 3 test questions. Next, we check the correctness of the answers (an answer can be correct or wrong). The scores are presented in the table below:

\begin{tabular}{|c|c|c|c|}
\hline
No.&question 1 answer &question 2 answer &question 3 answer \\\hline
1&correct&correct&wrong\\
2&wrong&correct&wrong\\
3&correct&correct&correct\\
4&wrong&correct&wrong\\
5&wrong&correct&wrong\\
6&wrong&correct&correct\\
7&wrong&wrong&wrong\\
8&wrong&correct&wrong\\
9&correct&correct&wrong\\
10&wrong&correct&wrong\\
11&wrong&wrong&wrong\\
12&wrong&wrong&correct\\
13&wrong&correct&wrong\\
14&wrong&wrong&correct\\
15&correct&wrong&wrong\\
16&wrong&wrong&wrong\\
17&wrong&correct&wrong\\
18&wrong&correct&wrong\\
19&wrong&wrong&wrong\\
20&correct&correct&wrong\\\hline
\end{tabular}

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $The individual questions received the same number of correct answers,$\\
& $in the analysed population,$\\
\mathcal{H}_1: & $There are different numbers of correct and wrong answers in individual test questions, $\\
& $in the analysed population.$
\end{array}$

Comparing the p-value p=0.0077 with the significance level $\alpha=0.05$, we conclude that the individual test questions have different difficulty levels. We resume the analysis to perform the POST-HOC test and, in the test option window, select the Dunn POST-HOC test.

The carried-out POST-HOC analysis indicates that there are differences between questions 2 and 1 and between questions 2 and 3. The differences arise because the second question is easier than the first and the third ones (the number of correct answers to the second question is higher).

1)
Lund R.E., Lund J.R. (1983), Algorithm AS 190, Probabilities and Upper Quantiles for the Studentized Range. Applied Statistics; 34
2)
Brown M. B., Forsythe A. B. (1974), The small sample behavior of some statistics which test the equality of several means. Technometrics, 16, 385-389
3)
Welch B. L. (1951), On the comparison of several mean values: an alternative approach. Biometrika 38: 330–336
4)
Tamhane A. C. (1977), Multiple comparisons in model I One-Way ANOVA with unequal variances. Communications in Statistics, A6 (1), 15-32
5)
Brown M. B., Forsythe A. B. (1974), The ANOVA and multiple comparisons for data with heterogeneous variances. Biometrics, 30, 719-724
6)
Games P. A., Howell J. F. (1976), Pairwise multiple comparison procedures with unequal n's and/or variances: A Monte Carlo study. Journal of Educational Statistics, 1, 113-125
7)
Levene H. (1960), Robust tests for the equality of variance. In I. Olkin (Ed.) Contributions to probability and statistics (278-292). Palo Alto, CA: Stanford University Press
8)
Brown M.B., Forsythe A. B. (1974a), Robust tests for equality of variances. Journal of the American Statistical Association, 69, 364-367
9)
Greenhouse S. W., Geisser S. (1959), On methods in the analysis of profile data. Psychometrika, 24, 95–112
10)
Huynh H., Feldt L. S. (1976), Estimation of the Box correction for degrees of freedom from sample data in randomized block and split-plot designs. Journal of Educational Statistics, 1, 69–82
11)
Kruskal W.H., Wallis W.A. (1952), Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47, 583-621
12) , 13) , 20)
Dunn O. J. (1964), Multiple comparisons using rank sums. Technometrics, 6: 241–252
14)
Zar J. H., (2010), Biostatistical Analysis (Fifth Editon). Pearson Educational
15) , 17) , 21) , 23)
Conover W. J. (1999), Practical nonparametric statistics (3rd ed). John Wiley and Sons, New York
16)
Jonckheere A. R. (1954), A distribution-free k-sample test against ordered alternatives. Biometrika, 41: 133–145; Terpstra T. J. (1952), The asymptotic normality and consistency of Kendall's test against trend, when ties are present in one ranking. Indagationes Mathematicae, 14: 327–333
18)
Friedman M. (1937), The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32,675-701
19)
Iman R. L., Davenport J. M. (1980), Approximations of the critical region of the friedman statistic, Communications in Statistics 9, 571–595
22)
Page E. B. (1963), Ordered hypotheses for multiple treatments: A significance test for linear ranks. Journal of the American Statistical Association 58 (301): 216–30
24)
Durbin J. (1951), Incomplete blocks in ranking experiments. British Journal of Statistical Psychology, 4: 85–90
25)
Ogilvie J. C. (1965), Paired comparison models with tests for interaction. Biometrics 21(3): 651-64
26) , 27)
Skillings J.H., Mack G.A. (1981), On the use of a Friedman-type statistic in balanced and unbalanced block designs. Technometrics, 23: 171–177
28)
Cochran W.G. (1952), The chi-square goodness-of-fit test. Annals of Mathematical Statistics, 23, 315-345
29)
Cochran W.G. (1950), The comparison of percentages in matched samples. Biometrika, 37, 256-266