
Comparison - one group

\begin{pspicture}(0,6)(15,14.5)
\rput(2,14){\hyperlink{interwalowa}{Interval scale}}
\rput[tl](.1,13.4){\ovalnode{A}{\hyperlink{rozklad_normalny}{\begin{tabular}{c}Are\\the data\\normally\\distributed?\end{tabular}}}}
\rput[br](2.9,7.2){\rnode{B}{\psframebox{\hyperlink{test_t_student}{\begin{tabular}{c}Single-sample\\t-test\end{tabular}}}}}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{A}{B}

\rput(2.2,10.4){Y}
\rput(4.3,12.5){N}

\rput(7.5,14){\hyperlink{porzadkowa}{Ordinal scale}}
\rput[br](8.9,11.5){\rnode{C}{\psframebox{\hyperlink{test_wilcoxon_rangowanych_znakow}{\begin{tabular}{c}Wilcoxon\\(signed-ranks)\\test\end{tabular}}}}}
\ncline[angleA=-90, angleB=90, arm=.5, linearc=.2]{->}{A}{C}

\rput(12.5,14){\hyperlink{nominalna}{Nominal scale}}
\rput[br](13.8,11.09){\rnode{D}{\psframebox{\begin{tabular}{c}\hyperlink{test_chi_kwadrat_dobroci}{$\chi^2$ test}\\\hyperlink{test_chi_kwadrat_dobroci}{(goodness-of-fit),}\\\hyperlink{test_z_dla_proporcji}{tests for} \\\hyperlink{test_z_dla_proporcji}{one proportion}\\\end{tabular}}}}

\rput(4.0,10){\hyperlink{testy_normalnosci}{normality tests}}
\psline[linestyle=dotted]{<-}(3.4,11.2)(4,10.2)
\end{pspicture}

 

Parametric tests

The t-test for a single sample

The single-sample $t$-test is used to verify the hypothesis that the analysed sample, with mean $\overline{x}$, comes from a population whose mean $\mu$ equals a given value.
Basic assumptions:

Hypotheses:

$\begin{array}{cc}
\mathcal{H}_0: & \mu=\mu_0,\\
\mathcal{H}_1: & \mu\ne \mu_0,
\end{array}$

where:

$\mu$ – mean of an analysed feature of the population represented by the sample,

$\mu_0$ – a given value.

The test statistic is defined by: \begin{displaymath}
t=\frac{\overline{x}-\mu_0}{sd}\sqrt{n},
\end{displaymath}

where:

$sd$ – standard deviation from the sample,

$n$ – sample size.

The test statistic has Student's $t$-distribution with $n-1$ degrees of freedom.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

$\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ \mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}$

Note

Note that if the sample is large and you know the standard deviation of the population, you can calculate the test statistic using the formula: \begin{displaymath}
t=\frac{\overline{x}-\mu_0}{\sigma}\sqrt n.
\end{displaymath} The statistic calculated this way has the normal distribution. As $n \rightarrow \infty$, Student's $t$-distribution converges to the normal distribution $N(0,1)$. In practice it is assumed that for $n>30$ Student's $t$-distribution may be approximated with the normal distribution.
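This convergence is easy to see numerically. The sketch below (an illustration using scipy, not part of the original text) compares the two-sided 5% critical values of Student's $t$-distribution with the normal value 1.96:

```python
from scipy import stats

# Two-sided 5% critical values of Student's t vs. the normal 1.96.
for n in (5, 30, 1000):
    t_crit = stats.t.ppf(0.975, df=n - 1)
    print(n, round(t_crit, 3))
print("normal:", round(stats.norm.ppf(0.975), 3))  # 1.96
```

For small $n$ the $t$ critical value is clearly larger; by $n=1000$ it is practically indistinguishable from 1.96.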

Standardized effect size

Cohen's $d$ expresses the difference between the sample mean and the reference value in units of the standard deviation.

\begin{displaymath}
	d=\left|\frac{\overline{x}-\mu_0}{sd}\right|
\end{displaymath}

When interpreting an effect, researchers often use general guidelines proposed by Cohen 1), defining small (0.2), medium (0.5) and large (0.8) effect sizes.

The settings window with the Single-sample $t$-test can be opened in Statistics menu→Parametric tests→t-test or in ''Wizard''.

Note

Calculations can be based on raw data or on aggregated data: the arithmetic mean, standard deviation and sample size.

EXAMPLE (courier.pqs file)

You want to check whether the average waiting time for a delivery by a certain courier company is 3 days $(\mu_0=3)$. To verify this, 22 clients of the company are chosen at random, and for each of them the number of days from dispatch to delivery is recorded. The values are: (1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 6, 6, 6, 7, 7).

The number of days of waiting for a delivery in the analysed population fulfills the assumption of normality of distribution.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $mean of the number of awaiting days for the delivery, which is supposed$\\
&$to be delivered by the above-mentioned courier company is 3,$\\
\mathcal{H}_1: & $mean of the number of awaiting days for the delivery, which is supposed$\\
&$ to be delivered by the above-mentioned courier company is different from 3.$
\end{array}$

Comparing the $p$-value = 0.0881 of the $t$-test with the significance level $\alpha=0.05$, we conclude that there is no reason to reject the null hypothesis stating that the average waiting time for a delivery from the analysed courier company is 3 days. For the tested sample the mean is $\overline{x}=3.73$ and the standard deviation is $sd=1.91$.
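The same result can be reproduced with scipy's standard single-sample $t$-test (a sketch; the data are copied from the example above):

```python
import numpy as np
from scipy import stats

days = np.array([1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4,
                 5, 5, 6, 6, 6, 7, 7])
mu0 = 3

t_stat, p = stats.ttest_1samp(days, popmean=mu0)  # two-sided by default
d = abs(days.mean() - mu0) / days.std(ddof=1)     # Cohen's d

print(round(days.mean(), 2), round(days.std(ddof=1), 2))  # 3.73 1.91
print(round(p, 4), round(d, 2))                   # p ~ 0.088
```

The effect size $d\approx 0.38$ falls between Cohen's small and medium benchmarks.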

2022/02/09 12:56

The Single-Sample Chi-square Test for a Population Variance

Basic assumptions:

Hypotheses:

$\begin{array}{cc}
\mathcal{H}_0: & \sigma=\sigma_0,\\
\mathcal{H}_1: & \sigma\ne \sigma_0,
\end{array}$

where:

$\sigma$ – standard deviation of a characteristic in the population represented by the sample,

$\sigma_0$ – a given (hypothesized) value.

The test statistic is defined by: \begin{displaymath}
\chi^2=\frac{(n-1)sd^2}{\sigma_0^2},
\end{displaymath}

where:

$sd$ – standard deviation in the sample,

$n$ – sample size.

The test statistic has the Chi-square distribution with the degrees of freedom determined by the formula: $df=n-1$.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

$\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ \mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}$

If the sample standard deviation is less than the hypothesized value, the $p$ value is calculated as twice the area under the chi-square distribution curve to the left of the test statistic; if it is greater, as twice the corresponding area to the right.
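This two-sided rule can be sketched in a few lines (an illustration assuming scipy; the values $n=40$, $sd=0.76$, $\sigma_0=1$ are made-up, not taken from the example below):

```python
from scipy import stats

def chi2_variance_test(n, sd, sigma0):
    """Single-sample chi-square test for a population standard deviation."""
    stat = (n - 1) * sd**2 / sigma0**2
    df = n - 1
    # Double the tail area on the side indicated by sd vs. sigma0.
    if sd < sigma0:
        p = 2 * stats.chi2.cdf(stat, df)
    else:
        p = 2 * stats.chi2.sf(stat, df)
    return stat, min(p, 1.0)

stat, p = chi2_variance_test(n=40, sd=0.76, sigma0=1.0)
print(round(stat, 2), round(p, 4))
```

The `min(p, 1.0)` guard keeps the doubled tail area from exceeding 1 when the statistic sits near the centre of the distribution.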

The settings window with the Chi-square test for variance can be opened in Statistics menu→Parametric tests→Chi-square test for variance.

Note

Calculations can be based on raw data or on aggregated data: the standard deviation and sample size.

EXAMPLE (dispenser.pqs file)

Before starting the production of another batch of a certain cough syrup, control measurements of the volume of syrup poured into the bottles were made. The technical documentation of the dosing device states that the permissible variation in syrup volume, measured by the standard deviation, is 1 ml. It should be verified whether the tested device is working properly.

The distribution of the volume of syrup poured into the bottles was checked with the Lilliefors test, which yielded a result consistent with the normal distribution. The analysis concerning the standard deviation can therefore be performed with the chi-square test for variance.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $the standard deviation of the volume of syrup$\\
&$ poured by the dosing device is 1ml,$\\
\mathcal{H}_1: & $the standard deviation of the volume of syrup$ \\
&$poured by the dosing device is other than 1ml.$
\end{array}$

Comparing the $p<0.0001$ value of the $\chi^2$ test with the significance level $\alpha=0.05$ we find that the scatter of the dispensing device is different from 1ml. However, we can consider the performance of the device as correct because the standard deviation of the sample is 0.76, which is significantly less than the acceptable value from the technical documentation.


Non-parametric tests

 

The Wilcoxon test (signed-ranks)

The Wilcoxon signed-ranks test is also known as the Wilcoxon single-sample test (Wilcoxon (1945, 1949)2)). This test is used to verify the hypothesis that the analysed sample comes from a population whose median ($\theta$) is a given value.

Basic assumptions:

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & \theta=\theta_0, \\
\mathcal{H}_1: & \theta\neq \theta_0.
\end{array}$

where:

$\theta$ – median of an analysed feature of the population represented by the sample,

$\theta_0$ – a given value.

The value of the test statistic $Z$ ($T$ for a small sample size) is calculated and, based on it, the $p$ value.

The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

$\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ \mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}$

Note

Depending on the size of the sample, the test statistic takes a different form:

  • for a small sample size

\begin{displaymath}
T=\min\left(\sum R_-,\sum R_+\right),
\end{displaymath}

where:

$\sum R_+$ and $\sum R_-$ are, respectively, the sums of positive and negative ranks.

This statistic has the Wilcoxon distribution.

  • for a large sample size

\begin{displaymath}
Z=\frac{T-\frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}-\frac{\sum t^3-\sum t}{48}}},
\end{displaymath}

where:

$n$ - the number of ranked signs (the number of ranks),

$t$ – the number of cases sharing a tied rank.

The test statistic formula for $Z$ includes the correction for ties. This correction should be used when ties occur (when there are no ties, the correction is not calculated, because $\left(\sum t^3-\sum t\right)/48=0$).

The $Z$ statistic asymptotically (for a large sample size) has the normal distribution.

Continuity correction of the Wilcoxon test (Marascuilo and McSweeney (1977)3))

A continuity correction is used to enable the test statistic to take all real values, in accordance with the assumption of the normal distribution. The test statistic with a continuity correction is defined by:

\begin{displaymath}
Z=\frac{\left|T-\frac{n(n+1)}{4}\right|-0.5}{\sqrt{\frac{n(n+1)(2n+1)}{24}-\frac{\sum t^3-\sum t}{48}}}.
\end{displaymath}

Standardized effect size

The distribution of the Wilcoxon test statistic is approximated by the normal distribution, so the result can be converted to an effect size $r=\left|Z/\sqrt{n}\right|$ 4) and then to Cohen's $d$ value according to the standard conversion used in meta-analyses:

\begin{displaymath}
	d=\frac{2r}{\sqrt{1-r^2}}
\end{displaymath}

When interpreting an effect, researchers often use general guidelines proposed by Cohen 5), defining small (0.2), medium (0.5) and large (0.8) effect sizes.
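The conversion chain can be sketched as follows; the $Z$ and $n$ values are hypothetical, chosen only to show the arithmetic, and $r=\left|Z/\sqrt{n}\right|$ follows the convention of the cited Fritz et al. (2012):

```python
import math

def wilcoxon_effect_size(z, n):
    r = abs(z / math.sqrt(n))        # r = |Z / sqrt(n)|
    d = 2 * r / math.sqrt(1 - r**2)  # conversion to Cohen's d
    return r, d

r, d = wilcoxon_effect_size(z=1.59, n=19)  # hypothetical Z and n
print(round(r, 3), round(d, 3))
```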

The settings window with the Wilcoxon test (signed-ranks) can be opened in Statistics menu→NonParametric tests→Wilcoxon (signed-ranks) or in ''Wizard''.

EXAMPLE cont. (courier.pqs file)

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $median of the number of awaiting days for the delivery, which is supposed $\\
&$to be delivered by the analysed courier company is 3$\\
\mathcal{H}_1: & $median of the number of awaiting days for the delivery, which is supposed $ \\
&$to be delivered by the analysed courier company is different from 3$
\end{array}$

Comparing the $p$-value = 0.1232 of the Wilcoxon test based on the $T$ statistic with the significance level $\alpha=0.05$, we conclude that there is no reason to reject the null hypothesis stating that the typical waiting time for a delivery from the analysed courier company is 3 days. Exactly the same decision would be made based on the $p$-value = 0.1112 or $p$-value = 0.1158 of the Wilcoxon test based on the $Z$ statistic, without or with the continuity correction respectively.
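The courier data can also be run through scipy's implementation (a sketch; scipy's normal approximation includes the tie correction, so the result should land close to the $Z$-based values quoted above):

```python
import numpy as np
from scipy import stats

days = np.array([1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4,
                 5, 5, 6, 6, 6, 7, 7])
theta0 = 3

# Differences equal to the hypothesized median (zeros) are dropped,
# which is Wilcoxon's original handling of zeros.
res = stats.wilcoxon(days - theta0, zero_method="wilcox")
print(round(res.pvalue, 4))
```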


The Chi-square goodness-of-fit test

The $\chi^2$ test (goodness-of-fit) is also called the one-sample $\chi^2$ test and is used to test the compatibility of the values observed for $r$ ($r\ge 2$) categories $X_1, X_2,..., X_r$ of one feature $X$ with hypothetical expected values for this feature. The values of all $n$ measurements should be gathered in a table consisting of $r$ rows (categories: $X_1, X_2, ..., X_r$). For each category $X_i$ the frequency of its occurrence $O_i$ is recorded, together with its expected frequency $E_i$ or the probability of its occurrence $p_i$. The expected frequency is calculated as the product $E_i=np_i$. The table can take one of the following forms:

\begin{tabular}[t]{c@{\hspace{1cm}}c}
\begin{tabular}{c|c c}
$X_i$ categories& $O_i$ & $E_i$ \\\hline
$X_1$ & $O_1$ & $E_1$ \\
$X_2$ & $O_2$ & $E_2$ \\
... & ... & ...\\
$X_r$ & $O_r$ & $E_r$ \\
\end{tabular}
&
\begin{tabular}{c|c c}
$X_i$ categories&  $O_i$ & $p_i$ \\\hline
$X_1$ & $O_1$ & $p_1$ \\
$X_2$ & $O_2$ & $p_2$ \\
... & ... & ...\\
$X_r$ & $O_r$ & $p_r$ \\
\end{tabular}
\end{tabular}

Basic assumptions:

  • measurement on a nominal scale (the order is not taken into account),
  • large expected frequencies (according to the Cochran (1952)6) interpretation),
  • the total of the observed frequencies should be exactly the same as the total of the expected frequencies, and the sum of all $p_i$ probabilities should come to 1.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & O_i=E_i $ for all categories,$\\
\mathcal{H}_1: & O_i \neq E_i $ for at least one category.$
\end{array}$

The test statistic is defined by:

\begin{displaymath}
\chi^2=\sum_{i=1}^r\frac{(O_i-E_i)^2}{E_i}.
\end{displaymath}

This statistic asymptotically (for large expected frequencies) has the Chi-square distribution with the number of degrees of freedom calculated using the formula: $df=(r-1)$.
The p-value, designated on the basis of the test statistic, is compared with the significance level $\alpha$:

$\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ \mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}$

The settings window with the Chi-square test (goodness-of-fit) can be opened in Statistics menu→NonParametric tests (unordered categories)→Chi-square (goodness-of-fit) or in ''Wizard''.

EXAMPLE (dinners.pqs file)

We would like to know whether the numbers of dinners served in a school canteen on each day of the week (Monday to Friday) are statistically the same. A one-week sample was taken and the number of dinners served on each day was recorded: Monday – 33, Tuesday – 29, Wednesday – 32, Thursday – 36, Friday – 20.

In total, 150 dinners were served in the canteen within the week (5 days). We assume that the probability of a dinner being served on each day is the same, i.e. $\frac{1}{5}$. The expected frequency of dinners for each day of the week is then $E_i=150\cdot\frac{1}{5}=30$.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $the number of served dinners in the analysed school canteen within given$\\
& $days (of the week) is consistent with the expected number of given out dinners these$\\
& $days,$\\
\mathcal{H}_1: & $the number of served out dinners in the analysed school canteen within a given $\\
& $week is not consistent with the expected number of dinners given out these days.$
\end{array}$

The $p$-value from the $\chi^2$ distribution with 4 degrees of freedom comes to 0.2873. So, using the significance level $\alpha=0.05$, we conclude that there is no reason to reject the null hypothesis of compatibility between the number of dinners served and the expected number of dinners served on the particular days.
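The calculation can be reproduced with scipy's goodness-of-fit test (a sketch using the counts from the example):

```python
from scipy import stats

observed = [33, 29, 32, 36, 20]
expected = [30] * 5          # 150 dinners spread evenly over 5 days

stat, p = stats.chisquare(observed, f_exp=expected)
print(round(stat, 2), round(p, 4))  # 5.0 0.2873
```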

Note!

If you want to make more comparisons within a single study, the Bonferroni correction7) can be used. The correction limits the probability of a type I error when comparing observed and expected frequencies between particular days, for example:

Friday $\Longleftrightarrow$ Monday,

Friday $\Longleftrightarrow$ Tuesday,

Friday $\Longleftrightarrow$ Wednesday,

Friday $\Longleftrightarrow$ Thursday,

These comparisons must be made independently. The significance level $\alpha=0.05$ must then be adjusted for each comparison according to the formula $\alpha=\frac{0.05}{r}$, where $r$ is the number of comparisons. In this example the significance level for each comparison according to the Bonferroni correction is $\alpha=\frac{0.05}{4}=0.0125$.

However, it is necessary to remember that reducing $\alpha$ for each comparison decreases the power of the test.
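In code the correction is a one-liner; the sketch below just restates the arithmetic for the four comparisons above:

```python
alpha = 0.05
r = 4                    # number of planned comparisons
alpha_adj = alpha / r    # Bonferroni-adjusted level per comparison
print(alpha_adj)         # 0.0125
```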


Tests for one proportion

Tests for a proportion should be used when there are two possible outcomes (one of them distinguished, occurring $m$ times in the sample) and you know how often the distinguished outcome occurs in the sample (the proportion $\frac{m}{n}$ is known). Depending on the sample size $n$, you can choose the Z test for one proportion for large samples or the exact binomial test for one proportion for small sample sizes. These tests are used to verify the hypothesis that the proportion in the population from which the sample was taken is a given value.

Basic assumptions:

  • measurement on a nominal scale (the order is not taken into account).

The additional condition for the Z test for proportion

  • large frequencies (according to the Marascuilo and McSweeney (1977)8) interpretation, each of the values $np>5$ and $n(1-p)>5$).

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & p=p_0,\\
\mathcal{H}_1: & p\neq p_0,
\end{array}$

where:

$p$ – probability (distinguished proportion) in the population,

$p_0$ – expected probability (expected proportion).

The Z test for one proportion

The test statistic is defined by:

\begin{displaymath}
Z=\frac{p-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}},
\end{displaymath}

where:

$p=\frac{m}{n}$ – the distinguished proportion for the sample taken from the population,

$m$ – frequency of values distinguished in the sample,

$n$ – sample size.

The test statistic with a continuity correction is defined by:

\begin{displaymath}
Z=\frac{|p-p_0|-\frac{1}{2n}}{\sqrt{\frac{p_0(1-p_0)}{n}}}.
\end{displaymath}

The $Z$ statistic with and without a continuity correction asymptotically (for large sizes) has the normal distribution.

Binomial test for one proportion

The binomial test for one proportion uses directly the binomial distribution, also called the Bernoulli distribution, which belongs to the group of discrete distributions (distributions in which the analysed variable takes a finite number of values). The analysed variable can take $k=2$ values. The first is usually called a success and the other a failure. The probability of occurrence of a success (the distinguished probability) is $p_0$, and of a failure $1-p_0$.

The probability for the specific point in this distribution is calculated using the formula:

\begin{displaymath}
P(m)={n \choose m}p_0^m(1-p_0)^{n-m},
\end{displaymath}

where:

${n \choose m}=\frac{n!}{m!(n-m)!}$,

$m$ – frequency of values distinguished in the sample,

$n$ – sample size.

Based on the sums of the appropriate probabilities $P$, a one-sided and a two-sided $p$ value are calculated, where the two-sided $p$ value is defined as twice the smaller of the one-sided probabilities.

The p-value is compared with the significance level $\alpha$:

$\begin{array}{ccl}
$ if $ p \le \alpha & \Longrightarrow & $ reject $ \mathcal{H}_0 $ and accept $ \mathcal{H}_1, \\
$ if $ p > \alpha & \Longrightarrow & $ there is no reason to reject $ \mathcal{H}_0. \\
\end{array}$

Note

Note that for the estimator from the sample, which in this case is the value of the proportion $p$, a confidence interval is calculated. For a large sample size the interval can be based on the normal distribution (so-called Wald intervals). More universal are the intervals proposed by Wilson (1927)9) and by Agresti and Coull (1998)10). For small sample sizes the Clopper and Pearson (1934)11) intervals are more adequate.

A comparison of interval estimation methods for a binomial proportion was published by Brown L.D. et al. (2001)12).
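Several of these intervals are available from scipy's `binomtest` result (a sketch; the 'exact' method is the Clopper-Pearson interval, and the counts $m=20$, $n=150$ are illustrative):

```python
from scipy import stats

res = stats.binomtest(k=20, n=150, p=0.2)
for method in ("exact", "wilson", "wilsoncc"):  # exact = Clopper-Pearson
    ci = res.proportion_ci(confidence_level=0.95, method=method)
    print(method, round(ci.low, 3), round(ci.high, 3))
```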

The settings window with the Z test for one proportion can be opened in Statistics menu→NonParametric tests (unordered categories)→Z for proportion.

EXAMPLE cont. (dinners.pqs file)

Assume that you would like to check whether $\frac{1}{5}$ of all dinners served during the week are served on Friday. For the chosen sample $m=20$, $n=150$.

Select the options of the analysis and activate a filter selecting the appropriate day of the week – Friday. If you do not activate the filter, no error will be generated, only statistics for given weekdays will be calculated.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $on Friday, in a school canteen there are served $\frac{1}{5}$ out of all dinners which are served$ \\
& $within a week,$\\
\mathcal{H}_1: & $on Friday, in a school canteen there are significantly more than $\frac{1}{5}$ or less than $\frac{1}{5} \\
& $dinners out of all the dinners served within a week in this canteen.$
\end{array}$

The proportion of the distinguished value in the sample is $p=\frac{m}{n}=0.133$ and 95% Clopper-Pearson confidence interval for this fraction $(0.083, 0.198)$ does not include the hypothetical value of 0.2.

Based on the Z test without the continuity correction ($p$-value = 0.0412), and also on the basis of the exact probability calculated from the binomial distribution ($p$-value = 0.0447), you can assume (at the significance level $\alpha=0.05$) that on Friday statistically fewer than $\frac{1}{5}$ of the week's dinners are served. However, after applying the continuity correction it is not possible to reject the null hypothesis ($p$-value = 0.0525).
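The three p-values can be reproduced directly from the formulas above (a sketch using scipy; the doubled one-sided binomial p mirrors the definition in this section rather than scipy's default two-sided method):

```python
import math
from scipy import stats

m, n, p0 = 20, 150, 0.2
p_hat = m / n
se = math.sqrt(p0 * (1 - p0) / n)

z = (p_hat - p0) / se
p_z = 2 * stats.norm.sf(abs(z))                  # Z test, ~0.0412

z_cc = (abs(p_hat - p0) - 1 / (2 * n)) / se
p_zcc = 2 * stats.norm.sf(z_cc)                  # with continuity corr., ~0.0525

# Doubled smaller one-sided binomial probability.
p_one = min(stats.binom.cdf(m, n, p0), stats.binom.sf(m - 1, n, p0))
p_binom = 2 * p_one                              # ~0.0447

print(round(p_z, 4), round(p_zcc, 4), round(p_binom, 4))
```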

1) , 5)
Cohen J. (1988), Statistical Power Analysis for the Behavioral Sciences, Lawrence Erlbaum Associates, Hillsdale, New Jersey
2)
Wilcoxon F. (1945), Individual comparisons by ranking methods. Biometrics Bulletin 1, 80-83
3) , 8)
Marascuilo L.A. and McSweeney M. (1977), Nonparametric and distribution-free method for the social sciences. Monterey, CA: Brooks Cole Publishing Company
4)
Fritz C.O., Morris P.E., Richler J.J.(2012), Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General., 141(1):2–18.
6)
Cochran W.G. (1952), The chi-square goodness-of-fit test. Annals of Mathematical Statistics, 23, 315-345
7)
Abdi H. (2007), Bonferroni and Sidak corrections for multiple comparisons, in N.J. Salkind (ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage
9)
Wilson E.B. (1927), Probable Inference, the Law of Succession, and Statistical Inference. Journal of the American Statistical Association: 22(158):209-212
10)
Agresti A., Coull B.A. (1998), Approximate is better than „exact” for interval estimation of binomial proportions. American Statistics 52: 119-126
11)
Clopper C. and Pearson S. (1934), The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26: 404-413
12)
Brown L.D., Cai T.T., DasGupta A. (2001), Interval Estimation for a Binomial Proportion. Statistical Science, Vol. 16, no. 2, 101-133