The Chi-square test for large tables

These tests are based on data gathered in the form of a contingency table for 2 features ($X$, $Y$). One of them has $r$ possible categories $X_1, X_2,..., X_r$ and the other has $c$ possible categories $Y_1, Y_2,..., Y_c$ (see table (\ref{tab_kontyngencji_obser})).

The $\chi^2$ test for $r\times c$ tables is also known as Pearson's Chi-square test (Karl Pearson 1900). It extends the Chi-square goodness-of-fit test to 2 features.

The test statistic is defined by:

\begin{displaymath}
\chi^2=\sum_{i=1}^r\sum_{j=1}^c\frac{(O_{ij}-E_{ij})^2}{E_{ij}}.
\end{displaymath}

This statistic asymptotically (for large expected frequencies) has the [[en:statpqpl:rozkladypl:ciaglepl#rozklad_chi_kwadrat|Chi-square distribution]] with the number of degrees of freedom given by the formula: $df=(r-1)(c-1)$.
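The p-value, determined from the test statistic, is compared with the significance level $\alpha$: if $p \le \alpha$, the null hypothesis of no dependence between the features is rejected; otherwise there is no reason to reject it.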
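The calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the program's implementation. It assumes that the expected frequencies are obtained from the marginal totals in the usual way for the independence test, $E_{ij}$ = (row total $\times$ column total) / $n$. The function names ''chi2_sf'' and ''pearson_chi2'' and the table used at the end are hypothetical and do not come from any example file.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of the Chi-square distribution.

    Uses the exact recurrence Q(a+1, y) = Q(a, y) + y^a e^(-y) / Gamma(a+1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # df = 2 starting point
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # df = 1 starting point
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def pearson_chi2(observed):
    """Return (statistic, df, p_value) for an r x c table of observed counts."""
    r, c = len(observed), len(observed[0])
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    n = sum(row_tot)
    stat = 0.0
    for i in range(r):
        for j in range(c):
            # Assumption: expected count from the marginal totals
            e_ij = row_tot[i] * col_tot[j] / n
            stat += (observed[i][j] - e_ij) ** 2 / e_ij
    df = (r - 1) * (c - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical 4 x 3 table, made up for illustration only
table = [
    [20, 30, 10],
    [15, 25, 20],
    [30, 20, 10],
    [10, 30, 40],
]
stat, df, p = pearson_chi2(table)
print(f"chi2 = {stat:.4f}, df = {df}, p = {p:.6f}")
```

The p-value returned by ''pearson_chi2'' is then compared with the chosen significance level $\alpha$, exactly as in the text.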

The settings window with the Chi-square test (RxC) can be opened in the Statistics menu → NonParametric tests → Chi-square, Fisher, OR/RR or in ''Wizard''.

EXAMPLE (country-education.pqs file)

There is a sample of 605 persons ($n=605$) for whom 2 features were analysed ($X$=country of residence, $Y$=education). The first feature occurs in 4 categories and the second one in 3 categories ($X_1$=Country 1, $X_2$=Country 2, $X_3$=Country 3, $X_4$=Country 4, $Y_1$=primary, $Y_2$=secondary, $Y_3$=higher). The data distribution is shown below, in the contingency table:

Based on this sample, you would like to find out if there is any dependence between education and country of residence in the analysed population.

Hypotheses:

$\begin{array}{cl}
\mathcal{H}_0: & $there is no dependence between education and country of residence$\\
&$in the analysed population,$\\
\mathcal{H}_1: & $there is a dependence between education and country of residence$\\
&$in the analysed population.$
\end{array}$
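
As a small worked step, the number of degrees of freedom for this example follows directly from the formula $df=(r-1)(c-1)$, with $r=4$ categories of country and $c=3$ categories of education:

\begin{displaymath}
df=(4-1)(3-1)=6.
\end{displaymath}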

Cochran's condition is satisfied.
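
A check of this kind can be sketched as follows. The sketch assumes the commonly used formulation of Cochran's condition, namely that no expected frequency is less than 1 and no more than 20% of the expected frequencies are less than 5; the thresholds are therefore exposed as parameters. The function names and the table are hypothetical and are not the data from the example file.

```python
def expected_counts(observed):
    """Expected counts under independence: row total * column total / n."""
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    n = sum(row_tot)
    return [[rt * ct / n for ct in col_tot] for rt in row_tot]


def cochran_condition(observed, min_expected=1.0, small=5.0, max_share=0.2):
    """Return True if the (assumed) Cochran's condition holds for the table."""
    cells = [e for row in expected_counts(observed) for e in row]
    # Share of cells whose expected count falls below the 'small' threshold
    share_small = sum(e < small for e in cells) / len(cells)
    return min(cells) >= min_expected and share_small <= max_share


# Hypothetical 4 x 3 table, made up for illustration only
table = [
    [20, 30, 10],
    [15, 25, 20],
    [30, 20, 10],
    [10, 30, 40],
]
print("Cochran's condition satisfied:", cochran_condition(table))
```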

The p-value = 0.0006. So, at the significance level $\alpha=0.05$, we can conclude that there is a dependence between education and country of residence in the analysed population. More precise information about the detected dependence can be obtained through multiple comparisons: select the option Fisher, Yates and others…, then Multiple column comparisons (RxC), and one of the corrections, e.g. Benjamini-Hochberg.

A closer look reveals that only the second country differs from the other countries in educational attainment in a statistically significant way.
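
To illustrate how a correction such as Benjamini-Hochberg works on a set of pairwise comparisons, here is a minimal sketch. It assumes that the raw p-values of the pairwise comparisons are already available and returns the adjusted p-values, which are then compared with $\alpha$. The function name and the raw p-values below are hypothetical; they are not the results from the example file, and the program's own procedure may differ in detail.

```python
from itertools import combinations


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for the 6 pairwise comparisons of 4 countries
pairs = list(combinations(["Country 1", "Country 2", "Country 3", "Country 4"], 2))
raw_p = [0.004, 0.60, 0.35, 0.001, 0.012, 0.80]

alpha = 0.05
for (a, b), p_adj in zip(pairs, benjamini_hochberg(raw_p)):
    verdict = "significant" if p_adj <= alpha else "not significant"
    print(f"{a} vs {b}: adjusted p = {p_adj:.4f} ({verdict})")
```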