The sampling window is opened via Data
→Sampling simulation …
Sampling simulation is a way of generating multinomial distribution data. It involves assigning a given number of cases to categories, in a user-specified manner. The generated data is returned in a new datasheet. The generation can be repeated, so that the datasheet will have many generated columns depending on the number of repetitions of this operation set in the sampling window.
Options:
Probability should be defined as a value between 0 and 1, with the sum of the probabilities given for all categories being 1.
Relative Risk defines risk relative to other categories and is a value greater than 1 for the increased risk category and a fraction less than 1 for the reduced risk category.
Setting the probability or relative risk values to the same level for all categories, is the same as the distribution for H0.
EXAMPLE (simulations.pqs file)
As a basis for the simulation, the population of Wielkopolska in 2013 was used, which according to the CSO was = 3467016 people. The voivodeship is divided into 315 municipalities. The municipalities differ significantly in the number of inhabitants. The most populous municipality (the capital of the province) 548028 inhabitants, the least populous 1454 inhabitants, the median and quartiles are respectively: 6298 (4462; 9621) inhabitants. Assuming that in 2013 there were 6934 residents of the voivodeship with disease X, it is necessary to simulate the distribution of the sick people in such a way as to obtain:
It should be noted that an uniform random distribution of 6934 patients does not imply a similar number of patients in each municipality. It is known that municipalities with a larger number of those at risk should have a corresponding larger number of patients than those with a smaller population. It is therefore of interest to distribute the patients in such a way that the ratio of patients to population is relatively constant. This implies accepting the null hypothesis H0 and the population variable. The number of individual municipalities was recorded in a column named: population.
The data drawn based on these assumptions are presented in the first column of the new datasheet. To be able to observe the random distribution of illness rates across municipalities, copy the resulting data into the „Random” datasheet of column „S1”. The formula in column 7 will then be recalculated (you can view and change the formula by setting Codes/Labels/Format in the column properties). On the map, the result is shown using the Map manager from the Spatial Analysis menu. The proportion of patients to population in each municipality is then plotted. An example of the result is shown on the map below.
In the „Clusters” datasheet, as in the previous task, the frequency for the study population is given. This time the higher frequency is expected in some municipalities (indicated on the map), so in addition, in the next column of the datasheet, the value of the relative risk for individual municipalities is presented, setting it to 4, for municipalities with increased risk and 1 for the remaining municipalities.
Appropriate sampling requires that you select the alternative hypothesis HA (by selecting the relative risk column) and the population variable (by indicating the population size column of the municipalities). The data drawn under these assumptions are presented in the first column of the new datasheet.
To be able to observe the distribution of the coefficient, assuming greater risk in the indicated municipalities, copy the result obtained to the datasheet „Clusters” column „S1”. The formula in column 7 will then be recalculated. On the map, the obtained result is presented using the Map manager from the Spatial Analysis menu. The proportion of ill people to population in each municipality is then plotted. An example of the result is shown on the map below.