It is an analysis of the degree of intensity of a given feature in spatial objects.
We use two pieces of information for the construction of a coefficient which will allow to check if the neighboring objects form clusters with similar values of the variable:
Note
The objects neighborhood is defined by a spatial weight matrix. In Moran's analysis window we can choose any matrix generated previously by using menu Spatial analysis
→ Tools
→ Spatial weights matrix
or indicate the neighbor matrix according to contiguity – Queen, row standardized, that is proposed by the program.
Note
It is not recommended to conduct Moran's analysis for objects without neighborhood (objects described in the weight matrix only with the 0 value). Such objects can be excluded from the analysis by deactivating them or an analysis can be made with the use of a different manner of defining neighborhood (a different weight matrix).
Moran's I coefficient – introduced by Moran in 1948 1).
In order to check if the selected objects are characterized by similar values of the variable one can use the multiplying rule which says that multiplying 2 positive numbers gives a positive result and multiplying 2 different numbers (1 positive and 1 negative) gives a negative result. With the use of this rule we calculate . Unfortunately, as the results of that rule are only obtained when there are both positive and negative values, the simple rule must be modified so as to ensure the presence of different signs. The values of the variable will, then, be replaced in the earlier formula with the differences of the values of the variable and of its mean value. In this way the objects with values smaller than the mean will be negative and those with values greater than the mean will be positive: . Obviously, the summation should concern neighboring objects, which means that, at this point, information from weights matrices must be used:
In this way non-neighboring objects obtain the weight value 0, for which reason the values of those objects are not added. Further operations which change the formula obtained in this manner are made with the view to making the obtained coefficient independent from the number of analyzed objects and to standardizing it so that its values are limited to the interval . As a result, Moran's autocorrelation coefficient is expressed with the formula:
where:
– the number of spatial objects (the number of points or polygons),
, – are the values of the variable for the compared objects,
– it is the mean value of the variable for all objects,
– elements of the spatial weights matrix (weights matrix row standardized),
,
– variance
Moran's linear autocorrelation coefficient studies the strength of the linear relationship between the standardized variable () and the spatial lag of the variable (). Spatial lag is the weighted mean from the standardized values of neighboring objects
A graphic presentation of spatial autocorrelation is Moran's scatter plot. Points in the first quarter (HH) and in the third quarter (LL) are objects surrounded by similar neighbors: HH (high-high) – objects with high values, surrounded by objects with high values; LL (low-low) – objects with low values, surrounded by objects with low values. Points in the second quarter (LH) and the fourth quarter (HL) are objects surrounded by neighbors not similar to them. LH (low-high) – objects with low values, surrounded by objects with high values; HL (high-low) – objects with high values, surrounded by objects with low values.
The belonging to and distribution of points in the four quarters of Moran's diagram indicates the type of autocorrelation. If points are distributed mainly in the second quarter (LH) and fourth (HL) – it is a sign of negative correlation, if they belong mainly to the first quarter (HH) and third (LL) – it is a sign of positive correlation. If the points are distributed evenly in all four quarters then spatial autocorrelation does not exist.
On the Moran's diagram there is a regression line, the direction of which also allows to interpret Moran's coefficient :
The square of Moran's coefficient informs about the degree (it is a percentage) to which the value of the variable in the object is explained by the value of that variable in neighboring objects.
Note
When the values of a studied feature are characterized by a great variability of variance then it is desirable to stabilize that variability. The basic information about smoothing variables have been described in the Chapter \ref{wygladz_przestrz} SPATIAL SMOOTHING
Significance of Moran's autocorrelation coefficient
A test for checking the significance of Moran's autocorrelation coefficient serves the purpose of verifying the hypothesis about a lack of correlation between and spatial lag .
Hypotheses:
The test statistic has the form presented below:
where:
– the expected value,
– variance.
Depending on the assumption concerning the distribution of the population from which the sample has been taken, the manner of selecting variance is chosen (Cliff and Ord (1981)2), and Goodchild (1986)3)). If it is normal distribution, then:
where:
,
.
If it is random distribution, then:
where:
,
.
Statistics asymptotically (for a large sample size) has the normal distribution.
The p-value, designated on the basis of the test statistic, is compared with the significance level :
The window with settings for Moran's analysis
is accessed via the menu Spacial statistics
→Tools
→Moran's global I statistic
.
EXAMPLE (catalog: leukemia
, file: leukemia.pqs
)
The analysis will concern the data gathered and analyzed by L.A. Waller and others in 19924) and 19945), described on 281 objects in 20046).
leukemia
contains information about the location of 281 polygons (census tracts) in the northern part of the state of New York. The map is prepared in the set of flat rectangular coordinate system UTM 18N and is based on the data of the file BNA (Boundary File) available on the server CIESIN ftp://ftp.ciesin.columbia.eduleukemia
:CASES
– the number of cases of leukemia in the years 1978-1982, ascribed to particular objects (census tracts). The value should be an integral number, however, in agreement with Waller's (1994) description, some cases which could not be objectively ascribed to a particular region have been divided proportionately. Hence, the numerousnesses of the cases ascribed to the 281 objects are not integral numbers.POP
– population size in particular objects.prev
– the frequency coefficient of leukemia per 100000 people, for each object in one year: prev=(CASES/POP)*100000/5Epidemiologically interesting are the regions in which the prevalence of leukemia is higher, as their grouping could indicate the existence within their boundaries of environmental teratogens causing an increased frequency of occurrence of leukemia.
We start from presenting the geographic distribution of the frequency coefficient (prev) on the map. For that purpose we draw a map in the Map Manager and edit the layer , choosing Graduated colors
:
We have at our disposal several ways of coloring a map – we choose coloring in accordance with the values of the variable prev
, dividing it into quartiles:
Dark colors on the map present the places with a higher frequency coefficient of leukemia, whereas light places signify a lower frequency coefficient. In order to learn if their geographic distribution is random or if they forms clusters, we will calculate Moran's coefficient. Before calculating that coefficient we should determine the manner of defining neighborhood of regions and it is advisable to create an appropriate weights matrix. In Moran's analysis window we can choose any matrix generated previously by using menu Spatial analysis
→ Tools
→ Spatial weights matrix
or indicate the neighbor matrix according to contiguity – Queen, row standardized, that is proposed by the program.
Having generated the weights matrix we select the file leukemia and start Moran's analysis by selecting the menu Spatial analysis
→ Spatial statistics
→ Global Moran's I statistic
. In the analysis window we select the variable Prev
and the neighbor matrix Queen
, and select the option Add graph
.
Moran's correlation coefficient obtained in the analysis is small and has the value :
When we test the significance of Moran's coefficient we study the randomness of the distribution of the frequency coefficient of leukemia in the studied region. We check if similar shades on the map are located close to one another or not. In other words: we check if the odds of having leukemia in the studied population depends on geographic location or not. The value calculated with the assumption of randomness, as in the case of the assumption of normality, is greater than the standard assumed significance level 0.05, which means that there is no evidence for autocorrelation. Thus, we assume that the distribution of the variable prev
is a random distribution. Moran's diagram confirms that assumption:
The existence of positive autocorrelation, in which we are the most interested, would result in the distribution of the points of the Moran's diagram in quarters I and III. Here, however, we see that the points are as frequent in quarters I and III as in II and IV.