Getis-Ord Local G

Introduction

The \(G_{i}\) statistics are useful for identifying “hot and/or cold spots” and for checking for heterogeneity in a dataset. The \(G_{i}\) statistic (Getis and Ord, 1992) is the ratio of the sum of values in neighbouring locations, defined by a given distance, to the sum of values over all observations (excluding \(i\) itself).
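
The ratio just described can be sketched in a few lines of Python. This is a minimal illustration, assuming a binary weights matrix with zeros on the diagonal and strictly positive data values. The function name `raw_gi` and the five-area example are hypothetical and are not part of the tool.

```python
# Minimal sketch: the raw (unstandardised) Getis-Ord G_i ratio for each location.
# Assumes a dense binary weights matrix w (list of lists) with w[i][i] == 0
# and strictly positive values x.


def raw_gi(x, w):
    """Return G_i = sum_{j != i} w_ij x_j / sum_{j != i} x_j for every i."""
    n = len(x)
    total = sum(x)
    result = []
    for i in range(n):
        local_sum = sum(w[i][j] * x[j] for j in range(n) if j != i)
        result.append(local_sum / (total - x[i]))
    return result


# Hypothetical example: five areas in a row, each neighbouring the next.
x = [1.0, 2.0, 8.0, 9.0, 2.0]
w = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
print(raw_gi(x, w))
```

Area 2 (value 8) sits between the two largest neighbouring sums, so its ratio is the highest of the five.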

Like Local Moran’s I, \(G_{i}\) statistics can detect local ‘pockets’ of dependence that may not show up when using global statistics. For example, they can isolate micro-concentrations in the data that are otherwise swamped by the data’s overall randomness. The form of the \(X\) matrix is additive: either \((X_{j})\), or \((X_{i} + X_{j})\) for the self-included measure. This contrasts with Moran’s I, where the matrix takes the form \((X_{i} - \overline{X})(X_{j} - \overline{X})\), and with Geary’s C, which uses \((X_{i} - X_{j})^2\). Because \(G_{i}\) measures are measures of concentration, they differ from the Moran statistic, which examines the correlation or covariance of values in neighbouring regions relative to the data’s overall variance (while Geary’s C calculates differences). \(G_{i}\) statistics evaluate association by examining ‘additive qualities’ (Getis and Ord, 1996, p. 262): they compare local weighted averages to global averages to isolate ‘coldspots’ and ‘hotspots’.
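
The three cross-product forms can be compared directly for a single location. The sketch below computes only the unscaled weighted sums. It omits the normalisation that each full statistic applies, and it uses hypothetical data and neighbours.

```python
# Sketch contrasting the cross-product terms behind G_i, local Moran's I and
# local Geary's C for one location i. Only the weighted sums are computed;
# each statistic's own scaling is omitted. Data and weights are hypothetical.


def local_cross_products(x, w, i):
    n = len(x)
    mean = sum(x) / n
    neighbours = [j for j in range(n) if j != i and w[i][j] != 0]
    # Getis-Ord: additive, uses the neighbouring values X_j themselves.
    g_term = sum(w[i][j] * x[j] for j in neighbours)
    # Moran: product of deviations from the overall mean (covariance-like).
    moran_term = sum(w[i][j] * (x[i] - mean) * (x[j] - mean) for j in neighbours)
    # Geary: squared differences between paired values.
    geary_term = sum(w[i][j] * (x[i] - x[j]) ** 2 for j in neighbours)
    return g_term, moran_term, geary_term


x = [1.0, 2.0, 8.0, 9.0, 2.0]
w = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
print(local_cross_products(x, w, 2))
```

The \(G_{i}\) term depends only on how large the neighbouring values are. The Moran term depends on whether deviations share a sign, and the Geary term depends on how different the paired values are.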

The interpretation of \(G_{i}\) also differs somewhat from that of other measures of spatial association. When \(G_{i}\) is larger than its expected value, the values in the neighbourhood of \(i\) can be seen as characterised by positive spatial autocorrelation, with high values clustered together (a hot spot). If \(G_{i}\) is smaller than its expected value, the neighbourhood is still characterised by positive spatial autocorrelation, but with low values clustered together (a cold spot).

A special feature of this statistic is that it equals 0 when all \(X\) values are the same. Also, although the weighted sum of \(X\) values might be expected to rise with the number of neighbours (or weighted regions), a region with more neighbours does not receive a greater \(G_{i}\), all else being equal. \(G_{i}\) moves away from its expected value only when the observed values in the vicinity of region \(i\) differ significantly from the mean of the variable (Getis and Ord, 1992).
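
The neighbour-count point can be checked on the numerator of the standardised statistic, \(\sum_{j} w_{ij}X_{j} - W_{i}\mu\) with \(j \neq i\). In this numerator, the \(W_{i}\mu\) term offsets any extra neighbours. The minimal Python sketch below assumes binary weights and a hypothetical six-area neighbour list. `gi_numerator` is an illustrative name and is not part of any tool.

```python
# Sketch: numerator of the standardised G_i, sum_{j != i} w_ij X_j - W_i * mu,
# where mu is the mean of the n - 1 values other than X_i.
# The six-area neighbour list below is hypothetical.


def gi_numerator(x, w, i):
    others = [j for j in range(len(x)) if j != i]
    mu = sum(x[j] for j in others) / len(others)
    w_i = sum(w[i][j] for j in others)
    return sum(w[i][j] * x[j] for j in others) - w_i * mu


# Symmetric binary weights: area 1 has four neighbours, area 5 only one.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (1, 4), (3, 4), (4, 5)]
w = [[0] * 6 for _ in range(6)]
for a, b in edges:
    w[a][b] = w[b][a] = 1

flat = [5.0] * 6
print([gi_numerator(flat, w, i) for i in range(6)])  # all zero: identical values

uneven = [5.0, 5.0, 5.0, 20.0, 20.0, 5.0]
print([gi_numerator(uneven, w, i) for i in range(6)])
```

When all values are identical, every numerator is zero, whatever the neighbour count. When the values are uneven, an area departs from zero because of the values in its neighbourhood, not because of how many neighbours it has.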

Ord and Getis (1995) suggested a slightly different form of \(G_{i}\). The statistic \(G_{i}(d)\), originally proposed for elements of a symmetric binary weights matrix, was extended to variables that do not have a natural origin and to non-binary standardised weight matrices (Ord and Getis, 1995: 289). For each region \(i\), this statistic is:

\(G_{i}(d) = \dfrac{\sum_{j} w_{ij}(d)X_{j} - W_{i}\mu}{\sigma\left\{\left[(n-1)S_{1i} - W_{i}^{2}\right]/(n-2)\right\}^{1/2}}, \quad j \neq i\)

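where \(w_{ij}(d)\) (written \(w_{ij}\) below) is the spatial weights matrix element for threshold distance \(d\), and \(X_{j}\) is the value of the variable at location \(j\). The remaining terms are \(W_{i} = \sum\nolimits_{j}w_{ij}\) and \(S_{1i} = \sum\nolimits_{j}w_{ij}^2\), with all sums excluding \(j = i\). \(\mu\) and \(\sigma\) are the mean and standard deviation of the \(n-1\) values \(X_{j}\), \(j \neq i\). \(d\) is the threshold distance from \(i\). The related statistic \(G_{i}^*(d)\) includes the case \(j = i\): its sums, mean and standard deviation run over all \(n\) observations.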
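
The standardised statistic can be sketched in Python for both variants. This is a minimal version, assuming dense weights stored as lists of lists. For \(G_{i}\) it uses the denominator \(\sigma\{[(n-1)S_{1i} - W_{i}^2]/(n-2)\}^{1/2}\). For \(G_{i}^*\) it uses the analogous form over all \(n\) values, \(\sigma\{[nS_{1i}^* - W_{i}^{*2}]/(n-1)\}^{1/2}\). In both cases the standard deviation divides by the number of values used. `getis_ord_z` is a hypothetical name, and the tool's own implementation may differ in detail.

```python
import math


def getis_ord_z(x, w, star=False):
    """Standardised G_i(d) (star=False) or G_i*(d) (star=True) for each location.

    x: list of n values; w: dense n x n weights as a list of lists.
    G_i ignores w[i][i] and takes the mean and standard deviation over the
    n - 1 other values; G_i* uses w[i][i] as given and all n values.
    The standard deviation divides by the number of values used.
    Raises ZeroDivisionError when all values used are equal, or when a
    location's binary weights cover every value used.
    """
    n = len(x)
    z = []
    for i in range(n):
        idx = list(range(n)) if star else [j for j in range(n) if j != i]
        m = len(idx)  # n for G_i*, n - 1 for G_i
        mean = sum(x[j] for j in idx) / m
        sd = math.sqrt(sum((x[j] - mean) ** 2 for j in idx) / m)
        w_i = sum(w[i][j] for j in idx)
        s_1i = sum(w[i][j] ** 2 for j in idx)
        num = sum(w[i][j] * x[j] for j in idx) - w_i * mean
        den = sd * math.sqrt((m * s_1i - w_i ** 2) / (m - 1))
        z.append(num / den)
    return z


# Hypothetical data: five areas in a row; w_star adds the self-weight w_ii = 1.
x = [1.0, 2.0, 8.0, 9.0, 2.0]
w = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
w_star = [[1 if i == j else w[i][j] for j in range(5)] for i in range(5)]
print(getis_ord_z(x, w))
print(getis_ord_z(x, w_star, star=True))
```

Setting `star=True` and supplying a matrix with a non-zero diagonal is the code-level counterpart of choosing the self-included \(G_{i}^*\) variant.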

The Getis-Ord \(G_{i}\) are statistics for local spatial association, but they are not LISAs under the criteria established by Anselin (1995). Their individual components are not related to the global statistic of spatial association (\(G\)): the sum of the local statistics is not proportional to the global statistic. Anselin notes that “this requirement is not needed for the identification of local spatial clusters, but it is important for the second interpretation of a LISA, as a diagnostic of local instability in measures of global spatial association (for example in the presence of significant global association)” (Anselin, 1995: 102).

The tool first reports \(G_{i}\) for each area \(i\) as a standardised z-value. Getis and Ord (1992) argued that inference, as with global measures, is based on calculating a standardised value and comparing it against a null distribution that is assumed to be normal. However, a normal null may not be an appropriate assumption, because local \(G_{i}\) statistics are by design not independent of each other (Ord and Getis, 1995). By definition, one region may appear in the weighting vectors of a number of different regions.

This raises a general issue for local measures of spatial autocorrelation. Inference is complicated because the statistics are correlated whenever their weights contain the same elements, which they do for neighbouring locations. This is a problem of multiple statistical comparison and reflects “the built-in correlatedness of measures for adjoining locations” (Anselin, 1995: 112). A more stringent test is therefore required to assert spatial non-randomness, that is, to assert the presence of spatial autocorrelation at the local level. Anselin (1995: 96) notes: “This means that when the overall significance associated with the multiple comparisons (correlated tests) is set to \(\alpha\), and there are \(m\) comparisons, then the individual significance \(\alpha_{i}\) should be set to \(\alpha/m\) (Bonferroni) or \(1 - (1 - \alpha)^{1/m}\)”.
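
The two per-test thresholds in Anselin's quote can be computed directly, together with a normal p-value for a z score. The second threshold is commonly known as the Šidák correction. This minimal sketch assumes a two-sided test, and the count of 1000 areas is hypothetical.

```python
import math


def normal_two_sided_p(z):
    """Two-sided p-value for a z score under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni_alpha(alpha, m):
    """Per-test significance level alpha / m."""
    return alpha / m


def sidak_alpha(alpha, m):
    """Per-test significance level 1 - (1 - alpha) ** (1 / m)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


alpha, m = 0.05, 1000  # hypothetical: 1000 areas tested at an overall 5% level
print(bonferroni_alpha(alpha, m))
print(sidak_alpha(alpha, m))
print(normal_two_sided_p(1.96))  # close to 0.05
```

Both thresholds are far smaller than 0.05, so an area must have a much larger \(|z|\) to be flagged once the number of comparisons is taken into account.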

Inputs

To compute the Getis-Ord Local G statistics, we will look at socio-economic data in Adelaide to examine the extent of spatial autocorrelation.

To do this:

  • Select Adelaide GCCSA as your area
  • Select SA1 SEIFA 2011 – The Index of Relative Socio-Economic Disadvantage (IRSD) as your dataset, selecting all variables
  • Spatialise the dataset, naming it something like SPATIALISED SEIFA IRSD Adelaide
  • Generate a Contiguous Spatial Weights Matrix for the spatialised dataset, using first-order Queen contiguity. Name it something like Contig SWM Adelaide SA1s
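
Queen contiguity treats two areas as neighbours when they share a boundary segment or just a single corner point. Rook contiguity, by contrast, requires a shared segment. Real SA1 boundaries are irregular polygons, so the sketch below uses a hypothetical regular grid of cells purely to make the rule visible. It is not how the tool builds its matrix, and `queen_weights_grid` is an illustrative name.

```python
def queen_weights_grid(rows, cols):
    """Binary first-order queen contiguity for a rows x cols grid of cells.

    Cells are numbered row by row; two cells are neighbours if they share an
    edge or a corner. Returns a dense n x n list of lists with a zero diagonal.
    """
    n = rows * cols
    w = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not its own neighbour
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        w[i][rr * cols + cc] = 1
    return w


w = queen_weights_grid(3, 3)
print([sum(row) for row in w])  # number of neighbours per cell
```

On a 3 × 3 grid, the corner cells have 3 neighbours, the edge cells have 5 and the centre cell has 8. "First order" means only immediately adjacent cells are counted.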

Once you have done this, open the Getis-Ord Local G tool (Tools → Spatial Statistics → Getis-Ord Local G) and enter the parameters as they appear in the image below. These parameters are also explained underneath the image.

[Image: Getis-Ord Local G tool parameters]

  • Dataset Input: the dataset that contains the variable(s) to be tested. Here we use the dataset named SPATIALISED SEIFA IRSD Adelaide
  • Spatial Weights Matrix: the spatial weights matrix to be used (described here). In this instance we use the one named Contig SWM Adelaide SA1s
  • Key Column: the column holding the unique code for each area. In this case, we use SA1 Code
  • Variable: the variable(s) to be tested. Here we use Score
  • Matrix type: indicates whether the spatial weights matrix should include self-weights \(w_{ii} > 0\). Including the self-weight corresponds to \(G_{i}^*\); excluding it corresponds to \(G_{i}\)

Once you have entered the parameters, click Add and Run to execute the tool.

Outputs

Your output will be a dataset that can be mapped using several of the variables produced by the analysis. These are explained below.

  • Z value: This is the standardised \(G_{i}\) (or \(G_{i}^*\)) statistic for each area, calculated for the variable you chose to include in the analysis. In this instance, the variable is the IRSD Score. The Z value measures, in standard deviations, how far the weighted sum of values around the area lies from its expected value under the null hypothesis of no spatial association. Large positive values indicate clustering of high values, and large negative values indicate clustering of low values.
  • p-value_(norm): The statistical significance of the z score using normal significance testing
  • p-value_(bon): The statistical significance of the z score after a Bonferroni adjustment for multiple comparisons
  • map_group_(norm): The map group that each area belongs to when using normal significance testing: 1 = Cluster of High Values, 2 = Cluster of Low Values, 0 = Non Significant
  • map_group_(bon): The map group that each area belongs to when using Bonferroni significance testing: 1 = Cluster of High Values, 2 = Cluster of Low Values, 0 = Non Significant
  • map_group_name(norm): The name of the map group that each area belongs to when using normal significance testing: High = Cluster of High Values, Low = Cluster of Low Values, or Non Significant
  • map_group_name(bon): The name of the map group that each area belongs to when using Bonferroni significance testing: High = Cluster of High Values, Low = Cluster of Low Values, or Non Significant
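
One plausible way to derive the map group columns from the z values is sketched below. The sketch makes two assumptions. First, it uses a two-sided normal test at \(\alpha = 0.05\). Second, it takes the Bonferroni-adjusted p-value to be the normal p-value multiplied by the number of areas, capped at 1. Both are assumptions for illustration, not the tool's documented rules, and `map_group` and `classify` are hypothetical names.

```python
import math


def two_sided_p(z):
    """Assumed: two-sided p-value under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def map_group(z, p, alpha=0.05):
    """Return (code, name): 1/High, 2/Low or 0/Non Significant."""
    if p < alpha and z > 0:
        return 1, "High"
    if p < alpha and z < 0:
        return 2, "Low"
    return 0, "Non Significant"


def classify(z_values, alpha=0.05):
    m = len(z_values)
    rows = []
    for z in z_values:
        p_norm = two_sided_p(z)
        p_bon = min(1.0, p_norm * m)  # assumed Bonferroni adjustment
        rows.append({
            "z": z,
            "p_norm": p_norm,
            "p_bon": p_bon,
            "group_norm": map_group(z, p_norm, alpha),
            "group_bon": map_group(z, p_bon, alpha),
        })
    return rows


for row in classify([2.2, -3.9, 0.4]):  # hypothetical z values for three areas
    print(row)
```

In this example, the area with \(z = 2.2\) is a High cluster under normal testing but becomes Non Significant after the Bonferroni adjustment. This illustrates why the (bon) columns usually flag fewer areas.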

You can create a choropleth map of any of these variables. For the image below, we have chosen map_group_name(norm). Red indicates SA1s with high index scores (low disadvantage) clustered together, blue indicates clusters of low index scores (high disadvantage), and green indicates areas that are not significant.

[Image: choropleth of map_group_name(norm) for Adelaide SA1s]