Lagrange Multipliers
Introduction
Fundamentally, Lagrange Multiplier (LM) tests help you determine whether a spatial lag model (SLM), a spatial error model (SEM), or neither is the most appropriate treatment for your analysis.
Anselin (1988b) proposed Lagrange Multiplier diagnostic tests for OLS models, where the focus is on detecting model misspecification due to spatial dependence in one of two forms. The first tests whether there is an omitted spatially lagged dependent variable (\(LM_{lag}\)), while the second tests for spatial error autocorrelation (\(LM_{error}\)).
Anselin et al. (1996), building on the work of Bera and Yoon (1993), extended this with robust tests that allow for the possible presence of the other form of spatial dependence. Specifically, these are a test for an omitted spatially lagged dependent variable in the possible presence of spatial error autocorrelation (\(RLM_{lag}\)), and a test for spatial error autocorrelation in the possible presence of a spatially lagged dependent variable (\(RLM_{error}\)). In addition, they derive a portmanteau test (SARMA) that incorporates both aspects by combining the test for spatial error autocorrelation with the test for an omitted spatially lagged dependent variable in the presence of spatial error autocorrelation. The tests are all based on the results of ordinary least-squares (OLS) estimation.
Using the notation explained fully in the next section, the LM test for an omitted spatially lagged dependent variable tests the null hypothesis that \(\rho = 0\), while the LM test for error autocorrelation tests the null hypothesis that \(\lambda = 0\). LM tests can then be used to determine which model is the more appropriate (see, for example, Elhorst, 2010). If, for example, we cannot reject \(\lambda = 0\) but do reject \(\rho = 0\), then the appropriate model may be the spatial lag model (see next section):
\(y = \rho Wy+ X\beta + \epsilon\)
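For reference, the two basic statistics are usually written in terms of quantities from the OLS fit. The following is a sketch of the commonly cited forms (following Anselin, 1988b, and Anselin et al., 1996). The symbols are assumptions introduced here and may differ from the notation of the next section: \(e\) is the vector of OLS residuals, \(\hat{\beta}\) the OLS coefficient estimates, \(n\) the number of observations, \(\hat{\sigma}^2 = e'e/n\), \(M = I - X(X'X)^{-1}X'\), and \(T = \mathrm{tr}[(W' + W)W]\).

\[
LM_{error} = \frac{\left(e'We/\hat{\sigma}^2\right)^2}{T},
\qquad
LM_{lag} = \frac{\left(e'Wy/\hat{\sigma}^2\right)^2}{(WX\hat{\beta})'M(WX\hat{\beta})/\hat{\sigma}^2 + T}
\]

Both statistics need only the residuals and coefficients from the OLS regression together with the spatial weights matrix \(W\), which is why neither spatial model has to be estimated before the tests are run.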
The LM statistics follow a chi-squared distribution with degrees of freedom equal to the number of coefficients being tested. Hence the LM statistics \(LM_{lag}\), \(LM_{error}\), \(RLM_{lag}\) and \(RLM_{error}\) are asymptotically chi-squared distributed with one degree of freedom. This means that for, say, the \(LM_{lag}\) test, we reject the null hypothesis that no spatially lagged dependent variable is required, at the 5 per cent level, if the test statistic exceeds 3.84. The SARMA statistic is asymptotically chi-squared distributed with 2 degrees of freedom.
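To make the 3.84 threshold concrete, the sketch below converts an LM statistic into a p-value using only the Python standard library. It relies on two closed forms: a chi-squared variable with one degree of freedom is the square of a standard normal, and a chi-squared variable with two degrees of freedom is exponential with mean 2. The function names `chi2_sf` and `rejects_null` are hypothetical, chosen for illustration only; they are not part of the Lagrange Multipliers tool.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability (p-value) of a chi-squared statistic.

    Illustrative sketch: only df = 1 and df = 2 are handled, which
    covers the four LM/RLM tests (df = 1) and SARMA (df = 2).
    """
    if stat < 0:
        raise ValueError("test statistic must be non-negative")
    if df == 1:
        # chi-squared(1) is the square of a standard normal variable
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # chi-squared(2) is exponential with mean 2
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def rejects_null(stat, df, alpha=0.05):
    """True if the null hypothesis is rejected at significance level alpha."""
    return chi2_sf(stat, df) < alpha


# A statistic just above 3.84 is significant at the 5 per cent level
# with one degree of freedom, but not with two degrees of freedom.
print(rejects_null(4.0, df=1))
print(rejects_null(4.0, df=2))
```

For two degrees of freedom the 5 per cent critical value works out to \(-2\ln(0.05) \approx 5.99\), so a SARMA statistic has to be larger than any of the single-coefficient statistics to reach the same significance level.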
The decision tree for choosing between the SEM and the SLM using the Lagrange Multiplier tests is shown below (taken from here).
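The logic of such a decision tree can also be sketched in code. The version below is an assumption based on the commonly cited rule for these tests and may differ in detail from the figure: look at \(LM_{error}\) and \(LM_{lag}\) first, and consult the robust versions only when both are significant. The function name `choose_spatial_model` and its return labels are hypothetical.

```python
def choose_spatial_model(p_lm_error, p_lm_lag, p_rlm_error, p_rlm_lag,
                         alpha=0.05):
    """Suggest a model from the p-values of the four LM tests.

    Returns "OLS", "SEM", "SLM" or "undetermined". This is a sketch of
    a commonly used decision rule, not the tool's own logic.
    """
    error_sig = p_lm_error < alpha
    lag_sig = p_lm_lag < alpha

    # Round 1: the basic LM tests
    if not error_sig and not lag_sig:
        return "OLS"   # no evidence of spatial dependence
    if error_sig and not lag_sig:
        return "SEM"
    if lag_sig and not error_sig:
        return "SLM"

    # Round 2: both basic tests are significant, so use the robust tests
    robust_error_sig = p_rlm_error < alpha
    robust_lag_sig = p_rlm_lag < alpha
    if robust_error_sig and not robust_lag_sig:
        return "SEM"
    if robust_lag_sig and not robust_error_sig:
        return "SLM"
    if robust_error_sig and robust_lag_sig:
        # Assumed convention: prefer the more significant robust test
        if p_rlm_error < p_rlm_lag:
            return "SEM"
        if p_rlm_lag < p_rlm_error:
            return "SLM"
    return "undetermined"


# Hypothetical p-values: both basic tests significant, only the robust
# error test significant.
print(choose_spatial_model(0.001, 0.001, 0.001, 0.40))
```

When neither robust test is significant, or the two are equally significant, the sketch returns "undetermined" rather than forcing a choice; in that situation the model specification itself usually deserves a second look.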
Inputs
In order to illustrate the Lagrange Multipliers in use, we will look at the relationship between Type 2 Diabetes and socio-economic advantage/disadvantage in the Greater Melbourne area. To do this:
- Select Melbourne GCCSA as your area
- Select the following datasets:
  - SA2 SEIFA 2011 – The Index of Relative Socio-Economic Advantage and Disadvantage (IRSAD), with all variables
  - SA2 Chronic Disease – Modelled Estimate, selecting only SA2 Name, SA2 Code, and Diabetes – Rate per 100
- Merge the datasets together, naming it Disadvantage and Diabetes
- Spatialise the dataset, naming it Spatialised Disadvantage and Diabetes
- Generate a Contiguous Spatial Weights Matrix, naming it Contig SWM Disadvantage and Diabetes, with 1st order Queen contiguity.
Once you have completed these steps, open the Lagrange Multipliers tool (Tools → Spatial Statistics → Lagrange Multipliers) and enter the parameters as shown below (each is explained below the image).
- Dataset input: the spatialised dataset that contains the variable(s) to be tested. Here we choose Spatialised Disadvantage and Diabetes.
- Spatial Weights Matrix: the spatial weights matrix to be used, probably derived from one of the methods above. We select Contig SWM Disadvantage and Diabetes in this instance.
- Regression Dependent Variable: the dependent variable of the regression equation. Here we select Diabetes – Rate per 100.
- Variable(s): the independent variable(s) of the regression equation. Here we choose Score.
Once you have entered your parameters, click Add and Run to execute the tool.
Outputs
Once the tool has run, click the Display button. This will open a text editor with your LM results, which should look something like the image below. The results are explained underneath it.
The results show:
- \(LM_{error}\) statistic, degrees of freedom and p-value. This is the test for whether an error model would be appropriate. In this instance it is highly significant (top of green box).
- \(LM_{lag}\) statistic, degrees of freedom and p-value. This is the test for whether a lag model would be appropriate. In this instance it is also highly significant (bottom of green box). This means that our first round of tests does not allow us to choose between an SEM and an SLM, so we move on to the second round of robust tests (red box).
- \(RLM_{error}\) statistic, degrees of freedom and p-value. This is the robust, second-round test for whether an error model would be appropriate. In this instance it is highly significant (top of red box).
- \(RLM_{lag}\) statistic, degrees of freedom and p-value. This is the robust, second-round test for whether a lag model would be appropriate. In this instance it is non-significant (bottom of red box). Because only the robust error test is significant, an SEM is appropriate for our treatment of the data.
- SARMA statistic, degrees of freedom and p-value. This is the portmanteau test described in the introduction, which has 2 degrees of freedom.