Use Case: Mapping, Charting and Statistical Analysis – Polling Booth Data

Introduction

For this use case, we will use polling booth data from the 2010 Federal Election to do some exploratory mapping and statistical analysis.

Choosing Area and Loading Data

First, choose your state of interest – we will be focussing on South Australia for the following exercise.

[Click to Enlarge]

After you have selected your area, you will need to load these data sets from 2010:

[Click to Enlarge]

Mapping and Charting

To get a broad idea of how Adelaide voted, we will create a box plot of the voting patterns. To do this, open the Tools dialogue box, select Charts and then Box Plot.

Enter the parameters for the tool as shown in the image below:

[Click to Enlarge]

Once you have selected your variables, click the Add and Run button to execute the tool. A dialogue box like this should pop up:

[Click to Enlarge]

Once you click Display your box plot should open, looking something like this:

[Click to Enlarge]

The ‘whiskers’ indicate a wide total range of voting patterns among polling booths, while each box spans the lower to upper quartiles (the middle half of the booths), and the black central line marks the median.
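
The quantities a box plot draws can be sketched in a few lines of Python. This is only an illustration of the idea, not the portal's own code: the swing values are hypothetical, the function name `box_plot_summary` is made up for this example, and it assumes the whiskers run to the minimum and maximum (some box plot tools cut them off at 1.5 times the interquartile range instead).

```python
from statistics import median, quantiles


def box_plot_summary(values):
    """Return the statistics a simple box plot draws for one variable."""
    ordered = sorted(values)
    # Quartiles by linear interpolation between data points ("inclusive" method).
    q1, _, q3 = quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],          # lower whisker (assumed: full range)
        "q1": q1,                   # bottom of the box
        "median": median(ordered),  # central line
        "q3": q3,                   # top of the box
        "max": ordered[-1],         # upper whisker (assumed: full range)
        "iqr": q3 - q1,             # height of the box
    }


# Hypothetical percentage swings for seven polling booths.
swings = [-6.0, -3.5, -1.0, 0.5, 2.0, 4.5, 9.0]
summary = box_plot_summary(swings)
print(summary)
```

A tall box relative to the whiskers means the middle half of the booths already varies widely; long whiskers with a short box point to a few extreme booths.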

We can also use other chart tools to visualise how the variables are distributed. Open Tools, then Chart Tools and then Histogram, and enter the parameters as shown below:

[Click to Enlarge]

Once you click Add and Run this will execute the tool, allowing the following histogram to be displayed:

[Click to Enlarge]

Now we want to see how the votes are distributed spatially across South Australia. To do this, we will create a Choropleth of one of the variables – the Percentage Swing to or away from the Coalition.

Open the Maps, Charts and Graph dialog in the Visualise your data panel and select Map Visualisations and then Choropleth. Enter the parameters below – you can select the kind of palette and the colours of the palette as you choose. Here we have chosen Diverging and Spectral.

[Click to Enlarge]

Once you click Add and Display, you should get a result which looks something like this:

[Click to Enlarge]

Reds indicate negative swings, i.e. swings to other parties, and blues indicate positive swings to the Coalition parties.

If you zoom into Adelaide you will see a much finer-grained pattern of polling booths, looking something like the following:

[Click to Enlarge]

Spatial Autocorrelation

Your initial mapping exercise certainly gives the impression that the swings are not randomly distributed: polling booths that are close to each other tend to have similar patterns of swings – negative swing polling booths tend to be found with other negative swing polling booths and vice versa.

However, we can test whether this pattern is real, rather than arising by chance alone, by running some spatial statistics.

The first step is to compute a spatial weights matrix (SWM), which records the “closeness” of each polling booth area to every other area. Closeness can be based simply on whether the polling booth areas touch (Contiguous Spatial Weight Matrix), or on the distance between them, for example whether they lie within a user-specified distance of each other (Distance Spatial Weight Matrix).

Before we can compute the spatial weights matrix we first need to link the geometry of the polling booth areas to a data set. To do this, open the Spatial Data Manipulation tools, select Spatialise Aggregated Dataset, and enter your parameters as shown in the image below:

[Click to Enlarge]

Click Add and Run to execute the tool. You don’t need to display the results, so just close the dialogue box that appears.

Next we want to create our spatial weights matrix. We will use the Contiguous Spatial Weights method, which counts a polling booth area as a neighbour if it shares a border or a corner with the polling booth area in question.
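
To make the "shares a border or a corner" rule concrete, here is a minimal Python sketch of contiguity-based neighbours. It is not how the portal computes its matrix: it assumes, for simplicity, that two areas touch exactly when their polygons have at least one vertex in common, and the toy squares and the names `queen_contiguity` and `unit_square` are invented for this illustration.

```python
from itertools import combinations


def queen_contiguity(polygons):
    """Map each area id to the set of areas that share at least one vertex with it."""
    neighbours = {area: set() for area in polygons}
    for (a, verts_a), (b, verts_b) in combinations(polygons.items(), 2):
        if set(verts_a) & set(verts_b):  # a shared corner or a shared edge
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def unit_square(x, y):
    """Vertices of a 1 x 1 square whose lower-left corner is at (x, y)."""
    return [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)]


# Four squares in a 2 x 2 block plus one detached square (toy "polling booth areas").
areas = {
    "A": unit_square(0, 0),
    "B": unit_square(1, 0),
    "C": unit_square(0, 1),
    "D": unit_square(1, 1),
    "E": unit_square(3, 0),
}
weights = queen_contiguity(areas)
print(weights)
```

Here A and D meet only at the corner (1, 1) but still count as neighbours, while E touches nothing and ends up with an empty neighbour set.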

To create the spatial weights matrix, click the Tools button, then click Spatial Statistics Tools, then Contiguous Spatial Weights Matrix. Enter your parameters as shown below – remembering to use your spatialised dataset, not the original dataset!

[Click to Enlarge]

Once you click Add and Run, this will produce an output file named Output: ContigSWM-WorkflowXXX.

We can now use this in different spatial statistical tests. As an example we will use Moran’s I, which compares similarity in location with similarity in attribute values (some detailed information about spatial autocorrelation and how we measure it can be found here).

To compute the Moran’s I statistic for our variable (the % swing to or from the Coalition), open Tools, then Spatial Statistics and then Moran’s I. Enter your parameters as shown in the image below:

[Click to Enlarge]

Once you have entered your parameters, click Add and Run to execute the tool. It should take a few seconds to execute, and then give you a dialogue box like the one below.

[Click To Enlarge]

If you click Display, this should bring up the values for the Moran’s I test, as shown below. The value that we are interested in is in the uppermost row.

[Click to Enlarge]

A positive value for Moran’s I indicates a clustering of high values with high values and low values with low values (positive spatial autocorrelation). The p-value is the usual measure of statistical significance (a result is conventionally treated as significant if it is less than 0.05). So, as we suspected, there is significant positive spatial autocorrelation in the swing to the Coalition, with a Moran’s I value of 0.415.

We can also visualise this relationship as a Moran’s I scatter plot. To do this, select Tools, then Charts, then Moran’s I Scatterplot, and enter your parameters as shown below.
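
For readers who want to see what lies behind the statistic, the following Python sketch computes global Moran's I with binary (0/1) contiguity weights, together with a simple permutation-based p-value. It is a hedged illustration only: the portal may standardise the weights or derive its p-value differently, and the six-area "chain" of values is hypothetical.

```python
import random


def morans_i(values, neighbours):
    """Global Moran's I with binary weights; neighbours[i] holds the indices adjacent to i."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    cross = sum(z[i] * z[j] for i in range(n) for j in neighbours[i])
    s0 = sum(len(neighbours[i]) for i in range(n))  # sum of all weights
    return (n / s0) * cross / sum(d * d for d in z)


def permutation_p_value(values, neighbours, n_perm=999, seed=0):
    """Share of random reshuffles whose Moran's I is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbours)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbours) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical swings for six areas in a row: negative swings at one end, positive at the other.
values = [-5.0, -4.0, -3.0, 3.0, 4.0, 5.0]
chain = [{1}, {0, 2}, {1, 3}, {2, 4}, {3, 5}, {4}]

i_obs = morans_i(values, chain)
p_value = permutation_p_value(values, chain)
print(round(i_obs, 3), p_value)
```

Because similar values sit next to each other in this toy data, the statistic comes out clearly positive; reshuffling the values across the areas destroys that pattern, which is what the permutation test exploits.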

[Click to Enlarge]

Once you have entered your parameters, click Add and Run to execute the tool. This will bring up a dialogue box as shown below.

[Click to Enlarge]

Clicking the Display button will bring up your Moran’s I scatter plot, which should look like the image below:

[Click to Enlarge]

We can see that the relationship between the z-scaled value of the variable in each area and the values in the areas around it is positive. This means that areas with a high Coalition swing tend to be clustered together, as do areas with a low Coalition swing. More information on how to interpret a Moran’s I scatter plot is available here.
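
The points on such a scatter plot can be reproduced with a short Python sketch: each area contributes its own z-score on the horizontal axis and the average z-score of its neighbours (the "spatial lag") on the vertical axis. The data, the function names and the quadrant labels below are illustrative assumptions, not the portal's output.

```python
from statistics import fmean, pstdev


def moran_scatter_points(values, neighbours):
    """Return (z-score, mean z-score of neighbours) for every area."""
    mean, sd = fmean(values), pstdev(values)
    z = [(v - mean) / sd for v in values]
    # Spatial lag: average z-score over each area's neighbours
    # (every area is assumed to have at least one neighbour).
    lag = [fmean(z[j] for j in neighbours[i]) for i in range(len(z))]
    return list(zip(z, lag))


def quadrant(z, lag):
    """Name the quadrant of the scatter plot that a point falls in."""
    if z >= 0:
        return "high-high" if lag >= 0 else "high-low"
    return "low-low" if lag < 0 else "low-high"


# Hypothetical swings for six areas in a row, each adjacent to the next.
values = [-5.0, -4.0, -3.0, 3.0, 4.0, 5.0]
chain = [{1}, {0, 2}, {1, 3}, {2, 4}, {3, 5}, {4}]

points = moran_scatter_points(values, chain)
labels = [quadrant(z, lag) for z, lag in points]
print(labels)
```

Points concentrated in the high-high and low-low quadrants give the upward-sloping pattern described above; with weights averaged over neighbours in this way, the slope of a line fitted through the points corresponds to Moran's I.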

Regression and ANOVA Analyses

Next, we want to look at whether other factors are associated with the % Coalition Swing in polling booths around South Australia.

Firstly, we will superimpose the percentage of Green voters over the % Coalition Swing, using a choropleth centroid map. To do this, open up the Maps, Graphs and Visualisations menu, select Map Visualisations and then Choropleth – Centroid. Enter the parameters for this as shown below:

[Click to Enlarge]

This should result in the following map appearing on your portal screen:

[Click to Enlarge]

There appears on the map to be a greater Green party vote in areas with a swing away from the Coalition (yellow to red) than in those with a swing towards the Coalition (blue).

To check the statistical strength of this pattern we can use the Linear Regression Plot to visualise and estimate some of the statistics. This can be found under Tools, then Chart Tools then Regression Simple Linear Plot. Enter your parameters as shown below, and then click Add and Run.

[Click to Enlarge]

Once this has run, click on the Display button that appears to bring up the following chart:

[Click to Enlarge]

This suggests that our interpretation may be correct, as the Green vote drops as the swing to the Coalition increases, but is this statistically significant?

To test the significance we can run a Regression analysis. To do this open Tools, then click Statistical Analysis and then Regression. Input the following parameters:

[Click to Enlarge]

Note that more than one independent variable can be entered, but we are only choosing one. Clicking Add and Run will execute the tool, and you can click Display on the resulting dialogue box to open the text window of the regression output. The start of this window just lists the variables, but if you scroll right to the bottom, you should see something that looks like this:

[Click to Enlarge]

We can now see that the coefficient is highly significant (its p-value is essentially zero), although the relationship is not strong: the R-squared is very small (just over 0.05).
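
The slope, intercept and R-squared reported by a simple regression can be illustrated with ordinary least squares in plain Python. The numbers below are hypothetical booth values chosen for easy arithmetic, not the South Australian data, and the function name is made up for this sketch.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in y that the fitted line accounts for.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical values for five booths: % swing to the Coalition and % Green vote.
swing = [-4.0, -2.0, 0.0, 2.0, 4.0]
green = [14.0, 12.0, 11.0, 9.0, 9.0]

slope, intercept, r_squared = simple_linear_regression(swing, green)
print(slope, intercept, round(r_squared, 3))
```

A coefficient can be statistically significant while R-squared stays small, as in the output above: significance says the slope is unlikely to be zero, whereas R-squared says how much of the booth-to-booth variation the line actually accounts for.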

Can we find any variables with stronger correlation? Using the Tools → Statistical Analysis → Correlation tool we can add as many variables as we wish. Enter the parameters as shown below, making sure you select % Coalition Swing, % of Australian Labor Party, % Coalition, % Greens and % Independents as your variables:

[Click to Enlarge]

Click on Add and Run to execute the tool, and then when the tool has finished running click Display to see the results:

[Click to Enlarge]

The first column shows that the swing to the Coalition tended to be positive in booths with higher ALP votes, and negative in booths that voted for the Coalition, which perhaps indicates a rebalancing in some areas since the previous election. Additionally, the swing towards the Coalition tended to be negative in booths with higher Independent vote shares. All these correlations were stronger than the correlation with the Green vote.
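
A correlation table of this kind can be sketched in Python as a matrix of pairwise Pearson coefficients. The column names and values here are hypothetical stand-ins for the portal's attributes, and reading down the "swing" column mirrors reading the first column of the portal output.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict of correlations between every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical booth-level percentages (not the real 2010 figures).
columns = {
    "swing": [-4.0, -2.0, 0.0, 2.0, 4.0],
    "alp": [30.0, 35.0, 40.0, 45.0, 50.0],
    "coalition": [55.0, 50.0, 45.0, 40.0, 35.0],
    "greens": [14.0, 12.0, 11.0, 9.0, 9.0],
}

matrix = correlation_matrix(columns)
for name, r in matrix["swing"].items():
    print(f"swing vs {name}: {r:+.3f}")
```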

We have other variables (demographic and socio-economic) we can also test, but first they need to be joined into the same table with the voting patterns. To do this we can use the Inner Join tool, joining the first two datasets and then joining the result of that with the third.

To do the first join, open the Tabular Inner Join tool (Tools → Data Manipulation Tools → Tabular Inner Join) and enter the parameters as below and click Add and Run:

[Click to Enlarge]

Once this has run, the result will appear in your Data panel, named “Output: inner-join XXX”. You should rename this (using the spanner symbol on the right of the dataset) to “Join 1: Socioeconomic + Demographic”. This indicates the order in which you joined the datasets (L + R).

Once you have renamed this dataset, we need to join it to the voting dataset. Once again, open the Tabular Inner Join tool (Tools → Data Manipulation Tools → Tabular Inner Join) and enter the parameters as shown below. It is very important that you specify the left and right datasets correctly. Once you have entered the parameters as below, click Add and Run.

[Click to Enlarge]

Again it’s a good idea to rename the output dataset in your data panel: “Join 2: Socioeconomic Demographic + Voting”, as indicated in the blue box in the following image:

[Click to Enlarge]
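
The two-step join above can be pictured with a small Python sketch. It assumes each table has a shared key column, here called `booth_id` (a hypothetical name; the real key is whatever you chose in the tool's parameters), and the rows are invented. An inner join keeps only the rows whose key appears in both tables.

```python
def inner_join(left, right, key):
    """Keep rows whose key value appears in both tables, merging their columns."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**left_row, **right_by_key[left_row[key]]}
        for left_row in left
        if left_row[key] in right_by_key
    ]


# Hypothetical rows; attribute names follow the ones used in this exercise.
socioeconomic = [
    {"booth_id": 1, "own_h_": 31.0},
    {"booth_id": 2, "own_h_": 28.5},
    {"booth_id": 3, "own_h_": 40.2},
]
demographic = [
    {"booth_id": 1, "h_degree_": 22.0},
    {"booth_id": 2, "h_degree_": 15.5},
    {"booth_id": 4, "h_degree_": 30.1},
]
voting = [
    {"booth_id": 2, "M1_tpp_swing": -1.8},
    {"booth_id": 3, "M1_tpp_swing": 2.4},
]

join_1 = inner_join(socioeconomic, demographic, "booth_id")  # Join 1: Socioeconomic + Demographic
join_2 = inner_join(join_1, voting, "booth_id")              # Join 2: result of Join 1 + Voting
print(join_2)
```

Only booth 2 survives both joins in this toy data, which is worth remembering when the joined table has fewer rows than any of its inputs.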

We can now run a new correlation analysis, looking to see if socioeconomic and demographic factors are associated in any way with the swing to the Coalition within polling booth areas of South Australia.

Again, open up the correlation tool, and enter the parameters as shown below. Be sure to choose % Single Person Households (lone_p_h_), % Home Ownership (own_h_), % Home Buying (rent_h_), % With No Religion (no_relig_), % With a University Degree (h_degree_) and Coalition Swing (M1_tpp_swing) as your correlation variables.

[Click to Enlarge]

Once you have clicked Add and Run, the tool should execute and a dialogue box should pop up to show that the tool has run. Click Display to get the output of the correlation. It should look something like the box below. The bottom row (indicated by the blue box here) is the list of correlation coefficients for % Coalition swing with the other variables selected.

[Click to Enlarge]

We can see that a greater swing towards the Coalition was correlated with fewer single person households, fewer individuals owning their own homes, and fewer people with a higher degree. By contrast, a greater swing towards the Coalition was correlated with more people without a religion and more people buying a house.

To test further we can do a multiple linear regression. Choose Tools → Statistical Analysis → Regression and enter the parameters as below. Remember to put % Coalition Swing (M1_tpp_swing) as your Dependent Variable and the other variables you used for the correlation as your Independent Variables. Once you’ve entered your parameters, click Add and Run.

[Click to Enlarge]

After execution, click the Display button to show the output. Once you scroll to the bottom, you should see the following output. This shows that only the rates of owning a home and buying a home were significantly associated with a swing towards the Coalition (their p-values were lower than 0.05 – blue box). Together, variation in these factors explained only about 11.75% of the variation in the swing towards or away from the Coalition in South Australia in 2010, at the polling booth level (green box).

[Click to Enlarge]
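
As a rough illustration of what a multiple linear regression estimates, the sketch below fits coefficients by solving the least-squares normal equations in plain Python and reports R-squared. It is not the portal's implementation, it does not compute p-values, and the data are synthetic: the dependent values were constructed as exactly 1 + 2 × first predictor − 3 × second predictor, so the fit is perfect.

```python
def fit_ols(rows, y):
    """Least-squares fit of y on several predictors; returns ([intercept, b1, b2, ...], r_squared)."""
    X = [[1.0] + list(r) for r in rows]  # prepend a constant column for the intercept
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [
        [sum(xi[a] * xi[b] for xi in X) for b in range(k)]
        + [sum(xi[a] * yi for xi, yi in zip(X, y))]
        for a in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    # (fails with ZeroDivisionError if predictors are perfectly collinear).
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, xi)) for xi in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Synthetic example with two predictors per booth.
predictors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
swing = [1.0, 3.0, -2.0, 0.0, 2.0]

beta, r_squared = fit_ols(predictors, swing)
print([round(b, 6) for b in beta], round(r_squared, 6))
```

With real booth data the fit is far from perfect; an R-squared of about 0.1175 corresponds to the "about 11.75% of variation explained" reading above.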

We can also use ANOVA to explore these relationships.

For ANOVA, our input data needs to be classified first. This can be achieved using the Classifiers tool. Open this tool (Tools → Statistical Analysis → Classifiers) and enter the parameters as shown below. The parameters you need to classify are:

    • % Single Person Households (lone_p_h_);
    • % Home Ownership (own_h_);
    • % Home Buying (rent_h_);
    • % With No Religion (no_relig_); and
    • % With a University Degree (h_degree_)

[Click to Enlarge]

Click Add and Run to execute the tool.

This creates a new dataset, and adds new columns at the right of the table. These columns include the designated class of the classified attributes, and the lower and upper bounds of the assigned classes. For this exercise you could classify % home owners, % home purchasers, % no religion and % with a degree or higher qualification into two classes each. This will give appropriate input data for the ANOVA.

Once the new classifications are available in the Output Table we can run the ANOVA. You will need to select the classified dataset and use the classified versions of the attributes listed above, which appear right at the end of the drop-down menu (i.e. the ones that end with _Class).

Enter the parameters as shown below and click Add and Run.
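
What a classifier adds to the table can be sketched in Python. The method below is an assumption made for illustration: it splits the values into classes of equal size (for two classes, a split around the median), whereas the Classifiers tool may offer other methods. The function name and the ownership percentages are hypothetical.

```python
def classify_equal_count(values, n_classes=2):
    """Assign each value a class from 1 to n_classes, with roughly equal numbers per class."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices from smallest to largest value
    classes = [0] * n
    for rank, i in enumerate(order):
        classes[i] = rank * n_classes // n + 1
    # Lower and upper bound of each class (assumes at least n_classes values).
    bounds = {}
    for c in range(1, n_classes + 1):
        members = [v for v, k in zip(values, classes) if k == c]
        bounds[c] = (min(members), max(members))
    return classes, bounds


# Hypothetical % home ownership (own_h_) for four polling booth areas.
own_h = [62.0, 35.5, 48.0, 71.2]
own_h_class, own_h_bounds = classify_equal_count(own_h, n_classes=2)
print(own_h_class, own_h_bounds)
```

The class labels play the role of the new `_Class` columns, and the bounds correspond to the lower and upper limits stored alongside them.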

[Click to Enlarge]

Once you have run the tool, click Display on the pop-up box to show the results of the ANOVA. It should look something like the image below. The statistical significance of the results is boxed in red. This shows that living alone, owning a home or buying a home had a significant effect on the Coalition swing, but having no religion or having a higher degree had no significant effect.

[Click to Enlarge]
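
The core of a one-way ANOVA is the F statistic, which compares the variation between class means with the variation within classes. The Python sketch below computes it for hypothetical swing values grouped by a two-class attribute; turning F into a p-value needs the F distribution, which is left out here, and the portal's output may include additional terms.

```python
def one_way_anova_f(groups):
    """Return (F statistic, between-groups df, within-groups df) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical swings for booths in the lower and upper home-ownership class.
swing_low_ownership = [1.0, 2.0, 3.0]
swing_high_ownership = [4.0, 5.0, 6.0]

f_stat, df_b, df_w = one_way_anova_f([swing_low_ownership, swing_high_ownership])
print(f_stat, df_b, df_w)
```

A large F means the class means differ by more than the spread inside each class would suggest, which is what a small significance value in the red box reflects.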
