# Use Case: Mapping, Charting and Statistical Analysis – Polling Booth Data


## Introduction

For this use case, we will look at polling booth data from the 2010 Federal Election to do some exploratory mapping and statistical analysis.

## Choosing Area and Loading Data

*Choose your state of interest*: we will be focussing on South Australia for the following exercise. After you have selected your area, you will need to *load these data sets from 2010*:

## Mapping and Charting

To get a broad idea of how Adelaide voted, we will create a **box plot** of the voting patterns. To do this, open the *Tools* dialogue box, select *Charts* and then *Box Plot*. Enter the parameters for the tool as shown in the image below:

Once you have selected your variables, click the *Add and Run* button to execute the tool. A dialogue box should pop up; click *Display* and your box plot should open. The ‘whiskers’ indicate a wide overall range of voting patterns among polling booths, while the edges of each box mark the lower and upper quartiles and the black central line marks the median.
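The statistics behind a box plot can be sketched in a few lines of Python. This is a minimal illustration, not the workbench's own code: it assumes Tukey-style whiskers that stop at the most extreme values lying within 1.5 × IQR of the box (the tool's whisker convention may differ), and the `swing` values are made up.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, median and Tukey-style whiskers (an assumed 1.5 x IQR convention)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers reach the most extreme observations that lie inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "whisker_low": min(inside),
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }


# Hypothetical booth-level swings (percentage points), not real election results.
swing = [-4.2, -1.0, 0.5, 1.3, 2.0, 2.4, 3.1, 3.8, 4.5, 12.0]
print(box_plot_summary(swing))
```

Points beyond the whiskers are usually drawn individually as outliers, which is why the function returns them separately.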

We can also use other chart tools to visualise how the variables are distributed. Open *Tools*, then *Chart Tools* and then *Histogram*, and enter the parameters as shown below. Click *Add and Run* to execute the tool and display the histogram.

Now we want to see how the votes are distributed spatially across South Australia. To do this, we will create a **choropleth** map of one of the variables: the percentage swing to or away from the Coalition.

Open the *Maps, Charts and Graph* dialog in the **Visualise your data** panel and select **Map Visualisations** and then **Choropleth**. Enter the parameters below; you can choose the kind of palette and its colours. Here we have chosen *Diverging* and *Spectral*.

Click *Add and Display* and you should get a map in which reds indicate negative swings, i.e. swings away from the Coalition to other parties, and blues indicate positive swings to the Coalition parties.
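How a diverging palette can line up with the sign of the swing is sketched below. The classification scheme is an assumption, not the workbench's documented method: it uses an even number of equal-width classes that are symmetric about zero, so zero falls on a class break separating the lower (red) classes from the upper (blue) ones.

```python
def diverging_classes(values, n_classes=6):
    """Equal-width classes symmetric about zero (an assumed scheme for a diverging palette).

    Returns the class breaks and a class index per value; with an even n_classes,
    indices below n_classes // 2 cover the negative part of the range.
    """
    if n_classes % 2:
        raise ValueError("use an even number of classes so that zero is a class break")
    extent = max(abs(v) for v in values)
    if extent == 0:
        raise ValueError("all values are zero; nothing to classify")
    width = 2 * extent / n_classes
    breaks = [-extent + i * width for i in range(n_classes + 1)]
    # The maximum value sits on the top break, so clamp it into the last class.
    classes = [min(int((v + extent) / width), n_classes - 1) for v in values]
    return breaks, classes


# Hypothetical two-party-preferred swings (percentage points).
breaks, classes = diverging_classes([-3.0, -1.0, 0.5, 2.0, 3.0])
colours = ["red" if c < 3 else "blue" for c in classes]
```

Making the classes symmetric keeps the neutral midpoint of the palette at zero swing, even when the positive and negative extremes differ in size.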

If you zoom into Adelaide you will see a much finer-grained pattern of polling booth areas, looking something like the following:

## Spatial Autocorrelation

The map suggests that booths with similar swings tend to be located near each other. We can check whether this clustering is real, rather than the result of chance alone, by running some **Spatial Statistics**.

The first step is to compute a **spatial weight matrix** (SWM), which describes the “closeness” of each polling booth area to every other area. Closeness can be based simply on whether the polling booth areas touch (**Contiguous Spatial Weight Matrix**), or on the distance between them, for example whether they lie within a user-specified distance of each other (**Distance Spatial Weight Matrix**).
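A distance-based weights matrix can be sketched as follows. This is a minimal illustration under stated assumptions: each polling booth area is represented by a centroid in projected coordinates (e.g. metres), and two areas count as neighbours when their centroids lie within a chosen threshold distance. The function name and inputs are hypothetical.

```python
import math


def distance_band_weights(centroids, threshold):
    """Binary distance-band weights: w[i][j] = 1 if centroids i and j are within threshold.

    Assumes projected (planar) coordinates, so Euclidean distance is meaningful.
    An area is never its own neighbour, so the diagonal is always 0.
    """
    n = len(centroids)
    return [
        [
            1.0 if i != j and math.dist(centroids[i], centroids[j]) <= threshold else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


# Three hypothetical centroids in metres: A and B are 1 km apart, C is 5 km from A.
w = distance_band_weights([(0, 0), (1000, 0), (5000, 0)], threshold=2000)
```

Inverse-distance weighting (e.g. 1/d) is a common alternative to this yes/no rule when nearer areas should count for more.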

Before we can compute the spatial weights matrix we first need to link the geometry of the polling booth areas to a data set. To do this, open the **Spatial Data Manipulation** tools, then select **Spatialise Aggregated Dataset**, and enter your parameters as shown in the image below:

Click *Add and Run* to execute the tool. You don’t need to display the results, so just close the dialogue box that appears.

Next we want to create our spatial weights matrix. We will use the **Contiguous Spatial Weights** method, which counts a polling booth area as a neighbour if it shares a border or a corner with the polling booth area in question.
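The “shares a border or a corner” rule is often called *queen* contiguity. A minimal sketch, assuming each area is given as a list of vertex coordinates and that neighbouring boundaries share exactly identical vertices (real boundary files usually need a tolerance or a topology-aware library). Row-standardising, shown at the end, is a common follow-up step so that each area's weights sum to 1; whether the workbench applies it is an assumption.

```python
def queen_contiguity(polygons):
    """Queen contiguity: two areas are neighbours if they share at least one vertex.

    Assumes topologically clean boundaries, i.e. touching polygons share exact coordinates.
    """
    vertex_sets = [set(poly) for poly in polygons]
    n = len(polygons)
    return [
        [1.0 if i != j and vertex_sets[i] & vertex_sets[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def row_standardise(w):
    """Scale each row to sum to 1; areas with no neighbours keep a row of zeros."""
    out = []
    for row in w:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


# Hypothetical 2 x 2 grid of unit squares plus one isolated square.
A = [(0, 0), (1, 0), (1, 1), (0, 1)]
B = [(1, 0), (2, 0), (2, 1), (1, 1)]
C = [(0, 1), (1, 1), (1, 2), (0, 2)]
D = [(1, 1), (2, 1), (2, 2), (1, 2)]  # touches A only at the corner (1, 1)
E = [(5, 5), (6, 5), (6, 6), (5, 6)]
w = queen_contiguity([A, B, C, D, E])
w_rs = row_standardise(w)
```

Under the stricter *rook* rule, A and D would not be neighbours because they share only a corner.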

To create the spatial weights matrix, click the *Tools* button, then *Spatial Statistics Tools*, then *Contiguous Spatial Weights Matrix*. Enter your parameters as shown below, remembering to use your spatialised dataset, not the original dataset! The output dataset will be named *Output: ContigSWM-WorkflowXXX*.

We can now use this in different spatial statistical tests. As an example we will use **Moran’s I**, which compares similarity in location with similarity in attribute values (some detailed information about spatial autocorrelation and how we measure it can be found here).
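For reference, the standard global Moran’s I for a variable $x$ measured in $n$ areas with spatial weights $w_{ij}$ is given below. Positive values mean that neighbouring areas tend to deviate from the mean $\bar{x}$ in the same direction; under no spatial autocorrelation the expected value is $-1/(n-1)$, close to zero for many areas.

$$
I = \frac{n}{S_0}\,
\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i-\bar{x})(x_j-\bar{x})}
     {\sum_{i=1}^{n}(x_i-\bar{x})^2},
\qquad
S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}
$$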

To compute the Moran’s I statistic for our variable (the % swing to or from the Coalition), open *Tools*, then *Spatial Statistics* and then *Moran’s I*. Enter your parameters as shown in the image below:

Click *Add and Run* to execute the tool. It should take a few seconds to execute, after which a dialogue box will appear. Clicking *Display* brings up the values for the Moran’s I test. The value that we are interested in is in the uppermost row. A positive value for Moran’s I indicates clustering of high values with high values and low values with low values (positive spatial autocorrelation). The p-value is the usual test of statistical significance (a result is usually considered significant if p is less than 0.05). So, as we suspected, there is significant positive spatial autocorrelation in the swing to the Coalition, with a Moran’s I value of 0.415.
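The statistic and its significance can be sketched in Python. This is an illustration, not the workbench's implementation: it computes Moran’s I directly from its formula and estimates a one-sided pseudo p-value by randomly shuffling the values across areas (a permutation test). The workbench may instead report an analytical p-value based on a normal approximation, so the numbers need not match exactly.

```python
import random


def morans_i(values, w):
    """Global Moran's I for a list of values and an n x n weights matrix (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


def permutation_p_value(values, w, n_permutations=999, seed=0):
    """Observed Moran's I and a one-sided pseudo p-value for positive autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(values, w)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between a value and its location
        if morans_i(shuffled, w) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_permutations + 1)


# Hypothetical example: 20 areas in a row, each touching its immediate neighbours,
# with values rising steadily along the row (strong positive autocorrelation).
n = 20
w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
observed, p = permutation_p_value(list(range(1, n + 1)), w)
```

With 999 permutations the smallest attainable pseudo p-value is 0.001, so more permutations are needed to resolve very small p-values.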

We can also visualise this relationship as a **Moran’s I Scatter Plot**. To do this, select *Tools*, then *Charts*, then *Moran’s I Scatterplot*, and enter your parameters as shown below.

Click *Add and Run* to execute the tool; a dialogue box will appear. Clicking the *Display* button will bring up your Moran’s I scatter plot. We can see that the relationship between the z-scaled variable and the values of the same variable in neighbouring areas (its spatial lag) is positive. This means that areas with a high Coalition swing tend to be clustered together, as do areas with a low swing. More information on how to interpret a Moran’s I scatter plot is available **here**.
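The points on a Moran’s I scatter plot can be sketched as follows, assuming the usual convention: the x-axis is the z-score of each area's value and the y-axis is its spatial lag, the average z-score of its neighbours under row-standardised weights. With row-standardised weights and every area having at least one neighbour, the least-squares slope of this plot equals Moran’s I. The function name and data are hypothetical.

```python
import statistics


def moran_scatter(values, w):
    """Return (z, lag, slope, quadrants) for a Moran's I scatter plot."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    z = [(v - mean) / sd for v in values]
    # Row-standardise on the fly: each lag is the mean z-score of an area's neighbours.
    lag = []
    for row in w:
        total = sum(row)
        lag.append(sum(wij * zj for wij, zj in zip(row, z)) / total if total else 0.0)
    # z has mean 0, so the least-squares slope of lag on z reduces to this ratio.
    slope = sum(a * b for a, b in zip(z, lag)) / sum(a * a for a in z)
    quadrants = [
        ("High" if zi >= 0 else "Low") + "-" + ("High" if li >= 0 else "Low")
        for zi, li in zip(z, lag)
    ]
    return z, lag, slope, quadrants


# Hypothetical: four areas in a row with steadily rising values.
w4 = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(4)] for i in range(4)]
z, lag, slope, quadrants = moran_scatter([1.0, 2.0, 3.0, 4.0], w4)
```

Points in the High-High and Low-Low quadrants are the ones that drive a positive Moran’s I.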

## Regression and ANOVA Analyses

Firstly, we will superimpose the percentage of Green voters over the % Coalition Swing, using a **choropleth centroid** map. To do this, open the *Maps, Graphs and Visualisations* menu, select **Map Visualisations** and then *Choropleth – Centroid*. Enter the parameters for this as shown below:

To check the statistical strength of this pattern we can use the **Linear Regression Plot** to visualise and estimate some of the statistics. This can be found under *Tools*, then *Chart Tools*, then *Regression Simple Linear Plot*. Enter your parameters as shown below, and then click *Add and Run*.

Click the *Display* button that appears to bring up the chart. It suggests that our interpretation may be correct, as the Green vote drops as the swing to the Coalition increases, but is this statistically significant?

To test the significance we can run a **Regression** analysis. To do this, open *Tools*, then click *Statistical Analysis* and then *Regression*. Input the following parameters:

Clicking *Add and Run* will execute the tool, and you can click *Display* on the resulting dialogue box to open the text window of the regression output. The start of this window just lists the variables, but if you scroll right to the bottom you will find the estimated coefficients. We can now see that the coefficient is highly significant (its p-value is essentially zero), although the relationship is not strong, with a very small R-squared (just over 0.05).
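What the regression output reports for a single predictor can be sketched with ordinary least squares. This is a minimal illustration on made-up numbers, not the workbench's code: it returns the slope, intercept, R-squared and the t statistic of the slope. An exact p-value needs the t distribution (for example `scipy.stats.t.sf`), but for samples of a few hundred booths, |t| above about 1.96 corresponds to p below 0.05.

```python
import math


def simple_ols(x, y):
    """Least-squares fit y = intercept + slope * x, with R-squared and the slope's t statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sse = sum(r * r for r in residuals)   # unexplained variation
    sst = sum((b - my) ** 2 for b in y)   # total variation
    r_squared = 1 - sse / sst
    se_slope = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return slope, intercept, r_squared, slope / se_slope


# Hypothetical booths: swing to the Coalition (x) against % Green vote (y).
swing = [1.0, 2.0, 3.0, 4.0, 5.0]
greens = [10.1, 7.8, 6.2, 3.9, 2.1]
slope, intercept, r_squared, t_slope = simple_ols(swing, greens)
```

A large |t| with a small R-squared, as in the booth data, means the relationship is unlikely to be zero but explains little of the variation.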

Can we find any variables with a stronger **correlation**? Using the *Tools → Statistical Analysis → Correlation* tool we can add as many variables as we wish. Enter the parameters as shown below, making sure you select *% Coalition Swing, % of Australian Labor Party, % Coalition, % Greens* and *% Independents* as your variables:

Click *Add and Run* to execute the tool, and when it has finished running click *Display* to see the results. The first column shows that the swing to the Coalition tended to be positive in booths with higher ALP votes and negative in booths with higher Coalition votes, which perhaps indicates a rebalancing in some areas since the previous election. Additionally, the swing towards the Coalition tended to be negative in booths with higher Independent vote shares. All these correlations were stronger than the correlation with the Green vote.
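The correlation tool's output is a matrix of pairwise correlation coefficients. A minimal sketch, assuming Pearson's r (the workbench may offer other coefficients) and using hypothetical column names and values:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """columns: dict of column name -> list of values (all the same length)."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


# Hypothetical booth-level data (percentages).
data = {
    "swing": [1.0, 2.0, 3.0, 4.0],
    "alp": [30.0, 35.0, 41.0, 44.0],
    "coalition": [60.0, 52.0, 47.0, 40.0],
}
matrix = correlation_matrix(data)
```

Reading down the `swing` column of such a matrix corresponds to reading the first column of the tool's output.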

We have other variables (demographic and socio-economic) that we can also test, but first they need to be joined into the same table as the voting patterns. To do this we can use the **Inner Join** tool, joining the first two datasets and then joining the result with the third.
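Conceptually, an inner join keeps only the rows whose key appears in both tables and places their columns side by side. A minimal sketch with hypothetical tables keyed on a hypothetical `booth_id` column (in the tool, the join key is whatever attribute you select); note that chaining two joins means a booth must appear in all three datasets to survive.

```python
def inner_join(left, right, key):
    """Inner join of two lists of dict rows on a shared key column.

    Left columns come first; if a non-key column name appears in both tables,
    the right-hand value overwrites the left-hand one.
    """
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):  # rows with no match are dropped
            merged = dict(lrow)
            merged.update(rrow)
            joined.append(merged)
    return joined


# Hypothetical tables; booth 2 is missing from the demographic data.
socio = [{"booth_id": 1, "own_h_": 30.0}, {"booth_id": 2, "own_h_": 25.0}, {"booth_id": 3, "own_h_": 40.0}]
demo = [{"booth_id": 1, "lone_p_h_": 12.0}, {"booth_id": 3, "lone_p_h_": 20.0}]
voting = [{"booth_id": 1, "M1_tpp_swing": 2.5}, {"booth_id": 2, "M1_tpp_swing": -1.0}, {"booth_id": 3, "M1_tpp_swing": 0.4}]

join1 = inner_join(socio, demo, "booth_id")    # "Join 1": booths 1 and 3
join2 = inner_join(join1, voting, "booth_id")  # "Join 2": booths 1 and 3, all columns
```

This is why the order and the left/right roles matter: they decide column order and which value wins when column names clash.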

To do the **first** join, open the Tabular Inner Join tool (*Tools → Data Manipulation Tools → Tabular Inner Join*), enter the parameters as below and click *Add and Run*. The output will appear in the *Data* panel, named “*Output: inner-join XXX*”. You should rename this (using the spanner symbol on the right of the dataset) to “*Join 1: Socioeconomic + Demographic*”. This name indicates the order in which you joined the datasets (left + right).

Once you have renamed this dataset, we need to join it to the voting dataset. Once again, open the Tabular Inner Join tool (*Tools → Data Manipulation Tools → Tabular Inner Join*) and enter the parameters as shown below. It is very important that you specify the left and right datasets correctly. Once you have entered the parameters, click *Add and Run*.

Rename the output “*Join 2: Socioeconomic Demographic + Voting*”, as indicated in the blue box in the image below. We can now run a new correlation analysis, looking to see whether socioeconomic and demographic factors are associated in any way with the swing to the Coalition within polling booth areas of South Australia.

Again, open the correlation tool and enter the parameters as shown below. Be sure to choose % Single Person Households (*lone_p_h_*), % Home Ownership (*own_h_*), % Home Buying (*rent_h_*), % With No Religion (*no_relig_*), % With a University Degree (*h_degree_*) and Coalition Swing (*M1_tpp_swing*) as your correlation variables.

Click *Add and Run*; the tool should execute and a dialogue box will pop up to show that it has run. Click *Display* to get the output of the correlation. It should look something like the box below. The bottom row (indicated by the blue box here) lists the correlation coefficients between % Coalition swing and the other selected variables. We can see that a greater swing towards the Coalition was correlated with fewer single person households, fewer individuals owning their own homes, and fewer people with a higher degree. By contrast, a greater swing towards the Coalition was correlated with more people without a religion, and more people buying a house.

To test further we can do a multiple **linear regression**. Choose *Tools* → *Statistical Analysis* → *Regression* and enter the parameters as below. Remember to put % Coalition Swing (*M1_tpp_swing*) as your Dependent Variable and the other variables you used for the correlation as your Independent Variables. Once you’ve entered your parameters, click *Add and Run*.
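A multiple regression with an intercept can be sketched by solving the normal equations. This is an illustrative implementation in plain Python on made-up data, not the workbench's code (in practice a statistics library would be used). It returns the coefficients (intercept first), R-squared and a t statistic for each coefficient; in large samples, |t| above roughly 1.96 corresponds to p below 0.05.

```python
import math


def _invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r, c=col: abs(aug[r][c]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: are two predictors perfectly collinear?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def multiple_ols(x_rows, y):
    """OLS with an intercept; x_rows holds one list of predictor values per observation."""
    X = [[1.0] + list(row) for row in x_rows]
    n, k = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    xtx_inv = _invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    my = sum(y) / n
    sst = sum((yi - my) ** 2 for yi in y)
    sigma2 = sse / (n - k)  # residual variance
    t = [beta[i] / math.sqrt(sigma2 * xtx_inv[i][i]) for i in range(k)]
    return beta, 1 - sse / sst, t


# Hypothetical data generated as y = 1 + 2*x1 - 3*x2 plus small noise.
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
y = [1.1, 2.9, -1.95, -0.05, 2.1, -3.1]
beta, r_squared, t = multiple_ols(x_rows, y)
```

R-squared here plays the same role as the percentage of variation explained in the regression output.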

Click the *Display* button to show the output and scroll to the bottom. This shows that only the rates of owning a home and buying a home were significantly associated with a swing towards the Coalition (their p-values were lower than 0.05, blue box). Together, the variables in the model explained only about 11.75% of the variation in the swing towards or away from the Coalition in South Australia in 2010 at the polling booth level (green box). We can also use **ANOVA** to explore these relationships.

For ANOVA, our input data needs to be classified first. This can be achieved using the **Classifiers** tool. Open this tool (*Tools → Statistical Analysis → Classifiers*) and enter the parameters as shown below. The parameters you need to classify are:

- % Single Person Households (*lone_p_h_*);
- % Home Ownership (*own_h_*);
- % Home Buying (*rent_h_*);
- % With No Religion (*no_relig_*); and
- % With a University Degree (*h_degree_*)

Click *Add and Run* to execute the tool.

This creates a new dataset with new columns at the right of the table. These columns hold the designated class of each classified attribute, and the lower and upper bounds of the assigned class. For this exercise you could classify % home owners, % home purchasers, % no religion and % with a degree or higher qualification into two classes each. This will give appropriate input data for the ANOVA.
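Splitting a variable into two classes can be sketched as below. The Classifiers tool has its own classification methods; this sketch simply assumes a median split, so that each class holds roughly half of the booths, and records each row's class together with the lower and upper bounds of that class, like the new columns described above. The function name and values are hypothetical.

```python
import statistics


def median_split(values):
    """Two-class median split; returns (class, lower_bound, upper_bound) per value.

    Class 1 holds values at or below the median, class 2 values above it.
    """
    cut = statistics.median(values)
    lo, hi = min(values), max(values)
    return [(1, lo, cut) if v <= cut else (2, cut, hi) for v in values]


# Hypothetical % home ownership for four booths.
own_h_class = median_split([55.0, 70.0, 62.0, 80.0])
```

Other schemes, such as equal intervals or natural breaks, would place the boundary elsewhere and can change the ANOVA result.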

Once the new classifications are available in the output table, we can run the **ANOVA**. You will need to select the *classified* dataset, and use the *classified* versions of the attributes listed above, which appear right at the end of the drop-down menu (i.e. the ones that end with *_Class*).

Enter the parameters as shown below and click *Add and Run*.

Click *Display* on the pop-up box to show the results of the ANOVA. It should look something like the image below. The statistical significance of the results is boxed in red. This shows that living alone, owning a home or buying a home was significantly associated with the Coalition swing, but having no religion or having a higher degree was not.
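The core of an ANOVA is the F statistic, which compares variation in the dependent variable between classes with variation within them. A minimal one-way (single-factor) sketch on hypothetical swing values grouped by one two-class variable; the workbench run above may include several factors at once, which this sketch does not attempt. An exact p-value needs the F distribution (e.g. `scipy.stats.f.sf`); with one numerator degree of freedom and a large sample, F above about 3.84 corresponds to p below 0.05.

```python
def one_way_anova(groups):
    """groups: dict mapping class label -> list of dependent-variable values.

    Returns (F statistic, between-groups df, within-groups df).
    """
    all_values = [v for g in groups.values() for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2 for label, g in groups.items())
    ss_within = sum(sum((v - means[label]) ** 2 for v in g) for label, g in groups.items())
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical swings and the class (1 or 2) of each booth on one classified variable.
swing = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
classes = [1, 2, 1, 2, 1, 2]
groups = {}
for c, s in zip(classes, swing):
    groups.setdefault(c, []).append(s)
f_stat, df_between, df_within = one_way_anova(groups)
```

With only two classes, this F test is equivalent to a two-sample t test (F equals t squared).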