Analysing Industry Clustering

Introduction

In this tutorial, we introduce the Employment Clustering tool in the AURIN portal. You can read about the background of this tool here, or view the tutorial video here.

Understanding ANZSIC Codes

The Australian and New Zealand Standard Industrial Classification (ANZSIC) is central to using this tool. ANZSIC classifies industries into four hierarchical levels. Here is a quick example:

[Click to Enlarge]

The top level is denoted by a single letter: “E” stands for the construction industry as a whole. On the second level, “30” and “31” are more specific construction types (Building, and Heavy and Civil Engineering, respectively). The hierarchy continues down to the most specific construction categories, such as “House Construction” and “Road and Bridge Construction”, which are represented by 4-digit codes.
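To make the hierarchy concrete, here is a minimal Python sketch that infers the level of a code from its shape. The function names are hypothetical, and the level names (Division, Subdivision, Group, Class) and the example 4-digit codes follow the published ANZSIC 2006 structure rather than anything in the portal itself, so check them against the ABS classification before relying on them.

```python
# Hypothetical helpers (not part of the AURIN portal).
# Assumes the ANZSIC 2006 layout: one letter, then 2-, 3- and 4-digit codes.

def anzsic_level(code):
    """Return the ANZSIC level name implied by the shape of `code`."""
    code = code.strip()
    if len(code) == 1 and code.isalpha():
        return "Division"
    if code.isdigit() and 2 <= len(code) <= 4:
        return {2: "Subdivision", 3: "Group", 4: "Class"}[len(code)]
    raise ValueError(f"not a recognised ANZSIC code shape: {code!r}")


def numeric_parents(code):
    """Return the numeric parent codes of a numeric code, broadest first.

    The division letter cannot be derived from the digits alone,
    so it is not included here.
    """
    return [code[:n] for n in range(2, len(code))]


if __name__ == "__main__":
    for code in ["E", "30", "31", "3011", "3101"]:
        print(code, anzsic_level(code), numeric_parents(code))
```

For example, the 4-digit code for House Construction (3011, assuming ANZSIC 2006) sits under subdivision 30, which in turn sits under division E.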

Dataset Preparation

Before using this tool, we need to access a series of pre-prepared datasets (also called basemaps). These datasets can be accessed via Cloudstor.

The datasets are organised at the Local Government Area (LGA) level.

For this example we have access to data from 13 LGAs located within North-West Melbourne:

[Click to Enlarge]

Download the entire library to your hard drive and extract all of the files within it. These will be 13 more zip files, one for each LGA.

For this tutorial we will focus on the Hume LGA. Once you extract the files within the LGA_Hume.zip file, there should be two additional zip files:

  • 2006_LGA_Hume.zip
  • 2011_LGA_Hume.zip

While you need to extract these from the LGA_Hume.zip file, you should then leave them as separate zip files (i.e. don’t extract them further), because they need to be uploaded to the portal as zip files.
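If you prefer to script this step, the sketch below unpacks one archive while leaving any zip files inside it untouched, which is the state the portal upload expects. It uses only Python's standard library; the function name and the output folder are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_one_level(archive, dest):
    """Extract `archive` into `dest` without unpacking zip files inside it.

    Returns the paths of the zip files found in `dest` afterwards, sorted by name.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        # extractall writes nested archives out as ordinary files;
        # it never opens them, so they stay zipped.
        zf.extractall(dest)
    return sorted(dest.rglob("*.zip"))


# Example usage (assumes LGA_Hume.zip is in the current folder):
# for path in extract_one_level("LGA_Hume.zip", "hume_basemaps"):
#     print(path.name)
```

Run against LGA_Hume.zip, this should leave 2006_LGA_Hume.zip and 2011_LGA_Hume.zip in the output folder, ready to upload.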

You now need to upload both of these zip files to the portal. To do this for each one, click the Upload button under the Data panel, and enter the parameters as shown below:

[Click to Enlarge]

The basemaps are in shapefile format (.shp).

Their geometries come from the Destination Zones (DZNs, which you can download from here) defined by the ABS, while their main attributes (job numbers in each industry group) come from the ABS 2011 Census – Counting Employed Persons, Place of Work database.

The complexity of these basemaps arises from the overlay of planning zones (see the Victoria Planning Provisions) onto the DZNs. Because DZNs are quite large, they don’t by themselves reveal many details of how jobs in different industries are scattered within them. Planning zones, on the other hand, are at a finer scale and, more importantly, provide rules about which industry categories may exist within each zone.

As a result, we broke each DZN into its constituent pieces by clipping it with the overlaid planning zone polygons, checked which industry categories were permitted in each of the clipped polygons, and then recalculated the job numbers for each permitted category in each newly created clipped polygon. (Check out this paper for more details, and see the image below.)

[Click to Enlarge]
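The recalculation step can be illustrated with a small Python sketch. This tutorial does not state the exact allocation rule, so the sketch assumes one plausible option: the jobs for each industry category in a DZN are shared among the clipped pieces that permit that category, in proportion to their area. The function name, piece names, areas and job counts are all hypothetical; see the paper for the method actually used.

```python
def reallocate_jobs(dzn_jobs, pieces):
    """Share a DZN's jobs among its clipped pieces.

    dzn_jobs: {industry_code: jobs counted in the whole DZN}
    pieces:   list of dicts with 'id', 'area' and 'permitted'
              (the set of industry codes the planning zone allows)
    Returns {piece_id: {industry_code: jobs}}.

    Assumption: area-proportional sharing among permitting pieces.
    """
    result = {p["id"]: {} for p in pieces}
    for code, jobs in dzn_jobs.items():
        eligible = [p for p in pieces if code in p["permitted"]]
        total_area = sum(p["area"] for p in eligible)
        if total_area == 0:
            continue  # no piece permits this category; left unallocated here
        for p in eligible:
            result[p["id"]][code] = jobs * p["area"] / total_area
    return result


if __name__ == "__main__":
    dzn_jobs = {"E000": 120, "I000": 60}
    pieces = [
        {"id": "industrial", "area": 2.0, "permitted": {"E000", "I000"}},
        {"id": "commercial", "area": 1.0, "permitted": {"E000"}},
        {"id": "residential", "area": 3.0, "permitted": set()},
    ]
    for piece_id, jobs in reallocate_jobs(dzn_jobs, pieces).items():
        print(piece_id, jobs)
```

In this made-up example the residential piece receives no jobs even though it is the largest, because neither category is permitted there. That is the extra detail the clipped basemap adds over the raw DZN.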

Running the Tool

  • Step 1: Upload employment basemap zip files for 2006 and/or 2011 as described above. Be sure to select Non-Aggregated for the geographic level.
  • Step 2: View Basemap Geometry. In your dataset list, click the spanner button and choose Display on Map. This will open a dialogue box where you can select the colours for your polygons. Click Update and Display to show them on your map. They should look something like the image below (depending on the colour that you choose!).
[Click to Enlarge]

You can turn this base map on or off by clicking the eye icon next to it.

  • Step 3: Open Employment Clustering Tool. Click the Tools button in the Analyse panel, and select Specialist Tools → Employment Clustering. As the name suggests, this is a specialist tool, and using it well requires some knowledge of the datasets (such as ANZSIC codes and planning provisions) as well as Ward’s hierarchical clustering algorithm. Read this paper to understand the algorithm parameters.
  • Step 4: Enter your parameters to match the screenshot below. Take particular care when entering the Wards Clustering Non-Spatial Attribute Selection. There is a substantial list in the drop-down, so keep an eye out for E000 and I000 (Construction and Transport). Click Add and Run.
[Click to Enlarge]
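For a feel of what Ward’s hierarchical clustering does with the selected attributes, here is a minimal pure-Python sketch. It starts with every polygon in its own cluster and repeatedly merges the pair whose merger adds the least within-cluster variance, until the requested number of clusters remains. It is an illustration only: the portal’s implementation is not shown in this tutorial and may also take spatial information or other parameters into account. The function name and the job counts are made up.

```python
def ward_clusters(points, k):
    """Agglomerative clustering using Ward's minimum-variance criterion.

    points: list of equal-length numeric tuples (e.g. E000 and I000 jobs)
    k:      number of clusters to keep
    Returns one label (1..k) per point, numbered by first appearance.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Each cluster is (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                na, nb = len(ma), len(mb)
                dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in the within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, centroid))
    labels = [0] * len(points)
    ordered = sorted(clusters, key=lambda c: min(c[0]))
    for label, (members, _) in enumerate(ordered, start=1):
        for i in members:
            labels[i] = label
    return labels


if __name__ == "__main__":
    # Made-up (E000, I000) job counts for five polygons.
    jobs = [(10, 0), (12, 1), (0, 9), (1, 11), (50, 50)]
    print(ward_clusters(jobs, 3))
```

This naive version compares every pair of clusters at each step, so it is only suitable for small examples. The labels it returns play the same role as the values in the wardclut column of the tool’s outputs.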

Once the workflow is done, two output datasets are added to the dataset list.

The first one is a plain table summarising job numbers in each industry category for every cluster (shown below). The value in column wardclut represents the Ward clustering result. Five clusters are formed based on the parameter inputs.

[Click to Enlarge]

By contrast, the second output (shown below) contains cluster details for each individual polygon in the dataset.

[Click to Enlarge]
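The two outputs are related: a per-cluster summary can be derived from a per-polygon table by grouping on wardclut. The Python sketch below shows that relationship under the assumption that the summary is a simple sum of jobs per industry category; the function name and the sample rows are hypothetical, and the portal’s own table may contain additional columns.

```python
from collections import defaultdict


def summarise_by_cluster(rows, job_fields, cluster_field="wardclut"):
    """Sum the job counts in `job_fields` for each cluster label.

    rows: list of dicts, one per polygon, each carrying a cluster label.
    Returns {cluster_label: {field: total_jobs}} ordered by label.
    """
    totals = defaultdict(lambda: dict.fromkeys(job_fields, 0.0))
    for row in rows:
        bucket = totals[row[cluster_field]]
        for field in job_fields:
            bucket[field] += row.get(field, 0.0)
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    # Made-up per-polygon rows.
    polygons = [
        {"wardclut": 1, "E000": 10, "I000": 0},
        {"wardclut": 1, "E000": 5, "I000": 2},
        {"wardclut": 2, "E000": 0, "I000": 7},
    ]
    for cluster, jobs in summarise_by_cluster(polygons, ["E000", "I000"]).items():
        print(cluster, jobs)
```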

It’s worthwhile renaming these outputs to differentiate between them!

  • Step 5: Visualise the Clusters. We will do this by creating a Choropleth map. Choose the second output as the dataset to be visualised (i.e. the larger table), and choose wardclut as the attribute. Since we have 5 clusters, set the number of classes to 5. Select Pre-classified as your classifier. Set the remaining settings as follows:
[Click to Enlarge]
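The reason for choosing a pre-classified scheme is that wardclut values are category labels rather than measurements, so there are no class breaks to calculate. The Python sketch below illustrates that idea by giving each distinct cluster label its own colour. This is our reading of the option, not a description of the portal’s internals, and the function name and palette are hypothetical.

```python
def preclassified_colours(values, palette):
    """Assign one colour per distinct value, with no class breaks computed.

    values:  cluster labels, one per polygon
    palette: list of colour strings, at least one per distinct label
    Returns the colour for each polygon, in the same order as `values`.
    """
    classes = sorted(set(values))
    if len(classes) > len(palette):
        raise ValueError("palette has fewer colours than distinct classes")
    lookup = dict(zip(classes, palette))
    return [lookup[v] for v in values]


if __name__ == "__main__":
    # Five made-up colours for five clusters.
    palette = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e"]
    print(preclassified_colours([3, 1, 1, 5, 2, 4], palette))
```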

Once you have clicked Add and Display, the choropleth should appear on your screen and look something like the following:

[Click to Enlarge]

That’s all for this tutorial. Thanks for reading, and we hope you find the AURIN portal useful for your research.