HASS DEVL/Tinker Training – Data Champions Day

In this tutorial we will undertake a range of tasks within the AURIN portal relating to geo-referencing.

Logging in to the AURIN Portal

Please note that the AURIN Portal natively supports only Firefox, Chrome and Safari.

If this is your first time visiting the AURIN portal you will be greeted with the Australian Access Federation (AAF) login screen:

If you are a student, staff member or researcher at an Australian University (that is, you have a .edu.au email address), you should already be able to access the portal by selecting your university from the drop-down list and using your regular login details.

A few other institutions have organisational access: AARNET, AIMS, CSIRO, INTERSECT, NICTA and TPAC

If your institution is not in the above list, you will need to register here. You will then receive a registration email from the AAF asking you to verify your account. After that, you will need to log in via the AAF Virtual Home Network, the option at the top of the organisation list.

Navigating the AURIN Portal Interface

Once you have logged in to the AURIN Portal, you will be greeted by the main map interface, which looks something like the image below. The numbers in the image represent elements of the portal interface that you may find interesting or useful as you navigate your way through the portal

1. The Map Interface

The AURIN Portal is presented in a map-based format, which allows you to zoom, pan and navigate with either your track pad or click wheel, or through the + or – zoom function on the top left of the map. The red boundary on your map indicates the area of geography that you have selected for your study.

2. Select Your Area

You can choose what your study area is by the navigation box that appears when you first log in. This can be reopened at any point by clicking on any of the rows in the Area panel. Comprehensive information about how to select your area can be found here

3. Select Your Data

You can acquire your data by using the options present under the Data panel. Comprehensive information about how to select your data can be found here

4. Visualise Your Data

You can create maps and interactive charts from your data by using the options present under the Visualise panel. Comprehensive information on visualisations can be found here

5. Analyse Your Data

You can undertake simple and sophisticated analyses of your data by using the options present under the Analyse panel. Comprehensive information on analysis tools can be found here

6. The Help Button

On the top right of the portal is the blue Help button, which will link out to the AURIN Help pages that you are currently browsing. Wherever you see a blue ? question mark, this will also link out to the appropriate help page associated with that part of the portal.

7. MyAURIN

Clicking on the MyAURIN link will bring up a menu (shown below) with a number of options. Here you can create a new project, rename your project (always a good idea if you’re creating lots of projects!), open up a different project from your project list and reset your current project.

8. Report An Issue

If you come across a bug or a problem with the portal, please report an issue to the AURIN team by clicking on the Report an Issue button

Importing a Dataset into the AURIN Portal

In this section we are going to upload a dataset provided to us which contains the longitude and latitude (X, Y) co-ordinates of survey respondents.

This dataset first needs to be downloaded from cloudstor here and saved to your desktop or wherever you see fit.

Now we are going to import this dataset into your session using the Import function of the Data panel, which looks like the button below

import
This button allows you to bring your own dataset into the AURIN portal. Clicking it calls the Upload dialogue boxes (shown below).

You will first need to upload either a zipped shapefile (.zip) or a comma delimited file (.csv), by browsing to the right directory on your computer

Browse to the dataset and upload

Once the data has uploaded, you will need to name the dataset, add an abstract (preferably with metadata details), and specify the level of geographic aggregation (Aggregation Level) and the attribute field that identifies each of the geographic units, features or rows (SEQID in this instance). Once you have added the data that you wish to visualise, click the Add & Display button.

You will need to select Non Aggregated under Aggregation Level, as your dataset is not related to any of the standard geographies that we keep on the AURIN server.
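Before uploading, it can help to check that every row of the CSV has usable co-ordinates. The sketch below is not part of the AURIN Portal; it is a minimal check you could run yourself, assuming the file has the SEQID, longx and laty columns used in this tutorial. The sample values and the function name `find_bad_rows` are invented for illustration.

```python
import csv
import io

# Hypothetical sample in the shape described above: an identifier column
# (SEQID) plus longx / laty co-ordinate columns. The values are made up.
SAMPLE_CSV = """SEQID,longx,laty
1,144.9631,-37.8136
2,151.2093,-33.8688
3,200.0,-27.4698
4,153.0251,
"""

def find_bad_rows(csv_text, id_field="SEQID", lon_field="longx", lat_field="laty"):
    """Return (id, reason) pairs for rows whose co-ordinates look unusable."""
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            lon = float(row[lon_field])
            lat = float(row[lat_field])
        except (TypeError, ValueError):
            problems.append((row[id_field], "missing or non-numeric coordinate"))
            continue
        # Longitude must lie in [-180, 180] and latitude in [-90, 90].
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            problems.append((row[id_field], "coordinate out of range"))
    return problems

bad_rows = find_bad_rows(SAMPLE_CSV)
print(bad_rows)
```

To check a real file, read its contents with `open(path, newline="").read()` and pass the text to `find_bad_rows`.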

Once this has uploaded click on the Display pop up that appears to display your data. It should look like the image below

Creating a Points dataset

Although our table doesn’t contain a great deal of information, two columns are very useful to us: the longx and laty columns. These give us the position east or west of the prime meridian at Greenwich (longitude, in longx) and north or south of the equator (latitude, in laty). By plotting these x and y co-ordinates, we can find a specific point in space for each of these survey respondents.

To do this, we will use the Lat Lon to Points Dataset tool

  1. Open the tool (Tools → Spatial Data Manipulation → Lat Lon to Points Dataset)
  2. Enter the Parameters as you see them in the image below
  3. Click Add and Run

Once the tool has run you can open the table. You will see that it contains the exact same information as before, but now has an additional column in it called Geometry. This means that this dataset is now spatial. You can even now download the dataset as a shapefile.
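Conceptually, the tool adds a geometry to each row, built from the two co-ordinate columns. The sketch below shows that idea using GeoJSON Point features. It is only an illustration under assumptions: the rows are invented, `row_to_feature` is a hypothetical helper, and the portal's internal representation may well differ.

```python
import json

# Hypothetical rows; SEQID, longx and laty follow the column names used above.
rows = [
    {"SEQID": "1", "longx": "144.9631", "laty": "-37.8136"},
    {"SEQID": "2", "longx": "151.2093", "laty": "-33.8688"},
]

def row_to_feature(row, lon_field="longx", lat_field="laty"):
    """Turn one table row into a GeoJSON Point feature, keeping all attributes."""
    return {
        "type": "Feature",
        # GeoJSON orders co-ordinates as [longitude, latitude], i.e. [x, y].
        "geometry": {
            "type": "Point",
            "coordinates": [float(row[lon_field]), float(row[lat_field])],
        },
        "properties": dict(row),
    }

feature_collection = {
    "type": "FeatureCollection",
    "features": [row_to_feature(r) for r in rows],
}
print(json.dumps(feature_collection, indent=2))
```

Note the x, y ordering: longitude comes first. Swapping the two columns is a common cause of points appearing in the wrong hemisphere.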

The first thing you will want to do is rename this file something meaningful – click on the spanner next to the dataset named something like “Output: LatLon2Points-Workflow XXX” and rename it something descriptive. I have renamed mine Points: Geocoded

To have a look at what this dataset looks like on the map, click again on the spanner and click “Display on Map”. You can accept the default settings of the pop up dialogue and then click Update and Display. It should look something like the image below

Adding AURIN Datasets

As mentioned, our dataset doesn’t have much information attached to it. Now we are going to attach some information to it from an overlying geographic aggregation, in this instance the SA2s.

A note on statistical geographies

The Australian Bureau of Statistics maintains a range of geographies at which it releases data from various products, including the census. In this instance we will be using the Statistical Area Level 2 (SA2) datasets, which are roughly equivalent to suburbs in metro regions.

Now we’re going to bring in some AURIN data, in this instance information about the characteristics of the areas that each of the respondents live in. We will do this using the + Data button under the data panel

plusDataset
This button calls up the Data Browser window, giving users access to the datasets that are available for their use from the many AURIN data custodians around the country.

The first thing to note is that the number of datasets that are available to you when you open the Data Browser window is restricted by your area selection – the smaller the area that you select, the fewer the datasets that will be available within your session

There are many ways that you can filter and search for datasets within the data browser. You can do a Keyword search, limit your search to a specific level of aggregation or granularity, or limit to the organisation/data custodian that you’re interested in.

When you select a dataset (shown below), the dialogue box will then require you to select the attributes you want for the data. You can select all of the attributes by clicking the top left check box next to “Attributes” (in red box on image below). Additionally, you may be required to filter some of the specific attributes, such as by year of collection or age cohorts (in blue box on image below). The abstract for your data is provided in the top right of the Data Browser window (green box).

Once you have selected your dataset and attributes, you can click either Add or Add and Open. The former option will keep your Data Browser open, so that you can shop for more datasets, but it will not automatically retrieve your dataset from the custodian/source. You will still need to do this by clicking on the specific dataset entry in the Data panel later. The latter option will close your Data Browser window and automatically retrieve the dataset from the custodian/source

In this instance

  • Under Keyword type in Synthetic
  • Under Organisation select UC_NATSEM (University of Canberra – National Centre for Social and Economic Modelling)
  • Under Aggregation Level select Statistical Area Level 2 (2016)
  • Click Search

This will bring up one dataset: NATSEM – Social and Economic Indicators – Synthetic Estimates SA2 2016. Select all of the attributes and click Add and Open.

This will open the dataset, which you can explore to look at the various estimates of social inequality, income and poverty at SA2 level across Australia.

Creating a Choropleth Map

Now we are quickly going to create a map of one of the variables in the dataset we have brought in. In my instance, I am going to map housing stress, but you can choose any of the variables.

A choropleth map is likely to be the most common kind of map visualisation that AURIN users will make with the data that they access. A choropleth (from Greek χώρο (“area/region”) + πλήθος (“multitude”)) is a thematic map in which areas are shaded or patterned in proportion to the measurement of a variable being displayed on the map

Choropleth maps can be created quickly and easily in the AURIN Portal, and provide a useful first pass at detecting and visualising interesting spatial patterns in your data.

To show how to do this, we will create the map above, but with a different colour palette. In order to do this:

Open the Choropleth tool (Maps, Charts and Graphs → Map Visualisations → Choropleth) and enter the parameters as shown in the image below. These will be explained below the image

To create a choropleth map, select Map Visualisations and then Choropleth in the Visualisations pop up window (shown below). This will bring up a range of fields that need to be populated.

  • Select a dataset: Here you can choose which of the datasets you would like to display as a map. In this exercise we use NATSEM – Social and Economic Indicators – Synthetic Estimates SA2 2016
  • Select an attribute: This is the field that you want to map. If you want your map to make sense, and actually display the variable you are interested in, it is important to make sure you have selected the right attribute to map together with the right classifier. For this example I chose Housing Stress 30/40 rule
  • Select a classifier: Here we define how we break up the range of values in the attribute. For an attribute that is numerical in format (either an integer or a decimal), the default setting for this field is Jenks (Natural Breaks), which breaks your data up into intuitive groups based on the shape of the distribution of values. You can instead select Quantiles or Equal Intervals. If your attribute is categorical, that is, a description or a word (such as a land use zone, or a name, or any kind of “string”), then the parameter will automatically be set to Pre-classified. For this example I chose Quantile.
  • Number of Classes: This slider allows you to define the number of breaks in your data (minimum of 3, maximum of 12). The number that you choose should depend on the distribution of your values, the number of data points (areas) and the information that you are trying to portray with your data. For this example I chose 5
  • Select a palette type: Here you can choose the type of colour scheme for your data: Sequential, which shifts from a shade of one colour to another; Qualitative, where the colours are unique along the palette (used for Pre-classified); and Diverging, where colours shift towards two different colours from a central point along a natural spectrum. For this instance I chose Sequential.
  • Palette: This allows you to choose the actual colours of your palette (you can switch the ends of the palette around by clicking the Reverse Palette box at the bottom of the box). AURIN uses colours generated by ColorBrewer. For this example I chose blue
  • Default Opacity: This slider allows you to define how opaque your map is over the base map. 0.00 indicates completely transparent, 1.00 indicates completely opaque. Here I selected 1
  • Hover Opacity: This slider allows you to define how opaque you want specific areas to be when you hover over them with a mouse, with the same values as for Default Opacity. Here we select 0.85
  • Name: The default for this field is “Choropleth-X”. It’s a good idea to change the name of this to something that reflects the data, particularly if you plan on having multiple choropleth maps from different datasets. The name that you choose here will also be displayed in the legend automatically generated for your map. Here we use the name: Housing Stress (30/40 rule) – Australian SA2s 2016

Once you have selected your parameters click Add and Display.

Your map should automatically appear on the screen, and should look something like the image below. It will also appear under your Visualise pane on the right. You can turn the map on and off by clicking on the little map icon to the left of the name of it in that pane.

You can edit aspects of the choropleth, such as the parameters you’ve chosen for its visualisation, or changing its name, by clicking on the spanner symbol to the right of the map name in the Visualise pane
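To see why the choice of classifier matters, the sketch below compares Equal Intervals with Quantiles on a small, invented set of housing stress percentages. It is an illustration of the two general techniques only, not the portal's own code, and it does not attempt Jenks (Natural Breaks). All function names are hypothetical.

```python
def equal_interval_breaks(values, n_classes):
    """Upper bounds of n_classes bins of equal width across the value range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes + 1)]

def quantile_breaks(values, n_classes):
    """Upper bounds chosen so each class holds roughly the same number of areas."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for i in range(1, n_classes + 1):
        # Integer ceiling of i * n / n_classes gives the rank of the last
        # value in the i-th class; subtract 1 for a 0-based index.
        rank = (i * n + n_classes - 1) // n_classes
        breaks.append(ordered[rank - 1])
    return breaks

def classify(value, breaks):
    """Return the 0-based class index of value, given upper-bound breaks."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1

def class_counts(values, breaks):
    """Count how many values fall into each class."""
    counts = [0] * len(breaks)
    for v in values:
        counts[classify(v, breaks)] += 1
    return counts

# Invented, skewed values standing in for a housing stress attribute (%).
stress = [2.0, 4.0, 5.0, 7.0, 8.0, 9.0, 11.0, 12.0, 30.0, 42.0]

ei = equal_interval_breaks(stress, 5)
qt = quantile_breaks(stress, 5)
print("Equal interval breaks:", ei, "counts:", class_counts(stress, ei))
print("Quantile breaks:      ", qt, "counts:", class_counts(stress, qt))
```

With skewed data like this, Equal Intervals puts most areas into the lowest class and leaves one class empty, while Quantiles spreads the areas evenly across the five colours. That is one reason to look at the distribution of your values before settling on a classifier.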

Adding Information to Points

Now that we’ve added this SA2 dataset to our session we want to add all of the attributes of each of the SA2s to the points of the survey respondents that fall within the SA2s

First, we need to spatialise this dataset, or turn it into a shapefile. At the moment it is just a table of data. To do this, navigate to the Spatialise Aggregated Dataset tool (Tools → Spatial Data Manipulation → Spatialise Aggregated Dataset) and select the dataset that you want to spatialise. In this instance we select NATSEM – Social and Economic Indicators – Synthetic Estimates SA2 2016, and just click Add and Run. This creates a dataset that we also need to rename in the right hand Data panel. I have renamed mine NATSEM Indicators: Spatialised
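Spatialising can be thought of as a table join: each row of the attribute table is matched to a boundary polygon by its area code. The sketch below illustrates that idea with invented codes, values and square "boundaries". The field name `sa2_code` and the function `spatialise` are hypothetical, and the portal's actual mechanism is not documented here.

```python
# Hypothetical attribute table keyed by an area code (values are invented).
attribute_rows = [
    {"sa2_code": "A", "housing_stress": 11.2},
    {"sa2_code": "B", "housing_stress": 7.5},
    {"sa2_code": "Z", "housing_stress": 9.9},  # no matching boundary
]

# Hypothetical boundary lookup: area code -> closed ring of (lon, lat) pairs.
boundaries = {
    "A": [(144.0, -38.0), (145.0, -38.0), (145.0, -37.0), (144.0, -37.0), (144.0, -38.0)],
    "B": [(145.0, -38.0), (146.0, -38.0), (146.0, -37.0), (145.0, -37.0), (145.0, -38.0)],
}

def spatialise(rows, boundaries, key="sa2_code"):
    """Attach a polygon geometry to each row; report codes with no boundary."""
    spatial, unmatched = [], []
    for row in rows:
        ring = boundaries.get(row[key])
        if ring is None:
            unmatched.append(row[key])
            continue
        spatial.append({**row, "geometry": {"type": "Polygon", "coordinates": [ring]}})
    return spatial, unmatched

spatial_rows, unmatched_codes = spatialise(attribute_rows, boundaries)
print(len(spatial_rows), "rows spatialised; unmatched codes:", unmatched_codes)
```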

Now that we’ve turned this dataset into a shapefile, we can overlay it over the points and create a new point dataset where all of these attributes of the SA2s have been added to the points. To do this we use the Point Join Tool

The Point Join tool allows you to associate a point dataset with all of the attributes of a polygon dataset that ‘overlays’ it. In this sense, the point join tool is similar to the ‘identify’ tool found in a number of desktop GIS platforms.

Open the Point Join tool (Tools → Spatial Data Manipulation → Point Join) and enter your parameters as shown below. You need to specify the point file that you want to join the data to, and the spatialised polygon file that will be joined. You can choose whether to include in the output file only those points that fall within a polygon boundary (and thus have some variables that can be joined to them).

Once you have entered the parameters click Add and Run

This may take some time, but it will eventually create a new dataset in your Data panel called Output: sphspatial1-join -XXX. Rename this something like Points with NATSEM Data. You can open this table by clicking on the little table icon next to it. It should look like this:

If you want to, you can download this as a shapefile, or you can create a choropleth of the points for any of the variables attached. However, be aware of the inferences that you draw – the characteristics are of the areas that the people live in, not of the people themselves!
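For readers curious about what a point join does underneath, the sketch below uses a standard ray-casting point-in-polygon test to find the polygon containing each point, then copies that polygon's attributes onto the point. It is a simplified illustration with invented data (simple polygons without holes, no handling of points exactly on a boundary), not the portal's implementation. `point_in_polygon`, `point_join` and `keep_unmatched` are hypothetical names.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside a closed ring of (x, y) vertices."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def point_join(points, polygons, keep_unmatched=False):
    """Copy the attributes of the containing polygon onto each point."""
    joined = []
    for point in points:
        match = next(
            (poly for poly in polygons
             if point_in_polygon(point["longx"], point["laty"], poly["ring"])),
            None,
        )
        if match is not None:
            joined.append({**point, **match["attributes"]})
        elif keep_unmatched:
            joined.append(dict(point))
    return joined

# Invented polygons (closed rings) with one attribute each.
polygons = [
    {"ring": [(144, -38), (145, -38), (145, -37), (144, -37), (144, -38)],
     "attributes": {"housing_stress": 11.2}},
    {"ring": [(145, -38), (146, -38), (146, -37), (145, -37), (145, -38)],
     "attributes": {"housing_stress": 7.5}},
]

# Invented survey points; the third falls outside both polygons.
points = [
    {"SEQID": 1, "longx": 144.5, "laty": -37.5},
    {"SEQID": 2, "longx": 145.5, "laty": -37.5},
    {"SEQID": 3, "longx": 150.0, "laty": -33.0},
]

matched_only = point_join(points, polygons)
all_points = point_join(points, polygons, keep_unmatched=True)
print(matched_only)
print(len(all_points))
```

The `keep_unmatched` flag mirrors the choice described above between keeping every point and keeping only those that fall inside a polygon.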

Counting Points in Polygons

Counting the number of points or events within an area is one of the key elements of spatial data aggregation, and is a crucial component of the de-identification process for sensitive data such as turning individual health records into counts within predefined areas.

The AURIN portal has a quick, easy to use count tool to aid in this process.

To illustrate the Count Point tool in use, we will count the number of survey respondents that fall within each SA4 in Australia

Firstly browse for and add the following dataset:

SA4 Aggregated Population & Dwelling Counts 2016 Census for Australia

Next, Spatialise this dataset and rename the output: Spatialised SA4 Populations

Open the Count Point tool (Analyse → Tools → Spatial Data Manipulation → Count Points in Polygons) and enter your parameters as shown below:

  • Points: Points: Geocoded
  • Polygons: Spatialised SA4 Populations
  • Count Attribute: Count

Once you have entered your parameters click Add and Run

Once your tool has run, click on the Display button in the pop-up dialogue box. This should open up the output table. Your count column will appear at the far left of the table, with the count of the number of survey respondents in each of the SA4s (shown below)
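The counting step can be sketched as follows: test each point against each polygon and tally the matches, so that the output holds one count per area rather than individual locations. This is an illustration with invented square polygons and points, using a standard ray-casting test. It is not the portal's code, and the names `point_in_polygon` and `count_points` are hypothetical.

```python
from collections import Counter

def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside a closed ring of (x, y) vertices."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_points(points, polygons):
    """Return {polygon name: number of points inside}, including zero counts."""
    counts = Counter()
    for x, y in points:
        for name, ring in polygons.items():
            if point_in_polygon(x, y, ring):
                counts[name] += 1
                break  # assume polygons do not overlap
    return {name: counts[name] for name in polygons}

# Invented, non-overlapping square polygons given as closed rings.
polygons = {
    "North": [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)],
    "South": [(0, -10), (10, -10), (10, 0), (0, 0), (0, -10)],
    "Empty": [(20, 20), (30, 20), (30, 30), (20, 30), (20, 20)],
}

# Invented points; the last one falls outside every polygon.
points = [(1, 1), (5, 5), (5, -5), (50, 50)]

result = count_points(points, polygons)
print(result)
```

Areas with no points are kept with a count of zero, which matters if you go on to compare the counts with another attribute of the same areas.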

 

Now, let’s have a look to see if there is a relationship between the population size of SA4s and the number of survey respondents. To do this, we will create a scatterplot.

Go to the Scatterplot Chart tool (Tools → Charts → Scatterplot) and enter your parameters as shown below, and then click Add and Run

Once the tool has finished running, click the Display button, and you will see a scatterplot like the one below, which shows that as the population sizes of the SA4s increase, so too does the number of survey respondents
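If you download the output table, you can put a number on the pattern seen in the scatterplot by computing a Pearson correlation coefficient. The sketch below does this with the standard formula on invented figures. The populations and counts are placeholders rather than results from this exercise, and `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented values standing in for SA4 populations and respondent counts.
populations = [100_000, 200_000, 300_000, 400_000]
respondents = [5, 9, 16, 20]

r = pearson_r(populations, respondents)
print(f"Pearson r = {r:.3f}")
```

A value close to 1 indicates a strong positive linear relationship. As with the map, this describes the areas, not the individual respondents.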