Hierarchical Clustering Distance Matrix

Introduction

A distance matrix is a matrix (a two-dimensional array) containing the pairwise distances between the points of a set. For N points (nodes or vertices), the matrix has size N x N, and the entry in row i, column j is the distance between point i and point j.
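As a minimal sketch of this definition, the Python below builds an N x N matrix of Euclidean distances from a list of points. The function names and the three example points are hypothetical. This is only an illustration of the idea, not the tool's own implementation.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two points with the same number of variables."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return the N x N matrix whose entry [i][j] is the distance from point i to point j."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]  # diagonal stays 0: each point is at distance 0 from itself
    for i in range(n):
        for j in range(i + 1, n):
            # Distances are symmetric, so compute each pair once and mirror it.
            d[i][j] = d[j][i] = euclidean(points[i], points[j])
    return d


# Hypothetical example: three areas, each described by two variables.
points = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
for row in distance_matrix(points):
    print(row)
# [0.0, 5.0, 10.0]
# [5.0, 0.0, 5.0]
# [10.0, 5.0, 0.0]
```

Because the matrix is symmetric with a zero diagonal, only the N(N-1)/2 entries above the diagonal need to be computed.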

The output of this tool is a chart showing how similar each area is to every other area when a range of variables is taken into account. The smaller the distance between two areas in the matrix, the more similar they are.

Inputs

To illustrate the use of the Hierarchical Clustering Distance Matrix tool, we will use a dataset containing a number of related variables: Income, Inequality and Financial Stress across the Greater Hobart area. To do this:

  • Select Greater Hobart GCCSA as your area
  • Select SA2 OECD Indicators: Income, Inequality and Financial Stress 2011 as your dataset

Once you have done this, open the Hierarchical Clustering Distance Matrix tool (Tools → Charts → Hierarchical Clustering Distance Matrix) and enter the parameters as shown in the image below (they are also explained in more detail under the image).


The parameters that need to be entered are:

  • Cluster Analysis Hierarchical Dataset Input: Select a dataset that contains the variables of interest. Here we select SA2 OECD Indicators: Income, Inequality and Financial Stress 2011.
  • Cluster Analysis Variable List: The set of variables used to measure how similar the areas are. Here we select Disposable Income (Synthetic Data), Gini Coefficient (Synthetic Data), Poverty Rate (Synthetic Data), % with no access to emergency money (Synthetic Data) and % Can’t afford a night out (Synthetic Data).
  • Cluster Analysis Distance Metric: The distance measure to be used. This must be one of “euclidean”, “maximum”, “manhattan”, “canberra”, “binary” or “minkowski”. Here we select Euclidean.
  • Cluster Analysis Cluster Metric: The agglomeration method (linkage rule) to be used. This must be one of “ward”, “single”, “complete”, “average”, “mcquitty”, “median” or “centroid”. Here we select Complete.
  • Chart Title: A title for your Hierarchical Clustering Distance Matrix chart.
  • Grid: Specify whether you would like gridlines on your output chart.
  • Greyscale: Specify whether you would like your chart to be greyscale (checked) or colour (unchecked).
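The six distance options share their names with the methods of R's `dist` function. As a rough sketch, assuming the tool follows the formulas given in R's `dist` documentation, each can be written for two numeric vectors as below. The `distance` function is hypothetical. For Canberra it simply skips terms where both the difference and the sum are zero, whereas R treats such terms as missing and may rescale the sum, so results can differ on such data; for binary it returns 0.0 when neither vector has a non-zero element, which is a choice made for this sketch.

```python
import math
from typing import Sequence


def distance(a: Sequence[float], b: Sequence[float],
             method: str = "euclidean", p: float = 2.0) -> float:
    """Distance between two equal-length numeric vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    diffs = [abs(x - y) for x, y in zip(a, b)]

    if method == "euclidean":      # square root of the sum of squared differences
        return math.sqrt(sum(d * d for d in diffs))
    if method == "maximum":        # largest single difference
        return max(diffs)
    if method == "manhattan":      # sum of absolute differences
        return sum(diffs)
    if method == "minkowski":      # p-th root of the sum of p-th powers
        return sum(d ** p for d in diffs) ** (1.0 / p)
    if method == "canberra":       # sum of |x - y| / |x + y|
        total = 0.0
        for x, y, d in zip(a, b, diffs):
            denom = abs(x + y)
            if denom == 0:
                if d == 0:
                    continue        # 0/0 term: skipped in this sketch
                return math.inf     # non-zero difference over a zero sum
            total += d / denom
        return total
    if method == "binary":
        # Non-zero values count as "on". Distance = share of positions where
        # exactly one vector is on, among positions where at least one is on.
        pairs = [(x != 0, y != 0) for x, y in zip(a, b)]
        either = sum(1 for u, v in pairs if u or v)
        only_one = sum(1 for u, v in pairs if u != v)
        return only_one / either if either else 0.0
    raise ValueError(f"unknown method: {method}")


a, b = [0.5, 2.0, 0.0], [3.5, 6.0, 0.0]
for m in ["euclidean", "maximum", "manhattan", "canberra", "minkowski"]:
    print(m, distance(a, b, m, p=3.0))  # p only affects "minkowski"
print("binary", distance([1, 0, 1, 0], [1, 1, 0, 0], "binary"))
```

For the vectors above, the Euclidean, maximum, Manhattan and Canberra distances come out as 5.0, 4.0, 7.0 and 1.25, and the binary example gives 2/3. Minkowski with p = 1 reduces to Manhattan and with p = 2 to Euclidean.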

Note: See the Hierarchical Cluster Analysis documentation for further details.
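The agglomeration method decides how the distance between two clusters is derived from the distances between their members. Under complete linkage, used in this example, it is the largest member-to-member distance; under single linkage, it is the smallest. The sketch below is a naive illustration (the `agglomerate` function and the example positions are hypothetical): it supports only these two rules and recomputes cluster distances from scratch at every step, rather than using the more efficient update formulas found in library implementations.

```python
from typing import List, Tuple

Cluster = Tuple[int, ...]


def agglomerate(d: List[List[float]], method: str = "complete") -> List[Tuple[Cluster, Cluster, float]]:
    """Merge items bottom-up; return each merge as (cluster_a, cluster_b, height)."""
    if method == "complete":
        combine = max   # cluster distance = largest member-to-member distance
    elif method == "single":
        combine = min   # cluster distance = smallest member-to-member distance
    else:
        raise ValueError("this sketch supports only 'complete' and 'single'")

    clusters: List[Cluster] = [(i,) for i in range(len(d))]
    merges = []
    while len(clusters) > 1:
        best = None  # (height, index_i, index_j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                h = combine(d[p][q] for p in clusters[i] for q in clusters[j])
                if best is None or h < best[0]:
                    best = (h, i, j)
        h, i, j = best
        merges.append((clusters[i], clusters[j], h))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [clusters[i] + clusters[j]]
    return merges


# Hypothetical example: four items on a line, distance = gap between positions.
positions = [0.0, 1.0, 5.0, 11.0]
d = [[abs(x - y) for y in positions] for x in positions]
for merge in agglomerate(d, "complete"):
    print(merge)
# ((0,), (1,), 1.0)
# ((2,), (0, 1), 5.0)
# ((3,), (2, 0, 1), 11.0)
```

With `"single"` instead, item 2 joins the first cluster at height 4.0 and the final merge happens at 6.0 rather than 11.0, which shows how the linkage rule changes the shape of the hierarchy even when the distance matrix is the same.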

Once you have selected your parameters, click the Add and Run button.

Outputs

Once you have run the tool, click the Display button that appears in the pop-up dialogue box. This should open a chart like the one shown below.


The output indicates that, with respect to these variables, all of the SA2s within Hobart are more similar to each other than they are to the Mount Wellington SA2. The most similar pair of SA2s is West Hobart and Margate-Snug.
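Observations like these can be checked directly against the underlying distance matrix. The sketch below uses hypothetical labels and made-up distances, not the Hobart data. It finds the most similar pair (the smallest off-diagonal distance) and, as a rough proxy for the area that stands apart from the rest, the one with the largest average distance to all others. The hierarchy built by the chosen linkage method is the more faithful basis for that second judgement; the average is only a quick check.

```python
from typing import List, Sequence, Tuple


def closest_pair(labels: Sequence[str], d: List[List[float]]) -> Tuple[str, str, float]:
    """Return the two labels with the smallest off-diagonal distance, and that distance."""
    best = None
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if best is None or d[i][j] < best[2]:
                best = (labels[i], labels[j], d[i][j])
    return best


def most_isolated(labels: Sequence[str], d: List[List[float]]) -> str:
    """Return the label whose average distance to every other item is largest."""
    n = len(labels)
    averages = [sum(d[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    return labels[max(range(n), key=averages.__getitem__)]


# Hypothetical example: four areas whose distances are gaps between positions on a line.
labels = ["Area A", "Area B", "Area C", "Area D"]
positions = [0.0, 1.0, 5.0, 11.0]
d = [[abs(x - y) for y in positions] for x in positions]
print(closest_pair(labels, d))   # ('Area A', 'Area B', 1.0)
print(most_isolated(labels, d))  # Area D
```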
