Hierarchical Clustering Tree Chart

Introduction

The tree chart (also known as a dendrogram) is a node-link diagram that provides a visual summary of Hierarchical Cluster Analysis output. It is commonly used to determine the number of clusters and to spot outliers. The tree chart displays the hierarchical structure implied by the similarity matrix and built up by the chosen linkage rule.
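
The way a linkage rule turns pairwise distances into a tree can be sketched in a few lines of Python. This is only an illustration, assuming Euclidean distances and a naive search over cluster pairs; the function name agglomerate and the toy points are hypothetical and are not taken from the tool, which may be implemented differently.

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)


def agglomerate(points, linkage=max):
    """Naive agglomerative clustering (illustrative only).

    points  -- dict mapping a label to a tuple of numbers
    linkage -- reduces the pairwise distances between two clusters to one
               number: max gives complete linkage, min gives single linkage
    Returns the merges in order, as (cluster_a, cluster_b, height) tuples.
    """
    clusters = [(label,) for label in points]  # every item starts alone
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for a, b in combinations(clusters, 2):
            height = linkage(dist(points[i], points[j]) for i in a for j in b)
            if best is None or height < best[2]:
                best = (a, b, height)
        a, b, height = best
        merges.append(best)
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical toy data: three nearby points and one distant point.
toy = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (4.0, 0.0), "D": (10.0, 10.0)}
for a, b, height in agglomerate(toy):
    print(a, b, round(height, 3))
```

Each merge corresponds to one join in the tree chart, and the merge height is the distance at which the two branches meet. Passing linkage=min instead of the default switches the sketch from complete to single linkage.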

The output is a chart that shows how similar the selected areas are to one another across a range of variables. The earlier two areas are joined in the tree (that is, the smaller the distance at which their branches meet), the more similar they are.

Inputs

To illustrate the use of the Hierarchical Clustering Tree Chart tool, we will use a dataset containing several related variables: Income, Inequality and Financial Stress across the Greater Hobart area. To do this:

  • Select Greater Hobart GCCSA as your area
  • Select SA2 OECD Indicators: Income, Inequality and Financial Stress 2011 as your dataset

Once you have done this, open the Hierarchical Clustering Tree Chart tool (Tools → Charts → Hierarchical Clustering Tree Chart) and enter the parameters as shown in the image below (they are also explained in more detail under the image).

[Image: parameter settings for the Hierarchical Clustering Tree Chart tool]

The parameters that need to be entered are:

  •  Cluster Analysis Hierarchical Dataset Input: Select a dataset that contains the variables of interest. Here we select SA2 OECD Indicators: Income, Inequality and Financial Stress 2011.
  •  Cluster Analysis Variable List: A set of independent variables. Here we select Disposable Income (Synthetic Data), Gini Coefficient (Synthetic Data), Poverty Rate (Synthetic Data), % with no access to emergency money (Synthetic Data), % Can’t afford a night out (Synthetic Data).
  •  Cluster Analysis Distance Metric: The distance measure to be used. This must be one of “Euclidean”, “maximum”, “manhattan”, “Canberra”, “binary” or “minkowski”. Here we select Euclidean.
  •  Cluster Analysis Cluster Metric: The agglomeration method (linkage rule) to be used. This should be one of “ward”, “single”, “complete”, “average”, “mcquitty”, “median” or “centroid”. Here we select Complete.
  •  Chart Title: A title for your Hierarchical Clustering Tree Chart
  •  Grid: Specify whether you would like gridlines on your output graph
  •  Greyscale: Specify whether you would like your graph to be grey-scale (checked) or colour (unchecked)
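
The distance metric options listed above can be written out as short Python functions. These follow the standard textbook definitions for each name; whether the tool computes every metric in exactly this way (for example, how it treats zero values in the Canberra and binary metrics) is an assumption, and the default p=2 for the Minkowski metric is chosen here purely for illustration.

```python
def euclidean(x, y):
    # Straight-line distance: square root of the summed squared differences.
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def maximum(x, y):
    # Largest absolute difference on any single variable.
    return max(abs(a - b) for a, b in zip(x, y))


def manhattan(x, y):
    # Sum of absolute differences.
    return sum(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    # Each difference is scaled by the size of the two values.
    # Positions where both values are zero are skipped to avoid 0/0.
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(x, y) if a or b)


def minkowski(x, y, p=2):
    # Generalises manhattan (p=1) and euclidean (p=2).
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def binary(x, y):
    # Non-zero values count as "on". The distance is the share of positions
    # where exactly one vector is on, among positions where at least one is.
    on = [(bool(a), bool(b)) for a, b in zip(x, y) if a or b]
    return sum(a != b for a, b in on) / len(on) if on else 0.0


# Hypothetical pair of observations with three variables each.
x, y = (1.0, 2.0, 0.0), (4.0, 6.0, 0.0)
for metric in (euclidean, maximum, manhattan, canberra, minkowski, binary):
    print(metric.__name__, metric(x, y))
```

The choice of metric changes which areas count as close, so the same dataset can produce differently shaped trees under different settings.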

Note: Please see the documentation for Hierarchical Cluster Analysis for further details.

Once you have selected your parameters, click the Add and Run button.

Outputs

Once you have run the tool, click the Display button that appears in the pop-up dialogue box. This should open a chart like the one shown below.

[Image: example Hierarchical Clustering Tree Chart output]

The outputs indicate that the SA2s of Hobart are more similar to each other than any of them is to the SA2 of Mount Wellington, with respect to the variables selected above. They also show that the most similar SA2s are West Hobart and Margate-Snug.
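
The same kind of reading can be expressed in code. The sketch below assumes the tree is available as a list of merges, each recording the two clusters joined and the height at which they meet. The labels A to D and all heights are hypothetical and are not the Hobart results; the helper names are likewise invented for illustration.

```python
# Hypothetical merge history: (left cluster, right cluster, merge height).
merges = [
    (("A",), ("B",), 1.0),
    (("C",), ("A", "B"), 4.1),
    (("D",), ("C", "A", "B"), 14.1),
]


def most_similar_pair(merges):
    """Return the two clusters joined at the lowest height."""
    a, b, _ = min(merges, key=lambda m: m[2])
    return a, b


def last_joiners(merges):
    """Single items that only join the tree at the final merge.

    An item that stays alone until the very end is a candidate outlier.
    """
    a, b, _ = merges[-1]
    return [c[0] for c in (a, b) if len(c) == 1]


def clusters_at(merges, labels, cut):
    """Clusters obtained by cutting the tree at height `cut`."""
    membership = {label: (label,) for label in labels}
    for a, b, height in sorted(merges, key=lambda m: m[2]):
        if height > cut:
            break  # merges above the cut are ignored
        merged = a + b
        for label in merged:
            membership[label] = merged
    return sorted(set(membership.values()))


print(most_similar_pair(merges))
print(last_joiners(merges))
print(clusters_at(merges, "ABCD", cut=5.0))
```

In this toy history, A and B play the role of the most similar pair, while D behaves like an area that is unlike all the others. Cutting the tree at a chosen height is one common way to settle on a number of clusters.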
