ANOVA

Introduction

ANOVA is an abbreviation of Analysis of Variance. The purpose of an ANOVA is to compare the means of several populations (groups) by testing whether the differences between the group means are statistically significant. For instance, we might have a range of values, say the heights of individuals, spread among 5 different ethnic groups, and we want to see whether there are significant differences between the groups in their mean heights; that is, whether ethnic group is a significant predictor of differences in height. If we only had two groups, we might run a t-test instead. An ANOVA is suited to cases where a categorical variable splits the data into more than two groups.

Inputs

To show the ANOVA tool in use, we will test whether there are significant differences in expenditure at gaming venues between Local Government Areas (LGAs) across Melbourne. To do this:

  • Select Melbourne GCCSA as your area
  • Select Gaming Venues 2013 for Victoria as your dataset, selecting all variables

Once you have done this, create a Centroid Choropleth of the gaming venues across Melbourne, using Expenditure 2012-2013 as your attribute. It should look something like the image shown below. Do you think that there is a relationship between location and expenditure at gaming venues?

[Image: Centroid Choropleth of gaming venue expenditure across Melbourne]

To run the ANOVA tool, open the tool (Tools → Statistical Analysis → ANOVA) and enter the parameters as shown below. These parameters are explained underneath the image as well.

[Image: ANOVA tool parameters]

  • Dataset Input: This is where we put the dataset that contains the variable that we want to test. In this instance, we select Gaming Venues 2013 for Victoria.
  • Dependent Variable: This is where we select the variable that we want to test. It has to be a ratio or interval variable. In this instance, we select Expenditure 2012-2013.
  • Independent Variable: This is where we select the variable that we think might be related to our dependent variable. It has to be a nominal (categorical) variable. In this instance, we select LGA Name.
  • FAMILY type: Here we can select from gaussian, binomial, gamma, poisson, inverse.gaussian, quasi, quasibinomial, quasipoisson. In this instance, we select the default, gaussian.
  • LINK type: Here we can select from logit, probit, cauchit, log, cloglog, identity, inverse, sqrt, 1/mu^2. In this instance, we select the default, identity.
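
The FAMILY and LINK options are named like those of a generalised linear model (GLM), in which the link function maps the expected value of the dependent variable (mu) onto the scale of the linear predictor (eta). The Python sketch below illustrates a few of the listed links using their standard GLM definitions. It is an assumption that the tool uses these definitions; the names LINKS and predicted_mean are hypothetical and are not part of the tool.

```python
import math

# Standard GLM link functions g(mu), paired with their inverses g^-1(eta).
# Assumption: the tool's link options follow these textbook definitions.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
    "inverse": (lambda mu: 1 / mu, lambda eta: 1 / eta),
    "sqrt": (math.sqrt, lambda eta: eta ** 2),
}


def predicted_mean(link_name, eta):
    """Map a linear-predictor value eta back to the mean scale (hypothetical helper)."""
    _, inverse_link = LINKS[link_name]
    return inverse_link(eta)


if __name__ == "__main__":
    # With the identity link, the linear predictor is the mean itself.
    print(predicted_mean("identity", 2.5))
    # With the log link, the mean is exp(eta), so it is always positive.
    print(predicted_mean("log", 0.0))
```

With the gaussian family and the identity link, the model is an ordinary linear model, so the fitted value for each LGA is simply that LGA's mean expenditure. This is the setting of a classical one-way ANOVA.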

Once you have entered the parameters, click Add and Run to execute the tool.

Outputs

Once your tool has finished running, click the Display button that appears in the pop-up dialogue box. This opens a simple text window with the outputs of your ANOVA analysis, as shown below. We are particularly interested in the first line, labelled lga, which summarises the test of whether there are significant differences in Expenditure 2012-2013 between the LGAs in the dataset.

[Image: ANOVA output text window]

These outputs are explained below:

  • Df: Degrees of Freedom for the ANOVA test. For the lga line, this is equal to n – 1, where n is the number of LGAs (i.e. there were 40 LGAs in the dataset).
  • Sum Sq: The Sum of Squares (SS) for the variable.
  • Mean Sq: The Mean Square (MS) for the variable, that is, its Sum of Squares divided by its Degrees of Freedom.
  • F value: The F statistic for the test, that is, the Mean Square for the variable divided by the residual Mean Square.
  • Pr(>F): The P value, which is the probability of obtaining an F statistic at least as large as the one observed, given your degrees of freedom, if there were no real differences between the group means. Our results are highly significant, that is, LGA was significantly associated with venue expenditure in our dataset.
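
To make the relationship between these columns concrete, here is a minimal Python sketch that builds the same quantities for a classical one-way ANOVA. It is an illustration only, not the tool's own implementation: the function name one_way_anova, the dictionary keys and the three small groups of values are all hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Compute one-way ANOVA table entries for a dict of group name -> values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)

    # Sum Sq for the grouping variable: spread of the group means around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Residual Sum Sq: spread of the observations around their own group mean.
    ss_within = sum(
        (v - mean(values)) ** 2
        for values in groups.values()
        for v in values
    )

    df_between = len(groups) - 1               # number of groups minus 1
    df_within = len(all_values) - len(groups)  # observations minus number of groups

    ms_between = ss_between / df_between       # Mean Sq = Sum Sq / Df
    ms_within = ss_within / df_within
    return {
        "df": df_between,
        "sum_sq": ss_between,
        "mean_sq": ms_between,
        "residual_df": df_within,
        "residual_mean_sq": ms_within,
        "f_value": ms_between / ms_within,     # F = Mean Sq / residual Mean Sq
    }


if __name__ == "__main__":
    # Hypothetical expenditure values for three made-up groups (not real LGA data).
    toy_groups = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [5, 6, 7]}
    for name, value in one_way_anova(toy_groups).items():
        print(f"{name}: {value:.2f}")
```

The Pr(>F) value is then the upper-tail probability of the F distribution with these two degrees of freedom, evaluated at the F value. It is left out of the sketch because computing it needs a statistics library.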

References