# ANOVA

## Introduction

ANOVA is an abbreviation of *Analysis of Variance*. The purpose of an ANOVA is to compare the means of several populations (groups) by testing whether the differences between the group means are statistically significant. For instance, we might have a range of values, say the heights of individuals, spread among 5 different ethnic groups, and we want to see whether the groups differ significantly in their mean heights; that is, whether ethnic group is a significant predictor of differences in height. If we only had two groups, we could run a **T-test** instead. An ANOVA suits the case where the categorical variable has several categories that we want to compare.
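Stated symbolically, as a sketch in our own notation rather than the tool's (here $k$ is the number of groups and $\mu_i$ the population mean of group $i$, both symbols introduced only for illustration), a one-way ANOVA tests:

$$
H_0:\ \mu_1 = \mu_2 = \cdots = \mu_k
\qquad\text{against}\qquad
H_1:\ \mu_i \neq \mu_j \ \text{for at least one pair } i \neq j
$$

In the height example $k = 5$. With $k = 2$ the null hypothesis reduces to $\mu_1 = \mu_2$, which is the comparison a two-sample T-test makes.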

## Inputs

To show the ANOVA tool in use, we will test whether there are significant differences in expenditure at gaming venues between LGAs across Melbourne. To do this:

- **Select** *Melbourne GCCSA* as your area
- **Select** *Gaming Venues 2013 for Victoria* as your dataset, selecting all variables

Once you have done this, create a **Centroid Choropleth** of the gaming venues across Melbourne, using *Expenditure 2012 – 2013* as your attribute. It should look something like the image shown below. Do you think that there is a relationship between location and the expenditure at gaming venues?

Next, open the ANOVA tool (*Tools → Statistical Analysis → ANOVA*) and enter the parameters as shown below. These parameters are explained underneath the image as well.

- *Dataset Input:* This is where we put the dataset that contains the variable that we want to test. In this instance, we select *Gaming Venues 2013 for Victoria*
- *Dependent Variable:* This is where we select the variable that we want to test. It has to be a **ratio or interval variable**. In this instance, we select *Expenditure 2012-2013*
- *Independent Variable:* This is where we select the variable that we think might be related to our dependent variable. It has to be a **nominal or categorical variable**. In this instance, we select *LGA Name*
- *FAMILY type:* Here we can select from *gaussian, binomial, gamma, poisson, inverse.gaussian, quasi, quasibinomial, quasipoisson*. In this instance we select the default, *gaussian*
- *LINK type:* Here we can select from *logit, probit, cauchit, log, cloglog, identity, inverse, sqrt, 1/mu^2*. In this instance we select the default, *identity*
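The two variable roles can also be illustrated outside the portal. The following is a minimal Python sketch, not the tool's own implementation: it groups a numeric dependent variable by a categorical independent variable and reports the mean of each group, which is the value a gaussian model with an identity link fits for each category of a single categorical predictor. The LGA names, expenditure figures, and function names are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (LGA name, venue expenditure). All values are invented.
venues = [
    ("LGA A", 1200.0),
    ("LGA A", 1500.0),
    ("LGA B", 3100.0),
    ("LGA B", 2900.0),
    ("LGA B", 3300.0),
    ("LGA C", 800.0),
    ("LGA C", 1000.0),
]


def group_values(records):
    """Collect the dependent values under each level of the categorical variable."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(float(value))
    return dict(groups)


def group_means(records):
    """Mean of the dependent variable within each category."""
    return {name: mean(vals) for name, vals in group_values(records).items()}


for name, m in sorted(group_means(venues).items()):
    print(f"{name}: mean expenditure = {m:.1f}")
```

The ANOVA then asks whether these group means differ by more than the spread of values within each group would lead us to expect.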

Once you have entered the parameters, click *Add and Run* to execute the tool.

## Outputs

Once your tool has finished running, click the *Display* button that appears in the pop-up dialogue box. This opens a simple text window with the outputs of your ANOVA analysis, as shown below. We are particularly interested in the first line, labelled *lga*, which summarises the test of whether there are significant differences in *Expenditure 2012-2013* by LGA grouping in the dataset.

- *Df:* Degrees of Freedom for the ANOVA test. For the *lga* line, this is equal to *n - 1*, where *n* is the number of groups (i.e. there were 40 LGAs in the dataset)
- *Sum Sq:* The Sum of Squares (SS) for the variable
- *Mean Sq:* The Mean Square (MS) for each variable, that is, its Sum of Squares divided by its Degrees of Freedom
- *F value:* The F Statistic for the test
- *Pr(>F):* The P value: the probability of getting an F statistic at least as large as yours, with your degrees of freedom, by chance alone (that is, if there were no real differences between the group means)

Our results are highly significant; that is, LGA was significantly associated with venue expenditure in our dataset.
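To make the columns of the output concrete, here is a small Python sketch that computes them for three invented groups. It is an illustration under stated assumptions, not the portal's code: the data and function names are hypothetical, and because the Python standard library has no F distribution, the P value is approximated with a permutation test (shuffling the group labels) rather than looked up from the F distribution as a standard ANOVA does.

```python
import random
from statistics import mean


def one_way_anova(groups):
    """Return Df, Sum Sq, Mean Sq and F for a one-way ANOVA.

    `groups` maps a group label to a list of numeric observations.
    Each tuple holds (between groups, within groups) values.
    """
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = mean(all_values)
    means = {name: mean(vals) for name, vals in groups.items()}

    # Between groups: spread of the group means around the grand mean.
    ss_between = sum(len(vals) * (means[name] - grand_mean) ** 2
                     for name, vals in groups.items())
    # Within groups (residual): spread of values around their own group mean.
    ss_within = sum((v - means[name]) ** 2
                    for name, vals in groups.items() for v in vals)

    df_between = len(groups) - 1                 # number of groups minus 1
    df_within = len(all_values) - len(groups)    # observations minus groups
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "Df": (df_between, df_within),
        "Sum Sq": (ss_between, ss_within),
        "Mean Sq": (ms_between, ms_within),
        "F value": ms_between / ms_within,
    }


def permutation_p_value(groups, n_permutations=2000, seed=0):
    """Approximate Pr(>F) by shuffling values across the group labels.

    This is a stand-in for the F-distribution lookup, not the same calculation.
    """
    rng = random.Random(seed)
    observed = one_way_anova(groups)["F value"]
    labels = [name for name, vals in groups.items() for _ in vals]
    values = [v for vals in groups.values() for v in vals]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(values)
        shuffled = {}
        for name, v in zip(labels, values):
            shuffled.setdefault(name, []).append(v)
        if one_way_anova(shuffled)["F value"] >= observed:
            hits += 1
    # Count the observed arrangement itself so the estimate is never zero.
    return (hits + 1) / (n_permutations + 1)


# Invented data: three groups of three observations each.
example = {
    "A": [10.0, 12.0, 14.0],
    "B": [20.0, 22.0, 24.0],
    "C": [30.0, 32.0, 34.0],
}

table = one_way_anova(example)
for column, (between, within) in list(table.items())[:3]:
    print(f"{column}: between groups = {between}, within groups = {within}")
print("F value:", table["F value"])
print("Approximate Pr(>F):", permutation_p_value(example))
```

For this data the between-group row has 2 degrees of freedom, a Sum of Squares of 600 and a Mean Square of 300; the within-group row has 6 degrees of freedom, a Sum of Squares of 24 and a Mean Square of 4, giving an F value of 300 / 4 = 75. A large F value like this arises when the group means are far apart relative to the variation inside each group.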
