Winsor

Introduction

The Winsor tool transforms a variable to reduce the influence of outliers and extreme values before you run other statistical analyses. It does this by winsorising: the most extreme values in each tail of your variable (column) are replaced rather than removed, so the highest values are brought down and the lowest values are brought up to a level that you specify.
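
The idea can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function name `winsorise`, the sample values, and the count-based choice of cut-offs are all assumptions made for the example, and the tool may compute its cut-offs differently (for instance from interpolated quantiles).

```python
def winsorise(values, trim):
    """Replace the lowest and highest `trim` proportion of values
    with the nearest value that is kept (hypothetical helper)."""
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be at least 0 and less than 0.5")
    n = len(values)
    # Number of values to replace in each tail; the small tolerance
    # guards against floating-point error in trim * n.
    k = int(trim * n + 1e-9)
    if k == 0:
        return list(values)
    ordered = sorted(values)
    lower = ordered[k]          # smallest value that is kept
    upper = ordered[n - k - 1]  # largest value that is kept
    # Clamp every value into the range [lower, upper].
    return [min(max(v, lower), upper) for v in values]


# Made-up percentages for illustration only, not real housing data.
stress = [2.0, 7.5, 8.1, 8.4, 9.0, 9.3, 9.9, 10.2, 11.0, 25.0]
winsorised = winsorise(stress, 0.1)
print(winsorised)
```

With ten values and a trim of 0.1, one value in each tail is replaced: 2.0 becomes 7.5 and 25.0 becomes 11.0, while the other values are unchanged. Note that the number of rows stays the same, which is what distinguishes winsorising from simply dropping outliers.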

Inputs

To show the Winsor tool in action, we will run it on some housing stress data from Melbourne. To do this:

  • Select Melbourne GCCSA as your area
  • Select SA2 Housing Transport as your dataset, selecting Statistical Area Level 2 Name and Mortgage Stress – Percent as your variables

Once you have done this, open the Winsor tool (Tools → Statistical Analysis → Winsor) and enter your parameters as shown in the image below. Each parameter is explained in more detail under the image.

[Click to Enlarge]

  • Dataset input: The dataset containing the variable you would like to transform. In this instance we have selected SA2 Housing Transport
  • Input Variable: The variable that you would like to transform. In this case we select Mortgage Stress – Percent
  • New Variable Name: The name of the winsorised variable. This will appear as a new column in the output. Remember to include underscores instead of spaces, and no special characters. We have named our new variable Winsor_Mortgage_Stress_Percent
  • Trim Value: The proportion of each tail that will be trimmed, between 0 and 0.5. In this case, we have chosen 0.1

Once you have entered your parameters, click Add and Run to execute your tool.

Outputs

Once your tool has run, click the Display button to open the output table. It should look like the image below. If you sort the table from smallest to largest, you should see that the highest values in the Mortgage Stress – Percent column have been rounded down, and the lowest values rounded up, in the Winsor_Mortgage_Stress_Percent column.

[Click to Enlarge]

We can see this graphically by creating a scatterplot with the original variable on the X axis and the winsorised variable on the Y axis (shown below). Where the points would otherwise form a straight line, the values at the top and the bottom of the Y axis have been truncated, so the plot flattens at both ends.

[Click to Enlarge]
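
The shape of that scatterplot can also be checked numerically. The sketch below is only illustrative: the two lists stand in for the original and winsorised columns, the values are made up, and `describe` is a hypothetical helper. Points that sit on the line y = x were left unchanged, and points off the line were pulled in from one of the tails.

```python
# Illustrative (original, winsorised) columns; not real output from the tool.
original = [2.0, 7.5, 8.1, 9.0, 9.9, 11.0, 25.0]
winsorised = [7.5, 7.5, 8.1, 9.0, 9.9, 11.0, 11.0]


def describe(x, y):
    """Classify one scatterplot point (x = original, y = winsorised)."""
    if y > x:
        return "raised to the lower cut-off"
    if y < x:
        return "lowered to the upper cut-off"
    return "unchanged (on the line y = x)"


labels = [describe(x, y) for x, y in zip(original, winsorised)]
for x, y, label in zip(original, winsorised, labels):
    print(f"{x:>5} -> {y:>5}  {label}")
```

Only the first and last points fall off the diagonal here, which corresponds to the flattened ends of the plot.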