Theil Index
Introduction
The Theil Index is a common measure of concentration and dispersion. It has historically been used to measure income inequality, but it can be used to measure the dispersion of any variable across the areas of a study region, relative to the region as a whole.
The Theil Index is calculated as follows:
\(T=\sum_{i}\frac{y_{i}}{Y}\ln\left(\frac{y_{i}/Y}{n_{i}/N}\right)\)
(Akita and Kawamura, 2002)
Where \(y_{i}\) is the count of the variable of interest (such as the number of unemployed people) in area \(i\), \(Y\) is the total count of that variable across the entire study area, \(n_{i}\) is the total population count in area \(i\), and \(N\) is the total population count across the entire study area.
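The formula can be sketched in Python as follows. This is a minimal illustration, not the CofFEE tool's implementation. The function name `theil_index` is hypothetical. Skipping areas with \(y_{i}=0\) (treating \(0\cdot\ln 0\) as 0) is an assumed convention. An area with a zero population but a non-zero count would raise `ZeroDivisionError`.

```python
import math

def theil_index(y, n):
    """Theil Index T = sum_i (y_i / Y) * ln((y_i / Y) / (n_i / N)).

    y -- counts of the variable of interest per area (y_i)
    n -- total population counts per area (n_i)
    """
    if len(y) != len(n):
        raise ValueError("y and n need one entry per area")
    Y = sum(y)  # total of the variable of interest over the study area
    N = sum(n)  # total population over the study area
    t = 0.0
    for y_i, n_i in zip(y, n):
        if y_i == 0:
            # Assumed convention: 0 * ln(0) = 0, so empty areas add nothing.
            continue
        share_y = y_i / Y  # area's share of the variable of interest
        share_n = n_i / N  # area's share of the population
        t += share_y * math.log(share_y / share_n)
    return t

# Hypothetical toy data: each area's share of y matches its share of n.
print(theil_index([10, 20, 30], [100, 200, 300]))  # -> 0.0
```

Each term compares an area's share of the variable with its share of the population. Where the two shares match, the logarithm is zero, so that area adds nothing to \(T\).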
The Theil Index ranges from zero to the natural log (\(\ln\)) of the number of categories, \(n\) (in this case, the number of areas within the study area). Strictly, this upper bound applies when all areas have equal populations. With unequal populations, the maximum is \(\ln(N/n_{i})\) for the least populous area \(i\). It is common to standardise the Theil Index so that it ranges between zero and one, which is done by dividing by \(\ln(n)\). A Theil Index of zero indicates perfect equality: every area's share of the variable of interest equals its share of the population. Conversely, a standardised Theil Index value of one represents perfect inequality, where one area holds all of the variable of interest.
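A sketch of the standardisation step follows. It assumes, as described above, that the raw index is divided by \(\ln(n)\), where \(n\) is the number of areas. The function names and toy data are hypothetical. The last case illustrates a caveat: when populations are unequal and the variable is concentrated in a small area, the standardised value can exceed one.

```python
import math

def theil_index(y, n):
    """Raw Theil Index from per-area counts y_i and populations n_i."""
    Y, N = sum(y), sum(n)
    # Areas with y_i == 0 contribute nothing (0 * ln 0 is taken as 0).
    return sum((yi / Y) * math.log((yi / Y) / (ni / N))
               for yi, ni in zip(y, n) if yi > 0)

def standardised_theil_index(y, n):
    """Divide the raw index by ln(number of areas)."""
    num_areas = len(y)
    if num_areas < 2:
        raise ValueError("need at least two areas, since ln(1) = 0")
    return theil_index(y, n) / math.log(num_areas)

# Hypothetical toy data: four areas with equal populations.
pop = [100, 100, 100, 100]
print(standardised_theil_index([5, 5, 5, 5], pop))   # -> 0.0 (equal shares)
print(standardised_theil_index([20, 0, 0, 0], pop))  # -> 1.0 (all in one area)

# Unequal populations, everything in the smallest area: result is above 1.
print(standardised_theil_index([20, 0, 0, 0], [10, 30, 30, 30]))
```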
Inputs
To show the Theil Index tool in use, we will run it to calculate the index for the distribution of female youth unemployment across Western Australia (WA).
To do this:
- Select Western Australia as your area.
- Select SA2 OECD Indicators: Unemployment Rates 2011 as your dataset, with the following variables:
  - SA2 Name
  - Unemployed females 15 – 24
  - Females in labour force 15 – 24
Once you have done this, open the Theil Index tool (Tools → Spatial Statistics → Theil Index) and enter the parameters as shown in the image below. Each parameter is explained beneath the image:
- Dataset input: This is the dataset containing the values to include in the Theil Index calculation. In this instance, select SA2 OECD Indicators: Unemployment Rates 2011.
- Numerator: This is the column containing each area's count of the variable whose distribution across the study region you want to measure (\(y_{i}\) in the formula). In this instance, select Unemployed females 15 – 24.
- Denominator: This is the column containing each area's total count of the population from which the numerator is drawn (\(n_{i}\) in the formula). In this instance, select Females in labour force 15 – 24.
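The following sketch makes the parameter mapping concrete: the Numerator column supplies \(y_{i}\) and the Denominator column supplies \(n_{i}\). The row structure, the function name `theil_from_columns` and the numbers are hypothetical illustrations, not the tool's internals or real SA2 figures. The check that each numerator does not exceed its denominator is an added assumption. It rests on the numerator being drawn from the denominator population.

```python
import math

# Column names mirror the variables selected above (assumed labels).
NUM = "Unemployed females 15 – 24"
DEN = "Females in labour force 15 – 24"

def theil_from_columns(rows, numerator, denominator):
    """Raw Theil Index from dict rows: numerator -> y_i, denominator -> n_i."""
    y = [row[numerator] for row in rows]
    n = [row[denominator] for row in rows]
    for yi, ni in zip(y, n):
        # Assumed sanity check: the numerator is a subset of the denominator.
        if yi < 0 or yi > ni:
            raise ValueError(f"numerator {yi} is outside 0..{ni}")
    Y, N = sum(y), sum(n)
    return sum((yi / Y) * math.log((yi / Y) / (ni / N))
               for yi, ni in zip(y, n) if yi > 0)

# Hypothetical toy rows with invented values, not real SA2 data.
rows = [
    {"SA2 Name": "Area A", NUM: 10, DEN: 100},
    {"SA2 Name": "Area B", NUM: 30, DEN: 100},
]
print(theil_from_columns(rows, NUM, DEN))
```

Both toy areas have the same labour force, but Area B holds three quarters of the unemployed, so the index is above zero.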
Once you have entered the parameters, click Add and Run to execute the tool.
Outputs
Once you have run the tool, click the Display button in the pop-up dialogue box. This should open a text box like the one shown below, which contains the Theil Index values for your variable (raw and standardised). In this instance, the standardised value is 0.0089, which suggests low inequality in the distribution of female youth unemployment across WA.
References
- Akita, T. and Kawamura, K. (2002) ‘Regional income inequality in China and Indonesia: A comparative analysis’, Paper presented at the 46th Congress of European Regional Science Association (ERSA), Dortmund, August.
- CofFEE Spatial Statistics Tools Help File