Linear Model

Introduction

In its simplest form, a linear model specifies the linear relationship between a dependent (or response) variable Y and a set of predictor variables, the Xs, so that:

\(Y = b_{0} + b_{1}X_{1} + b_{2}X_{2} + \dots + b_{k}X_{k}\)

In this equation, \(b_{0}\) is the intercept and \(b_{1}\) through \(b_{k}\) are the regression coefficients for predictor variables 1 through k. All of them are estimated from the data.
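As a rough illustration of what "computed from the data" means, the sketch below fits the single-predictor case (k = 1) by ordinary least squares, the method used by R's lm function. It is written in Python purely for illustration; the function name and the data are hypothetical and are not part of the tool.

```python
from statistics import mean


def fit_simple_linear_model(x, y):
    """Return (b0, b1) that minimise squared error for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations of x and y, and sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Hypothetical data generated exactly from y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_simple_linear_model(x, y)
print(b0, b1)
```

Because the hypothetical data lie exactly on a line, the fitted intercept and slope recover the values used to generate them. With several predictors the same idea applies, but the coefficients are solved jointly rather than one at a time.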

Inputs

To demonstrate the Linear Model tool, we will run it on income and poverty data for Sydney. To do this:

  • Select Sydney GCCSA as your area
  • Select SA2 OECD Indicators: Income, Inequality and Financial Stress 2011 as your dataset, selecting all variables

Once you have done this, open the tool (Tools → Statistical Analysis → Linear Model) and enter your parameters, which are explained below the image.

[Click to Expand]

  • Dataset input: Here you select the dataset that you would like to include in the Linear Model. In this instance we select SA2 OECD Indicators: Income, Inequality and Financial Stress 2011.
  • Formula: This parameter can be tricky for users who are not familiar with R syntax, because the formula must be entered in R's formula format. The basic model is “Y regressed on X”, which is denoted in R as:

\(Y \sim X \)

where Y is the name of the dependent variable, X is the name of the independent variable (the variable whose effect on the dependent variable you are testing), and the tilde symbol ( ~ ) means “regressed on”. In this format, you can add multiple independent variables by writing the model as:

\(Y \sim X_{1} + X_{2} + X_{3} \)

Additional components of the model, such as interaction terms, and how to enter them can be found in the link above.

For this example, we will enter the formula shown below:

poverty_rate_synthetic_estimates ~ median_disposable_household_income_synthetic_estimates
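To make the structure of the formula explicit, the following minimal sketch splits a formula into its dependent variable and its list of independent variables. It is a hypothetical helper written in Python for illustration only: it is not R's formula parser, it is not part of the tool, and it handles only additive terms joined by +.

```python
def split_formula(formula):
    """Split 'Y ~ X1 + X2' into (response, [predictors]).

    Illustrative only: supports additive terms, not interactions
    or other operators of R's formula syntax.
    """
    response, sep, rhs = formula.partition("~")
    if not sep:
        raise ValueError("formula must contain '~'")
    predictors = [term.strip() for term in rhs.split("+") if term.strip()]
    return response.strip(), predictors


formula = (
    "poverty_rate_synthetic_estimates ~ "
    "median_disposable_household_income_synthetic_estimates"
)
response, predictors = split_formula(formula)
print("Dependent variable:", response)
print("Independent variables:", predictors)
```

Here the poverty rate is the dependent variable and median disposable household income is the single independent variable.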

It is important to enter the variable names, not their titles, into the formula. Names are the “machine readable” identifiers of the variables, whereas titles are the “human readable” labels. You can find the names in the metadata of your dataset (shown in red below) and copy and paste them into the formula box.

[Click to Enlarge]

Once you have entered the parameters, click Add and Run to execute the tool.

Outputs

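Once you have run the tool, click the Display button in the pop-up dialogue box that appears. This opens a text editor containing the outputs of your Linear Model (as illustrated below). The output shows a small (\(R^{2} = 0.0575\), boxed in blue) but statistically significant (P reported as 0, boxed in red) effect of median disposable household income on poverty rates for this dataset. In other words, the model explains about 5.75% of the variance in poverty rates. A P value is never exactly zero, so a reported 0 means the value is too small to show at the displayed precision.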

[Click to Enlarge]
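For readers who want to see where the R² value comes from, the sketch below computes it for a single-predictor model as one minus the ratio of the residual sum of squares to the total sum of squares. It is written in Python for illustration only. The function name and the data are hypothetical and unrelated to the Sydney dataset.

```python
from statistics import mean


def r_squared(x, y):
    """Fit y = b0 + b1 * x by least squares and return R-squared."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Variation left unexplained by the fitted line
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation of y around its mean
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: a noisy upward trend
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
r2 = r_squared(x, y)
print(round(r2, 2))
```

An R² close to 1 means the predictor accounts for most of the variation in the dependent variable, while a value close to 0, such as the 0.0575 reported above, means most of the variation is left unexplained even if the effect is statistically significant.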
