# Linear Model

## Introduction

In its simplest form, a linear model specifies the linear relationship between a dependent (or response) variable *Y* and a set of predictor variables, the *X*s, so that:

#### \(Y = b_{0} + b_{1}X_{1} + b_{2}X_{2} + \dots + b_{k}X_{k}\)

In this equation, \(b_{0}\) is the regression coefficient for the intercept and the \(b_{i}\) values are the regression coefficients (for variables 1 through *k*) computed from the data.
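To make "computed from the data" concrete, here is a minimal sketch of how \(b_{0}\) and \(b_{1}\) can be estimated by ordinary least squares when there is a single predictor. It is written in Python purely for illustration and is not the tool's own implementation; the function name and the data values are invented for this example.

```python
# Ordinary least squares for one predictor: Y = b0 + b1 * X.
# The data below are invented purely for illustration.

def fit_simple_linear_model(xs, ys):
    """Return (b0, b1) minimising the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]  # roughly Y = 1 + 2X plus noise
b0, b1 = fit_simple_linear_model(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

With more than one predictor the same least-squares idea applies, but the coefficients are normally obtained with matrix methods rather than these closed-form sums.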

## Inputs

We will run the Linear Model on income and poverty data for Sydney. To do this:

- **Select** *Sydney GCCSA* as your area
- **Select** *SA2 OECD Indicators: Income, Inequality and Financial Stress 2011* as your dataset, selecting all variables

Once you have done this, open the tool (*Tools → Statistical Analysis → Linear Model*) and enter your parameters, which are explained under the image below.

*Dataset input:* Here you select the dataset that you would like to include in the Linear Model. In this instance we select *SA2 OECD Indicators: Income, Inequality and Financial Stress 2011*.

*Formula:* This part of the parameter input can be tricky for users not familiar with the **R Language** syntax, because the formula has to be entered in exactly that format. The basic model is “Y regressed on X”, which is denoted in R as

#### \(Y \sim X \)

where *Y* is the name of the dependent variable, *X* is the name of the independent variable (the variable whose effect on the dependent variable you are testing), and the tilde symbol ( ~ ) means “regressed on”. In this format, you can add multiple independent variables by writing the model as:

#### \(Y \sim X_{1} + X_{2} + X_{3} \)

Additional components of the model, such as interaction terms, and how to enter them can be found in the link above.

For this example, we will enter the formula shown below:

**poverty_rate_synthetic_estimates ~ median_disposable_household_income_synthetic_estimates**
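The structure of such a formula can be sketched in a few lines of Python. The helper below is hypothetical and only illustrative: it handles the additive form described above (one `~`, predictors joined by `+`) and is not how R or the tool actually parses formulas, which support many more operators.

```python
def split_formula(formula):
    """Split 'Y ~ X1 + X2' into (response, [predictors]).

    Illustrative only: supports a single '~' and '+'-separated
    predictor names, nothing else from R's formula syntax.
    """
    if formula.count("~") != 1:
        raise ValueError("expected exactly one '~' in the formula")
    left, right = formula.split("~")
    response = left.strip()
    predictors = [term.strip() for term in right.split("+")]
    if not response or not all(predictors):
        raise ValueError("missing variable name in formula")
    return response, predictors


formula = (
    "poverty_rate_synthetic_estimates ~ "
    "median_disposable_household_income_synthetic_estimates"
)
response, predictors = split_formula(formula)
print("dependent variable:", response)
print("independent variables:", predictors)
```

Everything to the left of the tilde is the dependent variable; everything to the right is the list of independent variables.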

It is important that the variable *names*, rather than the titles, are entered into the formula. These are the “machine readable” names of the variables, as opposed to the “human readable” titles. You can find them by opening the metadata of your dataset (shown in **red** below) and then copy and paste them into the formula box.

Click *Add and Run* to execute the tool.

## Outputs

Once you have run the tool, click the *Display* button on the pop-up dialogue box that appears. This will open a text editor with the outputs of your Linear Model (as illustrated below). This shows a small (\(R^{2} = 0.0575\), boxed in blue) but significant (*P = 0*, boxed in red) effect of Median Disposable Income on Poverty Rates for this dataset.
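A small \(R^{2}\) and a very small *P* value are not contradictory. For a model with one predictor and an intercept, the F statistic is \(F = R^{2}(n-2)/(1-R^{2})\), so even a weak relationship becomes statistically detectable when there are many observations. The Python sketch below illustrates this. The sample size `n_areas` is a hypothetical value, not the actual number of SA2 areas in the dataset, and the p-value uses a large-sample normal approximation rather than the exact distribution that R reports.

```python
import math
from statistics import NormalDist


def slope_test_from_r_squared(r_squared, n):
    """Return (F, approximate p) for a one-predictor model with intercept."""
    # F statistic on 1 and n - 2 degrees of freedom.
    f_stat = r_squared * (n - 2) / (1.0 - r_squared)
    # With one predictor, t squared equals F. For large n the t
    # distribution is close to the standard normal, which is used
    # here as an approximation for the two-sided p-value.
    t_stat = math.sqrt(f_stat)
    p_approx = 2.0 * (1.0 - NormalDist().cdf(t_stat))
    return f_stat, p_approx


n_areas = 250  # HYPOTHETICAL sample size, chosen only for illustration
f_stat, p_approx = slope_test_from_r_squared(0.0575, n_areas)
print(f"F = {f_stat:.2f}, approximate p = {p_approx:.5f}")
```

Under these assumptions the approximate p-value is well below 0.001, which is consistent with an output that may display as 0 once rounded, even though the model explains under 6% of the variation in poverty rates.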