Assignment 4 Answer Key
Table of results for alternative models exploring the effect of dist on ed

Dependent variable:    (a) ed             (b) ln(ed)
Regressor: dist        -0.037* (0.013)    -0.003* (0.001)

[The remaining rows of this table did not survive extraction: six further
coefficient/standard-error pairs are visible (-0.191 (0.101), 0.143* (0.050),
0.351* (0.071), 0.362* (0.077), 0.093* (0.003), 0.372* (0.06...)) but their
regressor labels are unrecoverable. Standard errors in parentheses; * denotes
statistical significance.]
Determining the Value of a Fireplace (Intro to Ch. 6)
The following empirical example uses data on house prices in the New York
area in 2002-2003 (the data are from Richard De Veaux of Williams College).
Let's try to determine the value of a fireplace. First, load the data:
houses = read.csv("http://home.cc.umanitoba.ca/~godwinrt/3180/data/houseprice.csv")
attach(houses)
head(houses)
#Q.1
summary(houses)
head(houses)
#Note: length() gives the number of variables (columns); use nrow(houses) for the number of observations
length(houses)
#Instead, we will measure Price in thousands of dollars:
Price = Price/1000
#Q.2
lm(Price ~ Fir
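The call above regresses Price on the fireplace dummy (the variable name is cut off here). A self-contained sketch with simulated data illustrates how the dummy coefficient prices the fireplace; every name and number below is made up for illustration, not taken from the course dataset:

```r
#Simulated illustration (not the real dataset): price in $1000s,
#with a hypothetical $15k fireplace premium built into the data
set.seed(1)
n <- 200
fireplace <- rbinom(n, 1, 0.5)               #1 if the house has a fireplace
price <- 250 + 15 * fireplace + rnorm(n, sd = 20)
fit <- lm(price ~ fireplace)
coef(fit)["fireplace"]                       #estimated premium, near 15
```

The dummy's coefficient is the estimated difference in mean price between houses with and without a fireplace, holding nothing else fixed in this bivariate sketch.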
Ch. 08 Example
This example again uses the Current Population Survey (CPS) dataset. There
are 61395 observations.
> cps = read.csv("http://home.cc.umanitoba.ca/~godwinrt/3180/data/cps.csv")
> attach(cps)
> head(cps)
[Output of head(cps): the first six rows of the variables ahe, female, age, northeast, …]
Downloading and Installing R
R is open-source and free, and has a large online user-support base. If you have a problem,
Googling it will likely provide ample solutions.
For PC: http://cran.r-project.org/bin/windows/base/
For Mac: http://cran.r-project.or
R Example: Regressing Test Scores on Student-Teacher Ratio (Econ 3180, Ryan Godwin)
Your goal is to reproduce the results from the regression of Test Scores on Student-Teacher
Ratio (see Figure 4.3, pg. 118, or the Chapter 4 Part 2 lecture slides, pg. 6).
First,
Heteroskedasticity and Homoskedasticity,
and Homoskedasticity-Only Standard Errors
(Section 5.4)
What?
Consequences of homoskedasticity
Implication for computing standard errors
What do these two terms mean?
If var(u|X = x) is constant, that is, if the va
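A small simulation can make the distinction concrete. In the sketch below (an assumed design, not from the course data), the error's standard deviation is proportional to x, so var(u|X = x) is not constant and the errors are heteroskedastic:

```r
#Sketch: heteroskedastic errors, where var(u|X = x) grows with x
set.seed(3180)
n <- 5000
x <- runif(n, 1, 10)
u <- x * rnorm(n)            #sd of u is proportional to x
#The residual spread is visibly larger for large x than for small x:
sd_low  <- sd(u[x < 3])
sd_high <- sd(u[x > 8])
c(sd_low, sd_high)
```

Under homoskedasticity these two sample standard deviations would be roughly equal; here the high-x spread is several times the low-x spread.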
Linear Regression with one Regressor
Covering Sections 4.1 and 4.2.
We've seen the California test score data before.
Now we will try to estimate the marginal effect of
STR on SCORE.
To motivate these sections:
Hiring an extra teacher costs $
Funding for
#Input the data
y = c(1,4,5,4)
x = c(2,4,6,8)
#Plot the data
plot(x,y,xlim=c(0,10),ylim=c(0,10),pch = 16,col = 2)
#Choose the intercept and slope for the "fitted" regression line
b0 = 0
b1 = 1
#Draw the chosen line
abline(b0,b1,col=3)
#Draw the residuals
segments(x, y, x, b0 + b1*x, col = 4)
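To judge the chosen line numerically, we can compute its sum of squared residuals (SSR); a minimal sketch using the four data points above:

```r
#The same toy data as above
y <- c(1, 4, 5, 4)
x <- c(2, 4, 6, 8)
#The chosen line from above
b0 <- 0
b1 <- 1
res <- y - (b0 + b1 * x)     #residuals from the chosen line
ssr <- sum(res^2)            #sum of squared residuals
ssr                          #18 for this choice of b0, b1
```

OLS chooses b0 and b1 to minimize this quantity, so the OLS line's SSR can never exceed the SSR of any line picked by hand.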
R Code for generating the graphs:
set.seed(3187)
#Generate and plot the data:
x = rnorm(10)
y = 2 + 2*x + 4*rnorm(10)
plot(x,y,pch=16,main = "3180 Line Fitting",sub = "Experiment 3")
#OLS:
b0 = lm(y~x)$coeff[1]
b1 = lm(y~x)$coeff[2]
abline(b0,b1,col=2,lwd=2)
#Load the data into R:
teachdata = read.csv("http://home.cc.umanitoba.ca/~godwinrt/3180/data/str2.csv")
attach(teachdata)
#See some summary statistics:
summary(teachdata)
#Does it appear that the new variables matter?
plot(eng, score)
plot(exppup, score)
p
Chapter 8 Conclusion
Three questions about test scores (score) and student-teacher ratio (str):
a) After controlling for differences in economic characteristics of different
districts, does the effect of str on score depend on the fraction of English
learners?
Logarithmic functions of Y and/or X
Last class, we saw that we could approximate a model in which X has a
non-linear effect on Y by using a polynomial population model:
Yi = β0 + β1Xi + β2Xi^2 + β3Xi^3 + … + βrXi^r + ui
Other regressors may be added as usual
This is a polynomial regression model.
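A polynomial like this can be estimated with lm() by wrapping the power terms in I(); a minimal sketch on simulated data (the true coefficients below are made-up values for the illustration):

```r
#Sketch: fitting a cubic in X with lm() on simulated data
set.seed(1)
x <- runif(200, -2, 2)
y <- 1 + x - 2 * x^2 + 0.5 * x^3 + rnorm(200)   #assumed true coefficients
cubic <- lm(y ~ x + I(x^2) + I(x^3))            #I() protects the powers
coef(cubic)                                     #estimates of beta0..beta3
```

The I() wrapper is needed because ^ has a special meaning inside an R formula; poly(x, 3) is an equivalent (orthogonalized) alternative.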
Chapter 6 - Outline
1. Omitted variable bias
2. Causality and regression analysis
3. Multiple regression and OLS
4. Measures of fit
5. Sampling distribution of the OLS estimator
Omitted Variable Bias
(SW Section 6.1)
The error u arises because of factors, or omitted variables, that influence Y but are not included in the regression.
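A small simulation illustrates the bias: when an omitted variable (x2) both affects y and is correlated with the included regressor (x1), the short regression's slope is pulled away from the true value. All coefficients below are assumed for the illustration:

```r
#Sketch of omitted variable bias on simulated data
set.seed(3180)
n  <- 10000
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n)                  #x2 is correlated with x1
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)       #true coefficient on x1 is 2
b_short <- coef(lm(y ~ x1))[["x1"]]        #omits x2: biased
b_long  <- coef(lm(y ~ x1 + x2))[["x1"]]   #includes x2: consistent
c(b_short, b_long)
```

The short-regression slope converges to 2 + 3 × 0.8 = 4.4, the true effect plus the omitted variable's effect times its regression coefficient on x1, matching the omitted variable bias formula.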
The Population Multiple Regression
Model (SW Section 6.2)
Consider the case of two regressors:
Yi = β0 + β1X1i + β2X2i + ui, i = 1, …, n
Y is the dependent variable
X1, X2 are the two independent variables (regressors)
(Yi, X1i, X2i) denote the ith observation
The Least Squares Assumptions for
Multiple Regression (SW Section 6.5)
Yi = β0 + β1X1i + β2X2i + … + βkXki + ui, i = 1, …, n
1. The conditional distribution of u given the X's has mean
zero, that is, E(u|X1 = x1, …, Xk = xk) = 0.
2. (X1i, …, Xki, Yi), i = 1, …, n, are i.i.d.
3. Large outliers are unlikely.
Confidence Sets for Multiple
Coefficients (SW Section 7.4)
Yi = β0 + β1X1i + β2X2i + … + βkXki + ui, i = 1, …, n
What is a joint confidence set for β1 and β2?
A 95% joint confidence set is:
A set-valued function of the data that contains the true
parameter(s) in 95% of repeated samples.
Chapter 7 - Outline
1. Hypothesis tests and confidence intervals for a single
coefficient
2. Joint hypothesis tests on multiple coefficients
3. Other types of hypotheses involving multiple coefficients
4. How to decide what variables to include in a regression
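Item 2, a joint hypothesis test on multiple coefficients, can be carried out in base R with anova() on nested models; a sketch on simulated data (the setup below is assumed for illustration):

```r
#Sketch: testing the joint hypothesis beta2 = beta3 = 0 with an F-test
set.seed(1)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)          #x2 and x3 are truly irrelevant here
unrestricted <- lm(y ~ x1 + x2 + x3)
restricted   <- lm(y ~ x1)           #imposes beta2 = beta3 = 0
anova(restricted, unrestricted)      #F-statistic and p-value for the joint test
```

The F-test compares the fit of the restricted and unrestricted models; with two restrictions, the numerator degrees of freedom is 2.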
#Load the data into R:
teachdata = read.csv("http://home.cc.umanitoba.ca/~godwinrt/3180/data/str.csv")
attach(teachdata)
#See some summary statistics:
summary(teachdata)
#Notice that there are four variables: test scores, student-teacher ratio, expenditure
#Randomly generate the population model:
x1 = rnorm(10)
y = 5 + 2*x1 + rnorm(10)
#Plot it:
plot(x1,y,pch = 16,col = 2)
#Run OLS. What is the R-squared and adjusted R-squared?
summary(lm(y~x1))
#Now make up a variable, x2:
x2 = c()
#Plot it:
plot(x2,y,pch = 16)
Ch. 08 Introduction
This example uses the Current Population Survey (CPS) dataset. There are
61395 observations.
> cps = read.csv("http://home.cc.umanitoba.ca/~godwinrt/3180/data/cps.csv")
> attach(cps)
> head(cps)
[Output of head(cps): the first six rows of the variables ahe, female, age, northeast, …]
Nonlinear Regression Functions
(SW Chapter 8)
Everything so far has been linear in the Xs
But the linear approximation is not always a good one
The multiple regression framework can be extended to handle
regression functions that are nonlinear in one or more of the X's.
ECON 3180 Practice Assignment #2: What Determines House Prices?
This assignment should help you become familiar with dummy variables, R-squared and adjusted R-squared, and the F-test.
1.) Load the housing data. You can see how to do this from the first
The Least Squares Assumptions
(SW Section 4.4)
What, in a precise sense, are the properties of the OLS
estimator? We would like it to be unbiased, and to have a small
variance. Does it? Under what conditions is it an unbiased
estimator of the true population parameters?
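Unbiasedness can be checked by simulation: draw many samples from a known population model, run OLS on each, and see whether the slope estimates center on the true value. The population model below is assumed for the illustration:

```r
#Sketch: checking unbiasedness of the OLS slope by simulation
set.seed(3180)
sim_slope <- function() {
  x <- rnorm(50)
  y <- 1 + 2 * x + rnorm(50)       #true slope is 2 (assumed)
  coef(lm(y ~ x))[["x"]]           #OLS slope from this sample
}
slopes <- replicate(2000, sim_slope())
mean(slopes)                       #close to 2 across repeated samples
```

Each call draws a fresh sample of 50 and re-estimates the slope; the average of the 2000 estimates is very close to the true slope of 2, as the least squares assumptions predict.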
Confidence Intervals for 1
(Section 5.2)
Recall that a 95% confidence interval is, equivalently:
The set of points that cannot be rejected at the 5%
significance level;
A set-valued function of the data (an interval that is a
function of the data) that contains the true β1 in 95% of repeated samples.
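In R, a 95% confidence interval for the slope is one line with confint(); a minimal sketch on simulated data (the true slope of 2 is assumed for the illustration):

```r
#Sketch: 95% confidence interval for the slope via confint()
set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)        #true slope is 2 (assumed)
fit <- lm(y ~ x)
confint(fit, "x", level = 0.95)    #interval built from the OLS slope and its SE
```

The interval is the OLS estimate plus or minus a critical value times the standard error, so it is always centered on the point estimate and covers the true slope in about 95% of repeated samples.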
Regression with a Single Regressor:
Hypothesis Tests and Confidence Intervals
(SW Chapter 5)
Overview
Now that we have the sampling distribution of the OLS
estimator, we are ready to perform hypothesis tests about β1
and to construct confidence intervals for β1.
Estimators and Sampling Distributions (based on Ch. 2.5 and 2.6)
Consider a random variable Y: commuting time.
Suppose I am interested in knowing the expected value for commuting time, E(Y).
I observe a sample: Y1, Y2, …, Yn (sample size = n).
Assume:
the Yi's are i.i.d.
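The natural estimator of E(Y) is the sample mean, and its sampling distribution can be seen by simulation. The population values below (mean 30 minutes, sd 10) are assumed purely for illustration:

```r
#Sketch: the sample mean as an estimator of E(Y), and its sampling distribution
set.seed(3180)
mu <- 30                                         #hypothetical E(Y), in minutes
one_sample_mean <- function(n) mean(rnorm(n, mean = mu, sd = 10))
means <- replicate(2000, one_sample_mean(100))   #2000 samples of size n = 100
mean(means)    #centered near mu: the sample mean is unbiased
sd(means)      #close to 10 / sqrt(100) = 1, the standard error of the mean
```

Across repeated samples the estimates cluster around mu with spread sd/sqrt(n), which is exactly the sampling distribution that hypothesis tests and confidence intervals for E(Y) are built on.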