API202A
Empirical Methods II
Spring 2009
Assignment 2
Due February 19th at 5pm
(Drop in API202A mailbox inside manila envelope with your KSG mailbox # written outside)
Section I: Smoking and Cancer
Few medical professionals doubt that smoking leads to many health problems, including lung cancer. While it is more difficult to determine that smoking causes lung cancer than simply to say that the two are related, in this assignment you will perform analysis that can begin to document the relationship between smoking and lung cancer. However, this is far from a definitive analysis. The sample is very small, and no effort is made to control for other differences between the countries in the sample. The purpose of this exercise is to help you learn the mechanics of ordinary least squares (OLS) regression. First you will calculate the regression “by hand” using the formulas developed in class, and then you will use Stata to confirm the calculation.
The death rate from lung cancer in 1950 and the per capita cigarette consumption in 1930 are shown below for five countries. The cancer rates are shown for a later time period because it presumably takes time for lung cancer to develop and be diagnosed. Our hypothesis is that the dependent variable, lung cancer (Y), is a function of the independent variable, smoking (X).
Country          Cigarettes Consumed    Lung Cancer Deaths
                 per Capita (1930)      per Million People (1950)
Holland          460                    245
Finland          1115                   350
Great Britain    1145                   465
Canada           510                    150
Norway           250                    90
Source: Edward R. Tufte, Data Analysis for Politics and Management, Table 3.3.
(1) Using the appropriate formulas (given at the end of this assignment), show how to calculate each of the following. Note: “show how to calculate” means (1) write the appropriate formula; (2) plug in the appropriate values; and (3) show the computed answer. You do not need to show the intermediate calculations between steps 2 and 3. NOTE: You may use Excel to do the calculations as long as you show the formulas used and you do not use the built-in regression functions.
a) $\hat{\beta}_1$, the estimated slope coefficient from the regression $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$
b) $\hat{\beta}_0$, the estimated intercept term from the same regression
c) $\hat{Y}_{\text{Holland}}$, the predicted value for Holland
d) $\hat{\varepsilon}_{\text{Holland}}$, the OLS residual for Holland
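The four quantities in question (1) can also be computed step by step in code, without any built-in regression routine. The sketch below is only illustrative: the formulas given at the end of the assignment are not reproduced in this excerpt, so it assumes the standard bivariate OLS formulas (slope equals the sum of cross-deviations divided by the sum of squared X deviations; intercept equals mean Y minus slope times mean X). The function name `ols_fit` and all variable names are hypothetical.

```python
# Data from the table above:
# (country, cigarettes per capita in 1930, lung cancer deaths per million in 1950)
DATA = [
    ("Holland", 460, 245),
    ("Finland", 1115, 350),
    ("Great Britain", 1145, 465),
    ("Canada", 510, 150),
    ("Norway", 250, 90),
]


def ols_fit(xs, ys):
    """Return (intercept, slope) for a bivariate OLS fit.

    Assumes the standard textbook formulas:
      slope     = sum((X_i - mean X) * (Y_i - mean Y)) / sum((X_i - mean X)^2)
      intercept = mean Y - slope * mean X
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


xs = [row[1] for row in DATA]  # X: smoking
ys = [row[2] for row in DATA]  # Y: lung cancer

intercept, slope = ols_fit(xs, ys)

# Predicted value and residual for Holland (X = 460, Y = 245).
holland_x, holland_y = 460, 245
holland_fit = intercept + slope * holland_x
holland_resid = holland_y - holland_fit

print(f"slope              = {slope:.4f}")
print(f"intercept          = {intercept:.4f}")
print(f"predicted, Holland = {holland_fit:.4f}")
print(f"residual, Holland  = {holland_resid:.4f}")
```

This mirrors the three steps the question asks for (formula, values, computed answer), and the printed numbers can be compared against the Stata output from question (2).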
(2) In this question you will see how the same regression is produced by Stata. Open Stata and type “edit,” which brings up something that looks like a spreadsheet. Enter the smoking and cancer values in the first two columns. Double-click the column headers to enter variable names (“smoke”, “cancer”) and appropriate labels. Close the editor window when you are done. Type “list” to be sure you have typed in the numbers correctly, and type “sum” to inspect the variable means.
 Spring '09
 LEVY
