SOLUTIONS TO
Problem Set 1
Introduction to Econometrics
prepared by
Prof. Marcelo J. Moreira and Seyhan E Arkonac, PhD
for all sections
Spring 2010
“Calculator” was once a job description. This problem set gives you an opportunity to do some calculations on the relation between smoking and lung cancer, using a (very) small sample of five countries. The purpose of this exercise is to illustrate the mechanics of ordinary least squares (OLS) regression. First you will calculate the regression “by hand” using formulas from class and the textbook, then you will use STATA to confirm the calculation. For the “by hand” calculations, you may relive history and use long multiplication, long division, and tables of square roots and logarithms; or you may use an electronic calculator or a spreadsheet.
The data are summarized in the following table. The variables are per capita cigarette consumption in 1930 (the independent variable, “X”) and the death rate from lung cancer in 1950 (the dependent variable, “Y”). The cancer rates are shown for a later time period because it takes time for lung cancer to develop and be diagnosed.
Observation #   Country          Cigarettes consumed        Lung cancer deaths per
                                 per capita in 1930 (X)     million people in 1950 (Y)
1               Switzerland       530                        250
2               Finland          1115                        350
3               Great Britain    1145                        465
4               Canada            510                        150
5               Denmark           380                        165
Source: Edward R. Tufte, Data Analysis for Politics and Policy, Table 3.3.
1. Use a calculator, a spreadsheet, or “by hand” methods to compute the following; refer to the textbook for the necessary formulas. (Note: if you use a spreadsheet, attach a printout.)
a) The sample means of X and Y, $\bar{X}$ and $\bar{Y}$.
b) The standard deviations of X and Y, $s_X$ and $s_Y$.
c) The correlation coefficient, r, between X and Y.
d) $\hat{\beta}_1$, the OLS estimated slope coefficient from the regression $Y_i = \beta_0 + \beta_1 X_i + u_i$.
e) $\hat{\beta}_0$, the OLS estimated intercept term from the same regression.
f) $\hat{Y}_i$, $i = 1, \ldots, n$, the predicted values for each country from the regression.
g) $\hat{u}_i$, the OLS residual for each country.
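The question refers to the textbook for the formulas. For convenience, the standard sample formulas are sketched below. They assume the usual $n - 1$ divisor for the sample variance and covariance, which is consistent with the standard deviations reported in the answers; the symbol $s_{XY}$ for the sample covariance is notation introduced here, not taken from the problem set.

```latex
\begin{align*}
\bar{X} &= \frac{1}{n}\sum_{i=1}^{n} X_i, &
\bar{Y} &= \frac{1}{n}\sum_{i=1}^{n} Y_i \\
s_X &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}, &
s_Y &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2} \\
s_{XY} &= \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}), &
r &= \frac{s_{XY}}{s_X \, s_Y} \\
\hat{\beta}_1 &= \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
             = \frac{s_{XY}}{s_X^2}, &
\hat{\beta}_0 &= \bar{Y} - \hat{\beta}_1 \bar{X} \\
\hat{Y}_i &= \hat{\beta}_0 + \hat{\beta}_1 X_i, &
\hat{u}_i &= Y_i - \hat{Y}_i
\end{align*}
```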
Answers:
a) The sample means of X and Y, $\bar{X}$ and $\bar{Y}$.
$\bar{X} = 736$, $\bar{Y} = 276$
b) The standard deviations of X and Y, $s_X$ and $s_Y$.
$s_X = 364.41$, $s_Y = 132.35$
c) The correlation coefficient, r, between X and Y.
r = 0.9262
d) $\hat{\beta}_1$, the OLS estimated slope coefficient from the regression $Y_i = \beta_0 + \beta_1 X_i + u_i$.
$\hat{\beta}_1 = 0.336418$
e) $\hat{\beta}_0$, the OLS estimated intercept term from the same regression.
$\hat{\beta}_0 = 28.39656$
f) $\hat{Y}_i$, $i = 1, \ldots, n$, the predicted values for each country from the regression.
Switzerland      206.6981
Finland          403.5026
Great Britain    413.5952
Canada           199.9697
Denmark          156.2354
g) $\hat{u}_i$, the OLS residual for each country.
Switzerland       43.3019
Finland          -53.5026
Great Britain     51.40483
Canada           -49.9697
Denmark            8.7646
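As a cross-check on the answers to Question 1, the "by hand" calculation can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not part of the original solutions: the variable names (`sxx`, `syy`, `sxy`, `beta1_hat`, and so on) are chosen here, and the sample standard deviations are assumed to use the n − 1 divisor. The intermediate sums it computes are Σ(X − X̄)² = 531170, Σ(Y − Ȳ)² = 70070, and Σ(X − X̄)(Y − Ȳ) = 178695.

```python
import math

# Data from the table: (country, cigarettes per capita in 1930 (X),
# lung cancer deaths per million people in 1950 (Y)).
data = [
    ("Switzerland", 530, 250),
    ("Finland", 1115, 350),
    ("Great Britain", 1145, 465),
    ("Canada", 510, 150),
    ("Denmark", 380, 165),
]

x = [row[1] for row in data]
y = [row[2] for row in data]
n = len(data)

# a) Sample means.
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squared deviations and of cross products about the means.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# b) Sample standard deviations (n - 1 divisor assumed).
s_x = math.sqrt(sxx / (n - 1))
s_y = math.sqrt(syy / (n - 1))

# c) Correlation coefficient (the n - 1 factors cancel).
r = sxy / math.sqrt(sxx * syy)

# d), e) OLS slope and intercept.
beta1_hat = sxy / sxx
beta0_hat = y_bar - beta1_hat * x_bar

# f), g) Predicted values and residuals for each country.
y_hat = [beta0_hat + beta1_hat * xi for xi in x]
u_hat = [yi - yh for yi, yh in zip(y, y_hat)]

print(f"means: X = {x_bar:.2f}, Y = {y_bar:.2f}")
print(f"std devs: s_X = {s_x:.2f}, s_Y = {s_y:.2f}")
print(f"r = {r:.4f}")
print(f"slope = {beta1_hat:.6f}, intercept = {beta0_hat:.5f}")
for (country, _, _), yh, uh in zip(data, y_hat, u_hat):
    print(f"{country:<14} predicted = {yh:9.4f}  residual = {uh:9.4f}")
```

Because the script carries full floating-point precision rather than the rounded coefficients, the last digits of the predicted values and residuals can differ slightly from the figures listed above. The residuals sum to zero up to rounding error, which is a quick sanity check on any OLS fit that includes an intercept.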
2. Now calculate the statistics in question #1 using STATA. On the STATA output file, find and label the items in Question #1.
STATA HINTS: First load STATA and type “edit,” which brings up something that looks like a spreadsheet. Enter the smoking and cancer values in the first two columns. Double click the column headers to enter variable names (e.g. “smoke”, “deaths”). Close the editor window when you are done. The following commands will be useful:
list