Problem Set 1
Introduction to Econometrics
Prof. Marcelo J. Moreira and Seyhan E Arkonac, PhD
for all sections
“Calculator” was once a job description.
This problem set gives you an opportunity to do some
calculations on the relation between smoking and lung cancer, using a (very) small sample of
countries.
The purpose of this exercise is to illustrate the mechanics of ordinary least
squares (OLS) regression.
First you will calculate the regression “by hand” using formulas from
class and the textbook, then you will use STATA to confirm the calculation.
For the “by hand”
calculations, you may relive history and use long multiplication, long division, and tables of
square roots and logarithms; or you may use an electronic calculator or a spreadsheet.
The data are summarized in the following table.
The variables are per capita cigarette
consumption in 1930 (the independent variable, “X”) and the death rate from lung cancer in 1950
(the dependent variable, “Y”).
The cancer rates are shown for a later time period because it takes
time for lung cancer to develop and be diagnosed.
Table columns: cigarette consumption per capita in 1930 (X); lung cancer deaths per million people in 1950 (Y).
Source: Edward R. Tufte, Data Analysis for Politics and Policy, Table 3.3.
Use a calculator, a spreadsheet, or “by hand” methods to compute the following; refer to the
textbook for the necessary formulas.
(If you use a spreadsheet, attach a printout.)
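The sample means of X and Y, $\bar{X}$ and $\bar{Y}$.
The standard deviations of X and Y, $s_X$ and $s_Y$.
The correlation coefficient, $r$, between X and Y.
$\hat{\beta}_1$, the OLS estimated slope coefficient from the regression $Y_i = \beta_0 + \beta_1 X_i + u_i$.
$\hat{\beta}_0$, the OLS estimated intercept term from the same regression.
$\hat{Y}_i$, the predicted values for each country from the regression.
$\hat{u}_i$, the OLS residual for each country.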
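If you would rather check your arithmetic with a short script than with a spreadsheet, the quantities listed above can be sketched in Python as follows. This is a minimal sketch, not a required method: the function name `ols_by_hand` and the demo numbers are illustrative assumptions (the demo values are placeholders, not the Tufte data), and the standard deviations use the n - 1 divisor, which is assumed to match the textbook's definition of the sample standard deviation.

```python
import math


def ols_by_hand(x, y):
    """Compute sample statistics and simple OLS quantities for paired data.

    Hypothetical helper written for illustration; it mirrors the usual
    textbook formulas for a regression of y on x with an intercept.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    # Sample means
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sums of squared deviations and of cross-products
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Sample standard deviations (assumption: n - 1 divisor)
    sd_x = math.sqrt(s_xx / (n - 1))
    sd_y = math.sqrt(s_yy / (n - 1))

    # Sample correlation coefficient
    r = s_xy / math.sqrt(s_xx * s_yy)

    # OLS slope and intercept
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    # Predicted values and residuals, one per observation
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]

    return {
        "x_bar": x_bar,
        "y_bar": y_bar,
        "sd_x": sd_x,
        "sd_y": sd_y,
        "r": r,
        "slope": slope,
        "intercept": intercept,
        "fitted": fitted,
        "residuals": residuals,
    }


# Placeholder numbers for demonstration only; replace them with the
# X and Y columns from the problem set's table.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [2.0, 4.0, 5.0, 4.0, 5.0]

result = ols_by_hand(x_demo, y_demo)
for name, value in result.items():
    print(name, value)
```

Two quick sanity checks apply to any data set: the residuals should sum to zero up to rounding error, and the slope should equal the correlation coefficient times the ratio of the standard deviation of Y to that of X.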