Lecture3_2005

# Lecture3_2005 - Prof. Green Intro Stats Regression with...


Prof. Green
Intro Stats

## Regression with Experimental Data

Regression ranks among the most useful tools in statistics.

- Use #1: Predicting outcomes. Often a mindless, theory-free activity. Can be fun and/or profitable. Example: predicting sales based on leading indicators.
- Use #2: Estimating causal parameters. Requires assumptions about how the independent variable is related to unobserved disturbances that influence the dependent variable. Easy to produce numbers; producing convincing results requires ingenuity, a strong research design, or a gullible audience.

## Terminology

- Dependent variable: outcome or response variable. Usually denoted Y and put on the Y-axis.
- Independent variable: treatment variable or regressor. Usually denoted X and put on the X-axis.

One "regresses Y on X."

## Standard Linear Regression Model

Y = a + bX + U

Note that a (intercept) and b (slope) are parameters. These are typically unknown to the researcher.

Interpretation: the term a is the expected value of Y when X = 0. The term b is the rate at which the expected value of Y changes for each one-unit increase in X.

X and Y are observed variables. The unobserved variable U is called the "disturbance term" or "error term." The variance of U is another parameter in the model.

This model is linear in the parameters: no exponents or quotients. If the U variable were zero for every observation, we would find a straight-line relationship when we plot Y against X.
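To make the roles of a, b, and U concrete, here is a minimal sketch that generates data from Y = a + bX + U. The function name `simulate_y`, the parameter values a = 2 and b = -3, and the choice of a normally distributed disturbance are assumptions made for illustration; the model itself does not specify a distribution for U.

```python
import random

def simulate_y(a, b, xs, sd_u, seed=0):
    """Draw Y = a + b*X + U for each X, with U ~ Normal(0, sd_u).

    The normal distribution for U is an assumption of this sketch;
    the model only says U is an unobserved disturbance.
    """
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return [a + b * x + rng.gauss(0, sd_u) for x in xs]

xs = [0, 1, 2, 3, 4]

# With no disturbance (sd_u = 0), every point lies exactly on the line.
y_line = simulate_y(a=2, b=-3, xs=xs, sd_u=0)

# With a disturbance, the points scatter around the same line.
y_noisy = simulate_y(a=2, b=-3, xs=xs, sd_u=1)

print("no disturbance:  ", y_line)
print("with disturbance:", y_noisy)
```

In the no-disturbance case the first value (at X = 0) equals the intercept a, and each one-unit step in X changes Y by exactly b, which is the interpretation given above.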

## Example

Suppose the true regression model were: Y = 2 - 3X + U

Y good: perfectly controlled experiment (no disturbance variance).

| a | b | X | U | Y good |
|---|---|---|---|---|
| 2 | -3 | 1 | 0 | -1 |
| 2 | -3 | 2 | 0 | -4 |
| 2 | -3 | 3 | 0 | -7 |
| 2 | -3 | 4 | 0 | -10 |
| 2 | -3 | 5 | 0 | -13 |

Y okay: the disturbances are nonzero but only weakly related to X, so the points fall more or less along the true line.

| a | b | X | U | Y okay |
|---|---|---|---|---|
| 2 | -3 | 1 | -2 | -3 |
| 2 | -3 | 2 | 2 | -2 |
| 2 | -3 | 3 | -1 | -8 |
| 2 | -3 | 4 | 1 | -9 |
| 2 | -3 | 5 | 0 | -13 |

Y bad: Problem: X and U are correlated (note that the observed relationship between X and Y is flat when it should be negative).

| a | b | X | U | Y bad |
|---|---|---|---|---|
| 2 | -3 | 1 | 1 | 0 |
| 2 | -3 | 2 | 4 | 0 |
| 2 | -3 | 3 | 7 | 0 |
| 2 | -3 | 4 | 10 | 0 |
| 2 | -3 | 5 | 13 | 0 |

Here's what the three sets of data look like. The black line (Y good) correctly reflects the true regression line. The red line (Y bad) distorts the true underlying regression because X and U are correlated. The green points (Y okay) fall more or less along the true regression line.
[Figure: Scatterplot of Y good, Y bad, Y okay vs X. Horizontal axis: X; vertical axis: Y-Data; legend: Y okay, Y good, Y bad.]
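The comparison among the three data sets can be checked numerically. The sketch below fits a least-squares line to each Y series from the example, using the textbook formulas slope = Sxy / Sxx and intercept = mean(Y) - slope * mean(X). The helper name `ols_fit` is hypothetical, and the use of ordinary least squares here is an assumption of this illustration, not something stated in the notes up to this point.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares, both in deviations from the mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

X = [1, 2, 3, 4, 5]
datasets = {
    "good": [-1, -4, -7, -10, -13],  # U = 0 for every observation
    "okay": [-3, -2, -8, -9, -13],   # points near the true line
    "bad":  [0, 0, 0, 0, 0],         # U rises with X and cancels the slope
}

fits = {name: ols_fit(X, ys) for name, ys in datasets.items()}

for name, (intercept, slope) in fits.items():
    print(f"Y {name}: intercept = {intercept:.2f}, slope = {slope:.2f}")
```

The "good" data recover the true values (intercept 2, slope -3), the "okay" data give a slope of about -2.7, and the "bad" data give a slope of 0 even though the true slope is -3.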


## This note was uploaded on 04/07/2008 for the course STAT 102 taught by Professors Jonathan Reuning-Scherer and Donald Green during the Fall '05 term at Yale.

