9. Linear Regression and Correlation

Data:
y: a quantitative response variable
x: a quantitative explanatory variable
(Recall that in Chap. 8 both variables were categorical.)

For example,
y = annual income, x = number of years of education
y = college GPA, x = high school GPA (or perhaps SAT)

We consider:
• Is there an association? (test of independence)
• How strong is the association? (uses correlation)
• How can we describe the nature of the relationship, e.g., by using x to predict y? (regression equation, residuals)

Linear Relationships

Linear Function (Straight-Line Relation): y = α + βx expresses y as a linear function of x with slope β and y-intercept α.

For each 1-unit increase in x, y increases by β units.
β > 0 ⇒ Line slopes upward (positive relationship)
β = 0 ⇒ Horizontal line (y does not depend on x)
β < 0 ⇒ Line slopes downward (negative relationship)

Example: Economic Level and CO2 Emissions

OECD (Organisation for Economic Co-operation and Development, www.oecd.org): Advanced industrialized nations “committed to democracy and the market economy.”

OECD data file (from 2004) on p. 62 of the text and at the text website www.stat.ufl.edu/~aa/social/

• Let y = carbon dioxide emissions (per capita, in metric tons)
  Ranges from 5.6 in Portugal to 22.0 in Luxembourg
  mean = 10.4, standard deviation = 4.6
• x = gross domestic product (GDP, in thousands of dollars per capita)
  Ranges from 19.6 in Portugal to 70.0 in Luxembourg
  mean = 32.1, standard deviation = 9.6

The relationship between x and y can be approximated by y = 0.42 + 0.31x.

• At x = 0, predicted CO2 level y = 0.42 + 0.31x = 0.42 + 0.31(0) = 0.42 (irrelevant, because no GDP values are near 0)
• At x = 39.7 (value for U.S.), predicted CO2 level y = 0.42 + 0.31(39.7) = 12.7 (actual = 19.8 for U.S.)
• For each increase of 1 thousand dollars in per capita GDP, CO2 emissions are predicted to increase by 0.31 metric tons per capita
• But this linear equation is just an approximation.
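The predictions in the CO2 example can be reproduced with a few lines of Python. This is a minimal sketch: the coefficients 0.42 and 0.31 and the U.S. values come from the notes, while the function name `predict_co2` is a hypothetical helper introduced only for illustration. The residual is computed as actual minus predicted.

```python
# Coefficients quoted in the notes for the OECD example (y = 0.42 + 0.31x).
ALPHA = 0.42  # intercept: predicted CO2 (metric tons per capita) at x = 0
BETA = 0.31   # slope: change in predicted CO2 per extra $1000 of per capita GDP


def predict_co2(gdp_thousands):
    """Predicted per capita CO2 emissions for a GDP given in thousands of dollars.

    Hypothetical helper, not part of the original notes.
    """
    return ALPHA + BETA * gdp_thousands


us_gdp = 39.7      # U.S. GDP value from the notes
us_actual = 19.8   # U.S. actual CO2 value from the notes

us_predicted = predict_co2(us_gdp)
us_residual = us_actual - us_predicted  # residual = actual - predicted

print(round(us_predicted, 1))  # 12.7
print(round(us_residual, 1))   # 7.1
```

The large positive residual for the U.S. is one concrete sign that the straight line only approximates the data.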
The correlation between x and y for these nations was 0.64, not 1.0. (It is even lower, 0.41, if we remove the outlier observation for Luxembourg.)

Likewise, we would not expect to be able to predict annual income perfectly using years of education, or to predict college GPA perfectly using high school GPA.

Effect of variable coding?

Slope and intercept depend on units of measurement.

• If x = GDP measured in dollars (instead of thousands of dollars), then y = 0.42 + 0.00031x, because a change of $1 has only 1/1000 the impact of a change of $1000 (so the slope is multiplied by 0.001).
• If y = CO2 output in kilograms instead of metric tons (1 metric ton = 1000 kilograms), with x in dollars, then y = 1000(0.42 + 0.00031x) = 420 + 0.31x.

Suppose x changes from U.S. dollars to British pounds and 1 pound = 2 dollars. What happens?

Probabilistic Models

• In practice, the relationship between y and x is not “perfect” because y is not completely determined by x ....
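The unit-conversion arithmetic from the "Effect of variable coding?" passage can be sketched in Python. The rule used here is a general fact about straight lines: if the new variables are x' = a·x and y' = b·y, then the intercept becomes b·α and the slope becomes b·β/a. The function `rescale_line` and its parameter names are hypothetical, introduced only for this illustration; the exchange rate of 1 pound = 2 dollars is the one assumed in the notes' question.

```python
def rescale_line(alpha, beta, x_factor=1.0, y_factor=1.0):
    """Return (intercept, slope) after changing units.

    Hypothetical helper: new_x = x_factor * old_x, new_y = y_factor * old_y.
    """
    return y_factor * alpha, y_factor * beta / x_factor


# Original fit from the notes: x in thousands of dollars, y in metric tons.
alpha, beta = 0.42, 0.31

# x in dollars: dollars = 1000 * (thousands of dollars).
a_dollars, b_dollars = rescale_line(alpha, beta, x_factor=1000)
print(a_dollars, b_dollars)  # intercept 0.42, slope about 0.00031

# y in kilograms as well: kilograms = 1000 * (metric tons).
a_kg, b_kg = rescale_line(a_dollars, b_dollars, y_factor=1000)
print(a_kg, b_kg)  # intercept about 420, slope about 0.31

# x in pounds, y back in metric tons: pounds = 0.5 * dollars
# (assumes the notes' rate of 1 pound = 2 dollars).
a_pounds, b_pounds = rescale_line(a_dollars, b_dollars, x_factor=0.5)
print(a_pounds, b_pounds)  # intercept 0.42, slope about 0.00062
```

Under these assumptions, rescaling x alone leaves the intercept unchanged and divides the slope by the conversion factor, so measuring GDP in pounds doubles the slope.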
This note was uploaded on 11/20/2011 for the course STATISTICS ST3241 taught by Professor Manwai's during the Spring '11 term at National University of Singapore.
