9. Linear Regression and Correlation

Data:
y: a quantitative response variable
x: a quantitative explanatory variable
(Chapter 8: Recall that both variables were categorical; later chapters have multiple explanatory variables.)

For example (Wagner et al., Amer. J. Community Health, vol. 16, p. 189):
y = mental health, measured with the Hopkins Symptom List (presence or absence of 57 psychological symptoms)
x = stress level (a measure of negative events, weighted by the reported frequency and the subject's own estimate of the impact of each event)

We consider:
• Is there an association? (test of independence)
• How strong is the association? (uses correlation)
• How can we describe the nature of the relationship, e.g., by …

Linear Relationships
Linear Function (Straight-Line Relation): y = α + βx expresses y as a linear function of x with slope β and y-intercept α.
For each 1-unit increase in x, y increases by β units.
β > 0 ⇒ Line slopes upward (positive relationship)
β = 0 ⇒ Horizontal line (y does not depend on x)
β < 0 ⇒ Line slopes downward (negative relationship)

Example: Economic Level and CO2 Emissions
OECD (Organization for Economic Cooperation and Development, www.oecd.org): Advanced industrialized nations “committed to democracy and the market economy.”
oecddata file (from 2004) on p. 62 of text and at text website www.stat.ufl.edu/~aa/social/
• Let y = carbon dioxide emissions (per capita, in metric tons)
Ranges from 5.6 in Portugal to 22.0 in Luxembourg (U.S. = 19.8)
mean = 10.4, standard deviation = 4.6
• x = gross domestic product (GDP, in thousands of dollars per capita)
Ranges from 19.6 in Portugal to 70.0 in Luxembourg (U.S. = 39.7)
mean = 32.1, standard deviation = 9.6
The relationship between x and y can be approximated by y = 0.42 + 0.31x.
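To make the linear function concrete, here is a minimal Python sketch that evaluates y = α + βx with the OECD approximation α = 0.42, β = 0.31 and classifies the line by the sign of β, following the three cases listed above. The function names `predict` and `direction` are illustrative choices, not from the notes; x = 32.1 is the mean GDP reported above, while 20 and 50 are arbitrary GDP values picked only for illustration.

```python
# Sketch (our illustration): the straight-line relation y = alpha + beta * x,
# using the OECD approximation alpha = 0.42, beta = 0.31
# (y = CO2 in metric tons per capita, x = GDP in thousands of dollars per capita).


def predict(alpha: float, beta: float, x: float) -> float:
    """Value of the linear function alpha + beta * x at x."""
    return alpha + beta * x


def direction(beta: float) -> str:
    """Describe the line by the sign of its slope, as in the notes."""
    if beta > 0:
        return "slopes upward (positive relationship)"
    if beta < 0:
        return "slopes downward (negative relationship)"
    return "horizontal (y does not depend on x)"


ALPHA, BETA = 0.42, 0.31

# A 1-unit increase in x changes the prediction by beta units,
# no matter where on the line we start.
for x in (20.0, 32.1, 50.0):
    y = predict(ALPHA, BETA, x)
    change = predict(ALPHA, BETA, x + 1) - y
    print(f"x = {x:4.1f}: predicted y = {y:6.3f}, change for a 1-unit increase = {change:.3f}")

print("The line", direction(BETA))
```

The loop shows the same change (β) at every starting point, which is exactly what "for each 1-unit increase in x, y increases by β units" means for a straight line.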
• At x = 0, predicted CO2 level y = 0.42 + 0.31x = 0.42 + 0.31(0) = 0.42 (not meaningful, because no GDP values are near 0)
• At x = 39.7 (value for U.S.), predicted CO2 level y = 0.42 + 0.31(39.7) = 12.7 (actual = 19.8 for U.S.)
• For each increase of 1 thousand dollars in per capita GDP, CO2 emissions are predicted to increase by 0.31 metric tons per capita
• But this linear equation is just an approximation. The correlation between x and y for these nations was 0.64, not 1.0. (It is even less, 0.41, if we remove the outlier observation for Luxembourg.)

Effect of variable coding?
Slope and intercept depend on units of measurement.
• If x = GDP measured in dollars (instead of thousands of dollars), then y = 0.42 + 0.00031x (instead of y = 0.42 + 0.31x), because a change of $1 has only 1/1000 the impact of a change of $1000 (so the slope is multiplied by 0.001).
• If y = CO2 output in kilograms instead of metric tons (1 metric ton = 1000 kilograms), with x in dollars, then y = 1000(0.42 + 0.00031x) = 420 + 0.31x.
Suppose x changes from U.S. dollars to British pounds and 1 pound = 2 dollars. What happens?

Probabilistic Models
• In practice, the relationship between y and x is not “perfect” because...
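Returning to the effect of variable coding on the OECD line: the two unit changes discussed there can be reproduced with a short Python sketch. It assumes each recoding multiplies a variable by a constant (new x = cx·x, new y = cy·y); the function name `rescale` and the parameters `cx`, `cy` are our own hypothetical choices. Substituting into y = a + bx gives cy·y = cy·a + (cy·b/cx)(cx·x), so the intercept is multiplied by cy and the slope by cy/cx.

```python
# Sketch (our illustration): effect of recoding units on y = a + b*x.
# Assumption: each recoding multiplies a variable by a constant,
#   new_x = cx * x   and   new_y = cy * y.
# Substituting gives new_y = cy*a + (cy*b/cx) * new_x.


def rescale(a: float, b: float, cx: float, cy: float) -> tuple[float, float]:
    """Return (intercept, slope) after recoding x -> cx*x and y -> cy*y."""
    return cy * a, cy * b / cx


a, b = 0.42, 0.31  # x in thousands of dollars, y in metric tons

# x in dollars: 1 thousand dollars = 1000 dollars, so cx = 1000.
a_usd, b_usd = rescale(a, b, cx=1000, cy=1)
# x in dollars and y in kilograms: 1 metric ton = 1000 kg, so cy = 1000.
a_kg, b_kg = rescale(a, b, cx=1000, cy=1000)

print(f"x in dollars:          y = {a_usd:g} + {b_usd:g}x")
print(f"x in dollars, y in kg: y = {a_kg:g} + {b_kg:g}x")

# Sanity check: the recoded line gives the same prediction for the U.S.
# (x = 39.7 thousand dollars = 39,700 dollars), only expressed in kilograms.
tons = a + b * 39.7
kg = a_kg + b_kg * 39_700
print(f"U.S. prediction: {tons:.3f} metric tons = {kg:.1f} kg")
```

The pounds question in the notes can be explored the same way once you work out which cx the change of currency implies.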
This note was uploaded on 07/12/2011 for the course STA 3030 taught by Professor Agresti during the Spring '11 term at University of Florida.