Relationships Between Measurement Variables
[Figure: Economic and Social Liberalism in the U.S. states]
Review
– Scatterplots display relationships between two variables, e.g. GPA & SAT scores
Today
– Relationships between two variables
Deterministic v. Statistical Relationships
Deterministic relationships: one variable perfectly predicts another
– Speed & distance traveled within a certain amount of time
Statistical (or probabilistic) relationships: there is a relationship, but it is not precise
– SAT scores and college GPA
Scatterplot of 2 Random Variables
[Figure: scatterplot of rand1 and rand2, both ranging from 0 to 1]
Scatterplot of Reagan Vote in 1980 and in 1984 by State
[Figure: % vote for Reagan in 1984 against % vote for Reagan in 1980, by state. Source: U.S. Census Publication, State and Metropolitan Data Book, 1991.]
Scatterplot of Reagan Vote in 1980 & Carter Vote in 1976
[Figure: percent vote for Carter in 1976 against % vote for Reagan in 1980, by state. Source: U.S. Census Publication, State and Metropolitan Data Book, 1991.]
Scatterplot of Reagan Vote in 1980 & Percentage of Households Below the Poverty Line in 1991
[Figure: % vote for Reagan in 1980 against % below poverty line in 1991, by state. Source: U.S. Census Publication, State and Metropolitan Data Book, 1991.]
Correlation
Positive relationships, negative relationships, no relationship
Express relationships with a single number
– Correlation coefficient (Pearson product-moment correlation), r, measures the strength of the linear relationship between 2 measurement variables
Correlation
Correlation coefficient ranges between -1 and 1
– A correlation of 1 indicates that there is a perfect linear relationship between the two variables
– A correlation of -1 also indicates a perfect linear relationship, but the relationship is negative: as one variable increases, the other decreases
– A correlation of 0 indicates that there is no linear relationship
– A positive correlation indicates that as one variable increases, the other increases; a negative correlation means that as one variable increases, the other decreases
Calculating Pearson's Correlation
The correlation coefficient is calculated with the following formula, which uses Z scores:
$$ r = \frac{\sum Z_x Z_y}{n} $$
This means:
– For each observation, calculate the Z score for the x variable and for the y variable (Zx and Zy, respectively)
– Multiply Zx by Zy for each person or observation
– Add up these products (∑ZxZy)
– Divide by n, the number of people or observations
Pearson Correlation
$$ r = \frac{\sum Z_x Z_y}{n} $$
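A minimal Python sketch of the four steps, for illustration only (the function names `z_scores` and `pearson_r` are made up here, not taken from the course). It assumes the Z scores use the population standard deviation, dividing by n; that is the convention under which ∑ZxZy / n equals r. If the Z scores are computed with the sample standard deviation (dividing by n - 1), the sum should be divided by n - 1 instead.

```python
from math import sqrt

def z_scores(values):
    """Illustrative helper: Z = (X - M) / S, with S the population SD (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("all values are equal, so Z scores are undefined")
    return [(v - mean) / sd for v in values]

def pearson_r(xs, ys):
    """Illustrative helper: Pearson's r as the sum of Zx * Zy divided by n."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    zx = z_scores(xs)                              # step 1: Z score for every x
    zy = z_scores(ys)                              #         and for every y
    products = [a * b for a, b in zip(zx, zy)]     # step 2: Zx * Zy for each observation
    return sum(products) / len(xs)                 # steps 3-4: add up, divide by n

if __name__ == "__main__":
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfect positive linear relationship
    print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # perfect negative linear relationship
```

Both example calls print values equal to 1 and -1 up to floating-point rounding.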
Z score Review:
$$ Z = \frac{\text{case score} - \text{mean}}{\text{standard deviation}} = \frac{X - M}{S} $$
Remember: the mean Z score for the x and y variables will be 0
– Scores higher than the mean are positive
– Scores lower than the mean are negative
Correlation between SAT and GPA; Z scores are given
SAT    GPA   SAT Z   GPA Z   SAT Z × GPA Z
1200   4.0   +.45    +1.00   +.45
1400   4.0   +1.34   +1.00   +1.34
1000   3.0   -.45    -1.00   +.45
800    3.0   -1.34   -1.00   +1.34
$$ r = \frac{\sum Z_x Z_y}{n} = \frac{3.58}{4} = .895 $$
There is a strong positive relationship between GPA & SAT scores.
How does this formula work?
Positive correlation: a lot of big positive products in the numerator, meaning many pairs in which Zx and Zy are both positive or both negative
Negative correlation: a lot of negative products in the numerator, meaning many pairs in which one Z score is positive and the other is negative
0 correlation: roughly equal numbers of (pos, pos), (neg, neg), (pos, neg), and (neg, pos) pairs, so the positive and negative products cancel out
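To check the SAT/GPA example, this Python sketch (variable and function names are illustrative) recomputes the Z scores from the raw scores, again assuming the population standard deviation, which is the convention that reproduces the slide's Z scores (+.45, +1.34, -.45, -1.34 for SAT; ±1.00 for GPA). Every product is positive because each student is either above the mean on both variables or below it on both, the "positive correlation" case described above.

```python
from math import sqrt

def z_scores(values):
    # Illustrative helper: Z = (X - M) / S with the population SD (divide by n)
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

sat = [1200, 1400, 1000, 800]
gpa = [4.0, 4.0, 3.0, 3.0]

z_sat = z_scores(sat)
z_gpa = z_scores(gpa)
products = [a * b for a, b in zip(z_sat, z_gpa)]   # Zx * Zy for each student
r_exact = sum(products) / len(sat)                 # r from unrounded Z scores

# The same calculation using the slide's Z scores, rounded to two decimals
r_slide = (0.45 + 1.34 + 0.45 + 1.34) / 4

for row in zip(sat, gpa, z_sat, z_gpa, products):
    print("%5d %4.1f %+6.2f %+6.2f %+6.2f" % row)
print("r (unrounded Z) = %.3f" % r_exact)
print("r (slide's rounded Z) = %.3f" % r_slide)
```

With unrounded Z scores r prints as 0.894; the slide's .895 comes from rounding the Z scores to two decimals before multiplying. Either way the relationship is strongly positive.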
Correlation = 1
[Figure: scatterplot of y against x, both axes 0 to 8]
Examples of r = 1
Deterministic relationships
– e.g. time spent traveling at a fixed speed & distance covered
Hard to come up with interesting examples
Cigars
– Ring gauge is the diameter of a cigar measured in 64ths of an inch
– "…a 24-year-old man might smoke a 3/8 inch cigarillo, a 32-year-old a ½ inch panatela, & a 48-year-old a ¾ inch Churchill" (Fink 2009)
Ring Gauge Should Match Your Age
[Figure: size of cigar (inches) against age]
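The Fink rule is deterministic: ring gauge equals age, and ring gauge is measured in 64ths of an inch, so the diameter in inches is age / 64. This short Python sketch (the helper names are illustrative, not from the lecture) turns the three ages in the quote into diameters and checks that the correlation is 1, up to floating-point rounding.

```python
from math import sqrt

def pearson_r(xs, ys):
    # Illustrative helper: Pearson r as the mean product of Z scores (population SD)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / n

def diameter_inches(age):
    # Illustrative helper: "ring gauge should match your age", i.e. ring gauge = age,
    # and ring gauge is in 64ths of an inch, so the diameter is age / 64 inches
    return age / 64

ages = [24, 32, 48]                         # the three ages in the Fink quote
sizes = [diameter_inches(a) for a in ages]  # 3/8, 1/2 and 3/4 inch
print(sizes)
print(pearson_r(ages, sizes))
```

Because every point falls exactly on the line diameter = age / 64, knowing the age tells you the cigar size perfectly, which is what r = 1 means.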
Correlation of -1
[Figure: scatterplot of y against x, both axes 0 to 8]
Scatterplot of 2 Random Variables
[Figure: scatterplot of rand1 and rand2, both ranging from 0 to 1]
Correlation = -.03
Scatterplot of Reagan Vote in 1980 & in 1984 by State
[Figure: % vote for Reagan in 1984 against % vote for Reagan in 1980, by state]
Correlation = .85
Correlation of % Vote for Carter 1976 & % Vote for Reagan 1980
[Figure: percent vote for Carter in 1976 against % vote for Reagan in 1980, by state]
Correlation = -.71
Correlation Notes
It does not matter if we change the units of measurement
– E.g. the correlation between height & weight is the same regardless of whether height is expressed in feet, inches, or meters
Note also that correlation measures only the extent to which a linear relationship exists
Correlation = 0
[Figure: scatterplot of var10 and var11]
Correlation Notes
Correlation does not imply causation
Correlation means only that there is a relationship between the 2 variables
– E.g. number of firefighters and number of fires in a city
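The units note on the Correlation Notes slide can be checked numerically. Converting height from inches to feet or meters multiplies every value by the same positive constant, which rescales the mean and the standard deviation by that same factor, so every Z score (and therefore r) is unchanged. The heights and weights below are made-up illustrative values, not data from the lecture, and `pearson_r` is an illustrative helper.

```python
from math import sqrt

def pearson_r(xs, ys):
    # Illustrative helper: Pearson r as the mean product of Z scores (population SD)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / n

# Hypothetical heights (inches) and weights (pounds), invented for illustration
height_in = [62, 65, 67, 70, 72, 75]
weight_lb = [120, 140, 135, 165, 170, 190]

height_ft = [h / 12 for h in height_in]      # the same heights in feet
height_m = [h * 0.0254 for h in height_in]   # the same heights in meters

for label, heights in [("inches", height_in), ("feet", height_ft), ("meters", height_m)]:
    print(label, round(pearson_r(heights, weight_lb), 6))
```

All three lines print the same r.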
Correlation problems
Does not have substantive significance: r does not tell us how much one variable changes when the other changes
– We know the correlation between the vote for Reagan in 1984 & in 1980 is .85
– How much does the vote in 1984 increase for each percentage point increase in 1980?
– Consider a state in which the vote was 50% in 1980. What should we expect the vote to be in 1984?
These questions require regression
Scatterplot of Reagan Vote in 1980 & in 1984 by State
[Figure: % vote for Reagan in 1984 against % vote for Reagan in 1980, by state]
Regression
We want to fit a straight line to these data
The equation for this line will provide a substantive interpretation of the relationship and will help us predict what would happen in states with a given percentage of the vote in 1980
[Figure: the same scatterplot of the 1984 Reagan vote against the 1980 Reagan vote]
Regression
y is the dependent variable or outcome variable; it is the variable we are trying to predict
– Reagan vote in 1984
x is the independent variable or explanatory variable
– Reagan vote in 1980
x is displayed on the horizontal axis & y on the vertical axis
Regression Equation
The equation of the line is
$$ y = a + bx $$
a is the constant; it is the value of y when x equals zero
b is the slope of the line; it shows how much y increases for each unit increase in x
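The slides give the form y = a + bx but not how a and b are chosen. A common method, assumed here, is ordinary least squares, whose solution can be written with the quantities already introduced: b = r · (Sy / Sx) and a = My - b · Mx. This Python sketch (`fit_line` is an illustrative name) also shows why r alone lacks substantive meaning: b carries the units of y per unit of x, while r has no units.

```python
from math import sqrt

def fit_line(xs, ys):
    """Illustrative helper: fit y = a + b*x by ordinary least squares.
    Assumption: the lecture does not say how its line was fit. Returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)   # population SD of x
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)   # population SD of y
    r = sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / n
    b = r * sy / sx      # slope: change in y for each one-unit increase in x
    a = my - b * mx      # constant: value of y when x = 0
    return a, b

if __name__ == "__main__":
    # Hypothetical points lying exactly on y = 2 + 0.5x; the fit should recover a and b
    xs = [0, 2, 4, 6, 8]
    ys = [2 + 0.5 * x for x in xs]
    print(fit_line(xs, ys))
```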
Regression example
The equation for the Reagan line is
$$ y = 25.8 + .67x $$
The constant is 25.8 (when x = 0, y = 25.8)
The slope is .67
– For each increase of 1 percentage point in the Reagan 1980 vote, the Reagan 1984 vote increases by .67 percentage points
[Figure: % vote for Reagan in 1984 against % vote for Reagan in 1980, both axes 0 to 100]
Interpreting the Relationship
Interpretation: for each increase of one percentage point in the 1980 vote, the 1984 vote will increase by about 2/3 of a percentage point
– This gives us a substantive interpretation of the relationship between the 2 variables
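The Reagan line can be written as a small function (the name `reagan_1984` is illustrative); the constant 25.8 and slope .67 are the lecture's. Evaluating it shows the slope interpretation directly and reproduces the predictions for states with 50% and 60% of the 1980 vote that appear on the Prediction slides.

```python
def reagan_1984(vote_1980):
    """Illustrative helper: predicted % Reagan vote in 1984 from y = 25.8 + .67x."""
    a, b = 25.8, 0.67   # constant and slope from the lecture's Reagan line
    return a + b * vote_1980

if __name__ == "__main__":
    # Slope: one more point for Reagan in 1980 adds .67 points in 1984
    print(round(reagan_1984(51) - reagan_1984(50), 2))
    # Predictions for states where Reagan won 50% and 60% in 1980
    print(round(reagan_1984(50), 1), round(reagan_1984(60), 1))
```

This prints 0.67, then 59.3 and 66.0.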
Prediction
The equation for the Reagan line is
$$ y = 25.8 + .67x $$
What about a state with 50% voting for Reagan in 1980?
$$ y = 25.8 + .67 \times 50 = 59.3 $$
[Figure: % vote for Reagan in 1984 against % vote for Reagan in 1980]
The equation for the Reagan line is
$$ y = 25.8 + .67x $$
What about a state with 60% voting for Reagan in 1980?
$$ y = 25.8 + .67 \times 60 = 66 $$
[Figure: % vote for Reagan in 1984 against % vote for Reagan in 1980]
A more recent example…
[Figure: Obama support in primary states (green ones) by black population]
Regression notes
As with correlation, regression does not imply causation
– E.g. the 1980 vote for Reagan did not cause the 1984 vote for Reagan
– Both are caused by other factors (e.g. conservatism of the state)
Regression notes
Be careful in making predictions; do not extrapolate far beyond existing data
– E.g. if we have data for 2000-2008, we can predict what will happen in 2009, but shouldn't extrapolate out to 2080
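One way to build the extrapolation warning into a prediction is a guard that refuses x values far outside the data the line was fit to. This Python sketch uses a hypothetical helper `predict_with_limit` and a hypothetical line fit to 2000-2008 data; the `margin` (how far past the data is still acceptable) is a judgment call, not a statistical rule.

```python
def predict_with_limit(a, b, x, x_min, x_max, margin):
    """Hypothetical helper: evaluate y = a + b*x, but refuse to go more than
    `margin` units beyond the range [x_min, x_max] of the fitted data."""
    if x < x_min - margin or x > x_max + margin:
        raise ValueError(
            f"x = {x} is too far outside the data range [{x_min}, {x_max}]"
        )
    return a + b * x

if __name__ == "__main__":
    # Hypothetical line fit to yearly data for 2000-2008: y = 10 + 2 * (year - 2000)
    a, b = -3990, 2
    print(predict_with_limit(a, b, 2009, 2000, 2008, margin=1))   # one year past the data
    try:
        predict_with_limit(a, b, 2080, 2000, 2008, margin=1)      # 72 years past the data
    except ValueError as err:
        print("refused:", err)
```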
Regression notes
Also, as with correlation, linear regression can underestimate or fail to detect curvilinear relationships
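A small illustration of this caveat, and of the earlier note that correlation measures only linear relationships: below, y is completely determined by x (y = x²), yet both r and the least-squares slope come out as 0, up to floating-point rounding, because the falling and rising halves of the U shape cancel. The data and the helper names `pearson_r` and `ols_slope` are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    # Illustrative helper: Pearson r as the mean product of Z scores (population SD)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / n

def ols_slope(xs, ys):
    # Illustrative helper: least-squares slope b = sum(dx * dy) / sum(dx ** 2)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]     # a perfect, but U-shaped, relationship
print(pearson_r(xs, ys), ols_slope(xs, ys))
```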
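Review
Correlation shows the relationship between two measurement variables; ranges between -1 and 1
Regression offers a substantive interpretation of the relationship
– y = a + bx
– Can be used to interpret the strength of the relationship and to make predictions about individual cases taking on a particular x value
Questions?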
This note was uploaded on 01/27/2011 for the course COM 200 taught by Professor Tamborini during the Fall '09 term at Michigan State University.