Chapter 7. Correlation and Simple Linear Regression

Section 7.1. Correlation

Scatterplots
- A plot of Y vs. X for two quantitative variables measured on the same individual.
- X = explanatory variable, Y = response variable.

Interpreting scatterplots
- Direction (positive or negative)
- Strength (strong, moderate, or weak)
- Look for outliers.

Example: Interpret the following scatterplot. X = percent taking the SAT, Y = median SAT math score for each state.
[Figure: scatterplot of median SAT math score vs. percent taking the SAT]

Correlation
A numerical measure of the strength and direction of the linear association between two quantitative variables. Why? Because it is very hard to judge the strength of a linear association by eye.

Facts about the correlation coefficient (r)
- Correlation does not distinguish between explanatory and response variables.
- r is always between -1 and +1.
- Interpretation: the sign of r gives the direction (positive or negative); the magnitude gives the strength (strong or weak).
- r measures only the strength of the linear relationship between two quantitative variables x and y.
- Outliers can have a strong effect on r.
- Correlation is a pure number; in other words, it is unitless.
- r = +1 means a perfect positive linear relationship.
- r = -1 means a perfect negative linear relationship.
- r = 0 doesn't mean there is no relationship between the two variables. It means there is no linear relationship between them.

CAUTION: Correlation measures the strength of the linear association between two variables. Two variables can therefore have a low correlation and still have a strong association that is not linear.

Example: The variables X and Y are highly correlated, and we can see a clear linear pattern. The correlation between X and Z, on the other hand, is small, yet X and Z also have a strong association.
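The caution about low correlation with a strong nonlinear association can be checked numerically. The sketch below uses made-up data (an exact line and an exact parabola); these are assumptions chosen for illustration, not the data behind the plots in these notes. The helper name `pearson_r` is also hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: x is symmetric around 0.
x = list(range(-5, 6))
y = [2 * xi + 1 for xi in x]   # exact straight line with positive slope
z = [xi ** 2 for xi in x]      # exact parabola: strong but nonlinear

r_xy = pearson_r(x, y)
r_xz = pearson_r(x, z)

print(f"r(X, Y) = {r_xy:.3f}")  # perfect linear relationship
print(f"r(X, Z) = {r_xz:.3f}")  # no linear relationship despite z = x^2
```

In this constructed case z is completely determined by x, yet r is zero because the relationship has no linear component. Plotting the data remains the safer first step.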
[Figure: Scatterplot of Y vs X]
Correlations: X, Y
Pearson correlation of X and Y = 1.000

[Figure: Scatterplot of Z vs X]
Correlations: X, Z
Pearson correlation of X and Z = 0.191

Formula: Correlation Coefficient
Suppose you have n pairs of data points: $(x_1, y_1), \ldots, (x_n, y_n)$.

Compute the sample means $\bar{x}$ and $\bar{y}$ and the sample standard deviations $s_x$ and $s_y$, where

$$s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad s_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}$$

Compute the z-scores:

$$z_{x_i} = \frac{x_i - \bar{x}}{s_x}, \qquad z_{y_i} = \frac{y_i - \bar{y}}{s_y}$$

Correlation coefficient:

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

Note: You don't need to remember this formula. You can use your calculator (it should have 2-variable functions) or Minitab (for homework problems).

Why does the formula for r give the same sign (negative/positive) as the graph?...
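As a rough sketch of the z-score formula for r, the following Python computes the sample means, the sample standard deviations, the z-scores, and then r as the sum of z-score products divided by n - 1. The data values and the function name `correlation_from_z_scores` are assumptions made up for illustration; they do not come from these notes.

```python
import statistics

def correlation_from_z_scores(xs, ys):
    """Return r and the per-point z-score products for paired data."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    s_x = statistics.stdev(xs)
    s_y = statistics.stdev(ys)
    z_x = [(x - x_bar) / s_x for x in xs]
    z_y = [(y - y_bar) / s_y for y in ys]
    products = [zx * zy for zx, zy in zip(z_x, z_y)]
    r = sum(products) / (n - 1)
    return r, products

# Hypothetical sample of n = 5 pairs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r, products = correlation_from_z_scores(xs, ys)
print(f"r = {r:.4f}")
for x, y, p in zip(xs, ys, products):
    # A product is positive when x and y fall on the same side of their means.
    print(f"x = {x}, y = {y}, z_x * z_y = {p:+.4f}")
```

Printing the individual products is one way to explore how each point contributes to the sign of r. Because z-scores have no units, the result does not change if x or y is rescaled to different units.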
 Summer '08
 Kyung
 Statistics
