Chapter 16: Correlation Coefficient
A. Construction of Measure of Reliability

Given data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we can always find the regression line $\hat{y} = a + bx$, where

$$b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} \quad \text{and} \quad a = \bar{y} - b\bar{x},$$

and use it to predict y from x. Is this prediction reliable?
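The formulas for a and b translate directly into code. The following is a minimal Python sketch, not part of the original notes. The function name `fit_line` is an illustrative choice, and the sample points are the four data pairs used in Example 1 of this chapter.

```python
def fit_line(points):
    """Return (a, b) of the least-squares line y_hat = a + b*x for (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    # b = (sum x_i y_i - n x_bar y_bar) / (sum x_i^2 - n x_bar^2)
    b = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)
    # a = y_bar - b x_bar
    a = y_bar - b * x_bar
    return a, b


data = [(15, 0.4), (50, 3.1), (35, 1.2), (90, 4.3)]
a, b = fit_line(data)
print(f"y_hat = {a:.4f} + {b:.4f} x")
```

The line can always be computed this way, whatever the data look like, which is exactly why a separate measure of how well it fits is needed.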
Say, in Case A, where the experimental data lie more or less along a straight line, the answer is possibly “yes”. In Case B, where the experimental data are scattered around, we might think that the prediction may not be reliable.
Question: How can we tell whether the data will lead to reliable predictions or not, i.e. whether the scatter diagram is of Case A, Case B, or something else?
Answer: We shall construct a measure of the linear link between x and y.
For this, consider Case A. The quantity

$$r = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2 - n\bar{x}^2\right)\left(\sum y_i^2 - n\bar{y}^2\right)}}$$

is called the correlation coefficient between x and y. It is unitless. This “r” is the measure of linear link we have been seeking.

Example 1: Find r for the data of Example 3 of Ch15: (15, 0.4), (50, 3.1), (35, 1.2), (90, 4.3)
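Before the calculator solution, the formula for r can also be sketched in Python. This is an illustrative sketch only, not part of the original notes. The function name `correlation` is an assumption, and the data are the four points of Example 1.

```python
import math


def correlation(points):
    """Correlation coefficient r for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    # Numerator: sum x_i y_i - n x_bar y_bar
    s_xy = sum(x * y for x, y in points) - n * x_bar * y_bar
    # The two factors under the square root in the denominator
    s_xx = sum(x * x for x, _ in points) - n * x_bar ** 2
    s_yy = sum(y * y for _, y in points) - n * y_bar ** 2
    return s_xy / math.sqrt(s_xx * s_yy)


data = [(15, 0.4), (50, 3.1), (35, 1.2), (90, 4.3)]
r = correlation(data)
print(round(r, 4))
```

Because the formula treats x and y symmetrically, calling `correlation` on the swapped pairs (y, x) returns the same value.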
Solution:
Using the calculator: ON, MODE COMP, MODE REG Lin
15 , 0.4 DT, ......, 90 , 4.3 DT
REG Output: SHIFT S-VAR VAR r gives r = 0.9611

Using the formula:
n = 4
∑ xᵢyᵢ = 590
x̄ = 47.5
∑ xᵢ² = 12050
ȳ = 2.25
∑ yᵢ² = 29.7

$$r = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2 - n\bar{x}^2\right)\left(\sum y_i^2 - n\bar{y}^2\right)}} = \frac{590 - 4 \times 47.5 \times 2.25}{\sqrt{(12050 - 4 \times 47.5^2)(29.7 - 4 \times 2.25^2)}} = 0.9611$$

Is this value of r large or small? How do I interpret r?

B. Theorem
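$$-1 \le r \le 1$$

Proof: This follows from the Cauchy-Schwarz inequality applied to the deviations $x_i - \bar{x}$ and $y_i - \bar{y}$, since $\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - n\bar{x}\bar{y}$ and $\sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2$, and likewise for y.

C. Interpretation of r: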
If               We say that x and y are:
r = 0.9394       strongly and positively correlated
r = −0.9413      strongly but negatively correlated
r = 0.7086       moderately and positively correlated
r = 0.5354       moderately and positively correlated
r = ±0.1314      weakly and ±vely correlated
r = 0            uncorrelated
r = ±1           strongly and ±vely correlated

Note 1: The correlation between x and y is the same as the correlation between y and x.

Note 2: In a study of correlation, take n to be as large as possible (usually n ≥ 20). Otherwise the value of r cannot reflect the true correlation well. For example, in the above calculation (Example 1), although r = 0.9611 is quite high, one should not be too excited, because n is only 4. In the extreme case n = 2, r must be +1 or −1 (as two points must lie on a straight line), which exaggerates the true correlation. Therefore, the larger the n, the better.

However, in applying the above interpretation, one must also note the following points:

Example 2: Suppose x = ice cream consumption, i.e. the total amount of ice cream consumed by the inhabitants of a city, and y = number of people drowned at the beach. Then it is likely that the correlation between x and y is +vely high. But that does not mean that eating ice cream causes a person to drown. In fact, they just happen to move together because of the weather.

Example 3: Brothers’ heights are also highly and +vely correlated. But one is not the cause and the other is not the effect. In fact, both heights are the effect of inheritance from their parents.

Note 3: Correlation means co-relation. It only indicates how x and y move together. x and y need NOT have a cause-effect relationship.

Example 4: However, the high correlation between smoking and lung cancer led to the conclusion that smoking causes cancer. The high correlation was noticed long ago. But the cause-effect relationship was only proved in 1996. Since then, cigarette commercials have been completely banned.

D. Use of LR Mode in 50F
Example 5: Find $\hat{y} = a + bx$ and r for the following data. Hence predict y when x = 50.

Solution:
ON, MODE COMP, MODE REG Lin
15 , 4.7 DT
54 , 3.3 SHIFT ; 2 DT
……………….
95 , (−) 0.5 DT

REG Output:

Keys                            Output
SHIFT S-SUM n EXE               n = 8
SHIFT S-SUM ∑x² EXE             ∑ xᵢ² = 23383
SHIFT S-SUM ∑xy EXE             ∑ xᵢyᵢ = 878.9
SHIFT S-SUM ∑y² EXE             ∑ yᵢ² = 77.37
SHIFT S-VAR VAR x̄ EXE           x̄ = 49.375
SHIFT S-VAR VAR ȳ EXE           ȳ = 2.7875
SHIFT S-VAR VAR a EXE           a = 5.6147
SHIFT S-VAR VAR b EXE           b = −0.05726
SHIFT S-VAR VAR r EXE           r = −0.91457
50 SHIFT S-VAR VAR ŷ EXE        ŷ = 2.7517

Hence ...
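As a cross-check on the calculator outputs of Example 5, a, b, r and the prediction at x = 50 can be recomputed from the summary values n, x̄, ȳ, ∑x², ∑xy and ∑y² alone. The Python sketch below is illustrative and not part of the original notes. The function name `line_and_r_from_sums` is an assumption, and the inputs are the rounded values shown on the calculator, so agreement is only to about four significant figures.

```python
import math


def line_and_r_from_sums(n, x_bar, y_bar, sum_xx, sum_xy, sum_yy):
    """Return (a, b, r) computed from summary statistics only."""
    s_xy = sum_xy - n * x_bar * y_bar   # sum x_i y_i - n x_bar y_bar
    s_xx = sum_xx - n * x_bar ** 2      # sum x_i^2 - n x_bar^2
    s_yy = sum_yy - n * y_bar ** 2      # sum y_i^2 - n y_bar^2
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    r = s_xy / math.sqrt(s_xx * s_yy)
    return a, b, r


# Summary values as displayed by the calculator in Example 5 (rounded).
a, b, r = line_and_r_from_sums(
    n=8, x_bar=49.375, y_bar=2.7875,
    sum_xx=23383, sum_xy=878.9, sum_yy=77.37,
)
y_hat_50 = a + b * 50  # prediction at x = 50
print(f"a = {a:.4f}, b = {b:.5f}, r = {r:.5f}, y_hat(50) = {y_hat_50:.4f}")
```

Working from the sums mirrors what the calculator does internally in REG mode, which is why the raw data are not needed for this check.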