Chapter 17: More Computational Tricks for Regression and Correlation

Example 1:
The data below represent the lengths (x) and the breadths (y) of five cuckoos' eggs, measured in mm.

  x   22.3   24.2   20.8   25.9   23.5
  y   16.5   17.8   10.4   19.3   15.8

(a) Fit a regression line ŷ = a + bx to these data. Find r.

Solution:
  ON  MODE COMP  MODE REG Lin REG
  22.3 , 16.5 DT  ...  23.5 , 15.8 DT
  SHIFT S-SUM n EXE            n = 5
  SHIFT S-VAR VAR a EXE        a = −21.2344
  SHIFT S-VAR VAR b EXE        b = 1.5936
  SHIFT S-VAR VAR r EXE        r = 0.9078

∴ ŷ = a + bx = −21.2344 + 1.5936x,  r = 0.9078

(b) It was later found that the pair (20.8, 10.4) was incorrectly recorded. It should have been (21.8, 14.0). Modify the R.L. and r found in (a).

Solution: Modify the third pair, then recall the statistics:
  ∇ ∇ ∇ ∇ ∇ ∇ ∇ 21.8 EXE  ∇ 14 EXE  AC
  SHIFT S-VAR VAR a EXE        a = −9.6923
  SHIFT S-VAR VAR b EXE        b = 1.1203
  SHIFT S-VAR VAR r EXE        r = 0.9076

∴ ŷ = a + bx = −9.6923 + 1.1203x,  r = 0.9076

(c) Find the R.L. of x on y (i.e. x̂ = c + dy) for the corrected data. Also find r. Is the correlation the same as in (b)?

Solution:
Method 1: Simply reverse x and y and find the regression line as usual.

  y   16.5   17.8   14.0   19.3   15.8
  x   22.3   24.2   21.8   25.9   23.5

∴ x̂ = c + dy = 11.2754 + 0.7353y,  r = 0.9076

Note the following relationship:

$$b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}, \qquad d = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum y_i^2 - n\bar{y}^2}$$

$$r = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2 - n\bar{x}^2\right)\left(\sum y_i^2 - n\bar{y}^2\right)}}$$

$$\therefore\; r^2 = \frac{\left(\sum x_i y_i - n\bar{x}\bar{y}\right)^2}{\left(\sum x_i^2 - n\bar{x}^2\right)\left(\sum y_i^2 - n\bar{y}^2\right)} = bd$$

Method 2: Continue the above program (i.e. r = 0.9076 is showing on the calculator screen). Since

$$bd = r^2 \;\Rightarrow\; d = \frac{r^2}{b} \qquad\text{and}\qquad \hat{x} = c + dy \;\Rightarrow\; c = \bar{x} - d\bar{y},$$

we have the following steps:
  x²  ÷  SHIFT S-VAR VAR b  EXE                                  d = 0.7353
  (−) Ans  ×  SHIFT S-VAR VAR ȳ  +  SHIFT S-VAR VAR x̄  EXE       c = 11.2754

Therefore x̂ = c + dy = 11.2754 + 0.7353y and r = 0.9076.
There is NO change in the value of the correlation.

(d) Find ŷ at x = 21.5 using ŷ = a + bx of (b); find x̂ at y = 18 using x̂ = c + dy of (c).
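The values in parts (b) and (c), and the relationship r² = bd, can also be cross-checked away from the calculator. The sketch below is an illustration only, not part of the original key sequences; `fit` is a hypothetical helper name, and it uses centred sums of squares, which are algebraically the same as the formulas in the note above.

```python
import math

def fit(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)

# Corrected data of part (b): (20.8, 10.4) replaced by (21.8, 14.0)
x = [22.3, 24.2, 21.8, 25.9, 23.5]
y = [16.5, 17.8, 14.0, 19.3, 15.8]

a, b, r = fit(x, y)       # regression of y on x
c, d, r_rev = fit(y, x)   # regression of x on y (roles reversed, as in Method 1)

print(f"y on x: a = {a:.4f}, b = {b:.4f}, r = {r:.4f}")
print(f"x on y: c = {c:.4f}, d = {d:.4f}, r = {r_rev:.4f}")
print("b*d equals r^2:", math.isclose(b * d, r ** 2))
```

Reversing the arguments of `fit` mirrors Method 1; the last line checks the identity that Method 2 relies on.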
Solution:
For the first part, we continue the above program:
  21.5 SHIFT S-VAR VAR ŷ EXE        ŷ = 14.3946
∴ ŷ = 14.3946

For the second part, we need direct computation:
∴ x̂ = 11.2754 + 0.7353 × 18 = 24.5108

Example 2:
Results of 40 students who took a numerical test (x) and an aptitude test (y) are shown below. Find ŷ = a + bx and r.

                      Aptitude test y
  Numerical test x    601-650   551-600   501-550   451-500   401-450
  21-40                  -         -         1         2         2
  41-60                  1         2         4         3         -
  61-80                  2         5         8         2         -
  81-100                 2         5         1         -         -

Solution:
In terms of class marks, the data are as below:

                      Aptitude test y
  Numerical test x    625.5   575.5   525.5   475.5   425.5
  30.5                  -       -       1       2       2
  50.5                  1       2       4       3       -
  70.5                  2       5       8       2       -
  90.5                  2       5       1       -       -

  ON  MODE COMP  MODE REG Lin REG
  30.5 , 525.5 DT
  30.5 , 475.5 SHIFT ; 2 DT
  30.5 , 425.5 SHIFT ; 2 DT
  50.5 , 625.5 DT
  50.5 , 575.5 SHIFT ; 2 DT
  50.5 , 525.5 SHIFT ; 4 DT
  50.5 , 475.5 SHIFT ; 3 DT
  ......
  90.5 , 625.5 SHIFT ; 2 DT
  90.5 , 575.5 SHIFT ; 5 DT
  90.5 , 525.5 DT
  SHIFT S-SUM n EXE            n = 40
  SHIFT S-VAR VAR a EXE        a = 430.03
  SHIFT S-VAR VAR b EXE        b = 1.6933
  SHIFT S-VAR VAR r EXE        r = 0.5991

∴ ŷ = a + bx = 430.03 + 1.6933x,  r = 0.5991

Example 3:
Construct (any number of) paired data so that the correlation coefficient between x and y is
(a) 0.4
(b) −0.35

Solution:
Trick: Assign frequencies 1+r, 1−r, etc. to the points (1, 1), (−1, 1), etc. as in the table below.
   x    y    f
   1    1   1+r
  −1    1   1−r
  −1   −1   1+r
   1   −1   1−r

The same frequencies as a two-way table:

  y \ x     −1     1    Total
    1      1−r   1+r      2
   −1      1+r   1−r      2
  Total     2     2       4

Then x̄ = 0, ȳ = 0 and we have:

$$\text{Correlation} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2 - n\bar{x}^2\right)\left(\sum y_i^2 - n\bar{y}^2\right)}} = \frac{4r - 0}{\sqrt{4 - 0}\cdot\sqrt{4 - 0}} = r$$

(a) For r = 0.4, we construct the data set as below:

   x    y    f              y \ x    −1     1
   1    1   1.4               1     0.6   1.4
  −1    1   0.6              −1     1.4   0.6
  −1   −1   1.4
   1   −1   0.6

This yields r = 0.4.

(b) For r = −0.35, we construct the data set as below:

   x    y    f              y \ x    −1      1
   1    1   0.65              1     1.35   0.65
  −1    1   1.35             −1     0.65   1.35
  −1   −1   0.65
   1   −1   1.35

This yields r = −0.35.

Note: The above method only takes care of the r value. If we want to take care of other quantities such as s1 and s2, we need more tricks. See Example 4.

Example 4:
Two samples from the same experiment give the following results:

  Item        n    x̄     ȳ    s1   s2    r
  Sample 1   13    30   125    3    6   0.7
  Sample 2   10    32   120    4    5   0.8

The two samples are now combined. Find the R.L. of y on x and the correlation, based on these 23 pairs of data.

Solution:
Let us create a frequency table for Sample 1:

Table I
  y \ x        30−3                  30      30+3
  125+6    (13−1)/4 · (1−0.7)         0     (13−1)/4 · (1+0.7)
  125             0                   1            0
  125−6    (13−1)/4 · (1+0.7)         0     (13−1)/4 · (1−0.7)

Table II
  y \ x     27     30     33
  131      0.9      0    5.1
  125        0      1      0
  119      5.1      0    0.9

To verify that the above method of creation works, you may use REG mode on Table II:

  ON  MODE COMP  MODE REG Lin REG
  27 , 131 SHIFT ; 0.9 DT
  33 , 131 SHIFT ; 5.1 DT
  30 , 125 DT
  27 , 119 SHIFT ; 5.1 DT
  33 , 119 SHIFT ; 0.9 DT

Then press SHIFT S-SUM and SHIFT S-VAR VAR to check that Table II reproduces the statistics of Sample 1.

Solution continued:
Similarly, we may create a frequency table for Sample 2:

Table III
  y \ x        32−4                  32      32+4
  120+5    (10−1)/4 · (1−0.8)         0     (10−1)/4 · (1+0.8)
  120             0                   1            0
  120−5    (10−1)/4 · (1+0.8)         0     (10−1)/4 · (1−0.8)

Table IV
  y \ x     28      32     36
  125      0.45      0    4.05
  120        0       1      0
  115      4.05      0    0.45

Then apply REG mode to both Tables II and IV taken together.
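Away from the calculator, the same combined fit can be sketched in Python. This is an illustrative check rather than the REG-mode procedure itself; `weighted_fit` is a hypothetical helper that treats each (possibly fractional) frequency as a weight, and the two lists simply transcribe the non-zero cells of Tables II and IV.

```python
import math

def weighted_fit(points):
    """points: list of (x, y, f) with f a frequency (may be fractional).
    Returns (a, b, r) for the line y = a + b*x."""
    n = sum(f for _, _, f in points)
    mx = sum(f * x for x, _, f in points) / n
    my = sum(f * y for _, y, f in points) / n
    sxx = sum(f * (x - mx) ** 2 for x, _, f in points)
    syy = sum(f * (y - my) ** 2 for _, y, f in points)
    sxy = sum(f * (x - mx) * (y - my) for x, y, f in points)
    b = sxy / sxx
    return my - b * mx, b, sxy / math.sqrt(sxx * syy)

# Non-zero cells of Table II (Sample 1) and Table IV (Sample 2)
table_ii = [(27, 131, 0.9), (33, 131, 5.1), (30, 125, 1.0),
            (27, 119, 5.1), (33, 119, 0.9)]
table_iv = [(28, 125, 0.45), (36, 125, 4.05), (32, 120, 1.0),
            (28, 115, 4.05), (36, 115, 0.45)]

# Pool the two tables, exactly as entering both into REG mode would do
a, b, r = weighted_fit(table_ii + table_iv)
print(f"a = {a:.4f}, b = {b:.4f}, r = {r:.4f}")
```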
The results are:
∴ ŷ = 95.9956 + 0.8692x and r = 0.5098

General formula:

  y \ x         x̄ − s_X                 x̄       x̄ + s_X
  ȳ + s_Y   ((n−1)/4)(1 − r)             0     ((n−1)/4)(1 + r)
  ȳ                0                     1            0
  ȳ − s_Y   ((n−1)/4)(1 + r)             0     ((n−1)/4)(1 − r)

With these frequencies the table has total frequency n, means x̄ and ȳ, sample standard deviations s_X and s_Y (divisor n − 1), and correlation r.

Simple Exercises:

Q1. (a) Find ŷ = a + bx and r from these data:

  x    76.5   94.3   120.7   64.3   70.2
  y     6.8    5.1     3.2    8.4    7.5
  f       3      1       2      3      1

Q1. (b) Find ŷ = a + bx and r from these data:

  x classes: 0-5, 5-10, 10-15, 15-20, 20-25
  y classes: 20-40, 40-60, 60-80, 80-100
  Cell frequencies, in the order listed: 3 2 4 4 2 2 1 8 5 1 9 2

Q2. Two samples from the same experiment yield the results below:

  Statistics   Sample 1   Sample 2
  n               20         10
  x̄               15         16
  ȳ               2.7        2.1
  s1              1.5        1.8
  s2              2.2        2.0
  r               0.78       0.85

(a) Combine the two samples to obtain a R.L. of y on x (i.e. ŷ = a + bx).
(b) What is the correlation based on the 30 pairs?
...
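The general formula above can be checked numerically. The sketch below is a minimal illustration under stated assumptions: `five_point_table` and `summarize` are hypothetical names, and the standard deviations are taken with divisor n − 1, which is what the factor (n − 1)/4 in the formula implies. It builds the five weighted points for Sample 1 of Example 4 and recovers the summary statistics from them.

```python
import math

def five_point_table(n, xbar, ybar, sx, sy, r):
    """Five weighted points (x, y, f) following the general formula.
    sx and sy are sample standard deviations (divisor n - 1)."""
    k = (n - 1) / 4
    return [
        (xbar - sx, ybar + sy, k * (1 - r)),
        (xbar + sx, ybar + sy, k * (1 + r)),
        (xbar,      ybar,      1.0),
        (xbar - sx, ybar - sy, k * (1 + r)),
        (xbar + sx, ybar - sy, k * (1 - r)),
    ]

def summarize(points):
    """Recover (n, xbar, ybar, sx, sy, r) from weighted points (x, y, f)."""
    n = sum(f for _, _, f in points)
    mx = sum(f * x for x, _, f in points) / n
    my = sum(f * y for _, y, f in points) / n
    sxx = sum(f * (x - mx) ** 2 for x, _, f in points)
    syy = sum(f * (y - my) ** 2 for _, y, f in points)
    sxy = sum(f * (x - mx) * (y - my) for x, y, f in points)
    return (n, mx, my,
            math.sqrt(sxx / (n - 1)), math.sqrt(syy / (n - 1)),
            sxy / math.sqrt(sxx * syy))

# Sample 1 of Example 4: n = 13, means 30 and 125, s1 = 3, s2 = 6, r = 0.7
stats = summarize(five_point_table(13, 30, 125, 3, 6, 0.7))
print([round(v, 4) for v in stats])
```

The same two helpers can be reused to build the tables for any pair of samples given only their summary statistics, after which the pooled points are fitted as usual.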
