# Lecture 6

## Relationships can be deceiving

An outlier is defined to be any value that is more than $1.5 \times \text{IQR}$ beyond the closest quartile. A value is an outlier if either

1) the value is greater than $Q_3 + (1.5 \times \text{IQR})$, or
2) the value is less than $Q_1 - (1.5 \times \text{IQR})$.

Ex. What is the possible outlier?

X: 2 5 7 9 10 11 55

$Q_1 = P_{25}$ = (n*k)th number = ____

$Q_3 = P_{75}$ = (n*k)th number = ____

IQR = ____

A value is an outlier if either

1) the value is greater than ____
2) the value is less than ____
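The rule above can be checked with a short script. This is a minimal sketch, not the lecture's official method: the notes only say "(n*k)th number", so the helper `percentile_by_position` (a hypothetical name) assumes that a fractional position is rounded up to the next whole position and that a whole-number position is averaged with the next value. Other quartile conventions can give slightly different fences.

```python
import math


def percentile_by_position(sorted_values, k):
    """Return the percentile at fraction k (0 < k < 1) using the (n*k)th-value rule.

    Assumption (not stated in the notes): a fractional position is rounded up,
    and a whole-number position is averaged with the next value.
    """
    n = len(sorted_values)
    pos = n * k
    if pos != int(pos):
        # Fractional position: take the value at the next whole position (1-based).
        return sorted_values[math.ceil(pos) - 1]
    i = int(pos)
    # Whole position: average the i-th and (i+1)-th values (1-based).
    return (sorted_values[i - 1] + sorted_values[i]) / 2


def iqr_outliers(values):
    """Apply the 1.5*IQR rule and return the quartiles, fences, and outliers."""
    data = sorted(values)
    q1 = percentile_by_position(data, 0.25)
    q3 = percentile_by_position(data, 0.75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr  # values below this fence are outliers
    upper = q3 + 1.5 * iqr  # values above this fence are outliers
    outliers = [v for v in data if v < lower or v > upper]
    return q1, q3, iqr, lower, upper, outliers


x = [2, 5, 7, 9, 10, 11, 55]
q1, q3, iqr, lower, upper, outliers = iqr_outliers(x)
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"lower fence={lower}, upper fence={upper}")
print(f"outliers={outliers}")
```

Under the stated assumption the positions are 7 × 0.25 = 1.75 (rounded up to the 2nd value) and 7 × 0.75 = 5.25 (rounded up to the 6th value), so the only value outside the fences is 55.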
How does an outlier affect the correlation?

X: 1 2 3 4 5 6 7

Y: 2 5 7 9 10 11 55

[Figure: Scatterplot of y vs x]

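Find the correlation.

| $X$ | $Y$ | $X^2$ | $Y^2$ | $XY$ |
|---|---|---|---|---|
| 1 | 2 | 1 | 4 | 2 |
| 2 | 5 | 4 | 25 | 10 |
| 3 | 7 | 9 | 49 | 21 |
| 4 | 9 | 16 | 81 | 36 |
| 5 | 10 | 25 | 100 | 50 |
| 6 | 11 | 36 | 121 | 66 |
| 7 | 55 | 49 | 3025 | 385 |
| $\sum x = 28$ | $\sum y = 99$ | $\sum x^2 = 140$ | $\sum y^2 = 3405$ | $\sum xy = 570$ |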
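As a worked sketch of this step, the correlation can be computed from the column sums of the 7 pairs ($\sum x = 28$, $\sum y = 99$, $\sum x^2 = 140$, $\sum y^2 = 3405$, $\sum xy = 570$). The sum-of-squares shortcut formulas below are the standard ones. The notes do not show this computation, so treat the layout as an assumption about how it was presented.

$$
\begin{aligned}
SS_{xx} &= \sum x^2 - \frac{(\sum x)^2}{n} = 140 - \frac{28^2}{7} = 28 \\
SS_{yy} &= \sum y^2 - \frac{(\sum y)^2}{n} = 3405 - \frac{99^2}{7} \approx 2004.857 \\
SS_{xy} &= \sum xy - \frac{(\sum x)(\sum y)}{n} = 570 - \frac{28 \times 99}{7} = 174 \\
r &= \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}} = \frac{174}{\sqrt{28 \times 2004.857}} \approx 0.734
\end{aligned}
$$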

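Now drop the outlier we found earlier and find the correlation.

| $X$ | $Y$ | $X^2$ | $Y^2$ | $XY$ |
|---|---|---|---|---|
| 1 | 2 | 1 | 4 | 2 |
| 2 | 5 | 4 | 25 | 10 |
| 3 | 7 | 9 | 49 | 21 |
| 4 | 9 | 16 | 81 | 36 |
| 5 | 10 | 25 | 100 | 50 |
| 6 | 11 | 36 | 121 | 66 |
| $\sum x = 21$ | $\sum y = 44$ | $\sum x^2 = 91$ | $\sum y^2 = 380$ | $\sum xy = 185$ |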
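To compare the two cases side by side, the following sketch computes the correlation with and without the pair (7, 55), using the sum-of-squares shortcut formulas. The function names `sums_of_squares` and `correlation` are illustrative, not from the lecture.

```python
import math


def sums_of_squares(xs, ys):
    """Return (SSxx, SSyy, SSxy) using the computational shortcut formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return ss_xx, ss_yy, ss_xy


def correlation(xs, ys):
    """Pearson correlation r = SSxy / sqrt(SSxx * SSyy)."""
    ss_xx, ss_yy, ss_xy = sums_of_squares(xs, ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


x = [1, 2, 3, 4, 5, 6, 7]
y = [2, 5, 7, 9, 10, 11, 55]

r_all = correlation(x, y)                # all 7 pairs
r_dropped = correlation(x[:-1], y[:-1])  # outlier (7, 55) removed

print(f"r with outlier:    {r_all:.4f}")
print(f"r without outlier: {r_dropped:.4f}")
```

With the outlier included the correlation is roughly 0.73. Without it the remaining six points lie close to a line and the correlation rises to roughly 0.98.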

How does an outlier affect the regression line? Using the same values and the sums of squares from the previous examples:

X: 1 2 3 4 5 6 7

Y: 2 5 7 9 10 11 55

For all the original 7 pairs of points:

$$\bar{X} = \frac{\sum x}{n} = \frac{28}{7} = 4$$

$$\bar{Y} = \frac{\sum y}{n} = \frac{99}{7} = 14.142857$$

$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{174}{28} = 6.2142857$$

$$a = \bar{Y} - b\,\bar{X} = 14.142857 - 6.2142857 \times 4 = -10.7142858$$

Thus, the regression line for the original 7 pairs is

$$\hat{y} = 6.2142857\,x - 10.7142858$$

[Figure: Scatterplot of y vs x]

Now drop the outlier we found earlier and refit the regression line. So, we are only working with 6 pairs of numbers.

$$\bar{X} = \frac{\sum x}{n} = \frac{21}{6} = 3.5$$

$$\bar{Y} = \frac{\sum y}{n} = \frac{44}{6} = 7.333333333$$

$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{31}{17.5} = 1.771428571$$

$$a = \bar{Y} - b\,\bar{X} = 7.33333333 - 1.771428571 \times 3.5 = 1.13333333$$

Thus, the regression line for the 6 pairs after the outlier is dropped is

$$\hat{y} = 1.771428571\,x + 1.133333333$$

[Figure: Scatterplot of y_1 vs x_1]

Now look at the two regression lines superimposed on the same graph.

[Figure: Scatterplot of y vs x, y_1 vs x_1]

The outlier pulls the regression line towards it....
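The two fits can be reproduced with a few lines of code. This is a minimal sketch of the least-squares formulas b = SSxy/SSxx and a = Ȳ - b·X̄ used in the notes. The helper name `fit_line` is an assumption for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    b = ss_xy / ss_xx                # slope
    a = sum_y / n - b * (sum_x / n)  # intercept: mean(y) - b * mean(x)
    return a, b


x = [1, 2, 3, 4, 5, 6, 7]
y = [2, 5, 7, 9, 10, 11, 55]

a_all, b_all = fit_line(x, y)                # all 7 pairs
a_drop, b_drop = fit_line(x[:-1], y[:-1])    # outlier (7, 55) removed

print(f"7 pairs: y-hat = {b_all:.7f}*x + ({a_all:.7f})")
print(f"6 pairs: y-hat = {b_drop:.7f}*x + ({a_drop:.7f})")

# Compare what each line predicts at x = 7, where the outlier sits.
pred_all = a_all + b_all * 7
pred_drop = a_drop + b_drop * 7
print(f"prediction at x=7: {pred_all:.4f} (with outlier) vs {pred_drop:.4f} (without)")
```

The intercept for the 7-pair fit may differ from the notes in the last printed digit because the notes round b before multiplying. At x = 7 the line fitted with the outlier predicts a much larger value than the line fitted without it, which is the "pull" described above.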
