STATISTICS 211
PROF EMANUEL PARZEN
CHAPTER 9
BIVARIATE DATA ANALYSIS, CORRELATION, REGRESSION LINE

A very important application of statistical methods is the study of relations between two continuous variables $X$ and $Y$, given observed data $(X_j, Y_j)$, $j = 1, \ldots, n$. One first examines the data by a plot called a scatter diagram or scatter plot to guess what kind of curve fits the points. Often the curve is one of the following types:

(A) Line
(B) Curve similar to a parabola
(C) Two groups separated by a gap

This chapter discusses quantitative methods that apply when the scatter plot leads one to conclude that the points vary around a fitted line.

A. PROBABILITY THEORY OF LINEAR REGRESSION MODEL RELATION $(X, Y)$

To understand the theory and practice of the correlation coefficient $R(X, Y)$ we must introduce the concept of a regression line to fit data, which is motivated by the concept of a model in the population for the relation of two variables $X$ and $Y$.

We assume that $X$ and $Y$ are both random variables having population means $\mu(X)$, $\mu(Y)$, population variances

$$\sigma^2(X) = E\left[(X - \mu(X))^2\right], \qquad \sigma^2(Y) = E\left[(Y - \mu(Y))^2\right],$$

and population covariance

$$\operatorname{cov}(X, Y) = E\left[(X - \mu(X))(Y - \mu(Y))\right].$$

The "Song of Sums" formula for the variance of the sum $X + Y$ and the difference $X - Y$ is

VAR[X+Y] = VAR[X] + VAR[Y] + 2 COV[X,Y]
VAR[X-Y] = VAR[X] + VAR[Y] - 2 COV[X,Y]

The population correlation coefficient is defined as

$$R(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma(X)\,\sigma(Y)} = E\left[\frac{X - \mu(X)}{\sigma(X)} \cdot \frac{Y - \mu(Y)}{\sigma(Y)}\right].$$

Important inequality: $-1 \le R(X, Y) \le 1$

EXAMPLE: Assume VAR[X] = VAR[Y] = 1, COV[X,Y] = .5. Then VAR[X+Y] = 3, VAR[X-Y] = 1, R(X,Y) = .5. Note that VAR[2X] = 4 VAR[X] = 4, COV[2X,Y] = 2 COV[X,Y] = 1, R(2X,2Y) = R(X,Y) = .5.

B. LINEAR PREDICTION OF Y FROM X

The linear regression model of $Y$ given $X$ is motivated by the problem of predicting the value of $Y$ from the value of $X$ by minimum mean square prediction of $Y$ by a linear function of $X$, denoted

$$\hat{Y}(X) = \mu(Y) + b\,(X - \mu(X)),$$

where $b$ is chosen to minimize

$$\operatorname{MSE}(b) = E\left[(Y - \hat{Y}(X))^2\right] = E\left[\big((Y - \mu(Y)) - b\,(X - \mu(X))\big)^2\right].$$

To minimize $\operatorname{MSE}(b)$ we apply calculus: take the derivative of $\operatorname{MSE}(b)$ with respect to $b$, and solve for the value of $b$ at which the derivative equals zero. The derivative of $\operatorname{MSE}(b)$ is

$$-2\,E\left[\big((Y - \mu(Y)) - b\,(X - \mu(X))\big)(X - \mu(X))\right] = -2\,\big(\operatorname{cov}[X, Y] - b\,\mathrm{VAR}[X]\big).$$

HOW?

Conclusion: compute the population slope coefficient $b$ by the formula

$$b = \frac{\operatorname{cov}[X, Y]}{\mathrm{VAR}[X]}.$$

The minimum of $\operatorname{MSE}(b)$ equals

$$\mathrm{VAR}[Y] - 2b\,\operatorname{cov}[X, Y] + b^2\,\mathrm{VAR}[X] = \mathrm{VAR}[Y] - \frac{\operatorname{cov}^2[X, Y]}{\mathrm{VAR}[X]} = \mathrm{VAR}[Y]\left(1 - \frac{\operatorname{cov}^2[X, Y]}{\mathrm{VAR}[X]\,\mathrm{VAR}[Y]}\right).$$

HOW?

Conclusion: compute population minimum $\operatorname{MSE}(b)$ by formula for minimum mean square error...
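One way to answer the two "HOW?" prompts in Section B is to expand the square before differentiating. The shorthand $U$ and $V$ below is an assumption introduced only for this sketch and does not appear in the notes; the derivation also assumes $\mathrm{VAR}[X] > 0$.

$$
\begin{aligned}
U &= X - \mu(X), \quad V = Y - \mu(Y), \quad E[U^2] = \mathrm{VAR}[X], \quad E[V^2] = \mathrm{VAR}[Y], \quad E[UV] = \operatorname{cov}[X, Y] \\
\operatorname{MSE}(b) &= E\left[(V - bU)^2\right] = \mathrm{VAR}[Y] - 2b\,\operatorname{cov}[X, Y] + b^2\,\mathrm{VAR}[X] \\
\frac{d}{db}\operatorname{MSE}(b) &= -2\,\operatorname{cov}[X, Y] + 2b\,\mathrm{VAR}[X] = -2\,\big(\operatorname{cov}[X, Y] - b\,\mathrm{VAR}[X]\big) \\
\frac{d}{db}\operatorname{MSE}(b) = 0 \;&\Longrightarrow\; b = \frac{\operatorname{cov}[X, Y]}{\mathrm{VAR}[X]} \\
\operatorname{MSE}\!\left(\frac{\operatorname{cov}[X, Y]}{\mathrm{VAR}[X]}\right) &= \mathrm{VAR}[Y] - \frac{2\operatorname{cov}^2[X, Y]}{\mathrm{VAR}[X]} + \frac{\operatorname{cov}^2[X, Y]}{\mathrm{VAR}[X]} = \mathrm{VAR}[Y] - \frac{\operatorname{cov}^2[X, Y]}{\mathrm{VAR}[X]}
\end{aligned}
$$

Since the coefficient of $b^2$ is $\mathrm{VAR}[X] > 0$, the stationary point is a minimum. By the definition of $R(X, Y)$ in Section A, the last expression can also be written as $\mathrm{VAR}[Y]\,(1 - R^2(X, Y))$.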
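The formulas of Sections A and B can be checked numerically on the example values VAR[X] = VAR[Y] = 1, COV[X,Y] = .5. The following is a minimal sketch, not part of the original notes: the function names `slope`, `mse`, and `correlation` are hypothetical helpers, and `mse` assumes the expanded form MSE(b) = VAR[Y] - 2b COV[X,Y] + b^2 VAR[X].

```python
import math


def correlation(var_x, var_y, cov_xy):
    """R(X, Y) = cov / (sigma(X) * sigma(Y)); hypothetical helper."""
    return cov_xy / math.sqrt(var_x * var_y)


def slope(var_x, cov_xy):
    """Population slope b = cov[X, Y] / VAR[X]; assumes VAR[X] > 0."""
    return cov_xy / var_x


def mse(b, var_x, var_y, cov_xy):
    """Expanded form MSE(b) = VAR[Y] - 2 b cov[X, Y] + b^2 VAR[X]."""
    return var_y - 2 * b * cov_xy + b * b * var_x


# Example values from Section A.
var_x, var_y, cov_xy = 1.0, 1.0, 0.5

# Variance of the sum and of the difference.
var_sum = var_x + var_y + 2 * cov_xy
var_diff = var_x + var_y - 2 * cov_xy

# Correlation, and its invariance under the rescaling (X, Y) -> (2X, 2Y):
# VAR[2X] = 4 VAR[X], VAR[2Y] = 4 VAR[Y], COV[2X, 2Y] = 4 COV[X, Y].
r = correlation(var_x, var_y, cov_xy)
r_scaled = correlation(4 * var_x, 4 * var_y, 4 * cov_xy)

# Slope of the best linear predictor and the resulting minimum MSE.
b_star = slope(var_x, cov_xy)
min_mse = mse(b_star, var_x, var_y, cov_xy)

print("VAR[X+Y] =", var_sum)
print("VAR[X-Y] =", var_diff)
print("R(X,Y) =", r, " R(2X,2Y) =", r_scaled)
print("b =", b_star, " min MSE =", min_mse)
print("VAR[Y](1 - R^2) =", var_y * (1 - r ** 2))
```

Evaluating `mse` at values of `b` on either side of `b_star` (for instance 0.4 and 0.6) gives larger values than `min_mse`, which is one quick way to confirm that the stationary point is a minimum.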
 Fall '07
 Parzen
 Statistics, Correlation
