Chap9 - STATISTICS 211 PROF EMANUEL PARZEN CHAPTER 9...


STATISTICS 211, PROF. EMANUEL PARZEN
CHAPTER 9: BIVARIATE DATA ANALYSIS, CORRELATION, REGRESSION LINE

A very important application of statistical methods is the study of relations between two continuous variables $X$ and $Y$, given observed data $(X_j, Y_j)$, $j = 1, \ldots, n$. One first examines the data with a plot called a scatter diagram or scatter plot to guess what kind of curve fits the points. Often the pattern is one of the following types:

(A) Line
(B) Curve similar to a parabola
(C) Two groups separated by a gap

This chapter discusses quantitative methods that apply when inspection of the scatter plot suggests that the points vary around a fitted line.

A. PROBABILITY THEORY OF LINEAR REGRESSION MODEL RELATION $(X, Y)$

To understand the theory and practice of the correlation coefficient $R(X, Y)$ we must introduce the concept of a regression line fitted to data, which is motivated by the concept of a population model for the relation between two variables $X$ and $Y$. We assume that $X$ and $Y$ are both random variables having population means $\mu(X)$, $\mu(Y)$, population variances

$$\sigma^2(X) = E\left[(X - \mu(X))^2\right], \qquad \sigma^2(Y) = E\left[(Y - \mu(Y))^2\right],$$

and population covariance

$$\mathrm{cov}(X, Y) = E\left[(X - \mu(X))(Y - \mu(Y))\right].$$

In what follows the variance $\sigma^2(X)$ is also written $\mathrm{VAR}[X]$.

The Song of Sums formula for the variance of the sum $X+Y$ and the difference $X-Y$ is

$$\mathrm{VAR}[X+Y] = \mathrm{VAR}[X] + \mathrm{VAR}[Y] + 2\,\mathrm{COV}[X,Y]$$
$$\mathrm{VAR}[X-Y] = \mathrm{VAR}[X] + \mathrm{VAR}[Y] - 2\,\mathrm{COV}[X,Y]$$

The population correlation coefficient is defined as

$$R(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma(X)\,\sigma(Y)} = E\left[\frac{X - \mu(X)}{\sigma(X)} \cdot \frac{Y - \mu(Y)}{\sigma(Y)}\right].$$

Important inequality: $-1 \le R(X, Y) \le 1$.

EXAMPLE: Assume $\mathrm{VAR}[X] = \mathrm{VAR}[Y] = 1$, $\mathrm{COV}[X,Y] = .5$. Then $\mathrm{VAR}[X+Y] = 3$, $\mathrm{VAR}[X-Y] = 1$, $R(X,Y) = .5$. Note that $\mathrm{VAR}[2X] = 4\,\mathrm{VAR}[X] = 4$, $\mathrm{COV}[2X, Y] = 2\,\mathrm{COV}[X,Y] = 1$, $R(2X, 2Y) = R(X, Y) = .5$.

B. LINEAR PREDICTION OF Y FROM X

The linear regression model of $Y$ given $X$ is motivated by the problem of predicting the value of $Y$ from the value of $X$ by minimum mean square prediction of $Y$ by a linear function of $X$, denoted

$$\mu(Y \mid X) = \mu(Y) + b\,(X - \mu(X)),$$

where $b$ is chosen to minimize

$$\mathrm{MSE}(b) = E\left[(Y - \mu(Y \mid X))^2\right] = E\left[\left(Y - \mu(Y) - b\,(X - \mu(X))\right)^2\right].$$

To minimize $\mathrm{MSE}(b)$ we apply calculus: take the derivative of $\mathrm{MSE}(b)$ with respect to $b$, and solve for the value of $b$ at which the derivative equals zero. Up to the constant factor $-2$, the derivative of $\mathrm{MSE}(b)$ is the expectation on the left below, so the condition is

$$E\left[\left(Y - \mu(Y) - b\,(X - \mu(X))\right)(X - \mu(X))\right] = 0;$$
$$\mathrm{cov}[X, Y] - b\,\mathrm{VAR}[X] = 0.$$

HOW?

Conclusion: compute the population slope coefficient $b$ by the formula

$$b = \frac{\mathrm{cov}[X, Y]}{\mathrm{VAR}[X]}.$$

The minimum of $\mathrm{MSE}(b)$ equals

$$\mathrm{VAR}[Y] - 2b\,\mathrm{cov}[X, Y] + b^2\,\mathrm{VAR}[X] = \mathrm{VAR}[Y] - \frac{\mathrm{cov}^2[X, Y]}{\mathrm{VAR}[X]} = \mathrm{VAR}[Y]\left(1 - \frac{\mathrm{cov}^2[X, Y]}{\mathrm{VAR}[X]\,\mathrm{VAR}[Y]}\right).$$

HOW?

Conclusion: compute the population minimum of $\mathrm{MSE}(b)$ by the formula for minimum mean square error...
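The slope formula and the minimum of MSE(b) derived in Section B, together with the variance-of-sum formula from Section A, can be checked numerically. The following is a minimal sketch, not part of the original notes: it treats a small made-up list of (x, y) pairs as an equally weighted population, and the helper names `mean`, `cov`, and `mse` are hypothetical, chosen only for this illustration.

```python
# Made-up data, treated as an equally weighted "population" (an assumption
# for illustration only; these values do not come from the notes).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]


def mean(values):
    """Population mean of a list of numbers."""
    return sum(values) / len(values)


def cov(u, v):
    """Population covariance E[(U - mu(U))(V - mu(V))]; cov(u, u) is VAR[U]."""
    mu_u, mu_v = mean(u), mean(v)
    return sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / len(u)


def mse(b):
    """MSE(b) = E[(Y - mu(Y) - b (X - mu(X)))^2] for the data above."""
    mu_x, mu_y = mean(xs), mean(ys)
    return mean([(y - mu_y - b * (x - mu_x)) ** 2 for x, y in zip(xs, ys)])


var_x = cov(xs, xs)
var_y = cov(ys, ys)
cov_xy = cov(xs, ys)

# Slope that minimizes MSE(b): b = cov[X, Y] / VAR[X].
b_star = cov_xy / var_x

# Correlation coefficient R(X, Y) = cov / (sigma(X) sigma(Y)).
r = cov_xy / (var_x * var_y) ** 0.5

# Minimum MSE in its last form: VAR[Y] (1 - cov^2 / (VAR[X] VAR[Y])) = VAR[Y] (1 - R^2).
min_mse = var_y * (1 - r ** 2)

# Variance of the sum, computed directly and by the Song of Sums formula.
sums = [x + y for x, y in zip(xs, ys)]
var_sum_direct = cov(sums, sums)
var_sum_formula = var_x + var_y + 2 * cov_xy

print("b* =", b_star)
print("MSE(b*) =", mse(b_star), " formula:", min_mse)
print("VAR[X+Y] =", var_sum_direct, " formula:", var_sum_formula)
```

Evaluating `mse` at values slightly above and below `b_star` gives a larger mean square error in both directions, which is what the calculus argument predicts for a quadratic in b with positive leading coefficient VAR[X].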