Review for Final Exam

Scatterplots: a plot of Y vs. X, two quantitative variables measured on the same individual.

Correlation: a numerical measure of the strength and direction of the linear association between two quantitative variables.

Facts about the correlation coefficient (r)
• Correlation does not distinguish between explanatory and response variables.
• r is always between -1 and +1.
• Interpretation: positive/negative, strong/weak.
• r measures only the strength of the linear relationship between x and y.
• Outliers can have a strong effect on r.
• Correlation is a "pure number"; in other words, it is unitless.
• r = +1 means a perfect positive linear relationship.
• r = -1 means a perfect negative linear relationship.
• It measures the linear relationship between two quantitative variables.

A correlation coefficient of r = 0 does not mean there is no relationship between the two variables. It means there is no linear relationship between them.

• Linear model: $y = \beta_0 + \beta_1 x + \varepsilon$, where $\beta_0$ and $\beta_1$ are unknown constants and $\varepsilon \sim N(0, \sigma^2)$. This is the simple linear regression model.
  y: dependent variable
  x: explanatory variable
  $\varepsilon$: error
• Fitted model: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
  $\hat{\beta}_0$, $\hat{\beta}_1$: least-squares coefficients
  $\hat{y}$: fitted/predicted value

Residual = observed ($y$) - predicted ($\hat{y}$):

$$e_i = y_i - \hat{y}_i$$

The least-squares coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$ are the quantities that minimize the sum S, where S is:

$$S = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2 = \sum \left( y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \right)^2$$

Formulas:

$$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

Interpreting the Regression Line
• $\hat{\beta}_0$ is the y-intercept (the point where the regression line crosses the y-axis). It is the predicted value of y when x = 0. Do not interpret the intercept if 0 is not in the range of the data or if 0 is not a possible value for x.
• $\hat{\beta}_1$ is the slope. It is the expected/predicted change in y for a one-unit change in x.

$R^2$ = proportion of the variability in y that is explained by x. The bigger SSR is relative to SST, the stronger the linear relationship between y and x.
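As an illustration of the correlation facts and the least-squares fit, here is a minimal sketch in plain Python. It assumes the standard closed-form solution (slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², intercept = ȳ - slope · x̄). The function names and both data sets are hypothetical, made up for this example and not taken from the course notes.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def correlation(x, y):
    """Sample correlation coefficient r between x and y."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


def least_squares_fit(x, y):
    """Return (intercept, slope) of the least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, chosen only to keep the arithmetic simple.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_fit(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(f"intercept = {b0:.2f}, slope = {b1:.2f}")  # intercept = 2.20, slope = 0.60
print(f"r = {correlation(x, y):.4f}")             # r = 0.7746

# r = 0 does not mean "no relationship": here y = x^2 exactly,
# yet the linear association is zero.
x_sym = [-2, -1, 0, 1, 2]
y_sym = [xi ** 2 for xi in x_sym]
print(f"r for y = x^2: {correlation(x_sym, y_sym):.4f}")  # 0.0000
```

The residuals from a least-squares line with an intercept sum to zero (up to rounding), which is a quick sanity check on any fit computed this way.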
With

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2,$$ the Total Sum of Squares,

$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2,$$ the Regression Sum of Squares, and

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$$ the Error Sum of Squares (or residual sum of squares),

we can write

Total sum of squares = Regression sum of squares + Error sum of squares, i.e., SST = SSR + SSE.

This is called the analysis of variance (ANOVA) identity.

Formula for $R^2$:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

$R^2$ is called the coefficient of determination since it measures the proportion of variability in y explained by its regression on x. For simple linear regression, i.e., when we have only one explanatory variable, $R^2 = r^2$, where r is the correlation coefficient.
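The ANOVA identity and the relation R² = r² can be checked numerically. The sketch below is only an illustration: the function name `sums_of_squares` and the five data points are hypothetical, and the fit uses the standard closed-form least-squares solution.

```python
from math import sqrt


def sums_of_squares(x, y):
    """Fit the least-squares line and return (SST, SSR, SSE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]

    sst = sum((yi - y_bar) ** 2 for yi in y)               # total
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)          # regression
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # error
    return sst, ssr, sse


# Hypothetical data, chosen only to keep the arithmetic simple.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sst, ssr, sse = sums_of_squares(x, y)
r_squared = ssr / sst

# Correlation coefficient computed directly, to compare r^2 with R^2.
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
r = s_xy / sqrt(s_xx * s_yy)

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")  # 6.00, 3.60, 2.40
print(f"R^2 = {r_squared:.2f}, 1 - SSE/SST = {1 - sse / sst:.2f}, r^2 = {r * r:.2f}")
```

For this data set, 60% of the variability in y is explained by the regression on x, and the two expressions for R² agree with r² up to floating-point rounding.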
This note was uploaded on 08/05/2011 for the course STA 3032 taught by Professor Kyung during the Summer '08 term at University of Florida.