Chapter 7 Regression and Correlation

Simple least squares regression (SLR) is a method for finding the "best-fitting line" ($\hat{y} = a + bx$) for a set of n points [i.e., pairs of observations (x, y)]. SLR minimizes the sum of squares of the vertical distances from the observed points to the fitted line, which is why the procedure is also called least squares regression.

Correlation coefficient (r):
• A number that measures the strength and direction of the linear association between X and Y (both quantitative variables).
• $-1 \le r \le +1$ always.
• r = +1 when there is a perfectly linear increasing relationship between X and Y.
• r = -1 when there is a perfectly linear decreasing relationship between X and Y.
• Correlation has no units.
• $R^2 = r^2$ is called the coefficient of determination.
• $R^2$ measures the percentage of variability in the response (Y) explained by the changes in X [or by the regression on X].
• What does $R^2 = 0.81$ (= 81%) mean?
• How do you find r when you are given $R^2$? For example, what is r if $R^2 = 0.81$ = 81%?

Example: Suppose your friend claims that she can guess a person's age correctly (well, almost). To see whether this claim is justifiable, you select a random sample of 10 people, ask your friend to guess each person's age, and then ask each person his/her true age. The following are observed:

ID: 1 2 3 4 5 6 7 8 9 10
Guessed age: 1 8 5 2 6 5 9 2 8 5 8 1 3 6 6 4 4 35
True age: 2 4 5 7 8 5 2 5 5 1 5 6 4 35

Step One: identify the
o independent (explanatory) variable and
o dependent (response) variable.
Since the true age determines your friend's guesses (and your friend's guess has no effect on a person's true age), we have
X = True Age = independent variable
Y = Guessed Age = dependent variable.

Step Two: draw a scatter diagram of the data and interpret what you see (to get some idea of the relationship between the two variables).
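The definitions at the start of this chapter can be checked numerically. The following is a minimal sketch in Python, assuming a small made-up data set (hypothetical values, not the age data above). The function names `least_squares_line`, `pearson_r`, `sse`, and `r_from_r_squared` are illustrative choices, not part of the course notes. It shows that the least squares line has a smaller sum of squared vertical distances than a slightly shifted line, and that r can be recovered from $R^2$ as a square root whose sign matches the direction of the relationship.

```python
import math

def least_squares_line(xs, ys):
    """Return (a, b) for the fitted line y_hat = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return a, b

def pearson_r(xs, ys):
    """Correlation coefficient r between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def sse(xs, ys, a, b):
    """Sum of squared vertical distances from the points to the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def r_from_r_squared(r_squared, slope):
    """r = +/- sqrt(R^2); the sign is taken from the slope of the fitted line."""
    return math.copysign(math.sqrt(r_squared), slope)

# Hypothetical toy data, for illustration only (not the age data)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
r = pearson_r(xs, ys)

print("intercept a =", a, "slope b =", b)
print("r =", r, "R^2 =", r ** 2)
print("SSE of fitted line: ", sse(xs, ys, a, b))
print("SSE of shifted line:", sse(xs, ys, a + 0.1, b))  # larger than the fitted line's SSE
print("r for R^2 = 0.81, increasing relationship:", r_from_r_squared(0.81, 1.0))
```

Passing the slope to `r_from_r_squared` is one way to make the sign choice explicit: $R^2$ alone cannot say whether the relationship is increasing or decreasing.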
[Figure: Scatterplot of Guessed Age vs True Age (x-axis: True Age; y-axis: Guessed Age)]

1. What do you see?

2. Verify the following summary statistics using your calculator:
$\bar{x} = 44.5$, $s_X = 22.02$, $\bar{y} = 46.9$, $s_Y = 24.02$, $r = 0.9844$

3. Compute the slope and intercept of the least squares regression line, given that r = 0.9844.
Slope = $b = r \times s_Y / s_X$ = 0.9844 × 24.02 / 22.42 = 1.054651561 (this computation uses $s_X = 22.42$, whereas step 2 lists 22.02; the reported slope is consistent with 22.42)
Intercept = $a = \bar{y} - b\bar{x}$ = 46.9 - 1.054651561 × 44.5 = -0.0319944692
Hence the prediction equation is $\hat{y} = -0.03 + 1.05x$.
Are these results consistent with what you observed in the scatter plot?

4. Interpret the numerical results:
Correlation = r = 0.9844, so there is a
o strong (since r is close to +1),
o increasing (since r is positive),
o linear (from the scatter diagram)
relationship between the true and guessed ages.
Slope = 1.05 means that for every one-year increase in the true age (X), the guessed age (Y) increases by 1.05 years on average.
...
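The slope and intercept arithmetic in step 3 can be reproduced from the summary statistics alone. The following is a minimal sketch in Python, assuming $s_X = 22.42$ (the value that appears in the slope computation); the variable names and the `predict` helper are illustrative and not part of the notes.

```python
# Summary statistics from the notes
r = 0.9844        # correlation between true and guessed age
x_bar = 44.5      # mean true age
y_bar = 46.9      # mean guessed age
s_y = 24.02       # standard deviation of guessed age
s_x = 22.42       # assumption: the value used in the slope computation in step 3

# Least squares slope and intercept from summary statistics
b = r * s_y / s_x         # slope = r * s_Y / s_X
a = y_bar - b * x_bar     # intercept = y_bar - b * x_bar

def predict(true_age):
    """Predicted guessed age for a given true age (hypothetical helper)."""
    return a + b * true_age

print(f"slope b = {b:.9f}")
print(f"intercept a = {a:.10f}")
print(f"r^2 = {r ** 2:.4f}")
```

Because the intercept is close to 0 and the slope is close to 1, `predict(x)` returns a value close to x itself, which is what near-perfect guessing would look like.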
This note was uploaded on 06/07/2011 for the course STA 3032 taught by Professor Kyung during the Spring '08 term at University of Florida.
