Simple Linear Regression
Chapter 13
Simple Regression - Part 1

Correlation & Regression
We will investigate the relationship between two variables, Y and X. In general, we want to answer the question: "Does X tell us something about how Y behaves?"

Textbook's Example
At Sunflowers Apparel, how well can you predict sales from the size of the store? You would naturally think that more stuff on display ==> higher sales. Sales from 14 stores are in Site.xls.

Exploring the data
We use an XY scatter plot to display both the strength and the type of relationship. We use the correlation coefficient to summarize the strength of the relationship.

The example
[Figure: Scatter Plot of Sales versus Store Size. Y axis: Annual Sales (Millions); X axis: Square Feet (1000s).]
Correlation = ?

Computing r
Many of you can do this with the calculators you used in STA 2023. Our text covers this back in the descriptive statistics chapter (Chapter 3, page 116):

$$r = \frac{\mathrm{Cov}(X, Y)}{S_X S_Y}$$

Computations
Store   Square Feet (1000s)   Annual Sales ($ millions)
  1             1.7                    3.7
  2             1.6                    3.9
  3             2.8                    6.7
  4             5.6                    9.5
  5             1.3                    3.4
  6             2.2                    5.6
  7             1.3                    3.7
  8             1.1                    2.7
  9             3.2                    5.5
 10             1.5                    2.9
 11             5.2                   10.7
 12             4.6                    7.6
 13             5.8                   11.8
 14             3.0                    4.1
Covariance (COVAR)        4.523367
SD of sales               2.999414
SD of square feet         1.707981
Correlation (CORREL)      0.950883

Note: Excel's COVAR divides by n (population covariance). Dividing by n - 1 instead gives the sample covariance, 4.523367 × 14/13 ≈ 4.8713, which is the value that pairs with the sample standard deviations in the formula for r: 4.8713 / (2.999414 × 1.707981) ≈ 0.9509, the CORREL value.

Excel and PhStat notes
PhStat has the Scatter Plot under the Descriptive Statistics menu. Excel has the functions COVAR and CORREL. Correlation is not directly available in PhStat, but there are templates Covariance.xls and Correlation.xls.

What is covariance?
It measures how much X and Y tend to vary in the same direction.
Positive covariance means ___________.
A negative covariance means _________.
However, it is hard to interpret because there is no "standard" scale. What does the covariance of 4.52 mean? If store size were measured in actual square feet, the covariance would be about 4523.
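The covariance and correlation figures above can be reproduced outside Excel as a check. The sketch below is in Python (our choice; the course itself uses Excel and PhStat) and recomputes them from the 14-store table. It computes the covariance both ways, dividing by n (which reproduces the 4.523367 from COVAR) and by n - 1 (the sample covariance used in r), and then rescales store size to actual square feet to show that the covariance changes while r does not. The helper names `covariance` and `correlation` are hypothetical, not part of any course file.

```python
from statistics import mean, stdev

# Data from the Computations table: X = store size (1000s of sq ft),
# Y = annual sales ($ millions), stores 1 through 14 in order.
SQFT = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
SALES = [3.7, 3.9, 6.7, 9.5, 3.4, 5.6, 3.7, 2.7, 5.5, 2.9, 10.7, 7.6, 11.8, 4.1]


def covariance(x, y, sample=True):
    """Covariance of x and y.

    sample=True divides by n - 1; sample=False divides by n,
    which matches Excel's COVAR.
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / (n - 1 if sample else n)


def correlation(x, y):
    """r = sample covariance / (S_X * S_Y), using n - 1 everywhere."""
    return covariance(x, y) / (stdev(x) * stdev(y))


pop_cov = covariance(SQFT, SALES, sample=False)
samp_cov = covariance(SQFT, SALES)
r = correlation(SQFT, SALES)
print(f"Covariance, divide by n:     {pop_cov:.6f}")
print(f"Covariance, divide by n - 1: {samp_cov:.6f}")
print(f"SD sales {stdev(SALES):.6f}  SD SqFt {stdev(SQFT):.6f}  r {r:.6f}")

# Re-express store size in actual square feet: the covariance grows by a
# factor of 1000, but r stays the same because r is scale-free.
sqft_actual = [1000 * v for v in SQFT]
cov_actual = covariance(sqft_actual, SALES, sample=False)
r_actual = correlation(sqft_actual, SALES)
print(f"In actual square feet: covariance {cov_actual:.1f}, r {r_actual:.6f}")
```

The point of the last two lines is the slide's question about "no standard": the covariance depends on the units of X and Y, while r does not.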
Correlation coefficient
A "standardized" covariance:

$$\rho = \frac{\text{population covariance between } X \text{ and } Y}{\sigma_Y \sigma_X}$$

Estimated by:

$$r = \frac{\text{sample covariance between } X \text{ and } Y}{S_Y S_X}$$
Correlation coefficient
The correlation coefficient (r) measures how much Y and X tend to vary in the same direction on a standard scale. It will always be between -1 and +1:
r = +1 implies a perfect positive relationship
r = -1 implies a perfect negative relationship
r = 0 implies no linear relationship exists!

Correlation patterns

Two other examples
Files are in the simple regression lecture module: __________ and __________. You might wish to print out the page with the graph. For later, you will also want the regression output.

Strength of the Relationship
Correlation measures the strength of the linear relationship between Y and X. How large does this measure (r) have to be to show a "useful" linear relationship? There is a formal hypothesis test on pages 500-501. For now, here is a quick "rule of thumb".

The quick rule of thumb
Correlation is significant if:

$$|r| > \frac{2}{\sqrt{n}}$$

This is approximately what would occur in a hypothesis test at the α = .05 significance level. If you are close to that cutoff, you might want to perform the formal hypothesis test.
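The rule of thumb amounts to a one-line check. A minimal Python sketch follows; `rule_of_thumb_significant` is a hypothetical helper name, and taking the absolute value lets the same cutoff apply to negative correlations. It implements only the quick rule, not the formal test.

```python
import math


def rule_of_thumb_significant(r, n):
    """Quick check: treat r as significant when |r| > 2 / sqrt(n).

    Returns (is_significant, cutoff). This only approximates a test at
    alpha = .05; near the cutoff, use the formal hypothesis test instead.
    """
    cutoff = 2 / math.sqrt(n)
    return abs(r) > cutoff, cutoff


significant, cutoff = rule_of_thumb_significant(0.95088, 14)
print(f"Site selection: cutoff {cutoff:.4f}, significant: {significant}")
```

For the site selection data the cutoff is 2/√14, a little over .53, so r = .95088 clears it easily.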
Quick test results
Site selection: n = 14, r = .95088
Next example: ____________ n = ___ and r = _____

Simple Linear Regression
Obtaining the Fit
Sections 12.2 and 12.3

Regression analysis
Correlation tells us how strongly Y and X are related. Regression analysis is the name of the procedure that estimates the form of this relationship. We'll begin with simple regression, which assumes the form:

$$\hat{Y}_i = b_0 + b_1 X_i$$
Regression notation
Y is the variable we want to predict.
X is the variable we believe influences how Y behaves.
$\hat{Y}_i$ is the estimated value of Y at $X_i$.
$b_0$ is the Y-intercept in the equation.
$b_1$ is the slope of the regression line.

Example (page 474)
n = 14 Sunflowers Apparel stores
Y = Annual sales in millions of dollars (values range from 2.7 to 11.8)
X = Size of the store in 1000-square-foot units (values range from 1.1 to 5.8)
Scatter Plot
[Figure: Sunflowers Apparel scatter plot. Y axis: Annual Sales (Millions); X axis: Size of Store (1000 sq feet).]

Fitting the Regression Line
Our goal: find the straight line that best fits the data we've collected. The best equation will be the one that minimizes the error in fit. The equation is:

$$\hat{Y}_i = b_0 + b_1 X_i$$

The fit error is thus:

$$e_i = Y_i - \hat{Y}_i$$

Obtaining the line to predict sales
[Figure: Sunflowers Apparel scatter plot with the + errors and - errors marked. Y axis: Annual Sales (Millions); X axis: Size of Store (1000 sq feet).]

Balancing out the errors
The fit error for the ith point on the scatter diagram is:

$$e_i = Y_i - \hat{Y}_i$$

We would like the sum of the + errors to be the same as the sum of the - errors. However, there are many lines that can make this happen.

The "Least Squares" Line
So, which of these solutions is the best one? Select the line with the minimum sum of squared error terms:

$$\sum_{i=1}^{n} e_i^2 = ?$$

This requires ... (gulp!) ... CALCULUS!
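Before the calculus, a brute-force check can make both points above concrete. The errors of a line sum to n(Ȳ - b0 - b1·X̄), so a line balances the + and - errors exactly when it passes through the point of means (X̄, Ȳ). The Python sketch below builds such lines for several slopes, confirms their errors sum to (numerically) zero, and then searches a grid of slopes for the smallest sum of squared errors. The helper names and the grid (0 to 4 in steps of 0.0001) are arbitrary assumptions chosen for illustration.

```python
from statistics import mean

# Sunflowers Apparel data: X = size (1000s sq ft), Y = sales ($ millions)
SQFT = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
SALES = [3.7, 3.9, 6.7, 9.5, 3.4, 5.6, 3.7, 2.7, 5.5, 2.9, 10.7, 7.6, 11.8, 4.1]
x_bar, y_bar = mean(SQFT), mean(SALES)


def errors_for(slope):
    """Fit errors e_i = Y_i - Yhat_i for the line with this slope
    that passes through the point of means (x_bar, y_bar)."""
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(SQFT, SALES)]


def sse_for(slope):
    """Sum of squared fit errors for that line."""
    return sum(e * e for e in errors_for(slope))


# Many lines balance the + and - errors (their errors sum to about 0) ...
for slope in (0.5, 1.0, 2.5):
    print(f"slope {slope}: sum of errors {sum(errors_for(slope)):+.1e}, "
          f"SSE {sse_for(slope):.4f}")

# ... but their sums of squared errors differ. Search a grid of slopes
# for the one with the smallest SSE (no calculus needed, just brute force).
grid = [i / 10000 for i in range(40001)]
best_slope = min(grid, key=sse_for)
print(f"Slope with the smallest SSE on the grid: {best_slope:.4f}")
```

The grid search lands on a slope of about 1.67; calculus gives the exact minimizer in closed form instead of by trial.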
The Least Squares Estimators
Slope:

$$b_1 = r \frac{S_Y}{S_X}$$

POOF!
Intercept:

$$b_0 = \bar{Y} - b_1 \bar{X}$$

There are many equivalent forms (page 478).
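The two estimator formulas translate directly into code. Below is a Python sketch applied to the 14-store data; `least_squares_fit` is a hypothetical helper name. Rounded to four decimals, the results should agree with the trend line Excel reports for these data, y = 1.6699x + 0.9645.

```python
from statistics import mean, stdev

# Sunflowers Apparel data: X = size (1000s sq ft), Y = sales ($ millions)
SQFT = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
SALES = [3.7, 3.9, 6.7, 9.5, 3.4, 5.6, 3.7, 2.7, 5.5, 2.9, 10.7, 7.6, 11.8, 4.1]


def least_squares_fit(x, y):
    """Return (b0, b1) using b1 = r * S_Y / S_X and b0 = Y_bar - b1 * X_bar."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    # Sample correlation: sample covariance (divide by n - 1) over S_X * S_Y
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    r = cov_xy / (s_x * s_y)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares_fit(SQFT, SALES)
print(f"Yhat = {b0:.4f} + {b1:.4f} X")
```

On Python 3.10 or later, `statistics.linear_regression(SQFT, SALES)` returns the same slope and intercept as a named tuple, which is a convenient cross-check.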
Regression with sales data
[Figure: PHStat scatter plot of Sunflowers Apparel data, Annual Sales (Millions) versus Size of Store (1000 sq feet), with Excel's trend line: y = 1.6699x + 0.9645, R² = 0.9042.]
Excel's Trend Line function and R²
Output from PHStat

Interpretation of results
Remember the variables are:
Y = Annual sales per store (in millions of dollars)
X = Size of store (1000 square feet)
The estimated slope (b1) tells us:
The estimated intercept (b0) tells us:
Second example, via workbook
1. Open your data file _______________
2. Open Simple Linear Regression.XLS
3. Copy your data to the SLRData sheet (the X variable goes in column A, Y in column B)

Updating the formulas
The workbook assumes the data are in cells A2 through B15. On the COMPUTE worksheet, you need to change this to A2 through B25, B36, or wherever your data end.
1. Select cell range L2:M6
2. In L2, fix the upper limits of the A and B ranges
3. Press Ctrl+Shift+Enter (or Apple key+Enter)
Second example, interpretation
Variables are Y = _____ and X = ______.
Equation is:
The estimated slope (b1) tells us:
The estimated intercept (b0) tells us:

How good is our new model?
There are two standard ways to judge:
1. How much of the variation in the Y values (sales) can be attributed to the different values of X (store size)?
2. In general, how small (or large) are the errors in fit?
1. R²: a universal measure of fit
The Coefficient of Determination:

$$R^2 = \frac{\text{variation in } Y \text{ explained by the } X\text{-}Y \text{ relationship}}{\text{total variation in } Y}$$

The R² value is:
Always between 0 and 1
Usually interpreted as a percentage
The square of the correlation (for simple regression)
Output from PHStat
90.4% of the variation in sales is explained by variation in store size.

How is R² computed?
From the ANOVA table:
Total variation in the Y values: SST = 116.9543
Unexplained variation: SSE = 11.2067
The difference is the variation explained by the regression equation: SSR = 105.7476
The ratio of explained to total variation gives R² = 105.7476 / 116.9543 = .9042
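The ANOVA quantities can be recomputed from the raw data as a check. This Python sketch fits the least squares line, then computes SST, SSE, SSR, and R² = SSR/SST. The slope here uses b1 = S_XY/S_XX (sum of cross-products over the sum of squared X deviations), which is algebraically equivalent to b1 = r·S_Y/S_X; the variable names are our own.

```python
from statistics import mean

# Sunflowers Apparel data: X = size (1000s sq ft), Y = sales ($ millions)
SQFT = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
SALES = [3.7, 3.9, 6.7, 9.5, 3.4, 5.6, 3.7, 2.7, 5.5, 2.9, 10.7, 7.6, 11.8, 4.1]

# Least squares line (S_XY / S_XX is equivalent to r * S_Y / S_X)
x_bar, y_bar = mean(SQFT), mean(SALES)
sxx = sum((x - x_bar) ** 2 for x in SQFT)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(SQFT, SALES))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in SQFT]

sst = sum((y - y_bar) ** 2 for y in SALES)                # total variation
sse = sum((y - yh) ** 2 for y, yh in zip(SALES, fitted))  # unexplained
ssr = sst - sse                                           # explained by the line
r_squared = ssr / sst
print(f"SST = {sst:.4f}  SSE = {sse:.4f}  SSR = {ssr:.4f}  R^2 = {r_squared:.4f}")
```

Because this is simple regression, the same R² also equals the square of the correlation r = .95088.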
Size of the typical error ($S_{YX}$)
For each observation i, its error is given by:

$$e_i = Y_i - \hat{Y}_i$$

To find the "typical error," use this formula:

$$S_{YX} = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - 2}}$$

This is the amount by which the prediction typically misses the actual value.
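The formula can be checked against the PHStat value for the store data. This Python sketch refits the least squares line, sums the squared errors, and divides by n - 2, reflecting the two estimated coefficients (b0 and b1). `standard_error_of_estimate` is a hypothetical helper name.

```python
import math
from statistics import mean

# Sunflowers Apparel data: X = size (1000s sq ft), Y = sales ($ millions)
SQFT = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
SALES = [3.7, 3.9, 6.7, 9.5, 3.4, 5.6, 3.7, 2.7, 5.5, 2.9, 10.7, 7.6, 11.8, 4.1]


def standard_error_of_estimate(x, y):
    """S_YX = sqrt(SSE / (n - 2)) for the least squares line of y on x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    # Least squares slope and intercept (equivalent to b1 = r * S_Y / S_X)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Divide by n - 2 because two coefficients (b0 and b1) were estimated
    return math.sqrt(sse / (n - 2))


s_yx = standard_error_of_estimate(SQFT, SALES)
print(f"S_YX = {s_yx:.4f}")
```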
Output from PHStat
$S_{YX}$: our text calls this the standard error of the estimate.

$S_{YX}$ in our example
The typical error (called the standard error of the estimate) for our model is: $S_{YX}$ = .9664
This means that:
That doesn't sound so bad, if you consider that annual sales ranged from _______ to _______.

Our second example
n = ___
Y = _______
X = ________
R² = ___
$S_{YX}$ = ___

...
Spring '08
Thompson