Simple Regression 3: Inference and Prediction
Sections 13.7-13.8

Our Examples
Site.XLS: Annual sales per store at Sunflowers Apparel predicted by the size of the store.
________ A new one __________

The simple regression model
We fit a model of the form:
$$\hat{Y} = b_0 + b_1 X$$
and then we checked it to determine whether this was a reasonable way to view the data.
At this point we might be interested in using the model for statistical inference.

Statistical inference
Some of the things we want to do:
1. Test to see if there is a significant relationship between X and Y
2. Make an interval estimate of the effect of X on Y
3. Predict the value of Y that may occur at a specific X

Testing for significance
Any time we have a set of data we can go ahead and get a correlation or regression. Excel will give us the numbers.
But is something really there, or is it "coincidence"? We can test for significance.

A test on the correlation
Earlier we had a "quick rule" about the correlation. It said correlation is significant if ______.
Here we look at a more formal test that can use a different $\alpha$ or can be configured with a one-sided hypothesis.

The T-test (page 500-501)
The hypotheses are:
$$H_0: \rho = 0 \qquad H_1: \rho \neq 0$$
The standard error for the correlation r is:
$$s_r = \sqrt{\frac{1 - r^2}{n - 2}}$$
Calculate the T-stat. The formula is:
$$t_{STAT} = \frac{r - 0}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
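
As a quick check of the correlation t-test formula, here is a minimal Python sketch. The function name `correlation_t_stat` is made up for illustration; the inputs r = 0.6194 and n = 25 are borrowed from the e-commerce (hits versus orders) example that appears later in these slides.

```python
import math

def correlation_t_stat(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    s_r = math.sqrt((1 - r**2) / (n - 2))  # standard error of r
    return (r - 0) / s_r

# r and n taken from the e-commerce example later in the slides
t_stat = correlation_t_stat(0.6194, 25)
print(round(t_stat, 2))
```

With 23 degrees of freedom the two-sided 5% critical value is about 2.069 (the t value reported on the PHStat sheet for that example), so a statistic of this size would be judged significant.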

Test for new example
Data set _____________ X = ________ Y = _________ n = ___ r = ___

Inference about the slope ($\beta_1$)
In a regression, $b_1$ is our estimator of the population slope $\beta_1$.
It is a function of the sample data, so the estimator $b_1$ is a random variable that follows some distribution.
The standard deviation of this distribution is the standard error of the slope estimate, $s_{b_1}$.

Interval estimate of slope
A confidence interval estimate of the population's true slope ($\beta_1$) is given by:
$$b_1 \pm t \, s_{b_1}$$
where t is a value from the T-distribution with n-2 degrees of freedom.

Output for Sunflowers sales regression
[Figure: Excel regression output, with $b_1$ and $s_{b_1}$ marked]

Computation of interval
n = 14, so there are n - 2 = 12 df.
The t value for a 95% interval is 2.179.
The interval is:
$$1.6699 \pm 2.179\,(0.1569) = 1.6699 \pm 0.3419$$
or _______ to _______
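
The arithmetic for the slope interval can be sketched in a few lines of Python. The helper name `slope_confidence_interval` is hypothetical; the inputs (1.6699, 0.1569, and the t value 2.179 for 12 df) are the Sunflowers numbers quoted in the computation.

```python
def slope_confidence_interval(b1, s_b1, t_crit):
    """Return (lower, upper) for b1 +/- t * s_b1."""
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin

# Sunflowers Apparel values quoted on the slide
lower, upper = slope_confidence_interval(1.6699, 0.1569, 2.179)
print(f"{lower:.4f} to {upper:.4f}")
```

The t value is passed in rather than looked up, so the sketch needs nothing beyond the standard library; a table or a statistics package would supply it in practice.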

Interpretation of interval
In general: This is an estimate of the effect on the Y-value if X were increased by one unit.
In this example: For each additional 1000 square feet of store space, annual sales tend to increase by _______ to _______.

Testing for Significance
Is this a real predictive relationship or merely a coincidence?
$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$
H0 claims that a change in X does not produce a change in Y; thus X has no predictive power.
$t_{STAT} = b_1 / s_{b_1}$ (on printout)
P-value

Output in sales example
[Figure: Excel regression output, with $b_1$, $s_{b_1}$, t, and p marked]

Our example
$H_0: \beta_1 = 0$ (An increase in sq feet does not affect sales)
$H_1: \beta_1 \neq 0$ (Sales change when store size increases)
Decision rule:
tstat =
pvalue =

Testing for "direction"
The test on the spreadsheet always assumes a two-sided H1.
In some applications, the alternative might be stated as
$H_1: \beta_1 > 0$ or $H_1: \beta_1 < 0$
The test statistic on the printout is computed the same way, but the p-value assumes the two-sided version. Cut it in half, provided the sign of $b_1$ agrees with the direction stated in H1.
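
A small Python sketch of the "cut it in half" rule follows. The function name and the `alternative` labels are invented for illustration. The sketch also covers the case where the estimate points the wrong way for H1, in which case the one-sided p-value is one minus half the two-sided value. The inputs are the t statistic and p-value for Hits from the e-commerce output later in the slides.

```python
def one_sided_p_value(two_sided_p, t_stat, alternative):
    """Convert a two-sided printout p-value to a one-sided p-value.

    alternative: "greater" for H1: beta1 > 0, "less" for H1: beta1 < 0.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    # Halving is only valid when the estimate falls on the H1 side of zero.
    sign_agrees = (t_stat > 0) if alternative == "greater" else (t_stat < 0)
    half = two_sided_p / 2
    return half if sign_agrees else 1 - half

# Hits coefficient in the e-commerce example: t = 3.7840, p = 0.0010
p_upper = one_sided_p_value(0.0010, 3.7840, "greater")
p_lower = one_sided_p_value(0.0010, 3.7840, "less")
print(p_upper, p_lower)
```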

One of our other examples
Data: _______________ Y = ________ and X = ___________
The logic of the problem suggests this type of hypothesis test:

Other hypothesis tests about $\beta_1$
Sometimes there is a reason to test some hypothesis other than $H_0: \beta_1 = 0$.
Suppose we have Y = sales and X = the amount of advertising.
If we believe that every dollar spent on advertising yields 10 dollars in sales, we could perform a test of the form $H_0: \beta_1 = 10$.
No problem. Change the test statistic to
$$t_{STAT} = \frac{b_1 - 10}{s_{b_1}}$$
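
The same test statistic with an arbitrary hypothesized slope can be sketched as follows. The function name `slope_t_stat` is hypothetical, the estimate and standard error are the Sunflowers values from the interval slide, and the null value of 1.5 is an assumption invented purely to show the mechanics (it is not from the slides).

```python
def slope_t_stat(b1, s_b1, beta_null=0.0):
    """t statistic for H0: beta1 = beta_null, with n - 2 df."""
    return (b1 - beta_null) / s_b1

# Sunflowers Apparel: b1 = 1.6699, s_b1 = 0.1569 (from the interval slide)
t_zero = slope_t_stat(1.6699, 0.1569)         # usual test, H0: beta1 = 0
t_other = slope_t_stat(1.6699, 0.1569, 1.5)   # hypothetical H0: beta1 = 1.5
print(round(t_zero, 3), round(t_other, 3))
```

Each statistic would then be compared with the t value for n - 2 = 12 df (2.179 for a two-sided 5% test, as quoted in the interval computation).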

An illustration in our ____ example
Suppose this was an industry standard:
Does that look like it is true here?

Inference about the intercept
Similar to inference about slope $\beta_1$.
Standard error and interval are on the output.
When are we interested in these inferences?
When the data includes ZERO (when 0 is in the range of our X values)
When the intercept has NATURAL MEANING

In which example "would we"?
Sunflowers Apparel: X (square footage) ranges from 1 to 6; b0 = 0.9645
Example: __________ X ranges from ____ to _____; meaning of b0 is ???

Our estimate
b0 = _____ and sb0 = _____ for the interval is:
The interval has been computed as:
Interpretation ????

The test in the ANOVA table
There is an F test for significance of the model performed in the ANOVA table.
It is somewhat redundant here, since it gives us the same information as the T-test.
This will become more important in multiple regression, where several X variables are used.

Prediction in EComm.XLS
An internet retail company wants to develop a model for predicting how many orders it will need to process the next day.
The company gets immediate information on how many "hits" its e-commerce site receives, but does not get the order information from a third party until the next day.
Hits and orders are both in units of 1000.

Reasonable prediction?
[Figure: scatter plot "Orders versus E-Commerce Site Hits"; y-axis: Orders (1000s), x-axis: Thousands of hits on site; r = .6194]

Simple Linear Regression Analysis

Regression Statistics
Multiple R           0.6194
R Square             0.3837
Adjusted R Square    0.3569
Standard Error       4.1257
Observations         25

ANOVA
              df    SS          MS          F
Regression     1    243.7159    243.7159    14.3185
Residual      23    391.4841     17.0210
Total         24    635.2000

              Coefficients    Standard Error    t Stat    P-value
Intercept     4.9456          2.3917            2.0679    0.0501
Hits          0.0194          0.0051            3.7840    0.0010

Estimated Orders = 4.9456 + .0194*Hits
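
The pieces of this output are tied together by a few identities, which the following Python sketch checks using only the numbers printed in the output. The variable names are chosen for illustration. In particular, it shows why the ANOVA F test repeats the slope t-test in simple regression: F equals the square of the slope's t statistic, up to rounding.

```python
# Values copied from the regression output for the e-commerce example
ssr, sse, sst = 243.7159, 391.4841, 635.2000
df_regression, df_residual = 1, 23
t_slope = 3.7840  # t Stat for Hits

msr = ssr / df_regression   # mean square for regression
mse = sse / df_residual     # mean square for residual
f_stat = msr / mse          # ANOVA F statistic
r_squared = ssr / sst       # share of variation explained
s_yx = mse ** 0.5           # standard error of the estimate

print(round(f_stat, 4), round(t_slope ** 2, 4))
print(round(r_squared, 4), round(s_yx, 4))
```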

Estimated Orders = 4.9456 + .0194*Hits
Remember that both orders and site hits are in units of 1000.
b1 = .0194 implies each thousand hits brings 19.4 orders.
b0 = 4.9456 says there are 4946 orders even if there were no hits. ???
Se = 4.1257 implies a typical prediction error is 4126 orders.

Prediction of Y at X = xi
We might want to know how many orders 350,000 hits might bring.
We will look at two related types of prediction: for the average Y at X = xi and for individual Y values.

Estimating the average value of Y at X = xi
We can produce a point estimate by just plugging in to the Y-hat equation:
$$\hat{Y}_i = b_0 + b_1 x_i$$
If we want an interval estimate, we will need to slap a margin of error around this, so we need a standard error.

Confidence Interval for Average Y
Interval formula:
$$\hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{h_i}$$
The value of $h_i$ is a function of the X value in the prediction (page 504):
$$h_i = \frac{1}{n} + \frac{(x_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

Doing the calculations at 350,000 hits
It is not too hard to get the point estimate:
Orders = 4.9456 + 0.0194 Hits = 4.9456 + 0.0194 (350) ≈ 11.74, or 11.721 with unrounded coefficients: about 11,721 orders.
We would not want to do the standard error by hand. Thankfully, PHStat will compute this.

Prediction in PHStat

Confidence Interval Estimate
A new sheet called "CIEandPI" is output:

Data
X Value                                    350
Confidence Level                           95%

Intermediate Calculations
Sample Size                                25
Degrees of Freedom                         23
t Value                                    2.068658
XBar, Sample Mean of X                     438.8
Sum of Squared Differences from XBar       650360
Standard Error of the Estimate             4.125657
h Statistic                                0.052125
Predicted Y (YHat)                         11.72099

For Average Y
Interval Half Width                        1.948515
Confidence Interval Lower Limit            9.772477
Confidence Interval Upper Limit            13.66951

For Individual Response Y
Interval Half Width                        8.754178
Prediction Interval Lower Limit            2.966814
Prediction Interval Upper Limit            20.47517
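
The "For Average Y" numbers can be reproduced from the intermediate calculations on the sheet. The Python sketch below is one way to do it; the function names are invented for illustration, and every input is copied from the CIEandPI values rather than recomputed from raw data.

```python
import math

def h_statistic(x, n, x_bar, ssx):
    """h = 1/n + (x - XBar)^2 / (sum of squared differences from XBar)."""
    return 1 / n + (x - x_bar) ** 2 / ssx

def mean_response_interval(y_hat, t_crit, s_yx, h):
    """Confidence interval for the average Y at a given X."""
    half_width = t_crit * s_yx * math.sqrt(h)
    return y_hat - half_width, y_hat + half_width

# Inputs taken from the CIEandPI sheet (X = 350 thousand hits)
h = h_statistic(x=350, n=25, x_bar=438.8, ssx=650360)
lower, upper = mean_response_interval(11.72099, 2.068658, 4.125657, h)
print(round(h, 6), round(lower, 3), round(upper, 3))
```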

Predicting an Individual Y-value
Now we are talking about one point in the distribution of Y at X = xi.
The interval's center point is simply $\hat{Y}_i$, as before.
The standard error is different. It is larger than the one for predicting the average Y value because now we are trying to say where an entire distribution lies.

Prediction Interval
The interval has the form:
$$\hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{1 + h_i}$$
where the standard error is now (page 505):
$$S_{YX} \sqrt{1 + h_i}$$
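
A matching Python sketch for the prediction interval is shown here. The function name is hypothetical, and the inputs are the values reported on the CIEandPI sheet for X = 350 thousand hits. The only change from the interval for the average Y is the extra 1 under the square root.

```python
import math

def prediction_interval(y_hat, t_crit, s_yx, h):
    """Prediction interval for an individual Y at a given X."""
    half_width = t_crit * s_yx * math.sqrt(1 + h)
    return y_hat - half_width, y_hat + half_width

# Values from the CIEandPI sheet: YHat, t Value, Standard Error, h Statistic
lower, upper = prediction_interval(11.72099, 2.068658, 4.125657, 0.052125)
print(round(lower, 3), round(upper, 3))
```

In units of orders, this says a single day with 350,000 hits could plausibly bring anywhere from roughly 3,000 to 20,500 orders, a much wider range than the interval for the average.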

Interpretations
Confidence interval for Average Y
Prediction interval for Individual response Y

A couple of comments
Both standard errors contain $(x_i - \bar{X})^2$:
$$S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(x_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
This implies we do our best prediction right at the center of our X values (here the average is 438.8 thousand hits).
It also implies that extrapolation will yield pretty wide intervals.

Approximate prediction interval
The prediction standard error is:
$$S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(x_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
If we are predicting near the center of our X values, the "gory term" will be small.
With a sample of 20 or so, the entire term under the square root won't matter much either.
What do you get if you just use $S_{YX}$ alone?
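
One way to explore that question is to compare the exact and approximate half widths for the e-commerce prediction at 350 thousand hits. The sketch below uses the values from the CIEandPI sheet; the variable names are chosen for illustration.

```python
import math

# Values from the CIEandPI sheet at X = 350 thousand hits
y_hat, t_crit, s_yx, h = 11.72099, 2.068658, 4.125657, 0.052125

exact_half = t_crit * s_yx * math.sqrt(1 + h)  # full prediction standard error
approx_half = t_crit * s_yx                    # S_YX alone, square root dropped

exact = (y_hat - exact_half, y_hat + exact_half)
approx = (y_hat - approx_half, y_hat + approx_half)
ratio = approx_half / exact_half

print([round(v, 3) for v in exact])
print([round(v, 3) for v in approx])
print(round(ratio, 3))
```

Here the approximate interval comes out only a few percent narrower than the exact one, which is consistent with the comment that the term under the square root matters little for a sample of this size when predicting near the center.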
- Spring '08