MN1025 – Business Statistics

Lecture 8: Friday 29/2/2008

LINEAR REGRESSION

Reference: Lind et al., Chapter 13.

8.1 Regression: introduction

In the last lecture we introduced the concept of the best-fit line, which is an approximation to the data. The closeness of this approximation is measured by the correlation coefficient r. In this lecture we will see how the best-fit line can be used for prediction.

Example: Suppose the College wishes to save money and asks: can we predict exam results well from weekly work? If the answer is yes, we could dispense with exams. To test this, we need a sample of students who have already been examined, and we check whether each student's average weekly mark predicts their exam result. To estimate the predictive power, one uses Linear Regression.

8.2 Back to sales and scores

Back to Example 7.7 (sales and scores). Here are the data again:

Data Display

    Row   scores   sales
      1        4       5
      2        7      12
      3        3       4
      4        6       8
      5       10      11

We wish to analyse how good an approximation to the data the best-fit line is. In Minitab we use STAT → REGRESSION → REGRESSION. We are asked to choose a RESPONSE column and a PREDICTOR column. In this case the only reasonable choice is “scores” as predictor (or cause) and “sales” as response (or effect). We get the Regression Analysis table shown below.

Regression Analysis: sales versus scores

    The regression equation is
    sales = 1.20 + 1.13 scores

    Predictor     Coef   SE Coef      T      P
    Constant     1.200     2.313   0.52  0.640
    scores      1.1333    0.3569   3.18  0.050

    S = 1.955   R-Sq = 77.1%   R-Sq(adj) = 69.4%

    Analysis of Variance

    Source            DF      SS      MS      F      P
    Regression         1   38.53   38.53  10.08  0.050
    Residual Error     3   11.47    3.82
    Total              4   50.00

The regression equation sales = 1.20 + 1.13 scores in the printout is the equation of the best-fit line. We can plot this on the scatter plot ourselves or get Minitab to plot it for us: we use STAT → REGRESSION → FITTED LINE PLOT and again enter “sales” as response and “scores” as predictor.
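The figures in the Minitab printout can be checked by hand. The following is a minimal Python sketch, not part of the original notes, that applies the standard least-squares formulas to the five data points of Example 7.7. The variable names and the helper predict_sales are illustrative assumptions, and the score of 8 used for the prediction is a made-up input.

```python
import math

# Data from Example 7.7 as shown in the Data Display
scores = [4, 7, 3, 6, 10]   # predictor
sales = [5, 12, 4, 8, 11]   # response

n = len(scores)
mean_x = sum(scores) / n
mean_y = sum(sales) / n

# Sums of squares and cross-products about the means
sxx = sum((x - mean_x) ** 2 for x in scores)
syy = sum((y - mean_y) ** 2 for y in sales)   # "Total" SS in the printout
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, sales))

# Least-squares slope and intercept of the best-fit line
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Split the total variation into regression and residual parts
ss_regression = slope * sxy
ss_residual = syy - ss_regression

r_sq = ss_regression / syy             # R-Sq, as a fraction
s = math.sqrt(ss_residual / (n - 2))   # S, the residual standard deviation
se_slope = s / math.sqrt(sxx)          # "SE Coef" for scores
t_slope = slope / se_slope             # "T" for scores


def predict_sales(score):
    """Predicted sales for a given score (hypothetical helper)."""
    return intercept + slope * score


print(f"sales = {intercept:.2f} + {slope:.2f} scores")
print(f"S = {s:.3f}, R-Sq = {100 * r_sq:.1f}%")
print(f"SE Coef = {se_slope:.4f}, T = {t_slope:.2f}")
print(f"predicted sales at score 8: {predict_sales(8):.2f}")
```

The printed values should agree with the Coef, SE Coef, T, S and R-Sq entries of the printout up to rounding. The P column is left out because it needs the t distribution, which the Python standard library does not provide.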
In all these examples we assume that the underlying populations have an approximately normal distribution, and that a relation of the form

    sales = m × scores + c + random error

is reasonable. In general, there could be more than one predictor. For instance, we could think that staff experience was a relevant factor and get a relation of the form

    sales = m_1 × scores + m_2 × experience + c + random error.

Here we have two predictors, “scores” and “experience”. Generally, by a suitable choice of additional predictors we can reduce the random error. In this course, we will always use a single predictor only.

8.3 Testing if the slope is nonzero

For the population of scores and sales there is an underlying (population) regression line:

    sales = m_population × scores + c_population ....
This note was uploaded on 04/17/2008 for the course MN 1025 taught by Professor Schack during the Spring '08 term at Royal Holloway.