Business Statistics (BUSA 3101)
Dr. Lari H. Arjomand
[email protected]
Slide 1

NOTE
When raw data are given in a problem, you may use either the Data Analysis Add-Ins or the SWStat+ Add-Ins in Excel to solve the problem.
Slide 2

Chapter 13
Simple Linear Regression
• Simple Linear Regression Model
• Least Squares Method
• Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Using the Estimated Regression Equation for Estimation and Prediction
• Computer Solution
Slide 3

Types of Regression Models
Regression models are classified by the number of explanatory variables:
• 1 explanatory variable: simple regression (linear or nonlinear)
• 2+ explanatory variables: multiple regression (linear or nonlinear)
Slide 4

Simple Linear Regression Model
The equation that describes how y (the dependent variable) is related to x (the independent variable) and an error term is called the regression model. The simple linear regression model is:
\( y = \beta_0 + \beta_1 x + \varepsilon \)
where \( \beta_0 \) and \( \beta_1 \) are the parameters of the model, and \( \varepsilon \) is a random variable called the error term.
Slide 5

Simple Linear Regression Equation
The simple linear regression equation is:
\( E(y) = \beta_0 + \beta_1 x \)
• The graph of the regression equation is a straight line.
• \( \beta_0 \) is the y-intercept of the regression line.
• \( \beta_1 \) is the slope of the regression line.
• \( E(y) \) is the expected value (mean) of y for a given value of x.
Slide 6

Simple Linear Regression Equation
Positive Linear Relationship
[Figure: regression line for E(y) against x with intercept \( \beta_0 \); slope \( \beta_1 \) is positive]
Slide 7

Simple Linear Regression Equation
Negative Linear Relationship
[Figure: regression line for E(y) against x with intercept \( \beta_0 \); slope \( \beta_1 \) is negative]
Slide 8

Simple Linear Regression Equation
No Relationship
[Figure: horizontal regression line for E(y) against x at intercept \( \beta_0 \); slope \( \beta_1 \) is 0]
Slide 9

Estimated Simple Linear Regression Equation
The estimated simple linear regression equation is:
\( \hat{y} = b_0 + b_1 x \)
• The graph is called the estimated regression line.
• \( b_0 \) is the y-intercept of the line.
• \( b_1 \) is the slope of the line.
• \( \hat{y} \) is the estimated value of y for a given value of x.
Slide 10

Estimation Process
Regression model: \( y = \beta_0 + \beta_1 x + \varepsilon \)
Regression equation: \( E(y) = \beta_0 + \beta_1 x \)
Unknown parameters: \( \beta_0, \beta_1 \)
Sample data: \( (x_1, y_1), \ldots, (x_n, y_n) \)
Estimated regression equation: \( \hat{y} = b_0 + b_1 x \)
Sample statistics: \( b_0, b_1 \)
The sample statistics \( b_0 \) and \( b_1 \) provide estimates of \( \beta_0 \) and \( \beta_1 \).
Slide 11

Least Squares Method
The least squares method is a procedure for using sample data to find the estimated regression equation, i.e., \( b_0 \) and \( b_1 \). It uses the sample data to provide the values of \( b_0 \) and \( b_1 \) that minimize the sum of the squares of the deviations between the observed values of the dependent variable, \( y_i \), and the estimated values of the dependent variable, \( \hat{y}_i \). See next slide.
Slide 12

Least Squares Method
Least Squares Criterion
\( \min \sum (y_i - \hat{y}_i)^2 \)
where:
\( y_i \) = observed value of the dependent variable for the ith observation
\( \hat{y}_i \) = estimated value of the dependent variable for the ith observation
Slide 13

Least Squares Graphically
Least squares (LS) minimizes \( \sum_{i=1}^{n} e_i^2 = e_1^2 + e_2^2 + e_3^2 + e_4^2 \)
[Figure: four observations around the regression equation \( \hat{Y}_i = b_0 + b_1 X_i \); each residual \( e_i \) is the vertical distance from an observation to the line, as in the regression model \( Y_2 = b_0 + b_1 X_2 + e_2 \)]
Slide 14

Least Squares Method
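Slope for the Estimated Regression Equation
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \)
where:
\( x_i \) = value of the independent variable for the ith observation
\( y_i \) = value of the dependent variable for the ith observation
Slide 15

Least Squares Method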
y-Intercept for the Estimated Regression Equation
\( b_0 = \bar{y} - b_1 \bar{x} \)
where:
\( \bar{x} \) = mean value of the independent variable
\( \bar{y} \) = mean value of the dependent variable
Slide 16

The Simple Linear Regression Model Illustrated
[Figure]
Slide 17

Example 1
John Sherman, the student body president at Clayton State, is concerned about the cost of textbooks to students. He believes there is a relationship between the number of pages in a text and the selling price of the book. To provide insight into the problem, he selects a sample of eight textbooks currently on sale in the bookstore. Use Excel to develop a regression equation. See next slide for the data.
Slide 18

Example 1 (Continued)
Book | Pages | Price ($)
Introduction to History | 500 | 84
Basic Algebra | 700 | 75
Introduction to Psychology | 800 | 99
Introduction to Sociology | 600 | 72
Business Management | 400 | 69
Introduction to Biology | 500 | 81
Fundamentals of Jazz | 600 | 63
Principles of Nursing | 800 | 93
Slide 19

Example 1 (Solution)
The regression equation is:
\( \hat{y} = 48 + 0.05x \)
The slope of the line is 0.05 (rounded): each additional page costs about a nickel.
The equation crosses the y-axis at $48.00: a book with no pages would cost $48.00.
Slide 20

Example 1 (Continued)
We can use the regression equation to estimate (predict) values of y. The estimated selling price of an 800-page book is $88.00, found by:
Price = $48 + 0.05(Number of Pages)
= $48 + 0.05(800)
= $88.00
Slide 21

Thinking Challenge Example
Student: Solve this problem.
You're a marketing analyst for Hasbro Toys. You gather the following data:
Ad $ | Sales (Units)
1 | 1
2 | 1
3 | 2
4 | 2
5 | 4
Use Excel to determine the relationship between sales and advertising.
Slide 22

Using Excel's Regression Tool
Up to this point, you have seen how Excel can be used for various statistical analyses. Excel also has a comprehensive tool in its Data Analysis package called Regression. The Regression tool can be used to perform a complete regression analysis.
Slide 23

Using Excel's Regression Tool
First enter the data into an Excel worksheet, and then:
Step 1: Select the Tools menu.
Step 2: Choose the Data Analysis option.
Step 3: Choose Regression from the list of Analysis Tools.
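The Regression tool computes \( b_0 \) and \( b_1 \) with the least squares formulas from the Least Squares Method slides. As a cross-check outside Excel, here is a minimal Python sketch of those two formulas. The function name `least_squares` is our own choice, and the data are the Example 1 textbooks and the Hasbro Thinking Challenge from the earlier slides.

```python
def least_squares(x, y):
    """Return (b0, b1) for the estimated equation y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: b1 = sum((x_i - x_bar) * (y_i - y_bar)) / sum((x_i - x_bar) ** 2)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: b0 = y_bar - b1 * x_bar
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Example 1: number of pages and selling price ($) of eight textbooks
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]
b0, b1 = least_squares(pages, price)
print(f"Example 1: y-hat = {b0:.2f} + {b1:.4f}x")
print(f"Estimated price of an 800-page book: ${b0 + b1 * 800:.2f}")

# Thinking Challenge: advertising ($) and sales (units) for Hasbro Toys
ad_dollars = [1, 2, 3, 4, 5]
sales_units = [1, 1, 2, 2, 4]
a0, a1 = least_squares(ad_dollars, sales_units)
print(f"Hasbro: y-hat = {a0:.2f} + {a1:.2f}x")
```

The unrounded Example 1 slope is about 0.0514, so this sketch estimates about $89.14 for an 800-page book; the $88.00 on the earlier slide comes from rounding the slope to 0.05 before multiplying. The Hasbro line should match the Excel result \( \hat{y} = -0.10 + 0.70x \).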
Slide 24

Using Excel's Regression Tool
[Screenshot: Excel Regression dialog box]
Slide 25

Excel Solution
[Screenshot: worksheet data, Regression Statistics output, ANOVA output, and estimated regression equation output]
Estimated regression equation: \( \hat{y} = -0.10 + 0.70x \)

Thinking Challenge Example
Student: Solve the problem.
You're an economist for the county cooperative. You gather the following data:
Fertilizer (lb.) | Yield (lb.)
4 | 3.0
6 | 5.5
10 | 6.5
12 | 9.0
Use Excel to determine the relationship between fertilizer and crop yield.
Slide 27

Excel Solution
\( \hat{y} = 0.80 + 0.65x \)

SWStat+ Solution
[Screenshot: SWStat+ output]

Simple Linear Regression
Another Example: Reed Auto Sales
Reed Auto periodically has a special weeklong sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.
Slide 30

Simple Linear Regression
Example (Continued): Reed Auto Sales
Number of TV Ads | Number of Cars Sold
1 | 14
3 | 24
2 | 18
1 | 17
3 | 27
What is the relationship between cars sold and TV ads? Use Excel to solve the problem and to graph the scatter diagram and trendline.
Slide 31

Excel Solution
\( \hat{y} = 10 + 5x \)

Scatter Diagram and Trendline
[Chart: Cars Sold (0 to 30) against TV Ads (0 to 4), with trendline y = 5x + 10]
Slide 33

SST, SSR, & SSE
Relationship Among SST, SSR, and SSE
SST = SSR + SSE
\( \sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2 \)
where:
SST = total variation
SSR = variation explained by the regression
SSE = unexplained (residual) variation
Note: Excel refers to SSE as the residual sum of squares.
Slide 34

Variation Measures
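[Figure: vertical deviations of an observation \( Y_i \) from \( \bar{Y} \) and from the line \( \hat{Y}_i = b_0 + b_1 X_i \)]
SST = total variation = \( \sum (Y_i - \bar{Y})^2 \)
SSE = unexplained variation = \( \sum (Y_i - \hat{Y}_i)^2 \)
SSR = explained variation = \( \sum (\hat{Y}_i - \bar{Y})^2 \)
Slide 35

Coefficient of Determination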
The coefficient of determination (\( r^2 \)) is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X). It is the square of the coefficient of correlation (r). It ranges from 0 to 1. It does not give any information on the direction of the relationship between the variables.
Slide 36

Coefficient of Determination (Continued)
The coefficient of determination equation is:
\( r^2 = \frac{\text{SSR}}{\text{SST}} = \frac{\text{Explained variation}}{\text{Total variation}}, \quad 0 \le r^2 \le 1 \)
Slide 37

The Coefficient of Correlation
The coefficient of correlation (r) is a measure of the strength of the linear relationship between two variables.
• It is also called Pearson's r or the Pearson product moment correlation coefficient.
• It requires interval- or ratio-scaled data.
• It can range from -1.00 to +1.00.
• Values of -1.00 or +1.00 indicate perfect correlation.
• Values close to 0.0 indicate weak correlation.
• Negative values indicate an inverse relationship, and positive values indicate a direct relationship.
Slide 38

Different Values of the Correlation Coefficient
[Figure]
Slide 39

We calculate the coefficient of correlation from the following formula:
\( r = +\sqrt{r^2} \) if \( b_1 \) is positive, and \( r = -\sqrt{r^2} \) if \( b_1 \) is negative,
where \( b_1 \) is the slope of the estimated regression equation.
Slide 40

Coefficient of Determination (\( r^2 \))
Example: Reed Auto Sales (Continued)
Reed Auto periodically has a special weeklong sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.
Slide 41

Coefficient of Determination (\( r^2 \))
Reed Auto Sales Example (Continued)
Number of TV Ads | Number of Cars Sold
1 | 14
3 | 24
2 | 18
1 | 17
3 | 27
Find the coefficient of determination, and explain the answer.
Slide 42

Coefficient of Determination (Solution)
\( r^2 = \text{SSR}/\text{SST} = 100/114 = 0.8772 \)
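The values SSR = 100 and SST = 114 come from the sums of squares defined on the SST, SSR, & SSE slides. A short Python sketch recomputing them for the Reed Auto data from the estimated equation \( \hat{y} = 10 + 5x \); it also confirms that SST = SSR + SSE.

```python
# Reed Auto Sales data and the estimated equation y-hat = 10 + 5x
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10, 5

y_bar = sum(cars_sold) / len(cars_sold)         # mean number of cars sold
y_hat = [b0 + b1 * x for x in tv_ads]           # estimated values

sst = sum((y - y_bar) ** 2 for y in cars_sold)                 # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)                   # explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(cars_sold, y_hat))    # unexplained (residual)

r_squared = ssr / sst
print(f"SST = {sst:g}, SSR = {ssr:g}, SSE = {sse:g}")   # SST = 114, SSR = 100, SSE = 14
print(f"r^2 = {r_squared:.4f}")                          # r^2 = 0.8772
```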
The regression relationship is very strong: 88% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
Slide 43

Excel Solution
[Screenshot: Excel output showing \( r^2 \), SSR, SST, and \( \hat{y} = 10 + 5x \)]

Sample Correlation Coefficient
As we said:
\( r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}} = (\text{sign of } b_1)\sqrt{r^2} \)
where \( b_1 \) is the slope of the estimated regression equation \( \hat{y} = b_0 + b_1 x \).
Slide 45

Example (Continued): Sample Correlation Coefficient
Solution
\( r_{xy} = (\text{sign of } b_1)\sqrt{r^2} \)
The sign of \( b_1 \) in the equation \( \hat{y} = 10 + 5x \) is "+", so
\( r_{xy} = +\sqrt{0.8772} = +0.9366 \)
Slide 46

Excel Solution
[Screenshot: Excel output showing r and \( \hat{y} = 10 + 5x \)]

Example 1 (Continued)
John Sherman, the student body president at Clayton State, is concerned about the cost of textbooks to students. He believes there is a relationship between the number of pages in a text and the selling price of the book. To provide insight into the problem, he selects a sample of eight textbooks currently on sale in the bookstore. Draw a scatter diagram, compute the correlation coefficient (r) and \( r^2 \), and explain the results. See next slide for the data.
Slide 48

Example 1 (Continued)
Book | Pages | Price ($)
Introduction to History | 500 | 84
Basic Algebra | 700 | 75
Introduction to Psychology | 800 | 99
Introduction to Sociology | 600 | 72
Business Management | 400 | 69
Introduction to Biology | 500 | 81
Fundamentals of Jazz | 600 | 63
Principles of Nursing | 800 | 93
Slide 49

Scatter Diagram of Number of Pages and Selling Price of Text
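[Chart: Price ($), 60 to 100, against Pages, 400 to 800]
Example 1 (Continued)
Slide 50

Excel Solution
[Screenshot: Excel output showing r and \( r^2 \)]

Example 1 (Continued)
\( r = 0.61 \)
\( r^2 \approx 0.38 \) (the square of the unrounded r ≈ 0.6139)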
The correlation between the number of pages and the selling price of the book is 0.61. This indicates a moderate association between the variables.
Slide 52

t Test of Significance of r
Did a computed r (correlation coefficient) come from a population of paired observations with zero correlation?
\( H_0: \rho = 0 \) (The correlation in the population is zero.)
\( H_1: \rho \ne 0 \) (The correlation in the population is different from zero.)
Actual t test for the coefficient of correlation:
\( t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \), with n − 2 d.f.
Slide 53

Question: Test the hypothesis that there is no correlation in the population. Use an alpha of 0.02.
Step 1: H0: The correlation in the population is zero. H1: The correlation in the population is not zero.
Step 2: The significance level is 0.02.
Step 3: The test statistic follows the t distribution.
Step 4: H0 is rejected if actual t > critical t value of 3.143 or if actual t < critical t value of −3.143 (or if p ≤ α = 0.02). The critical values come from the t table with n − 2 = 6 d.f.
Slide 54

Example 1 (Continued)
Step 5: Find the value of the test statistic:
\( t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.6139\sqrt{8-2}}{\sqrt{1-0.6139^2}} \approx 1.90 \) (using the unrounded r = 0.6139)
Since actual t = 1.90 < critical t = 3.143 (and p = 0.11 > 0.02), H0 is not rejected. We cannot reject the null hypothesis that there is no correlation in the population. The amount of association could be due to chance.
See next slide for the Excel solution.
Slide 55

Excel Solution
[Screenshot: Excel output showing the actual t value]
Conclusion: Since actual t = 1.90 < critical t = 3.143 (and p = 0.11 > 0.02), we do not reject H0.

The Regression Model Assumptions
Model: \( y = \beta_0 + \beta_1 x + \varepsilon \)
Assumptions about the model error terms, the \( \varepsilon \)'s:
1. Mean zero: The mean of the error terms is equal to 0.
2. Constant variance: The variance of the error terms, \( \sigma^2 \), is the same for all values of x.
3. Normality: The error terms follow a normal distribution for all values of x.
4. Independence: The values of the error terms are statistically independent of each other.
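The error terms \( \varepsilon \) are never observed; in practice these assumptions are examined through the residuals \( e_i = y_i - \hat{y}_i \). A minimal sketch listing the residuals of the Reed Auto fit \( \hat{y} = 10 + 5x \). Note that a least squares line with an intercept always produces residuals that sum to zero, so the zero sum below reflects the method rather than confirming assumption 1.

```python
# Reed Auto Sales data and the estimated equation y-hat = 10 + 5x
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10, 5

# Residual e_i = observed y_i minus estimated y-hat_i
residuals = [y - (b0 + b1 * x) for x, y in zip(tv_ads, cars_sold)]
print("residuals:", residuals)   # residuals: [-1, -1, -2, 2, 2]
print("sum:", sum(residuals))    # sum: 0
```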
Slide 57

Mean Square Error and Standard Error of the Estimate
Mean square error, a point estimate of the residual variance \( \sigma^2 \):
\( s^2 = \text{MSE} = \frac{\text{SSE}}{n-2} \)
where the sum of squared errors (SSE) is:
\( \text{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2 \)
Standard error of the estimate, a point estimate of the residual standard deviation \( \sigma \):
\( s = \sqrt{\text{MSE}} = \sqrt{\frac{\text{SSE}}{n-2}} \)
The standard error of the estimate measures the scatter, or dispersion, of the observed values around the line of regression.
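These formulas can be applied to the Example 1 textbook data, which the following slides use. A minimal Python sketch; the coefficients are recomputed with the least squares formulas so that the block runs on its own.

```python
import math

# Example 1 data: pages (x) and selling price in dollars (y)
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]
n = len(pages)

# Least squares coefficients (formulas from the Least Squares Method slides)
x_bar = sum(pages) / n
y_bar = sum(price) / n
sxx = sum((x - x_bar) ** 2 for x in pages)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(pages, price))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# SSE, then MSE = SSE / (n - 2) and s = sqrt(MSE)
sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(pages, price))
mse = sse / (n - 2)
s = math.sqrt(mse)
print(f"SSE = {sse:.2f}, MSE = {mse:.2f}, s = {s:.2f}")
```

For these eight books the sketch gives s of roughly 10.41 (dollars), with n − 2 = 6 degrees of freedom in the denominator of MSE.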
Slide 58

Example 1 (Continued)
John Sherman, the student body president at Clayton State, is concerned about the cost of textbooks to students. He believes there is a relationship between the number of pages in a text and the selling price of the book. To provide insight into the problem, he selects a sample of eight textbooks currently on sale in the bookstore. Find the standard error of the estimate. For the data, see next slide.
Slide 59

Example 1 (Continued)
Book | Pages | Price ($)
Introduction to History | 500 | 84
Basic Algebra | 700 | 75
Introduction to Psychology | 800 | 99
Introduction to Sociology | 600 | 72
Business Management | 400 | 69
Introduction to Biology | 500 | 81
Fundamentals of Jazz | 600 | 63
Principles of Nursing | 800 | 93
See next slide for the Excel solution to this question.
Slide 60

Excel Solution
[Screenshot: Excel output showing the standard error of the estimate]

SWStat+ Solution
[Screenshot: SWStat+ output]

Testing for Significance (Slope: \( \beta_1 \))
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of \( \beta_1 \) (the slope) is zero. Two tests are commonly used: the t test and the F test. Both the t test and the F test require an estimate of \( \sigma^2 \), the variance of \( \varepsilon \) in the regression model.
Slide 64

Testing for Significance (Slope: \( \beta_1 \)): t Test
(Continued)
Hypotheses:
\( H_0: \beta_1 = 0 \)
\( H_a: \beta_1 \ne 0 \)
Slide 65

Testing for Significance (Slope: \( \beta_1 \)): t Test (Continued)
Rejection Rule:
Reject H0 if p-value < α, or if actual t < \( -t_{\alpha/2} \), or if actual t > \( t_{\alpha/2} \),
where \( t_{\alpha/2} \) is based on a t distribution with n − 2 degrees of freedom.
Slide 66

Testing for Significance (Slope: \( \beta_1 \)): t Test
(Example)
Reed Auto Sales (Continued)
Reed Auto periodically has a special weeklong sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.
Slide 67

Testing for Significance (Slope: \( \beta_1 \)): t Test
(Example)
Reed Auto Sales Example (Continued)
Number of TV Ads | Number of Cars Sold
1 | 14
3 | 24
2 | 18
1 | 17
3 | 27
Test \( H_0: \beta_1 = 0 \) by applying the t test, and also use a 95% confidence interval for \( \beta_1 \) to test the hypotheses used in the t test.
Slide 68

Testing for Significance (Slope: \( \beta_1 \)): t Test
Example (Continued)
1. Determine the hypotheses: \( H_0: \beta_1 = 0 \), \( H_a: \beta_1 \ne 0 \)
2. Specify the level of significance: α = 0.05
3. Select the test statistic: \( t = \frac{b_1}{s_{b_1}} \), where \( s_{b_1} \) is the standard error of the slope \( b_1 \)
4. State the rejection rule: Reject H0 if p-value < 0.05 or if |actual t| > critical t = 3.182 (from the t table with 3 degrees of freedom)
Slide 69

Testing for Significance (Slope: \( \beta_1 \)): t Test
Example (Continued)
5. Compute the value of the test statistic:
\( t = \frac{b_1}{s_{b_1}} = \frac{5}{1.08} = 4.63 \)
6. Determine whether to reject H0. Since the p-value of 0.02 (from the Excel output) is less than 0.05, and since actual t = 4.63 > critical t = 3.182, we reject H0. In other words, the statistical evidence is sufficient to conclude that there is a significant relationship between the number of TV ads aired and the number of cars sold.
Slide 70

Excel Solution
[Screenshot: Excel output showing the standard error of \( b_1 \) (\( s_{b_1} \))]
Since the p-value of 0.02 < 0.05, and since actual t = 4.63 > critical t = 3.182, we reject H0.

Confidence Interval for Slope: \( \beta_1 \) (Optional Reading)

Testing for Significance (Slope: \( \beta_1 \)): F Test
In the case of simple regression analysis, the F test provides the same conclusion as the t test: if the t test indicates \( \beta_1 \ne 0 \) and hence a significant relationship, the F test will also indicate a significant relationship. With more than one independent variable (see next chapter), only the F test can be used to test for an overall significant relationship.
Test statistic:
\( F = \text{MSR}/\text{MSE} \)
where MSR is the mean square regression:
MSR = SSR / (number of independent variables in the regression equation) = SSR/1 = SSR
Slide 73

Testing for Significance (Slope: \( \beta_1 \)): F Test
Hypotheses:
\( H_0: \beta_1 = 0 \)
\( H_a: \beta_1 \ne 0 \)
Slide 74

Testing for Significance (Slope: \( \beta_1 \)): F Test
Rejection Rule:
Reject H0 if p-value < α, or if actual F > critical \( F_\alpha \),
where, in the case of simple regression, \( F_\alpha \) is based on an F distribution with 1 degree of freedom in the numerator and n − 2 degrees of freedom in the denominator. The value of \( F_\alpha \) is found from the F distribution table.
Slide 75

NOTE
To find the value of \( F_\alpha \) (the critical value of F) from the F table, you need three pieces of information: the value of α and the degrees of freedom for both the numerator and the denominator. As we said, in the case of the simple linear regression model, \( F_\alpha \) is based on an F distribution with 1 degree of freedom in the numerator and n − 2 degrees of freedom in the denominator, where n is the sample size.
Slide 76

Testing for Significance (Slope: \( \beta_1 \)): F Test
Example (Continued)
1. Determine the hypotheses: \( H_0: \beta_1 = 0 \), \( H_a: \beta_1 \ne 0 \)
2. Specify the level of significance: α = 0.05
3. Select the test statistic: F = MSR/MSE
4. State the rejection rule: Reject H0 if p-value < 0.05 or if actual F > critical \( F_\alpha \) (where \( F_\alpha \) is based on an F distribution with 1 d.f. in the numerator and n − 2 d.f. in the denominator).
Slide 77

Testing for Significance (Slope: \( \beta_1 \)): F Test
Example (Continued)
5. Compute the value of the test statistic:
F = MSR/MSE = 100/4.667 = 21.43 (actual F value)
6. Determine whether to reject H0. Actual F = 21.43 > critical F = 10.13 (from the F table with 1 and 3 d.f.). Also, the p-value of 0.02 (from the Excel output) corresponding to F = 21.43 is less than α = 0.05. Hence, we reject H0.
The statistical evidence is sufficient to conclude that there is a significant relationship between the number of TV ads aired and the number of cars sold.
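A compact Python sketch of the t test, the F test, and the 95% confidence interval \( b_1 \pm t_{\alpha/2}\, s_{b_1} \) for the Reed Auto data. It uses the standard formula \( s_{b_1} = s / \sqrt{\sum (x_i - \bar{x})^2} \) for the standard error of the slope (the slides take this value from the Excel output), and the critical values 3.182 and 10.13 are the table values quoted above; the code does not compute p-values.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
n = len(tv_ads)

# Least squares coefficients
x_bar, y_bar = sum(tv_ads) / n, sum(cars_sold) / n
sxx = sum((x - x_bar) ** 2 for x in tv_ads)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold)) / sxx
b0 = y_bar - b1 * x_bar

# Sums of squares and mean squares (one independent variable, so MSR = SSR)
sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(tv_ads, cars_sold))
ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in tv_ads)
mse = sse / (n - 2)
msr = ssr / 1

s_b1 = math.sqrt(mse) / math.sqrt(sxx)   # standard error of b1
t = b1 / s_b1                            # t test statistic
f = msr / mse                            # F test statistic

t_crit, f_crit = 3.182, 10.13            # table values: t with 3 d.f.; F with 1 and 3 d.f.
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)

print(f"s_b1 = {s_b1:.2f}, t = {t:.2f}, F = {f:.2f}, t^2 = {t * t:.2f}")
print("t test:", "reject H0" if abs(t) > t_crit else "do not reject H0")
print("F test:", "reject H0" if f > f_crit else "do not reject H0")
print(f"95% CI for beta1: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The sketch reproduces \( s_{b_1} \approx 1.08 \), t ≈ 4.63, and F ≈ 21.43, and shows that in simple regression \( F = t^2 \), which is why the two tests reach the same conclusion. The confidence interval (about 1.56 to 8.44) does not contain 0, so it also leads to rejecting \( H_0: \beta_1 = 0 \).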
See next slide for the Excel solution.
Slide 78

Excel Solution
[Screenshot: Excel output showing the actual F value and the p-value]
Since actual F = 21.43 > critical F = 10.13, and also since p = 0.02 < α = 0.05, H0 is rejected.

Using SWStat+
[Screenshots: SWStat+ output]
Since actual F = 21.43 > critical F = 10.13, and also since p = 0.02 < α = 0.05, H0 is rejected.

Some Cautions about the Interpretation of Significance Tests
Rejecting \( H_0: \beta_1 = 0 \) and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y. Also, just because we are able to reject \( H_0: \beta_1 = 0 \) and demonstrate statistical significance does not enable us to conclude that the relationship between x and y is linear.
Slide 83
Slide 84

Simple Linear Regression Example 2
Using Excel
A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet). A random sample of 10 houses is selected.
• Dependent variable (y) = house price in $1000s
• Independent variable (x) = square feet
For data, see next slide.
Slide 85

Sample Data for House Price Model
(Example 2)
House Price in $1000s (y) | Square Feet (x)
245 | 1400
312 | 1600
279 | 1700
308 | 1875
199 | 1100
219 | 1550
405 | 2350
324 | 2450
319 | 1425
255 | 1700
Slide 86

Regression Using Excel (Solution)
Tools / Data Analysis / Regression
Slide 87

Excel Solution (Example 2)
Regression Statistics
Multiple R | 0.76211
R Square | 0.58082
Adjusted R Square | 0.52842
Standard Error | 41.33032
Observations | 10

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 18934.9348 | 18934.9348 | 11.0848 | 0.01039
Residual | 8 | 13665.5652 | 1708.1957 | |
Total | 9 | 32600.5000 | | |

Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 98.24833 | 58.03348 | 1.69296 | 0.12892 | -35.57720 | 232.07386
Square Feet | 0.10977 | 0.03297 | 3.32938 | 0.01039 | 0.03374 | 0.18580

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)
In this output, Multiple R is r and R Square is \( r^2 \); in the ANOVA table, the SS values for Regression, Residual, and Total are SSR, SSE, and SST.
Slide 88

Excel Solution (Example 2)
In the Square Feet row of the same output, the Standard Error (0.03297) is the standard error of \( b_1 \), and the t Stat (3.32938) and P-value (0.01039) are the calculated t statistic and p-value for testing whether the population regression slope is 0, i.e., whether \( \beta_1 = 0 \).
Slide 89

Interpretation of the Intercept, \( b_0 \) (Example 2)
house price = 98.24833 + 0.10977 (square feet)
\( b_0 \) is the estimated average value of Y when the value of X is zero (if x = 0 is in the range of observed x values).
• Here, no houses had 0 square feet, so \( b_0 = 98.24833 \) just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet.
Slide 90

Interpretation of the Slope Coefficient, \( b_1 \) (Example 2)
house price = 98.24833 + 0.10977 (square feet)
\( b_1 \) measures the estimated change in the average value of Y as a result of a one-unit change in X.
• Here, \( b_1 = 0.10977 \) tells us that the average value of a house increases by 0.10977($1000) = $109.77 for each additional square foot of size.
Slide 91

Simple Linear Regression (Example 2 Continued)
More Questions
What are the values of r and R squared? Explain your answers. Do a hypothesis test of the slope, \( \beta_1 \), applying both the t test and the F test. In other words, test whether there is any relationship between the house price (Y) and the size of the house (square feet, X).
Student: Solve this part of the problem.
Slide 92

Using SWStat+ to Solve the Real Estate Problem
Slide 93

End of Chapter 13
Your house is finished!
Lari Arjomand
Slide 94
