1 Simple Linear Regression
MGMT 670: Business Analytics
Krannert Graduate School of Management
Purdue University

2 Regression Models
§ Describe the causal relationship between a response (or dependent) variable and at least one explanatory (or independent) variable.
§ Used for prediction, e.g., sales vs. promotion activities
§ Correlation analysis
   Measures the association between numerical variables
   Not used for prediction
   e.g., German mark and Japanese yen
3 Regression Modeling Steps
1. Define problem or question
2. Specify model
3. Collect data
4. Do descriptive data analysis
5. Estimate unknown parameters
6. Evaluate model
7. Implementation: Prediction, …

4 Step 1: Define Problem
§ Most critical step
§ What are the objectives of building the model?
§ Who will use the model?
§ Are resources available (data, etc.)?
§ How will the results be implemented?
Example: Develop a model to explain the variations in sales by advertising and promotional activities.

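6 Simple Linear Regression Models
§ Widely used for Trend Analysis
§ One dependent (response) variable
§ One independent (explanatory) variable
$Y = \beta_0 + \beta_1 X$, where $\beta_1$ = slope and $\beta_0$ = Y-intercept
[Figure: the line plotted with Y on the vertical axis and X on the horizontal axis]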
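To make the roles of the intercept and the slope concrete, here is a minimal Python sketch of the line Y = β0 + β1 X. The function name `predict` and the coefficient values are hypothetical, chosen only for illustration; they are not estimates from any data in these slides.

```python
def predict(x, intercept, slope):
    """Return the point on the line Y = intercept + slope * X."""
    return intercept + slope * x

# Hypothetical coefficients, for illustration only.
b0, b1 = 20.0, 0.8

# The intercept is the value of Y when X = 0.
print("Y at X = 0:", predict(0, b0, b1))

# The slope is the change in Y for a one-unit increase in X.
change = predict(101, b0, b1) - predict(100, b0, b1)
print("Change in Y per unit of X:", round(change, 6))
```

Whatever value of X is used as the starting point, a one-unit increase changes Y by the slope, which is what makes the model linear.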
7 Visualizing the Linear Regression Model …
[Figure: regression line annotated with the intercept β0 and the error term ε]

8 Estimated model does not match the true model …
[Figure: House price versus # of Bathrooms. Horizontal axis: # of Bathrooms (0 to 5). Vertical axis: House price (0 to 400,000). Series: Actual Data, Best Model Fit, True Model]
9 Example: Pharmex Drugs
§ Collected data from 50 randomly selected regions
   Promote: Pharmex’s promotional expenditures as a percentage of those of the leading competitor
   Sales: Pharmex’s sales as a percentage of those of the leading competitor

Region  Promote  Sales
1       77       85
2       110      103
3       110      102
4       93       109
5       90       85
6       95       103
7       100      110
8       85       86
9       96       92
10      83       87
11      88       99
12      94       101
13      104      109
14      89       81

10 Step 4  Descriptive Data Analysis
§ Descriptive statistics
§ Correlation Coefficient
§ Scatter Plot
11 Descriptive Statistics

Descriptive Statistics: Promote, Sales

Variable  N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3      Maximum
Promote   50  0   97.88  1.24     8.74   77.00    92.75  96.50   103.00  117.00
Sales     50  0   99.74  1.40     9.89   81.00    92.00  101.00  108.00  119.00

Correlations: Promote, Sales

Pearson correlation of Promote and Sales = 0.673
P-Value = 0.000
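As a rough sketch of how a summary like this could be computed, the Python below calculates the mean, standard error of the mean, sample standard deviation, and Pearson correlation. It uses only the 14 regions listed on slide 9, because the full 50-region dataset is not reproduced in these slides, so its results will not match the figures above. The function name `pearson_r` is illustrative.

```python
import math
import statistics

# Only the 14 regions listed on slide 9 (assumption: the remaining
# 36 regions are unavailable, so these results differ from the slide).
promote = [77, 110, 110, 93, 90, 95, 100, 85, 96, 83, 88, 94, 104, 89]
sales = [85, 103, 102, 109, 85, 103, 110, 86, 92, 87, 99, 101, 109, 81]

def pearson_r(x, y):
    """Pearson correlation: joint deviation scaled by both spreads."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

for name, values in (("Promote", promote), ("Sales", sales)):
    n = len(values)
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)   # sample standard deviation (n - 1)
    se_mean = stdev / math.sqrt(n)     # standard error of the mean
    print(name, "N =", n, "Mean =", round(mean, 2),
          "SE Mean =", round(se_mean, 2), "StDev =", round(stdev, 2),
          "Min =", min(values), "Median =", statistics.median(values),
          "Max =", max(values))

r = pearson_r(promote, sales)
print("Pearson correlation of Promote and Sales =", round(r, 3))
```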

12 Correlation Coefficients
§ Linear association between two variables
[Figure: four scatter plots of Y against X, with panels r = -1, r = 1, r = 0, and r = 0]
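Because r measures linear association only, a small made-up example can help: the Python sketch below computes r for a perfectly increasing line, a perfectly decreasing line, and a symmetric curve. The data and the function name `pearson_r` are invented for illustration, and the curved case is an assumption about what an r = 0 panel might show, not a reconstruction of the slide's figure.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                    # made-up values
rising = [2 * v + 1 for v in x]        # exact line with positive slope
falling = [-v for v in x]              # exact line with negative slope
curved = [(v - 3) ** 2 for v in x]     # symmetric U shape, no linear trend

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
r_curved = pearson_r(x, curved)
print("rising line: r =", round(r_rising, 6))
print("falling line: r =", round(r_falling, 6))
print("U-shaped curve: r =", round(r_curved, 6))
```

In the curved case Y is completely determined by X, yet r is zero, so a correlation of zero rules out a linear association but not a relationship of another shape.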
13 Example: Pharmex Drugs, Scatter Diagram

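14 Step 5  Estimate Unknown Parameters
§ Least Squares Method
   Minimize the sum of squared errors: $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
[Figure: scatter of Y against X with the fitted line (the prediction equation); each error e_1, ..., e_5 is the vertical distance between an observed value and its predicted value]
Why Squares?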
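The least squares idea can be sketched in Python using the standard closed-form solution for one explanatory variable: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The sketch uses only the 14 regions listed on slide 9, which is an assumption forced by the slides not reproducing all 50 regions, so the coefficients are not the ones the full dataset would give. The function names `least_squares_fit` and `sse` are illustrative.

```python
# Only the 14 regions listed on slide 9 (the full 50 are not shown).
promote = [77, 110, 110, 93, 90, 95, 100, 85, 96, 83, 88, 94, 104, 89]
sales = [85, 103, 102, 109, 85, 103, 110, 86, 92, 87, 99, 101, 109, 81]

def least_squares_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sse(x, y, intercept, slope):
    """Sum of squared errors between observed and predicted values."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

b0, b1 = least_squares_fit(promote, sales)
best_sse = sse(promote, sales, b0, b1)
print("Prediction equation: Sales =", round(b0, 3), "+", round(b1, 3), "* Promote")
print("Sum of squared errors:", round(best_sse, 3))

# Errors (observed minus predicted). Positive and negative errors cancel,
# so their plain sum is essentially zero and cannot rank candidate lines.
residuals = [b - (b0 + b1 * a) for a, b in zip(promote, sales)]
print("Sum of raw errors:", round(sum(residuals), 9))

# Any other line does worse on the squared criterion.
print("SSE with intercept shifted by 1:", round(sse(promote, sales, b0 + 1.0, b1), 3))
```

The last two printed lines suggest one answer to "Why squares?": raw errors sum to zero for the fitted line, whereas squaring makes every error count as a positive penalty, with larger misses penalized more heavily.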

