Lessons in Business Statistics
Prepared by P.K. Viswanathan

Chapter 10: Correlation and Regression

Introduction

Managers very often have to assess the nature and degree of relationship between variables. For example, a marketing manager would like to know the degree of relationship between advertising expenditure and sales volume. Normally, you expect a positive relationship between sales and advertising expenditure. The manager would like to know whether the money spent on advertising is justified in terms of the sales generated: how much extra sales volume will a flat 10 percent increase in advertising expenditure produce? Questions of this type can be answered by correlation and regression. This chapter covers the nitty-gritty of both.

1) What is Correlation?

The manager in today's business environment is very often interested in finding out whether there is any association between two or more variables and, if there is, the strength of that relationship. The strength of relationship is also known as the degree of relationship. The previous chapter provided a conceptual framework for the chi-square distribution, which answers the question of whether the two attributes in a contingency table are associated or not. The degree of relationship between two variables can be elegantly worked out by the correlation coefficient when the variables are intervally scaled.

What is the correlation between demand and price of a product? For all normal commodities we know that when price increases, demand decreases, and when price decreases, demand increases. Economists call this inverse relationship between demand and price the price elasticity of demand. So, logically speaking, the correlation coefficient between demand and price must be negative.

2) Insights into Correlation

Positive Correlation: As the value of one variable increases, the value of the other variable also increases. For example, you normally expect a positive correlation between advertising and sales: as you increase the amount spent on advertising, the sales volume also increases.

Negative Correlation: As the value of one variable increases, the value of the other variable decreases. For example, the correlation between demand and price is negative for all normal commodities. Economists say the price elasticity is negative, meaning the relationship between demand and price is negative.

No Correlation: At times, we may not be able to find any correlation pattern. It may be a case of absence of correlation, and we say that no linear correlation is observed. This is because the correlation coefficient that we apply in practice is based on a linear relationship.

For a sample of n observations on two variables X and Y, the sample correlation coefficient of Karl Pearson is defined as follows:

r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}

This is also known as the product moment correlation. Here r represents the sample correlation coefficient.
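To make the formula concrete, here is a minimal Python sketch; the code and the small data sets in it are illustrative assumptions, not taken from the text. It computes r directly from the deviations, once for a positively related pair of variables and once for a negatively related pair.

```python
# A minimal sketch (illustrative, not from the text): Karl Pearson's product
# moment correlation computed directly from the definition.
import math

def pearson_r(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: advertising spend vs. sales (positive relationship)
# and price vs. demand (negative relationship).
advertising = [2, 4, 6, 8, 10]
sales       = [11, 18, 24, 33, 39]
price       = [5, 6, 7, 8, 9]
demand      = [95, 88, 80, 74, 60]

print(round(pearson_r(advertising, sales), 4))   # close to +1
print(round(pearson_r(price, demand), 4))        # close to -1
```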
Properties of the Correlation Coefficient

The correlation coefficient is a pure number, independent of the unit of measurement and scale. The value of r will not change if X and Y are converted into U and V by a transformation of scale.

The correlation coefficient always lies between -1 and +1. Its three extreme positional values are -1 (perfect negative correlation), 0 (no linear correlation), and +1 (perfect positive correlation).

Example: The following data refer to two variables, promotional expenses (Rs. lakhs) and sales (1000 units), collected in the context of a promotional study. Calculate the correlation coefficient and comment.

Promotional Expenses (Rs. lakhs):   7   10    9    4   11    5    3
Sales (1000 units):                12   14   13    5   15    7    4

Solution: The basic calculations give

\bar{X} = 7, \quad \bar{Y} = 10, \quad \sum (X - \bar{X})(Y - \bar{Y}) = 83, \quad \sum (X - \bar{X})^2 = 58, \quad \sum (Y - \bar{Y})^2 = 124

Hence

r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}} = \frac{83}{\sqrt{(58)(124)}} = 0.9787

Comments: Promotional expense is strongly associated with sales; the correlation is very close to 1.

3) Basics of Regression

Need for Regression: Pearson's correlation coefficient gives you only the degree of relationship or association. It cannot help you estimate or predict the response variable for a given value of the independent variable. The response variable is called the dependent variable. In the example used for the correlation coefficient, promotional expense is the independent variable and sales is the dependent variable: sales depend on promotional expense. Using regression analysis, it is possible to predict sales for a given promotional expense. For business planning and forecasting, regression is much more useful than correlation.

4) Regression Model

Simple Linear Regression Model: In this model, the dependent variable is a linear function of one independent variable. For example, demand may be structured as a linear function of price. Based on sample data collected for the dependent and independent variables, a model is postulated connecting the two in a linear equation. Symbolically, we write the sample regression line as

Y = a + bX

where Y is the dependent variable, X is the independent variable, and a and b are constants determined by the statistical least squares method. b is called the regression coefficient (slope) and a is the constant term (intercept).

Historical Perspective: Just for knowledge's sake, it is worth pointing out that the estimates of a and b obtained by the least squares method are "Best Linear Unbiased Estimates" (BLUE), a result first established by Gauss and Markov in the context of general linear models, which cover multiple linear regression as well.

Values of a and b in the simple linear regression model: The values of a and b are obtained by solving the normal equations given below:

\sum Y = na + b \sum X
\sum XY = a \sum X + b \sum X^2

Here Y is the dependent variable, X is the independent variable, and n is the sample size. Solving these two normal equations, you will find

b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}, \qquad a = \bar{Y} - b\bar{X}
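As a cross-check of the worked correlation example and of the closed-form expressions for b and a above, here is a minimal Python sketch of our own (not part of the text); it reproduces the intermediate sums (83, 58, 124), the correlation r of about 0.9787, and the least-squares estimates b = 83/58 (about 1.431) and a of about -0.017 used in the next example.

```python
# A minimal sketch (not from the text): basic calculations for the
# promotional-expense / sales example and the least-squares estimates.
import math

x = [7, 10, 9, 4, 11, 5, 3]     # promotional expenses (Rs. lakhs)
y = [12, 14, 13, 5, 15, 7, 4]   # sales (1000 units)

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n                            # 7 and 10

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))   # 83
sxx = sum((xi - x_bar) ** 2 for xi in x)                         # 58
syy = sum((yi - y_bar) ** 2 for yi in y)                         # 124

r = sxy / math.sqrt(sxx * syy)   # about 0.9787
b = sxy / sxx                    # slope: 83/58, about 1.4310
a = y_bar - b * x_bar            # intercept: about -0.017

print(sxy, sxx, syy, round(r, 4), round(b, 4), round(a, 3))
```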
Multiple Linear Regression Model: Whenever we are interested in the combined influence of several independent variables upon one dependent variable, our model is that of multiple regression. Demand, for example, may be a function of price, income of the consumer, advertising expense, industrial growth, and competitors' prices. What happens to demand when all these independent variables change is a study in multiple linear regression.

Example: The following data refer to two variables, promotional expenses (Rs. lakhs) and sales (1000 units), collected in the context of a promotional study. Set up the simple linear regression model and predict sales when promotional expense is Rs. 13 lakhs.

Promotional Expenses (Rs. lakhs):   7   10    9    4   11    5    3
Sales (1000 units):                12   14   13    5   15    7    4

You postulate the model in the standard form Y = a + bX, where Y is the dependent variable, X is the independent variable, and a and b are constants. As already worked out from the two normal equations,

b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{83}{58} = 1.4310

a = \bar{Y} - b\bar{X} = 10 - 1.4310 \times 7 = -0.017

So the fitted equation is Y = -0.017 + 1.4310X. This is the line of best fit.

To predict sales when promotional expense is 13, put X = 13 in the fitted equation: Y = -0.017 + 1.4310(13) = 18.59. The estimated sales when promotional expense is Rs. 13 lakhs is therefore 18.59 x 1000 units = 18,590 units.

The Coefficient of Determination for Statistical Validity

R^2 is called the coefficient of determination. It gives the contribution made by the regression in explaining the variation in the dependent variable, and is worked out as the ratio of the regression sum of squares to the total sum of squares. In other words, R^2 measures the percentage of variation in the dependent variable explained by the independent variable. The closer R^2 is to 1, the better the model. To calculate R^2, you need the following terms:

Regression Sum of Squares = \sum (Y_e - \bar{Y})^2
Error Sum of Squares = \sum (Y - Y_e)^2
Total Sum of Squares = \sum (Y - \bar{Y})^2

where Y_e is the estimated value of Y for a given X, obtained from the fitted line of regression. Please note that Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares.

For the example, the basic calculations give:

Total Sum of Squares = 124.00
Regression Sum of Squares = 118.78
Error Sum of Squares = 5.22

R^2 = \frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}} = \frac{118.78}{124} = 0.9579

The interpretation is that 95.79% of the variation in sales is explained by promotional expense and only about 4.21% is accounted for by the error (residual) term. So the fitted model is fairly accurate.

Things to do in a Simple Linear Regression Model
- Postulate the model Y = a + bX.
- Enter the sample data for X and Y in Microsoft Excel.
- Perform the regression analysis and get the summary output from Excel.
- Write the regression equation using the intercept and the coefficient of X from the Excel summary output.
- Predict Y for a given X.
- Validate the model statistically by looking at R^2 as well as the F statistic in the ANOVA, which tests the null hypothesis of no linear relationship.
- After statistical validation, use the model for estimation and prediction.
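The checklist above is framed around Excel; the same steps can also be reproduced programmatically. Here is a minimal Python sketch of our own (not the text's procedure) that refits the line, predicts sales at X = 13, and rebuilds R^2 from the sum-of-squares decomposition, matching the values worked out above (b about 1.431, a about -0.017, predicted Y about 18.59, R^2 about 0.958).

```python
# A minimal sketch (not from the text): fitted line, prediction at X = 13,
# and R-squared via the sum-of-squares decomposition.
x = [7, 10, 9, 4, 11, 5, 3]
y = [12, 14, 13, 5, 15, 7, 4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
    / sum((xi - x_bar) ** 2 for xi in x)                     # 1.4310
a = y_bar - b * x_bar                                        # -0.017

y_hat = [a + b * xi for xi in x]                             # fitted values Ye

reg_ss = sum((ye - y_bar) ** 2 for ye in y_hat)              # 118.78
err_ss = sum((yi - ye) ** 2 for yi, ye in zip(y, y_hat))     #   5.22
tot_ss = sum((yi - y_bar) ** 2 for yi in y)                  # 124.00

r_squared = reg_ss / tot_ss                                  # 0.9579
prediction = a + b * 13                                      # about 18.59

print(round(r_squared, 4), round(prediction, 2))
```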
Multiple Linear Regression Model: Multiple linear regression is a logical extension of simple linear regression. The number of independent variables is more than one, and the same procedure for setting up the model is followed as in simple linear regression. As the number of independent variables increases, Microsoft Excel is the practical way out: doing the calculations with a calculator is not only very tedious but also error prone. If you want to fit a multiple regression model involving 10 independent variables using a calculator, you must be crazy! The best way to understand how multiple regression works in practice is through an example.

Example: Eight patients underwent an operation in a hospital. Measurements of weight (kg), duration of operation (minutes), and blood loss (ml) were taken. The hospital authorities would like to know whether blood loss was related to weight and duration of operation. The data are as follows:

Weight, X1 (kg):                   44   42   70   45   50   51   36   53
Duration of Operation, X2 (min):  108   85   88  114  110  101   97  121
Blood Loss, Y (ml):               505  492  472  506  484  492  515  466

1) The regression equation is Y = 584.4716 - 1.35887 X1 - 0.25783 X2.

2) From the regression statistics, R^2 = 0.6551. This means that 65.51% of the variation in blood loss is explained by weight and duration, and about 34.49% is accounted for by error. The R^2 value suggests that the model is not robust and that more factors will have to be added. Let us see what the ANOVA concludes.

3) In the ANOVA, the calculated F value is 4.75 and the F significance (P value) is 0.0699. Since the P value is greater than the 0.05 level of significance, we accept the null hypothesis of no linear relationship between blood loss and weight and duration. You reach the same conclusion by working out the critical F value, using Excel's paste function or the F table in Appendix G: the critical value of F(2, 5) at 5% is 5.79, and since the calculated F is less than the critical F, we accept the null hypothesis. (A short computational cross-check of these results appears at the end of the chapter.)

Limitations of the Multiple Regression Model

The most crucial assumption is that the independent variables are not correlated with each other. If they are correlated, the regression coefficients cannot be reliably estimated. This problem is called multicollinearity. The procedure followed for resolving multicollinearity is to drop the independent variable that has the highest standard deviation and then rework the model. You may also use the two-stage least squares method, which is part of econometrics. Another way is to transform a set of correlated independent variables into an uncorrelated set by the technique called principal component analysis; this is an advanced technique requiring advanced statistical software such as SPSS.

When there are wild fluctuations in one or more of the independent variables, the multiple regression model crumbles and becomes highly unreliable.

In order to use the multiple regression model for prediction, you first have to predict the values of the independent variables using some other prediction method. In forecasting problems, multiple regression can at best work for the short and medium term only; it cannot be successfully used for long-term forecasting.
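As the computational cross-check referred to in the example above, here is a minimal Python sketch of our own, using numpy's least-squares solver rather than the Excel workflow described in the text; up to rounding, it reproduces the coefficients, R^2, and F statistic reported for the blood-loss example.

```python
# A minimal sketch (not from the text): the blood-loss multiple regression
# refitted with numpy's least-squares solver.
import numpy as np

x1 = np.array([44, 42, 70, 45, 50, 51, 36, 53], dtype=float)          # weight (kg)
x2 = np.array([108, 85, 88, 114, 110, 101, 97, 121], dtype=float)     # duration (min)
y  = np.array([505, 492, 472, 506, 484, 492, 515, 466], dtype=float)  # blood loss (ml)

# Design matrix: a column of ones for the intercept, then X1 and X2.
X = np.column_stack([np.ones_like(x1), x1, x2])

coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef                # about 584.47, -1.3589, -0.2578

y_hat = X @ coef
tot_ss = np.sum((y - y.mean()) ** 2)
err_ss = np.sum((y - y_hat) ** 2)
reg_ss = tot_ss - err_ss

r_squared = reg_ss / tot_ss                       # about 0.6551

k, n = 2, len(y)                                  # 2 predictors, 8 observations
f_stat = (reg_ss / k) / (err_ss / (n - k - 1))    # about 4.75 on (2, 5) df

print(np.round(coef, 4), round(r_squared, 4), round(f_stat, 2))
```

The F statistic computed here can then be compared with the tabulated critical value of 5.79 quoted above, leading to the same conclusion as the Excel ANOVA output.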