ECMT 1020 Lecture 7 Multiple Regression III

Some Twists
We assumed that the relationship between Y and X was linear. In many real contexts this assumption is not valid. Consider the number of errors made by a driver (Y) against the level of alcohol in their blood (X). As the level of alcohol rises, the number of errors is likely to increase disproportionately. Similarly, gains from economies of scale are likely to increase at a decreasing rate as output expands. Both cases suggest non-linear relationships.
Non-linear Relationships
If the true relationship is non-linear, estimating it with a linear regression model will lead to errors and will violate a key regression assumption. The variables need to be transformed before the Least Squares technique discussed earlier can be applied validly. Most transformations take logarithms of the Y variable, the X variable, or both, although other transformations are possible. Consider the scatterplot of errors against alcohol %.

[Figure: Example. Scatterplot of errors (vertical axis, 0 to 350) against alc% (horizontal axis, 0 to 0.16).]
[Figure: Transformation, ln y vs. x. Scatterplot of log error (vertical axis, 0 to 8) against x (horizontal axis, 0 to 0.15).]

Non Linear (in variables) Relationships
Consider the average production costs of a firm with respect to output produced.
[Figure: Ave. Cost - Quantity Relationship. Ave. Cost (\$) against Quantity.]
Model: $Y = e^{\alpha_0 + \alpha_1 x}$
Transformed using $\ln Y = \alpha_0 + \alpha_1 x$
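A minimal sketch of why the log transformation helps for the exponential form on this slide. The parameter values and quantities below are hypothetical, chosen only for illustration, and `ols` is a small helper written for this example rather than a library function. Because the data are generated exactly from the exponential model, regressing ln Y on x should recover the two parameters.

```python
import math

def ols(x, y):
    """Simple least squares fit of y = a0 + a1 * x; returns (a0, a1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1

# Hypothetical parameters (not from the lecture), used to generate data.
TRUE_A0, TRUE_A1 = 3.0, -0.5

quantity = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
avg_cost = [math.exp(TRUE_A0 + TRUE_A1 * q) for q in quantity]

# The transformed model ln(Y) = a0 + a1 * x is linear in x,
# so ordinary least squares can be applied to (x, ln Y).
ln_cost = [math.log(c) for c in avg_cost]
a0_hat, a1_hat = ols(quantity, ln_cost)

print(f"estimated a0 = {a0_hat:.4f}, estimated a1 = {a1_hat:.4f}")
```

With real data the points would not lie exactly on the curve, so the estimates would only approximate the underlying parameters.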
Non Linear (in variables) Relationships
Consider the effect of a promotional campaign on sales.
[Figure: Promotion impact on Sales. Sales against Promotion.]
Transformed using $Y = \alpha_0 + \alpha_1 \ln x$
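One way to read the slope in a model where only x is logged is sketched below. The change notation $\Delta$ is introduced here for illustration and the approximation holds for small proportional changes in x.

```latex
\begin{align*}
Y &= \alpha_0 + \alpha_1 \ln x \\
\frac{dY}{dx} &= \frac{\alpha_1}{x} \\
\Delta Y &\approx \alpha_1 \, \frac{\Delta x}{x}
\end{align*}
```

Under this reading, a 1% increase in x changes Y by roughly $\alpha_1 / 100$ units, and when $\alpha_1 > 0$ each additional unit of x adds less to Y as x grows, which matches the flattening shape expected for promotion and sales.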

Transforming non-linear (in-variables) models
Transform x or y to create a “linear” relationship.
Generally, taking ln (natural log) of x or y or both will work.
Other transformations such as powers ($x^2$, $x^3$), square roots ($y^{0.5}$, $x^{0.5}$) and inverses ($1/y$, $1/x$) are also possible.
Non-linear relationships
Y       X
1       0
10      1
100     2
1000    3
[Figure: Y (vertical axis, 1 to 1000) plotted against X (horizontal axis, 0 to 3).]

Transformation
Y       X    $\log_{10} Y$
1       0    0
10      1    1
100     2    2
1000    3    3
[Figure: $\log_{10} Y$ (vertical axis, 0 to 3) plotted against X (horizontal axis, 0 to 3).]
[Figure: Day after recall. DAR vs Number of Ads. DAR (vertical axis, 0 to 100) against Number of Ads (horizontal axis, 0 to 12).]
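The effect of the base-10 log on the small Y and X table can be checked with a short sketch. Since X is equally spaced, constant first differences indicate a straight-line relationship. The variable names are chosen for this example.

```python
import math

# Values from the Y and X table in the lecture.
X = [0, 1, 2, 3]
Y = [1, 10, 100, 1000]

log10_Y = [math.log10(y) for y in Y]

# First differences between consecutive observations (X steps by 1 each time).
diff_Y = [b - a for a, b in zip(Y, Y[1:])]
diff_log = [b - a for a, b in zip(log10_Y, log10_Y[1:])]

print("Differences in Y:      ", diff_Y)     # grow tenfold each step
print("Differences in log10 Y:", diff_log)   # roughly constant
```

The differences in Y grow with each step, while the differences in the logged values stay essentially constant, which is what a linear relationship between log10 Y and X looks like.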

Regression: Untransformed data

Regression Statistics
Multiple R           0.892
R Square             0.796
Adjusted R Square    0.767
Standard Error       9.913
Observations         9

ANOVA
              df    SS          MS          F         Significance F
Regression    1     2680.287    2680.287    27.273    0.001
Residual      7     687.936     98.277
Total         8     3368.222

              Coefficients    Standard Error    t Stat    P-value
Intercept     36.731          6.251             5.876     0.001
Ads           5.939           1.137             5.222     0.001

[Figure: Residual Plot, untransformed data. Ads Residual Plot: Residuals (vertical axis, -25 to 10) against Ads (horizontal axis, 0 to 12).]
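The main figures in this output can be checked by hand from the nine Ads and DAR observations listed in the lecture's data table. The sketch below uses the textbook simple-regression formulas in plain Python. The variable names are chosen for this example and it is not the spreadsheet routine that produced the output.

```python
# Ads and DAR values as listed in the lecture's data table.
ads = [8, 2, 10, 1, 3, 7, 2, 5, 4]
dar = [83, 47, 87, 23, 63, 82, 55, 74, 66]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(dar) / n

sxx = sum((x - x_bar) ** 2 for x in ads)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, dar))
sst = sum((y - y_bar) ** 2 for y in dar)   # total sum of squares

slope = sxy / sxx
intercept = y_bar - slope * x_bar

ssr = slope * sxy                # regression sum of squares
sse = sst - ssr                  # residual sum of squares
r_squared = ssr / sst
f_stat = (ssr / 1) / (sse / (n - 2))   # 1 regressor, n - 2 residual df

print(f"Intercept = {intercept:.3f}, Ads coefficient = {slope:.3f}")
print(f"SSR = {ssr:.3f}, SSE = {sse:.3f}, SST = {sst:.3f}")
print(f"R Square = {r_squared:.3f}, F = {f_stat:.3f}")
```

The printed values should agree with the Coefficients, ANOVA and R Square entries in the output to three decimal places.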
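The pattern in the residual plot can be reproduced from the nine Ads and DAR observations in the lecture's data table. This is a sketch using the simple-regression formulas in plain Python, with variable names chosen for this example.

```python
# Ads and DAR values as listed in the lecture's data table.
ads = [8, 2, 10, 1, 3, 7, 2, 5, 4]
dar = [83, 47, 87, 23, 63, 82, 55, 74, 66]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(dar) / n

slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, dar))
         / sum((x - x_bar) ** 2 for x in ads))
intercept = y_bar - slope * x_bar

# Residual = observed DAR minus the DAR fitted by the straight line.
residuals = [y - (intercept + slope * x) for x, y in zip(ads, dar)]

# List the residuals in order of Ads to make any pattern visible.
for x, e in sorted(zip(ads, residuals)):
    print(f"Ads = {x:2d}   residual = {e:7.2f}")
```

The residuals at the smallest and largest numbers of ads are negative, while most of those in between are positive. A systematic curve like this, rather than a random scatter around zero, suggests the straight line is the wrong functional form.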

Transforming the data
X      Ln X        Y
Ads    Ln (Ads)    DAR
8      2.08        83
2      0.69        47
10     2.30        87
1      0.00        23
3      1.10        63
7      1.95        82
2      0.69        55
5      1.61        74
4      1.39        66
[Figure: DAR vs ln(Ads). DAR (vertical axis, 0 to 100) against ln(Number of Advertisements) (horizontal axis, 0.00 to 2.50).]
Regression: Transformed data

Regression Statistics
Multiple R           0.980
R Square             0.961
Adjusted R Square    0.955
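The improvement in fit can be checked by regressing DAR on ln(Ads) using the nine observations from the data table. The sketch below uses plain Python with variable names chosen for this example, and computes only the statistics shown above.

```python
import math

# Ads and DAR values as listed in the lecture's data table.
ads = [8, 2, 10, 1, 3, 7, 2, 5, 4]
dar = [83, 47, 87, 23, 63, 82, 55, 74, 66]

# Transform the explanatory variable: x becomes ln(x).
ln_ads = [math.log(x) for x in ads]

n = len(ads)
x_bar = sum(ln_ads) / n
y_bar = sum(dar) / n

sxx = sum((x - x_bar) ** 2 for x in ln_ads)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ln_ads, dar))
sst = sum((y - y_bar) ** 2 for y in dar)

slope = sxy / sxx
r_squared = (slope * sxy) / sst
multiple_r = math.sqrt(r_squared)
# Adjusted R Square with one explanatory variable (n - 2 residual df).
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)

print(f"Multiple R = {multiple_r:.3f}")
print(f"R Square = {r_squared:.3f}")
print(f"Adjusted R Square = {adj_r_squared:.3f}")
```

Compared with the R Square of 0.796 reported for the untransformed regression, the logged model explains noticeably more of the variation in DAR.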

