ass3scanned.pdf - 134 Multiple Linear Regression Model...

Multiple Linear Regression Model

Consider the orthogonal transformation $z = P'y$ and its inverse

$$y = Pz = \sum_{i=1}^{p+1-l} c_i z_i + \sum_{i=p+2-l}^{p+1} c_i z_i + \sum_{i=p+2}^{n} c_i z_i = \hat{\mu}_A + (\hat{\mu} - \hat{\mu}_A) + (y - \hat{\mu}),$$

where $\hat{\mu}_A$ is the projection of $y$ on $L_A(X)$, and $\hat{\mu}$ is the projection of $y$ on $L(X)$; $\hat{\mu} - \hat{\mu}_A$ is in $L(X)$ and perpendicular to $L_A(X)$; $\|y - \hat{\mu}\|^2 = \sum_{i=p+2}^{n} z_i^2 = S(\hat{\beta})$ and $\|\hat{\mu} - \hat{\mu}_A\|^2 = \sum_{i=p+2-l}^{p+1} z_i^2$.

Since $y \sim N(\mu, \sigma^2 I)$, it follows that $z = P'y \sim N(P'\mu, \sigma^2 I)$. Under the null hypothesis $A\beta = 0$, the mean vector

$$P'\mu = \begin{pmatrix} P_1'\mu \\ P_2'\mu \\ P_3'\mu \end{pmatrix} = \begin{pmatrix} P_1'\mu \\ 0 \\ 0 \end{pmatrix}.$$

$P_2'\mu = 0$ since under the null hypothesis $\mu = \mu_A$ is in $L_A(X)$ and the columns of $P_2$ are perpendicular to $L_A(X)$. In addition, $P_3'\mu = 0$ since the columns of $P_3$ are perpendicular to $L(X)$. Hence,

i. $z_1, z_2, \ldots, z_n$ are independent normal random variables with variance $\sigma^2$.
ii. $z_{p+2-l}, \ldots, z_{p+1}$ have zero means under the null hypothesis $A\beta = 0$.
iii. $z_{p+2}, \ldots, z_n$ have zero means under the original model, even if the null hypothesis is false.

Thus,

i. $\|\hat{\mu} - \hat{\mu}_A\|^2 / \sigma^2 = \sum_{i=p+2-l}^{p+1} z_i^2 / \sigma^2$ is the sum of $l$ independent $\chi_1^2$ random variables. It has a $\chi_l^2$ distribution.
ii. $S(\hat{\beta})$ is a function of $z_{p+2}, \ldots, z_n$, whereas $\|\hat{\mu} - \hat{\mu}_A\|^2$ is a function of $z_{p+2-l}, \ldots, z_{p+1}$. Furthermore, $z_1, z_2, \ldots, z_n$ are independent. This shows that $S(\hat{\beta})$ and $\|\hat{\mu} - \hat{\mu}_A\|^2$ are independent.

EXERCISES

4.1. Consider the regression on time, $y_t = \beta_0 + \beta_1 t + \epsilon_t$, with $t = 1, 2, \ldots, n$. Here, the regressor vector is $x' = (1, 2, \ldots, n)$. Take $n = 10$. Write down the matrices $X'X$, $(X'X)^{-1}$, $V(\hat{\beta})$, and the variances of $\hat{\beta}_0$ and $\hat{\beta}_1$.

4.2. For the regression model $y_t = \beta_0 + \epsilon_t$ with $n = 2$ and $y' = (2, 4)$, draw the data in two-dimensional space. Identify the orthogonal projection of $y$ onto $L(X) = L(\mathbf{1})$. Explain geometrically $\hat{\beta}_0$, $\hat{\mu}$, and $e$.

4.3. Consider the regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $i = 1, 2, 3$. With

$$x = \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}, \qquad y = \begin{pmatrix} 2.2 \\ 3.9 \\ 3.1 \end{pmatrix},$$

draw the data in three-dimensional space and identify the orthogonal projection of $y$ onto $L(X) = L(\mathbf{1}, x)$. Explain geometrically $\hat{\beta}$, $\hat{\mu}$, and $e$.
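The orthogonal decomposition used in the proof above can be checked numerically on the data of Exercise 4.3. The following is a minimal sketch, not the book's method: it assumes NumPy is available, and it takes the restricted space to be $L_A(X) = L(\mathbf{1})$ (the hypothesis $\beta_1 = 0$), a choice made here purely for illustration. The variable names (`P`, `z`, `mu_hat_A`, `diff`) are likewise illustrative.

```python
import numpy as np

# Data from Exercise 4.3 (n = 3 cases, intercept plus one regressor).
x = np.array([1.0, 3.0, 2.0])
y = np.array([2.2, 3.9, 3.1])
X = np.column_stack([np.ones_like(x), x])    # columns span L(X) = L(1, x)

# A complete QR factorization yields an orthonormal basis P = [c1, c2, c3]:
# c1 spans L(1), (c1, c2) span L(1, x), and c3 is perpendicular to L(X).
P, _ = np.linalg.qr(X, mode="complete")
z = P.T @ y                                   # orthogonal transformation z = P'y

# Rebuild the three mutually orthogonal pieces of y = Pz.
mu_hat_A = P[:, 0] * z[0]    # projection of y on L_A(X) = L(1) (assumed restriction beta_1 = 0)
diff = P[:, 1] * z[1]        # mu_hat - mu_hat_A
e = P[:, 2] * z[2]           # y - mu_hat, the residual vector
mu_hat = mu_hat_A + diff     # projection of y on L(X)

# Cross-check against ordinary least squares.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print("beta_hat               :", beta_hat)
print("mu_hat                 :", mu_hat)
print("e                      :", e)
print("||mu_hat - mu_hat_A||^2:", z[1] ** 2)
print("S(beta_hat) = ||e||^2  :", z[2] ** 2)
```

With $n = 3$, $p = 1$, and $l = 1$, the sums in the proof each reduce to a single term: $z_2^2$ carries the squared distance between the two projections and $z_3^2$ is the residual sum of squares. The signs of the columns of `P` depend on the QR routine, but the squared coordinates and the reconstructed vectors do not.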
4.4. Consider the regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $i = 1, 2, 3$. With

$$x = \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}, \qquad y = \begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix},$$

draw the data in three-dimensional space and identify the orthogonal projection of $y$ onto $L(X) = L(\mathbf{1}, x)$. Explain geometrically $\hat{\beta}$, $\hat{\mu}$, and $e$.

4.5. After fitting the regression model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$ on 15 cases, it is found that the mean square error $s^2 = 3$ and

$$(X'X)^{-1} = \begin{pmatrix} 0.5 & 0.3 & 0.2 & 0.6 \\ 0.3 & 6.0 & 0.5 & 0.4 \\ 0.2 & 0.5 & 0.2 & 0.7 \\ 0.6 & 0.4 & 0.7 & 3.0 \end{pmatrix}.$$

Find
a. The estimate of $V(\hat{\beta}_1)$.
b. The estimate of $\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_3)$.
c. The estimate of $\mathrm{Corr}(\hat{\beta}_1, \hat{\beta}_3)$.
d. The estimate of $V(\hat{\beta}_1 - \hat{\beta}_3)$.

4.6. When fitting the model $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ to a set of $n = 15$ cases, we obtained the least squares estimates $\hat{\beta}_0 = 10$, $\hat{\beta}_1 = 12$, $\hat{\beta}_2 = 15$, and $s^2 = 2$. It is also known that

$$(X'X)^{-1} = \begin{pmatrix} 1 & 0.25 & 0.25 \\ 0.25 & 0.5 & -0.25 \\ 0.25 & -0.25 & 2 \end{pmatrix}.$$

a. Estimate $V(\hat{\beta}_2)$.
b. Test the hypothesis that $\beta_2 = 0$.
c. Estimate the covariance between $\hat{\beta}_1$ and $\hat{\beta}_2$.
d. Test the hypothesis that $\beta_1 = \beta_2$, using both the t ratio and the 95% confidence interval.
e. The corrected total sum of squares, SST = 120. Construct the ANOVA table and test the hypothesis that $\beta_1 = \beta_2 = 0$. Obtain the percentage of variation in $y$ that is explained by the model.

4.7. Consider a multiple regression model of the price of houses ($y$) on three explanatory variables: taxes paid ($x_1$), number of bathrooms ($x_2$), and square feet ($x_3$). The incomplete (Minitab) output from a regression on $n = 28$ houses is given as follows:

The regression equation is
price = -10.7 + 0.190 taxes + 81.9 baths + 0.101 sqft

Predictor    Coef       SE Coef    t    p
Constant     -10.65     24.02
taxes        0.18966    0.05623
baths        81.87      47.82
sqft         0.10063    0.03125

Analysis of variance
Source            DF    SS        MS    F    p
Regression        3     504541
Residual Error
Total             27    541119

a. Calculate the coefficient of determination $R^2$.
b. Test the null hypothesis that all three regression coefficients are zero ($H_0$: $\beta_1 = \beta_2 = \beta_3 = 0$). Use significance level 0.05.
c. Obtain a 95% confidence interval of the regression coefficient for "taxes." Can you simplify the model by dropping "taxes"?
d. Obtain a 95% confidence interval of the regression coefficient for "baths." Can you simplify the model by dropping "baths"?

4.8. Continuation of Exercise 4.7. The incomplete (Minitab) output from a multiple regression of the price of houses on the two explanatory variables, taxes paid and square feet, is given as follows:

The regression equation is
price = 4.9 + 0.242 taxes + 0.134 sqft

Predictor    Coef       SE Coef    t    p
Constant     4.89       23.08
taxes        0.24237    0.04884
sqft         0.13397    0.02537

Analysis of variance
Source            DF    SS        MS        F    p
Regression        2     500074    250037
Residual Error
Total                   541119

a. Calculate the coefficient of determination $R^2$.
b. Test the null hypothesis that both regression coefficients are zero ($H_0$: $\beta_1 = \beta_2 = 0$). Use significance level 0.05.
c. Test whether you can omit the variable "taxes" from the regression model. Use significance level 0.05.
d. Comment on the fact that the regression coefficients for taxes and square feet are different from those shown in Exercise 4.7.

4.9. Fitting the regression $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$ on $n = 30$ cases leads to the following results:

$$X'X = \begin{pmatrix} 30 & 2{,}108 & 5{,}414 \\ 2{,}108 & 152{,}422 & 376{,}562 \\ 5{,}414 & 376{,}562 & 1{,}015{,}780 \end{pmatrix}, \qquad X'y = \begin{pmatrix} 5{,}263 \\ 346{,}867 \\ 921{,}939 \end{pmatrix}, \qquad y'y = 1{,}148{,}317.$$

a. Use computer software to find $(X'X)^{-1}$. Obtain the least squares estimates and their standard errors.
b. Compute the t statistics to test the simple hypotheses that each regression coefficient is zero.
c. Determine the coefficient of determination $R^2$.
(The complete data are given in the file abrasion.)

4.10.
The following matrices were computed for a certain regression problem:

$$X'X = \begin{pmatrix} 15 & 3{,}626 & 44{,}428 \\ 3{,}626 & 1{,}067{,}614 & 11{,}419{,}181 \\ 44{,}428 & 11{,}419{,}181 & 139{,}063{,}428 \end{pmatrix}, \qquad X'y = \begin{pmatrix} 2{,}259 \\ 647{,}107 \\ 7{,}096{,}619 \end{pmatrix},$$

$$(X'X)^{-1} = \begin{pmatrix} 1.2463484 & 2.1296642 \times 10^{-4} & -4.1567125 \times 10^{-4} \\ 2.1296642 \times 10^{-4} & 7.7329030 \times 10^{-6} & -7.0302518 \times 10^{-7} \\ -4.1567125 \times 10^{-4} & -7.0302518 \times 10^{-7} & 1.9771851 \times 10^{-7} \end{pmatrix},$$

$$\hat{\beta} = \begin{pmatrix} 3.452613 \\ 0.496005 \\ 0.009191 \end{pmatrix}, \qquad y'y = 394{,}107.$$

a. Write down the estimated regression equation. Obtain the standard errors of the regression coefficients.
b. Compute the t statistics to test the simple hypotheses that each regression coefficient is equal to zero. Carry out these tests. State your conclusions.

4.11. A study was conducted to investigate the determinants of survival size of nonprofit U.S. hospitals. Survival size, $y$, was defined to be the largest U.S. hospital (in terms of the number of beds) exhibiting growth in market share. For the investigation, 10 states were selected at random, and the survival size for nonprofit hospitals in each of the selected states was determined for two time periods: 1981-1982 and 1984-1985. Furthermore, the following characteristics were collected on each selected state for each of the two time periods:

$x_1$ = Percentage of beds that are in for-profit hospitals.
$x_2$ = Number of people enrolled in health maintenance organizations as a fraction of the number of people covered by hospital insurance.
$x_3$ = State population in thousands.
$x_4$ = Percentage of state that is urban.

The data are given in the file hospital.
a. Fit the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \epsilon$.
b. The influence of the percentage of beds in for-profit hospitals was of particular interest to the investigators. What does the analysis tell us?
c. What further investigation might you do with this data set? Give reasons.
d. Rather than selecting 10 states at random, how else might you collect the data on survival size? Would your approach be an improvement over the random selection?

4.12. The amount of water used by the production facilities of a plant varies.
Observations on water usage and other, possibly related, variables were collected for 17 months. The data are given in the file water. The explanatory variables are

TEMP = average monthly temperature (°F)
PROD = amount of production
DAYS = number of operating days in the month
PAYR = number of people on the monthly plant payroll
HOUR = number of hours shut down for maintenance

The response variable is USAGE = monthly water usage (gallons/100).
a. Fit the model containing all five independent variables,
$$y = \beta_0 + \beta_1\,\mathrm{TEMP} + \beta_2\,\mathrm{PROD} + \beta_3\,\mathrm{DAYS} + \beta_4\,\mathrm{PAYR} + \beta_5\,\mathrm{HOUR} + \epsilon.$$
Plot residuals against fitted values and residuals against the case index, and comment about model adequacy.
b. Test the hypothesis that $\beta_1 = \beta_3 = \beta_5 = 0$.
c. Which model or set of models would you suggest for predictive purposes? Briefly justify.
d. Which independent variable seems to be the most important one in determining the amount of water used?
e. Write a nontechnical paragraph that summarizes your conclusions about plant water usage that is supported by the data.

4.13. Data on last year's sales ($y$, in 100,000s of dollars) in 15 sales districts are given in the file sales. This file also contains promotional expenditures ($x_1$, in thousands of dollars), the number of active accounts ($x_2$), the number of competing brands ($x_3$), and the district potential ($x_4$, coded) for each of the districts.
a. A model with all four regressors is proposed:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \epsilon, \qquad \epsilon \sim N(0, \sigma^2).$$
Interpret the parameters $\beta_0$, $\beta_1$, and $\beta_4$.
b. Fit the proposed model in (a) and calculate estimates of $\beta_i$, $i = 0, 1, \ldots, 4$, and $\sigma^2$.
c. Test the following hypotheses: (i) $\beta_4 = 0$; (ii) $\beta_3 = \beta_4 = 0$; (iii) $\beta_2 = \beta_3$; (iv) $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$.
d. Consider the reduced (restricted) model with $\beta_4 = 0$. Estimate its coefficients and give an expression for the expected sales.
e. Using the model in (d), obtain a prediction for the sales in a district where $x_1 = 3.0$, $x_2 = 45$, and $x_3 = 10$.
Obtain the corresponding 95% prediction interval.

4.14. The survival rate (in percentage) of bull semen after storage is measured at various combinations of concentrations of three materials (additives) that are thought to increase the chance of survival. The data listed below are given in the file bsemen.

% Survival    % Weight 1    % Weight 2    % Weight 3
(y)           (x1)          (x2)          (x3)
25.5          1.74          5.30          10.80
31.2          6.32          5.42          9.40
25.9          6.22          8.41          7.20
38.4          10.52         4.63          8.50
18.4          1.19          11.60         9.40
26.7          1.22          5.85          9.90
26.4          4.10          6.62          8.00
25.9          6.32          8.72          9.10
32.0          4.08          4.42          8.70
25.2          4.15          7.60          9.20
39.7          10.15         4.83          9.40
35.9          1.72          3.12          7.60
26.5          1.70          5.30          8.20

Assume the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$.
a. Compute $X'X$, $(X'X)^{-1}$, and $X'y$.
b. Plot the response $y$ versus each predictor variable. Comment on these plots.
c. Obtain the least squares estimates of $\beta$ and give the fitted equation.
d. Construct a 90% confidence interval for
   i. the predicted mean value of $y$ when $x_1 = 3$, $x_2 = 8$, and $x_3 = 9$;
   ii. the predicted individual value of $y$ when $x_1 = 3$, $x_2 = 8$, and $x_3 = 9$.
e. Construct the ANOVA table and test for a significant linear relationship between $y$ and the three predictor variables.

4.15. An experiment was conducted to study the toxic action of a certain chemical on silkworm larvae. The relationship of $\log_{10}$(survival time) to $\log_{10}$(dose) and $\log_{10}$(larvae weight) was investigated. The data, obtained by feeding each larva a precisely measured dose of the chemical in an aqueous solution and recording the survival time until death, are given in the following table. The data are stored in the file silkw.
log10 Survival Time    log10 Dose    log10 Weight
(y)                    (x1)          (x2)
2.836                  0.150         0.425
2.966                  0.214         0.439
2.687                  0.487         0.301
2.679                  0.509         0.325
2.827                  0.570         0.371
2.442                  0.590         0.093
2.421                  0.640         0.140
2.602                  0.781         0.406
2.556                  0.739         0.364
2.441                  0.832         0.156
2.420                  0.865         0.247
2.439                  0.904         0.278
2.385                  0.942         0.141
2.452                  1.090         0.289
2.351                  1.194         0.193

Assume the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$.
a. Plot the response $y$ versus each predictor variable. Comment on these plots.
b. Obtain the least squares estimates for $\beta$ and give the fitted equation.
c. Construct the ANOVA table and test for a significant linear relationship between $y$ and the two predictor variables.
d. Which independent variable do you consider to be the better predictor of log(survival time)? What are your reasons? Of the models involving one or both of the independent variables, which do you prefer, and why?

4.16. You are given the following matrices computed for a regression analysis:

$$X'X = \begin{pmatrix} 9 & 136 & 269 & 260 \\ 136 & 2{,}114 & 4{,}176 & 3{,}583 \\ 269 & 4{,}176 & 8{,}257 & 7{,}104 \\ 260 & 3{,}583 & 7{,}104 & 12{,}276 \end{pmatrix}, \qquad X'y = \begin{pmatrix} 45 \\ 648 \\ 1{,}283 \\ 1{,}821 \end{pmatrix},$$

$$(X'X)^{-1} = \begin{pmatrix} 9.610 & 0.008 & -0.279 & -0.044 \\ 0.008 & 0.509 & -0.258 & 0.001 \\ -0.279 & -0.258 & 0.139 & 0.001 \\ -0.044 & 0.001 & 0.001 & 0.0003 \end{pmatrix}$$

EXERCISES

5.1. Consider the following regression model:

Salary (in $1,000) = 20 + 2x + 5z + 0.7xz

where $x$ is the number of years of experience, and $z$ is an indicator variable that is 1 if you have obtained an MBA degree and 0 otherwise; $xz$ is the product between years of experience and the indicator variable $z$. Graph salary ($y$) against years of experience ($x$). Do this for both groups (without MBA and with MBA) on the same graph, and comment on the degree of interaction.

5.2. You are interested in the starting salaries of accounting, management information systems, and economics majors.
You consider a model that factors in the GPA of students, obtaining the following regression model:

Salary (in $1,000) = -15 + (18)GPA + (3)IND_acc + (2.1)IND_mis

IND_acc is an indicator variable that is 1 if the student is in accounting and 0 otherwise. IND_mis is an indicator variable that is 1 if the student is an MIS student and 0 otherwise.
a. Calculate the expected salary difference between an accounting and an economics student with the same GPA.
b. Calculate the expected salary difference between an accounting and an MIS student with the same GPA.

5.3. The data are taken from Mazess, R. B., Peppler, W. W., and Gibbons, M. Total body composition by dual-photon (153Gd) absorptiometry. American Journal of Clinical Nutrition, 40, 834-839, 1983. The data are given in the file bodyfat.
A new method of measuring the body fat percentage is investigated. The body fat, age (between 23 and 61 years), and gender (4 males and 14 females) of 18 normal adults are listed below. Graph body fat against age and gender (you may want to overlay these two on the same graph). Consider a regression model with age and gender as the explanatory variables. Interpret the results, and discuss the effects of age and gender. Is it useful to include an interaction term for age and gender?

y = % Fat    x1 = Age    x2 = Gender
9.5          23          1
27.9         23          0
7.8          27          1
17.8         27          1
31.4         39          0
25.9         41          0
27.4         45          1
25.2         49          0
31.1         50          0
34.7         53          0
42.0         53          0
29.1         54          0
32.5         56          0
30.3         57          0
33.0         58          0
33.8         58          0
41.1         60          0
34.5         61          0

5.4. You are regressing fuel efficiency ($y$) on three predictor variables, $x_1$, $x_2$, and $x_3$, and you obtain the following fitted regression model:
$$\hat{\mu} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3$$
The coefficient of determination for this regression model is $R^2 = 90\%$.
A regression of $x_1$ on $x_2$, $x_3$ gives you an $R^2$ of 60%;
a regression of $x_2$ on $x_1$, $x_3$ gives you an $R^2$ of 80%; and
a regression of $x_3$ on $x_1$, $x_2$ gives you an $R^2$ of 90%.
Calculate and interpret the variance inflation factors for the regression coefficients $\beta_1$, $\beta_2$, and $\beta_3$.

Specification Issues in Regression Models

5.5. Which one of the following statements suggests the presence of a multicollinearity problem:
a. High $R^2$ and high t ratios
b. High correlation between explanatory variables and dependent variable
c. Low pairwise correlation among independent variables
d. Low $R^2$ and low t ratios
e. High $R^2$ and mostly insignificant t ratios

5.6. The data are taken from Latter, H. O.: The cuckoo's egg. Biometrika, 1, 164-176, 1901. The data are given in the file cuckoo.
The female cuckoo lays her eggs into the nest of foster parents. The foster parents are usually deceived, probably because of the similarity in the sizes of the eggs. Latter investigated this possible explanation and measured the lengths of cuckoo eggs (in millimeters) that were found in the nests of the following three species:

Hedge Sparrow:
22.0  23.9  20.9  23.8  25.0  24.0
21.7  23.8  22.8  23.1  23.1  23.5
23.0  23.0

Robin:
21.8  23.0  23.3  22.4  23.0  23.0
23.0  22.4  23.9  22.3  22.0  22.6
22.0  22.1  21.1  23.0

Wren:
19.8  22.1  21.5  20.9  22.0  21.0
22.3  21.0  20.3  20.9  22.0  20.0
20.8  21.2  21.0

Obtain the analysis of variance table and test whether or not the mean lengths of the eggs found in the nests of the three species are different. Display the data graphically, and interpret the results.

5.7. Percentage yields from a chemical reaction for changing temperature (factor 1), reaction time (factor 2), and concentration of a certain ingredient (factor 3) are as follows:

Factor 1:    Factor 2:    Factor 3:    Average y from
x1           x2           x3           5 Experiments
-1           -1           -1           79.7
 1           -1           -1           74.3
-1            1           -1           76.7
 1            1           -1           70.0
-1           -1            1           (illegible)
 1           -1            1           (illegible)
-1            1            1           87.3
 1            1            1           73.7

Each listed yield is actually the average of five individual independent experiments. The variance of individual measurements can be estimated from the five replications in each cell. It is found that

$$s^2 = \frac{\sum_{i=1}^{8} \sum_{j=1}^{5} (y_{ij} - \bar{y}_i)^2}{8(5 - 1)} = 40.0.$$

a. Estimate the effects of factors 1-3. That is, estimate the coefficients in the regression model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$. Calculate the standard errors of the coefficients and interpret the results. Comment on the nature of the design matrix.
b. Is it possible to learn something about interactions? Consider the interaction effect between factors 1 and 2. Write out the $X$ matrix of the regression model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \epsilon$. Estimate the model and comment on this issue.

5.8. In a study on the effect of coffee consumption on blood pressure, 30 patients are selected at random from among the patients of a medical practice. A questionnaire is administered to each patient to get the following information:

$x_1$ = Average number of cups of coffee consumed/day ...