This preview has intentionally blurred sections. Sign up to view the full version.
**Unformatted text preview:**

Regression coefficients

| | Estimate | Standard Error | t | Probability |
|---|---|---|---|---|
| Intercept | 10.45 | 3.193 | 3.27278422... | 0.00205023... |
| Population of city | 1.999 | 0.0600 | 33.31666667... | < 0.001 |
| Size of store | 0.258 | 0.3450 | 0.74782609... | 0.45845540... |
| Amount spent on promotion | 0.250 | 0.3262 | 0.76640098... | 0.4474392... |
| Distance to city center | 1.609 | 0.2346 | 6.85848252... | < 0.001 |

a) At a level of significance of 0.05, the result of the F test for this model is that the null hypothesis is rejected.

b) Calculate the 95% confidence interval for the slope ($\beta_1$) of the variable population of city. You may find this useful. Give your answers to 3 decimal places.

c) Suppose you are going to construct a new model by removing the most insignificant variable. You would first remove:
- population of city
- size of store
- amount spent on promotion
- distance to city center

Feedback [2 out of 4]

a) You are correct.
b) This is not correct.
$1.878 < \beta_1 < 2.120$
c) You are correct.

Discussion

a) Since the P-value of the F test statistic is less than 0.05, the F test null hypothesis that all the model coefficients are equal to zero is rejected and you conclude that at least one of the parameters is non-zero.

b) A 95% confidence interval for the slope $\beta_1$ is given by the formula:

$$b_1 \pm t^* \times SE_{b_1}$$

where
n = sample size = 50
p = number of explanatory variables = 4
$b_1$ = estimate of the slope = 1.999
$t^*$ = two-tailed 95% critical value in the t distribution with n - p - 1 (= 45) degrees of freedom = 2.0141
$SE_{b_1}$ = standard error in $b_1$ = 0.0600
$\beta_1$ = population slope = unknown

Therefore the confidence interval is given by:

$$1.999 \pm 2.0141 \times 0.0600$$

Which, when rounded to 3 decimal places, can be expressed as:

$$1.878 < \beta_1 < 2.120$$

c) In refining a multiple regression model, you may want to remove any explanatory variables that do not significantly add to the predictive power of the model.
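As a quick check on part b), the interval arithmetic can be reproduced in a few lines of Python. This is only a sketch: the helper name `slope_confidence_interval` is hypothetical, and the critical value 2.0141 is copied from the discussion rather than computed from the t distribution.

```python
def slope_confidence_interval(estimate, std_error, t_critical):
    """Return (lower, upper) for estimate +/- t_critical * std_error."""
    margin = t_critical * std_error
    return estimate - margin, estimate + margin


# Values quoted in part b): b1 = 1.999, SE = 0.0600, t* = 2.0141 (45 d.f.)
lower, upper = slope_confidence_interval(1.999, 0.0600, 2.0141)
print(f"{lower:.3f} < beta_1 < {upper:.3f}")  # prints: 1.878 < beta_1 < 2.120
```

The margin is 2.0141 × 0.0600 = 0.120846, so the unrounded limits are 1.878154 and 2.119846.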
Variables are removed on the basis of the significance of each coefficient estimate: any that are not statistically significant are candidates for removal. Following this procedure with this model, you want to identify the most insignificant of the insignificant variables. That means the variable whose regression coefficient has the highest P-value. From the regression coefficients output, you can see that this is the variable size of store, whose P-value is the largest in the table.

1 of 1. ID: MSTMKCMJLDDBD

You are part of a team investigating motor vehicle accidents. A multiple regression model is to be constructed to predict the number of motor vehicle
accidents in a town per year based upon the population of the town, the number of recorded traffic offences per year and the average annual temperature in the town. Data has been collected on 30 randomly selected towns:

[Data table not reproduced]

a) Find the multiple regression equation using all three explanatory variables. Assume that $x_1$ is population, $x_2$ is number of recorded traffic offences per year and $x_3$ is average annual temperature. Give your answers to 3 decimal places.
$\hat{y}$ = 1 + 2 population + 3 no. traffic offences + 4 average temp

b) At a level of significance of 0.05, the result of the F test for this model is that the null hypothesis is rejected.

c) The explanatory variable that is most correlated with number of motor vehicle accidents per year is:
- population
- number of traffic offences
- average annual temperature

d) The explanatory variable that is least correlated with number of motor vehicle accidents per year is:
- population
- number of traffic offences
- average annual temperature

e) The value of $R^2$ for this model, to 2 decimal places, is equal to 1

f) The value of s for this model, to 3 decimal places, is equal to 2

g) Construct a new multiple regression model by removing the variable average annual temperature. Give your answers to 3 decimal places. The new regression model equation is:
$\hat{y}$ = 3 + 3 population + 3 no. traffic offences

h) In the new model compared to the previous one, the value of $R^2$ (to 2 decimal places) is:
- increased
- decreased
- unchanged

i) In the new model compared to the previous one, the value of s (to 3 decimal places) is:
- increased
- decreased
- unchanged

Feedback [3 out of 14]

a) This is not correct.
$\hat{y} = 11.617 + 1.789 \times \text{population} + 3.473 \times \text{no. traffic offences} + 0.093 \times \text{average temp}$
b) You are correct.
c) This is not correct.
The explanatory variable that is most correlated with number of motor vehicle accidents per year is population.
d) You are correct.
e) You are correct.
f) This is not correct.
The value of s for this model is equal to 25.076.
g) This is not correct.
$\hat{y} = 19.081 + 1.789 \times \text{population} + 3.472 \times \text{no. traffic offences}$
h) This is not correct. In the new model compared to the previous one, the value of $R^2$ is unchanged.
i) This is not correct. In the new model compared to the previous one, the value of s is decreased.

Discussion

Entering the data into a suitable software package, you should obtain the following results:

Regression analysis

| Statistic | Value |
|---|---|
| $R^2$ | 0.99493951... |
| s | 25.07565503... |

| | DF | SS | MS | F | Probability |
|---|---|---|---|---|---|
| Regression | 3 | 3,214,268.86628541... | 1,071,422.95542847... | 1,703.94813939... | < 0.001 |
| Residual | 26 | 16,348.50040161... | 628.78847699... | | |
| Total | 29 | 3,230,617.36668702... | | | |

Regression coefficients

| | Estimate | Standard Error | t | Probability |
|---|---|---|---|---|
| Intercept | 11.6165899... | 12931458362... | 0.38983202... | 0.32910916... |
| Population | 1.78873764... | 0.03112565... | 57.46827765... | < 0.001 |
| No. of recorded traffic offences | 3.4728972... | 0.08356807... | 41.55770554... | < 0.001 |
| Average annual temperature | 0.09323003... | 1.60757764... | 0.0579941... | 0.95419575... |

a) Therefore, the regression equation with all three variables is:

$$
\begin{aligned}
\hat{y} &= b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 \\
&= 11.6165899\ldots + 1.78873764\ldots \times x_1 + 3.4728972\ldots \times x_2 + 0.09323003\ldots \times x_3 \\
&\approx 11.617 + 1.789 \times x_1 + 3.473 \times x_2 + 0.093 \times x_3 \quad \text{(rounded to 3 decimal places)}
\end{aligned}
$$

b) Since the P-value of the F test statistic is less than 0.05, the F test null hypothesis that all the model coefficients are equal to zero is rejected and you conclude that at least one of the parameters is non-zero.

The correlations of each explanatory variable with the response variable are:

| Explanatory variable | Correlation with no. of motor vehicle accidents/year |
|---|---|
| Population | 0.80265297... |
| No. of traffic offences | 0.59266568... |
| Average temp | -0.09224906... |

c) Therefore, the variable that has the highest correlation with number of motor vehicle accidents per year is population.

d) The variable that has the lowest correlation with number of motor vehicle accidents per year is average annual temperature.
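The summary statistics quoted in this discussion follow from the ANOVA sums of squares through standard identities: $R^2 = SS_{reg}/SS_{total}$, $s = \sqrt{SS_{res}/(n-p-1)}$ and $F = MS_{reg}/MS_{res}$. The Python sketch below illustrates this under stated assumptions: the function name `anova_summary` is hypothetical, and the sums of squares are the ANOVA figures as best read from the output above (regression about 3,214,268.866, residual about 16,348.500), so treat those inputs as assumptions rather than exact data.

```python
import math


def anova_summary(ss_regression, ss_residual, n, p):
    """Return R^2, s and F for a model with p explanatory variables and n observations."""
    ss_total = ss_regression + ss_residual
    df_residual = n - p - 1                    # residual degrees of freedom
    ms_regression = ss_regression / p          # regression mean square
    ms_residual = ss_residual / df_residual    # residual mean square
    return {
        "r_squared": ss_regression / ss_total,
        "s": math.sqrt(ms_residual),           # standard error of the estimate
        "f": ms_regression / ms_residual,
    }


# Assumed readings of the ANOVA table for the three-variable model (n = 30, p = 3)
full = anova_summary(3_214_268.866, 16_348.500, n=30, p=3)
print(round(full["r_squared"], 2), round(full["s"], 3))  # prints: 0.99 25.076
```

With these inputs the F statistic comes out at roughly 1,703.95 on 3 and 26 degrees of freedom, in line with the table.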
e) From the regression model analysis, the value of $R^2$ for the model (to 2 decimal places) is equal to 0.99.

f) From the regression model analysis, the value of s for the model (to 3 decimal places) is equal to 25.076.

Performing a new regression analysis with the variable average annual temperature removed, you should have the following results:

Regression analysis

| Statistic | Value |
|---|---|
| $R^2$ | 0.99493886... |
| s | 24.6085014... |

| | DF | SS | MS | F | Probability |
|---|---|---|---|---|---|
| Regression | 2 | 3,214,266.7514506... | 1,607,133.3757253... | 2,653.8813614... | < 0.001 |
| Residual | 27 | 16,350.61521606... | 605.57834134... | | |
| Total | 29 | 3,230,617.36666667... | | | |

Regression coefficients

| | Estimate |
|---|---|
| Intercept | 19.08069091... |
| Population | 1.78880285... |
| No. of recorded traffic offences | 3.47190367... |

g) Therefore, the regression equation with the two variables population and number of recorded traffic offences is:

$$
\begin{aligned}
\hat{y} &= b_0 + b_1 x_1 + b_2 x_2 \\
&= 19.08069091\ldots + 1.78880285\ldots \times x_1 + 3.47190367\ldots \times x_2 \\
&\approx 19.081 + 1.789 \times x_1 + 3.472 \times x_2 \quad \text{(rounded to 3 decimal places)}
\end{aligned}
$$

From the new regression analysis, you can see that compared to the previous model:

h) To 2 decimal places, the value of $R^2$ is unchanged.
i) To 3 decimal places, the value of s is decreased.

1 of 3. ID: usr.MR.-rn.oz.oozo

A company that manufactures paper develops a regression model in order to predict the number of sales it will make in a city (y) in terms of three variables: the population of that city ($x_1$), the number of companies in that city ($x_2$) and the amount of competition in the city ($x_3$). A sample of 60 cities is collected. For each city, the population of the city, the number of companies in the city and the amount of competition (in terms of net worth of competing companies) in the city are recorded. Also, the number of sales is recorded. The following regression equation was calculated:

$\hat{y} = 78.62 + 10.83x_1 + \text{asso}\,x_2 - 15.92x_3$

Along with this, the following values were calculated:
SSM = 922.83
SSE = 201.55

An overall F test is to be conducted in order to assess the significance of this model. So the hypotheses are:
$H_0$: All regression coefficients are zero
$H_a$: Not all regression coefficients are zero

Give your answer to part a) to 4 decimal places. Give your answers to part b) as whole numbers.

a) Calculate the test statistic (F) for this test. F = 0.218

b) This test statistic follows the F distribution with 5? degrees of freedom in the numerator and 1 degrees of freedom in the denominator.

Feedback [0 out of 3]

a) This is not correct.
F = 85.4684
b) This is not correct.
This test statistic follows the F distribution with 3 degrees of freedom in the numerator and 56 degrees of freedom in the denominator.

a) In order to calculate the test statistic, you must first calculate the regression mean square (MSR) and the error mean square (MSE). These values are calculated in terms of the regression sum of squares (SSM) and error sum of squares (SSE) respectively. Let p denote the number of independent variables in the model and n denote the sample size used to develop the regression equation. So in this question, p = 3 and n = 60. Then:

$$MSR = \frac{SSM}{p} = \frac{922.83}{3} = 307.61$$

and

$$MSE = \frac{SSE}{n - p - 1} = \frac{201.55}{60 - 3 - 1} = 3.59910714\ldots$$

The test statistic can be calculated using the following formula:

$$F = \frac{MSR}{MSE}$$

where
MSR = regression mean square = 307.61
MSE = error mean square = 3.59910714...
F = test statistic = unknown

$$F = \frac{307.61}{3.59910714\ldots} = 85.46841975\ldots \approx 85.4684 \quad \text{(rounded as last step)}$$

b) This test statistic follows an F distribution. The numbers of degrees of freedom in the numerator and denominator of this distribution are calculated in terms of the number of independent variables in the model (p) and the size of the sample that is used (n). The number of degrees of freedom in the numerator is p = 3. The number of degrees of freedom in the denominator is n - p - 1 = 60 - 3 - 1 = 56.

ID: M57.MR.TM.oz.ooao

Muriel has constructed a multiple regression model and has conducted an F test of the model at a level of significance of 0.05. The model has four independent
variables. The result of the F test was that the null hypothesis was rejected. Select the appropriate conclusion that can be drawn:
- the dependent variable is not related to any of the independent variables
- exactly one of the regression coefficients is non-zero
- all of the regression coefficients are zero
- at least one of the regression coefficients is non-zero
- all of the regression coefficients are non-zero

Feedback [1 out of 1]

You are correct.

Discussion

The significance of a multiple regression model can be tested by using an F test. The F test for a multiple regression model has the following null and alternate hypotheses:
$H_0$: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
$H_a$: at least one $\beta_i$ is not equal to zero

Therefore, if the null hypothesis is rejected, you conclude that at least one of the regression coefficients is non-zero, that is, that the model is significant.

2 of 3. ID: nsr.nn.m.cs.dozob

The following four diagrams depict four residual plots for four different regression models. Select the residual plot that suggests that the assumption of independence
of error terms is violated:

[Four residual plots against time, not reproduced]

Feedback [1 out of 1]

You are correct.

Discussion

Most of the assumptions in multiple regression are assumptions about the error terms. A multiple regression model with two independent variables can be written as:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

The $\varepsilon$ term represents a random variable (the error term), and there are several assumptions about that variable. It is assumed to be normally distributed with a mean of 0 and a constant variance. (In this way, for fixed values of $x_1$ and $x_2$, y is a random variable following the normal distribution with mean $\beta_0 + \beta_1 x_1 + \beta_2 x_2$ and a variance that does not depend on the values of $x_1$ and $x_2$.) Also, the error terms are assumed to be independent. That is, there should be no correlation between different error terms. (This means that the value assumed by y for any given values of $x_1$ and $x_2$ is independent of the value assumed by y for other given values of $x_1$ and $x_2$.)

The assumption that the error terms are independent is tested by plotting the residuals in the order in which they were gathered. In this way, you are testing whether there is any correlation between the data points that are recorded close to one another. It may turn out that an event half-way through the data collection process affected the variables being studied. For example, consider a regression model that is developed to calculate support for the government in terms of the age and the annual salary of a citizen. A sample of 50 people is collected to develop a regression equation. Now it might turn out that, at some point during the data collection, a political event occurs that lowers overall support for the government. The data is no longer independent: larger values for the dependent variable will tend to correlate with other larger values for the dependent variable (and similarly, smaller values for the dependent variable will tend to correlate with other smaller values of the dependent variable).

In the residual plot:

[Residual plot against time, not reproduced]

The residuals seem to be positive for the data points collected early on in the study while they tend to be negative for the data points collected later on in the study. There is therefore discernible correlation between the error terms, and they are not independent.

1 of 3. ID: Ms-r.Mn.‘rM.oa.uom

Consider the multiple regression model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

A sample is drawn and a prediction equation is calculated, as are the residuals. Select the method that is most commonly used to test each assumption of the model:

Methods:
- Plot residuals against values of independent variables
- Plot residuals against time
- Plot residuals in histogram
- Plot residuals against predicted values for y

Assumptions:
a) y has a linear relationship with each of $x_1$ and $x_2$.
b) The error terms are independent of one another.
c) The random variable $\varepsilon$ follows a normal distribution.
d) The variance of $\varepsilon$ is constant.

Feedback [0 out of 4]

a) This is not correct.
To test this assumption you plot the residuals against values of the independent variables.
b) This is not correct.
To test this assumption you plot the residuals against time.
c) This is not correct.
To test this assumption you plot the residuals in a histogram.
d) This is not correct.
To test this assumption you plot the residuals against predicted values for y.

Discussion

Assumptions about multiple regression models

In general terms, the assumptions about multiple regression models are about the nature of the dependent variable, y. In particular, if there are two
independent variables, the assumption is that for given values of $x_1$ and $x_2$, y is a normally distributed random variable with expected value $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ and a variance that does not depend upon the values of the independent variables. Also, the value assumed by y at one given set of values of the independent variables is independent of the value assumed by y at another set.

Most of these assumptions can be restated in terms of the error random variable, $\varepsilon$. That is why the tests for the validity of the assumptions are all related to the residuals in the sample.

Linearity

It is assumed that the dependent variable y varies linearly with each of the independent variables. The most common way of testing this is to plot the residuals against the values of each independent variable in the sample. The residuals should be centered about 0 with no overall pattern. Any pattern in the residuals would suggest a non-linear relationship. For example, suppose in plotting the residuals against the variable $x_1$, you find that the residuals tend to be positive for low and high values of $x_1$ but negative for middle values. This would suggest that y possibly has a quadratic relationship with $x_1$.

Independence

It is assumed that the error terms are independent. The main way that this assumption would be violated is through the existence of some relationship between consecutive measurements in the sample. For this reason, to test independence of the error terms you plot the residuals in the order they were gathered. In other words, you plot residuals against time. Through this you can detect the presence of any autocorrelation in the data. Positive autocorrelation occurs when consecutive error terms have the same sign (positive or negative) more often than would be expected. Negative autocorrelation occurs when consecutive error terms tend to switch signs more often than would be expected.

Normality

It is assumed that $\varepsilon$ follows a normal distribution. To test this assumption, you treat the residuals as sample data points from this random variable and put them into a frequency distribution. There are then several options available to you. You can do a goodness-of-fit test, a normal probability plot, or a simple histogram in order to assess whether $\varepsilon$ is normally distributed.

Equality of variance

There is an assumption that the variance of y at different levels of the independent variables does not depend on the values those variables take. This assumption can be restated as: the variance of $\varepsilon$ is constant. In simple linear regression, this assumption is tested by plotting the residuals against the values of the independent variable. However, in multiple regression it is most convenient to plot the residuals against the predicted values $\hat{y}$. The residuals should be centered about 0 with no overall pattern. If, for example, there are extreme (positive or negative) values for the residuals for small values of $\hat{y}$ and low values for the residuals for large values of $\hat{y}$, this would suggest that $\varepsilon$ does not have constant variance.

3 of 3. ID: MST.MR.m.os.co1o

A multiple regression model has been developed in order to predict the number of work injuries that occur at a mechanic workshop in a month (y) based on the
amount of money spent on maintaining government-standard safe machinery and equipment ($x_1$) and the number of customers that the workshop gets a month ($x_2$).

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

A sample of ?5 workshops is...

- Fall '13
- ChristaLSorola