
5.6 Hypothesis testing in MLR

The null hypothesis for testing whether a covariate is related to the response is $H_0: \beta_j = 0$, with test statistic

$$t = \frac{\hat{\beta}_j - 0}{SE(\hat{\beta}_j)} \sim t_{n-(p+1)} \quad \text{under } H_0.$$

Example 5.2. For $\beta_1$ (size), we have

$$t = \frac{31.26}{21.47} = 1.46.$$

With $n = 18$ observations and $p = 3$ explanatory variates, the test has $n - (p+1) = 14$ degrees of freedom. From the t-distribution table, this corresponds to a two-sided p-value between $0.1$ and $0.2$ ($0.1625$ to be exact). We therefore do not reject $H_0$ (p-value $> 0.05$), i.e. there is no significant relationship between overhead and size, after accounting for the other variates.
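Rather than reading the p-value off a table, it can also be computed directly from the t-distribution. Below is a minimal sketch in Python (scipy assumed available); the estimate $31.26$ and standard error $21.47$ come from Example 5.2, and the degrees of freedom assume the $p = 3$ explanatory variates (size, employees, clients) discussed in the next section.

```python
from scipy import stats

# Values from Example 5.2: beta_hat_1 (size) and its standard error.
beta_hat = 31.26
se_beta = 21.47

n = 18             # observations
p = 3              # explanatory variates (assumed: size, employees, clients)
df = n - (p + 1)   # residual degrees of freedom = 14

t_stat = beta_hat / se_beta
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-sided: P(|T| > |t|)

print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.4f}")
# t is about 1.456 and the p-value about 0.17 -- in the (0.1, 0.2) range the
# notes quote; the exact 0.1625 presumably uses unrounded estimates.
```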
6 January 25, 2018

6.1 Scatter plot matrix

For a given set of explanatory variates and a response variate, we can plot a matrix of 2D scatter plots of each variate against all the other variates.

Figure 6.1: Size, employees, and clients are all correlated with overhead.

Note, however, that size, employees, and clients are all correlated with each other; it would therefore probably suffice to include only one of these explanatory variates without losing much information in our model. From this matrix, we can visually see which explanatory variates are correlated with the response, but also which explanatory variates are correlated with each other.

6.2 Multicollinearity

When strong (linear) relationships are present among two or more explanatory variates, we say the variates exhibit multicollinearity. Intuitively, multicollinearity means some explanatory variates are (nearly) linearly dependent on others, so we would not need all of the redundant variates in the model, since they contribute little additional information or explained variance. In fact, multicollinearity is detrimental: it leads to inflated variances of the associated parameter estimates ($(X^TX)^{-1}$ has inflated diagonal entries, so $SE(\hat{\beta}_j) = \hat{\sigma}\sqrt{(X^TX)^{-1}_{jj}}$ is inflated), resulting in inaccurate conclusions from hypothesis tests and confidence intervals (both of which depend on $SE(\hat{\beta}_j)$). Intuitively, our estimate $\hat{\beta}_j$ of the impact of a one-unit change in $x_j$ while controlling for the others tends to be less precise, since a correlated $x_k$ tends to change along with $x_j$.

6.3 Variance inflation factor (VIF)

To assess whether a variate $x_j$ is a problem in terms of multicollinearity, we can regress $x_j$ onto all the other explanatory variates, obtaining a coefficient of determination $R_j^2$. We can then calculate the variance inflation factor for $x_j$:

$$VIF_j = \frac{1}{1 - R_j^2}$$
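As a sketch of this computation (the firm data is not reproduced in these notes, so the arrays below are hypothetical placeholders): regress $x_j$ on the remaining explanatory variates, compute $R_j^2$, and apply the formula.

```python
import numpy as np

def vif(X, j):
    """VIF for column j of X, the matrix of explanatory variates
    (one column per variate, no intercept column)."""
    n = X.shape[0]
    y = X[:, j]                                # treat x_j as the response
    others = np.delete(X, j, axis=1)           # the remaining variates
    Z = np.column_stack([np.ones(n), others])  # add an intercept
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    R2_j = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1 / (1 - R2_j)

# Hypothetical placeholder data: 18 rows, three explanatory variates,
# with the third built to correlate with the first two.
rng = np.random.default_rng(331)
X = rng.normal(size=(18, 3))
X[:, 2] += 0.9 * X[:, 0] + 0.5 * X[:, 1]
print([round(vif(X, j), 2) for j in range(X.shape[1])])
```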
The $VIF_j$ can be interpreted as the factor by which the variance of $\hat{\beta}_j$ is increased relative to the ideal case in which all explanatory variates are uncorrelated (i.e. the columns of $X$ are orthogonal).

Example 6.1. Suppose we do this for $x_j = x_3$: we regress the number of employees on all the other explanatory variates (see the scatter plot matrix above).
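This interpretation can be checked numerically. For OLS with an intercept, the identity $(X^TX)^{-1}_{jj} = VIF_j / SST_j$ holds, where $SST_j = \sum_i (x_{ij} - \bar{x}_j)^2$, so $Var(\hat{\beta}_j) = \sigma^2 \, VIF_j / SST_j$ is exactly $VIF_j$ times what it would be if $x_j$ were uncorrelated with the other columns. The sketch below verifies this on simulated stand-in data (again, the actual firm data is not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(331)
n = 18

# Simulated stand-ins for three explanatory variates; x3 is built to be
# correlated with x1 and x2.
x1 = rng.normal(50, 10, n)
x2 = rng.normal(20, 5, n)
x3 = 0.8 * x1 + 0.5 * x2 + rng.normal(0, 3, n)

X = np.column_stack([np.ones(n), x1, x2, x3])  # design matrix with intercept

# VIF_3: regress x3 on the other explanatory variates (plus intercept).
Z = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(Z, x3, rcond=None)
resid = x3 - Z @ beta
R2_3 = 1 - resid @ resid / np.sum((x3 - x3.mean()) ** 2)
vif_3 = 1 / (1 - R2_3)

# Identity check: (X^T X)^{-1}_{33} should equal VIF_3 / SST_3, so
# Var(beta_hat_3) = sigma^2 * (X^T X)^{-1}_{33} is VIF_3 times larger than
# the sigma^2 / SST_3 it would be under uncorrelated columns.
XtX_inv = np.linalg.inv(X.T @ X)
sst_3 = np.sum((x3 - x3.mean()) ** 2)

print(f"VIF_3            = {vif_3:.3f}")
print(f"(X^T X)^-1 [3,3] = {XtX_inv[3, 3]:.6g}")
print(f"VIF_3 / SST_3    = {vif_3 / sst_3:.6g}")  # matches the line above
```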
