GR5205 p94-p110.pdf - 3.8 Analysis of Variance for Multiple...

3.8 Analysis of Variance for Multiple Linear Regression

Suppose we are interested in the overall relationship between the response variable $y$ and all covariates $x_1, x_2, \ldots, x_{p-1}$. To assess the overall relationship, we test the null and alternative pair:

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0 \qquad H_A: \text{at least one } \beta_i \neq 0.$$

The F-statistic as a random variable is:

$$F = \frac{MSR}{MSE}$$

Under $H_0$, the appropriate proposition implies $F$ has an F-distribution with respective degrees of freedom $df_1 = p-1$ and $df_2 = n-p$.

The corresponding test statistic is:

$$f_{calc} = \frac{MSR}{MSE}.$$

Note: $F$ is a random variable and $f_{calc}$ is a single realization of $F$ based on the data set.

Standard statistical output

The analysis of variance table for linear regression is:

ANOVA Table

             Df     Sum Sq   Mean Sq             F value             Pr(>F)
Regression   p-1    SSR      MSR = SSR/(p-1)     f_calc = MSR/MSE    P-value
Residuals    n-p    SSE      MSE = SSE/(n-p)
Total        n-1    SST

where

$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \qquad SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

Important identities

$$(p-1) + (n-p) = n-1$$

$$SSR + SSE = SST$$

Note: For simple linear regression:
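As an illustration of how the entries of the ANOVA table fit together, here is a minimal sketch in plain Python. The data set (`x1`, `x2`, `y`) is made up for this example and is not from the notes, and the helper names `solve` and `anova_table` are hypothetical. The fit uses the normal equations solved by Gaussian elimination, which is adequate for a toy problem but is not how statistical software usually computes least squares.

```python
# Hypothetical toy data (made up for illustration): n = 6 observations,
# two covariates plus an intercept, so p = 3 parameters.
x1 = [-1, -1, 0, 0, 1, 1]
x2 = [1, -1, 1, -1, 1, -1]
y = [2, 1, 4, 2, 6, 3]


def solve(a, b):
    """Solve the square system a * beta = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    beta = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(aug[i][j] * beta[j] for j in range(i + 1, m))
        beta[i] = (aug[i][m] - tail) / aug[i][i]
    return beta


def anova_table(columns, y):
    """Fit least squares with an intercept and return the ANOVA table entries."""
    n = len(y)
    # Design matrix: a leading column of ones, then one column per covariate.
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    ssr = sum((f - ybar) ** 2 for f in fitted)            # regression sum of squares
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))  # residual sum of squares
    sst = sum((yi - ybar) ** 2 for yi in y)               # total sum of squares
    msr = ssr / (p - 1)
    mse = sse / (n - p)
    return {
        "df_reg": p - 1, "df_res": n - p, "df_tot": n - 1,
        "SSR": ssr, "SSE": sse, "SST": sst,
        "MSR": msr, "MSE": mse, "f_calc": msr / mse,
    }


table = anova_table([x1, x2], y)
for name, value in table.items():
    print(name, value)
```

For this toy data the two identities can be checked directly from the returned dictionary: `df_reg + df_res` equals `df_tot`, and `SSR + SSE` agrees with `SST` up to floating-point rounding.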
Rejection rule

Alternative Hypothesis    Rejection region for a level $\alpha$ test
$H_A$                     $f_{calc} > f_{\alpha,\, p-1,\, n-p}$

P-value computation: $P(F > f_{calc})$.

Development of the overall F-test
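The P-value $P(F > f_{calc})$ can be approximated by simulation. The sketch below is not the closed-form computation that statistical software uses; it swaps in a Monte Carlo approach that builds F-distributed draws as a ratio of two independent chi-square variables, each divided by its degrees of freedom. The function name `simulate_f_pvalue`, the inputs `f_calc = 22.5`, `df1 = 2`, `df2 = 3`, the level `alpha = 0.05`, the number of draws, and the seed are all illustrative assumptions.

```python
import random


def simulate_f_pvalue(f_calc, df1, df2, draws=20000, seed=0):
    """Approximate P(F > f_calc) for F ~ F(df1, df2) by Monte Carlo simulation."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(draws):
        # A chi-square variable with k degrees of freedom is a sum of k squared
        # standard normals; dividing by k gives the scaled numerator and denominator.
        num = sum(rng.gauss(0, 1) ** 2 for _ in range(df1)) / df1
        den = sum(rng.gauss(0, 1) ** 2 for _ in range(df2)) / df2
        if num / den > f_calc:
            exceed += 1
    return exceed / draws


alpha = 0.05                       # assumed significance level
p_value = simulate_f_pvalue(22.5, df1=2, df2=3)
reject_h0 = p_value < alpha        # equivalent to f_calc exceeding the critical value
print(p_value, reject_h0)
```

Rejecting when the P-value is below $\alpha$ gives the same decision as the rejection region above, since $f_{calc}$ exceeds the upper-$\alpha$ critical value exactly when the upper tail area beyond $f_{calc}$ is smaller than $\alpha$. The simulated value carries Monte Carlo error, so it should be read as an approximation.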
3.9 Coefficient of Multiple Determination

DEFINITION 3.16 The coefficient of multiple determination, denoted $R^2$, is defined by

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \tag{3.11}$$

Interpretation

"$R^2 \cdot 100\%$ of the variation in the response $Y$ is explained by the covariates $x_1, x_2, \ldots, x_{p-1}$."

Other details

$0 \le R^2 \le 1$

For simple linear regression, $r^2 = R^2$.

There is not a correlation coefficient $r$ for multiple linear regression.

Every time a new variable is added to the model, the coefficient of multiple determination $R^2$ increases or stays the same. (It never decreases.)

Every time a new variable is added to the model, $SSE$ decreases or stays the same. (It never increases.)

To adjust for $R^2$ never decreasing, we can divide the sums of squares $SSE$ and $SST$ by their respective degrees of freedom. This leads to the adjusted coefficient of multiple determination.

DEFINITION 3.17 The adjusted coefficient of multiple determination, denoted $R^2_a$, is defined by

$$R^2_a = 1 - \frac{SSE/(n-p)}{SST/(n-1)} = 1 - \left(\frac{n-1}{n-p}\right)\frac{SSE}{SST} = 1 - \left(\frac{n-1}{n-p}\right)(1 - R^2) \tag{3.12}$$

Note:
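To make the difference between (3.11) and (3.12) concrete, here is a small sketch in Python. The sums of squares, the sample size, and the function names `r_squared` and `adjusted_r_squared` are made up for illustration and do not come from the notes. The two hypothetical models share the same $SST$; the larger one adds a covariate that lowers $SSE$ only slightly.

```python
def r_squared(sse, sst):
    """Coefficient of multiple determination, as in (3.11)."""
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n, p):
    """Adjusted coefficient of multiple determination, as in (3.12)."""
    return 1 - (sse / (n - p)) / (sst / (n - 1))


# Hypothetical numbers: n = 20 observations and SST = 100 for both models.
n, sst = 20, 100.0
small = {"p": 3, "sse": 40.0}   # two covariates plus intercept
large = {"p": 4, "sse": 39.5}   # one extra covariate, SSE barely drops

r2_small = r_squared(small["sse"], sst)
r2_large = r_squared(large["sse"], sst)
adj_small = adjusted_r_squared(small["sse"], sst, n, small["p"])
adj_large = adjusted_r_squared(large["sse"], sst, n, large["p"])

print(r2_small, r2_large)     # R^2 goes up with the extra covariate
print(adj_small, adj_large)   # adjusted R^2 goes down here
```

With these made-up values, $R^2$ rises from the smaller to the larger model while $R^2_a$ falls, because the factor $(n-1)/(n-p)$ grows by more than $1 - R^2$ shrinks.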
Example 13 continued

             Df     Sum Sq   Mean Sq   F value   Pr(>F)
Regression
Residuals

             Df     Sum Sq   Mean Sq   F value   Pr(>F)
Regression
Residuals
3.10 Inference on the Slope Parameters

Ceteris paribus is a Latin phrase meaning "with other things being equal or held constant".

The above notion is key when interpreting and testing slope parameters in a multiple linear regression model. To further understand this, recall that

$$E[Y_i] = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_{p-1} x_{i,p-1}.$$

Interpretation of the slope parameter $\beta_j$

"For every one unit increase in covariate $x_j$, the mean of the response variable increases (or decreases) by $\beta_j$ units when holding all other covariates constant."

Further motivation:

For simple linear regression:
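The ceteris paribus interpretation of $\beta_j$ can be checked numerically from the mean function $E[Y_i]$. In this minimal sketch the coefficient values and covariate settings are hypothetical, chosen only for illustration, and `mean_response` is an assumed helper name.

```python
def mean_response(beta, x):
    """E[Y] = beta_0 + beta_1 * x_1 + ... + beta_{p-1} * x_{p-1}."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


# Hypothetical parameters: beta_0 = 2.0, beta_1 = 0.5, beta_2 = -1.5.
beta = [2.0, 0.5, -1.5]

x_before = [3.0, 4.0]   # covariate values (x_1, x_2)
x_after = [3.0, 5.0]    # x_2 raised by one unit, x_1 held constant

change = mean_response(beta, x_after) - mean_response(beta, x_before)
print(change)           # equals beta_2
```

Because $x_1$ is held fixed, its term cancels in the difference, and the change in the mean response is exactly $\beta_2$; a negative value means the mean decreases.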
