STAT 350 Lecture 22, Chapter 3.3: The Least Squares Regression Line

3.3 Fitting a Line (the Regression Line)
- If X and Y have a linear relationship, find the line which best fits the data: the regression line.
- Use this line to predict Y for given values of X.
- Recall the equation of a straight line: y = a + bx.

Historical Note on the Regression Line
- Sir Francis Galton discovered a phenomenon called "regression toward the mean."
- Taller fathers tended to have somewhat shorter sons, and vice versa.
- A son's height tended to regress toward the mean height of the population, compared to his father's height.
- Galton developed regression analysis to study this effect, which he called "regression toward mediocrity."

Illustration of the Least Squares Regression Line
[Scatterplot example: y (roughly 40 to 70) plotted against x (roughly 155 to 180), with a fitted line drawn through the points.]

Least Squares Regression Line
- How do we know the fitted line is the right line? What makes it "best"?
- It is the least squares regression line: the line which makes the vertical distances from the data points to the line as small as possible.
- It uses the concept of sums of squares: a small sum of squares is good, hence "least squares."

Finding the Least Squares Regression Line
The least squares solution gives

  slope:     b = Sxy / Sxx
  intercept: a = ybar - b*xbar

with the alternate (computational) forms

  Sxy = Σxiyi - (Σxi)(Σyi)/n
  Sxx = Σxi² - (Σxi)²/n

Meaning of the slope b: how much does Y change if X is changed by 1 unit? ("rise over run"). The slope is directly related to the correlation (b = r*sy/sx).

Example 3.7 on page 117
Find the equation for the regression line. The summary statistics are:

  n = 20, Σxi = 2817.9, Σyi = 1574.8,
  Σxi² = 415,949.85, Σxiyi = 222,657.88, Σyi² = 124,039.58

Step 1: Find the slope b.

  b = Sxy / Sxx
    = [Σxiyi - (Σxi)(Σyi)/n] / [Σxi² - (Σxi)²/n]
    = [222,657.88 - (2817.9)(1574.8)/20] / [415,949.85 - (2817.9)²/20]
    = 776.434 / 18,921.83
    = 0.041

Step 2: Find the intercept a.

  a = ybar - b*xbar
    = 1574.8/20 - 0.041*(2817.9/20)
    = 78.74 - 0.041*140.895
    = 72.96

The equation for the regression line is yhat = 72.96 + 0.041x.

Use the regression line to predict y when x = 150:

  yhat = 72.96 + 0.041*150 = 79.1

Danger of Extrapolation
- A fitted relationship (e.g., a regression line) may not be valid for x values much beyond the range of the data.
- Example: a growth chart for boys or girls aged 2 to 18.

Example (Exercise 20 on page 126)
Refer to the tank temperature / efficiency ratio data (partial listing below):
a) Determine the equation of the least squares line.
b) Calculate a point prediction for the efficiency ratio when the tank temperature is 182.

Temperature / Ratio Data (partial)

  Temp:  170   172   173   174   174   175
  Ratio: 0.84  1.31  1.42  1.03  1.07  1.08

SAS code

  data tank;
    infile 'H:\Stat350Data\ex3_20.txt';
    input temp ratio;
  run;

  proc reg data=tank;
    model ratio = temp;
    plot ratio*temp;
  run;

From the SAS output, the equation for the least squares regression line is

  Ratio = -15.245 + 0.0942*temp

a) The equation of the least squares line is Ratio = -15.245 + 0.0942*temp.
b) A point prediction for the efficiency ratio when the tank temperature is 182:
   Ratio = -15.245 + 0.0942*182 = 1.8994
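The hand arithmetic above (the slope and intercept in Example 3.7, and plugging a new x value into the fitted line) can be checked with a short SAS data step built only from the summary statistics. This is a minimal sketch, not part of the course materials; the dataset name ex3_7 and the variable names are invented here:

  /* Minimal sketch: reproduce the Example 3.7 arithmetic from the
     summary statistics on the slide (not from the raw data). */
  data ex3_7;
    n     = 20;
    sumx  = 2817.9;      /* sum of x_i     */
    sumy  = 1574.8;      /* sum of y_i     */
    sumx2 = 415949.85;   /* sum of x_i^2   */
    sumxy = 222657.88;   /* sum of x_i*y_i */

    Sxy = sumxy - sumx*sumy/n;   /* = 776.434  */
    Sxx = sumx2 - sumx**2/n;     /* = 18921.83 */
    b   = Sxy / Sxx;             /* slope, about 0.041     */
    a   = sumy/n - b*sumx/n;     /* intercept, about 72.96 */

    yhat150 = a + b*150;         /* prediction at x = 150, about 79.1 */
    put b= a= yhat150=;          /* results appear in the SAS log     */
  run;

The same pattern with the fitted coefficients -15.245 and 0.0942 gives the tank prediction at temperature 182.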
Assessing the Fit
How well does the line fit the data? We assess the fit of the line by looking at how much variation in Y is explained by the regression line on X.

Breaking Up the Sums of Squares
- Total variation in y (total sum of squares): SSTo = Σ(yi - ybar)². It measures the total variation in Y.
- Break it into two pieces, SSTo = SSReg + SSResid:
  - Regression sum of squares (SSReg, also written SSM): Σ(yhat_i - ybar)², the part of the variation we are explaining with our regression line.
  - Error sum of squares (SSResid, also written SSE): Σ(yi - yhat_i)², the unexplained variation.
- If SSE is small, we can assume our fit is good.
- For the tank example: SSTo = 9.91, SSReg = SSM = 4.47, SSResid = SSE = 5.44.

Coefficient of Determination
r² is given by

  r² = SSM/SSTo = 1 - SSE/SSTo

- Multiplying r² by 100 gives the percent of variation attributed to the linear regression between Y and X.
- When SSM is large (or SSE is small), we have explained a large amount of the variation in Y.
- For the tank example, R² = 0.45.

Example (Exercise 20 on page 126), part (d)
d) What proportion of the observed variation in the efficiency ratio can be attributed to the linear relationship between the two variables?
Answer: R² = 0.45, so 45% of the observed variation in the efficiency ratio can be attributed to the linear relationship between the two variables. The other 55% is unexplained variation.

Standard Deviation About the Regression Line
- Given by s_e = sqrt(SSResid/(n - 2)).
- It is the typical amount by which an observation varies about the regression line.
- Also called the "root MSE", the square root of the mean square error.
- In the tank example, Root MSE = 0.49724, so the typical amount by which an observation varies about the regression line is 0.49724. (Both R² and the root MSE are checked in the short sketch at the end of these notes.)

Example: Height and Weight
The following data set gives the average heights and weights for American women aged 30 to 39 (source: The World Almanac and Book of Facts, 1975). Total observations: 15.
[Scatterplot and regression output for the height/weight data.]

What is the estimated regression line? Using the line, predict the weight of a woman 73 inches tall.
- The equation for the regression line is Weight = -87.5 + 3.45*height.
- Predicted weight at a height of 73 inches: Weight = -87.5 + 3.45*73 = 164.35.

Residual Plots
- The residuals can be used to assess the appropriateness of a linear regression model.
- A residual plot plots the residuals against x.
- It should show a random scattering of points and should not have any pattern.
- If a pattern is observed, the linear regression model is probably not appropriate.
[Example residual plots: a good fit, a linearity violation, and a constant-variance violation.]
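As a closing check, the fit summaries quoted for the tank example (R² = 0.45 and root MSE = 0.49724) can be recovered from the sums of squares alone. This is a minimal sketch, not part of the course materials; the sample size n = 24 is an assumption inferred from the reported root MSE rather than stated on the slides:

  /* Minimal sketch: recover r-square and root MSE for the tank example
     from the sums of squares quoted above.
     Assumption: n = 24 (inferred from root MSE = sqrt(SSE/(n-2))). */
  data fit_check;
    n     = 24;
    SSTo  = 9.91;    /* total sum of squares              */
    SSReg = 4.47;    /* regression (model) sum of squares */
    SSE   = 5.44;    /* error (residual) sum of squares   */

    r2       = SSReg / SSTo;       /* coefficient of determination, about 0.45 */
    r2_alt   = 1 - SSE / SSTo;     /* same quantity, alternative form          */
    root_mse = sqrt(SSE / (n-2));  /* s_e, about 0.497                         */
    put r2= r2_alt= root_mse=;     /* results appear in the SAS log            */
  run;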