# Choosing values that give the best fit

- Choose values that give the best fit
- What do you mean by "best fit"?
- One approach is to minimize the residual sum of squares
- This method is called least squares
- It is the basis of regression analysis (see Keller Ch. 16)

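## Least squares: The diagram

$$\hat{y} = b_0 + b_1 x$$

[Figure: data points plotted against axes $x$ and $y$ with the fitted line; the intercept $b_0$ and a residual $e$ are labelled.]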

## Least squares: The optimization problem

Assume

$$\hat{y}_i = b_0 + b_1 x_i,$$

where $b_0$ and $b_1$ are chosen to minimize

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$

Thus the solution will be the intercept and slope estimates that minimize the residual sum of squares.
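
The residual sum of squares criterion can be sketched in a few lines of Python. The function name `residual_sum_of_squares` and the four data points are hypothetical, chosen only for illustration; they do not come from the lecture or from Keller.

```python
def residual_sum_of_squares(b0, b1, x, y):
    """Sum of squared residuals (y_i - y_hat_i)^2 for the line y_hat = b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data, for illustration only
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

# Compare two candidate lines: the one with the smaller value fits better
rss_a = residual_sum_of_squares(0.0, 2.0, x, y)  # candidate y_hat = 2x
rss_b = residual_sum_of_squares(1.0, 1.0, x, y)  # candidate y_hat = 1 + x
print(rss_a, rss_b)  # 1.0 11.0
```

Least squares picks, among all possible pairs of intercept and slope, the pair for which this quantity is smallest.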

## Least squares: The solution

$$b_1 = \frac{s_{xy}}{s_x^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

Note:

- The point of the means $(\bar{x}, \bar{y})$ will lie on the line of best fit
- $b_1$ will have the same sign as the covariance (correlation) between $x$ and $y$
- Zero covariance (correlation) ⇒ $b_1 = 0$ ⇒ ?
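
As a minimal sketch, the slope and intercept formulas can be computed directly from the sample covariance and sample variance. The helper name `least_squares_fit` and the toy data are assumptions made for illustration, not part of the lecture material.

```python
from statistics import mean


def least_squares_fit(x, y):
    """Return (b0, b1) with b1 = s_xy / s_x^2 and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    # Sample covariance and sample variance (divisor n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical toy data, for illustration only
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]
b0, b1 = least_squares_fit(x, y)

# The fitted line passes through the point of the means
fitted_at_mean = b0 + b1 * mean(x)
print(b0, b1, fitted_at_mean, mean(y))
```

For these points the slope is about 1.9 and the intercept about 0, and the fitted value at the mean of `x` equals the mean of `y` up to floating-point rounding.
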

## Internet use: Keller exercise 3.52 and extension

- Problem
  - Interested in internet usage
- Data
  - Random sample of 15 adults
  - Two variables
    - Education (years)
    - Internet use (hours in previous week)
- What are the key features of these variables and their relationship?

## Internet use: Excel summary statistics

| Statistic | Education | Internet use |
| --- | --- | --- |
| Mean | 12.667 | 10.000 |
| Standard Error | 0.779 | 1.857 |
| Median | 11 | 10 |
| Mode | 11 | 0 |
| Standard Deviation | 3.016 | 7.191 |
| Sample Variance | 9.095 | 51.714 |
| Kurtosis | -0.114 | -0.432 |
| Skewness | 0.586 | 0.181 |
| Range | 11 | 24 |
| Minimum | 8 | 0 |
| Maximum | 19 | 24 |
| Sum | 190 | 150 |
| Count | 15 | 15 |
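
The Standard Error entries in the Excel summary are the standard error of the mean, that is, the sample standard deviation divided by the square root of the count. A quick check using only the reported values (the variable names are illustrative):

```python
import math

n = 15                # Count from the Excel summary
sd_education = 3.016  # Standard Deviation, Education
sd_internet = 7.191   # Standard Deviation, Internet use

# Standard error of the mean = s / sqrt(n)
se_education = sd_education / math.sqrt(n)
se_internet = sd_internet / math.sqrt(n)
print(round(se_education, 3), round(se_internet, 3))  # 0.779 1.857
```
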

## Internet use: Scatter diagram & fitted regression line

[Figure: scatter diagram of internet use (hours of use, vertical axis, 0 to 30) against education (horizontal axis, 0 to 20), with the fitted regression line.]

## Internet use: Regression line

- $b_1 = 15.286 / 9.095 = 1.681$ and $b_0 = 10 - 1.681 \times 12.667 = -11.29$
- Be careful: Excel uses population formulae in calculating covariances, so the sample covariance is $14.267 \times (15/14) = 15.286$
- See Keller p. 138

Covariance (Excel, population formulae):

| | Education | Internet use |
| --- | --- | --- |
| Education | 8.489 | |
| Internet use | 14.267 | 48.267 |

Correlation:

| | Education | Internet use |
| --- | --- | --- |
| Education | 1 | |
| Internet use | 0.705 | 1 |
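
The population-to-sample adjustment and the resulting regression coefficients can be reproduced from the reported summary values. This is a sketch that works from the rounded figures in the Excel output, so the last digit may differ slightly from a calculation on the raw data; the variable names are illustrative.

```python
n = 15                        # sample size
pop_cov = 14.267              # Excel covariance (population formula, divisor n)
pop_var_education = 8.489     # Excel variance of Education (population formula)
sample_var_education = 9.095  # Sample Variance of Education (divisor n - 1)
mean_education = 12.667
mean_internet = 10.0

# Convert the population covariance to a sample covariance
sample_cov = pop_cov * n / (n - 1)

# Slope and intercept of the least squares line
b1 = sample_cov / sample_var_education
b0 = mean_internet - b1 * mean_education

# The n/(n - 1) factor cancels if population formulae are used for both terms
b1_from_population = pop_cov / pop_var_education

print(round(sample_cov, 3), round(b1, 3), round(b0, 2), round(b1_from_population, 3))
```

With these inputs the slope comes out near 1.68 and the intercept near -11.3. The point of the warning is consistency: dividing a population covariance by a sample variance, or the reverse, would bias the slope by a factor of 14/15 or 15/14.
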

## Internet use: Summary

## Progress report #1

- Descriptive statistics (emphasis of course so far)
  - What are the key features of the data?
  - How can we best describe these features so that analysis is informative?
- Inferential statistics (emphasis of course to come)
  - Extracting information about population parameters on the basis of sample statistics
  - What does a sample mean tell us about a population mean?
  - Typically the only alternative because it is difficult or impossible to determine the population mean
  - Need more foundations before covering this later in the course
