Chapters 1 and 2
Timothy Hanson
Department of Statistics, University of South Carolina
Stat 704: Data Analysis I

Toluca data (p. 19)
Toluca makes replacement parts for refrigerators. We consider one particular part, manufactured in varying lot sizes.
Production takes time to set up regardless of lot size.
We want to relate work hours to lot size.
$n = 25$ pairs $(x_i, Y_i)$ were obtained.
/* Read (size, hours) pairs; @@ allows several observations per data line */
data toluca;
  input size hours @@;
  label size="Lot Size (parts/lot)";
  label hours="Work Hours";
  datalines;
80 399 30 121 50 221 90 376 70 361 60 224 120 546 80 352 100 353
50 157 40 160 70 252 90 389 20 113 110 435 100 420 30 212 50 268
90 377 110 421 30 273 90 468 40 244 80 342 70 323
;
/* Scatterplot of work hours against lot size */
proc sgscatter;
  plot hours*size;
run;
options nocenter;
/* Simple linear regression of hours on size */
proc reg;
  model hours=size;
run;

Toluca data, SAS output

The REG Procedure
Dependent Variable: hours Work Hours

Number of Observations Read    25
Number of Observations Used    25

Analysis of Variance
                          Sum of         Mean
Source            DF     Squares       Square    F Value    Pr > F
Model              1      252378       252378     105.88    <.0001
Error             23       54825   2383.71562
Corrected Total   24      307203

Root MSE           48.82331    R-Square    0.8215
Dependent Mean    312.28000    Adj R-Sq    0.8138
Coeff Var          15.63447

Parameter Estimates
                                        Parameter    Standard
Variable    Label                 DF     Estimate       Error    t Value    Pr > |t|
Intercept   Intercept              1     62.36586    26.17743       2.38      0.0259
size        Lot Size (parts/lot)   1      3.57020     0.34697      10.29      <.0001
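As a cross-check on the SAS output, the estimates can be recomputed from the usual closed-form least squares formulas $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{Y} - b_1 \bar{x}$. The following is a minimal Python sketch, not part of the original SAS session. The variable names (`sxx`, `sxy`, `sse`, `mse`, `r2`) are illustrative choices, and only the standard library is assumed.

```python
# Toluca data from the slides: (lot size, work hours) pairs.
data = [
    (80, 399), (30, 121), (50, 221), (90, 376), (70, 361),
    (60, 224), (120, 546), (80, 352), (100, 353), (50, 157),
    (40, 160), (70, 252), (90, 389), (20, 113), (110, 435),
    (100, 420), (30, 212), (50, 268), (90, 377), (110, 421),
    (30, 273), (90, 468), (40, 244), (80, 342), (70, 323),
]
x = [size for size, _ in data]
y = [hours for _, hours in data]
n = len(data)

xbar = sum(x) / n
ybar = sum(y) / n  # SAS reports this as "Dependent Mean"

# Centered sums of squares and cross-products.
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Least squares slope and intercept.
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# Error sum of squares, mean square error on n - 2 degrees of freedom
# (the "Error" row of the ANOVA table), and R-square.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ssto = sum((yi - ybar) ** 2 for yi in y)
mse = sse / (n - 2)
r2 = 1 - sse / ssto

print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")
print(f"SSE = {sse:.2f}, MSE = {mse:.5f}, R-square = {r2:.4f}")
```

The printed values can be compared line by line with the Parameter Estimates and Analysis of Variance tables in the SAS listing.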
Toluca data
The scatterplot of work hours against lot size shows a roughly linear trend with no obvious outliers.

Toluca
The fitted model is $\widehat{\text{hours}} = 62.37 + 3.570 \times \text{lot size}$.
A lot size of $x = 65$ takes $\hat{Y} = 62.37 + 3.570(65) \approx 294$ hours to finish, on average.
For each unit increase in lot size, the mean time to finish increases by 3.57 hours.
Increasing the lot size by 10 parts increases the mean time by 35.7 hours, about a week.
$b_0 = 62.37$ is only interpretable for lots of size zero. What does that mean here?
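The arithmetic on this slide can be reproduced with a few lines of Python. This is only a sketch: the function name `fitted_hours` is hypothetical, and the coefficients are the rounded estimates quoted on the slide rather than the full-precision SAS values.

```python
# Rounded estimates quoted on the slide (not full-precision SAS output).
b0 = 62.37   # intercept
b1 = 3.570   # slope: extra mean hours per additional part in the lot


def fitted_hours(lot_size):
    """Estimated mean work hours for a given lot size (hypothetical helper)."""
    return b0 + b1 * lot_size


# Estimated mean hours for a lot of 65 parts.
at_65 = fitted_hours(65)

# Change in estimated mean hours when the lot size grows by 10 parts.
change_per_10 = fitted_hours(75) - fitted_hours(65)

print(round(at_65, 2), round(change_per_10, 2))
```

The difference between any two fitted values ten parts apart equals $10 b_1$, which is why the starting lot size does not matter for the second calculation.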
The $i$th fitted value is $\hat{Y}_i = b_0 + b_1 x_i$.
The points $(x_1, \hat{Y}_1), \ldots, (x_n, \hat{Y}_n)$ fall on the line $y = b_0 + b_1 x$; the points $(x_1, Y_1), \ldots, (x_n, Y_n)$ do not.
The $i$th residual is $e_i = Y_i - \hat{Y}_i = Y_i - (b_0 + b_1 x_i)$, $i = 1, \ldots, n$, the difference between observed and fitted values.
$e_i$ estimates $\epsilon_i$.

Properties of the residuals (pp. 23–24)
1. $\sum_{i=1}^n e_i = 0$ (from normal equations)
2. $\sum_{i=1}^n x_i e_i = 0$ (from normal equations)
3. $\sum_{i=1}^n \hat{Y}_i e_i = 0$ (from 1 and 2)
4. The least squares line always goes through $(\bar{x}, \bar{Y})$ (easy to show).
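The four properties can be checked numerically on the Toluca data. The sketch below is a Python illustration rather than anything from the SAS session. It refits the line with the closed-form least squares formulas, and the names `sum_e`, `sum_xe`, `sum_fit_e`, and `gap_at_mean` are illustrative. Because of floating-point rounding, the sums come out as tiny numbers rather than exact zeros.

```python
# Toluca data from the slides: (lot size, work hours) pairs.
data = [
    (80, 399), (30, 121), (50, 221), (90, 376), (70, 361),
    (60, 224), (120, 546), (80, 352), (100, 353), (50, 157),
    (40, 160), (70, 252), (90, 389), (20, 113), (110, 435),
    (100, 420), (30, 212), (50, 268), (90, 377), (110, 421),
    (30, 273), (90, 468), (40, 244), (80, 342), (70, 323),
]
x = [size for size, _ in data]
y = [hours for _, hours in data]
n = len(data)
xbar = sum(x) / n
ybar = sum(y) / n

# Least squares estimates from the closed-form formulas.
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar

# Fitted values and residuals e_i = Y_i - Yhat_i.
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

sum_e = sum(resid)                                          # property 1
sum_xe = sum(xi * ei for xi, ei in zip(x, resid))           # property 2
sum_fit_e = sum(fi * ei for fi, ei in zip(fitted, resid))   # property 3
gap_at_mean = (b0 + b1 * xbar) - ybar                       # property 4

print(sum_e, sum_xe, sum_fit_e, gap_at_mean)
```

Property 3 holds here for the same reason as on the slide: each fitted value is $b_0 + b_1 x_i$, so the third sum is $b_0$ times the first sum plus $b_1$ times the second.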
Estimating $\sigma^2$, Section 1.7
$\sigma^2$ is the error variance. If we observed the $\epsilon_1, \ldots, \epsilon_n$, a natural estimator is $S^2 = \frac{1}{n} \sum_{i=1}^n (\epsilon_i - 0)^2$.
If we replace each $\epsilon_i$ by $e_i$ we have $\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n e_i^2$.
However, $E(\hat{\sigma}^2) = \frac{1}{n} \sum_{i=1}^n E(Y_i - b_0 - b_1 x_i)^2 = \ldots$ a lot of hideous algebra later.

