Lecture 1
STT226W
1. STT212/STT213 REVIEW
A. Sampling Distribution
Suppose we want to make inferences about $\mu$. We take a random sample of size $n$ and compute
$\bar{x} = \frac{\sum x_i}{n}$. We need to know how $\bar{x}$ behaves in repeated samples in order to use it to make inferences
abo
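The idea of "repeated samples" can be illustrated with a small simulation. This is only a sketch: the normal population and the values of mu, sigma, n, and reps are assumptions chosen for illustration and do not come from the lecture.

```r
# Sketch: simulate the sampling distribution of the sample mean.
# mu, sigma, n and reps are hypothetical values chosen for illustration.
set.seed(1)
mu <- 50
sigma <- 10
n <- 25
reps <- 10000

# Draw `reps` random samples of size n and record each sample mean
xbar <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))

mean(xbar)        # should be close to mu
sd(xbar)          # should be close to sigma / sqrt(n)
sigma / sqrt(n)   # theoretical standard error of the sample mean
```

The spread of the simulated means is much smaller than the spread of the population, which is why the sample mean is useful for inference about the population mean.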

Exercise 4.88 on page 246
A naval base is considering modifying or adding to its fleet of 48 standard aircraft. The final decision regarding the type and number of aircraft to be added
depends on a comparison of cost versus effectiveness of the mo

Best Subsets Regression
Surgical Unit Example: A hospital surgical unit was interested in predicting survival in patients undergoing a particular type of liver
operation.
Row      y
  1    695
  2    403
  3    710
  4    349
  5   2343
  6    348
  7    518
...
 51    405
 52    579
 53    550

Recap of last time
Example
Some properties of LS line
LS estimators are unbiased
Model Assumptions
Estimating s2
Properties & Interpretation of s2
Sampling distribution of slope
Sampling distribution of intercept
Recap of Last Time
Example
Find the least

Homework 9
DUE: Friday, April 18
READ: Indicator Variables and Interaction Models in the LECTURES folder on Blackboard
1. A study was published of men's and women's winning times in the Boston Marathon. The
independent variables used to model winnin

Interaction Model
Predict amount of oil (in gallons) used to heat a home from degrees below freezing
(°F) and amount of insulation (inches). So Deg = 0 means it is 32°F outside, Deg = 10
means it is 22°F outside, and Deg = -10 means it is 42°F outside.
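A minimal R sketch of this setup follows. The variable names (temp_F, Deg, Ins, Oil), the data grid, and the coefficients used to generate Oil are all hypothetical and are not the values from the lecture; the point is only to show the Deg coding and how an interaction model is specified in lm().

```r
# Hypothetical outside temperatures in degrees Fahrenheit
temp_F <- c(32, 22, 42)
Deg <- 32 - temp_F   # degrees below freezing: 0, 10, -10

# Hypothetical data: a grid of Deg and insulation (Ins) values, with oil use
# generated exactly from an interaction model so the fit is easy to check
d <- expand.grid(Deg = c(-10, 0, 10, 20), Ins = c(2, 6, 10))
d$Oil <- 200 + 5 * d$Deg - 10 * d$Ins - 0.3 * d$Deg * d$Ins

# Deg * Ins expands to Deg + Ins + Deg:Ins (main effects plus interaction)
fit <- lm(Oil ~ Deg * Ins, data = d)
coef(fit)
```

With an interaction term, the change in oil use per additional degree below freezing depends on the amount of insulation (in this made-up example, 5 - 0.3 * Ins gallons per degree).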

Getting data into R
An important first step is learning how to get data into R. The most direct way is to simply type data into the
Console window. Type
> a=4
and notice that a has been added to the Workspace window.
Now type
> x=c(2,3,4)
and notice

(a). The plot shows a strong positive linear relationship between hours of staff time and amount of
billings.
(b).
(c). I observe 6 runs.
(d). $n_1 = 11$, $n_2 = 9$; expected runs $= \frac{2 n_1 n_2}{n_1 + n_2} + 1 = \frac{2(11)(9)}{11 + 9} + 1 = 10.9$; sd of runs $= 2.15345$
(e). Yes, there i
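The runs-test quantities in part (d) can be checked in R. This sketch uses the values stated in the solution (n1 = 11, n2 = 9, 6 observed runs) and the standard large-sample formulas for the mean and standard deviation of the number of runs; the variable names are illustrative, and the z statistic is an added check that does not appear in the solution.

```r
# Values taken from the solution above
n1 <- 11
n2 <- 9
observed_runs <- 6
n <- n1 + n2

# Expected number of runs and its standard deviation under randomness
expected_runs <- 2 * n1 * n2 / n + 1
sd_runs <- sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n^2 * (n - 1)))

# Standardized test statistic (not part of the original solution)
z <- (observed_runs - expected_runs) / sd_runs

expected_runs   # 10.9
sd_runs         # about 2.15345
z
```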

1.
(a).
(b). The relationship between X and Y is curvilinear
(c).
(Intercept)          X     I(X^2)
  -1051.108     66.186     -1.006
(d). The variance inflation factors are greater than 10 which means that there is a multicollinearity
problem.
vif(model1)
     X I(X^2)
  1453   1453
(e
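For reference, a variance inflation factor can be computed by hand in base R as 1 / (1 - R^2), where R^2 comes from regressing one predictor on the other predictors. The sketch below uses hypothetical X values, not the homework data, so the numbers will differ from the 1453 reported above.

```r
# Hypothetical predictor values (not the homework data)
X <- seq(20, 40, by = 2)
X2 <- X^2

# VIF for X: regress X on the other predictor (X^2)
r2_X <- summary(lm(X ~ X2))$r.squared
vif_X <- 1 / (1 - r2_X)

# VIF for X^2: regress X^2 on the other predictor (X)
r2_X2 <- summary(lm(X2 ~ X))$r.squared
vif_X2 <- 1 / (1 - r2_X2)

c(X = vif_X, "I(X^2)" = vif_X2)
```

With only two predictors both VIFs equal 1 / (1 - r^2), where r is the correlation between X and X^2, which is why the two values in the vif() output above are identical.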

1. To determine whether extra personnel are needed for the day, the owners of a water adventure park would like to
find a model that would allow them to predict the day's attendance each morning before opening based on the day of
the week and weather condi

Variable Selection
so far we have assumed variables chosen in advance
the set of variables to include is not usually predetermined
no unique set of best variables
set of variables that is best for one purpose may not be best for another
suppose we hav

Transformation for Nonlinear Relation Only
If the regression relation between X and Y is nonlinear but the distribution of the error terms is
reasonably close to a normal distribution with equal variance, then transformations on X should be
attempted. The

POLYNOMIAL REGRESSION MODELS
frequently used for curvilinear response model
model may fit the data well but take unexpected directions outside the range of the data
ONE PREDICTOR VARIABLE SECOND ORDER MODEL ($x^2$ term)
$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$
$\beta_0$ is the mea
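A second-order model with one predictor can be fit in R by adding an I(x^2) term to the formula. The data below are hypothetical and generated without error from a known quadratic, purely so the fitted coefficients can be checked against the values used to build them.

```r
# Hypothetical data generated exactly from y = 1 + 2x - 0.5x^2
x <- 0:10
y <- 1 + 2 * x - 0.5 * x^2

# I() protects x^2 so that ^ is treated as arithmetic, not formula syntax
fit <- lm(y ~ x + I(x^2))
coef(fit)   # estimates of beta_0, beta_1, beta_2

# Predictions far outside the observed range of x follow the parabola,
# which may not reflect the real relationship there
predict(fit, newdata = data.frame(x = 30))
```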

Transformation for Nonnormality and Unequal Error Variances
Unequal error variances and nonnormality of the error terms frequently occur together. In this case, we
need a transformation on Y, since the shapes and spreads of the distributions of Y need to

QUALITATIVE PREDICTORS
Qualitative, as well as quantitative, predictor variables can be used in regression models. For example,
gender (male,female), purchase status (purchase, no purchase), and class year in school (freshman,
sophomore, junior, senior).
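As a sketch of how a qualitative predictor enters a regression in R, the example below codes class year as a factor and displays the indicator columns that lm() would use. The six observations are hypothetical.

```r
# Hypothetical class-year data for six students
year <- factor(
  c("freshman", "sophomore", "junior", "senior", "junior", "freshman"),
  levels = c("freshman", "sophomore", "junior", "senior")
)

# model.matrix() shows the indicator (dummy) columns R would use in lm().
# With 4 levels there are 3 indicator columns; the first level is the baseline.
X <- model.matrix(~ year)
colnames(X)
X
```

A qualitative predictor with k levels needs k - 1 indicator variables; here freshman is the baseline level, so a freshman has zeros in all three indicator columns.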

MULTICOLLINEARITY
When two or more X's in the model are moderately or highly correlated
HOW TO DETECT
significant correlations between pairs of X's
non-significant t-tests for the $\beta$'s with significant global F test
opposite signs than expected for

RESIDUAL CORRELATION
correlated error terms suggest there is additional information in the data that has not been exploited in
the current model
EFFECT OF RESIDUAL CORRELATION
$\hat{\beta}$'s are still unbiased but don't have minimum variance
estimates of $\sigma^2$ and

Homework 6
1. (a). Fit the model using the method of least squares. Is there evidence that the model is useful for
predicting y? Test using $\alpha$ = 0.05.
> attach(GASKETS)
> model=lm(numdef~speed)
> summary(model)
(Intercept)
speed
Estimate Std. Error t

Correlation coefficient
Analysis of Variance
Coefficient of Determination
Correlation Coefficient
$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
Measure of the strength of the LINEAR association between x and y
Correlation coefficient & slope
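The correlation coefficient, r = S_xy / sqrt(S_xx S_yy), and its link to the least squares slope can be checked numerically. The x and y values below are hypothetical; the identity b1 = r * sqrt(S_yy / S_xx) follows directly from b1 = S_xy / S_xx.

```r
# Hypothetical data
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Sums of squares and cross-products
Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))

r <- Sxy / sqrt(Sxx * Syy)   # correlation coefficient
b1 <- Sxy / Sxx              # least squares slope

r
cor(x, y)                    # built-in check, same value as r
r * sqrt(Syy / Sxx)          # same value as b1
```

Because sqrt(S_yy / S_xx) is positive, r and the slope always have the same sign.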

Regression Analysis
A statistical technique for modeling
the relationship between variables
Uses of Regression
Data description
Parameter estimation
Prediction and estimation
Control
The Model Building Process
Collecting Data
Observational
Experimen

Inference for slope
Confidence interval
Hypothesis testing
Inference for intercept
Confidence interval
Hypothesis testing
Examples
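Sampling Distribution of $\hat{\beta}_0$
$\hat{\beta}_0 \sim N\left(\beta_0, \sigma(\hat{\beta}_0)\right)$
$\sigma^2(\hat{\beta}_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{\sum (x - \bar{x})^2}\right) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)$
the different values we'd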
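As a numerical check of the intercept's variance formula, sigma^2 (1/n + xbar^2 / S_xx), the sketch below replaces the unknown sigma^2 with its estimate s^2 = SSE / (n - 2) and compares the result with the standard error reported by summary(lm()). The data are hypothetical.

```r
# Hypothetical data
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

fit <- lm(y ~ x)
n <- length(x)
Sxx <- sum((x - mean(x))^2)

# Estimate sigma^2 by s^2 = SSE / (n - 2)
s2 <- sum(resid(fit)^2) / (n - 2)

# Estimated standard error of the intercept from the formula
se_b0 <- sqrt(s2 * (1 / n + mean(x)^2 / Sxx))

# Standard error reported by R for comparison
se_b0_lm <- summary(fit)$coefficients["(Intercept)", "Std. Error"]

c(formula = se_b0, lm = se_b0_lm)
```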