Math439: Linear Statistical Models
Instructor: Nan Lin
nlin@wustl.edu
Class materials are available on Blackboard (http://bb.wustl.edu)
An example: vending machine
Suppose that an industrial engineer employed by a
soft drink beverage bottler is analy
Math439: Linear Statistical Models
Variable selection: Part I
Reasons for variable selection
Variable selection is intended to select the best subset
Math439: Linear Statistical Models
Estimation of $\sigma^2$
Unbiased estimator: $\hat\sigma^2 = MSE = \dfrac{SSE}{n-2}$,
where
$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat y_i)^2$
is called
Math439: Linear Statistical Models
Variable selection: Part II
Approaches for variable selection
There are two main approaches to do the variable sele
Math439: Linear Statistical Models
Sampling distributions for a normal random sample
Suppose that $Y_1, \ldots, Y_n$ form a random sample of size $n$ from
Math439: Linear Statistical Models
Categorical Variables in Linear Regression
Qualitative/Class/Categorical Variables
To this point, the predictor var
Math439: Linear Statistical Models
Simultaneous confidence region
Consider the rectangular region given by the two marginal confidence
intervals on the pre
Math439: Linear Statistical Models
Decomposition of sum of squares
$SST = SSR + SSE$
$SST = \sum_{i=1}^{n} (y_i - \bar y)^2$
$SSR = \sum_{i=1}^{n} (\hat y_i - \bar y)^2$
$SSE = \sum_{i=1}^{n} (y_i - \hat y_i)^2$
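The decomposition of the total sum of squares into regression and error parts can be checked numerically. The sketch below assumes a simple linear regression with an intercept fitted by least squares; the data values and the helper name `fit_line` are made up for illustration and do not come from the course materials.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a simple linear regression."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


# Hypothetical data, chosen only to illustrate the identity.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
ybar = sum(y) / len(y)

sst = sum((yi - ybar) ** 2 for yi in y)                 # total sum of squares
ssr = sum((fi - ybar) ** 2 for fi in fitted)            # regression sum of squares
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error sum of squares

print("SST       =", sst)
print("SSR + SSE =", ssr + sse)
```

The two printed values should agree up to floating-point rounding. The identity relies on the model containing an intercept and being fitted by least squares, so it need not hold for other fitted lines.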
Ma 439 Linear Models Fall 2010
Solutions for Problem Set #1 Due September 16, 2010
Prof. Sawyer Washington University
1. For the matrix
$$A = \begin{pmatrix} 2 & 9 & 7 & 3 & 2 \\ 5 & 13 & 14 & 6 & 0 \\ 11 & 0 & 17 & -2 & 3 \end{pmatrix}$$
(i) $a_{2+} = \sum_{j=1}^{5} a_{2j} = 5 + 13 + 14 + 6 + 0 = 38$
(ii) $\sum_{i=1}^{3} a_{i4} = 3 + 6 - 2 = 7$
3
(ii
Ma 439 Linear Models Fall 2010
Solutions for Problem Set #2 Due October 19, 2010
Prof. Sawyer Washington University
(Do problems 1-4 by hand, and problems 5-6 using SAS.)
1. (10) Since Cov(X, Y) for real-valued random variables X, Y is linear in
both var
TAKEHOME FINAL
Hand in either to Professor Sawyer or to the receptionist in the Mathematics Office.
NOTE: There should be NO COLLABORATION on the takehome final,
other than for the mechanics of using the computer.
Open textbook and notes (including course
Ma 439 Test Linear Statistical Models Fall 2010
Model Solutions
Prof. Sawyer Washington Univ. Test date October 26, 2010
1.
Let $X$ be a $d$-dimensional random vector that is normally distributed with parameters
$\mu$ and $\Sigma$ (that is, $X \sim N(\mu, \Sigma)$).
Show that $(X - \mu)' \Sigma^{-1} (X - \mu)$
Generalized Inverses:
How to Invert a Non-Invertible Matrix
S. Sawyer September 7, 2006 rev August 6, 2008
1. Introduction and Definition. Let $A$ be a general $m \times n$ matrix. Then
a natural question is when we can solve
$$Ax = y \quad \text{for } x \in \mathbb{R}^m, \text{ given } y \in \mathbb{R}^n \qquad (1.1)$$
If A is
Guide to Using SAS
The only way to learn a new programming language is by writing in
it. . . This is the basic hurdle; to leap over it you have to be able to create
the program text somewhere, compile it successfully, load it, run it, and
find out where you
The Method of Lagrange Multipliers
S. Sawyer July 23, 2004
1. Lagrange's Theorem. Suppose that we want to maximize (or minimize) a function of $n$ variables
$$f(x) = f(x_1, x_2, \ldots, x_n) \quad \text{for} \quad x = (x_1, x_2, \ldots, x_n) \qquad (1.1a)$$
subject to p constraints
g1
Math439: Linear Statistical Models
Example: Latitude and mortality due to malignant
melanoma
Reference: Fisher and Van Belle (1993). Biostatistics: A
Math439: Linear Statistical Models
Checking lack-of-fit
Lack-of-fit of the linear regression model
Question: Is the straight-line character of the relatio
Math439: Linear Statistical Models
Some basics of using R
Variables
Basic math operation
Array and related operations
Function and loop
Import/Export
Math439: Linear Statistical Models
Prediction of new observations
Question: What is the response value at $x = x_0$?
A natural point prediction is $\hat y_0 =$
Math439: Linear Statistical Models
Multiple linear regression
A multiple linear regression model with p predictor variables is
defined as
$y_i = \beta_0 + \beta_1 x_{i1}$
Math439: Linear Statistical Models
Quadratic forms in normal variables
1. If $y \sim N_n(0, I)$, then $y^T y \sim \chi^2_n$.
2. If $y \sim N_n(0, \sigma^2 I)$ and $M$ is an $n \times n$ symmetric ide
Math439: Linear Statistical Models
Hypothesis testing in multiple linear regression
What is the overall adequacy of the model?
F-test for goodness-of-
Math439: Linear Statistical Models
Extra-sum-of-squares method
Question: Does a subset of $r < p$ regressors contribute significantly to the regression
model
Math439: Linear Statistical Models
Partial F-test: compare first-order model and
second-order model
Consider a regression with two predictors $x_1$ and $x_2$
Math439: Linear Statistical Models
Interpretation of regression coefficients
Consider $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$. After fitting the model, how should we
interpret $\beta_1$?
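One standard reading of the slope on $x_1$ is the change in the fitted mean response per unit increase in $x_1$ with $x_2$ held fixed. The sketch below illustrates that reading on made-up data. It fits the two-predictor model by solving the normal equations in plain Python; the helper names `solve`, `fit_two_predictors`, and `predict` are hypothetical and not part of the course materials.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_two_predictors(x1, x2, y):
    """Least-squares estimates (b0, b1, b2) from the normal equations X'X b = X'y."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)


# Hypothetical data, used only for illustration.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]

b0, b1, b2 = fit_two_predictors(x1, x2, y)


def predict(u, v):
    """Fitted mean response at x1 = u, x2 = v."""
    return b0 + b1 * u + b2 * v


# Raise x1 by one unit while keeping x2 fixed at 4.
effect = predict(4.0, 4.0) - predict(3.0, 4.0)
print("estimated slope on x1:", b1)
print("change in fitted value:", effect)
```

The two printed numbers agree up to rounding, whatever value $x_2$ is fixed at. This reading presumes that $x_1$ can be varied while $x_2$ stays constant, which may not be realistic when the predictors are strongly related.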
Math439: Linear Statistical Models
Regression diagnosis
Checking model assumptions
1. Linear relationship between $y$ and $x$
2. $E(\epsilon_i) = 0$
3. Homosceda
Math439: Linear Statistical Models
Box-Cox transformation
Box and Cox (1964) suggested a family of transformations
designed to reduce nonnormality of
Math439: Linear Statistical Models
Checking Heteroskedasticity in Linear Regression
Outline
Heteroskedasticity: Violation of the constant variance
ass
Math439: Linear Statistical Models
Linear Regression with Autocorrelated Errors
Autocorrelation
Regression models using time series data often involve
Math439: Linear Statistical Models
Outlier detection
Good and bad scatter plots
See star.r.
Outliers
In regression analysis, the model assumption