Topics to be covered this week
Winter Quarter, 2016
Monday, Jan 25
Three-factor studies (chap 22, App. Lin. Stat. Models).
Wednesday, Jan 27
Random effects model (chaps 25.1-25.4, App. Lin. Stat. Models).
Homework 3 (Due on Monday, Feb 1)
Due: November 24, 2014, in class
1. Model Selection Procedures and Model Validation. Diabetes data. These data
consist of 19 variables on 403 of the 1046 subjects who were interviewed in a
study to understand the prevalenc
14.44) To receive credit if this problem is chosen for grading, you must show sufficient work in deriving the second partial derivatives. After obtaining the expressions, evaluate them in R to calculate the covariance matrix. DO NOT simply use the vcov()
I wanted to clarify that x1^2:x2^2 is not a second-order term. However, I included it because the student solution manual for your text includes 6 terms for this problem. The 6 terms are not defined, and x1^2:x2^2 was the only thing I could think of
Here are some tips on the use of R for Homework 5.
(a) You may use the lm.ridge command in the MASS package. It also gives you the GCV values.
(b) For ridge regression, it is possible to get VIF values smaller than 1.
(c) Parts (c) and (d) of question 3: in order to
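Tip (b) can be checked numerically. The sketch below is in Python rather than R and assumes the textbook definition of the ridge VIFs as the diagonal elements of (r_XX + cI)^{-1} r_XX (r_XX + cI)^{-1}, where r_XX is the correlation matrix of the predictors and c is the ridge constant. The helper names and the two-predictor setup with correlation 0.9 are hypothetical and chosen only for illustration.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inv2(m):
    """Inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def ridge_vif(r, c):
    """Ridge VIFs for two standardized predictors with correlation r."""
    r_xx = [[1.0, r], [r, 1.0]]                  # correlation matrix
    shrunk = inv2([[1.0 + c, r], [r, 1.0 + c]])  # (r_XX + cI)^{-1}
    v = matmul2(matmul2(shrunk, r_xx), shrunk)
    return [v[0][0], v[1][1]]                    # diagonal elements


# Assumed illustration: correlation 0.9 and a few ridge constants.
for c in (0.0, 0.1, 0.5):
    print(c, ridge_vif(0.9, c))
```

With c = 0 the function reduces to the ordinary VIF, 1 / (1 - r^2). As c grows, the values shrink and eventually drop below 1, which is the behavior described in (b).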
Unbalanced two-factor ANOVA
Consider the Hay Fever Relief data, but suppose that a few observations are missing, so that
this is an example of an unbalanced two-factor study.
Factor B (ingredient 2)
Factor A (ingredient 1)
2.4, 2.7, 2.3,
Pooling of sums of squares:
When the interaction effects can be ignored, either because of prior knowledge or because of an F-test for
interactions, an alternative estimator of $\sigma^2$ can be provided. Without the interaction terms, the
model becomes: $Y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}$
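The pooling idea can be written out as a short worked formula. This is a sketch under the usual two-factor notation, which is assumed here rather than taken from the text: $SSAB$ and $SSE$ are the interaction and error sums of squares, factor A has $a$ levels, factor B has $b$ levels, and $n_T$ is the total number of observations.

```latex
\hat{\sigma}^2_{\text{pooled}}
  = \frac{SSAB + SSE}{(a-1)(b-1) + (n_T - ab)}
  = \frac{SSAB + SSE}{n_T - a - b + 1}
```

In a balanced study $n_T = abn$. With unequal cell sizes, $SSAB$ should be read as the extra sum of squares for interactions given both main effects, so that the pooled estimator is the residual mean square of the model without interactions.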
We will deal with three-factor models here. Note that once we know how to
analyze the two-factor case, this knowledge basically allows us to tackle three-factor
or higher-order studies, except that the notation becomes increasingly messy
Topics covered this week
Winter Quarter, 2016
Wednesday, Feb 17
Longitudinal data analysis (chap 9, Extending the Linear Model with R).
Homework 5 (Due on Wednesday, Feb 24).
(From App. Lin. Stat. Models) Problems 26.4, 26.5, 26.6, 26.19, 2
Consider the car price data, which contain information on the cars sold in the U.S. in 1993. The response
variable is price, and there are 6 other variables: city MPG, hwy MPG, engine size, HP, tank size, and weight.
The goal is t
So far we have only dealt with the cases where the response variable is quantitative and the mathematical
assumption is that the response is a continuous variable. Let us look at two data sets. The first data has a
Partial Least Squares
Partial least squares (PLS) is a method that is quite popular among many scientists. Like ridge
regression, this method is useful when there is a substantial amount of collinearity among the independent
variables. It i
Lasso is similar in spirit to ridge regression except that the penalty is different. For the purpose of
discussion we will assume that all the variables have been standardized and call them $Y, X_1, \ldots, X_6$. In ridge
regression we ha
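The difference between the two penalties can be seen by writing the criteria side by side. This is a sketch in assumed notation: $\lambda \ge 0$ is a tuning constant and $\beta_1, \ldots, \beta_6$ are the coefficients of the six standardized predictors; neither symbol is taken from the text above. No intercept appears because all variables are standardized.

```latex
\text{ridge:}\quad
\min_{\beta}\ \sum_{i=1}^{n}\Bigl(Y_i - \sum_{k=1}^{6}\beta_k X_{ik}\Bigr)^2
  + \lambda \sum_{k=1}^{6} \beta_k^2
\qquad
\text{lasso:}\quad
\min_{\beta}\ \sum_{i=1}^{n}\Bigl(Y_i - \sum_{k=1}^{6}\beta_k X_{ik}\Bigr)^2
  + \lambda \sum_{k=1}^{6} \lvert \beta_k \rvert
```

The squared penalty shrinks all coefficients smoothly toward zero, whereas the absolute-value penalty can set some coefficients exactly to zero.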
Longitudinal Data Analysis.
Longitudinal data analysis is related to repeated measures designs. Consider a few examples below. In
each case, you have individuals or subjects being observed over time. For example, in the rat growth data,
Two-factor studies (balanced)
Consider a two-factor case where factor A has a levels, factor B has b levels, and there are n observations
for each of the ab combinations of the factors. For the Hay Fever Relief example, 9 compounds for hay fever
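For reference, the balanced two-factor setup is commonly written in factor-effects form as below. This is a sketch in assumed notation (the symbols $\mu$, $\alpha_i$, $\beta_j$, $(\alpha\beta)_{ij}$ and the sum-to-zero constraints are the standard textbook choices, not quoted from this handout).

```latex
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad i = 1,\dots,a,\quad j = 1,\dots,b,\quad k = 1,\dots,n,

\sum_{i}\alpha_i = 0,\quad \sum_{j}\beta_j = 0,\quad
\sum_{i}(\alpha\beta)_{ij} = 0 \ \text{for each } j,\quad
\sum_{j}(\alpha\beta)_{ij} = 0 \ \text{for each } i,\qquad
\varepsilon_{ijk} \overset{\text{iid}}{\sim} N(0, \sigma^2)
```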
Repeated measures design
This design is often employed in practice when the same subject (or object) is subjected to a number of
treatments. It is a two-factor model in which one factor (subject) is usually treated as random whereas the
Repeated Measures and Split-plot designs.
This handout lists a few more repeated measures designs which are a bit more complicated than the ones
in Handout 6. In addition it also presents the basic ideas behind Split-Plot designs, which are qui
Nested design (both factors fixed):
Consider the data set given later. A company runs three schools for mechanics, one in each of the
following three cities: Atlanta (i=1), Chicago (i=2) and San Francisco (i=3). The instructors are:
Random effects model
We will consider a one-factor model where the factor effects are random. The following example is useful.
Coil winding machines: A plant contains a large number of coil winding machines. A production analyst
studied a certain ch
Random and Mixed Effects Models.
Two factor models (both factors random, ANOVA model II).
For this model we have two factors A and B, both random. An example (data on "miles per gallon") is
given later where both the factors driver (factor A) and d