PUBH 7430
Assignment 3
October 18, 2011
Due: October 27, 2011 at the beginning of class
On this assignment, as in all homeworks, UNEDITED COMPUTER OUTPUT IS
NOT ACCEPTABLE. Tables and plots prepared from statistical output should be appropriate for inclus

PUBH 7430
Assignment 3 Solutions
October 18, 2011
Due: October 27, 2011 at the beginning of class
On this assignment, as in all homeworks, UNEDITED COMPUTER OUTPUT IS
NOT ACCEPTABLE. Tables and plots prepared from statistical output should be appropriate

B. Exploring Correlated Data Mean Structure
Every good statistical analysis begins with an
ocular test, that is, a good look at the data.
Here are some recommendations for plotting
correlated data:
(1) Show as much of the raw data as possible;
minimize su

BASIC DIAGNOSTICS
Fitted values and residuals for GLMs are defined
as we expect:
$\hat{Y}_{ij} = X_{ij}\hat{\beta}$
$e_{ij} = Y_{ij} - \hat{Y}_{ij}$
and are easily obtained from PROC MIXED:
MODEL y = / OUTPM = diag;
The data set diag will contain both the fitted
values and the residuals. They ca

ESTIMATION FOR
Recall from a regression model (independent
errors) that there are several possible estimates for
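$\sigma^2$:
$\hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$
$\hat{\sigma}^2 = \frac{1}{n-K}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 = MSE$
Why do we use the $MSE$ instead of $\hat{\sigma}^2_{ML}$?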
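The two estimators differ only in the divisor, which a small numerical sketch makes concrete. The data below are made up for illustration, and the sketch assumes a simple regression with an intercept and one slope, so that K = 2. The helper names are hypothetical.

```python
# Minimal sketch: ML versus MSE estimates of the error variance in a
# simple linear regression. Data and function names are illustrative only.

def fit_simple_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

def variance_estimates(x, y, K=2):
    """Return (ML estimate, MSE); K = number of regression coefficients (assumed 2 here)."""
    n = len(x)
    b0, b1 = fit_simple_regression(x, y)
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return rss / n, rss / (n - K)

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up covariate values
y = [1.1, 1.9, 3.2, 3.8, 5.3]   # made-up outcomes
sigma2_ml, mse = variance_estimates(x, y)
print(sigma2_ml, mse)
```

Because both estimates share the same residual sum of squares, the ML estimate is always the smaller of the two. The gap shrinks as n grows relative to K.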
This is true in models with correla

V. GENERALIZED LINEAR MIXED MODELS
A. INTRODUCTION
As with normally distributed data and GLMMs,
sometimes it is easier to conceptualize a
correlation as having arisen from random effects.
GLM
$Y_{ij} \sim$ Normal
$E[Y_{ij}] = X_{ij}\beta$
$Var[Y_i] = \Sigma_i$
GLMM
$Y_{ij} \sim$ Normal
E[Y
h i

III. METHODS FOR NORMALLY
DISTRIBUTED CORRELATED DATA
A. General Linear Models (GLMs)
Recall the usual regression model
$Y_i = X_i\beta + \epsilon_i$
$\epsilon_i \sim N(0, \sigma^2)$ independent, $i = 1, \ldots, n$
where Yi is the one outcome measure from each
of n individuals.
For correlated data

III B. General Linear Mixed Models (GLMMs)
INTRODUCTION
Example I - Animal Behavior
Study objective: to study the learning of
termite fishing by young chimpanzees
Study design: video-taped focal follows of
chimpanzee mothers in 15 min increments over
fo

PREDICTION OF FITTED MEANS
In mixed models, it is important to recognize that
two types of fitted means can be computed:
Population-averaged (or marginal) mean:
E [Yij ] =
Cluster-specific (or conditional) mean:
E [Yij | i] =
They are predicted using
an

MAXIMIZATION DETAILS
Whether ML or REML is used, something that
looks like a likelihood needs to be maximized. SAS
calls this the objective function and it is:
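$L(\theta) = -\frac{1}{2}\log|\Sigma| - \frac{1}{2} r'\Sigma^{-1}r - \frac{n}{2}\log(2\pi)$
for ML
$L(\theta) = -\frac{1}{2}\log|\Sigma| - \frac{1}{2}\log|X'\Sigma^{-1}X| - \frac{1}{2} r'\Sigma^{-1}r - \frac{n-p}{2}\log(2\pi)$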
fo

EDA
Mean Model EDA (what goes in Xi )
As for GLMs.
Covariance Model EDA (what goes in Zi and Ri)
If you are considering a model with random
slopes, you need to verify that each cluster's
trend over time is approximately linear. It is
not sufficient to ver

NAME: _SOLUTIONS_
PUBH 7430 Midterm examination
November 3, 2011
This exam runs from 9:45 a.m. until 11:00 a.m.
There are 5 questions, for a total of 80 points.
There is an (optional) BONUS question worth 5 additional
points at the end of Question 1.
Clos

IV. GENERALIZED LINEAR MODELS
A. INTRODUCTION
A general linear model says
$Y_i = X_i\beta + \epsilon_i$
$\epsilon_i \sim N(0, \Sigma_i)$
which implies that $Y_i \sim N(X_i\beta, \Sigma_i)$.
But what if our outcome variable is a count, a
categorical variable, or some other kind of variable
for which normality is N

B. Derived Variables Models
A derived variable analysis first reduces the vector
Yi for each individual to a single value Yi, and then
uses that new single value as the response variable
in a standard analysis.
What kind of single values?
EX Yi =
EX Yi =
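As a hedged illustration of the reduction step, the sketch below collapses each individual's repeated measurements to one number. The subject mean and the least-squares slope over time are used here as two possible choices of derived variable; they are assumptions for illustration, not necessarily the examples intended above. The data and function names are hypothetical.

```python
# Minimal sketch of a derived variable analysis: reduce each subject's
# vector of (time, outcome) pairs to a single summary value.
# Data, subject labels, and helper names are made up for illustration.

data = {
    "subj1": [(0, 5.0), (1, 6.0), (2, 7.0)],
    "subj2": [(0, 4.0), (1, 4.0), (2, 4.0)],
}

def subject_mean(obs):
    """Average outcome across a subject's measurements."""
    return sum(y for _, y in obs) / len(obs)

def subject_slope(obs):
    """Least-squares slope of outcome on time for one subject."""
    n = len(obs)
    tbar = sum(t for t, _ in obs) / n
    ybar = sum(y for _, y in obs) / n
    stt = sum((t - tbar) ** 2 for t, _ in obs)
    sty = sum((t - tbar) * (y - ybar) for t, y in obs)
    return sty / stt

# One derived value per subject; these become the responses in a
# standard (independent-data) analysis such as a t-test or regression.
derived_means = {s: subject_mean(o) for s, o in data.items()}
derived_slopes = {s: subject_slope(o) for s, o in data.items()}
print(derived_means, derived_slopes)
```

Each subject now contributes a single value, so methods for independent data apply to the derived values.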

D. GzLMs for CORRELATED DATA
Recall that a regression's likelihood (where $Y_i$ is a
single observation):
$\prod_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{-\frac{1}{2}\left(\frac{Y_i - X_i\beta}{\sigma}\right)^2\right\}$
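could be easily expanded to a GLM likelihood (writing $n_i$ for the length of $Y_i$):
$\prod_{i=1}^{n} \frac{1}{(2\pi)^{n_i/2}|\Sigma_i|^{1/2}} \exp\left\{-\frac{1}{2}(Y_i - X_i\beta)'\Sigma_i^{-1}(Y_i - X_i\beta)\right\}$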
where Yi is a vector

Summarized Data Plots
(1) Boxplot at each time point or for each cluster
Disadvantages: gives a very crude view of changes
in level and changes in variability. Can't be
done for binary outcomes.
(2) Plot of average or median at each time point
or for each

GLMMs FOR CLUSTER-CORRELATED DATA
(a.k.a. other kinds of random effects)
Recall the chimp learning (termite fishing) example and consider the random intercept model:
Yij =
age skill
acquired = 1 + 2sex + i1 + ij
by offspring ij
where i = chimp mother and

E. REVIEW OF MULTIPLE LINEAR REGRESSION
IN MATRIX NOTATION
EXAMPLE Contact Time Study
Each year the U.S. Naval Postgraduate School sets
aside a Discovery Day during which the general
public is invited into their laboratories. Our data
come from 21 October

D. Review of Matrices
Goal: to familiarize everyone with why (and how)
we can write down regression models as either
$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i4} + \epsilon_i$
or equivalently as
$Y = X\beta + \epsilon$.
We will use both types of notation throughout the
course.
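To see that the two notations describe the same computation, here is a small sketch that evaluates the mean part of the model both ways. It uses made-up numbers and, for brevity, two covariates rather than four. The design matrix, the coefficient values, and the helper `mat_vec` are all assumptions for illustration.

```python
# Minimal sketch: the scalar form beta_0 + beta_1*X_i1 + beta_2*X_i2 and
# the matrix form X @ beta give the same fitted means.
# All numbers are hypothetical.

X = [
    [1.0, 2.0, 0.5],   # leading 1.0 carries the intercept
    [1.0, 0.0, 1.5],
    [1.0, 3.0, 2.0],
]
beta = [1.0, 0.5, -2.0]   # (beta_0, beta_1, beta_2)

# Scalar notation: one equation per observation i.
scalar_form = [beta[0] + beta[1] * row[1] + beta[2] * row[2] for row in X]

def mat_vec(matrix, vector):
    """Matrix-vector product: each entry is a row of the matrix dotted with the vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Matrix notation: all observations at once.
matrix_form = mat_vec(X, beta)

print(scalar_form)
print(matrix_form)
```

The column of ones in `X` is what lets the intercept be absorbed into the single product $X\beta$.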
Notation

D. EDA for NON-NORMAL DATA
It is still of course important to determine how
covariates should be entered into a model. EDA
plots are more difficult for GzLMs because of the
link function.
Bernoulli - Mean EDA
For categorical Xij , compute the observed P

ESTIMATION FOR
Our general linear model is
$Y_i = X_i\beta + \epsilon_i$, $i = 1, \ldots, n$
$\epsilon_i \sim N(0, \Sigma_i)$ independent
which implies that
$Y_i \sim N(X_i\beta, \Sigma_i)$ independent.
Recall that for a regression (independent data, Yi
is a number not a vector), we wrote down a
formula for the bel

Lecture Notes for
Methods for Correlated
Data
PubH 7430 Fall 2005
© Dr. Lynn E. Eberly
Last modified: September 5, 2005
I. Introduction
A. Examples
Six Cities Study
- prospective observational study of the
effects of air pollution in adults and children

D. MODEL FITTING ISSUES
OVERALL MODELING STRATEGY
Our overall modeling strategy is exactly the same
as we saw for GLMs and GLMMs (see e.g. p.143).
GzLMMs are likelihood based, so there are
likelihood ratio tests to compare nested models,
and AIC and BIC v

TESTING
Components in $\beta$ can be tested just like we did for
GLMs:
t
B 0
= q
1B
B 0 (X 0X)
F =
h
i1
0
0
1
0
0
B
(B
) B (X X)
(B 0
)
rank (B)
B is defined as before: a vector or matrix of
contrast coefficients that indicate which
components of $\beta$ are tested.
T

II. Classical Methods for Correlated Data
We will talk about two classical methods:
derived variable models
repeated measures ANOVA models.
Several other classical methods exist (such as
MANOVA) for non-longitudinal correlated data,
but they will not be

ESTIMATION of
Recall our GLMM looks like
Yi = Xi + Zii + i
i N (0, D)
i N (0, Ri )
(usually Ri = 2I)
Therefore:
E[Yi] =
V ar[Yi ] =
Yi
Note that our variances are an explicit function of
what's in $Z_i$ (e.g., longitudinal times $t_1, \ldots, t_J$).
Since we sti

PUBH 7430
Lecture 20
J. Wolfson
Division of Biostatistics
University of Minnesota School of Public Health
November 17, 2011
Generalized linear mixed models
Linear mixed models
So far, we have dealt with the linear mixed model (LMM)
Yij = xij

PUBH 7430
Lecture 19
J. Wolfson
Division of Biostatistics
University of Minnesota School of Public Health
November 15, 2011
Random slopes
Random effects model for effect of GART on HIV viral
load
Viral loads within individuals are correlated across time
Sub

PUBH 7430
Lecture 4
J. Wolfson
Division of Biostatistics
University of Minnesota School of Public Health
September 15, 2010
A little bit of notation
Notation - independent data
With independent data on n units, for each unit j we have
(Scalar, one-dimens