Chapter 4. Introduction to Generalized Linear Models
Deyuan Li
School of Management, Fudan University
Feb. 28, 2011
Outline
• 4.1 Generalized Linear Models
• 4.2 Generalized Linear Models for Binary Data
• 4.3 Generalized Linear Models for Counts
• 4.4 Moments and Likelihood for Generalized Linear Models
• 4.5 Inference for Generalized Linear Models
• 4.6 Fitting Generalized Linear Models
• 4.7 Quasilikelihood and Generalized Linear Models
4.1 Generalized Linear Models
Generalized linear models (GLMs) extend ordinary regression models to encompass nonnormal and nonidentical response distributions and modeling functions of the mean.
4.1.1 Components of generalized linear models
A GLM has three components: a random component, a systematic component, and a link function.
1) The random component consists of a response variable $Y$ with independent observations $(y_1, \ldots, y_N)$ from a distribution in the natural exponential family. This family has probability density or mass function of form
{Insert ......}
The value of the parameter $\theta_i$ may vary for $i = 1, \ldots, N$, depending on values of explanatory variables. The term $Q(\theta)$ is called the natural parameter.
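The displayed formula is not recoverable from this copy. As a hedged reconstruction, assuming the notation $a(\cdot)$, $b(\cdot)$ and $Q(\cdot)$ that Section 4.1.2 uses for the Bernoulli case, the standard natural exponential family form would read:

```latex
f(y_i; \theta_i) = a(\theta_i)\, b(y_i)\, \exp\!\left[ y_i\, Q(\theta_i) \right],
\qquad i = 1, \ldots, N.
```

Under this form, the Bernoulli mass function of Section 4.1.2, with $a(\pi) = 1 - \pi$, $b(y) = 1$ and $Q(\pi) = \log[\pi/(1-\pi)]$, is a special case.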
2) The systematic component relates a vector $(\eta_1, \ldots, \eta_N)$ to the explanatory variables through a linear model. Let $x_{ij}$ denote the value of predictor $j$ ($j = 1, 2, \ldots, p$) for subject $i$. Then
$$\eta_i = \sum_j \beta_j x_{ij}, \qquad i = 1, \ldots, N.$$
This linear combination of explanatory variables, $\eta_i$, is called the linear predictor. Usually, $x_{i1} = 1$ for all $i$, so that $\beta_1$ is the coefficient of an intercept (often denoted by $\alpha$) in the model.
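The linear predictor can be sketched in a few lines of Python. This is only an illustration: the function name `linear_predictor`, the data in `X` and the coefficient values in `beta` are all hypothetical and do not come from the slides.

```python
def linear_predictor(X, beta):
    """Return [eta_1, ..., eta_N], where eta_i = sum_j beta_j * x_ij."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]

# Hypothetical data: N = 3 subjects, p = 2 predictors.
# The first column is all ones, so beta[0] acts as the intercept alpha.
X = [
    [1.0, 0.5],
    [1.0, 1.5],
    [1.0, 3.0],
]
beta = [-1.0, 2.0]  # illustrative values, not estimates from any data

eta = linear_predictor(X, beta)
print(eta)  # [0.0, 2.0, 5.0]
```

Each row of `X` holds the predictor values for one subject, so the leading column of ones is what turns the first coefficient into an intercept.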
3) The link function connects the random and the systematic components. Let $\mu_i = E(Y_i)$, $i = 1, \ldots, N$. The model links $\mu_i$ to $\eta_i$ by
$$\eta_i = g(\mu_i),$$
where the link function $g$ is a monotonic, differentiable function.
Thus, $g$ links $E(Y_i)$ to explanatory variables through the formula
{Insert ......}
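The displayed formula is missing from this copy. Combining the two relations stated earlier, $\eta_i = g(\mu_i)$ and $\eta_i = \sum_j \beta_j x_{ij}$, it presumably reads as follows; this is a reconstruction, not the original slide content:

```latex
g(\mu_i) = \sum_j \beta_j x_{ij}, \qquad i = 1, \ldots, N,
\qquad\text{equivalently}\qquad
\mu_i = g^{-1}\!\Big( \sum_j \beta_j x_{ij} \Big).
```

The second form assumes that $g$ is strictly monotonic, so that the inverse $g^{-1}$ exists.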
The link function $g(\mu) = \mu$, called the identity link, has $\eta_i = \mu_i$, i.e., a linear model for the mean itself. This is the link function for ordinary regression with normally distributed $Y$.
The link function that transforms the mean to the natural parameter is called the canonical link. For it, $g(\mu_i) = Q(\theta_i)$, and $Q(\theta_i) = \sum_j \beta_j x_{ij}$.
In summary, a GLM is a linear model for a transformed mean of a
response variable that has distribution in the natural exponential
family.
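The summary can be illustrated with a minimal Python sketch that maps a linear predictor to a mean through an inverse link. The names `LINKS` and `glm_mean` and all numeric values are assumptions made for illustration. Only two links are included: the identity link and the logit link of Section 4.1.2.

```python
import math

# Each entry is a pair (g, g_inverse). The dictionary layout is illustrative.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
}

def glm_mean(x_row, beta, link="identity"):
    """Return mu = g^{-1}(eta), where eta = sum_j beta_j * x_j."""
    eta = sum(b * x for b, x in zip(beta, x_row))
    _, g_inverse = LINKS[link]
    return g_inverse(eta)

x_row = [1.0, 2.0]   # hypothetical subject: intercept term plus one predictor
beta = [-1.0, 0.5]   # illustrative coefficients, chosen so that eta = 0

mu_identity = glm_mean(x_row, beta, "identity")  # the mean equals eta
mu_logit = glm_mean(x_row, beta, "logit")        # the mean is a probability

print(mu_identity, mu_logit)  # 0.0 0.5
```

The same linear predictor gives different means under different links, which is the sense in which a GLM is a linear model for a transformed mean.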
4.1.2 Binomial logit models for binary data
The Bernoulli distribution has the probability mass function
$$f(y; \pi) = \pi^{y} (1-\pi)^{1-y} = (1-\pi)\left[\pi/(1-\pi)\right]^{y} = (1-\pi) \exp\left[ y \log \frac{\pi}{1-\pi} \right]$$
for $y = 0$ and $1$.
This is in the natural exponential family, with $\theta = \pi$, $a(\pi) = 1 - \pi$, $b(y) = 1$ and $Q(\pi) = \log[\pi/(1-\pi)]$.
The natural parameter $Q(\pi) = \log[\pi/(1-\pi)]$ is the log odds of response $y = 1$, i.e., the logit of $\pi$.
⇒ The canonical link function is the logit link, $\eta = \log[\pi/(1-\pi)]$. GLMs using the logit link are often called logit models.
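The exponential family factorization of the Bernoulli mass function can be checked numerically. The sketch below compares the direct form with the product $a(\pi)\, b(y) \exp[y\, Q(\pi)]$. The function names and the test probabilities are illustrative choices, not part of the slides.

```python
import math

def bernoulli_pmf(y, pi):
    """Direct form: pi^y * (1 - pi)^(1 - y)."""
    return pi ** y * (1.0 - pi) ** (1 - y)

def bernoulli_pmf_expfam(y, pi):
    """Exponential family form: a(pi) * b(y) * exp(y * Q(pi))."""
    a = 1.0 - pi                   # a(pi) = 1 - pi
    b = 1.0                        # b(y) = 1
    q = math.log(pi / (1.0 - pi))  # Q(pi) = logit(pi), the natural parameter
    return a * b * math.exp(y * q)

# Arbitrary probabilities strictly between 0 and 1.
for pi in (0.1, 0.5, 0.9):
    for y in (0, 1):
        assert math.isclose(bernoulli_pmf(y, pi), bernoulli_pmf_expfam(y, pi))
```

The check only makes sense for $0 < \pi < 1$, since the logit is undefined at the endpoints.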