The Annals of Statistics
1995, Vol. 23, No. 6, 1865–1895
THE 1994 NEYMAN MEMORIAL LECTURE
SMOOTHING SPLINE ANOVA FOR EXPONENTIAL
FAMILIES, WITH APPLICATION TO THE
WISCONSIN EPIDEMIOLOGICAL STUDY
OF DIABETIC RETINOPATHY¹
By Grace Wahba,² Yuedong Wang,³ Chong Gu,⁴ Ronald Klein⁵ and Barbara Klein⁶
University of Wisconsin–Madison, University of Michigan, Purdue University, University of Wisconsin–Madison and University of Wisconsin–Madison
Let $y_i$, $i = 1, \ldots, n$, be independent observations with the density of $y_i$ of the form $h(y_i, f_i) = \exp[y_i f_i - b(f_i) + c(y_i)]$, where $b$ and $c$ are given functions and $b$ is twice continuously differentiable and bounded away from 0. Let $f_i = f(t(i))$, where $t = (t_1, \ldots, t_d) \in \mathcal{T}^{(1)} \otimes \cdots \otimes \mathcal{T}^{(d)} = \mathcal{T}$, the $\mathcal{T}^{(\alpha)}$ are measurable spaces of rather general form and $f$ is an unknown function on $\mathcal{T}$ with some assumed "smoothness" properties. Given $\{y_i, t(i), i = 1, \ldots, n\}$, it is desired to estimate $f(t)$ for $t$ in some region of interest contained in $\mathcal{T}$. We develop the fitting of smoothing spline ANOVA models to this data of the form $f(t) = C + \sum_{\alpha} f_{\alpha}(t_{\alpha}) + \sum_{\alpha < \beta} f_{\alpha\beta}(t_{\alpha}, t_{\beta}) + \cdots$. The components of the decomposition satisfy side conditions which generalize the usual side conditions for parametric ANOVA. The estimate of $f$ is obtained as the minimizer, in an appropriate function space, of $L(y, f) + \sum_{\alpha} \lambda_{\alpha} J_{\alpha}(f_{\alpha}) + \sum_{\alpha < \beta} \lambda_{\alpha\beta} J_{\alpha\beta}(f_{\alpha\beta}) + \cdots$, where $L(y, f)$ is the negative log likelihood of $y = (y_1, \ldots, y_n)'$ given $f$, the $J_{\alpha}, J_{\alpha\beta}, \ldots$ are quadratic penalty functionals and the ANOVA decomposition is terminated in some manner. There are five major parts required to turn this program into a practical data analysis tool: (1) methods for deciding which terms in the ANOVA decomposition to include (model selection), (2) methods for choosing good values of the smoothing parameters $\lambda_{\alpha}, \lambda_{\alpha\beta}, \ldots$, (3) methods for making confidence statements concerning the estimate, (4) numerical algorithms for the calculations and, finally, (5) public software. In this paper we carry out this program, relying on earlier work and filling in important gaps. The overall scheme is applied
Received January 1995; revised May 1995.
¹ This work formed the basis for the Neyman Lecture at the 57th Annual Meeting of the Institute of Mathematical Statistics, Chapel Hill, North Carolina, June 23, 1994, presented by Grace Wahba.
² Research supported in part by NIH Grant EY09946 and NSF Grant DMS9121003.
³ Research supported in part by NIH Grants EY09446, P60DK20572 and P30HD18258.
⁴ Research supported by NSF Grant DMS9301511.
⁵ Research supported by NIH Grant EY03083.
⁶ Research supported by NIH Grant EY03083.
AMS 1991 subject classifications. Primary 62G07, 92C60, 68T05, 65D07, 65D10, 62A99, 62J07; secondary 41A63, 41A15, 62M30, 65D15, 92H25, 49M15.
Key words and phrases. Smoothing spline ANOVA, nonparametric regression, exponential families, risk factor estimation.
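As an illustration of the density form $\exp[y_i f_i - b(f_i) + c(y_i)]$ stated in the abstract, the worked equations below sketch how two standard exponential families can be written in that form, and what the negative log likelihood $L(y, f)$ then looks like. The symbols $p_i$ (a Bernoulli success probability) and $\mu_i$ (a Poisson mean) are introduced here only for illustration and are not notation taken from the abstract.

```latex
\begin{align*}
% Bernoulli: y_i \in \{0, 1\}, P(y_i = 1) = p_i, with the logit playing the role of f_i
p_i^{\,y_i} (1 - p_i)^{1 - y_i}
  &= \exp\bigl[\, y_i f_i - \log(1 + e^{f_i}) \,\bigr],
  & f_i &= \log\frac{p_i}{1 - p_i},
  & b(f) &= \log(1 + e^{f}), \quad c(y) = 0, \\
% Poisson: y_i \in \{0, 1, 2, \ldots\}, mean \mu_i, with the log mean playing the role of f_i
\frac{\mu_i^{\,y_i} e^{-\mu_i}}{y_i!}
  &= \exp\bigl[\, y_i f_i - e^{f_i} - \log(y_i!) \,\bigr],
  & f_i &= \log \mu_i,
  & b(f) &= e^{f}, \quad c(y) = -\log(y!), \\
% Negative log likelihood for independent observations; the c-term does not involve f
L(y, f) &= \sum_{i=1}^{n} \bigl[ -y_i f_i + b(f_i) \bigr] - \sum_{i=1}^{n} c(y_i).
\end{align*}
```

Because the last sum does not depend on $f$, it can be dropped when minimizing the penalized criterion over $f$; only the terms $-y_i f_i + b(f_i)$ interact with the penalty functionals.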