• Posterior predictive simulation: ... knowledge of β and σ² is summarized by our posterior distribution. First draw (β, σ²) from
their joint posterior distribution, then draw $\tilde{y} \sim N(\tilde{X}\beta, \sigma^2 I)$, where $\tilde{X}$ holds the covariates of the observations to be predicted.
• Analytic form of the posterior predictive distribution:
$p(\tilde{y} \mid y)$ is multivariate t with location $\tilde{X}\hat{\beta}$, squared scale matrix
$s^2 (I + \tilde{X} (X^T X)^{-1} \tilde{X}^T)$, and n − k degrees of freedom.
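The two bullets above can be checked against each other numerically. The following is a minimal sketch in R, not the course's own code. It assumes the standard noninformative prior p(β, σ²) ∝ 1/σ², under which σ² | y is scaled inverse chi-square with n − k degrees of freedom and scale s², and β | σ², y is normal with mean β̂ and covariance σ²(XᵀX)⁻¹. The data, the new covariate rows `X_new`, and all variable names are hypothetical.

```r
set.seed(1)

# Hypothetical data: n observations, k = 2 coefficients (intercept + slope).
n <- 30
X <- cbind(1, runif(n, 0, 10))
y <- as.vector(X %*% c(1, 0.5) + rnorm(n, sd = 0.8))
k <- ncol(X)

# Least-squares quantities that index the posterior.
XtX_inv  <- solve(crossprod(X))                      # (X'X)^{-1}
beta_hat <- as.vector(XtX_inv %*% crossprod(X, y))   # (X'X)^{-1} X'y
s2       <- sum((y - X %*% beta_hat)^2) / (n - k)    # residual variance s^2

# Hypothetical covariate rows at which to predict.
X_new <- cbind(1, c(2, 5, 8))
m     <- nrow(X_new)

# Simulation: draw (beta, sigma^2) from the joint posterior, then y_tilde.
n_sims <- 5000
R      <- chol(XtX_inv)        # upper triangular, t(R) %*% R = (X'X)^{-1}
y_rep  <- matrix(NA_real_, n_sims, m)
for (i in seq_len(n_sims)) {
  sigma2 <- (n - k) * s2 / rchisq(1, df = n - k)                    # sigma^2 | y
  beta   <- beta_hat + sqrt(sigma2) * as.vector(t(R) %*% rnorm(k))  # beta | sigma^2, y
  y_rep[i, ] <- as.vector(X_new %*% beta) + rnorm(m, sd = sqrt(sigma2))
}

# Analytic form: multivariate t with n - k degrees of freedom.
analytic_loc <- as.vector(X_new %*% beta_hat)
scale_mat    <- s2 * (diag(m) + X_new %*% XtX_inv %*% t(X_new))
# A multivariate t with nu > 2 degrees of freedom has covariance
# nu / (nu - 2) times its squared scale matrix.
analytic_cov <- scale_mat * (n - k) / (n - k - 2)

pred_mean <- colMeans(y_rep)
pred_cov  <- cov(y_rep)
print(rbind(simulated = pred_mean, analytic = analytic_loc))
```

With enough draws, the simulated means and covariance should sit close to the analytic location and covariance; the simulation route has the advantage that it extends to summaries with no closed form.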
Model checking and robustness
• Suppose one simulates many samples $\tilde{y}_1, \ldots, \tilde{y}_n$ from the
posterior predictive distribution conditional on the same
covariate vectors $x_1, \ldots, x_n$ used to simulate the data.
• To judge if a particular response value $y_i$ is consistent with the
fitted model, one looks at the position of $y_i$ relative to the
histogram of simulated values of $\tilde{y}_i$ from the corresponding
predictive distribution.
• If $y_i$ is in the tail of the distribution, that indicates that this observation is a potential outlier.

MATH-440 Linear Regression

Example
Measurements on breeding pairs of land-bird species were collected
from 16 islands around Britain over the course of several decades.
The dataset birdextinct.txt contains the following variables for each species:
• TIME: the average time of extinction of the species on the
island where it appeared
• NESTING: the average number of nesting pairs
• SIZE: the size of the species (0=small or 1=large)
• STATUS: the migratory status of the species (0=migrant or 1=resident)
The objective is to fit a model that relates the time of extinction of
the bird species to the covariates.
bird = read.table("birdextinct.txt", header=TRUE, sep="\t")
attach(bird)  # make the columns (TIME, NESTING, ...) available by name
hist(TIME)
The distribution of the outcome variable, TIME, is strongly
right-skewed. Let's transform it to the log-scale:
LOGTIME = log(TIME)
hist(LOGTIME)

[Figure: Histogram of TIME; x-axis TIME, y-axis Frequency]
[Figure: Histogram of LOGTIME; x-axis LOGTIME, y-axis Frequency]

Let us look at the relationship between LOGTIME and the three
predictor variables.
plot(NESTING, LOGTIME)
out = (LOGTIME > 3)
text(NESTING[out], LOGTIME[out], label=SPECIES[out], pos=2)
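Points that stand out in a scatterplot like this one are natural candidates for the outlier check described under "Model checking and robustness". The sketch below illustrates that check on synthetic stand-in data, not on the bird data, with one deliberately planted outlier. It again assumes the noninformative prior p(β, σ²) ∝ 1/σ²; the 0.025 tail cutoff and all variable names are illustrative choices, not part of the slides.

```r
set.seed(2)

# Synthetic stand-in data (NOT the bird data): one predictor, with the
# last response shifted upward so there is a planted outlier to find.
n <- 40
x <- runif(n, 1, 10)
y <- 0.5 + 0.3 * x + rnorm(n, sd = 0.5)
y[n] <- y[n] + 3

X <- cbind(1, x)
k <- ncol(X)
XtX_inv  <- solve(crossprod(X))
beta_hat <- as.vector(XtX_inv %*% crossprod(X, y))
s2       <- sum((y - X %*% beta_hat)^2) / (n - k)

# Simulate replicated responses at the SAME covariate vectors x_1, ..., x_n.
n_sims <- 2000
R      <- chol(XtX_inv)        # t(R) %*% R = (X'X)^{-1}
y_rep  <- matrix(NA_real_, n_sims, n)
for (i in seq_len(n_sims)) {
  sigma2 <- (n - k) * s2 / rchisq(1, df = n - k)
  beta   <- beta_hat + sqrt(sigma2) * as.vector(t(R) %*% rnorm(k))
  y_rep[i, ] <- as.vector(X %*% beta) + rnorm(n, sd = sqrt(sigma2))
}

# Position of each observed y_i within its own predictive distribution:
# the smaller of the two tail proportions.
p_upper   <- sapply(seq_len(n), function(i) mean(y_rep[, i] >= y[i]))
tail_prob <- pmin(p_upper, 1 - p_upper)

# Illustrative cutoff: flag observations in the outer 2.5% of either tail.
flagged <- which(tail_prob < 0.025)
print(flagged)

# Histogram of simulated values for one observation, with the observed
# value marked by a vertical line.
hist(y_rep[, n], main = "Predictive distribution for observation n",
     xlab = "simulated response")
abline(v = y[n], lwd = 2)
```

Because the planted outlier also inflates s², the check is somewhat conservative; an observation that still falls in the tail after that inflation is a strong candidate for closer inspection.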