Linear Regression
Response or dependent variable: Y
Predictors or independent variables: $X_1, X_2, \ldots, X_p$
GOALS:
Exploring p(y | x) as a function of x
Understanding the mean of Y as a function of x
Making predictions of Y for new x.
Searching for obser
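As a minimal sketch of the goals of understanding the mean of Y as a function of x and predicting Y at new x, assuming a single predictor and an ordinary least-squares fit (neither is specified above): the data values and the helper names `fit_line` and `predict` are hypothetical and serve only as illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Estimated mean of Y at a new value of x."""
    return intercept + slope * x_new


# Hypothetical data lying exactly on the line y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_line(xs, ys)
y_hat = predict(intercept, slope, 4.0)
```

With data that lie exactly on a line, the fit recovers that line, which makes the sketch easy to check by hand.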
Mercury in Bass
High levels of mercury in fish are known to cause health
problems, especially in children. Nicholas School study of
mercury concentrations in largemouth bass from the
Waccamaw and Lumber rivers in NC.
Mercury cannot be excreted and accumulates in
Bayesian approach to simple regression
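Model: $Y_i \overset{ind}{\sim} N(\beta_0 + \beta_1 x_i, \sigma^2)$, where $i = 1, \ldots, n$
Semi-conjugate priors for $(\beta_0, \beta_1, \sigma^2)$:
$(\beta_0, \beta_1) \sim MVN_2((\mu_0, \mu_1), \Sigma_0)$
$\Sigma_0$ is a $2 \times 2$ matrix of variances and covariance
$1/\sigma^2 \sim \text{Gamma}(\nu_0/2, \nu_0 \sigma_0^2/2)$
Here, $(\mu_0, \mu_1)$ is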
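For reference, a sketch of the full conditional distributions that semi-conjugate priors of this form lead to, which are the ingredients of a Gibbs sampler. The notation is an assumption made for this sketch: the coefficient vector is written $\beta = (\beta_0, \beta_1)^T$ with prior mean $\mu = (\mu_0, \mu_1)^T$ and prior covariance $\Sigma_0$, $X$ is the $n \times 2$ design matrix with rows $(1, x_i)$, and the Gamma distribution is taken in its shape and rate form.

```latex
% Assumed notation: \beta = (\beta_0, \beta_1)^T, prior mean \mu = (\mu_0, \mu_1)^T,
% X is the n x 2 design matrix with rows (1, x_i), y = (y_1, ..., y_n)^T.
\begin{align*}
\beta \mid y, \sigma^2 &\sim MVN_2(m, V), \\
V &= \left(\Sigma_0^{-1} + X^T X / \sigma^2\right)^{-1}, \\
m &= V \left(\Sigma_0^{-1} \mu + X^T y / \sigma^2\right), \\
1/\sigma^2 \mid y, \beta &\sim \text{Gamma}\!\left(\frac{\nu_0 + n}{2},
  \frac{\nu_0 \sigma_0^2 + SSR(\beta)}{2}\right), \\
SSR(\beta) &= \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 .
\end{align*}
```

Alternating draws from these two conditionals gives a Gibbs sampler for the joint posterior.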
Multivariate Normal Distribution
Our next major topics include hierarchical models and
regression (Chapters 8 and 9 in Hoff)
For Bayesian regression, we need the multivariate
normal (MVN) distribution (Chapter 7 in Hoff)
Today we introduce the MVN distribu
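As a small illustration of the MVN distribution in the bivariate case, the sketch below evaluates its log density using the closed-form inverse and determinant of a 2 x 2 covariance matrix. The function name `mvn2_logpdf` and the example inputs are hypothetical, not taken from the course materials.

```python
import math


def mvn2_logpdf(y, mean, cov):
    """Log density of a bivariate normal at the point y.

    cov is a 2 x 2 covariance matrix given as nested lists.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx = y[0] - mean[0]
    dy = y[1] - mean[1]
    # Quadratic form (y - mean)' cov^{-1} (y - mean), using the
    # closed-form inverse of a 2 x 2 matrix
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


# Density at the mean of a standard bivariate normal
density_at_mean = math.exp(mvn2_logpdf([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

With a diagonal covariance matrix the result factors into the product of two univariate normal densities, which gives a simple hand check.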
Bayesian Finite Population Inference
Population of N individuals.
Take a random sample (without replacement) of n
individuals.
Let $y_i$ be the value of a survey variable $Y$ for individual
$i$, where $i = 1, \ldots, N$.
Let $Y = (y_1, \ldots, y_n)$ be the sampled
Modeling Named Storms in Atlantic
World Meteorological Organization names tropical
storms and hurricanes
Has distribution of storms changed over time? If so,
when?
Data collected on number of named storms in Atlantic
Ocean from 1970 to 2010. Data available
Convergence to Posterior Distribution
Theory proves that if a Gibbs sampler iterates enough, the
draws will be from the joint posterior distribution (called the
target or stationary distribution). Convergence to the target
distribution does not depend on
Monte Carlo Sampling
We have seen that Monte Carlo sampling is a useful
tool for sampling from prior and posterior distributions
By limiting attention to conjugate prior distributions, all
models have had tractable posterior distributions so
sampling was
The Pygmalion Study
Do teachers' expectations impact academic development of
children?
Researchers gave IQ test to elementary school children
They randomly picked six children and told teachers
that the test predicts them to have high potential for
acceler
Posterior Integration
Suppose we want posterior distribution of function of $\theta$, say
$\phi = g(\theta)$
For expectation of $\phi$, we have
$\int \phi \, p(\phi \mid Y) \, d\phi = \int g(\theta) \, p(\theta \mid Y) \, d\theta$
What if we do not know how to compute the integral?
Common problem as we move into higher-dimensional
p
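One common way around an intractable integral is Monte Carlo approximation: average g over draws from the posterior. The sketch below, writing theta for the parameter and g for the function of interest, assumes a Beta(3, 9) posterior and g(theta) = theta^2. Both choices are hypothetical, picked because the exact expectation is known and can be compared with the estimate. The helper name `mc_expectation` is also an assumption.

```python
import random


def mc_expectation(g, draws):
    """Monte Carlo estimate of E[g(theta) | Y] from posterior draws of theta."""
    return sum(g(t) for t in draws) / len(draws)


rng = random.Random(1)  # fixed seed so the sketch is reproducible
a, b = 3.0, 9.0         # hypothetical Beta(a, b) posterior for theta
draws = [rng.betavariate(a, b) for _ in range(100_000)]

estimate = mc_expectation(lambda t: t ** 2, draws)
# Exact second moment of a Beta(a, b) distribution, for comparison
exact = a * (a + 1) / ((a + b) * (a + b + 1))
```

The same `mc_expectation` call works for any g, including functions whose posterior expectation has no closed form.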
Volcano Eruptions
What is the distribution of waiting times for volcanos?
Data on duration between eruptions (in months) of the
Mauna Loa volcano on Hawaii between 1832 and 1952.
n = 36 observations; see maunaloa.txt on Sakai.
Questions:
1. What is a 95%
Teenagers and Televisions
In 1998, the New York Times and CBS News polled 1048
randomly selected 13- to 17-year-olds to ask them if they had
a television in their room.
n = 1048 sampled teenagers
y = 692 had a television in their room
Inference about $\theta$, the
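A minimal sketch of one way to carry out this inference, writing theta for the unknown proportion of teenagers with a television in their room and assuming a binomial sampling model with a uniform Beta(1, 1) prior. The prior is an assumption of this sketch, not something stated above. Under these assumptions the posterior is Beta(1 + y, 1 + n - y).

```python
import random

n, y = 1048, 692               # sample size and number with a television
a_prior, b_prior = 1.0, 1.0    # assumed uniform Beta(1, 1) prior

# Conjugate update: Beta(a + y, b + n - y)
a_post = a_prior + y
b_post = b_prior + (n - y)
post_mean = a_post / (a_post + b_post)

# Approximate 95% posterior interval from sorted Monte Carlo draws
rng = random.Random(0)
draws = sorted(rng.betavariate(a_post, b_post) for _ in range(50_000))
lower = draws[int(0.025 * len(draws))]
upper = draws[int(0.975 * len(draws))]
```

The posterior mean here is 693/1050 = 0.66, close to the sample proportion 692/1048, as expected when the prior carries little information relative to the data.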
Motivating Bayesian Inference
Significance tests and confidence intervals are forms of
classical or frequentist inference.
When might classical inference be inadequate?
Suppose you flip a coin (with unknown probability of
heads) three times and get tails all th
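To make the coin example concrete, here is a small sketch comparing the sample proportion of heads with a Bayesian posterior mean. The uniform Beta(1, 1) prior is an assumption of this sketch; the slide does not say which prior, if any, it goes on to use.

```python
from fractions import Fraction

n, heads = 3, 0  # three flips, all tails

# Classical point estimate: the sample proportion of heads
sample_proportion = Fraction(heads, n)

# Posterior mean under an assumed uniform Beta(1, 1) prior:
# the posterior is Beta(1 + heads, 1 + n - heads)
posterior_mean = Fraction(1 + heads, 2 + n)
```

The sample proportion is 0, which would suggest heads is impossible, while the posterior mean under the assumed prior is 1/5.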
STA 122/290 course web page
www.stat.duke.edu/~jerry/sta122/sta122n290f11.html
Access via Sakai
Access via my home page
Access via department web pages (courses in fall 2011)
What is modern statistical data analysis?
Modern statistical data analysi
Examples of poor study designs
For STA 122/290
Prof. Jerry Reiter
Problematic survey design
Literary Digest calls 1936 election wrong
Questionnaires mailed to 10 million people. Addresses
collected from telephone books and club memberships.
About 2.4 mil