ISyE8843A, Brani Vidakovic, Handout 8

1 Hierarchical Bayes and Empirical Bayes. MLII Method.

Hierarchical Bayes and Empirical Bayes share the same goals but differ in how those goals are achieved. The attribute "hierarchical" refers mostly to the modeling strategy, while "empirical" refers to the methodology. Both methods are concerned with specifying the distribution at the prior level: hierarchical Bayes via Bayesian inference involving additional levels of hierarchy (hyperpriors and hyperparameters), empirical Bayes by using the data more directly.

In expanding Bayesian models and inference to more complex problems, going beyond the simple likelihood-prior-posterior scheme, a hierarchy of models may be needed. The parameters of interest enter the model via their "realizations," which are modeled as if they were "measurements." The common name, parameter population distribution, is indicative of the nature of the approach.

1.1 Hierarchical Bayesian Analysis

Hierarchical Bayesian analysis is a convenient representation of a Bayesian model, in particular of the prior $\pi$, via a conditional hierarchy of so-called hyperpriors $\pi_1, \dots, \pi_{n+1}$,

$$
\pi(\theta) = \int \pi_1(\theta \mid \theta_1)\, \pi_2(\theta_1 \mid \theta_2) \cdots \pi_n(\theta_{n-1} \mid \theta_n)\, \pi_{n+1}(\theta_n)\, d\theta_1\, d\theta_2 \cdots d\theta_n. \tag{1}
$$

Operationally, the model

$$
\begin{aligned}
[x \mid \theta] &\sim f(x \mid \theta), \\
[\theta \mid \theta_1] &\sim \pi_1(\theta \mid \theta_1), \\
&\;\;\vdots \\
[\theta_{n-1} \mid \theta_n] &\sim \pi_n(\theta_{n-1} \mid \theta_n), \\
[\theta_n] &\sim \pi_{n+1}(\theta_n)
\end{aligned}
\tag{2}
$$

is equivalent to the model

$$
[x \mid \theta] \sim f(x \mid \theta), \qquad [\theta] \sim \pi(\theta),
$$

as far as inference on $\theta$ is concerned. Notice that in the hierarchy of data, parameters and hyperparameters,

$$
X \to \theta \to \theta_1 \to \theta_2 \to \cdots \to \theta_n,
$$

$X$ and $\theta_i$ are independent, given $\theta$. That means

$$
[X \mid \theta, \theta_1, \dots] \overset{d}{=} [X \mid \theta], \qquad [\theta_i \mid \theta, X] \overset{d}{=} [\theta_i \mid \theta],
$$

where $\overset{d}{=}$ denotes equality in distribution.
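Equation (1) can be checked numerically on a small case. The following is a minimal sketch, assuming a two-level normal hierarchy that is not part of the handout: $\theta_1 \sim N(0, \tau^2)$ and $[\theta \mid \theta_1] \sim N(\theta_1, \sigma^2)$, for which integrating out $\theta_1$ gives the marginal prior $\theta \sim N(0, \sigma^2 + \tau^2)$. The function names and the values `tau=2.0`, `sigma=1.0` are illustrative choices only.

```python
import random


def sample_theta_hierarchical(rng, tau=2.0, sigma=1.0):
    """Draw theta through the hierarchy: theta1 ~ N(0, tau^2), then theta | theta1 ~ N(theta1, sigma^2)."""
    theta1 = rng.gauss(0.0, tau)      # hyperparameter level (assumed closure distribution)
    return rng.gauss(theta1, sigma)   # parameter level, conditional on theta1


def sample_theta_marginal(rng, tau=2.0, sigma=1.0):
    """Draw theta directly from the marginal prior N(0, sigma^2 + tau^2)."""
    return rng.gauss(0.0, (sigma ** 2 + tau ** 2) ** 0.5)


def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def compare_variances(n_draws=100_000, seed=0):
    """Return the sample variances of theta under the two sampling schemes."""
    rng = random.Random(seed)
    hier = [sample_theta_hierarchical(rng) for _ in range(n_draws)]
    marg = [sample_theta_marginal(rng) for _ in range(n_draws)]
    return variance(hier), variance(marg)


if __name__ == "__main__":
    v_hier, v_marg = compare_variances()
    # Both estimates should be close to sigma^2 + tau^2 = 5 for the default values.
    print(f"hierarchical: {v_hier:.3f}  marginal: {v_marg:.3f}")
```

Both sampling schemes target the same distribution for $\theta$, which is the sense in which model (2) and the collapsed model with prior $\pi(\theta)$ agree for inference on $\theta$. Matching variances is only a sanity check, not a proof of equality in distribution.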
The joint distribution $[X, \theta, \theta_1, \dots, \theta_n]$, which by definition is

$$
[X, \theta, \theta_1, \dots, \theta_n] = [X \mid \theta, \theta_1, \dots, \theta_n]\, [\theta \mid \theta_1, \dots, \theta_n]\, [\theta_1 \mid \theta_2, \dots, \theta_n] \cdots [\theta_{n-1} \mid \theta_n]\, [\theta_n],
$$

can be represented as

$$
[X, \theta, \theta_1, \dots, \theta_n] = [X \mid \theta]\, [\theta \mid \theta_1]\, [\theta_1 \mid \theta_2] \cdots [\theta_{n-1} \mid \theta_n]\, [\theta_n].
$$

Thus, to fully specify the model, only the "neighbouring" conditionals $[X \mid \theta]$, $[\theta \mid \theta_1]$, $[\theta_1 \mid \theta_2]$, ..., $[\theta_{n-1} \mid \theta_n]$ and the "closure" distribution $[\theta_n]$ are needed.

Why then decompose the prior as in (1) and use the model (2)? Here are some of the reasons:

• Modeling requirements may lead to a hierarchy in the prior, for example in Bayesian models for meta-analysis;
• The prior information may be separated into a structural part and a subjective/noninformative part at a higher level of the hierarchy;
• Robustness and objectiveness: "let the data talk about the hyperparameters;"
• Calculational issues (utilizing hidden mixtures, mixture priors, missing data, MCMC format)....
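The factorization of the joint distribution into neighbouring conditionals and a closure distribution can be sketched in code. This is an illustration only, assuming every level is normal with its parent as the mean; the handout does not prescribe these distributions, and `normal_logpdf`, `log_joint`, and their parameters are hypothetical names introduced here.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - (x - mean) ** 2 / (2.0 * sd ** 2)


def log_joint(x, thetas, sds, closure_mean=0.0, closure_sd=1.0):
    """Log of [X, theta, theta_1, ..., theta_n] for an assumed all-normal chain.

    thetas = [theta, theta_1, ..., theta_n]
    sds    = [sd of X | theta, sd of theta | theta_1, ..., sd of theta_{n-1} | theta_n]
    The closure distribution [theta_n] is taken to be N(closure_mean, closure_sd^2).
    """
    if len(sds) != len(thetas):
        raise ValueError("need exactly one standard deviation per conditional")
    chain = [x] + list(thetas)  # X -> theta -> theta_1 -> ... -> theta_n
    total = 0.0
    # Each variable depends only on its immediate neighbour in the chain.
    for child, parent, sd in zip(chain, chain[1:], sds):
        total += normal_logpdf(child, parent, sd)
    # The closure distribution terminates the hierarchy.
    total += normal_logpdf(chain[-1], closure_mean, closure_sd)
    return total


if __name__ == "__main__":
    # One observation, a parameter theta, and one hyperparameter theta_1.
    print(log_joint(1.0, [0.5, 0.2], [1.0, 2.0]))
```

The function needs only one conditional per adjacent pair plus the closure term, so adding a level to the hierarchy means appending one value to `thetas` and one to `sds`.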
This note was uploaded on 10/23/2011 for the course ISYE 8843 taught by Professor Vidakovic during the Spring '11 term at Georgia Institute of Technology.