ISyE8843A, Brani Vidakovic
Handout 15

There are no true statistical models.

1 Model Search, Selection, and Averaging.

Although some model selection procedures boil down to testing hypotheses about parameters and choosing the best parameter or subset of parameters, model selection is a broader inferential task. It can be nonparametric, for example. Model selection can sometimes be interpreted as an estimation problem. If the competing models are indexed by $i \in \{1, 2, \dots, m\}$, obtaining the posterior distribution of the index $i$ leads to a choice of best model, for example, the model that maximizes the posterior probability of $i$.

1.1 When you hear hoofbeats, think horses, not zebras.

Ockham’s razor is a logical principle attributed to the medieval philosopher and Franciscan monk William of Ockham. The principle states that one should not make more assumptions than the minimum needed. It is often called the principle of parsimony, and it is essential for all scientific modeling and theory building.

Figure 1: Ockham’s Razor: Pluralitas non est ponenda sine necessitate. Franciscan monk William of Ockham (ca. 1285–1349).

As Jefferys and Berger (1991) pointed out, the idea of measuring complexity and connecting the notions of complexity and prior probability goes back to Sir Harold Jeffreys’ pioneering work on statistical inference in the 1920s. On page 47 of his classical work [11], Jeffreys says:

"Precise statement of the prior probabilities of the laws in accordance with the condition of convergence requires that they should actually be put in an order of decreasing prior probability. But this corresponds to actual scientific procedure. A physicist would test first whether the whole variation is random against the existence of a linear trend; then a linear law against a quadratic one, then proceeding in order of increasing complexity. All we have to say is that simpler laws have the greater prior probabilities. This is what Wrinch and I called the simplicity postulate. To make the order definite, however, requires a numerical rule for assessing the complexity of a law. In the case of laws expressible by differential equations this is easy. We would define the complexity of a differential equation, cleared of roots and fractions, by the sum of the order, the degree, and the absolute values of the coefficients. Thus $s = a$ would be written as $ds/dt = 0$ with complexity $1 + 1 + 1 = 3$. $s = a + ut + \frac{1}{2} g t^2$ would become $d^2 s/dt^2 = 0$ with complexity $2 + 1 + 1 = 4$. Prior probability $2^{-m}$ or $6/(\pi^2 m^2)$ could be attached to the disjunction of all laws of complexity $m$ and distributed uniformly among them."

In the spirit of Jeffreys’ ideas, and building on work of Wallace and Boulton, Akaike, Dawid, Good, Kolmogorov, and others, Rissanen (1978) proposed the Minimum Description Length Principle (MDLP) as a paradigm in statistical inference. Informally, the MDLP can be stated as follows: The preferred M for explaining observed data D is one that minimizes:...
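As a small numerical aside on the Jeffreys passage quoted above, the sketch below computes the complexity score he describes (order plus degree plus the absolute values of the coefficients) for his two examples, and checks that the two prior weights, read here as 2^{-m} and 6/(π² m²), each sum to approximately 1 over m = 1, 2, .... The function names are hypothetical, the truncation points of the sums are arbitrary choices, and the coefficient list [1] for each equation is an assumption chosen to match the quoted totals of 3 and 4.

```python
import math


def jeffreys_complexity(order, degree, coefficients):
    """Order + degree + sum of absolute coefficient values (hypothetical helper)."""
    return order + degree + sum(abs(c) for c in coefficients)


def geometric_prior(m):
    """Prior weight 2^{-m} for the set of all laws of complexity m."""
    return 2.0 ** (-m)


def inverse_square_prior(m):
    """Prior weight 6 / (pi^2 m^2) for the set of all laws of complexity m."""
    return 6.0 / (math.pi ** 2 * m ** 2)


# The two laws in the quotation, each assumed to carry a single coefficient 1.
c_constant = jeffreys_complexity(order=1, degree=1, coefficients=[1])   # ds/dt = 0
c_quadratic = jeffreys_complexity(order=2, degree=1, coefficients=[1])  # d^2 s/dt^2 = 0

# Truncated sums over m = 1, 2, ...; both should be close to 1.
geometric_total = sum(geometric_prior(m) for m in range(1, 60))
inverse_square_total = sum(inverse_square_prior(m) for m in range(1, 200001))

print(c_constant, c_quadratic)
print(geometric_total, inverse_square_total)
```

Both weightings decrease in m, so under either choice simpler laws receive the greater prior probability, as the simplicity postulate requires. The inverse-square weights decay much more slowly, which leaves more prior mass on complex laws.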
This note was uploaded on 10/23/2011 for the course ISYE 8843 taught by Professor Vidakovic during the Spring '11 term at Georgia Institute of Technology.