Selecting the Number of Bins in a Histogram: A Decision Theoretic Approach

Kun He *
Department of Mathematics
University of Kansas
Lawrence, KS 66045

Glen Meeden †
School of Statistics
University of Minnesota
Minneapolis, MN 55455

Appeared in Journal of Statistical Planning and Inference, Vol 61 (1997), 5959.

* Research supported in part by University of Kansas General Research Fund
† Research supported in part by NSF Grant SES 9201718

Short Running Title: Number of Bins in a Histogram

ABSTRACT

In this note we consider the problem of selecting the number of bins in a histogram for a given sample. A loss function is introduced which reflects the idea that smooth distributions should have fewer bins than rough distributions. A stepwise Bayes rule, based on the Bayesian bootstrap, is found and is shown to be admissible. Some simulation results are presented to show how the rule works in practice.

Key Words: histogram, Bayesian bootstrap, stepwise Bayes, admissibility, noninformative Bayes and entropy.

AMS 1991 Subject Classification: Primary 62C15; Secondary 62F15, 62G07.

1 Introduction

The histogram is a statistical technique with a long history. Unfortunately, there exist only a few explicit guidelines based on statistical theory for choosing the number of bins that appear in the histogram. Scott [8] gave a formula for the optimal histogram bin width which asymptotically minimizes the integrated mean squared error. Since the underlying density is usually unknown, it is not immediately clear how one should apply this in practice. Scott suggested using the Gaussian density as a reference standard, which leads to the data-based choice for the bin width of $a \times s \times n^{-1/3}$, where $a = 3.49$, $s$ is an estimate of the standard deviation, and $n$ is the sample size. (See also Terrell and Scott [10] and Terrell [9].)
As Scott noted, many authors advise that for real data sets histograms based on 5-20 bins usually suffice. Rudemo [7] suggested a cross-validation technique for selecting the number of bins. But such methods seem to have large sampling variation.

In this note we will give a decision theoretic approach to the problem of choosing the number of bins in a histogram. We will introduce a loss function which incorporates the idea that smoother densities require fewer bins in their histogram estimates than rougher densities. A noninformative Bayesian approach, based on the Bayesian bootstrap of Rubin [6], will yield a data dependent decision rule for selecting the number of bins. We will then give a stepwise Bayes argument which proves the admissibility of this rule and shows the close connection of the rule to the notion of maximum likelihood, which also underlies the idea of a histogram. Finally we give some simulation results which show how our rule works in practice and how it compares with Scott’s rule. In section 2 we describe the rule and give the simulation results, while the proof of admissibility is deferred to section 3.
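
For readers who want to try the reference rule attributed to Scott [8] above, the following is a minimal Python sketch of the bin width $3.49 \times s \times n^{-1/3}$. The function names are hypothetical, and the conversion from a bin width to a bin count (sample range divided by width, rounded up) is an assumption made here for illustration; it is not taken from the note, and it is not the decision theoretic rule the note develops.

```python
import math
import statistics


def scott_bin_width(data):
    """Gaussian-reference bin width: 3.49 * s * n^(-1/3).

    s is taken to be the sample standard deviation (an assumption;
    the text only says "an estimate of the standard deviation").
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(data)
    return 3.49 * s * n ** (-1.0 / 3.0)


def scott_num_bins(data):
    """Illustrative bin count: ceil(range / width), at least 1.

    This conversion is an assumption of this sketch.
    """
    h = scott_bin_width(data)
    if h == 0:
        return 1  # all observations identical
    span = max(data) - min(data)
    return max(1, math.ceil(span / h))


if __name__ == "__main__":
    sample = [float(i) for i in range(8)]
    print(scott_bin_width(sample), scott_num_bins(sample))
```

Because the width shrinks only like $n^{-1/3}$, the implied number of bins grows slowly with the sample size.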