# 10.1.1.78.5947 - Selecting the Number of Bins in a...


Selecting the Number of Bins in a Histogram: A Decision Theoretic Approach

Kun He*, Department of Mathematics, University of Kansas, Lawrence, KS 66045

Glen Meeden†, School of Statistics, University of Minnesota, Minneapolis, MN 55455

Appeared in Journal of Statistical Planning and Inference, Vol. 61 (1997), 59-59.

* Research supported in part by the University of Kansas General Research Fund.

† Research supported in part by NSF Grant SES 9201718.

Short Running Title: Number of Bins in a Histogram

ABSTRACT

In this note we consider the problem of selecting, given a sample, the number of bins in a histogram. A loss function is introduced which reflects the idea that smooth distributions should have fewer bins than rough distributions. A stepwise Bayes rule, based on the Bayesian bootstrap, is found and shown to be admissible. Some simulation results are presented to show how the rule works in practice.

Key Words: histogram, Bayesian bootstrap, stepwise Bayes, admissibility, non-informative Bayes and entropy.

AMS 1991 Subject Classification: Primary 62C15; Secondary 62F15, 62G07.
1 Introduction

The histogram is a statistical technique with a long history. Unfortunately, only a few explicit guidelines based on statistical theory exist for choosing the number of bins in a histogram. Scott [8] gave a formula for the optimal histogram bin width, which asymptotically minimizes the integrated mean squared error. Since the underlying density is usually unknown, it is not immediately clear how this formula should be applied in practice. Scott suggested using the Gaussian density as a reference standard, which leads to the data-based bin width $a \, s \, n^{-1/3}$, where $a = 3.49$, $s$ is an estimate of the standard deviation, and $n$ is the sample size. (See also Terrell and Scott [10] and Terrell [9].) As Scott noted, many authors advise that for real data sets, histograms based on 5 to 20 bins usually suffice. Rudemo [7] suggested a cross-validation technique for selecting the number of bins, but such methods seem to have large sampling variation.

In this note we give a decision theoretic approach to the problem of choosing the number of bins in a histogram. We introduce a loss function which incorporates the idea that smoother densities require fewer bins in their histogram estimates than rougher densities. A non-informative Bayesian approach, based on the Bayesian bootstrap of Rubin [6], yields a data-dependent decision rule for selecting the number of bins. We then give a stepwise Bayes argument which proves the admissibility of this rule and shows its close connection to the notion of maximum likelihood, which also underlies the idea of a histogram. Finally, we give some simulation results which show how our rule works in practice and how it compares with Scott's rule. In Section 2 we describe the rule and give the simulation results; the proof of admissibility is deferred to Section 3.
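To make the two data-based rules mentioned above concrete, here is a minimal Python sketch. It computes Scott's Gaussian-reference width $a \, s \, n^{-1/3}$ with $a = 3.49$, using the sample standard deviation as $s$. It also computes a least-squares cross-validation score for equal-width histograms, $2/((n-1)h) - (n+1)/((n-1)n^2 h) \sum_j N_j^2$, where $N_j$ are the bin counts. Treating this textbook criterion as the form of Rudemo's technique is our assumption. The function names (`scott_bin_width`, `cv_score`, and so on) are hypothetical, bins are assumed to be equal-width over the sample range, and neither rule is the authors' decision-theoretic rule, which is not part of this excerpt.

```python
import math
import random
import statistics


def scott_bin_width(data, a=3.49):
    """Scott's Gaussian-reference bin width a * s * n**(-1/3)."""
    n = len(data)
    s = statistics.stdev(data)  # sample standard deviation used as the estimate s
    return a * s * n ** (-1.0 / 3.0)


def bins_from_width(data, h):
    """Smallest number of equal-width bins of width h that covers the sample range."""
    span = max(data) - min(data)
    return max(1, math.ceil(span / h))


def bin_counts(data, k):
    """Counts N_1..N_k for k equal-width bins on [min(data), max(data)].

    Assumption: the sample contains at least two distinct values.
    """
    lo, hi = min(data), max(data)
    h = (hi - lo) / k
    counts = [0] * k
    for x in data:
        j = int((x - lo) / h)
        counts[min(j, k - 1)] += 1  # the sample maximum is placed in the last bin
    return counts


def cv_score(data, k):
    """Least-squares cross-validation score of a k-bin histogram (smaller is better).

    Uses 2/((n-1)h) - (n+1)/((n-1) n^2 h) * sum_j N_j^2 with bin width h.
    """
    n = len(data)
    h = (max(data) - min(data)) / k
    counts = bin_counts(data, k)
    return 2.0 / ((n - 1) * h) - (n + 1) / ((n - 1) * n**2 * h) * sum(c * c for c in counts)


def cv_choice(data, max_bins=50):
    """Number of bins in 1..max_bins that minimizes the cross-validation score."""
    return min(range(1, max_bins + 1), key=lambda k: cv_score(data, k))


if __name__ == "__main__":
    random.seed(0)
    sample = [random.gauss(0.0, 1.0) for _ in range(500)]
    h = scott_bin_width(sample)
    print("Scott width:", round(h, 3), "bins:", bins_from_width(sample, h))
    print("CV choice of bins:", cv_choice(sample))
```

Running the script on repeated simulated samples is a simple way to see the sampling variation of the cross-validation choice compared with Scott's rule, which depends on the data only through $s$ and $n$.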
