Statistical Data Mining ORIE 474 Fall 2007 Tatiyana Apanasovich 09/05/07 Histograms

Histograms
Histograms provide three very important pieces of information about the distribution of data values: shape, central location (the middle), and spread (how different the values are from each other and from the middle). Histograms show how data can pile up; in any distribution of values, some values will occur more frequently than others. The peaks of the histogram show where the data values are most similar. This is the central location, which is measured by the mean, median, and mode. While these statistics provide valuable information about the process, central location alone does not give a complete picture of it. When you consider the spread of the data, you will see its extremes. The shape of the histogram can show whether the system leans toward one extreme or the other, or whether there are multiple peaks.
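As a rough illustration of these three summaries, the Python sketch below bins a small sample, prints a text histogram, and reports measures of center and spread. The sample values, the bin width, and the helper name `text_histogram` are assumptions invented for this example; none of them come from the course data.

```python
import statistics
from collections import Counter

# Hypothetical sample, invented purely for illustration.
values = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9]


def text_histogram(data, bin_width=1):
    """Count how many values fall in each bin [k*w, (k+1)*w), keyed by left edge."""
    counts = Counter(int(v // bin_width) * bin_width for v in data)
    return dict(sorted(counts.items()))


hist = text_histogram(values)
for left_edge, count in hist.items():
    # One star per observation: the tallest row marks the peak.
    print(f"{left_edge:>3} | {'*' * count}")

# Central location: where the values pile up.
center = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
}

# Spread: how far the values lie from each other and from the middle.
spread = {
    "range": max(values) - min(values),
    "stdev": statistics.stdev(values),
}

print(center)
print(spread)
```

In this made-up sample the single value 9 sits apart from the rest, so the rows of stars trail off toward the high end, which is the kind of lean toward one extreme that the shape of a histogram can reveal.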

Interpreting histograms
After you have the graph, you need to examine and interpret it: see what it tells you. First, look for patterns and for deviations from those patterns.

Interpreting histograms …
In our example histogram, let’s begin with the obvious deviations at the low and high ends. Two states are separated from the bulk in the middle, one with 6.3% and one with 17.0% of people aged 65 and older. These two outlier states are Alaska and Florida.

Interpreting histograms …
Once we have seen these on the histogram, it is easy to pick them out of the list as well. What about a state like Utah, with 8.5% aged 65 and older? This is a matter of judgment: Utah is certainly unusually low according to this histogram, but it may not qualify as an outlier in the same sense as Alaska. Once we have identified the outliers, the next step is to examine them closely and try to figure out why they occur. Commonly they are due to data problems, such as typing errors like 40 instead of 4.0. Once the data problems are eliminated, we need more information.

Interpreting histograms …
In this case, we know that Florida is a destination for retirees, so its high percentage makes sense.
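The slides identify outliers by eye, as a matter of judgment. One common numerical companion to that judgment is the 1.5 × IQR rule, sketched below in Python. The rule is not the method used in these slides and is swapped in here only for illustration. Only the Alaska, Utah, and Florida percentages come from the lecture; the entries named "State A" through "State H" are hypothetical placeholders, and `iqr_fences` is a helper name made up for this example.

```python
import statistics

# Percent of residents aged 65 and older.
percent_65_plus = {
    "Alaska": 6.3,    # from the lecture
    "Utah": 8.5,      # from the lecture
    "Florida": 17.0,  # from the lecture
    # Hypothetical placeholder values standing in for the bulk of the states.
    "State A": 11.2,
    "State B": 12.0,
    "State C": 12.2,
    "State D": 12.4,
    "State E": 12.8,
    "State F": 13.0,
    "State G": 13.2,
    "State H": 14.0,
}


def iqr_fences(data, k=1.5):
    """Return (low, high) fences at k interquartile ranges beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


low, high = iqr_fences(list(percent_65_plus.values()))

# Flag every state whose value falls outside the fences.
flagged = sorted(
    name for name, pct in percent_65_plus.items() if pct < low or pct > high
)

print(f"fences: {low:.2f} to {high:.2f}")
print("candidate outliers:", flagged)
```

With these placeholder values, Alaska and Florida fall outside the fences while Utah stays just inside, which mirrors the judgment call described above. A flag from a rule like this is only a prompt to examine the value more closely, first for data problems and then for a real explanation such as Florida's retirees.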
