class_09_05_01 - Statistical Data Mining ORIE 474 Fall 2007...

Statistical Data Mining ORIE 474 Fall 2007 Tatiyana Apanasovich 09/05/07 Histograms

Histograms

Histograms provide three very important pieces of information about the distribution of data values: shape, central location (the middle), and spread (how different the values are from each other and from the middle). Histograms show how data can pile up: in any distribution of values, some values occur more frequently than others. The peaks of the histogram show where the data values cluster. This is the central location, which is measured by the mean, median, and mode. While these statistics provide valuable information about the process, central location alone does not give a complete picture of it. When you consider the spread of the data, you also see its extremes. The shape of the histogram shows whether the values lean toward one extreme or the other, or whether there are multiple peaks.
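
As a rough illustration of the three pieces of information described above, the sketch below bins a small data set and computes its central location and spread using only the Python standard library. The data values, the bin width, and the helper name `histogram_counts` are invented for this example and do not come from the lecture.

```python
import statistics
from collections import Counter


def histogram_counts(values, bin_width, start):
    """Count values per bin [start + k*w, start + (k+1)*w), keyed by left edge."""
    counts = Counter()
    for v in values:
        k = int((v - start) // bin_width)  # index of the bin containing v
        counts[k] += 1
    return {start + k * bin_width: counts[k] for k in sorted(counts)}


# Hypothetical data, made up purely for illustration.
values = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9]

# Shape: a text histogram, one star per observation in each bin.
bins = histogram_counts(values, bin_width=2, start=2)
for left, count in bins.items():
    print(f"[{left}, {left + 2}): {'*' * count}")

# Central location: mean, median, and mode.
center = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
}

# Spread: range and sample standard deviation.
spread = {
    "range": max(values) - min(values),
    "stdev": statistics.stdev(values),
}

print(center)
print(spread)
```

In this made-up sample the single large value pulls the mean above the median, which is the kind of lean toward one extreme that the shape of the histogram makes visible.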

Interpreting histograms

After you have the graph, you need to examine and interpret it to see what it tells you. First, look for patterns and for deviations from those patterns.

Interpreting histograms (continued)

In our example histogram, let's begin with the obvious deviations at the low and high ends. Two states are separated from the bulk in the middle: one with 6.3% of its people aged 65 and older, and one with 17.0%. These two outlier states are Alaska and Florida, respectively.

Once we have seen these on the histogram, it is easy to pick them out of the list as well. What about a state like Utah, with 8.5% aged 65 and older? This is a matter of judgment: Utah is certainly unusually low according to this histogram, but it may not qualify as an outlier in the same sense as Alaska. Once we have identified the outliers, the next step is to examine them closely and try to figure out why they occur. Commonly they are due to data problems, such as typing errors like 40 instead of 4.0. Once data problems have been ruled out, we need more information.
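
The judgment call described above can be supported, though not replaced, by a numeric rule of thumb. The sketch below uses the common 1.5 × IQR fence, which is an assumption brought in for illustration and not a method stated in the lecture. Only the Alaska, Utah, and Florida percentages come from the text; the remaining values and the `state_NN` names are hypothetical stand-ins for the bulk of the distribution, and `iqr_outliers` is a made-up helper name.

```python
import statistics


def iqr_outliers(percent_by_state, k=1.5):
    """Return names whose value lies more than k * IQR outside the quartiles."""
    values = sorted(percent_by_state.values())
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [name for name, pct in percent_by_state.items()
            if pct < low or pct > high]


percent_65_plus = {
    # Values quoted in the lecture text.
    "Alaska": 6.3,
    "Utah": 8.5,
    "Florida": 17.0,
    # Hypothetical values standing in for the bulk of the states.
    "state_01": 11.0, "state_02": 11.5, "state_03": 12.0,
    "state_04": 12.2, "state_05": 12.5, "state_06": 12.8,
    "state_07": 13.0, "state_08": 13.3, "state_09": 13.5,
    "state_10": 14.0,
}

flagged = iqr_outliers(percent_65_plus)
print(flagged)
```

With these invented bulk values the rule flags Alaska and Florida but not Utah. A different set of bulk values or a different multiplier `k` could change that, which is why a flagged value is a prompt to examine the data point rather than a verdict.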

Interpreting histograms (continued)

In this case, we know that Florida is a destination for retirees, so it makes sense

This note was uploaded on 12/23/2009 for the course ORIE 474 at Cornell University (Engineering School).
