# We call an observation a suspected outlier if it

• 13

This preview shows page 8 - 10 out of 13 pages.

We call an observation a suspected outlierif it falls more than 1.5 times the size of the interquartile range (IQR) above the first quartile or below the third quartile. This is called the “1.5 * IQR rule for outliers.Q3= 4.35Q1 = 2.22567.92456.12345.32234.92124.72014.51964.21854.11743.91633.81523.71413.6133.41263.31152.91042.8932.5822.3712.3662.1551.5441.9331.6221.2110.6Disease X01234567Years until death8Interquartile range:Q3– Q14.35 2.2 = 2.151.5*IQR:1.5*2.15=3.255Distance to Q37.9 4.35 = 3.55Individual #25 has a value of 7.9 years, which is 3.55 years above the third quartile. This is more than 3.225 years (1.5 * IQR). Thus, individual #25 is a suspected outlier.Show ExampleThe standard deviation “s” is used to describe the variation around the mean. Like the mean, it is not resistant to skew or outliers.s21n1(xi1nx )21. First calculate the variance s2.21)(11xxnsni2. Then take the square root to get the standard deviation s.Measure of spread: the standard deviationMean±1 s.d.xCalculations …We’ll never calculate these by hand, so make sure to know how to get the standard deviation using your calculator or software.21)(1xxdfsniMean = 63.4Sum of squared deviations from mean = 85.2Degrees freedom (df) = (n 1) = 13s2= variance = 85.2/13 = 6.55 inches squareds= standard deviation = 6.55 = 2.56 inchesWomen’s height (inches)i xix (xx )(xx )21 59 63.4 -4.4 19.0 2 60 63.4 -3.4 11.3 3 61 63.4 -2.4 5.6 8 63 63.4 -0.4 0.1 9 64 63.4 0.6 0.4 10 64 63.4 0.6 0.4 11 65 63.4 1.6 2.7 12 66 63.4 2.6 7.0 13 67 63.4 3.6 13.3 14 68 63.4 4.6 21.6 Mean 63.4 Sum 0.0Sum 85.2
9/11/20129Variance and Standard DeviationWhy do we square the deviations?The sum of the squared deviations of any set of observations from their mean is the smallest that the sum of squared deviations from any number can possibly be.The sum of the deviations of any set of observations from their mean is always zero.Why do we emphasize the standard deviation rather than the variance? s, not s2, is the natural measure of spread for Normal distributions.shas the same unit of measurement as the original observations.Why do we average by dividing by n 1 rather than n in calculating the variance?The sum of the deviations is always zero, so only n1 of the squared deviations can vary freely.The number n1 is called the degrees of freedom.Properties of Standard Deviationsmeasures spread about the mean and should be used only when the mean is the measure of center.s= 0 only when all observations have the same value and there is no spread. Otherwise, s> 0.sis not resistant to outliers. shas the same units of measurement as the original observations. Choosing among summary statisticsBecause the meanis not resistant to outliers or skew, use it to describe distributions that are fairly symmetrical and don’t have outliers.