Summarizing and Displaying Ratio Data

Intro
Take some ratio data
• e.g. incomes in the US
What basic features of these data do we care about?

3 Useful Features of Ratio Data
1. Shape of the distribution
2. Central, average, or typical value: mean, median, mode
3. Variability: range, interquartile range, standard deviation

Shape of Data
Histograms display the shape of data distributions:
1. Divide the dataset into intervals, or bins, of equal width
2. Count the number of cases in each bin
3. Bar height = number of cases within that bin

Histogram: Fastest speed driven
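Before looking at the example histogram, the three binning steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not how the fastest-speed histogram was actually produced: the function name `histogram_counts`, the bin width, the starting edge, and the speed values in the usage example are all made up for illustration.

```python
import math


def histogram_counts(values, bin_width, start):
    """Return (lower edge, upper edge, count) for each equal-width bin.

    Bins are [start, start + w), [start + w, start + 2w), ... and cover
    every value, so empty bins show up with a count of 0.
    """
    if min(values) < start:
        raise ValueError("start must be <= the smallest value")
    # Step 1: divide the range of the data into equal-width bins.
    n_bins = math.floor((max(values) - start) / bin_width) + 1
    counts = [0] * n_bins
    # Step 2: count the number of cases that fall in each bin.
    for v in values:
        k = math.floor((v - start) / bin_width)  # index of the bin containing v
        counts[k] += 1
    # Step 3: each count is the height of that bin's bar.
    return [(start + k * bin_width, start + (k + 1) * bin_width, c)
            for k, c in enumerate(counts)]


# Hypothetical top speeds (mph), NOT the class data behind the histogram.
speeds = [55, 60, 70, 75, 80, 85, 90, 95, 95, 100, 110, 120, 150]
for lo, hi, n in histogram_counts(speeds, bin_width=25, start=50):
    print(f"{lo:>3}-{hi:<3} {'#' * n}")  # a sideways text histogram
```

Here each bin includes its lower edge and excludes its upper edge; plotting libraries make their own choices about edges, so counts for values sitting exactly on a boundary can differ.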
[Figure: histogram of fastest speed driven; x-axis: fastest, y-axis: Frequency]

Distribution Shape Terms
• Outliers
• Skewed to the right or left; symmetric
• Bimodal, unimodal

Questions for you…
How would you describe the shape of the distribution of US incomes? What does this mean substantively?

Measures of Central Tendency
Mode: the most common value
• Useful if there is a small number of values
• Can also be used for ordinal or nominal data, e.g. the modal gender in the class, the modal class year
• This is where the terms bimodal and unimodal come from

Mode Example
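Before the income example, here is a minimal Python sketch of finding the mode. Because it only counts how often each value occurs, it works for nominal and ordinal labels as well as numbers, and it returns every tied value, which is how a distribution can come out bimodal. The function name `modes` and the class-year list are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count.

    One value back means the data are unimodal; two means bimodal.
    Works for numbers and for nominal/ordinal labels alike.
    """
    counts = Counter(values)        # value -> number of occurrences
    top = max(counts.values())      # the highest count
    return [v for v, c in counts.items() if c == top]


# Hypothetical class years for a small class (ordinal data).
years = ["sophomore", "junior", "junior", "senior", "junior", "sophomore"]
print(modes(years))             # ['junior']
print(modes([1, 1, 2, 2, 3]))   # [1, 2]: bimodal
```

Python 3.8 and later also provide `statistics.multimode`, which returns the same list of most common values.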
If we treat the household income data as ordinal ($0 to $10,000, $10,000 to $20,000, etc.), the modal category for US incomes is $10,000 to $20,000.

Mode Strengths & Weaknesses
Strengths:
• Easy to understand and calculate
Weaknesses:
• Not practical if there is a large number of categories
• Does not provide a lot of information

Mean Definition
Mean: The mean is what people commonly refer to as the “average.” It is like the center of gravity of a distribution.

Some Notation
$\sum X_i$ is the sum of the values $X_i$.
Suppose our X values are 1, 4, 2. Then $\sum X_i = 1 + 4 + 2 = 7$.
The i is an index; it designates the ith observation. So above, $X_1 = 1$, $X_2 = 4$, $X_3 = 2$.

Formula for the Mean
$$M = \frac{\sum X_i}{N}$$
∑ : sum of the $X_i$
$X_i$ : case value, where i = 1, 2, 3, …, N
M : mean of all cases
N : total number of cases

Strengths of the Mean
• Simple to calculate
• The mean can tell you a lot about a large dataset…
For example, what is the mean wage for a drug dealer? (A “foot soldier” is a gang member who deals drugs directly to people on the street.)

Calculating Average Foot Soldiers’ Wage
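The mean formula amounts to two steps: add up every $X_i$, then divide by N. A minimal Python sketch follows; since no individual wage figures are given here, it reuses the 1, 4, 2 example from the notation slide. Python counts indices from 0, so `xs[0]` plays the role of $X_1$. The function name `mean` is just an illustrative choice.

```python
def mean(xs):
    """M = (sum of X_i) / N for a list of numbers xs."""
    total = 0
    for i in range(len(xs)):   # i runs over the observations (0-based in Python)
        total += xs[i]         # running sum: builds up the sum of X_i
    n = len(xs)                # N: total number of cases
    return total / n


x = [1, 4, 2]      # X_1 = 1, X_2 = 4, X_3 = 2 from the notation slide
print(sum(x))      # 7, the sum of X_i
print(mean(x))     # 7 / 3, about 2.33
```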
1. Add up all foot soldiers’ hourly wages
2. Divide by the number of foot soldiers = MEAN

Source: Levitt & Venkatesh (2000)

The Median
The median is the value that divides the data in half: the middle value once the data are sorted from smallest to largest.
Take these numbers: 3, 4, 4, 5, 6
• Median = 4
Even number of cases: 3, 4, 5, 6
• Median = 4.5 (midpoint of the middle two cases)

Median Examples
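The odd/even rule for the median can be sketched in Python: sort the values, take the middle one when N is odd, and average the two middle ones when N is even. This is a minimal illustration; the function name `median` is hypothetical, and Python's standard library offers the same calculation as `statistics.median`.

```python
def median(values):
    """Middle value of the sorted data; midpoint of the middle two if N is even."""
    s = sorted(values)        # the median needs the data in order
    n = len(s)
    mid = n // 2              # index of the middle (or upper-middle) value
    if n % 2 == 1:
        return s[mid]                    # odd N: the single middle value
    return (s[mid - 1] + s[mid]) / 2     # even N: midpoint of the middle two


print(median([3, 4, 4, 5, 6]))   # 4
print(median([3, 4, 5, 6]))      # 4.5
```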
Median of our data…
• Height: 66 inches
• Top speed: 95 mph

Median & Mean Strengths & Weaknesses
Mean Strengths:
• Means have nice statistical properties
Median Strengths:
• The median is less sensitive to outliers than the mean
E.g. take a set of incomes: $30,000, $40,000, $45,000, $50,000, $120,000
Mean = $285,000 / 5 = $57,000
Median = $45,000

Mean vs. Median
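The income example above can be checked with Python's standard `statistics` module, which provides `mean` and `median`. The second list is hypothetical: it makes the top income ten times larger to show that the mean jumps while the median does not move at all.

```python
import statistics

incomes = [30_000, 40_000, 45_000, 50_000, 120_000]   # example from the slides
print(statistics.mean(incomes), statistics.median(incomes))   # mean 57000, median 45000

# Hypothetical: make the top income ten times larger ($1,200,000).
bigger_outlier = incomes[:-1] + [1_200_000]
print(statistics.mean(bigger_outlier), statistics.median(bigger_outlier))   # mean 273000, median 45000
```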
Incomes in 2005:
• Mean household income = $63,344
• Median household income = $46,326

Measures of Variability
We now know how to describe the central tendency of a set of ratio data. But what about the spread of the data? How spread out are the data around that central tendency?

Range
Range: the difference between the maximum value and the minimum value
E.g. take a set of incomes: $30,000, $40,000, $45,000, $50,000, $120,000
Range = $120,000 - $30,000 = $90,000

Range Strengths & Weaknesses
Strengths:
• Easy to calculate and understand
Weaknesses:
• Depends only on the largest and smallest values
• E.g. test scores: 0, 75, 75, 75, 75, 75, 75, 75, 75, 75, 100 (range = 100, even though almost every score is 75)

Interquartile Range
Quartiles are the medians of the two halves of the list:
• Lower quartile: one quarter of the way up from the bottom
• Upper quartile: one quarter of the way down from the top
Interquartile range: the difference between the upper and lower quartiles

Calculating IQR
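The quartile definition (medians of the two halves of the sorted list) can be sketched in Python. This is a minimal sketch under one stated assumption: when N is odd, the overall median is left out of both halves, which is the convention that reproduces the worked example. Other conventions exist, and libraries such as NumPy interpolate by default, so their quartiles can differ slightly. The function names are hypothetical; the test-score list is the one from the range slide.

```python
def median(values):
    """Middle value of the sorted data; midpoint of the middle two if N is even."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def quartiles(values):
    """(lower, upper) quartile as the medians of the lower and upper halves.

    Assumption: with an odd number of values the overall median is
    excluded from both halves.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    s = sorted(values)
    half = len(s) // 2
    lower_half = s[:half]              # values below the median
    upper_half = s[len(s) - half:]     # values above the median
    return median(lower_half), median(upper_half)


def iqr(values):
    lower, upper = quartiles(values)
    return upper - lower


def value_range(values):
    return max(values) - min(values)   # named to avoid shadowing Python's range()


data = [1, 3, 3, 5, 6, 7, 8, 10, 11, 12, 13, 13, 14, 15, 16]
print(quartiles(data), iqr(data))      # (5, 13) 8

scores = [0] + [75] * 9 + [100]        # test scores from the range slide
print(value_range(scores), iqr(scores))  # 100 0: only the range reacts to the two extremes
```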
1, 3, 3, 5, 6, 7, 8, 10, 11, 12, 13, 13, 14, 15, 16
Median = 10
Lower quartile = 5 (median of 1, 3, 3, 5, 6, 7, 8)
Upper quartile = 13 (median of 11, 12, 13, 13, 14, 15, 16)
Interquartile range = 13 - 5 = 8

IQR Strengths & Weaknesses
Strengths:
• Less influenced by the largest and smallest values than the range
Weaknesses:
• Harder to calculate and understand than the range

Standard Deviation
Standard Deviation: a measure of the spread of a distribution of numbers. Roughly, it is the typical distance of the values from the mean.

Calculating the Standard Deviation
1. Calculate the mean
2. Find the deviation from the mean for each value (value - mean)
3. Square these deviations
4. Sum the squared deviations
5. Divide by n - 1 (n is the number of values). This is the variance
6. Take the square root of the variance. This is the standard deviation

Variance & Formula for the Variance
Variance is the standard deviation squared.
$$s^2 = \frac{\sum (X_i - M)^2}{N - 1}$$
$X_i$ : case value, where i = 1, 2, 3, …, N
M : mean of all cases
∑ : sum of the $(X_i - M)^2$
N : the total number of cases

Formula for the Standard Deviation
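The variance formula and the six calculation steps for the standard deviation can be sketched in Python; the standard deviation is then the square root of the variance. The function names `sample_variance` and `sample_sd` are illustrative; Python's standard library provides the same N - 1 calculation as `statistics.variance` and `statistics.stdev` (and `statistics.pvariance` / `statistics.pstdev` for the divide-by-N version).

```python
import math


def sample_variance(values):
    """s^2 = sum of (X_i - M)^2 divided by (N - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = sum(values) / n                      # step 1: the mean M
    deviations = [x - m for x in values]     # step 2: X_i - M for each value
    squared = [d * d for d in deviations]    # step 3: square each deviation
    return sum(squared) / (n - 1)            # steps 4-5: sum, then divide by N - 1


def sample_sd(values):
    return math.sqrt(sample_variance(values))  # step 6: square root of the variance


scores = [90, 90, 100, 110, 110]                   # example from page 138
print(sum(x - 100 for x in scores))                # 0: raw deviations cancel out
print(sample_variance(scores), sample_sd(scores))  # 100.0 10.0
print(sample_sd([1, 1, 1, 1, 1]))                  # 0.0: no spread at all
```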
The standard deviation, s, is the square root of the variance, $s^2$:
$$s = \sqrt{\frac{\sum (X_i - M)^2}{N - 1}}$$

Example on page 138
90, 90, 100, 110, 110
Mean = 100
Deviations: -10, -10, 0, 10, 10
Squared deviations: 100, 100, 0, 100, 100
Sum of squared deviations: 400
N - 1 = 5 - 1 = 4
Variance = 400 / 4 = 100
Standard deviation = √100 = 10

Why do we square the deviations before adding them up?
(From the previous example) If we don't square the deviations, they cancel out: (-10) + (-10) + 0 + 10 + 10 = 0
But we take the square root later, which brings the result back to the original units.

Why divide by n - 1 and not n?
This is too technical to discuss in this class.
The key thing to understand is that we are taking a kind of average distance from the mean.

What is the standard deviation of this dataset? 1, 1, 1, 1, 1
Mean = 1
Deviations = 0, 0, 0, 0, 0
Squared deviations = 0, 0, 0, 0, 0
Sum of squared deviations = 0
N - 1 = 5 - 1 = 4
Variance = 0 / 4 = 0
Standard deviation = square root of 0 = 0
What does this mean?

Standard Deviation Controversy
Larry Summers, former president of Harvard
• Got into trouble over remarks about differences in mathematical and scientific aptitude between men and women
• Never claimed any difference in average ability
• Instead, claimed a difference in standard deviation

“[Larry Summers’] core claim, indeed his only claim, of innate difference [in intelligence between men and women] was that the standard deviation of men’s intelligence might be 20 percent greater than that of women.”
Ian Ayres, Supercrunchers

“Imagine that you are expecting your first child. You are told that you can choose the range of possible IQs that your child will have, but this range must be centered on an IQ of 100. Any IQ within the range that you choose is equally likely to occur. What range would you choose — 95 to 105, or would you roll the dice on a wider range of, say, 60 to 140?”
Ian Ayres, Supercrunchers

Standard Deviation Strengths & Weaknesses
Strengths:
• Easy to understand?
• Takes into account all of the data
• Very useful
Weaknesses:
• Hard to calculate?

Any questions? ...
 Fall '09
 TAMBORINI