
# This is true even if the word average is not in the

• amyrox13
• 23


Most of our RQs are about the “average” attitude/behavior and the “average” effect

‐ This is true even if the word “average” is not in the question (e.g., “Do college graduates earn more money than non‐college graduates?”)
‐ This is also true for many relational RQs, e.g., “Is there an association between mother’s education level and students’ academic performance?” or “Does smoking cause lung cancer?”
10/04/2019

Typically, we measure “average” attitude/behavior with the mean

1. Mean
‐ The sum total of observed values divided by the number of observations
‐ Normally, the “mean value” of your sample gives you a sense of what the sample is like

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

But sometimes the mean gives you a poor sense of the “average” value: the “Bill Gates walks into a bar” example

| Name | Income |
|---|---|
| Tom | \$32,000 |
| Larry | \$36,000 |
| Susan | \$39,000 |
| Paul | \$41,000 |
| Marcus | \$45,000 |
| Randy | \$50,000 |
| Sandy | \$57,000 |
| Tim | \$60,000 |
| Pam | \$65,000 |
| Jack | \$80,000 |

What is the average (mean) annual income for the group? \$50,500

| Name | Income |
|---|---|
| Tom | \$32,000 |
| Larry | \$36,000 |
| Susan | \$39,000 |
| Paul | \$41,000 |
| Marcus | \$45,000 |
| Randy | \$50,000 |
| Sandy | \$57,000 |
| Tim | \$60,000 |
| Pam | \$65,000 |
| Jack | \$80,000 |
| Bill Gates | \$1,000,000,000 |

What is the new average (mean) income for the group? About \$91 million (\$1,000,505,000 / 11 ≈ \$90,955,000)

Hence we need an alternative measure, and the median comes to the rescue

Median
‐ The middle value when the data are arranged in ascending or descending order
‐ With an even number of data points, the average of the two middle values
‐ Also called the 50th percentile

*Another situation where you can use the median but not the mean is when you work with rank‐order data (ordinal‐scaled data such as income groups)
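The mean calculation in the bar example can be reproduced with a short Python sketch. The variable names are illustrative and not part of the slides; the incomes are the ones listed in the tables. The direct calculation gives roughly \$91 million for the eleven‐person group.

```python
from statistics import mean

# Annual incomes (in dollars) of the ten people in the bar example.
incomes = [32_000, 36_000, 39_000, 41_000, 45_000,
           50_000, 57_000, 60_000, 65_000, 80_000]
bill_gates = 1_000_000_000  # income of the one extreme newcomer

# Mean = sum of observed values / number of observations.
mean_before = mean(incomes)
mean_after = mean(incomes + [bill_gates])

print(f"Mean without Bill Gates: ${mean_before:,.0f}")
print(f"Mean with Bill Gates:    ${mean_after:,.0f}")
```

One extreme value moves the mean from \$50,500 to about \$91 million, even though none of the original ten incomes changed.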
Back to the “Bill Gates walks into a bar” example

Median income of the group (the ten people, Tom through Jack)
‐ Median = (\$45,000 + \$50,000)/2 = \$47,500

Median income of the new group (the same ten people plus Bill Gates at \$1,000,000,000)
‐ Median = \$50,000

Application: why doesn’t the average (mean) income represent the “average Joe’s” income?
‐ Average (mean) household income: close to 70% of households make less than this

Metric variables – outliers
‐ Outlier: an observation that is well outside the expected range of values
‐ Outliers are typically spotted by examining the distribution of the variable
‐ You need to be very careful with outliers because they can mess up your mean calculation
‐ There is no scientific way to determine how far is too far out; it is mostly a judgement call
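The two medians from the bar example can be checked the same way. This is a minimal sketch using Python's standard `statistics.median`; the variable names are illustrative only.

```python
from statistics import median

# Annual incomes (in dollars) of the ten people in the bar example.
incomes = [32_000, 36_000, 39_000, 41_000, 45_000,
           50_000, 57_000, 60_000, 65_000, 80_000]
bill_gates = 1_000_000_000

# Even number of data points: average of the two middle values.
median_before = median(incomes)
# Odd number of data points: the single middle value.
median_after = median(incomes + [bill_gates])

print(f"Median without Bill Gates: ${median_before:,.0f}")
print(f"Median with Bill Gates:    ${median_after:,.0f}")
```

The median moves only from \$47,500 to \$50,000, which is why it describes the “typical” person in the bar better than the mean once an outlier is present.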
Metric data outliers – commute time to Caulfield

Statistics (Q6)

| Statistic | Value |
|---|---|
| N (valid) | 107 |
| N (missing) | 0 |
| Mean | 38.21 |
| Median | 30.00 |
| Std. Deviation | 58.319 |
| Minimum | 5 |
| Maximum | 600 |

Q6 frequency table

| Value | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| 5 | 5 | 4.7 | 4.7 | 4.7 |
| 8 | 1 | 0.9 | 0.9 | 5.6 |
| 10 | 6 | 5.6 | 5.6 | 11.2 |
| 15 | 10 | 9.3 | 9.3 | 20.6 |
| 20 | 16 | 15.0 | 15.0 | 35.5 |
| 25 | 8 | 7.5 | 7.5 | 43.0 |
| 30 | 18 | 16.8 | 16.8 | 59.8 |
| 32 | 1 | 0.9 | 0.9 | 60.7 |
| 35 | 4 | 3.7 | 3.7 | 64.5 |
| 38 | 1 | 0.9 | 0.9 | 65.4 |
| 40 | 10 | 9.3 | 9.3 | 74.8 |
| 45 | 6 | 5.6 | 5.6 | 80.4 |
| 50 | 2 | 1.9 | 1.9 | 82.2 |
| 53 | 2 | 1.9 | 1.9 | 84.1 |
| 60 | 11 | 10.3 | 10.3 | 94.4 |
| 70 | 2 | 1.9 | 1.9 | 96.3 |
| 90 | 2 | 1.9 | 1.9 | 98.1 |
| 120 | 1 | 0.9 | 0.9 | 99.1 |
| 600 | 1 | 0.9 | 0.9 | 100.0 |
| Total | 107 | 100.0 | 100.0 | |

Example ‐ “Average” commute time to Caulfield

The mean is very sensitive to the existence of outlier values, but the median is not

| | Including outlier | Excluding outlier |
|---|---|---|
| Mean | 38.2 | 32.9 |
| Median | 30.0 | 30.0 |
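The commute‐time comparison can be reproduced from the Q6 frequency table. The sketch below expands the table into individual observations and treats the single value of 600 as the outlier, as the slides do; the variable names are illustrative only.

```python
from statistics import mean, median

# (commute time, frequency) pairs copied from the Q6 frequency table.
freq_table = [
    (5, 5), (8, 1), (10, 6), (15, 10), (20, 16), (25, 8), (30, 18),
    (32, 1), (35, 4), (38, 1), (40, 10), (45, 6), (50, 2), (53, 2),
    (60, 11), (70, 2), (90, 2), (120, 1), (600, 1),
]

# Expand the table into one entry per respondent.
times = [value for value, count in freq_table for _ in range(count)]

# Drop the single extreme observation (600) to see its effect.
times_no_outlier = [t for t in times if t != 600]

for label, data in [("Including outlier", times),
                    ("Excluding outlier", times_no_outlier)]:
    print(f"{label}: n={len(data)}, "
          f"mean={mean(data):.1f}, median={median(data):.1f}")
```

Removing one observation out of 107 lowers the mean from 38.2 to 32.9, while the median stays at 30.0.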
You need to make changes to the collected data for your research
‐ For example, to create new ways to group your respondents for