BUS708 Week 03 Descriptive Statistics.pdf - BUS708...


BUS708 Statistics and Data Analysis
LECTURE 03: DESCRIPTIVE STATISTICS

Outline

• One categorical variable
  Summary statistics: frequency table, proportion
  Visualization: bar chart, pie chart
• Two categorical variables
  Summary statistics: two-way table, difference in proportions
  Visualization: segmented or side-by-side bar chart
• One quantitative variable
  Summary statistics: mean, median, standard deviation
  Visualization: dot plot, histogram
• Two quantitative variables
  Summary statistics: correlation
  Visualization: scatterplot
• One categorical and one quantitative variable
  Summary statistics: statistics by group, difference in means
  Visualization: side-by-side boxplots, dot plots, or histograms

Textbook: Sections 2.1 – 2.5
Data Description
Recall last week …

Statistics: Unlocking the Power of Data Lock 5

One Categorical Variable: Proportion
The proportion in a category is found by

$$\text{proportion} = \frac{\text{number in category}}{\text{total sample size}}$$

Proportion for a sample: $\hat{p}$ (“p-hat”)
Proportion for a population: $p$
Proportion
What proportion of adults sampled do not own a cell phone?

Cell phone type    Count
Android              458
iPhone               437
Blackberry           141
Non Smartphone       924
No cell phone        293
Total               2253

$$\hat{p} = \frac{293}{2253} \approx 0.13 \text{ or } 13\%$$

Proportions and percentages can be used interchangeably.

Two Categorical Variables
Look at the relationship between two categorical variables.
Data collected from university students on:
1. Relationship status
2. Gender
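As a quick check of the cell phone example above, the sample proportion can be computed directly from the frequency table. This is a minimal sketch: the counts come from the slide, while the dictionary layout and the helper name `sample_proportion` are illustrative choices, not part of the lecture.

```python
# Frequency table from the slide (counts of sampled adults by phone type).
counts = {
    "Android": 458,
    "iPhone": 437,
    "Blackberry": 141,
    "Non Smartphone": 924,
    "No cell phone": 293,
}


def sample_proportion(counts, category):
    """Return (number in category) / (total sample size)."""
    total = sum(counts.values())
    return counts[category] / total


# Hypothetical helper applied to the "No cell phone" category.
p_hat = sample_proportion(counts, "No cell phone")
print(f"p-hat = {p_hat:.2f}")   # p-hat = 0.13
print(f"as a percentage: {p_hat:.0%}")
```

Summing the counts inside the helper, rather than hard-coding 2253, keeps the proportion correct if a category count is later changed.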
Two-Way Table

                     Female   Male   Total
In a Relationship        32     10      42
It’s Complicated         12      7      19
Single                   63     45     108
Total                   107     62     169

1. What proportion of students are in a relationship? 42/169 ≈ 25%
2. What proportion of females are in a relationship? 32/107 ≈ 30%
3. What proportion of students in a relationship are female? 32/42 ≈ 76%
4. What proportion of males are in a relationship? 10/62 ≈ 16%

Difference in Proportions
A difference in proportions is the difference between the proportions of one categorical variable, calculated for different levels of the other categorical variable.

Example: proportion of females in a relationship minus proportion of males in a relationship

$$\hat{p}_F - \hat{p}_M = 0.30 - 0.16 = 0.14$$
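The four proportions and the difference in proportions above differ only in which total is used as the denominator. The sketch below makes that explicit. The counts are taken from the two-way table; the nested-dictionary layout and the helper names (`row_total`, `col_total`, `grand_total`) are assumptions made for illustration.

```python
# Two-way table from the slide: rows are relationship status, columns are gender.
table = {
    "In a Relationship": {"Female": 32, "Male": 10},
    "It's Complicated": {"Female": 12, "Male": 7},
    "Single": {"Female": 63, "Male": 45},
}


def row_total(table, row):
    """Total count for one relationship status."""
    return sum(table[row].values())


def col_total(table, col):
    """Total count for one gender."""
    return sum(cells[col] for cells in table.values())


def grand_total(table):
    """Total number of students in the sample."""
    return sum(row_total(table, row) for row in table)


rel = "In a Relationship"

p_rel = row_total(table, rel) / grand_total(table)                  # 42/169
p_rel_given_female = table[rel]["Female"] / col_total(table, "Female")  # 32/107
p_female_given_rel = table[rel]["Female"] / row_total(table, rel)       # 32/42
p_rel_given_male = table[rel]["Male"] / col_total(table, "Male")        # 10/62

# Difference in proportions: females in a relationship minus males in a relationship.
diff = p_rel_given_female - p_rel_given_male

print(f"{p_rel:.2f} {p_rel_given_female:.2f} "
      f"{p_female_given_rel:.2f} {p_rel_given_male:.2f}")
print(f"difference in proportions: {diff:.2f}")
```

Questions 2 and 3 use the same cell (32) but different denominators, a column total versus a row total, which is why their answers differ.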
Difference in Proportions
$\hat{p}_1$ = proportion of students in a relationship who are female
$\hat{p}_2$ = proportion of single students who are female
Find the difference in proportions: $\hat{p}_1 - \hat{p}_2$

                     Female   Male   Total
In a Relationship        32     10      42
It’s Complicated         12      7      19
Single                   63     45     108
Total                   107     62     169

$$\hat{p}_1 - \hat{p}_2 = \frac{32}{42} - \frac{63}{108} = 0.762 - 0.583 = 0.179$$

One Quantitative Variable
Recall from last week: to describe a quantitative variable, we need to consider:
• Shape (discussed last week)
• Centre
• Spread

Measures of Centre: mean, median
Measures of Spread: standard deviation, range, IQR
Other measures: minimum, maximum, Q1, Q3
Notation
$n$ = sample size (the number of cases in the sample)
$x_1, x_2, \ldots, x_n$ represent the $n$ values of the variable $x$
Example (from HollywoodMovies2011): $x_1 = 97.009$, $x_2 = 201.897$, $x_3 = 216.196$, …

Mean
The mean or average of the data values is

$$\text{mean} = \frac{\text{sum of all data values}}{\text{number of data values}}$$

Sample mean: $\bar{x}$
Population mean: $\mu$ (“mu”)

$$\text{mean} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$$
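The formula for the mean translates directly into code. The sketch below uses a small made-up sample (hypothetical values, not the HollywoodMovies2011 data, which is only partly listed on the slide) and compares the hand-written formula with the standard library's `statistics.fmean`.

```python
import statistics

# Hypothetical sample chosen for illustration only; not data from the lecture.
x = [2, 4, 4, 5, 10]

n = len(x)                 # sample size
x_bar = sum(x) / n         # mean = (x_1 + x_2 + ... + x_n) / n

print(x_bar)               # 5.0
print(statistics.fmean(x)) # same value from the standard library
```

Writing the sum and the division separately mirrors the notation on the slide, while `statistics.fmean` is the more convenient choice in practice.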
Median
The median, $m$, is the middle value when the data are ordered. If there is an even number of values, the median is the average of the two middle values.
The median splits the data in half.

Resistance
A statistic is resistant if it is relatively unaffected by extreme values.
The median is resistant while the mean is not.

                        Mean            Median
With Harry Potter       \$150,742,300   \$76,658,500
Without Harry Potter    \$141,889,900   \$75,009,000
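The median rule and the idea of resistance can both be seen on a toy data set. The values below are hypothetical (they are not the movie figures behind the table above), and `median_by_hand` is an illustrative helper that follows the definition on the slide; the result is compared with `statistics.median`.

```python
import statistics


def median_by_hand(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data: five typical values, then the same data plus one extreme value.
without_outlier = [3, 5, 7, 9, 11]
with_outlier = without_outlier + [100]

for label, data in [("without outlier", without_outlier),
                    ("with outlier", with_outlier)]:
    mean = statistics.fmean(data)
    median = median_by_hand(data)
    print(f"{label}: mean = {mean}, median = {median}")
```

In this toy example, adding the extreme value moves the mean from 7 to 22.5 while the median only moves from 7 to 8, the same pattern the Harry Potter comparison shows on a larger scale.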
Exercise: Measures of Center
For each of the following variables:
• Find the mean
• Find the median
• Identify any outliers