Public Health 6450
Fall 2011
Andrew Mugglin and Lynn Eberly
Division of Biostatistics
School of Public Health
University of Minnesota
ph6450@biostat.umn.edu

Part 02
Review
Univariate EDA with Descriptive Statistics
Multivariate EDA with Tables and Plots

Where are we going?

Previously:
- Univariate Exploratory Data Analysis (EDA) with plots
  - Categorical Variable: Frequency Table, Bar Chart, Pie Chart
  - Continuous Variable: Stemplot, Histogram

Current topics:
- Univariate EDA with descriptive statistics
- Multivariate EDA with tables and plots

Mugglin and Eberly PubH 6450 Fall 2011 Part 02 2 / 51

Measures of Central Tendency and Shape
Measures of Dispersion
Data Exploration Strategy

Measures of central tendency

- Mean
- Median
- Mode
- Skew

Summation notation

Setup: the dataset consists of {8, 2, 3, 5}. These numbers are denoted $x_i$:

$x_1 = 8,\quad x_2 = 2,\quad x_3 = 3,\quad x_4 = 5.$

$\text{Sum} = \sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4 = 8 + 2 + 3 + 5 = 18,$

where the capital Greek letter sigma ($\Sigma$) is the summation sign.

Mean

Definition: For a set of $n$ observations $x_1, x_2, \ldots, x_n$, the mean $\bar{x}$ is defined as

$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}, \quad \text{or} \quad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$
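As a quick check of the summation and mean formulas, here is a minimal Python sketch using the slide's dataset {8, 2, 3, 5}. The function names `total_of` and `mean_of` are illustrative choices, not part of the course materials.

```python
def total_of(xs):
    """Return x_1 + x_2 + ... + x_n, i.e. the sum over i = 1..n of x_i."""
    total = 0
    for x_i in xs:
        total += x_i
    return total


def mean_of(xs):
    """Return x-bar = (1/n) * sum of x_i for a non-empty data set."""
    n = len(xs)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return total_of(xs) / n


# Dataset from the summation-notation slide: x_1 = 8, x_2 = 2, x_3 = 3, x_4 = 5.
x = [8, 2, 3, 5]

print(total_of(x))  # 18
print(mean_of(x))   # 4.5
```

The explicit loop mirrors the expanded sum $x_1 + x_2 + \cdots + x_n$; in practice the built-in `sum(x) / len(x)` gives the same result.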
Median

Definition: For a set of $n$ observations $x_1, x_2, \ldots, x_n$, the median $M$ is the midpoint, i.e., half the observations take that value or a smaller value.

Let $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ denote the observations in increasing order:

- $x_{(1)}$ is the minimum (the smallest),
- $x_{(2)}$ is the second smallest,
- $x_{(3)}$ is the third smallest,
- ...
- $x_{(n)}$ is the maximum (the largest).

Then

$M = \begin{cases} x_{((n+1)/2)} & \text{if } n \text{ is odd}, \\ \frac{1}{2}\left( x_{(n/2)} + x_{((n+2)/2)} \right) & \text{if } n \text{ is even}. \end{cases}$

Mode

Definition: The mode is the value that occurs most frequently in the data set....
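The median and mode definitions can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `median_of` and `modes_of` are hypothetical, the list `[1, 2, 2, 3]` is made-up data, and returning every tied value for the mode is a choice made here, since the slide text above does not say how ties are handled.

```python
from collections import Counter


def median_of(xs):
    """Median via the order-statistic formula for odd and even n."""
    ordered = sorted(xs)  # x_(1) <= x_(2) <= ... <= x_(n)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # n odd: M = x_((n+1)/2); subtract 1 because Python lists are 0-based.
        return ordered[(n + 1) // 2 - 1]
    # n even: M = (x_(n/2) + x_((n+2)/2)) / 2
    return (ordered[n // 2 - 1] + ordered[(n + 2) // 2 - 1]) / 2


def modes_of(xs):
    """Return every value tied for the highest frequency (assumed tie rule)."""
    counts = Counter(xs)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


print(median_of([8, 2, 3, 5]))  # 4.0  (n even: average of 3 and 5)
print(median_of([8, 2, 3]))     # 3    (n odd: the middle ordered value)
print(modes_of([1, 2, 2, 3]))   # [2]
```

For the slide's dataset {8, 2, 3, 5}, every value occurs exactly once, so `modes_of` returns all four values, which shows that the mode is not always informative for small data sets.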
