Numerical Measures of Central Tendency

Often, it is useful to have special numbers which summarize characteristics of a data set. These numbers are called descriptive statistics or summary statistics.

A measure of central tendency is a number that indicates the “center” of a data set, or a “typical” value.

Sample mean $\bar{X}$: For n observations,

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

The sample mean is often used to estimate the population mean. (Typically we can’t calculate the population mean.)

Alternative: Sample median M: the “middle value” of the data set. (At most 50% of data is greater than M and at most 50% of data is less than M.)

Steps to calculate M:
(1) Order the n data values from smallest to largest.
(2) The observation in position (n+1)/2 in the ordered list is the median M.
(3) If (n+1)/2 is not a whole number, the median is the average of the middle two observations.

For large data sets, we typically use a computer to calculate M.
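The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names `sample_mean` and `sample_median` and the small data sets are made up for this sketch and are not part of the notes. The median function follows steps (1) to (3) literally, using the position (n+1)/2.

```python
def sample_mean(data):
    """Sum of the observations divided by the number of observations n."""
    if len(data) == 0:
        raise ValueError("need at least one observation")
    return sum(data) / len(data)


def sample_median(data):
    """Median computed with the (n+1)/2 position rule."""
    if len(data) == 0:
        raise ValueError("need at least one observation")
    ordered = sorted(data)              # step (1): order smallest to largest
    n = len(ordered)
    position = (n + 1) / 2              # step (2): 1-based position in the ordered list
    if position == int(position):       # whole number: a single middle value exists
        return ordered[int(position) - 1]
    # step (3): not a whole number, so average the two middle observations
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


odd_sample = [2, 9, 4, 7, 5]    # hypothetical data, n = 5
even_sample = [2, 9, 4, 7]      # hypothetical data, n = 4

print(sample_mean(odd_sample), sample_median(odd_sample))
print(sample_mean(even_sample), sample_median(even_sample))
```

With n = 5 the position (n+1)/2 = 3 is a whole number, so the median is the third ordered value. With n = 4 the position is 2.5, so the second and third ordered values are averaged.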

Example: Per capita CO2 emissions (metric tons) for 25 European countries (2006):

Ordered Data: 3.3 4.2 5.6 5.6 5.7 5.7 6.2 6.3 7.0 7.6 8.0 8.1 8.3 8.6 8.7 9.4 9.7 9.9 10.3 10.3 10.4 11.3 12.7 13.1 24.5

Luxembourg, with 24.5 metric tons per capita, is an outlier (unusual value). What if we delete this country? Which measure was more affected by the outlier?

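One way to work through this example is with Python's standard `statistics` module. The sketch below uses the 25 ordered values exactly as listed; the variable names are chosen here for illustration only.

```python
from statistics import mean, median

# Ordered per capita CO2 emissions (metric tons), copied from the example.
co2 = [3.3, 4.2, 5.6, 5.6, 5.7, 5.7, 6.2, 6.3, 7.0, 7.6, 8.0, 8.1, 8.3,
       8.6, 8.7, 9.4, 9.7, 9.9, 10.3, 10.3, 10.4, 11.3, 12.7, 13.1, 24.5]

# Drop the largest value (Luxembourg, 24.5); the list is already ordered.
co2_no_outlier = co2[:-1]

mean_all, median_all = mean(co2), median(co2)
mean_trim, median_trim = mean(co2_no_outlier), median(co2_no_outlier)

print(f"all 25 countries:   mean = {mean_all:.2f}, median = {median_all:.2f}")
print(f"without Luxembourg: mean = {mean_trim:.2f}, median = {median_trim:.2f}")
print(f"change in mean:     {mean_all - mean_trim:.2f}")
print(f"change in median:   {median_all - median_trim:.2f}")
```

For these values the mean falls from 8.82 to about 8.17 when Luxembourg is removed, while the median moves only from 8.3 to 8.2, so the mean is the measure more affected by the outlier.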
Shapes of Distributions

When the pattern of data to the left of the center value looks the same as the pattern to the right of the center, we say the data have a symmetric distribution.
[Picture: symmetric distribution]

If the distribution (pattern) of data is imbalanced to one side, we say the distribution is skewed.

Skewed to the Right (long right “tail”).
[Picture: distribution skewed to the right]

Skewed to the Left (long left “tail”).
[Picture: distribution skewed to the left]

Comparing the mean and the median can indicate the skewness of a data set.
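As a rough illustration of comparing the two measures: a long right tail tends to pull the mean above the median, and a long left tail tends to pull it below. The sketch below applies that rule of thumb. The function name `skew_hint`, the tolerance argument, and the three small data sets are assumptions made for this example, and the rule is only a heuristic that can fail for some data sets.

```python
from statistics import mean, median


def skew_hint(data, tol=1e-9):
    """Rule-of-thumb skewness hint based on the sign of (mean - median)."""
    diff = mean(data) - median(data)
    if diff > tol:
        return "possibly skewed right (mean > median)"
    if diff < -tol:
        return "possibly skewed left (mean < median)"
    return "roughly symmetric (mean close to median)"


# Hypothetical data sets chosen to show each case.
right_tail = [1, 2, 3, 4, 20]       # one large value stretches the right tail
left_tail = [1, 17, 18, 19, 20]     # one small value stretches the left tail
balanced = [1, 2, 3, 4, 5]          # mirror image around the middle value

for name, values in [("right_tail", right_tail),
                     ("left_tail", left_tail),
                     ("balanced", balanced)]:
    print(name, "->", skew_hint(values))
```

A histogram remains the more reliable check of shape; the comparison of mean and median is best read as a quick first indication.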

Other measures of central tendency

Mode: Value that occurs most frequently in a data set. In a histogram, the modal class is the class with the most observations in it. A bimodal
