# 1.2 Numerical Summary for Unvariate Variable.pdf - STAT...


STAT 2513. Statistics for Biological and Health Sciences

§ 1.2 Numerical Summary for Univariate Variable

Peng Zeng (Auburn University, Summer 2019)

This section discusses how to summarize data with numbers. The two most important summaries of data for a quantitative variable are those for the center and the spread. Appropriate choices for these measures are discussed here, as well as the effects of a linear transformation of the data on these measures. This section also explains the five-number summary and its graphical representation, the boxplot. Finally, the percentile is explained as a measure of relative standing.

## Measures of Center

In a class of 15 students, the scores on exam 1 are:

55, 60, 65, 68, 72, 64, 83, 63, 66, 72, 95, 62, 83, 58, 72.

How would you describe the center of these numbers?

A typical approach is to describe the center of data by the mean. “Mean” is the statistical term for “average”. Calculate the sample mean $\bar{x}$ by the formula

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n},$$

where $n$ denotes the sample size, or the number of individuals in a dataset. Here $n = 15$, and $x_i$ denotes the score for the $i$th student. For the example, the mean is

$$\bar{x} = \frac{55 + 60 + 65 + 68 + 72 + 64 + 83 + 63 + 66 + 72 + 95 + 62 + 83 + 58 + 72}{15} = 69.2.$$

Another approach for describing the center is to calculate the median. The word “median” roughly means “middle number”, that is, we sort the data and then pick the number in the middle. Below are the data sorted in ascending order.

55, 58, 60, 62, 63, 64, 65, 66, 68, 72, 72, 72, 83, 83, 95.

The middle number is the 8th score, because seven scores are smaller than it and seven scores are larger than it. Hence the median is 66.

Suppose instead that there are only $n = 4$ scores. Then there is no single “middle number”. In this case, we average the two middle numbers. For example, the sample median of 55, 58, 60, 62 is $(58 + 60)/2 = 59$.
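The mean and median calculations above can be checked with a short Python sketch. The helper name `sample_median` is an illustrative choice, not something defined in the notes; it sorts the data and takes the middle value, or the average of the two middle values when the sample size is even. Python's standard `statistics.median` follows the same rule and is used here only as a cross-check.

```python
from statistics import median

# Exam 1 scores for the 15 students, as listed in the notes.
scores = [55, 60, 65, 68, 72, 64, 83, 63, 66, 72, 95, 62, 83, 58, 72]


def sample_median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd sample size: there is a single middle number.
        return ordered[mid]
    # Even sample size: average the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Sample mean: sum of the observations divided by the sample size n.
x_bar = sum(scores) / len(scores)

print(round(x_bar, 1))                   # 69.2
print(sample_median(scores))             # 66
print(sample_median([55, 58, 60, 62]))   # 59.0
print(median(scores))                    # 66 (standard-library cross-check)
```
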
In summary, the sample median is the middle number, provided there is one, or otherwise the average of the two middle numbers.

Care has to be taken when interpreting a measure of center. Average life expectancy, for example, has substantially increased during the last two decades in the United States. This poses problems for the long-term financing of Social Security. Based on this information, one might find an increase of the minimum retirement age to be the obvious solution. It turns out, however, that life expectancy has increased much more for wealthy people than for the poor. Should the poor, who depend on Social Security, have to work longer because the wealthy, who don’t need it, live longer? More generally, just because some procedure or decision is best in the “typical” case does not necessarily make it the best choice in a particular case.

## Measures of Spread

How spread apart are the data? One idea for measuring the spread of the data is to calculate the average distance from its center. For the data

55, 58, 60, 62, 63, 64, 65, 66, 68, 72, 72, 72, 83, 83, 95

the center, as measured by the sample mean, is 69.2. So we subtract 69.2 from each score and obtain

-14.2, -11.2, -9.2, -7.2, -6.2, -5.2, -4.2, -3.2, -1.2, 2.8, 2.8, 2.8, 13.8, 13.8, 25.8.
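
As a quick check of the subtraction step, the following Python sketch recomputes each score's deviation from the sample mean. The variable names are illustrative, and the rounding to one decimal place is only there to hide floating-point noise; it is not part of the method in the notes.

```python
# Sorted exam 1 scores, as listed in the notes.
scores = [55, 58, 60, 62, 63, 64, 65, 66, 68, 72, 72, 72, 83, 83, 95]

# Sample mean of the scores (69.2 for these data).
x_bar = sum(scores) / len(scores)

# Deviation of each score from the mean, rounded to one decimal place.
deviations = [round(x - x_bar, 1) for x in scores]

print(deviations)
```

Negative deviations belong to scores below the mean and positive deviations to scores above it.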