Lecture 3 Nancy Pfenning Stats 1000

We learned last time how to construct a stemplot to display a single quantitative variable. A back-to-back stemplot is a useful display tool when we are interested in comparing the values of a single quantitative variable for two categorical groups.

Example
Let's use a back-to-back stemplot to compare earnings (in thousands of dollars) of 28 male and 51 female students. Since the earnings range from 0 to 22 thousand, we will split the stems 0, 1, and 2 five ways each. [Note that besides the quantitative variable earnings, we are adding in a categorical variable, sex, that has two possible values, male and female.] The two groups share the same stems: male leaves extend from the stems right to left, while female leaves extend from the stems left to right.

[Back-to-back stemplot of earnings in thousands of dollars: male leaves on the left, stems 0, 1, and 2 each split five ways, female leaves on the right.]

The center is clearly higher for the males (midpoint at 3) than for the females (midpoint at 2). Male earnings range from 0 to 22 thousand, whereas female earnings range from 0 to 15 thousand. However, the spreads appear comparable if we disregard the high outliers. Both shapes are strongly skewed to the right, as is often the case with monetary variables such as earnings, costs of homes, etc. Both distributions have a single peak in the low thousands: 2 or 3 thousand dollars was a common amount for both sexes.

Note that some of the stems have no leaves. These stems must not be omitted; otherwise the gaps would disappear and we could not see outliers for what they are.

Since we see a tendency in this particular class for males to earn more than females, it is natural to wonder whether the same conclusion can be drawn about Pitt students, or all college students, in general. Our ultimate goal in this course is to go beyond the data at hand and draw conclusions about the larger population from which the data originated, a process called statistical inference.
This requires careful development of the needed theory over the course of the semester.

Up to now, we've mentioned center as simply the midpoint (median), and spread as the range. These provide only limited information, since each is based on just a couple of observations. Since center and spread are the most important features of a distribution, they should be defined carefully.

One measure of center is the median, or middle value. There is a single middle value for an odd number of observations. For an even number of observations, we take the median to be the average of the two middle values.

Example
The median earnings of the 28 male students is the average of the 14th and 15th ordered values, or (3 + 3)/2 = 3 thousand dollars. The median earnings of the 51 female students is the 26th ordered value, 2 thousand dollars. We can say that the typical male student earns 1 thousand dollars more than the typical female student....
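
The median rule above can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the function names `middle_positions` and `median` are made up for this sketch, and the two short earnings lists are hypothetical values, not the class data. The position check, however, uses the sample sizes from the example (28 males, 51 females).

```python
from typing import Sequence, Tuple


def middle_positions(n: int) -> Tuple[int, ...]:
    """Return the 1-based position(s) of the middle value(s) among n ordered values."""
    if n < 1:
        raise ValueError("need at least one observation")
    if n % 2 == 1:
        # Odd count: a single middle value.
        return ((n + 1) // 2,)
    # Even count: the two values on either side of the midpoint.
    return (n // 2, n // 2 + 1)


def median(values: Sequence[float]) -> float:
    """Median as described in the text: the middle value, or the average of the two middle values."""
    ordered = sorted(values)
    middle = [ordered[p - 1] for p in middle_positions(len(ordered))]
    return sum(middle) / len(middle)


# Sample sizes from the example in the text.
print(middle_positions(28))  # (14, 15)
print(middle_positions(51))  # (26,)

# Hypothetical earnings in thousands of dollars (NOT the class data).
print(median([0, 2, 3, 3, 5, 22]))  # 3.0
print(median([1, 15, 2, 0, 2]))     # 2.0
```

Note that the high values 22 and 15 in the hypothetical lists do not move the median, which is one reason it is a convenient summary of center for right-skewed data such as earnings.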
- Fall '06