Chapter 2 - Chapter 2 Section 1 Describing Location in a...

Chapter 2 Section 1: Describing Location in a Distribution

Suppose you earned an 86 on a statistics quiz. Should you be satisfied with this score? What if it is the highest score in the class? What if it is below the "average" of the entire class? The teacher might even "curve" the grade. In this section we focus on describing the location of an individual within a distribution.

Let's consider the class scores below:

79 81 80 77 73 83 74 93 78 80 75 67 73
77 83 86 90 79 85 83 89 84 82 77 72

Here is a stemplot of the data:

6 | 7
7 | 2334
7 | 5777899
8 | 00123334
8 | 569
9 | 03

Notice that the distribution is roughly symmetric with no apparent outliers. Where does your score fall in comparison to everyone else's?

Measuring Position: Percentiles

One way to describe your position is to tell what percent of students in the class earned scores below yours. That is, we can calculate the percentile.

Definition: Percentile
The pth percentile of a distribution is the value with p percent of the observations less than it.

Here, your score of 86 is the fourth from the top of the class. Since 21 of the 25 observations (84%) are below your score, it is at the 84th percentile of the test score distribution.

Example:
Using these scores, let's calculate the percentile of the following:
a) The score at 72.
b) The score at 93.
c) The two students at 80.

*Note: Some may define the pth percentile as the value with p percent of the observations less than or equal to it.

Cumulative Relative Frequency Graphs

There are some interesting graphs that can be made using percentiles. One of them starts with a frequency table for a quantitative variable. Here is a frequency table that summarizes the ages of the first 44 U.S. presidents when they were inaugurated:

Age      Frequency   Relative frequency   Cumulative frequency   Cumulative relative frequency
40-44    2
45-49    7
50-54    13
55-59    12
60-64    7
65-69    3

The extra columns are used to record the relative frequency, cumulative frequency, and cumulative relative frequency.
- To determine the relative frequency, divide the count in each class by the total (44) and multiply by 100 to get a percentage.
- To determine the cumulative frequency, add the counts in the frequency column for the current class and all classes with smaller values of the variable.
- To determine the cumulative relative frequency, divide each entry in the cumulative frequency column by the total and multiply by 100 to get a percentage.

We can make a cumulative relative frequency graph of the data using the table.

What can we learn from this graph? Barack Obama was inaugurated at the age of 47. Is this unusually young?

Estimate and interpret the 65th percentile of the distribution.

Measuring Position: z-Scores

Looking back at your test score, we saw that it is above what seems to be the average. Let's use the test score data to determine the 1-variable statistics.

Mean   Median   Standard deviation
80     80       6.07

We can describe the location of your score by telling how many standard deviations above or below the mean it is. Since the mean is 80 and the standard deviation is about 6, the score of 86 is about one standard deviation above the mean. Converting observations in this manner is called standardizing.
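The percentile and cumulative relative frequency calculations described earlier in this section can be sketched in Python. This is only an illustration: the function names `percentile_of` and `cumulative_table` are made up for this example, and the percentile uses the "less than" definition given above rather than the "less than or equal to" variant.

```python
# Quiz scores for the 25 students, as listed in the handout.
scores = [79, 81, 80, 77, 73, 83, 74, 93, 78, 80, 75, 67, 73,
          77, 83, 86, 90, 79, 85, 83, 89, 84, 82, 77, 72]


def percentile_of(value, data):
    """Percent of observations strictly less than `value`."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


# Ages at inauguration: (class, frequency) pairs from the frequency table.
age_classes = [("40-44", 2), ("45-49", 7), ("50-54", 13),
               ("55-59", 12), ("60-64", 7), ("65-69", 3)]


def cumulative_table(classes):
    """Rows of (class, freq, relative %, cumulative freq, cumulative relative %)."""
    total = sum(freq for _, freq in classes)
    rows = []
    running = 0  # count for this class plus all classes with smaller values
    for label, freq in classes:
        running += freq
        rows.append((label, freq, 100 * freq / total,
                     running, 100 * running / total))
    return rows


if __name__ == "__main__":
    for s in (72, 80, 86, 93):
        print(f"score {s}: percentile {percentile_of(s, scores):.0f}")
    for label, freq, rel, cum, cum_rel in cumulative_table(age_classes):
        print(f"{label}  {freq:>2}  {rel:5.1f}%  {cum:>2}  {cum_rel:5.1f}%")
```

For the score of 86 this reproduces the 84th percentile found above, and the last row of the table should show a cumulative frequency of 44 and a cumulative relative frequency of 100%.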
Definition: Standardized value (z-score)
If x is an observation from a distribution that has known mean and standard deviation, the standardized value of x is

\[ z = \frac{x - \text{mean}}{\text{standard deviation}} \]

A standardized value is often called a z-score.

Let's revisit the scores we calculated percentiles for and determine their z-scores.

Example:
Using the test scores from the statistics quiz, determine the standardized score of
a) the grade at 93
b) the grade at 72

We can also use z-scores for comparisons. Suppose you took a chemistry test and got an 82. At first you might be disappointed, but your teacher described the scores as fairly symmetric with a mean of 76 and a standard deviation of 4. How does your chemistry score compare to your statistics score?

Transforming Data

To find the standardized score (z-score) for an individual observation, we transformed the data value by subtracting the mean and dividing by the standard deviation. Transforming converts the observations from the original units of measurement to a standardized scale. What effect does transforming (adding or subtracting; multiplying or dividing) have on the shape, center, and spread of the entire distribution? Let's investigate.

Example:
Soon after the metric system was introduced in Australia, a group of students were asked to guess the width of their classroom to the nearest meter.
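The z-score examples above (the grades of 93 and 72, and the statistics versus chemistry comparison) can be checked with a short Python sketch. It assumes the sample standard deviation (dividing by n - 1, which is what `statistics.stdev` computes); the helper name `z_score` is made up for this illustration.

```python
from statistics import mean, stdev

# Quiz scores for the 25 students, as listed in the handout.
scores = [79, 81, 80, 77, 73, 83, 74, 93, 78, 80, 75, 67, 73,
          77, 83, 86, 90, 79, 85, 83, 89, 84, 82, 77, 72]


def z_score(x, center, spread):
    """Standardized value: number of standard deviations x lies from the mean."""
    return (x - center) / spread


quiz_mean = mean(scores)   # 80
quiz_sd = stdev(scores)    # sample standard deviation, about 6.07

# Examples (a) and (b): grades of 93 and 72 on the statistics quiz.
z_93 = z_score(93, quiz_mean, quiz_sd)
z_72 = z_score(72, quiz_mean, quiz_sd)

# Comparison: 86 in statistics versus 82 in chemistry (mean 76, SD 4).
z_statistics = z_score(86, quiz_mean, quiz_sd)
z_chemistry = z_score(82, 76, 4)

if __name__ == "__main__":
    print(f"z for 93: {z_93:.2f}")
    print(f"z for 72: {z_72:.2f}")
    print(f"statistics z: {z_statistics:.2f}, chemistry z: {z_chemistry:.2f}")
```

A larger z-score means a score sits further above its class mean in standard deviation units, so comparing `z_statistics` with `z_chemistry` answers the comparison question even though the two tests have different means and spreads.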
Here are the students' guesses of the classroom width, in order from lowest to highest:

8 9 10 10 10 10 10 10 11 11 11 11 12 12 13 13 13 14 14 14 15 15
15 15 15 15 15 15 16 16 16 17 17 17 17 18 18 20 22 25 27 35 38 40

Let's create a dotplot and examine the 1-variable statistics to describe the distribution using SOCS.

Shape:
Center:
Spread:
Outliers:

Effect of adding or subtracting a constant

The actual width of the room was 13 meters. How close were the students' guesses? We can examine the distribution of the students' guessing errors by defining a new variable:

error = guess - 13

That is, we subtract 13 from each observation. What do you think will happen to the distribution? How will it affect the SOCS? Let's use the calculator to display the effect.

Effect of Adding (or Subtracting) a Constant
Adding the same number a (either positive, zero, or negative) to each observation
- adds a to measures of center and location (mean, median, quartiles, percentiles), but
- does not change the shape of the distribution or measures of spread (range, IQR, standard deviation).

Effect of multiplying or dividing by a constant

Since the metric system had only recently been introduced, it may not be helpful to tell the students they were off by a few meters. To put the errors in terms they may understand, we can convert the data to feet. There are roughly 3.28 feet in a meter, so for the student with an error of -5 meters, the error translates to

-5 meters × (3.28 feet / 1 meter) = -16.4 feet

So let's change the units of measurement from meters to feet. We need to multiply the error values by 3.28. What effect do you think this will have on the graph?
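In place of the calculator, a short Python sketch can display both effects on the guessing data: subtracting 13 to get the errors in meters, then multiplying by 3.28 to express them in feet. The `summary` helper and the variable names are made up for this illustration, and the standard deviation is the sample version from `statistics.stdev`.

```python
from statistics import mean, median, stdev

# The 44 guesses of the classroom width, in meters.
guesses = [8, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12,
           13, 13, 13, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15,
           16, 16, 16, 17, 17, 17, 17, 18, 18, 20, 22, 25, 27, 35, 38, 40]


def summary(data):
    """A few measures of center and spread for a list of numbers."""
    return {
        "mean": mean(data),
        "median": median(data),
        "sd": stdev(data),
        "range": max(data) - min(data),
    }


# Subtract a constant: error = guess - 13 (still in meters).
errors_m = [g - 13 for g in guesses]

# Multiply by a constant: convert each error from meters to feet.
errors_ft = [3.28 * e for e in errors_m]

if __name__ == "__main__":
    for name, data in [("guess (m)", guesses),
                       ("error (m)", errors_m),
                       ("error (ft)", errors_ft)]:
        stats = summary(data)
        print(name, {key: round(value, 2) for key, value in stats.items()})
```

Comparing the first two rows shows the mean and median dropping by 13 while the standard deviation and range stay the same. Comparing the last two rows shows every summary value scaled by 3.28.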
Effect of Multiplying (or Dividing) by a Constant
Multiplying (or dividing) each observation by the same number b (positive, negative, or zero)
- multiplies (divides) measures of center and location (mean, median, quartiles, percentiles) by b,
- multiplies (divides) measures of spread (range, IQR, standard deviation) by |b|, but
- does not change the shape of the distribution.

Connecting transformations and z-scores

How does transforming relate to z-scores? Finding a z-score combines two transformations: subtracting the mean from every score, then dividing the result by the standard deviation. Let's use the calculator to plot the z-scores. How do you think the distribution will change?

Density Curves

Since the very beginning we have followed a few steps when approaching data:
1) Plot your data: make a graph, usually a dotplot, stemplot, or histogram.
2) Look for the overall pattern (SOCS).
3) Calculate a numerical summary to describe the center and spread (mean and standard deviation, or median and IQR).

We will add the following:
4) Sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve.

The following is a histogram of the scores of all 947 seventh-grade students in Gary, Indiana, on the vocabulary part of the Iowa Test of Basic Skills (ITBS). A smooth curve is drawn on top as a good description of the overall pattern of the data.

[Figure: (a) histogram of ITBS vocabulary scores, showing the proportion of scores less than or equal to 6.0 in the actual data; (b) the same histogram with the density curve, showing the proportion of scores less than or equal to 6.0 under the curve. Horizontal axis: ITBS vocabulary score.]

In the graph on the left, the bars for scores of 6.0 or less are shaded so that their area can be compared with the shaded area under the curve in the graph on the right. The total area of the histogram bars is 100% (a proportion of 1), since all the observations are represented.
In moving from histogram bars to a smooth curve, we make a specific choice: adjust the scale of the graph so that the total area under the curve is exactly 1. Now the total area represents all the observations, just like the histogram. We can interpret areas under the curve as proportions of the observations.

Definition: Density Curve
A density curve is a curve that
- is always on or above the horizontal axis, and
- has area exactly 1 underneath it.
A density curve describes the overall pattern of a distribution. The area under the curve and above any interval of values on the horizontal axis is the proportion of all observations that fall in that interval.

Density curves come in many shapes. A density curve can give a good approximation of the overall pattern. Outliers, which are departures from the pattern, are not described by the curve.

*Note: No set of data is exactly described by a density curve. The curve is an approximation that is easy to use and accurate enough for practical use.

Describing Density Curves

Our measures of center and spread apply to density curves as well as to actual sets of observations. Areas under a density curve represent proportions of the total number of observations. The median of a data set is the point with half the observations on either side. So the median of a density curve is the "equal-areas point," the point with half the area under the curve to its left and the remaining half to its right.

Because density curves are idealized patterns, a symmetric density curve is exactly symmetric. The median and mean of a symmetric curve are exactly the same. We can see below how skewness affects the location of the mean.

[Figure: (a) a symmetric density curve with the median and mean marked at the same point; (b) a right-skewed density curve with the median and mean marked separately. The long right tail pulls the mean to the right.]
FIGURE 2.9 (a) The median and mean of a symmetric density curve both lie at the center of symmetry. (b) The median and mean of a right-skewed density curve. The mean is pulled away from the median toward the long tail.

FIGURE 2.18 The mean is the balance point of a density curve.

The mean of a set of observations is their arithmetic average. The mean of a density curve is the point at which the curve would balance if it were made of solid material.

In the previous section we described the mean and standard deviation of a set of data with the symbols x̄ and s_x, respectively. For a density curve, we denote the mean by the Greek letter mu (μ) and the standard deviation by the Greek letter sigma (σ). ...
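The "equal-areas point" and "balance point" descriptions of the median and mean can be checked numerically. The sketch below uses two hypothetical density curves chosen only for illustration (they are not the curves in the figures): a symmetric triangle on the interval from 0 to 2, and a right-skewed line f(x) = 2(1 - x) on the interval from 0 to 1. Areas are approximated with a simple midpoint rule, and all function names are made up for this example.

```python
def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the area under f between a and b."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def median_of(f, a, b, tol=1e-6):
    """Equal-areas point: bisect until the area to the left is one half."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if integrate(f, a, mid, n=10_000) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_of(f, a, b):
    """Balance point: the area under x * f(x) between a and b."""
    return integrate(lambda x: x * f(x), a, b)


def symmetric(x):
    """Hypothetical symmetric density: a triangle on [0, 2] peaking at 1."""
    return x if x <= 1 else 2 - x


def right_skewed(x):
    """Hypothetical right-skewed density on [0, 1]: f(x) = 2(1 - x)."""
    return 2 * (1 - x)


if __name__ == "__main__":
    for name, f, a, b in [("symmetric", symmetric, 0, 2),
                          ("right-skewed", right_skewed, 0, 1)]:
        print(name,
              "area:", round(integrate(f, a, b), 4),
              "median:", round(median_of(f, a, b), 4),
              "mean:", round(mean_of(f, a, b), 4))
```

Both curves should report a total area of 1. For the symmetric curve the median and mean coincide, while for the right-skewed curve the mean lands to the right of the median, toward the long tail.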