Chapter 2 Section 1: Describing Location in a Distribution

Suppose you earned an 86 on a statistics quiz. Should you be satisfied with this score? What if it is the highest score in the class? What if it is below the "average" of the entire class? Maybe the teacher will "curve" the grade. We will focus on describing the location of an individual within a distribution. Let's consider the class scores below:
79 81 80 77 73 83 74 93 78 80 75 67 73
77 83 86 90 79 85 83 89 84 82 77 72

Here is a stemplot of the data. Notice that the distribution is roughly symmetric with no apparent outliers. Where does your score fall in comparison to everyone else?

6 | 7
7 | 2334
7 | 5777899
8 | 00123334
8 | 569
9 | 03

Measuring Position: Percentiles

One way to describe your position is to tell what percent of students in the class earned scores that were below yours. That is, we can calculate the percentile.

Definition: Percentile
The pth percentile of a distribution is the value with p percent of the observations less than it.

Here, your score is the fourth from the top of the class. Since 21 of the 25 observations are below your score, it is at the 84th percentile in the test score distribution.
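A minimal sketch of the percentile calculation, assuming the "strictly less than" definition given above. The helper name `percentile_of` is hypothetical and only for illustration; the scores are copied from the class data.

```python
# Quiz scores copied from the class data in these notes.
scores = [79, 81, 80, 77, 73, 83, 74, 93, 78, 80, 75, 67, 73,
          77, 83, 86, 90, 79, 85, 83, 89, 84, 82, 77, 72]

def percentile_of(value, data):
    """Percent of observations strictly less than `value` (hypothetical helper)."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

# 21 of the 25 scores are below 86, so this prints 84.0
print(percentile_of(86, scores))
```

Under the alternative "less than or equal to" definition, the comparison `x < value` would become `x <= value`, and the result for 86 would change.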
Example:
Using these scores, let's calculate the percentile of the following:
a) The score at 72. Only 1 of the 25 scores is below 72, so it is at the 4th percentile.
b) The score at 93. Since 24 of the 25 scores are below 93, it is at the 96th percentile.
c) The two students at 80. Since 12 of the 25 scores are below 80, both are at the 48th percentile.

*Note: Some may define the pth percentile as the value with p percent less than or equal to it.

Cumulative Relative Frequency Graphs
There are some interesting graphs that can be made using percentiles. One of them starts with a frequency table for a quantitative variable. Here is a frequency table that summarizes the ages of the first 44 U.S. presidents when they were inaugurated:

Age      Frequency   Relative    Cumulative   Cumulative relative
                     frequency   frequency    frequency
40-44        2         4.5%          2            4.5%
45-49        7        15.9%          9           20.5%
50-54       13        29.5%         22           50.0%
55-59       12        27.3%         34           77.3%
60-64        7        15.9%         41           93.2%
65-69        3         6.8%         44          100.0%

The extra columns hold the relative frequency, cumulative frequency, and cumulative relative frequency.

To determine the relative frequency, divide the count in each class by the total (44) and multiply by 100 to get a percentage.

To determine the cumulative frequency, add the counts in the frequency column for the current class and all classes with smaller values of the variable.

To determine the cumulative relative frequency, divide the entries in the cumulative frequency column by the total and multiply by 100 to get a percentage.

We can make a cumulative relative frequency graph of the data using the table. What can we learn from this graph? Barack Obama was inaugurated at the age of 47. Is this unusually young?
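The three column rules described above can be sketched in a few lines. This is only an illustration; the class labels and counts are copied from the frequency table, and the variable names are assumptions.

```python
# Age classes and counts copied from the frequency table.
classes = ["40-44", "45-49", "50-54", "55-59", "60-64", "65-69"]
counts = [2, 7, 13, 12, 7, 3]
total = sum(counts)  # 44 presidents

rows = []
running = 0
for label, count in zip(classes, counts):
    running += count                  # cumulative frequency
    rel = 100 * count / total         # relative frequency (%)
    cum_rel = 100 * running / total   # cumulative relative frequency (%)
    rows.append((label, count, rel, running, cum_rel))

for label, count, rel, cum, cum_rel in rows:
    print(f"{label}  {count:>2}  {rel:5.1f}%  {cum:>2}  {cum_rel:5.1f}%")
```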
Estimate and interpret the 65th percentile of the distribution.
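One way to read values off a cumulative relative frequency graph is linear interpolation between the plotted points. The sketch below assumes each cumulative percentage is plotted at the lower boundary of the next class (for example, 4.5% at age 45). That is a common convention, but these notes do not state it. The helper names `percentile_at` and `age_at` are hypothetical.

```python
# Points on the graph, assuming each cumulative percentage is plotted
# at the lower boundary of the next age class.
ages = [40, 45, 50, 55, 60, 65, 70]
cum_counts = [0, 2, 9, 22, 34, 41, 44]
cum_pct = [100 * c / 44 for c in cum_counts]

def percentile_at(age):
    """Estimate the cumulative percent at `age` by linear interpolation."""
    for i in range(len(ages) - 1):
        if ages[i] <= age <= ages[i + 1]:
            frac = (age - ages[i]) / (ages[i + 1] - ages[i])
            return cum_pct[i] + frac * (cum_pct[i + 1] - cum_pct[i])
    raise ValueError("age is outside the range of the graph")

def age_at(pct):
    """Estimate the age at a given cumulative percent (inverse lookup)."""
    for i in range(len(ages) - 1):
        if cum_pct[i] <= pct <= cum_pct[i + 1]:
            frac = (pct - cum_pct[i]) / (cum_pct[i + 1] - cum_pct[i])
            return ages[i] + frac * (ages[i + 1] - ages[i])
    raise ValueError("percent must be between 0 and 100")

print(round(percentile_at(47)))  # about the 11th percentile
print(round(age_at(65)))         # about age 58
```

Under these assumptions, an inauguration age of 47 sits near the 11th percentile: younger than most presidents, but not extraordinarily so. The 65th percentile is about age 58, meaning roughly 65% of these presidents were younger than 58 at inauguration.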
Measuring Position: z-Scores
Looking back at your test score, we could see that it is above what seems to be the average. Let's use the test scores to determine the 1-variable statistics.

Mean    Median    Standard Deviation
80      80        6.07

We can describe the location of your score by telling how many standard deviations above or below the mean it is. Since the mean is 80 and the standard deviation is about 6, the score of 86 is about one standard deviation above the mean. Converting observations in this manner is called standardizing.

Definition: Standardized value (z-score)
If x is an observation from a distribution that has known mean and standard deviation, the standardized value of x is

$$z = \frac{x - \text{mean}}{\text{standard deviation}} = \frac{x - \bar{x}}{s_x}$$

A standardized value is often called a z-score.

Let's revisit the scores we calculated the percentiles for and determine their z-scores.

Example:
Using the test scores from the statistics quiz, determine the standardized score for each of the following.
a) The grade at 93:
$z = \frac{93 - 80}{6.07} \approx 2.14$
The score of 93 is 2.14 standard deviations above the mean.
b) The grade at 72:
$z = \frac{72 - 80}{6.07} \approx -1.32$
The score of 72 is 1.32 standard deviations below the mean.
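The standardizing step can be sketched with Python's standard `statistics` module. The helper name `z_score` is hypothetical, and the sketch assumes the sample standard deviation (dividing by n - 1), which matches the value of about 6.07 used in these notes.

```python
from statistics import mean, stdev

# Quiz scores copied from the class data in these notes.
scores = [79, 81, 80, 77, 73, 83, 74, 93, 78, 80, 75, 67, 73,
          77, 83, 86, 90, 79, 85, 83, 89, 84, 82, 77, 72]

x_bar = mean(scores)   # 80
s_x = stdev(scores)    # sample standard deviation, about 6.07

def z_score(x):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - x_bar) / s_x

for x in (93, 72, 86):
    print(x, round(z_score(x), 2))
```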
We can also use z-scores for comparisons. Suppose you took a chemistry test and got an 82. At first you might be disappointed, but your teacher described the scores as fairly symmetric with a mean of 76 and a standard deviation of 4. How does your score compare to your statistics grade?
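A worked comparison, as a sketch: it uses the chemistry mean and standard deviation given in the question and the statistics quiz mean of 80 with a standard deviation of about 6.07. The subscript labels are only for this illustration.

```latex
z_{\text{chem}} = \frac{82 - 76}{4} = 1.50
\qquad
z_{\text{stat}} = \frac{86 - 80}{6.07} \approx 0.99
```

Relative to each class, the chemistry score (1.5 standard deviations above the mean) is the stronger result, even though the raw score is lower than the 86 in statistics.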
Transforming Data
To find the standardized score (z-score) for an individual observation, we transformed the data value by subtracting the mean and dividing by the standard deviation. Transforming converts the observations from the original units of measurement to a standardized scale. What effect does transforming (adding or subtracting; multiplying or dividing) have on the shape, center, and spread of the entire distribution? Let's investigate.

Example:
Soon after the metric system was introduced in Australia, a group of students was asked to guess the width of their classroom to the nearest meter. Here are the guesses in order from lowest to highest:
8 9 10 10 10 10 10 10 11 11 11 11 12
12 13 13 13 14 14 14 15 15 15 15 15 15
15 15 16 16 16 17 17 17 17 18 18 20 22
25 27 35 38 40
Let's create a dotplot and examine the 1-variable statistics to describe the SOCS (shape, outliers, center, spread).
Shape:
Center:
Spread:
Outliers:
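The center, spread, and outlier prompts can be checked numerically. The sketch below uses Python's `statistics` module. The quartile convention (the median of each half of the sorted data) and the 1.5 x IQR outlier rule are assumptions of this sketch, since the notes do not say which conventions to use.

```python
from statistics import mean, median, stdev

# Classroom-width guesses (meters) copied from the notes.
guesses = [8, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 12,
           12, 13, 13, 13, 14, 14, 14, 15, 15, 15, 15, 15, 15,
           15, 15, 16, 16, 16, 17, 17, 17, 17, 18, 18, 20, 22,
           25, 27, 35, 38, 40]

data = sorted(guesses)
half = len(data) // 2
q1 = median(data[:half])    # median of the lower half
q3 = median(data[-half:])   # median of the upper half
iqr = q3 - q1

print("mean:", round(mean(data), 2))
print("median:", median(data))
print("range:", max(data) - min(data))
print("IQR:", iqr)
print("std dev:", round(stdev(data), 2))

# 1.5 x IQR rule for flagging possible outliers
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low or x > high]
print("possible outliers:", outliers)
```

With these conventions the mean (about 16.02) is above the median (15), which is consistent with a distribution that has a long right tail, and the four largest guesses are flagged as possible outliers.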
Effect of adding or subtracting a constant
The actual width of the room was 13 meters. How close were the student guesses? We can examine the distribution of students' guessing errors by defining a new variable:

error = guess - 13

That is, we will subtract 13 from each observation. What do you think will happen to our distribution? How will it affect the SOCS? Let's use the calculator to display the effect.

Effect of Adding or Subtracting a Constant
Adding the same number a (either positive, zero, or negative) to each observation
- adds a to measures of center and location (mean, median, quartiles, percentiles), but
- does not change the shape of the distribution or measures of spread (range, IQR, standard deviation).

Effect of multiplying or dividing by a constant
Since the metric system had only just been introduced, it may not be useful to tell the students they were wrong by a few meters. To put the errors in terms they may understand, we can convert the data into feet. There are roughly 3.28 feet in a meter, so for the student who had an error of -5 meters, that translates to

$$-5 \text{ meters} \times \frac{3.28 \text{ feet}}{1 \text{ meter}} = -16.4 \text{ feet}$$

So let's change the units of measurement from meters to feet. We need to multiply the error values by 3.28. What effect do you think it will have on the graph?

Effect of Multiplying (or Dividing) by a Constant
Multiplying (or dividing) each observation by the same number b (positive, negative, or zero)
- multiplies (divides) measures of center and location (mean, median, quartiles, percentiles) by b,
- does not change the shape of the distribution, and
- multiplies (divides) measures of spread (range, IQR, standard deviation) by |b|.
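Both effects can be seen side by side in a short sketch. The variable names `errors_m` and `errors_ft` and the helper `summary` are hypothetical; the data are the classroom-width guesses from the notes, with 13 meters as the true width and 3.28 feet per meter as the conversion factor.

```python
from statistics import mean, median, stdev

# Classroom-width guesses (meters) copied from the notes.
guesses = [8, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 12,
           12, 13, 13, 13, 14, 14, 14, 15, 15, 15, 15, 15, 15,
           15, 15, 16, 16, 16, 17, 17, 17, 17, 18, 18, 20, 22,
           25, 27, 35, 38, 40]

errors_m = [g - 13 for g in guesses]       # subtract a constant (meters)
errors_ft = [3.28 * e for e in errors_m]   # multiply by a constant (feet)

def summary(data):
    """Center and spread measures, rounded for display."""
    return {
        "mean": round(mean(data), 2),
        "median": round(median(data), 2),
        "range": round(max(data) - min(data), 2),
        "sd": round(stdev(data), 2),
    }

for name, data in [("guess (m)", guesses),
                   ("error (m)", errors_m),
                   ("error (ft)", errors_ft)]:
    print(name, summary(data))
```

Subtracting 13 lowers the mean and median by 13 and leaves the range and standard deviation alone. Multiplying by 3.28 scales the center and the spread by 3.28.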
Connecting transformations and z-scores
How does transforming relate to z-scores? Finding z-scores combines both kinds of transformation: we subtract the mean from every score and then divide by the standard deviation. Let's use the calculator to plot the z-scores. How do you think the distribution will change? Standardizing leaves the shape unchanged, but the z-scores have a mean of 0 and a standard deviation of 1.

Density Curves
We have followed a few steps to approach our data since the very beginning:
1) Plot your data: make a graph, usually a dotplot, stemplot, or histogram.
2) Look for the overall pattern (SOCS).
3) Calculate a numerical summary to describe the center and spread (mean/standard deviation or median/IQR).
We will add the following:
4) Sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve.

The following is a histogram of the scores of all 947 seventh-grade students in Gary, Indiana, on the vocabulary part of the Iowa Test of Basic Skills (ITBS). A smooth curve is drawn on top as a good description of the overall pattern of the data.

[Figure: two histograms of ITBS vocabulary scores with the smooth curve. (a) The proportion of scores less than or equal to 6.0 in the actual data. (b) The proportion of scores less than or equal to 6.0 from the density curve.]

The region for scores of 6.0 or less is shaded in the histogram on the left so it can be compared with the area shaded under the curve in the graph on the right. The total area of the histogram bars is 100% (a proportion of 1), since all the
observations are represented. In moving from histogram bars to a smooth curve, we make a specific
choice: adjust the scale of the graph so that the total area under the curve is exactly 1. Now the total
area represents all the observations, just like the histogram. We can interpret areas under the curve
as proportions of the observations.

Definition: Density Curve
A density curve is a curve that
- is always on or above the horizontal axis, and
- has area exactly 1 underneath it.
A density curve describes the overall pattern of a distribution. The area under the curve and above any interval of values on the horizontal axis is the proportion of all observations that fall in that interval.

Density curves come in many shapes. A density curve can give a good approximation of the overall pattern. Outliers, which are departures from the pattern, are not described by the curve.

*Note: No set of data is exactly described by a density curve. The curve is an approximation that is easy to use and accurate enough for practical use.

Describing Density Curves
Our measures of center and spread apply to density curves as well as to actual sets of observations. Areas under a density curve represent proportions of the total number of observations. The median of a data set is the point with half the observations on either side. So the median of a density curve is the "equal-areas point," the point with half the area under the curve to its left and the remaining half of the area to its right.

Because density curves are idealized patterns, a symmetric density curve is exactly symmetric. The median and mean of a symmetric curve are exactly the same. We can see below how a skewed distribution affects the location of the mean.

[Figure: (a) a symmetric density curve with the median and mean marked at the same point; (b) a right-skewed density curve with the median and mean marked, annotated "The long right tail pulls the mean to the right."]
FIGURE 2.9 (a) The median and mean of a symmetric density curve both lie at the center of symmetry. (b) The median and mean of a right-skewed density curve. The mean is pulled away from the median toward the long tail.
FIGURE 2.18 The mean is the balance point of a density curve.

The mean of a set of observations is their arithmetic average. The mean of a density curve is the point at which the curve would balance if it were made of solid material.

In the previous section we described the mean and standard deviation of a set of data with the symbols x̄ and s_x, respectively. For a density curve, we will denote the mean with the Greek letter mu (μ) and the standard deviation with the Greek letter sigma (σ). ...
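To make the equal-areas and balance-point ideas concrete, here is a minimal numerical sketch. The right-skewed density f(x) = 2(1 - x) on [0, 1] is an assumed toy example chosen for illustration; it does not appear in the notes. The helper names are hypothetical, and areas are approximated with a midpoint rule.

```python
def f(x):
    """Toy right-skewed density on [0, 1]; zero elsewhere (an assumption)."""
    return 2 * (1 - x) if 0 <= x <= 1 else 0.0

def area(lo, hi, n=10_000):
    """Approximate the area under f between lo and hi (midpoint rule)."""
    width = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * width) for i in range(n)) * width

def balance_point(n=10_000):
    """Approximate the mean: the integral of x * f(x) over [0, 1]."""
    width = 1 / n
    mids = [(i + 0.5) * width for i in range(n)]
    return sum(x * f(x) for x in mids) * width

def equal_areas_point(tol=1e-6):
    """Find the median by bisection: the point with area 0.5 to its left."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if area(0, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

total = area(0, 1)           # a density curve has total area 1
below_half = area(0, 0.5)    # proportion of observations below 0.5
median = equal_areas_point()
mean = balance_point()
print(round(total, 3), round(below_half, 3), round(median, 3), round(mean, 3))
```

For this right-skewed curve the mean (about 0.333) lies to the right of the median (about 0.293), matching the statement that the mean is pulled toward the long tail.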


- Fall '15
- Mr. Sanchez
- Statistics, Probability, Chapter 2 Notes, DescribingLocationDistribution