Below are boxplots for the five common idealized data distributions.

[Figure: five boxplots, each marked with Q1, Q2, and Q3. Panels: Symmetric: U-Shaped; Symmetric: Uniform; Right-Skewed; Left-Skewed; Symmetric: Bell-Shaped.]
We define the interquartile range (IQR) as the difference between the third and first quartiles:

IQR = Q3 - Q1.

The IQR is a more resistant measure of variation than the standard deviation because it's not influenced by the largest and smallest values in the data set.
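The resistance of the IQR can be illustrated with a short Python sketch. The two data sets are made up for illustration, and the helper names `quartiles` and `iqr` are hypothetical. The quartile convention (median of the lower and upper halves, leaving the median out of both halves when the count is odd) is an assumption; other conventions can give slightly different quartiles.

```python
from statistics import median, stdev

def quartiles(values):
    """Return (Q1, Q2, Q3) using a median-of-halves convention.

    Assumption: with an odd number of values, the median itself is
    left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # smallest half of the data
    upper = data[-half:]   # largest half of the data
    return median(lower), median(data), median(upper)

def iqr(values):
    """Interquartile range: IQR = Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

# Hypothetical data sets that differ only in their largest value.
original = [2, 4, 5, 7, 8, 9, 11, 12]
stretched = [2, 4, 5, 7, 8, 9, 11, 120]

print(iqr(original), iqr(stretched))      # the IQR is unchanged
print(stdev(original), stdev(stretched))  # the standard deviation grows
```

Replacing the largest value moves neither Q1 nor Q3, so the IQR stays the same, while the standard deviation reacts strongly to the single extreme value.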
Chapter 2 - Summarizing Data

Outlier and Potential Outlier
• Label a value as an outlier if it is more than 3 times the IQR in distance from the nearer of the first and third quartiles. Denote it with an asterisk (*).
• If a value is more than 1.5 and less than 3 times the IQR in distance from the nearer of the first and third quartiles, label it as a potential outlier and denote it by a circle (o).

Example 20: Exam scores, revisited
Construct a boxplot for the following exam scores. What is the value of the IQR? Are there any outliers?

84 88 68 68 95
81 79 81 79 82
79 81 79 90 81

Example 21. Class Sizes
The following table lists the class sizes of ten Stat 100 courses.

Class sizes, sorted: 75 150 180 270 290 310 320 330 380 410

a) Construct a boxplot for the data.
min = 75, Q1 = 180, Q2 = 300, Q3 = 330, max = 410

b) Are there any outliers? Any potential outliers?
IQR = 330 - 180 = 150
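The five-number summary in part (a) and the outlier check in part (b) can be reproduced with a short Python sketch. The function names `quartiles` and `classify` are hypothetical, and the median-of-halves quartile convention is an assumption that happens to match the values worked out for this example. The rule as stated does not say how to label a value exactly 3 times the IQR away; this sketch treats it as a potential outlier.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) as medians of the lower half, all data, upper half."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data), median(data[-half:])

def classify(values):
    """Return (value, label) pairs for values flagged by the outlier rule."""
    q1, _, q3 = quartiles(values)
    spread = q3 - q1  # the IQR
    flagged = []
    for v in values:
        # Distance beyond the nearer quartile; 0 for values inside the box.
        distance = max(q1 - v, v - q3, 0)
        if distance > 3 * spread:
            flagged.append((v, "outlier"))
        elif distance > 1.5 * spread:
            flagged.append((v, "potential outlier"))
    return flagged

class_sizes = [75, 150, 180, 270, 290, 310, 320, 330, 380, 410]

q1, q2, q3 = quartiles(class_sizes)
print(min(class_sizes), q1, q2, q3, max(class_sizes))
print(q3 - q1)                # IQR
print(classify(class_sizes))  # empty list: nothing is flagged
```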
1.5 × 150 = 225 and 3 × 150 = 450, so there are no outliers or potential outliers.

CHAPTER 3 - LINEAR RELATIONSHIPS: REGRESSION AND CORRELATION

3.1 Scatterplots

Example 1: Exam 2 vs Exam 1
The following scatterplot relates the first two exam grades for 40 previous Stat 100 students. The explanatory variable, the grade for Exam 1, is plotted on the horizontal axis (x-axis). The response variable is the grade for Exam 2 and it is plotted on the vertical axis (y-axis).

[Figure: scatterplot titled "Exam 2 vs. Exam 1".]

Are these two variables positively correlated or negatively correlated?

Example 2: Positive or Negative Correlation?
Imagine that we have a representative sample of college students. For each of the following pairs of variables, state whether the correlation is positive or negative.
a. Roommate's age and his year of birth. (-)
b. Height of mother and height of father. (+)
c. Amount of time spent studying and amount of time spent partying. (-)
d. SAT score and freshman year of college GPA. (+)
e. Amount of time spent awake and amount of time spent sleeping. (-)
3.2 The Correlation Coefficient

Along with describing a linear correlation as positive or negative, we also characterize the strength of the linear correlation. If the points cluster tightly around the line, we say there is a strong correlation. If the points lie on the line, we say there is a perfect correlation.

Example 3: Linear Relationships
Describe the linear relationships in each of the following scatterplots.
The linear correlation coefficient r measures the strength of the linear relationship between the paired x- and y-values. The strength of the linear correlation is classified as strong, weak to moderate, or negligible, according to the magnitude of r. The range of values for the correlation coefficient is shown below.

[Figure: range of values of r, with regions labeled strong negative correlation, weak to moderate negative correlation, negligible correlation, weak to moderate positive correlation, and strong positive correlation.]

[Figure: scatterplots; the surviving panel labels are Plot C, Plot D, and Plot F.]
The correlation coefficient for a set of paired x- and y-values is

$$ r = \frac{1}{n-1} \sum \left[ \frac{x - \bar{x}}{s_x} \right] \left[ \frac{y - \bar{y}}{s_y} \right] = \frac{1}{n-1} \, \frac{\sum (x - \bar{x})(y - \bar{y})}{s_x s_y}, $$

where n is the number of (x, y) pairs in the data, $\sum$ indicates summation over all the pairs, and $\bar{x}$, $\bar{y}$, $s_x$, and $s_y$ can be calculated using the formulas for mean and standard deviation discussed in Chapter 2.

Steps in computing the correlation r
1. Standardize the x-values by first subtracting $\bar{x}$ and then dividing by $s_x$.
2. Standardize the y-values by first subtracting $\bar{y}$ and then dividing by $s_y$.
3. Multiply each standardized x by its associated standardized y.
4. Divide the sum of the products from step 3 by n - 1.

Example 4: Exam 2 vs Exam 1
Compute r for the following eight pairs of Exam 1 and Exam 2 grades.

Exam 1   42   56   57   58   61   62   63   64
Exam 2   54   54   58   58   63   63   64   64

For Exam 1, the average is 58 and the standard deviation is 7, approximately. For Exam 2, the average is 60 and the standard deviation is 4, approximately.

 x    y    (x - 58)/7   (y - 60)/4   product
42   54      -2.29        -1.5        3.43
56   54      -0.29        -1.5        0.43
57   58      -0.14        -0.5        0.07
58   58       0           -0.5        0
61   63       0.43         0.75       0.32
62   63       0.57         0.75       0.43
63   64       0.71         1          0.71
64   64       0.86         1          0.86

Sum of products: 6.25

r = 6.25 / 7 ≈ 0.893
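The four steps can be sketched in Python using the Example 4 grades. The function name `correlation` and its optional arguments are hypothetical. Using `statistics.stdev` (the sample standard deviation, with n - 1 in the denominator) assumes that this is the Chapter 2 formula. Passing the rounded summary values reproduces the hand computation. Leaving them out uses unrounded means and standard deviations, which gives a noticeably smaller value here (about 0.82), mostly because the Exam 2 standard deviation is closer to 4.3 than to 4.

```python
from statistics import mean, stdev

def correlation(xs, ys, x_bar=None, s_x=None, y_bar=None, s_y=None):
    """Compute r by the four steps listed above.

    The optional arguments let the caller plug in rounded summary
    values, as the hand computation does. By default the sample mean
    and sample standard deviation are computed from the data.
    """
    n = len(xs)
    x_bar = mean(xs) if x_bar is None else x_bar
    y_bar = mean(ys) if y_bar is None else y_bar
    s_x = stdev(xs) if s_x is None else s_x
    s_y = stdev(ys) if s_y is None else s_y
    z_x = [(x - x_bar) / s_x for x in xs]         # step 1: standardize x
    z_y = [(y - y_bar) / s_y for y in ys]         # step 2: standardize y
    products = [a * b for a, b in zip(z_x, z_y)]  # step 3: multiply pairs
    return sum(products) / (n - 1)                # step 4: divide by n - 1

exam1 = [42, 56, 57, 58, 61, 62, 63, 64]
exam2 = [54, 54, 58, 58, 63, 63, 64, 64]

# Rounded summary values from the example (58, 7, 60, 4).
r_rounded = correlation(exam1, exam2, x_bar=58, s_x=7, y_bar=60, s_y=4)
# Unrounded means and standard deviations computed from the data.
r_exact = correlation(exam1, exam2)

print(round(r_rounded, 3))  # 0.893
print(round(r_exact, 3))
```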
Scale invariance rule for r: r is the same for all choices of units (linear transformations) for x and y.
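A quick numerical check of this rule, as a sketch: the rescalings below (Exam 1 expressed as a fraction of 100 points, Exam 2 doubled and shifted by 5 points) are hypothetical changes of units chosen for illustration, and `correlation` is a hypothetical helper that assumes the sample standard deviation.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

exam1 = [42, 56, 57, 58, 61, 62, 63, 64]
exam2 = [54, 54, 58, 58, 63, 63, 64, 64]

# Hypothetical changes of units (both with a positive multiplier).
exam1_rescaled = [x / 100 for x in exam1]    # fraction of 100 points
exam2_rescaled = [2 * y + 5 for y in exam2]  # doubled, plus 5 points

r_before = correlation(exam1, exam2)
r_after = correlation(exam1_rescaled, exam2_rescaled)

# Equal up to floating-point rounding.
print(abs(r_before - r_after) < 1e-9)  # True
```

Standardizing removes both the shift and the scale factor, which is why the two results agree. The check uses positive multipliers, as any ordinary change of units does; multiplying one variable by a negative number would flip the sign of r.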
 Spring '09
 hirtz
