INTERPRETING SCATTER PLOTS AND TWO-VARIABLE STATISTICS
Activity 1: Match the scatter plot to the correct correlation coefficient:
1. 0.14
2. -0.99
3. 0.43
4. -0.77
[Four scatter plots appeared here; the labels recovered from them read 2, 1, 4, 3.]

Activity 2:
A zoologist was interested in predicting the weight of alligators by simply measuring their length. Some brave
researchers went out to an alligator preserve in the Everglades and measured 21 alligators’ lengths and weights.
Best-fit linear equation: y = 5.9x - 393, r = 0.93
a) What is the slope of the linear model? Interpret this value in context of the data.
The slope of the line is 5.9. This means that for every additional inch of length, the predicted weight of the
alligator increases by 5.9 pounds.
b) What is the y-intercept of the linear model? Interpret this value in context of the data. Does this
interpretation make sense in context? Why (not)?
The y-intercept of the model is -393. This means that an alligator with a length of zero inches would have a
predicted weight of -393 pounds. This does not make sense: an alligator cannot have zero length,
and a negative weight is impossible.
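The two interpretations above can be checked with a short sketch that uses only the model y = 5.9x - 393 from the activity. The helper name `predicted_weight` is made up for illustration. The last value shows the length below which this line would predict a negative weight, which is one way to see why the model should not be trusted far outside the measured lengths.

```python
# Linear model from the activity: weight (lb) = 5.9 * length (in) - 393.
SLOPE = 5.9
INTERCEPT = -393.0


def predicted_weight(length_in):
    """Predicted weight in pounds for a length in inches (illustrative helper)."""
    return SLOPE * length_in + INTERCEPT


# Slope: one extra inch changes the prediction by 5.9 lb, at any length.
change_per_inch = predicted_weight(101) - predicted_weight(100)

# y-intercept: the prediction at a length of zero, which is not a real alligator.
weight_at_zero = predicted_weight(0)

# Length at which the predicted weight reaches zero; shorter lengths give negative weights.
zero_weight_length = -INTERCEPT / SLOPE

print(round(change_per_inch, 1))     # 5.9
print(weight_at_zero)                # -393.0
print(round(zero_weight_length, 1))  # 66.6
```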
c) One alligator the researchers named “Fluffy” was too aggressive to be weighed. They did get
Fluffy’s length, though: 108 inches. Predict her weight using the model.
y = 5.9(108) - 393
y = 244.2 pounds
The predicted value for Fluffy’s weight is 244.2 pounds.
d) Even though the correlation between weight and length is high (0.93), there may be a better
equation to model this relationship. Why?
If we look at the residuals, a pattern can be seen, which suggests that a curve is a
better model. Visually, one can see that a curve would fit the data set better.

Activity 3:
Mr. Theil collected arm span and height data for a larger group of students. He also wanted to see if the
relationship between height and arm span was different for males and females.
Correlation coefficients: for females, r = 0.917 and for males, r = 0.616
a)
For which group, males, or females, is the relationship between height and arm span stronger?
The relationship between height and arm span is stronger for females than for males.
b)
Give one piece of visual evidence for your conclusion in part a).
The blue squares that represent females “fit” closer to the linear model (the residuals appear to be
smaller). The female data points form a straight line more closely than the male data points.
c)
Give one piece of numerical evidence for your conclusion in part a).
The relationship is stronger for the females since their R² value is larger (0.917² ≈ 0.84 versus 0.616² ≈ 0.38).
d)
Tracy’s arm span is 170 cm long. Predict her height, using the appropriate best-fit linear model.
Height = 0.666(170) + 57.4 cm
Height = 170.6 cm
The predicted height for Tracy is 170.6 cm.
e)
Chuckie’s arm span is 180 cm. Predict his height.
Height = 0.38(180) + 112.5 cm
Height = 180.9 cm
The predicted height for Chuckie is 180.9 cm.
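The two predictions in parts d) and e) can be reproduced with a small sketch. The slopes and intercepts are the ones quoted in the worked answers; the dictionary `MODELS` and the helper `predict_height` are illustrative names, not part of the original activity.

```python
# Best-fit lines quoted in the worked answers (arm span and height in cm).
MODELS = {
    "female": (0.666, 57.4),  # (slope, intercept)
    "male": (0.38, 112.5),
}


def predict_height(arm_span_cm, group):
    """Predicted height in cm from the chosen group's line (illustrative helper)."""
    slope, intercept = MODELS[group]
    return slope * arm_span_cm + intercept


tracy = predict_height(170, "female")
chuckie = predict_height(180, "male")
print(f"Tracy: {tracy:.1f} cm")      # Tracy: 170.6 cm
print(f"Chuckie: {chuckie:.1f} cm")  # Chuckie: 180.9 cm
```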
f)
Which prediction, Tracy’s or Chuckie’s, is probably more accurate? Provide evidence and/or
specific reasoning for your decision.
My prediction for Tracy’s height is probably more accurate since the R² value for the female model is
larger (stronger relationship): about 84% of the variation in female height is explained by arm span.
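The 84% figure comes from squaring the correlation coefficient. A quick sketch, using only the two r values given for this activity (the variable names are illustrative):

```python
# Correlation coefficients given in Activity 3.
r_values = {"female": 0.917, "male": 0.616}

# For a simple linear fit, R^2 is the square of r: the share of the variation
# in height that the line accounts for.
r_squared = {group: r * r for group, r in r_values.items()}

for group, value in r_squared.items():
    print(f"{group}: R^2 = {value:.2f}")
# female: R^2 = 0.84
# male: R^2 = 0.38
```

By this measure the male line leaves well over half of the variation in height unexplained, which is why the prediction for Chuckie is the less reliable one.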
g)
One person, “Kelly,” has an arm span of 168 cm, and a height of 170 cm, and was left off the
plot. You don’t know if Kelly is male or female. What’s your best guess? Provide evidence for
your conclusion.
Male Height = 0.38(168) + 112.5 cm = 176.3 cm
Female Height = 0.666(168) + 57.4 cm = 169.3 cm
Based on the above calculations, I think Kelly is female since her actual height is closest to the
predicted height for a female with an arm span of 168 cm. The height calculations support this
conclusion.
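The comparison used for Kelly can be written as a residual check: compute actual height minus predicted height under each line and pick the line with the smaller absolute residual. This is only a sketch of that reasoning, with made-up helper names (`residuals`, `closest_group`); being closer to one line is evidence, not proof, of group membership.

```python
# Best-fit lines quoted in the worked answers (arm span and height in cm).
MODELS = {"female": (0.666, 57.4), "male": (0.38, 112.5)}


def residuals(arm_span_cm, height_cm):
    """Actual minus predicted height under each group's line."""
    return {
        group: height_cm - (slope * arm_span_cm + intercept)
        for group, (slope, intercept) in MODELS.items()
    }


def closest_group(arm_span_cm, height_cm):
    """Group whose line gives the smallest absolute residual."""
    res = residuals(arm_span_cm, height_cm)
    return min(res, key=lambda group: abs(res[group]))


kelly = residuals(168, 170)
print({group: round(value, 1) for group, value in kelly.items()})
# {'female': 0.7, 'male': -6.3}
print(closest_group(168, 170))  # female
```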
h)
How confident are you with your decision in g)? Absolutely sure, pretty sure, or not very sure at all?
Explain.
I am pretty sure, since the female model is the stronger of the two models. The female height
predicted by calculation almost perfectly matches Kelly’s actual height.
i)
There’s a point plotted at (212, 181). Write a sentence that describes the gender and appearance of this
person. How are they considerably different from the rest of the people in this study? Be specific.
Male Height = 0.38(212) + 112.5 cm = 193.1 cm
Female Height = 0.666(212) + 57.4 cm = 198.6 cm
Residual Male = 181 - 193.1 cm = -12.1 cm
Residual Female = 181 - 198.6 cm = -17.6 cm
Looking at the residual values, the male residual is smaller in magnitude, so this person is most likely male, with long
arms. He is much shorter than most males with a similar arm span.
j)
There’s a point plotted at (175, 188). Write a sentence that describes the gender and appearance of this
person. How are they considerably different from the rest of the people in this study? Be specific.
Male Height = 0.38(175) + 112.5 cm = 179 cm
Female Height = 0.666(175) + 57.4 cm = 174 cm
Residual Male = 188 - 179 cm = 9 cm
Residual Female = 188 - 174 cm = 14 cm
Looking at the residual values, the male residual is smaller in magnitude, so this person is most likely male, with
shorter arms. He is much taller than most males with a similar arm span.

PROBLEM 1: Do higher-grossing movies in the US tend to be higher-grossing internationally as well? The
following table contains the box office receipts for the ten highest grossing movies in history (as of 2007). The
numbers are in millions of dollars and adjusted for inflation.
Movie                                            Domestic Receipts   International Receipts
Titanic                                          601                 1235
Star Wars                                        461                 337
Shrek 2                                          437                 444
ET                                               435                 322
Star Wars: Phantom Menace                        431                 491
Pirates of the Caribbean: Dead Man’s Chest       417                 592
Spider-Man                                       404                 418
Star Wars: Revenge of the Clones                 380                 468
The Lord of the Rings: The Return of the King    377                 752
Spider-Man 2                                     373                 410
a. Plot the relationship on a scatter plot.
b. Find the equation of the line of best fit. Use this model to predict the international box office gross for a
movie which brings in $500 million in the US.
International Receipts = 2.8471(500) – 681.91
= 741.64 million dollars
The predicted International Box Office Receipts would be 741.64 million dollars.
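The coefficients 2.8471 and -681.91 can be recovered from the table with the usual least-squares formulas. The sketch below assumes the ten (domestic, international) pairs are transcribed correctly from the table; the function name `least_squares` is illustrative.

```python
# (domestic, international) receipts in millions of dollars, from the table.
receipts = [
    (601, 1235), (461, 337), (437, 444), (435, 322), (431, 491),
    (417, 592), (404, 418), (380, 468), (377, 752), (373, 410),
]


def least_squares(points):
    """Slope and intercept of the least-squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares(receipts)
prediction = slope * 500 + intercept

print(round(slope, 4))       # 2.8471
print(round(intercept, 2))   # -681.91
print(round(prediction, 2))  # 741.64
```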
c. Interpret the value of the slope in context of this situation.
As the Domestic Receipts increase by 1 million dollars, the predicted International Receipts increase by
2.8471 million dollars.
d. Interpret the y-intercept of the model in context. Do you feel this interpretation has any
real-world value? Explain.
If the Domestic Receipts were $0, the predicted International Receipts would be negative 681.91 million
dollars. This is not a meaningful real-world interpretation, since it is not possible to have negative ticket sales.
e. Suppose that Titanic were removed from this data set. How would this removal change
the value of the slope and the intercept of the linear model?
It would turn the relationship from a positive one into a negative one. The slope would become
-2.1385 and the intercept would become a large positive value (1353.2 million dollars).
f. Suppose that Titanic were removed from this data set. How would this removal change
the value of the correlation coefficient? Explain why.
Before: r = 0.69
After: r = -0.50
The correlation becomes weaker and negative. Titanic was such a popular movie that its numbers
“skew” the data, and the direction of the relationship reverses once it is removed.
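The effect of a single influential point can be seen by fitting the line twice, once with all ten movies and once without Titanic. This is a minimal sketch that assumes the table values are transcribed correctly; `fit` is an illustrative helper that returns the slope, intercept, and correlation coefficient.

```python
from math import sqrt

# (movie, domestic, international) in millions of dollars, from the table.
movies = [
    ("Titanic", 601, 1235), ("Star Wars", 461, 337), ("Shrek 2", 437, 444),
    ("ET", 435, 322), ("Star Wars: Phantom Menace", 431, 491),
    ("Pirates of the Caribbean: Dead Man's Chest", 417, 592),
    ("Spider-Man", 404, 418), ("Star Wars: Revenge of the Clones", 380, 468),
    ("The Lord of the Rings: The Return of the King", 377, 752),
    ("Spider-Man 2", 373, 410),
]


def fit(rows):
    """Least-squares slope, intercept, and correlation r for (name, x, y) rows."""
    n = len(rows)
    mean_x = sum(x for _, x, _ in rows) / n
    mean_y = sum(y for _, _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for _, x, _ in rows)
    syy = sum((y - mean_y) ** 2 for _, _, y in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for _, x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / sqrt(sxx * syy)


with_titanic = fit(movies)
without_titanic = fit([row for row in movies if row[0] != "Titanic"])

for label, (slope, intercept, r) in [("all ten", with_titanic),
                                     ("no Titanic", without_titanic)]:
    print(f"{label}: slope={slope:.4f}, intercept={intercept:.1f}, r={r:.2f}")
# all ten: slope=2.8471, intercept=-681.9, r=0.69
# no Titanic: slope=-2.1385, intercept=1353.2, r=-0.50
```

Removing one point flips the sign of both the slope and r, which is a sign that neither line describes the remaining nine movies well.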
g. Create a new scatter plot, determine the new equation of the line of best fit, and determine the new
correlation coefficient after removing Titanic from the data set.
h. Even if you removed Titanic from the data set, and computed a new linear model and
correlation coefficient, why might it be inappropriate to use them to make predictions
about the international box office income of other movies that premiere in the US?
The movie may be more popular in the US than internationally, or vice versa. There could also be a
popular actor in the movie who is not as popular in another country. Therefore, the data
could be very different for the US and international markets.

PROBLEM 2: The data provided in the table below are the gold medal winning long jump distances for the
men’s and women’s divisions at the Olympics from 1948 to present.
Year    Men’s Distance (m)    Women’s Distance (m)
1948    7.82                  5.69
1952    7.57                  6.24
1956    7.83                  6.35
1960    8.12                  6.37
1964    8.07                  6.76
1968    8.90                  6.82
1972    8.24                  6.87
1976    8.34                  6.72
1980    8.54                  7.06
1984    8.54                  6.96
1988    8.72                  7.40
1992    8.67                  7.14
1996    8.50                  7.12
2000    8.55                  6.99
2004    8.59                  7.07

Make three scatter plots: Men’s Distance vs. Year, Women’s Distance vs. Year, and Women’s Distance vs.
Men’s Distance (or vice versa).
(You could also try to make a double scatter plot with year on the x-axis and both men’s and women’s distances
on the y-axis)
a. Determine the equations of the lines of best fit and the correlation coefficients for each scatter plot.
b. What do each of these equations and coefficients tell you about distances over time, and men’s
distances compared to women’s distances?
This tells me that the slope of the line of best fit for the women is greater than that of the men. (The
women’s line of best fit could possibly cross the men’s line of best fit.)
The R² value for the women is higher than for the men, which means that the women’s data set more
closely forms a straight line (i.e., Women: 73% of the variation in distance is explained by year; Men: 59%
of the variation in distance is explained by year).
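The slopes and R² values compared above can be reproduced from the table. The sketch below covers the two distance-versus-year fits only and assumes the table values are transcribed correctly; `fit_line` is an illustrative helper.

```python
# Gold medal long jump distances (m) for the Olympic years 1948 to 2004.
years = list(range(1948, 2005, 4))
men = [7.82, 7.57, 7.83, 8.12, 8.07, 8.90, 8.24, 8.34,
       8.54, 8.54, 8.72, 8.67, 8.50, 8.55, 8.59]
women = [5.69, 6.24, 6.35, 6.37, 6.76, 6.82, 6.87, 6.72,
         7.06, 6.96, 7.40, 7.14, 7.12, 6.99, 7.07]


def fit_line(xs, ys):
    """Least-squares slope, intercept, and R^2 for paired lists xs, ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


men_fit = fit_line(years, men)
women_fit = fit_line(years, women)

for label, (slope, intercept, r2) in [("men", men_fit), ("women", women_fit)]:
    print(f"{label}: y = {slope:.4f}x + ({intercept:.3f}), R^2 = {r2:.2f}")
# men: y = 0.0164x + (-24.041), R^2 = 0.59
# women: y = 0.0210x + (-34.655), R^2 = 0.73
```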
c. According to the scatter plots and trends, will women ever “catch up” to men in terms of distance
jumped?
ymale = 0.0164x - 24.041
yfemale = 0.021x - 34.655
If women catch the men, ymale = yfemale
0.0164x - 24.041 = 0.021x - 34.655
0.0046x = 10.614
x ≈ 2307
This means that, according to the lines of best fit, the women could catch up to the men around the year
2307. At that point, the winning length would be about 13.80 m. This does not seem probable, as
it is far beyond any distance in the data: roughly 60% more than the 2004 men’s distance and nearly double the 2004 women’s distance. ...
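The crossing point can be checked by setting the two lines equal and solving for x. This sketch uses the rounded coefficients exactly as stated for ymale and yfemale (intercepts -24.041 and -34.655). Because the slopes differ by only 0.0046, a change of 0.01 in either intercept moves the crossing by about two years, so the result is very sensitive to rounding.

```python
# Rounded best-fit lines quoted in the answer (x = year, y = distance in m).
men_slope, men_intercept = 0.0164, -24.041
women_slope, women_intercept = 0.021, -34.655

# men_slope * x + men_intercept = women_slope * x + women_intercept, solved for x.
crossing_year = (men_intercept - women_intercept) / (women_slope - men_slope)
crossing_distance = men_slope * crossing_year + men_intercept

print(round(crossing_year))         # 2307
print(round(crossing_distance, 1))  # 13.8
```

A prediction roughly 300 years past the last data point is an extreme extrapolation, so the crossing year is best read as an illustration of the method and not as a forecast.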
