Stat 3011 Spring 2011
Introduction to Statistics
Exam 3 Answers
Problem 1
(10 points) Below is a stem-and-leaf plot that shows the number of wins, as of April 18th, for all 16
teams in the National League. Does the data meet either of the two assumptions for constructing a
confidence interval for the average number of wins by a National League team? Explain why or why
not for each assumption.
> stem(winsn, scale=2)

  The decimal point is at the |

   5 | 00
   6 | 0
   7 | 000000
   8 | 0000
   9 | 0
  10 | 0
  11 |
  12 | 0
There are two assumptions we need to check in order to construct a confidence interval for the mean:
(1) The data is from an independent, random sample (2 points)
The data does not meet this assumption because it is not a random sample. In fact, it is not a
sample at all. The problem asks for the average number of wins for all the National League teams, and
the plot shows the data for every team in the National League. Therefore, this is a graph of
the population. The average of all of these data points is the average number of wins for all the
National League teams, so we don't need a confidence interval; we can calculate the
parameter directly. (3 points)
(2) The data is normally distributed (2 points)
This is somewhat of a judgement call. I'd say no: this data is a bit too right-skewed to be seen
as normal. If you need something more concrete, note that Q1 is 7 and Q3 is 8, so IQR = 1 and
1.5*IQR = 1.5. The outlier fences are then 7 - 1.5 = 5.5 and 8 + 1.5 = 9.5, which means 5, 10, and 12
are all outliers. But you don't have to calculate potential outliers to get full credit. (3 points)
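As a quick check of both answers, here is a minimal Python sketch that rebuilds the 16 win totals from the stem-and-leaf plot (assuming, as the plot header says, that the decimal point is at the | so each leaf 0 is one team whose win total equals its stem), computes the population mean directly, and applies the 1.5*IQR rule with Q1 and Q3 taken as the medians of the lower and upper halves. The variable names are illustrative, and other quartile conventions can give slightly different quartiles.

```python
from statistics import median

# Number of leaves on each stem of the plot; each leaf "0" is one team
stems_and_counts = {5: 2, 6: 1, 7: 6, 8: 4, 9: 1, 10: 1, 11: 0, 12: 1}
wins = sorted(s for s, count in stems_and_counts.items() for _ in range(count))

# Assumption (1): these 16 values are the whole population, so their mean IS the parameter
mu = sum(wins) / len(wins)

# Assumption (2): quartiles as medians of the lower and upper halves (one textbook convention)
half = len(wins) // 2
q1 = median(wins[:half])
q3 = median(wins[-half:])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sorted(set(w for w in wins if w < lower_fence or w > upper_fence))

print(f"N = {len(wins)}, mu = {mu:.4f}")
print(f"Q1 = {q1}, Q3 = {q3}, fences = ({lower_fence}, {upper_fence})")
print(f"outliers: {outliers}")
```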
Problem 2
(10 points) A Pew Research Poll of 1004 adults from March 17-20 found that 39% favored increasing
the use of nuclear power. A poll of 2251 adults from last October found that 47% favored increasing
the use of nuclear power. Is this an instance of dependent samples (i.e. matched pairs) or independent
samples? Explain.
This is an example of independent samples. (3 points)
The reason is worth 7 points. Some possible reasons include:
The two samples have different sample sizes, so they can't be directly linked.
There seem to be two separate polls taken, which means two random samples.
While it is possible that some people were included in both polls, not all were, and the two sets of responses are not paired person by person, so the samples are independent.

Problem 3
A) (30 points) The standard tip percentage for a good server is 20%. For a waitress at a restaurant,
a random sample of 23 checks had an average tip percentage of 27.8% and a standard deviation
of 7.8%. A histogram of the tip percentage looks like the normal curve. Is there evidence that
the tip percentage differs from 20%?
Start with the last sentence: "is there evidence" means that we need to do a hypothesis test. And "the
tip percentage differs from 20%" tells us both the null value and the alternative hypothesis. The tricky
part of this problem is that the information is given in percentages, but here the percentage is a
measurement (% of the total bill) rather than the relative frequency of a category.
Note also that the problem gives a standard deviation, which we only need for a quantitative variable.
So this is a t test for a mean µ, not a test for a proportion.
1) Hypotheses (6 points)
Ho: µ=20% (or µ=0.20, but be careful: you need to convert all the data into proportions to
perform the testing, which adds more potential for arithmetical mistakes.)
Ha: µ≠20%
2) Assumptions (5 points)
We need to check that the data is a random sample of checks for that waitress and that
the data looks normal. Both criteria are met: the checks were randomly sampled, and the histogram
looks like the normal curve. (Test-taking tip: if I'm going to assign 30 points to a hypothesis
testing problem, it will satisfy the assumptions.)
3) Test statistic (9 points)
The standard error estimate: $se = \frac{s}{\sqrt{n}} = \frac{7.8\%}{\sqrt{23}} = 1.63$ (2 pts formula, 1 for calculation)
The test statistic: $t = \frac{\bar{x} - \mu_0}{se} = \frac{27.8 - 20}{1.63} \approx 4.79$ (3 pts formula, 1 pt calculation)
$df = n - 1 = 23 - 1 = 22$ (1 pt formula, 1 calculation)
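The arithmetic in step 3 can be sketched in a few lines of Python, using only the summary statistics given in the problem. Carrying the unrounded standard error (about 1.626) gives t = √23 ≈ 4.796, because here s happens to equal x̄ - µ0; rounding se to 1.63 first gives about 4.79. The small difference does not change the conclusion.

```python
import math

# Summary statistics from the problem (tip percentages, in %)
n = 23
xbar = 27.8
s = 7.8
mu0 = 20.0  # null value from Ho

se = s / math.sqrt(n)        # estimated standard error of the sample mean
t_stat = (xbar - mu0) / se   # one-sample t statistic
df = n - 1                   # degrees of freedom

print(f"se = {se:.3f}, t = {t_stat:.3f}, df = {df}")
```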
4) p-value (5 points)
Using the R output on the last page, we see the closest output is:
> pt(-4.79, 22)
[1] 4.385543e-05   (2 points)
Since the alternative hypothesis is two-sided, we need to double this number, so the p-value
is 2*0.000043855 = 0.00008771 (3 points)
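The exam reads the p-value off R's pt(). As a cross-check that does not need R, here is a minimal Python sketch that evaluates the two-sided p-value with the classical closed-form series for Student's t with an integer number of degrees of freedom. The helper name t_two_sided_p is hypothetical; if SciPy is available, 2 * scipy.stats.t.sf(4.79, 22) should give the same value. The last lines apply the 0.05 rejection level from step 5.

```python
import math


def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value P(|T| >= |t|) for Student's t with a positive integer df.

    Uses the closed-form series for A(t|df) = P(|T| < |t|),
    with theta = atan(|t| / sqrt(df)); the p-value is 1 - A.
    """
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c2 = math.sin(theta), math.cos(theta) ** 2
    if df % 2 == 0:
        # Even df: A = sin(theta) * [1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...], last power df-2
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (2 * k - 1) / (2 * k) * c2
            total += term
        a = s * total
    else:
        # Odd df: A = (2/pi) * [theta + sin(theta) * (cos + (2/3)cos^3 + ...)], last power df-2
        total = 0.0
        if df > 1:
            term = total = math.cos(theta)
            for k in range(1, (df - 1) // 2):
                term *= (2 * k) / (2 * k + 1) * c2
                total += term
        a = (2 / math.pi) * (theta + s * total)
    return 1.0 - a


p_value = t_two_sided_p(4.79, 22)
alpha = 0.05  # rejection level assumed in step 5
print(f"two-sided p-value = {p_value:.4e}")
print("reject Ho" if p_value < alpha else "fail to reject Ho")
```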
5) Conclusions (5 points)
There is no rejection level given, so we assume the conventional rejection level of 0.05. Since the
p-value of 0.00008771 is less than 0.05, we have evidence to reject Ho. (3 points)
We have evidence that the true mean tip percentage for this waitress differs from 20%. (2 points)
B) (5 points) Based on your conclusion in part A, what type of error could be present: Type I or
Type II error? Explain.
There is potential for Type I error because we rejected the null hypothesis (2 points). A Type I
error occurs when you reject Ho when it is in fact true. Because we rejected the null, it is possible
that we did so even though the waitress's true mean tip percentage is 20%. (3 points)

Problem 4
(20 points) We want to survey people with little experience with computers to estimate how many
difficulties arise over the course of a month when they use computers. We'd like a 95% confidence
interval to be accurate to within ±3 hassles. A similar survey done on people with a lot of computer
experience had a standard deviation of 10 hassles. How many people with little computer experience
do we need to survey?
Again, if we go to the last sentence, we see what the problem is asking for: a sample size for a study.
Looking at the rest of the problem, we can figure out what type of data we have and what info we're
given. Note that units are given. That is a sure sign that the variable is quantitative. Also, we're given
a standard deviation. So we need to use the following formula:
$n = \frac{(z^*)^2 \sigma^2}{m^2}$ (8 points), where $m = 3$ hassles (2 pts), $\sigma = 10$ hassles (2 pts), and $z^* = z_{0.025} = 1.96$ (4 pts)
Plugging into the formula: $n = \frac{1.96^2 \times 10^2}{3^2} = 42.68$ (2 pts), which we round up to 43 people (2 points)
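A short Python check of the arithmetic, and of why we round up rather than to the nearest integer: with the rounded-up n the achieved margin of error is just under 3 hassles. As in the solution, σ = 10 is a planning value borrowed from the survey of experienced users.

```python
import math

z_star = 1.96   # z_{0.025} for 95% confidence
sigma = 10.0    # planning value for sigma from the experienced-user survey
m = 3.0         # desired margin of error, in hassles

n_exact = (z_star * sigma / m) ** 2
n_needed = math.ceil(n_exact)   # always round UP so the margin is no larger than m

achieved_margin = z_star * sigma / math.sqrt(n_needed)
print(f"n = {n_exact:.2f} -> survey {n_needed} people (margin {achieved_margin:.3f})")
```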
Problem 5
A) (20 points) A survey about homelessness offered the randomly selected participants $10 to
complete the survey. At the end of the survey, they had the choice to either donate the $10 to a
homeless shelter or keep it. Of the 275 who chose to keep the $10, 103 of them had not seen
any homeless people in the past month. Of the 1228 who chose to donate the $10, 466 of them
had not seen any homeless people in the past month. For the event “did not see any homeless
people”, find the 99% confidence interval for the difference between those who chose to keep
the money and those who chose to donate it.
The first thing we need to do is see if we have a large enough sample (5 points):
Keep group: $n_k\hat{p}_k$ = # of people who haven't seen any homeless people = 103, and
$n_k(1-\hat{p}_k)$ = # of people who have seen homeless people = 275 - 103 = 172
Donate group: $n_d\hat{p}_d$ = # of people who haven't seen any homeless people = 466, and
$n_d(1-\hat{p}_d)$ = # of people who have seen homeless people = 1228 - 466 = 762
All four counts are bigger than 10, so we can construct the confidence interval. The formula is:
$(\hat{p}_k - \hat{p}_d) \pm z^* \sqrt{\frac{\hat{p}_k(1-\hat{p}_k)}{n_k} + \frac{\hat{p}_d(1-\hat{p}_d)}{n_d}}$, where k = keep and d = donate (5 pts)
Here $\hat{p}_k = 103/275 = 0.3745$ (1 pt), $\hat{p}_d = 466/1228 = 0.3795$ (1 pt), $n_k = 275$ (1 pt), and $n_d = 1228$ (1 pt).
And, for the 99% confidence interval, we need $z^* = z_{0.005} = 2.575$ (2 points).
Plugging the values in:
$(0.3745 - 0.3795) \pm 2.575\sqrt{\frac{0.3745(1-0.3745)}{275} + \frac{0.3795(1-0.3795)}{1228}} = -0.0050 \pm 2.575 \times 0.0323 = -0.0050 \pm 0.0832$ (2 points)
which gives us a confidence interval of (-0.0882, 0.0782).

B) (5 points) Interpret this confidence interval. What does it tell you about the difference between
those who donated the money and those that chose to keep it?
Since 0 is in the interval, there's no significant difference between those who chose to keep the money
and those who donated it with respect to the proportion that had not seen any homeless people in the
past month: at 99% confidence, a difference of 0 is a plausible value. ...
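The large-sample check and the interval from part A, together with the check that the interval contains 0, can be reproduced with a minimal Python sketch. It uses the rule of thumb that every success and failure count should be at least 10, and keeps z* = 2.575 as in the solution. Because it carries the unrounded proportions, it gives roughly (-0.0881, 0.0783).

```python
import math

# Counts from the problem for "had not seen any homeless people in the past month"
x_keep, n_keep = 103, 275
x_donate, n_donate = 466, 1228
z_star = 2.575  # z_{0.005} for 99% confidence

# Large-sample check: successes and failures in each group should be at least 10
counts = [x_keep, n_keep - x_keep, x_donate, n_donate - x_donate]
large_enough = all(c >= 10 for c in counts)

p_keep = x_keep / n_keep
p_donate = x_donate / n_donate
diff = p_keep - p_donate
# Unpooled standard error for a difference of two independent proportions
se = math.sqrt(p_keep * (1 - p_keep) / n_keep + p_donate * (1 - p_donate) / n_donate)
lower, upper = diff - z_star * se, diff + z_star * se

print(f"counts {counts}, large enough: {large_enough}")
print(f"diff = {diff:.4f}, se = {se:.4f}, 99% CI = ({lower:.4f}, {upper:.4f})")
print("0 in interval:", lower < 0 < upper)
```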
This note was uploaded on 02/28/2012 for the course ECON 111 taught by Professor Aaa during the Summer '11 term at UIBE.