In [53]:
lower_bound = percentile(2.5, resampled_means)
upper_bound = percentile(97.5, resampled_means)
print("95% confidence interval for the average restaurant score, computed by bootstrap
95% confidence interval for the average restaurant score, computed by bootstrapping:
( 90.05 , 92.88 )
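The percentile method used above can be sketched end to end with plain NumPy. This is a minimal sketch under stated assumptions, not the lab's own code: the `scores` array is made-up stand-in data, `bootstrap_mean_interval` is a hypothetical helper name, and `np.percentile` is swapped in for the `percentile` function used in the cell (note that `np.percentile` takes the array first and interpolates between values, so results can differ slightly).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the sampled restaurant scores (not the lab's data).
scores = rng.integers(60, 101, size=100)

def bootstrap_mean_interval(values, repetitions=5000, level=95, rng=rng):
    """Percentile bootstrap interval for the mean of `values`."""
    values = np.asarray(values)
    n = len(values)
    # Each row is one resample of size n drawn with replacement.
    resamples = rng.choice(values, size=(repetitions, n), replace=True)
    resampled_means = resamples.mean(axis=1)
    # Cut off equal tails on both sides, e.g. 2.5% and 97.5% for level=95.
    tail = (100 - level) / 2
    lower, upper = np.percentile(resampled_means, [tail, 100 - tail])
    return lower, upper

lower_bound, upper_bound = bootstrap_mean_interval(scores)
print(f"Bootstrap 95% interval for the mean: ({lower_bound:.2f}, {upper_bound:.2f})")
```

The interval should bracket the mean of the original sample, since the resampled means cluster around it.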
Question 4
Does the distribution of the resampled mean scores look normally distributed? State
"yes" or "no" and describe in one sentence why you would expect that result.
Yes, because the Central Limit Theorem says that the probability distribution of the sum or
average of a large random sample drawn with replacement will be roughly normal, regardless of
the distribution of the population from which the sample is drawn.
Question 5
Does the distribution of the sampled scores look normally distributed? State "yes"
or "no" and describe in one sentence why you should expect this result.
Hint: Remember that we are no longer talking about the resampled means!
No, because the Central Limit Theorem does not apply to the distribution of the sampled
scores themselves; it applies only to the sum or average of the sampled scores.
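The contrast between Questions 4 and 5 can be checked numerically. The following is a rough sketch on made-up data, not the restaurant sample: it assumes a hypothetical left-skewed set of scores and uses skewness (about 0 for a bell-shaped distribution) as a simple stand-in for "looks normal". The `skewness` helper is defined here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

def skewness(values):
    """Sample skewness: about 0 for a symmetric, bell-shaped distribution."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return np.mean(centered ** 3) / values.std() ** 3

# Hypothetical left-skewed "scores": most near 100, with a long tail of low values.
sample = 100 - rng.exponential(scale=8, size=400)

# Resample with replacement and record the mean of each resample.
resamples = rng.choice(sample, size=(5000, len(sample)), replace=True)
resampled_means = resamples.mean(axis=1)

print("skewness of sampled scores: ", skewness(sample))
print("skewness of resampled means:", skewness(resampled_means))
```

Under these assumptions the sampled scores stay clearly skewed, while the resampled means have skewness much closer to 0.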

For the last question, you’ll need to recall two facts.
1. If a group of numbers has a normal distribution, around 95% of them lie within 2 standard
deviations of their mean.
2. The Central Limit Theorem tells us the quantitative relationship between the following:
* the standard deviation of an array of numbers.
* the standard deviation of an array of means of samples taken from those numbers.
Question 6
Without referencing the array resampled_means or performing any new simulations,
calculate an interval around the sample_mean that covers approximately 95% of the numbers in
the resampled_means array. You may use the following values to compute your result, but you
should not perform additional resampling - think about how you can use the CLT to accomplish
this.
In [54]:
sample_mean = np.mean(restaurant_sample.column(3))
sample_sd = np.std(restaurant_sample.column(3))
sample_size = restaurant_sample.num_rows
sd_of_means = sample_sd / np.sqrt(sample_size)
lower_bound_normal = sample_mean - 2 * sd_of_means
upper_bound_normal = sample_mean + 2 * sd_of_means
print("95% confidence interval for the average restaurant score, computed by a normal
95% confidence interval for the average restaurant score, computed by a normal approximation:
( 90.09739258692412 , 92.96260741307589 )
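To see why the two intervals agree, the same calculation can be run next to a percentile bootstrap on one data set. This is a sketch on made-up data, assuming a hypothetical `scores` array and a hypothetical helper `normal_approx_interval`; it mirrors the cell above in using `np.std` with its default settings and 2 standard deviations (a rounding of the more precise 1.96).

```python
import numpy as np

def normal_approx_interval(values, num_sds=2):
    """Interval of sample mean +/- num_sds * (sample SD / sqrt(sample size))."""
    values = np.asarray(values, dtype=float)
    sample_mean = values.mean()
    sd_of_means = values.std() / np.sqrt(len(values))
    return sample_mean - num_sds * sd_of_means, sample_mean + num_sds * sd_of_means

# Hypothetical scores, used only to exercise the function (not the lab's data).
rng = np.random.default_rng(2)
scores = rng.integers(60, 101, size=100)

lower, upper = normal_approx_interval(scores)
print(f"normal approximation: ({lower:.2f}, {upper:.2f})")

# Percentile bootstrap interval on the same data, for comparison.
resampled_means = rng.choice(scores, size=(5000, len(scores)), replace=True).mean(axis=1)
boot_lower, boot_upper = np.percentile(resampled_means, [2.5, 97.5])
print(f"bootstrap:            ({boot_lower:.2f}, {boot_upper:.2f})")
```

The normal approximation needs no resampling at all, which is the point of Question 6; the bootstrap is included only as a check.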
This confidence interval should look very similar to the one you computed in Question 3.
1.2 2. Testing the Central Limit Theorem
The Central Limit Theorem tells us that the probability distribution of the sum or average of a
large random sample drawn with replacement will be roughly normal, regardless of the distribution
of the population from which the sample is drawn.
That’s a pretty big claim, but the theorem doesn’t stop there. It further states that the standard
deviation of this normal distribution is given by
$$\frac{\text{sd of the original distribution}}{\sqrt{\text{sample size}}}$$
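The formula can be checked by simulation. The sketch below is illustrative only: the exponential `population` is a made-up, deliberately non-normal example, and the sample sizes and repetition count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical, deliberately non-normal population.
population = rng.exponential(scale=10, size=100_000)
population_sd = population.std()

for sample_size in (25, 100, 400):
    # 2,000 samples of the given size, drawn with replacement; one mean per sample.
    samples = rng.choice(population, size=(2000, sample_size), replace=True)
    sample_means = samples.mean(axis=1)
    predicted = population_sd / np.sqrt(sample_size)
    print(f"n={sample_size:4d}  observed SD of means={sample_means.std():.3f}  "
          f"predicted={predicted:.3f}")
```

Each fourfold increase in sample size should roughly halve the standard deviation of the sample means, as the square root in the denominator predicts.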
In other words, suppose we start with any distribution that has standard deviation x, take a
sample of size n (where n

- Fall '17
- Normal Distribution, Standard Deviation, Mean