Question 2. Create a table called full_data_with_value that’s a copy of full_data, with an extra column called "Value" containing each player’s value (according to our crude measure). Then make a histogram of players’ values. Specify bins that make the histogram informative and don’t forget your units! Remember that hist() takes in an optional third argument that allows you to specify the units!
Hint: Informative histograms contain a majority of the data and exclude outliers.
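To make the role of bins and units concrete, here is a minimal plain-Python sketch (not the hist() method itself) of how bar heights in "percent per unit" could be computed for hand-picked bins. The function name percent_per_unit, the toy values, and the bin edges are all made up for illustration, and the sketch assumes heights are normalized over the values that fall inside the bins.

```python
def percent_per_unit(values, bin_edges):
    """Return bar heights in percent per unit for the given bin edges.

    Each bin is [left, right), except the last bin, which also includes its
    right edge. Values outside the edges are ignored, and heights are
    normalized over the values that land inside the bins (an assumption of
    this sketch).
    """
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            left, right = bin_edges[i], bin_edges[i + 1]
            is_last = i == len(counts) - 1
            if left <= v < right or (is_last and v == right):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bins")
    return [
        100 * count / total / (bin_edges[i + 1] - bin_edges[i])
        for i, count in enumerate(counts)
    ]


# Made-up values: most lie between 0 and 4, with one outlier at 25.
toy_values = [0.2, 0.5, 0.9, 1.1, 1.4, 1.8, 2.3, 2.7, 3.5, 25.0]
toy_bins = [0, 1, 2, 3, 4]  # covers the bulk of the data, leaves the outlier out

heights = percent_per_unit(toy_values, toy_bins)
```

Choosing the edges by hand, as with toy_bins here, keeps a single extreme value from stretching the horizontal axis and squashing every other bar.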


Now suppose we weren’t able to find out every player’s salary (perhaps it was too costly to interview each player). Instead, we have gathered a simple random sample of 100 players’ salaries. The cell below loads those data.
[7]: sample_salary_data = Table.read_table("sample_salary_data.csv")
     sample_salary_data.show(3)
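As a reminder of what a simple random sample means, the sketch below draws 100 items without replacement from a made-up population, so that every subset of size 100 is equally likely. The population list and its size are hypothetical; this is not how the CSV file above was produced, only an illustration of the sampling scheme.

```python
import random

# Hypothetical population of player names; the real roster is not shown here.
population = [f"player_{i}" for i in range(500)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# random.sample draws without replacement, so no player appears twice.
simple_random_sample = rng.sample(population, 100)
```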
Question 3. Make a histogram of the values of the players in sample_salary_data, using the same method for measuring value we used in question 2. Use the same bins, too.
Hint: This will take several steps.
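The "several steps" usually amount to matching each sampled salary with that player's statistics, computing the value, and then binning with the earlier edges. The plain-Python sketch below walks through the first two steps on toy rows. The column names "Name", "Points", and "Salary", the data, and the crude_value formula are all placeholders assumed for illustration, not the assignment's actual tables or measure.

```python
# Hypothetical toy rows; the real tables and column names may differ.
player_stats = [
    {"Name": "A", "Points": 1200},
    {"Name": "B", "Points": 300},
    {"Name": "C", "Points": 800},
]
sample_salaries = [
    {"Name": "A", "Salary": 4_000_000},
    {"Name": "C", "Salary": 1_000_000},
]


def crude_value(row):
    # Placeholder measure (an assumption): points per $100,000 of salary.
    return row["Points"] / (row["Salary"] / 100_000)


# Step 1: index the statistics by player name for quick lookup.
stats_by_name = {row["Name"]: row for row in player_stats}

# Step 2: keep only sampled players and merge their stats with their salary.
joined = [
    {**stats_by_name[s["Name"]], **s}
    for s in sample_salaries
    if s["Name"] in stats_by_name
]

# Step 3: add a "Value" entry to each merged row.
with_value = [{**row, "Value": crude_value(row)} for row in joined]

# Step 4 (not shown): bin the "Value" entries with the same edges used for
# the full population so the two histograms are directly comparable.
```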


Now let us summarize what we have seen. To guide you, we have written most of the summary
already.
Question 4. Complete the statements below by setting each relevant variable name to the value that correctly fills the blank.
• The plot in question 2 displayed a(n) [distribution_1] distribution of the population of [player_count_1] players. The areas of the bars in the plot sum to [area_total_1].
• The plot in question 3 displayed a(n) [distribution_2] distribution of the sample of [player_count_2] players. The areas of the bars in the plot sum to [area_total_2].
distribution_1 and distribution_2 should be set to one of the following strings: "empirical" or "probability".
player_count_1, area_total_1, player_count_2, and area_total_2 should be set to integers.
Hint 1: For a refresher on distribution types, check out Section 10.1