3. Sampling Basketball Players
This exercise uses salary data and game statistics for basketball players from the 2014-2015 NBA season. The data was collected from Basketball-Reference and Spotrac.
Run the next cell to load the two datasets.
Question 1. We would like to relate players’ game statistics to their salaries. Compute a table called full_data that includes one row for each player who is listed in both player_data and salary_data. It should include all the columns from player_data and salary_data, except the "PlayerName" column.
[21]:
full_data = player_data.join('Name', salary_data, 'PlayerName')
full_data
[21]:
Name             Age  Team  Games  Rebounds  Assists  Steals  Blocks  Turnovers  Points  Salary
A.J. Price       28   TOT   26     32        46       7       0       14         133     62552
Aaron Brooks     30   CHI   82     166       261      54      15      157        954     1145685
Aaron Gordon     19   ORL   47     169       33       21      22      38         243     3992040
Adreian Payne    23   TOT   32     162       30       19      9       44         213     1855320
Al Horford       28   ATL   76     544       244      68      98      100        1156    12000000
Al Jefferson     30   CHO   65     548       113      47      84      68         1082    13666667
Al-Farouq Aminu  24   DAL   74     342       59       70      62      55         412     1100602
Alan Anderson    32   BRK   74     204       83       56      5       60         545     1276061
Alec Burks       23   UTA   27     114       82       17      5       52         374     3034356
Alex Kirk        23   CLE   5      1         1        0       0       0          4       507336
… (482 rows omitted)
[22]:
_ = ok.grade('q3_1')
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

Test summary
Passed: 1
Failed: 0
[ooooooooook] 100.0% passed
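Question 1 asks for what is usually called an inner join: a player is kept only if their name appears in both tables, and the redundant "PlayerName" column is dropped. The plain-Python sketch below illustrates those semantics on a toy subset. The helper inner_join, the list-of-dicts representation, and the player "Player X" are hypothetical; this is not the datascience Table.join implementation.

```python
# Toy rows: Al Horford and Alex Kirk use figures from the full_data output;
# "Player X" is a hypothetical player with statistics but no salary record.
player_data = [
    {"Name": "Al Horford", "Points": 1156},
    {"Name": "Alex Kirk", "Points": 4},
    {"Name": "Player X", "Points": 100},
]
salary_data = [
    {"PlayerName": "Al Horford", "Salary": 12000000},
    {"PlayerName": "Alex Kirk", "Salary": 507336},
]


def inner_join(left, left_key, right, right_key):
    """Keep left rows whose key also appears in right; drop right's key column.

    Assumes each key appears at most once in `right`.
    """
    # Index the right-hand rows by their key for fast lookups.
    by_key = {row[right_key]: row for row in right}
    joined = []
    for row in left:
        match = by_key.get(row[left_key])
        if match is None:
            continue  # no salary record, so the player is left out
        merged = dict(row)
        # Copy every right-hand column except the duplicate name column.
        merged.update({k: v for k, v in match.items() if k != right_key})
        joined.append(merged)
    return joined


full_data = inner_join(player_data, "Name", salary_data, "PlayerName")
for row in full_data:
    print(row)
```

Player X is absent from the result, just as players missing from salary_data are absent from full_data.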
Basketball team managers would like to hire players who perform well but don’t command high salaries. From this perspective, a very crude measure of a player’s value to their team is the number of points the player scored in a season for every $1000 of salary (Note: the Salary column is in dollars, not thousands of dollars). For example, Al Horford scored 1156 points and has a salary of $12 million. This is equivalent to 12,000 thousands of dollars, so his value is \(\frac{1156}{12000} \approx 0.096\) points per $1000.
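The same arithmetic as a short Python sketch; points_per_thousand is a hypothetical helper name, and the inputs are Al Horford's figures from the full_data output.

```python
def points_per_thousand(points, salary_dollars):
    """Crude value measure: points scored per $1000 of salary."""
    # Salary is stored in dollars, so convert it to thousands of dollars first.
    return points / (salary_dollars / 1000)


horford_value = points_per_thousand(1156, 12_000_000)
print(round(horford_value, 3))  # → 0.096
```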
Question 2. Create a table called full_data_with_value that’s a copy of full_data, with an extra column called "Value" containing each player’s value (according to our crude measure). Then make a histogram of players’ values. Specify bins that make the histogram informative, and don’t forget your units! Remember that hist() takes an optional unit argument that lets you specify the units!

Hint: Informative histograms contain a majority of the data and exclude outliers.
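To see why the hint matters, the sketch below computes the crude value for the ten players displayed in the Question 1 output and checks which of them fall inside bin edges running from 0 to 0.9 in steps of 0.1 (the same ten edges np.arange(0, 1, .1) produces, up to floating-point rounding). It assumes, as with numpy's and matplotlib's histograms, that values outside the first and last edges are simply not drawn; the edges are built by hand so the sketch needs only the standard library.

```python
# (name, points, salary in dollars) for the ten rows shown in the full_data output
rows = [
    ("A.J. Price", 133, 62552),
    ("Aaron Brooks", 954, 1145685),
    ("Aaron Gordon", 243, 3992040),
    ("Adreian Payne", 213, 1855320),
    ("Al Horford", 1156, 12000000),
    ("Al Jefferson", 1082, 13666667),
    ("Al-Farouq Aminu", 412, 1100602),
    ("Alan Anderson", 545, 1276061),
    ("Alec Burks", 374, 3034356),
    ("Alex Kirk", 4, 507336),
]

# Value = points per $1000 of salary
values = {name: points / (salary / 1000) for name, points, salary in rows}

# Bin edges 0.0, 0.1, ..., 0.9
edges = [i / 10 for i in range(10)]
low, high = edges[0], edges[-1]

# Players whose value lies outside [low, high] would not appear in the histogram
outside = sorted(name for name, v in values.items() if not low <= v <= high)
print(f"{len(values) - len(outside)} of {len(values)} shown players fall in [{low}, {high}]")
print("Not drawn:", outside)
```

In this subset, A.J. Price's small recorded salary ($62,552) gives him a value of about 2.13, far to the right of the other nine players, which is the kind of outlier the hint suggests leaving out.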
[23]:
# Value = points scored per $1000 of salary (the Salary column is in dollars)
full_data_with_value = full_data.with_column(
    'Value', full_data.column('Points') / (full_data.column('Salary') / 1000))
full_data_with_value.hist('Value', bins=np.arange(0, 1, .1))

Now suppose we weren’t able to find out every player’s salary (perhaps it was too costly to interview each player). Instead, we have gathered a simple random sample of 100 players’ salaries. The cell below loads those data.
[24]:
sample_salary_data = Table.read_table("sample_salary_data.csv")
sample_salary_data
[24]:
PlayerName        Salary
C.J. Watson       2106720
Taj Gibson        8000000
Jerrelle Benimon  35000
Quincy Pondexter  3146068
Tyreke Evans      11265416
Brandon Bass      6900000
DeJuan Blair      2000000
LeBron James      20644400
Victor Claver     1370000
Tony Snell        1472400
… (90 rows omitted)
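A simple random sample is drawn without replacement, so no player can appear twice and every group of 100 players is equally likely to be chosen. The sketch below draws one with Python's standard random module. The placeholder labels are hypothetical, the population size of 492 simply matches the rows of full_data (10 shown plus 482 omitted) for illustration, and the seed is arbitrary.

```python
import random

# Hypothetical stand-in population: one placeholder label per player
population = [f"player_{i}" for i in range(492)]

rng = random.Random(2015)  # fixed, arbitrary seed so the draw is reproducible
# random.sample draws without replacement: no player can be picked twice
sample = rng.sample(population, 100)

print(len(sample), len(set(sample)))
```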
Question 3. Make a histogram of the values of the players in sample_salary_data, using the