

1.3 3. Sampling Basketball Players

This exercise uses salary data and game statistics for basketball players from the 2014-2015 NBA season. The data was collected from Basketball-Reference and Spotrac. Run the next cell to load the two datasets.
Question 1. We would like to relate players’ game statistics to their salaries. Compute a table called full_data that includes one row for each player who is listed in both player_data and salary_data. It should include all the columns from player_data and salary_data, except the "PlayerName" column.

[21]: full_data = player_data.join('Name', salary_data, 'PlayerName')
      full_data

[21]: Name | Age | Team | Games | Rebounds | Assists | Steals | Blocks | Turnovers | Points | Salary
      A.J. Price | 28 | TOT | 26 | 32 | 46 | 7 | 0 | 14 | 133 | 62552
      Aaron Brooks | 30 | CHI | 82 | 166 | 261 | 54 | 15 | 157 | 954 | 1145685
      Aaron Gordon | 19 | ORL | 47 | 169 | 33 | 21 | 22 | 38 | 243 | 3992040
      Adreian Payne | 23 | TOT | 32 | 162 | 30 | 19 | 9 | 44 | 213 | 1855320
      Al Horford | 28 | ATL | 76 | 544 | 244 | 68 | 98 | 100 | 1156 | 12000000
      Al Jefferson | 30 | CHO | 65 | 548 | 113 | 47 | 84 | 68 | 1082 | 13666667
      Al-Farouq Aminu | 24 | DAL | 74 | 342 | 59 | 70 | 62 | 55 | 412 | 1100602
      Alan Anderson | 32 | BRK | 74 | 204 | 83 | 56 | 5 | 60 | 545 | 1276061
      Alec Burks | 23 | UTA | 27 | 114 | 82 | 17 | 5 | 52 | 374 | 3034356
      Alex Kirk | 23 | CLE | 5 | 1 | 1 | 0 | 0 | 0 | 4 | 507336
      … (482 rows omitted)

[22]: _ = ok.grade('q3_1')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests
---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

Basketball team managers would like to hire players who perform well but don’t command high salaries. From this perspective, a very crude measure of a player’s value to their team is the number of points the player scored in a season for every $1000 of salary (Note: the Salary column is in dollars, not thousands of dollars). For example, Al Horford scored 1156 points and has a salary of $12 million. This is equivalent to 12,000 thousands of dollars, so his value is 1156/12000, or about 0.096.

Question 2. Create a table called full_data_with_value that’s a copy of full_data, with an extra column called "Value" containing each player’s value (according to our crude measure). Then make a histogram of players’ values. Specify bins that make the histogram informative, and don’t forget your units! Remember that hist() takes in an optional third argument that allows you to specify the units!

Hint: Informative histograms contain a majority of the data and exclude outliers.

[23]: # Salary is in dollars, so divide it by 1000 to get points per $1000 of salary.
      full_data_with_value = full_data.with_column('Value', full_data.column('Points') / (full_data.column('Salary') / 1000))
      full_data_with_value.hist('Value', bins=np.arange(0, 1, .1))

Now suppose we weren’t able to find out every player’s salary (perhaps it was too costly to interview each player). Instead, we have gathered a simple random sample of 100 players’ salaries. The cell below loads those data.

[24]: sample_salary_data = Table.read_table("sample_salary_data.csv")
      sample_salary_data

[24]: PlayerName | Salary
      C.J. Watson | 2106720
      Taj Gibson | 8000000
      Jerrelle Benimon | 35000
      Quincy Pondexter | 3146068
      Tyreke Evans | 11265416
      Brandon Bass | 6900000
      DeJuan Blair | 2000000
      LeBron James | 20644400
      Victor Claver | 1370000
      Tony Snell | 1472400
      … (90 rows omitted)

Question 3. Make a histogram of the values of the players in sample_salary_data, using the
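
As a rough illustration of Questions 1 and 2, the join and the crude value measure can be sketched in plain Python, without the datascience library. This is not the notebook's own solution. The helper names inner_join and add_value are hypothetical, the three real rows are copied from the full_data output shown earlier, and the "Placeholder Player" row is made up purely to show that a player missing from the salary list is dropped. The sketch also assumes each name appears at most once in the salary list.

```python
# Miniature stand-ins for player_data and salary_data (assumption: plain
# lists of dicts rather than datascience Tables).
player_rows = [
    {"Name": "Al Horford", "Points": 1156},
    {"Name": "A.J. Price", "Points": 133},
    {"Name": "Alex Kirk", "Points": 4},
    {"Name": "Placeholder Player", "Points": 10},  # made-up row, no salary listed
]
salary_rows = [
    {"PlayerName": "Al Horford", "Salary": 12000000},
    {"PlayerName": "A.J. Price", "Salary": 62552},
    {"PlayerName": "Alex Kirk", "Salary": 507336},
]


def inner_join(left, left_key, right, right_key):
    """Keep only rows whose key appears in both lists; drop the right key column."""
    # Assumes the right-hand keys are unique, so a dict lookup is enough.
    right_by_key = {row[right_key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[left_key])
        if match is None:
            continue  # player has no salary entry, so the row is left out
        merged = dict(row)
        merged.update({k: v for k, v in match.items() if k != right_key})
        joined.append(merged)
    return joined


def add_value(rows):
    """Add 'Value': points scored per $1000 of salary (Salary is in dollars)."""
    return [dict(row, Value=row["Points"] / (row["Salary"] / 1000)) for row in rows]


full_rows = inner_join(player_rows, "Name", salary_rows, "PlayerName")
rows_with_value = add_value(full_rows)

for row in rows_with_value:
    print(row["Name"], round(row["Value"], 3))
```

Dividing the salary by 1000 before dividing the points keeps the measure in "points per $1000 of salary", which is the unit the histogram axis should carry.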
