
1.3.1 Convenience sampling

One sampling methodology, which is generally a bad idea, is to choose players who are somehow convenient to sample. For example, you might choose players from one team that’s near your house, since it’s easier to survey them. This is called, somewhat pejoratively, convenience sampling.

Suppose you survey only relatively new players with ages less than 22. (The more experienced players didn’t bother to answer your surveys about their salaries.)

Question 3.3 Assign convenience_sample to a subset of full_data that contains only the rows for players under the age of 22.

In [35]:
    convenience_sample = full_data.where('Age', are.below(22))
    convenience_sample

Out[35]:
    PlayerName      | Salary  | Age | Team | Games | Rebounds | Assists | Steals | Blocks
    Aaron Gordon    | 3992040 | 19  | ORL  | 47    | 169      | 33      | 21     | 22
    Alex Len        | 3649920 | 21  | PHO  | 69    | 454      | 32      | 34     | 105
    Andre Drummond  | 2568360 | 21  | DET  | 82    | 1104     | 55      | 73     | 153
    Andrew Wiggins  | 5510640 | 19  | MIN  | 82    | 374      | 170     | 86     | 50
    Anthony Bennett | 5563920 | 21  | MIN  | 57    | 216      | 48      | 27     | 16
    Anthony Davis   | 5607240 | 21  | NOP  | 68    | 696      | 149     | 100    | 200
    Archie Goodwin  | 1112280 | 20  | PHO  | 41    | 74       | 44      | 18     | 9
    Ben McLemore    | 3026280 | 21  | SAC  | 82    | 241      | 140     | 77     | 19
    Bradley Beal    | 4505280 | 21  | WAS  | 63    | 241      | 194     | 76     | 18
    Bruno Caboclo   | 1458360 | 19  | TOR  | 8     | 2        | 0       | 0      | 1
    ... (34 rows omitted)

In [36]:
    _ = ok.grade('q3_3')

    Running tests
    Test summary
        Passed: 2
        Failed: 0
    [ooooooooook] 100.0% passed

Question 3.4 Assign convenience_stats to a list of the average age and average salary of your convenience sample, using the compute_statistics function. Since they’re computed on a sample, these are called sample averages.
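To see why averages from a convenience sample can mislead, here is a minimal standard-library sketch. It does not use the lab’s compute_statistics helper or the real NBA table: ROSTER is a made-up toy list of (age, salary) pairs, and under_age and sample_averages are hypothetical helpers. The age filter mirrors are.below(22), which keeps ages strictly below 22.

```python
from statistics import mean

# Hypothetical toy roster of (age, salary) pairs; NOT the lab's NBA data.
ROSTER = [
    (19, 1_500_000), (20, 2_000_000), (21, 2_500_000),
    (25, 6_000_000), (28, 12_000_000), (31, 15_000_000), (34, 9_000_000),
]


def under_age(rows, limit=22):
    """Keep rows whose age is strictly below limit (like are.below(22))."""
    return [row for row in rows if row[0] < limit]


def sample_averages(rows):
    """Return [average age, average salary] for a list of (age, salary) rows."""
    return [mean(age for age, _ in rows), mean(salary for _, salary in rows)]


population_averages = sample_averages(ROSTER)
convenience_averages = sample_averages(under_age(ROSTER))
print(population_averages)
print(convenience_averages)
```

On this toy roster both averages from the under-22 subset come out below the full-roster averages: choosing players by age also selects on anything correlated with age, such as salary.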
In [38]:
    _ = ok.grade('q3_4')

    Running tests
    Test summary
        Passed: 3
        Failed: 0
    [ooooooooook] 100.0% passed

Next, we’ll compare the convenience sample salaries with the full data salaries in a single histogram. To do that, we’ll need to use the bin_column option of the hist method, which indicates that every other column holds counts for the bins listed in that column. The following cell should not require any changes; just run it.

In [39]:
    def compare_salaries(first, second, first_title, second_title):
        """Compare the salaries in two tables."""
        # Shared $1 million bins from 0 to just past the largest salary in either table
        max_salary = max(np.append(first.column('Salary'), second.column('Salary')))
        bins = np.arange(0, max_salary + 1e6 + 1, 1e6)
        # Count each table's salaries per bin; rename the count column (index 1) to the title
        first_binned = first.bin('Salary', bins=bins).relabeled(1, first_title)
        second_binned = second.bin('Salary', bins=bins).relabeled(1, second_title)
        # Line up both count columns by bin and draw them in one histogram
        first_binned.join('bin', second_binned).hist(bin_column='bin')

    compare_salaries(full_data, convenience_sample, 'All Players', 'Convenience Sample')

Question 3.5 Does the convenience sample give us an accurate picture of the age and salary of the full population of NBA players in 2014-2015? Would you expect it to, in general? Before you move on, write a short answer in English below. You can refer to the statistics calculated above or perform your own analysis.
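compare_salaries puts both tables on the same $1 million bins so that bar heights line up bin by bin. A minimal standard-library sketch of that shared-binning step is below; shared_edges, bin_counts, compare_counts and the salary lists are hypothetical, and the edge rule mimics np.arange(0, max_salary + 1e6 + 1, 1e6) rather than reproducing Table.bin exactly.

```python
import bisect


def shared_edges(first, second, width=1_000_000):
    """Edges 0, width, 2*width, ... up to the first multiple of width above the max value."""
    top = max(max(first), max(second))
    return [i * width for i in range(int(top // width) + 2)]


def bin_counts(values, edges):
    """Count values in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        i = bisect.bisect_right(edges, value) - 1  # index of the bin containing value
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


def compare_counts(first, second, width=1_000_000):
    """Return (bin start, first count, second count) rows on one shared set of bins."""
    edges = shared_edges(first, second, width)
    return list(zip(edges[:-1], bin_counts(first, edges), bin_counts(second, edges)))


all_salaries = [1_500_000, 2_000_000, 2_500_000, 6_000_000, 12_000_000]  # toy values
sample_salaries = [1_500_000, 2_000_000]  # toy values
for row in compare_counts(all_salaries, sample_salaries):
    print(row)
```

If each table chose its own bins, a bar in one histogram would not cover the same salary range as the matching bar in the other, so the side-by-side comparison would not mean much.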
1.3.2 Simple random sampling

A more principled approach is to sample uniformly at random from the players. If we ensure that each player is selected at most once, this is a simple random sample without replacement, sometimes abbreviated to "simple random sample" or "SRSWOR". Imagine writing down each player’s name
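A simple random sample without replacement can be sketched with Python’s standard random module: Random.sample picks k distinct positions, so no player appears twice and every subset of size k is equally likely. This is a minimal sketch; simple_random_sample and the player_i names are hypothetical. In the lab’s datascience library, Table.sample with with_replacement=False is likely the closer fit, but check its documentation before relying on that.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k distinct items uniformly at random, without replacement (SRSWOR)."""
    rng = random.Random(seed)  # seeded generator so a draw can be reproduced
    return rng.sample(list(population), k)  # raises ValueError if k > len(population)


players = [f"player_{i}" for i in range(50)]  # hypothetical stand-in for player names
draw = simple_random_sample(players, 10, seed=1)
print(draw)
```

Swapping rng.sample for rng.choices(population, k=k) would give sampling with replacement instead, where the same player can be drawn more than once.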
