Question 5. For which range of values does the plot in Question 3 better depict the distribution of the population’s player values: 0 to 0.5, or above 0.5? Explain your answer.
4. Earthquakes

The next cell loads a table containing information about every earthquake with a magnitude of 4.5 or greater in 2017, compiled by the US Geological Survey. (source: )

```python
from datascience import Table

earthquakes = Table.read_table('earthquakes_2017.csv').select(['time', 'mag', 'place'])
earthquakes
```

| time | mag | place |
|---|---|---|
| 2017-12-31T23:48:50.980Z | 4.8 | 30km SSE of Pagan, Northern Mariana Islands |
| 2017-12-31T20:59:02.500Z | 5.1 | Southern East Pacific Rise |
| 2017-12-31T20:27:49.450Z | 5.2 | Chagos Archipelago region |
| 2017-12-31T19:42:41.250Z | 4.6 | 18km NE of Hasaki, Japan |
| 2017-12-31T16:02:59.920Z | 4.5 | Western Xizang |
| 2017-12-31T15:50:22.510Z | 4.5 | 156km SSE of Longyearbyen, Svalbard and Jan Mayen |
| 2017-12-31T14:53:32.590Z | 5.1 | 41km S of Daliao, Philippines |
| 2017-12-31T14:51:58.200Z | 5.1 | 132km SSW of Lata, Solomon Islands |
| 2017-12-31T12:24:13.150Z | 4.6 | 79km SSW of Hirara, Japan |
| 2017-12-31T04:02:18.500Z | 4.8 | 10km W of Korini, Greece |

… (6350 rows omitted)

We are interested in the many earthquakes that occurred in 2017, but in general we won't have access to such a large population. If we sample correctly, we can instead take a small subsample of this year's earthquakes and still get an idea of how magnitudes are distributed across the year.

Question 1. The following code takes two different samples from the earthquakes table and calculates the mean magnitude of each. Are these samples representative of the population of earthquakes in the original table (that is, should we expect each sample mean to be close to the population mean)?

Hint: Consider the ordering of the earthquakes table.

```python
import numpy as np

# Sample 1: the 100 earthquakes with the largest magnitudes.
sample1 = earthquakes.sort('mag', descending=True).take(np.arange(100))
sample1_magnitude_mean = np.mean(sample1.column('mag'))
# Sample 2: the first 100 rows of the table, in its stored order.
sample2 = earthquakes.take(np.arange(100))
sample2_magnitude_mean = np.mean(sample2.column('mag'))
[sample1_magnitude_mean, sample2_magnitude_mean]
```

```
[6.422999999999999, 4.7749999999999995]
```

Neither sample is representative, because both are deterministic samples. The same code always selects the same rows, and most earthquakes in the table have no chance of being included. Sample 1 takes the 100 earthquakes with the largest magnitudes, so by construction its mean (about 6.42) is pushed above the population mean. Sample 2 takes the first 100 rows of the table. The table appears to be sorted by time with the most recent events first, so these earthquakes likely all come from the last days of 2017 rather than from the whole year. Even if a deterministic sample's mean happens to land near the population mean, nothing about the way it was drawn guarantees that it will.
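The contrast between deterministic and random samples can be shown without the USGS file. The sketch below is a minimal illustration under stated assumptions, not the notebook's method. It replaces the earthquakes table with 6,360 synthetic events, matching the 10 rows shown plus the 6,350 omitted. Magnitudes are drawn from a shifted exponential distribution as a rough stand-in for real ones, and a hypothetical `day_of_year` takes the place of the `time` column. Rows are ordered most-recent-first, as the table appears to be. All variable names are hypothetical. The sketch compares the two deterministic samples from Question 1 with a simple random sample of 100 rows drawn without replacement.

```python
import numpy as np

# ASSUMPTION: synthetic stand-in for the earthquakes table, not USGS data.
# 6360 events = 10 rows shown + 6350 rows omitted.
rng = np.random.default_rng(2017)
n_population = 6360

# Hypothetical "time" column: day of year, sorted most recent first like the table.
day_of_year = np.sort(rng.uniform(0, 365, n_population))[::-1]
# ASSUMPTION: magnitudes >= 4.5 modeled as 4.5 + exponential noise, rounded to 0.1.
magnitudes = np.round(4.5 + rng.exponential(scale=0.43, size=n_population), 1)

population_mean = magnitudes.mean()

# Deterministic sample 1: the 100 largest magnitudes
# (analogous to earthquakes.sort('mag', descending=True).take(np.arange(100))).
top_100 = np.sort(magnitudes)[::-1][:100]

# Deterministic sample 2: the first 100 rows in table order.
first_100 = magnitudes[:100]
first_100_days = day_of_year[:100]

# Simple random sample: 100 distinct rows, every row equally likely to be chosen.
random_rows = rng.choice(n_population, size=100, replace=False)
random_100 = magnitudes[random_rows]
random_100_days = day_of_year[random_rows]

print(f"population mean: {population_mean:.3f}")
print(f"top-100 mean:    {top_100.mean():.3f}")
print(f"first-100 mean:  {first_100.mean():.3f}")
print(f"random-100 mean: {random_100.mean():.3f}")
print(f"days covered by first 100:  {first_100_days.min():.1f} to {first_100_days.max():.1f}")
print(f"days covered by random 100: {random_100_days.min():.1f} to {random_100_days.max():.1f}")
```

With the seed fixed, the top-100 mean should sit well above the population mean. The first 100 rows should cover only the last few days of the synthetic year, while the random sample should span most of the year with a mean close to the population mean. In the notebook's own `datascience` library, we believe `earthquakes.sample(100, with_replacement=False)` plays the role that `rng.choice(..., replace=False)` plays here.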