# Data8_Project_2


PART 1

1.1

    grouped_death_causes = causes_of_death.group("Cause")
    all_unique_causes = grouped_death_causes.column("Cause")
    sorted(all_unique_causes)

1.2

    causes_for_plotting = causes_of_death.pivot('Cause', 'Year', values='Age Adjusted Death Rate', collect=sum)
    causes_for_plotting.show(5)

1.3 (double check on your own to make sure we got the same)

    disease_trend_explanation = make_array(1, 3, 5, 6)
    disease_trend_explanation

PART 2

1.1 (revised) **revised again**

    observed_diabetes = sum(framingham.column("DIABETES")) / framingham.num_rows
    observed_diabetes_distance = abs(observed_diabetes - 0.0093)
    observed_diabetes_distance

1.2

    diabetes_proportions = make_array(.9907, .0093)

    def diabetes_statistic():
        db_sim = sample_proportions(3842, diabetes_proportions).item(1)
        abs_db_sim = abs(db_sim - .0093)
        return abs_db_sim

1.3

    diabetes_simulated_stats = make_array()
    for i in np.arange(5000):
        this_sim = diabetes_statistic()
        diabetes_simulated_stats = np.append(diabetes_simulated_stats, this_sim)
    diabetes_simulated_stats

1.5

Based on the histogram above, we should reject the null as an observed distance to the true
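
The answer to 1.5 reads the histogram by eye. The same comparison can be written as an empirical p-value: the share of simulated distances that are at least as large as the observed one. The following is a minimal sketch of that idea, not the project's required solution. It uses plain NumPy in place of the `datascience` helpers, draws binomial counts as an assumed stand-in for `sample_proportions` with two categories, and uses a placeholder observed distance because the Framingham table is not available here. The names `simulate_distances`, `empirical_p_value`, and `hypothetical_observed` are illustrative and do not come from the project.

```python
import numpy as np

NULL_PROPORTION = 0.0093  # diabetes prevalence under the null, from Part 2
SAMPLE_SIZE = 3842        # sample size used in diabetes_statistic


def simulate_distances(n_reps=5000, seed=0):
    """Simulate |sample proportion - null proportion| under the null."""
    rng = np.random.default_rng(seed)
    # Simulated number of diabetes cases in each repetition.
    counts = rng.binomial(SAMPLE_SIZE, NULL_PROPORTION, size=n_reps)
    return np.abs(counts / SAMPLE_SIZE - NULL_PROPORTION)


def empirical_p_value(simulated, observed_distance):
    """Share of simulated distances at least as large as the observed one."""
    return np.count_nonzero(simulated >= observed_distance) / len(simulated)


simulated = simulate_distances()

# Placeholder only: replace with observed_diabetes_distance from 1.1.
hypothetical_observed = 0.02
p_value = empirical_p_value(simulated, hypothetical_observed)
print(p_value)
```

A small p-value (conventionally below 0.05) would support rejecting the null. With the real data, pass `observed_diabetes_distance` instead of the placeholder.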