CS 229, Autumn 2011
Problem Set #3: Theory & Unsupervised learning

Due in class (9:30am) on Wednesday, November 16.

Notes:
(1) These questions require thought, but do not require long answers. Please be as concise as possible.
(2) When sending questions to [email protected], please make sure to write the homework number and the question number in the subject line, such as Hwk 3 Q4, and send a separate email per question.
(3) If you missed the first lecture or are unfamiliar with the class' collaboration or honor code policy, please read the policy on Handout #1 (available from the course website) before starting work.
(4) For problems that require programming, please include in your submission a printout of your code (with comments) and any figures that you are asked to plot.
(5) Please indicate the submission time and number of late dates clearly in your submission.

SCPD students: Please email your solutions to [email protected], and write "Problem Set 3 Submission" on the Subject of the email.

If you are writing your solutions out by hand, please write clearly and in a reasonably large font using a dark pen to improve legibility.

1. [23 points] Uniform convergence

You are hired by CNN to help design the sampling procedure for making their electoral predictions for the next presidential election in the (fictitious) country of Elbania. The country of Elbania is organized into states, and there are only two candidates running in this election: one from the Elbanian Democratic party, and another from the Labor Party of Elbania. The plan for making our electoral predictions is as follows: We'll sample m voters from each state, and ask whether they're voting democrat. We'll then publish, for each state, the estimated fraction of democrat voters. In this problem, we'll work out how many voters we need to sample in order to ensure that we get good predictions with high probability.
One reasonable goal might be to set m large enough that, with high probability, we obtain uniformly accurate estimates of the fraction of democrat voters in every state. But this might require surveying very many people, which would be prohibitively expensive. So, we're instead going to demand only a slightly lower degree of accuracy. Specifically, we'll say that our prediction for a state is "highly inaccurate" if the estimated fraction of democrat voters differs from the actual fraction of democrat voters within that state by more than a tolerance factor γ. CNN knows that their viewers will tolerate some small number of states' estimates being highly inaccurate; however, their credibility would be damaged if they reported highly inaccurate estimates for too many states. So, rather than trying to ensure that all states' estimates are within γ of the true values (which would correspond to no state's estimate being highly inaccurate), we will instead try only to ensure that the number of states with highly inaccurate estimates is small.
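The sampling plan described above can be explored numerically. The following is only an illustrative sketch, not part of the problem set or its intended solution: it polls m simulated voters per state and counts the states whose estimate misses the true fraction by more than γ. The number of states, the true fractions, and the values of m and γ are all made-up assumptions, as are the function names. The helper hoeffding_bound evaluates the standard Hoeffding inequality for a single state, 2·exp(−2γ²m), which bounds the chance that one state's estimate is highly inaccurate.

```python
import math
import random


def hoeffding_bound(m, gamma):
    """Hoeffding upper bound on P(|estimate - truth| > gamma) for ONE state
    when m voters are sampled independently (capped at 1)."""
    return min(1.0, 2.0 * math.exp(-2.0 * gamma * gamma * m))


def count_inaccurate_states(true_fractions, m, gamma, rng):
    """Poll m simulated voters in each state and count the states whose
    estimated fraction differs from the true fraction by more than gamma."""
    bad = 0
    for phi in true_fractions:
        # Each sampled voter says "democrat" with probability phi.
        votes = sum(1 for _ in range(m) if rng.random() < phi)
        if abs(votes / m - phi) > gamma:
            bad += 1
    return bad


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    # Hypothetical setup: 50 states with made-up true democrat fractions.
    true_fractions = [rng.uniform(0.3, 0.7) for _ in range(50)]
    m, gamma = 500, 0.05
    bad = count_inaccurate_states(true_fractions, m, gamma, rng)
    print("highly inaccurate states:", bad, "of", len(true_fractions))
    print("per-state Hoeffding bound:", hoeffding_bound(m, gamma))
```

Rerunning with different values of m shows the trade-off the problem is concerned with: a larger m shrinks the per-state bound, at the cost of surveying more voters.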
This document was uploaded on 01/06/2012.