Course: COMPSCI 70, Fall 2010
School: Berkeley

CS 70, Fall 2010: Discrete Mathematics and Probability Theory (Tse/Wagner)
Lecture 17

Polling and the Law of Large Numbers

Polling

Question: We want to estimate the proportion $p$ of Democrats in the US population by taking a small random sample. How large does our sample have to be to guarantee that our estimate will be within (say) 0.1 of the true value with probability at least 0.95?

This is perhaps the most basic statistical estimation problem, and it shows up everywhere. We will develop a simple solution that uses only Chebyshev's inequality. More refined methods can be used to get sharper results.

Let's denote the size of our sample by $n$ (to be determined), and the number of Democrats in it by the random variable $S_n$. (The subscript $n$ just reminds us that the r.v. depends on the size of the sample.) Then our estimate will be the value $A_n = \frac{1}{n} S_n$. Now, as has often been the case, we will find it helpful to write $S_n = X_1 + X_2 + \cdots + X_n$, where

$$X_i = \begin{cases} 1 & \text{if person } i \text{ in the sample is a Democrat;} \\ 0 & \text{otherwise.} \end{cases}$$

Note that each $X_i$ can be viewed as a coin toss, with Heads probability $p$ (though of course we do not know the value of $p$). And the coin tosses are independent (see Footnote 1). Hence, $S_n$ is a binomial random variable with parameters $n$ and $p$.

Footnote 1: We are assuming here that the sampling is done with replacement; i.e., we select each person in the sample from the entire population, including those we have already picked. So there is a small chance that we will pick the same person twice.

What is the expectation of our estimate?

$$E(A_n) = E\left(\tfrac{1}{n} S_n\right) = \tfrac{1}{n} E(S_n) = \tfrac{1}{n}(np) = p.$$

So for any value of $n$, our estimate will always have the correct expectation $p$. [Such a r.v. is often called an unbiased estimator of $p$.] Now presumably, as we increase our sample size $n$, our estimate should get more and more accurate. This will show up in the fact that the variance decreases with $n$: i.e., as $n$ increases, the probability that we are far from the mean $p$ will get smaller.

To see this, we need to compute $\mathrm{Var}(A_n)$. But $A_n = \frac{1}{n} S_n$, which is just a constant times a binomial random variable.

Theorem 17.1: For any random variable $X$ and constant $c$, we have $\mathrm{Var}(cX) = c^2\,\mathrm{Var}(X)$.

The proof of this theorem follows directly from the definition of the variance. (Try it yourself.) Now to compute $\mathrm{Var}(A_n)$:

$$\mathrm{Var}(A_n) = \mathrm{Var}\left(\tfrac{1}{n} S_n\right) = \left(\tfrac{1}{n}\right)^2 \mathrm{Var}(S_n) = \left(\tfrac{1}{n}\right)^2 \sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{\sigma^2}{n},$$

where we have written $\sigma^2$ for the variance of each of the $X_i$. The third equality follows from the calculation we did for the binomial random variable in the last lecture note. So we see that the variance of $A_n$ decreases in inverse proportion to $n$. This fact ensures that, as we take larger and larger sample sizes $n$, the probability that we deviate much from the expectation $p$ gets smaller and smaller.

Let's now use Chebyshev's inequality to figure out how large $n$ has to be to ensure a specified accuracy in our estimate of the proportion of Democrats $p$. A natural way to measure this is for us to specify two parameters, $\varepsilon$ and $\delta$, both in the range $(0, 1)$. The parameter $\varepsilon$ controls the error we are prepared to tolerate in our estimate, and $\delta$ controls the confidence we want to have in our estimate. A more precise version of our original question is then the following:

Question: For the Democrat-estimation problem above, how large does the sample size $n$ have to be in order to ensure that

$$\Pr[|A_n - p| \ge \varepsilon] \le \delta\,?$$

In our original question, we had $\varepsilon = 0.1$ and $\delta = 0.05$.

Let's apply Chebyshev's inequality to answer our more precise question above. Since we know $\mathrm{Var}(A_n)$, this will be quite simple. From Chebyshev's inequality, we have

$$\Pr[|A_n - p| \ge \varepsilon] \le \frac{\mathrm{Var}(A_n)}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2}.$$

To make this at most the desired value $\delta$, we need to set

$$n \ge \frac{\sigma^2}{\varepsilon^2} \times \frac{1}{\delta}. \qquad (1)$$

Now recall that $\sigma^2 = \mathrm{Var}(X_i)$ is the variance of a single sample $X_i$. So, since $X_i$ is a 0/1-valued r.v., we have $\sigma^2 = p(1-p)$, and inequality (1) becomes

$$n \ge \frac{p(1-p)}{\varepsilon^2} \times \frac{1}{\delta}. \qquad (2)$$

Plugging in $\varepsilon = 0.1$ and $\delta = 0.05$, we see that a sample size of $n = 2000 \times p(1-p)$ is sufficient.

At this point you should be worried. Why? Because our formula for the sample size contains $p$, and this is precisely the quantity we are trying to estimate! But we can get around this. The largest possible value of $p(1-p)$ is $1/4$ (achieved when $p = 1/2$). Hence, if we pick $n = 2000 \times (1/4) = 500$, then no matter what the value of $p$ is, the sample size is sufficient.

Estimating a general expectation

What if we wanted to estimate something a little more complex than the proportion of Democrats in the population, such as the average wealth of people in the US? Then we could use exactly the same scheme as above, except that now the r.v. $X_i$ is the wealth of the $i$th person in our sample. Clearly $E(X_i) = \mu$, the average wealth (which is what we are trying to estimate). And our estimate will again be $A_n = \frac{1}{n} \sum_{i=1}^{n} X_i$, for a suitably chosen sample size $n$. We again have $E(A_n) = \mu$. And as long as $\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Var}(X_i)$, we have as before $\mathrm{Var}(A_n) = \frac{\sigma^2}{n}$, where $\sigma^2 = \mathrm{Var}(X_i)$ is the common variance of the $X_i$'s. From equation (1), it is enough for the sample size $n$ to satisfy

$$n \ge \frac{\sigma^2}{\varepsilon^2} \times \frac{1}{\delta}. \qquad (3)$$

Here $\varepsilon$ and $\delta$ are the desired error and confidence respectively, as before. Now of course we don't know $\sigma^2$, which appears in equation (3). In practice, we would use an upper bound on $\sigma^2$ (just as we used the fact that $\sigma^2 = p(1-p) \le 1/4$ in the Democrats problem). Plugging this bound into equation (3) will ensure that our sample size is large enough.

Let us recapitulate the three properties of the random variables $X_i$ that we used in the above derivation:

(1) $E(X_i) = \mu$, $i = 1, \ldots, n$.
(2) $\mathrm{Var}(X_i) = \sigma^2$, $i = 1, \ldots, n$.
(3) $\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Var}(X_i)$.

The first two properties hold if the $X_i$'s have the same distribution. The third property holds if the random variables are mutually independent.
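
The sample-size bound obtained from Chebyshev's inequality, n >= sigma^2 / (epsilon^2 * delta), is easy to evaluate mechanically. The following is a minimal sketch, not part of the original notes: the function name `chebyshev_sample_size` and its parameter names are hypothetical, and exact rational arithmetic is used only so that decimal inputs such as 0.1 do not introduce rounding error.

```python
import math
from fractions import Fraction


def chebyshev_sample_size(eps, delta, var_bound):
    """Smallest integer n with n >= var_bound / (eps^2 * delta).

    eps is the tolerated error, delta the allowed failure probability,
    and var_bound an upper bound on the variance of a single sample.
    (Hypothetical helper for illustration.)
    """
    # Convert via str so that decimal inputs such as 0.1 are treated exactly.
    eps, delta, var_bound = (Fraction(str(x)) for x in (eps, delta, var_bound))
    if eps <= 0 or not 0 < delta < 1 or var_bound < 0:
        raise ValueError("need eps > 0, 0 < delta < 1 and var_bound >= 0")
    return math.ceil(var_bound / (eps ** 2 * delta))


# Polling example: p(1 - p) <= 1/4 for every p, so 1/4 is a valid variance bound.
print(chebyshev_sample_size(0.1, 0.05, 0.25))  # 500
```

Passing the worst-case variance 1/4 reproduces the sample size of 500 from the polling example. For a general expectation, any known upper bound on the variance can be supplied in its place.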
We have already defined the notion of independence for events, and we will define independence for random variables in the next lecture note. Intuitively, two random variables are independent if the events "associated" with them are independent. In the polling example, where the $X_i$'s are indicator random variables for flipping Heads, the random variables are independent if the flips are independent. Random variables which have the same distribution and are independent are called independent identically distributed (abbreviated as i.i.d.).

As a further example, suppose we are trying to estimate the average rate of emission from a radioactive source, and we are willing to assume that the emissions follow a Poisson distribution with some unknown parameter $\lambda$; of course, this $\lambda$ is precisely the expectation we are trying to estimate. Now in this case we have $\mu = \lambda$ and also $\sigma^2 = \lambda$ (see the previous lecture note). So $\frac{\sigma^2}{\mu^2} = \frac{1}{\lambda}$. Thus, if we measure the error relative to the mean, i.e., we require $\Pr[|A_n - \lambda| \ge \varepsilon\lambda] \le \delta$, then substituting $\varepsilon\lambda$ for $\varepsilon$ in equation (3) shows that a sample size of $n = \frac{1}{\lambda\varepsilon^2\delta}$ suffices. (Again, in practice we would use a lower bound on $\lambda$.)

The Law of Large Numbers

The estimation method we used in the previous two sections is based on a principle that we accept as part of everyday life: namely, the Law of Large Numbers (LLN). This asserts that, if we observe some random variable many times, and take the average of the observations, then this average will converge to a single value, which is of course the expectation of the random variable. In other words, averaging tends to smooth out any large fluctuations, and the more averaging we do the better the smoothing.

Theorem 17.2: [Law of Large Numbers] Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables with common expectation $\mu = E(X_i)$. Define $A_n = \frac{1}{n} \sum_{i=1}^{n} X_i$. Then for any $\varepsilon > 0$, we have

$$\Pr[|A_n - \mu| \ge \varepsilon] \to 0 \quad \text{as } n \to \infty.$$

Proof: Let $\mathrm{Var}(X_i) = \sigma^2$ be the common variance of the r.v.'s; we assume that $\sigma^2$ is finite (see Footnote 2). With this (relatively mild) assumption, the LLN is an immediate consequence of Chebyshev's inequality. For, as we have seen above, $E(A_n) = \mu$ and $\mathrm{Var}(A_n) = \frac{\sigma^2}{n}$, so by Chebyshev we have

$$\Pr[|A_n - \mu| \ge \varepsilon] \le \frac{\mathrm{Var}(A_n)}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2} \to 0 \quad \text{as } n \to \infty.$$

This completes the proof. $\Box$

Footnote 2: If $\sigma^2$ is not finite, the LLN still holds but the proof is much trickier.

Notice that the LLN says that the probability of any deviation $\varepsilon$ from the mean, however small, tends to zero as the number of observations $n$ in our average tends to infinity. Thus by taking $n$ large enough, we can make the probability of any given deviation as small as we like.
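
The behaviour described by the Law of Large Numbers can be observed in a small simulation of the polling setting. The sketch below is illustrative only and not part of the original notes: the function names, the seed, the number of trials and the choice p = 1/2 are all assumptions. It estimates how often a sample average of n coin tosses misses p by at least epsilon, and prints the Chebyshev bound p(1-p)/(n epsilon^2) alongside for comparison.

```python
import random
from fractions import Fraction


def chebyshev_bound(n, p, eps):
    """Chebyshev upper bound p(1-p)/(n*eps^2) on Pr[|A_n - p| >= eps], capped at 1."""
    return min(Fraction(1), p * (1 - p) / (n * eps ** 2))


def deviation_frequency(n, p, eps, trials, rng):
    """Fraction of simulated polls of size n whose estimate A_n misses p by at least eps."""
    p_float = float(p)
    misses = 0
    for _ in range(trials):
        # S_n: number of Heads among n independent tosses with Heads probability p.
        s_n = sum(rng.random() < p_float for _ in range(n))
        # Compare exactly, so that the boundary case |A_n - p| == eps is counted.
        if abs(Fraction(s_n, n) - p) >= eps:
            misses += 1
    return misses / trials


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, an arbitrary choice for reproducibility
    p, eps = Fraction(1, 2), Fraction(1, 10)  # assumed values for illustration
    for n in (10, 100, 1000):
        freq = deviation_frequency(n, p, eps, trials=500, rng=rng)
        print(n, freq, float(chebyshev_bound(n, p, eps)))
```

In runs of this kind the observed frequency typically falls well below the Chebyshev bound as n grows, which is consistent with the remark in the notes that more refined methods give sharper results than Chebyshev's inequality.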