As can be seen, Horvitz-Thompson estimators are in principle quite general in their applicability. In reality, of course, they are limited to those contexts in which the values of the $\pi_i$ may be computed for the underlying sampling design. Such computations tend to be noticeably more difficult for designs that sample without replacement. Nevertheless, we will see in Section 5.4 a number of different network graph sampling designs for which the values $\pi_i$ may be computed. And even in cases for which it appears difficult or impossible to compute these inclusion probabilities, the Horvitz-Thompson framework nonetheless provides a perspective by which valuable insight may be gained.

5.2.2 Estimation of Group Size

A special type of population total that is often of interest is the size of a group. Note, for example, that in the previous material it was implicitly assumed that $N_u$, the size of the population $U$, is known. Many times this is simply not true. In fact, in some cases estimating $N_u$ is an important goal in and of itself. For example, there are many populations that are ‘hard to find’ and yet whose size is important to know for planning purposes, such as populations of endangered animal species or populations of humans at risk for a particular rare disease or condition.

In principle, we may write $N_u = \sum_{i \in U} 1$, which would suggest the Horvitz-Thompson estimator $\hat{N}_u = \sum_{i \in S} \pi_i^{-1}$. Unfortunately, as we have seen in Examples 5.2 and 5.3, knowledge of $N_u$ is typically needed to compute the $\pi_i$, which makes this approach infeasible.

Instead, alternative techniques have been developed for this particular estimation problem, the primary example of which is the class of capture-recapture estimators. The simplest version of capture-recapture involves two stages of simple random sampling without replacement, yielding two samples, say $S_1$ and $S_2$.
In the first stage, the sample $S_1$ of size $n_1$ is taken, and all of the units in $S_1$ are ‘marked.’ Marking might correspond to literally tagging a fish or animal, or simply noting the ID number of a record in a database. All of the units in $S_1$ are then ‘returned’ to the population, either literally or figuratively, and, at the second stage of sampling, a sample $S_2$ of size $n_2$ is taken from $U$. The value

$$\hat{N}_u^{(c/r)} = \frac{n_2}{m}\, n_1 \tag{5.9}$$

is then used as an estimate of $N_u$, where $m = |S_1 \cap S_2|$ is the number of marked units observed in the second sample. An estimator of the variance is

$$V\left(\hat{N}_u^{(c/r)}\right) = \frac{n_1 n_2 (n_1 - m)(n_2 - m)}{m^3}. \tag{5.10}$$

The capture-recapture estimator in (5.9) may be seen to adjust upward the size $n_1$ of the first sample by one over the factor $m/n_2$, an indication of what fraction of the overall population was marked. It has been derived using a number of different arguments. For example, if $n_1$ and $n_2$ are fixed in advance, $m$ has a hypergeometric distribution, and the integer part of $\hat{N}_u^{(c/r)}$ can be shown to be the corresponding maximum likelihood estimate of $N_u$. Similar estimators have been derived under various other models and assumptions, allowing for such changes as random $n_1$, $n_2$, unequal probability sampling of units, changes in inclusion probabilities between the first and second stages of sampling, and more than two stages of sampling.
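As a rough illustration of how (5.9) and (5.10) might be evaluated in practice, the following Python sketch computes the capture-recapture estimate and its estimated variance from the three counts $n_1$, $n_2$, and $m$. The function name `capture_recapture` and the numbers in the example are hypothetical and chosen only for illustration; they do not come from the text.

```python
from math import floor, sqrt


def capture_recapture(n1: int, n2: int, m: int) -> tuple[float, float]:
    """Return (estimate, estimated variance) following (5.9) and (5.10).

    n1 -- size of the first ('marked') sample S1
    n2 -- size of the second sample S2
    m  -- number of marked units seen again, |S1 intersect S2|
    """
    if m <= 0:
        # With no recaptures, (5.9) divides by zero and is undefined.
        raise ValueError("m must be positive for the estimator to exist")
    if m > min(n1, n2):
        raise ValueError("m cannot exceed either sample size")

    n_hat = n1 * n2 / m                                # equation (5.9)
    var_hat = n1 * n2 * (n1 - m) * (n2 - m) / m**3     # equation (5.10)
    return n_hat, var_hat


# Illustrative (made-up) counts: mark 200 units, resample 150, recapture 30.
n_hat, var_hat = capture_recapture(n1=200, n2=150, m=30)
mle = floor(n_hat)        # integer part, the MLE when n1 and n2 are fixed
std_err = sqrt(var_hat)   # rough standard error from (5.10)
print(n_hat, mle, round(std_err, 1))
```

With these illustrative counts, the fraction marked in the second sample is $m/n_2 = 30/150 = 0.2$, so the first sample of 200 is scaled up by a factor of 5 to give an estimate of 1000. The sketch rejects $m = 0$ explicitly, since (5.9) is undefined when no marked units are recaptured.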
