Unformatted text preview: STOCHASTIC EPIDEMIC MODELS
AND
THEIR STATISTICAL ANALYSIS
Hakan ANDERSSON and Tom BRITTON
February 2000 Hakan Andersson
Group Financial Risk Control
Swedbank
SE105 34 Stockholm
Sweden
email: [email protected]
Tom Britton
Department of Mathematics
Uppsala University
P.O. Box 480
SE751 06 Uppsala
Sweden
email: [email protected] Preface
The present lecture notes describe stochastic epidemic models and methods for their
statistical analysis. Our aim is to present ideas for such models, and methods for their
analysis along the way we make practical use of several probabilistic and statistical
techniques. This will be done without focusing on any speci c disease, and instead
rigorously analyzing rather simple models. The reader of these lecture notes could
thus have a twofold purpose in mind: to learn about epidemic models and their
statistical analysis, and/or to learn and apply techniques in probability and statistics.
The lecture notes require an early graduate level knowledge of probability and
statistics. They introduce several techniques which might be new to students, but our
intention is to present these keeping the technical level at a minimum. Techniques that
are explained and applied in the lecture notes are, for example: coupling, di usion
approximation, random graphs, likelihood theory for counting processes, martingales,
the EMalgorithm and MCMC methods. The aim is to introduce and apply these
techniques, thus hopefully motivating their further theoretical treatment. A few
sections, mainly in Chapter 5, assume some knowledge of weak convergence we hope
that readers not familiar with this theory can understand the these parts at a heuristic
level.
The text is divided into two distinct but related parts: modelling and estimation.
The rst part covers stochastic models and their properties, often assuming a large
community in which the disease is spread. The second part deals with statistical
questions, that is, what can be said about the model and its parameters, given that
an epidemic outbreak has been observed. The second part uses results from the rst
part, and is hence not suited for reading without having read the rst part.
The lecture notes are selfinstructive and may be read by anyone interested in
the area. They are suited for a onesemester course of approximately 15 twohour
lectures. Most chapters may be presented in one such lecture. Chapters that need
somewhat longer treatment are Chapters 5, 6, and 8. Each chapter ends with a few
exercises giving extensions of the theory presented in the text.
These notes were written during the spring term 1999 when the authors gave a
joint graduate course in the Departments of Mathematics at Stockholm and Uppsala
Universities. We thank the participants in the course: Anders Bjorkstrom, Nestor
Correia, Maria Deijfen, Peter Grenholm, Annika Gunnerhed, Allan Gut, Jemila Seid Hamid, Stefan Israelsson, Kristi Kuljus, Johan Lindback and Habte Tewoldeberhan
for their constructive criticisms of the manuscript as well as for pointing out several
errors. The lecture notes were rewritten taking their comments into account during
the second half 1999. We are also grateful to the referees of Springer for careful
reading and numerous constructive suggestions on how to clearify bits and pieces as
well as language improvement. Needless to say, the authors are responsible for any
remaining errors. Tom Britton gratefully acknowledges support from the Swedish
Natural Science Research Council.
Hakan Andersson, Stockholm
Tom Britton, Uppsala February, 2000 Contents
Preface
List of contents
Part I: STOCHASTIC MODELLING
Chapter 1. Introduction 1.1. Stochastic versus deterministic models
1.2. A simple epidemic model: The ReedFrost model
1.3. Stochastic epidemics in large communities
1.4. History of epidemic modelling
Exercises v
vii
1
3
3
4
6
7
9 Chapter 2. The standard SIR epidemic model 11
11
12
14
15
18 Chapter 3. Coupling methods 19
19
22
22
26 Chapter 4. The threshold limit theorem 27
27
28 2.1. De nition of the model
2.2. The Sellke construction
2.3. The Markovian case
2.4. Exact results
Exercises 3.1. First examples
3.2. De nition of coupling
3.3. Applications to epidemics
Exercises 4.1. The imbedded process
4.2. Preliminary convergence results 4.3. The case mn=n ! > 0 as n ! 1
4.4. The case mn = m for all n
4.5. Duration of the Markovian SIR epidemic
Exercises 30
32
34
36 Chapter 5. Density dependent jump Markov processes 39
39
40
41
43
46
48 Chapter 6. Multitype epidemics 51
51
53
55
56
61 Chapter 7. Epidemics and graphs 63
64
65
66
70
72 Chapter 8. Models for endemic diseases 73
73
77
83 5.1. An example: A simple birth and death process
5.2. The general model
5.3. The Law of Large Numbers
5.4. The Central Limit Theorem
5.5. Applications to epidemic models
Exercises 6.1. The standard SIR multitype epidemic model
6.2. Large population limits
6.3. Household model
6.4. Comparing equal and varying susceptibility
Exercises
7.1. Random graph interpretation
7.2. Constant infectious period
7.3. Epidemics and social networks
7.4. The twodimensional lattice
Exercises 8.1. The SIR model with demography
8.2. The SIS model
Exercises Part II: ESTIMATION
Chapter 9. Complete observation of the epidemic process
9.1. Martingales and loglikelihoods of counting processes
9.2. MLestimation for the standard SIR epidemic
Exercises 85
87
87
91
94 Chapter 10. Estimation in partially observed epidemics 99
99
103
105 Chapter 11. Markov Chain Monte Carlo methods 107
107
109
111
113
114 Chapter 12. Vaccination 12.1. Estimating vaccination policies based on one epidemic
12.2. Estimating vaccination policies for endemic diseases
12.3. Estimation of vaccine e cacy
Exercises 117
117
120
123
124 References 129 10.1. Estimation based on martingale methods
10.2. Estimation based on the EMalgorithm
Exercises
11.1. Description of the techniques
11.2. Important examples
11.3. Practical implementation issues
11.4. Bayesian inference for epidemics
Exercises Part I STOCHASTIC MODELLING
In the rst part of these lecture notes, we present stochastic models for the spread
of an infectious disease in some given population. We stress that the models aim at
describing the spread of viral or bacterial infections with a persontoperson transmission mechanism. Diseases that belong to this category are, for example, childhood
diseases (measles, chickenpox, mumps, rubella, ...), STD's (sexually transmitted diseases) and less severe diseases such as in uenza and the common cold. Diseases
excluded from these models are, for example, hostvector and parasitic infections although they have some features in common. Modi cations of epidemic models can
also be used in applications in the social sciences, modelling for example the spread of
knowledge or rumours (see e.g. Daley and Kendall, 1965, and Maki and Thompson,
1973). In this context being susceptible corresponds to being an ignorant (not having
some speci c information) and infectious corresponds to being a spreader (spreading the information to other individuals). One possible rumour model is where the
spreader continues to spread the rumour forever which corresponds to the SI model
(e.g. Exercise 2.3).
Two main features make the modelling of infectious diseases di erent from other
types of disease. The rst and perhaps most important reason is that strong dependencies are naturally present: whether or not an individual becomes infected depends
strongly on the status of other individuals in its vicinity. In Section 1.2 we shall
see how this complicates the stochastic analysis, even for a very small group of individuals. For nontransmittable diseases this is usually not the case. Occurences of
such diseases are usually modelled using survival analysis, in which hazard functions,
describing the agedependent risk for an individual to fall ill, are speci ed proportional hazards (Cox, 1972) is an important example. These hazards may be speci c
to each individual and even contain a random parameter which may be correlated
between related individuals as in frailty models (e.g. Hougaard, 1995). Even then the
diseasetimes for di erent individuals are de ned to be independent given the hazard
functions, contrary to the case for infectious diseases. A thorough analysis of such
models and their statistical analysis, including many examples, is given in Andersen
et al. (1993).
The second feature, which a ects the statistical analysis, is that most often the
epidemic process is only partly observed. For example it is rarely known by whom an
infected individual was infected, nor the time at which the individual was infected and
during what period she was infectious (here and in the sequel we denote individuals
by `she'). Problems related to this property are treated in Part II of the lecture notes.
In the introductory chapter we present the simplest stochastic model, discuss advantages and disadvantages of stochastic models as opposed to deterministic models
and give a brief overview of the history of epidemic models and give some general
references on epidemic modelling. In Chapter 2 we de ne what we call `The standard
SIR epidemic model' (SIR epidemic for short), which serves as the canonical model in
the text. The abbreviation SIR is short for `susceptible', `infectious' and `removed'.
SIR models assume that individuals are at rst susceptible, if they get infected they
remain infectious for some time, after which they recover and become immune. An
individual is said to be removed if she has recovered and is immune or dies, and plays
no further role in the epidemic. A construction of the SIR model is given in Chapter
2, and exact results concerning the nal size (i.e. the total number infected by the
disease) and `the total area under the trajectory of infectives' are derived. In Chapter
3 the Coupling Method is presented, a method which enables the SIR epidemic model
to be approximated by a branching process during its initial stage, assuming a large
population. First, the main idea of coupling is given, and then it is applied to the SIR
epidemic model. In Chapter 4 the important threshold limit theorem, concerning the
nal size of the SIR epidemic in a large population, is stated and proved. In Chapter
5 we are concerned with approximations of the entire epidemic process and not only
its nal size. This relies on di usion theory for the special case when the epidemic
model is Markovian.
Chapters 6 and 7 extend the standard SIR epidemic to the case where the population is not completely homogeneous. Chapter 6 is devoted to socalled multitype
models, where individuals are of di erent `types' and these types di er, for example
in terms of susceptibility. In Chapter 6 we also model the spread of disease when
the community is built up of households, assuming di erent transmission rates between individuals depending on whether they belong to the same household or not.
In Chapter 7 we characterise an epidemic model in terms of random graphs. This
technique is introduced to model heterogeneities caused by e.g. social structures or
geographic location. The last chapter of Part I, Chapter 8, concerns models for endemic diseases, i.e. diseases which are present in the population over a long period
of time. The models of this chapter are di erent in that new susceptible individuals
have to enter the population for endemicity to occur. This can be achieved by assuming that old individuals die and new (susceptible) individuals are born into the
population, i.e. a dynamic population, or that individuals become susceptible after
recovery rather than immune. The latter type of model is, for obvious reasons, called
an SIS model. 1 Introduction
1.1 Stochastic versus deterministic models
These lecture notes focus on stochastic models and their statistical analysis. Deterministic epidemic models have perhaps received more attention in the literature
(see also Section 1.4 about the history of epidemic models). For example, the monograph by Anderson and May (1991), probably the most cited reference in the recent
literature on epidemic models, treats almost exclusively deterministic models.
The main advantage of deterministic models lies in their simpler (but not necessarily simple!) analysis. For a stochastic epidemic model to be mathematically
manageable it has to be quite simple, and thus not entirely realistic. Deterministic
models can be more complex, yet still possible to analyse, at least when numerical
solutions are adequate.
Several reasons suggest that stochastic models are to be preferred when their
analysis is possible. First, the most natural way to describe the spread of disease is
stochastic one de nes the probability of disease transmission between two individuals,
rather than stating certainly whether or not transmission will occur. Deterministic
models describe the spread under the assumption of mass action, relying on the law
of large numbers. In fact, an important part in stochastic modelling lies in showing
what deterministic model the stochastic model converges to, when the community
size becomes large. Secondly, there are phenomena which are genuinely stochastic
and do not satisfy a law of large numbers. For example, in a large community, many
models will lead either to a minor outbreak infecting only few individuals, or else to
a major outbreak infecting a more or less deterministic proportion of the community
(see Section 1.3 below). To calculate the probability of the two events is only possible
in a stochastic setting. Further, when considering extinction of endemic diseases
(see Chapter 8), this can only be analysed with stochastic models, since extinction
occurs when the epidemic process deviates from the expected level. A third important
advantage concerns estimation. Knowledge about uncertainty in estimates requires a
stochastic model, and an estimate is not of much use without some knowledge of its
uncertainty.
To conclude, stochastic models are to be preferred when their analysis is possible
otherwise deterministic models should be used. Deterministic models can also serve
as introductory models when studying new phenomena. We see no con ict between
the two and believe that both types of models play an important role in better
understanding the mechanisms of disease spread. In these lecture notes we focus
on stochastic models, for deterministic models we refer the reader to Anderson and
May (1991), and Bailey (1975) { who also treats stochastic models. 1.2 A simple epidemic model: The ReedFrost model
Before describing some general characteristics of the models to be studied, we present
the simplest possible epidemic model for the spread of an infectious disease, say the
common cold, in a small group of individuals. The model is called the ReedFrost
model named after its founders (see Section 1.4 on the history of Epidemic modelling)
and is a socalled chainbinomial model (also treated in Section 10.2). The model is
an SIR epidemic model which means that individuals are at rst susceptible to the
disease. If an individual becomes infected, she will rst be infectious, and called an
infective, for some time and then recover and become immune, a state called removed.
An individual is said to be infected if she is either infectious or removed, i.e. no longer
susceptible.
The model is usually speci ed using discrete time dynamics. In the discrete time
scenario it is natural to think of the infectious period as being short and preceded
by a longer latent period. Then new infections will occur in generations, these generations being separated by the latent period as the discrete time unit. The event
probabilities in a given generation depend only on the state of the epidemic in the
previous generation (i.e. a Markov model), and these events are speci ed by certain
binomial probabilities. If we let Xj and Yj denote the number of susceptibles and
infectives respectively at time (or generation) j , the chainbinomial ReedFrost model
has conditional probabilities P(Yj+1 = yj+1jX0 = x0 Y0 = y0 : : : Xj = xj Yj = yj )
= P(Yj+1 = yj+1jXj = xj Yj = yj ) xj (1 ; qy )y +1 (qy )x ;y +1
yj+1
and Xj+1 = Xj ; Yj+1. This means that a given susceptible of generation j remains
susceptible in the next generation if she escapes infection from all infectives of generation j these events are independent each occurring with probability q. Further,
di erent susceptibles in a given generation become infected independently of one another and infectious individuals are removed in the next generation. Given the initial
state X0 = n and Y0 = m the probability of the complete chain y1 : : : yk yk+1 = 0
is obtained by conditioning sequentially and using the Markov property of the chain.
If we let xj+1 = xj ; yj+1 we have
P(Y1 = y1 : : : Yk = yk Yk+1 = 0jX0 = n Y0 = m)
= P(Y1 = y1jX0 = n Y0 = m)
P(Yk+1 = 0jXk = xk Yk = yk )
xk (1 ; qy )0 (qy )x :
n
= y (1 ; qm)y1 (qm )n;y1
0
= 1 j j j j j k k k From a mathematical point of view, the spread of the disease does not have to
occur in generations. The necessary assumption is that each individual who becomes infected has `infectious contact' with any other given individual (meaning that the
individual becomes infected if she is still susceptible) with probability p = 1 ; q, and
all such contacts occur independently. Of course, the notion of generation becomes
less meaningful, unless there is a long latency period prior to the infectious period.
Still the formula can be used when computing the total number of infected individuals
P
Z = j 1 Yj , as we shall see (note that the initially infectives m are excluded). The
quantity Z is also known as the nal size of the epidemic.
To compute P(Z = zjX0 = n Y0 = m) we simply sum the probabilities of all
P
chains for which jyj = j 1 yj = z. From the de ning equations it is seen that Yj = 0
implies that Yj+1 = 0. This means that new infections may only occur whenever some
individuals are infectious, which implies that the length of a chain cannot be longer
than the total number infected, making the number of possible chains nite. The
probability function for the nal number infected is hence P(Z = zjX0 = n Y0 = m) = X y:jyj=z P(Y1 = y1 : : : Yk = yk Yk+1 = 0jX0 = n Y0 = m): Below we calculate the probability function explicitly for Z , the nal number
infected among those initially susceptible, when Y0 = m = 1 and X0 = n = 1 2, and
3. Since m = 1 in all cases we omit it in the conditioning notation. We start with
n = 1: P(Z = 0jX0 = 1) = P(Y1 = 0jX0 = 1) = q
P(Z = 1jX0 = 1) = P(Y1 = 1 Y2 = 0jX0 = 1) = p: For n = 2 we have P(Z = 0jX0 = 2) = P(Y1 = 0jX0 = 2) = q2
P(Z = 1jX0 = 2) = P(Y1 = 1 Y2 = 0jX0 = 2) = 2 pq q
1
P(Z = 2jX0 = 2) = P(Y1 = 2 Y2 = 0jX0 = 2) + P(Y1 = 1 Y2 = 1 Y3 = 0jX0 = 2) = p2 + 2 pq p:
1
For n = 3 we compute only the rst three probabilities, the nal probability may be
derived from the complement. P(Z = 0jX0 = 3) = P(Y1 = 0jX0 = 3) = q3
P(Z = 1jX0 = 3) = P(Y1 = 1 Y2 = 0jX0 = 3) = 3 pq2 q2
1
P(Z = 2jX0 = 3) = P(Y1 = 2 Y2 = 0jX0 = 3) + P(Y1 = 1 Y2 = 1 Y3 = 0jX0 = 3)
= 3 p2 q q2 + 3 pq2
2
1 2 pq q:
1 Needless to say, these probabilities become very complicated to compute even for
moderate sized groups (n larger than 10 say). The ReedFrost model is a special case
of the standard SIR epidemic model presented in Chapter 2, in which the length of the
infectious period is deterministic (implying that contacts between pairs of individuals
occur independently). As for the model of Chapter 2, we implicitly assume that the
population is homogeneous and that individuals mix homogeneously. In Section 2.4
we derive the nal size probabilities in a more coherent way for the more general
model. 1.3 Stochastic epidemics in large communities
The previous section illustrates the need for approximation methods when considering large communities. In fact, the recursive formulas for the nal size presented
in Section 2.4 are numerically unstable and cannot be applied to obtain numerical
solutions when the number of susceptibles n exceeds 50100.
The main insight from such large population approximations is the threshold limit
theorem (Chapter 4). Loosely speaking, this theorem states that, as n ! 1 one
of two possible scenarios can occur: either a) only few individuals will ever become
infected, or else b) a more or less deterministic positive proportion of the susceptibles,
with some Gaussian noise of smaller order, will have been infected by the end of the
epidemic. The latter scenario is referred to as a large or major outbreak. Much
work has been carried out to state versions of this theorem for more and more general
models, and in particular to derive the probabilities of each of the two scenarios as well
as nding the deterministic proportion infected in case of a large outbreak. Another
important task is to nd out for which parameter values the asymptotic probability
of a major outbreak is 0. The parameter R0 , called the basic reproduction number,
plays a crucial role in this context. The parameter R0 , a function of the model, is
the average number of new infections caused by a `typical' infective during the early
stages of the epidemic. The threshold limit theorem states that a major outbreak in
a large population is possible if and only if R0 > 1. From a statistical perspective
only major outbreaks are considered since minor outbreaks would rarely be observed
at all. Questions of interest are then to derive parameter estimates, including their
uncertainty, having observed the number of infected. The main motivation for such
estimation lies in the control of disease spread. If for example a vaccine is available,
a question of practical relevance is to estimate what proportion of the susceptible
population it is necessary to vaccinate, in order to avoid future outbreaks, a state
known as `herd immunity'. This type of question is considered in Chapter 12.
Sometimes not only the nal state of the epidemic is of interest, but rather the
entire epidemic process. It is then possible to approximate the epidemic process
using di usion theory (see Chapter 5). This is, for example, the case in the statistical
setting where data is collected continuously from an ongoing epidemic, a topic treated in Chapter 9. 1.4 History of epidemic modelling
This section will give only a short selected presentation of some early mathematical
models for the spread of infectious diseases. We have no intention of covering all
important events in the development of this subject. For a more detailed historical
overview we refer the reader to the books by Bailey (1975) and Anderson and May
(1991).
One of the earliest studies of the nonlinearity of an epidemic model was contained
in a paper by Hamer (1906). Hamer postulated that the probability of a new infection
in the next discrete timestep was proportional to the product of the number of
susceptibles and the number of infectives. Ross (1908) translated this `mass action
principle' to the continuous time setting. The rst complete mathematical model
for the spread of an infectious disease which received attention in the literature was
a deterministic model of Kermack and McKendrick (1927). This model, known as
the deterministic general epidemic, is now presented (note that the parametrization is
di erent from the models treated later in the text). Let x(t), y(t) and z(t) respectively
denote the number of susceptibles, infectives and removed (=recovered and immune,
or dead) individuals. The population is considered to be of xed size n (i.e. x(t) +
y(t)+ z(t) = n for all t). The model is then de ned by the following set of di erential
equations
x0 (t) = ; x(t)y(t)
y0(t) = x(t)y(t) ; y(t)
z0 (t) = y(t)
(1.1)
with initial state (x(0) y(0) z(0)) = (x0 y0 0). In Figure 1.1 we show how (x y z)
varies over time for = 1:9 and = 1 where the bottom region corresponds to x(t),
the middle region to y(t) and the top region to z(t) (which is empty at t = 0 since
z(0) = 0). Note that their sum remains constant over time since x(t)+ y(t)+ z(t) = n.
The factor x(t)y(t) in (1.1) is the crucial nonlinear term, indicating that infections
occur at high rate only when there are many susceptibles and infectives. From the
rst two equations it follows that dx=dz = ; x, where = = . So x(t) = x0 e; z(t)
and hence y(t) = n ; z(t) ; x(t) = n ; z(t) ; x0 e; z(t) . Kermack and McKendrick
(1927) showed that y is decreasing unless y0( x0 ; ) > 0 or equivalently x0 > 1= .
In this latter case there will be a growing epidemic. This observation is known as
the threshold result, i.e. that completely di erent behaviour will occur depending
on whether x0 exceeds 1= or not. Another important observation was that z(t) !
z1 < n as t ! 1, where z1 is the solution of z = n ; x0 e; z . This leads to the very
important property that not everyone becomes infected!
The rst stochastic model was proposed by McKendrick (1926). This model, a 100 80 60 40 20 2.5 5 7.5 10 12.5 15 Figure 1.1: Plot of (x(t) y(t) z(t)) de ned by (1.1) with = 1:9, = 1, x0 = 100 and
y0 = 1. For each t, x(t) corresponds to the lower segment, y(t) to the midsegment
and z(t) to the topsegment. (N.B. x(t) + y(t) + z(t) = 101 for all t.)
stochastic continuoustime version of the deterministic model of Kermack and McKendrick (1927), did not receive much attention. A model which attracted more
attention at the time, even though it was never published, was the chainbinomial
model of Reed and Frost (presented in Section 1.2) put forward in a series of lectures
in 1928.
It was not until the late 1940's, when Bartlett (1949) studied the stochastic version
of the KermackMcKendrick model, that stochastic continuoustime epidemic models
began to be analysed more extensively. Since then, the e ort put into modelling
infectious diseases has more or less exploded. In the rest of the lecture notes a
selection of these methods will be presented.
The list of references covering epidemic models and their analysis can of course
be made long. Here we only mention a few central texts. The rst pioneering monograph was written by Bailey in 1957 but has since then been reprinted in a second
edition (Bailey, 1975). This book covers both stochastic and deterministic models as
well as statistical inference with numerous applications to real data. Becker (1989) is
mainly concerned with statistical analysis of infectious diseases. Gabriel et al. (1990)
is a collection of papers, on stochastic modelling and some statistical inference, that
were written for a workshop on stochastic epidemic modelling in Luminy, France.
The book by Anderson and May (1991) mentioned previously is probably the book
which has received the most attention, together with Bailey (1975). Anderson and May (1991) model the spread of disease for several di erent situations and give many
practical applications. The book is also concerned with estimation however, both
modelling and inference are deterministic. A thematic semester at Isaac Newton Institute, Cambridge, resulted in three collections of papers (Mollison, 1995, Isham and
Medley, 1996, and Grenfell and Dobson, 1996), covering topics in stochastic modelling
and statistical analysis of epidemics (i.e. its propagation in a community), human
infectious diseases (i.e. the infection process inside the body), and animal diseases,
respectively. Very recently, two new monographs have been published. The rst one
is written by Daley and Gani (1999) who have a long experience in stochastic modelling of the spread of disease. This book focuses on stochastic modelling but contains
also statistical inference and deterministic modelling, as well as several historical remarks. Diekmann and Heesterbeek (200?), nally, are concerned with mathematical
epidemiology of infectious diseases and also apply their methods to real data. Exercises 1.1. Compute P(Z = 0jX0 = 10 Y0 = 1), P(Z = 1jX0 = 10 Y0 = 1), P(Z = 2jX0 =
10 Y0 = 1) and P(Z = 3jX0 = 10 Y0 = 1) for the ReedFrost model. (Hint: Compute the probability for each epidemic chain giving Z = k separately and then add these
probabilities.) 1.2. Show that y(t) ! 0 as t ! 1 for the model of Kermack and McKendrick given
by the di erential equations (1.1). 1.3. Consider again the KermackMcKendrick model and assume x0 > 1= . Using the formula y(t) = n ; z(t) ; x0 e; z(t) , show that the maximum number of infectives
occurs exactly at the time t when x(t) = 1= . 2 The standard SIR epidemic model
In this chapter we present a simple model for the spread of an infectious disease.
Several simplifying assumptions are made. In particular, the population is assumed to
be closed, homogeneous and homogeneously mixing. Also, the e ects of latent periods,
change in behaviour, time varying infectivity and temporary or partial immunity are
not taken into account. In later chapters we shall indicate ways of handling some of
these complicating features of a reallife epidemic. 2.1 De nition of the model
We assume that initially there are m infectious individuals (that have just become
infected) and n susceptible individuals. The infectious periods of di erent infectives
are independent and identically distributed according to some random variable I ,
having an arbitrary but speci ed distribution. During her infectious period an infective makes contacts with a given individual at the time points of a time homogeneous
Poisson process with intensity =n. If a contacted individual is still susceptible, then
she becomes infectious and is immediately able to infect other individuals. An individual is considered `removed' once her infectious period has terminated, and is
then immune to new infections, playing no further part in the epidemic spread. The
epidemic ceases as soon as there are no more infectious individuals present in the
population. All Poisson processes are assumed to be independent of each other they
are also independent of the infectious periods.
We call this model the standard SIR epidemic model, the letters S, I, R standing
for the terms `susceptible', `infectious' and `removed', respectively. Following Ball
(1995), we denote the process by En m( I ). Also denote the mean and the variance
of the infectious period I by and 2 , respectively. The rate of contacting a given
individual is set to =n in order to keep the rate at which a given infective makes
contact with other (initially susceptible) individuals constant (= ), independently of
the population size. The special case where the infectious period has an exponential
distribution will be discussed in some detail further on.
The basic reproduction number and the nal epidemic size, already encountered
in the Introduction, are two extremely important epidemiological quantities. The
nal size of the epidemic, Z , is simply de ned as the number of initially susceptible
individuals that ultimately become infected. Thus Z is a nite random variable taking
values between 0 and n. In Section 2.4 we shall derive a linear system of equations
for the distribution of the nal epidemic size.
The basic reproduction number, R0 , is a little more di cult to describe. For
this simple model R0 is conveniently de ned as the expected number of infections
generated by one infectious individual in a large susceptible population. For the model presented above R0 = since is the average length of the infectious period
and during the infectious period an infectious individual has contact with initially
susceptible individuals at rate . The branching approximation of Section 3.3 will
in a rigorous way give the basic reproduction number the interpretation of a critical
parameter indicating whether a large outbreak is possible or not. For epidemic models
with di erent types of heterogeneities, it is not always obvious how R0 should be
de ned. We shall return to this problem in Chapters 6 and 7. 2.2 The Sellke construction
The following alternative elegant construction of the standard SIR epidemic model
is based on Sellke (1983). We keep track of the total `infection pressure' generated
by the infectious individuals. Each susceptible individual is associated with a critical
level of `exposure to infection', and as soon as the infection pressure reaches this level,
the susceptible becomes infected. We call this level the threshold of the individual.
This purely mathematical construction does not re ect any properties of a reallife
epidemic but it will serve as an important tool in the derivation of several results in
later chapters.
Label the initial infectives ;(m ; 1) ;(m ; 2) : : : 0 and the initial susceptibles
1 2 : : : , n. Let I;(m;1) I;(m;2) : : : In be independent and identically distributed
random variables, each distributed according to I . Also, let Q1 Q2 : : : Qn be an
independent sequence of independent and identically distributed exponential random
variables, having mean 1. These are the individual thresholds. For i = ;(m ;
1) ;(m ; 2) : : : 0, the initial infective labelled i remains infectious for a time Ii and
is then removed. Denote by Y (t) the number of infectives at time t, and let A(t) = n Zt
0 Y (u) du (2.1) be the total infection pressure exerted on a given susceptible up to time t. Note
that in A(t) the infectives are weighted according to their infectious periods. For
i = 1 2 : : : n, the susceptible labelled i becomes infected when A(t) reaches Qi .
The j th susceptible who becomes infected (not necessarily the susceptible labelled j !)
remains infectious for a time Ij and is then removed. The epidemic ceases when there
are no more infectives present.
In Figure 2.1 we have plotted the total infection pressure A(t) against t for an
epidemic starting with one infectious individual (m = 1). On the yaxis we have
indicated the smallest individual thresholds (Q(i) denotes the ith order statistic) and
horizontally the corresponding infectious period translated in time to the instant when
the individual becomes infected. Note that the slope of A(t) is proportional to the
number of infectious periods covering the time point t (i.e. the number of infectives
Y (t)!) as it should be according to the de nition of A given in (2.1). 6 Q(5)
Q(4)
;
; Q(3)
Q(2)
Q(1) ;
;
;
;  t Figure 2.1: A typical realisation of the total infection pressure with m = 1 initially
infectious individual. Note that the infection pressure never reaches Q(4) so the epidemic stops and the nal size is Z = 3. Let us check that this construction gives a process equivalent to the standard SIR
epidemic. The infectious periods follow the correct distribution and, if Y (t) = y and
the individual labelled i is still susceptible at time t, then she will become infected
during (t t + t) with probability y t=n + o( t). Indeed, owing to the lackofmemory property of the individual threshold Qi , the probability of the complementary
event is given by P(Qi > A(t + t) j Qi > A(t)) = P(Qi > A(t + t) ; A(t))
=
=
= e; A(t+ t);A(t)]
exp ; n y t + o( t)
1 ; y t=n + o( t) where the third equality follows from the de nition of A(t), since no infections will
occur in a small enough time increment, making Y (u) constant and equal to y in
(t t + t). In the original formulation of the model, our susceptible individual is
contacted according to the superposition of y independent Poisson processes, each of
intensity =n, giving rise to the same infection probability. 2.3 The Markovian case
Consider the standard SIR model En m( I ) and denote by X (t) and Y (t) the number
of susceptibles and the number of infectives, respectively, at time t. The process
(X Y ) = f(X (t) Y (t)) t 0g will be a Markov process if and only if the infectious
period has the lackofmemory property. Assume therefore that I is exponentially
distributed with intensity . Then the process (X Y ) is governed by the following
transition table:
from to
at rate
(i j ) (i ; 1 j + 1) ij=n
(i j ; 1)
j
which follows immediately from the de nition of the model. This model, which originated with Bartlett (1949), is known as the general stochastic epidemic, a name
that now seems inappropriate, since the model has over the years been generalized
in an innumerable number of ways. The assumption of an exponentially distributed
infectious period is certainly not epidemiologically motivated, although with this assumption the mathematical analysis becomes much simpler. Notably, using Markov
process theory we can obtain deterministic and di usion approximations for the whole
trajectory, which are valid for large population sizes (see Chapter 5). This is usually
hard to achieve when the stochastic process is not Markovian. On the other hand,
the modern probabilistic methods used in this text to derive branching process approximations (Section 3.3) together with results for the nal epidemic size (Section 2.4 and Chapter 4) do not rely on the Markov property, but can be carried out for
all instances of the standard SIR epidemic de ned in Section 2.1. 2.4 Exact results
Consider again the standard SIR epidemic En m ( I ). We will derive a triangular
n
linear system of equations for P n = (P0n P1n : : : Pn ), where Pkn is the probability
that k of the initial susceptibles are ultimately infected.
R
Let Z be the nal size of the epidemic, and let A = A(1) = n 01 Y (u) du be the
total pressure of the epidemic. Recall the Sellke construction above. Both the nal
size and the total pressure can be expressed in terms of the infectious periods and
the individual thresholds. First, 9
8
i
<
X=
Ij
Z = min :i : Q(i+1) > n
j =;(m;1) where Q(1) Q(2) : : : Q(n) are the order statistics of Q1 Q2 : : : Qn, since the epidemic
stops as soon as the infection pressure generated by the previously infected individuals
is insu cient to infect any more susceptibles. Also, A= n Z
X j =;(m;1) Ij which is just another way of writing A(1).
It is thus clear that the nal size and the total pressure are intimately related. In
fact, we have the following Wald's identity for epidemics (Ball, 1986): Lemma 2.1 Consider the standard SIR epidemic En m( I ) and let A be as above. Then E e; A ( =n)Z +m = 1
0
where ( ) = E exp(; I )] is the Laplace transform of I .
Proof. To prove the identity, we note that ; ( =n) n+m = 20
13
n
X A5
E 4exp @; n
Ij
j =;(m;1)
"
n
X !!# = E exp ; =Ee ( =n) h ; A; A+ n i j =Z +1
n;Z Ij where the last identity follows since the variables Ij , j Z + 1, are independent of
both Z and A.
n
We are now in position to derive the system of equations for P n = (P0n : : : Pn ).
For each k 1, de ne K to be the set f1 2 : : : kg also, let 0 be the empty set.
Recall that the initial susceptibles are labelled 1 2 : : : n. Pkn is the probability that
k initial susceptibles are infected in the En m( I ) epidemic, and PK is the probability
;n P n . n
that precisely the set K is infected. By symmetry, Pkn = k K
Now x k and choose ` such that 0 k ` n, implying that K L N.
We use the notion of infection pressure to compare an epidemic within N with a
subepidemic within L. The event that an epidemic within N infects precisely the
set K is the same as the event that a subepidemic within L infects precisely K, and
that these k new infectives, together with the m initial infectives, fail to infect any
of the individuals in the set N n L. We know from the Sellke construction that the
probability of avoiding the infection is given by exp(;a), given that the subepidemic
has generated the infection pressure A` = a. It follows that
n
`
PK = PKE exp(;A` (n ; `)) j Z ` = k]
where Z ` is the nal size of the subepidemic. This equation is equivalent to
;` P n
k k = P `E exp(;A` (n ; `)) j Z ` = k]:
;n
(2.2)
k
k Now let us use Wald's identity (Lemma 2.1) applied to the subepidemic and with
= n ; ` to get
h
i
E e;A (n;`) ( (n ; `)=n)]Z +m = 1
or, conditioning on the nal size Z `,
` ` `
X Pk`E ;exp(;A`(n ; `)) j Z ` = k ( (n ; `)=n)]k+m k=0 = 1: (2.3) Equations (2.2) and (2.3) immediately give us ;` P n `
X k
;n ( (nk; `)=n)]k+m = 1:
k=0 k
;` ;n = ;n;k ;n , we arrive at the following result:
Finally, noting that
k k `;k ` Theorem 2.2 Consider the standard SIR epidemic En m( I ). Denote by Pkn the
probability that the nal size of the epidemic is equal to k, 0 k
`
X n;k n
` ; k Pk
k=0 ( (n ; `)=n)]k+m = n
` n. Then 0 ` n: (2.4) P[Z+m=j] 0.2 0.15 0.1 0.05 10 20 30 40 50 j Figure 2.2: The exact distribution of Z + m for m = 1, n = 50, = 1:5 and I
i.e. the infectious period is constant and equal to 1. 1, Note that, since the system of equations is triangular, the nalsizeprobabilities
can be solved recursively. The proof of the theorem depends on the infection pressure
generated by the various infectives, rather than the actual infectious periods. This
indicates that we may allow for latent periods and timedependent infection rates in
the modelling, and still get the same type of nal size results, as long as the required
infection pressure can be calculated. That this is indeed the case will become even
clearer in Section 7.1 where the concept of random graphs is used to describe the
epidemic ow through a homogeneous and uniformly mixing population. In Figure
n
2.2 the probabilities P0n P1n : : : Pn , are plotted for a speci c choice of community
and parameter values (the gure actually shows the plot of P (Z + m = k) for different values of k but since m = 1 this corresponds to Pkn;1). When the infectious
period is constant, I c say, an infectious individual infects susceptible individuals independently (with probability p = 1 ; e; c) the distribution of the nal size is
then equivalent to the ReedFrost model de ned for discrete time dynamics in Section
1.2. It is seen in the gure that the distribution is bimodal: either a few individuals
are infected or else a fairly large number are infected. This qualitative behavior becomes more and more evident as n, the initial number of susceptibles, increases. A
mathematical proof ofthis is given by the threshold limit theorem of Chapter 4.
Finally we mention brie y the very elegant theory developed by Lefevre and Picard
in a series of papers (see e.g. Lefevre and Picard, 1990). They work with quite general
classes of stochastic epidemic models, both singletype and multitype (cf. Chapter 6),
and derive e.g. equations for the nal size distribution and the total force of infection A(1), using a nonstandard family of polynomials initially introduced by Gontcharo
(1937). However, we have decided not to include any presentation of their work,
since the approach is rather algebraic in nature, and hardly increases the intuitive
understanding of the models treated here. Exercises
2.1. Compute P0n, P1n and P2n numerically using the recursive formula given by (2.4)
assuming n = 10, m = 1, = 2 and that the infectious period I is:
a) exponentially distributed (the Markovian case) with mean 1 time unit.
b) ;(2 2)distributed (i.e. with mean 1).
c) constant and equal to 1. 2.2. Assume m = 1, = 1 and that the infectious period is constant with mean 1 time unit. Use your favorite computer and mathematical software to see when the
recursive formula of Section 2.4 breaks down numerically by computing and plotting
n
P1n : : : Pn for n = 10 20 30 : : : (for most computers negative probabilities start
appearing around n 70 ; 90). 2.3. Modify the standard SIR epidemic so that the infectious period is = 1. (This model is denoted the SI model since individuals never get removed. It also has applications in sociology for the spread of rumours/knowledge. Infectious then corresponds
to knowing, and spreading, the rumour.) For this model X (t) + Y (t) = n + m for
all t. Assume m = 1. Describe the random process Y = fY (t) t 0g, of infectious
individuals. Calculate the expected waiting time until everyone in the population
becomes infected. (Hint: Consider the consecutive waiting times between infections.)
What happens with this expression when n gets large? 3 Coupling methods
Let us assume that we are interested in comparing two or more random elements
with each other. It is sometimes possible to construct versions of these random
elements on the same probability space, in such a way that the comparison suddenly
becomes easy (indeed, often trivial) to carry out. This procedure is called coupling,
the term referring to the fact that the random elements so constructed are often
highly dependent. The coupling method has found many important applications in
various elds of probability theory, including Markov processes, renewal processes and
Poisson approximation. The book by Lindvall (1992) provides a nice introduction to
the subject.
Here we introduce some classical coupling ideas by providing simple examples.
Then, after presenting the formal de nition, we describe some applications of the
coupling method to the standard SIR epidemic model En m ( I ). First, it is shown
that the number of infectious individuals in a large population initially behaves like
a branching process. By coupling En m ( I ) with a branching process, we justify the
approximation of the epidemic by the simpler and thoroughly analysed branching
process. This result indicates at the same time the signi cance of the basic reproduction number R0 . Second, by coupling two epidemics with di erent contact parameter
, we prove the intuitively obvious fact that the accumulated number of infected individuals at a given time grows (in a sense yet to be de ned) with , the infectiousness
of the disease. 3.1 First examples
Gambler's ruin problem Consider rst the standard gambler's ruin problem. An individual with an initial
capital of m units of money goes to a casino. He plays a series of independent games,
in each of which he wins one unit with probability and loses one unit with probability
1 ; . He continues until either his capital reaches n (n > m) or he goes bankrupt.
We wish to use coupling to prove that P ( ), the probability of reaching the capital n
given , is increasing in (as would be expected).
To this end, let U1 U2 : : : be independent and identically distributed random
variables, each uniformly distributed on the interval (0 1). Then, for a given , de ne
+1 if Ui
Y (i) = ;1 otherwise
i = 1 2 : : : , and
X
X ( ) = m + Y (i)
i=1 = 1 2 : : : . Then, since Y (i) is +1 with probability , the process X = fX ( ) =
1 2 : : : g, clearly describes the capital of the gambler. Note carefully that the same
set of uniform variables is used to construct an entire family of trajectories, indexed
by . Now, if < 0 then by construction Y (i) Y (i) for all i, so X ( ) X ( )
for all . This means that if a trajectory X ( ) reaches n before 0 then the same is
true for X ( ). Thus we have that P ( ) P ( 0), since this property does not depend
on the particular sample space chosen.
0 0 0 Stochastic ordering Let X and X 0 be realvalued random variables. We say that X is stochastically smaller
D
than X 0 and write X X 0 if
P(X a) P(X 0 a)
for all a, i.e. the probability that X exceeds an arbitrary level is smaller than the
probability that X 0 exceeds the same level.
When working with stochastically ordered random variables, the following simple
D
result can often be very helpful: if X and X 0 are random variables such that X X 0,
^^
^^
then there exists a coupling (X X 0) of X and X 0 such that X X 0. The proof goes
as follows. For a given distribution function G, de ne the generalized inverse G of
G by
G (u) = inf fx : G(x) ug
0 < u < 1:
Then G (U ) has distribution function G if U is uniformly distributed on (0 1). If
F and F 0 are the distribution functions of X and X 0, respectively, we have F F 0
^
by assumption. Thus F
F 0 , and we see that the variables X = F (U ) and
0 = F 0 (U ) provide us with the desired coupling.
^
X
Domination of birth and death processes In our next example, di erent birth and death processes are compared. Suppose that
X = fX (t) t 0g, and X 0 = fX 0(t) t 0g, are two birth and death processes on
the set of nonnegative integers. The process X has birth rates i and death rates i,
P(X (t + dt) ; X (t) = +1 j X (t) = i) = idt + o(dt)
P(X (t + dt) ; X (t) = ;1 j X (t) = i) = idt + o(dt)
X (0) = m, and likewise X 0 has birth rates 0i, death rates 0i and initial value m0. We
use coupling to show that if i 0i for all i 0 and i 0i for all i 1 then X (t)
is stochastically smaller than X 0(t) for all t (provided also m m0 ).
^^
De ne a bivariate process (X X 0) with initial value (m m0) and with the following
intensity table: from to
at rate
(i j ) (i + 1 j )
i
0
(i j + 1)
j
(i ; 1 j )
i
0
(i j ; 1)
j
(i i) (i + 1 i + 1) i
0; i
(i i + 1)
i
(i ; 1 i ; 1) 0i
(i ; 1 i)
i ; 0i :
This process is wellde ned since all the numbers describing the transition rates are
^
nonnegative by assumption. It is easily checked that the rst coordinate X follows
the same law as X . Likewise, the second coordinate is distributed as X 0. Moreover,
if the bivariate process starts above the diagonal i = j then it will stay above the
^
^
^
^
diagonal at all times. Hence we have found versions X and X 0 with X (t) X 0(t) for
D
all t in particular X (t) X 0 (t) for all t, since this latter property has nothing to do
with the sample space chosen.
Convergence of Markov chains As a nal illustration of the coupling method we show the classical result that an
irreducible aperiodic Markov chain with a nite mstate space approaches stationarity
as time grows, regardless of the initial distribution. Let X = (X (0) X (1) : : : ) be
such a Markov chain and denote its transition probability matrix by P and its initial
distribution by . Also, let = ( 1 : : : m ) be the unique strictly positive stationary
distribution, satisfying = P . We wish to prove that P(X ( ) = j ) ! j as ! 1,
for all xed states j , j = 1 : : : m.
We now give the classical coupling proof. Introduce a Markov chain X 0 = (X 0(0),
X 0(1) : : : ), independent of X , governed by the transition matrix P , and with ini^
tial distribution . This makes X 0 stationary. Then x a state j and de ne X =
^ (0) X (1) : : : ) by
^
(X
X
^
X ( ) = X 0(( )) if < T
if
T where T = minf 0 : X ( ) = X 0( )g:
^
Due to the (strong) Markov property, the processes X and X will be equally distributed. Thus
^
jP(X ( ) = j ) ; P(X 0( ) = j)j = jP(X ( ) = j ) ; P(X 0( ) = j )j
^
= jP(X ( ) = j T > ) ; P(X 0( ) = j T > )j P(T > ): It is easy to see that P(T > ) ! 0 as ! 1, i.e. that T is nite a.s. Indeed,
the bivariate process (X X 0 ) is irreducible and aperiodic, and T is the rst time this
process visits the diagonal. 3.2 De nition of coupling
Skills in coupling are acquired only by working through many examples. The actual
de nition is not very illuminating, but we give it here anyway for the sake of completeness. Given probability spaces ( F P) and ( 0 F 0 P0), denote the state space
by E E could be the set of nonnegative integers, the set of real numbers, the set of
sequences of real numbers, the space of rightcontinuous realvalued functions de ned
on 0 1), and so on.
By a coupling of the random elements X : ! E and X 0 : 0 ! E we mean a
^^
^^
probability space ( ^ F P) and a random element (X X 0) : ^ ! E 2 such that
D^
X=X and D^
X 0 = X 0: ^
In our rst example, the desired result was obtained by putting X = X and
^
X 0 = X . The second and third example also t nicely into our de nition, and so
^
does the last one if we put X 0 = X 0.
0 3.3 Applications to epidemics
Initial approximation Intuitively speaking, during the initial stages of an epidemic in a large population
we would expect that, with high probability, contacted individuals are susceptible, so
that the number of infectious individuals follows some kind of branching behaviour.
This branching approximation idea has a long history, see e.g. Bartlett (1955) and
Kendall (1956). Here, following Ball and Donnelly (1995), we use a coupling argument
to investigate how the approximation improves as the population size tends to in nity.
Let us rst de ne the branching process, and derive heuristically some initial
results concerning this process. At time t = 0 there exists a group of m ancestors
(that have just been born). The life spans of di erent individuals are independent
and identically distributed according to a random variable I . During her life span,
a given ancestor gives birth at the time points of a Poisson process with intensity .
Her children have independent and identically distributed life spans and themselves
give birth according to Poisson processes with intensity , and so on. We assume
that all Poisson processes are independent of each other they are also independent
of the life spans. Denote the resulting process by Em( I ). Jagers (1975) gives a thorough treatment of a class of continuous time branching processes containing the
above process as a special case.
Let fY (t) t 0g, be the number of individuals alive at time t, t 0, and denote
by D the number of o spring of a given individual. We wish to investigate the possible
extinction/explosion of Y (t) as t grows. First, since there are on the average mE (D)
individuals in the th generation, it is intuitively clear that the process will become
extinct if and only E (D) 1. Turning to the more interesting case where E (D) > 1,
we let q be the extinction probability of the branching process and rst assume that
m = 1. Then, by letting D0 be the number of children of the ancestor, we have q= 1
X
k=0 P(extinction j D0 = k)P(D0 = k): But ultimate extinction will occur if and only if all of P (independent) branches
the
generated by these children become extinct, hence q = 1 qk P(D0 = k), showing
k=0
that q is the solution of the equation = E ( D ). With a little more work one can show
that q is the smallest solution to this equation. Finally, when there are m ancestors
the extinction probability is given by qm with q as above. For detailed proofs of all
these results, see Jagers (1975).
We conclude our description of the process Em ( I ) by presenting the fundamental
quantities encountered above in a somewhat more explicit form. Note that, given
I = t, the number of children D is Poisson distributed with mean t. It follows that
the probability generating function of D is ; E( D) = E E( D j I )
= E (expf; I (1 ; )g)
= ( (1 ; )) and its expectation is E (D ) =
where the probability generating function and the expectation of I are given by
and , respectively. Also note that the variance of D is strictly positive, even if the
lifetime I is constant.
Returning to the main topic of this section, we examine the branching approximation of our epidemic process. Consider a sequence of standard SIR epidemic processes
En m( I ), n 1. Let fYn(t) t 0g, be the process describing the number of infectives in the nth epidemic. We wish to compare this process with fY (t) t 0g, the
process describing the number of individuals alive in the continuous time branching
process Em ( I ).
First we construct the branching process Em ( I ). Suppose that the probability
space ( F P) holds the individual life histories H;(m;1) H;(m;2) : : : , where each 6 H3
H2
H1
H0 u
u
eu
e eu
 t Figure 3.1: Construction of Em ( I ) for m = 1 using life histories. Hi is a list containing the life span of the ith individual together with the time points
at which this individual gives birth. Let H;(m;1) H;(m;2) : : : H0 be the life histories
of the m ancestors and let Hi , i 1, be the life history of the ith individual born. Next, we use an independent sequence Ui , i 1, of independent and identically
distributed random variables de ned on ( F P), each uniformly distributed on
(0 1), to construct all of the epidemic processes En m ( I ), n 1. Fix n and label
the initial susceptibles 1 2 : : : n. The initial ancestors in the branching process
correspond to the initial infectives in the epidemic. A contact in the epidemic process
occurs whenever a birth occurs in the branching process. The individual contacted
at the ith contact has label Ci = nUi ] + 1. If this individual is still susceptible,
then she becomes infected in the epidemic, otherwise she and all of her descendants
in the branching process are ignored in the epidemic process. In the latter case the
individual is called a ghost, following Mollison (1977). Finally, the death of an nonghost individual in the branching process corresponds to removal in the epidemic.
This construction leads to a process equivalent to En m( I ).
In Figure 3.1 the construction of Em ( I ) is illustrated with m = 1. The life
histories of the initially infectious individuals are inserted horizontally one at each
ylevel, starting at t = 0 (in the example only H0 since m = 1). As t grows, a new life
history is inserted whenever there is a birth among the life histories present. In the
construction of the epidemic En m( I ) the same procedure is used, except that the
life history is not inserted if the label Ci has appeared previously, i.e. the individual
is a ghost.
Obviously, the processes Yn and Y agree until the time Tn of the rst ghost. The number of births in the branching process during a xed time interval 0 t0] is nite
a.s. It is easily checked that any nite number of labels Ci will be distinct with a
probability tending to 1 as n ! 1, showing that P(Tn > t0 ) ! 1 as n ! 1: Ball and Donnelly (1995) have shown that the branching process breaks down roughly
at time C log(n). Note that the problem of deciding the number of contacts made
before a previously infected individual is picked again, i.e. the number of trials before
one of the labels Ci is repeated, is a variant of the classical birthday problem.
We collect our ndings in the following theorem: Theorem 3.1 Consider a sequence of epidemic processes En m( I ), n 1. Also,
denote by Yn (t) the number of infectives at time t, t 0. Then, for each xed t0 ,
Yn(t0) ! Y (t0 ) almost surely, where fY (t) t 0g, is the process describing the
number of individuals alive in the branching process Em ( I ).
If
1 then Y becomes extinct with probability 1. On the other hand, if > 1
then Y becomes extinct with probability qm , where q is the smallest root of the equation
( (1 ; )) = , or explodes with probability 1 ; qm .
This result shows that the basic reproduction number R0 = determines, for a
large population, whether or not a large outbreak of the epidemic may occur.
Monotonicity We nally show how coupling can be used to compare epidemics with the same
distribution for infectious periods but with di erent infection rates. We consider
0 , and
two standard SIR epidemic models En m ( I ) and En m( 0 I ) where
prove that the accumulated number of infected individuals at time t for the epidemic
with infection rate is stochastically smaller than the corresponding quantity for the
epidemic with rate 0.
Let us return to the Sellke construction. Let I;(m;1) I;(m;2) : : : In be the infectious periods and let Q1 Q2 : : : Qn be the individual thresholds of the epidemic
En m( I ). Thus the variables Qi , 1 i n, are exponentially distributed with
intensity 1. To construct En m( 0 I ), use the same realizations of infectious periods and individual thresholds. Only the infection pressure processes, A(t) and A0 (t),
respectively, are di erent for the two models.
For 1 j n, de ne Tj to be the time of the j th infection in the epidemic with
infection rate Tj = 1 if the nal size is strictly less than j . Similarly de ne Tj0,
1 j n, for the epidemic process with infection rate 0. The rst infection in the
epidemic with rate 0 will occur earlier than the rst infection in the other epidemic, i.e. T10 T1, since A(t) A0 (t) up to the time point T10 . This will give rise to
an even larger di erence in magnitude between the two infection pressure processes.
Consequently, it follows that T20 T2 , and so on. The desired result follows, since jfj : Tj tgj jfj : Tj0 tgj for all t. Exercises
^^
3.1. Check that the process (X X 0), de ned in the section on domination of birth and death processes, is indeed a coupling of the birth and death processes X and
X 0, i.e. that the marginal distributions coincide with the distributions of X and X 0,
respectively. 3.2. In Section 3.3 we saw that the branching approximation broke down on the part of the sample space where the size of the total progeny in the branching process
was in nite. Show that in this case the rst ghost in the construction appears when
p
approximately n individuals have become infected. It is wellknown in branching
process theory that, given explosion, the total number born before t grows like e t ,
where (the socalled Malthusian parameter) satis es a certain equation. What can
hence be said about the time of the appearance of the rst ghost? 3.3. Consider the Markovian version (Section 2.3) of the standard SIR epidemic (m xed, n large). Without referring to the branching approximation of Section 3.3,
approximate the process of infectives Yn(t) during the initial stage of the epidemic
with a suitable simple birth and death process. What is the probability of extinction/explosion of this approximating process? (Hint: Xn(t) n during the initial
stage of the epidemic.) 4 The threshold limit theorem
We will now explore in greater detail the large population limit of the nal size
distribution for the standard SIR epidemic model En m( I ). We have seen (Section
3.3) that, if the population of susceptibles is large and we introduce a small number
of initial infectives, the number of infectious individuals behaves like a branching
process in the beginning. If the basic reproduction number R0 = is less than or
equal to 1, a small outbreak will occur. On the other hand, if R0 exceeds 1, then
there is a positive probability that the approximating branching process explodes this
implies, of course, that the branching process approximation will break down after
some time. Then it is reasonable to expect that the nal epidemic size will satisfy a
law of large numbers. This indicates that the asymptotic distribution of the nal size
actually consists of two parts, one close to zero and the other concentrated around
some deterministic value. In this chapter we sketch the derivation of these results,
using the Sellke construction and the beautiful imbedding representation of ScaliaTomba (1985, 1990). A uctuation result for the nal size, given a large outbreak, will
also be given. In the nal section we combine earlier results and indicate the proof of
a theorem due to Barbour (1975) on the duration of the (Markovian) standard SIR
epidemic. 4.1 The imbedded process
To explain the imbedding idea, we need to introduce two auxiliary processes the
infection pressure process I (t) = n t];m
X
j =;(m;1) and the threshold process Q(t) = n
X
j =1 Ij 0 t n+m 1 fQ j tg t 0: Recall that the variables Ij are the infectious periods, and the variables Qj are the
individual thresholds. The index t should not be interpreted as ordinary time.
We know from Section 2.4 that the nal size Z can be written 8
<
Z = min :i 0 : Q(i+1) > n i
X
j =;(m;1) 9
= Ij : Let us express this quantity in terms of the infectivity and the susceptibility processes.
By de nition, Z = i if and only if
Q(1)
I (0 + m)
Q(2)
I (1 + m)
...
Q(i)
I (i ; 1 + m)
Q(i+1) > I (i + m):
Put another way, the pressure I (0 + m) is enough to infect at least one individual,
the pressure I (1 + m) is then enough to infect at least two individuals, and so on.
Hence
Q(I (0 + m)) > 0
Q(I (1 + m)) > 1
...
Q(I (i ; 1 + m)) > i ; 1
Q(I (i + m)) i:
It follows immediately that
Z = minft 0 : Q(I (t + m)) = tg:
For future use, denote the composition Q(I (t)) by C (t). Figure 4.1 shows simulated
realizations of C (t) for two di erent population sizes. The nal size Z is the \time"
when C (t) intersects t. 4.2 Preliminary convergence results
From now on we consider a sequence of epidemic processes En m( I ), n 1 the
quantities de ned above will now have a subscript n. It is straightforward to analyse
the (independent!) processes In and Qn. De ne In (t) = In(nt) and Qn(t) = Qn (t)=n.
Then
In(t) ! t and Qn(t) ! 1 ; e;t
in probability, uniformly on compact sets. Here we recall that = E (I ). Since
composition is a continuous operation, it follows that
Cn(t) = Qn(In(t)) ! 1 ; e; t
in probability, uniformly on compact sets. We also have the following uctuation
results. The sequence of processes
pn ;In(t) ; t t 0 100 1000 80 800 60 600 40 400 20 200 20 40 60 80 100 200 400 600 800 1000 Figure 4.1: Simulation of the imbedded process C (t) for n = 100 (left) and n = 1000
(right). n 1, converges in distribution to a Wiener process with variance
sequence
pn ;Qn(t) ; 1 ; e;t t 0 2 2 t. Also, the n 1, converges weakly to a Gaussian process with mean 0 and variance e;t (1 ; e;t ).
(Readers not familiar with the theory of weak convergence of sequences of random
processes may consult e.g. Billingsley, 1968. Convergence for a xed arbitrary value
of t follows from the usual law of large numbers and the central limit theorem, hence
the stated results are certainly plausible.) Finally, in order to study the uctuations
C~n(t) of the composite process Cn(t) = Qn(In(t)) we write pn ;Qn(In(t)) ; 1 ; e; t =
+ pn Qn(In(t)) ; h1 ; e;I (t)i
pn h1 ; e;I (t)i ; 1 ; e; t
n n : ~
It is seen that this process converges weakly to a zero mean Gaussian process C (t)
with variance
;
e; t 1 ; e; t + 2 2 te;2 t :
All of the results above are derived rigorously by ScaliaTomba (1985, 1990). 4.3 The case mn=n ! > 0 as n ! 1
Write the nal size proportion as
Zn = 1 min t 0 : C t + mn = t :
n
nn
nn
n
0
Working with Zn = Zn + mn, which includes the initial infectives in the nal size, will
lead to cleaner formulas. We know that Cn(t) converges in probability to 1 ; e; t ,
and also that mn =n ! > 0 by assumption. It is thus easily seen that the sequence
0
0
Zn = Zn=n converges to , the unique solution to the equation
1 ; e; ; :
To give an intuitive explanation of this equation, rewrite it as
1 + ; = e; :
(4.1)
A ;given individual avoids becoming infected by a given infective with probability
E e; I=n . Hence the probability of escaping infection is given by ;
E e; I=n = n 1; n
e;
i.e. the right hand side of (4.1). But the probability of escaping infection is of course
equal to the proportion of initial susceptibles who remain uninfected, i.e. the left hand
side of (4.1).
In Figure 4.2 the two functions f (t) = 1 + ; t and g(t) = e; t are plotted for
the case = 0:1, = 1:8 and = 1. The intersection of the two curves de nes the
unique solution
0:903.
0
0
To obtain a central limit theorem for Zn, we proceed as follows. Since Zn =
0 ) + mn =n and = 1 ; e; + , we may write
Cn(Zn pn ;Z 0 ; Zn
0 h pn
pn
pn i Cn ;Zn0 ; 1 ; e; Z
n
h ; Zi
+
1;e
; 1 ; e; + o(1)
h
i
Cn ;Zn0 ; 1 ; e; Z
=
p;0
+ e; n Zn ; + o(1):
p0
Rearranging and taking limits yields that the sequence n(Zn ; ) converges to a
= 0 n 0 n 0 n normally distributed random variable with mean 0 and variance ;e; ;1 ; e; + 2 2 e;2 ;1 ; e; 2 : 1 0.8 0.6 0.4 0.2 0.2 0.4 0.6 0.8 1 Figure 4.2: Graphical illustration of , the solution to (4.1), for = 0:1, = 1:8 and
= 1.
The expression above is wellde ned if the condition
e; < 1
is ful lled. To see that this is indeed the case, consider the functions exp(; t)
and 1 + ; t in Figure 4.2. The derivative of the exponential function has to be
strictly greater than ;1 at the crossing point , leading to the desired conclusion.
The condition has actually a very natural interpretation. At the beginning, the basic
reproduction number is given by R0 = , i.e. an infectious individual generates on
the average new cases in a large susceptible population. However, after a large
outbreak a fraction 1 ; ( ; ) has escaped infection so that a second introduction
of the disease in the population would correspond to the e ective basic reproduction
number (1 ; ( ; )) = exp(; ). This number is less than 1, otherwise the
epidemic would not have ceased in the rst place.
Let us state our results as a theorem. Theorem 4.1 Consider a sequence of epidemic processes En m ( I ). Assume that
mn=n ! > 0 as n ! 1 and de ne as the solution to (4.1). Also denote the
p0
0
nal epidemic size by Zn and write Zn = Zn + mn . Then the sequence n(Zn=n ; )
converges to a normally distributed random variable with mean 0 and variance
(1 ; ) + 2 2 2
(1 ; )2
n where = 1 + ; = e; . 4.4 The case mn = m for all n
Recall the branching process approximation result of Section 3.3, implying that the
nal size Zn of the epidemic En m( I ) converges almost surely to the total progeny
of a suitably chosen branching process Em( I ) as n ! 1. Thus we have that
0
lim lim P(Zn t) = qm
t!1 n!1 of
where qm is the extinction probability p the approximating process. Choose a sequence tn such that tn =n ! 0 and tn= n ! 1 as n ! 1. Also, de ne as the
nontrivial solution of the equation
1 ; e; = :
(4.2)
We will show that
0
lim P(Zn tn) = qm
(4.3)
n!1
p
0
(4.4)
lim nlim P(tn < Zn < n ; c n) = 0
c!1 !1
pn) = 0:
0
(4.5)
lim lim P(Zn > n + c
c!1 n!1 This means that if n is large and if the branching approximation breaks down, then
p
p
the nal size falls in the range n ; c n n + c n] with high probability (for some
large xed c). ScaliaTomba (1985, 1990) has proved that the central limit theorem
above (with = 0) applies in the case of a large outbreak.
We start by proving (4.3), following Ball and Clancy (1993) rather than ScaliaTomba (1985, 1990). De ne Pt Em ( I )] to be the probability that the branching
0
process Em ( I ) has total progeny t (including the m ancestors). If Zn tn then
each infective contacts susceptible individuals at a rate bounded below by n =
(n + m ; tn )=n. It follows that P(Z 0 n tn) t
X
n t=0 1
X
t=0 Pt E m ( n I )] Pt E m ( n I )] m
and the right hand side is the extinction probability qn of the modi ed branching
process Em( n I ). Now n ! as n ! 1, so that qn ! q. Thus, for xed > 0
there remains to choose t and n large enough that
0
0
qm ; P(Zn t) P(Zn tn) qm + : Equation (4.3) follows. Now assume that the basic reproduction number is above 1,
i.e. R0 = > 1, otherwise there is nothing left to prove. We proceed to show (4.4).
We have ; cpn) P(Cn(t + m) = t for some t 2 tn n ; cpn])
p
t
1~
1 ; e; (t+m)=n ; n = ; p Cn t + m for some t 2 tn n ; c n]
n
n 0
P(tn < Zn < n =P : Are there any values of t in the prescribed interval for which the equality above could
be ful lled? The left hand side is a concave function of t, so it is enough to check the
p
endpoints. First put t = n ; c n. After an easy calculation, we have
t
c;
1 ; e; (t+m)=n ; n = pn e; + 1 + O(1=n)
p
~
and the process jCn(t)j= n will not be able to reach this level if c is large enough.
Second, set t = tn . It follows that
t
n
1 ; e; (t+m)=n ; n = ( ; 1) tn + O(1=n):
p
p
Again, this quantity is large on the 1= n scale, since tn= n ! 1 as n ! 1. (Note
that > 1 by assumption.) This proves (4.4), and (4.5) is veri ed in the same
manner.
We have thus indicated the proof of the following very important result. Theorem 4.2 Consider a sequence of epidemic processes En m ( I ). Assume that
n mn = m for all n, and de ne as the nontrivial solution to (4.2). Also denote the
0
nal epidemic size by Zn and write Zn = Zn + m.
If
1 then Zn ! Z almost surely, where P(Z < 1) = 1 and Z is the total
progeny in a continuous time branching process Em ( I ), initiated by m ancestors, in which individuals give birth at the rate during a lifetime distributed according to
I.
If > 1 then Zn still converges to Z , but now P(Z < 1) = qm , where qm is the
extinction probability of the branching process. With probability 1 ; qm , the sequence
pn(Zn0 =n ; ) converges to a normally distributed random variable with mean 0 and
variance
(1 ; ) + 2 2 2
(1 ; )2
where = 1 ; . From the theorem it follows in particular that, in the case where R0 > 1, the nal
size proportion Zn=n converges in distribution to a random variable with mass qm at the point 0 and mass 1 ; qm at the point . In Figures 4.3 and 4.4 histograms
of the nal size in 10000 simulations are reported. Both gures treat the Markovian
version of the standard epidemic model (cf. Section 2.3) and an initial population of
n = 1000 susceptibles and m = 1 infectious individual. Figure 4.3 is the frequency
distribution when R0 = 0:8 < 1 whereas Figure 4.4 corresponds to the case where
R0 = 1:5 > 1. It is clear from the picture that major outbreaks occur only in the
latter case. The solution of (4.2) is
0:583 when R0 = = 1:5 and the initial
proportion infectious is negligible. The variance expression of Theorem 4.2 is 3.139.
The theorem then states that the distribution of the major outbreak sizes should be
p
approximately Gaussian with mean n = 583 and standard deviation 3:139n 56,
agreeing quite well with the histogram. 4.5 Duration of the Markovian SIR epidemic
Let us nally discuss the duration Tn of the Markovian SIR epidemic, i.e. the time
between the rst infection and the last removal in the epidemic. Barbour (1975)
has derived limit theorems, as the population size becomes large, for the distribution
of the duration. The epidemic is allowed to start either with a positive fraction of
infectives or with a single case. Below we give a heuristic motivation of the fact
that Tn is either O(1) or grows like log(n) as n ! 1. We only discuss the (more
complicated) case with one initially infectious individual.
Consider a sequence of standard SIR epidemics En m ( I ), where I is exponentially distributed with rate . Let Xn(t) and Yn(t) denote the number of susceptibles
and infectives, respectively, at time t. De ne the duration as
Tn = inf ft 0 : Yn(t) = 0g:
We have seen (Exercise 3.3) that, if n is large, the number of infectives Yn is well
approximated at the beginning by a linear birth and death process Y = fY (t) t 0g
with birth rate y and death rate y. If =
1 then this approximating process
will become extinct with probability 1, corresponding to a duration Tn that is O(1).
On the other hand, if = > 1 then Y will become extinct with probability ( = );1
and will explode with probability 1 ; ( = );1. In this latter case, the approximation
actually breaks down after approximately C1 log(n) time units, as we have seen in Exercise 3.2, and we say that the rst phase of the epidemic is over. In the second phase
of the epidemic the bivariate process (Xn Yn) moves in a more or less deterministic
fashion, and one can show that the duration of this deterministic phase is O(1) (cf.
Chapter 5). The third phase then begins when Xn=n has settled to 1 ; but there are
still infectives present in the population. Here Yn can be approximated by another
~
~
birth and death process Y = fY (t) t 0g, this time with birth rate (1 ; )y and
death rate y. We have seen in Section 4.3 that (1 ; )= = (1 ; ) < 1, so
~
the approximating process Y is always subcritical, and will therefore become extinct
after an additional time C2 log(n). Cf. the construction of Whittle (1955).
n 8000 6000 4000 2000 0 200 400 600 800 1000 Figure 4.3: Histogram of nal sizes for 10000 simulations of Em n( I ) with m = 1,
n = 1000 and R0 = = 0:8, i.e. below threshold. Each histogram bar has width
10, so for example the second bar denotes the frequency of outbreak sizes between 10
and 19.
8000 6000 4000 2000 0 200 400 600 800 1000 Figure 4.4: Histogram of nal sizes for 10000 simulations of Em n( I ) with m = 1,
n = 1000 and R0 = = 1:5, i.e. above threshold. These arguments together make it plausible that either Tn is short, or the ratio
Tn= log(n) converges in distribution as n ! 1. Actually a much stronger statement
is true: Either Tn is short or the di erence Tn ; C log(n) converges in distribution as
n ! 1. We have the following theorem: Theorem 4.3 Consider a sequence of epidemic processes En m ( I ), where I is
n exponential with rate . Assume that mn = 1 for all n, and de ne as the nontrivial
solution to (4:2).
If = 1 then the duration Tn of the epidemic converges weakly to the time to
extinction of a birth and death process fY (t) t 0g with birth rate y and death rate
y.
If = > 1 then the above is true only on a part of the sample space of probability
mass ( = );1 . On the rest of the sample space, Tn ; ; 1
+1
(1 ; )
; log(n) ; c ! W in distribution as n ! 1. Here c = c( ) is a constant and W has the distribution
of W1 =( ; (1 ; )) + W2 =( ; ), where W1 and W2 are independent, both with the
extreme value distribution function F (w) = exp(;e;w ). The proof is not di cult but rather technical, and is therefore omitted (see Barbour, 1975). Exercises
4.1. This exercise will give you a feeling of what proportion gets infected, assuming a major outbreak, for di erent parameter values. Using equation (4.1), compute the
(asymptotic) proportion that gets infected numerically if:
a) R0 = 2 (i.e. = 2) and = 0:1.
b) R0 = 0:8 and = 0:1:
c) R0 = 2 and = 0 (the proportion infected is now the largest solution to (4.1) or
equivalently (4.2)).
d) R0 = 0:8 and = 0. (What does this say about the possibility of a major
outbreak?) 4.2. We know that the basic reproduction number R0 = for the standard SIR
epidemic remains constant as n grows. What happens with the nal size if instead
R0 grows with the population size, R0 = R0 (n)? Study the following cases: a) R0 (n) = Cn.
b) R0(n) = log(n).
(Hint: Since most individuals will get infected when R0 is large, consider instead the
number of individuals who escape the epidemic. Compute the probability that a given
individual i escapes infection by looking at her threshold Qi .) 4.3. Consider the standard SIR epidemic. Assume that m = 1, n is large, and the
basic reproduction number is above 1. Using the branching approximation, show
that the probability of a large outbreak is always less than or equal to the nal size
proportion in case of a large outbreak. When does equality hold? (Hint: Derive
expressions for the two quantities and apply Jensen's inequality to nd a relation
between them.) 5 Density dependent jump Markov processes
In the present section we shall approximate certain jump Markov processes as a
parameter n, interpreted as the population size, becomes large. The results will be
presented in a form general enough for our purposes. More general results, as well as
other extensions, may be found in Chapter 11 of Ethier and Kurtz (1986), which has
served as our main source. With the aim to explain the intuition behind the theory
we start with a simple example, a birth and death process with constant birth rate
(`immigration') and constant individual death rate. The results in Sections 5.3 and
5.4 are applied to this example, thus giving explicit solutions. In Section 5.5 we apply
the results to the Markovian version of the epidemic model described in Section 2.3.
It is shown that this process converges weakly to a certain Gaussian process but in
this case it is not possible to obtain explicit solutions for the deterministic limit and
the covariance function. It is worth mentioning that the techniques presented in this
chapter may be applied to a wide range of problems such as more general epidemic
models and models for chemical reactions and population genetics, as well as other
population processes.
Two fundamental results about Poisson processes will be used without proof. In
Section 5.3 we use the fact that if Y = fY (t) t 0g is a Poisson process then
limn!1 sups t jn;1Y (ns) ; sj = 0 almost surely, for any t 0. In p
Section 5.4 we ap(n) , de ned by W (n) (t) = n(n;1 Y (nt) ; t),
ply the result stating that the process W
converges weakly to the standard Brownian motion. This is proven using the fact
P
that Y (nt) = n=1 (Y (kt) ; Y ((k ; 1)t)) is a sum of n independent and identically
k
distributed Poisson random variables, and applying Donsker's theorem (e.g. Billingsley, 1968 p 68). See for example Ethier and Kurtz (1986), Exercise 4.10 for hints
to a complete proof. In Section 5.4 we also integrate with respect to Gaussian processes, the so called It^ integrals. However, it is possible to read the text without any
o
knowledge of such integrals. 5.1 An example: A simple birth and death process Let X = fX (t) t 0g be a birth and death process with constant birth rate: k = ,
and with death rates: k = k. This means that individuals enter the population
at constant rate (immigration) and each individual lives for an exponentially distributed time with mean 1= . If at time t, the process satis es X (t) > = , then
the death rate (= X (t)) exceeds the birth rate (= ), and vice versa if X (t) < = .
One might therefore de ne = as the `equilibrium' of the population, and one would
expect the population size to uctuate around this value, except possibly during the
initial phase, if X (0) is far from the equilibrium. In Figure 5.1 below a simulation
of the birth and death process with = 200, = 1 and X (0) = 100 is plotted. As
we can see the process soon reaches the equilibrium value = = 200 around which
it uctuates randomly. The arguments above are very loose, and the present chapter 200
150
100
50
2 4 6 8 Figure 5.1: Simulation of birth and death process with
100. k = 200, 10 k t = k and X (0) = will formalize the mathematical results. In particular, if is large, implying that
the population size at equilibrium is large, we will show that the process above (and
others of similar type) can be approximated by certain di usion processes. For the
birth and death process de ned above, it turns out that it can be approximated by a
socalled OrnsteinUhlenbeck process. Such a process makes excursions away from the
equilibrium, but the farther away from equilibrium the stronger is the drift towards
equilibrium a result in line with the heuristics mentioned above.
At the end of Sections 5.3 and 5.4 we apply the results of this section to the birth
and death process and obtain explicit solutions. The birth and death process ts in
with the model below if , the parameter assumed to be large, is replaced by n. 5.2 The general model Suppose that for each n 1, Zn = fZn(t) t 0g is a continuoustime Markov
(
process on the ddimensional lattice Z d governed by the jump intensities qznz)+` =
n `(n;1 z ) z ` 2 Z d . This means that
P(Zn(t + h) = z + ` j Zn(t) = z) = hn `(n;1 z) + o(h) ` 6= 0
X
P(Zn(t + h) = z j Zn(t) = z) = 1 ; hn `(n;1z) + o(h):
(5.1)
` We assume that the process has only a nite number of possible transitions, i.e.
that there are only nitely many ` 2 Z d for which supx `(x) > 0, and that these
transition rates `(x) are continuous functions. The starting point Zn(0) is assumed
nonrandom. The rates above explain why the processes are called density dependent.
The jump rates depend on the density of the process (i.e. normed by n). The factor n implies that the rates increase with n, a necessary criterion for the processes to
behave more and more closely like a di usion.
One way to characterize this process is by means of Poisson processes. To this
end, let Y` = fY`(t) t 0g be independent standard Poisson process de ned for each
of the possible transitions `. Then Zn can be written as
Zn(t) = Zn(0) + X
` `Y` n Zt
0 ` (n ;1 Z n (s))ds : (5.2) Note that even though the Poisson processes were de ned independently, the terms
in the sum above are dependent since the observation points of the Poisson processes
are dependent. It is easy to check that (5.2) satis es (5.1) by recalling that the
probability of a jump in a Poisson process in a short time interval is proportional
to the length of the interval. Given Zn(t) = z, the probability that there will be a
R
jump in Y` n 0t ` (n;1Zn(s))ds during (t t + h) is thus hn `(n;1 z) + o(h), since
the integrand is equal to n `(n;1 z) until the rst jump after t.
Note also that the processes are de ned on the same probability space for di erent
n, as the same Poisson processes are utilized in the construction. Below we prove
convergence theorems for the sequence of processes fZng. 5.3 The Law of Large Numbers Before proving convergence results for fZng we derive an inequality known as Gronwall's inequality, which is interesting in its own right. Lemma 5.1 R(Gronwall's inequality). Assume f is a real function satisfying 0
t f (t) a + b 0 f (s)ds, for some positive constants a and b and for all t
f (t) aebt t 0. 0. Then Proof. By iterating the inequality above one obtains f (t) a+b
= Zt
0 f (s1 )ds1 a + b a + abt + b2 = aebt : Z0 t Z0 0 a+b Z s1
0 f (s2)ds2 ds1 f (s2)ds2ds1 ;a + b Z s2 f (s )ds ds ds
3 3 21
00
0
1
(bt)2 + b3 Z t Z s1 Z s2 f (s )ds ds ds : : : a X (bt)k
a + a(bt) + a 2
3 321
k! a + abt + b2
= Z t Z s1 Zt s1 0 0 0 k=0 The result follows.
^
^
Let Y` be centered Poisson processes, that is, Y`(t) = Y`(t) ; t. Further, let
;1 Zn and de ne the drift function F by F (x) = P ` ` (x), x 2 Z d . With this
Zn = n
`
notation, equation (5.2) is equivalent to Zn(t) = Zn X^
(0) + n;1 `Y n ` ` Zt
0 ` (Zn (s))ds + Zt
0 F (Zn(s))ds: (5.3) The second term on the right hand side of (5.3) will be small for large n since
^
sups t n;1jY`(ns)j converges to 0 almost surely, as mentioned at the beginning of the
present chapter. This suggests that Zn will resemble the deterministic vector function
z(t) de ned as the solution to the integral equation z(t) = z0 + Zt
0 F (z(s))ds: (5.4) This deterministic approximation can also be explained intuitively. The process Zn
P
starts at Zn(0) and the `average drift' of Zn(s) at s is ` ` `(Zn(s))ds = F (Zn(s))ds,
R
implying that Zn(t) should be approximately equal to Zn(0) + 0t F (Zn(s))ds. The
following theorem proves this strictly. Theorem 5.2 Suppose that limn!1 Zn(0) = z0 and that for each compact K 2 Rd there is a constant MK > 0 such that jF (x) ; F (y )j MK jx ; y j 8x y 2 K .
Then limn!1 sups t jZn(s) ; z (s)j = 0 almost surely, where z (t) is the unique
solution to (5.4).
Proof. The validity of the theorem relies only on ` in some neighbourhood K of
s tg. De ne thus ` = supx2K `(x), which is nite due to the continuity
of `. From (5.3), the de nition of z(t) and the assumptions of the theorem we have fz(s) 0 jZn(s) ; z(s)j = Zn(0) ; z0
+ Z s;
0 X^
+ n;1 `Y
` n ` Zs
0 ` (Zn (u))du F (Zn(u)) ; F (z(u)) du X jZn(0) ; z0j + j`j u s
` sup n;1 ^
jY`(n `u)j + Zs
0 MK jZn(u) ; z(u)jdu: By Gronwall's inequality (Lemma 5.1) this implies that jZn(s) ; z(s)j X ^
jZn(0) ; z0j + j`j sup n;1jY`(n `u)j
us
` ! eM s:
K Taking the supremum then yields
sup jZn(s) ; z(s)j X ^
jZn(0) ; z0j + j`j supt n;1jY`(n `u)j
u st ` ! eM t :
K The exponential function is independent of n, the rst term in the brackets converges
to 0 by assumption, and each term in the nite sum converges almost surely to 0
^
owing to the fact that sups t n;1Y`(ns) converges to 0 almost surely, for any t. This
completes the proof.
We now apply the theorem to the birth and death process de ned in Section
5.1. We replace the parameter by n and denote the corresponding process by
Xn. The two possible jumps are 1, and the jump rates are n for an increase and
a for a decrease if Xn(t) = a. With the notation of Section 5.2 this implies that
1 (x) = 1 and ;1 (x) = x. Consequently we have F (x) = 1 (x) ; ;1 (x) =
1 ; x. It is easy to show that the solution to (5.4), denoted here by x(t), is given
by x(t) = ;1 + (x0 ; ;1)e; t . By the theorem it then follows that if the initial
value converges, i.e. Xn(0)=n ! x0 , then Xn(t)=n converges to x(t) almost surely,
uniformly on compact sets. In particular we see that for su ciently large t x(t) ;1,
so Xn(t) n ;1, as was seen heuristically in Section 5.1. 5.4 The Central Limit Theorem
In the previous section, it was shown that the normed jump Markov vector process
Zn, for a large population n, was approximately equal to the deterministic vector
function z de ned by equation (5.4). The next natural step is to study the deviations
between the two, that is to derive a central limit theorem. As usual, it turns out that
p
p
the deviations are of order n. Before de ning the nscaled vector process Vn we
de ne the similarly scaled Poisson processes de ned in the previous section
p;
^
W`(n)(t) = n n;1 Y`(nt) ; t = n;1=2 Y (nt):
As pointed out at the beginning of this chapter, W`(n) converges to the standard
p
Brownian motion W`. The nscaled centered vector process Vn is then de ned by
p;
Vn(t) = n Zn(t) ; z(t)
(5.5)
= vn(0) + X
` `W (n)
` Zt
0 Z tp ;
n F (Zn(s)) ; F (z(s)) ds:
` (Zn(s))ds + p;
Of course, vn(0) = n Zn(0) ; z(0) 0 in (5.5), which by assumption is nonrandom.
The second equality is a direct consequence of the de nition of W`(n) , Zn and z. We
can expand the integrand on the far right by Taylor's theorem, so that
pn ;F (Zn(s)) ; F (z(s)) = [email protected] (z(s))(Zn(s) ; z(s)) + O(pnjZn(s) ; z(s)j2)
= @F (z(s))Vn (s) + O(jZn(s) ; z(s)j)Vn(s) where @F = (@j Fi ) is the matrix function of partial derivatives. From Theorem 5.2
we know that Zn converges to z, and from the beginning of this chapter that W`(n)
converges to W`, a standard Brownian motion. This suggests that Vn converges to a
process V de ned by the integral equation V (t) = v0 + X
` `W` Zt
0 ` (z (s))ds + Zt
0 @F (z(s))V (s)ds: This is proven in the following theorem where we use the notation G(x) = (5.6) P ``T (x).
`
` Theorem 5.3 Suppose @F is continuous and that limn!1 vn(0) = v0 (constant).
Then Vn ) V , the process de ned in equation (5.6). This process V is a Gaussian
vector process with covariance matrix Cov(V (t) V (r)) =
where Z r^t
0 (t s)G(z(s))( (r s))T ds is a matrix function de ned as the solution of
0 (t
2 s) = ; (t s)@F (z(s)) (s s) = I ( 02 denotes the partial derivative with respect to s).
Proof. De ne n(t) by
n (t) =
= Z tp ;
n F (Zn(s)) ; F (z(s)) ; @F (z(s))Vn (s) ds
0
Zt
0 O(jZn(s) ; z(s)j)Vn(s)ds: From Theorem 5.2 we know that Zn(s) converges to z(s) uniformly on bounded intervals. Thus, since Vn is bounded in probability it follows that sups t j n(s)j converges
to 0 almost surely.
Introduce Un(t) = P `W (n) R t (Z (s))ds
``
0`n and U (t) = P `W R t (z(s))ds
` ` 0` These are the second terms in the de ning equations of Vn and V , equations (5.5)
and (5.6) respectively. Rewrite (5.5) and (5.6) to obtain
~
Un(t) := Un (t) + n(t) = ;vn(0) + Vn(t) ; U (t) = ;v0 + V (t) ; Zt
0 Zt @F (z(s))V (s)ds: 0 @F (z(s))Vn(s)ds: (5.7)
(5.8) . The processes fW`(n) n 1g converge to a standard Brownian motion for each `.
Since Zn converges uniformly on bounded intervals to z and n converges to 0 it
~
follows that Un ) U .
From the de nition of and by partial integration, Zt ~
(t s)dUn(s) =
0
=
= Zt Zt (t s)dVn(s) ;
(t s)@F (z(s))Vn (s)ds
0
(t t)Vn(t) ; (t 0)vn(0) 0 Zt ; 0 ( 02(t s) +
Vn(t) ; (t 0)vn(0) so that Vn(t) = (t 0)vn (0) + Zt
0 (t s)@F (z(s))) Vn(s)ds ~
(t s) dUn(s): (5.9) An identical argument shows that V satis es V (t) = (t 0)v0 + Zt
0 (t s) dU (s): (5.10) From equations (5.9) and (5.10) it then follows that Vn ) V by the continuous
mapping theorem (e.g. Corollary 3.1.9 of Ethier and Kurtz, 1986). The vector process
U is Gaussian, in fact a timeinhomogeneous Brownian motion vector. It follows that
V is also Gaussian and the variance function is as speci ed in the theorem.
We now apply the result to the birth and death process de ned in Section 5.1.
In the previous section we concluded that Xn(t)=n converged to the deterministic
function x(t) = ;1 + (x0 ; ;1)e; t . To simplify notation we assume that the
process is started in equilibrium, that is Xn(0) = n ;1 implying that x0 = ;1
and hence x(t) = ;1 for all t. Since F (x) = 1 ; x it follows that F 0(x) = ; .
The function de ned in the theorem then has the solution pt s) = e; (t;s) , and
(
G(x(s)) = 1 + x(s) = 2. The theorem is applied to Vn(t) = n(n;1Xn(t) ; ;1).
The assumption Xn(0) = n= implies that vn(0) = 0 for all n, so v0 = 0. The
theorem then states that the scaled birth and death process Vn converges to (i.e. may
be approximated by) a Gaussian process V with covariance function
Cov(V (t) V (r)) = Z t^r
0 e; (t;s) 2e; (r;s)ds = ;1 ;e; jt;rj ; e; (t+r) : (Except at the start, i.e. for small t and r, the second term on the far right is negligible
and then Cov(V (t) V (r)) ;1e; jt;rj.) A Gaussian process having this covariance
function is known as an OrnsteinUhlenbeck process, an important process in di usion theory (e.g. Karatzas and Shreve, 1991). It may also be illuminating to write equation
(5.6) explicitly for our example: V (t) = W1(t) ; W;1 (t) ; Zt
0 V (s)ds: Since W1 (t) and W;1(t) are independent their di erence has the same distributional
p
properties as 2W (t) (where W is a standard Brownian motion, of course). Performing this substitution and writing the integral equation above as a (stochastic)
di erential equation we have
p
dV (t) = ; V (t) dt + 2dW (t)
which is the de ning di erential equation for the OrnsteinUhlenbeck process. The
properties of the birth and death process derived heuristically can be veri ed. The
limiting process has a drift back towards equilibrium (now at the origin due to centering). Beside the negative drift, there is random noise expressed in the second
term. 5.5 Applications to epidemic models
In the present section we apply the results presented above to a special case of the
epidemic model de ned in Chapter 2. We stress that the approximation concerns the
whole epidemic process (as it evolves in time) and not only the nal size as mainly
studied in preceding sections. Due to the complex structure of epidemic models
(although simple to de ne), we will not obtain very explicit solutions, as we did for
the birth and death process above.
The theory involves approximations relying on the central limit theorem. This
means that we can only hope to approximate the epidemic process when there are
many infectious individuals, thus excluding the initial and nal phases of the epidemic.
We will therefore assume a (small) positive proportion of infectives when the epidemic
starts how to approximate the initial phase of the epidemic using coupling methods
was described in Section 3.3. Further, we treat the special case of the standard
epidemic model presented in Section 2.3 in which the infectious periods fIig are
exponentially distributed, making the epidemic process Markovian. By extending
the dimension of the process, other distributions may be modelled using the same
theory. For example, if the infectious period is ;(k ) the epidemic model falls under
the model of Section 5.2 in which there are k infectious states, which individuals pass
through sequentially with identical jump rates .
Using the notation of Chapter 2, the model treated in this section is denoted
En n( I ), where I is exponentially distributed with intensity . Our twodimensional
process is Zn = (Xn Yn), where Xn(t) denotes the number of susceptibles at t and
Yn(t) the number of infectives at the same time point. The initial values are Xn(0) = n and Yn(0) = n. The proportion of initial infectives is assumed positive but usually
very small. The process (Xn Yn) can make two types of jumps. Either a susceptible
becomes infected, implying that the process changes by (;1 1), or else an infective
is removed. The latter a ects the process by (0 ;1). Because the infectious periods are exponentially distributed with intensity parameter , the jump rate for a
(0 ;1)jump is Yn(t). New infections, or equivalently (;1 1)jumps, occur at rate
( =n)Xn(t)Yn(t).
The jump intensity functions for this model are thus
and
(;1 1) (x y ) = xy
(0 ;1) (x y ) = y:
Let x = x=n, and similarly y = y=n. We have then argued that the epidemic process
has the following jump intensities
P((Xn(t+h) Yn (t+h)) = (x;1 y +1)j(Xn(t) Yn(t)) = (x y)) = hn (;1 1) (x y)+o(h)
P((Xn(t + h) Yn(t + h)) = (x y ; 1)j(Xn(t) Yn(t)) = (x y)) = hn (0 ;1) (x y) + o(h):
The drift function F de ned in Section 5.3 is then given by
F (x y) = (; xy xy ; y):
The deterministic solution z = (x y) to the integral equation (5.4) corresponds to the
pair of di erential equations
x0 (t) = ; x(t)y(t)
x(0) = 1
0(t) = x(t)y (t) ; y (t)
y
y(0) = :
Recalling the historical overview in Section 1.4, we see that these di erential equations
are identical to those of the rst deterministic epidemic model presented by Kermack
and McKendrick (1927). A parametric solution to this set of di erential equations
was derived in Section 1.4:
x(t) = e; z(t)
y(t) = 1 + ; z(t) ; e; z(t)
;
where z(t) is de ned by the di erential equation z0 (t) = 1 + ; z(t) ; e; z(t) ,
with initial value z(0) = 0. Here = = .
First we apply Theorem 5.2 to show that the `density' process (Xn Yn)
= (Xn=n Yn=n), converges to the deterministic functions de ned above. To see that
the conditions are ful lled, we note that (Xn(0) Yn(0)) = (1 ) = (x0 y0) so the requirement that the initial value converges is obvious. Using the fact that the domain
of interest satis es 0 x1 x2 y1 y2 1 + together with the elementary bound
ab (a2 + b2 )=2, one can show that
jF (x1 y1) ; F (x2 y2)j (2 (1 + ) + ) j(x1 y1) ; (x2 y2)j: The upper bound is just a rough estimate but su cient for the second assumption
of the theorem to be satis ed. Theorem 5.2 thus proves that (Xn(t) Yn(t)) converges
almost surely to (x(t) y(t)), uniformly on bounded intervals.
Theorem 5.3 can be applied to conclude that the uctuations around the deterministic solution are asymptotically Gaussian. However, the covariance function
~~
~
de ;
the
very
Vn
Xn(t) =
pn nedn(in) ; x(ttheoremYis(tnot pn ;Yexplicit.y(tLet The = (Xn Yn), wherederivatives
~n ) =
Xt
) and
matrix of partial
n (t) ; ) .
of the drift function and the matrix function G appearing in the theorem, are
y ; xy
@F (x y) = ; yy ; ;x
and G(x y) = ; xxy xy + y :
x
~
~
The matrix @F (x y) is continuous and (Xn(0) Yn(0)) = (0 0), so the assumptions
~~
of Theorem 5.3 are satis ed. Hence, it follows that (Xn Yn) converges to a Gaussian vector process V . The set of di erential equations de ning in the covariance
function are (derivatives below are with respect to s)
0 (t s) 0 (t s)
11 (t s) 12 (t s) ; y (s) ; x(s)
11
12
0 (t s) 0 (t s) = ; 21 (t s) 22 (t s)
y(s) x(s) ;
21
22
and with 11(s s) = 22(s s) = 1 and 12(s s) = 21 (s s) = 0. It is seen that this
is actually two independent pairs of di erential equations. The rst pair contains
11 and 12 and the second pair of di erential equations contains 21 and 22 and
is identical to the rst one except for di erent initial conditions. The solutions are
not explicit but can be partially derived. They are used for the computation of the
covariance functions of the process, as speci ed in the theorem. For example, it
~
follows from the theorem that the limit of Xn(t) has variance Zt
0 ( 11(t s) ; 12(t s))2 x(s)y(s) + 12 (t s)2 y(s) ds: As mentioned earlier, it is complicated and not very illuminating to derive more explicit solutions. For interested readers we refer to Kurtz (1981) who characterizes the
limiting process for a general distribution of the infectious period, without assuming
exponentially distributed infectious periods as we have done. Exercises
5.1. Consider the SI model (see Exercise 2.5) and assume that m = mn = n , so that there is a positive proportion of initial infectives. Derive a law of large numbers for
Yn(t), i.e. let n ! 1. (Hint: The rate of new infections is ( =n)(n + m ; Yn(t))Yn(t)
since there are no removed individuals and hence Xn(t) = n + m ; Yn(t).) 5.2. The SI model (continued). Let y(t) be the deterministic limit derived in Exercise
5.1. To simplify computations you may assume that the initial proportion infecive is negligible (although positive). The central limit theorem of Section 5.4 implies that
pn (n;1Yn(t) ; y(t)) converges to a Gaussian process. Derive the integral expression
of Theorem 5.3 for the covariance function. Ultimately everyone gets infected (since
individuals remain infectious forever) so the variance function tends to 0 as t = r gets
large, but at what rate? (Hint: You may use without proof the fact that (t s) splits
into a product f (t)g(s).) 5.3. An SIRS epidemic is just like the SIR epidemic, only individuals in the removed state lose their immunity after some time and become susceptible again. This in ow
of new susceptibles can lead to a situation where the disease `survives' for a long
time, a behaviour known as endemicity (cf. Chapter 8). . Extend the Markovian SIR
epidemic to a Markovian SIRS epidemic by assuming that removed individuals become
susceptible again independently at rate . Assuming Xn(0) = n and Yn(0) = m where
m = mn = n , derive the law a large numbers for this model. 6 Multitype epidemics
The epidemic model studied so far, En m( I ), assumes that the population is homogeneous (with regard to the disease) and that individuals mix uniformly. In real life
epidemics, this is rarely the case. For example, children are usually more susceptible
to in uenza, and sometimes individuals with a previous history of the disease have
acquired some partial immunity. For STDs (sexually transmitted diseases), some individuals have higher infectivity in that they are more promiscuous (varying infectivity
is often the case in other transmittable diseases as well). It may also be that the
infectious periods of di erent individuals are not identically distributed however, the
assumption of independence seems reasonable in most cases. These heterogeneities
can be characterised as individual. A second group of heterogeneities is caused by the
social structure in the population. The model En m ( I ) assumes that an individual has contact with each individual at equal rate (= =n), so that there is uniform
mixing. In real life, the presence of social structures, such as households, friendly
(including work) relations, and geographical structures, violates this assumption.
In the present chapter we relax these assumptions and examine the consequences
to the spread of disease. In Section 6.1 we indicate how to generalise the Sellke construction and the exact results presented in Chapter 2, to the case where the contact
rates between di erent pairs of individuals as well as the distributions of the infectious
periods may vary. In Section 6.2 we discuss large population approximations. When
considering a sequence of epidemic models indexed by the increasing population size,
this must be done without increasing the number of parameters. We therefore treat
the case where individuals in the population can be characterised by di erent types of
individuals, assuming that individuals of the same type are homogeneous with respect
to susceptibility, infectivity and social mixing and have the same distribution of the
infectious period.
In Section 6.3 a model for epidemics among households is introduced. This model
is motivated by the obvious fact that the rate of transmission tends to be much higher
within households than between individuals of di erent households. We give some
preliminary results, referring the interested reader to Ball et al. (1997) for the full
story on household epidemics. Finally, in Section 6.4 we compare the distribution of
the nal size for a speci c form of the multitype epidemic model with that of the nal
size for the `single type' model En m( I ). 6.1 The standard SIR multitype epidemic model
The model of Chapter 2 can be generalised in a straightforward way to the case where
there are di erent types of individuals. Before de ning the model, originally considered by Ball (1986), we need some notation. Assume the population splits up into k
groups of individuals labelled 1 : : : k. Suppose that initially there are ni suscepti ble iindividuals and mi infectious iindividuals, i = 1 : : : k. Let n = (n1 : : : nk ),
m = (m1 : : : mk ), n = Pi ni and m = Pi mi , the latter two denoting the total
number of initially susceptible and infectious individuals respectively. Finally, we let
i = ni =n denote the proportion of individuals of type i (iindividuals).
Infectious periods of infectives of type i are distributed according to a random
variable Ii with moment generating function i, i = 1 : : : k, and all infectious periods are de ned to be independent. De ne i = E (Ii) and i2 = Var(Ii) for future
use. During the infectious period of an iindividual she has contact with a given j individual at the time points of a homogeneous Poisson process with intensity ij =n.
If the contacted individual is still susceptible, she becomes infectious and is able to
infect other individuals. After the infectious period the individual becomes recovered
and immune and is called removed. All Poisson processes are de ned to be independent. The epidemic ceases as soon as there are no infectious individuals left in the
population.
Note that ij may not necessarily coincide with ji. Included in this `contact
parameter' is not only the rate at which two individuals meet (naturally symmetric)
but also the probability of diseasetransmission which depends on the infectivity of
the infective and the susceptibility of the susceptible. Some special cases of the
general contact parameters f ij g have received attention in the literature. The case
where the contact parameter splits up into a product ij = i j goes under the name
proportionate mixing. The parameters f ig and f j g are then called infectivities and
susceptibilities respectively. The case where individuals vary only in terms of their
susceptibility has also received special attention. In Section 6.4, we study this case
more closely and see how the heterogeneity a ects the nal size of the epidemic.
The model de ned above covers the case where all individuals are di erent: simply
let each individual be a type of her own. The parameters of the model are the
contact parameters = f ij g and the random variables I = (I1 : : : Ik ) describing
the infectious periods. Following the notation of Chapter 2 we denote the model
de ned above by En m( I).
The Sellke construction can be generalised to the model above in a simple fashion. Label the iindividuals (i ;(mi ; 1)) (i ;(mi ; 2)) : : : (i ni) with the infectives listed rst, i = 1 : : : k. Let Ii ;(m ;1) Ii ;(m ;2) : : : Ii n be identically distributed random variables, each distributed according to Ii, i = 1 : : : k. Also, let
Q1 1 : : : Qi n Q2 1 : : : , Qk n be independent and identically distributed exponential random variables, having intensity 1. These are the individual thresholds. All
random variables listed above are de ned to be mutually independent. The initial
infective labelled (i r) remains infectious for a time Ii r and is then removed. Denote
by Yi(t) the number of infective iindividuals at time t, and let
i i i i k Aj (t) = k
X ij Z t
i=1 n 0 Yi(u) du (6.1) be the total infection pressure exerted on a given j susceptible up to time t. The
susceptible labelled (j u) becomes infected when Aj (t) reaches Qj u and she remains
infectious for a time Ij u and is then removed. The epidemic ceases when there are
no infectives left in the population.
Exact results for the model En m( I) are derived in a way similar to that of the
homogeneous case described in Section 2.4. We shall not perform this generalisation
but encourage the reader to do so. The formula for the nal size can be expressed in
a form similar to the nal size formula for a homogeneous population. Introduce the
vector notation
k
a = Y ai
b i=1 bi a
a1
a
XXX and b=0 = k b1 =0 bk =0 and let a b mean ai bi i = 1 : : : k. The nal size of the epidemic is now
speci ed by a vector Z where the component Zi denotes the number of initially
susceptible iindividuals who became infected. Let Pu = P(Z = u). Ball (1986) has
shown that Pu can be derived from the recursive formula
v
k
X n;u Y
Pu
v;u
u=0 i=1 i k
X
j =1 !u +m (nj ; vj ) ij =n i i =n v 0vn resembling the recursive formula for a homogeneous population in Section 2.4. 6.2 Large population limits
In the present section we discuss large population limits of the model En m( I). We
consider the case where the population size n grows but k, the number of di erent
types, is kept xed. A further assumption is that the matrix f i ij j g, containing
the expected number of contacts between all pairs of individuals, is irreducible. (A
k k matrix A = (aij ) is called irreducible if it is impossible to nd a partition D1 D2
of the index set f1 : : : kg such that aij = 0 whenever i 2 D1 and j 2 D2 .) This
assumption eliminates the possibility that part of the community encounters a major
outbreak whereas another subgroup in the population remains una ected. A proof
of the central limit theorem for the model is very much in the same spirit as the proof
for a homogeneous population in Chapter 4, but is more technical and is omitted
here. A complete proof is given by Ball and Clancy (1993). Below we present and
interpret the result.
Let i(n) = n(in) =n, (in) = m(in) =n(in) , i = 1 : : : k, and assume that these quantities
converge as n tends to in nity. For the limits (written without superscripts) it is
assumed that each i is strictly positive whereas i may be 0 or positive. Introduce
the notation Zi0(n) = Zi(n) + mi and Zi0(n) = Zi0(n) =ni, i = 1 : : : k. Thus, Zi0(n) is the proportion among the initially susceptible iindividuals who were infected during the
course of the epidemic plus the ratio i between the number of initially infectious
and initially susceptible iindividuals. As in the homogeneous case, we distinguish
between two casesP
depending on whether the initial number of infectious individuals
is of order n (i.e. i i > 0) or nite. P If
( 1 ::: i i > 0 then the vector
k ) is the unique solution ; fZi0(n)g converges in probability to
to the equations = e; P , where = j = 1 : : : k:
(6.2)
The equation above has a natural interpretation. Let j = 1 + j ; j be the left
hand side above. Then j is the proportion of initially susceptible j individuals who
escape infection. The factor i i in the exponent on the right hand side is the number
of infected iindividuals (divided by n) and i ij is the expected infection force (multiplied by n) exerted on each j individual. The sum in the exponent is thus the total
infection force acting upon j individuals, so the right hand side is the probability for
a j individual to escape infection. The balance equation above then simply states
that the probability of escaping infection equals the proportion of individuals that
escape infection. Ball and Clancy (1993) also prove a central limit theorem for the
~
~
vector Z0(n) with components Zi0(n) = pni Zi0(n) ; i . It is shown that
1+ j j i i i i ij ; ~D
Z0(n) !N 0 (S T );1 S ;1
where S and are matrices de ned by Sij =
ij = ; p i j i j ij and
k
p i j i jX
i (1 ; i ) ij + ij r=1 2 r r ri rj r : Above ij is the Kronecker function and i and i2 are the mean and variance of the
infectious period for iindividuals. It is worth observing that the variance increases
with the variances of the infectious period, and that it coincides with the variance for
a homogeneous population if there is only one type (k = 1).
For the second case where the number m of initially infectives is kept xed as
n ! 1 Ball and Clancy (1993) prove a threshold limit theorem for the multitype
model. The theorem states that Z0(n) converges in distribution to the distribution of
the total progeny of a multitype branching process having m ancestors, and with life
distributions fIig and birth rates f ij j g. The basic reproduction number R0 for the
epidemic model is the largest eigenvalue of the matrix of mean o spring f i ij j g. If
R0 1 it follows from standard branching process theory (e.g. Jagers, 1975) that the
total progeny of the branching process (and hence also for the epidemic) is almost
Q
surely nite. If R0 > 1 there is a positive probability 1 ; i qim that the branching
i process explodes (fqig is the solution to a certain equation). On this part of the
~
sample space the vector Z(n) converges to a normal distribution with mean 0, and
the same variance as for the previous case, except for replacing each i by 0 in the
expression. 6.3 Household model
When modelling the spread of disease in a human population, it is very important to
take into account the formation of small social groups such as households, schools and
work places, the reason being that the spread of infection is usually greatly facilitated
among such groups, which have a high level of mixing. Regarding household models,
Ball et al. (1997) is the main reference. For important contributions to the theory
and practical applications, see e.g. Becker and Dietz (1995), Becker and Hall (1996),
Becker and Starczak (1997) and Islam et al. (1996). A related model is treated in
an early paper by Bartoszynski (1972). Work on outbreaks within households, in the
presence of community infection, but without considering the dynamics of the latter
can be found in e.g. Longini and Koopman (1982), Addy et al. (1991) and Becker
(1989). See also Andersson and Britton (1998).
De nition of the model Initially there are n susceptible individuals and m infectious individuals. Let the
susceptible population be subdivided into n=h households each of size h. (The case of
unequal household sizes can be treated similarly, but the notation becomes cumbersome.) The infectious periods of di erent infectives are independent and identically
distributed according to a random variable I . Throughout her infectious period a
given infective contacts a given individual in the population (within or outside her
household) at rate G=n, and, additionally, a given individual in her own household
at rate L. If the contacted individual is still susceptible, she becomes infectious and
is immediately able to infect other individuals. After the infectious period the individual becomes removed. As usual, all the random variables and Poisson processes
involved are assumed to be mutually independent. Note that, by de nition, infectives make both `global' and `local' contacts with their household members. This is
for mathematical convenience only and in a large population the global rate G=n
can be neglected compared to the local rate L.
Basic reproduction number As usual we study a sequence of epidemic processes indexed by the population size
n. The household size h is kept xed while the number of households grows with
the population size, in contrast with the situation in Section 6.1 where the number of
groups was kept xed while the group sizes grew. By using branching approximations in a heuristic way, it is possible to derive the basic reproduction number R0 . Assume
that n is large and m = 1. The initial infective contacts on average G individuals,
and these individuals will belong to distinct households with high probability. Moreover, each of the individuals infected in this way generates a small subepidemic in
P
her own household, comprising on average ML = h=1 jP1j individuals. Here P1j ,
j
and more generally Pij is de ned as the probability that, given i initial infectives in
the household, ultimately j household members become infected, 0 i j h.
All these new infectives now make global contacts, introducing the disease in other
households, and so on. The branching character of the number of infectious individuals is thus demonstrated, implying that the basic reproduction number R0 is given
by
R 0 = ML G :
Of course, if there are no household formations (h = 1) we arrive at the old expression
R0 = G , whereas R0 = h G if the disease is highly infectious within households
( L = 1).
Final size equation 0
Assume that n is large and that mn=n ! > 0 as n ! 1. Let Zn be the ultimate
number of infected individuals (including the initial infectives), and suppose that this
0
quantity satis es a law of large numbers, Zn=n ! in probability as n ! 1. We
will derive by heuristic means the asymptotic equation for . De ne qj to be the
asymptotic proportion of households with j individuals ultimately infected, so that
h
1 X jq :
=+
(6.3)
h j=0 j
Each individual in a given household of size h will become infected from outside with
asymptotic probability 1 ; exp (; G ), and these infections occur independently of
each other, thus remembering the de nition of Pij above it follows that
j
X h ; ; h;i ; ; i
qj =
e
1;e
Pij :
(6.4)
i=0 i
Equations (6.3) and (6.4) together yield an implicit equation for . It is possible to
derive rigorously the threshold limit theorem for the nal size together with a normal
approximation result in case of a large outbreak. The proof is very similar to the
proof outlined in Chapter 4, but is notationally much more inconvenient, hence we
refer to Ball et al. (1997) for the details.
G G 6.4 Comparing equal and varying susceptibility
In the present section we study a speci c form of the multitype epidemic model de ned
in Section 6.1. We assume all individuals have the same distribution of the infectious period and that the contact parameters f ij g satisfy ij = j . The interpretation of
this is that all infectives are equally infectious { individuals only vary in terms of their
susceptibility. In particular we compare the nal size for such an epidemic with that
of the nal size of the `corresponding' homogeneous population. In reality, one may
not know which model is the correct one, thus motivating such a comparison. As a
byproduct we use a coupling argument to show a surprising result about exponential
variables. This result was proven independently by Proschan and Sethuraman (1976)
and Ball (1985). The proof below is inspired by Barbour, Lindvall and Rogers (1991).
First we prove a lemma and then the main theorem concerning exponential variables,
then we see what drastic consequences this has to the comparison of nal size between
di erent populations. Lemma 6.1 Let Exp( 1), 2 Exp( 2 ) and 1 and 2 Exp( ), where =
( 1 + 2)=2. Assume further that all these random variables are mutually independent.
0000
0 0D
0 0D
Then there exists a coupling ( 1 2 1 2 ) such that ( 1 2 )=( 1 2), ( 1 2 )=( 1 2)
and for which the order statistics satisfy
1 0
(1) 0
(1) In particular it follows that (i) 0
(2)
D (i) 0
(2) almost surely. i = 1 2. Proof. First we mention the standard way of constructing a continuous random
variable X with distribution function F from a uniform random variable U . This
is simply done by evaluating the inverse of the distribution function at the point
U : X = F ;1(U ). It is easy to show that the random variable so obtained has
distribution function F . This type of construction will be used repeatedly, without
explicitly performing the transformation.
Before constructing the random variables we look at the law of the order statistics.
It follows immediately from the de nition of the random variables that P( (1) > t) = P( 1 > t 2 > t) = e; 1 te; 2 t
= e; te; t = P( 1 > t 2 > t) = P( (1) > t) so the rst order statistics are actually identically distributed. The lack of memory
property for exponential variables together with the fact that 1 and 2 have the
same intensity parameter implies that the conditional second order statistics satisfy
P( (2) > t + uj (1) = u) = e; t . In the corresponding probability for (2) we have to
condition on whichever variable is smaller. Introduce = 1=( 1 + 2 ), so 1 = 2
and 2 = (1 ; )2 . Note that = P( 1 < 2). We then have P( (2) > t + uj (1) = u) = e;(1; )2 t + (1 ; )e; 2 t
P( (2) > t + uj (1) = u) = e; t : Let f ( ) denote the right hand side of the rst equation. Then f (1=2) equals the
right hand side of the second equation. It is easy to show that f is a convex function
which attains it minimum at = 1=2. Hence it follows that P ( (2) > t + uj (1) =
u) P ( (2) > t + uj (1) = u).
Now consider the coupling. Let U1 : : : U4 be i:i:d: uniform random variables.
0
0
We construct the rst order statistics (1) and (1) by taking the inverse functions
0
0
of U1 (the fact that the distributions are the same implies that (1)
(1) ). We
0
0
0
construct (2) and (2) using U2 by letting (2) be the inverse function of the conditional
0 given 0 , evaluated at U2 and similarly for 0 . It follows that
distribution of (2)
0
0 . Finally, we use (1)3 and U4 to determine which one of 0 (2) 0 , and 0 and
U
1 and 2
1
(2)
(2)
0 respectively, is the smaller. Since 0 and 0 are identically distributed, we simply
2
1
2
0
0
let 1 = (1) with probability 0.5. More formally
0
1
0
2 =
= 0
(1) 1(U3 0:5) +
0
(1) 1(U3 >0:5) +
0
to equal (1) , 0
0
We select which of 1 and 2 is
become correct. This is done as follows
0
1
0
2 =
= 0
(1) 1(U4 P( 1 2 j (1) (2) )) +
0
(1) 1(U4 >P( 1 2 j (1) (2) )) + 0
(2) 1(U3 >0:5)
0
(2) 1(U3 0:5) : so that the conditional probabilities
0
(2) 1(U4 >P( 1 2 j (1) (2) ))
0
(2) 1(U4 P( 1 2 j (1) (2) )) : It is straightforward to check that this coupling satis es the conclusions of the theorem. The distributional statement of the theorem is easy to show once the coupling
has been constructed:
0
(i) > t) = P ( (i) > t)
inequality follows trivially since (0i) P( The P( 0
(i) > t) = P ( (i) > t): 0
(i) almost surely. The lemma above is used repeatedly in the following theorem. Theorem 6.2 Let X1 : : : Xn and Y1 : : : Yn be independent exponentially distributed random variables: Xi Exp( i ), and Yi Exp( ), i = 1 : : : n, where =
0
( 1 + : : : + n)=n. Then there exists a coupling (X10 : : : Xn Y10 : : : Yn0 ) such that
0D
(X10 : : : Xn)=(X1 : : : Xn) and for which
In particular, and D
(Y10 : : : Yn0 )=(Y1 : : : Yn) Y(0j) X(0j) j = 1 : : : n almost surely.
D Y(j) X(j) j = 1 : : : n: Proof. Start with the independent sequence X1 : : : Xn. Select the rst two variables X1 and X2 and use the lemma to construct 1 and 2 such that a) they are independent and both exponentially distributed with the same parameter ( 1 + 2)=2,
D
D
b) min( 1 2) min(X1 X2), and c) max( 1 2) max(X1 X2). We have thus constructed the sequence ( 1 2 X3 : : : Xn) of independent exponential variables with
parameters
(( 1 + 2 )=2 ( 1 + 2 )=2 3 : : : n) with smaller order statistics than the original
sequence. The same procedure is repeated, choosing the pairs having largest and
smallest value, with the e ect that the order statistics are reduced and the parameters are brought closer and closer to . In the limit the order statistic converges
to Z1 : : : Zn say, which has the same law as the order statistics of n independent
exponential random variables with common parameter . Reordering these variables
at random gives a sample distributed as Y1 : : : Yn, satisfying the statement of the
theorem. The proof is complete. The theorem above is general and not related to epidemic models. The relevance
of it for comparing the distributions of the nal size for a homogeneous population
with that of a population with varying susceptibility is made clear in the following
corollary originally derived by Ball (1985). Below we let (X (t) Y (t)) denote the
number of susceptibles and infectives respectively at t for the standard SIR epidemic,
~~
and (X (t) Y (t)) the corresponding numbers for the multitype epidemic. Corollary 6.3 Consider the (homogeneous) standard SIR epidemic En m ( I ) and
the multitype epidemic En m( I) in which di erent types di er only in terms of susceptibility, and which has the same distribution of infectious periods and the same
initial numbers of infectives and susceptibles as the homogeneous epidemic (i.e. all
P
P
infectious periods have the same distribution as I and i mi =P , i ni = n and
m
ij = j ). Assume further that and f j g are related by = j nj j =n. Then it
~~
is possible to construct versions (X 0 (t) Y 0 (t)) and (X 0(t) Y 0 (t)) of the two epidemics
such that
~
X 0(t) X 0(t) for all t, almost surely.
~
As a consequence the nal sizes satisfy Z 0 Z 0 almost surely. In particular it follows
that
D
D
~
~
X (t) X (t) and Z Z
for any n m 1 and t 0.
Proof. The proof follows immediately if we modify the Sellke construction slightly.
Because individuals vary only in terms of susceptibility (i.e. ij = j ) the infection
pressure exerted on j individuals, Aj (t) de ned in (6.1), satis es Aj (t) = ( j =n) XZ t
i 0 ~
Yi(u)du = ( j = )A(t) PR ~
where A(t) = ( =n) i 0t Yi(u)du. The independent and exponentially distributed
individual thresholds fQjk g were all de ned to have intensity parameter equal to 1. If
we de ne instead the thresholds of j individuals (Qj 1 : : : Qj n ) to have parameter
j = , j = 1 : : : k, then we may assume all individuals to have the same infection
~
pressure A(t). For this modi ed Sellke construction, it follows from Theorem 6.2
that we may construct individual thresholds for the two epidemics such that the
order statistics for the homogeneous epidemic is smaller than those of the model
with varying susceptibility. Since we may use the same realisations of i.i.d. infectious
periods for both epidemics it follows that the pressure A(t) for the homogeneous
~
epidemic exceeds the pressure A(t) in the heterogeneous model for every t using a
similar argument as that used in the monotonicity result in Section 3.3. This implies
that even more individuals in the homogeneous model will become infected, and so
forth.
j The corollary above is not only interesting from a theoretical point of view. In
reallife it is of course hard to know the susceptibilities of di erent individuals, even of
di erent subgroups in the population. If all that is known is the average susceptibility
the corollary then states that a homogeneous population is the worst case. That
is, if we calculate the probability assuming a homogeneous population, then any
heterogeneity in susceptibility can only make things better in that probability mass
is shifted towards smaller outbreaks.
Andersson and Britton (1998) have argued that the calibration by comparing different models having the same arithmetic mean susceptibility is not necessarily a
fair comparison. In fact, by the Sellke construction it seems fairer to have the same
populationaverage of expected individual thresholds. The thresholds are exponentially distributed, so a threshold with parameter i has expected value ;1. With
i
this argument a population with parameters ( 1; : : : Pn) should be compared with a
homogeneous population with parameter H = n;1 i ;1 ;1 . This means that an
i
alternative calibration is to assume that the harmonic means should coincide. Andersson and Britton (1998) perform this comparison and conclude that there is no
corresponding strong result holding for all n and m and all t. There is one domination
result stating that if m is xed, then the probability of a major outbreak is minimised
for the homogeneous community. Their second result holds only for the nal size,
assuming large outbreaks in large communities. For this case they conclude that a
homogeneous population gives a larger outbreak if the disease is highly infectious,
whereas some heterogeneity in the population will cause a larger outbreak if the disease is less infectious. More precisely, they show that if H 2(1 ; e;2) 2:31 then
a homogeneous model will give a larger outbreak than any heterogeneous population
having the same harmonic mean susceptibility, whereas if H < 2(1 ; e;2) then some
heterogeneous population with the same harmonic mean susceptibility will produce
a larger outbreak in case of a major outbreak. Exercises
6.1. Consider the standard SIR multitype epidemic model with two di erent types and assume that the expected length of the infectious period is identical for the two
types 1 = 2 (=1 say). Compute the basic reproduction number R0 in case
a) ij = i j (proportionate mixing). Extend the result to the case with k di erent
types.
b) ii = + and ij = , i 6= j (i.e. the mixing rate is higher within types than
between). 6.2. Calculate numerically the nal size vector for the two examples of Exercise 6.1
with 1 = 2 = 0:5 (in both cases), 1 = 1 = 1, 2 = 2 = 2, = 1 and = 1.
6.3. In Section 6.4 the nal size domination of a homogeneous population was es tablished. To see if the di erence is severe we here study the large population limits
of the nal size (Equations (4.1) and (6.2)) in some examples. Assume the initial
proportion infectives is negligible ( = i = 0) and without loss of generality that
= 1.
a) Compare the nal size of a homogeneous population having = 1:5 with the overall nal size in population in which half of the community has susceptibility 1 = 1
and the other half 2 = 2 (implying the same arithmetic mean).
b) Do the same thing as in a) replacing 1.5 by 3, 1 by 0.5 and 2 by 5.5.
c) Now use the calibration suggested by Andersson and Britton (1998) to compare a
homogeneous population having = 1:5 with a population with half of the individuals
with 1 = 1 and the other half having 2 = 3 (note that the harmonic mean is
unchanged).
d) Repeat c) with all parameters doubled. 7 Epidemics and graphs
Random graphs provide us with a useful tool in understanding the structure of
stochastic epidemic models. By representing the individuals in a population by vertices and transmission links by arrows between these vertices, we obtain a graph that
contains information on many important characteristics, such as the nal epidemic
size, the basic reproduction number and the probability of a large outbreak. The connection between SIR epidemics and random graphs was observed by Ludwig (1974),
and has then been more fully exploited by von Bahr and MartinLof (1980), Ball and
Barbour (1990) and Barbour and Mollison (1990). In the rst two sections of this
chapter the random graph interpretation of the standard SIR epidemic model will
be given we also show that the special case of a constant infectious period yields a
particularly nice class of random graphs.
Random graphs can also serve as models of social networks. We can give a simple
reason for introducing the network concept in epidemic modelling. Many of the
classical results for the standard SIR epidemic model require the population size n
to be large. But, according to the modelling assumptions, contacts between two
given individuals in a large population occur at a very low rate in principle, this
implies that the possibility of repeated contacts is not taken into account. This
observation indicates that the assumption of homogeneous mixing is not very realistic
when describing epidemic spread in large populations. If we wish to allow repeated
contacts between certain pairs of individuals, a possible solution is to pick a graph
describing the relations between individuals, and then let the disease spread along
the social network so obtained. In that way each individual is assigned to a small
neighbourhood of other individuals, and can then contact each of her neighbours at
a `normal' rate. This subject is now receiving increasing attention, see e.g. Altmann
(1995, 1998), Rand (1997), Diekmann et al. (1998) and Andersson (1998, 2000).
It is far from obvious how a suitable form of the network is to be chosen. The
graph should be complicated enough to catch something of the sometimes extremely
irregular contact pattern in a population of living organisms, but at the same time
simple enough to lend itself to mathematical analysis. Here (Section 7.3) we will just
scratch the surface by explaining how to run the Markovian version of the standard
SIR epidemic on a very simple network (whereby the Markov property is destroyed!).
We have so far presented various ways of modelling social relations in a population, ignoring completely the possibility of geographical spread of the disease. In
Section 7.4 we make up for lost ground by presenting some results for a standard SIR
epidemic model on the twodimensional lattice, where the spatial element determines
the progress of the epidemic completely. In order to approximate reality, social space
and geographical space should of course be considered simultaneously, but to nd
natural models for both remains an open problem. 7.1 Random graph interpretation
Consider again the standard SIR epidemic En m( I ), endowing the individuals with
random variables I;(m;1) I;(m;2) : : : In. Now represent the individuals by vertices
of a graph, giving the vertices labels ;(m ; 1) ;(m ; 2) : : : n. If the ith individual
ever becomes infected, we know that she will contact a given individual j in the
population with probability pi = 1 ; exp(; Ii =n). Hence, for each given ordered pair
(i j ) (i 6= j ) of vertices of the graph, let us draw an arrow from i to j with probability
pi, the presence of such an arrow having the following interpretation: \If i ever
becomes infected then i will contact j during her infectious period". Strictly speaking,
these arrows are not drawn independently, but they are conditionally independent
given the I variables. There remains only to mark the vertices ;(m ; 1) ;(m ;
2) : : : 0 (representing the initial infectives) and just follow the arrows. The size of
the subgraph traced in this manner is exactly the nal epidemic size.
The random graph obtained above can be described as a digraph (directed graph)
with prescribed distribution of the outdegrees (the number of outgoing arrows from
a given vertex). Evidently we lose information on events referring to time when
making this construction, since only the epidemic chain is recorded the point is that
characteristics related to the nal epidemic size are found to have nice graphtheoretic
counterparts.
As a simple application we use random graphs to indicate that even in a situation
where the infectivity pro le is extremely complicated, many results may be derived
with the same ease as for the standard SIR epidemic. Consider the sample space
consisting of all nonnegative functions (t), t 0, with nite integral, and put
some probability measure P on this space. Now let ;(m;1) (t) ;(m;2) (t) : : : n(t)
be an independent sample, i(t) giving the infectiousness of individual i at time t
after infection. If the ith individual ever becomes infected, she will contact a given
individual j during her infectious period with probability
1 Z 1 (t) dt :
pi = 1 ; exp ; n
i
0 We can now draw a directed graph and trace the epidemic ow exactly as before.
Using this representation, quantities such as the basic reproduction number and the
distribution of the nal epidemic size may easily be derived. Since latency periods
and complicated time dependent infection rates can be modelled using this device,
it is easy to form the impression that we have found a way to model the spread of
just about any infectious disease. Note, however, that the population is still assumed
to be homogeneously mixing, in the sense that all susceptibles are equally likely to
contract the disease at all times this decreases drastically the realism of the model.
Figure 7.1 illustrates such a random graph with m = 1 initially infectious (marked
with a large black circle) and n = 11 initially susceptible individuals. There are e
u Y
H
HH
H e ; ;
;
; u e
HHz
;@
;
@
R
@ e u A
A
A
A
A
U ;
;;
;;
e
; e J
J J Je u Figure 7.1: Directed graph representation of a realization of Em n( I ) with m = 1
and n = 11. The initial infective is the larger black circle, while all other ultimately
infected are smaller black circles. Small white circles correspond to uninfected individuals. Z = 4 additionally infected in the illustration (infected individuals have smaller black
circles). Note that individuals to the right do not become infected even though there
are potential contacts. The reason is that the disease never reaches this subgroup.
The double arrow to the far right has the interpretation that either individual could
have infected the other, had she been infected. 7.2 Constant infectious period
If the infectious period in the standard SIR epidemic model is constant, it turns out
that the random graph formulation involves the classical Bernoulli random graphs,
which we describe next.
The theory of random graphs was introduced by Erdos and Renyi (1959) and
has been extensively studied ever since. In particular, the class of Bernoulli random
graphs has been very thoroughly explored, see e.g. Bollobas (1985) and Barbour et al.
(1992). This graph model, often referred to simply as the G (N p) model, is de ned
as follows. We are given a set of N labelled vertices. With probability p, we connect
a given pair of distinct vertices i and j by drawing an (undirected) edge between
them. These connections are made independently of each other. It is clear that the
degree Di of the ith vertex, i.e. the number of vertices adjacent to this vertex, is
binomially distributed with parameters N ; 1 and p, so that the average degree is
given by (N ; 1)p. In order to keep the size of the neighbourhood bounded as N
grows large, we have to put p = =N , for some > 0. This implies that Di will be
approximately Poisson distributed with parameter if N is large, according to the classical Poisson approximation theorem.
Two vertices i and j are said to belong to the same component if and only if there
exists a path of edges between i and j . It is thus clear that any graph consists of a
number of disjoint connected components. Properties of these components have been
extensively studied for the Bernoulli graph model. In particular, the following result
is wellknown in random graph theory. Theorem 7.1 Consider the G (N p) random graph model. Assume p = pN = =N ,
N ! 1. If
1 then a vertex chosen at random will belong to a component of
size O(1). On the other hand, if > 1 then the relative size of the largest component
converges in probability to some constant C strictly between 0 and 1, as N ! 1.
Also, a randomly chosen vertex will belong to this giant component with probability C
and it will belong to a component of size O(1) with probability 1 ; C . Turning again to the graph interpretation of the standard SIR epidemic, let us
note that if the infectious period is constant, I
, then the arrows are drawn
independently of each other, and the probability of drawing an arrow from i to j
is the same for all ordered pairs (i j ), i 6= j . Moreover, since only one (if any) of
the two arrows is actually used for disease transmission, it is enough to draw one
undirected edge between i and j , the presence of such an edge having the following
interpretation: \If i ever becomes infected then i will contact j , and vice versa".
Thus, setting N = n + m and p = 1 ; exp(; =n), we have arrived at the G (N p)
random graph model!
Random graph theory can now be invoked to derive results for the corresponding
epidemic process. For instance, Theorem 7.1 above immediately implies that, in the
case where the basic reproduction number exceeds 1, the asymptotic probability of a
large outbreak (the probability of picking a vertex belonging to the giant component)
is equal to the relative size of a large outbreak (the size of the giant component). Such
a result can of course also be derived by calculating the explosion probability of the
approximating branching process and comparing it with the solution to the nal size
equation, but the graph approach gives us much more insight to the phenomenon. 7.3 Epidemics and social networks
We have indicated above how to use directed (undirected) random graphs to describe
the epidemic ow, the arrows (edges) representing possible channels for disease transmission. We now proceed to show how to run an epidemic process on a random graph,
with the edges of the graph representing relations between individuals. Individuals
related in this way will be called neighbours. First the epidemic process on a xed
arbitrary graph is de ned, and then some large population results are derived in the
case where the underlying network is modelled as a Bernoulli random graph. An epidemic process on a xed graph Consider a closed population consisting of N = n + m individuals. Represent the
neighbourhood structure in the population with a labelled undirected graph G , so
that the ith and j th vertices of the graph are connected by an edge if and only if
individuals i and j are neighbours. The graph will often be the result of some random
experiment. We assume that the structure is xed during the course of the epidemic.
Let G be the adjacency matrix of the graph G , so that Gij = 1 if i and j are connected
and Gij = 0 otherwise. It follows that G is a symmetric binary N N matrix with
zeros in the diagonal.
Let us next de ne the dynamics of the epidemic process. We pick m initially infectious individuals at random from the population. An infectious individual remains
so for an arbitrarily distributed time period I . During this time period she makes
`close contacts' with each of her neighbours according to the points of a Poisson process with intensity . If the individual so contacted is still susceptible, then she will
immediately become infectious. After the infectious period, the infectious individual
recovers and is then immune to further infections. All infectious periods and Poisson
processes are assumed to be independent of each other.
Denote the types `susceptible' and `infectious' by the letters X and Y , respectively.
Generally speaking we set Ai = 1 if the ith individual is of type A, and Ai = 0
otherwise. Then de ne the number of individuals, connected pairs and connected
triples with given type con gurations in the following way: A]n =
AB ]n =
ABC ]n = X n+m
i=1
n+m X Ai
AiGij Bj i j =1
n+m X i j k=1
i6=k AiGij Bj Gjk Ck : For instance, X Y ]n(t) is the number of neighbours i, j where i is susceptible and j
is infectious at time t. If the total number of neighbours of a given individual is kept
bounded, then all the quantities above are O(n). Note that AB ]n = B A]n , and that
each pair in AA]n is counted twice.
Bernoulli networks As an illustration, we take the Bernoulli graph model as our underlying network. Set
N = n + m, the total number of individuals, and put p = =n for some > 0. For an outcome G belonging to G (N p) we run an epidemic on G , where for simplicity it
is assumed that the infectious period is exponentially distributed with intensity .
The basic reproduction number of this model is easily derived. Introduce an
infectious individual in a susceptible population consisting of n individuals, where n
is large. This individual has on the average neighbours, each of whom will become
infected with probability =( + ), hence R0 = + :
We now describe the time dynamics of the model. Assume that the proportion
m=n of initially infectious individuals tends to a nontrivial limit as n tends to
in nity. Let us assume that A]n =n, AB ]n=n and ABC ]n =n all tend to deterministic
limits as n ! 1, and denote these limiting processes by a, ab] and abc], respectively.
In Rand (1997) the following system of equations is derived:
dx = ; xy]
dt
dy = xy] ; y
dt
d xx] = ;2 xxy]
dt
d xy] = ( xxy] ; yxy] ; xy]) ; xy]
dt
d yy] = 2 ( yxy] + xy]) ; 2 yy]:
dt
Let us explain the fourth line the other lines are then obtained by similar reasoning.
If the central individual j in a connected XXY triple (i j k) is infected by the individual k, we gain an XY pair (i j ). On the other hand, we may lose an XY pair
(j k) in three ways: the individual j in an Y XY triple (i j k) is infected by the
individual i k infects j directly k becomes removed. The fourth line now follows
readily.
The system is not very useful as it stands, since no description of the time dynamics
of the variables abc] is provided. Fortunately, the equations can be closed at the level
of pairs by the following device. Consider a connected triple (i j k). Then,
P (Ai Bj Ck = 1) = P (Ai = 1 j Bj Ck = 1) P (Bj Ck = 1) :
If n is large, then the in uence on i of the second neighbour k is negligible given the
state of the rst neighbour j , hence asymptotically,
P (Ai = 1 j Bj Ck = 1) = P (Ai = 1 j Bj = 1)
(i
1)
= PPABBj==1) :
(
j This translates to the formula abc] = ab]b bc] for all a b c:
Finally, we insert this relation into the system above to obtain a closed system in the
ve variables x, y, xx], xy] and yy]. The initial conditions are given by x(0) = 1,
y(0) = , xx](0) = , xy](0) = and yy](0) = 2.
This system can be simpli ed considerably. The equation for yy] is super uous,
since yy] does not appear in the other equations. Also the equation for xx] can be
crossed out, since by writing down the di erential equation for x2 and comparing it
with the di erential equation for xx] it follows that xx] = x2 at all times. Finally,
we de ne y = xy]=x, the number of infectious neighbours of a typical susceptible
^
individual. It can be veri ed that
dy = 1 x d xy] ; xy] dx = ( x ; ; ) y
^
^
dt x2
dt
dt
resulting in the di erential system
dx = ; xy
^
dt
dy = xy ; y
^
dt
dy = ( x ; ; ) y
^
^
(7.1)
dt
with initial condition x(0) = 1, y(0) = and y(0) = . For a rigorous derivation of
^
these results (and more), see Altmann (1998).
The equation for the nal size proportion , where the proportion of initial infectives is included as in Chapter 4, is obtained as follows. Divide the third line of (7.1)
by the rst one and integrate to get
+ log(x) = ; (1 + ; x) + y:
^
At the end of the epidemic y = y = 0, implying that
^
1+ ; = exp ; + : Note that this transcendental equation is of the same form as the nal size equation
(4.1). The basic reproduction number R0 appears in the formula, as it should.
It is clear that there should be a qualitative di erence in behaviour between the
standard SIR epidemic in a uniformly mixing population (small per capita infection rate, but a large number of individuals can be contacted) and the same type of epidemic on a social network (large per capita infection rate, but only a small number
of individuals can be contacted). Indeed, an individual with an extremely long infectious period might infect a huge number of individuals under the assumption of
homogeneous mixing. In a social network, however, such an individual will at most
infect all of her neighbours, which is probably a small number, and will otherwise
have no impact on the spread of the disease. Let us explore these thoughts further
by indicating how to retrieve the classical KermackMcKendrick model as a limit of
(7.1) as the number of neighbours ! 1 and the infection rate ! 0 in such a way
that ! b > 0.
Using (7.1) we note that
^
^
d y ;y =; y ;y +R
dt
where R = ; y= is small given the conditions on the parameters and . Gronwall's
^
inequality (Lemma 5.1) applied to the function jy= ; yj now shows that y= and y
^
^
coincide in the limit, hence we need not bother with the variable y. We end up with
^
the system
dx = ;bxy
dt
dy = bxy ; y
dt
which we recognize as the KermackMcKendrick model. Also, R0 =
which is perfectly in line with the above ndings. 7.4 The twodimensional lattice =( + ) ! b= Consider the twodimensional lattice Z2 . Our graph G is obtained by drawing an edge
between two sites i j 2 Z2 if and only if ji ; j j = 1 (nearest neighbour interaction).
Then a Markovian epidemic process is run on G as described in Section 7.3 (without
loss of generality, assume = 1). Note however that this time we are not considering
a sequence of processes on larger and larger nite graphs but rather a single process
on an in nite graph. This and related models have been studied in e.g. Mollison
(1977), Kuulasmaa (1982), Kuulasmaa and Zachary (1984) and Cox and Durrett
(1988). For an excellent introduction to the general theory of interacting particle
systems and percolation processes, see Durrett (1995). Also, the book by Liggett
(1985) provides a thorough treatment of the subject. Following Cox and Durrett
(1988) we will discuss the critical infection rate, a quantity that corresponds to the
basic reproduction number, and the asymptotic shape of the epidemic given nonextinction. Critical infection rate In order to discuss the critical infection rate, we again draw a directed graph to
describe the disease spread without explicitly referring to the actual time dynamics.
We know that a given site i will be infectious for an exponential holding time with
mean 1 (if she ever becomes infected). If i contacts a given neighbour site j during
the infectious period we draw an arrow from i to j , and we say in this case that the
oriented bond (i j ) is open. Otherwise (i j ) is declared to be closed. Now let C be
the set of sites that can be reached from 0 by a path of open bonds. It follows that
C is exactly the set of sites that will ever become infected if we start by making the
origin infectious. Now the critical infection rate may be de ned as
c = inf f : P(jCj = 1) > 0g: (7.2) By de nition, a major outbreak has positive probability if and only if R0 > 1. So if
0 < c < 1 then our basic reproduction number may be written simply as R0 = = c,
but this notation has not been acknowledged in the literature.
It is very di cult to nd good bounds for the critical rate c, or equivalently
the critical probability pc = c=( c + 1). By using the so called zerofunction and
the beautiful comparison theorem of Kuulasmaa (1982) some progress can be made,
as we now explain. The zerofunction is de ned as follows. For each subset A of
fi 2 Z2 : jij = 1g set
(A) = P(all bonds (0 i) i 2 A are closed): Easy calculation yields
(A) = 1;p
1 ; p + pjAj where p = +1 : Fix p, and consider two extreme percolation processes, with zerofunctions 0 and 1
and with critical probabilities p0 and p1 , respectively. The rst one is obtained by
c
c
opening the bonds with probability p independently of each other. Obviously,
0 (A) = (1 ; p)jAj: The other extreme is obtained by, for each site i, opening all the bonds (i i + j ),
jjj = 1, with probability p and closing all of them with probability 1 ; p. We have
1 (A) = 1 ;p jAj 1: It is easily checked that 0 (A) (A) 1(A) for all A, and the comparison theorem
1
tells us that p0 pc p1. The number p0 is exactly 2 (see Kesten, 1982), while the
c
c
c
other probability p1 is unknown. Numerical studies in the physics literature give
c p1 0:5927. Translating to rates yields 1
1:4552. Simulation studies by
c
c
Kuulasmaa (1982) provide us with the narrower interval
1:12 c 1:25: Asymptotic shape Let (t) be the set of removed cases at time t. If the origin is infected initially then,
given nonextinction, we expect the epidemic to grow in all directions of the plane at
a linear rate. The shape theorem of Cox and Durrett (1988) states the following:
If > c then, given nonextinction, there is a convex set D R2 such that for
any > 0 we have t(1 ; )D \ C (t) t(1 + )D (7.3) for t su ciently large. Moreover, the set of infectious sites is situated close to the
boundary of tD. The set (t) necessarily contains many \holes", i.e. areas of susceptible sites that are surrounded by removed sites. Not much is known about the actual
shape of the set D. Exercises
7.1. Consider the graph interpretation of the standard SIR epidemic given in Section 7.1. Describe the distribution of the outdegrees and the distribution of the indegrees.
What happens with these distributions as n ! 1, m = 1? 7.2. Check the details in the derivation of Eq. (7.1). Also, show that for some choices of parameter values , , the proportion of infectives y is increasing initially even
though R0 is below 1. 7.3. An epidemic process on a regular network. A graph on N = n + m vertices is called kregular if all vertices have degree k. For N large, pick a kregular graph
at random and run the epidemic model of Section 7.3 on this graph. Assuming
m = 1 (so = 0 asymptotically), nd conditions on (k ) that render possible
a large outbreak. Also, derive heuristically the following equation for the nal size
proportion :
1; = + + s
+ k where s = + + s
+ k;1 : 8 Models for endemic diseases
All the epidemic models encountered so far have assumed a closed population, i.e.
births, deaths, immigration and emigration of individuals are not considered. However, when modelling the spread of a disease with a very long infectious period or
a disease in a very large population, dynamic changes in the population itself cannot be ignored. Indeed, in a large community the susceptible population might be
augmented fast enough for the epidemic to be maintained for a long time without
introducing new infectious individuals into the community our common childhood
diseases are typical examples. Such a disease is called endemic.
Already Bartlett (1956) proposed a stochastic epidemic model for endemic diseases. This model was then modi ed slightly into what is known as the SIR epidemic
process with demography (van Herwaarden and Grasman, 1995, and Nasell, 1999). In
Section 8.1 this model is presented, and some large population results for it are given.
We also discuss brie y the time to extinction of the epidemic, and the interesting
notion of critical community size, loosely de ned as the population size needed for
the epidemic to persist over a given time horizon with a given probability.
Another way of achieving endemicity is to retain the assumption of a closed population, but to suppose instead that infected individuals lose their immunity after
some time. It should be noted, however, that this is an arti cial way to keep the
epidemic going. In reality, we would not expect to observe a xed population where
individuals contract the same disease over and over again. The special case, where
individuals become susceptible immediately after the infectious period, turns out to
be particularly easy to analyse, much easier than the SIR epidemic with demography.
This model, called the SIS epidemic model, is discussed in Section 8.2. In particular,
the time to extinction is investigated in some detail. Throughout this chapter the infectious period is assumed to be exponentially distributed, thus leading to Markovian
epidemic processes which can be well understood using the techniques of Chapter 5. 8.1 The SIR model with demography
Let us describe the stochastic SIR model with demography, starting with the population dynamics. Individuals are born into the population at a constant rate n
and each of them has an exponentially distributed lifetime with intensity , i.e. the
average lifetime is given by 1= . The population size will thus uctuate around the
quantity n. The reason for choosing sizeindependent birth rates is to avoid population extinction or explosion. Assume now that the population is homogeneous and
homogeneously mixing. Initially there are n susceptible and m infectious individuals in the population. A given infective stays infectious for a time period that is
exponentially distributed with intensity (unless she dies for other causes than the
disease before the end of that period). During that time she contacts a given individ ual at rate =n. If the contacted individual is susceptible, she immediately becomes
infectious and proceeds to infect other individuals. All random variables and Poisson
processes involved are assumed to be mutually independent.
Denote by X (t) the number of susceptibles at time t, and let Y (t) denote the
number of infectives at time t. Then (X Y ) = f(X (t) Y (t)) t 0g is a bivariate
continuous time Markov process with the following transition rates:
from to
(i j ) (i + 1 j )
(i ; 1 j )
(i ; 1 j + 1)
(i j ; 1) at rate
n
i
ij=n
( + )j: The four transitions above correspond to births of susceptibles, deaths of susceptibles, infections of susceptibles, and recovery or deaths of infectives, respectively.
The possibility of births and deaths of individuals is the only feature that distinguishes this model from the standard Markovian SIR epidemic model. As we will see
shortly, this new component in the model will have a great impact on the qualitative
behaviour of the epidemic.
Law of large numbers Consider a sequence (Xn Yn) of SIR models with demography, all of the processes
having the same parameters , and . Assuming that the proportion mn =n tends to
> 0 as n ! 1, we rst derive a law of large numbers for the sequence (Xn Yn) =
(Xn=n Yn=n). We wish to apply Theorem 5.2. The rates ` of Section 5.2 are as
follows:
(1 0) (x y ) =
(;1 1) (x y ) = xy (;1 0) (x
(0 ;1) (x y) = x
y ) = ( + )y leading to the drift function F (x y) = (; xy + (1 ; x) xy ; ( + )y) : Exactly as in Section 5.5, we check that the conditions of Theorem 5.2 are ful lled.
Hence (Xn(t) Yn(t)) tends to (x(t) y(t)) in probability uniformly on compact time
intervals as n ! 1, where (x(t) y(t)) is the solution to the system of di erential
equations
dx = ; xy + (1 ; x)
dt
dy = xy ; ( + )y
dt with initial condition (x(0) y(0)) = (1 ). This system cannot be solved explicitly,
but we can understand its behaviour for large t by nding the stationary points, i.e.
the points where the time derivatives are zero.
Let us introduce two auxiliary parameters. The basic reproduction number is
given by R0 = =( + ), since the true infectious period is exponentially distributed
with intensity + , taking into account the possibility of death before recovery. Also,
let = ( + )= be the ratio of average lifetime to average duration of infection.
There are two stationary points, namely the diseasefree state (1 0) together with the
point
(^ y) = +
x^
1; +
= 1 R0 ; 1 :
+
R0 R0
The rst of the stationary points is stable for R0 < 1 and unstable for R0 > 1,
while the second one is stable for R0 > 1 (and is otherwise negative and therefore
uninteresting). In other words, if R0 < 1 the infection is predicted to die out fairly
quickly. On the other hand, if R0 > 1 then it will rise towards a positive infection
level, called the endemic level.
Central limit theorem Assume that R0 > 1 and suppose for simplicity that the process is started close to the
endemic level (nx ny). Since the process is positively recurrent and all states (i j )
^^
with j 1 communicate, the process will become absorbed into the set of diseasefree
states f(i 0) : i 0g in nite time. Prior to absorption we expect to observe small
uctuations around the endemic level. To examine the nature of these uctuations,
de ne
p;
~
~
Xn(t) Yn(t) = n Xn(t) ; x(t) Yn(t) ; y(t)
t 0:
We wish to apply the central limit theorem of Section 5.4. Using the notation of that
section, we have
^
;^
@F (^ y) = ; yy; x ; x; = R;R01 ;
x^
^
^
0
0;
and
^
^^
; ^^
20
R ; 1)
G(^ y) = (1 + x)x+ xy xy + ( xy )y = R ;(RR; 1) ;(R 0 ; 1) :
x^
; ^y
^
^^
+^
2( 0
0
0
Here we have expressed all quantities in terms of the new parameters R0 and , using
that x = 1=R0, y = (R0 ; 1)=( R0), = R0 and + = . It can be shown that
^
^
the matrix function (t s) of Theorem 5.3 splits into a product ~ (t) ~ ;1 (s), hence
the covariance matrix (t), t 0, satis es the di erential equation
d = @F (^ y) + (@F (^ y))T + G(^ y)
x^
x^
x^
(8.1)
dt by straightforward calculation using the variance expression of the theorem. We
~~
conclude that (Xn Yn) converges weakly on compact time intervals to a Gaussian
~~
process (X Y ) with mean zero and covariance matrix . In particular, an explicit
expression for the covariance matrix for large t is easily derived as the stationary
solution, ^ , to (8.1). By easy calculation,
^ = 12
R0 + R0 ;R0 ;R0 2
R0 ; 1 + R0 = : Time to extinction We still assume that R0 > 1. As already noted above, the time to extinction Tn = inf ft 0 : Yn(t) = 0g from the endemic level is a.s. nite, for any xed n. It is a classical (and very
di cult) problem, going back to Bartlett (1956), to obtain estimates of this random
variable. Not even the expected time to extinction is easily found. An approach
that at rst sight seems natural is to let the population size tend to in nity and
regard disease extinction as the result of a large deviation from a high endemic level.
Such an analysis is performed in van Herwaarden and Grasman (1995), who derive a
complicated asymptotic formula for E (Tn).
Nasell (1999) takes a quite di erent approach when deriving heuristically an approximate expression for E (Tn ). He notes that the coe cient of variation of the
number of infectious individuals in endemicity is given by pn ^ 22 pnpR0 ; 1 + R02=
ny
^ = R0 R0 n(R0 ; 1) pnpR0 ; 1 2
where the last approximation is due to the fact that
R0 . Indeed, if we assume
that the average lifetime is 80 years and consider a disease with an infectious period
of two weeks, say, then is about 2000. Thus, even if the population size is several
millions, the coe cient of variation above is still quite large. This observation suggests
that for a reallife disease in a homogeneously mixing community, extinction is likely
to be caused by a normal uctuation from a not so high endemic level, rather than
by a large deviation from a high level. Simulation results also show that, for realistic
parameter values, the Nasell (1999) formula gives a much better approximation to
the observed time to extinction than does the formula derived by van Herwaarden
and Grasman (1995).
Finally we mention the notion of critical community size. Needless to say, the
distribution of Tn depends on the parameters R0 , and as well as the population size n. The critical community size nc = nc(t p) with time horizon t and extinction
probability p is de ned as the solution n to the equation
P(Tn > t) = 1 ; p:
If p is small, this means that for community sizes larger than nc the infection is likely
to persist more than t time units. Nasell (1999) obtains estimates of the critical
community size for some sets of parameter values.
Below, two simulations of the SIR model with demography are shown in Figure
8.1. In both simulations n = 100 000, R0 = 10 and = 3750 (corresponding to
an infectious period of 1 week and average lifetime of 75 years) implying that the
equilibrium is at (nx ny) = (10000 24). The yaxis corresponds to the number of
^^
infectives and the xaxis to the number of susceptibles. In the top gure the disease
becomes extinct rather quickly, whereas it seems to become endemic in the lower
gure. 8.2 The SIS model
The stochastic SIS model for endemic infections is de ned as follows. Suppose that
we have a closed homogeneously mixing population consisting of n individuals. We
let m of these individuals become infectious at time t = 0. Each infectious individual
remains infectious for a time period that is exponentially distributed with parameter
. During that time the individual makes contact with a given individual at rate =n.
If a contacted individual is susceptible then she immediately becomes infectious. An
infectious individual becomes susceptible again right after the infectious period, in
contrast to the SIR epidemic models where the infectious period is followed by lifelong
immunity to the disease. All infectious periods and Poisson processes are assumed to
be mutually independent.
Since individuals are either susceptible or infectious, it is enough to keep track of
the number of individuals in the infectious state, say. Denote by Y (t) the number
of infectious individuals at time t. Then Y = fY (t) t 0g, can be described as a
simple continuous time birth and death process on the state space Sn = f0 : : : ng,
having the following transition rates:
from to at rate
i
i + 1 i(n ; i)=n
i
i;1 i Y (0) = m. Note that we are using the total population size rather than just the
number of initial susceptibles when scaling the infection rate. The reason is, of course,
that the initial infectives may become infectious several times over the course of the
epidemic and are thus taking a much more active part in the progress of the epidemic
than in the SIR case, where they were merely used to start up the process. 60
50
40
30
20
10
9850 9900 9950 10000 10050 10100 9850 9900 9950 10000 10050 10100 60
50
40
30
20
10 Figure 8.1: Simulations of the SIR model with demography for average population
size n = 100 000 and equilibrium point (10000 ,24). In the top gure the disease
became extinct quickly (absorbed by the xaxis) whereas the disease persisted for a
long time in the bottom gure. Law of large numbers Consider as usual a sequence Yn of epidemic models, all with the same infection rate
and recovery rate . If the proportion of initial infectives, mn=n, tends to a nontrivial limit , 0 < < 1, as n tends to in nity, we expect a law of large numbers for
the scaled process Yn = Yn=n to be valid. To prove this, we use Theorem 5.2. Using
the notation of Chapter 5, we have 1 (y) = y(1 ; y) and ;1(y) = y, so that the
drift function is given by
F (y) = y(1 ; y) ; y:
This function is easily seen to satisfy the condition of Theorem 5.2, hence we conclude
that jYn(t) ; y(t)j tends to zero in probability uniformly on compact time intervals
as n ! 1, where y(t) is the solution to the di erential equation dy = y(1 ; y) ; y
dt
y(0) = . The explicit solution is given by ;1 ; e( ; )t
y(t) = 1 ; + (e( ; )t ; 1) if 6= , while y(t) = =(1 + t) if = . The basic reproduction number is given
by = . We note that y(t) ! 0 as t ! 1 if = 1. On the other hand, if = > 1
then y(t) ! y = 1 ; = > 0 as t ! 1. In this case, the value ny is called the
^
^
endemic level of the process. Even though the deterministic motion y(t) is just an
approximation of Yn(t) valid within a xed nite time horizon, these results indicate
that the stochastic SIS model will behave very di erently depending on whether the
basic reproduction number is below or above 1.
Central limit theorem p In order to study the uctuations of the process Yn, we de ne the nscaled centered
process
p;
~
t 0:
Yn(t) = n Yn(t) ; y(t)
We restrict ourselves to the interesting case where = > 1 and the process is started
close to the endemic level, i.e. Yn(0) ! 1 ; = in probability as n ! 1. Let us
apply Theorem 5.3. We have @F (^) = ; ; 2 y = ;( ; )
y
^
G(^) = y(1 ; y) + y = 2 ( ; )
y
^
^
^
(t s) = e;( ; )(t;s) ~
and the theorem states that Yn converges weakly on compact time intervals to a
~
Gaussian process Y with mean zero and variance function Zt ~
Var(Y (t)) = 2 (
= ; ) 0 e;2( ; )(t;s) ds
;1 ; e;2( ; )t : The corresponding formulas become more complicated if the process is started away
from equilibrium, or in the case where the basic reproduction number is below 1.
Time to extinction We proceed to study the time Tn to extinction of the process Yn, Tn = inf ft 0 : Yn(t) = 0g: Since Yn is irreducible on the set Sn n f0g and has the state f0g as absorbing state,
the time to extinction is a.s. nite. It is of considerable interest to understand what
happens to Tn as the population grows to in nity. The theorem below shows that the
time to extinction for the stochastic model is a quantity that re ects the threshold
behaviour of the deterministic process. In the case
the time to extinction is
always relatively short, while in the case > it may be astronomical.
Before stating the result we need a de nition. We say that an and bn are asymptotically equivalent as n ! 1, an bn , if the quotient an=bn tends to unity as n
becomes large. Theorem 8.1 The time Tn to extinction of the stochastic SIS model has the following
asymptotic properties: (A1) > and mn=n ! > 0 as n ! 1 : Tn=E (Tn) ! U in distribution, where
U is exponentially distributed with parameter 1. Moreover,
E (Tn ) r 2
n( ; e
)2 as n ! 1, where V = log( = ) ; 1 + = > 0. nV (8.2) (A2) > and mn = m 1 for all n : Tn ! T a.s. where P(T < 1) = ( = )m < 1.
Here T is the extinction time for a linear birth and death process with birth rate
i, death rate i and with m individuals in the beginning. On the set where T
is in nite, Tn =E (Tn ) ! U in distribution, with U and E (Tn) as above. (B1) and mn=n ! > 0 as n ! 1 : We have that ( ; (1 ; )) Tn ; log n ; log ; log 1 ; (1 ; ) !W in distribution, where W has the extreme value distribution: P(W w) = expf;e;w g: (B2) and mn = m
P(T < 1) = 1. 1 for all n: Tn ! T a.s. with T as in (A2), but now The theorem states the following: (A1) If the basic reproduction number R0 = =
is above 1 and the number of initial infectives is large, then the time to extinction
Tn grows exponentially with the population size. (A2) If R0 > 1 and we start with a
small number of infectives, then a kind of threshold phenomenon presents itself. On
one part of the sample space Tn stays small, on the other part Tn grows exponentially.
(B1) If R0 1 but the number of initial infectives is large, then Tn behaves like log n.
(B2) Finally, if R0 1 and there is a small number of infectious individuals in the
beginning then the time to extinction is always small.
Proofs and references are given in Andersson and Djehiche (1998). Below we
indicate the proof of the result in the most interesting case (A2). (A1) and (A2) fall
in the class of results on the exponential limiting distributions of rst passage times
of birth and death processes into rare sets of states, see e.g. Keilson (1979). The
excellent book by Aldous (1989) gives a heuristic treatment of related topics. The
results (A2) and (B2) partly rely on the coupling argument given below.
The paper by Nasell (1996) is mainly concerned with the socalled quasistationary
distribution of the process Yn, de ned by Qn(i) = tlim P (Yn(t) = i j Yn(s) 6= 0 0 s t)
!1 i = 1 2 ::: n but interesting results concerning the time to extinction are also obtained. Nasell
notes that, for each n, Tn is exponentially distributed with parameter Qn(1) if the
initial distribution of the process equals Qn , and then uses asymptotic expansions in
order to estimate 1= Qn(1). In the situation (A1), and only there, whether the process
is started from our prescribed xed value or from the quasistationary distribution
is of no importance for the asymptotic results, and indeed, in this case 1= Qn(1) is
asymptotically equivalent to the right hand side of (8.2). For a nice treatment of
quasistationary distributions, see van Doorn (1991).
We now sketch a proof of (A2). By means of a simple coupling we show that, if the
initial number mn of infectives stays constant as n ! 1, then the epidemic process
resembles a linear birth and death process at the beginning of the time development. De ne Z (t), t 0, as a continuous time birth and death process with the following
transition table:
from to
at rate
j
j+1 j
j
j;1 j Z (0) = m. It is wellknown that the time to extinction T for Z satis es P(T <
1) = ( = )m if > . The distribution of T ^ ^be given in a closed form (see e.g.
can
Syski, 1992). Now de ne a bivariate process (Yn Zn) with transition intensities
from to
at rate
(i j ) (i + 1 j + 1) i(n ; i)=n
(i j + 1)
j ; i(n ; i)=n
(i ; 1 j ; 1) i
(i j ; 1)
j; i
^
^
and initial value (Yn(0) Zn(0)) = (m m). It is clear that the marginal distribu^
tions coincide with the distributions of Yn and Z , respectively. Moreover, Yn(t)
^
Zn(t) for all t 0. Clearly the approximation breaks down as soon as the bivariate
process leaves the diagonal. By investigating the jump rates we see that downward
jumps from the diagonal are impossible while upward jumps are very rare as long as
the states (i i) have i2 n. This implies that the two coordinates will stick together
during the whole epidemic on that part of the sample space where the time to extinc^
^
^
tion of Zn is nite. On the other hand, if Zn explodes, then one can show that Yn
reaches the endemic level ny (or rather the integer closest to this number) with high
^
probability.
To study the time to extinction given this latter event, we write Tn = An + K
X
n k=1 Bn(k) + Cn where An = inf ft 0 : Yn(t) = 0 or Yn(t) nyg, Kn is the number of completed
^
excursions from the endemic level, Bn(k) is the length of the kth completed excursion
and Cn measures the time to reach the absorbing state counted from the time of
the last entrance to the endemic level. Of course, Kn = 0 and Cn = 0 if Yn never
reaches the endemic level. By the strong Markov property, the variables Bn(k) are
independent and identically distributed. Also, the number of completed excursions
Kn has a simple geometric distribution with parameter n, say.
The following technical results are proved in Andersson and Djehiche (1998). First,
the probability n of absorption during an excursion satis es
= ; 1 e;nV as n ! 1
p
(8.3)
n
2= with V = log( = ) ; 1 + = > 0.
Also, the expected excursion lengths satisfy E (Bn(k)) r p = as n ! 1:
2
n 2( ; ) Finally, for any C > 0 we have
An ! 0 and Cn
enC
enC
in probability as n ! 1.
Using (8.5), we may write Tn
E (Tn) !0 (8.4) (8.5) K
K
X
1
Kn 1 X Bn(k) :
E (Kn)E (Bn(1)) k=1 Bn(k) = E (Kn) Kn k=1 E (Bn(k))
n n Condition on the event that the approximating linear birth and death process explodes. Then the normed sum to the right tends to 1 in probability by a version of
the law of large numbers, and it is easily checked that Kn=E (Kn) tends to an exponentially distributed random variable. Also, since the expectation of Tn is asymptotically
equivalent to E (Bn(1))= n, Equation (8.2) follows from (8.3) and (8.4). Exercises
8.1. Derive the di erential equation (8.1) for the covariance matrix (t), t 0, of the SIR epidemic with demography. Also derive the expression for ^ . 8.2. Find the endemic level (^ y) for the normed SIRS epidemic (Xn(t) Yn(t)) (see
x^ Exercise 5.3). 8.3. SIRS epidemic model (continued). Suppose that the epidemic process is started at the endemic level (i.e. (Xn(0) Yn(0)) = (^; y)). Apply Theorem 5.3 to derive a
px ^
p
~
~
central limit theorem for (Xn(t) Yn(t)) = n Xn(t) ; x Yn(t) ; y , the nscaled
^
^
centered process. and mn=n ! > 0 as n ! 1. Give
an approximation of the time until all the initial infectives have recovered from their
rst encounter with the disease, thus providing an elementary lower bound for Tn. 8.4. Consider the SIS epidemic with Part II ESTIMATION
Until now, these lecture notes have been concerned with modelling the spread of
disease in a community. Such modelling can be interesting in its own right and certain
characteristics of the models, such as the threshold limit theorem, can provide deeper
insight into the spread of disease. However, another reason for modelling epidemics is
to draw conclusions about particular diseases, for the guidance of health authorities.
In the sections to follow, we therefore examine the equally important area of drawing
inferences about model parameters. Inference in epidemics is nonstandard in that
the epidemic process is often only partly observed. The available data usually consists
of knowledge of those infected during the course of the epidemic, and sometimes at
what time points these individuals exhibited symptoms.This implies that statistical
analysis has to be based on partial observation, which, together with the strong
dependencies in epidemic models, studied in Part I, makes analysis of the simplest
models quite complicated.
In Chapter 9 we assume that the epidemic is observed completely, meaning that
for each individual we know if she was infected or not, and if she was, the times of
infection as well as removal (i.e. recovery and immunity). Such detailed data is in
practice hard to collect, at least for large communities. Still, statistical methods for
such data can be of interest as an indication of how much better estimates would be
if such data had been available. If the answer turns out to be `very much better' then
e orts should be made to collect the necessary data.
In Chapter 10 we treat more realistic types of partial data. We concentrate on the
case where the nal state is observed and brie y discuss the case where the removal
times of infected individuals are also observed. The statistical methods treated in
Chapters 9 and 10 focus on the standard SIR epidemic model however, methods to
extend the techniques to more realistic assumptions which take account of individual heterogeneities in the community and nonuniform mixing patterns, due to the
presence of social networks, are also discussed. When allowing for di erent heterogeneities, statistical analysis quickly becomes cumbersome, particularly in the realistic
case where the epidemic is only partially observed. A di erent approach to statistical
analysis is then to make use of Markov Chain Monte Carlo methods. This technique,
which has yet to be fully developed for infectious disease data, is presented in Chapter
11.
Chapter 12 focuses on the estimation of parameters most relevant for health authorities. More important than knowing the values of certain model parameters is
their interpretation in terms of how to avoid future epidemics by the introduction of
preventive measures such as vaccination. More speci cally, we focus on estimating the critical vaccination coverage, that is, the proportion of the susceptible population
which it is necessary to vaccinate in order to prevent future outbreaks.
In Part I, we observed that if there are only few initially infectious individuals
the epidemic may not take o . When drawing inferences, we can only hope for
consistency in our estimation if we observe a major outbreak. Thus, one has to
condition on the occurrence of a major outbreak. Rather than doing this, we take
the easy way out and assume that the proportion of initially infectious individuals
is positive ( > 0). In the rst part of the text it was assumed that no individuals
were initially immune. In reality this is rarely true: quite often there are immune
individuals, possibly due to a previous outbreak of the disease. The same techniques
and similar results also apply to the case with initially immune individuals, by a slight
transformation of parameters. In Part II of the text we retain the assumption of no
initially immune individuals. Conclusions about the parameters will be misleading
if this assumption does not hold. In Exercises 9.4 and 12.2 we give hints on how
to modify estimation in the case of a known positive proportion of initially immune
individuals. Another problem not discussed in the text is that of model choice we are
not comparing di erent models statistically nor checking if data is consistent with the
general model. In the present notes, parameter estimates are derived on the implicit
assumption that the model is true.
We concentrate on presenting ideas and techniques for estimation, rather than
actually applying the methods to real data. The main references for statistical methods for epidemics are Becker (1989) and Anderson and May (1991), with the latter
excluding uncertainty considerations. The earlier monograph on modelling by Bailey
(1975) also treats statistical inference. See also the recent survey paper by Becker
and Britton (1999). 9 Complete observation of the epidemic process
In this chapter we assume that the standard SIR epidemic process En m( I ) is observed completely. By complete observation is meant that the infection times i and
removal times i (hence also the length of the infectious period Ii = i ; i) of all
infected individuals are observed for individuals who did not become infected we
de ne i = 1 and Ii = 0 for formulas to be consistent. From the data it is obviously
possible to deduce how many individuals are susceptible, infectious (and removed)
for each time t, implying that (X Y ) = f(X (t) Y (t)) t 0g as well as the nal size
Z = n ; X (1) are observed. Based on the observed data we want to draw inferences on the transmission parameter and the distribution of the infectious period I .
We do this by means of Maximum Likelihood (ML) theory. First, we have to make
the meaning of `likelihood' clear for such epidemic processes. Hence, in Section 9.1
we derive the loglikelihood for a vector of counting processes, and also state some
properties of martingales which will turn out useful in the statistical analysis. 9.1 Martingales and loglikelihoods of counting processes
An excellent reference to statistical methods for counting process data is Andersen et
al. (1993), where both theory and many examples are given. In the present section
we state some results and motivate them heuristically. Consider a vector of counting
processes N = (N1 : : : Nk ), where each component Ni(t) counts the number of times
a speci c event has occurred up to time t. Assume that the probability that such
an event occurs in (t t + h] given the history of the whole vector process up until t,
denoted Ht, satis es P(Ni(t + h) ; Ni(t) = 1 jHt) = h i(t)Xo(h) i = 1 : : : k
+
P(N (t + h) ; N (t) = 0 jHt) = 1 ; h i(t) + o(h):
i (9.1) In the equations above i (t), denoted intensity functions, are nonnegative functions
which may depend on the history of the process up until time t. Formally this
is written as i (t) 2 Ht and (Ht )t 0 is called a ltration of  elds. To go into
mathematical details is beyond the scope of these lecture notes { often, it su ces to
understand their meaning heuristically. The equations above imply that, in a short
time interval, only one of the k di erent events can occur (i.e. the corresponding
counting process increases by 1). Notice the similarities between (9.1) and the jump
rates (5.1) of Chapter 5. The intensities above are more general in that they may
also depend on the process prior to t (i.e. the process is not necessarily Markovian),
and less general in that only one type of jump (increase by 1) can occur.
What is the loglikelihood of the observations on fN (t) 0 t u)g for this process? We derive this heuristically by dividing (0 u) into small increments t of
length h and working out the probability of observing what happens in the increment,
conditional on the history prior to the increment. Let t = t t + h) and let Ni(t) =
Ni(t + h) ; Ni (t) be the number of ievents that occur in this interval. Then the
probability of our observations can be written as 0
Y @Y
(h i(t)) t (0 u) 1;h Ni (t) i X
i !1;P i (t) i Ni (t) 1
+ o(h)A : The two factors within the large brackets above simply pick out which of the probabilities to use, depending on what was observed in t. As the increment length h
becomes smaller, the number of increments increases and all but the increments containing events appear in the second factor above. For small h this factor is equivalent
P
to exp (;h i i(t)). The expression above decreases (except if no events have occurred) because of the h's appearing in the inner product. This is not surprising: the
probability of observing the events exactly at some given time points is 0! However,
if we divide the expression above by h to a power equal to the observed number of
jumps, and let h tend to 0, it can be shown that the logarithm of the expression above
converges to `u(N (t) 0 t u) = XZ u
0 i (log i(t;)]dNi (t) ; i(t;)dt) (9.2) and this is the loglikelihood of the observed process, with i(t;) denoting the intensity just before t. In the expression above dNi(t) is 1 at times t when the counting
process increases, and 0 otherwise, so the rst integral is actually a sum.
The loglikelihood is used for drawing inferences about parameter(s) of the intensity functions. From now on, we therefore write i( t;) and `u( ) for the intensities
and the loglikelihood respectively. Below a one dimensional parameter is considered,
otherwise partial derivatives should be used. The true, unknown parameter value is
denoted by 0 . Further, all derivatives will be with respect to the parameter and not
time. If we have observed the counting process up until u, the MLestimator ^u is
the solution which maximizes `u( ). We nd the maximum by di erentiating the
loglikelihood: `0 (
u )=
= X Z u; 0i( t;)
dNi(t) ;
i ( t;)
i Z0
X u 0 ( t;)
i 0 i
i( 0(
i t;)dt t;) (dNi(t) ; i( t;)dt) : (9.3) The process above, viewed as a function of u is often called the score process in
statistical literature. The MLestimator ^ is the solution of the equation `0u( ) = 0. For the true parameter 0 , i.e. using the true intensities, the integration increments
dNi(t) ; i( 0 t;)dt have zero mean (cf. Equation 9.1). Such zeromean increments
are closely connected to martingales, a notion we now describe.
A process M = fMu u 0g is said to be a martingale (with respect to the history
Ht) if it satis es two conditions: E(jMuj) < 1 for all u, and E (MujHt) = Mt for
u t. The theory of martingales is rich and profound a rigorous presentation is
given in Andersen et al. (1993), Chapter II, and a less mathematical description
with a view to applications in epidemics is to be found in Becker (1989), Chapter 7.
We now state some properties of martingales relevant for our application (here we
only consider 1dimensional martingales, but many results can be extended to the
multivariate case). The single most important property of a martingale M is the
second de ning property, namely that its expected future value equals its present
value. In our case the martingale will be `u( 0 ) which is 0 at u = 0, implying that
~
E (`u( 0 )) = 0 for all u. Further, a linear combination aM + bM of martingales is
a martingale. If a bounded predictable process, i.e. a bounded process X for which
X(t) is determined by Ht; (the history up to but not including t), tis integrated with
R
~
respect to a martingale M , then the resulting process M (t) = 0 X (u)dM (u) is a
martingale.
From the properties stated above it follows that the score process f`u( 0 ) u 0g,
evaluated at the true parameter value 0 , is a martingale. For many martingales it is
possible to prove central limit theorems as some quantity tends to in nity. In addition
to the mean we therefore need Rto derive the variance of a martingale. Consider
a martingale of the form Mu = 0u f (t)(dN (t) ; (t;)dt), where N (t) is a counting
process with intensity (t) and f is a predictable process (as in our case above). De ne
R
the socalled optional and predictable variation processes by M ]u = 0u f 2 (t)dN (t)
R u f 2(t) (t;)dt respectively. The following lemma explains why they
and M u = 0
are called variation processes. Lemma 9.1 Suppose the martingale M de ned above has nite second order moments. Then Var(Mu) = E M u ] = E M ]u ]. R Proof. The second equality is immediate since M ]u ; M u = 01 f 2(t)(dN (t) ;
(t;)dt) is itself a martingale, so its expectation is 0. The rst P
equality is derived
as follows. The de ning integral of Mu is the limit of the sum
f (t)( N (t) ;
(t)h) using the same notation as before. The variance of such a sum is the sum of
covariances. Each such covariance term is equal to the expectation of the product
since
t E f (t)( N (t) ; (t)h)] = E E f (t)( N (t) ; (t)h)jHt]]
= E f (t)E N (t) ; (t)hjHt ]]
= 0: This follows because the increments have mean 0. Consider a covariance term with
disjoint time intervals s and t where s + h < t. Then
E f (s)( N (s) ; (s)h)f (t)( N (t) ; (t)h)]
= E E f (s)( N (s) ; (s)h)f (t)( N (t) ; (t)h)jHs+h]]
= E f (s)( N (s) ; (s)h)E f (t)( N (t) ; (t)h)jHs+h]]
= 0:
The covariance between terms of disjoint time intervals is hence 0. Now the diagonal
terms, i.e. the variance terms, are
Var f (t)( N (t); (t)h)] = E f 2 (t)( N (t); (t)h)2 ] = E f 2(t)E ( N (t); (t)h)2 jHt ]]:
The factor inside the inner expectation takes the value (; (t)h)2 if there is no jump in
the interval and the value (1 ; (t)h)2 1 if there is a jump, and the latter happens
with probability (t)h. Neglecting terms of order o(h) we thus have E ( N (t) ;
(t)h)2 jHt;] = (t)h and consequently the variance term equals E f 2(t) (t)h]. The
variance of the sum hence equals the sum of variances:
X
X
Var Mu ] = E f 2 (t) (t)h + o(h)] = E f 2(t) (t)h + o(h)]:
t t R Taking smaller and smaller intervals shows that this converges to E 0u f 2(t) (t;)dt]
as stated.
Using the same technique one can prove the following lemma stating that martingales from counting processes with no simultaneous jumps are uncorrelated. Lemma 9.2 Consider two martingales M1 and M2 de ned by
M1 (u) =
M2 (u) = Zu
f1 (t)(dN1(t) ; 1(t;)dt)
0
Zu
0 f2 (t)(dN2(t) ; 2(t;)dt) where N1 and N2 are counting processes without simultaneous jumps having intensities
1 and 2 respectively and f1 and f2 are predictable processes. Then the covariance
function satis es Cov(M1(u) M2 (t)) = 0 for all u t 0.
Proof. The lemma is proven using methods similar to those in the proof of Lemma
9.1. Hints are given in Exercise 9.1.
We conclude this section by stating a martingale central limit theorem due to
Rebolledo (1980) (see also Andersen et al., 1993, p 83). For this we study a sequence
of martingales M (n) , indexed by n. Let M (n) be a martingale containing all jumps
of M (n) larger than in absolute value, and let the function m(u) be a continuous
increasing deterministic function. P
Theorem 9.3 Assume that M (n) ]u!m(u) or M (n) P
!m(u) and that for each >
P
0, M (n) (u)!0 for all u. Then M (n) ) M (1) , where M (1) is a continuous Gaussian
martingale with M (1) u = M (1) ]u = m(u). In particular M (1) (t) ; M (1) (s) is
Gaussian with mean 0 and variance m(t) ; m(s) for t s.
u We omit the proof of the theorem. In the Markovian case it can be proven using
methods similar to those presented in Chapter 5. 9.2 MLestimation for the standard SIR epidemic
Assume now that we observe En m( I ) up to some time u, that is we observe all
infection times i and removal times i that have occurred up until u. De ne N1(t) =
n ; X (t), a counting process for the number of infections that have occurred in (0 t],
and N2 (t) = n + m ; (X (t) + Y (t)) a counting process for the number of removals
in (0 t] (remember that X (0) = n and Y (0) = m = n). The rst counting process
has intensity ( =n)X (t;)Y (t;). The counting process N2 has a more complicated
intensity. If F denotes the distribution ofP infectious period I and f its density
the
function then the intensity function at t is i f (t ; i )=(1 ; F (t ; i )), where the sum
extends over individuals which are infectious at t;. Applying results of the previous
section and manipulating the integrals for the counting process N2 it follows that the
loglikelihood is given by `u( F ) = Zu + (log ( =n)X (t;)Y (t;)]dN1 (t) ; ( =n)X (t;)Y (t;)dt) X 0
i i u< log(1 ; F (u ; i)) + i X i i u log f ( i ; i ): (9.4) To estimate we di erentiate the loglikelihood with respect to
@ ` ( F ) = Z u 1 dN (t) ; X (t;)Y (t;)dt = N1 (u) ; Z u X (t;)Y (t;)dt
1
@u
n
0
0
where, as before, X (t) = X (t)=n. (Whenever integration is with respect to Lebesgue's
measure (dt), as on the far right above, one may replace the argument `t;' by `t', this
is done in what follows.) Solving the likelihood equation @@ `u( F ) = 0 thus gives
the MLestimate
^ = R u N1 (u) :
(9.5)
0 X (t)Y (t)dt
Estimating properties of the infectious period can be done parametrically or nonparametrically. In particular, if the epidemic is observed to the end, u = T =
inf ft Y (t) = 0g, then the rst sum in (9.4) is empty and inference is based on the observed lengths of the infectious periods Ii among individuals who were infected.
For example, = E I ] is simply estimated by the observed mean I .
For the rest of this section we treat the Markovian version of the standard SIR
model, i.e. assuming that the length of the infectious period is exponentially distributed with mean length 1= . The reason for doing this is that convergence results
can be derived from the theory presented in Chapter 5, but most results are true for
the general case. For the Markovian case the intensity for N2(t) is simply Y (t) (each
infectious individual is removed at constant rate ). The derivative of the likelihood
with respect to then becomes
@ ` ( ) = Z u 1 (dN (t) ; Y (t)dt) = N2(u) ; Z u Y (t)dt:
2
@u
0
0
Consequently, the ML estimator is
N
^ = R u 2(u) :
(9.6)
0 Y (t)dt
A more natural parameter to estimate is 1= , the expected length of the infectious period, which is of course estimated by the inverse of the estimator above. In particular,
R
if the epidemic is observed to the the end u = T the estimator is 0T Y (t)dt=N2(T ),
and this is simply the aggregated length of all infectious periods divided by the number who were removed. The estimator is hence simply the arithmetic mean of the
observed infectious periods.
We will now prove that the MLestimators are asymptotically normal. First we
apply the martingale central limit theorem to the score processes. Lemma 9.4 Consider the Markovian version of the standard SIR epidemic with
initial values X (n)(0) = n and Y (n) (0) = n. De ne the normed score processes
(
(
S1n) (u) = n;1=2 @@ `u( ) and S2n) (u) = n;1=2 @@ `u( ), evaluated at the true parameter values ( 0 0). Then
(
S1n) ) S1 and (
S2n) ) S2 where S1 and S2 are Gaussian martingales. The covariance function of S1 and S2,
denoted v1 and v2 respectively, are de ned through the deterministic limiting functions
x(t) and y(t) of Section 5.5, and are given by v1(u) =
v2(u) =
respectively. ;2
0
;2
0 Zu
0 x(t)y (t)dt =
Z0 u
0 ;2 (1
0 ;2
0 y (t)dt = 0 (1 + ; x(u))
; x(u) ; y(u)) Proof. The result follows immediately from the martingale central limit theorem
(Theorem 9.3). The rst assumption of the theorem was shown in Section 5.5, and
since the size of all jumps are n;1=2 for the normed process, there will be no jumps
larger than for n large enough.
We are now able to prove the main result of this chapter, a result originally given
by Rida (1991).
Theorem 9.5 The MLestimators ^ and ^ are asymptotically Gaussian with asymptotic means 0 and 0 (the true parameter values) respectively. The asymptotic variance of ^ is 1=nv1(u) and for ^ the variance is 1=nv2(u). Consistent estimates of the
;
standard errors of ^ and the more natural parameter ^ 1 = 1=^ (the average length
of the infectious period) are
p
s.e. ^ = ^= N1 (u) and
;
s.e. ^ 1 p
;
= ^ 1= N2(u): For the case where the epidemic is observed to its end (u = T ) the deterministic
integrals appearing in the variance functions v1 and v2 may be replaced by ; and
respectively ( was de ned in Chapter 4). Similarly, the square root expressions for
the standard errors then becomes Z and Z + n respectively.
Proof. We prove the statement for ^ , the proof for ^ being similar. The estimator ^
is the solution to the likelihood equation @@ `u( ) = 0. Multiplying the likelihood
p
equation by 0= n gives
1 N (u) ; Z u ^X (t)Y (t)dt = S (n) (u) + p ( ; ^) Z u X (t)Y (t)dt:
1
0= p
1
01
0 n
0
Rearranging the formula yields pn ^ ; n 0 (
S1n) (u)
:
(9.7)
0 = ;1 R u
0 0 X (t)Y (t)dt
(
According to Lemma 9.4 S1n)(u) converges to a Gaussian random variable with mean
R
0 and variance v1 (u). Further, by Theorem 5.2 ;1 0u X (t)Y (t)dt converges in proba0
R
bility to 0u x(t)y(t)dt = v1 (u). Hence, by Slutsky's theorem it follows that ^ is asympR
totically normal with prescribed mean and variance. That 01 0 x(t)y(t)dt = ;
is easily shown from the deterministic equations in Section 5.5. Finally, N1 (u) also
R
converges in probability to 0u 0 x(t)y(t)dt using Theorem 5.2, from which it follows
that the standard error is consistent.
Of course, the basic reproduction number 0 = 0= 0 is estimated by ^ = ^=^,
where ^ and ^ are de ned by (9.5) and (9.6) respectively. This estimator is also
asymptotically Gaussian as the following corollary shows. Corollary 9.6 The estimator ^ = ^=^ is asymptotically Gaussian with mean 0 and
;
;
2;
2
variance v1 1 (u) + 0 v2 1(u) =(n 0 ). If the epidemic is observed to its end (i.e. u =
q
^) = ^ ;Z ;1 + (Z + );1 =n.
T ) then a consistent standard error is given by s:e:( Proof. The proof consists of two steps. First we argue that ^ and ^ are asymptotically
independent. The second part, left for Exercise 9.2, consists of applying the method,
i.e. expanding ^ and ^ around their true values 0 and 0, by Taylor's theorem, and
using their independence. To show that ^ and ^ are asymptotically independent we
p;
(
;
note that, by de nition n ^;1 ; 0 1 = S2n) (u)=N2(u) and a similar expression for
pn ^ ; 0 is given in Equation (9.7). The denominators in the two expressions
(
(
converge in probability to deterministic limits, and S1n) and S2n) are uncorrelated
due to Lemma 9.2 which proves asymptotic independence.
Statistical analysis for various extensions of the standard SIR model, assuming
complete observation, has not received much attention in the literature. One reason
for this is that the epidemic process is rarely observed in such detail. Another reason is
that for many models (e.g. most network models) stochastic properties of the epidemic
process are unknown, only properties of the nal state having been derived. Two
exceptions are Rhodes et al. (1996) and Britton (1998a). The former considers a
more detailed model for a homogeneous community and assumes even more detailed
data information about actual contacts is known. For such data Rhodes et al. (1996)
derive estimation procedures for various parameters, including the probability that
a contact leads to infection. Britton (1998a) derives estimation procedures based
on complete data for the Markovian version of the multitype epidemic model (see
Chapter 6). It is shown that the infectivity of a given type can only be estimated
consistently if the corresponding susceptibility di ers from all other susceptibilities. Exercises
9.1. Prove Lemma 9.2. (Hint: Use the same technique as in the proof of Lemma 9.1, i.e. to divide the interval into small disjoint intervals of length h and show that the
conditional expectation of each term is 0, the diagonal terms as well! To be precise
the terms are of order o(h2).) 9.2. Prove the remaining part of Corollary 9.6. (Hint: Apply the method together with the results of Theorem 9.5 and independence to derive the asymptotic variance
of the estimator ^.) 9.3. Table 9.1 on the next two pages describes an outbreak of smallpox in a Nigerian village (taken from Table 6.1 p 112113, Becker, 1989). For each day the number of
susceptibles and infectives are given as well as the number of infections that occurred
during that day. (Note that the number of infectives does not increase as soon as an infection occurs. The reason for this is that individuals are latent for some time
before turning susceptible. This does not a ect estimates.) Assume this to be an
outbreak from the Markovian version of En m( I ).
a) Estimate the infection parameter , the mean length of the infectious period ;1
and the basic reproduction number = = .
b) Give standard errors for the estimates. 9.4. Suppose that instead of having X (0) = n we have X (0) = sn, meaning that only a proportion s of those not initially infectious are susceptible and the remaining
are immune (if the number of removed is denoted Z (t) we hence have Z (0) = (1 ; s)n
under the present assumption). What is the e ect on the MLestimator ^ and its
variance? Answer the question by considering two communities having the same
numbers of infectives and susceptibles initially and all through the epidemic, only one
community having no initially immune and the other community having, say, half as
many initially immune as initially susceptible. (Hint: Note that the two communities
have di erent n = X (0) + Z (0)). Table 9.1. The spread of smallpox in a Nigerian village (continued)
Number Number Number
Number Number Number
Time infectious suscept. infected Time infectious suscept. infected
t
I (t)
S ( t)
C (t)
t
I (t)
S (t)
C (t)
0
1
119
1
22
1
111
1
1
1
118
0
23
2
110
0
2
1
118
0
24
2
110
0
3
1
118
0
25
2
110
1
4
1
118
0
26
5
109
0
5
1
118
0
27
6
109
2
6
1
118
0
28
5
107
0
7
1
118
1
29
5
107
2
8
1
117
0
30
4
105
0
9
1
117
1
31
5
105
0
10
1
116
0
32
5
105
0
11
1
116
0
33
2
105
0
12
1
116
3
34
1
105
1
13
1
113
1
35
1
104
0
14
1
112
0
36
2
104
0
15
1
112
0
37
2
104
1
16
1
112
0
38
1
103
1
17
1
112
1
39
2
102
0
18
1
111
0
40
2
102
0
19
1
111
0
41
4
102
0
20
1
111
0
42
4
102
2
21
1
111
0
43
5
100
1 Number Number Number
Number Number Number
Time infectious suscept. infected Time infectious susceptib. infected
t
I (t)
S (t)
C (t)
t
I (t)
S (t)
C (t)
44
5
99
1
64
5
90
0
45
5
98
1
65
4
90
0
46
4
97
0
66
3
90
0
47
4
97
2
67
5
90
0
48
3
95
1
68
3
90
0
49
3
94
0
69
2
90
0
50
1
94
0
70
2
90
0
51
2
94
0
71
2
90
0
52
3
94
0
72
3
90
0
53
3
94
2
73
3
90
0
54
3
92
0
74
1
90
0
55
2
92
0
75
1
90
0
56
4
92
0
76
1
90
0
57
5
92
0
77
2
90
0
58
5
92
1
78
2
90
0
59
5
91
0
79
1
90
0
60
5
91
0
80
1
90
0
61
7
91
0
81
1
90
0
62
8
91
0
82
1
90
0
63
6
91
1
83
1
90
0 10 Estimation in partially observed epidemics
In the previous chapter, statistical analysis was based on what we called complete
observation of the epidemic process both the times of infection and removal (recovery)
for all infected individuals were observed. In real life such detailed data is rarely
available. In the present chapter we study estimation procedures for less detailed
partial data. The likelihood for partial data is usually cumbersome to work with,
being a sum or integral over all complete data sets resulting in the observed partial
data. Other estimation techniques can turn out to be much simpler. Below we present
two general techniques which have been successful in several epidemic applications:
martingale methods and the EMalgorithm.
The two techniques are applied to a speci c estimation problem. First, we use
martingale methods to derive an estimator for the basic reproduction number in the
standard SIR epidemic, assuming that only the initial and nal states are observed.
Second, we consider a discrete time version of the models presented previously, introducing chainbinomial models, which are interesting in their own right. Assuming
that the chains are observed we make use of the EMalgorithm to deduce an estimator of the transmission parameter. It is worth pointing out that pure likelihood
methods can be applied in several cases, even when only partial data is available.
If, for example, several small subpopulations (e.g. households) are observed, then
MLestimation is often possible from nal size data. Estimation then consists of traditional MLtheory with the additional ingredient of potential numerical problems
due to the complicated form of the likelihood (cf. the exact likelihood in Section 2.4).
Addy et al. (1991) present estimation procedures for nal size data of subpopulations,
allowing for a heterogeneous community and also for transmission from outside the
subpopulation. The ideas extend the early work by Longini and Koopman (1982)
who treat a homogeneous community and a special form of transmission dynamics. 10.1 Estimation based on martingale methods
The standard SIR epidemic En m( I ) was de ned in Section 2.1. Suppose that we
have collected data from such an epidemic and that data consist of knowing how many
individuals have been infected during the course of the epidemic, besides knowing
the initial state (i.e. the number of initially susceptible and infectious individuals
respectively). In many cases the initial state may not be known. The initial number of
infectives is usually small thus not causing much uncertainty the substantial problem
lies in knowing how many individuals are initially susceptible to the disease. Quite
often some individuals are immune even before the disease is introduced, perhaps
due to an earlier outbreak of the same or a similar disease. Then the number of
susceptibles has to be estimated prior to the outbreak, using traditional statistical
methods. Here we assume this number to be known thus neglecting such uncertainty. Our data consists of n = X (0), n = Y (0) (the initial state) and the nal number
infected Z = n ; X (T ), and implicitly knowing that Y (T ) = 0 since there are are
no infectious individuals at the end of the epidemic. As observed in Section 2.4 the
exact distribution of the nal size of the epidemic has a numerically complicated form,
thus not enabling much help in estimation when n is large. One way to estimate is
of course to use the central limit theorem stating that Z is approximately normally
distributed and equating the mean to the observed value Z . Here we take a di erent
approach based on martingale methods, a method applicable in many other cases.
From Chapter 9 we know that Zu
0 ;dN (t) ; X (t)Y (t)dt ; Z u g(t) (dN (t) ; Y (t)dt)
f (t) 1
0
2
0
0 is a martingale for any choice of predictable (leftcontinuous) processes f = ff (t) t
0g and g = fg(t) t 0g. The MLestimator ^ was obtained by equating the martingale to 0 for the choice f (t) = 1= and g(t) = 0, and ^ from f (t) = 0 and
g(t) = 1=R u. This method can no longer be used in the case of nal size data: for
example 0 X (t)Y (t)dt appearing in ^ is not observed. However, the same idea of
equating a martingale to its mean, a special form of the method of moments, can be
adopted by choosing f and g cleverly so that the resulting martingale only depends
on observed quantities. We want to choose f and g such that the `dt' terms cancel
out. This happens if f (t) = 0=( 0X (t;)) (note that f (t) is leftcontinuous) and
p
g(t) = 1. Let M (n) denote the resulting zeromean martingale divided by n and let
0 = 0 = 0 be the basic reproduction number. That is, pnM (n)(u) 1 ;dN (t) ; X (t)Y (t)dt ; Z u (dN (t) ; Y (t)dt)
=
1
0
2
0
0
0 Z 0 X (t;)
Zu
u1
=1
dN1(t) ; dN2(t)
0 0 X (t;)
0
n 1+ 1 + +
1
=
(10.1)
n ; (N1(u) ; 1) ; N2 (u):
0 n n;1 Zu The last equality is true since X (t) = n ; N1(t). For u = T , the end of the epidemic,
its value is determined by the nal size data since N1 (T ) = Z and N2 (T ) = n + Z .
Solving the equation M (n) (T ) = 0 leads to the estimator
1
1
1
^ = n + n;1 + + n;(Z ;1) :
Z+ (10.2) To see that the estimator is reasonable, we look at its large population approximation
using the approximation 1=n +1=(n ; 1)+ + 1=(n ; (Z ; 1)) ; log(1 ; Z ) implying
that
^ ; log(1 ; Z )=(Z + ): From the law of large numbers derived in Section 4.3 it is seen that the estimator
agrees with the large population limit. The large population limit for Z + , denoted
by , is de ned in (4.1), where = E (I ) corresponds to 1= 0 (and hence to 0 ) for
the Markovian case treated here.
A central limit theorem can be derived using methods similar to those in the case
of complete data treated in Section 9.2. From the martingale central limit theorem
(Theorem 9.3) we have the following corollary: Corollary 10.1 The martingale M (n) de ned by (10.1) satis es M (n) ) M. The
limiting process M is a continuous Gaussian martingale with covariance function
v(u) = 12 x(1u) ; 1 + (1 + ; x(u) ; y(u))
0
where x(u) and y(u) are deterministic functions de ned in Section 5.5. The martingale observed at its end, M (n) (T ) = M (n) (1), converges to a Gaussian random
;
variable with mean 0 and variance v (1) = 0 2 ((1 + ; );1 ; 1) + .
Proof. The optional variation process M (n) ] has the following form
Zu 1
1 Z u dN (t)
(n) ] = 1
Mu
2
n 0 0 X 2 (t;) dN1(t) + n 0 2
1
1
= 12 1 +
2 + + (X (u) + 1=n)2 + (1 + ; X (u) ; Y (u)):
n0
(1 ; 1=n)
(The fact the N1 and N2 jump at distinct timepoints makes the two martingales
de ned from N1 and N2 independent, cf. Lemma 9.2, so the variation process of their
di erence is the sum of the two variation processes.) From Chapter 5 we know that
X (u) and Y (u) respectively converge to x(u) and y(u) of Section 5.5 almost surely,
uniformly on bounded intervals. Because an integral is de ned as the limit of a sum
with smaller and smaller time increments it then follows that M (n) ]u converges to
p
v(u). Because the jumps of M (n) are of size 1= n we can then apply Theorem 9.3 to
conclude weak convergence of the martingale. A formal proof of the nal statement
is technical and not given here (see Rida, 1991). The variance is as expected since
x(1) = 1 + ; and y(1) = 0, the asymptotic proportions of susceptibles and
infectives respectively at the end of the epidemic. The corollary above immediately induces a central limit theorem for the estimator
^ which we state in the following corollary (see also Rida, 1991). Corollary 10.2 The estimator ^ de ned in (10.2) is asymptotically normally dis tributed with mean 0 and asymptotic variance
1
;
2
2 1;( ; ) + 0
n where is de ned in (4.1) with 0 replacing . A consistent standard error for the
estimator is given by Z + ^2 (Z + ) 1=2 :
s:e:( ^) = p 1
n(Z + ) 1 ; Z
Proof. From the de nition of ^ we have pnM (n)(T ) + 1 ; 1 ! Z T 1 dN1(t):
0=
^ 0 0 X (t;)
R
p
;
Rearranging the formula gives 0 1 ; ^;1 = nM (n) (T )= 0T X (t);1 dN1(t). As noted
earlier, the integral equals n (1=n + + 1=(n ; Z + 1)). In Section 4.2 it was shown
that Z=n converges in probability to ; . Hence it follows that
1+ +
1
n
n ; Z + 1 ! ; log(1 + ; ) in probability, and the right hand side is equal to 0 due to the de nition of .
Together with Corollary 10.1, stating that M (n) (T ) is asymptotically normal we can
p;
apply Slutsky's theorem to conclude that n( 0 1;; ^;1 ) converges to a Gaussian
;
random variable with mean 0 and variance ( 0 );2 0 2 ((1 + ; );1 ; 1) + . Finally, apply the method, i.e. derive the Taylor expansion of the estimator around
4
the true parameter, to conclude that asymptotically Var( ^) = 0 Var( ^;1 ).
Using martingale methods we have shown how to derive an estimator for based
on observing only the nal size. If the complete data were available a better estimator
is of course ^ = ^=^ of Chapter 9, which uses more information. When only the nal
size is observed we cannot estimate the two parameters separately. This is quite
natural: one random quantity (the nal size) enables estimation of one parameter,
and since we have no knowledge of the time evolution of the process, we should not
expect any information about the length of the infectious period.
Sometimes information about the time of diagnosis of infected individuals is also
available. The time of diagnosis is approximately equal to the `removal time' because
social activity is often reduced when symptoms become obvious, and so is the infectiousness. This motivates the study by Becker and Hasofer (1997), which derives
estimation procedures for data on removal times, assuming the Markovian version
of the standard SIR epidemic. Besides the martingale considered above, they derive
~
another estimating equation from the martingale M (t) = X (t)(1+ =n)n+ n;X (t);Y (t) .
This enables them to estimate and separately as in the case of complete data.
Britton (1998a) has constructed estimators, using martingale methods similar
to those of the present section, for nal size data of the multitype SIR model of Chapter 6, assuming proportionate mixing. Susceptibility parameters are shown to
be asymptotically Gaussian whereas infectivities and the basic reproduction number
cannot be estimated consistently. An explanation is that the dimension of the data is
lower than the number of parameters, and by observing those who were infected, not
much can be said about those who caused the disease to spread further, i.e. about
the infectivities. 10.2 Estimation based on the EMalgorithm
The EMalgorithm was rst named by Dempster, Laird and Rubin (1977), but the
method dates back even earlier. It is a method aiming at simplifying likelihood inference by introducing more detailed hypothetical data. First, we explain the general
method very brie y, and then give an application to epidemic inference.
Let x denote the observed data and suppose we wish to estimate some parameter(s)
, but that the loglikelihood `( x) is cumbersome to work with. Suppose further
that estimation would be much simpler, or even trivial, if some hypothetical data y,
containing x, were available. The EMalgorithm then suggests that one `pretend' this
more detailed data is available and estimate the parameter(s). Then one computes
the expected value of the hypothetical data given the observed data. This algorithm
is repeated until the change in value of the parameter estimate is negligible. Using
mathematical notation one should start with an initial parameter choice (0) . Then
perform the E(xpectation)step and M(aximization)step repeatedly, where the Eand Msteps respectively consist of computing Estep : f ( x (j;1) ) = E (`( Y )jx (j;1) )
Mstep : (j) = argmax f ( x (j;1) ):
This algorithm is iterated until the di erence between (j) and (j+1) is negligible.
The resulting value is the parameter estimate. The hard part of the EMalgorithm
lies in inventing suitable imaginary data. How this is done varies from application to
application.
We now apply this method to an estimation problem in epidemics, where we
consider a discrete time epidemic model within the class of chainbinomial models.
For each integer k, let Xk and Yk respectively denote the number of susceptible and
infectious individuals at time, or generation, k. The name chainbinomial comes from
the way susceptible individuals are reduced (i.e. become infected): given Xk = xk
and Yk = yk the number of susceptible individuals at k + 1, Xk+1, is binomially
distributed with parameters xk and qy . This means that a susceptible individual in
generation k remains susceptible at k + 1 if she escapes infection from each of the
yk infectious individuals at k, and these noninfective events are independent and
occur with probability q. Here we concentrate on estimation of q based on the data
(x1 y1) : : : (xr yr ) we do not have to consider how long individuals are in the latent
k state and infectious before becoming removed. The simplest and most studied model
is known as the discrete time ReedFrost model where it is assumed that the infected
individuals become infectious immediately and are removed in the next generation
implying that yk+1 = xk ; xk+1. All we assume here is that infections occur according
to the chainbinomial model. We therefore introduce the notation ik+1 = xk ; xk+1 to
denote the number of individuals that become infected in generation k + 1, contained
in the data.
The likelihood with respect to q of the data is
r;1
Y xk k=1 y x +1
y i +1
xk+1 (q ) (1 ; q )
k k k k and the loglikelihood is given by `(q fxk yk g) = c + r;1
X
k=1 (xk+1 log(qy ) + ik+1 log(1 ; qy )) :
k k To maximize this expression is numerically complicated if the number of infectious
individuals is high in some generations. The reason is that the derivative of the second
term above then becomes messy. Suppose that rather than observing (xk+1 ik+1) we
observed xk+1 0 xk+1 1 : : : xk+1 y , where xk+1 j is the number of individuals susceptiP
ble in generation k who had j (out of yk ) infectious contacts. Note that j 1 xk+1 j =
ik+1 and x;k+1 0 = xk+1. This data extension separates the probability (1 ; qy ) into
the terms yj (1 ; q)j qy ;j . Estimation of this extended model is close to trivial. The
loglikelihood is given by
k k k k `(q fxk 0 xk 1 : : : xk y 1 k; yk g) = c + r;1 y
XX
k k=1 j =0 xk+1 j (j log(1 ; q) + (yk ; j ) log q) : Di erentiating this and equating the likelihood equation to 0 reveals that the MLestimator for the extended data is
Pr;1 Py (y ; j )x
Pr;1 y x ; Pr;1 Py jx
k+1 j
k=1 j =0 k
= k=1 k k Pr;1k=1 j=0 k+1 j :
q = Pr;1 Py
^
k=1 j =0 yk xk+1 j
k=1 yk xk
Note that the estimator is simply the number of contacts not resulting in infection
divided by the total number of contacts. Unfortunately the data xk+1 1 : : : xk+1 y is
not available. We therefore replace the unobserved quantities above by their expected
values given the available data. For j 1 we have
;y (1 ; q)j qy ;j
E Xk+1 j jfxk yk g q] = ik+1 j 1 ; qy
:
This follows because each of the yk individuals who were infected has a binomially
distributed number of infectious contacts, a number conditioned on being positive.
k k k k k k k P The expectation of the negative term in the numerator of q is therefore k ik+1(1 ;
^
y ).
q)yk =(1 ; q
The EMalgorithm starts with an initial guess q(0) and then updates the estimate
iteratively according to the iteration function
k Pr;1 i (1;q( 1))y
k=1 k+1 1;q( 1)
q(j) = 1 ;
Pr;1 y x :
j;
yk
j; k k=1 k k Iterating this function is very easy, since only the numerator has to be reevaluated
each iteration. In practice less than 10 iterations should su ce for the estimator to
converge.
The EMalgorithm has been used successfully in many areas of statistics. In the
analysis of infectious disease data it seems suitable for several problems but often
methods remain to be developed. In Becker (1997) the EMalgorithm is applied to
several problems, with a particular interest in HIV/AIDS data. Exercises
10.1. Consider the data in Exercise 9.3 and adopt the same model assumptions. Estimate and give the standard error assuming only the nal state is observed. 10.2. During October/November 1967 an outbreak of respiratory disease occurred on the isolated island of Tristan da Cunha (see Section 8.2.3 of Becker, 1989, for
more details on on the data). Among the 255 individuals of the island 40 infections
occurred. Assume that one individual was infected from outside and that all other
individuals were susceptible to the disease (reasonable assumption). Estimate and
derive the standard error for the estimate under the simplifying assumption of a
homogeneous uniformly mixing population). 10.3. The standard deviation of ^ based on complete observation of the epidemic process should be smaller than the standard deviation of ^, but how much? Derive
an expression for the ratio of the standard errors for the nal size and complete data
estimators. Compute the ratio numerically for 0 = 1:5 and 0 = 2:5. (Hint: Use
the expressions for the standard errors in Corollaries 9.6 and 10.2 and their large
population limits.)
~
10.4. Prove that M (t) = X (t)(1+ =n)n+ n;X (t);Y (t) is a martingale (this martingale is used by Becker and Hasofer, 1997, to construct an estimating equation). (Hint:
~
~
Look at a small time increment M (t + h) ; M (t) and show that its expectation,
conditioned on X (t) and Y (t), equals 0.) 10.5. Table 10.1 below describes an outbreak of common cold in 664 households of size 5 (taken from Table 2.7, p 31, Becker, 1989). Each household has one introductory
case, after this external infections can be neglected. Infections are observed in terms
of generations and infected individuals become infectious immediately and remain
so over one generation only (this implies that yk = ik using the notation of Section
10.2). A chain 1 ! 2 ! 1 hence implies that (y1 = 1 x1 = 4), y2 = 2 x2 = 2,
y3 = 1 x3 = 1 and y4 = 0 x4 = 1 for this household. Use the EMalgorithm to
estimate q, the probability to escape infection from one infectious individual over
one generation. (When estimating q you should use all households by treating each
chain separately. That is, in the iterative estimator q(k) you should sum also over all
households.) You may of course use exact ML estimates if you prefer.
Table 10.1. Outbreak chains for common cold in households of size 5
Chain
Obs. frequency
1
423
1!1
131
1!1!1
36
1!2
24
1!1!1!1
14
1!1!2
8
1!2!1
11
1!3
3
1!1!1!1!1
4
1!1!1!2
2
1!1!2!1
2
1!1!3
2
1!2!1!1
3
1!2!2
1
1!3!1
0
1!4
0 11 Markov Chain Monte Carlo
Markov Chain Monte Carlo (MCMC) is a computerintensive statistical tool that
has received considerable attention over the past few years. Using MCMC theory,
it is often quite simple to write e cient algorithms for sampling from extremely
complicated target distributions thus, it is not di cult to understand why these
techniques have found important applications in a vast number of di erent areas.
Although the literature on MCMC methods is growing rapidly, the excellent book
by Gilks, Richardson and Spiegelhalter (1996) provides a good starting point for the
interested reader.
In the rst two sections of this chapter we present the main ideas of MCMC,
and apply the techniques to two important situations: Calculating expectations with
respect to complicated probability measures, and optimizing complicated functions.
The optimization method described can sometimes be a good alternative to the EMalgorithm for calculating MLestimators, but this point will not be developed further
here. In Section 11.3 some practical issues are discussed, regarding e.g. convergence
diagnostics. Finally, in Section 11.4 we describe a Bayesian approach for drawing
inferences about basic parameters in stochastic epidemic models of the SIR type.
(Most current applications of MCMC are oriented towards Bayesian inference, as are
also those in the epidemic area. The example described here is taken from O'Neill
and Roberts, 1999, but see also O'Neill et al., 199?.) 11.1 Description of the techniques
The idea which lies behind Markov Chain Monte Carlo, rst introduced by Metropolis et al. (1953), is in fact very simple. Suppose that we wish to sample from some
target distribution (x), x 2 E Rp, which is known only up to some multiplicative
constant. In situations where is su ciently complex so that simple Monte Carlo
methods cannot be used, an indirect sampling method would be to construct an aperiodic and irreducible Markov chain with state space E whose stationary distribution
is given by . This means that after having run the chain to stationarity, we can
extract a (dependent) sample of any desired size from the target distribution. Some
important questions immediately present themselves:
Is it always possible to construct a Markov chain with the desired properties?
How do we nd a suitable starting point?
How do we determine the burnin period, i.e. the number of iterations needed
until the chain is su ciently close to stationarity?
What are the implications of the fact that there are dependencies in the sample? We answer the rst question right away, leaving the other issues until Section 11.3.
Constructing a Markov chain with state space E and stationary distribution is
surprisingly easy. We describe here the MetropolisHastings algorithm (see Hastings,
1970), an algorithm that is easy to understand and implement.
The Markov chain X , = 0 1 2 : : : , is started at some suitable point in E .
Given X = x, the next state X +1 is chosen by rst sampling a candidate point
Y from some proposal distribution q( j x) and then accepting this candidate with
probability (x Y ), where
(y
y
(11.1)
(x y) = min 1 (x)q(x jj x) :
)q(y )
If Y is accepted then of course X +1 = Y , otherwise X +1 = X . As soon as a proposal
distribution, suitable for the problem at hand, has been speci ed implementation of
the MetropolisHastings algorithm is easy. Note that possibly nasty normalizing
factors in the target distribution disappear, since only the ratio (y)= (x) is needed
in the calculation of the acceptance probability (x y).
We now argue that the distribution of X converges to the target distribution
as ! 1. The transition probability is given by P (y j x) = q(y j x) (x y) and Z for all y 6= x (acceptance) P (x j x) = 1 ; q(y j x) (x y) dy (rejection): By checking separately the cases where the quotient in (11.1) is below one and above
one, respectively, it follows immediately that
(x)q(y j x) (x y) = (y)q(x j y) (y x) implying detailed balance for the chain: (x)P (y j x) = (y)P (x j y) for all x y 2 E . Integrate both sides of the detailed balance equation with respect to
x to obtain
Z
(x)P (y j x) dx = (y) for all y 2 E . Thus is stationary for the process X . Under weak regularity
conditions on the proposal distribution it can be shown that X actually converges in
distribution, regardless of the choice of initial value.
The performance of the algorithm depends heavily on the form of the proposal
distribution q. For computational e ciency, q should be chosen so that it can be easily sampled and evaluated. On the other hand, q and must be related in such a
way that the acceptance probability becomes reasonably large otherwise the chain
might be mixing slowly (i.e. moving slowly around the support of ).
Let us now brie y present three widely used updating schemes. The Metropolis
algorithm assumes that q (y j x) = q (x j y ) for all x y 2 E . This is the case if e.g. the
proposal only depends on the distance between the points x and y, q(y j x) = q(jy ;xj).
Then the acceptance probability reduces to
(x y) = min 1 (y) :
(x)
Another simple updating scheme, called the independence sampler, is obtained by
putting q(y j x) = q(y), so that new candidate points are chosen without considering
the present value of the chain. In this case,
w(y
(x y) = min 1 w(x)
)
where w(x) = (x)=q(x). For the independence sampler to work well, q should be a
fairly good approximation to .
Finally we describe the Gibbs sampler. Suppose that E is a subset of the possibly
highdimensional space Rp. We run through the indices i, 1 i p, sequentially,
and for a given index i we choose q(y j x) as follows:
q(y j x) = (yi j x1 xi;1xi+1 xp)
if yj = xj for all j 6= i, and q(y j x) = 0 otherwise. Thus the candidate point y
is obtained by changing the ith coordinate of x according to the full conditional
distribution. A quick calculation shows that (x y) = 1 for all x y 2 E , meaning
that new candidates are always accepted! The Gibbs sampler is particularly well
suited to deal with spatial models, where the full conditional distribution of a single
site often depends only on the nearest neighbouring sites and can consequently be
written down easily. 11.2 Important examples
Calculating expectations The problem of calculating expectations of complicated highdimensional distributions arises in many situations. Let X be a random variable with distribution (x),
x 2 E Rp, and consider the expression E f (X )] = Z f (x) (x) dx for some function f of interest. If it is possible to use classical Monte Carlo simulation
to draw an independent sample X , 1
n, with the correct distribution, then
the time average
n
1 X f (X )
n =1 (11.2) is guaranteed to provide an accurate approximation of E f (X )] if only the sample size
n is large enough. On the other hand, in cases where straightforward Monte Carlo
methods fail, we may instead use MCMC to obtain a dependent sample X , 1
n, and again form the time average (11.2). Under certain regularity conditions on the
Markov chain (see e.g. Roberts and Smith, 1994), we may apply the ergodic theorem
to show that the time average is consistent, i.e.
n
1 X f (X ) ! E f (X )]
n =1 almost surely as n ! 1. (A detailed presentation of ergodic theory lies beyond the
scope of this text.) In practice, the values X from the burnin period are usually
discarded when calculating time averages.
Stochastic annealing We now turn to another topic, namely optimization of nonstandard functions using MCMC techniques. We are faced with the problem of minimizing a realvalued
function f (x), x 2 E Rp, where for simplicity it is assumed that E consists of all
the integer vectors inside some large box. Thus, every point x in E (excluding the
boundary) has 2p nearest neighbours in E . Also, the optimal point, x say, is unique
^
by assumption. Implicitly we assume that there is no good deterministic algorithm
available to lead us to the vicinity of the optimal point x, as is often the case in real^
life problems. We will demonstrate how a series of well chosen Metropolis algorithms
can be used to solve the problem. The technique is known as stochastic (or simulated)
annealing.
Given > 0 we de ne the target distribution
; f (x)
(x) = P e ; f (y)
x 2 E:
y2E e
The following Markov chain X ,
1, will have as its stationary distribution.
Start anywhere in E . Given X = x, propose one of the nearest neighbours at
random, and call this point Y . Then accept the new point with probability (x Y ),
where
;
(y
(x y) = min 1 (x) = min 1 e; f (y);f (x)] :
) It is easily checked that this updating scheme corresponds to the Metropolis algorithm
described in the last section, hence the theory tells us that is indeed the stationary
distribution for X .
The trick is now to run through a sequence of Metropolis algorithms according to
the following scheme:
Take > 0 and pick a starting point in E
Run X close to stationarity Increase by a small amount, ! 0
With the old endpoint as starting point, run a new Markov chain X close to
stationarity
Increase 0 by a small amount, 0 ! 00
0 and so on. As becomes large, becomes more and more peaked at the optimal
point x, since
^
; f (x)
; f (x);f (^)]
x
(x) = P e e; f (y) = P e e; f (y);f (^)]
x
y 2E y2E which tends to 1 as ! 1 if x = x, and otherwise tends to 0. This indicates that
^
a carefully tuned stochastic annealing scheme should lead us to the desired optimal
point. It is very di cult, however, to give any guidance regarding good updating rules
for , and proper stopping criteria for the Markov chains. There are many general
theoretical results available, but each speci c problem requires its own ne tuning
and there will always be a certain amount of trial and error involved. 11.3 Practical implementation issues
As soon as an MCMC algorithm suitable for the problem at hand has been determined, practical questions like the determination of starting points, burnin periods
and sample sizes have to be addressed. Here we brie y discuss these issues. The
problem of estimating the expectation E f (X )], for some random variable X on E
and some function f , by calculating time averages will serve as the main example
throughout this section.
Starting point First of all, we need a starting point for the Markov chain. A rapidly mixing chain
will quickly nd its way no matter how the starting value is chosen. On the other
hand, some chains converge to stationarity very slowly unless the starting point is chosen carefully. If several processors are available on the computer it is always a
good idea to run chains in parallel, all of them starting at di erent values.
Burnin period In order to reduce the risk of inferential bias caused by the e ect of starting values,
iterates during a certain initial period are usually discarded. The length of this burnin period depends on the rate at which the Markov chain converges to stationarity.
Ideally, we would like to compute this convergence rate, and use it to estimate the
number of iteration steps needed. This is usually not a tractable problem: the power
of MCMC methods lies in their ability to solve extremely complicated statistical
problems, and complex MCMC solutions typically involve chains that are very di cult
to analyse theoretically.
It is preferable to use convergence diagnostics (for a review, see Cowles and Carlin,
1994). Many di erent convergence diagnostics have been proposed in the literature,
but they all have a common characteristic, namely that the sampler output is used in
some way. Indeed, in many cases the output makes it obvious from the output when
the initial transient period is over. We conclude by giving a simple rule of thumb that
could be appropriate in some situations. If we are interested in estimating E f (X )]
by computing time averages, then Geyer (1992) argues that since the burnin period
is likely to be less than 1% of the number of iterations needed to obtain adequate
precision in the time average, we should concentrate on calculating the total run
length needed and then just discard 1% of the iterations.
Sample size How many iterations should we perform to ensure that our estimator is accurate
enough? This question is also di cult to answer, because of the possible dependencies
in the sample. We propose here some simple ideas that can be useful in particular
circumstances.
First, if the iterates X are indeed independent, then (assuming stationarity) the
P
p
standard deviation of the time average n=1 f (X )=n is given by = n, where is
the standard deviation of f (X ). Alternatively, if the series can be approximated by
a rst order autoregressive process then the theory of time series tells us that the
standard deviation of the time average is r1 + pn 1 ;
where is the autocorrelation of the series f (X ), and the standard deviation as
above. Another simple device used is to thin the observations by saving only every kth
value, thus ensuring that the resulting sequence will consist of more or less indep
pendent values. In that case the simple sample mean k=n will again do the job.
Finally, an obvious ad hoc method for determining the required run length is to run
several chains in parallel, all with di erent starting points, and compare the resulting
time averages. We shall not be satis ed until all of them agree adequately. 11.4 Bayesian inference for epidemics
In this nal section we indicate why the development of MCMC methods has had
such a profound e ect on applied Bayesian statistics. Then, following O'Neill and
Roberts (1999), we use MCMC to draw inferences on the infection probability when
the nal sizes of several household epidemics of ReedFrost type are observed. We
have deliberately chosen a very simple example, just to introduce the reader to the
subject. The main topic in O'Neill and Roberts (1999) is inference on the infection and
removal rates in the Markovian standard SIR epidemic model, after having observed
the removal times. This investigation is also carried out using Markov Chain Monte
Carlo in a Bayesian framework. Similar results can be achieved using other methods,
see Becker and Hasofer (1997), but as soon as the models become slightly more
complicated, e.g. by the introduction of latent periods, the exibility in the MCMC
approach becomes apparent.
From a Bayesian perspective, there is no real di erence between observables and
parameters of a statistical model: all are considered random quantities. Let D denote the observed data, and collect the model parameters and the missing data in a
vector x. Then set up the joint distribution p(D x) over all random quantities. This
distribution can be written p(D x) = p(x)L(D j x) where p(x) is the prior distribution of x and L(D j x) is the likelihood. Bayes' theorem
is now used to determine the posterior distribution of x given the observed data: pxLD x
p(x j D) = R p((y))L((D jjy))dy : This posterior distribution is the object of all Bayesian inference.
The integral in the formula for p(x j D) above is in general very di cult to compute, especially in high dimensions. However, by using the MetropolisHastings algorithm for sampling from the target distribution (x) = p(x j D) this problem is
overcome very elegantly, since, as already noted below Eq. (11.1), normalizing factors in this target distribution never need to be computed. This is the main reason
why MCMC has become such an important tool in Bayesian statistics. Turning to the epidemic application, consider a population consisting of a number
of households, each of size three. Start with one infectious individual in each household and run independent ReedFrost epidemics. Denote the avoidance probability
by q, so that 1 ; q is the probability that a given infective infects a given susceptible
household member. Now, for j = 0 1 2, let Nj denote the number of households in
which an epidemic of nal size j has occurred. Our task is to estimate the avoidance
probability q given the data D = (N0 N1 N2).
Recall from Section 1.2 that, for a given household,
P(Z = 0) = q2 P(Z = 1) = 2(1 ; q)q2 P(Z = 2) = (1 ; q)2 + 2(1 ; q)2q:
Note that the rst term in P(Z = 2) is the probability of having the entire household
infected in one generation while the second term is the probability that it takes two
~
generations to infect the household. Denoting by N2 the (unobserved) number of two
generationepidemics in the population, it follows that the likelihood function for q
can be written
~
~
~
L(qjD N2) = C (q2)N0 (2(1 ; q)q2)N1 ((1 ; q)2)N2;N2 (2(1 ; q)2 q)N2
~
~
= C 2N1 +N2 q2N0+2N1 +N2 (1 ; q)N1+2N2 :
Readers familiar with basic Bayesian statistics will notice from this equation that if
a Beta( ) distribution is chosen as the prior distribution of q, then the posterior
distribution is given by
~
~
p(q j D N2) Beta(2N0 + 2N1 + N2 + N1 + 2N2 + ):
(11.3)
Also, since the household epidemics are assumed to be independent, we have that
q
~
p(N2 j D q) Bin N2 2q2+ 1 :
(11.4)
The Gibbs sampler of Section 11.1 can now be used to sample from the target dis~
~
tribution (q N2) = p(q N2 j D), since we have the expressions (11.3) and (11.4) for
~
the corresponding conditional distributions. By introducing the extra variable N2 we
~
were able to write the likelihood function in a convenient form, but the actual N2values will probably be regarded as uninteresting and may therefore not be included
in the sample output. Exercises
11.1. Show that the acceptance probability is 1 for the Gibbs sampler.
11.2. Implement the Gibbs sampler described in Section 11.4, and estimate the avoidance probability q from the hypothetical data given in the Table 11.1 below. All
households had 1 initially infective and 2 initially susceptible individuals. Table 11.1. Household outbreaks
Final Number of
size households
0
39
1
26
2
35 12 Vaccination
Perhaps the most important practical reason for the statistical analysis of epidemics
lies in its application to vaccination policy. In the present chapter we focus on this
topic. More precisely we consider the following question: who and how many individuals should be vaccinated to prevent future epidemic outbreaks? In Section 12.1 we
study this question for the standard SIR epidemic, and in Section 12.2 for endemic
diseases, focusing on estimates and neglecting uncertainties. The chapter concludes
with a section devoted to the estimation of the vaccine e cacy, which measures the
reduction in susceptibility resulting from vaccination. In Sections 12.1 and 12.3 we
make use of the multitype epidemic presented in Chapter 6, where the types consist
of vaccinated and unvaccinated individuals. 12.1 Estimating vaccination policies based on one epidemic
Assume that the standard SIR epidemic En m( I ) is observed in a large community.
If the epidemic was observed completely, we can estimate the basic reproduction
number using methods presented in Section 9.2 (see also the exercises) if the nal
size was observed, we use the methods of Section 10.1. Suppose now that a vaccine is
available which reduces the transmission rate between an infectious and a vaccinated
individual to (1 ; r)=n. The corresponding rate for a susceptible individual is =n,
so r is the relative reduction, which is assumed to be known (its estimation is treated
separately in Section 12.3). The important case of a perfect vaccine giving 100% and
lifelong immunity corresponds to r = 1. Based on the data from one outbreak, we
wish to nd the proportion vc of the susceptible population, the critical vaccination
coverage, that has to be vaccinated to prevent future outbreaks.
We rst answer the question assuming to be known. Suppose a proportion v is
vaccinated, then the community consists of two types of individuals, unvaccinated and
vaccinated, and the multitype model of Chapter 6 is relevant. Adopting the notation
of Chapter 6, we have two types of individual, 1 and 2 (unvaccinated and vaccinated)
with 1 = 1 ; v and 2 = v. The infection parameters satisfy 11 = 21 = ,
12 = 22 = (1 ; r) and 1 = 2 = we are implicitly assuming that vaccination
only reduces the risk of becoming infected and not the infectivity once infected. The
reproduction number, here denoted by Rv for obvious reasons, is then according to
Section 6.2 the largest eigenvalue of the matrix with coe cients f i ij j g. For our
speci c case this eigenvalue is simply Rv = (1 ; v) + (1 ; r)v: (12.1) From Section 6.2 we know that a major outbreak is impossible if Rv 1. In terms
of the proportion vaccinated v, this is equivalent to v (1 ; 1= )=r (remember that
= { the reproduction number prior to vaccination). This means that the critical vc 0.8
0.6
0.4
0.2 5 10 15 20 Figure 12.1: Graph of the critical vaccination coverage vc plotted against .
vaccination (or immunity) coverage vc, above which the population is protected from
major outbreaks, is
vc = 1 1 ; 1 :
r
It may happen that vc > 1 (if r is small and large). The interpretation in this
case is that the community is not protected from major outbreaks even when the
whole community is vaccinated. The case with a perfect vaccine (r = 1) reduces to
vc = 1 ; 1= . The situation where the whole community, and not only vaccinated
individuals, is protected from the epidemic is known as herd immunity. In Figure
12.1 above vc is plotted against the basic reproduction number for the case of a
perfect vaccine (i.e. r = 1). For example, if is known to equal 5, as may be the
case for in uenza, then the critical vaccination coverage is vc = 1 ; 1=5 = 0:8 which
means that at least 80% of the community should be vaccinated in order to prevent
an outbreak.
We now consider the case where is unknown. We then estimate it from observation of one epidemic outbreak (prior to vaccination) using theory from Chapter 9 or
10 depending on whether the epidemic is observed completely or only its nal state
is observed. Either way we get an estimate ^ (in case of complete data ^ = ^ ^;1).
We obviously estimate vc by vc = 1 1 ; 1 :
^r
^ (12.2) Using the method, we obtain the consistent standard error s:e:(^c) = s:e:( ^)=r ^2.
v
^) depends on the type of data. For nal size data it is given
The standard error s:e:(
1=2
in Corollary 10.2 for complete data it equals (^;1 s:e:(^))2 + (^ s:e:(^;1))2 ,
applying the method and the fact that the two parameter estimates are independent.
A con dence interval for vc is obtained using the estimate and the standard error,
together with the asymptotic normality of the estimator. Here it is most natural to
use a onesided con dence interval, since underestimation has worse consequences
than overestimation.
In a heterogeneous community, the question of vaccination is more complex: now
the e ect of vaccination depends not only on the number vaccinated, but also on how
individuals to be vaccinated are allocated. Suppose, for example, that the population
consists of homogeneous individuals belonging to households where it is assumed that
individuals mix at a higher rate within households, as in the model of Section 6.3. A
relevant question is then how to select individuals for vaccination optimally, in the
sense of most reducing the reproduction number. Is it better to spread vaccination
over households or should one try to vaccinate certain households completely? Ball et
al. (1997) show that for a subclass of the model in Section 6.3 the optimal vaccination strategy is the `equalizing' strategy. This strategy picks individuals sequentially
from households with the largest number of remaining susceptible individuals, thus
equalizing the number of remaining susceptibles in households. They conjecture this
strategy to be optimal for the model in general. If instead the population is heterogeneous due to individual heterogeneities, such as varying infectivity and susceptibility,
the same question of who to vaccinate is relevant. In case of equal infectivity one
should vaccinate individuals with the highest susceptibility, but if the infectivity also
varies there is no simple solution. Then there is a tradeo between high susceptibility
and high infectivity in the optimal selection procedure.
The following important general question concerning estimation and vaccination
was posed by Becker (e.g. Becker and Utev, 1998). Suppose an epidemic outbreak
was observed and we want to draw inferences on who and how many to vaccinate.
For a given model a vaccination strategy can often be derived, as with the standard
SIR epidemic above. The question is how vaccination strategies derived from di erent
model assumptions are related. That is, if a vaccination strategy was derived under
the assumption that the observed epidemic was from model A, say, but model B is
the true model, is the community still under herd immunity?
As an example, suppose 60% in a large, completely susceptible, community was
infected during the epidemic season. With previous terminology we have Z = 0:6
and we may assume that = 0 (i.e. there were only few initial infectives). Assuming
the epidemic is the standard SIR epidemic we can apply results from Chapter 10 to
estimate R0 = by ^ ; ln(1 ; Z )=Z = 1:53. The critical vaccination coverage
for a vaccine resulting in 90% reduction of susceptibility (r = 0:9) is then estimated
from (12.2) as vc = 0:9;1(1 ; 1:53;1) = 0:384. This means that at least 38.4% should
^ be vaccinated for the community to possess herd immunity. A relevant question is
whether the community possesses herd immunity if a random sample of the community of this size is vaccinated, when the model assumptions are false. Suppose that
the community consisted of households, all of size four to keep things simple, and that
the rate of transmission is higher within households. Is the community then protected
from future outbreaks? Or suppose that in fact 40% of the females and 80% of the
males were infected (implying that males are more susceptible). Is the community
then under herd immunity? The answer to the rst question for this speci c example
is `yes', and to the second question `no'.
There is no general answer for the rst type of question, i.e. whether neglecting
the presence of households, can result in underestimation of the critical vaccination
coverage. Becker and Utev (1998) show that if all households are of equal size then
the critical vaccination coverage is lower than if the data was collected from a homogeneous community without households. However, if households are of di erent
sizes, as in real life, then the relation can go either way (the direction depends on
the household structure and the proportion infected). If the critical vaccination coverage is derived taking account of individual heterogeneities (men and women in the
example above) then this proportion is always larger than the vaccination coverage
derived neglecting heterogeneities (Britton, 1998b).
The natural conclusion is, of course, that heterogeneities should be taken into
account. When doing this, one can also devise better vaccination strategies than
picking individuals at random. The question is still important because it is reasonable
to believe that some heterogeneities, social and/or individual, are not observable
and therefore not taken into account. The results mentioned above then state that
outbreaks may in fact occur in the community even when the proportion vaccinated
exceeds vc, if vc is derived neglecting some relevant heterogeneities. 12.2 Estimating vaccination policies for endemic diseases
The question of who and how many individuals to vaccinate is perhaps even more
important for endemic diseases. The prime example of an endemic infectious disease
which has been completely eradicated by vaccination is smallpox. Presently most
childhood diseases are subject to vaccination policies in the western world, and these
diseases are no longer endemic in most parts. It is not possible to vaccinate everyone
for practical and ethical reasons (e.g. some religious communities oppose vaccination).
To derive the necessary proportion vc to vaccinate (the critical vaccination coverage)
is hence of great importance.
Below, we derive estimates for the two endemic models presented in Chapter
8. Because of the rather complicated structure of these models we shall not derive
standard errors but only estimates. Of course, the models are oversimpli ed in that
the community is assumed to be homogeneous and individuals are assumed to mix uniformly. Further, any seasonal e ects are neglected, even though it is wellknown
that the beginning of school in the autumn usually implies outbreak peaks. E ects
of vaccination policies, and their various estimation problems, for more realistic but
deterministic models have been studied in many papers (e.g. Anderson and May,
1991, and Greenhalgh and Dietz, 1994).
The SIR model with demography Let us rst consider the SIR model with demography of Section 8.1, which we recapitulate brie y. New individuals are born at rate n, an infectious individual has
close contact with other individuals at rate (any speci c individual at rate =n) and
becomes removed at rate . Finally, each individual dies at rate irrespective of her
disease state. The bivariate Markov process (X Y ) = f(X (t) Y (t)) t 0g, which
keeps track of the number of susceptibles and infectives respectively, is speci ed by
the starting point (n n) and its transition rates given in Section 8.1. Suppose now
that a perfect vaccine is available, i.e. a vaccine giving lifelong 100% immunity, and
that a proportion v of all individuals are vaccinated at birth. The only transition rate
that is a ected is that susceptibles are born at rate (1 ; v)n (the remaining newborns are vaccinated and immune). We can apply the same model and results, with
slight changes in the parameters as follows. Let n0 = n(1 ; v) so that susceptibles are
born at rate n0. For the jump rates to be consistent with the rates of Section 8.1, we
then have to write n0 in the transition rate for new infections, and this can be done
if we introduce 0 = (1 ; v). Thus, we have the model of Section 8.1 with n0 and 0
replacing n and . Consequently the reproduction number Rv after the vaccination
scheme is started, is given by
0
;
Rv = + = (1+ v) = (1 ; v)R0:
From properties of the model it then follows that the only stationary point is the
diseasefree state if Rv < 1. This is equivalent to v > 1 ; 1=R0. The critical vaccination coverage is hence
1
vc = 1 ; R = 1 ; + :
0
If R0 is unknown it may be estimated prior to vaccination by observing the community
in its stationary state. From Section 8.1 we know that in case of endemicity, the sta;;
tionary point for population proportions is given by (^ y) = R0 1 (R0 ; 1)( R0);1 .
x^
From this it immediately follows that R0 is estimated by the inverse of xobs, the
^
observed proportion of susceptibles: R0 = 1=xobs. To conclude, an estimate of the
critical vaccination coverage vc necessary to prevent future outbreaks is
vc = 1 ; xobs :
^
(12.3)
Consider as an example measles in the western world prior to vaccination. On the
average a fraction of approximately 1=15 7% had not yet experienced the disease and were thus susceptible. This implies that vc 93%, and a vaccination coverage
^
above this value will eradicate the disease. The vaccination coverage of measles in
the United States has by now exceeded 90% and the disease is no longer endemic only smaller local outbreaks still occur. This shows that the estimate is in the right
ball park even though it is estimated from a simple model neglecting heterogeneities
in the community as well as social and geographic structure.
Because most individuals will experience the disease before dying, and all individuals live equally long on the average, the community proportion of susceptibles
will coincide with the average fraction of an individual's life that she is susceptible.
That is, the average age at infection divided by the average lifetime. Conversely, the
average age of infection is the average lifetime divided by R0. This explains the name
`childhood disease': the (average) age of infection is small mainly because R0 is large,
and not because of the special behaviour of children. For example, measles is believed
to have R0 15, so with an average lifetime of 75 years the formula states that the
average age at infection is 75/15=5 years.
The SIS model In the SIS model (Section 8.2), the community is xed and new susceptibles arise
not from births of new individuals but from infectious individuals recovering and
becoming susceptible (in SIR models they become immune after recovery or die). An
infectious individual has close contact with others at rate ( =n with any speci c
individual) and an infectious individual recovers and becomes susceptible at rate .
The transition rates for the induced Markov process are given in Section 8.2 where
properties of the model are also presented. The basic reproduction number satis es
R0 = = and the epidemic dies out quickly unless R0 > 1. Assume as in the
previous paragraph that a perfect vaccine is available and that the proportion v of all
individuals is vaccinated. This a ects the transition rate corresponding to infection:
if there are i infectives the number of susceptibles is n(1 ; v) ; i (and not n ; i
as it is prior to vaccination). However, if we replace n by n0 = n(1 ; v) and by
0 = (1 ; v ) the transition rates for the vaccinated community coincide with the
rates of Section 8.2. From results of Section 8.2 it thus follows that the epidemic will
surely become extinct if 0 = 1, which is equivalent to v 1 ; = . The critical
vaccination coverage is therefore
1
vc = 1 ; = 1 ; R :
0
Prior to vaccination the proportion of susceptibles converges to = = 1=R0. If R0
is unknown it can hence be estimated by the inverse of xobs , the observed proportion
of susceptibles for a community in its stationary state. An estimate for the critical
vaccination coverage is thus
vc = 1 ; xobs :
^ We see that this is the same estimate as in the SIR model with demography (12.3).
The relation vc = 1 ; 1=R0 holds for a larger class of models including the models
assuming homogeneous mixing and a homogeneous community. 12.3 Estimation of vaccine e cacy
We now proceed to estimate the parameter r de ned in Section 12.1 as the relative
reduction in the rate at which vaccinated individuals become infected by infectious
individuals, compared with the rate for unvaccinated individuals (the rate for unvaccinated is =n and (1 ; r)=n for vaccinated). It is assumed that the infectivity
of vaccinated individuals is unchanged. That is, if a vaccinated individual becomes
infected she is infectious as she would have been if not vaccinated. We estimate r
assuming that we observe the standard SIR epidemic in which part of the community
is vaccinated and the rest are not. To keep things simple we assume a large community in which a negligible fraction is vaccinated. Alternative methods for estimating
r can be derived from a matched pair study, for example with sibling pairs where one
sibling is vaccinated and the other is not.
Assume that a sample of size among the n susceptibles ( v = =n being negligible) were selected and vaccinated prior to the epidemic. After the epidemic outbreak
has run its course, a proportion Zu of the unvaccinated and a proportion Zv of the
vaccinated are found to have been infected. The initial proportion of infectives is
assumed to be small. The overall proportion infected is Z = v Zv + (1 ; v )Zu Zu.
According to Section 6.2, in case of a major outbreak (Zu Zv ) converges in probability
to ( u v ), the positive solution of
1;
1; u
v =
= e; (
e;(1;r) vv +(1; v ) u )
( v v +(1; v ) u ) : From these equations it follows that 1 ; r = log(1 ; v )= log(1 ; u). Replacing the
limiting quantities by the corresponding observed values gives an estimate of r:
(12.4)
r = 1 ; log(1 ; Zv ) :
^
log(1 ; Zu)
Up until now, we have not made use of the assumption that the proportion vaccinated
v is small, and consequently the estimate is consistent without this assumption. A
standard error for the estimate can also be derived for the general case using the
central limit theorem of Ball and Clancy (1993) presented in Section 6.2. Here we
treat the simpler case with v assumed to be small. This implies that Zu can be
treated as deterministic in comparison with Zv . Because the proportion vaccinated is
negligible these individuals do not a ect the overall proportion infected. Thus, each
vaccinated individual behaves more or less independently and becomes infected with
probability v = 1 ; e;(1;r) ( +(1; ) ). Consequently Zv is binomially distributed
vv v u with parameters and v . Treating Zu as deterministic, we see from equation (12.4)
that the variance of r equals the variance of log(1 ; Zv ) divided by (log(1 ; Zu))2.
^
Expanding log(1;Zv ) in a Taylor series, and applying the method it follows that the
variance of log(1 ; Zv ) approximately equals the variance of Zv divided by (1 ; v )2.
Finally, the variance of Zv is v (1 ; v )= since Zv is binomially distributed, and this
may be estimated by Zv (1 ; Zv )= . A consistent estimate of the standard deviation
of r is thus
^
s:e:(^) = p
r p Zv (1 ; Zv )
:
(1 ; Zv )j log(1 ; Zu)j (12.5) As an example, suppose that in a large community 50% of the unvaccinated were
infected while only 10% in the sample of = 200 vaccinated individuals were infected.
This means Zu = 0:5 and Zv = 0:1. Equations (12.4) then gives r = 0:848 and
^
s:e:(^) = 0:034 from (12.5).
r
The parameter r measures the reduction in transmission which is a natural measure of the e cacy of a vaccine. Traditionally however, the de nition of the vaccine
e cacy V E is
V E = 1 ; Zv
Zu
using the terminology of the present section. This measure is not a very satisfactory
single measure of the vaccine e cacy because it depends on the proportion vaccinated
and on the community from which data come. That is, one would get a di erent
estimate if a larger/smaller proportion were vaccinated prior to the epidemic, and
also in a di erent community in which individuals mix at higher/lower rate. The
parameter r is to be preferred, as it does not have these unsatisfactory properties. Exercises
12.1. Check that the formula for Rv , given in equation (12.1), actually gives the largest eigenvalue to the speci ed matrix. 12.2. As discussed in the introduction of Part II and in Exercise 9.4 the estimates of Part II can be modi ed to cover the case with initially immune individuals (for
simplicity the text is written assuming no initially immune individuals). Suppose the
proportion initially infectious is negligible ( 0) and that s is the initial proportion
susceptible. Then it is not hard to show (simply by transforming parameters) that the
ultimate proportion infected among the initially susceptibles converges in probability
to a solution of the equation 1 ; e s to be compared with equation (4.2) for the case
with no initially immunes (remember that = ). This implies that the estimates of
the present section are not estimates of but of s. Derive estimates and con dence
bounds for and vc assuming that the initial proportion susceptible s is known and where data consists of knowing the observed proportion infected among those initially
suceptible. 12.3. Estimate and derive standard error for the critical vaccination coverage vc for
the smallpox outbreak presented in Exercise 9.3.
a) assuming complete data.
b) assuming that only the nal state is observed (see Exercise 10.1). 12.4. Estimate vc for the Tristan da Cunha outbreak presented in Exercise 10.2.
12.5. Suppose that 10 years is the average age at infection of some childhood disease in a society with average lifelength of 70 years. Assume that the SIR model with demography applies and derive an estimate of vc, the proportion necessary to vaccinate
to surely prevent future outbreaks. 12.6. Suppose a hypothetical vaccine trial resulted in 16 infected individuals among the sample of 185 vaccinated individuals whereas 42% of the remaining (large number
of) individuals were infected. Derive an estimate of the reduction parameter r and
give a standard error for the estimate. ...
View
Full Document
 Spring '11
 Gibs
 Demography, Probability, Epidemiology, Infectious Disease, The Land, Probability theory, Stochastic process

Click to edit the document details