andersson

# andersson - STOCHASTIC EPIDEMIC MODELS AND THEIR...

This preview shows page 1. Sign up to view the full content.

This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: STOCHASTIC EPIDEMIC MODELS AND THEIR STATISTICAL ANALYSIS Hakan ANDERSSON and Tom BRITTON February 2000 Hakan Andersson Group Financial Risk Control Swedbank SE-105 34 Stockholm Sweden e-mail: hakan.b.andersson@foreningssparbanken.se Tom Britton Department of Mathematics Uppsala University P.O. Box 480 SE-751 06 Uppsala Sweden e-mail: tom.britton@math.uu.se Preface The present lecture notes describe stochastic epidemic models and methods for their statistical analysis. Our aim is to present ideas for such models, and methods for their analysis along the way we make practical use of several probabilistic and statistical techniques. This will be done without focusing on any speci c disease, and instead rigorously analyzing rather simple models. The reader of these lecture notes could thus have a two-fold purpose in mind: to learn about epidemic models and their statistical analysis, and/or to learn and apply techniques in probability and statistics. The lecture notes require an early graduate level knowledge of probability and statistics. They introduce several techniques which might be new to students, but our intention is to present these keeping the technical level at a minimum. Techniques that are explained and applied in the lecture notes are, for example: coupling, di usion approximation, random graphs, likelihood theory for counting processes, martingales, the EM-algorithm and MCMC methods. The aim is to introduce and apply these techniques, thus hopefully motivating their further theoretical treatment. A few sections, mainly in Chapter 5, assume some knowledge of weak convergence we hope that readers not familiar with this theory can understand the these parts at a heuristic level. The text is divided into two distinct but related parts: modelling and estimation. The rst part covers stochastic models and their properties, often assuming a large community in which the disease is spread. The second part deals with statistical questions, that is, what can be said about the model and its parameters, given that an epidemic outbreak has been observed. The second part uses results from the rst part, and is hence not suited for reading without having read the rst part. The lecture notes are self-instructive and may be read by anyone interested in the area. They are suited for a one-semester course of approximately 15 two-hour lectures. Most chapters may be presented in one such lecture. Chapters that need somewhat longer treatment are Chapters 5, 6, and 8. Each chapter ends with a few exercises giving extensions of the theory presented in the text. These notes were written during the spring term 1999 when the authors gave a joint graduate course in the Departments of Mathematics at Stockholm and Uppsala Universities. We thank the participants in the course: Anders Bjorkstrom, Nestor Correia, Maria Deijfen, Peter Grenholm, Annika Gunnerhed, Allan Gut, Jemila Seid Hamid, Stefan Israelsson, Kristi Kuljus, Johan Lindback and Habte Tewoldeberhan for their constructive criticisms of the manuscript as well as for pointing out several errors. The lecture notes were re-written taking their comments into account during the second half 1999. We are also grateful to the referees of Springer for careful reading and numerous constructive suggestions on how to clearify bits and pieces as well as language improvement. Needless to say, the authors are responsible for any remaining errors. Tom Britton gratefully acknowledges support from the Swedish Natural Science Research Council. Hakan Andersson, Stockholm Tom Britton, Uppsala February, 2000 Contents Preface List of contents Part I: STOCHASTIC MODELLING Chapter 1. Introduction 1.1. Stochastic versus deterministic models 1.2. A simple epidemic model: The Reed-Frost model 1.3. Stochastic epidemics in large communities 1.4. History of epidemic modelling Exercises v vii 1 3 3 4 6 7 9 Chapter 2. The standard SIR epidemic model 11 11 12 14 15 18 Chapter 3. Coupling methods 19 19 22 22 26 Chapter 4. The threshold limit theorem 27 27 28 2.1. De nition of the model 2.2. The Sellke construction 2.3. The Markovian case 2.4. Exact results Exercises 3.1. First examples 3.2. De nition of coupling 3.3. Applications to epidemics Exercises 4.1. The imbedded process 4.2. Preliminary convergence results 4.3. The case mn=n ! > 0 as n ! 1 4.4. The case mn = m for all n 4.5. Duration of the Markovian SIR epidemic Exercises 30 32 34 36 Chapter 5. Density dependent jump Markov processes 39 39 40 41 43 46 48 Chapter 6. Multitype epidemics 51 51 53 55 56 61 Chapter 7. Epidemics and graphs 63 64 65 66 70 72 Chapter 8. Models for endemic diseases 73 73 77 83 5.1. An example: A simple birth and death process 5.2. The general model 5.3. The Law of Large Numbers 5.4. The Central Limit Theorem 5.5. Applications to epidemic models Exercises 6.1. The standard SIR multitype epidemic model 6.2. Large population limits 6.3. Household model 6.4. Comparing equal and varying susceptibility Exercises 7.1. Random graph interpretation 7.2. Constant infectious period 7.3. Epidemics and social networks 7.4. The two-dimensional lattice Exercises 8.1. The SIR model with demography 8.2. The SIS model Exercises Part II: ESTIMATION Chapter 9. Complete observation of the epidemic process 9.1. Martingales and log-likelihoods of counting processes 9.2. ML-estimation for the standard SIR epidemic Exercises 85 87 87 91 94 Chapter 10. Estimation in partially observed epidemics 99 99 103 105 Chapter 11. Markov Chain Monte Carlo methods 107 107 109 111 113 114 Chapter 12. Vaccination 12.1. Estimating vaccination policies based on one epidemic 12.2. Estimating vaccination policies for endemic diseases 12.3. Estimation of vaccine e cacy Exercises 117 117 120 123 124 References 129 10.1. Estimation based on martingale methods 10.2. Estimation based on the EM-algorithm Exercises 11.1. Description of the techniques 11.2. Important examples 11.3. Practical implementation issues 11.4. Bayesian inference for epidemics Exercises Part I STOCHASTIC MODELLING In the rst part of these lecture notes, we present stochastic models for the spread of an infectious disease in some given population. We stress that the models aim at describing the spread of viral or bacterial infections with a person-to-person transmission mechanism. Diseases that belong to this category are, for example, childhood diseases (measles, chickenpox, mumps, rubella, ...), STD's (sexually transmitted diseases) and less severe diseases such as in uenza and the common cold. Diseases excluded from these models are, for example, host-vector and parasitic infections although they have some features in common. Modi cations of epidemic models can also be used in applications in the social sciences, modelling for example the spread of knowledge or rumours (see e.g. Daley and Kendall, 1965, and Maki and Thompson, 1973). In this context being susceptible corresponds to being an ignorant (not having some speci c information) and infectious corresponds to being a spreader (spreading the information to other individuals). One possible rumour model is where the spreader continues to spread the rumour forever which corresponds to the SI model (e.g. Exercise 2.3). Two main features make the modelling of infectious diseases di erent from other types of disease. The rst and perhaps most important reason is that strong dependencies are naturally present: whether or not an individual becomes infected depends strongly on the status of other individuals in its vicinity. In Section 1.2 we shall see how this complicates the stochastic analysis, even for a very small group of individuals. For non-transmittable diseases this is usually not the case. Occurences of such diseases are usually modelled using survival analysis, in which hazard functions, describing the age-dependent risk for an individual to fall ill, are speci ed proportional hazards (Cox, 1972) is an important example. These hazards may be speci c to each individual and even contain a random parameter which may be correlated between related individuals as in frailty models (e.g. Hougaard, 1995). Even then the disease-times for di erent individuals are de ned to be independent given the hazard functions, contrary to the case for infectious diseases. A thorough analysis of such models and their statistical analysis, including many examples, is given in Andersen et al. (1993). The second feature, which a ects the statistical analysis, is that most often the epidemic process is only partly observed. For example it is rarely known by whom an infected individual was infected, nor the time at which the individual was infected and during what period she was infectious (here and in the sequel we denote individuals by `she'). Problems related to this property are treated in Part II of the lecture notes. In the introductory chapter we present the simplest stochastic model, discuss advantages and disadvantages of stochastic models as opposed to deterministic models and give a brief overview of the history of epidemic models and give some general references on epidemic modelling. In Chapter 2 we de ne what we call `The standard SIR epidemic model' (SIR epidemic for short), which serves as the canonical model in the text. The abbreviation SIR is short for `susceptible', `infectious' and `removed'. SIR models assume that individuals are at rst susceptible, if they get infected they remain infectious for some time, after which they recover and become immune. An individual is said to be removed if she has recovered and is immune or dies, and plays no further role in the epidemic. A construction of the SIR model is given in Chapter 2, and exact results concerning the nal size (i.e. the total number infected by the disease) and `the total area under the trajectory of infectives' are derived. In Chapter 3 the Coupling Method is presented, a method which enables the SIR epidemic model to be approximated by a branching process during its initial stage, assuming a large population. First, the main idea of coupling is given, and then it is applied to the SIR epidemic model. In Chapter 4 the important threshold limit theorem, concerning the nal size of the SIR epidemic in a large population, is stated and proved. In Chapter 5 we are concerned with approximations of the entire epidemic process and not only its nal size. This relies on di usion theory for the special case when the epidemic model is Markovian. Chapters 6 and 7 extend the standard SIR epidemic to the case where the population is not completely homogeneous. Chapter 6 is devoted to so-called multitype models, where individuals are of di erent `types' and these types di er, for example in terms of susceptibility. In Chapter 6 we also model the spread of disease when the community is built up of households, assuming di erent transmission rates between individuals depending on whether they belong to the same household or not. In Chapter 7 we characterise an epidemic model in terms of random graphs. This technique is introduced to model heterogeneities caused by e.g. social structures or geographic location. The last chapter of Part I, Chapter 8, concerns models for endemic diseases, i.e. diseases which are present in the population over a long period of time. The models of this chapter are di erent in that new susceptible individuals have to enter the population for endemicity to occur. This can be achieved by assuming that old individuals die and new (susceptible) individuals are born into the population, i.e. a dynamic population, or that individuals become susceptible after recovery rather than immune. The latter type of model is, for obvious reasons, called an SIS model. 1 Introduction 1.1 Stochastic versus deterministic models These lecture notes focus on stochastic models and their statistical analysis. Deterministic epidemic models have perhaps received more attention in the literature (see also Section 1.4 about the history of epidemic models). For example, the monograph by Anderson and May (1991), probably the most cited reference in the recent literature on epidemic models, treats almost exclusively deterministic models. The main advantage of deterministic models lies in their simpler (but not necessarily simple!) analysis. For a stochastic epidemic model to be mathematically manageable it has to be quite simple, and thus not entirely realistic. Deterministic models can be more complex, yet still possible to analyse, at least when numerical solutions are adequate. Several reasons suggest that stochastic models are to be preferred when their analysis is possible. First, the most natural way to describe the spread of disease is stochastic one de nes the probability of disease transmission between two individuals, rather than stating certainly whether or not transmission will occur. Deterministic models describe the spread under the assumption of mass action, relying on the law of large numbers. In fact, an important part in stochastic modelling lies in showing what deterministic model the stochastic model converges to, when the community size becomes large. Secondly, there are phenomena which are genuinely stochastic and do not satisfy a law of large numbers. For example, in a large community, many models will lead either to a minor outbreak infecting only few individuals, or else to a major outbreak infecting a more or less deterministic proportion of the community (see Section 1.3 below). To calculate the probability of the two events is only possible in a stochastic setting. Further, when considering extinction of endemic diseases (see Chapter 8), this can only be analysed with stochastic models, since extinction occurs when the epidemic process deviates from the expected level. A third important advantage concerns estimation. Knowledge about uncertainty in estimates requires a stochastic model, and an estimate is not of much use without some knowledge of its uncertainty. To conclude, stochastic models are to be preferred when their analysis is possible otherwise deterministic models should be used. Deterministic models can also serve as introductory models when studying new phenomena. We see no con ict between the two and believe that both types of models play an important role in better understanding the mechanisms of disease spread. In these lecture notes we focus on stochastic models, for deterministic models we refer the reader to Anderson and May (1991), and Bailey (1975) { who also treats stochastic models. 1.2 A simple epidemic model: The Reed-Frost model Before describing some general characteristics of the models to be studied, we present the simplest possible epidemic model for the spread of an infectious disease, say the common cold, in a small group of individuals. The model is called the Reed-Frost model named after its founders (see Section 1.4 on the history of Epidemic modelling) and is a so-called chain-binomial model (also treated in Section 10.2). The model is an SIR epidemic model which means that individuals are at rst susceptible to the disease. If an individual becomes infected, she will rst be infectious, and called an infective, for some time and then recover and become immune, a state called removed. An individual is said to be infected if she is either infectious or removed, i.e. no longer susceptible. The model is usually speci ed using discrete time dynamics. In the discrete time scenario it is natural to think of the infectious period as being short and preceded by a longer latent period. Then new infections will occur in generations, these generations being separated by the latent period as the discrete time unit. The event probabilities in a given generation depend only on the state of the epidemic in the previous generation (i.e. a Markov model), and these events are speci ed by certain binomial probabilities. If we let Xj and Yj denote the number of susceptibles and infectives respectively at time (or generation) j , the chain-binomial Reed-Frost model has conditional probabilities P(Yj+1 = yj+1jX0 = x0 Y0 = y0 : : : Xj = xj Yj = yj ) = P(Yj+1 = yj+1jXj = xj Yj = yj ) xj (1 ; qy )y +1 (qy )x ;y +1 yj+1 and Xj+1 = Xj ; Yj+1. This means that a given susceptible of generation j remains susceptible in the next generation if she escapes infection from all infectives of generation j these events are independent each occurring with probability q. Further, di erent susceptibles in a given generation become infected independently of one another and infectious individuals are removed in the next generation. Given the initial state X0 = n and Y0 = m the probability of the complete chain y1 : : : yk yk+1 = 0 is obtained by conditioning sequentially and using the Markov property of the chain. If we let xj+1 = xj ; yj+1 we have P(Y1 = y1 : : : Yk = yk Yk+1 = 0jX0 = n Y0 = m) = P(Y1 = y1jX0 = n Y0 = m) P(Yk+1 = 0jXk = xk Yk = yk ) xk (1 ; qy )0 (qy )x : n = y (1 ; qm)y1 (qm )n;y1 0 = 1 j j j j j k k k From a mathematical point of view, the spread of the disease does not have to occur in generations. The necessary assumption is that each individual who becomes infected has `infectious contact' with any other given individual (meaning that the individual becomes infected if she is still susceptible) with probability p = 1 ; q, and all such contacts occur independently. Of course, the notion of generation becomes less meaningful, unless there is a long latency period prior to the infectious period. Still the formula can be used when computing the total number of infected individuals P Z = j 1 Yj , as we shall see (note that the initially infectives m are excluded). The quantity Z is also known as the nal size of the epidemic. To compute P(Z = zjX0 = n Y0 = m) we simply sum the probabilities of all P chains for which jyj = j 1 yj = z. From the de ning equations it is seen that Yj = 0 implies that Yj+1 = 0. This means that new infections may only occur whenever some individuals are infectious, which implies that the length of a chain cannot be longer than the total number infected, making the number of possible chains nite. The probability function for the nal number infected is hence P(Z = zjX0 = n Y0 = m) = X y:jyj=z P(Y1 = y1 : : : Yk = yk Yk+1 = 0jX0 = n Y0 = m): Below we calculate the probability function explicitly for Z , the nal number infected among those initially susceptible, when Y0 = m = 1 and X0 = n = 1 2, and 3. Since m = 1 in all cases we omit it in the conditioning notation. We start with n = 1: P(Z = 0jX0 = 1) = P(Y1 = 0jX0 = 1) = q P(Z = 1jX0 = 1) = P(Y1 = 1 Y2 = 0jX0 = 1) = p: For n = 2 we have P(Z = 0jX0 = 2) = P(Y1 = 0jX0 = 2) = q2 P(Z = 1jX0 = 2) = P(Y1 = 1 Y2 = 0jX0 = 2) = 2 pq q 1 P(Z = 2jX0 = 2) = P(Y1 = 2 Y2 = 0jX0 = 2) + P(Y1 = 1 Y2 = 1 Y3 = 0jX0 = 2) = p2 + 2 pq p: 1 For n = 3 we compute only the rst three probabilities, the nal probability may be derived from the complement. P(Z = 0jX0 = 3) = P(Y1 = 0jX0 = 3) = q3 P(Z = 1jX0 = 3) = P(Y1 = 1 Y2 = 0jX0 = 3) = 3 pq2 q2 1 P(Z = 2jX0 = 3) = P(Y1 = 2 Y2 = 0jX0 = 3) + P(Y1 = 1 Y2 = 1 Y3 = 0jX0 = 3) = 3 p2 q q2 + 3 pq2 2 1 2 pq q: 1 Needless to say, these probabilities become very complicated to compute even for moderate sized groups (n larger than 10 say). The Reed-Frost model is a special case of the standard SIR epidemic model presented in Chapter 2, in which the length of the infectious period is deterministic (implying that contacts between pairs of individuals occur independently). As for the model of Chapter 2, we implicitly assume that the population is homogeneous and that individuals mix homogeneously. In Section 2.4 we derive the nal size probabilities in a more coherent way for the more general model. 1.3 Stochastic epidemics in large communities The previous section illustrates the need for approximation methods when considering large communities. In fact, the recursive formulas for the nal size presented in Section 2.4 are numerically unstable and cannot be applied to obtain numerical solutions when the number of susceptibles n exceeds 50-100. The main insight from such large population approximations is the threshold limit theorem (Chapter 4). Loosely speaking, this theorem states that, as n ! 1 one of two possible scenarios can occur: either a) only few individuals will ever become infected, or else b) a more or less deterministic positive proportion of the susceptibles, with some Gaussian noise of smaller order, will have been infected by the end of the epidemic. The latter scenario is referred to as a large or major outbreak. Much work has been carried out to state versions of this theorem for more and more general models, and in particular to derive the probabilities of each of the two scenarios as well as nding the deterministic proportion infected in case of a large outbreak. Another important task is to nd out for which parameter values the asymptotic probability of a major outbreak is 0. The parameter R0 , called the basic reproduction number, plays a crucial role in this context. The parameter R0 , a function of the model, is the average number of new infections caused by a `typical' infective during the early stages of the epidemic. The threshold limit theorem states that a major outbreak in a large population is possible if and only if R0 > 1. From a statistical perspective only major outbreaks are considered since minor outbreaks would rarely be observed at all. Questions of interest are then to derive parameter estimates, including their uncertainty, having observed the number of infected. The main motivation for such estimation lies in the control of disease spread. If for example a vaccine is available, a question of practical relevance is to estimate what proportion of the susceptible population it is necessary to vaccinate, in order to avoid future outbreaks, a state known as `herd immunity'. This type of question is considered in Chapter 12. Sometimes not only the nal state of the epidemic is of interest, but rather the entire epidemic process. It is then possible to approximate the epidemic process using di usion theory (see Chapter 5). This is, for example, the case in the statistical setting where data is collected continuously from an ongoing epidemic, a topic treated in Chapter 9. 1.4 History of epidemic modelling This section will give only a short selected presentation of some early mathematical models for the spread of infectious diseases. We have no intention of covering all important events in the development of this subject. For a more detailed historical overview we refer the reader to the books by Bailey (1975) and Anderson and May (1991). One of the earliest studies of the non-linearity of an epidemic model was contained in a paper by Hamer (1906). Hamer postulated that the probability of a new infection in the next discrete time-step was proportional to the product of the number of susceptibles and the number of infectives. Ross (1908) translated this `mass action principle' to the continuous time setting. The rst complete mathematical model for the spread of an infectious disease which received attention in the literature was a deterministic model of Kermack and McKendrick (1927). This model, known as the deterministic general epidemic, is now presented (note that the parametrization is di erent from the models treated later in the text). Let x(t), y(t) and z(t) respectively denote the number of susceptibles, infectives and removed (=recovered and immune, or dead) individuals. The population is considered to be of xed size n (i.e. x(t) + y(t)+ z(t) = n for all t). The model is then de ned by the following set of di erential equations x0 (t) = ; x(t)y(t) y0(t) = x(t)y(t) ; y(t) z0 (t) = y(t) (1.1) with initial state (x(0) y(0) z(0)) = (x0 y0 0). In Figure 1.1 we show how (x y z) varies over time for = 1:9 and = 1 where the bottom region corresponds to x(t), the middle region to y(t) and the top region to z(t) (which is empty at t = 0 since z(0) = 0). Note that their sum remains constant over time since x(t)+ y(t)+ z(t) = n. The factor x(t)y(t) in (1.1) is the crucial non-linear term, indicating that infections occur at high rate only when there are many susceptibles and infectives. From the rst two equations it follows that dx=dz = ; x, where = = . So x(t) = x0 e; z(t) and hence y(t) = n ; z(t) ; x(t) = n ; z(t) ; x0 e; z(t) . Kermack and McKendrick (1927) showed that y is decreasing unless y0( x0 ; ) > 0 or equivalently x0 > 1= . In this latter case there will be a growing epidemic. This observation is known as the threshold result, i.e. that completely di erent behaviour will occur depending on whether x0 exceeds 1= or not. Another important observation was that z(t) ! z1 < n as t ! 1, where z1 is the solution of z = n ; x0 e; z . This leads to the very important property that not everyone becomes infected! The rst stochastic model was proposed by McKendrick (1926). This model, a 100 80 60 40 20 2.5 5 7.5 10 12.5 15 Figure 1.1: Plot of (x(t) y(t) z(t)) de ned by (1.1) with = 1:9, = 1, x0 = 100 and y0 = 1. For each t, x(t) corresponds to the lower segment, y(t) to the mid-segment and z(t) to the top-segment. (N.B. x(t) + y(t) + z(t) = 101 for all t.) stochastic continuous-time version of the deterministic model of Kermack and McKendrick (1927), did not receive much attention. A model which attracted more attention at the time, even though it was never published, was the chain-binomial model of Reed and Frost (presented in Section 1.2) put forward in a series of lectures in 1928. It was not until the late 1940's, when Bartlett (1949) studied the stochastic version of the Kermack-McKendrick model, that stochastic continuous-time epidemic models began to be analysed more extensively. Since then, the e ort put into modelling infectious diseases has more or less exploded. In the rest of the lecture notes a selection of these methods will be presented. The list of references covering epidemic models and their analysis can of course be made long. Here we only mention a few central texts. The rst pioneering monograph was written by Bailey in 1957 but has since then been reprinted in a second edition (Bailey, 1975). This book covers both stochastic and deterministic models as well as statistical inference with numerous applications to real data. Becker (1989) is mainly concerned with statistical analysis of infectious diseases. Gabriel et al. (1990) is a collection of papers, on stochastic modelling and some statistical inference, that were written for a workshop on stochastic epidemic modelling in Luminy, France. The book by Anderson and May (1991) mentioned previously is probably the book which has received the most attention, together with Bailey (1975). Anderson and May (1991) model the spread of disease for several di erent situations and give many practical applications. The book is also concerned with estimation however, both modelling and inference are deterministic. A thematic semester at Isaac Newton Institute, Cambridge, resulted in three collections of papers (Mollison, 1995, Isham and Medley, 1996, and Grenfell and Dobson, 1996), covering topics in stochastic modelling and statistical analysis of epidemics (i.e. its propagation in a community), human infectious diseases (i.e. the infection process inside the body), and animal diseases, respectively. Very recently, two new monographs have been published. The rst one is written by Daley and Gani (1999) who have a long experience in stochastic modelling of the spread of disease. This book focuses on stochastic modelling but contains also statistical inference and deterministic modelling, as well as several historical remarks. Diekmann and Heesterbeek (200?), nally, are concerned with mathematical epidemiology of infectious diseases and also apply their methods to real data. Exercises 1.1. Compute P(Z = 0jX0 = 10 Y0 = 1), P(Z = 1jX0 = 10 Y0 = 1), P(Z = 2jX0 = 10 Y0 = 1) and P(Z = 3jX0 = 10 Y0 = 1) for the Reed-Frost model. (Hint: Compute the probability for each epidemic chain giving Z = k separately and then add these probabilities.) 1.2. Show that y(t) ! 0 as t ! 1 for the model of Kermack and McKendrick given by the di erential equations (1.1). 1.3. Consider again the Kermack-McKendrick model and assume x0 > 1= . Using the formula y(t) = n ; z(t) ; x0 e; z(t) , show that the maximum number of infectives occurs exactly at the time t when x(t) = 1= . 2 The standard SIR epidemic model In this chapter we present a simple model for the spread of an infectious disease. Several simplifying assumptions are made. In particular, the population is assumed to be closed, homogeneous and homogeneously mixing. Also, the e ects of latent periods, change in behaviour, time varying infectivity and temporary or partial immunity are not taken into account. In later chapters we shall indicate ways of handling some of these complicating features of a real-life epidemic. 2.1 De nition of the model We assume that initially there are m infectious individuals (that have just become infected) and n susceptible individuals. The infectious periods of di erent infectives are independent and identically distributed according to some random variable I , having an arbitrary but speci ed distribution. During her infectious period an infective makes contacts with a given individual at the time points of a time homogeneous Poisson process with intensity =n. If a contacted individual is still susceptible, then she becomes infectious and is immediately able to infect other individuals. An individual is considered `removed' once her infectious period has terminated, and is then immune to new infections, playing no further part in the epidemic spread. The epidemic ceases as soon as there are no more infectious individuals present in the population. All Poisson processes are assumed to be independent of each other they are also independent of the infectious periods. We call this model the standard SIR epidemic model, the letters S, I, R standing for the terms `susceptible', `infectious' and `removed', respectively. Following Ball (1995), we denote the process by En m( I ). Also denote the mean and the variance of the infectious period I by and 2 , respectively. The rate of contacting a given individual is set to =n in order to keep the rate at which a given infective makes contact with other (initially susceptible) individuals constant (= ), independently of the population size. The special case where the infectious period has an exponential distribution will be discussed in some detail further on. The basic reproduction number and the nal epidemic size, already encountered in the Introduction, are two extremely important epidemiological quantities. The nal size of the epidemic, Z , is simply de ned as the number of initially susceptible individuals that ultimately become infected. Thus Z is a nite random variable taking values between 0 and n. In Section 2.4 we shall derive a linear system of equations for the distribution of the nal epidemic size. The basic reproduction number, R0 , is a little more di cult to describe. For this simple model R0 is conveniently de ned as the expected number of infections generated by one infectious individual in a large susceptible population. For the model presented above R0 = since is the average length of the infectious period and during the infectious period an infectious individual has contact with initially susceptible individuals at rate . The branching approximation of Section 3.3 will in a rigorous way give the basic reproduction number the interpretation of a critical parameter indicating whether a large outbreak is possible or not. For epidemic models with di erent types of heterogeneities, it is not always obvious how R0 should be de ned. We shall return to this problem in Chapters 6 and 7. 2.2 The Sellke construction The following alternative elegant construction of the standard SIR epidemic model is based on Sellke (1983). We keep track of the total `infection pressure' generated by the infectious individuals. Each susceptible individual is associated with a critical level of `exposure to infection', and as soon as the infection pressure reaches this level, the susceptible becomes infected. We call this level the threshold of the individual. This purely mathematical construction does not re ect any properties of a real-life epidemic but it will serve as an important tool in the derivation of several results in later chapters. Label the initial infectives ;(m ; 1) ;(m ; 2) : : : 0 and the initial susceptibles 1 2 : : : , n. Let I;(m;1) I;(m;2) : : : In be independent and identically distributed random variables, each distributed according to I . Also, let Q1 Q2 : : : Qn be an independent sequence of independent and identically distributed exponential random variables, having mean 1. These are the individual thresholds. For i = ;(m ; 1) ;(m ; 2) : : : 0, the initial infective labelled i remains infectious for a time Ii and is then removed. Denote by Y (t) the number of infectives at time t, and let A(t) = n Zt 0 Y (u) du (2.1) be the total infection pressure exerted on a given susceptible up to time t. Note that in A(t) the infectives are weighted according to their infectious periods. For i = 1 2 : : : n, the susceptible labelled i becomes infected when A(t) reaches Qi . The j th susceptible who becomes infected (not necessarily the susceptible labelled j !) remains infectious for a time Ij and is then removed. The epidemic ceases when there are no more infectives present. In Figure 2.1 we have plotted the total infection pressure A(t) against t for an epidemic starting with one infectious individual (m = 1). On the y-axis we have indicated the smallest individual thresholds (Q(i) denotes the ith order statistic) and horizontally the corresponding infectious period translated in time to the instant when the individual becomes infected. Note that the slope of A(t) is proportional to the number of infectious periods covering the time point t (i.e. the number of infectives Y (t)!) as it should be according to the de nition of A given in (2.1). 6 Q(5) Q(4) ; ; Q(3) Q(2) Q(1) ; ; ; ; - t Figure 2.1: A typical realisation of the total infection pressure with m = 1 initially infectious individual. Note that the infection pressure never reaches Q(4) so the epidemic stops and the nal size is Z = 3. Let us check that this construction gives a process equivalent to the standard SIR epidemic. The infectious periods follow the correct distribution and, if Y (t) = y and the individual labelled i is still susceptible at time t, then she will become infected during (t t + t) with probability y t=n + o( t). Indeed, owing to the lack-ofmemory property of the individual threshold Qi , the probability of the complementary event is given by P(Qi > A(t + t) j Qi > A(t)) = P(Qi > A(t + t) ; A(t)) = = = e; A(t+ t);A(t)] exp ; n y t + o( t) 1 ; y t=n + o( t) where the third equality follows from the de nition of A(t), since no infections will occur in a small enough time increment, making Y (u) constant and equal to y in (t t + t). In the original formulation of the model, our susceptible individual is contacted according to the superposition of y independent Poisson processes, each of intensity =n, giving rise to the same infection probability. 2.3 The Markovian case Consider the standard SIR model En m( I ) and denote by X (t) and Y (t) the number of susceptibles and the number of infectives, respectively, at time t. The process (X Y ) = f(X (t) Y (t)) t 0g will be a Markov process if and only if the infectious period has the lack-of-memory property. Assume therefore that I is exponentially distributed with intensity . Then the process (X Y ) is governed by the following transition table: from to at rate (i j ) (i ; 1 j + 1) ij=n (i j ; 1) j which follows immediately from the de nition of the model. This model, which originated with Bartlett (1949), is known as the general stochastic epidemic, a name that now seems inappropriate, since the model has over the years been generalized in an innumerable number of ways. The assumption of an exponentially distributed infectious period is certainly not epidemiologically motivated, although with this assumption the mathematical analysis becomes much simpler. Notably, using Markov process theory we can obtain deterministic and di usion approximations for the whole trajectory, which are valid for large population sizes (see Chapter 5). This is usually hard to achieve when the stochastic process is not Markovian. On the other hand, the modern probabilistic methods used in this text to derive branching process approximations (Section 3.3) together with results for the nal epidemic size (Section 2.4 and Chapter 4) do not rely on the Markov property, but can be carried out for all instances of the standard SIR epidemic de ned in Section 2.1. 2.4 Exact results Consider again the standard SIR epidemic En m ( I ). We will derive a triangular n linear system of equations for P n = (P0n P1n : : : Pn ), where Pkn is the probability that k of the initial susceptibles are ultimately infected. R Let Z be the nal size of the epidemic, and let A = A(1) = n 01 Y (u) du be the total pressure of the epidemic. Recall the Sellke construction above. Both the nal size and the total pressure can be expressed in terms of the infectious periods and the individual thresholds. First, 9 8 i < X= Ij Z = min :i : Q(i+1) > n j =;(m;1) where Q(1) Q(2) : : : Q(n) are the order statistics of Q1 Q2 : : : Qn, since the epidemic stops as soon as the infection pressure generated by the previously infected individuals is insu cient to infect any more susceptibles. Also, A= n Z X j =;(m;1) Ij which is just another way of writing A(1). It is thus clear that the nal size and the total pressure are intimately related. In fact, we have the following Wald's identity for epidemics (Ball, 1986): Lemma 2.1 Consider the standard SIR epidemic En m( I ) and let A be as above. Then E e; A ( =n)Z +m = 1 0 where ( ) = E exp(; I )] is the Laplace transform of I . Proof. To prove the identity, we note that ; ( =n) n+m = 20 13 n X A5 E 4exp @; n Ij j =;(m;1) " n X !!# = E exp ; =Ee ( =n) h ; A; A+ n i j =Z +1 n;Z Ij where the last identity follows since the variables Ij , j Z + 1, are independent of both Z and A. n We are now in position to derive the system of equations for P n = (P0n : : : Pn ). For each k 1, de ne K to be the set f1 2 : : : kg also, let 0 be the empty set. Recall that the initial susceptibles are labelled 1 2 : : : n. Pkn is the probability that k initial susceptibles are infected in the En m( I ) epidemic, and PK is the probability ;n P n . n that precisely the set K is infected. By symmetry, Pkn = k K Now x k and choose ` such that 0 k ` n, implying that K L N. We use the notion of infection pressure to compare an epidemic within N with a sub-epidemic within L. The event that an epidemic within N infects precisely the set K is the same as the event that a sub-epidemic within L infects precisely K, and that these k new infectives, together with the m initial infectives, fail to infect any of the individuals in the set N n L. We know from the Sellke construction that the probability of avoiding the infection is given by exp(;a), given that the sub-epidemic has generated the infection pressure A` = a. It follows that n ` PK = PKE exp(;A` (n ; `)) j Z ` = k] where Z ` is the nal size of the sub-epidemic. This equation is equivalent to ;` P n k k = P `E exp(;A` (n ; `)) j Z ` = k]: ;n (2.2) k k Now let us use Wald's identity (Lemma 2.1) applied to the sub-epidemic and with = n ; ` to get h i E e;A (n;`) ( (n ; `)=n)]Z +m = 1 or, conditioning on the nal size Z `, ` ` ` X Pk`E ;exp(;A`(n ; `)) j Z ` = k ( (n ; `)=n)]k+m k=0 = 1: (2.3) Equations (2.2) and (2.3) immediately give us ;` P n ` X k ;n ( (nk; `)=n)]k+m = 1: k=0 k ;` ;n = ;n;k ;n , we arrive at the following result: Finally, noting that k k `;k ` Theorem 2.2 Consider the standard SIR epidemic En m( I ). Denote by Pkn the probability that the nal size of the epidemic is equal to k, 0 k ` X n;k n ` ; k Pk k=0 ( (n ; `)=n)]k+m = n ` n. Then 0 ` n: (2.4) P[Z+m=j] 0.2 0.15 0.1 0.05 10 20 30 40 50 j Figure 2.2: The exact distribution of Z + m for m = 1, n = 50, = 1:5 and I i.e. the infectious period is constant and equal to 1. 1, Note that, since the system of equations is triangular, the nal-size-probabilities can be solved recursively. The proof of the theorem depends on the infection pressure generated by the various infectives, rather than the actual infectious periods. This indicates that we may allow for latent periods and time-dependent infection rates in the modelling, and still get the same type of nal size results, as long as the required infection pressure can be calculated. That this is indeed the case will become even clearer in Section 7.1 where the concept of random graphs is used to describe the epidemic ow through a homogeneous and uniformly mixing population. In Figure n 2.2 the probabilities P0n P1n : : : Pn , are plotted for a speci c choice of community and parameter values (the gure actually shows the plot of P (Z + m = k) for different values of k but since m = 1 this corresponds to Pkn;1). When the infectious period is constant, I c say, an infectious individual infects susceptible individuals independently (with probability p = 1 ; e; c) the distribution of the nal size is then equivalent to the Reed-Frost model de ned for discrete time dynamics in Section 1.2. It is seen in the gure that the distribution is bimodal: either a few individuals are infected or else a fairly large number are infected. This qualitative behavior becomes more and more evident as n, the initial number of susceptibles, increases. A mathematical proof ofthis is given by the threshold limit theorem of Chapter 4. Finally we mention brie y the very elegant theory developed by Lefevre and Picard in a series of papers (see e.g. Lefevre and Picard, 1990). They work with quite general classes of stochastic epidemic models, both single-type and multitype (cf. Chapter 6), and derive e.g. equations for the nal size distribution and the total force of infection A(1), using a non-standard family of polynomials initially introduced by Gontcharo (1937). However, we have decided not to include any presentation of their work, since the approach is rather algebraic in nature, and hardly increases the intuitive understanding of the models treated here. Exercises 2.1. Compute P0n, P1n and P2n numerically using the recursive formula given by (2.4) assuming n = 10, m = 1, = 2 and that the infectious period I is: a) exponentially distributed (the Markovian case) with mean 1 time unit. b) ;(2 2)-distributed (i.e. with mean 1). c) constant and equal to 1. 2.2. Assume m = 1, = 1 and that the infectious period is constant with mean 1 time unit. Use your favorite computer and mathematical software to see when the recursive formula of Section 2.4 breaks down numerically by computing and plotting n P1n : : : Pn for n = 10 20 30 : : : (for most computers negative probabilities start appearing around n 70 ; 90). 2.3. Modify the standard SIR epidemic so that the infectious period is = 1. (This model is denoted the SI model since individuals never get removed. It also has applications in sociology for the spread of rumours/knowledge. Infectious then corresponds to knowing, and spreading, the rumour.) For this model X (t) + Y (t) = n + m for all t. Assume m = 1. Describe the random process Y = fY (t) t 0g, of infectious individuals. Calculate the expected waiting time until everyone in the population becomes infected. (Hint: Consider the consecutive waiting times between infections.) What happens with this expression when n gets large? 3 Coupling methods Let us assume that we are interested in comparing two or more random elements with each other. It is sometimes possible to construct versions of these random elements on the same probability space, in such a way that the comparison suddenly becomes easy (indeed, often trivial) to carry out. This procedure is called coupling, the term referring to the fact that the random elements so constructed are often highly dependent. The coupling method has found many important applications in various elds of probability theory, including Markov processes, renewal processes and Poisson approximation. The book by Lindvall (1992) provides a nice introduction to the subject. Here we introduce some classical coupling ideas by providing simple examples. Then, after presenting the formal de nition, we describe some applications of the coupling method to the standard SIR epidemic model En m ( I ). First, it is shown that the number of infectious individuals in a large population initially behaves like a branching process. By coupling En m ( I ) with a branching process, we justify the approximation of the epidemic by the simpler and thoroughly analysed branching process. This result indicates at the same time the signi cance of the basic reproduction number R0 . Second, by coupling two epidemics with di erent contact parameter , we prove the intuitively obvious fact that the accumulated number of infected individuals at a given time grows (in a sense yet to be de ned) with , the infectiousness of the disease. 3.1 First examples Gambler's ruin problem Consider rst the standard gambler's ruin problem. An individual with an initial capital of m units of money goes to a casino. He plays a series of independent games, in each of which he wins one unit with probability and loses one unit with probability 1 ; . He continues until either his capital reaches n (n > m) or he goes bankrupt. We wish to use coupling to prove that P ( ), the probability of reaching the capital n given , is increasing in (as would be expected). To this end, let U1 U2 : : : be independent and identically distributed random variables, each uniformly distributed on the interval (0 1). Then, for a given , de ne +1 if Ui Y (i) = ;1 otherwise i = 1 2 : : : , and X X ( ) = m + Y (i) i=1 = 1 2 : : : . Then, since Y (i) is +1 with probability , the process X = fX ( ) = 1 2 : : : g, clearly describes the capital of the gambler. Note carefully that the same set of uniform variables is used to construct an entire family of trajectories, indexed by . Now, if < 0 then by construction Y (i) Y (i) for all i, so X ( ) X ( ) for all . This means that if a trajectory X ( ) reaches n before 0 then the same is true for X ( ). Thus we have that P ( ) P ( 0), since this property does not depend on the particular sample space chosen. 0 0 0 Stochastic ordering Let X and X 0 be real-valued random variables. We say that X is stochastically smaller D than X 0 and write X X 0 if P(X a) P(X 0 a) for all a, i.e. the probability that X exceeds an arbitrary level is smaller than the probability that X 0 exceeds the same level. When working with stochastically ordered random variables, the following simple D result can often be very helpful: if X and X 0 are random variables such that X X 0, ^^ ^^ then there exists a coupling (X X 0) of X and X 0 such that X X 0. The proof goes as follows. For a given distribution function G, de ne the generalized inverse G of G by G (u) = inf fx : G(x) ug 0 < u < 1: Then G (U ) has distribution function G if U is uniformly distributed on (0 1). If F and F 0 are the distribution functions of X and X 0, respectively, we have F F 0 ^ by assumption. Thus F F 0 , and we see that the variables X = F (U ) and 0 = F 0 (U ) provide us with the desired coupling. ^ X Domination of birth and death processes In our next example, di erent birth and death processes are compared. Suppose that X = fX (t) t 0g, and X 0 = fX 0(t) t 0g, are two birth and death processes on the set of nonnegative integers. The process X has birth rates i and death rates i, P(X (t + dt) ; X (t) = +1 j X (t) = i) = idt + o(dt) P(X (t + dt) ; X (t) = ;1 j X (t) = i) = idt + o(dt) X (0) = m, and likewise X 0 has birth rates 0i, death rates 0i and initial value m0. We use coupling to show that if i 0i for all i 0 and i 0i for all i 1 then X (t) is stochastically smaller than X 0(t) for all t (provided also m m0 ). ^^ De ne a bivariate process (X X 0) with initial value (m m0) and with the following intensity table: from to at rate (i j ) (i + 1 j ) i 0 (i j + 1) j (i ; 1 j ) i 0 (i j ; 1) j (i i) (i + 1 i + 1) i 0; i (i i + 1) i (i ; 1 i ; 1) 0i (i ; 1 i) i ; 0i : This process is well-de ned since all the numbers describing the transition rates are ^ nonnegative by assumption. It is easily checked that the rst coordinate X follows the same law as X . Likewise, the second coordinate is distributed as X 0. Moreover, if the bivariate process starts above the diagonal i = j then it will stay above the ^ ^ ^ ^ diagonal at all times. Hence we have found versions X and X 0 with X (t) X 0(t) for D all t in particular X (t) X 0 (t) for all t, since this latter property has nothing to do with the sample space chosen. Convergence of Markov chains As a nal illustration of the coupling method we show the classical result that an irreducible aperiodic Markov chain with a nite m-state space approaches stationarity as time grows, regardless of the initial distribution. Let X = (X (0) X (1) : : : ) be such a Markov chain and denote its transition probability matrix by P and its initial distribution by . Also, let = ( 1 : : : m ) be the unique strictly positive stationary distribution, satisfying = P . We wish to prove that P(X ( ) = j ) ! j as ! 1, for all xed states j , j = 1 : : : m. We now give the classical coupling proof. Introduce a Markov chain X 0 = (X 0(0), X 0(1) : : : ), independent of X , governed by the transition matrix P , and with ini^ tial distribution . This makes X 0 stationary. Then x a state j and de ne X = ^ (0) X (1) : : : ) by ^ (X X ^ X ( ) = X 0(( )) if < T if T where T = minf 0 : X ( ) = X 0( )g: ^ Due to the (strong) Markov property, the processes X and X will be equally distributed. Thus ^ jP(X ( ) = j ) ; P(X 0( ) = j)j = jP(X ( ) = j ) ; P(X 0( ) = j )j ^ = jP(X ( ) = j T > ) ; P(X 0( ) = j T > )j P(T > ): It is easy to see that P(T > ) ! 0 as ! 1, i.e. that T is nite a.s. Indeed, the bivariate process (X X 0 ) is irreducible and aperiodic, and T is the rst time this process visits the diagonal. 3.2 De nition of coupling Skills in coupling are acquired only by working through many examples. The actual de nition is not very illuminating, but we give it here anyway for the sake of completeness. Given probability spaces ( F P) and ( 0 F 0 P0), denote the state space by E E could be the set of nonnegative integers, the set of real numbers, the set of sequences of real numbers, the space of right-continuous real-valued functions de ned on 0 1), and so on. By a coupling of the random elements X : ! E and X 0 : 0 ! E we mean a ^^ ^^ probability space ( ^ F P) and a random element (X X 0) : ^ ! E 2 such that D^ X=X and D^ X 0 = X 0: ^ In our rst example, the desired result was obtained by putting X = X and ^ X 0 = X . The second and third example also t nicely into our de nition, and so ^ does the last one if we put X 0 = X 0. 0 3.3 Applications to epidemics Initial approximation Intuitively speaking, during the initial stages of an epidemic in a large population we would expect that, with high probability, contacted individuals are susceptible, so that the number of infectious individuals follows some kind of branching behaviour. This branching approximation idea has a long history, see e.g. Bartlett (1955) and Kendall (1956). Here, following Ball and Donnelly (1995), we use a coupling argument to investigate how the approximation improves as the population size tends to in nity. Let us rst de ne the branching process, and derive heuristically some initial results concerning this process. At time t = 0 there exists a group of m ancestors (that have just been born). The life spans of di erent individuals are independent and identically distributed according to a random variable I . During her life span, a given ancestor gives birth at the time points of a Poisson process with intensity . Her children have independent and identically distributed life spans and themselves give birth according to Poisson processes with intensity , and so on. We assume that all Poisson processes are independent of each other they are also independent of the life spans. Denote the resulting process by Em( I ). Jagers (1975) gives a thorough treatment of a class of continuous time branching processes containing the above process as a special case. Let fY (t) t 0g, be the number of individuals alive at time t, t 0, and denote by D the number of o spring of a given individual. We wish to investigate the possible extinction/explosion of Y (t) as t grows. First, since there are on the average mE (D) individuals in the th generation, it is intuitively clear that the process will become extinct if and only E (D) 1. Turning to the more interesting case where E (D) > 1, we let q be the extinction probability of the branching process and rst assume that m = 1. Then, by letting D0 be the number of children of the ancestor, we have q= 1 X k=0 P(extinction j D0 = k)P(D0 = k): But ultimate extinction will occur if and only if all of P (independent) branches the generated by these children become extinct, hence q = 1 qk P(D0 = k), showing k=0 that q is the solution of the equation = E ( D ). With a little more work one can show that q is the smallest solution to this equation. Finally, when there are m ancestors the extinction probability is given by qm with q as above. For detailed proofs of all these results, see Jagers (1975). We conclude our description of the process Em ( I ) by presenting the fundamental quantities encountered above in a somewhat more explicit form. Note that, given I = t, the number of children D is Poisson distributed with mean t. It follows that the probability generating function of D is ; E( D) = E E( D j I ) = E (expf; I (1 ; )g) = ( (1 ; )) and its expectation is E (D ) = where the probability generating function and the expectation of I are given by and , respectively. Also note that the variance of D is strictly positive, even if the lifetime I is constant. Returning to the main topic of this section, we examine the branching approximation of our epidemic process. Consider a sequence of standard SIR epidemic processes En m( I ), n 1. Let fYn(t) t 0g, be the process describing the number of infectives in the nth epidemic. We wish to compare this process with fY (t) t 0g, the process describing the number of individuals alive in the continuous time branching process Em ( I ). First we construct the branching process Em ( I ). Suppose that the probability space ( F P) holds the individual life histories H;(m;1) H;(m;2) : : : , where each 6 H3 H2 H1 H0 u u eu e eu - t Figure 3.1: Construction of Em ( I ) for m = 1 using life histories. Hi is a list containing the life span of the ith individual together with the time points at which this individual gives birth. Let H;(m;1) H;(m;2) : : : H0 be the life histories of the m ancestors and let Hi , i 1, be the life history of the ith individual born. Next, we use an independent sequence Ui , i 1, of independent and identically distributed random variables de ned on ( F P), each uniformly distributed on (0 1), to construct all of the epidemic processes En m ( I ), n 1. Fix n and label the initial susceptibles 1 2 : : : n. The initial ancestors in the branching process correspond to the initial infectives in the epidemic. A contact in the epidemic process occurs whenever a birth occurs in the branching process. The individual contacted at the ith contact has label Ci = nUi ] + 1. If this individual is still susceptible, then she becomes infected in the epidemic, otherwise she and all of her descendants in the branching process are ignored in the epidemic process. In the latter case the individual is called a ghost, following Mollison (1977). Finally, the death of an nonghost individual in the branching process corresponds to removal in the epidemic. This construction leads to a process equivalent to En m( I ). In Figure 3.1 the construction of Em ( I ) is illustrated with m = 1. The life histories of the initially infectious individuals are inserted horizontally one at each y-level, starting at t = 0 (in the example only H0 since m = 1). As t grows, a new life history is inserted whenever there is a birth among the life histories present. In the construction of the epidemic En m( I ) the same procedure is used, except that the life history is not inserted if the label Ci has appeared previously, i.e. the individual is a ghost. Obviously, the processes Yn and Y agree until the time Tn of the rst ghost. The number of births in the branching process during a xed time interval 0 t0] is nite a.s. It is easily checked that any nite number of labels Ci will be distinct with a probability tending to 1 as n ! 1, showing that P(Tn > t0 ) ! 1 as n ! 1: Ball and Donnelly (1995) have shown that the branching process breaks down roughly at time C log(n). Note that the problem of deciding the number of contacts made before a previously infected individual is picked again, i.e. the number of trials before one of the labels Ci is repeated, is a variant of the classical birthday problem. We collect our ndings in the following theorem: Theorem 3.1 Consider a sequence of epidemic processes En m( I ), n 1. Also, denote by Yn (t) the number of infectives at time t, t 0. Then, for each xed t0 , Yn(t0) ! Y (t0 ) almost surely, where fY (t) t 0g, is the process describing the number of individuals alive in the branching process Em ( I ). If 1 then Y becomes extinct with probability 1. On the other hand, if > 1 then Y becomes extinct with probability qm , where q is the smallest root of the equation ( (1 ; )) = , or explodes with probability 1 ; qm . This result shows that the basic reproduction number R0 = determines, for a large population, whether or not a large outbreak of the epidemic may occur. Monotonicity We nally show how coupling can be used to compare epidemics with the same distribution for infectious periods but with di erent infection rates. We consider 0 , and two standard SIR epidemic models En m ( I ) and En m( 0 I ) where prove that the accumulated number of infected individuals at time t for the epidemic with infection rate is stochastically smaller than the corresponding quantity for the epidemic with rate 0. Let us return to the Sellke construction. Let I;(m;1) I;(m;2) : : : In be the infectious periods and let Q1 Q2 : : : Qn be the individual thresholds of the epidemic En m( I ). Thus the variables Qi , 1 i n, are exponentially distributed with intensity 1. To construct En m( 0 I ), use the same realizations of infectious periods and individual thresholds. Only the infection pressure processes, A(t) and A0 (t), respectively, are di erent for the two models. For 1 j n, de ne Tj to be the time of the j th infection in the epidemic with infection rate Tj = 1 if the nal size is strictly less than j . Similarly de ne Tj0, 1 j n, for the epidemic process with infection rate 0. The rst infection in the epidemic with rate 0 will occur earlier than the rst infection in the other epidemic, i.e. T10 T1, since A(t) A0 (t) up to the time point T10 . This will give rise to an even larger di erence in magnitude between the two infection pressure processes. Consequently, it follows that T20 T2 , and so on. The desired result follows, since jfj : Tj tgj jfj : Tj0 tgj for all t. Exercises ^^ 3.1. Check that the process (X X 0), de ned in the section on domination of birth and death processes, is indeed a coupling of the birth and death processes X and X 0, i.e. that the marginal distributions coincide with the distributions of X and X 0, respectively. 3.2. In Section 3.3 we saw that the branching approximation broke down on the part of the sample space where the size of the total progeny in the branching process was in nite. Show that in this case the rst ghost in the construction appears when p approximately n individuals have become infected. It is well-known in branching process theory that, given explosion, the total number born before t grows like e t , where (the so-called Malthusian parameter) satis es a certain equation. What can hence be said about the time of the appearance of the rst ghost? 3.3. Consider the Markovian version (Section 2.3) of the standard SIR epidemic (m xed, n large). Without referring to the branching approximation of Section 3.3, approximate the process of infectives Yn(t) during the initial stage of the epidemic with a suitable simple birth and death process. What is the probability of extinction/explosion of this approximating process? (Hint: Xn(t) n during the initial stage of the epidemic.) 4 The threshold limit theorem We will now explore in greater detail the large population limit of the nal size distribution for the standard SIR epidemic model En m( I ). We have seen (Section 3.3) that, if the population of susceptibles is large and we introduce a small number of initial infectives, the number of infectious individuals behaves like a branching process in the beginning. If the basic reproduction number R0 = is less than or equal to 1, a small outbreak will occur. On the other hand, if R0 exceeds 1, then there is a positive probability that the approximating branching process explodes this implies, of course, that the branching process approximation will break down after some time. Then it is reasonable to expect that the nal epidemic size will satisfy a law of large numbers. This indicates that the asymptotic distribution of the nal size actually consists of two parts, one close to zero and the other concentrated around some deterministic value. In this chapter we sketch the derivation of these results, using the Sellke construction and the beautiful imbedding representation of ScaliaTomba (1985, 1990). A uctuation result for the nal size, given a large outbreak, will also be given. In the nal section we combine earlier results and indicate the proof of a theorem due to Barbour (1975) on the duration of the (Markovian) standard SIR epidemic. 4.1 The imbedded process To explain the imbedding idea, we need to introduce two auxiliary processes the infection pressure process I (t) = n t];m X j =;(m;1) and the threshold process Q(t) = n X j =1 Ij 0 t n+m 1 fQ j tg t 0: Recall that the variables Ij are the infectious periods, and the variables Qj are the individual thresholds. The index t should not be interpreted as ordinary time. We know from Section 2.4 that the nal size Z can be written 8 < Z = min :i 0 : Q(i+1) > n i X j =;(m;1) 9 = Ij : Let us express this quantity in terms of the infectivity and the susceptibility processes. By de nition, Z = i if and only if Q(1) I (0 + m) Q(2) I (1 + m) ... Q(i) I (i ; 1 + m) Q(i+1) > I (i + m): Put another way, the pressure I (0 + m) is enough to infect at least one individual, the pressure I (1 + m) is then enough to infect at least two individuals, and so on. Hence Q(I (0 + m)) > 0 Q(I (1 + m)) > 1 ... Q(I (i ; 1 + m)) > i ; 1 Q(I (i + m)) i: It follows immediately that Z = minft 0 : Q(I (t + m)) = tg: For future use, denote the composition Q(I (t)) by C (t). Figure 4.1 shows simulated realizations of C (t) for two di erent population sizes. The nal size Z is the \time" when C (t) intersects t. 4.2 Preliminary convergence results From now on we consider a sequence of epidemic processes En m( I ), n 1 the quantities de ned above will now have a subscript n. It is straightforward to analyse the (independent!) processes In and Qn. De ne In (t) = In(nt) and Qn(t) = Qn (t)=n. Then In(t) ! t and Qn(t) ! 1 ; e;t in probability, uniformly on compact sets. Here we recall that = E (I ). Since composition is a continuous operation, it follows that Cn(t) = Qn(In(t)) ! 1 ; e; t in probability, uniformly on compact sets. We also have the following uctuation results. The sequence of processes pn ;In(t) ; t t 0 100 1000 80 800 60 600 40 400 20 200 20 40 60 80 100 200 400 600 800 1000 Figure 4.1: Simulation of the imbedded process C (t) for n = 100 (left) and n = 1000 (right). n 1, converges in distribution to a Wiener process with variance sequence pn ;Qn(t) ; 1 ; e;t t 0 2 2 t. Also, the n 1, converges weakly to a Gaussian process with mean 0 and variance e;t (1 ; e;t ). (Readers not familiar with the theory of weak convergence of sequences of random processes may consult e.g. Billingsley, 1968. Convergence for a xed arbitrary value of t follows from the usual law of large numbers and the central limit theorem, hence the stated results are certainly plausible.) Finally, in order to study the uctuations C~n(t) of the composite process Cn(t) = Qn(In(t)) we write pn ;Qn(In(t)) ; 1 ; e; t = + pn Qn(In(t)) ; h1 ; e;I (t)i pn h1 ; e;I (t)i ; 1 ; e; t n n : ~ It is seen that this process converges weakly to a zero mean Gaussian process C (t) with variance ; e; t 1 ; e; t + 2 2 te;2 t : All of the results above are derived rigorously by Scalia-Tomba (1985, 1990). 4.3 The case mn=n ! > 0 as n ! 1 Write the nal size proportion as Zn = 1 min t 0 : C t + mn = t : n nn nn n 0 Working with Zn = Zn + mn, which includes the initial infectives in the nal size, will lead to cleaner formulas. We know that Cn(t) converges in probability to 1 ; e; t , and also that mn =n ! > 0 by assumption. It is thus easily seen that the sequence 0 0 Zn = Zn=n converges to , the unique solution to the equation 1 ; e; ; : To give an intuitive explanation of this equation, rewrite it as 1 + ; = e; : (4.1) A ;given individual avoids becoming infected by a given infective with probability E e; I=n . Hence the probability of escaping infection is given by ; E e; I=n = n 1; n e; i.e. the right hand side of (4.1). But the probability of escaping infection is of course equal to the proportion of initial susceptibles who remain uninfected, i.e. the left hand side of (4.1). In Figure 4.2 the two functions f (t) = 1 + ; t and g(t) = e; t are plotted for the case = 0:1, = 1:8 and = 1. The intersection of the two curves de nes the unique solution 0:903. 0 0 To obtain a central limit theorem for Zn, we proceed as follows. Since Zn = 0 ) + mn =n and = 1 ; e; + , we may write Cn(Zn pn ;Z 0 ; Zn 0 h pn pn pn i Cn ;Zn0 ; 1 ; e; Z n h ; Zi + 1;e ; 1 ; e; + o(1) h i Cn ;Zn0 ; 1 ; e; Z = p;0 + e; n Zn ; + o(1): p0 Rearranging and taking limits yields that the sequence n(Zn ; ) converges to a = 0 n 0 n 0 n normally distributed random variable with mean 0 and variance ;e; ;1 ; e; + 2 2 e;2 ;1 ; e; 2 : 1 0.8 0.6 0.4 0.2 0.2 0.4 0.6 0.8 1 Figure 4.2: Graphical illustration of , the solution to (4.1), for = 0:1, = 1:8 and = 1. The expression above is well-de ned if the condition e; < 1 is ful lled. To see that this is indeed the case, consider the functions exp(; t) and 1 + ; t in Figure 4.2. The derivative of the exponential function has to be strictly greater than ;1 at the crossing point , leading to the desired conclusion. The condition has actually a very natural interpretation. At the beginning, the basic reproduction number is given by R0 = , i.e. an infectious individual generates on the average new cases in a large susceptible population. However, after a large outbreak a fraction 1 ; ( ; ) has escaped infection so that a second introduction of the disease in the population would correspond to the e ective basic reproduction number (1 ; ( ; )) = exp(; ). This number is less than 1, otherwise the epidemic would not have ceased in the rst place. Let us state our results as a theorem. Theorem 4.1 Consider a sequence of epidemic processes En m ( I ). Assume that mn=n ! > 0 as n ! 1 and de ne as the solution to (4.1). Also denote the p0 0 nal epidemic size by Zn and write Zn = Zn + mn . Then the sequence n(Zn=n ; ) converges to a normally distributed random variable with mean 0 and variance (1 ; ) + 2 2 2 (1 ; )2 n where = 1 + ; = e; . 4.4 The case mn = m for all n Recall the branching process approximation result of Section 3.3, implying that the nal size Zn of the epidemic En m( I ) converges almost surely to the total progeny of a suitably chosen branching process Em( I ) as n ! 1. Thus we have that 0 lim lim P(Zn t) = qm t!1 n!1 of where qm is the extinction probability p the approximating process. Choose a sequence tn such that tn =n ! 0 and tn= n ! 1 as n ! 1. Also, de ne as the nontrivial solution of the equation 1 ; e; = : (4.2) We will show that 0 lim P(Zn tn) = qm (4.3) n!1 p 0 (4.4) lim nlim P(tn < Zn < n ; c n) = 0 c!1 !1 pn) = 0: 0 (4.5) lim lim P(Zn > n + c c!1 n!1 This means that if n is large and if the branching approximation breaks down, then p p the nal size falls in the range n ; c n n + c n] with high probability (for some large xed c). Scalia-Tomba (1985, 1990) has proved that the central limit theorem above (with = 0) applies in the case of a large outbreak. We start by proving (4.3), following Ball and Clancy (1993) rather than ScaliaTomba (1985, 1990). De ne Pt Em ( I )] to be the probability that the branching 0 process Em ( I ) has total progeny t (including the m ancestors). If Zn tn then each infective contacts susceptible individuals at a rate bounded below by n = (n + m ; tn )=n. It follows that P(Z 0 n tn) t X n t=0 1 X t=0 Pt E m ( n I )] Pt E m ( n I )] m and the right hand side is the extinction probability qn of the modi ed branching process Em( n I ). Now n ! as n ! 1, so that qn ! q. Thus, for xed > 0 there remains to choose t and n large enough that 0 0 qm ; P(Zn t) P(Zn tn) qm + : Equation (4.3) follows. Now assume that the basic reproduction number is above 1, i.e. R0 = > 1, otherwise there is nothing left to prove. We proceed to show (4.4). We have ; cpn) P(Cn(t + m) = t for some t 2 tn n ; cpn]) p t 1~ 1 ; e; (t+m)=n ; n = ; p Cn t + m for some t 2 tn n ; c n] n n 0 P(tn < Zn < n =P : Are there any values of t in the prescribed interval for which the equality above could be ful lled? The left hand side is a concave function of t, so it is enough to check the p endpoints. First put t = n ; c n. After an easy calculation, we have t c; 1 ; e; (t+m)=n ; n = pn e; + 1 + O(1=n) p ~ and the process jCn(t)j= n will not be able to reach this level if c is large enough. Second, set t = tn . It follows that t n 1 ; e; (t+m)=n ; n = ( ; 1) tn + O(1=n): p p Again, this quantity is large on the 1= n scale, since tn= n ! 1 as n ! 1. (Note that > 1 by assumption.) This proves (4.4), and (4.5) is veri ed in the same manner. We have thus indicated the proof of the following very important result. Theorem 4.2 Consider a sequence of epidemic processes En m ( I ). Assume that n mn = m for all n, and de ne as the nontrivial solution to (4.2). Also denote the 0 nal epidemic size by Zn and write Zn = Zn + m. If 1 then Zn ! Z almost surely, where P(Z < 1) = 1 and Z is the total progeny in a continuous time branching process Em ( I ), initiated by m ancestors, in which individuals give birth at the rate during a lifetime distributed according to I. If > 1 then Zn still converges to Z , but now P(Z < 1) = qm , where qm is the extinction probability of the branching process. With probability 1 ; qm , the sequence pn(Zn0 =n ; ) converges to a normally distributed random variable with mean 0 and variance (1 ; ) + 2 2 2 (1 ; )2 where = 1 ; . From the theorem it follows in particular that, in the case where R0 > 1, the nal size proportion Zn=n converges in distribution to a random variable with mass qm at the point 0 and mass 1 ; qm at the point . In Figures 4.3 and 4.4 histograms of the nal size in 10000 simulations are reported. Both gures treat the Markovian version of the standard epidemic model (cf. Section 2.3) and an initial population of n = 1000 susceptibles and m = 1 infectious individual. Figure 4.3 is the frequency distribution when R0 = 0:8 < 1 whereas Figure 4.4 corresponds to the case where R0 = 1:5 > 1. It is clear from the picture that major outbreaks occur only in the latter case. The solution of (4.2) is 0:583 when R0 = = 1:5 and the initial proportion infectious is negligible. The variance expression of Theorem 4.2 is 3.139. The theorem then states that the distribution of the major outbreak sizes should be p approximately Gaussian with mean n = 583 and standard deviation 3:139n 56, agreeing quite well with the histogram. 4.5 Duration of the Markovian SIR epidemic Let us nally discuss the duration Tn of the Markovian SIR epidemic, i.e. the time between the rst infection and the last removal in the epidemic. Barbour (1975) has derived limit theorems, as the population size becomes large, for the distribution of the duration. The epidemic is allowed to start either with a positive fraction of infectives or with a single case. Below we give a heuristic motivation of the fact that Tn is either O(1) or grows like log(n) as n ! 1. We only discuss the (more complicated) case with one initially infectious individual. Consider a sequence of standard SIR epidemics En m ( I ), where I is exponentially distributed with rate . Let Xn(t) and Yn(t) denote the number of susceptibles and infectives, respectively, at time t. De ne the duration as Tn = inf ft 0 : Yn(t) = 0g: We have seen (Exercise 3.3) that, if n is large, the number of infectives Yn is well approximated at the beginning by a linear birth and death process Y = fY (t) t 0g with birth rate y and death rate y. If = 1 then this approximating process will become extinct with probability 1, corresponding to a duration Tn that is O(1). On the other hand, if = > 1 then Y will become extinct with probability ( = );1 and will explode with probability 1 ; ( = );1. In this latter case, the approximation actually breaks down after approximately C1 log(n) time units, as we have seen in Exercise 3.2, and we say that the rst phase of the epidemic is over. In the second phase of the epidemic the bivariate process (Xn Yn) moves in a more or less deterministic fashion, and one can show that the duration of this deterministic phase is O(1) (cf. Chapter 5). The third phase then begins when Xn=n has settled to 1 ; but there are still infectives present in the population. Here Yn can be approximated by another ~ ~ birth and death process Y = fY (t) t 0g, this time with birth rate (1 ; )y and death rate y. We have seen in Section 4.3 that (1 ; )= = (1 ; ) < 1, so ~ the approximating process Y is always subcritical, and will therefore become extinct after an additional time C2 log(n). Cf. the construction of Whittle (1955). n 8000 6000 4000 2000 0 200 400 600 800 1000 Figure 4.3: Histogram of nal sizes for 10000 simulations of Em n( I ) with m = 1, n = 1000 and R0 = = 0:8, i.e. below threshold. Each histogram bar has width 10, so for example the second bar denotes the frequency of outbreak sizes between 10 and 19. 8000 6000 4000 2000 0 200 400 600 800 1000 Figure 4.4: Histogram of nal sizes for 10000 simulations of Em n( I ) with m = 1, n = 1000 and R0 = = 1:5, i.e. above threshold. These arguments together make it plausible that either Tn is short, or the ratio Tn= log(n) converges in distribution as n ! 1. Actually a much stronger statement is true: Either Tn is short or the di erence Tn ; C log(n) converges in distribution as n ! 1. We have the following theorem: Theorem 4.3 Consider a sequence of epidemic processes En m ( I ), where I is n exponential with rate . Assume that mn = 1 for all n, and de ne as the nontrivial solution to (4:2). If = 1 then the duration Tn of the epidemic converges weakly to the time to extinction of a birth and death process fY (t) t 0g with birth rate y and death rate y. If = > 1 then the above is true only on a part of the sample space of probability mass ( = );1 . On the rest of the sample space, Tn ; ; 1 +1 (1 ; ) ; log(n) ; c ! W in distribution as n ! 1. Here c = c( ) is a constant and W has the distribution of W1 =( ; (1 ; )) + W2 =( ; ), where W1 and W2 are independent, both with the extreme value distribution function F (w) = exp(;e;w ). The proof is not di cult but rather technical, and is therefore omitted (see Barbour, 1975). Exercises 4.1. This exercise will give you a feeling of what proportion gets infected, assuming a major outbreak, for di erent parameter values. Using equation (4.1), compute the (asymptotic) proportion that gets infected numerically if: a) R0 = 2 (i.e. = 2) and = 0:1. b) R0 = 0:8 and = 0:1: c) R0 = 2 and = 0 (the proportion infected is now the largest solution to (4.1) or equivalently (4.2)). d) R0 = 0:8 and = 0. (What does this say about the possibility of a major outbreak?) 4.2. We know that the basic reproduction number R0 = for the standard SIR epidemic remains constant as n grows. What happens with the nal size if instead R0 grows with the population size, R0 = R0 (n)? Study the following cases: a) R0 (n) = Cn. b) R0(n) = log(n). (Hint: Since most individuals will get infected when R0 is large, consider instead the number of individuals who escape the epidemic. Compute the probability that a given individual i escapes infection by looking at her threshold Qi .) 4.3. Consider the standard SIR epidemic. Assume that m = 1, n is large, and the basic reproduction number is above 1. Using the branching approximation, show that the probability of a large outbreak is always less than or equal to the nal size proportion in case of a large outbreak. When does equality hold? (Hint: Derive expressions for the two quantities and apply Jensen's inequality to nd a relation between them.) 5 Density dependent jump Markov processes In the present section we shall approximate certain jump Markov processes as a parameter n, interpreted as the population size, becomes large. The results will be presented in a form general enough for our purposes. More general results, as well as other extensions, may be found in Chapter 11 of Ethier and Kurtz (1986), which has served as our main source. With the aim to explain the intuition behind the theory we start with a simple example, a birth and death process with constant birth rate (`immigration') and constant individual death rate. The results in Sections 5.3 and 5.4 are applied to this example, thus giving explicit solutions. In Section 5.5 we apply the results to the Markovian version of the epidemic model described in Section 2.3. It is shown that this process converges weakly to a certain Gaussian process but in this case it is not possible to obtain explicit solutions for the deterministic limit and the covariance function. It is worth mentioning that the techniques presented in this chapter may be applied to a wide range of problems such as more general epidemic models and models for chemical reactions and population genetics, as well as other population processes. Two fundamental results about Poisson processes will be used without proof. In Section 5.3 we use the fact that if Y = fY (t) t 0g is a Poisson process then limn!1 sups t jn;1Y (ns) ; sj = 0 almost surely, for any t 0. In p Section 5.4 we ap(n) , de ned by W (n) (t) = n(n;1 Y (nt) ; t), ply the result stating that the process W converges weakly to the standard Brownian motion. This is proven using the fact P that Y (nt) = n=1 (Y (kt) ; Y ((k ; 1)t)) is a sum of n independent and identically k distributed Poisson random variables, and applying Donsker's theorem (e.g. Billingsley, 1968 p 68). See for example Ethier and Kurtz (1986), Exercise 4.10 for hints to a complete proof. In Section 5.4 we also integrate with respect to Gaussian processes, the so called It^ integrals. However, it is possible to read the text without any o knowledge of such integrals. 5.1 An example: A simple birth and death process Let X = fX (t) t 0g be a birth and death process with constant birth rate: k = , and with death rates: k = k. This means that individuals enter the population at constant rate (immigration) and each individual lives for an exponentially distributed time with mean 1= . If at time t, the process satis es X (t) > = , then the death rate (= X (t)) exceeds the birth rate (= ), and vice versa if X (t) < = . One might therefore de ne = as the `equilibrium' of the population, and one would expect the population size to uctuate around this value, except possibly during the initial phase, if X (0) is far from the equilibrium. In Figure 5.1 below a simulation of the birth and death process with = 200, = 1 and X (0) = 100 is plotted. As we can see the process soon reaches the equilibrium value = = 200 around which it uctuates randomly. The arguments above are very loose, and the present chapter 200 150 100 50 2 4 6 8 Figure 5.1: Simulation of birth and death process with 100. k = 200, 10 k t = k and X (0) = will formalize the mathematical results. In particular, if is large, implying that the population size at equilibrium is large, we will show that the process above (and others of similar type) can be approximated by certain di usion processes. For the birth and death process de ned above, it turns out that it can be approximated by a so-called Ornstein-Uhlenbeck process. Such a process makes excursions away from the equilibrium, but the farther away from equilibrium the stronger is the drift towards equilibrium a result in line with the heuristics mentioned above. At the end of Sections 5.3 and 5.4 we apply the results of this section to the birth and death process and obtain explicit solutions. The birth and death process ts in with the model below if , the parameter assumed to be large, is replaced by n. 5.2 The general model Suppose that for each n 1, Zn = fZn(t) t 0g is a continuous-time Markov ( process on the d-dimensional lattice Z d governed by the jump intensities qznz)+` = n `(n;1 z ) z ` 2 Z d . This means that P(Zn(t + h) = z + ` j Zn(t) = z) = hn `(n;1 z) + o(h) ` 6= 0 X P(Zn(t + h) = z j Zn(t) = z) = 1 ; hn `(n;1z) + o(h): (5.1) ` We assume that the process has only a nite number of possible transitions, i.e. that there are only nitely many ` 2 Z d for which supx `(x) > 0, and that these transition rates `(x) are continuous functions. The starting point Zn(0) is assumed non-random. The rates above explain why the processes are called density dependent. The jump rates depend on the density of the process (i.e. normed by n). The factor n implies that the rates increase with n, a necessary criterion for the processes to behave more and more closely like a di usion. One way to characterize this process is by means of Poisson processes. To this end, let Y` = fY`(t) t 0g be independent standard Poisson process de ned for each of the possible transitions `. Then Zn can be written as Zn(t) = Zn(0) + X ` `Y` n Zt 0 ` (n ;1 Z n (s))ds : (5.2) Note that even though the Poisson processes were de ned independently, the terms in the sum above are dependent since the observation points of the Poisson processes are dependent. It is easy to check that (5.2) satis es (5.1) by recalling that the probability of a jump in a Poisson process in a short time interval is proportional to the length of the interval. Given Zn(t) = z, the probability that there will be a R jump in Y` n 0t ` (n;1Zn(s))ds during (t t + h) is thus hn `(n;1 z) + o(h), since the integrand is equal to n `(n;1 z) until the rst jump after t. Note also that the processes are de ned on the same probability space for di erent n, as the same Poisson processes are utilized in the construction. Below we prove convergence theorems for the sequence of processes fZng. 5.3 The Law of Large Numbers Before proving convergence results for fZng we derive an inequality known as Gronwall's inequality, which is interesting in its own right. Lemma 5.1 R(Gronwall's inequality). Assume f is a real function satisfying 0 t f (t) a + b 0 f (s)ds, for some positive constants a and b and for all t f (t) aebt t 0. 0. Then Proof. By iterating the inequality above one obtains f (t) a+b = Zt 0 f (s1 )ds1 a + b a + abt + b2 = aebt : Z0 t Z0 0 a+b Z s1 0 f (s2)ds2 ds1 f (s2)ds2ds1 ;a + b Z s2 f (s )ds ds ds 3 3 21 00 0 1 (bt)2 + b3 Z t Z s1 Z s2 f (s )ds ds ds : : : a X (bt)k a + a(bt) + a 2 3 321 k! a + abt + b2 = Z t Z s1 Zt s1 0 0 0 k=0 The result follows. ^ ^ Let Y` be centered Poisson processes, that is, Y`(t) = Y`(t) ; t. Further, let ;1 Zn and de ne the drift function F by F (x) = P ` ` (x), x 2 Z d . With this Zn = n ` notation, equation (5.2) is equivalent to Zn(t) = Zn X^ (0) + n;1 `Y n ` ` Zt 0 ` (Zn (s))ds + Zt 0 F (Zn(s))ds: (5.3) The second term on the right hand side of (5.3) will be small for large n since ^ sups t n;1jY`(ns)j converges to 0 almost surely, as mentioned at the beginning of the present chapter. This suggests that Zn will resemble the deterministic vector function z(t) de ned as the solution to the integral equation z(t) = z0 + Zt 0 F (z(s))ds: (5.4) This deterministic approximation can also be explained intuitively. The process Zn P starts at Zn(0) and the `average drift' of Zn(s) at s is ` ` `(Zn(s))ds = F (Zn(s))ds, R implying that Zn(t) should be approximately equal to Zn(0) + 0t F (Zn(s))ds. The following theorem proves this strictly. Theorem 5.2 Suppose that limn!1 Zn(0) = z0 and that for each compact K 2 Rd there is a constant MK > 0 such that jF (x) ; F (y )j MK jx ; y j 8x y 2 K . Then limn!1 sups t jZn(s) ; z (s)j = 0 almost surely, where z (t) is the unique solution to (5.4). Proof. The validity of the theorem relies only on ` in some neighbourhood K of s tg. De ne thus ` = supx2K `(x), which is nite due to the continuity of `. From (5.3), the de nition of z(t) and the assumptions of the theorem we have fz(s) 0 jZn(s) ; z(s)j = Zn(0) ; z0 + Z s; 0 X^ + n;1 `Y ` n ` Zs 0 ` (Zn (u))du F (Zn(u)) ; F (z(u)) du X jZn(0) ; z0j + j`j u s ` sup n;1 ^ jY`(n `u)j + Zs 0 MK jZn(u) ; z(u)jdu: By Gronwall's inequality (Lemma 5.1) this implies that jZn(s) ; z(s)j X ^ jZn(0) ; z0j + j`j sup n;1jY`(n `u)j us ` ! eM s: K Taking the supremum then yields sup jZn(s) ; z(s)j X ^ jZn(0) ; z0j + j`j supt n;1jY`(n `u)j u st ` ! eM t : K The exponential function is independent of n, the rst term in the brackets converges to 0 by assumption, and each term in the nite sum converges almost surely to 0 ^ owing to the fact that sups t n;1Y`(ns) converges to 0 almost surely, for any t. This completes the proof. We now apply the theorem to the birth and death process de ned in Section 5.1. We replace the parameter by n and denote the corresponding process by Xn. The two possible jumps are 1, and the jump rates are n for an increase and a for a decrease if Xn(t) = a. With the notation of Section 5.2 this implies that 1 (x) = 1 and ;1 (x) = x. Consequently we have F (x) = 1 (x) ; ;1 (x) = 1 ; x. It is easy to show that the solution to (5.4), denoted here by x(t), is given by x(t) = ;1 + (x0 ; ;1)e; t . By the theorem it then follows that if the initial value converges, i.e. Xn(0)=n ! x0 , then Xn(t)=n converges to x(t) almost surely, uniformly on compact sets. In particular we see that for su ciently large t x(t) ;1, so Xn(t) n ;1, as was seen heuristically in Section 5.1. 5.4 The Central Limit Theorem In the previous section, it was shown that the normed jump Markov vector process Zn, for a large population n, was approximately equal to the deterministic vector function z de ned by equation (5.4). The next natural step is to study the deviations between the two, that is to derive a central limit theorem. As usual, it turns out that p p the deviations are of order n. Before de ning the n-scaled vector process Vn we de ne the similarly scaled Poisson processes de ned in the previous section p; ^ W`(n)(t) = n n;1 Y`(nt) ; t = n;1=2 Y (nt): As pointed out at the beginning of this chapter, W`(n) converges to the standard p Brownian motion W`. The n-scaled centered vector process Vn is then de ned by p; Vn(t) = n Zn(t) ; z(t) (5.5) = vn(0) + X ` `W (n) ` Zt 0 Z tp ; n F (Zn(s)) ; F (z(s)) ds: ` (Zn(s))ds + p; Of course, vn(0) = n Zn(0) ; z(0) 0 in (5.5), which by assumption is non-random. The second equality is a direct consequence of the de nition of W`(n) , Zn and z. We can expand the integrand on the far right by Taylor's theorem, so that pn ;F (Zn(s)) ; F (z(s)) = pn@F (z(s))(Zn(s) ; z(s)) + O(pnjZn(s) ; z(s)j2) = @F (z(s))Vn (s) + O(jZn(s) ; z(s)j)Vn(s) where @F = (@j Fi ) is the matrix function of partial derivatives. From Theorem 5.2 we know that Zn converges to z, and from the beginning of this chapter that W`(n) converges to W`, a standard Brownian motion. This suggests that Vn converges to a process V de ned by the integral equation V (t) = v0 + X ` `W` Zt 0 ` (z (s))ds + Zt 0 @F (z(s))V (s)ds: This is proven in the following theorem where we use the notation G(x) = (5.6) P ``T (x). ` ` Theorem 5.3 Suppose @F is continuous and that limn!1 vn(0) = v0 (constant). Then Vn ) V , the process de ned in equation (5.6). This process V is a Gaussian vector process with covariance matrix Cov(V (t) V (r)) = where Z r^t 0 (t s)G(z(s))( (r s))T ds is a matrix function de ned as the solution of 0 (t 2 s) = ; (t s)@F (z(s)) (s s) = I ( 02 denotes the partial derivative with respect to s). Proof. De ne n(t) by n (t) = = Z tp ; n F (Zn(s)) ; F (z(s)) ; @F (z(s))Vn (s) ds 0 Zt 0 O(jZn(s) ; z(s)j)Vn(s)ds: From Theorem 5.2 we know that Zn(s) converges to z(s) uniformly on bounded intervals. Thus, since Vn is bounded in probability it follows that sups t j n(s)j converges to 0 almost surely. Introduce Un(t) = P `W (n) R t (Z (s))ds `` 0`n and U (t) = P `W R t (z(s))ds ` ` 0` These are the second terms in the de ning equations of Vn and V , equations (5.5) and (5.6) respectively. Rewrite (5.5) and (5.6) to obtain ~ Un(t) := Un (t) + n(t) = ;vn(0) + Vn(t) ; U (t) = ;v0 + V (t) ; Zt 0 Zt @F (z(s))V (s)ds: 0 @F (z(s))Vn(s)ds: (5.7) (5.8) . The processes fW`(n) n 1g converge to a standard Brownian motion for each `. Since Zn converges uniformly on bounded intervals to z and n converges to 0 it ~ follows that Un ) U . From the de nition of and by partial integration, Zt ~ (t s)dUn(s) = 0 = = Zt Zt (t s)dVn(s) ; (t s)@F (z(s))Vn (s)ds 0 (t t)Vn(t) ; (t 0)vn(0) 0 Zt ; 0 ( 02(t s) + Vn(t) ; (t 0)vn(0) so that Vn(t) = (t 0)vn (0) + Zt 0 (t s)@F (z(s))) Vn(s)ds ~ (t s) dUn(s): (5.9) An identical argument shows that V satis es V (t) = (t 0)v0 + Zt 0 (t s) dU (s): (5.10) From equations (5.9) and (5.10) it then follows that Vn ) V by the continuous mapping theorem (e.g. Corollary 3.1.9 of Ethier and Kurtz, 1986). The vector process U is Gaussian, in fact a time-inhomogeneous Brownian motion vector. It follows that V is also Gaussian and the variance function is as speci ed in the theorem. We now apply the result to the birth and death process de ned in Section 5.1. In the previous section we concluded that Xn(t)=n converged to the deterministic function x(t) = ;1 + (x0 ; ;1)e; t . To simplify notation we assume that the process is started in equilibrium, that is Xn(0) = n ;1 implying that x0 = ;1 and hence x(t) = ;1 for all t. Since F (x) = 1 ; x it follows that F 0(x) = ; . The function de ned in the theorem then has the solution pt s) = e; (t;s) , and ( G(x(s)) = 1 + x(s) = 2. The theorem is applied to Vn(t) = n(n;1Xn(t) ; ;1). The assumption Xn(0) = n= implies that vn(0) = 0 for all n, so v0 = 0. The theorem then states that the scaled birth and death process Vn converges to (i.e. may be approximated by) a Gaussian process V with covariance function Cov(V (t) V (r)) = Z t^r 0 e; (t;s) 2e; (r;s)ds = ;1 ;e; jt;rj ; e; (t+r) : (Except at the start, i.e. for small t and r, the second term on the far right is negligible and then Cov(V (t) V (r)) ;1e; jt;rj.) A Gaussian process having this covariance function is known as an Ornstein-Uhlenbeck process, an important process in di usion theory (e.g. Karatzas and Shreve, 1991). It may also be illuminating to write equation (5.6) explicitly for our example: V (t) = W1(t) ; W;1 (t) ; Zt 0 V (s)ds: Since W1 (t) and W;1(t) are independent their di erence has the same distributional p properties as 2W (t) (where W is a standard Brownian motion, of course). Performing this substitution and writing the integral equation above as a (stochastic) di erential equation we have p dV (t) = ; V (t) dt + 2dW (t) which is the de ning di erential equation for the Ornstein-Uhlenbeck process. The properties of the birth and death process derived heuristically can be veri ed. The limiting process has a drift back towards equilibrium (now at the origin due to centering). Beside the negative drift, there is random noise expressed in the second term. 5.5 Applications to epidemic models In the present section we apply the results presented above to a special case of the epidemic model de ned in Chapter 2. We stress that the approximation concerns the whole epidemic process (as it evolves in time) and not only the nal size as mainly studied in preceding sections. Due to the complex structure of epidemic models (although simple to de ne), we will not obtain very explicit solutions, as we did for the birth and death process above. The theory involves approximations relying on the central limit theorem. This means that we can only hope to approximate the epidemic process when there are many infectious individuals, thus excluding the initial and nal phases of the epidemic. We will therefore assume a (small) positive proportion of infectives when the epidemic starts how to approximate the initial phase of the epidemic using coupling methods was described in Section 3.3. Further, we treat the special case of the standard epidemic model presented in Section 2.3 in which the infectious periods fIig are exponentially distributed, making the epidemic process Markovian. By extending the dimension of the process, other distributions may be modelled using the same theory. For example, if the infectious period is ;(k ) the epidemic model falls under the model of Section 5.2 in which there are k infectious states, which individuals pass through sequentially with identical jump rates . Using the notation of Chapter 2, the model treated in this section is denoted En n( I ), where I is exponentially distributed with intensity . Our two-dimensional process is Zn = (Xn Yn), where Xn(t) denotes the number of susceptibles at t and Yn(t) the number of infectives at the same time point. The initial values are Xn(0) = n and Yn(0) = n. The proportion of initial infectives is assumed positive but usually very small. The process (Xn Yn) can make two types of jumps. Either a susceptible becomes infected, implying that the process changes by (;1 1), or else an infective is removed. The latter a ects the process by (0 ;1). Because the infectious periods are exponentially distributed with intensity parameter , the jump rate for a (0 ;1)-jump is Yn(t). New infections, or equivalently (;1 1)-jumps, occur at rate ( =n)Xn(t)Yn(t). The jump intensity functions for this model are thus and (;1 1) (x y ) = xy (0 ;1) (x y ) = y: Let x = x=n, and similarly y = y=n. We have then argued that the epidemic process has the following jump intensities P((Xn(t+h) Yn (t+h)) = (x;1 y +1)j(Xn(t) Yn(t)) = (x y)) = hn (;1 1) (x y)+o(h) P((Xn(t + h) Yn(t + h)) = (x y ; 1)j(Xn(t) Yn(t)) = (x y)) = hn (0 ;1) (x y) + o(h): The drift function F de ned in Section 5.3 is then given by F (x y) = (; xy xy ; y): The deterministic solution z = (x y) to the integral equation (5.4) corresponds to the pair of di erential equations x0 (t) = ; x(t)y(t) x(0) = 1 0(t) = x(t)y (t) ; y (t) y y(0) = : Recalling the historical overview in Section 1.4, we see that these di erential equations are identical to those of the rst deterministic epidemic model presented by Kermack and McKendrick (1927). A parametric solution to this set of di erential equations was derived in Section 1.4: x(t) = e; z(t) y(t) = 1 + ; z(t) ; e; z(t) ; where z(t) is de ned by the di erential equation z0 (t) = 1 + ; z(t) ; e; z(t) , with initial value z(0) = 0. Here = = . First we apply Theorem 5.2 to show that the `density' process (Xn Yn) = (Xn=n Yn=n), converges to the deterministic functions de ned above. To see that the conditions are ful lled, we note that (Xn(0) Yn(0)) = (1 ) = (x0 y0) so the requirement that the initial value converges is obvious. Using the fact that the domain of interest satis es 0 x1 x2 y1 y2 1 + together with the elementary bound ab (a2 + b2 )=2, one can show that jF (x1 y1) ; F (x2 y2)j (2 (1 + ) + ) j(x1 y1) ; (x2 y2)j: The upper bound is just a rough estimate but su cient for the second assumption of the theorem to be satis ed. Theorem 5.2 thus proves that (Xn(t) Yn(t)) converges almost surely to (x(t) y(t)), uniformly on bounded intervals. Theorem 5.3 can be applied to conclude that the uctuations around the deterministic solution are asymptotically Gaussian. However, the covariance function ~~ ~ de ; the very Vn Xn(t) = pn nedn(in) ; x(ttheoremYis(tnot pn ;Yexplicit.y(tLet The = (Xn Yn), wherederivatives ~n ) = Xt ) and matrix of partial n (t) ; ) . of the drift function and the matrix function G appearing in the theorem, are y ; xy @F (x y) = ; yy ; ;x and G(x y) = ; xxy xy + y : x ~ ~ The matrix @F (x y) is continuous and (Xn(0) Yn(0)) = (0 0), so the assumptions ~~ of Theorem 5.3 are satis ed. Hence, it follows that (Xn Yn) converges to a Gaussian vector process V . The set of di erential equations de ning in the covariance function are (derivatives below are with respect to s) 0 (t s) 0 (t s) 11 (t s) 12 (t s) ; y (s) ; x(s) 11 12 0 (t s) 0 (t s) = ; 21 (t s) 22 (t s) y(s) x(s) ; 21 22 and with 11(s s) = 22(s s) = 1 and 12(s s) = 21 (s s) = 0. It is seen that this is actually two independent pairs of di erential equations. The rst pair contains 11 and 12 and the second pair of di erential equations contains 21 and 22 and is identical to the rst one except for di erent initial conditions. The solutions are not explicit but can be partially derived. They are used for the computation of the covariance functions of the process, as speci ed in the theorem. For example, it ~ follows from the theorem that the limit of Xn(t) has variance Zt 0 ( 11(t s) ; 12(t s))2 x(s)y(s) + 12 (t s)2 y(s) ds: As mentioned earlier, it is complicated and not very illuminating to derive more explicit solutions. For interested readers we refer to Kurtz (1981) who characterizes the limiting process for a general distribution of the infectious period, without assuming exponentially distributed infectious periods as we have done. Exercises 5.1. Consider the SI model (see Exercise 2.5) and assume that m = mn = n , so that there is a positive proportion of initial infectives. Derive a law of large numbers for Yn(t), i.e. let n ! 1. (Hint: The rate of new infections is ( =n)(n + m ; Yn(t))Yn(t) since there are no removed individuals and hence Xn(t) = n + m ; Yn(t).) 5.2. The SI model (continued). Let y(t) be the deterministic limit derived in Exercise 5.1. To simplify computations you may assume that the initial proportion infecive is negligible (although positive). The central limit theorem of Section 5.4 implies that pn (n;1Yn(t) ; y(t)) converges to a Gaussian process. Derive the integral expression of Theorem 5.3 for the covariance function. Ultimately everyone gets infected (since individuals remain infectious forever) so the variance function tends to 0 as t = r gets large, but at what rate? (Hint: You may use without proof the fact that (t s) splits into a product f (t)g(s).) 5.3. An SIRS epidemic is just like the SIR epidemic, only individuals in the removed state lose their immunity after some time and become susceptible again. This in ow of new susceptibles can lead to a situation where the disease `survives' for a long time, a behaviour known as endemicity (cf. Chapter 8). . Extend the Markovian SIR epidemic to a Markovian SIRS epidemic by assuming that removed individuals become susceptible again independently at rate . Assuming Xn(0) = n and Yn(0) = m where m = mn = n , derive the law a large numbers for this model. 6 Multitype epidemics The epidemic model studied so far, En m( I ), assumes that the population is homogeneous (with regard to the disease) and that individuals mix uniformly. In real life epidemics, this is rarely the case. For example, children are usually more susceptible to in uenza, and sometimes individuals with a previous history of the disease have acquired some partial immunity. For STDs (sexually transmitted diseases), some individuals have higher infectivity in that they are more promiscuous (varying infectivity is often the case in other transmittable diseases as well). It may also be that the infectious periods of di erent individuals are not identically distributed however, the assumption of independence seems reasonable in most cases. These heterogeneities can be characterised as individual. A second group of heterogeneities is caused by the social structure in the population. The model En m ( I ) assumes that an individual has contact with each individual at equal rate (= =n), so that there is uniform mixing. In real life, the presence of social structures, such as households, friendly (including work) relations, and geographical structures, violates this assumption. In the present chapter we relax these assumptions and examine the consequences to the spread of disease. In Section 6.1 we indicate how to generalise the Sellke construction and the exact results presented in Chapter 2, to the case where the contact rates between di erent pairs of individuals as well as the distributions of the infectious periods may vary. In Section 6.2 we discuss large population approximations. When considering a sequence of epidemic models indexed by the increasing population size, this must be done without increasing the number of parameters. We therefore treat the case where individuals in the population can be characterised by di erent types of individuals, assuming that individuals of the same type are homogeneous with respect to susceptibility, infectivity and social mixing and have the same distribution of the infectious period. In Section 6.3 a model for epidemics among households is introduced. This model is motivated by the obvious fact that the rate of transmission tends to be much higher within households than between individuals of di erent households. We give some preliminary results, referring the interested reader to Ball et al. (1997) for the full story on household epidemics. Finally, in Section 6.4 we compare the distribution of the nal size for a speci c form of the multitype epidemic model with that of the nal size for the `single type' model En m( I ). 6.1 The standard SIR multitype epidemic model The model of Chapter 2 can be generalised in a straightforward way to the case where there are di erent types of individuals. Before de ning the model, originally considered by Ball (1986), we need some notation. Assume the population splits up into k groups of individuals labelled 1 : : : k. Suppose that initially there are ni suscepti- ble i-individuals and mi infectious i-individuals, i = 1 : : : k. Let n = (n1 : : : nk ), m = (m1 : : : mk ), n = Pi ni and m = Pi mi , the latter two denoting the total number of initially susceptible and infectious individuals respectively. Finally, we let i = ni =n denote the proportion of individuals of type i (i-individuals). Infectious periods of infectives of type i are distributed according to a random variable Ii with moment generating function i, i = 1 : : : k, and all infectious periods are de ned to be independent. De ne i = E (Ii) and i2 = Var(Ii) for future use. During the infectious period of an i-individual she has contact with a given j individual at the time points of a homogeneous Poisson process with intensity ij =n. If the contacted individual is still susceptible, she becomes infectious and is able to infect other individuals. After the infectious period the individual becomes recovered and immune and is called removed. All Poisson processes are de ned to be independent. The epidemic ceases as soon as there are no infectious individuals left in the population. Note that ij may not necessarily coincide with ji. Included in this `contact parameter' is not only the rate at which two individuals meet (naturally symmetric) but also the probability of disease-transmission which depends on the infectivity of the infective and the susceptibility of the susceptible. Some special cases of the general contact parameters f ij g have received attention in the literature. The case where the contact parameter splits up into a product ij = i j goes under the name proportionate mixing. The parameters f ig and f j g are then called infectivities and susceptibilities respectively. The case where individuals vary only in terms of their susceptibility has also received special attention. In Section 6.4, we study this case more closely and see how the heterogeneity a ects the nal size of the epidemic. The model de ned above covers the case where all individuals are di erent: simply let each individual be a type of her own. The parameters of the model are the contact parameters = f ij g and the random variables I = (I1 : : : Ik ) describing the infectious periods. Following the notation of Chapter 2 we denote the model de ned above by En m( I). The Sellke construction can be generalised to the model above in a simple fashion. Label the i-individuals (i ;(mi ; 1)) (i ;(mi ; 2)) : : : (i ni) with the infectives listed rst, i = 1 : : : k. Let Ii ;(m ;1) Ii ;(m ;2) : : : Ii n be identically distributed random variables, each distributed according to Ii, i = 1 : : : k. Also, let Q1 1 : : : Qi n Q2 1 : : : , Qk n be independent and identically distributed exponential random variables, having intensity 1. These are the individual thresholds. All random variables listed above are de ned to be mutually independent. The initial infective labelled (i r) remains infectious for a time Ii r and is then removed. Denote by Yi(t) the number of infective i-individuals at time t, and let i i i i k Aj (t) = k X ij Z t i=1 n 0 Yi(u) du (6.1) be the total infection pressure exerted on a given j -susceptible up to time t. The susceptible labelled (j u) becomes infected when Aj (t) reaches Qj u and she remains infectious for a time Ij u and is then removed. The epidemic ceases when there are no infectives left in the population. Exact results for the model En m( I) are derived in a way similar to that of the homogeneous case described in Section 2.4. We shall not perform this generalisation but encourage the reader to do so. The formula for the nal size can be expressed in a form similar to the nal size formula for a homogeneous population. Introduce the vector notation k a = Y ai b i=1 bi a a1 a XXX and b=0 = k b1 =0 bk =0 and let a b mean ai bi i = 1 : : : k. The nal size of the epidemic is now speci ed by a vector Z where the component Zi denotes the number of initially susceptible i-individuals who became infected. Let Pu = P(Z = u). Ball (1986) has shown that Pu can be derived from the recursive formula v k X n;u Y Pu v;u u=0 i=1 i k X j =1 !u +m (nj ; vj ) ij =n i i =n v 0vn resembling the recursive formula for a homogeneous population in Section 2.4. 6.2 Large population limits In the present section we discuss large population limits of the model En m( I). We consider the case where the population size n grows but k, the number of di erent types, is kept xed. A further assumption is that the matrix f i ij j g, containing the expected number of contacts between all pairs of individuals, is irreducible. (A k k matrix A = (aij ) is called irreducible if it is impossible to nd a partition D1 D2 of the index set f1 : : : kg such that aij = 0 whenever i 2 D1 and j 2 D2 .) This assumption eliminates the possibility that part of the community encounters a major outbreak whereas another sub-group in the population remains una ected. A proof of the central limit theorem for the model is very much in the same spirit as the proof for a homogeneous population in Chapter 4, but is more technical and is omitted here. A complete proof is given by Ball and Clancy (1993). Below we present and interpret the result. Let i(n) = n(in) =n, (in) = m(in) =n(in) , i = 1 : : : k, and assume that these quantities converge as n tends to in nity. For the limits (written without superscripts) it is assumed that each i is strictly positive whereas i may be 0 or positive. Introduce the notation Zi0(n) = Zi(n) + mi and Zi0(n) = Zi0(n) =ni, i = 1 : : : k. Thus, Zi0(n) is the proportion among the initially susceptible i-individuals who were infected during the course of the epidemic plus the ratio i between the number of initially infectious and initially susceptible i-individuals. As in the homogeneous case, we distinguish between two casesP depending on whether the initial number of infectious individuals is of order n (i.e. i i > 0) or nite. P If ( 1 ::: i i > 0 then the vector k ) is the unique solution ; fZi0(n)g converges in probability to to the equations = e; P , where = j = 1 : : : k: (6.2) The equation above has a natural interpretation. Let j = 1 + j ; j be the left hand side above. Then j is the proportion of initially susceptible j -individuals who escape infection. The factor i i in the exponent on the right hand side is the number of infected i-individuals (divided by n) and i ij is the expected infection force (multiplied by n) exerted on each j -individual. The sum in the exponent is thus the total infection force acting upon j -individuals, so the right hand side is the probability for a j -individual to escape infection. The balance equation above then simply states that the probability of escaping infection equals the proportion of individuals that escape infection. Ball and Clancy (1993) also prove a central limit theorem for the ~ ~ vector Z0(n) with components Zi0(n) = pni Zi0(n) ; i . It is shown that 1+ j j i i i i ij ; ~D Z0(n) !N 0 (S T );1 S ;1 where S and are matrices de ned by Sij = ij = ; p i j i j ij and k p i j i jX i (1 ; i ) ij + ij r=1 2 r r ri rj r : Above ij is the Kronecker -function and i and i2 are the mean and variance of the infectious period for i-individuals. It is worth observing that the variance increases with the variances of the infectious period, and that it coincides with the variance for a homogeneous population if there is only one type (k = 1). For the second case where the number m of initially infectives is kept xed as n ! 1 Ball and Clancy (1993) prove a threshold limit theorem for the multitype model. The theorem states that Z0(n) converges in distribution to the distribution of the total progeny of a multitype branching process having m ancestors, and with life distributions fIig and birth rates f ij j g. The basic reproduction number R0 for the epidemic model is the largest eigenvalue of the matrix of mean o spring f i ij j g. If R0 1 it follows from standard branching process theory (e.g. Jagers, 1975) that the total progeny of the branching process (and hence also for the epidemic) is almost Q surely nite. If R0 > 1 there is a positive probability 1 ; i qim that the branching i process explodes (fqig is the solution to a certain equation). On this part of the ~ sample space the vector Z(n) converges to a normal distribution with mean 0, and the same variance as for the previous case, except for replacing each i by 0 in the expression. 6.3 Household model When modelling the spread of disease in a human population, it is very important to take into account the formation of small social groups such as households, schools and work places, the reason being that the spread of infection is usually greatly facilitated among such groups, which have a high level of mixing. Regarding household models, Ball et al. (1997) is the main reference. For important contributions to the theory and practical applications, see e.g. Becker and Dietz (1995), Becker and Hall (1996), Becker and Starczak (1997) and Islam et al. (1996). A related model is treated in an early paper by Bartoszynski (1972). Work on outbreaks within households, in the presence of community infection, but without considering the dynamics of the latter can be found in e.g. Longini and Koopman (1982), Addy et al. (1991) and Becker (1989). See also Andersson and Britton (1998). De nition of the model Initially there are n susceptible individuals and m infectious individuals. Let the susceptible population be subdivided into n=h households each of size h. (The case of unequal household sizes can be treated similarly, but the notation becomes cumbersome.) The infectious periods of di erent infectives are independent and identically distributed according to a random variable I . Throughout her infectious period a given infective contacts a given individual in the population (within or outside her household) at rate G=n, and, additionally, a given individual in her own household at rate L. If the contacted individual is still susceptible, she becomes infectious and is immediately able to infect other individuals. After the infectious period the individual becomes removed. As usual, all the random variables and Poisson processes involved are assumed to be mutually independent. Note that, by de nition, infectives make both `global' and `local' contacts with their household members. This is for mathematical convenience only and in a large population the global rate G=n can be neglected compared to the local rate L. Basic reproduction number As usual we study a sequence of epidemic processes indexed by the population size n. The household size h is kept xed while the number of households grows with the population size, in contrast with the situation in Section 6.1 where the number of groups was kept xed while the group sizes grew. By using branching approximations in a heuristic way, it is possible to derive the basic reproduction number R0 . Assume that n is large and m = 1. The initial infective contacts on average G individuals, and these individuals will belong to distinct households with high probability. Moreover, each of the individuals infected in this way generates a small sub-epidemic in P her own household, comprising on average ML = h=1 jP1j individuals. Here P1j , j and more generally Pij is de ned as the probability that, given i initial infectives in the household, ultimately j household members become infected, 0 i j h. All these new infectives now make global contacts, introducing the disease in other households, and so on. The branching character of the number of infectious individuals is thus demonstrated, implying that the basic reproduction number R0 is given by R 0 = ML G : Of course, if there are no household formations (h = 1) we arrive at the old expression R0 = G , whereas R0 = h G if the disease is highly infectious within households ( L = 1). Final size equation 0 Assume that n is large and that mn=n ! > 0 as n ! 1. Let Zn be the ultimate number of infected individuals (including the initial infectives), and suppose that this 0 quantity satis es a law of large numbers, Zn=n ! in probability as n ! 1. We will derive by heuristic means the asymptotic equation for . De ne qj to be the asymptotic proportion of households with j individuals ultimately infected, so that h 1 X jq : =+ (6.3) h j=0 j Each individual in a given household of size h will become infected from outside with asymptotic probability 1 ; exp (; G ), and these infections occur independently of each other, thus remembering the de nition of Pij above it follows that j X h ; ; h;i ; ; i qj = e 1;e Pij : (6.4) i=0 i Equations (6.3) and (6.4) together yield an implicit equation for . It is possible to derive rigorously the threshold limit theorem for the nal size together with a normal approximation result in case of a large outbreak. The proof is very similar to the proof outlined in Chapter 4, but is notationally much more inconvenient, hence we refer to Ball et al. (1997) for the details. G G 6.4 Comparing equal and varying susceptibility In the present section we study a speci c form of the multitype epidemic model de ned in Section 6.1. We assume all individuals have the same distribution of the infectious period and that the contact parameters f ij g satisfy ij = j . The interpretation of this is that all infectives are equally infectious { individuals only vary in terms of their susceptibility. In particular we compare the nal size for such an epidemic with that of the nal size of the `corresponding' homogeneous population. In reality, one may not know which model is the correct one, thus motivating such a comparison. As a by-product we use a coupling argument to show a surprising result about exponential variables. This result was proven independently by Proschan and Sethuraman (1976) and Ball (1985). The proof below is inspired by Barbour, Lindvall and Rogers (1991). First we prove a lemma and then the main theorem concerning exponential variables, then we see what drastic consequences this has to the comparison of nal size between di erent populations. Lemma 6.1 Let Exp( 1), 2 Exp( 2 ) and 1 and 2 Exp( ), where = ( 1 + 2)=2. Assume further that all these random variables are mutually independent. 0000 0 0D 0 0D Then there exists a coupling ( 1 2 1 2 ) such that ( 1 2 )=( 1 2), ( 1 2 )=( 1 2) and for which the order statistics satisfy 1 0 (1) 0 (1) In particular it follows that (i) 0 (2) D (i) 0 (2) almost surely. i = 1 2. Proof. First we mention the standard way of constructing a continuous random variable X with distribution function F from a uniform random variable U . This is simply done by evaluating the inverse of the distribution function at the point U : X = F ;1(U ). It is easy to show that the random variable so obtained has distribution function F . This type of construction will be used repeatedly, without explicitly performing the transformation. Before constructing the random variables we look at the law of the order statistics. It follows immediately from the de nition of the random variables that P( (1) > t) = P( 1 > t 2 > t) = e; 1 te; 2 t = e; te; t = P( 1 > t 2 > t) = P( (1) > t) so the rst order statistics are actually identically distributed. The lack of memory property for exponential variables together with the fact that 1 and 2 have the same intensity parameter implies that the conditional second order statistics satisfy P( (2) > t + uj (1) = u) = e; t . In the corresponding probability for (2) we have to condition on whichever variable is smaller. Introduce = 1=( 1 + 2 ), so 1 = 2 and 2 = (1 ; )2 . Note that = P( 1 < 2). We then have P( (2) > t + uj (1) = u) = e;(1; )2 t + (1 ; )e; 2 t P( (2) > t + uj (1) = u) = e; t : Let f ( ) denote the right hand side of the rst equation. Then f (1=2) equals the right hand side of the second equation. It is easy to show that f is a convex function which attains it minimum at = 1=2. Hence it follows that P ( (2) > t + uj (1) = u) P ( (2) > t + uj (1) = u). Now consider the coupling. Let U1 : : : U4 be i:i:d: uniform random variables. 0 0 We construct the rst order statistics (1) and (1) by taking the inverse functions 0 0 of U1 (the fact that the distributions are the same implies that (1) (1) ). We 0 0 0 construct (2) and (2) using U2 by letting (2) be the inverse function of the conditional 0 given 0 , evaluated at U2 and similarly for 0 . It follows that distribution of (2) 0 0 . Finally, we use (1)3 and U4 to determine which one of 0 (2) 0 , and 0 and U 1 and 2 1 (2) (2) 0 respectively, is the smaller. Since 0 and 0 are identically distributed, we simply 2 1 2 0 0 let 1 = (1) with probability 0.5. More formally 0 1 0 2 = = 0 (1) 1(U3 0:5) + 0 (1) 1(U3 >0:5) + 0 to equal (1) , 0 0 We select which of 1 and 2 is become correct. This is done as follows 0 1 0 2 = = 0 (1) 1(U4 P( 1 2 j (1) (2) )) + 0 (1) 1(U4 >P( 1 2 j (1) (2) )) + 0 (2) 1(U3 >0:5) 0 (2) 1(U3 0:5) : so that the conditional probabilities 0 (2) 1(U4 >P( 1 2 j (1) (2) )) 0 (2) 1(U4 P( 1 2 j (1) (2) )) : It is straightforward to check that this coupling satis es the conclusions of the theorem. The distributional statement of the theorem is easy to show once the coupling has been constructed: 0 (i) > t) = P ( (i) > t) inequality follows trivially since (0i) P( The P( 0 (i) > t) = P ( (i) > t): 0 (i) almost surely. The lemma above is used repeatedly in the following theorem. Theorem 6.2 Let X1 : : : Xn and Y1 : : : Yn be independent exponentially distributed random variables: Xi Exp( i ), and Yi Exp( ), i = 1 : : : n, where = 0 ( 1 + : : : + n)=n. Then there exists a coupling (X10 : : : Xn Y10 : : : Yn0 ) such that 0D (X10 : : : Xn)=(X1 : : : Xn) and for which In particular, and D (Y10 : : : Yn0 )=(Y1 : : : Yn) Y(0j) X(0j) j = 1 : : : n almost surely. D Y(j) X(j) j = 1 : : : n: Proof. Start with the independent sequence X1 : : : Xn. Select the rst two variables X1 and X2 and use the lemma to construct 1 and 2 such that a) they are independent and both exponentially distributed with the same parameter ( 1 + 2)=2, D D b) min( 1 2) min(X1 X2), and c) max( 1 2) max(X1 X2). We have thus constructed the sequence ( 1 2 X3 : : : Xn) of independent exponential variables with parameters (( 1 + 2 )=2 ( 1 + 2 )=2 3 : : : n) with smaller order statistics than the original sequence. The same procedure is repeated, choosing the pairs having largest and smallest -value, with the e ect that the order statistics are reduced and the parameters are brought closer and closer to . In the limit the order statistic converges to Z1 : : : Zn say, which has the same law as the order statistics of n independent exponential random variables with common parameter . Reordering these variables at random gives a sample distributed as Y1 : : : Yn, satisfying the statement of the theorem. The proof is complete. The theorem above is general and not related to epidemic models. The relevance of it for comparing the distributions of the nal size for a homogeneous population with that of a population with varying susceptibility is made clear in the following corollary originally derived by Ball (1985). Below we let (X (t) Y (t)) denote the number of susceptibles and infectives respectively at t for the standard SIR epidemic, ~~ and (X (t) Y (t)) the corresponding numbers for the multitype epidemic. Corollary 6.3 Consider the (homogeneous) standard SIR epidemic En m ( I ) and the multitype epidemic En m( I) in which di erent types di er only in terms of susceptibility, and which has the same distribution of infectious periods and the same initial numbers of infectives and susceptibles as the homogeneous epidemic (i.e. all P P infectious periods have the same distribution as I and i mi =P , i ni = n and m ij = j ). Assume further that and f j g are related by = j nj j =n. Then it ~~ is possible to construct versions (X 0 (t) Y 0 (t)) and (X 0(t) Y 0 (t)) of the two epidemics such that ~ X 0(t) X 0(t) for all t, almost surely. ~ As a consequence the nal sizes satisfy Z 0 Z 0 almost surely. In particular it follows that D D ~ ~ X (t) X (t) and Z Z for any n m 1 and t 0. Proof. The proof follows immediately if we modify the Sellke construction slightly. Because individuals vary only in terms of susceptibility (i.e. ij = j ) the infection pressure exerted on j -individuals, Aj (t) de ned in (6.1), satis es Aj (t) = ( j =n) XZ t i 0 ~ Yi(u)du = ( j = )A(t) PR ~ where A(t) = ( =n) i 0t Yi(u)du. The independent and exponentially distributed individual thresholds fQjk g were all de ned to have intensity parameter equal to 1. If we de ne instead the thresholds of j -individuals (Qj 1 : : : Qj n ) to have parameter j = , j = 1 : : : k, then we may assume all individuals to have the same infection ~ pressure A(t). For this modi ed Sellke construction, it follows from Theorem 6.2 that we may construct individual thresholds for the two epidemics such that the order statistics for the homogeneous epidemic is smaller than those of the model with varying susceptibility. Since we may use the same realisations of i.i.d. infectious periods for both epidemics it follows that the pressure A(t) for the homogeneous ~ epidemic exceeds the pressure A(t) in the heterogeneous model for every t using a similar argument as that used in the monotonicity result in Section 3.3. This implies that even more individuals in the homogeneous model will become infected, and so forth. j The corollary above is not only interesting from a theoretical point of view. In real-life it is of course hard to know the susceptibilities of di erent individuals, even of di erent subgroups in the population. If all that is known is the average susceptibility the corollary then states that a homogeneous population is the worst case. That is, if we calculate the probability assuming a homogeneous population, then any heterogeneity in susceptibility can only make things better in that probability mass is shifted towards smaller outbreaks. Andersson and Britton (1998) have argued that the calibration by comparing different models having the same arithmetic mean susceptibility is not necessarily a fair comparison. In fact, by the Sellke construction it seems fairer to have the same population-average of expected individual thresholds. The thresholds are exponentially distributed, so a threshold with parameter i has expected value ;1. With i this argument a population with parameters ( 1; : : : Pn) should be compared with a homogeneous population with parameter H = n;1 i ;1 ;1 . This means that an i alternative calibration is to assume that the harmonic means should coincide. Andersson and Britton (1998) perform this comparison and conclude that there is no corresponding strong result holding for all n and m and all t. There is one domination result stating that if m is xed, then the probability of a major outbreak is minimised for the homogeneous community. Their second result holds only for the nal size, assuming large outbreaks in large communities. For this case they conclude that a homogeneous population gives a larger outbreak if the disease is highly infectious, whereas some heterogeneity in the population will cause a larger outbreak if the disease is less infectious. More precisely, they show that if H 2(1 ; e;2) 2:31 then a homogeneous model will give a larger outbreak than any heterogeneous population having the same harmonic mean susceptibility, whereas if H < 2(1 ; e;2) then some heterogeneous population with the same harmonic mean susceptibility will produce a larger outbreak in case of a major outbreak. Exercises 6.1. Consider the standard SIR multitype epidemic model with two di erent types and assume that the expected length of the infectious period is identical for the two types 1 = 2 (=1 say). Compute the basic reproduction number R0 in case a) ij = i j (proportionate mixing). Extend the result to the case with k di erent types. b) ii = + and ij = , i 6= j (i.e. the mixing rate is higher within types than between). 6.2. Calculate numerically the nal size vector for the two examples of Exercise 6.1 with 1 = 2 = 0:5 (in both cases), 1 = 1 = 1, 2 = 2 = 2, = 1 and = 1. 6.3. In Section 6.4 the nal size domination of a homogeneous population was es- tablished. To see if the di erence is severe we here study the large population limits of the nal size (Equations (4.1) and (6.2)) in some examples. Assume the initial proportion infectives is negligible ( = i = 0) and without loss of generality that = 1. a) Compare the nal size of a homogeneous population having = 1:5 with the overall nal size in population in which half of the community has susceptibility 1 = 1 and the other half 2 = 2 (implying the same arithmetic mean). b) Do the same thing as in a) replacing 1.5 by 3, 1 by 0.5 and 2 by 5.5. c) Now use the calibration suggested by Andersson and Britton (1998) to compare a homogeneous population having = 1:5 with a population with half of the individuals with 1 = 1 and the other half having 2 = 3 (note that the harmonic mean is unchanged). d) Repeat c) with all parameters doubled. 7 Epidemics and graphs Random graphs provide us with a useful tool in understanding the structure of stochastic epidemic models. By representing the individuals in a population by vertices and transmission links by arrows between these vertices, we obtain a graph that contains information on many important characteristics, such as the nal epidemic size, the basic reproduction number and the probability of a large outbreak. The connection between SIR epidemics and random graphs was observed by Ludwig (1974), and has then been more fully exploited by von Bahr and Martin-Lof (1980), Ball and Barbour (1990) and Barbour and Mollison (1990). In the rst two sections of this chapter the random graph interpretation of the standard SIR epidemic model will be given we also show that the special case of a constant infectious period yields a particularly nice class of random graphs. Random graphs can also serve as models of social networks. We can give a simple reason for introducing the network concept in epidemic modelling. Many of the classical results for the standard SIR epidemic model require the population size n to be large. But, according to the modelling assumptions, contacts between two given individuals in a large population occur at a very low rate in principle, this implies that the possibility of repeated contacts is not taken into account. This observation indicates that the assumption of homogeneous mixing is not very realistic when describing epidemic spread in large populations. If we wish to allow repeated contacts between certain pairs of individuals, a possible solution is to pick a graph describing the relations between individuals, and then let the disease spread along the social network so obtained. In that way each individual is assigned to a small neighbourhood of other individuals, and can then contact each of her neighbours at a `normal' rate. This subject is now receiving increasing attention, see e.g. Altmann (1995, 1998), Rand (1997), Diekmann et al. (1998) and Andersson (1998, 2000). It is far from obvious how a suitable form of the network is to be chosen. The graph should be complicated enough to catch something of the sometimes extremely irregular contact pattern in a population of living organisms, but at the same time simple enough to lend itself to mathematical analysis. Here (Section 7.3) we will just scratch the surface by explaining how to run the Markovian version of the standard SIR epidemic on a very simple network (whereby the Markov property is destroyed!). We have so far presented various ways of modelling social relations in a population, ignoring completely the possibility of geographical spread of the disease. In Section 7.4 we make up for lost ground by presenting some results for a standard SIR epidemic model on the two-dimensional lattice, where the spatial element determines the progress of the epidemic completely. In order to approximate reality, social space and geographical space should of course be considered simultaneously, but to nd natural models for both remains an open problem. 7.1 Random graph interpretation Consider again the standard SIR epidemic En m( I ), endowing the individuals with random variables I;(m;1) I;(m;2) : : : In. Now represent the individuals by vertices of a graph, giving the vertices labels ;(m ; 1) ;(m ; 2) : : : n. If the ith individual ever becomes infected, we know that she will contact a given individual j in the population with probability pi = 1 ; exp(; Ii =n). Hence, for each given ordered pair (i j ) (i 6= j ) of vertices of the graph, let us draw an arrow from i to j with probability pi, the presence of such an arrow having the following interpretation: \If i ever becomes infected then i will contact j during her infectious period". Strictly speaking, these arrows are not drawn independently, but they are conditionally independent given the I -variables. There remains only to mark the vertices ;(m ; 1) ;(m ; 2) : : : 0 (representing the initial infectives) and just follow the arrows. The size of the subgraph traced in this manner is exactly the nal epidemic size. The random graph obtained above can be described as a digraph (directed graph) with prescribed distribution of the out-degrees (the number of out-going arrows from a given vertex). Evidently we lose information on events referring to time when making this construction, since only the epidemic chain is recorded the point is that characteristics related to the nal epidemic size are found to have nice graph-theoretic counterparts. As a simple application we use random graphs to indicate that even in a situation where the infectivity pro le is extremely complicated, many results may be derived with the same ease as for the standard SIR epidemic. Consider the sample space consisting of all nonnegative functions (t), t 0, with nite integral, and put some probability measure P on this space. Now let ;(m;1) (t) ;(m;2) (t) : : : n(t) be an independent sample, i(t) giving the infectiousness of individual i at time t after infection. If the ith individual ever becomes infected, she will contact a given individual j during her infectious period with probability 1 Z 1 (t) dt : pi = 1 ; exp ; n i 0 We can now draw a directed graph and trace the epidemic ow exactly as before. Using this representation, quantities such as the basic reproduction number and the distribution of the nal epidemic size may easily be derived. Since latency periods and complicated time dependent infection rates can be modelled using this device, it is easy to form the impression that we have found a way to model the spread of just about any infectious disease. Note, however, that the population is still assumed to be homogeneously mixing, in the sense that all susceptibles are equally likely to contract the disease at all times this decreases drastically the realism of the model. Figure 7.1 illustrates such a random graph with m = 1 initially infectious (marked with a large black circle) and n = 11 initially susceptible individuals. There are e u Y H HH H e ; ; ; ; u e HHz ;@ ; @ R @ e u A A A A A U ; ;; ;; e ; e J J J Je u Figure 7.1: Directed graph representation of a realization of Em n( I ) with m = 1 and n = 11. The initial infective is the larger black circle, while all other ultimately infected are smaller black circles. Small white circles correspond to uninfected individuals. Z = 4 additionally infected in the illustration (infected individuals have smaller black circles). Note that individuals to the right do not become infected even though there are potential contacts. The reason is that the disease never reaches this subgroup. The double arrow to the far right has the interpretation that either individual could have infected the other, had she been infected. 7.2 Constant infectious period If the infectious period in the standard SIR epidemic model is constant, it turns out that the random graph formulation involves the classical Bernoulli random graphs, which we describe next. The theory of random graphs was introduced by Erdos and Renyi (1959) and has been extensively studied ever since. In particular, the class of Bernoulli random graphs has been very thoroughly explored, see e.g. Bollobas (1985) and Barbour et al. (1992). This graph model, often referred to simply as the G (N p) model, is de ned as follows. We are given a set of N labelled vertices. With probability p, we connect a given pair of distinct vertices i and j by drawing an (undirected) edge between them. These connections are made independently of each other. It is clear that the degree Di of the ith vertex, i.e. the number of vertices adjacent to this vertex, is binomially distributed with parameters N ; 1 and p, so that the average degree is given by (N ; 1)p. In order to keep the size of the neighbourhood bounded as N grows large, we have to put p = =N , for some > 0. This implies that Di will be approximately Poisson distributed with parameter if N is large, according to the classical Poisson approximation theorem. Two vertices i and j are said to belong to the same component if and only if there exists a path of edges between i and j . It is thus clear that any graph consists of a number of disjoint connected components. Properties of these components have been extensively studied for the Bernoulli graph model. In particular, the following result is well-known in random graph theory. Theorem 7.1 Consider the G (N p) random graph model. Assume p = pN = =N , N ! 1. If 1 then a vertex chosen at random will belong to a component of size O(1). On the other hand, if > 1 then the relative size of the largest component converges in probability to some constant C strictly between 0 and 1, as N ! 1. Also, a randomly chosen vertex will belong to this giant component with probability C and it will belong to a component of size O(1) with probability 1 ; C . Turning again to the graph interpretation of the standard SIR epidemic, let us note that if the infectious period is constant, I , then the arrows are drawn independently of each other, and the probability of drawing an arrow from i to j is the same for all ordered pairs (i j ), i 6= j . Moreover, since only one (if any) of the two arrows is actually used for disease transmission, it is enough to draw one undirected edge between i and j , the presence of such an edge having the following interpretation: \If i ever becomes infected then i will contact j , and vice versa". Thus, setting N = n + m and p = 1 ; exp(; =n), we have arrived at the G (N p) random graph model! Random graph theory can now be invoked to derive results for the corresponding epidemic process. For instance, Theorem 7.1 above immediately implies that, in the case where the basic reproduction number exceeds 1, the asymptotic probability of a large outbreak (the probability of picking a vertex belonging to the giant component) is equal to the relative size of a large outbreak (the size of the giant component). Such a result can of course also be derived by calculating the explosion probability of the approximating branching process and comparing it with the solution to the nal size equation, but the graph approach gives us much more insight to the phenomenon. 7.3 Epidemics and social networks We have indicated above how to use directed (undirected) random graphs to describe the epidemic ow, the arrows (edges) representing possible channels for disease transmission. We now proceed to show how to run an epidemic process on a random graph, with the edges of the graph representing relations between individuals. Individuals related in this way will be called neighbours. First the epidemic process on a xed arbitrary graph is de ned, and then some large population results are derived in the case where the underlying network is modelled as a Bernoulli random graph. An epidemic process on a xed graph Consider a closed population consisting of N = n + m individuals. Represent the neighbourhood structure in the population with a labelled undirected graph G , so that the ith and j th vertices of the graph are connected by an edge if and only if individuals i and j are neighbours. The graph will often be the result of some random experiment. We assume that the structure is xed during the course of the epidemic. Let G be the adjacency matrix of the graph G , so that Gij = 1 if i and j are connected and Gij = 0 otherwise. It follows that G is a symmetric binary N N matrix with zeros in the diagonal. Let us next de ne the dynamics of the epidemic process. We pick m initially infectious individuals at random from the population. An infectious individual remains so for an arbitrarily distributed time period I . During this time period she makes `close contacts' with each of her neighbours according to the points of a Poisson process with intensity . If the individual so contacted is still susceptible, then she will immediately become infectious. After the infectious period, the infectious individual recovers and is then immune to further infections. All infectious periods and Poisson processes are assumed to be independent of each other. Denote the types `susceptible' and `infectious' by the letters X and Y , respectively. Generally speaking we set Ai = 1 if the ith individual is of type A, and Ai = 0 otherwise. Then de ne the number of individuals, connected pairs and connected triples with given type con gurations in the following way: A]n = AB ]n = ABC ]n = X n+m i=1 n+m X Ai AiGij Bj i j =1 n+m X i j k=1 i6=k AiGij Bj Gjk Ck : For instance, X Y ]n(t) is the number of neighbours i, j where i is susceptible and j is infectious at time t. If the total number of neighbours of a given individual is kept bounded, then all the quantities above are O(n). Note that AB ]n = B A]n , and that each pair in AA]n is counted twice. Bernoulli networks As an illustration, we take the Bernoulli graph model as our underlying network. Set N = n + m, the total number of individuals, and put p = =n for some > 0. For an outcome G belonging to G (N p) we run an epidemic on G , where for simplicity it is assumed that the infectious period is exponentially distributed with intensity . The basic reproduction number of this model is easily derived. Introduce an infectious individual in a susceptible population consisting of n individuals, where n is large. This individual has on the average neighbours, each of whom will become infected with probability =( + ), hence R0 = + : We now describe the time dynamics of the model. Assume that the proportion m=n of initially infectious individuals tends to a nontrivial limit as n tends to in nity. Let us assume that A]n =n, AB ]n=n and ABC ]n =n all tend to deterministic limits as n ! 1, and denote these limiting processes by a, ab] and abc], respectively. In Rand (1997) the following system of equations is derived: dx = ; xy] dt dy = xy] ; y dt d xx] = ;2 xxy] dt d xy] = ( xxy] ; yxy] ; xy]) ; xy] dt d yy] = 2 ( yxy] + xy]) ; 2 yy]: dt Let us explain the fourth line the other lines are then obtained by similar reasoning. If the central individual j in a connected XXY -triple (i j k) is infected by the individual k, we gain an XY -pair (i j ). On the other hand, we may lose an XY -pair (j k) in three ways: the individual j in an Y XY -triple (i j k) is infected by the individual i k infects j directly k becomes removed. The fourth line now follows readily. The system is not very useful as it stands, since no description of the time dynamics of the variables abc] is provided. Fortunately, the equations can be closed at the level of pairs by the following device. Consider a connected triple (i j k). Then, P (Ai Bj Ck = 1) = P (Ai = 1 j Bj Ck = 1) P (Bj Ck = 1) : If n is large, then the in uence on i of the second neighbour k is negligible given the state of the rst neighbour j , hence asymptotically, P (Ai = 1 j Bj Ck = 1) = P (Ai = 1 j Bj = 1) (i 1) = PPABBj==1) : ( j This translates to the formula abc] = ab]b bc] for all a b c: Finally, we insert this relation into the system above to obtain a closed system in the ve variables x, y, xx], xy] and yy]. The initial conditions are given by x(0) = 1, y(0) = , xx](0) = , xy](0) = and yy](0) = 2. This system can be simpli ed considerably. The equation for yy] is super uous, since yy] does not appear in the other equations. Also the equation for xx] can be crossed out, since by writing down the di erential equation for x2 and comparing it with the di erential equation for xx] it follows that xx] = x2 at all times. Finally, we de ne y = xy]=x, the number of infectious neighbours of a typical susceptible ^ individual. It can be veri ed that dy = 1 x d xy] ; xy] dx = ( x ; ; ) y ^ ^ dt x2 dt dt resulting in the di erential system dx = ; xy ^ dt dy = xy ; y ^ dt dy = ( x ; ; ) y ^ ^ (7.1) dt with initial condition x(0) = 1, y(0) = and y(0) = . For a rigorous derivation of ^ these results (and more), see Altmann (1998). The equation for the nal size proportion , where the proportion of initial infectives is included as in Chapter 4, is obtained as follows. Divide the third line of (7.1) by the rst one and integrate to get + log(x) = ; (1 + ; x) + y: ^ At the end of the epidemic y = y = 0, implying that ^ 1+ ; = exp ; + : Note that this transcendental equation is of the same form as the nal size equation (4.1). The basic reproduction number R0 appears in the formula, as it should. It is clear that there should be a qualitative di erence in behaviour between the standard SIR epidemic in a uniformly mixing population (small per capita infection rate, but a large number of individuals can be contacted) and the same type of epidemic on a social network (large per capita infection rate, but only a small number of individuals can be contacted). Indeed, an individual with an extremely long infectious period might infect a huge number of individuals under the assumption of homogeneous mixing. In a social network, however, such an individual will at most infect all of her neighbours, which is probably a small number, and will otherwise have no impact on the spread of the disease. Let us explore these thoughts further by indicating how to retrieve the classical Kermack-McKendrick model as a limit of (7.1) as the number of neighbours ! 1 and the infection rate ! 0 in such a way that ! b > 0. Using (7.1) we note that ^ ^ d y ;y =; y ;y +R dt where R = ; y= is small given the conditions on the parameters and . Gronwall's ^ inequality (Lemma 5.1) applied to the function jy= ; yj now shows that y= and y ^ ^ coincide in the limit, hence we need not bother with the variable y. We end up with ^ the system dx = ;bxy dt dy = bxy ; y dt which we recognize as the Kermack-McKendrick model. Also, R0 = which is perfectly in line with the above ndings. 7.4 The two-dimensional lattice =( + ) ! b= Consider the two-dimensional lattice Z2 . Our graph G is obtained by drawing an edge between two sites i j 2 Z2 if and only if ji ; j j = 1 (nearest neighbour interaction). Then a Markovian epidemic process is run on G as described in Section 7.3 (without loss of generality, assume = 1). Note however that this time we are not considering a sequence of processes on larger and larger nite graphs but rather a single process on an in nite graph. This and related models have been studied in e.g. Mollison (1977), Kuulasmaa (1982), Kuulasmaa and Zachary (1984) and Cox and Durrett (1988). For an excellent introduction to the general theory of interacting particle systems and percolation processes, see Durrett (1995). Also, the book by Liggett (1985) provides a thorough treatment of the subject. Following Cox and Durrett (1988) we will discuss the critical infection rate, a quantity that corresponds to the basic reproduction number, and the asymptotic shape of the epidemic given nonextinction. Critical infection rate In order to discuss the critical infection rate, we again draw a directed graph to describe the disease spread without explicitly referring to the actual time dynamics. We know that a given site i will be infectious for an exponential holding time with mean 1 (if she ever becomes infected). If i contacts a given neighbour site j during the infectious period we draw an arrow from i to j , and we say in this case that the oriented bond (i j ) is open. Otherwise (i j ) is declared to be closed. Now let C be the set of sites that can be reached from 0 by a path of open bonds. It follows that C is exactly the set of sites that will ever become infected if we start by making the origin infectious. Now the critical infection rate may be de ned as c = inf f : P(jCj = 1) > 0g: (7.2) By de nition, a major outbreak has positive probability if and only if R0 > 1. So if 0 < c < 1 then our basic reproduction number may be written simply as R0 = = c, but this notation has not been acknowledged in the literature. It is very di cult to nd good bounds for the critical rate c, or equivalently the critical probability pc = c=( c + 1). By using the so called zero-function and the beautiful comparison theorem of Kuulasmaa (1982) some progress can be made, as we now explain. The zero-function is de ned as follows. For each subset A of fi 2 Z2 : jij = 1g set (A) = P(all bonds (0 i) i 2 A are closed): Easy calculation yields (A) = 1;p 1 ; p + pjAj where p = +1 : Fix p, and consider two extreme percolation processes, with zero-functions 0 and 1 and with critical probabilities p0 and p1 , respectively. The rst one is obtained by c c opening the bonds with probability p independently of each other. Obviously, 0 (A) = (1 ; p)jAj: The other extreme is obtained by, for each site i, opening all the bonds (i i + j ), jjj = 1, with probability p and closing all of them with probability 1 ; p. We have 1 (A) = 1 ;p jAj 1: It is easily checked that 0 (A) (A) 1(A) for all A, and the comparison theorem 1 tells us that p0 pc p1. The number p0 is exactly 2 (see Kesten, 1982), while the c c c other probability p1 is unknown. Numerical studies in the physics literature give c p1 0:5927. Translating to rates yields 1 1:4552. Simulation studies by c c Kuulasmaa (1982) provide us with the narrower interval 1:12 c 1:25: Asymptotic shape Let (t) be the set of removed cases at time t. If the origin is infected initially then, given non-extinction, we expect the epidemic to grow in all directions of the plane at a linear rate. The shape theorem of Cox and Durrett (1988) states the following: If > c then, given non-extinction, there is a convex set D R2 such that for any > 0 we have t(1 ; )D \ C (t) t(1 + )D (7.3) for t su ciently large. Moreover, the set of infectious sites is situated close to the boundary of tD. The set (t) necessarily contains many \holes", i.e. areas of susceptible sites that are surrounded by removed sites. Not much is known about the actual shape of the set D. Exercises 7.1. Consider the graph interpretation of the standard SIR epidemic given in Section 7.1. Describe the distribution of the out-degrees and the distribution of the in-degrees. What happens with these distributions as n ! 1, m = 1? 7.2. Check the details in the derivation of Eq. (7.1). Also, show that for some choices of parameter values , , the proportion of infectives y is increasing initially even though R0 is below 1. 7.3. An epidemic process on a regular network. A graph on N = n + m vertices is called k-regular if all vertices have degree k. For N large, pick a k-regular graph at random and run the epidemic model of Section 7.3 on this graph. Assuming m = 1 (so = 0 asymptotically), nd conditions on (k ) that render possible a large outbreak. Also, derive heuristically the following equation for the nal size proportion : 1; = + + s + k where s = + + s + k;1 : 8 Models for endemic diseases All the epidemic models encountered so far have assumed a closed population, i.e. births, deaths, immigration and emigration of individuals are not considered. However, when modelling the spread of a disease with a very long infectious period or a disease in a very large population, dynamic changes in the population itself cannot be ignored. Indeed, in a large community the susceptible population might be augmented fast enough for the epidemic to be maintained for a long time without introducing new infectious individuals into the community our common childhood diseases are typical examples. Such a disease is called endemic. Already Bartlett (1956) proposed a stochastic epidemic model for endemic diseases. This model was then modi ed slightly into what is known as the SIR epidemic process with demography (van Herwaarden and Grasman, 1995, and Nasell, 1999). In Section 8.1 this model is presented, and some large population results for it are given. We also discuss brie y the time to extinction of the epidemic, and the interesting notion of critical community size, loosely de ned as the population size needed for the epidemic to persist over a given time horizon with a given probability. Another way of achieving endemicity is to retain the assumption of a closed population, but to suppose instead that infected individuals lose their immunity after some time. It should be noted, however, that this is an arti cial way to keep the epidemic going. In reality, we would not expect to observe a xed population where individuals contract the same disease over and over again. The special case, where individuals become susceptible immediately after the infectious period, turns out to be particularly easy to analyse, much easier than the SIR epidemic with demography. This model, called the SIS epidemic model, is discussed in Section 8.2. In particular, the time to extinction is investigated in some detail. Throughout this chapter the infectious period is assumed to be exponentially distributed, thus leading to Markovian epidemic processes which can be well understood using the techniques of Chapter 5. 8.1 The SIR model with demography Let us describe the stochastic SIR model with demography, starting with the population dynamics. Individuals are born into the population at a constant rate n and each of them has an exponentially distributed lifetime with intensity , i.e. the average lifetime is given by 1= . The population size will thus uctuate around the quantity n. The reason for choosing size-independent birth rates is to avoid population extinction or explosion. Assume now that the population is homogeneous and homogeneously mixing. Initially there are n susceptible and m infectious individuals in the population. A given infective stays infectious for a time period that is exponentially distributed with intensity (unless she dies for other causes than the disease before the end of that period). During that time she contacts a given individ- ual at rate =n. If the contacted individual is susceptible, she immediately becomes infectious and proceeds to infect other individuals. All random variables and Poisson processes involved are assumed to be mutually independent. Denote by X (t) the number of susceptibles at time t, and let Y (t) denote the number of infectives at time t. Then (X Y ) = f(X (t) Y (t)) t 0g is a bivariate continuous time Markov process with the following transition rates: from to (i j ) (i + 1 j ) (i ; 1 j ) (i ; 1 j + 1) (i j ; 1) at rate n i ij=n ( + )j: The four transitions above correspond to births of susceptibles, deaths of susceptibles, infections of susceptibles, and recovery or deaths of infectives, respectively. The possibility of births and deaths of individuals is the only feature that distinguishes this model from the standard Markovian SIR epidemic model. As we will see shortly, this new component in the model will have a great impact on the qualitative behaviour of the epidemic. Law of large numbers Consider a sequence (Xn Yn) of SIR models with demography, all of the processes having the same parameters , and . Assuming that the proportion mn =n tends to > 0 as n ! 1, we rst derive a law of large numbers for the sequence (Xn Yn) = (Xn=n Yn=n). We wish to apply Theorem 5.2. The rates ` of Section 5.2 are as follows: (1 0) (x y ) = (;1 1) (x y ) = xy (;1 0) (x (0 ;1) (x y) = x y ) = ( + )y leading to the drift function F (x y) = (; xy + (1 ; x) xy ; ( + )y) : Exactly as in Section 5.5, we check that the conditions of Theorem 5.2 are ful lled. Hence (Xn(t) Yn(t)) tends to (x(t) y(t)) in probability uniformly on compact time intervals as n ! 1, where (x(t) y(t)) is the solution to the system of di erential equations dx = ; xy + (1 ; x) dt dy = xy ; ( + )y dt with initial condition (x(0) y(0)) = (1 ). This system cannot be solved explicitly, but we can understand its behaviour for large t by nding the stationary points, i.e. the points where the time derivatives are zero. Let us introduce two auxiliary parameters. The basic reproduction number is given by R0 = =( + ), since the true infectious period is exponentially distributed with intensity + , taking into account the possibility of death before recovery. Also, let = ( + )= be the ratio of average lifetime to average duration of infection. There are two stationary points, namely the disease-free state (1 0) together with the point (^ y) = + x^ 1; + = 1 R0 ; 1 : + R0 R0 The rst of the stationary points is stable for R0 < 1 and unstable for R0 > 1, while the second one is stable for R0 > 1 (and is otherwise negative and therefore uninteresting). In other words, if R0 < 1 the infection is predicted to die out fairly quickly. On the other hand, if R0 > 1 then it will rise towards a positive infection level, called the endemic level. Central limit theorem Assume that R0 > 1 and suppose for simplicity that the process is started close to the endemic level (nx ny). Since the process is positively recurrent and all states (i j ) ^^ with j 1 communicate, the process will become absorbed into the set of disease-free states f(i 0) : i 0g in nite time. Prior to absorption we expect to observe small uctuations around the endemic level. To examine the nature of these uctuations, de ne p; ~ ~ Xn(t) Yn(t) = n Xn(t) ; x(t) Yn(t) ; y(t) t 0: We wish to apply the central limit theorem of Section 5.4. Using the notation of that section, we have ^ ;^ @F (^ y) = ; yy; x ; x; = R;R01 ; x^ ^ ^ 0 0; and ^ ^^ ; ^^ 20 R ; 1) G(^ y) = (1 + x)x+ xy xy + ( xy )y = R ;(RR; 1) ;(R 0 ; 1) : x^ ; ^y ^ ^^ +^ 2( 0 0 0 Here we have expressed all quantities in terms of the new parameters R0 and , using that x = 1=R0, y = (R0 ; 1)=( R0), = R0 and + = . It can be shown that ^ ^ the matrix function (t s) of Theorem 5.3 splits into a product ~ (t) ~ ;1 (s), hence the covariance matrix (t), t 0, satis es the di erential equation d = @F (^ y) + (@F (^ y))T + G(^ y) x^ x^ x^ (8.1) dt by straightforward calculation using the variance expression of the theorem. We ~~ conclude that (Xn Yn) converges weakly on compact time intervals to a Gaussian ~~ process (X Y ) with mean zero and covariance matrix . In particular, an explicit expression for the covariance matrix for large t is easily derived as the stationary solution, ^ , to (8.1). By easy calculation, ^ = 12 R0 + R0 ;R0 ;R0 2 R0 ; 1 + R0 = : Time to extinction We still assume that R0 > 1. As already noted above, the time to extinction Tn = inf ft 0 : Yn(t) = 0g from the endemic level is a.s. nite, for any xed n. It is a classical (and very di cult) problem, going back to Bartlett (1956), to obtain estimates of this random variable. Not even the expected time to extinction is easily found. An approach that at rst sight seems natural is to let the population size tend to in nity and regard disease extinction as the result of a large deviation from a high endemic level. Such an analysis is performed in van Herwaarden and Grasman (1995), who derive a complicated asymptotic formula for E (Tn). Nasell (1999) takes a quite di erent approach when deriving heuristically an approximate expression for E (Tn ). He notes that the coe cient of variation of the number of infectious individuals in endemicity is given by pn ^ 22 pnpR0 ; 1 + R02= ny ^ = R0 R0 n(R0 ; 1) pnpR0 ; 1 2 where the last approximation is due to the fact that R0 . Indeed, if we assume that the average lifetime is 80 years and consider a disease with an infectious period of two weeks, say, then is about 2000. Thus, even if the population size is several millions, the coe cient of variation above is still quite large. This observation suggests that for a real-life disease in a homogeneously mixing community, extinction is likely to be caused by a normal uctuation from a not so high endemic level, rather than by a large deviation from a high level. Simulation results also show that, for realistic parameter values, the Nasell (1999) formula gives a much better approximation to the observed time to extinction than does the formula derived by van Herwaarden and Grasman (1995). Finally we mention the notion of critical community size. Needless to say, the distribution of Tn depends on the parameters R0 , and as well as the population size n. The critical community size nc = nc(t p) with time horizon t and extinction probability p is de ned as the solution n to the equation P(Tn > t) = 1 ; p: If p is small, this means that for community sizes larger than nc the infection is likely to persist more than t time units. Nasell (1999) obtains estimates of the critical community size for some sets of parameter values. Below, two simulations of the SIR model with demography are shown in Figure 8.1. In both simulations n = 100 000, R0 = 10 and = 3750 (corresponding to an infectious period of 1 week and average lifetime of 75 years) implying that the equilibrium is at (nx ny) = (10000 24). The y-axis corresponds to the number of ^^ infectives and the x-axis to the number of susceptibles. In the top gure the disease becomes extinct rather quickly, whereas it seems to become endemic in the lower gure. 8.2 The SIS model The stochastic SIS model for endemic infections is de ned as follows. Suppose that we have a closed homogeneously mixing population consisting of n individuals. We let m of these individuals become infectious at time t = 0. Each infectious individual remains infectious for a time period that is exponentially distributed with parameter . During that time the individual makes contact with a given individual at rate =n. If a contacted individual is susceptible then she immediately becomes infectious. An infectious individual becomes susceptible again right after the infectious period, in contrast to the SIR epidemic models where the infectious period is followed by life-long immunity to the disease. All infectious periods and Poisson processes are assumed to be mutually independent. Since individuals are either susceptible or infectious, it is enough to keep track of the number of individuals in the infectious state, say. Denote by Y (t) the number of infectious individuals at time t. Then Y = fY (t) t 0g, can be described as a simple continuous time birth and death process on the state space Sn = f0 : : : ng, having the following transition rates: from to at rate i i + 1 i(n ; i)=n i i;1 i Y (0) = m. Note that we are using the total population size rather than just the number of initial susceptibles when scaling the infection rate. The reason is, of course, that the initial infectives may become infectious several times over the course of the epidemic and are thus taking a much more active part in the progress of the epidemic than in the SIR case, where they were merely used to start up the process. 60 50 40 30 20 10 9850 9900 9950 10000 10050 10100 9850 9900 9950 10000 10050 10100 60 50 40 30 20 10 Figure 8.1: Simulations of the SIR model with demography for average population size n = 100 000 and equilibrium point (10000 ,24). In the top gure the disease became extinct quickly (absorbed by the x-axis) whereas the disease persisted for a long time in the bottom gure. Law of large numbers Consider as usual a sequence Yn of epidemic models, all with the same infection rate and recovery rate . If the proportion of initial infectives, mn=n, tends to a nontrivial limit , 0 < < 1, as n tends to in nity, we expect a law of large numbers for the scaled process Yn = Yn=n to be valid. To prove this, we use Theorem 5.2. Using the notation of Chapter 5, we have 1 (y) = y(1 ; y) and ;1(y) = y, so that the drift function is given by F (y) = y(1 ; y) ; y: This function is easily seen to satisfy the condition of Theorem 5.2, hence we conclude that jYn(t) ; y(t)j tends to zero in probability uniformly on compact time intervals as n ! 1, where y(t) is the solution to the di erential equation dy = y(1 ; y) ; y dt y(0) = . The explicit solution is given by ;1 ; e( ; )t y(t) = 1 ; + (e( ; )t ; 1) if 6= , while y(t) = =(1 + t) if = . The basic reproduction number is given by = . We note that y(t) ! 0 as t ! 1 if = 1. On the other hand, if = > 1 then y(t) ! y = 1 ; = > 0 as t ! 1. In this case, the value ny is called the ^ ^ endemic level of the process. Even though the deterministic motion y(t) is just an approximation of Yn(t) valid within a xed nite time horizon, these results indicate that the stochastic SIS model will behave very di erently depending on whether the basic reproduction number is below or above 1. Central limit theorem p In order to study the uctuations of the process Yn, we de ne the n-scaled centered process p; ~ t 0: Yn(t) = n Yn(t) ; y(t) We restrict ourselves to the interesting case where = > 1 and the process is started close to the endemic level, i.e. Yn(0) ! 1 ; = in probability as n ! 1. Let us apply Theorem 5.3. We have @F (^) = ; ; 2 y = ;( ; ) y ^ G(^) = y(1 ; y) + y = 2 ( ; ) y ^ ^ ^ (t s) = e;( ; )(t;s) ~ and the theorem states that Yn converges weakly on compact time intervals to a ~ Gaussian process Y with mean zero and variance function Zt ~ Var(Y (t)) = 2 ( = ; ) 0 e;2( ; )(t;s) ds ;1 ; e;2( ; )t : The corresponding formulas become more complicated if the process is started away from equilibrium, or in the case where the basic reproduction number is below 1. Time to extinction We proceed to study the time Tn to extinction of the process Yn, Tn = inf ft 0 : Yn(t) = 0g: Since Yn is irreducible on the set Sn n f0g and has the state f0g as absorbing state, the time to extinction is a.s. nite. It is of considerable interest to understand what happens to Tn as the population grows to in nity. The theorem below shows that the time to extinction for the stochastic model is a quantity that re ects the threshold behaviour of the deterministic process. In the case the time to extinction is always relatively short, while in the case > it may be astronomical. Before stating the result we need a de nition. We say that an and bn are asymptotically equivalent as n ! 1, an bn , if the quotient an=bn tends to unity as n becomes large. Theorem 8.1 The time Tn to extinction of the stochastic SIS model has the following asymptotic properties: (A1) > and mn=n ! > 0 as n ! 1 : Tn=E (Tn) ! U in distribution, where U is exponentially distributed with parameter 1. Moreover, E (Tn ) r 2 n( ; e )2 as n ! 1, where V = log( = ) ; 1 + = > 0. nV (8.2) (A2) > and mn = m 1 for all n : Tn ! T a.s. where P(T < 1) = ( = )m < 1. Here T is the extinction time for a linear birth and death process with birth rate i, death rate i and with m individuals in the beginning. On the set where T is in nite, Tn =E (Tn ) ! U in distribution, with U and E (Tn) as above. (B1) and mn=n ! > 0 as n ! 1 : We have that ( ; (1 ; )) Tn ; log n ; log ; log 1 ; (1 ; ) !W in distribution, where W has the extreme value distribution: P(W w) = expf;e;w g: (B2) and mn = m P(T < 1) = 1. 1 for all n: Tn ! T a.s. with T as in (A2), but now The theorem states the following: (A1) If the basic reproduction number R0 = = is above 1 and the number of initial infectives is large, then the time to extinction Tn grows exponentially with the population size. (A2) If R0 > 1 and we start with a small number of infectives, then a kind of threshold phenomenon presents itself. On one part of the sample space Tn stays small, on the other part Tn grows exponentially. (B1) If R0 1 but the number of initial infectives is large, then Tn behaves like log n. (B2) Finally, if R0 1 and there is a small number of infectious individuals in the beginning then the time to extinction is always small. Proofs and references are given in Andersson and Djehiche (1998). Below we indicate the proof of the result in the most interesting case (A2). (A1) and (A2) fall in the class of results on the exponential limiting distributions of rst passage times of birth and death processes into rare sets of states, see e.g. Keilson (1979). The excellent book by Aldous (1989) gives a heuristic treatment of related topics. The results (A2) and (B2) partly rely on the coupling argument given below. The paper by Nasell (1996) is mainly concerned with the so-called quasi-stationary distribution of the process Yn, de ned by Qn(i) = tlim P (Yn(t) = i j Yn(s) 6= 0 0 s t) !1 i = 1 2 ::: n but interesting results concerning the time to extinction are also obtained. Nasell notes that, for each n, Tn is exponentially distributed with parameter Qn(1) if the initial distribution of the process equals Qn , and then uses asymptotic expansions in order to estimate 1= Qn(1). In the situation (A1), and only there, whether the process is started from our prescribed xed value or from the quasi-stationary distribution is of no importance for the asymptotic results, and indeed, in this case 1= Qn(1) is asymptotically equivalent to the right hand side of (8.2). For a nice treatment of quasi-stationary distributions, see van Doorn (1991). We now sketch a proof of (A2). By means of a simple coupling we show that, if the initial number mn of infectives stays constant as n ! 1, then the epidemic process resembles a linear birth and death process at the beginning of the time development. De ne Z (t), t 0, as a continuous time birth and death process with the following transition table: from to at rate j j+1 j j j;1 j Z (0) = m. It is well-known that the time to extinction T for Z satis es P(T < 1) = ( = )m if > . The distribution of T ^ ^be given in a closed form (see e.g. can Syski, 1992). Now de ne a bivariate process (Yn Zn) with transition intensities from to at rate (i j ) (i + 1 j + 1) i(n ; i)=n (i j + 1) j ; i(n ; i)=n (i ; 1 j ; 1) i (i j ; 1) j; i ^ ^ and initial value (Yn(0) Zn(0)) = (m m). It is clear that the marginal distribu^ tions coincide with the distributions of Yn and Z , respectively. Moreover, Yn(t) ^ Zn(t) for all t 0. Clearly the approximation breaks down as soon as the bivariate process leaves the diagonal. By investigating the jump rates we see that downward jumps from the diagonal are impossible while upward jumps are very rare as long as the states (i i) have i2 n. This implies that the two coordinates will stick together during the whole epidemic on that part of the sample space where the time to extinc^ ^ ^ tion of Zn is nite. On the other hand, if Zn explodes, then one can show that Yn reaches the endemic level ny (or rather the integer closest to this number) with high ^ probability. To study the time to extinction given this latter event, we write Tn = An + K X n k=1 Bn(k) + Cn where An = inf ft 0 : Yn(t) = 0 or Yn(t) nyg, Kn is the number of completed ^ excursions from the endemic level, Bn(k) is the length of the kth completed excursion and Cn measures the time to reach the absorbing state counted from the time of the last entrance to the endemic level. Of course, Kn = 0 and Cn = 0 if Yn never reaches the endemic level. By the strong Markov property, the variables Bn(k) are independent and identically distributed. Also, the number of completed excursions Kn has a simple geometric distribution with parameter n, say. The following technical results are proved in Andersson and Djehiche (1998). First, the probability n of absorption during an excursion satis es = ; 1 e;nV as n ! 1 p (8.3) n 2= with V = log( = ) ; 1 + = > 0. Also, the expected excursion lengths satisfy E (Bn(k)) r p = as n ! 1: 2 n 2( ; ) Finally, for any C > 0 we have An ! 0 and Cn enC enC in probability as n ! 1. Using (8.5), we may write Tn E (Tn) !0 (8.4) (8.5) K K X 1 Kn 1 X Bn(k) : E (Kn)E (Bn(1)) k=1 Bn(k) = E (Kn) Kn k=1 E (Bn(k)) n n Condition on the event that the approximating linear birth and death process explodes. Then the normed sum to the right tends to 1 in probability by a version of the law of large numbers, and it is easily checked that Kn=E (Kn) tends to an exponentially distributed random variable. Also, since the expectation of Tn is asymptotically equivalent to E (Bn(1))= n, Equation (8.2) follows from (8.3) and (8.4). Exercises 8.1. Derive the di erential equation (8.1) for the covariance matrix (t), t 0, of the SIR epidemic with demography. Also derive the expression for ^ . 8.2. Find the endemic level (^ y) for the normed SIRS epidemic (Xn(t) Yn(t)) (see x^ Exercise 5.3). 8.3. SIRS epidemic model (continued). Suppose that the epidemic process is started at the endemic level (i.e. (Xn(0) Yn(0)) = (^; y)). Apply Theorem 5.3 to derive a px ^ p ~ ~ central limit theorem for (Xn(t) Yn(t)) = n Xn(t) ; x Yn(t) ; y , the n-scaled ^ ^ centered process. and mn=n ! > 0 as n ! 1. Give an approximation of the time until all the initial infectives have recovered from their rst encounter with the disease, thus providing an elementary lower bound for Tn. 8.4. Consider the SIS epidemic with Part II ESTIMATION Until now, these lecture notes have been concerned with modelling the spread of disease in a community. Such modelling can be interesting in its own right and certain characteristics of the models, such as the threshold limit theorem, can provide deeper insight into the spread of disease. However, another reason for modelling epidemics is to draw conclusions about particular diseases, for the guidance of health authorities. In the sections to follow, we therefore examine the equally important area of drawing inferences about model parameters. Inference in epidemics is non-standard in that the epidemic process is often only partly observed. The available data usually consists of knowledge of those infected during the course of the epidemic, and sometimes at what time points these individuals exhibited symptoms.This implies that statistical analysis has to be based on partial observation, which, together with the strong dependencies in epidemic models, studied in Part I, makes analysis of the simplest models quite complicated. In Chapter 9 we assume that the epidemic is observed completely, meaning that for each individual we know if she was infected or not, and if she was, the times of infection as well as removal (i.e. recovery and immunity). Such detailed data is in practice hard to collect, at least for large communities. Still, statistical methods for such data can be of interest as an indication of how much better estimates would be if such data had been available. If the answer turns out to be `very much better' then e orts should be made to collect the necessary data. In Chapter 10 we treat more realistic types of partial data. We concentrate on the case where the nal state is observed and brie y discuss the case where the removal times of infected individuals are also observed. The statistical methods treated in Chapters 9 and 10 focus on the standard SIR epidemic model however, methods to extend the techniques to more realistic assumptions which take account of individual heterogeneities in the community and non-uniform mixing patterns, due to the presence of social networks, are also discussed. When allowing for di erent heterogeneities, statistical analysis quickly becomes cumbersome, particularly in the realistic case where the epidemic is only partially observed. A di erent approach to statistical analysis is then to make use of Markov Chain Monte Carlo methods. This technique, which has yet to be fully developed for infectious disease data, is presented in Chapter 11. Chapter 12 focuses on the estimation of parameters most relevant for health authorities. More important than knowing the values of certain model parameters is their interpretation in terms of how to avoid future epidemics by the introduction of preventive measures such as vaccination. More speci cally, we focus on estimating the critical vaccination coverage, that is, the proportion of the susceptible population which it is necessary to vaccinate in order to prevent future outbreaks. In Part I, we observed that if there are only few initially infectious individuals the epidemic may not take o . When drawing inferences, we can only hope for consistency in our estimation if we observe a major outbreak. Thus, one has to condition on the occurrence of a major outbreak. Rather than doing this, we take the easy way out and assume that the proportion of initially infectious individuals is positive ( > 0). In the rst part of the text it was assumed that no individuals were initially immune. In reality this is rarely true: quite often there are immune individuals, possibly due to a previous outbreak of the disease. The same techniques and similar results also apply to the case with initially immune individuals, by a slight transformation of parameters. In Part II of the text we retain the assumption of no initially immune individuals. Conclusions about the parameters will be misleading if this assumption does not hold. In Exercises 9.4 and 12.2 we give hints on how to modify estimation in the case of a known positive proportion of initially immune individuals. Another problem not discussed in the text is that of model choice we are not comparing di erent models statistically nor checking if data is consistent with the general model. In the present notes, parameter estimates are derived on the implicit assumption that the model is true. We concentrate on presenting ideas and techniques for estimation, rather than actually applying the methods to real data. The main references for statistical methods for epidemics are Becker (1989) and Anderson and May (1991), with the latter excluding uncertainty considerations. The earlier monograph on modelling by Bailey (1975) also treats statistical inference. See also the recent survey paper by Becker and Britton (1999). 9 Complete observation of the epidemic process In this chapter we assume that the standard SIR epidemic process En m( I ) is observed completely. By complete observation is meant that the infection times i and removal times i (hence also the length of the infectious period Ii = i ; i) of all infected individuals are observed for individuals who did not become infected we de ne i = 1 and Ii = 0 for formulas to be consistent. From the data it is obviously possible to deduce how many individuals are susceptible, infectious (and removed) for each time t, implying that (X Y ) = f(X (t) Y (t)) t 0g as well as the nal size Z = n ; X (1) are observed. Based on the observed data we want to draw inferences on the transmission parameter and the distribution of the infectious period I . We do this by means of Maximum Likelihood (ML) theory. First, we have to make the meaning of `likelihood' clear for such epidemic processes. Hence, in Section 9.1 we derive the log-likelihood for a vector of counting processes, and also state some properties of martingales which will turn out useful in the statistical analysis. 9.1 Martingales and log-likelihoods of counting processes An excellent reference to statistical methods for counting process data is Andersen et al. (1993), where both theory and many examples are given. In the present section we state some results and motivate them heuristically. Consider a vector of counting processes N = (N1 : : : Nk ), where each component Ni(t) counts the number of times a speci c event has occurred up to time t. Assume that the probability that such an event occurs in (t t + h] given the history of the whole vector process up until t, denoted Ht, satis es P(Ni(t + h) ; Ni(t) = 1 jHt) = h i(t)Xo(h) i = 1 : : : k + P(N (t + h) ; N (t) = 0 jHt) = 1 ; h i(t) + o(h): i (9.1) In the equations above i (t), denoted intensity functions, are non-negative functions which may depend on the history of the process up until time t. Formally this is written as i (t) 2 Ht and (Ht )t 0 is called a ltration of - elds. To go into mathematical details is beyond the scope of these lecture notes { often, it su ces to understand their meaning heuristically. The equations above imply that, in a short time interval, only one of the k di erent events can occur (i.e. the corresponding counting process increases by 1). Notice the similarities between (9.1) and the jump rates (5.1) of Chapter 5. The intensities above are more general in that they may also depend on the process prior to t (i.e. the process is not necessarily Markovian), and less general in that only one type of jump (increase by 1) can occur. What is the log-likelihood of the observations on fN (t) 0 t u)g for this process? We derive this heuristically by dividing (0 u) into small increments t of length h and working out the probability of observing what happens in the increment, conditional on the history prior to the increment. Let t = t t + h) and let Ni(t) = Ni(t + h) ; Ni (t) be the number of i-events that occur in this interval. Then the probability of our observations can be written as 0 Y @Y (h i(t)) t (0 u) 1;h Ni (t) i X i !1;P i (t) i Ni (t) 1 + o(h)A : The two factors within the large brackets above simply pick out which of the probabilities to use, depending on what was observed in t. As the increment length h becomes smaller, the number of increments increases and all but the increments containing events appear in the second factor above. For small h this factor is equivalent P to exp (;h i i(t)). The expression above decreases (except if no events have occurred) because of the h's appearing in the inner product. This is not surprising: the probability of observing the events exactly at some given time points is 0! However, if we divide the expression above by h to a power equal to the observed number of jumps, and let h tend to 0, it can be shown that the logarithm of the expression above converges to `u(N (t) 0 t u) = XZ u 0 i (log i(t;)]dNi (t) ; i(t;)dt) (9.2) and this is the log-likelihood of the observed process, with i(t;) denoting the intensity just before t. In the expression above dNi(t) is 1 at times t when the counting process increases, and 0 otherwise, so the rst integral is actually a sum. The log-likelihood is used for drawing inferences about parameter(s) of the intensity functions. From now on, we therefore write i( t;) and `u( ) for the intensities and the log-likelihood respectively. Below a one dimensional parameter is considered, otherwise partial derivatives should be used. The true, unknown parameter value is denoted by 0 . Further, all derivatives will be with respect to the parameter and not time. If we have observed the counting process up until u, the ML-estimator ^u is the solution which maximizes `u( ). We nd the maximum by di erentiating the log-likelihood: `0 ( u )= = X Z u; 0i( t;) dNi(t) ; i ( t;) i Z0 X u 0 ( t;) i 0 i i( 0( i t;)dt t;) (dNi(t) ; i( t;)dt) : (9.3) The process above, viewed as a function of u is often called the score process in statistical literature. The ML-estimator ^ is the solution of the equation `0u( ) = 0. For the true parameter 0 , i.e. using the true intensities, the integration increments dNi(t) ; i( 0 t;)dt have zero mean (cf. Equation 9.1). Such zero-mean increments are closely connected to martingales, a notion we now describe. A process M = fMu u 0g is said to be a martingale (with respect to the history Ht) if it satis es two conditions: E(jMuj) < 1 for all u, and E (MujHt) = Mt for u t. The theory of martingales is rich and profound a rigorous presentation is given in Andersen et al. (1993), Chapter II, and a less mathematical description with a view to applications in epidemics is to be found in Becker (1989), Chapter 7. We now state some properties of martingales relevant for our application (here we only consider 1-dimensional martingales, but many results can be extended to the multivariate case). The single most important property of a martingale M is the second de ning property, namely that its expected future value equals its present value. In our case the martingale will be `u( 0 ) which is 0 at u = 0, implying that ~ E (`u( 0 )) = 0 for all u. Further, a linear combination aM + bM of martingales is a martingale. If a bounded predictable process, i.e. a bounded process X for which X(t) is determined by Ht; (the history up to but not including t), tis integrated with R ~ respect to a martingale M , then the resulting process M (t) = 0 X (u)dM (u) is a martingale. From the properties stated above it follows that the score process f`u( 0 ) u 0g, evaluated at the true parameter value 0 , is a martingale. For many martingales it is possible to prove central limit theorems as some quantity tends to in nity. In addition to the mean we therefore need Rto derive the variance of a martingale. Consider a martingale of the form Mu = 0u f (t)(dN (t) ; (t;)dt), where N (t) is a counting process with intensity (t) and f is a predictable process (as in our case above). De ne R the so-called optional and predictable variation processes by M ]u = 0u f 2 (t)dN (t) R u f 2(t) (t;)dt respectively. The following lemma explains why they and M u = 0 are called variation processes. Lemma 9.1 Suppose the martingale M de ned above has nite second order moments. Then Var(Mu) = E M u ] = E M ]u ]. R Proof. The second equality is immediate since M ]u ; M u = 01 f 2(t)(dN (t) ; (t;)dt) is itself a martingale, so its expectation is 0. The rst P equality is derived as follows. The de ning integral of Mu is the limit of the sum f (t)( N (t) ; (t)h) using the same notation as before. The variance of such a sum is the sum of covariances. Each such covariance term is equal to the expectation of the product since t E f (t)( N (t) ; (t)h)] = E E f (t)( N (t) ; (t)h)jHt]] = E f (t)E N (t) ; (t)hjHt ]] = 0: This follows because the increments have mean 0. Consider a covariance term with disjoint time intervals s and t where s + h < t. Then E f (s)( N (s) ; (s)h)f (t)( N (t) ; (t)h)] = E E f (s)( N (s) ; (s)h)f (t)( N (t) ; (t)h)jHs+h]] = E f (s)( N (s) ; (s)h)E f (t)( N (t) ; (t)h)jHs+h]] = 0: The covariance between terms of disjoint time intervals is hence 0. Now the diagonal terms, i.e. the variance terms, are Var f (t)( N (t); (t)h)] = E f 2 (t)( N (t); (t)h)2 ] = E f 2(t)E ( N (t); (t)h)2 jHt ]]: The factor inside the inner expectation takes the value (; (t)h)2 if there is no jump in the interval and the value (1 ; (t)h)2 1 if there is a jump, and the latter happens with probability (t)h. Neglecting terms of order o(h) we thus have E ( N (t) ; (t)h)2 jHt;] = (t)h and consequently the variance term equals E f 2(t) (t)h]. The variance of the sum hence equals the sum of variances: X X Var Mu ] = E f 2 (t) (t)h + o(h)] = E f 2(t) (t)h + o(h)]: t t R Taking smaller and smaller intervals shows that this converges to E 0u f 2(t) (t;)dt] as stated. Using the same technique one can prove the following lemma stating that martingales from counting processes with no simultaneous jumps are uncorrelated. Lemma 9.2 Consider two martingales M1 and M2 de ned by M1 (u) = M2 (u) = Zu f1 (t)(dN1(t) ; 1(t;)dt) 0 Zu 0 f2 (t)(dN2(t) ; 2(t;)dt) where N1 and N2 are counting processes without simultaneous jumps having intensities 1 and 2 respectively and f1 and f2 are predictable processes. Then the covariance function satis es Cov(M1(u) M2 (t)) = 0 for all u t 0. Proof. The lemma is proven using methods similar to those in the proof of Lemma 9.1. Hints are given in Exercise 9.1. We conclude this section by stating a martingale central limit theorem due to Rebolledo (1980) (see also Andersen et al., 1993, p 83). For this we study a sequence of martingales M (n) , indexed by n. Let M (n) be a martingale containing all jumps of M (n) larger than in absolute value, and let the function m(u) be a continuous increasing deterministic function. P Theorem 9.3 Assume that M (n) ]u!m(u) or M (n) P !m(u) and that for each > P 0, M (n) (u)!0 for all u. Then M (n) ) M (1) , where M (1) is a continuous Gaussian martingale with M (1) u = M (1) ]u = m(u). In particular M (1) (t) ; M (1) (s) is Gaussian with mean 0 and variance m(t) ; m(s) for t s. u We omit the proof of the theorem. In the Markovian case it can be proven using methods similar to those presented in Chapter 5. 9.2 ML-estimation for the standard SIR epidemic Assume now that we observe En m( I ) up to some time u, that is we observe all infection times i and removal times i that have occurred up until u. De ne N1(t) = n ; X (t), a counting process for the number of infections that have occurred in (0 t], and N2 (t) = n + m ; (X (t) + Y (t)) a counting process for the number of removals in (0 t] (remember that X (0) = n and Y (0) = m = n). The rst counting process has intensity ( =n)X (t;)Y (t;). The counting process N2 has a more complicated intensity. If F denotes the distribution ofP infectious period I and f its density the function then the intensity function at t is i f (t ; i )=(1 ; F (t ; i )), where the sum extends over individuals which are infectious at t;. Applying results of the previous section and manipulating the integrals for the counting process N2 it follows that the log-likelihood is given by `u( F ) = Zu + (log ( =n)X (t;)Y (t;)]dN1 (t) ; ( =n)X (t;)Y (t;)dt) X 0 i i u< log(1 ; F (u ; i)) + i X i i u log f ( i ; i ): (9.4) To estimate we di erentiate the log-likelihood with respect to @ ` ( F ) = Z u 1 dN (t) ; X (t;)Y (t;)dt = N1 (u) ; Z u X (t;)Y (t;)dt 1 @u n 0 0 where, as before, X (t) = X (t)=n. (Whenever integration is with respect to Lebesgue's measure (dt), as on the far right above, one may replace the argument `t;' by `t', this is done in what follows.) Solving the likelihood equation @@ `u( F ) = 0 thus gives the ML-estimate ^ = R u N1 (u) : (9.5) 0 X (t)Y (t)dt Estimating properties of the infectious period can be done parametrically or nonparametrically. In particular, if the epidemic is observed to the end, u = T = inf ft Y (t) = 0g, then the rst sum in (9.4) is empty and inference is based on the observed lengths of the infectious periods Ii among individuals who were infected. For example, = E I ] is simply estimated by the observed mean I . For the rest of this section we treat the Markovian version of the standard SIR model, i.e. assuming that the length of the infectious period is exponentially distributed with mean length 1= . The reason for doing this is that convergence results can be derived from the theory presented in Chapter 5, but most results are true for the general case. For the Markovian case the intensity for N2(t) is simply Y (t) (each infectious individual is removed at constant rate ). The derivative of the likelihood with respect to then becomes @ ` ( ) = Z u 1 (dN (t) ; Y (t)dt) = N2(u) ; Z u Y (t)dt: 2 @u 0 0 Consequently, the ML estimator is N ^ = R u 2(u) : (9.6) 0 Y (t)dt A more natural parameter to estimate is 1= , the expected length of the infectious period, which is of course estimated by the inverse of the estimator above. In particular, R if the epidemic is observed to the the end u = T the estimator is 0T Y (t)dt=N2(T ), and this is simply the aggregated length of all infectious periods divided by the number who were removed. The estimator is hence simply the arithmetic mean of the observed infectious periods. We will now prove that the ML-estimators are asymptotically normal. First we apply the martingale central limit theorem to the score processes. Lemma 9.4 Consider the Markovian version of the standard SIR epidemic with initial values X (n)(0) = n and Y (n) (0) = n. De ne the normed score processes ( ( S1n) (u) = n;1=2 @@ `u( ) and S2n) (u) = n;1=2 @@ `u( ), evaluated at the true parameter values ( 0 0). Then ( S1n) ) S1 and ( S2n) ) S2 where S1 and S2 are Gaussian martingales. The covariance function of S1 and S2, denoted v1 and v2 respectively, are de ned through the deterministic limiting functions x(t) and y(t) of Section 5.5, and are given by v1(u) = v2(u) = respectively. ;2 0 ;2 0 Zu 0 x(t)y (t)dt = Z0 u 0 ;2 (1 0 ;2 0 y (t)dt = 0 (1 + ; x(u)) ; x(u) ; y(u)) Proof. The result follows immediately from the martingale central limit theorem (Theorem 9.3). The rst assumption of the theorem was shown in Section 5.5, and since the size of all jumps are n;1=2 for the normed process, there will be no jumps larger than for n large enough. We are now able to prove the main result of this chapter, a result originally given by Rida (1991). Theorem 9.5 The ML-estimators ^ and ^ are asymptotically Gaussian with asymptotic means 0 and 0 (the true parameter values) respectively. The asymptotic variance of ^ is 1=nv1(u) and for ^ the variance is 1=nv2(u). Consistent estimates of the ; standard errors of ^ and the more natural parameter ^ 1 = 1=^ (the average length of the infectious period) are p s.e. ^ = ^= N1 (u) and ; s.e. ^ 1 p ; = ^ 1= N2(u): For the case where the epidemic is observed to its end (u = T ) the deterministic integrals appearing in the variance functions v1 and v2 may be replaced by ; and respectively ( was de ned in Chapter 4). Similarly, the square root expressions for the standard errors then becomes Z and Z + n respectively. Proof. We prove the statement for ^ , the proof for ^ being similar. The estimator ^ is the solution to the likelihood equation @@ `u( ) = 0. Multiplying the likelihood p equation by 0= n gives 1 N (u) ; Z u ^X (t)Y (t)dt = S (n) (u) + p ( ; ^) Z u X (t)Y (t)dt: 1 0= p 1 01 0 n 0 Rearranging the formula yields pn ^ ; n 0 ( S1n) (u) : (9.7) 0 = ;1 R u 0 0 X (t)Y (t)dt ( According to Lemma 9.4 S1n)(u) converges to a Gaussian random variable with mean R 0 and variance v1 (u). Further, by Theorem 5.2 ;1 0u X (t)Y (t)dt converges in proba0 R bility to 0u x(t)y(t)dt = v1 (u). Hence, by Slutsky's theorem it follows that ^ is asympR totically normal with prescribed mean and variance. That 01 0 x(t)y(t)dt = ; is easily shown from the deterministic equations in Section 5.5. Finally, N1 (u) also R converges in probability to 0u 0 x(t)y(t)dt using Theorem 5.2, from which it follows that the standard error is consistent. Of course, the basic reproduction number 0 = 0= 0 is estimated by ^ = ^=^, where ^ and ^ are de ned by (9.5) and (9.6) respectively. This estimator is also asymptotically Gaussian as the following corollary shows. Corollary 9.6 The estimator ^ = ^=^ is asymptotically Gaussian with mean 0 and ; ; 2; 2 variance v1 1 (u) + 0 v2 1(u) =(n 0 ). If the epidemic is observed to its end (i.e. u = q ^) = ^ ;Z ;1 + (Z + );1 =n. T ) then a consistent standard error is given by s:e:( Proof. The proof consists of two steps. First we argue that ^ and ^ are asymptotically independent. The second part, left for Exercise 9.2, consists of applying the -method, i.e. expanding ^ and ^ around their true values 0 and 0, by Taylor's theorem, and using their independence. To show that ^ and ^ are asymptotically independent we p; ( ; note that, by de nition n ^;1 ; 0 1 = S2n) (u)=N2(u) and a similar expression for pn ^ ; 0 is given in Equation (9.7). The denominators in the two expressions ( ( converge in probability to deterministic limits, and S1n) and S2n) are uncorrelated due to Lemma 9.2 which proves asymptotic independence. Statistical analysis for various extensions of the standard SIR model, assuming complete observation, has not received much attention in the literature. One reason for this is that the epidemic process is rarely observed in such detail. Another reason is that for many models (e.g. most network models) stochastic properties of the epidemic process are unknown, only properties of the nal state having been derived. Two exceptions are Rhodes et al. (1996) and Britton (1998a). The former considers a more detailed model for a homogeneous community and assumes even more detailed data information about actual contacts is known. For such data Rhodes et al. (1996) derive estimation procedures for various parameters, including the probability that a contact leads to infection. Britton (1998a) derives estimation procedures based on complete data for the Markovian version of the multitype epidemic model (see Chapter 6). It is shown that the infectivity of a given type can only be estimated consistently if the corresponding susceptibility di ers from all other susceptibilities. Exercises 9.1. Prove Lemma 9.2. (Hint: Use the same technique as in the proof of Lemma 9.1, i.e. to divide the interval into small disjoint intervals of length h and show that the conditional expectation of each term is 0, the diagonal terms as well! To be precise the terms are of order o(h2).) 9.2. Prove the remaining part of Corollary 9.6. (Hint: Apply the -method together with the results of Theorem 9.5 and independence to derive the asymptotic variance of the estimator ^.) 9.3. Table 9.1 on the next two pages describes an outbreak of smallpox in a Nigerian village (taken from Table 6.1 p 112-113, Becker, 1989). For each day the number of susceptibles and infectives are given as well as the number of infections that occurred during that day. (Note that the number of infectives does not increase as soon as an infection occurs. The reason for this is that individuals are latent for some time before turning susceptible. This does not a ect estimates.) Assume this to be an outbreak from the Markovian version of En m( I ). a) Estimate the infection parameter , the mean length of the infectious period ;1 and the basic reproduction number = = . b) Give standard errors for the estimates. 9.4. Suppose that instead of having X (0) = n we have X (0) = sn, meaning that only a proportion s of those not initially infectious are susceptible and the remaining are immune (if the number of removed is denoted Z (t) we hence have Z (0) = (1 ; s)n under the present assumption). What is the e ect on the ML-estimator ^ and its variance? Answer the question by considering two communities having the same numbers of infectives and susceptibles initially and all through the epidemic, only one community having no initially immune and the other community having, say, half as many initially immune as initially susceptible. (Hint: Note that the two communities have di erent n = X (0) + Z (0)). Table 9.1. The spread of smallpox in a Nigerian village (continued) Number Number Number Number Number Number Time infectious suscept. infected Time infectious suscept. infected t I (t) S ( t) C (t) t I (t) S (t) C (t) 0 1 119 1 22 1 111 1 1 1 118 0 23 2 110 0 2 1 118 0 24 2 110 0 3 1 118 0 25 2 110 1 4 1 118 0 26 5 109 0 5 1 118 0 27 6 109 2 6 1 118 0 28 5 107 0 7 1 118 1 29 5 107 2 8 1 117 0 30 4 105 0 9 1 117 1 31 5 105 0 10 1 116 0 32 5 105 0 11 1 116 0 33 2 105 0 12 1 116 3 34 1 105 1 13 1 113 1 35 1 104 0 14 1 112 0 36 2 104 0 15 1 112 0 37 2 104 1 16 1 112 0 38 1 103 1 17 1 112 1 39 2 102 0 18 1 111 0 40 2 102 0 19 1 111 0 41 4 102 0 20 1 111 0 42 4 102 2 21 1 111 0 43 5 100 1 Number Number Number Number Number Number Time infectious suscept. infected Time infectious susceptib. infected t I (t) S (t) C (t) t I (t) S (t) C (t) 44 5 99 1 64 5 90 0 45 5 98 1 65 4 90 0 46 4 97 0 66 3 90 0 47 4 97 2 67 5 90 0 48 3 95 1 68 3 90 0 49 3 94 0 69 2 90 0 50 1 94 0 70 2 90 0 51 2 94 0 71 2 90 0 52 3 94 0 72 3 90 0 53 3 94 2 73 3 90 0 54 3 92 0 74 1 90 0 55 2 92 0 75 1 90 0 56 4 92 0 76 1 90 0 57 5 92 0 77 2 90 0 58 5 92 1 78 2 90 0 59 5 91 0 79 1 90 0 60 5 91 0 80 1 90 0 61 7 91 0 81 1 90 0 62 8 91 0 82 1 90 0 63 6 91 1 83 1 90 0 10 Estimation in partially observed epidemics In the previous chapter, statistical analysis was based on what we called complete observation of the epidemic process both the times of infection and removal (recovery) for all infected individuals were observed. In real life such detailed data is rarely available. In the present chapter we study estimation procedures for less detailed partial data. The likelihood for partial data is usually cumbersome to work with, being a sum or integral over all complete data sets resulting in the observed partial data. Other estimation techniques can turn out to be much simpler. Below we present two general techniques which have been successful in several epidemic applications: martingale methods and the EM-algorithm. The two techniques are applied to a speci c estimation problem. First, we use martingale methods to derive an estimator for the basic reproduction number in the standard SIR epidemic, assuming that only the initial and nal states are observed. Second, we consider a discrete time version of the models presented previously, introducing chain-binomial models, which are interesting in their own right. Assuming that the chains are observed we make use of the EM-algorithm to deduce an estimator of the transmission parameter. It is worth pointing out that pure likelihood methods can be applied in several cases, even when only partial data is available. If, for example, several small sub-populations (e.g. households) are observed, then ML-estimation is often possible from nal size data. Estimation then consists of traditional ML-theory with the additional ingredient of potential numerical problems due to the complicated form of the likelihood (cf. the exact likelihood in Section 2.4). Addy et al. (1991) present estimation procedures for nal size data of sub-populations, allowing for a heterogeneous community and also for transmission from outside the sub-population. The ideas extend the early work by Longini and Koopman (1982) who treat a homogeneous community and a special form of transmission dynamics. 10.1 Estimation based on martingale methods The standard SIR epidemic En m( I ) was de ned in Section 2.1. Suppose that we have collected data from such an epidemic and that data consist of knowing how many individuals have been infected during the course of the epidemic, besides knowing the initial state (i.e. the number of initially susceptible and infectious individuals respectively). In many cases the initial state may not be known. The initial number of infectives is usually small thus not causing much uncertainty the substantial problem lies in knowing how many individuals are initially susceptible to the disease. Quite often some individuals are immune even before the disease is introduced, perhaps due to an earlier outbreak of the same or a similar disease. Then the number of susceptibles has to be estimated prior to the outbreak, using traditional statistical methods. Here we assume this number to be known thus neglecting such uncertainty. Our data consists of n = X (0), n = Y (0) (the initial state) and the nal number infected Z = n ; X (T ), and implicitly knowing that Y (T ) = 0 since there are are no infectious individuals at the end of the epidemic. As observed in Section 2.4 the exact distribution of the nal size of the epidemic has a numerically complicated form, thus not enabling much help in estimation when n is large. One way to estimate is of course to use the central limit theorem stating that Z is approximately normally distributed and equating the mean to the observed value Z . Here we take a di erent approach based on martingale methods, a method applicable in many other cases. From Chapter 9 we know that Zu 0 ;dN (t) ; X (t)Y (t)dt ; Z u g(t) (dN (t) ; Y (t)dt) f (t) 1 0 2 0 0 is a martingale for any choice of predictable (left-continuous) processes f = ff (t) t 0g and g = fg(t) t 0g. The ML-estimator ^ was obtained by equating the martingale to 0 for the choice f (t) = 1= and g(t) = 0, and ^ from f (t) = 0 and g(t) = 1=R u. This method can no longer be used in the case of nal size data: for example 0 X (t)Y (t)dt appearing in ^ is not observed. However, the same idea of equating a martingale to its mean, a special form of the method of moments, can be adopted by choosing f and g cleverly so that the resulting martingale only depends on observed quantities. We want to choose f and g such that the `dt' terms cancel out. This happens if f (t) = 0=( 0X (t;)) (note that f (t) is left-continuous) and p g(t) = 1. Let M (n) denote the resulting zero-mean martingale divided by n and let 0 = 0 = 0 be the basic reproduction number. That is, pnM (n)(u) 1 ;dN (t) ; X (t)Y (t)dt ; Z u (dN (t) ; Y (t)dt) = 1 0 2 0 0 0 Z 0 X (t;) Zu u1 =1 dN1(t) ; dN2(t) 0 0 X (t;) 0 n 1+ 1 + + 1 = (10.1) n ; (N1(u) ; 1) ; N2 (u): 0 n n;1 Zu The last equality is true since X (t) = n ; N1(t). For u = T , the end of the epidemic, its value is determined by the nal size data since N1 (T ) = Z and N2 (T ) = n + Z . Solving the equation M (n) (T ) = 0 leads to the estimator 1 1 1 ^ = n + n;1 + + n;(Z ;1) : Z+ (10.2) To see that the estimator is reasonable, we look at its large population approximation using the approximation 1=n +1=(n ; 1)+ + 1=(n ; (Z ; 1)) ; log(1 ; Z ) implying that ^ ; log(1 ; Z )=(Z + ): From the law of large numbers derived in Section 4.3 it is seen that the estimator agrees with the large population limit. The large population limit for Z + , denoted by , is de ned in (4.1), where = E (I ) corresponds to 1= 0 (and hence to 0 ) for the Markovian case treated here. A central limit theorem can be derived using methods similar to those in the case of complete data treated in Section 9.2. From the martingale central limit theorem (Theorem 9.3) we have the following corollary: Corollary 10.1 The martingale M (n) de ned by (10.1) satis es M (n) ) M. The limiting process M is a continuous Gaussian martingale with covariance function v(u) = 12 x(1u) ; 1 + (1 + ; x(u) ; y(u)) 0 where x(u) and y(u) are deterministic functions de ned in Section 5.5. The martingale observed at its end, M (n) (T ) = M (n) (1), converges to a Gaussian random ; variable with mean 0 and variance v (1) = 0 2 ((1 + ; );1 ; 1) + . Proof. The optional variation process M (n) ] has the following form Zu 1 1 Z u dN (t) (n) ] = 1 Mu 2 n 0 0 X 2 (t;) dN1(t) + n 0 2 1 1 = 12 1 + 2 + + (X (u) + 1=n)2 + (1 + ; X (u) ; Y (u)): n0 (1 ; 1=n) (The fact the N1 and N2 jump at distinct time-points makes the two martingales de ned from N1 and N2 independent, cf. Lemma 9.2, so the variation process of their di erence is the sum of the two variation processes.) From Chapter 5 we know that X (u) and Y (u) respectively converge to x(u) and y(u) of Section 5.5 almost surely, uniformly on bounded intervals. Because an integral is de ned as the limit of a sum with smaller and smaller time increments it then follows that M (n) ]u converges to p v(u). Because the jumps of M (n) are of size 1= n we can then apply Theorem 9.3 to conclude weak convergence of the martingale. A formal proof of the nal statement is technical and not given here (see Rida, 1991). The variance is as expected since x(1) = 1 + ; and y(1) = 0, the asymptotic proportions of susceptibles and infectives respectively at the end of the epidemic. The corollary above immediately induces a central limit theorem for the estimator ^ which we state in the following corollary (see also Rida, 1991). Corollary 10.2 The estimator ^ de ned in (10.2) is asymptotically normally dis- tributed with mean 0 and asymptotic variance 1 ; 2 2 1;( ; ) + 0 n where is de ned in (4.1) with 0 replacing . A consistent standard error for the estimator is given by Z + ^2 (Z + ) 1=2 : s:e:( ^) = p 1 n(Z + ) 1 ; Z Proof. From the de nition of ^ we have pnM (n)(T ) + 1 ; 1 ! Z T 1 dN1(t): 0= ^ 0 0 X (t;) R p ; Rearranging the formula gives 0 1 ; ^;1 = nM (n) (T )= 0T X (t);1 dN1(t). As noted earlier, the integral equals n (1=n + + 1=(n ; Z + 1)). In Section 4.2 it was shown that Z=n converges in probability to ; . Hence it follows that 1+ + 1 n n ; Z + 1 ! ; log(1 + ; ) in probability, and the right hand side is equal to 0 due to the de nition of . Together with Corollary 10.1, stating that M (n) (T ) is asymptotically normal we can p; apply Slutsky's theorem to conclude that n( 0 1;; ^;1 ) converges to a Gaussian ; random variable with mean 0 and variance ( 0 );2 0 2 ((1 + ; );1 ; 1) + . Finally, apply the -method, i.e. derive the Taylor expansion of the estimator around 4 the true parameter, to conclude that asymptotically Var( ^) = 0 Var( ^;1 ). Using martingale methods we have shown how to derive an estimator for based on observing only the nal size. If the complete data were available a better estimator is of course ^ = ^=^ of Chapter 9, which uses more information. When only the nal size is observed we cannot estimate the two parameters separately. This is quite natural: one random quantity (the nal size) enables estimation of one parameter, and since we have no knowledge of the time evolution of the process, we should not expect any information about the length of the infectious period. Sometimes information about the time of diagnosis of infected individuals is also available. The time of diagnosis is approximately equal to the `removal time' because social activity is often reduced when symptoms become obvious, and so is the infectiousness. This motivates the study by Becker and Hasofer (1997), which derives estimation procedures for data on removal times, assuming the Markovian version of the standard SIR epidemic. Besides the martingale considered above, they derive ~ another estimating equation from the martingale M (t) = X (t)(1+ =n)n+ n;X (t);Y (t) . This enables them to estimate and separately as in the case of complete data. Britton (1998a) has constructed estimators, using martingale methods similar to those of the present section, for nal size data of the multitype SIR model of Chapter 6, assuming proportionate mixing. Susceptibility parameters are shown to be asymptotically Gaussian whereas infectivities and the basic reproduction number cannot be estimated consistently. An explanation is that the dimension of the data is lower than the number of parameters, and by observing those who were infected, not much can be said about those who caused the disease to spread further, i.e. about the infectivities. 10.2 Estimation based on the EM-algorithm The EM-algorithm was rst named by Dempster, Laird and Rubin (1977), but the method dates back even earlier. It is a method aiming at simplifying likelihood inference by introducing more detailed hypothetical data. First, we explain the general method very brie y, and then give an application to epidemic inference. Let x denote the observed data and suppose we wish to estimate some parameter(s) , but that the log-likelihood `( x) is cumbersome to work with. Suppose further that estimation would be much simpler, or even trivial, if some hypothetical data y, containing x, were available. The EM-algorithm then suggests that one `pretend' this more detailed data is available and estimate the parameter(s). Then one computes the expected value of the hypothetical data given the observed data. This algorithm is repeated until the change in value of the parameter estimate is negligible. Using mathematical notation one should start with an initial parameter choice (0) . Then perform the E(xpectation)-step and M(aximization)-step repeatedly, where the Eand M-steps respectively consist of computing E-step : f ( x (j;1) ) = E (`( Y )jx (j;1) ) M-step : (j) = argmax f ( x (j;1) ): This algorithm is iterated until the di erence between (j) and (j+1) is negligible. The resulting value is the parameter estimate. The hard part of the EM-algorithm lies in inventing suitable imaginary data. How this is done varies from application to application. We now apply this method to an estimation problem in epidemics, where we consider a discrete time epidemic model within the class of chain-binomial models. For each integer k, let Xk and Yk respectively denote the number of susceptible and infectious individuals at time, or generation, k. The name chain-binomial comes from the way susceptible individuals are reduced (i.e. become infected): given Xk = xk and Yk = yk the number of susceptible individuals at k + 1, Xk+1, is binomially distributed with parameters xk and qy . This means that a susceptible individual in generation k remains susceptible at k + 1 if she escapes infection from each of the yk infectious individuals at k, and these non-infective events are independent and occur with probability q. Here we concentrate on estimation of q based on the data (x1 y1) : : : (xr yr ) we do not have to consider how long individuals are in the latent k state and infectious before becoming removed. The simplest and most studied model is known as the discrete time Reed-Frost model where it is assumed that the infected individuals become infectious immediately and are removed in the next generation implying that yk+1 = xk ; xk+1. All we assume here is that infections occur according to the chain-binomial model. We therefore introduce the notation ik+1 = xk ; xk+1 to denote the number of individuals that become infected in generation k + 1, contained in the data. The likelihood with respect to q of the data is r;1 Y xk k=1 y x +1 y i +1 xk+1 (q ) (1 ; q ) k k k k and the log-likelihood is given by `(q fxk yk g) = c + r;1 X k=1 (xk+1 log(qy ) + ik+1 log(1 ; qy )) : k k To maximize this expression is numerically complicated if the number of infectious individuals is high in some generations. The reason is that the derivative of the second term above then becomes messy. Suppose that rather than observing (xk+1 ik+1) we observed xk+1 0 xk+1 1 : : : xk+1 y , where xk+1 j is the number of individuals susceptiP ble in generation k who had j (out of yk ) infectious contacts. Note that j 1 xk+1 j = ik+1 and x;k+1 0 = xk+1. This data extension separates the probability (1 ; qy ) into the terms yj (1 ; q)j qy ;j . Estimation of this extended model is close to trivial. The log-likelihood is given by k k k k `(q fxk 0 xk 1 : : : xk y 1 k; yk g) = c + r;1 y XX k k=1 j =0 xk+1 j (j log(1 ; q) + (yk ; j ) log q) : Di erentiating this and equating the likelihood equation to 0 reveals that the MLestimator for the extended data is Pr;1 Py (y ; j )x Pr;1 y x ; Pr;1 Py jx k+1 j k=1 j =0 k = k=1 k k Pr;1k=1 j=0 k+1 j : q = Pr;1 Py ^ k=1 j =0 yk xk+1 j k=1 yk xk Note that the estimator is simply the number of contacts not resulting in infection divided by the total number of contacts. Unfortunately the data xk+1 1 : : : xk+1 y is not available. We therefore replace the unobserved quantities above by their expected values given the available data. For j 1 we have ;y (1 ; q)j qy ;j E Xk+1 j jfxk yk g q] = ik+1 j 1 ; qy : This follows because each of the yk individuals who were infected has a binomially distributed number of infectious contacts, a number conditioned on being positive. k k k k k k k P The expectation of the negative term in the numerator of q is therefore k ik+1(1 ; ^ y ). q)yk =(1 ; q The EM-algorithm starts with an initial guess q(0) and then updates the estimate iteratively according to the iteration function k Pr;1 i (1;q( 1))y k=1 k+1 1;q( 1) q(j) = 1 ; Pr;1 y x : j; yk j; k k=1 k k Iterating this function is very easy, since only the numerator has to be re-evaluated each iteration. In practice less than 10 iterations should su ce for the estimator to converge. The EM-algorithm has been used successfully in many areas of statistics. In the analysis of infectious disease data it seems suitable for several problems but often methods remain to be developed. In Becker (1997) the EM-algorithm is applied to several problems, with a particular interest in HIV/AIDS data. Exercises 10.1. Consider the data in Exercise 9.3 and adopt the same model assumptions. Estimate and give the standard error assuming only the nal state is observed. 10.2. During October/November 1967 an outbreak of respiratory disease occurred on the isolated island of Tristan da Cunha (see Section 8.2.3 of Becker, 1989, for more details on on the data). Among the 255 individuals of the island 40 infections occurred. Assume that one individual was infected from outside and that all other individuals were susceptible to the disease (reasonable assumption). Estimate and derive the standard error for the estimate under the simplifying assumption of a homogeneous uniformly mixing population). 10.3. The standard deviation of ^ based on complete observation of the epidemic process should be smaller than the standard deviation of ^, but how much? Derive an expression for the ratio of the standard errors for the nal size and complete data estimators. Compute the ratio numerically for 0 = 1:5 and 0 = 2:5. (Hint: Use the expressions for the standard errors in Corollaries 9.6 and 10.2 and their large population limits.) ~ 10.4. Prove that M (t) = X (t)(1+ =n)n+ n;X (t);Y (t) is a martingale (this martingale is used by Becker and Hasofer, 1997, to construct an estimating equation). (Hint: ~ ~ Look at a small time increment M (t + h) ; M (t) and show that its expectation, conditioned on X (t) and Y (t), equals 0.) 10.5. Table 10.1 below describes an outbreak of common cold in 664 households of size 5 (taken from Table 2.7, p 31, Becker, 1989). Each household has one introductory case, after this external infections can be neglected. Infections are observed in terms of generations and infected individuals become infectious immediately and remain so over one generation only (this implies that yk = ik using the notation of Section 10.2). A chain 1 ! 2 ! 1 hence implies that (y1 = 1 x1 = 4), y2 = 2 x2 = 2, y3 = 1 x3 = 1 and y4 = 0 x4 = 1 for this household. Use the EM-algorithm to estimate q, the probability to escape infection from one infectious individual over one generation. (When estimating q you should use all households by treating each chain separately. That is, in the iterative estimator q(k) you should sum also over all households.) You may of course use exact ML estimates if you prefer. Table 10.1. Outbreak chains for common cold in households of size 5 Chain Obs. frequency 1 423 1!1 131 1!1!1 36 1!2 24 1!1!1!1 14 1!1!2 8 1!2!1 11 1!3 3 1!1!1!1!1 4 1!1!1!2 2 1!1!2!1 2 1!1!3 2 1!2!1!1 3 1!2!2 1 1!3!1 0 1!4 0 11 Markov Chain Monte Carlo Markov Chain Monte Carlo (MCMC) is a computer-intensive statistical tool that has received considerable attention over the past few years. Using MCMC theory, it is often quite simple to write e cient algorithms for sampling from extremely complicated target distributions thus, it is not di cult to understand why these techniques have found important applications in a vast number of di erent areas. Although the literature on MCMC methods is growing rapidly, the excellent book by Gilks, Richardson and Spiegelhalter (1996) provides a good starting point for the interested reader. In the rst two sections of this chapter we present the main ideas of MCMC, and apply the techniques to two important situations: Calculating expectations with respect to complicated probability measures, and optimizing complicated functions. The optimization method described can sometimes be a good alternative to the EMalgorithm for calculating ML-estimators, but this point will not be developed further here. In Section 11.3 some practical issues are discussed, regarding e.g. convergence diagnostics. Finally, in Section 11.4 we describe a Bayesian approach for drawing inferences about basic parameters in stochastic epidemic models of the SIR type. (Most current applications of MCMC are oriented towards Bayesian inference, as are also those in the epidemic area. The example described here is taken from O'Neill and Roberts, 1999, but see also O'Neill et al., 199?.) 11.1 Description of the techniques The idea which lies behind Markov Chain Monte Carlo, rst introduced by Metropolis et al. (1953), is in fact very simple. Suppose that we wish to sample from some target distribution (x), x 2 E Rp, which is known only up to some multiplicative constant. In situations where is su ciently complex so that simple Monte Carlo methods cannot be used, an indirect sampling method would be to construct an aperiodic and irreducible Markov chain with state space E whose stationary distribution is given by . This means that after having run the chain to stationarity, we can extract a (dependent) sample of any desired size from the target distribution. Some important questions immediately present themselves: Is it always possible to construct a Markov chain with the desired properties? How do we nd a suitable starting point? How do we determine the burn-in period, i.e. the number of iterations needed until the chain is su ciently close to stationarity? What are the implications of the fact that there are dependencies in the sample? We answer the rst question right away, leaving the other issues until Section 11.3. Constructing a Markov chain with state space E and stationary distribution is surprisingly easy. We describe here the Metropolis-Hastings algorithm (see Hastings, 1970), an algorithm that is easy to understand and implement. The Markov chain X , = 0 1 2 : : : , is started at some suitable point in E . Given X = x, the next state X +1 is chosen by rst sampling a candidate point Y from some proposal distribution q( j x) and then accepting this candidate with probability (x Y ), where (y y (11.1) (x y) = min 1 (x)q(x jj x) : )q(y ) If Y is accepted then of course X +1 = Y , otherwise X +1 = X . As soon as a proposal distribution, suitable for the problem at hand, has been speci ed implementation of the Metropolis-Hastings algorithm is easy. Note that possibly nasty normalizing factors in the target distribution disappear, since only the ratio (y)= (x) is needed in the calculation of the acceptance probability (x y). We now argue that the distribution of X converges to the target distribution as ! 1. The transition probability is given by P (y j x) = q(y j x) (x y) and Z for all y 6= x (acceptance) P (x j x) = 1 ; q(y j x) (x y) dy (rejection): By checking separately the cases where the quotient in (11.1) is below one and above one, respectively, it follows immediately that (x)q(y j x) (x y) = (y)q(x j y) (y x) implying detailed balance for the chain: (x)P (y j x) = (y)P (x j y) for all x y 2 E . Integrate both sides of the detailed balance equation with respect to x to obtain Z (x)P (y j x) dx = (y) for all y 2 E . Thus is stationary for the process X . Under weak regularity conditions on the proposal distribution it can be shown that X actually converges in distribution, regardless of the choice of initial value. The performance of the algorithm depends heavily on the form of the proposal distribution q. For computational e ciency, q should be chosen so that it can be easily sampled and evaluated. On the other hand, q and must be related in such a way that the acceptance probability becomes reasonably large otherwise the chain might be mixing slowly (i.e. moving slowly around the support of ). Let us now brie y present three widely used updating schemes. The Metropolis algorithm assumes that q (y j x) = q (x j y ) for all x y 2 E . This is the case if e.g. the proposal only depends on the distance between the points x and y, q(y j x) = q(jy ;xj). Then the acceptance probability reduces to (x y) = min 1 (y) : (x) Another simple updating scheme, called the independence sampler, is obtained by putting q(y j x) = q(y), so that new candidate points are chosen without considering the present value of the chain. In this case, w(y (x y) = min 1 w(x) ) where w(x) = (x)=q(x). For the independence sampler to work well, q should be a fairly good approximation to . Finally we describe the Gibbs sampler. Suppose that E is a subset of the possibly high-dimensional space Rp. We run through the indices i, 1 i p, sequentially, and for a given index i we choose q(y j x) as follows: q(y j x) = (yi j x1 xi;1xi+1 xp) if yj = xj for all j 6= i, and q(y j x) = 0 otherwise. Thus the candidate point y is obtained by changing the ith coordinate of x according to the full conditional distribution. A quick calculation shows that (x y) = 1 for all x y 2 E , meaning that new candidates are always accepted! The Gibbs sampler is particularly well suited to deal with spatial models, where the full conditional distribution of a single site often depends only on the nearest neighbouring sites and can consequently be written down easily. 11.2 Important examples Calculating expectations The problem of calculating expectations of complicated high-dimensional distributions arises in many situations. Let X be a random variable with distribution (x), x 2 E Rp, and consider the expression E f (X )] = Z f (x) (x) dx for some function f of interest. If it is possible to use classical Monte Carlo simulation to draw an independent sample X , 1 n, with the correct distribution, then the time average n 1 X f (X ) n =1 (11.2) is guaranteed to provide an accurate approximation of E f (X )] if only the sample size n is large enough. On the other hand, in cases where straightforward Monte Carlo methods fail, we may instead use MCMC to obtain a dependent sample X , 1 n, and again form the time average (11.2). Under certain regularity conditions on the Markov chain (see e.g. Roberts and Smith, 1994), we may apply the ergodic theorem to show that the time average is consistent, i.e. n 1 X f (X ) ! E f (X )] n =1 almost surely as n ! 1. (A detailed presentation of ergodic theory lies beyond the scope of this text.) In practice, the values X from the burn-in period are usually discarded when calculating time averages. Stochastic annealing We now turn to another topic, namely optimization of non-standard functions using MCMC techniques. We are faced with the problem of minimizing a real-valued function f (x), x 2 E Rp, where for simplicity it is assumed that E consists of all the integer vectors inside some large box. Thus, every point x in E (excluding the boundary) has 2p nearest neighbours in E . Also, the optimal point, x say, is unique ^ by assumption. Implicitly we assume that there is no good deterministic algorithm available to lead us to the vicinity of the optimal point x, as is often the case in real^ life problems. We will demonstrate how a series of well chosen Metropolis algorithms can be used to solve the problem. The technique is known as stochastic (or simulated) annealing. Given > 0 we de ne the target distribution ; f (x) (x) = P e ; f (y) x 2 E: y2E e The following Markov chain X , 1, will have as its stationary distribution. Start anywhere in E . Given X = x, propose one of the nearest neighbours at random, and call this point Y . Then accept the new point with probability (x Y ), where ; (y (x y) = min 1 (x) = min 1 e; f (y);f (x)] : ) It is easily checked that this updating scheme corresponds to the Metropolis algorithm described in the last section, hence the theory tells us that is indeed the stationary distribution for X . The trick is now to run through a sequence of Metropolis algorithms according to the following scheme: Take > 0 and pick a starting point in E Run X close to stationarity Increase by a small amount, ! 0 With the old endpoint as starting point, run a new Markov chain X close to stationarity Increase 0 by a small amount, 0 ! 00 0 and so on. As becomes large, becomes more and more peaked at the optimal point x, since ^ ; f (x) ; f (x);f (^)] x (x) = P e e; f (y) = P e e; f (y);f (^)] x y 2E y2E which tends to 1 as ! 1 if x = x, and otherwise tends to 0. This indicates that ^ a carefully tuned stochastic annealing scheme should lead us to the desired optimal point. It is very di cult, however, to give any guidance regarding good updating rules for , and proper stopping criteria for the Markov chains. There are many general theoretical results available, but each speci c problem requires its own ne tuning and there will always be a certain amount of trial and error involved. 11.3 Practical implementation issues As soon as an MCMC algorithm suitable for the problem at hand has been determined, practical questions like the determination of starting points, burn-in periods and sample sizes have to be addressed. Here we brie y discuss these issues. The problem of estimating the expectation E f (X )], for some random variable X on E and some function f , by calculating time averages will serve as the main example throughout this section. Starting point First of all, we need a starting point for the Markov chain. A rapidly mixing chain will quickly nd its way no matter how the starting value is chosen. On the other hand, some chains converge to stationarity very slowly unless the starting point is chosen carefully. If several processors are available on the computer it is always a good idea to run chains in parallel, all of them starting at di erent values. Burn-in period In order to reduce the risk of inferential bias caused by the e ect of starting values, iterates during a certain initial period are usually discarded. The length of this burnin period depends on the rate at which the Markov chain converges to stationarity. Ideally, we would like to compute this convergence rate, and use it to estimate the number of iteration steps needed. This is usually not a tractable problem: the power of MCMC methods lies in their ability to solve extremely complicated statistical problems, and complex MCMC solutions typically involve chains that are very di cult to analyse theoretically. It is preferable to use convergence diagnostics (for a review, see Cowles and Carlin, 1994). Many di erent convergence diagnostics have been proposed in the literature, but they all have a common characteristic, namely that the sampler output is used in some way. Indeed, in many cases the output makes it obvious from the output when the initial transient period is over. We conclude by giving a simple rule of thumb that could be appropriate in some situations. If we are interested in estimating E f (X )] by computing time averages, then Geyer (1992) argues that since the burn-in period is likely to be less than 1% of the number of iterations needed to obtain adequate precision in the time average, we should concentrate on calculating the total run length needed and then just discard 1% of the iterations. Sample size How many iterations should we perform to ensure that our estimator is accurate enough? This question is also di cult to answer, because of the possible dependencies in the sample. We propose here some simple ideas that can be useful in particular circumstances. First, if the iterates X are indeed independent, then (assuming stationarity) the P p standard deviation of the time average n=1 f (X )=n is given by = n, where is the standard deviation of f (X ). Alternatively, if the series can be approximated by a rst order autoregressive process then the theory of time series tells us that the standard deviation of the time average is r1 + pn 1 ; where is the autocorrelation of the series f (X ), and the standard deviation as above. Another simple device used is to thin the observations by saving only every kth value, thus ensuring that the resulting sequence will consist of more or less indep pendent values. In that case the simple sample mean k=n will again do the job. Finally, an obvious ad hoc method for determining the required run length is to run several chains in parallel, all with di erent starting points, and compare the resulting time averages. We shall not be satis ed until all of them agree adequately. 11.4 Bayesian inference for epidemics In this nal section we indicate why the development of MCMC methods has had such a profound e ect on applied Bayesian statistics. Then, following O'Neill and Roberts (1999), we use MCMC to draw inferences on the infection probability when the nal sizes of several household epidemics of Reed-Frost type are observed. We have deliberately chosen a very simple example, just to introduce the reader to the subject. The main topic in O'Neill and Roberts (1999) is inference on the infection and removal rates in the Markovian standard SIR epidemic model, after having observed the removal times. This investigation is also carried out using Markov Chain Monte Carlo in a Bayesian framework. Similar results can be achieved using other methods, see Becker and Hasofer (1997), but as soon as the models become slightly more complicated, e.g. by the introduction of latent periods, the exibility in the MCMC approach becomes apparent. From a Bayesian perspective, there is no real di erence between observables and parameters of a statistical model: all are considered random quantities. Let D denote the observed data, and collect the model parameters and the missing data in a vector x. Then set up the joint distribution p(D x) over all random quantities. This distribution can be written p(D x) = p(x)L(D j x) where p(x) is the prior distribution of x and L(D j x) is the likelihood. Bayes' theorem is now used to determine the posterior distribution of x given the observed data: pxLD x p(x j D) = R p((y))L((D jjy))dy : This posterior distribution is the object of all Bayesian inference. The integral in the formula for p(x j D) above is in general very di cult to compute, especially in high dimensions. However, by using the Metropolis-Hastings algorithm for sampling from the target distribution (x) = p(x j D) this problem is overcome very elegantly, since, as already noted below Eq. (11.1), normalizing factors in this target distribution never need to be computed. This is the main reason why MCMC has become such an important tool in Bayesian statistics. Turning to the epidemic application, consider a population consisting of a number of households, each of size three. Start with one infectious individual in each household and run independent Reed-Frost epidemics. Denote the avoidance probability by q, so that 1 ; q is the probability that a given infective infects a given susceptible household member. Now, for j = 0 1 2, let Nj denote the number of households in which an epidemic of nal size j has occurred. Our task is to estimate the avoidance probability q given the data D = (N0 N1 N2). Recall from Section 1.2 that, for a given household, P(Z = 0) = q2 P(Z = 1) = 2(1 ; q)q2 P(Z = 2) = (1 ; q)2 + 2(1 ; q)2q: Note that the rst term in P(Z = 2) is the probability of having the entire household infected in one generation while the second term is the probability that it takes two ~ generations to infect the household. Denoting by N2 the (unobserved) number of two generation-epidemics in the population, it follows that the likelihood function for q can be written ~ ~ ~ L(qjD N2) = C (q2)N0 (2(1 ; q)q2)N1 ((1 ; q)2)N2;N2 (2(1 ; q)2 q)N2 ~ ~ = C 2N1 +N2 q2N0+2N1 +N2 (1 ; q)N1+2N2 : Readers familiar with basic Bayesian statistics will notice from this equation that if a Beta( ) distribution is chosen as the prior distribution of q, then the posterior distribution is given by ~ ~ p(q j D N2) Beta(2N0 + 2N1 + N2 + N1 + 2N2 + ): (11.3) Also, since the household epidemics are assumed to be independent, we have that q ~ p(N2 j D q) Bin N2 2q2+ 1 : (11.4) The Gibbs sampler of Section 11.1 can now be used to sample from the target dis~ ~ tribution (q N2) = p(q N2 j D), since we have the expressions (11.3) and (11.4) for ~ the corresponding conditional distributions. By introducing the extra variable N2 we ~ were able to write the likelihood function in a convenient form, but the actual N2values will probably be regarded as uninteresting and may therefore not be included in the sample output. Exercises 11.1. Show that the acceptance probability is 1 for the Gibbs sampler. 11.2. Implement the Gibbs sampler described in Section 11.4, and estimate the avoidance probability q from the hypothetical data given in the Table 11.1 below. All households had 1 initially infective and 2 initially susceptible individuals. Table 11.1. Household outbreaks Final Number of size households 0 39 1 26 2 35 12 Vaccination Perhaps the most important practical reason for the statistical analysis of epidemics lies in its application to vaccination policy. In the present chapter we focus on this topic. More precisely we consider the following question: who and how many individuals should be vaccinated to prevent future epidemic outbreaks? In Section 12.1 we study this question for the standard SIR epidemic, and in Section 12.2 for endemic diseases, focusing on estimates and neglecting uncertainties. The chapter concludes with a section devoted to the estimation of the vaccine e cacy, which measures the reduction in susceptibility resulting from vaccination. In Sections 12.1 and 12.3 we make use of the multitype epidemic presented in Chapter 6, where the types consist of vaccinated and unvaccinated individuals. 12.1 Estimating vaccination policies based on one epidemic Assume that the standard SIR epidemic En m( I ) is observed in a large community. If the epidemic was observed completely, we can estimate the basic reproduction number using methods presented in Section 9.2 (see also the exercises) if the nal size was observed, we use the methods of Section 10.1. Suppose now that a vaccine is available which reduces the transmission rate between an infectious and a vaccinated individual to (1 ; r)=n. The corresponding rate for a susceptible individual is =n, so r is the relative reduction, which is assumed to be known (its estimation is treated separately in Section 12.3). The important case of a perfect vaccine giving 100% and life-long immunity corresponds to r = 1. Based on the data from one outbreak, we wish to nd the proportion vc of the susceptible population, the critical vaccination coverage, that has to be vaccinated to prevent future outbreaks. We rst answer the question assuming to be known. Suppose a proportion v is vaccinated, then the community consists of two types of individuals, unvaccinated and vaccinated, and the multitype model of Chapter 6 is relevant. Adopting the notation of Chapter 6, we have two types of individual, 1 and 2 (unvaccinated and vaccinated) with 1 = 1 ; v and 2 = v. The infection parameters satisfy 11 = 21 = , 12 = 22 = (1 ; r) and 1 = 2 = we are implicitly assuming that vaccination only reduces the risk of becoming infected and not the infectivity once infected. The reproduction number, here denoted by Rv for obvious reasons, is then according to Section 6.2 the largest eigenvalue of the matrix with coe cients f i ij j g. For our speci c case this eigenvalue is simply Rv = (1 ; v) + (1 ; r)v: (12.1) From Section 6.2 we know that a major outbreak is impossible if Rv 1. In terms of the proportion vaccinated v, this is equivalent to v (1 ; 1= )=r (remember that = { the reproduction number prior to vaccination). This means that the critical vc 0.8 0.6 0.4 0.2 5 10 15 20 Figure 12.1: Graph of the critical vaccination coverage vc plotted against . vaccination (or immunity) coverage vc, above which the population is protected from major outbreaks, is vc = 1 1 ; 1 : r It may happen that vc > 1 (if r is small and large). The interpretation in this case is that the community is not protected from major outbreaks even when the whole community is vaccinated. The case with a perfect vaccine (r = 1) reduces to vc = 1 ; 1= . The situation where the whole community, and not only vaccinated individuals, is protected from the epidemic is known as herd immunity. In Figure 12.1 above vc is plotted against the basic reproduction number for the case of a perfect vaccine (i.e. r = 1). For example, if is known to equal 5, as may be the case for in uenza, then the critical vaccination coverage is vc = 1 ; 1=5 = 0:8 which means that at least 80% of the community should be vaccinated in order to prevent an outbreak. We now consider the case where is unknown. We then estimate it from observation of one epidemic outbreak (prior to vaccination) using theory from Chapter 9 or 10 depending on whether the epidemic is observed completely or only its nal state is observed. Either way we get an estimate ^ (in case of complete data ^ = ^ ^;1). We obviously estimate vc by vc = 1 1 ; 1 : ^r ^ (12.2) Using the -method, we obtain the consistent standard error s:e:(^c) = s:e:( ^)=r ^2. v ^) depends on the type of data. For nal size data it is given The standard error s:e:( 1=2 in Corollary 10.2 for complete data it equals (^;1 s:e:(^))2 + (^ s:e:(^;1))2 , applying the -method and the fact that the two parameter estimates are independent. A con dence interval for vc is obtained using the estimate and the standard error, together with the asymptotic normality of the estimator. Here it is most natural to use a one-sided con dence interval, since under-estimation has worse consequences than over-estimation. In a heterogeneous community, the question of vaccination is more complex: now the e ect of vaccination depends not only on the number vaccinated, but also on how individuals to be vaccinated are allocated. Suppose, for example, that the population consists of homogeneous individuals belonging to households where it is assumed that individuals mix at a higher rate within households, as in the model of Section 6.3. A relevant question is then how to select individuals for vaccination optimally, in the sense of most reducing the reproduction number. Is it better to spread vaccination over households or should one try to vaccinate certain households completely? Ball et al. (1997) show that for a sub-class of the model in Section 6.3 the optimal vaccination strategy is the `equalizing' strategy. This strategy picks individuals sequentially from households with the largest number of remaining susceptible individuals, thus equalizing the number of remaining susceptibles in households. They conjecture this strategy to be optimal for the model in general. If instead the population is heterogeneous due to individual heterogeneities, such as varying infectivity and susceptibility, the same question of who to vaccinate is relevant. In case of equal infectivity one should vaccinate individuals with the highest susceptibility, but if the infectivity also varies there is no simple solution. Then there is a trade-o between high susceptibility and high infectivity in the optimal selection procedure. The following important general question concerning estimation and vaccination was posed by Becker (e.g. Becker and Utev, 1998). Suppose an epidemic outbreak was observed and we want to draw inferences on who and how many to vaccinate. For a given model a vaccination strategy can often be derived, as with the standard SIR epidemic above. The question is how vaccination strategies derived from di erent model assumptions are related. That is, if a vaccination strategy was derived under the assumption that the observed epidemic was from model A, say, but model B is the true model, is the community still under herd immunity? As an example, suppose 60% in a large, completely susceptible, community was infected during the epidemic season. With previous terminology we have Z = 0:6 and we may assume that = 0 (i.e. there were only few initial infectives). Assuming the epidemic is the standard SIR epidemic we can apply results from Chapter 10 to estimate R0 = by ^ ; ln(1 ; Z )=Z = 1:53. The critical vaccination coverage for a vaccine resulting in 90% reduction of susceptibility (r = 0:9) is then estimated from (12.2) as vc = 0:9;1(1 ; 1:53;1) = 0:384. This means that at least 38.4% should ^ be vaccinated for the community to possess herd immunity. A relevant question is whether the community possesses herd immunity if a random sample of the community of this size is vaccinated, when the model assumptions are false. Suppose that the community consisted of households, all of size four to keep things simple, and that the rate of transmission is higher within households. Is the community then protected from future outbreaks? Or suppose that in fact 40% of the females and 80% of the males were infected (implying that males are more susceptible). Is the community then under herd immunity? The answer to the rst question for this speci c example is `yes', and to the second question `no'. There is no general answer for the rst type of question, i.e. whether neglecting the presence of households, can result in under-estimation of the critical vaccination coverage. Becker and Utev (1998) show that if all households are of equal size then the critical vaccination coverage is lower than if the data was collected from a homogeneous community without households. However, if households are of di erent sizes, as in real life, then the relation can go either way (the direction depends on the household structure and the proportion infected). If the critical vaccination coverage is derived taking account of individual heterogeneities (men and women in the example above) then this proportion is always larger than the vaccination coverage derived neglecting heterogeneities (Britton, 1998b). The natural conclusion is, of course, that heterogeneities should be taken into account. When doing this, one can also devise better vaccination strategies than picking individuals at random. The question is still important because it is reasonable to believe that some heterogeneities, social and/or individual, are not observable and therefore not taken into account. The results mentioned above then state that outbreaks may in fact occur in the community even when the proportion vaccinated exceeds vc, if vc is derived neglecting some relevant heterogeneities. 12.2 Estimating vaccination policies for endemic diseases The question of who and how many individuals to vaccinate is perhaps even more important for endemic diseases. The prime example of an endemic infectious disease which has been completely eradicated by vaccination is smallpox. Presently most childhood diseases are subject to vaccination policies in the western world, and these diseases are no longer endemic in most parts. It is not possible to vaccinate everyone for practical and ethical reasons (e.g. some religious communities oppose vaccination). To derive the necessary proportion vc to vaccinate (the critical vaccination coverage) is hence of great importance. Below, we derive estimates for the two endemic models presented in Chapter 8. Because of the rather complicated structure of these models we shall not derive standard errors but only estimates. Of course, the models are over-simpli ed in that the community is assumed to be homogeneous and individuals are assumed to mix uniformly. Further, any seasonal e ects are neglected, even though it is well-known that the beginning of school in the autumn usually implies outbreak peaks. E ects of vaccination policies, and their various estimation problems, for more realistic but deterministic models have been studied in many papers (e.g. Anderson and May, 1991, and Greenhalgh and Dietz, 1994). The SIR model with demography Let us rst consider the SIR model with demography of Section 8.1, which we recapitulate brie y. New individuals are born at rate n, an infectious individual has close contact with other individuals at rate (any speci c individual at rate =n) and becomes removed at rate . Finally, each individual dies at rate irrespective of her disease state. The bivariate Markov process (X Y ) = f(X (t) Y (t)) t 0g, which keeps track of the number of susceptibles and infectives respectively, is speci ed by the starting point (n n) and its transition rates given in Section 8.1. Suppose now that a perfect vaccine is available, i.e. a vaccine giving life-long 100% immunity, and that a proportion v of all individuals are vaccinated at birth. The only transition rate that is a ected is that susceptibles are born at rate (1 ; v)n (the remaining newborns are vaccinated and immune). We can apply the same model and results, with slight changes in the parameters as follows. Let n0 = n(1 ; v) so that susceptibles are born at rate n0. For the jump rates to be consistent with the rates of Section 8.1, we then have to write n0 in the transition rate for new infections, and this can be done if we introduce 0 = (1 ; v). Thus, we have the model of Section 8.1 with n0 and 0 replacing n and . Consequently the reproduction number Rv after the vaccination scheme is started, is given by 0 ; Rv = + = (1+ v) = (1 ; v)R0: From properties of the model it then follows that the only stationary point is the disease-free state if Rv < 1. This is equivalent to v > 1 ; 1=R0. The critical vaccination coverage is hence 1 vc = 1 ; R = 1 ; + : 0 If R0 is unknown it may be estimated prior to vaccination by observing the community in its stationary state. From Section 8.1 we know that in case of endemicity, the sta;; tionary point for population proportions is given by (^ y) = R0 1 (R0 ; 1)( R0);1 . x^ From this it immediately follows that R0 is estimated by the inverse of xobs, the ^ observed proportion of susceptibles: R0 = 1=xobs. To conclude, an estimate of the critical vaccination coverage vc necessary to prevent future outbreaks is vc = 1 ; xobs : ^ (12.3) Consider as an example measles in the western world prior to vaccination. On the average a fraction of approximately 1=15 7% had not yet experienced the disease and were thus susceptible. This implies that vc 93%, and a vaccination coverage ^ above this value will eradicate the disease. The vaccination coverage of measles in the United States has by now exceeded 90% and the disease is no longer endemic only smaller local outbreaks still occur. This shows that the estimate is in the right ball park even though it is estimated from a simple model neglecting heterogeneities in the community as well as social and geographic structure. Because most individuals will experience the disease before dying, and all individuals live equally long on the average, the community proportion of susceptibles will coincide with the average fraction of an individual's life that she is susceptible. That is, the average age at infection divided by the average lifetime. Conversely, the average age of infection is the average lifetime divided by R0. This explains the name `childhood disease': the (average) age of infection is small mainly because R0 is large, and not because of the special behaviour of children. For example, measles is believed to have R0 15, so with an average lifetime of 75 years the formula states that the average age at infection is 75/15=5 years. The SIS model In the SIS model (Section 8.2), the community is xed and new susceptibles arise not from births of new individuals but from infectious individuals recovering and becoming susceptible (in SIR models they become immune after recovery or die). An infectious individual has close contact with others at rate ( =n with any speci c individual) and an infectious individual recovers and becomes susceptible at rate . The transition rates for the induced Markov process are given in Section 8.2 where properties of the model are also presented. The basic reproduction number satis es R0 = = and the epidemic dies out quickly unless R0 > 1. Assume as in the previous paragraph that a perfect vaccine is available and that the proportion v of all individuals is vaccinated. This a ects the transition rate corresponding to infection: if there are i infectives the number of susceptibles is n(1 ; v) ; i (and not n ; i as it is prior to vaccination). However, if we replace n by n0 = n(1 ; v) and by 0 = (1 ; v ) the transition rates for the vaccinated community coincide with the rates of Section 8.2. From results of Section 8.2 it thus follows that the epidemic will surely become extinct if 0 = 1, which is equivalent to v 1 ; = . The critical vaccination coverage is therefore 1 vc = 1 ; = 1 ; R : 0 Prior to vaccination the proportion of susceptibles converges to = = 1=R0. If R0 is unknown it can hence be estimated by the inverse of xobs , the observed proportion of susceptibles for a community in its stationary state. An estimate for the critical vaccination coverage is thus vc = 1 ; xobs : ^ We see that this is the same estimate as in the SIR model with demography (12.3). The relation vc = 1 ; 1=R0 holds for a larger class of models including the models assuming homogeneous mixing and a homogeneous community. 12.3 Estimation of vaccine e cacy We now proceed to estimate the parameter r de ned in Section 12.1 as the relative reduction in the rate at which vaccinated individuals become infected by infectious individuals, compared with the rate for unvaccinated individuals (the rate for unvaccinated is =n and (1 ; r)=n for vaccinated). It is assumed that the infectivity of vaccinated individuals is unchanged. That is, if a vaccinated individual becomes infected she is infectious as she would have been if not vaccinated. We estimate r assuming that we observe the standard SIR epidemic in which part of the community is vaccinated and the rest are not. To keep things simple we assume a large community in which a negligible fraction is vaccinated. Alternative methods for estimating r can be derived from a matched pair study, for example with sibling pairs where one sibling is vaccinated and the other is not. Assume that a sample of size among the n susceptibles ( v = =n being negligible) were selected and vaccinated prior to the epidemic. After the epidemic outbreak has run its course, a proportion Zu of the unvaccinated and a proportion Zv of the vaccinated are found to have been infected. The initial proportion of infectives is assumed to be small. The overall proportion infected is Z = v Zv + (1 ; v )Zu Zu. According to Section 6.2, in case of a major outbreak (Zu Zv ) converges in probability to ( u v ), the positive solution of 1; 1; u v = = e; ( e;(1;r) vv +(1; v ) u ) ( v v +(1; v ) u ) : From these equations it follows that 1 ; r = log(1 ; v )= log(1 ; u). Replacing the limiting quantities by the corresponding observed values gives an estimate of r: (12.4) r = 1 ; log(1 ; Zv ) : ^ log(1 ; Zu) Up until now, we have not made use of the assumption that the proportion vaccinated v is small, and consequently the estimate is consistent without this assumption. A standard error for the estimate can also be derived for the general case using the central limit theorem of Ball and Clancy (1993) presented in Section 6.2. Here we treat the simpler case with v assumed to be small. This implies that Zu can be treated as deterministic in comparison with Zv . Because the proportion vaccinated is negligible these individuals do not a ect the overall proportion infected. Thus, each vaccinated individual behaves more or less independently and becomes infected with probability v = 1 ; e;(1;r) ( +(1; ) ). Consequently Zv is binomially distributed vv v u with parameters and v . Treating Zu as deterministic, we see from equation (12.4) that the variance of r equals the variance of log(1 ; Zv ) divided by (log(1 ; Zu))2. ^ Expanding log(1;Zv ) in a Taylor series, and applying the -method it follows that the variance of log(1 ; Zv ) approximately equals the variance of Zv divided by (1 ; v )2. Finally, the variance of Zv is v (1 ; v )= since Zv is binomially distributed, and this may be estimated by Zv (1 ; Zv )= . A consistent estimate of the standard deviation of r is thus ^ s:e:(^) = p r p Zv (1 ; Zv ) : (1 ; Zv )j log(1 ; Zu)j (12.5) As an example, suppose that in a large community 50% of the unvaccinated were infected while only 10% in the sample of = 200 vaccinated individuals were infected. This means Zu = 0:5 and Zv = 0:1. Equations (12.4) then gives r = 0:848 and ^ s:e:(^) = 0:034 from (12.5). r The parameter r measures the reduction in transmission which is a natural measure of the e cacy of a vaccine. Traditionally however, the de nition of the vaccine e cacy V E is V E = 1 ; Zv Zu using the terminology of the present section. This measure is not a very satisfactory single measure of the vaccine e cacy because it depends on the proportion vaccinated and on the community from which data come. That is, one would get a di erent estimate if a larger/smaller proportion were vaccinated prior to the epidemic, and also in a di erent community in which individuals mix at higher/lower rate. The parameter r is to be preferred, as it does not have these unsatisfactory properties. Exercises 12.1. Check that the formula for Rv , given in equation (12.1), actually gives the largest eigenvalue to the speci ed matrix. 12.2. As discussed in the introduction of Part II and in Exercise 9.4 the estimates of Part II can be modi ed to cover the case with initially immune individuals (for simplicity the text is written assuming no initially immune individuals). Suppose the proportion initially infectious is negligible ( 0) and that s is the initial proportion susceptible. Then it is not hard to show (simply by transforming parameters) that the ultimate proportion infected among the initially susceptibles converges in probability to a solution of the equation 1 ; e s to be compared with equation (4.2) for the case with no initially immunes (remember that = ). This implies that the estimates of the present section are not estimates of but of s. Derive estimates and con dence bounds for and vc assuming that the initial proportion susceptible s is known and where data consists of knowing the observed proportion infected among those initially suceptible. 12.3. Estimate and derive standard error for the critical vaccination coverage vc for the smallpox outbreak presented in Exercise 9.3. a) assuming complete data. b) assuming that only the nal state is observed (see Exercise 10.1). 12.4. Estimate vc for the Tristan da Cunha outbreak presented in Exercise 10.2. 12.5. Suppose that 10 years is the average age at infection of some childhood disease in a society with average life-length of 70 years. Assume that the SIR model with demography applies and derive an estimate of vc, the proportion necessary to vaccinate to surely prevent future outbreaks. 12.6. Suppose a hypothetical vaccine trial resulted in 16 infected individuals among the sample of 185 vaccinated individuals whereas 42% of the remaining (large number of) individuals were infected. Derive an estimate of the reduction parameter r and give a standard error for the estimate. ...
View Full Document

Ask a homework question - tutors are online