
# chap1 - CHAPTER 1 AIR POLLUTION AND PUBLIC HEALTH A CASE...


CHAPTER 1. AIR POLLUTION AND PUBLIC HEALTH: A CASE STUDY FOR REGRESSION ANALYSIS

## Fundamental Equations

Simple linear regression:

$$y_i = \alpha + \beta x_i + \epsilon_i, \quad i = 1, \ldots, n.$$

Multiple regression:

$$y_i = \sum_{j=1}^{p} x_{ij} \beta_j + \epsilon_i, \quad i = 1, \ldots, n.$$

The $\epsilon_i$ are random errors (typically with mean 0 and common variance $\sigma^2$).

## Questions

• Which variable should be $y$ and which should be $x$?
• Should either or both variables be transformed (for example, by taking logarithms) prior to analysis?
• Is the relationship between $y$ and $x$ linear, or should some nonlinear function be considered?
• Are there omitted variables whose inclusion might substantially change our conclusions about the form of the relationship?
• Are there outliers or influential values among either the $x$ or $y$ variables which may be distorting the interpretation of the data?
• Are the random errors truly independent, of equal variance, and normally distributed?
• Finally, and perhaps most problematically of all: having found a relationship between the two variables and tested its statistical significance, can this be taken as evidence of a causal effect?

## Background on Air Pollution and Health

It is widely recognized that high air pollution has a significant public health impact. However, many issues are still debated in the context of present-day air pollution regulations:

• Which precise pollutants are responsible (e.g. ozone, sulfur dioxide, fine particulate matter, coarse particulate matter)?
• Is there a threshold effect, i.e. a safe level below which air pollution has essentially no adverse consequences?
• Which subsets of the population are most affected, and in what ways (what kinds of deaths, or other adverse reactions such as asthma)?
• Exactly how should one quantify the overall effect?

## Early Studies: London in the 1950s

• Very high air pollution levels (20–30 times current EPA standards?) and sharp rises in deaths (e.g. the December 1952 smog in London is generally reckoned to have caused 4,000 “excess deaths” in a 4–5 day period)
• Graphs of daily deaths versus air pollution levels give some nice simple examples of linear regression relationships
• Research on “what really happened” continues to this day

Plots of the data show:

• Strong visible association between smoke levels (circles) and deaths (black dots), with the suggestion of a one-day lag (Fig. 1.2)
• A scatterplot shows a more direct relationship with high correlation (Fig. 1.3)
• However, more detailed plots from a different smog event show a number of possible complicating factors, e.g. temperatures were also very low on the days when deaths were high. We can also see some distinction among different age groups (Fig. 1.4)
• Individual scatterplots give more information, but just looking at correlations could be misleading for determining the true causal relationship (Fig. 1.5)
• Time series over several London winters show gradually decreasing deaths as smoke levels sharply decreased (Fig. 1.6)
• We can also look at scatterplots of the annual aggregated deaths (Fig. 1.7). Note the effect of the outlier due to the very cold 1963 winter.

## Modern Studies of Air Pollution and Health

Modern studies apply many of the same methods of analysis to much larger data sets. For example, Kelsall et al.∗ analyzed a 14-year series of daily mortality and air pollution in Philadelphia. They decomposed the observed daily time series of deaths into three components: (a) seasonal and long-term trend, (b) meteorology, (c) air pollution. They concluded that even though much of the variability can be attributed to (a) and (b), there is a strong enough residual effect under (c) to conclude that air pollution has a very significant effect, even at levels of air pollution much lower than the infamous London smogs.

∗ J.E. Kelsall, J.M. Samet, S.L. Zeger and J.
Xu, American Journal of Epidemiology 146, 750–762

The present study is based on a re-analysis of the same data and is intended to illuminate many of the issues, including some that remain controversial.

Figure 1.8. Time series plot of weekly deaths in Philadelphia, with smoothed lowess curve. The vertical dotted lines are placed to indicate the ends of years.

[Figure 1.8: two panels, 1974-1981 and 1981-1988, each plotting Deaths against Week.]

Figure 1.9. Scatterplots of daily deaths against four covariates, with fitted subsample averages (the text will describe the details of how these were calculated). TSP is Total Suspended Particulates (this measure has now been replaced in most studies by more specific measures of particles in different size ranges). SO2 is sulfur dioxide.

[Figure 1.9: four panels (a)–(d), plotting Deaths against Temperature, Dewpoint, TSP and SO2.]

Outline of analysis (more details in Chapter 5):

• The $y$ variable was taken to be the square root of the daily death count. The square root transformation is motivated in part by the fact that if the death counts have a Poisson distribution, which seems a reasonable intuitive assumption, then taking square roots stabilizes the variances (the variance of the square root is approximately 1/4 whatever the mean, provided the mean is not too small). See Section 5.3 for a more detailed discussion of transformations.
• Seasonal trend was handled by representing the smooth curve in Fig. 1.8 as a linear combination of 180 fixed basis functions. Since we believe that the acute effects of interest occur within at most one week of a high-pollution event, these seasonal covariates should not confound the pollution effect.
• Meteorological effects were handled by introducing linear and quadratic terms for both temperature and dewpoint, with additional terms for the range above 75°F and 60°F respectively. Moreover, to allow for delayed effects of up to four days, the current day’s values were supplemented by those lagged from one to four days. A variable selection was performed to remove insignificant terms which could still confound the pollution effects we are trying to measure. Section 5.2 in Chapter 5 has more detail on variable selection.
• Five pollutants were introduced both singly and in combination. Since we do not know which lags of pollutant are most relevant, we used all possible lags between 0 and 4 days, as well as averages of 2, 3, 4 and 5 consecutive days within this period. Among the 15 possible “exposure measures” thus generated (5 single-day lags plus 4 + 3 + 2 + 1 = 10 multi-day averages), the one with the most significant effect was selected and used in the subsequent analysis. This selection is open to possible “data snooping” criticisms.

## Conclusions

• The most significant exposure measure for TSP is the current day’s value. The t value (estimate divided by standard error) is 3.1, which is significant at the (two-sided) level of .002.
• The most significant exposure measure for SO2 is the current day’s value, with t = 3.3.
• The most significant exposure measure for NO2 is the 4-day lagged value, with t = 2.1.
• The most significant exposure measure for CO is the average of the 3-day and 4-day lags, with t = 2.8.
• The most significant exposure measure for O3 is the average of the current day’s value with those for lags 1 and 2. This leads to t = 2.9.
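As a rough check on the t values listed above, the sketch below converts each one to a two-sided p-value. It assumes the t statistic can be treated as standard normal, which seems plausible for a long daily series but is an assumption, not something the source states. The function name `two_sided_p` and the dictionary `reported_t` are illustrative names only.

```python
import math


def two_sided_p(t: float) -> float:
    """Two-sided p-value for a t statistic under a standard normal approximation.

    Assumption (not from the source): the residual degrees of freedom are
    large enough that the t distribution is effectively standard normal.
    """
    # For a standard normal Z, P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    return math.erfc(abs(t) / math.sqrt(2.0))


# t values reported for the most significant exposure measure of each pollutant.
reported_t = {"TSP": 3.1, "SO2": 3.3, "NO2": 2.1, "CO": 2.8, "O3": 2.9}

p_values = {name: two_sided_p(t) for name, t in reported_t.items()}

if __name__ == "__main__":
    for name, p in p_values.items():
        print(f"{name}: t = {reported_t[name]:.1f}, two-sided p = {p:.4f}")
```

Under this approximation the TSP value comes out close to the .002 level quoted above. These p-values make no adjustment for the fact that each exposure measure was chosen as the most significant of 15 candidates.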
All of these are statistically significant at the 5% level, but note the “selection effect” of comparing different measures of exposure and only picking out the most significant.

When different combinations of the variables are included, the results change. For instance, with TSP and SO2 together, the t statistics are 1.5 (TSP) and 1.4 (SO2). Other analyses are:

• TSP, O3 together: t = 2.8 (TSP), 2.9 (O3).
• TSP, SO2, O3 together: t = 1.1, 1.6, 2.9.
• TSP, SO2, NO2, CO, O3 all included: t = 1.2, 1.7, 0.4, 2.1, 2.7.

We can also look for nonlinear relationships between air pollution and mortality, which are fitted by more complicated versions of regression analysis. For example, Fig. 1.10 shows some possible piecewise linear relationships between TSP and mortality, suggesting higher slopes at higher TSP levels (relevant to determination of standards).

The conclusions about multiple pollutants and nonlinear relationships suggest that the truth about how air pollution affects mortality may be more complicated than a simple linear relationship based on a single pollutant would suggest.

[Figure 1.10: panels (a) and (b), each plotting Effect against TSP.]

Other types of study lead to different types of conclusion. For example, Fig. 1.11 is based on a famous prospective study∗ in which adjusted mortality rate was plotted against median level of fine particulate matter in each of 51 cities. The difficulties here include that measurement of particulate matter is very imprecise if we are only using a single value to represent a long period of study, and that there are many other possible reasons why the death rates in these cities might differ (the ecological bias problem).

∗ Pope, C.A., Thun, M.J., Namboodiri, M.M., Dockery, D.W., Evans, J.S., Speizer, F.E. and Heath, C.W. (1995), Particulate air pollution as a predictor of mortality in a prospective study of U.S. adults. Am. J. Respir. Crit. Care Med. 151, 669–674

## Summary of Chapter

• Air pollution is a major public health issue
• The relationship between air pollution (particulate matter, sulfur dioxide, etc.) and mortality can be examined through regression relationships
• Simple analyses (individual-day analyses of high pollution episodes, analyses based on annual summary statistics from long time series) are interesting exercises but don’t really help understand the full phenomenon
• Therefore, in recent years attention has moved to the daily analyses of long time series, which are much more informative but also pose many complicated problems of interpretation
• There are other types of data sets (e.g. prospective studies) that add to the information (and the confusion)
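To make the regression relationships mentioned in the summary concrete, here is a minimal least-squares sketch of the simple linear regression model from the Fundamental Equations, in plain Python. The function name `fit_simple_regression` and the smoke and deaths numbers are hypothetical, invented for illustration; they are not the London or Philadelphia data.

```python
from statistics import fmean


def fit_simple_regression(x, y):
    """Least-squares estimates (alpha, beta) for y_i = alpha + beta * x_i + error_i."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar, y_bar = fmean(x), fmean(y)
    # Sum of squares of x about its mean, and cross-products of x and y.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must not be constant")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = s_xy / s_xx              # slope estimate
    alpha = y_bar - beta * x_bar    # intercept estimate
    return alpha, beta


# Hypothetical data (not from the source): smoke level and daily deaths.
smoke = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
deaths = [260, 275, 310, 330, 390, 450]

alpha_hat, beta_hat = fit_simple_regression(smoke, deaths)

if __name__ == "__main__":
    print(f"intercept = {alpha_hat:.2f}, slope = {beta_hat:.2f}")
```

A fitted slope of this kind describes association only. The questions raised earlier about omitted variables, outliers and causality still apply to it.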

## This note was uploaded on 11/17/2011 for the course STOR 664 taught by Professor Staff during the Fall '11 term at UNC.
