Case-control Studies
Readings: Chapter 11, Oleckno
EPID 301, Kristan Aronson, March 18, 2010

Study Designs

Case-control Study: Learning Objectives
Understand:
- Subject selection
- Exposure measurement
- Outcome measurement
- Effect measures
Describe:
- Strengths
- Weaknesses
- Variants of case-control study

Case-control Study: Key Features
- Analytic study
- Selection of subjects based on outcome status (i.e. diseased vs. non-diseased)
- Exposure ascertained retrospectively (usually)
- Cannot provide direct estimation of incidence of outcome

Case-control study
More modest and a little bit riskier than a cohort "vintage", but you don't have to wait to drink it, it is much less expensive, and sometimes surprisingly good.

When to conduct a case-control study
- Long time from exposure to outcome
- Rare outcome: efficient design
- Outcome where medical care is sought (e.g. hip fracture, breast cancer)

Case-control Design
[Diagram: Source population -> Have the outcome (cases): exposed / not exposed; Do not have the outcome (controls): exposed / not exposed]

Design
[Diagram: Risk factors for breast cancer (e.g. sociodemographics, reproductive history, family history, OCs, etc.). Women in Kingston -> New breast cancer: exposed / not exposed; No breast cancer: exposed / not exposed]

Case-control studies: Source Population
- Theoretically nested within an underlying cohort or base population
- Controls are intended to represent the exposure distribution in the underlying study base (not necessarily a random sample of the base)
- From the study base, select cases and controls -> representative of the same base population

Primary study base
- Base is defined by the population experience that the investigator targets
- Cases are the subjects in the base who develop the outcome
- Controls are sampled to represent the underlying primary study base
- Controls represent the exposure distribution in the study base

Source Population
- Defined to ensure cases and controls have similar characteristics with the exception of exposure status
- Eligibility criteria:
  - Demographic characteristics
  - Specific time-frame of recruitment
  - Outcome defined a priori by accepted diagnostic criteria

[Diagram, built up over several slides: the source population is divided into exposed and unexposed; cases arise from it, and controls are a sample of it]
- Controls = sample of the denominator, representative with regard to exposure
- Intuitively, if the frequency of exposure is higher among cases than controls, then the incidence rate will probably be higher among exposed than non-exposed.

Selection of Cases: Incident vs. Prevalent
Incident, i.e. newly diagnosed cases
- Preferred, as the focus is on risk of outcome, not survival with outcome
- Enables estimation of relative incidence of disease in exposed vs. unexposed
- But may take time to recruit
Prevalent, i.e. cases with existing disease
- Practically easier, as cases are already available for study
- Enables estimation of relative prevalence of outcome in exposed vs. unexposed
- Unknown whether association is due to risk of developing outcome or survival with outcome

Selection of Cases: Source
- Hospitals
- Other medical facilities
- Disease registries
- General population: ideal but often impractical

Selection of Controls
Principle: Controls estimate the frequency and degree of exposure that would be expected if there is no association between exposure and outcome
- Selected from the same population "at risk" of outcome as the cases (controls are individuals who would have become cases in your study had they developed the outcome during the study period)
- Level of exposure in control group should represent level in source population
- Must be able to measure exposure in a comparable manner
- Characteristics similar to cases with the exception of exposure status

Population-based controls
- The best control group is a random sample of individuals from the same source population as the cases who have not developed the disease
- Population-based controls are the best way to ensure that the distribution of exposure among the controls is representative of the base population
- Methods: random digit dialing or canvassing households

Hospital/clinic controls
- Hospital controls are a commonly used source
- Hospital controls may not be representative of exposure rates in the target population
- The use of other ill persons as controls will provide a valid result only if their illness is unrelated to the exposure in question

Benefits of using hospital controls
- Convenient
- Cheaper
- Numerous
- Possibly higher response
- When a population-based case registry is not available, hospital controls may represent the sub-population from which the cases arose

Selection of Controls: Source
Depends on:
- Source of cases
- Cost
- Resources available
Sources:
- Random or probability sample of source population
- Electoral lists, schools, insurance company lists, places of employment, military service
- Neighbourhood, schoolmates, fellow workers
- Friends, siblings
- Hospitalized: other patients except those with outcome
- Hospitalized: with specific diagnosis

Selection of Controls:
Hospital
  Advantages: easy to identify; high level of cooperation; comparable recall
  Disadvantages: unclear if they represent the source population; generalizability
Population
  Advantages: best when cases are population based
  Disadvantages: high cost/effort; less motivated to co-operate
Other (e.g. friends, relatives, spouse)
  Advantages: highly motivated
  Disadvantages: potential overmatching; generalizability

Selection of Controls: To Match or not to Match
- Selection of controls so that they are similar to cases in certain characteristics (e.g. age, sex, possibly other factors): only match on strong confounders; others can be controlled for in the analysis
- Theoretically accounts for confounding
- Cannot study association between matched characteristic and outcome

Selection of Controls: Matching
- Individual
- Frequency (or group)
But:
- Overmatching reduces study power
- Additional costs
- Potentially cases excluded if no match found

Selection of Controls: More than one control per case?
- Increases statistical power
More than 4 controls per case?
- Increase in power negligible
- Not cost-effective
Multiple control groups?
- Using controls from different sources may add credibility to results
- May also be logistically difficult and compromise results

Measuring exposure
- Ideally, measure exposure in relation to the biologically effective time window
- Assumption that exposure measured at time of study is the same as when the disease process began
- Sometimes can use existing records or biomarkers
- Consider: presence/absence, duration, intensity

Measuring exposure: Methods
- Questionnaire
- Existing records
- Lab tests (even biomarkers of exposure may be influenced by the onset of the disease or may not represent the relevant exposure window)
- Physical measurements
- Clinical examinations / procedures
- Environment
- Genetic factors (do not change over time, and the efficiency of a case-control design results in genetic analysis of fewer samples)

Effect Measures: 2 x 2 Table Analysis
             Disease   No Disease   Total
Exposed         A          B         A+B
Unexposed       C          D         C+D
Total          A+C        B+D         N

Odds that case exposed = A/C
Odds that control exposed = B/D
Odds Ratio = AD/BC

Odds Ratio (OR)
- Ratio that measures odds of exposure for cases compared to controls
- e.g. study of the association between Diabetes Mellitus (exposure) and Cataract Formation (outcome)
                         Cataracts
Diabetes Mellitus    Cases   Controls   Total
Yes                    55       84       139
No                    552     1927      2479
Total                 607     2011      2618

Odds that case exposed = A/C = 55/552 = 0.10
Odds that control exposed = B/D = 84/1927 = 0.04
Odds Ratio = (AD)/(BC) = (55)(1927) / ((84)(552)) = 2.3

Odds Ratio (OR)
- Interpretation: The odds of exposure for cases are 2.3 times the odds of exposure for controls
- If interpreted as RR: Those with Diabetes Mellitus are 2.3 times as likely to develop cataracts as those without Diabetes Mellitus

Internal Validity
- Selection bias
- Information bias
- Confounding
Bias and confounding are types of systematic error in the design, conduct or analysis of an epidemiologic study that result in a parameter estimate that does not represent the true effect in the target population
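Returning to the diabetes and cataract table above, the odds ratio arithmetic can be checked with a few lines of Python. This is only a minimal sketch: the function name `odds_ratio` and the argument order are illustrative choices, not something taken from the slides.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 case-control table.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    odds_case_exposed = a / c      # odds that a case was exposed
    odds_control_exposed = b / d   # odds that a control was exposed
    return odds_case_exposed / odds_control_exposed


# Counts from the diabetes mellitus / cataract table.
or_cataract = odds_ratio(a=55, b=84, c=552, d=1927)
print(round(or_cataract, 1))  # 2.3

# The cross-product form AD/BC gives the same value.
cross_product = (55 * 1927) / (84 * 552)
```

Dividing the two odds and taking the cross-product AD/BC are algebraically the same calculation, which is why either form can be used on the table.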
- Selection bias: differential access to the study population
- Information bias: inaccuracy in measurement or classification
- Confounding bias: unfair comparison

Bias: Any trend in the collection, analysis, interpretation, publication or review of data that can lead to conclusions that are systematically different from the truth. (Last J. 2001)

Selection bias in case-control studies
- Occurs when there is a systematic difference between the characteristics of those selected for the study and those who are not; the difference must be according to BOTH exposure and outcome
- Occurs when exposure status of cases or controls influences whether or not they are included as subjects in the study
- For example: differential access to the study population according to exposure-disease groups

Example of selection bias in a case-control study
Exposure affects the ascertainment of cases ("detection bias"). For example, doctors are more likely to suspect the presence of heart disease in someone who is overweight, hypertensive, and/or smokes cigarettes, so a middle-aged person with these characteristics who comes to a doctor complaining of chest pain may be more likely to have a thorough workup for angina pectoris than someone without heart disease risk factors.

Selection bias in case-control studies
- Case-control studies are highly vulnerable to selection bias, particularly in selection of the control group
- The purpose of the control group is to estimate exposure in the base population
- Selection bias results if control selection is not neutral with respect to exposure

Bias
- Information bias: lack of accurate measurements of exposure and/or confounders
- Recall bias: cases are more likely to search for a cause for their disease and are therefore more likely to report an exposure. NOTE: Recall error can exist without recall bias!
- Interviewer/investigator bias: more probing of cases for exposure of interest

Bias
- May be present without investigator being aware
- Sources may be difficult to identify
- Influence may be difficult to assess: magnitude and direction

Advantages of c/c study
- Efficient for rare outcomes and when there is a long interval between exposure and outcome
- Can study multiple exposures and collect more detailed information (may not be possible in cohort studies, where the large number of endpoints and number of subjects may limit exposures to those relevant to a single disease)
- Can be quick and easy to complete, relatively cost-effective, rapid response to new exposures of interest
- Usually requires a smaller study population than a cohort study

Disadvantages of c/c study
- Sometimes unable to ascertain temporality
- Cannot directly estimate incidence of outcome, and therefore cannot directly estimate risk
- Retrospective exposure assessment is extremely challenging
- Not suitable for studying rare exposures
- Potential for biases, especially recall and selection

Advantages of Case-Control Design
- Relatively inexpensive
- Good for diseases with long latency
- Optimal for rare diseases
- Multiple etiologic factors evaluated for single disease
- Shorter time
- Smaller sample

Limitations of Case-Control Design
- Identifying controls may be difficult
- Temporal relationship between exposure & disease difficult to establish
- Prone to more bias compared with other study designs

Limitations of Case-Control Design Cont'd
- Difficult to determine representativeness of cases and controls
- Unless study is truly population-based, can't measure incidence of disease
- Inefficient for rare exposures (despite a large number of cases, may still end up with few exposed)

Comparing Study Designs
Cohort
- Compare exposed and unexposed individuals
- Estimate incidence rates of disease/outcome in exposed and unexposed
- Follow large population over time
Case-control
- Compare individuals with and without disease/outcome
- Cannot estimate incidence rates but estimate proportion exposed in those with and without the disease
- Smaller study population studied retrospectively
Cross-sectional
- Determine presence/absence of exposure and outcome in study population
- Estimate prevalence of disease in exposed/unexposed and prevalence of exposure in persons with/without disease
- Study population studied at point in time

Comparing Study Designs
Cohort
- Able to establish temporality
- Desirable when exposures rare
Case-control
- Relatively inexpensive
- Establishing temporality can be challenging
- Desirable when disease/outcome rare
Cross-sectional
- Relatively inexpensive
- Unable to establish temporality

EXAMPLE
Public Law (US) 103-43, June 10, 1993
The Director of the National Cancer Institute, in collaboration with the Director of the National Institute of Environmental Health Sciences, shall conduct a case-control study to assess biological markers of environmental and other potential risk factors contributing to the incidence of breast cancer in:
A. the Counties of Nassau and Suffolk, in the State of New York, and
B. 2 other counties with high rates (one in NY and one in Connecticut)

Principal Investigator and Team Members
- Marilie Gammon, University of North Carolina
- Regina Santella, Columbia University, New York, NY
- Mary Wolff, Mount Sinai School of Medicine, New York, NY
- And many, many other collaborators and team members

The Long Island Breast Cancer Study Project: Description of a Multi-Institutional Collaboration to Identify Environmental Risk Factors for Breast Cancer. Breast Cancer Research and Treatment 74; 3: 2002

Study Population
- 1508 women newly diagnosed with breast cancer from New York cancer registry
- 1556 "population-based" control women without breast cancer from random-digit dialing and Medicare rosters

Data Collection Protocol
- Women interviewed in homes about socio-demographics, reproductive history, diet, use of pesticides, medical history, use of hormones, family history of cancer, body size changes over lifetime, physical activity
- Women provided blood and urine samples
- Dust, tap water, and soil sampling

Community Participation in the LI Study
- Town meetings held with community
- Served as advisors on the case-control study and the Project as a whole
- Cancer Information Service outreach office on LI
- Advocates participated in peer-review of grants
- Continued participation of PI and NCI in LI network

Findings: Most Established Risk Factors Confirmed (Magnitude of Increased Risk)
Excess risk associated with:
- Increasing age
- Family history of breast cancer
- First birth at late age (>28 years)
- Never having given birth
- Higher income
No excess risk for:
- Early age at menarche
- Higher education attainment

Hypotheses: To Determine If . . .
Organochlorines and polycyclic aromatic hydrocarbons are associated with an increased risk of breast cancer among women in Long Island

Organochlorines
- Included pesticides DDT, DDE (a metabolite of DDT), chlordane, dieldrin, and
- Polychlorinated biphenyls: chemicals found in coolants and lubricants in transformers, capacitors and other electrical equipment

Polycyclic Aromatic Hydrocarbons (PAHs)
Caused by incomplete combustion of chemicals including:
- Diesel fuel
- Cigarette smoke
- Vehicle exhaust
- Smoked/grilled foods

Why Study These Chemicals?
- Still ubiquitous in environment even though many of these compounds are no longer used
- Measurable levels in serum
- Persist in body for long time periods

Previous Link to Cancer
- DDT and related metabolites cause liver cancer in rats
- DDT and PCBs have estrogenic activity in human tissues
- Estrogen is thought to be one of the most important determinants of breast cancer
- PAHs cause breast cancer in rodents

Organochlorines
- Were measured in blood
- Blood levels correspond well to levels in tissues where organochlorines are stored (fat tissues)
- Current levels now reflect cumulative levels throughout life

Odds Ratio for Breast Cancer
Controlled for age, race, history of infertility problems, history of benign breast disease

More Findings
- No dose-response relationship
- No increased risk associated with organochlorines among women who:
  - Had not breastfed
  - Were overweight
  - Were post-menopausal
  - Were long-term residents of LI
  - Had invasive vs. in situ cancers
  - Had estrogen-receptor positive vs. negative tumor

Bottom Line
Findings do not support the hypothesis that organochlorines increase breast cancer risk among Long Island women

Polycyclic Aromatic Hydrocarbons
- PAH-DNA adducts measured in blood
- PAH adducts are metabolites of PAH that have bound to DNA
- Considered to reflect combination of exposure and capacity to repair DNA
- PAH adducts now reflect exposures perhaps within the past 3 years, but not entirely known

Odds Ratios for Breast Cancer
Controlled for age, race, history of infertility problems, season of blood donation, religion, parity, total # months of lactation, body mass, 1st degree family history of breast cancer, and age at first birth

[Bar chart: odds ratio (y-axis, 0 to 2) by PAH-DNA adduct level, from lowest 1/5 to highest 1/5; bar values as extracted: 1.45, 1, 1.48, 1.01, 1.49]

Conclusions About PAHs
- No dose-response relationship
- No consistent association with two main sources of PAH: active or passive cigarette smoking or eating grilled and smoked foods
- These findings need to be replicated in other studies
- 50% increased risk considered modest:
  - Smoking increases risk of lung cancer by 900-1000%
  - A family history of breast cancer increases risk by 100-200%

Analyze the data
Source population
              Cases   Pop.   Incidence
Exposed (E)     a      P1    I1 = a / P1
Unexposed (Ē)   c      P0    I0 = c / P0

Controls = sample of the source population
              Cases   Controls
Exposed (E)     a        b
Unexposed (Ē)   c        d

$$\frac{b}{d} = \frac{P_1}{P_0}$$

$$\frac{I_1}{I_0} = \frac{a/P_1}{c/P_0} = \frac{a \cdot P_0}{c \cdot P_1} = \frac{a \cdot d}{c \cdot b} = \frac{a/c}{b/d}, \quad \text{since } \frac{d}{b} = \frac{P_0}{P_1}$$

Example
Source population
              Cases   Pop.
Exposed (E)     30    1000
Unexposed (Ē)   10    1000

Cohort:
$$\frac{I_1}{I_0} = \frac{30/1000}{10/1000} = 3$$

Case-control
              Cases   Controls
Exposed (E)     30       20
Unexposed (Ē)   10       20

$$OR = \frac{30/10}{20/20} = 3$$

Reminder: Use of Odds Ratios
              Cases   Controls   Total
Exposed         a        b        a+b
Not exposed     c        d        c+d
Total          a+c      b+d        n

- The margins of this table do not represent the distribution of disease and exposure in the target population - Why?
- If there are no selection biases, the ratio of exposed to unexposed among the case and control groups is representative of what is happening in the target population - Why?

(Exposure) Odds Ratio: (a/c) / (b/d) = (Disease) Odds Ratio: (a/b) / (c/d)

Rare Disease Assumption
If disease is rare: OR ≈ RR

             Disease   No disease
Exposed         a          b
Unexposed       c          d

$$RR = \frac{a/(a+b)}{c/(c+d)}$$

If disease is rare, $a + b \approx b$ and $c + d \approx d$, so

$$RR \approx \frac{a/b}{c/d} = OR$$

Controlling Confounding
- Pair matching
- Frequency matching
- Stratification
- Multivariate analysis

You may decide to match on variables that are easy to know up-front (e.g., age, sex). You may be able to stratify your analyses if the number of (additional) confounding variables is small. Otherwise, multivariate analysis is the best choice.

Controlling Confounding
Stratification
- Separate sample into categories of the confounding variable
- Calculate stratum-specific odds ratios
- Test that these ORs are homogeneous
- Calculate an adjusted odds ratio that combines the stratum-specific odds ratios weighted by the number in each category
- Known as the Mantel-Haenszel odds ratio

Confounding Example:
Is being male associated with a higher risk of malaria infection?
           Cases   Controls
Males        88       68
Females      62       82
Total       150      150

Crude odds ratio, gender effect: (88/68) / (62/82) = 1.71

Outdoor Occupations
           Cases   Controls
Males        53       15
Females      10        3
Total        63       18
Stratum-specific odds ratio: 1.06

Indoor Occupations
           Cases   Controls
Males        35       53
Females      52       79
Total        87      132
Stratum-specific odds ratio: 1.00

Confounding Example:
Is being male associated with a higher risk of malaria infection?
Final reportable result: the Mantel-Haenszel odds ratio (OR_MH) for the association between gender and malaria, controlling for occupation, is 1.01
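The crude, stratum-specific and Mantel-Haenszel odds ratios in the malaria example can be reproduced with a short Python sketch. The slides do not give the formula, so the standard Mantel-Haenszel estimator, sum(a*d/n) / sum(b*c/n) over strata, is assumed here; the function names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for one 2 x 2 table.

    a = male cases,   b = male controls,
    c = female cases, d = female controls.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio over a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


outdoor = (53, 15, 10, 3)   # outdoor occupations
indoor = (35, 53, 52, 79)   # indoor occupations
crude = (88, 68, 62, 82)    # both strata collapsed

print(round(odds_ratio(*crude), 2))                     # 1.71
print(round(odds_ratio(*outdoor), 2))                   # 1.06
print(round(odds_ratio(*indoor), 2))                    # 1.0
print(round(mantel_haenszel_or([outdoor, indoor]), 2))  # 1.01
```

The crude estimate (1.71) sits well away from both stratum-specific estimates and from the adjusted estimate (1.01), which is the pattern the example uses to show confounding by occupation.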
Controlling Confounding
Multivariable Methods
- Efficient alternatives to stratification, which can only handle a very few confounding variables
- Use specific mathematical models
- Multiple logistic regression is most common for case-control studies

Example: Case-control study of Chlorination By-Products and Colorectal Cancer
(King WD et al. Cancer Epi Biomarkers and Prevention 2000) Colon Cancer: Males
Trihalomethanes >=50 µg/liter (yr)   Adjusted* OR   95% CI
0-9 years                               1.00         Ref
10-19 years                             1.13         0.86-1.50
20-34 years                             1.31         0.94-1.81
>=35 years                              1.68         1.02-2.76
* Adjusted for age, sex, education, BMI, energy intake, cholesterol, calcium, alcohol, coffee

Analysis/interpretation of Case-Control Studies:
- Identify subjects based on disease status
- Issues of bias, confounding and measurement error apply
- Selection of appropriate controls is crucial, as they are meant to represent the exposure experience of the target population from which the cases arose
- Odds ratios are the main effect measure used for case-control studies
- Design can be matched to control for confounding, or
- You can adjust for confounding by stratifying the data on the basis of the confounder, or
- You can conduct multivariate analyses to simultaneously control for multiple confounders

Defining and assessing effect modification

Effect Modification
- Along with confounding and bias, this is another key concept in epidemiology
- Occurs when the direction or magnitude of an association between the study exposure and outcome varies at different levels of a third factor (the effect modifier)

Effect Modification - Absent
Modifier   Exposure   Incidence Rate (per 1000 person years)   Relative Risk
No         No                      10.0                             1.0
No         Yes                     20.0                             2.0
Yes        No                      25.0                             1.0
Yes        Yes                     50.0                             2.0

Effect Modification - Present
Modifier   Exposure   Incidence Rate (per 1000 person years)   Relative Risk
No         No                      10.0                             1.0
No         Yes                     20.0                             2.0
Yes        No                      25.0                             1.0
Yes        Yes                    125.0                             5.0

Association between asbestos and lung cancer: is it modified by smoking status?
              No Asbestos Exposure    Asbestos Exposure
Non-Smokers   OR = 1.0 (reference)    OR = 5.0
Smokers       OR = 10.0               OR = 50.0

This table reports the odds ratios for the association between asbestos and lung cancer stratified by smoking status.
- Are the results the same in non-smokers and smokers?
- What is the smoking-no asbestos estimate telling us?
- What do you think we mean when we say that the joint effect of smoking and asbestos is synergistic?

Detecting Effect Modification
1. Observe stratum-specific odds ratios to see if they differ in magnitude or direction
2. Consider whether effect modification makes biologic sense
3. Perform a test of heterogeneity of the odds ratios
4. If effect modification exists, stratum-specific results should be reported

When is effect modification present?

Example: Case-control study of Chlorination By-Products and Colorectal Cancer
(King WD et al. Cancer Epi Biomarkers and Prevention 2000) Colon Cancer
Trihalomethanes >=50 µg/liter (yr)   Males OR* (95% CI)   Females OR* (95% CI)
0-9 years                            1.00 (Ref)           1.00 (Ref)
10-19 years                          1.13 (0.86-1.50)     0.77 (0.55-1.07)
20-34 years                          1.31 (0.94-1.81)     0.63 (0.42-0.93)
>=35 years                           1.68 (1.02-2.76)     1.02 (0.55-1.90)
* Adjusted for age, education, BMI, energy intake, cholesterol, calcium, alcohol, coffee

Anticipating effect modification and its implications

Anticipating Effect Modification
What variable modifies the association between:
- Lifestyle or environmental hazards and the development of cancer
- Oral contraceptive use and the occurrence of a myocardial infarction

Effect Modification and Public Health
- Diabetes is a stronger risk factor for coronary heart disease in women than it is in men
- Which is the effect modifier, diabetes or sex?
- Implies a hierarchy among the independent variables
- From a preventive standpoint, the variable not amenable to modification is considered the effect modifier (e.g. a gene)

Effect Modification
What are the Public Health Implications?
1. Influenza leads to serious complications, but those at highest risk are the young, elderly, and people with heart and lung disorders.
2. The interaction between aspirin and age on risk of Reye's syndrome after an influenza-like illness.
3. Both driving and alcohol consumption are risk factors for injury, but their combination is much more lethal than either on its own.

Assessing study results for presence of confounding and effect modification

Confounding versus Effect Modification
From Oleckno:
- Confounding is an annoyance that needs to be controlled so we can isolate the association between our exposure and outcome
- Effect modification is a real effect that helps elucidate the relationship between an exposure and outcome in the presence of other factors and therefore needs to be described

How do we analyze our data for confounding and interaction?
Is the association between exposure X and disease Y confounded by one or more variables?
- Yes: Is the adjusted association of similar magnitude in subgroups of the population?
  - Yes: Report the adjusted effect estimate
  - No: Effect modification present. Report the stratified (adjusted) effect estimates
- No: Is the crude association of similar magnitude in subgroups of the population?
  - Yes: Report the crude effect estimate
  - No: Effect modification present. Report the stratified crude effect estimates

Confounding and Interaction
Crude RR   Stratum-Specific RR   Confounding?    Effect Modification?
RRc        RR1      RR2          None, + / -     Present / Absent
4.00       4.00     4.00
3.50       1.05     1.04
1.00       2.50     2.48
2.75       1.10     6.35
4.25       0.75     0.25
2.00       1.00     1.10
3.75       0.75     2.85
0.85       2.10     4.10
...
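One way to practise on a table of crude and stratum-specific estimates is to turn the decision flow into a small screening function. The Python sketch below is a simplified heuristic, not the slides' method: it checks stratum homogeneity first, treats two estimates as "similar" when they differ by no more than 10% (an assumed tolerance), and uses the geometric mean of the two stratum estimates as a stand-in for a properly weighted adjusted estimate. The names `similar` and `assess` are hypothetical.

```python
from math import sqrt


def similar(x, y, tol=0.10):
    """True if two ratio estimates differ by at most tol, relative to the smaller one.

    The 10% default is an assumed rule of thumb, not a value from the slides.
    """
    return abs(x - y) / min(x, y) <= tol


def assess(rr_crude, rr1, rr2, tol=0.10):
    """Screen one set of crude and stratum-specific relative risks."""
    if not similar(rr1, rr2, tol):
        # Strata disagree: a single summary estimate would hide the difference.
        return "effect modification: report stratum-specific estimates"
    # Assumption: geometric mean stands in for a weighted adjusted estimate.
    rr_adjusted = sqrt(rr1 * rr2)
    if similar(rr_crude, rr_adjusted, tol):
        return "no confounding, no effect modification: report crude estimate"
    return "confounding, no effect modification: report adjusted estimate"


print(assess(4.00, 4.00, 4.00))
print(assess(3.50, 1.05, 1.04))
print(assess(2.75, 1.10, 6.35))
```

In practice the judgement of "similar magnitude" would rest on a formal test of heterogeneity and on subject-matter knowledge rather than a fixed cutoff, so the output is only a first screen.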
This note was uploaded on 02/23/2012 for the course EPID 301 taught by Professors Richardson & Aronson during the Spring '09 term at Queens University.