### EPI-820_Lect7_Stat_Inference

Course: EPI 820, Fall 2008
School: Michigan State University

EPI-820 Evidence-Based Medicine. Lecture 7: Clinical Statistical Inference. Mat Reeves BVSc, PhD

#### Objectives

- Understand the theoretical underpinnings of, and the flaws associated with, the current approach to clinical statistical testing (the frequentist approach).
- Understand the difference between testing and estimation.
- Understand the advantages of the CI and the CI functions.
- Understand the logic of a Bayesian approach.

#### Personal Statistical History

- Post-DVM: Clueless. Sceptical of the role of statistics. Thinks research = the search for P < 0.05.
- PhD era: Increasing obsession with statistical methods. Lots of tools! SLR, ANOVA, MLR, LR, LL & Cox. Thinks statistics = real science.
- Post-PhD: Healthy scepticism for the way stats are used. Stats = methods which have inherent limitations. Not a substitute for clear scientific thought or understanding the scientific method.

#### Review of Significance Tests

- Substantive hypothesis: cows on BST will tend to gain weight.
- Null hypothesis (Ho): the mean body weight of cows treated with BST is not different from the mean body weight of control cows, $\mu_x = \mu_y$.
- Alternative hypothesis (Ha): the mean body weight of cows treated with BST is different from the mean body weight of control cows, $\mu_x \neq \mu_y$.

Logically, if Ho is refuted, Ha is confirmed, so the investigator seeks to "nullify" Ho.

Experiment: 20 cows randomized to BST (X) and control (Y). Measure weight gain. Calculate the mean weight change per group.

Assumptions:

- i) The sample statistic ($\bar{X} - \bar{Y}$) is one instance of an infinitely large number of sample statistics obtained from an infinite number of replications of the experiment under the same conditions (the frequentist assumption).
- ii) The populations are normally distributed, with equal variance.
- iii) Ho is true.

The t-test:

$$t = \frac{\bar{X} - \bar{Y}}{S_{\bar{x}\bar{y}}}$$

Where:

- reference distribution as noted on the slide: $N(0, 1)$
- $df = (n_1 - 1) + (n_2 - 1)$
- $S_{\bar{x}\bar{y}} = \sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) S^2}$ = standard error of the difference between two independent means.
- $S^2$ = estimate of the pooled population variance.

Notes on t:

- t may take on any value; no value is logically inconsistent with Ho! Smaller t values are more consistent with Ho being true.
- All else equal, a larger n increases the value of t (higher power).

Large values of t indicate either:

- i) the test assumptions are true and a rare event has occurred, or
- ii) one of the assumptions of the test is false, and by convention it is assumed that Ho is the one that is not true.

By convention, the relative frequency of t at which we choose (ii) as the logical conclusion is set to 5% (the alpha level or significance level).

Experiment: t = 2.55, p = 0.02, so reject Ho; the result is significant.

Type I and Type II errors:

- A Type I error (α) occurs 5% of the time when Ho is true.
- A Type II error (β) occurs β% of the time when Ho is false.
- Alpha and beta are inversely related.
- Fixing α at 5% means Sp is 95%.
- β is not set a priori, hence Se (power) tends to be low.
- Scientific caution dictates that we set α small.
- Scientific ignorance dictates that we ignore β!

[Figure: Alpha and beta are inversely related]

#### Relationship between diagnostic test result and disease status

| | Disease present (D+) | Disease absent (D-) | Predictive value |
|---|---|---|---|
| Test positive (T+) | TP (a) | FP (b) | PVP = a / (a + b) |
| Test negative (T-) | FN (c) | TN (d) | PVN = d / (c + d) |

- Se = a / (a + c) = P(T+ given D+)
- Sp = d / (b + d) = P(T- given D-)

#### Relationship between significance test results and truth

| Significance test | Ho false | Ho true | Predictive value |
|---|---|---|---|
| Reject Ho | TP (1 - β) | FP, Type I error (α) | PVP = TP / (TP + FP) |
| Accept Ho | FN, Type II error (β) | TN (1 - α) | PVN = TN / (TN + FN) |

- Se = TP / (TP + FN) = power (1 - β)
- Sp = TN / (TN + FP)

#### Power

- Power is the probability of rejecting Ho when Ho is false.
- Se = TP / (TP + FN), or (1 - β).
- Power is a function of:
  - i) Alpha. Power can be increased by making Ha one-sided, i.e. $\mu_x > \mu_y$ (equivalent to changing the cut-off value).
  - ii) Reliability, as measured by the SE of the difference. Power increases with decreasing SE, and SE decreases with increasing sample size (= decreased variance).
  - iii) Size of the treatment effect.

#### The Consequences of Low Power

- i) Negative results are difficult to interpret: either there is truly no effect, or the experiment was unable to detect a true difference.
- ii) The proportion of Type I errors in the literature increases.
- iii) Many important associations fail to be identified.
- iv) Low power means low precision (indicated by the confidence interval).

#### Questions?

- What proportion of statistically significant findings published in the literature are false positive (Type I) errors?
- What well-known measure is this proportion?
- What elements does this figure therefore depend on?

#### Hypothetical outcomes of 500 experiments, α = 0.05, power = 0.50, and 20% prevalence of false Ho's

| Significance test | Ho false | Ho true |
|---|---|---|
| Reject Ho | 50 | 20 |
| Accept Ho | 50 | 380 |
| Total | 100 | 400 |

- N = 500
- PV+ = 50/70 = 71%
- Se = 50%
- Sp = 95%

If all significant results are published, 29% are Type I errors.

#### The P value

- The P value is the probability of obtaining a value of the test statistic (X) at least as large as the one observed, given that Ho is true: P(>= X | Ho true).

Common incorrect interpretations:

- It is NOT P(Ho true | data)! We can never state the probability of a hypothesis being true (under the frequentist approach).
- It is NOT the probability that the results were due to chance!
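The arithmetic behind the hypothetical 500-experiment table can be sketched in a few lines of Python. This is only an illustration: the function name `significance_test_outcomes` and its parameter names are made up for this example, and the counts are expected values rather than simulated results.

```python
def significance_test_outcomes(n_experiments, alpha, power, frac_h0_false):
    """Expected 2x2 counts of test decision versus truth (illustrative helper)."""
    n_h0_false = n_experiments * frac_h0_false   # experiments with a real effect
    n_h0_true = n_experiments - n_h0_false       # experiments with no effect
    tp = power * n_h0_false                      # Ho false and rejected
    fn = n_h0_false - tp                         # Ho false but accepted (Type II)
    fp = alpha * n_h0_true                       # Ho true but rejected (Type I)
    tn = n_h0_true - fp                          # Ho true and accepted
    ppv = tp / (tp + fp)                         # share of significant results that are real
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn,
            "PPV": ppv, "type1_share": 1.0 - ppv}


# Values taken from the slide: N = 500, alpha = 0.05, power = 0.50, 20% false Ho's.
outcomes = significance_test_outcomes(500, alpha=0.05, power=0.50, frac_h0_false=0.20)
for name, value in outcomes.items():
    print(f"{name}: {value:.3f}")
```

With the slide's inputs, the predictive value of a significant result is 50/70, about 71%, so about 29% of significant results are Type I errors. Lowering `power` or `frac_h0_false` lowers the predictive value, which shows why the answer depends on Se, Sp and the prevalence of false Ho's.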
#### Criticisms of Significance Tests

i) Decision vs inference (Neyman-Pearson)

- The pioneers of modern statistics were interested in producing results that enabled decisions to be made.
- This creates the problem of automatic acceptance or rejection based on an arbitrary cutoff (P = 0.04 vs P = 0.06).
- Results should adjust your degree of belief in a hypothesis rather than forcing you to accept an artificial dichotomy.
- "Intellectual economy"

ii) Asymmetry of significance tests

- Frequently, the experimental data can be found to be consistent with a Ho of no effect or a Ho of a 20% increase.
- Acceptance of both Ho's given the data leads to 2 very different conclusions!
- The asymmetry was recognized by Fisher, hence the convention is to identify the theory with Ha but to test Ho.
- "Is there an effect?" is the wrong question! We should ask: "What is the size of the effect?"

iii) Corroborative power of significance tests

- Both the Fisherian and Neyman-Pearson schools make no assumption about the prior probability of Ho.
- Both schools presume Ho is almost always false.
- Rejection of Ho does nothing to illuminate which of the vast number of Ha's are supported by the data!
- Failing to reject Ho does not prove Ho is true (Popper: "we can falsify hypotheses but not confirm them").

iv) Effect size and significance tests

- Test statistics and P values are a function of both effect size and sample size.
- The size of an effect cannot be inferred by inspection of the P value; reporting P < 0.00001 has no scientific merit!
- Highly significant results may be derived from trivial effects if the sample size is large.
- Confidence intervals give a plausible range for the unknown population parameter (significance tests show what the parameter is not!).

#### Relationship between the Size of the Sample and the Size of the P Value

Example RCT:

- Intervention: new antibiotic (a/b) for pneumonia.
- Outcome: recovery rate = % of patients in clinical recovery by 5 days.
- Facts:
  - Known: the existing drug of choice results in a 35% recovery rate at 5 days.
  - Unknown: the new drug improves the recovery rate by 5 percentage points (to 40%).

#### P values Generated by RCT by Sample Size

| Sample size (N = 2x) | P value (chi-square) |
|---|---|
| 100 | 0.465 |
| 500 | 0.103 |
| 600 | 0.074 |
| 700 | 0.053 |
| 800 | 0.039 |
| 1000 | 0.021 |

#### Conclusion?

Significance testing should be abandoned and replaced with interval estimation (point estimate and CI)!

Why? Interval estimates:

- are not couched in pseudo-scientific hypothesis testing language
- do not carry any decision-making implications
- give a plausible range for the unknown population parameter
- give a clue as to sample size (width of the CI)
- avoid the danger of inferring a large effect when the result is highly significant

#### Interval estimation

- View "experimentation" as a measurement exercise.
- We want an unbiased, precise measure of effect.
- Point estimate: the best estimate of the true effect, given the data (aka the MLE). It indicates the magnitude of the effect (but is imprecise).
- Confidence intervals indicate the degree of precision of the estimate. They represent the set of all possible values for the parameter that are consistent with the data.
- The width of the CI depends on variability and on the level of confidence (%).
- 90% CI: 90% of such intervals will include the true unknown population parameter (necessary frequentist interpretation...
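The sample-size table for the pneumonia RCT earlier in this section can be approximately reproduced with a short Python sketch. It rests on assumptions that are not stated on the slides: that each listed sample size is the number of patients per arm (x, with N = 2x in total), that the observed recovery rates are exactly 35% and 40%, and that the test is an uncorrected Pearson chi-square on the 2x2 table. The function name `chi_square_p_value` is made up for this example.

```python
import math


def chi_square_p_value(n_per_arm, p_control, p_treatment):
    """Pearson chi-square P value (1 df, no continuity correction) for a 2x2 table.

    Assumes equal arm sizes and observed rates equal to the stated rates.
    """
    a = p_treatment * n_per_arm      # recovered, new drug
    b = n_per_arm - a                # not recovered, new drug
    c = p_control * n_per_arm        # recovered, existing drug
    d = n_per_arm - c                # not recovered, existing drug
    n_total = 2 * n_per_arm
    chi2 = n_total * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


slide_sizes = [100, 500, 600, 700, 800, 1000]
p_values = {n: chi_square_p_value(n, 0.35, 0.40) for n in slide_sizes}
for n, p in p_values.items():
    print(f"n per arm = {n:4d}   P = {p:.3f}")
```

Under these assumptions the computed values agree with the table to within rounding. The same 5-point difference moves from clearly non-significant to significant purely because the sample size grows, which is the point of the example.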

