COMMUN. STATIST. - THEORY METH., 26(3), 525-546 (1997)

THE DISTRIBUTION OF COOK'S D STATISTIC

Keith E. Muller and Mario Chen Mok
Dept. of Biostatistics, CB#7400, University of North Carolina, Chapel Hill, North Carolina 27599

Key Words and Phrases: regression diagnostics; influence; residual analysis

ABSTRACT

Cook (1977) proposed a diagnostic to quantify the impact of deleting an observation on the estimated regression coefficients of a General Linear Univariate Model (GLUM). Simulations of models with Gaussian response and predictors demonstrate that his suggestion of comparing the diagnostic to the median of the F for overall regression captures an erratically varying proportion of the values. We describe the exact distribution of Cook's statistic for a GLUM with Gaussian predictors and response. We also present computational forms, simple approximations, and asymptotic results. A simulation supports the accuracy of the results. The methods allow accurate evaluation of a single value or the maximum value from a regression analysis. The approximations work well for a single value, but less well for the maximum. In contrast, the cut-point suggested by Cook provides widely varying tail probabilities. As with all diagnostics, the data analyst must use scientific judgment in deciding how to treat highlighted observations.

Copyright (c) 1997 by Marcel Dekker, Inc.

1. INTRODUCTION

1.1 Motivation

A wide variety of applications in the medical, social, and physical sciences use regression models with continuous predictors. Often the predictors may plausibly be assumed to follow a multivariate Gaussian distribution. For example, a paleontologist may wish to model total skeleton length of fossils of a particular species as a function of the sizes of a limited number of bones.
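The cut-point behavior described in the abstract is easy to reproduce. The sketch below is our illustration, not code from the paper; it assumes NumPy and SciPy are available and takes the overall-regression reference distribution to be F(q, N - q). It computes Cook's D for one simulated GLUM with Gaussian predictors and reports the fraction of observations exceeding the median-F cut-point:

```python
import numpy as np
from scipy import stats

def cooks_d(X, y):
    """Cook's D_i for each observation of the linear model y = X b + e."""
    n, q = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    h = np.diag(H)
    resid = y - H @ y
    s2 = resid @ resid / (n - q)            # usual variance estimate
    r2 = resid ** 2 / (s2 * (1.0 - h))      # squared standardized residuals
    return r2 * h / (q * (1.0 - h))

rng = np.random.default_rng(0)
N, q = 50, 3
X = np.column_stack([np.ones(N), rng.standard_normal((N, q - 1))])
y = X @ np.ones(q) + rng.standard_normal(N)

D = cooks_d(X, y)
cut = stats.f.median(q, N - q)              # Cook's suggested cut-point
print(f"fraction of D_i above the median-F cut-point: {np.mean(D > cut):.3f}")
```

Re-running with different N and q tends to show the flagged proportion drifting, consistent with the erratic behavior noted above.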
Many diagnostics have been suggested to aid in evaluating the validity of such models. Most research in regression diagnostics has centered on the impact of deleting a single observation, with many different measures suggested. Cook (1977) recommended evaluating the standardized shift in the vector of estimated regression coefficients. He suggested comparing the statistic to the median of the F statistic for the test of all coefficients equal to zero. Observations highlighted in this way merit further examination, both in terms of their credibility and in terms of their implications for the validity of the model assumptions.

Belsley, Kuh, and Welsch (1980, p. 28) and Cook and Weisberg (1982, p. 114) discussed two alternatives for judging diagnostic statistics. Internal scaling involves judging a value with respect to the distribution in the sample at hand. External scaling involves judging a value with respect to the distribution that might occur over repeated samples. Both principles have merit in data analysis.

A standard approach for a diagnostic with a known sampling distribution, such as studentized residuals, involves three steps. First, highlight observations by reference to the sampling distribution. Second, investigate the highlighted observations' values and roles in the analysis. Third, decide on the disposition of each observation in light of all knowledge about the data. Possible actions include doing nothing, correcting a discovered error, or deleting an impossible value. Data analysts first encountering p-values for regression diagnostics may hope to use them for automatic elimination of observations. Sophisticated analysts use the reference distributions to provide a common metric for the three-step process (highlight, investigate, decide).
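As a concrete illustration of the first step (our sketch, not the authors' code; it assumes NumPy and SciPy), the squared studentized residual has a known F(1, nu - 1) reference distribution under the model, so observations can be highlighted against, say, its 0.99 quantile. The conversion from standardized to studentized residuals uses the standard deletion identity:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
N, q = 60, 3
nu = N - q
X = np.column_stack([np.ones(N), rng.standard_normal((N, q - 1))])
y = X @ np.ones(q) + rng.standard_normal(N)

H = X @ np.linalg.solve(X.T @ X, X.T)     # hat matrix
h = np.diag(H)
e = y - H @ y                             # residuals
s2 = e @ e / nu                           # variance estimate
R2 = e ** 2 / (s2 * (1.0 - h))            # squared standardized residuals
T2 = R2 * (nu - 1) / (nu - R2)            # squared studentized residuals

# Step 1: highlight against the F(1, nu - 1) reference distribution.
flagged = np.flatnonzero(T2 > stats.f.ppf(0.99, 1, nu - 1))
print("observations to investigate:", flagged)
```

Steps two and three remain judgment calls, exactly as the text emphasizes.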
Kleinbaum, Kupper, and Muller (1988, p. 201), in their introductory regression book, summarized their discussion of diagnostics by stating: "One should be cautioned that deleting the most deviant observations will in all cases slightly improve, and sometimes substantially improve, the fit of the model. One must be careful not to data snoop simply in order to polish the fit of the model by discarding troublesome data points."

Although conceptually attractive to some observers, Cook's statistic has not elicited universal enthusiasm. For example, Obenchain (1977) suggested ignoring the statistic and concentrating on its two components, the residual and the leverage. The difficulty in using the statistic stems from uncertainty as to what cut-point to use for highlighting troublesome observations. Our experience led us to the belief that the statistic flags only values already highlighted by residual analysis. Unpublished simulations (Chen Mok, 1993) confirmed the impression. The ability to compute quantiles for Cook's statistic based on Gaussian predictors, described in §2, provides an accurate metric for the statistic and hence allows the diagnostic to consistently highlight values worthy of further examination. The new results in this paper also imply a framework and approach for describing the distributions and other properties of other diagnostics.

1.2 Related Earlier Work

Nearly all current regression texts consider regression diagnostics in some detail. Excellent book-length treatments include, in chronological order, Belsley, Kuh, and Welsch (1980), Cook and Weisberg (1982), Atkinson (1985), and Chatterjee and Hadi (1986). We consider two versions of the General Linear Univariate Model (GLUM) with i.i.d. Gaussian errors. For each observational unit the predictors will be assumed either to be a set of fixed values or to follow a multivariate Gaussian distribution.
Sampson (1974) described the setting with fixed predictors as the conditional model, and the setting with Gaussian predictors as the unconditional model. As detailed in §2, the distribution and interpretation of Cook's statistic depend directly on the distribution of the predictors. See Jensen and Ramirez (1996, 1997) for the distribution of Cook's statistic for fixed predictors.

2. DISTRIBUTION THEORY

2.1 Notation and Definitions

In this section we present many standard results for regression diagnostics. Rather than cite a single source for each result, we recommend that the reader consult any of the book-length treatments just cited. LaMotte (1994) provided a "Rosetta Stone" for translating among the many names used for residuals.

A number of standard distributions must be considered. In general, indicate the cumulative distribution function (CDF) of the random variable $U$, which depends on parameters $a_1$ through $a_k$, as $F_U(t; a_1 \ldots a_k)$, with density $f_U(t; a_1 \ldots a_k)$ and $p$th quantile $F_U^{-1}(p; a_1 \ldots a_k)$. For notational convenience write the CDF of $U \mid V = v$ as $F_{U|v}(t; a_1 \ldots a_k)$. Resolution of conflict between random variable and matrix notation, and of the random or fixed nature of a variable, will be specified when not obvious from context. Let $N(\mu, \Sigma)$ indicate a multivariate Gaussian vector, with mean $\mu$, non-singular covariance $\Sigma$, and CDF $\Phi(t; \mu, \Sigma)$. Most results in this paper involve $\chi^2$, $F$, or $\beta$ random variables (Johnson and Kotz, Chapter 17, 1970a; Chapters 24 and 26, 1970b). Let $\chi^2(\nu)$ indicate a central $\chi^2$ random variable on $\nu$ degrees of freedom, and let $F(\nu_1, \nu_2)$ indicate a central $F$ random variable on $\nu_1$ and $\nu_2$ degrees of freedom. Similarly let $\beta(n_1, n_2)$ indicate a $\beta$ random variable, with support $(0, 1)$.

Most results for regression diagnostics concern fixed predictors, and hence the conditional model described by Sampson (1974). In particular, consider
$$\underset{N\times 1}{y} = \underset{N\times q}{X}\,\underset{q\times 1}{\beta} + \underset{N\times 1}{e}. \tag{2.1}$$
Let $y_i$ indicate the $i$th row of $y$, $X_i$ the $i$th row of $X$, and $e_i$ the $i$th row of $e$. Here $X$ contains fixed values, known conditionally on having designated the sampling units, $\beta$ contains fixed unknown values, and $F_{e|X}(t) = \Phi(t; 0, \sigma^2 I)$. Assume throughout that $N > q$ and that $X$ has full rank $q$. Let $\nu = (N - q)$ indicate the error degrees of freedom. Indicate the usual estimators as
$$\hat\beta = (X'X)^{-1}X'y, \tag{2.2}$$
$$\hat\sigma^2 = y'(I - H)y/\nu. \tag{2.3}$$
Define
$$H = X(X'X)^{-1}X', \tag{2.4}$$
the hat matrix, because $\hat y = Hy$ (Hoaglin and Welsch, 1978). Let $h_i$ indicate the $i$th diagonal element of $H$, the leverage for the $i$th observation:
$$h_i = X_i(X'X)^{-1}X_i'. \tag{2.5}$$
Refer to
$$\hat e = (y - \hat y) \tag{2.6}$$
as the vector of residuals. Note that
$$F_{\hat e|X}(t) = \Phi[t; 0, \sigma^2(I - H)]. \tag{2.7}$$
In turn define the $i$th squared standardized residual as
$$R_i^2 = \frac{\hat e_i^2}{\hat\sigma^2(1 - h_i)}. \tag{2.8}$$
Belsley, Kuh, and Welsch (1980), Cook and Weisberg (1982), and Atkinson (1985) reviewed the algebra of deletion and the properties of residuals. Let $(-i)$ indicate deletion of the $i$th observation and index the $N$ statistics generated by doing so. Let $X_{(-i)}$ indicate the $(N-1) \times q$ matrix created by deleting the $i$th row, with corresponding leverage $h_{(-i)} = X_i(X_{(-i)}'X_{(-i)})^{-1}X_i'$. The process creates sets of $N$ estimates of $\beta$, $\{\hat\beta_{(-i)}\}$, predicted values, $\{\hat y_{(-i)} = X_i\hat\beta_{(-i)}\}$, residuals, $\{\hat e_{(-i)} = y_i - \hat y_{(-i)}\}$, and variance estimates, $\{\hat\sigma^2_{(-i)}\}$. The resulting squared and standardized residual, the squared studentized residual, equals
$$R_{(-i)}^2 = \frac{\hat e_i^2}{\hat\sigma^2_{(-i)}(1 - h_i)} = R_i^2\left(\frac{\nu - 1}{\nu - R_i^2}\right), \tag{2.9}$$
with
$$F_{R_{(-i)}^2|X}(t) = F_F(t; 1, \nu - 1). \tag{2.10}$$
Cook's statistic measures the standardized shift in the predicted values and the shift in $\hat\beta$ due to deleting the $i$th observation:
$$D_i = \frac{(\hat\beta_{(-i)} - \hat\beta)'(X'X)(\hat\beta_{(-i)} - \hat\beta)}{q\,\hat\sigma^2} = \frac{(X\hat\beta_{(-i)} - \hat y)'(X\hat\beta_{(-i)} - \hat y)}{q\,\hat\sigma^2}. \tag{2.11}$$
Furthermore
$$D_i = R_i^2 \cdot \frac{h_i}{q(1 - h_i)} = R_i^2 \cdot C_i. \tag{2.12}$$
Finding $d$ such that $\Pr\{D_i > d\} = \alpha$ would provide a metric for Cook's statistic. This idea motivates the current work.
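The equivalence of the deletion form (2.11) and the closed form (2.12) can be checked numerically. A small sketch (ours, not the authors' code; it assumes NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
N, q = 20, 4
X = np.column_stack([np.ones(N), rng.standard_normal((N, q - 1))])
y = rng.standard_normal(N)

XtX = X.T @ X
beta = np.linalg.solve(XtX, X.T @ y)
H = X @ np.linalg.solve(XtX, X.T)
h = np.diag(H)
e = y - X @ beta
s2 = e @ e / (N - q)

# Deletion form (2.11): refit without row i, measure the shift in beta-hat.
D_direct = np.empty(N)
for i in range(N):
    keep = np.arange(N) != i
    b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    diff = b_i - beta
    D_direct[i] = diff @ XtX @ diff / (q * s2)

# Closed form (2.12): R_i^2 * h_i / (q * (1 - h_i)), no refitting needed.
R2 = e ** 2 / (s2 * (1.0 - h))
D_closed = R2 * h / (q * (1.0 - h))
print("max discrepancy:", np.abs(D_direct - D_closed).max())
```

The discrepancy is at the level of floating-point rounding, since (2.12) is an algebraic identity.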
The results also provide a test of whether a particular $D_i$ arose from the distribution of $D_i$ implied by the GLUM assumptions. As highlighted in §1.1 and §4.3, the latter interpretation has more risks than benefits in practical use for the diagnostic setting.

2.2 The Distribution of Cook's Statistic for Fixed Predictors

For fixed predictors $C_i$ does not vary randomly. Hence, conditional on $X$,
$$D_i = C_i \cdot R_i^2 = C_i \cdot \nu \cdot \beta[1/2, (\nu-1)/2]. \tag{2.13}$$
Usually if $i \ne i'$ then $C_i \ne C_{i'}$. The value of $C_i$ does not vary randomly with fixed predictors, but does vary with the $i$th leverage, $h_i$, and hence typically varies across sampling units. In order to provide a metric for judging Cook's statistic, it would seem natural to eliminate the heterogeneity between sampling units which occurs with fixed predictors. However, doing so eliminates the variability due to $C_i$ and makes $D_i$ a simple multiple of $R_i^2$, with no distinct information. At least with predictor values assigned by the experimenter, Obenchain's (1977) preference for considering the leverages and residuals separately seems appealing. See Jensen and Ramirez (1996, 1997) for a thorough treatment of fixed predictors.

2.3 The Distribution of Cook's Statistic for Gaussian Predictors

Theorem. Let $a_0 = [q(N-1)]^{-1}$, $a_1 = (q-1)N[q\nu(N-1)]^{-1}$, and $t_0 = \max(a_0, d/\nu)$. For $d > 0$ and Gaussian predictors
$$\Pr\{D_i \le d\} = 1 - \int_{t_0}^{\infty} \Pr\left\{\beta\left(\tfrac12, \tfrac{\nu-1}{2}\right) > \frac{d}{\nu t}\right\} f_{C_i}(t)\, dt, \tag{2.14}$$
with corresponding density
$$f_{D_i}(d) = \int_{t_0}^{\infty} f_\beta\left(\frac{d}{\nu t};\, \tfrac12, \tfrac{\nu-1}{2}\right)(\nu t)^{-1} f_{C_i}(t)\, dt. \tag{2.15}$$
Here
$$f_{C_i}(t) = \begin{cases} 0 & t < a_0 \\ f_F[(t-a_0)a_1^{-1};\, q-1, \nu]\, a_1^{-1} & a_0 \le t. \end{cases} \tag{2.16}$$

Lemma 1. (Weisberg, 1985, p. 114) Conditional on knowing $X$ (fixed $X$),
$$R_i^2 = \nu \cdot \beta[1/2, (\nu-1)/2]. \tag{2.17}$$

Lemma 2. A leverage value from a model containing an intercept and $(q-1)$ multivariate Gaussian predictors, with each row i.i.d., equals a one-to-one function of an $F$ random variable.

Proof.
Belsley, Kuh, and Welsch (1980, p. 66) proved that
$$F_i = \frac{(h_i - 1/N)/(q-1)}{(1-h_i)/\nu} = F(q-1, \nu). \tag{2.18}$$
Solving their result for $h_i$ yields
$$h_i = \frac{F_i(q-1)/\nu + 1/N}{1 + F_i(q-1)/\nu}. \tag{2.19}$$

Lemma 3. With Gaussian predictors, $C_i = a_0 + a_1 F_i$, so that
$$\Pr\{C_i \le t\} = \Pr\{a_0 + a_1 F_i \le t\} = \Pr\{F_i \le (t - a_0)/a_1\}, \tag{2.20}$$
and
$$f_{C_i}(t) = \begin{cases} 0 & t < a_0 \\ f_F[(t-a_0)a_1^{-1};\, q-1, \nu]\, a_1^{-1} & a_0 \le t. \end{cases} \tag{2.21}$$

Proof. For Gaussian predictors the expression in (2.19) for $h_i$ allows stating
$$C_i = \frac{F_i(q-1)/\nu + 1/N}{q(1 - 1/N)} = a_0 + a_1 F_i. \tag{2.22}$$

Lemma 4. Let $X_* = XT$, with $T$ a full rank $q \times q$ matrix of constants. Note that $T^{-\mathsf{T}} = (T')^{-1} = (T^{-1})'$. Then $H$ does not vary due to this transformation of the predictors.

Proof. Observe that
$$H = X(X'X)^{-1}X' = XT[T^{-1}(X'X)^{-1}T^{-\mathsf{T}}]T'X' = XT(T'X'XT)^{-1}T'X' = X_*(X_*'X_*)^{-1}X_*'. \tag{2.23}$$

Corollary 4.1. $H$ does not vary due to the covariance matrix of i.i.d. random predictors.

Proof. Let $\Sigma_* = FF'$ indicate a factoring of the $(q-1) \times (q-1)$ covariance matrix of a row of random predictors, assumed full rank. Choosing
$$T = \mathrm{Diag}[1, (F')^{-1}] \tag{2.24}$$
corresponds to considering a new model with predictors $X_* = XT$. The model contains an intercept and $q-1$ random predictors, with $\Sigma_* = I$.

Corollary 4.2. $h_i$, $\hat e_i$, $\hat\sigma^2$, $R_i^2$, $C_i$, and $D_i$ do not vary due to full rank transformation of the predictors or the covariance matrix of random predictors.

Proof. Each quantity depends on $X$ only through elements of $H$.

Lemma 5. With Gaussian predictors $F_{R_{(-i)}^2|h_i}(t) = F_{R_{(-i)}^2|X}(t)$.

Proof. Consider $R_{(-i)}^2$ in terms of three pieces: $(1-h_i)$, $\hat\sigma^2_{(-i)}$, and $\hat e_i^2$. i) Obviously $(1-h_i)$ depends on $X$ only through $h_i$. ii) Conditional on $X$, $\hat\sigma^2_{(-i)}(\nu-1)/\sigma^2 = \chi^2(\nu-1)$, and does not depend on $X$. iii) $F_{\hat e_i|X}(t) = \Phi[t; 0, (1-h_i)\sigma^2]$ and therefore $F_{\hat e_i|X}(t) = F_{\hat e_i|h_i}(t)$. iv) Conditional on $X$, by the nature of deletion $\hat e_i^2$ and $\hat\sigma^2_{(-i)}$ are statistically independent (LaMotte, 1994, example 1) and $F_{\hat e_i^2, \hat\sigma^2_{(-i)}|X}(t_1, t_2) = F_{\hat e_i^2|X}(t_1)\, F_{\hat\sigma^2_{(-i)}|X}(t_2)$. v) Combining i) through iv) completes the proof.

Corollary 5.1. With Gaussian predictors $F_{R_i^2|h_i}(t) = F_{R_i^2|X}(t)$.
Proof. Use the last line of (2.9) to write $R_i^2 = \nu[(\nu-1)/R_{(-i)}^2 + 1]^{-1}$. Hence $R_i^2$ depends on $X$ only through $R_{(-i)}^2$, which depends on $X$ only through $h_i$.

Corollary 5.2. With Gaussian predictors $F_{R_i^2|C_i}(t) = F_{R_i^2|X}(t)$.

Proof. $C_i = h_i/[q(1-h_i)]$ and hence depends on $X$ only through $h_i$.

Proof of the Theorem. Use the law of total probability to state
$$\Pr\{D_i > d\} = \int_0^\infty \Pr\{(R_i^2 \mid C_i = t) > d/t\}\, f_{C_i}(t)\, dt. \tag{2.25}$$
Equation (2.17) describes the distribution function of $R_i^2$ conditional on $X$, which equals the distribution of $R_i^2$ conditional on $C_i$, by Corollary 5.2. Combining the distribution in (2.17) with (2.25) allows concluding that
$$\Pr\{D_i > d\} = \begin{cases} \displaystyle\int_{a_0}^\infty \Pr\left\{\beta\left(\tfrac12, \tfrac{\nu-1}{2}\right) > \frac{d}{\nu t}\right\} f_{C_i}(t)\, dt & 0 < d < a_0\nu \\[2ex] \displaystyle\int_{d/\nu}^\infty \Pr\left\{\beta\left(\tfrac12, \tfrac{\nu-1}{2}\right) > \frac{d}{\nu t}\right\} f_{C_i}(t)\, dt & a_0\nu < d. \end{cases} \tag{2.26}$$
Note that $t_0 = \max(a_0, d/\nu)$ and simplify. Finding the density requires differentiating each form in (2.26) separately, and recognizing that the lower limit depends on $d$. The two apparently distinct forms reduce to a single one upon noting that $f_\beta[1; 1/2, (\nu-1)/2] = 0$.

2.4 Computational Forms for Numerical Integration

Although tantalizing in form, the integral for the CDF of $D_i$ does not allow closed form integration. Numerical integration allows accurate and convenient computation of $\Pr\{D_i > d\}$. Both functions in the integral require careful consideration in order to produce a form amenable to computation. Among the various forms considered, the ones used here provide the simplest proofs and least computational time for any level of accuracy, except perhaps for small values of $\Pr\{D_i \le d\}$. Interest usually centers on large values of $\Pr\{D_i \le d\}$. Two distinct representations create a finite region of integration, which greatly simplifies numerical integration. First express the density of $C_i$ in terms of an $F$.
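As an illustration of such numerical integration (our sketch, not the authors' code; it assumes SciPy), the CDF in (2.14) with the density (2.16) can be evaluated directly by adaptive quadrature over $(t_0, \infty)$:

```python
import numpy as np
from scipy import stats, integrate

def cooks_d_cdf(d, N, q):
    """Pr{D_i <= d} for Gaussian predictors via (2.14) and (2.16)."""
    nu = N - q
    a0 = 1.0 / (q * (N - 1))
    a1 = (q - 1) * N / (q * nu * (N - 1))
    t0 = max(a0, d / nu)                     # lower limit of integration
    B = stats.beta(0.5, (nu - 1) / 2)        # law of R_i^2 / nu given X

    def integrand(t):
        # Pr{beta > d/(nu t)} times the density of C_i from (2.16)
        return B.sf(d / (nu * t)) * stats.f.pdf((t - a0) / a1, q - 1, nu) / a1

    tail, _ = integrate.quad(integrand, t0, np.inf, limit=200)
    return 1.0 - tail

for d in (0.05, 0.1, 0.5):
    print(f"Pr{{D <= {d}}} for N=40, q=3: {cooks_d_cdf(d, 40, 3):.4f}")
```

The finite-region forms derived next trade the infinite upper limit for a bounded one, which is generally friendlier to quadrature.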
If $u = (t - a_0)/a_1$, so that $t = a_1 u + a_0$ and $u_0 = (t_0 - a_0)/a_1$, then
$$\Pr\{D_i > d\} = \int_{u_0}^\infty \Pr\left\{\beta\left(\tfrac12, \tfrac{\nu-1}{2}\right) > \frac{d}{\nu(a_1 u + a_0)}\right\} f_F(u; q-1, \nu)\, du, \tag{2.27}$$
or equivalently
$$\Pr\{D_i > d\} = \int_{u_0}^\infty \Pr\left\{F(1, \nu-1) > \frac{\nu-1}{(a_1 u + a_0)\nu/d - 1}\right\} f_F(u; q-1, \nu)\, du. \tag{2.28}$$
The relationship of $F$ and $\beta$ random variables allows creating a finite region of integration. If $z = (q-1)u[\nu + (q-1)u]^{-1}$ then $u = \nu(q-1)^{-1}z(1-z)^{-1}$ and $z_0 = (q-1)u_0[\nu + (q-1)u_0]^{-1}$. Also let
$$g(z) = \frac{\nu-1}{[a_1\nu(q-1)^{-1}z(1-z)^{-1} + a_0]\nu/d - 1}. \tag{2.29}$$
With this transformation
$$\Pr\{D_i > d\} = \int_{z_0}^1 \Pr\{F(1, \nu-1) > g(z)\}\, f_\beta\left(z; \tfrac{q-1}{2}, \tfrac{\nu}{2}\right) dz. \tag{2.30}$$
A second useful representation results from applying the transformation $w = u/(1+u)$ to the integral in (2.28). With $w_0 = u_0/(1+u_0)$ and
$$h(w) = \frac{\nu-1}{[a_1 w(1-w)^{-1} + a_0]\nu/d - 1}, \tag{2.31}$$
it follows that
$$\Pr\{D_i > d\} = \int_{w_0}^1 \Pr\{F(1, \nu-1) > h(w)\}\, f_F[w(1-w)^{-1}; q-1, \nu]\,(1-w)^{-2}\, dw. \tag{2.32}$$

2.5 Approximations

Equation (2.27) allows recognizing that $\Pr\{D_i > d\}$ equals the expected value of a function of a random variable whenever $t_0 = a_0$. For fixed $q$, $\lim_{N\to\infty} d/\nu = \lim_{N\to\infty} a_0 = 0$. Consequently the expected value interpretation holds, at least asymptotically, in all cases. The accuracy of a series based on treating the integral as an expected value depends both on the remainder term and on any discrepancy due to $d/\nu > a_0$.

Creating a two term Taylor's series approximation for (2.30) involves noting that $\mathcal{E}\,\beta[(q-1)/2, \nu/2] = (q-1)/(\nu + q - 1)$. Ignoring any discrepancy due to $d/\nu > a_0$ yields
$$\Pr\{D_i > d\} \approx \Pr\left\{F(1, \nu-1) > \frac{\nu-1}{(a_1 + a_0)\nu/d - 1}\right\}. \tag{2.33}$$
Applying a series expansion for an $F$ random variable, using (2.27) or (2.28), requires $\nu > 2k$ to insure a finite $k$th moment. If $\nu > 2$ then $\mathcal{E}\,F(q-1, \nu) = \nu/(\nu-2)$ and, ignoring discrepancy due to $d/\nu > a_0$, a two term series equals
$$\Pr\{D_i \le d\} \approx \Pr\left\{F(1, \nu-1) \le \frac{\nu-1}{[a_1\nu/(\nu-2) + a_0]\nu/d - 1}\right\}. \tag{2.34}$$
For $\nu \le 2$, a one term $F$ based expansion about the number 1 yields
$$\Pr\{D_i \le d\} \approx \Pr\left\{F(1, \nu-1) \le \frac{\nu-1}{(a_1 + a_0)\nu/d - 1}\right\}, \tag{2.35}$$
which corresponds to the two term expansion for the $\beta$ representation in (2.33). The approximate probability of (2.35) will never be greater than that of (2.34). The probability approximations imply approximations for quantiles of $D_i$:
$$\hat d_p = (a_1 m + a_0)\,\nu\left[1 + \frac{\nu-1}{F_F^{-1}(p; 1, \nu-1)}\right]^{-1}. \tag{2.36}$$
Here $m = \nu/(\nu-2)$ for (2.34), or $m = 1$ for (2.35). Assigning $m$ the value of the median, $F_F^{-1}(.50; q-1, \nu)$, or the mode, $\nu(q-3)/[(q-1)(\nu+2)]$ for $q > 3$, also provides a one term approximation.

One convenient form for creating a long series arises from (2.28):
$$\Pr\{D_i > d\} = \int_{u_0}^\infty \Pr\left\{F(1, \nu-1) > \frac{\nu-1}{(a_1 u + a_0)\nu/d - 1}\right\} f_F(u; q-1, \nu)\, du \tag{2.37}$$
$$= \int_{u_0}^\infty \Pr\left\{F(\nu-1, 1) \le \frac{(a_1 u + a_0)\nu/d - 1}{\nu-1}\right\} f_F(u; q-1, \nu)\, du = \int_{u_0}^\infty \Pr\{F(\nu-1, 1) \le c_1 u + c_0\}\, f_F(u; q-1, \nu)\, du = \int_{u_0}^\infty P(u)\, f_F(u; q-1, \nu)\, du,$$
with $c_1 = a_1\nu/[d(\nu-1)]$ and $c_0 = (a_0\nu/d - 1)/(\nu-1)$. In turn
$$P^{(0)}(u) = \int_0^{c_1 u + c_0} f_F(s; \nu-1, 1)\, ds, \qquad P^{(1)}(u) = c_1\, f_F(c_1 u + c_0; \nu-1, 1), \qquad P^{(k)}(u) = c_1^k\, f_F^{(k-1)}(c_1 u + c_0; \nu-1, 1). \tag{2.38}$$

2.6 Large Sample Properties

The behavior of $D_i$ in large samples merits separate consideration. The results have both analytic and computational value. Rather than study $D_i$ directly, consider $D_{i*} = \nu D_i$. Then
$$\Pr\{D_{i*} > d_*\} = \Pr\{\nu D_i > d_*\} = \Pr\{D_i > d_*/\nu\} = \Pr\{D_i > d\}, \tag{2.39}$$
with $d = d_*/\nu$. Using (2.28) the distribution function for $D_{i*}$ may be expressed as
$$\Pr\{D_{i*} > d_*\} = \int_{u_{0*}}^\infty \Pr\{F(1, \nu-1) > s_*(d_*/\nu, u)\}\, f_F(u; q-1, \nu)\, du, \tag{2.40}$$
with
$$s_*(d_*/\nu, u) = \frac{\nu-1}{(a_1 u + a_0)\nu^2/d_* - 1}, \tag{2.41}$$
$u_{0*} = [t_0(d_*/\nu) - a_0]/a_1$, and $t_0(d_*/\nu) = \max(a_0, d_*/\nu^2)$.

Consider $D_{i*}$ as $N \to \infty$. In that case
$$\lim_{N\to\infty} s_*(d_*/\nu, u) = \frac{d_* q}{u(q-1) + 1}. \tag{2.42}$$
That $\lim_{N\to\infty} a_0 = 0$ and $\lim_{N\to\infty} d_*/\nu^2 = 0$ combine to imply $\lim_{N\to\infty} u_{0*} = 0$. Therefore
$$\lim_{N\to\infty} \Pr\{D_{i*} > d_*\} = \int_0^\infty \Pr\left\{\chi^2(1) > \frac{d_* q}{u(q-1) + 1}\right\}(q-1)\, f_{\chi^2}[(q-1)u;\, q-1]\, du. \tag{2.43}$$
Let $w = (q-1)u$, so that $dw = (q-1)\,du$. Then
$$\lim_{N\to\infty} \Pr\{D_{i*} > d_*\} = \int_0^\infty \Pr\left\{\chi^2(1) > \frac{d_* q}{w+1}\right\} f_{\chi^2}(w; q-1)\, dw. \tag{2.44}$$
A Taylor's series about $\mathcal{E}W = (q-1)$ yields the two term approximation
$$\Pr\{D_{i*} > d_*\} \approx \Pr\{\chi^2(1) > d_*\}. \tag{2.45}$$
Also, with $d = d_*/\nu$, for large $N$
$$\Pr\{D_i \le d\} \approx \Pr\{\chi^2(1) \le \nu d\}, \tag{2.46}$$
with corresponding quantile approximation
$$\hat d_p \approx F_{\chi^2}^{-1}(p; 1)/\nu. \tag{2.47}$$
The $F$ based approximation in (2.36) provides more accuracy, except in large samples. Additional terms are required for the approximation to vary with $q$. Three conclusions follow. First, as $N$ increases $D_i$ converges to a degenerate random variable with all mass at zero. Second, $D_{i*}$ converges to a non-degenerate random variable. Third, calculations of quantiles in terms of $D_{i*}$ can greatly reduce numerical difficulties with large samples.

2.7 The Maximum of N Values of Cook's Statistic

Fi…
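The quantile approximations of §2.5 and §2.6 are simple to implement. The sketch below (ours, not the authors' code; it assumes NumPy and SciPy) computes the F-based quantile (2.36) with m = 1 and the large-sample chi-square quantile (2.47), then checks the latter by simulation; Cook's statistic is invariant to the true beta and sigma-squared, so the simulation may take beta = 0 and sigma = 1 without loss of generality:

```python
import numpy as np
from scipy import stats

def q_f_based(p, N, q, m=1.0):
    """Approximate pth quantile of D_i via (2.36)."""
    nu = N - q
    a0 = 1.0 / (q * (N - 1))
    a1 = (q - 1) * N / (q * nu * (N - 1))
    return (a1 * m + a0) * nu / (1.0 + (nu - 1) / stats.f.ppf(p, 1, nu - 1))

def q_chi2(p, N, q):
    """Approximate pth quantile of D_i via (2.47)."""
    return stats.chi2.ppf(p, 1) / (N - q)

N, q, p = 200, 3, 0.95
d_f, d_c = q_f_based(p, N, q), q_chi2(p, N, q)

# Monte Carlo check: simulate Gaussian-predictor models and record how
# often D_1 falls below the chi-square-based 95th percentile.
rng = np.random.default_rng(0)
reps, hits = 2000, 0
for _ in range(reps):
    X = np.column_stack([np.ones(N), rng.standard_normal((N, q - 1))])
    y = rng.standard_normal(N)                   # beta = 0, sigma = 1
    h1 = X[0] @ np.linalg.solve(X.T @ X, X[0])   # leverage of row 1
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    s2 = e @ e / (N - q)
    R2 = e[0] ** 2 / (s2 * (1.0 - h1))
    hits += (R2 * h1 / (q * (1.0 - h1))) <= d_c
print(f"(2.36): {d_f:.5f}  (2.47): {d_c:.5f}  coverage: {hits/reps:.3f}")
```

At this sample size the two quantile formulas nearly coincide, and the empirical coverage sits close to the nominal 0.95, consistent with the two-term nature of (2.45).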