### Proof of the Gauss-Markov Theorem

Course: STAT 511, Spring 2011
School: Iowa State
Gauss-Markov Theorem: The OLS estimator, $C\hat\beta$, is the unique BLUE of $C\beta$ in the GM model

$$y = X\beta + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I).$$

We need to show that $\mathrm{Var}(C\hat\beta)$ is strictly less than the variance of any other linear unbiased estimator of $C\beta$, for all $\beta \in \mathbb{R}^p$ and $\sigma^2 \in \mathbb{R}^+$.

Outline of the proof:

- Consider some other linear unbiased estimator, $d'y$.
- Write $\mathrm{Var}(d'y)$ as $\mathrm{Var}[(d'y - C\hat\beta) + C\hat\beta]$.
- Show $\mathrm{Cov}(d'y - C\hat\beta,\ C\hat\beta) = 0$ and $\mathrm{Var}(d'y - C\hat\beta) > 0$ unless $d'y = C\hat\beta$.
- Hence, $\mathrm{Var}(d'y) > \mathrm{Var}(C\hat\beta)$ unless $d'y = C\hat\beta$.

Copyright © 2011 Dept. of Statistics (Iowa State University), Statistics 511

Suppose $d'y$ is any linear unbiased estimator other than the OLS estimator $C\hat\beta$. We need to show $\mathrm{Var}(d'y) > \mathrm{Var}(C\hat\beta)$. The two variances can be related by writing

$$\mathrm{Var}(d'y) = \mathrm{Var}(d'y - C\hat\beta + C\hat\beta) = \mathrm{Var}(d'y - C\hat\beta) + \mathrm{Var}(C\hat\beta) + 2\,\mathrm{Cov}(d'y - C\hat\beta,\ C\hat\beta).$$

We need to do two things:

1) show that $\mathrm{Var}(d'y - C\hat\beta) > 0$ unless $d'y = C\hat\beta$
2) show that $\mathrm{Cov}(d'y - C\hat\beta,\ C\hat\beta) = 0$

GM proof: item 1

$\hat\beta$ and $(X'X)^- X'y$ are both solutions to the normal equations. Because $C\beta$ is estimable, $C\hat\beta$ is the same for all solutions to the normal equations. Thus, $C\hat\beta = a'y$ where $a' \equiv C(X'X)^- X'$, and

$$\begin{aligned}
\mathrm{Var}(d'y - C\hat\beta) &= \mathrm{Var}(d'y - a'y) = \mathrm{Var}[(d' - a')y] \\
&= (d' - a')\,\mathrm{Var}(y)\,(d - a) = (d' - a')\,\sigma^2 I\,(d - a) \\
&= \sigma^2 \|d - a\|^2 > 0 \quad \text{unless } d = a.
\end{aligned}$$

GM proof: item 2

We need to show $\mathrm{Cov}(d'y - C\hat\beta,\ C\hat\beta) = 0$.

$$\begin{aligned}
\mathrm{Cov}(d'y - C\hat\beta,\ C\hat\beta) &= \mathrm{Cov}(d'y - a'y,\ a'y) = \mathrm{Cov}[(d' - a')y,\ a'y] \\
&= (d' - a')\,\mathrm{Var}(y)\,a = \sigma^2 (d' - a')\,a \\
&= \sigma^2 (d' - a')\,X\,[(X'X)^-]'\,C'.
\end{aligned}$$

The key is $(d' - a')X$ in the middle of the last expression. Remember that any linear unbiased estimator $k'y$ of $C\beta$ satisfies

$$E(k'y) = C\beta \ \ \forall \beta \in \mathbb{R}^p \iff k'X\beta = C\beta \ \ \forall \beta \in \mathbb{R}^p \iff k'X = C.$$

Both $d'y$ and $a'y$ are linear unbiased estimators of $C\beta$, so

$$(d' - a')X = d'X - a'X = C - C = 0.$$

Hence, $\mathrm{Cov}(d'y - C\hat\beta,\ C\hat\beta) = 0$. With the covariance term equal to zero, $\mathrm{Var}(d'y) = \mathrm{Var}(d'y - C\hat\beta) + \mathrm{Var}(C\hat\beta) > \mathrm{Var}(C\hat\beta)$ unless $d'y = C\hat\beta$.
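The two items of the proof can be checked numerically on a small case. The sketch below is only an illustration under stated assumptions: the design matrix `X`, the row vector `C`, and the perturbation `z` are all made up here, `X` has full column rank so the generalized inverse is an ordinary inverse, and every variance is reported in units of $\sigma^2$.

```python
from fractions import Fraction as F

# Hypothetical design: simple linear regression with intercept, x = 0, 1, 2, 3.
X = [[F(1), F(x)] for x in (0, 1, 2, 3)]
C = [F(0), F(1)]  # C beta is the slope, a single estimable function

def xtx_inverse(X):
    """Inverse of the 2x2 matrix X'X (X has full column rank here)."""
    s00 = sum(r[0] * r[0] for r in X)
    s01 = sum(r[0] * r[1] for r in X)
    s11 = sum(r[1] * r[1] for r in X)
    det = s00 * s11 - s01 * s01
    return [[s11 / det, -s01 / det], [-s01 / det, s00 / det]]

def ols_weights(X, C):
    """a' = C (X'X)^{-1} X', so that C beta-hat = a'y."""
    inv = xtx_inverse(X)
    row = [C[0] * inv[0][j] + C[1] * inv[1][j] for j in range(2)]
    return [row[0] * r[0] + row[1] * r[1] for r in X]

def times_X(k, X):
    """The row vector k'X."""
    return [sum(ki * r[j] for ki, r in zip(k, X)) for j in range(2)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

a = ols_weights(X, C)
# z is orthogonal to both columns of X, so d = a + z still satisfies d'X = C.
z = [F(1), F(-2), F(1), F(0)]
d = [ai + zi for ai, zi in zip(a, z)]
diff = [di - ai for di, ai in zip(d, a)]

unbiased_d = times_X(d, X) == C  # d'y is unbiased for C beta
cross_term = dot(diff, a)        # Cov(d'y - a'y, a'y) / sigma^2
var_a = dot(a, a)                # Var(a'y) / sigma^2
var_d = dot(d, d)                # Var(d'y) / sigma^2
```

Because `z` is orthogonal to the columns of `X`, the cross term vanishes and `var_d` exceeds `var_a` by exactly `dot(diff, diff)`, which is the $\|d - a\|^2$ term of item 1.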
Estimating Estimable Functions of $\beta$: An Example

| Customer | Movie 1 | Movie 2 | Movie 3 |
|---|---|---|---|
| 1 | 4 | 1 | ? |
| 2 | ? | 3 | 5 |
| 3 | ? | ? | 3 |
| 4 | 3 | 1 | ? |

Which movie is best? Can we guess ratings for customer/movie combinations not in the dataset?

$Y_{ij}$ = customer $i$'s rating of movie $j$, with $Y_{ij} = \mu + C_i + m_j + \epsilon_{ij}$. In matrix form, $y = X\beta + \epsilon$, with observations indexed $ij = 11, 12, 22, 23, 33, 41, 42$ and $y = (4, 1, 3, 5, 3, 3, 1)'$.
Alternative Parameterizations

Recall that the Gauss-Markov Linear Model simply says that $E(y) \in \mathcal{C}(X)$ and $\mathrm{Var}(y) = \sigma^2 I$ for some $\sigma^2 > 0$.

For example, $y_{ij}$, $i = 1, 2, 3$, $j = 1, 2$:

- Treatment Effects: $E(y_{ij}) = \mu + \tau_i$
- Cell Means: $E(y_{ij}) = \mu_i$

Thus, as long as C(X) = C
Inference Under the Normal Theory Gauss-Markov Linear Model: Inference (cont.)

Remember our dairy cow study (2 treatments, 3 reps per trt) and our questions (Introduction, slide 3). We've answered all questions except the last one: 3) When does t = [(ȳ1 - ȳ2) - (μ1 - μ2)] / s2
Practical Data Analysis

A not uncommon situation: a client brings you data from a food development study with 3 treatments:

- Old "on market" formulation of soup
- New formulation, same salt content as old
- New formulation, reduced salt content

They recruited 25 p

How would you analyze the data? (N.B. all analyses account for blocking / pairing of obs. within subject)

1. ANOVA F test of $\mu_O = \mu_S = \mu_R$, report p-value and trt. means
2. 3 paired t-tests: $\mu_O = \mu_S$, $\mu_O = \mu_R$, and
POWER OF THE F-TEST

Suppose $C$ is a $q \times p$ matrix such that $C\beta$ is testable. Earlier, we established that the quadratic form incorporating $C\hat\beta - d$ has a non-central F distribution with $q$ and $n - k$ degrees of freedom, whose noncentrality parameter is a quadratic form in $C\beta - d$ built from $[C(X'X)^- C']^{-1}$.

A very common consulting question: "I'm planning a study to do ...". How many replicates (per treatment) should I use? I know 5 ways that can be used to determine an appropriate sample size: As many as you can afford (time, money); n = 3 p
REDUCED vs. FULL MODEL F-TEST

Tests of $C\beta - d = 0$ lead to F tests, but that's not the only way to an F test. 500/402: big emphasis on model comparison. SS in ANOVA tables usually explained in terms of model comparison, e.g. SS for AB interaction is the differ

Why does model comparison lead to F tests? If you test Ho using a $C\beta$ test or using a model comparison test, do you get the same answer? Again, will answer using a general setup (Normal GM model): $y = X\beta + \epsilon$, $\epsilon \sim N(0, \sigma^2 I)$. Questions a
Equivalence of model comparison and $C\beta$ F tests

Continuation of the Storage time example from Part 9. Data:

| Storage Time | 20 °C | 30 °C |
|---|---|---|
| 3 months | 2, 5 | 9, 12, 15 |
| 6 months | 6, 6, 7, 7 | 16 |

Summary of results from $C\beta$ estimates and tests. Time Ho, Temp Ho: These correspond to tests of:
ANalysis Of VAriance (ANOVA) for a sequence of models

Model comparison can be generalized to a sequence of models (not just one full and one reduced model). Context: usual nGM model: $y = X\beta + \epsilon$, $\epsilon \sim N(0, \sigma^2 I)$. Let $X_1 = 1$ and $X_m = X$. But now, we have a sequence of models "i

Some examples: Multiple Regression: X1 = 1, X2 = [1, x1], X3 = [1, x1, x2], ..., Xm = [
THE AITKEN MODEL

$$y = X\beta + \epsilon, \qquad \epsilon \sim (0, \sigma^2 V)$$

Identical to the Gauss-Markov Linear Model except that $\mathrm{Var}(\epsilon) = \sigma^2 V$ instead of $\sigma^2 I$. $V$ is assumed to be a known nonsingular variance matrix. The Normal Theory Aitken Model adds an assumption of normality: $\epsilon \sim N(0, \sigma^2 V)$.

Examples - 1: Analysis of averages

Obs
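As a hedged sketch of how the Aitken model is usually brought back to the Gauss-Markov case (the excerpt above is cut off before any such step, so this is the standard textbook reduction rather than the slide's own wording): the symbols $z$ and $W$ are notation introduced here, and $V$ is taken to be symmetric positive definite so that $V^{-1/2}$ exists.

```latex
\begin{aligned}
z &= V^{-1/2} y, \qquad W = V^{-1/2} X, \\
z &= W\beta + V^{-1/2}\epsilon, \qquad
\mathrm{Var}(V^{-1/2}\epsilon) = V^{-1/2}\,(\sigma^2 V)\,V^{-1/2} = \sigma^2 I, \\
\hat\beta_{GLS} &= (W'W)^{-} W'z = (X'V^{-1}X)^{-} X'V^{-1} y .
\end{aligned}
```

The transformed response $z$ satisfies the Gauss-Markov assumptions, so ordinary least squares applied to $(z, W)$ gives the generalized least squares solution in the last line.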
the bootstrap

We've seen a lot about inference on $C\beta$ in a nGM (or nAitken) model. A huge number of questions can be answered by appropriate choice of $C$. Key point is that $C\hat\beta$ is a linear function of $y$. What if quantity of interest is not a linear function of $y$?

Examples: Process optimization. Many physical processes can be described (at least approximately) as quadratic functions of input variable(s)
Randomization / Permutation tests

Bootstrapping preserves the fixed effects. Difference of two means: resample $Y_{1i}$ and resample $Y_{2i}$; bootstrap estimates, $\hat\mu_{1B} - \hat\mu_{2B}$, are centered on/near $\bar Y_1 - \bar Y_2$, which estimates $\mu_1 - \mu_2$. Regression bootstrap: resample ε̂i

Resampling from a single pool of observations tests HoR: $F_1(x) = F_2(x)$. Notice a subtle point: HoR is slightly more general than Ho: $\mu_1 = \mu_2$. H0R is
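The contrast between the two resampling schemes can be sketched in a few lines. This is a minimal illustration, not the course's code: the data are made up, the function names are hypothetical, and the permutation test enumerates every split exactly, which is only feasible for very small samples.

```python
import random
from fractions import Fraction
from itertools import combinations

# Made-up observations for two groups (not data from the course).
y1 = [3, 4, 5, 6]
y2 = [7, 8, 9, 11]

def mean(values):
    return Fraction(sum(values), len(values))

def bootstrap_differences(y1, y2, n_boot, rng):
    """Resample each group separately, so the group difference is preserved."""
    diffs = []
    for _ in range(n_boot):
        b1 = [rng.choice(y1) for _ in y1]
        b2 = [rng.choice(y2) for _ in y2]
        diffs.append(mean(b1) - mean(b2))
    return diffs

def permutation_p_value(y1, y2):
    """Exact two-sided test: all splits of the single pooled sample
    are treated as equally likely, as they are under F1 = F2."""
    pooled = y1 + y2
    observed = abs(mean(y1) - mean(y2))
    n_extreme = n_splits = 0
    for idx in combinations(range(len(pooled)), len(y1)):
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in idx]
        n_splits += 1
        if abs(mean(g1) - mean(g2)) >= observed:
            n_extreme += 1
    return Fraction(n_extreme, n_splits)

rng = random.Random(511)  # fixed seed so the sketch is repeatable
boot = bootstrap_differences(y1, y2, 2000, rng)
boot_center = float(sum(boot) / len(boot))  # close to mean(y1) - mean(y2)
p_value = permutation_p_value(y1, y2)       # 2 of the 70 splits are as extreme
```

The bootstrap differences cluster around the observed difference of means, while the permutation distribution is centered at zero because group labels are ignored when pooling.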
LINEAR MIXED-EFFECT MODELS

Example: Seedling weight in 2 genotype study from Aitken model section. Seedling weight measured on each seedling. Two (potential) sources of variation: among flats and among seedlings within a flat.

$$Y_{ijk} = \mu + \gamma_i + T_{ij} + \epsilon_{ijk}$$
LINEAR MIXED-EFFECT MODELS

Studies / data / models seen previously in 511 assumed a single source of "error" variation: $y = X\beta + \epsilon$. $\beta$ are fixed constants (in the frequentist approach to inference); $\epsilon$ is the only random effect. What if there are multiple sources of
Experimental Designs and LME's

LME models provide one way to model correlations among observations. Very useful for experimental designs where there is more than one size of experimental unit, or designs where the observation unit is not the same as the exp
THE ANOVA APPROACH TO THE ANALYSIS OF LINEAR MIXED EFFECTS MODELS

A model for expt. data with subsampling:

$$y_{ijk} = \mu + \tau_i + u_{ij} + e_{ijk}, \quad (i = 1, \dots, t;\ j = 1, \dots, n;\ k = 1, \dots, m)$$

$\beta = (\mu, \tau_1, \dots, \tau_t)'$, $u = (u_{11}, u_{12}, \dots, u_{tn})'$, $\epsilon = (e_{111}, e_{112}, \dots, e_{tnm})'$, $\beta \in \mathbb{R}^{t+1}$,

This is the commonly-used model for a CRD with t treatments, n experimental units per treatment, and m observations per experimental unit. We can write the model as $y = X\beta + Zu + \epsilon$, where X = [
Two approaches for E MS

RCBD with random blocks and multiple obs. per block:

$$Y_{ijk} = \mu + \beta_i + \tau_j + (\beta\tau)_{ij} + \epsilon_{ijk},$$

where $i \in \{1, \dots, B\}$, $j \in \{1, \dots, T\}$, $k \in \{1, \dots, N\}$, with ANOVA table:

Source: Blocks, Treatments, Block×Trt, Error, C. total
df: B-1, T-1, (B-1)(T-1

Expected Mean Squares from two different sources. Source 1: Searle (19
ANOVA ANALYSIS OF A BALANCED SPLIT-PLOT EXPERIMENT

Example: the corn genotype and fertilization response study

[Figure: field layout showing Blocks 1 to 3, plots for Genotypes A, B, and C, and split-plot values 0, 50, 100, 150]

Main plots: genotypes, in blocks. Split plots: fertilization. 2 way factorial treatment structure; split plot variability nested in main plot varia
IDENTIFYING AN APPROPRIATE MODEL

Given a description of a study, how do you construct an appropriate model? Context: more than one size of e.u. A made-up example, intended to be complicated (but far from being the most complicated I've seen): A study of th
MAXIMUM LIKELIHOOD and REML ESTIMATION IN THE GENERAL LINEAR MODEL

Given a value of the parameter vector $\theta$, $f(w|\theta)$ is a real-valued function of $w$. Suppose $f(w|\theta)$ is the probability density function (pdf) or probability mass function (pmf) of a random vector
Prediction of random variables

Key distinction between fixed and random effects:

- Estimate means of fixed effects
- Estimate variance of random effects

But in some instances, want to predict FUTURE values of a random effect. Example (from Efron and Morris, 19
A collection of potentially useful models

We've already seen two very common mixed models:

- for subsampling
- for designed experiments with multiple experimental units

Here are three more general classes of models. Random coefficient models, aka multi-level m

Random coefficient models: a regression where all coefficients vary between groups. Example: Strength of parachute lines.
Choosing among possible random effects structures

Sometimes random effects structure specified by the experimental design, e.g. for experimental study, need a random effect for each e.u. Sometimes subject matter information informs the choice, e.g. expect a

Goal is a model that:

- Fits the data reasonably well
- Is not too complicated

INFORMATION CRITERIA: AIC and BIC
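The two information criteria named here trade off fit against complexity. The sketch below uses their generic definitions, AIC = -2 log L + 2k and BIC = -2 log L + k log n; the log-likelihoods, parameter counts, and model labels are hypothetical, and which parameters and observations to count for a mixed model is a convention this sketch leaves aside.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: -2 log L + 2 k."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k log(n)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

# Hypothetical fits of two random effects structures to the same 40 observations:
# (maximized log-likelihood, number of parameters).
fits = {
    "random intercept": (-52.3, 4),
    "random intercept and slope": (-50.9, 6),
}
scores = {name: (aic(ll, k), bic(ll, k, 40)) for name, (ll, k) in fits.items()}

# Smaller is better for both criteria.
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

In this made-up comparison the richer structure improves the log-likelihood by too little to pay for its two extra parameters, so both criteria pick the simpler model.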
NONLINEAR MODELS

So far the models we have studied this semester have been linear in the sense that our model for the mean has been a linear function of the parameters. We have assumed $E(y) = X\beta$. $f(X_i, \beta) = X_i'\beta$ is said to be linear in the parameters of $\beta$ beca

For example, if

- $X_{i1} = 1$
- $X_{i2}$ = Amount of fertilizer applied to plot $i$
- $X_{i3}$ = (Amount of fertilizer applied to plot $i$)$^2$
- $X_{i4}$ = log(Concentration of fungicide on plot $i$)

then f(Xi, β) = Xi'β = Xi1 β1 + Xi2 β2 + Xi3 β3 + Xi4 β4 = β1 + fert_i β2 + fert_i² β3 +
GENERALIZED LINEAR MODELS

Consider the normal theory Gauss-Markov linear model $y = X\beta + \epsilon$, $\epsilon \sim N(0, \sigma^2 I)$. Does not have to be written as function + error. Could specify distribution and model(s) for its parameters, i.e., $y_i \sim N(\mu_i, \sigma^2)$, where $\mu_i = X_i'\beta$ for all i = 1,

In each example, all responses are independent a
Logistic Regr. Model for Binomial Count Data

Bernoulli model appropriate for 0/1 response on an individual. What if data are # events out of # trials per subject? Example: Toxicology study of the carcinogenicity of aflatoxicol (from Ramsey and Schafer, T

But, all f
Generalized Linear Mixed Models

GLM + Mixed effects. Goal: Add random effects or correlations among observations to a model where observations arise from a distribution in the exponential-scale family (other than the normal). Why: More than one source of va
Generalized Linear Mixed Models

Another look at the canonical LME: $Y = X\beta + Zu + \epsilon$. Consider each level of variation separately. A hierarchical or multi-level model:

- $\eta = X\beta + Zu \sim N(X\beta, ZGZ')$
- $Y|\eta = \eta + \epsilon \sim N(\eta, R)$
- $Y|u = X\beta + Zu + \epsilon \sim N(X\beta + Zu, R)$

Above specifies the conditional di
Methods for large P, small N problems

Regression has (at least) three major purposes:

1. Estimate coefficients in a pre-specified model
2. Discover an appropriate model
3. Predict values for new observations

Regression includes classification because clas
Nonparametric regression using smoothing splines

Smoothing is fitting a smooth curve to data in a scatterplot. Will focus on two variables: Y and one X. Our model: $y_i = f(x_i) + \epsilon_i$, where $\epsilon_1, \epsilon_2, \dots, \epsilon_n$ are independent with mean 0; f is some unknown smooth

Why estimate f? Can see features of the relationship between X and Y that
Smoothing - part 2

Next page: fitted penalized regression splines for 3 smoothing parameters: 0, 100, and 5.7. 5.7 is the "optimal" choice, to be discussed shortly. "Optimal" curve is a sequence of straight lines: continuous, but 1st derivative is not contin

[Figure: fitted penalized regression splines for smoothing parameters ~0, 100, and 5.7]
Smoothing - part 3

Penalized splines is not the only way to estimate $f(x)$ when $y = f(x) + \epsilon$. Two others are kernel smoothing and the Lowess (Loess) smoother. I'll only talk about Lowess. Penalized splines and Lowess have same goal. Lowess is more ad-hoc. Onl

A simple algorithm that doesn't work well:
Classification and Regression Trees

What if you have many X variables? Could imagine estimating $f(X_1, X_2, \dots, X_k)$, but increasingly difficult beyond k = 2 or k = 3. "Curse of dimensionality": in high dimensions, every point is isolated (see next slid
EXAMPLE ANALYSIS OF AN UNBALANCED TWO-FACTOR EXPERIMENT

An experiment was conducted to study the effect of storage time and storage temperature on the amount of active ingredient present in a drug at the end of storage. Sixteen vials of the drug, each con

| Storage Time | 20 °C | 30 °C |
|---|---|---|
| 3 months | 2, 5 | 9, 12, 15 |
| 6 months | 6, 6, 7, 7 | 16 |
[Figure: residual plot of resid(bacteria.lm) against fitted(bacteria.lm)]