L5handout - PUBH 7430 Lecture 5 J. Wolfson Division of...



PUBH 7430 Lecture 5
J. Wolfson
Division of Biostatistics, University of Minnesota School of Public Health
September 20, 2011

The linear predictor

We will often write down models like
\[ Y = X\beta + \epsilon \]
or
\[ E(Y) = X\beta \]
or (later)
\[ f(E(Y)) = X\beta \]
$X\beta$ is called the linear predictor.

The linear predictor, cont'd.

Breaking down the linear predictor:
• $X$ is the "stacked" covariate matrix of dimension $N \times p$
• $\beta$ is a coefficient vector of length $p$

Hence $X\beta$ is a vector of length $N$, each entry of which is a linear combination of the $p$ covariate values in the corresponding row of $X$:
\[
X\beta =
\begin{pmatrix}
x_{11} \cdot \beta \\
x_{12} \cdot \beta \\
\vdots \\
x_{K n_K} \cdot \beta
\end{pmatrix}
=
\begin{pmatrix}
\sum_{r=1}^{p} x_{11r} \beta_r \\
\sum_{r=1}^{p} x_{12r} \beta_r \\
\vdots \\
\sum_{r=1}^{p} x_{K n_K r} \beta_r
\end{pmatrix}
\]
Here $x_{ij}$ is the row of $X$ for observation $j$ in cluster $i$, so the last row, $x_{K n_K}$, belongs to the final observation of cluster $K$.

Note: If the model has an intercept (standard for most models), the first column of $X$ will be a vector of ones: $[1, 1, 1, \dots, 1]$.

Recall: Scatterplot smoothing

Goal: Estimate the underlying mean response curve $\mu$ in the model
\[ Y_i = \mu(t_i) + \epsilon_i \]

Kernel smoothing: Define
\[ \hat{\mu}(t) = \frac{\sum_{i=1}^{n} K[(t - t_i)/h] \, y_i}{\sum_{i=1}^{n} K[(t - t_i)/h]} \]
for some kernel function $K$ and bandwidth $h$.
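The kernel estimator $\hat{\mu}(t)$ defined above is a weighted average of the observed $y_i$, with weights set by $K$ and $h$. A minimal sketch in Python with NumPy follows. The function names, the Gaussian weight $\exp(-0.5u^2)$, and the boxcar window $|u| \le 1$ are illustrative assumptions, not part of the lecture's own code.

```python
import numpy as np

def gaussian_kernel(u):
    # One common choice of K: weight exp(-0.5 * u^2)
    return np.exp(-0.5 * u**2)

def boxcar_kernel(u):
    # Equal weight inside the window |u| <= 1, zero outside (assumed window width)
    return (np.abs(u) <= 1.0).astype(float)

def kernel_smooth(t_grid, t_obs, y_obs, h, kernel=gaussian_kernel):
    """Evaluate mu_hat(t) = sum_i K[(t - t_i)/h] y_i / sum_i K[(t - t_i)/h] on t_grid."""
    t_grid = np.asarray(t_grid, dtype=float)
    t_obs = np.asarray(t_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    # Rows index evaluation points t, columns index observations i
    u = (t_grid[:, None] - t_obs[None, :]) / h
    w = kernel(u)
    num = w @ y_obs
    den = w.sum(axis=1)
    # Where no observation receives positive weight, the estimate is undefined (NaN)
    out = np.full_like(den, np.nan)
    np.divide(num, den, out=out, where=den > 0)
    return out
```

Calling `kernel_smooth(grid, age, log_fev, h=1.0)` with hypothetical arrays `age` and `log_fev` would return the smoothed curve on `grid`. Increasing `h` spreads weight over more observations and so gives a smoother curve.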
Kernel function

• "Boxcar" kernel: straight average of points within a window
• Gaussian kernel: $K(u) = \exp(-0.5u^2)$, weight decays exponentially with distance from $t$

[Figure: Log(FEV) vs. Age (6 to 18), two panels: Boxcar kernel, Gaussian kernel]

Bandwidth

• Wider windows (larger bandwidths) give smoother estimates
• Narrower windows (smaller bandwidths) give bumpier estimates

[Figure: Log(FEV) vs. Age (6 to 18), two panels: Large bandwidth, Small bandwidth]
Lowess

• Kernel smoothers may be sensitive to outliers (bigger problem in smaller datasets)
• Lowess is a relative of kernel smoothing which is less sensitive to outliers

Basic idea
• Define a smoothing window (bandwidth)
• Compute a weighted least-squares fit of the points in the window (more weight for points near center)
• Outliers (points far from regression line) are down-weighted, process is repeated
• $\hat{\mu}(t)$ is given by fitted value from final weighted regression

[Figure: Log(FEV) vs. Age (8 to 18), two panels: Gaussian kernel, Lowess]

Correlation structures: Correlation matrices

More than two variables
• With correlated data, often more than two observations per cluster
• Correlation within clusters may have distinct patterns depending on when/where/how observed, may vary based on:
  • Time between measurements
  • Geographical distance
  • etc.

Scatterplot matrix
• Plot potentially correlated outcomes vs. each other
• With measurements in continuous time/space, may need to round/stratify

[Figure: scatterplot matrix of logFEV.6, logFEV.9, logFEV.12, logFEV.15, logFEV.18]
Scatterplot matrix: What to look for
• What pairs appear to have highest correlation (tight cluster around 45-degree line)? What pairs have lowest correlation?
• Do pairs further apart in time (space) have higher/lower correlations?
• Do correlation patterns between adjacent pairs appear to depend on time?

[Figure: scatterplot matrix of logFEV.6, logFEV.9, logFEV.12, logFEV.15, logFEV.18]

Correlation matrix

A scatterplot matrix can be summarized numerically by the correlation matrix (scaled version of covariance matrix):
\[
\hat{R}(A, B, C, \dots) =
\begin{pmatrix}
1 & \hat{\rho}(A, B) & \hat{\rho}(A, C) & \cdots \\
\hat{\rho}(A, B) & 1 & \hat{\rho}(B, C) & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix}
\]

Note: In the notation from lecture 4, $[A, B, C, \dots] = Y_i$, corresponding to the observations on a single (or, alternately, "generic") cluster. It can be estimated assuming that all clusters have the same underlying correlation matrix.
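If all clusters are assumed to share one correlation matrix, it can be estimated by treating each cluster as a row and each measurement occasion as a column. The sketch below does this in Python with NumPy. The function name is hypothetical. Using, for each pair of columns, only the clusters observed on both occasions (pairwise-complete observations) is an assumption about how missing values are handled, not something the lecture specifies.

```python
import numpy as np

def pairwise_correlation(Y):
    """Sample correlation matrix of the columns of Y (rows = clusters).

    Assumption: missing values are coded as NaN, and each entry uses only
    the rows where both columns are observed.
    """
    Y = np.asarray(Y, dtype=float)
    p = Y.shape[1]
    R = np.full((p, p), np.nan)
    for a in range(p):
        for b in range(a, p):
            ok = ~np.isnan(Y[:, a]) & ~np.isnan(Y[:, b])
            if ok.sum() < 2:
                continue  # too few complete pairs: leave the entry as NaN
            da = Y[ok, a] - Y[ok, a].mean()
            db = Y[ok, b] - Y[ok, b].mean()
            denom = np.sqrt((da**2).sum() * (db**2).sum())
            if denom > 0:
                R[a, b] = R[b, a] = (da * db).sum() / denom
    return R

# Toy data (made up for illustration): 4 clusters, 3 occasions, one missing value
Y_toy = np.array([
    [1.0, 2.0, np.nan],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 2.0],
    [4.0, 8.0, 1.0],
])
R_toy = pairwise_correlation(Y_toy)
```

Entries for which fewer than two clusters have both measurements stay undefined (NaN), so sparsely observed pairs of occasions can produce gaps in the estimated matrix.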
Correlation matrix

When estimated from the data, we get the sample correlation matrix:

           logFEV.6  logFEV.9  logFEV.12  logFEV.15  logFEV.18
logFEV.6       1.00      0.55       0.49       0.55         NA
logFEV.9       0.55      1.00       0.71       0.74       0.72
logFEV.12      0.49      0.71       1.00       0.75       0.64
logFEV.15      0.55      0.74       0.75       1.00       0.87
logFEV.18        NA      0.72       0.64       0.87       1.00

Notes
• Usefulness of the correlation matrix as a summary measure depends on whether the relationship between variables is linear
• Look for scatterplots shaped like ellipses ⇒ data are approximately bivariate Normal

Correlation structures: Autocorrelation and variograms

Autocorrelation and stationarity
• Sometimes, pairs of observations measured the same time apart may have similar correlations
• e.g.

           logFEV.13  logFEV.14  logFEV.15  logFEV.16  logFEV.17
logFEV.13       1.00       0.89       0.85       0.75       0.76
logFEV.14       0.89       1.00       0.89       0.80       0.77
logFEV.15       0.85       0.89       1.00       0.88       0.85
logFEV.16       0.75       0.80       0.88       1.00       0.89
logFEV.17       0.76       0.77       0.85       0.89       1.00

• Suggests that the data generating process is stationary, i.e. the correlation between a pair of observations depends only on the time lag between them, not on the observation time itself: $\rho(Y(t_1), Y(t_2))$ depends only on $|t_1 - t_2|$

Autocorrelation and stationarity, cont'd.

If the process is stationary, we can safely combine measurements with the same time lag to estimate the autocorrelation function:
\[ A(u) = \rho(Y(t_1), Y(t_2)), \quad |t_1 - t_2| = u \]
e.g. for FEV data
\[ A(u) = \rho(FEV(age_1), FEV(age_2)), \quad |age_1 - age_2| = u \]
\[
\hat{A}(u) = \frac{\sum_{(i,j): |t_i - t_j| = u} (y_i - \bar{y})(y_j - \bar{y})}{\sqrt{\sum_{(i,j): |t_i - t_j| = u} (y_i - \bar{y})^2 \, \sum_{(i,j): |t_i - t_j| = u} (y_j - \bar{y})^2}}
\]

Autocorrelation: limitations

\[ A(u) = \rho(Y(t_1), Y(t_2)), \quad |t_1 - t_2| = u \]
The autocorrelation function has limited value if
• Data generating process is not stationary (e.g. FEV data across all ages)
• Observation times are not regularly spaced (e.g. beta-carotene measurements across time)
...
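The estimate $\hat{A}(u)$ pools every pair of observations whose times differ by $u$. A rough Python/NumPy sketch follows, under several assumptions that do not come from the lecture. The function name and arguments are hypothetical. Pairs are formed only within a cluster. $\bar{y}$ is the overall mean. The normalization is the Pearson-type square root of the two sums of squares. Lags are matched up to a small tolerance `tol`.

```python
import numpy as np

def autocorrelation(times, values, cluster_ids, lag, tol=1e-8):
    """Estimate A(lag) from all within-cluster pairs (i, j) with |t_i - t_j| = lag."""
    t = np.asarray(times, dtype=float)
    y = np.asarray(values, dtype=float)
    c = np.asarray(cluster_ids)
    d = y - y.mean()  # deviations from the overall mean (assumed choice of y-bar)
    same_cluster = c[:, None] == c[None, :]
    at_lag = np.abs(np.abs(t[:, None] - t[None, :]) - lag) <= tol
    not_self = ~np.eye(len(y), dtype=bool)
    # Ordered pairs: both (i, j) and (j, i) are included
    i, j = np.nonzero(same_cluster & at_lag & not_self)
    if i.size == 0:
        return float("nan")  # no pairs at this lag
    num = (d[i] * d[j]).sum()
    den = np.sqrt((d[i] ** 2).sum() * (d[j] ** 2).sum())
    return float(num / den) if den > 0 else float("nan")
```

The pairwise comparison builds $N \times N$ arrays, so this form suits small datasets only. With irregularly spaced observation times, few or no pairs share an exact lag and the function returns NaN, which mirrors the second limitation listed above.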