Model selection, and influential data points
ESM 206
Jan 24, 2008

Multiple independent variables
Dependent variable may be caused by more than one independent variable
    pH affected by both SO4 and NO3
Statistical model:
    $y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + \varepsilon_i$
    $\beta_0$: intercept (predicted value of y when both x and z are zero)
    $\beta_1$: partial effect of x on y (effect of x on y while controlling for z)
    $\beta_2$: partial effect of z on y (effect of z on y while controlling for x)

The bias-variance tradeoff
    $y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + \varepsilon_i$
If z really does affect y, and we leave it out of the model, then our estimates of $\beta_1$ may be biased
    On average, our prediction of how a change in x will affect y will be wrong
If z really does NOT affect y, and we leave it in the model, then we may have inflated standard errors of $\beta_1$
    Our prediction of the impact of changing x is overly uncertain

Underfitting vs. overfitting
http://www.willamette.edu/~gorr/classes/cs449/overfitting.html

Strategies for selecting among models
Exploratory data analysis
    We don't know much of anything about the system; we have no a priori ideas about what variables should be in the model
    Throw everything in; let the data tell us which variables are significant; throw out the rest
    Resulting model is a hypothesis; it may overfit the data, and thus be unreliable for predictions or inference
    Examples
    Test the model with new data
Test scientific theory
    We have a general theory, and we want to know whether it applies to the population at hand
    Construct models that include the variables encompassed by the theory, as well as other variables that we might reasonably expect to affect the dependent variable
    Only throw out variables with strong collinearity problems; focus on P values for the variables that reflect the theory

Strategies continued
Use best available science to guide policy predictions
    One or more (often more) scientific theories might apply to the problem at hand
    Need to figure out which theory(ies) apply, and estimate parameters
    For each theory: Construct models that include the variables encompassed by the theory, as well as other variables that we might reasonably expect to affect the dependent variable...
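
The bias-variance tradeoff described in the slides can be illustrated with a small simulation. This is only a sketch under assumed values that are not from the lecture: the coefficients (1.0, 2.0, and beta2), the sample size, the number of replicates, and the rule z = 0.8x + noise (which makes x and z correlated) are all hypothetical. Under these assumptions, leaving out a z that truly matters should push the average estimate of the x coefficient away from 2.0 toward 2.0 + 0.8 * beta2, while keeping a z that does not matter should leave the estimate roughly unbiased but enlarge its standard error.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficient estimates and their standard errors."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Residual variance uses n minus the number of fitted coefficients.
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta, se


def simulate(beta2, n=200, reps=500, seed=0):
    """Average the x coefficient and its SE with z omitted and included.

    All numeric settings here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    out = {"b1_without_z": [], "b1_with_z": [],
           "se_without_z": [], "se_with_z": []}
    for _ in range(reps):
        x = rng.normal(size=n)
        z = 0.8 * x + 0.6 * rng.normal(size=n)  # z is correlated with x
        y = 1.0 + 2.0 * x + beta2 * z + rng.normal(size=n)
        ones = np.ones(n)
        b_s, se_s = ols(np.column_stack([ones, x]), y)      # z left out
        b_f, se_f = ols(np.column_stack([ones, x, z]), y)   # z included
        out["b1_without_z"].append(b_s[1])
        out["b1_with_z"].append(b_f[1])
        out["se_without_z"].append(se_s[1])
        out["se_with_z"].append(se_f[1])
    return {name: float(np.mean(vals)) for name, vals in out.items()}


# Case 1: z really does affect y (beta2 = 1.5); omitting it biases b1.
z_matters = simulate(beta2=1.5)
# Case 2: z does NOT affect y (beta2 = 0); including it inflates the SE of b1.
z_irrelevant = simulate(beta2=0.0)

print("z matters:   ", z_matters)
print("z irrelevant:", z_irrelevant)
```

Comparing `b1_without_z` with `b1_with_z` in the first case shows the bias from underfitting; comparing `se_with_z` with `se_without_z` in the second case shows the cost in precision from carrying an unnecessary variable. If x and z were uncorrelated, both effects would largely disappear in this setup.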
This note was uploaded on 08/06/2008 for the course ESM 206, taught by Professor Kendall, Berkley during the Spring '08 term at UCSB.