# Multiple Regression: Assumptions and Variable Selection


## Assumptions of Multiple Regression (continued)

| Assumption | If violated |
| --- | --- |
| Values of X are known precisely | Use special methods that account for measurement error in X |
| No interaction between variables (beyond interactions included in the model) | Check for interactions and include interaction terms as necessary |
## Issues in Fitting Multiple Regression Models: Which and How Many Variables to Include?

In the current study, the researchers collected data on many additional covariates: smoking, blood pressure, serum ferritin levels, residence in rural vs. urban areas, and urinary cadmium levels. Adding these variables to the model did not improve the fit, so they were excluded from the final (reported) model. That is, the researchers computed a P-value for every covariate in the model, removed those with P > 0.05, and reran the model without those variables.

## Automatic Variable Selection

Many statistical programs can perform automatic variable selection:

- **All-subsets selection**: fit every possible model (i.e., every combination of variables) to the data and pick the one that fits best.
- **Forward selection**: start with a single predictor and add other predictors one at a time, keeping only those that improve model fit (that is, are significantly associated with the outcome).
- **Backward selection**: start with the full set of predictors and eliminate the predictors that contribute the least, one at a time; stop when no more predictors can be removed.
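The forward-selection procedure above can be sketched as a greedy loop. This is a minimal illustration, not a statistical package: the scoring function stands in for a real goodness-of-fit measure (e.g., adjusted R²), and the R² values below are hypothetical.

```python
# Sketch of forward selection: add predictors one at a time,
# keeping a step only if it improves model fit by more than min_gain.
def forward_select(predictors, score, min_gain=0.01):
    """Greedy forward selection driven by a scoring function."""
    selected = []
    best = score(selected)  # fit of the intercept-only model
    improved = True
    while improved:
        improved = False
        for p in predictors:
            if p in selected:
                continue
            candidate = selected + [p]
            if score(candidate) > best + min_gain:
                selected, best, improved = candidate, score(candidate), True
                break
    return selected

# Hypothetical model fits (R^2) for the subsets the search will visit:
r2 = {
    (): 0.00,
    ("age",): 0.30,
    ("age", "bmi"): 0.45,
    ("age", "bmi", "smoking"): 0.46,  # smoking adds almost nothing
}
score = lambda subset: r2.get(tuple(sorted(subset)), 0.0)

print(forward_select(["age", "bmi", "smoking"], score))  # → ['age', 'bmi']
```

Backward selection is the mirror image: start from the full set and repeatedly drop the predictor whose removal costs the least fit, stopping when every removal would hurt the score.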
## Problems with Automatic Variable Selection

Testing too many combinations can lead to spurious associations. The final model fits too well:

- R² is too high.
- The best-fit parameters are too far from zero.
- The CIs are too narrow.
- The P-values are too low.

When reading (presenting) the results of a study, it is important to know (report) how many variables the investigators started with. Good practice is to validate the model in an independent population.

## Issues in Fitting Multiple Regression Models: How Many Variables to Include?

The rule of thumb is that you need at least 10–20 (or even 40) participants for each predictor.

- Too few variables: the model may be too simple and poor at prediction.
- Too many variables can result in **overfitting**: the model fits the current data set well, but its predictions will not generalize to other data sets.

Goal: find the simplest model that still adequately predicts the outcome.
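Overfitting can be shown with a small numerical sketch. Here a model with as many parameters as data points (a degree-4 polynomial through 5 points, evaluated via Lagrange interpolation) fits the training data perfectly but extrapolates wildly; the data are hypothetical and roughly follow y ≈ x.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate at x the degree-(n-1) polynomial through all n points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Five hypothetical observations from a roughly linear relationship:
xs = [1, 2, 3, 4, 5]
ys = [1.1, 1.9, 3.2, 3.8, 5.1]

# A 5-parameter model fits the 5 training points exactly ("R^2 = 1")...
assert abs(lagrange_predict(xs, ys, 3) - 3.2) < 1e-9
# ...but its prediction at x = 7 (~28.4) is far from the linear trend (~7):
print(lagrange_predict(xs, ys, 7))
```

This is the sense in which a model with too many parameters "fits the current data set well, but does not generalize": the in-sample fit is perfect while out-of-sample predictions are badly wrong.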
## Issues in Fitting Multiple Regression Models: Which and How Many Variables to Include?

This depends on the study's primary goal:

- **Single hypothesis testing**: estimate the effect of a given variable adjusted for all available potential confounders.
- **Exploratory study**: identify a set of variables independently associated with the outcome.
- **Predictive modeling**: predict the outcome with the fewest variables possible (parsimony vs. adequacy).

## Multicollinearity

**Multicollinearity** occurs when two predictor (X) variables are highly correlated with each other:

- They contain redundant information: once the first is known, including the second does not add much information (i.e., does not improve model fit).
- Coefficients cannot be estimated reliably.

Check the correlation among the predictors!
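The suggested check is straightforward: compute the pairwise correlation between candidate predictors before fitting. A minimal sketch with two hypothetical predictors that carry essentially the same information (weight recorded in kilograms and in pounds):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: the same quantity measured in two unit systems
weight_kg = [60, 72, 81, 95, 55, 68]
weight_lb = [132, 159, 178, 210, 121, 150]  # roughly 2.2 * weight_kg

# |r| near 1 flags redundancy: a candidate for dropping one predictor
print(round(pearson(weight_kg, weight_lb), 4))
```

In practice a correlation matrix over all predictors (or a variance inflation factor, which also catches a predictor that is a combination of several others) serves the same purpose; a very high pairwise |r| signals that one of the two variables can likely be dropped without hurting model fit.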
