PubH 7405: REGRESSION ANALYSIS

LR: MODEL BUILDING

To provide a more comprehensive prediction of a specific dependent variable Y, say the outcome of a certain treatment, it is very desirable to consider a large number of factors with available data and sort out which ones are most closely related to that outcome. We do not want to miss any important predictor/covariate.

NUMBER OF COVARIATES

In biomedical research, the independent variables are covariates representing patients' characteristics and, in many cases of clinical research, one of them represents the treatment. There may be more potential predictors than we can manage to investigate; sometimes the number of factors is even larger than the sample size. Multiple regression can help us investigate the factors with available data but, in the end, maybe only a few of the potential explanatory factors have predictive power.

The core of the process is to build a model. A simple strategy for building a regression model consists of some, most, or all of the following five steps or phases: (1) Data collection and preparation, (2) Preliminary model investigation, (3) Reduction of the predictor variables, (4) Model refinement and selection, and (5) Model validation.

#1: DATA COLLECTION

The data collection phase separates studies into two types: (1) Controlled experiments, and (2) Observational studies.

#2: PRELIMINARY MODEL INVESTIGATION

Once data have been collected, the process begins with steps/actions employed to identify: (1) Functional form for predictor variables; whenever possible, one should rely on the investigator's/statistician's prior knowledge and/or similar previous studies to suggest appropriate data transformations such as taking logs. (2) Important interactions that should be included in the list of variables from which to narrow down in the next step.

#3: REDUCTION OF EXPLANATORY VARIABLES

Major reason?
To avoid the multiple-decision problem; in addition, when the included factors contain some that are unnecessary, the error degrees of freedom are reduced, which weakens subsequent statistical decisions.

Very often, inexperienced investigators might screen a set of explanatory variables by fitting the full model containing the entire set of potential predictor variables, then simply dropping the factors that are not statistically significant using individual t-tests. At first this seems reasonable, but one may drop important intercorrelated predictor variables (dropping a factor changes the results for the remaining factors, which is good, but it has to be done cautiously with a control strategy).

#4: MODEL REFINEMENT & SELECTION

At this stage, a tentative regression model or models need to be checked in detail for curvature and interaction effects; residual plots are helpful here. In addition, efforts are needed to identify outliers, and further reduction is needed due to multicollinearity.

#5: MODEL VALIDATION

Validation is a useful and necessary final phase of...
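
The caution under #3 about screening with individual t-tests can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the course: the eight-point toy data set is constructed by hand so that two predictors are strongly intercorrelated (correlation about 0.98), the helper names `solve` and `ols` are hypothetical, and the critical values are the usual 5% values from standard t and F tables. The sketch fits the full model, then compares the individual t-statistics with a joint F-test and with the fit obtained after dropping one of the two predictors.

```python
import math

def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(predictors, y):
    """Least-squares fit of y on an intercept plus the given predictor columns.

    Returns (coefficients, t_statistics, sse, error_df).
    """
    n = len(y)
    cols = [[1.0] * n] + [list(map(float, p)) for p in predictors]
    k = len(cols)
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * cols[j][i] for j in range(k)) for i in range(n)]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    df = n - k
    mse = sse / df
    t_stats = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        var_j = mse * solve(xtx, unit)[j]  # j-th diagonal entry of MSE * (X'X)^-1
        t_stats.append(beta[j] / math.sqrt(var_j))
    return beta, t_stats, sse, df

# Constructed toy data (hypothetical): u, v, e are mutually orthogonal and sum to zero.
u = [1, 1, 1, 1, -1, -1, -1, -1]
v = [1, 1, -1, -1, 1, 1, -1, -1]
e = [1, -1, 1, -1, 1, -1, 1, -1]   # plays the role of the error term
d = 0.1                             # small d makes x1 and x2 nearly collinear
x1 = [ui + d * vi for ui, vi in zip(u, v)]
x2 = [ui - d * vi for ui, vi in zip(u, v)]
y = [a + b + ei for a, b, ei in zip(x1, x2, e)]  # true model: y = x1 + x2 + error

beta_full, t_full, sse_full, df_full = ols([x1, x2], y)
_, _, sse_null, _ = ols([], y)          # intercept-only model
beta_x1, t_x1, _, _ = ols([x1], y)      # model after dropping x2

# Joint F-test for dropping both x1 and x2 from the full model.
f_joint = ((sse_null - sse_full) / 2) / (sse_full / df_full)

T_CRIT_5DF = 2.571   # two-sided 5% critical value, t with 5 df (standard table)
F_CRIT_2_5 = 5.79    # 5% critical value, F with (2, 5) df (standard table)

print("full model t-statistics for x1, x2:", t_full[1:])
print("individually significant:", [abs(t) > T_CRIT_5DF for t in t_full[1:]])
print("joint F statistic:", f_joint, "significant:", f_joint > F_CRIT_2_5)
print("t-statistic for x1 after dropping x2:", t_x1[1])
```

In this toy setting neither predictor passes its individual t-test in the full model, yet the pair is jointly significant, and x1 alone becomes clearly significant once x2 is removed. That is the sense in which dropping every non-significant factor at once can discard important intercorrelated predictors, and why the reduction is better done one step at a time under a control strategy.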
 Fall '08
 Staff
