Regression.pdf


© HYON-JUNG KIM, 2017

The sample data is usually partitioned into a training (or model-building) set, which is used to develop the model, and a validation (or prediction) set, which is used to evaluate the predictive ability of the model. K-fold cross-validation partitions the sample dataset into K parts of (roughly) equal size. For each part, we use the remaining K − 1 parts to estimate the model of interest (i.e., the training sample) and test the predictive ability of the model on the held-out part (i.e., the validation sample). The sum of squared prediction errors is then computed for each part, and combining the K estimates of prediction error produces the K-fold cross-validation estimate.

When K = 2, cross-validation is usually preferable to residual diagnostic methods and takes not much longer to compute. When K = n, this is called leave-one-out cross-validation. That is, the model is fitted n separate times, each time on all of the data except one point, and a prediction is then made for that one point. The evaluation given by this method is very good, but it is often computationally expensive. Note that when K = n, the cross-validation sum of squared prediction errors is identical to the PRESS statistic.

2.7 Multicollinearity

Multicollinearity exists when two or more of the predictors in a regression model are moderately or highly correlated.

Types of multicollinearity:

- Structural multicollinearity: a mathematical artifact caused by creating new predictors from other predictors, such as creating the predictor X² from the predictor X.
- Data-based multicollinearity: a result of a poorly designed experiment, reliance on purely observational data, or the inability to manipulate the system on which the data are collected.

When multicollinearity exists, any of the following problems can occur:

- The estimated regression coefficient of any one variable depends on which other predictors are included in the model.
- The precision of the estimated regression coefficients decreases as more predictors are added to the model.
- The marginal contribution of any one predictor variable in reducing the error sum of squares depends on the other predictors that are included in the model.
- Hypothesis tests for βₖ = 0 may yield different conclusions depending on which predictors are in the model.

Example. The researchers were interested in determining if a relationship exists between blood pressure and age, weight, body surface area, duration, pulse rate and/or stress level (refer to the data set on the web). The data cover 20 individuals with high blood pressure.

- Y: blood pressure (BP, in mm Hg)
- X₁: age (in years)
- X₂: weight (in kg)
- X₃: body surface area (BSA, in square meters)
- X₄: duration of hypertension (in years)
- X₅: basal pulse (in beats per minute)
- X₆: stress index

Q. What is the effect on the regression analysis if the predictors are perfectly uncorrelated?
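One way to explore the question about uncorrelated predictors is a small simulation. The sketch below does not use the blood pressure data; it assumes synthetic data, a hypothetical helper `ols_coefficients`, and arbitrary true coefficients. It compares the estimated coefficient of one predictor with and without a second predictor in the model, first when the two predictors are exactly uncorrelated and then when they are highly correlated.

```python
import numpy as np

def ols_coefficients(y, *predictors):
    """Least-squares coefficients (intercept first) of y on the given predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)  # fixed seed, synthetic data only
n = 20

# Uncorrelated case: a balanced 2x2 design, so x1 and x2 are centered
# and their sample correlation is exactly zero.
x1 = np.tile([-1.0, -1.0, 1.0, 1.0], n // 4)
x2 = np.tile([-1.0, 1.0, -1.0, 1.0], n // 4)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.5, size=n)

b1_alone = ols_coefficients(y, x1)[1]        # x1 is the only predictor
b1_with_x2 = ols_coefficients(y, x1, x2)[1]  # x2 added to the model

# Correlated case: the second predictor is x1 plus a little noise.
x2_corr = x1 + rng.normal(scale=0.3, size=n)
y_corr = 1.0 + 2.0 * x1 + 3.0 * x2_corr + rng.normal(scale=0.5, size=n)

c1_alone = ols_coefficients(y_corr, x1)[1]
c1_with_x2 = ols_coefficients(y_corr, x1, x2_corr)[1]

print("uncorrelated predictors:", b1_alone, b1_with_x2)
print("correlated predictors:  ", c1_alone, c1_with_x2)
```

In the uncorrelated case the two estimates of the x1 coefficient agree up to floating-point error, so the first problem listed above does not arise. In the correlated case the estimate shifts noticeably once the second predictor enters the model.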

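Returning to the cross-validation procedure described before Section 2.7, the following is a minimal sketch of K-fold cross-validation for an ordinary least-squares fit. The function names `kfold_sspe` and `press`, the synthetic data, and the random fold assignment are assumptions made for illustration. The `press` function uses the standard hat-matrix identity for leave-one-out residuals, which is not stated in the notes, so the final check compares two independent computations of the same quantity.

```python
import numpy as np

def kfold_sspe(X, y, k, seed=0):
    """Sum of squared prediction errors from K-fold cross-validation of OLS.

    X is the design matrix and must already contain an intercept column.
    """
    indices = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(indices, k)  # K parts of roughly equal size
    total = 0.0
    for validation in folds:
        training = np.setdiff1d(indices, validation)  # the other K - 1 parts
        beta, *_ = np.linalg.lstsq(X[training], y[training], rcond=None)
        total += np.sum((y[validation] - X[validation] @ beta) ** 2)
    return total

def press(X, y):
    """PRESS from a single fit: sum of (e_i / (1 - h_ii))^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    leverage = np.diag(X @ np.linalg.pinv(X))  # diagonal of the hat matrix
    return np.sum((residuals / (1.0 - leverage)) ** 2)

# Synthetic data (an assumption, not the data set from the notes).
rng = np.random.default_rng(1)
n = 20
predictors = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), predictors])
y = 1.0 + predictors @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)

five_fold = kfold_sspe(X, y, k=5)
leave_one_out = kfold_sspe(X, y, k=n)  # K = n
press_value = press(X, y)

print("5-fold SSPE:        ", five_fold)
print("leave-one-out SSPE: ", leave_one_out)
print("PRESS:              ", press_value)
```

With K = n every fold holds a single observation, so the loop refits the model n times, which illustrates why leave-one-out cross-validation can be expensive. The PRESS shortcut gives the same number from one fit.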