© HYON-JUNG KIM, 2017

The sample data is usually partitioned into a training (or model-building) set, which is used to develop the model, and a validation (or prediction) set, which is used to evaluate the predictive ability of the model. K-fold cross-validation partitions the sample dataset into K parts of (roughly) equal size. For each part, we use the remaining K - 1 parts to estimate the model of interest (i.e., the training sample) and test the predictive ability of the model on the held-out part (i.e., the validation sample). The sum of squared prediction errors is computed for each part, and combining the K estimates of prediction error produces the K-fold cross-validation estimate. When K = 2, cross-validation is usually preferable to residual diagnostic methods and takes not much longer to compute.

When K = n, this is called leave-one-out cross-validation. That means that, n separate times, the model is trained on all of the data except one point and a prediction is then made for that one point. The resulting evaluation is very good, but often computationally expensive. Note that the leave-one-out (K = n) cross-validation estimate of prediction error, taken as the sum of squared prediction errors, is identical to the PRESS statistic.

2.7 Multicollinearity

Multicollinearity exists when two or more of the predictors in a regression model are moderately or highly correlated.

Types of multicollinearity
- Structural multicollinearity: a mathematical artifact caused by creating new predictors from other predictors, such as creating the predictor X^2 from the predictor X.
- Data-based multicollinearity: a result of a poorly designed experiment, reliance on purely observational data, or the inability to manipulate the system on which the data are collected.

When it exists, one or more of the following problems can arise:
- the estimated regression coefficient of any one variable depends on which other predictors are included in the model.
- the precision of the estimated regression coefficients decreases as more predictors are added to the model.
- the marginal contribution of any one predictor variable in reducing the error sum of squares depends on which other predictors are included in the model.
- hypothesis tests for β_k = 0 may yield different conclusions depending on which predictors are in the model.

Example. The researchers were interested in determining if a relationship exists between blood pressure and age, weight, body surface area, duration, pulse rate and/or stress level (refer to the data set on the web).

Y: blood pressure (BP, in mm Hg) - 20 individuals with high blood pressure
X_1: age (in years)
X_2: weight (in kg)
X_3: body surface area (BSA, in square meters)
X_4: duration of hypertension (in years)
X_5: basal pulse (in beats per minute)
X_6: stress index

Q. What is the effect on the regression analysis if the predictors are perfectly uncorrelated?
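One way to explore the question numerically is the minimal sketch below. It compares the slope for one predictor when it is fitted alone and when it is fitted jointly with a second predictor. It is an illustration under stated assumptions, not the analysis of the blood pressure data: the data values and the helper names (centered, dot, slope_alone, slopes_together, correlation) are hypothetical and chosen only for this example.

```python
def centered(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def correlation(u, v):
    """Sample correlation between two equal-length lists."""
    uc, vc = centered(u), centered(v)
    return dot(uc, vc) / (dot(uc, uc) * dot(vc, vc)) ** 0.5

def slope_alone(x, y):
    """Slope from regressing y on the single predictor x."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)

def slopes_together(x1, x2, y):
    """Slopes from regressing y on x1 and x2 jointly,
    obtained by solving the 2x2 centered normal equations."""
    a, b, yc = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yc), dot(b, yc)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Made-up illustrative data (NOT the blood pressure data set).
y = [3.0, 5.2, 6.9, 9.1, 3.3, 4.8, 7.2, 8.7]
x1 = [-1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0]
x2_uncorrelated = [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0]
x2_correlated = [-1.0, -0.8, 0.9, 1.2, -1.1, -0.7, 1.0, 0.8]

b1_alone = slope_alone(x1, y)
b1_with_uncorrelated, _ = slopes_together(x1, x2_uncorrelated, y)
b1_with_correlated, _ = slopes_together(x1, x2_correlated, y)

print("corr(x1, x2_uncorrelated):", correlation(x1, x2_uncorrelated))
print("corr(x1, x2_correlated):  ", correlation(x1, x2_correlated))
print("slope of x1 alone:              ", b1_alone)
print("slope of x1 with uncorrelated x2:", b1_with_uncorrelated)
print("slope of x1 with correlated x2:  ", b1_with_correlated)
```

In this sketch the slope for x1 is unchanged when the uncorrelated predictor is added, because the cross-product term between the two centered predictors is zero. With the correlated predictor, the slope for x1 shifts, which is the first problem in the list above.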
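The K-fold procedure described before Section 2.7 can also be sketched in a few lines. The example below assumes a simple linear regression with one predictor and made-up data. The function names (fit_simple_ols, kfold_sse, press) are hypothetical. PRESS is computed here from a single full-data fit using the standard leverage shortcut e_i / (1 - h_ii), which is an assumption about how one might compute it and is not taken from the notes.

```python
import random

def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def kfold_sse(xs, ys, k, seed=0):
    """Sum of squared prediction errors accumulated over K held-out parts."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # K parts of roughly equal size
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_simple_ols([xs[i] for i in train],
                                [ys[i] for i in train])
        total += sum((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in fold)
    return total

def press(xs, ys):
    """PRESS from one full-data fit: sum of (e_i / (1 - h_ii))^2."""
    n = len(xs)
    b0, b1 = fit_simple_ols(xs, ys)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = 1.0 / n + (x - x_bar) ** 2 / sxx  # leverage of this point
        total += ((y - (b0 + b1 * x)) / (1.0 - h)) ** 2
    return total

# Made-up illustrative data (not the course data set).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

cv_5fold = kfold_sse(xs, ys, k=5)
cv_loo = kfold_sse(xs, ys, k=len(xs))  # K = n: leave-one-out
press_stat = press(xs, ys)

print("5-fold CV sum of squared errors:", cv_5fold)
print("leave-one-out CV sum of squared errors:", cv_loo)
print("PRESS:", press_stat)
```

With K = n, the cross-validation sum of squared errors and PRESS agree up to floating-point rounding, while the 5-fold value depends on how the points happen to be split (controlled here by the seed argument).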