Comprehension Check: Cross-validation | 4.2: Cross-validation | PH125.8x Courseware | edX


Q1 (1/1 point, graded)

Generate a set of random predictors and outcomes using the following code:

    set.seed(1996) # if you are using R 3.5 or earlier
    set.seed(1996, sample.kind = "Rounding") # if you are using R 3.6 or later
    n <- 1000
    p <- 10000
    x <- matrix(rnorm(n * p), n, p)
    colnames(x) <- paste("x", 1:ncol(x), sep = "_")
    y <- rbinom(n, 1, 0.5) %>% factor()

    x_subset <- x[, sample(p, 100)]

Because x and y are completely independent, you should not be able to predict y using x with accuracy greater than 0.5. Confirm this by running cross-validation using logistic regression to fit the model. Because we have so many predictors, we selected a random sample x_subset. Use the subset when training the model.

Which code correctly performs this cross-validation?

    fit <- train(x_subset, y)
    fit$results

    fit <- train(x_subset, y, method = "glm")
    fit$results

    fit <- train(y, x_subset, method = "glm")
    fit$results

    fit <- test(x_subset, y, method = "glm")
    fit$results
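As a minimal sketch of the intended check (assuming the caret package for train() and dplyr for the %>% pipe used above; the Accuracy lookup at the end is illustrative, not part of the question), fitting logistic regression with method = "glm" lets train() estimate accuracy with its default resampling, which should come out close to 0.5 here:

    library(caret)  # provides train()
    library(dplyr)  # provides the %>% pipe used in the setup code

    # x_subset and y come from the setup code in the question
    fit <- train(x_subset, y, method = "glm")  # logistic regression, resampled by caret's defaults
    fit$results                                # resampling estimates; Accuracy should be near 0.5
    fit$results$Accuracy                       # pull out just the accuracy estimate

Because the predictors are pure noise, the estimated accuracy is expected to fluctuate around 0.5 rather than exceed it in any meaningful way.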
Q2 (1/1 point, graded)

Now, instead of using a random selection of predictors, we are going to search for those that are most predictive of the outcome. We can do this by comparing the values for the y = 1 group to those in the y = 0 group, for each predictor, using a t-test. You can perform this step like this:

    install.packages("BiocManager")
    BiocManager::install("genefilter")
    library(genefilter)
    tt <- colttests(x, y)
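As a hedged follow-on sketch (the object names pvals and ind and the 0.01 cutoff are illustrative assumptions, not part of the question), colttests() runs one t-test per column of x comparing the two y groups, and its output can be inspected like this:

    library(genefilter)

    tt <- colttests(x, y)        # data frame with statistic, dm, and p.value columns
    head(tt)                     # one row per predictor column of x
    pvals <- tt$p.value          # per-predictor p-values (assumed variable name)
    ind <- which(pvals <= 0.01)  # hypothetical significance cutoff
    length(ind)                  # how many pure-noise predictors pass the cutoff by chance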
