Classification

Qualitative variables take values in an unordered set C, such as:

eye color ∈ {brown, blue, green}
email ∈ {spam, ham}.

Given a feature vector X and a qualitative response Y taking values in the set C, the classification task is to build a function C(X) that takes as input the feature vector X and predicts its value for Y; i.e. C(X) ∈ C.
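A minimal sketch of what such a function C(X) can look like. The data, labels, and the nearest-centroid rule below are all invented for illustration; they are not the method the slides go on to develop.

```python
import numpy as np

# Hypothetical training data: 2-dimensional feature vectors with labels
# from the unordered set C = {"spam", "ham"}.
train_X = np.array([[5.0, 1.0], [4.0, 2.0], [1.0, 5.0], [0.5, 4.0]])
train_y = np.array(["spam", "spam", "ham", "ham"])

# One simple way to build C(X): store the mean feature vector (centroid)
# of each class.
centroids = {label: train_X[train_y == label].mean(axis=0)
             for label in np.unique(train_y)}

def C(x):
    """Predict the label whose class centroid is closest to x."""
    return min(centroids, key=lambda label: np.linalg.norm(x - centroids[label]))

print(C(np.array([4.5, 1.5])))  # a point near the "spam" examples → spam
```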
Cross-validation and the Bootstrap

In this section we discuss two resampling methods: cross-validation and the bootstrap.
Taha Mokfi
Department of Statistics
University of Central Florida
https://www.linkedin.com/in/tahamokfi
https://www.linkedin.com/in/mahsa-almaeenejad-30229646

B.S. Industrial Engineering
M.S. Statistical Computing and Data Mining (UCF)
4 years of experience in ana
What is Statistical Learning?
[Figure: scatter plots of Sales versus TV, Radio, and Newspaper.]
Shown are Sales vs TV, Radio and Newspaper, with a blue linear-regression line fit separately to each.
There are N = 50 samples, with 25 labeled 1 and 25 labeled 2, denoted by the response Y. There are p = 5000 covariates X1, . . . , X5000 simulated i.i.d. from the standard normal distribution N(0, 1), which are also independent of Y.
(1) Do 50 times
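The setup above can be simulated directly; the seed below is an arbitrary choice for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

N, p = 50, 5000
# p = 5000 covariates drawn i.i.d. from N(0, 1), independent of Y
X = rng.standard_normal((N, p))
# response: 25 samples labeled 1 and 25 labeled 2
Y = np.repeat([1, 2], 25)

print(X.shape, Y.shape)  # → (50, 5000) (50,)
```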
Simple linear regression using a single predictor X.

We assume a model

Y = β0 + β1 X + ε,

where β0 and β1 are two unknown constants that represent the intercept and slope, also known as coefficients or parameters, and ε is the error term.

Given some estimates β̂0 and β̂1 for the model coefficients, we predict future values using ŷ = β̂0 + β̂1 x.
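A minimal sketch of computing the least-squares estimates β̂0 and β̂1 from data; the numbers below are made up for illustration (y is roughly 2x).

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])  # roughly y = 2x, invented data

# Closed-form least-squares estimates of slope and intercept
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

# Fitted values y_hat = beta0 + beta1 * x
y_hat = beta0 + beta1 * x
print(round(beta0, 3), round(beta1, 3))
```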
Logistic Regression

Let's write p(X) = Pr(Y = 1|X) for short and consider using balance to predict default. Logistic regression uses the form

p(X) = e^(β0 + β1 X) / (1 + e^(β0 + β1 X)).

(e ≈ 2.71828 is a mathematical constant, Euler's number.)

It is easy to see that no matter what values β0, β1 or X take, p(X) will have values between 0 and 1.
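The logistic form is easy to check numerically. The coefficient values below are invented for illustration, not fitted to the Default data.

```python
import numpy as np

def p(x, beta0=-10.0, beta1=0.005):
    """Logistic form p(X) = e^(b0 + b1 X) / (1 + e^(b0 + b1 X))."""
    z = beta0 + beta1 * x
    return np.exp(z) / (1.0 + np.exp(z))

# Whatever value balance takes, p(balance) stays strictly between 0 and 1.
for balance in [0.0, 1000.0, 2000.0, 3000.0]:
    print(balance, p(balance))
```

Note that at x = 2000 the exponent is exactly 0 with these coefficients, so p(2000) = 1/2.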
Linear Model Selection and Regularization

Recall the linear model

Y = β0 + β1 X1 + · · · + βp Xp + ε.

In the lectures that follow, we consider some approaches for extending the linear model framework. In the lectures covering Chapter 7 of the text, we generalize the linear model in order to accommodate non-linear, but still additive, relationships.
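The multi-predictor linear model above can be fit by least squares in the same way as the one-predictor case. A sketch on simulated data with known coefficients, assuming p = 3 predictors for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.standard_normal((n, p))

# True coefficients (intercept plus p slopes), chosen for illustration
true_beta = np.array([1.0, 2.0, -1.0, 0.5])
eps = 0.1 * rng.standard_normal(n)  # error term
y = true_beta[0] + X @ true_beta[1:] + eps

# Least squares: prepend a column of ones for the intercept
design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(beta_hat, 2))  # estimates close to true_beta
```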