Linear Regression
Chapter 3
Overview
Linear regression is a simple supervised learning approach.
The goal is to predict a quantitative response variable Y from a set of
predictor variables X.
It assumes that the dependence of Y on X1, X2, . . . , Xp is linear.
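Written out, the linearity assumption takes the following form. The coefficient symbols \beta_0, ..., \beta_p and the error term \epsilon are the usual textbook notation and are assumed here; this excerpt does not define them.

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \epsilon
```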
We will spend a l
Lecture 13: Subset Selection and
Regularization
Improving the Linear Model
Although simple, the linear model has distinct advantages in terms of its
interpretability and often shows good predictive performance.
Chapter 6 presents some ways in which the s
Data Wrangling
with dplyr and tidyr
Cheat Sheet
Tidy Data - A foundation for wrangling in R
In a tidy data set:
Each variable is saved in its own column
Syntax - Helpful conventions for wrangling
  Sepal.Length Sepal.Width Petal.Length
1          5.1         3.5
DATA ANALYSIS THE DATA.TABLE WAY
The official Cheat Sheet for the DataCamp course
Take DT, subset rows using i, then calculate j grouped by by
General form: DT[i, j, by]
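The "subset rows, then group, then calculate" order can be illustrated outside R as well. The sketch below is a plain-Python analogy, not the data.table API. The helper `query`, its argument order, and the toy rows are all hypothetical; only the column names V1 and V2 echo the cheat sheet.

```python
from collections import defaultdict

def query(rows, j, i=None, by=None):
    """Mimic the order of operations in DT[i, j, by] on a list of dicts.

    i  : optional row filter (function row -> bool)
    j  : calculation applied to each group (function list_of_rows -> value)
    by : optional column name to group on
    """
    # Step 1: subset rows using i.
    selected = [r for r in rows if i is None or i(r)]
    # Step 2: split the selected rows into groups defined by `by`.
    groups = defaultdict(list)
    for r in selected:
        key = r[by] if by is not None else None
        groups[key].append(r)
    # Step 3: calculate j once per group.
    return {key: j(group) for key, group in groups.items()}

# Hypothetical toy table (values invented for illustration).
rows = [
    {"V1": 1, "V2": "A"},
    {"V1": 2, "V2": "B"},
    {"V1": 3, "V2": "A"},
    {"V1": 4, "V2": "B"},
]

# Keep rows with V1 > 1, then sum V1 within each level of V2.
result = query(
    rows,
    j=lambda group: sum(r["V1"] for r in group),
    i=lambda r: r["V1"] > 1,
    by="V2",
)
```

Here `result` maps each V2 level to the sum of V1 over the rows that passed the filter. It illustrates the i, j, by roles only.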
CREATE A DATA TABLE
Create a data.table
library(data.table)
> DT
set.seed(45L)
V1 V2
Resampling Methods, Part 1
Chapter 5
Introduction
This chapter examines tools that involve repeatedly drawing samples from
a training set and refitting a model of interest on each sample in order to
obtain more information about the fitted model
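A minimal sketch of the "draw a sample, refit, repeat" idea, under stated assumptions. It uses bootstrap resampling (sampling with replacement) as one possible resampling scheme and a simple least-squares line as the model of interest. The function names and the toy data are hypothetical and do not come from the chapter.

```python
import random
import statistics

def fit_slope(xs, ys):
    """Least-squares slope of y on x for a simple linear regression."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slopes(xs, ys, n_resamples=200, seed=0):
    """Refit the slope on repeated resamples of the training set."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_resamples):
        # Draw n observation indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[k] for k in idx]
        by = [ys[k] for k in idx]
        if len(set(bx)) < 2:
            continue  # slope is undefined if every resampled x is identical
        slopes.append(fit_slope(bx, by))
    return slopes

# Toy training set (invented): y is roughly 2x + 1 plus Gaussian noise.
xs = [float(k) for k in range(20)]
noise = random.Random(1)
ys = [2.0 * x + 1.0 + noise.gauss(0, 1) for x in xs]

slopes = bootstrap_slopes(xs, ys)
# The spread of the refitted slopes estimates the slope's variability.
slope_se = statistics.stdev(slopes)
```

The spread of `slopes` is the extra information gained from refitting: it indicates how much the estimate would change from one training sample to another.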
Model Assess
Introduction to Statistical Models
and Data Mining
Stat 101C - Lew
Lecture 1: Overview
Course Content
Applied regression analysis, with emphasis on the general linear model (e.g.,
multiple regression) and the generalized linear model (e.g., logistic regression).
Linear Regression, Part 3
Chapter 3
Possible Fit Problems
1. Non-linearity of the response-predictor relationships.
2. Correlation of error terms.
3. Non-constant variance of error terms.
4. Outliers.
5. High-leverage points.
6. Collinearity.
1. Non-linea
Linear Regression, Part 2
Chapter 3
The basic questions (3.2.2)
Is at least one of the predictors X1, X2, . . . , Xp useful in predicting the
response?
Do all the predictors help to explain Y , or is only a subset of the predictors
useful?
How well does t
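For the first question, a standard tool is the F-statistic for the null hypothesis that all slope coefficients are zero. It is given here as the usual textbook formula rather than something shown in this excerpt. The symbols n (number of observations), \bar{y} (sample mean of the response), and \hat{y}_i (fitted values) are assumed notation.

```latex
H_0 : \beta_1 = \beta_2 = \cdots = \beta_p = 0
\qquad
F = \frac{(\mathrm{TSS} - \mathrm{RSS}) / p}{\mathrm{RSS} / (n - p - 1)},
\qquad
\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2,
\quad
\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```

A value of F much larger than 1 is evidence that at least one predictor is related to the response.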
Lecture 2:
Model Accuracy in Regression
From Last Time
It is easy to see that the best fit depends heavily on the data and we need a
well-defined set of rules to help us decide which method works
Classification, Part 3
Chapter 4
From Last Time
<- should set.seed()
Confusion Matrix
Recall that in the last two lectures we've looked at contingency tables as a way of
assessing models. In this textbook, a contingency table of the predicted
outcome (row) vs actual
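A small sketch of building such a table by hand, with predicted outcomes on the rows. Putting actual outcomes on the columns is an assumption about the layout. The function name, the "Yes"/"No" labels, and the data are hypothetical.

```python
from collections import Counter

def confusion_matrix(predicted, actual, labels):
    """Count (predicted, actual) pairs; rows = predicted, columns = actual."""
    counts = Counter(zip(predicted, actual))
    return [[counts[(p, a)] for a in labels] for p in labels]

# Invented predictions and true outcomes for six observations.
predicted = ["Yes", "No", "No", "Yes", "No", "Yes"]
actual = ["Yes", "No", "Yes", "Yes", "No", "No"]
labels = ["No", "Yes"]

table = confusion_matrix(predicted, actual, labels)

# Correct classifications sit on the diagonal of the table.
accuracy = sum(table[k][k] for k in range(len(labels))) / len(actual)
```

Off-diagonal cells separate the two kinds of error: predicting "No" when the truth is "Yes", and the reverse.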
Finishing thoughts from last time
A general rule
low bias, high variation
high bias, low variation
As we employ more flexible statistical learning methods, the variance will
increase but the
Classification Methods
Chapter 4
Logistic Regression
Overview
Last chapter: the response variable Y is quantitative.
This chapter: the response variable is qualitative, AKA categorical. These
types of problems are probably more common than regression.
Qualitativ
Classification, Part 2
Chapter 4
Linear Discriminant Analysis (LDA)
LDA is like logistic regression: it classifies data using a categorical Y
variable (outcome). Examples could be
Made Money or not
Bought Something or not
Satisfied or not satisfied
Voted Democrat
1
The Preamble
The preamble is the space between the documentclass statement and the begin document statement. I can
reproduce it for you here:
% this part is the preamble
\usepackage[left=1in, top=1in, right=1in, bottom=1in, papersize={8.5in, 11in},
po
Data manipulation
with dplyr
Hadley Wickham
@hadleywickham
Chief Scientist, RStudio
June 2014
Data analysis is the process by which data becomes understanding,
knowledge and insig