Linear Regression
Chapter 3
Overview
This is the simplest supervised learning approach.
The goal is to predict a quantitative response variable Y from a set of
predictor variables X.
It assumes that the dependence of Y on X1, X2, ..., Xp is linear.
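As a minimal sketch of the linearity assumption, the code below fits the one-predictor case Y = b0 + b1*X by least squares using only the Python standard library. The function name fit_simple_linear and the data are made up for illustration; the general model with p predictors needs matrix algebra that is not shown here.

```python
def fit_simple_linear(x, y):
    """Return least-squares (intercept, slope) for the model y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_simple_linear(x, y)
print(b0, b1)
```

Because the invented points fall exactly on a line, the fitted intercept and slope recover that line; with noisy data they would only approximate it.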
Lecture 13: Subset Selection and
Regularization
Improving the Linear Model
Although simple, the linear model has distinct advantages in terms of its
interpretability and often shows good predictive performance.
Data Wrangling
with dplyr and tidyr
Cheat Sheet
Tidy Data - A foundation for wrangling in R
In a tidy data set:
Each variable is saved in its own column
Syntax - Helpful conventions for wrangling
  Sepal.Length Sepal.Width Petal.Length
1          5.1         3.5
DATA ANALYSIS THE DATA.TABLE WAY
The official Cheat Sheet for the DataCamp course
Take DT, subset rows using i, then calculate j grouped by by
General form: DT[i, j, by]
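To make the order of operations in DT[i, j, by] concrete, here is a rough Python analogy, not data.table itself: filter rows with i, group the survivors with by, then compute j per group. The helper subset_then_aggregate and the rows are hypothetical; in data.table the equivalent call would look like DT[V1 > 1, sum(V1), by = V2].

```python
from collections import defaultdict


def subset_then_aggregate(rows, i, j, by):
    """Keep rows where i(row) is true, group them by by(row), apply j to each group."""
    groups = defaultdict(list)
    for row in rows:
        if i(row):
            groups[by(row)].append(row)
    return {key: j(group) for key, group in groups.items()}


# Hypothetical table with columns V1 and V2 (values invented for illustration)
rows = [
    {"V1": 1, "V2": "A"},
    {"V1": 2, "V2": "B"},
    {"V1": 3, "V2": "A"},
    {"V1": 4, "V2": "B"},
]

result = subset_then_aggregate(
    rows,
    i=lambda r: r["V1"] > 1,               # i: which rows to keep
    j=lambda g: sum(r["V1"] for r in g),   # j: what to compute per group
    by=lambda r: r["V2"],                  # by: how to group
)
print(result)
```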
CREATE A DATA TABLE
Create a
data.table
library(data.table)
> DT
set.seed(45L)
V1 V2
Resampling Methods, Part 1
Chapter 5
Introduction
This chapter examines tools that involve repeatedly drawing samples from
a training set and refitting a model of interest on each sample in order to
Introduction to Statistical Models
and Data Mining
Lecture 1: Overview
Course Content
Applied regression analysis, with emphasis on the general linear model (e.g.,
multiple regression) and the generalized linear model (e.g., logistic regression).
Linear Regression, Part 3
Chapter 3
Possible Fit Problems
1. Non-linearity of the response-predictor relationships.
2. Correlation of error terms.
3. Non-constant variance of error terms.
4. Outliers.
5. High-leverage points.
6. Collinearity.
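As one illustration of item 5, the sketch below computes the leverage statistic for simple linear regression, h_i = 1/n + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2. The data are invented, and the cutoff of twice the average leverage is only a common rule of thumb assumed here, not a rule taken from these notes.

```python
def leverage(x):
    """Leverage h_i of each observation in a one-predictor linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Hypothetical predictor values; the last one lies far from the rest
x = [1.0, 2.0, 3.0, 4.0, 20.0]
h = leverage(x)

# Average leverage is (p + 1) / n, with p = 1 predictor here
average_h = 2.0 / len(x)
# Assumed rule of thumb: flag points with more than twice the average leverage
flagged = [i for i, hi in enumerate(h) if hi > 2 * average_h]
print(h, flagged)
```

The leverages always sum to p + 1, so a single value close to 1 means that observation dominates the fit.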
Introduction to Statistical Models
and Data Mining
Stat 101C - Lew
Lecture 2:
Model Accuracy in Regression
From Last Time
It is easy to see that best fit depends heavily on the data and we need a
well-defined set of rules to help us decide which method works
Classification, Part 3
Chapter 4
From Last Time
Confusion Matrix
Recall that in the last two lectures we've looked at contingency tables as a way of
assessing models. In this textbook, a contingency table of the predicted
outcome (row) vs actual
Introduction to Statistical Models
and Data Mining
Stat 101C - Lew
Finishing thoughts from last time
A general rule
low bias, high variation
high bias, low variation
as we employ more flexible statistical learning methods, the variance will
Classification Methods
Chapter 4
Logistic Regression
Overview
Last chapter: the response variable Y was quantitative.
This chapter: the response variable is qualitative, also known as categorical. These
types of problems are probably more common than regression.
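A minimal sketch of how logistic regression handles a two-class qualitative response: a linear predictor b0 + b1*x is passed through the logistic function to give a probability, which is then thresholded into a class. The coefficients, the 0.5 threshold, and the function names are assumptions for illustration, not fitted values.

```python
import math


def logistic(z):
    """Map a real-valued linear predictor z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(x, b0, b1, threshold=0.5):
    """Return (class label, probability) for a single predictor value x."""
    p = logistic(b0 + b1 * x)
    label = 1 if p >= threshold else 0
    return label, p


# Hypothetical coefficients, chosen only to show the mechanics
b0, b1 = -1.0, 1.0
label_low, p_low = predict_class(0.0, b0, b1)
label_high, p_high = predict_class(2.0, b0, b1)
print(label_low, p_low, label_high, p_high)
```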
Classification, Part 2
Chapter 4
Linear Discriminant Analysis (LDA)
LDA is like Logistic Regression. It classifies data using a categorical Y
variable (outcome). Examples could be
Made Money or not
Bought Something or not
Satisfied or not satisfied
The Preamble
The preamble is the space between the documentclass statement and the begin document statement. I can
reproduce it for you here:
% this part is the preamble
\usepackage[left=1in, top=1in, right=1in, bottom=1in, papersize={8.5in, 11in},
Data manipulation
with dplyr
Hadley Wickham
@hadleywickham
Chief Scientist, RStudio
June 2014
Data analysis
is the process
by which
becomes
understanding,
knowledge
and insig