HWK2
Warren Lau
Chaoxu Tong
1. The error rate is 0.405 (40.5%).
2. In the probability calculation of nb.train, the bolded parts are the changed parts. The error rate is again 0.405 (40.5%), so the
Bayesian estimates gave no improvement.
# Probabilit

ORIE 3500/5500 Fall Term 2008 Assignment 1-Solution
1. (a) $P(B) = 1 - P(B^c) = 1 - 0.35 = 0.65$. Since $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, we have $P(A \cap B) = 0.55 + 0.65 - 0.75 = 0.45$. $P(A^c \cap B) = P(B) - P(A \cap B) = 0.65 - 0.45 = 0.2$. $P(A \cap B^c) = P(A) - P(A \cap B) = 0.55$
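The arithmetic in part (a) can be checked numerically. The following is a minimal sketch in R, assuming the given values are P(A) = 0.55, P(B^c) = 0.35, and P(A ∪ B) = 0.75, as read off the solution text; the variable names are illustrative only.

```r
# Given values, as read from the solution text (assumed inputs)
p_A    <- 0.55   # P(A)
p_Bc   <- 0.35   # P(B^c)
p_AorB <- 0.75   # P(A union B)

# Complement rule
p_B <- 1 - p_Bc

# Inclusion-exclusion, rearranged for the intersection
p_AandB <- p_A + p_B - p_AorB

# Split B and A into disjoint pieces
p_AcandB <- p_B - p_AandB   # P(A^c intersect B)
p_AandBc <- p_A - p_AandB   # P(A intersect B^c), from the formula stated in the solution

print(c(P_B = p_B, P_AandB = p_AandB, P_AcandB = p_AcandB, P_AandBc = p_AandBc))
```

Comparing floating-point results with `all.equal()` rather than `==` avoids spurious mismatches from rounding.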

HWK1
Warren Lau (wl325)
Chaoxu Tong
Part 1: Classification rule: first, if the value of eicosenoic is greater than 7, then the corresponding sample belongs
to region 1. With only regions 2 and 3 left, if the value of oleic is greater than 7500, then the s

1. The p-values for runs and hits in the model are 0.80381 and 0.05439.
The p-value for hits (with runs removed) is 0.018056; it became smaller once the runs predictor was removed.
[Figure: scatterplot matrix of hits, runs, and logsal]

HW #8 ORIE 4740 (sj259)
1.
> sum <- sum(orthoKmeans$withinss)
> sum
[1] 22681.11
Shradha Jain
So, the smallest value of W(C) is 22681.11. The largest cluster has the significant predictor variables {RBEDS, ADM, TH, REHAB}. The next-largest cluster
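The quantity W(C) above is the total within-cluster sum of squares. The following is a minimal sketch of how it can be computed from a `kmeans()` fit in R. The simulated matrix `x` and the choice of two clusters are assumptions for illustration; they are not the homework's data set or settings.

```r
set.seed(1)

# Hypothetical toy data standing in for the homework's data set
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

# Several random starts reduce the chance of a poor local minimum
fit <- kmeans(x, centers = 2, nstart = 20)

# W(C): sum of the per-cluster within-cluster sums of squares
w_total <- sum(fit$withinss)
print(w_total)

# kmeans() also reports the same total directly
print(fit$tot.withinss)

# Cluster sizes, useful for identifying the largest cluster
print(table(fit$cluster))
```

Storing the result under a name such as `w_total` instead of `sum` avoids masking the base function of the same name.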

ORIE 4740, Spring15, Lab 1
Lab 1: Plotting in R; Regression
First Name:
Last Name:
NetID:
In this lab, you will learn how to create simple plots in R. We will also review the basics
of regression we learned in lectures using the ToyotaCorolla data set

ORIE 4740, Spring15, Lab 2
Lab 2: Bootstraps for regression
Last Name:
First Name:
NetID:
This lab is due at noon on Mar 17th in the homework box on Rhodes 2nd floor.
You need to submit your code as well as the answer for each question.
In this lab, we ar

ORIE 4740, Spring15, Lab 2
Lab 2: Bootstraps for regression
First Name:
Last Name:
NetID:
This lab is due at noon on Mar 17th in the homework box on Rhodes 2nd floor.
You need to submit your code as well as the answer for each question.
In this lab, we ar

ORIE 4740, Spring15, Lab 1
Lab 1: Simple Regression Continued
First Name:
Last Name:
NetID:
In this lab, we will continue analyzing the ToyotaCorolla data set. We have noticed that the
linearity assumption does not hold for several predictors by plott

Lab Guidelines:
Format:
1. Labs will have an in-class component that needs to be finished in lab and
questions that can be finished after lab.
2. Labs will be due two weeks from the date they are taught and

ORIE 4740 Problem Set I
Out: Midnight Friday January 23, 2015
Due: Noon Monday February 2, 2015
Rules (read carefully)
This homework must be individual work. You may not discuss your solution with other students in the class. Concepts and material relate

ORIE 4740 Problem Set 2
Out: Midnight February 1, 2015
Due: Noon Monday February 8, 2015
Rules (read carefully)
This homework must be individual work. You may not discuss your solution with other students in the class. Concepts and material related to
th

ORIE 4740 Problem Set 3
Out: Midnight February 20, 2015
Due: Noon Monday March 2, 2015
Rules (read carefully)
This homework must be individual work. You may not discuss your solution with other students in the class. Concepts and material related to
the

Olive Oils Lab I
Get the olive oils data sets (train and test) from Blackboard. Read the training data into R
using the read.table function. You will need to use the header = F argument and set the
sep argument correctly for the read.table function. To ge

ORIE4740 Final Project
April 2015
1 Objective
The purpose of the final project is to give you some experience with an end-to-end
data analysis pipeline, which should include:
Finding a dataset of interest and posing questions that can be answered
with it.
C

ORIE 4740 Syllabus
Igor Labutov
Spring 2015
1 Introduction
[Data mining is] the process of discovering meaningful correlations, patterns,
and trends by sifting through large amounts of data. [It] employs pattern recognition technologies, as well as statist

ORIE 4740
Spring 2016
Homework 3
(Due March 11 by 1:25pm)
Instruction: Check the boxes next to the right answers. There can be zero to four correct answers to
each question. One point is assigned to a multiple choice question if and only if all boxes next

Classification
Yudong Chen
School of ORIE, Cornell University
ORIE 4740 Lec 9-11
Announcements
Classification
(Reading: ISLR Sections 4.1-4.3, 4.5-4.6, 2.2.3)
Recall: statistical learning
X1 = balance
X2 = income
Y = default or not
[Figure: plot with an Income axis]
T

Linear Regression III: Extensions
Yudong Chen
School of ORIE, Cornell University
ORIE 4740 Lec 6-8
Announcements
- No lab session next week.
- Homework 1 due this Friday.
- Lab 1: session this week, due next Friday.
Recap: Matrix Calculus
Exercise: Comp

Linear Model Selection
Yudong Chen
School of ORIE, Cornell University
ORIE 4740 Lec 15-16
Announcements
Recap
test error = bias^2 + variance
- As model flexibility increases:
  Bias: decreases
  Variance: increases
- Want to select model

Model Flexibility and Cross-Validation
Yudong Chen
School of ORIE, Cornell University
ORIE 4740 Lec 12-14
Announcements
Recap
Statistical learning: $Y = f(\vec{X}) + \epsilon$
Regression: Response Y is continuous
- Linear regression
- More flexible: add non-linear te

Linear Regression II: Assessing Model
Accuracy
Yudong Chen
School of ORIE, Cornell University
ORIE 4740 Lec 4-5
Announcements
Office hours:
Yudong Chen: M 5-6, F 11-12, Rhodes 223
Sijia Ma: W 2:30-4:30, Rhodes 418
Shuang Tao: W 4:30-5:30, F 4:00-5:00, Rh

Linear Regression
Yudong Chen
School of ORIE, Cornell University
ORIE 4740 Lec 3
Recap: Statistical Learning
[Figure: two 3D plots of Income against Years of Education and Seniority]
$y = f(X_1, X_2) + \epsilon$
$y \approx f(X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$

ORIE 4740, Spring 2016, Lab 1
Lab 1: Plotting in R; Regression
First Name:
Last Name:
NetID:
In this lab, you will learn how to create simple plots in R. We will also review the basics
of regression we learned in lectures using the ToyotaCorolla data se

ORIE 4740, Spring16, Lab 4
Lab 4: Linear Model Selection and Regularization
First Name:
Last Name:
NetID:
You do not need to submit for Lab 4. We will try to go over most of the materials
in the sessions.
In this lab, we will cover linear model selection

ORIE 4740, Spring16, Lab 3
Lab 3: Cross Validation
First Name:
Last Name:
NetID:
Lab 3 is due March 18 before class in the homework box at 2nd floor of Rhodes
Hall. For Lab 3, submit your code and plots of the validation errors as well as
your answers t