HWK2
Warren Lau
Chaoxu Tong
1. The error rate is 0.405, i.e., 40.5%.
2. In the probability calculation of nb.train, the changed parts are shown in bold. The error rate is again 0.405 (40.5%), so using the Bayesian estimates gave no improvement.
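nb.train is the course's own R function and its code is not reproduced here. As a hedged illustration only, the Python sketch below assumes that "Bayesian estimates" means add-alpha (Laplace) smoothing of the class-conditional probabilities, trains a categorical naive Bayes classifier with and without it, and reports the misclassification error rate. The names nb_train, nb_predict and error_rate, the toy data, and alpha = 1 are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def nb_train(X, y, alpha=1.0):
    """Fit a categorical naive Bayes classifier (hypothetical stand-in for nb.train).

    alpha = 0 gives plain frequency estimates of P(feature value | class);
    alpha > 0 adds alpha pseudo-counts per value (Laplace smoothing),
    assumed here to be what "Bayesian estimates" refers to.
    """
    classes = Counter(y)
    n_features = len(X[0])
    # Distinct values of each feature; needed for the smoothing denominator.
    levels = [{row[j] for row in X} for j in range(n_features)]
    counts = defaultdict(Counter)  # (class, feature index) -> counts of each value
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            counts[(label, j)][v] += 1
    prior = {c: n_c / len(y) for c, n_c in classes.items()}
    cond = {}
    for c, n_c in classes.items():
        for j in range(n_features):
            k = len(levels[j])
            for v in levels[j]:
                cond[(c, j, v)] = (counts[(c, j)][v] + alpha) / (n_c + alpha * k)
    return {"prior": prior, "cond": cond}


def log_posterior(model, row, c):
    """Unnormalized log P(class c | row) under the naive independence assumption."""
    score = math.log(model["prior"][c])
    for j, v in enumerate(row):
        p = model["cond"].get((c, j, v), 0.0)
        if p == 0.0:
            # A zero estimate (alpha = 0, or a value never seen in training) vetoes the class.
            return -math.inf
        score += math.log(p)
    return score


def nb_predict(model, row):
    return max(model["prior"], key=lambda c: log_posterior(model, row, c))


def error_rate(model, X, y):
    """Fraction of rows whose predicted class differs from the true class."""
    wrong = sum(nb_predict(model, row) != label for row, label in zip(X, y))
    return wrong / len(y)


# Hypothetical toy data: two categorical features, binary class label.
X = [("a", "x"), ("a", "y"), ("b", "y"), ("b", "x"), ("a", "x"), ("b", "y")]
y = [1, 1, 0, 0, 1, 0]
for alpha in (0.0, 1.0):
    print(alpha, error_rate(nb_train(X, y, alpha), X, y))
```

On this toy data both versions classify the training rows perfectly. Smoothing changes predictions mainly when some feature value never occurs with a class in the training data, because with alpha = 0 that single zero forces the class's posterior to zero.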
# Probabilit

1. The p-values for runs and hits in the model are 0.80381 and 0.05439, respectively.
With runs removed, the p-value for hits is 0.018056; it gets smaller when the runs predictor is removed.
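A drop in the hits p-value once runs is removed is consistent with collinearity: when two predictors are strongly correlated, keeping both inflates the standard error of each coefficient, which shrinks its t-statistic. The sketch below computes the variance inflation factor for a two-predictor model, VIF = 1 / (1 - r^2) with r the sample correlation of the predictors. The numbers are made up for illustration and are not the homework data.

```python
import statistics


def correlation(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors.

    Regressing one predictor on the other gives R^2 = r^2, so VIF = 1 / (1 - r^2).
    Undefined (division by zero) if the predictors are perfectly correlated.
    """
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical, strongly correlated predictors (not the homework data).
hits = [100, 120, 140, 160, 180]
runs = [50, 62, 68, 82, 88]
vif = vif_two_predictors(hits, runs)
print(round(vif, 3), round(vif ** 0.5, 3))
```

For these made-up numbers r is about 0.992 and the VIF is 65, so, holding the residual variance fixed, each coefficient's standard error is roughly sqrt(65), about 8 times, what it would be with uncorrelated predictors.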
[Figure: pairwise scatterplots of hits, runs, and logsal]

ORIE 4740, Spring16, Lab 2b
Lab 2b: K-Nearest Neighbors
First Name:
Last Name:
NetID:
Lab 2b is due March 4th before class in the homework box on the 2nd floor of Rhodes
Hall, along with Lab 2a. For Lab 2b, you only need to turn in this handout with
all the

HWK1
Warren Lau (wl325)
Chaoxu Tong
Part 1: Classification rule. First, if the value of eicosenoic is greater than 7, the sample belongs
to region 1. With only regions 2 and 3 left, if the value of oleic is greater than 7500, then the s

ORIE 3500/5500 Fall Term 2008 Assignment 1-Solution
1. (a) $P(B) = 1 - P(B^c) = 1 - 0.35 = 0.65$. Since $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, we have $P(A \cap B) = 0.55 + 0.65 - 0.75 = 0.45$. $P(A^c \cap B) = P(B) - P(A \cap B) = 0.65 - 0.45 = 0.2$. $P(A \cap B^c) = P(A) - P(A \cap B) = 0.55 \ldots$
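The steps in part (a) can be re-checked numerically. The sketch below only repeats the solution's arithmetic with the values it uses: $P(B^c) = 0.35$, $P(A) = 0.55$, and $P(A \cup B) = 0.75$, the last read off from the step $0.55 + 0.65 - 0.75$.

```python
p_b_complement = 0.35
p_a = 0.55
p_a_union_b = 0.75  # implied by the solution's step 0.55 + 0.65 - 0.75

# Complement rule.
p_b = 1 - p_b_complement
# Inclusion-exclusion, P(A ∪ B) = P(A) + P(B) - P(A ∩ B), solved for P(A ∩ B).
p_a_and_b = p_a + p_b - p_a_union_b
# B is the disjoint union of (A ∩ B) and (A^c ∩ B).
p_ac_and_b = p_b - p_a_and_b

print(f"P(B) = {p_b:.2f}, P(A ∩ B) = {p_a_and_b:.2f}, P(A^c ∩ B) = {p_ac_and_b:.2f}")
```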

HW #8, ORIE 4740 (sj259)
Shradha Jain
1.
> sum <- sum(orthoKmeans$withinss)
> sum
[1] 22681.11
So the smallest value of W(C) is 22681.11. The largest cluster has the significant predictor variables {RBEDS, ADM, TH, REHAB}. The next-largest cluster
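R's kmeans() stores, for each cluster, the sum of squared Euclidean distances from the cluster's points to its centroid in the withinss component (and the total directly as tot.withinss), so summing withinss gives the total within-cluster sum of squares used above. A stdlib-only Python sketch of that quantity, with hypothetical points and cluster labels; some textbooks define W(C) through pairwise distances instead, so check which definition the course uses.

```python
def total_withinss(points, labels):
    """Sum over clusters of squared Euclidean distances from each point to its cluster centroid."""
    clusters = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[j] for m in members) / len(members) for j in range(dim)]
        total += sum(sum((m[j] - centroid[j]) ** 2 for j in range(dim)) for m in members)
    return total


# Hypothetical 2-D points already assigned to two clusters.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 12.0)]
labels = [1, 1, 2, 2]
print(total_withinss(points, labels))
```

Because k-means only finds a local optimum, the usual practice is to run it from several random starts and keep the assignment with the smallest total, which is how "the smallest value of W(C)" arises.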

ORIE 4740 Syllabus
Igor Labutov
Spring 2015
1 Introduction
[Data mining is] the process of discovering meaningful correlations, patterns,
and trends by sifting through large amounts of data. [It] employs pattern recognition technologies, as well as statist

ORIE 4740, Spring15, Lab 1
Lab 1: Plotting in R; Regression
First Name:
Last Name:
NetID:
In this lab, you will learn how to create simple plots in R. We will also review the basics
of regression we learned in lectures using the ToyotaCorolla data set

ORIE4740 Final Project
April 2015
1 Objective
The purpose of the final project is to give you some experience with an end-to-end
data analysis pipeline, which should include:
Finding a dataset of interest and posing questions that can be answered
with it.
C

Statistical Learning and Data Mining STSCI 4740
Final project
• Data set posted on Blackboard today: $n = 500$ data points, response $Y$,
predictors $X_1, \ldots, X_{11}$.
• Turn in written report and programs by December 7 (2015), 10 pm via email.
• Report: no more

ORIE 4740 Problem Set 3
Out: Midnight February 20, 2015
Due: Noon Monday March 2, 2015
Rules (read carefully)
This homework must be individual work. You may not discuss your solution with other students in the class. Concepts and material related to
the

STSCI 4740 Machine Learning and Data Mining
Dr. Stanislav Volgushev
Fall 2015
Homework 0 Solutions
Problem 1
1. Assume that $X_1 \sim N(1, 2)$, $X_2 \sim N(1, 3)$, and $X_1$, $X_2$ are independent. What is the
distribution of $X_1 + X_2$?
Answer:
$N(1 + 1, 2 + 3)$
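A quick Monte Carlo check of this answer: draw many independent pairs and compare the sample mean and variance of $X_1 + X_2$ with $1 + 1 = 2$ and $2 + 3 = 5$. Note that Python's random.gauss takes a standard deviation, so the variances 2 and 3 enter through their square roots. The seed and sample size are arbitrary choices.

```python
import random
import statistics

random.seed(4740)  # arbitrary seed, for reproducibility
n = 100_000
# random.gauss(mu, sigma) takes a standard deviation, so pass sqrt of each variance.
sums = [random.gauss(1, 2 ** 0.5) + random.gauss(1, 3 ** 0.5) for _ in range(n)]
mean_hat = statistics.fmean(sums)
var_hat = sum((s - mean_hat) ** 2 for s in sums) / (n - 1)  # sample variance
print(f"mean ~ {mean_hat:.3f}, variance ~ {var_hat:.3f}")  # should be close to 2 and 5
```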
2. Assume that

STSCI 4740 Machine Learning and Data Mining
Fall 2015
Dr. Stanislav Volgushev
Homework 2, due Oct 8 before lecture
For Problems 1-4, also write down the R code and R output for all programs that
you are using. In all of the problems justify your answers.

ORIE 4740
Spring 2017
Homework 0 Solution
1. (a) $N(1 + 1, 2 + 3) = N(2, 5)$.
(For independent normal random variables $X \sim N(\mu_1, \sigma_1^2)$, $Y \sim N(\mu_2, \sigma_2^2)$, we have $X + Y \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$.)
(b) $N_\ell(A\mu, A\Sigma A^\top)$.
Since a linear transformation of a normal random vector is still normal, w
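Part (b) rests on two moment identities plus the fact that linear maps of normal vectors stay normal. A short derivation, assuming the notation of the Homework 0 statement ($X \sim N_k(\mu, \Sigma)$, $A$ an $\ell \times k$ matrix):

```latex
\begin{aligned}
\mathbb{E}[AX] &= A\,\mathbb{E}[X] = A\mu, \\
\operatorname{Cov}(AX) &= \mathbb{E}\big[(AX - A\mu)(AX - A\mu)^\top\big]
  = A\,\mathbb{E}\big[(X - \mu)(X - \mu)^\top\big]A^\top = A\Sigma A^\top, \\
AX &\sim N_\ell\big(A\mu,\ A\Sigma A^\top\big).
\end{aligned}
```

Part (a) is the special case $k = 2$, $\ell = 1$, $A = (1\ \ 1)$, $\mu = (1, 1)^\top$, $\Sigma = \operatorname{diag}(2, 3)$, which gives $A\mu = 2$ and $A\Sigma A^\top = 5$.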

ORIE 4740
Spring 2017
Homework 0
(Due Monday Feb 6 by 4:30pm)
1. Normal distributions.
(a) Suppose $X_1 \sim N(1, 2)$, $X_2 \sim N(1, 3)$, and $X_1$, $X_2$ are independent. What is the distribution of
$X_1 + X_2$?
(b) Suppose that $X \sim N_k(\mu, \Sigma)$ and that $A$ is an $\ell \times k$ matrix with full

ORIE 4740, Spring 2017, Lab 1
Lab 1: Plotting in R; Regression
First Name:
Last Name:
NetID:
In this lab, you will learn how to create simple plots in R. We will also review the basics
of regression we learned in lectures using the ToyotaCorolla data se

ORIE 4740, Spring17, Lab 2
Lab 2: Basic Classification Algorithms
First Name:
Last Name:
NetID:
This lab is due March 10th (Friday) in the homework box on the 2nd floor of Rhodes
Hall. You need to submit your code, summary of the models you fit as well as
y

Data Mining and Machine Learning
STSCI 4740
Fall 2015, 4 credits
DESCRIPTION:
The topics covered will include linear methods for regression and classification, model
selection and penalization, a brief overview of resampling methods such as bootstrap
and

ORIE 4740 Problem Set 2
Out: Midnight February 1, 2015
Due: Noon Monday February 8, 2015
Rules (read carefully)
This homework must be individual work. You may not discuss your solution with other students in the class. Concepts and material related to
th

ORIE 4740 Problem Set I
Out: Midnight Friday January 23, 2015
Due: Noon Monday February 2, 2015
Rules (read carefully)
This homework must be individual work. You may not discuss your solution with other students in the class. Concepts and material relate

Lab Guidelines:
Format:
1. Labs will have an in-class component that needs to be finished in lab and
questions that can be finished after lab.
2. Labs will be due two weeks from the date they are taught and

ORIE 4740, Spring16, Lab 4
Lab 4: Linear Model Selection and Regularization
First Name:
Last Name:
NetID:
You do not need to submit Lab 4. We will try to go over most of the material
in the sessions.
In this lab, we will cover linear model selection

ORIE 4740, Spring 2016, Lab 1
Lab 1: Plotting in R; Regression
First Name:
Last Name:
NetID:
In this lab, you will learn how to create simple plots in R. We will also review the basics
of regression we learned in lectures using the ToyotaCorolla data se

Linear Regression
Yudong Chen
School of ORIE, Cornell University
ORIE 4740 Lec 3
Recap: Statistical Learning
[Figure: Income versus Years of Education and Seniority]
$y = f(X_1, X_2) + \epsilon$
$y \approx f(X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
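To make the linear model concrete, the sketch below estimates $\beta_0, \beta_1, \beta_2$ by ordinary least squares, solving the normal equations $(X^\top X)\beta = X^\top y$ with plain Gaussian elimination. The data are synthetic and noise-free (generated from known coefficients 1, 2, 3), not the Income data from the figure; in practice one would call R's lm() or a numerical library rather than hand-rolling the solver.

```python
def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_two_predictors(x1, x2, y):
    """Least-squares estimates (b0, b1, b2) for y ~ b0 + b1*x1 + b2*x2."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix with an intercept column
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)


x1 = [12, 14, 16, 16, 18, 20]  # e.g. years of education (hypothetical)
x2 = [10, 40, 25, 60, 80, 50]  # e.g. seniority (hypothetical)
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]  # noise-free, so the fit should recover (1, 2, 3)
coefs = ols_two_predictors(x1, x2, y)
print([round(c, 6) for c in coefs])
```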

Linear Regression II: Assessing Model
Accuracy
Yudong Chen
School of ORIE, Cornell University
ORIE 4740 Lec 45
Announcements
Office hours:
Yudong Chen: M 5-6, F 11-12, Rhodes 223
Sijia Ma: W 2:30-4:30, Rhodes 418
Shuang Tao: W 4:30-5:30, F 4:00-5:00, Rh

Recap: Linear vs. Nonlinear
Linear techniques:
Linear and logistic regression
k-means; PCA
Simple extensions of linear techniques:
Adding high-order and interaction terms
Converting to dummy variables
Nonlinear techniques:
KNN
Basis functions, splines an
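KNN appears here as a nonlinear technique (it is also the topic of Lab 2b). A minimal k-nearest-neighbors classifier, sketched under the assumptions of Euclidean distance and a plain majority vote; the function name and training points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # most_common breaks ties by first-encountered order, i.e. in favor of the closer label.
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated groups.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_x, train_y, (0.5, 0.5)), knn_predict(train_x, train_y, (5.5, 5.5)))
```

The decision boundary is never fitted explicitly; it is implied by which training points are closest, which is why KNN can follow nonlinear boundaries. The choice of k trades flexibility (small k) against smoothness (large k).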

Tree-based Methods
Decision tree
Bagging
Boosting
Bagging Trees
Random Feature Selection
Boosted Trees
(optional)
Random Forests
• Popular for classification, but works for regression as well
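Bagging in its simplest form: fit the same base learner to many bootstrap resamples of the training data and average the predictions (for regression) or take a majority vote (for classification). The sketch below uses a one-split regression stump as the base learner, a deliberately simplified stand-in for a full decision tree; the function names, number of bags, seed and data are all hypothetical.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Best single split on one numeric feature, minimizing squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, ml, mr)
    if best is None:  # all x values equal: predict the overall mean
        overall = statistics.fmean(ys)
        return lambda x: overall
    _, t_best, left_val, right_val = best
    return lambda x: left_val if x <= t_best else right_val


def bagged_predict(xs, ys, x_new, n_bags=50, seed=0):
    """Average the predictions of stumps fitted to bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, drawn with replacement
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(stump(x_new))
    return statistics.fmean(preds)


# Hypothetical data with a jump between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, 1, 1, 10, 10, 10, 10]
print(bagged_predict(xs, ys, 1.5), bagged_predict(xs, ys, 7.5))
```

Averaging over resamples mainly reduces variance; random forests add random feature selection at each split on top of this so that the individual trees are less correlated.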