HWK2
Warren Lau
Chaoxu Tong
1. The error rate is 0.405, hence 40.5%.
2. In the probability calculation of nb.train, the bolded parts are the changed parts. The error rate is 0.405, 40.5%. There was no
improvement using the Bayesian estimates.
# Probabilit

HWK1
Warren Lau (wl325)
Chaoxu Tong
Part 1: Classification rule: first, if the value of eicosenoic is greater than 7, then the corresponding sample belongs
to region 1. With only regions 2 and 3 left, if the value of oleic is greater than 7500, then the s

ORIE 3500/5500 Fall Term 2008 Assignment 1-Solution
1. (a) $P(B) = 1 - P(B^c) = 1 - 0.35 = 0.65$. Since $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, we have $P(A \cap B) = 0.55 + 0.65 - 0.75 = 0.45$. $P(A^c \cap B) = P(B) - P(A \cap B) = 0.65 - 0.45 = 0.2$. $P(A \cap B^c) = P(A) - P(A \cap B) = 0.55$

1. P-values for the runs and hits in the model are 0.80381 and 0.05439.
P-value for the hits (with runs removed) is 0.018056. It got smaller when the runs predictor is removed.
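One possible explanation, assuming hits and runs are strongly correlated in this data (an assumption; the reported output does not state it), is variance inflation. The standard formula for the sampling variance of a least-squares coefficient makes this explicit, where $R_j^2$ is the $R^2$ from regressing predictor $j$ on the other predictors in the model and $\sigma^2$ is the error variance:

```latex
\operatorname{Var}(\hat\beta_j)
  = \frac{\sigma^2}{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}
    \cdot \frac{1}{1 - R_j^2}
```

Removing runs takes it out of the regression that defines $R_j^2$ for hits, so $R_j^2$ can only decrease and the inflation factor $1 / (1 - R_j^2)$ shrinks. If the coefficient estimate does not shrink by as much, the t-statistic grows and the p-value drops, which would be consistent with the change from 0.05439 to 0.018056.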
[Figure: scatterplot matrix of hits, runs, and logsal]

HW #8, ORIE 4740
Shradha Jain (sj259)
1.
> sum <- sum(orthoKmeans$withinss)
> sum
[1] 22681.11
So, the smallest value of W(C) is 22681.11. The largest cluster has the significant predictor variables {RBEDS, ADM, TH, REHAB}. The next-largest cluster

ORIE 4740 Problem Set I
Out: Midnight Friday January 23, 2015
Due: Noon Monday February 2, 2015
Rules (read carefully)
This homework must be individual work. You may not discuss your solution with other students in the class. Concepts and material relate

ORIE 4740 Problem Set 2
Out: Midnight February 1, 2015
Due: Noon Monday February 8, 2015
Rules (read carefully)
This homework must be individual work. You may not discuss your solution with other students in the class. Concepts and material related to
th

ORIE 4740, Spring15, Lab 1
Lab 1: Simple Regression Continued
First Name:
Last Name:
NetID:
In this lab, we will continue analyzing the ToyotaCorolla data set. We have noticed that the
linearity assumption does not hold for several predictors by plott

ORIE 4740 Problem Set 3
Out: Midnight February 20, 2015
Due: Noon Monday March 2, 2015
Rules (read carefully)
This homework must be individual work. You may not discuss your solution with other students in the class. Concepts and material related to
the

ORIE 4740, Spring15, Lab 2
Lab 2: Bootstraps for regression
First Name:
Last Name:
NetID:
This lab is due at noon on Mar 17th in the homework box on Rhodes 2nd floor.
You need to submit your code as well as the answer for each question.
In this lab, we ar

Lab Guidelines:
Format:
1. Labs will have an in-class component that needs to be finished in lab and
questions that can be finished after lab.
2. Labs will be due two weeks from the date they are taught and

ORIE 4740, Spring 2016
Project Information Sheet
Due Dates:
Once you form your group and decide on which dataset to use, email the
instructor ASAP with the names and NetID of your group members as well as the
source of the dataset(s). No two groups may us

ORIE 4740, Spring15, Lab 1
Lab 1: Plotting in R; Regression
First Name:
Last Name:
NetID:
In this lab, you will learn how to create simple plots in R. We will also review the basics
of regression we learned in lectures using the ToyotaCorolla data set

ORIE4740 Final Project
April 2015
1 Objective
The purpose of the final project is to give you some experience at an end-to-end
data analysis pipeline, which should include:
Finding a dataset of interest and posing questions that can be answered
with it.
C

Statistical Learning and Data Mining STSCI 4740
Final project
- Data set posted on blackboard today. $n = 500$ data points, response $Y$, predictors $X_1, \dots, X_{11}$.
- Turn in written report and programs by December 7 (2015), 10 pm via email.
- Report: no more

Data Mining and Machine Learning
STSCI 4740
Fall 2015, 4 credits
DESCRIPTION:
The topics covered will include linear methods for regression and classification, model
selection and penalization, a brief overview of resampling methods such as bootstrap
and

ORIE 4740
Spring 2017
Homework 0
(Due Monday Feb 6 by 4:30pm)
1. Normal distributions.
(a) Suppose $X_1 \sim N(1, 2)$, $X_2 \sim N(1, 3)$ and $X_1$, $X_2$ are independent. What is the distribution of
$X_1 + X_2$?
(b) Suppose that $X \sim N_k(\mu, \Sigma)$ and that $A$ is an $\ell \times k$ matrix with full

ORIE 4740, Spring15, Lab 2
Lab 2: Bootstraps for regression
Last Name:
First Name:
NetID:
This lab is due at noon on Mar 17th in the homework box on Rhodes 2nd floor.
You need to submit your code as well as the answer for each question.
In this lab, we ar

ORIE 4740, Spring 2016, Lab 1
Lab 1: Plotting in R; Regression
First Name:
Last Name:
NetID:
In this lab, you will learn how to create simple plots in R. We will also review the basics
of regression we learned in lectures using the ToyotaCorolla data se

Linear Regression
Yudong Chen
School of ORIE, Cornell University
ORIE 4740 Lec 3
Recap: Statistical Learning
[Figure: two panels, each with axes Years of Education, Seniority, and Income]
$y = f(X_1, X_2) + \varepsilon$
$y \approx f(X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$

Linear Regression II: Assessing Model
Accuracy
Yudong Chen
School of ORIE, Cornell University
ORIE 4740 Lec 4-5
Announcements
Office hours:
Yudong Chen: M 5-6, F 11-12, Rhodes 223
Sijia Ma: W 2:30-4:30, Rhodes 418
Shuang Tao: W 4:30-5:30, F 4:00-5:00, Rh

Model Flexibility and Cross-Validation
Yudong Chen
School of ORIE, Cornell University
ORIE 4740 Lec 12-14
Announcements
Recap
Statistical learning: $Y = f(\vec{X}) + \varepsilon$
Regression: Response $Y$ is continuous
- Linear regression
- More flexible: add non-linear te

Linear Model Selection
Yudong Chen
School of ORIE, Cornell University
ORIE 4740 Lec 15-16
Announcements
Recap
test error = bias$^2$ + variance
- As model flexibility increases:
  Bias: decreases
  Variance: increases
- Want to select model

Linear Regression III: Extensions
Yudong Chen
School of ORIE, Cornell University
ORIE 4740 Lec 6-8
Announcements
- No lab session next week.
- Homework 1 due this Friday.
- Lab 1: session this week, due next Friday.
Recap: Matrix Calculus
Exercise: Comp

Classification
Yudong Chen
School of ORIE, Cornell University
ORIE 4740 Lec 9-11
Announcements
Classification
(Reading: ISLR Sections 4.1-4.3, 4.5-4.6, 2.2.3)
Recall: statistical learning
X1 = balance
X2 = income
Y = default or not
[Figure: plot with an Income axis]
T

ORIE 4740, Spring16, Lab 4
Lab 4: Linear Model Selection and Regularization
First Name:
Last Name:
NetID:
You do not need to submit anything for Lab 4. We will try to go over most of the materials
in the sessions.
In this lab, we will cover linear model selection

ORIE 4740
Spring 2017
Homework 0 Solution
1. (a) $N(1 + 1, 2 + 3) = N(2, 5)$.
(For independent normal random variables $X \sim N(\mu_1, \sigma_1^2)$, $Y \sim N(\mu_2, \sigma_2^2)$, we have $X + Y \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$.)
(b) $N_\ell(A\mu, A \Sigma A^\top)$.
Since linear transformation of normal r.v. is still normal, w

STSCI 4740 Machine Learning and Data Mining
Fall 2015
Dr. Stanislav Volgushev
Homework 2, due Oct 8 before lecture
For Problems 1-4, also write down the R code and R output for all programs that
you are using. In all of the problems justify your answers.

STSCI 4740 Machine Learning and Data Mining
Dr. Stanislav Volgushev
Fall 2015
Homework 0 Solutions
Problem 1
1. Assume that $X_1 \sim N(1, 2)$, $X_2 \sim N(1, 3)$ and $X_1$, $X_2$ are independent. What is the
distribution of X1 + X2 ?
Answer:
N (1 + 1, 2 + 3)
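The answer rests on three standard facts, sketched here with the second parameter of $N(\cdot, \cdot)$ read as the variance (as the sum 2 + 3 in the answer implies): a sum of independent normal variables is normal, means always add, and variances add under independence.

```latex
\begin{aligned}
\mathbb{E}[X_1 + X_2] &= \mathbb{E}[X_1] + \mathbb{E}[X_2] = 1 + 1 = 2, \\
\operatorname{Var}(X_1 + X_2) &= \operatorname{Var}(X_1) + \operatorname{Var}(X_2) = 2 + 3 = 5
  \quad \text{(by independence)}, \\
X_1 + X_2 &\sim N(2, 5).
\end{aligned}
```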
2. Assume that

OR 3500/5500, Fall15, Homework 1
Homework 1
Problem 1. Let $\Omega = \mathbb{N} = \{0, 1, 2, \dots\}$. Define a probability by $P(\{n\}) = C 2^{-n}$.
What is $C$? What is the probability of the event $\{n \ge 2 : n \text{ is even}\}$?
Problem 2. Consider a grocery which has 2 cashier stations