require(DAAG)
require(mvtnorm)
require(ggplot2)
require(MASS)
# This code implements reduced rank LDA (Fisher Discriminant Analysis)
# It can reproduce Figure 4.8 (along with the decision boundary)
# in Hastie by specifying coordinates a,b
# For example, a=1,b
IEOR E4525
Machine Learning
Feb. 9th, 2015
Solutions to Assignment 2
1. Naive Bayes and spam filtering
(a) MATLAB code for this exercise is attached to the solution. Note that the NaiveBayes
functionality in MATLAB requires numerical data, so we first relabel c
# Chapter 3 Lab: Linear Regression
library(MASS)
library(ISLR)
#
# Simple Linear Regression
fix(Boston) # invokes `edit` on Boston and then assigns the edited version in the workspace
names(Boston)
?Boston
attach(Boston) # By attaching Boston variable names i
IEOR E4525
Machine Learning
Feb. 23rd, 2015
Solutions to Assignment 3
1. Regression analysis of the CEO Pay Data Set
The R code for this exercise is enclosed (see HW3.R), here we outline only the main steps.
After loading and standardizing data, we select
IEOR E4525
Machine Learning
April 29th, 2015
Solutions to Assignment 8
1. The Party Animal
(a) The Bayes network is shown below:
Figure 1: Bayes network example
The key here is to expand P(H = 1, A = 1) in terms of conditional probabilities that are
speci
IEOR E4525
Machine Learning
April 8th, 2015
Solutions to Assignment 6
1. PCA Via Optimization
Induction base:
First, we need to show that the first eigenvector of a covariance matrix $\Sigma$ solves the problem
$\max_a a^\top \Sigma a$ subject to the constraint $a^\top a = 1$. This wa
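The claim in the induction base can be checked numerically. The sketch below is an illustration, not part of the original solution: it compares the quadratic form at the leading eigenvector of a sample covariance matrix with its value at many random unit vectors. The synthetic data, the seed, and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed, chosen only for reproducibility

# Synthetic correlated data (an assumption; any data set would do)
mixing = np.array([[2.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.0, 0.3, 0.5]])
X = rng.normal(size=(500, 3)) @ mixing
Sigma = np.cov(X, rowvar=False)  # sample covariance matrix

# eigh returns eigenvalues in ascending order for symmetric matrices
eigvals, eigvecs = np.linalg.eigh(Sigma)
top_vec = eigvecs[:, -1]             # eigenvector with the largest eigenvalue
top_val = top_vec @ Sigma @ top_vec  # quadratic form at that eigenvector

# Evaluate the same quadratic form at random unit vectors a (a'a = 1)
A = rng.normal(size=(1000, 3))
A /= np.linalg.norm(A, axis=1, keepdims=True)
quad_forms = np.einsum("ij,jk,ik->i", A, Sigma, A)

print("value at leading eigenvector:", top_val)
print("best value among random unit vectors:", quad_forms.max())
```

No random unit vector should exceed the value attained at the leading eigenvector, which equals the largest eigenvalue.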
IEOR E4525
Machine Learning
Mar. 9th, 2015
Solutions to Assignment 4
1. Scaling the inputs
True. In general it is a good idea to scale the inputs, otherwise not all components
of an input vector x contribute equally to the model output $w^\top x + b$. Recall tha
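To illustrate the scaling point, here is a minimal sketch of standardizing inputs so that every component has zero mean and unit standard deviation. The two features, their scales, and the variable names are hypothetical and do not come from the assignment.

```python
import numpy as np

rng = np.random.default_rng(1)  # hypothetical seed

# Two hypothetical features measured on very different scales
income = rng.normal(50_000, 15_000, size=200)
age = rng.normal(40, 10, size=200)
X = np.column_stack([income, age])

# Standardize each column: subtract its mean, divide by its standard deviation
mu = X.mean(axis=0)
sd = X.std(axis=0)
X_scaled = (X - mu) / sd

print("raw column std devs:   ", X.std(axis=0))
print("scaled column std devs:", X_scaled.std(axis=0))
```

In practice, the same `mu` and `sd` computed on the training data would be reused to transform any test inputs.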
# Chapter 4 Lab: Logistic Regression, LDA, QDA, and KNN
# The Stock Market Data
library(ISLR)
names(Smarket)
# Contains % returns for S&P 500 for 1250 days from 2001 to 2005
dim(Smarket)
# It also contains returns for 5 previous days and previous day's tr
IEOR E4525: Machine Learning for OR and FE (Spring 2017)
Syllabus and Course Logistics
Instructors: Martin Haugh and Garud Iyengar
Email: [email protected] and [email protected]
URL: http://www.columbia.edu/~mh2078/ and http://iyengarlab.ieor.columb
IEOR E4525
Martin Haugh
Due: Thursday Feb. 23rd
Assignment 3
1. (Some Properties of Ridge Regression)
Do Exercise 4 in Chapter 6 of ISLR.
2. (Exercise 3.4 in Bishop: Error in the Predictor Regularization)
Consider a linear model of the form
y = w_0 + \sum^{p} w
Machine Learning for OR & FE
Resampling Methods
Martin Haugh
Department of Industrial Engineering and Operations Research
Columbia University
Email: [email protected]
Some of the figures in this presentation are taken from "An Introduction to Stati
IEOR E4525
Martin Haugh and Garud Iyengar
Due: Tuesday 31 Jan 2017
Solutions to Assignment 1
2. (EDA with the Spam Filtering Data Set)
The csv file spam.csv contains a data set for emails that were categorized as spam or not
spam. The documentation for th
% solves the question / problem in the lectures
% probability of the classes
K = 2; % # of classes
pi = rand(1,K);
pi = pi/sum(pi);
%probability of the answers
Q = 5; % # of questions
sigma = zeros(K,Q);
for k = 1:K
sigma(k,:) = rand(1,Q);
end
% randomly g
% testing the multinomial EM model
theta = 0.25;
% set seed
seed = 10;
rng(seed);
% samples of Z
pz = [0.5, 0.25*theta,0.25*(1-theta), 0.25*(1-theta), 0.25*theta];
n = 10;
m = length(pz);
y = mnrnd(n,pz);
% samples of X
x = zeros(1,m-1);
x(2:m-1) = y(3:m)
IEOR E4525
Martin Haugh & Garud Iyengar
Due: Thursday 9 February 2017
Assignment 2
You might want to work through some of the examples in Section 5.3 Lab: Cross-Validation
and the Bootstrap in ISLR before tackling questions 3 and 4. (Don't worry if you're n
# Chapter 3 Lab: Linear Regression
library(MASS)
library(ISLR)
#
# Simple Linear Regression
fix(Boston)
# invokes `edit` on Boston and then assigns the new edited version of Boston in the workspace
names(Boston)
?Boston
attach(Boston) # By attaching Bos
require(DAAG)
require(ggplot2)
require(MASS)
# This code implements reduced rank LDA (Fisher Discriminant Analysis)
# It can reproduce the subplots of Figure 4.8 in HTF by specifying coordinates a,b
# For example, a=1,b=3 reproduces the top-left sub-figure
Lab 10 - Ridge Regression and the Lasso in Python
March 9, 2016
This lab on Ridge Regression and the Lasso is a Python adaptation of p. 251-255 of Introduction to
Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie a
Lab 8 - Subset Selection in Python
March 2, 2016
This lab on Subset Selection is a Python adaptation of p. 244-247 of Introduction to Statistical Learning
with Applications in R by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. Adapted
Lab 2 - Linear Regression in Python
February 24, 2016
This lab on Linear Regression is a Python adaptation of p. 109-119 of Introduction to Statistical Learning
with Applications in R by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. W
spamdata <- read.csv("spam.csv")
names(spamdata) #variable column headings
str(unique(spamdata$spampct)) # view the unique values of spampct
sum(is.na(spamdata$spampct)) # count of NA values in spampct
summary(spamdata) # summary of the whole data set
summary(spamda
spamdata <- read.csv("spam.csv")
names(spamdata) #variable column headings
str(unique(spamdata$spampct)) # view the unique values of spampct
sum(is.na(spamdata$spampct)) # count of NA values in spampct
summary(spamdata)
# split the data
pctmissing = subset(s
IEOR E4525
Martin Haugh
Due: Tuesday 31 Jan 2017
Assignment 1
You do not need to submit anything for Questions 1 and 3.
You only need to do one of Questions 4 and 5.
1. (Implications of Big Data)
The McKinsey Global Institute recently published a report t
IEOR E4525
Iyengar
Due: Tuesday April 11th
Assignment 6
1. Binomial-Poisson mixture
Suppose a random variable X is distributed as follows
$$X = \begin{cases} 0 & \text{with probability } p, \\ \text{Poisson}(\lambda) & \text{with probability } 1 - p, \end{cases}$$
where the Poisson($\lambda$) is distributed as follows:
P(Poisson($\lambda$) =
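A small simulation can make the mixture concrete. The sketch below is not the assignment's intended solution: it evaluates the mixture pmf implied by the definition above and compares the probability of zero with a Monte Carlo estimate. The values of `p` and the Poisson rate (written `lam` here), the function names, and the seed are all assumptions.

```python
import math
import numpy as np

def mixture_pmf(k, p, lam):
    """P(X = k): a point mass at 0 with weight p, Poisson(lam) with weight 1 - p."""
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    return p * (k == 0) + (1 - p) * poisson

def sample_mixture(n, p, lam, rng):
    """Draw n samples: 0 with probability p, otherwise a Poisson(lam) draw."""
    is_zero = rng.random(n) < p
    draws = rng.poisson(lam, size=n)
    return np.where(is_zero, 0, draws)

rng = np.random.default_rng(2)  # hypothetical seed
p, lam = 0.3, 2.0               # hypothetical parameter values
x = sample_mixture(100_000, p, lam, rng)
empirical_zero = np.mean(x == 0)

print("empirical P(X = 0):", empirical_zero)
print("exact P(X = 0):    ", mixture_pmf(0, p, lam))
```

Zeros come from both components, so the probability of zero is p + (1 - p) e^{-lam} rather than p alone.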
IEOR E4525
M. Haugh and G. Iyengar
March 24th, 2017
IEOR E4525 Midterm
Instructions :
200pts
1. There are 5 questions in all.
2. You have 2 hours and 30 minutes to do the exam.
3. The exam is closed book but you can use the cheat sheet provided with the e
IEOR E4525
M. Haugh
Due Thursday April 27th 2017
Assignment 8
1. (Barber Exercise 23.3)
Consider an HMM with three states (K = 3) and two output symbols (M = 2), with
a left-to-right state transition matrix
$$A = \begin{pmatrix} 0.5 & 0.0 & 0.0 \\ 0.3 & 0.6 & 0.0 \\ 0.2 & 0.4 & 1.0 \end{pmatrix}$$
where A
Machine Learning for OR & FE
Regression II: Regularization and Shrinkage Methods
Martin Haugh
Department of Industrial Engineering and Operations Research
Columbia University
Email: [email protected]
Some of the figures in this presentation are tak
Machine Learning for OR & FE
Supervised Learning: Classification - II
Martin Haugh
Department of Industrial Engineering and Operations Research
Columbia University
Email: [email protected]
Some of the figures in this presentation are taken from "An
IEOR E4525
Martin Haugh and Garud Iyengar
Due: April 11th, 2017
Solutions to Assignment 6
1. Binomial-Poisson mixture
Suppose a random variable X is distributed as follows
$$X = \begin{cases} 0 & \text{with probability } p, \\ \text{Poisson}(\lambda) & \text{with probability } 1 - p, \end{cases}$$
where the Poisson($\lambda$) is d
IEOR E4525 Machine Learning for OR and FE
Martin Haugh & Garud Iyengar
Due: Thursday 23 February 2017
Solutions to Assignment 3
1. (Some Properties of Ridge Regression)
Do Exercise 4 in Chapter 6 of ISLR.
Solution:
(a) iii. Since the regularizer forces th
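Part (a) of that ISLR exercise asks how the training RSS behaves as the ridge penalty grows from zero. A numerical check of option iii (steady increase) might look like the sketch below. The synthetic data, the penalty grid, and the helper `ridge_fit` are assumptions made for illustration and are not taken from the original solution.

```python
import numpy as np

rng = np.random.default_rng(3)  # hypothetical seed
n, d = 100, 5
X = rng.normal(size=(n, d))
beta_true = np.array([3.0, -2.0, 0.0, 1.0, 0.5])  # hypothetical coefficients
y = X @ beta_true + rng.normal(size=n)

# Center the data so the intercept drops out and is not penalized
Xc = X - X.mean(axis=0)
yc = y - y.mean()

def ridge_fit(Xc, yc, lam):
    """Closed-form ridge estimate (X'X + lam I)^{-1} X'y on centered data."""
    d = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ yc)

lambdas = np.array([0.0, 0.1, 1.0, 10.0, 100.0, 1000.0])
betas = [ridge_fit(Xc, yc, lam) for lam in lambdas]
train_rss = np.array([np.sum((yc - Xc @ b) ** 2) for b in betas])
coef_norms = np.array([np.linalg.norm(b) for b in betas])

for lam, rss, nrm in zip(lambdas, train_rss, coef_norms):
    print(f"lambda={lam:8.1f}  training RSS={rss:10.3f}  ||beta||={nrm:.3f}")
```

On the training data the RSS should not decrease as the penalty grows, while the coefficient norm shrinks toward zero.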
IEOR E4525
Martin Haugh and Garud Iyengar
Due: 5pm Tuesday 21st March 2017
Solutions to Assignment 5
1. (Scaling the Inputs)
True or false: in training an SVM it is generally a good idea to scale all input variables so
that, for example, they all lie in s