CSOR W4246 Fall, 2015
Homework 1
1. (10 points)
(a) Let $f_1(n) = n$. Then $2n \le c \cdot n$ for all $n \ge n_0 = 1$ with $c = 2$.
(b) Let $f_1(n) = 2^n$. Then $\lim_{n \to \infty} \frac{2^{2n}}{2^n} = \lim_{n \to \infty} 2^n = \infty$. Hence $2^{2n} = \omega(2^n)$.
2. (20 points)
(a) $T(n) = O(n^2 \log n)$
(b) $T(n) = O(n^3 \log n)$
(c) T (n)

Midterm Exam
STAT W4240: Data Mining
Instructor: Dr. Rahul Mazumder
March 9, 2015 (M/W Section)
Explanation
This exam is to be done in-class. You have 75 minutes to complete the entirety. All solutions should be
written in the accompanying blue book. No o

STAT W4240 Sec1 Midterm2 Solution
April 17, 2015
1. (a) Split the data randomly into 5 folds. For each k = 1, 2, 3, …, 20, perform best subset selection
5 times. In round i, hold out fold i as validation data and use the other 4 folds as the training set. For
the training se
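The cross-validation loop described above can be sketched as follows. This is a minimal illustration, not the posted solution. It makes several assumptions: ordinary least squares with no intercept (add a column of ones for one), an exhaustive search over column subsets that is only practical for a small number of predictors, and a seeded random split. The function names `lstsq`, `best_subset` and `cv_best_subset` are hypothetical.

```python
import itertools
import random


def lstsq(X, y):
    """Least-squares coefficients from the normal equations (X^T X) beta = X^T y,
    solved with Gauss-Jordan elimination and partial pivoting."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def sse(X, y, cols, beta):
    """Sum of squared errors of the linear model that uses only the columns in `cols`."""
    return sum((yi - sum(b * row[c] for b, c in zip(beta, cols))) ** 2
               for row, yi in zip(X, y))


def best_subset(X, y, k):
    """Exhaustive best subset selection: the size-k column subset with the smallest training SSE."""
    best = None
    for cols in itertools.combinations(range(len(X[0])), k):
        beta = lstsq([[row[c] for c in cols] for row in X], y)
        err = sse(X, y, cols, beta)
        if best is None or err < best[0]:
            best = (err, cols, beta)
    return best[1], best[2]


def cv_best_subset(X, y, max_k, n_folds=5, seed=0):
    """n-fold cross-validation error of best subset selection for each k = 1..max_k."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)                    # random split into folds
    folds = [idx[i::n_folds] for i in range(n_folds)]
    cv_err = {}
    for k in range(1, max_k + 1):
        total = 0.0
        for i in range(n_folds):                        # round i: fold i is held out
            held_out = set(folds[i])
            train = [j for j in idx if j not in held_out]
            cols, beta = best_subset([X[j] for j in train], [y[j] for j in train], k)
            total += sse([X[j] for j in folds[i]], [y[j] for j in folds[i]], cols, beta)
        cv_err[k] = total / len(y)                      # mean validation squared error
    best_k = min(cv_err, key=cv_err.get)
    return best_k, cv_err
```

The k with the smallest cross-validation error is then typically refit on the full data set.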

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Max Flow Applications\n",
"\n",
"The purpose of this assignment is to investigate applications of finding a Max Flow. The problem asks you to design and implement an al

CSOR W4246
HW2
Name: Yeyun Chen
UNI: yc3070
Due: Oct 18th
Date: Oct 11th
Since the graph is undirected and unweighted, we can use BFS to solve this
problem. Let N_SP[t] denote the number of shortest paths from s to t.
1.
First add all nod
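The BFS idea can be sketched as below. This is a minimal illustration, not the submitted solution. It assumes the graph is given as an adjacency dict, and the name `count_shortest_paths` is hypothetical; `n_sp` plays the role of N_SP. Each vertex accumulates the path counts of all its neighbors that lie exactly one BFS layer closer to s. Because BFS dequeues every vertex of layer d before any vertex of layer d + 1, a vertex's count is final by the time it is dequeued.

```python
from collections import deque


def count_shortest_paths(adj, s):
    """Return (dist, n_sp): BFS distances from s and the number of shortest s-v paths."""
    dist = {s: 0}
    n_sp = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:                  # first discovery: v is one layer further out
                dist[v] = dist[u] + 1
                n_sp[v] = n_sp[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:       # another shortest route into v
                n_sp[v] += n_sp[u]
    return dist, n_sp
```

Each edge is examined a constant number of times per direction, so the running time is O(n + m).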

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Connected Components\n",
"\n",
"The purpose of this assignment is to familiarize yourself with the handling of graph data structures. You will implement the algorithm f

CSOR W4246
HW3
Name: Yeyun Chen
UNI: yc3070
Due: Nov 04th
Date: Oct 29th
So the value of the maximum flow and the capacity of the minimum cut are both 11, and the minimum cut is:
S = {s, a, b, c}, T = {d, t}
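A computation of this kind can be cross-checked with a short sketch. It runs Ford-Fulkerson with BFS-chosen augmenting paths (the Edmonds-Karp rule) and reads off the minimum cut as the set of vertices still reachable from s in the final residual graph. The homework's network is not reproduced here, so the example capacities are hypothetical. The function name `max_flow_min_cut` and the dict-of-edges input format are also assumptions.

```python
from collections import deque, defaultdict


def max_flow_min_cut(capacity, s, t):
    """capacity: dict mapping a directed edge (u, v) to its capacity.
    Returns (max flow value, source side S of a minimum cut)."""
    residual = defaultdict(int)
    adj = defaultdict(set)
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)                     # reverse residual edges start at 0
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # No augmenting path: the vertices reached from s form the cut's source side
            return flow, set(parent)
        # Bottleneck capacity along the path
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        # Push the bottleneck and update forward and reverse residual capacities
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        flow += bottleneck
```

By the max-flow min-cut theorem, the returned flow value equals the capacity of the edges leaving S.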
1.

Midterm Exam
STAT W4240: Data Mining
Instructor: Dr. Rahul Mazumder
April 8, 2015 (M/W Section)
Explanation
This exam is to be done in-class. You have 75 minutes to complete the entirety. All solutions should be
written in the accompanying blue book. No o

CSOR W4246 Fall, 2015
HW1 Theoretical part
Out: Thursday, September 17, 2015
Due: 8pm, Monday, September 28, 2015
Please keep your answers clear and concise. For all algorithms you suggest, you must give the best upper
bound that you can for the running t

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# NP-Hard Problems\n",
"\n",
"The purpose of this assignment is to familiarize yourself with different approaches to solving NP-hard problems in practice, especially via

CSOR W4246 Fall, 2015
Homework 3 Theoretical part
Out: Thursday, October 22, 2015
Due: 1pm, Wednesday, November 4, 2015
Please keep your answers clear and concise. For all algorithms you suggest, you must give the best upper
bound that you can for the run

Algorithms for Data Science
CSOR W4246
Eleni Drinea
Computer Science Department
Columbia University
Thursday, November 13, 2014
Outline
1 The structure of the WWW
2 Identifying important pages
3 Hubs and authorities
4 PageRank
Review of last lecture
Data

CSOR W4246 Fall, 2015
Homework 2 Theoretical part
Out: Monday, October 5, 2015
Due: 9pm, Monday, October 19, 2015
Please keep your answers clear and concise. For all algorithms you suggest, you must give the best upper
bound that you can for the running t

Algorithms for Data Science
CSOR W4246
Eleni Drinea
Computer Science Department
Columbia University
Thursday, November 6, 2014
Outline
1 Recap
2 Boolean expressions and satisfiability
3 The art of proving NP-completeness
4 Important members of NPC
Review

CSOR W4246 Fall, 2014
Homework 1
Out: Monday, September 8, 2014
Due: 6pm, Monday, September 22, 2014
Please keep your answers clear and concise, and make sure that your hand-writing is legible and
that your name is clearly written on your homework if you

Algorithms for Data Science
CSOR W4246
Eleni Drinea
Computer Science Department
Columbia University
Thursday, October 9, 2014
Outline
1 Recap
2 Hashing
Review of the last lecture
Negative cycle detection
All-pairs shortest paths (Floyd-Warshall)
Hashing
T

Algorithms for Data Science
CSOR W4246
Eleni Drinea
Computer Science Department
Columbia University
Tuesday, October 21, 2014
Outline
1 Recap
2 Review of flows
3 Correctness of the Ford-Fulkerson algorithm
4 An application of Max-Flow: Bipartite Matching
Re

CSOR W4246 Fall, 2015
Homework 4 Theoretical part
Out: Thursday, November 12, 2015
Due: 1pm, Wednesday, November 25, 2015
Please keep your answers clear and concise. For all algorithms you suggest, you must prove correctness and give the best upper bound

CSOR W4246 Fall, 2016
Homework 3 Theoretical part
Out: Thursday, October 27, 2016
Due: 8pm, Thursday, November 10, 2016
Please keep your answers clear and concise. For all algorithms you suggest, you must give the best upper
bound that you can for the run

Chapter-3
Greedy Method
3.1 Greedy Technique Definition
Constructs a solution to an optimization problem piece by piece through a sequence of
choices that are:
feasible, i.e., satisfying the constraints;
locally optimal (with respect to some neighborhood de
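To make these properties concrete, here is a minimal sketch of one classic greedy algorithm: interval scheduling by earliest finish time. The example is chosen for illustration and is not taken from this chapter. Intervals are assumed half-open [start, end), so intervals that only touch are compatible. The name `interval_scheduling` is hypothetical.

```python
def interval_scheduling(intervals):
    """Greedy: repeatedly take the compatible interval that finishes earliest."""
    chosen = []
    last_end = float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):  # locally optimal order
        if start >= last_end:          # feasible: no overlap with anything already chosen
            chosen.append((start, end))
            last_end = end             # the choice is never revisited
    return chosen
```

Sorting dominates, so the running time is O(n log n). For this problem the earliest-finish rule is known to yield a maximum-size set of compatible intervals. Greedy choices are not optimal for every problem, though.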

Master method for Solving Recurrences
Introduction
Consider a problem that can be solved using a recursive algorithm such as the
following:
Procedure T(n: size of problem) defined as:
    if n < 1 then exit
    Do work of amount f(n)
    T(n/b)
    T(n/b)
    ... repeat for
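A procedure of this shape, with a recursive calls on subproblems of size n/b plus work f(n), has running time $T(n) = a\,T(n/b) + f(n)$. The sketch below covers only the common special case $f(n) = \Theta(n^d)$, where the answer depends on how $d$ compares with $\log_b a$. It is not the general master theorem, which also handles non-polynomial f(n) under a regularity condition. The function name `master_theorem` is hypothetical.

```python
import math


def master_theorem(a, b, d):
    """Asymptotic solution of T(n) = a*T(n/b) + Theta(n^d), for a >= 1, b > 1, d >= 0."""
    if a < 1 or b <= 1 or d < 0:
        raise ValueError("need a >= 1, b > 1, d >= 0")
    crit = math.log(a, b)              # critical exponent log_b(a)
    if math.isclose(d, crit):          # work is balanced across all levels of the recursion
        return f"Theta(n^{d:g} log n)"
    if d > crit:                       # work at the root dominates
        return f"Theta(n^{d:g})"
    return f"Theta(n^{crit:g})"        # work at the leaves dominates
```

For example, merge sort has a = 2, b = 2, d = 1, which falls in the balanced case: $\Theta(n \log n)$.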

function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z.

% Preallocate the output with the same size as z
g = zeros(size(z));

% Compute the sigmoid of each element of z (z may be a scalar, vector or matrix).
% The element-wise operators ./ and exp apply the formula entrywise.
g = 1 ./ (1 + exp(-z));

end

Algorithms for Data Science
CSOR W4246
Eleni Drinea
Computer Science Department
Columbia University
October 20-22, 2015
Outline
1 Recap
2 Flow networks
Applications
3 The residual graph and augmenting paths
4 The Ford-Fulkerson algorithm for max flow
5 Corr

Algorithms for Data Science
CSOR W4246
Eleni Drinea
Computer Science Department
Columbia University
Thursday, December 3, 2015
Outline
1 The structure of the WWW
2 Identifying important pages via link analysis
3 Hubs and authorities
4 PageRank
Review of l

Algorithms for Data Science
CSOR W4246
Eleni Drinea
Computer Science Department
Columbia University
Tuesday, November 24, 2015
Outline
1 Recap
2 Hashing
3 Time/space analysis of chain hashing
Balls and bins models
Expected & worst-case analysis of Lookup

Algorithms for Data Science
CSOR W4246
Eleni Drinea
Computer Science Department
Columbia University
Tuesday, December 1, 2015
Outline
1 Recap
Balls and bins
2 On randomized algorithms
3 Saving space: hashing-based fingerprints
4 Bloom filters
Today
1 Recap
Ba