CSCI 5510: Tutorial 7
Mid-term Preview
Tutor: Robbie
Oct. 29, 2013
Announcement
Assignment 2 is posted online
Deadline: 23:59:59 Nov. 17
Late penalty is the same as for Assignment 1
Marks of assignment 1 wi
CSCI 5510 Big Data Analytics
Lecture 2: MapReduce and
Frequent Itemsets
Prof. Irwin King and Prof. Michael R. Lyu
Computer Science & Engineering Dept.
The Chinese University of Hong Kong
Grade Assessment Scheme and
Deadlines
Assignments
(20%)
Written
a
Chapter 10
Mining Social-Network
Graphs
There is much information to be gained by analyzing the large-scale data that
is derived from social networks. The best-known example of a social network
is
Chapter 3
Finding Similar Items
A fundamental data-mining problem is to examine data for similar items. We
shall take up applications in Section 3.1, but an example would be looking at a
collection
Chapter 2
Map-Reduce and the New
Software Stack
Modern data-mining applications, often called big-data analysis, require us
to manage immense amounts of data quickly. In many of these applications, th
CSCI5510 In-Class Practice
Social Graph
Date: ____
Student Names: ____
IDs: ____
For the graph on the right, compute:
The adjacency matrix
The degree matrix
The Laplacian matrix
Answer:
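The graph figure is not reproduced in this text, so the following is only a minimal sketch on a hypothetical 4-node undirected graph (the edge list and the helper name `graph_matrices` are assumptions, not part of the worksheet). It shows how the three matrices relate: the degree matrix D holds the row sums of the adjacency matrix A on its diagonal, and the unnormalized Laplacian is L = D - A.

```python
import numpy as np

# Hypothetical undirected graph for illustration only;
# this is NOT the graph from the worksheet figure.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

def graph_matrices(n, edges):
    """Return adjacency A, degree D, and unnormalized Laplacian L = D - A."""
    A = np.zeros((n, n), dtype=int)
    for u, v in edges:
        A[u, v] = 1
        A[v, u] = 1  # undirected: keep A symmetric
    D = np.diag(A.sum(axis=1))  # node degrees on the diagonal
    L = D - A
    return A, D, L

A, D, L = graph_matrices(n, edges)
print("A =\n", A)
print("D =\n", D)
print("L =\n", L)
```

A quick sanity check for any answer computed this way: every row of L should sum to zero, and A should be symmetric for an undirected graph.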
CSCI 5510 Big Data Analytics
Lecture 11: Online Learning
Prof. Irwin King and Prof. Michael R. Lyu
Computer Science & Engineering Dept.
The Chinese University of Hong Kong
Outline
Introduction
Lea
CSCI 5510 Big Data Analytics
Lecture 7: Matrix Factorization
Methods
Prof. Irwin King and Prof. Michael R. Lyu
Computer Science & Engineering Dept.
The Chinese University of Hong Kong
Outline
Introd
CSCI 5510 Big Data Analytics
Lecture 4: Mining Data Streams
Prof. Irwin King and Prof. Michael R. Lyu
Computer Science & Engineering Dept.
The Chinese University of Hong Kong
Motivation
In many dat
Chapter 1
Data Mining
In this introductory chapter we begin with the essence of data mining and a discussion of how data mining is treated by the various disciplines that contribute
to this field. We cov
CSCI 5510 Big Data Analytics
Mining Data Streams
Prof. Irwin King and Prof. Michael
R. Lyu
Computer Science & Engineering
Dept.
Motivation
In many data mining situations, we know the entire data se
CSCI5510 In-Class Practice
Dimensionality Reduction
Date: ____
Student Names: ____
IDs: ____
1. Describe briefly (informally or formally) the relationship between singular value
decomposition and eigenval
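The question is cut off in this excerpt. Assuming it refers to eigenvalue decomposition, one standard relationship can be checked numerically: the squared singular values of a matrix A are the eigenvalues of A^T A, and the right singular vectors of A are eigenvectors of A^T A. The sketch below uses an arbitrary random matrix (an assumption chosen only for illustration) and the hypothetical helper name `svd_vs_eig`.

```python
import numpy as np

def svd_vs_eig(A):
    """Compare the SVD of A with the eigendecomposition of A^T A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # eigh is for symmetric matrices; it returns eigenvalues in ascending order
    eigvals, _ = np.linalg.eigh(A.T @ A)
    eigvals_desc = eigvals[::-1]  # match the descending order of singular values
    # Each right singular vector v_i should satisfy (A^T A) v_i = sigma_i^2 v_i
    V = Vt.T
    residual = A.T @ A @ V - V * s**2
    return s, eigvals_desc, residual

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # arbitrary example matrix
s, eigvals_desc, residual = svd_vs_eig(A)

print(np.allclose(s**2, eigvals_desc))
print(np.allclose(residual, 0))
```

The same check works with A A^T and the left singular vectors, which is why SVD applies to rectangular matrices where a direct eigenvalue decomposition of A does not exist.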
CSCI 5510 Big Data Analytics
Lecture 8: Massive Link Analysis
Prof. Irwin King and Prof. Michael R. Lyu
Computer Science & Engineering Dept.
The Chinese University of Hong Kong
What's the Mechanism B
CSCI 5510 Big Data Analytics
Lecture 6: Data Representation for
High Dimensional Data
Prof. Irwin King and Prof. Michael R. Lyu
Computer Science & Engineering Dept.
The Chinese University of Hong Kong
CSCI5510 In-Class Practice
MapReduce and Hadoop
Date: ____
Student Names: ____
IDs: ____
1. Given the following input:
I spent long spells at sea on all types of vessel; I followed officer training with t
CSCI5510 In-Class Practice
Mining Data Streams
Date: ____
Student Names: ____
IDs: ____
1. There are several ways that the bit-stream 1001011011101 could be
partitioned into buckets. Find all of them.
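The bucket rules themselves are not restated in this excerpt. Assuming the usual DGIM conditions (each bucket ends at a 1, every 1 lies in exactly one bucket, bucket sizes counted in 1s are powers of 2, sizes never decrease moving left, and each size up to the largest appears once or twice), a candidate partition can be checked with a small sketch like the one below. The function name and the `(start, end)` bucket encoding are assumptions made for illustration; the code validates one candidate and does not enumerate the answers.

```python
def is_power_of_two(k):
    return k > 0 and (k & (k - 1)) == 0

def is_valid_dgim_partition(bits, buckets):
    """Check a candidate partition against the assumed DGIM rules.

    bits:    string of '0'/'1', oldest bit first.
    buckets: list of (start, end) inclusive index pairs, oldest bucket first.
    """
    covered = set()
    sizes = []
    prev_end = -1
    for start, end in buckets:
        # Buckets must be in order, inside the stream, and non-overlapping
        if not (prev_end < start <= end < len(bits)):
            return False
        # The right end of a bucket must be a 1
        if bits[end] != "1":
            return False
        size = bits[start:end + 1].count("1")
        if not is_power_of_two(size):
            return False
        covered.update(range(start, end + 1))
        sizes.append(size)
        prev_end = end
    # Every 1 must lie in some bucket
    if any(b == "1" and i not in covered for i, b in enumerate(bits)):
        return False
    # Sizes must not decrease moving left (oldest buckets are largest)
    if any(older < newer for older, newer in zip(sizes, sizes[1:])):
        return False
    # One or two buckets of each size, up to the maximum size
    size = 1
    while sizes and size <= max(sizes):
        if sizes.count(size) not in (1, 2):
            return False
        size *= 2
    return True

stream = "1001011011101"
candidate = [(0, 6), (8, 9), (10, 10), (12, 12)]  # sizes 4, 2, 1, 1
rejected = [(0, 6), (8, 10), (12, 12)]            # middle bucket holds three 1s
print(is_valid_dgim_partition(stream, candidate))
print(is_valid_dgim_partition(stream, rejected))
```

Under these assumptions the first candidate passes and the second fails, because a bucket containing three 1s is not a power of 2.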
100
CSCI 5510 Big Data Analytics
Lecture 9: Large Scale Support
Vector Machines
Prof. Irwin King and Prof. Michael R. Lyu
Computer Science & Engineering Dept.
The Chinese University of Hong Kong
Motivat
CSCI5510 In-Class Practice
Scalable Clustering
Date: ____
Student Names: ____
IDs: ____
Given 8 points in the left 2D space,
suppose that the initial seeds
(centers of each cluster) are A1, A4
and A7. Run
CSCI5510 Tutorial 3
Introduction to Numpy
Guang Ling
Sept. 24, 2013
Announcement
Lecture 4 (Mining Data Streams) is moved
to next Monday's lecture time (when Lecture
5 is scheduled)
Tutorial on Oct. 1
CSCI5510 Tutorial 5
Hints on Assignment 1 and
FAQs
Guang Ling
Oct. 14, 2013
Late Penalty
Submitted before Oct. 11 23:59:59, no
penalty
Submitted before Oct. 12 23:59:59, 20%
mark deduction
Submitted be
Chapter 4
Mining Data Streams
Most of the algorithms described in this book assume that we are mining a
database. That is, all our data is available when and if we want it. In this
chapter, we shall m
CSCI5510 Tutorial 6
The Netflix Prize and Related
Recommendation Models
Jieming Zhu
Oct. 14, 2013
Outline
The Netflix Prize & the Recommendation
Problem
Some Interesting Contests on Machine
Learning A
Chapter 6
Frequent Itemsets
We turn in this chapter to one of the major families of techniques for characterizing data: the discovery of frequent itemsets. This problem is often viewed as
the discover