Boston University
Department of Computer Science
CS 565 Data Mining
Midterm Exam
Date: Oct 14, 2009
Time: 4:00 p.m. - 5:30 p.m.
Write Your University Number Here:
Answer all questions.
Good luck!
Problem 1 [25 points]
True or False:
1. Maximal frequent itemsets are sufficient to determine all frequent itemsets with their
supports.
2. The maximal frequent itemsets (and only those) constitute the positive border of a
frequent-set collection.
3. Let $D$ be the Euclidean distance between multidimensional points. Assume a set of
$n$ points $X = \{x_1, \ldots, x_n\}$ in a $d$-dimensional space and project them into a
lower dimensional space $k \geq O(\log n)$. If $Y = \{y_1, \ldots, y_n\}$ is the new set of
$k$-dimensional points, then, the Johnson-Lindenstrauss lemma states that for all pairs
$(i, j)$ it holds that $S(x_i, x_j) = D(y_i, y_j)$. (All points $x_i$ and $y_i$ are
normalized to have length 1.)
4. Computing the mean and the variance of a stream of numbers can be done using a single
pass over the data and constant (