Problem 1:
a) True
Consider visiting the rows in the permuted order. The first row that has a one in either of the two columns
is also the first row in which the column C1 \/ C2 has a one. Consequently, the smaller of the two columns'
minhash values (the first such row number) is also the minhash of C1 \/ C2.
b) False
Consider the following permuted order of rows, written as (C1 C2): 1) 1 0 2) 0 1 3) 1 1. Under this
permutation the minhash values of C1 and C2 are 1 and 2, while that of C1 /\ C2 is 3.
c) True
Follows directly from part a)
d) True
Since h(C1) = h(C2), the first row (under the permuted order) that has a 1 in C1 also has a 1 in C2.
Therefore, by definition, the column C1 /\ C2 also has a 1 in this row, and no earlier row can have a 1 in
C1 /\ C2 because no earlier row has a 1 in C1. The result follows.
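The arguments in parts a), b) and d) can be checked mechanically on the three rows used in part b). The following is a minimal sketch, not part of the original solution; the helper name `minhash` and the row encoding are illustrative assumptions.

```python
from itertools import permutations


def minhash(column, order):
    """Position (1-based) of the first row, in permuted order, holding a 1."""
    for position, row in enumerate(order, start=1):
        if column[row] == 1:
            return position
    return None  # column is all zeros


# Rows from part b), written as (C1, C2) pairs.
rows = [(1, 0), (0, 1), (1, 1)]
c1 = [a for a, _ in rows]
c2 = [b for _, b in rows]
union = [a | b for a, b in rows]   # C1 \/ C2
inter = [a & b for a, b in rows]   # C1 /\ C2

# Part b): the row order given in the text.
given = (0, 1, 2)
b_values = (minhash(c1, given), minhash(c2, given), minhash(inter, given))

# Parts a) and d): check every permutation of these three rows.
union_is_min = True
equal_implies_inter = True
for order in permutations(range(len(rows))):
    h1, h2 = minhash(c1, order), minhash(c2, order)
    if minhash(union, order) != min(h1, h2):
        union_is_min = False
    if h1 == h2 and minhash(inter, order) != h1:
        equal_implies_inter = False

print(b_values, union_is_min, equal_implies_inter)
```

For these rows the script prints `(1, 2, 3) True True`: the counterexample of part b) is reproduced, and the claims used in parts a) and d) hold under all six row orders.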
Problem 2:
a) True
$h(i) = \lambda \sum_k A(i,k)\, a(k)$ and $h(j) = \lambda \sum_k A(j,k)\, a(k)$. $Out(i) \subseteq Out(j)$ implies that
whenever $A(i,k)$ is 1, $A(j,k)$ is also 1. This, coupled with the fact that the $a(k)$'s are positive, gives the result.
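As a numerical illustration of this argument, here is a small sketch of the hub/authority iteration on a made-up four-node graph in which Out(i) is a subset of Out(j). The graph, the node names and the max-normalization (standing in for the constants lambda and mu) are assumptions chosen for the example, not taken from the problem.

```python
def hits(out_links, iterations=50):
    """Iterate hub and authority scores; each vector is rescaled so its max is 1."""
    nodes = list(out_links)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: a(k) is proportional to the sum of h(i) over i pointing to k.
        auth = {n: 0.0 for n in nodes}
        for i, targets in out_links.items():
            for k in targets:
                auth[k] += hub[i]
        scale = max(auth.values())
        auth = {n: v / scale for n, v in auth.items()}
        # Hub update: h(i) is proportional to the sum of a(k) over k that i points to.
        hub = {i: sum(auth[k] for k in targets) for i, targets in out_links.items()}
        scale = max(hub.values())
        hub = {n: v / scale for n, v in hub.items()}
    return hub, auth


# Hypothetical graph: Out(i) = {x} is a subset of Out(j) = {x, y}.
out_links = {"i": ["x"], "j": ["x", "y"], "x": ["i"], "y": ["j"]}
hub, auth = hits(out_links)
print(hub["i"] <= hub["j"])
```

Every term in the sum for h(i) also appears in the sum for h(j), so the comparison holds after every iteration, not only in the limit.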
b) False
Consider the following figure. In the figure, $Out(i) \subset Out(j)$, while $p(i) > p(j)$.
c) True
$p(i) = (1-f)\left(\sum_k M(i,k)\, p(k)\right) + f$ and $p(j) = (1-f)\left(\sum_k M(j,k)\, p(k)\right) + f$, where $f$ is the
fudge factor and $M$ is the matrix that has entry $M(i,k) = 1/d$ iff $k$ points to $i$ and $k$ has out-degree $d$.
$In(i) \subseteq In(j)$ implies that if $M(i,k) = 1/d > 0$, then $M(j,k) = 1/d > 0$. This, coupled with the fact that
the $p(k)$'s are positive, gives the result.
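The same monotonicity can be observed numerically. Below is a minimal sketch of the iteration p = (1-f) M p + f on a made-up four-node graph with In(i) a subset of In(j); the graph, the value f = 0.15 and the function name `pagerank` are assumptions made for illustration only.

```python
def pagerank(out_links, f=0.15, iterations=100):
    """Iterate p(i) = (1 - f) * sum_k M(i,k) p(k) + f, with M(i,k) = 1/outdeg(k)."""
    nodes = list(out_links)
    p = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_p = {n: f for n in nodes}
        for k, targets in out_links.items():
            share = (1 - f) * p[k] / len(targets)  # (1 - f) * M(i,k) * p(k)
            for i in targets:
                new_p[i] += share
        p = new_p
    return p


# Hypothetical graph: In(i) = {a} is a subset of In(j) = {a, b}.
out_links = {"i": ["a"], "j": ["b"], "a": ["i", "j"], "b": ["j"]}
p = pagerank(out_links)
print(p["i"] <= p["j"])
```

Here p(j) receives everything p(i) receives, plus the contribution from b, which is the term-by-term comparison used in the proof.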
d) False
In fact the opposite is true, namely $a(j) \le a(i)$. This follows from the same reasoning as in a), with $A$
replaced by $A^T$ and $\lambda$ replaced by $\mu$.
Problem 3:
There are exactly 3 stable models: {p1,p2}, {q1,p2} and {q1,q2}. One may arrive at the answer by
applying the GL-transform to all 16 candidate models, but the following observations might relieve one of
that tedium:
(1) Since we have pi :- NOT qi and qi :- NOT pi (for both i=1 and i=2), exactly one of pi and qi belongs to a
stable model.
(2) If p1 is part of a model, then so is p2.
Observation (1) reduces the number of candidate models to check down to just 4, and observation (2)
rules out the candidate {p1, q2}. The three possibilities left all turn out to be stable.
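The brute-force route (applying the GL-transform to all 16 candidates) is easy to sketch in code. The program text is not reproduced in this solution, so the rule set below is an assumption: the four negation rules from observation (1), plus a hypothetical rule `p2 :- p1` standing in for whatever rule yields observation (2).

```python
from itertools import combinations

# Each rule is (head, positive body atoms, negated body atoms).
RULES = [
    ("p1", [], ["q1"]),   # p1 :- NOT q1
    ("q1", [], ["p1"]),   # q1 :- NOT p1
    ("p2", [], ["q2"]),   # p2 :- NOT q2
    ("q2", [], ["p2"]),   # q2 :- NOT p2
    ("p2", ["p1"], []),   # ASSUMED rule giving observation (2): p2 :- p1
]
ATOMS = ["p1", "q1", "p2", "q2"]


def gl_reduct(rules, candidate):
    """Drop rules with a negated atom in the candidate; strip negation from the rest."""
    return [(head, pos) for head, pos, neg in rules
            if not any(atom in candidate for atom in neg)]


def least_model(positive_rules):
    """Least model of a negation-free program, by fixpoint iteration."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if head not in model and all(atom in model for atom in pos):
                model.add(head)
                changed = True
    return model


def stable_models(rules, atoms):
    """A candidate is stable iff it equals the least model of its own reduct."""
    found = []
    for size in range(len(atoms) + 1):
        for subset in combinations(atoms, size):
            candidate = set(subset)
            if least_model(gl_reduct(rules, candidate)) == candidate:
                found.append(candidate)
    return found


models = stable_models(RULES, ATOMS)
print(sorted(sorted(m) for m in models))
```

Under the assumed rule set this reports exactly {p1,p2}, {q1,p2} and {q1,q2}, and rejects {p1,q2} because the least model of its reduct also contains p2.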
Common errors:
All errors had low support but the following two stood out:
1. Not considering all the possibilities and providing only a subset of the answer.
2. Believing that {p1,q2} is stable.
Grading:
If the provided solution was a subset of the correct solution, your score was 15*Sim_Jaccard(Correct
Solution, Your Solution).
If you provided a superset of the correct solution, but had used Observation (1), you lost 3 points.
Otherwise, you scored min(5*#correct models in solution, 15 - max(5*#wrong models in solution, 10)).
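For the first grading rule (an answer that is a subset of the correct solution), the score is a plain Jaccard computation. A minimal sketch follows; the function name `subset_score` and the set encoding of models are illustrative assumptions, and the other two rules are deliberately not modeled.

```python
# The three stable models from the solution, as a set of frozensets.
CORRECT = {
    frozenset({"p1", "p2"}),
    frozenset({"q1", "p2"}),
    frozenset({"q1", "q2"}),
}


def subset_score(answer, correct=CORRECT, full_marks=15):
    """Score 15 * Jaccard(correct, answer) for an answer that is a subset of correct."""
    answer = {frozenset(model) for model in answer}
    if not answer <= correct:
        raise ValueError("this rule covers only subsets of the correct solution")
    return full_marks * len(answer & correct) / len(answer | correct)


# Example: two of the three stable models were listed.
score = subset_score([{"p1", "p2"}, {"q1", "q2"}])
print(score)
```

Listing two of the three models gives a Jaccard similarity of 2/3 and hence 10 of the 15 points.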
Problem 4:
(a) There are $\binom{100{,}000}{2}$, or about $5 \times 10^{9}$, frequent pairs. These occur 100 times each, for a total of
$5 \times 10^{11}$ occurrences. The number of frequent-infrequent pairs is $10^{11}$, and these occur 10 times each, for a
total of $10^{12}$ occurrences. Finally, there are $\binom{1{,}000{,}000}{2}$, or about $5 \times 10^{11}$
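The counts quoted in part (a) can be checked with exact integer arithmetic. In the sketch below the item counts are read off the binomials in the text; treating the 1,000,000 items as the infrequent ones, and the frequent-infrequent pair count as the product of the two item counts, are assumptions inferred from the quoted figure of 10^11.

```python
from math import comb

FREQUENT_ITEMS = 100_000     # from "100,000 choose 2"
OTHER_ITEMS = 1_000_000      # from "1,000,000 choose 2"; assumed to be the infrequent items

frequent_pairs = comb(FREQUENT_ITEMS, 2)
frequent_pair_occurrences = frequent_pairs * 100   # each such pair occurs 100 times

mixed_pairs = FREQUENT_ITEMS * OTHER_ITEMS         # assumed: one frequent and one infrequent item
mixed_pair_occurrences = mixed_pairs * 10          # each such pair occurs 10 times

other_pairs = comb(OTHER_ITEMS, 2)

print(frequent_pairs, frequent_pair_occurrences)
print(mixed_pairs, mixed_pair_occurrences)
print(other_pairs)
```

The exact values are 4,999,950,000 and 499,999,500,000 for the two binomials, which round to the figures of about 5*10^9 and 5*10^11 used in the text.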
 Fall '09