CS345 Midterm Examination
Wednesday, May 14, 2003, 9:30 - 11:30 AM

Directions

• The exam is open book; any written materials may be used.
• Answer all 9 questions on the exam paper itself.
• The total number of points is 120 (i.e., 1 point per minute).
• Do not forget to sign the pledge below.

I acknowledge and accept the honor code.

Print your name here:

Problem 1: (12 points) Let C1 and C2 denote two columns (items) of a matrix that
represents market-basket data. Let C1 V C2 denote a column that is the row-wise logical
OR of the two columns; i.e., C1 V C2 has a 1 when either C1 or C2 or both has a 1, and has
0 otherwise. Similarly, C1 /\ C2 denotes the row-wise logical AND of the two columns; i.e.,
C1 /\ C2 has a 1 if and only if both columns have 1. Let h(C) denote the minhash value for
column C. That is, h(C) is the smallest i such that the ith row in the chosen (permuted)
order of rows has a 1 in column C. For each of the following statements, indicate whether
it is always true or sometimes false, by circling T or F, respectively.

a) h(C1 V C2) [...]   T F
b) h(C1 /\ C2) [...] max(h(C1), h(C2))   T F
c) If [...] h(C2), then h(C1 V C2) [...]   T F

Problem 2: (12 points) Let In(x) denote the set of pages that link to page x, and let
Out(x) denote the set of pages to which page x links. Let h(x), a(x), and p(x) denote the
“hubbiness,” authority, and PageRank of page x, respectively. Indicate whether each of
the following statements is always true (T) or sometimes false (F).

a) If Out(i) ⊆ Out(j), then [...] ≤ [...]   T F
b) If Out(i) ⊆ Out(j), then [...] ≤ [...]   T F
c) If [...] ⊆ In(j), then [...] ≤ [...]   T F
d) If [...] ⊆ In(i), then a(i) [...] a(j).   T F

Problem 3: (15 points) What are all the stable models for the following propositional-logic
program?

p1 :- NOT q1
q1 :- NOT p1
p2 :- p1
p2 :- NOT q2
q2 :- NOT p2

Problem 4: (15 points) A collection of market-basket data has 100,000 frequent items,
and 1,000,000 infrequent items. Each pair of frequent items appears 100 times; each pair
consisting of one frequent and one infrequent item appears 10 times, and each pair of
infrequent items appears once. Answer each of the following questions. Your answers only
have to be correct to within 1%, and for convenience, you may optionally use scientific
notation, e.g., 3.14 × 10^8 instead of 314,000,000.

a) What is the total number of pair occurrences? That is, what is the sum of the counts
   of all pairs?

b) We did not state the support threshold, but the given information lets us put bounds
   on the support threshold s. What are the tightest upper and lower bounds on s?

c) Suppose we apply the PCY algorithm to this data. If the actual support threshold s
   is 10,000,000 (i.e., 10^7), and pairs in each of the three categories distribute as evenly
   as possible, what is the smallest number of buckets we can use so that most of the
   buckets are not frequent?

Problem 5: (16 points) Consider the following rules:

p(X) :- int(X) & X ≥ 2 & NOT c(X)
c(X) :- int(X) & p(Y) & divides(X,Y) & X ≠ Y

Think of p(X) as meaning “X is a prime” and c(X) as “X is composite.” The EDB
predicate int(X) says that X is a positive integer, and in practice it will hold a finite set
of integers. The EDB predicate divides(X,Y) means that Y evenly divides X.

Suppose that int = {1,2,3,4}, and divides is the expected relation on these four
integers; that is, divides = {(1,1), (2,1), (3,1), (4,1), (2,2), (4,2), (3,3), (4,4)}. If
we instantiate these rules in all possible ways, eliminate rules with a known false subgoal
and then eliminate known true subgoals from the remaining rules, we are left with the
following:

p(2) :- NOT c(2)     c(2) :- p(1)
p(3) :- NOT c(3)     c(3) :- p(1)
p(4) :- NOT c(4)     c(4) :- p(1)
                     c(4) :- p(2)

a) Use the alternating-fixedpoint method to compute the well-founded model for this
program plus EDB, by filling in the following table and then indicating the truth
value (T, F, UNK) of each of the eight ground atoms. The table may have extra space
for rounds that need not be computed; you may fill in the table only until you are
sure you have reached convergence.

Round    0    1    2    3    4    Truth Value

b) In the space below, draw the dependency graph for the instantiated atoms p(i) and
   c(i) for 1 ≤ i ≤ 4.

c) Are the rules with the given EDB locally stratified? If so, tell what the strata
   are; if not, describe an infinite negative path.

d) Suppose int contains the integers from 1 to n, and divides contains all those pairs
   (i,j) such that j divides i and i and j are integers between 1 and n. For what values
   of n will the rules and EDB be locally stratified? Explain briefly.

Problem 6: (16 points) A view-centric information system has a single view:
v(X,Y,Z) :- e(X,Y) & e(Y,Z) & e(X,Z)

We wish to answer the following query:

q(A,B,C,D) :- e(A,B) & e(B,C) & e(C,D) & e(A,C) & e(B,D) & e(A,D)

Notice that in this unusual case, neither the view definition nor the query has any variables
that do not appear in the head. That fact may simplify reasoning about the problem. Also
observe that the view describes a triangle in a graph, but the edges are directed, and go in
the direction from one argument of the head (representing a node) to another that appears
to the right, among the arguments of the head. Likewise, the query asks for a complete
graph of 4 nodes, again with direction determined by “to the right, among the arguments
of the head.”

A conjunctive query Q, all of whose subgoals have predicate v, is a solution if, after
expansion, it is contained in the query. For Q to be a minimal solution, any conjunctive
query P formed by deleting one or more subgoals from the body of Q must not be a
solution; i.e., the expansion of P is not contained in the query. For each of the proposed
solutions below, tell whether it is: not a solution, a solution but not minimal, or a
minimal solution. In each case, explain your reasoning briefly. Suggestions: describe
the expansions of the proposed solutions and indicate containment mappings when needed.

a) q(A,B,C,D) :- v(A,B,C) & v(B,C,D)
b) q(A,B,C,D) :- v(A,B,C) & v(B,C,D) & v(A,C,D)
c) q(A,B,C,D) :- v(A,B,C) & v(A,E,D) & v(B,F,D) & v(G,C,D)
d) q(A,B,C,D) :- v(A,B,C) & v(B,C,D) & v(B,A,D)

Problem 7: (8 points) A market-basket data set contains 10 items. For a particular
sample of the data, the set of all maximal frequent itemsets is precisely the set of all pairs
of items. How many itemsets that are subsets of these 10 items will be a part of the
negative border (as used in Toivonen’s Algorithm)? Explain your answer briefly.

Problem 8: (16 points) Consider the following conjunctive queries with arithmetic:

Q1: panic :- a(A,B) & a(B,A) & A ≠ B
Q2: panic :- a(X,Y) & a(Y,X) & X < Y

We wish to check whether or not Q1 ⊆ Q2.
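
As an informal sanity check on the containment question, one can evaluate both queries on every small database and look for one where Q1 derives panic but Q2 does not. The sketch below does this by brute force in Python. The function names and the choice of an integer domain of size 3 are assumptions made for illustration, and the search is not the containment-mapping argument that the parts of this problem ask for.

```python
from itertools import product


def q1(a):
    """Q1: panic :- a(A,B) & a(B,A) & A != B, evaluated on a set of pairs."""
    return any((y, x) in a and x != y for (x, y) in a)


def q2(a):
    """Q2: panic :- a(X,Y) & a(Y,X) & X < Y, evaluated on a set of pairs."""
    return any((y, x) in a and x < y for (x, y) in a)


def find_counterexample(domain_size=3):
    """Try every relation a over {0, ..., domain_size-1}.

    Returns a relation on which Q1 holds and Q2 fails, or None if the
    exhaustive search over this (small, assumed) domain finds none.
    """
    pairs = list(product(range(domain_size), repeat=2))
    for bits in product([False, True], repeat=len(pairs)):
        a = {p for p, keep in zip(pairs, bits) if keep}
        if q1(a) and not q2(a):
            return a
    return None


if __name__ == "__main__":
    print(find_counterexample())
```

A result of None only says that no counterexample exists on that small domain. A returned relation, by contrast, would be a concrete witness that the containment fails.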
a) Rewrite Q1 and Q2 as rectified rules.

b) What are all the containment mappings from the uninterpreted subgoals of Q2 to
   those of Q1?

c) Write the statement about arithmetic that must be checked to verify that Q1 ⊆ Q2.

d) Is the condition of (c) true? Explain briefly.

Problem 9: (10 points) Suppose a Web graph is undirected, i.e., page i points to page
j if and only if page j points to page i. Are the following statements true or false? Justify
your answers briefly.

a) The hubbiness and authority vectors are identical, i.e., for each page, its hubbiness is
   equal to its authority.

b) The matrix M that we use to compute PageRank is symmetric; i.e., [...] = [...] for all i and ...
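
For experimenting with the two statements of Problem 9, the following Python sketch computes hubbiness and authority by repeated multiplication and builds a PageRank transition matrix for a small undirected graph. Several things here are assumptions rather than part of the exam: the four-page example graph, scaling each vector so its largest entry is 1, starting from all-ones vectors, and defining M[i][j] as 1/outdegree(j) when page j links to page i. One example can refute a claim but cannot prove one.

```python
def hits(adj, iterations=50):
    """Iterate hubbiness h and authority a for a 0/1 adjacency matrix.

    adj[i][j] == 1 means page i links to page j. Each vector is rescaled
    so that its largest component is 1 (an assumed normalization).
    """
    n = len(adj)
    h = [1.0] * n
    a = [1.0] * n
    for _ in range(iterations):
        # Authority of j: total hubbiness of the pages that link to j.
        a = [sum(adj[i][j] * h[i] for i in range(n)) for j in range(n)]
        top = max(a)
        a = [x / top for x in a]
        # Hubbiness of i: total authority of the pages that i links to.
        h = [sum(adj[i][j] * a[j] for j in range(n)) for i in range(n)]
        top = max(h)
        h = [x / top for x in h]
    return h, a


def pagerank_matrix(adj):
    """Assumed convention: M[i][j] = 1/outdegree(j) if j links to i, else 0."""
    n = len(adj)
    out = [sum(row) for row in adj]
    return [[adj[j][i] / out[j] if out[j] else 0.0 for j in range(n)]
            for i in range(n)]


def is_symmetric(m, tol=1e-12):
    n = len(m)
    return all(abs(m[i][j] - m[j][i]) <= tol
               for i in range(n) for j in range(n))


# Hypothetical undirected graph: a triangle on pages 0, 1, 2 plus page 3
# attached to page 0. Every link appears in both directions.
EXAMPLE = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 0 + 1, 0, 0],
    [1, 0, 0, 0],
]

if __name__ == "__main__":
    h, a = hits(EXAMPLE)
    print("hubbiness:", h)
    print("authority:", a)
    print("M symmetric:", is_symmetric(pagerank_matrix(EXAMPLE)))
```

It is worth repeating the experiment on other undirected graphs, including a bipartite one such as a three-page path, since the iteration can behave differently there depending on the starting vectors.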
This document was uploaded on 01/25/2012.
 Spring '09
