CS246 Midterm, Fall 2007 — Page: 1

UCLA Computer Science Department, Fall 2007. Instructor: J. Cho.

Student Name and ID:

CS246 Midterm: 1.5 Hours

Attach extra pages as needed. Write your name and ID on the extra pages. If you need to make an additional assumption to solve a problem, write it down on the sheet. A calculator and a one-page, double-sided cheat sheet are allowed.

| Problem | Points |
|---------|--------|
| 1 | 20 |
| 2 | 20 |
| 3 | 20 |
| 4 | 20 |
| 5 | 20 |
| Total | 100 |

Exam Score:

Problem 1: 20 points

Using the Apriori algorithm, you have identified a frequent $k$-itemset $I = \{i_1, \ldots, i_k\}$. Now your task is to find all association rules with items in $I$ whose confidence scores are **below** $c$. That is, we want to find all low-confidence rules from the itemset $I$. For this task, you have to consider a rule $r$ only if it has every item in $I$ on either the left side or the right side. For example, for $I = \{a, b, c, d\}$, you should consider the rule $r_1: ab \to cd$, but not $r_2: a \to bc$, because $r_2$ does not have $d$. From the Apriori algorithm, you already know the frequency counts of all itemsets. Explain how you can identify the low-confidence rules from $I$ efficiently.

(Hint: Consider $I = \{a, b, c, d\}$. For the rule $cd \to ab$, we know that $P(ab \mid cd) = \frac{P(abcd)}{P(cd)}$. Note that every rule we consider from $I$ shares the same numerator $P(abcd)$ in its confidence formula, because all items in $I$ must appear in the rule.)

Answer: From $P(ab \mid cd) = \frac{P(abcd)}{P(cd)} < c$, we can derive $\frac{P(abcd)}{c} < P(cd)$. That is, the rule $cd \to ab$ is a low-confidence rule if and only if $P(cd) > \frac{P(abcd)}{c} = c'$. In general, for any non-empty proper subset $X \subset I$, the rule $X \to I \setminus X$ has low confidence exactly when $P(X) > c'$. Therefore, we can identify all low-confidence rules efficiently simply by applying the Apriori algorithm to the subsets of $I$ with the new threshold value $c' = \frac{P(abcd)}{c}$. The Apriori pruning remains valid because if $P(X) > c'$, then every subset of $X$ also has probability greater than $c'$.

Problem 2: 20 points

In order to take random samples from the following graph, you have run the WebWalker algorithm multiple times.
In one of the runs, the algorithm visited the nodes in the sequence ...
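The answer to Problem 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the instructor's solution: the function name `low_confidence_rules`, the dictionary format for the supports (a `frozenset` of items mapped to its count, assumed to already hold every subset of $I$ as Apriori would have computed), and the example counts are all hypothetical. Counts can stand in for probabilities because $P(X) > P(I)/c$ holds exactly when $\mathrm{count}(X) > \mathrm{count}(I)/c$. The sketch assumes $c > 0$, computes $c'$, and runs an Apriori-style level-wise search for left-hand sides $X$ with support above $c'$, emitting $X \to I \setminus X$ for each.

```python
from itertools import combinations


def low_confidence_rules(itemset, support, c):
    """Find rules X -> I - X whose confidence support[I] / support[X] is below c.

    Hypothetical interface: `support` maps a frozenset of items to its count and
    already contains every non-empty subset of `itemset`; assumes 0 < c.
    """
    items = frozenset(itemset)
    c_prime = support[items] / c  # confidence < c  <=>  support[X] > c'
    rules = []
    # Level 1: single-item left-hand sides above the threshold.
    level = {frozenset([i]) for i in items if support[frozenset([i])] > c_prime}
    k = 1
    # A left-hand side must be a non-empty proper subset of I: stop at size |I| - 1.
    while level and k < len(items):
        for lhs in level:
            rules.append((lhs, items - lhs, support[items] / support[lhs]))
        # Join size-k survivors into size-(k+1) candidates; keep a candidate only if
        # all of its size-k subsets survived (anti-monotonicity) and it passes c'.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        level = {
            cand
            for cand in candidates
            if len(cand) < len(items)
            and all(frozenset(sub) in level for sub in combinations(cand, k))
            and support[cand] > c_prime
        }
        k += 1
    # Deterministic order: by left-hand-side size, then alphabetically
    # (assumes the items themselves are mutually comparable, e.g. strings).
    rules.sort(key=lambda rule: (len(rule[0]), sorted(rule[0])))
    return rules


# Made-up support counts for every non-empty subset of {a, b, c, d}
# (hypothetical data, chosen only so that supports shrink as itemsets grow).
EXAMPLE_SUPPORT = {
    frozenset(k): v
    for k, v in {
        "a": 10, "b": 8, "c": 5, "d": 4,
        "ab": 6, "ac": 5, "ad": 3, "bc": 4, "bd": 3, "cd": 3,
        "abc": 4, "abd": 2, "acd": 2, "bcd": 2,
        "abcd": 2,
    }.items()
}

if __name__ == "__main__":
    for lhs, rhs, conf in low_confidence_rules("abcd", EXAMPLE_SUPPORT, c=0.5):
        print("".join(sorted(lhs)), "->", "".join(sorted(rhs)), f"confidence={conf:.2f}")
```

With the made-up counts, $c' = 2 / 0.5 = 4$, so the qualifying left-hand sides are $a$, $b$, $c$, $ab$ and $ac$. The itemset $bc$ (support 4) fails the strict inequality, which in turn prunes $abc$ without its count ever being checked.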