02-assoc

Output the rules above the confidence threshold


1/5/2011 Jure Leskovec, Stanford C246: Mining Massive Datasets

Typically, data is kept in flat files rather than in a database system: stored on disk, and stored basket-by-basket. Expand baskets into pairs, triples, etc. as you read baskets. Use k nested loops to generate all sets of size k.

[Figure: a file laid out as a sequence of items, grouped into Basket 1, Basket 2, Basket 3, etc.] Example: items are positive integers, and boundaries between baskets are -1.

The true cost of mining disk-resident data is usually the number of disk I/O's. In practice, association-rule algorithms read the data in passes: all baskets are read in turn. We measure the cost by the number of passes an algorithm makes over the data.

For many frequent-itemset algorithms, main memory is the critical resource. As we read baskets, we need to count something, e.g., occurrences of pairs of items. The number of different things we can count is limited by main memory. Swappin...
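The file layout and the nested-loop expansion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's code: the names `read_baskets` and `count_pairs` are hypothetical, the input is assumed to be an already-parsed stream of integers, and empty baskets (two consecutive -1 markers) are assumed to be skipped. It makes a single pass over the data and keeps all pair counts in main memory, which is the resource the slides identify as the bottleneck.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def read_baskets(tokens: Iterable[int]) -> Iterator[List[int]]:
    """Yield baskets one at a time from a flat stream of integers.

    Items are positive integers; -1 marks the boundary between baskets.
    Assumption: empty baskets are skipped.
    """
    basket: List[int] = []
    for tok in tokens:
        if tok == -1:
            if basket:
                yield basket
            basket = []
        else:
            basket.append(tok)
    if basket:  # last basket may lack a trailing -1
        yield basket


def count_pairs(tokens: Iterable[int]) -> Counter:
    """One pass over the data: expand each basket into pairs and count them."""
    counts: Counter = Counter()
    for basket in read_baskets(tokens):
        items = sorted(set(basket))  # drop duplicates, fix an order
        # Two nested loops generate every set of size 2 (k = 2).
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                counts[(items[i], items[j])] += 1
    return counts


# Hypothetical sample stream holding three baskets.
stream = [1, 2, 3, -1, 2, 3, -1, 1, 3, 4, -1]
pair_counts = count_pairs(stream)
```

Because the counter holds one entry per distinct pair seen, its size grows roughly with the square of the number of distinct items, which is why the number of things that can be counted is limited by main memory.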

This document was uploaded on 02/26/2014 for the course CS 246 at Stanford.
