notes29

> (mostfreq (append (file->linelist file1)
                    (file->linelist file2)
                    (file->linelist file3)))
((me . 8))

(Second place goes to "you" and "I", with 7 appearances each; had they tied for first, the result would have been a two-element a-list.)

If we had a truly enormous word list, we'd put it into a distributed file and use another mapreduce to find the most frequent words of subsets of the list, and then find the most frequent word of those most frequent words.

Searching for a pattern. Another task is to search through files for lines matching a pattern. A pattern is a sentence in which the word * matches any sequence of zero or more words:

> (match? '(* i * her *) '(i saw her standing there))
#t
> (match? '(* i * her *) '(and i love her))
#t
> (match? '(* i * her *) '(ps i love you))

#f

Here's how we look for lines in files that match a pattern:

(define (grep pattern files)        ; files: a list of kv-pairs whose values are lines
  (groupreduce cons '()             ; collect each bucket's lines into a list
               (sort-into-buckets   ; group the surviving pairs by key
                (flatmap (lambda (kv-pair)
                           ;; keep the pair only if its line matches
                           (if (match? pattern (kv-value kv-pair))
                               (list kv-pair)
                               '()))
                         files))))

> (grep '(* i * her *) (append (file->linelist file1)
                               (file->linelist file2)
                               (file->linelist file3)))
(((a hard days night) (and i love her))
 ((please please me) (i saw her standing there)))

Summary. The general pattern here is

(groupreduce reducer base-case (sort-into-buckets (map-or-flatmap mapper data)))

This corresponds to (mapreduce mapper reducer base-case data) in the truly parallel mapreduce exploration we'll be doing later.
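The definitions of match? and of the helpers (flatmap, sort-into-buckets, groupreduce, kv-value) are not shown in this excerpt. The following is a minimal single-machine sketch, not the course's own code, assuming a kv-pair is a plain (key . value) pair. The helpers here are hypothetical stand-ins: make-kv-pair, kv-key, accumulate, add-to-buckets and the sample data lines are introduced for illustration, this sort-into-buckets keeps keys in order of first appearance instead of sorting them, and this mapreduce always uses flatmap, so the mapper must return a list of kv-pairs.

```scheme
;; Sketch only: sequential stand-ins for the course's helpers (assumptions).

;; A kv-pair is represented here as a plain (key . value) pair.
(define (make-kv-pair key value) (cons key value))
(define (kv-key pair) (car pair))
(define (kv-value pair) (cdr pair))

;; Right fold: (accumulate + 0 '(1 2 3)) computes (+ 1 (+ 2 (+ 3 0))).
(define (accumulate combiner base lst)
  (if (null? lst)
      base
      (combiner (car lst) (accumulate combiner base (cdr lst)))))

;; Apply f (which returns a list) to each element and append the results.
(define (flatmap f lst)
  (accumulate append '() (map f lst)))

;; Does pattern match sent?  The word * matches zero or more words.
(define (match? pattern sent)
  (cond ((null? pattern) (null? sent))
        ((eq? (car pattern) '*)
         ;; Either * matches nothing, or it absorbs one more word and stays.
         (or (match? (cdr pattern) sent)
             (and (pair? sent) (match? pattern (cdr sent)))))
        ((null? sent) #f)
        ((equal? (car pattern) (car sent))
         (match? (cdr pattern) (cdr sent)))
        (else #f)))

;; Add pair to the bucket with the same key, or start a new bucket at the end.
(define (add-to-buckets pair buckets)
  (cond ((null? buckets) (list (list pair)))
        ((equal? (kv-key pair) (kv-key (car (car buckets))))
         (cons (append (car buckets) (list pair)) (cdr buckets)))
        (else (cons (car buckets) (add-to-buckets pair (cdr buckets))))))

;; Group kv-pairs into buckets by key, keeping keys in order of first
;; appearance (this stand-in does not sort the keys).
(define (sort-into-buckets pairs)
  (let loop ((pairs pairs) (buckets '()))
    (if (null? pairs)
        buckets
        (loop (cdr pairs) (add-to-buckets (car pairs) buckets)))))

;; Reduce each bucket's values with reducer and base-case;
;; pair each bucket's key with the result.
(define (groupreduce reducer base-case buckets)
  (map (lambda (bucket)
         (cons (kv-key (car bucket))
               (accumulate reducer base-case (map kv-value bucket))))
       buckets))

;; Hypothetical sequential version of the summary's general pattern.
(define (mapreduce mapper reducer base-case data)
  (groupreduce reducer base-case (sort-into-buckets (flatmap mapper data))))

;; grep expressed through the sequential mapreduce above.
(define (grep pattern pairs)
  (mapreduce (lambda (kv-pair)
               (if (match? pattern (kv-value kv-pair))
                   (list kv-pair)
                   '()))
             cons '() pairs))

;; Hypothetical input: each line keyed by its album title.
(define lines
  (list (make-kv-pair '(please please me) '(i saw her standing there))
        (make-kv-pair '(please please me) '(ps i love you))
        (make-kv-pair '(a hard days night) '(and i love her))))

(display (grep '(* i * her *) lines))
(newline)
```

With this sample input, the call prints (((please please me) (i saw her standing there)) ((a hard days night) (and i love her))): the same groups as the session above, but in first-appearance order rather than sorted. Passing a mapper that emits a (word . 1) pair per word, with + as the reducer and 0 as the base case, instead yields per-word counts.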

This note was uploaded on 02/17/2010 for the course COMPUTER S 26275 taught by Professor B. Harvey during the Spring '10 term at Berkeley.
