Word frequency counting. A common problem is to look for commonly used words in a document. For starters, we’ll count word frequencies in a single sentence. The first step is to turn the sentence into key-value pairs in which the key is the word and the value is always 1:

    > (map (lambda (wd) (make-kv-pair wd 1)) '(cry baby cry))
    ((cry . 1) (baby . 1) (cry . 1))

If we group these by key and add the values, we’ll get the number of times each word appears.

    (define (wordcounts1 sent)
      (groupreduce + 0 (sort-into-buckets
                        (map (lambda (wd) (make-kv-pair wd 1)) sent))))

    > (wordcounts1 '(cry baby cry))
    ((baby . 1) (cry . 2))

Now to try the same task with (simulated) files. When we use the real mapreduce, it’ll give us file data in the form of a key-value pair whose key is the name of the file and whose value is a line from the file, in the form of a sentence. For now, we’re going to simulate a file as a list whose car is the “filename” and whose cdr is a list of sentences, representing the lines of the file. In other words, a file is a list whose first element ...
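The single-sentence example relies on make-kv-pair, sort-into-buckets, and groupreduce, none of which is defined in this excerpt. To make the word count runnable on its own, here is a minimal sketch of what those helpers might look like. Everything in it is an assumption rather than the course's actual code: key-value pairs are taken to be plain dotted pairs, keys are taken to be symbols, and the helper names kv-key, kv-value, insert-pair, and fold-values are hypothetical.

```scheme
;; Hypothetical stand-ins for the helpers used in the notes; the
;; course's real definitions are not shown in this excerpt and may differ.
(define make-kv-pair cons)   ; assumption: a kv-pair is a dotted pair
(define kv-key car)
(define kv-value cdr)

;; Insert one pair into a list of buckets kept in alphabetical order
;; by key. A bucket is a non-empty list of pairs that share one key.
;; Assumption: keys are symbols.
(define (insert-pair pair buckets)
  (if (null? buckets)
      (list (list pair))
      (let ((key (symbol->string (kv-key pair)))
            (bucket-key (symbol->string (kv-key (car (car buckets))))))
        (cond ((string=? key bucket-key)
               (cons (cons pair (car buckets)) (cdr buckets)))
              ((string<? key bucket-key)
               (cons (list pair) buckets))
              (else
               (cons (car buckets)
                     (insert-pair pair (cdr buckets))))))))

;; Group a flat list of pairs into buckets, one bucket per key.
(define (sort-into-buckets pairs)
  (if (null? pairs)
      '()
      (insert-pair (car pairs) (sort-into-buckets (cdr pairs)))))

;; Combine the values in one bucket, starting from base.
(define (fold-values combiner base bucket)
  (if (null? bucket)
      base
      (combiner (kv-value (car bucket))
                (fold-values combiner base (cdr bucket)))))

;; Reduce each bucket to a single pair: its key and the combined values.
(define (groupreduce combiner base buckets)
  (map (lambda (bucket)
         (make-kv-pair (kv-key (car bucket))
                       (fold-values combiner base bucket)))
       buckets))

;; The definition from the notes, repeated so the sketch is self-contained.
(define (wordcounts1 sent)
  (groupreduce + 0
               (sort-into-buckets
                (map (lambda (wd) (make-kv-pair wd 1)) sent))))
```

With these definitions, (sort-into-buckets '((cry . 1) (baby . 1) (cry . 1))) gives (((baby . 1)) ((cry . 1) (cry . 1))), and groupreduce then adds up each bucket, which matches the ((baby . 1) (cry . 2)) result shown above. Keeping the buckets sorted by key is only one way to do the grouping; it is used here because it reproduces the alphabetical order of that output.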
This note was uploaded on 02/17/2010 for the course COMPUTER S 26275 taught by Professor Harvey,b during the Spring '10 term at Berkeley.