Word frequency counting

A common problem is to look for commonly used words in a document. For starters, we'll count word frequencies in a single sentence. The first step is to turn the sentence into key-value pairs in which the key is the word and the value is always 1:

    > (map (lambda (wd) (make-kv-pair wd 1)) '(cry baby cry))
    ((cry . 1) (baby . 1) (cry . 1))

If we group these pairs by key and add up the values in each group, we'll get the number of times each word appears:

    (define (wordcounts1 sent)
      (groupreduce + 0
                   (sort-into-buckets
                    (map (lambda (wd) (make-kv-pair wd 1))
                         sent))))

    > (wordcounts1 '(cry baby cry))
    ((baby . 1) (cry . 2))

Now let's try the same task with (simulated) files. When we use the real mapreduce, it'll give us file data in the form of a key-value pair whose key is the name of the file and whose value is a line from the file, in the form of a sentence. For now, we're going to simulate a file as a list whose car is the filename and whose cdr is a list of sentences, representing the lines of the file. In other words, a file is a list whose first element … cdr is a list of sentences, representing the lines of the file …
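The course supplies make-kv-pair, sort-into-buckets and groupreduce, but their definitions aren't shown here. What follows is a minimal, self-contained sketch in standard (R7RS) Scheme, not the lab's own code. The helper definitions are hypothetical stand-ins whose behavior is inferred from the examples above: a key-value pair is assumed to be a cons cell, sort-into-buckets is assumed to group pairs with the same key into buckets ordered alphabetically by key, and groupreduce is assumed to combine each bucket's values with the reducer, starting from the base value. The procedure file-wordcounts and the file "lullaby.txt" are also hypothetical. They count the words on every line of a simulated file by appending its sentences and reusing wordcounts1.

```scheme
;; Hypothetical stand-ins for the course-supplied helpers (assumed behavior).
(define (make-kv-pair key value) (cons key value))
(define (kv-key pair) (car pair))
(define (kv-value pair) (cdr pair))

;; Order keys (symbols) alphabetically.
(define (key<? a b)
  (string<? (symbol->string a) (symbol->string b)))

;; Insert one pair into a key-ordered list of buckets, joining an
;; existing bucket when the key matches.
(define (insert-pair pair buckets)
  (cond ((null? buckets) (list (list pair)))
        ((eq? (kv-key pair) (kv-key (caar buckets)))
         (cons (append (car buckets) (list pair)) (cdr buckets)))
        ((key<? (kv-key pair) (kv-key (caar buckets)))
         (cons (list pair) buckets))
        (else (cons (car buckets) (insert-pair pair (cdr buckets))))))

;; Assumed sort-into-buckets: one bucket per distinct key, sorted by key.
(define (sort-into-buckets pairs)
  (let loop ((pairs pairs) (buckets '()))
    (if (null? pairs)
        buckets
        (loop (cdr pairs) (insert-pair (car pairs) buckets)))))

;; Combine a list of values with reducer, starting from base.
(define (accumulate-values reducer base values)
  (if (null? values)
      base
      (reducer (car values) (accumulate-values reducer base (cdr values)))))

;; Assumed groupreduce: one (key . combined-value) pair per bucket.
(define (groupreduce reducer base buckets)
  (map (lambda (bucket)
         (make-kv-pair (kv-key (car bucket))
                       (accumulate-values reducer base (map kv-value bucket))))
       buckets))

;; wordcounts1 as defined in the text.
(define (wordcounts1 sent)
  (groupreduce + 0
               (sort-into-buckets
                (map (lambda (wd) (make-kv-pair wd 1))
                     sent))))

;; Hypothetical: a simulated file is (filename sentence1 sentence2 ...).
;; Flatten all lines into one sentence, then count as before.
(define (file-wordcounts file)
  (wordcounts1 (apply append (cdr file))))
```

With these definitions, (wordcounts1 '(cry baby cry)) returns ((baby . 1) (cry . 2)), matching the transcript above, and (file-wordcounts '("lullaby.txt" (cry baby cry) (baby come back))) returns ((baby . 2) (back . 1) (come . 1) (cry . 2)). Flattening the lines first is simply the easiest choice for a simulated file. The real mapreduce instead supplies one line per key-value pair, as described above.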
- Spring '10