notes27 - Word frequency counting


Word frequency counting. A common problem is to look for commonly used words in a document. For starters, we'll count word frequencies in a single sentence. The first step is to turn the sentence into key-value pairs in which the key is the word and the value is always 1:

    > (map (lambda (wd) (make-kv-pair wd 1)) '(cry baby cry))
    ((cry . 1) (baby . 1) (cry . 1))

If we group these by key and add the values, we'll get the number of times each word appears.

    (define (wordcounts1 sent)
      (groupreduce + 0 (sort-into-buckets
                        (map (lambda (wd) (make-kv-pair wd 1))
                             sent))))

    > (wordcounts1 '(cry baby cry))
    ((baby . 1) (cry . 2))

Now to try the same task with (simulated) files. When we use the real mapreduce, it'll give us file data in the form of a key-value pair whose key is the name of the file and whose value is a line from the file, in the form of a sentence. For now, we're going to simulate a file as a list whose car is the filename and whose cdr is a list of sentences, representing the lines of the file. In other words, a file is a list whose first element...
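To try wordcounts1 without the course library, here is a minimal sketch of the helpers it relies on. These definitions are assumptions for illustration, not the library's actual code: a key-value pair is taken to be an ordinary Scheme pair, keys are assumed to be symbols, and sort-into-buckets is approximated by a simple insertion that keeps buckets in alphabetical order by key (which happens to reproduce the order shown above). The names key<?, add-to-buckets, fold-values, and sample-file are hypothetical.

```scheme
;; Assumption: a key-value pair is just a Scheme pair.
(define make-kv-pair cons)
(define kv-key car)
(define kv-value cdr)

;; Hypothetical helper: alphabetical order on symbol keys.
(define (key<? a b)
  (string<? (symbol->string a) (symbol->string b)))

;; Hypothetical helper: put one pair into a list of buckets that is
;; already sorted by key. Each bucket is a list of pairs sharing a key.
(define (add-to-buckets pair buckets)
  (cond ((null? buckets) (list (list pair)))
        ((equal? (kv-key pair) (kv-key (caar buckets)))
         (cons (append (car buckets) (list pair)) (cdr buckets)))
        ((key<? (kv-key pair) (kv-key (caar buckets)))
         (cons (list pair) buckets))
        (else (cons (car buckets)
                    (add-to-buckets pair (cdr buckets))))))

;; Stand-in for sort-into-buckets: group pairs by key.
(define (sort-into-buckets pairs)
  (let loop ((rest pairs) (buckets '()))
    (if (null? rest)
        buckets
        (loop (cdr rest) (add-to-buckets (car rest) buckets)))))

;; Hypothetical helper: combine a list of values, starting from base.
(define (fold-values combiner base values)
  (if (null? values)
      base
      (combiner (car values)
                (fold-values combiner base (cdr values)))))

;; Stand-in for groupreduce: one (key . combined-value) pair per bucket.
(define (groupreduce reducer base-case buckets)
  (map (lambda (bucket)
         (make-kv-pair (kv-key (car bucket))
                       (fold-values reducer base-case
                                    (map kv-value bucket))))
       buckets))

(define (wordcounts1 sent)
  (groupreduce + 0 (sort-into-buckets
                    (map (lambda (wd) (make-kv-pair wd 1)) sent))))

;; A simulated file as described above: car is the filename,
;; cdr is a list of sentences. The contents are made up.
(define sample-file '(lullaby (cry baby cry) (hush baby)))

(wordcounts1 '(cry baby cry))                  ; ((baby . 1) (cry . 2))
(wordcounts1 (apply append (cdr sample-file))) ; ((baby . 2) (cry . 2) (hush . 1))
```

The last expression simply flattens the file's lines into one sentence before counting. That is only one possible way to reuse wordcounts1 on a simulated file and may not be the approach the notes go on to take.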