notes27 - Word frequency counting


Word frequency counting. A common problem is to look for commonly used words in a document. For starters, we’ll count word frequencies in a single sentence. The first step is to turn the sentence into key-value pairs in which the key is the word and the value is always 1:

    > (map (lambda (wd) (make-kv-pair wd 1)) '(cry baby cry))
    ((cry . 1) (baby . 1) (cry . 1))

If we group these by key and add the values, we’ll get the number of times each word appears.

    (define (wordcounts1 sent)
      (groupreduce + 0 (sort-into-buckets
                        (map (lambda (wd) (make-kv-pair wd 1)) sent))))

    > (wordcounts1 '(cry baby cry))
    ((baby . 1) (cry . 2))

Now to try the same task with (simulated) files. When we use the real mapreduce, it’ll give us file data in the form of a key-value pair whose key is the name of the file and whose value is a line from the file, in the form of a sentence. For now, we’re going to simulate a file as a list whose car is the “filename” and whose cdr is a list of sentences, representing the lines of the file. In other words, a file is a list whose first element...
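The single-sentence example uses make-kv-pair, sort-into-buckets, and groupreduce without showing their definitions. The following is a minimal sketch of how they might be written so that wordcounts1 can be tried out; it is not the course's library code. It assumes that a key-value pair is an ordinary Scheme pair, that keys are symbols, and that buckets come back in alphabetical order by key. The names kv-key, kv-value, insert-into-buckets, and accumulate are hypothetical helpers introduced only for this illustration.

```scheme
;; Assumption: a key-value pair is an ordinary pair (key . value).
(define make-kv-pair cons)
(define kv-key car)      ; hypothetical accessor
(define kv-value cdr)    ; hypothetical accessor

;; Hypothetical helper: insert one pair into a list of buckets that is
;; kept in alphabetical order by key. A bucket is a non-empty list of
;; pairs that all share the same key. Keys are assumed to be symbols.
(define (insert-into-buckets pair buckets)
  (let ((key (symbol->string (kv-key pair))))
    (cond ((null? buckets) (list (list pair)))
          ((string=? key (symbol->string (kv-key (car (car buckets)))))
           (cons (cons pair (car buckets)) (cdr buckets)))
          ((string<? key (symbol->string (kv-key (car (car buckets)))))
           (cons (list pair) buckets))
          (else (cons (car buckets)
                      (insert-into-buckets pair (cdr buckets)))))))

;; Assumed behavior of sort-into-buckets: group the pairs by key.
(define (sort-into-buckets pairs)
  (if (null? pairs)
      '()
      (insert-into-buckets (car pairs)
                           (sort-into-buckets (cdr pairs)))))

;; Hypothetical helper: right fold of combiner over values.
(define (accumulate combiner base values)
  (if (null? values)
      base
      (combiner (car values)
                (accumulate combiner base (cdr values)))))

;; Assumed behavior of groupreduce: turn each bucket into one pair
;; whose value combines all the values in that bucket.
(define (groupreduce reducer base-case buckets)
  (map (lambda (bucket)
         (make-kv-pair (kv-key (car bucket))
                       (accumulate reducer base-case
                                   (map kv-value bucket))))
       buckets))

;; wordcounts1 as given in the notes.
(define (wordcounts1 sent)
  (groupreduce + 0 (sort-into-buckets
                    (map (lambda (wd) (make-kv-pair wd 1)) sent))))
```

Under these assumptions, sort-into-buckets turns ((cry . 1) (baby . 1) (cry . 1)) into the buckets (((baby . 1)) ((cry . 1) (cry . 1))), and (wordcounts1 '(cry baby cry)) returns ((baby . 1) (cry . 2)), matching the interaction in the notes.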

This note was uploaded on 02/17/2010 for the course COMPUTER S 26275 taught by Professor Harvey,b during the Spring '10 term at Berkeley.
