Our plan is to compare each word of the text against the words in the dictionary. But we don’t have to read every word of the dictionary for each word of the file; since the dictionary is sorted alphabetically, if we sort the words in our text file, we can just make one pass through both files in parallel.

    sort < lowcase > sorted

The sort program can take arguments to do sorting in many different ways; for example, you can ask it to sort the lines of a file based on the third word of each line. But in this case, we want the simplest possible sort: The “sort key” is the entire line, and we’re sorting in character-code order (which is the same as alphabetical order since we eliminated capital letters).

Common words like “the” will occur many times in our text. There’s no need to spell-check the same word repeatedly. Since we’ve sorted the file, all instances of the same word are next to each other in the file, so we can ask Unix to eliminate consecutive equal lines:

    uniq < sorted > nodup...
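
The sort and uniq steps described above can be tried on a small sample. The following is only a sketch under stated assumptions: the sample words and the tiny stand-in dictionary file dict are hypothetical, and the use of comm for the parallel pass is one possible approach that this excerpt does not itself confirm. Setting LC_ALL=C is an added precaution so that all three programs agree on character-code order.

```shell
# Work in a scratch directory so the sample files do not clutter anything.
cd "$(mktemp -d)"

# Force character-code ordering for sort, uniq, and comm.
export LC_ALL=C

# Hypothetical sample input: one lowercase word per line, as in lowcase.
printf '%s\n' the cat sat on the mat with the catt > lowcase

# Hypothetical stand-in dictionary, already in sorted order.
printf '%s\n' cat mat on sat the with > dict

# The simplest possible sort: the whole line is the sort key.
sort < lowcase > sorted

# Sorting put equal words next to each other, so uniq can drop the repeats.
uniq < sorted > nodup

# One pass through both sorted files in parallel: comm -23 prints only
# the lines found in the first file and not in the second.
comm -23 nodup dict > unknown
cat unknown
```

With this sample, the three copies of “the” collapse into a single line in nodup, and the only word printed is “catt”, the one word that does not appear in the stand-in dictionary.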
- Spring '10