CS106X Handout 29
Autumn 2010, November 8th, 2010

Data Compression and Huffman Encoding
Handout written by Julie Zelenski.

In the early 1980s, personal computers had hard disks that were no larger than 10MB; today, the puniest of disks are still measured in gigabytes. Even though hard drives are getting bigger, the files we want to store (images, videos, MP3s, and so on) seem to keep pace with that growth, which makes even today's gargantuan disks seem too small to hold everything. One technique to use our storage more optimally is to compress the files. By taking advantage of redundancy or patterns, we may be able to "abbreviate" the contents in such a way as to take up less space yet maintain the ability to reconstruct a full version of the original when needed. Such compression could be useful when trying to cram more things onto a disk or to shorten the time needed to copy or send a file over a network.

There are compression algorithms that you may already have heard of. Some compression formats, such as GIF, MPEG, or MP3, are specifically designed to handle a particular type of data file. They tend to take advantage of known features of that type of data (such as the propensity for pixels in an image to be the same or similar colors as their neighbors) to compress it. Other tools such as compress, zip, or pack and programs like StuffIt or ZipIt can be used to compress any sort of file. These algorithms have no a priori expectations and usually rely on studying the contents of the particular data file to find the redundancy and patterns that allow for compression.

Some compression algorithms (e.g. JPEG, MPEG) are lossy: decompressing the compressed result doesn't recreate a perfect copy of the original. Such an algorithm compresses by "summarizing" the data. The summary retains the general structure while discarding the more minute details. For sound, video, and images, this imprecision may be acceptable because the bulk of the data is maintained and a few missed pixels or milliseconds of video delay are no big deal. For text data, though, a lossy algorithm usually isn't appropriate.

An example of a lossy algorithm for compressing text would be to remove all the vowels. Compressing the previous sentence by this scheme results in:

    n xmpl f lssy lgrthm fr cmprssng txt wld b t rmv ll th vwls.

This shrinks the original 87 characters down to just 61 and requires only 70% of the original space. To decompress, we could try matching the consonant patterns to English words with the vowels reinserted, but we cannot reliably reconstruct the original in this manner. Is the compressed word "fr" an abbreviation for "four" or "fir" or "far"? An intelligent reader can usually figure it out from context, but, alas, a brainless computer can't be sure and would not be able to faithfully reproduce the original. For files containing text, we usually want a lossless scheme so that there is no ambiguity when re-creating the original meaning and intent.
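Though the handout doesn't show code for this scheme, a minimal sketch of the vowel-stripping "compressor" described above might look like the following (standard C++; the program and its names, such as stripVowels, are illustrations added here, not part of the original handout). Run on the example sentence, it reproduces the 87-character to 61-character (roughly 70%) reduction, and it makes the lossiness concrete: nothing in the output records which vowels were removed or where they appeared.

    // Illustration only (not from the handout): the "strip the vowels" lossy
    // scheme, written as a small standalone program using the standard library.
    #include <iostream>
    #include <string>
    #include <cctype>
    using namespace std;

    // Returns a copy of text with every vowel character removed. The
    // transformation is lossy: the result contains no record of which
    // vowels were deleted or where they appeared.
    string stripVowels(const string& text) {
        string result;
        for (size_t i = 0; i < text.length(); i++) {
            char lower = tolower(static_cast<unsigned char>(text[i]));
            if (lower != 'a' && lower != 'e' && lower != 'i' &&
                lower != 'o' && lower != 'u') {
                result += text[i];
            }
        }
        return result;
    }

    int main() {
        string original = "An example of a lossy algorithm for compressing "
                          "text would be to remove all the vowels.";
        string compressed = stripVowels(original);
        cout << compressed << endl;
        cout << original.length() << " chars -> " << compressed.length()
             << " chars (" << (100 * compressed.length() / original.length())
             << "% of the original)" << endl;   // prints 87 -> 61 (70%)
        return 0;
    }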
An Overview