How File Compression Works
Harris, Tom. "How File Compression Works." 18 January 2001. HowStuffWorks.com.
<http://computer.howstuffworks.com/file-compression.htm> 21 October 2008.
If you download many programs and files off the Internet, you've probably encountered
ZIP files before. This compression system is a very handy invention, especially for Web users,
because it lets you reduce the overall number of bits and bytes in a file so it can be transmitted
faster over slower Internet connections, or take up less space on a disk. Once you download the
file, your computer uses a decompression program to expand the file back to its
original size. If everything works correctly, the expanded file is identical to the original file
before it was compressed.
At first glance, this seems very mysterious. How can you reduce the number of bits and
bytes and then add those exact bits and bytes back later? As it turns out, the basic idea behind the
process is fairly straightforward. In this article, we'll examine this simple method as we take a
very small file through the basic process of compression.
Most types of computer files are fairly redundant -- they have the same information listed
over and over again. File-compression programs simply get rid of the redundancy. Instead of
listing a piece of information over and over again, a file-compression program lists that
information once and then refers back to it whenever it appears in the original file.
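To see how much redundancy matters, here is a minimal sketch using Python's standard zlib module. It stands in for whatever compressor a given program uses, and the sample data is made up for illustration: one phrase repeated 100 times, compared with the same number of random bytes.

```python
import os
import zlib

# Hypothetical sample data: one phrase repeated many times (highly redundant)
redundant = b"ask what you can do for your country " * 100
# The same number of random bytes (almost no redundancy to remove)
random_bytes = os.urandom(len(redundant))

packed_redundant = zlib.compress(redundant)
packed_random = zlib.compress(random_bytes)

# Print original size -> compressed size for each input
print(len(redundant), "->", len(packed_redundant))
print(len(random_bytes), "->", len(packed_random))

# Lossless: expanding gives back exactly the original bytes
restored = zlib.decompress(packed_redundant)
print(restored == redundant)
```

The repeated phrase should shrink to a small fraction of its size, while the random bytes barely shrink at all, because there is nothing repeated to refer back to.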
As an example, let's look at a type of information we're all familiar with: words. In John
F. Kennedy's 1961 inaugural address, he delivered this famous line:
"Ask not what your country can do for you -- ask what you can do for your country."
The quote has 17 words, made up of 61 letters, 16 spaces, one dash and one period. If
each letter, space or punctuation mark takes up one unit, we get a total file size of 79
units. To get the file size down, we need to look for redundancies.
Immediately, we notice that:
"ask" appears two times
"what" appears two times
"your" appears two times
"country" appears two times
"can" appears two times
"do" appears two times
"for" appears two times
"you" appears two times
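The tally above can be checked with a short sketch. This assumes Python's standard re and collections modules and treats a word as any run of letters; the variable names are ours, chosen for illustration.

```python
import re
from collections import Counter

quote = ("Ask not what your country can do for you -- "
         "ask what you can do for your country.")

# Lower-case the text so "Ask" and "ask" count as the same word
words = re.findall(r"[a-z]+", quote.lower())
counts = Counter(words)

# Print each distinct word with the number of times it appears
for word, n in counts.items():
    print(word, n)

# Words that appear more than once, in order of first appearance
repeated = [w for w, n in counts.items() if n > 1]
print(repeated)
```

Every word except "not" turns up in the list of repeated words.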
Ignoring the difference between capital and lower-case letters, roughly half of the phrase
is redundant. Nine words -- ask, not, what, your, country, can, do, for, you -- give us almost
everything we need for the entire quote. To construct the second half of the phrase, we just point
to the words in the first half and fill in the spaces and punctuation.
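The idea of pointing back to earlier words can be sketched in a few lines of Python. This is only an illustration of the principle, not the format any real compression program writes. The names dictionary, encoded and decode are hypothetical, and lower-casing the quote is a simplification that throws away the capital "A", which a truly lossless scheme would have to keep.

```python
import re

quote = ("Ask not what your country can do for you -- "
         "ask what you can do for your country.")

# Split into words and the separators between them (spaces, punctuation)
tokens = re.findall(r"[a-z]+|[^a-z]+", quote.lower())

dictionary = []  # each distinct word, stored once
encoded = []     # ints point into the dictionary; strings are literal separators
for tok in tokens:
    if tok.isalpha():
        if tok not in dictionary:
            dictionary.append(tok)
        encoded.append(dictionary.index(tok))
    else:
        encoded.append(tok)

def decode(encoded, dictionary):
    """Rebuild the text by looking up each pointer and copying each separator."""
    return "".join(dictionary[t] if isinstance(t, int) else t for t in encoded)

print(dictionary)
print(encoded)
print(decode(encoded, dictionary))
```

The dictionary ends up holding the nine distinct words, and the second half of the quote is stored purely as pointers into it plus the spaces and punctuation.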