Module 5, Lecture 1
Data Compression: Introduction
G.L. Heileman

Introduction

In this module we study the science (art) of representing information (i.e., data) in a compact form.

Key idea: Compact representations are created by identifying and exploiting structure that exists in the data.

Ex: Morse code
– Uses an alphabet of four symbols, dot (·), dash (–), letter space, and word space, to encode the English alphabet.
– Shorter codewords are assigned to more frequently occurring letters, and longer ones to less frequently occurring letters. E.g.,
    e (·)    a (· –)    q (– – · –)    j (· – – –)
– What kind of structure is being exploited? Statistical.

Setup

Overview:

  original data x^n ∈ X^n  -->  encoder  -->  compressed data C(x^n)
      -->  transmission (error free)  -->  decoder  -->  reconstructed data x̂^n ∈ X^n

We assume:
– The information source outputs a string x^n ∈ X^n.
– The encoder performs a mapping from source data x^n to codewords C(x^n).

Two general types of codes:
– fixed length (e.g., ASCII, Unicode) – all codewords have the same length. A q-character alphabet requires ⌈log q⌉ bits/symbol.
– variable length – the number of bits varies from codeword to codeword (hopefully |C(x^n)| is much smaller than |x^n|).

The communication channel could be a telephone line, radio waves through the atmosphere, or a disk drive (communication does not necessarily involve moving data from one place to another). In every case, we are assuming error-free communication.

The decoder must "know" the encoding algorithm (or the encoder needs to send a codebook with the encoded data).

If x̂^n = x^n, this is called lossless compression (remember, we're assuming error-free communication).
If x̂^n ≠ x^n, this is called lossy compression, e.g., JPEG.
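The fixed- versus variable-length distinction above can be sketched numerically. This is a minimal illustration, not from the lecture: the probabilities and the prefix code below are hypothetical (and the code is not actual Morse), chosen only to show that assigning short codewords to frequent symbols beats the fixed ⌈log₂ q⌉ bits/symbol cost.

```python
import math

# Hypothetical occurrence probabilities for a 4-symbol alphabet,
# mimicking Morse code's idea: frequent symbols get short codewords.
probs = {"e": 0.50, "a": 0.25, "q": 0.15, "j": 0.10}

# Fixed-length code: every symbol costs ceil(log2(q)) bits.
q = len(probs)
fixed_bits = math.ceil(math.log2(q))  # 2 bits/symbol for q = 4

# An illustrative variable-length prefix code (assumed, not Morse):
code = {"e": "0", "a": "10", "q": "110", "j": "111"}

# Expected codeword length = sum over symbols of p(s) * len(code(s)).
expected_bits = sum(p * len(code[s]) for s, p in probs.items())

print(fixed_bits)     # 2
print(expected_bits)  # 0.5*1 + 0.25*2 + 0.15*3 + 0.10*3 = 1.75
```

Under this (assumed) distribution the variable-length code averages 1.75 bits/symbol versus 2 bits/symbol for the fixed-length code, exploiting exactly the statistical structure the Morse example describes.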
Finally, given some compression algorithm, we generally want to be able to say something about its performance. Factors include:

– Amount of compression.
  Lossless:
    – compression ratio – the ratio of the number of bits used before and after compression.
    – expected codeword length – the average number of bits/symbol.
  Lossy: same as above, except that we also need to quantify the difference between x^n and x̂^n (rate distortion theory).
– Efficiency of the compression algorithm.
– Complexity of the compression algorithm.

Models

Other types of structure can be exploited for compression.

Ex 1: Numerical Data – Consider the data {x_1, x_2, ...} given by
    {9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21}.

[Figure: plot of the sample values (vertical axis, roughly 8–22) versus sample index (1–12), showing an increasing trend.]

Since these numbers are in the range [0, 31], if we encoded them directly, we would need 5 bits/sample.
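One way to see how the trend in this data can be exploited (a sketch only; the lecture's own model is not shown in this excerpt) is to transmit differences between consecutive samples instead of the raw values, since the differences fall in a much smaller range than [0, 31]:

```python
import math

# The numerical data from the example above.
data = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]

direct_bits = 5 * len(data)  # 5 bits/sample, as in the notes -> 60 bits

# Assumed scheme: send the first sample directly, then one
# difference per remaining sample.
diffs = [b - a for a, b in zip(data, data[1:])]
lo, hi = min(diffs), max(diffs)                 # -1 .. 3 for this data
diff_bits = math.ceil(math.log2(hi - lo + 1))   # 3 bits per difference

total_bits = 5 + diff_bits * len(diffs)         # 5 + 3*11 = 38 bits

compression_ratio = direct_bits / total_bits
print(total_bits, round(compression_ratio, 2))  # 38 1.58
```

This also gives a concrete instance of the two lossless performance measures listed above: the compression ratio (60/38 ≈ 1.58) and the average bits/sample (38/12 ≈ 3.17 versus 5 for direct encoding).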