p520-witten - COMPUTING PRACTICES Edgar H. Sibley Panel...

COMPUTING PRACTICES
Edgar H. Sibley, Panel Editor

The state of the art in data compression is arithmetic coding, not the better-known Huffman method. Arithmetic coding gives greater compression, is faster for adaptive models, and clearly separates the model from the channel encoding.

ARITHMETIC CODING FOR DATA COMPRESSION

IAN H. WITTEN, RADFORD M. NEAL, and JOHN G. CLEARY

Arithmetic coding is superior in most respects to the better-known Huffman [10] method. It represents information at least as compactly, and sometimes considerably more so. Its performance is optimal without the need for blocking of input data. It encourages a clear separation between the model for representing data and the encoding of information with respect to that model. It accommodates adaptive models easily and is computationally efficient. Yet many authors and practitioners seem unaware of the technique. Indeed there is a widespread belief that Huffman coding cannot be improved upon.

We aim to rectify this situation by presenting an accessible implementation of arithmetic coding and by detailing its performance characteristics. We start by briefly reviewing basic concepts of data compression and introducing the model-based approach that underlies most modern techniques. We then outline the idea of arithmetic coding using a simple example, before presenting programs for both encoding and decoding. In these programs the model occupies a separate module so that different models can easily be used. Next we discuss the construction of fixed and adaptive models and detail the compression efficiency and execution time of the programs, including the effect of different arithmetic word lengths on compression efficiency. Finally, we outline a few applications where arithmetic coding is appropriate.

Financial support for this work has been provided by the Natural Sciences and Engineering Research Council of Canada.

UNIX is a registered trademark of AT&T Bell Laboratories.
© 1987 ACM 0001-0782/87/0600-0520 75¢

DATA COMPRESSION

To many, data compression conjures up an assortment of ad hoc techniques such as conversion of spaces in text to tabs, creation of special codes for common words, or run-length coding of picture data (e.g., see [8]). This contrasts with the more modern model-based paradigm for coding, where, from an input string of symbols and a model, an encoded string is produced that is (usually) a compressed version of the input. The decoder, which must have access to the same model, regenerates the exact input string from the encoded string. Input symbols are drawn from some well-defined set such as the ASCII or binary alphabets; the encoded string is a plain sequence of bits. The model is a way of calculating, in any given context, the distribution of probabilities for the next input symbol. It must be possible for the
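
To make the notion of a model concrete, here is a minimal Python sketch. It is not the authors' program: the class `AdaptiveOrder0Model` and the function `ideal_code_length_bits` are hypothetical names introduced only for illustration, and the model shown is just one simple choice (it ignores context and starts every symbol count at 1). The sketch shows a model supplying a probability for each next symbol, and it totals the standard information-theoretic cost of -log2(p) bits per symbol, which is the length a good coder working from that model would approach.

```python
import math


class AdaptiveOrder0Model:
    """Hypothetical model for illustration: ignores context, counts symbols seen so far."""

    def __init__(self, alphabet):
        # Assumption: every symbol starts with a count of 1 so no probability is zero.
        self.counts = {symbol: 1 for symbol in alphabet}
        self.total = len(self.counts)

    def probability(self, symbol):
        # Estimated probability that `symbol` is the next input symbol.
        return self.counts[symbol] / self.total

    def update(self, symbol):
        # Encoder and decoder must apply identical updates to stay in step.
        self.counts[symbol] += 1
        self.total += 1


def ideal_code_length_bits(message, alphabet):
    """Sum of -log2(p) over the message: the cost implied by the model's estimates."""
    model = AdaptiveOrder0Model(alphabet)
    bits = 0.0
    for symbol in message:
        bits += -math.log2(model.probability(symbol))
        model.update(symbol)
    return bits


if __name__ == "__main__":
    # A repetitive message costs fewer bits than a varied one under this model.
    print(ideal_code_length_bits("aaaa", "ab"))
    print(ideal_code_length_bits("abab", "ab"))
```

Because the decoder builds the same model and applies the same updates after each decoded symbol, it sees exactly the probabilities the encoder used, which is what allows the input to be regenerated exactly.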

This note was uploaded on 06/09/2011 for the course CAP 5015 taught by Professor Mukherjee during the Spring '11 term at University of Central Florida.
