Unbounded Length Contexts for PPM

JOHN G. CLEARY AND W. J. TEAHAN
Department of Computer Science, University of Waikato, Hamilton, New Zealand
Email: jcleary@cs.waikato.ac.nz, wjt@cs.waikato.ac.nz

The PPM data compression scheme has set the performance standard in lossless compression of text throughout the past decade. PPM is a finite-context statistical modelling technique that can be viewed as blending together several fixed-order context models to predict the next character in the input sequence. This paper gives a brief introduction to PPM, and describes a variant of the algorithm, called PPM*, which exploits contexts of unbounded length. Although requiring considerably greater computational resources (in both time and space), this reliably achieves compression superior to the benchmark PPMC version. Its major contribution is to show that the full information available from considering all substrings of the input string can be used effectively to generate high-quality predictions. Hence, it provides a useful tool for exploring the bounds of compression.

Received June 28, 1996; revised July 25, 1997

1. INTRODUCTION

The prediction by partial matching (PPM) data compression scheme has set the performance standard in lossless compression of text throughout the past decade. The original algorithm was first published in 1984 by Cleary and Witten [1], and a series of improvements was described by Moffat, culminating in a careful implementation, called PPMC, which has become the benchmark version [2]. This still achieves results superior to virtually all other compression methods, despite many attempts to better it. Other methods, such as those based on Ziv-Lempel coding [3, 4], are more commonly used in practice, but their attractiveness lies in their relative speed rather than any superiority in compression; indeed, their compression performance generally falls distinctly below that of PPM in practical benchmark tests [5].
Prediction by partial matching, or PPM, is a finite-context statistical modelling technique that can be viewed as blending together several fixed-order context models to predict the next character in the input sequence. Prediction probabilities for each context in the model are calculated from frequency counts which are updated adaptively, and the symbol that actually occurs is encoded relative to its predicted distribution using arithmetic coding [6, 7]. The maximum context length is a fixed constant, and it has been found that increasing it beyond about 5 does not generally improve compression [1, 2, 8].

The present paper describes an algorithm, PPM*, which exploits contexts of unbounded length. It reliably achieves compression superior to the benchmark PPMC version, although our current implementation uses considerably greater computational resources (in both time and space). The next section describes the basic PPM compression scheme.
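The blending of fixed-order context models described above can be sketched in a few lines of code. The following is a toy illustration only, not the authors' implementation: the class name `SimplePPM` is our invention, and it uses the "method C" escape estimate associated with PPMC (escape count equal to the number of distinct symbols seen in a context) to fall through from the longest matching context to shorter ones. A real coder would feed the resulting probabilities to an arithmetic coder rather than return them directly.

```python
from collections import defaultdict

class SimplePPM:
    """Toy order-k PPM-style model: keeps frequency counts for every
    context of length 0..max_order and blends them by escaping from
    the longest matching context down to shorter ones."""

    def __init__(self, max_order=2):
        self.max_order = max_order
        # counts[context_string][symbol] -> frequency
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, history, symbol):
        # Record `symbol` under every trailing context of history,
        # from the empty (order-0) context up to max_order.
        for k in range(min(self.max_order, len(history)) + 1):
            ctx = history[len(history) - k:] if k else ""
            self.counts[ctx][symbol] += 1

    def predict(self, history, symbol):
        """Blended probability of `symbol` following `history`.
        Escapes through shorter contexts, multiplying in the escape
        probability at each level (method C: distinct-symbol count);
        a final uniform order(-1) model guarantees a nonzero result."""
        prob = 1.0
        for k in range(min(self.max_order, len(history)), -1, -1):
            ctx = history[len(history) - k:] if k else ""
            seen = self.counts[ctx]
            total = sum(seen.values())
            if total == 0:
                continue  # context never observed; try a shorter one
            distinct = len(seen)
            if symbol in seen:
                return prob * seen[symbol] / (total + distinct)
            prob *= distinct / (total + distinct)  # escape probability
        return prob / 256.0  # order(-1): uniform over a byte alphabet

    def train(self, text):
        # Adaptive update: process the text one symbol at a time.
        for i, ch in enumerate(text):
            self.update(text[:i], ch)
```

For example, after training on "abracadabra", the model has seen 'a' follow the order-2 context "br" twice and nothing else, so a symbol observed in the longest context gets a high probability, while an unseen symbol receives a small but nonzero one via the escape chain.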


