mation, channels, codes, and communications systems were rigorously extended
to more general random processes with abstract alphabets and discrete and
continuous time by Khinchine [73], [74] and by Kolmogorov and his colleagues,
especially Gelfand, Yaglom, Dobrushin, and Pinsker [45], [91], [88], [32], [126].
(See, for example, “Kolmogorov’s contributions to information theory and algorithmic complexity” [23].) In almost all of the early Soviet work, it was average mutual information that played the fundamental role.
It was the more natural quantity when more than one process was being considered. In addition,
the notion of entropy was not useful when dealing with processes with continuous alphabets, since it is generally infinite in such cases. A generalization of the idea of entropy called discrimination was developed by Kullback (see, e.g., Kullback [93]) and was further studied by the Soviet school. This form of information measure is now more commonly referred to as relative entropy or cross entropy (or the Kullback-Leibler number), and it is better interpreted as a measure of the similarity between probability distributions than as a measure of the information between random variables. Many results for mutual information and entropy can be viewed as special cases of results for relative entropy, and the formula for relative entropy arises naturally in some proofs.
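For concreteness, the standard discrete forms (conventional notation, not definitions developed in this prologue) are
\[
D(P\|Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}
\]
for two distributions $P$ and $Q$ on a common alphabet, with average mutual information arising as the special case
\[
I(X;Y) = D\bigl(P_{XY} \,\big\|\, P_X \times P_Y\bigr),
\]
the relative entropy between a joint distribution and the product of its marginals.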
It is the mathematical aspects of information theory, and hence the descendants of the above results, that are the focus of this book, but the developments in the engineering community have had as significant an impact on the foundations of information theory as they have had on applications. Simpler proofs of the basic coding theorems were developed for special cases and, as a natural offshoot, the rate of convergence to the optimal performance bounds was characterized in a variety of important cases. See, e.g., the texts by Gallager [43], Berger [11],
and Csiszár and Körner [26]. Numerous practicable coding techniques were developed which provided performance reasonably close to the optimum in many cases: from the simple linear error correcting and detecting codes of Slepian [139] to the huge variety of algebraic codes currently being implemented (see, e.g., [13], [150], [96], [98], [18]) and the various forms of convolutional, tree, and trellis codes for error correction and data compression (see, e.g., [147], [69]).
Clustering techniques have been used to develop good nonlinear codes (called “vector quantizers”) for data compression applications such as speech and image coding [49], [46], [100], [69], [119]. These clustering and trellis search techniques have been combined to form codes that merge the data compression and reliable communication operations into a single coding system [8].
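The vector quantizer idea admits a short sketch. The following is a minimal illustration assuming a generic squared-error distortion and plain Lloyd (k-means) codebook training; the function names and parameters are hypothetical, not taken from the designs cited above.

```python
# Minimal vector quantizer sketch: train a codebook with Lloyd (k-means)
# iterations, then compress vectors to codeword indices. Illustrative only;
# the designs in the cited literature use more refined training and search.
import numpy as np

def train_codebook(data, num_codewords=8, iterations=20, seed=0):
    """Lloyd iteration: alternate nearest-codeword partition / centroid update."""
    rng = np.random.default_rng(seed)
    # Initialize the codebook with randomly chosen training vectors.
    codebook = data[rng.choice(len(data), num_codewords, replace=False)]
    for _ in range(iterations):
        # Assign each training vector to its nearest codeword (squared error).
        dists = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Replace each codeword by the centroid of its assigned vectors.
        for k in range(num_codewords):
            members = data[labels == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
    return codebook

def encode(vectors, codebook):
    """Map each vector to the index of its nearest codeword."""
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def decode(indices, codebook):
    """Reproduce each vector as its codeword (the lossy reconstruction)."""
    return codebook[indices]

# Toy usage: two-dimensional vectors drawn from a Gaussian source.
data = np.random.default_rng(1).normal(size=(1000, 2))
cb = train_codebook(data)
idx = encode(data, cb)  # compressed representation: 3 bits per vector
print("mean squared error:", ((data - decode(idx, cb)) ** 2).mean())
```

An 8-word codebook spends log2 8 = 3 bits per input vector; enlarging the codebook trades rate for distortion, which is the tradeoff the rate-distortion results referenced above quantify.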
The engineering side of information theory through the middle 1970s has been well chronicled by two IEEE collections: Key Papers in the Development of Information Theory, edited by D. Slepian [140], and Key Papers in the Development of Coding Theory, edited by E. Berlekamp [14]. In addition there have
