Classical Information Theory
Robert B. Griffiths
Version of 12 January 2010
Contents
1 Introduction
2 Shannon Entropy
3 Two Random Variables
4 Conditional Entropies and Mutual Information
5 Channel Capacity
References:
CT = Cover and Thomas, Elements of Information Theory, 2d edition (Wiley, 2006)
QCQI = Quantum Computation and Quantum Information by Nielsen and Chuang (Cambridge, 2000), Secs. 11.1, 11.2
1 Introduction
⋆ Classical information theory is a well-developed subject (see CT for a very thorough presentation)
which provides some of the motivation and many of the tools and concepts used in quantum information.
Consequently it is useful to know something about classical information theory before studying how it needs
to be modified in situations where quantum effects are important.
• These notes are intended to provide a quick survey with an emphasis on intuitive ideas. Proofs and
lots of details are left to CT and the much more compact treatment in QCQI.
2 Shannon Entropy
⋆ Suppose we have a certain message we want to transmit from location $A$ to location $B$. What resources
are needed to get it from here to there? How long will it take if we have a channel with a capacity of $c$ bits
per second? If transmission introduces errors, what do we do about them? These are the sorts of questions
which are addressed by information theory as developed by Shannon and his successors.
• There are, clearly, $N = 2^n$ messages which can be represented by a string of $n$ bits; e.g., 8 messages
000, 001, ..., 111 for $n = 3$. Hence if we are trying to transmit one of $N$ distinct messages it seems sensible to
define the amount of information carried by a single message to be $\log_2 N$ bits, which we hereafter abbreviate
to $\log N$ (in contrast to $\ln N$ for the natural logarithm).
◦ Substituting some other base for the logarithm merely means multiplying $\log_2 N$ by a constant factor,
which is, in effect, changing the units in which we measure information. E.g., $\ln N$ nits in place of $\log N$ bits.
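As a small numerical illustration of the two points above, the following sketch computes $\log_2 N$ for the $n = 3$ example and checks that switching to the natural logarithm only rescales the result by the constant factor $\ln 2$. The function names `information_bits` and `information_nats` are hypothetical, chosen here for illustration; the natural-log unit that these notes call nits is also commonly called nats.

```python
import math


def information_bits(num_messages: int) -> float:
    """Return log2(N): bits needed to label one of N distinct messages."""
    return math.log2(num_messages)


def information_nats(num_messages: int) -> float:
    """Return ln(N): the same quantity measured with the natural logarithm."""
    return math.log(num_messages)


n = 3                 # length of the bit string
N = 2 ** n            # number of distinct messages: 000, 001, ..., 111
bits = information_bits(N)
nats = information_nats(N)

# Changing the base of the logarithm only multiplies by a constant:
# ln N = (ln 2) * log2 N
assert math.isclose(nats, math.log(2) * bits)
```

The conversion factor $\ln 2 \approx 0.693$ does not depend on $N$, which is why the choice of base amounts to a choice of units.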
⋆ A key observation is that if we are in the business of repeatedly transmitting messages from a collection
of $N$ messages, and if the messages can be assigned a non-uniform probability distribution, then it is, on
average, possible to use fewer than $\log N$ bits per message in order to transmit them, or to store them.
• Efficiently storing messages is known as data compression, and exemplified by gzip. The trick is to
encode them in some clever way so that, for example, more common messages are represented using short
strings of bits, and longer strings are used for less common messages.
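The idea of giving short bit strings to common messages can be made concrete with a toy example. The sketch below assumes a hypothetical source with $N = 4$ messages and probabilities 1/2, 1/4, 1/8, 1/8, together with a hand-picked prefix-free code; neither the distribution nor the code comes from these notes, and this is not how gzip works internally. It only shows that the average number of bits per message can fall below $\log N = 2$.

```python
import math

# Hypothetical non-uniform source (an assumption made for illustration only).
probabilities = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Hypothetical prefix-free code: no codeword is the beginning of another,
# so a concatenated bit string can be decoded without separators.
codewords = {"a": "0", "b": "10", "c": "110", "d": "111"}


def average_length(probs, code):
    """Expected number of bits per message under the given code."""
    return sum(p * len(code[m]) for m, p in probs.items())


def encode(messages, code):
    """Concatenate the codewords of a sequence of messages."""
    return "".join(code[m] for m in messages)


def decode(bits, code):
    """Recover the message sequence by reading bits until a codeword matches."""
    inverse = {word: m for m, word in code.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return decoded


fixed_length = math.log2(len(probabilities))    # bits per message with a fixed-length code
avg = average_length(probabilities, codewords)  # bits per message with the variable-length code

print(f"fixed-length code: {fixed_length} bits per message")
print(f"variable-length code: {avg} bits per message on average")
```

For this particular distribution the average is 1.75 bits per message instead of 2. Less common messages cost 3 bits each, but they occur rarely enough that the average still drops.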