1 The Coding Problem
The source alphabet $A$ of $n$ symbols $\{a_1, a_2, \ldots, a_n\}$ and a corresponding set of probability estimates $P = \{p_1, p_2, \ldots, p_n\}$ are given, such that $\sum_{i=1}^{n} p_i = 1$. The coding problem consists of deciding on a code giving a representation of each symbol $a_i$ of the alphabet using strings over a channel alphabet $B$, which is usually $\{0,1\}$.
*********************************************************************
Code:  Source message   --f-->   code words
       (alphabet A)              (alphabet B)
       alphanumeric symbols      Channel alphabet = binary symbols
       |A| = n                   |B| = 2
*********************************************************************
The symbol $a_i$ may be drawn from a longer message $M$ consisting of strings of source alphabet symbols, but at this point we are considering the symbol $a_i$ in isolation.
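As a minimal sketch of what "a code" means here, the mapping f can be written as a lookup table from source symbols to binary strings, and a message M is encoded by concatenating the codewords of its symbols. The four-symbol alphabet, the codewords and the function name `encode` below are hypothetical, chosen only for illustration.

```python
# Hypothetical source alphabet A = {a, b, c, d} and code f: A -> {0,1}*.
# These codewords are an illustrative assumption, not taken from the notes.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode(message, code):
    """Encode a message symbol by symbol by concatenating codewords."""
    return "".join(code[symbol] for symbol in message)


encoded = encode("abad", code)
print(encoded)  # 0100111

# Every character of the output belongs to the channel alphabet B = {0, 1}.
assert set(encoded) <= {"0", "1"}
```

Each symbol is looked up on its own, which matches the view of treating a symbol in isolation: the codeword for a symbol does not depend on its neighbours in the message.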
Sometimes we will denote the source alphabet $A$ by the integers $\{1, 2, 3, \ldots, n\}$. Let the codewords for a particular coding algorithm be $C = \{c_1, c_2, \ldots, c_n\}$ with corresponding lengths of codewords being $L = \{l_1, l_2, \ldots, l_n\}$. Then the average code length $\bar{l}$ or the expected codeword length $E(C,P)$ is given by

$$E(C,P) = \bar{l} = \sum_{j=1}^{n} p_j l_j$$
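The expected codeword length can be checked numerically. The sketch below evaluates E(C,P) as the sum of p_j times l_j, using the probabilities and the codewords of Code A and Code F from the Example Codes table in this section. The function name `expected_length` and the input validation are assumptions added for illustration.

```python
from math import isclose


def expected_length(codewords, probs):
    """Return E(C, P) = sum_j p_j * l_j, where l_j = len(c_j)."""
    if len(codewords) != len(probs):
        raise ValueError("need exactly one probability per codeword")
    if not isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(p * len(c) for p, c in zip(probs, codewords))


# Probabilities p(a1)..p(a8) from the Example Codes table.
P = [0.40, 0.15, 0.15, 0.10, 0.10, 0.05, 0.04, 0.01]

code_a = ["000", "001", "010", "011", "100", "101", "110", "111"]
code_f = ["1", "001", "011", "010", "0001", "00001", "000001", "000000"]

print(round(expected_length(code_a, P), 2))  # 3.0
print(round(expected_length(code_f, P), 2))  # 2.55
```

The fixed-length Code A always costs 3 bits per symbol, while Code F gives the shortest codeword to the most probable symbol and so lowers the average.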
Prefix-free Code: A code is said to have the prefix property if no codeword (bit pattern) is a prefix of another codeword. A prefix-free code is sometimes simply called a prefix code. A code is said to be uniquely decodable or uniquely decipherable (UD) if the message for a code string, if it exists, can be recovered unambiguously. The fundamental question is: how short can we make the average code length while keeping the code UD? Consider the table below, giving different codes for the 8 symbols $a_1, a_2, \ldots, a_8$:
Example Codes:

| $a_i$ | $p(a_i)$ | Code A | Code B | Code C | Code D | Code E | Code F |
|---|---|---|---|---|---|---|---|
| a1 | 0.40 | 000 | 0 | 010 | 0 | 0 | 1 |
| a2 | 0.15 | 001 | 1 | 011 | 011 | 01 | 001 |
| a3 | 0.15 | 010 | 00 | 00 | 1010 | 011 | 011 |
| a4 | 0.10 | 011 | 01 | 100 | 1011 | 0111 | 010 |
| a5 | 0.10 | 100 | 10 | 101 | 10000 | 01111 | 0001 |
| a6 | 0.05 | 101 | 11 | 110 | 10001 | 011111 | 00001 |
| a7 | 0.04 | 110 | 000 | 1110 | 10010 | 0111111 | 000001 |
| a8 | 0.01 | 111 | 001 | 1111 | 10011 | 01111111 | 000000 |
| Avg. length | | 3 | 1.5 | 2.9 | 2.85 | 2.71 | 2.55 |
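The prefix property and unique decodability of the six example codes can be tested mechanically. The sketch below checks the prefix property by direct pairwise comparison. For unique decodability it uses the Sardinas-Patterson test, a standard method that the notes themselves do not name and that is brought in here as an assumption. The function names are hypothetical.

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of (or equal to) another codeword."""
    return not any(
        i != j and y.startswith(x)
        for i, x in enumerate(codewords)
        for j, y in enumerate(codewords)
    )


def dangling_suffixes(xs, ys):
    """Non-empty strings w such that x + w == y for some x in xs, y in ys."""
    return {y[len(x):] for x in xs for y in ys
            if len(y) > len(x) and y.startswith(x)}


def is_uniquely_decodable(codewords):
    """Sardinas-Patterson test: UD iff no dangling suffix is a codeword."""
    code = set(codewords)
    if len(code) != len(codewords) or "" in code:
        return False
    current = dangling_suffixes(code, code)
    seen = set()
    while current:
        if current & code:  # a dangling suffix is itself a codeword
            return False
        seen |= current
        following = (dangling_suffixes(code, current)
                     | dangling_suffixes(current, code))
        current = following - seen  # only explore suffixes not yet visited
    return True


# Codewords for a1..a8, copied from the Example Codes table.
CODES = {
    "A": ["000", "001", "010", "011", "100", "101", "110", "111"],
    "B": ["0", "1", "00", "01", "10", "11", "000", "001"],
    "C": ["010", "011", "00", "100", "101", "110", "1110", "1111"],
    "D": ["0", "011", "1010", "1011", "10000", "10001", "10010", "10011"],
    "E": ["0", "01", "011", "0111", "01111", "011111",
          "0111111", "01111111"],
    "F": ["1", "001", "011", "010", "0001", "00001", "000001", "000000"],
}

for name, words in CODES.items():
    print(name, is_prefix_free(words), is_uniquely_decodable(words))
```

Under this sketch, Codes A, C and F come out prefix-free. Codes D and E are not prefix-free yet pass the unique-decodability test. Code B fails both, which explains its very short average length of 1.5: the string 00, for example, could be read as a3 or as a1 a1.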