ECE 5620 Spring 2011 Homework #6: Solutions


1. The parsing is 0, 00, 000, 1, 10, 101, 0000, 01, 1010, 1. We encode the phrases as follows:

   Phrase:           0    00   000  1    10   101  0000  01   1010  1
   Pointer:          ∅    1    10   00   100  101  011   001  0110  0000
   Added character:  0    0    0    1    0    1    0     1    0     1

One can also encode the final phrase using only a pointer, 0011, without an added character. Using the encoding in the table, the encoded string is 01010000110001011011000110110000001. Note that in this example we have actually made the string longer.

2. (a) The LZ78 parsing is 1, 11, 111, 1111, 11111, . . .

(b) Let c(n) denote the number of non-null phrases in the parsing of the string of length n, and let ℓ(n) denote the length of its encoding. Since the null phrase does not need to be communicated, and we never refer to the last phrase, we have

   ℓ(n) ≤ c(n)(⌈log c(n)⌉ + 1) ≤ c(n)(log c(n) + 2).     (1)

It remains to bound c(n). Using induction one can verify that

   Σ_{i=1}^{m} i = m(m+1)/2.     (2)

Thus if n = m(m+1)/2, then c(n) = m. More generally, if

   m(m−1)/2 < n ≤ m(m+1)/2,     (3)

then c(n) = m. But the first inequality in (3) implies that (m−1)² < 2n, or m = c(n) < √(2n) + 1. Substituting this into (1) yields

   ℓ(n)/n ≤ (√(2n) + 1)(log[√(2n) + 1] + 2)/n → 0.

3. (a) The Lempel-Ziv parsing is 0, 00, 001, 000, 0001, 1, 01, 011, 1.

(b) The sequence is 0, 1, 00, 01, 10, 11, 000, 001, . . ., concatenating all binary strings of length 1, 2, 3, etc. This is the sequence where the phrases are as short as possible.

(c) Clearly the constant sequence will do: 1, 11, 111, 1111, . . .
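The incremental parsing and pointer/character encoding used in problems 1–3 can be sketched in Python. This is an illustrative sketch, not part of the original solutions; `lz78_phrases` and `lz78_encode` are hypothetical helper names, and the pointer widths follow the convention in the table above (⌈log j⌉ bits for the j-th phrase, with pointer value 0 denoting the empty phrase).

```python
from math import ceil, log2

def lz78_phrases(s):
    """Incremental (LZ78) parsing: each phrase is the longest previously
    seen phrase extended by one character; a trailing repeat may remain."""
    phrases, seen, cur = [], set(), ""
    for ch in s:
        cur += ch
        if cur not in seen:
            seen.add(cur)
            phrases.append(cur)
            cur = ""
    if cur:                      # final phrase may duplicate an earlier one
        phrases.append(cur)
    return phrases

def lz78_encode(phrases):
    """Encode phrase j as (pointer to its prefix phrase, last character),
    spending ceil(log2(j)) bits on the pointer (0 = empty phrase)."""
    index, out = {}, []
    for j, p in enumerate(phrases, start=1):
        ptr = index.get(p[:-1], 0)               # phrase minus its last char
        width = ceil(log2(j)) if j > 1 else 0    # bits to address phrases 0..j-1
        out.append((format(ptr, "0%db" % width) if width else "") + p[-1])
        index.setdefault(p, j)
    return "".join(out)
```

On the 23-bit string of problem 1 this reproduces the parsing and the 35-bit encoding above (indeed longer than the input), and on the all-ones string of problem 2 it reproduces the parsing 1, 11, 111, . . .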
4. Two versions of fixed-database Lempel-Ziv. Consider a source (A, P). For simplicity assume that the alphabet is finite, |A| = A < ∞, and the symbols are i.i.d. ∼ P. A fixed database D is given, and is revealed to the decoder. The encoder parses the target sequence x_1^n into blocks of length l, and subsequently encodes them by giving the binary description of their last appearance in the database. If a match is not found, the entire block is sent uncompressed, requiring l log A bits. A flag is used to tell the decoder whether a match location is being described, or the sequence itself. Parts (a) and (b) give some preliminaries you will need in showing the optimality of fixed-database LZ in (c).

(a) Let x^l be a δ-typical sequence of length l starting at 0, and let R_l(x^l) be the corresponding recurrence index in the infinite past . . . , X_{−2}, X_{−1}. Show that

   E[R_l(X^l) | X^l = x^l] ≤ 2^(l(H+δ)),

where H is the entropy rate of the source.

(b) Prove that for any ε > 0, Pr[R_l(X^l) > 2^(l(H+ε))] → 0 as l → ∞. Hint: Expand the probability by conditioning on strings x^l, and break things up into typical and non-typical. Markov's inequality and the AEP should prove handy as well.

(c) Consider the following two fixed databases: (i) D_1 is formed by taking all δ-typical l-vectors; and (ii) D_2 is formed by taking the most recent L̃ = 2^(l(H+δ)) symbols in the infinite past (i.e., X_{−L̃}, . . . , X_{−1}). Argue that the algorithm described above is asymptotically optimal, namely that the expected number of bits per symbol converges to the entropy rate, when used in conjunction with either database D_1 or D_2.
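Kac's lemma, the key tool behind part (a), says that for a stationary ergodic source the expected recurrence time of a pattern x^l equals 1/p(x^l). A quick Monte Carlo sanity check for an i.i.d. binary source is sketched below; the Bernoulli parameter, pattern, and sample size are illustrative assumptions, not taken from the problem.

```python
import random

random.seed(1)
p1, patt, n = 0.7, "11", 200_000     # assumed Bernoulli(0.7) source, pattern "11"
bits = "".join("1" if random.random() < p1 else "0" for _ in range(n))

# positions where the pattern occurs (sliding window)
hits = [i for i in range(n - len(patt) + 1) if bits[i:i + len(patt)] == patt]
gaps = [b - a for a, b in zip(hits, hits[1:])]
mean_recurrence = sum(gaps) / len(gaps)

# Kac's lemma predicts E[recurrence time] = 1 / p("11") = 1 / 0.49
kac = 1 / (p1 * p1)
```

In the notation of part (a), the lemma gives E[R_l(X^l) | X^l = x^l] = 1/p(x^l) ≤ 2^(l(H+δ)) whenever x^l is δ-typical, which is exactly how part (a) is solved.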
Solution: Two versions of fixed-database Lempel-Ziv.

(a) Since x^l is δ-typical, the AEP implies that p(x^l) ≥ 2^(−l(H+δ)), and the result follows from Kac's lemma.

(b) Fix ε > 0 and δ ∈ (0, ε). Let A_δ^(l) be the δ-typical set for A^l. We divide the sum over sequences into the typical sequences and the non-typical sequences:

   Pr(R_l(X^l) > 2^(l(H+ε))) = Σ_{x^l} p(x^l) Pr(R_l(X^l) > 2^(l(H+ε)) | X^l = x^l).

For x^l ∈ A_δ^(l), Markov's inequality and part (a) give

   Pr(R_l(X^l) > 2^(l(H+ε)) | X^l = x^l) ≤ E[R_l(X^l) | X^l = x^l] / 2^(l(H+ε)) ≤ 2^(−l(ε−δ)),

so the typical terms contribute at most 2^(−l(ε−δ)) → 0, while the non-typical terms contribute at most Pr(X^l ∉ A_δ^(l)) → 0 by the AEP.

(c) With either database, a block that is found in the database can be described with about l(H+δ) bits: an index into the at most 2^(l(H+δ)) typical l-vectors of D_1, or a location in the window of length L̃ = 2^(l(H+δ)) that forms D_2, where by (b) a match exists with probability tending to one. A block that is not found costs l log A + 1 bits, but this happens with vanishing probability (by the AEP for D_1, by (b) for D_2). Hence the expected number of bits per symbol tends to H + δ, and since δ > 0 is arbitrary, the scheme is asymptotically optimal.

5. Consider two source strings x_1^n and x_2^n. Suppose that the codeword for x_1^n is a prefix of the codeword for x_2^n. Then the set of phrases in the parsing of x_1^n must form the initial set of phrases (in order) in the parsing of x_2^n. This requires that x_1^n be a prefix of x_2^n, which is impossible unless they are equal.

10. The question assumes that we send the midpoint of the final interval, (a, b), truncated to ⌈−log(b−a)⌉ + 1 bits. Let x_1^n and x_2^n denote two distinct source realizations. For i = 1, 2, suppose that x_i^n leads to a final interval (a_i, b_i), and let y_i denote the midpoint of (a_i, b_i), truncated to ℓ_i := ⌈−log(b_i − a_i)⌉ + 1 bits. We showed in class that y_i ∈ [a_i, (a_i + b_i)/2]. Without loss of generality, we may assume that a_2 ≥ b_1. Then we have

   y_2 − y_1 > a_2 − (a_1 + b_1)/2
            ≥ b_1 − (a_1 + b_1)/2
            = (b_1 − a_1)/2
            = 2^(−(−log(b_1−a_1) + 1))
            ≥ 2^(−(⌈−log(b_1−a_1)⌉ + 1))
            = 2^(−ℓ_1).

Now if y_1 is specified to ℓ_1 bits and y_1 is a prefix of y, then y − y_1 < 2^(−ℓ_1). The above calculation then implies that y_1 cannot be a prefix of y_2. Since y_2 > y_1, y_2 cannot be a prefix of y_1.

Note that if one does not use ⌈−log(b−a)⌉ + 1 bits in the binary representation, or if the encoder is permitted to send any point in the final interval, then the code may not be prefix-free. To see this, consider first a uniform distribution over an alphabet of size 3. The midpoint of the second symbol's interval is 1/2, or 0.1 in binary, which is obviously a prefix of the third symbol's string. If one uses a binary expansion of ⌈−log(b−a)⌉ + 1 bits, however, then this is written as 0.100, which is not a prefix of the third symbol's string. Consider next a binary distribution with probabilities 9/16 and 7/16. If we are not required to send the midpoint of the final interval, then we may send 1/2 using two bits (0.10) for the first symbol and 5/8 using 3 bits (0.101) for the second symbol. Observe that 10 is a prefix of 101.

6. (a) The progression of the arithmetic encoder is as follows (the step-by-step interval diagram is not recoverable from this copy): the final interval is (a, b) = (89/128, 23/32) = (178/256, 184/256).
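The truncated-midpoint rule analyzed in problem 10 can be checked numerically on the final interval of 6(a). This is a sketch using Python floats, which are exact for these dyadic values.

```python
from math import ceil, log2

a, b = 178 / 256, 184 / 256        # final interval from 6(a)
L = ceil(-log2(b - a)) + 1         # number of bits kept
mid = (a + b) / 2                  # 181/256
k = int(mid * 2 ** L)              # truncate the binary expansion to L bits
y = k / 2 ** L                     # the value actually sent
code = format(k, "0%db" % L)       # its bit string

# the truncated midpoint stays in [a, (a+b)/2], as used in problem 10
assert a <= y <= mid
# 2^-L is at most half the interval width: the key inequality in problem 10
assert 2 ** -L <= (b - a) / 2
```

This gives L = 7 and the 7-bit string 1011010, the encoded output of 6(a).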
The midpoint of this interval is 181/256, or 0.10110101 in binary. Truncating to ⌈−log(b−a)⌉ + 1 = 7 bits gives 0.1011010 as the encoded string.

(b) 0.01110 = 14/32. The progression of the arithmetic decoder is as follows (the interval diagram is not recoverable from this copy): we choose 0 as the first symbol because 14/32 < 1/2, we choose 1 for the second symbol because 14/32 > 3/8, and so on. The decoded string is 0111.

7. The quickest way of getting the desired string into the dictionary is to build it up character by character: W|Wi|Wit|With|. . ., where the vertical lines indicate the LZ parsing and are not part of the actual string. Using formula (2) above, the required string length is then

   Σ_{i=1}^{50} i = 50·51/2 = 1275.

8. (a) As shown in class, the i.i.d. distribution that assigns the maximum probability to a string is its type, which in this case is [1/3 2/3]. The probability it assigns is

   (1/3)^(n/3) · (2/3)^(2n/3) ≈ 0.529^n.

(b) As shown in class, the Markov distribution that assigns the maximum probability to a string has an initial distribution given by a point mass on the first symbol, p(X_1 = 0) = 1, and a transition probability matrix given by the Markov type

               to 0            to 1
   from 0:     0               1
   from 1:     (n−3)/(2n−3)    n/(2n−3)

The probability that this distribution assigns to the string is

   ((n−3)/(2n−3))^(n/3−1) · (n/(2n−3))^(n/3) ≈ 0.630^n.

For large n, this is much larger than the probability in (a).

(c) Consider the following second-order Markov chain:

   p(X_1 = 0, X_2 = 1) = 1,
   p(0|0,0) = 1 − p(1|0,0) = 1,
   p(0|0,1) = 1 − p(1|0,1) = 0,
   p(0|1,0) = 1 − p(1|1,0) = 0,
   p(0|1,1) = 1 − p(1|1,1) = 1,

where p(i|j,k) refers to p(X_n = i | X_{n−2} = j, X_{n−1} = k). This distribution assigns probability one to the given string, which is clearly the largest possible.

(d) The distribution in (c) is second-order Markov, and hence also third-order Markov, and assigns probability one to the given string, which is clearly the largest possible.
Thus it is the third-order Markov distribution that assigns the largest possible probability to the string. Note that the same reasoning implies that it is the maximum-likelihood distribution among Markov distributions of any order.
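The arithmetic in problems 7 and 8 is easy to verify numerically; the sketch below checks the per-symbol values only (the n-th-power expressions follow by raising them to n).

```python
# problem 7: total length of W | Wi | Wit | ... for a 50-character target phrase
assert sum(range(1, 51)) == 50 * 51 // 2 == 1275

# problem 8(a): per-symbol probability of the best i.i.d. fit, the type [1/3 2/3]
iid = (1 / 3) ** (1 / 3) * (2 / 3) ** (2 / 3)
assert round(iid, 3) == 0.529

# problem 8(b): large-n limit of the best first-order Markov fit,
# ((n-3)/(2n-3))^(1/3) * (n/(2n-3))^(1/3) -> 2^(-2/3)
markov = 2 ** (-2 / 3)
assert round(markov, 3) == 0.630
assert iid < markov < 1          # the Markov fit beats the i.i.d. fit, as noted in (b)
```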

This note was uploaded on 10/03/2011 for the course ECE 5620 at Cornell University (Engineering School).
