Information Theory
Lecture Notes

Stefan M. Moser
6th Edition, 2018

Copyright © Stefan M. Moser

Signal and Information Processing Lab
ETH Zürich
Zurich, Switzerland

Institute of Communications Engineering
National Chiao Tung University (NCTU)
Hsinchu, Taiwan

You are welcome to use these lecture notes for yourself, for teaching, or for any other noncommercial purpose. If you use extracts from these lecture notes, please make sure to show their origin. The author assumes no liability or responsibility for any errors or omissions.

6th Edition, 2018. Version 6.5. Compiled on 24 August 2019. For the latest version see ...

Contents

Preface

1 Shannon's Measure of Information
  1.1 Motivation
  1.2 Uncertainty or Entropy
    1.2.1 Definition
    1.2.2 Binary Entropy Function
    1.2.3 The Information Theory Inequality
    1.2.4 Bounds on H(U)
    1.2.5 Conditional Entropy
    1.2.6 Extensions to More RVs
    1.2.7 Chain Rule
  1.3 Mutual Information
    1.3.1 Definition
    1.3.2 Properties
    1.3.3 Conditional Mutual Information
    1.3.4 Chain Rule
  1.4 Comments on our Notation
    1.4.1 General
    1.4.2 Entropy and Mutual Information
  1.A Appendix: Uniqueness of the Definition of Entropy

2 Review of Probability Theory
  2.1 Discrete Probability Theory
  2.2 Discrete Random Variables
  2.3 Continuous Random Variables
  2.4 The Jensen Inequality

3 Entropy, Relative Entropy, and Variational Distance
  3.1 Relative Entropy
  3.2 Variational Distance
  3.3 Relations between Entropy and Variational Distance
    3.3.1 Estimating PMFs
    3.3.2 Extremal Entropy for Given Variational Distance
    3.3.3 Lower Bound on Entropy in Terms of Variational Distance
  3.4 Maximum Entropy Distribution

4 Data Compression: Efficient Coding of a Single Random Message
  4.1 A Motivating Example
  4.2 A Coding Scheme
  4.3 Prefix-Free or Instantaneous Codes
  4.4 Trees and Codes
  4.5 The Kraft Inequality
  4.6 Trees with Probabilities
  4.7 What We Cannot Do: Fundamental Limitations of Source Coding
  4.8 What We Can Do: Analysis of Some Good Codes
    4.8.1 Shannon-Type Codes
    4.8.2 Shannon Code
    4.8.3 Fano Code
    4.8.4 Coding Theorem for a Single Random Message
  4.9 Optimal Codes: Huffman Code
  4.10 Types of Codes
  4.A Appendix: Alternative Proof for the Converse Part of the Coding Theorem for a Single Random Message

5 Data Compression: Efficient Coding of a Memoryless Random Source
  5.1 A Discrete Memoryless Source
  5.2 Block–to–Variable-Length Coding of a DMS
  5.3 Arithmetic Coding
    5.3.1 Introduction
    5.3.2 Encoding
    5.3.3 Decoding
  5.4 Variable-Length–to–Block Coding of a DMS
  5.5 General Converse
  5.6 Optimal Message Sets: Tunstall Message Sets
  5.7 Optimal Variable-Length–to–Block Codes: Tunstall Codes
  5.8 The Efficiency of a Source Coding Scheme

6 Stochastic Processes and Entropy Rate
  6.1 Discrete Stationary Sources
  6.2 Markov Processes
  6.3 Entropy Rate

7 Data Compression: Efficient Coding of a Random Source with Memory
  7.1 Block–to–Variable-Length Coding of a DSS
  7.2 Elias–Willems Universal Block–to–Variable-Length Coding
    7.2.1 The Recency Rank Calculator
    7.2.2 Codes for Positive Integers
    7.2.3 Elias–Willems Block–to–Variable-Length Coding for a DSS
  7.3 Sliding Window Lempel–Ziv Universal Coding Scheme

8 Data Compression: Efficient Coding of an Infinitely Long Fixed Sequence
  8.1 Information-Lossless Finite State Encoders
  8.2 Distinct Parsing
  8.3 Analysis of Information-Lossless Finite State Encoders
  8.4 Tree-Structured Lempel–Ziv Universal Coding Scheme
  8.5 Analysis of Tree-Structured Lempel–Ziv Coding

9 Optimizing Probability Vectors over Concave Functions: Karush–Kuhn–Tucker Conditions
  9.1 Introduction
  9.2 Convex Regions and Concave Functions
  9.3 Maximizing Concave Functions
  9.A Appendix: The Slope Paradox

10 Gambling and Horse Betting
  10.1 Problem Setup
  10.2 Optimal Gambling Strategy
  10.3 The Bookie's Perspective
  10.4 Uniform Fair Odds
  10.5 What About Not Gambling?
  10.6 Optimal Gambling for Subfair Odds
  10.7 Gambling with Side-Information
  10.8 Dependent Horse Races

11 Data Transmission over a Noisy Digital Channel
  11.1 Problem Setup
  11.2 Discrete Memoryless Channels
  11.3 Coding for a DMC
  11.4 The Bhattacharyya Bound
  11.5 Operational Capacity
  11.6 Two Important Lemmas
  11.7 Converse to the Channel Coding Theorem
  11.8 The Channel Coding Theorem

12 Computing Capacity
  12.1 Introduction
  12.2 Strongly Symmetric DMCs
  12.3 Weakly Symmetric DMCs
  12.4 Mutual Information and Convexity
  12.5 Karush–Kuhn–Tucker Conditions

13 Convolutional Codes
  13.1 Convolutional Encoder of a Trellis Code
  13.2 Decoder of a Trellis Code
  13.3 Quality of a Trellis Code
    13.3.1 Detours in a Trellis
    13.3.2 Counting Detours: Signalflowgraphs
    13.3.3 Upper Bound on the Bit Error Probability of a Trellis Code

14 Polar Codes
  14.1 The Polar Transform
  14.2 Polarization
    14.2.1 Recursive Application of the Polar Transform
    14.2.2 Matrix Notation
    14.2.3 Are these Channels Realistic?
    14.2.4 Polarization
    14.2.5 Proof of Theorem 14.15
    14.2.6 Attempt on a Polar Coding Scheme for the BEC
  14.3 Channel Reliability
  14.4 Polar Coding
    14.4.1 Coset Coding Scheme
    14.4.2 Performance of Coset Coding
    14.4.3 Polar Coding Schemes
  14.5 Polar Coding for Symmetric DMCs
  14.6 Complexity Analysis
    14.6.1 Encoder
    14.6.2 Decoder
    14.6.3 Code Creation
  14.7 Discussion
  14.A Appendix: Landau Symbols
  14.B Appendix: Concavity of Z(W) and Proof of (14.152) in Theorem 14.20
  14.C Appendix: Proof of Theorem 14.24
    14.C.1 Converse Part
    14.C.2 Direct Part

15 Joint Source and Channel Coding
  15.1 Information Transmission System
  15.2 Converse to the Information Transmission Theorem
  15.3 Achievability of the Information Transmission Theorem
    15.3.1 Ergodicity
    15.3.2 An Achievable Joint Source Channel Coding Scheme
  15.4 Joint Source and Channel Coding
  15.5 The Rate of a Joint Source Channel Coding Scheme
  15.6 Transmission above Capacity and Minimum Bit Error Rate

16 Continuous Random Variables and Differential Entropy
  16.1 Entropy of Continuous Random Variables
  16.2 Properties of Differential Entropy
  16.3 Generalizations and Further Definitions
  16.4 Mixed Continuous and Discrete Random Variables
  16.5 Multivariate Gaussian

17 The Gaussian Channel
  17.1 Introduction
  17.2 Information Capacity
  17.3 Channel Coding Theorem
    17.3.1 Plausibility
    17.3.2 Achievability
    17.3.3 Converse
  17.4 Joint Source and Channel Coding Theorem

18 Bandlimited Channels
  18.1 Additive White Gaussian Noise Channel
  18.2 The Sampling Theorem
  18.3 From Continuous to Discrete Time

19 Parallel Gaussian Channels
  19.1 Channel Model
  19.2 Independent Parallel Gaussian Channels
  19.3 Optimal Power Allocation: Waterfilling
  19.4 Dependent Parallel Gaussian Channels
  19.5 Colored Gaussian Noise

20 Asymptotic Equipartition Property and Weak Typicality
  20.1 Motivation
  20.2 Random Convergence
  20.3 AEP
  20.4 Typical Set
  20.5 High-Probability Sets and the Typical Set
  20.6 Data Compression Revisited
  20.7 AEP for General Sources with Memory
  20.8 General Source Coding Theorem
  20.9 Joint AEP
  20.10 Jointly Typical Sequences
  20.11 Data Transmission Revisited
  20.12 Joint Source and Channel Coding Revisited
  20.13 Typicality for Continuous Random Variables
  20.14 Summary

21 Cryptography
  21.1 Introduction to Cryptography
  21.2 Cryptographic System Model
  21.3 The Kerckhoff Hypothesis
  21.4 Perfect Secrecy
  21.5 Imperfect Secrecy
  21.6 Computational vs. Unconditional Security
  21.7 Public-Key Cryptography
    21.7.1 One-Way Function
    21.7.2 Trapdoor One-Way Function

A Gaussian Random Variables
  A.1 Standard Gaussian Random Variables
  A.2 Gaussian Random Variables
  A.3 The Q-Function
  A.4 The Characteristic Function of a Gaussian
  A.5 A Summary

B Gaussian Vectors
  B.1 Positive Semidefinite Matrices
  B.2 Random Vectors and Covariance Matrices
  B.3 The Characteristic Function
  B.4 A Standard Gaussian Vector
  B.5 Gaussian Vectors
  B.6 The Mean and Covariance Determine the Law of a Gaussian
  B.7 Canonical Representation of Centered Gaussian Vectors
  B.8 The Characteristic Function of a Gaussian Vector
  B.9 The Density of a Gaussian Vector
  B.10 Linear Functions of Gaussian Vectors
  B.11 A Summary

C Stochastic Processes
  C.1 Stochastic Processes & Stationarity
  C.2 The Autocovariance Function
  C.3 Gaussian Processes
  C.4 The Power Spectral Density
  C.5 Linear Functionals of WSS Stochastic Processes
  C.6 Filtering Stochastic Processes
  C.7 White Gaussian Noise
  C.8 Orthonormal and Karhunen–Loève Expansions

Bibliography
List of Figures
List of Tables
Index

Preface

These lecture notes started out as handwritten guidelines that I used myself in class for teaching. As I got frequent and persistent requests from students attending the class to hand out these private notes in spite of their awful state (I still cannot really believe that any student was actually able to read them!), my students Lin Gu-Rong and Lin Hsuan-Yin took matters into their own hands and started to typeset my handwritten notes in LaTeX. These versions of notes then grew together with a couple of loose handouts (that complemented the textbook by Cover and Thomas [CT06] that I had been using as class textbook for several years) to a large pile of proper handouts and were used several times in combination with Cover and Thomas. During this time, the notes were constantly improved and extended. In this context I have to acknowledge the continued help of my students, in particular of Lin Hsuan-Yin and of Chang Hui-Ting, who typed the chapter about cryptography. In fall 2008 my colleague Chen Po-Ning approached me and suggested to write a coding and information theory textbook for students with only li...