A Wavelet Tour of Signal Processing
Stéphane Mallat

Contents
1 Introduction to a Transient World
    1.1 Fourier Kingdom
    1.2 Time-Frequency Wedding
        1.2.1 Windowed Fourier Transform
        1.2.2 Wavelet Transform
    1.3 Bases of Time-Frequency Atoms
        1.3.1 Wavelet Bases and Filter Banks
        1.3.2 Tilings of Wavelet Packet and Local Cosine Bases
    1.4 Bases for What?
        1.4.1 Approximation
        1.4.2 Estimation
        1.4.3 Compression
    1.5 Travel Guide
        1.5.1 Reproducible Computational Science
        1.5.2 Road Map

2 Fourier Kingdom
    2.1 Linear Time-Invariant Filtering ¹
        2.1.1 Impulse Response
        2.1.2 Transfer Functions
    2.2 Fourier Integrals ¹
        2.2.1 Fourier Transform in $L^1(\mathbb{R})$
        2.2.2 Fourier Transform in $L^2(\mathbb{R})$
        2.2.3 Examples
    2.3 Properties ¹
        2.3.1 Regularity and Decay
        2.3.2 Uncertainty Principle
        2.3.3 Total Variation
    2.4 Two-Dimensional Fourier Transform ¹
    2.5 Problems

3 Discrete Revolution
    3.1 Sampling Analog Signals ¹
        3.1.1 Whittaker Sampling Theorem
        3.1.2 Aliasing
        3.1.3 General Sampling Theorems
    3.2 Discrete Time-Invariant Filters ¹
        3.2.1 Impulse Response and Transfer Function
        3.2.2 Fourier Series
    3.3 Finite Signals ¹
        3.3.1 Circular Convolutions
        3.3.2 Discrete Fourier Transform
        3.3.3 Fast Fourier Transform
        3.3.4 Fast Convolutions
    3.4 Discrete Image Processing ¹
        3.4.1 Two-Dimensional Sampling Theorem
        3.4.2 Discrete Image Filtering
        3.4.3 Circular Convolutions and Fourier Basis
    3.5 Problems

4 Time Meets Frequency
    4.1 Time-Frequency Atoms ¹
    4.2 Windowed Fourier Transform ¹
        4.2.1 Completeness and Stability
        4.2.2 Choice of Window ²
        4.2.3 Discrete Windowed Fourier Transform ²
    4.3 Wavelet Transforms ¹
        4.3.1 Real Wavelets
        4.3.2 Analytic Wavelets
        4.3.3 Discrete Wavelets ²
    4.4 Instantaneous Frequency ²
        4.4.1 Windowed Fourier Ridges
        4.4.2 Wavelet Ridges
    4.5 Quadratic Time-Frequency Energy ¹
        4.5.1 Wigner-Ville Distribution
        4.5.2 Interferences and Positivity
        4.5.3 Cohen's Class ²
        4.5.4 Discrete Wigner-Ville Computations ²
    4.6 Problems

5 Frames
    5.1 Frame Theory ²
        5.1.1 Frame Definition and Sampling
        5.1.2 Pseudo Inverse
        5.1.3 Inverse Frame Computations
        5.1.4 Frame Projector and Noise Reduction
    5.2 Windowed Fourier Frames ²
    5.3 Wavelet Frames ²
    5.4 Translation Invariance ¹
    5.5 Dyadic Wavelet Transform ²
        5.5.1 Wavelet Design
        5.5.2 "Algorithme à Trous"
        5.5.3 Oriented Wavelets for a Vision ³
    5.6 Problems

6 Wavelet Zoom
    6.1 Lipschitz Regularity ¹
        6.1.1 Lipschitz Definition and Fourier Analysis
        6.1.2 Wavelet Vanishing Moments
        6.1.3 Regularity Measurements with Wavelets
    6.2 Wavelet Transform Modulus Maxima ²
        6.2.1 Detection of Singularities
        6.2.2 Reconstruction From Dyadic Maxima ³
    6.3 Multiscale Edge Detection ²
        6.3.1 Wavelet Maxima for Images ²
        6.3.2 Fast Multiscale Edge Computations ³
    6.4 Multifractals ²
        6.4.1 Fractal Sets and Self-Similar Functions
        6.4.2 Singularity Spectrum ³
        6.4.3 Fractal Noises ³
    6.5 Problems

7 Wavelet Bases
    7.1 Orthogonal Wavelet Bases ¹
        7.1.1 Multiresolution Approximations
        7.1.2 Scaling Function
        7.1.3 Conjugate Mirror Filters
        7.1.4 In Which Orthogonal Wavelets Finally Arrive
    7.2 Classes of Wavelet Bases ¹
        7.2.1 Choosing a Wavelet
        7.2.2 Shannon, Meyer and Battle-Lemarié Wavelets
        7.2.3 Daubechies Compactly Supported Wavelets
    7.3 Wavelets and Filter Banks ¹
        7.3.1 Fast Orthogonal Wavelet Transform
        7.3.2 Perfect Reconstruction Filter Banks
        7.3.3 Biorthogonal Bases of $l^2(\mathbb{Z})$ ²
    7.4 Biorthogonal Wavelet Bases ²
        7.4.1 Construction of Biorthogonal Wavelet Bases
        7.4.2 Biorthogonal Wavelet Design ²
        7.4.3 Compactly Supported Biorthogonal Wavelets ²
        7.4.4 Lifting Wavelets ³
    7.5 Wavelet Bases on an Interval ²
        7.5.1 Periodic Wavelets
        7.5.2 Folded Wavelets
        7.5.3 Boundary Wavelets ³
    7.6 Multiscale Interpolations ²
        7.6.1 Interpolation and Sampling Theorems
        7.6.2 Interpolation Wavelet Basis ³
    7.7 Separable Wavelet Bases ¹
        7.7.1 Separable Multiresolutions
        7.7.2 Two-Dimensional Wavelet Bases
        7.7.3 Fast Two-Dimensional Wavelet Transform
        7.7.4 Wavelet Bases in Higher Dimensions ²
    7.8 Problems

8 Wavelet Packet and Local Cosine Bases
    8.1 Wavelet Packets ²
        8.1.1 Wavelet Packet Tree
        8.1.2 Time-Frequency Localization
        8.1.3 Particular Wavelet Packet Bases
        8.1.4 Wavelet Packet Filter Banks
    8.2 Image Wavelet Packets ²
        8.2.1 Wavelet Packet Quad-Tree
        8.2.2 Separable Filter Banks
    8.3 Block Transforms ¹
        8.3.1 Block Bases
        8.3.2 Cosine Bases
        8.3.3 Discrete Cosine Bases
        8.3.4 Fast Discrete Cosine Transforms ²
    8.4 Lapped Orthogonal Transforms ²
        8.4.1 Lapped Projectors
        8.4.2 Lapped Orthogonal Bases
        8.4.3 Local Cosine Bases
        8.4.4 Discrete Lapped Transforms
    8.5 Local Cosine Trees ²
        8.5.1 Binary Tree of Cosine Bases
        8.5.2 Tree of Discrete Bases
        8.5.3 Image Cosine Quad-Tree
    8.6 Problems

9 An Approximation Tour
    9.1 Linear Approximations ¹
        9.1.1 Linear Approximation Error
        9.1.2 Linear Fourier Approximations
        9.1.3 Linear Multiresolution Approximations
        9.1.4 Karhunen-Loève Approximations ²
    9.2 Non-Linear Approximations ¹
        9.2.1 Non-Linear Approximation Error
        9.2.2 Wavelet Adaptive Grids
        9.2.3 Besov Spaces ³
        9.2.4 Image Approximations with Wavelets
    9.3 Adaptive Basis Selection ²
        9.3.1 Best Basis and Schur Concavity
        9.3.2 Fast Best Basis Search in Trees
        9.3.3 Wavelet Packet and Local Cosine Best Bases
    9.4 Approximations with Pursuits ³
        9.4.1 Basis Pursuit
        9.4.2 Matching Pursuit
        9.4.3 Orthogonal Matching Pursuit
    9.5 Problems

10 Estimations Are Approximations
    10.1 Bayes Versus Minimax ²
        10.1.1 Bayes Estimation
        10.1.2 Minimax Estimation
    10.2 Diagonal Estimation in a Basis ²
        10.2.1 Diagonal Estimation with Oracles
        10.2.2 Thresholding Estimation
        10.2.3 Thresholding Refinements ³
        10.2.4 Wavelet Thresholding
        10.2.5 Best Basis Thresholding ³
    10.3 Minimax Optimality ³
        10.3.1 Linear Diagonal Minimax Estimation
        10.3.2 Orthosymmetric Sets
        10.3.3 Nearly Minimax with Wavelets
    10.4 Restoration ³
        10.4.1 Estimation in Arbitrary Gaussian Noise
        10.4.2 Inverse Problems and Deconvolution
    10.5 Coherent Estimation ³
        10.5.1 Coherent Basis Thresholding
        10.5.2 Coherent Matching Pursuit
    10.6 Spectrum Estimation ²
        10.6.1 Power Spectrum
        10.6.2 Approximate Karhunen-Loève Search ³
        10.6.3 Locally Stationary Processes ³
    10.7 Problems

11 Transform Coding
    11.1 Signal Compression ²
        11.1.1 State of the Art
        11.1.2 Compression in Orthonormal Bases
    11.2 Distortion Rate of Quantization ²
        11.2.1 Entropy Coding
        11.2.2 Scalar Quantization
    11.3 High Bit Rate Compression ²
        11.3.1 Bit Allocation
        11.3.2 Optimal Basis and Karhunen-Loève
        11.3.3 Transparent Audio Code
    11.4 Image Compression ²
        11.4.1 Deterministic Distortion Rate
        11.4.2 Wavelet Image Coding
        11.4.3 Block Cosine Image Coding
        11.4.4 Embedded Transform Coding
        11.4.5 Minimax Distortion Rate ³
    11.5 Video Signals ²
        11.5.1 Optical Flow
        11.5.2 MPEG Video Compression
    11.6 Problems

A Mathematical Complements
    A.1 Functions and Integration
    A.2 Banach and Hilbert Spaces
    A.3 Bases of Hilbert Spaces
    A.4 Linear Operators
    A.5 Separable Spaces and Bases
    A.6 Random Vectors and Covariance Operators
    A.7 Diracs

B Software Toolboxes
    B.1 WaveLab
    B.2 LastWave
    B.3 Freeware Wavelet Toolboxes

Preface

Facing the unusual popularity of wavelets in sciences, I began to
wonder whether this was just another fashion that would fade away
with time. After several years of research and teaching on this topic,
and surviving the painful experience of writing a book, you may rightly
expect that I have calmed my anguish. This might be the natural
self-delusion affecting any researcher studying his corner of the world, but
there might be more.
Wavelets are not based on a "bright new idea", but on concepts
that already existed under various forms in many different fields. The
formalization and emergence of this "wavelet theory" is the result of a
multidisciplinary effort that brought together mathematicians, physicists
and engineers, who recognized that they were independently developing
similar ideas. For signal processing, this connection has created
a flow of ideas that goes well beyond the construction of new bases or
transforms.

A Personal Experience

At one point, you cannot avoid mentioning who did what. For wavelets,
this is a particularly sensitive task,
risking aggressive replies from forgotten scientific tribes arguing that
such and such results originally belong to them. As I said, this wavelet
theory is truly the result of a dialogue between scientists who often met
by chance, and were ready to listen. From my totally subjective point
of view, among the many researchers who made important contributions,
I would like to single out one, Yves Meyer, whose deep scientific
vision was a major ingredient sparking this catalysis. It is ironic to
see a French pure mathematician, raised in a Bourbakist culture where
applied meant trivial, playing a central role along this wavelet bridge
between engineers and scientists coming from different disciplines.
When beginning my Ph.D. in the U.S., the only project I had in
mind was to travel, never become a researcher, and certainly never
teach. I had clearly destined myself to come back to France, and quickly
begin climbing the ladder of some big corporation. Ten years later, I
was still in the U.S., the mind buried in the hole of some obscure
scientific problem, while teaching in a university. So what went wrong?
Probably the fact that I met scientists like Yves Meyer, whose ethic
and creativity have given me a totally different view of research and
teaching. Trying to communicate this flame was a central motivation
for writing this book. I hope that you will excuse me if my prose ends
up too often in the no man's land of scientific neutrality.

A Few Ideas

Beyond mathematics and algorithms, the book carries a few important
ideas that I would like to emphasize.

• Time-frequency wedding. Important information often appears
through a simultaneous analysis of the signal's time and frequency
properties. This motivates decompositions over elementary "atoms"
that are well concentrated in time and frequency. It is therefore
necessary to understand how the uncertainty principle limits the
flexibility of time and frequency transforms.
• Scale for zooming. Wavelets are scaled waveforms that measure
signal variations. By traveling through scales, zooming procedures
provide powerful characterizations of signal structures such as
singularities.
• More and more bases. Many orthonormal bases can be designed
with fast computational algorithms. The discovery of filter banks
and wavelet bases has created a popular new sport of basis hunting.
Families of orthogonal bases are created every day. This game may
however become tedious if not motivated by applications.
• Sparse representations. An orthonormal basis is useful if it
defines a representation where signals are well approximated with
a few non-zero coefficients. Applications to signal estimation in
noise and image compression are closely related to approximation
theory.
• Try it non-linear and diagonal. Linearity has long predominated
because of its apparent simplicity. We are used to slogans that
often hide the limitations of "optimal" linear procedures such as
Wiener filtering or Karhunen-Loève basis expansions. In sparse
representations, simple non-linear diagonal operators can
considerably outperform "optimal" linear procedures, and fast
algorithms are available.

WaveLab and LastWave Toolboxes

Numerical experiments are necessary to fully understand the algorithms
and theorems in this book. To avoid the painful programming of standard
procedures, nearly all wavelet and time-frequency algorithms are
available in the WaveLab package, programmed in Matlab. WaveLab is
freeware that can be retrieved through the Internet. The correspondence
between algorithms and WaveLab subroutines is explained in Appendix B.
All computational figures can be reproduced as demos in WaveLab.
LastWave is a wavelet signal and image processing environment, written
in C for X11/Unix and Macintosh computers. This stand-alone freeware
does not require any additional commercial package. It is also
described in Appendix B.

Teaching

This book is intended as a graduate textbook. It took form after
teaching "wavelet signal processing" courses in electrical engineering
departments at MIT and Tel Aviv University, and in applied
mathematics departments at the Courant Institute and École
Polytechnique (Paris).
In electrical engineering, students are often initially frightened by
the use of vector space formalism as opposed to simple linear algebra.
The predominance of linear time-invariant systems has led many to
think that convolutions and the Fourier transform are mathematically
sufficient to handle all applications. Sadly enough, this is not the case.
The mathematics used in the book are not motivated by theoretical
beauty; they are truly necessary to face the complexity of transient
signal processing. Discovering the use of higher level mathematics
happens to be an important pedagogical side-effect of this course.
Numerical algorithms and figures escort most theorems. The use of WaveLab
makes it particularly easy to include numerical simulations in homework. Exercises and deeper problems for class projects are listed at the
end of each chapter.
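
A homework-style simulation of the "non-linear and diagonal" idea listed earlier can be sketched in a few lines. The code below is not WaveLab code. It is a minimal Python/NumPy sketch that assumes a hand-written orthonormal Haar transform; the helper names `haar_forward`, `haar_inverse` and `squared_error`, the test signal, the noise level, the threshold (a common choice, taken here as an assumption) and the number of coefficients kept by the linear operator are all illustrative choices. It compares a linear diagonal operator, which keeps only the coarse-scale coefficients, with a hard thresholding of the same coefficients.

```python
import numpy as np

def haar_forward(x):
    """Orthonormal Haar coefficients of a signal whose length is a power of 2."""
    approx = np.asarray(x, dtype=float)
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail at this scale
        approx = (even + odd) / np.sqrt(2.0)         # coarser approximation
    # Layout: [coarsest average, then details from coarse to fine scales].
    return np.concatenate([approx] + details[::-1])

def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    coeffs = np.asarray(coeffs, dtype=float)
    approx, pos = coeffs[:1], 1
    while pos < coeffs.size:
        detail = coeffs[pos:pos + approx.size]
        pos += approx.size
        finer = np.empty(2 * approx.size)
        finer[0::2] = (approx + detail) / np.sqrt(2.0)
        finer[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = finer
    return approx

# Hypothetical test case: a piecewise constant signal in white Gaussian noise.
rng = np.random.default_rng(0)
n = 1024
t = np.arange(n) / n
signal = np.where(t < 0.3, 1.0, np.where(t < 0.7, -0.5, 2.0))
sigma = 0.2
noisy = signal + sigma * rng.standard_normal(n)
coeffs = haar_forward(noisy)

# Linear diagonal operator: keep the 64 coarsest coefficients, whatever the data.
linear_coeffs = coeffs.copy()
linear_coeffs[64:] = 0.0
linear_estimate = haar_inverse(linear_coeffs)

# Non-linear diagonal operator: keep a coefficient only if it is large.
threshold = sigma * np.sqrt(2.0 * np.log(n))  # assumed threshold choice
thresholded = np.where(np.abs(coeffs) > threshold, coeffs, 0.0)
nonlinear_estimate = haar_inverse(thresholded)

def squared_error(estimate):
    """Squared Euclidean distance to the noise-free signal."""
    return float(np.sum((estimate - signal) ** 2))

print("noisy     :", squared_error(noisy))
print("linear    :", squared_error(linear_estimate))
print("non-linear:", squared_error(nonlinear_estimate))
```

On this piecewise constant signal, thresholding should give the smaller error: the few large coefficients created by the two jumps are kept at every scale, whereas the linear operator discards all fine-scale coefficients, including those that carry the jumps.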
In applied mathematics, this course is an introduction to wavelets
but also to signal processing. Signal processing is a newcomer on the
stage of legitimate applied mathematics topics. Yet, it is spectacularly
well adapted to illustrate the applied mathematics chain, from problem
modeling to efficient calculations of solutions and theorem proving.
Images and sounds give a sensual contact with theorems, that can wake up
most students. For teaching, formatted overhead transparencies with
enlarged figures are available on the Internet:
http://www.cmap.polytechnique.fr/~mallat/Wavetour_figures/
François Chaplais also offers an introductory Web tour of basic concepts
in the book at
http://cas.ensmp.fr/~chaplais/Wavetour_presentation/
Not all theorems of the book are proved in detail, but the important
techniques are included. I hope that the reader will excuse the lack
of mathematical rigor in the many instances where I have privileged
ideas over details. Few proofs are long; they are concentrated to avoid
diluting the mathematics into many intermediate results, which would
obscure the text.

Course Design

Level numbers explained in Section 1.5.2 can help in designing an
introductory or a more advanced course. Beginning with
a review of the Fourier transform is often necessary. Although most
applied mathematics students have already seen the Fourier transform,
they have rarely had the time to understand it well. A non-technical review can stress applications, including the sampling theorem. Refreshing basic mathematical results is also needed for electrical engineering
students. A mathematically oriented review of time-invariant signal
processing in Chapters 2 and 3 is the occasion to remind the student of
elementary properties of linear operators, projectors and vector spaces,
which can be found in Appendix A. For a course of a single semester,
one can follow several paths, oriented by different themes. Here are a few
possibilities.
One can teach a course that surveys the key ideas previously outlined.
Chapter 4 is particularly important in introducing the concept of
local time-frequency decompositions. Section 4.4 on instantaneous
frequencies illustrates the limitations of time-frequency resolution.
Chapter 6 gives a different perspective on the wavelet transform, by relating
the local regularity of a signal to the decay of its wavelet coefficients
across scales. It is useful to stress the importance of the wavelet vanishing
moments. The course can continue with the presentation of wavelet
bases in Chapter 7, and concentrate on Sections 7.1-7.3 on orthogonal
bases, multiresolution approximations and filter bank algorithms in one
dimension. Linear and non-linear approximations in wavelet bases are
covered in Chapter 9. Depending upon students' backgrounds and interests,
the course can finish in Chapter 10 with an application to signal
estimation with wavelet thresholding, or in Chapter 11 by presenting
image transform codes in wavelet bases.
A different course may study the construction of new orthogonal
bases and their applications. Beginning with the wavelet basis, Chapter 7
also gives an introduction to filter banks. Continuing with Chapter
8 on wavelet packet and local cosine bases introduces different orthogonal
tilings of the time-frequency plane. It explains the main ideas
of time-frequency decompositions. Chapter 9 on linear and non-linear
approximation is then particularly important for understanding how to
measure the efficiency of these bases, and for studying best bases search
procedures. To illustrate the differences between linear and non-linear
approximation procedures, one can compare the linear and non-linear
thresholding estimations studied in Chapter 10.
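
The difference between linear and non-linear approximation mentioned above can be tried in any orthonormal basis. The following is a minimal sketch under stated assumptions: it uses the orthonormal discrete Fourier basis supplied by NumPy as a stand-in for the wavelet packet or local cosine bases discussed in the book, and the test signal, the value of `m` and the helper name `approximation_error` are arbitrary illustrative choices. The linear approximation keeps the `m` lowest frequencies regardless of the signal, whereas the non-linear approximation keeps the `m` coefficients of largest amplitude.

```python
import numpy as np

n = 512
t = np.arange(n) / n
# Hypothetical test signal: a smooth oscillation plus a short
# high-frequency burst (a transient).
burst = np.where((t >= 0.5) & (t < 0.5625), np.sin(2 * np.pi * 96 * t), 0.0)
f = np.sin(2 * np.pi * 3 * t) + burst

# Coefficients of f in the orthonormal discrete Fourier basis.
coeffs = np.fft.fft(f, norm="ortho")
freqs = np.abs(np.fft.fftfreq(n, d=1.0 / n))  # |k| for each coefficient

def approximation_error(keep_index):
    """Euclidean error after keeping only the coefficients in keep_index."""
    kept = np.zeros_like(coeffs)
    kept[keep_index] = coeffs[keep_index]
    return float(np.linalg.norm(f - np.fft.ifft(kept, norm="ortho")))

m = 32
# Linear: the m lowest frequencies, chosen before seeing the signal.
linear_index = np.argsort(freqs, kind="stable")[:m]
# Non-linear: the m largest coefficients, chosen from the signal itself.
nonlinear_index = np.argsort(-np.abs(coeffs), kind="stable")[:m]

linear_error = approximation_error(linear_index)
nonlinear_error = approximation_error(nonlinear_index)
print("linear error    :", linear_error)
print("non-linear error:", nonlinear_error)
```

Because the basis is orthonormal, the squared error is the energy of the discarded coefficients, so the non-linear error can never exceed the linear one for the same `m`. How much smaller it is gives one way of measuring how well a basis is adapted to a signal.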
The course can also concentrate on the construction of sparse representations with orthonormal bases, and study applications of non-linear
diagonal operators in these bases. It may start in Chapter 10 with a
comparison of linear and non-linear operators used to estimate piecewise regular signals contaminated by a white noise. A quick excursion
in Chapter 9 introduces linear and non-linear approximations to explain
what a sparse representation is. Wavelet orthonormal bases are
then presented in Chapter 7, with special emphasis on their non-linear
approximation properties for piecewise regular signals. An application
of non-linear diagonal operators to image compression or to thresholding estimation should then be studied in some detail, to motivate the
use of modern mathematics for understanding these problems.
A more advanced course can emphasize non-linear and adaptive signal
processing. Chapter 5 on frames introduces flexible tools that are
useful in analyzing the properties of non-linear representations such
as irregularly sampled transforms. The dyadic wavelet maxima
representation illustrates the frame theory, with applications to multiscale
edge detection. To study applications of adaptive representations with
orthonormal bases, one might start with non-linear and adaptive
approximations, introduced in Chapter 9. Best bases, basis pursuit or
matching pursuit algorithms are examples of adaptive transforms that
construct sparse representations for complex signals. A central issue is
to understand to what extent adaptivity improves applications such as
noise removal or signal compression, depending on the signal properties.

Responsibilities

This book was a one-year project that ended up in a never will finish
nightmare. Ruzena Bajcsy bears a major responsibility for not
encouraging me to choose another profession, while guiding
my first research steps. Her profound scientific intuition opened my
eyes to and well beyond computer vision. Then, of course, there are all the
collaborators who could have done a much better job of showing me
that science is a selfish world where only competition counts. The
wavelet story was initiated by remarkable scientists like Alex
Grossmann, whose modesty created a warm atmosphere of collaboration,
where strange new ideas and ingenuity were welcome as elements of
creativity.
I am also grateful to the few people who have been willing to work
with me. Some have less merit because they had to finish their degree,
but others did it on a voluntary basis. I am thinking of Amir
Averbuch, Emmanuel Bacry, François Bergeaud, Geoff Davis, Davi
Geiger, Frédéric Falzon, Wen Liang Hwang, Hamid Krim, George
Papanicolaou, Jean-Jacques Slotine, Alan Willsky, Zifeng Zhang and Sifen
Zhong. Their patience will certainly be rewarded in a future life.
Although the reproduction of these 600 pages will probably not kill
many trees, I do not want to bear the responsibility alone. After four
years writing and rewriting each chapter, I first saw the end of the
tunnel during a working retreat at the Fondation des Treilles, which
offers an exceptional environment to think, write and eat in Provence.
With WaveLab, David Donoho saved me from spending the second
half of my life programming wavelet algorithms. This opportunity was
beautifully implemented by Maureen Clerc and Jérôme Kalifa, who
made all the figures and found many more mistakes than I dare say.
Dear reader, you should thank Barbara Burke Hubbard, who corrected
my Frenglish (remaining errors are mine), and forced me to modify
many notations and explanations. I thank her for doing it with tact
and humor. My editor, Chuck Glaser, had the patience to wait but I
appreciate even more his wisdom to let me think that I would finish in
a year.
She will not read this book, yet my deepest gratitude goes to Branka
with whom life has nothing to do with wavelets.
Stéphane Mallat

Second Edition

Before leaving this Wavelet Tour, I naively thought that I should
take advantage of remarks and suggestions made by readers. This almost
got out of hand, and 200 pages ended up being rewritten. Let me
outline the main components that were not in the first edition.

• Bayes versus Minimax. Classical signal processing is almost
entirely built in a Bayes framework, where signals are viewed as
realizations of a random vector. For the last two decades,
researchers have tried to model images with random vectors, but in
vain. It is thus time to wonder whether this is really the best
approach. Minimax theory opens an easier avenue for evaluating the
performance of estimation and compression algorithms. It uses
deterministic models that can be constructed even for complex
signals such as images. Chapter 10 is rewritten and expanded to
explain and compare the Bayes and minimax points of view.
• Bounded Variation Signals. Wavelet transforms provide sparse
representations of piecewise regular signals. The total variation
norm gives an intuitive and precise mathematical framework in
which to characterize the piecewise regularity of signals and
images. In this second edition, the total variation is used to compute
approximation errors, to evaluate the risk when removing noise
from images, and to analyze the distortion rate of image transform
codes.
• Normalized Scale. Continuous mathematics give asymptotic
results when the signal resolution $N$ increases. In this framework,
the signal support is fixed, say $[0, 1]$, and the sampling interval
$N^{-1}$ is progressively reduced. In contrast, digital signal
processing algorithms are often presented by normalizing the sampling
interval to 1, which means that the support $[0, N]$ increases with
$N$. This new edition explains both points of view, but the figures
now display signals with a support normalized to $[0, 1]$, in
accordance with the theorems. The scale parameter of the wavelet
transform is thus smaller than 1.
• Video Compression. Compressing video sequences is of prime
importance for real time transmission with low-bandwidth channels
such as the Internet or telephone lines. Motion compensation
algorithms are presented at the end of Chapter 11.

Notation

$\langle f, g \rangle$      Inner product (A.6).
$\|f\|$                     Norm (A.3).
$f[n] = O(g[n])$            Order of: there exists $K$ such that $f[n] \leq K g[n]$.
$f[n] = o(g[n])$            Small order of: $\lim_{n \to +\infty} \frac{f[n]}{g[n]} = 0$.
$f[n] \sim g[n]$            Equivalent to: $f[n] = O(g[n])$ and $g[n] = O(f[n])$.
$A < +\infty$               $A$ is finite.
$A \gg B$                   $A$ is much bigger than $B$.
$z^*$                       Complex conjugate of $z \in \mathbb{C}$.
$\lfloor x \rfloor$         Largest integer $n \leq x$.
$\lceil x \rceil$           Smallest integer $n \geq x$.
$n \bmod N$                 Remainder of the integer division of $n$ modulo $N$.

Sets

$\mathbb{N}$                Positive integers including 0.
$\mathbb{Z}$                Integers.
$\mathbb{R}$                Real numbers.
$\mathbb{R}^+$              Positive real numbers.
$\mathbb{C}$                Complex numbers.

Signals

$f(t)$                      Continuous time signal.
$f[n]$                      Discrete signal.
$\delta(t)$                 Dirac distribution (A.30).
$\delta[n]$                 Discrete Dirac (3.17).
$1_{[a,b]}$                 Indicator function which is 1 in $[a, b]$ and 0 outside.

Spaces

$C_0$                       Uniformly continuous functions (7.240).
$C^p$                       $p$ times continuously differentiable functions.
$C^\infty$                  Infinitely differentiable functions.
$W^s(\mathbb{R})$           Sobolev $s$ times differentiable functions (9.5).
$L^2(\mathbb{R})$           Finite energy functions $\int |f(t)|^2 \, dt < +\infty$.
$L^p(\mathbb{R})$           Functions such that $\int |f(t)|^p \, dt < +\infty$.
$l^2(\mathbb{Z})$           Finite energy discrete signals $\sum_{n=-\infty}^{+\infty} |f[n]|^2 < +\infty$.
$l^p(\mathbb{Z})$           Discrete signals such that $\sum_{n=-\infty}^{+\infty} |f[n]|^p < +\infty$.
$\mathbb{C}^N$              Complex signals of size $N$.
$U \oplus V$                Direct sum of two vector spaces.
$U \otimes V$               Tensor product of two vector spaces (A.19).

Operators

$\mathrm{Id}$               Identity.
$f'(t)$                     Derivative $\frac{df(t)}{dt}$.
$f^{(p)}(t)$                Derivative $\frac{d^p f(t)}{dt^p}$ of order $p$.
$\vec{\nabla} f(x, y)$      Gradient vector (6.54).
$f \star g(t)$              Continuous time convolution (2.2).
$f \star g[n]$              Discrete convolution (3.18).
$f \circledast g[n]$        Circular convolution (3.58).

Transforms

$\hat{f}(\omega)$           Fourier transform (2.6), (3.24).
$\hat{f}[k]$                Discrete Fourier transform (3.34).
$Sf(u, s)$                  Short-time windowed Fourier transform (4.11).
$P_S f(u, \xi)$             Spectrogram (4.12).
$Wf(u, s)$                  Wavelet transform (4.31).
$P_W f(u, \xi)$             Scalogram (4.55).
$P_V f(u, \xi)$             Wigner-Ville distribution (4.108).
$Af(u, \xi)$                Ambiguity function (4.24).

Probability

$X$                         Random variable.
$E\{X\}$                    Expected value.
$H(X)$                      Entropy (11.4).
$H_d(X)$                    Differential entropy (11.20).
$\mathrm{Cov}(X_1, X_2)$    Covariance (A.22).
$F[n]$                      Random vector.
$R_F[k]$                    Autocovariance of a stationary process (A.26).

Chapter 1

Introduction to a Transient World

After a few minutes in a restaurant we cease to notice the annoying
hubbub of surrounding conversations, but a sudden silence reminds
us of the presence of neighbors. Our attention is clearly attracted by
transients and movements as opposed to stationary stimuli, which we
soon ignore. Concentrating on transients is probably a strategy for
selecting important information from the overwhelming amount of data
recorded by our senses. Yet, classical signal processing has devoted
most of its efforts to the design of time-invariant and space-invariant
operators that modify stationary signal properties. This has led to the
indisputable hegemony of the Fourier transform, but leaves aside many
information-processing applications.
The world of transients is considerably larger and more complex
than the garden of stationary signals. The search for an ideal Fourier-like basis that would simplify most signal processing is therefore a hopeless quest. Instead, a multitude of different transforms and bases have
proliferated, among which wavelets are just one example. This book
gives a guided tour in this jungle of new mathematical and algorithmic
results, while trying to provide an intuitive sense of orientation. Major
ideas are outlined in this first chapter. Section 1.5.2 serves as a travel
guide and introduces the reproducible experiment approach based on
the WaveLab and LastWave software packages. It also discusses the use of
level numbers, which are landmarks that can help the reader keep to the main
roads.

1.1 Fourier Kingdom
The Fourier transform rules over linear time-invariant signal processing
because sinusoidal waves $e^{i\omega t}$ are eigenvectors of linear time-invariant
operators. A linear time-invariant operator $L$ is entirely specified by
the eigenvalues $\hat h(\omega)$:
$$\forall \omega \in \mathbb{R}, \quad L e^{i\omega t} = \hat h(\omega)\, e^{i\omega t}. \tag{1.1}$$
To compute $Lf$, a signal $f$ is decomposed as a sum of sinusoidal eigenvectors $\{e^{i\omega t}\}_{\omega \in \mathbb{R}}$:
$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\, e^{i\omega t}\, d\omega. \tag{1.2}$$
If $f$ has finite energy, the theory of Fourier integrals presented in
Chapter 2 proves that the amplitude $\hat f(\omega)$ of each sinusoidal wave $e^{i\omega t}$
is the Fourier transform of $f$:
$$\hat f(\omega) = \int_{-\infty}^{+\infty} f(t)\, e^{-i\omega t}\, dt. \tag{1.3}$$
Applying the operator $L$ to $f$ in (1.2) and inserting the eigenvector
expression (1.1) gives
$$Lf(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\, \hat h(\omega)\, e^{i\omega t}\, d\omega. \tag{1.4}$$
The operator $L$ amplifies or attenuates each sinusoidal component $e^{i\omega t}$
of $f$ by $\hat h(\omega)$. It is a frequency filtering of $f$.
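The eigenvector relation (1.1) and the filtering formula (1.4) can be checked numerically on a discrete, periodic analogue. The sketch below is only an illustration under assumptions that are not part of the text above: signals have a finite size N, the time-invariant operator is a circular convolution, and the discrete Fourier transform replaces the Fourier integral. The function names and the impulse response `h` are hypothetical.

```python
import cmath

def dft(x):
    """Discrete Fourier transform: X[k] = sum_n x[n] exp(-2i*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def circular_convolve(f, h):
    """Time-invariant operator on periodic signals: Lf[n] = sum_p f[p] h[(n-p) mod N]."""
    N = len(f)
    return [sum(f[p] * h[(n - p) % N] for p in range(N)) for n in range(N)]

N = 8
# Hypothetical impulse response (a small smoothing filter), chosen for illustration.
h = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]
h_hat = dft(h)  # discrete counterpart of the eigenvalues h_hat(omega)

# Discrete version of (1.1): a complex exponential is an eigenvector of L,
# and the eigenvalue is h_hat[k].
k = 3
e_k = [cmath.exp(2j * cmath.pi * k * n / N) for n in range(N)]
L_e_k = circular_convolve(e_k, h)
eigen_error = max(abs(L_e_k[n] - h_hat[k] * e_k[n]) for n in range(N))

# Discrete version of (1.4): the transform of Lf is the product f_hat * h_hat.
f = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.0]
lhs = dft(circular_convolve(f, h))
rhs = [a * b for a, b in zip(dft(f), h_hat)]
filter_error = max(abs(a - b) for a, b in zip(lhs, rhs))

print(eigen_error, filter_error)  # both should be at rounding-error level
```

Both errors stay at the level of floating-point rounding, which is the discrete counterpart of saying that $L$ acts on each sinusoidal component by a simple multiplication.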
As long as we are satisfied with linear time-invariant operators,
the Fourier transform provides simple answers to most questions. Its
richness makes it suitable for a wide range of applications such as signal
transmissions or stationary signal processing.
However, if we are interested in transient phenomena, such as a word pronounced at a particular time or an apple located in the left corner of an
image, the Fourier transform becomes a cumbersome tool.
The Fourier coefficient is obtained in (1.3) by correlating $f$ with a
sinusoidal wave $e^{i\omega t}$. Since the support of $e^{i\omega t}$ covers the whole real
line, $\hat f(\omega)$ depends on the values $f(t)$ for all times $t \in \mathbb{R}$. This global
"mix" of information makes it difficult to analyze any local property of
$f$ from $\hat f$. Chapter 4 introduces local time-frequency transforms, which
decompose the signal over waveforms that are well localized in time
and frequency.

1.2 Time-Frequency Wedding
The uncertainty principle states that the energy spread of a function
and its Fourier transform cannot be simultaneously arbitrarily small.
Motivated by quantum mechanics, in 1946 the physicist Gabor [187]
defined elementary time-frequency atoms as waveforms that have a
minimal spread in a time-frequency plane. To measure time-frequency
"information" content, he proposed decomposing signals over these elementary atomic waveforms. By showing that such decompositions are
closely related to our sensitivity to sounds, and that they exhibit important structures in speech and music recordings, Gabor demonstrated
the importance of localized time-frequency signal processing.
Chapter 4 studies the properties of windowed Fourier and wavelet
transforms, computed by decomposing the signal over different families of time-frequency atoms. Other transforms can also be defined by
modifying the family of time-frequency atoms. A unified interpretation
of local time-frequency decompositions follows the time-frequency energy density approach of Ville. In parallel to Gabor's contribution, in
1948 Ville [342], who was an electrical engineer, proposed analyzing the
time-frequency properties of signals $f$ with an energy density defined
by
$$P_V f(t,\omega) = \int_{-\infty}^{+\infty} f\left(t + \frac{\tau}{2}\right) f^*\left(t - \frac{\tau}{2}\right) e^{-i\tau\omega}\, d\tau.$$
Once again, theoretical physics was ahead, since this distribution had
already been introduced in 1932 by Wigner [351] in the context of
quantum mechanics. Chapter 4 explains the path that relates Wigner-Ville distributions to windowed Fourier and wavelet transforms, or any
linear time-frequency transform.

1.2.1 Windowed Fourier Transform
Gabor atoms are constructed by translating in time and frequency a
time window $g$:
$$g_{u,\xi}(t) = g(t-u)\, e^{i\xi t}.$$
The energy of $g_{u,\xi}$ is concentrated in the neighborhood of $u$ over an
interval of size $\sigma_t$, measured by the standard deviation of $|g|^2$. Its
Fourier transform is a translation by $\xi$ of the Fourier transform $\hat g$ of $g$:
$$\hat g_{u,\xi}(\omega) = \hat g(\omega - \xi)\, e^{-iu(\omega - \xi)}. \tag{1.5}$$
The energy of $\hat g_{u,\xi}$ is therefore localized near the frequency $\xi$, over an
interval of size $\sigma_\omega$, which measures the domain where $\hat g(\omega)$ is non-negligible. In a time-frequency plane $(t,\omega)$, the energy spread of the
atom $g_{u,\xi}$ is symbolically represented by the Heisenberg rectangle illustrated by Figure 1.1. This rectangle is centered at $(u,\xi)$ and has a time
width $\sigma_t$ and a frequency width $\sigma_\omega$. The uncertainty principle proves
that its area satisfies
$$\sigma_t\, \sigma_\omega \ge \frac{1}{2}.$$
This area is minimum when $g$ is a Gaussian, in which case the atoms
$g_{u,\xi}$ are called Gabor functions.

Figure 1.1: Time-frequency boxes ("Heisenberg rectangles") representing the energy spread of two Gabor atoms.

The windowed Fourier transform defined by Gabor correlates a signal $f$ with each atom $g_{u,\xi}$:
$$Sf(u,\xi) = \int_{-\infty}^{+\infty} f(t)\, g^*_{u,\xi}(t)\, dt = \int_{-\infty}^{+\infty} f(t)\, g(t-u)\, e^{-i\xi t}\, dt. \tag{1.6}$$
It is a Fourier integral that is localized in the neighborhood of $u$ by the
window $g(t-u)$. This time integral can also be written as a frequency
integral by applying the Fourier Parseval formula (2.25):
$$Sf(u,\xi) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\, \hat g^*_{u,\xi}(\omega)\, d\omega. \tag{1.7}$$
The transform $Sf(u,\xi)$ thus depends only on the values $f(t)$ and $\hat f(\omega)$
in the time and frequency neighborhoods where the energies of $g_{u,\xi}$
and $\hat g_{u,\xi}$ are concentrated. Gabor interprets this as a "quantum of
information" over the time-frequency rectangle illustrated in Figure
1.1.
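To make the localization in (1.6) concrete, the following sketch approximates the windowed Fourier integral by a Riemann sum. Everything specific in it is an assumption made for illustration: the test signal (frequency 5 before $t = 10$, frequency 15 after), the real Gaussian window of unit time spread, the sampling step, and the coarse grid of candidate frequencies.

```python
import cmath
import math

def signal(t):
    """Hypothetical test signal: frequency 5 before t = 10, frequency 15 after."""
    return math.cos(5.0 * t) if t < 10.0 else math.cos(15.0 * t)

def window(t):
    """Real Gaussian window g(t), unnormalized, with unit time spread (an assumption)."""
    return math.exp(-0.5 * t * t)

def windowed_fourier(u, xi, dt=0.02, t_max=20.0):
    """Riemann-sum approximation of (1.6): sum over t of f(t) g(t - u) exp(-i xi t) dt."""
    n_samples = int(round(t_max / dt))
    total = 0j
    for n in range(n_samples):
        t = n * dt
        total += signal(t) * window(t - u) * cmath.exp(-1j * xi * t)
    return total * dt

def dominant_frequency(u, frequencies):
    """Frequency xi at which |Sf(u, xi)| is largest for a fixed time u."""
    return max(frequencies, key=lambda xi: abs(windowed_fourier(u, xi)))

frequencies = [float(k) for k in range(0, 21)]
early = dominant_frequency(5.0, frequencies)   # window centered before the change
late = dominant_frequency(15.0, frequencies)   # window centered after the change
print(early, late)
```

With these choices the largest coefficient sits at the frequency that is active under the window, so the dominant frequency differs between the two window positions, whereas the global Fourier transform (1.3) would only show that both frequencies are present somewhere in the signal.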
When listening to music, we perceive sounds that have a frequency
that varies in time. Measuring time-varying harmonics is an important
application of windowed Fourier transforms in both music and speech
recognition. A spectral line of $f$ creates high amplitude windowed
Fourier coefficients $Sf(u,\xi)$ at frequencies $\xi(u)$ that depend on the
time $u$. The time evolution of such spectral components is therefore
analyzed by following the location of large amplitude coefficients.

1.2.2 Wavelet Transform

In reflection seismology, Morlet knew that the modulated pulses sent
underground have a duration that is too long at high frequencies to
separate the returns of fine, closely-spaced layers. Instead of emitting
pulses of equal duration, he thus thought of sending shorter waveforms
at high frequencies. Such waveforms are simply obtained by scaling a
single function called a wavelet. Although Grossmann was working in
theoretical physics, he recognized in Morlet's approach some ideas that
were close to his own work on coherent quantum states. Nearly forty
years after Gabor, Morlet and Grossmann reactivated a fundamental
collaboration between theoretical physics and signal processing, which
led to the formalization of the continuous wavelet transform [200]. Yet,
these ideas were not totally new to mathematicians working in harmonic
analysis, or to computer vision researchers studying multiscale image
processing. It was thus only the beginning of a rapid catalysis that
brought together scientists with very different backgrounds, first around
coffee tables, then in more luxurious conferences.
A wavelet $\psi$ is a function of zero average:
$$\int_{-\infty}^{+\infty} \psi(t)\, dt = 0,$$
which is dilated with a scale parameter $s$, and translated by $u$:
$$\psi_{u,s}(t) = \frac{1}{\sqrt{s}}\, \psi\left(\frac{t-u}{s}\right). \tag{1.8}$$
The wavelet transform of $f$ at the scale $s$ and position $u$ is computed
by correlating $f$ with a wavelet atom:
$$Wf(u,s) = \int_{-\infty}^{+\infty} f(t)\, \frac{1}{\sqrt{s}}\, \psi^*\left(\frac{t-u}{s}\right) dt. \tag{1.9}$$

Time-Frequency Measurements  Like a windowed Fourier transform, a wavelet transform can measure the time-frequency variations of
spectral components, but it has a different time-frequency resolution.
A wavelet transform correlates $f$ with $\psi_{u,s}$. By applying the Fourier
Parseval formula (2.25), it can also be written as a frequency integration:
$$Wf(u,s) = \int_{-\infty}^{+\infty} f(t)\, \psi^*_{u,s}(t)\, dt = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\, \hat\psi^*_{u,s}(\omega)\, d\omega. \tag{1.10}$$
The wavelet coefficient $Wf(u,s)$ thus depends on the values $f(t)$ and
$\hat f(\omega)$ in the time-frequency region where the energy of $\psi_{u,s}$ and $\hat\psi_{u,s}$ is
concentrated. Time varying harmonics are detected from the position
and scale of high amplitude wavelet coefficients.
In time, $\psi_{u,s}$ is centered at $u$ with a spread proportional to $s$. Its
Fourier transform is calculated from (1.8):
$$\hat\psi_{u,s}(\omega) = e^{-iu\omega}\, \sqrt{s}\, \hat\psi(s\omega),$$
where $\hat\psi$ is the Fourier transform of $\psi$. To analyze the phase information
of signals, a complex analytic wavelet is used. This means that $\hat\psi(\omega) = 0$
for $\omega < 0$. Its energy is concentrated in a positive frequency interval
centered at $\eta$. The energy of $\hat\psi_{u,s}(\omega)$ is therefore concentrated over
a positive frequency interval centered at $\eta/s$, whose size is scaled by
$1/s$. In the time-frequency plane, a wavelet atom $\psi_{u,s}$ is symbolically
represented by a rectangle centered at $(u, \eta/s)$. The time and frequency
spread are respectively proportional to $s$ and $1/s$. When $s$ varies, the
height and width of the rectangle change but its area remains constant,
as illustrated by Figure 1.2.
Figure 1.2: Time-frequency boxes of two wavelets $\psi_{u,s}$ and $\psi_{u_0,s_0}$. When
the scale $s$ decreases, the time support is reduced but the frequency
spread increases and covers an interval that is shifted towards high
frequencies.

Multiscale Zooming  The wavelet transform can also detect and characterize transients with a zooming procedure across scales. Suppose that $\psi$ is real. Since it has a zero average, a wavelet coefficient
$Wf(u,s)$ measures the variation of $f$ in a neighborhood of $u$ whose size
is proportional to $s$. Sharp signal transitions create large amplitude
wavelet coefficients. Chapter 6 relates the pointwise regularity of $f$ to
the asymptotic decay of the wavelet transform $Wf(u,s)$, when $s$ goes
to zero. Singularities are detected by following across scales the local
maxima of the wavelet transform. In images, high amplitude wavelet
coefficients indicate the position of edges, which are sharp variations
of the image intensity. Different scales provide the contours of image
structures of varying sizes. Such multiscale edge detection is particularly effective for pattern recognition in computer vision [113].
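A small numerical experiment illustrates how a real wavelet of zero average responds to a sharp transition. The sketch below is not an algorithm from this book: it approximates the integral (1.9) with a midpoint rule, and the piecewise constant wavelet `psi`, the test signal (a slow linear trend plus a jump at $t = 0.5$) and the grids of positions and scales are all assumptions chosen for simplicity. Positions are restricted to $[0.1, 0.9]$ so that the wavelet support stays inside $[0, 1]$ and no border effect interferes.

```python
import math

def psi(t):
    """Hypothetical real wavelet of zero average: +1 on [-1/2, 0), -1 on [0, 1/2)."""
    if -0.5 <= t < 0.0:
        return 1.0
    if 0.0 <= t < 0.5:
        return -1.0
    return 0.0

def f(t):
    """Hypothetical test signal: a slow linear trend plus a jump at t = 0.5."""
    return 0.1 * t + (1.0 if t >= 0.5 else 0.0)

def wavelet_transform(u, s, dt=0.001):
    """Midpoint-rule approximation of Wf(u, s) in (1.9) over the support [0, 1]."""
    n_samples = int(round(1.0 / dt))
    total = 0.0
    for n in range(n_samples):
        t = (n + 0.5) * dt
        total += f(t) * psi((t - u) / s)
    return total * dt / math.sqrt(s)

# Interior positions only: for s <= 0.2 the wavelet support [u - s/2, u + s/2]
# then lies inside [0, 1].
positions = [k / 100.0 for k in range(10, 91)]
located = {}
amplitude = {}
for s in (0.2, 0.1, 0.05):
    # Position where the wavelet coefficient has the largest amplitude at this scale.
    located[s] = max(positions, key=lambda u: abs(wavelet_transform(u, s)))
    amplitude[s] = abs(wavelet_transform(located[s], s))
    print(s, located[s], amplitude[s])
```

At every scale the largest coefficient is found at the position of the jump, while the smooth trend only produces small coefficients. The amplitude at the jump decreases as the scale decreases, which is the kind of decay across scales that Chapter 6 relates to pointwise regularity.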
The zooming capability of the wavelet transform not only locates
isolated singular events, but can also characterize more complex multifractal signals having non-isolated singularities. Mandelbrot [43] was
the first to recognize the existence of multifractals in most corners of
nature. Scaling one part of a multifractal produces a signal that is
statistically similar to the whole. This self-similarity appears in the
wavelet transform, which modifies the analyzing scale. From the global
wavelet transform decay, one can measure the singularity distribution
of multifractals. This is particularly important in analyzing their properties and testing models that explain the formation of multifractals in
physics.

1.3 Bases of Time-Frequency Atoms
The continuous windowed Fourier transform $Sf(u,\xi)$ and the wavelet
transform $Wf(u,s)$ are two-dimensional representations of a one-dimensional signal $f$. This indicates the existence of some redundancy
that can be reduced and even removed by subsampling the parameters
of these transforms.

Frames  Windowed Fourier transforms and wavelet transforms can be
written as inner products in $L^2(\mathbb{R})$, with their respective time-frequency
atoms
$$Sf(u,\xi) = \int_{-\infty}^{+\infty} f(t)\, g^*_{u,\xi}(t)\, dt = \langle f, g_{u,\xi} \rangle$$
and
$$Wf(u,s) = \int_{-\infty}^{+\infty} f(t)\, \psi^*_{u,s}(t)\, dt = \langle f, \psi_{u,s} \rangle.$$
Subsampling both transforms defines a complete signal representation
if any signal can be reconstructed from linear combinations of discrete
families of windowed Fourier atoms $\{g_{u_n,\xi_k}\}_{(n,k) \in \mathbb{Z}^2}$ and wavelet atoms
$\{\psi_{u_n,s_j}\}_{(j,n) \in \mathbb{Z}^2}$. The frame theory of Chapter 5 discusses what conditions these families of waveforms must meet if they are to provide stable
and complete representations.
Completely eliminating the redundancy is equivalent to building
a basis of the signal space. Although wavelet bases were the first to
arrive on the research market, they have quickly been followed by other
families of orthogonal bases, such as wavelet packet and local cosine
bases.

1.3.1 Wavelet Bases and Filter Banks

In 1910, Haar [202] realized that one can construct a simple piecewise
constant function
$$\psi(t) = \begin{cases} 1 & \text{if } 0 \le t < 1/2 \\ -1 & \text{if } 1/2 \le t < 1 \\ 0 & \text{otherwise} \end{cases}$$
whose dilations and translations generate an orthonormal basis of $L^2(\mathbb{R})$:
$$\left\{ \psi_{j,n}(t) = \frac{1}{\sqrt{2^j}}\, \psi\left(\frac{t - 2^j n}{2^j}\right) \right\}_{(j,n) \in \mathbb{Z}^2}.$$
Any finite energy signal $f$ can be decomposed over this wavelet orthogonal basis $\{\psi_{j,n}\}_{(j,n) \in \mathbb{Z}^2}$:
$$f = \sum_{j=-\infty}^{+\infty} \sum_{n=-\infty}^{+\infty} \langle f, \psi_{j,n} \rangle\, \psi_{j,n}. \tag{1.11}$$
Since $\psi(t)$ has a zero average, each partial sum
$$d_j(t) = \sum_{n=-\infty}^{+\infty} \langle f, \psi_{j,n} \rangle\, \psi_{j,n}(t)$$
can be interpreted as detail variations at the scale $2^j$. These layers of
details are added at all scales to progressively improve the approximation of $f$, and ultimately recover $f$.
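The orthonormality of the dilated and translated Haar wavelets can be checked numerically on a few atoms. The sketch below only tests a small, arbitrarily chosen set of index pairs $(j, n)$, so it illustrates the property rather than proving it, and it says nothing about completeness. The inner products are computed with a midpoint rule on a dyadic grid, for which the piecewise constant integrands are integrated exactly; the helper names are hypothetical.

```python
import math

def haar(t):
    """Haar wavelet: 1 on [0, 1/2), -1 on [1/2, 1), 0 otherwise."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def haar_atom(j, n, t):
    """Dilated and translated wavelet psi_{j,n}(t) = 2^(-j/2) psi((t - 2^j n) / 2^j)."""
    scale = 2.0 ** j
    return haar((t - scale * n) / scale) / math.sqrt(scale)

def inner_product(a, b, t_min=-4.0, t_max=4.0, dt=2.0 ** -8):
    """Midpoint-rule inner product of two real functions on [t_min, t_max]."""
    n_samples = int(round((t_max - t_min) / dt))
    return dt * sum(a(t_min + (k + 0.5) * dt) * b(t_min + (k + 0.5) * dt)
                    for k in range(n_samples))

# A few (j, n) index pairs, chosen so that all supports lie inside [-4, 4].
indices = [(0, 0), (0, 1), (1, 0), (1, -1), (-1, 0), (-1, 1), (-2, 3), (2, 0)]
gram = {}
for (j, n) in indices:
    for (k, p) in indices:
        gram[(j, n), (k, p)] = inner_product(
            lambda t: haar_atom(j, n, t), lambda t: haar_atom(k, p, t))

# Largest deviation of the Gram matrix from the identity.
worst = max(abs(value - (1.0 if pair[0] == pair[1] else 0.0))
            for pair, value in gram.items())
print(worst)
```

The deviation from the identity matrix stays at rounding-error level: each atom has unit norm, and atoms at different scales or positions are orthogonal because a finer-scale wavelet has zero average over any interval where a coarser one is constant.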
If $f$ has smooth variations, we should obtain a precise approximation
when removing fine scale details, which is done by truncating the sum
(1.11). The resulting approximation at a scale $2^J$ is
$$f_J(t) = \sum_{j=J}^{+\infty} d_j(t).$$
For a Haar basis, $f_J$ is piecewise constant. Piecewise constant approximations of smooth functions are far from optimal. For example, a
piecewise linear approximation produces a smaller approximation error.
The story continues in 1980, when Strömberg [322] found a piecewise
linear function $\psi$ that also generates an orthonormal basis and gives
better approximations of smooth functions. Meyer was not aware of
this result, and motivated by the work of Morlet and Grossmann he
tried to prove that there exists no regular wavelet that generates an
orthonormal basis. This attempt was a failure since he ended up constructing a whole family of orthonormal wavelet bases, with functions
that are infinitely continuously differentiable [270]. This was the fundamental impulse that led to a widespread search for new orthonormal
wavelet bases, which culminated in the celebrated Daubechies wavelets
of compact support [144].
The systematic theory for constructing orthonormal wavelet bases
was established by Meyer and Mallat through the elaboration of multiresolution signal approximations [254], presented in Chapter 7. It
was inspired by original ideas developed in computer vision by Burt
and Adelson [108] to analyze images at several resolutions. Digging
more into the properties of orthogonal wavelets and multiresolution
approximations brought to light a surprising relation with filter banks
constructed with conjugate mirror filters.

Filter Banks  Motivated by speech compression, in 1976 Croisier,
Esteban and Galand [141] introduced an invertible filter bank, which
decomposes a discrete signal $f[n]$ into two signals of half its size, using a
filtering and subsampling procedure. They showed that $f[n]$ can be recovered from these subsampled signals by canceling the aliasing terms
with a particular class of filters called conjugate mirror filters. This
breakthrough led to a 10-year research effort to build a complete filter
bank theory. Necessary and sufficient conditions for decomposing a signal in subsampled components with a filtering scheme, and recovering
the same signal with an inverse transform, were established by Smith
and Barnwell [316], Vaidyanathan [336] and Vetterli [339].
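The filtering and subsampling procedure can be sketched with the simplest pair of conjugate mirror filters, the Haar filters, which average and difference consecutive samples. This is an illustration under that assumption, not the construction of [141]; the function names and the test signal are hypothetical, and the signal size is assumed to be a power of 2 so that the split can be iterated on the low-pass output.

```python
import math

R = 1.0 / math.sqrt(2.0)

def analyze(signal):
    """Split a signal of even size into two half-size signals (filter, then subsample by 2)."""
    half = len(signal) // 2
    approx = [R * (signal[2 * k] + signal[2 * k + 1]) for k in range(half)]  # low-pass branch
    detail = [R * (signal[2 * k] - signal[2 * k + 1]) for k in range(half)]  # high-pass branch
    return approx, detail

def synthesize(approx, detail):
    """Inverse of analyze: recover the signal from its two subsampled components."""
    out = []
    for a, d in zip(approx, detail):
        out.append(R * (a + d))
        out.append(R * (a - d))
    return out

def cascade(signal, levels):
    """Iterate the split on the low-pass output; details are stored from fine to coarse."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = analyze(approx)
        details.append(detail)
    return approx, details

def inverse_cascade(approx, details):
    """Undo cascade by merging the components from coarse to fine."""
    for detail in reversed(details):
        approx = synthesize(approx, detail)
    return approx

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = cascade(signal, 3)
rebuilt = inverse_cascade(approx, details)
reconstruction_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
energy_in = sum(x * x for x in signal)
energy_out = sum(x * x for x in approx) + sum(x * x for d in details for x in d)
print(reconstruction_error, energy_in, energy_out)
```

The signal is recovered up to rounding errors and the energy is preserved, as expected from an orthogonal transform. The total number of coefficients equals the signal size, so the representation is not redundant.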
The multiresolution theory of orthogonal wavelets proves that any
conjugate mirror filter characterizes a wavelet $\psi$ that generates an orthonormal basis of $L^2(\mathbb{R})$. Moreover, a fast discrete wavelet transform
is implemented by cascading these conjugate mirror filters. The equivalence between this continuous time wavelet theory and discrete filter
banks led to a new fruitful interface between digital signal processing
and harmonic analysis, but also created a culture shock that is not
totally resolved.

Continuous Versus Discrete and Finite  Many signal processors have been and still are wondering what is the point of these continuous
time wavelets, since all computations are performed over discrete signals, with conjugate mirror filters. Why bother with the convergence
of infinite convolution cascades if in practice we only compute a finite
number of convolutions? Answering these important questions is necessary in order to understand why throughout this book we alternate
between theorems on continuous time functions and discrete algorithms
applied to finite sequences.
A short answer would be "simplicity". In $L^2(\mathbb{R})$, a wavelet basis
is constructed by dilating and translating a single function $\psi$. Several important theorems relate the amplitude of wavelet coefficients
to the local regularity of the signal $f$. Dilations are not defined over
discrete sequences, and discrete wavelet bases have therefore a more
complicated structure. The regularity of a discrete sequence is not well
defined either, which makes it more difficult to interpret the amplitude
of wavelet coefficients. A theory of continuous time functions gives
asymptotic results for discrete sequences with sampling intervals decreasing to zero. This theory is useful because these asymptotic results
are precise enough to understand the behavior of discrete algorithms.
Continuous time models are not sufficient for elaborating discrete
signal processing algorithms. Uniformly sampling the continuous time
wavelets $\{\psi_{j,n}(t)\}_{(j,n) \in \mathbb{Z}^2}$ does not produce a discrete orthonormal basis. The transition between continuous and discrete signals must be
done with great care. Restricting the constructions to finite discrete
signals adds another layer of complexity because of border problems.
How these border issues affect numerical implementations is carefully
addressed once the properties of the bases are well understood. To
simplify the mathematical analysis, throughout the book continuous
time transforms are introduced first. Their discretization is explained
afterwards, with fast numerical algorithms over finite signals.

1.3.2 Tilings of Wavelet Packet and Local Cosine Bases

Orthonormal wavelet bases are just an appetizer. Their construction
showed that it is not only possible but relatively simple to build orthonormal bases of $L^2(\mathbb{R})$ composed of local time-frequency atoms. The
completeness and orthogonality of a wavelet basis is represented by a
tiling that covers the time-frequency plane with the wavelets' time-frequency boxes. Figure 1.3 shows the time-frequency box of each $\psi_{j,n}$,
which is translated by $2^j n$, with a time and a frequency width scaled
respectively by $2^j$ and $2^{-j}$.
One can draw many other tilings of the time-frequency plane, with
boxes of minimal surface as imposed by the uncertainty principle. Chapter 8 presents several constructions that associate large families of orthonormal bases of $L^2(\mathbb{R})$ to such new tilings.

Wavelet Packet Bases  A wavelet orthonormal basis decomposes the frequency axis in dyadic intervals whose sizes have an exponential
growth, as shown by Figure 1.3. Coifman, Meyer and Wickerhauser
[139] have generalized this fixed dyadic construction by decomposing
the frequency in intervals whose bandwidths may vary. Each frequency
interval is covered by the time-frequency boxes of wavelet packet functions that are uniformly translated in time in order to cover the whole
plane, as shown by Figure 1.4.
Wavelet packet functions are designed by generalizing the filter bank
tree that relates wavelets and conjugate mirror filters. The frequency
axis division of wavelet packets is implemented with an appropriate
sequence of iterated convolutions with conjugate mirror filters. Fast
numerical wavelet packet decompositions are thus implemented with
discrete filter banks.

Figure 1.3: The time-frequency boxes of a wavelet basis define a tiling
of the time-frequency plane.

Local Cosine Bases  Orthonormal bases of $L^2(\mathbb{R})$ can also be constructed by dividing the time axis instead of the frequency axis. The
time axis is segmented in successive finite intervals $[a_p, a_{p+1}]$. The local
cosine bases of Malvar [262] are obtained by designing smooth windows
$g_p(t)$ that cover each interval $[a_p, a_{p+1}]$, and multiplying them by cosine
functions $\cos(\xi t + \phi)$ of different frequencies. This is yet another idea
that was independently studied in physics, signal processing and mathematics. Malvar's original construction was done for discrete signals. At
the same time, the physicist Wilson [353] was designing a local cosine
basis with smooth windows of infinite support, to analyze the properties of quantum coherent states. Malvar bases were also rediscovered
and generalized by the harmonic analysts Coifman and Meyer [138].
These different views of the same bases brought to light mathematical
and algorithmic properties that opened new applications.
A multiplication by $\cos(\xi t + \phi)$ translates the Fourier transform
$\hat g_p(\omega)$ of $g_p(t)$ by $\pm\xi$. Over positive frequencies, the time-frequency
box of the modulated window $g_p(t) \cos(\xi t + \phi)$ is therefore equal to
the time-frequency box of $g_p$ translated by $\xi$ along frequencies. The
time-frequency boxes of local cosine basis vectors define a tiling of the
time-frequency plane illustrated by Figure 1.5.

Figure 1.4: A wavelet packet basis divides the frequency axis in separate
intervals of varying sizes. A tiling is obtained by translating in time
the wavelet packets covering each frequency interval.

1.4 Bases for What?
The tiling game is clearly unlimited. Local cosine and wavelet packet
bases are important examples, but many other kinds of bases can be
constructed. It is thus time to wonder how to select an appropriate
basis for processing a particular class of signals. The decomposition
coefficients of a signal in a basis define a representation that highlights
some particular signal properties. For example, wavelet coefficients
provide explicit information on the location and type of signal singularities. The problem is to find a criterion for selecting a basis that is
intrinsically well adapted to represent a class of signals.
Mathematical approximation theory suggests choosing a basis that
can construct precise signal approximations with a linear combination
of a small number of vectors selected inside the basis. These selected
vectors can be interpreted as intrinsic signal structures. Compact coding and signal estimation in noise are applications where this criterion
is a good measure of the efficiency of a basis. Linear and non-linear
procedures are studied and compared. This will be the occasion to
show that non-linear does not always mean complicated.

Figure 1.5: A local cosine basis divides the time axis with smooth
windows $g_p(t)$. Multiplications with cosine functions translate these
windows in frequency and yield a complete cover of the time-frequency
plane.

1.4.1 Approximation

The development of orthonormal wavelet bases has opened a new bridge
between approximation theory and signal processing. This exchange is
not quite new since the fundamental sampling theorem comes from an
interpolation theory result proved in 1935 by Whittaker [349]. However,
the state of the art of approximation theory has changed since 1935.
In particular, the properties of non-linear approximation schemes are
much better understood, and give a firm foundation for analyzing the
performance of many non-linear signal processing algorithms. Chapter
9 introduces important approximation theory results that are used in
signal estimation and data compression.

Linear Approximation  A linear approximation projects the signal
$f$ over $M$ vectors that are chosen a priori in an orthonormal basis
$\mathcal{B} = \{g_m\}_{m \in \mathbb{N}}$, say the first $M$:
$$f_M = \sum_{m=0}^{M-1} \langle f, g_m \rangle\, g_m. \tag{1.12}$$
Since the basis is orthonormal, the approximation error is the sum of
the remaining squared inner products:
$$\epsilon[M] = \|f - f_M\|^2 = \sum_{m=M}^{+\infty} |\langle f, g_m \rangle|^2.$$
The accuracy of this approximation clearly depends on the properties
of $f$ relative to the basis $\mathcal{B}$.
A Fourier basis yields efficient linear approximations of uniformly
smooth signals, which are projected over the $M$ lower frequency sinusoidal waves. When $M$ increases, the decay of the error $\epsilon[M]$ can be
related to the global regularity of $f$. Chapter 9 characterizes spaces of
smooth functions from the asymptotic decay of $\epsilon[M]$ in a Fourier basis.
In a wavelet basis, the signal is projected over the $M$ larger scale
wavelets, which is equivalent to approximating the signal at a fixed resolution. Linear approximations of uniformly smooth signals in wavelet
and Fourier bases have similar properties and characterize nearly the
same function spaces.
Suppose that we want to approximate a class of discrete signals of
size $N$, modeled by a random vector $F[n]$. The average approximation
error when projecting $F$ over the first $M$ basis vectors of an orthonormal
basis $\mathcal{B} = \{g_m\}_{0 \le m < N}$ is
$$\epsilon[M] = E\{\|F - F_M\|^2\} = \sum_{m=M}^{N-1} E\{|\langle F, g_m \rangle|^2\}.$$
Chapter 9 proves that the basis that minimizes this error is the Karhunen-Loève basis, which diagonalizes the covariance matrix of $F$. This remarkable property explains the fundamental importance of the Karhunen-Loève basis in optimal linear signal processing schemes. This is however
only a beginning.

Non-linear Approximation  The linear approximation (1.12) is improved if we choose a posteriori the $M$ vectors $g_m$, depending on $f$. The
approximation of $f$ with $M$ vectors whose indexes are in $I_M$ is
$$f_M = \sum_{m \in I_M} \langle f, g_m \rangle\, g_m. \tag{1.13}$$
The approximation error is the sum of the squared inner products with
vectors not in $I_M$:
$$\epsilon[M] = \|f - f_M\|^2 = \sum_{m \notin I_M} |\langle f, g_m \rangle|^2.$$
To minimize this error, we choose $I_M$ to be the set of $M$ vectors that
have the largest inner product amplitude $|\langle f, g_m \rangle|$. This approximation
scheme is non-linear because the approximation vectors change with $f$.
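The difference between the a priori choice (1.12) and the a posteriori choice (1.13) can be tried on a small discrete example. The sketch below makes several assumptions that are not taken from the text: the orthonormal basis is a discrete Haar basis computed by iterated averaging and differencing, the signal has size $N = 64$ with a linear trend and one jump, and $M = 8$ vectors are kept. It also checks that the error $\|f - f_M\|^2$ equals the sum of the squared coefficients that were dropped.

```python
import math

R = 1.0 / math.sqrt(2.0)

def haar_coefficients(signal):
    """Orthonormal discrete Haar coefficients, ordered from coarse to fine scales."""
    approx = list(signal)
    fine_to_coarse = []
    while len(approx) > 1:
        half = len(approx) // 2
        detail = [R * (approx[2 * k] - approx[2 * k + 1]) for k in range(half)]
        approx = [R * (approx[2 * k] + approx[2 * k + 1]) for k in range(half)]
        fine_to_coarse.append(detail)
    coeffs = approx
    for detail in reversed(fine_to_coarse):
        coeffs = coeffs + detail
    return coeffs

def haar_reconstruct(coeffs):
    """Inverse of haar_coefficients."""
    approx = coeffs[:1]
    position = 1
    while position < len(coeffs):
        detail = coeffs[position:position + len(approx)]
        position += len(approx)
        merged = []
        for a, d in zip(approx, detail):
            merged.extend([R * (a + d), R * (a - d)])
        approx = merged
    return approx

def approximation_error(signal, kept):
    """Return (||f - f_M||^2, sum of the squared coefficients that were dropped)."""
    coeffs = haar_coefficients(signal)
    f_M = haar_reconstruct([c if m in kept else 0.0 for m, c in enumerate(coeffs)])
    direct = sum((a - b) ** 2 for a, b in zip(signal, f_M))
    dropped = sum(c * c for m, c in enumerate(coeffs) if m not in kept)
    return direct, dropped

N, M = 64, 8
# Hypothetical test signal: a linear trend with a jump at n = 37.
f = [n / N - (1.0 if n >= 37 else 0.0) for n in range(N)]
coeffs = haar_coefficients(f)

linear_set = set(range(M))  # a priori: the M largest-scale vectors, as in (1.12)
by_amplitude = sorted(range(N), key=lambda m: abs(coeffs[m]), reverse=True)
nonlinear_set = set(by_amplitude[:M])  # a posteriori: the M largest amplitudes, as in (1.13)

linear_error, linear_check = approximation_error(f, linear_set)
nonlinear_error, nonlinear_check = approximation_error(f, nonlinear_set)
print(linear_error, nonlinear_error)
```

By construction the non-linear error can never exceed the linear one, since keeping the $M$ largest squared coefficients leaves the smallest possible remainder. The gap is expected to be large for this signal because the jump creates large fine-scale coefficients that the a priori selection discards.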
The amplitude of inner products in a wavelet basis is related to the
local regularity of the signal. A non-linear approximation that keeps the
largest wavelet inner products is equivalent to constructing an adaptive
approximation grid, whose resolution is locally increased where the signal is irregular. If the signal has isolated singularities, this non-linear
approximation is much more precise than a linear scheme that maintains the same resolution over the whole signal support. The spaces
of functions that are well approximated by non-linear wavelet schemes
are thus much larger than for linear schemes, and include functions
with isolated singularities. Bounded variation signals are important
examples that provide useful models for images.
In this non-linear setting, Karhunen-Loève bases are not optimal for
approximating the realizations of a process $F$. It is often easy to find a
basis that produces a smaller non-linear error than a Karhunen-Loève
basis, but there is yet no procedure for computing the optimal basis
that minimizes the average non-linear error.

Adaptive Basis Choice  Approximations of non-linear signals can be improved by choosing the approximation vectors in families that are
much larger than a basis. Music recordings, which include harmonic
and transient structures of very different types, are examples of complex
signals that are not well approximated by a few vectors chosen from a
single basis.
A new degree of freedom is introduced if instead of choosing a priori
the basis $\mathcal{B}$, we adaptively select a "best" basis, depending on the signal
$f$. This best basis minimizes a cost function related to the non-linear
approximation error of $f$. A fast dynamic programming algorithm
can find the best basis in families of wavelet packet bases or local cosine
bases [140]. The selected basis corresponds to a time-frequency tiling
that "best" concentrates the signal energy over a few time-frequency
atoms.
Orthogonality is often not crucial in the post-processing of signal
coefficients. One may thus further enlarge the freedom of choice by approximating the signal $f$ with $M$ non-orthogonal vectors $\{g_{\gamma_m}\}_{0 \le m < M}$,
chosen from a large and redundant dictionary $\mathcal{D} = \{g_\gamma\}_{\gamma \in \Gamma}$:
$$f_M = \sum_{m=0}^{M-1} a_m\, g_{\gamma_m}.$$
Globally optimizing the choice of these $M$ vectors in $\mathcal{D}$ can lead to
a combinatorial explosion. Chapter 9 introduces sub-optimal pursuit
algorithms that reduce the numerical complexity, while constructing
efficient approximations [119, 259].

1.4.2 Estimation

The estimation of a signal embedded in noise requires taking advantage
of any prior information about the signal and the noise. Chapter 10
studies and contrasts several approaches: Bayes versus minimax, linear versus non-linear. Until recently, signal processing estimation was
mostly Bayesian and linear. Non-linear smoothing algorithms existed
in statistics, but these procedures were often ad-hoc and complex. Two
statisticians, Donoho and Johnstone [167], changed the game by proving that a simple thresholding algorithm in an appropriate basis can be
a nearly optimal non-linear estimator. Linear versus Non-Linear A signal f n] of size N is contaminated
by the addition of a noise. This noise is modeled as the realization of
a random process W n], whose probability distribution is known. The 1.4. BASES FOR WHAT? 39 measured data are X n] = f n] + W n] :
The signal f is estimated by transforming the noisy data X with an
operator D:
~
F = DX :
~
The risk of the estimator F of f is the average error, calculated with
respect to the probability distribution of the noise W :
r(D f ) = Efkf ; DX k2g :
It is tempting to restrict oneself to linear operators D, because
of their simplicity. Yet, non-linear operators may yield a much lower
risk. To keep the simplicity, we concentrate on diagonal operators in
a basis B. If the basis B gives a sparse signal representation, Donoho
and Johnstone [167] prove that a nearly optimal non-linear estimator
is obtained with a simple thresholding:
$$\tilde F = DX = \sum_{m=0}^{N-1} \rho_T(\langle X, g_m \rangle)\, g_m .$$
The thresholding function $\rho_T(x)$ sets to zero all coefficients below $T$:
$$\rho_T(x) = \begin{cases} 0 & \text{if } |x| < T \\ x & \text{if } |x| \ge T . \end{cases}$$
In a wavelet basis, such a thresholding implements an adaptive smoothing, which averages the data $X$ with a kernel that depends on the
regularity of the underlying signal $f$.

Bayes Versus Minimax  To optimize the estimation operator $D$, one must take advantage of any prior information available about the
signal $f$. In a Bayes framework, $f$ is considered as a realization of a
random vector $F$, whose probability distribution $\pi$ is known a priori.
Thomas Bayes was an XVIII century philosopher, who first suggested
and investigated methods sometimes referred to as "inverse probability
methods," which are basic to the study of Bayes estimators. The Bayes
risk is the expected risk calculated with respect to the prior probability
distribution $\pi$ of the signal:
$$r(D, \pi) = E_\pi\{ r(D, F) \} .$$
Optimizing $D$ among all possible operators yields the minimum Bayes
risk:
$$r_n(\pi) = \inf_{\text{all } D} r(D, \pi) .$$
Complex signals such as images are clearly non-Gaussian, and there
is yet no reliable probabilistic model that incorporates the diversity of
structures such as edges and textures.
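The thresholding estimator defined above under "Linear versus Non-Linear" can be sketched numerically. The following is only an illustration under stated assumptions: the orthonormal basis is a hypothetical one (the Q factor of a random matrix), the signal is made sparse in that basis by construction, the noise is white Gaussian, and the threshold value sigma * sqrt(2 log N) is an assumed choice that the text above does not specify.

```python
import numpy as np

def hard_threshold(x, T):
    """Thresholding function: zero every coefficient with |x| < T."""
    return np.where(np.abs(x) < T, 0.0, x)

def threshold_estimator(X, G, T):
    """Diagonal estimator: sum over m of threshold(<X, g_m>) g_m.

    G is an N x N real matrix whose columns g_m form an orthonormal basis.
    """
    coeffs = G.T @ X                      # inner products <X, g_m>
    return G @ hard_threshold(coeffs, T)  # resynthesis from the kept coefficients

rng = np.random.default_rng(0)
N, sigma = 256, 1.0

# Hypothetical orthonormal basis: Q factor of a random Gaussian matrix.
G, _ = np.linalg.qr(rng.standard_normal((N, N)))

# Signal that is sparse in G: 8 large coefficients, all others zero.
a = np.zeros(N)
a[rng.choice(N, size=8, replace=False)] = 10.0
f = G @ a

X = f + sigma * rng.standard_normal(N)   # noisy data X = f + W
T = sigma * np.sqrt(2.0 * np.log(N))     # assumed threshold choice
F_tilde = threshold_estimator(X, G, T)

err_noisy = np.sum((f - X) ** 2)         # squared error of the raw data
err_thresh = np.sum((f - F_tilde) ** 2)  # squared error after thresholding
print(err_noisy, err_thresh)
```

In this sparse setting the thresholded estimate keeps the few large coefficients and discards most of the noise, so its squared error is expected to be far below that of the raw data. A single noise realization only hints at the risk, which is an average over the noise.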
In the 1940's, Wald brought a new perspective on statistics, through
a decision theory partly imported from the theory of games. This point
of view offers a simpler way to incorporate prior information on complex
signals. Signals are modeled as elements of a particular set $\Theta$, without
specifying their probability distribution in this set. For example, large
classes of images belong to the set of signals whose total variation is
bounded by a constant. To control the risk for any $f \in \Theta$, we compute
the maximum risk
$$r(D, \Theta) = \sup_{f \in \Theta} r(D, f) .$$
The minimax risk is the lower bound computed over all operators $D$:
$$r_n(\Theta) = \inf_{\text{all } D} r(D, \Theta) .$$
In practice, the goal is to find an operator $D$ that is simple to implement
and which yields a risk close to the minimax lower bound.

Unless $\Theta$ has particular convexity properties, non-linear estimators
have a much lower risk than linear estimators. If $W$ is a white noise
and signals in $\Theta$ have a sparse representation in B, then Chapter 10
shows that thresholding estimators are nearly minimax optimal. In
particular, the risk of wavelet thresholding estimators is close to the
minimax risk for wide classes of piecewise smooth signals, including
bounded variation images. Thresholding estimators are extended to
more complex problems such as signal restorations and deconvolutions.
The performance of a thresholding may also be improved with a best
basis search or a pursuit algorithm that adapts the basis B to the noisy
data. However, more adaptivity does not necessarily mean less risk.

1.4.3 Compression

Limited storage space and transmission through narrow band-width
channels create a need for compressing signals while minimizing their
degradation. Transform codes compress signals by decomposing them
in an orthonormal basis. Chapter 11 introduces the basic information
theory needed to understand these codes and optimize their performance. Bayes and minimax approaches are studied.
A transform code decomposes a signal $f$ in an orthonormal basis
$B = \{g_m\}_{0 \le m < N}$:
$$f = \sum_{m=0}^{N-1} \langle f, g_m \rangle\, g_m .$$
The coefficients $\langle f, g_m \rangle$ are approximated by quantized values $Q(\langle f, g_m \rangle)$.
A signal $\tilde f$ is restored from these quantized coefficients:
$$\tilde f = \sum_{m=0}^{N-1} Q(\langle f, g_m \rangle)\, g_m .$$
A binary code is used to record the quantized coefficients $Q(\langle f, g_m \rangle)$
with $R$ bits. The resulting distortion is
$$d(R, f) = \| f - \tilde f \|^2 .$$
At the compression rates currently used for images, $d(R, f)$ has a highly
non-linear behavior, which depends on the precision of non-linear approximations of $f$ from a few vectors in the basis B.
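A minimal sketch of the transform code described above, under assumptions that are not in the text: the quantizer Q is a uniform quantizer with a hypothetical step size `step`, the orthonormal basis is the Q factor of a random matrix, and the binary coding stage (hence the bit budget R) is not modeled. It only illustrates that, in an orthonormal basis, the distortion can be measured equally on the signal or on its coefficients.

```python
import numpy as np

def uniform_quantizer(x, step):
    """Round each coefficient to the nearest multiple of `step`."""
    return step * np.round(x / step)

def transform_code(f, G, step):
    """Quantize the coefficients <f, g_m> and restore a signal from them."""
    coeffs = G.T @ f                   # decomposition coefficients <f, g_m>
    q = uniform_quantizer(coeffs, step)
    return G @ q, coeffs, q            # restored signal, exact and quantized coefficients

rng = np.random.default_rng(1)
N, step = 128, 0.5
G, _ = np.linalg.qr(rng.standard_normal((N, N)))  # hypothetical orthonormal basis
f = rng.standard_normal(N)

f_tilde, coeffs, q = transform_code(f, G, step)
d_signal = np.sum((f - f_tilde) ** 2)  # distortion measured on the signal
d_coeffs = np.sum((coeffs - q) ** 2)   # distortion measured on the coefficients
print(d_signal, d_coeffs)
```

Each coefficient is moved by at most half a step, so the distortion of this particular quantizer cannot exceed N * step^2 / 4. A finer step lowers the distortion but would require more bits in an actual code.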
To compute the distortion rate over a whole signal class, the Bayes
framework models signals as realizations of a random vector F whose
probability distribution is known. The goal is then to optimize the
quantization and the basis B in order to minimize the average distortion
rate $d(R, \pi) = E_\pi\{ d(R, F) \}$. This approach applies particularly well to
audio signals, which are relatively well modeled by Gaussian processes.
In the absence of stochastic models for complex signals such as
images, the minimax approach computes the maximum distortion by
assuming only that the signal belongs to a prior set $\Theta$. Chapter 11
describes the implementation of image transform codes in wavelet bases
and block cosine bases. The minimax distortion rate is calculated for
bounded variation images, and wavelet transform codes are proved to
be nearly minimax optimal.
For video compression, one must also take advantage of the similarity of images across time. The most effective algorithms predict
each image from a previous one by compensating for the motion, and
the error is recorded with a transform code. MPEG video compression
standards are described.

1.5 Travel Guide

1.5.1 Reproducible Computational Science
The book covers the whole spectrum from theorems on functions of
continuous variables to fast discrete algorithms and their applications.
Section 1.3.1 argues that models based on continuous time functions
give useful asymptotic results for understanding the behavior of discrete algorithms. Yet, a mathematical analysis alone is often unable
to predict fully the behavior and suitability of algorithms for specific
signals. Experiments are necessary and such experiments ought in principle to be reproducible, just like experiments in other fields of science.
In recent years, the idea of reproducible algorithmic results has been
championed by Claerbout [127] in exploration geophysics. The goal of
exploration seismology is to produce the highest possible quality image
of the subsurface. Part of the scientific know-how involved includes appropriate parameter settings that lead to good results on real datasets.
The reproducibility of experiments thus requires having the complete
software and full source code for inspection, modification and application under varied parameter settings.
Donoho has advocated the reproducibility of algorithms in wavelet
signal processing, through the development of a WaveLab toolbox,
which is a large library of Matlab routines. He summarizes Claerbout's insight in a slogan [105]:

    An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising
    of the scholarship. The actual scholarship is the complete software environment and the complete set of instructions
    which generated the figures.

Following this perspective, all wavelet and time-frequency tools presented in this book are available in WaveLab. The figures can be
reproduced as demos and the source code is available. The LastWave
package offers a similar library of wavelet related algorithms that are
programmed in C, with a user-friendly shell interface and graphics.
Appendix B explains how to retrieve these toolboxes, and relates their
subroutines to the algorithms described in the book.

1.5.2 Road Map

Sections are kept as independent as possible, and some redundancy is
introduced to avoid imposing a linear progression through the book.
The preface describes several possible paths for a graduate signal processing or an applied mathematics course. A partial hierarchy between
sections is provided by a level number. If a section has a level number
then all sub-sections without number inherit this level, but a higher
level number indicates that a subsection is more advanced.
Sections of level 1 introduce central ideas and techniques for wavelet
and time-frequency signal processing. These would typically be taught
in an introductory course. The first sections of Chapter 7 on wavelet
orthonormal bases are examples. Sections of level 2 concern results that
are important but which are either more advanced or dedicated to an
application. Wavelet packets and local cosine bases in Chapter 8 are
of that sort. Applications to estimation and data compression belong
to this level, including fundamental results such as Wiener filtering.
Sections of level 3 describe advanced results that are at the frontier
of research or mathematically more difficult. These sections connect the
book to open research problems.
All theorems are explained in the text and reading the proofs is
not necessary to understand the results. Proofs also have a level index specifying their difficulty, as well as their conceptual or technical
importance. These levels have been set by trying to answer the question: "Should this proof be taught in an introductory course?" Level 1
means probably, level 2 probably not, level 3 certainly not. Problems
at the end of each chapter follow this hierarchy of levels. Direct applications of the course are at level 1. Problems at level 2 require
more thinking. Problems of level 3 are often at the interface of research
and can provide topics for deeper projects.
The book begins with Chapters 2 and 3, which review the Fourier
transform properties and elementary discrete signal processing. They
provide the necessary background for readers with no signal processing experience. Fundamental properties of local time-frequency transforms are presented in Chapter 4. The wavelet and windowed Fourier
transforms are introduced and compared. The measurement of instantaneous frequencies is used to illustrate the limitations of their time-frequency resolution. Wigner-Ville time-frequency distributions give a
global perspective which relates all quadratic time-frequency distributions. Frame theory is explained in Chapter 5. It offers a flexible framework for analyzing the properties of redundant or non-linear adaptive
decompositions. Chapter 6 explains the relations between the decay of
the wavelet transform amplitude across scales and local signal properties. It studies applications involving the detection of singularities and
analysis of multifractals.
The construction of wavelet bases and their relations with filter
banks are fundamental results presented in Chapter 7. An overdose of
orthonormal bases can strike the reader while studying the construction
and properties of wavelet packets and local cosine bases in Chapter 8.
It is thus important to read in parallel Chapter 9, which studies the
approximation performance of orthogonal bases. The estimation and
data compression applications of Chapters 10 and 11 give life to most
theoretical and algorithmic results of the book. These chapters offer
a practical perspective on the relevance of these linear and non-linear
signal processing algorithms.

Chapter 2

Fourier Kingdom

The story begins in 1807 when Fourier presents a memoir to the Institut de France, where he claims that any periodic function can be
represented as a series of harmonically related sinusoids. This idea had
a profound impact in mathematical analysis, physics and engineering,
but it took one and a half centuries to understand the convergence of
Fourier series and complete the theory of Fourier integrals.
Fourier was motivated by the study of heat diffusion, which is governed by a linear differential equation. However, the Fourier transform
diagonalizes all linear time-invariant operators, which are the building
blocks of signal processing. It is therefore not only the starting point
of our exploration but the basis of all further developments.

2.1 Linear Time-Invariant Filtering $^1$

Classical signal processing operations such as signal transmission, stationary noise removal or predictive coding are implemented with linear
time-invariant operators. The time invariance of an operator $L$ means
that if the input $f(t)$ is delayed by $\tau$, $f_\tau(t) = f(t - \tau)$, then the output
is also delayed by $\tau$:
$$g(t) = Lf(t) \;\Rightarrow\; g(t - \tau) = L f_\tau(t). \tag{2.1}$$
For numerical stability, the operator $L$ must have a weak form of continuity, which means that $Lf$ is modified by a small amount if $f$ is
slightly modified. This weak continuity is formalized by the theory of
distributions [66, 69], which guarantees that we are on safe ground
without further worrying about it.

2.1.1 Impulse Response

Linear time-invariant systems are characterized by their response to a
Dirac impulse, defined in Appendix A.7. If $f$ is continuous, its value
at $t$ is obtained by an "integration" against a Dirac located at $t$. Let
$\delta_u(t) = \delta(t - u)$:
$$f(t) = \int_{-\infty}^{+\infty} f(u)\, \delta_u(t)\, du .$$
The continuity and linearity of $L$ imply that
$$Lf(t) = \int_{-\infty}^{+\infty} f(u)\, L\delta_u(t)\, du .$$
Let $h$ be the impulse response of $L$:
$$h(t) = L\delta(t) .$$
The time-invariance proves that $L\delta_u(t) = h(t - u)$ and hence
$$Lf(t) = \int_{-\infty}^{+\infty} f(u)\, h(t - u)\, du = \int_{-\infty}^{+\infty} h(u)\, f(t - u)\, du = h \star f(t). \tag{2.2}$$
A time-invariant linear filter is thus equivalent to a convolution with
the impulse response h. The continuity of f is not necessary. This
formula remains valid for any signal f for which the convolution integral
converges.
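The equivalence (2.2) has a direct discrete analogue that is easy to check numerically. The sketch below is an illustration and not the continuous-time statement: the operator `moving_average` is a hypothetical example of a linear time-invariant operator on finite sequences (with zeros assumed before the first sample), its impulse response is obtained by applying it to a discrete impulse, and `np.convolve` plays the role of the convolution integral.

```python
import numpy as np

def moving_average(f):
    """A linear time-invariant operator on finite sequences:
    (Lf)[n] = (f[n] + f[n-1] + f[n-2]) / 3, with zeros before the start."""
    padded = np.concatenate(([0.0, 0.0], f))
    return (padded[2:] + padded[1:-1] + padded[:-2]) / 3.0

n = 32
delta = np.zeros(n)
delta[0] = 1.0
h = moving_average(delta)             # impulse response h = L(delta)

rng = np.random.default_rng(2)
f = rng.standard_normal(n)

Lf = moving_average(f)                # operator applied directly
conv = np.convolve(f, h)[:n]          # discrete analogue of (2.2), first n samples
conv_swapped = np.convolve(h, f)[:n]  # same sum with the roles of f and h exchanged

# Time invariance: delaying the input by 3 samples delays the output by 3 samples.
f_delayed = np.concatenate((np.zeros(3), f))[:n]
Lf_delayed = moving_average(f_delayed)
```

Here `Lf` and `conv` coincide, which is the discrete counterpart of saying that the filter is fully described by its impulse response.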
Let us recall a few useful properties of convolution products:

Commutativity
$$f \star h(t) = h \star f(t). \tag{2.3}$$
Differentiation
$$\frac{d}{dt}(f \star h)(t) = \frac{df}{dt} \star h(t) = f \star \frac{dh}{dt}(t). \tag{2.4}$$
Dirac convolution
$$f \star \delta_\tau(t) = f(t - \tau). \tag{2.5}$$

Stability and Causality  A filter is said to be causal if $Lf(t)$ does
not depend on the values $f(u)$ for $u > t$. Since
$$Lf(t) = \int_{-\infty}^{+\infty} h(u)\, f(t - u)\, du ,$$
this means that $h(u) = 0$ for $u < 0$. Such impulse responses are said to
be causal.
The stability property guarantees that $Lf(t)$ is bounded if $f(t)$ is
bounded. Since
$$|Lf(t)| \le \int_{-\infty}^{+\infty} |h(u)|\, |f(t - u)|\, du \le \sup_{u \in \mathbb{R}} |f(u)| \int_{-\infty}^{+\infty} |h(u)|\, du ,$$
it is sufficient that $\int_{-\infty}^{+\infty} |h(u)|\, du < +\infty$. One can verify that this condition is also necessary if $h$ is a function. We thus say that $h$ is stable
if it is integrable.

Example 2.1  An amplification and delay system is defined by
$$Lf(t) = \lambda\, f(t - \tau) .$$
The impulse response of this filter is $h(t) = \lambda\, \delta(t - \tau)$.

Example 2.2  A uniform averaging of $f$ over intervals of size $T$ is
calculated by
$$Lf(t) = \frac{1}{T} \int_{t - T/2}^{t + T/2} f(u)\, du .$$
This integral can be rewritten as a convolution of $f$ with the impulse
response $h = \frac{1}{T}\, \mathbf{1}_{[-T/2,\, T/2]}$.

2.1.2 Transfer Functions

Complex exponentials $e^{i\omega t}$ are eigenvectors of convolution operators.
Indeed
$$L e^{i\omega t} = \int_{-\infty}^{+\infty} h(u)\, e^{i\omega(t - u)}\, du ,$$
which yields
$$L e^{i\omega t} = e^{it\omega} \int_{-\infty}^{+\infty} h(u)\, e^{-i\omega u}\, du = \hat h(\omega)\, e^{i\omega t} .$$
The eigenvalue
$$\hat h(\omega) = \int_{-\infty}^{+\infty} h(u)\, e^{-i\omega u}\, du$$
is the Fourier transform of $h$ at the frequency $\omega$. Since complex sinusoidal waves $e^{i\omega t}$ are the eigenvectors of time-invariant linear systems,
it is tempting to try to decompose any function $f$ as a sum of these
eigenvectors. We are then able to express $Lf$ directly from the eigenvalues $\hat h(\omega)$. The Fourier analysis proves that under weak conditions
on $f$, it is indeed possible to write it as a Fourier integral.

2.2 Fourier Integrals $^1$
To avoid convergence issues, the Fourier integral is first defined over
the space $L^1(\mathbb{R})$ of integrable functions [57]. It is then extended to the
space $L^2(\mathbb{R})$ of finite energy functions [24].

2.2.1 Fourier Transform in $L^1(\mathbb{R})$
The Fourier integral
$$\hat f(\omega) = \int_{-\infty}^{+\infty} f(t)\, e^{-i\omega t}\, dt \tag{2.6}$$
measures "how much" oscillation at the frequency $\omega$ there is in $f$. If
$f \in L^1(\mathbb{R})$ this integral does converge and
$$|\hat f(\omega)| \le \int_{-\infty}^{+\infty} |f(t)|\, dt < +\infty. \tag{2.7}$$
The Fourier transform is thus bounded, and one can verify that it is
a continuous function of $\omega$ (Problem 2.1). If $\hat f$ is also integrable, the
following theorem gives the inverse Fourier transform.

Theorem 2.1 (Inverse Fourier Transform)  If $f \in L^1(\mathbb{R})$ and $\hat f \in L^1(\mathbb{R})$ then
$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\, e^{i\omega t}\, d\omega. \tag{2.8}$$

Proof $^2$. Replacing $\hat f(\omega)$ by its integral expression yields
$$\frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega) \exp(i\omega t)\, d\omega = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \left( \int_{-\infty}^{+\infty} f(u) \exp[i\omega(t - u)]\, du \right) d\omega .$$
We cannot apply the Fubini Theorem A.2 directly because $f(u) \exp[i\omega(t - u)]$ is not integrable in $\mathbb{R}^2$. To avoid this technical problem, we multiply
by $\exp(-\epsilon^2 \omega^2 / 4)$, which converges to 1 when $\epsilon$ goes to 0. Let us define
$$I_\epsilon(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \left( \int_{-\infty}^{+\infty} f(u) \exp\left( \frac{-\epsilon^2 \omega^2}{4} \right) \exp[i\omega(t - u)]\, du \right) d\omega. \tag{2.9}$$
We compute $I_\epsilon$ in two different ways using the Fubini theorem. The
integration with respect to $u$ gives
$$I_\epsilon(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega) \exp\left( \frac{-\epsilon^2 \omega^2}{4} \right) \exp(i\omega t)\, d\omega .$$
Since
$$\left| \hat f(\omega) \exp\left( \frac{-\epsilon^2 \omega^2}{4} \right) \exp[i\omega(t - u)] \right| \le |\hat f(\omega)|$$
and since $\hat f$ is integrable, we can apply the dominated convergence Theorem A.1, which proves that
$$\lim_{\epsilon \to 0} I_\epsilon(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega) \exp(i\omega t)\, d\omega. \tag{2.10}$$
Let us now compute the integral (2.9) differently by applying the Fubini
theorem and integrating with respect to $\omega$:
$$I_\epsilon(t) = \int_{-\infty}^{+\infty} g_\epsilon(t - u)\, f(u)\, du, \tag{2.11}$$
with
$$g_\epsilon(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \exp(ix\omega) \exp\left( \frac{-\epsilon^2 \omega^2}{4} \right) d\omega .$$
A change of variable $\omega' = \epsilon\omega$ shows that $g_\epsilon(x) = \epsilon^{-1} g_1(\epsilon^{-1} x)$, and it is
proved in (2.32) that $g_1(x) = \pi^{-1/2} e^{-x^2}$. The Gaussian $g_1$ has an integral
equal to 1 and a fast decay. The squeezed Gaussians $g_\epsilon$ have an integral
that remains equal to 1, and thus they converge to a Dirac $\delta$ when $\epsilon$ goes
to 0. By inserting (2.11) one can thus verify that
$$\lim_{\epsilon \to 0} \int_{-\infty}^{+\infty} |I_\epsilon(t) - f(t)|\, dt = \lim_{\epsilon \to 0} \iint g_\epsilon(t - u)\, |f(u) - f(t)|\, du\, dt = 0 .$$
Inserting (2.10) proves (2.8).

The inversion formula (2.8) decomposes $f$ as a sum of sinusoidal waves
$e^{i\omega t}$ of amplitude $\hat f(\omega)$. By using this formula, we can show (Problem
2.1) that the hypothesis $\hat f \in L^1(\mathbb{R})$ implies that $f$ must be continuous.
The reconstruction (2.8) is therefore not proved for discontinuous functions. The extension of the Fourier transform to the space $L^2(\mathbb{R})$ will
address this issue.
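The Fourier integral (2.6) and the inversion formula (2.8) can be checked numerically on a function for which both are integrable. The sketch below is an assumption-laden illustration: the integrals are replaced by Riemann sums on truncated uniform grids (the helper names and grid sizes are hypothetical), and the test function is the Gaussian $e^{-t^2}$, whose transform $\sqrt{\pi}\, e^{-\omega^2/4}$ is the closed form (2.32) used in the proof above.

```python
import numpy as np

def fourier_integral(f_vals, t, omega):
    """Riemann-sum approximation of the Fourier integral (2.6) on a uniform grid t."""
    dt = t[1] - t[0]
    return np.array([np.sum(f_vals * np.exp(-1j * w * t)) * dt for w in omega])

def inverse_fourier_integral(F_vals, omega, t):
    """Riemann-sum approximation of the inversion formula (2.8)."""
    dw = omega[1] - omega[0]
    sums = [np.sum(F_vals * np.exp(1j * omega * s)) * dw for s in t]
    return np.array(sums) / (2 * np.pi)

t = np.linspace(-10.0, 10.0, 2001)        # truncated time grid
omega = np.linspace(-20.0, 20.0, 2001)    # truncated frequency grid
f = np.exp(-t ** 2)

F = fourier_integral(f, t, omega)
F_closed = np.sqrt(np.pi) * np.exp(-omega ** 2 / 4)   # closed form (2.32)

t_check = np.array([-1.0, 0.0, 0.5, 2.0])
f_back = inverse_fourier_integral(F, omega, t_check)  # should recover exp(-t^2)
```

The truncation is harmless here because both the Gaussian and its transform are negligible outside the grids. For a discontinuous function such as an indicator, the transform decays slowly and the same sums would converge much more slowly.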
The most important property of the Fourier transform for signal
processing applications is the convolution theorem. It is another way to
express the fact that sinusoidal waves $e^{it\omega}$ are eigenvectors of convolution
operators.

Theorem 2.2 (Convolution)  Let $f \in L^1(\mathbb{R})$ and $h \in L^1(\mathbb{R})$. The
function $g = h \star f$ is in $L^1(\mathbb{R})$ and
$$\hat g(\omega) = \hat h(\omega)\, \hat f(\omega). \tag{2.12}$$

Proof $^1$.
$$\hat g(\omega) = \int_{-\infty}^{+\infty} \exp(-it\omega) \left( \int_{-\infty}^{+\infty} f(t - u)\, h(u)\, du \right) dt .$$
Since $|f(t - u)|\, |h(u)|$ is integrable in $\mathbb{R}^2$, we can apply the Fubini Theorem
A.2, and the change of variable $(t, u) \to (v = t - u, u)$ yields
$$\hat g(\omega) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \exp[-i(u + v)\omega]\, f(v)\, h(u)\, du\, dv = \left( \int_{-\infty}^{+\infty} \exp(-iv\omega)\, f(v)\, dv \right) \left( \int_{-\infty}^{+\infty} \exp(-iu\omega)\, h(u)\, du \right),$$
which verifies (2.12).

The response $Lf = g = f \star h$ of a linear time-invariant system can be
calculated from its Fourier transform $\hat g(\omega) = \hat f(\omega)\, \hat h(\omega)$ with the inverse
Fourier formula
$$g(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat g(\omega)\, e^{i\omega t}\, d\omega, \tag{2.13}$$
which yields
$$Lf(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat h(\omega)\, \hat f(\omega)\, e^{i\omega t}\, d\omega. \tag{2.14}$$
Each frequency component $e^{it\omega}$ of amplitude $\hat f(\omega)$ is amplified or attenuated by $\hat h(\omega)$. Such a convolution is thus called a frequency filtering,
and $\hat h$ is the transfer function of the filter.
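The convolution theorem and the eigenvector property have an exact finite-dimensional analogue, which the following sketch checks. It swaps in a different setting from the text: signals are periodic sequences of length N, the convolution is circular, and the transform is the discrete Fourier transform computed by `np.fft.fft`. The function name `circular_convolution` and the random test signals are illustrative choices.

```python
import numpy as np

def circular_convolution(f, h):
    """Direct circular convolution g[n] = sum_k h[k] f[(n - k) mod N]."""
    N = len(f)
    return np.array([sum(h[k] * f[(n - k) % N] for k in range(N)) for n in range(N)])

rng = np.random.default_rng(3)
N = 64
f = rng.standard_normal(N)
h = rng.standard_normal(N)

g = circular_convolution(f, h)
lhs = np.fft.fft(g)                   # transform of the convolution
rhs = np.fft.fft(h) * np.fft.fft(f)   # product of the transforms, as in (2.12)

# A discrete complex exponential is an eigenvector of the convolution with h.
# The eigenvalue is the transform of h at that frequency (the transfer function).
k = 5
e_k = np.exp(2j * np.pi * k * np.arange(N) / N)
filtered = circular_convolution(e_k, h)
eigenvalue = np.fft.fft(h)[k]
```

In practice this identity is what makes frequency filtering cheap: multiplying two transforms costs N operations once the transforms are available, whereas the direct sum above costs N squared.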
The following table summarizes important properties of the Fourier
transform, often used in calculations. Most of these formulas are proved
with a change of variable in the Fourier integral.

Property                Function                  Fourier Transform
                        $f(t)$                    $\hat f(\omega)$
Inverse                 $\hat f(t)$               $2\pi\, f(-\omega)$                                   (2.15)
Convolution             $f_1 \star f_2(t)$        $\hat f_1(\omega)\, \hat f_2(\omega)$                 (2.16)
Multiplication          $f_1(t)\, f_2(t)$         $\frac{1}{2\pi}\, \hat f_1 \star \hat f_2(\omega)$    (2.17)
Translation             $f(t - t_0)$              $e^{-i t_0 \omega}\, \hat f(\omega)$                  (2.18)
Modulation              $e^{i\omega_0 t} f(t)$    $\hat f(\omega - \omega_0)$                           (2.19)
Scaling                 $f(t/s)$                  $|s|\, \hat f(s\omega)$                               (2.20)
Time derivatives        $f^{(p)}(t)$              $(i\omega)^p\, \hat f(\omega)$                        (2.21)
Frequency derivatives   $(-it)^p f(t)$            $\hat f^{(p)}(\omega)$                                (2.22)
Complex conjugate       $f^*(t)$                  $\hat f^*(-\omega)$                                   (2.23)
Hermitian symmetry      $f(t) \in \mathbb{R}$     $\hat f(-\omega) = \hat f^*(\omega)$                  (2.24)

2.2.2 Fourier Transform in $L^2(\mathbb{R})$

The Fourier transform of the indicator function $f = \mathbf{1}_{[-1,1]}$ is
$$\hat f(\omega) = \int_{-1}^{1} e^{-i\omega t}\, dt = \frac{2 \sin \omega}{\omega} .$$
This function is not integrable because $f$ is not continuous, but its
square is integrable. The inverse Fourier transform Theorem 2.1 thus
does not apply. This motivates the extension of the Fourier transform
to the space $L^2(\mathbb{R})$ of functions $f$ with a finite energy $\int_{-\infty}^{+\infty} |f(t)|^2\, dt < +\infty$. By working in the Hilbert space $L^2(\mathbb{R})$, we also have access to all
the facilities provided by the existence of an inner product. The inner
product of $f \in L^2(\mathbb{R})$ and $g \in L^2(\mathbb{R})$ is
$$\langle f, g \rangle = \int_{-\infty}^{+\infty} f(t)\, g^*(t)\, dt$$
and the resulting norm in $L^2(\mathbb{R})$ is
$$\| f \|^2 = \langle f, f \rangle = \int_{-\infty}^{+\infty} |f(t)|^2\, dt .$$
The following theorem proves that inner products and norms in $L^2(\mathbb{R})$
are conserved by the Fourier transform up to a factor of $2\pi$. Equations
(2.25) and (2.26) are called respectively the Parseval and Plancherel
formulas.

Theorem 2.3  If $f$ and $h$ are in $L^1(\mathbb{R}) \cap L^2(\mathbb{R})$ then
$$\int_{-\infty}^{+\infty} f(t)\, h^*(t)\, dt = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\, \hat h^*(\omega)\, d\omega. \tag{2.25}$$
For $h = f$ it follows that
$$\int_{-\infty}^{+\infty} |f(t)|^2\, dt = \frac{1}{2\pi} \int_{-\infty}^{+\infty} |\hat f(\omega)|^2\, d\omega. \tag{2.26}$$

Proof $^1$. Let $g = f \star \bar h$ with $\bar h(t) = h^*(-t)$. The convolution Theorem 2.2
and property (2.23) show that $\hat g(\omega) = \hat f(\omega)\, \hat h^*(\omega)$. The reconstruction
formula (2.8) applied to $g(0)$ yields
$$\int_{-\infty}^{+\infty} f(t)\, h^*(t)\, dt = g(0) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat g(\omega)\, d\omega = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\, \hat h^*(\omega)\, d\omega .$$

Density Extension in $L^2(\mathbb{R})$  If $f \in L^2(\mathbb{R})$ but $f \notin L^1(\mathbb{R})$, its
Fourier transform cannot be calculated with the Fourier integral (2.6)
because $f(t)\, e^{i\omega t}$ is not integrable. It is defined as a limit using the
Fourier transforms of functions in $L^1(\mathbb{R}) \cap L^2(\mathbb{R})$.
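The Plancherel formula (2.26) can be illustrated numerically on two functions whose transforms are given in closed form in this chapter. This is a sketch under stated assumptions: integrals are replaced by Riemann sums on truncated grids (the helper `energy` and the grid bounds are hypothetical choices), the Gaussian transform is the closed form (2.32), and the indicator transform is the one computed at the start of Section 2.2.2.

```python
import numpy as np

def energy(values, grid):
    """Riemann-sum approximation of the integral of |values|^2 over a uniform grid."""
    return np.sum(np.abs(values) ** 2) * (grid[1] - grid[0])

# Gaussian f(t) = exp(-t^2), with transform sqrt(pi) exp(-omega^2 / 4), see (2.32).
t = np.linspace(-10.0, 10.0, 20001)
omega = np.linspace(-40.0, 40.0, 20001)
gauss_time = energy(np.exp(-t ** 2), t)
gauss_freq = energy(np.sqrt(np.pi) * np.exp(-omega ** 2 / 4), omega) / (2 * np.pi)

# Indicator of [-1, 1], with transform 2 sin(omega) / omega. Its energy is the
# interval length, 2. The transform decays slowly, so the frequency integral is
# truncated at |omega| <= 2000 and only approaches 2.
omega_wide = np.linspace(-2000.0, 2000.0, 400001)
f_hat = 2.0 * np.sinc(omega_wide / np.pi)   # np.sinc(x) = sin(pi x) / (pi x)
box_freq = energy(f_hat, omega_wide) / (2 * np.pi)

print(gauss_time, gauss_freq, box_freq)
```

The contrast between the two cases reflects the point made in the text: the indicator's transform is square integrable but not integrable, so its energy is recovered only slowly as the frequency window grows.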
Since $L^1(\mathbb{R}) \cap L^2(\mathbb{R})$ is dense in $L^2(\mathbb{R})$, one can find a family $\{f_n\}_{n \in \mathbb{Z}}$
of functions in $L^1(\mathbb{R}) \cap L^2(\mathbb{R})$ that converges to $f$:
$$\lim_{n \to +\infty} \| f - f_n \| = 0 .$$
Since $\{f_n\}_{n \in \mathbb{Z}}$ converges, it is a Cauchy sequence, which means that
$\| f_n - f_p \|$ is arbitrarily small if $n$ and $p$ are large enough. Moreover,
$f_n \in L^1(\mathbb{R})$, so its Fourier transform $\hat f_n$ is well defined. The Plancherel
formula (2.26) proves that $\{\hat f_n\}_{n \in \mathbb{Z}}$ is also a Cauchy sequence because
$$\| \hat f_n - \hat f_p \| = \sqrt{2\pi}\, \| f_n - f_p \|$$
is arbitrarily small for $n$ and $p$ large enough. A Hilbert space (Appendix
A.2) is complete, which means that all Cauchy sequences converge to
an element of the space. Hence, there exists $\hat f \in L^2(\mathbb{R})$ such that
$$\lim_{n \to +\infty} \| \hat f - \hat f_n \| = 0 .$$
By definition, $\hat f$ is the Fourier transform of $f$. This extension of the
Fourier transform to $L^2(\mathbb{R})$ satisfies the convolution theorem, the Parseval and Plancherel formulas, as well as all properties (2.15-2.24).

Diracs  Diracs are often used in calculations; their properties are summarized in Appendix A.7. A Dirac $\delta$ associates to a function its value
at $t = 0$. Since $e^{i\omega t} = 1$ at $t = 0$ it seems reasonable to define its Fourier
transform by
$$\hat\delta(\omega) = \int_{-\infty}^{+\infty} \delta(t)\, e^{-i\omega t}\, dt = 1. \tag{2.27}$$
This formula is justified mathematically by the extension of the Fourier
transform to tempered distributions [66, 69].

2.2.3 Examples

The following examples often appear in Fourier calculations. They also
illustrate important Fourier transform properties.
• The indicator function $f = \mathbf{1}_{[-T,T]}$ is discontinuous at $t = \pm T$.
Its Fourier transform is therefore not integrable:
$$\hat f(\omega) = \int_{-T}^{T} e^{-i\omega t}\, dt = \frac{2 \sin(T\omega)}{\omega}. \tag{2.28}$$

• An ideal low-pass filter has a transfer function $\hat h = \mathbf{1}_{[-\xi,\xi]}$ that
selects low frequencies over $[-\xi, \xi]$. The impulse response is calculated with the inverse Fourier integral (2.8):
$$h(t) = \frac{1}{2\pi} \int_{-\xi}^{\xi} e^{i\omega t}\, d\omega = \frac{\sin(\xi t)}{\pi t}. \tag{2.29}$$

• A passive electronic circuit implements analog filters with resistors, capacitors and inductors. The input voltage $f(t)$ is related
to the output voltage $g(t)$ by a differential equation with constant
coefficients:
$$\sum_{k=0}^{K} a_k\, f^{(k)}(t) = \sum_{k=0}^{M} b_k\, g^{(k)}(t). \tag{2.30}$$
Suppose that the circuit is not charged for $t < 0$, which means
that $f(t) = g(t) = 0$. The output $g$ is a linear time-invariant
function of $f$ and can thus be written $g = f \star h$. Computing the
Fourier transform of (2.30) and applying the time-derivative property (2.21) proves that
$$\hat h(\omega) = \frac{\hat g(\omega)}{\hat f(\omega)} = \frac{\sum_{k=0}^{K} a_k\, (i\omega)^k}{\sum_{k=0}^{M} b_k\, (i\omega)^k}. \tag{2.31}$$
It is therefore a rational function of $i\omega$. An ideal low-pass transfer
function $\mathbf{1}_{[-\xi,\xi]}$ thus cannot be implemented by an analog circuit.
It must be approximated by a rational function. Chebyshev or
Butterworth filters are often used for this purpose [14].

• A Gaussian $f(t) = e^{-t^2}$ is a $C^\infty$ function with a fast asymptotic
decay. Its Fourier transform is also a Gaussian:
$$\hat f(\omega) = \sqrt{\pi}\, e^{-\omega^2/4}. \tag{2.32}$$
This Fourier transform is computed by showing with an integration by parts that $\hat f(\omega) = \int_{-\infty}^{+\infty} e^{-t^2} e^{-i\omega t}\, dt$ is differentiable and
satisfies the differential equation
$$2\, \hat f'(\omega) + \omega\, \hat f(\omega) = 0. \tag{2.33}$$
The solution of this equation is a Gaussian $\hat f(\omega) = K\, e^{-\omega^2/4}$, and
since $\hat f(0) = \int_{-\infty}^{+\infty} e^{-t^2}\, dt = \sqrt{\pi}$, we obtain (2.32).

• A Gaussian chirp $f(t) = \exp[-(a - ib)t^2]$ has a Fourier transform
calculated with a similar differential equation:
$$\hat f(\omega) = \sqrt{\frac{\pi}{a - ib}}\, \exp\left( \frac{-(a + ib)\, \omega^2}{4(a^2 + b^2)} \right). \tag{2.34}$$

• A translated Dirac $\delta_\tau(t) = \delta(t - \tau)$ has a Fourier transform calculated by evaluating $e^{-i\omega t}$ at $t = \tau$:
$$\hat\delta_\tau(\omega) = \int_{-\infty}^{+\infty} \delta(t - \tau)\, e^{-i\omega t}\, dt = e^{-i\omega\tau}. \tag{2.35}$$

• The Dirac comb is a sum of translated Diracs
$$c(t) = \sum_{n=-\infty}^{+\infty} \delta(t - nT)$$
that is used to uniformly sample analog signals. Its Fourier transform is derived from (2.35):
$$\hat c(\omega) = \sum_{n=-\infty}^{+\infty} e^{-inT\omega}. \tag{2.36}$$
The Poisson formula proves that it is also equal to a Dirac comb
with a spacing equal to $2\pi/T$.

Theorem 2.4 (Poisson Formula)  In the sense of distribution equalities (A.32),
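$$\sum_{n=-\infty}^{+\infty} e^{-inT\omega} = \frac{2\pi}{T} \sum_{k=-\infty}^{+\infty} \delta\left( \omega - \frac{2\pi k}{T} \right). \tag{2.37}$$

Proof $^2$. The Fourier transform $\hat c$ in (2.36) is periodic with period $2\pi/T$.
To verify the Poisson formula, it is therefore sufficient to prove that the
restriction of $\hat c$ to $[-\pi/T, \pi/T]$ is equal to $(2\pi/T)\, \delta$. The formula (2.37) is
proved in the sense of a distribution equality (A.32) by showing that for
any test function $\hat\phi(\omega)$ with a support included in $[-\pi/T, \pi/T]$,
$$\langle \hat c, \hat\phi \rangle = \lim_{N \to +\infty} \int_{-\infty}^{+\infty} \sum_{n=-N}^{N} \exp(-inT\omega)\, \hat\phi(\omega)\, d\omega = \frac{2\pi}{T}\, \hat\phi(0) .$$
The sum of the geometric series is
$$\sum_{n=-N}^{N} \exp(-inT\omega) = \frac{\sin[(N + 1/2)T\omega]}{\sin[T\omega/2]}. \tag{2.38}$$
Hence
$$\langle \hat c, \hat\phi \rangle = \lim_{N \to +\infty} \frac{2\pi}{T} \int_{-\pi/T}^{\pi/T} \frac{\sin[(N + 1/2)T\omega]}{\pi\omega}\, \frac{T\omega/2}{\sin[T\omega/2]}\, \hat\phi(\omega)\, d\omega. \tag{2.39}$$
Let
$$\hat\psi(\omega) = \begin{cases} \hat\phi(\omega)\, \dfrac{T\omega/2}{\sin[T\omega/2]} & \text{if } |\omega| \le \pi/T \\ 0 & \text{if } |\omega| > \pi/T \end{cases}$$
and $\psi(t)$ be the inverse Fourier transform of $\hat\psi(\omega)$. Since $2\omega^{-1} \sin(a\omega)$ is
the Fourier transform of $\mathbf{1}_{[-a,a]}(t)$, the Parseval formula (2.25) implies
$$\langle \hat c, \hat\phi \rangle = \lim_{N \to +\infty} \frac{2\pi}{T} \int_{-\infty}^{+\infty} \frac{\sin[(N + 1/2)T\omega]}{\pi\omega}\, \hat\psi(\omega)\, d\omega = \lim_{N \to +\infty} \frac{2\pi}{T} \int_{-(N+1/2)T}^{(N+1/2)T} \psi(t)\, dt. \tag{2.40}$$
When $N$ goes to $+\infty$ the integral converges to $\hat\psi(0) = \hat\phi(0)$.

2.3 Properties $^1$

2.3.1 Regularity and Decay

The global regularity of a signal $f$ depends on the decay of $|\hat f(\omega)|$ when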
the frequency $\omega$ increases. The differentiability of $f$ is studied. If
$\hat f \in L^1(\mathbb{R})$, then the Fourier inversion formula (2.8) implies that $f$ is
continuous and bounded:
$$|f(t)| \le \frac{1}{2\pi} \int_{-\infty}^{+\infty} |e^{i\omega t} \hat f(\omega)|\, d\omega = \frac{1}{2\pi} \int_{-\infty}^{+\infty} |\hat f(\omega)|\, d\omega < +\infty. \tag{2.41}$$
The next proposition applies this property to obtain a sufficient condition that guarantees the differentiability of $f$ at any order $p$.

Proposition 2.1  A function $f$ is bounded and $p$ times continuously
differentiable with bounded derivatives if
$$\int_{-\infty}^{+\infty} |\hat f(\omega)|\, (1 + |\omega|^p)\, d\omega < +\infty. \tag{2.42}$$

Proof $^2$. The Fourier transform of the $k$th order derivative $f^{(k)}(t)$ is
$(i\omega)^k \hat f(\omega)$. Applying (2.41) to this derivative proves that
$$|f^{(k)}(t)| \le \int_{-\infty}^{+\infty} |\hat f(\omega)|\, |\omega|^k\, d\omega .$$
Condition (2.42) implies that $\int_{-\infty}^{+\infty} |\hat f(\omega)|\, |\omega|^k\, d\omega < +\infty$ for any $k \le p$,
so $f^{(k)}(t)$ is continuous and bounded.

This result proves that if there exist a constant $K$ and $\epsilon > 0$ such that
$$|\hat f(\omega)| \le \frac{K}{1 + |\omega|^{p+1+\epsilon}}, \quad \text{then } f \in C^p .$$
If $\hat f$ has a compact support then (2.42) implies that $f \in C^\infty$.
The decay of $|\hat f(\omega)|$ depends on the worst singular behavior of $f$.
For example, $f = \mathbf{1}_{[-T,T]}$ is discontinuous at $t = \pm T$, so $|\hat f(\omega)|$ decays
like $|\omega|^{-1}$. In this case, it could also be important to know that $f(t)$
is regular for $t \ne \pm T$. This information cannot be derived from the
decay of $|\hat f(\omega)|$. To characterize local regularity of a signal $f$ it is
necessary to decompose it over waveforms that are well localized in
time, as opposed to sinusoidal waves $e^{i\omega t}$. Section 6.1.3 explains that
wavelets are particularly well adapted to this purpose.

2.3.2 Uncertainty Principle

Can we construct a function $f$ whose energy is well localized in time
and whose Fourier transform $\hat f$ has an energy concentrated in a small
frequency neighborhood? The Dirac $\delta(t - u)$ has a support restricted
to $t = u$ but its Fourier transform $e^{-iu\omega}$ has an energy uniformly spread
over all frequencies. We know that $|\hat f(\omega)|$ decays quickly at high frequencies only if $f$ has regular variations in time. The energy of $f$ must
therefore be spread over a relatively large domain.
To reduce the time spread of $f$, we can scale it by $s < 1$ while
keeping its total energy constant. If
$$f_s(t) = \frac{1}{\sqrt{s}}\, f\left( \frac{t}{s} \right) \quad \text{then} \quad \| f_s \|^2 = \| f \|^2 .$$
The Fourier transform $\hat f_s(\omega) = \sqrt{s}\, \hat f(s\omega)$ is dilated by $1/s$, so we lose in
frequency localization what we gained in time. Underlying is a trade-off
between time and frequency localization.
Time and frequency energy concentrations are restricted by the
Heisenberg uncertainty principle. This principle has a particularly important interpretation in quantum mechanics as an uncertainty as to
the position and momentum of a free particle. The state of a one-dimensional particle is described by a wave function $f \in L^2(\mathbb{R})$. The
probability density that this particle is located at $t$ is $\frac{1}{\|f\|^2} |f(t)|^2$. The
probability density that its momentum is equal to $\omega$ is $\frac{1}{2\pi \|f\|^2} |\hat f(\omega)|^2$.
The average location of this particle is
$$u = \frac{1}{\|f\|^2} \int_{-\infty}^{+\infty} t\, |f(t)|^2\, dt \tag{2.43}$$
and the average momentum is
$$\xi = \frac{1}{2\pi \|f\|^2} \int_{-\infty}^{+\infty} \omega\, |\hat f(\omega)|^2\, d\omega. \tag{2.44}$$
The variances around these average values are respectively
$$\sigma_t^2 = \frac{1}{\|f\|^2} \int_{-\infty}^{+\infty} (t - u)^2\, |f(t)|^2\, dt \tag{2.45}$$
and
$$\sigma_\omega^2 = \frac{1}{2\pi \|f\|^2} \int_{-\infty}^{+\infty} (\omega - \xi)^2\, |\hat f(\omega)|^2\, d\omega. \tag{2.46}$$
The larger $\sigma_t$, the more uncertainty there is concerning the position of
the free particle; the larger $\sigma_\omega$, the more uncertainty there is concerning
its momentum.
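The variances (2.45) and (2.46) can be estimated numerically for simple test functions. The sketch below rests on assumptions that go beyond the text: the function is real valued, so that its average frequency is zero, and the frequency variance is evaluated in the time domain as the energy of the derivative divided by the energy of the function (the Plancherel formula applied to the transform of the derivative). The helper name, the grid, and the two-sided exponential test function are illustrative choices.

```python
import numpy as np

def time_frequency_variances(f_vals, t):
    """Estimate the time and frequency variances of a real-valued f on a uniform grid.

    Assumptions of this sketch: f is real, so the average frequency is 0, and
    the frequency variance equals the energy of f' divided by the energy of f.
    """
    dt = t[1] - t[0]
    norm2 = np.sum(f_vals ** 2) * dt                   # ||f||^2
    u = np.sum(t * f_vals ** 2) * dt / norm2           # average location (2.43)
    var_t = np.sum((t - u) ** 2 * f_vals ** 2) * dt / norm2
    df = np.gradient(f_vals, dt)                       # finite-difference derivative
    var_omega = np.sum(df ** 2) * dt / norm2
    return var_t, var_omega

t = np.linspace(-20.0, 20.0, 40001)

vt_g, vw_g = time_frequency_variances(np.exp(-t ** 2), t)        # Gaussian
vt_e, vw_e = time_frequency_variances(np.exp(-np.abs(t)), t)     # two-sided exponential
print(vt_g * vw_g, vt_e * vw_e)
```

For the Gaussian the product of the two variances comes out close to 1/4, while for the two-sided exponential it is close to 1/2. Narrowing either function in time would raise its frequency variance by the same factor, leaving the product unchanged.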
Theorem 2.5 (Heisenberg Uncertainty)  The temporal variance and
the frequency variance of $f \in L^2(\mathbb{R})$ satisfy
$$\sigma_t^2\, \sigma_\omega^2 \ge \frac{1}{4}. \tag{2.47}$$
This inequality is an equality if and only if there exist $(u, \xi, a, b) \in
\mathbb{R}^2 \times \mathbb{C}^2$ such that
$$f(t) = a\, e^{i\xi t}\, e^{-b(t - u)^2}. \tag{2.48}$$

Proof $^2$. The following proof due to Weyl [75] supposes that $\lim_{|t| \to +\infty} \sqrt{t}\, f(t) = 0$, but the theorem is valid for any $f \in L^2(\mathbb{R})$. If the average time and
frequency localization of $f$ is $u$ and $\xi$, then the average time and frequency location of $\exp(-i\xi t)\, f(t + u)$ is zero. It is thus sufficient to
prove the theorem for $u = \xi = 0$. Observe that
$$\sigma_t^2\, \sigma_\omega^2 = \frac{1}{2\pi \|f\|^4} \int_{-\infty}^{+\infty} |t f(t)|^2\, dt \int_{-\infty}^{+\infty} |\omega \hat f(\omega)|^2\, d\omega. \tag{2.49}$$
Since $i\omega \hat f(\omega)$ is the Fourier transform of $f'(t)$, the Plancherel identity
(2.26) applied to $i\omega \hat f(\omega)$ yields
$$\sigma_t^2\, \sigma_\omega^2 = \frac{1}{\|f\|^4} \int_{-\infty}^{+\infty} |t f(t)|^2\, dt \int_{-\infty}^{+\infty} |f'(t)|^2\, dt. \tag{2.50}$$
Schwarz's inequality implies
$$\begin{aligned}
\sigma_t^2\, \sigma_\omega^2 &\ge \frac{1}{\|f\|^4} \left[ \int_{-\infty}^{+\infty} |t\, f'(t)\, f^*(t)|\, dt \right]^2 \\
&\ge \frac{1}{\|f\|^4} \left[ \int_{-\infty}^{+\infty} \frac{t}{2}\, [f'(t)\, f^*(t) + f'^*(t)\, f(t)]\, dt \right]^2 \\
&\ge \frac{1}{4 \|f\|^4} \left[ \int_{-\infty}^{+\infty} t\, (|f(t)|^2)'\, dt \right]^2 .
\end{aligned}$$
Since $\lim_{|t| \to +\infty} \sqrt{t}\, f(t) = 0$, an integration by parts gives
$$\sigma_t^2\, \sigma_\omega^2 \ge \frac{1}{4 \|f\|^4} \left[ \int_{-\infty}^{+\infty} |f(t)|^2\, dt \right]^2 = \frac{1}{4}. \tag{2.51}$$
To obtain an equality, Schwarz's inequality applied to (2.50) must be an
equality. This implies that there exists $b \in \mathbb{C}$ such that
$$f'(t) = -2\, b\, t\, f(t). \tag{2.52}$$
Hence, there exists $a \in \mathbb{C}$ such that $f(t) = a \exp(-bt^2)$. The other
steps of the proof are then equalities so that the lower bound is indeed
reached. When $u \ne 0$ and $\xi \ne 0$ the corresponding time and frequency
translations yield (2.48).

In quantum mechanics, this theorem shows that we cannot reduce arbitrarily the uncertainty as to the position and the momentum of a free
particle. In signal processing, the modulated Gaussians (2.48) that have
a minimum joint time-frequency localization are called Gabor chirps.
As expected, they are smooth functions with a fast time asymptotic
decay.

Compact Support  Despite the Heisenberg uncertainty bound, we might still be able to construct a function of compact support whose
Fourier transform has a compact support. Such a function would be
very useful in constructing a finite impulse response filter with a band-limited transfer function. Unfortunately, the following theorem proves
that it does not exist.

Theorem 2.6  If $f \ne 0$ has a compact support then $\hat f(\omega)$ cannot be
zero on a whole interval. Similarly, if $\hat f \ne 0$ has a compact support
then $f(t)$ cannot be zero on a whole interval.

Proof $^2$. We prove only the second statement, since the first is derived
from the second by applying the Fourier transform. If $\hat f$ has a compact
support included in $[-b, b]$ then
$$f(t) = \frac{1}{2\pi} \int_{-b}^{b} \hat f(\omega) \exp(i\omega t)\, d\omega. \tag{2.53}$$
If $f(t) = 0$ for $t \in [c, d]$, by differentiating $n$ times under the integral at
$t_0 = (c + d)/2$, we obtain
$$f^{(n)}(t_0) = \frac{1}{2\pi} \int_{-b}^{b} \hat f(\omega)\, (i\omega)^n \exp(i\omega t_0)\, d\omega = 0. \tag{2.54}$$
Since
$$f(t) = \frac{1}{2\pi} \int_{-b}^{b} \hat f(\omega) \exp[i\omega(t - t_0)] \exp(i\omega t_0)\, d\omega, \tag{2.55}$$
developing $\exp[i\omega(t - t_0)]$ as an infinite series yields for all $t \in \mathbb{R}$
$$f(t) = \frac{1}{2\pi} \sum_{n=0}^{+\infty} \frac{[i(t - t_0)]^n}{n!} \int_{-b}^{b} \hat f(\omega)\, \omega^n \exp(i\omega t_0)\, d\omega = 0. \tag{2.56}$$
This contradicts our assumption that $f \ne 0$.

2.3.3 Total Variation
The total variation measures the total amplitude of signal oscillations. It plays an important role in image processing, where its value depends on the length of the image level sets. We show that a low-pass filter can considerably amplify the total variation by creating Gibbs oscillations.

Variations and Oscillations
If $f$ is differentiable, its total variation is defined by

$$\|f\|_V = \int_{-\infty}^{+\infty} |f'(t)|\, dt. \tag{2.57}$$

If $\{x_p\}_p$ are the abscissas of the local extrema of $f$ where $f'(x_p) = 0$, then $\|f\|_V = \sum_p |f(x_{p+1}) - f(x_p)|$. It thus measures the total amplitude of the oscillations of $f$. For example, if $f(t) = e^{-t^2}$, then $\|f\|_V = 2$. If $f(t) = \sin(\pi t)/(\pi t)$, then $f$ has a local extremum at $x_p \in [p, p+1]$ for any $p \in \mathbb{Z}$. Since $|f(x_{p+1}) - f(x_p)| \sim |p|^{-1}$, we derive that $\|f\|_V = +\infty$.
The total variation of non-differentiable functions can be calculated by considering the derivative in the general sense of distributions [66, 79]. This is equivalent to approximating the derivative by a finite difference on an interval $h$ that goes to zero:

$$\|f\|_V = \lim_{h \to 0} \int_{-\infty}^{+\infty} \frac{|f(t) - f(t - h)|}{|h|}\, dt. \tag{2.58}$$

The total variation of discontinuous functions is thus well defined. For example, if $f = \mathbf{1}_{[a,b]}$ then (2.58) gives $\|f\|_V = 2$. We say that $f$ has a bounded variation if $\|f\|_V < +\infty$.
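As a rough numerical check of these examples, the sketch below replaces the finite-difference integral by a sum of absolute differences between consecutive samples. The function names, sampling windows and sample counts are illustrative assumptions, not part of the text.

```python
import math

def total_variation(f, a, b, n):
    """Sum of |f(t_k) - f(t_{k-1})| over n+1 uniform samples of f on [a, b]."""
    step = (b - a) / n
    samples = [f(a + k * step) for k in range(n + 1)]
    return sum(abs(samples[k] - samples[k - 1]) for k in range(1, n + 1))

def gaussian(t):
    return math.exp(-t * t)

def indicator(t):
    # Indicator of [0, 1]: two jumps of height 1.
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

def sinc(t):
    return 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)

tv_gaussian = total_variation(gaussian, -10.0, 10.0, 20000)
tv_indicator = total_variation(indicator, -2.0, 3.0, 5001)
# For sin(pi t)/(pi t) the sum keeps growing when the window is widened.
tv_sinc_narrow = total_variation(sinc, -10.0, 10.0, 2000)
tv_sinc_wide = total_variation(sinc, -1000.0, 1000.0, 200000)
```

In this sketch the Gaussian and the indicator both give values close to 2, whereas the value for $\sin(\pi t)/(\pi t)$ increases with the window width, which is consistent with an infinite total variation.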
Whether $f'$ is the standard derivative of $f$ or its generalized derivative in the sense of distributions, its Fourier transform is $\widehat{f'}(\omega) = i\omega \hat f(\omega)$. Hence

$$|\omega|\, |\hat f(\omega)| \le \int_{-\infty}^{+\infty} |f'(t)|\, dt = \|f\|_V,$$

which implies that

$$|\hat f(\omega)| \le \frac{\|f\|_V}{|\omega|}. \tag{2.59}$$

However, $|\hat f(\omega)| = O(|\omega|^{-1})$ is not a sufficient condition to guarantee that $f$ has bounded variation. For example, if $f(t) = \sin(\pi t)/(\pi t)$, then $\hat f = \mathbf{1}_{[-\pi,\pi]}$ satisfies $|\hat f(\omega)| \le \pi |\omega|^{-1}$ although $\|f\|_V = +\infty$. In general,
the total variation of $f$ cannot be evaluated from $|\hat f(\omega)|$.

Discrete Signals
Let $f_N[n] = f(n/N)$ be a discrete signal obtained with a uniform sampling at intervals $N^{-1}$. The discrete total variation is calculated by approximating the signal derivative by a finite difference over the sampling distance $h = N^{-1}$, and replacing the integral (2.58) by a Riemann sum, which gives:

$$\|f_N\|_V = \sum_n |f_N[n] - f_N[n-1]|. \tag{2.60}$$

If $n_p$ are the abscissas of the local extrema of $f_N$, then

$$\|f_N\|_V = \sum_p |f_N[n_{p+1}] - f_N[n_p]|.$$

The total variation thus measures the total amplitude of the oscillations of $f$. In accordance with (2.58), we say that the discrete signal has a bounded variation if $\|f_N\|_V$ is bounded by a constant independent of the resolution $N$.

Gibbs Oscillations
Filtering a signal with a low-pass filter can create oscillations that have an infinite total variation. Let $f_\xi = f \star h_\xi$ be the filtered signal obtained with an ideal low-pass filter whose transfer function is $\hat h_\xi = \mathbf{1}_{[-\xi,\xi]}$. If $f \in L^2(\mathbb{R})$, then $f_\xi$ converges to $f$ in $L^2(\mathbb{R})$ norm: $\lim_{\xi \to +\infty} \|f - f_\xi\| = 0$. Indeed, $\hat f_\xi = \hat f\, \mathbf{1}_{[-\xi,\xi]}$ and the Plancherel formula (2.26) implies that

$$\|f - f_\xi\|^2 = \frac{1}{2\pi} \int_{-\infty}^{+\infty} |\hat f(\omega) - \hat f_\xi(\omega)|^2\, d\omega = \frac{1}{2\pi} \int_{|\omega| > \xi} |\hat f(\omega)|^2\, d\omega,$$

which goes to zero as $\xi$ increases. However, if $f$ is discontinuous in $t_0$, then we show that $f_\xi$ has Gibbs oscillations in the neighborhood of $t_0$, which prevents $\sup_{t \in \mathbb{R}} |f(t) - f_\xi(t)|$ from converging to zero as $\xi$ increases.
Let $f$ be a bounded variation function $\|f\|_V < +\infty$ that has an isolated discontinuity at $t_0$, with a left limit $f(t_0^-)$ and right limit $f(t_0^+)$. It is decomposed as a sum of $f_c$, which is continuous in the neighborhood of $t_0$, plus a Heaviside step of amplitude $f(t_0^+) - f(t_0^-)$:

$$f(t) = f_c(t) + [f(t_0^+) - f(t_0^-)]\, u(t - t_0),$$

with

$$u(t) = \begin{cases} 1 & \text{if } t \ge 0 \\ 0 & \text{otherwise.} \end{cases} \tag{2.61}$$

Hence

$$f_\xi(t) = f_c \star h_\xi(t) + [f(t_0^+) - f(t_0^-)]\, u \star h_\xi(t - t_0). \tag{2.62}$$

Since $f_c$ has bounded variation and is uniformly continuous in the neighborhood of $t_0$, one can prove (Problem 2.13) that $f_c \star h_\xi(t)$ converges uniformly to $f_c(t)$ in a neighborhood of $t_0$. The following proposition shows that this is not true for $u \star h_\xi$, which creates Gibbs oscillations.

Proposition 2.2 (Gibbs) For any $\xi > 0$,
$$u \star h_\xi(t) = \int_{-\infty}^{\xi t} \frac{\sin x}{\pi x}\, dx. \tag{2.63}$$

Proof $^2$. The impulse response of an ideal low-pass filter, calculated in (2.29), is $h_\xi(t) = \sin(\xi t)/(\pi t)$. Hence

$$u \star h_\xi(t) = \int_{-\infty}^{+\infty} u(\tau)\, \frac{\sin \xi(t - \tau)}{\pi (t - \tau)}\, d\tau = \int_{0}^{+\infty} \frac{\sin \xi(t - \tau)}{\pi (t - \tau)}\, d\tau.$$

The change of variable $x = \xi(t - \tau)$ gives (2.63).

[Figure 2.1, four panels from left to right: $f(t)$, $f \star h_{4\xi}(t)$, $f \star h_{2\xi}(t)$, $f \star h_\xi(t)$; horizontal axis from 0 to 1.]

Figure 2.1: Gibbs oscillations created by low-pass filters with cut-off frequencies that decrease from left to right.
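The integral in Proposition 2.2 can be evaluated numerically. The sketch below is only an illustration: it assumes the classical value $\int_{-\infty}^{0} \sin x/(\pi x)\, dx = 1/2$ and integrates the remaining part with a composite Simpson rule. The helper names `sine_integral` and `gibbs_sigmoid` are hypothetical.

```python
import math

def sine_integral(x, steps=20000):
    """Integral from 0 to x of sin(u)/u du, by composite Simpson's rule (steps even)."""
    if x == 0.0:
        return 0.0

    def integrand(u):
        return 1.0 if u == 0.0 else math.sin(u) / u

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return total * h / 3.0

def gibbs_sigmoid(x):
    """Integral from -infinity to x of sin(u)/(pi u) du, written as 1/2 + Si(x)/pi."""
    return 0.5 + sine_integral(x) / math.pi

# Overshoot above the unit step, at the first maximum x = pi.
overshoot = gibbs_sigmoid(math.pi) - 1.0
```

With this sketch the overshoot at $x = \pi$ evaluates to about 0.0895 for a unit step, whatever the cut-off frequency, since $\xi$ only rescales the time axis.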
The function

$$s(\xi t) = \int_{-\infty}^{\xi t} \frac{\sin x}{\pi x}\, dx$$

is a sigmoid that increases from 0 at $t = -\infty$ to 1 at $t = +\infty$, with $s(0) = 1/2$. It has oscillations of period $\pi/\xi$, which are attenuated when the distance to 0 increases, but their total variation is infinite: $\|s\|_V = +\infty$. The maximum amplitude of the Gibbs oscillations occurs at $t = \pm \pi/\xi$, with an amplitude independent of $\xi$:

$$A = s(\pi) - 1 = \int_{-\infty}^{\pi} \frac{\sin x}{\pi x}\, dx - 1 \approx 0.0895.$$

Inserting (2.63) in (2.62) shows that

$$f(t) - f_\xi(t) = [f(t_0^+) - f(t_0^-)]\, s(\xi(t - t_0)) + \epsilon(\xi, t), \tag{2.64}$$

where $\lim_{\xi \to +\infty} \sup_{|t - t_0| < \alpha} |\epsilon(\xi, t)| = 0$ in some neighborhood of size $\alpha > 0$ around $t_0$. The sigmoid $s(\xi(t - t_0))$ centered at $t_0$ creates a maximum error of fixed amplitude for all $\xi$. This is seen in Figure 2.1, where the Gibbs oscillations have an amplitude proportional to the jump $f(t_0^+) - f(t_0^-)$ at all frequencies $\xi$.

Image Total Variation
The total variation of an image $f(x_1, x_2)$ depends on the amplitude of its variations as well as the length of the contours along which they occur. Suppose that $f(x_1, x_2)$ is differentiable. The total variation is defined by

$$\|f\|_V = \iint |\vec\nabla f(x_1, x_2)|\, dx_1\, dx_2, \tag{2.65}$$

where the modulus of the gradient vector is

$$|\vec\nabla f(x_1, x_2)| = \left( \left| \frac{\partial f(x_1, x_2)}{\partial x_1} \right|^2 + \left| \frac{\partial f(x_1, x_2)}{\partial x_2} \right|^2 \right)^{1/2}.$$

As in one dimension, the total variation is extended to discontinuous functions by taking the derivatives in the general sense of distributions.
An equivalent norm is obtained by approximating the partial derivatives by finite differences:

$$|\Delta_h f(x_1, x_2)| = \left( \left| \frac{f(x_1, x_2) - f(x_1 - h, x_2)}{h} \right|^2 + \left| \frac{f(x_1, x_2) - f(x_1, x_2 - h)}{h} \right|^2 \right)^{1/2}.$$

One can verify that

$$\|f\|_V \le \lim_{h \to 0} \iint |\Delta_h f(x_1, x_2)|\, dx_1\, dx_2 \le \sqrt{2}\, \|f\|_V. \tag{2.66}$$

The finite difference integral gives a larger value when $f(x_1, x_2)$ is discontinuous along a diagonal line in the $(x_1, x_2)$ plane.
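The two bounds in (2.66) can be observed on sampled indicator functions. The sketch below approximates the finite-difference integral by a Riemann sum on an $N \times N$ grid over the unit square with $h = 1/N$. The grid size, the three test regions and the helper names are assumptions chosen for illustration.

```python
import math

def image_total_variation(image):
    """(1/N) * sum of the moduli of backward finite-difference gradients.

    image is an N x N list of lists indexed as image[n1][n2]; samples with
    n1 = 0 or n2 = 0 are skipped so that both differences exist.
    """
    n = len(image)
    total = 0.0
    for n1 in range(1, n):
        for n2 in range(1, n):
            d1 = image[n1][n2] - image[n1 - 1][n2]
            d2 = image[n1][n2] - image[n1][n2 - 1]
            total += math.hypot(d1, d2)
    return total / n

def sampled_indicator(n, inside):
    """Indicator of a region sampled on an n x n grid; inside(n1, n2) tests grid indices."""
    return [[1.0 if inside(n1, n2) else 0.0 for n2 in range(n)] for n1 in range(n)]

N = 256
vertical_edge = sampled_indicator(N, lambda n1, n2: n1 >= N // 2)  # edge of length 1
anti_diagonal = sampled_indicator(N, lambda n1, n2: n1 + n2 >= N)  # edge of length sqrt(2)
main_diagonal = sampled_indicator(N, lambda n1, n2: n1 > n2)       # edge of length sqrt(2)

tv_vertical = image_total_variation(vertical_edge)
tv_anti = image_total_variation(anti_diagonal)
tv_main = image_total_variation(main_diagonal)
```

In this sketch the vertical edge and the anti-diagonal edge give values close to their lengths (1 and $\sqrt{2}$), while the main-diagonal edge gives a value close to 2, that is $\sqrt{2}$ times its length. Which diagonal is overestimated is a consequence of using backward differences in both directions.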
The total variation of $f$ is related to the length of its level sets. Let us define

$$\Omega_y = \{ (x_1, x_2) \in \mathbb{R}^2 : f(x_1, x_2) > y \}.$$

If $f$ is continuous then the boundary $\partial\Omega_y$ of $\Omega_y$ is the level set of all $(x_1, x_2)$ such that $f(x_1, x_2) = y$. Let $H^1(\partial\Omega_y)$ be the length of $\partial\Omega_y$. Formally, this length is calculated in the sense of the monodimensional Hausdorff measure. The following theorem relates the total variation of $f$ to the length of its level sets.

Theorem 2.7 (Co-area Formula) If $\|f\|_V < +\infty$ then
$$\|f\|_V = \int_{-\infty}^{+\infty} H^1(\partial\Omega_y)\, dy. \tag{2.67}$$

Proof $^2$. The proof is highly technical and is given in [79]. We give an intuitive explanation when $f$ is continuously differentiable. In this case $\partial\Omega_y$ is a differentiable curve $x(y, s) \in \mathbb{R}^2$, which is parameterized by the arc-length $s$. Let $\vec\tau(x)$ be the vector tangent to this curve in the plane. The gradient $\vec\nabla f(x)$ is orthogonal to $\vec\tau(x)$. The Frenet coordinate system along $\partial\Omega_y$ is composed of $\vec\tau(x)$ and of the unit vector $\vec n(x)$ parallel to $\vec\nabla f(x)$. Let $ds$ and $dn$ be the Lebesgue measures in the direction of $\vec\tau$ and $\vec n$. We have

$$|\vec\nabla f(x)| = \vec\nabla f(x) \cdot \vec n = \frac{dy}{dn}, \tag{2.68}$$

where $dy$ is the differential of amplitudes across level sets. The idea of the proof is to decompose the total variation integral over the plane as an integral along the level sets and across level sets, which we write:

$$\|f\|_V = \iint |\vec\nabla f(x_1, x_2)|\, dx_1\, dx_2 = \iint_{\partial\Omega_y} |\vec\nabla f(x(y, s))|\, ds\, dn. \tag{2.69}$$

By using (2.68) we can get

$$\|f\|_V = \iint_{\partial\Omega_y} ds\, dy.$$

But $\int_{\partial\Omega_y} ds = H^1(\partial\Omega_y)$ is the length of the level set, which justifies (2.67).

The co-area formula gives an important geometrical interpretation of
the total image variation. Images are uniformly bounded so the integral (2.67) is calculated over a finite interval and is proportional to the average length of level sets. It is finite as long as the level sets are not fractal curves. Let $f = \alpha\, \mathbf{1}_\Omega$ be proportional to the indicator function of a set $\Omega \subset \mathbb{R}^2$ which has a boundary $\partial\Omega$ of length $L$. The co-area formula (2.67) implies that $\|f\|_V = \alpha L$. In general, bounded variation images must have step edges of finite length.

Figure 2.2: (a): The total variation of this image remains nearly constant when the resolution $N$ increases. (b): Level sets $\partial\Omega_y$ obtained by sampling uniformly the amplitude variable $y$.

Discrete Images
A camera measures light intensity with photoreceptors that sample it over a grid that is supposed to be uniform. For a resolution $N$, the sampling interval is $N^{-1}$ and the resulting image can be written $f_N[n_1, n_2] = f(n_1/N, n_2/N)$. Its total variation is defined by approximating derivatives by finite differences and the integral (2.66) by a Riemann sum:

$$\|f_N\|_V = \frac{1}{N} \sum_{n_1} \sum_{n_2} \Big( \big| f[n_1, n_2] - f[n_1 - 1, n_2] \big|^2 + \big| f[n_1, n_2] - f[n_1, n_2 - 1] \big|^2 \Big)^{1/2}. \tag{2.70}$$

In accordance with (2.66) we say that the image has bounded variation if $\|f_N\|_V$ is bounded by a constant independent of the resolution $N$.
The co-area formula proves that it depends on the length of the level sets as the image resolution increases. The $\sqrt{2}$ upper bound factor in (2.66) comes from the fact that the length of a diagonal line can be increased by $\sqrt{2}$ if it is approximated by a zig-zag line that remains on the horizontal and vertical segments of the image sampling grid. Figure 2.2(a) shows a bounded variation image and Figure 2.2(b) displays the level sets obtained by discretizing uniformly the amplitude variable $y$. The total variation of this image remains nearly constant as the resolution varies.

2.4 Two-Dimensional Fourier Transform $^1$
The Fourier transform in $\mathbb{R}^n$ is a straightforward extension of the one-dimensional Fourier transform. The two-dimensional case is briefly reviewed for image processing applications. The Fourier transform of a two-dimensional integrable function $f \in L^1(\mathbb{R}^2)$ is

$$\hat f(\omega_1, \omega_2) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(x_1, x_2)\, \exp[-i(\omega_1 x_1 + \omega_2 x_2)]\, dx_1\, dx_2. \tag{2.71}$$

In polar coordinates $\exp[i(\omega_1 x_1 + \omega_2 x_2)]$ can be rewritten

$$\exp[i(\omega_1 x_1 + \omega_2 x_2)] = \exp[i \xi (x_1 \cos\theta + x_2 \sin\theta)]$$

with $\xi = \sqrt{\omega_1^2 + \omega_2^2}$. It is a plane wave that propagates in the direction of $\theta$ and oscillates at the frequency $\xi$. The properties of a two-dimensional Fourier transform are essentially the same as in one dimension. We summarize a few important results.
If $f \in L^1(\mathbb{R}^2)$ and $\hat f \in L^1(\mathbb{R}^2)$ then

$$f(x_1, x_2) = \frac{1}{4\pi^2} \iint \hat f(\omega_1, \omega_2)\, \exp[i(\omega_1 x_1 + \omega_2 x_2)]\, d\omega_1\, d\omega_2. \tag{2.72}$$

If $f \in L^1(\mathbb{R}^2)$ and $h \in L^1(\mathbb{R}^2)$ then the convolution

$$g(x_1, x_2) = f \star h(x_1, x_2) = \iint f(u_1, u_2)\, h(x_1 - u_1, x_2 - u_2)\, du_1\, du_2$$

has a Fourier transform

$$\hat g(\omega_1, \omega_2) = \hat f(\omega_1, \omega_2)\, \hat h(\omega_1, \omega_2). \tag{2.73}$$

The Parseval formula proves that

$$\iint f(x_1, x_2)\, g^*(x_1, x_2)\, dx_1\, dx_2 = \frac{1}{4\pi^2} \iint \hat f(\omega_1, \omega_2)\, \hat g^*(\omega_1, \omega_2)\, d\omega_1\, d\omega_2. \tag{2.74}$$

If $f = g$, we obtain the Plancherel equality

$$\iint |f(x_1, x_2)|^2\, dx_1\, dx_2 = \frac{1}{4\pi^2} \iint |\hat f(\omega_1, \omega_2)|^2\, d\omega_1\, d\omega_2. \tag{2.75}$$

The Fourier transform of a finite energy function thus has finite energy. With the same density based argument as in one dimension, energy equivalence makes it possible to extend the Fourier transform to any function $f \in L^2(\mathbb{R}^2)$.

If $f \in L^2(\mathbb{R}^2)$ is separable, which means that $f(x_1, x_2) = g(x_1)\, h(x_2)$, then its Fourier transform is

$$\hat f(\omega_1, \omega_2) = \hat g(\omega_1)\, \hat h(\omega_2),$$

where $\hat g$ and $\hat h$ are the one-dimensional Fourier transforms of $g$ and $h$. For example, the indicator function

$$f(x_1, x_2) = \begin{cases} 1 & \text{if } |x_1| \le T,\ |x_2| \le T \\ 0 & \text{otherwise} \end{cases} \;=\; \mathbf{1}_{[-T,T]}(x_1)\, \mathbf{1}_{[-T,T]}(x_2)$$

is a separable function whose Fourier transform is derived from (2.28):
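$$\hat f(\omega_1, \omega_2) = \frac{4 \sin(T\omega_1)\, \sin(T\omega_2)}{\omega_1\, \omega_2}.$$

If $f(x_1, x_2)$ is rotated by $\theta$:

$$f_\theta(x_1, x_2) = f(x_1 \cos\theta - x_2 \sin\theta,\ x_1 \sin\theta + x_2 \cos\theta),$$

then its Fourier transform is rotated by the same angle, because the change of variable is orthogonal:

$$\hat f_\theta(\omega_1, \omega_2) = \hat f(\omega_1 \cos\theta - \omega_2 \sin\theta,\ \omega_1 \sin\theta + \omega_2 \cos\theta). \tag{2.76}$$

2.5 Problems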
2.1. $^1$ Prove that if $f \in L^1(\mathbb{R})$ then $\hat f(\omega)$ is a continuous function of $\omega$, and if $\hat f \in L^1(\mathbb{R})$ then $f(t)$ is also continuous.

2.2. $^1$ Prove the translation (2.18), scaling (2.20) and time derivative (2.21) properties of the Fourier transform.

2.3. $^1$ Let $f_r(t) = \mathrm{Real}[f(t)]$ and $f_i(t) = \mathrm{Ima}[f(t)]$ be the real and imaginary parts of $f(t)$. Prove that $\hat f_r(\omega) = [\hat f(\omega) + \hat f^*(-\omega)]/2$ and $\hat f_i(\omega) = [\hat f(\omega) - \hat f^*(-\omega)]/(2i)$.

2.4. $^1$ By using the Fourier transform, verify that

$$\int_{-\infty}^{+\infty} \frac{\sin^3 t}{t^3}\, dt = \frac{3\pi}{4} \quad\text{and}\quad \int_{-\infty}^{+\infty} \frac{\sin^4 t}{t^4}\, dt = \frac{2\pi}{3}.$$

2.5. $^1$ Show that the Fourier transform of $f(t) = \exp(-(a - ib)t^2)$ is

$$\hat f(\omega) = \sqrt{\frac{\pi}{a - ib}}\; \exp\left( -\frac{a + ib}{4(a^2 + b^2)}\, \omega^2 \right).$$

Hint: write a differential equation similar to (2.33).

2.6. $^2$ Riemann-Lebesgue. Prove that if $f \in L^1(\mathbb{R})$ then $\lim_{\omega \to \infty} \hat f(\omega) = 0$. Hint: Prove it first for $C^1$ functions with a compact support and use a density argument.

2.7. $^1$ Stability of passive circuits
(a) Let $p$ be a complex number with $\mathrm{Real}[p] < 0$. Compute the Fourier transforms of $f(t) = \exp(pt)\, \mathbf{1}_{[0,+\infty)}(t)$ and of $f(t) = t^n \exp(pt)\, \mathbf{1}_{[0,+\infty)}(t)$.
(b) A passive circuit relates the input voltage $f$ to the output voltage $g$ by a differential equation with constant coefficients:

$$\sum_{k=0}^{K} a_k\, f^{(k)}(t) = \sum_{k=0}^{M} b_k\, g^{(k)}(t).$$

Prove that this system is stable and causal if and only if the roots of the equation $\sum_{k=0}^{M} b_k z^k = 0$ have a strictly negative real part.
(c) A Butterworth filter satisfies

$$|\hat h(\omega)|^2 = \frac{1}{1 + (\omega/\omega_0)^{2N}}.$$

For $N = 3$, compute $\hat h(\omega)$ and $h(t)$ so that this filter can be implemented by a stable electronic circuit.

2.8. $^1$ For any $A > 0$, construct $f$ such that the time and frequency spread measured respectively by $\sigma_t$ and $\sigma_\omega$ in (2.46, 2.45) satisfy $\sigma_t > A$ and $\sigma_\omega > A$.

2.9. $^2$ Suppose that $f(t) \ge 0$ and that its support is in $[-T, T]$. Verify that $|\hat f(\omega)| \le \hat f(0)$. Let $\omega_c$ be the half-power point defined by $|\hat f(\omega_c)|^2 = |\hat f(0)|^2/2$ and $|\hat f(\omega)|^2 > |\hat f(0)|^2/2$ for $0 \le \omega < \omega_c$. Prove that $\omega_c T \ge \pi/2$.

2.10. $^1$ Hilbert transform
(a) Prove that if $\hat f(\omega) = 2/(i\omega)$ then $f(t) = \mathrm{sign}(t) = t/|t|$.
(b) Suppose that $f \in L^1(\mathbb{R})$ is a causal function, i.e., $f(t) = 0$ for $t < 0$. Let $\hat f_r(\omega) = \mathrm{Real}[\hat f(\omega)]$ and $\hat f_i(\omega) = \mathrm{Ima}[\hat f(\omega)]$. Prove that $\hat f_r = H \hat f_i$ and $\hat f_i = -H \hat f_r$ where $H$ is the Hilbert transform operator

$$Hg(x) = \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{g(u)}{x - u}\, du.$$

2.11. $^1$ Rectification. A rectifier computes $g(t) = |f(t)|$, for recovering the envelope of modulated signals [57].
(a) Show that if $f(t) = a(t) \sin \omega_0 t$ with $a(t) \ge 0$ then

$$\hat g(\omega) = -\frac{2}{\pi} \sum_{n=-\infty}^{+\infty} \frac{\hat a(\omega - 2n\omega_0)}{4n^2 - 1}.$$

(b) Suppose that $\hat a(\omega) = 0$ for $|\omega| > \omega_0$. Find $h$ such that $a(t) = h \star g(t)$.

2.12. $^2$ Amplitude modulation. For $0 \le n < N$, we suppose that $f_n(t)$ is real and that $\hat f_n(\omega) = 0$ for $|\omega| > \omega_0$.
(a) Double side-bands. An amplitude modulated multiplexed signal is defined by

$$g(t) = \sum_{n=0}^{N} f_n(t) \cos(2 n \omega_0 t).$$

Compute $\hat g(\omega)$ and verify that the width of its support is $4N\omega_0$. Find a demodulation algorithm that recovers each $f_n$ from $g$.
(b) Single side-band. We want to reduce the bandwidth of the multiplexed signal by 2. Find a modulation procedure that transforms each $f_n$ into a real signal $g_n$ such that $\hat g_n$ has a support included in $[-(n+1)\omega_0, -n\omega_0] \cup [n\omega_0, (n+1)\omega_0]$, with the possibility of recovering $f_n$ from $g_n$. Compute the bandwidth of $g = \sum_{n=0}^{N-1} g_n$, and find a demodulation algorithm that recovers each $f_n$ from $g$.

2.13. $^2$ Let $f_\xi = f \star h_\xi$ with $\hat h_\xi = \mathbf{1}_{[-\xi,\xi]}$. Suppose that $f$ has a bounded variation $\|f\|_V < +\infty$ and that it is continuous in a neighborhood of $t_0$. Prove that in a neighborhood of $t_0$, $f_\xi(t)$ converges uniformly to $f(t)$ when $\xi$ goes to $+\infty$.

2.14. $^1$ Tomography. Let $g_\theta(t)$ be the integral of $f(x_1, x_2)$ along the line $-x_1 \sin\theta + x_2 \cos\theta = t$, which has an angle $\theta$ and lies at a distance $|t|$ from the origin:

$$g_\theta(t) = \int_{-\infty}^{+\infty} f(-t \sin\theta + \rho \cos\theta,\ t \cos\theta + \rho \sin\theta)\, d\rho.$$

Prove that $\hat g_\theta(\omega) = \hat f(-\omega \sin\theta,\ \omega \cos\theta)$. How can we recover $f(x_1, x_2)$ from the tomographic projections $g_\theta(t)$ for $0 \le \theta < 2\pi$?

2.15. $^1$ Let $f(x_1, x_2)$ be an image which has a discontinuity of amplitude $A$ along a straight line having an angle $\theta$ in the plane $(x_1, x_2)$. Compute the amplitude of the Gibbs oscillations of $f \star h_\xi(x_1, x_2)$ as a function of $\xi$, $\theta$ and $A$, for $\hat h_\xi(\omega_1, \omega_2) = \mathbf{1}_{[-\xi,\xi]}(\omega_1)\, \mathbf{1}_{[-\xi,\xi]}(\omega_2)$.

Chapter 3
Discrete Revolution
Digital signal processing has taken over. First used in the 1950's at the service of analog signal processing to simulate analog transforms, digital algorithms have invaded most traditional fortresses, including television standards, speech processing, tape recording and all types of information manipulation. Analog computations performed with electronic circuits are faster than digital algorithms implemented with microprocessors, but are less precise and less flexible. Thus analog circuits are often replaced by digital chips once the computational performance of microprocessors is sufficient to operate in real time for a given application.
Whether sound recordings or images, most discrete signals are obtained by sampling an analog signal. Conditions for reconstructing an analog signal from a uniform sampling are studied. Once more, the Fourier transform is unavoidable because the eigenvectors of discrete time-invariant operators are sinusoidal waves. The Fourier transform is discretized for signals of finite size and implemented with a fast computational algorithm.

3.1 Sampling Analog Signals $^1$

The simplest way to discretize an analog signal $f$ is to record its sample values $\{f(nT)\}_{n \in \mathbb{Z}}$ at intervals $T$. An approximation of $f(t)$ at any $t \in \mathbb{R}$ may be recovered by interpolating these samples. The Whittaker sampling theorem gives a sufficient condition on the support of the Fourier transform $\hat f$ to compute $f(t)$ exactly. Aliasing and approximation errors are studied when this condition is not satisfied. More general sampling theorems are studied in Section 3.1.3 from a vector space point of view.

3.1.1 Whittaker Sampling Theorem

A discrete signal may be represented as a sum of Diracs. We associate
to any sample $f(nT)$ a Dirac $f(nT)\, \delta(t - nT)$ located at $t = nT$. A uniform sampling of $f$ thus corresponds to the weighted Dirac sum

$$f_d(t) = \sum_{n=-\infty}^{+\infty} f(nT)\, \delta(t - nT). \tag{3.1}$$

The Fourier transform of $\delta(t - nT)$ is $e^{-inT\omega}$ so the Fourier transform of $f_d$ is a Fourier series:

$$\hat f_d(\omega) = \sum_{n=-\infty}^{+\infty} f(nT)\, e^{-inT\omega}. \tag{3.2}$$

To understand how to compute $f(t)$ from the sample values $f(nT)$ and hence $f$ from $f_d$, we relate their Fourier transforms $\hat f$ and $\hat f_d$.

Proposition 3.1 The Fourier transform of the discrete signal obtained by sampling $f$ at intervals $T$ is

$$\hat f_d(\omega) = \frac{1}{T} \sum_{k=-\infty}^{+\infty} \hat f\left( \omega - \frac{2k\pi}{T} \right). \tag{3.3}$$

Proof $^1$. Since $\delta(t - nT)$ is zero outside $t = nT$,

$$f(nT)\, \delta(t - nT) = f(t)\, \delta(t - nT),$$

so we can rewrite (3.1) as multiplication with a Dirac comb:

$$f_d(t) = f(t) \sum_{n=-\infty}^{+\infty} \delta(t - nT) = f(t)\, c(t). \tag{3.4}$$

Computing the Fourier transform yields

$$\hat f_d(\omega) = \frac{1}{2\pi}\, \hat f \star \hat c(\omega). \tag{3.5}$$

The Poisson formula (2.4) proves that

$$\hat c(\omega) = \frac{2\pi}{T} \sum_{k=-\infty}^{+\infty} \delta\left( \omega - \frac{2\pi k}{T} \right). \tag{3.6}$$

Since $\hat f \star \delta(\omega - \xi) = \hat f(\omega - \xi)$, inserting (3.6) in (3.5) proves (3.3).

Proposition 3.1 proves that sampling $f$ at intervals $T$ is equivalent to making its Fourier transform $2\pi/T$ periodic by summing all its translations $\hat f(\omega - 2k\pi/T)$. The resulting sampling theorem was first proved by Whittaker [349] in 1935 in a book on interpolation theory. Shannon rediscovered it in 1949 for applications to communication theory [306].

Theorem 3.1 (Shannon, Whittaker) If the support of $\hat f$ is included
in $[-\pi/T, \pi/T]$ then

$$f(t) = \sum_{n=-\infty}^{+\infty} f(nT)\, h_T(t - nT), \tag{3.7}$$

with

$$h_T(t) = \frac{\sin(\pi t/T)}{\pi t/T}. \tag{3.8}$$

Proof $^1$. If $n \neq 0$, the support of $\hat f(\omega - 2n\pi/T)$ does not intersect the support of $\hat f(\omega)$ because $\hat f(\omega) = 0$ for $|\omega| > \pi/T$. So (3.3) implies

$$\hat f_d(\omega) = \frac{\hat f(\omega)}{T} \quad \text{if } |\omega| \le \frac{\pi}{T}. \tag{3.9}$$

The Fourier transform of $h_T$ is $\hat h_T = T\, \mathbf{1}_{[-\pi/T, \pi/T]}$. Since the support of $\hat f$ is in $[-\pi/T, \pi/T]$ it results from (3.9) that $\hat f(\omega) = \hat h_T(\omega)\, \hat f_d(\omega)$. The inverse Fourier transform of this equality gives

$$f(t) = h_T \star f_d(t) = h_T \star \sum_{n=-\infty}^{+\infty} f(nT)\, \delta(t - nT) = \sum_{n=-\infty}^{+\infty} f(nT)\, h_T(t - nT). \tag{3.10}$$

The sampling theorem imposes that the support of $\hat f$ is included in $[-\pi/T, \pi/T]$, which guarantees that $f$ has no abrupt variations between consecutive samples, and can thus be recovered with a smooth interpolation. Section 3.1.3 shows that one can impose other smoothness conditions to recover $f$ from its samples. Figure 3.1 illustrates the different steps of a sampling and reconstruction from samples, in both the time and Fourier domains.

3.1.2 Aliasing

The sampling interval $T$ is often imposed by computation or storage constraints and the support of $\hat f$ is generally not included in $[-\pi/T, \pi/T]$. In this case the interpolation formula (3.7) does not recover $f$. We analyze the resulting error and a filtering procedure to reduce it.
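Before turning to the error analysis, the interpolation formula (3.7) can be tried on a signal that does satisfy the support condition. The sketch below is an illustration under stated assumptions: the test signal $(\sin(\pi t/2)/(\pi t/2))^2$, whose Fourier transform is supported in $[-\pi, \pi]$, the choice $T = 1$, and the truncation of the sum to $|n| \le 500$ are all choices made here, not taken from the text.

```python
import math

def sinc_kernel(t, T):
    """h_T(t) = sin(pi t / T) / (pi t / T), with h_T(0) = 1."""
    x = math.pi * t / T
    return 1.0 if x == 0.0 else math.sin(x) / x

def interpolate(samples, first_index, T, t):
    """Truncated interpolation sum; samples[k] holds f((first_index + k) * T)."""
    return sum(value * sinc_kernel(t - (first_index + k) * T, T)
               for k, value in enumerate(samples))

def band_limited(t):
    """(sin(pi t / 2) / (pi t / 2))^2, band-limited to [-pi, pi]."""
    return sinc_kernel(t, 2.0) ** 2

T = 1.0
half_width = 500  # the infinite sum is truncated to |n| <= half_width
samples = [band_limited(n * T) for n in range(-half_width, half_width + 1)]

t = 0.37  # a point between two samples
reconstructed = interpolate(samples, -half_width, T, t)
error = abs(reconstructed - band_limited(t))
```

The remaining error in this sketch comes only from truncating the sum. It stays small here because the chosen test signal decays like $t^{-2}$.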
Figure 3.1: (a): Signal $f$ and its Fourier transform $\hat f$. (b): A uniform sampling of $f$ makes its Fourier transform periodic. (c): Ideal low-pass filter. (d): The filtering of (b) with (c) recovers $f$.

Proposition 3.1 proves that

$$\hat f_d(\omega) = \frac{1}{T} \sum_{k=-\infty}^{+\infty} \hat f\left( \omega - \frac{2k\pi}{T} \right). \tag{3.11}$$

Suppose that the support of $\hat f$ goes beyond $[-\pi/T, \pi/T]$. In general the support of $\hat f(\omega - 2k\pi/T)$ intersects $[-\pi/T, \pi/T]$ for several $k \neq 0$, as shown in Figure 3.2. This folding of high frequency components over a low frequency interval is called aliasing. In the presence of aliasing, the interpolated signal

$$h_T \star f_d(t) = \sum_{n=-\infty}^{+\infty} f(nT)\, h_T(t - nT)$$

has a Fourier transform

$$\hat f_d(\omega)\, \hat h_T(\omega) = T \hat f_d(\omega)\, \mathbf{1}_{[-\pi/T, \pi/T]}(\omega) = \mathbf{1}_{[-\pi/T, \pi/T]}(\omega) \sum_{k=-\infty}^{+\infty} \hat f\left( \omega - \frac{2k\pi}{T} \right), \tag{3.12}$$

which may be completely different from $\hat f(\omega)$ over $[-\pi/T, \pi/T]$. The signal $h_T \star f_d$ may not even be a good approximation of $f$, as shown by Figure 3.2.
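The folding can be seen directly on samples. In the sketch below, a cosine of frequency $\omega_0$ between $\pi/T$ and $2\pi/T$ is sampled at intervals $T$ and compared with a cosine at the folded frequency $2\pi/T - \omega_0$. The values of $T$ and $\omega_0$ and the range of sample indices are arbitrary choices made for illustration.

```python
import math

T = 1.0
omega0 = 1.6 * math.pi              # between pi/T and 2*pi/T
alias = 2.0 * math.pi / T - omega0  # folded frequency, inside [-pi/T, pi/T]

indices = range(-20, 21)
high = [math.cos(omega0 * n * T) for n in indices]
low = [math.cos(alias * n * T) for n in indices]

# Largest difference between the two sequences of samples.
max_gap = max(abs(a - b) for a, b in zip(high, low))
```

Both cosines produce the same samples up to rounding, so no interpolation scheme that only sees $f(nT)$ can tell them apart.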
Figure 3.2: (a): Signal $f$ and its Fourier transform $\hat f$. (b): Aliasing produced by an overlapping of $\hat f(\omega - 2k\pi/T)$ for different $k$, shown in dashed lines. (c): Ideal low-pass filter. (d): The filtering of (b) with (c) creates a low-frequency signal that is different from $f$.

Example 3.1 Let us consider a high frequency oscillation

$$f(t) = \cos(\omega_0 t) = \frac{e^{i\omega_0 t} + e^{-i\omega_0 t}}{2}.$$

Its Fourier transform is

$$\hat f(\omega) = \pi \big( \delta(\omega - \omega_0) + \delta(\omega + \omega_0) \big).$$

If $2\pi/T > \omega_0 > \pi/T$ then (3.12) yields

$$\hat f_d(\omega)\, \hat h_T(\omega) = \pi\, \mathbf{1}_{[-\pi/T, \pi/T]}(\omega) \sum_{k=-\infty}^{+\infty} \left[ \delta\left( \omega - \omega_0 - \frac{2k\pi}{T} \right) + \delta\left( \omega + \omega_0 - \frac{2k\pi}{T} \right) \right] = \pi \left[ \delta\left( \omega - \frac{2\pi}{T} + \omega_0 \right) + \delta\left( \omega + \frac{2\pi}{T} - \omega_0 \right) \right],$$

so

$$f_d \star h_T(t) = \cos\left[ \left( \frac{2\pi}{T} - \omega_0 \right) t \right].$$

The aliasing reduces the high frequency $\omega_0$ to a lower frequency $2\pi/T - \omega_0 \in [-\pi/T, \pi/T]$. The same frequency folding is observed in a film that samples a fast moving object without enough images per second. A wheel turning rapidly appears as turning much more slowly in the film.

Removal of Aliasing
To apply the sampling theorem, $f$ is approximated by the closest signal $\tilde f$ whose Fourier transform has a support in $[-\pi/T, \pi/T]$. The Plancherel formula (2.26) proves that
b2
^
~
~k2 = 1
kf ; f
2 Z;1 jf (!) ; f (!)j d! Z
1
b
^
^
~
=1
jf (! )j2 d! +
jf (! ) ; f (! )j2 d!:
2 j!j> =T
2 j!j =T
This distance is minimum when the second integral is zero and hence
1^
b
f~(!) = f^(!) 1 ; =T =T ](!) = T hT (!) f^(!):
(3.13) 80 CHAPTER 3. DISCRETE REVOLUTION 1
It corresponds to f~ = T f ?hT . The ltering of f by hT avoids the aliasb
ing by removing any frequency larger than =T . Since f~ has a support
in ; =T =T ], the sampling theorem proves that f~(t) can be recovered from the samples f~(nT ). An analog to digital converter is therefore composed of a lter that limits the frequency band to ; =T =T ],
followed by a uniform sampling at intervals T . 3.1.3 General Sampling Theorems The sampling theorem gives a su cient condition for reconstructing a
signal from its samples, but other su cient conditions can be established for di erent interpolation schemes 335]. To explain this new
point of view, the Whittaker sampling theorem is interpreted in more
abstract terms, as a signal decomposition in an orthogonal basis. Proposition 3.2 If hT (t) = sin( t=T )=( t=T ) then fhT (t ; nT )gn2Z
is an orthogonal basis of the space $U_T$ of functions whose Fourier transforms have a support included in $[-\pi/T, \pi/T]$. If $f \in U_T$ then

$$f(nT) = \frac{1}{T}\, \langle f(t), h_T(t - nT) \rangle. \tag{3.14}$$

Proof $^2$. Since $\hat h_T = T\, \mathbf{1}_{[-\pi/T, \pi/T]}$ the Parseval formula (2.25) proves that

$$\langle h_T(t - nT), h_T(t - pT) \rangle = \frac{1}{2\pi} \int_{-\infty}^{+\infty} T^2\, \mathbf{1}_{[-\pi/T, \pi/T]}(\omega)\, \exp[-i(n - p)T\omega]\, d\omega = \frac{T^2}{2\pi} \int_{-\pi/T}^{\pi/T} \exp[-i(n - p)T\omega]\, d\omega = T\, \delta[n - p].$$

The family $\{h_T(t - nT)\}_{n \in \mathbb{Z}}$ is therefore orthogonal. Clearly $h_T(t - nT) \in U_T$ and (3.7) proves that any $f \in U_T$ can be decomposed as a linear combination of $\{h_T(t - nT)\}_{n \in \mathbb{Z}}$. It is therefore an orthogonal basis of $U_T$.

Equation (3.14) is also proved with the Parseval formula

$$\langle f(t), h_T(t - nT) \rangle = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\, \hat h_T(\omega)\, \exp(inT\omega)\, d\omega.$$

Since the support of $\hat f$ is in $[-\pi/T, \pi/T]$ and $\hat h_T = T\, \mathbf{1}_{[-\pi/T, \pi/T]}$,

$$\langle f(t), h_T(t - nT) \rangle = \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} \hat f(\omega)\, \exp(inT\omega)\, d\omega = T f(nT).$$

Proposition 3.2 shows that the interpolation formula (3.7) can be interpreted as a decomposition of $f \in U_T$ in an orthogonal basis of $U_T$:

$$f(t) = \frac{1}{T} \sum_{n=-\infty}^{+\infty} \langle f(u), h_T(u - nT) \rangle\, h_T(t - nT). \tag{3.15}$$

If $f \notin U_T$, which means that $\hat f$ has a support not included in $[-\pi/T, \pi/T]$, the removal of aliasing is computed by finding the function $\tilde f \in U_T$ that minimizes $\|\tilde f - f\|$. Proposition A.2 proves that $\tilde f$ is the orthogonal projection $P_{U_T} f$ of $f$ in $U_T$.
The Whittaker sampling theorem is generalized by defining other spaces $U_T$ such that any $f \in U_T$ can be recovered by interpolating its samples $\{f(nT)\}_{n \in \mathbb{Z}}$. A signal $f \notin U_T$ is approximated by its orthogonal projection $\tilde f = P_{U_T} f$ in $U_T$, which is characterized by a uniform sampling $\{\tilde f(nT)\}_{n \in \mathbb{Z}}$.

Block Sampler
A block sampler approximates signals with piecewise constant functions. The approximation space $U_T$ is the set of all functions that are constant on intervals $[nT, (n+1)T)$, for any $n \in \mathbb{Z}$. Let $h_T = \mathbf{1}_{[0,T)}$. The family $\{h_T(t - nT)\}_{n \in \mathbb{Z}}$ is clearly an orthogonal basis of $U_T$. Any $f \in U_T$ can be written

$$f(t) = \sum_{n=-\infty}^{+\infty} f(nT)\, h_T(t - nT).$$

If $f \notin U_T$ then (A.17) shows that its orthogonal projection on $U_T$ is calculated with a partial decomposition in an orthogonal basis of $U_T$. Since $\|h_T(t - nT)\|^2 = T$,

$$P_{U_T} f(t) = \frac{1}{T} \sum_{n=-\infty}^{+\infty} \langle f(u), h_T(u - nT) \rangle\, h_T(t - nT). \tag{3.16}$$

Let $\bar h_T(t) = h_T(-t)$. Then

$$\langle f(u), h_T(u - nT) \rangle = \int_{nT}^{(n+1)T} f(t)\, dt = f \star \bar h_T(nT).$$

This averaging of $f$ over intervals of size $T$ is equivalent to the aliasing removal used for the Whittaker sampling theorem.

Approximation Space
The space $U_T$ should be chosen so that $P_{U_T} f$ gives an accurate approximation of $f$, for a given class of signals. The Whittaker interpolation approximates signals by restricting
their Fourier transform to a low frequency interval. It is particularly effective for smooth signals whose Fourier transforms have an energy concentrated at low frequencies. It is also well adapted to sound recordings, which are well approximated by lower frequency harmonics.

For discontinuous signals such as images, a low-frequency restriction produces the Gibbs oscillations studied in Section 2.3.3. The visual quality of the image is degraded by these oscillations, which have a total variation (2.65) that is infinite. A piecewise constant approximation has the advantage of creating no spurious oscillations, and one can prove that the projection in $U_T$ decreases the total variation: $\|P_{U_T} f\|_V \le \|f\|_V$. In domains where $f$ is a regular function, the piecewise constant approximation $P_{U_T} f$ may however be significantly improved. More precise approximations are obtained with spaces $U_T$ of higher order polynomial splines. These approximations can introduce small Gibbs oscillations, but these oscillations have a finite total variation. Section 7.6.1 studies the construction of interpolation bases used to recover signals from their samples, when the signals belong to spaces of polynomial splines and other spaces $U_T$.

3.2 Discrete Time-Invariant Filters $^1$

3.2.1 Impulse Response and Transfer Function

Classical discrete signal processing algorithms are mostly based on
time-invariant linear operators [55, 58]. The time-invariance is limited to translations on the sampling grid. To simplify notation, the sampling interval is normalized $T = 1$, and we denote $f[n]$ the sample values. A linear discrete operator $L$ is time-invariant if an input $f[n]$ delayed by $p \in \mathbb{Z}$, $f_p[n] = f[n - p]$, produces an output also delayed by $p$:

$$L f_p[n] = L f[n - p].$$

Impulse Response
We denote by $\delta[n]$ the discrete Dirac

$$\delta[n] = \begin{cases} 1 & \text{if } n = 0 \\ 0 & \text{if } n \neq 0. \end{cases} \tag{3.17}$$

Any signal $f[n]$ can be decomposed as a sum of shifted Diracs

$$f[n] = \sum_{p=-\infty}^{+\infty} f[p]\, \delta[n - p].$$

Let $L\delta[n] = h[n]$ be the discrete impulse response. The linearity and time-invariance implies that

$$L f[n] = \sum_{p=-\infty}^{+\infty} f[p]\, h[n - p] = f \star h[n]. \tag{3.18}$$

A discrete linear time-invariant operator is thus computed with a discrete convolution. If $h[n]$ has a finite support the sum (3.18) is calculated with a finite number of operations. These are called Finite Impulse Response (FIR) filters. Convolutions with infinite impulse response filters may also be calculated with a finite number of operations if they can be rewritten with a recursive equation (3.30).

Causality and Stability
A discrete filter $L$ is causal if $L f[p]$ depends only on the values of $f[n]$ for $n \le p$. The convolution formula (3.18) implies that $h[n] = 0$ if $n < 0$.
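The convolution (3.18) and the time-invariance property can be sketched for finitely supported signals as follows. Storing signals as dictionaries from integer index to value, the helper names `convolve` and `delay`, and the filter coefficients are illustrative choices made here.

```python
def convolve(f, h):
    """Discrete convolution g[n] = sum over p of f[p] * h[n - p].

    f and h are dicts mapping an integer index to a sample value; indices
    that are absent are treated as zeros.
    """
    g = {}
    for p, fp in f.items():
        for q, hq in h.items():
            g[p + q] = g.get(p + q, 0.0) + fp * hq
    return g

def delay(f, shift):
    """Delayed signal f_p[n] = f[n - shift]."""
    return {n + shift: value for n, value in f.items()}

h = {0: 0.5, 1: 0.3, 2: 0.2}   # causal FIR filter: h[n] = 0 for n < 0
f = {-1: 1.0, 0: 2.0, 3: -1.0}

output = convolve(f, h)
# Filtering a delayed input gives the delayed output.
delayed_output = convolve(delay(f, 4), h)
expected = delay(output, 4)
```

Because `h` vanishes for negative indices in this example, the first nonzero output sample cannot appear before the first nonzero input sample.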
The filter is stable if any bounded input signal $f[n]$ produces a bounded output signal $L f[n]$. Since

$$|L f[n]| \le \sup_{n \in \mathbb{Z}} |f[n]| \sum_{k=-\infty}^{+\infty} |h[k]|,$$

it is sufficient that $\sum_{n=-\infty}^{+\infty} |h[n]| < +\infty$, which means that $h \in \ell^1(\mathbb{Z})$. One can verify that this sufficient condition is also necessary. The impulse response $h$ is thus stable if $h \in \ell^1(\mathbb{Z})$.

Transfer Function
The Fourier transform plays a fundamental role in analyzing discrete time-invariant operators, because the discrete sinusoidal waves $e_\omega[n] = e^{i\omega n}$ are eigenvectors:

$$L e_\omega[n] = \sum_{p=-\infty}^{+\infty} e^{i\omega(n - p)}\, h[p] = e^{i\omega n} \sum_{p=-\infty}^{+\infty} h[p]\, e^{-i\omega p}. \tag{3.19}$$

The eigenvalue is a Fourier series

$$\hat h(\omega) = \sum_{p=-\infty}^{+\infty} h[p]\, e^{-i\omega p}. \tag{3.20}$$

It is the filter transfer function.

Example 3.2 The uniform discrete average

$$L f[n] = \frac{1}{2N + 1} \sum_{p=n-N}^{n+N} f[p]$$

is a time-invariant discrete filter whose impulse response is $h = (2N + 1)^{-1}\, \mathbf{1}_{[-N, N]}$. Its transfer function is

$$\hat h(\omega) = \frac{1}{2N + 1} \sum_{n=-N}^{+N} e^{-in\omega} = \frac{1}{2N + 1}\, \frac{\sin(N + 1/2)\omega}{\sin \omega/2}. \tag{3.21}$$

3.2.2 Fourier Series

The properties of Fourier series are essentially the same as the properties of the Fourier transform since Fourier series are particular instances of Fourier transforms for Dirac sums. If $f(t) = \sum_{n=-\infty}^{+\infty} f[n]\, \delta(t - n)$ then $\hat f(\omega) = \sum_{n=-\infty}^{+\infty} f[n]\, e^{-i\omega n}$.
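For a finitely supported sequence this Fourier series is a finite sum that can be evaluated directly. The sketch below does so for the uniform average of Example 3.2 and compares the result with the closed form (3.21). The helper names and the values of $N$ and $\omega$ are arbitrary illustrative choices.

```python
import cmath
import math

def fourier_series(h, omega):
    """Sum over n of h[n] * exp(-i omega n), for a dict h mapping n to h[n]."""
    return sum(value * cmath.exp(-1j * omega * n) for n, value in h.items())

def uniform_average(N):
    """Impulse response equal to 1/(2N+1) on [-N, N] and zero elsewhere."""
    return {n: 1.0 / (2 * N + 1) for n in range(-N, N + 1)}

def closed_form(N, omega):
    """sin((N + 1/2) omega) / ((2N + 1) sin(omega / 2)), for sin(omega / 2) != 0."""
    return math.sin((N + 0.5) * omega) / ((2 * N + 1) * math.sin(omega / 2.0))

N = 4
h = uniform_average(N)
omega = 0.7
direct = fourier_series(h, omega)
shifted = fourier_series(h, omega + 2.0 * math.pi)
```

In this sketch `direct` agrees with `closed_form(N, omega)` up to rounding, and evaluating the series at $\omega + 2\pi$ returns the same value.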
For any $n \in \mathbb{Z}$, $e^{-i\omega n}$ has period $2\pi$, so Fourier series have period $2\pi$. An important issue is to understand whether all functions with period $2\pi$ can be written as Fourier series. Such functions are characterized by their restriction to $[-\pi, \pi]$. We therefore consider functions $\hat a \in L^2[-\pi, \pi]$ that are square integrable over $[-\pi, \pi]$. The space $L^2[-\pi, \pi]$ is a Hilbert space with the inner product

$$\langle \hat a, \hat b \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat a(\omega)\, \hat b^*(\omega)\, d\omega \tag{3.22}$$

and the resulting norm

$$\|\hat a\|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |\hat a(\omega)|^2\, d\omega.$$

The following theorem proves that any function in $L^2[-\pi, \pi]$ can be written as a Fourier series.

Theorem 3.2 The family of functions $\{e^{-ik\omega}\}_{k \in \mathbb{Z}}$ is an orthonormal basis of $L^2[-\pi, \pi]$.
Proof ². The orthogonality with respect to the inner product (3.22) is established with a direct integration. To prove that $\{\exp(-ik\omega)\}_{k\in\mathbb{Z}}$ is a basis, we must show that linear expansions of these vectors are dense in $L^2[-\pi,\pi]$.

We first prove that any continuously differentiable function $\hat\phi$ with a support included in $[-\pi,\pi]$ satisfies
$$\hat\phi(\omega) = \sum_{k=-\infty}^{+\infty} \langle \hat\phi(\xi), e^{-ik\xi} \rangle\, \exp(-ik\omega) \tag{3.23}$$
with a pointwise convergence for any $\omega \in [-\pi,\pi]$. Let us compute the partial sum
$$\begin{aligned} S_N(\omega) &= \sum_{k=-N}^{N} \langle \hat\phi(\xi), \exp(-ik\xi) \rangle\, \exp(-ik\omega) \\ &= \sum_{k=-N}^{N} \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat\phi(\xi)\, \exp(ik\xi)\, d\xi\; \exp(-ik\omega) \\ &= \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat\phi(\xi) \sum_{k=-N}^{N} \exp[ik(\xi-\omega)]\, d\xi. \end{aligned}$$
The Poisson formula (2.37) proves the distribution equality
$$\lim_{N\to+\infty} \sum_{k=-N}^{N} \exp[ik(\xi-\omega)] = 2\pi \sum_{k=-\infty}^{+\infty} \delta(\xi-\omega-2\pi k),$$
and since the support of $\hat\phi$ is in $[-\pi,\pi]$ we get
$$\lim_{N\to+\infty} S_N(\omega) = \hat\phi(\omega).$$
Since $\hat\phi$ is continuously differentiable, following the steps (2.38-2.40) in the proof of the Poisson formula shows that $S_N(\omega)$ converges uniformly to $\hat\phi(\omega)$ on $[-\pi,\pi]$.

To prove that linear expansions of sinusoidal waves $\{\exp(-ik\omega)\}_{k\in\mathbb{Z}}$ are dense in $L^2[-\pi,\pi]$, let us verify that the distance between $\hat a \in L^2[-\pi,\pi]$ and such a linear expansion is less than $\epsilon$, for any $\epsilon > 0$. Continuously differentiable functions with a support included in $[-\pi,\pi]$ are dense in $L^2[-\pi,\pi]$, hence there exists $\hat\phi$ such that $\|\hat a - \hat\phi\| \le \epsilon/2$. The uniform pointwise convergence proves that there exists $N$ for which
$$\sup_{\omega\in[-\pi,\pi]} |S_N(\omega) - \hat\phi(\omega)| \le \frac{\epsilon}{2},$$
which implies that
$$\|S_N - \hat\phi\|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |S_N(\omega) - \hat\phi(\omega)|^2\, d\omega \le \frac{\epsilon^2}{4}.$$
It follows that $\hat a$ is approximated by the Fourier series $S_N$ with an error
$$\|\hat a - S_N\| \le \|\hat a - \hat\phi\| + \|\hat\phi - S_N\| \le \epsilon.$$
Theorem 3.2 proves that if $f \in l^2(\mathbb{Z})$, the Fourier series
$$\hat f(\omega) = \sum_{n=-\infty}^{+\infty} f[n]\, e^{-i\omega n} \tag{3.24}$$
can be interpreted as the decomposition of $\hat f \in L^2[-\pi,\pi]$ in an orthonormal basis. The Fourier series coefficients can thus be written as inner products
$$f[n] = \langle \hat f(\omega), e^{-i\omega n} \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat f(\omega)\, e^{i\omega n}\, d\omega. \tag{3.25}$$
The energy conservation of orthonormal bases (A.10) yields a Plancherel identity:
$$\sum_{n=-\infty}^{+\infty} |f[n]|^2 = \|\hat f\|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |\hat f(\omega)|^2\, d\omega. \tag{3.26}$$

Pointwise Convergence The equality (3.24) is meant in the sense of mean-square convergence
$$\lim_{N\to+\infty} \Big\| \hat f(\omega) - \sum_{k=-N}^{N} f[k]\, e^{-i\omega k} \Big\| = 0.$$
It does not imply a pointwise convergence at all $\omega \in \mathbb{R}$. In 1873,
du Bois-Reymond constructed a periodic function $\hat f(\omega)$ that is continuous and whose Fourier series diverges at some points. On the other hand, if $\hat f(\omega)$ is continuously differentiable, then the proof of Theorem 3.2 shows that its Fourier series converges uniformly to $\hat f(\omega)$ on $[-\pi,\pi]$. It was only in 1966 that Carleson [114] was able to prove that if $\hat f \in L^2[-\pi,\pi]$ then its Fourier series converges almost everywhere. The proof is however extremely technical.

Convolutions Since $\{e^{-i\omega k}\}_{k\in\mathbb{Z}}$ are eigenvectors of discrete convolution operators, we also have a discrete convolution theorem.

Theorem 3.3 If $f \in l^1(\mathbb{Z})$ and $h \in l^1(\mathbb{Z})$ then $g = f \star h \in l^1(\mathbb{Z})$ and
$$\hat g(\omega) = \hat f(\omega)\, \hat h(\omega). \tag{3.27}$$

The proof is identical to the proof of the convolution Theorem 2.2, if we replace integrals by discrete sums. The reconstruction formula (3.25) shows that a filtered signal can be written
$$f \star h[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat h(\omega)\, \hat f(\omega)\, e^{i\omega n}\, d\omega. \tag{3.28}$$
The transfer function $\hat h(\omega)$ amplifies or attenuates the frequency components $\hat f(\omega)$ of $f[n]$.

Example 3.3 An ideal discrete low-pass filter has a $2\pi$ periodic transfer function defined by $\hat h(\omega) = \mathbf{1}_{[-\xi,\xi]}(\omega)$, for $\omega \in [-\pi,\pi]$ and $0 < \xi < \pi$. Its impulse response is computed with (3.25):
$$h[n] = \frac{1}{2\pi} \int_{-\xi}^{\xi} e^{i\omega n}\, d\omega = \frac{\sin \xi n}{\pi n}. \tag{3.29}$$
It is a uniform sampling of the ideal analog low-pass filter (2.29).

Example 3.4 A recursive filter computes $g = Lf$ which is solution of
a recursive equation
$$\sum_{k=0}^{K} a_k\, f[n-k] = \sum_{k=0}^{M} b_k\, g[n-k] \tag{3.30}$$
with $b_0 \neq 0$. If $g[n] = 0$ and $f[n] = 0$ for $n < 0$ then $g$ has a linear and time-invariant dependency upon $f$, and can thus be written $g = f \star h$.
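The recursion (3.30) can be sketched directly in code. This is a minimal illustration under the stated assumption that $f[n] = g[n] = 0$ for $n < 0$; the function name `recursive_filter` and the coefficient values are hypothetical choices made for the example.

```python
def recursive_filter(a, b, f):
    """Solve sum_k a[k] f[n-k] = sum_k b[k] g[n-k] for g, with f[n] = g[n] = 0 when n < 0."""
    g = []
    for n in range(len(f)):
        acc = sum(a[k] * f[n - k] for k in range(len(a)) if n - k >= 0)
        acc -= sum(b[k] * g[n - k] for k in range(1, len(b)) if n - k >= 0)
        g.append(acc / b[0])  # requires b[0] != 0
    return g

# Hypothetical coefficients: g[n] = f[n] + 0.5 g[n-1].
a, b = [1.0], [1.0, -0.5]

# The response to a discrete Dirac gives the first samples of the impulse response h.
impulse = [1.0] + [0.0] * 7
h = recursive_filter(a, b, impulse)

# The recursion agrees with the causal convolution g = f * h on the computed samples.
f = [1.0, -2.0, 0.5, 3.0, 0.0, 1.0, 1.0, -1.0]
g = recursive_filter(a, b, f)
g_conv = [sum(f[p] * h[n - p] for p in range(n + 1)) for n in range(len(f))]
assert all(abs(x - y) < 1e-12 for x, y in zip(g, g_conv))
```

For these coefficients the impulse response is $h[n] = 0.5^n$ for $n \ge 0$, which never vanishes, while each output sample costs only one multiplication and one addition.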
The transfer function is obtained by computing the Fourier transform of (3.30). The Fourier transform of $f_k[n] = f[n-k]$ is $\hat f_k(\omega) = \hat f(\omega)\, e^{-ik\omega}$ so
$$\hat h(\omega) = \frac{\hat g(\omega)}{\hat f(\omega)} = \frac{\sum_{k=0}^{K} a_k\, e^{-ik\omega}}{\sum_{k=0}^{M} b_k\, e^{-ik\omega}}.$$
It is a rational function of $e^{-i\omega}$. If $b_k \neq 0$ for some $k > 0$ then one can verify that the impulse response $h$ has an infinite support. The stability of such filters is studied in Problem 3.8. A direct calculation of the convolution sum $g[n] = f \star h[n]$ would require an infinite number of operations whereas (3.30) computes $g[n]$ with $K + M$ additions and
multiplications from its past values.

Window Multiplication An infinite impulse response filter $h$ such as the ideal low-pass filter (3.29) may be approximated by a finite response filter $\tilde h$ by multiplying $h$ with a window $g$ of finite support:
$$\tilde h[n] = g[n]\, h[n].$$
One can verify that a multiplication in time is equivalent to a convolution in the frequency domain:
$$\hat{\tilde h}(\omega) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat h(\xi)\, \hat g(\omega-\xi)\, d\xi = \frac{1}{2\pi}\, \hat h \star \hat g(\omega). \tag{3.31}$$
Clearly $\hat{\tilde h} = \hat h$ only if $\hat g = 2\pi\delta$, which would imply that $g$ has an infinite support and $g[n] = 1$. The approximation $\hat{\tilde h}$ is close to $\hat h$ only if $\hat g$ approximates a Dirac, which means that all its energy is concentrated at low frequencies. In time, $g$ should therefore have smooth variations.
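The effect of the window smoothness can be illustrated numerically. The sketch below truncates the ideal low-pass filter (3.29) with a rectangular window and with a smoother raised-cosine (Hanning) window $\cos^2(\pi n/2N)$ on $[-N,N]$, then compares the largest value of $|\hat{\tilde h}(\omega)|$ for $\omega \in [\pi/2, \pi]$, where the ideal transfer function vanishes. The values $N = 32$ and $\xi = \pi/4$, the frequency grid, and all function names are assumptions made for the illustration.

```python
import math

def ideal_lowpass(n, xi):
    """Impulse response (3.29) of the ideal low-pass filter with cut-off xi."""
    return xi / math.pi if n == 0 else math.sin(xi * n) / (math.pi * n)

def rectangular(n, N):
    return 1.0 if abs(n) <= N else 0.0

def hanning(n, N):
    return math.cos(math.pi * n / (2 * N)) ** 2 if abs(n) <= N else 0.0

def transfer(h_tilde, N, omega):
    """Fourier series of a real symmetric filter stored for n = -N..N (so it is real)."""
    return sum(h_tilde[n + N] * math.cos(omega * n) for n in range(-N, N + 1))

def stopband_peak(window, N, xi, grid=200):
    """Largest |transfer function| over [pi/2, pi], far above the cut-off xi."""
    h_tilde = [window(n, N) * ideal_lowpass(n, xi) for n in range(-N, N + 1)]
    omegas = [math.pi / 2 + (math.pi / 2) * j / grid for j in range(grid + 1)]
    return max(abs(transfer(h_tilde, N, w)) for w in omegas)

N, xi = 32, math.pi / 4
rect_peak = stopband_peak(rectangular, N, xi)
hann_peak = stopband_peak(hanning, N, xi)
print(rect_peak, hann_peak)
```

With these settings the smoother window leaves a much smaller residual in the stop band, at the price of a wider transition around $\xi$.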
The rectangular window $g = \mathbf{1}_{[-N,N]}$ has a Fourier transform $\hat g$ computed in (3.21). It has important side lobes far away from $\omega = 0$. The resulting $\hat{\tilde h}$ is a poor approximation of $\hat h$. The Hanning window
$$g[n] = \cos^2\Big(\frac{\pi n}{2N}\Big)\, \mathbf{1}_{[-N,N]}[n]$$
is smoother and thus has a Fourier transform better concentrated at low frequencies. The spectral properties of other windows are studied in Section 4.2.2.

3.3 Finite Signals ¹
Up to now, we have considered discrete signals $f[n]$ defined for all $n \in \mathbb{Z}$. In practice, $f[n]$ is known over a finite domain, say $0 \le n < N$. Convolutions must therefore be modified to take into account the border effects at $n = 0$ and $n = N-1$. The Fourier transform must also be redefined over finite sequences for numerical computations. The fast Fourier transform algorithm is explained as well as its application to fast convolutions.

3.3.1 Circular Convolutions

Let $\tilde f$ and $\tilde h$ be signals of $N$ samples. To compute the convolution product
$$\tilde f \star \tilde h[n] = \sum_{p=-\infty}^{+\infty} \tilde f[p]\, \tilde h[n-p] \quad \text{for } 0 \le n < N,$$
we must know $\tilde f[n]$ and $\tilde h[n]$ beyond $0 \le n < N$. One approach is to extend $\tilde f$ and $\tilde h$ with a periodization over $N$ samples, and define
$$f[n] = \tilde f[n \bmod N], \qquad h[n] = \tilde h[n \bmod N].$$
The circular convolution of two such signals, both with period $N$, is defined as a sum over their period:
$$f \star h[n] = \sum_{p=0}^{N-1} f[p]\, h[n-p] = \sum_{p=0}^{N-1} f[n-p]\, h[p].$$
It is also a signal of period $N$.
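The definition of the circular convolution translates into a few lines of code, where the periodization is implemented by reducing indices modulo $N$. This is only a sketch; the name `circular_convolution` and the sample values are illustrative.

```python
def circular_convolution(f, h):
    """Circular convolution of two signals given by one period of N samples each."""
    N = len(f)
    if len(h) != N:
        raise ValueError("both signals must have the same period")
    # h[(n - p) % N] implements the periodic extension h[n - p].
    return [sum(f[p] * h[(n - p) % N] for p in range(N)) for n in range(N)]

f = [1, 2, 3, 4]
h = [1, 0, 0, 1]
g = circular_convolution(f, h)
print(g)  # [3, 5, 7, 5]

# The two sums in the definition coincide: the operation is commutative.
assert g == circular_convolution(h, f)
```

Here $h[3] = 1$ acts as $h[-1]$ after periodization, so $g[n] = f[n] + f[n+1]$ with $f[4] = f[0]$: the last output sample mixes the two ends of the signal.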
The eigenvectors of a circular convolution operator
$$Lf[n] = f \star h[n]$$
are the discrete complex exponentials $e_k[n] = \exp(i2\pi kn/N)$. Indeed
$$L e_k[n] = \exp\Big(\frac{i2\pi kn}{N}\Big) \sum_{p=0}^{N-1} h[p]\, \exp\Big(\frac{-i2\pi kp}{N}\Big),$$
and the eigenvalue is the discrete Fourier transform of $h$:
$$\hat h[k] = \sum_{p=0}^{N-1} h[p]\, \exp\Big(\frac{-i2\pi kp}{N}\Big).$$

3.3.2 Discrete Fourier Transform

The space of signals of period $N$ is a Euclidean space of dimension $N$
and the inner product of two such signals $f$ and $g$ is
$$\langle f, g \rangle = \sum_{n=0}^{N-1} f[n]\, g^*[n]. \tag{3.32}$$
The following theorem proves that any signal with period $N$ can be decomposed as a sum of discrete sinusoidal waves.

Theorem 3.4 The family
$$\Big\{ e_k[n] = \exp\Big(\frac{i2\pi kn}{N}\Big) \Big\}_{0 \le k < N}$$
is an orthogonal basis of the space of signals of period $N$.

Since the space is of dimension $N$, any orthogonal family of $N$ vectors is an orthogonal basis. To prove this theorem it is therefore sufficient to verify that $\{e_k\}_{0 \le k < N}$ is orthogonal with respect to the inner product (3.32). Any signal $f$ of period $N$ can be decomposed in this basis:
$$f = \sum_{k=0}^{N-1} \frac{\langle f, e_k \rangle}{\|e_k\|^2}\, e_k. \tag{3.33}$$
By definition, the discrete Fourier transform (DFT) of $f$ is
$$\hat f[k] = \langle f, e_k \rangle = \sum_{n=0}^{N-1} f[n]\, \exp\Big(\frac{-i2\pi kn}{N}\Big). \tag{3.34}$$
Since $\|e_k\|^2 = N$, (3.33) gives an inverse discrete Fourier formula:
$$f[n] = \frac{1}{N} \sum_{k=0}^{N-1} \hat f[k]\, \exp\Big(\frac{i2\pi kn}{N}\Big). \tag{3.35}$$
The orthogonality of the basis also implies a Plancherel formula
$$\|f\|^2 = \sum_{n=0}^{N-1} |f[n]|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |\hat f[k]|^2. \tag{3.36}$$
The discrete Fourier transform of a signal $f$ of period $N$ is computed
from its values for $0 \le n < N$. Then why is it important to consider it a periodic signal with period $N$ rather than a finite signal of $N$ samples? The answer lies in the interpretation of the Fourier coefficients. The discrete Fourier sum (3.35) defines a signal of period $N$ for which the samples $f[0]$ and $f[N-1]$ are side by side. If $f[0]$ and $f[N-1]$ are very different, this produces an abrupt transition in the periodic signal, creating relatively high amplitude Fourier coefficients at high frequencies. For example, Figure 3.3 shows that the "smooth" ramp $f[n] = n$ for $0 \le n < N$ has sharp transitions at $n = 0$ and $n = N$ once made periodic.

Figure 3.3: Signal $f[n] = n$ for $0 \le n < N$ made periodic over $N$ samples.

Circular Convolutions Since $\{\exp(i2\pi kn/N)\}_{0 \le k < N}$ are eigenvectors of circular convolutions, we derive a convolution theorem.

Theorem 3.5 If $f$ and $h$ have period $N$ then the discrete Fourier transform of $g = f \star h$ is
$$\hat g[k] = \hat f[k]\, \hat h[k]. \tag{3.37}$$

The proof is similar to the proof of the two previous convolution Theorems 2.2 and 3.3. This theorem shows that a circular convolution can be interpreted as a discrete frequency filtering. It also opens the door to fast computations of convolutions using the fast Fourier transform.

3.3.3 Fast Fourier Transform

For a signal $f$ of $N$ points, a direct calculation of the $N$ discrete Fourier
sums
$$\hat f[k] = \sum_{n=0}^{N-1} f[n]\, \exp\Big(\frac{-i2\pi kn}{N}\Big), \quad \text{for } 0 \le k < N, \tag{3.38}$$
requires $N^2$ complex multiplications and additions. The fast Fourier transform (FFT) algorithm reduces the numerical complexity to $O(N \log_2 N)$ by reorganizing the calculations.

When the frequency index is even, we group the terms $n$ and $n + N/2$:
$$\hat f[2k] = \sum_{n=0}^{N/2-1} \big( f[n] + f[n+N/2] \big)\, \exp\Big(\frac{-i2\pi kn}{N/2}\Big). \tag{3.39}$$
When the frequency index is odd, the same grouping becomes
$$\hat f[2k+1] = \sum_{n=0}^{N/2-1} \exp\Big(\frac{-i2\pi n}{N}\Big) \big( f[n] - f[n+N/2] \big)\, \exp\Big(\frac{-i2\pi kn}{N/2}\Big). \tag{3.40}$$
Equation (3.39) proves that even frequencies are obtained by calculating the discrete Fourier transform of the $N/2$ periodic signal
$$f_e[n] = f[n] + f[n+N/2].$$
Odd frequencies are derived from (3.40) by computing the Fourier transform of the $N/2$ periodic signal
$$f_o[n] = \exp\Big(\frac{-i2\pi n}{N}\Big) \big( f[n] - f[n+N/2] \big).$$
A discrete Fourier transform of size $N$ may thus be calculated with two discrete Fourier transforms of size $N/2$ plus $O(N)$ operations.

The inverse fast Fourier transform of $\hat f$ is derived from the forward fast Fourier transform of its complex conjugate $\hat f^*$ by observing that
$$f^*[n] = \frac{1}{N} \sum_{k=0}^{N-1} \hat f^*[k]\, \exp\Big(\frac{-i2\pi kn}{N}\Big). \tag{3.41}$$

Complexity Let $C(N)$ be the number of elementary operations needed to compute a discrete Fourier transform with the FFT. Since $f$ is complex, the calculation of $f_e$ and $f_o$ requires $N$ complex additions and
$N/2$ complex multiplications. Let $KN$ be the corresponding number of elementary operations. We have
$$C(N) = 2\, C(N/2) + K N. \tag{3.42}$$
Since the Fourier transform of a single point is equal to itself, $C(1) = 0$. With the change of variable $l = \log_2 N$ and the change of function $T(l) = C(N)/N$, we derive from (3.42) that
$$T(l) = T(l-1) + K.$$
Since $T(0) = 0$ we get $T(l) = K\,l$ and hence
$$C(N) = K\, N \log_2(N).$$
There exist several variations of this fast algorithm [177, 51]. The goal is to minimize the constant $K$. The most efficient fast discrete Fourier transform to this date is the split-radix FFT algorithm, which is slightly more complicated than the procedure just described, but which requires only $N \log_2 N$ real multiplications and $3N \log_2 N$ additions.
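The splitting into $f_e$ and $f_o$ given by (3.39) and (3.40), together with the operation count (3.42), can be sketched as a short recursive program. It assumes that $N$ is a power of 2 and is not the split-radix algorithm; the names `fft`, `dft` and the `counter` dictionary are illustrative choices.

```python
import cmath

def fft(f, counter):
    """Radix-2 FFT following (3.39)-(3.40); len(f) must be a power of 2."""
    N = len(f)
    if N == 1:
        return list(f)  # the transform of a single point is the point itself
    half = N // 2
    fe = [f[n] + f[n + half] for n in range(half)]
    fo = [cmath.exp(-2j * cmath.pi * n / N) * (f[n] - f[n + half]) for n in range(half)]
    counter["additions"] += N           # N complex additions for fe and fo
    counter["multiplications"] += half  # N/2 complex multiplications for fo
    out = [0j] * N
    out[0::2] = fft(fe, counter)  # even frequencies, equation (3.39)
    out[1::2] = fft(fo, counter)  # odd frequencies, equation (3.40)
    return out

def dft(f):
    """Direct evaluation of the N sums (3.38), used only as a reference."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

signal = [complex(n % 3, -n) for n in range(16)]
ops = {"additions": 0, "multiplications": 0}
spectrum = fft(signal, ops)
max_error = max(abs(x - y) for x, y in zip(spectrum, dft(signal)))
print(ops, max_error)
```

For $N = 16$ the counters reach $N \log_2 N = 64$ complex additions and $(N/2) \log_2 N = 32$ complex multiplications, in agreement with $C(N) = K N \log_2 N$.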
When the input signal $f$ is real, there are half as many parameters to compute, since $\hat f[-k] = \hat f^*[k]$. The number of multiplications and additions is thus reduced by a factor of 2.

3.3.4 Fast Convolutions

The low computational complexity of a fast Fourier transform makes
it efficient to compute finite discrete convolutions by using the circular convolution Theorem 3.5. Let $f$ and $h$ be two signals whose samples are non-zero only for $0 \le n < M$. The causal signal
$$g[n] = f \star h[n] = \sum_{k=-\infty}^{+\infty} f[k]\, h[n-k] \tag{3.43}$$
is non-zero only for $0 \le n < 2M$. If $h$ and $f$ have $M$ non-zero samples, calculating this convolution product with the sum (3.43) requires $M(M+1)$ multiplications and additions. When $M \ge 32$, the number of computations is reduced by using the fast Fourier transform [11, 51].
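As a rough illustration of this FFT-based approach, the sketch below pads both signals with $M$ zeros so that the circular convolution of period $2M$ coincides with the sum (3.43), and compares the result with a direct summation. It assumes that $M$ is a power of 2 and uses a plain radix-2 FFT rather than a split-radix algorithm; all function names are illustrative.

```python
import cmath

def fft(f):
    """Radix-2 FFT; len(f) must be a power of 2."""
    N = len(f)
    if N == 1:
        return list(f)
    half = N // 2
    out = [0j] * N
    out[0::2] = fft([f[n] + f[n + half] for n in range(half)])
    out[1::2] = fft([cmath.exp(-2j * cmath.pi * n / N) * (f[n] - f[n + half])
                     for n in range(half)])
    return out

def ifft(F):
    """Inverse DFT obtained from the forward FFT of the complex conjugate."""
    N = len(F)
    return [v.conjugate() / N for v in fft([x.conjugate() for x in F])]

def direct_convolution(f, h):
    """Direct evaluation of the convolution sum for two signals of M samples."""
    M = len(f)
    g = [0.0] * (2 * M)
    for k in range(M):
        for n in range(M):
            g[k + n] += f[k] * h[n]
    return g

def fft_convolution(f, h):
    """Zero-pad to period 2M, multiply the DFTs, and invert."""
    M = len(f)
    a = list(f) + [0.0] * M
    b = list(h) + [0.0] * M
    c = ifft([x * y for x, y in zip(fft(a), fft(b))])
    return [v.real for v in c]  # real inputs give a real result up to rounding

f = [1.0, 2.0, 3.0, 4.0]
h = [1.0, -1.0, 0.0, 2.0]
g_direct = direct_convolution(f, h)
g_fast = fft_convolution(f, h)
assert all(abs(x - y) < 1e-9 for x, y in zip(g_direct, g_fast))
```

The zero padding prevents the periodization from wrapping the tail of the convolution back onto its first samples.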
To use the fast Fourier transform with the circular convolution Theorem 3.5, the non-circular convolution (3.43) is written as a circular convolution. We define two signals of period $2M$:
$$a[n] = \begin{cases} f[n] & \text{if } 0 \le n < M \\ 0 & \text{if } M \le n < 2M \end{cases} \tag{3.44}$$
$$b[n] = \begin{cases} h[n] & \text{if } 0 \le n < M \\ 0 & \text{if } M \le n < 2M \end{cases} \tag{3.45}$$
Let $c = a \star b$; one can verify that $c[n] = g[n]$ for $0 \le n < 2M$. The $2M$ non-zero coefficients of $g$ are thus obtained by computing $\hat a$ and $\hat b$ from $a$ and $b$ and then calculating the inverse discrete Fourier transform of $\hat c = \hat a\, \hat b$. With the fast Fourier transform algorithm, this requires a total of $O(M \log_2 M)$ additions and multiplications instead of $M(M+1)$. A single FFT or inverse FFT of a real signal of size $N$ is calculated with $2^{-1} N \log_2 N$ multiplications, using a split-radix algorithm. The FFT convolution is thus performed with a total of $3M \log_2 M + 11M$ real multiplications. For $M \ge 32$ the FFT algorithm is faster than the direct convolution approach. For $M \le 16$, it is faster to use a direct convolution sum.

Fast Overlap-Add Convolutions The convolution of a signal $f$ of $L$ non-zero samples with a smaller causal signal $h$ of $M$ samples
is calculated with an overlap-add procedure that is faster than the previous algorithm. The signal $f$ is decomposed with a sum of $L/M$ blocks $f_r$ having $M$ non-zero samples:
$$f[n] = \sum_{r=0}^{L/M-1} f_r[n - rM] \quad \text{with} \quad f_r[n] = f[n + rM]\, \mathbf{1}_{[0,M-1]}[n]. \tag{3.46}$$
For each $0 \le r < L/M$, the $2M$ non-zero samples of $g_r = f_r \star h$ are computed with the FFT based convolution algorithm, which requires $O(M \log_2 M)$ operations. These $L/M$ convolutions are thus obtained with $O(L \log_2 M)$ operations. The block decomposition (3.46) implies that
$$f \star h[n] = \sum_{r=0}^{L/M-1} g_r[n - rM]. \tag{3.47}$$
The addition of these $L/M$ translated signals of size $2M$ is done with $2L$ additions. The overall convolution is thus performed with $O(L \log_2 M)$ operations.

3.4 Discrete Image Processing ¹
Two-dimensional signal processing poses many specific geometrical and topological problems that do not exist in one dimension [23, 34]. For example, a simple concept such as causality is not well defined in two dimensions. We avoid the complexity introduced by the second dimension by extending one-dimensional algorithms with a separable approach. This not only simplifies the mathematics but also leads to faster numerical algorithms along the rows and columns of images. Appendix A.5 reviews the properties of tensor products for separable calculations.

3.4.1 Two-Dimensional Sampling Theorem

The light intensity measured by a camera is generally sampled over a
rectangular array of picture elements, called pixels. The one-dimensional sampling theorem is extended to this two-dimensional sampling array. Other two-dimensional sampling grids such as hexagonal grids are also possible, but non-rectangular sampling arrays are hardly ever used. We avoid studying them following our separable extension principle.

Let $T_1$ and $T_2$ be the sampling intervals along the $x_1$ and $x_2$ axes of an infinite rectangular sampling grid. A discrete image obtained by sampling $f(x_1,x_2)$ can be represented as a sum of Diracs located at the grid points:
$$f_d(x_1,x_2) = \sum_{n_1,n_2=-\infty}^{+\infty} f(n_1T_1, n_2T_2)\, \delta(x_1 - n_1T_1)\, \delta(x_2 - n_2T_2).$$
The two-dimensional Fourier transform of $\delta(x_1 - n_1T_1)\, \delta(x_2 - n_2T_2)$ is $\exp[-i(n_1T_1\omega_1 + n_2T_2\omega_2)]$. The Fourier transform of $f_d$ is thus a two-dimensional Fourier series
$$\hat f_d(\omega_1,\omega_2) = \sum_{n_1,n_2=-\infty}^{+\infty} f(n_1T_1, n_2T_2)\, \exp[-i(n_1T_1\omega_1 + n_2T_2\omega_2)]. \tag{3.48}$$
It has period $2\pi/T_1$ along $\omega_1$ and period $2\pi/T_2$ along $\omega_2$. An extension of Proposition 3.1 relates $\hat f_d$ to the two-dimensional Fourier transform $\hat f$ of $f$.

Proposition 3.3 The Fourier transform of the discrete image obtained by sampling $f$ at intervals $T_1$ and $T_2$ along $x_1$ and $x_2$ is
$$\hat f_d(\omega_1,\omega_2) = \frac{1}{T_1T_2} \sum_{k_1,k_2=-\infty}^{+\infty} \hat f\Big(\omega_1 - \frac{2k_1\pi}{T_1},\, \omega_2 - \frac{2k_2\pi}{T_2}\Big). \tag{3.49}$$

We derive the following two-dimensional sampling theorem, which is analogous to Theorem 3.1.

Theorem 3.6 If $\hat f$ has a support included in $[-\pi/T_1, \pi/T_1] \times [-\pi/T_2, \pi/T_2]$ then
$$f(x_1,x_2) = \sum_{n_1,n_2=-\infty}^{+\infty} f(n_1T_1, n_2T_2)\, h_{T_1}(x_1 - n_1T_1)\, h_{T_2}(x_2 - n_2T_2), \tag{3.50}$$
where
$$h_T(t) = \frac{\sin(\pi t/T)}{\pi t/T}. \tag{3.51}$$

Aliasing If the support of $\hat f$ is not included in the low-frequency rectangle $[-\pi/T_1, \pi/T_1] \times [-\pi/T_2, \pi/T_2]$, the interpolation formula (3.50) introduces aliasing errors. This aliasing is eliminated by prefiltering $f$ with the ideal low-pass separable filter $h_{T_1}(x_1)\, h_{T_2}(x_2)/(T_1T_2)$ whose Fourier transform is the indicator function of $[-\pi/T_1, \pi/T_1] \times [-\pi/T_2, \pi/T_2]$.

3.4.2 Discrete Image Filtering

The properties of two-dimensional space-invariant operators are essentially the same as in one dimension. The sampling intervals $T_1$ and
$T_2$ are normalized to 1. A pixel value located at $(n_1,n_2)$ is written $f[n_1,n_2]$. A linear operator $L$ is space-invariant if for any $f_{p_1,p_2}[n_1,n_2] = f[n_1-p_1, n_2-p_2]$, with $(p_1,p_2) \in \mathbb{Z}^2$,
$$L f_{p_1,p_2}[n_1,n_2] = Lf[n_1-p_1, n_2-p_2].$$

Impulse Response Since an image can be decomposed as a sum of discrete Diracs:
$$f[n_1,n_2] = \sum_{p_1,p_2=-\infty}^{+\infty} f[p_1,p_2]\, \delta[n_1-p_1]\, \delta[n_2-p_2],$$
the linearity and space invariance imply
$$Lf[n_1,n_2] = \sum_{p_1,p_2=-\infty}^{+\infty} f[p_1,p_2]\, h[n_1-p_1, n_2-p_2] = f \star h[n_1,n_2], \tag{3.52}$$
where $h[n_1,n_2]$ is the response of the impulse $\delta_{0,0}[p_1,p_2] = \delta[p_1]\, \delta[p_2]$:
$$h[n_1,n_2] = L\delta_{0,0}[n_1,n_2].$$
If the impulse response is separable:
$$h[n_1,n_2] = h_1[n_1]\, h_2[n_2], \tag{3.53}$$
the two-dimensional convolution (3.52) is computed as one-dimensional convolutions along the columns of the image followed by one-dimensional convolutions along the rows (or vice-versa):
$$f \star h[n_1,n_2] = \sum_{p_1=-\infty}^{+\infty} h_1[n_1-p_1] \sum_{p_2=-\infty}^{+\infty} h_2[n_2-p_2]\, f[p_1,p_2]. \tag{3.54}$$
This factorization reduces the number of operations. For example, a moving average over squares of $(2M+1)^2$ pixels:
$$Lf[n_1,n_2] = \frac{1}{(2M+1)^2} \sum_{p_1=-M}^{M} \sum_{p_2=-M}^{M} f[n_1-p_1, n_2-p_2] \tag{3.55}$$
is a separable convolution with $h_1 = h_2 = (2M+1)^{-1}\, \mathbf{1}_{[-M,M]}$. A direct calculation with (3.55) requires $(2M+1)^2$ additions per pixel whereas the factorization (3.54) performs this calculation with $2(2M+1)$ additions per point.

Transfer Function The Fourier transform of a discrete image $f$ is
defined by the Fourier series
$$\hat f(\omega_1,\omega_2) = \sum_{n_1=-\infty}^{+\infty} \sum_{n_2=-\infty}^{+\infty} f[n_1,n_2]\, \exp[-i(\omega_1n_1 + \omega_2n_2)]. \tag{3.56}$$
The two-dimensional extension of the convolution Theorem 3.3 proves that if $g = Lf = f \star h$ then its Fourier transform is
$$\hat g(\omega_1,\omega_2) = \hat f(\omega_1,\omega_2)\, \hat h(\omega_1,\omega_2)$$
and $\hat h$ is the transfer function of the filter. When a filter is separable $h[n_1,n_2] = h_1[n_1]\, h_2[n_2]$, its transfer function is also separable:
$$\hat h(\omega_1,\omega_2) = \hat h_1(\omega_1)\, \hat h_2(\omega_2). \tag{3.57}$$

3.4.3 Circular Convolutions and Fourier Basis

The discrete convolution of a finite image $\tilde f$ raises border problems.
As in one dimension, these border issues are solved by extending the image, making it periodic along its rows and columns:
$$f[n_1,n_2] = \tilde f[n_1 \bmod N,\, n_2 \bmod N].$$
The resulting image $f[n_1,n_2]$ is defined for all $(n_1,n_2) \in \mathbb{Z}^2$, and each of its rows and columns is a one-dimensional signal of period $N$.

A discrete convolution is replaced by a circular convolution over the image period. If $f$ and $h$ have period $N$ along their rows and columns, then
$$f \star h[n_1,n_2] = \sum_{p_1,p_2=0}^{N-1} f[p_1,p_2]\, h[n_1-p_1, n_2-p_2]. \tag{3.58}$$

Discrete Fourier Transform The eigenvectors of circular convolutions are two-dimensional discrete sinusoidal waves:
$$e_{k_1,k_2}[n_1,n_2] = \exp\Big(\frac{i2\pi}{N}(k_1n_1 + k_2n_2)\Big).$$
This family of $N^2$ discrete vectors is the separable product of two one-dimensional discrete Fourier bases $\{\exp(i2\pi kn/N)\}_{0 \le k < N}$. Theorem A.3 thus proves that the family
$$\Big\{ e_{k_1,k_2}[n_1,n_2] = \exp\Big(\frac{i2\pi}{N}(k_1n_1 + k_2n_2)\Big) \Big\}_{0 \le k_1,k_2 < N}$$
is an orthogonal basis of the space of images that are periodic with period $N$ along their rows and columns. Any discrete periodic image $f$ can be decomposed in this orthogonal basis:
$$f[n_1,n_2] = \frac{1}{N^2} \sum_{k_1,k_2=0}^{N-1} \hat f[k_1,k_2]\, \exp\Big(\frac{i2\pi}{N}(k_1n_1 + k_2n_2)\Big), \tag{3.59}$$
where $\hat f$ is the two-dimensional discrete Fourier transform of $f$
$$\hat f[k_1,k_2] = \langle f, e_{k_1,k_2} \rangle = \sum_{n_1,n_2=0}^{N-1} f[n_1,n_2]\, \exp\Big(\frac{-i2\pi}{N}(k_1n_1 + k_2n_2)\Big). \tag{3.60}$$

Fast Convolutions Since $\exp\big(\frac{-i2\pi}{N}(k_1n_1 + k_2n_2)\big)$ are eigenvectors of two-dimensional circular convolutions, the discrete Fourier transform of $g = f \star h$ is
$$\hat g[k_1,k_2] = \hat f[k_1,k_2]\, \hat h[k_1,k_2]. \tag{3.61}$$
A direct computation of $f \star h$ with the summation (3.58) requires $O(N^4)$ multiplications. With the two-dimensional FFT described next, $\hat f[k_1,k_2]$ and $\hat h[k_1,k_2]$ as well as the inverse DFT of their product (3.61) are calculated with $O(N^2 \log N)$ operations. Non-circular convolutions are computed with a fast algorithm by reducing them to circular convolutions, with the same approach as in Section 3.3.4.

Separable Basis Decomposition Let $\{e_k\}_{0 \le k < N}$ be an orthogonal basis of signals of size $N$. The family $\{e_{k_1}[n_1]\, e_{k_2}[n_2]\}_{0 \le k_1,k_2 < N}$ is then an orthogonal basis of the space of images of $N^2$ pixels. The decomposition coefficients of an image $f$ in such a basis are calculated with a separable algorithm. The application to the two-dimensional FFT is explained.
Two-dimensional inner products are calculated with
$$\langle f, e_{k_1} e_{k_2} \rangle = \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} f[n_1,n_2]\, e^*_{k_1}[n_1]\, e^*_{k_2}[n_2] = \sum_{n_1=0}^{N-1} e^*_{k_1}[n_1] \sum_{n_2=0}^{N-1} f[n_1,n_2]\, e^*_{k_2}[n_2]. \tag{3.62}$$
For $0 \le n_1 < N$, we must compute
$$Tf[n_1,k_2] = \sum_{n_2=0}^{N-1} f[n_1,n_2]\, e^*_{k_2}[n_2],$$
which are the decomposition coefficients of the $N$ image rows in the basis $\{e_{k_2}\}_{0 \le k_2 < N}$. The coefficients $\{\langle f, e_{k_1} e_{k_2} \rangle\}_{0 \le k_1,k_2 < N}$ are calculated in (3.62) as the inner products of the columns of the transformed image $Tf[n_1,k_2]$ in the same basis $\{e_k\}_{0 \le k < N}$. This requires expanding $2N$ one-dimensional signals ($N$ rows and $N$ columns) in $\{e_k\}_{0 \le k < N}$.
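The separable calculation (3.62) can be sketched for the discrete Fourier basis as follows. To keep the example short, each one-dimensional transform is a direct $O(N^2)$ sum standing in for an FFT, which is an assumption of this illustration; the function names are also illustrative.

```python
import cmath

def dft(x):
    """One-dimensional DFT by direct summation (a stand-in for a fast algorithm)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def dft2_separable(img):
    """Transform the N rows, then the N columns of the row-transformed image."""
    N = len(img)
    rows = [dft(row) for row in img]  # rows[n1][k2] = Tf[n1, k2]
    cols = [dft([rows[n1][k2] for n1 in range(N)]) for k2 in range(N)]  # cols[k2][k1]
    return [[cols[k2][k1] for k2 in range(N)] for k1 in range(N)]

def dft2_direct(img):
    """Double sum (3.60), used only as a reference."""
    N = len(img)
    return [[sum(img[n1][n2] * cmath.exp(-2j * cmath.pi * (k1 * n1 + k2 * n2) / N)
                 for n1 in range(N) for n2 in range(N))
             for k2 in range(N)] for k1 in range(N)]

image = [[(3 * n1 + n2 * n2) % 5 for n2 in range(4)] for n1 in range(4)]
fast = dft2_separable(image)
slow = dft2_direct(image)
max_error = max(abs(fast[k1][k2] - slow[k1][k2]) for k1 in range(4) for k2 in range(4))
print(max_error)
```

The separable version performs $2N$ one-dimensional transforms, so replacing `dft` by an FFT gives the operation count discussed in the text.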
The fast Fourier transform algorithm of Section 3.3.3 decomposes a signal of size $N$ in the discrete Fourier basis $\{e_k[n] = \exp(-i2\pi kn/N)\}_{0 \le k < N}$ with $KN \log_2 N$ operations. A separable implementation of a two-dimensional FFT thus requires $2KN^2 \log_2 N$ operations. A split-radix FFT corresponds to $K = 3$.

3.5 Problems

3.1. ¹ Suppose that $\hat f$ has a support in $[-(n+1)\pi/T, -n\pi/T] \cup [n\pi/T, (n+1)\pi/T]$ and that $f(t)$ is real. Find an interpolation formula that recovers $f(t)$ from $\{f(nT)\}_{n\in\mathbb{Z}}$.
3.2. ² Suppose that $\hat f$ has a support in $[-\pi/T, \pi/T]$. Find a formula that recovers $f(t)$ from the average samples
$$\forall n \in \mathbb{Z}, \quad \tilde f(nT) = \int_{(n-1/2)T}^{(n+1/2)T} f(t)\, dt.$$
3.3. ¹ An interpolation function $f(t)$ satisfies $f(n) = \delta[n]$.
(a) Prove that $\sum_{k=-\infty}^{+\infty} \hat f(\omega + 2k\pi) = 1$ if and only if $f$ is an interpolation function.
(b) Suppose that $f(t) = \sum_{n=-\infty}^{+\infty} h[n]\, \theta(t-n)$ with $\theta \in L^2(\mathbb{R})$. Find $\hat h(\omega)$ so that $f(n) = \delta[n]$, and relate $\hat f(\omega)$ to $\hat\theta(\omega)$. Give a sufficient condition on $\hat\theta$ to guarantee that $f \in L^2(\mathbb{R})$.
3.4. ¹ Prove that if $f \in L^2(\mathbb{R})$ and $\sum_{n=-\infty}^{+\infty} f(t-n) \in L^2[0,1]$ then
$$\sum_{n=-\infty}^{+\infty} f(t-n) = \sum_{k=-\infty}^{+\infty} \hat f(2k\pi)\, e^{i2\pi kt}.$$
3.5. ¹ Verify that
$$\hat h(\omega) = \prod_{k=1}^{K} \frac{a_k^* - e^{-i\omega}}{1 - a_k\, e^{-i\omega}}$$
is an all-pass filter, i.e. $|\hat h(\omega)| = 1$. Prove that $\{h[n-m]\}_{m\in\mathbb{Z}}$ is an orthonormal basis of $l^2(\mathbb{Z})$.
^
3.6. 1 Let g n] = (;1)n h n]. Relate g (!) to h(!). If h is a low-pass
^
lter can g(!) be a low-pass lter?
^
1 Prove the convolution Theorem 3.3.
3.7.
3.8. 2 Recursive lters 102 CHAPTER 3. DISCRETE REVOLUTION
(a) Compute the Fourier transform of h n] = an 1 0 +1) n] for
^
jaj < 1. Compute the inverse Fourier transform of h(!) =
;i! );p .
(1 ; a e
(b) Suppose that g = f ? h is calculated by a recursive equation
with real coe cients
K
X
k=0 ak f n ; k] = M
X
k=0 bk g n ; k] P Show that h is a stable lter if and only if the equation M bk z ;k =
k=0
0 has roots with a modulus strictly smaller than 1.
^
(c) Suppose that jh(!)j2 = jP (e;i! )j2 =jD(e;i! )j2 where P (z ) and
D(z ) are polynomials. If D(z) has no root of modulus 1, prove
that one can nd two polynomials P1 (z ) and D1 (z ) such that
^
h(!) = P1 (e;i! )=D1 (e;i! ) is the Fourier transform of a stable
and causal recursive lter. Hint: nd the complex roots of
D(z ) and compute D1(z) by choosing the appropriate roots.
(d) A discrete Butterworth lter with cut-o frequency !c <
satis es
1
^
jh(!)j2 =
2N
tan(!=
1 + tan(!c =2)
2)
^
Compute h(!) for N = 3 in order to obtain a lter h which is
real, stable and causal.
3.9. ¹ Let $a$ and $b$ be two integers with many digits. Relate the product $a\,b$ to a convolution. Explain how to use the FFT to compute this product.
3.10. ¹ Let $h^{-1}$ be the inverse of $h$ defined by $h \star h^{-1}[n] = \delta[n]$.
(a) Prove that if $h$ has a finite support then $h^{-1}$ has a finite support if and only if $h[n] = \delta[n-p]$ for some $p \in \mathbb{Z}$.
(b) Find a sufficient condition on $\hat h(\omega)$ for $h^{-1}$ to be a stable filter.
3.11. ¹ Discrete interpolation Let $\hat f[k]$ be the DFT of a signal $f[n]$ of size $N$. We define $\hat{\tilde f}[N/2] = \hat{\tilde f}[3N/2] = \hat f[N/2]$ and
$$\hat{\tilde f}[k] = \begin{cases} 2\hat f[k] & \text{if } 0 \le k < N/2 \\ 0 & \text{if } N/2 < k < 3N/2 \\ 2\hat f[k-N] & \text{if } 3N/2 < k < 2N. \end{cases}$$
Prove that $\tilde f[2n] = f[n]$.
3.12. ¹ Decimation Let $x[n] = y[Mn]$ with $M > 1$.
(a) Show that $\hat x(\omega) = M^{-1} \sum_{k=0}^{M-1} \hat y(M^{-1}(\omega - 2k\pi))$.
(b) Give a sufficient condition on $\hat y(\omega)$ to recover $y$ from $x$. Describe the interpolation algorithm.
3.13. ¹ Complexity of FFT
(a) Find an algorithm that multiplies two complex numbers with 3 additions and 3 multiplications.
(b) Compute the total number of additions and multiplications of the FFT algorithm described in Section 3.3.3, for a signal of size $N$.
3.14. ² We want to compute numerically the Fourier transform of $f(t)$. Let $f_d[n] = f(nT)$, and $f_p[n] = \sum_{p=-\infty}^{+\infty} f_d[n - pN]$.
(a) Prove that the DFT of $f_p[n]$ is related to the Fourier series of $f_d[n]$ and to the Fourier transform of $f(t)$ by
$$\hat f_p[k] = \hat f_d\Big(\frac{2\pi k}{N}\Big) = \frac{1}{T} \sum_{l=-\infty}^{+\infty} \hat f\Big(\frac{2k\pi}{NT} - \frac{2l\pi}{T}\Big).$$
(b) Suppose that $|f(t)|$ and $|\hat f(\omega)|$ are negligible when $t \notin [-t_0, t_0]$ and $\omega \notin [-\omega_0, \omega_0]$. Relate $N$ and $T$ to $t_0$ and $\omega_0$ so that one can compute an approximation value of $\hat f(\omega)$ at all $\omega \in \mathbb{R}$ by interpolating the samples $\hat f_p[k]$. Is it possible to compute exactly $\hat f(\omega)$ with such an interpolation formula?
(c) Let $f(t) = \big(\sin(\pi t)/(\pi t)\big)^4$. What is the support of $\hat f$? Sample $f$ appropriately and compute $\hat f$ with the FFT algorithm of Matlab.
3.15. ¹ Suppose that $f[n_1,n_2]$ is an image with $N^2$ non-zero pixels for $0 \le n_1,n_2 < N$. Let $h[n_1,n_2]$ be a non-separable filter with $M^2$ non-zero coefficients for $0 \le n_1,n_2 < M$. Describe an overlap-add algorithm to compute $g[n_1,n_2] = f \star h[n_1,n_2]$. How many operations does it require? For what range of $M$ is it better to compute the convolution with a direct summation?

Chapter 4
Time Meets Frequency
When we listen to music, we clearly "hear" the time variation of the sound "frequencies." These localized frequency events are not pure tones but packets of close frequencies. The properties of sounds are revealed by transforms that decompose signals over elementary functions that are well concentrated in time and frequency. Windowed Fourier transforms and wavelet transforms are two important classes of local time-frequency decompositions. Measuring the time variations of "instantaneous" frequencies is an important application that illustrates the limitations imposed by the Heisenberg uncertainty.

There is no unique definition of time-frequency energy density, which makes this topic difficult. Yet, some order can be established by proving that quadratic time-frequency distributions are obtained by averaging a single quadratic form called the Wigner-Ville distribution. This unified framework gives a more general perspective on windowed Fourier transforms and wavelet transforms.

4.1 Time-Frequency Atoms ¹
A linear time-frequency transform correlates the signal with a family of waveforms that are well concentrated in time and in frequency. These waveforms are called time-frequency atoms. Let us consider a general family of time-frequency atoms $\{\phi_\gamma\}_{\gamma\in\Gamma}$, where $\gamma$ might be a multi-index parameter. We suppose that $\phi_\gamma \in L^2(\mathbb{R})$ and that $\|\phi_\gamma\| = 1$. The corresponding linear time-frequency transform of $f \in L^2(\mathbb{R})$ is defined by
$$Tf(\gamma) = \int_{-\infty}^{+\infty} f(t)\, \phi_\gamma^*(t)\, dt = \langle f, \phi_\gamma \rangle.$$
The Parseval formula (2.25) proves that
$$Tf(\gamma) = \int_{-\infty}^{+\infty} f(t)\, \phi_\gamma^*(t)\, dt = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\, \hat\phi_\gamma^*(\omega)\, d\omega. \tag{4.1}$$
If $\phi_\gamma(t)$ is nearly zero when $t$ is outside a neighborhood of an abscissa $u$, then $\langle f, \phi_\gamma \rangle$ depends only on the values of $f$ in this neighborhood. Similarly, if $\hat\phi_\gamma(\omega)$ is negligible for $\omega$ far from $\xi$, then the right integral of (4.1) proves that $\langle f, \phi_\gamma \rangle$ reveals the properties of $\hat f$ in the neighborhood of $\xi$.

Example 4.1 A windowed Fourier atom is constructed with a window $g$ translated by $u$ and modulated by the frequency $\xi$:
$$\phi_\gamma(t) = g_{\xi,u}(t) = e^{i\xi t}\, g(t-u). \tag{4.2}$$
A wavelet atom is a dilation by $s$ and a translation by $u$ of a mother wavelet $\psi$:
$$\phi_\gamma(t) = \psi_{s,u}(t) = \frac{1}{\sqrt{s}}\, \psi\Big(\frac{t-u}{s}\Big). \tag{4.3}$$
Wavelets and windowed Fourier functions have their energy well localized in time, while their Fourier transform is mostly concentrated in a limited frequency band. The properties of the resulting transforms are studied in Sections 4.2 and 4.3.

Heisenberg Boxes The slice of information provided by $\langle f, \phi_\gamma \rangle$ is
represented in a time-frequency plane $(t,\omega)$ by a region whose location and width depend on the time-frequency spread of $\phi_\gamma$. Since
$$\|\phi_\gamma\|^2 = \int_{-\infty}^{+\infty} |\phi_\gamma(t)|^2\, dt = 1,$$
we interpret $|\phi_\gamma(t)|^2$ as a probability distribution centered at
$$u_\gamma = \int_{-\infty}^{+\infty} t\, |\phi_\gamma(t)|^2\, dt. \tag{4.4}$$
The spread around $u_\gamma$ is measured by the variance
$$\sigma_t^2(\gamma) = \int_{-\infty}^{+\infty} (t - u_\gamma)^2\, |\phi_\gamma(t)|^2\, dt. \tag{4.5}$$
The Plancherel formula (2.26) proves that $\int_{-\infty}^{+\infty} |\hat\phi_\gamma(\omega)|^2\, d\omega = 2\pi \|\phi_\gamma\|^2$. The center frequency of $\hat\phi_\gamma$ is therefore defined by
$$\xi_\gamma = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \omega\, |\hat\phi_\gamma(\omega)|^2\, d\omega \tag{4.6}$$
and its spread around $\xi_\gamma$ is
$$\sigma_\omega^2(\gamma) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} (\omega - \xi_\gamma)^2\, |\hat\phi_\gamma(\omega)|^2\, d\omega. \tag{4.7}$$
The time-frequency resolution of $\phi_\gamma$ is represented in the time-frequency plane $(t,\omega)$ by a Heisenberg box centered at $(u_\gamma, \xi_\gamma)$, whose width along time is $\sigma_t(\gamma)$ and whose width along frequency is $\sigma_\omega(\gamma)$. This is illustrated by Figure 4.1. The Heisenberg uncertainty Theorem 2.5 proves that the area of the rectangle is at least $1/2$:
$$\sigma_t\, \sigma_\omega \ge \frac{1}{2}. \tag{4.8}$$
This limits the joint resolution of $\phi_\gamma$ in time and frequency. The time-frequency plane must be manipulated carefully because a point $(t_0, \omega_0)$ is ill-defined. There is no function that is perfectly well concentrated at a point $t_0$ and a frequency $\omega_0$. Only rectangles with area at least $1/2$ may correspond to time-frequency atoms.
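The bound (4.8) can be checked numerically on simple atoms. The sketch below estimates $\sigma_t$ from (4.5) with a Riemann sum and, for a real differentiable window whose center frequency is zero, estimates $\sigma_\omega$ through the identity $\frac{1}{2\pi}\int \omega^2 |\hat g(\omega)|^2\, d\omega = \int |g'(t)|^2\, dt$, which avoids computing $\hat g$. This identity, the finite-difference derivative, the integration grids and the two test windows are assumptions of the illustration.

```python
import math

def heisenberg_widths(g, t_min, t_max, steps=20000):
    """Return (sigma_t, sigma_omega) of a real window g sampled on [t_min, t_max]."""
    dt = (t_max - t_min) / steps
    ts = [t_min + i * dt for i in range(steps + 1)]
    vals = [g(t) for t in ts]
    energy = sum(v * v for v in vals) * dt  # normalizes g to unit norm
    u = sum(t * v * v for t, v in zip(ts, vals)) * dt / energy
    var_t = sum((t - u) ** 2 * v * v for t, v in zip(ts, vals)) * dt / energy
    # Central finite differences approximate g'(t) at the interior grid points.
    deriv = [(vals[i + 1] - vals[i - 1]) / (2 * dt) for i in range(1, steps)]
    var_omega = sum(d * d for d in deriv) * dt / energy
    return math.sqrt(var_t), math.sqrt(var_omega)

def gaussian(t):
    return math.exp(-t * t / 2)

def raised_cosine(t):
    return math.cos(math.pi * t / 2) ** 2 if abs(t) <= 1 else 0.0

st, sw = heisenberg_widths(gaussian, -10.0, 10.0)
gaussian_area = st * sw
st, sw = heisenberg_widths(raised_cosine, -1.0, 1.0)
cosine_area = st * sw
print(gaussian_area, cosine_area)
```

The Gaussian reaches the lower bound $1/2$ up to discretization errors, whereas the compactly supported raised cosine gives a slightly larger area.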
Energy Density Suppose that for any $(u,\xi)$ there exists a unique atom $\phi_{\gamma(u,\xi)}$ centered at $(u,\xi)$ in the time-frequency plane. The time-frequency box of $\phi_{\gamma(u,\xi)}$ specifies a neighborhood of $(u,\xi)$ where the energy of $f$ is measured by
$$P_T f(u,\xi) = |\langle f, \phi_{\gamma(u,\xi)} \rangle|^2 = \Big| \int_{-\infty}^{+\infty} f(t)\, \phi^*_{\gamma(u,\xi)}(t)\, dt \Big|^2. \tag{4.9}$$
Section 4.5.1 proves that any such energy density is an averaging of the Wigner-Ville distribution, with a kernel that depends on the atoms $\phi_\gamma$.

Figure 4.1: Heisenberg box representing an atom $\phi_\gamma$.

4.2 Windowed Fourier Transform ¹
In 1946, Gabor [187] introduced windowed Fourier atoms to measure the "frequency variations" of sounds. A real and symmetric window $g(t) = g(-t)$ is translated by $u$ and modulated by the frequency $\xi$:
$$g_{u,\xi}(t) = e^{i\xi t}\, g(t-u). \tag{4.10}$$
It is normalized $\|g\| = 1$ so that $\|g_{u,\xi}\| = 1$ for any $(u,\xi) \in \mathbb{R}^2$. The resulting windowed Fourier transform of $f \in L^2(\mathbb{R})$ is
$$Sf(u,\xi) = \langle f, g_{u,\xi} \rangle = \int_{-\infty}^{+\infty} f(t)\, g(t-u)\, e^{-i\xi t}\, dt. \tag{4.11}$$
This transform is also called the short time Fourier transform because the multiplication by $g(t-u)$ localizes the Fourier integral in the neighborhood of $t = u$.

As in (4.9), one can define an energy density called a spectrogram,
denoted $P_S$:
$$P_S f(u,\xi) = |Sf(u,\xi)|^2 = \Big| \int_{-\infty}^{+\infty} f(t)\, g(t-u)\, e^{-i\xi t}\, dt \Big|^2. \tag{4.12}$$
The spectrogram measures the energy of $f$ in the time-frequency neighborhood of $(u,\xi)$ specified by the Heisenberg box of $g_{u,\xi}$.

Heisenberg Boxes Since $g$ is even, $g_{u,\xi}(t) = e^{i\xi t}\, g(t-u)$ is centered at $u$. The time spread around $u$ is independent of $u$ and $\xi$:
$$\sigma_t^2 = \int_{-\infty}^{+\infty} (t-u)^2\, |g_{u,\xi}(t)|^2\, dt = \int_{-\infty}^{+\infty} t^2\, |g(t)|^2\, dt. \tag{4.13}$$
The Fourier transform $\hat g$ of $g$ is real and symmetric because $g$ is real and symmetric. The Fourier transform of $g_{u,\xi}$ is
$$\hat g_{u,\xi}(\omega) = \hat g(\omega-\xi)\, \exp[-iu(\omega-\xi)]. \tag{4.14}$$
It is a translation by $\xi$ of the frequency window $\hat g$, so its center frequency is $\xi$. The frequency spread around $\xi$ is
$$\sigma_\omega^2 = \frac{1}{2\pi} \int_{-\infty}^{+\infty} (\omega-\xi)^2\, |\hat g_{u,\xi}(\omega)|^2\, d\omega = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \omega^2\, |\hat g(\omega)|^2\, d\omega. \tag{4.15}$$
It is independent of $u$ and $\xi$. Hence $g_{u,\xi}$ corresponds to a Heisenberg box of area $\sigma_t\, \sigma_\omega$ centered at $(u,\xi)$, as illustrated by Figure 4.2. The size of this box is independent of $(u,\xi)$, which means that a windowed Fourier transform has the same resolution across the time-frequency plane.

Example 4.2 A sinusoidal wave $f(t) = \exp(i\xi_0 t)$ whose Fourier transform is a Dirac $\hat f(\omega) = 2\pi\, \delta(\omega - \xi_0)$ has a windowed Fourier transform
$$Sf(u,\xi) = \hat g(\xi - \xi_0)\, \exp[-iu(\xi - \xi_0)].$$
Its energy is spread over the frequency interval $[\xi_0 - \sigma_\omega/2,\, \xi_0 + \sigma_\omega/2]$.
Figure 4.2: Heisenberg boxes of two windowed Fourier atoms $g_{u,\xi}$ and $g_{v,\gamma}$.

Example 4.3 The windowed Fourier transform of a Dirac $f(t) = \delta(t - u_0)$ is
$$Sf(u,\xi) = g(u_0 - u)\, \exp(-i\xi u_0).$$
Its energy is spread in the time interval $[u_0 - \sigma_t/2,\ u_0 + \sigma_t/2]$.

Example 4.4 A linear chirp $f(t) = \exp(iat^2)$ has an "instantaneous frequency" that increases linearly in time. For a Gaussian window $g(t) = (\pi\sigma^2)^{-1/4} \exp[-t^2/(2\sigma^2)]$, the windowed Fourier transform of $f$ is calculated using the Fourier transform (2.34) of Gaussian chirps. One can verify that its spectrogram is
$$P_S f(u,\xi) = |Sf(u,\xi)|^2 = \left( \frac{4\pi\sigma^2}{1 + 4a^2\sigma^4} \right)^{1/2} \exp\left( -\frac{\sigma^2 (\xi - 2au)^2}{1 + 4a^2\sigma^4} \right). \tag{4.16}$$
For a fixed time $u$, $P_S f(u,\xi)$ is a Gaussian that reaches its maximum at the frequency $\xi(u) = 2au$. Observe that if we write $f(t) = \exp[i\phi(t)]$, then $\xi(u)$ is equal to the "instantaneous frequency," defined as the derivative of the phase: $\omega(u) = \phi'(u) = 2au$. Section 4.4.1 explains this result.
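The closed form (4.16) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names (`gaussian_window`, `stft_point`) and the values of `a`, `sigma` and `u` are illustrative choices, not taken from the text. It approximates the integral (4.11) by a Riemann sum and compares the resulting spectrogram slice with (4.16).

```python
import numpy as np

def gaussian_window(t, sigma):
    # Unit-norm Gaussian window g(t) = (pi sigma^2)^(-1/4) exp(-t^2 / (2 sigma^2)).
    return (np.pi * sigma**2) ** -0.25 * np.exp(-t**2 / (2 * sigma**2))

def stft_point(f, g, u, xi, t):
    # Riemann-sum approximation of Sf(u, xi) = int f(t) g(t - u) exp(-i xi t) dt.
    dt = t[1] - t[0]
    return np.sum(f(t) * g(t - u) * np.exp(-1j * xi * t)) * dt

# Illustrative (hypothetical) parameters.
a, sigma, u = 3.0, 0.5, 1.0
t = np.linspace(u - 10 * sigma, u + 10 * sigma, 20001)

chirp = lambda t: np.exp(1j * a * t**2)
window = lambda t: gaussian_window(t, sigma)

# Slice of the spectrogram at fixed time u, on a frequency grid around 2 a u.
xis = np.linspace(2 * a * u - 10, 2 * a * u + 10, 201)
numeric = np.array([abs(stft_point(chirp, window, u, xi, t)) ** 2 for xi in xis])

# Closed form (4.16).
d = 1 + 4 * a**2 * sigma**4
closed_form = np.sqrt(4 * np.pi * sigma**2 / d) * np.exp(-sigma**2 * (xis - 2 * a * u) ** 2 / d)

xi_peak = xis[np.argmax(numeric)]
print("peak frequency:", xi_peak, "expected 2au =", 2 * a * u)
```

The maximum of the numerical slice should sit on the grid point closest to $2au$, which is the instantaneous frequency of the chirp at time $u$.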
Figure 4.3: The signal includes a linear chirp whose frequency increases, a quadratic chirp whose frequency decreases, and two modulated Gaussian functions located at $t = 0.5$ and $t = 0.87$. (a) Spectrogram $P_S f(u,\xi)$. Dark points indicate large amplitude coefficients. (b) Complex phase of $Sf(u,\xi)$ in regions where the modulus $P_S f(u,\xi)$ is non-zero. (Top plot: $f(t)$ for $t \in [0,1]$; panels (a) and (b): horizontal axis $u$, vertical axis $\xi/2\pi$.)

Example 4.5 Figure 4.3 gives the spectrogram of a signal that includes a linear chirp, a quadratic chirp and two modulated Gaussians. The spectrogram is computed with a Gaussian window dilated by $\sigma = 0.05$. As expected from (4.16), the linear chirp yields large amplitude coefficients along the trajectory of its instantaneous frequency, which is a straight line. The quadratic chirp yields large coefficients along a parabola. The two modulated Gaussians produce low and high frequency blobs at $u = 0.5$ and $u = 0.87$.

4.2.1 Completeness and Stability

When the time-frequency indices $(u,\xi)$ vary across $\mathbb{R}^2$, the Heisenberg
boxes of the atoms $g_{u,\xi}$ cover the whole time-frequency plane. One can thus expect that $f$ can be recovered from its windowed Fourier transform $Sf(u,\xi)$. The following theorem gives a reconstruction formula and proves that the energy is conserved.

Theorem 4.1 If $f \in L^2(\mathbb{R})$ then
$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} Sf(u,\xi)\, g(t-u)\, e^{i\xi t}\, d\xi\, du \tag{4.17}$$
and
$$\int_{-\infty}^{+\infty} |f(t)|^2\, dt = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} |Sf(u,\xi)|^2\, d\xi\, du. \tag{4.18}$$

Proof 1. The reconstruction formula (4.17) is proved first. Let us apply
the Fourier Parseval formula (2.25) to the integral (4.17) with respect to the integration in $u$. The Fourier transform of $f_\xi(u) = Sf(u,\xi)$ with respect to $u$ is computed by observing that
$$Sf(u,\xi) = \exp(-iu\xi) \int_{-\infty}^{+\infty} f(t)\, g(t-u)\, \exp[i\xi(u-t)]\, dt = \exp(-iu\xi)\, f \star g_\xi(u),$$
where $g_\xi(t) = g(t) \exp(i\xi t)$, because $g(t) = g(-t)$. Its Fourier transform is therefore
$$\hat f_\xi(\omega) = \hat f(\omega + \xi)\, \hat g_\xi(\omega + \xi) = \hat f(\omega + \xi)\, \hat g(\omega).$$
The Fourier transform of $g(t-u)$ with respect to $u$ is $\hat g(\omega) \exp(-it\omega)$. Hence
$$\frac{1}{2\pi} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} Sf(u,\xi)\, g(t-u)\, \exp(i\xi t)\, du\, d\xi = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \left( \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega + \xi)\, |\hat g(\omega)|^2\, \exp[it(\omega + \xi)]\, d\omega \right) d\xi.$$
If $\hat f \in L^1(\mathbb{R})$, we can apply the Fubini Theorem A.2 to reverse the integration order. The inverse Fourier transform proves that
$$\frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega + \xi)\, \exp[it(\omega + \xi)]\, d\xi = f(t).$$
Since $\frac{1}{2\pi} \int_{-\infty}^{+\infty} |\hat g(\omega)|^2\, d\omega = 1$ we derive (4.17). If $\hat f \notin L^1(\mathbb{R})$, a density argument is used to verify this formula.
Let us now prove the energy conservation (4.18). Since the Fourier transform in $u$ of $Sf(u,\xi)$ is $\hat f(\omega + \xi)\, \hat g(\omega)$, the Plancherel formula (2.26) applied to the right-hand side of (4.18) gives
$$\frac{1}{2\pi} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} |Sf(u,\xi)|^2\, du\, d\xi = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \frac{1}{2\pi} \int_{-\infty}^{+\infty} |\hat f(\omega + \xi)\, \hat g(\omega)|^2\, d\omega\, d\xi.$$
The Fubini theorem applies and the Plancherel formula proves that
$$\frac{1}{2\pi} \int_{-\infty}^{+\infty} |\hat f(\omega + \xi)|^2\, d\xi = \|f\|^2,$$
which implies (4.18).

The reconstruction formula (4.17) can be rewritten
$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \langle f, g_{u,\xi} \rangle\, g_{u,\xi}(t)\, d\xi\, du. \tag{4.19}$$
It resembles the decomposition of a signal in an orthonormal basis but it is not, since the functions $\{g_{u,\xi}\}_{(u,\xi) \in \mathbb{R}^2}$ are very redundant in $L^2(\mathbb{R})$. The second equality (4.18) justifies the interpretation of the spectrogram $P_S f(u,\xi) = |Sf(u,\xi)|^2$ as an energy density, since its time-frequency sum equals the signal energy.

Reproducing Kernel A windowed Fourier transform represents a one-dimensional signal $f(t)$ by a two-dimensional function $Sf(u,\xi)$. The energy conservation proves that $Sf \in L^2(\mathbb{R}^2)$. Because $Sf(u,\xi)$ is redundant, it is not true that any $\Phi \in L^2(\mathbb{R}^2)$ is the windowed Fourier transform of some $f \in L^2(\mathbb{R})$. The next proposition gives a necessary and sufficient condition for such a function to be a windowed Fourier transform.
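Proposition 4.1 Let $\Phi \in L^2(\mathbb{R}^2)$. There exists $f \in L^2(\mathbb{R})$ such that $\Phi(u,\xi) = Sf(u,\xi)$ if and only if
$$\Phi(u_0,\xi_0) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \Phi(u,\xi)\, K(u_0,u,\xi_0,\xi)\, du\, d\xi \tag{4.20}$$
with
$$K(u_0,u,\xi_0,\xi) = \langle g_{u,\xi}, g_{u_0,\xi_0} \rangle. \tag{4.21}$$

Proof 2. Suppose that there exists $f$ such that $\Phi(u,\xi) = Sf(u,\xi)$. Let us replace $f$ with its reconstruction integral (4.17) in the windowed Fourier transform definition:
$$Sf(u_0,\xi_0) = \int_{-\infty}^{+\infty} \left( \frac{1}{2\pi} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} Sf(u,\xi)\, g_{u,\xi}(t)\, du\, d\xi \right) g^*_{u_0,\xi_0}(t)\, dt. \tag{4.22}$$
Inverting the integral on $t$ with the integrals on $u$ and $\xi$ yields (4.20). To prove that the condition (4.20) is sufficient, we define $f$ as in the reconstruction formula (4.17):
$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \Phi(u,\xi)\, g(t-u)\, \exp(i\xi t)\, d\xi\, du$$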
and show that (4.20) implies that $\Phi(u,\xi) = Sf(u,\xi)$.

Ambiguity Function The reproducing kernel $K(u_0,u,\xi_0,\xi)$ measures the time-frequency overlap of the two atoms $g_{u,\xi}$ and $g_{u_0,\xi_0}$. The amplitude of $K(u_0,u,\xi_0,\xi)$ decays with $u_0 - u$ and $\xi_0 - \xi$ at a rate that depends on the energy concentration of $g$ and $\hat g$. Replacing $g_{u,\xi}$ and $g_{u_0,\xi_0}$ by their expression and making the change of variable $v = t - (u + u_0)/2$ in the inner product integral (4.21) yields
$$K(u_0,u,\xi_0,\xi) = \exp\left( -\frac{i}{2} (\xi_0 - \xi)(u + u_0) \right) Ag(u_0 - u,\ \xi_0 - \xi), \tag{4.23}$$
where
$$Ag(\tau,\gamma) = \int_{-\infty}^{+\infty} g\left( v + \frac{\tau}{2} \right) g\left( v - \frac{\tau}{2} \right) e^{-i\gamma v}\, dv \tag{4.24}$$
is called the ambiguity function of $g$. Using the Parseval formula to replace this time integral with a Fourier integral gives
$$Ag(\tau,\gamma) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat g\left( \omega + \frac{\gamma}{2} \right) \hat g\left( \omega - \frac{\gamma}{2} \right) e^{i\tau\omega}\, d\omega. \tag{4.25}$$
The decay of the ambiguity function measures the spread of $g$ in time and of $\hat g$ in frequency. For example, if $g$ has a support included in an interval of size $T$, then the two factors $g(v + \tau/2)$ and $g(v - \tau/2)$ in (4.24) have disjoint supports as soon as $|\tau| \geq T$, so $Ag(\tau,\gamma) = 0$ for $|\tau| \geq T$. The integral (4.25) shows that the same result applies to the support of $\hat g$.

4.2.2 Choice of Window 2

The resolution in time and frequency of the windowed Fourier transform
depends on the spread of the window in time and frequency. This can
be measured from the decay of the ambiguity function (4.24) or more
simply from the area $\sigma_t \sigma_\omega$ of the Heisenberg box. The uncertainty Theorem 2.5 proves that this area reaches the minimum value $1/2$ if and only if $g$ is a Gaussian. The ambiguity function $Ag(\tau,\gamma)$ is then a two-dimensional Gaussian.

Window Scale The time-frequency localization of $g$ can be modified with a scaling. Suppose that $g$ has a Heisenberg time and frequency width respectively equal to $\sigma_t$ and $\sigma_\omega$. Let $g_s(t) = s^{-1/2} g(t/s)$ be its dilation by $s$. A change of variables in the integrals (4.13) and (4.15) shows that the Heisenberg time and frequency width of $g_s$ are respectively $s\sigma_t$ and $\sigma_\omega/s$. The area of the Heisenberg box is not modified but it is dilated by $s$ in time and compressed by $s$ in frequency. Similarly, a change of variable in the ambiguity integral (4.24) shows that the ambiguity function is dilated in time and frequency respectively by $s$ and $1/s$:
$$Ag_s(\tau,\gamma) = Ag\left( \frac{\tau}{s},\ s\gamma \right).$$
The choice of a particular scale $s$ depends on the desired resolution trade-off between time and frequency.

Finite Support In numerical applications, $g$ must have a compact support. Theorem 2.6 proves that its Fourier transform $\hat g$ necessarily has an infinite support. It is a symmetric function with a main lobe centered at $\omega = 0$, which decays to zero with oscillations. Figure 4.4 illustrates its behavior. To maximize the frequency resolution of the transform, we must concentrate the energy of $\hat g$ near $\omega = 0$. Three important parameters evaluate the spread of $\hat g$:

The root mean-square bandwidth $\Delta\omega$, which is defined by
$$\frac{|\hat g(\Delta\omega/2)|^2}{|\hat g(0)|^2} = \frac{1}{2}.$$

The maximum amplitude $A$ of the first side-lobes located at $\omega = \pm\omega_0$ in Figure 4.4. It is measured in decibels:
$$A = 10 \log_{10} \frac{|\hat g(\omega_0)|^2}{|\hat g(0)|^2}.$$

The polynomial exponent $p$, which gives the asymptotic decay of $|\hat g(\omega)|$ for large frequencies:
$$|\hat g(\omega)| = O(\omega^{-p-1}). \tag{4.26}$$

Table 4.1 gives the values of these three parameters for several windows $g$ whose supports are restricted to $[-1/2, 1/2]$ [204]. Figure 4.5 shows the graph of these windows.
To interpret these three frequency parameters, let us consider the spectrogram of a frequency tone $f(t) = \exp(i\xi_0 t)$. If $\Delta\omega$ is small, then $|Sf(u,\xi)|^2 = |\hat g(\xi - \xi_0)|^2$ has an energy concentrated near $\xi = \xi_0$. The side-lobes of $\hat g$ create "shadows" at $\xi = \xi_0 \pm \omega_0$, which can be neglected if $A$ is also small.

If the frequency tone is embedded in a signal that has other components of much higher energy at different frequencies, the tone can still be detected if $\hat g(\omega - \xi)$ attenuates these components rapidly when $|\omega - \xi|$ increases. This means that $|\hat g(\omega)|$ has a rapid decay, and Proposition 2.1 proves that this decay depends on the regularity of $g$. Property (4.26) is typically satisfied by windows that are $p$ times differentiable.
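The side-lobe parameter $A$ can be estimated numerically. The sketch below assumes NumPy; the function name `sidelobe_level_db`, the number of samples `M` and the FFT size are hypothetical choices. It samples four windows on $[-1/2, 1/2]$, zero-pads them, and measures the largest spectral peak beyond the first null of the main lobe, relative to $|\hat g(0)|$. This is only an approximation of the continuous-time definition, since the windows are sampled.

```python
import numpy as np

def sidelobe_level_db(g_samples, nfft=1 << 16):
    # Magnitude of the zero-padded DFT on the non-negative frequencies.
    spectrum = np.abs(np.fft.fft(g_samples, nfft))[: nfft // 2]
    # Walk down the main lobe until the spectrum starts rising again (first null).
    rising = np.nonzero(np.diff(spectrum) > 0)[0]
    first_null = rising[0]
    # 10 log10 of the energy ratio equals 20 log10 of the amplitude ratio.
    return 20 * np.log10(spectrum[first_null:].max() / spectrum[0])

M = 512
t = (np.arange(M) + 0.5) / M - 0.5  # midpoints of M cells covering [-1/2, 1/2]

windows = {
    "Rectangle": np.ones(M),
    "Hamming": 0.54 + 0.46 * np.cos(2 * np.pi * t),
    "Hanning": np.cos(np.pi * t) ** 2,
    "Blackman": 0.42 + 0.5 * np.cos(2 * np.pi * t) + 0.08 * np.cos(4 * np.pi * t),
}

levels = {name: sidelobe_level_db(g) for name, g in windows.items()}
for name, level in levels.items():
    print(f"{name:10s} A = {level:6.1f} dB")
```

The printed levels should land close to the values of $A$ listed in Table 4.1 for these four windows.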
Figure 4.4: The energy spread of $\hat g$ is measured by its bandwidth $\Delta\omega$ and the maximum amplitude $A$ of the first side-lobes, located at $\omega = \pm\omega_0$.

Name        g(t)                                          Δω      A        p
Rectangle   $1$                                           0.89    −13 dB   0
Hamming     $0.54 + 0.46\cos(2\pi t)$                     1.36    −43 dB   0
Gaussian    $\exp(-18 t^2)$                               1.55    −55 dB   0
Hanning     $\cos^2(\pi t)$                               1.44    −32 dB   2
Blackman    $0.42 + 0.5\cos(2\pi t) + 0.08\cos(4\pi t)$   1.68    −58 dB   2

Table 4.1: Frequency parameters of five windows $g$ whose supports are restricted to $[-1/2, 1/2]$. These windows are normalized so that $g(0) = 1$ but $\|g\| \neq 1$.

Figure 4.5: Graphs of four windows $g$ whose support are $[-1/2, 1/2]$ (panels: Hamming, Gaussian, Hanning, Blackman).

4.2.3 Discrete Windowed Fourier Transform 2

The discretization and fast computation of the windowed Fourier transform follow the same ideas as the discretization of the Fourier transform
described in Section 3.3. We consider discrete signals of period $N$. The window $g[n]$ is chosen to be a symmetric discrete signal of period $N$ with unit norm $\|g\| = 1$. Discrete windowed Fourier atoms are defined by
$$g_{m,l}[n] = g[n-m]\, \exp\left( \frac{i2\pi l n}{N} \right).$$
The discrete Fourier transform of $g_{m,l}$ is
$$\hat g_{m,l}[k] = \hat g[k-l]\, \exp\left( \frac{-i2\pi m(k-l)}{N} \right).$$
The discrete windowed Fourier transform of a signal $f$ of period $N$ is
$$Sf[m,l] = \langle f, g_{m,l} \rangle = \sum_{n=0}^{N-1} f[n]\, g[n-m]\, \exp\left( \frac{-i2\pi l n}{N} \right). \tag{4.27}$$
For each $0 \leq m < N$, $Sf[m,l]$ is calculated for $0 \leq l < N$ with a discrete Fourier transform of $f[n]\, g[n-m]$. This is performed with $N$ FFT procedures of size $N$, and thus requires a total of $O(N^2 \log_2 N)$ operations. Figure 4.3 is computed with this algorithm.

Inverse Transform The following theorem discretizes the reconstruction formula and the energy conservation of Theorem 4.1.

Theorem 4.2 If $f$ is a signal of period $N$ then
$$f[n] = \frac{1}{N} \sum_{m=0}^{N-1} \sum_{l=0}^{N-1} Sf[m,l]\, g[n-m]\, \exp\left( \frac{i2\pi l n}{N} \right) \tag{4.28}$$
and
$$\sum_{n=0}^{N-1} |f[n]|^2 = \frac{1}{N} \sum_{l=0}^{N-1} \sum_{m=0}^{N-1} |Sf[m,l]|^2. \tag{4.29}$$

This theorem is proved by applying the Parseval and Plancherel formulas of the discrete Fourier transform, exactly as in the proof of Theorem 4.1. The reconstruction formula (4.28) is rewritten
$$f[n] = \frac{1}{N} \sum_{m=0}^{N-1} g[n-m] \sum_{l=0}^{N-1} Sf[m,l]\, \exp\left( \frac{i2\pi l n}{N} \right).$$
The second sum computes for each $0 \leq m < N$ the inverse discrete Fourier transform of $Sf[m,l]$ with respect to $l$. This is calculated with $N$ FFT procedures, requiring a total of $O(N^2 \log_2 N)$ operations.
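The transform (4.27) and the inverse (4.28) can be sketched with one FFT per window position. This is a minimal illustration assuming NumPy; the function names, the periodic Gaussian window and its `width` are hypothetical choices made for the example, not prescribed by the text.

```python
import numpy as np

def periodic_gaussian_window(N, width):
    # Symmetric window of period N (g[n] = g[-n]) with unit norm.
    n = np.arange(N)
    d = np.minimum(n, N - n)  # circular distance to index 0
    g = np.exp(-0.5 * (d / width) ** 2)
    return g / np.linalg.norm(g)

def windowed_fourier(f, g):
    # Sf[m, l] of (4.27): one FFT of f[n] g[n - m] for each shift m.
    N = len(f)
    return np.array([np.fft.fft(f * np.roll(g, m)) for m in range(N)])

def inverse_windowed_fourier(S, g):
    # Reconstruction (4.28): one inverse FFT per m, weighted by g[n - m].
    N = S.shape[0]
    f = np.zeros(N, dtype=complex)
    for m in range(N):
        f += np.roll(g, m) * np.fft.ifft(S[m])  # ifft already includes the 1/N factor
    return f

rng = np.random.default_rng(0)
N = 64
f = rng.standard_normal(N)
g = periodic_gaussian_window(N, width=4.0)

S = windowed_fourier(f, g)
f_rec = inverse_windowed_fourier(S, g)

# Energy conservation (4.29): (1/N) sum |Sf[m,l]|^2 equals sum |f[n]|^2.
energy_ratio = np.sum(np.abs(S) ** 2) / (N * np.sum(f**2))
print("max reconstruction error:", np.max(np.abs(f_rec - f)))
print("energy ratio:", energy_ratio)
```

Here `np.roll(g, m)[n]` equals $g[n-m]$ with periodic indexing, which is the convention used in (4.27).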
A discrete windowed Fourier transform is an $N^2$ image $Sf[m,l]$ that is very redundant, since it is entirely specified by a signal $f$ of size $N$. The redundancy is characterized by a discrete reproducing kernel equation, which is the discrete equivalent of (4.20).

4.3 Wavelet Transforms 1

To analyze signal structures of very different sizes, it is necessary to use time-frequency atoms with different time supports. The wavelet transform decomposes signals over dilated and translated wavelets. A wavelet is a function $\psi \in L^2(\mathbb{R})$ with a zero average:
$$\int_{-\infty}^{+\infty} \psi(t)\, dt = 0. \tag{4.30}$$
It is normalized $\|\psi\| = 1$, and centered in the neighborhood of $t = 0$. A family of time-frequency atoms is obtained by scaling $\psi$ by $s$ and translating it by $u$:
$$\psi_{u,s}(t) = \frac{1}{\sqrt{s}}\, \psi\left( \frac{t-u}{s} \right).$$
These atoms remain normalized: $\|\psi_{u,s}\| = 1$. The wavelet transform of $f \in L^2(\mathbb{R})$ at time $u$ and scale $s$ is
$$Wf(u,s) = \langle f, \psi_{u,s} \rangle = \int_{-\infty}^{+\infty} f(t)\, \frac{1}{\sqrt{s}}\, \psi^*\left( \frac{t-u}{s} \right) dt. \tag{4.31}$$

Linear Filtering The wavelet transform can be rewritten as a convolution product:
$$Wf(u,s) = \int_{-\infty}^{+\infty} f(t)\, \frac{1}{\sqrt{s}}\, \psi^*\left( \frac{t-u}{s} \right) dt = f \star \bar\psi_s(u) \tag{4.32}$$
with
$$\bar\psi_s(t) = \frac{1}{\sqrt{s}}\, \psi^*\left( \frac{-t}{s} \right).$$
The Fourier transform of $\bar\psi_s(t)$ is
$$\hat{\bar\psi}_s(\omega) = \sqrt{s}\, \hat\psi^*(s\omega). \tag{4.33}$$
Since $\hat\psi(0) = \int_{-\infty}^{+\infty} \psi(t)\, dt = 0$, it appears that $\hat\psi$ is the transfer function of a band-pass filter. The convolution (4.32) computes the wavelet transform with dilated band-pass filters.

Analytic Versus Real Wavelets Like a windowed Fourier transform, a wavelet transform can measure the time evolution of frequency transients. This requires using a complex analytic wavelet, which can separate amplitude and phase components. The properties of this analytic wavelet transform are described in Section 4.3.2, and its application to the measurement of instantaneous frequencies is explained in Section 4.4.2. In contrast, real wavelets are often used to detect sharp signal transitions. Section 4.3.1 introduces elementary properties of real wavelets, which are developed in Chapter 6.

4.3.1 Real Wavelets

Suppose that $\psi$ is a real wavelet. Since it has a zero average, the wavelet
integral
$$Wf(u,s) = \int_{-\infty}^{+\infty} f(t)\, \frac{1}{\sqrt{s}}\, \psi\left( \frac{t-u}{s} \right) dt$$
measures the variation of $f$ in a neighborhood of $u$, whose size is proportional to $s$. Section 6.1.3 proves that when the scale $s$ goes to zero, the decay of the wavelet coefficients characterizes the regularity of $f$ in the neighborhood of $u$. This has important applications for detecting transients and analyzing fractals. This section concentrates on the completeness and redundancy properties of real wavelet transforms.

Example 4.6 Wavelets equal to the second derivative of a Gaussian are called Mexican hats. They were first used in computer vision to detect multiscale edges [354]. The normalized Mexican hat wavelet is
$$\psi(t) = \frac{2}{\pi^{1/4} \sqrt{3\sigma}} \left( \frac{t^2}{\sigma^2} - 1 \right) \exp\left( \frac{-t^2}{2\sigma^2} \right). \tag{4.34}$$
For $\sigma = 1$, Figure 4.6 plots $-\psi$ and its Fourier transform
$$\hat\psi(\omega) = \frac{-\sqrt{8}\, \sigma^{5/2}\, \pi^{1/4}}{\sqrt{3}}\, \omega^2 \exp\left( \frac{-\sigma^2 \omega^2}{2} \right). \tag{4.35}$$
Figure 4.7 shows the wavelet transform of a signal that is piecewise regular on the left and almost everywhere singular on the right. The maximum scale is smaller than 1 because the support of $f$ is normalized to $[0,1]$. The minimum scale is limited by the sampling interval of the discretized signal used in numerical calculations. When the scale decreases, the wavelet transform has a rapid decay to zero in the regions where the signal is regular. The isolated singularities on the left create cones of large amplitude wavelet coefficients that converge to the locations of the singularities. This is further explained in Chapter 6.
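The normalization of (4.34) and the Fourier transform (4.35) can be verified numerically. The sketch below assumes NumPy; the function names, the value of `sigma` and the integration grid are illustrative assumptions. It checks the zero average, the unit norm, and a few values of $\hat\psi$ computed by direct integration.

```python
import numpy as np

def mexican_hat(t, sigma=1.0):
    # Normalized Mexican hat wavelet (4.34).
    c = 2.0 / (np.pi**0.25 * np.sqrt(3.0 * sigma))
    return c * (t**2 / sigma**2 - 1.0) * np.exp(-t**2 / (2 * sigma**2))

def mexican_hat_ft(w, sigma=1.0):
    # Closed-form Fourier transform (4.35).
    c = -np.sqrt(8.0) * sigma**2.5 * np.pi**0.25 / np.sqrt(3.0)
    return c * w**2 * np.exp(-sigma**2 * w**2 / 2)

sigma = 1.5  # illustrative value
t = np.linspace(-20 * sigma, 20 * sigma, 40001)
dt = t[1] - t[0]
psi = mexican_hat(t, sigma)

mean = np.sum(psi) * dt                # should vanish: zero average (4.30)
norm = np.sqrt(np.sum(psi**2) * dt)    # should equal 1

# Fourier transform by direct Riemann sum at a few frequencies.
w = np.array([0.0, 0.5, 1.0, 2.0])
ft_numeric = np.array([np.sum(psi * np.exp(-1j * wk * t)) * dt for wk in w])

print("mean:", mean, "norm:", norm)
print("max FT error:", np.max(np.abs(ft_numeric - mexican_hat_ft(w, sigma))))
```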
Figure 4.6: Mexican hat wavelet (4.34) for $\sigma = 1$ and its Fourier transform (panels: $-\psi(t)$ and $-\hat\psi(\omega)$).
A real wavelet transform is complete and maintains an energy conservation, as long as the wavelet satisfies a weak admissibility condition, specified by the following theorem. This theorem was first proved in 1964 by the mathematician Calderón [111], from a different point of view. Wavelets did not appear as such, but Calderón defines a wavelet transform as a convolution operator that decomposes the identity. Grossmann and Morlet [200] were not aware of Calderón's work when they proved the same formula for signal processing.
Theorem 4.3 (Calderón, Grossmann, Morlet) Let $\psi \in L^2(\mathbb{R})$ be a real function such that
$$C_\psi = \int_0^{+\infty} \frac{|\hat\psi(\omega)|^2}{\omega}\, d\omega < +\infty. \tag{4.36}$$
Any $f \in L^2(\mathbb{R})$ satisfies
$$f(t) = \frac{1}{C_\psi} \int_0^{+\infty} \int_{-\infty}^{+\infty} Wf(u,s)\, \frac{1}{\sqrt{s}}\, \psi\left( \frac{t-u}{s} \right) du\, \frac{ds}{s^2} \tag{4.37}$$
and
$$\int_{-\infty}^{+\infty} |f(t)|^2\, dt = \frac{1}{C_\psi} \int_0^{+\infty} \int_{-\infty}^{+\infty} |Wf(u,s)|^2\, du\, \frac{ds}{s^2}. \tag{4.38}$$

Figure 4.7: Real wavelet transform $Wf(u,s)$ computed with a Mexican hat wavelet (4.34). The vertical axis represents $\log_2 s$. Black, grey and white points correspond respectively to positive, zero and negative wavelet coefficients. (Top plot: $f(t)$ for $t \in [0,1]$; bottom plot: horizontal axis $u$, vertical axis $\log_2(s)$.)

Proof 1. The proof of (4.38) is almost identical to the proof of (4.18).
Let us concentrate on the proof of (4.37). The right integral $b(t)$ of (4.37) can be rewritten as a sum of convolutions. Inserting $Wf(u,s) = f \star \bar\psi_s(u)$ with $\psi_s(t) = s^{-1/2}\, \psi(t/s)$ yields
$$b(t) = \frac{1}{C_\psi} \int_0^{+\infty} Wf(\cdot,s) \star \psi_s(t)\, \frac{ds}{s^2} = \frac{1}{C_\psi} \int_0^{+\infty} f \star \bar\psi_s \star \psi_s(t)\, \frac{ds}{s^2}. \tag{4.39}$$
The "." indicates the variable over which the convolution is calculated. We prove that $b = f$ by showing that their Fourier transforms are equal. The Fourier transform of $b$ is
$$\hat b(\omega) = \frac{1}{C_\psi} \int_0^{+\infty} \hat f(\omega)\, \sqrt{s}\, \hat\psi^*(s\omega)\, \sqrt{s}\, \hat\psi(s\omega)\, \frac{ds}{s^2} = \frac{\hat f(\omega)}{C_\psi} \int_0^{+\infty} |\hat\psi(s\omega)|^2\, \frac{ds}{s}.$$
Since $\psi$ is real we know that $|\hat\psi(-\omega)|^2 = |\hat\psi(\omega)|^2$. The change of variable $\xi = s\omega$ thus proves that
$$\hat b(\omega) = \frac{1}{C_\psi}\, \hat f(\omega) \int_0^{+\infty} \frac{|\hat\psi(\xi)|^2}{\xi}\, d\xi = \hat f(\omega).$$

The theorem hypothesis
$$C_\psi = \int_0^{+\infty} \frac{|\hat\psi(\omega)|^2}{\omega}\, d\omega < +\infty$$
is called the wavelet admissibility condition. To guarantee that this integral is finite we must ensure that $\hat\psi(0) = 0$, which explains why we imposed that wavelets must have a zero average. This condition is nearly sufficient. If $\hat\psi(0) = 0$ and $\hat\psi(\omega)$ is continuously differentiable then the admissibility condition is satisfied. One can verify that $\hat\psi(\omega)$ is continuously differentiable if $\psi$ has a sufficient time decay
$$\int_{-\infty}^{+\infty} (1 + |t|)\, |\psi(t)|\, dt < +\infty.$$

Reproducing Kernel Like a windowed Fourier transform, a wavelet
transform is a redundant representation, whose redundancy is characterized by a reproducing kernel equation. Inserting the reconstruction formula (4.37) into the definition of the wavelet transform yields
$$Wf(u_0,s_0) = \int_{-\infty}^{+\infty} \left( \frac{1}{C_\psi} \int_0^{+\infty} \int_{-\infty}^{+\infty} Wf(u,s)\, \psi_{u,s}(t)\, du\, \frac{ds}{s^2} \right) \psi^*_{u_0,s_0}(t)\, dt.$$
Interchanging these integrals gives
$$Wf(u_0,s_0) = \frac{1}{C_\psi} \int_0^{+\infty} \int_{-\infty}^{+\infty} K(u_0,u,s_0,s)\, Wf(u,s)\, du\, \frac{ds}{s^2} \tag{4.40}$$
with
$$K(u_0,u,s_0,s) = \langle \psi_{u,s}, \psi_{u_0,s_0} \rangle. \tag{4.41}$$
The reproducing kernel $K(u_0,u,s_0,s)$ measures the correlation of two wavelets $\psi_{u,s}$ and $\psi_{u_0,s_0}$. The reader can verify that any function $\Phi(u,s)$ is the wavelet transform of some $f \in L^2(\mathbb{R})$ if and only if it satisfies the reproducing kernel equation (4.40).
Scaling Function When $Wf(u,s)$ is known only for $s < s_0$, to recover $f$ we need a complement of information corresponding to $Wf(u,s)$ for $s > s_0$. This is obtained by introducing a scaling function $\phi$ that is an aggregation of wavelets at scales larger than 1. The modulus of its Fourier transform is defined by
$$|\hat\phi(\omega)|^2 = \int_1^{+\infty} |\hat\psi(s\omega)|^2\, \frac{ds}{s} = \int_\omega^{+\infty} \frac{|\hat\psi(\xi)|^2}{\xi}\, d\xi, \tag{4.42}$$
and the complex phase of $\hat\phi(\omega)$ can be arbitrarily chosen. One can verify that $\|\phi\| = 1$ and we derive from the admissibility condition (4.36) that
$$\lim_{\omega \to 0} |\hat\phi(\omega)|^2 = C_\psi. \tag{4.43}$$
The scaling function can thus be interpreted as the impulse response of a low-pass filter. Let us denote
$$\phi_s(t) = \frac{1}{\sqrt{s}}\, \phi\left( \frac{t}{s} \right) \quad \text{and} \quad \bar\phi_s(t) = \phi_s^*(-t).$$
The low-frequency approximation of $f$ at the scale $s$ is
$$Lf(u,s) = \left\langle f(t),\ \frac{1}{\sqrt{s}}\, \phi\left( \frac{t-u}{s} \right) \right\rangle = f \star \bar\phi_s(u). \tag{4.44}$$
With a minor modification of the proof of Theorem 4.3, it can be shown that
$$f(t) = \frac{1}{C_\psi} \int_0^{s_0} Wf(\cdot,s) \star \psi_s(t)\, \frac{ds}{s^2} + \frac{1}{C_\psi s_0}\, Lf(\cdot,s_0) \star \phi_{s_0}(t). \tag{4.45}$$

Example 4.7 If $\psi$ is the second order derivative of a Gaussian whose Fourier transform is given by (4.35), then the integration (4.42) yields
$$\hat\phi(\omega) = \frac{2\sigma^{3/2} \pi^{1/4}}{\sqrt{3}} \sqrt{\omega^2 + \frac{1}{\sigma^2}}\ \exp\left( -\frac{\sigma^2 \omega^2}{2} \right). \tag{4.46}$$
Figure 4.8 displays $\phi$ and $\hat\phi$ for $\sigma = 1$.
Figure 4.8: Scaling function associated to a Mexican hat wavelet and its Fourier transform calculated with (4.46).

4.3.2 Analytic Wavelets

To analyze the time evolution of frequency tones, it is necessary to use an analytic wavelet to separate the phase and amplitude information of signals. The properties of the resulting analytic wavelet transform are studied.

Analytic Signal A function $f_a \in L^2(\mathbb{R})$ is said to be analytic if its Fourier transform is zero for negative frequencies:
$$\hat f_a(\omega) = 0 \quad \text{if } \omega < 0.$$
An analytic function is necessarily complex but is entirely characterized by its real part. Indeed, the Fourier transform of its real part $f = \mathrm{Real}[f_a]$ is
$$\hat f(\omega) = \frac{\hat f_a(\omega) + \hat f_a^*(-\omega)}{2},$$
and this relation can be inverted:
$$\hat f_a(\omega) = \begin{cases} 2 \hat f(\omega) & \text{if } \omega \geq 0 \\ 0 & \text{if } \omega < 0 \end{cases}. \tag{4.47}$$
The analytic part $f_a(t)$ of a signal $f(t)$ is the inverse Fourier transform
of $\hat f_a(\omega)$ defined by (4.47).

Discrete Analytic Part The analytic part $f_a[n]$ of a discrete signal $f[n]$ of size $N$ is also computed by setting to zero the negative frequency components of its discrete Fourier transform. The Fourier transform values at $k = 0$ and $k = N/2$ must be carefully adjusted so that $\mathrm{Real}[f_a] = f$:
$$\hat f_a[k] = \begin{cases} \hat f[k] & \text{if } k = 0,\ N/2 \\ 2 \hat f[k] & \text{if } 0 < k < N/2 \\ 0 & \text{if } N/2 < k < N \end{cases}. \tag{4.48}$$
We obtain $f_a[n]$ by computing the inverse discrete Fourier transform.

Example 4.8 The Fourier transform of
$$f(t) = a \cos(\omega_0 t + \phi) = \frac{a}{2} \Big( \exp[i(\omega_0 t + \phi)] + \exp[-i(\omega_0 t + \phi)] \Big)$$
is
$$\hat f(\omega) = \pi a \Big( \exp(i\phi)\, \delta(\omega - \omega_0) + \exp(-i\phi)\, \delta(\omega + \omega_0) \Big).$$
The Fourier transform of the analytic part computed with (4.47) is $\hat f_a(\omega) = 2\pi a \exp(i\phi)\, \delta(\omega - \omega_0)$ and hence
$$f_a(t) = a \exp[i(\omega_0 t + \phi)]. \tag{4.49}$$

Time-Frequency Resolution An analytic wavelet transform is calculated with an analytic wavelet $\psi$:
$$Wf(u,s) = \langle f, \psi_{u,s} \rangle = \int_{-\infty}^{+\infty} f(t)\, \frac{1}{\sqrt{s}}\, \psi^*\left( \frac{t-u}{s} \right) dt. \tag{4.50}$$
Its time-frequency resolution depends on the time-frequency spread of
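the wavelet atoms $\psi_{u,s}$. We suppose that $\psi$ is centered at 0, which implies that $\psi_{u,s}$ is centered at $t = u$. With the change of variable $v = \frac{t-u}{s}$, we verify that
$$\int_{-\infty}^{+\infty} (t-u)^2\, |\psi_{u,s}(t)|^2\, dt = s^2 \sigma_t^2, \tag{4.51}$$
with $\sigma_t^2 = \int_{-\infty}^{+\infty} t^2\, |\psi(t)|^2\, dt$. Since $\hat\psi(\omega)$ is zero at negative frequencies, the center frequency $\eta$ of $\hat\psi$ is
$$\eta = \frac{1}{2\pi} \int_0^{+\infty} \omega\, |\hat\psi(\omega)|^2\, d\omega. \tag{4.52}$$
The Fourier transform of $\psi_{u,s}$ is a dilation of $\hat\psi$ by $1/s$:
$$\hat\psi_{u,s}(\omega) = \sqrt{s}\, \hat\psi(s\omega)\, \exp(-i\omega u). \tag{4.53}$$
Its center frequency is therefore $\eta/s$. The energy spread of $\hat\psi_{u,s}$ around $\eta/s$ is
$$\frac{1}{2\pi} \int_0^{+\infty} \left( \omega - \frac{\eta}{s} \right)^2 |\hat\psi_{u,s}(\omega)|^2\, d\omega = \frac{\sigma_\omega^2}{s^2}, \tag{4.54}$$
with
$$\sigma_\omega^2 = \frac{1}{2\pi} \int_0^{+\infty} (\omega - \eta)^2\, |\hat\psi(\omega)|^2\, d\omega.$$
The energy spread of a wavelet time-frequency atom $\psi_{u,s}$ thus corresponds to a Heisenberg box centered at $(u, \eta/s)$, of size $s\sigma_t$ along time and $\sigma_\omega/s$ along frequency. The area of the rectangle remains equal to $\sigma_t \sigma_\omega$ at all scales but the resolution in time and frequency depends on $s$, as illustrated in Figure 4.9.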
An analytic wavelet transform defines a local time-frequency energy density $P_W f$, which measures the energy of $f$ in the Heisenberg box of each wavelet $\psi_{u,s}$ centered at $(u, \xi = \eta/s)$:
$$P_W f(u,\xi) = |Wf(u,s)|^2 = \left| Wf\left( u, \frac{\eta}{\xi} \right) \right|^2. \tag{4.55}$$
This energy density is called a scalogram.

Figure 4.9: Heisenberg boxes of two wavelets. Smaller scales decrease the time spread but increase the frequency support, which is shifted towards higher frequencies.

Completeness An analytic wavelet transform of $f$ depends only on its analytic part $f_a$. The following theorem derives a reconstruction formula and proves that energy is conserved for real signals.

Theorem 4.4 For any $f \in L^2(\mathbb{R})$
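$$Wf(u,s) = \frac{1}{2}\, Wf_a(u,s). \tag{4.56}$$
If $C_\psi = \int_0^{+\infty} \omega^{-1}\, |\hat\psi(\omega)|^2\, d\omega < +\infty$ and $f$ is real then
$$f(t) = \frac{2}{C_\psi}\, \mathrm{Real}\left[ \int_0^{+\infty} \int_{-\infty}^{+\infty} Wf(u,s)\, \psi_s(t-u)\, du\, \frac{ds}{s^2} \right] \tag{4.57}$$
and
$$\|f\|^2 = \frac{2}{C_\psi} \int_0^{+\infty} \int_{-\infty}^{+\infty} |Wf(u,s)|^2\, du\, \frac{ds}{s^2}. \tag{4.58}$$

Proof 1. Let us first prove (4.56). The Fourier transform with respect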
to $u$ of
$$f_s(u) = Wf(u,s) = f \star \bar\psi_s(u)$$
is
$$\hat f_s(\omega) = \hat f(\omega)\, \sqrt{s}\, \hat\psi^*(s\omega).$$
Since $\hat\psi(\omega) = 0$ at negative frequencies, and $\hat f_a(\omega) = 2 \hat f(\omega)$ for $\omega \geq 0$, we derive that
$$\hat f_s(\omega) = \frac{1}{2}\, \hat f_a(\omega)\, \sqrt{s}\, \hat\psi^*(s\omega),$$
which is the Fourier transform of (4.56).

With the same derivations as in the proof of (4.37) one can verify that the inverse wavelet formula reconstructs the analytic part of $f$:
$$f_a(t) = \frac{1}{C_\psi} \int_0^{+\infty} \int_{-\infty}^{+\infty} Wf_a(u,s)\, \psi_s(t-u)\, \frac{ds}{s^2}\, du. \tag{4.59}$$
Since $f = \mathrm{Real}[f_a]$, inserting (4.56) proves (4.57).

An energy conservation for the analytic part $f_a$ is proved as in (4.38) by applying the Plancherel formula:
$$\int_{-\infty}^{+\infty} |f_a(t)|^2\, dt = \frac{1}{C_\psi} \int_0^{+\infty} \int_{-\infty}^{+\infty} |Wf_a(u,s)|^2\, du\, \frac{ds}{s^2}.$$
Since $Wf_a(u,s) = 2\, Wf(u,s)$ and $\|f_a\|^2 = 2 \|f\|^2$, equation (4.58) follows.

If $f$ is real the change of variable $\xi = 1/s$ in the energy conservation (4.58) proves that
$$\|f\|^2 = \frac{2}{C_\psi} \int_0^{+\infty} \int_{-\infty}^{+\infty} P_W f(u,\xi)\, du\, d\xi.$$
It justifies the interpretation of a scalogram as a time-frequency energy density.

Wavelet Modulated Windows An analytic wavelet can be constructed with a frequency modulation of a real and symmetric window
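$g$. The Fourier transform of
$$\psi(t) = g(t)\, \exp(i\eta t) \tag{4.60}$$
is $\hat\psi(\omega) = \hat g(\omega - \eta)$. If $\hat g(\omega) = 0$ for $|\omega| > \eta$ then $\hat\psi(\omega) = 0$ for $\omega < 0$. Hence $\psi$ is analytic, as shown in Figure 4.10. Since $g$ is real and even, $\hat g$ is also real and symmetric. The center frequency of $\hat\psi$ is therefore $\eta$ and
$$|\hat\psi(\eta)| = \sup_{\omega \in \mathbb{R}} |\hat\psi(\omega)| = \hat g(0). \tag{4.61}$$
A Gabor wavelet $\psi(t) = g(t)\, e^{i\eta t}$ is obtained with a Gaussian window
$$g(t) = \frac{1}{(\sigma^2 \pi)^{1/4}} \exp\left( \frac{-t^2}{2\sigma^2} \right). \tag{4.62}$$
The Fourier transform of this window is $\hat g(\omega) = (4\pi\sigma^2)^{1/4} \exp(-\sigma^2 \omega^2 / 2)$. If $\sigma^2 \eta^2 \gg 1$ then $\hat g(\omega) \approx 0$ for $|\omega| > \eta$. Such Gabor wavelets are thus considered to be approximately analytic.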
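How close a Gabor wavelet is to being analytic can be quantified by the fraction of its energy carried by negative frequencies. The sketch below assumes NumPy and the standard-library `math.erfc`; the function names and the tested values of $\sigma\eta$ are illustrative. Integrating $|\hat g(\omega - \eta)|^2$ over $\omega < 0$ by hand with the Gaussian window (4.62) gives $\frac{1}{2}\mathrm{erfc}(\sigma\eta)$, which the numerical integral is compared against.

```python
import math
import numpy as np

def gabor_wavelet_ft(w, sigma, eta):
    # psi_hat(w) = g_hat(w - eta) for the Gaussian window (4.62).
    return (4 * np.pi * sigma**2) ** 0.25 * np.exp(-sigma**2 * (w - eta) ** 2 / 2)

def negative_frequency_energy(sigma, eta, n=200001):
    # (1 / 2 pi) * integral of |psi_hat(w)|^2 over w < 0, by the trapezoid rule.
    # The lower limit -12/sigma truncates a negligible Gaussian tail.
    w = np.linspace(-12.0 / sigma, 0.0, n)
    y = gabor_wavelet_ft(w, sigma, eta) ** 2
    dw = w[1] - w[0]
    return dw * (np.sum(y) - 0.5 * (y[0] + y[-1])) / (2 * np.pi)

sigma = 1.0
fractions = {}
expected = {}
for sigma_eta in (1.0, 3.0, 5.0):  # illustrative values of sigma * eta
    eta = sigma_eta / sigma
    fractions[sigma_eta] = negative_frequency_energy(sigma, eta)
    expected[sigma_eta] = 0.5 * math.erfc(sigma_eta)
    print(f"sigma*eta = {sigma_eta}: negative-frequency energy = {fractions[sigma_eta]:.3e}")
```

Since $\|\psi\| = 1$, these values are directly the fraction of the total energy; they decay very quickly as $\sigma\eta$ grows.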
Figure 4.10: Fourier transform $\hat\psi(\omega)$ of a wavelet $\psi(t) = g(t) \exp(i\eta t)$.

Example 4.9 The wavelet transform of $f(t) = a \exp(i\omega_0 t)$ is
$$Wf(u,s) = a \sqrt{s}\, \hat\psi^*(s\omega_0)\, \exp(i\omega_0 u) = a \sqrt{s}\, \hat g(s\omega_0 - \eta)\, \exp(i\omega_0 u).$$
Observe that the normalized scalogram is maximum at $\xi = \omega_0$:
$$\frac{\xi}{\eta}\, P_W f(u,\xi) = \frac{1}{s}\, |Wf(u,s)|^2 = a^2 \left| \hat g\left( \eta \left( \frac{\omega_0}{\xi} - 1 \right) \right) \right|^2.$$

Example 4.10 The wavelet transform of a linear chirp $f(t) = \exp(iat^2) =$
$\exp[i\phi(t)]$ is computed for a Gabor wavelet whose Gaussian window is (4.62). By using the Fourier transform of Gaussian chirps (2.34) one can verify that
$$\frac{|Wf(u,s)|^2}{s} = \left( \frac{4\pi\sigma^2}{1 + 4 s^4 a^2 \sigma^4} \right)^{1/2} \exp\left( \frac{-\sigma^2}{1 + 4 a^2 s^4 \sigma^4}\, (\eta - 2asu)^2 \right).$$
As long as $4 a^2 s^4 \sigma^4 \ll 1$, at a fixed time $u$ the renormalized scalogram $\eta^{-1} \xi\, P_W f(u,\xi)$ is a Gaussian function of $s$ that reaches its maximum at
$$\xi(u) = \frac{\eta}{s(u)} = \phi'(u) = 2au. \tag{4.63}$$
Section 4.4.2 explains why the amplitude is maximum at the instantaneous frequency $\phi'(u)$.

Example 4.11 Figure 4.11 displays the normalized scalogram $\eta^{-1} \xi\, P_W f(u,\xi)$, and the complex phase $\Theta_W(u,\xi)$ of $Wf(u,s)$, for the signal $f$ of Figure 4.3. The frequency bandwidth of wavelet atoms is proportional to $1/s = \xi/\eta$. The frequency resolution of the scalogram is therefore finer than the spectrogram at low frequencies but coarser than the spectrogram at higher frequencies. This explains why the wavelet transform produces interference patterns between the high frequency Gabor function at the abscissa $t = 0.87$ and the quadratic chirp at the same location, whereas the spectrogram in Figure 4.3 separates them well.

4.3.3 Discrete Wavelets 2

Let $\dot f(t)$ be a continuous time signal that is uniformly sampled at intervals $N^{-1}$ over $[0,1]$. Its wavelet transform can only be calculated at
scales $N^{-1} < s < 1$, as shown in Figure 4.7. In discrete computations, it is easier to normalize the sampling distance to 1 and thus consider the dilated signal $f(t) = \dot f(N^{-1} t)$. A change of variable in the wavelet transform integral (4.31) proves that
$$W\dot f(u,s) = N^{-1/2}\, Wf(Nu, Ns).$$

Figure 4.11: (a) Normalized scalogram $\eta^{-1} \xi\, P_W f(u,\xi)$ computed from the signal in Figure 4.3. Dark points indicate large amplitude coefficients. (b) Complex phase $\Theta_W(u,\xi)$ of $Wf(u,\eta/\xi)$, where the modulus is non-zero. (Both panels: horizontal axis $u$, vertical axis $\xi/2\pi$.)

To simplify notation, we concentrate on $f$ and denote $f[n] = f(n)$ the discrete signal of size $N$. Its discrete wavelet transform is computed at scales $s = a^j$, with $a = 2^{1/v}$, which provides $v$ intermediate scales in each octave $[2^j, 2^{j+1})$.

Let $\psi(t)$ be a wavelet whose support is included in $[-K/2, K/2]$. For $2 \leq a^j \leq N K^{-1}$, a discrete wavelet scaled by $a^j$ is defined by
$$\psi_j[n] = \frac{1}{\sqrt{a^j}}\, \psi\left( \frac{n}{a^j} \right).$$
This discrete wavelet has $K a^j$ non-zero values on $[-N/2, N/2]$. The scale $a^j$ is larger than 2 otherwise the sampling interval may be larger than the wavelet support.

Fast Transform To avoid border problems, we treat $f[n]$ and the wavelets $\psi_j[n]$ as periodic signals of period $N$. The discrete wavelet transform can then be written as a circular convolution with $\bar\psi_j[n] = \psi_j^*[-n]$:
$$Wf[n, a^j] = \sum_{m=0}^{N-1} f[m]\, \psi_j^*[m-n] = f \circledast \bar\psi_j[n]. \tag{4.64}$$
This circular convolution is calculated with the fast Fourier transform algorithm, which requires $O(N \log_2 N)$ operations. If $a = 2^{1/v}$, there are $v \log_2(N/(2K))$ scales $a^j \in [2N^{-1}, K^{-1}]$. The total number of operations to compute the wavelet transform over all scales is therefore $O(vN (\log_2 N)^2)$ [291].
To compute the scalogram $P_W[n,\xi] = |Wf[n, \eta/\xi]|^2$ we calculate $Wf[n,s]$ at any scale $s$ with a parabola interpolation. Let $j$ be the closest integer to $\log_2 s / \log_2 a$, and $p(x)$ be the parabola such that
$$p(j-1) = Wf[n, a^{j-1}], \quad p(j) = Wf[n, a^j], \quad p(j+1) = Wf[n, a^{j+1}].$$
A second order interpolation computes
$$Wf[n,s] = p\left( \frac{\log_2 s}{\log_2 a} \right).$$
Parabolic interpolations are used instead of linear interpolations in order to locate more precisely the ridges defined in Section 4.4.2.

Discrete Scaling Filter A wavelet transform computed up to a scale $a^J$ is not a complete signal representation. It is necessary to add the low frequencies $Lf[n, a^J]$ corresponding to scales larger than $a^J$. A discrete and periodic scaling filter is computed by sampling the scaling function $\phi(t)$ defined in (4.42):
$$\phi_J[n] = \frac{1}{\sqrt{a^J}}\, \phi\left( \frac{n}{a^J} \right) \quad \text{for } n \in [-N/2, N/2].$$
Let $\bar\phi_J[n] = \phi_J^*[-n]$. The low frequencies are carried by
$$Lf[n, a^J] = \sum_{m=0}^{N-1} f[m]\, \phi_J^*[m-n] = f \circledast \bar\phi_J[n]. \tag{4.65}$$

Reconstruction An inverse wavelet transform is implemented by discretizing the integral (4.45). Suppose that $a^I = 2$ is the finest scale. Since $ds/s^2 = d\log_e s / s$ and the discrete wavelet transform is computed along an exponential scale sequence $\{a^j\}_j$ with a logarithmic increment $d\log_e s = \log_e a$, we obtain
$$f[n] \approx \frac{\log_e a}{C_\psi} \sum_{j=I}^{J} \frac{1}{a^j}\, Wf[\cdot, a^j] \circledast \psi_j[n] + \frac{1}{C_\psi a^J}\, Lf[\cdot, a^J] \circledast \phi_J[n]. \tag{4.66}$$
The "." indicates the variable over which the convolution is calculated. These circular convolutions are calculated using the FFT, with $O(vN (\log_2 N)^2)$ operations.
Analytic wavelet transforms are often computed over real signals $f[n]$ that have no energy at low frequencies. In this case do not use a scaling filter $\phi_J[n]$. Theorem 4.4 shows that
$$f[n] \approx \frac{2 \log_e a}{C_\psi}\, \mathrm{Real}\left( \sum_{j=I}^{J} \frac{1}{a^j}\, Wf[\cdot, a^j] \circledast \psi_j[n] \right). \tag{4.67}$$
The error introduced by the discretization of scales decreases when the number $v$ of voices per octave increases. However, the approximation of continuous time convolutions with discrete convolutions also creates high frequency errors. Perfect reconstructions can be obtained with a more careful design of the reconstruction filters. Section 5.5.2 describes an exact inverse wavelet transform computed at dyadic scales $a^j = 2^j$.

4.4 Instantaneous Frequency 2
When listening to music, we perceive several frequencies that change with time. This notion of instantaneous frequency remains to be defined. The time variation of several instantaneous frequencies can be measured with time-frequency decompositions, and in particular with windowed Fourier transforms and wavelet transforms.

Analytic Instantaneous Frequency
A cosine modulation
$$f(t) = a\cos(\omega_0 t + \phi_0) = a\cos\phi(t)$$
has a frequency $\omega_0$ that is the derivative of the phase $\phi(t) = \omega_0 t + \phi_0$. To generalize this notion, real signals $f$ are written as an amplitude $a$ modulated with a time varying phase $\phi$:
$$f(t) = a(t)\cos\phi(t) \quad \text{with } a(t) \ge 0. \tag{4.68}$$
The instantaneous frequency is defined as a positive derivative of the phase:
$$\omega(t) = \phi'(t) \ge 0.$$
The derivative can be chosen to be positive by adapting the sign of $\phi(t)$. One must be careful because there are many possible choices of $a(t)$ and $\phi(t)$, which implies that $\omega(t)$ is not uniquely defined relative to $f$.
A particular decomposition (4.68) is obtained from the analytic part $f_a$ of $f$, whose Fourier transform is defined in (4.47) by
$$\hat f_a(\omega) = \begin{cases} 2\hat f(\omega) & \text{if } \omega \ge 0 \\ 0 & \text{if } \omega < 0. \end{cases} \tag{4.69}$$
This complex signal is represented by separating the modulus and the complex phase:
$$f_a(t) = a(t)\exp[i\phi(t)]. \tag{4.70}$$
Since $f = \mathrm{Real}[f_a]$, it follows that
$$f(t) = a(t)\cos\phi(t).$$
We call $a(t)$ the analytic amplitude of $f(t)$ and $\phi'(t)$ its instantaneous frequency; they are uniquely defined.

Example 4.12 If $f(t) = a(t)\cos(\omega_0 t + \phi_0)$, then
$$\hat f(\omega) = \frac{1}{2}\Bigl( \exp(i\phi_0)\,\hat a(\omega - \omega_0) + \exp(-i\phi_0)\,\hat a(\omega + \omega_0) \Bigr).$$
If the variations of $a(t)$ are slow compared to the period $2\pi/\omega_0$, which is achieved by requiring that the support of $\hat a$ be included in $[-\omega_0, \omega_0]$, then
$$\hat f_a(\omega) = \hat a(\omega - \omega_0)\exp(i\phi_0),$$
so $f_a(t) = a(t)\exp[i(\omega_0 t + \phi_0)]$.

If a signal $f$ is the sum of two sinusoidal waves:
$$f(t) = a\cos(\omega_1 t) + a\cos(\omega_2 t),$$
then
$$f_a(t) = a\exp(i\omega_1 t) + a\exp(i\omega_2 t) = 2a\cos\Bigl(\frac{1}{2}(\omega_1 - \omega_2)\,t\Bigr)\exp\Bigl(\frac{i}{2}(\omega_1 + \omega_2)\,t\Bigr).$$
The instantaneous frequency is $\phi'(t) = (\omega_1 + \omega_2)/2$ and the amplitude is
$$a(t) = 2a\,\Bigl|\cos\Bigl(\frac{1}{2}(\omega_1 - \omega_2)\,t\Bigr)\Bigr|.$$
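The two-sinusoid computation can be checked numerically. The following is a minimal sketch, assuming a uniformly sampled periodic signal and NumPy; the function names `analytic_part` and `instantaneous_frequency` are illustrative and do not come from the text. The zero-frequency and Nyquist bins are kept with gain 1, which is a common discrete convention rather than a literal transcription of (4.69).

```python
import numpy as np

def analytic_part(f):
    """Discrete analogue of (4.69): double positive frequencies, zero negative ones."""
    n = len(f)
    gain = np.zeros(n)
    gain[0] = 1.0                     # assumption: DC kept once so that Real[f_a] = f
    gain[1:(n + 1) // 2] = 2.0        # strictly positive frequencies
    if n % 2 == 0:
        gain[n // 2] = 1.0            # Nyquist bin is shared by both signs
    return np.fft.ifft(np.fft.fft(f) * gain)

def instantaneous_frequency(fa, dt):
    """Phase increment between consecutive samples of f_a, divided by dt (rad/s)."""
    return np.angle(fa[1:] * np.conj(fa[:-1])) / dt

# Two sinusoids of equal amplitude, with an integer number of periods in [0, 1).
n = 1024
t = np.arange(n) / n
a, w1, w2 = 1.5, 2 * np.pi * 60, 2 * np.pi * 40
fa = analytic_part(a * np.cos(w1 * t) + a * np.cos(w2 * t))
amplitude = np.abs(fa)                        # follows 2a |cos((w1 - w2) t / 2)|
frequency = instantaneous_frequency(fa, 1.0 / n)
```

Away from the zeros of the amplitude, `frequency` stays at the average value $(\omega_1 + \omega_2)/2$, and `amplitude` follows the beating envelope given above.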
This result is not satisfying because it does not reveal that the signal includes two sinusoidal waves of the same amplitude. It measures an average frequency value. The next sections explain how to measure the instantaneous frequencies of several spectral components by separating them with a windowed Fourier transform or a wavelet transform. We first describe two important applications of instantaneous frequencies.

Frequency Modulation
In signal communications, information can be transmitted through the amplitude $a(t)$ (amplitude modulation) or the instantaneous frequency $\phi'(t)$ (frequency modulation) [65]. Frequency modulation is more robust in the presence of additive Gaussian white noise. In addition, it better resists multi-path interferences, which destroy the amplitude information. A frequency modulation sends a message $m(t)$ through a signal
$$f(t) = a\cos\phi(t) \quad \text{with } \phi'(t) = \omega_0 + k\,m(t).$$
The frequency bandwidth of $f$ is proportional to $k$. This constant is adjusted depending on the transmission noise and the available bandwidth. At the reception, the message $m(t)$ is restored with a frequency demodulation that computes the instantaneous frequency $\phi'(t)$ [101].

Additive Sound Models
Musical sounds and voiced speech segments can be modeled with sums of sinusoidal partials:
$$f(t) = \sum_{k=1}^{K} f_k(t) = \sum_{k=1}^{K} a_k(t)\cos\phi_k(t) \tag{4.71}$$
where $a_k$ and $\phi'_k$ vary slowly [296, 297]. Such decompositions are useful for pattern recognition and for modifying sound properties [245]. Sections 4.4.1 and 4.4.2 explain how to compute $a_k$ and the instantaneous frequency $\phi'_k$ of each partial, from which the phase $\phi_k$ is derived by integration.
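As an illustration of the last step, the sketch below rebuilds each phase $\phi_k$ by cumulative integration of sampled instantaneous frequencies and then evaluates the sum (4.71). It assumes uniformly sampled $a_k$ and $\phi'_k$ and uses a trapezoidal rule; the function name `synthesize_partials` and the array layout are hypothetical choices, not part of the text.

```python
import numpy as np

def synthesize_partials(amplitudes, frequencies, dt, initial_phases=None):
    """Additive model (4.71) from sampled a_k(t) and phi_k'(t).

    amplitudes, frequencies: arrays of shape (K, N); frequencies in rad/s.
    Returns the signal f (length N) and the integrated phases (K, N).
    """
    amplitudes = np.asarray(amplitudes, dtype=float)
    frequencies = np.asarray(frequencies, dtype=float)
    n_partials = frequencies.shape[0]
    if initial_phases is None:
        initial_phases = np.zeros(n_partials)
    # Trapezoidal cumulative integral of phi_k' (exact for piecewise linear phi_k').
    steps = 0.5 * (frequencies[:, 1:] + frequencies[:, :-1]) * dt
    phases = np.concatenate(
        [np.zeros((n_partials, 1)), np.cumsum(steps, axis=1)], axis=1
    )
    phases = phases + np.asarray(initial_phases, dtype=float)[:, None]
    return np.sum(amplitudes * np.cos(phases), axis=0), phases
```

The integration constant of each partial (its initial phase) is not determined by $\phi'_k$ alone, which is why it appears as a separate argument.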
To compress the sound $f$ by a factor $\alpha$ in time, without modifying the values of $\phi'_k$ and $a_k$, we synthesize
$$g(t) = \sum_{k=1}^{K} a_k(\alpha t)\cos\Bigl(\frac{1}{\alpha}\,\phi_k(\alpha t)\Bigr). \tag{4.72}$$
The partials of $g$ at $t = t_0$ and the partials of $f$ at $t = \alpha t_0$ have the same amplitudes and instantaneous frequencies. If $\alpha > 1$, the sound $g$ is shorter but it is perceived as having the same "frequency content" as $f$.
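A small sketch of (4.72), assuming that each partial is available as a pair of Python callables $(a_k, \phi_k)$ in closed form; the helper names `compressed_partial` and `time_compress` are illustrative only. The phase of each compressed partial is $\phi_k(\alpha t)/\alpha$, whose derivative is $\phi'_k(\alpha t)$, so the instantaneous frequency is unchanged.

```python
import numpy as np

def compressed_partial(a, phi, alpha):
    """Amplitude and phase of one partial of g in (4.72)."""
    amplitude = lambda t: a(alpha * t)
    phase = lambda t: phi(alpha * t) / alpha
    return amplitude, phase

def time_compress(partials, alpha):
    """Return g(t) = sum_k a_k(alpha t) cos(phi_k(alpha t) / alpha)."""
    new_partials = [compressed_partial(a, phi, alpha) for a, phi in partials]
    def g(t):
        t = np.asarray(t, dtype=float)
        return sum(amp(t) * np.cos(ph(t)) for amp, ph in new_partials)
    return g

# Hypothetical partial: decaying amplitude, linearly increasing frequency.
a1 = lambda t: np.exp(-t)
phi1 = lambda t: 2 * np.pi * (50.0 * t + 20.0 * t ** 2)
g = time_compress([(a1, phi1)], alpha=2.0)
```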
A frequency transposition is calculated by multiplying each phase by a constant $\alpha$:
$$g(t) = \sum_{k=1}^{K} b_k(t)\cos\bigl(\alpha\,\phi_k(t)\bigr). \tag{4.73}$$
The instantaneous frequency of each partial is now $\alpha\,\phi'_k(t)$. To compute new amplitudes $b_k(t)$, we use a resonance model, which supposes that these amplitudes are samples of a smooth frequency envelope $F(t, \omega)$:
$$a_k(t) = F\bigl(t, \phi'_k(t)\bigr) \quad \text{and} \quad b_k(t) = F\bigl(t, \alpha\,\phi'_k(t)\bigr).$$
This envelope is called a formant in speech processing. It depends on the type of phoneme that is pronounced. Since $F(t, \omega)$ is a regular function of $\omega$, its amplitude at $\omega = \alpha\,\phi'_k(t)$ is calculated by interpolating the values $a_k(t)$ corresponding to $\omega = \phi'_k(t)$.

4.4.1 Windowed Fourier Ridges
The spectrogram $P_S f(u, \xi) = |Sf(u, \xi)|^2$ measures the energy of $f$ in a
time-frequency neighborhood of $(u, \xi)$. The ridge algorithm computes instantaneous frequencies from the local maxima of $P_S f(u, \xi)$. This approach was introduced by Delprat, Escudié, Guillemain, Kronland-Martinet, Tchamitchian and Torrésani [154, 71] to analyze musical sounds. Since then it has found applications for a wide range of signals [201, 71] that have time varying frequency tones.

The windowed Fourier transform is computed with a symmetric window $g(t) = g(-t)$ whose support is equal to $[-1/2, 1/2]$. The Fourier transform $\hat g$ is a real symmetric function and $|\hat g(\omega)| \le \hat g(0)$ for all $\omega \in \mathbb{R}$. The maximum $\hat g(0) = \int_{-1/2}^{1/2} g(t)\,dt$ is on the order of 1. Table 4.1 gives several examples of such windows. The window $g$ is normalized so that $\|g\| = 1$. For a fixed scale $s$, $g_s(t) = s^{-1/2} g(t/s)$ has a support of size $s$ and a unit norm. The corresponding windowed Fourier atoms are
$$g_{s,u,\xi}(t) = g_s(t - u)\,e^{i\xi t}$$
and the windowed Fourier transform is defined by
$$Sf(u, \xi) = \langle f, g_{s,u,\xi} \rangle = \int_{-\infty}^{+\infty} f(t)\,g_s(t - u)\,e^{-i\xi t}\,dt. \tag{4.74}$$
The following theorem relates $Sf(u, \xi)$ to the instantaneous frequency
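of $f$.

Theorem 4.5 Let $f(t) = a(t)\cos\phi(t)$. If $\xi \ge 0$ then
$$\langle f, g_{s,u,\xi} \rangle = \frac{\sqrt{s}}{2}\, a(u)\exp\bigl(i[\phi(u) - \xi u]\bigr)\Bigl( \hat g\bigl(s[\xi - \phi'(u)]\bigr) + \epsilon(u, \xi) \Bigr). \tag{4.75}$$
The corrective term satisfies
$$|\epsilon(u, \xi)| \le \epsilon_{a,1} + \epsilon_{a,2} + \epsilon_{\phi,2} + \sup_{|\omega| \ge s\phi'(u)} |\hat g(\omega)| \tag{4.76}$$
with
$$\epsilon_{a,1} \le \frac{s\,|a'(u)|}{|a(u)|}, \qquad \epsilon_{a,2} \le \sup_{|t-u| \le s/2} \frac{s^2\,|a''(t)|}{|a(u)|}, \tag{4.77}$$
and if $s\,|a'(u)|\,|a(u)|^{-1} \le 1$, then
$$\epsilon_{\phi,2} \le \sup_{|t-u| \le s/2} s^2\,|\phi''(t)|. \tag{4.78}$$
If $\xi = \phi'(u)$ then
$$\epsilon_{a,1} = \frac{s\,|a'(u)|}{|a(u)|}\,\bigl|\hat g'\bigl(2s\phi'(u)\bigr)\bigr|. \tag{4.79}$$

Proof $^2$. Observe that
$$\begin{aligned}
\langle f, g_{s,u,\xi} \rangle &= \int_{-\infty}^{+\infty} a(t)\cos\phi(t)\,g_s(t - u)\exp(-i\xi t)\,dt \\
&= \frac{1}{2}\int_{-\infty}^{+\infty} a(t)\bigl(\exp[i\phi(t)] + \exp[-i\phi(t)]\bigr)\,g_s(t - u)\exp[-i\xi t]\,dt \\
&= I(\phi) + I(-\phi).
\end{aligned}$$
We first concentrate on
$$\begin{aligned}
I(\phi) &= \frac{1}{2}\int_{-\infty}^{+\infty} a(t)\exp[i\phi(t)]\,g_s(t - u)\exp(-i\xi t)\,dt \\
&= \frac{1}{2}\int_{-\infty}^{+\infty} a(t + u)\,e^{i\phi(t+u)}\,g_s(t)\exp[-i\xi(t + u)]\,dt.
\end{aligned}$$
This integral is computed by using second order Taylor expansions:
$$a(t + u) = a(u) + t\,a'(u) + \frac{t^2}{2}\,\alpha(t) \quad \text{with } |\alpha(t)| \le \sup_{h \in [u, t+u]} |a''(h)|,$$
$$\phi(t + u) = \phi(u) + t\,\phi'(u) + \frac{t^2}{2}\,\beta(t) \quad \text{with } |\beta(t)| \le \sup_{h \in [u, t+u]} |\phi''(h)|.$$
We get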
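$$\begin{aligned}
2\exp\bigl(-i(\phi(u) - \xi u)\bigr)\,I(\phi) ={}& \int_{-\infty}^{+\infty} a(u)\,g_s(t)\exp\bigl(-it(\xi - \phi'(u))\bigr)\exp\Bigl(i\frac{t^2}{2}\beta(t)\Bigr)dt \\
&+ \int_{-\infty}^{+\infty} a'(u)\,t\,g_s(t)\exp\bigl(-it(\xi - \phi'(u))\bigr)\exp\Bigl(i\frac{t^2}{2}\beta(t)\Bigr)dt \\
&+ \frac{1}{2}\int_{-\infty}^{+\infty} \alpha(t)\,t^2\,g_s(t)\exp\bigl(-i(t\xi + \phi(u) - \phi(t + u))\bigr)dt.
\end{aligned}$$
A first order Taylor expansion of $\exp(ix)$ gives
$$\exp\Bigl(i\frac{t^2}{2}\beta(t)\Bigr) = 1 + \frac{t^2}{2}\,\beta(t)\,\gamma(t) \quad \text{with } |\gamma(t)| \le 1. \tag{4.80}$$
Since
$$\int_{-\infty}^{+\infty} g_s(t)\exp\bigl[-it(\xi - \phi'(u))\bigr]dt = \sqrt{s}\,\hat g\bigl(s[\xi - \phi'(u)]\bigr),$$
inserting (4.80) in the expression of $I(\phi)$ yields
$$\Bigl| I(\phi) - \frac{\sqrt{s}}{2}\,a(u)\exp\bigl[i(\phi(u) - \xi u)\bigr]\,\hat g\bigl(s[\xi - \phi'(u)]\bigr) \Bigr| \le \frac{\sqrt{s}\,|a(u)|}{4}\,\bigl(\epsilon^+_{a,1} + \epsilon_{a,2} + \epsilon_{\phi,2}\bigr) \tag{4.81}$$
with
$$\epsilon^+_{a,1} = \frac{2|a'(u)|}{|a(u)|}\,\Bigl| \int_{-\infty}^{+\infty} t\,\frac{1}{\sqrt{s}}\,g_s(t)\exp\bigl[-it(\xi - \phi'(u))\bigr]dt \Bigr|, \tag{4.82}$$
$$\epsilon_{a,2} = \frac{1}{|a(u)|}\int_{-\infty}^{+\infty} t^2\,|\alpha(t)|\,\frac{1}{\sqrt{s}}\,|g_s(t)|\,dt, \tag{4.83}$$
$$\epsilon_{\phi,2} = \int_{-\infty}^{+\infty} t^2\,|\beta(t)|\,\frac{1}{\sqrt{s}}\,|g_s(t)|\,dt + \frac{|a'(u)|}{|a(u)|}\int_{-\infty}^{+\infty} |t^3|\,|\beta(t)|\,\frac{1}{\sqrt{s}}\,|g_s(t)|\,dt. \tag{4.84}$$
Applying (4.81) to $I(-\phi)$ gives
$$|I(-\phi)| \le \frac{\sqrt{s}\,|a(u)|}{2}\,\bigl|\hat g\bigl(s[\xi + \phi'(u)]\bigr)\bigr| + \frac{\sqrt{s}\,|a(u)|}{4}\,\bigl(\epsilon^-_{a,1} + \epsilon_{a,2} + \epsilon_{\phi,2}\bigr)$$
with
$$\epsilon^-_{a,1} = \frac{2|a'(u)|}{|a(u)|}\,\Bigl| \int_{-\infty}^{+\infty} t\,\frac{1}{\sqrt{s}}\,g_s(t)\exp\bigl[-it(\xi + \phi'(u))\bigr]dt \Bigr|. \tag{4.85}$$
Since $\xi \ge 0$ and $\phi'(u) \ge 0$, we derive that
$$\bigl|\hat g\bigl(s[\xi + \phi'(u)]\bigr)\bigr| \le \sup_{|\omega| \ge s\phi'(u)} |\hat g(\omega)|,$$
and hence
$$I(\phi) + I(-\phi) = \frac{\sqrt{s}}{2}\,a(u)\exp\bigl[i(\phi(u) - \xi u)\bigr]\Bigl( \hat g\bigl(s[\xi - \phi'(u)]\bigr) + \epsilon(u, \xi) \Bigr)$$
with
$$|\epsilon(u, \xi)| \le \frac{\epsilon^+_{a,1} + \epsilon^-_{a,1}}{2} + \epsilon_{a,2} + \epsilon_{\phi,2} + \sup_{|\omega| \ge s|\phi'(u)|} |\hat g(\omega)|.$$
Let us now verify the upper bound (4.77) for $\epsilon_{a,1} = (\epsilon^+_{a,1} + \epsilon^-_{a,1})/2$.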
Since $g_s(t) = s^{-1/2} g(t/s)$, a simple calculation shows that for $n \ge 0$
$$\int_{-\infty}^{+\infty} |t|^n\,\frac{1}{\sqrt{s}}\,|g_s(t)|\,dt = s^n \int_{-1/2}^{1/2} |t|^n\,|g(t)|\,dt \le \frac{s^n}{2^n}\,\|g\|^2 = \frac{s^n}{2^n}. \tag{4.86}$$
Inserting this for $n = 1$ in (4.82) and (4.85) gives
$$\epsilon_{a,1} = \frac{\epsilon^+_{a,1} + \epsilon^-_{a,1}}{2} \le \frac{s\,|a'(u)|}{|a(u)|}.$$
The upper bounds (4.77) and (4.78) of the second order terms $\epsilon_{a,2}$ and $\epsilon_{\phi,2}$ are obtained by observing that the remainders $\alpha(t)$ and $\beta(t)$ of the Taylor expansions of $a(t + u)$ and $\phi(t + u)$ satisfy
$$\sup_{|t| \le s/2} |\alpha(t)| \le \sup_{|t-u| \le s/2} |a''(t)|, \qquad \sup_{|t| \le s/2} |\beta(t)| \le \sup_{|t-u| \le s/2} |\phi''(t)|. \tag{4.87}$$
Inserting this in (4.83) yields
$$\epsilon_{a,2} \le \sup_{|t-u| \le s/2} \frac{s^2\,|a''(t)|}{|a(u)|}.$$
When $s\,|a'(u)|\,|a(u)|^{-1} \le 1$, replacing $|\beta(t)|$ by its upper bound in (4.84) gives
$$\epsilon_{\phi,2} \le \frac{1}{2}\Bigl(1 + \frac{s\,|a'(u)|}{|a(u)|}\Bigr) \sup_{|t-u| \le s/2} s^2\,|\phi''(t)| \le \sup_{|t-u| \le s/2} s^2\,|\phi''(t)|.$$
Let us finally compute $\epsilon_{a,1}$ when $\xi = \phi'(u)$. Since $g(t) = g(-t)$, we derive from (4.82) that
$$\epsilon^+_{a,1} = \frac{2|a'(u)|}{|a(u)|}\,\Bigl| \int_{-\infty}^{+\infty} t\,\frac{1}{\sqrt{s}}\,g_s(t)\,dt \Bigr| = 0.$$
We also derive from (2.22) that the Fourier transform of $t\,\frac{1}{\sqrt{s}}\,g_s(t)$ is $i\,s\,\hat g'(s\omega)$, so (4.85) gives
$$\epsilon_{a,1} = \frac{1}{2}\,\epsilon^-_{a,1} = \frac{s\,|a'(u)|}{|a(u)|}\,\bigl|\hat g'\bigl(2s\phi'(u)\bigr)\bigr|.$$

Delprat et al. [154] give a different proof of a similar result when $g(t)$ is a Gaussian, using a stationary phase approximation. If we can neglect the corrective term $\epsilon(u, \xi)$ we shall see that (4.75) enables us to measure $a(u)$ and $\phi'(u)$ from $Sf(u, \xi)$. This implies that the decomposition $f(t) = a(t)\cos\phi(t)$ is uniquely defined. By reviewing the proof of Theorem 4.5, one can verify that $a$ and $\phi'$ are the analytic amplitude and instantaneous frequencies of $f$.
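The expressions (4.77, 4.78) show that the three corrective terms $\epsilon_{a,1}$, $\epsilon_{a,2}$ and $\epsilon_{\phi,2}$ are small if $a(t)$ and $\phi'(t)$ have small relative variations over the support of the window $g_s$. Let $\Delta\omega$ be the bandwidth of $\hat g$ defined by
$$|\hat g(\omega)| \ll 1 \quad \text{for } |\omega| \ge \Delta\omega. \tag{4.88}$$
The term $\sup_{|\omega| \ge s|\phi'(u)|} |\hat g(\omega)|$ of $\epsilon(u, \xi)$ is negligible if
$$\phi'(u) \ge \frac{\Delta\omega}{s}.$$

Ridge Points
Let us suppose that $a(t)$ and $\phi'(t)$ have small variations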
over intervals of size $s$ and that $\phi'(t) \ge \Delta\omega/s$ so that the corrective term $\epsilon(u, \xi)$ in (4.75) can be neglected. Since $|\hat g(\omega)|$ is maximum at $\omega = 0$, (4.75) shows that for each $u$ the spectrogram $|Sf(u, \xi)|^2 = |\langle f, g_{s,u,\xi} \rangle|^2$ is maximum at $\xi(u) = \phi'(u)$. The corresponding time-frequency points $(u, \xi(u))$ are called ridges. At ridge points, (4.75) becomes
$$Sf(u, \xi) = \frac{\sqrt{s}}{2}\,a(u)\exp\bigl(i[\phi(u) - \xi u]\bigr)\bigl( \hat g(0) + \epsilon(u, \xi) \bigr). \tag{4.89}$$
Theorem 4.5 proves that $\epsilon(u, \xi)$ is smaller at a ridge point because the first order term $\epsilon_{a,1}$ becomes negligible in (4.79). This is shown by verifying that $|\hat g'(2s\phi'(u))|$ is negligible when $s\phi'(u) \ge \Delta\omega$. At ridge points, the second order terms $\epsilon_{a,2}$ and $\epsilon_{\phi,2}$ are predominant in $\epsilon(u, \xi)$.

The ridge frequency gives the instantaneous frequency $\xi(u) = \phi'(u)$ and the amplitude is calculated by
$$a(u) = \frac{2\,\bigl|Sf(u, \xi(u))\bigr|}{\sqrt{s}\,|\hat g(0)|}. \tag{4.90}$$
Let $\Phi_S(u, \xi)$ be the complex phase of $Sf(u, \xi)$. If we neglect the corrective term, then (4.89) proves that ridges are also points of stationary phase:
$$\frac{\partial \Phi_S(u, \xi)}{\partial u} = \phi'(u) - \xi = 0.$$
Testing the stationarity of the phase locates the ridges more precisely.

Multiple Frequencies
When the signal contains several spectral lines whose frequencies are sufficiently apart, the windowed Fourier
transform separates each of these components and the ridges detect the evolution in time of each spectral component. Let us consider
$$f(t) = a_1(t)\cos\phi_1(t) + a_2(t)\cos\phi_2(t)$$
where $a_k(t)$ and $\phi'_k(t)$ have small variations over intervals of size $s$ and $s\phi'_k(t) \ge \Delta\omega$. Since the windowed Fourier transform is linear, we apply (4.75) to each spectral component and neglect the corrective terms:
$$\begin{aligned}
Sf(u, \xi) ={}& \frac{\sqrt{s}}{2}\,a_1(u)\,\hat g\bigl(s[\xi - \phi'_1(u)]\bigr)\exp\bigl(i[\phi_1(u) - \xi u]\bigr) \\
&+ \frac{\sqrt{s}}{2}\,a_2(u)\,\hat g\bigl(s[\xi - \phi'_2(u)]\bigr)\exp\bigl(i[\phi_2(u) - \xi u]\bigr).
\end{aligned} \tag{4.91}$$
The two spectral components are discriminated if for all $u$
$$\hat g\bigl(s\,|\phi'_1(u) - \phi'_2(u)|\bigr) \ll 1, \tag{4.92}$$
which means that the frequency difference is larger than the bandwidth of $\hat g(s\omega)$:
$$|\phi'_1(u) - \phi'_2(u)| \ge \frac{\Delta\omega}{s}. \tag{4.93}$$
In this case, when $\xi = \phi'_1(u)$, the second term of (4.91) can be neglected and the first term generates a ridge point from which we may recover $\phi'_1(u)$ and $a_1(u)$, using (4.90). Similarly, if $\xi = \phi'_2(u)$ the first term can be neglected and we have a second ridge point that characterizes $\phi'_2(u)$ and $a_2(u)$. The ridge points are distributed along two time-frequency lines $\xi(u) = \phi'_1(u)$ and $\xi(u) = \phi'_2(u)$. This result is valid for any number of time varying spectral components, as long as the distance between any two instantaneous frequencies satisfies (4.93). If two spectral lines are too close, they interfere, which destroys the ridge pattern.
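The ridge computation for well separated components can be sketched as follows. This is an illustrative discretization, not the algorithm of [154]: the window is a Hann-type window (an assumption; the text refers to Table 4.1 for its windows), the integral (4.74) is replaced by a Riemann sum at a single time $u = t[k]$, and ridges are taken as thresholded local maxima of $|Sf(u, \xi)|$ over a frequency grid. The name `windowed_fourier_ridges` and the parameter `rel_threshold` are hypothetical.

```python
import numpy as np

def windowed_fourier_ridges(f, t, k, s, xis, rel_threshold=0.2):
    """Ridge frequencies and amplitudes (4.90) at time u = t[k].

    f, t: uniformly sampled signal and time axis; s: window size;
    xis: increasing grid of frequencies (rad/s).
    """
    dt = t[1] - t[0]
    half = int(round(s / (2 * dt)))
    idx = np.arange(k - half, k + half + 1)       # samples inside the window
    tau = (idx - k) * dt
    g_s = np.cos(np.pi * tau / s) ** 2            # Hann-type window of size s
    g_s /= np.sqrt(np.sum(g_s ** 2) * dt)         # unit L2 norm, as in the text
    # Riemann sum for Sf(u, xi) = int f(t) g_s(t - u) exp(-i xi t) dt
    atoms = np.exp(-1j * np.outer(xis, t[idx]))
    modulus = np.abs(atoms @ (f[idx] * g_s) * dt)
    inner = modulus[1:-1]
    is_peak = (
        (inner > modulus[:-2])
        & (inner >= modulus[2:])
        & (inner > rel_threshold * modulus.max())  # drop side-lobe "shadows"
    )
    peaks = np.where(is_peak)[0] + 1
    sqrt_s_ghat0 = np.sum(g_s) * dt               # equals sqrt(s) * g_hat(0)
    return xis[peaks], 2 * modulus[peaks] / sqrt_s_ghat0

# Two tones separated by much more than the window bandwidth.
n = 2048
t = np.arange(n) / n
f = np.cos(2 * np.pi * 100 * t) + 0.5 * np.cos(2 * np.pi * 250 * t + 0.3)
xis = 2 * np.pi * np.arange(1, 500)
ridge_freqs, ridge_amps = windowed_fourier_ridges(f, t, n // 2, 0.1, xis)
```

With these hypothetical values the separation condition (4.93) holds comfortably, so each component produces its own local maximum and (4.90) returns amplitudes close to 1 and 0.5. The threshold plays the role of the small-amplitude ridge removal discussed in the text.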
Generally, the number of instantaneous frequencies is unknown. We thus detect all local maxima of $|Sf(u, \xi)|^2$ which are also points of stationary phase $\frac{\partial \Phi_S(u, \xi)}{\partial u} = \phi'(u) - \xi = 0$. These points define curves in the $(u, \xi)$ plane that are the ridges of the windowed Fourier transform. Ridges corresponding to a small amplitude $a(u)$ are often removed because they can be artifacts of noise variations, or "shadows" of other instantaneous frequencies created by the side-lobes of $\hat g(\omega)$.

Figure 4.12 displays the ridges computed from the modulus and phase of the windowed Fourier transform shown in Figure 4.3. For $t \in [0.4, 0.5]$, the instantaneous frequencies of the linear chirp and the quadratic chirps are close and the frequency resolution of the window is not sufficient to discriminate them. As a result, the ridges detect a single average instantaneous frequency.

Figure 4.12: Larger amplitude ridges calculated from the spectrogram in Figure 4.3. These ridges give the instantaneous frequencies of the linear and quadratic chirps, and of the low and high frequency transients at $t = 0.5$ and $t = 0.87$. (Horizontal axis: $u$ from 0 to 1; vertical axis: $\xi/2\pi$ from 0 to 500.)

Choice of Window
The measurement of instantaneous frequencies at ridge points is valid only if the size $s$ of the window $g_s$ is sufficiently small so that the second order terms $\epsilon_{a,2}$ and $\epsilon_{\phi,2}$ in (4.77, 4.78) are small:
$$\sup_{|t-u| \le s/2} \frac{s^2\,|a''_k(t)|}{|a_k(u)|} \ll 1 \quad \text{and} \quad \sup_{|t-u| \le s/2} s^2\,|\phi''_k(t)| \ll 1. \tag{4.94}$$
On the other hand, the frequency bandwidth $\Delta\omega/s$ must also be sufficiently small to discriminate consecutive spectral components in (4.93). The window scale $s$ must therefore be adjusted as a trade-off between both constraints.
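The two constraints can be turned into a rough admissible interval for $s$. The sketch below is only a heuristic reading of (4.93) and (4.94): the "much smaller than 1" condition is replaced by the boundary value 1, so the returned upper limit is optimistic. The function name `admissible_window_scales` and its arguments are hypothetical.

```python
import math

def admissible_window_scales(min_separation, max_phase_curvature, bandwidth=1.0):
    """Heuristic interval [s_min, s_max] for the window size s.

    min_separation: smallest |phi_1'(u) - phi_2'(u)| between components (rad/s),
                    so that (4.93) requires s >= bandwidth / min_separation.
    max_phase_curvature: largest |phi_k''(t)|, so that the boundary of (4.94)
                    is s = 1 / sqrt(max_phase_curvature).
    bandwidth: the bandwidth of g_hat (on the order of 1 in the text).
    Returns None when the two constraints are incompatible.
    """
    s_min = bandwidth / min_separation
    s_max = 1.0 / math.sqrt(max_phase_curvature)
    return (s_min, s_max) if s_min <= s_max else None

# Two components 200 rad/s apart whose phases have second derivative 100.
print(admissible_window_scales(min_separation=200.0, max_phase_curvature=100.0))
```

In practice $s$ should sit well below the upper end of this interval, since (4.94) asks for a strict "much smaller" inequality.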
Table 4.1 gives the spectral parameters of several windows of compact support. For instantaneous frequency detection, it is particularly important to ensure that $\hat g$ has negligible side-lobes at $\pm\omega_0$, as illustrated by Figure 4.4. The reader can verify with (4.75) that these side-lobes "react" to an instantaneous frequency $\phi'(u)$ by creating shadow maxima of $|Sf(u, \xi)|^2$ at frequencies $\xi = \phi'(u) \pm \omega_0/s$. The ratio of the amplitude of these shadow maxima to the amplitude of the main local maxima at $\xi = \phi'(u)$ is $|\hat g(\omega_0)|^2\,|\hat g(0)|^{-2}$. They can be removed by thresholding or by testing the stationarity of the phase.

Example 4.13 The sum of two parallel linear chirps
$$f(t) = a_1\cos(bt^2 + ct) + a_2\cos(bt^2) \tag{4.95}$$
has two instantaneous frequencies $\phi'_1(t) = 2bt + c$ and $\phi'_2(t) = 2bt$. Figure 4.13 gives a numerical example. The window $g_s$ has enough frequency resolution to discriminate both chirps if
$$|\phi'_1(t) - \phi'_2(t)| = |c| \ge \frac{\Delta\omega}{s}. \tag{4.96}$$
Its time support is small enough compared to their time variation if
$$s^2\,|\phi''_1(u)| = s^2\,|\phi''_2(u)| = 2\,b\,s^2 \ll 1. \tag{4.97}$$
Conditions (4.96) and (4.97) prove that we can find an appropriate window $g$ if and only if
$$\frac{c}{\sqrt{b}} \gg \Delta\omega. \tag{4.98}$$
Since $g$ is a smooth window with a support $[-1/2, 1/2]$, its frequency bandwidth $\Delta\omega$ is on the order of 1. The linear chirps in Figure 4.13 satisfy (4.98). Their ridges are computed with the truncated Gaussian window of Table 4.1, with $s = 0.5$.

Figure 4.13: Sum of two parallel linear chirps. Top: the signal $f(t)$ for $t \in [0, 1]$. (a): Spectrogram $P_S f(u, \xi) = |Sf(u, \xi)|^2$. (b): Ridges calculated from the spectrogram. (Horizontal axes: $t$ or $u$ from 0 to 1; vertical axes of (a) and (b): $\xi/2\pi$ from 0 to 500.)

Example 4.14 The hyperbolic chirp
$$f(t) = \cos\Bigl(\frac{\alpha}{\beta - t}\Bigr) \quad \text{for } 0 \le t < \beta$$
has an instantaneous frequency
$$\phi'(t) = \frac{\alpha}{(\beta - t)^2},$$
which varies quickly when $t$ is close to $\beta$. The instantaneous frequency of hyperbolic chirps goes from 0 to $+\infty$ in a finite time interval. This is particularly useful for radars. These chirps are also emitted by the cruise sonars of bats [154].
The instantaneous frequency of hyperbolic chirps cannot be estimated with a windowed Fourier transform because for any fixed window size the instantaneous frequency varies too quickly at high frequencies. When $u$ is close enough to $\beta$ then (4.94) is not satisfied because
$$s^2\,|\phi''(u)| = \frac{2\,s^2\,\alpha}{(\beta - u)^3} > 1.$$
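The breakdown of the windowed Fourier ridge can be located explicitly. The sketch below evaluates $s^2|\phi''(u)|$ for a hyperbolic chirp and solves for the time at which it reaches 1; the values of $\alpha$ and $s$ are hypothetical (the text does not state $\alpha$), and the function names are illustrative.

```python
def hyperbolic_phase(t, alpha, beta):
    """Phase phi(t) = alpha / (beta - t) of the hyperbolic chirp."""
    return alpha / (beta - t)

def hyperbolic_instantaneous_frequency(t, alpha, beta):
    """phi'(t) = alpha / (beta - t)^2."""
    return alpha / (beta - t) ** 2

def window_condition(u, s, alpha, beta):
    """Left-hand side s^2 |phi''(u)| of the second condition in (4.94)."""
    return 2.0 * alpha * s ** 2 / (beta - u) ** 3

def breakdown_time(s, alpha, beta):
    """Time u at which s^2 |phi''(u)| reaches 1; later times violate (4.94)."""
    return beta - (2.0 * alpha * s ** 2) ** (1.0 / 3.0)

# Hypothetical parameters: alpha = 4, window size s = 0.02, beta = 0.68.
u_star = breakdown_time(0.02, 4.0, 0.68)
```

Since (4.94) requires the quantity to be much smaller than 1, the ridge estimate already degrades noticeably before `u_star`.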
Figure 4.14 shows a signal that is a sum of two hyperbolic chirps:
$$f(t) = a_1\cos\Bigl(\frac{\alpha_1}{\beta_1 - t}\Bigr) + a_2\cos\Bigl(\frac{\alpha_2}{\beta_2 - t}\Bigr) \tag{4.99}$$
with $\beta_1 = 0.68$ and $\beta_2 = 0.72$. At the beginning of the signal, the two chirps have close instantaneous frequencies that are discriminated by the windowed Fourier ridge computed with a large size window. When getting close to $\beta_1$ and $\beta_2$, the instantaneous frequency varies too quickly relative to the window size. The resulting ridges cannot follow these instantaneous frequencies.

4.4.2 Wavelet Ridges
Windowed Fourier atoms have a fixed scale and thus cannot follow the instantaneous frequency of rapidly varying events such as hyperbolic chirps. In contrast, an analytic wavelet transform modifies the scale of its time-frequency atoms. The ridge algorithm of Delprat et al. [154] is extended to analytic wavelet transforms to accurately measure frequency tones that are rapidly changing at high frequencies.

An approximately analytic wavelet is constructed in (4.60) by multiplying a window $g$ with a sinusoidal wave:
$$\psi(t) = g(t)\exp(i\eta t).$$
As in the previous section, $g$ is a symmetric window with a support equal to $[-1/2, 1/2]$, and a unit norm $\|g\| = 1$. Let $\Delta\omega$ be the bandwidth of $\hat g$ defined in (4.88). If $\eta > \Delta\omega$ then
$$\forall \omega < 0, \quad \hat\psi(\omega) = \hat g(\omega - \eta) \ll 1.$$
The wavelet $\psi$ is not strictly analytic because its Fourier transform is not exactly equal to zero at negative frequencies.

Dilated and translated wavelets can be rewritten
$$\psi_{u,s}(t) = \frac{1}{\sqrt{s}}\,\psi\Bigl(\frac{t - u}{s}\Bigr) = g_{s,u,\xi}(t)\exp(-i\xi u)$$
with $\xi = \eta/s$ and
$$g_{s,u,\xi}(t) = \frac{1}{\sqrt{s}}\,g\Bigl(\frac{t - u}{s}\Bigr)\exp(i\xi t).$$

Figure 4.14: Sum of two hyperbolic chirps. Top: the signal $f(t)$ for $t \in [0, 1]$. (a): Spectrogram $P_S f(u, \xi)$. (b): Ridges calculated from the spectrogram. (Horizontal axes: $t$ or $u$ from 0 to 1; vertical axes of (a) and (b): $\xi/2\pi$ from 0 to 500.)

The resulting wavelet transform uses time-frequency atoms similar to those of a windowed Fourier transform (4.74) but in this case the scale $s$ varies over $\mathbb{R}^+$ while $\xi = \eta/s$:
$$Wf(u, s) = \langle f, \psi_{u,s} \rangle = \langle f, g_{s,u,\xi} \rangle \exp(i\xi u).$$
Theorem 4.5 computes $\langle f, g_{s,u,\xi} \rangle$ when $f(t) = a(t)\cos\phi(t)$, which gives
$$Wf(u, s) = \frac{\sqrt{s}}{2}\,a(u)\exp[i\phi(u)]\Bigl( \hat g\bigl(s[\xi - \phi'(u)]\bigr) + \epsilon(u, \xi) \Bigr). \tag{4.100}$$
The corrective term $\epsilon(u, \xi)$ is negligible if $a(t)$ and $\phi'(t)$ have small variations over the support of $\psi_{u,s}$ and if $\phi'(u) \ge \Delta\omega/s$.

Wavelet Ridges
The instantaneous frequency is measured from ridges
defined over the wavelet transform. The normalized scalogram defined by
$$\frac{\xi}{\eta}\,P_W f(u, \xi) = \frac{|Wf(u, s)|^2}{s} \quad \text{for } \xi = \eta/s$$
is calculated with (4.100):
$$\frac{\xi}{\eta}\,P_W f(u, \xi) = \frac{1}{4}\,a^2(u)\,\Bigl| \hat g\Bigl(\eta\Bigl[1 - \frac{\phi'(u)}{\xi}\Bigr]\Bigr) + \epsilon(u, \xi) \Bigr|^2.$$
Since $|\hat g(\omega)|$ is maximum at $\omega = 0$, if we neglect $\epsilon(u, \xi)$, this expression shows that the scalogram is maximum at
$$\frac{\eta}{s(u)} = \xi(u) = \phi'(u). \tag{4.101}$$
The corresponding points $(u, \xi(u))$ are called wavelet ridges. The analytic amplitude is given by
$$a(u) = \frac{2\sqrt{\eta^{-1}\,\xi\,P_W f(u, \xi)}}{|\hat g(0)|}. \tag{4.102}$$
The complex phase of $Wf(u, s)$ in (4.100) is $\Phi_W(u, \xi) = \phi(u)$. At ridge points,
$$\frac{\partial \Phi_W(u, \xi)}{\partial u} = \phi'(u) = \xi. \tag{4.103}$$
When $\xi = \phi'(u)$, the first order term $\epsilon_{a,1}$ calculated in (4.79) becomes negligible. The corrective term is then dominated by $\epsilon_{a,2}$ and $\epsilon_{\phi,2}$. To simplify their expression we approximate the sup of $a''$ and $\phi''$ in the neighborhood of $u$ by their value at $u$. Since $s = \eta/\xi = \eta/\phi'(u)$, (4.77, 4.78) imply that these second order terms become negligible if
$$\frac{\eta^2}{|\phi'(u)|^2}\,\frac{|a''(u)|}{|a(u)|} \ll 1 \quad \text{and} \quad \eta^2\,\frac{|\phi''(u)|}{|\phi'(u)|^2} \ll 1. \tag{4.104}$$
The presence of $\phi'$ in the denominator proves that $a'$ and $\phi'$ must have slow variations if $\phi'$ is small but may vary much more quickly for large instantaneous frequencies.

Multispectral Estimation
Suppose that $f$ is a sum of two spectral
components:
$$f(t) = a_1(t)\cos\phi_1(t) + a_2(t)\cos\phi_2(t).$$
As in (4.92), we verify that the second instantaneous frequency $\phi'_2$ does not interfere with the ridge of $\phi'_1$ if the dilated window has a sufficient spectral resolution at the ridge scale $s = \eta/\xi = \eta/\phi'_1(u)$:
$$\hat g\bigl(s\,|\phi'_1(u) - \phi'_2(u)|\bigr) \ll 1. \tag{4.105}$$
Since the bandwidth of $\hat g(\omega)$ is $\Delta\omega$, this means that
$$\frac{|\phi'_1(u) - \phi'_2(u)|}{\phi'_1(u)} \ge \frac{\Delta\omega}{\eta}. \tag{4.106}$$
Similarly, the first spectral component does not interfere with the second ridge located at $s = \eta/\xi = \eta/\phi'_2(u)$ if
$$\frac{|\phi'_1(u) - \phi'_2(u)|}{\phi'_2(u)} \ge \frac{\Delta\omega}{\eta}. \tag{4.107}$$
To separate spectral lines whose instantaneous frequencies are close, these conditions prove that the wavelet must have a small octave bandwidth $\Delta\omega/\eta$. The bandwidth $\Delta\omega$ is a fixed constant, which is on the order of 1. The frequency $\eta$ is a free parameter whose value is chosen as a trade-off between the time-resolution condition (4.104) and the frequency bandwidth conditions (4.106) and (4.107).

Figure 4.15: Ridges calculated from the scalogram shown in Figure 4.11. Compare with the windowed Fourier ridges in Figure 4.12. (Horizontal axis: $u$ from 0 to 1; vertical axis: $\xi/2\pi$ from 0 to 400.)
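A wavelet ridge at one time $u$ can be sketched by direct numerical integration of the wavelet transform over a grid of frequencies $\xi = \eta/s$. This is a slow illustrative implementation under stated assumptions, not the FFT-based algorithm of Section 4.3.3: the window is a unit-norm Hann-type window (an assumption), $\eta = 40$ is a hypothetical choice, and only the global maximum of the normalized scalogram is returned, which is appropriate for a single spectral component.

```python
import numpy as np

ETA = 40.0                              # hypothetical wavelet center frequency eta
G_HAT_0 = np.sqrt(8.0 / 3.0) / 2.0      # integral of the window g defined below

def window(x):
    """Hann-type window g supported on [-1/2, 1/2] with unit L2 norm."""
    inside = np.abs(x) <= 0.5
    return np.where(inside, np.sqrt(8.0 / 3.0) * np.cos(np.pi * x) ** 2, 0.0)

def normalized_scalogram(f, t, u, xis, eta=ETA):
    """|Wf(u, s)|^2 / s for s = eta / xi, by Riemann sums over the samples."""
    dt = t[1] - t[0]
    values = np.empty(len(xis))
    for i, xi in enumerate(xis):
        s = eta / xi
        x = (t - u) / s
        psi = window(x) * np.exp(1j * eta * x) / np.sqrt(s)   # psi_{u,s}(t)
        wf = np.sum(f * np.conj(psi)) * dt                    # <f, psi_{u,s}>
        values[i] = np.abs(wf) ** 2 / s
    return values

def wavelet_ridge(f, t, u, xis, eta=ETA):
    """Ridge frequency (4.101) and analytic amplitude (4.102) at time u."""
    scalogram = normalized_scalogram(f, t, u, xis, eta)
    i = int(np.argmax(scalogram))
    return xis[i], 2.0 * np.sqrt(scalogram[i]) / G_HAT_0

# Single tone of amplitude 0.8 at 100 Hz.
n = 2048
t = np.arange(n) / n
f = 0.8 * np.cos(2 * np.pi * 100 * t + 0.4)
xis = 2 * np.pi * np.arange(50, 301)
ridge_xi, ridge_amp = wavelet_ridge(f, t, 0.5, xis)
```

For several components, the global `argmax` would be replaced by a search for local maxima, subject to the separation conditions (4.106) and (4.107).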
Figure 4.15 displays the ridges computed from the normalized scalogram and the wavelet phase shown in Figure 4.11. The ridges of the high frequency transient located at $t = 0.87$ have oscillations because of the interferences with the linear chirp above. The frequency separation condition (4.106) is not satisfied. This is also the case in the time interval $[0.35, 0.55]$, where the instantaneous frequencies of the linear and quadratic chirps are too close.

Example 4.15 The instantaneous frequencies of two linear chirps
$$f(t) = a_1\cos(bt^2 + ct) + a_2\cos(bt^2)$$
are not well measured by wavelet ridges. Indeed
$$\frac{|\phi'_2(u) - \phi'_1(u)|}{\phi'_1(u)} = \frac{c}{2bu + c}$$
converges to zero when $u$ increases. When it is smaller than $\Delta\omega/\eta$ the two chirps interact and create interference patterns like those in Figure 4.16. The ridges follow these interferences and do not estimate properly the two instantaneous frequencies, as opposed to the windowed Fourier ridges shown in Figure 4.13.

Figure 4.16: (a): Normalized scalogram $\eta^{-1}\xi\,P_W f(u, \xi)$ of two parallel linear chirps shown in Figure 4.13. (b): Wavelet ridges. (Horizontal axes: $u$ from 0 to 1; vertical axes: $\xi/2\pi$ from 0 to 400.)

Figure 4.17: (a): Normalized scalogram $\eta^{-1}\xi\,P_W f(u, \xi)$ of two hyperbolic chirps shown in Figure 4.14. (b): Wavelet ridges. (Horizontal axes: $u$ from 0 to 1; vertical axes: $\xi/2\pi$ from 0 to 400.)

Example 4.16 The instantaneous frequency of a hyperbolic chirp
$$f(t) = \cos\Bigl(\frac{\alpha}{\beta - t}\Bigr)$$
is $\phi'(t) = \alpha\,(\beta - t)^{-2}$. Wavelet ridges can measure this instantaneous frequency if the time resolution condition (4.104) is satisfied:
$$\eta^2 \ll \frac{\phi'(t)^2}{|\phi''(t)|} = \frac{\alpha}{2\,|t - \beta|}.$$
This is the case if $|t - \beta|$ is not too large.

Figure 4.17 displays the scalogram and the ridges of two hyperbolic chirps
$$f(t) = a_1\cos\Bigl(\frac{\alpha_1}{\beta_1 - t}\Bigr) + a_2\cos\Bigl(\frac{\alpha_2}{\beta_2 - t}\Bigr)$$
with $\beta_1 = 0.68$ and $\beta_2 = 0.72$. As opposed to the windowed Fourier ridges shown in Figure 4.14, the wavelet ridges follow the rapid time modification of both instantaneous frequencies. This is particularly useful in analyzing the returns of hyperbolic chirps emitted by radars or sonars. Several techniques have been developed to detect chirps with wavelet ridges in presence of noise [117, 328].

4.5 Quadratic Time-Frequency Energy $^1$
The wavelet and windowed Fourier transforms are computed by correlating the signal with families of time-frequency atoms. The time and frequency resolution of these transforms is thus limited by the time-frequency resolution of the corresponding atoms. Ideally, one would like to define a density of energy in a time-frequency plane, with no loss of resolution.
The Wigner-Ville distribution is a time-frequency energy density computed by correlating $f$ with a time and frequency translation of itself. Despite its remarkable properties, the application of Wigner-Ville distributions is limited by the existence of interference terms. These interferences can be attenuated by a time-frequency averaging, but this results in a loss of resolution. It is proved that the spectrogram, the scalogram and all squared time-frequency decompositions can be written as a time-frequency averaging of the Wigner-Ville distribution.

4.5.1 Wigner-Ville Distribution
To analyze time-frequency structures, in 1948 Ville [342] introduced in signal processing a quadratic form that had been studied by Wigner [351] in a 1932 article on quantum thermodynamics:
$$P_V f(u, \xi) = \int_{-\infty}^{+\infty} f\Bigl(u + \frac{\tau}{2}\Bigr)\,f^*\Bigl(u - \frac{\tau}{2}\Bigr)\,e^{-i\tau\xi}\,d\tau. \tag{4.108}$$
The Wigner-Ville distribution remains real because it is the Fourier transform of $f(u + \tau/2)\,f^*(u - \tau/2)$, which has a Hermitian symmetry in $\tau$. Time and frequency have a symmetrical role. This distribution can also be rewritten as a frequency integration by applying the Parseval formula:
$$P_V f(u, \xi) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \hat f\Bigl(\xi + \frac{\gamma}{2}\Bigr)\,\hat f^*\Bigl(\xi - \frac{\gamma}{2}\Bigr)\,e^{i\gamma u}\,d\gamma. \tag{4.109}$$

Time-Frequency Support
The Wigner-Ville transform localizes the time-frequency structures of $f$. If the energy of $f$ is well concentrated in time around $u_0$ and in frequency around $\xi_0$ then $P_V f$ has its energy centered at $(u_0, \xi_0)$, with a spread equal to the time and frequency spread of $f$. This property is illustrated by the following proposition, which relates the time and frequency support of $P_V f$ to the support of $f$ and $\hat f$.
Proposition 4.2 If the support of $f$ is $[u_0 - T/2, u_0 + T/2]$, then for all $\xi$ the support in $u$ of $P_V f(u, \xi)$ is included in this interval. If the support of $\hat f$ is $[\xi_0 - \Delta/2, \xi_0 + \Delta/2]$, then for all $u$ the support in $\xi$ of $P_V f(u, \xi)$ is included in this interval.

Proof $^2$. Let $\bar f(t) = f^*(-t)$. The Wigner-Ville distribution is rewritten
$$P_V f(u, \xi) = \int_{-\infty}^{+\infty} f\Bigl(\frac{\tau + 2u}{2}\Bigr)\,\bar f\Bigl(\frac{\tau - 2u}{2}\Bigr)\,e^{-i\tau\xi}\,d\tau. \tag{4.110}$$
Suppose that $f$ has a support equal to $[u_0 - T/2, u_0 + T/2]$. The supports in $\tau$ of $f(\tau/2 + u)$ and $\bar f(\tau/2 - u)$ are then respectively
$$[2(u_0 - u) - T,\ 2(u_0 - u) + T] \quad \text{and} \quad [-2(u_0 - u) - T,\ -2(u_0 - u) + T].$$
The Wigner-Ville integral (4.110) shows that $P_V f(u, \xi)$ is non-zero only if these two intervals overlap, which is the case only if $|u_0 - u| < T/2$. The support of $P_V f(u, \xi)$ along $u$ is therefore included in the support of $f$. If the support of $\hat f$ is an interval, then the same derivation based on (4.109) shows that the support of $P_V f(u, \xi)$ along $\xi$ is included in the support of $\hat f$.

Example 4.17 Proposition 4.2 proves that the Wigner-Ville distribution does not spread the time or frequency support of Diracs or
sinusoids, unlike windowed Fourier and wavelet transforms. Direct calculations yield
$$f(t) = \delta(t - u_0) \implies P_V f(u, \xi) = \delta(u - u_0), \tag{4.111}$$
$$f(t) = \exp(i\xi_0 t) \implies P_V f(u, \xi) = 2\pi\,\delta(\xi - \xi_0). \tag{4.112}$$

Example 4.18 If $f$ is a smooth and symmetric window then its Wigner-Ville distribution $P_V f(u, \xi)$ is concentrated in a neighborhood of $u = \xi = 0$. A Gaussian $f(t) = (\sigma^2\pi)^{-1/4}\exp(-t^2/(2\sigma^2))$ is transformed into a two-dimensional Gaussian because its Fourier transform is also a Gaussian (2.32) and one can verify that
$$P_V f(u, \xi) = 2\exp\Bigl(-\frac{u^2}{\sigma^2} - \sigma^2\xi^2\Bigr). \tag{4.113}$$
In this particular case $P_V f(u, \xi) = |f(u)|^2\,|\hat f(\xi)|^2$.

The Wigner-Ville distribution has important invariance properties. A phase shift does not modify its value:
$$f(t) = e^{i\phi}\,g(t) \implies P_V f(u, \xi) = P_V g(u, \xi). \tag{4.114}$$
When $f$ is translated in time or frequency, its Wigner-Ville transform is also translated:
$$f(t) = g(t - u_0) \implies P_V f(u, \xi) = P_V g(u - u_0, \xi), \tag{4.115}$$
$$f(t) = \exp(i\xi_0 t)\,g(t) \implies P_V f(u, \xi) = P_V g(u, \xi - \xi_0). \tag{4.116}$$
If $f$ is scaled by $s$ and thus $\hat f$ is scaled by $1/s$ then the time and frequency parameters of $P_V f$ are also scaled respectively by $s$ and $1/s$:
$$f(t) = \frac{1}{\sqrt{s}}\,g\Bigl(\frac{t}{s}\Bigr) \implies P_V f(u, \xi) = P_V g\Bigl(\frac{u}{s}, s\xi\Bigr). \tag{4.117}$$

Example 4.19 If $g$ is a smooth and symmetric window then $P_V g(u, \xi)$ has its energy concentrated in the neighborhood of $(0, 0)$. The time-frequency atom
$$f_0(t) = \frac{a}{\sqrt{s}}\exp(i\phi_0)\,g\Bigl(\frac{t - u_0}{s}\Bigr)\exp(i\xi_0 t)$$
has a Wigner-Ville distribution that is calculated with (4.114), (4.115), (4.116) and (4.117):
$$P_V f_0(u, \xi) = |a|^2\,P_V g\Bigl(\frac{u - u_0}{s}, s(\xi - \xi_0)\Bigr). \tag{4.118}$$
Its energy is thus concentrated in the neighborhood of $(u_0, \xi_0)$, on an ellipse whose axes are proportional to $s$ in time and $1/s$ in frequency.

Instantaneous Frequency
Ville's original motivation for studying time-frequency decompositions was to compute the instantaneous frequency of a signal [342]. Let $f_a$ be the analytic part of $f$ obtained in
(4.69) by setting to zero $\hat f(\omega)$ for $\omega < 0$. We write $f_a(t) = a(t)\exp[i\phi(t)]$ to define the instantaneous frequency $\omega(t) = \phi'(t)$. The following proposition proves that $\phi'(t)$ is the "average" frequency computed relative to the Wigner-Ville distribution $P_V f_a$.

Proposition 4.3 If $f_a(t) = a(t)\exp[i\phi(t)]$ then
$$\phi'(u) = \frac{\int_{-\infty}^{+\infty} \xi\,P_V f_a(u, \xi)\,d\xi}{\int_{-\infty}^{+\infty} P_V f_a(u, \xi)\,d\xi}. \tag{4.119}$$

Proof $^2$. To prove this result, we verify that any function $g$ satisfies
$$\iint \xi\,g\Bigl(u + \frac{\tau}{2}\Bigr)\,g^*\Bigl(u - \frac{\tau}{2}\Bigr)\exp(-i\tau\xi)\,d\xi\,d\tau = -\pi i\,\Bigl[ g'(u)\,g^*(u) - g(u)\,g^{*\prime}(u) \Bigr]. \tag{4.120}$$
This identity is obtained by observing that the Fourier transform of $\xi$ is proportional to the derivative of a Dirac, which gives an equality in the sense of distributions:
$$\int_{-\infty}^{+\infty} \xi\exp(-i\tau\xi)\,d\xi = i\,2\pi\,\delta'(\tau).$$
Since $\int_{-\infty}^{+\infty} \delta'(\tau)\,h(\tau)\,d\tau = -h'(0)$, inserting $h(\tau) = g(u + \tau/2)\,g^*(u - \tau/2)$ proves (4.120). If $g(u) = f_a(u) = a(u)\exp[i\phi(u)]$ then (4.120) gives
$$\int_{-\infty}^{+\infty} \xi\,P_V f_a(u, \xi)\,d\xi = 2\pi\,a^2(u)\,\phi'(u).$$
We will see in (4.124) that $|f_a(u)|^2 = \frac{1}{2\pi}\int_{-\infty}^{+\infty} P_V f_a(u, \xi)\,d\xi$, and since $|f_a(u)|^2 = a(u)^2$ we derive (4.119).

This proposition shows that for a fixed $u$ the mass of $P_V f_a(u, \xi)$ is typically concentrated in the neighborhood of the instantaneous frequency $\xi = \phi'(u)$. For example, a linear chirp $f(t) = \exp(iat^2)$ is transformed into a Dirac located along the instantaneous frequency $\xi = \phi'(u) = 2au$:
$$P_V f(u, \xi) = 2\pi\,\delta(\xi - 2au).$$
Similarly, the multiplication of $f$ by a linear chirp $\exp(iat^2)$ makes a frequency translation of $P_V f$ by the instantaneous frequency $2au$:
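$$f(t) = \exp(iat^2)\,g(t) \implies P_V f(u, \xi) = P_V g(u, \xi - 2au). \tag{4.121}$$

Energy Density
The Moyal [275] formula proves that the Wigner-Ville transform is unitary, which implies energy conservation properties.

Theorem 4.6 (Moyal) For any $f$ and $g$ in $L^2(\mathbb{R})$
$$\Bigl| \int_{-\infty}^{+\infty} f(t)\,g^*(t)\,dt \Bigr|^2 = \frac{1}{2\pi}\iint P_V f(u, \xi)\,P_V g(u, \xi)\,du\,d\xi. \tag{4.122}$$

Proof $^1$. Let us compute the integral
$$\begin{aligned}
I &= \iint P_V f(u, \xi)\,P_V g(u, \xi)\,du\,d\xi \\
&= \iiiint f\Bigl(u + \frac{\tau}{2}\Bigr) f^*\Bigl(u - \frac{\tau}{2}\Bigr) g\Bigl(u + \frac{\tau'}{2}\Bigr) g^*\Bigl(u - \frac{\tau'}{2}\Bigr) \exp[-i\xi(\tau + \tau')]\,d\tau\,d\tau'\,du\,d\xi.
\end{aligned}$$
The Fourier transform of $h(t) = 1$ is $\hat h(\omega) = 2\pi\,\delta(\omega)$, which means that we have a distribution equality $\int \exp[-i\xi(\tau + \tau')]\,d\xi = 2\pi\,\delta(\tau + \tau')$. As a result,
$$\begin{aligned}
I &= 2\pi \iiint f\Bigl(u + \frac{\tau}{2}\Bigr) f^*\Bigl(u - \frac{\tau}{2}\Bigr) g\Bigl(u + \frac{\tau'}{2}\Bigr) g^*\Bigl(u - \frac{\tau'}{2}\Bigr) \delta(\tau + \tau')\,d\tau\,d\tau'\,du \\
&= 2\pi \iint f\Bigl(u + \frac{\tau}{2}\Bigr) f^*\Bigl(u - \frac{\tau}{2}\Bigr) g\Bigl(u - \frac{\tau}{2}\Bigr) g^*\Bigl(u + \frac{\tau}{2}\Bigr)\,d\tau\,du.
\end{aligned}$$
The change of variable $t = u + \tau/2$ and $t' = u - \tau/2$ yields (4.122).

One can consider $|f(t)|^2$ and $|\hat f(\omega)|^2/(2\pi)$ as energy densities in time and frequency that satisfy a conservation equation:
$$\|f\|^2 = \int_{-\infty}^{+\infty} |f(t)|^2\,dt = \frac{1}{2\pi}\int_{-\infty}^{+\infty} |\hat f(\omega)|^2\,d\omega.$$
The following proposition shows that these time and frequency densities are recovered with marginal integrals over the Wigner-Ville distribution.

Proposition 4.4 For any $f \in L^2(\mathbb{R})$
$$\int_{-\infty}^{+\infty} P_V f(u, \xi)\,du = |\hat f(\xi)|^2 \tag{4.123}$$
and
$$\frac{1}{2\pi}\int_{-\infty}^{+\infty} P_V f(u, \xi)\,d\xi = |f(u)|^2. \tag{4.124}$$

Proof $^1$. The frequency integral (4.109) proves that the one-dimensional Fourier transform of $g_\xi(u) = P_V f(u, \xi)$ with respect to $u$ is
$$\hat g_\xi(\gamma) = \hat f\Bigl(\xi + \frac{\gamma}{2}\Bigr)\,\hat f^*\Bigl(\xi - \frac{\gamma}{2}\Bigr).$$
We derive (4.123) from the fact that
$$\hat g_\xi(0) = \int_{-\infty}^{+\infty} g_\xi(u)\,du.$$
Similarly, (4.108) shows that $P_V f(u, \xi)$ is the one-dimensional Fourier transform of $f(u + \tau/2)\,f^*(u - \tau/2)$ with respect to $\tau$, where $\xi$ is the Fourier variable. Its integral in $\xi$ thus gives the value for $\tau = 0$, which is the identity (4.124).

This proposition suggests interpreting the Wigner-Ville distribution as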
a joint time-frequency energy density. However, the Wigner-Ville distribution misses one fundamental property of an energy density: positivity. Let us compute for example the Wigner-Ville distribution of
f = 1 ;T T ] with the integral (4.108): PV f (u ) = 2 sin 2(T ; juj) 1 ;T T ](u): It is an oscillating function that takes negative values. In fact, one can
prove that translated and frequency modulated Gaussians are the only
functions whose Wigner-Ville distributions remain positive. As we will
see in the next section, to obtain positive energy distributions for all
signals, it is necessary to average the Wigner-Ville transform and thus
lose some time-frequency resolution.

4.5.2 Interferences and Positivity

At this point, the Wigner-Ville distribution may seem to be an ideal
tool for analyzing the time-frequency structures of a signal. This is
however not the case because of interferences created by the quadratic
properties of this transform. These interferences can be removed by
averaging the Wigner-Ville distribution with appropriate kernels which
yield positive time-frequency densities. However, this reduces the time-frequency resolution. Spectrograms and scalograms are examples of positive quadratic distributions obtained by smoothing the Wigner-Ville distribution.

Cross Terms. Let $f = f_1 + f_2$ be a composite signal. Since the Wigner-Ville distribution is a quadratic form,
$$ P_V f = P_V f_1 + P_V f_2 + P_V[f_1,f_2] + P_V[f_2,f_1], \tag{4.125} $$
where $P_V[h,g]$ is the cross Wigner-Ville distribution of two signals
$$ P_V[h,g](u,\xi) = \int_{-\infty}^{+\infty} h\!\left(u+\tfrac{\tau}{2}\right) g^*\!\left(u-\tfrac{\tau}{2}\right) e^{-i\tau\xi}\, d\tau. \tag{4.126} $$
The interference term
$$ I[f_1,f_2] = P_V[f_1,f_2] + P_V[f_2,f_1] $$
is a real function that creates non-zero values at unexpected locations of the $(u,\xi)$ plane.
Let us consider two time-frequency atoms defined by
$$ f_1(t) = a_1\, e^{i\phi_1}\, g(t-u_1)\, e^{i\xi_1 t} \quad\text{and}\quad f_2(t) = a_2\, e^{i\phi_2}\, g(t-u_2)\, e^{i\xi_2 t}, $$
where $g$ is a time window centered at $t = 0$. Their Wigner-Ville distributions computed in (4.118) are
$$ P_V f_1(u,\xi) = a_1^2\, P_V g(u-u_1, \xi-\xi_1) \quad\text{and}\quad P_V f_2(u,\xi) = a_2^2\, P_V g(u-u_2, \xi-\xi_2). $$
Since the energy of $P_V g$ is centered at $(0,0)$, the energy of $P_V f_1$ and $P_V f_2$ is concentrated in the neighborhoods of $(u_1,\xi_1)$ and $(u_2,\xi_2)$ respectively. A direct calculation verifies that the interference term is
$$ I[f_1,f_2](u,\xi) = 2 a_1 a_2\, P_V g(u-u_0, \xi-\xi_0)\, \cos\big((u-u_0)\,\Delta\xi - (\xi-\xi_0)\,\Delta u + \Delta\phi\big) $$
with
$$ u_0 = \frac{u_1+u_2}{2}, \quad \xi_0 = \frac{\xi_1+\xi_2}{2}, \quad \Delta u = u_1-u_2, \quad \Delta\xi = \xi_1-\xi_2, \quad \Delta\phi = \phi_1-\phi_2+u_0\,\Delta\xi. $$
It is an oscillatory waveform centered at the middle point $(u_0,\xi_0)$. This is quite counter-intuitive since $f$ and $\hat f$ have very little energy in the neighborhood of $u_0$ and $\xi_0$. The frequency of the oscillations is proportional to the Euclidean distance $\sqrt{\Delta\xi^2 + \Delta u^2}$ of $(u_1,\xi_1)$ and $(u_2,\xi_2)$. The direction of these oscillations is perpendicular to the line that joins $(u_1,\xi_1)$ and $(u_2,\xi_2)$. Figure 4.18 displays the Wigner-Ville distribution of two atoms obtained with a Gaussian window $g$. The oscillating interference appears at the middle time-frequency point.
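The interference between two Gabor atoms can be reproduced numerically. The following NumPy sketch is not the interpolated algorithm of Section 4.5.4: it is a simplified integer-lag Wigner-Ville sum, in which a lag $m$ stands for $\tau = 2m$, so that column $k$ corresponds to $\xi = \pi k/N$ radians per sample. The function names `wigner_ville` and `gabor_atom`, the signal length, the atom positions and the window width are all illustrative assumptions.

```python
import numpy as np

def wigner_ville(f):
    """Integer-lag discrete Wigner-Ville map W[n, k] of a complex signal f.

    Row n is time; column k corresponds to xi = pi * k / N radians per
    sample, because an integer lag m stands for tau = 2 m.
    """
    f = np.asarray(f, dtype=complex)
    N = f.size
    W = np.zeros((N, N))
    for n in range(N):
        M = min(n, N - 1 - n)              # largest lag that stays inside the signal
        m = np.arange(-M, M + 1)
        r = np.zeros(N, dtype=complex)
        r[m % N] = f[n + m] * np.conj(f[n - m])   # Hermitian in m, so its DFT is real
        W[n] = 2.0 * np.fft.fft(r).real    # factor 2 accounts for d(tau) = 2 dm
    return W

def gabor_atom(N, center, freq_bin, sigma):
    """Unit-norm Gaussian window at `center`, modulated to xi = pi * freq_bin / N."""
    n = np.arange(N)
    g = (sigma**2 * np.pi) ** -0.25 * np.exp(-((n - center) ** 2) / (2 * sigma**2))
    return g * np.exp(1j * np.pi * freq_bin * n / N)

N = 256
# Two atoms at (time, frequency bin) = (64, 50) and (192, 150); midpoint (128, 100).
f = gabor_atom(N, 64, 50, 8.0) + gabor_atom(N, 192, 150, 8.0)
W = wigner_ville(f)

print("auto term   W[64, 50]    =", W[64, 50])
print("cross term  W[128, 100]  =", W[128, 100])
print("cross term  W[128, 102]  =", W[128, 102])
print("signal energy |f[128]|^2 =", abs(f[128]) ** 2)
```

With these parameters the map should show a positive bump of height close to 2 at each atom, and values that swing between roughly $+4$ and $-4$ around the midpoint $(128, 100)$, where the signal itself has essentially no energy. The sum of row 128 over all frequency bins stays near zero, in agreement with the time marginal.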
This example shows that the interference $I[f_1,f_2](u,\xi)$ has some energy in regions where $|f(u)|^2 \approx 0$ and $|\hat f(\xi)|^2 \approx 0$. These interferences can have a complicated structure [26, 211] but they are necessarily oscillatory because the marginal integrals (4.123) and (4.124) vanish:
$$ \int_{-\infty}^{+\infty} P_V f(u,\xi)\, d\xi = 2\pi\, |f(u)|^2, \qquad \int_{-\infty}^{+\infty} P_V f(u,\xi)\, du = |\hat f(\xi)|^2. $$
Figure 4.18: Wigner-Ville distribution $P_V f(u,\xi)$ of two Gabor atoms shown at the top. The oscillating interferences are centered at the middle time-frequency location. (Top panel: $f(t)$ versus $t$; bottom panel: $\xi/2\pi$ versus $u$.)

Analytic Part. Interference terms also exist in a real signal $f$ with a single instantaneous frequency component. Let $f_a(t) = a(t) \exp[i\phi(t)]$
be its analytic part:
$$ f = \mathrm{Real}[f_a] = \frac{1}{2}\,(f_a + f_a^*). $$
Proposition 4.3 proves that for fixed $u$, $P_V f_a(u,\xi)$ and $P_V f_a^*(u,\xi)$ have an energy concentrated respectively in the neighborhood of $\xi_1 = \phi'(u)$ and $\xi_2 = -\phi'(u)$. Both components create an interference term at the intermediate zero frequency $\xi_0 = (\xi_1+\xi_2)/2 = 0$. To avoid this low-frequency interference, we often compute $P_V f_a$ as opposed to $P_V f$.

Figure 4.19 displays $P_V f_a$ for a real signal $f$ that includes a linear chirp, a quadratic chirp and two isolated time-frequency atoms. The linear and quadratic chirps are localized along narrow time-frequency lines, which are spread on wider bands by the spectrogram and the scalogram shown in Figures 4.3 and 4.11. However, the interference terms create complex oscillatory patterns that make it difficult to detect the existence of the two time-frequency transients at $t = 0.5$ and $t = 0.87$, which clearly appear in the spectrogram and the scalogram.
Figure 4.19: The bottom displays the Wigner-Ville distribution $P_V f_a(u,\xi)$ of the analytic part of the top signal. (Top panel: $f(t)$ versus $t$; bottom panel: $\xi/2\pi$ versus $u$.)

Positivity. Since the interference terms include positive and negative oscillations, they can be partly removed by smoothing $P_V f$ with a kernel $\theta$:
$$ P_\theta f(u,\xi) = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} P_V f(u',\xi')\, \theta(u,u',\xi,\xi')\, du'\, d\xi'. \tag{4.127} $$
The time-frequency resolution of this distribution depends on the spread of the kernel $\theta$ in the neighborhood of $(u,\xi)$. Since the interferences take negative values, one can guarantee that all interferences are removed by imposing that this time-frequency distribution remain positive: $P_\theta f(u,\xi) \geq 0$ for all $(u,\xi) \in \mathbb{R}^2$.
The spectrogram (4.12) and scalogram (4.55) are examples of positive time-frequency energy distributions. In general, let us consider a family of time-frequency atoms $\{\phi_\gamma\}_{\gamma\in\Gamma}$. Suppose that for any $(u,\xi)$ there exists a unique atom $\phi_{\gamma(u,\xi)}$ centered in time-frequency at $(u,\xi)$. The resulting time-frequency energy density is
$$ Pf(u,\xi) = |\langle f, \phi_{\gamma(u,\xi)} \rangle|^2. $$
The Moyal formula (4.122) proves that this energy density can be written as a time-frequency averaging of the Wigner-Ville distribution
$$ Pf(u,\xi) = \frac{1}{2\pi} \iint P_V f(u',\xi')\, P_V \phi_{\gamma(u,\xi)}(u',\xi')\, du'\, d\xi'. \tag{4.128} $$
The smoothing kernel is the Wigner-Ville distribution of the atoms
$$ \theta(u,u',\xi,\xi') = \frac{1}{2\pi}\, P_V \phi_{\gamma(u,\xi)}(u',\xi'). $$
The loss of time-frequency resolution depends on the spread of the distribution $P_V \phi_{\gamma(u,\xi)}(u',\xi')$ in the neighborhood of $(u,\xi)$.

Example 4.20. A spectrogram is computed with windowed Fourier atoms
$$ \phi_{\gamma(u,\xi)}(t) = g(t-u)\, e^{i\xi t}. $$
The Wigner-Ville distribution calculated in (4.118) yields
$$ \theta(u,u',\xi,\xi') = \frac{1}{2\pi}\, P_V \phi_{\gamma(u,\xi)}(u',\xi') = \frac{1}{2\pi}\, P_V g(u'-u, \xi'-\xi). \tag{4.129} $$
For a spectrogram, the Wigner-Ville averaging (4.128) is therefore a two-dimensional convolution with $P_V g$. If $g$ is a Gaussian window, then $P_V g$ is a two-dimensional Gaussian. This proves that averaging $P_V f$ with a sufficiently wide Gaussian defines a positive energy density. The general class of time-frequency distributions obtained by convolving $P_V f$ with an arbitrary kernel $\theta$ is studied in Section 4.5.3.
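To make the Gaussian case concrete, the kernel of Example 4.20 can be written in closed form. The sketch below assumes the unit-norm Gaussian window $g(t) = (\sigma^2\pi)^{-1/4}\exp(-t^2/(2\sigma^2))$, an illustrative choice of normalization, and evaluates the Wigner-Ville integral (4.108) with the standard Fourier transform of a Gaussian.

```latex
\begin{aligned}
g\!\left(u+\tfrac{\tau}{2}\right) g\!\left(u-\tfrac{\tau}{2}\right)
  &= (\sigma^2\pi)^{-1/2} \exp\!\left(-\frac{u^2}{\sigma^2}\right)
     \exp\!\left(-\frac{\tau^2}{4\sigma^2}\right), \\
P_V g(u,\xi)
  &= \int_{-\infty}^{+\infty} g\!\left(u+\tfrac{\tau}{2}\right)
     g\!\left(u-\tfrac{\tau}{2}\right) e^{-i\tau\xi}\, d\tau
   = 2 \exp\!\left(-\frac{u^2}{\sigma^2} - \sigma^2 \xi^2\right), \\
\theta(u,u',\xi,\xi')
  &= \frac{1}{2\pi}\, P_V g(u'-u,\xi'-\xi)
   = \frac{1}{\pi} \exp\!\left(-\frac{(u'-u)^2}{\sigma^2} - \sigma^2 (\xi'-\xi)^2\right).
\end{aligned}
```

The kernel is a two-dimensional Gaussian whose time spread is proportional to $\sigma$ and whose frequency spread is proportional to $1/\sigma$, so a narrower averaging in time is paid for by a wider averaging in frequency.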
Example 4.21. Let $\psi$ be an analytic wavelet whose center frequency is $\eta$. The wavelet atom $\psi_{u,s}(t) = s^{-1/2}\, \psi((t-u)/s)$ is centered at $(u, \xi = \eta/s)$ and the scalogram is defined by
$$ P_W f(u,\xi) = |\langle f, \psi_{u,s} \rangle|^2 \quad\text{for}\quad \xi = \eta/s. $$
Properties (4.115,4.117) prove that the averaging kernel is
$$ \theta(u,u',\xi,\xi') = \frac{1}{2\pi}\, P_V \psi\!\left(\frac{u'-u}{s},\, s\,\xi'\right) = \frac{1}{2\pi}\, P_V \psi\!\left(\frac{\xi}{\eta}\,(u'-u),\, \frac{\eta}{\xi}\,\xi'\right). $$

Positive time-frequency distributions totally remove the interference terms but produce a loss of resolution. This is emphasized by the following theorem, due to Wigner [352].

Theorem 4.7 (Wigner). There is no positive quadratic energy distribution $Pf$ that satisfies
$$ \int_{-\infty}^{+\infty} Pf(u,\xi)\, d\xi = 2\pi\, |f(u)|^2 \quad\text{and}\quad \int_{-\infty}^{+\infty} Pf(u,\xi)\, du = |\hat f(\xi)|^2. \tag{4.130} $$

Proof$^2$. Suppose that $Pf$ is a positive quadratic distribution that satisfies these marginals. Since $Pf(u,\xi) \geq 0$, the integrals (4.130) imply that if the support of $f$ is included in an interval $I$ then $Pf(u,\xi) = 0$ for $u \notin I$. We can associate to the quadratic form $Pf$ a bilinear distribution defined for any $f$ and $g$ by
$$ P[f,g] = \frac{1}{4}\, \big( P(f+g) - P(f-g) \big). $$
Let $f_1$ and $f_2$ be two non-zero signals whose supports are two intervals $I_1$ and $I_2$ that do not intersect, so that $f_1 f_2 = 0$. Let $f = a f_1 + b f_2$:
$$ Pf = |a|^2\, Pf_1 + a b^*\, P[f_1,f_2] + a^* b\, P[f_2,f_1] + |b|^2\, Pf_2. $$
Since $I_1$ does not intersect $I_2$, $Pf_1(u,\xi) = 0$ for $u \in I_2$. Remember that $Pf(u,\xi) \geq 0$ for all $a$ and $b$, so necessarily $P[f_1,f_2](u,\xi) = P[f_2,f_1](u,\xi) = 0$ for $u \in I_2$. Similarly we prove that these cross terms are zero for $u \in I_1$ and hence
$$ Pf(u,\xi) = |a|^2\, Pf_1(u,\xi) + |b|^2\, Pf_2(u,\xi). $$
Integrating this equation and inserting (4.130) yields
$$ |\hat f(\xi)|^2 = |a|^2\, |\hat f_1(\xi)|^2 + |b|^2\, |\hat f_2(\xi)|^2. $$
Since $\hat f(\xi) = a \hat f_1(\xi) + b \hat f_2(\xi)$ it follows that $\hat f_1(\xi)\, \hat f_2^*(\xi) = 0$. But this is not possible because $f_1$ and $f_2$ have a compact support in time and Theorem 2.6 proves that $\hat f_1$ and $\hat f_2$ are $C^\infty$ functions that cannot vanish on a whole interval. We thus conclude that one cannot construct a positive quadratic distribution $Pf$ that satisfies the marginals (4.130).

4.5.3 Cohen's Class $^2$

While attenuating the interference terms with a smoothing kernel $\theta$,
we may want to retain certain important properties. Cohen [135] introduced a general class of quadratic time-frequency distributions that satisfy the time translation and frequency modulation invariance properties (4.115) and (4.116). If a signal is translated in time or frequency, its energy distribution is just translated by the corresponding amount. This was the beginning of a systematic study of quadratic time-frequency distributions obtained as a weighted average of a Wigner-Ville distribution [10, 26, 136, 210].

Section 2.1 proves that linear translation invariant operators are convolution products. The translation invariance properties (4.115,4.116) are thus equivalent to imposing that the smoothing kernel in (4.127) be a convolution kernel
$$ \theta(u,u',\xi,\xi') = \theta(u-u', \xi-\xi') \tag{4.131} $$
and hence
$$ P_\theta f(u,\xi) = P_V f \star \theta(u,\xi) = \iint \theta(u-u', \xi-\xi')\, P_V f(u',\xi')\, du'\, d\xi'. \tag{4.132} $$
The spectrogram is an example of Cohen's class distribution, whose kernel in (4.129) is the Wigner-Ville distribution of the window
$$ \theta(u,\xi) = \frac{1}{2\pi}\, P_V g(u,\xi) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} g\!\left(u+\tfrac{\tau}{2}\right) g\!\left(u-\tfrac{\tau}{2}\right) e^{-i\tau\xi}\, d\tau. \tag{4.133} $$

Ambiguity Function. The properties of the convolution (4.132) are more easily studied by calculating the two-dimensional Fourier transform of $P_V f(u,\xi)$ with respect to $u$ and $\xi$. We denote by $Af(\tau,\gamma)$ this
Fourier transform
$$ Af(\tau,\gamma) = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} P_V f(u,\xi)\, \exp[-i(u\gamma + \xi\tau)]\, du\, d\xi. $$
Note that the Fourier variables $\tau$ and $\gamma$ are inverted with respect to the usual Fourier notation. Since the one-dimensional Fourier transform of $P_V f(u,\xi)$ with respect to $u$ is $\hat f(\xi+\gamma/2)\, \hat f^*(\xi-\gamma/2)$, applying the one-dimensional Fourier transform with respect to $\xi$ gives
$$ Af(\tau,\gamma) = \int_{-\infty}^{+\infty} \hat f\!\left(\xi+\tfrac{\gamma}{2}\right) \hat f^*\!\left(\xi-\tfrac{\gamma}{2}\right) e^{-i\tau\xi}\, d\xi. \tag{4.134} $$
The Parseval formula yields
$$ Af(\tau,\gamma) = \int_{-\infty}^{+\infty} f\!\left(u+\tfrac{\tau}{2}\right) f^*\!\left(u-\tfrac{\tau}{2}\right) e^{-i\gamma u}\, du. \tag{4.135} $$
We recognize the ambiguity function encountered in (4.24) when studying the time-frequency resolution of a windowed Fourier transform. It measures the energy concentration of $f$ in time and in frequency.

Kernel Properties. The Fourier transform of $\theta(u,\xi)$ is
$$ \hat\theta(\tau,\gamma) = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} \theta(u,\xi)\, \exp[-i(u\gamma + \xi\tau)]\, du\, d\xi. $$
As in the definition of the ambiguity function (4.134), the Fourier parameters $\tau$ and $\gamma$ of $\hat\theta$ are inverted. The following proposition gives necessary and sufficient conditions to ensure that $P_\theta$ satisfies marginal energy properties like those of the Wigner-Ville distribution. The Wigner Theorem 4.7 proves that in this case $P_\theta f(u,\xi)$ takes negative values.

Proposition 4.5. For all $f \in L^2(\mathbb{R})$,
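$$ \int_{-\infty}^{+\infty} P_\theta f(u,\xi)\, d\xi = 2\pi\, |f(u)|^2, \qquad \int_{-\infty}^{+\infty} P_\theta f(u,\xi)\, du = |\hat f(\xi)|^2 \tag{4.136} $$
if and only if
$$ \forall (\tau,\gamma) \in \mathbb{R}^2, \qquad \hat\theta(\tau,0) = \hat\theta(0,\gamma) = 1. \tag{4.137} $$

Proof$^2$. Let $A_\theta f(\tau,\gamma)$ be the two-dimensional Fourier transform of $P_\theta f(u,\xi)$. The Fourier integral at $(0,\gamma)$ gives
$$ \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} P_\theta f(u,\xi)\, e^{-iu\gamma}\, d\xi\, du = A_\theta f(0,\gamma). \tag{4.138} $$
Since the ambiguity function $Af(\tau,\gamma)$ is the Fourier transform of $P_V f(u,\xi)$, the two-dimensional convolution (4.132) gives
$$ A_\theta f(\tau,\gamma) = Af(\tau,\gamma)\, \hat\theta(\tau,\gamma). \tag{4.139} $$
The Fourier transform of $2\pi\, |f(u)|^2$ is $\hat f \star \bar{\hat f}(\gamma)$, with $\bar{\hat f}(\gamma) = \hat f^*(-\gamma)$. The relation (4.138) shows that (4.136) is satisfied if and only if
$$ A_\theta f(0,\gamma) = Af(0,\gamma)\, \hat\theta(0,\gamma) = \hat f \star \bar{\hat f}(\gamma). \tag{4.140} $$
Since $P_V f$ satisfies the marginal property (4.123), we similarly prove that
$$ Af(0,\gamma) = \hat f \star \bar{\hat f}(\gamma). $$
Requiring that (4.140) be valid for any $\hat f(\gamma)$ is equivalent to requiring that $\hat\theta(0,\gamma) = 1$ for all $\gamma \in \mathbb{R}$. The same derivation applied to the other marginal integration yields $\hat\theta(\tau,0) = 1$.

In addition to requiring time-frequency translation invariance, it may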
be useful to guarantee that $P_\theta$ satisfies the same scaling property as a Wigner-Ville distribution:
$$ g(t) = \frac{1}{\sqrt{s}}\, f\!\left(\frac{t}{s}\right) \;\Longrightarrow\; P_\theta g(u,\xi) = P_\theta f\!\left(\frac{u}{s},\, s\,\xi\right). $$
Such a distribution $P_\theta$ is affine invariant. One can verify (Problem 4.15) that affine invariance is equivalent to imposing that
$$ \forall s \in \mathbb{R}^+, \qquad \theta\!\left(s u,\, \frac{\xi}{s}\right) = \theta(u,\xi) \tag{4.141} $$
and hence
$$ \theta(u,\xi) = \theta(u\xi, 1) = \beta(u\xi). $$

Example 4.22. The Rihaczek distribution is an affine invariant distribution whose convolution kernel is
$$ \hat\theta(\tau,\gamma) = \exp\!\left(\frac{i\,\tau\gamma}{2}\right). \tag{4.142} $$
A direct calculation shows that
$$ P_\theta f(u,\xi) = f(u)\, \hat f^*(\xi)\, \exp(-iu\xi). \tag{4.143} $$

Figure 4.20: Choi-Williams distribution $P_\theta f(u,\xi)$ of the two Gabor atoms shown at the top. The interference term that appears in the Wigner-Ville distribution of Figure 4.18 has nearly disappeared. (Top panel: $f(t)$ versus $t$; bottom panel: $\xi/2\pi$ versus $u$.)

Figure 4.21: Choi-Williams distribution $P_\theta f_a(u,\xi)$ of the analytic part of the signal shown at the top. The interferences remain visible. (Top panel: $f(t)$ versus $t$; bottom panel: $\xi/2\pi$ versus $u$.)

Example 4.23. The kernel of the Choi-Williams distribution is [122]
$$ \hat\theta(\tau,\gamma) = \exp(-\sigma^2 \tau^2 \gamma^2). \tag{4.144} $$
It is symmetric and thus corresponds to a real function $\theta(u,\xi)$. This distribution satisfies the marginal conditions (4.137). Since $\lim_{\sigma\to 0} \hat\theta(\tau,\gamma) = 1$, when $\sigma$ is small the Choi-Williams distribution is close to a Wigner-Ville distribution. Increasing $\sigma$ attenuates the interference terms, but spreads $\theta(u,\xi)$, which reduces the time-frequency resolution of the distribution.
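The behavior of the Choi-Williams kernel described above is easy to check numerically. The following NumPy sketch evaluates $\hat\theta(\tau,\gamma) = \exp(-\sigma^2\tau^2\gamma^2)$ on the two axes, where the marginal conditions (4.137) require the value 1, and at one point away from the axes for two values of $\sigma$. The function name `choi_williams_kernel` and the sample points are illustrative assumptions.

```python
import numpy as np

def choi_williams_kernel(tau, gamma, sigma):
    """Ambiguity-domain kernel exp(-sigma^2 tau^2 gamma^2) of (4.144)."""
    return np.exp(-(sigma * tau * gamma) ** 2)

tau = np.linspace(-4.0, 4.0, 9)
gamma = np.linspace(-4.0, 4.0, 9)

# Marginal conditions (4.137): the kernel equals 1 on both axes.
on_tau_axis = choi_williams_kernel(tau, 0.0, sigma=1.0)
on_gamma_axis = choi_williams_kernel(0.0, gamma, sigma=1.0)

# Away from the axes the kernel decays, and faster for larger sigma.
weak = choi_williams_kernel(2.0, 2.0, sigma=0.1)
strong = choi_williams_kernel(2.0, 2.0, sigma=1.0)

print("on the tau axis:  ", on_tau_axis)
print("on the gamma axis:", on_gamma_axis)
print("attenuation at (2, 2): sigma=0.1 ->", weak, ", sigma=1.0 ->", strong)
```

Because the ambiguity function is multiplied by this kernel in (4.139), components located away from both axes are attenuated while those on the axes are left untouched, which is why the marginals survive the smoothing.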
Figure 4.20 shows that the interference terms of two modulated Gaussians nearly disappear when the Wigner-Ville distribution of Figure 4.18 is averaged by a Choi-Williams kernel having a sufficiently large $\sigma$. Figure 4.21 gives the Choi-Williams distribution of the analytic signal whose Wigner-Ville distribution is in Figure 4.19. The energy of the linear and quadratic chirps is spread over wider time-frequency bands but the interference terms are attenuated, although not totally removed. It remains difficult to isolate the two modulated Gaussians at $t = 0.5$ and $t = 0.87$, which clearly appear in the spectrogram of Figure 4.3.

4.5.4 Discrete Wigner-Ville Computations $^2$

The Wigner integral (4.108) is the Fourier transform of $f(u+\tau/2)\, f^*(u-\tau/2)$:
$$ P_V f(u,\xi) = \int_{-\infty}^{+\infty} f\!\left(u+\tfrac{\tau}{2}\right) f^*\!\left(u-\tfrac{\tau}{2}\right) e^{-i\tau\xi}\, d\tau. \tag{4.145} $$
For a discrete signal $f[n]$ defined over $0 \leq n < N$, the integral is replaced by a discrete sum:
$$ P_V f[n,k] = \sum_{p=-N}^{N-1} f\!\left[n+\tfrac{p}{2}\right] f^*\!\left[n-\tfrac{p}{2}\right] \exp\!\left(\frac{-i 2\pi k p}{N}\right). \tag{4.146} $$
When $p$ is odd, this calculation requires knowing the value of $f$ at half integers. These values are computed by interpolating $f$, with an addition of zeroes to its Fourier transform. This is necessary to avoid the aliasing produced by the discretization of the Wigner-Ville integral [126].
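The half-sample interpolation by zero padding in the Fourier domain can be sketched in a few lines of NumPy. The construction below is an assumption-laden illustration: it doubles the length of the discrete Fourier transform, inserts zeroes in the middle, splits the Nyquist coefficient between the two band edges, and assumes an even signal length. The function name `interpolate_half_samples` is hypothetical.

```python
import numpy as np

def interpolate_half_samples(f):
    """Return a signal of size 2N whose even samples equal f (N assumed even)."""
    f = np.asarray(f, dtype=complex)
    N = f.size
    if N % 2:
        raise ValueError("this sketch assumes an even signal length")
    F = np.fft.fft(f)
    F2 = np.zeros(2 * N, dtype=complex)
    F2[: N // 2] = 2 * F[: N // 2]              # low positive frequencies
    F2[3 * N // 2 + 1 :] = 2 * F[N // 2 + 1 :]  # negative frequencies, shifted by N
    F2[N // 2] = F[N // 2]                      # Nyquist coefficient, shared
    F2[3 * N // 2] = F[N // 2]                  # between the two band edges
    return np.fft.ifft(F2)                      # zeroes remain in between

N = 16
n = np.arange(N)
f = np.exp(2j * np.pi * 3 * n / N)              # complex sinusoid, 3 cycles over N samples
f_interp = interpolate_half_samples(f)

print(np.allclose(f_interp[::2], f))            # even samples reproduce f
```

For the sinusoid used here, the odd samples of `f_interp` fall on the same sinusoid evaluated at half integers, which is the property needed to evaluate the products $f[n+p/2]\, f^*[n-p/2]$ for odd $p$.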
The interpolation $\tilde f$ of $f$ is a signal of size $2N$ whose discrete Fourier transform $\hat{\tilde f}$ is defined from the discrete Fourier transform $\hat f$ of $f$ by
$$ \hat{\tilde f}[k] = \begin{cases} 2 \hat f[k] & \text{if } 0 \leq k < N/2 \\ 0 & \text{if } N/2 < k < 3N/2 \\ 2 \hat f[k-N] & \text{if } 3N/2 < k < 2N \\ \hat f[N/2] & \text{if } k = N/2,\ 3N/2. \end{cases} $$
Computing the inverse discrete Fourier transform shows that $\tilde f[2n] = f[n]$ for $n \in [0, N-1]$. When $n \notin [0, 2N-1]$, we set $\tilde f[n] = 0$. The Wigner summation (4.146) is calculated from $\tilde f$:
$$ P_V f[n,k] = \sum_{p=-N}^{N-1} \tilde f[2n+p]\, \tilde f^*[2n-p]\, \exp\!\left(\frac{-i 2\pi k p}{N}\right) = \sum_{p=0}^{2N-1} \tilde f[2n+p-N]\, \tilde f^*[2n-p+N]\, \exp\!\left(\frac{-i 2\pi (2k) p}{2N}\right). $$
For $0 \leq n < N$ fixed, $P_V f[n,k]$ is the discrete Fourier transform of size $2N$ of $g[p] = \tilde f[2n+p-N]\, \tilde f^*[2n-p+N]$ at the frequency $2k$. The discrete Wigner-Ville distribution is thus calculated with $N$ FFT procedures of size $2N$, which requires $O(N^2 \log N)$ operations. To compute the Wigner-Ville distribution of the analytic part $f_a$ of $f$, we use (4.48).

Cohen's Class. A Cohen's class distribution is calculated with a circular convolution of the discrete Wigner-Ville distribution with a kernel $\theta[p,q]$:
$$ P_\theta[n,k] = P_V \circledast \theta[n,k]. \tag{4.147} $$
Its two-dimensional discrete Fourier transform is therefore
$$ A_\theta[p,q] = Af[p,q]\, \hat\theta[p,q]. \tag{4.148} $$
The signal $Af[p,q]$ is the discrete ambiguity function, calculated with a two-dimensional FFT of the discrete Wigner-Ville distribution $P_V f[n,k]$. As in the case of continuous time, we have inverted the indices $p$ and $q$ of the usual two-dimensional Fourier transform. The Cohen's class distribution (4.147) is obtained by calculating the inverse Fourier transform of (4.148). This also requires a total of $O(N^2 \log N)$ operations.

4.6 Problems
4.1. $^1$ Instantaneous frequency. Let $f(t) = \exp[i\phi(t)]$.
(a) Prove that $\int_{-\infty}^{+\infty} |Sf(u,\xi)|^2\, d\xi = 2\pi$. Hint: $Sf(u,\xi)$ is a Fourier transform; use the Parseval formula.
(b) Similarly, show that
$$ \int_{-\infty}^{+\infty} \xi\, |Sf(u,\xi)|^2\, d\xi = 2\pi \int_{-\infty}^{+\infty} \phi'(t)\, |g(t-u)|^2\, dt $$
and interpret this result.

4.2. Write a reproducing kernel equation for the discrete windowed Fourier transform $Sf[m,l]$ defined in (4.27).

4.3. $^1$ When $g(t) = (\pi\sigma^2)^{-1/4} \exp(-t^2/(2\sigma^2))$, compute the ambiguity function $Ag(\tau,\gamma)$.

4.4. $^1$ Let $g[n]$ be a window with $L$ non-zero coefficients. For signals of size $N$, describe a fast algorithm that computes the discrete windowed Fourier transform (4.27) with $O(N \log_2 L)$ operations. Implement this algorithm in WaveLab. Hint: Use a fast overlap-add convolution algorithm.

4.5. $^1$ Let $K$ be the reproducing kernel (4.21) of a windowed Fourier transform.
(a) For any $\Phi \in L^2(\mathbb{R}^2)$ we define:
$$ T\Phi(u_0,\xi_0) = \frac{1}{2\pi} \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} \Phi(u,\xi)\, K(u_0,u,\xi_0,\xi)\, du\, d\xi. $$
Prove that $T$ is an orthogonal projector on the space of functions $\Phi(u,\xi)$ that are windowed Fourier transforms of functions in $L^2(\mathbb{R})$.
(b) Suppose that for all $(u,\xi) \in \mathbb{R}^2$ we are given $\tilde S f(u,\xi) = Q\big(Sf(u,\xi)\big)$, which is a quantization of the windowed Fourier coefficients. How can we reduce the norm $L^2(\mathbb{R}^2)$ of the quantization error $\varepsilon(u,\xi) = Sf(u,\xi) - Q\big(Sf(u,\xi)\big)$?
4.6. $^1$ Prove that a scaling function $\phi$ defined by (4.42) satisfies $\|\phi\| = 1$.

4.7. $^2$ Let $\psi$ be a real and even wavelet such that $C = \int_0^{+\infty} \omega^{-1}\, \hat\psi(\omega)\, d\omega < +\infty$. Prove that
$$ \forall f \in L^2(\mathbb{R}), \qquad f(t) = \frac{1}{C} \int_0^{+\infty} Wf(t,s)\, \frac{ds}{s^{3/2}}. \tag{4.149} $$

4.8. $^1$ Analytic Continuation. Let $f \in L^2(\mathbb{R})$ be a function such that $\hat f(\omega) = 0$ for $\omega < 0$. For any complex $z \in \mathbb{C}$ such that $\mathrm{Im}(z) \geq 0$, we define
$$ f^{(p)}(z) = \frac{1}{\pi} \int_0^{+\infty} (i\omega)^p\, \hat f(\omega)\, e^{iz\omega}\, d\omega. $$
(a) Verify that if $f$ is $C^p$ then $f^{(p)}(t)$ is the derivative of order $p$ of $f(t)$.
(b) Prove that if $\mathrm{Im}(z) > 0$, then $f^{(p)}(z)$ is differentiable relative to the complex variable $z$. Such a function is said to be analytic on the upper half complex plane.
(c) Prove that this analytic extension can be written as a wavelet transform
$$ f^{(p)}(x+iy) = y^{-p-1/2}\, Wf(x,y) $$
calculated with an analytic wavelet $\psi$ that you will specify.
4.9. $^1$ Let $f(t) = \cos(a \cos bt)$. We want to compute precisely the instantaneous frequency of $f$ from the ridges of its windowed Fourier transform. Find a necessary condition on the window support as a function of $a$ and $b$. If $f(t) = \cos(a \cos bt) + \cos(a \cos bt + ct)$, find a condition on $a$, $b$ and $c$ in order to measure both instantaneous frequencies with the ridges of a windowed Fourier transform. Verify your calculations with a numerical implementation in WaveLab.

4.10. $^1$ Sound manipulation.
(a) Make a program that synthesizes sounds with the model (4.71) where the amplitudes $a_k$ and phases $\phi_k$ are calculated from the ridges of a windowed Fourier transform or of a wavelet transform. Test your results on the Tweet and Greasy signals in WaveLab.
(b) Make a program that modifies the sound duration with the formula (4.72) or which transposes the sound frequency with (4.73).
4.11. $^1$ Prove that $Pf(u,\xi) = \|f\|^{-2}\, |f(u)|^2\, |\hat f(\xi)|^2$ satisfies the marginal properties (4.123,4.124). Why can't we apply the Wigner Theorem 4.7?

4.12. $^1$ Let $g_\sigma$ be a Gaussian of variance $\sigma^2$. Prove that $P_\theta f(u,\xi) = P_V f \star \theta(u,\xi)$ is a positive distribution if $\theta(u,\xi) = g_\sigma(u)\, g_\beta(\xi)$ with $\sigma\beta \geq 1/2$. Hint: consider a spectrogram calculated with a Gaussian window.

4.13. $^2$ Let $\{g_n(t)\}_{n\in\mathbb{N}}$ be an orthonormal basis of $L^2(\mathbb{R})$. Prove that
$$ \forall (t,\omega) \in \mathbb{R}^2, \qquad \sum_{n=0}^{+\infty} P_V g_n(t,\omega) = 1. $$

4.14. $^2$ Let $f_a(t) = a(t) \exp[i\phi(t)]$ be the analytic part of $f(t)$. Prove that
$$ \int_{-\infty}^{+\infty} \big(\xi - \phi'(t)\big)^2\, P_V f_a(t,\xi)\, d\xi = -\pi\, a^2(t)\, \frac{d^2 \log a(t)}{dt^2}. $$

4.15. $^2$ Quadratic affine time-frequency distributions satisfy time shift (4.115), scaling invariance (4.117), and phase invariance (4.114). Prove that any such distribution can be written as an affine smoothing of the Wigner-Ville distribution
$$ P_\theta(u,\xi) = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} \theta\!\left(\xi\,(u-\tau),\, \frac{\gamma}{\xi}\right) P_V(\tau,\gamma)\, d\tau\, d\gamma \tag{4.150} $$
where $\theta(a,b)$ depends upon dimensionless variables.
4.16. $^3$ To avoid the time-frequency resolution limitations of a windowed Fourier transform, we want to adapt the window size to the signal content. Let $g(t)$ be a window of variance 1. We denote by $S_j f(u,\xi)$ the windowed Fourier transform calculated with the dilated window $g_j(t) = 2^{-j/2}\, g(2^{-j} t)$. Find a procedure that computes a single map of ridges by choosing a "best" window size at each $(u,\xi)$. One approach is to choose the scale $2^l$ for each $(u,\xi)$ such that $|S_l f(u,\xi)|^2 = \sup_j |S_j f(u,\xi)|^2$. Test your algorithm on the linear and hyperbolic chirp signals (4.95,4.99). Test it on the Tweet and Greasy signals in WaveLab.

4.17. $^3$ The sinusoidal model (4.71) is improved for speech signals by adding a "noise component" $B(t)$ to the partials [245]:
$$ F(t) = \sum_{k=1}^{K} a_k(t) \cos \phi_k(t) + B(t). \tag{4.151} $$
Given a signal $f(t)$ that is considered to be a realization of $F(t)$, compute the ridges of a windowed Fourier transform, find the "main" partials and compute their amplitude $a_k$ and phase $\phi_k$. These partials are subtracted from the signal. Over intervals of fixed size, the residue is modeled as the realization of an autoregressive process $B(t)$, of order 10 to 15. Use a standard algorithm to compute the parameters of this autoregressive process [60]. Evaluate the audio quality of the sound restored from the calculated model (4.151). Study an application to audio compression by quantizing and coding the parameters of the model.

Chapter 5
Frames
Frame theory analyzes the completeness, stability and redundancy of linear discrete signal representations. A frame is a family of vectors $\{\phi_n\}_{n\in\Gamma}$ that characterizes any signal $f$ from its inner products $\{\langle f, \phi_n \rangle\}_{n\in\Gamma}$. Signal reconstructions from regular and irregular samplings are examples of applications.

Discrete windowed Fourier transforms and discrete wavelet transforms are studied through the frame formalism. These transforms generate signal representations that are not translation invariant, which raises difficulties for pattern recognition applications. Dyadic wavelet transforms maintain translation invariance by sampling only the scale parameter of a continuous wavelet transform. A fast dyadic wavelet transform is calculated with a filter bank algorithm. In computer vision, dyadic wavelet transforms are used for texture discrimination and edge detection.

5.1 Frame Theory $^2$

5.1.1 Frame Definition and Sampling
The frame theory was originally developed by Duffin and Schaeffer [175] to reconstruct band-limited signals $f$ from irregularly spaced samples $\{f(t_n)\}_{n\in\mathbb{Z}}$. If $f$ has a Fourier transform included in $[-\pi/T, \pi/T]$, we prove as in (3.14) that
$$ f(t_n) = \frac{1}{T}\, \langle f(t), h_T(t-t_n) \rangle \quad\text{with}\quad h_T(t) = \frac{\sin(\pi t/T)}{\pi t/T}. \tag{5.1} $$
This motivated Duffin and Schaeffer to establish general conditions under which one can recover a vector $f$ in a Hilbert space $H$ from its inner products with a family of vectors $\{\phi_n\}_{n\in\Gamma}$. The index set $\Gamma$ might be finite or infinite. The following frame definition gives an energy equivalence to invert the operator $U$ defined by
$$ \forall n \in \Gamma, \qquad Uf[n] = \langle f, \phi_n \rangle. \tag{5.2} $$

Definition 5.1. The sequence $\{\phi_n\}_{n\in\Gamma}$ is a frame of $H$ if there exist two constants $A > 0$ and $B > 0$ such that for any $f \in H$
$$ A\, \|f\|^2 \leq \sum_{n\in\Gamma} |\langle f, \phi_n \rangle|^2 \leq B\, \|f\|^2. \tag{5.3} $$
When $A = B$ the frame is said to be tight.

If the frame condition is satisfied then $U$ is called a frame operator. Section 5.1.2 proves that (5.3) is a necessary and sufficient condition guaranteeing that $U$ is invertible on its image, with a bounded inverse. A frame thus defines a complete and stable signal representation, which may also be redundant. When the frame vectors are normalized, $\|\phi_n\| = 1$, this redundancy is measured by the frame bounds $A$ and $B$. If the $\{\phi_n\}_{n\in\Gamma}$ are linearly independent then it is proved in (5.23) that
$$ A \leq 1 \leq B. $$
The frame is an orthonormal basis if and only if $A = B = 1$. This is verified by inserting $f = \phi_n$ in (5.3). If $A > 1$ then the frame is redundant and $A$ can be interpreted as a minimum redundancy factor.

Example 5.1. Let $(e_1, e_2)$ be an orthonormal basis of a two-dimensional
plane $H$. The three vectors
$$ \phi_1 = e_1, \qquad \phi_2 = -\frac{e_1}{2} + \frac{\sqrt{3}}{2}\, e_2, \qquad \phi_3 = -\frac{e_1}{2} - \frac{\sqrt{3}}{2}\, e_2 $$
have equal angles of $2\pi/3$ between themselves. For any $f \in H$
$$ \sum_{n=1}^{3} |\langle f, \phi_n \rangle|^2 = \frac{3}{2}\, \|f\|^2. $$
These three vectors thus define a tight frame with $A = B = \frac{3}{2}$. The frame bound $\frac{3}{2}$ measures their redundancy in a space of dimension 2.
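The tight-frame property of this example can be verified numerically. The NumPy sketch below takes the three vectors to be $e_1$ and $-e_1/2 \pm (\sqrt{3}/2)\, e_2$, written in the basis $(e_1, e_2)$, and stacks them as the rows of a matrix named `U` (a name chosen here for illustration). The test vector `f` is an arbitrary random choice.

```python
import numpy as np

# Rows of U are the three frame vectors of Example 5.1 in the basis (e1, e2).
U = np.array([
    [1.0, 0.0],
    [-0.5, np.sqrt(3) / 2],
    [-0.5, -np.sqrt(3) / 2],
])

rng = np.random.default_rng(0)
f = rng.standard_normal(2)            # arbitrary test vector

coeffs = U @ f                        # inner products <f, phi_n>
energy_ratio = np.sum(coeffs**2) / np.sum(f**2)
gram = U @ U.T                        # pairwise inner products <phi_n, phi_p>
frame_matrix = U.T @ U                # sum over n of phi_n phi_n^T

print("sum |<f, phi_n>|^2 / ||f||^2 =", energy_ratio)
print("U^T U =\n", frame_matrix)
print("Gram matrix =\n", gram)
```

The matrix `U.T @ U` should come out as a multiple of the identity, which is the finite-dimensional signature of a tight frame, and the off-diagonal entries of the Gram matrix are the cosines of the angles between the unit-norm frame vectors.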
Example 5.2. For any $0 \leq k < K$, suppose that $\{e_{k,n}\}_{n\in\mathbb{Z}}$ is an orthonormal basis of $H$. The union of these $K$ orthonormal bases $\{e_{k,n}\}_{n\in\mathbb{Z},\, 0\leq k<K}$ is a tight frame with $A = B = K$. Indeed, the energy conservation in an orthonormal basis implies that for any $f \in H$,
$$ \sum_{n\in\mathbb{Z}} |\langle f, e_{k,n} \rangle|^2 = \|f\|^2, $$
hence
$$ \sum_{k=0}^{K-1} \sum_{n\in\mathbb{Z}} |\langle f, e_{k,n} \rangle|^2 = K\, \|f\|^2. $$

Example 5.3. One can verify (Problem 5.8) that a finite set of $N$ vectors $\{\phi_n\}_{1\leq n\leq N}$ is always a frame of the space $V$ generated by linear combinations of these vectors. When $N$ increases, the frame bounds $A$ and $B$ may go respectively to 0 and $+\infty$. This illustrates the fact that in infinite dimensional spaces, a family of vectors may be complete and not yield a stable signal representation.

Irregular Sampling. Let $U_T$ be the space of $L^2(\mathbb{R})$ functions whose Fourier transforms have a support included in $[-\pi/T, \pi/T]$. For a uniform sampling, $t_n = nT$, Proposition 3.2 proves that $\{T^{-1/2}\, h_T(t-nT)\}_{n\in\mathbb{Z}}$ is an orthonormal basis of $U_T$. The reconstruction of $f$ from its samples is then given by the sampling Theorem 3.1.
The irregular sampling conditions of Duffin and Schaeffer [175] for constructing a frame were later refined by several researchers [91, 360, 74]. Gröchenig proved [197] that if $\lim_{n\to+\infty} t_n = +\infty$ and $\lim_{n\to-\infty} t_n = -\infty$, and if the maximum sampling distance $\delta$ satisfies
$$ \delta = \sup_{n\in\mathbb{Z}} |t_{n+1} - t_n| < T \tag{5.4} $$
then
$$ \left\{ \sqrt{\frac{t_{n+1} - t_{n-1}}{2T}}\; h_T(t-t_n) \right\}_{n\in\mathbb{Z}} $$
is a frame with frame bounds $A \geq (1-\delta/T)^2$ and $B \leq (1+\delta/T)^2$. The amplitude factor $2^{-1/2}\, (t_{n+1}-t_{n-1})^{1/2}$ compensates for the non-uniformity of the density of samples. It attenuates the amplitude of frame vectors where there is a high density of samples. The reconstruction of $f$ requires inverting the frame operator $Uf[n] = \langle f(t), h_T(t-t_n) \rangle$.

5.1.2 Pseudo Inverse

The reconstruction of $f$ from its frame coefficients $Uf[n]$ is calculated
with a pseudo inverse. This pseudo inverse is a bounded operator that is expressed with a dual frame. We denote
$$ l^2(\Gamma) = \Big\{ x : \|x\|^2 = \sum_{n\in\Gamma} |x[n]|^2 < +\infty \Big\} $$
and by $\mathrm{Im}U$ the image space of all $Uf$ with $f \in H$.

Proposition 5.1. If $\{\phi_n\}_{n\in\Gamma}$ is a frame whose vectors are linearly dependent, then $\mathrm{Im}U$ is strictly included in $l^2(\Gamma)$, and $U$ admits an infinite number of left inverses $\bar U^{-1}$:
$$ \forall f \in H, \qquad \bar U^{-1} U f = f. \tag{5.5} $$

Proof$^2$. The frame inequality (5.3) guarantees that $\mathrm{Im}U \subset l^2(\Gamma)$ since
$$ \|Uf\|^2 = \sum_{n\in\Gamma} |\langle f, \phi_n \rangle|^2 \leq B\, \|f\|^2. \tag{5.6} $$
Since $\{\phi_n\}_{n\in\Gamma}$ is linearly dependent, there exists a non-zero vector $x \in l^2(\Gamma)$ such that
$$ \sum_{n\in\Gamma} x[n]\, \phi_n = 0. $$
For any $f \in H$
$$ \sum_{n\in\Gamma} x^*[n]\, \langle f, \phi_n \rangle = \sum_{n\in\Gamma} x^*[n]\, Uf[n] = 0. $$
This proves that $\mathrm{Im}U$ is orthogonal to $x$ and hence that $\mathrm{Im}U \neq l^2(\Gamma)$.

A frame operator $U$ is injective (one to one). Indeed, the frame inequality (5.3) guarantees that $Uf = 0$ implies $f = 0$. Its restriction to $\mathrm{Im}U$ is thus invertible. Let $\mathrm{Im}U^\perp$ be the orthogonal complement of $\mathrm{Im}U$ in $l^2(\Gamma)$. If $\{\phi_n\}_{n\in\Gamma}$ are linearly dependent then $\mathrm{Im}U^\perp \neq \{0\}$ and the restriction of $\bar U^{-1}$ to $\mathrm{Im}U^\perp$ may be any arbitrary linear operator.

The more redundant the frame $\{\phi_n\}_{n\in\Gamma}$, the larger the orthogonal complement $\mathrm{Im}U^\perp$ of the image $\mathrm{Im}U$. The pseudo inverse $\tilde U^{-1}$ is the left inverse that is zero on $\mathrm{Im}U^\perp$:
$$ \forall x \in \mathrm{Im}U^\perp, \qquad \tilde U^{-1} x = 0. $$
In infinite dimensional spaces, the pseudo inverse $\tilde U^{-1}$ of an injective operator is not necessarily bounded. This induces numerical instabilities when trying to reconstruct $f$ from $Uf$. The following theorem proves that a frame operator has a pseudo inverse that is always bounded. We denote by $U^*$ the adjoint of $U$: $\langle Uf, x \rangle = \langle f, U^* x \rangle$.

Theorem 5.1 (Pseudo inverse). The pseudo inverse satisfies
$$ \tilde U^{-1} = (U^* U)^{-1}\, U^*. \tag{5.7} $$
It is the left inverse of minimum sup norm. If $U$ is a frame operator with frame bounds $A$ and $B$ then
$$ \|\tilde U^{-1}\|_S \leq \frac{1}{\sqrt{A}}. \tag{5.8} $$

Proof$^2$. To prove that $\tilde U^{-1}$ has a minimum sup norm, let us decompose any $x \in l^2(\Gamma)$ as a sum $x = x_1 + x_2$ with $x_2 \in \mathrm{Im}U^\perp$ and $x_1 \in \mathrm{Im}U$. Let $\bar U^{-1}$ be an arbitrary left inverse of $U$. Then
$$ \frac{\|\tilde U^{-1} x\|}{\|x\|} = \frac{\|\tilde U^{-1} x_1\|}{\|x\|} = \frac{\|\bar U^{-1} x_1\|}{\|x\|} \leq \frac{\|\bar U^{-1} x_1\|}{\|x_1\|}. $$
We thus derive that
$$ \|\tilde U^{-1}\|_S = \sup_{x \in l^2(\Gamma)-\{0\}} \frac{\|\tilde U^{-1} x\|}{\|x\|} \leq \sup_{x \in l^2(\Gamma)-\{0\}} \frac{\|\bar U^{-1} x\|}{\|x\|} = \|\bar U^{-1}\|_S. $$
Since $x_1 \in \mathrm{Im}U$, there exists $f \in H$ such that $x_1 = Uf$. The inequality (5.8) is derived from the frame inequality (5.3) which shows that
$$ \|\tilde U^{-1} x\| = \|f\| \leq \frac{1}{\sqrt{A}}\, \|Uf\| \leq \frac{1}{\sqrt{A}}\, \|x\|. $$
To verify (5.7), we first prove that the self-adjoint operator $U^* U$ is invertible by showing that it is injective and surjective (onto). If $U^* U f = 0$ then $\langle U^* U f, f \rangle = 0$ and hence $\langle Uf, Uf \rangle = 0$. Since $U$ is injective then $f = 0$, which proves that $U^* U$ is injective. To prove that the image of $U^* U$ is equal to $H$ we prove that no non-zero vector can be orthogonal to this image. Suppose that $g \in H$ is orthogonal to the image of $U^* U$. In particular $\langle g, U^* U g \rangle = 0$, so $\langle Ug, Ug \rangle = 0$, which implies that $g = 0$. This proves that $U^* U$ is surjective.

Since $U^* U$ is invertible, proving (5.7) is equivalent to showing that for any $x$ the pseudo inverse satisfies
$$ (U^* U)\, \tilde U^{-1} x = U^* x. \tag{5.9} $$
If $x \in \mathrm{Im}U^\perp$ then $(U^* U)\, \tilde U^{-1} x = 0$ because $\tilde U^{-1} x = 0$, and $U^* x = 0$ because
$$ \forall f \in H, \qquad \langle f, U^* x \rangle = \langle Uf, x \rangle = 0. $$
It thus verifies (5.9) for $x \in \mathrm{Im}U^\perp$. If $x \in \mathrm{Im}U$, then $U \tilde U^{-1} x = x$ so (5.9) remains valid. We thus derive that (5.9) is satisfied for all $x \in l^2(\Gamma)$.

Dual Frame. The pseudo inverse of a frame operator is related to a dual frame family, which is specified by the following theorem.
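Theorem 5.2. Let $\{\phi_n\}_{n\in\Gamma}$ be a frame with bounds $A$, $B$. The dual frame defined by
$$ \tilde\phi_n = (U^* U)^{-1}\, \phi_n $$
satisfies
$$ \forall f \in H, \qquad \frac{1}{B}\, \|f\|^2 \leq \sum_{n\in\Gamma} |\langle f, \tilde\phi_n \rangle|^2 \leq \frac{1}{A}\, \|f\|^2 \tag{5.10} $$
and
$$ f = \tilde U^{-1} U f = \sum_{n\in\Gamma} \langle f, \phi_n \rangle\, \tilde\phi_n = \sum_{n\in\Gamma} \langle f, \tilde\phi_n \rangle\, \phi_n. \tag{5.11} $$
If the frame is tight (i.e., $A = B$), then $\tilde\phi_n = A^{-1}\, \phi_n$.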
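Proof$^2$. To prove (5.11), we relate $U^*$ to $\{\phi_n\}_{n\in\Gamma}$ and use the expression (5.7) of $\tilde U^{-1}$. For any $x \in l^2(\Gamma)$ and $f \in H$
$$ \langle U^* x, f \rangle = \langle x, Uf \rangle = \sum_{n\in\Gamma} x[n]\, \langle f, \phi_n \rangle^*. $$
Consequently
$$ \langle U^* x, f \rangle = \sum_{n\in\Gamma} \langle x[n]\, \phi_n, f \rangle, $$
which implies that
$$ U^* x = \sum_{n\in\Gamma} x[n]\, \phi_n. \tag{5.12} $$
The pseudo inverse formula (5.7) proves that
$$ \tilde U^{-1} x = (U^* U)^{-1}\, U^* x = (U^* U)^{-1} \sum_{n\in\Gamma} x[n]\, \phi_n $$
so
$$ \tilde U^{-1} x = \sum_{n\in\Gamma} x[n]\, \tilde\phi_n. \tag{5.13} $$
If $x[n] = Uf[n] = \langle f, \phi_n \rangle$ then
$$ f = \tilde U^{-1} U f = \sum_{n\in\Gamma} \langle f, \phi_n \rangle\, \tilde\phi_n. \tag{5.14} $$
The dual families of vectors $\{\phi_n\}_{n\in\Gamma}$ and $\{\tilde\phi_n\}_{n\in\Gamma}$ play symmetrical roles. Indeed (5.14) implies that for any $f$ and $g$ in $H$,
$$ \langle f, g \rangle = \sum_{n\in\Gamma} \langle f, \phi_n \rangle\, \langle \tilde\phi_n, g \rangle, \tag{5.15} $$
hence
$$ g = \sum_{n\in\Gamma} \langle g, \tilde\phi_n \rangle\, \phi_n, \tag{5.16} $$
which proves (5.11).

The expression (5.12) of $U^*$ proves that for $x[n] = Uf[n] = \langle f, \phi_n \rangle$
$$ U^* U f = \sum_{n\in\Gamma} \langle f, \phi_n \rangle\, \phi_n. \tag{5.17} $$
The frame condition (5.3) can thus be rewritten
$$ A\, \|f\|^2 \leq \langle U^* U f, f \rangle \leq B\, \|f\|^2. \tag{5.18} $$
If $A = B$ then $\langle U^* U f, f \rangle = A\, \|f\|^2$. Since $U^* U$ is symmetrical, one can show that necessarily $U^* U = A\, \mathrm{Id}$ where $\mathrm{Id}$ is the identity operator. It thus follows that $\tilde\phi_n = (U^* U)^{-1} \phi_n = A^{-1} \phi_n$.

Similarly (5.10) can be rewritten
$$ \frac{1}{B}\, \|f\|^2 \leq \langle (U^* U)^{-1} f, f \rangle \leq \frac{1}{A}\, \|f\|^2 \tag{5.19} $$
because
$$ (U^* U)^{-1} f = \sum_{n\in\Gamma} \langle f, \tilde\phi_n \rangle\, (U^* U)^{-1} \phi_n = \sum_{n\in\Gamma} \langle f, \tilde\phi_n \rangle\, \tilde\phi_n. $$
The double inequality (5.19) is derived from (5.18) by applying the following lemma to $L = U^* U$.

Lemma 5.1. If $L$ is a self-adjoint operator such that there exist $A > 0$ and $B$ satisfying
$$ \forall f \in H, \qquad A\, \|f\|^2 \leq \langle Lf, f \rangle \leq B\, \|f\|^2 \tag{5.20} $$
then $L$ is invertible and
$$ \forall f \in H, \qquad \frac{1}{B}\, \|f\|^2 \leq \langle L^{-1} f, f \rangle \leq \frac{1}{A}\, \|f\|^2. \tag{5.21} $$
In finite dimensions, since $L$ is self-adjoint we know that it is diagonalized in an orthonormal basis. The inequality (5.20) proves that its eigenvalues are between $A$ and $B$. It is therefore invertible with eigenvalues between $B^{-1}$ and $A^{-1}$, which proves (5.21). In infinite dimensions, the proof is left to the reader.

This theorem proves that $\{\tilde\phi_n\}_{n\in\Gamma}$ is a dual frame that recovers any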
f 2 H from its frame coe cients fhf nign2;. If the frame is tight
then ~n = A;1 n, so the reconstruction formula becomes
1X
f = A hf ni n:
(5.22)
n2; 5.1. FRAME THEORY 187 Biorthogonal Bases A Riesz basis is a frame of vectors that are
linearly independent, which implies that $\mathrm{Im}U = l^2(\Gamma)$. One can derive from (5.11) that the dual frame $\{\tilde\phi_n\}_{n\in\Gamma}$ is also linearly independent. It is called the dual Riesz basis. Inserting $f = \phi_p$ in (5.11) yields
$$\phi_p = \sum_{n\in\Gamma} \langle \phi_p, \tilde\phi_n\rangle\,\phi_n$$
and the linear independence implies that
$$\langle \phi_p, \tilde\phi_n\rangle = \delta[p - n].$$
Dual Riesz bases are thus biorthogonal families of vectors. If the basis is normalized (i.e., $\|\phi_n\| = 1$), then
$$A \le 1 \le B. \tag{5.23}$$
This is proved by inserting $f = \phi_p$ in the frame inequality (5.10):
$$\frac{1}{B}\,\|\phi_p\|^2 \le \sum_{n\in\Gamma} |\langle \phi_p, \tilde\phi_n\rangle|^2 = 1 \le \frac{1}{A}\,\|\phi_p\|^2.$$

Partial Reconstruction Suppose that $\{\phi_n\}_{n\in\Gamma}$ is a frame of a subspace $\mathbf{V}$ of the whole signal space. The inner products $Uf[n] = \langle f, \phi_n\rangle$ give partial information on $f$ that does not allow us to fully recover
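$f$. The best linear mean-square approximation of $f$ computed from these inner products is the orthogonal projection of $f$ on the space $\mathbf{V}$. This orthogonal projection is computed with the dual frame $\{\tilde\phi_n\}_{n\in\Gamma}$ of $\{\phi_n\}_{n\in\Gamma}$ in $\mathbf{V}$:
$$P_{\mathbf{V}} f = \tilde U^{-1} U f = \sum_{n\in\Gamma} \langle f, \phi_n\rangle\,\tilde\phi_n. \tag{5.24}$$
To prove that $P_{\mathbf{V}} f$ is the orthogonal projection in $\mathbf{V}$, we verify that $P_{\mathbf{V}} f \in \mathbf{V}$ and that $\langle f - P_{\mathbf{V}} f, \phi_p\rangle = 0$ for all $p \in \Gamma$. Indeed,
$$\langle f - P_{\mathbf{V}} f, \phi_p\rangle = \langle f, \phi_p\rangle - \sum_{n\in\Gamma} \langle f, \phi_n\rangle\,\langle \tilde\phi_n, \phi_p\rangle$$
and the dual frame property in $\mathbf{V}$ implies that
$$\sum_{n\in\Gamma} \langle \tilde\phi_n, \phi_p\rangle\,\phi_n = \phi_p.$$
Suppose we have a finite number of data measures $\{\langle f, \phi_n\rangle\}_{0 \le n < N}$. Since a finite family $\{\phi_n\}_{0 \le n < N}$ is necessarily a frame of the space $\mathbf{V}$ it generates, the approximation formula (5.24) reconstructs the best linear approximation of $f$.

5.1.3 Inverse Frame Computations

We describe efficient numerical algorithms to recover a signal $f$ from its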
frame coefficients $Uf[n] = \langle f, \phi_n\rangle$. If possible, the dual frame vectors are precomputed:
$$\tilde\phi_n = (U^*U)^{-1}\phi_n$$
and we recover each $f$ with the sum
$$f = \sum_{n\in\Gamma} \langle f, \phi_n\rangle\,\tilde\phi_n.$$
In some applications, the frame vectors $\{\phi_n\}_{n\in\Gamma}$ may depend on the signal $f$, in which case the dual frame vectors $\tilde\phi_n$ cannot be computed in advance. For example, the frame (5.1) associated to an irregular sampling depends on the position $t_n$ of each sample. If the sampling grid varies from signal to signal it modifies the frame vectors. It is then highly inefficient to compute the dual frame for each new signal. A more direct approach applies the pseudo inverse to $Uf$:
$$f = \tilde U^{-1} U f = (U^*U)^{-1}(U^*U) f = L^{-1} L f \tag{5.25}$$
where
$$Lf = U^*U f = \sum_{n\in\Gamma} \langle f, \phi_n\rangle\,\phi_n. \tag{5.26}$$
Whether we precompute the dual frame vectors or apply the pseudo inverse on the frame data, both approaches require an efficient way to compute $f = L^{-1} g$ for some $g \in \mathbf{H}$. Theorems 5.3 and 5.4 describe two iterative algorithms with exponential convergence. The extrapolated Richardson procedure is simpler but requires knowing the frame bounds $A$ and $B$. Conjugate gradient iterations converge more quickly when $B/A$ is large, and do not require knowing the values of $A$ and $B$.

Theorem 5.3 (Extrapolated Richardson) Let $g \in \mathbf{H}$. To compute $f = L^{-1} g$ we initialize $f_0 = 0$. Let $\gamma > 0$ be a relaxation parameter.
For any $n > 0$, define
$$f_n = f_{n-1} + \gamma\,(g - L f_{n-1}). \tag{5.27}$$
If
$$\delta = \max\{|1 - \gamma A|, |1 - \gamma B|\} < 1 \tag{5.28}$$
then
$$\|f - f_n\| \le \delta^n\,\|f\| \tag{5.29}$$
and hence $\lim_{n\to+\infty} f_n = f$.

Proof $^2$. The induction equation (5.27) can be rewritten
$$f - f_n = f - f_{n-1} - \gamma L (f - f_{n-1}).$$
Let $R = \mathrm{Id} - \gamma L$,
$$f - f_n = R(f - f_{n-1}) = R^n (f - f_0) = R^n (f). \tag{5.30}$$
We saw in (5.18) that the frame inequality can be rewritten
$$A\,\|f\|^2 \le \langle Lf, f\rangle \le B\,\|f\|^2.$$
This implies that $R = \mathrm{Id} - \gamma L$ satisfies
$$|\langle Rf, f\rangle| \le \delta\,\|f\|^2$$
where $\delta$ is given by (5.28). Since $R$ is symmetric, this inequality proves that $\|R\| \le \delta$. We thus derive (5.29) from (5.30). The error $\|f - f_n\|$ clearly converges to zero if $\delta < 1$.

For frame inversion, the extrapolated Richardson algorithm is sometimes called the frame algorithm [21]. The convergence rate is maximized when $\delta$ is minimum:
$$\delta = \frac{B - A}{B + A} = \frac{1 - A/B}{1 + A/B}$$
which corresponds to the relaxation parameter
$$\gamma = \frac{2}{A + B}.$$
The algorithm converges quickly if $A/B$ is close to 1. If $A/B$ is small then
$$\delta \approx 1 - 2\,\frac{A}{B}. \tag{5.31}$$
The inequality (5.29) proves that we obtain an error smaller than $\epsilon$ for a number $n$ of iterations, which satisfies:
$$\frac{\|f - f_n\|}{\|f\|} \le \delta^n = \epsilon.$$
Inserting (5.31) gives
$$n \approx \frac{\log_e \epsilon}{\log_e (1 - 2A/B)} \approx \frac{-B}{2A}\,\log_e \epsilon. \tag{5.32}$$
The number of iterations thus increases proportionally to the frame bound ratio $B/A$.

The exact values of $A$ and $B$ are often not known, in which case the relaxation parameter $\gamma$ must be estimated numerically by trial and error. If an upper bound $B_0$ of $B$ is known then we can choose $\gamma = 1/B_0$. The algorithm is guaranteed to converge, but the convergence rate depends on $A$.
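The iteration (5.27) is short enough to sketch directly. The example below assumes a random frame of $\mathbb{R}^N$, so that $L = U^*U$ is a small symmetric matrix whose extreme eigenvalues give $A$ and $B$; the sizes and variable names are illustrative assumptions. It runs the extrapolated Richardson iteration with $\gamma = 2/(A+B)$ and records the error next to the bound $\delta^n \|f\|$ of (5.29).

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 8, 24                          # illustrative sizes: M frame vectors in R^N
Phi = rng.standard_normal((M, N))     # row n holds phi_n
L = Phi.T @ Phi                       # L = U*U
eigs = np.linalg.eigvalsh(L)
A, B = eigs[0], eigs[-1]              # frame bounds of this finite frame

f_true = rng.standard_normal(N)
g = L @ f_true                        # we want to recover f = L^{-1} g

gamma = 2.0 / (A + B)                 # relaxation parameter minimizing delta
delta = (B - A) / (B + A)

f_n = np.zeros(N)                     # f_0 = 0
errors, bounds = [], []
for n in range(1, 301):
    f_n = f_n + gamma * (g - L @ f_n)             # iteration (5.27)
    errors.append(np.linalg.norm(f_true - f_n))
    bounds.append(delta ** n * np.linalg.norm(f_true))   # right-hand side of (5.29)
```

Each entry of `errors` should stay below the matching entry of `bounds`, up to rounding. When $A$ and $B$ are not available, replacing `gamma` by `1.0 / B0` for some upper bound `B0` of `B` keeps the iteration convergent, as stated above, at the price of a slower rate.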
The conjugate gradient algorithm computes $f = L^{-1} g$ with a gradient descent along orthogonal directions with respect to the norm induced by the symmetric operator $L$:
$$\|f\|_L^2 = \|Lf\|^2. \tag{5.33}$$
This $L$ norm is used to estimate the error. Gröchenig's [198] implementation of the conjugate gradient algorithm is given by the following theorem.

Theorem 5.4 (Conjugate gradient) Let $g \in \mathbf{H}$. To compute $f = L^{-1} g$ we initialize
$$f_0 = 0, \quad r_0 = p_0 = g, \quad p_{-1} = 0. \tag{5.34}$$
For any $n \ge 0$, we define by induction
$$\lambda_n = \frac{\langle r_n, p_n\rangle}{\langle p_n, L p_n\rangle} \tag{5.35}$$
$$f_{n+1} = f_n + \lambda_n\,p_n \tag{5.36}$$
$$r_{n+1} = r_n - \lambda_n\,L p_n \tag{5.37}$$
$$p_{n+1} = L p_n - \frac{\langle L p_n, L p_n\rangle}{\langle p_n, L p_n\rangle}\,p_n - \frac{\langle L p_n, L p_{n-1}\rangle}{\langle p_{n-1}, L p_{n-1}\rangle}\,p_{n-1}. \tag{5.38}$$
If $\sigma = \frac{\sqrt{B} - \sqrt{A}}{\sqrt{B} + \sqrt{A}}$ then
$$\|f - f_n\|_L \le \frac{2\,\sigma^n}{1 + \sigma^{2n}}\,\|f\|_L \tag{5.39}$$
and hence $\lim_{n\to+\infty} f_n = f$.

Proof $^2$. We give the main steps of the proof as outlined by Gröchenig [198].

Step 1: Let $\mathbf{U}_n$ be the subspace generated by $\{L^j f\}_{1 \le j \le n}$. By induction on $n$, we derive from (5.38) that $p_j \in \mathbf{U}_n$, for $j < n$.

Step 2: We prove by induction that $\{p_j\}_{0 < j < n}$ is an orthogonal basis of $\mathbf{U}_n$ with respect to the inner product $\langle f, h\rangle_L = \langle f, Lh\rangle$. Assuming that $\langle p_n, L p_j\rangle = 0$, for $j \le n - 1$, it can be shown that $\langle p_{n+1}, L p_j\rangle = 0$, for $j \le n$.

Step 3: We verify that $f_n$ is the orthogonal projection of $f$ onto $\mathbf{U}_n$ with respect to $\langle \cdot, \cdot\rangle_L$, which means that
$$\forall g \in \mathbf{U}_n, \quad \|f - g\|_L \ge \|f - f_n\|_L.$$
Since $f_n \in \mathbf{U}_n$, this requires proving that $\langle f - f_n, p_j\rangle_L = 0$, for $j < n$.

Step 4: We compute the orthogonal projection of $f$ in embedded spaces $\mathbf{U}_n$ of dimension $n$, and one can verify that $\lim_{n\to+\infty} \|f - f_n\|_L = 0$. The exponential convergence (5.39) is proved in [198].

As in the extrapolated Richardson algorithm, the convergence is slower when $A/B$ is small. In this case
$$\sigma = \frac{1 - \sqrt{A/B}}{1 + \sqrt{A/B}} \approx 1 - 2\sqrt{\frac{A}{B}}.$$
The upper bound (5.39) proves that we obtain a relative error
$$\frac{\|f - f_n\|_L}{\|f\|_L} \le \epsilon$$
for a number of iterations
$$n \approx \frac{\log_e \frac{\epsilon}{2}}{\log_e \sigma} \approx \frac{-\sqrt{B}}{2\sqrt{A}}\,\log_e \frac{\epsilon}{2}.$$
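The recursion (5.34)-(5.38) of Theorem 5.4 can be transcribed almost line by line. The sketch below again assumes a small random frame of $\mathbb{R}^N$ so that $L = U^*U$ is an explicit matrix; the stopping tolerance, the iteration cap and the variable names are assumptions of this example, not part of the theorem. The term in $p_{n-1}$ is skipped at $n = 0$ because $p_{-1} = 0$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 8, 24                          # illustrative sizes
Phi = rng.standard_normal((M, N))     # row n holds phi_n
L = Phi.T @ Phi                       # L = U*U

f_true = rng.standard_normal(N)
g = L @ f_true                        # data: g = L f

# Initialization (5.34): f_0 = 0, r_0 = p_0 = g, p_{-1} = 0.
f_n = np.zeros(N)
r = g.copy()
p = g.copy()
p_prev = np.zeros(N)
Lp_prev = np.zeros(N)
iterations = 0
for n in range(50):
    if np.linalg.norm(r) <= 1e-10 * np.linalg.norm(g):
        break                         # residual small enough (assumed tolerance)
    Lp = L @ p
    pLp = p @ Lp
    lam = (r @ p) / pLp               # (5.35)
    f_n = f_n + lam * p               # (5.36)
    r = r - lam * Lp                  # (5.37)
    p_next = Lp - (Lp @ Lp) / pLp * p # first two terms of (5.38)
    if n > 0:                         # third term of (5.38); it vanishes when p_{n-1} = 0
        p_next = p_next - (Lp @ Lp_prev) / (p_prev @ Lp_prev) * p_prev
    p_prev, Lp_prev, p = p, Lp, p_next
    iterations += 1

rel_error = np.linalg.norm(f_true - f_n) / np.linalg.norm(f_true)
```

Note that the loop never uses the frame bounds: only products with $L$ and inner products are needed, which is the practical advantage over the Richardson iteration mentioned in the text.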
Comparing this result with (5.32) shows that when $A/B$ is small, the conjugate gradient algorithm needs many fewer iterations than the extrapolated Richardson algorithm to compute $f = L^{-1} g$ at a fixed precision.

5.1.4 Frame Projector and Noise Reduction

Frame redundancy is useful in reducing noise added to the frame coefficients. The vector computed with noisy frame coefficients is projected on the image of $U$ to reduce the amplitude of the noise. This technique is used for high precision analog to digital conversion based on oversampling. The following proposition specifies the orthogonal projector on $\mathrm{Im}U$.

Proposition 5.2 The orthogonal projection from $l^2(\Gamma)$ onto $\mathrm{Im}U$ is
$$Px[n] = U \tilde U^{-1} x[n] = \sum_{p\in\Gamma} x[p]\,\langle \tilde\phi_p, \phi_n\rangle. \tag{5.40}$$

Proof $^2$. If $x \in \mathrm{Im}U$ then $x = Uf$ and
$$Px = U \tilde U^{-1} U f = Uf = x.$$
If $x \in \mathrm{Im}U^\perp$ then $Px = 0$ because $\tilde U^{-1} x = 0$. This proves that $P$ is an orthogonal projector on $\mathrm{Im}U$. Since $Uf[n] = \langle f, \phi_n\rangle$ and $\tilde U^{-1} x = \sum_{p\in\Gamma} x[p]\,\tilde\phi_p$, we derive (5.40).

A vector $x[n]$ is a sequence of frame coefficients if and only if $x = Px$, which means that $x$ satisfies the reproducing kernel equation
$$x[n] = \sum_{p\in\Gamma} x[p]\,\langle \tilde\phi_p, \phi_n\rangle. \tag{5.41}$$
This equation generalizes the reproducing kernel properties (4.20) and
(4.40) of windowed Fourier transforms and wavelet transforms.

Noise Reduction Suppose that each frame coefficient $Uf[n]$ is contaminated by an additive noise $W[n]$, which is a random variable. Applying the projector $P$ gives
$$P(Uf + W) = Uf + PW$$
with
$$PW[n] = \sum_{p\in\Gamma} W[p]\,\langle \tilde\phi_p, \phi_n\rangle.$$
Since $P$ is an orthogonal projector, $\|PW\| \le \|W\|$. This projector removes the component of $W$ that is in $\mathrm{Im}U^\perp$. Increasing the redundancy of the frame reduces the size of $\mathrm{Im}U$ and thus increases $\mathrm{Im}U^\perp$, so a larger portion of the noise is removed. If $W$ is a white noise, its energy is uniformly distributed in the space $l^2(\Gamma)$. The following proposition proves that its energy is reduced by at least a factor $A$ if the frame vectors are normalized.

Proposition 5.3 Suppose that $\|\phi_n\| = C$, for all $n \in \Gamma$. If $W$ is a zero-mean white noise of variance $E\{|W[n]|^2\} = \sigma^2$, then
$$E\{|PW[n]|^2\} \le \frac{\sigma^2 C^2}{A}. \tag{5.42}$$
If the frame is tight then this inequality is an equality.

Proof $^2$. Let us compute
$$E\{|PW[n]|^2\} = E\left\{ \left( \sum_{p\in\Gamma} W[p]\,\langle \tilde\phi_p, \phi_n\rangle \right) \left( \sum_{l\in\Gamma} W^*[l]\,\langle \tilde\phi_l, \phi_n\rangle^* \right) \right\}.$$
Since $W$ is white,
$$E\{W[p]\,W^*[l]\} = \sigma^2\,\delta[p - l]$$
and therefore
$$E\{|PW[n]|^2\} = \sigma^2 \sum_{p\in\Gamma} |\langle \tilde\phi_p, \phi_n\rangle|^2 \le \frac{\sigma^2\,\|\phi_n\|^2}{A} = \frac{\sigma^2 C^2}{A}.$$
The last inequality is an equality if the frame is tight.

Oversampling This noise reduction strategy is used by high precision analog to digital converters. After a low-pass filter, a band-limited analog signal $f(t)$ is uniformly sampled and quantized. In hardware, it is often easier to increase the sampling rate rather than the quantization precision. Increasing the sampling rate introduces a redundancy between the sample values of the band-limited signal. For a wide range of signals, it has been shown that the quantization error is nearly a white noise [194]. It can thus be significantly reduced by a frame projector.
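The equality case of Proposition 5.3 can be observed on a finite tight frame. The sketch below is an illustration under stated assumptions: it replaces the band-limited setting by a union of $K$ random orthonormal bases of $\mathbb{R}^N$, which is a tight frame with unit-norm vectors ($C = 1$) and $A = B = K$. All sizes and names are hypothetical. It builds the projector $P = U (U^*U)^{-1} U^*$ and compares the average energy of projected white noise with the prediction $\sigma^2 C^2 / A = \sigma^2 / K$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 16, 4                          # signal dimension and redundancy (illustrative)

# Stack K random orthonormal bases of R^N: the rows of Phi form a tight frame
# with unit-norm vectors (C = 1) and frame bound A = B = K.
bases = [np.linalg.qr(rng.standard_normal((N, N)))[0].T for _ in range(K)]
Phi = np.vstack(bases)                # shape (K*N, N); row n holds phi_n

L = Phi.T @ Phi                       # U*U, equal to K * Id for this frame
P = Phi @ np.linalg.solve(L, Phi.T)   # P = U (U*U)^{-1} U*, orthogonal projector on Im U

sigma = 1.0
trials = 5000
W = sigma * rng.standard_normal((trials, K * N))   # white noise realizations, one per row
PW = W @ P                            # P is symmetric, so row i is P applied to W[i]
empirical = np.mean(PW ** 2)          # average of |PW[n]|^2 over n and over trials
predicted = sigma ** 2 / K            # sigma^2 C^2 / A with C = 1 and A = K
```

The diagonal of `P` equals $\sum_p |\langle \tilde\phi_p, \phi_n\rangle|^2$, so for this tight frame every diagonal entry should be $1/K$, and `empirical` should be close to `predicted`.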
After the low-pass filtering, $f$ belongs to the space $\mathbf{U}_T$ of functions whose Fourier transforms have their support included in $[-\pi/T, \pi/T]$. The Whittaker sampling Theorem 3.1 guarantees perfect reconstruction with a sampling interval $T$, but $f$ is oversampled with an interval $T_0 = T/K$ that provides $K$ times more coefficients. We verify that the frame projector is then a low-pass filter that reduces by $K$ the energy of the quantization noise.

Proposition 3.2 proves that
$$f(nT_0) = \frac{1}{T}\,\langle f(t), h_T(t - nT_0)\rangle \quad \text{with} \quad h_T(t) = \frac{\sin(\pi t/T)}{\pi t/T}$$
and for each $1 \le k \le K$ the family $\{h_T(t - kT/K - nT)\}_{n\in\mathbb{Z}}$ is an orthogonal basis of $\mathbf{U}_T$. As a consequence
$$\left\{ \phi_n(t) = h_T(t - nT_0) \right\}_{n\in\mathbb{Z}} = \left\{ h_T\!\left(t - k\,\frac{T}{K} - nT\right) \right\}_{1 \le k \le K,\ n\in\mathbb{Z}}$$
is a union of $K$ orthogonal bases, with vectors having a square norm $C^2 = T$. It is therefore a tight frame of $\mathbf{U}_T$ with $A = B = KT = K^2 T_0$. Proposition 5.3 proves that the frame projector $P$ reduces the energy of the quantization white noise $W$ of variance $\sigma^2$ by a factor $K$:
$$E\{|PW[n]|^2\} = \frac{\sigma^2 C^2}{A} = \frac{\sigma^2}{K}. \tag{5.43}$$
The frame $\{\phi_n(t)\}_{n\in\mathbb{Z}}$ is tight so $\tilde\phi_n = \frac{1}{A}\,\phi_n = \frac{1}{KT}\,\phi_n$ and (5.40) implies that
$$Px[n] = \frac{1}{KT} \sum_{p=-\infty}^{+\infty} x[p]\,\langle h_T(t - pT_0), h_T(t - nT_0)\rangle.$$
This orthogonal projector can thus be rewritten as the convolution
$$Px[n] = x \star h_0[n] \quad \text{with} \quad h_0[n] = \frac{1}{KT}\,\langle h_T(t), h_T(t - nT_0)\rangle.$$
One can verify that $h_0$ is an ideal low-pass filter whose transfer function has a restriction to $[-\pi, \pi]$ defined by $\hat h_0 = \mathbf{1}_{[-\pi/K, \pi/K]}$. In this case $\mathrm{Im}U$ is simply the space of discrete signals whose Fourier transforms have a restriction to $[-\pi, \pi]$ which is non-zero only in $[-\pi/K, \pi/K]$.
The noise can be further reduced if it is not white and its energy is better concentrated in $\mathrm{Im}U^\perp$. This can be done by transforming the quantization noise into a noise whose energy is mostly concentrated at high frequencies. Sigma-Delta modulators produce such quantization noises by integrating the signal before its quantization [82]. To compensate for the integration, the quantized signal is differentiated. This differentiation increases the energy of the quantized noise at high frequencies and reduces its energy at low frequencies. The low-pass filter $h_0$ thus further reduces the energy of the quantized noise. Several levels of integration and differentiation can be used to better concentrate the quantization noise in the high frequencies, which further reduces its energy after the filtering by $h_0$ [330].

This oversampling example is analyzed just as well without the frame formalism because the projector is a simple convolution. However, the frame approach is more general and applies to noise removal in more complicated representations such as irregularly oversampled signals or redundant windowed Fourier and wavelet frames [329].

5.2 Windowed Fourier Frames $^2$
Frame theory gives conditions for discretizing the windowed Fourier transform while retaining a complete and stable representation. The windowed Fourier transform of $f \in L^2(\mathbb{R})$ is defined in Section 4.2 by
$$Sf(u, \xi) = \langle f, g_{u,\xi}\rangle$$
with
$$g_{u,\xi}(t) = g(t - u)\,e^{i\xi t}.$$
Setting $\|g\| = 1$ implies that $\|g_{u,\xi}\| = 1$. A discrete windowed Fourier transform representation
$$\{Sf(u_n, \xi_k) = \langle f, g_{u_n,\xi_k}\rangle\}_{(n,k)\in\mathbb{Z}^2}$$
is complete and stable if $\{g_{u_n,\xi_k}\}_{(n,k)\in\mathbb{Z}^2}$ is a frame of $L^2(\mathbb{R})$.

Intuitively, one can expect that the discrete windowed Fourier transform is complete if the Heisenberg boxes of all atoms $\{g_{u_n,\xi_k}\}_{(n,k)\in\mathbb{Z}^2}$ fully cover the time-frequency plane. Section 4.2 shows that the Heisenberg box of $g_{u_n,\xi_k}$ is centered in the time-frequency plane at $(u_n, \xi_k)$. Its size is independent of $u_n$ and $\xi_k$. It depends on the time-frequency spread of the window $g$. A complete cover of the plane is thus obtained by translating these boxes over a uniform rectangular grid, as illustrated in Figure 5.1. The time and frequency parameters $(u, \xi)$ are discretized over a rectangular grid with time and frequency intervals of size $u_0$ and $\xi_0$. Let us denote
$$g_{n,k}(t) = g(t - n u_0)\,\exp(i k \xi_0 t).$$
The sampling intervals $(u_0, \xi_0)$ must be adjusted to the time-frequency spread of $g$.

Window Scaling Suppose that $\{g_{n,k}\}_{(n,k)\in\mathbb{Z}^2}$ is a frame of $L^2(\mathbb{R})$ with
frame bounds $A$ and $B$. Let us dilate the window $g_s(t) = s^{-1/2} g(t/s)$. It increases by $s$ the time width of the Heisenberg box of $g$ and reduces by $s$ its frequency width. We thus obtain the same cover of the time-frequency plane by increasing $u_0$ by $s$ and reducing $\xi_0$ by $s$. Let
$$g_{s,n,k}(t) = g_s(t - n s u_0)\,\exp\!\left(i k\,\frac{\xi_0}{s}\,t\right). \tag{5.44}$$
We prove that $\{g_{s,n,k}\}_{(n,k)\in\mathbb{Z}^2}$ satisfies the same frame inequalities as $\{g_{n,k}\}_{(n,k)\in\mathbb{Z}^2}$, with the same frame bounds $A$ and $B$, by a change of variable $t' = ts$ in the inner product integrals.

Figure 5.1: A windowed Fourier frame is obtained by covering the time-frequency plane with a regular grid of windowed Fourier atoms, translated by $u_n = n u_0$ in time and by $\xi_k = k \xi_0$ in frequency.

Necessary Conditions Daubechies [21] proved several necessary conditions on $g$, $u_0$ and $\xi_0$ to guarantee that $\{g_{n,k}\}_{(n,k)\in\mathbb{Z}^2}$ is a frame of $L^2(\mathbb{R})$. We do not reproduce the proofs, but summarize the main results.
Theorem 5.5 (Daubechies) The windowed Fourier family $\{g_{n,k}\}_{(n,k)\in\mathbb{Z}^2}$ is a frame only if
$$\frac{2\pi}{u_0 \xi_0} \ge 1. \tag{5.45}$$
The frame bounds $A$ and $B$ necessarily satisfy
$$A \le \frac{2\pi}{u_0 \xi_0} \le B \tag{5.46}$$
$$\forall t \in \mathbb{R}, \quad A \le \frac{2\pi}{\xi_0} \sum_{n=-\infty}^{+\infty} |g(t - n u_0)|^2 \le B \tag{5.47}$$
$$\forall \omega \in \mathbb{R}, \quad A \le \frac{1}{u_0} \sum_{k=-\infty}^{+\infty} |\hat g(\omega - k \xi_0)|^2 \le B. \tag{5.48}$$

The ratio $2\pi/(u_0 \xi_0)$ measures the density of windowed Fourier atoms in the time-frequency plane. The first condition (5.45) ensures that this density is at least 1 because the covering ability of each atom is limited. The inequalities (5.47) and (5.48) are proved in full generality by Chui and Shi [124]. They show that the uniform time translations of $g$ must completely cover the time axis, and the frequency translations of its Fourier transform $\hat g$ must similarly cover the frequency axis.

Since all windowed Fourier vectors are normalized, the frame is an orthogonal basis only if $A = B = 1$. The frame bound condition (5.46) shows that this is possible only at the critical sampling density $u_0 \xi_0 = 2\pi$. The Balian-Low Theorem [86] proves that $g$ is then either non-smooth or has a slow time decay.

Theorem 5.6 (Balian-Low) If $\{g_{n,k}\}_{(n,k)\in\mathbb{Z}^2}$ is a windowed Fourier frame with $u_0 \xi_0 = 2\pi$, then
$$\int_{-\infty}^{+\infty} t^2\,|g(t)|^2\,dt = +\infty \quad \text{or} \quad \int_{-\infty}^{+\infty} \omega^2\,|\hat g(\omega)|^2\,d\omega = +\infty. \tag{5.49}$$

This theorem proves that we cannot construct an orthogonal windowed Fourier basis with a differentiable window $g$ of compact support. On the other hand, one can verify that the discontinuous rectangular window
$$g = \frac{1}{\sqrt{u_0}}\,\mathbf{1}_{[-u_0/2,\,u_0/2]}$$
yields an orthogonal windowed Fourier basis for $u_0 \xi_0 = 2\pi$. This basis is rarely used because of the bad frequency localization of $\hat g$.
Sufficient Conditions The following theorem proved by Daubechies [145] gives sufficient conditions on $u_0$, $\xi_0$ and $g$ for constructing a windowed Fourier frame.

Theorem 5.7 (Daubechies) Let us define
$$\beta(u) = \sup_{0 \le t \le u_0} \sum_{n=-\infty}^{+\infty} |g(t - n u_0)|\,|g(t - n u_0 + u)| \tag{5.50}$$
and
$$\Delta = \sum_{\substack{k=-\infty \\ k \ne 0}}^{+\infty} \left[ \beta\!\left(\frac{2\pi k}{\xi_0}\right) \beta\!\left(\frac{-2\pi k}{\xi_0}\right) \right]^{1/2}. \tag{5.51}$$
If $u_0$ and $\xi_0$ satisfy
$$A_0 = \frac{2\pi}{\xi_0} \left( \inf_{0 \le t \le u_0} \sum_{n=-\infty}^{+\infty} |g(t - n u_0)|^2 - \Delta \right) > 0 \tag{5.52}$$
and
$$B_0 = \frac{2\pi}{\xi_0} \left( \sup_{0 \le t \le u_0} \sum_{n=-\infty}^{+\infty} |g(t - n u_0)|^2 + \Delta \right) < +\infty \tag{5.53}$$
then $\{g_{n,k}\}_{(n,k)\in\mathbb{Z}^2}$ is a frame. The constants $A_0$ and $B_0$ are respectively lower bounds and upper bounds of the frame bounds $A$ and $B$.

Observe that the only difference between the sufficient conditions (5.52, 5.53) and the necessary condition (5.47) is the addition and subtraction of $\Delta$. If $\Delta$ is small compared to $\inf_{0 \le t \le u_0} \sum_{n=-\infty}^{+\infty} |g(t - n u_0)|^2$ then $A_0$ and $B_0$ are close to the optimal frame bounds $A$ and $B$.

Dual Frame Theorem 5.2 proves that the dual windowed frame vectors are
$$\tilde g_{n,k} = (U^*U)^{-1} g_{n,k}. \tag{5.54}$$
The following proposition shows that this dual frame is also a windowed Fourier frame, which means that its vectors are time and frequency translations of a new window $\tilde g$.

Proposition 5.4 Dual windowed Fourier vectors can be rewritten
$$\tilde g_{n,k}(t) = \tilde g(t - n u_0)\,\exp(i k \xi_0 t)$$
where $\tilde g$ is the dual window
$$\tilde g = (U^*U)^{-1} g. \tag{5.55}$$

Proof $^2$. This result is proved by showing first that $L = U^*U$ commutes
with time and frequency translations proportional to $u_0$ and $\xi_0$. If $h \in L^2(\mathbb{R})$ and $h_{m,l}(t) = h(t - m u_0)\,\exp(i l \xi_0 t)$ we verify that
$$L h_{m,l}(t) = \exp(i l \xi_0 t)\,L h(t - m u_0).$$
Indeed (5.26) shows that
$$L h_{m,l} = \sum_{(n,k)\in\mathbb{Z}^2} \langle h_{m,l}, g_{n,k}\rangle\,g_{n,k}$$
and a change of variable yields
$$\langle h_{m,l}, g_{n,k}\rangle = \langle h, g_{n-m,k-l}\rangle.$$
Consequently
$$L h_{m,l}(t) = \sum_{(n,k)\in\mathbb{Z}^2} \langle h, g_{n-m,k-l}\rangle\,\exp(i l \xi_0 t)\,g_{n-m,k-l}(t - m u_0) = \exp(i l \xi_0 t)\,L h(t - m u_0).$$
Since $L$ commutes with these translations and frequency modulations we verify that $L^{-1}$ necessarily commutes with the same group operations. Hence
$$\tilde g_{n,k}(t) = L^{-1} g_{n,k} = \exp(i k \xi_0 t)\,L^{-1} g_{0,0}(t - n u_0) = \exp(i k \xi_0 t)\,\tilde g(t - n u_0).$$

Gaussian Window The Gaussian window
$$g(t) = \pi^{-1/4} \exp\!\left(\frac{-t^2}{2}\right) \tag{5.56}$$
has a Fourier transform $\hat g$ that is a Gaussian with the same variance. The time and frequency spreads of this window are identical. We therefore choose equal sampling intervals in time and frequency: $u_0 = \xi_0$. For the same product $u_0 \xi_0$ other choices would degrade the frame bounds. If $g$ is dilated by $s$ then the time and frequency sampling intervals must become $s u_0$ and $\xi_0 / s$.

If the time-frequency sampling density is above the critical value:
$2\pi/(u_0 \xi_0) > 1$, then Daubechies [145] proves that $\{g_{n,k}\}_{(n,k)\in\mathbb{Z}^2}$ is a frame. When $u_0 \xi_0$ tends to $2\pi$, the frame bound $A$ tends to 0. For $u_0 \xi_0 = 2\pi$, the family $\{g_{n,k}\}_{(n,k)\in\mathbb{Z}^2}$ is complete in $L^2(\mathbb{R})$, which means that any $f \in L^2(\mathbb{R})$ is entirely characterized by the inner products $\{\langle f, g_{n,k}\rangle\}_{(n,k)\in\mathbb{Z}^2}$. However, the Balian-Low Theorem 5.6 proves that it cannot be a frame and one can indeed verify that $A = 0$ [145]. This means that the reconstruction of $f$ from these inner products is unstable.

Table 5.1: Frame bounds estimated with Theorem 5.7 for the Gaussian window (5.56) and $u_0 = \xi_0$.

u0 ξ0     A0      B0      B0/A0
π/2       3.9     4.1     1.05
3π/4      2.5     2.8     1.1
π         1.6     2.4     1.5
4π/3      0.58    2.1     3.6
1.9π      0.09    2.0     22
Table 5.1 gives the estimated frame bounds $A_0$ and $B_0$ calculated with Theorem 5.7, for different values of $u_0 = \xi_0$. For $u_0 \xi_0 = \pi/2$, which corresponds to time and frequency sampling intervals that are half the critical sampling rate, the frame is nearly tight. As expected, $A \approx B \approx 4$, which verifies that the redundancy factor is 4 (2 in time and 2 in frequency). Since the frame is almost tight, the dual frame is approximately equal to the original frame, which means that $\tilde g \approx g$. When $u_0 \xi_0$ increases we see that $A$ decreases to zero and $\tilde g$ deviates more and more from a Gaussian. In the limit $u_0 \xi_0 = 2\pi$, the dual window $\tilde g$ is a discontinuous function that does not belong to $L^2(\mathbb{R})$. These results can be extended to discrete window Fourier transforms computed with a discretized Gaussian window [361].

Tight Frames Tight frames are easier to manipulate numerically since the dual frame is equal to the original frame. Daubechies, Grossmann and Meyer [146] give two sufficient conditions for building a window of compact support that generates a tight frame.

Theorem 5.8 (Daubechies, Grossmann, Meyer) Let $g$ be a window whose support is included in $[-\pi/\xi_0, \pi/\xi_0]$. If
$$\forall t \in \mathbb{R}, \quad \frac{2\pi}{\xi_0} \sum_{n=-\infty}^{+\infty} |g(t - n u_0)|^2 = A \tag{5.57}$$
then $\{g_{n,k}\}_{(n,k)\in\mathbb{Z}^2}$ is a tight frame with a frame bound equal to $A$.

The proof is studied in Problem 5.4. If we impose that
$$1 \le \frac{2\pi}{u_0 \xi_0} \le 2$$
then only consecutive windows $g(t - n u_0)$ and $g(t - (n+1) u_0)$ have supports that overlap. The design of such windows is studied in Section 8.4.2 for local cosine bases.

5.3 Wavelet Frames $^2$
Wavelet frames are constructed by sampling the time and scale parameters of a continuous wavelet transform. A real continuous wavelet transform of $f \in L^2(\mathbb{R})$ is defined in Section 4.3 by
$$Wf(u, s) = \langle f, \psi_{u,s}\rangle$$
where $\psi$ is a real wavelet and
$$\psi_{u,s}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t - u}{s}\right).$$
Imposing $\|\psi\| = 1$ implies that $\|\psi_{u,s}\| = 1$.

Intuitively, to construct a frame we need to cover the time-frequency plane with the Heisenberg boxes of the corresponding discrete wavelet family. A wavelet $\psi_{u,s}$ has an energy in time that is centered at $u$ over a domain proportional to $s$. Over positive frequencies, its Fourier transform $\hat\psi_{u,s}$ has a support centered at a frequency $\eta/s$, with a spread proportional to $1/s$. To obtain a full cover, we sample $s$ along an exponential sequence $\{a^j\}_{j\in\mathbb{Z}}$, with a sufficiently small dilation step $a > 1$. The time translation $u$ is sampled uniformly at intervals proportional to the scale $a^j$, as illustrated in Figure 5.2. Let us denote
$$\psi_{j,n}(t) = \frac{1}{\sqrt{a^j}}\,\psi\!\left(\frac{t - n u_0 a^j}{a^j}\right).$$
We give necessary and sufficient conditions on $\psi$, $a$ and $u_0$ so that
$\{\psi_{j,n}\}_{(j,n)\in\mathbb{Z}^2}$ is a frame of $L^2(\mathbb{R})$.

Necessary Conditions We suppose that $\psi$ is real, normalized, and satisfies the admissibility condition of Theorem 4.3:
$$C_\psi = \int_0^{+\infty} \frac{|\hat\psi(\omega)|^2}{\omega}\,d\omega < +\infty. \tag{5.58}$$

Figure 5.2: The Heisenberg box of a wavelet $\psi_{j,n}$ scaled by $s = a^j$ has a time and frequency width proportional respectively to $a^j$ and $a^{-j}$. The time-frequency plane is covered by these boxes if $u_0$ and $a$ are sufficiently small.

Theorem 5.9 (Daubechies) If $\{\psi_{j,n}\}_{(j,n)\in\mathbb{Z}^2}$ is a frame of $L^2(\mathbb{R})$ then the frame bounds satisfy
$$A \le \frac{C_\psi}{u_0 \log_e a} \le B \tag{5.59}$$
$$\forall \omega \in \mathbb{R} - \{0\}, \quad A \le \frac{1}{u_0} \sum_{j=-\infty}^{+\infty} |\hat\psi(a^j \omega)|^2 \le B. \tag{5.60}$$

The condition (5.60) imposes that the Fourier axis is covered by wavelets dilated by $\{a^j\}_{j\in\mathbb{Z}}$. It is proved in [124, 21]. Section 5.5 explains that this condition is sufficient for constructing a complete and stable signal representation if the time parameter $u$ is not sampled. The inequality (5.59), which relates the sampling density $u_0 \log_e a$ to the frame bounds, is proved in [21]. It shows that the frame is an orthonormal basis if and only if
$$A = B = \frac{C_\psi}{u_0 \log_e a} = 1.$$
Chapter 7 constructs wavelet orthonormal bases of $L^2(\mathbb{R})$ with regular wavelets of compact support.

Sufficient Conditions The following theorem proved by Daubechies
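[21] provides a lower and upper bound for the frame bounds $A$ and $B$, depending on $\psi$, $u_0$ and $a$.

Theorem 5.10 (Daubechies) Let us define
$$\beta(\xi) = \sup_{1 \le |\omega| \le a} \sum_{j=-\infty}^{+\infty} |\hat\psi(a^j \omega)|\,|\hat\psi(a^j \omega + \xi)|$$
and
$$\Delta = \sum_{\substack{k=-\infty \\ k \ne 0}}^{+\infty} \left[ \beta\!\left(\frac{2\pi k}{u_0}\right) \beta\!\left(\frac{-2\pi k}{u_0}\right) \right]^{1/2}. \tag{5.61}$$
If $u_0$ and $a$ are such that
$$A_0 = \frac{1}{u_0} \left( \inf_{1 \le |\omega| \le a} \sum_{j=-\infty}^{+\infty} |\hat\psi(a^j \omega)|^2 - \Delta \right) > 0 \tag{5.62}$$
and
$$B_0 = \frac{1}{u_0} \left( \sup_{1 \le |\omega| \le a} \sum_{j=-\infty}^{+\infty} |\hat\psi(a^j \omega)|^2 + \Delta \right) < +\infty \tag{5.63}$$
then $\{\psi_{j,n}\}_{(j,n)\in\mathbb{Z}^2}$ is a frame of $L^2(\mathbb{R})$. The constants $A_0$ and $B_0$ are respectively lower and upper bounds of the frame bounds $A$ and $B$.

The sufficient conditions (5.62) and (5.63) are similar to the necessary condition (5.60). If $\Delta$ is small relative to $\inf_{1 \le |\omega| \le a} \sum_{j=-\infty}^{+\infty} |\hat\psi(a^j \omega)|^2$ then $A_0$ and $B_0$ are close to the optimal frame bounds $A$ and $B$. For a fixed dilation step $a$, the value of $\Delta$ decreases when the time sampling interval $u_0$ decreases.

Dual Frame Theorem 5.2 gives a general formula for computing the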
dual wavelet frame vectors
$$\tilde\psi_{j,n} = (U^*U)^{-1} \psi_{j,n}. \tag{5.64}$$
One could reasonably hope that the dual functions $\tilde\psi_{j,n}$ would be obtained by scaling and translating a dual wavelet $\tilde\psi$. The sad reality is that this is generally not true. In general the operator $U^*U$ does not commute with dilations by $a^j$, so $(U^*U)^{-1}$ does not commute with these dilations either. On the other hand, one can prove that $(U^*U)^{-1}$ commutes with translations by $n a^j u_0$, which means that
$$\tilde\psi_{j,n}(t) = \tilde\psi_{j,0}(t - n a^j u_0). \tag{5.65}$$
The dual frame $\{\tilde\psi_{j,n}\}_{(j,n)\in\mathbb{Z}^2}$ is thus obtained by calculating each elementary function $\tilde\psi_{j,0}$ with (5.64), and translating them with (5.65). The situation is much simpler for tight frames, where the dual frame is equal to the original wavelet frame.

Mexican Hat Wavelet The normalized second derivative of a Gaussian is
$$\psi(t) = \frac{2}{\pi^{1/4}\sqrt{3}}\,(t^2 - 1)\,\exp\!\left(\frac{-t^2}{2}\right). \tag{5.66}$$
Its Fourier transform is
$$\hat\psi(\omega) = -\frac{\sqrt{8}\,\pi^{1/4}}{\sqrt{3}}\,\omega^2\,\exp\!\left(\frac{-\omega^2}{2}\right).$$
The graph of these functions is shown in Figure 4.6.
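The normalization of (5.66) and the expression of its Fourier transform can be checked numerically. The following sketch assumes the Fourier convention $\hat\psi(\omega) = \int \psi(t)\,e^{-i\omega t}\,dt$ and approximates the integrals by Riemann sums on a grid; the grid limits, the step and the test frequencies are arbitrary choices made for this example.

```python
import numpy as np

# Sampling grid: the Gaussian factor makes psi negligible outside [-20, 20].
t = np.linspace(-20.0, 20.0, 40001)
dt = t[1] - t[0]

# Mexican hat wavelet (5.66).
psi = 2.0 / (np.pi ** 0.25 * np.sqrt(3.0)) * (t ** 2 - 1.0) * np.exp(-t ** 2 / 2.0)

norm2 = np.sum(psi ** 2) * dt         # Riemann-sum estimate of ||psi||^2

def psi_hat_formula(omega):
    # Closed-form Fourier transform stated in the text.
    return -np.sqrt(8.0) * np.pi ** 0.25 / np.sqrt(3.0) * omega ** 2 * np.exp(-omega ** 2 / 2.0)

def psi_hat_numeric(omega):
    # Riemann-sum estimate of the Fourier integral of psi at frequency omega.
    return np.sum(psi * np.exp(-1j * omega * t)) * dt

omegas = np.array([0.0, 0.5, 1.0, 2.0, 3.0])
numeric = np.array([psi_hat_numeric(w) for w in omegas])
formula = psi_hat_formula(omegas)
```

Here `norm2` should be close to 1, the real part of `numeric` should match `formula`, and its imaginary part should vanish up to rounding because $\psi$ is even.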
The dilation step $a$ is generally set to be $a = 2^{1/v}$ where $v$ is the number of intermediate scales (voices) for each octave. Table 5.2 gives the estimated frame bounds $A_0$ and $B_0$ computed by Daubechies [21] with the formula of Theorem 5.10.

Table 5.2: Estimated frame bounds for the Mexican hat wavelet computed with Theorem 5.10 [21].

a         u0      A0        B0        B0/A0
2         0.25    13.091    14.183    1.083
2         0.5     6.546     7.092     1.083
2         1.0     3.223     3.596     1.116
2         1.5     0.325     4.221     12.986
2^(1/2)   0.25    27.273    27.278    1.0002
2^(1/2)   0.5     13.673    13.639    1.0002
2^(1/2)   1.0     6.768     6.870     1.015
2^(1/2)   1.75    0.517     7.276     14.061
2^(1/4)   0.25    54.552    54.552    1.0000
2^(1/4)   0.5     27.276    27.276    1.0000
2^(1/4)   1.0     13.586    13.690    1.007
2^(1/4)   1.75    2.928     12.659    4.324

For $v \ge 2$ voices per octave, the frame is nearly tight when $u_0 \le 0.5$, in which case the dual frame can be approximated by the original wavelet frame. As expected from (5.59), when $A \approx B$
$$A \approx B \approx \frac{C_\psi}{u_0 \log_e a} = \frac{v}{u_0}\,C_\psi \log_2 e.$$
The frame bounds increase proportionally to $v/u_0$. For $a = 2$, we see that $A_0$ decreases abruptly from $u_0 = 1$ to $u_0 = 1.5$. For $u_0 = 1.75$ the wavelet family is not a frame anymore. For $a = 2^{1/2}$, the same transition appears for a larger $u_0$.

5.4 Translation Invariance $^1$
In pattern recognition, it is important to construct signal representations that are translation invariant. When a pattern is translated, its numerical descriptors should be translated but not modified. Indeed, a pattern search is particularly difficult if its representation depends on its location. Continuous wavelet transforms and windowed Fourier transforms provide translation-invariant representations, but uniformly sampling the translation parameter destroys this translation invariance.

Continuous Transforms Let $f_\tau(t) = f(t - \tau)$ be a translation of $f(t)$ by $\tau$. The wavelet transform can be written as a convolution product:
$$Wf(u, s) = \int_{-\infty}^{+\infty} f(t)\,\frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t - u}{s}\right) dt = f \star \bar\psi_s(u)$$
with $\bar\psi_s(t) = s^{-1/2}\,\psi(-t/s)$. It is therefore translation invariant:
$$Wf_\tau(u, s) = f_\tau \star \bar\psi_s(u) = Wf(u - \tau, s).$$
A windowed Fourier transform can also be written as a linear filtering
$$Sf(u, \xi) = \int_{-\infty}^{+\infty} f(t)\,g(t - u)\,e^{-i\xi t}\,dt = e^{-iu\xi}\,f \star \bar g_\xi(u)$$
with $\bar g_\xi(t) = g(-t)\,e^{i\xi t}$. Up to a phase shift, it is also translation invariant:
$$Sf_\tau(u, \xi) = e^{-iu\xi}\,f \star \bar g_\xi(u - \tau) = e^{-i\tau\xi}\,Sf(u - \tau, \xi).$$

Frame Sampling A wavelet frame
$$\psi_{j,n}(t) = \frac{1}{\sqrt{a^j}}\,\psi\!\left(\frac{t - n a^j u_0}{a^j}\right)$$
yields inner products that sample the continuous wavelet transform at time intervals $a^j u_0$:
$$\langle f, \psi_{j,n}\rangle = f \star \bar\psi_{a^j}(n a^j u_0) = Wf(n a^j u_0, a^j).$$
Translating $f$ by $\tau$ gives
$$\langle f_\tau, \psi_{j,n}\rangle = f \star \bar\psi_{a^j}(n a^j u_0 - \tau) = Wf(n a^j u_0 - \tau, a^j).$$
If the sampling interval $a^j u_0$ is large relative to the rate of variation of $f \star \bar\psi_{a^j}(t)$, then the coefficients $\langle f, \psi_{j,n}\rangle$ and $\langle f_\tau, \psi_{j,n}\rangle$ may take very different values that are not translated with respect to one another. This is illustrated in Figure 5.3. This problem is particularly acute for wavelet orthogonal bases where $u_0$ is maximum. The orthogonal wavelet coefficients of $f_\tau$ may be very different from the coefficients of $f$. The same translation distortion phenomena appear in windowed Fourier frames.
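The loss of translation invariance caused by sampling can be reproduced with a discrete, circular analogue. The sketch below is only an illustration under stated assumptions: a random periodic signal, an arbitrary Gaussian-shaped filter standing in for $\bar\psi_{a^j}$, a translation of one sample and a subsampling step of 4 are all hypothetical choices. It checks that the full convolution commutes with the translation, and then tests whether the subsampled coefficients of the translated signal are a circular shift of the original subsampled coefficients.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 64
f = rng.standard_normal(N)                                 # arbitrary periodic signal
h = np.exp(-0.5 * (np.arange(N) - N // 2) ** 2 / 4.0)      # arbitrary smooth filter (illustrative)

def circ_conv(x, y):
    # Circular convolution computed with the FFT.
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

tau = 1
f_tau = np.roll(f, tau)               # f_tau[n] = f[n - tau], circular translation

w = circ_conv(f, h)                   # unsampled coefficients of f
w_tau = circ_conv(f_tau, h)           # unsampled coefficients of f_tau

step = 4                              # uniform sampling interval, larger than tau
s, s_tau = w[::step], w_tau[::step]

# Is s_tau equal to some circular shift of s?
shift_match = any(np.allclose(np.roll(s, k), s_tau) for k in range(len(s)))
```

The unsampled output satisfies `w_tau == np.roll(w, tau)` up to rounding, whereas `shift_match` is expected to be `False` because the translation is not a multiple of the sampling step.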
Figure 5.3: If $f_\tau(t) = f(t - \tau)$ then $Wf_\tau(u, a^j) = Wf(u - \tau, a^j)$. Uniformly sampling $Wf_\tau(u, a^j)$ and $Wf(u, a^j)$ at $u = n a^j u_0$ may yield very different values if $\tau \ne k u_0 a^j$.

Translation-Invariant Representations There are several strategies for maintaining the translation invariance of a wavelet transform. If the sampling interval $a^j u_0$ is small enough then the samples of $f \star \bar\psi_{a^j}(t)$ are approximately translated when $f$ is shifted. The dyadic wavelet transform presented in Section 5.5 is a translation-invariant representation that does not sample the translation factor $u$. This creates a highly redundant signal representation.

To reduce the representation size while maintaining translation invariance, one can use an adaptive sampling scheme, where the sampling grid is automatically translated when the signal is translated. For each scale $a^j$, $Wf(u, a^j) = f \star \bar\psi_{a^j}(u)$ can be sampled at locations $u$ where $|Wf(u, a^j)|$ is locally maximum. The resulting representation is translation invariant since the local maxima positions are translated when $f$ and hence $f \star \bar\psi_{a^j}$ are translated. This adaptive sampling is studied in Section 6.2.2.

5.5 Dyadic Wavelet Transform $^2$
To construct a translation-invariant wavelet representation, the scale $s$ is discretized but not the translation parameter $u$. The scale is sampled along a dyadic sequence $\{2^j\}_{j\in\mathbb{Z}}$, to simplify the numerical calculations. Fast computations with filter banks are presented in the next two sections. An application to computer vision and texture discrimination is described in Section 5.5.3.

The dyadic wavelet transform of $f \in L^2(\mathbb{R})$ is defined by
$$Wf(u, 2^j) = \int_{-\infty}^{+\infty} f(t)\,\frac{1}{\sqrt{2^j}}\,\psi\!\left(\frac{t - u}{2^j}\right) dt = f \star \bar\psi_{2^j}(u) \tag{5.67}$$
with
$$\bar\psi_{2^j}(t) = \psi_{2^j}(-t) = \frac{1}{\sqrt{2^j}}\,\psi\!\left(\frac{-t}{2^j}\right).$$
The following theorem proves that if the frequency axis is completely covered by dilated dyadic wavelets, as illustrated by Figure 5.4, then it defines a complete and stable representation.

Theorem 5.11 If there exist two constants $A > 0$ and $B > 0$ such that
$$\forall \omega \in \mathbb{R} - \{0\}, \quad A \le \sum_{j=-\infty}^{+\infty} |\hat\psi(2^j \omega)|^2 \le B \tag{5.68}$$
then
$$A\,\|f\|^2 \le \sum_{j=-\infty}^{+\infty} \frac{1}{2^j}\,\|Wf(u, 2^j)\|^2 \le B\,\|f\|^2. \tag{5.69}$$
If $\tilde\psi$ satisfies
$$\forall \omega \in \mathbb{R} - \{0\}, \quad \sum_{j=-\infty}^{+\infty} \hat\psi^*(2^j \omega)\,\hat{\tilde\psi}(2^j \omega) = 1 \tag{5.70}$$
then
$$f(t) = \sum_{j=-\infty}^{+\infty} \frac{1}{2^j}\,Wf(\cdot, 2^j) \star \tilde\psi_{2^j}(t). \tag{5.71}$$

Proof $^2$. The Fourier transform of $f_j(u) = Wf(u, 2^j)$ with respect to $u$ is derived from the convolution formula (5.67):
$$\hat f_j(\omega) = \sqrt{2^j}\,\hat\psi^*(2^j \omega)\,\hat f(\omega). \tag{5.72}$$
The condition (5.68) implies that
$$A\,|\hat f(\omega)|^2 \le \sum_{j=-\infty}^{+\infty} \frac{1}{2^j}\,|\hat f_j(\omega)|^2 \le B\,|\hat f(\omega)|^2.$$
Integrating each side of this inequality with respect to $\omega$ and applying the Parseval equality (2.25) yields (5.69). Equation (5.71) is proved by taking the Fourier transform on both sides and inserting (5.70) and (5.72).
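The constants of (5.68) can be estimated numerically for a given wavelet. As a hedged illustration, the sketch below uses the Mexican hat wavelet (5.66), which is not necessarily the wavelet used in this section's figures, and evaluates $\sum_j |\hat\psi(2^j \omega)|^2$ on $\omega \in [1, 2]$. This interval suffices because the sum is unchanged when $\omega$ is multiplied by 2 and $|\hat\psi|$ is even. The truncation of the sum to $|j| \le 30$ and the grid density are assumptions of the example.

```python
import numpy as np

def psi_hat_sq(omega):
    # |psi_hat(omega)|^2 for the Mexican hat wavelet (5.66).
    return (8.0 * np.sqrt(np.pi) / 3.0) * omega ** 4 * np.exp(-omega ** 2)

omega = np.linspace(1.0, 2.0, 2001)           # one octave of frequencies
j = np.arange(-30, 31)                        # truncated range of dyadic scales
scaled = (2.0 ** j)[:, None] * omega[None, :] # all products 2^j * omega
total = psi_hat_sq(scaled).sum(axis=0)        # sum over j of |psi_hat(2^j omega)|^2

A_est, B_est = total.min(), total.max()       # estimates of A and B in (5.68)
```

For this wavelet the estimates come out near 3.27 and 3.55. They are consistent with the first row of Table 5.2, where $u_0 A_0 = 0.25 \times 13.091$ and $u_0 B_0 = 0.25 \times 14.183$, since the correction $\Delta$ of Theorem 5.10 becomes small when $u_0$ is small.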
Figure 5.4: Scaled Fourier transforms $|\hat\psi(2^j \omega)|^2$ computed with (5.84), for $1 \le j \le 5$ and $\omega \in [-\pi, \pi]$.

The energy equivalence (5.69) proves that the normalized dyadic wavelet transform operator
$$Uf[j, u] = \frac{1}{\sqrt{2^j}}\,Wf(u, 2^j) = \left\langle f(t), \frac{1}{\sqrt{2^j}}\,\psi_{2^j}(t - u) \right\rangle$$
satisfies frame inequalities. There exist an infinite number of reconstructing wavelets $\tilde\psi$ that verify (5.70). They correspond to different left inverses of $U$, calculated with (5.71). If we choose
$$\hat{\tilde\psi}(\omega) = \frac{\hat\psi(\omega)}{\sum_{j=-\infty}^{+\infty} |\hat\psi(2^j \omega)|^2} \tag{5.73}$$
then one can verify that the left inverse is the pseudo inverse $\tilde U^{-1}$. Figure 5.5 gives a dyadic wavelet transform computed over 5 scales with the quadratic spline wavelet shown in Figure 5.6.

Figure 5.5: Dyadic wavelet transform $Wf(u, 2^j)$ computed at scales $2^{-7} \le 2^j \le 2^{-3}$ with the filter bank algorithm of Section 5.5.2, for a signal defined over $[0, 1]$. The bottom curve carries the lower frequencies corresponding to scales larger than $2^{-3}$.

5.5.1 Wavelet Design

A discrete dyadic wavelet transform can be computed with a fast filter
bank algorithm if the wavelet is appropriately designed. The synthesis
of these dyadic wavelets is similar to the construction of biorthogonal
wavelet bases, explained in Section 7.4. All technical issues related to
the convergence of in nite cascades of lters are avoided in this section.
Reading Chapter 7 rst is necessary for understanding the main results.
Let $h$ and $g$ be a pair of finite impulse response filters. Suppose that $h$ is a low-pass filter whose transfer function satisfies $\hat h(0) = \sqrt{2}$. As in the case of orthogonal and biorthogonal wavelet bases, we construct a scaling function whose Fourier transform is
$$\hat\phi(\omega) = \prod_{p=1}^{+\infty} \frac{\hat h(2^{-p}\omega)}{\sqrt{2}} = \frac{1}{\sqrt{2}}\,\hat h\!\left(\frac{\omega}{2}\right) \hat\phi\!\left(\frac{\omega}{2}\right). \tag{5.74}$$
We suppose here that this Fourier transform is a finite energy function so that $\phi \in L^2(\mathbb{R})$. The corresponding wavelet $\psi$ has a Fourier transform defined by
$$\hat\psi(\omega) = \frac{1}{\sqrt{2}}\,\hat g\!\left(\frac{\omega}{2}\right) \hat\phi\!\left(\frac{\omega}{2}\right). \tag{5.75}$$
Proposition 7.2 proves that both $\phi$ and $\psi$ have a compact support because $h$ and $g$ have a finite number of non-zero coefficients. The number of vanishing moments of $\psi$ is equal to the number of zeros of $\hat\psi(\omega)$ at $\omega = 0$. Since $\hat\phi(0) = 1$, (5.75) implies that it is also equal to the number of zeros of $\hat g(\omega)$ at $\omega = 0$.
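
Reconstructing Wavelets. Reconstructing wavelets that satisfy (5.70) are calculated with a pair of finite impulse response dual filters $\tilde h$ and $\tilde g$. We suppose that the following Fourier transform has a finite energy:
$$\hat{\tilde\phi}(\omega) = \prod_{p=1}^{+\infty} \frac{\hat{\tilde h}(2^{-p}\omega)}{\sqrt{2}} = \frac{1}{\sqrt{2}}\,\hat{\tilde h}\!\left(\frac{\omega}{2}\right) \hat{\tilde\phi}\!\left(\frac{\omega}{2}\right). \tag{5.76}$$
Let us define
$$\hat{\tilde\psi}(\omega) = \frac{1}{\sqrt{2}}\,\hat{\tilde g}\!\left(\frac{\omega}{2}\right) \hat{\tilde\phi}\!\left(\frac{\omega}{2}\right). \tag{5.77}$$
The following proposition gives a sufficient condition to guarantee that $\hat{\tilde\psi}$ is the Fourier transform of a reconstruction wavelet.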
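
Proposition 5.5 If the filters satisfy
$$\forall \omega \in [-\pi, \pi], \quad \hat{\tilde h}(\omega)\,\hat h^*(\omega) + \hat{\tilde g}(\omega)\,\hat g^*(\omega) = 2, \tag{5.78}$$
then
$$\forall \omega \in \mathbb{R} - \{0\}, \quad \sum_{j=-\infty}^{+\infty} \hat\psi^*(2^j\omega)\,\hat{\tilde\psi}(2^j\omega) = 1. \tag{5.79}$$

Proof $^2$. The Fourier transform expressions (5.75) and (5.77) prove that
$$\hat{\tilde\psi}(\omega)\,\hat\psi^*(\omega) = \frac{1}{2}\,\hat{\tilde g}\!\left(\frac{\omega}{2}\right) \hat g^*\!\left(\frac{\omega}{2}\right) \hat{\tilde\phi}\!\left(\frac{\omega}{2}\right) \hat\phi^*\!\left(\frac{\omega}{2}\right).$$
Equation (5.78) implies
$$\hat{\tilde\psi}(\omega)\,\hat\psi^*(\omega) = \frac{1}{2} \left[ 2 - \hat{\tilde h}\!\left(\frac{\omega}{2}\right) \hat h^*\!\left(\frac{\omega}{2}\right) \right] \hat{\tilde\phi}\!\left(\frac{\omega}{2}\right) \hat\phi^*\!\left(\frac{\omega}{2}\right) = \hat{\tilde\phi}\!\left(\frac{\omega}{2}\right) \hat\phi^*\!\left(\frac{\omega}{2}\right) - \hat{\tilde\phi}(\omega)\,\hat\phi^*(\omega).$$
Summing this telescoping identity at the frequencies $2^j\omega$ for $-l \le j \le k$ gives
$$\sum_{j=-l}^{k} \hat{\tilde\psi}(2^j\omega)\,\hat\psi^*(2^j\omega) = \hat\phi^*(2^{-l-1}\omega)\,\hat{\tilde\phi}(2^{-l-1}\omega) - \hat\phi^*(2^k\omega)\,\hat{\tilde\phi}(2^k\omega).$$
Since $\hat g(0) = 0$, (5.78) implies $\hat{\tilde h}(0)\,\hat h^*(0) = 2$. We also impose that $\hat h(0) = \sqrt{2}$ so one can derive from (5.74, 5.76) that $\hat{\tilde\phi}(0) = \hat\phi^*(0) = 1$. Since $\phi$ and $\tilde\phi$ belong to $L^1(\mathbb{R})$, $\hat\phi$ and $\hat{\tilde\phi}$ are continuous, and the Riemann-Lebesgue lemma (Problem 2.6) proves that $|\hat\phi(\omega)|$ and $|\hat{\tilde\phi}(\omega)|$ decrease to zero when $\omega$ goes to $\infty$. For $\omega \neq 0$, letting $k$ and $l$ go to $+\infty$ yields (5.79).

Observe that (5.78) is the same as the unit gain condition (7.122) for biorthogonal wavelets. The aliasing cancellation condition (7.121) of biorthogonal wavelets is not required because the wavelet transform is not sampled in time.

Finite Impulse Response Solution. Let us shift $h$ and $g$ to obtain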
causal filters. The resulting transfer functions $\hat h(\omega)$ and $\hat g(\omega)$ are polynomials in $e^{-i\omega}$. We suppose that these polynomials have no common zeros. The Bezout Theorem 7.6 on polynomials proves that if $P(z)$ and $Q(z)$ are two polynomials of degree $n$ and $l$, with no common zeros, then there exists a unique pair of polynomials $\tilde P(z)$ and $\tilde Q(z)$ of degree $l-1$ and $n-1$ such that
$$P(z)\,\tilde P(z) + Q(z)\,\tilde Q(z) = 1. \tag{5.80}$$
This guarantees the existence of $\hat{\tilde h}(\omega)$ and $\hat{\tilde g}(\omega)$ that are polynomials in $e^{-i\omega}$ and satisfy (5.78). These are the Fourier transforms of the finite impulse response filters $\tilde h$ and $\tilde g$. One must however be careful because the resulting scaling function $\hat{\tilde\phi}$ in (5.76) does not necessarily have a finite energy.

Spline Dyadic Wavelets. A box spline of degree $m$ is a translation
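of $m+1$ convolutions of $\mathbf{1}_{[0,1]}$ with itself. It is centered at $t = 1/2$ if $m$ is even and at $t = 0$ if $m$ is odd. Its Fourier transform is
$$\hat\phi(\omega) = \left( \frac{\sin(\omega/2)}{\omega/2} \right)^{m+1} \exp\!\left( \frac{-i\varepsilon\omega}{2} \right) \quad \text{with} \quad \varepsilon = \begin{cases} 1 & \text{if } m \text{ is even} \\ 0 & \text{if } m \text{ is odd} \end{cases} \tag{5.81}$$
so
$$\hat h(\omega) = \sqrt{2}\,\frac{\hat\phi(2\omega)}{\hat\phi(\omega)} = \sqrt{2} \left( \cos\frac{\omega}{2} \right)^{m+1} \exp\!\left( \frac{-i\varepsilon\omega}{2} \right). \tag{5.82}$$
We construct a wavelet that has one vanishing moment by choosing $\hat g(\omega) = O(\omega)$ in the neighborhood of $\omega = 0$. For example
$$\hat g(\omega) = -i\sqrt{2}\,\sin\frac{\omega}{2}\,\exp\!\left( \frac{-i\varepsilon\omega}{2} \right). \tag{5.83}$$
The Fourier transform of the resulting wavelet is
$$\hat\psi(\omega) = \frac{1}{\sqrt{2}}\,\hat g\!\left(\frac{\omega}{2}\right) \hat\phi\!\left(\frac{\omega}{2}\right) = \frac{-i\omega}{4} \left( \frac{\sin(\omega/4)}{\omega/4} \right)^{m+2} \exp\!\left( \frac{-i\omega(1+\varepsilon)}{4} \right). \tag{5.84}$$
It is the first derivative of a box spline of degree $m+1$ centered at $t = (1+\varepsilon)/4$. For $m = 2$, Figure 5.6 shows the resulting quadratic splines $\phi$ and $\psi$. The dyadic admissibility condition (5.68) is verified numerically for $A = 0.505$ and $B = 0.522$.

Table 5.3: Coefficients of the filters computed from their transfer functions (5.82, 5.83, 5.85) for $m = 2$. Entries left blank are zero. These filters generate the quadratic spline scaling functions and wavelets shown in Figure 5.6.

 n  | h[n]/√2 | h̃[n]/√2 | g[n]/√2 | g̃[n]/√2
 -2 |         |          |         | -0.03125
 -1 | 0.125   | 0.125    |         | -0.21875
  0 | 0.375   | 0.375    | -0.5    | -0.6875
  1 | 0.375   | 0.375    | 0.5     | 0.6875
  2 | 0.125   | 0.125    |         | 0.21875
  3 |         |          |         | 0.03125

Figure 5.6: Quadratic spline wavelet $\psi(t)$ and scaling function $\phi(t)$.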
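The admissibility bounds quoted for the quadratic spline wavelet can be checked numerically. The following is a minimal sketch, assuming the modulus of (5.84) with $m = 2$; the function names `psi_hat_sq` and `dyadic_sum`, the truncation `jmax` and the frequency grid are illustrative choices, not part of the text. Because the sum in (5.68) is unchanged when $\omega$ is replaced by $2\omega$, it is enough to scan one octave.

```python
import numpy as np

def psi_hat_sq(omega, m=2):
    """|psi_hat(omega)|^2 from (5.84): (omega/4)^2 (sin(omega/4)/(omega/4))^(2m+4)."""
    x = omega / 4.0
    # np.sinc(t) = sin(pi t)/(pi t), hence sin(x)/x = np.sinc(x/pi), safe at x = 0.
    return x**2 * np.sinc(x / np.pi) ** (2 * m + 4)

def dyadic_sum(omega, m=2, jmax=40):
    """Truncation of sum_j |psi_hat(2^j omega)|^2 to -jmax <= j <= jmax."""
    scales = 2.0 ** np.arange(-jmax, jmax + 1)
    return psi_hat_sq(np.outer(scales, omega), m).sum(axis=0)

# One octave of frequencies is sufficient (the sum is invariant under omega -> 2 omega).
omega = np.linspace(np.pi, 2.0 * np.pi, 2001)
values = dyadic_sum(omega)
A_est, B_est = values.min(), values.max()
print("estimated frame bounds:", A_est, B_est)
```

The minimum and maximum over the grid are estimates of $A$ and $B$ in (5.68); refining the grid or increasing `jmax` should only change them marginally.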
To design dual scaling functions $\tilde\phi$ and wavelets $\tilde\psi$ which are splines, we choose $\hat{\tilde h} = \hat h$. As a consequence, $\phi = \tilde\phi$ and the reconstruction condition (5.78) implies that
$$\hat{\tilde g}(\omega) = \frac{2 - |\hat h(\omega)|^2}{\hat g^*(\omega)} = -i\sqrt{2}\,\exp\!\left( \frac{-i\omega}{2} \right) \sin\frac{\omega}{2} \sum_{n=0}^{m} \left( \cos\frac{\omega}{2} \right)^{2n}. \tag{5.85}$$
Table 5.3 gives the corresponding filters for $m = 2$.

5.5.2 "Algorithme à Trous"

Suppose that the scaling functions and wavelets $\phi$, $\psi$, $\tilde\phi$ and $\tilde\psi$ are designed with the filters $h$, $g$, $\tilde h$ and $\tilde g$. A fast dyadic wavelet transform is calculated with a filter bank algorithm called in French the algorithme à trous, introduced by Holschneider, Kronland-Martinet, Morlet and Tchamitchian [212]. It is similar to a fast biorthogonal wavelet transform, without subsampling [308, 261].
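Before such filters are used in a filter bank, the reconstruction condition (5.78) can be checked numerically. The sketch below does this for the quadratic spline filters of Table 5.3 with $\tilde h = h$; the helper `transfer` and the frequency grid are assumptions made for the illustration.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def transfer(first_index, coeffs, omega):
    """Transfer function sum_n x[n] exp(-i n omega), where x[first_index + i] = sqrt(2) * coeffs[i]."""
    n = first_index + np.arange(len(coeffs))
    return np.exp(-1j * np.outer(omega, n)) @ (SQRT2 * np.asarray(coeffs))

omega = np.linspace(-np.pi, np.pi, 513)
h_hat = transfer(-1, [0.125, 0.375, 0.375, 0.125], omega)
g_hat = transfer(0, [-0.5, 0.5], omega)
h_dual_hat = h_hat  # Table 5.3 uses the same low-pass filter for analysis and synthesis
g_dual_hat = transfer(-2, [-0.03125, -0.21875, -0.6875, 0.6875, 0.21875, 0.03125], omega)

# Left-hand side of (5.78); it should equal 2 at every frequency.
lhs = h_dual_hat * np.conj(h_hat) + g_dual_hat * np.conj(g_hat)
max_error = np.max(np.abs(lhs - 2.0))
print("largest deviation from (5.78):", max_error)
```

The deviation should be at the level of floating-point rounding, since the tabulated coefficients are exact dyadic fractions.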
Let $\dot f(t)$ be a continuous time signal characterized by $N$ samples at a distance $N^{-1}$ over $[0,1]$. Its dyadic wavelet transform can only be calculated at scales $1 > 2^j \ge N^{-1}$. To simplify the description of the filter bank algorithm, it is easier to consider the signal $f(t) = \dot f(N^{-1}t)$, whose samples have distance equal to 1. A change of variable in the dyadic wavelet transform integral shows that $W\dot f(u,2^j) = N^{-1/2}\,Wf(Nu, N2^j)$. We thus concentrate on the dyadic wavelet transform of $f$, from which the dyadic wavelet transform of $\dot f$ is easily derived.

Fast Dyadic Transform. We suppose that the samples $a_0[n]$ of the input discrete signal are not equal to $f(n)$ but to a local average of $f$ in the neighborhood of $t = n$. Indeed, the detectors of signal acquisition devices perform such an averaging. The samples $a_0[n]$ are written as averages of $f(t)$ weighted by the scaling kernels $\phi(t-n)$:
$$a_0[n] = \langle f(t), \phi(t-n) \rangle = \int_{-\infty}^{+\infty} f(t)\,\phi(t-n)\,dt.$$
This is further justified in Section 7.3.1. For any $j \ge 0$, we denote
$$a_j[n] = \langle f(t), \phi_{2^j}(t-n) \rangle \quad \text{with} \quad \phi_{2^j}(t) = \frac{1}{\sqrt{2^j}}\,\phi\!\left(\frac{t}{2^j}\right).$$
The dyadic wavelet coefficients are computed for $j > 0$ over the integer grid
$$d_j[n] = Wf(n,2^j) = \langle f(t), \psi_{2^j}(t-n) \rangle.$$
For any filter $x[n]$, we denote by $x_j[n]$ the filter obtained by inserting $2^j - 1$ zeros between each sample of $x[n]$. Its Fourier transform is $\hat x(2^j\omega)$. Inserting zeros in the filters creates holes (trous in French). Let $\bar x_j[n] = x_j[-n]$. The next proposition gives convolution formulas that are cascaded to compute a dyadic wavelet transform and its inverse.
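
Proposition 5.6 For any $j \ge 0$,
$$a_{j+1}[n] = a_j \star \bar h_j[n], \qquad d_{j+1}[n] = a_j \star \bar g_j[n], \tag{5.86}$$
and
$$a_j[n] = \frac{1}{2} \left( a_{j+1} \star \tilde h_j[n] + d_{j+1} \star \tilde g_j[n] \right). \tag{5.87}$$

Proof $^2$. Proof of (5.86). Since
$$a_{j+1}[n] = f \star \bar\phi_{2^{j+1}}(n) \quad \text{and} \quad d_{j+1}[n] = f \star \bar\psi_{2^{j+1}}(n),$$
we verify with (3.3) that their Fourier transforms are respectively
$$\hat a_{j+1}(\omega) = \sum_{k=-\infty}^{+\infty} \hat f(\omega + 2k\pi)\,\hat{\bar\phi}_{2^{j+1}}(\omega + 2k\pi)$$
and
$$\hat d_{j+1}(\omega) = \sum_{k=-\infty}^{+\infty} \hat f(\omega + 2k\pi)\,\hat{\bar\psi}_{2^{j+1}}(\omega + 2k\pi).$$
The properties (5.74) and (5.75) imply that
$$\hat{\bar\phi}_{2^{j+1}}(\omega) = \sqrt{2^{j+1}}\,\hat\phi^*(2^{j+1}\omega) = \hat h^*(2^j\omega)\,\sqrt{2^j}\,\hat\phi^*(2^j\omega),$$
$$\hat{\bar\psi}_{2^{j+1}}(\omega) = \sqrt{2^{j+1}}\,\hat\psi^*(2^{j+1}\omega) = \hat g^*(2^j\omega)\,\sqrt{2^j}\,\hat\phi^*(2^j\omega).$$
Since $j \ge 0$, both $\hat h(2^j\omega)$ and $\hat g(2^j\omega)$ are $2\pi$ periodic, so
$$\hat a_{j+1}(\omega) = \hat h^*(2^j\omega)\,\hat a_j(\omega) \quad \text{and} \quad \hat d_{j+1}(\omega) = \hat g^*(2^j\omega)\,\hat a_j(\omega). \tag{5.88}$$
These two equations are the Fourier transforms of (5.86).
Proof of (5.87). Equations (5.88) imply
$$\hat a_{j+1}(\omega)\,\hat{\tilde h}(2^j\omega) + \hat d_{j+1}(\omega)\,\hat{\tilde g}(2^j\omega) = \hat a_j(\omega)\,\hat h^*(2^j\omega)\,\hat{\tilde h}(2^j\omega) + \hat a_j(\omega)\,\hat g^*(2^j\omega)\,\hat{\tilde g}(2^j\omega).$$
Inserting the reconstruction condition (5.78) proves that
$$\hat a_{j+1}(\omega)\,\hat{\tilde h}(2^j\omega) + \hat d_{j+1}(\omega)\,\hat{\tilde g}(2^j\omega) = 2\,\hat a_j(\omega),$$
which is the Fourier transform of (5.87).

The dyadic wavelet representation of $a_0$ is defined as the set of wavelet coefficients up to a scale $2^J$ plus the remaining low-frequency information $a_J$:
$$\left[ \{d_j\}_{1 \le j \le J},\ a_J \right]. \tag{5.89}$$
It is computed from $a_0$ by cascading the convolutions (5.86) for $0 \le j < J$, as illustrated in Figure 5.7(a). The dyadic wavelet transform of Figure 5.5 is calculated with this filter bank algorithm. The original signal $a_0$ is recovered from its wavelet representation (5.89) by iterating (5.87) for $J > j \ge 0$, as illustrated in Figure 5.7(b).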
Figure 5.7: (a): The dyadic wavelet coefficients are computed by cascading convolutions with dilated filters $\bar h_j$ and $\bar g_j$. (b): The original signal is reconstructed through convolutions with $\tilde h_j$ and $\tilde g_j$. A multiplication by $1/2$ is necessary to recover the next finer scale signal $a_j$.
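The cascade (5.86) and its inverse (5.87) can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference implementation: it uses the quadratic spline filters of Table 5.3, treats the signal as periodic (circular convolutions), and the names `conv_dilated`, `atrous_forward` and `atrous_inverse` are hypothetical. Dilating a filter by inserting $2^j - 1$ zeros is implemented implicitly by shifting the signal by multiples of $2^j$.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)
# Each filter is (index of first coefficient, coefficients), from Table 5.3 (m = 2).
H = (-1, SQRT2 * np.array([0.125, 0.375, 0.375, 0.125]))
G = (0, SQRT2 * np.array([-0.5, 0.5]))
H_DUAL = H
G_DUAL = (-2, SQRT2 * np.array([-0.03125, -0.21875, -0.6875, 0.6875, 0.21875, 0.03125]))

def conv_dilated(a, filt, j, reverse):
    """Circular convolution of a with x_j (reverse=False) or with x_j[-n] (reverse=True)."""
    first, coeffs = filt
    out = np.zeros_like(a)
    for i, c in enumerate(coeffs):
        shift = (2 ** j) * (first + i)
        # np.roll(a, s)[n] = a[n - s]: +shift gives a * x_j, -shift gives a * xbar_j.
        out += c * np.roll(a, -shift if reverse else shift)
    return out

def atrous_forward(a0, J):
    """Return ([d_1, ..., d_J], a_J) by cascading (5.86) for j = 0, ..., J-1."""
    a = np.asarray(a0, dtype=float)
    details = []
    for j in range(J):
        details.append(conv_dilated(a, G, j, reverse=True))
        a = conv_dilated(a, H, j, reverse=True)
    return details, a

def atrous_inverse(details, aJ):
    """Recover a_0 by iterating (5.87) for j = J-1, ..., 0."""
    a = aJ
    for j in reversed(range(len(details))):
        a = 0.5 * (conv_dilated(a, H_DUAL, j, reverse=False)
                   + conv_dilated(details[j], G_DUAL, j, reverse=False))
    return a

signal = np.random.default_rng(0).standard_normal(64)
d, a_coarse = atrous_forward(signal, J=6)
print("reconstruction error:", np.max(np.abs(atrous_inverse(d, a_coarse) - signal)))
```

Every level keeps $N$ coefficients because nothing is subsampled, which is what makes the representation translation invariant at the price of redundancy.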
If the input signal $a_0[n]$ has a finite size of $N$ samples, the convolutions (5.86) are replaced by circular convolutions. The maximum scale $2^J$ is then limited to $N$, and for $J = \log_2 N$ one can verify that $a_J[n]$ is constant and equal to $N^{-1/2} \sum_{n=0}^{N-1} a_0[n]$. Suppose that $h$ and $g$ have respectively $K_h$ and $K_g$ non-zero samples. The "dilated" filters $h_j$ and $g_j$ have the same number of non-zero coefficients. The number of multiplications needed to compute $a_{j+1}$ and $d_{j+1}$ from $a_j$ or the reverse is thus equal to $(K_h + K_g)N$. For $J = \log_2 N$, the dyadic wavelet representation (5.89) and its inverse are thus calculated with $(K_h + K_g)\,N \log_2 N$ multiplications and additions.

5.5.3 Oriented Wavelets for a Vision $^3$

Image processing applications of dyadic wavelet transforms are motivated by many physiological and computer vision studies. Textures
multiplications and additions. 5.5. DYADIC WAVELET TRANSFORM 219 5.5.3 Oriented Wavelets for a Vision 3 Image processing applications of dyadic wavelet transforms are motivated by many physiological and computer vision studies. Textures
can be synthesized and discriminated with oriented two-dimensional
wavelet transforms. Section 6.3 relates multiscale edges to the local
maxima of a wavelet transform. Oriented Wavelets In two dimensions, a dyadic wavelet transform is computed with several mother wavelets f k g1 k K which often have
di erent spatial orientations. For x = (x1 x2 ), we denote
1 k x1 x2 and k (x) = k (;x):
k
2j (x1 x2 ) = 2j
2j
2j
2j 2j
The wavelet transform of f 2 L2( 2 ) in the direction k is de ned at
the position u = (u1 u2) and at the scale 2j by R W k f (u 2j ) = hf (x) k
2j (x ; u)i = f ? k
2j (u): (5.90) As in Theorem 5.11, one can prove that the two-dimensional wavelet
transform is a complete and stable signal representation if there exist
A > 0 and B such that
8! = (!1 !2 ) 2 R 2 ; f(0 0)g A K +1
XX k=1 j =;1 j ^k (2j ! )j2 B : (5.91) Then there exist reconstruction wavelets f ~k g1 k K whose Fourier transforms satisfy
K
+1
X 1 X c j ^k j
~k
(5.92)
22j k=1 (2 !) (2 !) = 1
j =;1
which yields
+1
X K
1 X W k f (: 2j ) ? ~k (x) :
f (x) =
2j
2j
j =;1 2 k=1 Wavelets that satisfy (5.91) are called dyadic wavelets. (5.93) CHAPTER 5. FRAMES 220 Families of oriented wavelets along any angle can be designed as
a linear expansion of K mother wavelets 312]. For example, a wavelet
in the direction may be de ned as the partial derivative of order p of
a window (x) in the direction of the vector ~ = (cos sin ):
n
px
@
@p
(x) = @ @~(p ) = cos @x + sin @x (x):
n
1
2
This partial derivative is a linear expansion of K = p + 1 mother
wavelets
p
Xp
(x) =
k=0 with k (cos )k (sin )p;k k (x) (5.94) @ p (x) for 0 k p.
@xk @xp;k
12
For appropriate windows , these p + 1 partial derivatives de ne a
family of dyadic wavelets. In the direction , the wavelet transform
W f (u 2j ) = f ? 2j (u) is computed from the p + 1 components
k
W k f (u 2j ) = f ? 2j (u) with the expansion (5.94). Section 6.3 uses
such oriented wavelets, with p = 1, to detect the multiscale edges of an
image.
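The expansion (5.94) can be verified on a concrete window. The sketch below assumes a Gaussian window $\theta(x) = \exp(-(x_1^2 + x_2^2)/2)$ and $p = 2$, both chosen only for the illustration, and compares the three-term expansion with the second derivative of $\theta$ along $\vec n$ written in closed form; the function names are hypothetical.

```python
import numpy as np
from math import comb, cos, sin

P = 2  # derivative order, so K = P + 1 = 3 mother wavelets

def theta(x1, x2):
    """Gaussian window (an assumption of this example)."""
    return np.exp(-(x1**2 + x2**2) / 2.0)

def mother_wavelets(x1, x2):
    """psi^k = d^2 theta / (dx1^k dx2^(2-k)) for k = 0, 1, 2, in closed form."""
    th = theta(x1, x2)
    return [(x2**2 - 1.0) * th,  # k = 0: second derivative in x2
            x1 * x2 * th,        # k = 1: mixed derivative
            (x1**2 - 1.0) * th]  # k = 2: second derivative in x1

def oriented_wavelet(x1, x2, alpha):
    """Right-hand side of (5.94) for p = 2."""
    psis = mother_wavelets(x1, x2)
    return sum(comb(P, k) * cos(alpha)**k * sin(alpha)**(P - k) * psis[k]
               for k in range(P + 1))

def directional_derivative(x1, x2, alpha):
    """Second derivative of the Gaussian along n = (cos alpha, sin alpha)."""
    t = x1 * cos(alpha) + x2 * sin(alpha)
    return (t**2 - 1.0) * theta(x1, x2)

x1, x2 = np.meshgrid(np.linspace(-3, 3, 61), np.linspace(-3, 3, 61))
alpha = 0.7
gap = np.max(np.abs(oriented_wavelet(x1, x2, alpha) - directional_derivative(x1, x2, alpha)))
print("largest difference:", gap)
```

Since the expansion is linear, the same weights combine the transform components $W^k f$ into $W^\alpha f$, so only $p + 1$ convolutions per scale are needed for all orientations.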

Gabor Wavelets. In the cat's visual cortex, Hubel and Wiesel [215] discovered a class of cells, called simple cells, whose responses depend on the frequency and orientation of the visual stimuli. Numerous physiological experiments [283] have shown that these cells can be modeled as linear filters, whose impulse responses have been measured at different locations of the visual cortex. Daugman [149] showed that these impulse responses can be approximated by Gabor wavelets, obtained with a Gaussian window $g(x_1, x_2)$ multiplied by a sinusoidal wave:
$$\psi^k(x_1, x_2) = g(x_1, x_2)\,\exp[-i\eta(x_1 \cos\alpha_k + x_2 \sin\alpha_k)].$$
The position, the scale and the orientation $\alpha_k$ of this wavelet depend on the cortical cell. These findings suggest the existence of some sort of wavelet transform in the visual cortex, combined with subsequent non-linearities [284]. The "physiological" wavelets have a frequency resolution on the order of 1 to 1.5 octaves, and are thus similar to dyadic wavelets.
Let $\hat g(\omega_1, \omega_2)$ be the Fourier transform of $g(x_1, x_2)$. Then
$$\hat\psi^k_{2^j}(\omega_1, \omega_2) = \sqrt{2^j}\,\hat g(2^j\omega_1 - \eta\cos\alpha_k,\ 2^j\omega_2 - \eta\sin\alpha_k).$$
In the Fourier plane, the energy of this Gabor wavelet is mostly concentrated around $(2^{-j}\eta\cos\alpha_k,\ 2^{-j}\eta\sin\alpha_k)$, in a neighborhood proportional to $2^{-j}$. Figure 5.8 shows a cover of the frequency plane by such dyadic wavelets. The bandwidth of $\hat g(\omega_1, \omega_2)$ and $\eta$ must be adjusted to satisfy (5.91).
Figure 5.8: Each circle represents the frequency support of a dyadic wavelet $\hat\psi^k_{2^j}$ in the $(\omega_1, \omega_2)$ plane. This support size is proportional to $2^{-j}$ and its position rotates when $k$ is modified.

Texture Discrimination. Despite many attempts, there are no appropriate mathematical models for "homogeneous image textures." The notion of texture homogeneity is still defined with respect to our visual perception. A texture is said to be homogeneous if it is preattentively perceived as being homogeneous by a human observer.
The texton theory of Julesz [231] was a first important step in understanding the different parameters that influence the perception of textures. The orientation of texture elements and their frequency content seem to be important clues for discrimination. This motivated early researchers to study the distribution of texture energy in the Fourier domain [85]. For segmentation purposes, it is however necessary to localize texture measurements over neighborhoods of varying sizes. The Fourier transform was thus replaced by localized energy measurements at the output of filter banks that compute a wavelet transform [224, 244, 285, 334]. Besides the algorithmic efficiency of this approach, this model is partly supported by physiological studies of the visual cortex.

Figure 5.9: Gabor wavelet transform $|W^k f(u,2^j)|^2$ of a texture patch, at the scales $2^{-4}$ and $2^{-5}$, along two orientations $\alpha_k$ respectively equal to $0$ and $\pi/2$ for $k = 1$ and $k = 2$. The four panels show $|W^1 f(u,2^{-5})|^2$, $|W^2 f(u,2^{-5})|^2$, $|W^1 f(u,2^{-4})|^2$ and $|W^2 f(u,2^{-4})|^2$. The darker a pixel, the larger the wavelet coefficient amplitude.

Since $W^k f(u,2^j) = \langle f(x), \psi^k_{2^j}(x-u) \rangle$, we derive that $|W^k f(u,2^j)|^2$ measures the energy of $f$ in a spatial neighborhood of $u$ of size $2^j$ and in a frequency neighborhood of $(2^{-j}\eta\cos\alpha_k,\ 2^{-j}\eta\sin\alpha_k)$ of size $2^{-j}$. Varying the scale $2^j$ and the angle $\alpha_k$ modifies the frequency channel [100]. The wavelet transform energy $|W^k f(u,2^j)|^2$ is large when the angle $\alpha_k$ and scale $2^j$ match the orientation and scale of high energy texture components in the neighborhood of $u$ [224, 244, 285, 334]. The amplitude of $|W^k f(u,2^j)|^2$ can thus be used to discriminate textures. Figure 5.9 shows the dyadic wavelet transform of two textures, computed along horizontal and vertical orientations, at the scales $2^{-4}$ and $2^{-5}$ (the image support is normalized to $[0,1]^2$). The central texture has more energy along horizontal high frequencies than the peripheral texture. These two textures are therefore discriminated by the wavelet oriented with $\alpha_k = 0$ whereas the other wavelet, corresponding to $\alpha_k = \pi/2$, produces similar responses for both textures.
For segmentation, one must design an algorithm that aggregates the wavelet responses at all scales and orientations in order to find the boundaries of homogeneous textured regions. Both clustering procedures and detection of sharp transitions over wavelet energy measurements have been used to segment the image [224, 285, 334]. These algorithms work well experimentally but rely on ad hoc parameter settings.
A homogeneous texture can be modeled as a realization of a stationary process, but the main difficulty is to find the characteristics of this process that play a role in texture discrimination. Texture synthesis experiments [277, 313] show that Markov random field processes constructed over grids of wavelet coefficients offer a promising mathematical framework for understanding texture discrimination.

5.6 Problems
5.1. $^1$ Prove that if $K \in \mathbb{Z} - \{0\}$ then $\{e_k[n] = \exp(i2\pi kn/(KN))\}_{0 \le k < KN}$ is a tight frame of $\mathbb{C}^N$. Compute the frame bound.
5.2. $^1$ Prove that if $K \in \mathbb{R} - \{0\}$ then $\{e_k(t) = \exp(i2\pi kt/K)\}_{k \in \mathbb{Z}}$ is a tight frame of $L^2[0,1]$. Compute the frame bound.
5.3. $^1$ Let $\hat g = \mathbf{1}_{[-\omega_0, \omega_0]}$. Prove that $\{g(t - nu_0)\,\exp(i2k\pi t/u_0)\}_{(k,n) \in \mathbb{Z}^2}$ is an orthonormal basis of $L^2(\mathbb{R})$.
5.4. $^1$ Let $g_{n,k}(t) = g(t - nu_0)\,\exp(ik\xi_0 t)$, where $g$ is a window whose support is included in $[-\pi/\xi_0, \pi/\xi_0]$.
(a) Prove that $|g(t - nu_0)|^2 f(t) = \sum_{k=-\infty}^{+\infty} \langle f, g_{n,k} \rangle\, g_{n,k}(t)$.
(b) Prove Theorem 5.8.
5.5. $^1$ Compute the trigonometric polynomials $\hat{\tilde h}(\omega)$ and $\hat{\tilde g}(\omega)$ of minimum degree that satisfy (5.78) for the spline filters (5.82, 5.83) with $m = 2$. Compute $\tilde\psi$ with WaveLab. Is it a finite energy function?
5.6. Compute a cubic spline dyadic wavelet with 2 vanishing moments using the filter $h$ defined by (5.82) for $m = 3$, with a filter $g$ having 3 non-zero coefficients. Compute in WaveLab the dyadic wavelet transform of the Lady signal with this new wavelet. Calculate $\tilde g[n]$ if $\tilde h[n] = h[n]$.
5.7. $^1$ Let $\{g(t - nu_0)\,\exp(ik\xi_0 t)\}_{(n,k) \in \mathbb{Z}^2}$ be a windowed Fourier frame defined by $g(t) = \pi^{-1/4} \exp(-t^2/2)$ with $u_0 = \xi_0$ and $u_0 \xi_0 < 2\pi$. With the conjugate gradient algorithm of Theorem 5.4, compute in Matlab the window $\tilde g(t)$ that generates the dual frame, for the values of $u_0 \xi_0$ in Table 5.1. Compare $\tilde g$ with $g$ and explain your result. Verify numerically that when $\xi_0 u_0 = 2\pi$ then $\tilde g$ is a discontinuous function that does not belong to $L^2(\mathbb{R})$.
5.8. $^1$ Prove that a finite set of $N$ vectors $\{\phi_n\}_{1 \le n \le N}$ is always a frame of the space $V$ generated by linear combinations of these vectors. With an example, show that the frame bounds $A$ and $B$ may go respectively to $0$ and $+\infty$ when $N$ goes to $+\infty$.
5.9. $^2$ Sigma-Delta converter. A signal $f(t)$ is sampled and quantized. We suppose that $\hat f$ has a support in $[-\pi/T, \pi/T]$.
(a) Let $x[n] = f(nT/K)$. Show that if $\omega \in [-\pi, \pi]$ then $\hat x(\omega) \neq 0$ only if $\omega \in [-\pi/K, \pi/K]$.
(b) Let $\tilde x[n] = Q(x[n])$ be the quantized samples. We now consider $x[n]$ as a random vector, and we model the error $x[n] - \tilde x[n] = W[n]$ as a white noise process of variance $\sigma^2$. Find the filter $h[n]$ that minimizes
$$\epsilon = E\{\|\tilde x \star h - x\|^2\},$$
and compute this minimum as a function of $\sigma^2$ and $K$. Compare your result with (5.43).
(c) Let $\hat h_p(\omega) = (1 - e^{-i\omega})^{-p}$ be the transfer function of a discrete integration of order $p$. We quantize $\tilde x[n] = Q(x \star h_p[n])$. Find the filter $h[n]$ that minimizes $\epsilon = E\{\|\tilde x \star h - x\|^2\}$, and compute this minimum as a function of $\sigma^2$, $K$ and $p$. For a fixed oversampling factor $K$, how can we reduce this error?
5.10. $^2$ Let $\psi$ be a dyadic wavelet that satisfies (5.68). Let $l^2(L^2(\mathbb{R}))$ be the space of sequences $\{g_j(u)\}_{j \in \mathbb{Z}}$ such that $\sum_{j=-\infty}^{+\infty} \|g_j\|^2 < +\infty$.
(a) Verify that if $f \in L^2(\mathbb{R})$ then $\{Wf(u,2^j)\}_{j \in \mathbb{Z}} \in l^2(L^2(\mathbb{R}))$. Let $\tilde\psi$ be defined by
$$\hat{\tilde\psi}(\omega) = \frac{\hat\psi(\omega)}{\sum_{j=-\infty}^{+\infty} |\hat\psi(2^j\omega)|^2}$$
and $W^{-1}$ be the operator defined by
$$W^{-1}\{g_j(u)\}_{j \in \mathbb{Z}} = \sum_{j=-\infty}^{+\infty} \frac{1}{2^j}\, g_j \star \tilde\psi_{2^j}(t).$$
Prove that $W^{-1}$ is the pseudo inverse of $W$ in $l^2(L^2(\mathbb{R}))$.
(b) Verify that $\tilde\psi$ has the same number of vanishing moments as $\psi$.
(c) Let $\mathbf{V}$ be the subspace of $l^2(L^2(\mathbb{R}))$ that regroups all the dyadic wavelet transforms of functions in $L^2(\mathbb{R})$. Compute the orthogonal projection of $\{g_j(u)\}_{j \in \mathbb{Z}}$ in $\mathbf{V}$.
5.11. $^1$ Prove that if there exist $A > 0$ and $B \ge 0$ such that
$$A\,(2 - |\hat h(\omega)|^2) \le |\hat g(\omega)|^2 \le B\,(2 - |\hat h(\omega)|^2), \tag{5.95}$$
and if $\phi$ defined in (5.74) belongs to $L^2(\mathbb{R})$, then the wavelet $\psi$ given by (5.75) is a dyadic wavelet.
5.12. $^2$ Zak transform. The Zak transform associates to any $f \in L^2(\mathbb{R})$
$$Zf(u,\xi) = \sum_{l=-\infty}^{+\infty} e^{i2\pi l\xi}\, f(u - l).$$
(a) Prove that it is a unitary operator from $L^2(\mathbb{R})$ to $L^2[0,1]^2$:
$$\int_{-\infty}^{+\infty} f(t)\,g^*(t)\,dt = \int_0^1\!\!\int_0^1 Zf(u,\xi)\,Zg^*(u,\xi)\,du\,d\xi,$$
by verifying that for $g = \mathbf{1}_{[0,1]}$ it transforms the orthogonal basis $\{g_{n,k}(t) = g(t-n)\,\exp(i2\pi kt)\}_{(n,k) \in \mathbb{Z}^2}$ of $L^2(\mathbb{R})$ into an orthonormal basis of $L^2[0,1]^2$.
(b) Prove that the inverse Zak transform is defined by
$$\forall h \in L^2[0,1]^2, \quad Z^{-1}h(u) = \int_0^1 h(u,\xi)\,d\xi.$$
(c) Prove that if $g \in L^2(\mathbb{R})$ then $\{g(t-n)\,\exp(i2\pi kt)\}_{(n,k) \in \mathbb{Z}^2}$ is a frame of $L^2(\mathbb{R})$ if and only if there exist $A > 0$ and $B$ such that
$$\forall (u,\xi) \in [0,1]^2, \quad A \le |Zg(u,\xi)|^2 \le B, \tag{5.96}$$
where $A$ and $B$ are the frame bounds.
(d) Prove that if (5.96) holds then the dual window $\tilde g$ of the dual frame is defined by $Z\tilde g(u,\xi) = 1/Zg^*(u,\xi)$.
5.13. $^3$ Suppose that $\hat f$ has a support in $[-\pi/T, \pi/T]$. Let $\{f(t_n)\}_{n \in \mathbb{Z}}$ be irregular samples that satisfy (5.4). With an inverse frame algorithm based on the conjugate gradient Theorem 5.4, implement in Matlab a procedure that computes $\{f(nT)\}_{n \in \mathbb{Z}}$ (from which $f$ can be recovered with the sampling Theorem 3.1). Analyze the convergence rate of the conjugate gradient algorithm as a function of $\delta$. What happens if the condition (5.4) is not satisfied?
5.14. $^3$ Develop a texture classification algorithm with a two-dimensional Gabor wavelet transform using four oriented wavelets. The classification procedure can be based on "feature vectors" that provide local averages of the wavelet transform amplitude at several scales, along these four orientations [224, 244, 285, 334].

Chapter 6
Wavelet Zoom
A wavelet transform can focus on localized signal structures with a zooming procedure that progressively reduces the scale parameter. Singularities and irregular structures often carry essential information in a signal. For example, discontinuities in the intensity of an image indicate the presence of edges in the scene. In electrocardiograms or radar signals, interesting information also lies in sharp transitions. We show that the local signal regularity is characterized by the decay of the wavelet transform amplitude across scales. Singularities and edges are detected by following the wavelet transform local maxima at fine scales.
Non-isolated singularities appear in complex signals such as multifractals. In recent years, Mandelbrot led a broad search for multifractals, showing that they are hidden in almost every corner of nature and science. The wavelet transform takes advantage of multifractal self-similarities, in order to compute the distribution of their singularities. This singularity spectrum is used to analyze multifractal properties. Throughout the chapter, the wavelets are real functions.

6.1 Lipschitz Regularity $^1$

To characterize singular structures, it is necessary to precisely quantify the local regularity of a signal $f(t)$. Lipschitz exponents provide uniform regularity measurements over time intervals, but also at any point $v$. If $f$ has a singularity at $v$, which means that it is not differentiable at $v$, then the Lipschitz exponent at $v$ characterizes this singular behavior.
The next section relates the uniform Lipschitz regularity of $f$ over $\mathbb{R}$ to the asymptotic decay of the amplitude of its Fourier transform. This global regularity measurement is useless in analyzing the signal properties at particular locations. Section 6.1.3 studies zooming procedures that measure local Lipschitz exponents from the decay of the wavelet transform amplitude at fine scales.

6.1.1 Lipschitz Definition and Fourier Analysis

The Taylor formula relates the differentiability of a signal to local polynomial approximations. Suppose that $f$ is $m$ times differentiable in
$[v-h, v+h]$. Let $p_v$ be the Taylor polynomial in the neighborhood of $v$:
$$p_v(t) = \sum_{k=0}^{m-1} \frac{f^{(k)}(v)}{k!}\,(t-v)^k. \tag{6.1}$$
The Taylor formula proves that the approximation error $\epsilon_v(t) = f(t) - p_v(t)$ satisfies
$$\forall t \in [v-h, v+h], \quad |\epsilon_v(t)| \le \frac{|t-v|^m}{m!} \sup_{u \in [v-h, v+h]} |f^{(m)}(u)|. \tag{6.2}$$
The $m$th order differentiability of $f$ in the neighborhood of $v$ yields an upper bound on the error $\epsilon_v(t)$ when $t$ tends to $v$. The Lipschitz regularity refines this upper bound with non-integer exponents. Lipschitz exponents are also called Hölder exponents in the mathematical literature.

Definition 6.1 (Lipschitz)
A function $f$ is pointwise Lipschitz $\alpha \ge 0$ at $v$, if there exist $K > 0$, and a polynomial $p_v$ of degree $m = \lfloor\alpha\rfloor$ such that
$$\forall t \in \mathbb{R}, \quad |f(t) - p_v(t)| \le K\,|t-v|^\alpha. \tag{6.3}$$
A function $f$ is uniformly Lipschitz $\alpha$ over $[a,b]$ if it satisfies (6.3) for all $v \in [a,b]$, with a constant $K$ that is independent of $v$.
The Lipschitz regularity of $f$ at $v$ or over $[a,b]$ is the sup of the $\alpha$ such that $f$ is Lipschitz $\alpha$.

At each $v$ the polynomial $p_v(t)$ is uniquely defined. If $f$ is $m = \lfloor\alpha\rfloor$ times continuously differentiable in a neighborhood of $v$, then $p_v$ is the Taylor expansion of $f$ at $v$. Pointwise Lipschitz exponents may vary arbitrarily from abscissa to abscissa. One can construct multifractal functions with non-isolated singularities, where $f$ has a different Lipschitz regularity at each point. In contrast, uniform Lipschitz exponents provide a more global measurement of regularity, which applies to a whole interval. If $f$ is uniformly Lipschitz $\alpha > m$ in the neighborhood of $v$ then one can verify that $f$ is necessarily $m$ times continuously differentiable in this neighborhood.
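As a rough numerical illustration of a pointwise exponent, consider the cusp $f(t) = |t|^{1/2}$ at $v = 0$. For exponents below 1 the polynomial in (6.3) is the constant $f(v)$, so the exponent appears as the slope of $\log|f(v+h) - f(v)|$ against $\log h$. The sketch below is only a heuristic under that assumption: the function `estimate_exponent` and the range of distances are hypothetical choices, and a log-log fit on one side of $v$ does not prove the bound (6.3).

```python
import numpy as np

def estimate_exponent(f, v, distances):
    """Slope of log|f(v + h) - f(v)| versus log h, a crude estimate of alpha < 1 at v."""
    increments = np.abs(f(v + distances) - f(v))
    slope, _ = np.polyfit(np.log(distances), np.log(increments), 1)
    return slope

def cusp(t):
    return np.sqrt(np.abs(t))

h = np.logspace(-8, -1, 50)  # distances |t - v| shrinking toward zero
alpha_cusp = estimate_exponent(cusp, 0.0, h)
print("estimated exponent of |t|^(1/2) at 0:", alpha_cusp)
```

For a smooth function the same fit saturates at slope 1, because larger exponents require a polynomial $p_v$ of higher degree. This limitation is the one that wavelets with vanishing moments are meant to remove.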
If $0 \le \alpha < 1$ then $p_v(t) = f(v)$ and the Lipschitz condition (6.3) becomes
$$\forall t \in \mathbb{R}, \quad |f(t) - f(v)| \le K\,|t-v|^\alpha.$$
A function that is bounded but discontinuous at $v$ is Lipschitz 0 at $v$. If the Lipschitz regularity is $\alpha < 1$ at $v$, then $f$ is not differentiable at $v$ and $\alpha$ characterizes the singularity type.

Fourier Condition. The uniform Lipschitz regularity of $f$ over $\mathbb{R}$ is related to the asymptotic decay of its Fourier transform. The following theorem can be interpreted as a generalization of Proposition 2.1.

Theorem 6.1 A function $f$ is bounded and uniformly Lipschitz $\alpha$ over $\mathbb{R}$ if
$$\int_{-\infty}^{+\infty} |\hat f(\omega)|\,(1 + |\omega|^\alpha)\,d\omega < +\infty. \tag{6.4}$$

Proof $^1$. To prove that $f$ is bounded, we use the inverse Fourier integral (2.8) and (6.4) which shows that
$$|f(t)| \le \int_{-\infty}^{+\infty} |\hat f(\omega)|\,d\omega < +\infty.$$
Let us now verify the Lipschitz condition (6.3) when $0 \le \alpha \le 1$. In this case $p_v(t) = f(v)$ and the uniform Lipschitz regularity means that there exists $K > 0$ such that for all $(t,v) \in \mathbb{R}^2$
$$\frac{|f(t) - f(v)|}{|t-v|^\alpha} \le K.$$
Since
$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\,\exp(i\omega t)\,d\omega,$$
$$\frac{|f(t) - f(v)|}{|t-v|^\alpha} \le \frac{1}{2\pi} \int_{-\infty}^{+\infty} |\hat f(\omega)|\,\frac{|\exp(i\omega t) - \exp(i\omega v)|}{|t-v|^\alpha}\,d\omega. \tag{6.5}$$
For $|t-v|^{-1} \le |\omega|$,
$$\frac{|\exp(i\omega t) - \exp(i\omega v)|}{|t-v|^\alpha} \le \frac{2}{|t-v|^\alpha} \le 2\,|\omega|^\alpha.$$
For $|t-v|^{-1} \ge |\omega|$,
$$\frac{|\exp(i\omega t) - \exp(i\omega v)|}{|t-v|^\alpha} \le \frac{|\omega|\,|t-v|}{|t-v|^\alpha} \le |\omega|^\alpha.$$
Cutting the integral (6.5) in two for $|\omega| < |t-v|^{-1}$ and $|\omega| \ge |t-v|^{-1}$ yields
$$\frac{|f(t) - f(v)|}{|t-v|^\alpha} \le \frac{1}{2\pi} \int_{-\infty}^{+\infty} 2\,|\hat f(\omega)|\,|\omega|^\alpha\,d\omega = K.$$
If (6.4) is satisfied, then $K < +\infty$ so $f$ is uniformly Lipschitz $\alpha$.
Let us extend this result for $m = \lfloor\alpha\rfloor > 0$. We proved in (2.42) that (6.4) implies that $f$ is $m$ times continuously differentiable. One can verify that $f$ is uniformly Lipschitz $\alpha$ over $\mathbb{R}$ if and only if $f^{(m)}$ is uniformly Lipschitz $\alpha - m$ over $\mathbb{R}$. The Fourier transform of $f^{(m)}$ is $(i\omega)^m \hat f(\omega)$. Since $0 \le \alpha - m < 1$, we can use our previous result which proves that $f^{(m)}$ is uniformly Lipschitz $\alpha - m$, and hence that $f$ is uniformly Lipschitz $\alpha$.

The Fourier transform is a powerful tool for measuring the minimum global regularity of functions. However, it is not possible to analyze the regularity of $f$ at a particular point $v$ from the decay of $|\hat f(\omega)|$ at high frequencies $\omega$. In contrast, since wavelets are well localized in time, the wavelet transform gives Lipschitz regularity over intervals and at points.

6.1.2 Wavelet Vanishing Moments

To measure the local regularity of a signal, it is not so important to use
a wavelet with a narrow frequency support, but vanishing moments are crucial. If the wavelet has $n$ vanishing moments then we show that the wavelet transform can be interpreted as a multiscale differential operator of order $n$. This yields a first relation between the differentiability of $f$ and its wavelet transform decay at fine scales.

Polynomial Suppression. The Lipschitz property (6.3) approximates $f$ with a polynomial $p_v$ in the neighborhood of $v$:
$$f(t) = p_v(t) + \epsilon_v(t) \quad \text{with} \quad |\epsilon_v(t)| \le K\,|t-v|^\alpha. \tag{6.6}$$
A wavelet transform estimates the exponent $\alpha$ by ignoring the polynomial $p_v$. For this purpose, we use a wavelet that has $n > \alpha$ vanishing moments:
$$\int_{-\infty}^{+\infty} t^k\,\psi(t)\,dt = 0 \quad \text{for } 0 \le k < n.$$
A wavelet with $n$ vanishing moments is orthogonal to polynomials of degree $n-1$. Since $\alpha < n$, the polynomial $p_v$ has degree at most $n-1$. With the change of variable $t' = (t-u)/s$ we verify that
$$Wp_v(u,s) = \int_{-\infty}^{+\infty} p_v(t)\,\frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-u}{s}\right) dt = 0. \tag{6.7}$$
Since $f = p_v + \epsilon_v$,
$$Wf(u,s) = W\epsilon_v(u,s). \tag{6.8}$$
Section 6.1.3 explains how to measure $\alpha$ from $|Wf(u,s)|$ when $u$ is in the neighborhood of $v$.

Multiscale Differential Operator. The following theorem proves that a wavelet with $n$ vanishing moments can be written as the $n$th order derivative of a function $\theta$; the resulting wavelet transform is a multiscale differential operator. We suppose that $\psi$ has a fast decay which means that for any decay exponent $m \in \mathbb{N}$ there exists $C_m$ such that
$$\forall t \in \mathbb{R}, \quad |\psi(t)| \le \frac{C_m}{1 + |t|^m}. \tag{6.9}$$

Theorem 6.2 A wavelet $\psi$ with a fast decay has $n$ vanishing moments
if and only if there exists with a fast decay such that
n
n d (t) :
(t) = (;1)
n
As a consequence dt dn
Wf (u s) = sn dun (f ? s)(u) (6.10)
(6.11) with s(t) = s;1=2 (;t=s)+1
R . Moreover, 0. has no more than n vanishing
moments if and only if ;1 (t) dt 6=
Proof 1 . The fast decay of implies that ^ is C1. This is proved by
setting f = ^ in Proposition 2.1. The integral of a function is equal to
its Fourier transform evaluated at ! = 0. The derivative property (2.22)
implies that for any k < n Z +1
;1 tk (t) dt = (i)k ^(k) (0) = 0: (6.12) We can therefore make the factorization
^(!) = (;i!)n ^(!)
(6.13)
and ^(!) is bounded. The fast decay of is proved with an induction on
n. For n = 1,
(t) = Zt ;1 (u) du = Z +1
t (u) du and the fast decay of is derived from (6.9). We then similarly verify
that increasing by 1 the order of integration up to n maintains the fast
decay of .
R +1
Conversely, j ^(!)j ;1 j (t)j dt < +1, because has a fast decay.
The Fourier transform of (6.10) yields (6.13) which implies that ^(k) (0) =
0 for k < n. It follows from (6.12) that has n vanishing moments.
To test whether has more than n vanishing moments, we compute
with (6.13) Z +1
;1 tn (t) dt = (i)n ^(n) (0) = (;i)n n! ^(0): 6.1. LIPSCHITZ REGULARITY 233 Clearly, has no more than n vanishing moments if and only if ^(0) =
R +1 (t) dt 6= 0.
;1
The wavelet transform (4.32) can be written
$$Wf(u,s) = f \star \bar\psi_s(u) \quad \text{with} \quad \bar\psi_s(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{-t}{s}\right). \tag{6.14}$$
We derive from (6.10) that $\bar\psi_s(t) = s^n \frac{d^n \bar\theta_s(t)}{dt^n}$. Commuting the convolution and differentiation operators yields
$$Wf(u,s) = s^n\, f \star \frac{d^n \bar\theta_s}{dt^n}(u) = s^n \frac{d^n}{du^n}(f \star \bar\theta_s)(u). \tag{6.15}$$
If $K = \int_{-\infty}^{+\infty} \theta(t)\,dt \neq 0$ then the convolution $f \star \bar\theta_s(t)$ can be interpreted as a weighted average of f with a kernel dilated by s. So (6.11) proves that $Wf(u,s)$ is an nth order derivative of an averaging of f over a domain proportional to s. Figure 6.1 shows a wavelet transform calculated with $\psi = -\theta'$, where $\theta$ is a Gaussian. The resulting $Wf(u,s)$ is the derivative of f averaged in the neighborhood of u with a Gaussian kernel dilated by s.
Since $\theta$ has a fast decay, one can verify that
$$\lim_{s \to 0} \frac{1}{\sqrt{s}}\,\bar\theta_s = K\,\delta$$
in the sense of the weak convergence (A.30). This means that for any $\phi$ that is continuous at u,
$$\lim_{s \to 0} \phi \star \frac{1}{\sqrt{s}}\,\bar\theta_s(u) = K\,\phi(u).$$
If f is n times continuously differentiable in the neighborhood of u then (6.11) implies that
$$\lim_{s \to 0} \frac{Wf(u,s)}{s^{n+1/2}} = \lim_{s \to 0} f^{(n)} \star \frac{1}{\sqrt{s}}\,\bar\theta_s(u) = K\,f^{(n)}(u). \tag{6.16}$$
In particular, if f is $C^n$ with a bounded nth order derivative then $|Wf(u,s)| = O(s^{n+1/2})$. This is a first relation between the decay of $|Wf(u,s)|$ when s decreases and the uniform regularity of f. Finer relations are studied in the next section.
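The limit (6.16) can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $\theta(t) = \exp(-t^2/2)$ (so $\psi = -\theta'$, n = 1 and $K = \sqrt{2\pi}$), uses $f = \sin$ as a test signal, and evaluates the wavelet integral with a plain Riemann sum. The names `wavelet_coeff`, `X`, `PSI` and `DX` are hypothetical helpers, not part of the text.

```python
import numpy as np

# Assumed setup: theta(t) = exp(-t^2/2), so psi = -theta' has n = 1 vanishing moment.
X = np.linspace(-8.0, 8.0, 4001)      # rescaled variable x = (t - u) / s
PSI = X * np.exp(-X**2 / 2)           # psi(x) = -theta'(x)
DX = X[1] - X[0]
K = np.sqrt(2 * np.pi)                # K = integral of theta

def wavelet_coeff(f, u, s):
    """Wf(u, s) = int f(t) s^{-1/2} psi((t - u)/s) dt, as a Riemann sum in x."""
    # With t = u + s x, dt = s dx, so the integral is sqrt(s) * int f(u + s x) psi(x) dx.
    return np.sqrt(s) * np.sum(f(u + s * X) * PSI) * DX

u = 0.3
scales = (1e-1, 1e-2, 1e-3)
# Ratio Wf(u, s) / s^{n + 1/2} with n = 1, which (6.16) predicts tends to K f'(u).
ratios = [wavelet_coeff(np.sin, u, s) / s**1.5 for s in scales]
limit = K * np.cos(u)                 # K f'(u) for f = sin
```

As s decreases, the entries of `ratios` approach `limit`, which is the behaviour (6.16) describes for n = 1.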
[Figure 6.1: Wavelet transform $Wf(u,s)$ calculated with $\psi = -\theta'$ where $\theta$ is a Gaussian, for the signal f shown above. The position parameter u and the scale s vary respectively along the horizontal and vertical axes. Black, grey and white points correspond respectively to positive, zero and negative wavelet coefficients. Singularities create large amplitude coefficients in their cone of influence.]

6.1.3 Regularity Measurements with Wavelets

The decay of the wavelet transform amplitude across scales is related to
the uniform and pointwise Lipschitz regularity of the signal. Measuring
this asymptotic decay is equivalent to zooming into signal structures
with a scale that goes to zero. We suppose that the wavelet $\psi$ has n vanishing moments and is $C^n$ with derivatives that have a fast decay. This means that for any $0 \le k \le n$ and $m \in \mathbb{N}$ there exists $C_m$ such that
$$\forall t \in \mathbb{R}, \quad |\psi^{(k)}(t)| \le \frac{C_m}{1 + |t|^m}. \tag{6.17}$$
The following theorem relates the uniform Lipschitz regularity of f on an interval to the amplitude of its wavelet transform at fine scales.

Theorem 6.3 If $f \in L^2(\mathbb{R})$ is uniformly Lipschitz $\alpha \le n$ over $[a,b]$, then there exists $A > 0$ such that
$$\forall (u,s) \in [a,b] \times \mathbb{R}^+, \quad |Wf(u,s)| \le A\,s^{\alpha+1/2}. \tag{6.18}$$
Conversely, suppose that f is bounded and that $Wf(u,s)$ satisfies (6.18) for an $\alpha < n$ that is not an integer. Then f is uniformly Lipschitz $\alpha$ on $[a+\epsilon, b-\epsilon]$, for any $\epsilon > 0$.
Proof$^3$. This theorem is proved with minor modifications in the proof of Theorem 6.4. Since f is Lipschitz $\alpha$ at any $v \in [a,b]$, Theorem 6.4 shows in (6.21) that
$$\forall (u,s) \in \mathbb{R} \times \mathbb{R}^+, \quad |Wf(u,s)| \le A\,s^{\alpha+1/2}\left(1 + \left|\frac{u-v}{s}\right|^{\alpha}\right).$$
For $u \in [a,b]$, we can choose $v = u$, which implies that $|Wf(u,s)| \le A\,s^{\alpha+1/2}$. We verify from the proof of (6.21) that the constant A does not depend on v because the Lipschitz regularity is uniform over $[a,b]$.

To prove that f is uniformly Lipschitz $\alpha$ over $[a+\epsilon, b-\epsilon]$ we must verify that there exists K such that for all $v \in [a+\epsilon, b-\epsilon]$ we can find a polynomial $p_v$ of degree $\lfloor\alpha\rfloor$ such that
$$\forall t \in \mathbb{R}, \quad |f(t) - p_v(t)| \le K\,|t-v|^{\alpha}. \tag{6.19}$$
When $t \notin [a+\epsilon/2, b-\epsilon/2]$ then $|t-v| \ge \epsilon/2$ and since f is bounded, (6.19) is verified with a constant K that depends on $\epsilon$. For $t \in [a+\epsilon/2, b-\epsilon/2]$, the proof follows the same derivations as the proof of pointwise Lipschitz regularity from (6.22) in Theorem 6.4. The upper bounds (6.27) and (6.28) are replaced by
$$\forall t \in [a+\epsilon/2, b-\epsilon/2], \quad |\Delta_j^{(k)}(t)| \le K\,2^{(\alpha-k)j} \quad \text{for } 0 \le k \le \lfloor\alpha\rfloor + 1. \tag{6.20}$$
This inequality is verified by computing an upper bound integral similar to (6.26) but which is divided in two, for $u \in [a,b]$ and $u \notin [a,b]$. When $u \in [a,b]$, the condition (6.22) is replaced by $|Wf(u,s)| \le A\,s^{\alpha+1/2}$ in (6.26). When $u \notin [a,b]$, we just use the fact that $|Wf(u,s)| \le \|f\|\,\|\psi\|$ and derive (6.20) from the fast decay of $|\psi^{(k)}(t)|$, by observing that $|t-u| \ge \epsilon/2$ for $t \in [a+\epsilon/2, b-\epsilon/2]$. The constant K depends on A and $\epsilon$ but not on v. The proof then proceeds like the proof of Theorem 6.4, and since the resulting constant K in (6.30) does not depend on v, the Lipschitz regularity is uniform over $[a+\epsilon, b-\epsilon]$.

The inequality (6.18) is really a condition on the asymptotic decay of
$|Wf(u,s)|$ when s goes to zero. At large scales it does not introduce any constraint since the Cauchy-Schwarz inequality guarantees that the wavelet transform is bounded:
$$|Wf(u,s)| = |\langle f, \psi_{u,s} \rangle| \le \|f\|\,\|\psi\|.$$
When the scale s decreases, $Wf(u,s)$ measures fine scale variations in the neighborhood of u. Theorem 6.3 proves that $|Wf(u,s)|$ decays like $s^{\alpha+1/2}$ over intervals where f is uniformly Lipschitz $\alpha$.

Observe that the upper bound (6.18) is similar to the sufficient Fourier condition of Theorem 6.1, which supposes that $|\hat f(\omega)|$ decays faster than $\omega^{-\alpha}$. The wavelet scale s plays the role of a "localized" inverse frequency $\omega^{-1}$. As opposed to the Fourier transform Theorem 6.1, the wavelet transform gives a Lipschitz regularity condition that is localized over any finite interval and it provides a necessary condition which is nearly sufficient. When $[a,b] = \mathbb{R}$ then (6.18) is a necessary and sufficient condition for f to be uniformly Lipschitz $\alpha$ on $\mathbb{R}$.

If $\psi$ has exactly n vanishing moments then the wavelet transform decay gives no information concerning the Lipschitz regularity of f for $\alpha > n$. If f is uniformly Lipschitz $\alpha > n$ then it is $C^n$ and (6.16) proves that $\lim_{s \to 0} s^{-n-1/2}\,Wf(u,s) = K\,f^{(n)}(u)$ with $K \neq 0$. This proves that $|Wf(u,s)| \sim s^{n+1/2}$ at fine scales despite the higher regularity of f.

If the Lipschitz exponent $\alpha$ is an integer then (6.18) is not sufficient in order to prove that f is uniformly Lipschitz $\alpha$. When $[a,b] = \mathbb{R}$, if $\alpha = 1$ and $\psi$ has two vanishing moments, then the class of functions that satisfy (6.18) is called the Zygmund class [47]. It is slightly larger than the set of functions that are uniformly Lipschitz 1. For example, $f(t) = t \log_e t$ belongs to the Zygmund class although it is not Lipschitz 1 at $t = 0$.

Pointwise Lipschitz Regularity The study of pointwise Lipschitz exponents with the wavelet transform is a delicate and beautiful topic which finds its mathematical roots in the characterization of Sobolev spaces by Littlewood and Paley in the 1930's. Characterizing the regularity of f at a point v can be difficult because f may have very different types of singularities that are aggregated in the neighborhood of v. In 1984, Bony [99] introduced the "two-microlocalization" theory which refines the Littlewood-Paley approach to provide pointwise characterization of singularities, which he used to study the solution of hyperbolic partial differential equations. These technical results became much simpler through the work of Jaffard [220] who proved that the two-microlocalization properties are equivalent to specific decay conditions on the wavelet transform amplitude. The following theorem gives a necessary condition and a sufficient condition on the wavelet transform for estimating the Lipschitz regularity of f at a point v. Remember that the wavelet $\psi$ has n vanishing moments and n derivatives having a fast decay.
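Theorem 6.4 (Jaffard) If $f \in L^2(\mathbb{R})$ is Lipschitz $\alpha \le n$ at v, then there exists A such that
$$\forall (u,s) \in \mathbb{R} \times \mathbb{R}^+, \quad |Wf(u,s)| \le A\,s^{\alpha+1/2}\left(1 + \left|\frac{u-v}{s}\right|^{\alpha}\right). \tag{6.21}$$
Conversely, if $\alpha < n$ is not an integer and there exist A and $\alpha' < \alpha$ such that
$$\forall (u,s) \in \mathbb{R} \times \mathbb{R}^+, \quad |Wf(u,s)| \le A\,s^{\alpha+1/2}\left(1 + \left|\frac{u-v}{s}\right|^{\alpha'}\right) \tag{6.22}$$
then f is Lipschitz $\alpha$ at v.

Proof. The necessary condition is relatively simple to prove but the sufficient condition is much more difficult.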
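Proof$^1$ of (6.21) Since f is Lipschitz $\alpha$ at v, there exists a polynomial $p_v$ of degree $\lfloor\alpha\rfloor < n$ and K such that $|f(t) - p_v(t)| \le K\,|t-v|^{\alpha}$. Since $\psi$ has n vanishing moments, we saw in (6.7) that $Wp_v(u,s) = 0$ and hence
$$|Wf(u,s)| = \left|\int_{-\infty}^{+\infty} \big(f(t) - p_v(t)\big)\,\frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-u}{s}\right) dt\right| \le \int_{-\infty}^{+\infty} K\,|t-v|^{\alpha}\,\frac{1}{\sqrt{s}}\left|\psi\!\left(\frac{t-u}{s}\right)\right| dt.$$
The change of variable $x = (t-u)/s$ gives
$$|Wf(u,s)| \le \sqrt{s} \int_{-\infty}^{+\infty} K\,|sx + u - v|^{\alpha}\,|\psi(x)|\,dx.$$
Since $|a+b|^{\alpha} \le 2^{\alpha}\,(|a|^{\alpha} + |b|^{\alpha})$,
$$|Wf(u,s)| \le K\,2^{\alpha}\,\sqrt{s}\left(s^{\alpha} \int_{-\infty}^{+\infty} |x|^{\alpha}\,|\psi(x)|\,dx + |u-v|^{\alpha} \int_{-\infty}^{+\infty} |\psi(x)|\,dx\right),$$
which proves (6.21).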
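Proof$^2$ of (6.22) The wavelet reconstruction formula (4.37) proves that f can be decomposed in a Littlewood-Paley type sum
$$f(t) = \sum_{j=-\infty}^{+\infty} \Delta_j(t) \tag{6.23}$$
with
$$\Delta_j(t) = \frac{1}{C_\psi} \int_{-\infty}^{+\infty} \int_{2^j}^{2^{j+1}} Wf(u,s)\,\frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-u}{s}\right) \frac{ds}{s^2}\,du. \tag{6.24}$$
Let $\Delta_j^{(k)}$ be its kth order derivative. To prove that f is Lipschitz $\alpha$ at v we shall approximate f with a polynomial that generalizes the Taylor polynomial
$$p_v(t) = \sum_{k=0}^{\lfloor\alpha\rfloor} \left(\sum_{j=-\infty}^{+\infty} \Delta_j^{(k)}(v)\right) \frac{(t-v)^k}{k!}. \tag{6.25}$$
If f is n times differentiable at v then $p_v$ corresponds to the Taylor polynomial but this is not necessarily true. We shall first prove that $\sum_{j=-\infty}^{+\infty} \Delta_j^{(k)}(v)$ is finite by getting upper bounds on $|\Delta_j^{(k)}(t)|$. These sums may be thought of as a generalization of pointwise derivatives.

To simplify the notation, we denote by K a generic constant which may change value from one line to the next but that does not depend on j and t. The hypothesis (6.22) and the asymptotic decay condition (6.17) imply that
$$|\Delta_j(t)| \le K \int_{-\infty}^{+\infty} \int_{2^j}^{2^{j+1}} A\,s^{\alpha}\left(1 + \left|\frac{u-v}{s}\right|^{\alpha'}\right) \frac{C_m}{1 + |(t-u)/s|^m}\,\frac{ds}{s^2}\,du$$
$$\le K \int_{-\infty}^{+\infty} 2^{\alpha j}\left(1 + \left|\frac{u-v}{2^j}\right|^{\alpha'}\right) \frac{1}{1 + |(t-u)/2^j|^m}\,\frac{du}{2^j}. \tag{6.26}$$
Since $|u-v|^{\alpha'} \le 2^{\alpha'}\,(|u-t|^{\alpha'} + |t-v|^{\alpha'})$, the change of variable $u' = 2^{-j}(u-t)$ yields
$$|\Delta_j(t)| \le K\,2^{\alpha j} \int_{-\infty}^{+\infty} \frac{1 + |u'|^{\alpha'} + |(v-t)/2^j|^{\alpha'}}{1 + |u'|^m}\,du'.$$
Choosing $m = \alpha' + 2$ yields
$$|\Delta_j(t)| \le K\,2^{\alpha j}\left(1 + \left|\frac{v-t}{2^j}\right|^{\alpha'}\right). \tag{6.27}$$
The same derivations applied to the derivatives of $\Delta_j(t)$ yield
$$\forall k \le \lfloor\alpha\rfloor + 1, \quad |\Delta_j^{(k)}(t)| \le K\,2^{(\alpha-k)j}\left(1 + \left|\frac{v-t}{2^j}\right|^{\alpha'}\right). \tag{6.28}$$
At $t = v$ it follows that
$$\forall k \le \lfloor\alpha\rfloor, \quad |\Delta_j^{(k)}(v)| \le K\,2^{(\alpha-k)j}. \tag{6.29}$$
This guarantees a fast decay of $|\Delta_j^{(k)}(v)|$ when $2^j$ goes to zero, because $\alpha$ is not an integer so $\alpha > \lfloor\alpha\rfloor$. At large scales $2^j$, since $|Wf(u,s)| \le \|f\|\,\|\psi\|$, with the change of variable $u' = (t-u)/s$ in (6.24) we have
$$|\Delta_j^{(k)}(v)| \le \frac{\|f\|\,\|\psi\|}{C_\psi} \int_{-\infty}^{+\infty} |\psi^{(k)}(u')|\,du' \int_{2^j}^{2^{j+1}} \frac{ds}{s^{3/2+k}}$$
and hence $|\Delta_j^{(k)}(v)| \le K\,2^{-(k+1/2)j}$. Together with (6.29) this proves that the polynomial $p_v$ defined in (6.25) has coefficients that are finite.

With the Littlewood-Paley decomposition (6.23) we compute
$$|f(t) - p_v(t)| = \left|\sum_{j=-\infty}^{+\infty} \left(\Delta_j(t) - \sum_{k=0}^{\lfloor\alpha\rfloor} \Delta_j^{(k)}(v)\,\frac{(t-v)^k}{k!}\right)\right|.$$
The sum over scales is divided in two at $2^J$ such that $2^J \ge |t-v| \ge 2^{J-1}$. For $j \ge J$, we can use the classical Taylor theorem to bound the Taylor expansion of $\Delta_j$:
$$I = \sum_{j=J}^{+\infty} \left|\Delta_j(t) - \sum_{k=0}^{\lfloor\alpha\rfloor} \Delta_j^{(k)}(v)\,\frac{(t-v)^k}{k!}\right| \le \sum_{j=J}^{+\infty} \frac{|t-v|^{\lfloor\alpha\rfloor+1}}{(\lfloor\alpha\rfloor+1)!}\,\sup_{h \in [t,v]} |\Delta_j^{(\lfloor\alpha\rfloor+1)}(h)|.$$
Inserting (6.28) yields
$$I \le K\,|t-v|^{\lfloor\alpha\rfloor+1} \sum_{j=J}^{+\infty} 2^{-j(\lfloor\alpha\rfloor+1-\alpha)} \left|\frac{v-t}{2^j}\right|^{\alpha'}$$
and since $2^J \ge |t-v| \ge 2^{J-1}$ we get $I \le K\,|v-t|^{\alpha}$.

Let us now consider the case $j < J$:
$$II = \sum_{j=-\infty}^{J-1} \left|\Delta_j(t) - \sum_{k=0}^{\lfloor\alpha\rfloor} \Delta_j^{(k)}(v)\,\frac{(t-v)^k}{k!}\right| \le K \sum_{j=-\infty}^{J-1} \left(2^{\alpha j}\left(1 + \left|\frac{v-t}{2^j}\right|^{\alpha'}\right) + \sum_{k=0}^{\lfloor\alpha\rfloor} \frac{|t-v|^k}{k!}\,2^{j(\alpha-k)}\right)$$
$$\le K \left(2^{\alpha J} + 2^{(\alpha-\alpha')J}\,|t-v|^{\alpha'} + \sum_{k=0}^{\lfloor\alpha\rfloor} \frac{|t-v|^k}{k!}\,2^{J(\alpha-k)}\right)$$
and since $2^J \ge |t-v| \ge 2^{J-1}$ we get $II \le K\,|v-t|^{\alpha}$. As a result
$$|f(t) - p_v(t)| \le I + II \le K\,|v-t|^{\alpha} \tag{6.30}$$
which proves that f is Lipschitz $\alpha$ at v.

Cone of Influence To interpret more easily the necessary condition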
(6.21) and the sufficient condition (6.22), we shall suppose that $\psi$ has a compact support equal to $[-C, C]$. The cone of influence of v in the scale-space plane is the set of points $(u,s)$ such that v is included in the support of $\psi_{u,s}(t) = s^{-1/2}\,\psi((t-u)/s)$. Since the support of $\psi((t-u)/s)$ is equal to $[u - Cs, u + Cs]$, the cone of influence of v is defined by
$$|u - v| \le C\,s. \tag{6.31}$$
It is illustrated in Figure 6.2. If u is in the cone of influence of v then $Wf(u,s) = \langle f, \psi_{u,s} \rangle$ depends on the value of f in the neighborhood of v. Since $|u-v|/s \le C$, the conditions (6.21, 6.22) can be written
$$|Wf(u,s)| \le A'\,s^{\alpha+1/2},$$
which is identical to the uniform Lipschitz condition (6.18) given by Theorem 6.3. In Figure 6.1, the high amplitude wavelet coefficients are in the cone of influence of each singularity.
[Figure 6.2: The cone of influence of an abscissa v consists of the scale-space points $(u,s)$ for which the support of $\psi_{u,s}$ intersects $t = v$.]

Oscillating Singularities It may seem surprising that (6.21, 6.22) also impose a condition on the wavelet transform outside the cone of influence of v. Indeed, this corresponds to wavelets whose support does not intersect v. For $|u-v| > Cs$ we get
$$|Wf(u,s)| \le A'\,s^{\alpha-\alpha'+1/2}\,|u-v|^{\alpha'}. \tag{6.32}$$
We shall see that it is indeed necessary to impose this decay when u tends to v in order to control the oscillations of f that might generate singularities.
Let us consider the generic example of a highly oscillatory function
$$f(t) = \sin\frac{1}{t},$$
which is discontinuous at $v = 0$ because of the acceleration of its oscillations. Since $\psi$ is a smooth $C^n$ function, if it is centered close to zero then the rapid oscillations of $\sin t^{-1}$ produce a correlation integral $\langle \sin t^{-1}, \psi_{u,s} \rangle$ that is very small. With an integration by parts, one can verify that if $(u,s)$ is in the cone of influence of $v = 0$, then $|Wf(u,s)| \le A\,s^{2+1/2}$. This looks as if f is Lipschitz 2 at 0. However, Figure 6.3 shows high energy wavelet coefficients below the cone of influence of $v = 0$, which are responsible for the discontinuity. To guarantee that f is Lipschitz $\alpha$, the amplitude of such coefficients is controlled by the upper bound (6.32).
[Figure 6.3: Wavelet transform of $f(t) = \sin(a\,t^{-1})$ calculated with $\psi = -\theta'$ where $\theta$ is a Gaussian. High amplitude coefficients are along a parabola below the cone of influence of $t = 0$.]
To explain why the high frequency oscillations appear below the cone of influence of v, we use the results of Section 4.4.2 on the estimation of instantaneous frequencies with wavelet ridges. The instantaneous frequency of $\sin t^{-1} = \sin \phi(t)$ is $|\phi'(t)| = t^{-2}$. Let $\psi^a$ be the analytic part of $\psi$, defined in (4.47). The corresponding complex analytic wavelet transform is $W^a f(u,s) = \langle f, \psi^a_{u,s} \rangle$. It was proved in (4.101) that for a fixed time u, the maximum of $s^{-1/2}\,|W^a f(u,s)|$ is located at the scale
$$s(u) = \frac{\eta}{|\phi'(u)|} = \eta\,u^2,$$
where $\eta$ is the center frequency of $\hat\psi^a(\omega)$. When u varies, the set of points $(u, s(u))$ define a ridge that is a parabola located below the cone of influence of $v = 0$ in the plane $(u,s)$. Since $\psi = \mathrm{Real}[\psi^a]$, the real wavelet transform is
$$Wf(u,s) = \mathrm{Real}[W^a f(u,s)].$$
The high amplitude values of $Wf(u,s)$ are thus located along the same parabola ridge curve in the scale-space plane, which clearly appears in Figure 6.3. Real wavelet coefficients $Wf(u,s)$ change sign along the ridge because of the variations of the complex phase of $W^a f(u,s)$.
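The position of this ridge relative to the cone (6.31) can be illustrated with a few lines of code. This is only a sketch: the values of `eta` and `C` are hypothetical, and `ridge_scale` and `outside_cone` are illustrative helper names. It checks that the points $(u, \eta u^2)$ satisfy $|u - v| > Cs$ (the region the text calls "below the cone") as long as $|u| < 1/(C\eta)$.

```python
import numpy as np

def ridge_scale(u, eta):
    """Ridge s(u) = eta / |phi'(u)| = eta * u**2 for f(t) = sin(1/t)."""
    return eta * np.asarray(u, dtype=float) ** 2

def outside_cone(u, s, C, v=0.0):
    """True where (u, s) violates the cone condition |u - v| <= C s of (6.31)."""
    return np.abs(np.asarray(u, dtype=float) - v) > C * np.asarray(s, dtype=float)

eta, C = 5.0, 4.0                       # hypothetical center frequency and support half-width
u = np.linspace(0.001, 0.049, 49)       # abscissas with |u| < 1 / (C * eta) = 0.05
ridge_is_outside = outside_cone(u, ridge_scale(u, eta), C)
```

For larger $|u|$ the parabola re-enters the cone, so the statement is local to the neighborhood of $v = 0$.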
The example of $f(t) = \sin t^{-1}$ can be extended to general oscillatory singularities [33]. A function f has an oscillatory singularity at v if there exist $\alpha \ge 0$ and $\beta > 0$ such that for t in a neighborhood of v
$$f(t) \sim |t-v|^{\alpha}\,g\!\left(\frac{1}{|t-v|^{\beta}}\right),$$
where $g(t)$ is a $C^\infty$ oscillating function whose primitives at any order are bounded. The function $g(t) = \sin t$ is a typical example. The oscillations have an instantaneous frequency $\phi'(t)$ that increases to infinity faster than $|t-v|^{-1}$ when t goes to v. High energy wavelet coefficients are located along the ridge $s(u) = \eta/\phi'(u)$, and this curve is necessarily below the cone of influence $|u-v| \le C\,s$.

6.2 Wavelet Transform Modulus Maxima $^2$

Theorems 6.3 and 6.4 prove that the local Lipschitz regularity of f at v depends on the decay at fine scales of $|Wf(u,s)|$ in the neighborhood of v. Measuring this decay directly in the time-scale plane $(u,s)$ is not necessary. The decay of $|Wf(u,s)|$ can indeed be controlled from its local maxima values.
We use the term modulus maximum to describe any point $(u_0, s_0)$ such that $|Wf(u,s_0)|$ is locally maximum at $u = u_0$. This implies that
$$\frac{\partial Wf(u_0,s_0)}{\partial u} = 0.$$
This local maximum should be a strict local maximum in either the right or the left neighborhood of $u_0$, to avoid having any local maxima when $|Wf(u,s_0)|$ is constant. We call maxima line any connected curve $s(u)$ in the scale-space plane $(u,s)$ along which all points are modulus maxima. Figure 6.5(b) shows the wavelet modulus maxima of a signal.

6.2.1 Detection of Singularities

Singularities are detected by finding the abscissa where the wavelet modulus maxima converge at fine scales. To better understand the properties of these maxima, the wavelet transform is written as a multiscale differential operator. Theorem 6.2 proves that if $\psi$ has exactly n vanishing moments and a compact support, then there exists $\theta$ of compact support such that $\psi = (-1)^n\,\theta^{(n)}$ with $\int_{-\infty}^{+\infty} \theta(t)\,dt \neq 0$. The wavelet transform is rewritten in (6.11) as a multiscale differential operator
$$Wf(u,s) = s^n \frac{d^n}{du^n}(f \star \bar\theta_s)(u). \tag{6.33}$$
If the wavelet has only one vanishing moment, wavelet modulus maxima are the maxima of the first order derivative of f smoothed by $\bar\theta_s$, as illustrated by Figure 6.4. These multiscale modulus maxima are used to locate discontinuities, and edges in images. If the wavelet has two vanishing moments, the modulus maxima correspond to high curvatures. The following theorem proves that if $Wf(u,s)$ has no modulus
maxima at fine scales, then f is locally regular.

Theorem 6.5 (Hwang, Mallat) Suppose that $\psi$ is $C^n$ with a compact support, and $\psi = (-1)^n\,\theta^{(n)}$ with $\int_{-\infty}^{+\infty} \theta(t)\,dt \neq 0$. Let $f \in L^1[a,b]$. If there exists $s_0 > 0$ such that $|Wf(u,s)|$ has no local maximum for $u \in [a,b]$ and $s < s_0$, then f is uniformly Lipschitz n on $[a+\epsilon, b-\epsilon]$, for any $\epsilon > 0$.

[Figure 6.4: The convolution $f \star \bar\theta_s(u)$ averages f over a domain proportional to s. If $\psi = -\theta'$ then $W_1 f(u,s) = s\,\frac{d}{du}(f \star \bar\theta_s)(u)$ has modulus maxima at sharp variation points of $f \star \bar\theta_s(u)$. If $\psi = \theta''$ then the modulus maxima of $W_2 f(u,s) = s^2\,\frac{d^2}{du^2}(f \star \bar\theta_s)(u)$ correspond to locally maximum curvatures.]
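The detection of modulus maxima at one fixed scale can be sketched as follows. This is a minimal illustration, not the algorithm of the text: it assumes $\theta(t) = \exp(-t^2/2)$ and $\psi = -\theta'$, computes $Wf(\cdot, s)$ by a direct discrete convolution, and uses a relative threshold (a hypothetical parameter) to discard spurious maxima where the transform is numerically close to zero. The function names are illustrative.

```python
import numpy as np

def wavelet_row(f, s, dt):
    """Samples of Wf(., s) = f * psibar_s for psi = -theta', theta(t) = exp(-t^2/2)."""
    half = int(np.ceil(6 * s / dt))              # truncate the kernel at |t| = 6 s
    x = np.arange(-half, half + 1) * dt / s
    psi = x * np.exp(-x**2 / 2)
    kernel = -psi / np.sqrt(s)                   # psibar_s(t) = s^{-1/2} psi(-t/s), psi is odd
    return np.convolve(f, kernel, mode="same") * dt

def modulus_maxima(row, rel_threshold=1e-3):
    """Indices where |row| is a local maximum (strict on the left, so a plateau counts once)."""
    m = np.abs(row)
    inner = m[1:-1]
    is_max = (inner > m[:-2]) & (inner >= m[2:]) & (inner > rel_threshold * m.max())
    return np.flatnonzero(is_max) + 1

N = 1024
dt = 1.0 / N
t = np.arange(N) * dt
f = ((t >= 0.3) & (t < 0.7)).astype(float)       # box signal with two discontinuities
row = wavelet_row(f, s=0.02, dt=dt)
idx = modulus_maxima(row)                        # expected near t = 0.3 and t = 0.7
```

With one vanishing moment, the two maxima sit at the sharp variation points of the smoothed signal, with a positive coefficient at the rising edge and a negative one at the falling edge.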
This theorem is proved in [258]. It implies that f can be singular (not Lipschitz 1) at a point v only if there is a sequence of wavelet maxima points $(u_p, s_p)_{p \in \mathbb{N}}$ that converges towards v at fine scales:
$$\lim_{p \to +\infty} u_p = v \quad \text{and} \quad \lim_{p \to +\infty} s_p = 0.$$
These modulus maxima points may or may not be along the same maxima line. This result guarantees that all singularities are detected by following the wavelet transform modulus maxima at fine scales. Figure 6.5 gives an example where all singularities are located by following the maxima lines.

Maxima Propagation For all $\psi = (-1)^n\,\theta^{(n)}$, we are not guaranteed that a modulus maxima located at $(u_0, s_0)$ belongs to a maxima line that propagates towards finer scales. When s decreases, $Wf(u,s)$ may have no more maxima in the neighborhood of $u = u_0$. The following proposition proves that this is never the case if $\theta$ is a Gaussian. The wavelet transform $Wf(u,s)$ can then be written as the solution of the heat diffusion equation, where s is proportional to the diffusion time. The maximum principle applied to the heat diffusion equation proves that maxima may not disappear when s decreases. Applications of the heat diffusion equation to the analysis of multiscale averaging have been studied by several computer vision researchers [217, 236, 359].

Proposition 6.1 (Hummel, Poggio, Yuille) Let $\psi = (-1)^n\,\theta^{(n)}$ where $\theta$ is a Gaussian. For any $f \in L^2(\mathbb{R})$, the modulus maxima of $Wf(u,s)$ belong to connected curves that are never interrupted when the scale decreases.
Proof$^3$. To simplify the proof, we suppose that $\theta$ is a normalized Gaussian $\theta(t) = 2^{-1}\pi^{-1/2}\exp(-t^2/4)$ whose Fourier transform is $\hat\theta(\omega) = \exp(-\omega^2)$. Theorem 6.2 proves that
$$Wf(u,s) = s^n\,f^{(n)} \star \bar\theta_s(u), \tag{6.34}$$
where the nth derivative $f^{(n)}$ is defined in the sense of distributions. Let $\tau$ be the diffusion time. The solution of
$$\frac{\partial g(\tau,u)}{\partial \tau} = \frac{\partial^2 g(\tau,u)}{\partial u^2} \tag{6.35}$$
with initial condition $g(0,u) = g_0(u)$ is obtained by computing the Fourier transform with respect to u of (6.35):
$$\frac{\partial \hat g(\tau,\omega)}{\partial \tau} = -\omega^2\,\hat g(\tau,\omega).$$
It follows that $\hat g(\tau,\omega) = \hat g_0(\omega)\exp(-\tau\omega^2)$ and hence
$$g(u,\tau) = \frac{1}{\sqrt{\tau}}\,g_0 \star \bar\theta_\tau(u).$$
For $\tau = s$, setting $g_0 = f^{(n)}$ and inserting (6.34) yields $Wf(u,s) = s^{n+1/2}\,g(u,s)$. The wavelet transform is thus proportional to a heat diffusion with initial condition $f^{(n)}$.
[Figure 6.5: (a): Wavelet transform $Wf(u,s)$ of the signal f(t) shown above. The horizontal and vertical axes give respectively u and $\log_2 s$. (b): Modulus maxima of $Wf(u,s)$. (c): The full line gives the decay of $\log_2 |Wf(u,s)|$ as a function of $\log_2 s$ along the maxima line that converges to the abscissa $t = 0.05$. The dashed line gives $\log_2 |Wf(u,s)|$ along the left maxima line that converges to $t = 0.42$.]
The maximum principle for the parabolic heat equation [36] proves that a global maximum of $|g(u,s)|$ for $(u,s) \in [a,b] \times [s_0, s_1]$ is necessarily either on the boundary $u = a, b$ or at $s = s_0$. A modulus maxima of $Wf(u,s)$ at $(u_1, s_1)$ is a local maxima of $|g(u,s)|$ for a fixed s and u varying. Suppose that a line of modulus maxima is interrupted at $(u_1, s_1)$, with $s_1 > 0$. One can then verify that there exists $\epsilon > 0$ such that a global maximum of $|g(u,s)|$ over $[u_1 - \epsilon, u_1 + \epsilon] \times [s_1 - \epsilon, s_1]$ is at $(u_1, s_1)$. This contradicts the maximum principle, and thus proves that all modulus maxima propagate towards finer scales.

Derivatives of Gaussians are most often used to guarantee that all maxima lines propagate up to the finest scales. Chaining together maxima into maxima lines is also a procedure for removing spurious modulus maxima created by numerical errors in regions where the wavelet transform is close to zero.

Isolated Singularities A wavelet transform may have a sequence of local maxima that converge to an abscissa v even though f is perfectly regular at v. This is the case of the maxima line of Figure 6.5 that converges to the abscissa $v = 0.23$. To detect singularities it is therefore not sufficient to follow the wavelet modulus maxima across scales. The Lipschitz regularity is calculated from the decay of the modulus maxima amplitude.
Let us suppose that for $s < s_0$ all modulus maxima that converge to v are included in a cone
$$|u - v| \le C\,s. \tag{6.36}$$
This means that f does not have oscillations that accelerate in the neighborhood of v. The potential singularity at v is necessarily isolated. Indeed, we can derive from Theorem 6.5 that the absence of maxima below the cone of influence implies that f is uniformly Lipschitz n in the neighborhood of any $t \neq v$ with $t \in (v - Cs_0, v + Cs_0)$. The decay of $|Wf(u,s)|$ in the neighborhood of v is controlled by the decay of the modulus maxima included in the cone $|u-v| \le C\,s$. Theorem 6.3 implies that f is uniformly Lipschitz $\alpha$ in the neighborhood of v if and only if there exists $A > 0$ such that each modulus maximum $(u,s)$ in the cone (6.36) satisfies
$$|Wf(u,s)| \le A\,s^{\alpha+1/2}, \tag{6.37}$$
which is equivalent to
$$\log_2 |Wf(u,s)| \le \log_2 A + \left(\alpha + \frac{1}{2}\right) \log_2 s. \tag{6.38}$$
The Lipschitz regularity at v is thus the maximum slope of $\log_2 |Wf(u,s)|$ as a function of $\log_2 s$ along the maxima lines converging to v.
In numerical calculations, the finest scale of the wavelet transform is limited by the resolution of the discrete data. From a sampling at intervals $N^{-1}$, Section 4.3.3 computes the discrete wavelet transform at scales $s \ge \lambda N^{-1}$, where $\lambda$ is large enough to avoid sampling coarsely the wavelets at the finest scale. The Lipschitz regularity $\alpha$ of a singularity is then estimated by measuring the decay slope of $\log_2 |Wf(u,s)|$ as a function of $\log_2 s$ for $2^J \ge s \ge \lambda N^{-1}$. The largest scale $2^J$ should be smaller than the distance between two consecutive singularities to avoid having other singularities influence the value of $Wf(u,s)$. The sampling interval $N^{-1}$ must therefore be small enough to measure $\alpha$ accurately. The signal in Figure 6.5(a) is defined by $N = 256$ samples. Figure 6.5(c) shows the decay of $\log_2 |Wf(u,s)|$ along the maxima line converging to $t = 0.05$. It has slope $\alpha + 1/2 \approx 1/2$ for $2^{-4} \ge s \ge 2^{-6}$. As expected, $\alpha = 0$ because the signal is discontinuous at $t = 0.05$. Along the second maxima line converging to $t = 0.42$ the slope is $\alpha + 1/2 \approx 1$, which indicates that the singularity is Lipschitz 1/2.
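The slope measurement of (6.38) can be sketched on two synthetic singularities. The code below is an illustration under stated assumptions, not the procedure used for Figure 6.5: it takes $\psi = -\theta'$ with $\theta(t) = \exp(-t^2/2)$, evaluates the wavelet integral by a Riemann sum, and replaces the maxima-line tracking by the largest $|Wf(u,s)|$ inside the cone $|u - v| \le s$ at each scale. The helper names and the scale range are hypothetical.

```python
import numpy as np

X = np.linspace(-8.0, 8.0, 4001)        # rescaled variable x = (t - u) / s
PSI = X * np.exp(-X**2 / 2)             # psi = -theta', theta(t) = exp(-t^2/2)
DX = X[1] - X[0]

def wavelet_coeff(f, u, s):
    """Wf(u, s) = sqrt(s) * int f(u + s x) psi(x) dx, as a Riemann sum."""
    return np.sqrt(s) * np.sum(f(u + s * X) * PSI) * DX

def lipschitz_from_maxima(f, v, scales, cone=1.0, num_u=41):
    """Estimate alpha as (slope of log2 max|Wf| versus log2 s) - 1/2."""
    log_amp = []
    for s in scales:
        us = v + s * np.linspace(-cone, cone, num_u)     # positions inside |u - v| <= cone * s
        amp = max(abs(wavelet_coeff(f, u, s)) for u in us)
        log_amp.append(np.log2(amp))
    slope, _ = np.polyfit(np.log2(scales), log_amp, 1)   # linear regression in log-log
    return slope - 0.5

scales = 2.0 ** np.arange(-7, -3)                        # 2^-7 ... 2^-4
alpha_step = lipschitz_from_maxima(lambda t: (t >= 0.5).astype(float), 0.5, scales)
alpha_cusp = lipschitz_from_maxima(lambda t: np.sqrt(np.abs(t - 0.5)), 0.5, scales)
```

The discontinuity gives an estimate close to $\alpha = 0$ and the cusp $|t - v|^{1/2}$ gives an estimate close to $\alpha = 1/2$, matching the two slopes discussed above.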
When f is a function whose singularities are not isolated, finite resolution measurements are not sufficient to distinguish individual singularities. Section 6.4 describes a global approach that computes the singularity spectrum of multifractals by taking advantage of their self-similarity.

Smoothed Singularities The signal may have important variations that are infinitely continuously differentiable. For example, at the border of a shadow the grey level of an image varies quickly but is not discontinuous because of the diffraction effect. The smoothness of these transitions is modeled as a diffusion with a Gaussian kernel whose variance is measured from the decay of wavelet modulus maxima.

In the neighborhood of a sharp transition at v, we suppose that
$$f(t) = f_0 \star g_\sigma(t), \tag{6.39}$$
where $g_\sigma$ is a Gaussian of variance $\sigma^2$:
$$g_\sigma(t) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(\frac{-t^2}{2\sigma^2}\right). \tag{6.40}$$
If $f_0$ has a Lipschitz $\alpha$ singularity at v that is isolated and non-oscillating, it is uniformly Lipschitz $\alpha$ in the neighborhood of v. For wavelets that are derivatives of Gaussians, the following theorem [261] relates the decay of the wavelet transform to $\sigma$ and $\alpha$.
Theorem 6.6 Let $\psi = (-1)^n\,\theta^{(n)}$ with $\theta(t) = \lambda \exp(-t^2/(2\beta^2))$. If $f = f_0 \star g_\sigma$ and $f_0$ is uniformly Lipschitz $\alpha$ on $[v-h, v+h]$ then there exists A such that
$$\forall (u,s) \in [v-h, v+h] \times \mathbb{R}^+, \quad |Wf(u,s)| \le A\,s^{\alpha+1/2}\left(1 + \frac{\sigma^2}{\beta^2 s^2}\right)^{-(n-\alpha)/2}. \tag{6.41}$$

Proof$^2$. The wavelet transform can be written
$$Wf(u,s) = s^n \frac{d^n}{du^n}(f \star \bar\theta_s)(u) = s^n \frac{d^n}{du^n}(f_0 \star g_\sigma \star \bar\theta_s)(u). \tag{6.42}$$
Since $\theta$ is a Gaussian, one can verify with a Fourier transform calculation that
$$\bar\theta_s \star g_\sigma(t) = \sqrt{\frac{s}{s_0}}\;\bar\theta_{s_0}(t) \quad \text{with} \quad s_0 = \sqrt{s^2 + \frac{\sigma^2}{\beta^2}}. \tag{6.43}$$
Inserting this result in (6.42) yields
$$Wf(u,s) = s^n \sqrt{\frac{s}{s_0}}\;\frac{d^n}{du^n}(f_0 \star \bar\theta_{s_0})(u) = \left(\frac{s}{s_0}\right)^{n+1/2} Wf_0(u,s_0). \tag{6.44}$$
Since $f_0$ is uniformly Lipschitz $\alpha$ on $[v-h, v+h]$, Theorem 6.3 proves that there exists $A > 0$ such that
$$\forall (u,s) \in [v-h, v+h] \times \mathbb{R}^+, \quad |Wf_0(u,s)| \le A\,s^{\alpha+1/2}. \tag{6.45}$$
Inserting this in (6.44) gives
$$|Wf(u,s)| \le A \left(\frac{s}{s_0}\right)^{n+1/2} s_0^{\alpha+1/2}, \tag{6.46}$$
from which we derive (6.41) by inserting the expression (6.43) of $s_0$.

This theorem explains how the wavelet transform decay relates to the
amount of diffusion of a singularity. At large scales $s \gg \sigma/\beta$, the Gaussian averaging is not "felt" by the wavelet transform, which decays like $s^{\alpha+1/2}$. For $s \le \sigma/\beta$, the variation of f at v is not sharp relative to s because of the Gaussian averaging. At these fine scales, the wavelet transform decays like $s^{n+1/2}$ because f is $C^\infty$.
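The two decay regimes can be read off the upper bound (6.41) by measuring its local slope in log-log coordinates. The sketch below only evaluates the bound itself (it does not compute a wavelet transform); the parameter values are hypothetical and `log2_bound` and `local_slope` are illustrative names.

```python
import numpy as np

def log2_bound(s, A, alpha, n, sigma, beta=1.0):
    """log2 of the right-hand side of (6.41)."""
    s = np.asarray(s, dtype=float)
    return (np.log2(A) + (alpha + 0.5) * np.log2(s)
            - 0.5 * (n - alpha) * np.log2(1.0 + sigma**2 / (beta**2 * s**2)))

def local_slope(s, **params):
    """Centered finite difference of log2_bound with respect to log2 s."""
    h = 1e-3
    return (log2_bound(s * 2**h, **params) - log2_bound(s * 2**(-h), **params)) / (2 * h)

params = dict(A=1.0, alpha=0.0, n=2, sigma=2.0**-5)      # hypothetical values, beta = 1
slope_large = local_slope(1.0, **params)        # s >> sigma / beta: close to alpha + 1/2
slope_fine = local_slope(2.0**-10, **params)    # s << sigma / beta: close to n + 1/2
```

With these values the slope moves from about 0.5 at large scales to about 2.5 at fine scales, and the transition occurs around $s = \sigma/\beta$.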
The parameters K, $\alpha$, and $\sigma$ are numerically estimated from the decay of the modulus maxima along the maxima curves that converge towards v. The variance $\beta^2$ depends on the choice of wavelet and is known in advance. A regression is performed to approximate
$$\log_2 |Wf(u,s)| \approx \log_2(K) + \left(\alpha + \frac{1}{2}\right) \log_2 s - \frac{n-\alpha}{2}\,\log_2\!\left(1 + \frac{\sigma^2}{\beta^2 s^2}\right).$$
Figure 6.6 gives the wavelet modulus maxima computed with a wavelet that is a second derivative of a Gaussian. The decay of $\log_2 |Wf(u,s)|$ as a function of $\log_2 s$ is given along several maxima lines corresponding to smoothed and non-smoothed singularities. The wavelet is normalized so that $\beta = 1$ and the diffusion scale is $\sigma = 2^{-5}$.

6.2.2 Reconstruction From Dyadic Maxima $^3$

Wavelet transform maxima carry the properties of sharp signal transitions and singularities. If one can reconstruct a signal from these
maxima, it is then possible to modify the singularities of a signal by processing the wavelet transform modulus maxima. The strength of singularities can be modified by changing the amplitude of the maxima and we can remove some singularities by suppressing the corresponding maxima.

For fast numerical computations, the detection of wavelet transform maxima is limited to dyadic scales $\{2^j\}_{j \in \mathbb{Z}}$. Suppose that $\psi$ is a dyadic wavelet, which means that there exist $A > 0$ and B such that
$$\forall \omega \in \mathbb{R} - \{0\}, \quad A \le \sum_{j=-\infty}^{+\infty} |\hat\psi(2^j \omega)|^2 \le B. \tag{6.47}$$
Theorem 5.11 proves that the dyadic wavelet transform $\{Wf(u,2^j)\}_{j \in \mathbb{Z}}$ is a complete and stable representation. This means that it admits
a bounded left inverse.

[Figure 6.6: (a): Wavelet transform $Wf(u,s)$ of the signal f(t) shown above. (b): Modulus maxima of a wavelet transform computed with $\psi = \theta''$, where $\theta$ is a Gaussian with variance $\beta = 1$. (c): Decay of $\log_2 |Wf(u,s)|$ along maxima curves. In the left figure, the solid and dotted lines correspond respectively to the maxima curves converging to $t = 0.81$ and $t = 0.12$. In the right figure, they correspond respectively to the curves converging to $t = 0.38$ and $t = 0.55$. The diffusion at $t = 0.12$ and $t = 0.55$ modifies the decay for $s \le \sigma = 2^{-5}$.]

This dyadic wavelet transform has the same
properties as a continuous wavelet transform $Wf(u,s)$. All theorems of Sections 6.1.3 and 6.2 remain valid if we restrict s to the dyadic scales $\{2^j\}_{j \in \mathbb{Z}}$. Singularities create sequences of maxima that converge towards the corresponding location at fine scales, and the Lipschitz regularity is calculated from the decay of the maxima amplitude.

Translation-Invariant Representation At each scale $2^j$, the maxima representation provides the values of $Wf(u,2^j)$ where $|Wf(u,2^j)|$ is locally maximum. Figure 6.7(c) gives an example. This adaptive sampling of u produces a translation-invariant representation. When f is translated by $\tau$ each $Wf(u,2^j)$ is translated by $\tau$ and their maxima are translated as well. This is not the case when u is uniformly sampled as in the wavelet frames of Section 5.3. Section 5.4 explains that this translation invariance is of prime importance for pattern recognition applications.

Reconstruction To study the completeness and stability of wavelet maxima representations, Mallat and Zhong introduced an alternate projection algorithm [261] that recovers signal approximations from their wavelet maxima; several other algorithms have been proposed more recently [116, 142, 199]. Numerical experiments show that one can only recover signal approximations with a relative mean-square error of the order of $10^{-2}$. For general dyadic wavelets, Meyer [48] and Berman [94] proved that exact reconstruction is not possible. They found families of continuous or discrete signals whose dyadic wavelet transforms have the same modulus maxima. However, signals with the same wavelet maxima differ from each other only slightly, which explains the success of numerical reconstructions [261]. If the signal has a band-limited Fourier transform and if $\hat\psi$ has a compact support, then Kicey and Lennard [235] proved that wavelet modulus maxima define a complete and stable signal representation.

A simple and fast reconstruction algorithm is presented from a frame perspective. Section 5.1 is thus a prerequisite. At each scale $2^j$, we know the positions $\{u_{j,p}\}_p$ of the local maxima of $|Wf(u,2^j)|$ and the
values
$$Wf(u_{j,p}, 2^j) = \langle f, \psi_{j,p} \rangle \quad \text{with} \quad \psi_{j,p}(t) = \frac{1}{\sqrt{2^j}}\,\psi\!\left(\frac{t - u_{j,p}}{2^j}\right).$$
The reconstruction algorithm should recover a function $\tilde f$ such that
$$W\tilde f(u_{j,p}, 2^j) = \langle \tilde f, \psi_{j,p} \rangle = \langle f, \psi_{j,p} \rangle, \tag{6.48}$$
and whose wavelet modulus maxima are all located at $u_{j,p}$.

[Figure 6.7: (a): Intensity variation along one row of the Lena image. (b): Dyadic wavelet transform computed at all scales $2N^{-1} \le 2^j \le 1$, with the quadratic spline wavelet $\psi = -\theta'$ shown in Figure 5.6. (c): Modulus maxima of the dyadic wavelet transform.]

Frame Pseudo-Inverse The main difficulty comes from the non-linearity and non-convexity of the constraint on the position of local
maxima. To reconstruct an approximated signal with a fast algorithm, this constraint is replaced by a minimization of the signal norm. Instead of finding a function whose wavelet modulus maxima are exactly located at the $u_{j,p}$, the reconstruction algorithm recovers the function $\tilde f$ of minimum norm such that $\langle \tilde f, \psi_{j,p} \rangle = \langle f, \psi_{j,p} \rangle$. The minimization of $\|\tilde f\|$ has a tendency to decrease the wavelet transform energy at each scale $2^j$
$$\|W\tilde f(u,2^j)\|^2 = \int_{-\infty}^{+\infty} |W\tilde f(u,2^j)|^2\,du$$
because of the norm equivalence proved in Theorem 5.11:
$$A\,\|\tilde f\|^2 \le \sum_{j=-\infty}^{+\infty} 2^{-j}\,\|W\tilde f(u,2^j)\|^2 \le B\,\|\tilde f\|^2.$$
The norm $\|W\tilde f(u,2^j)\|$ is reduced by decreasing $|W\tilde f(u,2^j)|$. Since we also impose that $W\tilde f(u_{j,p},2^j) = \langle f, \psi_{j,p} \rangle$, minimizing $\|\tilde f\|$ generally creates local maxima at $u = u_{j,p}$.
The signal $\tilde f$ of minimum norm that satisfies (6.48) is the orthogonal
projection $P_{\mathbf V} f$ of $f$ on the space $\mathbf V$ generated by the wavelets $\{\psi_{j,p}\}_{j,p}$
corresponding to the maxima. In discrete calculations, there is a finite
number of maxima so $\{\psi_{j,p}\}_{j,p}$ is a finite family and hence a basis or a
redundant frame of $\mathbf V$.

Theorem 5.4 describes a conjugate gradient algorithm that recovers
$\tilde f$ from the frame coefficients $\langle\tilde f,\psi_{j,p}\rangle$ with a pseudo-inverse. It performs
this calculation by inverting a frame symmetrical operator $L$ introduced
in (5.26), which is defined by
$$\forall r \in \mathbf V,\quad Lr = \sum_{j,p} \langle r,\psi_{j,p}\rangle\,\psi_{j,p} . \tag{6.49}$$
Clearly $\tilde f = L^{-1}L\tilde f = L^{-1}g$ with
$$g = L\tilde f = \sum_{j,p} \langle\tilde f,\psi_{j,p}\rangle\,\psi_{j,p} = \sum_{j,p} \langle f,\psi_{j,p}\rangle\,\psi_{j,p} . \tag{6.50}$$
The conjugate gradient computes $L^{-1}g$ with an iterative procedure that
has exponential convergence. The convergence rate depends on the
frame bounds $A$ and $B$ of $\{\psi_{j,p}\}_{j,p}$ in $\mathbf V$. Approximately 10 iterations
are usually sufficient to recover an approximation of $f$ with a relative
mean-square error on the order of $10^{-2}$. More iterations do not decrease
the error much because $\tilde f \neq f$. Each iteration requires $O(N\log_2 N)$
calculations if implemented with a fast "à trous" algorithm.

Example 6.1 Figure 6.8(b) shows the signal $\tilde f = P_{\mathbf V} f$ recovered with 10 iterations of the conjugate gradient algorithm, from the wavelet transform maxima in Figure 6.7(c). After 20 iterations, the reconstruction
error is $\|f-\tilde f\|/\|f\| = 2.5\times 10^{-2}$. Figure 6.8(c) shows the signal reconstructed from the 50% of wavelet maxima that have the largest amplitude. Sharp signal transitions corresponding to large wavelet maxima
have not been affected, but small texture variations disappear because
the corresponding maxima are removed. The resulting signal is piecewise regular.

Fast Discrete Calculations To simplify notation, the sampling interval of the input signal is normalized to 1. The dyadic wavelet transform of this normalized discrete signal $a_0[n]$ of size $N$ is calculated at
scales $2 \le 2^j \le N$ with the "algorithme à trous" of Section 5.5.2. The
cascade of convolutions with the two filters $h[n]$ and $g[n]$ is computed
with $O(N\log_2 N)$ operations.
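The cascade of convolutions can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Section 5.5.2: borders are handled with circular convolutions, $g$ is the finite difference filter of (6.68), and the coefficients of $h$ are obtained by expanding $\hat h(\omega) = \sqrt2\,(\cos(\omega/2))^{3}\,e^{-i\omega/2}$ for the quadratic spline case $m = 2$ (they are assumed to agree with Table 5.3). The names `circ_filter` and `a_trous` are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)
# Filters stored as {index k: coefficient}. H is derived from the m = 2
# box-spline transfer function, G is the finite difference filter (6.68).
H = {-1: 0.125 * SQRT2, 0: 0.375 * SQRT2, 1: 0.375 * SQRT2, 2: 0.125 * SQRT2}
G = {0: -0.5 * SQRT2, 1: 0.5 * SQRT2}

def circ_filter(a, taps, step):
    """Return out[n] = sum_k taps[k] * a[(n + k*step) mod N].

    This is the circular convolution of a with the time-reversed filter
    dilated by `step` (step - 1 zeros, "trous", between coefficients).
    """
    out = np.zeros(len(a), dtype=float)
    for k, c in taps.items():
        out += c * np.roll(a, -k * step)  # np.roll(a, -s)[n] == a[n + s]
    return out

def a_trous(a0, levels):
    """Dyadic wavelet coefficients d_1..d_levels and coarse signal a_levels."""
    a = np.asarray(a0, dtype=float)
    details = []
    for j in range(levels):
        step = 2 ** j
        details.append(circ_filter(a, G, step))  # d_{j+1} from a_j
        a = circ_filter(a, H, step)              # a_{j+1} from a_j
    return details, a

# Example: N = 16 samples decomposed down to the coarsest scale 2^J = N.
x = np.random.default_rng(0).standard_normal(16)
details, approx = a_trous(x, levels=4)
```

Each level costs a number of operations proportional to $N$, so the $\log_2 N$ levels give the $O(N\log_2 N)$ count quoted above. In this sketch the coarse signal obtained with $2^J = N$ is constant and equal to $\sqrt N$ times the signal average.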
[Figure 6.8 panels (a), (b), (c): f(t) versus t on [0, 1], amplitudes between 0 and 200.]
Figure 6.8: (a): Original signal. (b): Frame reconstruction from the
dyadic wavelet maxima shown in Figure 6.7(c). (c): Frame reconstruction from the maxima whose amplitude is above the threshold T = 10.

Each wavelet coefficient can be written as an inner product of $a_0$
with a discrete wavelet translated by $m$:
$$d_j[m] = \langle a_0[n],\psi_j[n-m]\rangle = \sum_{n=0}^{N-1} a_0[n]\,\psi_j[n-m] .$$
The modulus maxima are located at abscissa $u_{j,p}$ where $|d_j[u_{j,p}]|$ is
locally maximum, which means that
$$|d_j[u_{j,p}]| \ge |d_j[u_{j,p}-1]| \quad\text{and}\quad |d_j[u_{j,p}]| \ge |d_j[u_{j,p}+1]| ,$$
so long as one of these two inequalities is strict. We denote $\psi_{j,p}[n] = \psi_j[n-u_{j,p}]$.
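The local maximum condition above translates directly into a few array comparisons. The sketch below is illustrative only: the function name `modulus_maxima` is hypothetical, and the signal is treated as periodic so that the first and last samples are neighbors, which is an assumption.

```python
import numpy as np

def modulus_maxima(d):
    """Indices m where |d[m]| >= |d[m-1]| and |d[m]| >= |d[m+1]|,
    with at least one of the two inequalities strict (circular borders)."""
    m = np.abs(np.asarray(d, dtype=float))
    left = np.roll(m, 1)     # |d[m - 1]|
    right = np.roll(m, -1)   # |d[m + 1]|
    is_max = (m >= left) & (m >= right) & ((m > left) | (m > right))
    return np.flatnonzero(is_max)

d = np.array([0.0, 1.0, 3.0, 1.0, 0.0, -2.0, -2.0, 0.0])
u = modulus_maxima(d)
```

With this definition both ends of a plateau of equal modulus are reported as maxima, since each end has one strict inequality.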
To reconstruct a signal from its dyadic wavelet transform calculated
up to the coarsest scale $2^J$, it is necessary to provide the remaining
coarse approximation $a_J[m]$, which is reduced to a constant when $2^J = N$:
$$a_J[m] = \frac{1}{\sqrt N}\sum_{n=0}^{N-1} a_0[n] = \sqrt N\,C .$$
Providing the average $C$ is also necessary in order to reconstruct a
signal from its wavelet maxima.
The maxima reconstruction algorithm inverts the symmetrical operator $L$ associated to the frame coefficients that are kept:
$$Lr = \sum_{j=1}^{\log_2 N}\sum_{p} \langle r,\psi_{j,p}\rangle\,\psi_{j,p} + C . \tag{6.51}$$
The computational complexity of the conjugate gradient algorithm of
Theorem 5.4 is driven by the calculation of $Lp_n$ in (5.38). This is
optimized with an efficient filter bank implementation of $L$.
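The role of $L$ in the iteration can be illustrated with a matrix-free conjugate gradient, where $L$ is only available as a function, as it would be with a filter bank. This is a generic conjugate gradient sketch and not a transcription of Theorem 5.4. The dense matrix `Psi`, whose rows stand in for the kept wavelets $\psi_{j,p}$, the omission of the constant term $C$, and the names `conjugate_gradient` and `apply_L` are all assumptions made for illustration.

```python
import numpy as np

def conjugate_gradient(apply_L, g, n_iter=10, rtol=1e-10):
    """Approximate the solution of L x = g for a symmetric, positive
    semi-definite L given as a function, with g in the range of L."""
    x = np.zeros(len(g), dtype=float)
    r = np.array(g, dtype=float)       # residual g - L x for x = 0
    p = r.copy()                       # first search direction
    rs = float(r @ r)
    stop = (rtol ** 2) * rs            # relative stopping threshold
    for _ in range(n_iter):
        if rs <= stop:
            break
        Lp = apply_L(p)                # the only use of L: one product per iteration
        alpha = rs / float(p @ Lp)
        x += alpha * p
        r -= alpha * Lp
        rs_new = float(r @ r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy frame: 5 kept "wavelets" (rows of Psi) in a space of dimension 12.
rng = np.random.default_rng(0)
Psi = rng.standard_normal((5, 12))
f = rng.standard_normal(12)

def apply_L(r):
    # L r = sum_p <r, psi_p> psi_p
    return Psi.T @ (Psi @ r)

g = apply_L(f)                         # g = sum_p <f, psi_p> psi_p
f_tilde = conjugate_gradient(apply_L, g, n_iter=50)
```

The result `f_tilde` is the minimum-norm signal with the same inner products as `f`, that is, the orthogonal projection of `f` on the span of the rows of `Psi`. In the algorithm of the text, `apply_L` would be replaced by the filter bank computation of $Lr$.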
To compute $Lr$ we first calculate the dyadic wavelet transform of
$r[n]$ with the "algorithme à trous". At each scale $2^j$, all coefficients
that are not located at an abscissa $u_{j,p}$ are set to zero:
$$\tilde d_j[m] = \begin{cases} \langle r[n],\psi_j[n-u_{j,p}]\rangle & \text{if } m = u_{j,p} \\ 0 & \text{otherwise} \end{cases} . \tag{6.52}$$
Then $Lr[n]$ is obtained by modifying the filter bank reconstruction
given by Proposition 5.6. The decomposition and reconstruction wavelets
are the same in (6.51) so we set $\tilde h[n] = h[n]$ and $\tilde g[n] = g[n]$. The factor 1/2 in (5.87) is also removed because the reconstruction wavelets
in (6.51) are not attenuated by $2^{-j}$ as are the wavelets in the non-sampled reconstruction formula (5.71). For $J = \log_2 N$, we initialize
$\tilde a_J[n] = C/\sqrt N$ and for $\log_2 N > j \ge 0$ we compute
$$\tilde a_j[n] = \tilde a_{j+1}\star h_j[n] + \tilde d_{j+1}\star g_j[n] . \tag{6.53}$$
One can verify that $Lr[n] = \tilde a_0[n]$ with the same derivations as in the
proof of Proposition 5.6. Let $K_h$ and $K_g$ be the number of non-zero
coefficients of $h[n]$ and $g[n]$. The calculation of $Lr[n]$ from $r[n]$ requires
a total of $2(K_h+K_g)\,N\log_2 N$ operations. The reconstructions shown
in Figure 6.8 are computed with the filters of Table 5.3.

6.3 Multiscale Edge Detection 2
The edges of structures in images are often the most important features
for pattern recognition. This is well illustrated by our visual ability to
recognize an object from a drawing that gives a rough outline of contours. But, what is an edge? It could be defined as points where the
image intensity has sharp transitions. A closer look shows that this
definition is often not satisfactory. Image textures do have sharp intensity variations that are often not considered as edges. When looking
at a brick wall, we may decide that the edges are the contours of the
wall whereas the bricks define a texture. Alternatively, we may include
the contours of each brick in the set of edges and consider the irregular surface of each brick as a texture. The discrimination of edges
versus textures depends on the scale of analysis. This has motivated
computer vision researchers to detect sharp image variations at different scales [44, 298]. The next section describes the multiscale Canny
edge detector [113]. It is equivalent to detecting modulus maxima in
a two-dimensional dyadic wavelet transform [261]. The Lipschitz regularity of edge points is derived from the decay of wavelet modulus
maxima across scales. It is also shown that image approximations may
be reconstructed from these wavelet modulus maxima, with no visual degradation. Image processing algorithms can thus be implemented on
multiscale edges.

6.3.1 Wavelet Maxima for Images 2

Canny Edge Detection The Canny algorithm detects points of sharp variation in an image $f(x_1,x_2)$ by calculating the modulus of
its gradient vector
$$\vec\nabla f = \left(\frac{\partial f}{\partial x_1}\,,\,\frac{\partial f}{\partial x_2}\right) . \tag{6.54}$$
The partial derivative of $f$ in the direction of a unit vector $\vec n = (\cos\alpha,\sin\alpha)$
in the $x = (x_1,x_2)$ plane is calculated as an inner product with the gradient vector
$$\frac{\partial f}{\partial \vec n} = \vec\nabla f\cdot\vec n = \frac{\partial f}{\partial x_1}\cos\alpha + \frac{\partial f}{\partial x_2}\sin\alpha .$$
The absolute value of this partial derivative is maximum if $\vec n$ is colinear
to $\vec\nabla f$. This shows that $\vec\nabla f(x)$ is parallel to the direction of maximum
change of the surface $f(x)$. A point $y \in \mathbb R^2$ is defined as an edge if
$|\vec\nabla f(x)|$ is locally maximum at $x = y$ when $x = y + \lambda\,\vec\nabla f(y)$ for $|\lambda|$
small enough. This means that the partial derivatives of $f$ reach a local
maximum at $x = y$, when $x$ varies in a one-dimensional neighborhood
of $y$ along the direction of maximum change of $f$ at $y$. These edge
points are inflection points of $f$.

Multiscale Edge Detection A multiscale version of this edge detector is implemented by smoothing the surface with a convolution kernel
$\theta(x)$ that is dilated. This is computed with two wavelets that are the
partial derivatives of $\theta$:
$$\psi^1 = -\frac{\partial\theta}{\partial x_1} \quad\text{and}\quad \psi^2 = -\frac{\partial\theta}{\partial x_2} . \tag{6.55}$$
The scale varies along the dyadic sequence $\{2^j\}_{j\in\mathbb Z}$ to limit computations and storage. For $1 \le k \le 2$, we denote for $x = (x_1,x_2)$
$$\psi^k_{2^j}(x_1,x_2) = \frac{1}{2^j}\,\psi^k\!\left(\frac{x_1}{2^j},\frac{x_2}{2^j}\right) \quad\text{and}\quad \bar\psi^k_{2^j}(x) = \psi^k_{2^j}(-x) .$$
In the two directions indexed by $1 \le k \le 2$, the dyadic wavelet transform of $f \in \mathbf L^2(\mathbb R^2)$ at $u = (u_1,u_2)$ is
$$W^k f(u,2^j) = \langle f(x),\psi^k_{2^j}(x-u)\rangle = f\star\bar\psi^k_{2^j}(u) . \tag{6.56}$$
Section 5.5.3 gives necessary and sufficient conditions for obtaining a
complete and stable representation.
Let us denote $\theta_{2^j}(x) = 2^{-j}\,\theta(2^{-j}x)$ and $\bar\theta_{2^j}(x) = \theta_{2^j}(-x)$. The two
scaled wavelets can be rewritten
$$\bar\psi^1_{2^j} = 2^j\,\frac{\partial\bar\theta_{2^j}}{\partial x_1} \quad\text{and}\quad \bar\psi^2_{2^j} = 2^j\,\frac{\partial\bar\theta_{2^j}}{\partial x_2} .$$
We thus derive from (6.56) that the wavelet transform components are
proportional to the coordinates of the gradient vector of $f$ smoothed
by $\bar\theta_{2^j}$:
$$\begin{pmatrix} W^1 f(u,2^j) \\ W^2 f(u,2^j) \end{pmatrix} = 2^j \begin{pmatrix} \dfrac{\partial}{\partial u_1}(f\star\bar\theta_{2^j})(u) \\[2mm] \dfrac{\partial}{\partial u_2}(f\star\bar\theta_{2^j})(u) \end{pmatrix} = 2^j\,\vec\nabla(f\star\bar\theta_{2^j})(u) . \tag{6.57}$$
The modulus of this gradient vector is proportional to the wavelet transform modulus
$$Mf(u,2^j) = \sqrt{|W^1 f(u,2^j)|^2 + |W^2 f(u,2^j)|^2} . \tag{6.58}$$
Let $Af(u,2^j)$ be the angle of the wavelet transform vector (6.57) in the
plane $(x_1,x_2)$
$$Af(u,2^j) = \begin{cases} \alpha(u) & \text{if } W^1 f(u,2^j) \ge 0 \\ \pi + \alpha(u) & \text{if } W^1 f(u,2^j) < 0 \end{cases} \tag{6.59}$$
with
$$\alpha(u) = \tan^{-1}\!\left(\frac{W^2 f(u,2^j)}{W^1 f(u,2^j)}\right) .$$
The unit vector $\vec n_j(u) = (\cos Af(u,2^j),\sin Af(u,2^j))$ is colinear to
$\vec\nabla(f\star\bar\theta_{2^j})(u)$. An edge point at the scale $2^j$ is a point $v$ such that
$Mf(u,2^j)$ is locally maximum at $u = v$ when $u = v + \lambda\,\vec n_j(v)$ for $|\lambda|$
small enough. These points are also called wavelet transform modulus
maxima. The smoothed image $f\star\bar\theta_{2^j}$ has an inflection point at a modulus maximum location. Figure 6.9 gives an example where the wavelet
modulus maxima are located along the contour of a circle.

Maxima curves Edge points are distributed along curves that often correspond to the boundary of important structures. Individual wavelet
modulus maxima are chained together to form a maxima curve that
follows an edge. At any location, the tangent of the edge curve is
approximated by computing the tangent of a level set. This tangent
direction is used to chain wavelet maxima that are along the same edge
curve.
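On a sampling grid, the modulus, the angle and the detection of modulus maxima along the gradient direction can be sketched as below. This is an illustration under assumptions rather than the algorithm of the text: borders are circular, the angle is quantized to the 8 directions of the grid, `np.arctan2` is used to obtain the angle of the vector $(W^1 f, W^2 f)$, and in the demonstration `np.gradient` of a smooth step stands in for the two wavelet transform components. The names `modulus_angle` and `modulus_maxima_2d` are hypothetical.

```python
import numpy as np

def modulus_angle(w1, w2):
    """Modulus and angle of the vector (w1, w2) at each grid point."""
    return np.hypot(w1, w2), np.arctan2(w2, w1)

def modulus_maxima_2d(w1, w2, threshold=0.0):
    """Boolean map of points where the modulus is a local maximum along
    the grid direction closest to the angle. w1 is the component along
    axis 0 (x1) and w2 the component along axis 1 (x2)."""
    m, a = modulus_angle(w1, w2)
    k = np.round(a / (np.pi / 4)).astype(int) % 8   # nearest of 8 directions
    offsets = [(1, 0), (1, 1), (0, 1), (-1, 1),
               (-1, 0), (-1, -1), (0, -1), (1, -1)]
    is_max = np.zeros(m.shape, dtype=bool)
    for idx, (e1, e2) in enumerate(offsets):
        fwd = np.roll(m, (-e1, -e2), axis=(0, 1))   # M(u + eps)
        bwd = np.roll(m, (e1, e2), axis=(0, 1))     # M(u - eps)
        is_max |= ((k == idx) & (m >= fwd) & (m >= bwd)
                   & ((m > fwd) | (m > bwd)))
    return is_max & (m > threshold)

# Smooth step edge along x1, located at row 16 of a 32 x 32 image.
i = np.arange(32)
image = np.tanh((i[:, None] - 16) / 2.0) * np.ones((1, 32))
w1 = np.gradient(image, axis=0)   # stand-in for W^1 f
w2 = np.gradient(image, axis=1)   # stand-in for W^2 f
edges = modulus_maxima_2d(w1, w2, threshold=0.1)
```

In this example the detected maxima form the single row where the smoothed step has its inflection point, and the gradient direction is perpendicular to that row.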
The level sets of $g(x)$ are the curves $x(s)$ in the $(x_1,x_2)$ plane where
$g(x(s))$ is constant. The parameter $s$ is the arc-length of the level set.
Let $\vec\tau = (\tau_1,\tau_2)$ be the direction of the tangent of $x(s)$. Since $g(x(s))$ is
constant when $s$ varies,
$$\frac{\partial g(x(s))}{\partial s} = \frac{\partial g}{\partial x_1}\,\tau_1 + \frac{\partial g}{\partial x_2}\,\tau_2 = \vec\nabla g\cdot\vec\tau = 0 .$$
So $\vec\nabla g(x)$ is perpendicular to the direction $\vec\tau$ of the tangent of the level
set that goes through $x$.
This level set property applied to $g = f\star\bar\theta_{2^j}$ proves that at a
maximum point $v$ the vector $\vec n_j(v)$ of angle $Af(v,2^j)$ is perpendicular
to the level set of $f\star\bar\theta_{2^j}$ going through $v$. If the intensity profile remains
constant along an edge, then the inflection points (maxima points)
are along a level set. The tangent of the maxima curve is therefore
perpendicular to $\vec n_j(v)$. The intensity profile of an edge may not be
constant but its variations are often negligible over a neighborhood of
size $2^j$ for a sufficiently small scale $2^j$, unless we are near a corner. The
tangent of the maxima curve is then nearly perpendicular to $\vec n_j(v)$.
In discrete calculations, maxima curves are thus recovered by chaining
together any two wavelet maxima at $v$ and $v + \vec n$, which are neighbors
over the image sampling grid and such that $\vec n$ is nearly perpendicular
to $\vec n_j(v)$.

Example 6.2 The dyadic wavelet transform of the image in Figure 6.9 yields modulus images $Mf(v,2^j)$ whose maxima are along the boundary
of a disk. This circular edge is also a level set of the image. The vector
$\vec n_j(v)$ of angle $Af(v,2^j)$ is thus perpendicular to the edge at the maxima
locations.

Example 6.3 In the Lena image shown in Figure 6.10, some edges disappear when the scale increases. These correspond to fine scale intensity variations that are removed by the averaging with $\bar\theta_{2^j}$ when $2^j$ is
large. This averaging also modifies the position of the remaining edges.
Figure 6.10(f) displays the wavelet maxima such that $Mf(v,2^j) \ge T$,
for a given threshold $T$. They indicate the location of edges where the
image has large amplitude variations.

Lipschitz Regularity The decay of the two-dimensional wavelet
transform depends on the regularity of $f$. We restrict the analysis
to Lipschitz exponents $0 \le \alpha \le 1$. A function $f$ is said to be Lipschitz $\alpha$
at $v = (v_1,v_2)$ if there exists $K > 0$ such that for all $(x_1,x_2) \in \mathbb R^2$
$$|f(x_1,x_2) - f(v_1,v_2)| \le K\,\big(|x_1-v_1|^2 + |x_2-v_2|^2\big)^{\alpha/2} . \tag{6.60}$$
If there exists $K > 0$ such that (6.60) is satisfied for any $v \in \Omega$ then
$f$ is uniformly Lipschitz $\alpha$ over $\Omega$. As in one dimension, the Lipschitz
regularity of a function $f$ is related to the asymptotic decay of $|W^1 f(u,2^j)|$
and $|W^2 f(u,2^j)|$ in the corresponding neighborhood. This decay is
controlled by $Mf(u,2^j)$. Like in Theorem 6.3, one can prove that $f$
is uniformly Lipschitz $\alpha$ inside a bounded domain of $\mathbb R^2$ if and only if
there exists $A > 0$ such that for all $u$ inside this domain and all scales
$2^j$
$$|Mf(u,2^j)| \le A\,2^{j(\alpha+1)} . \tag{6.61}$$
Suppose that the image has an isolated edge curve along which $f$ has
Lipschitz regularity $\alpha$. The value of $|Mf(u,2^j)|$ in a two-dimensional
neighborhood of the edge curve can be bounded by the wavelet modulus
values along the edge curve. The Lipschitz regularity $\alpha$ of the edge is
estimated with (6.61) by measuring the slope of $\log_2|Mf(u,2^j)|$ as a
function of $j$. If $f$ is not singular but has a smooth transition along
the edge, the smoothness can be quantified by the variance $\sigma^2$ of a two-dimensional Gaussian blur. The value of $\sigma^2$ is estimated by generalizing
Theorem 6.6.

Reconstruction from Edges In his book about vision, Marr [44] conjectured that images can be reconstructed from multiscale edges.
For a Canny edge detector, this is equivalent to recovering images from wavelet modulus maxima. Despite the non-completeness of dyadic
wavelet maxima [94, 48], the algorithm of Mallat and Zhong [261] computes an image approximation that is visually identical to the original
one.

Figure 6.9: The top image has $N^2 = 128^2$ pixels. (a): Wavelet transform in the horizontal direction, with a scale $2^j$ that increases from
top to bottom: $\{W^1 f(u,2^j)\}_{-6\le j\le 0}$. Black, grey and white pixels correspond respectively to negative, zero and positive values. (b): Vertical direction: $\{W^2 f(u,2^j)\}_{-6\le j\le 0}$. (c): Wavelet transform modulus
$\{Mf(u,2^j)\}_{-6\le j\le 0}$. White and black pixels correspond respectively to
zero and large amplitude coefficients. (d): Angles $\{Af(u,2^j)\}_{-6\le j\le 0}$ at
points where the modulus is non-zero. (e): Wavelet modulus maxima
are in black.

Figure 6.10: Multiscale edges of the Lena image shown in Figure
6.11. (a): $\{W^1 f(u,2^j)\}_{-7\le j\le -3}$. (b): $\{W^2 f(u,2^j)\}_{-7\le j\le -3}$. (c):
$\{Mf(u,2^j)\}_{-7\le j\le -3}$. (d): $\{Af(u,2^j)\}_{-7\le j\le -3}$. (e): Modulus maxima.
(f): Maxima whose modulus values are above a threshold.
As in Section 6.2.2, we describe a simpler inverse frame algorithm.
At each scale $2^j$, a multiscale edge representation provides the positions
$u_{j,p}$ of the wavelet transform modulus maxima as well as the values of
the modulus $Mf(u_{j,p},2^j)$ and the angle $Af(u_{j,p},2^j)$. The modulus and
angle specify the two wavelet transform components
$$W^k f(u_{j,p},2^j) = \langle f,\psi^k_{j,p}\rangle \quad\text{for } 1 \le k \le 2 \tag{6.62}$$
with $\psi^k_{j,p}(x) = 2^{-j}\,\psi^k(2^{-j}(x-u_{j,p}))$. As in one dimension, the reconstruction algorithm recovers a function of minimum norm $\tilde f$ such that
$$W^k\tilde f(u_{j,p},2^j) = \langle\tilde f,\psi^k_{j,p}\rangle = \langle f,\psi^k_{j,p}\rangle . \tag{6.63}$$
It is the orthogonal projection of $f$ in the closed space $\mathbf V$ generated by
the family of wavelets
$$\big\{\psi^1_{j,p}\,,\,\psi^2_{j,p}\big\}_{j,p} .$$
If $\{\psi^1_{j,p},\psi^2_{j,p}\}_{j,p}$ is a frame of $\mathbf V$, which is true in finite dimensions, then
$\tilde f$ is computed with the conjugate gradient algorithm of Theorem 5.4
by calculating $\tilde f = L^{-1}g$ with
$$g = L\tilde f = \sum_{k=1}^{2}\sum_{j,p} \langle f,\psi^k_{j,p}\rangle\,\psi^k_{j,p} . \tag{6.64}$$
The reconstructed image $\tilde f$ is not equal to the original image $f$
but their relative mean-square difference is below $10^{-2}$. Singularities
and edges are nearly perfectly recovered and no spurious oscillations
are introduced. The images differ slightly in smooth regions, which
visually is not noticeable.

Example 6.4 The image reconstructed in Figure 6.11(b) is visually
identical to the original image. It is recovered with 10 conjugate gradient iterations. After 20 iterations, the relative mean-square reconstruction error is $\|\tilde f - f\|/\|f\| = 4\times 10^{-3}$. The thresholding of edges accounts for the disappearance of image structures from the reconstruction shown in Figure 6.11(c). Sharp image variations are perfectly
recovered.

Illusory Contours A multiscale wavelet edge detector defines edges as points where the image intensity varies sharply. This definition is
however too restrictive when edges are used to find the contours of objects. For image segmentation, edges must define closed curves that
outline the boundaries of each region. Because of noise or light variations, local edge detectors produce contours with holes. Filling these
holes requires some prior knowledge about the behavior of edges in the
image. The illusion of the Kanizsa triangle [39] shows that such an edge
filling is performed by the human visual system. In Figure 6.12, one
can "see" the edges of a straight and a curved triangle although the image grey level remains uniformly white between the black discs. Closing
edge curves and understanding illusory contours requires computational
models that are not as local as multiscale differential operators. Such
contours can be obtained as the solution of a global optimization that
incorporates constraints on the regularity of contours and which takes
into account the existence of occlusions [189].

6.3.2 Fast Multiscale Edge Computations 3

The dyadic wavelet transform of an image of $N^2$ pixels is computed
with a separable extension of the filter bank algorithm described in
Section 5.5.2. A fast multiscale edge detection is derived [261].

Wavelet Design Edge detection wavelets (6.55) are designed as separable products of one-dimensional dyadic wavelets, constructed in Section 5.5.1. Their Fourier transform is
$$\hat\psi^1(\omega_1,\omega_2) = \hat g\!\left(\frac{\omega_1}{2}\right)\hat\phi\!\left(\frac{\omega_1}{2}\right)\hat\phi\!\left(\frac{\omega_2}{2}\right) \tag{6.65}$$
and
$$\hat\psi^2(\omega_1,\omega_2) = \hat g\!\left(\frac{\omega_2}{2}\right)\hat\phi\!\left(\frac{\omega_1}{2}\right)\hat\phi\!\left(\frac{\omega_2}{2}\right) , \tag{6.66}$$
where $\hat\phi(\omega)$ is a scaling function whose energy is concentrated at low
frequencies and
$$\hat g(\omega) = -i\,\sqrt2\,\sin\frac{\omega}{2}\,\exp\!\left(\frac{-i\omega}{2}\right) . \tag{6.67}$$
This transfer function is the Fourier transform of a finite difference filter
which is a discrete approximation of a derivative
$$\frac{g[p]}{\sqrt2} = \begin{cases} -0.5 & \text{if } p = 0 \\ 0.5 & \text{if } p = 1 \\ 0 & \text{otherwise} \end{cases} . \tag{6.68}$$
The resulting wavelets $\psi^1$ and $\psi^2$ are finite difference approximations
of partial derivatives along $x_1$ and $x_2$ of $\theta(x_1,x_2) = 4\,\phi(2x_1)\,\phi(2x_2)$.

Figure 6.11: (a): Original Lena. (b): Reconstructed from the wavelet
maxima displayed in Figure 6.10(e) and larger scale maxima. (c): Reconstructed from the thresholded wavelet maxima displayed in Figure
6.10(f) and larger scale maxima.

Figure 6.12: The illusory edges of a straight and a curved triangle are
perceived in domains where the images are uniformly white.
To implement the dyadic wavelet transform with a filter bank algorithm, the scaling function $\hat\phi$ is calculated, as in (5.76), with an infinite
product:
$$\hat\phi(\omega) = \prod_{p=1}^{+\infty}\frac{\hat h(2^{-p}\omega)}{\sqrt2} = \frac{1}{\sqrt2}\,\hat h\!\left(\frac{\omega}{2}\right)\hat\phi\!\left(\frac{\omega}{2}\right) . \tag{6.69}$$
The $2\pi$ periodic function $\hat h$ is the transfer function of a finite impulse
response low-pass filter $h[p]$. We showed in (5.81) that the Fourier
transform of a box spline of degree $m$
$$\hat\phi(\omega) = \left(\frac{\sin(\omega/2)}{\omega/2}\right)^{m+1}\exp\!\left(\frac{-i\epsilon\omega}{2}\right) \quad\text{with}\quad \epsilon = \begin{cases} 1 & \text{if } m \text{ is even} \\ 0 & \text{if } m \text{ is odd} \end{cases}$$
is obtained with
$$\hat h(\omega) = \sqrt2\,\frac{\hat\phi(2\omega)}{\hat\phi(\omega)} = \sqrt2\left(\cos\frac{\omega}{2}\right)^{m+1}\exp\!\left(\frac{-i\epsilon\omega}{2}\right) .$$
Table 5.3 gives $h[p]$ for $m = 2$.

"Algorithme à trous" The one-dimensional "algorithme à trous" of Section 5.5.2 is extended in two dimensions with convolutions along
the rows and columns of the image. The support of an image $\dot f$ is
normalized to $[0,1]^2$ and the $N^2$ pixels are obtained with a sampling on
a uniform grid with intervals $N^{-1}$. To simplify the description of the
algorithm, the sampling interval is normalized to 1 by considering the
dilated image $f(x_1,x_2) = \dot f(N^{-1}x_1,N^{-1}x_2)$. A change of variable shows
that the wavelet transform of $\dot f$ is derived from the wavelet transform
of $f$ with a simple renormalization:
$$W^k\dot f(u,2^j) = N^{-1}\,W^k f(Nu,N2^j) .$$
Each sample $a_0[n]$ of the normalized discrete image is considered to
be an average of $f$ calculated with the kernel $\phi(x_1)\,\phi(x_2)$ translated at
$n = (n_1,n_2)$:
$$a_0[n_1,n_2] = \langle f(x_1,x_2)\,,\,\phi(x_1-n_1)\,\phi(x_2-n_2)\rangle .$$
This is further justified in Section 7.7.3. For any $j \ge 0$, we denote
$$a_j[n_1,n_2] = \langle f(x_1,x_2)\,,\,\phi_{2^j}(x_1-n_1)\,\phi_{2^j}(x_2-n_2)\rangle .$$
The discrete wavelet coefficients at $n = (n_1,n_2)$ are
$$d^1_j[n] = W^1 f(n,2^j) \quad\text{and}\quad d^2_j[n] = W^2 f(n,2^j) .$$
They are calculated with separable convolutions.
For any $j \ge 0$, the filter $h[p]$ "dilated" by $2^j$ is defined by
$$\bar h_j[p] = \begin{cases} h[-p/2^j] & \text{if } p/2^j \in \mathbb Z \\ 0 & \text{otherwise} \end{cases} \tag{6.70}$$
and for $j > 0$, a centered finite difference filter is defined by
$$\frac{\bar g_j[p]}{\sqrt2} = \begin{cases} 0.5 & \text{if } p = -2^{j-1} \\ -0.5 & \text{if } p = 2^{j-1} \\ 0 & \text{otherwise} \end{cases} . \tag{6.71}$$
For $j = 0$, we define $\bar g_0[0]/\sqrt2 = -0.5$, $\bar g_0[-1]/\sqrt2 = 0.5$ and $\bar g_0[p] = 0$
for $p \neq 0,-1$, which is the filter (6.68) reversed in time. A separable two-dimensional filter is written
$$\alpha\beta[n_1,n_2] = \alpha[n_1]\,\beta[n_2]$$
and $\delta[n]$ is a discrete Dirac. Similarly to Proposition 5.6, one can prove
that for any $j \ge 0$ and any $n = (n_1,n_2)$
$$a_{j+1}[n] = a_j\star\bar h_j\bar h_j[n] \tag{6.72}$$
$$d^1_{j+1}[n] = a_j\star\bar g_j\delta[n] \tag{6.73}$$
$$d^2_{j+1}[n] = a_j\star\delta\bar g_j[n] . \tag{6.74}$$
Dyadic wavelet coefficients up to the scale $2^J$ are therefore calculated
by cascading the convolutions (6.72-6.74) for $0 < j \le J$. To take into
account border problems, all convolutions are replaced by circular convolutions, which means that the input image $a_0[n]$ is considered to be
$N$ periodic along its rows and columns. Since $J \le \log_2 N$ and all filters
have a finite impulse response, this algorithm requires $O(N^2\log_2 N)$
operations. If $J = \log_2 N$ then one can verify that the larger scale
approximation is a constant proportional to the grey level average $C$:
$$a_J[n_1,n_2] = \frac{1}{N}\sum_{n_1,n_2=0}^{N-1} a_0[n_1,n_2] = N\,C .$$
The wavelet transform modulus is $Mf(n,2^j) = \sqrt{|d^1_j[n]|^2 + |d^2_j[n]|^2}$
whereas $Af(n,2^j)$ is the angle of the vector $(d^1_j[n],d^2_j[n])$. The wavelet
modulus maxima are located at points $u_{j,p}$ where $Mf(u_{j,p},2^j)$ is larger
than its two neighbors $Mf(u_{j,p}\pm\vec\epsilon,2^j)$, where $\vec\epsilon = (\epsilon_1,\epsilon_2)$ is the vector
whose coordinates $\epsilon_1$ and $\epsilon_2$ are either 0 or 1, and whose angle is the
closest to $Af(u_{j,p},2^j)$. This verifies that $Mf(n,2^j)$ is locally maximum
at $n = u_{j,p}$ in a one-dimensional neighborhood whose direction is along
the angle $Af(u_{j,p},2^j)$.

Reconstruction from Maxima The frame algorithm recovers an image approximation from multiscale edges by inverting the operator
$L$ defined in (6.64), with the conjugate gradient algorithm of Theorem
5.4. This requires computing $Lr$ efficiently for any image $r[n]$. For
this purpose, the wavelet coefficients of $r$ are first calculated with the
"algorithme à trous," and at each scale $2 \le 2^j \le N$ all wavelet coefficients not located at a maximum position $u_{j,p}$ are set to zero as in the
one-dimensional implementation (6.52):
$$\tilde d^k_j[n] = \begin{cases} W^k r(n,2^j) & \text{if } n = u_{j,p} \\ 0 & \text{otherwise} \end{cases} .$$
The signal $Lr[n]$ is recovered from these non-zero wavelet coefficients
with a reconstruction formula similar to (6.53). Let $h_j[n] = \bar h_j[-n]$
and $g_j[n] = \bar g_j[-n]$ be the two filters defined with (6.70) and (6.71).
The calculation is initialized for $J = \log_2 N$ by setting $\tilde a_J[n] = C\,N^{-1}$,
where $C$ is the average image intensity. For $\log_2 N > j \ge 0$ we compute
$$\tilde a_j[n] = \tilde a_{j+1}\star h_j h_j[n] + \tilde d^1_{j+1}\star g_j\delta[n] + \tilde d^2_{j+1}\star\delta g_j[n]$$
and one can verify that $Lr[n] = \tilde a_0[n]$. It is calculated from $r[n]$ with
$O(N^2\log_2 N)$ operations. The reconstructed images in Figure 6.11 are
obtained with 10 conjugate gradient iterations implemented with this
filter bank algorithm.

6.4 Multifractals 2
Signals that are singular at almost every point were originally studied as pathological objects of pure mathematical interest. Mandelbrot
[43] was the first to recognize that such phenomena are encountered
everywhere. Among the many examples [25] let us mention economic
records like the Dow Jones industrial average, physiological data including heart records, electromagnetic fluctuations in galactic radiation
noise, textures in images of natural terrain, variations of traffic flow...
The singularities of multifractals often vary from point to point,
and knowing the distribution of these singularities is important in analyzing their properties. Pointwise measurements of Lipschitz exponents are not possible because of the finite numerical resolution. After discretization, each sample corresponds to a time interval where the
signal has an infinite number of singularities that may all be different.
The singularity distribution must therefore be estimated from global
measurements, which take advantage of multifractal self-similarities.
Section 6.4.2 computes the fractal dimension of sets of points having
the same Lipschitz regularity, with a global partition function calculated from wavelet transform modulus maxima. Applications to fractal
noises such as fractional Brownian motions and to hydrodynamic turbulence are studied in Section 6.4.3.

6.4.1 Fractal Sets and Self-Similar Functions

A set $S \subset \mathbb R^n$ is said to be self-similar if it is the union of disjoint subsets
$S_1,\ldots,S_k$ that can be obtained from $S$ with a scaling, translation and
rotation. This self-similarity often implies an infinite multiplication of
details, which creates irregular structures. The triadic Cantor set and
the Von Koch curve are simple examples.

Example 6.5 The Von Koch curve is a fractal set obtained by recursively dividing each segment of length $l$ in four segments of length
$l/3$, as illustrated in Figure 6.13. Each subdivision increases the length
by 4/3. The limit of these subdivisions is therefore a curve of infinite
length.

Example 6.6 The triadic Cantor set is constructed by recursively dividing intervals of size $l$ in two sub-intervals of size $l/3$ and a central
hole, illustrated by Figure 6.14. The iteration begins from $[0,1]$. The
Cantor set obtained as a limit of these subdivisions is a dust of points
in $[0,1]$.

Fractal Dimension The Von Koch curve has infinite length in a finite square of $\mathbb R^2$. The usual length measurement is therefore not
well adapted to characterize the topological properties of such fractal
curves. This motivated Hausdorff in 1919 to introduce a new definition
of dimension, based on the size variations of sets when measured at
different scales.
Figure 6.13: Three iterations of the Von Koch subdivision. The Von
Koch curve is the fractal obtained as a limit of an infinite number of
subdivisions.

Figure 6.14: Three iterations of the Cantor subdivision of $[0,1]$. The
limit of an infinite number of subdivisions is a closed set in $[0,1]$.

The capacity dimension is a simplification of the Hausdorff dimension that is easier to compute numerically. Let $S$ be a bounded set in
$\mathbb R^n$. We count the minimum number $N(s)$ of balls of radius $s$ needed
to cover $S$. If $S$ is a set of dimension $D$ with a finite length ($D = 1$),
surface ($D = 2$) or volume ($D = 3$) then
$$N(s) \sim s^{-D}$$
so
$$D = -\lim_{s\to 0}\frac{\log N(s)}{\log s} . \tag{6.75}$$
The capacity dimension $D$ of $S$ generalizes this result and is defined by
$$D = -\liminf_{s\to 0}\frac{\log N(s)}{\log s} . \tag{6.76}$$
The measure of $S$ is then
$$M = \limsup_{s\to 0} N(s)\,s^{D} .$$
It may be finite or infinite.
all covers of S with balls of radius smaller than s. It is most often
equal to the capacity dimension, but not always. In the following, the
capacity dimension is called fractal dimension. Example 6.7 The Von Koch curve has in nite length because its fractal dimension is D > 1. We need N (s) = 4n balls of size s = 3;n
to cover the whole curve, hence N (3;n) = (3;n); log 4= log 3:
One can verify that at any other scale s, the minimum number of balls
N (s) to cover this curve satis es
(
D = ; lim inf log Nss) = log 4 :
s!0
log
log 3
As expected, it has a fractal dimension between 1 and 2. 276 CHAPTER 6. WAVELET ZOOM Example 6.8 The triadic Cantor set is covered by N (s) = 2n intervals
of size s = 3;n, so N (3;n) = (3;n); log 2=log 3:
One can also verify that
(
D = ; lim inf log Nss) = log 2 :
s!0
log
log 3
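The count of Example 6.8 can be checked numerically with a small box-counting experiment. The sketch below is an illustration under assumptions: the Cantor set is replaced by its approximation after a finite number of subdivisions, the covering uses aligned triadic intervals of size $3^{-m}$ rather than arbitrary balls, and the names `cantor_left_endpoints` and `box_counts` are hypothetical.

```python
import numpy as np

def cantor_left_endpoints(depth):
    """Integers k such that the intervals [k, k + 1] / 3**depth are the
    ones kept after `depth` triadic Cantor subdivisions of [0, 1]."""
    pts = np.array([0], dtype=np.int64)
    for _ in range(depth):
        # Each kept interval gives a left child (3k) and a right child (3k + 2).
        pts = np.concatenate([3 * pts, 3 * pts + 2])
    return pts

def box_counts(depth, levels):
    """Number of triadic boxes of size 3**-m that contain a kept interval."""
    pts = cantor_left_endpoints(depth)
    return [len(np.unique(pts // 3 ** int(depth - m))) for m in levels]

depth = 10
levels = np.arange(1, depth + 1)
counts = box_counts(depth, levels)
# Slope of log N(s) against log(1/s) with s = 3**-m estimates D.
slope = np.polyfit(levels * np.log(3.0), np.log(counts), 1)[0]
```

The counts are $2^m$ at each level, and the fitted slope agrees with $\log 2/\log 3 \approx 0.63$, the fractal dimension obtained above.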
Self-Similar Functions Let $f$ be a continuous function with a compact support $S$. We say that $f$ is self-similar if there exist disjoint
subsets $S_1,\ldots,S_k$ such that the graph of $f$ restricted to each $S_i$ is an
affine transformation of $f$. This means that there exist a scale $l_i > 1$,
a translation $r_i$, a weight $p_i$ and a constant $c_i$ such that
$$\forall t \in S_i,\quad f(t) = c_i + p_i\,f\big(l_i(t-r_i)\big) . \tag{6.77}$$
Outside these subsets, we suppose that $f$ is constant. Generalizations
of this definition can also be used [110].
If a function is self-similar, its wavelet transform is also. Let $g$ be
an affine transformation of $f$:
$$g(t) = p\,f\big(l(t-r)\big) + c . \tag{6.78}$$
Its wavelet transform is
$$Wg(u,s) = \int_{-\infty}^{+\infty} g(t)\,\frac{1}{\sqrt s}\,\psi\!\left(\frac{t-u}{s}\right)dt .$$
With the change of variable $t' = l(t-r)$, since $\psi$ has a zero average,
the affine relation (6.78) implies
$$Wg(u,s) = \frac{p}{\sqrt l}\,Wf\big(l(u-r),\,sl\big) .$$
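The intermediate steps of this change of variable can be written out as follows. This worked derivation only uses (6.78), the definition of the wavelet transform and the zero average of $\psi$. The constant $c$ disappears because $\int\psi = 0$, and $t' = l(t-r)$ gives $dt = dt'/l$ and $(t-u)/s = \big(t' - l(u-r)\big)/(sl)$:

$$
\begin{aligned}
Wg(u,s) &= p\int_{-\infty}^{+\infty} f\big(l(t-r)\big)\,\frac{1}{\sqrt s}\,\psi\!\left(\frac{t-u}{s}\right)dt
 \;+\; c\int_{-\infty}^{+\infty}\frac{1}{\sqrt s}\,\psi\!\left(\frac{t-u}{s}\right)dt \\
&= \frac{p}{l}\int_{-\infty}^{+\infty} f(t')\,\frac{1}{\sqrt s}\,\psi\!\left(\frac{t'-l(u-r)}{sl}\right)dt' \\
&= \frac{p}{l}\,\sqrt l\int_{-\infty}^{+\infty} f(t')\,\frac{1}{\sqrt{sl}}\,\psi\!\left(\frac{t'-l(u-r)}{sl}\right)dt' \\
&= \frac{p}{\sqrt l}\,Wf\big(l(u-r),\,sl\big) .
\end{aligned}
$$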
Suppose that $\psi$ has a compact support included in $[-K,K]$. The
affine invariance (6.77) of $f$ over $S_i = [a_i,b_i]$ produces an affine invariance for all wavelets whose support is included in $S_i$. For any
$s < (b_i-a_i)/K$ and any $u \in [a_i+Ks,\,b_i-Ks]$,
$$Wf(u,s) = \frac{p_i}{\sqrt{l_i}}\,Wf\big(l_i(u-r_i),\,sl_i\big) .$$
The self-similarity of the wavelet transform implies that the positions
and values of its modulus maxima are also self-similar. This can be used
to recover unknown affine invariance properties with a voting procedure
based on wavelet modulus maxima [218].

Example 6.9 A Cantor measure is constructed over a Cantor set. Let $d\mu_0(x) = dx$ be the uniform Lebesgue measure on $[0,1]$. As in
the Cantor set construction, this measure is subdivided into three uniform measures, whose integrals over $[0,1/3]$, $[1/3,2/3]$ and $[2/3,1]$ are
respectively $p_1$, 0 and $p_2$. We impose $p_1 + p_2 = 1$ to obtain a total
measure $d\mu_1$ on $[0,1]$ whose integral is equal to 1. This operation is
iteratively repeated by dividing each uniform measure of integral $p$ over
$[a,a+l]$ into three equal parts whose integrals are respectively $p_1p$, 0
and $p_2p$ over $[a,a+l/3]$, $[a+l/3,a+2l/3]$ and $[a+2l/3,a+l]$. This is
illustrated by Figure 6.15. After each subdivision, the resulting measure $d\mu_n$ has a unit integral. In the limit, we obtain a Cantor measure
$d\mu_\infty$ of unit integral, whose support is the triadic Cantor set.
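The recursive subdivision can be sketched numerically by storing the integral of $d\mu_n$ over each of the $3^n$ triadic intervals of $[0,1]$. This is a minimal illustration; the function name `cantor_measure` and the choice of weights in the example are assumptions.

```python
import numpy as np

def cantor_measure(p1, p2, depth):
    """Integrals of d(mu_depth) over the 3**depth triadic intervals of
    [0, 1], listed from left to right."""
    mass = np.array([1.0])                 # d(mu_0): unit mass on [0, 1]
    weights = np.array([p1, 0.0, p2])      # left third, central hole, right third
    for _ in range(depth):
        # Each interval of mass p is split into masses p1*p, 0 and p2*p.
        mass = np.outer(mass, weights).ravel()
    return mass

mass = cantor_measure(0.4, 0.6, depth=2)
# Cumulative sums give the integral of d(mu_2) up to the right end of
# each interval, a discrete approximation of the integrated measure.
staircase = np.cumsum(mass)
```

After $n$ subdivisions only $2^n$ of the $3^n$ intervals carry a non-zero mass, and the masses always sum to 1 when $p_1 + p_2 = 1$.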
Figure 6.15: Two subdivisions of the uniform measure on $[0,1]$ with
left and right weights $p_1$ and $p_2$. The Cantor measure $d\mu_\infty$ is the limit
of an infinite number of these subdivisions.

Example 6.10 A devil's staircase is the integral of a Cantor measure:
$$f(t) = \int_0^t d\mu_\infty(x) . \tag{6.79}$$
It is a continuous function that increases from 0 to 1 on $[0,1]$. The
recursive construction of the Cantor measure implies that $f$ is self-similar:
$$f(t) = \begin{cases} p_1\,f(3t) & \text{if } t \in [0,1/3] \\ p_1 & \text{if } t \in [1/3,2/3] \\ p_1 + p_2\,f(3t-2) & \text{if } t \in [2/3,1] \end{cases} .$$
Figure 6.16 displays the devil's staircase obtained with $p_1 = p_2 = 0.5$.
The wavelet transform below is calculated with a wavelet that is the
first derivative of a Gaussian. The self-similarity of $f$ yields a wavelet
transform and modulus maxima that are self-similar. The subdivision
of each interval in three parts appears through the multiplication by
2 of the maxima lines, when the scale is multiplied by 3. This Cantor construction is generalized with different interval subdivisions and
weight allocations, beginning from the same Lebesgue measure $d\mu_0$ on
$[0,1]$ [5].

[Figure 6.16 panels: f(t) versus t on [0, 1]; (a) and (b) $\log_2(s)$ from -6 to 0 versus u on [0, 1].]
Figure 6.16: Devil's staircase calculated from a Cantor measure with
equal weights $p_1 = p_2 = 0.5$. (a): Wavelet transform $Wf(u,s)$ computed with $\psi = -\theta'$, where $\theta$ is Gaussian. (b): Wavelet transform
modulus maxima.

6.4.2 Singularity Spectrum 3

Finding the distribution of singularities in a multifractal signal $f$ is
particularly important for analyzing its properties. The spectrum of
singularity measures the global repartition of singularities having different Lipschitz regularity. The pointwise Lipschitz regularity of f is
given by De nition 6.1. De nition 6.2 (Spectrum) Let S be the set of all points t 2 R where the pointwise Lipschitz regularity of f is equal to . The spectrum
of singularity D( ) of f is the fractal dimension of S . The support of
D( ) is the set of such that S is not empty. This spectrum was originally introduced by Frisch and Parisi 185] to
analyze the homogeneity of multifractal measures that model the energy
dissipation of turbulent uids. It was then extended by Arneodo, Bacry
and Muzy 278] to multifractal signals. The fractal dimension de nition
(6.76) shows that if we make a disjoint cover of the support of f with
intervals of size s then the number of intervals that intersect S is N (s) s;D( ) : (6.80) CHAPTER 6. WAVELET ZOOM 280 The singularity spectrum gives the proportion of Lipschitz singularities that appear at any scale s. A multifractal f is said to be homogeneous if all singularities have the same Lipschitz exponent 0 , which
means the support of D( ) is restricted to f 0g. Fractional Brownian
motions are examples of homogeneous multifractals. Partition Function One cannot compute the pointwise Lipschitz regularity of a multifractal because its singularities are not isolated, and
the nite numerical resolution is not su cient to discriminate them. It
is however possible to measure the singularity spectrum of multifractals from the wavelet transform local maxima, using a global partition
function introduced by Arneodo, Bacry and Muzy 278].
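The counting relation (6.80) can be checked by hand in the simplest homogeneous case. The sketch below is an illustration with hypothetical function names: it counts the intervals of size $s = 3^{-n}$ kept by the triadic Cantor construction and estimates the fractal dimension as the slope of $\log N(s)$ against $\log(1/s)$.

```python
import math

def cantor_intervals(n_iter):
    """Left endpoints, as integers k standing for k * 3**-n_iter, of the
    intervals of length 3**-n_iter kept by the triadic Cantor construction."""
    lefts = [0]
    for _ in range(n_iter):
        # Each kept interval is split in three; the middle third is discarded.
        lefts = [3 * k + d for k in lefts for d in (0, 2)]
    return lefts

def dimension_estimate(max_iter):
    """Least-squares slope of log N(s) against log(1/s) for s = 3**-n."""
    xs = [n * math.log(3.0) for n in range(1, max_iter + 1)]   # log(1/s)
    ys = [math.log(len(cantor_intervals(n))) for n in range(1, max_iter + 1)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

print(dimension_estimate(8))   # close to log 2 / log 3
```

Here $N(s) = 2^n$ exactly, so the regression is trivial. For measured data the same slope estimate would be applied to noisy counts over a limited range of scales.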
Let $\psi$ be a wavelet with $n$ vanishing moments. Theorem 6.5 proves that if $f$ has pointwise Lipschitz regularity $\alpha_0 < n$ at $v$ then the wavelet transform $Wf(u,s)$ has a sequence of modulus maxima that converges towards $v$ at fine scales. The set of maxima at the scale $s$ can thus be interpreted as a covering of the singular support of $f$ with wavelets of scale $s$. At these maxima locations
$$|Wf(u,s)| \sim s^{\alpha_0 + 1/2}.$$
Let $\{u_p(s)\}_{p \in \mathbb{Z}}$ be the position of all local maxima of $|Wf(u,s)|$ at a fixed scale $s$. The partition function $Z$ measures the sum at a power $q$ of all these wavelet modulus maxima:
$$Z(q,s) = \sum_p |Wf(u_p, s)|^q. \qquad (6.81)$$
At each scale $s$, any two consecutive maxima $u_p$ and $u_{p+1}$ are supposed to have a distance $|u_{p+1} - u_p| > \epsilon s$, for some $\epsilon > 0$. If not, over intervals of size $\epsilon s$, the sum (6.81) includes only the maxima of largest amplitude. This protects the partition function from the multiplication of very close maxima created by fast oscillations.
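The sum (6.81) with its separation rule can be sketched as follows. This is a minimal illustration, not the implementation used for the figures: the positions and amplitudes of the modulus maxima are assumed to be already available, amplitudes are assumed strictly positive, and the separation rule is implemented here by keeping the largest maximum in each cell of a fixed grid of step $\epsilon s$, which is only one possible reading of the rule. The name `partition_function` and its parameters are hypothetical.

```python
import math

def partition_function(positions, amplitudes, q, s, eps=1.0):
    """Sum of |Wf(u_p, s)|**q over the modulus maxima at scale s, keeping
    only the largest maximum in each interval of size eps * s."""
    largest = {}
    for u, a in zip(positions, amplitudes):
        cell = math.floor(u / (eps * s))   # index of the interval containing u
        if a > largest.get(cell, 0.0):
            largest[cell] = a
    return sum(a ** q for a in largest.values())

# Hypothetical maxima at scale s = 0.1: the first two are closer than eps * s.
u = [0.00, 0.05, 0.50]
w = [1.0, 2.0, 3.0]
print(partition_function(u, w, q=2, s=0.1))   # 2**2 + 3**2
print(partition_function(u, w, q=0, s=0.1))   # number of maxima kept
```

For $q < 0$ the sum is dominated by the smallest amplitudes, so a single spurious near-zero maximum can change the result by orders of magnitude.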
For each $q \in \mathbb{R}$, the scaling exponent $\tau(q)$ measures the asymptotic decay of $Z(q,s)$ at fine scales $s$:
$$\tau(q) = \liminf_{s \to 0} \frac{\log Z(q,s)}{\log s}.$$
This typically means that
$$Z(q,s) \sim s^{\tau(q)}.$$

Legendre Transform The following theorem relates $\tau(q)$ to the Legendre transform of $D(\alpha)$ for self-similar signals. This result was established in [83] for a particular class of fractal signals and generalized by Jaffard [222].
Theorem 6.7 (Arneodo, Bacry, Jaffard, Muzy) Let $\Lambda = [\alpha_{\min}, \alpha_{\max}]$ be the support of $D(\alpha)$. Let $\psi$ be a wavelet with $n > \alpha_{\max}$ vanishing moments. If $f$ is a self-similar signal then
$$\tau(q) = \min_{\alpha \in \Lambda} \big( q(\alpha + 1/2) - D(\alpha) \big). \qquad (6.82)$$

Proof $^3$. The detailed proof is long; we only give an intuitive justification. The sum (6.81) over all maxima positions is replaced by an integral over the Lipschitz parameter. At the scale $s$, (6.80) indicates that the density of modulus maxima that cover a singularity with Lipschitz exponent $\alpha$ is proportional to $s^{-D(\alpha)}$. At locations where $f$ has Lipschitz regularity $\alpha$, the wavelet transform decay is approximated by
$$|Wf(u,s)| \sim s^{\alpha + 1/2}.$$
It follows that
$$Z(q,s) \sim \int_\Lambda s^{q(\alpha + 1/2)}\, s^{-D(\alpha)}\, d\alpha.$$
When $s$ goes to 0 we derive that $Z(q,s) \sim s^{\tau(q)}$ for $\tau(q) = \min_{\alpha \in \Lambda} \big( q(\alpha + 1/2) - D(\alpha) \big)$.

This theorem proves that the scaling exponent $\tau(q)$ is the Legendre transform of $D(\alpha)$. It is necessary to use a wavelet with enough vanishing moments to measure all Lipschitz exponents up to $\alpha_{\max}$. In numerical calculations $\tau(q)$ is computed by evaluating the sum $Z(q,s)$. We thus need to invert the Legendre transform (6.82) to recover the spectrum of singularity $D(\alpha)$.

Proposition 6.2 The scaling exponent $\tau(q)$ is a convex and increasing function of $q$. The Legendre transform (6.82) is invertible if and only if $D(\alpha)$ is convex, in which case
$$D(\alpha) = \min_{q \in \mathbb{R}} \big( q(\alpha + 1/2) - \tau(q) \big). \qquad (6.83)$$
The spectrum $D(\alpha)$ of self-similar signals is convex.
Proof $^3$. The proof that $D(\alpha)$ is convex for self-similar signals can be found in [222]. We concentrate on the properties of the Legendre transform that are important in numerical calculations. To simplify the proof, let us suppose that $D(\alpha)$ is twice differentiable. The minimum of the Legendre transform (6.82) is reached at a critical point $q(\alpha)$. Computing the derivative of $q(\alpha + 1/2) - D(\alpha)$ with respect to $\alpha$ gives
$$q(\alpha) = \frac{dD}{d\alpha} \qquad (6.84)$$
with
$$\tau(q) = q\,(\alpha + 1/2) - D(\alpha). \qquad (6.85)$$
Since it is a minimum, the second derivative of $q(\alpha + 1/2) - D(\alpha)$ with respect to $\alpha$ is non-negative at this point, from which we derive that
$$\frac{d^2 D(\alpha(q))}{d\alpha^2} \le 0.$$
This proves that $\tau(q)$ depends only on the values $\alpha$ where $D(\alpha)$ has a negative second derivative. We can thus recover $D(\alpha)$ from $\tau(q)$ only if it is convex.
The derivative of $\tau(q)$ is
$$\frac{d\tau(q)}{dq} = \alpha + \frac{1}{2} + q\,\frac{d\alpha}{dq} - \frac{d\alpha}{dq}\,\frac{dD(\alpha)}{d\alpha} = \alpha + \frac{1}{2} \ge 0. \qquad (6.86)$$
It is therefore increasing. Its second derivative is
$$\frac{d^2\tau(q)}{dq^2} = \frac{d\alpha}{dq}.$$
Taking the derivative of (6.84) with respect to $q$ proves that
$$\frac{d\alpha}{dq}\,\frac{d^2 D(\alpha)}{d\alpha^2} = 1.$$
Since $\frac{d^2 D(\alpha)}{d\alpha^2} \le 0$ we derive that $\frac{d^2\tau(q)}{dq^2} \le 0$. Hence $\tau(q)$ is convex. By using (6.85), (6.86) and the fact that $\tau(q)$ is convex, we verify that
$$D(\alpha) = \min_{q \in \mathbb{R}} \big( q(\alpha + 1/2) - \tau(q) \big).$$

The spectrum $D(\alpha)$ of self-similar signals is convex and can therefore be calculated from $\tau(q)$ with the inverse Legendre formula (6.83). This formula is also valid for a much larger class of multifractals. For example, it is verified for statistical self-similar signals such as realizations of fractional Brownian motions. Multifractals having some stochastic self-similarity have a spectrum that can often be calculated as an inverse Legendre transform (6.83). However, let us emphasize that this formula is not exact for every function $f$ because its spectrum of singularity $D(\alpha)$ is not necessarily convex. In general, Jaffard proved [222] that the Legendre transform (6.83) gives only an upper bound of $D(\alpha)$. These singularity spectrum properties are studied in detail in [49].
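The pair of transforms (6.82) and (6.83) can be tried on a sampled grid. The sketch below is an illustration under stated assumptions: the toy spectrum is an arbitrary parabola with a negative second derivative (not a spectrum taken from the text), the grids in $\alpha$ and $q$ are arbitrary, and the function names are hypothetical. It computes $\tau(q)$ from $D(\alpha)$, then recovers $D(\alpha)$ from $\tau(q)$.

```python
def scaling_exponent(alphas, spectrum, qs):
    """tau(q) = min over alpha of q * (alpha + 1/2) - D(alpha), as in (6.82)."""
    return [min(q * (a + 0.5) - d for a, d in zip(alphas, spectrum)) for q in qs]

def inverse_legendre(alphas, qs, tau):
    """D(alpha) = min over q of q * (alpha + 1/2) - tau(q), as in (6.83)."""
    return [min(q * (a + 0.5) - t for q, t in zip(qs, tau)) for a in alphas]

# Toy spectrum (an assumption made for illustration): a downward parabola
# whose maximum is D(0.7) = 1.
alphas = [0.4 + 0.005 * i for i in range(121)]       # alpha in [0.4, 1.0]
spectrum = [1.0 - 8.0 * (a - 0.7) ** 2 for a in alphas]
qs = [-6.0 + 0.05 * k for k in range(241)]           # q in [-6, 6]

tau = scaling_exponent(alphas, spectrum, qs)
recovered = inverse_legendre(alphas, qs, tau)
max_error = max(abs(r - d) for r, d in zip(recovered, spectrum))
print(max_error)   # small: limited by the grid steps in alpha and q
print(-tau[120])   # -tau(0), the maximum of the spectrum
```

On a grid, the recovered values never fall below the sampled spectrum, so the discretization error is one-sided. The range of $q$ must cover the slopes $dD/d\alpha$ of the part of the spectrum to be recovered.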
Figure 6.17 illustrates the properties of a convex spectrum $D(\alpha)$. The Legendre transform (6.82) proves that its maximum is reached at
$$D(\alpha_0) = \max_{\alpha \in \Lambda} D(\alpha) = -\tau(0).$$
It is the fractal dimension of the Lipschitz exponent $\alpha_0$ most frequently encountered in $f$. Since all other Lipschitz $\alpha$ singularities appear over sets of lower dimension, if $\alpha_0 < 1$ then $D(\alpha_0)$ is also the fractal dimension of the singular support of $f$. The spectrum $D(\alpha)$ for $\alpha < \alpha_0$ depends on $\tau(q)$ for $q > 0$, and for $\alpha > \alpha_0$ it depends on $\tau(q)$ for $q < 0$.

Numerical Calculations To compute $D(\alpha)$, we assume that the Legendre transform formula (6.83) is valid. We first calculate $Z(q,s) = \sum_p |Wf(u_p,s)|^q$, then derive the decay scaling exponent $\tau(q)$, and finally compute $D(\alpha)$ with a Legendre transform. If $q < 0$ then the value of $Z(q,s)$ depends mostly on the small amplitude maxima $|Wf(u_p,s)|$. Numerical calculations may then become unstable. To avoid introducing spurious modulus maxima created by numerical errors in regions where $f$ is nearly constant, wavelet maxima are chained to produce maxima curves across scales. If $\psi = (-1)^p\, \theta^{(p)}$ where $\theta$ is a Gaussian, Proposition 6.1 proves that all maxima lines $u_p(s)$ define curves that propagate up to the limit $s = 0$. All maxima lines that do not propagate up to the finest scale are thus removed in the calculation of $Z(q,s)$. The calculation of the spectrum $D(\alpha)$ proceeds as follows.

Figure 6.17: Convex spectrum $D(\alpha)$. The maximum is reached at $\alpha_0$ for $q = 0$; the branch $q > 0$ extends down to $\alpha_{\min}$ ($q = +\infty$) and the branch $q < 0$ extends up to $\alpha_{\max}$ ($q = -\infty$).

1. Maxima Compute $Wf(u,s)$ and the modulus maxima at each scale $s$. Chain the wavelet maxima across scales.
2. Partition function Compute
$$Z(q,s) = \sum_p |Wf(u_p,s)|^q.$$
3. Scaling Compute $\tau(q)$ with a linear regression of $\log_2 Z(q,s)$ as a function of $\log_2 s$:
$$\log_2 Z(q,s) \approx \tau(q)\, \log_2 s + C(q).$$
4. Spectrum Compute
$$D(\alpha) = \min_{q \in \mathbb{R}} \big( q(\alpha + 1/2) - \tau(q) \big).$$

Example 6.11 The spectrum of singularity $D(\alpha)$ of the devil's staircase (6.79) is a convex function that can be calculated analytically [203].
Suppose that $p_1 < p_2$. The support of $D(\alpha)$ is $[\alpha_{\min}, \alpha_{\max}]$ with
$$\alpha_{\min} = \frac{-\log p_2}{\log 3} \quad \text{and} \quad \alpha_{\max} = \frac{-\log p_1}{\log 3}.$$
If $p_1 = p_2 = 1/2$ then the support of $D(\alpha)$ is reduced to a point, which means that all the singularities of $f$ have the same Lipschitz $\log 2 / \log 3$ regularity. The value $D(\log 2 / \log 3)$ is then the fractal dimension of the triadic Cantor set and is thus equal to $\log 2 / \log 3$.
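As a quick numerical check of these two endpoints, the following sketch evaluates them for two choices of weights. The helper name `staircase_support` is hypothetical, and $p_1 \le p_2$ is assumed.

```python
import math

def staircase_support(p1, p2):
    """Endpoints (alpha_min, alpha_max) of the support of D(alpha) for the
    devil's staircase with weights p1 <= p2."""
    return -math.log(p2) / math.log(3.0), -math.log(p1) / math.log(3.0)

print(staircase_support(0.4, 0.6))   # unequal weights: an interval
print(staircase_support(0.5, 0.5))   # equal weights: both equal log 2 / log 3
```

With $p_1 = 0.4$ and $p_2 = 0.6$ the support is approximately $[0.465, 0.834]$, and with equal weights it collapses to the single point $\log 2 / \log 3 \approx 0.631$.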
Figure 6.18(a) shows a devil's staircase calculated with $p_1 = 0.4$ and $p_2 = 0.6$. Its wavelet transform is computed with $\psi = -\theta'$, where $\theta$ is a Gaussian. The decay of $\log_2 Z(q,s)$ as a function of $\log_2 s$ is shown in Figure 6.18(b) for several values of $q$. The resulting $\tau(q)$ and $D(\alpha)$ are given by Figures 6.18(c,d). There is no numerical instability for $q < 0$ because there is no modulus maximum whose amplitude is close to zero. This is not the case if the wavelet transform is calculated with a wavelet that has more vanishing moments.

Smooth Perturbations Let $f$ be a multifractal whose spectrum of singularity $D(\alpha)$ is calculated from $\tau(q)$. If a $C^\infty$ signal $g$ is added to $f$ then the singularities are not modified and the singularity spectrum of $\tilde f = f + g$ remains $D(\alpha)$. We study the effect of this smooth perturbation on the spectrum calculation.
The wavelet transform of $\tilde f$ is
$$W\tilde f(u,s) = Wf(u,s) + Wg(u,s).$$
Let $\tau(q)$ and $\tilde\tau(q)$ be the scaling exponents of the partition functions $Z(q,s)$ and $\tilde Z(q,s)$ calculated from the modulus maxima respectively of $Wf(u,s)$ and $W\tilde f(u,s)$. We denote by $D(\alpha)$ and $\tilde D(\alpha)$ the Legendre transforms respectively of $\tau(q)$ and $\tilde\tau(q)$. The following proposition relates $\tau(q)$ and $\tilde\tau(q)$.

Proposition 6.3 (Arneodo, Bacry, Muzy) Let $\psi$ be a wavelet with exactly $n$ vanishing moments. Suppose that $f$ is a self-similar function.
If $g$ is a polynomial of degree $p < n$ then $\tau(q) = \tilde\tau(q)$ for all $q \in \mathbb{R}$.
If $g^{(n)}$ is almost everywhere non-zero then
$$\tilde\tau(q) = \begin{cases} \tau(q) & \text{if } q > q_c \\ (n + 1/2)\, q & \text{if } q \le q_c \end{cases} \qquad (6.87)$$
where $q_c$ is defined by $\tau(q_c) = (n + 1/2)\, q_c$.
Figure 6.18: (a): Devil's staircase with $p_1 = 0.4$ and $p_2 = 0.6$. (b): Partition function $Z(q,s)$ for several values of $q$. (c): Scaling exponent $\tau(q)$. (d): The theoretical spectrum $D(\alpha)$ is shown with a solid line. The + are the spectrum values calculated numerically with a Legendre transform of $\tau(q)$.

Proof $^3$. If $g$ is a polynomial of degree $p < n$ then $Wg(u,s) = 0$. The
addition of $g$ does not modify the calculation of the singularity spectrum based on wavelet maxima, so $\tau(q) = \tilde\tau(q)$ for all $q \in \mathbb{R}$.
If $g$ is a $C^\infty$ function that is not a polynomial then its wavelet transform is generally non-zero. We justify (6.87) with an intuitive argument that is not a proof. A rigorous proof can be found in [83]. Since $\psi$ has exactly $n$ vanishing moments, (6.16) proves that
$$|Wg(u,s)| \sim K\, s^{n + 1/2}\, g^{(n)}(u).$$
We suppose that $g^{(n)}(u) \ne 0$. For $\tau(q) \le (n + 1/2)\, q$, since $|Wg(u,s)|^q \sim s^{q(n + 1/2)}$ has a faster asymptotic decay than $s^{\tau(q)}$ when $s$ goes to zero, one can verify that $Z(q,s)$ and $\tilde Z(q,s)$ have the same scaling exponent, $\tilde\tau(q) = \tau(q)$. If $\tau(q) > (n + 1/2)\, q$, which means that $q \le q_c$, then the decay of $|W\tilde f(u,s)|^q$ is controlled by the decay of $|Wg(u,s)|^q$, so $\tilde\tau(q) = (n + 1/2)\, q$.

This proposition proves that the addition of a non-polynomial smooth
function introduces a bias in the calculation of the singularity spectrum. Let $\alpha_c$ be the critical Lipschitz exponent corresponding to $q_c$:
$$D(\alpha_c) = q_c\, (\alpha_c + 1/2) - \tau(q_c).$$
The Legendre transform of $\tilde\tau(q)$ in (6.87) yields
$$\tilde D(\alpha) = \begin{cases} D(\alpha) & \text{if } \alpha \le \alpha_c \\ 0 & \text{if } \alpha = n \\ -\infty & \text{if } \alpha > \alpha_c \text{ and } \alpha \ne n \end{cases} \qquad (6.88)$$
This modification is illustrated by Figure 6.19.
The bias introduced by the addition of smooth components can be detected experimentally by modifying the number $n$ of vanishing moments of $\psi$. Indeed the value of $q_c$ depends on $n$. If the singularity spectrum varies when changing the number of vanishing moments of the wavelet then it indicates the presence of a bias.

6.4.3 Fractal Noises $^3$

Fractional Brownian motions are statistically self-similar Gaussian processes that give interesting models for a wide class of natural phenomena [265]. Despite their non-stationarity, one can define a power spectrum that has a power decay. Realizations of fractional Brownian motions are almost everywhere singular, with the same Lipschitz regularity at all points.

Figure 6.19: If $\psi$ has $n$ vanishing moments, in presence of a $C^\infty$ perturbation the computed spectrum $\tilde D(\alpha)$ is identical to the true spectrum $D(\alpha)$ for $\alpha \le \alpha_c$. Its support is reduced to $\{n\}$ for $\alpha > \alpha_c$.
We often encounter fractal noise processes that are not Gaussian although their power spectrum has a power decay. Realizations of these processes may include singularities of various types. The spectrum of singularity is then important in analyzing their properties. This is illustrated by an application to hydrodynamic turbulence.

Definition 6.3 (Fractional Brownian motion) A fractional Brownian motion of Hurst exponent $0 < H < 1$ is a zero-mean Gaussian process $B_H$ such that
$$B_H(0) = 0$$
and
$$E\{|B_H(t) - B_H(t - \Delta)|^2\} = \sigma^2\, |\Delta|^{2H}. \qquad (6.89)$$

Property (6.89) imposes that the deviation of $|B_H(t) - B_H(t - \Delta)|$ be proportional to $|\Delta|^H$. As a consequence, one can prove that any realization $f$ of $B_H$ is almost everywhere singular with a pointwise Lipschitz regularity $\alpha = H$. The smaller $H$, the more singular $f$. Figure 6.20(a) shows the graph of one realization for $H = 0.7$.
Setting $\Delta = t$ in (6.89) yields
$$E\{|B_H(t)|^2\} = \sigma^2\, |t|^{2H}.$$
Developing (6.89) for $\Delta = t - u$ also gives
$$E\{B_H(t)\, B_H(u)\} = \frac{\sigma^2}{2} \left( |t|^{2H} + |u|^{2H} - |t - u|^{2H} \right). \qquad (6.90)$$
The covariance does not depend only on $t - u$, which proves that a fractional Brownian motion is non-stationary.
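The covariance (6.90) is enough to simulate a realization on a small grid. The sketch below is a minimal illustration and not the method used for Figure 6.20: it builds the covariance matrix at the times $k/N$, factors it with a plain Cholesky decomposition, and colors a white Gaussian vector. All function names are hypothetical, and the cubic cost of the factorization limits this approach to short signals.

```python
import math
import random

def fbm_covariance(t, u, hurst, sigma=1.0):
    """Covariance E{B_H(t) B_H(u)} given by (6.90)."""
    h2 = 2.0 * hurst
    return 0.5 * sigma ** 2 * (abs(t) ** h2 + abs(u) ** h2 - abs(t - u) ** h2)

def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix (symmetric positive definite)."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = matrix[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low

def sample_fbm(n_samples, hurst, sigma=1.0, seed=0):
    """One realization of B_H at the times k / n_samples, k = 1..n_samples."""
    times = [k / n_samples for k in range(1, n_samples + 1)]
    cov = [[fbm_covariance(t, u, hurst, sigma) for u in times] for t in times]
    low = cholesky(cov)
    rng = random.Random(seed)
    white = [rng.gauss(0.0, 1.0) for _ in times]   # independent N(0, 1) samples
    path = [sum(low[i][k] * white[k] for k in range(i + 1))
            for i in range(n_samples)]
    return times, path

times, path = sample_fbm(64, hurst=0.7)
print(path[:4])
```

The time $t = 0$ is left out of the grid because $B_H(0) = 0$ makes the corresponding row and column of the covariance matrix vanish, which would break the factorization.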
The statistical self-similarity appears when scaling this process. One can derive from (6.90) that for any $s > 0$
$$E\{B_H(st)\, B_H(su)\} = E\{s^H B_H(t)\; s^H B_H(u)\}.$$
Since $B_H(st)$ and $s^H B_H(t)$ are two Gaussian processes with same mean and same covariance, they have the same probability distribution
$$B_H(st) \equiv s^H B_H(t),$$
where $\equiv$ denotes an equality of finite-dimensional distributions.

Power Spectrum Although $B_H$ is not stationary, one can define a generalized power spectrum. This power spectrum is introduced by proving that the increments of a fractional Brownian motion are stationary, and by computing their power spectrum [78].

Proposition 6.4 Let $g_\Delta(t) = \delta(t) - \delta(t - \Delta)$. The increment
$$I_{H,\Delta}(t) = B_H \star g_\Delta(t) = B_H(t) - B_H(t - \Delta) \qquad (6.91)$$
is a stationary process whose power spectrum is
$$\hat R_{I_{H,\Delta}}(\omega) = \frac{\sigma_H^2}{|\omega|^{2H+1}}\, |\hat g_\Delta(\omega)|^2. \qquad (6.92)$$

Proof $^2$. The covariance of $I_{H,\Delta}$ is computed with (6.90):
$$E\{I_{H,\Delta}(t)\, I_{H,\Delta}(t - \tau)\} = \frac{\sigma^2}{2} \left( |\tau - \Delta|^{2H} + |\tau + \Delta|^{2H} - 2|\tau|^{2H} \right) = R_{I_{H,\Delta}}(\tau). \qquad (6.93)$$
The power spectrum $\hat R_{I_{H,\Delta}}(\omega)$ is the Fourier transform of $R_{I_{H,\Delta}}(\tau)$. One can verify that the Fourier transform of the distribution $f(\tau) = |\tau|^{2H}$ is $\hat f(\omega) = -\lambda_H\, |\omega|^{-(2H+1)}$, with $\lambda_H > 0$. We thus derive that the Fourier transform of (6.93) can be written
$$\hat R_{I_{H,\Delta}}(\omega) = 2\, \sigma^2\, \lambda_H\, |\omega|^{-(2H+1)}\, \sin^2 \frac{\Delta\, \omega}{2},$$
which proves (6.92) for $\sigma_H^2 = \sigma^2 \lambda_H / 2$.

Figure 6.20: (a): One realization of a fractional Brownian motion for a Hurst exponent $H = 0.7$. (b): Wavelet transform. (c): Modulus maxima of its wavelet transform. (d): Scaling exponent $\tau(q)$. (e): Resulting $D(\alpha)$ over its support.

If $X(t)$ is a stationary process then we know that $Y(t) = X \star g(t)$ is also stationary and the power spectrum of both processes is related by
$$\hat R_X(\omega) = \frac{\hat R_Y(\omega)}{|\hat g(\omega)|^2}. \qquad (6.94)$$
Although $B_H(t)$ is not stationary, Proposition 6.4 proves that $I_{H,\Delta}(t) =$
$B_H \star g_\Delta(t)$ is stationary. As in (6.94), it is tempting to define a "generalized" power spectrum calculated with (6.92):
$$\hat R_{B_H}(\omega) = \frac{\hat R_{I_{H,\Delta}}(\omega)}{|\hat g_\Delta(\omega)|^2} = \frac{\sigma_H^2}{|\omega|^{2H+1}}. \qquad (6.95)$$
The non-stationarity of $B_H(t)$ appears in the energy blow-up at low frequencies. The increments $I_{H,\Delta}(t)$ are stationary because the multiplication by $|\hat g_\Delta(\omega)|^2 = O(\omega^2)$ removes the explosion of the low frequency energy. One can generalize this result and verify that if $g$ is an arbitrary stable filter whose transfer function satisfies $|\hat g(\omega)| = O(\omega)$, then $Y(t) = B_H \star g(t)$ is a stationary Gaussian process whose power spectrum is
$$\hat R_Y(\omega) = \frac{\sigma_H^2}{|\omega|^{2H+1}}\, |\hat g(\omega)|^2. \qquad (6.96)$$

Wavelet Transform The wavelet transform of a fractional Brownian motion is
$$WB_H(u,s) = B_H \star \bar\psi_s(u). \qquad (6.97)$$
Since $\psi$ has at least one vanishing moment, necessarily $|\hat\psi(\omega)| = O(\omega)$ in the neighborhood of $\omega = 0$. The wavelet filter $g = \bar\psi_s$ has a Fourier transform $\hat g(\omega) = \sqrt{s}\, \hat\psi^*(s\omega) = O(\omega)$ near $\omega = 0$. This proves that for a fixed $s$ the process $Y_s(u) = WB_H(u,s)$ is a Gaussian stationary process [181], whose power spectrum is calculated with (6.96):
$$\hat R_{Y_s}(\omega) = s\, |\hat\psi(s\omega)|^2\, \frac{\sigma_H^2}{|\omega|^{2H+1}} = s^{2H+2}\, \hat R_{Y_1}(s\omega). \qquad (6.98)$$
The self-similarity of the power spectrum and the fact that $B_H$ is Gaussian are sufficient to prove that $WB_H(u,s)$ is self-similar across scales:
$$WB_H(u,s) \equiv s^{H + 1/2}\, WB_H\!\left(\frac{u}{s}, 1\right),$$
where the equivalence means that they have the same finite-dimensional distributions. Interesting characterizations of fractional Brownian motion properties are also obtained by decomposing these processes in wavelet bases [49, 78, 357].

Example 6.12 Figure 6.20(a) displays one realization of a fractional Brownian motion with $H = 0.7$. The wavelet transform and its modulus maxima are shown in Figures 6.20(b) and 6.20(c). The partition function (6.81) is computed from the wavelet modulus maxima. Figure 6.20(d) gives the scaling exponent $\tau(q)$, which is nearly a straight line. Fractional Brownian motions are homogeneous fractals with Lipschitz exponents equal to $H$. In this example, the theoretical spectrum $D(\alpha)$ therefore has a support reduced to $\{0.7\}$ with $D(0.7) = 1$. The estimated spectrum in Figure 6.20(e) is calculated with a Legendre transform of $\tau(q)$. Its support is $[0.65, 0.75]$. There is an estimation error because the calculations are performed on a signal of finite size.

Fractal Noises Some physical phenomena produce more general fractal noises $X(t)$, which are not Gaussian processes, but which have stationary increments. As for fractional Brownian motions, one can define
a "generalized" power spectrum that has a power decay
$$\hat R_X(\omega) = \frac{\sigma_H^2}{|\omega|^{2H+1}}.$$
These processes are transformed into a wide-sense stationary process by a convolution with a stable filter $g$ which removes the lowest frequencies: $|\hat g(\omega)| = O(\omega)$. One can thus derive that the wavelet transform $Y_s(u) = WX(u,s)$ is a stationary process at any fixed scale $s$. Its spectrum is the same as the spectrum (6.98) of fractional Brownian motions. If $H < 1$, the asymptotic decay of $\hat R_X(\omega)$ indicates that realizations of $X(t)$ are singular functions but it gives no information on the distribution of these singularities. As opposed to fractional Brownian motions, general fractal noises have realizations that may include singularities of various types. Such multifractals are differentiated from realizations of fractional Brownian motions by computing their singularity spectrum $D(\alpha)$. For example, the velocity fields of fully developed turbulent flows have been modeled by fractal noises, but the calculation of the singularity spectrum clearly shows that these flows differ in important ways from fractional Brownian motions.

Hydrodynamic Turbulence Fully developed turbulence appears in incompressible flows at high Reynolds numbers. Understanding the properties of hydrodynamic turbulence is a major problem of modern physics, which remains mostly open despite an intense research effort since the first theory of Kolmogorov in 1941 [237]. The number of degrees of freedom of three-dimensional turbulence is considerable, which produces extremely complex spatio-temporal behavior. No formalism is yet able to build a statistical-physics framework based on the Navier-Stokes equations that would enable us to understand the global behavior of turbulent flows, as is done in thermodynamics.
In 1941, Kolmogorov [237] formulated a statistical theory of turbulence. The velocity field is modeled as a process $V(x)$ whose increments have a variance
$$E\{|V(x + \Delta) - V(x)|^2\} \sim \epsilon^{2/3}\, \Delta^{2/3}.$$
The constant $\epsilon$ is a rate of dissipation of energy per unit of mass and time, which is supposed to be independent of the location. This indicates that the velocity field is statistically homogeneous with Lipschitz regularity $\alpha = H = 1/3$. The theory predicts that a one-dimensional trace of a three-dimensional velocity field is a fractal noise process with stationary increments, and whose spectrum decays with a power exponent $2H + 1 = 5/3$:
$$\hat R_V(\omega) = \frac{\sigma_H^2}{|\omega|^{5/3}}.$$
The success of this theory comes from numerous experimental verifications of this power spectrum decay. However, the theory does not take into account the existence of coherent structures such as vortices. These phenomena contradict the hypothesis of homogeneity, which is at the root of Kolmogorov's 1941 theory.
Kolmogorov [238] modified the homogeneity assumption in 1962, by introducing an energy dissipation rate $\epsilon(x)$ that varies with the spatial location $x$. This opens the door to "local stochastic self-similar" multifractal models, first developed by Mandelbrot [264] to explain energy exchanges between fine-scale structures and large-scale structures. The spectrum of singularity $D(\alpha)$ plays an important role in testing these models [185]. Calculations with wavelet maxima on turbulent velocity fields [5] show that $D(\alpha)$ is maximum at $1/3$, as predicted by the Kolmogorov theory. However, $D(\alpha)$ does not have a support reduced to $\{1/3\}$, which verifies that a turbulent velocity field is not a homogeneous process. Models based on the wavelet transform were recently introduced to explain the distribution of vortices in turbulent fluids [12, 179, 180].

6.5 Problems
6.1. Lipschitz regularity
(a) Prove that if $f$ is uniformly Lipschitz $\alpha$ on $[a,b]$ then it is pointwise Lipschitz $\alpha$ at all $t_0 \in [a,b]$.
(b) Show that $f(t) = t \sin t^{-1}$ is Lipschitz 1 at all $t_0 \in [-1,1]$ and verify that it is uniformly Lipschitz $\alpha$ over $[-1,1]$ only for $\alpha \le 1/2$. Hint: consider the points $t_n = (n + 1/2)^{-1}\, \pi^{-1}$.
6.2. $^1$ Regularity of derivatives
(a) Prove that $f$ is uniformly Lipschitz $\alpha > 1$ over $[a,b]$ if and only if $f'$ is uniformly Lipschitz $\alpha - 1$ over $[a,b]$.
(b) Show that $f$ may be pointwise Lipschitz $\alpha > 1$ at $t_0$ while $f'$ is not pointwise Lipschitz $\alpha - 1$ at $t_0$. Consider $f(t) = t^2 \cos t^{-1}$ at $t = 0$.
6.3. $^1$ Find $f(t)$ which is uniformly Lipschitz 1 but does not satisfy the sufficient Fourier condition (6.1).
6.4. $^1$ Let $f(t) = \cos \omega_0 t$ and $\psi(t)$ be a wavelet that is symmetric about 0.
(a) Verify that
$$Wf(u,s) = \sqrt{s}\, \hat\psi(s\omega_0)\, \cos \omega_0 u.$$
(b) Find the equations of the curves of wavelet modulus maxima in the time-scale plane $(u,s)$. Relate the decay of $|Wf(u,s)|$ along these curves to the number $n$ of vanishing moments of $\psi$.
6.5. $^1$ Let $f(t) = |t|^\alpha$. Show that $Wf(u,s) = s^{\alpha + 1/2}\, Wf(u/s, 1)$. Prove that it is not sufficient to measure the decay of $|Wf(u,s)|$ when $s$ goes to zero at $u = 0$ in order to compute the Lipschitz regularity of $f$ at $t = 0$.
6.6. $^2$ Let $f(t) = |t|^\alpha \sin |t|^{-\beta}$ with $\alpha > 0$ and $\beta > 0$. What is the pointwise Lipschitz regularity of $f$ and $f'$ at $t = 0$? Find the equation of the ridge curve in the $(u,s)$ plane along which the high amplitude wavelet coefficients $|Wf(u,s)|$ converge to $t = 0$ when $s$ goes to zero. Compute the maximum values of $\alpha$ and $\alpha'$ such that $Wf(u,s)$ satisfy (6.22).
6.7. $^1$ For a complex wavelet, we call lines of constant phase the curves in the $(u,s)$ plane along which the complex phase of $Wf(u,s)$ remains constant when $s$ varies.
(a) If $f(t) = |t|^\alpha$, prove that the lines of constant phase converge towards the singularity at $t = 0$ when $s$ goes to zero. Verify this numerically in WaveLab.
(b) Let $\psi$ be a real wavelet and $Wf(u,s)$ be the real wavelet transform of $f$. Show that the modulus maxima of $Wf(u,s)$ correspond to lines of constant phase of an analytic wavelet transform, which is calculated with a particular analytic wavelet $\psi^a$ that you will specify.
6.8. $^2$ Prove that if $f = \mathbf{1}_{[0,+\infty)}$ then the number of modulus maxima of $Wf(u,s)$ at each scale $s$ is larger than or equal to the number of vanishing moments of $\psi$.
6.9. $^1$ The spectrum of singularity of the Riemann function
$$f(t) = \sum_{n=-\infty}^{+\infty} \frac{1}{n^2} \sin n^2 t$$
is defined on its support by $D(\alpha) = 4\alpha - 2$ if $\alpha \in [1/2, 3/4]$ and $D(3/2) = 0$ [213, 222]. Verify this result numerically with WaveLab, by computing this spectrum from the partition function of a wavelet transform modulus maxima.
6.10. $^2$ Let $\psi = -\theta'$ where $\theta$ is a positive window of compact support. If $f$ is a Cantor devil's staircase, prove that there exist lines of modulus maxima that converge towards each singularity.
6.11. $^2$ Implement in WaveLab an algorithm that detects oscillating singularities by following the ridges of an analytic wavelet transform when the scale $s$ decreases. Test your algorithm on $f(t) = \sin t^{-1}$.
6.12. $^2$ Let $\theta(t)$ be a Gaussian of variance 1.
(a) Prove that the Laplacian of a two-dimensional Gaussian
$$\psi(x_1, x_2) = \frac{\partial^2 \theta(x_1)}{\partial x_1^2}\, \theta(x_2) + \theta(x_1)\, \frac{\partial^2 \theta(x_2)}{\partial x_2^2}$$
satisfies the dyadic wavelet condition (5.91) (there is only 1 wavelet).
(b) Explain why the zero-crossings of this dyadic wavelet transform provide the locations of multiscale edges in images. Compare the position of these zero-crossings with the wavelet modulus maxima obtained with $\psi^1(x_1, x_2) = -\theta'(x_1)\, \theta(x_2)$ and $\psi^2(x_1, x_2) = -\theta(x_1)\, \theta'(x_2)$.
6.13. $^1$ The covariance of a fractional Brownian motion $B_H(t)$ is given by (6.90). Show that the wavelet transform at a scale $s$ is stationary by verifying that
$$E\{WB_H(u_1, s)\, WB_H(u_2, s)\} = -\frac{\sigma^2}{2}\, s^{2H+1} \int_{-\infty}^{+\infty} |t|^{2H}\, \Psi\!\left(\frac{u_1 - u_2}{s} - t\right) dt$$
with $\Psi(t) = \psi \star \bar\psi(t)$ and $\bar\psi(t) = \psi(-t)$.
6.14. $^2$ Let $X(t)$ be a stationary Gaussian process whose covariance $R_X(\tau) = E\{X(t)\, X(t - \tau)\}$ is twice differentiable. One can prove that the average number of zero-crossings over an interval of size 1 is $-\pi\, R_X''(0)\, (\pi^2 R_X(0))^{-1}$ [56]. Let $B_H(t)$ be a fractional Brownian motion and $\psi$ a wavelet that is $C^2$. Prove that the average numbers respectively of zero-crossings and of modulus maxima of $WB_H(u,s)$ for $u \in [0,1]$ are proportional to $s$. Verify this result numerically in WaveLab.
6.15. $^3$ We want to interpolate the samples of a discrete signal $f(n/N)$ without blurring its singularities, by extending its dyadic wavelet transform at finer scales with an interpolation procedure on its modulus maxima. The modulus maxima are calculated at scales $2^j > N^{-1}$. Implement in WaveLab an algorithm that creates a new set of modulus maxima at the finer scale $N^{-1}$, by interpolating across scales the amplitudes and positions of the modulus maxima calculated at $2^j > N^{-1}$. Reconstruct a signal of size $2N$ by adding these fine scale modulus maxima to the maxima representation of the signal.
6.16. $^3$ Implement an algorithm that estimates the Lipschitz regularity $\alpha$ and the smoothing scale $\sigma$ of sharp variation points in one-dimensional signals by applying the result of Theorem 6.6 on the dyadic wavelet transform maxima. Extend Theorem 6.6 for two-dimensional signals and find an algorithm that computes the same parameters for edges in images.
6.17. $^3$ Construct a compact image code from multiscale wavelet maxima [261]. An efficient coding algorithm must be introduced to store the positions of the "important" multiscale edges as well as the modulus and the angle values of the wavelet transform along these edges. Do not forget that the wavelet transform angle is nearly orthogonal to the tangent of the edge curve. Use the image reconstruction algorithm of Section 6.3.2 to recover an image from this coded representation.
6.18. $^3$ A generalized Cantor measure is defined with a renormalization that transforms the uniform measure on $[0,1]$ into a measure equal to $p_1$, $0$ and $p_2$ respectively on $[0, l_1]$, $[l_1, l_2]$ and $[l_2, 1]$, with $p_1 + p_2 = 1$. Iterating infinitely many times this renormalization operation over each component of the resulting measures yields a Cantor measure. The integral (6.79) of this measure is a devil's staircase. Suppose that $l_1$, $l_2$, $p_1$ and $p_2$ are unknown. Find an algorithm that computes these renormalization parameters by analyzing the self-similarity properties of the wavelet transform modulus maxima across scales. This problem is important in order to identify renormalization maps in experimental data obtained from physical experiments.

Chapter 7
Wavelet Bases
One can construct wavelets $\psi$ such that the dilated and translated family
$$\left\{ \psi_{j,n}(t) = \frac{1}{\sqrt{2^j}}\, \psi\!\left(\frac{t - 2^j n}{2^j}\right) \right\}_{(j,n) \in \mathbb{Z}^2}$$
is an orthonormal basis of $L^2(\mathbb{R})$. Behind this simple statement lie very different points of view which open a fruitful exchange between harmonic analysis and discrete signal processing.
Orthogonal wavelets dilated by $2^j$ carry signal variations at the resolution $2^{-j}$. The construction of these bases can thus be related to multiresolution signal approximations. Following this link leads us to an unexpected equivalence between wavelet bases and conjugate mirror filters used in discrete multirate filter banks. These filter banks implement a fast orthogonal wavelet transform that requires only $O(N)$ operations for signals of size $N$. The design of conjugate mirror filters also gives new classes of wavelet orthogonal bases including regular wavelets of compact support. In several dimensions, wavelet bases of $L^2(\mathbb{R}^d)$ are constructed with separable products of functions of one variable.

7.1 Orthogonal Wavelet Bases $^1$

Our search for orthogonal wavelets begins with multiresolution approximations. For $f \in L^2(\mathbb{R})$, the partial sum $\sum_{n=-\infty}^{+\infty} \langle f, \psi_{j,n} \rangle\, \psi_{j,n}$ of wavelet coefficients can indeed be interpreted as the difference between two approximations of $f$ at the resolutions $2^{-j+1}$ and $2^{-j}$. Multiresolution approximations compute the approximation of signals at various resolutions with orthogonal projections on different spaces $\{V_j\}_{j \in \mathbb{Z}}$. Section 7.1.3 proves that multiresolution approximations are entirely characterized by a particular discrete filter that governs the loss of information across resolutions. These discrete filters provide a simple procedure for designing and synthesizing orthogonal wavelet bases.

7.1.1 Multiresolution Approximations
Adapting the signal resolution allows one to process only the relevant details for a particular task. In computer vision, Burt and Adelson [108] introduced a multiresolution pyramid that can be used to process a low-resolution image first and then selectively increase the resolution when necessary. This section formalizes multiresolution approximations, which set the ground for the construction of orthogonal wavelets.
The approximation of a function $f$ at a resolution $2^{-j}$ is specified by a discrete grid of samples that provides local averages of $f$ over neighborhoods of size proportional to $2^j$. A multiresolution approximation is thus composed of embedded grids of approximation. More formally, the approximation of a function at a resolution $2^{-j}$ is defined as an orthogonal projection on a space $V_j \subset L^2(\mathbb{R})$. The space $V_j$ regroups all possible approximations at the resolution $2^{-j}$. The orthogonal projection of $f$ is the function $f_j \in V_j$ that minimizes $\|f - f_j\|$. The following definition introduced by Mallat [254] and Meyer [47] specifies the mathematical properties of multiresolution spaces. To avoid confusion, let us emphasize that a scale parameter $2^j$ is the inverse of the resolution $2^{-j}$.

Definition 7.1 (Multiresolutions) A sequence $\{V_j\}_{j \in \mathbb{Z}}$ of closed subspaces of $L^2(\mathbb{R})$ is a multiresolution approximation if the following 6 properties are satisfied:
$$\forall (j,k) \in \mathbb{Z}^2, \quad f(t) \in V_j \Leftrightarrow f(t - 2^j k) \in V_j, \qquad (7.1)$$
$$\forall j \in \mathbb{Z}, \quad V_{j+1} \subset V_j, \qquad (7.2)$$
$$\forall j \in \mathbb{Z}, \quad f(t) \in V_j \Leftrightarrow f\!\left(\frac{t}{2}\right) \in V_{j+1}, \qquad (7.3)$$
$$\lim_{j \to +\infty} V_j = \bigcap_{j=-\infty}^{+\infty} V_j = \{0\}, \qquad (7.4)$$
$$\lim_{j \to -\infty} V_j = \text{Closure}\left( \bigcup_{j=-\infty}^{+\infty} V_j \right) = L^2(\mathbb{R}). \qquad (7.5)$$
There exists $\theta$ such that $\{\theta(t - n)\}_{n \in \mathbb{Z}}$ is a Riesz basis of $V_0$.
Let us give an intuitive explanation of these mathematical properties. Property (7.1) means that $V_j$ is invariant by any translation proportional to the scale $2^j$. As we shall see later, this space can be assimilated to a uniform grid with intervals $2^j$, which characterizes the signal approximation at the resolution $2^{-j}$. The inclusion (7.2) is a causality property which proves that an approximation at a resolution $2^{-j}$ contains all the necessary information to compute an approximation at a coarser resolution $2^{-j-1}$. Dilating functions in $V_j$ by 2 enlarges the details by 2 and (7.3) guarantees that it defines an approximation at a coarser resolution $2^{-j-1}$. When the resolution $2^{-j}$ goes to 0, (7.4) implies that we lose all the details of $f$ and
\[ \lim_{j \to +\infty} \|P_{V_j} f\| = 0. \tag{7.6} \]
On the other hand, when the resolution $2^{-j}$ goes to $+\infty$, property (7.5) imposes that the signal approximation converges to the original signal:
\[ \lim_{j \to -\infty} \|f - P_{V_j} f\| = 0. \tag{7.7} \]
When the resolution $2^{-j}$ increases, the decay rate of the approximation error $\|f - P_{V_j} f\|$ depends on the regularity of $f$. Section 9.1.3 relates this error to the uniform Lipschitz regularity of $f$.
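The convergence (7.7) can be observed numerically. The following Python sketch assumes the piecewise constant spaces of Example 7.1, for which the orthogonal projection replaces $f$ by its average on each interval $[n2^j, (n+1)2^j)$. The test signal, the sampling grid of 1024 points and the function names are illustrative choices and do not come from the text.

```python
import math

def project_piecewise_constant(samples, block):
    """Replace each run of `block` samples by its mean (discrete stand-in for P_{V_j})."""
    out = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        mean = sum(chunk) / len(chunk)
        out.extend([mean] * len(chunk))
    return out

def l2_error(a, b):
    """Discrete estimate of the L2([0,1)) norm of a - b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

N = 1024  # hypothetical fine sampling grid on [0, 1)
f = [math.sin(2 * math.pi * n / N) + 0.5 * math.cos(6 * math.pi * n / N)
     for n in range(N)]

scales = list(range(-1, -9, -1))      # j = -1, ..., -8, i.e. 2^j = 1/2, ..., 1/256
errors = []
for j in scales:
    block = N >> (-j)                 # number of fine samples per interval of size 2^j
    errors.append(l2_error(f, project_piecewise_constant(f, block)))

for j, e in zip(scales, errors):
    print(f"j = {j}: ||f - P_Vj f|| is about {e:.5f}")
```

Because the spaces are nested, the error cannot increase when $j$ decreases; how fast it decays depends on the regularity of the chosen signal.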
The existence of a Riesz basis $\{\theta(t-n)\}_{n \in \mathbb{Z}}$ of $V_0$ provides a discretization theorem. The function $\theta$ can be interpreted as a unit resolution cell; Appendix A.3 gives the definition of a Riesz basis. There exist $A > 0$ and $B > 0$ such that any $f \in V_0$ can be uniquely decomposed into
\[ f(t) = \sum_{n=-\infty}^{+\infty} a[n]\, \theta(t-n) \tag{7.8} \]
with
\[ A\, \|f\|^2 \le \sum_{n=-\infty}^{+\infty} |a[n]|^2 \le B\, \|f\|^2. \tag{7.9} \]
This energy equivalence guarantees that signal expansions over $\{\theta(t-n)\}_{n \in \mathbb{Z}}$ are numerically stable. With the dilation property (7.3) and the expansion (7.8), one can verify that the family $\{2^{-j/2}\, \theta(2^{-j}t - n)\}_{n \in \mathbb{Z}}$ is a Riesz basis of $V_j$ with the same Riesz bounds $A$ and $B$ at all scales $2^j$. The following proposition gives a necessary and sufficient condition for $\{\theta(t-n)\}_{n \in \mathbb{Z}}$ to be a Riesz basis.

Proposition 7.1 A family $\{\theta(t-n)\}_{n \in \mathbb{Z}}$ is a Riesz basis of the space $V_0$ it generates if and only if there exist $A > 0$ and $B > 0$ such that
\[ \forall \omega \in [-\pi, \pi], \quad \frac{1}{B} \le \sum_{k=-\infty}^{+\infty} |\hat\theta(\omega - 2k\pi)|^2 \le \frac{1}{A}. \tag{7.10} \]

Proof$^1$. Any $f \in V_0$ can be decomposed as
\[ f(t) = \sum_{n=-\infty}^{+\infty} a[n]\, \theta(t-n). \tag{7.11} \]
The Fourier transform of this equation yields
\[ \hat f(\omega) = \hat a(\omega)\, \hat\theta(\omega) \tag{7.12} \]
where $\hat a(\omega)$ is the Fourier series $\hat a(\omega) = \sum_{n=-\infty}^{+\infty} a[n] \exp(-in\omega)$. The norm of $f$ can thus be written
\[ \|f\|^2 = \frac{1}{2\pi} \int_{-\infty}^{+\infty} |\hat f(\omega)|^2\, d\omega = \frac{1}{2\pi} \int_0^{2\pi} \sum_{k=-\infty}^{+\infty} |\hat a(\omega + 2k\pi)|^2\, |\hat\theta(\omega + 2k\pi)|^2\, d\omega = \frac{1}{2\pi} \int_0^{2\pi} |\hat a(\omega)|^2 \sum_{k=-\infty}^{+\infty} |\hat\theta(\omega + 2k\pi)|^2\, d\omega \tag{7.13} \]
because $\hat a(\omega)$ is $2\pi$ periodic. The family $\{\theta(t-n)\}_{n \in \mathbb{Z}}$ is a Riesz basis if and only if
\[ A\, \|f\|^2 \le \frac{1}{2\pi} \int_0^{2\pi} |\hat a(\omega)|^2\, d\omega = \sum_{n=-\infty}^{+\infty} |a[n]|^2 \le B\, \|f\|^2. \tag{7.14} \]
If $\hat\theta$ satisfies (7.10) then (7.14) is derived from (7.13). The linear independence of $\{\theta(t-n)\}_{n \in \mathbb{Z}}$ is a consequence of the fact that (7.14) is valid for any $a[n]$ satisfying (7.11). If $f = 0$ then necessarily $a[n] = 0$ for all $n \in \mathbb{Z}$. The family $\{\theta(t-n)\}_{n \in \mathbb{Z}}$ is therefore a Riesz basis of $V_0$.

Conversely, if $\{\theta(t-n)\}_{n \in \mathbb{Z}}$ is a Riesz basis then (7.14) is valid for any $a[n] \in l^2(\mathbb{Z})$. If either the lower bound or the upper bound of (7.10) is not satisfied for almost all $\omega \in [-\pi, \pi]$ then one can construct a non-zero $2\pi$ periodic function $\hat a(\omega)$ whose support corresponds to frequencies where (7.10) is not verified. We then derive from (7.13) that (7.14) is not valid for $a[n]$, which contradicts the Riesz basis hypothesis.

Example 7.1 Piecewise constant approximations A simple multiresolution approximation is composed of piecewise constant functions.
The space $V_j$ is the set of all $g \in L^2(\mathbb{R})$ such that $g(t)$ is constant for $t \in [n2^j, (n+1)2^j)$ and $n \in \mathbb{Z}$. The approximation at a resolution $2^{-j}$ of $f$ is the closest piecewise constant function on intervals of size $2^j$. The resolution cell can be chosen to be the box window $\theta = \mathbf{1}_{[0,1)}$. Clearly $V_j \subset V_{j-1}$ since functions constant on intervals of size $2^j$ are also constant on intervals of size $2^{j-1}$. The verification of the other multiresolution properties is left to the reader. It is often desirable to construct approximations that are smooth functions, in which case piecewise constant functions are not appropriate.

Example 7.2 Shannon approximations Frequency band-limited functions also yield multiresolution approximations. The space $V_j$ is defined as the set of functions whose Fourier transform has a support included in $[-2^{-j}\pi, 2^{-j}\pi]$. Proposition 3.2 provides an orthonormal basis $\{\theta(t-n)\}_{n \in \mathbb{Z}}$ of $V_0$ defined by
\[ \theta(t) = \frac{\sin \pi t}{\pi t}. \tag{7.15} \]
All other properties of multiresolution approximation are easily verified.

The approximation at the resolution $2^{-j}$ of $f \in L^2(\mathbb{R})$ is the function $P_{V_j} f \in V_j$ that minimizes $\|P_{V_j} f - f\|$. It is proved in (3.13) that its Fourier transform is obtained with a frequency filtering:
\[ \widehat{P_{V_j} f}(\omega) = \hat f(\omega)\, \mathbf{1}_{[-2^{-j}\pi,\, 2^{-j}\pi]}(\omega). \]
This Fourier transform is generally discontinuous at $\pm 2^{-j}\pi$, in which case $|P_{V_j} f(t)|$ decays like $|t|^{-1}$, for large $|t|$, even though $f$ might have a compact support.

Example 7.3 Spline approximations Polynomial spline approximations construct smooth approximations with fast asymptotic decay.
The space $V_j$ of splines of degree $m \ge 0$ is the set of functions that are $m-1$ times continuously differentiable and equal to a polynomial of degree $m$ on any interval $[n2^j, (n+1)2^j]$, for $n \in \mathbb{Z}$. When $m = 0$, it is a piecewise constant multiresolution approximation. When $m = 1$, functions in $V_j$ are piecewise linear and continuous.

A Riesz basis of polynomial splines is constructed with box splines. A box spline $\theta$ of degree $m$ is computed by convolving the box window $\mathbf{1}_{[0,1]}$ with itself $m+1$ times and centering at 0 or 1/2. Its Fourier transform is
\[ \hat\theta(\omega) = \left( \frac{\sin(\omega/2)}{\omega/2} \right)^{m+1} \exp\left( \frac{-i \varepsilon \omega}{2} \right). \tag{7.16} \]
If $m$ is even then $\varepsilon = 1$ and $\theta$ has a support centered at $t = 1/2$. If $m$ is odd then $\varepsilon = 0$ and $\theta(t)$ is symmetric about $t = 0$. Figure 7.1 displays a cubic box spline $m = 3$ and its Fourier transform. For all $m \ge 0$, one can prove that $\{\theta(t-n)\}_{n \in \mathbb{Z}}$ is a Riesz basis of $V_0$ by verifying the condition (7.10). This is done with a closed form expression for the series (7.24).

Figure 7.1: Cubic box spline $\theta(t)$ and its Fourier transform $\hat\theta(\omega)$.

7.1.2 Scaling Function

The approximation of $f$ at the resolution $2^{-j}$ is defined as the orthogonal projection $P_{V_j} f$ on $V_j$. To compute this projection, we must find an orthonormal basis of $V_j$. The following theorem orthogonalizes the Riesz basis $\{\theta(t-n)\}_{n \in \mathbb{Z}}$ and constructs an orthogonal basis of each space $V_j$ by dilating and translating a single function $\phi$ called a scaling function. To avoid confusing the resolution $2^{-j}$ and the scale $2^j$, in the rest of the chapter the notion of resolution is dropped and $P_{V_j} f$ is called an approximation at the scale $2^j$.

Theorem 7.1 Let $\{V_j\}_{j \in \mathbb{Z}}$ be a multiresolution approximation and
$\phi$ be the scaling function whose Fourier transform is
\[ \hat\phi(\omega) = \frac{\hat\theta(\omega)}{\left( \sum_{k=-\infty}^{+\infty} |\hat\theta(\omega + 2k\pi)|^2 \right)^{1/2}}. \tag{7.17} \]
Let us denote
\[ \phi_{j,n}(t) = \frac{1}{\sqrt{2^j}}\, \phi\!\left( \frac{t - 2^j n}{2^j} \right). \]
The family $\{\phi_{j,n}\}_{n \in \mathbb{Z}}$ is an orthonormal basis of $V_j$ for all $j \in \mathbb{Z}$.

Proof$^1$. To construct an orthonormal basis, we look for a function $\phi \in V_0$. It can thus be expanded in the basis $\{\theta(t-n)\}_{n \in \mathbb{Z}}$:
\[ \phi(t) = \sum_{n=-\infty}^{+\infty} a[n]\, \theta(t-n), \]
which implies that
\[ \hat\phi(\omega) = \hat a(\omega)\, \hat\theta(\omega), \]
where $\hat a$ is a $2\pi$ periodic Fourier series of finite energy. To compute $\hat a$ we express the orthogonality of $\{\phi(t-n)\}_{n \in \mathbb{Z}}$ in the Fourier domain. Let $\bar\phi(t) = \phi^*(-t)$. For any $(n,p) \in \mathbb{Z}^2$,
\[ \langle \phi(t-n), \phi(t-p) \rangle = \int_{-\infty}^{+\infty} \phi(t-n)\, \phi^*(t-p)\, dt = \phi \star \bar\phi(p-n). \tag{7.18} \]
Hence $\{\phi(t-n)\}_{n \in \mathbb{Z}}$ is orthonormal if and only if $\phi \star \bar\phi(n) = \delta[n]$. Computing the Fourier transform of this equality yields
\[ \sum_{k=-\infty}^{+\infty} |\hat\phi(\omega + 2k\pi)|^2 = 1. \tag{7.19} \]
Indeed, the Fourier transform of $\phi \star \bar\phi(t)$ is $|\hat\phi(\omega)|^2$, and we proved in (3.3) that sampling a function periodizes its Fourier transform. The property (7.19) is verified if we choose
\[ \hat a(\omega) = \left( \sum_{k=-\infty}^{+\infty} |\hat\theta(\omega + 2k\pi)|^2 \right)^{-1/2}. \]
Proposition 7.1 proves that the denominator has a strictly positive lower bound, so $\hat a$ is a $2\pi$ periodic function of finite energy.

Approximation The orthogonal projection of $f$ over $V_j$ is obtained with an expansion in the scaling orthogonal basis
\[ P_{V_j} f = \sum_{n=-\infty}^{+\infty} \langle f, \phi_{j,n} \rangle\, \phi_{j,n}. \tag{7.20} \]
The inner products
\[ a_j[n] = \langle f, \phi_{j,n} \rangle \tag{7.21} \]
provide a discrete approximation at the scale $2^j$. We can rewrite them as a convolution product:
\[ a_j[n] = \int_{-\infty}^{+\infty} f(t)\, \frac{1}{\sqrt{2^j}}\, \phi^*\!\left( \frac{t - 2^j n}{2^j} \right) dt = f \star \bar\phi_j(2^j n) \tag{7.22} \]
with $\bar\phi_j(t) = \sqrt{2^{-j}}\, \phi^*(-2^{-j} t)$. The energy of the Fourier transform $\hat\phi$ is typically concentrated in $[-\pi, \pi]$, as illustrated by Figure 7.2. As a consequence, the Fourier transform $\sqrt{2^j}\, \hat\phi^*(2^j \omega)$ of $\bar\phi_j(t)$ is mostly non-negligible in $[-2^{-j}\pi, 2^{-j}\pi]$. The discrete approximation $a_j[n]$ is therefore a low-pass filtering of $f$ sampled at intervals $2^j$. Figure 7.3 gives a discrete multiresolution approximation at scales $2^{-9} \le 2^j \le 2^{-4}$.
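To make (7.21) concrete, here is a minimal Python sketch that evaluates $a_j[n] = \langle f, \phi_{j,n} \rangle$ for the simplest scaling function, the box window $\phi = \mathbf{1}_{[0,1)}$ of the piecewise constant case. The choice of $\phi$, the test signal $f(t) = t$ on $[0,1)$, the midpoint quadrature and the function names are assumptions made for illustration only.

```python
import math

def haar_approximation(f, j, n_values, steps=1000):
    """Approximate a_j[n] = <f, phi_{j,n}> for the box scaling function phi = 1_[0,1)."""
    scale = 2.0 ** j
    coeffs = []
    for n in n_values:
        left = scale * n
        h = scale / steps
        # Midpoint rule for the integral of f over [2^j n, 2^j (n + 1)).
        integral = sum(f(left + (k + 0.5) * h) for k in range(steps)) * h
        # phi_{j,n} equals 2^{-j/2} on that interval and 0 elsewhere.
        coeffs.append(integral / math.sqrt(scale))
    return coeffs

def f(t):
    """Illustrative test signal: a ramp supported on [0, 1)."""
    return t if 0.0 <= t < 1.0 else 0.0

a = haar_approximation(f, j=-2, n_values=range(4))
print(a)
```

For this ramp and $j = -2$ the exact coefficients are $(2n+1)/16$. Dividing $a_j[n]$ by $\sqrt{2^j}$ gives the local average of $f$ over the corresponding interval, which is the "low-pass filtering followed by sampling" interpretation given above.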
Figure 7.2: Cubic spline scaling function $\phi(t)$ and its Fourier transform $\hat\phi(\omega)$ computed with (7.23).

Figure 7.3: Discrete multiresolution approximations $a_j[n]$ of a signal $f(t)$ at scales $2^j$ from $2^{-4}$ to $2^{-9}$, computed with cubic splines.

Example 7.4 For piecewise constant approximations and Shannon multiresolution approximations we have constructed Riesz bases $\{\theta(t-n)\}_{n \in \mathbb{Z}}$ which are orthonormal bases, hence $\phi = \theta$.

Example 7.5 Spline multiresolution approximations admit a Riesz basis constructed with a box spline $\theta$ of degree $m$, whose Fourier transform is given by (7.16). Inserting this expression in (7.17) yields
\[ \hat\phi(\omega) = \frac{\exp(-i \varepsilon \omega / 2)}{\omega^{m+1} \sqrt{S_{2m+2}(\omega)}} \tag{7.23} \]
with
\[ S_n(\omega) = \sum_{k=-\infty}^{+\infty} \frac{1}{(\omega + 2k\pi)^n} \tag{7.24} \]
and $\varepsilon = 1$ if $m$ is even or $\varepsilon = 0$ if $m$ is odd. A closed form expression of $S_{2m+2}(\omega)$ is obtained by computing the derivative of order $2m$ of the identity
\[ S_2(2\omega) = \sum_{k=-\infty}^{+\infty} \frac{1}{(2\omega + 2k\pi)^2} = \frac{1}{4 \sin^2 \omega}. \]
For linear splines $m = 1$ and
\[ S_4(2\omega) = \frac{1 + 2\cos^2 \omega}{48 \sin^4 \omega} \tag{7.25} \]
which yields
\[ \hat\phi(\omega) = \frac{4 \sqrt{3}\, \sin^2(\omega/2)}{\omega^2 \sqrt{1 + 2\cos^2(\omega/2)}}. \tag{7.26} \]
The cubic spline scaling function corresponds to $m = 3$ and $\hat\phi(\omega)$ is calculated with (7.23) by inserting
\[ S_8(2\omega) = \frac{5 + 30\cos^2 \omega + 30 \sin^2 \omega \cos^2 \omega}{105 \cdot 2^8 \sin^8 \omega} + \frac{70\cos^4 \omega + 2 \sin^4 \omega \cos^2 \omega + \frac{2}{3} \sin^6 \omega}{105 \cdot 2^8 \sin^8 \omega}. \tag{7.27} \]
This cubic spline scaling function $\phi$ and its Fourier transform are displayed in Figure 7.2. It has an infinite support but decays exponentially.

7.1.3 Conjugate Mirror Filters

A multiresolution approximation is entirely characterized by the scaling
function $\phi$ that generates an orthogonal basis of each space $V_j$. We study the properties of $\phi$ which guarantee that the spaces $V_j$ satisfy all conditions of a multiresolution approximation. It is proved that any scaling function is specified by a discrete filter called a conjugate mirror filter.

Scaling Equation The multiresolution causality property (7.2) imposes that $V_j \subset V_{j-1}$. In particular $2^{-1/2} \phi(t/2) \in V_1 \subset V_0$. Since $\{\phi(t-n)\}_{n \in \mathbb{Z}}$ is an orthonormal basis of $V_0$, we can decompose
\[ \frac{1}{\sqrt{2}}\, \phi\!\left( \frac{t}{2} \right) = \sum_{n=-\infty}^{+\infty} h[n]\, \phi(t-n) \tag{7.28} \]
with
\[ h[n] = \left\langle \frac{1}{\sqrt{2}}\, \phi\!\left( \frac{t}{2} \right), \phi(t-n) \right\rangle. \tag{7.29} \]
This scaling equation relates a dilation of $\phi$ by 2 to its integer translations. The sequence $h[n]$ will be interpreted as a discrete filter.

The Fourier transform of both sides of (7.28) yields
\[ \hat\phi(2\omega) = \frac{1}{\sqrt{2}}\, \hat h(\omega)\, \hat\phi(\omega) \tag{7.30} \]
for $\hat h(\omega) = \sum_{n=-\infty}^{+\infty} h[n]\, e^{-in\omega}$. It is thus tempting to express $\hat\phi(\omega)$ directly as a product of dilations of $\hat h(\omega)$. For any $p \ge 0$, (7.30) implies
\[ \hat\phi(2^{-p+1}\omega) = \frac{1}{\sqrt{2}}\, \hat h(2^{-p}\omega)\, \hat\phi(2^{-p}\omega). \tag{7.31} \]
By substitution, we obtain
\[ \hat\phi(\omega) = \left( \prod_{p=1}^{P} \frac{\hat h(2^{-p}\omega)}{\sqrt{2}} \right) \hat\phi(2^{-P}\omega). \tag{7.32} \]
If $\hat\phi(\omega)$ is continuous at $\omega = 0$ then $\lim_{P \to +\infty} \hat\phi(2^{-P}\omega) = \hat\phi(0)$ so
\[ \hat\phi(\omega) = \prod_{p=1}^{+\infty} \frac{\hat h(2^{-p}\omega)}{\sqrt{2}}\; \hat\phi(0). \tag{7.33} \]
The following theorem [254, 47] gives necessary and then sufficient conditions on $\hat h(\omega)$ to guarantee that this infinite product is the Fourier transform of a scaling function.

Theorem 7.2 (Mallat, Meyer) Let $\phi \in L^2(\mathbb{R})$ be an integrable scaling function. The Fourier series of $h[n] = \langle 2^{-1/2} \phi(t/2), \phi(t-n) \rangle$ satisfies
\[ \forall \omega \in \mathbb{R}, \quad |\hat h(\omega)|^2 + |\hat h(\omega + \pi)|^2 = 2 \tag{7.34} \]
and
\[ \hat h(0) = \sqrt{2}. \tag{7.35} \]
Conversely, if $\hat h(\omega)$ is $2\pi$ periodic and continuously differentiable in a neighborhood of $\omega = 0$, if it satisfies (7.34) and (7.35) and if
\[ \inf_{\omega \in [-\pi/2,\, \pi/2]} |\hat h(\omega)| > 0 \tag{7.36} \]
then
\[ \hat\phi(\omega) = \prod_{p=1}^{+\infty} \frac{\hat h(2^{-p}\omega)}{\sqrt{2}} \tag{7.37} \]
is the Fourier transform of a scaling function $\phi \in L^2(\mathbb{R})$.
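Proof. This theorem is a central result whose proof is long and technical. It is divided in several parts.

Proof$^1$ of the necessary condition (7.34). The necessary condition is proved to be a consequence of the fact that $\{\phi(t-n)\}_{n \in \mathbb{Z}}$ is orthonormal. In the Fourier domain, (7.19) gives an equivalent condition:
\[ \forall \omega \in \mathbb{R}, \quad \sum_{k=-\infty}^{+\infty} |\hat\phi(\omega + 2k\pi)|^2 = 1. \tag{7.38} \]
Inserting $\hat\phi(\omega) = 2^{-1/2}\, \hat h(\omega/2)\, \hat\phi(\omega/2)$ yields
\[ \sum_{k=-\infty}^{+\infty} \left| \hat h\!\left( \frac{\omega}{2} + k\pi \right) \right|^2 \left| \hat\phi\!\left( \frac{\omega}{2} + k\pi \right) \right|^2 = 2. \]
Since $\hat h(\omega)$ is $2\pi$ periodic, separating the even and odd integer terms gives
\[ \left| \hat h\!\left( \frac{\omega}{2} \right) \right|^2 \sum_{p=-\infty}^{+\infty} \left| \hat\phi\!\left( \frac{\omega}{2} + 2p\pi \right) \right|^2 + \left| \hat h\!\left( \frac{\omega}{2} + \pi \right) \right|^2 \sum_{p=-\infty}^{+\infty} \left| \hat\phi\!\left( \frac{\omega}{2} + \pi + 2p\pi \right) \right|^2 = 2. \]
Inserting (7.38) for $\omega' = \omega/2$ and $\omega' = \omega/2 + \pi$ proves that
\[ |\hat h(\omega')|^2 + |\hat h(\omega' + \pi)|^2 = 2. \]

Proof$^2$ of the necessary condition (7.35). We prove that $\hat h(0) = \sqrt{2}$ by showing that $\hat\phi(0) \neq 0$. Indeed we know that $\hat\phi(0) = 2^{-1/2}\, \hat h(0)\, \hat\phi(0)$. More precisely, we verify that $|\hat\phi(0)| = 1$ is a consequence of the completeness property (7.5) of multiresolution approximations.

The orthogonal projection of $f \in L^2(\mathbb{R})$ on $V_j$ is
\[ P_{V_j} f = \sum_{n=-\infty}^{+\infty} \langle f, \phi_{j,n} \rangle\, \phi_{j,n}. \tag{7.39} \]
Property (7.5) expressed in the time and Fourier domains with the Plancherel formula implies that
\[ \lim_{j \to -\infty} \|f - P_{V_j} f\|^2 = \lim_{j \to -\infty} 2\pi\, \|\hat f - \widehat{P_{V_j} f}\|^2 = 0. \tag{7.40} \]
To compute the Fourier transform $\widehat{P_{V_j} f}(\omega)$, we denote $\phi_j(t) = \sqrt{2^{-j}}\, \phi(2^{-j} t)$. Inserting the convolution expression (7.22) in (7.39) yields
\[ P_{V_j} f(t) = \sum_{n=-\infty}^{+\infty} f \star \bar\phi_j(2^j n)\, \phi_j(t - 2^j n) = \phi_j \star \sum_{n=-\infty}^{+\infty} f \star \bar\phi_j(2^j n)\, \delta(t - 2^j n). \]
The Fourier transform of $f \star \bar\phi_j(t)$ is $\sqrt{2^j}\, \hat f(\omega)\, \hat\phi^*(2^j \omega)$. A uniform sampling has a periodized Fourier transform calculated in (3.3), and hence
\[ \widehat{P_{V_j} f}(\omega) = \hat\phi(2^j \omega) \sum_{k=-\infty}^{+\infty} \hat f\!\left( \omega - \frac{2k\pi}{2^j} \right) \hat\phi^*\!\left( 2^j \left[ \omega - \frac{2k\pi}{2^j} \right] \right). \tag{7.41} \]
Let us choose $\hat f = \mathbf{1}_{[-\pi, \pi]}$. For $j < 0$ and $\omega \in [-\pi, \pi]$, (7.41) gives $\widehat{P_{V_j} f}(\omega) = |\hat\phi(2^j \omega)|^2$. The mean-square convergence (7.40) implies that
\[ \lim_{j \to -\infty} \int_{-\pi}^{\pi} \left| 1 - |\hat\phi(2^j \omega)|^2 \right|^2 d\omega = 0. \]
Since $\phi$ is integrable, $\hat\phi(\omega)$ is continuous and hence $\lim_{j \to -\infty} |\hat\phi(2^j \omega)| = |\hat\phi(0)| = 1$.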
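We now prove that the function $\phi$ whose Fourier transform is given by (7.37) is a scaling function. This is divided in two intermediate results.

Proof$^3$ that $\{\phi(t-n)\}_{n \in \mathbb{Z}}$ is orthonormal. Observe first that the infinite product (7.37) converges and that $|\hat\phi(\omega)| \le 1$ because (7.34) implies that $|\hat h(\omega)| \le \sqrt{2}$. The Parseval formula gives
\[ \langle \phi(t), \phi(t-n) \rangle = \int_{-\infty}^{+\infty} \phi(t)\, \phi^*(t-n)\, dt = \frac{1}{2\pi} \int_{-\infty}^{+\infty} |\hat\phi(\omega)|^2\, e^{in\omega}\, d\omega. \]
Verifying that $\{\phi(t-n)\}_{n \in \mathbb{Z}}$ is orthonormal is thus equivalent to showing that
\[ \int_{-\infty}^{+\infty} |\hat\phi(\omega)|^2\, e^{in\omega}\, d\omega = 2\pi\, \delta[n]. \]
This result is obtained by considering the functions
\[ \hat\phi_k(\omega) = \prod_{p=1}^{k} \frac{\hat h(2^{-p}\omega)}{\sqrt{2}}\; \mathbf{1}_{[-2^k\pi,\, 2^k\pi]}(\omega) \]
and computing the limit, as $k$ increases to $+\infty$, of the integrals
\[ I_k[n] = \int_{-\infty}^{+\infty} |\hat\phi_k(\omega)|^2\, e^{in\omega}\, d\omega = \int_{-2^k\pi}^{2^k\pi} \prod_{p=1}^{k} \frac{|\hat h(2^{-p}\omega)|^2}{2}\, e^{in\omega}\, d\omega. \]
First, let us show that $I_k[n] = 2\pi\, \delta[n]$ for all $k \ge 1$. To do this, we divide $I_k[n]$ into two integrals:
\[ I_k[n] = \int_{-2^k\pi}^{0} \prod_{p=1}^{k} \frac{|\hat h(2^{-p}\omega)|^2}{2}\, e^{in\omega}\, d\omega + \int_{0}^{2^k\pi} \prod_{p=1}^{k} \frac{|\hat h(2^{-p}\omega)|^2}{2}\, e^{in\omega}\, d\omega. \]
Let us make the change of variable $\omega' = \omega + 2^k\pi$ in the first integral. Since $\hat h(\omega)$ is $2\pi$ periodic, when $p < k$ then $|\hat h(2^{-p}[\omega' - 2^k\pi])|^2 = |\hat h(2^{-p}\omega')|^2$. When $k = p$ the hypothesis (7.34) implies that
\[ |\hat h(2^{-k}[\omega' - 2^k\pi])|^2 + |\hat h(2^{-k}\omega')|^2 = 2. \]
For $k > 1$, the two integrals of $I_k[n]$ become
\[ I_k[n] = \int_{0}^{2^k\pi} \prod_{p=1}^{k-1} \frac{|\hat h(2^{-p}\omega)|^2}{2}\, e^{in\omega}\, d\omega. \tag{7.42} \]
Since $\prod_{p=1}^{k-1} |\hat h(2^{-p}\omega)|^2\, e^{in\omega}$ is $2^k\pi$ periodic we obtain $I_k[n] = I_{k-1}[n]$, and by induction $I_k[n] = I_1[n]$. Writing (7.42) for $k = 1$ gives
\[ I_1[n] = \int_0^{2\pi} e^{in\omega}\, d\omega = 2\pi\, \delta[n], \]
which verifies that $I_k[n] = 2\pi\, \delta[n]$, for all $k \ge 1$.

We shall now prove that $\hat\phi \in L^2(\mathbb{R})$. For all $\omega \in \mathbb{R}$
\[ \lim_{k \to \infty} |\hat\phi_k(\omega)|^2 = \prod_{p=1}^{\infty} \frac{|\hat h(2^{-p}\omega)|^2}{2} = |\hat\phi(\omega)|^2. \]
The Fatou Lemma A.1 on positive functions proves that
\[ \int_{-\infty}^{+\infty} |\hat\phi(\omega)|^2\, d\omega \le \lim_{k \to \infty} \int_{-\infty}^{+\infty} |\hat\phi_k(\omega)|^2\, d\omega = 2\pi \tag{7.43} \]
because $I_k[0] = 2\pi$ for all $k \ge 1$. Since
\[ |\hat\phi(\omega)|^2\, e^{in\omega} = \lim_{k \to \infty} |\hat\phi_k(\omega)|^2\, e^{in\omega} \]
we finally verify that
\[ \int_{-\infty}^{+\infty} |\hat\phi(\omega)|^2\, e^{in\omega}\, d\omega = \lim_{k \to \infty} \int_{-\infty}^{+\infty} |\hat\phi_k(\omega)|^2\, e^{in\omega}\, d\omega = 2\pi\, \delta[n] \tag{7.44} \]
by applying the dominated convergence Theorem A.1. This requires verifying the upper-bound condition (A.1). This is done in our case by proving the existence of a constant $C$ such that
\[ \left| |\hat\phi_k(\omega)|^2\, e^{in\omega} \right| = |\hat\phi_k(\omega)|^2 \le C\, |\hat\phi(\omega)|^2. \tag{7.45} \]
Indeed, we showed in (7.43) that $|\hat\phi(\omega)|^2$ is an integrable function.

The existence of $C > 0$ satisfying (7.45) is trivial for $|\omega| > 2^k\pi$ since $\hat\phi_k(\omega) = 0$. For $|\omega| \le 2^k\pi$, since $\hat\phi(\omega) = 2^{-1/2}\, \hat h(\omega/2)\, \hat\phi(\omega/2)$, it follows that
\[ |\hat\phi(\omega)|^2 = |\hat\phi_k(\omega)|^2\, |\hat\phi(2^{-k}\omega)|^2. \]
To prove (7.45) for $|\omega| \le 2^k\pi$, it is therefore sufficient to show that $|\hat\phi(\omega)|^2 \ge 1/C$ for $\omega \in [-\pi, \pi]$.

Let us first study the neighborhood of $\omega = 0$. Since $\hat h(\omega)$ is continuously differentiable in this neighborhood and since $|\hat h(\omega)|^2 \le 2 = |\hat h(0)|^2$, the functions $|\hat h(\omega)|^2$ and $\log_e |\hat h(\omega)|^2$ have derivatives that vanish at $\omega = 0$. It follows that there exists $\epsilon > 0$ such that
\[ \forall |\omega| \le \epsilon, \quad 0 \ge \log_e \left( \frac{|\hat h(\omega)|^2}{2} \right) \ge -|\omega|. \]
Hence, for $|\omega| \le \epsilon$
\[ |\hat\phi(\omega)|^2 = \exp\left[ \sum_{p=1}^{+\infty} \log_e \left( \frac{|\hat h(2^{-p}\omega)|^2}{2} \right) \right] \ge e^{-|\omega|} \ge e^{-\epsilon}. \tag{7.46} \]
Now let us analyze the domain $|\omega| > \epsilon$. To do this we take an integer $l$ such that $2^{-l}\pi < \epsilon$. Condition (7.36) proves that $K = \inf_{\omega \in [-\pi/2, \pi/2]} |\hat h(\omega)| > 0$ so if $|\omega| \le \pi$
\[ |\hat\phi(\omega)|^2 = \prod_{p=1}^{l} \frac{|\hat h(2^{-p}\omega)|^2}{2}\; \left| \hat\phi(2^{-l}\omega) \right|^2 \ge \frac{K^{2l}}{2^l}\, e^{-\epsilon} = \frac{1}{C}. \]
This last result finishes the proof of inequality (7.45). Applying the dominated convergence Theorem A.1 proves (7.44) and hence that $\{\phi(t-n)\}_{n \in \mathbb{Z}}$ is orthonormal. A simple change of variable shows that $\{\phi_{j,n}\}_{n \in \mathbb{Z}}$ is orthonormal for all $j \in \mathbb{Z}$.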
Proof$^3$ that $\{V_j\}_{j \in \mathbb{Z}}$ is a multiresolution. To verify that $\phi$ is a scaling function, we must show that the spaces $V_j$ generated by $\{\phi_{j,n}\}_{n \in \mathbb{Z}}$ define a multiresolution approximation. The multiresolution properties (7.1) and (7.3) are clearly true. The causality $V_{j+1} \subset V_j$ is verified by showing that for any $p \in \mathbb{Z}$,
\[ \phi_{j+1,p} = \sum_{n=-\infty}^{+\infty} h[n - 2p]\, \phi_{j,n}. \]
This equality is proved later in (7.112). Since all vectors of a basis of $V_{j+1}$ can be decomposed in a basis of $V_j$ it follows that $V_{j+1} \subset V_j$.

To prove the multiresolution property (7.4) we must show that any $f \in L^2(\mathbb{R})$ satisfies
\[ \lim_{j \to +\infty} \|P_{V_j} f\| = 0. \tag{7.47} \]
Since $\{\phi_{j,n}\}_{n \in \mathbb{Z}}$ is an orthonormal basis of $V_j$
\[ \|P_{V_j} f\|^2 = \sum_{n=-\infty}^{+\infty} |\langle f, \phi_{j,n} \rangle|^2. \]
Suppose first that $f$ is bounded by $A$ and has a compact support included in $[-2^J, 2^J]$. The constants $A$ and $J$ may be arbitrarily large. It follows that
\[ \sum_{n=-\infty}^{+\infty} |\langle f, \phi_{j,n} \rangle|^2 \le 2^{-j} \sum_{n=-\infty}^{+\infty} \left[ \int_{-2^J}^{2^J} |f(t)|\, |\phi(2^{-j}t - n)|\, dt \right]^2 \le 2^{-j} A^2 \sum_{n=-\infty}^{+\infty} \left[ \int_{-2^J}^{2^J} |\phi(2^{-j}t - n)|\, dt \right]^2. \]
Applying the Cauchy-Schwarz inequality to $1 \times |\phi(2^{-j}t - n)|$ yields
\[ \sum_{n=-\infty}^{+\infty} |\langle f, \phi_{j,n} \rangle|^2 \le A^2\, 2^{J+1} \sum_{n=-\infty}^{+\infty} \int_{-2^J}^{2^J} |\phi(2^{-j}t - n)|^2\, 2^{-j}\, dt \le A^2\, 2^{J+1} \int_{S_j} |\phi(t)|^2\, dt = A^2\, 2^{J+1} \int_{-\infty}^{+\infty} |\phi(t)|^2\, \mathbf{1}_{S_j}(t)\, dt \]
with $S_j = \bigcup_{n \in \mathbb{Z}} [n - 2^{J-j}, n + 2^{J-j}]$ for $j > J$. For $t \notin \mathbb{Z}$ we obviously have $\mathbf{1}_{S_j}(t) \to 0$ for $j \to +\infty$. The dominated convergence Theorem A.1 applied to $|\phi(t)|^2\, \mathbf{1}_{S_j}(t)$ proves that the integral converges to 0 and hence
\[ \lim_{j \to +\infty} \sum_{n=-\infty}^{+\infty} |\langle f, \phi_{j,n} \rangle|^2 = 0. \]
Property (7.47) is extended to any $f \in L^2(\mathbb{R})$ by using the density in $L^2(\mathbb{R})$ of bounded functions with a compact support, and Proposition A.3.

To prove the last multiresolution property (7.5) we must show that for any $f \in L^2(\mathbb{R})$,
\[ \lim_{j \to -\infty} \|f - P_{V_j} f\|^2 = \lim_{j \to -\infty} \left( \|f\|^2 - \|P_{V_j} f\|^2 \right) = 0. \tag{7.48} \]
We consider functions $f$ whose Fourier transform $\hat f$ has a compact support included in $[-2^J\pi, 2^J\pi]$ for $J$ large enough. We proved in (7.41) that the Fourier transform of $P_{V_j} f$ is
\[ \widehat{P_{V_j} f}(\omega) = \hat\phi(2^j \omega) \sum_{k=-\infty}^{+\infty} \hat f\!\left( \omega - 2^{-j} 2k\pi \right) \hat\phi^*\!\left( 2^j \left[ \omega - 2^{-j} 2k\pi \right] \right). \]
If $j < -J$, then the supports of $\hat f(\omega - 2^{-j} 2k\pi)$ are disjoint for different $k$ so
\[ \|P_{V_j} f\|^2 = \frac{1}{2\pi} \int_{-\infty}^{+\infty} |\hat f(\omega)|^2\, |\hat\phi(2^j \omega)|^4\, d\omega + \frac{1}{2\pi} \int_{-\infty}^{+\infty} \sum_{\substack{k=-\infty \\ k \neq 0}}^{+\infty} \left| \hat f\!\left( \omega - 2^{-j} 2k\pi \right) \right|^2 |\hat\phi(2^j \omega)|^2 \left| \hat\phi\!\left( 2^j \left[ \omega - 2^{-j} 2k\pi \right] \right) \right|^2 d\omega. \tag{7.49} \]
We have already observed that $|\hat\phi(\omega)| \le 1$ and (7.46) proves that for $\omega$ sufficiently small $|\hat\phi(\omega)| \ge e^{-|\omega|}$ so
\[ \lim_{\omega \to 0} |\hat\phi(\omega)| = 1. \]
Since $|\hat f(\omega)|^2\, |\hat\phi(2^j \omega)|^4 \le |\hat f(\omega)|^2$ and $\lim_{j \to -\infty} |\hat\phi(2^j \omega)|^4\, |\hat f(\omega)|^2 = |\hat f(\omega)|^2$ one can apply the dominated convergence Theorem A.1, to prove that
\[ \lim_{j \to -\infty} \int_{-\infty}^{+\infty} |\hat f(\omega)|^2\, |\hat\phi(2^j \omega)|^4\, d\omega = \int_{-\infty}^{+\infty} |\hat f(\omega)|^2\, d\omega = 2\pi\, \|f\|^2. \tag{7.50} \]
The operator $P_{V_j}$ is an orthogonal projector, so $\|P_{V_j} f\| \le \|f\|$. With (7.49) and (7.50), this implies that $\lim_{j \to -\infty} (\|f\|^2 - \|P_{V_j} f\|^2) = 0$, and hence verifies (7.48). This property is extended to any $f \in L^2(\mathbb{R})$ by using the density in $L^2(\mathbb{R})$ of functions whose Fourier transforms have a compact support and the result of Proposition A.3.

Discrete filters whose transfer functions satisfy (7.34) are called conjugate mirror filters. As we shall see in Section 7.3, they play an important role in discrete signal processing; they make it possible to decompose discrete signals in separate frequency bands with filter banks. One difficulty of the proof is showing that the infinite cascade of convolutions that is represented in the Fourier domain by the product (7.37)
does converge to a decent function in $L^2(\mathbb{R})$. The sufficient condition (7.36) is not necessary to construct a scaling function, but it is always satisfied in practical designs of conjugate mirror filters. It cannot just be removed as shown by the example $\hat h(\omega) = \sqrt{2} \cos(3\omega/2)$, which satisfies all other conditions. In this case, a simple calculation shows that $\phi = \frac{1}{3}\, \mathbf{1}_{[-3/2,\, 3/2]}$. Clearly $\{\phi(t-n)\}_{n \in \mathbb{Z}}$ is not orthogonal so $\phi$ is not a scaling function. The condition (7.36) may however be replaced by a weaker but more technical necessary and sufficient condition proved by Cohen [17, 128].

Example 7.6 For a Shannon multiresolution approximation, $\hat\phi = \mathbf{1}_{[-\pi, \pi]}$. We thus derive from (7.37) that
\[ \forall \omega \in [-\pi, \pi], \quad \hat h(\omega) = \sqrt{2}\; \mathbf{1}_{[-\pi/2,\, \pi/2]}(\omega). \]

Example 7.7 For piecewise constant approximations, $\phi = \mathbf{1}_{[0,1]}$. Since $h[n] = \langle 2^{-1/2} \phi(t/2), \phi(t-n) \rangle$ it follows that
\[ h[n] = \begin{cases} 2^{-1/2} & \text{if } n = 0, 1 \\ 0 & \text{otherwise.} \end{cases} \tag{7.51} \]

Example 7.8 Polynomial splines of degree $m$ correspond to a conjugate mirror filter $\hat h(\omega)$ that is calculated from $\hat\phi(\omega)$ with (7.30):
\[ \hat h(\omega) = \sqrt{2}\, \frac{\hat\phi(2\omega)}{\hat\phi(\omega)}. \tag{7.52} \]
Inserting (7.23) yields
\[ \hat h(\omega) = \exp\left( \frac{-i \varepsilon \omega}{2} \right) \sqrt{ \frac{S_{2m+2}(\omega)}{2^{2m+1}\, S_{2m+2}(2\omega)} } \tag{7.53} \]
where $\varepsilon = 0$ if $m$ is odd and $\varepsilon = 1$ if $m$ is even. For linear splines $m = 1$ so (7.25) implies that
\[ \hat h(\omega) = \sqrt{2} \left[ \frac{1 + 2\cos^2(\omega/2)}{1 + 2\cos^2 \omega} \right]^{1/2} \cos^2\!\left( \frac{\omega}{2} \right). \tag{7.54} \]
For cubic splines, the conjugate mirror filter is calculated by inserting (7.27) in (7.53). Figure 7.4 gives the graph of $|\hat h(\omega)|^2$. The impulse responses $h[n]$ of these filters have an infinite support but an exponential decay. For $m$ odd, $h[n]$ is symmetric about $n = 0$. Table 7.1 gives the coefficients $h[n]$ above $10^{-4}$ for $m = 1, 3$.
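The linear spline case can be checked numerically. The Python sketch below evaluates the transfer function (7.54), tests the conjugate mirror condition (7.34) at a few frequencies, and recovers the first impulse response coefficients $h[n]$ as Fourier coefficients of $\hat h$, so that they can be compared with the $m = 1$ column of Table 7.1. The quadrature (a uniform rule with 4096 samples over one period), the test frequencies and the function names are illustrative assumptions.

```python
import math

def h_hat_linear_spline(omega):
    """Transfer function (7.54) of the linear spline conjugate mirror filter."""
    c_half = math.cos(omega / 2) ** 2
    c_full = math.cos(omega) ** 2
    return math.sqrt(2) * math.sqrt((1 + 2 * c_half) / (1 + 2 * c_full)) * c_half

def impulse_response(n, samples=4096):
    """h[n] = (1 / 2 pi) * integral over one period of h_hat(w) cos(n w) dw.

    The cosine form is valid here because h_hat is real and even.
    """
    step = 2 * math.pi / samples
    total = 0.0
    for k in range(samples):
        w = -math.pi + k * step
        total += h_hat_linear_spline(w) * math.cos(n * w)
    return total * step / (2 * math.pi)

# First coefficients h[0], h[1], h[2], h[3].
h = [impulse_response(n) for n in range(4)]

# Left-hand side of (7.34) at a few arbitrary frequencies; each value should be close to 2.
cmf = [h_hat_linear_spline(w) ** 2 + h_hat_linear_spline(w + math.pi) ** 2
       for w in (0.0, 0.4, 1.1, 2.5)]

print(h)
print(cmf)
```

The computed values agree with the tabulated coefficients $0.817645956$, $0.397296430$, $-0.069101020$ and $-0.051945337$ to within the quadrature and rounding errors.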
Figure 7.4: The solid line gives $|\hat h(\omega)|^2$ on $[-\pi, \pi]$, for a cubic spline multiresolution. The dotted line corresponds to $|\hat g(\omega)|^2$.

7.1.4 In Which Orthogonal Wavelets Finally Arrive

Orthonormal wavelets carry the details necessary to increase the resolution of a signal approximation. The approximations of $f$ at the scales $2^j$ and $2^{j-1}$ are respectively equal to their orthogonal projections on $V_j$ and $V_{j-1}$. We know that $V_j$ is included in $V_{j-1}$. Let $W_j$ be the orthogonal complement of $V_j$ in $V_{j-1}$:
\[ V_{j-1} = V_j \oplus W_j. \tag{7.55} \]
The orthogonal projection of $f$ on $V_{j-1}$ can be decomposed as the sum of orthogonal projections on $V_j$ and $W_j$:
\[ P_{V_{j-1}} f = P_{V_j} f + P_{W_j} f. \tag{7.56} \]

    n          h[n], m = 1       h[n], m = 3
    0           0.817645956       0.766130398
    1, -1       0.397296430       0.433923147
    2, -2      -0.069101020      -0.050201753
    3, -3      -0.051945337      -0.110036987
    4, -4       0.016974805       0.032080869
    5, -5       0.009990599       0.042068328
    6, -6      -0.003883261      -0.017176331
    7, -7      -0.002201945      -0.017982291
    8, -8       0.000923371       0.008685294
    9, -9       0.000511636       0.008201477
    10, -10    -0.000224296      -0.004353840
    11, -11    -0.000122686      -0.003882426
    12, -12                       0.002186714
    13, -13                       0.001882120
    14, -14                      -0.001103748
    15, -15                      -0.000927187
    16, -16                       0.000559952
    17, -17                       0.000462093
    18, -18                      -0.000285414
    19, -19                      -0.000232304
    20, -20                       0.000146098

Table 7.1: Conjugate mirror filters $h[n]$ corresponding to linear splines $m = 1$ and cubic splines $m = 3$. The coefficients below $10^{-4}$ are not given.

The complement $P_{W_j} f$ provides the "details" of $f$ that appear at the scale $2^{j-1}$ but which disappear at the coarser scale $2^j$. The following theorem [47, 254] proves that one can construct an orthonormal basis of $W_j$ by scaling and translating a wavelet $\psi$.

Theorem 7.3 (Mallat, Meyer) Let $\phi$ be a scaling function and $h$ the corresponding conjugate mirror filter. Let $\psi$ be the function whose Fourier transform is
\[ \hat\psi(\omega) = \frac{1}{\sqrt{2}}\, \hat g\!\left( \frac{\omega}{2} \right) \hat\phi\!\left( \frac{\omega}{2} \right) \tag{7.57} \]
with
\[ \hat g(\omega) = e^{-i\omega}\, \hat h^*(\omega + \pi). \tag{7.58} \]
Let us denote
\[ \psi_{j,n}(t) = \frac{1}{\sqrt{2^j}}\, \psi\!\left( \frac{t - 2^j n}{2^j} \right). \]
For any scale $2^j$, $\{\psi_{j,n}\}_{n \in \mathbb{Z}}$ is an orthonormal basis of $W_j$. For all scales, $\{\psi_{j,n}\}_{(j,n) \in \mathbb{Z}^2}$ is an orthonormal basis of $L^2(\mathbb{R})$.

Proof$^1$. Let us prove first that $\hat\psi$ can be written as the product (7.57).
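Necessarily $\psi(t/2) \in W_1 \subset V_0$. It can thus be decomposed in $\{\phi(t-n)\}_{n \in \mathbb{Z}}$ which is an orthogonal basis of $V_0$:
\[ \frac{1}{\sqrt{2}}\, \psi\!\left( \frac{t}{2} \right) = \sum_{n=-\infty}^{+\infty} g[n]\, \phi(t-n) \tag{7.59} \]
with
\[ g[n] = \frac{1}{\sqrt{2}} \left\langle \psi\!\left( \frac{t}{2} \right), \phi(t-n) \right\rangle. \tag{7.60} \]
The Fourier transform of (7.59) yields
\[ \hat\psi(2\omega) = \frac{1}{\sqrt{2}}\, \hat g(\omega)\, \hat\phi(\omega). \tag{7.61} \]
The following lemma gives necessary and sufficient conditions on $\hat g$ for designing an orthogonal wavelet.

Lemma 7.1 The family $\{\psi_{j,n}\}_{n \in \mathbb{Z}}$ is an orthonormal basis of $W_j$ if and only if
\[ |\hat g(\omega)|^2 + |\hat g(\omega + \pi)|^2 = 2 \tag{7.62} \]
and
\[ \hat g(\omega)\, \hat h^*(\omega) + \hat g(\omega + \pi)\, \hat h^*(\omega + \pi) = 0. \tag{7.63} \]

The lemma is proved for $j = 0$ from which it is easily extended to $j \neq 0$ with an appropriate scaling. As in (7.19) one can verify that $\{\psi(t-n)\}_{n \in \mathbb{Z}}$ is orthonormal if and only if
\[ \forall \omega \in \mathbb{R}, \quad I(\omega) = \sum_{k=-\infty}^{+\infty} |\hat\psi(\omega + 2k\pi)|^2 = 1. \tag{7.64} \]
Since $\hat\psi(\omega) = 2^{-1/2}\, \hat g(\omega/2)\, \hat\phi(\omega/2)$ and $\hat g(\omega)$ is $2\pi$ periodic,
\[ I(\omega) = \frac{1}{2} \sum_{k=-\infty}^{+\infty} \left| \hat g\!\left( \frac{\omega}{2} + k\pi \right) \right|^2 \left| \hat\phi\!\left( \frac{\omega}{2} + k\pi \right) \right|^2 = \frac{1}{2} \left| \hat g\!\left( \frac{\omega}{2} \right) \right|^2 \sum_{p=-\infty}^{+\infty} \left| \hat\phi\!\left( \frac{\omega}{2} + 2p\pi \right) \right|^2 + \frac{1}{2} \left| \hat g\!\left( \frac{\omega}{2} + \pi \right) \right|^2 \sum_{p=-\infty}^{+\infty} \left| \hat\phi\!\left( \frac{\omega}{2} + \pi + 2p\pi \right) \right|^2. \]
We know that $\sum_{p=-\infty}^{+\infty} |\hat\phi(\omega + 2p\pi)|^2 = 1$ so (7.64) is equivalent to (7.62).

The space $W_0$ is orthogonal to $V_0$ if and only if $\{\phi(t-n)\}_{n \in \mathbb{Z}}$ and $\{\psi(t-n)\}_{n \in \mathbb{Z}}$ are orthogonal families of vectors. This means that for any $n \in \mathbb{Z}$
\[ \langle \psi(t), \phi(t-n) \rangle = \psi \star \bar\phi(n) = 0. \]
The Fourier transform of $\psi \star \bar\phi(t)$ is $\hat\psi(\omega)\, \hat\phi^*(\omega)$. The sampled sequence $\psi \star \bar\phi(n)$ is zero if its Fourier series computed with (3.3) satisfies
\[ \forall \omega \in \mathbb{R}, \quad \sum_{k=-\infty}^{+\infty} \hat\psi(\omega + 2k\pi)\, \hat\phi^*(\omega + 2k\pi) = 0. \tag{7.65} \]
By inserting $\hat\psi(\omega) = 2^{-1/2}\, \hat g(\omega/2)\, \hat\phi(\omega/2)$ and $\hat\phi(\omega) = 2^{-1/2}\, \hat h(\omega/2)\, \hat\phi(\omega/2)$ in this equation, since $\sum_{k=-\infty}^{+\infty} |\hat\phi(\omega + 2k\pi)|^2 = 1$ we prove as before that (7.65) is equivalent to (7.63).

We must finally verify that $V_{-1} = V_0 \oplus W_0$. Knowing that $\{\sqrt{2}\, \phi(2t - n)\}_{n \in \mathbb{Z}}$ is an orthogonal basis of $V_{-1}$, it is equivalent to show that for any $a[n] \in l^2(\mathbb{Z})$ there exist $b[n] \in l^2(\mathbb{Z})$ and $c[n] \in l^2(\mathbb{Z})$ such that
\[ \sum_{n=-\infty}^{+\infty} a[n]\, \sqrt{2}\, \phi(2[t - 2^{-1} n]) = \sum_{n=-\infty}^{+\infty} b[n]\, \phi(t-n) + \sum_{n=-\infty}^{+\infty} c[n]\, \psi(t-n). \tag{7.66} \]
This is done by relating $\hat b(\omega)$ and $\hat c(\omega)$ to $\hat a(\omega)$. The Fourier transform of (7.66) yields
\[ \frac{1}{\sqrt{2}}\, \hat a\!\left( \frac{\omega}{2} \right) \hat\phi\!\left( \frac{\omega}{2} \right) = \hat b(\omega)\, \hat\phi(\omega) + \hat c(\omega)\, \hat\psi(\omega). \]
Inserting $\hat\psi(\omega) = 2^{-1/2}\, \hat g(\omega/2)\, \hat\phi(\omega/2)$ and $\hat\phi(\omega) = 2^{-1/2}\, \hat h(\omega/2)\, \hat\phi(\omega/2)$ in this equation shows that it is necessarily satisfied if
\[ \hat a\!\left( \frac{\omega}{2} \right) = \hat b(\omega)\, \hat h\!\left( \frac{\omega}{2} \right) + \hat c(\omega)\, \hat g\!\left( \frac{\omega}{2} \right). \tag{7.67} \]
Let us define
\[ \hat b(2\omega) = \frac{1}{2} \left[ \hat a(\omega)\, \hat h^*(\omega) + \hat a(\omega + \pi)\, \hat h^*(\omega + \pi) \right] \]
and
\[ \hat c(2\omega) = \frac{1}{2} \left[ \hat a(\omega)\, \hat g^*(\omega) + \hat a(\omega + \pi)\, \hat g^*(\omega + \pi) \right]. \]
When calculating the right-hand side of (7.67) we verify that it is equal to the left-hand side by inserting (7.62), (7.63) and using
\[ |\hat h(\omega)|^2 + |\hat h(\omega + \pi)|^2 = 2. \tag{7.68} \]
Since $\hat b(\omega)$ and $\hat c(\omega)$ are $2\pi$ periodic they are the Fourier series of two sequences $b[n]$ and $c[n]$ that satisfy (7.66). This finishes the proof of the lemma.

The formula (7.58)
\[ \hat g(\omega) = e^{-i\omega}\, \hat h^*(\omega + \pi) \]
satisfies (7.62) and (7.63) because of (7.68). We thus derive from Lemma 7.1 that $\{\psi_{j,n}\}_{n \in \mathbb{Z}}$ is an orthogonal basis of $W_j$.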
We complete the proof of the theorem by verifying that $\{\psi_{j,n}\}_{(j,n) \in \mathbb{Z}^2}$ is an orthogonal basis of $L^2(\mathbb{R})$. Observe first that the detail spaces $\{W_j\}_{j \in \mathbb{Z}}$ are orthogonal. Indeed $W_j$ is orthogonal to $V_j$ and $W_l \subset V_{l-1} \subset V_j$ for $j < l$. Hence $W_j$ and $W_l$ are orthogonal. We can also decompose
\[ L^2(\mathbb{R}) = \bigoplus_{j=-\infty}^{+\infty} W_j. \tag{7.69} \]
Indeed $V_{j-1} = W_j \oplus V_j$ and we verify by substitution that for any $J > L$
\[ V_L = \bigoplus_{j=L+1}^{J} W_j \oplus V_J. \tag{7.70} \]
Since $\{V_j\}_{j \in \mathbb{Z}}$ is a multiresolution approximation, $V_L$ and $V_J$ tend respectively to $L^2(\mathbb{R})$ and $\{0\}$ when $L$ and $J$ go respectively to $-\infty$ and $+\infty$, which implies (7.69). A union of orthonormal bases of all $W_j$ is therefore an orthonormal basis of $L^2(\mathbb{R})$.

The proof of the theorem shows that $\hat g$ is the Fourier series of
\[ g[n] = \left\langle \frac{1}{\sqrt{2}}\, \psi\!\left( \frac{t}{2} \right), \phi(t-n) \right\rangle \tag{7.71} \]
which are the decomposition coefficients of
\[ \frac{1}{\sqrt{2}}\, \psi\!\left( \frac{t}{2} \right) = \sum_{n=-\infty}^{+\infty} g[n]\, \phi(t-n). \tag{7.72} \]
Calculating the inverse Fourier transform of (7.58) yields
\[ g[n] = (-1)^{1-n}\, h[1-n]. \tag{7.73} \]
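The relation (7.73) and the two conditions of Lemma 7.1 are easy to test numerically. The Python sketch below builds $g$ from a real filter $h$ stored as a dictionary, and measures how far (7.62) and (7.63) are from being satisfied on a frequency grid. It is run on the two-tap filter (7.51); the dictionary representation, the grid and the function names are illustrative assumptions.

```python
import cmath
import math

def mirror_filter(h):
    """g[n] = (-1)^(1-n) h[1-n] for a real filter h given as a dict {n: h[n]}."""
    # With n = 1 - k, the sign (-1)^(1-n) equals (-1)^k.
    return {1 - k: (-1) ** k * v for k, v in h.items()}

def transfer(f, omega):
    """Fourier series f_hat(omega) = sum_n f[n] exp(-i n omega)."""
    return sum(v * cmath.exp(-1j * n * omega) for n, v in f.items())

h = {0: 1 / math.sqrt(2), 1: 1 / math.sqrt(2)}   # filter (7.51)
g = mirror_filter(h)

grid = [2 * math.pi * k / 64 for k in range(64)]

# Largest deviation from (7.62): |g_hat(w)|^2 + |g_hat(w + pi)|^2 = 2.
defect_762 = max(
    abs(abs(transfer(g, w)) ** 2 + abs(transfer(g, w + math.pi)) ** 2 - 2)
    for w in grid
)
# Largest deviation from (7.63): g_hat(w) h_hat*(w) + g_hat(w + pi) h_hat*(w + pi) = 0.
defect_763 = max(
    abs(transfer(g, w) * transfer(h, w).conjugate()
        + transfer(g, w + math.pi) * transfer(h, w + math.pi).conjugate())
    for w in grid
)

print(g, defect_762, defect_763)
```

For this filter the result is $g[0] = -2^{-1/2}$ and $g[1] = 2^{-1/2}$. The same check can be applied to any finite real filter satisfying (7.68), for instance a truncated column of Table 7.1, in which case the defects are only as small as the truncation allows.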
This mirror filter plays an important role in the fast wavelet transform algorithm.

Example 7.9 Figure 7.5 displays the cubic spline wavelet $\psi$ and its Fourier transform $\hat\psi$ calculated by inserting in (7.57) the expressions (7.23) and (7.53) of $\hat\phi(\omega)$ and $\hat h(\omega)$. The properties of this Battle-Lemarié spline wavelet are further studied in Section 7.2.2. Like most orthogonal wavelets, the energy of $\hat\psi$ is essentially concentrated in $[-2\pi, -\pi] \cup [\pi, 2\pi]$. For any $\psi$ that generates an orthogonal basis of $L^2(\mathbb{R})$, one can verify that
\[ \forall \omega \in \mathbb{R} - \{0\}, \quad \sum_{j=-\infty}^{+\infty} |\hat\psi(2^j \omega)|^2 = 1. \]
This is illustrated in Figure 7.6.

Figure 7.5: Battle-Lemarié cubic spline wavelet $\psi(t)$ and its Fourier transform modulus $|\hat\psi(\omega)|$.

Figure 7.6: Graphs of $|\hat\psi(2^j \omega)|^2$ for the cubic spline Battle-Lemarié wavelet, with $1 \le j \le 5$ and $\omega \in [-\pi, \pi]$.

The orthogonal projection of a signal $f$ in a "detail" space $W_j$ is obtained with a partial expansion in its wavelet basis
\[ P_{W_j} f = \sum_{n=-\infty}^{+\infty} \langle f, \psi_{j,n} \rangle\, \psi_{j,n}. \]
A signal expansion in a wavelet orthogonal basis can thus be viewed as an aggregation of details at all scales $2^j$ that go from 0 to $+\infty$
\[ f = \sum_{j=-\infty}^{+\infty} P_{W_j} f = \sum_{j=-\infty}^{+\infty} \sum_{n=-\infty}^{+\infty} \langle f, \psi_{j,n} \rangle\, \psi_{j,n}. \]
Figure 7.7 gives the coefficients of a signal decomposed in the cubic spline wavelet orthogonal basis. The calculations are performed with the fast wavelet transform algorithm of Section 7.3.

Figure 7.7: Wavelet coefficients $d_j[n] = \langle f, \psi_{j,n} \rangle$ of a signal $f(t)$, calculated at scales $2^j$ from $2^{-5}$ to $2^{-9}$ with the cubic spline wavelet. At the top is the remaining coarse signal approximation $a_J[n] = \langle f, \phi_{J,n} \rangle$ for $J = -5$.

Wavelet Design Theorem 7.3 constructs a wavelet orthonormal basis from any conjugate mirror filter $\hat h(\omega)$. This gives a simple procedure for designing and building wavelet orthogonal bases. Conversely, we may wonder whether all wavelet orthonormal bases are associated to a multiresolution approximation and a conjugate mirror filter. If we impose that $\psi$ has a compact support then Lemarié [41] proved that $\psi$ necessarily corresponds to a multiresolution approximation. It is however possible to construct pathological wavelets that decay like $|t|^{-1}$ at infinity, and which cannot be derived from any multiresolution approximation. Section 7.2 describes important classes of wavelet bases and explains how to design $\hat h$ to specify the support, the number of vanishing moments and the regularity of $\psi$.

7.2 Classes of Wavelet Bases $^1$
7.2.1 Choosing a Wavelet

Most applications of wavelet bases exploit their ability to efficiently approximate particular classes of functions with few non-zero wavelet coefficients. This is true not only for data compression but also for noise removal and fast calculations. The design of $\psi$ must therefore be optimized to produce a maximum number of wavelet coefficients $\langle f, \psi_{j,n} \rangle$ that are close to zero. A function $f$ has few non-negligible wavelet coefficients if most of the fine-scale (high-resolution) wavelet coefficients are small. This depends mostly on the regularity of $f$, the number of vanishing moments of $\psi$ and the size of its support. To construct an appropriate wavelet from a conjugate mirror filter $h[n]$, we relate these properties to conditions on $\hat h(\omega)$.

Vanishing Moments Let us recall that $\psi$ has $p$ vanishing moments if
\[ \int_{-\infty}^{+\infty} t^k\, \psi(t)\, dt = 0 \quad \text{for } 0 \le k < p. \tag{7.74} \]
This means that $\psi$ is orthogonal to any polynomial of degree $p - 1$.
Section 6.1.3 proves that if f is regular and has enough vanishing
moments then the wavelet coe cients jhf j nij are small at ne scales 7.2. CLASSES OF WAVELET BASES 327 2j . Indeed, if f is locally Ck , then over a small interval it is well approximated by a Taylor polynomial of degree k. If k < p, then wavelets
are orthogonal to this Taylor polynomial and thus produce small amplitude coe cients at ne scales. The following theorem relates the
number of vanishing moments of to the vanishing derivatives of ^(!)
^
at ! = 0 and to the number of zeroes of h(!) at ! = . It also proves
that polynomials of degree p ; 1 are then reproduced by the scaling
functions. Theorem 7.4 (Vanishing moments) Let and be a wavelet and
a scaling function that generate an orthogonal basis. Suppose that
j (t)j = O((1 + t2 );p=2;1 ) and j (t)j = O((1 + t2 );p=2;1 ). The four
following statements are equivalent:
(i) The wavelet has p vanishing moments.
(ii) ^(!) and its rst p ; 1 derivatives are zero at ! = 0.
^
(iii) h(!) and its rst p ; 1 derivatives are zero at ! = .
(iv) For any 0 k < p, qk (t) = +1
X n=;1 nk (t ; n) is a polynomial of degree k: (7.75) Proof 2 . The decay of j (t)j and j (t)j implies that ^(!) and ^(!) are
p times continuously di erentiable. The kth order derivative ^(k) (!) is
the Fourier transform of (;it)k (t). Hence ^(k) (0) = Z +1
;1 (;it)k (t) dt: We derive that (i) is equivalent to (ii).
Theorem 7.3 proves that
p^
^
2 (2!) = e;i! h (! + ) ^(!):
Since ^(0) 6= 0, by di erentiating this expression we prove that (ii) is
equivalent to (iii).
Let us now prove that (iv) implies (i). Since $\psi$ is orthogonal to $\{\phi(t - n)\}_{n \in \mathbb{Z}}$, it is thus also orthogonal to the polynomials $q_k$ for $0 \le k < p$. This family of polynomials is a basis of the space of polynomials of degree at most $p - 1$. Hence $\psi$ is orthogonal to any polynomial of degree $p - 1$ and in particular to $t^k$ for $0 \le k < p$. This means that $\psi$ has $p$ vanishing moments.

To verify that (i) implies (iv) we suppose that $\psi$ has $p$ vanishing moments, and for $k < p$ we evaluate $q_k(t)$ defined in (7.75). This is done by computing its Fourier transform:

$$\hat q_k(\omega) = \hat\phi(\omega) \sum_{n=-\infty}^{+\infty} n^k \exp(-in\omega) = (i)^k\, \hat\phi(\omega)\, \frac{d^k}{d\omega^k} \sum_{n=-\infty}^{+\infty} \exp(-in\omega).$$

Let $\delta^{(k)}$ be the distribution that is the $k$th order derivative of a Dirac, defined in Appendix A.7. The Poisson formula (2.4) proves that

$$\hat q_k(\omega) = (i)^k\, \frac{1}{2\pi}\, \hat\phi(\omega) \sum_{l=-\infty}^{+\infty} \delta^{(k)}(\omega - 2l\pi). \tag{7.76}$$

With several integrations by parts, we verify the distribution equality

$$\hat\phi(\omega)\, \delta^{(k)}(\omega - 2l\pi) = \hat\phi(2l\pi)\, \delta^{(k)}(\omega - 2l\pi) + \sum_{m=0}^{k-1} a^{k,l}_m\, \delta^{(m)}(\omega - 2l\pi), \tag{7.77}$$

where $a^{k,l}_m$ is a linear combination of the derivatives $\{\hat\phi^{(m)}(2l\pi)\}_{0 \le m \le k}$.

For $l \ne 0$, let us prove that $a^{k,l}_m = 0$ by showing that $\hat\phi^{(m)}(2l\pi) = 0$ if $0 \le m < p$. For any $P > 0$, (7.32) implies

$$\hat\phi(\omega) = \hat\phi(2^{-P}\omega) \prod_{p=1}^{P} \frac{\hat h(2^{-p}\omega)}{\sqrt{2}}. \tag{7.78}$$

Since $\psi$ has $p$ vanishing moments, we showed in (iii) that $\hat h(\omega)$ has a zero of order $p$ at $\omega = \pi$. But $\hat h(\omega)$ is also $2\pi$ periodic, so (7.78) implies that $\hat\phi(\omega) = O(|\omega - 2l\pi|^p)$ in the neighborhood of $\omega = 2l\pi$, for any $l \ne 0$. Hence $\hat\phi^{(m)}(2l\pi) = 0$ if $m < p$.

Since $a^{k,l}_m = 0$ and $\hat\phi(2l\pi) = 0$ when $l \ne 0$, it follows from (7.77) that

$$\hat\phi(\omega)\, \delta^{(k)}(\omega - 2l\pi) = 0 \quad \text{for } l \ne 0.$$

The only term that remains in the summation (7.76) is $l = 0$ and inserting (7.77) yields

$$\hat q_k(\omega) = (i)^k\, \frac{1}{2\pi} \left( \hat\phi(0)\, \delta^{(k)}(\omega) + \sum_{m=0}^{k-1} a^{k,0}_m\, \delta^{(m)}(\omega) \right).$$

The inverse Fourier transform of $\delta^{(m)}(\omega)$ is $(2\pi)^{-1} (-it)^m$ and Theorem 7.2 proves that $\hat\phi(0) \ne 0$. Hence the inverse Fourier transform $q_k$ of $\hat q_k$ is a polynomial of degree $k$.

The hypothesis (iv) is called the Fix-Strang condition [320]. The polynomials $\{q_k\}_{0 \le k < p}$ define a basis of the space of polynomials of degree $p - 1$. The Fix-Strang condition thus proves that $\psi$ has $p$ vanishing moments if and only if any polynomial of degree $p - 1$ can be written as a linear expansion of $\{\phi(t - n)\}_{n \in \mathbb{Z}}$. The decomposition coefficients of the polynomials $q_k$ do not have a finite energy because polynomials do not have a finite energy.

Size of Support

If $f$ has an isolated singularity at $t_0$ and if $t_0$ is inside the support of $\psi_{j,n}(t) = 2^{-j/2}\, \psi(2^{-j} t - n)$, then $\langle f, \psi_{j,n}\rangle$ may have a large amplitude. If $\psi$ has a compact support of size $K$, at each scale $2^j$ there are $K$ wavelets $\psi_{j,n}$ whose support includes $t_0$. To minimize the number of high amplitude coefficients we must reduce the support size of $\psi$. The following proposition relates the support size of $h$ to the support of $\phi$ and $\psi$.

Proposition 7.2 (Compact support) The scaling function $\phi$ has a compact support if and only if $h$ has a compact support and their supports are equal. If the support of $h$ and $\phi$ is $[N_1, N_2]$ then the support of $\psi$ is $[(N_1 - N_2 + 1)/2,\ (N_2 - N_1 + 1)/2]$.
Proof ¹. If $\phi$ has a compact support, since

$$h[n] = \left\langle \frac{1}{\sqrt{2}}\, \phi\!\left(\frac{t}{2}\right),\ \phi(t - n) \right\rangle,$$

we derive that $h$ also has a compact support. Conversely, the scaling function satisfies

$$\frac{1}{\sqrt{2}}\, \phi\!\left(\frac{t}{2}\right) = \sum_{n=-\infty}^{+\infty} h[n]\, \phi(t - n). \tag{7.79}$$

If $h$ has a compact support then one can prove [144] that $\phi$ has a compact support. The proof is not reproduced here.

To relate the support of $\phi$ and $h$, we suppose that $h[n]$ is non-zero for $N_1 \le n \le N_2$ and that $\phi$ has a compact support $[K_1, K_2]$. The support of $\phi(t/2)$ is $[2K_1, 2K_2]$. The sum at the right of (7.79) is a function whose support is $[N_1 + K_1,\ N_2 + K_2]$. The equality proves that the support of $\phi$ is $[K_1, K_2] = [N_1, N_2]$.

Let us recall from (7.73) and (7.72) that

$$\frac{1}{\sqrt{2}}\, \psi\!\left(\frac{t}{2}\right) = \sum_{n=-\infty}^{+\infty} g[n]\, \phi(t - n) = \sum_{n=-\infty}^{+\infty} (-1)^{1-n}\, h[1 - n]\, \phi(t - n).$$

If the supports of $\phi$ and $h$ are equal to $[N_1, N_2]$, the sum in the right-hand side has a support equal to $[N_1 - N_2 + 1,\ N_2 - N_1 + 1]$. Hence $\psi$ has a support equal to $[(N_1 - N_2 + 1)/2,\ (N_2 - N_1 + 1)/2]$.

If $h$ has a finite impulse response in $[N_1, N_2]$, Proposition 7.2 proves that $\psi$ has a support of size $N_2 - N_1$ centered at $1/2$. To minimize the size of the support, we must synthesize conjugate mirror filters with as few non-zero coefficients as possible.

Support Versus Moments

The support size of a function and the number of vanishing moments are a priori independent. However, we shall see in Theorem 7.5 that the constraints imposed on orthogonal wavelets imply that if $\psi$ has $p$ vanishing moments then its support is at least of size $2p - 1$. Daubechies wavelets are optimal in the sense that they have a minimum size support for a given number of vanishing moments. When choosing a particular wavelet, we thus face a trade-off between the number of vanishing moments and the support size. If $f$ has few isolated singularities and is very regular between singularities, we must choose a wavelet with many vanishing moments to produce a large number of small wavelet coefficients $\langle f, \psi_{j,n}\rangle$. If the density of singularities increases, it might be better to decrease the size of the wavelet support at the cost of reducing the number of vanishing moments. Indeed, wavelets that overlap the singularities create high amplitude coefficients.
The multiwavelet construction of Geronimo, Hardin and Massopust [190] offers more design flexibility by introducing several scaling functions and wavelets. Problem 7.16 gives an example. A better trade-off can be obtained between the supports of multiwavelets and their vanishing moments [321]. However, multiwavelet decompositions are implemented with a slightly more complicated filter bank algorithm than a standard orthogonal wavelet transform.

Regularity

The regularity of $\psi$ has mostly a cosmetic influence on the error introduced by thresholding or quantizing the wavelet coefficients. When reconstructing a signal from its wavelet coefficients

$$f = \sum_{j=-\infty}^{+\infty} \sum_{n=-\infty}^{+\infty} \langle f, \psi_{j,n}\rangle\, \psi_{j,n},$$

an error $\epsilon$ added to a coefficient $\langle f, \psi_{j,n}\rangle$ will add the wavelet component $\epsilon\, \psi_{j,n}$ to the reconstructed signal. If $\psi$ is smooth, then $\epsilon\, \psi_{j,n}$ is a smooth error. For image coding applications, a smooth error is often less visible than an irregular error, even though they have the same energy. Better quality images are obtained with wavelets that are continuously differentiable than with the discontinuous Haar wavelet. The following proposition due to Tchamitchian [327] relates the uniform Lipschitz regularity of $\phi$ and $\psi$ to the number of zeroes of $\hat h(\omega)$ at $\omega = \pi$.

Proposition 7.3 (Tchamitchian) Let $\hat h(\omega)$ be a conjugate mirror filter with $p$ zeroes at $\pi$ and which satisfies the sufficient conditions of Theorem 7.2. Let us perform the factorization

$$\hat h(\omega) = \sqrt{2} \left( \frac{1 + e^{i\omega}}{2} \right)^{p} \hat l(\omega).$$

If $\sup_{\omega \in \mathbb{R}} |\hat l(\omega)| = B$ then $\psi$ and $\phi$ are uniformly Lipschitz $\alpha$ for

$$\alpha < \alpha_0 = p - \log_2 B - 1. \tag{7.80}$$
Proof ³. This result is proved by showing that there exist $C_1 > 0$ and $C_2 > 0$ such that for all $\omega \in \mathbb{R}$

$$|\hat\phi(\omega)| \le C_1\, (1 + |\omega|)^{-p + \log_2 B} \tag{7.81}$$

$$|\hat\psi(\omega)| \le C_2\, (1 + |\omega|)^{-p + \log_2 B}. \tag{7.82}$$

The Lipschitz regularity of $\phi$ and $\psi$ is then derived from Theorem 6.1, which shows that if $\int_{-\infty}^{+\infty} (1 + |\omega|^{\alpha})\, |\hat f(\omega)|\, d\omega < +\infty$, then $f$ is uniformly Lipschitz $\alpha$.

We proved in (7.37) that $\hat\phi(\omega) = \prod_{j=1}^{+\infty} 2^{-1/2}\, \hat h(2^{-j}\omega)$. One can verify that

$$\prod_{j=1}^{+\infty} \frac{1 + \exp(i 2^{-j}\omega)}{2} = \frac{\exp(i\omega) - 1}{i\omega},$$

hence

$$|\hat\phi(\omega)| = \frac{|1 - \exp(i\omega)|^{p}}{|\omega|^{p}} \prod_{j=1}^{+\infty} |\hat l(2^{-j}\omega)|. \tag{7.83}$$

Let us now compute an upper bound for $\prod_{j=1}^{+\infty} |\hat l(2^{-j}\omega)|$. At $\omega = 0$ we have $\hat h(0) = \sqrt{2}$ so $\hat l(0) = 1$. Since $\hat h(\omega)$ is continuously differentiable at $\omega = 0$, $\hat l(\omega)$ is also continuously differentiable at $\omega = 0$. We thus derive that there exists $\epsilon > 0$ such that if $|\omega| < \epsilon$ then $|\hat l(\omega)| \le 1 + K|\omega|$. Consequently

$$\sup_{|\omega| \le \epsilon} \prod_{j=1}^{+\infty} |\hat l(2^{-j}\omega)| \le \sup_{|\omega| \le \epsilon} \prod_{j=1}^{+\infty} (1 + K |2^{-j}\omega|) \le e^{K\epsilon}. \tag{7.84}$$

If $|\omega| > \epsilon$, there exists $J \ge 1$ such that $2^{J-1}\epsilon \le |\omega| \le 2^{J}\epsilon$ and we decompose

$$\prod_{j=1}^{+\infty} |\hat l(2^{-j}\omega)| = \prod_{j=1}^{J} |\hat l(2^{-j}\omega)| \ \prod_{j=1}^{+\infty} |\hat l(2^{-j-J}\omega)|. \tag{7.85}$$

Since $\sup_{\omega \in \mathbb{R}} |\hat l(\omega)| = B$, inserting (7.84) yields for $|\omega| > \epsilon$

$$\prod_{j=1}^{+\infty} |\hat l(2^{-j}\omega)| \le B^{J} e^{K\epsilon} = e^{K\epsilon}\, 2^{J \log_2 B}. \tag{7.86}$$

Since $2^{J} \le \epsilon^{-1}\, 2|\omega|$, this proves that

$$\forall \omega \in \mathbb{R}, \quad \prod_{j=1}^{+\infty} |\hat l(2^{-j}\omega)| \le e^{K\epsilon} \left( 1 + \frac{|2\omega|^{\log_2 B}}{\epsilon^{\log_2 B}} \right).$$

Equation (7.81) is derived from (7.83) and this last inequality. Since $|\hat\psi(2\omega)| = 2^{-1/2}\, |\hat h(\omega + \pi)|\, |\hat\phi(\omega)|$, (7.82) is obtained from (7.81).

This proposition proves that if $B < 2^{p-1}$ then $\alpha_0 > 0$. It means that $\psi$ and $\phi$ are uniformly continuous. For any $m > 0$, if $B < 2^{p-1-m}$ then $\alpha_0 > m$ so $\psi$ and $\phi$ are $m$ times continuously differentiable. Theorem 7.4 shows that the number $p$ of zeros of $\hat h(\omega)$ at $\pi$ is equal to the number of vanishing moments of $\psi$. A priori, we are not guaranteed that increasing $p$ will improve the wavelet regularity, since $B$ might increase as well. However, for important families of conjugate mirror filters such as splines or Daubechies filters, $\log_2 B$ increases more slowly than $p$, which implies that wavelet regularity increases with the number of vanishing moments. Let us emphasize that the number of vanishing moments and the regularity of orthogonal wavelets are related but it is the number of vanishing moments and not the regularity that affects
the amplitude of the wavelet coefficients at fine scales.

7.2.2 Shannon, Meyer and Battle-Lemarié Wavelets

We study important classes of wavelets whose Fourier transforms are derived from the general formula proved in Theorem 7.3,

$$\hat\psi(\omega) = \frac{1}{\sqrt{2}}\, \hat g\!\left(\frac{\omega}{2}\right) \hat\phi\!\left(\frac{\omega}{2}\right) = \frac{1}{\sqrt{2}}\, \exp\!\left(\frac{-i\omega}{2}\right) \hat h^*\!\left(\frac{\omega}{2} + \pi\right) \hat\phi\!\left(\frac{\omega}{2}\right). \tag{7.87}$$

Shannon Wavelet

The Shannon wavelet is constructed from the Shannon multiresolution approximation, which approximates functions by their restriction to low frequency intervals. It corresponds to $\hat\phi = \mathbf{1}_{[-\pi, \pi]}$ and $\hat h(\omega) = \sqrt{2}\, \mathbf{1}_{[-\pi/2, \pi/2]}(\omega)$ for $\omega \in [-\pi, \pi]$. We derive from (7.87) that

$$\hat\psi(\omega) = \begin{cases} \exp(-i\omega/2) & \text{if } \omega \in [-2\pi, -\pi] \cup [\pi, 2\pi] \\ 0 & \text{otherwise} \end{cases} \tag{7.88}$$

and hence

$$\psi(t) = \frac{\sin 2\pi(t - 1/2)}{\pi(t - 1/2)} - \frac{\sin \pi(t - 1/2)}{\pi(t - 1/2)}.$$

This wavelet is $C^\infty$ but has a slow asymptotic time decay. Since $\hat\psi(\omega)$ is zero in the neighborhood of $\omega = 0$, all its derivatives are zero at $\omega = 0$. Theorem 7.4 thus implies that $\psi$ has an infinite number of vanishing moments.

Since $\hat\psi(\omega)$ has a compact support we know that $\psi(t)$ is $C^\infty$. However $|\psi(t)|$ decays only like $|t|^{-1}$ at infinity because $\hat\psi(\omega)$ is discontinuous at $\pm\pi$ and $\pm 2\pi$.

Meyer Wavelets

A Meyer wavelet [270] is a frequency band-limited function whose Fourier transform is smooth, unlike the Fourier transform of the Shannon wavelet. This smoothness provides a much faster
asymptotic decay in time. These wavelets are constructed with conjugate mirror filters $\hat h(\omega)$ that are $C^n$ and satisfy

$$\hat h(\omega) = \begin{cases} \sqrt{2} & \text{if } \omega \in [-\pi/3, \pi/3] \\ 0 & \text{if } \omega \in [-\pi, -2\pi/3] \cup [2\pi/3, \pi]. \end{cases} \tag{7.89}$$

The only degree of freedom is the behavior of $\hat h(\omega)$ in the transition bands $[-2\pi/3, -\pi/3] \cup [\pi/3, 2\pi/3]$. It must satisfy the quadrature condition

$$|\hat h(\omega)|^2 + |\hat h(\omega + \pi)|^2 = 2, \tag{7.90}$$

and to obtain $C^n$ junctions at $|\omega| = \pi/3$ and $|\omega| = 2\pi/3$, the $n$ first derivatives must vanish at these abscissa. One can construct such functions that are $C^\infty$.

The scaling function $\hat\phi(\omega) = \prod_{p=1}^{+\infty} 2^{-1/2}\, \hat h(2^{-p}\omega)$ has a compact support and one can verify that

$$\hat\phi(\omega) = \begin{cases} 2^{-1/2}\, \hat h(\omega/2) & \text{if } |\omega| \le 4\pi/3 \\ 0 & \text{if } |\omega| > 4\pi/3. \end{cases} \tag{7.91}$$

The resulting wavelet (7.87) is

$$\hat\psi(\omega) = \begin{cases} 0 & \text{if } |\omega| \le 2\pi/3 \\ 2^{-1/2}\, \hat g(\omega/2) & \text{if } 2\pi/3 \le |\omega| \le 4\pi/3 \\ 2^{-1/2} \exp(-i\omega/2)\, \hat h(\omega/4) & \text{if } 4\pi/3 \le |\omega| \le 8\pi/3 \\ 0 & \text{if } |\omega| > 8\pi/3. \end{cases} \tag{7.92}$$

The functions $\phi$ and $\psi$ are $C^\infty$ because their Fourier transforms have a compact support. Since $\hat\psi(\omega) = 0$ in the neighborhood of $\omega = 0$, all its derivatives are zero at $\omega = 0$, which proves that $\psi$ has an infinite number of vanishing moments.
If $\hat h$ is $C^n$ then $\hat\psi$ and $\hat\phi$ are also $C^n$. The discontinuities of the $(n + 1)$th derivative of $\hat h$ are generally at the junction of the transition band $|\omega| = \pi/3,\ 2\pi/3$, in which case one can show that there exists $A$ such that

$$|\phi(t)| \le A\, (1 + |t|)^{-n-1} \quad \text{and} \quad |\psi(t)| \le A\, (1 + |t|)^{-n-1}.$$

Although the asymptotic decay of $\psi$ is fast when $n$ is large, its effective numerical decay may be relatively slow, which is reflected by the fact that $A$ is quite large. As a consequence, a Meyer wavelet transform is generally implemented in the Fourier domain. Section 8.4.2 relates these wavelet bases to lapped orthogonal transforms applied in the Fourier domain. One can prove [21] that there exists no orthogonal wavelet that is $C^\infty$ and has an exponential decay.
[Figure 7.8 panels: $\psi(t)$ and $|\hat\psi(\omega)|$.]

Figure 7.8: Meyer wavelet $\psi$ and its Fourier transform modulus computed with (7.94).

Example 7.10 To satisfy the quadrature condition (7.90), one can verify that $\hat h$ in (7.89) may be defined on the transition bands by

$$\hat h(\omega) = \sqrt{2}\, \cos\!\left[ \frac{\pi}{2}\, \beta\!\left( \frac{3|\omega|}{\pi} - 1 \right) \right] \quad \text{for } |\omega| \in [\pi/3, 2\pi/3],$$

where $\beta(x)$ is a function that goes from 0 to 1 on the interval $[0, 1]$ and satisfies

$$\forall x \in [0, 1], \quad \beta(x) + \beta(1 - x) = 1. \tag{7.93}$$

An example due to Daubechies [21] is

$$\beta(x) = x^4\, (35 - 84\,x + 70\,x^2 - 20\,x^3). \tag{7.94}$$

The resulting $\hat h(\omega)$ has $n = 3$ vanishing derivatives at $|\omega| = \pi/3,\ 2\pi/3$.
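The construction of Example 7.10 can be checked numerically. The Python sketch below is only an illustration and is not part of the original text: the function names `beta` and `meyer_h_hat` and the use of NumPy are assumptions. It evaluates $\hat h$ from (7.89) with the transition profile (7.94), extends it $2\pi$-periodically, and measures how far the quadrature condition (7.90) is from being satisfied on a frequency grid.

```python
import numpy as np

def beta(x):
    # Transition profile (7.94): goes from 0 to 1 on [0, 1].
    return x**4 * (35 - 84 * x + 70 * x**2 - 20 * x**3)

def meyer_h_hat(omega):
    """Meyer conjugate mirror filter (7.89) with the transition bands of Example 7.10."""
    omega = np.atleast_1d(np.asarray(omega, dtype=float))
    # h_hat is even and 2*pi periodic: fold every frequency into [0, pi].
    w = np.abs((omega + np.pi) % (2 * np.pi) - np.pi)
    out = np.zeros_like(w)                      # zero on [2*pi/3, pi]
    out[w <= np.pi / 3] = np.sqrt(2.0)          # flat pass band
    band = (w > np.pi / 3) & (w < 2 * np.pi / 3)
    out[band] = np.sqrt(2.0) * np.cos(0.5 * np.pi * beta(3 * w[band] / np.pi - 1))
    return out

# Largest deviation from |h(w)|^2 + |h(w + pi)|^2 = 2 on a grid of [-pi, pi].
grid = np.linspace(-np.pi, np.pi, 2001)
deviation = np.max(np.abs(meyer_h_hat(grid) ** 2 + meyer_h_hat(grid + np.pi) ** 2 - 2.0))
print("max quadrature deviation:", deviation)
```

The identity holds because $\beta(x) + \beta(1 - x) = 1$ turns the two squared cosines in the transition band into $\cos^2 + \sin^2$.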
Figure 7.8 displays the corresponding wavelet $\psi$.

Haar Wavelet

The Haar basis is obtained with a multiresolution of piecewise constant functions. The scaling function is $\phi = \mathbf{1}_{[0, 1]}$. The filter $h[n]$ given in (7.51) has two non-zero coefficients equal to $2^{-1/2}$ at $n = 0$ and $n = 1$. Hence

$$\frac{1}{\sqrt{2}}\, \psi\!\left(\frac{t}{2}\right) = \sum_{n=-\infty}^{+\infty} (-1)^{1-n}\, h[1 - n]\, \phi(t - n) = \frac{1}{\sqrt{2}} \big( \phi(t - 1) - \phi(t) \big),$$

so

$$\psi(t) = \begin{cases} -1 & \text{if } 0 \le t < 1/2 \\ 1 & \text{if } 1/2 \le t < 1 \\ 0 & \text{otherwise.} \end{cases} \tag{7.95}$$

The Haar wavelet has the shortest support among all orthogonal wavelets. It is not well adapted to approximating smooth functions because it has only one vanishing moment.
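The claim that the Haar wavelet has exactly one vanishing moment can be verified with exact arithmetic. The short Python sketch below is an illustration under the sign convention of (7.95); the function name `haar_moment` is hypothetical. It integrates $t^k \psi(t)$ in closed form over the two halves of the support.

```python
from fractions import Fraction

def haar_moment(k):
    """Exact value of the k-th moment of the Haar wavelet (7.95)."""
    half = Fraction(1, 2)
    # psi = -1 on [0, 1/2): integral of -t^k is -(1/2)^(k+1) / (k+1).
    left = -(half ** (k + 1)) / (k + 1)
    # psi = +1 on [1/2, 1): integral of t^k is (1 - (1/2)^(k+1)) / (k+1).
    right = (1 - half ** (k + 1)) / (k + 1)
    return left + right

for k in range(3):
    print(k, haar_moment(k))
```

The moment of order 0 is zero while the moment of order 1 equals $1/4$, so $p = 1$ in the sense of (7.74).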
[Figure 7.9 panels: $\phi(t)$ and $\psi(t)$.]

Figure 7.9: Linear spline Battle-Lemarié scaling function $\phi$ and wavelet $\psi$.

Battle-Lemarié Wavelets

Polynomial spline wavelets introduced by Battle [89] and Lemarié [249] are computed from spline multiresolution approximations. The expressions of $\hat\phi(\omega)$ and $\hat h(\omega)$ are given respectively by (7.23) and (7.53). For splines of degree $m$, $\hat h(\omega)$ and its first $m$ derivatives are zero at $\omega = \pi$. Theorem 7.4 derives that $\psi$ has $m + 1$ vanishing moments. It follows from (7.87) that

$$\hat\psi(\omega) = \frac{\exp(-i\omega/2)}{\omega^{m+1}} \sqrt{ \frac{S_{2m+2}(\omega/2 + \pi)}{S_{2m+2}(\omega)\, S_{2m+2}(\omega/2)} }.$$

This wavelet $\psi$ has an exponential decay. Since it is a polynomial spline of degree $m$, it is $m - 1$ times continuously differentiable. Polynomial spline wavelets are less regular than Meyer wavelets but have faster time asymptotic decay. For $m$ odd, $\psi$ is symmetric about $1/2$. For $m$ even it is antisymmetric about $1/2$. Figure 7.5 gives the graph of the cubic spline wavelet $\psi$ corresponding to $m = 3$. For $m = 1$, Figure 7.9 displays linear splines $\phi$ and $\psi$. The properties of these wavelets are further studied in [93, 15, 125].

7.2.3 Daubechies Compactly Supported Wavelets

Daubechies wavelets have a support of minimum size for any given
number $p$ of vanishing moments. Proposition 7.2 proves that wavelets of compact support are computed with finite impulse response conjugate mirror filters $h$. We consider real causal filters $h[n]$, which implies that $\hat h$ is a trigonometric polynomial:

$$\hat h(\omega) = \sum_{n=0}^{N-1} h[n]\, e^{-in\omega}.$$

To ensure that $\psi$ has $p$ vanishing moments, Theorem 7.4 shows that $\hat h$ must have a zero of order $p$ at $\omega = \pi$. To construct a trigonometric polynomial of minimal size, we factor $(1 + e^{-i\omega})^p$, which is a minimum size polynomial having $p$ zeros at $\omega = \pi$:

$$\hat h(\omega) = \sqrt{2} \left( \frac{1 + e^{-i\omega}}{2} \right)^{p} R(e^{-i\omega}). \tag{7.96}$$

The difficulty is to design a polynomial $R(e^{-i\omega})$ of minimum degree $m$ such that $\hat h$ satisfies

$$|\hat h(\omega)|^2 + |\hat h(\omega + \pi)|^2 = 2. \tag{7.97}$$

As a result, $h$ has $N = m + p + 1$ non-zero coefficients. The following theorem by Daubechies [144] proves that the minimum degree of $R$ is $m = p - 1$.

Theorem 7.5 (Daubechies) A real conjugate mirror filter $h$, such that $\hat h(\omega)$ has $p$ zeroes at $\omega = \pi$, has at least $2p$ non-zero coefficients. Daubechies filters have $2p$ non-zero coefficients.

Proof ². The proof is constructive and computes the Daubechies filters. Since $h[n]$ is real, $|\hat h(\omega)|^2$ is an even function and can thus be written as a polynomial in $\cos\omega$. Hence $|R(e^{-i\omega})|^2$ defined in (7.96) is a polynomial in $\cos\omega$ that we can also write as a polynomial $P(\sin^2 \frac{\omega}{2})$:

$$|\hat h(\omega)|^2 = 2 \left( \cos\frac{\omega}{2} \right)^{2p} P\!\left( \sin^2\frac{\omega}{2} \right). \tag{7.98}$$

The quadrature condition (7.97) is equivalent to

$$(1 - y)^p\, P(y) + y^p\, P(1 - y) = 1, \tag{7.99}$$

for any $y = \sin^2(\omega/2) \in [0, 1]$. To minimize the number of non-zero terms of the finite Fourier series $\hat h(\omega)$, we must find the solution $P(y) \ge 0$ of minimum degree, which is obtained with the Bezout theorem on polynomials.

Theorem 7.6 (Bezout) Let $Q_1(y)$ and $Q_2(y)$ be two polynomials of degrees $n_1$ and $n_2$ with no common zeroes. There exist two unique polynomials $P_1(y)$ and $P_2(y)$ of degrees $n_2 - 1$ and $n_1 - 1$ such that

$$P_1(y)\, Q_1(y) + P_2(y)\, Q_2(y) = 1. \tag{7.100}$$

The proof of this classical result is in [21]. Since $Q_1(y) = (1 - y)^p$ and $Q_2(y) = y^p$ are two polynomials of degree $p$ with no common zeros, the Bezout theorem proves that there exist two unique polynomials $P_1(y)$ and $P_2(y)$ such that

$$(1 - y)^p\, P_1(y) + y^p\, P_2(y) = 1.$$

The reader can verify that $P_2(y) = P_1(1 - y) = P(1 - y)$ with

$$P(y) = \sum_{k=0}^{p-1} \binom{p - 1 + k}{k}\, y^k. \tag{7.101}$$

Clearly $P(y) \ge 0$ for $y \in [0, 1]$. Hence $P(y)$ is the polynomial of minimum
degree satisfying (7.99) with $P(y) \ge 0$.

Minimum Phase Factorization

Now we need to construct a minimum degree polynomial

$$R(e^{-i\omega}) = \sum_{k=0}^{m} r_k\, e^{-ik\omega} = r_0 \prod_{k=1}^{m} (1 - a_k\, e^{-i\omega})$$

such that $|R(e^{-i\omega})|^2 = P(\sin^2(\omega/2))$. Since its coefficients are real, $R^*(e^{-i\omega}) = R(e^{i\omega})$ and hence

$$|R(e^{-i\omega})|^2 = R(e^{-i\omega})\, R(e^{i\omega}) = P\!\left( \frac{2 - e^{i\omega} - e^{-i\omega}}{4} \right) = Q(e^{-i\omega}). \tag{7.102}$$

This factorization is solved by extending it to the whole complex plane with the variable $z = e^{-i\omega}$:

$$R(z)\, R(z^{-1}) = r_0^2 \prod_{k=1}^{m} (1 - a_k z)(1 - a_k z^{-1}) = Q(z) = P\!\left( \frac{2 - z - z^{-1}}{4} \right). \tag{7.103}$$

Let us compute the roots of $Q(z)$. Since $Q(z)$ has real coefficients, if $c_k$ is a root then $c_k^*$ is also a root, and since it is a function of $z + z^{-1}$, if $c_k$ is a root then $1/c_k$ and hence $1/c_k^*$ are also roots. To design $R(z)$ that satisfies (7.103), we choose each $a_k$ among a pair $(c_k, 1/c_k)$ and include $a_k^*$ to obtain real coefficients. This procedure yields a polynomial of minimum degree $m = p - 1$, with $r_0$ fixed by the normalization $R(1) = 1$ that follows from $\hat h(0) = \sqrt{2}$ and is consistent with $|R(1)|^2 = P(0) = 1$. The resulting filter $h$ of minimum size has $N = p + m + 1 = 2p$ non-zero coefficients.
Among all possible factorizations, the minimum phase solution $R(e^{-i\omega})$ is obtained by choosing $a_k$ among $(c_k, 1/c_k)$ to be inside the unit circle $|a_k| \le 1$ [55]. The resulting causal filter $h$ has an energy maximally concentrated at small abscissa $n \ge 0$. It is a Daubechies filter of order $p$.

Table 7.2: Daubechies filters for wavelets with p vanishing moments.

  p        n     h_p[n]
  p = 2    0      .482962913145
           1      .836516303738
           2      .224143868042
           3     -.129409522551
  p = 3    0      .332670552950
           1      .806891509311
           2      .459877502118
           3     -.135011020010
           4     -.085441273882
           5      .035226291882
  p = 4    0      .230377813309
           1      .714846570553
           2      .630880767930
           3     -.027983769417
           4     -.187034811719
           5      .030841381836
           6      .032883011667
           7     -.010597401785
  p = 5    0      .160102397974
           1      .603829269797
           2      .724308528438
           3      .138428145901
           4     -.242294887066
           5     -.032244869585
           6      .077571493840
           7     -.006241490213
           8     -.012580751999
           9      .003335725285
  p = 6    0      .111540743350
           1      .494623890398
           2      .751133908021
           3      .315250351709
           4     -.226264693965
           5     -.129766867567
           6      .097501605587
           7      .027522865530
           8     -.031582039317
           9      .000553842201
          10      .004777257511
          11     -.001077301085
  p = 7    0      .077852054085
           1      .396539319482
           2      .729132090846
           3      .469782287405
           4     -.143906003929
           5     -.224036184994
           6      .071309219267
           7      .080612609151
           8     -.038029936935
           9     -.016574541631
          10      .012550998556
          11      .000429577973
          12     -.001801640704
          13      .000353713800
  p = 8    0      .054415842243
           1      .312871590914
           2      .675630736297
           3      .585354683654
           4     -.015829105256
           5     -.284015542962
           6      .000472484574
           7      .128747426620
           8     -.017369301002
           9     -.04408825393
          10      .013981027917
          11      .008746094047
          12     -.004870352993
          13     -.000391740373
          14      .000675449406
          15     -.000117476784
  p = 9    0      .038077947364
           1      .243834674613
           2      .604823123690
           3      .657288078051
           4      .133197385825
           5     -.293273783279
           6     -.096840783223
           7      .148540749338
           8      .030725681479
           9     -.067632829061
          10      .000250947115
          11      .022361662124
          12     -.004723204758
          13     -.004281503682
          14      .001847646883
          15      .000230385764
          16     -.000251963189
          17      .000039347320
  p = 10   0      .026670057901
           1      .188176800078
           2      .527201188932
           3      .688459039454
           4      .281172343661
           5     -.249846424327
           6     -.195946274377
           7      .127369340336
           8      .093057364604
           9     -.071394147166
          10     -.029457536822
          11      .033212674059
          12      .003606553567
          13     -.010733175483
          14      .001395351747
          15      .001992405295
          16     -.000685856695
          17     -.000116466855
          18      .000093588670
          19     -.000013264203

The constructive proof of this theorem synthesizes causal conjugate
mirror filters of size $2p$. Table 7.2 gives the coefficients of these Daubechies filters for $2 \le p \le 10$. The following proposition derives that Daubechies wavelets calculated with these conjugate mirror filters have a support of minimum size.

Proposition 7.4 (Daubechies) If $\psi$ is a wavelet with $p$ vanishing moments that generates an orthonormal basis of $L^2(\mathbb{R})$, then it has a support of size larger than or equal to $2p - 1$. A Daubechies wavelet has a minimum size support equal to $[-p + 1, p]$. The support of the corresponding scaling function $\phi$ is $[0, 2p - 1]$.

This proposition is a direct consequence of Theorem 7.5. The support of the wavelet, and that of the scaling function, are calculated with Proposition 7.2. When $p = 1$ we get the Haar wavelet. Figure 7.10 displays the graphs of $\phi$ and $\psi$ for $p = 2, 3, 4$.
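The constructive proof of Theorem 7.5 can be followed step by step in a few lines of code. The Python sketch below is an illustration under stated assumptions rather than a reference implementation: the function names `daubechies_filter` and `tchamitchian_bound` are hypothetical, NumPy's `numpy.polynomial.polynomial` helpers are used for the polynomial algebra, and a general-purpose root finder is used, which may lose accuracy for large $p$. It builds $P(y)$ from (7.101), forms $z^{p-1} Q(z)$ from (7.103), keeps the minimum phase roots, and normalizes with $R(1) = 1$. The second function evaluates the lower bound $p - \log_2 B - 1$ of Proposition 7.3 with $B = \sup_\omega |R(e^{-i\omega})|$ estimated on a grid.

```python
from math import comb

import numpy as np
from numpy.polynomial import polynomial as npoly  # coefficients in ascending powers

def daubechies_filter(p):
    """Return (h, r): the 2p filter taps and the coefficients of R(z), z = exp(-i*omega)."""
    # y = (2 - z - 1/z)/4 = y_num(z) / z with y_num(z) = (-1 + 2z - z^2)/4.
    y_num = np.array([-1.0, 2.0, -1.0]) / 4.0
    # z^(p-1) Q(z) = sum_k C(p-1+k, k) * y_num(z)^k * z^(p-1-k), from (7.101) and (7.103).
    zq = np.zeros(1)
    for k in range(p):
        term = npoly.polymul(npoly.polypow(y_num, k), npoly.polypow([0.0, 1.0], p - 1 - k))
        zq = npoly.polyadd(zq, comb(p - 1 + k, k) * term)
    # Roots come in pairs (c, 1/c). Keeping the zeros of R(z) outside the unit
    # circle puts every a_k = 1/c inside it, which is the minimum phase choice.
    roots = npoly.polyroots(zq)
    r = np.real(npoly.polyfromroots(roots[np.abs(roots) > 1.0]))
    r = r / npoly.polyval(1.0, r)  # normalization R(1) = 1, i.e. h_hat(0) = sqrt(2)
    # h_hat = sqrt(2) * ((1 + z)/2)^p * R(z), as in (7.96).
    h = np.sqrt(2.0) * npoly.polymul(npoly.polypow([0.5, 0.5], p), r)
    return h, r

def tchamitchian_bound(r, p, n_grid=4096):
    """Lower bound p - log2(B) - 1 of Proposition 7.3, with B estimated on a grid."""
    omega = np.linspace(0.0, 2.0 * np.pi, n_grid + 1)
    big_b = np.max(np.abs(npoly.polyval(np.exp(-1j * omega), r)))
    return p - np.log2(big_b) - 1.0

h2, r2 = daubechies_filter(2)
print("h for p = 2:", h2)
print("Lipschitz lower bound for p = 2:", tchamitchian_bound(r2, 2))
```

For $p = 2$ the computed taps can be compared with the first four entries of Table 7.2, and the bound obtained from Proposition 7.3 stays below the exact Lipschitz exponents reported by Daubechies and Lagarias, as expected from a sufficient condition.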
[Figure 7.10 panels: $\phi(t)$ and $\psi(t)$ for $p = 2$, $p = 3$ and $p = 4$.]

Figure 7.10: Daubechies scaling function $\phi$ and wavelet $\psi$ with $p$ vanishing moments.

The regularity of $\phi$ and $\psi$ is the same since $\psi(t)$ is a finite linear combination of the $\phi(2t - n)$. This regularity is however difficult to estimate precisely. Let $B = \sup_{\omega \in \mathbb{R}} |R(e^{-i\omega})|$ where $R(e^{-i\omega})$ is the trigonometric polynomial defined in (7.96). Proposition 7.3 proves that $\psi$ is at least uniformly Lipschitz $\alpha$ for $\alpha < p - \log_2 B - 1$. For Daubechies wavelets, $\log_2 B$ increases more slowly than $p$ and Figure 7.10 shows indeed that the regularity of these wavelets increases with $p$. Daubechies and Lagarias [147] have established a more precise technique that computes the exact Lipschitz regularity of $\psi$. For $p = 2$ the wavelet $\psi$ is only Lipschitz 0.55 but for $p = 3$ it is Lipschitz 1.08 which means that it is already continuously differentiable. For $p$ large, $\phi$ and $\psi$ are uniformly Lipschitz $\alpha$ for $\alpha$ of the order of $0.2\,p$ [129].

Symmlets

Daubechies wavelets are very asymmetric because they are constructed by selecting the minimum phase square root of $Q(e^{-i\omega})$
in (7.102). One can show [55] that filters corresponding to a minimum phase square root have their energy optimally concentrated near the starting point of their support. They are thus highly non-symmetric, which yields very asymmetric wavelets.

To obtain a symmetric or antisymmetric wavelet, the filter $h$ must be symmetric or antisymmetric with respect to the center of its support, which means that $\hat h(\omega)$ has a linear complex phase. Daubechies proved [144] that the Haar filter is the only real compactly supported conjugate mirror filter that has a linear phase. The Symmlet filters of Daubechies are obtained by optimizing the choice of the square root $R(e^{-i\omega})$ of $Q(e^{-i\omega})$ to obtain an almost linear phase. The resulting wavelets still have a minimum support $[-p + 1, p]$ with $p$ vanishing moments but they are more symmetric, as illustrated by Figure 7.11 for $p = 8$. The coefficients of the Symmlet filters are in WaveLab. Complex conjugate mirror filters with a compact support and a linear phase can be constructed [251], but they produce complex wavelet coefficients whose real and imaginary parts are redundant when the signal is real.

[Figure 7.11 panels: $\phi(t)$, $\psi(t)$, $\phi(t)$, $\psi(t)$.]

Figure 7.11: Daubechies (first two) and Symmlets (last two) scaling functions and wavelets with $p = 8$ vanishing moments.

Coiflets

For an application in numerical analysis, Coifman asked Daubechies [144] to construct a family of wavelets $\psi$ that have $p$ vanishing moments and a minimum size support, but whose scaling functions also satisfy

$$\int_{-\infty}^{+\infty} \phi(t)\, dt = 1 \quad \text{and} \quad \int_{-\infty}^{+\infty} t^k\, \phi(t)\, dt = 0 \quad \text{for } 1 \le k < p. \tag{7.104}$$

Such scaling functions are useful in establishing precise quadrature formulas. If $f$ is $C^k$ in the neighborhood of $2^J n$ with $k < p$, then a Taylor expansion of $f$ up to order $k$ shows that

$$2^{-J/2}\, \langle f, \phi_{J,n}\rangle \approx f(2^J n) + O(2^{(k+1)J}). \tag{7.105}$$

At a fine scale $2^J$, the scaling coefficients are thus closely approximated by the signal samples. The order of approximation increases with $p$. The supplementary condition (7.104) requires increasing the support of $\psi$; the resulting Coiflet has a support of size $3p - 1$ instead of $2p - 1$ for a Daubechies wavelet. The corresponding conjugate mirror filters are tabulated in WaveLab.

Audio Filters

The first conjugate mirror filters with finite impulse response were constructed in 1986 by Smith and Barnwell [317] in the context of perfect filter bank reconstruction, explained in Section 7.3.2. These filters satisfy the quadrature condition $|\hat h(\omega)|^2 + |\hat h(\omega + \pi)|^2 = 2$, which is necessary and sufficient for filter bank reconstruction. However, $\hat h(0) \ne \sqrt{2}$ so the infinite product of such filters does not yield a wavelet basis of $L^2(\mathbb{R})$. Instead of imposing any vanishing moments, Smith and Barnwell [317], and later Vaidyanathan and Hoang [337], designed their filters to reduce the size of the transition band, where $|\hat h(\omega)|$ decays from nearly $\sqrt{2}$ to nearly 0 in the neighborhood of $\pm\pi/2$. This constraint is important in optimizing the transform code of audio signals, explained in Section 11.3.3. However, many cascades of these filters exhibit wild behavior. The Vaidyanathan-Hoang filters are tabulated in WaveLab. Many other classes of conjugate mirror filters with finite impulse response have been constructed [74, 73]. Recursive conjugate mirror filters may also be designed [209] to minimize the size of the transition band for a given number of zeroes at $\omega = \pi$. These filters have a fast but non-causal recursive implementation for signals of finite size.

7.3 Wavelets and Filter Banks ¹
Decomposition coe cients in a wavelet orthogonal basis are computed
with a fast algorithm that cascades discrete convolutions with h and g,
and subsamples the output. Section 7.3.1 derives this result from the
embedded structure of multiresolution approximations. A direct lter
bank analysis is performed in Section 7.3.2, which gives more general
perfect reconstruction conditions on the lters. Section 7.3.3 shows
that perfect reconstruction lter banks decompose signals in a basis of
l2( ). This basis is orthogonal for conjugate mirror lters. Z 7.3.1 Fast Orthogonal Wavelet Transform We describe a fast lter bank algorithm that computes the orthogonal
wavelet coe cients of a signal measured at a nite resolution. A fast
wavelet transform decomposes successively each approximation PVj f
into a coarser approximation PVj+1 f plus the wavelet coe cients carried
by PW j+1f . In the other direction, the reconstruction from wavelet
coe cients recovers each PVj f from PVj+1 f and PW j+1f .
Since f j ngn2Z and f j ngn2Z are orthonormal bases of Vj and Wj
the projection in these spaces is characterized by aj n] = hf j ni and dj n] = hf j ni : The following theorem 253, 255] shows that these coe cients are calculated with a cascade of discrete convolutions and subsamplings. We
denote x n] = x ;n] and x n] = x p] if n = 2p
0
if n = 2p + 1 : (7.106) 7.3. WAVELETS AND FILTER BANKS 345 Theorem 7.7 (Mallat) At the decomposition
aj+1 p] =
dj+1 p] = +1
X n=;1
+1 X n=;1 h n ; 2p] aj n] = aj ? h 2p] (7.107) g n ; 2p] aj n] = aj ? g 2p]: (7.108) At the reconstruction, aj p] =
= +1
X +1
X h p ; 2n] aj+1 n] +
g
n=;1
n=;1
aj+1 ? h p] + dj+1 ? g p]: Proof 1 . Proof of (7.107) Any j +1 p 2 Vj +1
in the orthonormal basis f j n gn2Z of Vj :
j +1 p = +1
X n=;1 h p ; 2n] dj+1 n]
(7.109) Vj can be decomposed j +1 p j ni j n: (7.110) With the change of variable t0 = 2;j t ; 2p we obtain h j +1 p j n i =
=
= Z +1 p 1j+1 ;1 2
1
p
2 2 2 Z;11
+
1
p t t 2 t ; 2j +1 p p1
j
2j +1
2 t ; 2j n dt
2j (t ; n + 2p) dt (t ; n + 2p) = h n ; 2p]: (7.111) Hence (7.110) implies that
j +1 p = +1
X n=;1 h n ; 2p] j n: (7.112) Computing the inner product of f with the vectors on each side of this
equality yields (7.107). CHAPTER 7. WAVELET BASES 346 Proof of (7.108) Since j +1 p 2 Wj +1
j +1 p = +1
X h n=;1 Vj , it can be decomposed as j +1 p j n i j n: As in (7.111), the change of variable t0 = 2;j t ; 2p proves that h j +1 p j ni = t 1
p (t ; n + 2p) = g n ; 2p] 2 2 and hence
j +1 p = +1
X n=;1 g n ; 2p] j n: (7.113)
(7.114) Taking the inner product with f on each side gives (7.108).
Proof of (7.109). Since $\mathbf{W}_{j+1}$ is the orthogonal complement of $\mathbf{V}_{j+1}$ in $\mathbf{V}_j$, the union of the two bases $\{\psi_{j+1,n}\}_{n \in \mathbb{Z}}$ and $\{\phi_{j+1,n}\}_{n \in \mathbb{Z}}$ is an orthonormal basis of $\mathbf{V}_j$. Hence any $\phi_{j,p}$ can be decomposed in this basis:
\[ \phi_{j,p} = \sum_{n=-\infty}^{+\infty} \langle \phi_{j,p}, \phi_{j+1,n} \rangle \, \phi_{j+1,n} + \sum_{n=-\infty}^{+\infty} \langle \phi_{j,p}, \psi_{j+1,n} \rangle \, \psi_{j+1,n}. \]
Inserting (7.111) and (7.113) yields
\[ \phi_{j,p} = \sum_{n=-\infty}^{+\infty} h[p - 2n] \, \phi_{j+1,n} + \sum_{n=-\infty}^{+\infty} g[p - 2n] \, \psi_{j+1,n}. \]
Taking the inner product with $f$ on both sides of this equality gives (7.109).

Theorem 7.7 proves that $a_{j+1}$ and $d_{j+1}$ are computed by taking every other sample of the convolution of $a_j$ with $\bar h$ and $\bar g$ respectively, where $\bar h[n] = h[-n]$ and $\bar g[n] = g[-n]$, as illustrated by Figure 7.12. The filter $\bar h$ removes the higher frequencies of the inner product sequence $a_j$ whereas $\bar g$ is a high-pass filter which collects the remaining highest frequencies. The reconstruction (7.109) is an interpolation that inserts zeroes to expand $a_{j+1}$ and $d_{j+1}$ and filters these signals, as shown in Figure 7.12.

[Figure 7.12: block diagrams of (a) the decomposition cascade and (b) the reconstruction cascade]
Figure 7.12: (a): A fast wavelet transform is computed with a cascade of filterings with $\bar h$ and $\bar g$ followed by a factor 2 subsampling. (b): A fast inverse wavelet transform reconstructs progressively each $a_j$ by inserting zeroes between samples of $a_{j+1}$ and $d_{j+1}$, filtering and adding the output.
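The decomposition (7.107)-(7.108) and the reconstruction (7.109) can be sketched in a few lines. This is only an illustration under stated assumptions: the signal is periodized (the circular convolutions of Section 7.5.1), the filter is the 4-tap Daubechies conjugate mirror filter, $g$ is derived as $g[n] = (-1)^{1-n} h[1-n]$, and the function names `analysis`, `synthesis`, `fwt` and `ifwt` are hypothetical.

```python
import math

# Assumed example filter: Daubechies conjugate mirror filter with 4 taps.
S3 = math.sqrt(3.0)
H = {n: c / (4.0 * math.sqrt(2.0))
     for n, c in enumerate([1 + S3, 3 + S3, 3 - S3, 1 - S3])}
# High-pass filter g[n] = (-1)^(1-n) h[1-n], stored as {index: coefficient}.
G = {1 - n: (-1) ** (n % 2) * c for n, c in H.items()}


def analysis(a, filt):
    """out[p] = sum_n filt[n - 2p] a[n], indices of a taken modulo len(a)."""
    size = len(a)
    return [sum(c * a[(k + 2 * p) % size] for k, c in filt.items())
            for p in range(size // 2)]


def synthesis(a_next, d_next):
    """a[p] = sum_n h[p - 2n] a_next[n] + sum_n g[p - 2n] d_next[n]."""
    size = 2 * len(a_next)
    out = [0.0] * size
    for n in range(len(a_next)):
        for k, c in H.items():
            out[(k + 2 * n) % size] += c * a_next[n]
        for k, c in G.items():
            out[(k + 2 * n) % size] += c * d_next[n]
    return out


def fwt(a, levels):
    """Iterate the decomposition; return the detail signals and the coarse part."""
    details = []
    for _ in range(levels):
        a, d = analysis(a, H), analysis(a, G)
        details.append(d)
    return details, a


def ifwt(details, a):
    """Iterate the reconstruction from the coarsest scale back to the finest."""
    for d in reversed(details):
        a = synthesis(a, d)
    return a


signal = [math.sin(2 * math.pi * n / 16) + (n % 5) for n in range(16)]
details, coarse = fwt(signal, 3)
rebuilt = ifwt(details, coarse)
error = max(abs(x - y) for x, y in zip(signal, rebuilt))
```

Here `error` should be at the level of floating point rounding, and the sum of squares of `coarse` and of all `details` should match the energy of `signal`, since the periodized basis is orthonormal for signals of even length.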
An orthogonal wavelet representation of $a_L = \langle f, \phi_{L,n} \rangle$ is composed of wavelet coefficients of $f$ at scales $2^L < 2^j \le 2^J$ plus the remaining approximation at the largest scale $2^J$:
\[ \left[ \{d_j\}_{L < j \le J}, \; a_J \right]. \tag{7.115} \]
It is computed from $a_L$ by iterating (7.107) and (7.108) for $L \le j < J$. Figure 7.7 gives a numerical example computed with the cubic spline filter of Table 7.1. The original signal $a_L$ is recovered from this wavelet representation by iterating the reconstruction (7.109) for $J > j \ge L$.

Initialization. Most often the discrete input signal $b[n]$ is obtained by a finite resolution device that averages and samples an analog input signal. For example, a CCD camera filters the light intensity by the optics and each photo-receptor averages the input light over its support. A pixel value thus measures average light intensity. If the sampling distance is $N^{-1}$, to define and compute the wavelet coefficients, we need to associate to $b[n]$ a function $f(t) \in \mathbf{V}_L$ approximated at the scale $2^L = N^{-1}$, and compute $a_L[n] = \langle f, \phi_{L,n} \rangle$. Problem 7.6 explains how to compute $a_L[n] = \langle f, \phi_{L,n} \rangle$ so that $b[n] = f(N^{-1} n)$.
A simpler and faster approach considers
\[ f(t) = \sum_{n=-\infty}^{+\infty} b[n] \, \phi\!\left(\frac{t - 2^L n}{2^L}\right) \in \mathbf{V}_L. \]
Since $\{\phi_{L,n}(t) = 2^{-L/2} \phi(2^{-L} t - n)\}_{n \in \mathbb{Z}}$ is orthonormal and $2^L = N^{-1}$,
\[ b[n] = N^{1/2} \langle f, \phi_{L,n} \rangle = N^{1/2} a_L[n]. \]
But $\hat\phi(0) = \int_{-\infty}^{+\infty} \phi(t) \, dt = 1$, so
\[ N^{1/2} a_L[n] = \int_{-\infty}^{+\infty} f(t) \, \frac{1}{N^{-1}} \, \phi\!\left(\frac{t - N^{-1} n}{N^{-1}}\right) dt \]
is a weighted average of $f$ in the neighborhood of $N^{-1} n$ over a domain proportional to $N^{-1}$. Hence if $f$ is regular,
\[ b[n] = N^{1/2} a_L[n] \approx f(N^{-1} n). \tag{7.116} \]
If $\psi$ is a Coiflet and $f(t)$ is regular in the neighborhood of $N^{-1} n$, then (7.105) shows that $N^{1/2} a_L[n]$ is a high order approximation of $f(N^{-1} n)$.

Finite Signals. Let us consider a signal $f$ whose support is in $[0, 1]$ and which is approximated with a uniform sampling at intervals $N^{-1}$. The resulting approximation $a_L$ has $N = 2^{-L}$ samples. This is the case in Figure 7.7 with $N = 512$. Computing the convolutions with $\bar h$ and $\bar g$ at abscissa close to 0 or close to $N$ requires knowing the values of $a_L[n]$ beyond the boundaries $n = 0$ and $n = N - 1$. These boundary problems may be solved with one of the three approaches described in Section 7.5.
Section 7.5.1 explains the simplest algorithm, which periodizes $a_L$. The convolutions in Theorem 7.7 are replaced by circular convolutions. This is equivalent to decomposing $f$ in a periodic wavelet basis of $\mathbf{L}^2[0, 1]$. This algorithm has the disadvantage of creating large wavelet coefficients at the borders.

If $\psi$ is symmetric or antisymmetric, we can use a folding procedure described in Section 7.5.2, which creates smaller wavelet coefficients at the border. It decomposes $f$ in a folded wavelet basis of $\mathbf{L}^2[0, 1]$. However, we mentioned in Section 7.2.3 that Haar is the only symmetric wavelet with a compact support. Higher order spline wavelets have a symmetry but $h$ must be truncated in numerical calculations.
The most effective boundary treatment is described in Section 7.5.3, but the implementation is more complicated. Boundary wavelets which keep their vanishing moments are designed to avoid creating large amplitude coefficients when $f$ is regular. The fast algorithm is implemented with special boundary filters, and requires the same number of calculations as the two other methods.

Complexity. Suppose that $h$ and $g$ have $K$ non-zero coefficients. Let $a_L$ be a signal of size $N = 2^{-L}$. With appropriate boundary calculations, each $a_j$ and $d_j$ has $2^{-j}$ samples. Equations (7.107) and (7.108) compute $a_{j+1}$ and $d_{j+1}$ from $a_j$ with $2^{-j} K$ additions and multiplications. Summing over the scales gives $\sum_{j=L}^{J-1} 2^{-j} K < 2 \cdot 2^{-L} K$, so the wavelet representation (7.115) is calculated with at most $2KN$ additions and multiplications. The reconstruction (7.109) of $a_j$ from $a_{j+1}$ and $d_{j+1}$ is also obtained with $2^{-j} K$ additions and multiplications. The original signal $a_L$ is thus also recovered from the wavelet representation with at most $2KN$ additions and multiplications.

Wavelet Graphs. The graphs of $\phi$ and $\psi$ are computed numerically with the inverse wavelet transform. If $f = \phi$ then $a_0[n] = \delta[n]$ and $d_j[n] = 0$ for all $L < j \le 0$. The inverse wavelet transform computes $a_L$ and (7.116) shows that
\[ N^{1/2} a_L[n] \approx \phi(N^{-1} n). \]
If $\phi$ is regular and $N$ is large enough, we recover a precise approximation of the graph of $\phi$ from $a_L$.
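As a rough illustration of this procedure, the sketch below starts from $a_0[n] = \delta[n]$ with all detail coefficients set to zero and iterates the reconstruction step. It assumes the 4-tap Daubechies filter, whose scaling function is supported in $[0, 3]$, and the names `inverse_step` and `scaling_function_samples` are hypothetical.

```python
import math

# Assumed example filter: Daubechies conjugate mirror filter with 4 taps.
S3 = math.sqrt(3.0)
H = [c / (4.0 * math.sqrt(2.0)) for c in (1 + S3, 3 + S3, 3 - S3, 1 - S3)]


def inverse_step(a_next):
    """One reconstruction step with zero details: a[p] = sum_n h[p - 2n] a_next[n]."""
    out = [0.0] * (2 * len(a_next))
    for n, value in enumerate(a_next):
        for k, c in enumerate(H):
            # Entries past the end only receive zero contributions in this setup.
            if k + 2 * n < len(out):
                out[k + 2 * n] += c * value
    return out


def scaling_function_samples(levels, support=4):
    """Approximate phi(n / N) for N = 2**levels on the interval [0, support)."""
    a = [0.0] * support
    a[0] = 1.0                      # a_0[n] = delta[n]
    for _ in range(levels):
        a = inverse_step(a)
    n_per_unit = 2 ** levels
    return [math.sqrt(n_per_unit) * v for v in a]   # N^(1/2) a_L[n]


samples = scaling_function_samples(6)
```

With `levels = 6` the list holds 64 samples per unit interval. A Riemann sum of these samples equals 1 up to rounding, which is consistent with $\int \phi(t)\,dt = 1$.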
Similarly, if $f = \psi$ then $a_0[n] = 0$, $d_0[n] = \delta[n]$ and $d_j[n] = 0$ for $L < j < 0$. Then $a_L[n]$ is calculated with the inverse wavelet transform and $N^{1/2} a_L[n] \approx \psi(N^{-1} n)$. The Daubechies wavelets and scaling functions in Figure 7.10 are calculated with this procedure.

7.3.2 Perfect Reconstruction Filter Banks

The fast discrete wavelet transform decomposes signals into low-pass and high-pass components subsampled by 2; the inverse transform performs the reconstruction. The study of such classical multirate filter banks became a major signal processing topic in 1976, when Croisier, Esteban and Galand [141] discovered that it is possible to perform such decompositions and reconstructions with quadrature mirror filters (Problem 7.7). However, besides the simple Haar filter, a quadrature mirror filter can not have a finite impulse response. In 1984, Smith and Barnwell [316] and Mintzer [272] found necessary and sufficient conditions for obtaining perfect reconstruction orthogonal filters with a finite impulse response, that they called conjugate mirror filters. The theory was completed by the biorthogonal equations of Vetterli [338, 339] and the general paraunitary matrix theory of Vaidyanathan [336]. We follow this digital signal processing approach which gives a simple understanding of conjugate mirror filter conditions. More complete presentations of filter banks properties can be found in [1, 2, 68, 73, 74].

Filter Bank. A two-channel multirate filter bank convolves a signal $a_0$ with a low-pass filter $\bar h[n] = h[-n]$ and a high-pass filter $\bar g[n] = g[-n]$ and subsamples by 2 the output:
\[ a_1[n] = a_0 \star \bar h[2n] \quad \text{and} \quad d_1[n] = a_0 \star \bar g[2n]. \tag{7.117} \]
A reconstructed signal $\tilde a_0$ is obtained by filtering the zero expanded signals with a dual low-pass filter $\tilde h$ and a dual high-pass filter $\tilde g$, as shown in Figure 7.13. With the zero insertion notation (7.106) it yields
\[ \tilde a_0[n] = \check a_1 \star \tilde h[n] + \check d_1 \star \tilde g[n]. \tag{7.118} \]
We study necessary and sufficient conditions on $h$, $g$, $\tilde h$ and $\tilde g$ to guarantee a perfect reconstruction $\tilde a_0 = a_0$.
[Figure 7.13: block diagram of the two-channel filter bank]
Figure 7.13: The input signal is filtered by a low-pass and a high-pass filter and subsampled. The reconstruction is performed by inserting zeroes and filtering with dual filters $\tilde h$ and $\tilde g$.

Subsampling and Zero Interpolation. Subsamplings and expansions with zero insertions have simple expressions in the Fourier domain. Since $\hat x(\omega) = \sum_{n=-\infty}^{+\infty} x[n] \, e^{-in\omega}$, the Fourier series of the subsampled signal $y[n] = x[2n]$ can be written
\[ \hat y(2\omega) = \sum_{n=-\infty}^{+\infty} x[2n] \, e^{-i2n\omega} = \frac{1}{2} \Big( \hat x(\omega) + \hat x(\omega + \pi) \Big). \tag{7.119} \]
The component $\hat x(\omega + \pi)$ creates a frequency folding. This aliasing must be canceled at the reconstruction.
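The folding formula (7.119) is easy to check numerically on a finite signal. The sketch below evaluates the Fourier series directly from its definition; the test signal and the helper names `fourier_series` and `aliasing_gap` are arbitrary choices made for this illustration.

```python
import cmath
import math


def fourier_series(x, omega):
    """x^(omega) = sum_n x[n] exp(-i n omega) for a finite signal indexed from 0."""
    return sum(v * cmath.exp(-1j * n * omega) for n, v in enumerate(x))


x = [1.0, -2.0, 0.5, 3.0, 4.0, -1.0, 2.0, 0.0]
y = x[::2]                          # subsampled signal y[n] = x[2n]


def aliasing_gap(omega):
    """Distance between the two sides of (7.119) at one frequency."""
    left = fourier_series(y, 2 * omega)
    right = 0.5 * (fourier_series(x, omega) + fourier_series(x, omega + math.pi))
    return abs(left - right)


worst = max(aliasing_gap(0.1 * k) for k in range(64))
```

The value of `worst` stays at the level of rounding errors, because the odd-indexed samples cancel in $\hat x(\omega) + \hat x(\omega + \pi)$ and the even-indexed ones are doubled.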
The insertion of zeros defines
\[ y[n] = \check x[n] = \begin{cases} x[p] & \text{if } n = 2p \\ 0 & \text{if } n = 2p + 1 \end{cases} \]
whose Fourier transform is
\[ \hat y(\omega) = \sum_{n=-\infty}^{+\infty} x[n] \, e^{-i2n\omega} = \hat x(2\omega). \tag{7.120} \]
The following theorem gives Vetterli's [339] biorthogonal conditions, which guarantee that $\tilde a_0 = a_0$.

Theorem 7.8 (Vetterli) The filter bank performs an exact reconstruction for any input signal if and only if
\[ \hat h^*(\omega + \pi) \, \hat{\tilde h}(\omega) + \hat g^*(\omega + \pi) \, \hat{\tilde g}(\omega) = 0 \tag{7.121} \]
and
\[ \hat h^*(\omega) \, \hat{\tilde h}(\omega) + \hat g^*(\omega) \, \hat{\tilde g}(\omega) = 2. \tag{7.122} \]
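Proof$^1$. We first relate the Fourier transform of $a_1$ and $d_1$ to the Fourier transform of $a_0$. Since $h$ and $g$ are real, the transfer functions of $\bar h$ and $\bar g$ are respectively $\hat h(-\omega) = \hat h^*(\omega)$ and $\hat g(-\omega) = \hat g^*(\omega)$. By using (7.119), we derive from the definition (7.117) of $a_1$ and $d_1$ that
\[ \hat a_1(2\omega) = \frac{1}{2} \Big( \hat a_0(\omega) \, \hat h^*(\omega) + \hat a_0(\omega + \pi) \, \hat h^*(\omega + \pi) \Big) \tag{7.123} \]
\[ \hat d_1(2\omega) = \frac{1}{2} \Big( \hat a_0(\omega) \, \hat g^*(\omega) + \hat a_0(\omega + \pi) \, \hat g^*(\omega + \pi) \Big). \tag{7.124} \]
The expression (7.118) of $\tilde a_0$ and the zero insertion property (7.120) also imply
\[ \hat{\tilde a}_0(\omega) = \hat a_1(2\omega) \, \hat{\tilde h}(\omega) + \hat d_1(2\omega) \, \hat{\tilde g}(\omega). \tag{7.125} \]
Hence
\[ \hat{\tilde a}_0(\omega) = \frac{1}{2} \Big( \hat h^*(\omega) \, \hat{\tilde h}(\omega) + \hat g^*(\omega) \, \hat{\tilde g}(\omega) \Big) \hat a_0(\omega) + \frac{1}{2} \Big( \hat h^*(\omega + \pi) \, \hat{\tilde h}(\omega) + \hat g^*(\omega + \pi) \, \hat{\tilde g}(\omega) \Big) \hat a_0(\omega + \pi). \]
To obtain $a_0 = \tilde a_0$ for all $a_0$, the filters must cancel the aliasing term $\hat a_0(\omega + \pi)$ and guarantee a unit gain for $\hat a_0(\omega)$, which proves equations (7.121) and (7.122).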
Theorem 7.8 proves that the reconstruction filters $\tilde h$ and $\tilde g$ are entirely specified by the decomposition filters $h$ and $g$. In matrix form, it can be rewritten
\[ \begin{pmatrix} \hat h(\omega) & \hat g(\omega) \\ \hat h(\omega + \pi) & \hat g(\omega + \pi) \end{pmatrix} \begin{pmatrix} \hat{\tilde h}^*(\omega) \\ \hat{\tilde g}^*(\omega) \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \end{pmatrix}. \tag{7.126} \]
The inversion of this $2 \times 2$ matrix yields
\[ \begin{pmatrix} \hat{\tilde h}^*(\omega) \\ \hat{\tilde g}^*(\omega) \end{pmatrix} = \frac{2}{\Delta(\omega)} \begin{pmatrix} \hat g(\omega + \pi) \\ -\hat h(\omega + \pi) \end{pmatrix} \tag{7.127} \]
where $\Delta(\omega)$ is the determinant
\[ \Delta(\omega) = \hat h(\omega) \, \hat g(\omega + \pi) - \hat h(\omega + \pi) \, \hat g(\omega). \tag{7.128} \]
The reconstruction filters are stable only if the determinant does not vanish for all $\omega \in [-\pi, \pi]$. Vaidyanathan [336] has extended this result to multirate filter banks with an arbitrary number $M$ of channels by showing that the resulting matrices of filters satisfy paraunitary properties [73].

Finite Impulse Response. When all filters have a finite impulse response, the determinant $\Delta(\omega)$ can be evaluated. This yields simpler relations between the decomposition and reconstruction filters.
Theorem 7.9 Perfect reconstruction filters satisfy
\[ \hat h^*(\omega) \, \hat{\tilde h}(\omega) + \hat h^*(\omega + \pi) \, \hat{\tilde h}(\omega + \pi) = 2. \tag{7.129} \]
For finite impulse response filters, there exist $a \in \mathbb{R}$ and $l \in \mathbb{Z}$ such that
\[ \hat g(\omega) = a \, e^{-i(2l+1)\omega} \, \hat{\tilde h}^*(\omega + \pi) \quad \text{and} \quad \hat{\tilde g}(\omega) = a^{-1} \, e^{-i(2l+1)\omega} \, \hat h^*(\omega + \pi). \tag{7.130} \]

Proof$^1$. Equation (7.127) proves that
\[ \hat{\tilde h}^*(\omega) = \frac{2}{\Delta(\omega)} \, \hat g(\omega + \pi) \quad \text{and} \quad \hat{\tilde g}^*(\omega) = \frac{-2}{\Delta(\omega)} \, \hat h(\omega + \pi). \tag{7.131} \]
Hence
\[ \hat g(\omega) \, \hat{\tilde g}^*(\omega) = -\frac{\Delta(\omega + \pi)}{\Delta(\omega)} \, \hat{\tilde h}^*(\omega + \pi) \, \hat h(\omega + \pi). \tag{7.132} \]
The definition (7.128) implies that $\Delta(\omega + \pi) = -\Delta(\omega)$. Inserting (7.132) in (7.122) yields (7.129).

The Fourier transform of finite impulse response filters is a finite series in $\exp(\pm in\omega)$. The determinant $\Delta(\omega)$ defined by (7.128) is therefore a finite series. Moreover (7.131) proves that $\Delta^{-1}(\omega)$ must also be a finite series. A finite series in $\exp(\pm in\omega)$ whose inverse is also a finite series must have a single term. Since $\Delta(\omega) = -\Delta(\omega + \pi)$ the exponent $n$ must be odd. This proves that there exist $l \in \mathbb{Z}$ and $a \in \mathbb{R}$ such that
\[ \Delta(\omega) = -2a \exp[i(2l+1)\omega]. \tag{7.133} \]
Inserting this expression in (7.131) yields (7.130).

The factor $a$ is a gain which is inverse for the decomposition and reconstruction filters and $l$ is a reverse shift. We generally set $a = 1$ and $l = 0$. In the time domain (7.130) can then be rewritten
\[ g[n] = (-1)^{1-n} \, \tilde h[1 - n] \quad \text{and} \quad \tilde g[n] = (-1)^{1-n} \, h[1 - n]. \tag{7.134} \]
The two pairs of filters $(h, g)$ and $(\tilde h, \tilde g)$ play a symmetric role and can be inverted.

Conjugate Mirror Filters. If we impose that the decomposition filter $h$ is equal to the reconstruction filter $\tilde h$, then (7.129) is the condition of Smith and Barnwell [316] and Mintzer [272] that defines conjugate mirror filters:
\[ |\hat h(\omega)|^2 + |\hat h(\omega + \pi)|^2 = 2. \tag{7.135} \]
It is identical to the filter condition (7.34) that is required in order to synthesize orthogonal wavelets. The next section proves that it is also equivalent to discrete orthogonality properties.

7.3.3 Biorthogonal Bases of $\mathbf{l}^2(\mathbb{Z})$ $^2$

The decomposition of a discrete signal in a multirate filter bank is interpreted as an expansion in a basis of $\mathbf{l}^2(\mathbb{Z})$. Observe first that the low-pass and high-pass signals of a filter bank computed with (7.117) can be rewritten as inner products in $\mathbf{l}^2(\mathbb{Z})$:
\[ a_1[l] = \sum_{n=-\infty}^{+\infty} a_0[n] \, h[n - 2l] = \langle a_0[n], h[n - 2l] \rangle \tag{7.136} \]
\[ d_1[l] = \sum_{n=-\infty}^{+\infty} a_0[n] \, g[n - 2l] = \langle a_0[n], g[n - 2l] \rangle. \tag{7.137} \]
The signal recovered by the reconstructing filters is
\[ a_0[n] = \sum_{l=-\infty}^{+\infty} a_1[l] \, \tilde h[n - 2l] + \sum_{l=-\infty}^{+\infty} d_1[l] \, \tilde g[n - 2l]. \tag{7.138} \]
Inserting (7.136) and (7.137) yields
\[ a_0[n] = \sum_{l=-\infty}^{+\infty} \langle a_0[k], h[k - 2l] \rangle \, \tilde h[n - 2l] + \sum_{l=-\infty}^{+\infty} \langle a_0[k], g[k - 2l] \rangle \, \tilde g[n - 2l]. \tag{7.139} \]
We recognize the decomposition of $a_0$ over dual families of vectors $\{\tilde h[n - 2l], \tilde g[n - 2l]\}_{l \in \mathbb{Z}}$ and $\{h[n - 2l], g[n - 2l]\}_{l \in \mathbb{Z}}$. The following theorem proves that these two families are biorthogonal.
Theorem 7.10 If $h$, $g$, $\tilde h$ and $\tilde g$ are perfect reconstruction filters whose Fourier transform is bounded then $\{\tilde h[n - 2l], \tilde g[n - 2l]\}_{l \in \mathbb{Z}}$ and $\{h[n - 2l], g[n - 2l]\}_{l \in \mathbb{Z}}$ are biorthogonal Riesz bases of $\mathbf{l}^2(\mathbb{Z})$.

Proof$^2$. To prove that these families are biorthogonal we must show that for all $l \in \mathbb{Z}$
\[ \langle \tilde h[n], h[n - 2l] \rangle = \delta[l] \tag{7.140} \]
\[ \langle \tilde g[n], g[n - 2l] \rangle = \delta[l] \tag{7.141} \]
and
\[ \langle \tilde h[n], g[n - 2l] \rangle = \langle \tilde g[n], h[n - 2l] \rangle = 0. \tag{7.142} \]
For perfect reconstruction filters, (7.129) proves that
\[ \frac{1}{2} \Big( \hat h^*(\omega) \, \hat{\tilde h}(\omega) + \hat h^*(\omega + \pi) \, \hat{\tilde h}(\omega + \pi) \Big) = 1. \]
In the time domain, this equation becomes
\[ \bar h \star \tilde h[2l] = \sum_{n=-\infty}^{+\infty} \tilde h[n] \, h[n - 2l] = \delta[l] \tag{7.143} \]
which verifies (7.140). The same proof as for (7.129) shows that
\[ \frac{1}{2} \Big( \hat g^*(\omega) \, \hat{\tilde g}(\omega) + \hat g^*(\omega + \pi) \, \hat{\tilde g}(\omega + \pi) \Big) = 1. \]
In the time domain, this equation yields (7.141). It also follows from (7.127) that
\[ \frac{1}{2} \Big( \hat g^*(\omega) \, \hat{\tilde h}(\omega) + \hat g^*(\omega + \pi) \, \hat{\tilde h}(\omega + \pi) \Big) = 0 \]
and
\[ \frac{1}{2} \Big( \hat h^*(\omega) \, \hat{\tilde g}(\omega) + \hat h^*(\omega + \pi) \, \hat{\tilde g}(\omega + \pi) \Big) = 0. \]
The inverse Fourier transforms of these two equations yield (7.142).

To finish the proof, one must show the existence of Riesz bounds defined in (A.12). The reader can verify that this is a consequence of the fact that the Fourier transform of each filter is bounded.

Orthogonal Bases. A Riesz basis is orthonormal if the dual basis is the same as the original basis. For filter banks, this means that $h = \tilde h$ and $g = \tilde g$. The filter $h$ is then a conjugate mirror filter
\[ |\hat h(\omega)|^2 + |\hat h(\omega + \pi)|^2 = 2. \tag{7.144} \]
The resulting family $\{h[n - 2l], g[n - 2l]\}_{l \in \mathbb{Z}}$ is an orthogonal basis of $\mathbf{l}^2(\mathbb{Z})$.

Discrete Wavelet Bases. The construction of conjugate mirror filters is simpler than the construction of orthogonal wavelet bases of $\mathbf{L}^2(\mathbb{R})$. Why then should we bother with continuous time models of wavelets, since in any case all computations are discrete and rely on conjugate mirror filters? The reason is that conjugate mirror filters are most often used in filter banks that cascade several levels of filterings and subsamplings. It is thus necessary to understand the behavior of such a cascade [290]. In a wavelet filter bank tree, the output of the low-pass filter $\bar h$ is sub-decomposed whereas the output of the high-pass filter $\bar g$ is not; this is illustrated in Figure 7.12. Suppose that the sampling distance of the original discrete signal is $N^{-1}$. We denote $a_L[n]$ this discrete signal, with $2^L = N^{-1}$. At the depth $j - L \ge 0$ of this filter bank tree, the low-pass signal $a_j$ and high-pass signal $d_j$ can be written
\[ a_j[l] = a_L \star \bar\phi_j[2^{j-L} l] = \langle a_L[n], \phi_j[n - 2^{j-L} l] \rangle \]
and
\[ d_j[l] = a_L \star \bar\psi_j[2^{j-L} l] = \langle a_L[n], \psi_j[n - 2^{j-L} l] \rangle. \]
The Fourier transforms of these equivalent filters are
\[ \hat\phi_j(\omega) = \prod_{p=0}^{j-L-1} \hat h(2^p \omega) \quad \text{and} \quad \hat\psi_j(\omega) = \hat g(2^{j-L-1} \omega) \prod_{p=0}^{j-L-2} \hat h(2^p \omega). \tag{7.145} \]
A filter bank tree of depth $J - L \ge 0$ decomposes $a_L$ over the family of vectors
\[ \left[ \Big\{ \phi_J[n - 2^{J-L} l] \Big\}_{l \in \mathbb{Z}}, \; \Big\{ \psi_j[n - 2^{j-L} l] \Big\}_{L < j \le J, \; l \in \mathbb{Z}} \right]. \tag{7.146} \]
For conjugate mirror filters, one can verify that this family is an orthonormal basis of $\mathbf{l}^2(\mathbb{Z})$. These discrete vectors are close to a uniform sampling of the continuous time scaling functions $\phi_j(t) = 2^{-j/2} \phi(2^{-j} t)$ and wavelets $\psi_j(t) = 2^{-j/2} \psi(2^{-j} t)$. When the number $j - L$ of successive convolutions increases, one can verify that $\phi_j[n]$ and $\psi_j[n]$ converge respectively to $N^{-1/2} \phi_j(N^{-1} n)$ and $N^{-1/2} \psi_j(N^{-1} n)$. The factor $N^{-1/2}$ normalizes the $\mathbf{l}^2(\mathbb{Z})$ norm of these sampled functions. If $j - L = 4$ then $\phi_j[n]$ and $\psi_j[n]$ are already very close to these limit values. The impulse responses $\phi_j[n]$ and $\psi_j[n]$ of the filter bank are thus much closer to continuous time scaling functions and wavelets than they are to the original conjugate mirror filters $h$ and $g$. This explains why wavelets provide appropriate models for understanding the applications of these filter banks. Chapter 8 relates more general filter banks to wavelet packet bases.
If the decomposition and reconstruction filters of the filter bank are different, the resulting basis (7.146) is non-orthogonal. The stability of this discrete wavelet basis does not degrade when the depth $J - L$ of the filter bank increases. The next section shows that the corresponding continuous time wavelet $\psi(t)$ generates a Riesz basis of $\mathbf{L}^2(\mathbb{R})$.

7.4 Biorthogonal Wavelet Bases $^2$

The stability and completeness properties of biorthogonal wavelet bases are described for perfect reconstruction filters $h$ and $\tilde h$ having a finite impulse response. The design of linear phase wavelets with compact support is explained in Section 7.4.2.

7.4.1 Construction of Biorthogonal Wavelet Bases

An infinite cascade of perfect reconstruction filters $(h, g)$ and $(\tilde h, \tilde g)$ yields two scaling functions and wavelets whose Fourier transforms satisfy
\[ \hat\phi(2\omega) = \frac{1}{\sqrt{2}} \, \hat h(\omega) \, \hat\phi(\omega), \qquad \hat{\tilde\phi}(2\omega) = \frac{1}{\sqrt{2}} \, \hat{\tilde h}(\omega) \, \hat{\tilde\phi}(\omega) \tag{7.147} \]
\[ \hat\psi(2\omega) = \frac{1}{\sqrt{2}} \, \hat g(\omega) \, \hat\phi(\omega), \qquad \hat{\tilde\psi}(2\omega) = \frac{1}{\sqrt{2}} \, \hat{\tilde g}(\omega) \, \hat{\tilde\phi}(\omega). \tag{7.148} \]
In the time domain, these relations become
\[ \phi(t) = \sqrt{2} \sum_{n=-\infty}^{+\infty} h[n] \, \phi(2t - n), \qquad \tilde\phi(t) = \sqrt{2} \sum_{n=-\infty}^{+\infty} \tilde h[n] \, \tilde\phi(2t - n) \tag{7.149} \]
\[ \psi(t) = \sqrt{2} \sum_{n=-\infty}^{+\infty} g[n] \, \phi(2t - n), \qquad \tilde\psi(t) = \sqrt{2} \sum_{n=-\infty}^{+\infty} \tilde g[n] \, \tilde\phi(2t - n). \tag{7.150} \]
The perfect reconstruction conditions are given by Theorem 7.9. If we normalize the gain and shift to $a = 1$ and $l = 0$, the filters must satisfy
\[ \hat h^*(\omega) \, \hat{\tilde h}(\omega) + \hat h^*(\omega + \pi) \, \hat{\tilde h}(\omega + \pi) = 2 \tag{7.151} \]
and
\[ \hat g(\omega) = e^{-i\omega} \, \hat{\tilde h}^*(\omega + \pi), \qquad \hat{\tilde g}(\omega) = e^{-i\omega} \, \hat h^*(\omega + \pi). \tag{7.152} \]
Wavelets should have a zero average, which means that $\hat\psi(0) = \hat{\tilde\psi}(0) = 0$. This is obtained by setting $\hat g(0) = \hat{\tilde g}(0) = 0$ and hence $\hat h(\pi) = \hat{\tilde h}(\pi) = 0$. The perfect reconstruction condition (7.151) implies that $\hat h^*(0) \, \hat{\tilde h}(0) = 2$. Since both filters are defined up to multiplicative constants respectively equal to $\lambda$ and $\lambda^{-1}$, we adjust $\lambda$ so that $\hat h(0) = \hat{\tilde h}(0) = \sqrt{2}$.

In the following, we also suppose that $h$ and $\tilde h$ are finite impulse response filters. One can then prove [21] that
\[ \hat\phi(\omega) = \prod_{p=1}^{+\infty} \frac{\hat h(2^{-p} \omega)}{\sqrt{2}} \quad \text{and} \quad \hat{\tilde\phi}(\omega) = \prod_{p=1}^{+\infty} \frac{\hat{\tilde h}(2^{-p} \omega)}{\sqrt{2}} \tag{7.153} \]
are the Fourier transforms of distributions of compact support. However, these distributions may exhibit wild behavior and have infinite energy. Some further conditions must be imposed to guarantee that $\hat\phi$ and $\hat{\tilde\phi}$ are the Fourier transforms of finite energy functions. The following theorem gives sufficient conditions on the perfect reconstruction filters for synthesizing biorthogonal wavelet bases of $\mathbf{L}^2(\mathbb{R})$.

Theorem 7.11 (Cohen, Daubechies, Feauveau) Suppose that there exist strictly positive trigonometric polynomials $P(e^{i\omega})$ and $\tilde P(e^{i\omega})$ such that
\[ \left| \hat h\!\left(\frac{\omega}{2}\right) \right|^2 P(e^{i\omega/2}) + \left| \hat h\!\left(\frac{\omega}{2} + \pi\right) \right|^2 P(e^{i(\omega/2 + \pi)}) = 2 \, P(e^{i\omega}) \tag{7.154} \]
\[ \left| \hat{\tilde h}\!\left(\frac{\omega}{2}\right) \right|^2 \tilde P(e^{i\omega/2}) + \left| \hat{\tilde h}\!\left(\frac{\omega}{2} + \pi\right) \right|^2 \tilde P(e^{i(\omega/2 + \pi)}) = 2 \, \tilde P(e^{i\omega}) \tag{7.155} \]
and that $P$ and $\tilde P$ are unique (up to normalization). Suppose that
\[ \inf_{\omega \in [-\pi/2, \pi/2]} |\hat h(\omega)| > 0, \qquad \inf_{\omega \in [-\pi/2, \pi/2]} |\hat{\tilde h}(\omega)| > 0. \tag{7.156} \]
Then the functions $\hat\phi$ and $\hat{\tilde\phi}$ defined in (7.153) belong to $\mathbf{L}^2(\mathbb{R})$, and $\phi$, $\tilde\phi$ satisfy biorthogonal relations
\[ \langle \phi(t), \tilde\phi(t - n) \rangle = \delta[n]. \tag{7.157} \]
The two wavelet families $\{\psi_{j,n}\}_{(j,n) \in \mathbb{Z}^2}$ and $\{\tilde\psi_{j,n}\}_{(j,n) \in \mathbb{Z}^2}$ are biorthogonal Riesz bases of $\mathbf{L}^2(\mathbb{R})$.

The proof of this theorem is in [131] and [21]. The hypothesis (7.156) is also imposed by Theorem 7.2, which constructs orthogonal bases of scaling functions. The conditions (7.154) and (7.155) do not appear in the construction of wavelet orthogonal bases because they are always satisfied with $P(e^{i\omega}) = \tilde P(e^{i\omega}) = 1$ and one can prove that constants are the only invariant trigonometric polynomials [247].
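Biorthogonality means that for any $(j, j', n, n') \in \mathbb{Z}^4$,
\[ \langle \psi_{j,n}, \tilde\psi_{j',n'} \rangle = \delta[n - n'] \, \delta[j - j']. \tag{7.158} \]
Any $f \in \mathbf{L}^2(\mathbb{R})$ has two possible decompositions in these bases:
\[ f = \sum_{n,j=-\infty}^{+\infty} \langle f, \psi_{j,n} \rangle \, \tilde\psi_{j,n} = \sum_{n,j=-\infty}^{+\infty} \langle f, \tilde\psi_{j,n} \rangle \, \psi_{j,n}. \tag{7.159} \]
The Riesz stability implies that there exist $A > 0$ and $B > 0$ such that
\[ A \, \|f\|^2 \le \sum_{n,j=-\infty}^{+\infty} |\langle f, \psi_{j,n} \rangle|^2 \le B \, \|f\|^2 \tag{7.160} \]
\[ \frac{1}{B} \, \|f\|^2 \le \sum_{n,j=-\infty}^{+\infty} |\langle f, \tilde\psi_{j,n} \rangle|^2 \le \frac{1}{A} \, \|f\|^2. \tag{7.161} \]

Multiresolutions. Biorthogonal wavelet bases are related to multiresolution approximations. The family $\{\phi(t - n)\}_{n \in \mathbb{Z}}$ is a Riesz basis of the space $\mathbf{V}_0$ it generates, whereas $\{\tilde\phi(t - n)\}_{n \in \mathbb{Z}}$ is a Riesz basis of another space $\tilde{\mathbf{V}}_0$. Let $\mathbf{V}_j$ and $\tilde{\mathbf{V}}_j$ be the spaces defined by
\[ f(t) \in \mathbf{V}_j \Leftrightarrow f(2^j t) \in \mathbf{V}_0, \qquad f(t) \in \tilde{\mathbf{V}}_j \Leftrightarrow f(2^j t) \in \tilde{\mathbf{V}}_0. \]
One can verify that $\{\mathbf{V}_j\}_{j \in \mathbb{Z}}$ and $\{\tilde{\mathbf{V}}_j\}_{j \in \mathbb{Z}}$ are two multiresolution approximations of $\mathbf{L}^2(\mathbb{R})$. For any $j \in \mathbb{Z}$, $\{\phi_{j,n}\}_{n \in \mathbb{Z}}$ and $\{\tilde\phi_{j,n}\}_{n \in \mathbb{Z}}$ are Riesz bases of $\mathbf{V}_j$ and $\tilde{\mathbf{V}}_j$. The dilated wavelets $\{\psi_{j,n}\}_{n \in \mathbb{Z}}$ and $\{\tilde\psi_{j,n}\}_{n \in \mathbb{Z}}$ are bases of two detail spaces $\mathbf{W}_j$ and $\tilde{\mathbf{W}}_j$ such that
\[ \mathbf{V}_j \oplus \mathbf{W}_j = \mathbf{V}_{j-1} \quad \text{and} \quad \tilde{\mathbf{V}}_j \oplus \tilde{\mathbf{W}}_j = \tilde{\mathbf{V}}_{j-1}. \]
The biorthogonality of the decomposition and reconstruction wavelets implies that $\mathbf{W}_j$ is not orthogonal to $\mathbf{V}_j$ but is to $\tilde{\mathbf{V}}_j$ whereas $\tilde{\mathbf{W}}_j$ is not orthogonal to $\tilde{\mathbf{V}}_j$ but is to $\mathbf{V}_j$.

Fast Biorthogonal Wavelet Transform. The perfect reconstruction filter bank studied in Section 7.3.2 implements a fast biorthogonal wavelet transform. For any discrete signal input $b[n]$ sampled at intervals $N^{-1} = 2^L$, there exists $f \in \mathbf{V}_L$ such that $a_L[n] = \langle f, \phi_{L,n} \rangle = N^{-1/2} b[n]$. The wavelet coefficients are computed by successive convolutions with $\bar h$ and $\bar g$. Let $a_j[n] = \langle f, \phi_{j,n} \rangle$ and $d_j[n] = \langle f, \psi_{j,n} \rangle$. As in Theorem 7.7, one can prove that
\[ a_{j+1}[n] = a_j \star \bar h[2n], \qquad d_{j+1}[n] = a_j \star \bar g[2n]. \tag{7.162} \]
The reconstruction is performed with the dual filters $\tilde h$ and $\tilde g$:
\[ a_j[n] = \check a_{j+1} \star \tilde h[n] + \check d_{j+1} \star \tilde g[n]. \tag{7.163} \]
If $a_L$ includes $N$ non-zero samples, the biorthogonal wavelet representation $[\{d_j\}_{L < j \le J}, a_J]$ is calculated with $O(N)$ operations, by iterating (7.162) for $L \le j < J$. The reconstruction of $a_L$ by applying (7.163) for $J > j \ge L$ requires the same number of operations.

7.4.2 Biorthogonal Wavelet Design $^2$

The support size, the number of vanishing moments, the regularity and the symmetry of biorthogonal wavelets are controlled with an appropriate design of $h$ and $\tilde h$.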
Support. If the perfect reconstruction filters $h$ and $\tilde h$ have a finite impulse response then the corresponding scaling functions and wavelets also have a compact support. As in Section 7.2.1, one can show that if $h[n]$ and $\tilde h[n]$ are non-zero respectively for $N_1 \le n \le N_2$ and $\tilde N_1 \le n \le \tilde N_2$, then $\phi$ and $\tilde\phi$ have a support respectively equal to $[N_1, N_2]$ and $[\tilde N_1, \tilde N_2]$. Since
\[ g[n] = (-1)^{1-n} \, \tilde h[1 - n] \quad \text{and} \quad \tilde g[n] = (-1)^{1-n} \, h[1 - n], \]
the supports of $\psi$ and $\tilde\psi$ defined in (7.150) are respectively
\[ \left[ \frac{N_1 - \tilde N_2 + 1}{2}, \; \frac{N_2 - \tilde N_1 + 1}{2} \right] \quad \text{and} \quad \left[ \frac{\tilde N_1 - N_2 + 1}{2}, \; \frac{\tilde N_2 - N_1 + 1}{2} \right]. \tag{7.164} \]
Both wavelets thus have a support of the same size and equal to
\[ l = \frac{N_2 - N_1 + \tilde N_2 - \tilde N_1}{2}. \tag{7.165} \]

Vanishing Moments. The number of vanishing moments of $\psi$ and $\tilde\psi$ depends on the number of zeroes at $\omega = \pi$ of $\hat h(\omega)$ and $\hat{\tilde h}(\omega)$. Theorem 7.4 proves that $\psi$ has $\tilde p$ vanishing moments if the derivatives of its Fourier transform satisfy $\hat\psi^{(k)}(0) = 0$ for $k < \tilde p$. Since $\hat\phi(0) = 1$, (7.148) implies that it is equivalent to impose that $\hat g(\omega)$ has a zero of order $\tilde p$ at $\omega = 0$. Since $\hat g(\omega) = e^{-i\omega} \, \hat{\tilde h}^*(\omega + \pi)$, this means that $\hat{\tilde h}(\omega)$ has a zero of order $\tilde p$ at $\omega = \pi$. Similarly the number of vanishing moments of $\tilde\psi$ is equal to the number $p$ of zeroes of $\hat h(\omega)$ at $\pi$.

Regularity. Although the regularity of a function is a priori independent of the number of vanishing moments, the smoothness of biorthogonal wavelets is related to their vanishing moments. The regularity of $\phi$ and $\psi$ is the same because (7.150) shows that $\psi$ is a finite linear expansion of $\phi$ translated. Tchamitchian's Proposition 7.3 gives a sufficient condition for estimating this regularity. If $\hat h(\omega)$ has a zero of order $p$ at $\pi$, we can perform the factorization
\[ \hat h(\omega) = \left( \frac{1 + e^{-i\omega}}{2} \right)^{p} \hat l(\omega). \tag{7.166} \]
Let $B = \sup_{\omega \in [-\pi, \pi]} |\hat l(\omega)|$. Proposition 7.3 proves that $\phi$ is uniformly Lipschitz $\alpha$ for
\[ \alpha < \alpha_0 = p - \log_2 B - 1. \]
Generally, $\log_2 B$ increases more slowly than $p$. This implies that the regularity of $\phi$ and $\psi$ increases with $p$, which is equal to the number of vanishing moments of $\tilde\psi$. Similarly, one can show that the regularity of $\tilde\psi$ and $\tilde\phi$ increases with $\tilde p$, which is the number of vanishing moments of $\psi$. If $\hat h$ and $\hat{\tilde h}$ have different numbers of zeroes at $\pi$, the properties of $\psi$ and $\tilde\psi$ can therefore be very different.

Ordering of Wavelets. Since $\psi$ and $\tilde\psi$ might not have the same regularity and number of vanishing moments, the two reconstruction formulas
\[ f = \sum_{n,j=-\infty}^{+\infty} \langle f, \psi_{j,n} \rangle \, \tilde\psi_{j,n} \tag{7.167} \]
\[ f = \sum_{n,j=-\infty}^{+\infty} \langle f, \tilde\psi_{j,n} \rangle \, \psi_{j,n} \tag{7.168} \]
are not equivalent. The decomposition (7.167) is obtained with the filters $(h, g)$ at the decomposition and $(\tilde h, \tilde g)$ at the reconstruction. The inverse formula (7.168) corresponds to $(\tilde h, \tilde g)$ at the decomposition and $(h, g)$ at the reconstruction.
To produce small wavelet coefficients in regular regions we must compute the inner products using the wavelet with the maximum number of vanishing moments. The reconstruction is then performed with the other wavelet, which is generally the smoothest one. If errors are added to the wavelet coefficients, for example with a quantization, a smooth wavelet at the reconstruction introduces a smooth error. The number of vanishing moments of $\psi$ is equal to the number $\tilde p$ of zeroes at $\pi$ of $\hat{\tilde h}$. Increasing $\tilde p$ also increases the regularity of $\tilde\psi$. It is thus better to use $h$ at the decomposition and $\tilde h$ at the reconstruction if $\hat h$ has fewer zeroes at $\pi$ than $\hat{\tilde h}$.

Symmetry. It is possible to construct smooth biorthogonal wavelets of compact support which are either symmetric or antisymmetric. This is impossible for orthogonal wavelets, besides the particular case of the Haar basis. Symmetric or antisymmetric wavelets are synthesized with perfect reconstruction filters having a linear phase. If $h$ and $\tilde h$ have an odd number of non-zero samples and are symmetric about $n = 0$, the reader can verify that $\phi$ and $\tilde\phi$ are symmetric about $t = 0$ while $\psi$ and $\tilde\psi$ are symmetric with respect to a shifted center. If $h$ and $\tilde h$ have an even number of non-zero samples and are symmetric about $n = 1/2$, then $\phi(t)$ and $\tilde\phi(t)$ are symmetric about $t = 1/2$, while $\psi$ and $\tilde\psi$ are antisymmetric with respect to a shifted center. When the wavelets are symmetric or antisymmetric, wavelet bases over finite intervals are constructed with the folding procedure of Section 7.5.2.

7.4.3 Compactly Supported Biorthogonal Wavelets $^2$

We study the design of biorthogonal wavelets with a minimum size support for a specified number of vanishing moments. Symmetric or antisymmetric compactly supported spline biorthogonal wavelet bases are constructed with a technique introduced in [131].
Theorem 7.12 (Cohen, Daubechies, Feauveau) Biorthogonal wavelets $\psi$ and $\tilde\psi$ with respectively $\tilde p$ and $p$ vanishing moments have a support of size at least $p + \tilde p - 1$. CDF biorthogonal wavelets have a minimum support of size $p + \tilde p - 1$.

Proof$^3$. The proof follows the same approach as the proof of Daubechies's Theorem 7.5. One can verify that $p$ and $\tilde p$ must necessarily have the same parity. We concentrate on filters $h[n]$ and $\tilde h[n]$ that have a symmetry with respect to $n = 0$ or $n = 1/2$. The general case proceeds similarly. We can then factor
\[ \hat h(\omega) = \sqrt{2} \, \exp\!\left( \frac{-i \epsilon \omega}{2} \right) \left( \cos\frac{\omega}{2} \right)^{p} L(\cos\omega) \tag{7.169} \]
\[ \hat{\tilde h}(\omega) = \sqrt{2} \, \exp\!\left( \frac{-i \epsilon \omega}{2} \right) \left( \cos\frac{\omega}{2} \right)^{\tilde p} \tilde L(\cos\omega) \tag{7.170} \]
with $\epsilon = 0$ for $p$ and $\tilde p$ even and $\epsilon = 1$ for odd values. Let $q = (p + \tilde p)/2$. The perfect reconstruction condition
\[ \hat h^*(\omega) \, \hat{\tilde h}(\omega) + \hat h^*(\omega + \pi) \, \hat{\tilde h}(\omega + \pi) = 2 \]
is imposed by writing
\[ L(\cos\omega) \, \tilde L(\cos\omega) = P\!\left( \sin^2\frac{\omega}{2} \right) \tag{7.171} \]
where the polynomial $P(y)$ must satisfy for all $y \in [0, 1]$
\[ (1 - y)^q \, P(y) + y^q \, P(1 - y) = 1. \tag{7.172} \]
We saw in (7.101) that the polynomial of minimum degree satisfying this equation is
\[ P(y) = \sum_{k=0}^{q-1} \binom{q - 1 + k}{k} \, y^k. \tag{7.173} \]
The spectral factorization (7.171) is solved with a root attribution similar to (7.103). The resulting minimum support of $\psi$ and $\tilde\psi$ specified by (7.165) is then $p + \tilde p - 1$.
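The identity (7.172) for the polynomial (7.173) can be checked in exact rational arithmetic. The following sketch is only a sanity check for small values of $q$; the function names `bezout_polynomial` and `residual` are hypothetical.

```python
from fractions import Fraction
from math import comb


def bezout_polynomial(q, y):
    """P(y) = sum_{k=0}^{q-1} binom(q-1+k, k) y^k, as in (7.173)."""
    return sum(comb(q - 1 + k, k) * y ** k for k in range(q))


def residual(q, y):
    """(1-y)^q P(y) + y^q P(1-y) - 1, which (7.172) requires to vanish."""
    return ((1 - y) ** q * bezout_polynomial(q, y)
            + y ** q * bezout_polynomial(q, 1 - y) - 1)


# Evaluate the residual at a few rational points of [0, 1] for q = 1, ..., 5.
checks = {q: [residual(q, Fraction(m, 7)) for m in range(8)]
          for q in range(1, 6)}
```

Every entry of `checks` is exactly zero. Using `Fraction` avoids any rounding, so the test confirms the polynomial identity at the sampled points rather than an approximation of it.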
Spline Biorthogonal Wavelets. Let us choose
\[ \hat h(\omega) = \sqrt{2} \, \exp\!\left( \frac{-i \epsilon \omega}{2} \right) \left( \cos\frac{\omega}{2} \right)^{p} \tag{7.174} \]
with $\epsilon = 0$ for $p$ even and $\epsilon = 1$ for $p$ odd. The scaling function computed with (7.153) is then a box spline of degree $p - 1$
\[ \hat\phi(\omega) = \exp\!\left( \frac{-i \epsilon \omega}{2} \right) \left( \frac{\sin(\omega/2)}{\omega/2} \right)^{p}. \]
Since $\psi$ is a linear combination of box splines $\phi(2t - n)$, it is a compactly supported polynomial spline of same degree.

The number of vanishing moments $\tilde p$ of $\psi$ is a free parameter, which must have the same parity as $p$. Let $q = (p + \tilde p)/2$. The biorthogonal filter $\tilde h$ of minimum length is obtained by observing that $L(\cos\omega) = 1$ in (7.169). The factorization (7.171) and (7.173) thus imply that
\[ \hat{\tilde h}(\omega) = \sqrt{2} \, \exp\!\left( \frac{-i \epsilon \omega}{2} \right) \left( \cos\frac{\omega}{2} \right)^{\tilde p} \sum_{k=0}^{q-1} \binom{q - 1 + k}{k} \left( \sin\frac{\omega}{2} \right)^{2k}. \tag{7.175} \]
These filters satisfy the conditions of Theorem 7.11 and thus generate biorthogonal wavelet bases. Table 7.3 gives the filter coefficients for $(p = 2, \tilde p = 4)$ and $(p = 3, \tilde p = 7)$. The resulting dual wavelet and scaling functions are shown in Figure 7.14.

[Figure 7.14: graphs of $\phi(t)$, $\psi(t)$, $\tilde\phi(t)$ and $\tilde\psi(t)$ for $p = 2$, $\tilde p = 4$ and for $p = 3$, $\tilde p = 7$]
Figure 7.14: Spline biorthogonal wavelets and scaling functions of compact support corresponding to the filters of Table 7.3.
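As a hedged numerical check, the sketch below takes the $(p = 2, \tilde p = 4)$ coefficients of Table 7.3, builds $g$ and $\tilde g$ with (7.134), and verifies both the biorthogonality relation (7.143) and perfect reconstruction on a periodized signal. The periodic boundary handling and the helper names (`mirror`, `correlation`, `analysis`, `synthesis`) are assumptions made for this illustration.

```python
# Filters of Table 7.3 for p = 2, p~ = 4, stored as {index: coefficient}.
H = {0: 0.70710678118655, 1: 0.35355339059327, -1: 0.35355339059327}
HALF_DUAL = [0.99436891104358, 0.41984465132951, -0.17677669529664,
             -0.06629126073624, 0.03314563036812]
H_DUAL = {}
for n, c in enumerate(HALF_DUAL):   # the dual filter is symmetric about n = 0
    H_DUAL[n] = c
    H_DUAL[-n] = c


def mirror(filt):
    """Return the filter n -> (-1)^(1-n) filt[1-n] used in (7.134)."""
    return {1 - n: (-1) ** (n % 2) * c for n, c in filt.items()}


G = mirror(H_DUAL)        # g[n]  = (-1)^(1-n) h~[1-n]
G_DUAL = mirror(H)        # g~[n] = (-1)^(1-n) h[1-n]


def correlation(u, v, shift):
    """Inner product sum_n u[n] v[n - 2*shift]."""
    return sum(c * v.get(n - 2 * shift, 0.0) for n, c in u.items())


def analysis(a, filt):
    """out[l] = sum_n a[n] filt[n - 2l], with a periodized."""
    size = len(a)
    return [sum(c * a[(k + 2 * l) % size] for k, c in filt.items())
            for l in range(size // 2)]


def synthesis(a1, d1, low, high):
    """out[n] = sum_l a1[l] low[n - 2l] + sum_l d1[l] high[n - 2l], periodized."""
    size = 2 * len(a1)
    out = [0.0] * size
    for l in range(len(a1)):
        for k, c in low.items():
            out[(k + 2 * l) % size] += c * a1[l]
        for k, c in high.items():
            out[(k + 2 * l) % size] += c * d1[l]
    return out


gram = [correlation(H_DUAL, H, l) for l in range(-4, 5)]
signal = [float((3 * n * n) % 11) for n in range(16)]
a1, d1 = analysis(signal, H), analysis(signal, G)
rebuilt = synthesis(a1, d1, H_DUAL, G_DUAL)
error = max(abs(x - y) for x, y in zip(signal, rebuilt))
```

The list `gram` is close to a discrete Dirac centered at shift 0, and `error` is limited by the 14 printed digits of the table rather than by the algorithm.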
p, p~          n          h[n]                  h~[n]
p = 2          0          0.70710678118655      0.99436891104358
p~ = 4         1, -1      0.35355339059327      0.41984465132951
               2, -2                           -0.17677669529664
               3, -3                           -0.06629126073624
               4, -4                            0.03314563036812
p = 3          0, 1       0.53033008588991      0.95164212189718
p~ = 7         -1, 2      0.17677669529664     -0.02649924094535
               -2, 3                           -0.30115912592284
               -3, 4                            0.03133297870736
               -4, 5                            0.07466398507402
               -5, 6                           -0.01683176542131
               -6, 7                           -0.00906325830378
               -7, 8                            0.00302108610126

Table 7.3: Perfect reconstruction filters $h$ and $\tilde h$ (columns h[n] and h~[n]) for compactly supported spline wavelets, with $\hat h$ and $\hat{\tilde h}$ having respectively $p$ and $\tilde p$ zeros at $\omega = \pi$.

Closer Filter Length. Biorthogonal filters $h$ and $\tilde h$ of more similar length are obtained by factoring the polynomial $P(\sin^2\frac{\omega}{2})$ in (7.171) with two polynomials $L(\cos\omega)$ and $\tilde L(\cos\omega)$ of similar degree. There is a limited number of possible factorizations. For $q = (p + \tilde p)/2 < 4$, the only solution is $L(\cos\omega) = 1$. For $q = 4$ there is one non-trivial factorization and for $q = 5$ there are two. Table 7.4 gives the resulting coefficients of the filters $h$ and $\tilde h$ of most similar length, computed by Cohen, Daubechies and Feauveau [131]. These filters also satisfy the conditions of Theorem 7.11 and therefore define biorthogonal wavelet bases. Figure 7.15 gives the scaling functions and wavelets corresponding to $p = \tilde p = 4$. These dual functions are similar, which indicates that this basis is nearly orthogonal. This particular set of filters is often used in image compression. The quasi-orthogonality guarantees a good numerical stability and the symmetry allows one to use the folding procedure of Section 7.5.2 at the boundaries. There are also enough vanishing moments to create small wavelet coefficients in regular image domains. How to design other compactly supported biorthogonal filters is discussed extensively in [131, 340].
p, p~          n          h[n]                  h~[n]
p = 4          0          0.78848561640637      0.85269867900889
p~ = 4         -1, 1      0.41809227322204      0.37740285561283
               -2, 2     -0.04068941760920     -0.11062440441844
               -3, 3     -0.06453888262876     -0.02384946501956
               -4, 4      0                     0.03782845554969
p = 5          0          0.89950610974865      0.73666018142821
p~ = 5         -1, 1      0.47680326579848      0.34560528195603
               -2, 2     -0.09350469740094     -0.05446378846824
               -3, 3     -0.13670658466433      0.00794810863724
               -4, 4     -0.00269496688011      0.03968708834741
               -5, 5      0.01345670945912      0
p = 5          0          0.54113273169141      1.32702528570780
p~ = 5         -1, 1      0.34335173921766      0.47198693379091
               -2, 2      0.06115645341349     -0.36378609009851
               -3, 3      0.00027989343090     -0.11843354319764
               -4, 4      0.02183057133337      0.05382683783789
               -5, 5      0.00992177208685      0

Table 7.4: Perfect reconstruction filters of most similar length.

[Figure 7.15: graphs of $\phi(t)$, $\psi(t)$, $\tilde\phi(t)$ and $\tilde\psi(t)$]
Figure 7.15: Biorthogonal wavelets and scaling functions calculated with the filters of Table 7.4, with $p = 4$ and $\tilde p = 4$.

7.4.4 Lifting Wavelets $^3$

A lifting is an elementary modification of perfect reconstruction filters, which is used to improve the wavelet properties. It also leads to fast polyphase implementations of filter bank decompositions. The lifting scheme of Sweldens [325, 324] does not rely on the Fourier transform and can therefore construct wavelet bases over non-translation invariant domains such as bounded regions of $\mathbb{R}^p$ or surfaces. This section concentrates on the main ideas, avoiding technical details. The proofs are left to the reader.
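Theorem 7.11 constructs compactly supported biorthogonal wavelet
bases from finite impulse response biorthogonal filters
$(h, g, \tilde h, \tilde g)$ which satisfy
\[ \hat h^*(\omega)\,\hat{\tilde h}(\omega) + \hat h^*(\omega+\pi)\,\hat{\tilde h}(\omega+\pi) = 2 \tag{7.176} \]
and
\[ \hat g(\omega) = e^{-i\omega}\,\hat{\tilde h}^*(\omega+\pi), \qquad \hat{\tilde g}(\omega) = e^{-i\omega}\,\hat h^*(\omega+\pi). \tag{7.177} \]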
The filters $h$ and $\tilde h$ are said to be dual. The following
proposition [209] characterizes all filters of compact support that are
dual to $\tilde h$.

Proposition 7.5 (Herley, Vetterli) Let $h$ and $\tilde h$ be dual filters
with a finite support. A filter $h^l$ with finite support is dual to
$\tilde h$ if and only if there exists a finite filter $l$ such that
\[ \hat h^l(\omega) = \hat h(\omega) + e^{-i\omega}\,\hat{\tilde h}^*(\omega+\pi)\,\hat l^*(2\omega). \tag{7.178} \]

This proposition proves that if $(h, g, \tilde h, \tilde g)$ are
biorthogonal then we can construct a new set of biorthogonal filters
$(h^l, g, \tilde h, \tilde g^l)$ with
\[ \hat h^l(\omega) = \hat h(\omega) + \hat g(\omega)\,\hat l^*(2\omega) \tag{7.179} \]
\[ \hat{\tilde g}^l(\omega) = e^{-i\omega}\,\hat h^{l*}(\omega+\pi) = \hat{\tilde g}(\omega) - \hat{\tilde h}(\omega)\,\hat l(2\omega). \tag{7.180} \]
This is verified by inserting (7.177) in (7.178). The new filters are said
to be lifted because the use of $l$ can improve their properties.

The inverse Fourier transform of (7.179) and (7.180) gives
\[ h^l[n] = h[n] + \sum_{k=-\infty}^{+\infty} g[n-2k]\,l[-k] \tag{7.181} \]
\[ \tilde g^l[n] = \tilde g[n] - \sum_{k=-\infty}^{+\infty} \tilde h[n-2k]\,l[k]. \tag{7.182} \]
Theorem 7.10 proves that the conditions (7.176) and (7.177) are
equivalent to the fact that $\{h[n-2k], g[n-2k]\}_{k\in\mathbb{Z}}$ and
$\{\tilde h[n-2k], \tilde g[n-2k]\}_{k\in\mathbb{Z}}$ are biorthogonal Riesz
bases of $l^2(\mathbb{Z})$. The lifting scheme thus creates new families
$\{h^l[n-2k], g[n-2k]\}_{k\in\mathbb{Z}}$ and
$\{\tilde h[n-2k], \tilde g^l[n-2k]\}_{k\in\mathbb{Z}}$ that are also
biorthogonal Riesz bases of $l^2(\mathbb{Z})$. The following theorem
derives new biorthogonal wavelet bases by inserting (7.181) and (7.182)
in the scaling equations (7.149) and (7.150).

Theorem 7.13 (Sweldens) Let $(\phi, \psi, \tilde\phi, \tilde\psi)$ be a
family of compactly supported biorthogonal scaling functions and wavelets
associated to the filters $(h, g, \tilde h, \tilde g)$. Let $l[k]$ be a
finite sequence. A new family of formally biorthogonal scaling functions
and wavelets $(\phi^l, \psi^l, \tilde\phi, \tilde\psi^l)$ is defined by
\[ \phi^l(t) = \sqrt{2}\sum_{k=-\infty}^{+\infty} h[k]\,\phi^l(2t-k) + \sum_{k=-\infty}^{+\infty} l[-k]\,\psi^l(t-k) \tag{7.183} \]
\[ \psi^l(t) = \sqrt{2}\sum_{k=-\infty}^{+\infty} g[k]\,\phi^l(2t-k) \tag{7.184} \]
\[ \tilde\psi^l(t) = \tilde\psi(t) - \sum_{k=-\infty}^{+\infty} l[k]\,\tilde\phi(t-k). \tag{7.185} \]

Theorem 7.11 imposes that the new filter $h^l$ should satisfy (7.154)
and (7.156) to generate functions $\phi^l$ and $\psi^l$ of finite energy.
This is not necessarily the case for all $l$, which is why the
biorthogonality should be understood in a formal sense. If these
functions have a finite energy then
$\{\psi^l_{j,n}\}_{(j,n)\in\mathbb{Z}^2}$ and
$\{\tilde\psi^l_{j,n}\}_{(j,n)\in\mathbb{Z}^2}$ are biorthogonal wavelet
bases of $L^2(\mathbb{R})$.

The lifting increases the support size of $\psi$ and $\tilde\psi$
typically by the length of the support of $l$. Design procedures compute
minimum size filters $l$ to achieve specific properties. Section 7.4.2
explains that the regularity of $\phi$ and $\psi$ and the number of
vanishing moments of $\tilde\psi$ depend on the number of zeros of
$\hat h(\omega)$ at $\omega = \pi$, which is also equal to the number of
zeros of $\hat{\tilde g}(\omega)$ at $\omega = 0$. The coefficients $l[n]$
are often calculated to produce a lifted transfer function
$\hat{\tilde g}^l(\omega)$ with more zeros at $\omega = 0$.
To increase the number of vanishing moments of $\psi$ and the regularity
of $\tilde\phi$ and $\tilde\psi$ we use a dual lifting which modifies
$\tilde h$ and hence $g$ instead of $h$ and $\tilde g$. The corresponding
lifting formulas with a filter $L[k]$ are obtained by exchanging the roles
of $h$ and $g$, and of $\tilde h$ and $\tilde g$, in (7.181) and (7.182):
\[ g^L[n] = g[n] + \sum_{k=-\infty}^{+\infty} h[n-2k]\,L[-k] \tag{7.186} \]
\[ \tilde h^L[n] = \tilde h[n] - \sum_{k=-\infty}^{+\infty} \tilde g[n-2k]\,L[k]. \tag{7.187} \]
The resulting family of biorthogonal scaling functions and wavelets
$(\phi, \psi^L, \tilde\phi^L, \tilde\psi^L)$ is obtained by inserting these
equations in the scaling equations (7.149) and (7.150):
\[ \tilde\phi^L(t) = \sqrt{2}\sum_{k=-\infty}^{+\infty} \tilde h[k]\,\tilde\phi^L(2t-k) - \sum_{k=-\infty}^{+\infty} L[k]\,\tilde\psi^L(t-k) \tag{7.188} \]
\[ \tilde\psi^L(t) = \sqrt{2}\sum_{k=-\infty}^{+\infty} \tilde g[k]\,\tilde\phi^L(2t-k) \tag{7.189} \]
\[ \psi^L(t) = \psi(t) + \sum_{k=-\infty}^{+\infty} L[-k]\,\phi(t-k). \tag{7.190} \]
Successive iterations of liftings and dual liftings can improve the
regularity and vanishing moments of both $\psi$ and $\tilde\psi$ by
increasing the number of zeros of $\hat g(\omega)$ and
$\hat{\tilde g}(\omega)$ at $\omega = 0$.
Lazy Wavelets

Lazy filters $h[n] = \tilde h[n] = \delta[n]$ and
$g[n] = \tilde g[n] = \delta[n-1]$ satisfy the biorthogonality conditions
(7.176) and (7.177). Their Fourier transform is
\[ \hat h(\omega) = \hat{\tilde h}(\omega) = 1 \quad\text{and}\quad \hat g(\omega) = \hat{\tilde g}(\omega) = e^{-i\omega}. \tag{7.191} \]
The resulting filter bank just separates the even and odd samples of a
signal without filtering. This is also called a polyphase decomposition
[73]. The lazy scaling functions and wavelets associated to these filters
are Diracs $\tilde\phi(t) = \phi(t) = \delta(t)$ and
$\tilde\psi(t) = \psi(t) = \delta(t - 1/2)$. They do not belong to
$L^2(\mathbb{R})$ because $\hat g(\omega)$ and $\hat{\tilde g}(\omega)$ do
not vanish at $\omega = 0$. These wavelets can be transformed into finite
energy functions by appropriate liftings.

Example 7.11 A lifting of a lazy filter
$\hat{\tilde g}(\omega) = e^{-i\omega}$ yields
\[ \hat{\tilde g}^l(\omega) = e^{-i\omega} - \hat l(2\omega). \]
To produce a symmetric wavelet $e^{i\omega}\,\hat l(2\omega)$ must be even.
For example, to create 4 vanishing moments a simple calculation shows
that the shortest filter $l$ has a Fourier transform
\[ \hat l(2\omega) = e^{-i\omega}\left(\frac{9}{8}\cos\omega - \frac{1}{8}\cos 3\omega\right). \]
Inserting this in (7.178) gives
\[ \hat h^l(\omega) = -\frac{1}{16}e^{-3i\omega} + \frac{9}{16}e^{-i\omega} + 1 + \frac{9}{16}e^{i\omega} - \frac{1}{16}e^{3i\omega}. \tag{7.192} \]
The resulting $\phi^l$ is the Deslauriers-Dubuc interpolating scaling
function of order 4 shown in Figure 7.21(b), and
$\psi^l(t) = \sqrt{2}\,\phi^l(2t-1)$. These interpolating scaling functions
and wavelets are further studied in Section 7.6.2. Both $\phi^l$ and
$\psi^l$ are continuously differentiable but $\tilde\phi$ and
$\tilde\psi^l$ are sums of Diracs. A dual lifting can transform these into
finite energy functions by creating a lifted filter $\hat g^l(\omega)$ with
one or more zeros at $\omega = 0$.
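Example 7.11 can be reproduced in the time domain with the lifting formulas (7.181) and (7.182). The following is a minimal sketch, assuming real filters stored as dictionaries that map an index $n$ to a coefficient; the function names `lift` and `moment` are hypothetical helpers, and exact rational arithmetic is used so that the comparison with (7.192) is not blurred by rounding.

```python
from fractions import Fraction as F


def lift(h, g, h_dual, g_dual, l):
    """Apply (7.181)-(7.182) to real filters stored as {index: coefficient}."""
    h_l = dict(h)
    g_dual_l = dict(g_dual)
    for k, l_k in l.items():
        # (7.181): sum_k' g[n - 2k'] l[-k'] with k' = -k puts g[m] l[k] at n = m - 2k.
        for m, g_m in g.items():
            h_l[m - 2 * k] = h_l.get(m - 2 * k, 0) + g_m * l_k
        # (7.182): -sum_k h_dual[n - 2k] l[k] puts -h_dual[m] l[k] at n = m + 2k.
        for m, hd_m in h_dual.items():
            g_dual_l[m + 2 * k] = g_dual_l.get(m + 2 * k, 0) - hd_m * l_k
    return h_l, g_dual_l


def moment(filt, order):
    """Return sum_n n**order * filt[n]."""
    return sum(F(n) ** order * c for n, c in filt.items())


# Lazy filters (7.191): h = h~ = delta[n] and g = g~ = delta[n - 1].
delta0 = {0: F(1)}
delta1 = {1: F(1)}

# Taps of l read off from l^(2w) = e^{-iw} (9/8 cos w - 1/8 cos 3w).
l = {-1: F(-1, 16), 0: F(9, 16), 1: F(9, 16), 2: F(-1, 16)}

h_l, g_dual_l = lift(delta0, delta1, delta0, delta1, l)
print(sorted(h_l.items()))
print([moment(g_dual_l, m) for m in range(5)])
```

The lifted filter `h_l` carries the five coefficients of (7.192), and the first four discrete moments of `g_dual_l` vanish, which is the time-domain form of a zero of order 4 of $\hat{\tilde g}^l(\omega)$ at $\omega = 0$.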
The following theorem proves that lifting lazy wavelets is a general
filter design procedure. A constructive proof is based on the Euclidean
algorithm [148].

Theorem 7.14 (Daubechies, Sweldens) Any biorthogonal filters
$(h, g, \tilde h, \tilde g)$ can be synthesized with a succession of
liftings and dual liftings applied to the lazy filters (7.191), up to
shifting and multiplicative constants.

Fast Polyphase Transform

After lifting, the biorthogonal wavelet transform is calculated with a
simple modification of the original wavelet transform. This
implementation requires less calculation than a direct filter bank
implementation of the lifted wavelet transform. We denote
$a^l_j[k] = \langle f, \phi^l_{j,k}\rangle$ and
$d^l_j[k] = \langle f, \psi^l_{j,k}\rangle$.

The standard filter bank decomposition with
$(h^l, \tilde h, g, \tilde g^l)$ computes
\[ a^l_{j+1}[k] = \sum_{n=-\infty}^{+\infty} h^l[n-2k]\,a^l_j[n] = a^l_j \star \bar h^l[2k] \tag{7.193} \]
\[ d^l_{j+1}[k] = \sum_{n=-\infty}^{+\infty} g[n-2k]\,a^l_j[n] = a^l_j \star \bar g[2k], \tag{7.194} \]
where $\bar h[n] = h[-n]$. The reconstruction is obtained with
\[ a^l_j[n] = \sum_{k=-\infty}^{+\infty} \tilde h[n-2k]\,a^l_{j+1}[k] + \sum_{k=-\infty}^{+\infty} \tilde g^l[n-2k]\,d^l_{j+1}[k]. \tag{7.195} \]
Inserting the lifting formulas (7.181) and (7.182) in (7.193) gives an
expression that depends only on the original filter $h$:
\[ a^0_{j+1}[k] = \sum_{n=-\infty}^{+\infty} h[n-2k]\,a^l_j[n] = a^l_j \star \bar h[2k] \]
plus a lifting component that is a convolution with $l$:
\[ a^l_{j+1}[k] = a^0_{j+1}[k] + \sum_{n=-\infty}^{+\infty} l[k-n]\,d^l_{j+1}[n] = a^0_{j+1}[k] + d^l_{j+1} \star l[k]. \]
This operation is simply inverted by calculating
\[ a^0_{j+1}[k] = a^l_{j+1}[k] - d^l_{j+1} \star l[k] \]
and performing a reconstruction with the original filters
$(\tilde h, \tilde g)$:
\[ a^l_j[n] = \sum_{k} \tilde h[n-2k]\,a^0_{j+1}[k] + \sum_{k} \tilde g[n-2k]\,d^l_{j+1}[k]. \]
Figure 7.16 illustrates this decomposition and reconstruction. It also
includes the implementation of a dual lifting with $L$, which is
calculated with (7.186):
\[ d^L_{j+1}[k] = d^l_{j+1}[k] + a^l_{j+1} \star L[k]. \]
Theorem 7.14 proves that any biorthogonal family of filters can be
calculated with a succession of liftings and dual liftings applied to lazy
filters. In this case, the filters $h[n] = \tilde h[n] = \delta[n]$ can be
removed whereas $\bar g[n] = \delta[n+1]$ and $\tilde g[n] = \delta[n-1]$
shift signals by 1 sample in opposite directions. The filter bank
convolution and subsampling is thus directly calculated with a succession
of liftings and dual liftings on the polyphase components of the signal
(odd and even samples) [73]. One can verify that this implementation
divides the number of operations by up to a factor 2 [148], compared to
direct convolutions and subsamplings calculated in (7.193) and (7.194).

Lifted Wavelets on Arbitrary Domains

The lifting procedure is extended to signal spaces which are not
translation invariant. Wavelet bases and filter banks are designed for
signals defined on arbitrary domains $D$ of $\mathbb{R}^p$ or on surfaces
such as a sphere.
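The fast polyphase transform of the translation invariant case can be sketched in a few lines before moving to general domains. The code below assumes a finite signal of even length, treats the polyphase components as periodic so that the convolutions $d \star l$ and $a \star L$ become circular, and uses arbitrary illustrative taps for `l` and `L`. None of these choices come from the text. Perfect reconstruction holds for any taps, because each lifting step is undone by subtracting the same quantity.

```python
def circular_convolution(x, taps):
    """Return (x * taps)[k] = sum_n taps[n] x[(k - n) mod len(x)]; taps is {n: value}."""
    size = len(x)
    return [sum(c * x[(k - n) % size] for n, c in taps.items())
            for k in range(size)]


def lifted_forward(signal, l, L):
    """Lazy split, then a lifting with l, then a dual lifting with L."""
    a0 = signal[0::2]          # even samples: lazy approximation a^0
    d = signal[1::2]           # odd samples: lazy detail d^l
    a = [u + v for u, v in zip(a0, circular_convolution(d, l))]    # a^l = a^0 + d^l * l
    dL = [u + v for u, v in zip(d, circular_convolution(a, L))]    # d^L = d^l + a^l * L
    return a, dL


def lifted_inverse(a, dL, l, L):
    """Undo the dual lifting, then the lifting, then merge even and odd samples."""
    d = [u - v for u, v in zip(dL, circular_convolution(a, L))]
    a0 = [u - v for u, v in zip(a, circular_convolution(d, l))]
    signal = [0.0] * (2 * len(a0))
    signal[0::2] = a0
    signal[1::2] = d
    return signal


x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
l = {0: 0.25, 1: 0.25}        # arbitrary illustrative lifting taps
L = {0: -0.5, -1: -0.5}       # arbitrary illustrative dual lifting taps

a, dL = lifted_forward(x, l, L)
x_rec = lifted_inverse(a, dL, l, L)
print(x_rec)
```

Each lifting step works on half-length sequences, which is where the reduction in operations relative to (7.193) and (7.194) comes from.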
Wavelet bases of $L^2(D)$ are derived from a family of embedded vector
spaces $\{V_j\}_{j\in\mathbb{Z}}$ that satisfy similar multiresolution
properties as in Definition 7.1. These spaces are constructed from
embedded sampling grids $\{G_j\}_{j\in\mathbb{Z}}$ included in $D$. For
each index $j$, $G_j$ has nodes whose distance to all its neighbors is of
the order of $2^j$. Since $G_{j+1}$ is included in $G_j$ we can define a
complementary grid $C_{j+1}$ that regroups all nodes of $G_j$ that are
not in $G_{j+1}$. For example, if $D = [0, N]$ then $G_j$ is the uniform
grid $\{2^j n\}_{0\le n\le 2^{-j}N}$. The complementary grid $C_{j+1}$
corresponds to $\{2^j(2n+1)\}_{0\le n<2^{-j-1}N}$. In two dimensions, the
sampling grid $G_j$ can be defined as the nodes of a regular
triangulation of $D$. This triangulation is progressively refined with a
midpoint subdivision illustrated in Figure 7.17. Such embedded grids can
also be constructed on surfaces [325].

Figure 7.16: (a): A lifting and a dual lifting are implemented by
modifying the original filter bank with two lifting convolutions, where
$l$ and $L$ are respectively the lifting and dual lifting sequences.
(b): The inverse lifted transform removes the lifting components before
calculating the filter bank reconstruction.
Suppose that $\{h_{j,k}\}_{k\in G_{j+1}} \cup \{g_{j,m}\}_{m\in C_{j+1}}$ is
a basis of the space $l^2(G_j)$ of finite energy signals defined over
$G_j$. Any $a_j \in l^2(G_j)$ is decomposed into two signals defined
respectively over $G_{j+1}$ and $C_{j+1}$ by
\[ \forall k\in G_{j+1},\quad a_{j+1}[k] = \langle a_j, h_{j,k}\rangle = \sum_{n\in G_j} a_j[n]\,h_{j,k}[n] \tag{7.196} \]
\[ \forall m\in C_{j+1},\quad d_{j+1}[m] = \langle a_j, g_{j,m}\rangle = \sum_{n\in G_j} a_j[n]\,g_{j,m}[n]. \tag{7.197} \]
This decomposition is implemented by linear operators on subsampled
grids as in the filter banks previously studied. However, these
operators are not convolutions because the basis
$\{h_{j,k}\}_{k\in G_{j+1}} \cup \{g_{j,m}\}_{m\in C_{j+1}}$ is not
translation invariant. The reconstruction is performed with a
biorthogonal basis
$\{\tilde h_{j,k}\}_{k\in G_{j+1}} \cup \{\tilde g_{j,m}\}_{m\in C_{j+1}}$:
\[ a_j[n] = \sum_{k\in G_{j+1}} a_{j+1}[k]\,\tilde h_{j,k}[n] + \sum_{m\in C_{j+1}} d_{j+1}[m]\,\tilde g_{j,m}[n]. \]
Scaling functions and wavelets are obtained by cascading filter bank
reconstructions over progressively finer scales. As a result, they
satisfy scaling equations similar to (7.112) and (7.114)
\[ \phi_{j+1,k} = \sum_{n\in G_j} h_{j,k}[n]\,\phi_{j,n}, \qquad \psi_{j+1,m} = \sum_{n\in G_j} g_{j,m}[n]\,\phi_{j,n} \tag{7.198} \]
\[ \tilde\phi_{j+1,k} = \sum_{n\in G_j} \tilde h_{j,k}[n]\,\tilde\phi_{j,n}, \qquad \tilde\psi_{j+1,m} = \sum_{n\in G_j} \tilde g_{j,m}[n]\,\tilde\phi_{j,n}. \tag{7.199} \]
These wavelets and scaling functions have a support included in $D$. If
they have a finite energy with respect to an appropriate measure $d\mu$
defined over $D$ then one can verify that for any $J \le \log_2 N$
\[ \left[\{\phi_{J,k}\}_{k\in G_J},\ \{\psi_{j,m}\}_{m\in C_j,\ j\le J}\right] \quad\text{and}\quad \left[\{\tilde\phi_{J,k}\}_{k\in G_J},\ \{\tilde\psi_{j,m}\}_{m\in C_j,\ j\le J}\right] \]
are biorthogonal bases of $L^2(D, d\mu)$.

Figure 7.17: Black dots are the nodes of a triangulation grid $G_{j+1}$
of a polygon domain $D$. This grid is refined with a subdivision, which
adds a complementary grid $C_{j+1}$ composed of all midpoints indicated
with white circles. The finer grid is $G_j = G_{j+1} \cup C_{j+1}$.

The discrete lazy basis of $l^2(G_j)$ is composed of Diracs
$h_{j,k}[n] = \delta[n-k]$ for $(k,n) \in G_{j+1}\times G_j$ and
$g_{j,m}[n] = \delta[n-m]$ for $(m,n) \in C_{j+1}\times G_j$. This basis
is clearly orthonormal so the dual basis is also the lazy basis. The
resulting filter bank just separates samples of $G_j$ into two sets of
samples that belong respectively to $G_{j+1}$ and $C_{j+1}$. The
corresponding scaling functions and wavelets are Diracs located over
these sampling grids. Finite energy wavelets and scaling functions are
constructed by lifting the discrete lazy basis.
Theorem 7.15 (Sweldens) Suppose that
$\{h_{j,k}\}_{k\in G_{j+1}} \cup \{g_{j,m}\}_{m\in C_{j+1}}$ and
$\{\tilde h_{j,k}\}_{k\in G_{j+1}} \cup \{\tilde g_{j,m}\}_{m\in C_{j+1}}$
are biorthogonal Riesz bases of $l^2(G_j)$. Let $l_j[k,m]$ be a matrix
with a finite number of non-zero values. If
\[ \forall k\in G_{j+1},\quad h^l_{j,k} = h_{j,k} + \sum_{m\in C_{j+1}} l_j[k,m]\,g_{j,m} \tag{7.200} \]
\[ \forall m\in C_{j+1},\quad \tilde g^l_{j,m} = \tilde g_{j,m} - \sum_{k\in G_{j+1}} l_j[k,m]\,\tilde h_{j,k} \tag{7.201} \]
then $\{h^l_{j,k}\}_{k\in G_{j+1}} \cup \{g_{j,m}\}_{m\in C_{j+1}}$ and
$\{\tilde h_{j,k}\}_{k\in G_{j+1}} \cup \{\tilde g^l_{j,m}\}_{m\in C_{j+1}}$
are biorthogonal Riesz bases of $l^2(G_j)$.

These formulas generalize the translation invariant lifting (7.181)
and (7.182), which corresponds to $l_j[k,m] = l[k-m]$. In the general
case, at each scale $2^j$, the lifting matrix $l_j[k,m]$ can be chosen
arbitrarily. The lifted bases generate new scaling functions and wavelets
that are related to the original scaling functions and wavelets by
inserting (7.200) and (7.201) in the scaling equations (7.198) and (7.199)
calculated with lifted filters:
\[ \phi^l_{j+1,k} = \sum_{n\in G_j} h_{j,k}[n]\,\phi^l_{j,n} + \sum_{m\in C_{j+1}} l_j[k,m]\,\psi^l_{j+1,m} \]
\[ \psi^l_{j+1,m} = \sum_{n\in G_j} g_{j,m}[n]\,\phi^l_{j,n} \]
\[ \tilde\psi^l_{j+1,m} = \tilde\psi_{j+1,m} - \sum_{k\in G_{j+1}} l_j[k,m]\,\tilde\phi_{j+1,k}. \]
The dual scaling functions $\tilde\phi_{j,k}$ are not modified since
$\tilde h_{j,k}$ is not changed by the lifting.

The fast decomposition algorithm in this lifted wavelet basis is
calculated with the same approach as in the translation invariant case
previously studied. However, the lifting blocks illustrated in Figure
7.16 are not convolutions anymore. They are linear operators computed
with the matrices $l_j[k,m]$, which depend upon the scale $2^j$.

To create wavelets $\tilde\psi_{j,m}$ with vanishing moments, we ensure
that they are orthogonal to a basis of polynomials $\{p_i\}_i$ of degree
smaller than $q$. The coefficients $l_j[k,m]$ are calculated by solving
the linear system for all $i$ and $m \in C_{j+1}$
\[ \langle \tilde\psi^l_{j+1,m}, p_i\rangle = \langle \tilde\psi_{j+1,m}, p_i\rangle - \sum_{k\in G_{j+1}} l_j[k,m]\,\langle \tilde\phi_{j+1,k}, p_i\rangle = 0. \]
A dual lifting is calculated by modifying $\tilde h_{j,k}$ and $g_{j,m}$
instead of $h_{j,k}$ and $\tilde g_{j,m}$. It allows one to change
$\tilde\phi_{j,k}$.
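A lifting step whose coefficients depend on the node positions can be illustrated on a one-dimensional irregular grid. The sketch below is not taken from the text: it assumes that $G_{j+1}$ holds the even-indexed nodes and $C_{j+1}$ the odd-indexed ones, and it applies a single dual lifting in which each odd sample is corrected by the linear interpolation of its two neighbors. The weights play the role of a position-dependent lifting matrix, and the names `predict_weights`, `forward` and `inverse` are hypothetical.

```python
def predict_weights(t_left, t_mid, t_right):
    """Linear interpolation weights of the two neighbors of the node at t_mid."""
    span = t_right - t_left
    return (t_right - t_mid) / span, (t_mid - t_left) / span


def forward(t, x):
    """Lazy split on an irregular grid, then a position-dependent dual lifting."""
    a = x[0::2]                            # samples on the coarse grid (even nodes)
    d = []
    for m in range(1, len(x), 2):          # nodes of the complementary grid (odd nodes)
        w_left, w_right = predict_weights(t[m - 1], t[m], t[m + 1])
        d.append(x[m] - (w_left * x[m - 1] + w_right * x[m + 1]))
    return a, d


def inverse(t, a, d):
    """Add the interpolated values back and merge the two grids."""
    x = [0.0] * (len(a) + len(d))
    x[0::2] = a
    for i, m in enumerate(range(1, len(x), 2)):
        w_left, w_right = predict_weights(t[m - 1], t[m], t[m + 1])
        x[m] = d[i] + (w_left * x[m - 1] + w_right * x[m + 1])
    return x


# Irregular grid with an odd number of nodes, so every odd node has two neighbors.
t = [0.0, 0.1, 0.4, 0.45, 0.7, 0.9, 1.0]
linear = [3.0 * u + 1.0 for u in t]
_, d_linear = forward(t, linear)
print(d_linear)
```

The detail coefficients of a signal that is linear in the node positions vanish up to rounding, although the operator is not a convolution, and `inverse` recovers any input exactly as in the translation invariant case.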
Applications

Lifting lazy wavelets is a simple way to construct biorthogonal wavelet
bases of $L^2[0,1]$. One may use a translation invariant lifting, which
is modified near the left and right borders to construct filters whose
supports remain inside $D = [0,1]$. The lifting coefficients are
calculated to design regular wavelets with vanishing moments [325].
Section 7.5 studies other ways to construct orthogonal wavelet bases of
$L^2[0,1]$.

Biorthogonal wavelet bases on manifolds or bounded domains of
$\mathbb{R}^p$ are calculated by lifting lazy wavelets constructed on
embedded sampling grids. Lifted wavelets on the sphere have applications
in computer graphics [326]. In finite two-dimensional domains, lifted
wavelet bases are used for numerical calculations of partial
differential equations [118].

To optimize the approximation of signals with few wavelet coefficients,
one can also construct adaptive wavelet bases with liftings that depend
on the signal. Short wavelets are needed in the neighborhood of
singularities, but longer wavelets with more vanishing moments can
improve the approximation in regions where the signal is more regular.
Such a basis can be calculated with a time varying lifting whose
coefficients $l_j[k,m]$ are adapted to the local signal properties [325].

7.5 Wavelet Bases on an Interval $^2$
To decompose signals $f$ defined over an interval $[0,1]$, it is
necessary to construct wavelet bases of $L^2[0,1]$. Such bases are
synthesized by modifying the wavelets
$\psi_{j,n}(t) = 2^{-j/2}\,\psi(2^{-j}t - n)$ of a basis
$\{\psi_{j,n}\}_{(j,n)\in\mathbb{Z}^2}$ of $L^2(\mathbb{R})$. The inside
wavelets $\psi_{j,n}$ whose supports are included in $[0,1]$ are not
modified. The boundary wavelets $\psi_{j,n}$ whose supports overlap
$t = 0$ or $t = 1$ are transformed into functions having a support in
$[0,1]$, which are designed in order to provide the necessary complement
to generate a basis of $L^2[0,1]$. If $\psi$ has a compact support then
there is a constant number of boundary wavelets at each scale.
The main difficulty is to construct boundary wavelets that keep their
vanishing moments. The next three sections describe different approaches
to constructing boundary wavelets. Periodic wavelets have no vanishing
moments at the boundary, whereas folded wavelets have one vanishing
moment. The custom-designed boundary wavelets of Section 7.5.3 have as
many vanishing moments as the inside wavelets but are more complicated
to construct. Scaling functions $\phi_{j,n}$ are also restricted to
$[0,1]$ by modifying the scaling functions
$\phi_{j,n}(t) = 2^{-j/2}\,\phi(2^{-j}t - n)$ associated to the wavelets
$\psi_{j,n}$. The resulting wavelet basis of $L^2[0,1]$ is composed of
$2^{-J}$ scaling functions at a coarse scale $2^J < 1$, plus $2^{-j}$
wavelets at each scale $2^j \le 2^J$:
\[ \left[\{\phi^{int}_{J,n}\}_{0\le n<2^{-J}},\ \{\psi^{int}_{j,n}\}_{-\infty<j\le J,\ 0\le n<2^{-j}}\right]. \tag{7.202} \]
On any interval $[a,b]$, a wavelet orthonormal basis of $L^2[a,b]$ is
constructed with a dilation by $b - a$ and a translation by $a$ of the
wavelets in (7.202).

Discrete Basis of $\mathbb{C}^N$

The decomposition of a signal in a wavelet basis over an interval is
computed by modifying the fast wavelet transform algorithm of Section
7.3.1. A discrete signal $b[n]$ of $N$ samples is associated to the
approximation of a signal $f \in L^2[0,1]$ at a scale $N^{-1} = 2^L$
with (7.116):
\[ N^{-1/2}\,b[n] = a_L[n] = \langle f, \phi^{int}_{L,n}\rangle \quad\text{for } 0\le n<2^{-L}. \]
Its wavelet coefficients can be calculated at scales $1 \ge 2^j > 2^L$.
We set
\[ a_j[n] = \langle f, \phi^{int}_{j,n}\rangle \quad\text{and}\quad d_j[n] = \langle f, \psi^{int}_{j,n}\rangle \quad\text{for } 0\le n<2^{-j}. \tag{7.203} \]
The wavelets and scaling functions with support inside $[0,1]$ are
identical to the wavelets and scaling functions of a basis of
$L^2(\mathbb{R})$. The corresponding coefficients $a_j[n]$ and $d_j[n]$
can thus be calculated with the decomposition and reconstruction
equations given by Theorem 7.7. These convolution formulas must however
be modified near the boundary where the wavelets and scaling functions
are modified. Boundary calculations depend on the specific design of the
boundary wavelets, as explained in the next three sections. The
resulting filter bank algorithm still computes the $N$ coefficients of
the wavelet representation $[a_J, \{d_j\}_{L<j\le J}]$ of $a_L$ with
$O(N)$ operations.
Wavelet coefficients can also be written as discrete inner products
of $a_L$ with discrete wavelets:
\[ a_j[n] = \langle a_L[m], \phi^{int}_{j,n}[m]\rangle \quad\text{and}\quad d_j[n] = \langle a_L[m], \psi^{int}_{j,n}[m]\rangle. \tag{7.204} \]
As in Section 7.3.3, we verify that
\[ \left[\{\phi^{int}_{J,n}[m]\}_{0\le n<2^{-J}},\ \{\psi^{int}_{j,n}[m]\}_{L<j\le J,\ 0\le n<2^{-j}}\right] \]
is an orthonormal basis of $\mathbb{C}^N$.

7.5.1 Periodic Wavelets

A wavelet basis $\{\psi_{j,n}\}_{(j,n)\in\mathbb{Z}^2}$ of
$L^2(\mathbb{R})$ is transformed into a wavelet basis of $L^2[0,1]$ by
periodizing each $\psi_{j,n}$. The periodization of
$f \in L^2(\mathbb{R})$ over $[0,1]$ is defined by
\[ f^{per}(t) = \sum_{k=-\infty}^{+\infty} f(t+k). \tag{7.205} \]
The resulting periodic wavelets are
\[ \psi^{per}_{j,n}(t) = \frac{1}{\sqrt{2^j}} \sum_{k=-\infty}^{+\infty} \psi\left(\frac{t - 2^j n + k}{2^j}\right). \]
For $j \le 0$, there are $2^{-j}$ different $\psi^{per}_{j,n}$ indexed by
$0 \le n < 2^{-j}$. If the support of $\psi_{j,n}$ is included in $[0,1]$
then $\psi^{per}_{j,n}(t) = \psi_{j,n}(t)$ for $t \in [0,1]$. The
restriction to $[0,1]$ of this periodization thus modifies only the
boundary wavelets whose supports overlap $t = 0$ or $t = 1$. As indicated
in Figure 7.18, such wavelets are transformed into boundary wavelets
which have two disjoint components near $t = 0$ and $t = 1$. Taken
separately, the components near $t = 0$ and $t = 1$ of these boundary
wavelets have no vanishing moments, and thus create large signal
coefficients, as we shall see later. The following theorem proves that
periodic wavelets together with periodized scaling functions
$\phi^{per}_{j,n}$ generate an orthogonal basis of $L^2[0,1]$.
Figure 7.18: The restriction to $[0,1]$ of a periodic wavelet
$\psi^{per}_{j,n}$ has two disjoint components near $t = 0$ and $t = 1$.

Theorem 7.16 For any $J \le 0$
\[ \left[\{\psi^{per}_{j,n}\}_{-\infty<j\le J,\ 0\le n<2^{-j}},\ \{\phi^{per}_{J,n}\}_{0\le n<2^{-J}}\right] \tag{7.206} \]
is an orthogonal basis of $L^2[0,1]$.

Proof $^2$. The orthogonality of this family is proved with the following
lemma.

Lemma 7.2 Let $\alpha(t), \beta(t) \in L^2(\mathbb{R})$. If
$\langle \alpha(t), \beta(t+k)\rangle = 0$ for all $k \in \mathbb{Z}$ then
\[ \int_0^1 \alpha^{per}(t)\,\beta^{per}(t)\,dt = 0. \tag{7.207} \]
To verify (7.207) we insert the definition (7.205) of periodized
functions:
\[ \int_0^1 \alpha^{per}(t)\,\beta^{per}(t)\,dt = \int_{-\infty}^{+\infty} \alpha(t)\,\beta^{per}(t)\,dt = \sum_{k=-\infty}^{+\infty} \int_{-\infty}^{+\infty} \alpha(t)\,\beta(t+k)\,dt = 0. \]
Since $[\{\psi_{j,n}\}_{-\infty<j\le J,\ n\in\mathbb{Z}},\ \{\phi_{J,n}\}_{n\in\mathbb{Z}}]$
is orthogonal in $L^2(\mathbb{R})$, we can verify that any two different
wavelets or scaling functions $\alpha^{per}$ and $\beta^{per}$ in (7.206)
have necessarily a non-periodized version that satisfies
$\langle \alpha(t), \beta(t+k)\rangle = 0$ for all $k \in \mathbb{Z}$.
Lemma 7.2 thus proves that (7.206) is orthogonal in $L^2[0,1]$.

To prove that this family generates $L^2[0,1]$, we extend
$f \in L^2[0,1]$ with zeros outside $[0,1]$ and decompose it in the
wavelet basis of $L^2(\mathbb{R})$:
\[ f = \sum_{j=-\infty}^{J} \sum_{n=-\infty}^{+\infty} \langle f, \psi_{j,n}\rangle\,\psi_{j,n} + \sum_{n=-\infty}^{+\infty} \langle f, \phi_{J,n}\rangle\,\phi_{J,n}. \tag{7.208} \]
This zero extension is periodized with the sum (7.205), which defines
$f^{per}(t) = f(t)$ for $t \in [0,1]$. Periodizing (7.208) proves that $f$
can be decomposed over the periodized wavelet family (7.206) in
$L^2[0,1]$.

Theorem 7.16 shows that periodizing a wavelet orthogonal basis of
$L^2(\mathbb{R})$ defines a wavelet orthogonal basis of $L^2[0,1]$. If
$J = 0$ then there is a single scaling function, and one can verify that
$\phi^{per}_{0,0}(t) = 1$. The resulting scaling coefficient
$\langle f, \phi^{per}_{0,0}\rangle$ is the average of $f$ over $[0,1]$.
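A discrete counterpart of Theorem 7.16 can be checked numerically. The sketch below assumes the Daubechies filter with $p = 2$ vanishing moments, indexed over $n = -1, \dots, 2$, together with the conjugate mirror relation $g[n] = (-1)^{1-n}h[1-n]$. It periodizes the filter bank by taking every sample index modulo the signal length. The function names `analyze` and `synthesize` are illustrative and not from the text.

```python
import math

S3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)

# Daubechies filter with p = 2 vanishing moments, indexed n = -1, 0, 1, 2.
H = {-1: (1 + S3) / NORM, 0: (3 + S3) / NORM,
     1: (3 - S3) / NORM, 2: (1 - S3) / NORM}
# Conjugate mirror filter g[n] = (-1)^(1-n) h[1-n].
G = {n: (1 if (1 - n) % 2 == 0 else -1) * H[1 - n] for n in H}


def analyze(a, filt):
    """Return sum_n filt[n - 2k] a[n], with the index n taken modulo len(a)."""
    size = len(a)
    return [sum(c * a[(m + 2 * k) % size] for m, c in filt.items())
            for k in range(size // 2)]


def synthesize(approx, detail):
    """Return sum_k H[n - 2k] approx[k] + G[n - 2k] detail[k], indices modulo the length."""
    size = 2 * len(approx)
    out = [0.0] * size
    for k in range(len(approx)):
        for m in H:
            out[(m + 2 * k) % size] += H[m] * approx[k] + G[m] * detail[k]
    return out


ramp = [float(n) for n in range(16)]
a1 = analyze(ramp, H)
d1 = analyze(ramp, G)
print([round(v, 4) for v in d1])
```

On this ramp the interior details vanish because the filter has two vanishing moments, while the first and last details, computed from filters that wrap around the boundary, are large. The energy of `a1` and `d1` together equals the energy of the input, and `synthesize` recovers the ramp.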
Periodic wavelet bases have the disadvantage of creating high amplitude
wavelet coefficients in the neighborhood of $t = 0$ and $t = 1$, because
the boundary wavelets have separate components with no vanishing
moments. If $f(0) \ne f(1)$, the wavelet coefficients behave as if the
signal were discontinuous at the boundaries. This can also be verified
by extending $f \in L^2[0,1]$ into an infinite 1 periodic signal
$f^{per}$ and by showing that
\[ \int_0^1 f(t)\,\psi^{per}_{j,n}(t)\,dt = \int_{-\infty}^{+\infty} f^{per}(t)\,\psi_{j,n}(t)\,dt. \tag{7.209} \]
If $f(0) \ne f(1)$ then $f^{per}(t)$ is discontinuous at $t = 0$ and
$t = 1$, which creates high amplitude wavelet coefficients when
$\psi_{j,n}$ overlaps the interval boundaries.

Periodic Discrete Transform

For $f \in L^2[0,1]$ let us consider
\[ a_j[n] = \langle f, \phi^{per}_{j,n}\rangle \quad\text{and}\quad d_j[n] = \langle f, \psi^{per}_{j,n}\rangle. \]
We verify as in (7.209) that these inner products are equal to the
coefficients of a periodic signal decomposed in a non-periodic wavelet
basis:
\[ a_j[n] = \langle f^{per}, \phi_{j,n}\rangle \quad\text{and}\quad d_j[n] = \langle f^{per}, \psi_{j,n}\rangle. \]
The convolution formulas of Theorem 7.7 thus apply if we take into
account the periodicity of $f^{per}$. This means that $a_j[n]$ and
$d_j[n]$ are considered as discrete signals of period $2^{-j}$, and all
convolutions in (7.107-7.109) must therefore be replaced by circular
convolutions. Despite the poor behavior of periodic wavelets near the
boundaries, they are often used because the numerical implementation is
particularly simple.

7.5.2 Folded Wavelets

Decomposing $f \in L^2[0,1]$ in a periodic wavelet basis was shown in
(7.209) to be equivalent to a decomposition of $f^{per}$ in a regular
basis of $L^2(\mathbb{R})$. Let us extend $f$ with zeros outside $[0,1]$.
To avoid creating discontinuities with such a periodization, the signal
is folded with respect to $t = 0$: $f_0(t) = f(t) + f(-t)$. The support
of $f_0$ is $[-1,1]$ and it is transformed into a 2 periodic signal, as
illustrated in Figure 7.19
\[ f^{fold}(t) = \sum_{k=-\infty}^{+\infty} f_0(t-2k) = \sum_{k=-\infty}^{+\infty} f(t-2k) + \sum_{k=-\infty}^{+\infty} f(2k-t). \tag{7.210} \]
Clearly $f^{fold}(t) = f(t)$ if $t \in [0,1]$, and it is symmetric with
respect to $t = 0$ and $t = 1$. If $f$ is continuously differentiable
then $f^{fold}$ is continuous at $t = 0$ and $t = 1$, but its derivative
is discontinuous at $t = 0$ and $t = 1$ if $f'(0) \ne 0$ and
$f'(1) \ne 0$.
Decomposing $f^{fold}$ in a wavelet basis
$\{\psi_{j,n}\}_{(j,n)\in\mathbb{Z}^2}$ is equivalent to decomposing $f$ on
a folded wavelet basis. Let $\psi^{fold}_{j,n}$ be the folding of
$\psi_{j,n}$ with the summation (7.210). One can verify that
\[ \int_0^1 f(t)\,\psi^{fold}_{j,n}(t)\,dt = \int_{-\infty}^{+\infty} f^{fold}(t)\,\psi_{j,n}(t)\,dt. \tag{7.211} \]
Suppose that $f$ is regular over $[0,1]$. Then $f^{fold}$ is continuous
at $t = 0, 1$ and hence produces smaller boundary wavelet coefficients
than $f^{per}$. However, it is not continuously differentiable at
$t = 0, 1$, which creates bigger wavelet coefficients at the boundary
than inside.
Figure 7.19: The folded signal $f^{fold}(t)$ is 2 periodic, symmetric
about $t = 0$ and $t = 1$, and equal to $f(t)$ on $[0,1]$.

To construct a basis of $L^2[0,1]$ with the folded wavelets
$\psi^{fold}_{j,n}$, it is sufficient for $\psi(t)$ to be either symmetric
or antisymmetric with respect to $t = 1/2$. The Haar wavelet is the only
real compactly supported wavelet that is symmetric or antisymmetric and
which generates an orthogonal basis of $L^2(\mathbb{R})$. On the other
hand, if we loosen up the orthogonality constraint, Section 7.4 proves
that there exist biorthogonal bases constructed with compactly supported
wavelets that are either symmetric or antisymmetric. Let
$\{\psi_{j,n}\}_{(j,n)\in\mathbb{Z}^2}$ and
$\{\tilde\psi_{j,n}\}_{(j,n)\in\mathbb{Z}^2}$ be such biorthogonal wavelet
bases. If we fold the wavelets as well as the scaling functions then for
$J \le 0$
\[ \left[\{\psi^{fold}_{j,n}\}_{-\infty<j\le J,\ 0\le n<2^{-j}},\ \{\phi^{fold}_{J,n}\}_{0\le n<2^{-J}}\right] \tag{7.212} \]
is a Riesz basis of $L^2[0,1]$ [134]. The biorthogonal basis is obtained
by folding the dual wavelets $\tilde\psi_{j,n}$ and is given by
\[ \left[\{\tilde\psi^{fold}_{j,n}\}_{-\infty<j\le J,\ 0\le n<2^{-j}},\ \{\tilde\phi^{fold}_{J,n}\}_{0\le n<2^{-J}}\right]. \tag{7.213} \]
If $J = 0$ then $\phi^{fold}_{0,0} = \tilde\phi^{fold}_{0,0} = 1$.
Biorthogonal wavelets of compact support are characterized by a pair of
finite perfect reconstruction filters $(h, \tilde h)$. The symmetry of
these wavelets depends on the symmetry and size of the filters, as
explained in Section 7.4.2. A fast folded wavelet transform is
implemented with a modified filter bank algorithm, where the treatment
of boundaries is slightly more complicated than for periodic wavelets.
The symmetric and antisymmetric cases are considered separately.

Folded Discrete Transform

For $f \in L^2[0,1]$, we consider
\[ a_j[n] = \langle f, \phi^{fold}_{j,n}\rangle \quad\text{and}\quad d_j[n] = \langle f, \psi^{fold}_{j,n}\rangle. \]
We verify as in (7.211) that these inner products are equal to the
coefficients of a folded signal decomposed in a non-folded wavelet basis:
\[ a_j[n] = \langle f^{fold}, \phi_{j,n}\rangle \quad\text{and}\quad d_j[n] = \langle f^{fold}, \psi_{j,n}\rangle. \]
The convolution formulas of Theorem 7.7 thus apply if we take into
account the symmetry and periodicity of $f^{fold}$. The symmetry
properties of $\phi$ and $\psi$ imply that $a_j[n]$ and $d_j[n]$ also have
symmetry and periodicity properties, which must be taken into account in
the calculations of (7.107-7.109).
Symmetric biorthogonal wavelets are constructed with perfect
reconstruction filters $h$ and $\tilde h$ of odd size that are symmetric
about $n = 0$. Then $\phi$ is symmetric about 0, whereas $\psi$ is
symmetric about $1/2$. As a result, one can verify that $a_j[n]$ is
$2^{-j+1}$ periodic and symmetric about $n = 0$ and $n = 2^{-j}$. It is
thus characterized by $2^{-j} + 1$ samples, for $0 \le n \le 2^{-j}$. The
situation is different for $d_j[n]$ which is $2^{-j+1}$ periodic but
symmetric with respect to $-1/2$ and $2^{-j} - 1/2$. It is characterized
by $2^{-j}$ samples, for $0 \le n < 2^{-j}$.

To initialize this algorithm, the original signal $a_L[n]$ defined over
$0 \le n \le N - 1$ must be extended by one sample at $n = N$, and
considered to be symmetric with respect to $n = 0$ and $n = N$. The
extension is done by setting $a_L[N] = a_L[N-1]$. For any $J > L$, the
resulting discrete wavelet representation $[\{d_j\}_{L<j\le J}, a_J]$ is
characterized by $N + 1$ coefficients. To avoid adding one more
coefficient, one can modify symmetry at the right boundary of $a_L$ by
considering that it is symmetric with respect to $N - 1/2$ instead of
$N$. The symmetry of the resulting $a_j$ and $d_j$ at the right boundary
is modified accordingly by studying the properties of the convolution
formula (7.162). As a result, these signals are characterized by
$2^{-j}$ samples and the wavelet representation has $N$ coefficients.
This approach is used in most applications because it leads to simpler
data structures which keep constant the number of coefficients. However,
the discrete coefficients near the right boundary cannot be written as
inner products of some function $f(t)$ with dilated boundary wavelets.
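The difference between a periodic and a folded extension can be seen on a single filtering step. The sketch below is only an illustration under stated assumptions: it uses a short symmetric high-pass filter with taps $(-1/2, 1, -1/2)$ centered on the odd samples, which is not one of the filters of the text, and it folds the signal about $n = 0$ on the left and about $N - 1/2$ on the right, as in the variant that keeps $N$ coefficients. The function names are hypothetical.

```python
def periodic_index(n, size):
    """Index of sample n in a signal extended with period `size`."""
    return n % size


def folded_index(n, size):
    """Index of sample n in a signal symmetric about n = 0 and n = size - 1/2.

    Valid for -size < n < 2 * size, which is enough for short filters.
    """
    if n < 0:
        return -n
    if n >= size:
        return 2 * size - 1 - n
    return n


def details(x, index):
    """High-pass outputs x[2k+1] - (x[2k] + x[2k+2]) / 2 with a chosen boundary rule."""
    size = len(x)
    return [x[index(2 * k + 1, size)]
            - 0.5 * (x[index(2 * k, size)] + x[index(2 * k + 2, size)])
            for k in range(size // 2)]


ramp = [float(n) for n in range(16)]
d_per = details(ramp, periodic_index)
d_fold = details(ramp, folded_index)
print(d_per)
print(d_fold)
```

For this ramp all interior outputs are zero. The last output is 8.0 with the periodic rule, which sees a jump from 15 back to 0, and 0.5 with the folded rule, which only sees a change of slope.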
Antisymmetric biorthogonal wavelets are obtained with perfect
reconstruction filters $h$ and $\tilde h$ of even size that are symmetric
about $n = 1/2$. In this case $\phi$ is symmetric about $1/2$ and $\psi$
is antisymmetric about $1/2$. As a result $a_j$ and $d_j$ are $2^{-j+1}$
periodic and respectively symmetric and antisymmetric about $-1/2$ and
$2^{-j} - 1/2$. They are both characterized by $2^{-j}$ samples, for
$0 \le n < 2^{-j}$. The algorithm is initialized by considering that
$a_L[n]$ is symmetric with respect to $-1/2$ and $N - 1/2$. There is no
need to add another sample. The resulting discrete wavelet
representation $[\{d_j\}_{L<j\le J}, a_J]$ is characterized by $N$
coefficients.

7.5.3 Boundary Wavelets $^3$

Wavelet coefficients are small in regions where the signal is regular
only if the wavelets have enough vanishing moments. The restrictions of
periodic and folded "boundary" wavelets to the neighborhood of $t = 0$
and $t = 1$ have respectively 0 and 1 vanishing moment. These boundary
wavelets thus cannot fully take advantage of the signal regularity.
They produce large inner products, as if the signal were discontinuous
or had a discontinuous derivative. To avoid creating large amplitude
wavelet coefficients at the boundaries, one must synthesize boundary
wavelets that have as many vanishing moments as the original wavelet
$\psi$. Initially introduced by Meyer, this approach has been refined by
Cohen, Daubechies and Vial [134]. The main results are given without
proofs.

Multiresolution of $L^2[0,1]$

A wavelet basis of $L^2[0,1]$ is constructed with a multiresolution
approximation $\{V^{int}_j\}_{-\infty<j\le 0}$. A wavelet has $p$
vanishing moments if it is orthogonal to all polynomials of degree
$p - 1$ or smaller. Since wavelets at a scale $2^j$ are orthogonal to
functions in $V^{int}_j$, to guarantee that they have $p$ vanishing
moments we make sure that polynomials of degree $p - 1$ are inside
$V^{int}_j$.
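We define an approximation space $V^{int}_j \subset L^2[0,1]$ with a
compactly supported Daubechies scaling function $\phi$, associated to a
wavelet with $p$ vanishing moments. Theorem 7.5 proves that the support
of $\phi$ has size $2p - 1$. We translate $\phi$ so that its support is
$[-p+1, p]$. At a scale $2^j \le (2p)^{-1}$, there are $2^{-j} - 2p$
scaling functions with a support inside $[0,1]$:
\[ \phi^{int}_{j,n}(t) = \phi_{j,n}(t) = \frac{1}{\sqrt{2^j}}\,\phi\left(\frac{t - 2^j n}{2^j}\right) \quad\text{for } p \le n < 2^{-j} - p. \]
To construct an approximation space $V^{int}_j$ of dimension $2^{-j}$ we
add $p$ scaling functions with a support on the left boundary near
$t = 0$:
\[ \phi^{int}_{j,n}(t) = \frac{1}{\sqrt{2^j}}\,\phi^{left}_n\left(\frac{t}{2^j}\right) \quad\text{for } 0 \le n < p \]
and $p$ scaling functions on the right boundary near $t = 1$:
\[ \phi^{int}_{j,n}(t) = \frac{1}{\sqrt{2^j}}\,\phi^{right}_{2^{-j}-1-n}\left(\frac{t - 1}{2^j}\right) \quad\text{for } 2^{-j} - p \le n < 2^{-j}. \]
The following proposition constructs appropriate boundary scaling
functions $\{\phi^{left}_n\}_{0\le n<p}$ and
$\{\phi^{right}_n\}_{0\le n<p}$.

Proposition 7.6 (Cohen, Daubechies, Vial) One can construct boundary
scaling functions $\phi^{left}_n$ and $\phi^{right}_n$ so that if
$2^{-j} \ge 2p$ then $\{\phi^{int}_{j,n}\}_{0\le n<2^{-j}}$ is an
orthonormal basis of a space $V^{int}_j$ satisfying
\[ V^{int}_j \subset V^{int}_{j-1} \]
\[ \lim_{j\to-\infty} V^{int}_j = \mathrm{Closure}\left(\bigcup_{j=-\infty}^{-\log_2(2p)} V^{int}_j\right) = L^2[0,1] \]
and the restrictions to $[0,1]$ of polynomials of degree $p - 1$ are in
$V^{int}_j$.

Proof $^2$. A sketch of the proof is given. All details can be found in [134].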
and the restrictions to 0 1] of polynomials of degree p ; 1 are in Vj . 7.5. WAVELET BASES ON AN INTERVAL 387 Proof 2 . A sketch of the proof is given. All details can be found in 134].
Since the wavelet corresponding to has p vanishing moments, the
Fix-Strang condition (7.75) implies that qk (t) = +1
X n=;1 nk (t ; n) (7.214) is a polynomial of degree k. At any scale 2j , qk (2;j t) is still a polynomial
of degree k, and for 0 k < p this family de nes a basis of polynomials
int
of degree p ; 1. To guarantee that polynomials of degree p ; 1 are in Vj
we impose that the restriction of qk (2;j t) to 0 1] can be decomposed in
int
the basis of Vj : qk (2;j t) 1 0 1] (t) = p;1
X n=0
p;1
X
n=0 X 2;j ;p;1 a n] left (2;j t) +
n b n] right (2;j t ; 2;j )
n n=p nk (2;j t ; n) + : (7.215) Since the support of is ;p + 1 p], the condition (7.215) together with
(7.214) can be separated into two non-overlapping left and right conditions. With a change of variable, we verify that (7.215) is equivalent
to
$$\sum_{n=-p+1}^{p} n^k\, \phi(t-n)\, 1_{[0,+\infty)}(t) = \sum_{n=0}^{p-1} a[n]\, \phi_n^{left}(t) \tag{7.216}$$
and
$$\sum_{n=-p}^{p-1} n^k\, \phi(t-n)\, 1_{(-\infty,0]}(t) = \sum_{n=0}^{p-1} b[n]\, \phi_n^{right}(t). \tag{7.217}$$
The embedding property $V_j^{int} \subset V_{j-1}^{int}$ is obtained by imposing that the boundary scaling functions satisfy scaling equations. We suppose that $\phi_n^{left}$ has a support $[0, p+n]$ and satisfies a scaling equation of the form
$$2^{-1/2}\, \phi_n^{left}(2^{-1}t) = \sum_{l=0}^{p-1} H^{left}_{n,l}\, \phi_l^{left}(t) + \sum_{m=p}^{p+2n} h^{left}_{n,m}\, \phi(t-m), \tag{7.218}$$
whereas $\phi_n^{right}$ has a support $[-p-n, 0]$ and satisfies a similar scaling equation on the right. The constants $H^{left}_{n,l}$, $h^{left}_{n,m}$, $H^{right}_{n,l}$ and $h^{right}_{n,m}$ are adjusted to verify the polynomial reproduction equations (7.216) and (7.217), while producing orthogonal scaling functions. The resulting family $\{\phi^{int}_{j,n}\}_{0 \le n < 2^{-j}}$ is an orthonormal basis of a space $V_j^{int}$.

The convergence of the spaces $V_j^{int}$ to $L^2[0,1]$ when $2^j$ goes to 0 is a consequence of the fact that the multiresolution spaces $V_j$ generated by the Daubechies scaling function $\{\phi_{j,n}\}_{n \in \mathbb{Z}}$ converge to $L^2(\mathbb{R})$.

The proof constructs the scaling functions through scaling equations specified by discrete filters. At the boundaries, the filter coefficients are adjusted to construct orthogonal scaling functions with a support in $[0,1]$, and to guarantee that polynomials of degree $p-1$ are reproduced by these scaling functions. Table 7.5 gives the filter coefficients for
$p = 2$.

Wavelet Basis of $L^2[0,1]$ Let $W_j^{int}$ be the orthogonal complement of $V_j^{int}$ in $V_{j-1}^{int}$. The support of the Daubechies wavelet $\psi$ with $p$ vanishing moments is $[-p+1, p]$. Since $\psi_{j,n}$ is orthogonal to any $\phi_{j,l}$, we verify that an orthogonal basis of $W_j^{int}$ can be constructed with the $2^{-j} - 2p$ inside wavelets with support in $[0,1]$:
$$\psi^{int}_{j,n}(t) = \psi_{j,n}(t) = \frac{1}{\sqrt{2^j}}\, \psi\Big( \frac{t - 2^j n}{2^j} \Big) \quad \text{for } p \le n < 2^{-j} - p,$$
to which are added $2p$ left and right boundary wavelets
$$\psi^{int}_{j,n}(t) = \frac{1}{\sqrt{2^j}}\, \psi_n^{left}\Big( \frac{t}{2^j} \Big) \quad \text{for } 0 \le n < p,$$
$$\psi^{int}_{j,n}(t) = \frac{1}{\sqrt{2^j}}\, \psi^{right}_{2^{-j}-1-n}\Big( \frac{t-1}{2^j} \Big) \quad \text{for } 2^{-j} - p \le n < 2^{-j}.$$
Since $W_j^{int} \subset V_{j-1}^{int}$, the left and right boundary wavelets at any scale $2^j$ can be expanded into scaling functions at the scale $2^{j-1}$. For $j = 1$ we impose that the left boundary wavelets satisfy equations of the form
$$\frac{1}{\sqrt{2}}\, \psi_n^{left}\Big( \frac{t}{2} \Big) = \sum_{l=0}^{p-1} G^{left}_{n,l}\, \phi_l^{left}(t) + \sum_{m=p}^{p+2n} g^{left}_{n,m}\, \phi(t-m). \tag{7.219}$$
Left boundary:
  k    l    H^left_{k,l}      G^left_{k,l}      |   k    m    h^left_{k,m}      g^left_{k,m}
  0    0     0.6033325119     -0.7965436169     |   0    2    -0.398312997      -0.2587922483
  0    1     0.690895531       0.5463927140     |   1    2     0.8500881025      0.227428117
  1    0     0.03751746045     0.01003722456    |   1    3     0.2238203570     -0.8366028212
  1    1     0.4573276599      0.1223510431     |   1    4    -0.1292227434      0.4830129218

Right boundary:
  k    l    H^right_{k,l}     G^right_{k,l}     |   k    m    h^right_{k,m}     g^right_{k,m}
 -2   -2     0.1901514184     -0.3639069596     |  -2   -5     0.4431490496      0.235575950
 -2   -1    -0.1942334074      0.3717189665     |  -2   -4     0.7675566693      0.4010695194
 -1   -2     0.434896998       0.8014229620     |  -2   -3     0.3749553316     -0.7175799994
 -1   -1     0.8705087534     -0.2575129195     |  -1   -3     0.2303890438     -0.5398225007

Inside filter:
  h[-1] = 0.482962913145    h[0] = 0.836516303738    h[1] = 0.224143868042    h[2] = -0.129409522551

Table 7.5: Left and right border coefficients for a Daubechies wavelet with $p = 2$ vanishing moments. The inside filter coefficients are at the bottom of the table. A table of coefficients for $p \ge 2$ vanishing moments can be retrieved over the Internet at the FTP site
ftp://math.princeton.edu/pub/user/ingrid/interval-tables.
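
A transcription of Table 7.5 can be sanity-checked numerically. Equations (7.218) and (7.219) expand each left boundary function over $\phi_0^{left}$, $\phi_1^{left}$ and the inside functions $\phi(t-m)$, so orthonormal boundary functions should give orthonormal coefficient rows. The sketch below assumes this row layout (columns 0 and 1 for the capital-letter coefficients, columns 2 to 4 for the lower-case ones); the helper names are illustrative and do not come from the text.

```python
# Left boundary rows for p = 2, transcribed from Table 7.5.
# Each row acts on the finer-scale coefficients with indices 0..4:
# columns 0-1 hold H^left / G^left, columns 2-4 hold h^left / g^left.
H0 = [0.6033325119, 0.690895531, -0.398312997, 0.0, 0.0]
H1 = [0.03751746045, 0.4573276599, 0.8500881025, 0.2238203570, -0.1292227434]
G0 = [-0.7965436169, 0.5463927140, -0.2587922483, 0.0, 0.0]
G1 = [0.01003722456, 0.1223510431, 0.227428117, -0.8366028212, 0.4830129218]
rows = [H0, H1, G0, G1]

# Inside Daubechies filter h[-1..2] from the bottom of Table 7.5.
h = {-1: 0.482962913145, 0: 0.836516303738,
     1: 0.224143868042, 2: -0.129409522551}


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def gram(vectors):
    return [[dot(u, v) for v in vectors] for u in vectors]


def max_deviation_from_identity(matrix):
    n = len(matrix)
    return max(abs(matrix[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


def first_inside_row(length=5, k=2):
    # Inside row k has entries h[l - 2k]; only the first `length` entries
    # are kept, which is enough because the boundary rows vanish beyond them.
    return [h.get(l - 2 * k, 0.0) for l in range(length)]


# Largest departure of the 4 x 4 Gram matrix from the identity.
deviation = max_deviation_from_identity(gram(rows))
# Inner products of each boundary row with the first inside scaling row.
overlap = [dot(r, first_inside_row()) for r in rows]
print(deviation, overlap)
```

With the printed digits of the table, both quantities should be zero only up to rounding, so a tolerance of about $10^{-6}$ is a reasonable acceptance threshold.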
The right boundary wavelets satisfy similar equations. The coefficients $G^{left}_{n,l}$, $g^{left}_{n,m}$, $G^{right}_{n,l}$, $g^{right}_{n,m}$ are computed so that $\{\psi^{int}_{j,n}\}_{0 \le n < 2^{-j}}$ is an orthonormal basis of $W_j^{int}$. Table 7.5 gives the values of these coefficients for $p = 2$.

For any $2^J \le (2p)^{-1}$ the multiresolution properties prove that
$$L^2[0,1] = V_J^{int} \oplus \Big( \bigoplus_{j=-\infty}^{J} W_j^{int} \Big),$$
which implies that
$$\Big[ \{\phi^{int}_{J,n}\}_{0 \le n < 2^{-J}},\ \{\psi^{int}_{j,n}\}_{-\infty < j \le J,\ 0 \le n < 2^{-j}} \Big] \tag{7.220}$$
is an orthonormal wavelet basis of $L^2[0,1]$. The boundary wavelets, like the inside wavelets, have $p$ vanishing moments because polynomials of degree $p-1$ are included in the space $V_J^{int}$. Figure 7.20 displays the $2p = 4$ boundary scaling functions and wavelets.
[Figure 7.20, eight panels: $\phi_0^{left}(t)$, $\phi_1^{left}(t)$, $\phi_0^{right}(t)$, $\phi_1^{right}(t)$, $\psi_0^{left}(t)$, $\psi_1^{left}(t)$, $\psi_0^{right}(t)$, $\psi_1^{right}(t)$]

Figure 7.20: Boundary scaling functions and wavelets with $p = 2$ vanishing moments.

Fast Discrete Algorithm For any $f \in L^2[0,1]$ we denote
$$a_j[n] = \langle f, \phi^{int}_{j,n} \rangle \quad \text{and} \quad d_j[n] = \langle f, \psi^{int}_{j,n} \rangle \quad \text{for } 0 \le n < 2^{-j}.$$
Wavelet coefficients are computed with a cascade of convolutions identical to Theorem 7.7 as long as filters do not overlap the signal boundaries. A Daubechies filter $h$ is considered here to have a support located at $[-p+1, p]$. At the boundary, the usual Daubechies filters are replaced by the boundary filters that relate the boundary wavelets and scaling functions to the finer-scale scaling functions in (7.218) and (7.219).

Theorem 7.17 (Cohen, Daubechies, Vial)
If $0 \le k < p$
$$a_j[k] = \sum_{l=0}^{p-1} H^{left}_{k,l}\, a_{j-1}[l] + \sum_{m=p}^{p+2k} h^{left}_{k,m}\, a_{j-1}[m],$$
$$d_j[k] = \sum_{l=0}^{p-1} G^{left}_{k,l}\, a_{j-1}[l] + \sum_{m=p}^{p+2k} g^{left}_{k,m}\, a_{j-1}[m].$$
If $p \le k < 2^{-j} - p$
$$a_j[k] = \sum_{l=-\infty}^{+\infty} h[l-2k]\, a_{j-1}[l],$$
$$d_j[k] = \sum_{l=-\infty}^{+\infty} g[l-2k]\, a_{j-1}[l].$$
If $-p \le k < 0$
$$a_j[2^{-j}+k] = \sum_{l=-p}^{-1} H^{right}_{k,l}\, a_{j-1}[2^{-j+1}+l] + \sum_{m=-p+2k+1}^{-p-1} h^{right}_{k,m}\, a_{j-1}[2^{-j+1}+m],$$
$$d_j[2^{-j}+k] = \sum_{l=-p}^{-1} G^{right}_{k,l}\, a_{j-1}[2^{-j+1}+l] + \sum_{m=-p+2k+1}^{-p-1} g^{right}_{k,m}\, a_{j-1}[2^{-j+1}+m].$$

This cascade algorithm decomposes $a_L$ into a discrete wavelet transform $[a_J, \{d_j\}_{L < j \le J}]$ with $O(N)$ operations. The maximum scale must satisfy $2^J \le (2p)^{-1}$, because the number of boundary coefficients remains equal to $2p$ at all scales. The implementation is more complicated than the folding and periodic algorithms described in Sections 7.5.1 and 7.5.2, but does not require more computations. The signal $a_L$ is reconstructed from its wavelet coefficients, by inverting the decomposition formula in Theorem 7.17.

Theorem 7.18 (Cohen, Daubechies, Vial)
If $0 \le l \le p-1$
$$a_{j-1}[l] = \sum_{k=0}^{p-1} H^{left}_{k,l}\, a_j[k] + \sum_{k=0}^{p-1} G^{left}_{k,l}\, d_j[k].$$
If $p \le l \le 3p-2$
$$a_{j-1}[l] = \sum_{k=(l-p)/2}^{p-1} h^{left}_{k,l}\, a_j[k] + \sum_{k=-\infty}^{+\infty} h[l-2k]\, a_j[k] + \sum_{k=(l-p)/2}^{p-1} g^{left}_{k,l}\, d_j[k] + \sum_{k=-\infty}^{+\infty} g[l-2k]\, d_j[k].$$
If $3p-1 \le l \le 2^{-j+1} - 3p$
$$a_{j-1}[l] = \sum_{k=-\infty}^{+\infty} h[l-2k]\, a_j[k] + \sum_{k=-\infty}^{+\infty} g[l-2k]\, d_j[k].$$
If $-p-1 \ge l \ge -3p+1$
$$a_{j-1}[2^{-j+1}+l] = \sum_{k=-p}^{(l+p-1)/2} h^{right}_{k,l}\, a_j[2^{-j}+k] + \sum_{k=-\infty}^{+\infty} h[l-2k]\, a_j[2^{-j}+k] + \sum_{k=-p}^{(l+p-1)/2} g^{right}_{k,l}\, d_j[2^{-j}+k] + \sum_{k=-\infty}^{+\infty} g[l-2k]\, d_j[2^{-j}+k].$$
If $-1 \ge l \ge -p$
$$a_{j-1}[2^{-j+1}+l] = \sum_{k=-p}^{-1} H^{right}_{k,l}\, a_j[2^{-j}+k] + \sum_{k=-p}^{-1} G^{right}_{k,l}\, d_j[2^{-j}+k].$$

The original signal $a_L$ is reconstructed from the orthogonal wavelet representation $[a_J, \{d_j\}_{L < j \le J}]$ by iterating these equations for $L < j \le J$. This reconstruction is performed with $O(N)$ operations.

7.6 Multiscale Interpolations ²

Multiresolution approximations are closely connected to the generalized interpolations and sampling theorems studied in Section 3.1.3. The next section constructs general classes of interpolation functions from orthogonal scaling functions and derives new sampling theorems. Interpolation bases have the advantage of easily computing the decomposition coefficients from the sample values of the signal. Section 7.6.2 constructs interpolation wavelet bases.

7.6.1 Interpolation and Sampling Theorems

Section 3.1.3 explains that a sampling scheme approximates a signal by its orthogonal projection onto a space $U_T$ and samples this projection at intervals $T$. The space $U_T$ is constructed so that any function in $U_T$ can be recovered by interpolating a uniform sampling at intervals $T$. We relate the construction of interpolation functions to orthogonal scaling functions and compute the orthogonal projector on $U_T$.

We call interpolation function any $\phi$ such that $\{\phi(t-n)\}_{n \in \mathbb{Z}}$ is a Riesz basis of the space $U_1$ it generates, and which satisfies
$$\phi(n) = \begin{cases} 1 & \text{if } n = 0 \\ 0 & \text{if } n \neq 0 \end{cases}. \tag{7.221}$$
Any $f \in U_1$ is recovered by interpolating its samples $f(n)$:
$$f(t) = \sum_{n=-\infty}^{+\infty} f(n)\, \phi(t-n). \tag{7.222}$$
Indeed, we know that $f$ is a linear combination of the basis vectors $\{\phi(t-n)\}_{n \in \mathbb{Z}}$ and the interpolation property (7.221) yields (7.222).
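
As a small illustration of (7.221) and (7.222), the sketch below uses the piecewise linear tent function, which equals 1 at 0 and vanishes at every other integer. The function names and the `support` parameter are hypothetical choices made for this example, and the sample values are arbitrary.

```python
import math


def tent(t):
    """Piecewise linear tent function: 1 at t = 0, 0 at all other integers."""
    return max(0.0, 1.0 - abs(t))


def interpolate(samples, t, phi=tent, support=1):
    """Evaluate sum_n samples[n] * phi(t - n), as in (7.222).

    `samples` maps integers n to f(n). `support` is an assumed half-width
    outside of which phi vanishes, so only nearby samples are visited.
    """
    center = math.floor(t)
    return sum(samples.get(n, 0.0) * phi(t - n)
               for n in range(center - support, center + support + 2))


# Samples of f(t) = 2t + 1 on the integers -5..5.
samples = {n: 2.0 * n + 1.0 for n in range(-5, 6)}
value_between = interpolate(samples, 1.25)   # evaluated between two samples
value_on_grid = interpolate(samples, 2.0)    # reproduces the sample f(2)
print(value_between, value_on_grid)
```

Because the tent function only has short support, each evaluation touches at most two non-zero terms; an interpolation function with infinite support, such as the sinc, would need a truncation rule instead.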
The Whittaker sampling Theorem 3.1 is based on the interpolation function
$$\phi(t) = \frac{\sin \pi t}{\pi t}.$$
In this case, the space $U_1$ is the set of functions whose Fourier transforms are included in $[-\pi, \pi]$.

Scaling an interpolation function yields a new interpolation for a different sampling interval. Let us define $\phi_T(t) = \phi(t/T)$ and
$$U_T = \big\{ f \in L^2(\mathbb{R}) \text{ with } f(Tt) \in U_1 \big\}.$$
One can verify that any $f \in U_T$ can be written
$$f(t) = \sum_{n=-\infty}^{+\infty} f(nT)\, \phi_T(t - nT). \tag{7.223}$$

Scaling Autocorrelation We denote by $\phi_o$ an orthogonal scaling function, defined by the fact that $\{\phi_o(t-n)\}_{n \in \mathbb{Z}}$ is an orthonormal basis of a space $V_0$ of a multiresolution approximation. Theorem 7.2 proves that this scaling function is characterized by a conjugate mirror filter $h_o$. The following theorem defines an interpolation function from the autocorrelation of $\phi_o$ [302].

Theorem 7.19 Let $\bar\phi_o(t) = \phi_o(-t)$ and $\bar h_o[n] = h_o[-n]$. If $|\hat\phi_o(\omega)| = O((1+|\omega|)^{-1})$ then
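$$\phi(t) = \int_{-\infty}^{+\infty} \phi_o(u)\, \phi_o(u-t)\, du = \phi_o \star \bar\phi_o(t) \tag{7.224}$$
is an interpolation function. Moreover
$$\phi\Big( \frac{t}{2} \Big) = \sum_{n=-\infty}^{+\infty} h[n]\, \phi(t-n) \tag{7.225}$$
with
$$h[n] = \sum_{m=-\infty}^{+\infty} h_o[m]\, h_o[m-n] = h_o \star \bar h_o[n]. \tag{7.226}$$

Proof³. Observe first that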
$$\phi(n) = \langle \phi_o(t), \phi_o(t-n) \rangle = \delta[n],$$
which proves the interpolation property (7.221). To prove that $\{\phi(t-n)\}_{n \in \mathbb{Z}}$ is a Riesz basis of the space $U_1$ it generates, we verify the condition (7.10). The autocorrelation $\phi(t) = \phi_o \star \bar\phi_o(t)$ has a Fourier transform $\hat\phi(\omega) = |\hat\phi_o(\omega)|^2$. Condition (7.10) thus means that there exist $A > 0$ and $B > 0$ such that
$$\forall \omega \in [-\pi, \pi], \quad \frac{1}{B} \le \sum_{k=-\infty}^{+\infty} |\hat\phi_o(\omega - 2k\pi)|^4 \le \frac{1}{A}. \tag{7.227}$$
We proved in (7.19) that the orthogonality of a family $\{\phi_o(t-n)\}_{n \in \mathbb{Z}}$ is equivalent to
$$\forall \omega \in [-\pi, \pi], \quad \sum_{k=-\infty}^{+\infty} |\hat\phi_o(\omega + 2k\pi)|^2 = 1. \tag{7.228}$$
The right inequality of (7.227) is therefore valid for $A = 1$. Let us prove the left inequality. Since $|\hat\phi_o(\omega)| = O((1+|\omega|)^{-1})$, one can verify that there exists $K > 0$ such that for all $\omega \in [-\pi, \pi]$, $\sum_{|k| > K} |\hat\phi_o(\omega + 2k\pi)|^2 < 1/2$, so (7.228) implies that $\sum_{k=-K}^{K} |\hat\phi_o(\omega + 2k\pi)|^2 \ge 1/2$. Applying the Cauchy-Schwarz inequality to these $2K+1$ terms, it follows that
$$\sum_{k=-K}^{K} |\hat\phi_o(\omega + 2k\pi)|^4 \ge \frac{1}{4(2K+1)},$$
which proves (7.227) for $B = 4(2K+1)$.
Since $\phi_o$ is a scaling function, (7.28) proves that there exists a conjugate mirror filter $h_o$ such that
$$\frac{1}{\sqrt{2}}\, \phi_o\Big( \frac{t}{2} \Big) = \sum_{n=-\infty}^{+\infty} h_o[n]\, \phi_o(t-n).$$
Computing $\phi(t) = \phi_o \star \bar\phi_o(t)$ yields (7.225) with $h[n] = h_o \star \bar h_o[n]$.

Theorem 7.19 proves that the autocorrelation of an orthogonal scaling function $\phi_o$ is an interpolation function $\phi$ that also satisfies a scaling equation. One can design $\phi$ to approximate regular signals efficiently by their orthogonal projection in $U_T$. Definition 6.1 measures the regularity of $f$ with a Lipschitz exponent, which depends on the difference between $f$ and its Taylor polynomial expansion. The following proposition gives a condition for recovering polynomials by interpolating their samples with $\phi$. It derives an upper bound for the error when approximating $f$ by its orthogonal projection in $U_T$.

Proposition 7.7 (Fix, Strang) Any polynomial $q(t)$ of degree smaller or equal to $p-1$ is decomposed into
$$q(t) = \sum_{n=-\infty}^{+\infty} q(n)\, \phi(t-n) \tag{7.229}$$
if and only if $\hat h(\omega)$ has a zero of order $p$ at $\omega = \pi$.
Suppose that this property is satisfied. If $f$ has a compact support and is uniformly Lipschitz $\alpha \le p$ then there exists $C > 0$ such that
$$\forall T > 0, \quad \| f - P_{U_T} f \| \le C\, T^{\alpha}. \tag{7.230}$$

Proof³. The main steps of the proof are given, without technical detail.
Let us set $T = 2^j$. One can verify that the spaces $\{V_j = U_{2^j}\}_{j \in \mathbb{Z}}$ define a multiresolution approximation of $L^2(\mathbb{R})$. The Riesz basis of $V_0$ required by Definition 7.1 is obtained with $\theta = \phi$. This basis is orthogonalized by Theorem 7.1 to obtain an orthogonal basis of scaling functions. Theorem 7.3 derives a wavelet orthonormal basis $\{\psi_{j,n}\}_{(j,n) \in \mathbb{Z}^2}$ of $L^2(\mathbb{R})$.

Using Theorem 7.4, one can verify that $\psi$ has $p$ vanishing moments if and only if $\hat h(\omega)$ has $p$ zeros at $\pi$. Although $\phi$ is not the orthogonal scaling function, the Fix-Strang condition (7.75) remains valid. It is thus also equivalent that for $k < p$
$$q_k(t) = \sum_{n=-\infty}^{+\infty} n^k\, \phi(t-n)$$
is a polynomial of degree $k$. The interpolation property (7.222) implies that $q_k(n) = n^k$ for all $n \in \mathbb{Z}$ so $q_k(t) = t^k$. Since $\{t^k\}_{0 \le k < p}$ is a basis for polynomials of degree $p-1$, any polynomial $q(t)$ of degree $p-1$ can be decomposed over $\{\phi(t-n)\}_{n \in \mathbb{Z}}$ if and only if $\hat h(\omega)$ has $p$ zeros at $\pi$.

We indicate how to prove (7.230) for $T = 2^j$. The truncated family of wavelets $\{\psi_{l,n}\}_{l \le j,\, n \in \mathbb{Z}}$ is an orthogonal basis of the orthogonal complement of $U_{2^j} = V_j$ in $L^2(\mathbb{R})$. Hence
$$\| f - P_{U_{2^j}} f \|^2 = \sum_{l=-\infty}^{j} \sum_{n=-\infty}^{+\infty} |\langle f, \psi_{l,n} \rangle|^2.$$
If $f$ is uniformly Lipschitz $\alpha$, since $\psi$ has $p$ vanishing moments, Theorem 6.3 proves that there exists $A > 0$ such that
$$|Wf(2^l n, 2^l)| = |\langle f, \psi_{l,n} \rangle| \le A\, 2^{(\alpha + 1/2) l}.$$
To simplify the argument we suppose that $\psi$ has a compact support, although this is not required. Since $f$ also has a compact support, one can verify that the number of non-zero $\langle f, \psi_{l,n} \rangle$ is bounded by $K\, 2^{-l}$ for some $K > 0$. Hence
$$\| f - P_{U_{2^j}} f \|^2 \le \sum_{l=-\infty}^{j} K\, 2^{-l} A^2\, 2^{(2\alpha+1) l} \le \frac{K A^2}{1 - 2^{-\alpha}}\, 2^{2 \alpha j},$$
which proves (7.230) for $T = 2^j$.

As long as $\alpha \le p$, the larger the Lipschitz exponent $\alpha$ the faster the error $\| f - P_{U_T} f \|$ decays to zero when the sampling interval $T$ decreases. If a signal $f$ is $C^k$ with a compact support then it is uniformly Lipschitz $k$, so Proposition 7.7 proves that $\| f - P_{U_T} f \| \le C\, T^k$.

Example 7.12 A cubic spline interpolation function is obtained from the linear spline scaling function $\phi_o$. The Fourier transform expression
(7.5) yields
$$\hat\phi(\omega) = |\hat\phi_o(\omega)|^2 = \frac{48\, \sin^4(\omega/2)}{\omega^4\, (1 + 2\cos^2(\omega/2))}. \tag{7.231}$$
Figure 7.21(a) gives the graph of $\phi$, which has an infinite support but exponential decay. With Proposition 7.7 one can verify that this interpolation function recovers polynomials of degree 3 from a uniform sampling. The performance of spline interpolation functions for generalized sampling theorems is studied in [123, 335].

Figure 7.21: (a): Cubic spline interpolation function. (b): Deslauriers-Dubuc interpolation function of degree 3.

Example 7.13 Deslauriers-Dubuc [155] interpolation functions of degree $2p-1$ are compactly supported interpolation functions of minimal
size that decompose polynomials of degree $2p-1$. One can verify that such an interpolation function is the autocorrelation of a scaling function $\phi_o$. To reproduce polynomials of degree $2p-1$, Proposition 7.7 proves that $\hat h(\omega)$ must have a zero of order $2p$ at $\pi$. Since $h[n] = h_o \star \bar h_o[n]$ it follows that $\hat h(\omega) = |\hat h_o(\omega)|^2$, and hence $\hat h_o(\omega)$ has a zero of order $p$ at $\pi$. Daubechies's Theorem 7.5 designs minimum size conjugate mirror filters $h_o$ which satisfy this condition. Daubechies filters $h_o$ have $2p$ non-zero coefficients and the resulting scaling function $\phi_o$ has a support of size $2p-1$. The autocorrelation $\phi$ is the Deslauriers-Dubuc interpolation function, whose support is $[-2p+1, 2p-1]$.

For $p = 1$, $\phi_o = 1_{[0,1]}$ and $\phi$ is the piecewise linear tent function whose support is $[-1, 1]$. For $p = 2$, the Deslauriers-Dubuc interpolation function $\phi$ is the autocorrelation of the Daubechies 2 scaling function, shown in Figure 7.10. The graph of this interpolation function is in Figure 7.21(b). Polynomials of degree $2p-1 = 3$ are interpolated by this function.
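
For $p = 2$ the autocorrelation filter of (7.226) can be computed directly from the four inside coefficients listed at the bottom of Table 7.5. The following sketch assumes those printed values and the index convention $h_o[-1], \ldots, h_o[2]$; the function name is illustrative.

```python
# Daubechies filter h_o[-1..2] for p = 2, as printed in Table 7.5.
h_o = {-1: 0.482962913145, 0: 0.836516303738,
       1: 0.224143868042, 2: -0.129409522551}


def autocorrelation(filt):
    """Return h[n] = sum_m filt[m] * filt[m - n], following (7.226)."""
    lo, hi = min(filt), max(filt)
    return {n: sum(filt[m] * filt.get(m - n, 0.0) for m in filt)
            for n in range(lo - hi, hi - lo + 1)}


h = autocorrelation(h_o)
for n in sorted(h):
    print(n, round(h[n], 6))
```

Up to the rounding of the printed coefficients, the result is supported on $[-3, 3]$ with $h[0] = 1$, $h[\pm 1] = 9/16$, $h[\pm 2] = 0$ and $h[\pm 3] = -1/16$, which is the Deslauriers-Dubuc filter for $p = 2$.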
The scaling equation (7.225) implies that any autocorrelation filter verifies $h[2n] = 0$ for $n \neq 0$. For any $p > 0$, the non-zero values of the resulting filter are calculated from the coefficients of the polynomial (7.173) that is factored to synthesize Daubechies filters. The support of $h$ is $[-2p+1, 2p-1]$ and
$$h[2n+1] = (-1)^{p-n}\, \frac{\prod_{k=0}^{2p-1} (k - p + 1/2)}{(n + 1/2)\, (p-n-1)!\, (p+n)!} \quad \text{for } -p \le n < p. \tag{7.232}$$

Dual Basis If $f \notin U_T$ then it is approximated by its orthogonal projection $P_{U_T} f$ on $U_T$ before the samples at intervals $T$ are recorded. This orthogonal projection is computed with a biorthogonal basis $\{\tilde\phi_T(t - nT)\}_{n \in \mathbb{Z}}$, which is calculated by the following theorem [75].

Theorem 7.20 Let $\phi$ be an interpolation function. We define $\tilde\phi$ to be the function whose Fourier transform is
$$\hat{\tilde\phi}(\omega) = \frac{\hat\phi(\omega)}{\sum_{k=-\infty}^{+\infty} |\hat\phi(\omega + 2k\pi)|^2}. \tag{7.233}$$
Let $\tilde\phi_T(t) = T^{-1}\, \tilde\phi(T^{-1} t)$. Then the family $\{\tilde\phi_T(t - nT)\}_{n \in \mathbb{Z}}$ is the biorthogonal basis of $\{\phi_T(t - nT)\}_{n \in \mathbb{Z}}$ in $U_T$.

Proof³. Let us set $T = 1$. Since
$$\hat{\tilde\phi}(\omega) = \hat a(\omega)\, \hat\phi(\omega) \tag{7.234}$$
where $\hat a(\omega) \in L^2[-\pi, \pi]$ is $2\pi$ periodic, we derive as in (7.12) that $\tilde\phi \in U_1$ and hence that $\tilde\phi(t-n) \in U_1$ for any $n \in \mathbb{Z}$. A dual Riesz basis is unique and characterized by biorthogonality relations. Let $\bar\phi(t) = \phi(-t)$. For all $(n, m) \in \mathbb{Z}^2$, we must prove that
$$\langle \phi(t-n), \tilde\phi(t-m) \rangle = \tilde\phi \star \bar\phi(n-m) = \delta[n-m]. \tag{7.235}$$
Since the Fourier transform of $\tilde\phi \star \bar\phi(t)$ is $\hat{\tilde\phi}(\omega)\, \hat\phi^*(\omega)$, the Fourier transform of the biorthogonality conditions (7.235) yields
$$\sum_{k=-\infty}^{+\infty} \hat{\tilde\phi}(\omega + 2k\pi)\, \hat\phi^*(\omega + 2k\pi) = 1.$$
This equation is clearly satisfied for $\hat{\tilde\phi}$ defined by (7.233). The family $\{\tilde\phi(t-n)\}_{n \in \mathbb{Z}}$ is therefore the dual Riesz basis of $\{\phi(t-n)\}_{n \in \mathbb{Z}}$. The extension for any $T > 0$ is easily derived.

Figure 7.22 gives the graph of the cubic spline $\tilde\phi$ associated to the cubic spline interpolation function. The orthogonal projection of $f$ over $U_T$ is computed by decomposing $f$ in the biorthogonal bases
$$P_{U_T} f(t) = \sum_{n=-\infty}^{+\infty} \langle f(u), \tilde\phi_T(u - nT) \rangle\, \phi_T(t - nT). \tag{7.236}$$
Let $\bar{\tilde\phi}_T(t) = \tilde\phi_T(-t)$. The interpolation property (7.221) implies that
$$P_{U_T} f(nT) = \langle f(u), \tilde\phi_T(u - nT) \rangle = f \star \bar{\tilde\phi}_T(nT). \tag{7.237}$$
This discretization of $f$ through a projection onto $U_T$ is therefore obtained by a filtering with $\bar{\tilde\phi}_T$ followed by a uniform sampling at intervals $T$. The best linear approximation of $f$ is recovered with the interpolation formula (7.236).

7.6.2 Interpolation Wavelet Basis ³

An interpolation function $\phi$ can recover a signal $f$ from a uniform sampling $\{f(nT)\}_{n \in \mathbb{Z}}$ if $f$ belongs to an appropriate subspace $U_T$ of $L^2(\mathbb{R})$.
Donoho [162] has extended this approach by constructing interpolation wavelet bases of the whole space of uniformly continuous signals, with the sup norm. The decomposition coefficients are calculated from sample values instead of inner product integrals.

Figure 7.22: The dual cubic spline $\tilde\phi(t)$ associated to the spline interpolation function $\phi(t)$ shown in Figure 7.21(a).

Subdivision Scheme Let $\phi$ be an interpolation function, which is the autocorrelation of an orthogonal scaling function $\phi_o$. Let $\phi_{j,n}(t) = \phi(2^{-j} t - n)$. The constant $2^{-j/2}$ that normalizes the energy of $\phi_{j,n}$ is not added because we shall use a sup norm $\|f\|_\infty = \sup_{t \in \mathbb{R}} |f(t)|$ instead of the $L^2(\mathbb{R})$ norm, and
$$\|\phi_{j,n}\|_\infty = \|\phi\|_\infty = |\phi(0)| = 1.$$
We define the interpolation space $V_j$ of functions
$$g = \sum_{n=-\infty}^{+\infty} a[n]\, \phi_{j,n}$$
where $a[n]$ has at most a polynomial growth in $n$. Since $\phi$ is an interpolation function, $a[n] = g(2^j n)$. This space $V_j$ is not included in $L^2(\mathbb{R})$ since $a[n]$ may not have a finite energy. The scaling equation (7.225) implies that $V_{j+1} \subset V_j$ for any $j \in \mathbb{Z}$. If the autocorrelation filter $h$ has a Fourier transform $\hat h(\omega)$ which has a zero of order $p$ at $\omega = \pi$, then Proposition 7.7 proves that polynomials of degree smaller than or equal to $p-1$ are included in $V_j$.

For $f \notin V_j$, we define a simple projector on $V_j$ that interpolates the dyadic samples $f(2^j n)$:
$$P_{V_j} f(t) = \sum_{n=-\infty}^{+\infty} f(2^j n)\, \phi_j(t - 2^j n), \tag{7.238}$$
where $\phi_j(t) = \phi(2^{-j} t)$, so that $\phi_j(t - 2^j n) = \phi_{j,n}(t)$. This projector has no orthogonality property but satisfies $P_{V_j} f(2^j n) = f(2^j n)$. Let $C_0$ be the space of functions that are uniformly continuous over $\mathbb{R}$. The following theorem proves that any $f \in C_0$ can be approximated with an arbitrary precision by $P_{V_j} f$ when $2^j$ goes to zero.

Theorem 7.21 (Donoho) Suppose that $\phi$ has an exponential decay.
If $f \in C_0$ then
$$\lim_{j \to -\infty} \| f - P_{V_j} f \|_\infty = \lim_{j \to -\infty} \sup_{t \in \mathbb{R}} |f(t) - P_{V_j} f(t)| = 0. \tag{7.239}$$

Proof³. Let $\omega(\delta, f)$ denote the modulus of continuity
$$\omega(\delta, f) = \sup_{|h| \le \delta}\ \sup_{t \in \mathbb{R}} |f(t+h) - f(t)|. \tag{7.240}$$
By definition, $f \in C_0$ if $\lim_{\delta \to 0} \omega(\delta, f) = 0$.
Any $t \in \mathbb{R}$ can be written $t = 2^j(n+h)$ with $n \in \mathbb{Z}$ and $|h| \le 1$. Since $P_{V_j} f(2^j n) = f(2^j n)$,
$$|f(2^j(n+h)) - P_{V_j} f(2^j(n+h))| \le |f(2^j(n+h)) - f(2^j n)| + |P_{V_j} f(2^j(n+h)) - P_{V_j} f(2^j n)| \le \omega(2^j, f) + \omega(2^j, P_{V_j} f).$$
The next lemma proves that $\omega(2^j, P_{V_j} f) \le C\, \omega(2^j, f)$ where $C$ is a constant independent of $j$ and $f$. Taking a sup over $t = 2^j(n+h)$ implies the final result:
$$\sup_{t \in \mathbb{R}} |f(t) - P_{V_j} f(t)| \le (1 + C)\, \omega(2^j, f) \to 0 \quad \text{when } j \to -\infty.$$

Lemma 7.3 There exists $C > 0$ such that for all $j \in \mathbb{Z}$ and $f \in C_0$
$$\omega(2^j, P_{V_j} f) \le C\, \omega(2^j, f). \tag{7.241}$$
Let us set $j = 0$. For $|h| \le 1$, a summation by parts gives
$$P_{V_0} f(t+h) - P_{V_0} f(t) = \sum_{n=-\infty}^{+\infty} (f(n+1) - f(n))\, \theta_h(t-n)$$
where
$$\theta_h(t) = \sum_{k=1}^{+\infty} \big( \phi(t+h-k) - \phi(t-k) \big).$$
Hence
$$|P_{V_0} f(t+h) - P_{V_0} f(t)| \le \sup_{n \in \mathbb{Z}} |f(n+1) - f(n)| \sum_{n=-\infty}^{+\infty} |\theta_h(t-n)|. \tag{7.242}$$
Since $\phi$ has an exponential decay, there exists a constant $C$ such that if $|h| \le 1$ and $t \in \mathbb{R}$ then $\sum_{n=-\infty}^{+\infty} |\theta_h(t-n)| \le C$. Taking a sup over $t$ in (7.242) proves that
$$\omega(1, P_{V_0} f) \le C \sup_{n \in \mathbb{Z}} |f(n+1) - f(n)| \le C\, \omega(1, f).$$
Scaling this result by $2^j$ yields (7.241).

Interpolation Wavelets The projection $P_{V_j} f(t)$ interpolates the values $f(2^j n)$. When reducing the scale by 2, we obtain a finer interpolation $P_{V_{j-1}} f(t)$ which also goes through the intermediate samples $f(2^j(n+1/2))$. This refinement can be obtained by adding "details" that compensate for the difference between $P_{V_j} f(2^j(n+1/2))$
and $f(2^j(n+1/2))$. To do this, we create a "detail" space $W_j$ that provides the values $f(t)$ at intermediate dyadic points $t = 2^j(n+1/2)$. This space is constructed from interpolation functions centered at these locations, namely $\phi_{j-1,2n+1}$. We call interpolation wavelets
$$\psi_{j,n} = \phi_{j-1,2n+1}.$$
Observe that $\psi_{j,n}(t) = \psi(2^{-j} t - n)$ with
$$\psi(t) = \phi(2t - 1).$$
The function $\psi$ is not truly a wavelet since it has no vanishing moment. However, we shall see that it plays the same role as a wavelet in this decomposition. We define $W_j$ to be the space of all sums $\sum_{n=-\infty}^{+\infty} a[n]\, \psi_{j,n}$. The following theorem proves that it is a (non-orthogonal) complement of $V_j$ in $V_{j-1}$.

Theorem 7.22 For any $j \in \mathbb{Z}$
$$V_{j-1} = V_j \oplus W_j.$$
If $f \in V_{j-1}$ then
$$f = \sum_{n=-\infty}^{+\infty} f(2^j n)\, \phi_{j,n} + \sum_{n=-\infty}^{+\infty} d_j[n]\, \psi_{j,n}$$
with
$$d_j[n] = f\big( 2^j(n+1/2) \big) - P_{V_j} f\big( 2^j(n+1/2) \big). \tag{7.243}$$

Proof³. Any $f \in V_{j-1}$ can be written
$$f = \sum_{n=-\infty}^{+\infty} f(2^{j-1} n)\, \phi_{j-1,n}.$$
The function $f - P_{V_j} f$ belongs to $V_{j-1}$ and vanishes at $\{2^j n\}_{n \in \mathbb{Z}}$. It can thus be decomposed over the intermediate interpolation functions $\phi_{j-1,2n+1} = \psi_{j,n}$:
$$f(t) - P_{V_j} f(t) = \sum_{n=-\infty}^{+\infty} d_j[n]\, \psi_{j,n}(t) \in W_j.$$
This proves that $V_{j-1} \subset V_j \oplus W_j$. By construction we know that $W_j \subset V_{j-1}$ so $V_{j-1} = V_j \oplus W_j$. Setting $t = 2^{j-1}(2n+1)$ in this formula also verifies (7.243).

Theorem 7.22 refines an interpolation from a coarse grid $2^j n$ to a finer grid $2^{j-1} n$ by adding "details" whose coefficients $d_j[n]$ are the interpolation errors $f(2^j(n+1/2)) - P_{V_j} f(2^j(n+1/2))$. The following theorem defines an interpolation wavelet basis of $C_0$ in the sense of uniform convergence.

Theorem 7.23 If $f \in C_0$ then
$$\lim_{\substack{m \to +\infty \\ l \to -\infty}} \Big\| f - \sum_{n=-m}^{m} f(2^J n)\, \phi_{J,n} - \sum_{j=l}^{J} \sum_{n=-m}^{m} d_j[n]\, \psi_{j,n} \Big\|_\infty = 0. \tag{7.244}$$

The formula (7.244) decomposes $f$ into a coarse interpolation at intervals $2^J$ plus layers of details that give the interpolation errors on successively finer dyadic grids. The proof is done by choosing $f$ to be a continuous function with a compact support, in which case (7.244) is derived from Theorem 7.22 and (7.239). The density of such functions in $C_0$ (for the sup norm) allows us to extend this result to any $f$ in $C_0$. We shall write
$$f = \sum_{n=-\infty}^{+\infty} f(2^J n)\, \phi_{J,n} + \sum_{j=-\infty}^{J} \sum_{n=-\infty}^{+\infty} d_j[n]\, \psi_{j,n},$$
which means that $[\{\phi_{J,n}\}_{n \in \mathbb{Z}}, \{\psi_{j,n}\}_{n \in \mathbb{Z},\, j \le J}]$ is a basis of $C_0$. In $L^2(\mathbb{R})$, "biorthogonal" scaling functions and wavelets are formally defined by
$$f(2^J n) = \langle f, \tilde\phi_{J,n} \rangle = \int_{-\infty}^{+\infty} f(t)\, \tilde\phi_{J,n}(t)\, dt,$$
$$d_j[n] = \langle f, \tilde\psi_{j,n} \rangle = \int_{-\infty}^{+\infty} f(t)\, \tilde\psi_{j,n}(t)\, dt. \tag{7.245}$$
Clearly $\tilde\phi_{J,n}(t) = \delta(t - 2^J n)$. Similarly, (7.243) and (7.238) imply that $\tilde\psi_{j,n}$ is a finite sum of Diracs. These dual scaling functions and wavelets do not have a finite energy, which illustrates the fact that $[\{\phi_{J,n}\}_{n \in \mathbb{Z}}, \{\psi_{j,n}\}_{n \in \mathbb{Z},\, j \le J}]$ is not a Riesz basis of $L^2(\mathbb{R})$.

If $\hat h(\omega)$ has $p$ zeros at $\pi$ then one can verify that $\tilde\psi_{j,n}$ has $p$ vanishing moments. With similar derivations as in the proof of (6.21) in Theorem 6.4, one can show that if $f$ is uniformly Lipschitz $\alpha \le p$ then there exists $A > 0$ such that
$$|\langle f, \tilde\psi_{j,n} \rangle| = |d_j[n]| \le A\, 2^{\alpha j}.$$
A regular signal yields small amplitude wavelet coefficients at fine scales. We can thus neglect these coefficients and still reconstruct a precise approximation of $f$.

Fast Calculations The interpolating wavelet transform of $f$ is calculated at scale $1 \ge 2^j > N^{-1} = 2^L$ from its sample values $\{f(N^{-1} n)\}_{n \in \mathbb{Z}}$. At each scale $2^j$, the values of $f$ in between samples $\{2^j n\}_{n \in \mathbb{Z}}$ are calculated with the interpolation (7.238):
$$P_{V_j} f\big( 2^j(n+1/2) \big) = \sum_{k=-\infty}^{+\infty} f(2^j k)\, \phi(n - k + 1/2) = \sum_{k=-\infty}^{+\infty} f(2^j k)\, h_i[n-k] \tag{7.246}$$
where the interpolation filter $h_i$ is a subsampling of the autocorrelation filter $h$ in (7.226):
$$h_i[n] = \phi(n + 1/2) = h[2n+1]. \tag{7.247}$$
The wavelet coefficients are computed with (7.243):
$$d_j[n] = f\big( 2^j(n+1/2) \big) - P_{V_j} f\big( 2^j(n+1/2) \big).$$
The reconstruction of $f(N^{-1} n)$ from the wavelet coefficients is performed recursively by recovering the samples $f(2^{j-1} n)$ from the coarser sampling $f(2^j n)$ with the interpolation (7.246) to which is added $d_j[n]$. If $h_i[n]$ is a finite filter of size $K$ and if $f$ has a support in $[0,1]$ then the decomposition and reconstruction algorithms require $KN$ multiplications and additions.
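
The decomposition and reconstruction described above can be sketched in a few lines. The code below builds $h_i[n] = h[2n+1]$ from the closed form (7.232), predicts midpoints with (7.246), and stores the prediction errors (7.243) as details. Periodic indexing at the ends of the signal is an assumption made here to keep the example short; it is not the boundary treatment of the text, and all function names are illustrative.

```python
import math
from math import factorial


def dd_interpolation_filter(p):
    """h_i[n] = h[2n+1] for -p <= n < p, computed from (7.232)."""
    prod = 1.0
    for k in range(2 * p):
        prod *= k - p + 0.5
    return {n: (-1) ** (p - n) * prod
               / ((n + 0.5) * factorial(p - n - 1) * factorial(p + n))
            for n in range(-p, p)}


def predict(coarse, h_i):
    """Midpoint values sum_k coarse[k] * h_i[n - k], as in (7.246).

    Indices wrap around periodically (an assumption of this sketch).
    """
    size = len(coarse)
    return [sum(c * coarse[(n - m) % size] for m, c in h_i.items())
            for n in range(size)]


def decompose(samples, levels, h_i):
    """Return the coarsest samples and the detail layers d_j of (7.243)."""
    details = []
    current = list(samples)
    for _ in range(levels):
        coarse, odd = current[0::2], current[1::2]
        pred = predict(coarse, h_i)
        details.append([o - q for o, q in zip(odd, pred)])
        current = coarse
    return current, details


def reconstruct(coarse, details, h_i):
    """Invert decompose: interpolate midpoints and add the details back."""
    current = list(coarse)
    for d in reversed(details):
        pred = predict(current, h_i)
        fine = [0.0] * (2 * len(current))
        fine[0::2] = current
        fine[1::2] = [q + e for q, e in zip(pred, d)]
        current = fine
    return current


h_i = dd_interpolation_filter(2)
signal = [math.sin(0.3 * i) for i in range(32)]
coarsest, layers = decompose(signal, 3, h_i)
restored = reconstruct(coarsest, layers, h_i)
print(h_i, max(abs(a - b) for a, b in zip(signal, restored)))
```

Each level visits every sample once with a filter of fixed size, which is consistent with the $KN$ operation count. Away from the wrapped ends, the details of a sampled cubic polynomial vanish for $p = 2$.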
A Deslauriers-Dubuc interpolation function $\phi$ has the shortest support while including polynomials of degree $2p-1$ in the spaces $V_j$. The corresponding interpolation filter $h_i[n]$ defined by (7.247) has $2p$ non-zero coefficients for $-p \le n < p$, which are calculated in (7.232). If $p = 2$ then $h_i[1] = h_i[-2] = -1/16$ and $h_i[0] = h_i[-1] = 9/16$. Suppose that $q(t)$ is a polynomial of degree smaller or equal to $2p-1$. Since $q = P_{V_j} q$, (7.246) implies a Lagrange interpolation formula
$$q\big( 2^j(n+1/2) \big) = \sum_{k=-\infty}^{+\infty} q(2^j k)\, h_i[n-k].$$
The Lagrange filter $h_i$ of size $2p$ is the shortest filter that recovers intermediate values of polynomials of degree $2p-1$ from a uniform sampling.

To restrict the wavelet interpolation bases to a finite interval $[0,1]$ while reproducing polynomials of degree $2p-1$, the filter $h_i$ is modified at the boundaries. Suppose that $f(N^{-1} n)$ is defined for $0 \le n < N$. When computing the interpolation
$$P_{V_j} f\big( 2^j(n+1/2) \big) = \sum_{k=-\infty}^{+\infty} f(2^j k)\, h_i[n-k],$$
if $n$ is too close to 0 or to $2^{-j} - 1$ then $h_i$ must be modified to ensure that the support of $h_i[n-k]$ remains inside $[0, 2^{-j} - 1]$. The interpolation $P_{V_j} f(2^j(n+1/2))$ is then calculated from the closest $2p$ samples $f(2^j k)$ for $2^j k \in [0,1]$. The new interpolation coefficients are computed in order to recover exactly all polynomials of degree $2p-1$ [324]. For $p = 2$, the problem occurs only at $n = 0$ and the appropriate boundary coefficients are
$$h_i[0] = \frac{5}{16}, \quad h_i[-1] = \frac{15}{16}, \quad h_i[-2] = \frac{-5}{16}, \quad h_i[-3] = \frac{1}{16}.$$
The symmetric boundary filter $h_i[-n]$ is used on the other side at $n = 2^{-j} - 1$.

7.7 Separable Wavelet Bases ¹

To any wavelet orthonormal basis $\{\psi_{j,n}\}_{(j,n) \in \mathbb{Z}^2}$ of $L^2(\mathbb{R})$, one can associate a separable wavelet orthonormal basis of $L^2(\mathbb{R}^2)$:
$$\big\{ \psi_{j_1,n_1}(x_1)\, \psi_{j_2,n_2}(x_2) \big\}_{(j_1, j_2, n_1, n_2) \in \mathbb{Z}^4}. \tag{7.248}$$
The functions $\psi_{j_1,n_1}(x_1)\, \psi_{j_2,n_2}(x_2)$ mix information at two different scales
$2^{j_1}$ and $2^{j_2}$ along $x_1$ and $x_2$, which we often want to avoid. Separable multiresolutions lead to another construction of separable wavelet bases whose elements are products of functions dilated at the same scale. These multiresolution approximations also have important applications in computer vision, where they are used to process images at different levels of details. Lower resolution images are indeed represented by fewer pixels and might still carry enough information to perform a recognition task.

Signal decompositions in separable wavelet bases are computed with a separable extension of the filter bank algorithm described in Section 7.7.3. Non-separable wavelets bases can also be constructed [78, 239] but they are used less often in image processing. Section 7.7.4 constructs separable wavelet bases in any dimension, and explains the corresponding fast wavelet transform algorithm.

7.7.1 Separable Multiresolutions

As in one dimension, the notion of resolution is formalized with orthogonal projections in spaces of various sizes. The approximation of an image $f(x_1, x_2)$ at the resolution $2^{-j}$ is defined as the orthogonal projection of $f$ on a space $V_j^2$ that is included in $L^2(\mathbb{R}^2)$. The space $V_j^2$ is the set of all approximations at the resolution $2^{-j}$. When the resolution decreases, the size of $V_j^2$ decreases as well. The formal definition of a multiresolution approximation $\{V_j^2\}_{j \in \mathbb{Z}}$ of $L^2(\mathbb{R}^2)$ is a straightforward extension of Definition 7.1 that specifies multiresolutions of $L^2(\mathbb{R})$. The same causality, completeness and scaling properties must be satisfied.

We consider the particular case of separable multiresolutions. Let $\{V_j\}_{j \in \mathbb{Z}}$ be a multiresolution of $L^2(\mathbb{R})$. A separable two-dimensional multiresolution is composed of the tensor product spaces
$$V_j^2 = V_j \otimes V_j. \tag{7.249}$$
The space $V_j^2$ is the set of finite energy functions $f(x_1, x_2)$ that are linear expansions of separable functions:
$$f(x_1, x_2) = \sum_{m=-\infty}^{+\infty} a[m]\, f_m(x_1)\, g_m(x_2) \quad \text{with } f_m \in V_j,\ g_m \in V_j.$$
Section A.5 reviews the properties of tensor products. If $\{V_j\}_{j \in \mathbb{Z}}$ is a multiresolution approximation of $L^2(\mathbb{R})$ then $\{V_j^2\}_{j \in \mathbb{Z}}$ is a multiresolution approximation of $L^2(\mathbb{R}^2)$.
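
Because $V_j^2 = V_j \otimes V_j$, a projection onto $V_j^2$ can be computed one variable at a time. The sketch below illustrates this for the piecewise constant multiresolution, chosen here only as the simplest case: going one scale coarser amounts to averaging pairs of samples along each axis. It works with plain averages, which differ from orthonormal-basis coefficients by a constant factor, and the function names are illustrative.

```python
def average_pairs(values):
    """One-dimensional step: replace each pair of samples by its mean."""
    return [(values[2 * i] + values[2 * i + 1]) / 2.0
            for i in range(len(values) // 2)]


def separable_coarsen(image):
    """Apply the 1-D step along each row, then along each column."""
    rows_done = [average_pairs(row) for row in image]
    columns = [list(col) for col in zip(*rows_done)]
    cols_done = [average_pairs(col) for col in columns]
    return [list(row) for row in zip(*cols_done)]


def block_coarsen(image):
    """Direct two-dimensional version: mean over each 2 x 2 square."""
    n1, n2 = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * i][2 * j] + image[2 * i][2 * j + 1]
              + image[2 * i + 1][2 * j] + image[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(n2)]
            for i in range(n1)]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
print(separable_coarsen(image))
print(block_coarsen(image))
```

Both routes give the same coarse image, which is the practical benefit of separability: a two-dimensional operator is replaced by two passes of a one-dimensional one.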
Theorem 7.1 demonstrates the existence of a scaling function $\phi$ such that $\{\phi_{j,m}\}_{m \in \mathbb{Z}}$ is an orthonormal basis of $V_j$. Since $V_j^2 = V_j \otimes V_j$, Theorem A.3 proves that for $x = (x_1, x_2)$ and $n = (n_1, n_2)$
$$\Big\{ \phi^2_{j,n}(x) = \phi_{j,n_1}(x_1)\, \phi_{j,n_2}(x_2) = \frac{1}{2^j}\, \phi\Big( \frac{x_1 - 2^j n_1}{2^j} \Big)\, \phi\Big( \frac{x_2 - 2^j n_2}{2^j} \Big) \Big\}_{n \in \mathbb{Z}^2}$$
is an orthonormal basis of $V_j^2$. It is obtained by scaling by $2^j$ the two-dimensional separable scaling function $\phi^2(x) = \phi(x_1)\, \phi(x_2)$ and translating it on a two-dimensional square grid with intervals $2^j$.

Example 7.14 Piecewise constant approximation Let $V_j$ be the approximation space of functions that are constant on $[2^j m, 2^j(m+1)]$ for any $m \in \mathbb{Z}$. The tensor product defines a two-dimensional piecewise constant approximation. The space $V_j^2$ is the set of functions that are constant on any square $[2^j n_1, 2^j(n_1+1)] \times [2^j n_2, 2^j(n_2+1)]$, for $(n_1, n_2) \in \mathbb{Z}^2$. The two-dimensional scaling function is
$$\phi^2(x) = \phi(x_1)\, \phi(x_2) = \begin{cases} 1 & \text{if } 0 \le x_1 \le 1 \text{ and } 0 \le x_2 \le 1 \\ 0 & \text{otherwise} \end{cases}.$$

Example 7.15 Shannon approximation Let $V_j$ be the space of functions whose Fourier transforms have a support included in $[-2^{-j}\pi, 2^{-j}\pi]$. The space $V_j^2$ is the set of functions whose two-dimensional Fourier transforms have a support included in the low-frequency square $[-2^{-j}\pi, 2^{-j}\pi] \times [-2^{-j}\pi, 2^{-j}\pi]$. The two-dimensional scaling function is a perfect two-dimensional low-pass filter whose Fourier transform is
$$\hat\phi(\omega_1)\, \hat\phi(\omega_2) = \begin{cases} 1 & \text{if } |\omega_1| \le 2^{-j}\pi \text{ and } |\omega_2| \le 2^{-j}\pi \\ 0 & \text{otherwise} \end{cases}.$$

Example 7.16 Spline approximation Let $V_j$ be the space of polynomial spline functions of degree $p$ that are $C^{p-1}$, with nodes located at $2^{-j} m$ for $m \in \mathbb{Z}$. The space $V_j^2$ is composed of two-dimensional polynomial spline functions that are $p-1$ times continuously differentiable. The restriction of $f(x_1, x_2) \in V_j^2$ to any square $[2^j n_1, 2^j(n_1+1)) \times [2^j n_2, 2^j(n_2+1))$ is a separable product $q_1(x_1)\, q_2(x_2)$ of two polynomials of degree at most $p$.

Multiresolution Vision An image of 512 by 512 pixels often includes too much information for real time vision processing. Multiresolution algorithms process less image data by selecting the relevant
details that are necessary to perform a particular recognition task 62].
The human visual system uses a similar strategy. The distribution of 7.7. SEPARABLE WAVELET BASES 409 photoreceptors on the retina is not uniform. The visual acuity is greatest at the center of the retina where the density of receptors is maximum. When moving apart from the center, the resolution decreases
proportionally to the distance from the retina center 305]. Figure 7.23: Multiresolution approximations aj n1 n2] of an image at
scales 2j , for ;5 j ;8.
The high resolution visual center is called the fovea. It is responsible for high acuity tasks such as reading or recognition. A retina
with a uniform resolution equal to the highest fovea resolution would
require about 10,000 times more photoreceptors. Such a uniform resolution retina would increase considerably the size of the optic nerve
that transmits the retina information to the visual cortex and the size
of the visual cortex that processes this data.
Active vision strategies [76] compensate for the non-uniformity of visual
resolution with eye saccades, which successively move the fovea over
regions of a scene with a high information content. These saccades
are partly guided by the lower resolution information gathered at the
periphery of the retina. This multiresolution sensor has the advantage
of providing high resolution information at selected locations, and a
large field of view, with relatively little data.
Multiresolution algorithms implement in software [107] the search for important high resolution data. A uniform high resolution image is measured by a camera, but only a small part of this information is processed. Figure 7.23 displays a pyramid of progressively lower resolution images calculated with a filter bank presented in Section 7.7.3. Coarse to fine algorithms first analyze the lower resolution image and selectively increase the resolution in regions where more details are needed. Such algorithms have been developed for object recognition and stereo calculations [196]. Section 11.5.1 explains how to compute velocity vectors in video sequences with a coarse to fine matching algorithm.

7.7.2 Two-Dimensional Wavelet Bases

A separable wavelet orthonormal basis of $L^2(\mathbb{R}^2)$ is constructed with separable products of a scaling function $\phi$ and a wavelet $\psi$. The scaling function $\phi$ is associated to a one-dimensional multiresolution approximation $\{V_j\}_{j\in\mathbb{Z}}$. Let $\{V_j^2\}_{j\in\mathbb{Z}}$ be the separable two-dimensional multiresolution defined by $V_j^2 = V_j \otimes V_j$. Let $W_j^2$ be the detail space equal to the orthogonal complement of the lower resolution approximation space $V_j^2$ in $V_{j-1}^2$:

$$V_{j-1}^2 = V_j^2 \oplus W_j^2. \tag{7.250}$$
To construct a wavelet orthonormal basis of $L^2(\mathbb{R}^2)$, the following theorem builds a wavelet basis of each detail space $W_j^2$.

Theorem 7.24 Let $\phi$ be a scaling function and $\psi$ be the corresponding wavelet generating a wavelet orthonormal basis of $L^2(\mathbb{R})$. We define three wavelets:

$$\psi^1(x) = \phi(x_1)\,\psi(x_2), \quad \psi^2(x) = \psi(x_1)\,\phi(x_2), \quad \psi^3(x) = \psi(x_1)\,\psi(x_2), \tag{7.251}$$

and denote for $1 \le k \le 3$

$$\psi^k_{j,n}(x) = \frac{1}{2^j}\,\psi^k\!\left(\frac{x_1 - 2^j n_1}{2^j},\ \frac{x_2 - 2^j n_2}{2^j}\right).$$

The wavelet family

$$\left\{\psi^1_{j,n},\ \psi^2_{j,n},\ \psi^3_{j,n}\right\}_{n\in\mathbb{Z}^2} \tag{7.252}$$

is an orthonormal basis of $W_j^2$ and

$$\left\{\psi^1_{j,n},\ \psi^2_{j,n},\ \psi^3_{j,n}\right\}_{(j,n)\in\mathbb{Z}^3} \tag{7.253}$$

is an orthonormal basis of $L^2(\mathbb{R}^2)$.

Proof $^1$. Equation (7.250) is rewritten

$$V_{j-1} \otimes V_{j-1} = (V_j \otimes V_j) \oplus W_j^2. \tag{7.254}$$

The one-dimensional multiresolution space $V_{j-1}$ can also be decomposed into $V_{j-1} = V_j \oplus W_j$. By inserting this in (7.254), the distributivity of $\otimes$ with respect to $\oplus$ proves that

$$W_j^2 = (V_j \otimes W_j) \oplus (W_j \otimes V_j) \oplus (W_j \otimes W_j). \tag{7.255}$$
Since $\{\phi_{j,m}\}_{m\in\mathbb{Z}}$ and $\{\psi_{j,m}\}_{m\in\mathbb{Z}}$ are orthonormal bases of $V_j$ and $W_j$, we derive that

$$\left\{\phi_{j,n_1}(x_1)\,\psi_{j,n_2}(x_2),\ \psi_{j,n_1}(x_1)\,\phi_{j,n_2}(x_2),\ \psi_{j,n_1}(x_1)\,\psi_{j,n_2}(x_2)\right\}_{(n_1,n_2)\in\mathbb{Z}^2}$$

is an orthonormal basis of $W_j^2$. As in the one-dimensional case, the overall space $L^2(\mathbb{R}^2)$ can be decomposed as an orthogonal sum of the detail spaces at all resolutions:

$$L^2(\mathbb{R}^2) = \bigoplus_{j=-\infty}^{+\infty} W_j^2. \tag{7.256}$$

Hence

$$\left\{\phi_{j,n_1}(x_1)\,\psi_{j,n_2}(x_2),\ \psi_{j,n_1}(x_1)\,\phi_{j,n_2}(x_2),\ \psi_{j,n_1}(x_1)\,\psi_{j,n_2}(x_2)\right\}_{(j,n_1,n_2)\in\mathbb{Z}^3}$$

is an orthonormal basis of $L^2(\mathbb{R}^2)$.

The three wavelets extract image details at different scales and orientations. Over positive frequencies, $\hat\phi$ and $\hat\psi$ have an energy mainly concentrated respectively on $[0,\pi]$ and $[\pi,2\pi]$. The separable wavelet expressions (7.251) imply that

$$\hat\psi^1(\omega_1,\omega_2) = \hat\phi(\omega_1)\,\hat\psi(\omega_2), \quad \hat\psi^2(\omega_1,\omega_2) = \hat\psi(\omega_1)\,\hat\phi(\omega_2)$$

and $\hat\psi^3(\omega_1,\omega_2) = \hat\psi(\omega_1)\,\hat\psi(\omega_2)$. Hence $|\hat\psi^1(\omega_1,\omega_2)|$ is large at low horizontal frequencies $\omega_1$ and high vertical frequencies $\omega_2$, whereas $|\hat\psi^2(\omega_1,\omega_2)|$ is large at high horizontal frequencies and low vertical frequencies, and $|\hat\psi^3(\omega_1,\omega_2)|$ is large at high horizontal and vertical frequencies. Figure 7.24 displays the Fourier transform of separable wavelets and scaling functions calculated from a one-dimensional Daubechies 4 wavelet.

Wavelet coefficients calculated with $\psi^1$ and $\psi^2$ are large along edges which are respectively horizontal and vertical. This is illustrated by the decomposition of a square in Figure 7.26. The wavelet $\psi^3$ produces large coefficients at the corners.
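The orientation selectivity of the three separable wavelets can be checked numerically. The following is a minimal sketch, not the book's WaveLab code: it assumes Haar filters (chosen only because they need no boundary handling), stores the image as an array `a[n1, n2]` with axis 0 playing the role of $x_1$, and uses the hypothetical helper names `haar_step_2d` and `count_large`.

```python
import numpy as np

def haar_step_2d(a):
    """One level of a separable Haar decomposition of a[n1, n2] (even sizes)."""
    s = np.sqrt(2.0)
    # Filter along n1 (axis 0): pairwise sum (low-pass) and difference (high-pass).
    lo = (a[0::2, :] + a[1::2, :]) / s
    hi = (a[1::2, :] - a[0::2, :]) / s
    # Filter along n2 (axis 1).
    approx = (lo[:, 0::2] + lo[:, 1::2]) / s
    d1 = (lo[:, 1::2] - lo[:, 0::2]) / s  # phi(x1) psi(x2): variations along x2
    d2 = (hi[:, 0::2] + hi[:, 1::2]) / s  # psi(x1) phi(x2): variations along x1
    d3 = (hi[:, 1::2] - hi[:, 0::2]) / s  # psi(x1) psi(x2): variations along both
    return approx, d1, d2, d3

def count_large(d, tol=1e-12):
    """Number of coefficients whose amplitude exceeds tol."""
    return int(np.count_nonzero(np.abs(d) > tol))

N = 8
# Edge image that is constant along x1 and jumps along x2 (a "horizontal" edge
# under the assumed convention that x1 is the horizontal coordinate).
edge = np.zeros((N, N))
edge[:, 3:] = 1.0
# White square on a black background, with sides that straddle sample pairs.
square = np.zeros((N, N))
square[3:7, 3:7] = 1.0

_, e1, e2, e3 = haar_step_2d(edge)
_, s1, s2, s3 = haar_step_2d(square)
print("edge:  ", count_large(e1), count_large(e2), count_large(e3))
print("square:", count_large(s1), count_large(s2), count_large(s3))
```

For the edge image only the $d^1$ band is non-zero, and for the square the $d^3$ band is non-zero only at the four corners, in line with the description above.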
[Figure 7.24: four surface plots over the frequency plane $(\omega_1,\omega_2) \in [-10,10]^2$, with amplitudes between 0 and 1, showing $|\hat\phi^2(\omega_1,\omega_2)|$, $|\hat\psi^1(\omega_1,\omega_2)|$, $|\hat\psi^2(\omega_1,\omega_2)|$ and $|\hat\psi^3(\omega_1,\omega_2)|$.]

Figure 7.24: Fourier transforms of a separable scaling function and of 3 separable wavelets calculated from a one-dimensional Daubechies 4 wavelet.

[Figure 7.25: tiling of the frequency plane $(\omega_1,\omega_2)$ by dyadic rectangles labeled $\hat\phi^2_j$, $\hat\psi^k_j$ and $\hat\psi^k_{j-1}$ for $k = 1, 2, 3$.]

Figure 7.25: These dyadic rectangles indicate the regions where the energy of $\hat\psi^k_{j,n}$ is mostly concentrated, for $1 \le k \le 3$. Image approximations at the scale $2^j$ are restricted to the lower frequency square.

Example 7.17 For a Shannon multiresolution approximation, the resulting two-dimensional wavelet basis paves the two-dimensional Fourier plane $(\omega_1,\omega_2)$ with dilated rectangles. The Fourier transforms $\hat\phi$ and $\hat\psi$ are the indicator functions respectively of $[-\pi,\pi]$ and $[-2\pi,-\pi] \cup [\pi,2\pi]$. The separable space $V_j^2$ contains functions whose two-dimensional Fourier transforms have a support included in the low-frequency square $[-2^{-j}\pi, 2^{-j}\pi] \times [-2^{-j}\pi, 2^{-j}\pi]$. This corresponds to the support of $\hat\phi^2_{j,n}$ indicated in Figure 7.25. The detail space $W_j^2$ is the orthogonal complement of $V_j^2$ in $V_{j-1}^2$ and thus includes functions whose Fourier transforms have a support in the frequency annulus between the two squares $[-2^{-j}\pi, 2^{-j}\pi] \times [-2^{-j}\pi, 2^{-j}\pi]$ and $[-2^{-j+1}\pi, 2^{-j+1}\pi] \times [-2^{-j+1}\pi, 2^{-j+1}\pi]$. As shown in Figure 7.25, this annulus is decomposed in three separable frequency regions, which are the Fourier supports of $\hat\psi^k_{j,n}$ for $1 \le k \le 3$. Dilating these supports at all scales $2^j$ yields an exact cover of the frequency plane $(\omega_1,\omega_2)$.
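As a quick consistency check of this cover (a worked computation added for illustration, with the shorthand $a = 2^{-j}\pi$ introduced here), the three Shannon supports at the scale $2^j$ are $\{|\omega_1| \le a,\ a \le |\omega_2| \le 2a\}$ for $\hat\psi^1_{j,n}$, $\{a \le |\omega_1| \le 2a,\ |\omega_2| \le a\}$ for $\hat\psi^2_{j,n}$, and $\{a \le |\omega_1| \le 2a,\ a \le |\omega_2| \le 2a\}$ for $\hat\psi^3_{j,n}$. Each region has total length $2a$ along each axis, so their areas add up to the area of the annulus between the squares of half-width $a$ and $2a$:

$$(2a)(2a) + (2a)(2a) + (2a)(2a) = 12\,a^2 = (4a)^2 - (2a)^2 .$$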
For general separable wavelet bases, Figure 7.25 gives only an indication of the domains where the energy of the different wavelets is concentrated. When the wavelets are constructed with a one-dimensional wavelet of compact support, the resulting Fourier transforms have side lobes that appear in Figure 7.24.

Example 7.18 Figure 7.26 gives two examples of wavelet transforms computed using separable Daubechies wavelets with $p = 4$ vanishing moments. They are calculated with the filter bank algorithm of Section 7.7.3. Coefficients of large amplitude in $d^1_j$, $d^2_j$ and $d^3_j$ correspond respectively to vertical high frequencies (horizontal edges), horizontal high frequencies (vertical edges), and high frequencies in both directions (corners). Regions where the image intensity varies smoothly yield nearly zero coefficients, shown in grey. The large number of nearly zero coefficients makes this representation particularly attractive for compact image coding.

Separable Biorthogonal Bases. One-dimensional biorthogonal wavelet
bases are extended to separable biorthogonal bases of $L^2(\mathbb{R}^2)$ with the same approach as in Theorem 7.24. Let $\phi$, $\psi$ and $\tilde\phi$, $\tilde\psi$ be two dual pairs of scaling functions and wavelets that generate biorthogonal wavelet bases of $L^2(\mathbb{R})$. The dual wavelets of $\psi^1$, $\psi^2$ and $\psi^3$ defined by (7.251) are

$$\tilde\psi^1(x) = \tilde\phi(x_1)\,\tilde\psi(x_2), \quad \tilde\psi^2(x) = \tilde\psi(x_1)\,\tilde\phi(x_2), \quad \tilde\psi^3(x) = \tilde\psi(x_1)\,\tilde\psi(x_2). \tag{7.257}$$

One can verify that

$$\left\{\psi^1_{j,n},\ \psi^2_{j,n},\ \psi^3_{j,n}\right\}_{(j,n)\in\mathbb{Z}^3} \tag{7.258}$$

and

$$\left\{\tilde\psi^1_{j,n},\ \tilde\psi^2_{j,n},\ \tilde\psi^3_{j,n}\right\}_{(j,n)\in\mathbb{Z}^3} \tag{7.259}$$

are biorthogonal Riesz bases of $L^2(\mathbb{R}^2)$.

[Figure 7.26: layout of the subimages $a_{L+3}$, $d^k_{L+3}$, $d^k_{L+2}$ and $d^k_{L+1}$ for $k = 1, 2, 3$, above the two transformed images.]

Figure 7.26: Separable wavelet transforms of Lena and of a white square in a black background, decomposed respectively on 3 and 4 octaves. Black, grey and white pixels correspond respectively to positive, zero and negative wavelet coefficients. The disposition of wavelet image coefficients $d^k_j[n,m] = \langle f, \psi^k_{j,n}\rangle$ is illustrated at the top.

7.7.3 Fast Two-Dimensional Wavelet Transform

The fast wavelet transform algorithm presented in Section 7.3.1 is extended in two dimensions. At all scales $2^j$ and for any $n = (n_1,n_2)$, we
denote

$$a_j[n] = \langle f, \phi^2_{j,n}\rangle \quad \text{and} \quad d^k_j[n] = \langle f, \psi^k_{j,n}\rangle \quad \text{for } 1 \le k \le 3.$$

For any pair of one-dimensional filters $y[m]$ and $z[m]$ we write the product filter $yz[n] = y[n_1]\,z[n_2]$, and $\bar y[m] = y[-m]$. Let $h[m]$ and $g[m]$ be the conjugate mirror filters associated to the wavelet $\psi$.

The wavelet coefficients at the scale $2^{j+1}$ are calculated from $a_j$ with two-dimensional separable convolutions and subsamplings. The decomposition formulas are obtained by applying the one-dimensional convolution formulas (7.108) and (7.107) of Theorem 7.7 to the separable two-dimensional wavelets and scaling functions for $n = (n_1,n_2)$:

$$a_{j+1}[n] = a_j \star \bar h \bar h\,[2n] \tag{7.260}$$
$$d^1_{j+1}[n] = a_j \star \bar h \bar g\,[2n] \tag{7.261}$$
$$d^2_{j+1}[n] = a_j \star \bar g \bar h\,[2n] \tag{7.262}$$
$$d^3_{j+1}[n] = a_j \star \bar g \bar g\,[2n]. \tag{7.263}$$

We showed in (3.54) that a separable two-dimensional convolution can
be factored into one-dimensional convolutions along the rows and columns of the image. With the factorization illustrated in Figure 7.27(a), these four convolution equations are computed with only six groups of one-dimensional convolutions. The rows of $a_j$ are first convolved with $\bar h$ and $\bar g$ and subsampled by 2. The columns of these two output images are then convolved respectively with $\bar h$ and $\bar g$ and subsampled, which gives the four subsampled images $a_{j+1}$, $d^1_{j+1}$, $d^2_{j+1}$ and $d^3_{j+1}$.

We denote by $\check y[n] = \check y[n_1,n_2]$ the image twice the size of $y[n]$, obtained by inserting a row of zeros and a column of zeros between pairs of consecutive rows and columns. The approximation $a_j$ is recovered from the coarser scale approximation $a_{j+1}$ and the wavelet coefficients $d^k_{j+1}$ with two-dimensional separable convolutions derived from the one-dimensional reconstruction formula (7.109)

$$a_j[n] = \check a_{j+1} \star hh[n] + \check d^1_{j+1} \star hg[n] + \check d^2_{j+1} \star gh[n] + \check d^3_{j+1} \star gg[n]. \tag{7.264}$$

These four separable convolutions can also be factored into six groups of one-dimensional convolutions along rows and columns, illustrated in Figure 7.27(b).
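The factorization into six groups of one-dimensional convolutions can be sketched as follows. This is an illustration under stated assumptions rather than the WaveLab implementation: it uses the 4-tap Daubechies conjugate mirror filter, periodic boundary handling, NumPy arrays `a[n1, n2]` with axis 0 standing for $n_1$, and the hypothetical helper names `analyze`, `synthesize`, `dwt2_step` and `idwt2_step`.

```python
import numpy as np

S3 = np.sqrt(3.0)
h = np.array([1 + S3, 3 + S3, 3 - S3, 1 - S3]) / (4 * np.sqrt(2.0))
# Filters stored as (index, value) pairs; g[n] = (-1)^(1-n) h[1-n].
H = [(k, h[k]) for k in range(4)]
G = [(1 - k, (-1) ** k * h[k]) for k in range(4)]

def analyze(x, filt, axis):
    """Compute sum_k filt[k] x[2n + k] along one axis (periodic), i.e. x * bar(filt) [2n]."""
    acc = sum(v * np.roll(x, -k, axis=axis) for k, v in filt)
    keep = [slice(None)] * x.ndim
    keep[axis] = slice(0, None, 2)
    return acc[tuple(keep)]

def synthesize(c, filt, axis):
    """Insert zeros along one axis, then convolve with filt (periodic)."""
    shape = list(c.shape)
    shape[axis] *= 2
    up = np.zeros(shape)
    even = [slice(None)] * c.ndim
    even[axis] = slice(0, None, 2)
    up[tuple(even)] = c
    return sum(v * np.roll(up, k, axis=axis) for k, v in filt)

def dwt2_step(a):
    """Equations (7.260)-(7.263): two filterings along n1, then four along n2."""
    lo, hi = analyze(a, H, 0), analyze(a, G, 0)
    return (analyze(lo, H, 1), analyze(lo, G, 1),   # a_{j+1}, d^1_{j+1}
            analyze(hi, H, 1), analyze(hi, G, 1))   # d^2_{j+1}, d^3_{j+1}

def idwt2_step(a1, d1, d2, d3):
    """Equation (7.264): four filterings along n2, then two along n1."""
    lo = synthesize(a1, H, 1) + synthesize(d1, G, 1)
    hi = synthesize(d2, H, 1) + synthesize(d3, G, 1)
    return synthesize(lo, H, 0) + synthesize(hi, G, 0)

rng = np.random.default_rng(0)
a = rng.standard_normal((16, 16))
bands = dwt2_step(a)
error = np.max(np.abs(idwt2_step(*bands) - a))
print("subband shapes:", [b.shape for b in bands])
print("max reconstruction error:", error)
```

With an orthogonal filter pair and periodic extension, the synthesis step is the transpose of the analysis step, so the reconstruction error stays at the level of floating-point rounding and the energy of `a` equals the summed energy of the four subbands.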
Let $b[n]$ be an input image whose pixels have a distance $2^L = N^{-1}$. We associate to $b[n]$ a function $f(x) \in V_L^2$ approximated at the scale $2^L$. Its coefficients $a_L[n] = \langle f, \phi^2_{L,n}\rangle$ are defined like in (7.116) by

$$b[n] = N\,a_L[n] \approx f(N^{-1} n). \tag{7.265}$$

The wavelet image representation of $a_L$ is computed by iterating (7.260-7.263) for $L \le j < J$:

$$\left[a_J,\ \{d^1_j, d^2_j, d^3_j\}_{L < j \le J}\right]. \tag{7.266}$$

The image $a_L$ is recovered from this wavelet representation by computing (7.264) for $J > j \ge L$.

Finite Image and Complexity. When $a_L$ is a finite image of $N^2$ pixels, we face boundary problems when computing the convolutions (7.260-7.264). Since the decomposition algorithm is separable along rows and columns, we use one of the three one-dimensional boundary techniques described in Section 7.5. The resulting values are decomposition coefficients in a wavelet basis of $L^2[0,1]^2$. Depending on the boundary treatment, this wavelet basis is a periodic basis, a folded basis or a boundary adapted basis.

[Figure 7.27: block diagrams of the filter banks. (a) Rows of $a_j$ filtered with $\bar h$, $\bar g$ and subsampled by 2, then columns filtered with $\bar h$, $\bar g$ and subsampled by 2, producing $a_{j+1}$, $d^1_{j+1}$, $d^2_{j+1}$, $d^3_{j+1}$. (b) Columns upsampled by 2 and filtered with $h$, $g$, then rows upsampled by 2 and filtered with $h$, $g$, and summed to produce $a_j$.]

Figure 7.27: (a): Decomposition of $a_j$ with 6 groups of one-dimensional convolutions and subsamplings along the image rows and columns. (b): Reconstruction of $a_j$ by inserting zeros between the rows and columns of $a_{j+1}$ and $d^k_{j+1}$, and filtering the output.
The resulting images $a_j$ and $d^k_j$ have $2^{-2j}$ samples. The images of the wavelet representation (7.266) thus include a total of $N^2$ samples. If $h$ and $g$ have size $K$, the reader can verify that $2K\,2^{-2(j-1)}$ multiplications and additions are needed to compute the four convolutions (7.260-7.263) with the factorization of Figure 7.27(a). Since $2^{-2L} = N^2$, summing this cost over the scales $L < j \le J$ gives a geometric series of ratio $1/4$, and the wavelet representation (7.266) is thus calculated with fewer than $\frac{8}{3} K N^2$ operations. The reconstruction of $a_L$ by factoring the reconstruction equation (7.264) requires the same number of operations.

Fast Biorthogonal Wavelet Transform. The decomposition of an image in a biorthogonal wavelet basis is performed with the same fast wavelet transform algorithm. Let $(\tilde h, \tilde g)$ be the perfect reconstruction filters associated to $(h, g)$. The inverse wavelet transform is computed by replacing the filters $(h, g)$ that appear in (7.264) by $(\tilde h, \tilde g)$.

7.7.4 Wavelet Bases in Higher Dimensions $^2$

Separable wavelet orthonormal bases of $L^2(\mathbb{R}^p)$ are constructed for any
$p \ge 2$, with a procedure similar to the two-dimensional extension. Let $\phi$ be a scaling function and $\psi$ a wavelet that yields an orthogonal basis of $L^2(\mathbb{R})$. We denote $\theta^0 = \phi$ and $\theta^1 = \psi$. To any integer $0 \le \epsilon < 2^p$ written in binary form $\epsilon = \epsilon_1 \ldots \epsilon_p$ we associate the $p$-dimensional functions defined in $x = (x_1, \ldots, x_p)$ by

$$\psi^\epsilon(x) = \theta^{\epsilon_1}(x_1) \cdots \theta^{\epsilon_p}(x_p).$$

For $\epsilon = 0$, we obtain a $p$-dimensional scaling function

$$\psi^0(x) = \phi(x_1) \cdots \phi(x_p).$$

Non-zero indexes $\epsilon$ correspond to $2^p - 1$ wavelets. At any scale $2^j$ and for $n = (n_1, \ldots, n_p)$ we denote

$$\psi^\epsilon_{j,n}(x) = 2^{-pj/2}\,\psi^\epsilon\!\left(\frac{x_1 - 2^j n_1}{2^j}, \ldots, \frac{x_p - 2^j n_p}{2^j}\right).$$

Theorem 7.25 The family obtained by dilating and translating the $2^p - 1$ wavelets for $\epsilon \ne 0$

$$\left\{\psi^\epsilon_{j,n}\right\}_{1 \le \epsilon < 2^p,\ (j,n)\in\mathbb{Z}^{p+1}} \tag{7.267}$$

is an orthonormal basis of $L^2(\mathbb{R}^p)$.
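As an illustration of the binary indexing (a worked example added here, assuming that $\epsilon_1$ is the leading binary digit, which is the reading that reproduces the numbering of $\psi^1$, $\psi^2$, $\psi^3$ in (7.251) when $p = 2$), the seven wavelets obtained for $p = 3$ are

$$\begin{aligned}
\psi^1 &= \phi(x_1)\,\phi(x_2)\,\psi(x_3) & (\epsilon = 001), \\
\psi^2 &= \phi(x_1)\,\psi(x_2)\,\phi(x_3) & (\epsilon = 010), \\
\psi^3 &= \phi(x_1)\,\psi(x_2)\,\psi(x_3) & (\epsilon = 011), \\
\psi^4 &= \psi(x_1)\,\phi(x_2)\,\phi(x_3) & (\epsilon = 100), \\
\psi^5 &= \psi(x_1)\,\phi(x_2)\,\psi(x_3) & (\epsilon = 101), \\
\psi^6 &= \psi(x_1)\,\psi(x_2)\,\phi(x_3) & (\epsilon = 110), \\
\psi^7 &= \psi(x_1)\,\psi(x_2)\,\psi(x_3) & (\epsilon = 111).
\end{aligned}$$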
The proof is done by induction on $p$. It follows the same steps as the proof of Theorem 7.24, which associates to a wavelet basis of $L^2(\mathbb{R})$ a separable wavelet basis of $L^2(\mathbb{R}^2)$. For $p = 2$, we verify that the basis (7.267) includes 3 elementary wavelets. For $p = 3$, there are 7 different wavelets.

Fast Wavelet Transform. Let $b[n]$ be an input $p$-dimensional discrete signal sampled at intervals $N^{-1} = 2^L$. We associate to $b[n]$ an approximation $f$ at the scale $2^L$ whose scaling coefficients $a_L[n] = \langle f, \psi^0_{L,n}\rangle$ satisfy

$$b[n] = N^{p/2}\,a_L[n] \approx f(N^{-1} n).$$

The wavelet coefficients of $f$ at scales $2^j > 2^L$ are computed with separable convolutions and subsamplings along the $p$ signal dimensions. We denote

$$a_j[n] = \langle f, \psi^0_{j,n}\rangle \quad \text{and} \quad d^\epsilon_j[n] = \langle f, \psi^\epsilon_{j,n}\rangle \quad \text{for } 0 < \epsilon < 2^p.$$
The fast wavelet transform is computed with filters that are separable products of the one-dimensional filters $h$ and $g$. The separable $p$-dimensional low-pass filter is

$$h^0[n] = h[n_1] \cdots h[n_p].$$

Let us denote $u^0[m] = h[m]$ and $u^1[m] = g[m]$. To any integer $\epsilon = \epsilon_1 \ldots \epsilon_p$ written in a binary form, we associate a separable $p$-dimensional band-pass filter

$$g^\epsilon[n] = u^{\epsilon_1}[n_1] \cdots u^{\epsilon_p}[n_p].$$

Let $\bar g^\epsilon[n] = g^\epsilon[-n]$. One can verify that

$$a_{j+1}[n] = a_j \star \bar h^0[2n] \tag{7.268}$$
$$d^\epsilon_{j+1}[n] = a_j \star \bar g^\epsilon[2n]. \tag{7.269}$$

We denote by $\check y[n]$ the signal obtained by adding a zero between any two samples of $y[n]$ that are adjacent in the $p$-dimensional lattice $n = (n_1, \ldots, n_p)$. It doubles the size of $y[n]$ along each direction. If $y[n]$ has $M^p$ samples, then $\check y[n]$ has $(2M)^p$ samples. The reconstruction is performed with

$$a_j[n] = \check a_{j+1} \star h^0[n] + \sum_{\epsilon=1}^{2^p-1} \check d^\epsilon_{j+1} \star g^\epsilon[n]. \tag{7.270}$$

The $2^p$ separable convolutions needed to compute $a_j$ and $\{d^\epsilon_j\}_{1 \le \epsilon < 2^p}$ as well as the reconstruction (7.270) can be factored in $2^{p+1} - 2$ groups of one-dimensional convolutions along the rows of $p$-dimensional signals. This is a generalization of the two-dimensional case, illustrated in Figure 7.27. The wavelet representation of $a_L$ is

$$\left[\{d^\epsilon_j\}_{1 \le \epsilon < 2^p,\ L < j \le J},\ a_J\right]. \tag{7.271}$$

It is computed by iterating (7.268) and (7.269) for $L \le j < J$. The reconstruction of $a_L$ is performed with the partial reconstruction (7.270) for $J > j \ge L$.
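The count of $2^{p+1} - 2$ groups of one-dimensional convolutions can be made concrete with a small sketch. It assumes the 4-tap Daubechies filter, periodic boundaries, and the convention that the binary digit $\epsilon_i$ selects the filter applied along array axis $i - 1$; the names `analyze` and `fwt_step` are hypothetical, not taken from WaveLab.

```python
import numpy as np

S3 = np.sqrt(3.0)
_h = np.array([1 + S3, 3 + S3, 3 - S3, 1 - S3]) / (4 * np.sqrt(2.0))
# u^0 = h and u^1 = g, stored as (index, value) pairs with g[n] = (-1)^(1-n) h[1-n].
U = {
    0: [(k, _h[k]) for k in range(4)],
    1: [(1 - k, (-1) ** k * _h[k]) for k in range(4)],
}

def analyze(x, filt, axis):
    """Compute sum_k filt[k] x[2n + k] along one axis, with periodic extension."""
    acc = sum(v * np.roll(x, -k, axis=axis) for k, v in filt)
    keep = [slice(None)] * x.ndim
    keep[axis] = slice(0, None, 2)
    return acc[tuple(keep)]

def fwt_step(a):
    """One level of (7.268)-(7.269), factored axis by axis.

    Returns the list of 2^p subbands, where entry eps is the band filtered
    with u^{eps_1} along axis 0, ..., u^{eps_p} along axis p-1 (entry 0 is
    a_{j+1}), together with the number of one-dimensional filterings used.
    """
    bands, n_conv = [a], 0
    for axis in range(a.ndim):
        split = []
        for band in bands:
            for bit in (0, 1):
                split.append(analyze(band, U[bit], axis))
                n_conv += 1
        bands = split
    return bands, n_conv

p, M = 3, 8
rng = np.random.default_rng(1)
a = rng.standard_normal((M,) * p)
bands, n_conv = fwt_step(a)
print("subbands:", len(bands), "one-dimensional filterings:", n_conv)
```

Each axis doubles the number of bands, so the filterings add up to $2 + 4 + \cdots + 2^p = 2^{p+1} - 2$, whereas filtering each of the $2^p$ subbands independently would need $p\,2^p$ of them.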
If $a_L$ is a finite signal of size $N^p$, the one-dimensional convolutions are modified with one of the three boundary techniques described in Section 7.5. The resulting algorithm computes decomposition coefficients in a separable wavelet basis of $L^2[0,1]^p$. The signals $a_j$ and $d^\epsilon_j$ have $2^{-pj}$ samples. Like $a_L$, the wavelet representation (7.271) is composed of $N^p$ samples. If the filter $h$ has $K$ non-zero samples then the separable factorization of (7.268) and (7.269) requires $pK\,2^{-p(j-1)}$ multiplications and additions. Summing this geometric series over the scales $L < j \le J$, with $2^{-pL} = N^p$, shows that the wavelet representation (7.271) is computed with fewer than $p\,(1 - 2^{-p})^{-1} K N^p$ multiplications and additions. The reconstruction is performed with the same number of operations.

7.8 Problems
7.1. $^1$ Let $h$ be a conjugate mirror filter associated to a scaling function $\phi$.
(a) Prove that if $\hat h(\omega)$ has a zero of order $p$ at $\pi$ then $\hat\phi^{(l)}(2k\pi) = 0$ for any $k \in \mathbb{Z} - \{0\}$ and $l < p$.
(b) Derive that if $q < p$ then $\sum_{n=-\infty}^{+\infty} n^q\,\phi(n) = \int_{-\infty}^{+\infty} t^q\,\phi(t)\,dt$.
7.2. $^1$ Prove that $\sum_{n=-\infty}^{+\infty} \phi(t-n) = 1$ if $\phi$ is an orthogonal scaling function.
7.3. $^1$ Let $\phi_m$ be the Battle-Lemarié scaling function of degree $m$ defined in (7.23). Let $\phi$ be the Shannon scaling function defined by $\hat\phi = \mathbf{1}_{[-\pi,\pi]}$. Prove that $\lim_{m\to+\infty} \|\phi_m - \phi\| = 0$.
7.4. $^1$ Suppose that $h[n]$ is non-zero only for $0 \le n < K$. We denote $m[n] = \sqrt{2}\,h[n]$. The scaling equation is $\phi(t) = \sum_{n=0}^{K-1} m[n]\,\phi(2t-n)$.
(a) Suppose that $K = 2$. Prove that if $t$ is a dyadic number that can be written in binary form with $i$ digits: $t = 0.\epsilon_1\epsilon_2\cdots\epsilon_i$, with $\epsilon_k \in \{0,1\}$, then $\phi(t)$ is the product
$$\phi(t) = m[\epsilon_1] \times m[\epsilon_2] \times \cdots \times m[\epsilon_i] \times \phi(0).$$
(b) For $K = 2$, show that if $m[0] = 4/3$ and $m[1] = 2/3$ then $\phi(t)$ is singular at all dyadic points. Verify numerically with WaveLab that the resulting scaling equation does not define a finite energy function $\phi$.
(c) Show that one can find two matrices $M[0]$ and $M[1]$ such that the $K$-dimensional vector $\Phi(t) = [\phi(t), \phi(t+1), \ldots, \phi(t+K-1)]^T$ satisfies
$$\Phi(t) = M[0]\,\Phi(2t) + M[1]\,\Phi(2t-1).$$
(d) Show that one can compute $\Phi(t)$ at any dyadic number $t = 0.\epsilon_1\epsilon_2\cdots\epsilon_i$ with a product of matrices:
$$\Phi(t) = M[\epsilon_1] \times M[\epsilon_2] \times \cdots \times M[\epsilon_i] \times \Phi(0).$$
7.5. $^1$ Let us define
$$\phi_{k+1}(t) = \sqrt{2} \sum_{n=-\infty}^{+\infty} h[n]\,\phi_k(2t-n) \tag{7.272}$$
with $\phi_0 = \mathbf{1}_{[0,1]}$, and $a_k[n] = \langle \phi_k(t), \phi_k(t-n)\rangle$.
(a) Let
$$P\hat f(\omega) = \frac{1}{2}\left(\left|\hat h\!\left(\tfrac{\omega}{2}\right)\right|^2 \hat f\!\left(\tfrac{\omega}{2}\right) + \left|\hat h\!\left(\tfrac{\omega}{2}+\pi\right)\right|^2 \hat f\!\left(\tfrac{\omega}{2}+\pi\right)\right).$$
Prove that $\hat a_{k+1}(\omega) = P\hat a_k(\omega)$.
(b) Prove that if there exists $\phi$ such that $\lim_{k\to+\infty} \|\phi_k - \phi\| = 0$ then 1 is an eigenvalue of $P$ and $\hat\phi(\omega) = \prod_{p=1}^{+\infty} 2^{-1/2}\,\hat h(2^{-p}\omega)$. What is the degree of freedom on $\phi_0$ in order to still converge to the same limit $\phi$?
(c) Implement in Matlab the computations of $\phi_k(t)$ for the Daubechies conjugate mirror filter with $p = 6$ zeros at $\pi$. How many iterations are needed to obtain $\|\phi_k - \phi\| < 10^{-4}$? Try to improve the rate of convergence by modifying $\phi_0$.
7.6. $^1$ Let $b[n] = f(N^{-1}n)$ with $2^L = N^{-1}$ and $f \in V_L$. We want to recover $a_L[n] = \langle f, \phi_{L,n}\rangle$ from $b[n]$ to compute the wavelet coefficients of $f$ with Theorem 7.7.
(a) Let $\phi_L[n] = 2^{-L/2}\,\phi(2^{-L}n)$. Prove that $b[n] = a_L \star \phi_L[n]$.
(b) Prove that if there exists $C > 0$ such that for all $\omega \in [-\pi,\pi]$
$$\hat\phi_d(\omega) = \sum_{k=-\infty}^{+\infty} \hat\phi(\omega + 2k\pi) \ge C$$
then $a_L$ can be calculated from $b$ with a stable filter $\phi_L^{-1}[n]$.
(c) If $\phi$ is a cubic spline scaling function, compute numerically $\phi_L^{-1}[n]$. For a given numerical precision, compare the number of operations needed to compute $a_L$ from $b$ with the number of operations needed to compute the fast wavelet transform of $a_L$.
(d) Show that calculating $a_L$ from $b$ is equivalent to performing a change of basis in $V_L$, from a Riesz interpolation basis to an orthonormal basis.
7.7. $^1$ Quadrature mirror filters. We define a multirate filter bank with four filters $h$, $g$, $\tilde h$, and $\tilde g$, which decomposes a signal $a_0[n]$
$$a_1[n] = a_0 \star h[2n], \quad d_1[n] = a_0 \star g[2n].$$
Using the notation (7.106), we reconstruct
$$\tilde a_0[n] = \check a_1 \star \tilde h[n] + \check d_1 \star \tilde g[n].$$
(a) Prove that $\tilde a_0[n] = a_0[n-l]$ if
$$\hat g(\omega) = \hat h(\omega+\pi), \quad \hat{\tilde h}(\omega) = \hat h(\omega), \quad \hat{\tilde g}(\omega) = -\hat{\tilde h}(\omega+\pi),$$
and $h$ satisfies the quadrature mirror condition
$$\hat h^2(\omega) - \hat h^2(\omega+\pi) = 2\,e^{-il\omega}.$$
(b) Show that $l$ is necessarily odd.
(c) Verify that the Haar filter (7.51) is a quadrature mirror filter (it is the only finite impulse response solution).
7.8. $^1$ Let $f$ be a function of support $[0,1]$, that is equal to different polynomials of degree $q$ on the intervals $\{[\tau_k, \tau_{k+1}]\}_{0 \le k < K}$, with $\tau_0 = 0$ and $\tau_K = 1$. Let $\psi$ be a Daubechies wavelet with $p$ vanishing moments. Depending on $p$, compute the number of non-zero wavelet coefficients $\langle f, \psi_{j,n}\rangle$. How should we choose $p$ to minimize this number?
7.9. $^1$ Let $\theta$ be a box spline of degree $m$ obtained by $m+1$ convolutions of $\mathbf{1}_{[0,1]}$ with itself.
(a) Prove that
$$\theta(t) = \frac{1}{m!} \sum_{k=0}^{m+1} (-1)^k \binom{m+1}{k} \left([t-k]_+\right)^m$$
where $[x]_+ = \max(x, 0)$. Hint: write $\mathbf{1}_{[0,1]} = \mathbf{1}_{[0,+\infty)} - \mathbf{1}_{(1,+\infty)}$.
(b) Let $A_m$ and $B_m$ be the Riesz bounds of $\{\theta(t-n)\}_{n\in\mathbb{Z}}$. With Proposition 7.1, prove that $\lim_{m\to+\infty} B_m = +\infty$. Compute numerically $A_m$ and $B_m$ for $m \in \{0, \ldots, 5\}$, with Matlab.
7.10. $^1$ Prove that if $\{\psi_{j,n}\}_{(j,n)\in\mathbb{Z}^2}$ is an orthonormal basis of $L^2(\mathbb{R})$ then for all $\omega \in \mathbb{R} - \{0\}$, $\sum_{j=-\infty}^{+\infty} |\hat\psi(2^j\omega)|^2 = 1$. Find an example showing that the converse is not true.
7.11. $^2$ Let us define
$$\hat\psi(\omega) = \begin{cases} 1 & \text{if } 4\pi/7 \le |\omega| \le \pi \ \text{ or } \ 4\pi \le |\omega| \le 4\pi + 4\pi/7 \\ 0 & \text{otherwise.} \end{cases}$$
Prove that $\{\psi_{j,n}\}_{(j,n)\in\mathbb{Z}^2}$ is an orthonormal basis of $L^2(\mathbb{R})$. Prove that $\psi$ is not associated to a scaling function $\phi$ that generates a multiresolution approximation.
7.12. Express the Coiflet property (7.104) as an equivalent condition on the conjugate mirror filter $\hat h(e^{i\omega})$.
7.13. $^1$ Prove that $\psi(t)$ has $p$ vanishing moments if and only if for all $j > 0$ the discrete wavelets $\psi_j[n]$ defined in (7.145) have $p$ discrete vanishing moments
$$\sum_{n=-\infty}^{+\infty} n^k\,\psi_j[n] = 0 \quad \text{for } 0 \le k < p.$$
7.14. $^1$ Let $\psi(t)$ be a compactly supported wavelet calculated with Daubechies conjugate mirror filters $(h, g)$. Let $\psi'_{j,n}(t) = 2^{-j/2}\,\psi'(2^{-j}t - n)$ be the derivative wavelets.
(a) Verify that $h_1$ and $g_1$ defined by
$$\hat h_1(\omega) = 2\,\hat h(\omega)\,(e^{i\omega} - 1)^{-1}, \quad \hat g_1(\omega) = 2\,(e^{i\omega} - 1)\,\hat g(\omega)$$
are finite impulse response filters.
(b) Prove that the Fourier transform of $\psi'(t)$ can be written
$$\widehat{\psi'}(\omega) = \frac{\hat g_1(2^{-1}\omega)}{\sqrt 2} \prod_{p=2}^{+\infty} \frac{\hat h_1(2^{-p}\omega)}{\sqrt 2}.$$
(c) Describe a fast filter bank algorithm to compute the derivative wavelet coefficients $\langle f, \psi'_{j,n}\rangle$ [95].
7.15. $^2$ Let $\psi(t)$ be a compactly supported wavelet calculated with Daubechies conjugate mirror filters $(h, g)$. Let $\hat h_a(\omega) = |\hat h(\omega)|^2$. We verify that $\hat\psi^a(\omega) = \hat\psi(\omega)\,\hat h_a(\omega/4 - \pi/2)$ is an almost analytic wavelet.
(a) Prove that $\psi^a$ is a complex wavelet such that $\mathrm{Real}[\psi^a] = \psi$.
(b) Compute $\hat\psi^a(\omega)$ in Matlab for a Daubechies wavelet with four vanishing moments. Explain why $\hat\psi^a(\omega) \approx 0$ for $\omega < 0$.
(c) Let $\psi^a_{j,n}(t) = 2^{-j/2}\,\psi^a(2^{-j}t - n)$. Using the fact that
$$\hat\psi^a(\omega) = \frac{\hat g(2^{-1}\omega)}{\sqrt 2}\ \frac{\hat h(2^{-2}\omega)}{\sqrt 2}\ \left|\hat h(2^{-2}\omega - 2^{-1}\pi)\right|^2 \prod_{k=3}^{+\infty} \frac{\hat h(2^{-k}\omega)}{\sqrt 2},$$
show that we can modify the fast wavelet transform algorithm to compute the "analytic" wavelet coefficients $\langle f, \psi^a_{j,n}\rangle$ by inserting a new filter.
(d) Let $\phi$ be the scaling function associated to $\psi$. We define separable two-dimensional "analytic" wavelets by:
$$\psi^1(x) = \psi^a(x_1)\,\phi(x_2), \quad \psi^2(x) = \phi(x_1)\,\psi^a(x_2),$$
$$\psi^3(x) = \psi^a(x_1)\,\psi^a(x_2), \quad \psi^4(x) = \psi^a(x_1)\,\psi^a(-x_2).$$
Let $\psi^k_{j,n}(x) = 2^{-j}\,\psi^k(2^{-j}x - n)$ for $n \in \mathbb{Z}^2$. Modify the separable wavelet filter bank algorithm of Section 7.7.3 to compute the "analytic" wavelet coefficients $\langle f, \psi^k_{j,n}\rangle$.
(e) Prove that $\{\psi^k_{j,n}\}_{1 \le k \le 4,\ j\in\mathbb{Z},\ n\in\mathbb{Z}^2}$ is a frame of the space of real functions $f \in L^2(\mathbb{R}^2)$ [95].
7.16. $^2$ Multiwavelets. We define the following two scaling functions:
3 (t) = 1 (2t) + 1 (2t ; 1)
1 (2t) + (2t ; 1) ; (2t) + (2t ; 1)
2 (t) =
2
1
1
22
(a) Compute the functions $\phi_1$ and $\phi_2$. Prove that $\{\phi_1(t-n), \phi_2(t-n)\}_{n\in\mathbb{Z}}$ is an orthonormal basis of a space $V_0$ that will be specified.
(b) Find $\psi_1$ and $\psi_2$ with a support on $[0,1]$ that are orthogonal to each other and to $\phi_1$ and $\phi_2$. Plot these wavelets. Verify that they have 2 vanishing moments and that they generate an orthonormal basis of $L^2(\mathbb{R})$.
7.17. $^2$ Let $f^{\mathrm{fold}}$ be the folded function defined in (7.210).
(a) Let $\alpha(t), \beta(t) \in L^2(\mathbb{R})$ be two functions that are either symmetric or antisymmetric about $t = 0$. If $\langle \alpha(t), \beta(t+2k)\rangle = 0$ and $\langle \alpha(t), \beta(2k-t)\rangle = 0$ for all $k \in \mathbb{Z}$, then prove that
$$\int_0^1 \alpha^{\mathrm{fold}}(t)\,\beta^{\mathrm{fold}}(t)\,dt = 0.$$
(b) Prove that if $\psi$, $\tilde\psi$, $\phi$, $\tilde\phi$ are either symmetric or antisymmetric with respect to $t = 1/2$ or $t = 0$, and generate biorthogonal bases of $L^2(\mathbb{R})$, then the folded bases (7.212) and (7.213) are biorthogonal bases of $L^2[0,1]$. Hint: use the same approach as in Theorem 7.16.
7.18. $^1$ A recursive filter has a Fourier transform that is a ratio of trigonometric polynomials as in (2.31).
(a) Let $p[n] = h \star \bar h[n]$ with $\bar h[n] = h[-n]$. Verify that if $h$ is a recursive conjugate mirror filter then $\hat p(\omega) + \hat p(\omega+\pi) = 2$ and there exists $\hat r(\omega) = \sum_{k=0}^{K-1} r[k]\,e^{-ik\omega}$ such that
$$\hat p(\omega) = \frac{2\,|\hat r(\omega)|^2}{|\hat r(\omega)|^2 + |\hat r(\omega+\pi)|^2}. \tag{7.273}$$
(b) Suppose that $K$ is even and that $r[K/2 - 1 - k] = r[K/2 + k]$. Verify that
$$\hat p(\omega) = \frac{2\,|\hat r(\omega)|^2}{|\hat r(\omega) + \hat r(\omega+\pi)|^2}. \tag{7.274}$$
(c) If $\hat r(\omega) = (1 + e^{-i\omega})^{K-1}$ with $K = 6$, compute $\hat h(\omega)$ with the factorization (7.274), and verify that it is a stable filter (Problem 3.8). Compute numerically and plot with WaveLab the graph of the corresponding wavelet $\psi(t)$.
7.19. $^1$ Balancing. Suppose that $h$, $\tilde h$ define a pair of perfect reconstruction filters satisfying (7.129).
(a) Prove that
1~
1
~
~
hnew n] = 2 h n]+h n;1] hnew n] = 2 h n] + h n ; 1]
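defines a new pair of perfect reconstruction filters. Verify that $\hat h_{\mathrm{new}}(\omega)$ and $\hat{\tilde h}_{\mathrm{new}}(\omega)$ have respectively 1 more and 1 less zero at $\pi$ than $\hat h(\omega)$ and $\hat{\tilde h}(\omega)$ [68].
(b) The Deslauriers-Dubuc filters are $\hat h(\omega) = 1$ and
$$\hat{\tilde h}(\omega) = \frac{1}{16}\left(-e^{-3i\omega} + 9\,e^{-i\omega} + 16 + 9\,e^{i\omega} - e^{3i\omega}\right).$$
Compute $h_{\mathrm{new}}$ and $\tilde h_{\mathrm{new}}$ as well as the corresponding biorthogonal wavelets $\psi_{\mathrm{new}}$, $\tilde\psi_{\mathrm{new}}$, after one balancing and after a second balancing.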
7.20. $^1$ Lifting. The filter (7.192) is calculated by lifting lazy filters. Find a dual lifting that produces a lifted filter with a support of size 9 so that $\hat{\tilde h}_l(\omega)$ has 2 zeros at $\pi$. Compute the resulting lifted wavelets and scaling functions. Implement in WaveLab the corresponding fast wavelet transform and its inverse with the polyphase decomposition illustrated in Figure 7.16.
7.21. For a Deslauriers-Dubuc interpolation wavelet of degree 3, compute the dual wavelet $\tilde\psi$ in (7.245), which is a sum of Diracs. Verify that it has 4 vanishing moments.
7.22. $^1$ Prove that a Deslauriers-Dubuc interpolation function of degree $2p - 1$ converges to a sinc function when $p$ goes to $+\infty$.
7.23. $^2$ Let $\phi$ be an autocorrelation scaling function that reproduces polynomials of degree $p - 1$ as in (7.229). Prove that if $f$ is uniformly Lipschitz $\alpha$ then under the same hypotheses as in Theorem 7.21, there exists $K > 0$ such that
$$\|f - P_{V_j} f\|_\infty \le K\,2^{\alpha j}.$$
7.24. $^1$ Let $\phi(t)$ be an interpolation function that generates an interpolation wavelet basis of $C_0(\mathbb{R})$. Construct a separable interpolation wavelet basis of the space $C_0(\mathbb{R}^p)$ of uniformly continuous $p$-dimensional signals $f(x_1, \ldots, x_p)$. Hint: construct $2^p - 1$ interpolation wavelets by appropriately translating $\phi(x_1) \cdots \phi(x_p)$.
7.25. $^2$ Fractional Brownian. Let $\psi(t)$ be a compactly supported wavelet with $p$ vanishing moments that generates an orthonormal basis of $L^2(\mathbb{R})$. The covariance of a fractional Brownian motion $B_H(t)$ is given by (6.90).
(a) Prove that $E\{|\langle B_H, \psi_{j,n}\rangle|^2\}$ is proportional to $2^{j(2H+1)}$. Hint: use Problem 6.13.
(b) Prove that the decorrelation between same scale wavelet coefficients increases when the number $p$ of vanishing moments of $\psi$ increases:
$$E\{\langle B_H, \psi_{j,n}\rangle\,\langle B_H, \psi_{j,m}\rangle\} = O\!\left(2^{j(2H+1)}\,|n-m|^{2(H-p)}\right).$$
(c) In two dimensions, synthesize "approximate" fractional Brownian motion images $\tilde B_H$ with wavelet coefficients $\langle \tilde B_H, \psi^k_{j,n}\rangle$ that are independent Gaussian random variables, whose variances are proportional to $2^{j(2H+2)}$. Adjust $H$ in order to produce textures that look like clouds in the sky.
7.26. $^1$ Image mosaic. Let $f_0[n_1,n_2]$ and $f_1[n_1,n_2]$ be two images of $N^2$ pixels. We want to merge the center of $f_0[n_1,n_2]$ for $N/4 \le n_1, n_2 < 3N/4$ in the center of $f_1$. Compute in WaveLab the wavelet coefficients of $f_0$ and $f_1$. At each scale $2^j$ and orientation $1 \le k \le 3$, replace the $2^{-2j}/4$ wavelet coefficients corresponding to the center of $f_1$ by the wavelet coefficients of $f_0$. Reconstruct an image from this manipulated wavelet representation. Explain why the image $f_0$ seems to be merged in $f_1$, without the strong boundary effects that are obtained when replacing directly the pixels of $f_1$ by the pixels of $f_0$.
7.27. $^2$ Foveal vision. A foveal image has a maximum resolution at the center, with a resolution that decreases linearly as a function of the distance to the center. Show that one can construct an approximate foveal image by keeping a constant number of non-zero wavelet coefficients at each scale $2^j$. Implement this algorithm in WaveLab. You may build a highly compact image code from such an image representation.
7.28. $^1$ High contrast. We consider a color image specified by three color channels: red $r[n]$, green $g[n]$, and blue $b[n]$. The intensity image $(r + g + b)/3$ averages the variations of the three color channels. To create a high contrast image $f$, for each wavelet $\psi^k_{j,n}$ we set $\langle f, \psi^k_{j,n}\rangle$ to be the coefficient among $\langle r, \psi^k_{j,n}\rangle$, $\langle g, \psi^k_{j,n}\rangle$ and $\langle b, \psi^k_{j,n}\rangle$ which has the maximum amplitude. Implement this algorithm in WaveLab and evaluate numerically its performance for different types of multispectral images. How does the choice of $\psi$ affect the results?
7.29. $^2$ Restoration. Develop an algorithm that restores the sharpness of a smoothed image by increasing the amplitude of wavelet coefficients. Find appropriate amplification functionals depending on the scale and orientation of the wavelet coefficients, in order to increase the image sharpness without introducing important artifacts. To improve the visual quality of the result, study the impact of the wavelet properties: symmetry, vanishing moments and regularity.
7.30. $^3$ Smooth extension. Let $f[n]$ be an image whose samples are known only over a domain $D$, which may be irregular and may include holes. Design and implement an algorithm that computes the wavelet coefficients of a smooth extension $\tilde f$ of $f$ over a square domain that includes $D$, and compute $\tilde f$ from these. Choose wavelets with $p$ vanishing moments. Set to zero all coefficients corresponding to wavelets whose support does not intersect $D$, which is equivalent to imposing that $\tilde f$ is locally a polynomial of degree $p$. The coefficients of wavelets whose support is in $D$ are calculated from $f$. The issue is therefore to compute the coefficients of wavelets whose support intersects the boundary of $D$. You must guarantee that $\tilde f = f$ on $D$ as well as the numerical stability of your extension.

Chapter 8
Wavelet Packet and Local Cosine Bases
Different types of time-frequency structures are encountered in complex signals such as speech recordings. This motivates the design of bases whose time-frequency properties may be adapted. Wavelet bases are one particular family of bases that represent piecewise smooth signals effectively. Other bases are constructed to approximate different types of signals such as highly oscillatory waveforms.

Orthonormal wavelet packet bases use conjugate mirror filters to divide the frequency axis in separate intervals of various sizes. A discrete signal of size $N$ is decomposed in more than $2^{N/2}$ wavelet packet bases with a fast filter bank algorithm that requires $O(N \log_2 N)$ operations.

If the signal properties change over time, it is preferable to isolate different time intervals with translated windows. Local cosine bases are constructed by multiplying these windows with cosine functions. Wavelet packet and local cosine bases are dual families of bases. Wavelet packets segment the frequency axis and are uniformly translated in time whereas local cosine bases divide the time axis and are uniformly translated in frequency.

8.1 Wavelet Packets $^2$
8.1.1 Wavelet Packet Tree

Wavelet packets were introduced by Coifman, Meyer and Wickerhauser [139] by generalizing the link between multiresolution approximations and wavelets. A space $V_j$ of a multiresolution approximation is decomposed in a lower resolution space $V_{j+1}$ plus a detail space $W_{j+1}$. This is done by dividing the orthogonal basis $\{\phi_j(t - 2^j n)\}_{n\in\mathbb{Z}}$ of $V_j$ into two new orthogonal bases

$$\{\phi_{j+1}(t - 2^{j+1} n)\}_{n\in\mathbb{Z}} \ \text{of } V_{j+1} \quad \text{and} \quad \{\psi_{j+1}(t - 2^{j+1} n)\}_{n\in\mathbb{Z}} \ \text{of } W_{j+1}.$$

The decompositions (7.112) and (7.114) of $\phi_{j+1}$ and $\psi_{j+1}$ in the basis $\{\phi_j(t - 2^j n)\}_{n\in\mathbb{Z}}$ are specified by a pair of conjugate mirror filters $h[n]$ and

$$g[n] = (-1)^{1-n}\,h[1-n].$$
admits an orthogonal basis of functions translated by n2j , for n 2 Z. Theorem 8.1 (Coifman, Meyer, Wickerhauser) Let f j (t;2j n)gn2Z
be an orthonormal basis of a space Uj . Let h and g be a pair of conjugate mirror lters. De ne
0
j +1 (t) = +1
X n=;1 The family h n] j (t ; 2j n) f j0+1 (t ; 2j +1n)
is an orthonormal basis of Uj . and 1
j +1 (t) = n=;1 g n] j (t ; 2j n):
(8.1) j +1 n)g j +1 (t ; 2
1 +1
X n2Z Proof 2 . This proof is very similar to the proof of Theorem 7.3. The
main steps are outlined. The fact that f j (t ; 2j n)gn2Z is orthogonal
means that
+1
1 X ^ ! + 2k 2 = 1 :
(8.2)
2j k=;1 j
2j 8.1. WAVELET PACKETS 433 0
We derive from (8.1) that the Fourier transform of j +1 is +1
0
^j +1(!) = ^j (!) X h n] e;i2j n! = h(2j !) ^j (!):
^ n=;1 1
Similarly, the Fourier transform of j +1 is
1
^j +1(!) = g(2j !) ^j (!):
^ (8.3) (8.4) 1
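Proving that $\{\theta_{j+1}^0(t - 2^{j+1} n)\}_{n \in \mathbb{Z}}$ and $\{\theta_{j+1}^1(t - 2^{j+1} n)\}_{n \in \mathbb{Z}}$ are two families of orthonormal vectors is equivalent to showing that for $l = 0$ or $l = 1$
$$\frac{1}{2^{j+1}} \sum_{k=-\infty}^{+\infty} \left| \hat\theta_{j+1}^l\left(\omega + \frac{2k\pi}{2^{j+1}}\right) \right|^2 = 1. \tag{8.5}$$
These two families of vectors yield orthogonal spaces if and only if
$$\frac{1}{2^{j+1}} \sum_{k=-\infty}^{+\infty} \hat\theta_{j+1}^0\left(\omega + \frac{2k\pi}{2^{j+1}}\right) \hat\theta_{j+1}^{1*}\left(\omega + \frac{2k\pi}{2^{j+1}}\right) = 0. \tag{8.6}$$
The relations (8.5) and (8.6) are verified by replacing $\hat\theta_{j+1}^0$ and $\hat\theta_{j+1}^1$ by (8.3) and (8.4) respectively, and by using the orthogonality of the basis (8.2) and the conjugate mirror filter properties
$$|\hat h(\omega)|^2 + |\hat h(\omega + \pi)|^2 = 2,$$
$$|\hat g(\omega)|^2 + |\hat g(\omega + \pi)|^2 = 2,$$
$$\hat g(\omega)\, \hat h^*(\omega) + \hat g(\omega + \pi)\, \hat h^*(\omega + \pi) = 0.$$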
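To prove that the family $\{\theta_{j+1}^0(t - 2^{j+1} n),\ \theta_{j+1}^1(t - 2^{j+1} n)\}_{n \in \mathbb{Z}}$ generates the same space as $\{\theta_j(t - 2^j n)\}_{n \in \mathbb{Z}}$, we must prove that for any $a[n] \in \ell^2(\mathbb{Z})$ there exist $b[n] \in \ell^2(\mathbb{Z})$ and $c[n] \in \ell^2(\mathbb{Z})$ such that
$$\sum_{n=-\infty}^{+\infty} a[n]\, \theta_j(t - 2^j n) = \sum_{n=-\infty}^{+\infty} b[n]\, \theta_{j+1}^0(t - 2^{j+1} n) + \sum_{n=-\infty}^{+\infty} c[n]\, \theta_{j+1}^1(t - 2^{j+1} n). \tag{8.7}$$
To do this, we relate $\hat b(\omega)$ and $\hat c(\omega)$ to $\hat a(\omega)$. The Fourier transform of (8.7) yields
$$\hat a(2^j \omega)\, \hat\theta_j(\omega) = \hat b(2^{j+1} \omega)\, \hat\theta_{j+1}^0(\omega) + \hat c(2^{j+1} \omega)\, \hat\theta_{j+1}^1(\omega). \tag{8.8}$$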
One can verify that
$$\hat b(2^{j+1} \omega) = \frac{1}{2} \left[ \hat a(2^j \omega)\, \hat h^*(2^j \omega) + \hat a(2^j \omega + \pi)\, \hat h^*(2^j \omega + \pi) \right]$$
and
$$\hat c(2^{j+1} \omega) = \frac{1}{2} \left[ \hat a(2^j \omega)\, \hat g^*(2^j \omega) + \hat a(2^j \omega + \pi)\, \hat g^*(2^j \omega + \pi) \right]$$
satisfy (8.8).

Theorem 8.1 proves that conjugate mirror filters transform an orthogonal basis $\{\theta_j(t - 2^j n)\}_{n \in \mathbb{Z}}$ in two orthogonal families $\{\theta_{j+1}^0(t - 2^{j+1} n)\}_{n \in \mathbb{Z}}$ and $\{\theta_{j+1}^1(t - 2^{j+1} n)\}_{n \in \mathbb{Z}}$. Let $U_{j+1}^0$ and $U_{j+1}^1$ be the spaces generated by each of these families. Clearly $U_{j+1}^0$ and $U_{j+1}^1$ are orthogonal and
$$U_{j+1}^0 \oplus U_{j+1}^1 = U_j.$$
Computing the Fourier transform of (8.1) relates the Fourier transforms of $\theta_{j+1}^0$ and $\theta_{j+1}^1$ to the Fourier transform of $\theta_j$:
$$\hat\theta_{j+1}^0(\omega) = \hat h(2^j \omega)\, \hat\theta_j(\omega), \qquad \hat\theta_{j+1}^1(\omega) = \hat g(2^j \omega)\, \hat\theta_j(\omega). \tag{8.9}$$
Since the transfer functions $\hat h(2^j \omega)$ and $\hat g(2^j \omega)$ have their energy concentrated in different frequency intervals, this transformation can be interpreted as a division of the frequency support of $\hat\theta_j$.

Binary Wavelet Packet Tree. Instead of dividing only the approximation spaces $V_j$ to construct detail spaces $W_j$ and wavelet bases,
Theorem 8.1 proves that we can set Uj = Wj and divide these detail spaces to derive new bases. The recursive splitting of vector spaces
is represented in a binary tree. If the signals are approximated at
the scale $2^L$, to the root of the tree we associate the approximation space $V_L$. This space admits an orthogonal basis of scaling functions $\{\phi_L(t - 2^L n)\}_{n \in \mathbb{Z}}$ with $\phi_L(t) = 2^{-L/2} \phi(2^{-L} t)$.

Any node of the binary tree is labeled by $(j, p)$, where $j - L \ge 0$ is the depth of the node in the tree, and $p$ is the number of nodes that are on its left at the same depth $j - L$. Such a tree is illustrated in Figure 8.1. To each node $(j, p)$ we associate a space $W_j^p$, which admits an orthonormal basis $\{\psi_j^p(t - 2^j n)\}_{n \in \mathbb{Z}}$, by going down the tree. At the root, we have $W_L^0 = V_L$ and $\psi_L^0 = \phi_L$. Suppose now that we have already constructed $W_j^p$ and its orthonormal basis $B_j^p = \{\psi_j^p(t - 2^j n)\}_{n \in \mathbb{Z}}$ at the node $(j, p)$. The two wavelet packet orthogonal bases at the children nodes are defined by the splitting relations (8.1):
$$\psi_{j+1}^{2p}(t) = \sum_{n=-\infty}^{+\infty} h[n]\, \psi_j^p(t - 2^j n) \tag{8.10}$$
and
$$\psi_{j+1}^{2p+1}(t) = \sum_{n=-\infty}^{+\infty} g[n]\, \psi_j^p(t - 2^j n). \tag{8.11}$$
Since $\{\psi_j^p(t - 2^j n)\}_{n \in \mathbb{Z}}$ is orthonormal,
$$h[n] = \langle \psi_{j+1}^{2p}(u),\, \psi_j^p(u - 2^j n) \rangle, \qquad g[n] = \langle \psi_{j+1}^{2p+1}(u),\, \psi_j^p(u - 2^j n) \rangle. \tag{8.12}$$

Figure 8.1: Binary tree of wavelet packet spaces. (Root $W_L^0$; children $W_{L+1}^0$, $W_{L+1}^1$; grandchildren $W_{L+2}^0$, $W_{L+2}^1$, $W_{L+2}^2$, $W_{L+2}^3$.)

Theorem 8.1 proves that $B_{j+1}^{2p} = \{\psi_{j+1}^{2p}(t - 2^{j+1} n)\}_{n \in \mathbb{Z}}$ and $B_{j+1}^{2p+1} = \{\psi_{j+1}^{2p+1}(t - 2^{j+1} n)\}_{n \in \mathbb{Z}}$ are orthonormal bases of two orthogonal spaces $W_{j+1}^{2p}$ and $W_{j+1}^{2p+1}$ such that
$$W_{j+1}^{2p} \oplus W_{j+1}^{2p+1} = W_j^p. \tag{8.13}$$
This recursive splitting defines a binary tree of wavelet packet spaces where each parent node is divided in two orthogonal subspaces. Figure 8.2 displays the 8 wavelet packets $\psi_j^p$ at the depth $j - L = 3$, calculated with the Daubechies filter of order 5. These wavelet packets are frequency ordered from left to right, as explained in Section 8.1.2.

Figure 8.2: Wavelet packets computed with the Daubechies 5 filter, at the depth $j - L = 3$ of the wavelet packet tree, with $L = 0$. They are ordered from low to high frequencies. (Panels: $\psi_3^0(t)$ to $\psi_3^7(t)$.)

Figure 8.3: Example of admissible wavelet packet binary tree.

Admissible Tree. We call admissible tree any binary tree where each node has either 0 or 2 children, as shown in Figure 8.3. Let $\{j_i, p_i\}_{1 \le i \le I}$ be the leaves of an admissible binary tree. By applying the recursive splitting (8.13) along the branches of an admissible tree, we verify that the spaces $\{W_{j_i}^{p_i}\}_{1 \le i \le I}$ are mutually orthogonal and add up to $W_L^0$:
$$W_L^0 = \bigoplus_{i=1}^{I} W_{j_i}^{p_i}. \tag{8.14}$$
The union of the corresponding wavelet packet bases
$$\{\psi_{j_i}^{p_i}(t - 2^{j_i} n)\}_{n \in \mathbb{Z},\ 1 \le i \le I}$$
thus defines an orthogonal basis of $W_L^0 = V_L$.

Number of Wavelet Packet Bases. The number of different wavelet
packet orthogonal bases of $V_L$ is equal to the number of different admissible binary trees. The following proposition proves that there are more than $2^{2^{J-1}}$ different wavelet packet orthonormal bases included in a full wavelet packet binary tree of depth $J$.

Proposition 8.1 The number $B_J$ of wavelet packet bases in a full wavelet packet binary tree of depth $J$ satisfies
$$2^{2^{J-1}} \le B_J \le 2^{\frac{5}{4} 2^{J-1}}. \tag{8.15}$$

Proof $^2$. This result is proved by induction on the depth $J$ of the wavelet packet tree. The number $B_J$ of different orthonormal bases is equal to the number of different admissible binary trees of depth at most $J$, whose nodes have either 0 or 2 children. For $J = 0$, the tree is reduced to its root so $B_0 = 1$.

Observe that the set of trees of depth at most $J + 1$ is composed of trees of depth at least 1 and at most $J + 1$ plus one tree of depth 0 that is reduced to the root. A tree of depth at least 1 has a left and a right subtree that are admissible trees of depth at most $J$. The configuration of these trees is a priori independent and there are $B_J$ admissible trees of depth at most $J$ so
$$B_{J+1} = B_J^2 + 1. \tag{8.16}$$
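Since $B_1 = 2$ and $B_{J+1} \ge B_J^2$, we prove by induction that $B_J \ge 2^{2^{J-1}}$. Moreover
$$\log_2 B_{J+1} = 2 \log_2 B_J + \log_2 (1 + B_J^{-2}).$$
If $J \ge 2$ then $B_J \ge 5$, so $\log_2(1 + B_J^{-2}) \le \log_2(26/25) \le 1/16$ and
$$\log_2 B_{J+1} \le 2 \log_2 B_J + \frac{1}{16}. \tag{8.17}$$
Since $B_2 = 5$, iterating (8.17) gives
$$\log_2 B_{J+1} \le 2^{J-1} \log_2 5 + \frac{1}{16} \sum_{j=0}^{J-2} 2^j \le 2^{J-1} \left( \log_2 5 + \frac{1}{16} \right) \le \frac{5}{4}\, 2^J,$$
because $\log_2 5 + 1/16 < 5/2$. Together with $B_1 = 2 \le 2^{5/4}$ and $B_2 = 5 \le 2^{5/2}$, this proves that $B_J \le 2^{\frac{5}{4} 2^{J-1}}$.

For discrete signals of size $N$, we shall see that the wavelet packet tree is at most of depth $J = \log_2 N$. This proposition proves that the number of wavelet packet bases satisfies $2^{N/2} \le B_{\log_2 N} \le 2^{5N/8}$.

Wavelet Packets on Intervals. To construct wavelet packet bases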
of $L^2[0,1]$, we use the border techniques developed in Section 7.5 to design wavelet bases of $L^2[0,1]$. The simplest approach constructs periodic bases. As in the wavelet case, the coefficients of $f \in L^2[0,1]$ in a periodic wavelet packet basis are the same as the decomposition coefficients of $f^{\mathrm{per}}(t) = \sum_{k=-\infty}^{+\infty} f(t + k)$ in the original wavelet packet basis of $L^2(\mathbb{R})$. The periodization of $f$ often creates discontinuities at the borders $t = 0$ and $t = 1$, which generate large amplitude wavelet packet coefficients.

Section 7.5.3 describes a more sophisticated technique which modifies the filters $h$ and $g$ in order to construct boundary wavelets which keep their vanishing moments. A generalization to wavelet packets is obtained by using these modified filters in Theorem 8.1. This avoids creating the large amplitude coefficients at the boundary, typical of the
periodic case.

Biorthogonal Wavelet Packets. Non-orthogonal wavelet bases are constructed in Section 7.4 with two pairs of perfect reconstruction filters $(h, g)$ and $(\tilde h, \tilde g)$ instead of a single pair of conjugate mirror filters. The orthogonal splitting Theorem 8.1 is extended into a biorthogonal splitting by replacing the conjugate mirror filters with these perfect reconstruction filters. A Riesz basis $\{\theta_j(t - 2^j n)\}_{n \in \mathbb{Z}}$ of $U_j$ is transformed into two Riesz bases $\{\theta_{j+1}^0(t - 2^{j+1} n)\}_{n \in \mathbb{Z}}$ and $\{\theta_{j+1}^1(t - 2^{j+1} n)\}_{n \in \mathbb{Z}}$ of two non-orthogonal spaces $U_{j+1}^0$ and $U_{j+1}^1$ such that
$$U_{j+1}^0 \oplus U_{j+1}^1 = U_j.$$
A binary tree of non-orthogonal wavelet packet Riesz bases can be derived by induction using this vector space division. As in the orthogonal case, the wavelet packets at the leaves of an admissible binary tree define a basis of $W_L^0$, but this basis is not orthogonal.

The lack of orthogonality is not a problem by itself as long as the basis remains stable. Cohen and Daubechies proved [130] that when the depth $j - L$ increases, the angle between the spaces $W_j^p$ located at the same depth can become progressively smaller. This indicates that some of the wavelet packet bases constructed from an admissible binary tree become unstable. We thus concentrate on orthogonal wavelet packets
constructed with conjugate mirror filters.

8.1.2 Time-Frequency Localization

Time Support. If the conjugate mirror filters $h$ and $g$ have a finite impulse response of size $K$, Proposition 7.2 proves that $\phi$ has a support of size $K - 1$ so $\psi_L^0 = \phi_L$ has a support of size $(K - 1) 2^L$. Since
$$\psi_{j+1}^{2p}(t) = \sum_{n=-\infty}^{+\infty} h[n]\, \psi_j^p(t - 2^j n), \qquad \psi_{j+1}^{2p+1}(t) = \sum_{n=-\infty}^{+\infty} g[n]\, \psi_j^p(t - 2^j n), \tag{8.18}$$
an induction on $j$ shows that the support size of $\psi_j^p$ is $(K - 1) 2^j$. The parameter $j$ thus specifies the scale $2^j$ of the support. The wavelet packets in Figure 8.2 are constructed with a Daubechies filter of $K = 10$ coefficients with $j = 3$ and thus have a support of size $2^3 (10 - 1) = 72$.

Frequency Localization. The frequency localization of wavelet packets is more complicated to analyze. The Fourier transform of (8.18)
proves that the Fourier transforms of wavelet packet children are related to their parent by
$$\hat\psi_{j+1}^{2p}(\omega) = \hat h(2^j \omega)\, \hat\psi_j^p(\omega), \qquad \hat\psi_{j+1}^{2p+1}(\omega) = \hat g(2^j \omega)\, \hat\psi_j^p(\omega). \tag{8.19}$$
The energy of $\hat\psi_j^p$ is mostly concentrated over a frequency band and the two filters $\hat h(2^j \omega)$ and $\hat g(2^j \omega)$ select the lower or higher frequency components within this band. To relate the size and position of this frequency band to the indexes $(p, j)$, we consider a simple example.

Shannon Wavelet Packets. Shannon wavelet packets are computed with perfect discrete low-pass and high-pass filters
$$|\hat h(\omega)| = \begin{cases} \sqrt{2} & \text{if } \omega \in [-\pi/2 + 2k\pi,\ \pi/2 + 2k\pi] \text{ with } k \in \mathbb{Z} \\ 0 & \text{otherwise} \end{cases} \tag{8.20}$$
and
$$|\hat g(\omega)| = \begin{cases} \sqrt{2} & \text{if } \omega \in [\pi/2 + 2k\pi,\ 3\pi/2 + 2k\pi] \text{ with } k \in \mathbb{Z} \\ 0 & \text{otherwise.} \end{cases} \tag{8.21}$$
In this case it is relatively simple to calculate the frequency support of the wavelet packets. The Fourier transform of the scaling function is
$$\hat\psi_L^0 = \hat\phi_L = 2^{L/2}\, \mathbf{1}_{[-2^{-L}\pi,\ 2^{-L}\pi]}. \tag{8.22}$$
Each multiplication with $\hat h(2^j \omega)$ or $\hat g(2^j \omega)$ divides the frequency support of the wavelet packets in two. The delicate point is to realize that $\hat h(2^j \omega)$ does not always play the role of a low-pass filter because of the side lobes that are brought into the interval $[-2^{-L}\pi,\ 2^{-L}\pi]$ by the dilation. At the depth $j - L$, the following proposition proves that $\hat\psi_j^p$ is proportional to the indicator function of a pair of frequency intervals, that are labeled $I_j^k$. The permutation that relates $p$ and $k$ is
characterized recursively [76].

Proposition 8.2 (Coifman, Wickerhauser) For any $j - L > 0$ and $0 \le p < 2^{j-L}$, there exists $0 \le k < 2^{j-L}$ such that
$$|\hat\psi_j^p(\omega)| = 2^{j/2}\, \mathbf{1}_{I_j^k}(\omega) \tag{8.23}$$
where $I_j^k$ is a symmetric pair of intervals
$$I_j^k = [-(k+1)\pi 2^{-j},\ -k\pi 2^{-j}] \cup [k\pi 2^{-j},\ (k+1)\pi 2^{-j}]. \tag{8.24}$$
The permutation $k = G[p]$ satisfies for any $0 \le p < 2^{j-L}$
$$G[2p] = \begin{cases} 2G[p] & \text{if } G[p] \text{ is even} \\ 2G[p] + 1 & \text{if } G[p] \text{ is odd} \end{cases} \tag{8.25}$$
$$G[2p+1] = \begin{cases} 2G[p] + 1 & \text{if } G[p] \text{ is even} \\ 2G[p] & \text{if } G[p] \text{ is odd.} \end{cases} \tag{8.26}$$

Proof $^3$. The three equations (8.23), (8.25) and (8.26) are proved by
induction on the depth $j - L$. For $j - L = 0$, (8.22) shows that (8.23) is valid. Suppose that (8.23) is valid for $j = l \ge L$ and any $0 \le p < 2^{l-L}$. We first prove that (8.25) and (8.26) are verified for $j = l$. From these two equations we then easily carry the induction hypothesis to prove that (8.23) is true for $j = l + 1$ and for any $0 \le p < 2^{l+1-L}$.

Equations (8.20) and (8.21) imply that
$$|\hat h(2^l \omega)| = \begin{cases} \sqrt{2} & \text{if } \omega \in [2^{-l-1}(4m-1)\pi,\ 2^{-l-1}(4m+1)\pi] \text{ with } m \in \mathbb{Z} \\ 0 & \text{otherwise} \end{cases} \tag{8.27}$$
$$|\hat g(2^l \omega)| = \begin{cases} \sqrt{2} & \text{if } \omega \in [2^{-l-1}(4m+1)\pi,\ 2^{-l-1}(4m+3)\pi] \text{ with } m \in \mathbb{Z} \\ 0 & \text{otherwise.} \end{cases} \tag{8.28}$$
Since (8.23) is valid for $l$, the support of $\hat\psi_l^p$ is
$$I_l^k = [-(2k+2)\pi 2^{-l-1},\ -2k\pi 2^{-l-1}] \cup [2k\pi 2^{-l-1},\ (2k+2)\pi 2^{-l-1}].$$
The two children are defined by
$$\hat\psi_{l+1}^{2p}(\omega) = \hat h(2^l \omega)\, \hat\psi_l^p(\omega), \qquad \hat\psi_{l+1}^{2p+1}(\omega) = \hat g(2^l \omega)\, \hat\psi_l^p(\omega).$$
We thus derive (8.25) and (8.26) by checking the intersection of $I_l^k$ with the supports of $\hat h(2^l \omega)$ and $\hat g(2^l \omega)$ specified by (8.27) and (8.28).

For Shannon wavelet packets, Proposition 8.2 proves that $\hat\psi_j^p$ has a frequency support located over two intervals of size $\pi 2^{-j}$, centered at $\pm (k + 1/2) \pi 2^{-j}$. The Fourier transform expression (8.23) implies that
these Shannon wavelet packets can be written as cosine modulated windows
$$\psi_j^p(t) = 2^{-j/2+1}\, \theta(2^{-j} t)\, \cos\left[ \pi 2^{-j} (k + 1/2)(t - \tau_{j,p}) \right] \tag{8.29}$$
with
$$\theta(t) = \frac{\sin(\pi t / 2)}{\pi t} \quad \text{and hence} \quad \hat\theta(\omega) = \mathbf{1}_{[-\pi/2,\ \pi/2]}(\omega).$$
The translation parameter $\tau_{j,p}$ can be calculated from the complex phase of $\hat\psi_j^p$.

Frequency Ordering. It is often easier to label $\psi_j^k$ a wavelet packet $\psi_j^p$ whose Fourier transform is centered at $\pm (k + 1/2) \pi 2^{-j}$, with $k = G[p]$. This means changing its position in the wavelet packet tree from the node $p$ to the node $k$. The resulting wavelet packet tree is frequency ordered. The left child always corresponds to a lower frequency wavelet packet and the right child to a higher frequency one.
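The frequency ordering can be tabulated directly from the recursion (8.25) and (8.26). The sketch below is an illustration under stated assumptions: the function names `freq_index` and `tree_index` are hypothetical, $G[0] = 0$ is taken as the starting value at the root, and the inverse permutation is written with the bitwise expression `k ^ (k >> 1)`, which is one standard way of computing a binary Gray code.

```python
def freq_index(p):
    """k = G[p], computed with the recursion (8.25)-(8.26) and G[0] = 0."""
    if p == 0:
        return 0
    parent = freq_index(p // 2)          # G of the parent node
    is_right_child = p % 2               # 0 for node 2p', 1 for node 2p' + 1
    # An even parent index keeps the child order, an odd one swaps it.
    return 2 * parent + (is_right_child ^ (parent % 2))


def tree_index(k):
    """Inverse permutation p = G^{-1}[k], in bitwise form (assumed Gray code)."""
    return k ^ (k >> 1)


# Frequency index of the 8 nodes at depth j - L = 3, listed for p = 0, ..., 7.
depth3 = [freq_index(p) for p in range(8)]
```

At depth 3 the natural tree order $p = 0, \ldots, 7$ is mapped to the frequency indices stored in `depth3`, and `tree_index` maps each frequency index back to its node.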
The permutation $k = G[p]$ is characterized by the recursive equations (8.25) and (8.26). The inverse permutation $p = G^{-1}[k]$ is called a Gray code in coding theory. This permutation is implemented on binary strings by deriving the following relations from (8.25) and (8.26). If $p_i$ is the $i$th binary digit of the integer $p$ and $k_i$ the $i$th digit of $k = G[p]$ then
$$k_i = \left( \sum_{l=i}^{+\infty} p_l \right) \bmod 2 \tag{8.30}$$
and
$$p_i = (k_i + k_{i+1}) \bmod 2. \tag{8.31}$$

Compactly Supported Wavelet Packets. Wavelet packets of compact support have a more complicated frequency behavior than Shannon wavelet packets, but the previous analysis provides important insights. If $h$ is a finite impulse response filter, $\hat h$ does not have a support restricted to $[-\pi/2, \pi/2]$ over the interval $[-\pi, \pi]$. It is however true that the energy of $\hat h$ is mostly concentrated in $[-\pi/2, \pi/2]$. Similarly, the energy of $\hat g$ is mostly concentrated in $[-\pi, -\pi/2] \cup [\pi/2, \pi]$, for $\omega \in [-\pi, \pi]$. As a consequence, the localization properties of Shannon wavelet packets remain qualitatively valid. The energy of $\hat\psi_j^p$ is mostly concentrated over
$$I_j^k = [-(k+1)\pi 2^{-j},\ -k\pi 2^{-j}] \cup [k\pi 2^{-j},\ (k+1)\pi 2^{-j}]$$
with $k = G[p]$. The larger the proportion of energy of $\hat h$ in $[-\pi/2, \pi/2]$, the more concentrated the energy of $\hat\psi_j^p$ in $I_j^k$. The energy concentration of $\hat h$ in $[-\pi/2, \pi/2]$ is increased by having more zeroes at $\pi$, so that $\hat h(\omega)$ remains close to zero in $[-\pi, -\pi/2] \cup [\pi/2, \pi]$. Theorem 7.4 proves that this is equivalent to imposing that the wavelets constructed in the wavelet packet tree have many vanishing moments.

These qualitative statements must be interpreted carefully. The side lobes of $\hat\psi_j^p$ beyond the intervals $I_j^k$ are not completely negligible. For example, wavelet packets created with a Haar filter are discontinuous functions. Hence $|\hat\psi_j^p(\omega)|$ decays like $|\omega|^{-1}$ at high frequencies, which indicates the existence of large side lobes outside $I_j^k$. It is also important to note that contrary to Shannon wavelet packets, compactly supported wavelet packets cannot be written as dilated windows modulated by cosine functions of varying frequency. When the scale increases, wavelet packets generally do not converge to cosine functions. They may have a wild behavior with localized oscillations of considerable amplitude.

Walsh Wavelet Packets. Walsh wavelet packets are generated by the Haar conjugate mirror filter
$$h[n] = \begin{cases} \dfrac{1}{\sqrt{2}} & \text{if } n = 0, 1 \\ 0 & \text{otherwise.} \end{cases}$$
They have very different properties from Shannon wavelet packets since the filter $h$ is well localized in time but not in frequency. The corresponding scaling function is $\phi = \mathbf{1}_{[0,1]}$ and the approximation space $V_L = W_L^0$ is composed of functions that are constant over the intervals $[2^L n, 2^L (n+1))$, for $n \in \mathbb{Z}$. Since all wavelet packets created with this filter belong to $V_L$, they are piecewise constant functions. The support size of $h$ is $K = 2$, so Walsh functions $\psi_j^p$ have a support of size $2^j$. The wavelet packet recursive relations (8.18) become
$$\psi_{j+1}^{2p}(t) = \frac{1}{\sqrt{2}}\, \psi_j^p(t) + \frac{1}{\sqrt{2}}\, \psi_j^p(t - 2^j) \tag{8.32}$$
and
$$\psi_{j+1}^{2p+1}(t) = \frac{1}{\sqrt{2}}\, \psi_j^p(t) - \frac{1}{\sqrt{2}}\, \psi_j^p(t - 2^j). \tag{8.33}$$
Since $\psi_j^p$ has a support of size $2^j$, it does not intersect the support of $\psi_j^p(t - 2^j)$. These wavelet packets are thus constructed by juxtaposing $\psi_j^p$ with its translated version whose sign might be changed. Figure 8.4 shows the Walsh functions at the depth $j - L = 3$ of the wavelet packet tree. The following proposition computes the number of oscillations of $\psi_j^p$.

Proposition 8.3 The support of a Walsh wavelet packet $\psi_j^p$ is $[0, 2^j]$. Over its support, $\psi_j^p(t) = \pm 2^{-j/2}$. It changes sign $k = G[p]$ times, where $G[p]$ is the permutation defined by (8.25) and (8.26).
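Proposition 8.3 can be checked on small examples. The sketch below is an illustration and not part of the original construction: it assumes $L = 0$, represents each Walsh wavelet packet by its constant values on the unit intervals of its support, and uses the hypothetical helper names `walsh_packets`, `sign_changes` and `freq_index`. The packets are built by the juxtaposition rules (8.32) and (8.33), and their sign changes are compared with $G[p]$ from (8.25) and (8.26).

```python
import math


def freq_index(p):
    """k = G[p] from the recursion (8.25)-(8.26), with G[0] = 0."""
    if p == 0:
        return 0
    parent = freq_index(p // 2)
    return 2 * parent + ((p % 2) ^ (parent % 2))


def walsh_packets(depth):
    """Values of psi_j^p on unit intervals, for p = 0, ..., 2**depth - 1 (L = 0)."""
    packets = [[1.0]]                     # root: phi = indicator of [0, 1]
    for _ in range(depth):
        children = []
        for psi in packets:
            half = [v / math.sqrt(2.0) for v in psi]
            children.append(half + half)                   # relation (8.32)
            children.append(half + [-v for v in half])     # relation (8.33)
        packets = children
    return packets


def sign_changes(samples):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)


depth = 3
packets = walsh_packets(depth)
changes = [sign_changes(psi) for psi in packets]
expected = [freq_index(p) for p in range(2 ** depth)]
```

For the depth used here, `changes` and `expected` coincide, and every sample has modulus $2^{-j/2}$.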
Proof $^2$. By induction on $j$, we derive from (8.32) and (8.33) that the support is $[0, 2^j]$ and that $\psi_j^p(t) = \pm 2^{-j/2}$ over its support. Let $k$ be the number of times that $\psi_j^p$ changes sign. The number of times that $\psi_{j+1}^{2p}$ and $\psi_{j+1}^{2p+1}$ change sign is either $2k$ or $2k + 1$ depending on the sign of the first and last non-zero values of $\psi_j^p$. If $k$ is even, then the sign of the first and last non-zero values of $\psi_j^p$ are the same. Hence the number of times $\psi_{j+1}^{2p}$ and $\psi_{j+1}^{2p+1}$ change sign is respectively $2k$ and $2k + 1$. If $k$ is odd, then the sign of the first and last non-zero values of $\psi_j^p$ are different. The number of times $\psi_{j+1}^{2p}$ and $\psi_{j+1}^{2p+1}$ change sign is then $2k + 1$ and $2k$. These recursive properties are identical to (8.25) and (8.26).

A Walsh wavelet packet $\psi_j^p$ is therefore a square wave with $k = G[p]$ oscillations over a support of size $2^j$. This result is similar to (8.29), which proves that a Shannon wavelet packet $\psi_j^p$ is a window modulated by a cosine of frequency $2^{-j}(k + 1/2)\pi$. In both cases, the oscillation frequency of wavelet packets is approximately proportional to $2^{-j} k$.

Heisenberg Boxes. For display purposes, we associate to any wavelet packet $\psi_j^p(t - 2^j n)$ a Heisenberg rectangle which indicates the time and frequency domains where the energy of this wavelet packet is mostly concentrated. The time support of the rectangle is set to be the same as the time support of a Walsh wavelet packet $\psi_j^p(t - 2^j n)$, which is
equal to $[2^j n, 2^j (n+1)]$. The frequency support of the rectangle is defined as the positive frequency support $[k\pi 2^{-j}, (k+1)\pi 2^{-j}]$ of Shannon wavelet packets, with $k = G[p]$. The scale $2^j$ modifies the time and frequency elongation of this time-frequency rectangle, but its surface remains constant. The indices $n$ and $k$ give its localization in time and frequency. General wavelet packets, for example computed with Daubechies filters, have a time and a frequency spread that is much wider than this Heisenberg rectangle. However, this convention has the advantage of associating a wavelet packet basis to an exact paving of the time-frequency plane. Figure 8.5 shows an example of such a paving and the corresponding wavelet packet tree.

Figure 8.4: Frequency ordered Walsh wavelet packets computed with a Haar filter, at the depth $j - L = 3$ of the wavelet packet tree, with $L = 0$. (Panels: $\psi_3^0(t)$ to $\psi_3^7(t)$.)
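The Heisenberg box convention can be sketched numerically. The example below is an illustration under stated assumptions: $L = 0$, frequencies are expressed in units of $\pi$, the admissible tree is a small hypothetical example given by its leaves $(j, p)$, and the names `heisenberg_box` and `freq_index` are not from the text. It checks that the frequency intervals of the leaves pave $[0, \pi]$ and that all boxes have the same surface.

```python
from fractions import Fraction


def freq_index(p):
    """k = G[p] from the recursion (8.25)-(8.26), with G[0] = 0."""
    if p == 0:
        return 0
    parent = freq_index(p // 2)
    return 2 * parent + ((p % 2) ^ (parent % 2))


def heisenberg_box(j, p, n):
    """Time interval and frequency interval (in units of pi) of psi_j^p(t - 2**j n)."""
    scale = Fraction(2) ** j
    k = freq_index(p)
    time_interval = (scale * n, scale * (n + 1))
    freq_interval = (k / scale, (k + 1) / scale)
    return time_interval, freq_interval


# Hypothetical admissible tree with L = 0, described by its leaves (j, p).
leaves = [(1, 1), (2, 0), (3, 2), (3, 3)]
bands = sorted(heisenberg_box(j, p, 0)[1] for j, p in leaves)

# Surface of each box: time width times frequency width, in units of pi.
surfaces = []
for j, p in leaves:
    (t0, t1), (f0, f1) = heisenberg_box(j, p, 0)
    surfaces.append((t1 - t0) * (f1 - f0))
```

Using exact fractions avoids rounding issues when comparing interval end points.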
Figure 8.6 displays the decomposition of a multi-chirp signal whose spectrogram was shown in Figure 4.3. The wavelet packet basis is computed with the Daubechies 10 filter. As expected, the coefficients of large amplitude are along the trajectory of the linear and the quadratic chirps that appear in Figure 4.3. We also see the trace of the two modulated Gaussian functions located at $t = 512$ and $t = 896$.

Figure 8.5: The wavelet packet tree on the left divides the frequency axis in several intervals. The Heisenberg boxes of the corresponding wavelet packet basis are on the right.

Figure 8.6: Wavelet packet decomposition of the multi-chirp signal whose spectrogram is shown in Figure 4.3. The darker the gray level of each Heisenberg box the larger the amplitude $|\langle f, \psi_j^p \rangle|$ of the corresponding wavelet packet coefficient. (Top panel: $f(t)$ against $t$; bottom panel: $\omega / 2\pi$ against $t$.)

8.1.3 Particular Wavelet Packet Bases

Among the many wavelet packet bases, we describe the properties of M-band wavelet bases, "local cosine type" bases and "best" bases. The wavelet packet tree is frequency ordered, which means that $\psi_j^k$ has a Fourier transform whose energy is essentially concentrated in the interval $[k\pi 2^{-j}, (k+1)\pi 2^{-j}]$, for positive frequencies.

Figure 8.7: (a): Wavelet packet tree of a dyadic wavelet basis. (b): Wavelet packet tree of an M-band wavelet basis with $M = 2$.

M-band Wavelets. The standard dyadic wavelet basis is an example
of a wavelet packet basis of $V_L$, obtained by choosing the admissible binary tree shown in Figure 8.7(a). Its leaves are the nodes $k = 1$ at all depth $j - L$ and thus correspond to the wavelet packet basis
$$\{\psi_j^1(t - 2^j n)\}_{n \in \mathbb{Z},\ j > L}$$
constructed by dilating a single wavelet $\psi^1$:
$$\psi_j^1(t) = \frac{1}{\sqrt{2^j}}\, \psi^1\left( \frac{t}{2^j} \right).$$
The energy of $\hat\psi^1$ is mostly concentrated in the interval $[-2\pi, -\pi] \cup [\pi, 2\pi]$. The octave bandwidth for positive frequencies is the ratio between the bandwidth of the pass band and its distance to the zero frequency. It is equal to 1 octave. This quantity remains constant by dilation and specifies the frequency resolution of the wavelet transform.

Wavelet packets include other wavelet bases constructed with several wavelets having a better frequency resolution. Let us consider the admissible binary tree of Figure 8.7(b), whose leaves are indexed by $k = 2$ and $k = 3$ at all depth $j - L$. The resulting wavelet packet basis of $V_L$ is
$$\{\psi_j^2(t - 2^j n),\ \psi_j^3(t - 2^j n)\}_{n \in \mathbb{Z},\ j > L+1}.$$
These wavelet packets can be rewritten as dilations of two elementary wavelets $\psi^2$ and $\psi^3$:
$$\psi_j^2(t) = \frac{1}{\sqrt{2^{j-1}}}\, \psi^2\left( \frac{t}{2^{j-1}} \right), \qquad \psi_j^3(t) = \frac{1}{\sqrt{2^{j-1}}}\, \psi^3\left( \frac{t}{2^{j-1}} \right).$$
Over positive frequencies, the energy of $\hat\psi^2$ and $\hat\psi^3$ is mostly concentrated respectively in $[\pi, 3\pi/2]$ and $[3\pi/2, 2\pi]$. The octave bandwidths of $\hat\psi^2$ and $\hat\psi^3$ are thus respectively equal to $1/2$ and $1/3$. These wavelets $\psi^2$ and $\psi^3$ have a higher frequency resolution than $\psi^1$, but their time support is twice as large. Figure 8.8(a) gives a 2-band wavelet decomposition of the multi-chirp signal shown in Figure 8.6, calculated with the Daubechies 10 filter.

Higher resolution wavelet bases can be constructed with an arbitrary number of $M = 2^l$ wavelets. In a frequency ordered wavelet packet tree, we define an admissible binary tree whose leaves are the indexes $2^l \le k < 2^{l+1}$ at the depth $j - L > l$. The resulting wavelet packet basis
$$\{\psi_j^k(t - 2^j n)\}_{M \le k < 2M,\ j > L + l}$$
can be written as dilations and translations of $M$ elementary wavelets
$$\psi_j^k(t) = \frac{1}{\sqrt{2^{j-l}}}\, \psi^k\left( \frac{t}{2^{j-l}} \right).$$
The support size of $\psi^k$ is proportional to $M = 2^l$. Over positive frequencies, the energy of $\hat\psi^k$ is mostly concentrated in $[k\pi 2^{-l}, (k+1)\pi 2^{-l}]$. The octave bandwidth is therefore $\pi 2^{-l} / (k\pi 2^{-l}) = k^{-1}$, for $M \le k < 2M$. The $M$ wavelets $\{\psi^k\}_{M \le k < 2M}$ have an octave bandwidth of at most $M^{-1}$ but a time support $M$ times larger than the support of $\psi^1$. Such wavelet bases are called M-band wavelets. More general families of M-band wavelets can also be constructed with other M-band filter banks studied in [73].
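The octave bandwidths of the $M$ elementary wavelets follow from a short computation. The sketch below is only an illustration: the function name `m_band_octave_bandwidths` is hypothetical, and frequency bands are expressed in units of $\pi$ with exact fractions.

```python
from fractions import Fraction


def m_band_octave_bandwidths(l):
    """Octave bandwidths of the M = 2**l wavelets psi^k, for M <= k < 2M."""
    m = 2 ** l
    bandwidths = {}
    for k in range(m, 2 * m):
        low = Fraction(k, m)             # band is [low * pi, high * pi]
        high = Fraction(k + 1, m)
        bandwidths[k] = (high - low) / low   # bandwidth over distance to zero
    return bandwidths


dyadic = m_band_octave_bandwidths(0)     # single wavelet psi^1
two_band = m_band_octave_bandwidths(1)   # wavelets psi^2 and psi^3
four_band = m_band_octave_bandwidths(2)  # wavelets psi^4, ..., psi^7
```

Each entry equals $k^{-1}$, so the largest octave bandwidth in an M-band family is reached at $k = M$.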
Figure 8.8: (a): Heisenberg boxes of a 2-band wavelet decomposition of the multi-chirp signal shown in Figure 8.6. (b): Decomposition of the same signal in a pseudo-local cosine wavelet packet basis. (Both panels: $\omega / 2\pi$ against $t$.)

Pseudo Local Cosine Bases. Pseudo local cosine bases are constructed with an admissible binary tree which is a full tree of depth $J - L \ge 0$. The leaves are the nodes indexed by $0 \le k < 2^{J-L}$ and the resulting wavelet packet basis is
$$\{\psi_J^k(t - 2^J n)\}_{n \in \mathbb{Z},\ 0 \le k < 2^{J-L}}. \tag{8.34}$$
If these wavelet packets are constructed with a conjugate mirror filter of size $K$, they have a support of size $(K - 1) 2^J$. Over positive frequencies, the energy of $\hat\psi_J^k$ is concentrated in $[k\pi 2^{-J}, (k+1)\pi 2^{-J}]$. The bandwidth of all these wavelet packets is therefore approximately constant and equal to $\pi 2^{-J}$. The Heisenberg boxes of these wavelet packets have the same size and divide the time-frequency plane in the rectangular grid illustrated in Figure 8.9.
Figure 8.9: Admissible tree and Heisenberg boxes of a wavelet packet pseudo local cosine basis.

Shannon wavelet packets $\psi_J^k$ are written in (8.29) as a dilated window modulated by cosine functions of frequency $2^{-J}(k + 1/2)\pi$. In this case, the uniform wavelet packet basis (8.34) is therefore a local cosine basis, with windows of constant size. This result is not valid for wavelet packets constructed with different conjugate mirror filters. Nevertheless, the time and frequency resolution of uniform wavelet packet bases (8.34) remains constant, like that of local cosine bases constructed with windows of constant size. Figure 8.8(b) gives the decomposition coefficients of a signal in such a uniform wavelet packet basis.

Best Basis. Applications of orthogonal bases often rely on their ability to efficiently approximate signals with only a few non-zero vectors. Choosing a wavelet packet basis that concentrates the signal energy over a few coefficients also reveals its time-frequency structures. Section 9.3.2 describes a fast algorithm that searches for a "best" basis that minimizes a Schur concave cost function, among all wavelet packet bases. The wavelet packet basis of Figure 8.6 is calculated with this best basis search.

8.1.4 Wavelet Packet Filter Banks

Wavelet packet coefficients are computed with a filter bank algorithm
that generalizes the fast discrete wavelet transform. This algorithm is a straightforward iteration of the two-channel filter bank decomposition presented in Section 7.3.2. It was therefore used in signal processing by Croisier, Esteban and Galand [141] when they introduced the first family of perfect reconstruction filters. The algorithm is presented here from a wavelet packet point of view.

To any discrete signal input $b[n]$ sampled at intervals $N^{-1} = 2^L$, like in (7.116) we associate $f \in V_L$ whose decomposition coefficients $a_L[n] = \langle f, \phi_{L,n} \rangle$ satisfy
$$b[n] = N^{1/2}\, a_L[n] \approx f(N^{-1} n). \tag{8.35}$$
For any node $(j, p)$ of the wavelet packet tree, we denote the wavelet packet coefficients
$$d_j^p[n] = \langle f(t),\, \psi_j^p(t - 2^j n) \rangle.$$
At the root of the tree $d_L^0[n] = a_L[n]$ is computed from $b[n]$ with (8.35).

Wavelet Packet Decomposition. We denote $\bar x[n] = x[-n]$ and by $\check x$ the signal obtained by inserting a zero between each sample of $x$. The following proposition generalizes the fast wavelet transform Theorem 7.7.

Proposition 8.4 At the decomposition
$$d_{j+1}^{2p}[k] = d_j^p \star \bar h[2k] \quad \text{and} \quad d_{j+1}^{2p+1}[k] = d_j^p \star \bar g[2k]. \tag{8.36}$$
At the reconstruction
$$d_j^p[k] = \check d_{j+1}^{2p} \star h[k] + \check d_{j+1}^{2p+1} \star g[k]. \tag{8.37}$$
The proof of these equations is identical to the proof of Theorem 7.7. The coefficients of wavelet packet children $d_{j+1}^{2p}$ and $d_{j+1}^{2p+1}$ are obtained by subsampling the convolutions of $d_j^p$ with $\bar h$ and $\bar g$. Iterating these equations along the branches of a wavelet packet tree computes all wavelet packet coefficients, as illustrated by Figure 8.10(a). From the wavelet packet coefficients at the leaves $\{j_i, p_i\}_{1 \le i \le I}$ of an admissible subtree, we recover $d_L^0$ at the top of the tree by computing (8.37) for each node inside the tree, as illustrated by Figure 8.10(b).
Figure 8.10: (a): Wavelet packet filter-bank decomposition with successive filterings and subsamplings. (b): Reconstruction by inserting zeros and filtering the outputs.

Finite Signals. If $a_L$ is a finite signal of size $2^{-L} = N$, we are facing the same border convolution problems as in a fast discrete wavelet transform. One approach explained in Section 7.5.1 is to periodize the wavelet packet basis. The convolutions (8.36) are then replaced by circular convolutions. To avoid introducing sharp transitions with the periodization, one can also use the border filters described in Section 7.5.3. In either case, $d_j^p$ has $2^{-j}$ samples. At any depth $j - L$ of the tree, the wavelet packet signals $\{d_j^p\}_{0 \le p < 2^{j-L}}$ include a total of $N$ coefficients. Since the maximum depth is $\log_2 N$, there are at most $N \log_2 N$ coefficients in a full wavelet packet tree.
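The decomposition (8.36) and the reconstruction (8.37) with circular convolutions can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: filters are stored as dictionaries indexed by $n$, the Daubechies filter with four coefficients is an assumed example, the test signal is arbitrary, and the function names `split`, `merge`, `full_tree` and `rebuild` are hypothetical. Nodes at each depth are stored in the natural tree order $p = 0, 1, \ldots$, not in frequency order.

```python
import math


def mirror_filter(h):
    """g[n] = (-1)**(1 - n) * h[1 - n], with filters stored as {n: value}."""
    return {1 - n: (1.0 if n % 2 == 0 else -1.0) * v for n, v in h.items()}


def split(d, h, g):
    """One step of (8.36) with circular convolutions: returns (d^{2p}, d^{2p+1})."""
    size = len(d)

    def analyze(f):
        # d * fbar[2k] = sum over n of f[n] d[n + 2k], indices taken modulo size.
        return [sum(c * d[(2 * k + n) % size] for n, c in f.items())
                for k in range(size // 2)]

    return analyze(h), analyze(g)


def merge(low, high, h, g):
    """One step of (8.37): insert zeros, filter circularly with h and g, and add."""
    size = 2 * len(low)
    d = [0.0] * size
    for m in range(len(low)):
        for n, c in h.items():
            d[(2 * m + n) % size] += c * low[m]
        for n, c in g.items():
            d[(2 * m + n) % size] += c * high[m]
    return d


def full_tree(a, h, g, depth):
    """levels[i][p] holds the coefficients of node p at depth i of the full tree."""
    levels = [[list(a)]]
    for _ in range(depth):
        levels.append([child for d in levels[-1] for child in split(d, h, g)])
    return levels


def rebuild(nodes, h, g):
    """Recover the root from all nodes at one depth by iterating (8.37)."""
    while len(nodes) > 1:
        nodes = [merge(nodes[i], nodes[i + 1], h, g)
                 for i in range(0, len(nodes), 2)]
    return nodes[0]


s3 = math.sqrt(3.0)
h = {n: c / (4.0 * math.sqrt(2.0))
     for n, c in enumerate([1 + s3, 3 + s3, 3 - s3, 1 - s3])}
g = mirror_filter(h)
a = [4.0, 1.0, -2.0, 3.0, 0.5, 0.0, 7.0, -1.0]   # arbitrary signal, N = 8
levels = full_tree(a, h, g, depth=3)
recovered = rebuild(levels[-1], h, g)
```

Each depth of `levels` contains $N$ coefficients in total, and `recovered` reproduces `a` up to rounding errors because the periodized filters remain orthogonal.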
In a full wavelet packet tree of depth $\log_2 N$, all coefficients are computed by iterating (8.36) for $L \le j < 0$. If $h$ and $g$ have $K$ non-zero coefficients, this requires $KN \log_2 N$ additions and multiplications. This is quite spectacular since there are more than $2^{N/2}$ different wavelet packet bases included in this wavelet packet tree.
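To give an order of magnitude, the number of bases can be computed with the recurrence (8.16) and compared with the operation count $KN \log_2 N$. The sketch below is an illustration: the function names `count_bases` and `filter_bank_cost` are hypothetical, and the values $N = 16$ and $K = 4$ are arbitrary choices.

```python
def count_bases(depth):
    """Number B_J of wavelet packet bases in a full binary tree of depth J,
    from B_0 = 1 and the recurrence (8.16): B_{J+1} = B_J**2 + 1."""
    b = 1
    for _ in range(depth):
        b = b * b + 1
    return b


def filter_bank_cost(n, k):
    """Operation count K N log2(N) for a signal whose size n is a power of 2."""
    log2_n = n.bit_length() - 1
    return k * n * log2_n


first_counts = [count_bases(depth) for depth in range(5)]

n, k = 16, 4                             # arbitrary example values
bases = count_bases(n.bit_length() - 1)  # B_{log2 N}
cost = filter_bank_cost(n, k)
```

Python integers have arbitrary precision, so `count_bases` stays exact even though $B_J$ grows doubly exponentially with $J$.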
The computational complexity to recover $a_L = d_L^0$ from the wavelet packet coefficients of an admissible tree increases with the number of inside nodes of the admissible tree. When the admissible tree is the full binary tree of depth $\log_2 N$, the number of operations is maximum and equal to $KN \log_2 N$ multiplications and additions. If the admissible subtree is a wavelet tree, we need fewer than $2KN$ multiplications and additions.

Discrete Wavelet Packet Bases of $\ell^2(\mathbb{Z})$. The signal decomposition in a conjugate mirror filter bank can also be interpreted as an expansion in discrete wavelet packet bases of $\ell^2(\mathbb{Z})$. This is proved with a result similar to Theorem 8.1.
Theorem 8.2 Let $\{\theta_j[m - 2^{j-L} n]\}_{n \in \mathbb{Z}}$ be an orthonormal basis of a space $U_j$, with $j - L \in \mathbb{N}$. Define
$$\theta_{j+1}^0[m] = \sum_{n=-\infty}^{+\infty} h[n]\, \theta_j[m - 2^{j-L} n], \qquad \theta_{j+1}^1[m] = \sum_{n=-\infty}^{+\infty} g[n]\, \theta_j[m - 2^{j-L} n]. \tag{8.38}$$
The family
$$\{\theta_{j+1}^0[m - 2^{j+1-L} n],\ \theta_{j+1}^1[m - 2^{j+1-L} n]\}_{n \in \mathbb{Z}}$$
is an orthonormal basis of $U_j$.

The proof is similar to the proof of Theorem 8.1. As in the continuous time case, we derive from this theorem a binary tree of discrete wavelet packets. At the root of the discrete wavelet packet tree is the space $W_L^0 = \ell^2(\mathbb{Z})$ of discrete signals obtained with a sampling interval $N^{-1} = 2^L$. It admits a canonical basis of Diracs $\{\psi_L^0[m - n] = \delta[m - n]\}_{n \in \mathbb{Z}}$. The signal $a_L[m]$ is specified by its sample values in this basis. One can verify that the convolutions and subsamplings (8.36) compute
$$d_j^p[n] = \langle a_L[m],\, \psi_j^p[m - 2^{j-L} n] \rangle$$
where $\{\psi_j^p[m - 2^{j-L} n]\}_{n \in \mathbb{Z}}$ is an orthogonal basis of a space $W_j^p$. These discrete wavelet packets are recursively defined for any $j \ge L$ and $0 \le p < 2^{j-L}$ by
$$\psi_{j+1}^{2p}[m] = \sum_{n=-\infty}^{+\infty} h[n]\, \psi_j^p[m - 2^{j-L} n], \qquad \psi_{j+1}^{2p+1}[m] = \sum_{n=-\infty}^{+\infty} g[n]\, \psi_j^p[m - 2^{j-L} n]. \tag{8.39}$$

8.2 Image Wavelet Packets $^2$
8.2.1 Wavelet Packet Quad-Tree

We construct wavelet packet bases of $L^2(\mathbb{R}^2)$ whose elements are separable products of wavelet packets $\psi_j^p(x_1 - 2^j n_1)\, \psi_j^q(x_2 - 2^j n_2)$ having the same scale along $x_1$ and $x_2$. These separable wavelet packet bases are associated to quad-trees, and divide the two-dimensional Fourier plane $(\omega_1, \omega_2)$ into square regions of varying sizes. Separable wavelet packet bases are extensions of separable wavelet bases.
If images approximated at the scale 2L, to the root of the quad2
tree we associate the approximation space VL = VL VL L2(R 2 )
de ned in Section 7.7.1. Section 8.1.1 explains how to decompose VL
with a binary tree of wavelet packet spaces Wjp VL, which admit an
orthogonal basis f jp(t ; 2j n)gn2Z. The two-dimensional wavelet packet
quad-tree is composed of separable wavelet packet spaces. Each node
of this quad-tree is labeled by a scale 2j and two integers 0 p < 2j;L
and 0 q < 2j;L, and corresponds to a separable space Wjp q = Wjp Wjq : (8.40) The resulting separable wavelet packet for x = (x1 x2 ) is
pq
p
q
j (x) = j (x1 ) j (x2 ) : 8.2. IMAGE WAVELET PACKETS 455 Theorem A.3 proves that an orthogonal basis of Wjp q is obtained with
a separable product of the wavelet packet bases of Wjp and Wjq , which
can be written
n pq
o
(x ; 2j n)
:
j
n2Z2
0
2
At the root WL 0 = VL and the wavelet packet is a two-dimensional
scaling function
00 L (x) = L (x) = L (x1 ) L(x2 )
2 : One-dimensional wavelet packet spaces satisfy
$$W_j^p = W_{j+1}^{2p} \oplus W_{j+1}^{2p+1} \quad \text{and} \quad W_j^q = W_{j+1}^{2q} \oplus W_{j+1}^{2q+1}.$$
Inserting these equations in (8.40) proves that $W_j^{p,q}$ is the direct sum of the four orthogonal subspaces
$$W_j^{p,q} = W_{j+1}^{2p,2q} \oplus W_{j+1}^{2p+1,2q} \oplus W_{j+1}^{2p,2q+1} \oplus W_{j+1}^{2p+1,2q+1}. \tag{8.41}$$
These subspaces are located at the four children nodes in the quad-tree, as shown by Figure 8.11. We call admissible quad-tree any quad-tree whose nodes have either 0 or 4 children. Let $\{j_i, p_i, q_i\}_{1 \le i \le I}$ be the indices of the nodes at the leaves of an admissible quad-tree. Applying recursively the reconstruction sum (8.41) along the branches of this quad-tree gives an orthogonal decomposition of $W_L^{0,0}$:
$$W_L^{0,0} = \bigoplus_{i=1}^{I} W_{j_i}^{p_i,q_i}.$$
The union of the corresponding wavelet packet bases
$$\left\{ \psi_{j_i}^{p_i,q_i}(x - 2^{j_i} n) \right\}_{(n_1,n_2) \in \mathbb{Z}^2,\ 1 \le i \le I}$$
is therefore an orthonormal basis of $V_L^2 = W_L^{0,0}$.

Number of Wavelet Packet Bases The number of different bases in a full wavelet packet quad-tree of depth $J$ is equal to the number of admissible subtrees. The following proposition proves that there are more than $2^{4^{J-1}}$ such bases.
Figure 8.11: A wavelet packet quad-tree for images is constructed recursively by decomposing each separable space $W_j^{p,q}$ in four subspaces $W_{j+1}^{2p,2q}$, $W_{j+1}^{2p,2q+1}$, $W_{j+1}^{2p+1,2q}$ and $W_{j+1}^{2p+1,2q+1}$.

Proposition 8.5 The number $B_J$ of wavelet packet bases in a full wavelet packet quad-tree of depth $J$ satisfies
$$2^{4^{J-1}} \le B_J \le 2^{\frac{49}{48} 4^{J-1}}.$$

Proof 3. This result is proved with induction, as in the proof of Proposition 8.1. The reader can verify that $B_J$ satisfies an induction relation similar to (8.16):
$$B_{J+1} = B_J^4 + 1. \tag{8.42}$$
Since $B_0 = 1$, $B_1 = 2$, and $B_{J+1} \ge B_J^4$, we derive that $B_J \ge 2^{4^{J-1}}$. Moreover, for $J \ge 1$
$$\log_2 B_{J+1} = 4 \log_2 B_J + \log_2(1 + B_J^{-4}) \le 4 \log_2 B_J + \frac{1}{16},$$
which implies that
$$B_J \le 2^{\,4^{J-1} + \frac{1}{16} \sum_{j=0}^{J-2} 4^j} \le 2^{\frac{49}{48} 4^{J-1}}.$$

For an image of $N^2$ pixels, we shall see that the wavelet packet quad-tree has a depth at most $\log_2 N$. The number of wavelet packet bases thus satisfies
$$2^{\frac{N^2}{4}} \le B_{\log_2 N} \le 2^{\frac{49}{48} \frac{N^2}{4}}. \tag{8.43}$$

Spatial and Frequency Localization The spatial and frequency localization of two-dimensional wavelet packets is derived from the time-frequency analysis performed in Section 8.1.2. If the conjugate mirror filter $h$ has $K$ non-zero coefficients, we proved that $\psi_j^p$ has a support of size $2^j (K - 1)$, hence $\psi_j^p(x_1)\, \psi_j^q(x_2)$ has a square support of width $2^j (K - 1)$.
We showed that the Fourier transform of $\psi_j^p$ has its energy mostly concentrated in
$$[-(k+1)\pi 2^{-j},\ -k\pi 2^{-j}] \cup [k\pi 2^{-j},\ (k+1)\pi 2^{-j}],$$
where $k = G[p]$ is specified by Proposition 8.2. The Fourier transform of a two-dimensional wavelet packet $\psi_j^{p,q}$ therefore has its energy mostly concentrated in
$$[k_1 \pi 2^{-j},\ (k_1+1)\pi 2^{-j}] \times [k_2 \pi 2^{-j},\ (k_2+1)\pi 2^{-j}] \tag{8.44}$$
with $k_1 = G[p]$ and $k_2 = G[q]$, and in the three squares that are symmetric with respect to the two axes $\omega_1 = 0$ and $\omega_2 = 0$. An admissible wavelet packet quad-tree decomposes the positive frequency quadrant into squares of dyadic sizes, as illustrated in Figure 8.12. For example, the leaves of a full wavelet packet quad-tree of depth $j - L$ define a wavelet packet basis that decomposes the positive frequency quadrant into squares of constant width equal to $\pi 2^{-j}$. This wavelet packet basis is similar to a two-dimensional local cosine basis with windows of constant size.

8.2.2 Separable Filter Banks

The decomposition coefficients of an image in a separable wavelet packet basis are computed with a separable extension of the filter bank algorithm described in Section 8.1.4. Let $b[n]$ be an input image whose pixels have a distance $2^L = N^{-1}$. We associate to $b[n]$ a function $f \in V_L^2$ approximated at the scale $2^L$, whose decomposition coefficients $a_L[n] = \langle f(x),\ \phi_L^2(x - 2^L n) \rangle$ are defined like in (7.265):
$$b[n] = N\, a_L[n] \approx f(N^{-1} n).$$
The wavelet packet coefficients
$$d_j^{p,q}[n] = \langle f,\ \psi_j^{p,q}(x - 2^j n) \rangle$$
characterize the orthogonal projection of $f$ in $W_j^{p,q}$. At the root, $d_L^{0,0} = a_L$.
Figure 8.12: A wavelet packet quad-tree decomposes the positive frequency quadrant into squares of progressively smaller sizes as we go down the tree.

Separable Filter Bank From the separability of wavelet packet bases and the one-dimensional convolution formula of Proposition 8.4, we derive that for any $n = (n_1, n_2)$
$$d_{j+1}^{2p,2q}[n] = d_j^{p,q} \star \bar h \bar h[2n], \qquad d_{j+1}^{2p+1,2q}[n] = d_j^{p,q} \star \bar g \bar h[2n], \tag{8.45}$$
$$d_{j+1}^{2p,2q+1}[n] = d_j^{p,q} \star \bar h \bar g[2n], \qquad d_{j+1}^{2p+1,2q+1}[n] = d_j^{p,q} \star \bar g \bar g[2n], \tag{8.46}$$
where $\bar h[n] = h[-n]$ and a product of two filters denotes the separable filter $hg[n] = h[n_1]\, g[n_2]$. The coefficients of a wavelet packet quad-tree are thus computed by iterating these equations along the branches of the quad-tree. The calculations are performed with separable convolutions along the rows and columns of the image, illustrated in Figure 8.13.
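As an illustration of (8.45) and (8.46), the following sketch splits one quad-tree node into its four children with separable row and column filtering, and merges them back as a check. It is not the book's implementation: it assumes Haar filters, periodic border handling, and filters indexed from 0; the helper names (`analysis_1d`, `synthesis_1d`, `split_node`, `merge_node`) are hypothetical.

```python
import numpy as np

def analysis_1d(x, filt, axis):
    """Return y[n] = sum_m x[m] filt[m - 2n] along `axis` (periodic borders)."""
    x = np.moveaxis(x, axis, -1)
    y = np.zeros(x.shape[:-1] + (x.shape[-1] // 2,))
    for k, c in enumerate(filt):
        # np.roll(x, -k)[..., 2n] equals x[..., (2n + k) mod N]
        y += c * np.roll(x, -k, axis=-1)[..., ::2]
    return np.moveaxis(y, -1, axis)

def synthesis_1d(y, filt, axis):
    """Return x[m] = sum_n y[n] filt[m - 2n] along `axis` (periodic borders)."""
    y = np.moveaxis(y, axis, -1)
    up = np.zeros(y.shape[:-1] + (2 * y.shape[-1],))
    up[..., ::2] = y  # insert a zero after every sample
    x = np.zeros_like(up)
    for k, c in enumerate(filt):
        # filt[k] * y[n] contributes at position m = 2n + k
        x += c * np.roll(up, k, axis=-1)
    return np.moveaxis(x, -1, axis)

def split_node(d, h, g):
    """One quad-tree split. Child (a, b) is filtered with g along n1 when
    a == 1 (h otherwise) and with g along n2 when b == 1 (h otherwise)."""
    filt = (h, g)
    return {(a, b): analysis_1d(analysis_1d(d, filt[a], 0), filt[b], 1)
            for a in (0, 1) for b in (0, 1)}

def merge_node(children, h, g):
    """Inverse of split_node for orthogonal conjugate mirror filters."""
    filt = (h, g)
    return sum(synthesis_1d(synthesis_1d(c, filt[b], 1), filt[a], 0)
               for (a, b), c in children.items())

# Haar conjugate mirror filters (an assumption made for this example).
h = np.array([1.0, 1.0]) / np.sqrt(2.0)
g = np.array([1.0, -1.0]) / np.sqrt(2.0)

image = np.random.default_rng(0).standard_normal((8, 8))
children = split_node(image, h, g)      # four images of size 4 x 4
restored = merge_node(children, h, g)   # equals `image` up to rounding
```

Calling `split_node` again on any child walks one level further down the quad-tree. Each split costs a number of operations proportional to the filter length times the number of pixels.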
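At the reconstruction
$$d_j^{p,q}[n] = \check d_{j+1}^{2p,2q} \star hh[n] + \check d_{j+1}^{2p+1,2q} \star gh[n] + \check d_{j+1}^{2p,2q+1} \star hg[n] + \check d_{j+1}^{2p+1,2q+1} \star gg[n], \tag{8.47}$$
where $\check d$ is the image obtained by inserting a row and a column of zeros between the rows and columns of $d$. The image $a_L = d_L^{0,0}$ is reconstructed from wavelet packet coefficients stored at the leaves of any admissible quad-tree by repeating the partial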
reconstruction (8.47) in the inside nodes of this quad-tree.

Figure 8.13: (a): Wavelet packet decomposition implementing (8.45) and (8.46) with one-dimensional convolutions along the rows and columns of $d_j^{p,q}$. (b): Wavelet packet reconstruction implementing (8.47).

Finite Images If the image $a_L$ has $N^2 = 2^{-2L}$ pixels, the one-dimensional convolution border problems are solved with one of the two approaches described in Sections 7.5.1 and 7.5.3. Each wavelet packet image $d_j^{p,q}$ includes $2^{-2j}$ pixels. At the depth $j - L$, there are $N^2$ wavelet packet coefficients in $\{d_j^{p,q}\}_{0 \le p, q < 2^{j-L}}$. A quad-tree of maximum depth $\log_2 N$ thus includes $N^2 \log_2 N$ coefficients. If $h$ and $g$ have $K$ non-zero coefficients, the one-dimensional convolutions that implement (8.45) and (8.46) require $2K\, 2^{-2j}$ multiplications and additions. All wavelet packet coefficients at the depth $j + 1 - L$ are thus computed from wavelet packet coefficients located at the depth $j - L$ with $2KN^2$ calculations. The $N^2 \log_2 N$ wavelet packet coefficients of a full tree of depth $\log_2 N$ are therefore obtained with $2KN^2 \log_2 N$ multiplications and additions. The numerical complexity of reconstructing $a_L$ from a wavelet packet basis depends on the number of inside nodes of the corresponding quad-tree. The worst case is a reconstruction from the leaves of a full quad-tree of depth $\log_2 N$, which requires $2KN^2 \log_2 N$ multiplications and additions.

8.3 Block Transforms 1
Wavelet packet bases are designed by dividing the frequency axis in intervals of varying sizes. These bases are thus particularly well adapted to decomposing signals that have different behavior in different frequency intervals. If $f$ has properties that vary in time, it is then more appropriate to decompose $f$ in a block basis that segments the time axis in intervals whose sizes are adapted to the signal structures. The next section explains how to generate a block basis of $L^2(\mathbb{R})$ from any basis of $L^2[0,1]$. The cosine bases described in Sections 8.3.2 and 8.3.3 define particularly interesting block bases.

8.3.1 Block Bases

Block orthonormal bases are obtained by dividing the time axis in consecutive intervals $[a_p, a_{p+1}]$ with
$$\lim_{p \to -\infty} a_p = -\infty \quad \text{and} \quad \lim_{p \to +\infty} a_p = +\infty.$$
The size $l_p = a_{p+1} - a_p$ of each interval is arbitrary. Let $g = \mathbf{1}_{[0,1]}$. An interval is covered by the dilated rectangular window
$$g_p(t) = \mathbf{1}_{[a_p, a_{p+1}]}(t) = g\!\left( \frac{t - a_p}{l_p} \right). \tag{8.48}$$
The following theorem constructs a block orthogonal basis of $L^2(\mathbb{R})$
from a single orthonormal basis of $L^2[0,1]$.

Theorem 8.3 If $\{e_k\}_{k \in \mathbb{Z}}$ is an orthonormal basis of $L^2[0,1]$ then
$$\left\{ g_{p,k}(t) = g_p(t)\, \frac{1}{\sqrt{l_p}}\, e_k\!\left( \frac{t - a_p}{l_p} \right) \right\}_{(p,k) \in \mathbb{Z}^2} \tag{8.49}$$
is a block orthonormal basis of $L^2(\mathbb{R})$.

Proof 1. One can verify that the dilated and translated family
$$\left\{ \frac{1}{\sqrt{l_p}}\, e_k\!\left( \frac{t - a_p}{l_p} \right) \right\}_{k \in \mathbb{Z}} \tag{8.50}$$
is an orthonormal basis of $L^2[a_p, a_{p+1}]$. If $p \ne q$ then $\langle g_{p,k}, g_{q,k'} \rangle = 0$ since their supports do not overlap. The family (8.49) is thus orthonormal. To expand a signal $f$ in this family, it is decomposed as a sum of separate blocks
$$f(t) = \sum_{p=-\infty}^{+\infty} f(t)\, g_p(t)$$
and each block $f(t)\, g_p(t)$ is decomposed in the basis (8.50).

Block Fourier Basis A block basis is constructed with the Fourier basis of $L^2[0,1]$:
$$\left\{ e_k(t) = \exp(i 2k\pi t) \right\}_{k \in \mathbb{Z}}.$$
The time support of each block Fourier vector $g_{p,k}$ is $[a_p, a_{p+1}]$, of size $l_p$. The Fourier transform of $g = \mathbf{1}_{[0,1]}$ is
$$\hat g(\omega) = \frac{\sin(\omega/2)}{\omega/2}\, \exp\!\left( \frac{i\omega}{2} \right)$$
and
$$\hat g_{p,k}(\omega) = \sqrt{l_p}\, \hat g(l_p \omega - 2k\pi)\, \exp\!\left( \frac{-i 2\pi k a_p}{l_p} \right).$$
It is centered at $2k\pi\, l_p^{-1}$ and has a slow asymptotic decay proportional to $l_p^{-1} |\omega|^{-1}$. Because of this bad frequency localization, even though a signal $f$ is smooth, its decomposition in a block Fourier basis may include large high frequency coefficients. This can also be interpreted as an effect of periodization.
Discrete Block Bases For all $p \in \mathbb{Z}$, we suppose that $a_p \in \mathbb{Z}$. Discrete block bases are built with discrete rectangular windows whose supports are $[a_p, a_{p+1} - 1]$:
$$g_p[n] = \mathbf{1}_{[a_p, a_{p+1}-1]}(n).$$
Since dilations are not defined in a discrete framework, we generally cannot derive bases of intervals of varying sizes from a single basis. The following theorem thus supposes that we can construct an orthonormal basis of $\mathbb{C}^l$ for any $l > 0$. The proof is straightforward.

Theorem 8.4 Suppose that $\{e_{k,l}\}_{0 \le k < l}$ is an orthonormal basis of $\mathbb{C}^l$, for any $l > 0$. The family
$$\left\{ g_{p,k}[n] = g_p[n]\, e_{k,l_p}[n - a_p] \right\}_{0 \le k < l_p,\ p \in \mathbb{Z}} \tag{8.51}$$
is a block orthonormal basis of $\ell^2(\mathbb{Z})$.

A discrete block basis is constructed with discrete Fourier bases
$$\left\{ e_{k,l}[n] = \frac{1}{\sqrt{l}} \exp\!\left( \frac{i 2\pi k n}{l} \right) \right\}_{0 \le k < l}.$$
The resulting block Fourier vectors $g_{p,k}$ have sharp transitions at the window border, and are thus not well localized in frequency. As in the continuous case, the decomposition of smooth signals $f$ may produce large amplitude high frequency coefficients because of border effects.

Block Bases of Images General block bases of images are constructed by partitioning the plane $\mathbb{R}^2$ into rectangles $\{[a_p, b_p] \times [c_p, d_p]\}_{p \in \mathbb{Z}}$ of arbitrary length $l_p = b_p - a_p$ and width $w_p = d_p - c_p$. Let $\{e_k\}_{k \in \mathbb{Z}}$ be an orthonormal basis of $L^2[0,1]$ and $g = \mathbf{1}_{[0,1]}$. We denote
$$g_{p,k,j}(x, y) = g\!\left( \frac{x - a_p}{l_p} \right) g\!\left( \frac{y - c_p}{w_p} \right) \frac{1}{\sqrt{l_p w_p}}\, e_k\!\left( \frac{x - a_p}{l_p} \right) e_j\!\left( \frac{y - c_p}{w_p} \right).$$
The family $\{g_{p,k,j}\}_{(k,j) \in \mathbb{Z}^2}$ is an orthonormal basis of $L^2([a_p, b_p] \times [c_p, d_p])$, and hence $\{g_{p,k,j}\}_{(p,k,j) \in \mathbb{Z}^3}$ is an orthonormal basis of $L^2(\mathbb{R}^2)$.

For discrete images, we define discrete windows that cover each rectangle
$$g_p = \mathbf{1}_{[a_p, b_p - 1] \times [c_p, d_p - 1]}.$$
If $\{e_{k,l}\}_{0 \le k < l}$ is an orthogonal basis of $\mathbb{C}^l$ for any $l > 0$, then
$$\left\{ g_{p,k,j}[n_1, n_2] = g_p[n_1, n_2]\, e_{k,l_p}[n_1 - a_p]\, e_{j,w_p}[n_2 - c_p] \right\}_{(k,j,p) \in \mathbb{Z}^3}$$
is a block basis of $\ell^2(\mathbb{Z}^2)$.

8.3.2 Cosine Bases

If $f \in L^2[0,1]$ and $f(0) \ne f(1)$, even though $f$ might be a smooth
function, the Fourier coefficients
$$\langle f(u),\ e^{i 2k\pi u} \rangle = \int_0^1 f(u)\, e^{-i 2k\pi u}\, du$$
have a relatively large amplitude at high frequencies $2k\pi$. Indeed, the Fourier series expansion
$$f(t) = \sum_{k=-\infty}^{+\infty} \langle f(u),\ e^{i 2k\pi u} \rangle\, e^{i 2k\pi t}$$
is a function of period 1, equal to $f$ over $[0,1]$, and which is therefore discontinuous if $f(0) \ne f(1)$. This shows that the restriction of a smooth function to an interval generates large Fourier coefficients. As a consequence, block Fourier bases are rarely used. A cosine I basis reduces this border effect by restoring a periodic extension $\tilde f$ of $f$ which is continuous if $f$ is continuous. High frequency cosine I coefficients thus have a smaller amplitude than Fourier coefficients.
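The difference in decay can be checked numerically on a simple case. The sketch below is an illustration under stated assumptions, not part of the original text: it takes $f(t) = t$ on $[0,1]$ (smooth, with $f(0) \ne f(1)$), approximates the inner products with a midpoint quadrature, and compares the Fourier coefficients with the coefficients in the cosine I family $\lambda_k \sqrt{2} \cos(\pi k t)$. For this particular $f$, integration by parts gives magnitudes $1/(2\pi k)$ for the Fourier coefficients and $2\sqrt{2}/(\pi k)^2$ for the cosine I coefficients of odd index $k$.

```python
import numpy as np

M = 20000                        # number of quadrature nodes (assumption)
t = (np.arange(M) + 0.5) / M     # midpoint nodes on [0, 1]
f = t                            # smooth signal with f(0) != f(1)

def fourier_coeff(k):
    """Approximate <f(u), exp(i 2 k pi u)> on [0, 1]."""
    return np.mean(f * np.exp(-2j * np.pi * k * t))

def cosine1_coeff(k):
    """Approximate <f(u), lambda_k sqrt(2) cos(pi k u)> on [0, 1]."""
    lam = 2 ** -0.5 if k == 0 else 1.0
    return np.mean(f * lam * np.sqrt(2.0) * np.cos(np.pi * k * t))

ks = [1, 5, 25, 125]             # odd indices, where the cosine I coefficients are non-zero
fourier_mag = [abs(fourier_coeff(k)) for k in ks]
cosine_mag = [abs(cosine1_coeff(k)) for k in ks]

for k, a, b in zip(ks, fourier_mag, cosine_mag):
    print(f"k={k:4d}  |Fourier|={a:.3e}  |cosine I|={b:.3e}")
```

Multiplying $k$ by 5 divides the Fourier magnitude by about 5 and the cosine I magnitude by about 25, which reflects the continuity of the symmetric extension.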
Figure 8.14: The function $\tilde f(t)$ is an extension of $f(t)$; it is symmetric about 0 and of period 2.

Cosine I Basis We define $\tilde f$ to be the function of period 2 that is symmetric about 0 and equal to $f$ over $[0,1]$:
$$\tilde f(t) = \begin{cases} f(t) & \text{for } t \in [0,1] \\ f(-t) & \text{for } t \in (-1,0) \end{cases} \tag{8.52}$$
If $f$ is continuous over $[0,1]$ then $\tilde f$ is continuous over $\mathbb{R}$, as shown by Figure 8.14. However, if $f$ has a non-zero right derivative at 0 or left derivative at 1, then $\tilde f$ is non-differentiable at integer points.

The Fourier expansion of $\tilde f$ over $[0,2]$ can be written as a sum of sine and cosine terms:
$$\tilde f(t) = \sum_{k=0}^{+\infty} a[k] \cos\!\left( \frac{2\pi k t}{2} \right) + \sum_{k=1}^{+\infty} b[k] \sin\!\left( \frac{2\pi k t}{2} \right).$$
The sine coefficients $b[k]$ are zero because $\tilde f$ is even. Since $f(t) = \tilde f(t)$ over $[0,1]$, this proves that any $f \in L^2[0,1]$ can be written as a linear combination of the cosines $\{\cos(k\pi t)\}_{k \in \mathbb{N}}$. One can verify that this family is orthogonal over $[0,1]$. It is therefore an orthogonal basis of $L^2[0,1]$, as stated by the following theorem.
Theorem 8.5 (Cosine I) The family
$$\left\{ \lambda_k \sqrt{2} \cos(\pi k t) \right\}_{k \in \mathbb{N}} \quad \text{with} \quad \lambda_k = \begin{cases} 2^{-1/2} & \text{if } k = 0 \\ 1 & \text{if } k \ne 0 \end{cases}$$
is an orthonormal basis of $L^2[0,1]$.

Block Cosine Basis Let us divide the real line with square windows $g_p = \mathbf{1}_{[a_p, a_{p+1}]}$. Theorem 8.3 proves that
$$\left\{ g_{p,k}(t) = g_p(t)\, \sqrt{\frac{2}{l_p}}\, \lambda_k \cos\!\left( \pi k\, \frac{t - a_p}{l_p} \right) \right\}_{k \in \mathbb{N},\ p \in \mathbb{Z}}$$
is a block basis of $L^2(\mathbb{R})$. The decomposition coefficients of a smooth function have a faster decay at high frequencies in a block cosine basis than in a block Fourier basis, because cosine bases correspond to a smoother signal extension beyond the intervals $[a_p, a_{p+1}]$.

Cosine IV Basis Other cosine bases are constructed from Fourier series, with different extensions of $f$ beyond $[0,1]$. The cosine IV basis appears in fast numerical computations of cosine I coefficients. It is also used to construct local cosine bases with smooth windows in Section 8.4.2.
Any $f \in L^2[0,1]$ is extended into a function $\tilde f$ of period 4, which is symmetric about 0 and antisymmetric about 1 and $-1$:
$$\tilde f(t) = \begin{cases} f(t) & \text{if } t \in [0,1] \\ f(-t) & \text{if } t \in (-1,0) \\ -f(2 - t) & \text{if } t \in (1,2) \\ -f(2 + t) & \text{if } t \in [-2,-1) \end{cases}$$
If $f(1) \ne 0$, the antisymmetry at 1 creates a function $\tilde f$ that is discontinuous at $t = 2n + 1$ for any $n \in \mathbb{Z}$, as shown by Figure 8.15. This extension is therefore less regular than the cosine I extension (8.52).

Since $\tilde f$ is 4 periodic, it can be decomposed as a sum of sines and cosines of period 4:
$$\tilde f(t) = \sum_{k=0}^{+\infty} a[k] \cos\!\left( \frac{2\pi k t}{4} \right) + \sum_{k=1}^{+\infty} b[k] \sin\!\left( \frac{2\pi k t}{4} \right).$$
The symmetry about 0 implies that
$$b[k] = \frac{1}{2} \int_{-2}^{2} \tilde f(t) \sin\!\left( \frac{2\pi k t}{4} \right) dt = 0.$$
For even frequencies, the antisymmetry about 1 and $-1$ yields
$$a[2k] = \frac{1}{2} \int_{-2}^{2} \tilde f(t) \cos\!\left( \frac{2\pi (2k) t}{4} \right) dt = 0.$$
The only non-zero components are thus cosines of odd frequencies:
$$\tilde f(t) = \sum_{k=0}^{+\infty} a[2k+1] \cos\!\left( \frac{(2k+1)\, 2\pi t}{4} \right). \tag{8.53}$$
Since $f(t) = \tilde f(t)$ over $[0,1]$, this proves that any $f \in L^2[0,1]$ is decomposed as a sum of such cosine functions. One can verify that the restriction of these cosine functions to $[0,1]$ is orthogonal in $L^2[0,1]$, which implies the following theorem.
Figure 8.15: A cosine IV extends $f(t)$ into a signal $\tilde f(t)$ of period 4 which is symmetric with respect to 0 and antisymmetric with respect to 1.

Theorem 8.6 (Cosine IV) The family
$$\left\{ \sqrt{2} \cos\!\left[ \left( k + \frac{1}{2} \right) \pi t \right] \right\}_{k \in \mathbb{N}}$$
is an orthonormal basis of $L^2[0,1]$.

The cosine transform IV is not used in block transforms because it has the same drawbacks as a block Fourier basis. Block Cosine IV coefficients of a smooth $f$ have a slow decay at high frequencies because such a decomposition corresponds to a discontinuous extension of $f$ beyond each block. Section 8.4.2 explains how to avoid this issue with smooth windows.

8.3.3 Discrete Cosine Bases

Discrete cosine bases are derived from the discrete Fourier basis with the same approach as in the continuous time case. To simplify notations, the sampling distance is normalized to 1. If the sampling distance was originally $N^{-1}$ then the frequency indexes that appear in this section must be multiplied by $N$.

Discrete Cosine I A signal $f[n]$ defined for $0 \le n < N$ is extended by symmetry with respect to $-1/2$ into a signal $\tilde f[n]$ of size $2N$:
$$\tilde f[n] = \begin{cases} f[n] & \text{for } 0 \le n < N \\ f[-n-1] & \text{for } -N \le n \le -1 \end{cases} \tag{8.54}$$
The $2N$ discrete Fourier transform of $\tilde f$ can be written as a sum of sine and cosine terms:
$$\tilde f[n] = \sum_{k=0}^{N-1} a[k] \cos\!\left[ \frac{k\pi}{N} \left( n + \frac{1}{2} \right) \right] + \sum_{k=0}^{N-1} b[k] \sin\!\left[ \frac{k\pi}{N} \left( n + \frac{1}{2} \right) \right].$$
Since $\tilde f$ is symmetric about $-1/2$, necessarily $b[k] = 0$ for $0 \le k < N$. Moreover $f[n] = \tilde f[n]$ for $0 \le n < N$, so any signal $f \in \mathbb{C}^N$ can be written as a sum of these cosine functions. The reader can also verify that these discrete cosine signals are orthogonal in $\mathbb{C}^N$. We thus obtain the following theorem.
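Theorem 8.7 (Cosine I) The family
$$\left\{ \lambda_k \sqrt{\frac{2}{N}} \cos\!\left[ \frac{k\pi}{N} \left( n + \frac{1}{2} \right) \right] \right\}_{0 \le k < N} \quad \text{with} \quad \lambda_k = \begin{cases} 2^{-1/2} & \text{if } k = 0 \\ 1 & \text{otherwise} \end{cases}$$
is an orthonormal basis of $\mathbb{C}^N$.

This theorem proves that any $f \in \mathbb{C}^N$ can be decomposed into
$$f[n] = \frac{2}{N} \sum_{k=0}^{N-1} \hat f_I[k]\, \lambda_k \cos\!\left[ \frac{k\pi}{N} \left( n + \frac{1}{2} \right) \right] \tag{8.55}$$
where
$$\hat f_I[k] = \left\langle f[n],\ \lambda_k \cos\!\left[ \frac{k\pi}{N} \left( n + \frac{1}{2} \right) \right] \right\rangle = \lambda_k \sum_{n=0}^{N-1} f[n] \cos\!\left[ \frac{k\pi}{N} \left( n + \frac{1}{2} \right) \right] \tag{8.56}$$
is the discrete cosine transform I (DCT-I) of $f$. The next section describes a fast discrete cosine transform which computes $\hat f_I$ with $O(N \log_2 N)$ operations.

Discrete Block Cosine Transform Let us divide the integer set $\mathbb{Z}$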
with discrete windows $g_p[n] = \mathbf{1}_{[a_p, a_{p+1}-1]}(n)$, with $a_p \in \mathbb{Z}$. Theorem 8.4 proves that the corresponding block basis
$$\left\{ g_{p,k}[n] = g_p[n]\, \lambda_k \sqrt{\frac{2}{l_p}} \cos\!\left[ \frac{k\pi}{l_p} \left( n + \frac{1}{2} - a_p \right) \right] \right\}_{0 \le k < l_p,\ p \in \mathbb{Z}}$$
is an orthonormal basis of $\ell^2(\mathbb{Z})$. Over each block of size $l_p = a_{p+1} - a_p$, the fast DCT-I algorithm computes all coefficients with $O(l_p \log_2 l_p)$ operations. Section 11.4.3 describes the JPEG image compression standard, which decomposes images in a separable block cosine basis. A block cosine basis is used as opposed to a block Fourier basis because it yields smaller amplitude high frequency coefficients, which improves the coding performance.

Discrete Cosine IV To construct a discrete cosine IV basis, a signal $f$ of $N$ samples is extended into a signal $\tilde f$ of period $4N$, which is symmetric with respect to $-1/2$ and antisymmetric with respect to $N - 1/2$ and $-N - 1/2$. As in (8.53), the decomposition of $\tilde f$ over a family of sines and cosines of period $4N$ has no sine terms and no cosine terms of even frequency. Since $\tilde f[n] = f[n]$ for $0 \le n < N$, we derive that $f$ can also be written as a linear expansion of these odd frequency cosines, which are orthogonal in $\mathbb{C}^N$. We thus obtain the
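following theorem.

Theorem 8.8 (Cosine IV) The family
$$\left\{ \sqrt{\frac{2}{N}} \cos\!\left[ \frac{\pi}{N} \left( k + \frac{1}{2} \right) \left( n + \frac{1}{2} \right) \right] \right\}_{0 \le k < N}$$
is an orthonormal basis of $\mathbb{C}^N$.

This theorem proves that any $f \in \mathbb{C}^N$ can be decomposed into
$$f[n] = \frac{2}{N} \sum_{k=0}^{N-1} \hat f_{IV}[k] \cos\!\left[ \frac{\pi}{N} \left( k + \frac{1}{2} \right) \left( n + \frac{1}{2} \right) \right] \tag{8.57}$$
where
$$\hat f_{IV}[k] = \sum_{n=0}^{N-1} f[n] \cos\!\left[ \frac{\pi}{N} \left( k + \frac{1}{2} \right) \left( n + \frac{1}{2} \right) \right] \tag{8.58}$$
is the discrete cosine transform IV (DCT-IV) of $f$.

8.3.4 Fast Discrete Cosine Transforms 2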
The discrete cosine transform IV (DCT-IV) of a signal of size $N$ is related to the discrete Fourier transform (DFT) of a complex signal of size $N/2$ with a formula introduced by Duhamel, Mahieux, and Petit [176, 42]. By computing this DFT with the fast Fourier transform (FFT) described in Section 3.3.3, we need $O(N \log_2 N)$ operations to compute the DCT-IV. The DCT-I coefficients are then calculated through an induction relation with the DCT-IV, due to Wang [346].

Fast DCT-IV To clarify the relation between a DCT-IV and a DFT, we split $f[n]$ in two half-size signals of odd and even indices:
$$b[n] = f[2n], \qquad c[n] = f[N - 1 - 2n].$$
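The DCT-IV (8.58) is rewritten
$$\hat f_{IV}[k] = \sum_{n=0}^{N/2-1} b[n] \cos\!\left[ \left( 2n + \frac{1}{2} \right) \left( k + \frac{1}{2} \right) \frac{\pi}{N} \right] + \sum_{n=0}^{N/2-1} c[n] \cos\!\left[ \left( N - 1 - 2n + \frac{1}{2} \right) \left( k + \frac{1}{2} \right) \frac{\pi}{N} \right]$$
$$= \sum_{n=0}^{N/2-1} b[n] \cos\!\left[ \left( n + \frac{1}{4} \right) \left( k + \frac{1}{2} \right) \frac{2\pi}{N} \right] + (-1)^k \sum_{n=0}^{N/2-1} c[n] \sin\!\left[ \left( n + \frac{1}{4} \right) \left( k + \frac{1}{2} \right) \frac{2\pi}{N} \right].$$
The even frequency indices can thus be expressed as a real part
$$\hat f_{IV}[2k] = \mathrm{Real}\left\{ \exp\!\left[ \frac{-i\pi k}{N} \right] \sum_{n=0}^{N/2-1} (b[n] + i\, c[n]) \exp\!\left[ -i \left( n + \frac{1}{4} \right) \frac{\pi}{N} \right] \exp\!\left[ \frac{-i 2\pi k n}{N/2} \right] \right\} \tag{8.59}$$
whereas the odd coefficients correspond to an imaginary part
$$\hat f_{IV}[N - 2k - 1] = -\mathrm{Im}\left\{ \exp\!\left[ \frac{-i\pi k}{N} \right] \sum_{n=0}^{N/2-1} (b[n] + i\, c[n]) \exp\!\left[ -i \left( n + \frac{1}{4} \right) \frac{\pi}{N} \right] \exp\!\left[ \frac{-i 2\pi k n}{N/2} \right] \right\}. \tag{8.60}$$
For $0 \le n < N/2$, we denote
$$g[n] = (b[n] + i\, c[n]) \exp\!\left[ -i \left( n + \frac{1}{4} \right) \frac{\pi}{N} \right].$$
The DFT $\hat g[k]$ of $g[n]$ is computed with an FFT of size $N/2$. Equations (8.59) and (8.60) prove that
$$\hat f_{IV}[2k] = \mathrm{Real}\left\{ \exp\!\left[ \frac{-i\pi k}{N} \right] \hat g[k] \right\}$$
and
$$\hat f_{IV}[N - 2k - 1] = -\mathrm{Im}\left\{ \exp\!\left[ \frac{-i\pi k}{N} \right] \hat g[k] \right\}.$$
The DCT-IV coefficients $\hat f_{IV}[k]$ are thus obtained with one FFT of size $N/2$ plus $O(N)$ operations, which makes a total of $O(N \log_2 N)$ operations. To normalize the DCT-IV, the resulting coefficients must be multiplied by $\sqrt{2/N}$. An efficient implementation of the DCT-IV with a split-radix FFT requires [42]
$$\mu_{DCT-IV}(N) = \frac{N}{2} \log_2 N + N \tag{8.61}$$
real multiplications and
$$\alpha_{DCT-IV}(N) = \frac{3N}{2} \log_2 N \tag{8.62}$$
additions.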
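The relations (8.59) and (8.60) can be tried out directly. The sketch below is a minimal illustration, assuming NumPy's `numpy.fft.fft` as the FFT (a generic complex FFT, not the split-radix implementation counted in (8.61) and (8.62)) and an even signal length. The function names `dct4_direct` and `dct4_fast` are hypothetical.

```python
import numpy as np

def dct4_direct(f):
    """Unnormalized DCT-IV of (8.58), computed with O(N^2) operations."""
    f = np.asarray(f, dtype=float)
    N = len(f)
    n = np.arange(N)
    return np.cos(np.pi / N * np.outer(n + 0.5, n + 0.5)) @ f

def dct4_fast(f):
    """Unnormalized DCT-IV from one complex FFT of size N/2 (N must be even)."""
    f = np.asarray(f, dtype=float)
    N = len(f)
    n = np.arange(N // 2)
    b = f[0::2]                      # b[n] = f[2n]
    c = f[::-1][0::2]                # c[n] = f[N - 1 - 2n]
    g = (b + 1j * c) * np.exp(-1j * np.pi * (n + 0.25) / N)
    z = np.exp(-1j * np.pi * n / N) * np.fft.fft(g)
    out = np.empty(N)
    out[2 * n] = z.real              # even indices, equation (8.59)
    out[N - 1 - 2 * n] = -z.imag     # odd indices, equation (8.60)
    return out

x = np.random.default_rng(0).standard_normal(16)
coeffs = dct4_fast(x)
# Inverse DCT-IV (8.57): the same transform, scaled by 2/N.
x_back = (2.0 / len(x)) * dct4_fast(coeffs)
```

Here `coeffs` agrees with `dct4_direct(x)` and `x_back` recovers `x`, both up to rounding errors.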
The inverse DCT-IV of $\hat f_{IV}$ is given by (8.57). Up to the proportionality constant $2/N$, this sum is the same as (8.58), where $\hat f_{IV}$ and $f$ are interchanged. This proves that the inverse DCT-IV is computed with the same fast algorithm as the forward DCT-IV.

Fast DCT-I A DCT-I is calculated with an induction relation that involves the DCT-IV. Regrouping the terms $f[n]$ and $f[N - 1 - n]$ of a DCT-I (8.56) yields
$$\hat f_I[2k] = \lambda_k \sum_{n=0}^{N/2-1} (f[n] + f[N - 1 - n]) \cos\!\left[ \frac{\pi k}{N/2} \left( n + \frac{1}{2} \right) \right] \tag{8.63}$$
$$\hat f_I[2k+1] = \sum_{n=0}^{N/2-1} (f[n] - f[N - 1 - n]) \cos\!\left[ \frac{\pi (k + 1/2)}{N/2} \left( n + \frac{1}{2} \right) \right]. \tag{8.64}$$
The even index coefficients of the DCT-I are thus equal to the DCT-I of the signal $f[n] + f[N - 1 - n]$ of length $N/2$. The odd coefficients are equal to the DCT-IV of the signal $f[n] - f[N - 1 - n]$ of length $N/2$. The number of multiplications of a DCT-I is thus related to the number of multiplications of a DCT-IV by the induction relation
$$\mu_{DCT-I}(N) = \mu_{DCT-I}(N/2) + \mu_{DCT-IV}(N/2) \tag{8.65}$$
while the number of additions is
$$\alpha_{DCT-I}(N) = \alpha_{DCT-I}(N/2) + \alpha_{DCT-IV}(N/2) + N. \tag{8.66}$$
Since the number of multiplications and additions of a DCT-IV is $O(N \log_2 N)$, this induction relation proves that the number of multiplications and additions of this algorithm is also $O(N \log_2 N)$.

If the DCT-IV is implemented with a split-radix FFT, inserting (8.61) and (8.62) in the recurrence equations (8.65) and (8.66), we derive that the number of multiplications and additions to compute a DCT-I of size $N$ is
$$\mu_{DCT-I}(N) = \frac{N}{2} \log_2 N + 1 \tag{8.67}$$
and
$$\alpha_{DCT-I}(N) = \frac{3N}{2} \log_2 N - N + 1. \tag{8.68}$$
The inverse DCT-I is computed with a similar recursive algorithm. Applied to $\hat f_I$, it is obtained by computing the inverse DCT-IV of the odd index coefficients $\hat f_I[2k+1]$ with (8.64) and an inverse DCT-I of size $N/2$ applied to the even coefficients $\hat f_I[2k]$ with (8.63). From the values $f[n] + f[N - 1 - n]$ and $f[n] - f[N - 1 - n]$, we recover $f[n]$ and $f[N - 1 - n]$. The inverse DCT-IV is identical to the forward DCT-IV up to a multiplicative constant. The inverse DCT-I thus requires the same number of operations as the forward DCT-I.
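The induction (8.63) and (8.64) translates into a short recursion. The sketch below is illustrative only: it assumes a signal length that is a power of 2 and, to stay short, evaluates the half-size DCT-IV directly from its definition (8.58) rather than with an FFT. The DCT-I of this section corresponds to what many numerical libraries call a type-II DCT, up to scaling. The operation counters assume base values $\mu_{DCT-I}(1) = 1$ and $\alpha_{DCT-I}(1) = 0$, which are the values given by the closed forms (8.67) and (8.68) at $N = 1$. All function names are hypothetical.

```python
import numpy as np

def dct4_direct(f):
    """Unnormalized DCT-IV of (8.58)."""
    N = len(f)
    n = np.arange(N)
    return np.cos(np.pi / N * np.outer(n + 0.5, n + 0.5)) @ f

def dct1_direct(f):
    """DCT-I of (8.56), computed with O(N^2) operations."""
    N = len(f)
    n = np.arange(N)
    lam = np.ones(N)
    lam[0] = 2 ** -0.5
    return lam * (np.cos(np.pi * n[:, None] * (n + 0.5) / N) @ f)

def dct1_recursive(f):
    """DCT-I through (8.63) and (8.64); len(f) must be a power of 2."""
    f = np.asarray(f, dtype=float)
    N = len(f)
    if N == 1:
        return f * 2 ** -0.5             # only k = 0: lambda_0 f[0]
    half = N // 2
    s = f[:half] + f[::-1][:half]        # f[n] + f[N - 1 - n]
    d = f[:half] - f[::-1][:half]        # f[n] - f[N - 1 - n]
    out = np.empty(N)
    out[0::2] = dct1_recursive(s)        # even indices: half-size DCT-I
    out[1::2] = dct4_direct(d)           # odd indices: half-size DCT-IV
    return out

def log2_int(N):
    return N.bit_length() - 1            # exact for powers of 2

def mul_count(N):
    """Recurrence (8.65) with mu_DCT-IV from (8.61)."""
    if N == 1:
        return 1
    M = N // 2
    return mul_count(M) + (M * log2_int(M)) // 2 + M

def add_count(N):
    """Recurrence (8.66) with alpha_DCT-IV from (8.62)."""
    if N == 1:
        return 0
    M = N // 2
    return add_count(M) + (3 * M * log2_int(M)) // 2 + N

x = np.random.default_rng(0).standard_normal(16)
coeffs = dct1_recursive(x)
```

Replacing `dct4_direct` in the recursion with an FFT-based DCT-IV gives the $O(N \log_2 N)$ algorithm described above.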
8.4 Lapped Orthogonal Transforms 2
Cosine and Fourier block bases are computed with discontinuous rectangular windows that divide the real line in disjoint intervals. Multiplying a signal with a rectangular window creates discontinuities that produce large amplitude coefficients at high frequencies. To avoid these discontinuity artifacts, it is necessary to use smooth windows.

The Balian-Low Theorem 5.6 proves that for any $u_0$ and $\xi_0$, there exists no differentiable window $g$ of compact support such that
$$\left\{ g(t - n u_0) \exp(i k \xi_0 t) \right\}_{(n,k) \in \mathbb{Z}^2}$$
is an orthonormal basis of $L^2(\mathbb{R})$. This negative result discouraged any research in this direction, until Malvar discovered in discrete signal processing that one could create orthogonal bases with smooth windows modulated by a cosine IV basis [262, 263]. This result was independently rediscovered for continuous time functions by Coifman and Meyer [138], with a different approach that we shall follow here. The roots of these new orthogonal bases are lapped projectors, which split signals in orthogonal components with overlapping supports [46]. Section 8.4.1 introduces these lapped projectors; the construction of continuous time and discrete lapped orthogonal bases is explained in the following sections. The particular case of local cosine bases is studied in more detail.

8.4.1 Lapped Projectors

Block transforms compute the restriction of $f$ to consecutive intervals $[a_p, a_{p+1}]$ and decompose this restriction in an orthogonal basis of $[a_p, a_{p+1}]$. Formally, the restriction of $f$ to $[a_p, a_{p+1}]$ is an orthogonal projection on the space $W_p$ of functions with a support included in $[a_p, a_{p+1}]$. To avoid the discontinuities introduced by this projection, we introduce new orthogonal projectors that perform a smooth deformation of $f$.

Projectors on Half Lines Let us first construct two orthogonal projectors that decompose any $f \in L^2(\mathbb{R})$ in two orthogonal components
$P^+ f$ and $P^- f$ whose supports are respectively $[-1, +\infty)$ and $(-\infty, 1]$. For this purpose we consider a monotone increasing profile function $\beta$ such that
$$\beta(t) = \begin{cases} 0 & \text{if } t < -1 \\ 1 & \text{if } t > 1 \end{cases} \tag{8.69}$$
and
$$\forall t \in [-1, 1], \quad \beta^2(t) + \beta^2(-t) = 1. \tag{8.70}$$
A naive definition
$$P^+ f(t) = \beta^2(t)\, f(t) \quad \text{and} \quad P^- f(t) = \beta^2(-t)\, f(t)$$
satisfies the support conditions but does not define orthogonal functions. Since the supports of $P^+ f(t)$ and $P^- f(t)$ overlap only on $[-1, 1]$, the orthogonality is obtained by creating functions having a different symmetry with respect to 0 on $[-1, 1]$:
$$P^+ f(t) = \beta(t)\, [\beta(t)\, f(t) + \beta(-t)\, f(-t)] = \beta(t)\, p(t) \tag{8.71}$$
and
$$P^- f(t) = \beta(-t)\, [\beta(-t)\, f(t) - \beta(t)\, f(-t)] = \beta(-t)\, q(t). \tag{8.72}$$
The functions $p(t)$ and $q(t)$ are respectively even and odd, and since $\beta(t)\, \beta(-t)$ is even it follows that
$$\langle P^+ f, P^- f \rangle = \int_{-1}^{1} \beta(t)\, \beta(-t)\, p(t)\, q^*(t)\, dt = 0. \tag{8.73}$$
Clearly $P^+ f$ belongs to the space $W^+$ of functions $f \in L^2(\mathbb{R})$ such that there exists $p(t) = p(-t)$ with
$$f(t) = \begin{cases} 0 & \text{if } t < -1 \\ \beta(t)\, p(t) & \text{if } t \in [-1, 1] \end{cases}$$
Similarly $P^- f$ is in the space $W^-$ composed of $f \in L^2(\mathbb{R})$ such that there exists $q(t) = -q(-t)$ with
$$f(t) = \begin{cases} 0 & \text{if } t > 1 \\ \beta(-t)\, q(t) & \text{if } t \in [-1, 1] \end{cases}$$
Functions in $W^+$ and $W^-$ may have an arbitrary behavior on $[1, +\infty)$ and $(-\infty, -1]$ respectively. The following theorem characterizes $P^+$ and $P^-$. We denote by $Id$ the identity operator.

Theorem 8.9 (Coifman, Meyer) The operators $P^+$ and $P^-$ are orthogonal projectors respectively on $W^+$ and $W^-$. The spaces $W^+$ and $W^-$ are orthogonal and
$$P^+ + P^- = Id. \tag{8.74}$$

Proof 2. To verify that $P^+$ is a projector we show that any $f \in W^+$ satisfies $P^+ f = f$. If $t < -1$ then $P^+ f(t) = f(t) = 0$ and if $t > 1$ then $P^+ f(t) = f(t)$ because $\beta(t) = 1$ and $\beta(-t) = 0$. If $t \in [-1, 1]$ then $f(t) = \beta(t)\, p_0(t)$ and inserting (8.71) yields
$$P^+ f(t) = \beta(t)\, [\beta^2(t)\, p_0(t) + \beta^2(-t)\, p_0(-t)] = \beta(t)\, p_0(t)$$
because $p_0(t)$ is even and $\beta(t)$ satisfies (8.70). The projector $P^+$ is proved to be orthogonal by showing that it is self-adjoint:
$$\langle P^+ f, g \rangle = \int_{-1}^{1} \beta^2(t)\, f(t)\, g^*(t)\, dt + \int_{-1}^{1} \beta(t)\, \beta(-t)\, f(-t)\, g^*(t)\, dt + \int_{1}^{+\infty} f(t)\, g^*(t)\, dt.$$
A change of variable $t' = -t$ in the second integral verifies that this formula is symmetric in $f$ and $g$ and hence $\langle P^+ f, g \rangle = \langle f, P^+ g \rangle$. Identical derivations prove that $P^-$ is an orthogonal projector on $W^-$.

The orthogonality of $W^-$ and $W^+$ is proved in (8.73). To verify (8.74), for $f \in L^2(\mathbb{R})$ we compute
$$P^+ f(t) + P^- f(t) = f(t)\, [\beta^2(t) + \beta^2(-t)] = f(t).$$
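The properties stated in Theorem 8.9 can be observed on sampled signals. The following sketch is a numerical illustration under assumptions of its own: the profile $\beta(t) = \sin\!\left( \frac{\pi}{4} (1 + t) \right)$ on $[-1, 1]$ is one example satisfying (8.69) and (8.70), and the signals are sampled on a grid that is symmetric about 0, so that $f(-t)$ is obtained by reversing the sample order. The names `beta`, `P_plus` and `P_minus` are hypothetical.

```python
import numpy as np

def beta(t):
    """Example profile: 0 for t < -1, 1 for t > 1, and
    beta(t)^2 + beta(-t)^2 = 1 on [-1, 1]."""
    return np.sin(np.pi / 4 * (1 + np.clip(t, -1.0, 1.0)))

# Grid symmetric about 0: reversing a sampled signal implements f(-t).
t = np.linspace(-3.0, 3.0, 601)

def P_plus(f):
    """Sampled version of (8.71)."""
    return beta(t) * (beta(t) * f + beta(-t) * f[::-1])

def P_minus(f):
    """Sampled version of (8.72)."""
    return beta(-t) * (beta(-t) * f - beta(t) * f[::-1])

rng = np.random.default_rng(1)
f = rng.standard_normal(t.size)
g = rng.standard_normal(t.size)

identity_error = np.max(np.abs(P_plus(f) + P_minus(f) - f))        # (8.74)
idempotence_error = np.max(np.abs(P_plus(P_plus(f)) - P_plus(f)))  # projector
cross_product = np.dot(P_plus(f), P_minus(g))                      # W+ orthogonal to W-
```

The three quantities are zero up to rounding, and `P_plus(f)` vanishes for $t < -1$. The random test signals are far from smooth, which shows that these algebraic properties do not depend on the regularity of $f$.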
Figure 8.16: A multiplication with $\beta\!\left( \frac{t-a}{\eta} \right)$ and $\beta\!\left( \frac{a-t}{\eta} \right)$ restricts the support of functions to $[a - \eta, +\infty)$ and $(-\infty, a + \eta]$.

These half-line projectors are generalized by decomposing signals in two orthogonal components whose supports are respectively $[a - \eta, +\infty)$ and $(-\infty, a + \eta]$. For this purpose, we scale and translate the profile function $\beta\!\left( \frac{t-a}{\eta} \right)$, so that it increases from 0 to 1 on $[a - \eta, a + \eta]$, as illustrated in Figure 8.16. The symmetry with respect to 0, which transforms $f(t)$ in $f(-t)$, becomes a symmetry with respect to $a$, which transforms $f(t)$ in $f(2a - t)$. The resulting projectors are
$$P^+_{a,\eta} f(t) = \beta\!\left( \frac{t-a}{\eta} \right) \left[ \beta\!\left( \frac{t-a}{\eta} \right) f(t) + \beta\!\left( \frac{a-t}{\eta} \right) f(2a - t) \right] \tag{8.75}$$
and
$$P^-_{a,\eta} f(t) = \beta\!\left( \frac{a-t}{\eta} \right) \left[ \beta\!\left( \frac{a-t}{\eta} \right) f(t) - \beta\!\left( \frac{t-a}{\eta} \right) f(2a - t) \right]. \tag{8.76}$$
A straightforward extension of Theorem 8.9 proves that $P^+_{a,\eta}$ is an orthogonal projector on the space $W^+_{a,\eta}$ of functions $f \in L^2(\mathbb{R})$ such that there exists $p(t) = p(2a - t)$ with
$$f(t) = \begin{cases} 0 & \text{if } t < a - \eta \\ \beta(\eta^{-1}(t - a))\, p(t) & \text{if } t \in [a - \eta, a + \eta] \end{cases} \tag{8.77}$$
Similarly $P^-_{a,\eta}$ is an orthogonal projector on the space $W^-_{a,\eta}$ composed of $f \in L^2(\mathbb{R})$ such that there exists $q(t) = -q(2a - t)$ with
$$f(t) = \begin{cases} 0 & \text{if } t > a + \eta \\ \beta(\eta^{-1}(a - t))\, q(t) & \text{if } t \in [a - \eta, a + \eta] \end{cases} \tag{8.78}$$
The spaces $W^+_{a,\eta}$ and $W^-_{a,\eta}$ are orthogonal and
$$P^+_{a,\eta} + P^-_{a,\eta} = Id. \tag{8.79}$$

Projectors on Intervals A lapped projector splits a signal in two orthogonal components that overlap on $[a - \eta, a + \eta]$. Repeating such
projections at different locations performs a signal decomposition into orthogonal pieces whose supports overlap. Let us divide the time axis in overlapping intervals:
$$I_p = [a_p - \eta_p,\ a_{p+1} + \eta_{p+1}]$$
with
$$\lim_{p \to -\infty} a_p = -\infty \quad \text{and} \quad \lim_{p \to +\infty} a_p = +\infty. \tag{8.80}$$
To ensure that $I_{p-1}$ and $I_{p+1}$ do not intersect for any $p \in \mathbb{Z}$, we impose that
$$a_{p+1} - \eta_{p+1} \ge a_p + \eta_p$$
and hence
$$l_p = a_{p+1} - a_p \ge \eta_{p+1} + \eta_p. \tag{8.81}$$
The support of $f$ is restricted to $I_p$ by the operator
$$P_p = P^+_{a_p,\eta_p}\, P^-_{a_{p+1},\eta_{p+1}}. \tag{8.82}$$
Since $P^+_{a_p,\eta_p}$ and $P^-_{a_{p+1},\eta_{p+1}}$ are orthogonal projections on $W^+_{a_p,\eta_p}$ and $W^-_{a_{p+1},\eta_{p+1}}$, it follows that $P_p$ is an orthogonal projector on
$$W_p = W^+_{a_p,\eta_p} \cap W^-_{a_{p+1},\eta_{p+1}}. \tag{8.83}$$
Let us divide $I_p$ in two overlapping intervals $O_p$, $O_{p+1}$ and a central interval $C_p$:
$$I_p = [a_p - \eta_p,\ a_{p+1} + \eta_{p+1}] = O_p \cup C_p \cup O_{p+1} \tag{8.84}$$
with
$$O_p = [a_p - \eta_p,\ a_p + \eta_p] \quad \text{and} \quad C_p = [a_p + \eta_p,\ a_{p+1} - \eta_{p+1}].$$
The space $W_p$ is characterized by introducing a window $g_p$ whose support is $I_p$, and which has a raising profile on $O_p$ and a decaying profile on $O_{p+1}$:
$$g_p(t) = \begin{cases} 0 & \text{if } t \notin I_p \\ \beta(\eta_p^{-1}(t - a_p)) & \text{if } t \in O_p \\ 1 & \text{if } t \in C_p \\ \beta(\eta_{p+1}^{-1}(a_{p+1} - t)) & \text{if } t \in O_{p+1} \end{cases} \tag{8.85}$$
This window is illustrated in Figure 8.17. It follows from (8.77), (8.78)
and (8.83) that $W_p$ is the space of functions $f \in L^2(\mathbb{R})$ that can be written
$$f(t) = g_p(t)\, h(t) \quad \text{with} \quad h(t) = \begin{cases} h(2a_p - t) & \text{if } t \in O_p \\ -h(2a_{p+1} - t) & \text{if } t \in O_{p+1} \end{cases} \tag{8.86}$$

Figure 8.17: Each window $g_p$ has a support $[a_p - \eta_p,\ a_{p+1} + \eta_{p+1}]$ with an increasing profile and a decreasing profile over $[a_p - \eta_p,\ a_p + \eta_p]$ and $[a_{p+1} - \eta_{p+1},\ a_{p+1} + \eta_{p+1}]$.

The function $h$ is symmetric with respect to $a_p$ and antisymmetric with respect to $a_{p+1}$, with an arbitrary behavior in $C_p$. The projector $P_p$ on $W_p$ defined in (8.82) can be rewritten
$$P_p f(t) = \begin{cases} P^+_{a_p,\eta_p} f(t) & \text{if } t \in O_p \\ f(t) & \text{if } t \in C_p \\ P^-_{a_{p+1},\eta_{p+1}} f(t) & \text{if } t \in O_{p+1} \end{cases} \ = g_p(t)\, h_p(t) \tag{8.87}$$
where $h_p(t)$ is calculated by inserting (8.75) and (8.76):
$$h_p(t) = \begin{cases} g_p(t)\, f(t) + g_p(2a_p - t)\, f(2a_p - t) & \text{if } t \in O_p \\ f(t) & \text{if } t \in C_p \\ g_p(t)\, f(t) - g_p(2a_{p+1} - t)\, f(2a_{p+1} - t) & \text{if } t \in O_{p+1} \end{cases} \tag{8.88}$$
The following proposition derives a decomposition of the identity.

Proposition 8.6 The operator $P_p$ is an orthogonal projector on $W_p$.
If p 6= q then Wp is orthogonal to Wq and
+1
X p=;1 Pp = Id: (8.89) 478CHAPTER 8. WAVELET PACKET AND LOCAL COSINE BASES
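Proof $^2$. If $p \neq q$ and $|p - q| > 1$ then functions in $W_p$ and $W_q$ have supports that do not overlap so these spaces are orthogonal. If $q = p + 1$ then
\[ W_p = W^+_{a_p,\eta_p} \cap W^-_{a_{p+1},\eta_{p+1}} \quad\text{and}\quad W_{p+1} = W^+_{a_{p+1},\eta_{p+1}} \cap W^-_{a_{p+2},\eta_{p+2}}. \]
Since $W^-_{a_{p+1},\eta_{p+1}}$ is orthogonal to $W^+_{a_{p+1},\eta_{p+1}}$ it follows that $W_p$ is orthogonal to $W_{p+1}$. To prove (8.89), we first verify that
\[ P_p + P_{p+1} = P^+_{a_p,\eta_p}\, P^-_{a_{p+2},\eta_{p+2}}. \tag{8.90} \]
This is shown by decomposing $P_p$ and $P_{p+1}$ with (8.87) and inserting
\[ P^+_{a_{p+1},\eta_{p+1}} + P^-_{a_{p+1},\eta_{p+1}} = \mathrm{Id}. \]
As a consequence
\[ \sum_{p=n}^{m} P_p = P^+_{a_n,\eta_n}\, P^-_{a_m,\eta_m}. \tag{8.91} \]
For any $f \in L^2(\mathbb{R})$,
\[ \| f - P^+_{a_n,\eta_n} P^-_{a_m,\eta_m} f \|^2 \le \int_{-\infty}^{a_n + \eta_n} |f(t)|^2\, dt + \int_{a_m - \eta_m}^{+\infty} |f(t)|^2\, dt \]
and inserting (8.80) proves that
\[ \lim_{\substack{n \to -\infty \\ m \to +\infty}} \| f - P^+_{a_n,\eta_n} P^-_{a_m,\eta_m} f \|^2 = 0. \]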
The summation (8.91) implies (8.89).

Discretized Projectors  Projectors $P_p$ that restrict the signal support to $[a_p - \eta_p,\ a_{p+1} + \eta_{p+1}]$ are easily extended for discrete signals. Suppose that $\{a_p\}_{p \in \mathbb{Z}}$ are half integers, which means that $a_p + 1/2 \in \mathbb{Z}$. The windows $g_p(t)$ defined in (8.85) are uniformly sampled $g_p[n] = g_p(n)$. As in (8.86) we define the space $W_p \subset \ell^2(\mathbb{Z})$ of discrete signals
\[ f[n] = g_p[n]\, h[n] \quad\text{with}\quad h[n] = \begin{cases} h[2a_p - n] & \text{if } n \in O_p \\ -h[2a_{p+1} - n] & \text{if } n \in O_{p+1} \end{cases} \tag{8.92} \]
The orthogonal projector $P_p$ on $W_p$ is defined by an expression identical to (8.87, 8.88):
\[ P_p f[n] = g_p[n]\, h_p[n] \tag{8.93} \]
with
\[ h_p[n] = \begin{cases} g_p[n]\, f[n] + g_p[2a_p - n]\, f[2a_p - n] & \text{if } n \in O_p \\ f[n] & \text{if } n \in C_p \\ g_p[n]\, f[n] - g_p[2a_{p+1} - n]\, f[2a_{p+1} - n] & \text{if } n \in O_{p+1} \end{cases} \tag{8.94} \]
Finally we prove as in Proposition 8.6 that if $p \neq q$, then $W_p$ is orthogonal to $W_q$ and
\[ \sum_{p=-\infty}^{+\infty} P_p = \mathrm{Id}. \tag{8.95} \]

8.4.2 Lapped Orthogonal Bases

An orthogonal basis of $L^2(\mathbb{R})$ is defined from a basis $\{e_k\}_{k \in \mathbb{N}}$ of $L^2[0,1]$ by multiplying a translation and dilation of each vector with a smooth window $g_p$ defined in (8.85). A local cosine basis of $L^2(\mathbb{R})$ is derived from a cosine-IV basis of $L^2[0,1]$.
The support of $g_p$ is $[a_p - \eta_p,\ a_{p+1} + \eta_{p+1}]$, with $l_p = a_{p+1} - a_p$, as illustrated in Figure 8.17. The design of these windows also implies symmetry and quadrature properties on overlapping intervals:
\[ g_p(t) = g_{p+1}(2a_{p+1} - t) \quad\text{for } t \in [a_{p+1} - \eta_{p+1},\ a_{p+1} + \eta_{p+1}] \tag{8.96} \]
and
\[ g_p^2(t) + g_{p+1}^2(t) = 1 \quad\text{for } t \in [a_{p+1} - \eta_{p+1},\ a_{p+1} + \eta_{p+1}]. \]
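Both properties can be checked directly from the window definition (8.85). The short derivation below is a sketch that assumes the profile $\beta$ satisfies the quadrature condition $\beta^2(s) + \beta^2(-s) = 1$ for $s \in [-1,1]$; the variable $s$ is introduced here only for this verification. For $t \in [a_{p+1} - \eta_{p+1},\ a_{p+1} + \eta_{p+1}]$, set $s = \eta_{p+1}^{-1}(t - a_{p+1})$, so that

\[ g_p(t) = \beta(\eta_{p+1}^{-1}(a_{p+1} - t)) = \beta(-s), \qquad g_{p+1}(t) = \beta(\eta_{p+1}^{-1}(t - a_{p+1})) = \beta(s). \]

Replacing $t$ by $2a_{p+1} - t$ changes $s$ into $-s$, hence

\[ g_{p+1}(2a_{p+1} - t) = \beta(-s) = g_p(t), \qquad g_p^2(t) + g_{p+1}^2(t) = \beta^2(-s) + \beta^2(s) = 1. \]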
Each $e_k \in L^2[0,1]$ is extended over $\mathbb{R}$ into a function $\tilde e_k$ that is symmetric with respect to 0 and antisymmetric with respect to 1. The resulting $\tilde e_k$ has period 4 and is defined over $[-2, 2]$ by
\[ \tilde e_k(t) = \begin{cases} e_k(t) & \text{if } t \in [0, 1] \\ e_k(-t) & \text{if } t \in (-1, 0) \\ -e_k(2 - t) & \text{if } t \in [1, 2) \\ -e_k(2 + t) & \text{if } t \in (-2, -1] \end{cases} \]
The following theorem derives an orthonormal basis of $L^2(\mathbb{R})$.

Theorem 8.10 (Coifman, Malvar, Meyer) Let $\{e_k\}_{k \in \mathbb{N}}$ be an orthonormal basis of $L^2[0,1]$. The family
\[ \left\{ g_{p,k}(t) = g_p(t)\, \frac{1}{\sqrt{l_p}}\, \tilde e_k\!\left( \frac{t - a_p}{l_p} \right) \right\}_{k \in \mathbb{N},\, p \in \mathbb{Z}} \tag{8.97} \]
is an orthonormal basis of $L^2(\mathbb{R})$.
Proof $^2$. Since $\tilde e_k(l_p^{-1}(t - a_p))$ is symmetric with respect to $a_p$ and antisymmetric with respect to $a_{p+1}$ it follows from (8.86) that $g_{p,k} \in W_p$ for all $k \in \mathbb{N}$. Proposition 8.6 proves that the spaces $W_p$ and $W_q$ are orthogonal for $p \neq q$ and that $L^2(\mathbb{R}) = \oplus_{p=-\infty}^{+\infty} W_p$. To prove that (8.97) is an orthonormal basis of $L^2(\mathbb{R})$ we thus need to show that, for each fixed $p$,
\[ \left\{ g_{p,k}(t) = g_p(t)\, \frac{1}{\sqrt{l_p}}\, \tilde e_k\!\left( \frac{t - a_p}{l_p} \right) \right\}_{k \in \mathbb{N}} \tag{8.98} \]
is an orthonormal basis of $W_p$.

Let us prove first that any $f \in W_p$ can be decomposed over this family. Such a function can be written $f(t) = g_p(t)\, h(t)$ where the restriction of $h$ to $[a_p, a_{p+1}]$ is arbitrary, and $h$ is respectively symmetric and antisymmetric with respect to $a_p$ and $a_{p+1}$. Since $\{\tilde e_k\}_{k \in \mathbb{N}}$ is an orthonormal basis of $L^2[0,1]$, clearly
\[ \left\{ \frac{1}{\sqrt{l_p}}\, \tilde e_k\!\left( \frac{t - a_p}{l_p} \right) \right\}_{k \in \mathbb{N}} \tag{8.99} \]
is an orthonormal basis of $L^2[a_p, a_{p+1}]$. The restriction of $h$ to $[a_p, a_{p+1}]$ can therefore be decomposed in this basis. This decomposition remains valid for all $t \in [a_p - \eta_p,\ a_{p+1} + \eta_{p+1}]$ since $h(t)$ and the $l_p^{-1/2}\, \tilde e_k(l_p^{-1}(t - a_p))$ have the same symmetry with respect to $a_p$ and $a_{p+1}$. Therefore $f(t) = h(t)\, g_p(t)$ can be decomposed over the family (8.98). The following lemma finishes the proof by showing that the orthogonality of functions in (8.98) is a consequence of the orthogonality of (8.99) in $L^2[a_p, a_{p+1}]$.

Lemma 8.1 If $f_b(t) = h_b(t)\, g_p(t) \in W_p$ and $f_c(t) = h_c(t)\, g_p(t) \in W_p$, then
\[ \langle f_b, f_c \rangle = \int_{a_p - \eta_p}^{a_{p+1} + \eta_{p+1}} f_b(t)\, f_c(t)\, dt = \int_{a_p}^{a_{p+1}} h_b(t)\, h_c(t)\, dt. \tag{8.100} \]
Let us evaluate
\[ \langle f_b, f_c \rangle = \int_{a_p - \eta_p}^{a_{p+1} + \eta_{p+1}} h_b(t)\, h_c(t)\, g_p^2(t)\, dt. \tag{8.101} \]
We know that $h_b(t)$ and $h_c(t)$ are symmetric with respect to $a_p$ so
\[ \int_{a_p - \eta_p}^{a_p + \eta_p} h_b(t)\, h_c(t)\, g_p^2(t)\, dt = \int_{a_p}^{a_p + \eta_p} h_b(t)\, h_c(t)\, [g_p^2(t) + g_p^2(2a_p - t)]\, dt. \]
Since $g_p^2(t) + g_p^2(2a_p - t) = 1$ over this interval, we obtain
\[ \int_{a_p - \eta_p}^{a_p + \eta_p} h_b(t)\, h_c(t)\, g_p^2(t)\, dt = \int_{a_p}^{a_p + \eta_p} h_b(t)\, h_c(t)\, dt. \tag{8.102} \]
The functions $h_b(t)$ and $h_c(t)$ are antisymmetric with respect to $a_{p+1}$ so $h_b(t)\, h_c(t)$ is symmetric about $a_{p+1}$. We thus prove similarly that
\[ \int_{a_{p+1} - \eta_{p+1}}^{a_{p+1} + \eta_{p+1}} h_b(t)\, h_c(t)\, g_p^2(t)\, dt = \int_{a_{p+1} - \eta_{p+1}}^{a_{p+1}} h_b(t)\, h_c(t)\, dt. \tag{8.103} \]
Since $g_p(t) = 1$ for $t \in [a_p + \eta_p,\ a_{p+1} - \eta_{p+1}]$, inserting (8.102) and (8.103) in (8.101) proves the lemma property (8.100).

Theorem 8.10 is similar to the block basis Theorem 8.3 but it has the
advantage of using smooth windows $g_p$ as opposed to the rectangular windows that are indicator functions of $[a_p, a_{p+1}]$. It yields smooth functions $g_{p,k}$ only if the extension $\tilde e_k$ of $e_k$ is a smooth function. This is the case for the cosine IV basis $\{e_k(t) = \sqrt{2} \cos[(k + 1/2)\pi t]\}_{k \in \mathbb{N}}$ of $L^2[0,1]$ defined in Theorem 8.6. Indeed $\cos[(k + 1/2)\pi t]$ has a natural symmetric and antisymmetric extension with respect to 0 and 1 over $\mathbb{R}$. The following corollary derives a local cosine basis.

Corollary 8.1 The family of local cosine functions
\[ \left\{ g_{p,k}(t) = g_p(t)\, \sqrt{\frac{2}{l_p}}\, \cos\!\left[ \pi \left( k + \frac{1}{2} \right) \frac{t - a_p}{l_p} \right] \right\}_{k \in \mathbb{N},\, p \in \mathbb{Z}} \tag{8.104} \]
is an orthonormal basis of $L^2(\mathbb{R})$.

Cosine-Sine I Basis  Other bases can be constructed with functions having a different symmetry. To maintain the orthogonality of the windowed basis, we must ensure that consecutive windows $g_p$ and $g_{p+1}$ are
multiplied by functions that have an opposite symmetry with respect to $a_{p+1}$. For example, we can multiply $g_{2p}$ with functions that are symmetric with respect to both ends $a_{2p}$ and $a_{2p+1}$, and multiply $g_{2p+1}$ with functions that are antisymmetric with respect to $a_{2p+1}$ and $a_{2p+2}$. Such bases can be constructed with the cosine I basis $\{\sqrt{2}\, \lambda_k \cos(\pi k t)\}_{k \in \mathbb{N}}$ defined in Theorem 8.5, with $\lambda_0 = 2^{-1/2}$ and $\lambda_k = 1$ for $k \neq 0$, and with the sine I family $\{\sqrt{2} \sin(\pi k t)\}_{k \in \mathbb{N}}$, which is also an orthonormal basis of $L^2[0,1]$. The reader can verify that if
\[ g_{2p,k}(t) = g_{2p}(t)\, \sqrt{\frac{2}{l_{2p}}}\, \lambda_k \cos\!\left[ \pi k\, \frac{t - a_{2p}}{l_{2p}} \right] \]
\[ g_{2p+1,k}(t) = g_{2p+1}(t)\, \sqrt{\frac{2}{l_{2p+1}}}\, \sin\!\left[ \pi k\, \frac{t - a_{2p+1}}{l_{2p+1}} \right] \]
then $\{g_{p,k}\}_{k \in \mathbb{N},\, p \in \mathbb{Z}}$ is an orthonormal basis of $L^2(\mathbb{R})$.

Lapped Transforms in Frequency  Lapped orthogonal projectors can also divide the frequency axis in separate overlapping intervals.
This is done by decomposing the Fourier transform $\hat f(\omega)$ of $f(t)$ over a local cosine basis defined on the frequency axis $\{g_{p,k}(\omega)\}_{p \in \mathbb{Z},\, k \in \mathbb{N}}$. This is also equivalent to decomposing $f(t)$ on its inverse Fourier transform $\{\frac{1}{2\pi}\, \hat g_{p,k}(-t)\}_{p \in \mathbb{Z},\, k \in \mathbb{N}}$. As opposed to wavelet packets, which decompose signals in dyadic frequency bands, this approach offers complete flexibility on the size of the frequency intervals $[a_p - \eta_p,\ a_{p+1} + \eta_{p+1}]$.

A signal decomposition in a Meyer wavelet or wavelet packet basis can be calculated with a lapped orthogonal transform applied in the Fourier domain. Indeed, the Fourier transform (7.92) of a Meyer wavelet has a compact support and $\{|\hat\psi(2^j \omega)|\}_{j \in \mathbb{Z}}$ can be considered as a family of asymmetric windows, whose supports only overlap with adjacent windows with appropriate symmetry properties. These windows cover the whole frequency axis: $\sum_{j=-\infty}^{+\infty} |\hat\psi(2^j \omega)|^2 = 1$. As a result, the Meyer wavelet transform can be viewed as a lapped orthogonal transform applied in the Fourier domain. It can thus be efficiently
implemented with the folding algorithm of Section 8.4.4.

8.4.3 Local Cosine Bases

The local cosine basis defined in (8.104) is composed of functions
\[ g_{p,k}(t) = g_p(t)\, \sqrt{\frac{2}{l_p}}\, \cos\!\left[ \pi \left( k + \frac{1}{2} \right) \frac{t - a_p}{l_p} \right] \]
with a compact support $[a_p - \eta_p,\ a_{p+1} + \eta_{p+1}]$. The energy of their Fourier transforms is also well concentrated. Let $\hat g_p$ be the Fourier transform of $g_p$,
\[ \hat g_{p,k}(\omega) = \frac{\exp(-i a_p \xi_{p,k})}{2}\, \sqrt{\frac{2}{l_p}}\, \Big( \hat g_p(\omega - \xi_{p,k}) + \hat g_p(\omega + \xi_{p,k}) \Big) \]
where
\[ \xi_{p,k} = \frac{\pi (k + 1/2)}{l_p}. \]
The bandwidth of $\hat g_{p,k}$ around $\xi_{p,k}$ and $-\xi_{p,k}$ is equal to the bandwidth of $\hat g_p$. If the sizes $\eta_p$ and $\eta_{p+1}$ of the variation intervals of $g_p$ are proportional to $l_p$, then this bandwidth is proportional to $l_p^{-1}$.
For smooth functions $f$, we want to guarantee that the inner products $\langle f, g_{p,k} \rangle$ have a fast decay when the center frequency $\xi_{p,k}$ increases. The Parseval formula proves that
\[ \langle f, g_{p,k} \rangle = \frac{e^{i a_p \xi_{p,k}}}{2\pi}\, \sqrt{\frac{2}{l_p}} \int_{-\infty}^{+\infty} \hat f(\omega)\, \Big( \hat g_p(\omega - \xi_{p,k}) + \hat g_p(\omega + \xi_{p,k}) \Big)\, d\omega. \]
The smoothness of $f$ implies that $|\hat f(\omega)|$ has a fast decay at large frequencies $\omega$. This integral will therefore become small when $\xi_{p,k}$ increases if $g_p$ is a smooth window, because $|\hat g_p(\omega)|$ has a fast decay.

Window Design  The regularity of $g_p$ depends on the regularity of the profile $\beta$ which defines it in (8.85). This profile must satisfy
\[ \beta^2(t) + \beta^2(-t) = 1 \quad\text{for } t \in [-1, 1] \tag{8.105} \]
plus $\beta(t) = 0$ if $t < -1$ and $\beta(t) = 1$ if $t > 1$. One example is
\[ \beta_0(t) = \sin\!\left( \frac{\pi}{4} (1 + t) \right) \quad\text{for } t \in [-1, 1] \]
but its derivative at $t = \pm 1$ is non-zero so $\beta$ is not differentiable at $\pm 1$. Windows of higher regularity are constructed with a profile $\beta_k$ defined by induction for $k \ge 0$ by
\[ \beta_{k+1}(t) = \beta_k\!\left( \sin \frac{\pi t}{2} \right) \quad\text{for } t \in [-1, 1]. \]
For any $k \ge 0$, one can verify that $\beta_k$ satisfies (8.105) and has $2^k - 1$ vanishing derivatives at $t = \pm 1$. The resulting $\beta$ and $g_p$ are therefore $2^k - 1$ times continuously differentiable.

Heisenberg Box  A local cosine basis can be symbolically represented as an exact paving of the time-frequency plane. The time and frequency region of high energy concentration for each local cosine vector $g_{p,k}$ is approximated by a Heisenberg rectangle
\[ [a_p, a_{p+1}] \times \left[ \xi_{p,k} - \frac{\pi}{2 l_p},\ \xi_{p,k} + \frac{\pi}{2 l_p} \right] \]
as illustrated in Figure 8.18. A local cosine basis $\{g_{p,k}\}_{k \in \mathbb{N},\, p \in \mathbb{Z}}$ corresponds to a time-frequency grid whose size varies in time.
Figure 8.19(a) shows the decomposition of a digital recording of the sound "grea" coming from the word "greasy". The window sizes are adapted to the signal structures with the best basis algorithm described in Section 9.3.2. High amplitude coefficients are along spectral lines in the time-frequency plane, which correspond to different harmonics. Most Heisenberg boxes appear in white, which indicates that the corresponding inner product is nearly zero. This signal can thus be approximated with a few non-zero local cosine vectors. Figure 8.19(b) decomposes the same signal in a local cosine basis composed of small windows of constant size. The signal time-frequency structures do not appear as well as in Figure 8.19(a).

Figure 8.18: The Heisenberg boxes of local cosine vectors define a regular grid over the time-frequency plane.

Translation and Phase  Cosine modulations as opposed to complex
exponentials do not provide easy access to phase information. The translation of a signal can induce important modifications of its decomposition coefficients in a cosine basis. Consider for example
\[ f(t) = g_{p,k}(t) = g_p(t)\, \sqrt{\frac{2}{l_p}}\, \cos\!\left[ \pi \left( k + \frac{1}{2} \right) \frac{t - a_p}{l_p} \right]. \]
Since the basis is orthogonal, $\langle f, g_{p,k} \rangle = 1$, and all other inner products are zero. After a translation by $\tau = l_p/(2k + 1)$
\[ f_\tau(t) = f\!\left( t - \frac{l_p}{2k + 1} \right) = g_p(t)\, \sqrt{\frac{2}{l_p}}\, \sin\!\left[ \pi \left( k + \frac{1}{2} \right) \frac{t - a_p}{l_p} \right]. \]
The opposite parity of sine and cosine implies that $\langle f_\tau, g_{p,k} \rangle \approx 0$. In contrast, $\langle f_\tau, g_{p,k-1} \rangle$ and $\langle f_\tau, g_{p,k+1} \rangle$ become non-zero. After translation, a signal component initially represented by a cosine of frequency $\pi(k + 1/2)/l_p$ is therefore spread over cosine vectors of different frequencies.

Figure 8.19: (a): The signal at the top is a recording of the sound "grea" in the word "greasy". This signal is decomposed in a local cosine basis with windows of varying sizes. The larger the amplitude of $|\langle f, g_{p,k} \rangle|$ the darker the gray level of the Heisenberg box. (b): Decomposition in a local cosine basis with small windows of constant size. (Top panel: $f(t)$ versus $t \in [0, 1]$. Panels (a) and (b): frequency $\omega/2\pi$ from 0 to 250 versus $t \in [0, 1]$.)

This example shows that the local cosine coefficients of a pattern are severely modified by any translation. We are facing the same translation distortions as observed in Section 5.4 for wavelets and time-frequency frames. This lack of translation invariance makes it difficult to use these bases for pattern recognition.

8.4.4 Discrete Lapped Transforms

Lapped orthogonal bases are discretized by replacing the orthogonal basis of $L^2[0,1]$ with a discrete basis of $\mathbb{C}^N$, and uniformly sampling the windows $g_p$. Discrete local cosine bases are derived with discrete cosine-IV bases.
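To make the role of the cosine-IV basis concrete, here is a small Python sketch. It assumes the usual discrete cosine-IV vectors $e_{l,k}[n] = \sqrt{2/l}\, \cos[\frac{\pi}{l}(k + \frac12)(n + \frac12)]$ for $0 \le k, n < l$; the helper names `cos4`, `dct4_basis` and `gram_error` are hypothetical. It measures how far the Gram matrix is from the identity, and evaluating the closed form outside $[0, l-1]$ gives the natural extension of each vector.

```python
import math

def cos4(l, k, n):
    """Cosine-IV vector e_{l,k} evaluated at any integer n (closed-form extension)."""
    return math.sqrt(2.0 / l) * math.cos(math.pi / l * (k + 0.5) * (n + 0.5))

def dct4_basis(l):
    """Rows are the l cosine-IV vectors restricted to 0 <= n < l."""
    return [[cos4(l, k, n) for n in range(l)] for k in range(l)]

def gram_error(basis):
    """Largest deviation of the Gram matrix of the rows from the identity."""
    l = len(basis)
    err = 0.0
    for j in range(l):
        for k in range(l):
            dot = sum(basis[j][n] * basis[k][n] for n in range(l))
            err = max(err, abs(dot - (1.0 if j == k else 0.0)))
    return err
```

Calling `gram_error(dct4_basis(8))` returns a value at the level of rounding errors, and comparing `cos4(l, k, -1 - n)` and `cos4(l, k, 2 * l - 1 - n)` with `cos4(l, k, n)` exhibits the symmetry about $-1/2$ and the antisymmetry about $l - 1/2$ that lapped bases rely on.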
Let $\{a_p\}_{p \in \mathbb{Z}}$ be a sequence of half integers, $a_p + 1/2 \in \mathbb{Z}$ with
\[ \lim_{p \to -\infty} a_p = -\infty \quad\text{and}\quad \lim_{p \to +\infty} a_p = +\infty. \]
A discrete lapped orthogonal basis is constructed with the discrete projectors $P_p$ defined in (8.93). These operators are implemented with the sampled windows $g_p[n] = g_p(n)$. Suppose that $\{e_{l,k}[n]\}_{0 \le k < l}$ is an orthogonal basis of signals defined for $0 \le n < l$. These vectors are extended over $\mathbb{Z}$ with a symmetry with respect to $-1/2$ and an antisymmetry with respect to $l - 1/2$. The resulting extensions have a period $4l$ and are defined over $[-2l,\ 2l - 1]$ by
\[ \tilde e_{l,k}[n] = \begin{cases} e_{l,k}[n] & \text{if } n \in [0,\ l - 1] \\ e_{l,k}[-1 - n] & \text{if } n \in [-l,\ -1] \\ -e_{l,k}[2l - 1 - n] & \text{if } n \in [l,\ 2l - 1] \\ -e_{l,k}[2l + n] & \text{if } n \in [-2l,\ -l - 1] \end{cases} \]
The following theorem proves that multiplying these vectors with the discrete windows $g_p[n]$ yields an orthonormal basis of $\ell^2(\mathbb{Z})$.

Theorem 8.11 (Coifman, Malvar, Meyer) Suppose that $\{e_{l,k}\}_{0 \le k < l}$ is an orthogonal basis of $\mathbb{C}^l$, for any $l > 0$. The family
\[ \Big\{ g_{p,k}[n] = g_p[n]\, \tilde e_{l_p,k}[n - a_p] \Big\}_{0 \le k < l_p,\, p \in \mathbb{Z}} \tag{8.106} \]
is a lapped orthonormal basis of $\ell^2(\mathbb{Z})$.
The proof of this theorem is identical to the proof of Theorem 8.10 since we have a discrete equivalent of the spaces $W_p$ and their projectors. It is also based on a discrete equivalent of Lemma 8.1, which is verified with the same derivations. Beyond the proof of Theorem 8.11, we shall see that this lemma is important for quickly computing the decomposition coefficients $\langle f, g_{p,k} \rangle$.

Lemma 8.2 Any $f_b[n] = g_p[n]\, h_b[n] \in W_p$ and $f_c[n] = g_p[n]\, h_c[n] \in W_p$ satisfy
\[ \langle f_b, f_c \rangle = \sum_{a_p - \eta_p < n < a_{p+1} + \eta_{p+1}} f_b[n]\, f_c[n] = \sum_{a_p < n < a_{p+1}} h_b[n]\, h_c[n]. \tag{8.107} \]
Theorem 8.11 is similar to the discrete block basis Theorem 8.4 but constructs an orthogonal basis with smooth discrete windows $g_p[n]$. The discrete cosine IV bases
\[ \left\{ e_{l,k}[n] = \sqrt{\frac{2}{l}}\, \cos\!\left[ \frac{\pi}{l} \left( k + \frac{1}{2} \right) \left( n + \frac{1}{2} \right) \right] \right\}_{0 \le k < l} \]
have the advantage of including vectors that have a natural symmetric and antisymmetric extension with respect to $-1/2$ and $l - 1/2$. This produces a discrete local cosine basis of $\ell^2(\mathbb{Z})$.

Corollary 8.2 The family
\[ \left\{ g_{p,k}[n] = g_p[n]\, \sqrt{\frac{2}{l_p}}\, \cos\!\left[ \pi \left( k + \frac{1}{2} \right) \frac{n - a_p}{l_p} \right] \right\}_{0 \le k < l_p,\, p \in \mathbb{Z}} \tag{8.108} \]
is an orthonormal basis of $\ell^2(\mathbb{Z})$.

Fast Lapped Orthogonal Transform  A fast algorithm introduced by Malvar [42] replaces the calculations of $\langle f, g_{p,k} \rangle$ by a computation of inner products in the original bases $\{e_{l,k}\}_{0 \le k < l}$, with a folding procedure. In a discrete local cosine basis, these inner products are calculated with the fast DCT-IV algorithm.
To simplify notations, as in Section 8.4.1 we decompose $I_p = [a_p - \eta_p,\ a_{p+1} + \eta_{p+1}]$ into $I_p = O_p \cup C_p \cup O_{p+1}$ with
\[ O_p = [a_p - \eta_p,\ a_p + \eta_p] \quad\text{and}\quad C_p = [a_p + \eta_p,\ a_{p+1} - \eta_{p+1}]. \]
The orthogonal projector $P_p$ on the space $W_p$ generated by $\{g_{p,k}\}_{0 \le k < l_p}$ was calculated in (8.93):
\[ P_p f[n] = g_p[n]\, h_p[n] \]
where $h_p$ is a folded version of $f$:
\[ h_p[n] = \begin{cases} g_p[n]\, f[n] + g_p[2a_p - n]\, f[2a_p - n] & \text{if } n \in O_p \\ f[n] & \text{if } n \in C_p \\ g_p[n]\, f[n] - g_p[2a_{p+1} - n]\, f[2a_{p+1} - n] & \text{if } n \in O_{p+1} \end{cases} \tag{8.109} \]
Since $g_{p,k} \in W_p$,
\[ \langle f, g_{p,k} \rangle = \langle P_p f, g_{p,k} \rangle = \langle g_p h_p,\ g_p \tilde e_{l_p,k} \rangle. \]
Since $\tilde e_{l_p,k}[n] = e_{l_p,k}[n]$ for $n \in [a_p, a_{p+1}]$, Lemma 8.2 derives that
\[ \langle f, g_{p,k} \rangle = \sum_{a_p < n < a_{p+1}} h_p[n]\, e_{l_p,k}[n] = \langle h_p, e_{l_p,k} \rangle_{[a_p, a_{p+1}]}. \tag{8.110} \]
This proves that the decomposition coefficients $\langle f, g_{p,k} \rangle$ can be calculated by folding $f$ into $h_p$ and computing the inner product with the orthogonal basis $\{e_{l_p,k}\}_{0 \le k < l_p}$ defined over $[a_p, a_{p+1}]$.

For a discrete cosine basis, the DCT-IV coefficients
\[ \langle h_p, e_{l_p,k} \rangle_{[a_p, a_{p+1}]} = \sum_{a_p < n < a_{p+1}} h_p[n]\, \sqrt{\frac{2}{l_p}}\, \cos\!\left[ \pi \left( k + \frac{1}{2} \right) \frac{n - a_p}{l_p} \right] \tag{8.111} \]
are computed with the fast DCT-IV algorithm of Section 8.3.4, which requires $O(l_p \log_2 l_p)$ operations. The inverse lapped transform recovers $h_p[n]$ over $[a_p, a_{p+1}]$ from the $l_p$ inner products $\{\langle h_p, e_{l_p,k} \rangle_{[a_p, a_{p+1}]}\}_{0 \le k < l_p}$. In a local cosine IV basis, this is done with the fast inverse DCT-IV, which is identical to the forward DCT-IV and requires $O(l_p \log_2 l_p)$ operations. The reconstruction of $f$ is done by applying (8.95) which
proves that
\[ f[n] = \sum_{p=-\infty}^{+\infty} P_p f[n] = \sum_{p=-\infty}^{+\infty} g_p[n]\, h_p[n]. \tag{8.112} \]
Let us denote $O_p^- = [a_p - \eta_p,\ a_p]$ and $O_p^+ = [a_p,\ a_p + \eta_p]$. The restriction of (8.112) to $[a_p, a_{p+1}]$ gives
\[ f[n] = \begin{cases} g_p[n]\, h_p[n] + g_{p-1}[n]\, h_{p-1}[n] & \text{if } n \in O_p^+ \\ h_p[n] & \text{if } n \in C_p \\ g_p[n]\, h_p[n] + g_{p+1}[n]\, h_{p+1}[n] & \text{if } n \in O_{p+1}^- \end{cases} \]
The symmetry of the windows guarantees that $g_{p-1}[n] = g_p[2a_p - n]$ and $g_{p+1}[n] = g_p[2a_{p+1} - n]$. Since $h_{p-1}[n]$ is antisymmetric with respect to $a_p$ and $h_{p+1}[n]$ is symmetric with respect to $a_{p+1}$, we can recover $f[n]$ on $[a_p, a_{p+1}]$ from the values of $h_{p-1}[n]$, $h_p[n]$ and $h_{p+1}[n]$ computed respectively on $[a_{p-1}, a_p]$, $[a_p, a_{p+1}]$, and $[a_{p+1}, a_{p+2}]$:
\[ f[n] = \begin{cases} g_p[n]\, h_p[n] - g_p[2a_p - n]\, h_{p-1}[2a_p - n] & \text{if } n \in O_p^+ \\ h_p[n] & \text{if } n \in C_p \\ g_p[n]\, h_p[n] + g_p[2a_{p+1} - n]\, h_{p+1}[2a_{p+1} - n] & \text{if } n \in O_{p+1}^- \end{cases} \tag{8.113} \]
This unfolding formula is implemented with $O(l_p)$ calculations. The inverse local cosine transform thus requires $O(l_p \log_2 l_p)$ operations to recover $f[n]$ on each interval $[a_p, a_{p+1}]$ of length $l_p$.

Finite Signals  If $f[n]$ is defined for $0 \le n < N$, the extremities of the first and last interval must be $a_0 = -1/2$ and $a_q = N - 1/2$. A fast local cosine algorithm needs $O(l_p \log_2 l_p)$ additions and multiplications to decompose or reconstruct the signal on each interval of length $l_p$. On the whole signal of length $N$, it thus needs a total of $O(N \log_2 L)$ operations, where $L = \sup_{0 \le p < q} l_p$.
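The folding (8.109) and unfolding (8.113) steps can be sketched in a few lines of Python. This is a minimal illustration under several assumptions that are not part of the text: the profile is $\beta(t) = \sin(\pi(1+t)/4)$, the same integer overlap `eta` is used at every interior boundary, no overlap is used at the two signal ends, a DCT-IV is applied to every block including the last one, and the DCT-IV is a naive $O(l^2)$ sum standing in for a fast algorithm. All function names are hypothetical.

```python
import math

def beta(t):
    """Assumed profile on [-1, 1]; it satisfies beta(t)^2 + beta(-t)^2 = 1."""
    return math.sin(math.pi / 4.0 * (1.0 + t))

def fold(f, m, eta):
    """Fold f around the half-integer boundary a = m - 1/2, eta samples on each side."""
    h = list(f)
    for i in range(eta):
        right, left = m + i, m - 1 - i               # mirror pair: left = 2a - right
        w_r = beta((i + 0.5) / eta)                  # window value at the near sample
        w_l = beta(-(i + 0.5) / eta)                 # window value at the mirrored sample
        h[right] = w_r * f[right] + w_l * f[left]    # symmetric fold, block starting at a
        h[left] = w_r * f[left] - w_l * f[right]     # antisymmetric fold, block ending at a
    return h

def unfold(h, m, eta):
    """Invert fold(): each mirror pair was rotated, so apply the transposed rotation."""
    f = list(h)
    for i in range(eta):
        right, left = m + i, m - 1 - i
        w_r = beta((i + 0.5) / eta)
        w_l = beta(-(i + 0.5) / eta)
        f[right] = w_r * h[right] - w_l * h[left]
        f[left] = w_r * h[left] + w_l * h[right]
    return f

def dct4(x):
    """Naive orthonormal DCT-IV, which is its own inverse."""
    l = len(x)
    return [math.sqrt(2.0 / l) * sum(x[n] * math.cos(math.pi / l * (k + 0.5) * (n + 0.5))
                                     for n in range(l)) for k in range(l)]

def lapped_forward(f, cuts, eta):
    """cuts = [0, m_1, ..., N]; interior boundaries sit at a_p = m_p - 1/2."""
    h = list(f)
    for m in cuts[1:-1]:
        h = fold(h, m, eta)
    return [dct4(h[cuts[p]:cuts[p + 1]]) for p in range(len(cuts) - 1)]

def lapped_inverse(coeffs, cuts, eta):
    """Inverse DCT-IV on each block, then unfold at every interior boundary."""
    h = [v for block in coeffs for v in dct4(block)]
    for m in cuts[1:-1]:
        h = unfold(h, m, eta)
    return h
```

The sketch requires `eta` to be at most half of the shortest block so that the folding regions of different boundaries do not overlap. Since each fold is a plane rotation of a mirror pair and each DCT-IV is orthogonal, the forward transform preserves the signal energy and `lapped_inverse` recovers the input up to rounding errors.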
Since we do not know the values of $f[n]$ for $n < 0$, at the left border we set $\eta_0 = 0$. This means that $g_0[n]$ jumps from 0 to 1 at $n = 0$. The resulting transform on the left boundary is equivalent to a straight DCT-IV. Section 8.3.2 shows that since cosine IV vectors are even on the left boundary, the DCT-IV is equivalent to a symmetric signal extension followed by a discrete Fourier transform. This avoids creating discontinuity artifacts at the left border.
At the right border, we also set $\eta_q = 0$ to limit the support of $g_{q-1}$ to $[0,\ N - 1]$. Section 8.4.4 explains that since cosine IV vectors are odd on the right boundary, the DCT-IV is equivalent to an antisymmetric signal extension. If $f[N - 1] \neq 0$, this extension introduces a sharp signal transition that creates artificial high frequencies. To reduce this border effect, we replace the cosine IV modulation
\[ g_{q-1,k}[n] = g_{q-1}[n]\, \sqrt{\frac{2}{l_{q-1}}}\, \cos\!\left[ \pi \left( k + \frac{1}{2} \right) \frac{n - a_{q-1}}{l_{q-1}} \right] \]
by a cosine I modulation
\[ g_{q-1,k}[n] = g_{q-1}[n]\, \sqrt{\frac{2}{l_{q-1}}}\, \lambda_k \cos\!\left[ \pi k\, \frac{n - a_{q-1}}{l_{q-1}} \right]. \]
The orthogonality with the other elements of the basis is maintained because these cosine I vectors, like cosine IV vectors, are even with respect to $a_{q-1}$. Since $\cos[\pi k (n - a_{q-1})/l_{q-1}]$ is also symmetric with respect to $a_q = N - 1/2$, computing a DCT-I is equivalent to performing a symmetric signal extension at the right boundary, which avoids discontinuities. In the fast local cosine transform, we thus compute a DCT-I of the last folded signal $h_{q-1}$ instead of a DCT-IV. The reconstruction algorithm uses an inverse DCT-I to recover $h_{q-1}$ from these coefficients.

8.5 Local Cosine Trees $^2$
Corollary 8.1 constructs local cosine bases for any segmentation of the time axis into intervals $[a_p, a_{p+1}]$ of arbitrary lengths. This result is more general than the construction of wavelet packet bases that can only divide the frequency axis into dyadic intervals, whose lengths are proportional to powers of 2. However, Coifman and Meyer [138] showed that restricting the intervals to dyadic sizes has the advantage of creating a tree structure similar to a wavelet packet tree. "Best" local cosine bases can then be adaptively chosen with the fast dynamic programming algorithm described in Section 9.3.2.

8.5.1 Binary Tree of Cosine Bases

A local cosine tree includes orthogonal bases that segment the time axis in dyadic intervals. For any $j \ge 0$, the interval $[0, 1]$ is divided in $2^j$ intervals of length $2^{-j}$ by setting
\[ a_{p,j} = p\, 2^{-j} \quad\text{for } 0 \le p \le 2^j. \]
These intervals are covered by windows $g_{p,j}$ defined by (8.85) with a support $[a_{p,j} - \eta,\ a_{p+1,j} + \eta]$:
\[ g_{p,j}(t) = \begin{cases} \beta(\eta^{-1}(t - a_{p,j})) & \text{if } t \in [a_{p,j} - \eta,\ a_{p,j} + \eta] \\ 1 & \text{if } t \in [a_{p,j} + \eta,\ a_{p+1,j} - \eta] \\ \beta(\eta^{-1}(a_{p+1,j} - t)) & \text{if } t \in [a_{p+1,j} - \eta,\ a_{p+1,j} + \eta] \\ 0 & \text{otherwise} \end{cases} \tag{8.114} \]
To ensure that the support of $g_{p,j}$ is in $[0, 1]$ for $p = 0$ and $p = 2^j - 1$, we modify respectively the left and right sides of these windows by setting $g_{0,j}(t) = 1$ if $t \in [0, \eta]$, and $g_{2^j - 1,j}(t) = 1$ if $t \in [1 - \eta,\ 1]$. It follows that $g_{0,0} = \mathbf{1}_{[0,1]}$. The size $\eta$ of the raising and decaying profiles of $g_{p,j}$ is independent of $j$. To guarantee that windows overlap only with their two neighbors, the length $a_{p+1,j} - a_{p,j} = 2^{-j}$ must be larger than the size $2\eta$ of the overlapping intervals and hence
\[ \eta \le 2^{-j-1}. \tag{8.115} \]
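The constraint (8.115) bounds how deep the dyadic subdivision can go for a given overlap size $\eta$. The Python sketch below computes the subdivision points $a_{p,j}$ and the largest depth compatible with (8.115); the function names are hypothetical and $0 < \eta \le 1/2$ is assumed.

```python
def dyadic_cuts(j):
    """Subdivision points a_{p,j} = p 2^-j for 0 <= p <= 2^j."""
    return [p * 2.0 ** (-j) for p in range(2 ** j + 1)]

def max_depth(eta):
    """Largest j such that eta <= 2^(-j-1), i.e. condition (8.115) still holds."""
    j = 0
    while eta <= 2.0 ** (-(j + 1) - 1):   # would depth j + 1 still satisfy (8.115)?
        j += 1
    return j
```

For instance, an overlap $\eta = 1/16$ allows windows down to length $2^{-3} = 2\eta$, so the loop stops at depth 3.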
Similarly to wavelet packet trees, a local cosine tree is constructed by recursively dividing spaces built with local cosine bases. A tree node at a depth $j$ and a position $p$ is associated to a space $W_j^p$ generated by the local cosine family
\[ B_j^p = \left\{ g_{p,j}(t)\, \sqrt{\frac{2}{2^{-j}}}\, \cos\!\left[ \pi \left( k + \frac{1}{2} \right) \frac{t - a_{p,j}}{2^{-j}} \right] \right\}_{k \in \mathbb{N}}. \tag{8.116} \]
Any $f \in W_j^p$ has a support in $[a_{p,j} - \eta,\ a_{p+1,j} + \eta]$ and can be written $f(t) = g_{p,j}(t)\, h(t)$ where $h(t)$ is respectively symmetric and antisymmetric with respect to $a_{p,j}$ and $a_{p+1,j}$. The following proposition shows that $W_j^p$ is divided in two orthogonal spaces $W_{j+1}^{2p}$ and $W_{j+1}^{2p+1}$ that are built over the two half intervals.

Proposition 8.7 (Coifman, Meyer) For any $j \ge 0$ and $p < 2^j$, the spaces $W_{j+1}^{2p}$ and $W_{j+1}^{2p+1}$ are orthogonal and
\[ W_j^p = W_{j+1}^{2p} \oplus W_{j+1}^{2p+1}. \tag{8.117} \]
Proof $^2$. The orthogonality of $W_{j+1}^{2p}$ and $W_{j+1}^{2p+1}$ is proved by Proposition 8.6. We denote $P_{p,j}$ the orthogonal projector on $W_j^p$. With the notation of Section 8.4.1, this projector is decomposed into two splitting projectors at $a_{p,j}$ and $a_{p+1,j}$:
\[ P_{p,j} = P^+_{a_{p,j},\eta}\, P^-_{a_{p+1,j},\eta}. \]
Equation (8.90) proves that
\[ P_{2p,j+1} + P_{2p+1,j+1} = P^+_{a_{2p,j+1},\eta}\, P^-_{a_{2p+2,j+1},\eta} = P^+_{a_{p,j},\eta}\, P^-_{a_{p+1,j},\eta} = P_{p,j}. \]
This equality on orthogonal projectors implies (8.117).

The space $W_j^p$ located at the node $(j, p)$ of a local cosine tree is therefore the sum of the two spaces $W_{j+1}^{2p}$ and $W_{j+1}^{2p+1}$ located at the children nodes. Since $g_{0,0} = \mathbf{1}_{[0,1]}$ it follows that $W_0^0 = L^2[0,1]$. The maximum depth $J$ of the binary tree is limited by the support condition $\eta \le 2^{-J-1}$, and hence
\[ J \le -\log_2(2\eta). \tag{8.118} \]

Admissible Local Cosine Bases  As in a wavelet packet binary tree, many local cosine orthogonal bases are constructed from this local
(8.118) Admissible Local Cosine Bases As in a wavelet packet binary tree, many local cosine orthogonal bases are constructed from this local
cosine tree. We call admissible binary tree any subtree of the local cosine
tree whose nodes have either 0 or 2 children. Let fji pig1 i I be the
indices at the leaves of a particular admissible binary tree. Applying
the splitting property (8.117) along the branches of this subtree proves
that
0
L2 0 1] = W0 = Ii=1 Wjpii :
Hence, the union of local cosine bases I=1 Bjpii is an orthogonal basis of
i
L2 0 1]. This can also be interpreted as a division of the time axis into
windows of various length, as illustrated by Figure 8.20.
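The correspondence between admissible binary trees and segmentations of $[0, 1]$ can be explored with a short enumeration. The Python sketch below lists the leaf sets of all admissible subtrees up to a given depth and checks that the dyadic intervals $[p\, 2^{-j},\ (p+1)\, 2^{-j}]$ attached to the leaves partition $[0, 1]$; the function names are hypothetical. Because a node is either a leaf or has two independent admissible subtrees, the number of trees of depth at most $J + 1$ is the square of the number for depth $J$, plus one.

```python
from fractions import Fraction

def admissible_leaves(j, p, max_depth):
    """Leaf sets of all admissible subtrees rooted at node (j, p)."""
    trees = [[(j, p)]]                       # the node itself is kept as a leaf
    if j < max_depth:                        # or it is split into its two children
        for left in admissible_leaves(j + 1, 2 * p, max_depth):
            for right in admissible_leaves(j + 1, 2 * p + 1, max_depth):
                trees.append(left + right)
    return trees

def tiles_unit_interval(leaves):
    """True if the intervals [p 2^-j, (p+1) 2^-j] of the leaves partition [0, 1]."""
    edges = sorted((Fraction(p, 2 ** j), Fraction(p + 1, 2 ** j)) for j, p in leaves)
    contiguous = all(edges[i][1] == edges[i + 1][0] for i in range(len(edges) - 1))
    return edges[0][0] == 0 and edges[-1][1] == 1 and contiguous
```

Exact rational arithmetic is used for the interval endpoints so that the contiguity test does not depend on rounding.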
The number $B_J$ of different dyadic local cosine bases is equal to the number of different admissible subtrees of depth at most $J$. For $J = -\log_2(2\eta)$, Proposition 8.1 proves that
\[ 2^{1/(4\eta)} \le B_J \le 2^{3/(8\eta)}. \]
Figure 8.19 shows the decomposition of a sound recording in two dyadic local cosine bases selected from the binary tree. The basis in (a) is calculated with the best basis algorithm of Section 9.3.2.

Choice of $\eta$  At all scales $2^j$, the windows $g_{p,j}$ of a local cosine tree have raising and decaying profiles of the same size $\eta$. These windows can thus be recombined independently from their scale. If $\eta$ is small compared to the interval size $2^{-j}$ then $g_{p,j}$ has a relatively sharp variation at its borders compared to the size of its support. Since $\eta$ is not proportional to $2^{-j}$, the energy concentration of $\hat g_{p,j}$ is not improved when the window size $2^{-j}$ increases. Even though $f$ may be very smooth over $[a_{p,j}, a_{p+1,j}]$, the border variations of the window create relatively large coefficients up to a frequency of the order of $\pi/\eta$.
Figure 8.20: An admissible binary tree of local cosine spaces divides the time axis in windows of dyadic lengths.

To reduce the number of large coefficients we must increase $\eta$, but this also increases the minimum window size in the tree, which is $2^{-J} = 2\eta$. The choice of $\eta$ is therefore the result of a trade-off between window regularity and the maximum resolution of the time subdivision. There is no equivalent limitation in the construction of wavelet packet bases.

8.5.2 Tree of Discrete Bases

For discrete signals of size $N$, a binary tree of discrete cosine bases is constructed like a binary tree of continuous time cosine bases. To simplify notations, the sampling distance is normalized to 1. If it is equal to $N^{-1}$ then frequency parameters must be multiplied by $N$.

The subdivision points are located at half integers:
\[ a_{p,j} = p\, N\, 2^{-j} - 1/2 \quad\text{for } 0 \le p \le 2^j. \]
The discrete windows are obtained by sampling the windows $g_{p,j}(t)$ defined in (8.114), $g_{p,j}[n] = g_{p,j}(n)$. The same border modification is used to ensure that the support of all $g_{p,j}[n]$ is in $[0,\ N - 1]$.
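Because the subdivision points are half integers, every sample index falls strictly inside exactly one interval at each depth. The following Python sketch lists the samples between $a_{p,j}$ and $a_{p+1,j}$; the function name is hypothetical, and $N$ is assumed to be a power of two with $2^j \le N$.

```python
def block_samples(N, j, p):
    """Integer samples n with a_{p,j} < n < a_{p+1,j}, where a_{p,j} = p N 2^-j - 1/2."""
    size = N >> j                          # block length N 2^-j
    a_left = p * size - 0.5
    a_right = (p + 1) * size - 0.5
    return [n for n in range(N) if a_left < n < a_right]
```

With `N = 16` and `j = 2`, the four blocks are the index ranges 0 to 3, 4 to 7, 8 to 11 and 12 to 15, which together cover the whole signal without overlap (the windows themselves still overlap by $\eta$ samples on each side).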
A node at depth $j$ and position $p$ in the binary tree corresponds to the space $W_j^p$ generated by the discrete local cosine family
\[ B_j^p = \left\{ g_{p,j}[n]\, \sqrt{\frac{2}{2^{-j} N}}\, \cos\!\left[ \pi \left( k + \frac{1}{2} \right) \frac{n - a_{p,j}}{2^{-j} N} \right] \right\}_{0 \le k < N 2^{-j}}. \]
Since $g_{0,0} = \mathbf{1}_{[0, N-1]}$, the space $W_0^0$ at the root of the tree includes any signal defined over $0 \le n < N$, so $W_0^0 = \mathbb{C}^N$. As in Proposition 8.7 we verify that $W_j^p$ is orthogonal to $W_j^q$ for $p \neq q$ and that
\[ W_j^p = W_{j+1}^{2p} \oplus W_{j+1}^{2p+1}. \tag{8.119} \]
The splitting property (8.119) implies that the union of local cosine families $B_j^p$ located at the leaves of an admissible subtree is an orthogonal basis of $W_0^0 = \mathbb{C}^N$. The minimum window size is limited by $2\eta \le 2^{-j} N$ so the maximum depth of this binary tree is $J = \log_2 \frac{N}{2\eta}$. One can thus construct more than $2^{2^{J-1}} = 2^{N/(4\eta)}$ different discrete local cosine bases within this binary tree.

Fast Calculations  The fast local cosine transform algorithm described in Section 8.4.4 requires $O(2^{-j} N \log_2(2^{-j} N))$ operations to compute the inner products of $f$ with the $2^{-j} N$ vectors in the local cosine family $B_j^p$. The total number of operations to perform these computations at all nodes $(j, p)$ of the tree, for $0 \le p < 2^j$ and $0 \le j \le J$, is therefore $O(N J \log_2 N)$. The local cosine decompositions in Figure 8.19 are calculated with this fast algorithm. To improve the right border treatment, Section 8.4.4 explains that the last DCT-IV should be replaced by a DCT-I, at each scale $2^j$. The signal $f$ is recovered from the local cosine coefficients at the leaves of any admissible binary tree, with the fast local cosine reconstruction algorithm, which needs $O(N \log_2 N)$ operations.

8.5.3 Image Cosine Quad-Tree

A local cosine binary tree is extended in two dimensions into a quad-tree, which recursively divides square image windows into four smaller windows. This separable approach is similar to the extension of wavelet packet bases in two dimensions, described in Section 8.2.
Let us consider images of $N^2$ pixels. A node of the quad-tree is labeled by its depth $j$ and two indices $p$ and $q$. Let $g_{p,j}[n]$ be the discrete one-dimensional window defined in Section 8.5.2. At the depth $j$, a node $(p, q)$ corresponds to a separable space
\[ W_j^{p,q} = W_j^p \otimes W_j^q \tag{8.120} \]
which is generated by a separable local cosine basis of $2^{-2j} N^2$ vectors
\[ B_j^{p,q} = \left\{ g_{p,j}[n_1]\, g_{q,j}[n_2]\, \frac{2}{2^{-j} N}\, \cos\!\left[ \pi \left( k_1 + \frac{1}{2} \right) \frac{n_1 - a_{p,j}}{2^{-j} N} \right] \cos\!\left[ \pi \left( k_2 + \frac{1}{2} \right) \frac{n_2 - a_{q,j}}{2^{-j} N} \right] \right\}_{0 \le k_1, k_2 < 2^{-j} N} \]
We know from (8.119) that
\[ W_j^p = W_{j+1}^{2p} \oplus W_{j+1}^{2p+1} \quad\text{and}\quad W_j^q = W_{j+1}^{2q} \oplus W_{j+1}^{2q+1}. \]
Inserting these equations in (8.120) proves that $W_j^{p,q}$ is the direct sum of four orthogonal subspaces:
\[ W_j^{p,q} = W_{j+1}^{2p,2q} \oplus W_{j+1}^{2p+1,2q} \oplus W_{j+1}^{2p,2q+1} \oplus W_{j+1}^{2p+1,2q+1}. \tag{8.121} \]
A space $W_j^{p,q}$ at a node $(j, p, q)$ is therefore decomposed in the four subspaces located at the four children nodes of the quad-tree. This decomposition can also be interpreted as a division of the square window $g_{p,j}[n_1]\, g_{q,j}[n_2]$ into four sub-windows of equal sizes, as illustrated in Figure 8.21. The space located at the root of the tree is
\[ W_0^{0,0} = W_0^0 \otimes W_0^0. \tag{8.122} \]
It includes all images of $N^2$ pixels. The size $\eta$ of the raising and decaying profiles of the one-dimensional windows defines the maximum depth $J = \log_2 \frac{N}{2\eta}$ of the quad-tree.
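The splitting (8.121) translates into a simple indexing rule for the quad-tree. The Python sketch below returns the four children of a node and the square of pixels attached to a node when the window overlaps are ignored; the function names are hypothetical, and $N$ is assumed to be a power of two with $2^j \le N$.

```python
def children(j, p, q):
    """Four children of node (j, p, q), in the order of the direct sum (8.121)."""
    return [(j + 1, 2 * p + dp, 2 * q + dq) for dq in (0, 1) for dp in (0, 1)]

def support_square(N, j, p, q):
    """Pixels (n1, n2) of the square of side N 2^-j attached to node (j, p, q)."""
    side = N >> j
    return {(n1, n2)
            for n1 in range(p * side, (p + 1) * side)
            for n2 in range(q * side, (q + 1) * side)}
```

Taking the union of `support_square` over the four children of a node gives back the square of the parent, which is the discrete counterpart of dividing a square window into four sub-windows of equal sizes.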
Figure 8.21: Functions in $W_j^{p,q}$ have a support located in a square region of the image. It is divided into four subspaces that cover smaller squares in the image.

Admissible Quad-Trees  An admissible subtree of this local cosine quad-tree has nodes that have either 0 or four children. Applying the decomposition property (8.121) along the branches of an admissible quad-tree proves that the spaces $W_{j_i}^{p_i,q_i}$ located at the leaves decompose $W_0^{0,0}$ in orthogonal subspaces. The union of the corresponding two-dimensional local cosine bases $B_{j_i}^{p_i,q_i}$ is therefore an orthogonal basis of $W_0^{0,0}$. We proved in (8.42) that there are more than $2^{4^{J-1}} = 2^{N^2/(16\eta^2)}$ different admissible trees of maximum depth $J = \log_2 \frac{N}{2\eta}$. These bases divide the image plane into squares of varying sizes. Figure 8.22 gives an example of image decomposition in a local cosine basis corresponding to an admissible quad-tree. This local cosine basis is selected with the best basis algorithm of Section 9.3.2.

Fast Calculations  The decomposition of an image $f[n]$ over a separable local cosine family $B_j^{p,q}$ requires $O(2^{-2j} N^2 \log_2(2^{-j} N))$ operations, with a separable implementation of the fast one-dimensional local cosine transform. For a full local cosine quad-tree of depth $J$, these calculations are performed for $0 \le p, q < 2^j$ and $0 \le j \le J$, which requires $O(N^2 J \log_2 N)$ multiplications and additions. The original image is recovered from the local cosine coefficients at the leaves of any admissible subtree with $O(N^2 \log_2 N)$ computations.

Figure 8.22: The grid shows the support of the windows $g_{j,p}[n_1]\, g_{j,q}[n_2]$ of a "best" local cosine basis selected in the local cosine quad-tree.

8.6 Problems
8.1.
8.2. Prove the discrete splitting Theorem 8.2.
2 Meyer wavelet packets are calculated with a Meyer conjugate
mirror lter (7.89). Compute the size of the frequency support of
p
^j as a function of 2j . Study the convergence of j n(t) when the
scale 2j goes to +1.
8.3. 1 Extend the separable wavelet packet tree of Section 8.2.2 for
discrete p-dimensional signals. Verify that the wavelet packet
tree of a p-dimensional discrete signal of N p samples includes
O(N p log2 N ) wavelet packet coe cients that are calculated with
O(K N p log2 N ) operations if the conjugate mirror lter h has K
non-zero coe cients.
p
8.4. 1 Anisotropic wavelet packets j a ; 2L;j n1 ] lq b ; 2L;l n2 ] may
have di erent scales 2j and 2l along the rows and columns. A
1 8.6. PROBLEMS 499 decomposition over such wavelet packets is calculated with a lter bank that lters and subsamples the image rows j ; L times
whereas the columns are ltered and subsampled l ; L times. For
an image f n] of N 2 pixels, show that a dictionary of anisotropic
wavelet packets includes O(N 2 log2 N ]2 ) di erent vectors. Compute the number of operations needed to decompose f in this dictionary.
8.5. $^1$ Hartley transform. Let $\mathrm{cas}(t) = \cos(t) + \sin(t)$. We define
\[ B = \left\{ g_k[n] = \frac{1}{\sqrt{N}}\, \mathrm{cas}\!\left(\frac{2\pi n k}{N}\right) \right\}_{0 \le k < N} . \]
(a) Prove that $B$ is an orthonormal basis of $\mathbb{C}^N$.
(b) For any signal $f[n]$ of size $N$, find a fast Hartley transform algorithm based on the FFT, which computes $\{\langle f, g_k\rangle\}_{0 \le k < N}$ with $O(N \log_2 N)$ operations.
8.6. $^1$ Prove that $\{\sqrt{2}\, \sin[(k + 1/2)\pi t]\}_{k \in \mathbb{N}}$ is an orthonormal basis of $L^2[0,1]$. Find a corresponding discrete orthonormal basis of $\mathbb{C}^N$.
8.7. $^1$ Prove that $\{\sqrt{2}\, \sin(k \pi t)\}_{k \in \mathbb{N}^*}$ is an orthonormal basis of $L^2[0,1]$. Find a corresponding discrete orthonormal basis of $\mathbb{C}^N$.
8.8. $^1$ Lapped Fourier basis.
(a) Construct a lapped orthogonal basis $\{\tilde g_{p,k}\}_{(p,k) \in \mathbb{Z}^2}$ of $L^2(\mathbb{R})$ from the Fourier basis $\{\exp(i 2\pi k t)\}_{k \in \mathbb{Z}}$ of $L^2[0,1]$.
(b) Explain why this local Fourier basis does not contradict the Balian-Low Theorem 5.6.
(c) Let $f \in L^2(\mathbb{R})$ be such that $|\hat f(\omega)| = O((1 + |\omega|^p)^{-1})$ for some $p > 0$. Compute the rate of decay of $|\langle f, \tilde g_{p,k}\rangle|$ when the frequency index $|k|$ increases. Compare it with the rate of decay of $|\langle f, g_{p,k}\rangle|$, where $g_{p,k}$ is a local cosine vector (8.104). How do the two bases compare for signal processing applications?
8.9. $^1$ Describe a fast algorithm to compute the Meyer orthogonal wavelet transform with a lapped transform applied in the Fourier domain. Calculate the numerical complexity of this algorithm for periodic signals of size $N$. Compare this result with the numerical complexity of the standard fast wavelet transform algorithm, where the convolutions with Meyer conjugate mirror filters are calculated with an FFT.
8.10. $^2$ Arbitrary Walsh tilings.
(a) Prove that two Walsh wavelet packets $\psi_{j,n}^p$ and $\psi_{j',n'}^{p'}$ are orthogonal if their Heisenberg boxes defined in Section 8.1.2 do not intersect in the time-frequency plane [76].
(b) A dyadic tiling of the time-frequency plane is an exact cover $\{[2^j n, 2^j(n+1)] \times [k\pi 2^{-j}, (k+1)\pi 2^{-j}]\}_{(j,n,p) \in I}$, where the index set $I$ is adjusted to guarantee that the time-frequency boxes do not intersect and that they leave no hole. Prove that any such tiling corresponds to a Walsh orthonormal basis of $L^2(\mathbb{R})$, $\{\psi_{j,n}^p\}_{(p,j,n) \in I}$.
8.11. Double tree. We want to construct a dictionary of block wavelet packet bases, which has the freedom to segment both the time and frequency axes. For this purpose, as in a local cosine basis dictionary, we construct a binary tree, which divides $[0,1]$ in $2^j$ intervals $[p 2^{-j}, (p+1) 2^{-j}]$, that correspond to nodes indexed by $p$ at the depth $j$ of the tree. At each of these nodes, we construct another tree of wavelet packet orthonormal bases of $L^2[p 2^{-j}, (p+1) 2^{-j}]$ [208].
(a) Define admissible sub-trees in this double tree, whose leaves correspond to orthonormal bases of $L^2[0,1]$. Give an example of an admissible tree and draw the resulting tiling of the time-frequency plane.
(b) Give a recursive equation that relates the number of admissible sub-trees of depth $J + 1$ and of depth $J$. Give an upper bound and a lower bound for the total number of orthogonal bases in this double tree dictionary.
(c) Can one find a basis in a double tree that is well adapted to implement an efficient transform code for audio signals? Justify your answer.
8.12. $^2$ An anisotropic local cosine basis for images is constructed with rectangular windows that have a width $2^j$ that may be different from their height $2^l$. Similarly to a local cosine tree, such bases are calculated by progressively dividing windows, but the horizontal and vertical divisions of these windows are done independently. Show that a dictionary of anisotropic local cosine bases can be represented as a graph. Implement in WaveLab an algorithm that decomposes images in a graph of anisotropic local cosine bases.
Chapter 9
An Approximation Tour

It is time to wonder why we are constructing so many different orthonormal bases. In signal processing, orthogonal bases are of interest because they can efficiently approximate certain types of signals with just a few vectors. Two examples of such applications are image compression and the estimation of noisy signals, which are studied in Chapters 10 and 11.

Approximation theory studies the error produced by different approximation schemes in an orthonormal basis. A linear approximation projects the signal over $M$ vectors chosen a priori. In Fourier or wavelet bases, this linear approximation is particularly precise for uniformly regular signals. However, better approximations are obtained by choosing the $M$ basis vectors depending on the signal. Signals with isolated singularities are well approximated in a wavelet basis with this non-linear procedure.

A further degree of freedom is introduced by choosing the basis adaptively, depending on the signal properties. From families of wavelet packet bases and local cosine bases, a fast dynamic programming algorithm is used to select the "best" basis that minimizes a Schur concave cost function. The approximation vectors chosen from this "best" basis outline the important signal structures, and characterize their time-frequency properties. Pursuit algorithms generalize these adaptive approximations by selecting the approximation vectors from redundant dictionaries of time-frequency atoms, with no orthogonality constraint.
9.1 Linear Approximations $^1$

A signal can be represented with $M$ parameters in an orthonormal basis by keeping $M$ inner products with vectors chosen a priori. In Fourier and wavelet bases, Sections 9.1.2 and 9.1.3 show that such a linear approximation is efficient only if the signal is uniformly regular. Linear approximations of random vectors are studied and optimized in Section 9.1.4.

9.1.1 Linear Approximation Error

Let $B = \{g_m\}_{m \in \mathbb{N}}$ be an orthonormal basis of a Hilbert space $H$. Any $f \in H$ can be decomposed in this basis:
\[ f = \sum_{m=0}^{+\infty} \langle f, g_m\rangle\, g_m . \]
If instead of representing $f$ by all inner products $\{\langle f, g_m\rangle\}_{m \in \mathbb{N}}$ we use only the first $M$, we get the approximation
\[ f_M = \sum_{m=0}^{M-1} \langle f, g_m\rangle\, g_m . \]
This approximation is the orthogonal projection of $f$ over the space $V_M$ generated by $\{g_m\}_{0 \le m < M}$. Since
\[ f - f_M = \sum_{m=M}^{+\infty} \langle f, g_m\rangle\, g_m , \]
the approximation error is
\[ \varepsilon_l[M] = \|f - f_M\|^2 = \sum_{m=M}^{+\infty} |\langle f, g_m\rangle|^2 . \tag{9.1} \]
The fact that $\|f\|^2 = \sum_{m=0}^{+\infty} |\langle f, g_m\rangle|^2 < +\infty$ implies that the error decays to zero:
\[ \lim_{M \to +\infty} \varepsilon_l[M] = 0 . \]
However, the decay rate of $\varepsilon_l[M]$ as $M$ increases depends on the decay
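of $|\langle f, g_m\rangle|$ as $m$ increases. The following theorem gives equivalent conditions on the decay of $\varepsilon_l[M]$ and $|\langle f, g_m\rangle|$.

Theorem 9.1 For any $s > 1/2$, there exist $A, B > 0$ such that if $\sum_{m=0}^{+\infty} |m|^{2s}\, |\langle f, g_m\rangle|^2 < +\infty$ then
\[ A \sum_{m=0}^{+\infty} m^{2s}\, |\langle f, g_m\rangle|^2 \;\le\; \sum_{M=0}^{+\infty} M^{2s-1}\, \varepsilon_l[M] \;\le\; B \sum_{m=0}^{+\infty} m^{2s}\, |\langle f, g_m\rangle|^2 \tag{9.2} \]
and hence $\varepsilon_l[M] = o(M^{-2s})$.

Proof $^1$. By inserting (9.1), we compute
\[ \sum_{M=0}^{+\infty} M^{2s-1}\, \varepsilon_l[M] = \sum_{M=0}^{+\infty} \sum_{m=M}^{+\infty} M^{2s-1}\, |\langle f, g_m\rangle|^2 = \sum_{m=0}^{+\infty} |\langle f, g_m\rangle|^2 \sum_{M=0}^{m} M^{2s-1} . \]
For any $s > 1/2$,
\[ \int_0^m x^{2s-1}\, dx \;\le\; \sum_{M=0}^{m} M^{2s-1} \;\le\; \int_1^{m+1} x^{2s-1}\, dx , \]
which implies that $\sum_{M=0}^{m} M^{2s-1} \sim m^{2s}$ and hence proves (9.2).
To verify that $\varepsilon_l[M] = o(M^{-2s})$, observe that $\varepsilon_l[m] \ge \varepsilon_l[M]$ for $m \le M$, so
\[ \varepsilon_l[M] \sum_{m=M/2}^{M-1} m^{2s-1} \;\le\; \sum_{m=M/2}^{M-1} m^{2s-1}\, \varepsilon_l[m] \;\le\; \sum_{m=M/2}^{+\infty} m^{2s-1}\, \varepsilon_l[m] . \tag{9.3} \]
Since $\sum_{m=1}^{+\infty} m^{2s-1}\, \varepsilon_l[m] < +\infty$ it follows that
\[ \lim_{M \to +\infty} \sum_{m=M/2}^{+\infty} m^{2s-1}\, \varepsilon_l[m] = 0 . \]
Moreover, there exists $C > 0$ such that $\sum_{m=M/2}^{M-1} m^{2s-1} \ge C\, M^{2s}$, so (9.3) implies that $\lim_{M \to +\infty} \varepsilon_l[M]\, M^{2s} = 0$.

This theorem proves that the linear approximation error of $f$ in the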
basis $B$ decays faster than $M^{-2s}$ if $f$ belongs to the space
\[ W_{B,s} = \left\{ f \in H \;:\; \sum_{m=0}^{+\infty} m^{2s}\, |\langle f, g_m\rangle|^2 < +\infty \right\} . \]
The next sections prove that if $B$ is a Fourier or wavelet basis, then $W_{B,s}$ is a Sobolev space. Observe that the linear approximation of $f$ from the first $M$ vectors of $B$ is not always precise because these vectors are not necessarily the best ones with which to approximate $f$. Non-linear approximations calculated with vectors chosen adaptively depending upon $f$ are studied in Section 9.2.

9.1.2 Linear Fourier Approximations

The Fourier basis can approximate uniformly regular signals with few
low-frequency sinusoidal waves. The approximation error is related to the Sobolev differentiability. It is also calculated for discontinuous signals having a bounded total variation.

Sobolev Differentiability
The smoothness of $f$ can be measured by the number of times it is differentiable. However, to distinguish the regularity of functions that are $n - 1$ times, but not $n$ times, continuously differentiable, we must extend the notion of differentiability to non-integers. This can be done in the Fourier domain. Recall that the Fourier transform of the derivative $f'(t)$ is $i\omega \hat f(\omega)$. The Plancherel formula proves that $f' \in L^2(\mathbb{R})$ if
\[ \int_{-\infty}^{+\infty} |\omega|^2\, |\hat f(\omega)|^2\, d\omega = 2\pi \int_{-\infty}^{+\infty} |f'(t)|^2\, dt < +\infty . \]
This suggests replacing the usual pointwise definition of the derivative by a definition based on the Fourier transform. We say that $f \in L^2(\mathbb{R})$ is differentiable in the sense of Sobolev if
\[ \int_{-\infty}^{+\infty} |\omega|^2\, |\hat f(\omega)|^2\, d\omega < +\infty . \tag{9.4} \]
This integral imposes that $|\hat f(\omega)|$ must have a sufficiently fast decay when the frequency $\omega$ goes to $+\infty$. As in Section 2.3.1, the regularity of $f$ is measured from the asymptotic decay of its Fourier transform.
This definition is generalized for any $s > 0$. The space $W^s(\mathbb{R})$ of Sobolev functions that are $s$ times differentiable is the space of functions $f \in L^2(\mathbb{R})$ whose Fourier transforms satisfy [72]
\[ \int_{-\infty}^{+\infty} |\omega|^{2s}\, |\hat f(\omega)|^2\, d\omega < +\infty . \tag{9.5} \]
If $s > n + 1/2$, then one can verify (Problem 9.2) that $f$ is $n$ times continuously differentiable. We define the space $W^s[0,1]$ of functions on $[0,1]$ that are $s$ times differentiable in the sense of Sobolev as the space of functions $f \in L^2[0,1]$ that can be extended outside $[0,1]$ into a function $f \in W^s(\mathbb{R})$.

Fourier Approximations
Theorem 3.2 proves (modulo a change of
variable) that $\{e^{i 2\pi m t}\}_{m \in \mathbb{Z}}$ is an orthonormal basis of $L^2[0,1]$. We can thus decompose $f \in L^2[0,1]$ in the Fourier series
\[ f(t) = \sum_{m=-\infty}^{+\infty} \langle f(u), e^{i 2\pi m u}\rangle\, e^{i 2\pi m t} \tag{9.6} \]
with
\[ \langle f(u), e^{i 2\pi m u}\rangle = \int_0^1 f(u)\, e^{-i 2\pi m u}\, du . \]
The decomposition (9.6) defines a periodic extension of $f$ for all $t \in \mathbb{R}$. The decay of the Fourier coefficients $|\langle f(u), e^{i 2\pi m u}\rangle|$ as $m$ increases depends on the regularity of this periodic extension. To avoid creating singularities at $t = 0$ or at $t = 1$ with this periodization, we suppose that the support of $f$ is strictly included in $(0,1)$. One can then prove (not trivial) that if $f \in L^2[0,1]$ is a function whose support is included in $(0,1)$, then $f \in W^s[0,1]$ if and only if
\[ \sum_{m=-\infty}^{+\infty} |m|^{2s}\, |\langle f(u), e^{i 2\pi m u}\rangle|^2 < +\infty . \tag{9.7} \]
The linear approximation of $f \in L^2[0,1]$ by the $M$ sinusoids of lower frequencies is
\[ f_M(t) = \sum_{|m| \le M/2} \langle f(u), e^{i 2\pi m u}\rangle\, e^{i 2\pi m t} . \]
For differentiable functions in the sense of Sobolev, the following proposition computes the approximation error
\[ \varepsilon_l[M] = \|f - f_M\|^2 = \int_0^1 |f(t) - f_M(t)|^2\, dt = \sum_{|m| > M/2} |\langle f(u), e^{i 2\pi m u}\rangle|^2 . \tag{9.8} \]

Proposition 9.1 Let $f \in L^2[0,1]$ be a function whose support is included in $(0,1)$. Then $f \in W^s[0,1]$ if and only if
\[ \sum_{M=1}^{+\infty} M^{2s}\, \frac{\varepsilon_l[M]}{M} < +\infty , \tag{9.9} \]
which implies $\varepsilon_l[M] = o(M^{-2s})$.

Functions in $W^s[0,1]$ with a support in $(0,1)$ are characterized by (9.7). This proposition is therefore a consequence of Theorem 9.1. The linear Fourier approximation thus decays quickly if and only if $f$ has a large regularity exponent $s$ in the sense of Sobolev.

Discontinuities and Bounded Variation
If $f$ is discontinuous,
then $f \notin W^s[0,1]$ for any $s > 1/2$. Proposition 9.1 thus proves that $\varepsilon_l[M]$ can decay like $M^{-\alpha}$ only if $\alpha \le 1$. For bounded variation functions, which are introduced in Section 2.3.3, the following proposition proves that $\varepsilon_l[M] = O(M^{-1})$. A function has a bounded variation if
\[ \|f\|_V = \int_0^1 |f'(t)|\, dt < +\infty . \]
The derivative must be taken in the sense of distributions because $f$ may be discontinuous, as is the case for $f = \mathbf{1}_{[0,1/2]}$. Recall that $a[M] \sim b[M]$ if $a[M] = O(b[M])$ and $b[M] = O(a[M])$.

Proposition 9.2
- If $\|f\|_V < +\infty$ then $\varepsilon_l[M] = O(\|f\|_V^2\, M^{-1})$.
- If $f = C\, \mathbf{1}_{[0,1/2]}$ then $\varepsilon_l[M] \sim \|f\|_V^2\, M^{-1}$.

Proof $^2$. If $\|f\|_V < +\infty$ then
\[ |\langle f(u), \exp(i 2 m \pi u)\rangle| = \left| \int_0^1 f(u)\, \exp(-i 2 m \pi u)\, du \right| = \left| \int_0^1 f'(u)\, \frac{\exp(-i 2 m \pi u)}{-i 2 m \pi}\, du \right| \le \frac{\|f\|_V}{2 |m| \pi} . \]
Hence
\[ \varepsilon_l[M] = \sum_{|m| > M/2} |\langle f(u), \exp(i 2 m \pi u)\rangle|^2 \le \frac{\|f\|_V^2}{4 \pi^2} \sum_{|m| > M/2} \frac{1}{m^2} = O(\|f\|_V^2\, M^{-1}) . \]
If $f = C\, \mathbf{1}_{[0,1/2]}$ then $\|f\|_V = 2C$ and
\[ |\langle f(u), \exp(i 2 m \pi u)\rangle| = \begin{cases} 0 & \text{if } m \ne 0 \text{ is even} \\ C / (\pi |m|) & \text{if } m \text{ is odd,} \end{cases} \]
so $\varepsilon_l[M] \sim C^2\, M^{-1}$.
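
The second statement of Proposition 9.2 can be checked numerically. The following Python sketch is an illustration, not part of the original text: the function name `fourier_linear_error` is a hypothetical helper, and the error is evaluated from the closed-form coefficients of the proof together with the Parseval identity $\|f\|^2 = C^2/2$ for $f = C\, \mathbf{1}_{[0,1/2]}$. Under these assumptions the product $M\, \varepsilon_l[M]$ should level off at a constant, close to $2C^2/\pi^2$.

```python
import numpy as np

def fourier_linear_error(M, C=1.0):
    """Error of keeping the Fourier coefficients |m| <= M/2 of f = C * 1_[0,1/2]."""
    m = np.arange(1, M // 2 + 1, 2)              # odd frequencies 1 <= m <= M/2
    kept = (C / 2) ** 2                          # energy of the m = 0 coefficient
    kept += 2 * np.sum((C / (np.pi * m)) ** 2)   # +m and -m contribute equally
    return C ** 2 / 2 - kept                     # Parseval: ||f||^2 minus kept energy

for M in (16, 64, 256, 1024, 4096):
    # The product M * error should stabilize as M grows (decay like 1/M).
    print(M, M * fourier_linear_error(M))
```

Even frequencies are skipped because their coefficients vanish, so only the odd terms and the mean value carry energy.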
This proposition shows that when $f$ is discontinuous with bounded variations, then $\varepsilon_l[M]$ decays typically like $M^{-1}$. Figure 9.1(b) shows a bounded variation signal approximated by Fourier coefficients of lower frequencies. The approximation error is concentrated in the neighborhood of discontinuities where the removal of high frequencies creates Gibbs oscillations (see Section 2.3.1).

Localized Approximations
To localize Fourier series approximations over intervals, we multiply $f$ by smooth windows that cover each of these intervals. The Balian-Low Theorem 5.6 proves that one cannot build local Fourier bases with smooth windows of compact support. However, Section 8.4.2 constructs orthonormal bases by replacing complex exponentials by cosine functions. For appropriate windows $g_p$ of compact support $[a_p - \eta_p,\, a_{p+1} + \eta_{p+1}]$, Corollary 8.1 constructs an orthonormal basis of $L^2(\mathbb{R})$:
\[ \left\{ g_{p,k}(t) = g_p(t)\, \sqrt{\frac{2}{l_p}}\, \cos\!\left[ \pi \left( k + \frac{1}{2} \right) \frac{t - a_p}{l_p} \right] \right\}_{k \in \mathbb{N},\, p \in \mathbb{Z}} . \]
Figure 9.1: Top: Original signal $f$. Middle: Signal $f_M$ approximated from lower frequency Fourier coefficients, with $M/N = 0.15$ and $\|f - f_M\| / \|f\| = 8.63 \cdot 10^{-2}$. Bottom: Signal $f_M$ approximated from larger scale Daubechies 4 wavelet coefficients, with $M/N = 0.15$ and $\|f - f_M\| / \|f\| = 8.58 \cdot 10^{-2}$.

Writing $f$ in this local cosine basis is equivalent to segmenting it into
several windowed components $f_p(t) = f(t)\, g_p(t)$, which are decomposed in a cosine IV basis. If $g_p$ is $C^\infty$, the regularity of $g_p(t)\, f(t)$ is the same as the regularity of $f$ over $[a_p - \eta_p,\, a_{p+1} + \eta_{p+1}]$. Section 8.3.2 relates cosine IV coefficients to Fourier series coefficients. It follows from Proposition 9.1 that if $f_p \in W^s(\mathbb{R})$, then the approximation
\[ f_{p,M} = \sum_{k=0}^{M-1} \langle f, g_{p,k}\rangle\, g_{p,k} \]
yields an error
\[ \varepsilon_{p,l}[M] = \|f_p - f_{p,M}\|^2 = o(M^{-2s}) . \]
The approximation error in a local cosine basis thus depends on the local regularity of $f$ over each window support.

9.1.3 Linear Multiresolution Approximations

Linear approximations of $f$ from large scale wavelet coefficients are
equivalent to finite element approximations over uniform grids. The approximation error depends on the uniform regularity of $f$. In a periodic orthogonal wavelet basis, this approximation behaves like a Fourier series approximation. In both cases, it is necessary to impose that $f$ have a support inside $(0,1)$ to avoid border discontinuities created by the periodization. This result is improved by the adapted wavelet basis of Section 7.5.3, whose border wavelets keep their vanishing moments. These wavelet bases efficiently approximate any function that is uniformly regular over $[0,1]$, as well as the restriction to $[0,1]$ of regular functions having a larger support.

Uniform Approximation Grid
Section 7.5 explains how to design
wavelet orthonormal bases of $L^2[0,1]$, with a maximum scale $2^J < 1$:
\[ \left[ \{\phi_{J,n}\}_{0 \le n < 2^{-J}} \;,\; \{\psi_{j,n}\}_{-\infty < j \le J,\; 0 \le n < 2^{-j}} \right] . \tag{9.10} \]
We suppose that the wavelets $\psi_{j,n}$ are in $C^q$ and have $q$ vanishing moments. The $M = 2^{-l}$ scaling functions and wavelets at scales $2^j > 2^l$ define an orthonormal basis of the approximation space $V_l$:
\[ \left[ \{\phi_{J,n}\}_{0 \le n < 2^{-J}} \;,\; \{\psi_{j,n}\}_{l < j \le J,\; 0 \le n < 2^{-j}} \right] . \tag{9.11} \]
The approximation of $f$ over the $M$ first wavelets and scaling functions is an orthogonal projection on $V_l$:
\[ f_M = P_{V_l} f = \sum_{j=l+1}^{J} \sum_{n=0}^{2^{-j}-1} \langle f, \psi_{j,n}\rangle\, \psi_{j,n} + \sum_{n=0}^{2^{-J}-1} \langle f, \phi_{J,n}\rangle\, \phi_{J,n} . \tag{9.12} \]
Since $V_l$ also admits an orthonormal basis of $M = 2^{-l}$ scaling functions $\{\phi_{l,n}\}_{0 \le n < 2^{-l}}$, this projection can be rewritten:
\[ f_M = P_{V_l} f = \sum_{n=0}^{2^{-l}-1} \langle f, \phi_{l,n}\rangle\, \phi_{l,n} . \tag{9.13} \]
This summation is an approximation of $f$ with $2^{-l}$ finite elements $\phi_{l,n}(t) = \phi_l(t - 2^l n)$ translated over a uniform grid. The approximation error is the energy of wavelet coefficients at scales finer than $2^l$:
\[ \varepsilon_l[M] = \|f - f_M\|^2 = \sum_{j=-\infty}^{l} \sum_{n=0}^{2^{-j}-1} |\langle f, \psi_{j,n}\rangle|^2 . \tag{9.14} \]
If $2^{-l} < M < 2^{-l+1}$, one must include in the approximations (9.12) and (9.13) the coefficients of the $M - 2^{-l}$ wavelets $\{\psi_{l-1,n}\}_{0 \le n < M - 2^{-l}}$ at the scale $2^{l-1}$.

Approximation error
Like a Fourier basis, a wavelet basis provides an efficient approximation of functions that are $s$ times differentiable in the sense of Sobolev over $[0,1]$ (i.e., functions of $W^s[0,1]$). If $\psi$ has $q$ vanishing moments then (6.11) proves that the wavelet transform is a multiscale differential operator of order $q$. To test the differentiability of $f$ up to order $s$ we thus need $q > s$. The following theorem gives a necessary and sufficient condition on the wavelet coefficients so that $f \in W^s[0,1]$.

Theorem 9.2 Let $0 < s < q$ be a Sobolev exponent. A function $f \in$
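$L^2[0,1]$ is in $W^s[0,1]$ if and only if
\[ \sum_{j=-\infty}^{J} \sum_{n=0}^{2^{-j}-1} 2^{-2sj}\, |\langle f, \psi_{j,n}\rangle|^2 < +\infty . \tag{9.15} \]

Proof $^2$. We give an intuitive justification but not a proof of this result. To simplify, we suppose that the support of $f$ is included in $(0,1)$. If we extend $f$ by zeros outside $[0,1]$ then $f \in W^s(\mathbb{R})$, which means that
\[ \int_{-\infty}^{+\infty} |\omega|^{2s}\, |\hat f(\omega)|^2\, d\omega < +\infty . \tag{9.16} \]
The low frequency part of this integral always remains finite because $f \in L^2(\mathbb{R})$:
\[ \int_{|\omega| \le 2^{-J}\pi} |\omega|^{2s}\, |\hat f(\omega)|^2\, d\omega \;\le\; 2^{-2sJ}\, \pi^{2s} \int_{|\omega| \le 2^{-J}\pi} |\hat f(\omega)|^2\, d\omega \;\le\; 2\pi\, 2^{-2sJ}\, \pi^{2s}\, \|f\|^2 . \]
The energy of $\hat\psi_{j,n}$ is essentially concentrated in the intervals $[-2^{-j} 2\pi, -2^{-j}\pi] \cup [2^{-j}\pi, 2^{-j} 2\pi]$. As a consequence
\[ \sum_{n=0}^{2^{-j}-1} |\langle f, \psi_{j,n}\rangle|^2 \;\sim\; \int_{2^{-j}\pi \le |\omega| \le 2^{-j+1}\pi} |\hat f(\omega)|^2\, d\omega . \]
Over this interval $|\omega| \sim 2^{-j}$, so
\[ \sum_{n=0}^{2^{-j}-1} 2^{-2sj}\, |\langle f, \psi_{j,n}\rangle|^2 \;\sim\; \int_{2^{-j}\pi \le |\omega| \le 2^{-j+1}\pi} |\omega|^{2s}\, |\hat f(\omega)|^2\, d\omega . \]
It follows that
\[ \sum_{j=-\infty}^{J} \sum_{n=0}^{2^{-j}-1} 2^{-2sj}\, |\langle f, \psi_{j,n}\rangle|^2 \;\sim\; \int_{|\omega| \ge 2^{-J}\pi} |\omega|^{2s}\, |\hat f(\omega)|^2\, d\omega , \]
which explains why (9.16) is equivalent to (9.15).

This theorem proves that the Sobolev regularity of $f$ is equivalent to a fast decay of the wavelet coefficients $|\langle f, \psi_{j,n}\rangle|$ when the scale $2^j$ decreases. If $\psi$ has $q$ vanishing moments but is not $q$ times continuously differentiable, then $f \in W^s[0,1]$ implies (9.15), but the opposite implication is not true. The following proposition uses the decay condition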
(9.15) to compute the approximation error with $M$ wavelets.

Proposition 9.3 Let $0 < s < q$ be a Sobolev exponent. A function $f \in L^2[0,1]$ is in $W^s[0,1]$ if and only if
\[ \sum_{M=1}^{+\infty} M^{2s}\, \frac{\varepsilon_l[M]}{M} < +\infty , \tag{9.17} \]
which implies $\varepsilon_l[M] = o(M^{-2s})$.

Proof $^2$. Let us write the wavelets $\psi_{j,n} = g_m$ with $m = 2^{-j} + n$. One can verify that the Sobolev condition (9.15) is equivalent to
\[ \sum_{m=0}^{+\infty} |m|^{2s}\, |\langle f, g_m\rangle|^2 < +\infty . \]
The proof ends by applying Theorem 9.1.

Proposition 9.3 proves that $f \in W^s[0,1]$ if and only if the approximation error $\varepsilon_l[M]$ decays slightly faster than $M^{-2s}$. The wavelet approximation error is of the same order as the Fourier approximation error calculated in (9.9). If the wavelet has $q$ vanishing moments but is not $q$ times continuously differentiable, then $f \in W^s[0,1]$ implies (9.17) but the opposite implication is false.

If $f$ has a discontinuity in $(0,1)$ then $f \notin W^s[0,1]$ for $s > 1/2$ so Proposition 9.3 proves that we cannot have $\varepsilon_l[M] = O(M^{-\alpha})$ for $\alpha > 1$. If $f$ has bounded variation, one can verify (Problem 9.4) that $\varepsilon_l[M] = O(M^{-1})$, and if $f = \mathbf{1}_{[0,1/2]}$ then $\varepsilon_l[M] \sim M^{-1}$. This result is identical to Proposition 9.2, obtained in a Fourier basis.
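
The $M^{-1}$ decay can be observed on a small numerical example. The sketch below is an illustration under simplifying assumptions that are not in the text: it uses piecewise-constant (Haar) approximation spaces rather than the smoother wavelets of Section 7.5.3, a step with a discontinuity at $t = 1/3$, and a fine sampling of $N$ points; the function name `haar_linear_error` is hypothetical. Projecting on $M$ Haar scaling functions amounts to averaging over $M$ intervals of length $1/M$, and only the interval that contains the discontinuity contributes to the error.

```python
import numpy as np

def haar_linear_error(f, M):
    """Squared L2[0,1] error of projecting the samples f on M equal-width boxes."""
    blocks = f.reshape(M, -1)                     # M blocks of N/M samples each
    f_M = blocks.mean(axis=1, keepdims=True)      # block averages = projection
    return np.mean((blocks - f_M) ** 2)           # Riemann sum of |f - f_M|^2

N = 2 ** 16
t = (np.arange(N) + 0.5) / N
f = (t < 1.0 / 3.0).astype(float)                 # step with a non-dyadic jump

for k in range(2, 11):
    M = 2 ** k
    # M * error equals theta * (1 - theta) for the block holding the jump,
    # where theta is the fraction of that block on which f = 1.
    print(M, M * haar_linear_error(f, M))
```

A discontinuity at a dyadic point such as $t = 1/2$ would be reproduced exactly by the Haar spaces, which is why a non-dyadic location is used here.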
Figure 9.1 gives an example of a discontinuous signal with bounded variation, which is approximated by its larger scale wavelet coefficients. The largest amplitude errors are in the neighborhood of singularities, where the scale should be refined. The relative approximation error $\|f - f_M\| / \|f\| = 8.56 \cdot 10^{-2}$ is almost the same as in a Fourier basis.

Multidimensional Approximations
The results of this section are easily extended to multidimensional signals decomposed in the separable wavelet basis constructed in Section 7.7.4. If $f \in L^2[0,1]^d$ then $f$ is approximated by $M = 2^{-d\,l}$ wavelets at scales $2^j > 2^l$. As in dimension $d = 1$, the decay rate of the error $\varepsilon_l[M]$ depends on the Sobolev regularity of $f$. In dimension $d = 2$, the linear approximation error of bounded variation functions may decay arbitrarily slowly. For any decreasing sequence $\varepsilon[M]$ such that $\lim_{M \to +\infty} \varepsilon[M] = 0$ one can show (Problem 9.5) that there exists a bounded variation image $f$ on $[0,1]^2$ such that $\varepsilon_l[M] \ge \varepsilon[M]$. If $f$ is discontinuous along a contour of length $L > 0$ then one cannot have $\varepsilon_l[M] = O(M^{-\alpha})$ for $\alpha > 0$. The approximation error thus decays extremely slowly.

Figure 9.2: (a): Original Lena $f$ of $N^2 = 256^2$ pixels. (b): Linear approximations $f_M$ from $M = N^2/16$ Symmlet 4 wavelet coefficients at scales $2^j > 2^2 N^{-1}$: $\|f - f_M\| / \|f\| = 0.036$.
Figure 9.2(a) has $N^2 = 256^2$ pixels. Since its support is normalized to $[0,1]^2$, the $N^2$ wavelet coefficients are at scales $1 > 2^j > N^{-1}$. Figure 9.2(b) is approximated with $M = 2^{-4} N^2$ wavelet coefficients at scales $2^j > 2^2 N^{-1}$, which suppresses the finer details. Gibbs oscillations appear in the neighborhood of contours. Section 9.2.2 explains how to improve this approximation with a non-linear selection of wavelet coefficients.

9.1.4 Karhunen-Loève Approximations $^2$

Let us consider a whole class of signals that we approximate with the first $M$ vectors of a basis. These signals are modeled as realizations of a random vector $F[n]$ of size $N$. We show that the basis that minimizes the average linear approximation error is the Karhunen-Loève basis (principal components).
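
This optimality can be checked numerically on a toy example. The sketch below is an illustration under assumptions that are not in the text: a real-valued synthetic covariance matrix `K` built from random data, an expected error computed as the sum of $\langle K g_m, g_m\rangle$ over the discarded basis vectors, and a random orthonormal basis `Q` used as a competitor. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 16, 4

A = rng.standard_normal((N, N))
K = A @ A.T / N                          # synthetic covariance (symmetric, >= 0)

def linear_error(basis, M):
    """Sum of <K g_m, g_m> over the discarded vectors g_m, m >= M (basis columns)."""
    tail = basis[:, M:]
    return np.trace(tail.T @ K @ tail)

eigvals, G = np.linalg.eigh(K)           # eigenvalues returned in ascending order
G = G[:, ::-1]                           # eigenvectors by decreasing eigenvalue
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))   # another orthonormal basis

# The eigenvector basis should give the smaller expected error.
print(linear_error(G, M), linear_error(Q, M))
```

With the eigenvectors sorted by decreasing eigenvalue, the error reduces to the sum of the $N - M$ smallest eigenvalues of `K`.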
Appendix A.6 reviews the covariance properties of random vectors. If $F[n]$ does not have a zero mean, we subtract the expected value $E\{F[n]\}$ from $F[n]$ to get a zero mean. The random vector $F$ can be decomposed in an orthogonal basis $\{g_m\}_{0 \le m < N}$:
\[ F = \sum_{m=0}^{N-1} \langle F, g_m\rangle\, g_m . \]
Each coefficient
\[ \langle F, g_m\rangle = \sum_{n=0}^{N-1} F[n]\, g_m^*[n] \]
is a random variable (see Appendix A.6). The approximation from the first $M$ vectors of the basis is
\[ F_M = \sum_{m=0}^{M-1} \langle F, g_m\rangle\, g_m . \]
The resulting mean-square error is
\[ \varepsilon_l[M] = E\left\{ \|F - F_M\|^2 \right\} = \sum_{m=M}^{N-1} E\left\{ |\langle F, g_m\rangle|^2 \right\} . \]
This error is related to the covariance of $F$ defined by
\[ R[n,m] = E\{F[n]\, F^*[m]\} . \]
Let $K$ be the covariance operator represented by this matrix. For any vector $x[n]$,
\[ E\left\{ |\langle F, x\rangle|^2 \right\} = E\left\{ \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} F[n]\, F^*[m]\, x^*[n]\, x[m] \right\} = \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} R[n,m]\, x^*[n]\, x[m] = \langle K x, x\rangle . \]
The error $\varepsilon_l[M]$ is therefore a sum of the last $N - M$ diagonal coefficients of the covariance operator
\[ \varepsilon_l[M] = \sum_{m=M}^{N-1} \langle K g_m, g_m\rangle . \]
The covariance operator $K$ is Hermitian and positive and is thus diagonalized in an orthogonal basis called a Karhunen-Loève basis. This basis is not unique if several eigenvalues are equal. The following theorem proves that a Karhunen-Loève basis is optimal for linear approximations.

Theorem 9.3 Let $K$ be a covariance operator. For all $M \ge 1$, the approximation error
\[ \varepsilon_l[M] = \sum_{m=M}^{N-1} \langle K g_m, g_m\rangle \]
is minimum if and only if $\{g_m\}_{0 \le m < N}$ is a Karhunen-Loève basis whose vectors are ordered by decreasing eigenvalues
\[ \langle K g_m, g_m\rangle \ge \langle K g_{m+1}, g_{m+1}\rangle \quad \text{for } 0 \le m < N - 1 . \]

Proof $^3$. Let us consider an arbitrary orthonormal basis $\{h_m\}_{0 \le m < N}$.
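The trace $\mathrm{tr}(K)$ of $K$ is independent of the basis:
\[ \mathrm{tr}(K) = \sum_{m=0}^{N-1} \langle K h_m, h_m\rangle . \]
The basis that minimizes $\sum_{m=M}^{N-1} \langle K h_m, h_m\rangle$ thus maximizes $\sum_{m=0}^{M-1} \langle K h_m, h_m\rangle$. Let $\{g_m\}_{0 \le m < N}$ be a basis that diagonalizes $K$:
\[ K g_m = \sigma_m^2\, g_m \quad \text{with} \quad \sigma_m^2 \ge \sigma_{m+1}^2 \quad \text{for } 0 \le m < N - 1 . \]
The theorem is proved by verifying that for all $M \ge 0$,
\[ \sum_{m=0}^{M-1} \langle K h_m, h_m\rangle \;\le\; \sum_{m=0}^{M-1} \langle K g_m, g_m\rangle = \sum_{m=0}^{M-1} \sigma_m^2 . \]
To relate $\langle K h_m, h_m\rangle$ to the eigenvalues $\{\sigma_i^2\}_{0 \le i < N}$, we expand $h_m$ in the basis $\{g_i\}_{0 \le i < N}$:
\[ \langle K h_m, h_m\rangle = \sum_{i=0}^{N-1} |\langle h_m, g_i\rangle|^2\, \sigma_i^2 . \]
Hence
\[ \sum_{m=0}^{M-1} \langle K h_m, h_m\rangle = \sum_{m=0}^{M-1} \sum_{i=0}^{N-1} |\langle h_m, g_i\rangle|^2\, \sigma_i^2 = \sum_{i=0}^{N-1} q_i\, \sigma_i^2 \tag{9.18} \]
with
\[ 0 \le q_i = \sum_{m=0}^{M-1} |\langle h_m, g_i\rangle|^2 \le 1 \quad \text{and} \quad \sum_{i=0}^{N-1} q_i = M . \]
We evaluate
\[ \begin{aligned} \sum_{m=0}^{M-1} \langle K h_m, h_m\rangle - \sum_{i=0}^{M-1} \sigma_i^2 &= \sum_{i=0}^{N-1} q_i\, \sigma_i^2 - \sum_{i=0}^{M-1} \sigma_i^2 \\ &= \sum_{i=0}^{N-1} q_i\, \sigma_i^2 - \sum_{i=0}^{M-1} \sigma_i^2 + \sigma_{M-1}^2 \left( M - \sum_{i=0}^{N-1} q_i \right) \\ &= \sum_{i=0}^{M-1} (\sigma_i^2 - \sigma_{M-1}^2)\, (q_i - 1) + \sum_{i=M}^{N-1} q_i\, (\sigma_i^2 - \sigma_{M-1}^2) . \end{aligned} \]
Since the eigenvalues are listed in order of decreasing amplitude, it follows that
\[ \sum_{m=0}^{M-1} \langle K h_m, h_m\rangle - \sum_{m=0}^{M-1} \sigma_m^2 \le 0 . \]
Suppose that this last inequality is an equality. We finish the proof by showing that $\{h_m\}_{0 \le m < N}$ must be a Karhunen-Loève basis. If $i < M$, then $\sigma_i^2 \ne \sigma_{M-1}^2$ implies $q_i = 1$. If $i \ge M$, then $\sigma_i^2 \ne \sigma_{M-1}^2$ implies $q_i = 0$. This is valid for all $M \ge 0$ if $\langle h_m, g_i\rangle \ne 0$ only when $\sigma_i^2 = \sigma_m^2$. This means that the change of basis is performed inside each eigenspace of $K$ so $\{h_m\}_{0 \le m < N}$ also diagonalizes $K$.

Theorem 9.3 proves that a Karhunen-Loève basis yields the smallest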
average error when approximating a class of signals by their projection on $M$ orthogonal vectors, chosen a priori. This result has a simple geometrical interpretation. The realizations of $F$ define a cloud of points in $\mathbb{C}^N$. The density of this cloud specifies the probability distribution of $F$. The vectors $g_m$ of the Karhunen-Loève basis give the directions of the principal axes of the cloud. Large eigenvalues $\sigma_m^2$ correspond to directions $g_m$ along which the cloud is highly elongated. Theorem 9.3 proves that projecting the realizations of $F$ on these principal components yields the smallest average error. If $F$ is a Gaussian random vector, the probability density is uniform along ellipsoids whose axes are proportional to $\sigma_m$ in the direction of $g_m$. These principal directions are thus truly the preferred directions of the process.

Random Shift Processes
If the process is not Gaussian, its probability distribution can have a complex geometry, and a linear approximation along the principal axes may not be efficient. As an example, we
consider a random vector $F[n]$ of size $N$ that is a random shift modulo $N$ of a deterministic signal $f[n]$ of zero mean, $\sum_{n=0}^{N-1} f[n] = 0$:
\[ F[n] = f[(n - P) \bmod N] . \tag{9.19} \]
The shift $P$ is an integer random variable whose probability distribution is uniform on $[0, N-1]$:
\[ \Pr(P = p) = \frac{1}{N} \quad \text{for } 0 \le p < N . \]
This process has a zero mean:
\[ E\{F[n]\} = \frac{1}{N} \sum_{p=0}^{N-1} f[(n - p) \bmod N] = 0 , \]
and its covariance is
\[ R[n,k] = E\{F[n]\, F[k]\} = \frac{1}{N} \sum_{p=0}^{N-1} f[(n - p) \bmod N]\, f[(k - p) \bmod N] = \frac{1}{N}\, f \circledast \bar f[n - k] \quad \text{with } \bar f[n] = f[-n] . \tag{9.20} \]
Hence $R[n,k] = R_F[n - k]$ with
\[ R_F[k] = \frac{1}{N}\, f \circledast \bar f[k] . \]
Since $R_F$ is $N$ periodic, $F$ is a circular stationary random vector, as defined in Appendix A.6. The covariance operator $K$ is a circular convolution with $R_F$ and is therefore diagonalized in the discrete Fourier Karhunen-Loève basis $\left\{ \frac{1}{\sqrt{N}} \exp\!\left( \frac{i 2\pi m n}{N} \right) \right\}_{0 \le m < N}$. The eigenvalues are given by the Fourier transform of $R_F$:
\[ \sigma_m^2 = \hat R_F[m] = \frac{1}{N}\, |\hat f[m]|^2 . \tag{9.21} \]
Theorem 9.3 proves that a linear approximation that projects $F$ on $M$ vectors selected a priori is optimized in this Fourier basis. To better understand this result, let us consider an extreme case where $f[n] = \delta[n] - \delta[n-1]$. Theorem 9.3 guarantees that the Fourier Karhunen-Loève basis produces a smaller expected approximation error than does a canonical basis of Diracs $\{g_m[n] = \delta[n - m]\}_{0 \le m < N}$. Indeed, we do not know a priori the abscissa of the non-zero coefficients of $F$, so there is no particular Dirac that is better adapted to perform the approximation. Since the Fourier vectors cover the whole support of $F$, they always absorb part of the signal energy:
\[ E\left\{ \left| \left\langle F[n], \frac{1}{\sqrt{N}} \exp\!\left( \frac{i 2\pi m n}{N} \right) \right\rangle \right|^2 \right\} = \hat R_F[m] = \frac{4}{N}\, \sin^2\!\left( \frac{\pi m}{N} \right) . \]
Selecting $M$ higher frequency Fourier coefficients thus yields a better mean-square approximation than choosing a priori $M$ Dirac vectors to perform the approximation.
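
The random shift example is small enough to verify directly. The following sketch is an illustration, not part of the original text: it builds the covariance matrix by averaging over the $N$ equally likely shifts, compares its eigenvalues with (9.21), and compares the expected error of the best $M$ Fourier vectors with that of $M$ Diracs chosen a priori. The helper names and the choice $N = 64$ are assumptions made for the illustration.

```python
import numpy as np

N = 64
f = np.zeros(N)
f[0], f[1] = 1.0, -1.0                       # f[n] = delta[n] - delta[n-1]

# Rows are the N equally likely realizations F[n] = f[(n - p) mod N].
shifts = np.stack([np.roll(f, p) for p in range(N)])
R = shifts.T @ shifts / N                    # covariance R[n, k] = E{F[n] F[k]}

eig_fft = np.abs(np.fft.fft(f)) ** 2 / N     # eigenvalues predicted by (9.21)
eig_closed = 4 * np.sin(np.pi * np.arange(N) / N) ** 2 / N

def fourier_error(M):
    """Expected error when keeping the M Fourier vectors of largest eigenvalue."""
    return np.sort(eig_fft)[: N - M].sum()

def dirac_error(M):
    """Expected error when keeping M Diracs chosen a priori."""
    return np.diag(R)[M:].sum()              # every diagonal entry equals 2/N

for M in (8, 16, 32):
    print(M, fourier_error(M), dirac_error(M))
```

The gap between the two errors is modest, which reflects the fact that the eigenvalues all have a comparable order of magnitude.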
The linear approximation of $F$ in a Fourier basis is not efficient because all the eigenvalues $\hat R_F[m]$ have the same order of magnitude. A simple non-linear algorithm can improve this approximation. In a Dirac basis, $F$ is exactly reproduced by selecting the two Diracs corresponding to the largest amplitude coefficients, whose positions $P$ and $P + 1$ depend on each realization of $F$. A non-linear algorithm that selects the largest amplitude coefficient for each realization of $F$ is not efficient in a Fourier basis. Indeed, the realizations of $F$ do not have their energy concentrated over a few large amplitude Fourier coefficients. This example shows that when $F$ is not a Gaussian process, a non-linear approximation may be much more precise than a linear approximation, and the Karhunen-Loève basis is no longer optimal.

9.2 Non-Linear Approximations $^1$
Linear approximations project the signal on $M$ vectors selected a priori. This approximation is improved by choosing the $M$ vectors depending on each signal. The next section analyzes the performance of these non-linear approximations. These results are then applied to wavelet bases.

9.2.1 Non-Linear Approximation Error

A signal $f \in H$ is approximated with $M$ vectors selected adaptively in an orthonormal basis $B = \{g_m\}_{m \in \mathbb{N}}$ of $H$. Let $f_M$ be the projection of $f$ over $M$ vectors whose indices are in $I_M$:
\[ f_M = \sum_{m \in I_M} \langle f, g_m\rangle\, g_m . \]
The approximation error is the energy of the remaining coefficients:
\[ \varepsilon[M] = \|f - f_M\|^2 = \sum_{m \notin I_M} |\langle f, g_m\rangle|^2 . \tag{9.22} \]
To minimize this error, the indices in $I_M$ must correspond to the $M$ vectors having the largest inner product amplitude $|\langle f, g_m\rangle|$. These are the vectors that best correlate with $f$. They can thus be interpreted as the "main" features of $f$. The resulting $\varepsilon_n[M]$ is necessarily smaller than the error of a linear approximation (9.1), which selects the $M$ approximation vectors independently of $f$.
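
The difference between the two selection rules is easy to see on a short coefficient vector. The sketch below is an illustration with hypothetical names and made-up coefficients, assuming the inner products $\langle f, g_m\rangle$ in an orthonormal basis have already been computed and stored in an array.

```python
import numpy as np

def linear_error(coeffs, M):
    """Energy discarded when keeping the first M coefficients (chosen a priori)."""
    return np.sum(np.abs(coeffs[M:]) ** 2)

def nonlinear_error(coeffs, M):
    """Energy discarded when keeping the M coefficients of largest amplitude."""
    energy = np.sort(np.abs(coeffs) ** 2)[::-1]    # decreasing order
    return np.sum(energy[M:])

# Made-up coefficients <f, g_m> of some signal in an orthonormal basis.
coeffs = np.array([0.1, 3.0, 0.2, -2.0, 0.5])
print(linear_error(coeffs, 2), nonlinear_error(coeffs, 2))
```

Here the two largest coefficients sit at indices 1 and 3, so the a priori choice of indices 0 and 1 discards most of the energy while the adaptive choice keeps it.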
Let us sort $\{|\langle f, g_m\rangle|\}_{m \in \mathbb{N}}$ in decreasing order. We denote $f_B^r[k] = \langle f, g_{m_k}\rangle$ the coefficient of rank $k$:
\[ |f_B^r[k]| \ge |f_B^r[k+1]| \quad \text{with } k > 0 . \]
The best non-linear approximation is
\[ f_M = \sum_{k=1}^{M} f_B^r[k]\, g_{m_k} . \tag{9.23} \]
It can also be calculated by applying the thresholding function
\[ \theta_T(x) = \begin{cases} x & \text{if } |x| \ge T \\ 0 & \text{if } |x| < T \end{cases} \tag{9.24} \]
with a threshold $T$ such that $|f_B^r[M+1]| < T \le |f_B^r[M]|$:
\[ f_M = \sum_{m=0}^{+\infty} \theta_T(\langle f, g_m\rangle)\, g_m . \tag{9.25} \]
The minimum non-linear approximation error is
\[ \varepsilon_n[M] = \|f - f_M\|^2 = \sum_{k=M+1}^{+\infty} |f_B^r[k]|^2 . \]
The following theorem relates the decay of this approximation error as $M$ increases to the decay of $|f_B^r[k]|$ as $k$ increases.

Theorem 9.4 Let $s > 1/2$. If there exists $C > 0$ such that $|f_B^r[k]| \le C\, k^{-s}$ then
\[ \varepsilon_n[M] \le \frac{C^2}{2s - 1}\, M^{1-2s} . \tag{9.26} \]
Conversely, if $\varepsilon_n[M]$ satisfies (9.26) then
\[ |f_B^r[k]| \le \left( 1 - \frac{1}{2s} \right)^{-s} C\, k^{-s} . \tag{9.27} \]

Proof $^2$. Since
\[ \varepsilon_n[M] = \sum_{k=M+1}^{+\infty} |f_B^r[k]|^2 \le C^2 \sum_{k=M+1}^{+\infty} k^{-2s} \]
and
\[ \sum_{k=M+1}^{+\infty} k^{-2s} \le \int_M^{+\infty} x^{-2s}\, dx = \frac{M^{1-2s}}{2s - 1} , \tag{9.28} \]
we derive (9.26).
Conversely, let $\alpha < 1$,
\[ \varepsilon_n[\alpha M] \ge \sum_{k=\alpha M + 1}^{M} |f_B^r[k]|^2 \ge (1 - \alpha)\, M\, |f_B^r[M]|^2 . \]
So if (9.26) is satisfied
\[ |f_B^r[M]|^2 \le \frac{\varepsilon_n[\alpha M]}{1 - \alpha}\, M^{-1} \le \frac{C^2}{2s - 1}\, \frac{\alpha^{1-2s}}{1 - \alpha}\, M^{-2s} . \]
For $\alpha = 1 - 1/2s$ we get (9.27) for $k = M$.
of these inner products:
r
jfB M ]j2 M ] M ;1 n +1
X kf kB p = m=0 jhf gm ijp The following theorem relates the decay of !1=p n : M ] to kf kB p. Theorem 9.5 Let p < 2. If kf kB p < +1 then
and n M ] = o(M 1;2=p ). r
jfB k]j kf kB p k;1=p (9.29) Proof 2 . We prove (9.29) by observing that +1
k
Xr p Xr p
jfB n]j
B p = jfB n]j
n=1
n=1
To show that n M ] = o(M 1;2=p ), we set kf kp S k] = ; 2k 1
X n=k r
k jfB k]jp : r
r
jfB n]jp k jfB 2k]jp : CHAPTER 9. AN APPROXIMATION TOUR 522
Hence
n M] = +1
X k=M +1 r
jfB k]j2
2=p sup jS k]j k>M=2 +1
X k=M +1
+
X 1 S k=2]2=p (k=2);2=p (k=2);2=p : k=M +1 1r
Since kf kp p = +=1 jfB n]jp < +1, it follows that limk!+1 supk>M=2 jS k]j =
n
B
0. We thus derive from (9.28) that n M ] = o(M 1;2=p ).
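
Inequality (9.29) holds for any coefficient sequence with a finite $l^p$ norm, which makes it easy to test on a finite example. The sketch below is an illustration on arbitrary random coefficients; the variable names, the choice $p = 1$, and the way the test coefficients are generated are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Arbitrary test coefficients with a few large values and many small ones.
coeffs = rng.standard_normal(1000) * rng.random(1000) ** 4

p = 1.0
sorted_amp = np.sort(np.abs(coeffs))[::-1]           # |f_B^r[k]| for k = 1, 2, ...
k = np.arange(1, sorted_amp.size + 1, dtype=float)
lp_norm = np.sum(sorted_amp ** p) ** (1.0 / p)       # ||f||_{B,p}
bound = lp_norm * k ** (-1.0 / p)                    # right-hand side of (9.29)

print(np.max(sorted_amp / bound))                    # ratio never exceeds 1
```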
This theorem specifies spaces of functions that are well approximated by a few vectors of an orthogonal basis $B$. We denote
\[ \mathbf{B}_{B,p} = \left\{ f \in H \;:\; \|f\|_{B,p} < +\infty \right\} . \tag{9.30} \]
If $f \in \mathbf{B}_{B,p}$ then Theorem 9.5 proves that $\varepsilon_n[M] = o(M^{1-2/p})$. This is called a Jackson inequality [22]. Conversely, if $\varepsilon_n[M] = O(M^{1-2/p})$ then the Bernstein inequality (9.27) for $s = 1/p$ shows that $f \in \mathbf{B}_{B,q}$ for any $q > p$. Section 9.2.3 studies the properties of the spaces $\mathbf{B}_{B,p}$ for wavelet bases.

9.2.2 Wavelet Adaptive Grids
A non-linear approximation in a wavelet orthonormal basis defines an adaptive grid that refines the approximation scale in the neighborhood of the signal singularities. Theorem 9.5 proves that this non-linear approximation introduces a small error if the sorted wavelet coefficients have a fast decay, which can be related to Besov spaces [157]. We study the performance of such wavelet approximations for bounded variation functions and piecewise regular functions.
We consider a wavelet basis adapted to $L^2[0,1]$, constructed in Section 7.5.3 with compactly supported wavelets that are $C^q$ with $q$ vanishing moments:
$$\mathcal{B} = \Big[ \{\phi_{J,n}\}_{0 \le n < 2^{-J}} \,,\; \{\psi_{j,n}\}_{-\infty < j \le J,\; 0 \le n < 2^{-j}} \Big] .$$
To simplify the notation we write $\phi_{J,n} = \psi_{J+1,n}$. The best non-linear approximation of $f \in L^2[0,1]$ from $M$ wavelets is
$$f_M = \sum_{(j,n) \in I_M} \langle f, \psi_{j,n} \rangle\, \psi_{j,n} ,$$
where $I_M$ is the index set of the $M$ wavelet coefficients having the largest amplitude $|\langle f, \psi_{j,n} \rangle|$. The approximation error is
$$\epsilon_n[M] = \|f - f_M\|^2 = \sum_{(j,n) \notin I_M} |\langle f, \psi_{j,n} \rangle|^2 .$$
Let $f_{\mathcal{B}}^r[k] = \langle f, \psi_{j_k,n_k} \rangle$ be the coefficient of rank $k$: $|f_{\mathcal{B}}^r[k]| \ge |f_{\mathcal{B}}^r[k+1]|$ for $k \ge 1$. Theorem 9.4 proves that $|f_{\mathcal{B}}^r[k]| = O(k^{-s})$ if and only if $\epsilon_n[M] = O(M^{1-2s})$. The error $\epsilon_n[M]$ is always smaller than the linear approximation error $\epsilon_l[M]$ studied in Section 9.1.3, but we must understand under which condition this improvement is important.

Piecewise Regularity If $f$ is piecewise regular then we show that $\epsilon_n[M]$ has a fast decay as $M$ increases. Few wavelet coefficients are affected by isolated discontinuities and the error decay depends on the uniform regularity between these discontinuities.

Proposition 9.4 If $f$ has a finite number of discontinuities on $[0,1]$ and is uniformly Lipschitz $\alpha < q$ between these discontinuities, then $\epsilon_n[M] = O(M^{-2\alpha})$.
Proof 2. We prove that $\epsilon_n[M] = O(M^{-2\alpha})$ by verifying that $f_{\mathcal{B}}^r[k] = O(k^{-\alpha-1/2})$ and applying inequality (9.26) of Theorem 9.4. We distinguish type 1 wavelets $\psi_{j,n}$, whose support includes an abscissa where $f$ is discontinuous, from type 2 wavelets, whose support is included in a domain where $f$ is uniformly Lipschitz $\alpha$. Let $f_{\mathcal{B},1}^r[k]$ and $f_{\mathcal{B},2}^r[k]$ be the values of the wavelet coefficient of rank $k$ among type 1 and type 2 wavelets. We show that $f_{\mathcal{B}}^r[k] = O(k^{-\alpha-1/2})$ by verifying that $f_{\mathcal{B},1}^r[k] = O(k^{-\alpha-1/2})$ and that $f_{\mathcal{B},2}^r[k] = O(k^{-\alpha-1/2})$.
If $f$ is uniformly Lipschitz $\alpha$ on the support of $\psi_{j,n}$ then there exists $A$ such that
$$|\langle f, \psi_{j,n} \rangle| \le A\, 2^{j(\alpha+1/2)} . \qquad (9.31)$$
Indeed, orthogonal wavelet coefficients are samples of the continuous wavelet transform $\langle f, \psi_{j,n} \rangle = Wf(2^j n, 2^j)$, so (9.31) is a consequence of (6.18).
For any $l > 0$, there are at most $2^l$ type 2 coefficients at scales $2^j > 2^{-l}$. Moreover, (9.31) shows that all type 2 coefficients at scales $2^j \le 2^{-l}$ are smaller than $A\, 2^{-l(\alpha+1/2)}$, so
$$f_{\mathcal{B},2}^r[2^l] \le A\, 2^{-l(\alpha+1/2)} .$$
It follows that $f_{\mathcal{B},2}^r[k] = O(k^{-\alpha-1/2})$, for all $k > 0$.
Let us now consider the type 1 wavelets. There exists $K > 0$ such that each wavelet $\psi_{j,n}$ has its support included in $[2^j n - 2^j K/2,\; 2^j n + 2^j K/2]$. At each scale $2^j$, there are thus at most $K$ wavelets whose support includes a given abscissa $v$. This implies that, at each scale, there are at most $K D$ wavelets $\psi_{j,n}$ whose support includes at least one of the $D$ discontinuities of $f$. Since $f$ is uniformly Lipschitz $\alpha > 0$ outside these points, $f$ is necessarily uniformly bounded on $[0,1]$ and thus uniformly Lipschitz 0. Hence (9.31) shows that there exists $A$ such that $|\langle f, \psi_{j,n} \rangle| \le A\, 2^{j/2}$. Since there are at most $l K D$ type 1 coefficients at scales $2^j > 2^{-l}$ and since all type 1 coefficients at scales $2^j \le 2^{-l}$ are smaller than $A\, 2^{-l/2}$ we get
$$f_{\mathcal{B},1}^r[l K D] \le A\, 2^{-l/2} .$$
This implies that $f_{\mathcal{B},1}^r[k] = O(k^{-\alpha-1/2})$ for any $\alpha > 0$, which ends the proof.

If $\alpha > 1/2$, then $\epsilon_n[M]$ decays faster than $\epsilon_l[M]$ since we saw in Section 9.1.3 that the presence of discontinuities implies that $\epsilon_l[M]$ decays like $M^{-1}$. The more regular $f$ is between its discontinuities, the larger the improvement of non-linear approximations with respect to linear approximations.

Adaptive Grids Isolated singularities create large amplitude wavelet coefficients but there are few of them. The approximation $f_M$ calculated from the $M$ largest amplitude wavelet coefficients can be interpreted as an adaptive grid approximation, where the approximation scale is refined in the neighborhood of singularities.
A non-linear approximation keeps all coefficients $|\langle f, \psi_{j,n} \rangle| \ge T$, for a threshold $f_{\mathcal{B}}^r[M] \ge T > f_{\mathcal{B}}^r[M+1]$. In a region where $f$ is uniformly Lipschitz $\alpha$, since $|\langle f, \psi_{j,n} \rangle| \le A\, 2^{j(\alpha+1/2)}$ the coefficients above $T$ are typically at scales
$$2^j > 2^l = \Big( \frac{T}{A} \Big)^{2/(2\alpha+1)} .$$
Setting to zero all wavelet coefficients below the scale $2^l$ is equivalent to computing a local approximation of $f$ at the scale $2^l$. The smaller the local Lipschitz regularity $\alpha$, the finer the approximation scale $2^l$.
Figure 9.3 shows the non-linear wavelet approximation of a piecewise regular signal. Observe that the largest amplitude wavelet coefficients are in the cone of influence of each singularity. Since the approximation scale is refined in the neighborhood of each singularity, they are much better restored than in the fixed scale linear approximation shown in Figure 9.1. The non-linear approximation error in this case is 17 times smaller than the linear approximation error.
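The comparison between a fixed-scale linear approximation and a non-linear approximation can be sketched with a small experiment. The code below is an illustration under stated assumptions, not the computation behind Figure 9.3: it uses a Haar basis (one vanishing moment) implemented directly in NumPy instead of the Symmlet 4 of the figure, and the test signal, the value of `M` and all function names are hypothetical choices.

```python
import numpy as np

def haar_forward(x):
    """Orthonormal Haar transform of a length-2^n signal.

    Output order: [scaling coefficient, coarsest details, ..., finest details].
    """
    a = np.asarray(x, dtype=float).copy()
    details = []
    while a.size > 1:
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # details of the current scale
        a = (even + odd) / np.sqrt(2.0)              # coarser approximation
    return np.concatenate([a] + details[::-1])

def haar_inverse(c):
    """Inverse of haar_forward."""
    c = np.asarray(c, dtype=float)
    a = c[:1].copy()
    pos = 1
    while pos < c.size:
        d = c[pos:pos + a.size]
        pos += a.size
        out = np.empty(2 * a.size)
        out[0::2] = (a + d) / np.sqrt(2.0)
        out[1::2] = (a - d) / np.sqrt(2.0)
        a = out
    return a

def linear_approx(c, M):
    """Keep the M coefficients at the largest scales (fixed-scale approximation)."""
    out = np.zeros_like(c)
    out[:M] = c[:M]
    return out

def nonlinear_approx(c, M):
    """Keep the M coefficients of largest amplitude."""
    out = np.zeros_like(c)
    keep = np.argsort(np.abs(c))[::-1][:M]
    out[keep] = c[keep]
    return out

# Assumed piecewise regular test signal with a single jump at t = 0.37.
N = 1024
t = np.arange(N) / N
f = np.where(t < 0.37, np.sin(4 * np.pi * t), 2.0 + np.cos(6 * np.pi * t))

c = haar_forward(f)
M = 64
err_lin = float(np.sum((f - haar_inverse(linear_approx(c, M))) ** 2))
err_nonlin = float(np.sum((f - haar_inverse(nonlinear_approx(c, M))) ** 2))
```

Because the basis is orthonormal, `err_nonlin` equals the energy of the discarded coefficients and can never exceed `err_lin`; how large the gap is depends on the signal and on the wavelet.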
Non-linear wavelet approximations are nearly optimal compared to adaptive spline approximations. A spline approximation $f_M^s$ is calculated by choosing $K$ nodes $t_1 < t_2 < \cdots < t_K$ inside $[0,1]$. Over each interval $[t_k, t_{k+1}]$, $f$ is approximated by the closest polynomial of degree $r$. This polynomial spline $f_M^s$ is specified by $M = K(r+2)$ parameters, which are the node locations $\{t_k\}_{1 \le k \le K}$ plus the $K(r+1)$ parameters of the $K$ polynomials of degree $r$. To reduce $\|f - f_M^s\|$, the nodes must be closely spaced when $f$ is irregular and farther apart when $f$ is smooth. However, finding the $M$ parameters that minimize $\|f - f_M^s\|$ is a difficult non-linear optimization.
A non-linear approximation with wavelets having $q = r + 1$ vanishing moments is much faster to compute than an optimized spline approximation. A spline wavelet basis of Battle-Lemarié gives non-linear approximations that are also spline functions, but the nodes $t_k$ are restricted to dyadic locations $2^j n$, with a scale $2^j$ that is locally adapted to the signal regularity. For large classes of signals, including the balls of Besov spaces, the maximum approximation errors with wavelets or with optimized splines have the same decay rate when $M$ increases [158]. The computational overhead of an optimized spline
approximation is therefore not worth it.

[Figure 9.3, three panels: (a) $f(t)$ against $t \in [0,1]$; (b) wavelet coefficients at scales $2^{-5}$ to $2^{-9}$; (c) $f_M(t)$ against $t \in [0,1]$.]

Figure 9.3: (a): Original signal $f$. (b): Larger $M = 0.15\,N$ wavelet coefficients calculated with a Symmlet 4. (c): Non-linear approximation $f_M$ recovered from the $M$ wavelet coefficients shown above, $\|f - f_M\| / \|f\| = 5.1 \times 10^{-3}$.

9.2.3 Besov Spaces 3

Studying the performance of non-linear wavelet approximations more precisely requires introducing a new space. As previously, we write $\phi_{J,n} = \psi_{J+1,n}$. The Besov space $\mathbf{B}^s_{\beta,\gamma}[0,1]$ is the set of functions $f \in L^2[0,1]$ such that
$$\|f\|_{s,\beta,\gamma} = \left( \sum_{j=-\infty}^{J+1} \left[ 2^{-j(s+1/2-1/\beta)} \Big( \sum_{n=0}^{2^{-j}-1} |\langle f, \psi_{j,n} \rangle|^\beta \Big)^{1/\beta} \right]^{\gamma} \right)^{1/\gamma} < +\infty . \qquad (9.32)$$
Frazier, Jawerth [182] and Meyer [270] proved that $\mathbf{B}^s_{\beta,\gamma}[0,1]$ does not depend on the particular choice of wavelet basis, as long as the wavelets in the basis have $q > s$ vanishing moments and are in $C^q$. The space $\mathbf{B}^s_{\beta,\gamma}[0,1]$ corresponds typically to functions that have a "derivative of order $s$" that is in $L^\beta[0,1]$. The index $\gamma$ is a fine tuning parameter, which is less important. We need $q > s$ because a wavelet with $q$ vanishing moments can test the differentiability of a signal only up to the order $q$.
If $\beta \ge 2$, then functions in $\mathbf{B}^s_{\beta,\gamma}[0,1]$ have a uniform regularity of order $s$. For $\beta = \gamma = 2$, Theorem 9.2 proves that $\mathbf{B}^s_{2,2}[0,1] = \mathbf{W}^s[0,1]$ is the space of $s$ times differentiable functions in the sense of Sobolev. Proposition 9.3 proves that this space is characterized by the decay of the linear approximation error $\epsilon_l[M]$ and that $\epsilon_l[M] = o(M^{-2s})$. Since $\epsilon_n[M] \le \epsilon_l[M]$, clearly $\epsilon_n[M] = o(M^{-2s})$. One can verify (Problem 9.6) that for a large class of functions inside $\mathbf{W}^s[0,1]$, the non-linear approximation error has the same decay rate as the linear approximation error. It is therefore not useful to use non-linear approximations in a Sobolev space.
For $\beta < 2$, functions in $\mathbf{B}^s_{\beta,\gamma}[0,1]$ are not necessarily uniformly regular. The adaptivity of non-linear approximations then improves the decay rate of the error significantly. In particular, if $p = \beta = \gamma$ and $s = 1/2 + 1/p$, then the Besov norm is a simple $\ell^p$ norm:
$$\|f\|_{s,\beta,\gamma} = \Big( \sum_{j=-\infty}^{J+1} \sum_{n=0}^{2^{-j}-1} |\langle f, \psi_{j,n} \rangle|^p \Big)^{1/p} .$$
Theorem 9.5 proves that if $f \in \mathbf{B}^s_{\beta,\gamma}[0,1]$, then $\epsilon_n[M] = o(M^{1-2/p})$. The smaller $p$, the faster the error decay. The proof of Proposition 9.4 shows that although $f$ may be discontinuous, if the number of discontinuities is finite and $f$ is uniformly Lipschitz $\alpha$ between these discontinuities, then its sorted wavelet coefficients satisfy $|f_{\mathcal{B}}^r[k]| = O(k^{-\alpha-1/2})$, so $f \in \mathbf{B}^s_{\beta,\gamma}[0,1]$ for $1/p < \alpha + 1/2$. This shows that these spaces include functions that are not $s$ times differentiable at all points. The linear approximation error $\epsilon_l[M]$ for $f \in \mathbf{B}^s_{\beta,\gamma}[0,1]$ can decrease arbitrarily slowly because the $M$ wavelet coefficients at the largest scales may be arbitrarily small. A non-linear approximation is much more efficient in these spaces.

Bounded Variation Bounded variation functions are important examples of signals for which a non-linear approximation yields a much smaller error than a linear approximation. The total variation norm is defined in (2.57) by
$$\|f\|_V = \int_0^1 |f'(t)|\, dt .$$
The derivative $f'$ must be understood in the sense of distributions, in order to include discontinuous functions. The following theorem computes an upper and a lower bound of $\|f\|_V$ from the modulus of wavelet coefficients. Since $\|f\|_V$ does not change when a constant is added to $f$, the maximum amplitude of $f$ is controlled with the sup norm $\|f\|_\infty = \sup_{t \in \mathbb{R}} |f(t)|$.

Theorem 9.6 Consider a wavelet basis constructed with $\psi$ such that $\|\psi\|_V < +\infty$. There exist $A, B > 0$ such that for all $f \in L^2[0,1]$
$$\|f\|_V + \|f\|_\infty \le B \sum_{j=-\infty}^{J+1} \sum_{n=0}^{2^{-j}-1} 2^{-j/2}\, |\langle f, \psi_{j,n} \rangle| = B\, \|f\|_{1,1,1} \qquad (9.33)$$
and
$$\|f\|_V + \|f\|_\infty \ge A \sup_{j \le J+1} \Big( \sum_{n=0}^{2^{-j}-1} 2^{-j/2}\, |\langle f, \psi_{j,n} \rangle| \Big) = A\, \|f\|_{1,1,\infty} . \qquad (9.34)$$

Proof 2. By decomposing $f$ in the wavelet basis
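$$f = \sum_{j=-\infty}^{J} \sum_{n=0}^{2^{-j}-1} \langle f, \psi_{j,n} \rangle\, \psi_{j,n} + \sum_{n=0}^{2^{-J}-1} \langle f, \phi_{J,n} \rangle\, \phi_{J,n}$$
we get
$$\|f\|_V + \|f\|_\infty \le \sum_{j=-\infty}^{J} \sum_{n=0}^{2^{-j}-1} |\langle f, \psi_{j,n} \rangle| \big( \|\psi_{j,n}\|_V + \|\psi_{j,n}\|_\infty \big) + \sum_{n=0}^{2^{-J}-1} |\langle f, \phi_{J,n} \rangle| \big( \|\phi_{J,n}\|_V + \|\phi_{J,n}\|_\infty \big) . \qquad (9.35)$$
The wavelet basis includes wavelets whose supports are inside $(0,1)$ and border wavelets, which are obtained by dilating and translating a finite number of mother wavelets. To simplify notations we write the basis as if there were a single mother wavelet: $\psi_{j,n}(t) = 2^{-j/2}\, \psi(2^{-j} t - n)$. Hence, we verify with a change of variable that
$$\|\psi_{j,n}\|_V + \|\psi_{j,n}\|_\infty = \int_0^1 2^{-j/2}\, 2^{-j}\, |\psi'(2^{-j} t - n)|\, dt + 2^{-j/2} \sup_{t \in [0,1]} |\psi(2^{-j} t - n)| = 2^{-j/2} \big( \|\psi\|_V + \|\psi\|_\infty \big) .$$
Since $\phi_{J,n}(t) = 2^{-J/2}\, \phi(2^{-J} t - n)$ we also prove that $\|\phi_{J,n}\|_V + \|\phi_{J,n}\|_\infty = 2^{-J/2} \big( \|\phi\|_V + \|\phi\|_\infty \big)$. The inequality (9.33) is thus derived from (9.35).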
Since $\psi$ has at least one vanishing moment, its primitive $\theta$ is a function with the same support, which we suppose included in $[-K/2, K/2]$. To prove (9.34), for $j \le J$ we make an integration by parts:
$$\sum_{n=0}^{2^{-j}-1} |\langle f, \psi_{j,n} \rangle| = \sum_{n=0}^{2^{-j}-1} \left| \int_0^1 f(t)\, 2^{-j/2}\, \psi(2^{-j} t - n)\, dt \right| = \sum_{n=0}^{2^{-j}-1} \left| \int_0^1 f'(t)\, 2^{j/2}\, \theta(2^{-j} t - n)\, dt \right| \le 2^{j/2} \int_0^1 |f'(t)| \sum_{n=0}^{2^{-j}-1} |\theta(2^{-j} t - n)|\, dt .$$
Since $\theta$ has a support in $[-K/2, K/2]$,
$$\sum_{n=0}^{2^{-j}-1} |\langle f, \psi_{j,n} \rangle| \le 2^{j/2}\, K \sup_{t \in \mathbb{R}} |\theta(t)| \int_0^1 |f'(t)|\, dt \le A^{-1}\, 2^{j/2}\, \|f\|_V . \qquad (9.36)$$
The largest scale $2^J$ is a fixed constant and hence
$$\sum_{n=0}^{2^{-J}-1} |\langle f, \phi_{J,n} \rangle| \le 2^{-J} \sup_{t \in [0,1]} |f(t)| \int_0^1 |\phi_{J,n}(t)|\, dt \le 2^{-J/2}\, \|f\|_\infty \int_0^1 |\phi(t)|\, dt \le A^{-1}\, 2^{J/2}\, \|f\|_\infty .$$
This inequality and (9.36) prove (9.34).

This theorem shows that the total variation norm is bounded by two
Besov norms:
$$A\, \|f\|_{1,1,\infty} \le \|f\|_V + \|f\|_\infty \le B\, \|f\|_{1,1,1} .$$
One can verify that if $\|f\|_V < +\infty$, then $\|f\|_\infty < +\infty$ (Problem 9.1), but we do not control the value of $\|f\|_\infty$ from $\|f\|_V$ because the addition of a constant changes $\|f\|_\infty$ but does not modify $\|f\|_V$. The space $\mathbf{BV}[0,1]$ of bounded variation functions is therefore embedded in the corresponding Besov spaces:
$$\mathbf{B}^1_{1,1}[0,1] \subset \mathbf{BV}[0,1] \subset \mathbf{B}^1_{1,\infty}[0,1] .$$
If $f \in \mathbf{BV}[0,1]$ has discontinuities, then the linear approximation error $\epsilon_l[M]$ does not decay faster than $M^{-1}$. The following proposition proves that $\epsilon_n[M]$ has a faster decay.

Proposition 9.5 There exists $B$ such that for all $f \in \mathbf{BV}[0,1]$
$$\epsilon_n[M] \le B\, \|f\|_V^2\, M^{-2} . \qquad (9.37)$$
Proof 2. We denote by $f_{\mathcal{B}}^r[k]$ the wavelet coefficient of rank $k$, excluding all the scaling coefficients $\langle f, \phi_{J,n} \rangle$, since we cannot control their value with $\|f\|_V$. We first show that there exists $B_0$ such that for all $f \in \mathbf{BV}[0,1]$
$$|f_{\mathcal{B}}^r[k]| \le B_0\, \|f\|_V\, k^{-3/2} . \qquad (9.38)$$
To take into account the fact that (9.38) does not apply to the $2^{-J}$ scaling coefficients $\langle f, \phi_{J,n} \rangle$, we observe that in the worst case they are selected by the non-linear approximation so
$$\epsilon_n[M] \le \sum_{k=M-2^{-J}+1}^{+\infty} |f_{\mathcal{B}}^r[k]|^2 . \qquad (9.39)$$
Since $2^{-J}$ is a constant, inserting (9.38) proves (9.37).
The upper bound (9.38) is proved by computing an upper bound of the number of coefficients larger than an arbitrary threshold $T$. At scale $2^j$, we denote by $f_{\mathcal{B},j}^r[k]$ the coefficient of rank $k$ among $\{\langle f, \psi_{j,n} \rangle\}_{0 \le n < 2^{-j}}$. The inequality (9.36) proves that for all $j \le J$
$$\sum_{n=0}^{2^{-j}-1} |\langle f, \psi_{j,n} \rangle| \le A^{-1}\, 2^{j/2}\, \|f\|_V .$$
It thus follows from (9.29) that
$$f_{\mathcal{B},j}^r[k] \le A^{-1}\, 2^{j/2}\, \|f\|_V\, k^{-1} = C\, 2^{j/2}\, k^{-1} .$$
At scale $2^j$, the number $k_j$ of coefficients larger than $T$ thus satisfies
$$k_j \le \min\big( 2^{-j},\; 2^{j/2}\, C\, T^{-1} \big) .$$
The total number $k$ of coefficients larger than $T$ is
$$k = \sum_{j=-\infty}^{J} k_j \le \sum_{2^j \ge (C^{-1} T)^{2/3}} 2^{-j} + \sum_{2^j < (C^{-1} T)^{2/3}} 2^{j/2}\, C\, T^{-1} \le 6\, (C\, T^{-1})^{2/3} .$$
By choosing $T = |f_{\mathcal{B}}^r[k]|$, since $C = A^{-1} \|f\|_V$, we get
$$|f_{\mathcal{B}}^r[k]| \le 6^{3/2}\, A^{-1}\, \|f\|_V\, k^{-3/2} ,$$
which proves (9.38).

The error decay rate $M^{-2}$ obtained with wavelets for all bounded variation functions cannot be improved either by optimal spline approximations or by any non-linear approximation calculated in an orthonormal basis [160]. In this sense, wavelets are optimal for approximating bounded variation functions.

9.2.4 Image Approximations with Wavelets

Non-linear approximations of functions in $L^2[0,1]^d$ can be calculated in
separable wavelet bases. In multiple dimensions, wavelet approximations are often not optimal because they cannot be adapted to the geometry of the signal singularities. We concentrate on the two-dimensional case for image processing.
Section 7.7.4 constructs a separable wavelet basis of $L^2[0,1]^d$ from a wavelet basis of $L^2[0,1]$, with separable products of wavelets and scaling functions. We suppose that all wavelets of the basis of $L^2[0,1]$ are $C^q$ with $q$ vanishing moments. The wavelet basis of $L^2[0,1]^2$ includes three elementary wavelets $\{\psi^l\}_{1 \le l \le 3}$ that are dilated by $2^j$ and translated over a square grid of interval $2^j$ in $[0,1]^2$. Modulo modifications near the borders, these wavelets can be written
$$\psi^l_{j,n}(x) = \frac{1}{2^j}\, \psi^l\Big( \frac{x_1 - 2^j n_1}{2^j} \,,\, \frac{x_2 - 2^j n_2}{2^j} \Big) . \qquad (9.40)$$
If we limit the scales to $2^j \le 2^J$, we must complete the wavelet family with two-dimensional scaling functions
$$\phi_{J,n}(x) = \frac{1}{2^J}\, \phi\Big( \frac{x_1 - 2^J n_1}{2^J} \,,\, \frac{x_2 - 2^J n_2}{2^J} \Big)$$
to obtain an orthonormal basis
$$\mathcal{B} = \Big[ \{\phi_{J,n}\}_{2^J n \in [0,1)^2} \,,\; \{\psi^l_{j,n}\}_{j \le J,\; 2^j n \in [0,1)^2,\; 1 \le l \le 3} \Big] .$$
A non-linear approximation $f_M$ in this wavelet basis is constructed from the $M$ wavelet coefficients of largest amplitude. Figure 9.4(b) shows the position of these $M = N^2/16$ wavelet coefficients for Lena. The large amplitude coefficients are located in the area where the image intensity varies sharply, in particular along the edges. The corresponding approximation $f_M$ is shown in Figure 9.4(a). This non-linear approximation is much more precise than the linear approximation of Figure 9.2(b), $\epsilon_l[M] \ge 10\, \epsilon_n[M]$. As in one dimension, the non-linear wavelet approximation can be interpreted as an adaptive grid approximation. By keeping wavelet coefficients at fine scales, we refine the approximation along the image contours.

Figure 9.4: (a): Non-linear approximation $f_M$ of a Lena image $f$ of $N^2 = 256^2$ pixels, with $M = N^2/16$ wavelet coefficients: $\|f - f_M\| / \|f\| = 0.011$. Compare with the linear approximation of Figure 9.2(b). (b): The positions of the largest $M$ wavelet coefficients are shown in black.

Bounded Variation Images Besov spaces over $[0,1]^2$ are defined with norms similar to (9.32); these norms are calculated from the modulus of wavelet coefficients. We rather concentrate on the space of bounded variation functions, which is particularly important in image processing. The total variation of $f$ is defined in Section 2.3.3 by
$$\|f\|_V = \int_0^1 \!\! \int_0^1 |\vec{\nabla} f(x_1, x_2)|\, dx_1\, dx_2 . \qquad (9.41)$$
The partial derivatives of $\vec{\nabla} f$ must be taken in the general sense of distributions in order to include discontinuous functions. Let $\partial\Omega_t$ be the level set defined as the boundary of
$$\Omega_t = \{ (x_1, x_2) \in \mathbb{R}^2 \;:\; f(x_1, x_2) > t \} .$$
Theorem 2.7 proves that the total variation depends on the length $H^1(\partial\Omega_t)$ of level sets:
$$\int_0^1 \!\! \int_0^1 |\vec{\nabla} f(x_1, x_2)|\, dx_1\, dx_2 = \int_{-\infty}^{+\infty} H^1(\partial\Omega_t)\, dt . \qquad (9.42)$$
The following theorem gives upper and lower bounds of $\|f\|_V$ from wavelet coefficients and computes the decay of the approximation error $\epsilon_n[M]$. We suppose that the separable wavelet basis has been calculated from a one-dimensional wavelet with bounded variation. We denote by $f_{\mathcal{B}}^r[k]$ the rank $k$ wavelet coefficient of $f$, without including the $2^{-2J}$ scaling coefficients $\langle f, \phi_{J,n} \rangle$.

Theorem 9.7 (Cohen, DeVore, Petrushev, Xu) If $\|f\|_V < +\infty$ then there exist $A, B > 0$ such that
$$A\, \|f\|_V \le \sum_{j=-\infty}^{J} \sum_{l=1}^{3} \sum_{2^j n \in [0,1)^2} |\langle f, \psi^l_{j,n} \rangle| + \sum_{2^J n \in [0,1)^2} |\langle f, \phi_{J,n} \rangle| . \qquad (9.43)$$
The sorted wavelet coefficients $f_{\mathcal{B}}^r[k]$ satisfy
$$|f_{\mathcal{B}}^r[k]| \le B\, \|f\|_V\, k^{-1} \qquad (9.44)$$
so
$$\epsilon_n[M] \le B^2\, \|f\|_V^2\, M^{-1} . \qquad (9.45)$$
[Figure 9.5: curves (a) and (b) of $\log_2 |f_{\mathcal{B}}^r[k]|$ against $\log_2 k$.]

Figure 9.5: Sorted wavelet coefficients $\log_2 |f_{\mathcal{B}}^r[k]|$ as a function of $\log_2 k$ for two images. (a): Lena image shown in Figure 9.2(a). (b): Mandrill image shown in Figure 11.6.

Proof 2. We prove (9.43) exactly as we did (9.33), by observing that $\|\psi^l_{j,n}\|_V = \|\psi^l\|_V$ and that $\|\phi_{J,n}\|_V = \|\phi\|_V$. The proof of (9.44) is much more technical [133]. To take into account the exclusion of the $2^{-2J}$ scaling coefficients $\langle f, \phi_{J,n} \rangle$ in (9.44), we observe as in (9.39) that $\epsilon_n[M] \le \sum_{k=M-2^{-2J}+1}^{+\infty} |f_{\mathcal{B}}^r[k]|^2$, from which we derive (9.45).

The norm $\|f\|_\infty$ that appears in Theorem 9.6 does not appear in Theorem 9.7 because in two dimensions $\|f\|_V < +\infty$ does not imply that $\|f\|_\infty < +\infty$. The inequality (9.44) proves that if $\|f\|_V < +\infty$ then $|f_{\mathcal{B}}^r[k]| = O(k^{-1})$. Lena is a bounded variation image in the sense of (2.70), and Figure 9.5 shows that indeed $\log_2 |f_{\mathcal{B}}^r[k]|$ decays with a slope that reaches $-1$ as $\log_2 k$ increases. In contrast, the Mandrill image shown in Figure 11.6 does not have a bounded total variation because of the fur, and indeed $\log_2 |f_{\mathcal{B}}^r[k]|$ decays with a slope that reaches $-0.65 > -1$.
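The slope read from a plot such as Figure 9.5 can be estimated by a least-squares fit of $\log_2 |f_{\mathcal{B}}^r[k]|$ against $\log_2 k$. The sketch below is only an illustration: the function name `decay_slope` and the synthetic coefficient sequences are assumptions, and no image data from the text is used.

```python
import numpy as np

def decay_slope(coeffs, k_min=1, k_max=None):
    """Least-squares slope of log2|f^r[k]| versus log2 k over ranks k_min..k_max."""
    r = np.sort(np.abs(np.asarray(coeffs, dtype=float)).ravel())[::-1]
    r = r[r > 0]                      # zero coefficients have no logarithm
    k_max = r.size if k_max is None else min(k_max, r.size)
    k = np.arange(k_min, k_max + 1)
    slope, _intercept = np.polyfit(np.log2(k), np.log2(r[k_min - 1:k_max]), 1)
    return float(slope)

# Synthetic sequences standing in for sorted wavelet coefficients (assumed data).
rng = np.random.default_rng(1)
ranks = np.arange(1, 4097, dtype=float)
bv_like = rng.permutation(5.0 * ranks ** -1.0)      # decay k^{-1}
textured = rng.permutation(5.0 * ranks ** -0.65)    # slower decay k^{-0.65}

slope_bv = decay_slope(bv_like)
slope_textured = decay_slope(textured)
```

On real images the slope is only reached asymptotically, so in practice one would restrict the fit to large ranks through `k_min`.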
The upper bound (9.45) proves that the non-linear approximation error $\epsilon_n[M]$ of a bounded variation image decays at least like $M^{-1}$, whereas one can prove (Problem 9.5) that the linear approximation error $\epsilon_l[M]$ may decay arbitrarily slowly. The non-linear approximation of Lena in Figure 9.4(a) is indeed much more precise than the linear approximation in Figure 9.2(b), which is calculated with the same number of wavelet coefficients.

Piecewise Regular Images In one dimension, Proposition 9.4 proves that a finite number of discontinuities does not influence the decay rate of sorted wavelet coefficients $|f_{\mathcal{B}}^r[k]|$, which depends on the uniform signal regularity outside the discontinuities. Piecewise regular functions are thus better approximated than functions for which we only know that they have a bounded variation. A piecewise regular image has discontinuities along curves of dimension 1, which create a non-negligible number of high amplitude wavelet coefficients. The following proposition verifies with a simple example of piecewise regular image, that the sorted wavelet coefficients $|f_{\mathcal{B}}^r[k]|$ do not decay faster than $k^{-1}$. As in Theorem 9.7, the $2^{-2J}$ scaling coefficients $\langle f, \phi_{J,n} \rangle$ are not included among the sorted wavelet coefficients.

Proposition 9.6 If $f = \mathbf{1}_\Omega$ is the indicator function of a set $\Omega$ whose border $\partial\Omega$ has a finite length, then
$$|f_{\mathcal{B}}^r[k]| \sim \|f\|_V\, k^{-1} \qquad (9.46)$$
and hence
$$\epsilon_n[M] \sim \|f\|_V^2\, M^{-1} . \qquad (9.47)$$

Proof 2. The main idea of the proof is given without detail. If the support of $\psi^l_{j,n}$ does not intersect the border $\partial\Omega$, then $\langle f, \psi^l_{j,n} \rangle = 0$ because $f$ is constant over the support of $\psi^l_{j,n}$. The wavelets $\psi^l_{j,n}$ have a square support of size proportional to $2^j$, which is translated on a grid of interval $2^j$. Since $\partial\Omega$ has a finite length $L$, there are on the order of $L\, 2^{-j}$ wavelets whose support intersects $\partial\Omega$. Figure 9.6(b) illustrates the position of these coefficients.
Along the border, we verify that $|\langle f, \psi^l_{j,n} \rangle| \sim 2^j$ by replacing the wavelet by its expression (9.40). Since the amplitude of these coefficients decreases as the scale $2^j$ decreases and since there are on the order of $L\, 2^{-j}$ non-zero coefficients at scales larger than $2^j$, the wavelet coefficient $f_{\mathcal{B}}^r[k]$ of rank $k$ is at a scale $2^j$ such that $k \sim L\, 2^{-j}$. Hence $|f_{\mathcal{B}}^r[k]| \sim 2^j \sim L\, k^{-1}$. The co-area formula (9.42) proves that $\|f\|_V = L$, so $|f_{\mathcal{B}}^r[k]| \sim \|f\|_V\, k^{-1}$, which proves (9.46). As in the proof of Theorem 9.7, (9.47) is derived from (9.46).

This proposition shows that the sorted wavelet coefficients of $f = \mathbf{1}_\Omega$ do
not decay any faster than the sorted wavelet coefficients of any bounded variation function, for which (9.44) proves that $|f_{\mathcal{B}}^r[k]| = O(\|f\|_V\, k^{-1})$. This property can be extended to piecewise regular functions that have a discontinuity of amplitude larger than $a > 0$ along a contour of length $L > 0$. The non-linear approximation errors $\epsilon_n[M]$ of general bounded variation images and piecewise regular images have essentially the same decay.

Figure 9.6: (a): Image $f = \mathbf{1}_\Omega$. (b): At the scale $2^j$, the wavelets $\psi^l_{j,n}$ are translated on a grid of interval $2^j$ which is indicated by the smaller dots. They have a square support proportional to $2^j$. The darker dots correspond to wavelets whose support intersects the frontier of $\Omega$, for which $\langle f, \psi^l_{j,n} \rangle \ne 0$.

Approximation with Adaptive Geometry Supposing that an image has bounded variations is equivalent to imposing that its level sets
have a finite average length, but it does not impose geometrical regularity conditions on these level sets. The level sets and "edges" of many images such as Lena are often curves with a regular geometry, which is prior information that the approximation scheme should be able to use.
In two dimensions, wavelets cannot use the regularity of level sets because they have a square support that is not adapted to the image geometry. More efficient non-linear approximations may be constructed using functions whose support has a shape that can be adapted to the regularity of the image contours. For example, one may construct piecewise linear approximations with adapted triangulations [293, 178].

Figure 9.7: A piecewise linear approximation of $f = \mathbf{1}_\Omega$ is optimized with a triangulation whose triangles are narrow in the direction where $f$ is discontinuous, along the border $\partial\Omega$.
A function $f \in L^2[0,1]^2$ is approximated with a triangulation composed of $M$ triangles by a function $f_M$ that is linear on each triangle and which minimizes $\|f - f_M\|$. This is a two-dimensional extension of the spline approximations studied in Section 9.2.2. The difficulty is to optimize the geometry of the triangulation to reduce the error $\|f - f_M\|$. Let us consider the case where $f = \mathbf{1}_\Omega$, with a border $\partial\Omega$ which is a differentiable curve of finite length and bounded curvature. The triangles inside and outside $\Omega$ may have a large support since $f$ is constant and therefore linear on these triangles. On the other hand, the triangles that intersect $\partial\Omega$ must be narrow in order to minimize the approximation error in the direction where $f$ is discontinuous. One can use $M/2$ triangles for the inside and $M/2$ for the outside of $\Omega$. Since $\partial\Omega$ has a finite length, this border can be covered by $M/2$ triangles which have a length on the order of $M^{-1}$ in the direction of the tangent $\vec{\tau}$ of $\partial\Omega$. Since the curvature of $\partial\Omega$ is bounded, one can verify that the width of these triangles can be on the order of $M^{-2}$ in the direction perpendicular to $\vec{\tau}$. The border triangles are thus very narrow, as illustrated by Figure 9.7. One can now easily show that there exists a function $f_M$ that is linear on each triangle of this triangulation and such that $\|f - f_M\|^2 \sim M^{-2}$. This error thus decays more rapidly than the non-linear wavelet approximation error $\epsilon_n[M] \sim M^{-1}$. The adaptive triangulation yields a smaller error because it follows the geometrical regularity of the image contours.
Donoho studies the optimal approximation of particular classes of indicator functions with elongated wavelets called wedgelets [165]. However, at present there exists no algorithm for computing quasi-optimal approximations adapted to the geometry of complex images such as Lena. Solving this problem would improve image compression and denoising algorithms.

9.3 Adaptive Basis Selection 2
To optimize non-linear signal approximations, one can adaptively choose the basis depending on the signal. Section 9.3.1 explains how to select a "best" basis from a dictionary of bases, by minimizing a concave cost function. Wavelet packet and local cosine bases are large families of orthogonal bases that include different types of time-frequency atoms. A best wavelet packet basis or a best local cosine basis decomposes the signal over atoms that are adapted to the signal time-frequency structures. Section 9.3.2 introduces a fast best basis selection algorithm. The performance of a best basis approximation is evaluated in Section 9.3.3 through particular examples.

9.3.1 Best Basis and Schur Concavity

We consider a dictionary $\mathcal{D}$ that is a union of orthonormal bases in a signal space of finite dimension $N$:
$$\mathcal{D} = \bigcup_{\lambda \in \Lambda} \mathcal{B}^\lambda .$$
Each orthonormal basis is a family of $N$ vectors
$$\mathcal{B}^\lambda = \{ g_m^\lambda \}_{1 \le m \le N} .$$
Wavelet packets and local cosine trees are examples of dictionaries where the bases share some common vectors.

Comparison of Bases We want to optimize the non-linear approximation of $f$ by choosing a best basis in $\mathcal{D}$. Let $I_M^\lambda$ be the index set
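of the $M$ vectors of $\mathcal{B}^\lambda$ that maximize $|\langle f, g_m^\lambda \rangle|$. The best non-linear approximation of $f$ in $\mathcal{B}^\lambda$ is
$$f_M^\lambda = \sum_{m \in I_M^\lambda} \langle f, g_m^\lambda \rangle\, g_m^\lambda .$$
The approximation error is
$$\epsilon^\lambda[M] = \sum_{m \notin I_M^\lambda} |\langle f, g_m^\lambda \rangle|^2 = \|f\|^2 - \sum_{m \in I_M^\lambda} |\langle f, g_m^\lambda \rangle|^2 . \qquad (9.48)$$

Definition 9.1 We say that $\mathcal{B}^\alpha = \{ g_m^\alpha \}_{1 \le m \le N}$ is a better basis than $\mathcal{B}^\gamma = \{ g_m^\gamma \}_{1 \le m \le N}$ for approximating $f$ if for all $M \ge 1$
$$\epsilon^\alpha[M] \le \epsilon^\gamma[M] . \qquad (9.49)$$

This basis comparison is a partial order relation between bases in $\mathcal{D}$. Neither $\mathcal{B}^\alpha$ nor $\mathcal{B}^\gamma$ is better if there exist $M_0$ and $M_1$ such that
$$\epsilon^\alpha[M_0] < \epsilon^\gamma[M_0] \quad \text{and} \quad \epsilon^\alpha[M_1] > \epsilon^\gamma[M_1] . \qquad (9.50)$$
Inserting (9.48) proves that the better basis condition (9.49) is equivalent to:
$$\forall M \ge 1 , \quad \sum_{m \in I_M^\alpha} |\langle f, g_m^\alpha \rangle|^2 \ge \sum_{m \in I_M^\gamma} |\langle f, g_m^\gamma \rangle|^2 . \qquad (9.51)$$
The following theorem derives a criterion based on Schur concave cost functions.

Theorem 9.8 A basis $\mathcal{B}^\alpha$ is a better basis than $\mathcal{B}^\gamma$ to approximate $f$ if and only if for all concave functions $\Phi(u)$
$$\sum_{m=1}^{N} \Phi\Big( \frac{|\langle f, g_m^\alpha \rangle|^2}{\|f\|^2} \Big) \le \sum_{m=1}^{N} \Phi\Big( \frac{|\langle f, g_m^\gamma \rangle|^2}{\|f\|^2} \Big) . \qquad (9.52)$$

Proof 3. The proof of this theorem is based on the following classical result in the theory of majorization [45].

Lemma 9.1 (Hardy, Littlewood, Pólya) Let $x[m] \ge 0$ and $y[m] \ge 0$ be two positive sequences of size $N$, with
$$x[m] \ge x[m+1] \quad \text{and} \quad y[m] \ge y[m+1] \quad \text{for } 1 \le m < N , \qquad (9.53)$$
and $\sum_{m=1}^{N} x[m] = \sum_{m=1}^{N} y[m]$. For all $M \le N$ these sequences satisfy
$$\sum_{m=1}^{M} x[m] \ge \sum_{m=1}^{M} y[m] \qquad (9.54)$$
if and only if for all concave functions $\Phi(u)$
$$\sum_{m=1}^{N} \Phi(x[m]) \le \sum_{m=1}^{N} \Phi(y[m]) . \qquad (9.55)$$

We first prove that (9.54) implies (9.55). Let $\Phi$ be a concave function.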
We denote by $\mathbf{H}$ the set of vectors $z$ of dimension $N$ such that
$$z[1] \ge \cdots \ge z[N] .$$
For any $z \in \mathbf{H}$, we write the partial sum
$$S_z[M] = \sum_{m=1}^{M} z[m] .$$
We denote by $\Theta$ the multivariable function
$$\Theta(S_z[1], S_z[2], \ldots, S_z[N]) = \sum_{m=1}^{N} \Phi(z[m]) = \Phi(S_z[1]) + \sum_{m=2}^{N} \Phi(S_z[m] - S_z[m-1]) .$$
The sorting hypothesis (9.53) implies that $x \in \mathbf{H}$ and $y \in \mathbf{H}$, and we know that they have the same sum $S_x[N] = S_y[N]$. Condition (9.54) can be rewritten $S_x[M] \ge S_y[M]$ for $1 \le M < N$. To prove (9.55) is thus equivalent to showing that $\Theta$ is a decreasing function with respect to each of its arguments $S_z[k]$ as long as $z$ remains in $\mathbf{H}$. In other words, we must prove that for any $1 \le k \le N$
$$\Theta(S_z[1], S_z[2], \ldots, S_z[N]) \ge \Theta(S_z[1], \ldots, S_z[k-1], S_z[k] + \eta, S_z[k+1], \ldots, S_z[N]) ,$$
which means that
$$\sum_{m=1}^{N} \Phi(z[m]) \ge \sum_{m=1}^{k-1} \Phi(z[m]) + \Phi(z[k] + \eta) + \Phi(z[k+1] - \eta) + \sum_{m=k+2}^{N} \Phi(z[m]) . \qquad (9.56)$$
To guarantee that we remain in $\mathbf{H}$ despite the addition of $\eta$, its value must satisfy
$$z[k-1] \ge z[k] + \eta \ge z[k+1] - \eta \ge z[k+2] .$$
The inequality (9.56) amounts to proving that
$$\Phi(z[k]) + \Phi(z[k+1]) \ge \Phi(z[k] + \eta) + \Phi(z[k+1] - \eta) . \qquad (9.57)$$
Let us show that this is a consequence of the concavity of $\Phi$.
By definition, $\Phi$ is concave if for any $(x, y)$ and $0 \le \alpha \le 1$
$$\Phi(\alpha x + (1-\alpha) y) \ge \alpha\, \Phi(x) + (1-\alpha)\, \Phi(y) . \qquad (9.58)$$
Let us decompose
$$z[k] = \alpha\, (z[k] + \eta) + (1-\alpha)\, (z[k+1] - \eta)$$
and
$$z[k+1] = (1-\alpha)\, (z[k] + \eta) + \alpha\, (z[k+1] - \eta)$$
with
$$0 \le \alpha = \frac{z[k] - z[k+1] + \eta}{z[k] - z[k+1] + 2\eta} \le 1 .$$
Computing $\Phi(z[k]) + \Phi(z[k+1])$ and applying the concavity (9.58) yields (9.57). This finishes the proof of (9.55).
We now verify that (9.54) is true if (9.55) is valid for a particular family of concave thresholding functions defined by
$$\Phi_M(u) = \begin{cases} x[M] - u & \text{if } u \ge x[M] \\ 0 & \text{otherwise} \end{cases} .$$
Let us evaluate
$$\sum_{m=1}^{N} \Phi_M(x[m]) = M\, x[M] - \sum_{m=1}^{M} x[m] .$$
The hypothesis (9.55) implies that $\sum_{m=1}^{N} \Phi_M(x[m]) \le \sum_{m=1}^{N} \Phi_M(y[m])$. Moreover $\Phi_M(u) \le 0$ and $\Phi_M(u) \le x[M] - u$ so
$$M\, x[M] - \sum_{m=1}^{M} x[m] \le \sum_{m=1}^{N} \Phi_M(y[m]) \le \sum_{m=1}^{M} \Phi_M(y[m]) \le M\, x[M] - \sum_{m=1}^{M} y[m] ,$$
which proves (9.54) and thus Lemma 9.1.
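Lemma 9.1 can be illustrated on a small example: when the partial sums of one sorted sequence dominate those of another with the same total, every concave sum is smaller for the first sequence. The sketch below checks this for three concave functions, including a thresholding function of the form used in the proof; the sequences `x`, `y` and the helper names are assumptions chosen for illustration.

```python
import numpy as np

def partial_sums_dominate(x, y, tol=1e-12):
    """Condition (9.54): partial sums of sorted x dominate those of sorted y."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]
    ys = np.sort(np.asarray(y, dtype=float))[::-1]
    return bool(np.all(np.cumsum(xs) >= np.cumsum(ys) - tol))

def concave_sum(phi, x):
    """Left- or right-hand side of (9.55) for a concave function phi."""
    return float(np.sum(phi(np.asarray(x, dtype=float))))

# Two normalized energy distributions with the same sum (assumed example).
x = np.array([0.7, 0.2, 0.1, 0.0])   # concentrated
y = np.array([0.4, 0.3, 0.2, 0.1])   # more spread out

concave_functions = {
    "sqrt": np.sqrt,
    "entropy": lambda u: -u * np.log(np.where(u > 0, u, 1.0)),  # 0 log 0 = 0
    "threshold": lambda u: np.minimum(0.0, 0.2 - u),            # Phi_M with x[M] = 0.2
}

x_dominates = partial_sums_dominate(x, y)
concave_sums_ordered = all(
    concave_sum(phi, x) <= concave_sum(phi, y) + 1e-12
    for phi in concave_functions.values()
)
```

Testing a finite list of concave functions can only falsify the ordering, not establish it; the partial-sum test is the one that decides.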
The statement of the theorem is a direct consequence of Lemma 9.1. For any basis $\mathcal{B}^\lambda$, we sort the inner products $|\langle f, g_m^\lambda \rangle|$ and denote
$$x^\lambda[k] = \frac{|\langle f, g_{m_k}^\lambda \rangle|^2}{\|f\|^2} \ge x^\lambda[k+1] = \frac{|\langle f, g_{m_{k+1}}^\lambda \rangle|^2}{\|f\|^2} .$$
The energy conservation in an orthogonal basis implies $\sum_{k=1}^{N} x^\lambda[k] = 1$. Condition (9.51) proves that a basis $\mathcal{B}^\alpha$ is better than a basis $\mathcal{B}^\gamma$ if and only if for all $M \ge 1$
$$\sum_{k=1}^{M} x^\alpha[k] \ge \sum_{k=1}^{M} x^\gamma[k] .$$
Lemma 9.1 proves that this is equivalent to imposing that for all concave functions $\Phi$,
$$\sum_{k=1}^{N} \Phi(x^\alpha[k]) \le \sum_{k=1}^{N} \Phi(x^\gamma[k]) ,$$
which is identical to (9.52).

In practice, two bases are compared using a single concave function
$\Phi(u)$. The cost of approximating $f$ in a basis $\mathcal{B}^\lambda$ is defined by the Schur concave sum
$$C(f, \mathcal{B}^\lambda) = \sum_{m=1}^{N} \Phi\Big( \frac{|\langle f, g_m^\lambda \rangle|^2}{\|f\|^2} \Big) .$$
Theorem 9.8 proves that if $\mathcal{B}^\alpha$ is a better basis than $\mathcal{B}^\gamma$ for approximating $f$ then
$$C(f, \mathcal{B}^\alpha) \le C(f, \mathcal{B}^\gamma) . \qquad (9.59)$$
This condition is necessary but not sufficient to guarantee that $\mathcal{B}^\alpha$ is better than $\mathcal{B}^\gamma$ since we test a single concave function. Coifman and Wickerhauser [140] find a best basis $\mathcal{B}^\alpha$ in $\mathcal{D}$ by minimizing the cost of $f$:
$$C(f, \mathcal{B}^\alpha) = \min_{\lambda \in \Lambda} C(f, \mathcal{B}^\lambda) .$$
There exists no better basis in $\mathcal{D}$ to approximate $f$. However, there are often other bases in $\mathcal{D}$ that are equivalent in the sense of (9.50). In this case, the choice of the best basis depends on the particular concave function $\Phi$.

Ideal and Diffusing Bases An ideal basis $\mathcal{B}$ for approximating $f$ has one of its vectors proportional to $f$, say $g_m = \eta f$ with $\eta \in \mathbb{C}$.
Clearly $f$ can then be recovered with a single basis vector. If $\Phi(0) = 0$ then the cost of $f$ in this basis is $C(f, \mathcal{B}) = \Phi(1)$. In contrast, a worst basis for approximating $f$ is a basis $\mathcal{B}$ that diffuses uniformly the energy of $f$ across all vectors:
$$|\langle f, g_m \rangle|^2 = \frac{\|f\|^2}{N} \quad \text{for } 0 \le m < N .$$
The cost of $f$ in a diffusing basis is $C(f, \mathcal{B}) = N\, \Phi(N^{-1})$.

Proposition 9.7 Any basis $\mathcal{B}$ is worse than an ideal basis and better than a diffusing basis for approximating $f$. If $\Phi(0) = 0$ then
$$\Phi(1) \le C(f, \mathcal{B}) \le N\, \Phi\Big( \frac{1}{N} \Big) . \qquad (9.60)$$

Proof 2. An ideal basis is clearly better than any other basis in the sense of Definition 9.1, since it produces a zero error for $M \ge 1$. The approximation error from $M$ vectors in a diffusing basis is $\|f\|^2 (N - M)/N$. To prove that any basis $\mathcal{B}$ is better than a diffusing basis, observe that if $m$ is not in the index set $I_M$ corresponding to the $M$ largest inner products then
$$|\langle f, g_m \rangle|^2 \le \frac{1}{M} \sum_{n \in I_M} |\langle f, g_n \rangle|^2 \le \frac{\|f\|^2}{M} . \qquad (9.61)$$
Summing the first inequality of (9.61) over the $N - M$ indices $m \notin I_M$, and using $\sum_{n \in I_M} |\langle f, g_n \rangle|^2 = \|f\|^2 - \epsilon[M]$, gives $\epsilon[M] \le \frac{N-M}{M} \big( \|f\|^2 - \epsilon[M] \big)$. The approximation error from $M$ vectors thus satisfies
$$\epsilon[M] = \sum_{m \notin I_M} |\langle f, g_m \rangle|^2 \le \|f\|^2\, \frac{N - M}{N} ,$$
which proves that it is smaller than the approximation error in a diffusing basis. The costs of ideal and diffusing bases are respectively $\Phi(1)$ and $N\, \Phi(N^{-1})$. We thus derive (9.60) from (9.59).

Examples of Cost Functions As mentioned earlier, if there exists no basis that is better than all other bases in $\mathcal{D}$, the "best" basis that
minimizes $C(f,\mathcal{B})$ depends on the choice of $\Phi$.

Entropy

The entropy $\Phi(x) = -x\log_e x$ is concave for $x \ge 0$. The corresponding cost is called the entropy of the energy distribution
$$C(f,\mathcal{B}) = -\sum_{m=1}^{N} \frac{|\langle f, g_m\rangle|^2}{\|f\|^2}\,\log_e \frac{|\langle f, g_m\rangle|^2}{\|f\|^2}. \tag{9.62}$$
Proposition 9.7 proves that
$$0 \le C(f,\mathcal{B}) \le \log_e N. \tag{9.63}$$
It reaches the upper bound $\log_e N$ for a diffusing basis.
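The bounds (9.63) can be checked numerically. The following is a minimal sketch, assuming the inner products $\langle f, g_m\rangle$ in an orthonormal basis are already available as a list; the function name `entropy_cost` and the test vectors are illustrative and not part of the text.

```python
import math

def entropy_cost(coeffs):
    """Entropy cost (9.62) of the inner products <f, g_m> in an orthonormal basis."""
    # For an orthonormal basis, the sum of squared coefficients equals ||f||^2.
    energy = sum(abs(c) ** 2 for c in coeffs)
    cost = 0.0
    for c in coeffs:
        x = abs(c) ** 2 / energy
        if x > 0.0:  # convention 0 * log(0) = 0, consistent with Phi(0) = 0
            cost -= x * math.log(x)
    return cost

N = 8
ideal = [3.0] + [0.0] * (N - 1)   # all the energy on one vector
diffusing = [1.0] * N             # energy spread uniformly
print(entropy_cost(ideal), entropy_cost(diffusing), math.log(N))
```

The ideal basis gives the lower bound of (9.63) and the diffusing basis gives a value close to $\log_e N$, up to rounding.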
Let us emphasize that this entropy is a priori not related to the number of bits required to encode the inner products $\langle f, g_m\rangle$. The Shannon Theorem 11.1 proves that a lower bound for the number of bits to encode individually each $\langle f, g_m\rangle$ is the entropy of the probability distribution of the values taken by $\langle f, g_m\rangle$. This probability distribution might be very different from the distribution of the normalized energies $|\langle f, g_m\rangle|^2/\|f\|^2$. For example, if $\langle f, g_m\rangle = A$ for $0 \le m < N$ then $|\langle f, g_m\rangle|^2/\|f\|^2 = N^{-1}$ and the cost $C(f,\mathcal{B}) = \log_e N$ is maximum. In contrast, the probability distribution of the inner product is a discrete Dirac located at $A$ and its entropy is therefore minimum and equal to 0.

$l^p$ Cost

For $p < 2$, $\Phi(x) = x^{p/2}$ is concave for $x \ge 0$. The resulting
cost is
$$C(f,\mathcal{B}) = \sum_{m=1}^{N} \frac{|\langle f, g_m\rangle|^p}{\|f\|^p}.$$
Proposition 9.7 proves that it is always bounded by
$$1 \le C(f,\mathcal{B}) \le N^{1-p/2}. \tag{9.64}$$
This cost measures the $l^p$ norm of the coefficients of $f$ in $\mathcal{B}$:
$$C^{1/p}(f,\mathcal{B}) = \frac{\|f\|_{\mathcal{B},p}}{\|f\|}.$$
We derive from (9.26) that the approximation error $\epsilon[M]$ is bounded by
$$\epsilon[M] \le \frac{\|f\|^2\, C^{2/p}(f,\mathcal{B})}{2/p - 1}\,\frac{1}{M^{2/p-1}}.$$
The minimization of this $l^p$ cost can thus also be interpreted as a reduction of the decay factor $C$ such that
$$\epsilon[M] \le \frac{C}{M^{2/p-1}}.$$
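As an illustration of the $l^p$ cost and of the error bound derived from (9.26), here is a small sketch. It assumes coefficients in an orthonormal basis given as a list; the names `lp_cost`, `approximation_error` and `error_bound` are hypothetical helpers introduced for this example only.

```python
def lp_cost(coeffs, p):
    """l^p cost: sum over m of |<f, g_m>|^p / ||f||^p, for p < 2."""
    norm_f = sum(abs(c) ** 2 for c in coeffs) ** 0.5
    return sum((abs(c) / norm_f) ** p for c in coeffs)

def approximation_error(coeffs, M):
    """Error of the non-linear approximation keeping the M largest coefficients."""
    energies = sorted((abs(c) ** 2 for c in coeffs), reverse=True)
    return sum(energies[M:])

def error_bound(coeffs, p, M):
    """Upper bound ||f||^2 C^(2/p) / (2/p - 1) * M^(1 - 2/p) on the error."""
    energy = sum(abs(c) ** 2 for c in coeffs)
    cost = lp_cost(coeffs, p)
    return energy * cost ** (2.0 / p) / (2.0 / p - 1.0) * M ** (1.0 - 2.0 / p)

coeffs = [1.0 / (m + 1) for m in range(32)]  # arbitrary decaying coefficients
for M in (1, 4, 16):
    print(M, approximation_error(coeffs, M), error_bound(coeffs, 1.0, M))
```

For $p = 1$ the bound reduces to $(\sum_m |\langle f, g_m\rangle|)^2 / M$, which makes the comparison easy to verify by hand on small examples.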
9.3.2 Fast Best Basis Search in Trees

A best wavelet packet or local cosine basis divides the time-frequency plane into elementary atoms that are best adapted to approximate a particular signal. The construction of dictionaries of wavelet packet and local cosine bases is explained in Sections 8.1 and 8.4. For signals of size $N$, these dictionaries include more than $2^{N/2}$ bases. The best basis associated to $f$ minimizes the cost
$$C(f,\mathcal{B}^\lambda) = \sum_{m=0}^{N-1} \Phi\left(\frac{|\langle f, g_m^\lambda\rangle|^2}{\|f\|^2}\right). \tag{9.65}$$
Finding this minimum by a brute force comparison of the cost of all wavelet packet or local cosine bases would require more than $N\,2^{N/2}$ operations, which is computationally prohibitive. The fast dynamic programming algorithm of Coifman and Wickerhauser [140] finds the best basis with $O(N \log_2 N)$ operations, by taking advantage of the tree structure of these dictionaries.

Dynamic Programming

In wavelet packet and local cosine binary
trees, each node corresponds to a space $W_j^p$, which admits an orthonormal basis $\mathcal{B}_j^p$ of wavelet packets or local cosines. This space is divided in two orthogonal subspaces located at the children nodes:
$$W_j^p = W_{j+1}^{2p} \oplus W_{j+1}^{2p+1}.$$
In addition to $\mathcal{B}_j^p$ we can thus construct an orthogonal basis of $W_j^p$ with a union of orthogonal bases of $W_{j+1}^{2p}$ and $W_{j+1}^{2p+1}$. The root of the tree corresponds to a space of dimension $N$, which is $W_0^0$ for local cosine bases and $W_L^0$ with $2^L = N^{-1}$ for wavelet packet bases.

The cost of $f$ in a family of $M \le N$ orthonormal vectors $\mathcal{B} = \{g_m\}_{0\le m<M}$ is defined by the partial sum
$$C(f,\mathcal{B}) = \sum_{m=0}^{M-1} \Phi\left(\frac{|\langle f, g_m\rangle|^2}{\|f\|^2}\right). \tag{9.66}$$
This cost is additive in the sense that for any orthonormal bases $\mathcal{B}^0$ and $\mathcal{B}^1$ of two orthogonal spaces
$$C(f,\mathcal{B}^0 \cup \mathcal{B}^1) = C(f,\mathcal{B}^0) + C(f,\mathcal{B}^1). \tag{9.67}$$
The best basis $\mathcal{O}_j^p$ of $W_j^p$ is the basis that minimizes the cost (9.66), among all the bases of $W_j^p$ that can be constructed from the vectors in the tree. The following proposition gives a recursive construction of best bases, from bottom up along the tree branches.

Proposition 9.8 (Coifman, Wickerhauser). If $C$ is an additive cost
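function then
$$\mathcal{O}_j^p = \begin{cases} \mathcal{O}_{j+1}^{2p} \cup \mathcal{O}_{j+1}^{2p+1} & \text{if } C(f,\mathcal{O}_{j+1}^{2p}) + C(f,\mathcal{O}_{j+1}^{2p+1}) < C(f,\mathcal{B}_j^p) \\ \mathcal{B}_j^p & \text{if } C(f,\mathcal{O}_{j+1}^{2p}) + C(f,\mathcal{O}_{j+1}^{2p+1}) \ge C(f,\mathcal{B}_j^p) \end{cases} \tag{9.68}$$

Proof$^2$. The best basis $\mathcal{O}_j^p$ is either equal to $\mathcal{B}_j^p$ or to the union $\mathcal{B}^0 \cup \mathcal{B}^1$ of two bases of $W_{j+1}^{2p}$ and $W_{j+1}^{2p+1}$. In this second case, the additivity property (9.67) implies that the cost of $f$ in $\mathcal{O}_j^p$ is minimum if $\mathcal{B}^0$ and $\mathcal{B}^1$ minimize the cost of $f$ in $W_{j+1}^{2p}$ and $W_{j+1}^{2p+1}$. Hence $\mathcal{B}^0 = \mathcal{O}_{j+1}^{2p}$ and $\mathcal{B}^1 = \mathcal{O}_{j+1}^{2p+1}$. This proves that $\mathcal{O}_j^p$ is either $\mathcal{B}_j^p$ or $\mathcal{O}_{j+1}^{2p} \cup \mathcal{O}_{j+1}^{2p+1}$. The best basis is obtained by comparing the cost of these two possibilities.

The best basis of the space at the root of the tree is obtained by finding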
the best bases of all spaces $W_j^p$ in the tree, with a bottom-up progression. At the bottom of the tree, each $W_J^p$ is not subdecomposed. The best basis of $W_J^p$ is thus the only basis available: $\mathcal{O}_J^p = \mathcal{B}_J^p$. The best bases of the spaces $\{W_j^p\}_p$ are then recursively computed from the best bases of the spaces $\{W_{j+1}^p\}_p$ with the aggregation relation (9.68). Repeating this for decreasing $j < J$ until the root gives the best basis of $f$ in $W_0^0$ for local cosine bases and in $W_L^0$ for wavelet packet bases.
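The bottom-up recursion (9.68) can be sketched as follows. This is an illustration under stated assumptions, not the WaveLab implementation: the costs $C(f,\mathcal{B}_j^p)$ are assumed to be precomputed and stored in a dictionary `node_cost` keyed by `(j, p)`, with `j = 0` at the root and `j = depth` at the leaves, which is a hypothetical indexing chosen for simplicity.

```python
def best_basis(node_cost, depth):
    """Bottom-up best-basis search following the aggregation relation (9.68).

    node_cost[(j, p)] is the additive cost of node p at level j.
    Returns (minimum cost at the root, list of selected (j, p) nodes).
    """
    best = {}  # (j, p) -> (cost of the best basis of the node, its nodes)
    # Leaves are not subdecomposed: their best basis is the only one available.
    for p in range(2 ** depth):
        best[(depth, p)] = (node_cost[(depth, p)], [(depth, p)])
    # Move up one level at a time, comparing each node with its two children.
    for j in range(depth - 1, -1, -1):
        for p in range(2 ** j):
            left_cost, left_nodes = best[(j + 1, 2 * p)]
            right_cost, right_nodes = best[(j + 1, 2 * p + 1)]
            children_cost = left_cost + right_cost  # additivity (9.67)
            if children_cost < node_cost[(j, p)]:
                best[(j, p)] = (children_cost, left_nodes + right_nodes)
            else:
                best[(j, p)] = (node_cost[(j, p)], [(j, p)])
    return best[(0, 0)]

# Arbitrary illustrative costs on a tree of depth 2.
costs = {(0, 0): 6.0,
         (1, 0): 2.0, (1, 1): 3.5,
         (2, 0): 1.5, (2, 1): 1.0, (2, 2): 1.0, (2, 3): 2.0}
print(best_basis(costs, 2))
```

Each node is visited once, so the search itself is linear in the number of nodes once the costs are known; the dominant work is computing the inner products that define those costs.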
The fast wavelet packet or local cosine algorithms compute the inner product of $f$ with all the vectors in the tree with respectively $O(N\log_2 N)$ and $O(N(\log_2 N)^2)$ operations. At a level of the tree indexed by $j$, there is a total of $N$ vectors in the orthogonal bases $\{\mathcal{B}_j^p\}_p$. The costs $\{C(f,\mathcal{B}_j^p)\}_p$ are thus calculated with $O(N)$ operations by summing (9.66). The computation of the best basis of all the spaces $\{W_j^p\}_p$ from the best bases of $\{W_{j+1}^p\}_p$ via (9.68) thus requires $O(N)$ operations. Since the depth of the tree is smaller than $\log_2 N$, the best basis of the space at the root is selected with $O(N\log_2 N)$ operations.

Best Bases of Images

Wavelet packet and local cosine bases of images are organized in quad-trees described in Sections 8.2.1 and 8.5.3.
Each node of the quad-tree is associated to a space $W_j^{p,q}$, which admits a separable basis $\mathcal{B}_j^{p,q}$ of wavelet packet or local cosine vectors. This space is divided into four subspaces located at the four children nodes:
$$W_j^{p,q} = W_{j+1}^{2p,2q} \oplus W_{j+1}^{2p+1,2q} \oplus W_{j+1}^{2p,2q+1} \oplus W_{j+1}^{2p+1,2q+1}.$$
The union of orthogonal bases of the four children spaces thus defines an orthogonal basis of $W_j^{p,q}$. At the root of the quad-tree is a space of dimension $N^2$, which corresponds to $W_0^{0,0}$ for local cosine bases and to $W_L^{0,0}$ with $2^L = N^{-1}$ for wavelet packet bases.

Let $\mathcal{O}_j^{p,q}$ be the best basis of $W_j^{p,q}$ for a signal $f$. Like Proposition 9.8, the following proposition relates the best basis of $W_j^{p,q}$ to the best bases of its children. It is proved with the same derivations.

Proposition 9.9 (Coifman, Wickerhauser). Suppose that $C$ is an additive cost function. If
$$C(f,\mathcal{B}_j^{p,q}) < C(f,\mathcal{O}_{j+1}^{2p,2q}) + C(f,\mathcal{O}_{j+1}^{2p+1,2q}) + C(f,\mathcal{O}_{j+1}^{2p,2q+1}) + C(f,\mathcal{O}_{j+1}^{2p+1,2q+1})$$
then
$$\mathcal{O}_j^{p,q} = \mathcal{B}_j^{p,q},$$
otherwise
$$\mathcal{O}_j^{p,q} = \mathcal{O}_{j+1}^{2p,2q} \cup \mathcal{O}_{j+1}^{2p+1,2q} \cup \mathcal{O}_{j+1}^{2p,2q+1} \cup \mathcal{O}_{j+1}^{2p+1,2q+1}.$$

This recursive relation computes the best basis of $\{W_j^{p,q}\}_{p,q}$ from the best bases of the spaces $\{W_{j+1}^{p,q}\}_{p,q}$, with $O(N^2)$ operations. Iterating this procedure from the bottom of the tree to the top finds the best basis of $f$ with $O(N^2\log_2 N)$ calculations.

9.3.3 Wavelet Packet and Local Cosine Best Bases

The performance of best wavelet packet and best local cosine approximations depends on the time-frequency properties of $f$. We evaluate
these approximations through examples that also reveal their limitations.

[Figure 9.8. Top panel: signal $f(t)$ for $t$ from 0 to 1. Bottom panel: time-frequency plane, frequency $\omega/2\pi$ from 0 to 250 against time $t$ from 0 to 1.]

Figure 9.8: The top signal includes two hyperbolic chirps. The Heisenberg boxes of the best wavelet packet basis are shown below. The darkness of each rectangle is proportional to the amplitude of the wavelet packet coefficient.

Best Wavelet Packet Bases

A wavelet packet basis divides the frequency axis into intervals of varying sizes. Each frequency interval is covered by a wavelet packet function that is translated uniformly in
time. A best wavelet packet basis can thus be interpreted as a "best" frequency segmentation.

A signal is well approximated by a best wavelet packet basis if in any frequency interval, the high energy structures have a similar time-frequency spread. The time translation of the wavelet packet that covers this frequency interval is then well adapted to approximating all the signal structures in this frequency range that appear at different times. Figure 9.8 gives the best wavelet packet basis computed with the entropy $\Phi(u) = -u\log_e u$, for a signal composed of two hyperbolic chirps. The wavelet packet tree was calculated with the Symmlet 8 conjugate mirror filter. The time-support of the wavelet packets is reduced at high frequencies to adapt itself to the rapid modification of the chirps' frequency content. The energy distribution revealed by the wavelet packet Heisenberg boxes is similar to the scalogram shown in Figure 4.17. Figure 8.6 gives another example of a best wavelet packet basis, for a different multi-chirp signal. Let us mention that the application of best wavelet packet bases to pattern recognition remains difficult because they are not translation invariant. If the signal is translated, its wavelet packet coefficients are severely modified and the minimization of the cost function may yield a different basis. This remark applies to local cosine bases as well.

Figure 9.9: Time-frequency energy distribution of the four elementary atoms in (9.69). [The atoms are located at times $u_0$, $u_1$ and frequencies $\xi_0$, $\xi_1$, with time spreads $s_0$ and $s_1$.]

If the signal includes different types of high energy structures, located at different times but in the same frequency interval, there is no wavelet packet basis that is well adapted to all of them. Consider, for example, a sum of four transients centered respectively at $u_0$ and $u_1$, at two different frequencies $\xi_0$ and $\xi_1$:
$$f(t) = \frac{K_0}{\sqrt{s_0}}\, g\left(\frac{t-u_0}{s_0}\right) e^{i\xi_0 t} + \frac{K_1}{\sqrt{s_1}}\, g\left(\frac{t-u_1}{s_1}\right) e^{i\xi_0 t} + \frac{K_2}{\sqrt{s_1}}\, g\left(\frac{t-u_0}{s_1}\right) e^{i\xi_1 t} + \frac{K_3}{\sqrt{s_0}}\, g\left(\frac{t-u_1}{s_0}\right) e^{i\xi_1 t}. \tag{9.69}$$
The smooth window $g$ has a Fourier transform $\hat g$ whose energy is concentrated at low frequencies. The Fourier transforms of the four transients have their energy concentrated in frequency bands centered respectively at $\xi_0$ and $\xi_1$:
$$\hat f(\omega) = K_0\sqrt{s_0}\,\hat g\big(s_0(\omega-\xi_0)\big)\, e^{-iu_0(\omega-\xi_0)} + K_1\sqrt{s_1}\,\hat g\big(s_1(\omega-\xi_0)\big)\, e^{-iu_1(\omega-\xi_0)} + K_2\sqrt{s_1}\,\hat g\big(s_1(\omega-\xi_1)\big)\, e^{-iu_0(\omega-\xi_1)} + K_3\sqrt{s_0}\,\hat g\big(s_0(\omega-\xi_1)\big)\, e^{-iu_1(\omega-\xi_1)}.$$
If $s_0$ and $s_1$ have different values, the time and frequency spread of these transients is different, which is illustrated in Figure 9.9. In the best wavelet packet basis selection, the first transient $K_0\, s_0^{-1/2}\, g(s_0^{-1}(t-u_0)) \exp(i\xi_0 t)$ "votes" for a wavelet packet whose scale $2^j$ is of the order $s_0$ at the frequency $\xi_0$ whereas $K_1\, s_1^{-1/2}\, g(s_1^{-1}(t-u_1)) \exp(i\xi_0 t)$ "votes" for a wavelet packet whose scale $2^j$ is close to $s_1$ at the same frequency. The "best" wavelet packet is adapted to the transient of highest energy, which yields the strongest vote in the cost (9.65). The energy of the smaller transient is then spread across many "best" wavelet packets. The same thing happens for the second pair of transients located in the frequency neighborhood of $\xi_1$.
Speech recordings are examples of signals whose properties change rapidly in time. At two different instants, in the same frequency neighborhood, the signal may have totally different energy distributions. A best wavelet packet is not adapted to this time variation and gives poor non-linear approximations.

As in one dimension, an image is well approximated in a best wavelet packet basis if its structures within a given frequency band have similar properties across the whole image. For natural scene images, the best wavelet packet often does not provide much better non-linear approximations than the wavelet basis included in this wavelet packet dictionary. For specific classes of images such as fingerprints, one may find wavelet packet bases that significantly outperform the wavelet basis [103].

[Figure 9.10. Top panel: signal amplitude in units of $10^4$ for $t$ from 0 to 1. Bottom panel: time-frequency plane, frequency $\omega/2\pi$ from 0 to 250 against time $t$ from 0 to 1.]

Figure 9.10: Recording of bird song. The Heisenberg boxes of the best local cosine basis are shown below. The darkness of each rectangle is proportional to the amplitude of the local cosine coefficient.

Best Local Cosine Bases

A local cosine basis divides the time axis into intervals of varying sizes. A best local cosine basis thus adapts the time segmentation to the variations of the signal time-frequency structures. In comparison with wavelet packets, we gain time adaptation but
we lose frequency flexibility. A best local cosine basis is therefore well adapted to approximating signals whose properties may vary in time, but which do not include structures of very different time and frequency spread at any given time. Figure 9.10 shows the Heisenberg boxes of the best local cosine basis for the recording of a bird song, computed with an entropy cost. Figure 8.19 shows the best local cosine basis for a speech recording.

The sum of four transients (9.69) is not efficiently represented in a wavelet packet basis but neither is it well approximated in a best local cosine basis. Indeed, if the scales $s_0$ and $s_1$ are very different, at $u_0$ and $u_1$ this signal includes two transients at the frequency $\xi_0$ and $\xi_1$ that have a very different time-frequency spread. In each time neighborhood, the size of the window is adapted to the transient of highest energy. The energy of the second transient is spread across many local cosine vectors. Efficient approximations of such signals require using larger dictionaries of bases, which can simultaneously divide the time and frequency axes in intervals of various sizes [208].

Figure 9.11: The grid shows the approximate support of square overlapping windows in the best local cosine basis, computed with an $l^1$ cost.

In two dimensions, a best local cosine basis divides an image into square windows whose sizes are adapted to the spatial variations of local image structures. Figure 9.11 shows the best basis segmentation of the Barbara image, computed with an $l^1$ cost calculated with $\Phi(u) = u^{1/2}$. The squares are bigger in regions where the image structures remain nearly the same. Figure 8.22 shows another example of image segmentation with a best local cosine basis computed with the same cost function. As in one dimension, a best local cosine basis is an efficient representation if the image does not include very different frequency structures in the same spatial region.

9.4 Approximations with Pursuits $^3$
A music recording often includes notes of different durations at the same time, which means that such a signal is not well represented in a best local cosine basis. The same musical note may also have different durations when played at different times, in which case a best wavelet packet basis is also not well adapted to represent this sound. To approximate musical signals efficiently, the decomposition must have the same flexibility as the composer, who can freely choose the time-frequency atoms (notes) that are best adapted to represent a sound.

Wavelet packet and local cosine dictionaries include $P = N\log_2 N$ different vectors. The set of orthogonal bases is much smaller than the set of non-orthogonal bases that could be constructed by choosing $N$ linearly independent vectors from these $P$. To improve the approximation of complex signals such as music recordings, we study general non-orthogonal signal decompositions.

Consider the space of signals of size $N$. Let $\mathcal{D} = \{g_p\}_{0\le p<P}$ be a redundant dictionary of $P > N$ vectors, which includes at least $N$ linearly independent vectors. For any $M \ge 1$, an approximation $f_M$ of $f$ may be calculated with a linear combination of any $M$ dictionary vectors:
$$f_M = \sum_{m=0}^{M-1} a[p_m]\, g_{p_m}.$$
The freedom of choice opens the door to a considerable combinatorial explosion. For general dictionaries of $P > N$ vectors, computing the approximation $f_M$ that minimizes $\|f - f_M\|$ is an NP hard problem [151]. This means that there is no known polynomial time algorithm that can solve this optimization.

Pursuit algorithms reduce the computational complexity by searching for efficient but non-optimal approximations. A basis pursuit formulates the search as a linear programming problem, providing remarkably good approximations with $O(N^{3.5}\log_2^{3.5} N)$ operations. For large signals, this remains prohibitive. Matching pursuits are faster greedy algorithms whose application to large time-frequency dictionaries is described in Section 9.4.2. An orthogonalized pursuit is presented in Section 9.4.3.

9.4.1 Basis Pursuit

We study the construction of a "best" basis $\mathcal{B}$, not necessarily orthogonal, for efficiently approximating a signal $f$. The $N$ vectors of $\mathcal{B} =$
$\{g_{p_m}\}_{0\le m<N}$ are selected from a redundant dictionary $\mathcal{D} = \{g_p\}_{0\le p<P}$ with a pursuit elaborated by Chen and Donoho [119]. Let us decompose $f$ in this basis:
$$f = \sum_{m=0}^{N-1} a[p_m]\, g_{p_m}. \tag{9.70}$$
If we had restricted ourselves to orthogonal bases, Section 9.3.1 explains that the basis choice would be optimized by minimizing
$$C(f,\mathcal{B}) = \sum_{m=0}^{N-1} \Phi\left(\frac{|a[p_m]|^2}{\|f\|^2}\right) \tag{9.71}$$
where $\Phi(u)$ is concave. For non-orthogonal bases, this result does not hold in general.

Despite the absence of orthogonality, a basis pursuit searches for a "best" basis that minimizes (9.71) for $\Phi(u) = u^{1/2}$:
$$C(f,\mathcal{B}) = \frac{1}{\|f\|}\sum_{m=0}^{N-1} |a[p_m]|. \tag{9.72}$$
Minimizing the $l^1$ norm of the decomposition coefficients avoids diffusing the energy of $f$ among many vectors. It reduces cancellations between the vectors $a[p_m]\,g_{p_m}$ that decompose $f$, because such cancellations increase $|a[p_m]|$ and thus increase the cost (9.72). The minimization of an $l^1$ norm is also related to linear programming, which leads to fast computational algorithms.

Linear Programming

Instead of immediately isolating subsets of $N$ vectors in the dictionary $\mathcal{D}$, a linear system of size $P$ is written with
all dictionary vectors
$$\sum_{p=0}^{P-1} a[p]\, g_p[n] = f[n] \tag{9.73}$$
while trying to minimize
$$\sum_{p=0}^{P-1} |a[p]|. \tag{9.74}$$
The system (9.73) can be expressed in matrix form with the $N \times P$ matrix $G = \{g_p[n]\}_{0\le n<N,\ 0\le p<P}$:
$$G a = f. \tag{9.75}$$
Although the minimization of (9.74) is nonlinear, it can be reformulated as a linear programming problem.

A standard-form linear programming problem [28] is a constrained optimization over positive vectors of size $L$. Let $b[n]$ be a vector of size $N < L$, $c[p]$ a non-zero vector of size $L$ and $A[n,p]$ an $N \times L$ matrix. We must find $x[p] \in \mathbb{R}^L$ such that $x[p] \ge 0$, while minimizing
$$\sum_{p=0}^{L-1} x[p]\, c[p] \tag{9.76}$$
subject to $Ax = b$.
To reformulate the minimization of (9.74) subject to (9.75) as a linear programming problem, we introduce "slack variables" $u[p] \ge 0$ and $v[p] \ge 0$ such that
$$a[p] = u[p] - v[p].$$
As a result
$$Ga = Gu - Gv = f \tag{9.77}$$
and
$$\sum_{p=0}^{P-1} |a[p]| = \sum_{p=0}^{P-1} u[p] + \sum_{p=0}^{P-1} v[p]. \tag{9.78}$$
We thus obtain a standard form linear programming problem of size $L = 2P$ with
$$A = (G,\ -G), \quad x = \begin{pmatrix} u \\ v \end{pmatrix}, \quad b = f, \quad c = 1.$$
The matrix $A$ of size $N \times L$ has rank $N$ because the dictionary $\mathcal{D}$ includes $N$ linearly independent vectors. A standard result of linear programming [28] proves that the vector $x$ has at most $N$ non-zero coefficients. One can also verify that if $a[p] > 0$ then $a[p] = u[p]$ and $v[p] = 0$ whereas if $a[p] \le 0$ then $a[p] = -v[p]$ and $u[p] = 0$. In the non-degenerate case, which is most often encountered, the non-zero coefficients of $x[p]$ thus correspond to $N$ indices $\{p_m\}_{0\le m<N}$ such that $\{g_{p_m}\}_{0\le m<N}$ are linearly independent. This is the best basis of $\mathbb{R}^N$ that minimizes the cost (9.72).

Linear Programming Computations

The collection of feasible points $\{x : Ax = b,\ x \ge 0\}$ is a convex polyhedron in $\mathbb{R}^L$. The vertices of this polyhedron are solutions $x[p]$ having at most $N$ non-zero coefficients. The linear cost (9.76) can be minimum only at a vertex of this polyhedron. In the non-degenerate case, the $N$ non-zero coefficients correspond to $N$ column vectors $\mathcal{B} = \{g_{p_m}\}_{0\le m<N}$ that form a basis.
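The slack-variable reformulation of (9.74) and (9.75) can be made concrete with a small sketch. It only builds the standard-form data $A = (G, -G)$, $b = f$, $c = 1$ and checks that a given coefficient vector maps to a feasible point with the same cost; it does not solve the linear program. The helper names `split_slack` and `lp_data`, and the tiny dictionary used below, are assumptions made for illustration.

```python
def split_slack(a):
    """Split coefficients a[p] into u[p] >= 0 and v[p] >= 0 with a = u - v."""
    u = [max(x, 0.0) for x in a]
    v = [max(-x, 0.0) for x in a]
    return u, v

def lp_data(G, f):
    """Standard-form data A = (G, -G), b = f, c = 1, with G given as N rows of P entries."""
    A = [row + [-g for g in row] for row in G]
    b = list(f)
    c = [1.0] * (2 * len(G[0]))
    return A, b, c

# Hypothetical dictionary of P = 3 vectors in dimension N = 2 (columns of G).
G = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
a = [0.5, 0.0, -2.0]
f = [sum(G[n][p] * a[p] for p in range(3)) for n in range(2)]

u, v = split_slack(a)
x = u + v                      # stacked vector (u, v) of size L = 2P
A, b, c = lp_data(G, f)
Ax = [sum(A[n][p] * x[p] for p in range(len(x))) for n in range(len(A))]
cost = sum(ci * xi for ci, xi in zip(c, x))
print(Ax, b, cost, sum(abs(ap) for ap in a))
```

With this split, at most one of $u[p]$ and $v[p]$ is non-zero for each $p$, so the linear cost (9.76) coincides with the $l^1$ norm (9.74) of $a$.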
One can also prove [28] that if the cost is not minimum at a given vertex then there exists an adjacent vertex whose cost is smaller. The simplex algorithm takes advantage of this property by jumping from one vertex to an adjacent vertex while reducing the cost (9.76). Going to an adjacent vertex means that one of the zero coefficients of $x[p]$ becomes non-zero while one non-zero coefficient is set to zero. This is equivalent to modifying the basis $\mathcal{B}$ by replacing one vector by another vector of $\mathcal{D}$. The simplex algorithm thus progressively improves the basis by appropriate modifications of its vectors, one at a time. In the worst case, all vertices of the polyhedron will be visited before finding the solution, but the average case is much more favorable.

Since the 1980's, more effective interior point procedures have been developed. Karmarkar's interior point algorithm [234] begins in the middle of the polyhedron and converges by iterative steps towards the vertex solution, while remaining inside the convex polyhedron. For finite precision calculations, when the algorithm has converged close enough to a vertex, it jumps directly to the corresponding vertex, which is guaranteed to be the solution. The middle of the polyhedron corresponds to a decomposition of $f$ over all vectors of $\mathcal{D}$, typically with $P > N$ non-zero coefficients. When moving towards a vertex some coefficients progressively decrease while others increase to improve the cost (9.76). If only $N$ decomposition coefficients are significant, jumping to the vertex is equivalent to setting all other coefficients to zero. Each step requires computing the solution of a linear system. If $A$ is an $N \times L$ matrix then Karmarkar's algorithm terminates with $O(L^{3.5})$ operations. Mathematical work on interior point methods has led to a large variety of approaches that are summarized in [252]. The basis pursuit of Chen and Donoho [119] is implemented in WaveLab with a "Log-barrier" method [252], which converges more quickly than Karmarkar's original algorithm.

Wavelet Packet and Local Cosine Dictionaries

These dictionaries have $P = N\log_2 N$ time-frequency atoms. A straightforward implementation of interior point algorithms thus requires $O(N^{3.5}\log_2^{3.5} N)$ operations. By using the fast wavelet packet and local cosine transforms together with heuristic computational rules, the number of operations is considerably reduced [119]. The algorithm still remains relatively slow and the computations become prohibitive for $N \ge 1000$.
Figure 9.12 decomposes a synthetic signal that has two high frequency transients followed by two lower frequency transients and two Diracs for $n < 100$. The signal then includes two linear chirps that cross each other and which are superimposed with localized sinusoidal waves. In a dictionary of wavelet packet bases calculated with a Daubechies 8 filter, the best basis shown in Figure 9.12(c) optimizes the division of the frequency axis, but it has no flexibility in time. It is therefore not adapted to the time evolution of the signal components. A basis pursuit algorithm adapts the wavelet packet choice to the local signal structures; Figure 9.12(d) shows that it better reveals its time-frequency properties.

9.4.2 Matching Pursuit

Despite the linear programming approach, a basis pursuit is computationally expensive because it minimizes a global cost function over all dictionary vectors. The matching pursuit introduced by Mallat and Zhang [259] reduces the computational complexity with a greedy strategy. It is closely related to projection pursuit algorithms used in statistics [184] and to shape-gain vector quantizations [27]. Vectors are selected one by one from the dictionary, while optimizing the signal approximation at each step.
Let $\mathcal{D} = \{g_\gamma\}_{\gamma\in\Gamma}$ be a dictionary of $P > N$ vectors, having a unit norm. This dictionary includes $N$ linearly independent vectors that define a basis of the space $\mathbb{C}^N$ of signals of size $N$. A matching pursuit begins by projecting $f$ on a vector $g_{\gamma_0} \in \mathcal{D}$ and computing the residue $Rf$:
$$f = \langle f, g_{\gamma_0}\rangle\, g_{\gamma_0} + Rf. \tag{9.79}$$
Since $Rf$ is orthogonal to $g_{\gamma_0}$
$$\|f\|^2 = |\langle f, g_{\gamma_0}\rangle|^2 + \|Rf\|^2. \tag{9.80}$$
To minimize $\|Rf\|$ we must choose $g_{\gamma_0} \in \mathcal{D}$ such that $|\langle f, g_{\gamma_0}\rangle|$ is maximum. In some cases, it is computationally more efficient to find a vector $g_{\gamma_0}$ that is almost optimal:
$$|\langle f, g_{\gamma_0}\rangle| \ge \alpha \sup_{\gamma\in\Gamma} |\langle f, g_\gamma\rangle| \tag{9.81}$$
where $\alpha \in (0,1]$ is an optimality factor. The pursuit iterates this procedure by subdecomposing the residue. Let $R^0 f = f$. Suppose that the $m$th order residue $R^m f$ is already computed, for $m \ge 0$. The next iteration chooses $g_{\gamma_m} \in \mathcal{D}$ such that
$$|\langle R^m f, g_{\gamma_m}\rangle| \ge \alpha \sup_{\gamma\in\Gamma} |\langle R^m f, g_\gamma\rangle| \tag{9.82}$$
and projects $R^m f$ on $g_{\gamma_m}$:
$$R^m f = \langle R^m f, g_{\gamma_m}\rangle\, g_{\gamma_m} + R^{m+1} f. \tag{9.83}$$
The orthogonality of $R^{m+1} f$ and $g_{\gamma_m}$ implies
$$\|R^m f\|^2 = |\langle R^m f, g_{\gamma_m}\rangle|^2 + \|R^{m+1} f\|^2. \tag{9.84}$$
Summing (9.83) from $m$ between 0 and $M-1$ yields
$$f = \sum_{m=0}^{M-1} \langle R^m f, g_{\gamma_m}\rangle\, g_{\gamma_m} + R^M f. \tag{9.85}$$
Similarly, summing (9.84) from $m$ between 0 and $M-1$ gives
$$\|f\|^2 = \sum_{m=0}^{M-1} |\langle R^m f, g_{\gamma_m}\rangle|^2 + \|R^M f\|^2. \tag{9.86}$$
The following theorem proves that $\|R^m f\|$ converges exponentially to 0 when $m$ tends to infinity.

Theorem 9.9. There exists $\lambda > 0$ such that for all $m \ge 0$
$$\|R^m f\| \le 2^{-\lambda m}\, \|f\|. \tag{9.87}$$
As a consequence
$$f = \sum_{m=0}^{+\infty} \langle R^m f, g_{\gamma_m}\rangle\, g_{\gamma_m} \tag{9.88}$$
and
$$\|f\|^2 = \sum_{m=0}^{+\infty} |\langle R^m f, g_{\gamma_m}\rangle|^2. \tag{9.89}$$

Proof$^3$. Let us first verify that there exists $\beta > 0$ such that for any $f \in \mathbb{C}^N$
$$\sup_{\gamma\in\Gamma} |\langle f, g_\gamma\rangle| \ge \beta\, \|f\|. \tag{9.90}$$

[Figure 9.12. Panel (a): signal $f(t)$ for $t$ from 0 to 1. Panels (b) to (f): time-frequency planes, frequency $\omega/2\pi$ from 0 to 250 against time $t$ from 0 to 1.]

Figure 9.12: (a): Signal synthesized with a sum of chirps, truncated sinusoids, short time transients and Diracs. The time-frequency images display the atoms selected by different adaptive time-frequency transforms. The darkness is proportional to the coefficient amplitude. (b): Gabor matching pursuit. Each dark blob is the Wigner-Ville distribution of a selected Gabor atom. (c): Heisenberg boxes of a best wavelet packet basis calculated with Daubechies 8 filter. (d): Wavelet packet basis pursuit. (e): Wavelet packet matching pursuit. (f): Wavelet packet orthogonal matching pursuit.

Suppose that it is not possible to find such a $\beta$. This means that we can
construct $\{f_m\}_{m\in\mathbb{N}}$ with $\|f_m\| = 1$ and
$$\lim_{m\to+\infty} \sup_{\gamma\in\Gamma} |\langle f_m, g_\gamma\rangle| = 0. \tag{9.91}$$
Since the unit sphere of $\mathbb{C}^N$ is compact, there exists a sub-sequence $\{f_{m_k}\}_{k\in\mathbb{N}}$ that converges to a unit vector $f \in \mathbb{C}^N$. It follows that
$$\sup_{\gamma\in\Gamma} |\langle f, g_\gamma\rangle| = \lim_{k\to+\infty} \sup_{\gamma\in\Gamma} |\langle f_{m_k}, g_\gamma\rangle| = 0 \tag{9.92}$$
so $\langle f, g_\gamma\rangle = 0$ for all $g_\gamma \in \mathcal{D}$. Since $\mathcal{D}$ contains a basis of $\mathbb{C}^N$, necessarily $f = 0$, which is not possible because $\|f\| = 1$. This proves that our initial assumption is wrong, and hence there exists $\beta$ such that (9.90) holds.

The decay condition (9.87) is derived from the energy conservation
$$\|R^{m+1} f\|^2 = \|R^m f\|^2 - |\langle R^m f, g_{\gamma_m}\rangle|^2.$$
The matching pursuit chooses $g_{\gamma_m}$ that satisfies
$$|\langle R^m f, g_{\gamma_m}\rangle| \ge \alpha \sup_{\gamma\in\Gamma} |\langle R^m f, g_\gamma\rangle| \tag{9.93}$$
and (9.90) implies that $|\langle R^m f, g_{\gamma_m}\rangle| \ge \alpha\beta\, \|R^m f\|$. So
$$\|R^{m+1} f\| \le \|R^m f\|\, (1 - \alpha^2\beta^2)^{1/2} \tag{9.94}$$
which verifies (9.87) for
$$2^{-\lambda} = (1 - \alpha^2\beta^2)^{1/2} < 1.$$
This also proves that $\lim_{m\to+\infty} \|R^m f\| = 0$. Equations (9.88) and (9.89) are thus derived from (9.85) and (9.86).

The convergence rate $\lambda$ decreases when the size $N$ of the signal space increases. In the limit of infinite dimensional spaces, Jones' theorem proves that the algorithm still converges but the convergence is not exponential [230, 259]. The asymptotic behavior of a matching pursuit is further studied in Section 10.5.2. Observe that even in finite dimensions, an infinite number of iterations is necessary to completely reduce the residue. In most signal processing applications, this is not an issue because many fewer than $N$ iterations are needed to obtain sufficiently precise signal approximations. Section 9.4.3 describes an orthogonalized matching pursuit that converges in fewer than $N$ iterations.

Fast Network Calculations

A matching pursuit is implemented with a fast algorithm that computes $\langle R^{m+1} f, g_\gamma\rangle$ from $\langle R^m f, g_\gamma\rangle$ with
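a simple updating formula. Taking an inner product with $g_\gamma$ on each side of (9.83) yields
$$\langle R^{m+1} f, g_\gamma\rangle = \langle R^m f, g_\gamma\rangle - \langle R^m f, g_{\gamma_m}\rangle\, \langle g_{\gamma_m}, g_\gamma\rangle. \tag{9.95}$$
In neural network language, this is an inhibition of $\langle R^m f, g_\gamma\rangle$ by the selected pattern $g_{\gamma_m}$ with a weight $\langle g_{\gamma_m}, g_\gamma\rangle$ that measures its correlation with $g_\gamma$. To reduce the computational load, it is necessary to construct dictionaries with vectors having a sparse interaction. This means that each $g_\gamma \in \mathcal{D}$ has non-zero inner products with only a small fraction of all other dictionary vectors. It can also be viewed as a network that is not fully connected. Dictionaries are designed so that non-zero weights $\langle g_{\gamma'}, g_\gamma\rangle$ can be retrieved from memory or computed with $O(1)$ operations. A matching pursuit with a relative precision $\epsilon$ is implemented with the following steps.

1. Initialization. Set $m = 0$ and compute $\{\langle f, g_\gamma\rangle\}_{\gamma\in\Gamma}$.
2. Best match. Find $g_{\gamma_m} \in \mathcal{D}$ such that
$$|\langle R^m f, g_{\gamma_m}\rangle| \ge \alpha \sup_{\gamma\in\Gamma} |\langle R^m f, g_\gamma\rangle|. \tag{9.96}$$
3. Update. For all $g_\gamma \in \mathcal{D}$ with $\langle g_{\gamma_m}, g_\gamma\rangle \ne 0$
$$\langle R^{m+1} f, g_\gamma\rangle = \langle R^m f, g_\gamma\rangle - \langle R^m f, g_{\gamma_m}\rangle\, \langle g_{\gamma_m}, g_\gamma\rangle. \tag{9.97}$$
4. Stopping rule. If
$$\|R^{m+1} f\|^2 = \|R^m f\|^2 - |\langle R^m f, g_{\gamma_m}\rangle|^2 \le \epsilon^2\, \|f\|^2$$
then stop. Otherwise $m = m + 1$ and go to 2.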
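The four steps above can be sketched in a few lines. This is a simplified illustration, not the implementation described in the text: it assumes real-valued unit-norm dictionary vectors, takes $\alpha = 1$ (exact best match), and stores the full table of correlations $\langle g_{\gamma'}, g_\gamma\rangle$ rather than exploiting sparse interactions. The names `dot` and `matching_pursuit` and the parameter `max_iter` are introduced here for the example.

```python
def dot(x, y):
    """Inner product of two real vectors given as lists."""
    return sum(a * b for a, b in zip(x, y))

def matching_pursuit(f, dictionary, eps=1e-3, max_iter=100):
    """Greedy matching pursuit with alpha = 1 on real, unit-norm dictionary vectors.

    Returns the list of (atom index, coefficient) pairs and the residual energy.
    """
    # Step 1: initialization, inner products of f with every dictionary vector.
    corr = [dot(f, g) for g in dictionary]
    # Correlations between atoms (full table here; sparse or tabulated in practice).
    gram = [[dot(g, h) for h in dictionary] for g in dictionary]
    energy_f = dot(f, f)
    residual_energy = energy_f
    selected = []
    for _ in range(max_iter):
        # Step 4: stopping rule on the relative residual energy.
        if residual_energy <= eps ** 2 * energy_f:
            break
        # Step 2: best match, the atom most correlated with the residue.
        best = max(range(len(dictionary)), key=lambda k: abs(corr[k]))
        coeff = corr[best]
        selected.append((best, coeff))
        # Step 3: update of all inner products with formula (9.97).
        corr = [corr[k] - coeff * gram[best][k] for k in range(len(dictionary))]
        # Energy conservation (9.84).
        residual_energy -= coeff ** 2
    return selected, residual_energy

# Hypothetical redundant dictionary of P = 3 unit vectors in dimension N = 2.
r = 2 ** -0.5
D = [[1.0, 0.0], [0.0, 1.0], [r, r]]
print(matching_pursuit([3.0, 1.0], D, eps=1e-6))
```

The residue itself is never formed: only its inner products with the dictionary are updated, which is what makes sparse interactions between atoms valuable for large dictionaries.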
If $\mathcal{D}$ is very redundant, computations at steps 2 and 3 are reduced by performing the calculations in a sub-dictionary $\mathcal{D}_s = \{g_\gamma\}_{\gamma\in\Gamma_s} \subset \mathcal{D}$. The sub-dictionary $\mathcal{D}_s$ is constructed so that if $g_{\tilde\gamma_m} \in \mathcal{D}_s$ maximizes $|\langle f, g_\gamma\rangle|$ in $\mathcal{D}_s$ then there exists $g_{\gamma_m} \in \mathcal{D}$ which satisfies (9.96) and whose index $\gamma_m$ is "close" to $\tilde\gamma_m$. The index $\gamma_m$ is found with a local search. This is done in time-frequency dictionaries where a sub-dictionary can be sufficient to indicate a time-frequency region where an almost best match is located. The updating (9.97) is then restricted to vectors $g_\gamma \in \mathcal{D}_s$.

The particular choice of a dictionary $\mathcal{D}$ depends upon the application. Specific dictionaries for inverse electro-magnetic problems, face recognition and data compression are constructed in [268, 229, 279]. In the following, we concentrate on dictionaries of local time-frequency atoms.

Wavelet Packets and Local Cosines

Wavelet packet and local cosine trees constructed in Sections 8.2.1 and 8.5.3 are dictionaries containing $P = N\log_2 N$ vectors. They have a sparse interaction and
non-zero inner products of dictionary vectors can be stored in tables. Each matching pursuit iteration then requires $O(N\log_2 N)$ operations.

Figure 9.12(e) is an example of a matching pursuit decomposition calculated in a wavelet packet dictionary. Compared to the best wavelet packet basis shown in Figure 9.12(c), it appears that the flexibility of the matching pursuit selects wavelet packet vectors that give a more compact approximation, which better reveals the signal time-frequency structures. However, a matching pursuit requires more computations than a best basis selection.

In this example, matching pursuit and basis pursuit algorithms give similar results. In some cases, a matching pursuit does not perform as well as a basis pursuit because the greedy strategy selects decomposition vectors one by one [159]. Choosing decomposition vectors by optimizing a correlation inner product can produce a partial loss of time and frequency resolution [119]. High resolution pursuits avoid the loss of resolution in time by using non-linear correlation measures [195, 223] but the greediness can still have adverse effects.

Translation Invariance

Section 5.4 explains that decompositions in orthogonal bases lack translation invariance and are thus difficult to
use for pattern recognition. Matching pursuits are translation invariant 9.4. APPROXIMATIONS WITH PURSUITS 565 if calculated in translation invariant dictionaries. A dictionary D is
translation invariant if for any g 2 D then g n;p] 2 D for 0 p < N .
Suppose that the matching decomposition of $f$ in $\mathcal{D}$ is
$$f[n] = \sum_{m=0}^{M-1} \langle R^m f, g_{\gamma_m} \rangle\, g_{\gamma_m}[n] + R^M f[n]. \tag{9.98}$$
One can verify [151] that the matching pursuit of $f_p[n] = f[n-p]$ selects a translation by $p$ of the same vectors $g_{\gamma_m}$ with the same decomposition coefficients:
$$f_p[n] = \sum_{m=0}^{M-1} \langle R^m f, g_{\gamma_m} \rangle\, g_{\gamma_m}[n-p] + R^M f_p[n].$$
Patterns can thus be characterized independently of their position. The same translation invariance property is valid for a basis pursuit. However, translation invariant dictionaries are necessarily very large, which often leads to prohibitive calculations. Wavelet packet and local cosine dictionaries are not translation invariant because at each scale $2^j$ the waveforms are translated only by $k\,2^j$ with $k \in \mathbb{Z}$.
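The translation invariance property can be checked numerically on a toy case. The following is a minimal sketch, not the book's implementation: the dictionary (all circular shifts of three short asymmetric windows), the signal size `N = 16`, the shift `P = 5` and every function name are assumptions chosen for illustration. It runs a plain matching pursuit with optimality factor 1 on $f$ and on $f_p[n] = f[(n-p) \bmod N]$, so that the selected atoms and coefficients can be compared.

```python
import math
import random

def shift(v, p):
    """Circular translation by p: returns w with w[n] = v[(n - p) mod N]."""
    n = len(v)
    return [v[(i - p) % n] for i in range(n)]

def inner(a, b):
    """Inner product of two real vectors."""
    return sum(x * y for x, y in zip(a, b))

def build_dictionary(n, shapes):
    """All circular translations of each unit-norm shape, so that the
    dictionary is translation invariant modulo n (hypothetical toy dictionary)."""
    atoms = {}
    for s, shape in enumerate(shapes):
        norm = math.sqrt(sum(c * c for c in shape))
        window = [shape[i] / norm if i < len(shape) else 0.0 for i in range(n)]
        for p in range(n):
            atoms[(s, p)] = shift(window, p)
    return atoms

def matching_pursuit(f, atoms, iterations):
    """Plain matching pursuit; returns [((shape, position), coefficient), ...]."""
    residue = list(f)
    picks = []
    for _ in range(iterations):
        # Best match: atom with the largest correlation with the residue.
        key = max(atoms, key=lambda k: abs(inner(residue, atoms[k])))
        coeff = inner(residue, atoms[key])
        # Remove the projection of the residue on the selected atom.
        residue = [r - coeff * g for r, g in zip(residue, atoms[key])]
        picks.append((key, coeff))
    return picks

N, P = 16, 5
rng = random.Random(0)
f = [rng.gauss(0.0, 1.0) for _ in range(N)]
atoms = build_dictionary(N, shapes=[(1.0,), (1.0, 0.5), (1.0, 0.7, 0.3)])
picks = matching_pursuit(f, atoms, 6)
picks_shifted = matching_pursuit(shift(f, P), atoms, 6)
```

If the property holds, each entry of `picks_shifted` uses the same window shape as the corresponding entry of `picks`, at a position moved by `P` modulo `N`, with the same coefficient up to rounding. The windows are deliberately asymmetric so that no two atoms have structurally equal correlations with a residue, which would make the greedy choice depend on tie-breaking.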
Translation invariance is generalized as an invariance with respect to any group action [151]. A frequency translation is another example of a group operation. If the dictionary is invariant under the action of a group then the pursuit remains invariant under the action of the same group.

Gabor Dictionary  A time and frequency translation invariant Gabor dictionary is constructed by Qian and Chen [287] as well as Mallat and Zhang [259], by scaling, translating and modulating a Gaussian window. Gaussian windows are used because of their optimal time and frequency energy concentration, proved by the uncertainty Theorem 2.5.
For each scale $2^j$, a discrete window of period $N$ is designed by sampling and periodizing a Gaussian $g(t) = 2^{1/4} e^{-\pi t^2}$:
$$g_j[n] = K_j \sum_{p=-\infty}^{+\infty} g\!\left(\frac{n - pN}{2^j}\right).$$
The constant $K_j$ is adjusted so that $\|g_j\| = 1$. This window is then translated in time and frequency. Let $\Gamma$ be the set of indexes $\gamma = (p, k, 2^j)$ for $(p,k) \in [0, N-1]^2$ and $j \in [0, \log_2 N]$. A discrete Gabor atom is
$$g_\gamma[n] = g_j[n-p]\, \exp\!\left(\frac{i 2\pi k n}{N}\right). \tag{9.99}$$
The resulting Gabor dictionary $\mathcal{D} = \{g_\gamma\}_{\gamma \in \Gamma}$ is time and frequency translation invariant modulo $N$. A matching pursuit decomposes real signals in this dictionary by grouping atoms $g_{\gamma^+}$ and $g_{\gamma^-}$ with $\gamma^\pm = (p, \pm k, 2^j)$. At each iteration, instead of projecting $R^m f$ over an atom $g_\gamma$, the matching pursuit computes its projection on the plane generated by $(g_{\gamma^+}, g_{\gamma^-})$. Since $R^m f[n]$ is real, one can verify that this is equivalent to projecting $R^m f$ on a real vector that can be written
$$g_\gamma^\phi[n] = K_{j,\phi}\, g_j[n-p]\, \cos\!\left(\frac{2\pi k n}{N} + \phi\right).$$
The constant $K_{j,\phi}$ sets the norm of this vector to 1 and the phase $\phi$ is optimized to maximize the inner product with $R^m f$. Matching pursuit
iterations yield
$$f = \sum_{m=0}^{+\infty} \langle R^m f, g_{\gamma_m}^{\phi_m} \rangle\, g_{\gamma_m}^{\phi_m}. \tag{9.100}$$
This decomposition is represented by a time-frequency energy distribution obtained by summing the Wigner-Ville distribution $P_V g_{\gamma_m}[n,k]$ of the complex atoms $g_{\gamma_m}$:
$$P_M f[n,k] = \sum_{m=0}^{+\infty} |\langle R^m f, g_{\gamma_m}^{\phi_m} \rangle|^2\, P_V g_{\gamma_m}[n,k]. \tag{9.101}$$
Since the window is Gaussian, if $\gamma_m = (p_m, k_m, 2^{j_m})$ then $P_V g_{\gamma_m}$ is a two-dimensional Gaussian blob centered at $(p_m, k_m)$ in the time-frequency plane. It is scaled by $2^{j_m}$ in time and $N 2^{-j_m}$ in frequency.

Example 9.1  Figure 9.12(b) gives the matching pursuit energy distribution $P_M f[n,k]$ of a synthetic signal. The inner structures of this signal appear more clearly than with a wavelet packet matching pursuit because Gabor atoms have a better time-frequency localization than wavelet packets, and they are translated over a finer time-frequency grid.

Example 9.2  Figure 9.13 shows the Gabor matching pursuit decomposition of the word "greasy", sampled at 16 kHz. The time-frequency energy distribution shows the low-frequency component of the "g" and the quick burst transition to the "ea". The "ea" has many harmonics that are lined up. The "s" is noise whose time-frequency energy is spread over a high-frequency interval. Most of the signal energy is characterized by a few time-frequency atoms. For $m = 250$ atoms, $\|R^m f\| / \|f\| = 0.169$, although the signal has 5782 samples, and the sound recovered from these atoms is of excellent audio quality.

Figure 9.13: Speech recording of the word "greasy" sampled at 16 kHz, plotted as $f(t)$ for $t \in [0,1]$, together with its time-frequency image ($\omega/2\pi$ from 0 to 8000 versus $t$). In the time-frequency image, the dark blobs of various sizes are the Wigner-Ville distributions of the Gabor functions selected by the matching pursuit.

Matching pursuit calculations in a Gabor dictionary are performed with a sub-dictionary $\mathcal{D}_s$. At each scale $2^j$, the time-frequency indexes $(p,k)$ are subsampled at intervals $a 2^j$ and $a N 2^{-j}$ where the sampling factor $a < 1$ is small enough to detect the time-frequency regions where the signal has high energy components. Step 2 of the matching pursuit iteration (9.96) finds the Gabor atom $g_{\tilde\gamma_m} \in \mathcal{D}_s$ which best matches the signal residue. This match is then improved by searching for an atom $g_{\gamma_m} \in \mathcal{D}$ whose index $\gamma_m$ is close to $\tilde\gamma_m$ and which locally maximizes the correlation with the signal residue. The updating formula (9.97) is calculated for $g_\gamma \in \mathcal{D}_s$. Inner products between two Gabor atoms are computed with an analytic formula [259]. Since $\mathcal{D}_s$ has $O(N \log_2 N)$ vectors, one can verify that each matching pursuit iteration is implemented with $O(N \log_2 N)$ calculations.

9.4.3 Orthogonal Matching Pursuit

The approximations of a matching pursuit are improved by orthogonalizing the directions of projection, with a Gram-Schmidt procedure
proposed by Pati et al. [280] and Davis et al. [152]. The resulting orthogonal pursuit converges with a finite number of iterations, which is not the case for a non-orthogonal pursuit. The price to be paid is the significant computational cost of the Gram-Schmidt orthogonalization.

The vector $g_{\gamma_m}$ selected by the matching algorithm is a priori not orthogonal to the previously selected vectors $\{g_{\gamma_p}\}_{0 \le p < m}$. When subtracting the projection of $R^m f$ over $g_{\gamma_m}$ the algorithm reintroduces new components in the directions of $\{g_{\gamma_p}\}_{0 \le p < m}$. This is avoided by projecting the residues on an orthogonal family $\{u_p\}_{0 \le p < m}$ computed from $\{g_{\gamma_p}\}_{0 \le p < m}$.

Let us initialize $u_0 = g_{\gamma_0}$. For $m \ge 0$, an orthogonal matching pursuit selects $g_{\gamma_m}$ that satisfies
$$|\langle R^m f, g_{\gamma_m} \rangle| \ge \alpha \sup_{\gamma \in \Gamma} |\langle R^m f, g_\gamma \rangle|. \tag{9.102}$$
The Gram-Schmidt algorithm orthogonalizes $g_{\gamma_m}$ with respect to $\{g_{\gamma_p}\}_{0 \le p < m}$ and defines
$$u_m = g_{\gamma_m} - \sum_{p=0}^{m-1} \frac{\langle g_{\gamma_m}, u_p \rangle}{\|u_p\|^2}\, u_p. \tag{9.103}$$
The residue $R^m f$ is projected on $u_m$ instead of $g_{\gamma_m}$:
$$R^m f = \frac{\langle R^m f, u_m \rangle}{\|u_m\|^2}\, u_m + R^{m+1} f. \tag{9.104}$$
Summing this equation for $0 \le m < k$ yields
$$f = \sum_{m=0}^{k-1} \frac{\langle R^m f, u_m \rangle}{\|u_m\|^2}\, u_m + R^k f = P_{V_k} f + R^k f, \tag{9.105}$$
where $P_{V_k}$ is the orthogonal projector on the space $V_k$ generated by $\{u_m\}_{0 \le m < k}$. The Gram-Schmidt algorithm ensures that $\{g_{\gamma_m}\}_{0 \le m < k}$ is also a basis of $V_k$. For any $k \ge 0$ the residue $R^k f$ is the component of $f$ that is orthogonal to $V_k$. For $m = k$, (9.103) implies that
$$\langle R^m f, u_m \rangle = \langle R^m f, g_{\gamma_m} \rangle. \tag{9.106}$$
Since $V_k$ has dimension $k$ there exists $M \le N$ such that $f \in V_M$, so $R^M f = 0$ and inserting (9.106) in (9.105) for $k = M$ yields
$$f = \sum_{m=0}^{M-1} \frac{\langle R^m f, g_{\gamma_m} \rangle}{\|u_m\|^2}\, u_m. \tag{9.107}$$
The convergence is obtained with a finite number $M$ of iterations. This is a decomposition in a family of orthogonal vectors so
$$\|f\|^2 = \sum_{m=0}^{M-1} \frac{|\langle R^m f, g_{\gamma_m} \rangle|^2}{\|u_m\|^2}. \tag{9.108}$$
To expand $f$ over the original dictionary vectors $\{g_{\gamma_m}\}_{0 \le m < M}$, we must perform a change of basis. The triangular Gram-Schmidt relations (9.103) are inverted to expand $u_m$ in $\{g_{\gamma_p}\}_{0 \le p \le m}$:
$$u_m = \sum_{p=0}^{m} b[p,m]\, g_{\gamma_p}. \tag{9.109}$$
Inserting this expression into (9.107) gives
$$f = \sum_{p=0}^{M-1} a[\gamma_p]\, g_{\gamma_p} \tag{9.110}$$
with
$$a[\gamma_p] = \sum_{m=p}^{M-1} b[p,m]\, \frac{\langle R^m f, g_{\gamma_m} \rangle}{\|u_m\|^2}.$$
During the first few iterations, the pursuit often selects nearly orthogonal vectors, so the Gram-Schmidt orthogonalization is not needed. The orthogonal and non-orthogonal pursuits are then nearly the same. When the number of iterations increases and gets close to $N$, the residues of an orthogonal pursuit have norms that decrease faster than for a non-orthogonal pursuit.
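The iteration (9.102)-(9.104) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a reference implementation: the optimality factor is taken equal to 1, the toy dictionary (Diracs plus normalized discrete cosine vectors for `N = 8`) and all function names are hypothetical, and the Gram-Schmidt sum (9.103) is applied one term at a time (the "modified" form, which gives the same $u_m$ in exact arithmetic) for numerical stability.

```python
import math
import random

def inner(a, b):
    """Inner product of two real vectors."""
    return sum(x * y for x, y in zip(a, b))

def add_scaled(y, alpha, x):
    """Returns the vector y + alpha * x."""
    return [yi + alpha * xi for yi, xi in zip(y, x)]

def orthogonal_matching_pursuit(f, atoms, iterations):
    """Returns the selected atom indices, the orthogonal family {u_m}
    and the final residue."""
    residue = list(f)
    selected, basis = [], []
    for _ in range(iterations):
        # Selection rule (9.102), with optimality factor 1.
        best = max(range(len(atoms)), key=lambda i: abs(inner(residue, atoms[i])))
        if abs(inner(residue, atoms[best])) < 1e-12:
            break  # the residue is numerically orthogonal to every atom
        # Gram-Schmidt step (9.103), subtracting one projection at a time.
        u = list(atoms[best])
        for up in basis:
            u = add_scaled(u, -inner(u, up) / inner(up, up), up)
        # Projection of the residue on u_m, equation (9.104).
        residue = add_scaled(residue, -inner(residue, u) / inner(u, u), u)
        selected.append(best)
        basis.append(u)
    return selected, basis, residue

# Hypothetical complete dictionary of R^N: Diracs and normalized cosine vectors.
N = 8
diracs = [[1.0 if n == k else 0.0 for n in range(N)] for k in range(N)]
cosines = []
for k in range(N):
    c = [math.cos(math.pi * k * (n + 0.5) / N) for n in range(N)]
    norm = math.sqrt(inner(c, c))
    cosines.append([v / norm for v in c])
atoms = diracs + cosines

rng = random.Random(1)
f = [rng.gauss(0.0, 1.0) for _ in range(N)]
selected, basis, residue = orthogonal_matching_pursuit(f, atoms, N)
residue_norm = math.sqrt(inner(residue, residue))
```

Because the dictionary spans the whole space, the residue should vanish up to rounding after at most `N` iterations, which is the finite convergence stated above. Each iteration costs one pass over the previously computed `basis`, which is where the $O(NM^2)$ orthogonalization cost comes from.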

Figure 9.12(f) displays the wavelet packets selected by an orthogonal matching pursuit. A comparison with Figure 9.12(e) shows that the orthogonal and non-orthogonal pursuits select nearly the same wavelet packets having a high amplitude inner product. These wavelet packets are selected during the first few iterations, and since they are nearly orthogonal the Gram-Schmidt orthogonalization does not modify the pursuit much. The difference between the two algorithms becomes significant when selected wavelet packet vectors have non-negligible inner products, which happens when the number of iterations is large.

The Gram-Schmidt summation (9.103) must be carefully implemented to avoid numerical instabilities [29]. Orthogonalizing $M$ vectors requires $O(NM^2)$ operations. In wavelet packet, local cosine and Gabor dictionaries, $M$ matching pursuit iterations are calculated with $O(MN \log_2 N)$ operations. For $M$ large, the Gram-Schmidt orthogonalization very significantly increases the computational complexity of the pursuit. The non-orthogonal pursuit is thus more often used for large signals.

9.5 Problems

9.1. $^1$ Prove that for any $f \in L^2[0,1]$, if $\|f\|_V < +\infty$ then $\|f\|_\infty < +\infty$. Verify that one can find an image $f \in L^2[0,1]^2$ such that $\|f\|_V < +\infty$ and $\|f\|_\infty = +\infty$.

9.2. $^1$ Prove that if $f \in W^s(\mathbb{R})$ with $s > p + 1/2$ then $f \in C^p$.

9.3. $^1$ The family of discrete polynomials $\{p_k[n] = n^k\}_{0 \le k < N}$ is a basis of $\mathbb{C}^N$.
(a) Implement in WaveLab a Gram-Schmidt algorithm that orthogonalizes $\{p_k\}_{0 \le k < N}$.
(b) Let $f$ be a signal of size $N$. Compute the polynomial $f_k$ of degree $k$ which minimizes $\|f - f_k\|$. Perform numerical experiments on signals $f$ that are uniformly smooth and piecewise smooth. Compare the approximation error with the error obtained by approximating $f$ with the $k$ lower frequency Fourier coefficients.

9.4. $^1$ If $f$ has bounded variation on $[0,1]$, prove that its linear approximation in a wavelet basis satisfies $\epsilon_l[M] = O(M^{-1})$ (Hint: use Theorem 9.6). Verify that $\epsilon_l[M] \sim M^{-1}$ if $f = \mathbf{1}_{[0,1/2]}$.

9.5. $^2$ Let $\alpha[M]$ be a decreasing sequence such that $\lim_{M \to +\infty} \alpha[M] = 0$. By using (9.43) prove that there exists a bounded variation image $f \in L^2[0,1]^2$ such that $\epsilon_l[M] \ge \alpha[M]$.

9.6. $^1$ Consider a wavelet basis of $L^2[0,1]$ constructed with wavelets having $q > s$ vanishing moments and which are $C^q$. Construct functions $f \in W^s[0,1]$ for which the linear and non-linear approximation errors in this basis are identical: $\epsilon_l[M] = \epsilon_n[M]$ for any $M \ge 0$.

9.7. $^1$ Color images  A color pixel is represented by red, green and blue components $(r, g, b)$, which are considered as orthogonal coordinates in a three dimensional color space. The red $r[n_1,n_2]$, green $g[n_1,n_2]$ and blue $b[n_1,n_2]$ image pixels are modeled as values taken by respectively three random variables $R$, $G$ and $B$, that are the three coordinates of a color vector. Estimate numerically the 3 by 3 covariance matrix of this color random vector from several images and compute the Karhunen-Loeve basis that diagonalizes it. Compare the color images reconstructed from the two Karhunen-Loeve color channels of highest variance with a reconstruction from the red and green channels.

9.8. $^1$ Let us define $\|x\|_p = \left( \sum_{n=-\infty}^{+\infty} |x[n]|^p \right)^{1/p}$. Prove that $\|x\|_q \le \|x\|_p$ if $q \ge p$.

9.9. $^1$ Let $f(t)$ be a piecewise polynomial signal of degree 3 defined on $[0,1]$, with $K$ discontinuities. We denote by $f_K$ and $\tilde f_K$ respectively the linear and non-linear approximations of $f$ from $K$ vectors chosen from a Daubechies wavelet basis of $L^2[0,1]$, with $p + 1$ vanishing moments.
(a) Give upper bounds as a function of $K$ and $p$ of $\|f - f_K\|$ and $\|f - \tilde f_K\|$.
(b) The Piece-Polynomial signal $f$ in WaveLab is piecewise polynomial with degree 3. Decompose it in a Daubechies wavelet basis with four vanishing moments, and compute $\|f - f_K\|$ and $\|f - \tilde f_K\|$ as a function of $K$. Verify your analytic formula.

9.10. $^2$ Let $f[n]$ be defined over $[0,N]$. We denote by $f_{p,k}[n]$ the signal that is piecewise constant on $[0,k]$, takes at most $p$ different values, and minimizes
$$\epsilon_{p,k} = \|f - f_{p,k}\|^2_{[0,k]} = \sum_{n=0}^{k} |f[n] - f_{p,k}[n]|^2.$$
(a) Compute as a function of $f[n]$ the value $a_{l,k}$ that minimizes $c_{l,k} = \sum_{n=l}^{k} |f[n] - a_{l,k}|^2$.
(b) Prove that
$$\epsilon_{p,k} = \min_{l \in [0,k-1]} \{\epsilon_{p-1,l} + c_{l,k}\}.$$
Derive a bottom up algorithm that computes progressively $f_{p,k}$ for $0 \le k \le N$ and $1 \le p \le K$, and obtains $f_{K,N}$ with $O(K N^2)$ operations. Implement this algorithm in WaveLab.
(c) Compute the non-linear approximation of $f$ with the $K$ largest amplitude Haar wavelet coefficients, and the resulting approximation error. Compare this error with $\|f - f_{K,N}\|$ as a function of $K$, for the Lady and the Piece-Polynomial signals in WaveLab. Explain your results.

9.11. $^2$ Approximation of oscillatory functions
(a) Let $f(t) = a(t) \exp[i\phi(t)]$. If $a(t)$ and $\phi'(t)$ remain nearly constant on the support of $\psi_{j,n}$ then show with an approximate calculation that
$$\langle f, \psi_{j,n} \rangle \approx a(2^j n)\, \sqrt{2^j}\, \hat\psi\!\left(2^j \phi'(2^j n)\right). \tag{9.111}$$
(b) Let $f(t) = \sin(t^{-1})\, \mathbf{1}_{[-1/\pi, 1/\pi]}(t)$. Show that the $l^p$ norm of the wavelet coefficients of $f$ is finite if and only if $p < 1$. Use the approximate formula (9.111).
(c) Compute an upper bound of the non-linear approximation error $\epsilon[M]$ of $\sin t^{-1}$ from $M$ wavelet coefficients. Verify your theoretical estimate with a numerical calculation in WaveLab.

9.12. $^1$ Let $f$ be a signal of size $N$ and $T$ a given threshold. Describe a fast algorithm that searches in a wavelet packet or a local cosine dictionary for the best basis $\mathcal{B} = \{g_m\}_{0 \le m < N}$ that minimizes the number of inner products such that $|\langle f, g_m \rangle| \ge T$.

9.13. $^1$ Best translated basis  Let $\{\psi_{j,m}[n]\}_{j,m}$ be a discrete wavelet orthonormal basis of signals of period $N$, computed with a conjugate mirror filter $h$ with $K$ non-zero coefficients. Let $\psi^k_{j,m}[n] = \psi_{j,m}[n-k]$ and $\mathcal{B}^k = \{\psi^k_{j,m}[n]\}_{j,m}$ be the translated basis, for any $0 \le k < N$.
(a) Describe an algorithm that decomposes $f$ over all wavelets $\psi^k_{j,m}$ with $O(KN \log_2 N)$ operations.
(b) Let $C(f, \mathcal{B}^k) = \sum_{j,m} \Phi\!\left( |\langle f, \psi^k_{j,m} \rangle|^2 / \|f\|^2 \right)$. Describe an algorithm that finds the best shift $l$ such that $C(f, \mathcal{B}^l) = \min_{0 \le k < N} C(f, \mathcal{B}^k)$, with $O(N \log_2 N)$ operations [281].

9.14. $^1$ Best wavelet packet and local cosine approximations
(a) Synthesize a discrete signal that is well approximated by few vectors in a best wavelet packet basis, but which requires many more vectors to obtain an equivalent approximation in a best local cosine basis. Test your signal in WaveLab.
(b) Design a signal that is well approximated in a best local cosine basis but requires many more vectors to approximate it efficiently in a best wavelet packet basis. Verify your result in WaveLab.

9.15. $^1$ In two dimensions, a wavelet packet quad-tree of an image of size $N^2$ requires a storage of $N^2 \log_2 N$ numbers. Describe an algorithm that finds the best wavelet packet basis with a storage of $4N^2/3$, by constructing the wavelet packet tree and computing the cost function in a depth-first preorder [76].

9.16. $^2$ A double tree of block wavelet packet bases is defined in Problem 8.11.
(a) Describe a fast best basis algorithm which requires $O(N (\log_2 N)^2)$ operations to find the block wavelet packet basis that minimizes an additive cost (9.65) [208].
(b) Implement the double tree decomposition and the best basis search in WaveLab. Program a display that shows the time-frequency tiling of the best basis and the amplitude of the decomposition coefficients. How does the best block wavelet packet basis compare with a best local cosine basis for the Greasy and Tweet signals?

9.17. $^2$ Let $\mathcal{D} = \{\delta[n-k],\ \exp(i 2\pi k n / N)\}_{0 \le k < N}$ be a Dirac-Fourier dictionary to decompose $N$ periodic signals.
(a) Prove that a matching pursuit residue calculated with an optimality factor $\alpha = 1$ satisfies $\|R^m f\| \le \|f\| \exp(-m/(2N))$.
(b) Implement the matching pursuit in this Dirac-Fourier dictionary and decompose $f[n] = \exp(-i 2\pi n^2 / N)$. Compare the decay rate of the residue with the upper bound that was calculated. Suggest a better dictionary to decompose this signal.

9.18. $^2$ Let $f$ be a piecewise constant image defined over $[0,N]^2$. Suppose that $f$ is constant over regions $\{\Omega_i\}_{1 \le i \le K}$ whose borders are differentiable curves with a bounded curvature. It may be discontinuous along the borders of the $\Omega_i$. Prove that there exists $K > 0$ such that for any $M > 0$ one can construct $f_M$ which is constant on the $M$ triangles of a triangulation of $[0,N]^2$ and which satisfies $\|f - f_M\| \le K M^{-2}$. Design and implement in WaveLab an algorithm which computes $f_M$ for any piecewise constant function $f$. Compare the performance of your algorithm with an approximation with $M$ vectors selected from a two-dimensional Haar wavelet basis.

9.19. $^3$ Let $\theta(t)$ be a cubic box spline centered at $t = 0$. We define a dictionary of $N$ periodic cubic splines:
$$\mathcal{D} = \left\{ \theta_j[(n-k) \bmod N] \right\}_{0 \le j \le \log_2 N,\ 0 \le k < N},$$
where $\theta_j[n] = K_j\, \theta(2^{-j} n)$ for $j \ge 1$, and $\theta_0[n] = \delta[n]$.
(a) Implement a matching pursuit in this dictionary.
(b) Show that if $f[n] = \theta_j[n] + \theta_j[n-k]$ where $k$ is on the order of $2^j$, then the greediness of the matching pursuit may lead to a highly non-optimal decomposition. Explain why. Would a basis pursuit decomposition do better?
(c) If $f[n] \ge 0$, explain how to improve the matching pursuit by imposing that $R^m f[n] \ge 0$ for any $m \ge 0$.

Chapter 10
Estimations Are Approximations

In a background noise of French conversations, it is easier to carry on a personal discussion in English. The estimation of signals in additive noise is similarly optimized by finding a representation that discriminates the signal from the noise.

An estimation is calculated by an operator that attenuates the noise while preserving the signal. Linear operators have long predominated because of their simplicity, despite their limited performance. It is possible to keep the simplicity while improving the performance with non-linearities in a sparse representation. Thresholding estimators are studied in wavelet and wavelet packet bases, where they are used to suppress additive noises and restore signals degraded by low-pass filters. Non-linear estimations from sparse representations are also studied for operators, with an application to power spectrum estimation.

Optimizing an estimator requires taking advantage of prior information. Bayes theory uses a probabilistic signal model to derive estimators that minimize the average risk. These models are often not available for complex signals such as natural images. An alternative is offered by the minimax approach, which only requires knowing a prior set where the signal is guaranteed to be. The quasi-minimax optimality of wavelet thresholding estimators is proved for piecewise regular signals and images.

10.1 Bayes Versus Minimax $^2$

A signal $f[n]$ of size $N$ is contaminated by the addition of a noise. This noise is modeled as the realization of a random process $W[n]$, whose probability distribution is known. The measured data are
$$X[n] = f[n] + W[n].$$
The signal $f$ is estimated by transforming the noisy data $X$ with a decision operator $D$. The resulting estimator is
$$\tilde F = D X.$$
Our goal is to minimize the error of the estimation, which is measured by a loss function. For speech or images, the loss function should measure the audio and visual degradation, which is often difficult to model. A mean-square distance is certainly not a perfect model of perceptual degradations, but it is mathematically simple and sufficiently precise in most applications. Throughout this chapter, the loss function is thus chosen to be a square Euclidean norm. The risk of the estimator $\tilde F$ of $f$ is the average loss, calculated with respect to the probability distribution of the noise $W$:
$$r(D, f) = E\{\|f - DX\|^2\}. \tag{10.1}$$
The optimization of the decision operator $D$ depends on prior information that is available about the signal. The Bayes framework supposes that we know the probability distribution of the signal and optimizes $D$ to minimize the expected risk. The main difficulty is to acquire enough information to define this prior probability distribution, which is often not possible for complex signals. The minimax framework uses a simpler model which says that signals remain in a prior set $\Theta$. The goal is then to minimize the maximum risk over $\Theta$. Section 10.1.2 relates minimax and Bayes estimators through the minimax theorem.

10.1.1 Bayes Estimation

The Bayes principle supposes that signals $f$ are realizations of a random vector $F$ whose probability distribution $\pi$ is known a priori. This probability distribution is called the prior distribution. The noisy data are thus rewritten
$$X[n] = F[n] + W[n].$$
We suppose that the noise values $W[k]$ are independent from the signal $F[n]$ for any $0 \le k, n < N$. The joint distribution of $F$ and $W$ is the product of the distributions of $F$ and $W$. It specifies the conditional probability distribution of $F$ given the observed data $X$, also called the posterior distribution. This posterior distribution can be used to construct a decision operator $D$ that computes an estimation $\tilde F = DX$ of $F$ from the data $X$.

The Bayes risk is the expected risk calculated with respect to the prior probability distribution $\pi$ of the signal:
$$r(D, \pi) = E_\pi\{r(D, F)\}.$$
By inserting (10.1), it can be rewritten as an expected value relative to the joint probability distribution of the signal and the noise:
$$r(D, \pi) = E\{\|F - \tilde F\|^2\} = \sum_{n=0}^{N-1} E\{|F[n] - \tilde F[n]|^2\}.$$
Let $\mathcal{O}_n$ be the set of all operators (linear and non-linear) from $\mathbb{C}^N$ to $\mathbb{C}^N$. Optimizing $D$ yields the minimum Bayes risk:
$$r_n(\pi) = \inf_{D \in \mathcal{O}_n} r(D, \pi).$$
The following theorem proves that there exist a Bayes decision operator $D$ and a corresponding Bayes estimator $\tilde F$ that achieve this minimum risk.

Theorem 10.1  The Bayes estimator $\tilde F$ that yields the minimum Bayes risk $r_n(\pi)$ is the conditional expectation
$$\tilde F[n] = E\{F[n] \mid X[0], X[1], \ldots, X[N-1]\}. \tag{10.2}$$
Proof $^2$. Let $\pi_n(y)$ be the probability distribution of the value $y$ of $F[n]$.
The minimum risk is obtained by finding $\tilde F[n] = D_n(X)$ that minimizes $r(D_n, \pi_n) = E\{|\tilde F[n] - F[n]|^2\}$, for each $0 \le n < N$. This risk depends on the conditional distribution $P_n(x|y)$ of the data $X = x$, given $F[n] = y$:
$$r(D_n, \pi_n) = \iint (D_n(x) - y)^2\, dP_n(x|y)\, d\pi_n(y).$$
Let $P(x) = \int P_n(x|y)\, d\pi_n(y)$ be the marginal distribution of $X$ and $\pi_n(y|x)$ be the posterior distribution of $F[n]$ given $X$. The Bayes formula gives
$$r(D_n, \pi_n) = \iint (D_n(x) - y)^2\, d\pi_n(y|x)\, dP(x).$$
The double integral is minimized by minimizing the inside integral for each $x$. This quadratic form is minimum when its derivative vanishes:
$$\frac{\partial}{\partial D_n(x)} \int (D_n(x) - y)^2\, d\pi_n(y|x) = 2 \int (D_n(x) - y)\, d\pi_n(y|x) = 0,$$
which implies that
$$D_n(x) = \int y\, d\pi_n(y|x) = E\{F[n] \mid X = x\},$$
so $D_n(X) = E\{F[n] \mid X\}$.

Linear Estimation  The conditional expectation (10.2) is generally a complicated non-linear function of the data $\{X[k]\}_{0 \le k < N}$, and is difficult to evaluate. To simplify this problem, we restrict the decision operator $D$ to be linear. Let $\mathcal{O}_l$ be the set of all linear operators from $\mathbb{C}^N$ to $\mathbb{C}^N$. The linear minimum Bayes risk is:
$$r_l(\pi) = \inf_{D \in \mathcal{O}_l} r(D, \pi).$$
The linear estimator $\tilde F = DX$ that achieves this minimum risk is called the Wiener estimator. The following proposition gives a necessary and sufficient condition that specifies this estimator. We suppose that $E\{F[n]\} = 0$, which can be enforced by subtracting $E\{F[n]\}$ from $X[n]$ to obtain a zero-mean signal.
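To see how much the restriction to linear operators can cost, the conditional expectation and the best linear estimator can be compared on a scalar toy model. The sketch below is an illustration under assumptions that are not in the text: a single coefficient ($N = 1$), a binary prior $F = \pm 1$ with equal probabilities, and Gaussian noise of variance $\sigma^2$. For this prior the posterior mean has the closed form $E\{F \mid X = x\} = \tanh(x/\sigma^2)$, while the best linear estimator is $x / (1 + \sigma^2)$. The function names are hypothetical.

```python
import math
import random

def bayes_estimator(x, sigma):
    """Conditional expectation E{F | X = x} for F = +/-1 equiprobable and
    Gaussian noise of variance sigma^2: the posterior mean is tanh(x / sigma^2)."""
    return math.tanh(x / sigma ** 2)

def linear_estimator(x, sigma):
    """Best linear estimator a * x with a = E{F^2} / (E{F^2} + sigma^2), E{F^2} = 1."""
    return x / (1.0 + sigma ** 2)

def empirical_risks(sigma, trials, seed=0):
    """Monte Carlo estimates of the mean-square risks of both estimators."""
    rng = random.Random(seed)
    loss_bayes = loss_linear = 0.0
    for _ in range(trials):
        f = rng.choice((-1.0, 1.0))          # realization of the prior
        x = f + rng.gauss(0.0, sigma)        # noisy observation
        loss_bayes += (f - bayes_estimator(x, sigma)) ** 2
        loss_linear += (f - linear_estimator(x, sigma)) ** 2
    return loss_bayes / trials, loss_linear / trials

risk_bayes, risk_linear = empirical_risks(sigma=0.5, trials=20000)
```

For this non-Gaussian prior, the empirical risk of the linear estimator should be close to its theoretical value $\sigma^2 / (1 + \sigma^2) = 0.2$ and clearly above the risk of the conditional expectation, which illustrates the gap between the linear and non-linear minimum Bayes risks.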

Proposition 10.1  A linear estimator $\tilde F$ is a Wiener estimator if and only if
$$E\{(F[n] - \tilde F[n])\, X[k]\} = 0 \quad \text{for } 0 \le k, n < N. \tag{10.3}$$
Proof $^2$. For each $0 \le n < N$, we must find a linear estimation
$$\tilde F[n] = D_n X = \sum_{k=0}^{N-1} h[n,k]\, X[k]$$
which minimizes
$$r(D_n, \pi_n) = E\left\{ \left( F[n] - \sum_{k=0}^{N-1} h[n,k]\, X[k] \right) \left( F[n] - \sum_{k=0}^{N-1} h[n,k]\, X[k] \right) \right\}. \tag{10.4}$$
The minimum of this quadratic form is reached if and only if for each $0 \le k < N$,
$$\frac{\partial r(D_n, \pi_n)}{\partial h[n,k]} = -2\, E\left\{ \left( F[n] - \sum_{l=0}^{N-1} h[n,l]\, X[l] \right) X[k] \right\} = 0,$$
which verifies (10.3).

If $F$ and $W$ are independent Gaussian random vectors, then the linear optimal estimator is also optimal among non-linear estimators. Indeed, two jointly Gaussian random vectors are independent if they are non-correlated [56]. Since $F[n] - \tilde F[n]$ is jointly Gaussian with $X[k]$, the non-correlation (10.3) implies that $F[n] - \tilde F[n]$ and $X[k]$ are independent for any $0 \le k, n < N$. In this case, we can verify that $\tilde F$ is the Bayes estimator (10.2): $\tilde F[n] = E\{F[n] \mid X\}$.

Estimation in a Karhunen-Loeve Basis  The following theorem proves that if the covariance matrices of the signal $F$ and of the noise $W$ are diagonal in the same Karhunen-Loeve basis $\mathcal{B} = \{g_m\}_{0 \le m < N}$ then the optimal linear estimator is diagonal in this basis. We write
$$X_\mathcal{B}[m] = \langle X, g_m \rangle, \quad F_\mathcal{B}[m] = \langle F, g_m \rangle, \quad \tilde F_\mathcal{B}[m] = \langle \tilde F, g_m \rangle, \quad W_\mathcal{B}[m] = \langle W, g_m \rangle,$$
$$\beta_m^2 = E\{|F_\mathcal{B}[m]|^2\}, \quad \sigma_m^2 = E\{|W_\mathcal{B}[m]|^2\}.$$

Theorem 10.2 (Wiener)  If there exists a Karhunen-Loeve basis $\mathcal{B} = \{g_m\}_{0 \le m < N}$ that diagonalizes the covariance matrices of both $F$ and $W$, then the Wiener estimator is
$$\tilde F = \sum_{m=0}^{N-1} \frac{\beta_m^2}{\beta_m^2 + \sigma_m^2}\, X_\mathcal{B}[m]\, g_m \tag{10.5}$$
and the resulting minimum linear Bayes risk is
$$r_l(\pi) = \sum_{m=0}^{N-1} \frac{\beta_m^2\, \sigma_m^2}{\beta_m^2 + \sigma_m^2}. \tag{10.6}$$
Proof $^2$. Let $\tilde F[n]$ be a linear estimator of $F[n]$:
$$\tilde F[n] = \sum_{l=0}^{N-1} h[n,l]\, X[l]. \tag{10.7}$$
This equation can be rewritten as a matrix multiplication by introducing the $N \times N$ matrix $H = (h[n,l])_{0 \le n, l < N}$:
$$\tilde F = H X. \tag{10.8}$$
The non-correlation condition (10.3) implies that for $0 \le n, k < N$
$$E\{F[n]\, X[k]\} = E\{\tilde F[n]\, X[k]\} = \sum_{l=0}^{N-1} h[n,l]\, E\{X[l]\, X[k]\}.$$
Since $X[k] = F[k] + W[k]$ and $E\{F[n]\, W[k]\} = 0$, we derive that
$$E\{F[n]\, F[k]\} = \sum_{l=0}^{N-1} h[n,l] \left( E\{F[l]\, F[k]\} + E\{W[l]\, W[k]\} \right). \tag{10.9}$$
Let $R_F$ and $R_W$ be the covariance matrices of $F$ and $W$, whose entries are respectively $E\{F[n]\, F[k]\}$ and $E\{W[n]\, W[k]\}$. Equation (10.9) can be rewritten as a matrix equation:
$$R_F = H (R_F + R_W).$$
Inverting this equation gives
$$H = R_F (R_F + R_W)^{-1}.$$
Since $R_F$ and $R_W$ are diagonal in the basis $\mathcal{B}$ with diagonal values respectively equal to $\beta_m^2$ and $\sigma_m^2$, the matrix $H$ is also diagonal in $\mathcal{B}$ with diagonal values equal to $\beta_m^2 (\beta_m^2 + \sigma_m^2)^{-1}$. So (10.8) shows that the decomposition coefficients of $\tilde F$ and $X$ in $\mathcal{B}$ satisfy
$$\tilde F_\mathcal{B}[m] = \frac{\beta_m^2}{\beta_m^2 + \sigma_m^2}\, X_\mathcal{B}[m], \tag{10.10}$$
which implies (10.5). The resulting risk is
$$E\{\|F - \tilde F\|^2\} = \sum_{m=0}^{N-1} E\left\{ |F_\mathcal{B}[m] - \tilde F_\mathcal{B}[m]|^2 \right\}. \tag{10.11}$$
Inserting (10.10) in (10.11) knowing that $X_\mathcal{B}[m] = F_\mathcal{B}[m] + W_\mathcal{B}[m]$ where $F_\mathcal{B}[m]$ and $W_\mathcal{B}[m]$ are independent yields (10.6).

This theorem proves that the Wiener estimator is implemented with a diagonal attenuation of each data coefficient $X_\mathcal{B}[m]$ by a factor that depends on the signal to noise ratio $\beta_m^2 / \sigma_m^2$ in the direction of $g_m$. The smaller the signal to noise ratio, the more attenuation is required. If $F$ and $W$ are Gaussian processes, then the Wiener estimator is optimal among linear and non-linear estimators of $F$.
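The attenuation factors of (10.5) and the risk (10.6) are simple to evaluate once the variances $\beta_m^2$ and $\sigma_m^2$ are known. The following sketch uses hypothetical function names and made-up variances, and also evaluates the risk of an arbitrary diagonal estimator $a_m X_\mathcal{B}[m]$, which is $\sum_m (1-a_m)^2 \beta_m^2 + a_m^2 \sigma_m^2$ when signal and noise coefficients are uncorrelated, so that the Wiener gains can be compared with other choices.

```python
def wiener_gains(beta2, sigma2):
    """Attenuation factors beta_m^2 / (beta_m^2 + sigma_m^2) of (10.5)."""
    return [b / (b + s) if b + s > 0 else 0.0 for b, s in zip(beta2, sigma2)]

def wiener_risk(beta2, sigma2):
    """Minimum linear Bayes risk (10.6)."""
    return sum(b * s / (b + s) for b, s in zip(beta2, sigma2) if b + s > 0)

def diagonal_risk(gains, beta2, sigma2):
    """Risk of a diagonal estimator a_m * X_B[m]: squared bias on the signal
    plus residual noise, assuming uncorrelated signal and noise coefficients."""
    return sum((1 - a) ** 2 * b + a ** 2 * s
               for a, b, s in zip(gains, beta2, sigma2))

# Made-up variances: strong, moderate and absent signal components in unit noise.
beta2 = [4.0, 1.0, 0.0]
sigma2 = [1.0, 1.0, 1.0]
gains = wiener_gains(beta2, sigma2)
risk = wiener_risk(beta2, sigma2)
risk_no_filter = diagonal_risk([1.0, 1.0, 1.0], beta2, sigma2)
```

With these values the coefficient with the highest signal to noise ratio is attenuated least and the coefficient carrying only noise is set to zero; leaving the data untouched (all gains equal to 1) gives a risk equal to the total noise energy, which is larger than the Wiener risk.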

If $W$ is a white noise then its coefficients are uncorrelated with the same variance
$$E\{W[n]\, W[k]\} = \sigma^2\, \delta[n-k].$$
Its covariance matrix is therefore $R_W = \sigma^2\, \mathrm{Id}$. It is diagonal in all orthonormal bases and in particular in the Karhunen-Loeve basis of $F$. Theorem 10.2 can thus be applied and $\sigma_m = \sigma$ for $0 \le m < N$.

Frequency Filtering  Suppose that $F$ and $W$ are zero-mean, wide-sense circular stationary random vectors. The properties of such processes are reviewed in Appendix A.6. Their covariance satisfies
$$E\{F[n]\, F[k]\} = R_F[n-k], \quad E\{W[n]\, W[k]\} = R_W[n-k],$$
where $R_F[n]$ and $R_W[n]$ are $N$ periodic. These matrices correspond to circular convolution operators and are therefore diagonal in the discrete Fourier basis
$$\left\{ g_m[n] = \frac{1}{\sqrt N} \exp\!\left( \frac{i 2\pi m n}{N} \right) \right\}_{0 \le m < N}.$$
The eigenvalues $\beta_m^2$ and $\sigma_m^2$ are the discrete Fourier transforms of $R_F[n]$ and $R_W[n]$, also called power spectra:
$$\beta_m^2 = \sum_{n=0}^{N-1} R_F[n] \exp\!\left( \frac{-i 2\pi m n}{N} \right) = \hat R_F[m],$$
$$\sigma_m^2 = \sum_{n=0}^{N-1} R_W[n] \exp\!\left( \frac{-i 2\pi m n}{N} \right) = \hat R_W[m].$$
The Wiener estimator (10.5) is a diagonal operator in the discrete Fourier basis, computed with the frequency filter:
$$\hat h[m] = \frac{\hat R_F[m]}{\hat R_F[m] + \hat R_W[m]}. \tag{10.12}$$
It is therefore a circular convolution:
$$\tilde F[n] = D X = X \circledast h[n].$$
N ;1 ^
X RF m] RW m]
^
~ k2g =
rl ( ) = EfkF ; F
:
^
^
m=0 RF m] + RW m] (10.13) The numerical value of the risk is often speci ed by the Signal to Noise
Ratio, which is measured in decibels SNRdb = 10 log10 EfkF k2 g ~
EfkF ; F k2 g : (10.14) 10.1. BAYES ESTIMATION 583 100 100 50 50 0 0 −50 −50 −100
0 0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1 (a) −100
0 0.2 0.4 (b) 0.6 0.8 100 50 0 −50 −100
0 (c)
Figure 10.1: (a): Realization of a Gaussian process F . (b): Noisy
signal obtained by adding a Gaussian white noise (SNR = ;0:48 db).
~
(c): Wiener estimation F (SNR = 15:2 db). 1 584 CHAPTER 10. ESTIMATIONS ARE APPROXIMATIONS Example 10.1 Figure 10.1(a) shows a realization of a Gaussian process F obtained as a convolution of a Gaussian white noise B of variance
2
with a low-pass lter g:
F n] = B ? g n]
with n
g n] = C cos2 2K 1 ;K K ] n] :
Theorem A.4 proves that
^
^
RF m] = RB m] jg m]j2 = 2 jg m]j2:
^
^ The noisy signal X shown in Figure 10.1(b) is contaminated by a Gaus^
sian white noise W of variance 2 , so RW m] = 2 . The Wiener esti~ is calculated with the frequency lter (10.12)
mation F
2
jg j2
^
^
h m] = 2 jg m]m]+ 2 :
^ j2
This linear estimator is also an optimal non-linear estimator because F
and W are jointly Gaussian random vectors. Piecewise Regular The limitations of linear estimators appear clearly
for processes whose realizations are piecewise regular signals. A simple example is a random shift process F constructedP translating
by
;
randomly a piecewise regular signal f n] of zero mean, N=01 f n] = 0:
n F n] = f (n ; P ) mod N ] : (10.15) The shift P is an integer random variable whose probability distribution
is uniform on 0 N ; 1]. It is proved in (9.20) that F is a circular widesense stationary process whose power spectrum is calculated in (9.21):
1
^
RF m] = N jf^ m]j2:
(10.16)
Figure 10.2 shows an example of a piecewise polynomial signal $f$ of degree $d = 3$ contaminated by a Gaussian white noise $W$ of variance $\sigma^2$. Assuming that we know $|\hat f[m]|^2$, the Wiener estimator $\tilde F$ is calculated as a circular convolution with the filter whose transfer function is (10.12). This Wiener filter is a low-pass filter that averages the noisy data to attenuate the noise in regions where the realization of $F$ is regular, but this averaging is limited to avoid degrading the discontinuities too much. As a result, some noise is left in the smooth regions and the discontinuities are averaged a little. The risk calculated in (10.13) is normalized by the total noise energy $E\{\|W\|^2\} = N \sigma^2$:
$$\frac{r_l(\pi)}{N \sigma^2} = N^{-1} \sum_{m=0}^{N-1} \frac{|\hat f[m]|^2}{|\hat f[m]|^2 + N \sigma^2}. \tag{10.17}$$

Figure 10.2: (a): Piecewise polynomial of degree 3. (b): Noisy signal degraded by a Gaussian white noise (SNR = 21.9 db). (c): Wiener estimation (SNR = 25.9 db).

Suppose that $f$ has discontinuities of amplitude on the order of $C$ and that the noise energy is not negligible: $N \sigma^2 \ge C^2$. Using the fact that $|\hat f[m]|$ decays typically like $C N m^{-1}$, a direct calculation of the risk (10.17) gives
$$\frac{r_l(\pi)}{N \sigma^2} \sim \frac{C}{\sigma N^{1/2}}. \tag{10.18}$$
The equivalence $\sim$ means that upper and lower bounds of the left-hand side are obtained by multiplying the right-hand side by two constants $A, B > 0$ that are independent of $C$, $\sigma$ and $N$.
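The $N^{-1/2}$ decay of the normalized risk can be observed numerically by evaluating the right-hand side of (10.17) for a simple discontinuous signal at two sizes. The sketch below is an illustration under assumptions that are not in the text: the signal is a zero-mean square wave equal to $\pm C$, with $C = \sigma = 1$, the sizes are 64 and 256, and the function names are hypothetical. The spectrum is computed with a naive $O(N^2)$ Fourier sum.

```python
import cmath

def power_dft(f):
    """Returns |f_hat[m]|^2 for the length-N DFT of f (naive O(N^2) sum)."""
    n = len(f)
    return [abs(sum(f[t] * cmath.exp(-2j * cmath.pi * m * t / n)
                    for t in range(n))) ** 2
            for m in range(n)]

def normalized_wiener_risk(f, sigma):
    """Right-hand side of (10.17) for the random shift process built from f."""
    n = len(f)
    return sum(p / (p + n * sigma ** 2) for p in power_dft(f)) / n

def step_signal(n, c):
    """Zero-mean square wave: +c on the first half of the period, -c on the second."""
    return [c if t < n // 2 else -c for t in range(n)]

risk_small = normalized_wiener_risk(step_signal(64, 1.0), sigma=1.0)
risk_large = normalized_wiener_risk(step_signal(256, 1.0), sigma=1.0)
ratio = risk_small / risk_large
```

Multiplying the signal size by 4 should divide the normalized risk by a factor close to 2 if the decay is in $N^{-1/2}$, whereas a decay in $N^{-1}$ would give a factor close to 4.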

The estimation of $F$ can be improved by non-linear operators, which average the data $X$ over large domains where $F$ is regular but do not make any averaging where $F$ is discontinuous. Many estimators have been studied [183, 276] that estimate the position of the discontinuities of $f$ in order to adapt the data averaging. These algorithms have long remained ad hoc implementations of intuitively appealing ideas. Wavelet thresholding estimators perform such an adaptive smoothing and Section 10.3.3 proves that the normalized risk decays like $N^{-1}$ as opposed to $N^{-1/2}$ in (10.18).

10.1.2 Minimax Estimation

Although we may have some prior information, it is rare that we know the probability distribution of complex signals. This prior information often defines a set $\Theta$ to which signals are guaranteed to belong, without specifying their probability distribution in $\Theta$. The more prior information, the smaller the set $\Theta$. For example, we may know that a signal has at most $K$ discontinuities, with bounded derivatives outside these discontinuities. This defines a particular prior set $\Theta$. Presently, there exists no stochastic model that takes into account the diversity of natural images. However, many images, such as the one in Figure 2.2, have some form of piecewise regularity, with a bounded total variation. This also specifies a prior set $\Theta$.
The problem is to estimate f 2 from the noisy data X n] = f n] + W n] :
~
The risk of an estimation F = DX is r(D f ) = EfkDX ; f k2g. The
expected risk over cannot be computed because we do not know the
probability distribution of signals in . To control the risk for any
f 2 , we thus try to minimize the maximum risk:
r(D ) = sup EfkDX ; f k2g:
f2
The minimax risk is the lower bound computed over all linear and
non-linear operators D:
rn( ) = Dinf n r(D ):
2O
In practice, we must nd a decision operator D that is simple to implement and such that r(D ) is close to the minimax risk rn( ).
As a first step, as for Wiener estimators in the Bayes framework,
we can simplify the problem by restricting D to be a linear operator.
The linear minimax risk over $\Theta$ is the lower bound:
$$r_l(\Theta) = \inf_{D \in O_l} r(D,\Theta).$$
This strategy is efficient only if $r_l(\Theta)$ is of the same order as $r_n(\Theta)$.

Bayes Priors A Bayes estimator supposes that we know the prior
probability distribution $\pi$ of signals in $\Theta$. If available, this supplement
of information can only improve the signal estimation. The central
result of game and decision theory shows that minimax estimations are
Bayes estimations for a "least favorable" prior distribution.
Let F be the signal random vector, whose probability distribution
is given by the prior $\pi$. For a decision operator D, the expected risk
is $r(D,\pi) = E_\pi\{r(D,F)\}$. The minimum Bayes risks for linear and
non-linear operators are defined by:
$$r_l(\pi) = \inf_{D \in O_l} r(D,\pi) \quad\text{and}\quad r_n(\pi) = \inf_{D \in O_n} r(D,\pi).$$
Let $\Theta^*$ be the set of all probability distributions of random vectors
whose realizations are in $\Theta$. The minimax theorem relates a minimax
risk and the maximum Bayes risk calculated for priors in $\Theta^*$.
Theorem 10.3 (Minimax) For any subset $\Theta$ of $\mathbb{C}^N$
$$r_l(\Theta) = \sup_{\pi \in \Theta^*} r_l(\pi) \quad\text{and}\quad r_n(\Theta) = \sup_{\pi \in \Theta^*} r_n(\pi). \tag{10.19}$$
Proof $^2$. For any $\pi \in \Theta^*$
$$r(D,\pi) \le r(D,\Theta) \tag{10.20}$$
because $r(D,\pi)$ is an average risk over realizations of F that are in $\Theta$,
whereas $r(D,\Theta)$ is the maximum risk over $\Theta$. Let O be a convex set of
operators (either $O_l$ or $O_n$). The inequality (10.20) implies that
$$\sup_{\pi \in \Theta^*} r(\pi) = \sup_{\pi \in \Theta^*} \inf_{D \in O} r(D,\pi) \le \inf_{D \in O} r(D,\Theta) = r(\Theta). \tag{10.21}$$
The main difficulty is to prove the reverse inequality: $r(\Theta) \le \sup_{\pi \in \Theta^*} r(\pi)$.
When $\Theta$ is a finite set, the proof gives an important geometrical interpretation of the minimum Bayes risk and the minimax risk. The extension
to an infinite set $\Theta$ is sketched.
Suppose that $\Theta = \{f_i\}_{1 \le i \le p}$ is a finite set of signals. We define a
risk set:
$$R = \{(y_1, \ldots, y_p) \in \mathbb{C}^p : \exists D \in O \text{ with } y_i = r(D,f_i) \text{ for } 1 \le i \le p\}.$$
This set is convex in $\mathbb{C}^p$ because O is convex. We begin by giving geometrical interpretations to the Bayes risk and the minimax risk.
A prior $\pi \in \Theta^*$ is a vector of discrete probabilities $(\pi_1, \ldots, \pi_p)$ and
$$r(D,\pi) = \sum_{i=1}^{p} \pi_i\, r(D,f_i). \tag{10.22}$$
The equation $\sum_{i=1}^{p} \pi_i y_i = b$ defines a hyperplane $P_b$ in $\mathbb{C}^p$. Computing
$r(\pi) = \inf_{D \in O} r(D,\pi)$ is equivalent to finding the infimum $b_0 = r(\pi)$ of
all b for which $P_b$ intersects R. The plane $P_{b_0}$ is tangent to R as shown
in Figure 10.3.
The minimax risk $r(\Theta)$ has a different geometrical interpretation.
Let $Q_c = \{(y_1, \ldots, y_p) \in \mathbb{C}^p : y_i \le c\}$. One can verify that $r(\Theta) = \inf_{D \in O} \sup_{f_i \in \Theta} r(D,f_i)$ is the infimum $c_0 = r(\Theta)$ of all c such that $Q_c$
intersects R.
Figure 10.3: At the Bayes point, a hyperplane defined by the prior $\pi$
is tangent to the risk set R. The least favorable prior $\tau$ defines a
hyperplane that is tangential to R at the minimax point. (Axes: $r(D,f_1)$ and $r(D,f_2)$; the set $Q_{c_0}$ touches R at the minimax point.)
To prove that $r(\Theta) \le \sup_{\pi \in \Theta^*} r(\pi)$ we look for a prior distribution
$\tau \in \Theta^*$ such that $r(\tau) = r(\Theta)$. Let $\tilde Q_{c_0}$ be the interior of $Q_{c_0}$. Since $\tilde Q_{c_0} \cap R = \emptyset$ and both $\tilde Q_{c_0}$ and R are convex sets, the hyperplane separation
theorem says that there exists a hyperplane of equation
$$\sum_{i=1}^{p} \tau_i y_i = \tau \cdot y = b \tag{10.23}$$
with $\tau \cdot y \le b$ for $y \in \tilde Q_{c_0}$ and $\tau \cdot y \ge b$ for $y \in R$. Each $\tau_i \ge 0$, for
if $\tau_j < 0$ then for $y \in \tilde Q_{c_0}$ we obtain a contradiction by taking $y_j$ to
$-\infty$ with the other coordinates being fixed. Indeed, $\tau \cdot y$ goes to $+\infty$
and since y remains in $\tilde Q_{c_0}$ it contradicts the fact that $\tau \cdot y \le b$. We can
normalize $\sum_{i=1}^{p} \tau_i = 1$ by dividing each side of (10.23) by $\sum_{i=1}^{p} \tau_i > 0$. So
$\tau$ corresponds to a probability distribution. By letting $y \in \tilde Q_{c_0}$ converge
to the corner point $(c_0, \ldots, c_0)$, since $\tau \cdot y \le b$ we derive that $c_0 \le b$.
Moreover, since $\tau \cdot y \ge b$ for all $y \in R$,
$$r(\tau) = \inf_{D \in O} \sum_{i=1}^{p} \tau_i\, r(D,f_i) \ge b \ge c_0 = r(\Theta).$$
So $r(\Theta) \le \sup_{\pi \in \Theta^*} r(\pi)$ which, together with (10.21), proves that $r(\Theta) = \sup_{\pi \in \Theta^*} r(\pi)$.
The extension of this result to an infinite set of signals $\Theta$ is done with
a compactness argument. When $O = O_l$ or $O = O_n$, for any prior $\pi \in \Theta^*$
we know from Theorem 10.1 and Proposition 10.1 that $\inf_{D \in O} r(D,\pi)$
is reached by some Bayes decision operator $D \in O$. One can verify that
there exists a subset of operators C that includes the Bayes operator for
any prior $\pi \in \Theta^*$, and such that C is compact for an appropriate topology.
When $O = O_l$, one can choose C to be the set of linear operators of
norm smaller than 1, which is compact because it belongs to a finite
dimensional space of linear operators. Moreover, the risk $r(D,f)$ can be
shown to be continuous in this topology with respect to $D \in C$.
Let $c < r(\Theta)$. For any $f \in \Theta$ we consider the set of operators
$S_f = \{D \in C : r(D,f) > c\}$. The continuity of r implies that $S_f$ is
an open set. For each $D \in C$ there exists $f \in \Theta$ such that $D \in S_f$,
so $C = \cup_{f \in \Theta} S_f$. Since C is compact there exists a finite subcovering
$C = \cup_{1 \le i \le p} S_{f_i}$. The minimax risk over $\Theta_c = \{f_i\}_{1 \le i \le p}$ satisfies
$$r(\Theta_c) = \inf_{D \in O} \sup_{1 \le i \le p} r(D,f_i) \ge c.$$
Since $\Theta_c$ is a finite set, we proved that there exists $\tau_c \in \Theta_c^* \subset \Theta^*$
such that $r(\tau_c) = r(\Theta_c)$. But $r(\Theta_c) \ge c$ so letting c go to $r(\Theta)$ implies that $\sup_{\pi \in \Theta^*} r(\pi) \ge r(\Theta)$. Together with (10.21) this shows that
$\sup_{\pi \in \Theta^*} r(\pi) = r(\Theta)$.
A distribution $\tau \in \Theta^*$ such that $r(\tau) = \sup_{\pi \in \Theta^*} r(\pi)$ is called a least favorable prior distribution. The minimax theorem proves that the
minimax risk is the minimum Bayes risk for a least favorable prior.
In signal processing, minimax calculations are often hidden behind
apparently orthodox Bayes estimations. Let us consider an example
involving images. It has been observed that histograms of the wavelet
coefficients of "natural" images can be modeled with generalized Gaussian distributions [255, 311]. This means that natural images belong to a certain set $\Theta$, but it does not specify a prior distribution over this
set. To compensate for the lack of knowledge about the dependency of
wavelet coefficients spatially and across scales, one may be tempted to
create a "simple probabilistic model" where all wavelet coefficients are
considered to be independent. This model is clearly wrong since images have geometrical structures that create strong dependencies both
spatially and across scales (see Figure 7.26). However, calculating a
Bayes estimator with this inaccurate prior model may give valuable
results when estimating images. Why? Because this "simple" prior is
often close to a least favorable prior. The resulting estimator and risk
are thus good approximations of the minimax optimum. If not chosen
carefully, a "simple" prior may yield an optimistic risk evaluation that
is not valid for real signals. Understanding the robustness of uncertain
priors is what minimax calculations are often about.

10.2 Diagonal Estimation in a Basis 2
It is generally not possible to compute the optimal Bayes or minimax
estimator that minimizes the risk among all possible operators. To
manage this complexity, the most classical strategy limits the choice of
operators among linear operators. This comes at a cost, because the
minimum risk among linear estimators may be well above the minimum
risk obtained with non-linear estimators. Figure 10.2 is an example
where the linear Wiener estimation can be considerably improved with
a non-linear averaging. This section studies a particular class of non-linear estimators that are diagonal in a basis B. If the basis B defines a
sparse signal representation, then such diagonal estimators are nearly
optimal among all non-linear estimators.
Section 10.2.1 computes a lower bound for the risk when estimating
an arbitrary signal f with a diagonal operator. Donoho and Johnstone
[167] made a fundamental breakthrough by showing that thresholding
estimators have a risk that is close to this lower bound. The general
properties of thresholding estimators are introduced in Sections 10.2.2
and 10.2.3. Thresholding estimators in wavelet bases are studied in
Section 10.2.4. They implement an adaptive signal averaging that is
much more efficient than linear operators to estimate piecewise regular signals. Section 10.2.5 ends this section by explaining how to search
for a best basis that minimizes the risk. The minimax optimality of
diagonal operators for estimating signals in a prior set $\Theta$ is studied in
Section 10.3.

10.2.1 Diagonal Estimation with Oracles

We consider estimators computed with a diagonal operator in an orthonormal basis $B = \{g_m\}_{0 \le m < N}$. Lower bounds for the risk are computed with "oracles," which simplify the estimation by providing information about the signal that is normally not available. These lower
bounds are closely related to errors when approximating signals from a
few vectors selected in B.
The noisy data
$$X = f + W \tag{10.24}$$
is decomposed in B. We write
$$X_B[m] = \langle X, g_m \rangle, \quad f_B[m] = \langle f, g_m \rangle \quad\text{and}\quad W_B[m] = \langle W, g_m \rangle.$$
The inner product of (10.24) with $g_m$ gives
$$X_B[m] = f_B[m] + W_B[m].$$
We suppose that W is a zero-mean white noise of variance $\sigma^2$, which
means
$$E\{W[n]\,W[k]\} = \sigma^2\,\delta[n-k].$$
The noise coefficients
$$W_B[m] = \sum_{n=0}^{N-1} W[n]\,g_m[n]$$
also define a white noise of variance $\sigma^2$. Indeed,
$$E\{W_B[m]\,W_B[p]\} = \sum_{n=0}^{N-1}\sum_{k=0}^{N-1} g_m[n]\,g_p[k]\,E\{W[n]\,W[k]\} = \sigma^2\,\langle g_p, g_m \rangle = \sigma^2\,\delta[p-m].$$
Since the noise remains white in all bases, it does not influence the
choice of basis. When the noise is not white, which is the case for
the inverse problems of Section 10.4, the noise can have an important
impact on the basis choice.
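The invariance of white noise under an orthonormal change of basis can be checked numerically. The following Python sketch is an illustration only: the choice of the DCT-II basis, the function names and the sample sizes are assumptions made for this example, not part of the text. It estimates the covariance of the coefficients $W_B[m]$ by Monte Carlo.

```python
import math
import random

def dct_basis(N):
    """Orthonormal DCT-II vectors g_m[n], used here as one possible basis B."""
    basis = []
    for m in range(N):
        scale = math.sqrt(1.0 / N) if m == 0 else math.sqrt(2.0 / N)
        basis.append([scale * math.cos(math.pi * (n + 0.5) * m / N)
                      for n in range(N)])
    return basis

def empirical_covariance(basis, sigma, trials, seed=0):
    """Monte Carlo estimate of E{W_B[m] W_B[p]} for Gaussian white noise W."""
    rng = random.Random(seed)
    N = len(basis)
    cov = [[0.0] * N for _ in range(N)]
    for _ in range(trials):
        w = [rng.gauss(0.0, sigma) for _ in range(N)]
        # W_B[m] = <W, g_m>
        wb = [sum(w[n] * g[n] for n in range(N)) for g in basis]
        for m in range(N):
            for p in range(N):
                cov[m][p] += wb[m] * wb[p] / trials
    return cov

basis = dct_basis(4)
cov = empirical_covariance(basis, sigma=2.0, trials=20000)
for row in cov:
    print(["%.2f" % v for v in row])
```

Up to Monte Carlo fluctuations, the printed matrix should be close to $\sigma^2$ times the identity, here with $\sigma^2 = 4$.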
A diagonal operator estimates independently each $f_B[m]$ from $X_B[m]$
with a function $d_m(x)$. The resulting estimator is
$$\tilde F = D X = \sum_{m=0}^{N-1} d_m(X_B[m])\,g_m. \tag{10.25}$$
The class of signals that are considered is supposed to be centered at
0, so we set D 0 = 0 and hence $d_m(0) = 0$. As a result, we can write
$$d_m(X_B[m]) = a[m]\,X_B[m] \quad\text{for } 0 \le m < N,$$
where a[m] depends on $X_B[m]$. The operator D is linear when a[m] is
a constant independent of $X_B[m]$. We shall see that a smaller risk is
obtained with $|a[m]| \le 1$, which means that the diagonal operator D
attenuates the noisy coefficients.

Attenuation With Oracle Let us find the a[m] that minimizes the
risk $r(D,f)$ of the estimator (10.25):
$$r(D,f) = E\{\|f - \tilde F\|^2\} = \sum_{m=0}^{N-1} E\{|f_B[m] - X_B[m]\,a[m]|^2\}. \tag{10.26}$$
Since $X_B = f_B + W_B$ and $E\{|W_B[m]|^2\} = \sigma^2$, it follows that
$$E\{|f_B[m] - X_B[m]\,a[m]|^2\} = |f_B[m]|^2\,(1 - a[m])^2 + \sigma^2\,a[m]^2. \tag{10.27}$$
This risk is minimum for
$$a[m] = \frac{|f_B[m]|^2}{|f_B[m]|^2 + \sigma^2}, \tag{10.28}$$
in which case
$$r_{\inf}(f) = E\{\|f - \tilde F\|^2\} = \sum_{m=0}^{N-1} \frac{|f_B[m]|^2\,\sigma^2}{|f_B[m]|^2 + \sigma^2}. \tag{10.29}$$
In practice, the attenuation factor a[m] in (10.28) cannot be computed
since it depends on $|f_B[m]|$, whose value is not known. The risk $r_{\inf}(f)$
is therefore a lower bound which is not reachable. This risk is obtained
with an oracle that provides information that is normally not available.
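To make the oracle attenuation concrete, here is a minimal Python sketch that evaluates the attenuation factors (10.28) and checks that they attain the closed-form risk (10.29). The coefficient values and function names are hypothetical, chosen only for illustration; the code assumes $\sigma > 0$.

```python
def oracle_attenuation(f_b, sigma):
    """Oracle attenuation factors a[m] of (10.28); requires sigma > 0."""
    return [abs(c) ** 2 / (abs(c) ** 2 + sigma ** 2) for c in f_b]

def attenuation_risk(f_b, a, sigma):
    """Risk (10.26)-(10.27) of a diagonal attenuation with fixed factors a[m]."""
    return sum(abs(c) ** 2 * (1 - am) ** 2 + sigma ** 2 * am ** 2
               for c, am in zip(f_b, a))

def oracle_risk(f_b, sigma):
    """Closed form (10.29) of the oracle attenuation risk r_inf(f)."""
    return sum(abs(c) ** 2 * sigma ** 2 / (abs(c) ** 2 + sigma ** 2)
               for c in f_b)

f_b = [4.0, 2.0, 0.5, 0.0]      # hypothetical coefficients f_B[m]
sigma = 1.0
a = oracle_attenuation(f_b, sigma)
r_inf = oracle_risk(f_b, sigma)

print(a)
print(attenuation_risk(f_b, a, sigma), r_inf)      # the two values coincide
print(attenuation_risk(f_b, [1.0] * 4, sigma))     # no attenuation: N * sigma^2
```

Any other choice of factors, such as a[m] = 1 for all m, gives a larger value in (10.27), which is why $r_{\inf}(f)$ serves as a lower bound.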
Section 10.2.2 shows that one can get close to $r_{\inf}(f)$ with a simple
thresholding.

Linear Projection The analysis of diagonal estimators can be simplified by restricting $a[m] \in \{0, 1\}$. When a[m] = 1, the estimator
$\tilde F = DX$ selects the coefficient $X_B[m]$, and it removes it if a[m] = 0.
If each a[m] is a constant, then D is a linear orthonormal projection on the space generated by the M vectors $g_m$ such that a[m] = 1.
Suppose that a[m] = 1 for $0 \le m < M$. The risk (10.26) becomes
$$r(D,f) = \sum_{m=M}^{N-1} |f_B[m]|^2 + M\sigma^2 = \epsilon_l[M] + M\sigma^2, \tag{10.30}$$
where $\epsilon_l[M]$ is the linear approximation error computed in (9.1). The
two terms $\epsilon_l[M]$ and $M\sigma^2$ are respectively the bias and the variance
components of the estimator. To minimize $r(D,f)$, the parameter M
is adjusted so that the bias is of the same order as the variance. When
the noise variance $\sigma^2$ decreases, the following proposition proves that
the decay rate of $r(D,f)$ depends on the decay rate of $\epsilon_l[M]$ as M
increases.
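The bias-variance trade-off in (10.30) can be explored with a short Python sketch. The decay model $|f_B[m]| = C\,(m+1)^{-s}$, the parameter values and the function names are assumptions made for this illustration only.

```python
def linear_projection_risk(f_b, sigma, M):
    """Risk (10.30): bias eps_l[M] plus variance M * sigma^2."""
    bias = sum(abs(c) ** 2 for c in f_b[M:])
    return bias + M * sigma ** 2

def best_linear_projection(f_b, sigma):
    """Return (M, risk) minimizing (10.30) over 0 <= M <= N by exhaustive search."""
    risks = [linear_projection_risk(f_b, sigma, M)
             for M in range(len(f_b) + 1)]
    M = min(range(len(risks)), key=risks.__getitem__)
    return M, risks[M]

# Hypothetical coefficients with a power-law decay.
C, s, N = 10.0, 1.0, 256
f_b = [C * (m + 1) ** (-s) for m in range(N)]

for sigma in (1.0, 0.5, 0.25):
    M, risk = best_linear_projection(f_b, sigma)
    print("sigma=%.2f  best M=%d  risk=%.3f" % (sigma, M, risk))
```

As the noise level decreases, the best M grows, since more coefficients are worth keeping before the variance term dominates the bias.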
Proposition 10.2 If $\epsilon_l[M] \sim C^2 M^{1-2s}$ with $1 \le C/\sigma \le N^s$ then
$$\min_M r(D,f) \sim C^{1/s}\,\sigma^{2-1/s}. \tag{10.31}$$
Proof $^2$. Let $M_0$ be defined by:
$$(M_0 + 1)\,\sigma^2 \ge \epsilon_l[M_0] \ge M_0\,\sigma^2.$$
Since $\epsilon_l[M] \sim C^2 M^{1-2s}$ we get $M_0 \sim C^{1/s}\,\sigma^{-1/s}$. The condition $1 \le C/\sigma \le N^s$ ensures that $1 \le M_0 \le N$. The risk (10.30) satisfies
$$M_0\,\sigma^2 \le \min_M r(D,f) \le (2M_0 + 1)\,\sigma^2 \tag{10.32}$$
and $M_0 \sim C^{1/s}\,\sigma^{-1/s}$ implies (10.31).

Projection With Oracle The non-linear projector that minimizes
the risk (10.27) is defined by
$$a[m] = \begin{cases} 1 & \text{if } |f_B[m]| \ge \sigma \\ 0 & \text{if } |f_B[m]| < \sigma. \end{cases} \tag{10.33}$$
This projector cannot be implemented because a[m] depends on $|f_B[m]|$
instead of $X_B[m]$. It uses an "oracle" that keeps the coefficients $f_B[m]$
that are above the noise. The risk of this oracle projector is computed
with (10.27):
$$r_p(f) = E\{\|f - \tilde F\|^2\} = \sum_{m=0}^{N-1} \min(|f_B[m]|^2, \sigma^2). \tag{10.34}$$
Since for any x, y
$$\min(x,y) \ge \frac{x\,y}{x+y} \ge \frac{1}{2}\,\min(x,y),$$
the risk of the oracle projector (10.34) is of the same order as the risk
of an oracle attenuation (10.29):
$$r_p(f) \ge r_{\inf}(f) \ge \frac{1}{2}\,r_p(f). \tag{10.35}$$
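The elementary inequality between $\min(x,y)$ and $xy/(x+y)$ used above can be verified in one line. The derivation below is a short check added for convenience; it assumes $x, y \ge 0$ with $x + y > 0$ and, without loss of generality, $x \le y$.

```latex
% Assume 0 \le x \le y and x + y > 0, so that \min(x,y) = x.
\frac{x\,y}{x+y} = x \cdot \frac{y}{x+y},
\qquad
\frac{1}{2} \le \frac{y}{x+y} \le 1
\quad\Longrightarrow\quad
\frac{1}{2}\,\min(x,y) \le \frac{x\,y}{x+y} \le \min(x,y).
% Taking x = |f_B[m]|^2 and y = \sigma^2 and summing over m
% turns the terms of (10.29) and (10.34) into the two bounds of (10.35).
```

The factor 1/2 is reached when x = y, that is for coefficients whose amplitude equals the noise level.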
As in the linear case, the risk of an oracle projector can be related
to the approximation error of f in the basis B. Let M be the number
of coefficients such that $|f_B[m]| \ge \sigma$. The optimal non-linear approximation of f by these M larger amplitude coefficients is
$$f_M = \sum_{|f_B[m]| \ge \sigma} f_B[m]\,g_m.$$
The approximation error is studied in Section 9.2:
$$\epsilon_n[M] = \|f - f_M\|^2 = \sum_{|f_B[m]| < \sigma} |f_B[m]|^2.$$
The risk (10.34) of an oracle projection can thus be rewritten
$$r_p(f) = \sum_{m=0}^{N-1} \min(|f_B[m]|^2, \sigma^2) = \epsilon_n[M] + M\sigma^2. \tag{10.36}$$
The following proposition proves that when $\sigma$ decreases, the decay of
this risk depends on the decay of $\epsilon_n[M]$ as M increases.

Proposition 10.3 If $\epsilon_n[M] \sim C^2 M^{1-2s}$ with $1 \le C/\sigma \le N^s$, then
$$r_p(f) \sim C^{1/s}\,\sigma^{2-1/s}. \tag{10.37}$$
Proof $^2$. Let $f_B^r[k] = f_B[m_k]$ be the coefficient of rank k in decreasing
order: $|f_B^r[k]| \ge |f_B^r[k+1]|$. If $\epsilon_n[M] \sim C^2 M^{1-2s}$ then Theorem 9.4
proves that $|f_B^r[k]| \sim C\,k^{-s}$.
Since $|f_B^r[M]| \ge \sigma > |f_B^r[M+1]|$ we derive that $M \sim (C/\sigma)^{1/s}$.
The condition $1 \le C/\sigma \le N^s$ guarantees $1 \le M < N$. It follows that
$\epsilon_n[M] \sim M\sigma^2$ and (10.37) is derived from (10.36).

Propositions 10.2 and 10.3 prove that the performance of linear and oracle projection estimators depends respectively on the precision of linear
and non-linear approximations in the basis B. Having an approximation error that decreases quickly means that one can then construct a
sparse and precise signal representation with only a few vectors in B.
Section 9.2 shows that non-linear approximations can be much more
precise, in which case the risk of a non-linear oracle projection is much
smaller than the risk of a linear projection.

10.2.2 Thresholding Estimation
In a basis $B = \{g_m\}_{0 \le m < N}$, a diagonal estimator of f from X = f + W
can be written
$$\tilde F = DX = \sum_{m=0}^{N-1} d_m(X_B[m])\,g_m. \tag{10.38}$$
We suppose that W is a Gaussian white noise of variance $\sigma^2$. When
$d_m$ are thresholding functions, the risk of this estimator is shown to be
close to the lower bounds obtained with oracle estimators.

Hard Thresholding A hard thresholding estimator is implemented
with
$$d_m(x) = \rho_T(x) = \begin{cases} x & \text{if } |x| > T \\ 0 & \text{if } |x| \le T. \end{cases} \tag{10.39}$$
The operator D in (10.38) is then a non-linear projector in the basis
B. The risk of this thresholding is
$$r_t(f) = r(D,f) = \sum_{m=0}^{N-1} E\{|f_B[m] - \rho_T(X_B[m])|^2\}.$$
Since $X_B[m] = f_B[m] + W_B[m]$,
$$|f_B[m] - \rho_T(X_B[m])|^2 = \begin{cases} |W_B[m]|^2 & \text{if } |X_B[m]| > T \\ |f_B[m]|^2 & \text{if } |X_B[m]| \le T. \end{cases}$$
A thresholding is a projector whose risk is therefore larger than the risk
(10.34) of an oracle projector:
$$r_t(f) \ge r_p(f) = \sum_{m=0}^{N-1} \min(|f_B[m]|^2, \sigma^2).$$

Soft Thresholding An oracle attenuation (10.28) yields a risk $r_{\inf}(f)$
that is smaller than the risk $r_p(f)$ of an oracle projection, by slightly
decreasing the amplitude of all coefficients in order to reduce the added
noise. A similar attenuation, although non-optimal, is implemented by
a soft thresholding, which decreases by T the amplitude of all noisy
coefficients. The resulting diagonal estimator $\tilde F$ in (10.38) is calculated
with the soft thresholding function
$$d_m(x) = \rho_T(x) = \begin{cases} x - T & \text{if } x \ge T \\ x + T & \text{if } x \le -T \\ 0 & \text{if } |x| \le T. \end{cases} \tag{10.40}$$
This soft thresholding is the solution that minimizes a quadratic distance to the data, penalized by an $l^1$ norm. Given the data x[m], the
vector y[m] which minimizes
$$\sum_{m=1}^{N-1} |y[m] - x[m]|^2 + 2\,T \sum_{m=1}^{N-1} |y[m]|$$
is $y[m] = \rho_T(x[m])$.
The threshold T is generally chosen so that there is a high probability that it is just above the maximum level of the noise coefficients
$|W_B[m]|$. Reducing by T the amplitude of all noisy coefficients thus
ensures that the amplitude of an estimated coefficient is smaller than
the amplitude of the original one:
$$|\rho_T(X_B[m])| \le |f_B[m]|. \tag{10.41}$$
In a wavelet basis where large amplitude coefficients correspond to transient signal variations, this means that the estimation keeps only transients coming from the original signal, without adding others due to
the noise.

Thresholding Risk The following theorem [167] proves that for an appropriate choice of T, the risk of a thresholding is close to the risk of
an oracle projector $r_p(f) = \sum_{m=0}^{N-1} \min(|f_B[m]|^2, \sigma^2)$. We denote by $O_d$
the set of all operators that are diagonal in B, and which can thus be written as
in (10.38).

Theorem 10.4 (Donoho, Johnstone) Let $T = \sigma\sqrt{2\log_e N}$. The
risk $r_t(f)$ of a hard or a soft thresholding estimator satisfies for all
$N \ge 4$
$$r_t(f) \le (2\log_e N + 1)\,\bigl(\sigma^2 + r_p(f)\bigr). \tag{10.42}$$
The factor $2\log_e N$ is optimal among diagonal estimators in B:
$$\lim_{N \to +\infty}\ \inf_{D \in O_d}\ \sup_{f \in \mathbb{C}^N} \frac{E\{\|f - \tilde F\|^2\}}{\sigma^2 + r_p(f)}\,\frac{1}{2\log_e N} = 1. \tag{10.43}$$
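The upper bound (10.42) can be illustrated numerically. The following Python sketch implements the hard and soft thresholding functions (10.39) and (10.40) and estimates the thresholding risk by Monte Carlo for a sparse coefficient vector. The signal, the number of trials and the function names are assumptions chosen for this example.

```python
import math
import random

def hard_threshold(x, T):
    """Hard thresholding (10.39)."""
    return x if abs(x) > T else 0.0

def soft_threshold(x, T):
    """Soft thresholding (10.40)."""
    if x >= T:
        return x - T
    if x <= -T:
        return x + T
    return 0.0

def thresholding_risk(f_b, sigma, rho, trials=2000, seed=0):
    """Monte Carlo estimate of r_t(f) with T = sigma * sqrt(2 ln N)."""
    rng = random.Random(seed)
    T = sigma * math.sqrt(2.0 * math.log(len(f_b)))
    total = 0.0
    for _ in range(trials):
        for c in f_b:
            x = c + rng.gauss(0.0, sigma)       # X_B[m] = f_B[m] + W_B[m]
            total += (c - rho(x, T)) ** 2
    return total / trials

def oracle_projection_risk(f_b, sigma):
    """r_p(f) of (10.34)."""
    return sum(min(c * c, sigma * sigma) for c in f_b)

N, sigma = 64, 1.0
f_b = [8.0, -6.0, 5.0, 3.0] + [0.0] * (N - 4)   # hypothetical sparse signal
r_p = oracle_projection_risk(f_b, sigma)
bound = (2.0 * math.log(N) + 1.0) * (sigma ** 2 + r_p)
r_hard = thresholding_risk(f_b, sigma, hard_threshold)
r_soft = thresholding_risk(f_b, sigma, soft_threshold)
print("r_p=%.2f  hard=%.2f  soft=%.2f  bound=%.2f" % (r_p, r_hard, r_soft, bound))
```

Both estimated risks should lie between the oracle projection risk and the bound of (10.42), up to Monte Carlo fluctuations.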
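Proof $^2$. The proof of (10.42) is given for a soft thresholding. For a hard
thresholding, the proof is similar although slightly more complicated.
For a threshold $\lambda$, a soft thresholding is computed with
$$\rho_\lambda(x) = (x - \lambda\,\mathrm{sign}(x))\,\mathbf{1}_{|x| > \lambda}.$$
Let X be a Gaussian random variable of mean $\mu$ and variance 1. The
risk when estimating $\mu$ with a soft thresholding of X is
$$r(\lambda,\mu) = E\{|\rho_\lambda(X) - \mu|^2\} = E\{|(X - \lambda\,\mathrm{sign}(X))\,\mathbf{1}_{|X| > \lambda} - \mu|^2\}. \tag{10.44}$$
If X has a variance $\sigma^2$ and a mean $\mu$ then by considering $\tilde X = X/\sigma$ we verify that
$$E\{|\rho_\lambda(X) - \mu|^2\} = \sigma^2\, r\!\left(\frac{\lambda}{\sigma}, \frac{\mu}{\sigma}\right).$$
Since $f_B[m]$ is a constant, $X_B[m] = f_B[m] + W_B[m]$ is a Gaussian
random variable of mean $f_B[m]$ and variance $\sigma^2$. The risk of the soft
thresholding estimator $\tilde F$ with a threshold T is thus
$$r_t(f) = \sigma^2 \sum_{m=0}^{N-1} r\!\left(\frac{T}{\sigma}, \frac{f_B[m]}{\sigma}\right). \tag{10.45}$$
An upper bound of this risk is calculated with the following lemma.

Lemma 10.1 If $\mu \ge 0$ then
$$r(\lambda,\mu) \le r(\lambda,0) + \min(\mu^2, 1 + \lambda^2). \tag{10.46}$$
To prove (10.46), we first verify that if $\mu \ge 0$ then
$$0 \le \frac{\partial r(\lambda,\mu)}{\partial \mu} = 2\mu \int_{-\lambda-\mu}^{\lambda-\mu} \phi(x)\,dx \le 2\mu, \tag{10.47}$$
where $\phi(x)$ is the normalized Gaussian probability density
$$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2}\right).$$
Indeed (10.44) shows that
$$r(\lambda,\mu) = \mu^2 \int_{-\lambda-\mu}^{\lambda-\mu} \phi(x)\,dx + \int_{\lambda-\mu}^{+\infty} (x-\lambda)^2\,\phi(x)\,dx + \int_{-\infty}^{-\lambda-\mu} (x+\lambda)^2\,\phi(x)\,dx. \tag{10.48}$$
We obtain (10.47) by differentiating with respect to $\mu$.
Since $\int_{-\infty}^{+\infty} \phi(x)\,dx = \int_{-\infty}^{+\infty} x^2\,\phi(x)\,dx = 1$ and $\frac{\partial r(\lambda,\mu)}{\partial \mu} \ge 0$, necessarily
$$r(\lambda,\mu) \le \lim_{\mu \to +\infty} r(\lambda,\mu) = 1 + \lambda^2. \tag{10.49}$$
Moreover, since $\frac{\partial r(\lambda,s)}{\partial s} \le 2s$,
$$r(\lambda,\mu) - r(\lambda,0) = \int_0^{\mu} \frac{\partial r(\lambda,s)}{\partial s}\,ds \le \mu^2. \tag{10.50}$$
The inequality (10.46) of the lemma is finally derived from (10.49) and
(10.50):
$$r(\lambda,\mu) \le \min\bigl(r(\lambda,0) + \mu^2,\ 1 + \lambda^2\bigr) \le r(\lambda,0) + \min(\mu^2, 1 + \lambda^2).$$
By inserting the inequality (10.46) of the lemma in (10.45), we get
$$r_t(f) \le N\sigma^2\, r\!\left(\frac{T}{\sigma}, 0\right) + \sigma^2 \sum_{m=0}^{N-1} \min\!\left(\frac{T^2}{\sigma^2} + 1,\ \frac{|f_B[m]|^2}{\sigma^2}\right). \tag{10.51}$$
The expression (10.48) shows that $r(\lambda,0) = 2\int_0^{+\infty} x^2\,\phi(x+\lambda)\,dx$. For
$T = \sigma\sqrt{2\log_e N}$ and $N \ge 4$, one can verify that
$$N\, r\!\left(\frac{T}{\sigma}, 0\right) \le 2\log_e N + 1. \tag{10.52}$$
Moreover,
$$\sigma^2 \min\!\left(\frac{T^2}{\sigma^2} + 1,\ \frac{|f_B[m]|^2}{\sigma^2}\right) = \min\bigl(2\sigma^2\log_e N + \sigma^2,\ |f_B[m]|^2\bigr) \le (2\log_e N + 1)\,\min(\sigma^2, |f_B[m]|^2). \tag{10.53}$$
Inserting (10.52) and (10.53) in (10.51) proves (10.42).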
Since the soft and hard thresholding estimators are particular instances of diagonal estimators, the inequality (10.42) implies that
$$\lim_{N \to +\infty}\ \inf_{D \in O_d}\ \sup_{f \in \mathbb{C}^N} \frac{E\{\|f - \tilde F\|^2\}}{\sigma^2 + r_p(f)}\,\frac{1}{2\log_e N} \le 1. \tag{10.54}$$
To prove that the limit is equal to 1, for N fixed we compute a lower
bound by replacing the sup over all signals f by an expected value over
the distribution of a particular signal process F. The coefficients $F_B[m]$
are chosen to define a very sparse sequence. They are independent random variables having a high probability $1 - \alpha_N$ to be equal to 0 and
a low probability $\alpha_N$ to be equal to a value $\mu_N$ that is on the order of
$\sigma\sqrt{2\log_e N}$, but smaller. By adjusting $\mu_N$ and $\alpha_N$, Donoho and Johnstone [167] prove that the Bayes estimator $\tilde F$ of F tends to zero as N
increases and they derive a lower bound of the left-hand side of (10.54)
that tends to 1.

The upper bound (10.42) proves that the risk $r_t(f)$ of a thresholding
estimator is at most $2\log_e N$ times larger than the risk $r_p(f)$ of an oracle
projector. Moreover, (10.43) proves that the $2\log_e N$ factor cannot
be improved by any other diagonal estimator. For $r_p(f)$ to be small,
(10.36) shows that f must be well approximated by a few vectors in B.
One can verify [167] that the theorem remains valid if $r_p(f)$ is replaced
by the risk $r_{\inf}(f)$ of an oracle attenuation, which is smaller.

Choice of Threshold The threshold T must be chosen just above the maximum level of the noise. Indeed, if f = 0 and thus $X_B = W_B$,
then to ensure that $\tilde F \approx 0$ the noise coefficients $|W_B[m]|$ must have a
high probability of being below T. However, if $f \ne 0$ then T must not
be too large, so that we do not set to zero too many coefficients such
that $|f_B[m]| \ge \sigma$. Since $W_B$ is a vector of N independent Gaussian
random variables of variance $\sigma^2$, one can prove [9] that the maximum
amplitude of the noise has a very high probability of being just below
$T = \sigma\sqrt{2\log_e N}$:
$$\lim_{N \to +\infty} \Pr\!\left( T - \frac{\sigma\,\log_e\log_e N}{\log_e N} \le \max_{0 \le m < N} |W_B[m]| \le T \right) = 1. \tag{10.55}$$
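A small simulation gives a feeling for (10.55). The Python sketch below estimates how often the largest noise amplitude stays below $T = \sigma\sqrt{2\log_e N}$. The sample sizes, the seed and the function name are assumptions made for this illustration. Since (10.55) is an asymptotic statement, the observed fraction for moderate N is noticeably below 1 and increases only slowly with N.

```python
import math
import random

def fraction_below_threshold(N, sigma, trials=500, seed=0):
    """Fraction of noise realizations with max_m |W_B[m]| <= sigma*sqrt(2 ln N)."""
    rng = random.Random(seed)
    T = sigma * math.sqrt(2.0 * math.log(N))
    below = 0
    for _ in range(trials):
        # Largest amplitude among N independent Gaussian noise coefficients.
        peak = max(abs(rng.gauss(0.0, sigma)) for _ in range(N))
        if peak <= T:
            below += 1
    return below / trials

for N in (64, 256, 1024):
    print(N, fraction_below_threshold(N, sigma=1.0))
```

The threshold therefore removes all noise coefficients in most realizations, while a minority of realizations leave a few isolated noise coefficients above T.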
This explains why the theorem chooses this value. That the threshold
T increases with N may seem counterintuitive. This is due to the tail
of the Gaussian distribution, which creates larger and larger amplitude
noise coefficients when the sample size increases. The threshold $T = \sigma\sqrt{2\log_e N}$ is not optimal and in general a lower threshold reduces the
risk. One can however prove that when N tends to $+\infty$, the optimal
value of T grows like $\sigma\sqrt{2\log_e N}$.

Upper-Bound Interpretation Despite the technicality of the proof, the factor $2\log_e N$ of the upper bound (10.42) can be easily explained.
The ideal coefficient selection (10.33) sets $X_B[m]$ to zero if and only
if $|f_B[m]| < \sigma$, whereas a hard thresholding sets $X_B[m]$ to zero when
$|X_B[m]| \le T$. If $|f_B[m]| < \sigma$ then it is very likely that $|X_B[m]| \le T$,
because T is above the noise level. In this case the hard thresholding
sets $X_B[m]$ to zero as the oracle projector (10.33) does. If $|f_B[m]| \ge 2T$
then it is likely that $|X_B[m]| \ge T$ because $|W_B[m]| \le T$. In this case
the hard thresholding and the oracle projector retain $X_B[m]$.
The hard thresholding may behave differently from the ideal coefficient selection when $|f_B[m]|$ is on the order of T. The ideal selection yields a risk: $\min(\sigma^2, |f_B[m]|^2) = \sigma^2$. If we are unlucky and
$|X_B[m]| \le T$, then the thresholding sets $X_B[m]$ to zero, which produces
a risk
$$|f_B[m]|^2 \sim T^2 = 2\log_e N\,\sigma^2.$$
In this worst case, the thresholding risk is $2\log_e N$ times larger than
the ideal selection risk. Since the proportion of coefficients $|f_B[m]|$ on
the order of T is often small, the ratio between the hard thresholding
risk and the oracle projection risk is generally significantly smaller than
$2\log_e N$.

Colored Noise Thresholding estimators can be adapted when the
noise W is not white. We suppose that $E\{W[n]\} = 0$. Since W is not
white, $\sigma_m^2 = E\{|W_B[m]|^2\}$ depends on each vector $g_m$ of the basis. As
in (10.33) and (10.34), we verify that an oracle projector which keeps
all coefficients such that $|f_B[m]| \ge \sigma_m$ and sets to zero all others has a
risk
$$r_p(f) = \sum_{m=0}^{N-1} \min(|f_B[m]|^2, \sigma_m^2).$$
Any linear or non-linear projector in the basis B has a risk larger than
$r_p(f)$.
Since the noise variance depends on m, a thresholding estimator
must vary the threshold $T_m$ as a function of m. Such a hard or soft
thresholding estimator can be written
$$\tilde F = DX = \sum_{m=0}^{N-1} \rho_{T_m}(X_B[m])\,g_m. \tag{10.56}$$
The following proposition generalizes Theorem 10.4 to compute the
thresholding risk $r_t(f) = E\{\|f - \tilde F\|^2\}$.

Proposition 10.4 (Donoho, Johnstone) Let $\tilde F$ be a hard or soft
thresholding estimator with
$$T_m = \sigma_m \sqrt{2\log_e N} \quad\text{for } 0 \le m < N.$$
Let $\bar\sigma^2 = N^{-1} \sum_{m=0}^{N-1} \sigma_m^2$. For any $N \ge 4$
$$r_t(f) \le (2\log_e N + 1)\,\bigl(\bar\sigma^2 + r_p(f)\bigr). \tag{10.57}$$
The proof of (10.57) is identical to the proof of (10.42). The thresholds $T_m$ are chosen to be just above the amplitude of each noisy coefficient $W_B[m]$. Section 10.4.2 studies an application to the restoration
of blurred signals.

10.2.3 Thresholding Refinements 3

We mentioned that the thresholding risk can be reduced by choosing a
threshold smaller than $\sigma\sqrt{2\log_e N}$. A threshold adapted to the data is
calculated by minimizing an estimation of the risk. This section finishes
with an important improvement of thresholding estimators, obtained
with a translation invariant algorithm.

SURE Thresholds To study the impact of the threshold on the risk, we denote by $r_t(f,T)$ the risk of a soft thresholding estimator calculated
with a threshold T. An estimate $\tilde r_t(f,T)$ of $r_t(f,T)$ is calculated from
the noisy data X, and T is optimized by minimizing $\tilde r_t(f,T)$.
To estimate the risk $r_t(f,T)$, observe that if $|X_B[m]| < T$ then the
soft thresholding sets this coefficient to zero, which produces a risk
equal to $|f_B[m]|^2$. Since
$$E\{|X_B[m]|^2\} = |f_B[m]|^2 + \sigma^2,$$
one can estimate $|f_B[m]|^2$ with $|X_B[m]|^2 - \sigma^2$. If $|X_B[m]| \ge T$, the soft
thresholding subtracts T from the amplitude of $X_B[m]$. The expected
risk is the sum of the noise energy plus the bias introduced by the
reduction of the amplitude of $X_B[m]$ by T. It is estimated by $\sigma^2 + T^2$.
The resulting estimator of $r_t(f,T)$ is
$$\tilde r_t(f,T) = \sum_{m=0}^{N-1} \Phi(|X_B[m]|^2) \tag{10.58}$$
with
$$\Phi(u) = \begin{cases} u - \sigma^2 & \text{if } u \le T^2 \\ \sigma^2 + T^2 & \text{if } u > T^2. \end{cases} \tag{10.59}$$
The following theorem [169] proves that $\tilde r_t(f,T)$ is a Stein Unbiased
Risk Estimator (SURE) [319].
Theorem 10.5 (Donoho, Johnstone) For a soft thresholding, the
risk estimator $\tilde r_t(f,T)$ is unbiased:
$$E\{\tilde r_t(f,T)\} = r_t(f,T). \tag{10.60}$$
Proof $^3$. As in (10.45), we prove that the risk of a soft thresholding can
be written
$$r_t(f,T) = E\{\|f - \tilde F\|^2\} = \sum_{m=0}^{N-1} r(T, f_B[m])$$
with
$$r(\lambda,\mu) = E\{|\rho_\lambda(X) - \mu|^2\} = E\{|(X - \lambda\,\mathrm{sign}(X))\,\mathbf{1}_{|X| > \lambda} - \mu|^2\}, \tag{10.61}$$
where X is a Gaussian random variable with mean $\mu$ and variance $\sigma^2$.
The equality (10.60) is proved by verifying that
$$r(T,\mu) = E\{\Phi(|X|^2)\} = (\sigma^2 + T^2)\,E\{\mathbf{1}_{|X| \ge T}\} + E\{(|X|^2 - \sigma^2)\,\mathbf{1}_{|X| \le T}\}. \tag{10.62}$$
Following the calculations of Stein [319], we rewrite
$$r(T,\mu) = E\{(X - g(X) - \mu)^2\}, \tag{10.63}$$
where $g(x) = T\,\mathrm{sign}(x) + (x - T\,\mathrm{sign}(x))\,\mathbf{1}_{|x| < T}$ is a weakly differentiable function.
Developing (10.63) gives
$$r(T,\mu) = E\{(X - \mu)^2\} + E\{|g(X)|^2\} - 2\,E\{(X - \mu)\,g(X)\}. \tag{10.64}$$
The probability density of X is the Gaussian $\phi_\sigma(y - \mu)$. The change of
variable $x = y - \mu$ shows that
$$E\{(X - \mu)\,g(X)\} = \int_{-\infty}^{+\infty} x\,g(x + \mu)\,\phi_\sigma(x)\,dx.$$
Since $x\,\phi_\sigma(x) = -\sigma^2\,\phi_\sigma'(x)$, an integration by parts gives
$$E\{(X - \mu)\,g(X)\} = -\sigma^2 \int_{-\infty}^{+\infty} g(x + \mu)\,\phi_\sigma'(x)\,dx = \sigma^2 \int_{-\infty}^{+\infty} g'(x + \mu)\,\phi_\sigma(x)\,dx.$$
Since $g'(x) = \mathbf{1}_{|x| \le T}$,
$$E\{(X - \mu)\,g(X)\} = \sigma^2\,E\{\mathbf{1}_{|X| \le T}\}.$$
Inserting this expression in (10.64) yields
$$r(T,\mu) = \sigma^2 + E\{|g(X)|^2\} - 2\sigma^2\,E\{\mathbf{1}_{|X| \le T}\}.$$
But $|g(x)|^2 = |x|^2\,\mathbf{1}_{|x| < T} + T^2\,\mathbf{1}_{|x| \ge T}$ and $E\{\mathbf{1}_{|X| \ge T}\} + E\{\mathbf{1}_{|X| < T}\} = 1$, so
$$r(T,\mu) = (\sigma^2 + T^2)\,E\{\mathbf{1}_{|X| \ge T}\} + E\{(|X|^2 - \sigma^2)\,\mathbf{1}_{|X| \le T}\},$$
which proves (10.62) and hence (10.60).
To find the $\tilde T$ that minimizes the SURE estimator $\tilde r_t(f,T)$, the N
data coefficients $X_B[m]$ are sorted in decreasing amplitude order with
$O(N\log_2 N)$ operations. Let $X_B^r[k] = X_B[m_k]$ be the coefficient of rank
k: $|X_B^r[k]| \ge |X_B^r[k+1]|$ for $1 \le k < N$. Let l be the index such that
$|X_B^r[l+1]| \le T < |X_B^r[l]|$. We can rewrite (10.58):
$$\tilde r_t(f,T) = \sum_{k=l+1}^{N} |X_B^r[k]|^2 - (N - l)\,\sigma^2 + l\,(\sigma^2 + T^2). \tag{10.65}$$
To minimize $\tilde r_t(f,T)$ we must choose $T = |X_B^r[l+1]|$ because, for a fixed l, $\tilde r_t(f,T)$ is
increasing in T. To find the $\tilde T$ that minimizes $\tilde r_t(f,T)$ it is therefore sufficient to compare the N possible values $\{|X_B^r[k]|\}_{1 \le k \le N}$, which requires
O(N) operations if we progressively recompute the formula (10.65).
The calculation of $\tilde T$ is thus performed with $O(N\log_2 N)$ operations.
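The search for the SURE threshold can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the book's code: the function names and test data are hypothetical, and the amplitudes are sorted in increasing rather than decreasing order, which only changes the direction of the scan.

```python
def sure_risk(x_b, sigma, T):
    """SURE estimate (10.58)-(10.59) of the soft thresholding risk."""
    total = 0.0
    for x in x_b:
        u = abs(x) ** 2
        total += (u - sigma ** 2) if u <= T ** 2 else (sigma ** 2 + T ** 2)
    return total

def sure_threshold(x_b, sigma):
    """Minimize the SURE estimate over the candidates T = |X_B[m]|.

    Sorting costs O(N log N); the scan is O(N) thanks to a running sum.
    """
    amps = sorted(abs(x) for x in x_b)      # increasing amplitudes
    N = len(amps)
    best_T, best_risk = None, float("inf")
    energy_below = 0.0                      # sum of |X|^2 over amplitudes <= T
    for k, amp in enumerate(amps):
        energy_below += amp ** 2
        n_below = k + 1                     # exact when amp is the last of its ties
        n_above = N - n_below
        risk = (energy_below - n_below * sigma ** 2
                + n_above * (sigma ** 2 + amp ** 2))
        # For tied amplitudes the last tied index gives the exact, smaller value.
        if risk < best_risk:
            best_T, best_risk = amp, risk
    return best_T, best_risk

x_b = [5.0, -0.3, 0.8, -4.0, 0.1, 1.2]     # hypothetical noisy coefficients
T_sure, r_sure = sure_threshold(x_b, sigma=1.0)
print(T_sure, r_sure)
```

The running sum `energy_below` plays the role of the progressive update of (10.65), so each candidate threshold is evaluated in constant time after the initial sort.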
Although the estimator $\tilde r_t(f,T)$ of $r_t(f,T)$ is unbiased, its variance
may induce errors leading to a threshold $\tilde T$ that is too small. This happens if the signal energy is small relative to the noise energy: $\|f\|^2 \ll E\{\|W\|^2\} = N\sigma^2$. In this case, one must impose $T = \sigma\sqrt{2\log_e N}$ in
order to remove all the noise. Since $E\{\|X\|^2\} = \|f\|^2 + N\sigma^2$, we estimate $\|f\|^2$ with $\|X\|^2 - N\sigma^2$ and compare this value with a minimum
energy level $\epsilon_N = \sigma^2 N^{1/2} (\log_e N)^{3/2}$. The resulting SURE threshold is
$$T = \begin{cases} \sigma\sqrt{2\log_e N} & \text{if } \|X\|^2 - N\sigma^2 \le \epsilon_N \\ \tilde T & \text{if } \|X\|^2 - N\sigma^2 > \epsilon_N. \end{cases} \tag{10.66}$$
Let $\Theta$ be a signal set and $\min_T r_t(\Theta)$ be the minimax risk of a
soft thresholding obtained by optimizing the choice of T depending on
$\Theta$. Donoho and Johnstone [169] prove that the threshold computed
empirically with (10.66) yields a risk $r_t(\Theta)$ equal to $\min_T r_t(\Theta)$ plus
a corrective term that decreases rapidly when N increases, if $\epsilon_N = \sigma^2 N^{1/2} (\log_e N)^{3/2}$.
Problem 10.9 studies a similar risk estimator for hard thresholding.
However, this risk estimator is biased. We thus cannot guarantee that
the threshold that minimizes the estimated risk is nearly optimal for
hard thresholding estimations.

Translation Invariant Thresholding An improved thresholding estimator is calculated by averaging estimators for translated versions
of the signal. Let us consider signals of period N. Section 5.4 explains
that the representation of f in a basis B is not translation invariant,
unless B is a Dirac or a Fourier basis. Let $f^p[n] = f[n-p]$. The vectors
of coefficients $f_B$ and $f_B^p$ are not simply translated or permuted. They
may be extremely different. Indeed
$$f_B^p[m] = \langle f[n-p], g_m[n] \rangle = \langle f[n], g_m[n+p] \rangle$$
and not all the vectors $g_m[n+p]$ belong to the basis B, for $0 \le p < N$.
As a consequence, the signal recovered by thresholding the coefficients
$f_B^p[m]$ is not a translation of the signal reconstructed after thresholding
$f_B[m]$.
The translation invariant algorithm of Coifman and Donoho [137]
estimates all translations of f and averages them after a reverse translation. For all $0 \le p < N$, the estimator $\tilde F^p$ of $f^p$ is computed by
thresholding the translated data $X^p[n] = X[n-p]$:
$$\tilde F^p = \sum_{m=0}^{N-1} \rho_T(X_B^p[m])\,g_m,$$
where $\rho_T(x)$ is a hard or soft thresholding function. The translation
invariant estimator is obtained by shifting back and averaging these
estimates:
$$\tilde F[n] = \frac{1}{N} \sum_{p=0}^{N-1} \tilde F^p[n+p]. \tag{10.67}$$
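The averaging over translations in (10.67) can be sketched directly. The Python code below is a brute-force illustration that loops over all N shifts, using an orthonormal Haar basis and a hard thresholding. The Haar transform, the test signal and the function names are assumptions made for this example, and the signal length is assumed to be a power of 2.

```python
import math

def haar_forward(x):
    """Orthonormal Haar coefficients of x (length must be a power of 2)."""
    coeffs, approx = [], list(x)
    r = math.sqrt(2.0)
    while len(approx) > 1:
        half = len(approx) // 2
        detail = [(approx[2 * i] - approx[2 * i + 1]) / r for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / r for i in range(half)]
        coeffs = detail + coeffs            # coarser details go in front
    return approx + coeffs                  # [scaling, coarse ... fine details]

def haar_inverse(c):
    """Inverse of haar_forward."""
    approx, pos, r = [c[0]], 1, math.sqrt(2.0)
    while pos < len(c):
        detail = c[pos:pos + len(approx)]
        pos += len(approx)
        nxt = []
        for a, d in zip(approx, detail):
            nxt.extend([(a + d) / r, (a - d) / r])
        approx = nxt
    return approx

def threshold_estimate(x, T):
    """Standard estimator: hard-threshold every Haar coefficient."""
    c = [v if abs(v) > T else 0.0 for v in haar_forward(x)]
    return haar_inverse(c)

def translation_invariant_estimate(x, T):
    """Shift, threshold, shift back and average, as in (10.67)."""
    N = len(x)
    out = [0.0] * N
    for p in range(N):
        shifted = [x[(n - p) % N] for n in range(N)]    # X^p[n] = X[n - p]
        est = threshold_estimate(shifted, T)            # estimate of f^p
        for n in range(N):
            out[n] += est[(n + p) % N] / N              # average of F^p[n + p]
    return out

x = [0.1, -0.2, 0.05, 4.1, 3.9, 4.2, 3.8, 0.0]          # hypothetical noisy step
ti = translation_invariant_estimate(x, T=0.5)
print(["%.3f" % v for v in ti])
```

By construction, translating the input circularly translates the output by the same amount, which is not true in general for `threshold_estimate` alone.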
In general, this requires N times more calculations than for a standard
thresholding estimator. In wavelet and wavelet packet bases, which
are partially translation invariant, the number of operations is only
multiplied by $\log_2 N$, and the translation invariance reduces the risk
significantly.

10.2.4 Wavelet Thresholding

A wavelet thresholding is equivalent to estimating the signal by averaging it with a kernel that is locally adapted to the signal regularity [4].
This section justifies the numerical results with heuristic arguments.
Section 10.3.3 proves that the wavelet thresholding risk is nearly minimax for signals and images with bounded variation.
A filter bank of conjugate mirror filters decomposes a discrete signal in a discrete orthogonal wavelet basis defined in Section 7.3.3. The
discrete wavelets $\psi_{j,m}[n] = \psi_j[n - N 2^j m]$ are translated modulo modifications near the boundaries, which are explained in Section 7.5. The
support of the signal is normalized to [0, 1] and has N samples spaced
by $N^{-1}$. The scale parameter $2^j$ thus varies from $2^L = N^{-1}$ up to
$2^J < 1$:
$$B = \Bigl[ \{\psi_{j,m}[n]\}_{L < j \le J,\ 0 \le m < 2^{-j}},\ \{\phi_{J,m}[n]\}_{0 \le m < 2^{-J}} \Bigr]. \tag{10.68}$$
A thresholding estimator in this wavelet basis can be written
$$\tilde F = \sum_{j=L+1}^{J} \sum_{m=0}^{2^{-j}} \rho_T\bigl(\langle X, \psi_{j,m} \rangle\bigr)\,\psi_{j,m} + \sum_{m=0}^{2^{-J}} \rho_T\bigl(\langle X, \phi_{J,m} \rangle\bigr)\,\phi_{J,m}, \tag{10.69}$$
where $\rho_T$ is a hard thresholding (10.39) or a soft thresholding (10.40).
The upper bound (10.42) proves that the estimation risk is small if the
energy of f is absorbed by a few wavelet coefficients.

Adaptive Smoothing The thresholding sets to zero all coefficients
jhX j m ij T . This performs an adaptive smoothing that depends on the regularity of the signal f . Since T is above the maximum amplitude
of the noise coe cients jhW j mij, if
jhX j mij = jhf j mi + hW j mij T 608 CHAPTER 10. ESTIMATIONS ARE APPROXIMATIONS then jhf j mij has a high probability of being at least of the order T . At
ne scales 2j , these coe cients are in the neighborhood of sharp signal
transitions, as shown by Figure 10.4(b). By keeping them, we avoid
smoothing these sharp variations. In the regions where jhX j m ij < T ,
the coe cients hf j mi are likely to be small, which means that f
is locally regular. Setting wavelet coe cients to zero is equivalent to
locally averaging the noisy data X , which is done only if the underlying
signal f appears to be regular. Noise Variance Estimation To estimate the variance 2 of the
noise W n] from the data X n] = W n] + f n], we need to suppress
the in uence of f n]. When f is piecewise smooth, a robust estimator
is calculated from the median of the nest scale wavelet coe cients
167].
The signal $X$ of size $N$ has $N/2$ wavelet coefficients $\{\langle X,\psi_{l,m}\rangle\}_{0\le m<N/2}$ at the finest scale $2^l = 2N^{-1}$. The coefficient $|\langle f,\psi_{l,m}\rangle|$ is small if $f$ is smooth over the support of $\psi_{l,m}$, in which case $\langle X,\psi_{l,m}\rangle \approx \langle W,\psi_{l,m}\rangle$. In contrast, $|\langle f,\psi_{l,m}\rangle|$ is large if $f$ has a sharp transition in the support of $\psi_{l,m}$. A piecewise regular signal has few sharp transitions, and hence produces a number of large coefficients that is small compared to $N/2$. At the finest scale, the signal $f$ thus influences the value of a small portion of large amplitude coefficients $\langle X,\psi_{l,m}\rangle$ that are considered to be "outliers." All others are approximately equal to $\langle W,\psi_{l,m}\rangle$, which are independent Gaussian random variables of variance $\sigma^2$.
A robust estimator of $\sigma^2$ is calculated from the median of $\{\langle X,\psi_{l,m}\rangle\}_{0\le m<N/2}$. The median of $P$ coefficients $\mathrm{Med}(\alpha_p)_{0\le p<P}$ is the value of the middle coefficient $\alpha_{n_0}$ of rank $P/2$. As opposed to an average, it does not depend on the specific values of coefficients $\alpha_p > \alpha_{n_0}$. If $M$ is the median of the absolute value of $P$ independent Gaussian random variables of zero-mean and variance $\sigma_0^2$, then one can show that
$$E\{M\} \approx 0.6745\,\sigma_0.$$
The variance $\sigma^2$ of the noise $W$ is estimated from the median $M_X$ of $\{|\langle X,\psi_{l,m}\rangle|\}_{0\le m<N/2}$ by neglecting the influence of $f$:
$$\tilde\sigma = \frac{M_X}{0.6745}. \tag{10.70}$$
Indeed $f$ is responsible for few large amplitude outliers, and these have little impact on $M_X$.

Hard or Soft Thresholding If we choose the threshold $T = \sigma\sqrt{2\log_e N}$ of Theorem 10.4, we saw in (10.41) that a soft thresholding guarantees with a high probability that
$$|\langle \tilde F,\psi_{j,m}\rangle| = |\rho_T(\langle X,\psi_{j,m}\rangle)| \le |\langle f,\psi_{j,m}\rangle|.$$
The estimator $\tilde F$ is at least as regular as $f$ because its wavelet coefficients have a smaller amplitude. This is not true for the hard thresholding estimator, which leaves unchanged the coefficients above $T$, and which can therefore be larger than those of $f$ because of the additive noise component.
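The ingredients introduced so far (the median noise estimate (10.70), the threshold $T = \sigma\sqrt{2\log_e N}$, and the hard and soft thresholding functions) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the finest scale coefficients are assumed to be already computed and passed in as an array, and `np.median` averages the two middle values when the count is even, which differs slightly from the "middle coefficient of rank $P/2$" convention of the text.

```python
import numpy as np

def mad_sigma(finest_coeffs):
    """Noise standard deviation as in (10.70): median of |coefficients| / 0.6745."""
    return np.median(np.abs(finest_coeffs)) / 0.6745

def universal_threshold(sigma, n):
    """Threshold T = sigma * sqrt(2 log_e N)."""
    return sigma * np.sqrt(2.0 * np.log(n))

def hard_threshold(c, T):
    """Keep coefficients with amplitude above T unchanged, set the others to zero."""
    c = np.asarray(c, dtype=float)
    return np.where(np.abs(c) > T, c, 0.0)

def soft_threshold(c, T):
    """Set small coefficients to zero and shrink the amplitude of the others by T."""
    c = np.asarray(c, dtype=float)
    return np.sign(c) * np.maximum(np.abs(c) - T, 0.0)
```

Whenever every noise coefficient satisfies $|\langle W,\psi_{j,m}\rangle| \le T$, the soft thresholded coefficients computed this way have an amplitude no larger than $|\langle f,\psi_{j,m}\rangle|$, which is the property stated above. The hard thresholded coefficients carry no such guarantee.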
Figure 10.4(a) shows a piecewise polynomial signal of degree at most 3, whose wavelet coefficients are calculated with a Symmlet 4. Figure 10.4(c) gives an estimation computed with a hard thresholding of the noisy wavelet coefficients in Figure 10.4(b). An estimator $\tilde\sigma^2$ of the noise variance $\sigma^2$ is calculated with the median (10.70) and the threshold is set to $T = \tilde\sigma\sqrt{2\log_e N}$. Thresholding wavelet coefficients removes the noise in the domain where $f$ is regular but some traces of the noise remain in the neighborhood of singularities. The resulting SNR is 30.8 db. The soft thresholding estimation of Figure 10.4(d) attenuates the noise effect at the discontinuities but the reduction by $T$ of the coefficient amplitude is much too strong, which reduces the SNR to 23.8 db. As already explained, to obtain comparable SNR values, the threshold of the soft thresholding must be about half the size of the hard thresholding one. In this example, reducing by two the threshold
increases the SNR of the soft thresholding to 28.6 db.

Figure 10.4: (a): Piecewise polynomial signal and its wavelet transform on the right. (b): Noisy signal (SNR = 21.9 db) and its wavelet transform. (c): Estimation reconstructed from the wavelet coefficients above threshold, shown on the right (SNR = 30.8 db). (d): Estimation with a wavelet soft thresholding (SNR = 23.8 db). (e): Estimation with a translation invariant hard thresholding (SNR = 33.7 db).

Multiscale SURE Thresholds Piecewise regular signals have a proportion of large coefficients $|\langle f,\psi_{j,m}\rangle|$ that increases when the scale $2^j$ increases. Indeed, a singularity creates the same number of large coefficients at each scale, whereas the total number of wavelet coefficients increases when the scale decreases. To use this prior information, one can adapt the threshold choice to the scale $2^j$. At large scale $2^j$ the threshold $T_j$ should be smaller in order to avoid setting to zero too many large amplitude signal coefficients, which would increase the risk. Section 10.2.3 explains how to compute the threshold value for a soft thresholding, from the coefficients of the noisy data. We first compute an estimate $\tilde\sigma^2$ of the noise variance $\sigma^2$ with the median formula (10.70) at the finest scale. At each scale $2^j$, a different threshold is calculated from the $2^{-j}$ noisy coefficients $\{\langle X,\psi_{j,m}\rangle\}_{0\le m<2^{-j}}$ with the algorithm of Section 10.2.3. A SURE threshold $T_j$ is calculated by minimizing an estimation (10.65) of the risk at the scale $2^j$. The soft thresholding is then performed at each scale $2^j$ with the threshold $T_j$. For a hard thresholding, we have no reliable formula with which to estimate the risk and hence compute the adapted threshold with a minimization. One possibility is simply to multiply by 2 the SURE threshold calculated for a soft thresholding.
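The per-scale threshold selection can be sketched as follows. This is a hedged illustration, not the algorithm of Section 10.2.3 itself: it assumes the soft thresholding risk estimate has the additive form $\sum_m \Phi(|c_m|^2)$ with $\Phi(u) = u - \sigma^2$ if $u \le T^2$ and $\Phi(u) = \sigma^2 + T^2$ otherwise, and it searches the minimum only among $T = 0$ and the coefficient amplitudes, since between two consecutive amplitudes the estimate increases with $T$. The function name `sure_soft_threshold` is hypothetical.

```python
import numpy as np

def sure_soft_threshold(coeffs, sigma):
    """Threshold minimizing the assumed SURE risk estimate for soft thresholding."""
    a = np.sort(np.abs(np.asarray(coeffs, dtype=float)))  # ascending amplitudes
    n = a.size
    u = a ** 2
    k = np.arange(1, n + 1)  # coefficients with amplitude <= candidate T = a[k-1]
    # Estimated risk at each candidate: the k smallest coefficients are set to zero
    # (cost u - sigma^2 each), the n - k others are shrunk (cost sigma^2 + T^2 each).
    # With tied amplitudes the last tied index carries the exact value, and the
    # earlier ones are larger, so the minimum is unaffected.
    risk = np.cumsum(u) - k * sigma**2 + (n - k) * (sigma**2 + u)
    best = int(np.argmin(risk))
    # T = 0 keeps every coefficient unchanged, with estimated risk N * sigma^2.
    if n * sigma**2 <= risk[best]:
        return 0.0
    return float(a[best])
```

A multiscale use would call this function once per scale, on the array of noisy wavelet coefficients of that scale, with $\sigma$ replaced by the estimate $\tilde\sigma$ of (10.70).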
Figure 10.5(c) is a hard thresholding estimation calculated with the same threshold $T = \tilde\sigma\sqrt{2\log_e N}$ at all scales $2^j$. The SNR is 23.3 db. Figure 10.5(d) is obtained by a soft thresholding with SURE thresholds $T_j$ adapted at each scale $2^j$. The SNR is 24.1 db. A soft thresholding with the threshold $T = (\tilde\sigma/2)\sqrt{2\log_e N}$ at all scales gives a smaller SNR equal to 21.7 db. The adaptive calculation of thresholds clearly improves the estimation.

Translation Invariance Thresholding noisy wavelet coefficients creates small ripples near discontinuities, as seen in Figures 10.4(c,d) and 10.5(c,d). Indeed, setting to zero a coefficient $\langle f,\psi_{j,m}\rangle$ subtracts $\langle f,\psi_{j,m}\rangle\,\psi_{j,m}$ from $f$, which introduces oscillations whenever $\langle f,\psi_{j,m}\rangle$ is non-negligible. Figure 10.4(e) and Figures 10.5(e,f) show that these oscillations are attenuated by a translation invariant estimation (10.67), significantly improving the SNR. Thresholding wavelet coefficients of translated signals and translating back the reconstructed signals yields shifted oscillations created by shifted wavelets that are set to zero. The averaging partially cancels these oscillations, reducing their amplitude.
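The averaging over shifts in (10.67) can be sketched directly. This is a minimal illustration under several assumptions: an orthonormal Haar basis stands in for the Symmlet 4 used in the figures, only the detail coefficients are hard thresholded, the signal length is a power of two, and the plain loop over all $N$ shifts is used rather than a faster dyadic wavelet transform. All function names are hypothetical.

```python
import numpy as np

def haar_forward(x):
    """Orthonormal periodic Haar transform; returns (coarse average, details finest first)."""
    x = np.asarray(x, dtype=float).copy()
    details = []
    while x.size > 1:
        a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
        d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
        details.append(d)
        x = a
    return x, details

def haar_inverse(a, details):
    """Inverse of haar_forward."""
    x = a
    for d in reversed(details):
        up = np.empty(2 * x.size)
        up[0::2] = (x + d) / np.sqrt(2.0)
        up[1::2] = (x - d) / np.sqrt(2.0)
        x = up
    return x

def threshold_estimate(x, T):
    """Hard thresholding of the Haar detail coefficients of x."""
    a, details = haar_forward(x)
    kept = [np.where(np.abs(d) > T, d, 0.0) for d in details]
    return haar_inverse(a, kept)

def ti_denoise(x, T):
    """Translation invariant estimator (10.67): shift, threshold, shift back, average."""
    x = np.asarray(x, dtype=float)
    n = x.size
    acc = np.zeros(n)
    for p in range(n):
        est_p = threshold_estimate(np.roll(x, p), T)  # estimate of X^p[n] = X[n - p]
        acc += np.roll(est_p, -p)                     # F^p[n + p]
    return acc / n
```

By construction, translating the input circularly translates the output by the same amount, which is not the case for `threshold_estimate` alone.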
When computing the translation invariant estimation, instead of shifting the signal, one can shift the wavelets in the opposite direction:
$$\langle f[n-p],\,\psi_{j,m}[n]\rangle = \langle f[n],\,\psi_{j,m}[n+p]\rangle = \langle f[n],\,\psi_j[n - N2^j m + p]\rangle.$$
If $f$ and all wavelets $\psi_j$ are $N$ periodic then all these inner products are provided by the dyadic wavelet transform defined in Section 5.5:
$$Wf[2^j, p] = \langle f[n],\,\psi_j[n-p]\rangle \quad \text{for } 0 \le p < N.$$
The "algorithme à trous" of Section 5.5.2 computes these $N\log_2 N$ coefficients for $L < j \le 0$ with $O(N\log_2 N)$ operations. One can verify (Problem 10.10) that the translation invariant wavelet estimator (10.67) can be calculated by thresholding the dyadic wavelet coefficients $\langle X[n],\,\psi_j[n-p]\rangle$ and by reconstructing a signal with the inverse dyadic wavelet transform.

Figure 10.5: (a): Original signal. (b): Noisy signal (SNR = 13.1 db). (c): Estimation by a hard thresholding in a wavelet basis (Symmlet 4), with $T = \tilde\sigma\sqrt{2\log_e N}$ (SNR = 23.3 db). (d): Soft thresholding calculated with SURE thresholds $T_j$ adapted to each scale $2^j$ (SNR = 24.5 db). (e): Translation invariant hard thresholding with $T = \tilde\sigma\sqrt{2\log_e N}$ (SNR = 25.7 db). (f): Translation invariant soft thresholding with SURE thresholds (SNR = 25.6 db).

Image Estimation in Wavelet Bases Piecewise regular images are particularly well estimated by thresholding their wavelet coefficients. The image $f[n_1,n_2]$ contaminated by a white noise is decomposed in a separable two-dimensional wavelet basis. Figure 10.6(c) is computed with a hard thresholding in a Symmlet 4 wavelet basis. For images of $N^2 = 512^2$ pixels, the threshold is set to $T = 3\sigma$ instead of $T = \sigma\sqrt{2\log_e N^2}$, because this improves the SNR significantly. This estimation restores smooth image components and discontinuities, but the visual quality of edges is affected by the Gibbs-like oscillations that also appear in the one-dimensional estimations in Figure 10.4(c) and Figure 10.5(c). Figure 10.6(d) is obtained with a wavelet soft thresholding calculated with a threshold half as large, $T = (3/2)\,\sigma$. When using a different SURE threshold $T_j$ calculated with (10.66) at each scale $2^j$, the SNR increases to 33.1 db but the visual image quality is not improved. As in one dimension, the Figures 10.6(e,f) calculated with translation invariant thresholdings have a higher SNR and better visual quality. A translation invariant soft thresholding, with SURE thresholds, gives an SNR of 34.2 db.
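A hard thresholding with $T = 3\sigma$ in a separable two-dimensional wavelet basis can be sketched as follows. The sketch makes simplifying assumptions that are not in the text: a Haar filter replaces the Symmlet 4, the image sides are divisible by $2^{\text{levels}}$, the coarse approximation is left untouched, and the noise level `sigma` is supposed known. The function names are hypothetical.

```python
import numpy as np

def haar2_step(a):
    """One level of a separable orthonormal 2-D Haar transform (columns, then rows)."""
    lo = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2.0)
    hi = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2.0)
    ll = (lo[0::2] + lo[1::2]) / np.sqrt(2.0)
    lh = (lo[0::2] - lo[1::2]) / np.sqrt(2.0)
    hl = (hi[0::2] + hi[1::2]) / np.sqrt(2.0)
    hh = (hi[0::2] - hi[1::2]) / np.sqrt(2.0)
    return ll, (lh, hl, hh)

def ihaar2_step(ll, details):
    """Inverse of haar2_step."""
    lh, hl, hh = details
    rows, cols = ll.shape
    lo = np.empty((2 * rows, cols))
    hi = np.empty((2 * rows, cols))
    lo[0::2], lo[1::2] = (ll + lh) / np.sqrt(2.0), (ll - lh) / np.sqrt(2.0)
    hi[0::2], hi[1::2] = (hl + hh) / np.sqrt(2.0), (hl - hh) / np.sqrt(2.0)
    a = np.empty((2 * rows, 2 * cols))
    a[:, 0::2], a[:, 1::2] = (lo + hi) / np.sqrt(2.0), (lo - hi) / np.sqrt(2.0)
    return a

def denoise_image(x, sigma, levels):
    """Hard thresholding of the detail coefficients with T = 3 * sigma."""
    ll = np.asarray(x, dtype=float)
    T = 3.0 * sigma
    stack = []
    for _ in range(levels):
        ll, details = haar2_step(ll)
        stack.append(tuple(np.where(np.abs(d) > T, d, 0.0) for d in details))
    for details in reversed(stack):
        ll = ihaar2_step(ll, details)
    return ll
```

With the blocky Haar filter, the reconstruction artifacts near edges look different from the Gibbs-like oscillations described above for Symmlet 4, so this sketch illustrates the procedure rather than the figures.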
Section 10.3.3 proves that a thresholding in a wavelet basis has a nearly minimax risk for bounded variation images. Irregular textures are badly estimated because they produce many coefficients whose amplitudes are at the same level as the noise. To restore these textures, it is necessary to adapt the basis in order to better concentrate the texture energy over a few large amplitude coefficients.

Figure 10.6: (a): Original image. (b): Noisy image (SNR = 28.6 db). (c): Estimation with a hard thresholding in a separable wavelet basis (Symmlet 4), (SNR = 31.8 db). (d): Soft thresholding (SNR = 31.1 db). (e): Translation invariant hard thresholding (SNR = 34.3 db). (f): Translation invariant soft thresholding (SNR = 31.7 db).

Figure 10.7: The first row shows the wavelet modulus maxima of the noisy image 10.6(b). The scale increases from left to right, from $2^{-7}$ to $2^{-5}$. The chains of modulus maxima selected by the thresholding procedure are shown below. The bottom image is reconstructed from the selected modulus maxima at all scales.

Multiscale Edge Estimation Section 9.2.4 explains that wavelet bases are not optimal for approximating images because they do not take advantage of the geometrical regularity of edges. Understanding how to use the geometrical image regularity to enhance wavelet estimations is a difficult open issue. One approach implemented by Hwang and Mallat [258] is to regularize the multiscale edge representation of Section 6.3. In many images, discontinuities belong to regular geometrical curves that are the edges of important structures. Along an edge, the wavelet coefficients change slowly and their estimation can thus be improved with an averaging.
The image is decomposed with a two-dimensional dyadic wavelet transform, whose modulus maxima locate the multiscale edges. At each scale $2^j$, the chaining algorithm of Section 6.3.1 links the wavelet maxima to build edge curves. Instead of thresholding each wavelet maximum independently, the thresholding is performed over contours. An edge curve is removed if the average wavelet maxima amplitude is below $T = 3\sigma$. Prior geometrical information can also be used to refine the edge selection. Important image structures may generate long contours, which suggests removing short edge curves that are likely to be created by noise. The first line of Figure 10.7 shows the modulus maxima of the noisy image. The edges selected by the thresholding are shown below. At the finest scale shown on the left, the noise is masking the image structures. Edges are therefore selected by using the position of contours at the previous scale.
The thresholded wavelet maxima are regularized along the edges with an averaging. A restored image is recovered from the resulting wavelet maxima, using the reconstruction algorithm of Section 6.2.2. Figure 10.7 shows an example of an image restored from regularized multiscale edges. Edges are visually well recovered but textures and fine structures are removed by the thresholding based on the amplitude and length of the maxima chains. This produces a cartoon-like image.

10.2.5 Best Basis Thresholding $^3$

When the additive noise $W$ is white, the performance of a thresholding estimation depends on its ability to efficiently approximate the signal $f$ with few basis vectors. Section 9.3 explains that a single basis is often not able to approximate well all signals of a large class. It is then necessary to adapt the basis to the signal [242]. We study applications of adaptive signal decompositions to thresholding estimation.

Best Orthogonal Basis Sections 8.1 and 8.5 construct dictionaries $\mathcal{D} = \cup_{\lambda\in\Lambda}\mathcal{B}^\lambda$ where each $\mathcal{B}^\lambda = \{g^\lambda_m\}_{0\le m<N}$ is a wavelet packet or a local cosine orthogonal basis. These dictionaries have $P = N\log_2 N$ distinct vectors but include more than $2^{N/2}$ different orthogonal bases by recombining these vectors.
An estimation of $f$ from the noisy measurements $X = f + W$ is obtained by thresholding the decomposition of $X$ in $\mathcal{B}^\lambda$:
$$\tilde F^\lambda = \sum_{m=0}^{N-1} \rho_T\big(\langle X, g^\lambda_m\rangle\big)\, g^\lambda_m.$$
The ideal basis $\mathcal{B}^\alpha$ is the one that minimizes the average estimation error
$$E\{\|f - \tilde F^\alpha\|^2\} = \min_{\lambda\in\Lambda} E\{\|f - \tilde F^\lambda\|^2\}. \tag{10.71}$$
In practice, we cannot find this ideal basis since we do not know $f$. Instead, we estimate the risk $E\{\|f - \tilde F^\lambda\|^2\}$ in each basis $\mathcal{B}^\lambda$, and choose the best empirical basis that minimizes the estimated risk.

Threshold Value If we wish to choose a basis adaptively, we must use a higher threshold $T$ than the threshold value $\sigma\sqrt{2\log_e N}$ used when the basis is set in advance. Indeed, an adaptive basis choice may also find vectors that better correlate the noise components. Let us consider the particular case $f = 0$. To ensure that the estimated signal is close to zero, since $X = W$, we must choose a threshold $T$ that has a high probability of being above all the inner products $|\langle W, g^\lambda_m\rangle|$ with all vectors in the dictionary $\mathcal{D}$. For a dictionary including $P$ distinct vectors, for $P$ large there is a negligible probability for the noise coefficients to be above
$$T = \sigma\sqrt{2\log_e P}. \tag{10.72}$$
This threshold is however not optimal and smaller values can improve the risk.

Basis Choice For a soft thresholding, (10.58) defines an estimator $\tilde r^\lambda_t(f,T)$ of the risk $r^\lambda_t(f,T) = E\{\|f - \tilde F^\lambda\|^2\}$:
$$\tilde r^\lambda_t(f,T) = \sum_{m=0}^{N-1} \Phi\big(|\langle X, g^\lambda_m\rangle|^2\big), \tag{10.73}$$
with
$$\Phi(u) = \begin{cases} u - \sigma^2 & \text{if } u \le T^2,\\ \sigma^2 + T^2 & \text{if } u > T^2. \end{cases} \tag{10.74}$$
Theorem 10.5 proves that this estimator is unbiased.
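The additive cost (10.73)-(10.74) and the selection of the basis with the smallest estimated risk can be sketched as follows. This is an illustration only: the candidate bases are passed as a plain dictionary mapping a label to the noisy coefficients of $X$ in that basis, which is a hypothetical interface and not the fast tree search of Section 9.3.2, and the function names are assumptions.

```python
import numpy as np

def estimated_soft_risk(coeffs, sigma, T):
    """Risk estimate (10.73): sum of Phi(|c|^2) with Phi as in (10.74)."""
    u = np.abs(np.asarray(coeffs)) ** 2
    phi = np.where(u <= T**2, u - sigma**2, sigma**2 + T**2)
    return float(np.sum(phi))

def empirical_best_basis(coeffs_by_basis, sigma, T):
    """Label of the basis whose noisy coefficients give the smallest estimated risk."""
    return min(coeffs_by_basis,
               key=lambda name: estimated_soft_risk(coeffs_by_basis[name], sigma, T))

# Usage: the constant data X = [2, 2, 2, 2] has coefficients [2, 2, 2, 2] in the
# Dirac basis and [4, 0, 0, 0] in an orthonormal Haar basis.
candidates = {"dirac": [2.0, 2.0, 2.0, 2.0], "haar": [4.0, 0.0, 0.0, 0.0]}
best = empirical_best_basis(candidates, sigma=1.0, T=3.0)
```

In this small example the estimated risk is 12 in the Dirac basis and 7 in the Haar basis, so the Haar basis, which concentrates the energy on one coefficient, is selected.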
The empirical best basis $\mathcal{B}^{\tilde\alpha}$ for estimating $f$ is obtained by minimizing the estimated risk
$$\tilde r^{\tilde\alpha}_t(f,T) = \min_{\lambda\in\Lambda} \tilde r^\lambda_t(f,T). \tag{10.75}$$
The estimated risk is calculated in (10.73) as an additive cost function over the noisy coefficients. The fast algorithm of Section 9.3.2 can thus find the best basis $\mathcal{B}^{\tilde\alpha}$ in wavelet packet or local cosine dictionaries, with $O(N\log_2 N)$ operations. Figure 10.8(d) shows the estimation of a sound recording "grea" in the presence of a white noise with an SNR of 8.7 db. A best empirical local cosine basis is chosen by the minimization (10.75) and is used to decompose the noisy signal. This best basis is composed of local cosine vectors having a time and a frequency resolution adapted to the transients and harmonic structures of the signal. A hard thresholding is performed and the Heisenberg boxes of the remaining coefficients are shown in Figure 10.8(c).

Donoho and Johnstone [166] prove that for $T = \sigma\sqrt{2\log_e P}$ the risk $E\{\|f - \tilde F^{\tilde\alpha}\|^2\}$ in the empirical best basis $\mathcal{B}^{\tilde\alpha}$ is within a $\log_e N$ factor of the minimum risk $E\{\|f - \tilde F^\alpha\|^2\}$ in the ideal best basis $\mathcal{B}^\alpha$. In that sense, the best basis algorithm is guaranteed to find a nearly optimal basis.

Figure 10.8: (a): Speech recording of "grea." (b): Noisy signal (SNR = 8.7 db). (c): Heisenberg boxes of the local coefficients above the threshold in the best basis. (d): Estimated signal recovered from the thresholded local cosine coefficients (SNR = 10.9 db).

Cost of Adaptivity An approximation in a basis that is adaptively selected is necessarily more precise than an approximation in a basis chosen a priori. However, in the presence of noise, estimations by thresholding may not be improved by an adaptive basis choice. Indeed, using a dictionary of several orthonormal bases requires raising the threshold, because the larger number of dictionary vectors produces a higher correlation peak with the noise. The higher threshold removes more signal components, unless it is compensated by the adaptivity, which can better concentrate the signal energy over few coefficients. The same issue appears in parametrized estimations, where increasing the number of parameters may fit the noise and thus degrade the estimation.
For example, if the original signal is piecewise smooth, then a best wavelet packet basis does not concentrate the signal energy much more efficiently than a wavelet basis. In the presence of noise, in regions where the noise dominates the signal, the best basis algorithm may optimize the basis to fit the noise. This is why the threshold value must be increased. Hence, the resulting best basis estimation is not as precise as a thresholding in a fixed wavelet basis with a lower threshold. However, for oscillatory signals such as the speech recording in Figure 10.8(a), a best local cosine basis concentrates the signal energy over much fewer coefficients than a wavelet basis, and thus provides a better estimation.

10.3 Minimax Optimality $^3$
We consider the noisy data $X = f + W$, where $W$ is a Gaussian white noise of variance $\sigma^2$. An estimation $\tilde F = DX$ of $f$ has a risk $r(D,f) = E\{\|DX - f\|^2\}$. If some prior information tells us that the signal we estimate is in a set $\Theta$, then we must construct estimators whose maximum risk over $\Theta$ is as small as possible. Let $r(D,\Theta) = \sup_{f\in\Theta} r(D,f)$ be the maximum risk over $\Theta$. The linear minimax risk and non-linear minimax risk are respectively defined by
$$r_l(\Theta) = \inf_{D\in\mathcal{O}_l} r(D,\Theta) \quad\text{and}\quad r_n(\Theta) = \inf_{D\in\mathcal{O}_n} r(D,\Theta),$$
where $\mathcal{O}_l$ is the set of all linear operators from $\mathbb{C}^N$ to $\mathbb{C}^N$ and $\mathcal{O}_n$ is the set of all linear and non-linear operators from $\mathbb{C}^N$ to $\mathbb{C}^N$. We study operators $D$ that are diagonal in an orthonormal basis $\mathcal{B} = \{g_m\}_{0\le m<N}$:
$$\tilde F = DX = \sum_{m=0}^{N-1} d_m\big(X_{\mathcal B}[m]\big)\, g_m,$$
and find conditions to achieve a maximum risk over $\Theta$ that is close to the minimax risk. The values of $r_l(\Theta)$ and $r_n(\Theta)$ are compared, so that we can judge whether it is worthwhile using non-linear operators.

Section 10.3.1 begins by studying linear diagonal operators. For orthosymmetric sets, Section 10.3.2 proves that the linear and non-linear minimax risks are nearly achieved by diagonal operators. As a consequence, thresholding estimators in a wavelet basis are proved to be nearly optimal for signals and images having a bounded variation. Readers more interested in algorithms and numerical applications may skip this section, which is mathematically more involved.

10.3.1 Linear Diagonal Minimax Estimation

An estimator that is linear and diagonal in the basis $\mathcal{B}$ can be written
$$\tilde F = DX = \sum_{m=0}^{N-1} a[m]\, X_{\mathcal B}[m]\, g_m, \tag{10.76}$$
where each $a[m]$ is a constant. Let $\mathcal{O}_{l,d}$ be the set of all such linear diagonal operators $D$. Since $\mathcal{O}_{l,d} \subset \mathcal{O}_l$, the linear diagonal minimax risk is larger than the linear minimax risk
$$r_{l,d}(\Theta) = \inf_{D\in\mathcal{O}_{l,d}} r(D,\Theta) \ge r_l(\Theta).$$
We characterize diagonal estimators that achieve the minimax risk $r_{l,d}(\Theta)$. If $\Theta$ is translation invariant, we prove that $r_{l,d}(\Theta) = r_l(\Theta)$ in a discrete Fourier basis. This risk is computed for bounded variation signals.

Quadratic Convex Hull The "square" of a set $\Theta$ in the basis $\mathcal{B}$ is defined by
$$(\Theta)^2_{\mathcal B} = \Big\{ \tilde f \;:\; \tilde f = \sum_{m=0}^{N-1} |f_{\mathcal B}[m]|^2\, g_m \ \text{ with } f\in\Theta \Big\}. \tag{10.77}$$
We say that $\Theta$ is quadratically convex in $\mathcal{B}$ if $(\Theta)^2_{\mathcal B}$ is a convex set. A hyperrectangle $\mathcal{R}_x$ in $\mathcal{B}$ of vertex $x\in\mathbb{C}^N$ is a simple example of quadratically convex set defined by
$$\mathcal{R}_x = \big\{ f \;:\; |f_{\mathcal B}[m]| \le |x_{\mathcal B}[m]| \ \text{ for } 0\le m<N \big\}.$$
The quadratic convex hull $QH[\Theta]$ of $\Theta$ in the basis $\mathcal{B}$ is defined by
$$QH[\Theta] = \Big\{ f \;:\; \sum_{m=0}^{N-1} |f_{\mathcal B}[m]|^2\, g_m \ \text{ is in the convex hull of } (\Theta)^2_{\mathcal B} \Big\}. \tag{10.78}$$
It is the largest set whose square $(QH[\Theta])^2_{\mathcal B}$ is equal to the convex hull of $(\Theta)^2_{\mathcal B}$.

The risk of an oracle attenuation (10.28) gives a lower bound of the minimax linear diagonal risk $r_{l,d}(\Theta)$:
$$r_{l,d}(\Theta) \ge r_{\inf}(\Theta) = \sup_{f\in\Theta} \sum_{m=0}^{N-1} \frac{\sigma^2\, |f_{\mathcal B}[m]|^2}{\sigma^2 + |f_{\mathcal B}[m]|^2}. \tag{10.79}$$
The following theorem proves that this inequality is an equality if and
only if $\Theta$ is quadratically convex.

Theorem 10.6 If $\Theta$ is a bounded and closed set, then there exists $x \in QH[\Theta]$ such that $r_{\inf}(x) = r_{\inf}(QH[\Theta])$ in the basis $\mathcal{B}$. Moreover, the linear diagonal operator $D$ defined by
$$a[m] = \frac{|x_{\mathcal B}[m]|^2}{\sigma^2 + |x_{\mathcal B}[m]|^2} \tag{10.80}$$
achieves the linear diagonal minimax risk
$$r(D,\Theta) = r_{l,d}(\Theta) = r_{\inf}(QH[\Theta]). \tag{10.81}$$

Proof $^3$. The risk $r(D,f)$ of the diagonal operator (10.76) is
$$r(D,f) = \sum_{m=0}^{N-1} \Big( \sigma^2\,|a[m]|^2 + |1 - a[m]|^2\, |f_{\mathcal B}[m]|^2 \Big). \tag{10.82}$$
Since it is a linear function of $|f_{\mathcal B}[m]|^2$, it reaches the same maximum in $\Theta$ and in $QH[\Theta]$. This proves that $r(D,\Theta) = r(D,QH[\Theta])$ and hence that $r_{l,d}(\Theta) = r_{l,d}(QH[\Theta])$.

To verify that $r_{l,d}(\Theta) = r_{\inf}(QH[\Theta])$ we prove that $r_{l,d}(QH[\Theta]) = r_{\inf}(QH[\Theta])$. Since (10.79) shows that $r_{\inf}(QH[\Theta]) \le r_{l,d}(QH[\Theta])$, to get the reverse inequality it is sufficient to prove that the linear estimator defined by (10.80) satisfies $r(D,QH[\Theta]) \le r_{\inf}(QH[\Theta])$. Since $\Theta$ is bounded and closed, $QH[\Theta]$ is also bounded and closed and thus compact, which guarantees the existence of $x\in QH[\Theta]$ such that $r_{\inf}(x) = r_{\inf}(QH[\Theta])$.
The risk of this estimator is calculated with (10.82):
$$r(D,f) = \sum_{m=0}^{N-1} \frac{|f_{\mathcal B}[m]|^2\,\sigma^4 + \sigma^2\,|x_{\mathcal B}[m]|^4}{(\sigma^2 + |x_{\mathcal B}[m]|^2)^2}
= \sum_{m=0}^{N-1} \frac{\sigma^2\,|x_{\mathcal B}[m]|^2}{\sigma^2 + |x_{\mathcal B}[m]|^2} + \sigma^4 \sum_{m=0}^{N-1} \frac{|f_{\mathcal B}[m]|^2 - |x_{\mathcal B}[m]|^2}{(\sigma^2 + |x_{\mathcal B}[m]|^2)^2}.$$
To show that $r(D,f) \le r_{\inf}(QH[\Theta])$, we verify that the second summation is negative. Let $0 \le \eta \le 1$ and $y$ be a vector whose decomposition coefficients in $\mathcal{B}$ satisfy
$$|y_{\mathcal B}[m]|^2 = (1-\eta)\,|x_{\mathcal B}[m]|^2 + \eta\,|f_{\mathcal B}[m]|^2.$$
Since $QH[\Theta]$ is quadratically convex, necessarily $y \in QH[\Theta]$ so
$$J(\eta) = \sum_{m=0}^{N-1} \frac{\sigma^2\,|y_{\mathcal B}[m]|^2}{\sigma^2 + |y_{\mathcal B}[m]|^2} \le \sum_{m=0}^{N-1} \frac{\sigma^2\,|x_{\mathcal B}[m]|^2}{\sigma^2 + |x_{\mathcal B}[m]|^2} = J(0).$$
Since the maximum of $J(\eta)$ is at $\eta = 0$,
$$J'(0) = \sigma^4 \sum_{m=0}^{N-1} \frac{|f_{\mathcal B}[m]|^2 - |x_{\mathcal B}[m]|^2}{(\sigma^2 + |x_{\mathcal B}[m]|^2)^2} \le 0,$$
which finishes the proof.

This theorem implies that $r_{l,d}(\Theta) = r_{l,d}(QH[\Theta])$. To take advantage of the fact that $\Theta$ may be much smaller than its quadratic convex hull, it is necessary to use non-linear diagonal estimators.

Translation Invariant Set Signals such as sounds or images are often arbitrarily translated in time or in space, depending on the beginning of the recording or the position of the camera. To simplify border effects, we consider signals of period $N$. We say that $\Theta$ is translation invariant if for any $f[n] \in \Theta$ then $f[n-p] \in \Theta$ for all $0 \le p < N$.
If the set $\Theta$ is translation invariant and the noise is stationary, then we show that the best linear estimator is also translation invariant, which means that it is a convolution. Such an operator is diagonal in the discrete Fourier basis $\mathcal{B} = \{ g_m[n] = \frac{1}{\sqrt N}\exp(i2\pi mn/N) \}_{0\le m<N}$. The decomposition coefficients of $f$ in this basis are proportional to its discrete Fourier transform:
$$f_{\mathcal B}[m] = \frac{1}{\sqrt N} \sum_{n=0}^{N-1} f[n]\, \exp\Big(\frac{-i2\pi mn}{N}\Big) = \frac{\hat f[m]}{\sqrt N}.$$
For a set $\Theta$, the lower bound $r_{\inf}(\Theta)$ in (10.79) becomes
$$r_{\inf}(\Theta) = \sup_{f\in\Theta} \sum_{m=0}^{N-1} \frac{\sigma^2\, N^{-1}\,|\hat f[m]|^2}{\sigma^2 + N^{-1}\,|\hat f[m]|^2}.$$
The following theorem proves that diagonal operators in the discrete Fourier basis achieve the linear minimax risk.

Theorem 10.7 Let $\Theta$ be a closed and bounded set. Let $x \in QH[\Theta]$ be such that $r_{\inf}(x) = r_{\inf}(QH[\Theta])$ and
$$\hat h[m] = \frac{|\hat x[m]|^2}{N\sigma^2 + |\hat x[m]|^2}. \tag{10.83}$$
If $\Theta$ is translation invariant then $\tilde F = DX = X \circledast h$ achieves the linear minimax risk
$$r_l(\Theta) = r(D,\Theta) = r_{\inf}(QH[\Theta]). \tag{10.84}$$

Proof $^2$. Since $r_l(\Theta) \le r_{l,d}(\Theta)$, Theorem 10.6 proves in (10.81) that
$$r_l(\Theta) \le r_{\inf}(QH[\Theta]).$$
Moreover, the risk $r_{\inf}(QH[\Theta])$ is achieved by the diagonal estimator (10.80). In the discrete Fourier basis it corresponds to a circular convolution whose transfer function is given by (10.83).

We show that $r_l(\Theta) \ge r_{\inf}(QH[\Theta])$ by using particular Bayes priors.
If $f \in QH[\Theta]$ then there exists a family $\{f_i\}_i$ of elements in $\Theta$ such that for any $0 \le m < N$,
$$|\hat f[m]|^2 = \sum_i p_i\, |\hat f_i[m]|^2 \quad\text{with}\quad \sum_i p_i = 1.$$
To each $f_i \in \Theta$ we associate a random shift vector $F_i[n] = f_i[n - P_i]$ as in (9.19). Each $F_i[n]$ is circular stationary, and its power spectrum is computed in (9.21): $\hat R_{F_i}[m] = N^{-1}\,|\hat f_i[m]|^2$. Let $F$ be a random vector that has a probability $p_i$ to be equal to $F_i$. It is circular stationary and its power spectrum is $\hat R_F[m] = N^{-1}\,|\hat f[m]|^2$. We denote by $\pi_f$ the probability distribution of $F$. The risk $r_l(\pi_f)$ of the Wiener filter is calculated in (10.13):
$$r_l(\pi_f) = \sum_{m=0}^{N-1} \frac{\hat R_F[m]\, \hat R_W[m]}{\hat R_F[m] + \hat R_W[m]} = \sum_{m=0}^{N-1} \frac{N^{-1}\,|\hat f[m]|^2\, \sigma^2}{N^{-1}\,|\hat f[m]|^2 + \sigma^2}. \tag{10.85}$$
Since $\Theta$ is translation invariant, the realizations of $F$ are in $\Theta$, so $\pi_f \in \Theta^*$. The minimax Theorem 10.3 proves in (10.19) that $r_l(\pi_f) \le r_l(\Theta)$. Since this is true for any $f \in QH[\Theta]$, taking a sup with respect to $f$ in (10.85) proves that $r_{\inf}(QH[\Theta]) \le r_l(\Theta)$, which finishes the proof.

Bounded Variation Signals The total variation defined in (2.60) measures the amplitude of all signal oscillations. Bounded variation signals may include sharp transitions such as discontinuities. A set $\Theta_V$ of bounded variation signals of period $N$ is defined by
$$\Theta_V = \Big\{ f \;:\; \|f\|_V = \sum_{n=0}^{N-1} |f[n] - f[n-1]| \le C \Big\}. \tag{10.86}$$
Since $\Theta_V$ is translation invariant, the linear minimax estimator is diagonal in the discrete Fourier basis. The following proposition computes the minimax linear risk, which is renormalized by the noise energy $E\{\|W\|^2\} = N\sigma^2$.

Proposition 10.5 If $1 \le C/\sigma \le N^{1/2}$ then
$$\frac{r_l(\Theta_V)}{N\sigma^2} \sim \frac{C}{\sigma\, N^{1/2}}. \tag{10.87}$$

Proof $^2$. The set $\Theta_V$ is translation invariant but it is not bounded because we do not control the average of a bounded variation signal. However, one can verify with a limit argument that the equality $r_l(\Theta_V) = r_{\inf}(QH[\Theta_V])$ of Theorem 10.7 is still valid. To compute $r_{\inf}(QH[\Theta_V])$ we show that $\Theta_V$ is included in a hyperrectangle $\mathcal{R}_x = \{ f : |\hat f[m]| \le |\hat x[m]| \}$, by computing an upper bound of $|\hat f[m]|$ for each $f \in \Theta_V$. Let $g[n] = f[n] - f[n-1]$. Its discrete Fourier transform satisfies
$$|\hat g[m]| = |\hat f[m]|\, \Big| 1 - \exp\Big(\frac{-i2\pi m}{N}\Big) \Big| = 2\,|\hat f[m]|\, \Big| \sin\frac{\pi m}{N} \Big|. \tag{10.88}$$
Since $\sum_{n=0}^{N-1} |g[n]| \le C$, necessarily $|\hat g[m]| \le C$ so
$$|\hat f[m]|^2 \le \frac{C^2}{4\,|\sin(\pi m/N)|^2} = |\hat x[m]|^2, \tag{10.89}$$
which proves that $\Theta_V \subset \mathcal{R}_x$. The value $|\hat x[0]| = \infty$ is formally treated like all others. Since $\mathcal{R}_x$ is quadratically convex, $QH[\Theta_V] \subset \mathcal{R}_x$. Hence
$$r_{\inf}(QH[\Theta_V]) \le r_{\inf}(\mathcal{R}_x) = \sum_{m=0}^{N-1} \frac{\sigma^2\, N^{-1}\,|\hat x[m]|^2}{\sigma^2 + N^{-1}\,|\hat x[m]|^2},$$
with $\sigma^2 N^{-1} |\hat x[0]|^2\, (\sigma^2 + N^{-1} |\hat x[0]|^2)^{-1} = \sigma^2$. Since $|\hat x[m]| \sim C\,N\,|m|^{-1}$ and $1 \le C/\sigma \le N^{1/2}$, a direct calculation shows that
$$r_{\inf}(QH[\Theta_V]) \le r_{\inf}(\mathcal{R}_x) \sim C\,\sigma\,N^{1/2}. \tag{10.90}$$
To compute a lower bound for $r_{\inf}(QH[\Theta_V])$ we consider the two signals in $\Theta_V$ defined by
$$f_1 = \frac{C}{2}\,\mathbf{1}_{[0,N/2-1]} - \frac{C}{2} \quad\text{and}\quad f_2 = \frac{C}{4}\,\mathbf{1}_{[0,N/2-1]} - \frac{C}{4}\,\mathbf{1}_{[N/2,N-1]}.$$
Let $f \in QH[\Theta_V]$ such that
$$|\hat f[m]|^2 = \frac{1}{2}\big( |\hat f_1[m]|^2 + |\hat f_2[m]|^2 \big).$$
A simple calculation shows that for $m \ne 0$
$$|\hat f[m]|^2 = \frac{C^2}{8\,|\sin(\pi m/N)|^2} \sim C^2\,N^2\,|m|^{-2},$$
so
$$r_{\inf}(QH[\Theta_V]) \ge r_{\inf}(f) \sim C\,\sigma\,N^{1/2}.$$
Together with (10.90) this proves (10.87).

This proposition proves that a linear estimator reduces the energy of the noise by a factor that increases like $N^{1/2}$. The minimax filter averages the noisy data to remove part of the white noise, without degrading too much the potential discontinuities of $f$. Figure 10.2(c) shows a linear Wiener estimation calculated by supposing that $|\hat f[m]|^2$ is known. The resulting risk (10.17) is in fact the minimax risk over the translation invariant set $\Theta_f = \{ g : g[n] = f[n-p] \ \text{with } p \in \mathbb{Z} \}$. If $f$ has a discontinuity whose amplitude is on the order of $C$ then although the set $\Theta_f$ is much smaller than $\Theta_V$, the minimax linear risks $r_l(\Theta_f)$ and $r_l(\Theta_V)$ are of the same order.

10.3.2 Orthosymmetric Sets

We study geometrical conditions on $\Theta$ that allow us to nearly reach
the non-linear minimax risk $r_n(\Theta)$ with estimators that are diagonal in a basis $\mathcal{B} = \{g_m\}_{0\le m<N}$. The maximum risk on $\Theta$ of any linear or non-linear diagonal estimator has a lower bound calculated with the oracle diagonal attenuation (10.28):
$$r_{\inf}(\Theta) = \sup_{f\in\Theta} \sum_{m=0}^{N-1} \frac{\sigma^2\,|f_{\mathcal B}[m]|^2}{\sigma^2 + |f_{\mathcal B}[m]|^2}.$$
Thresholding estimators have a maximum risk that is close to this lower bound. We thus need to understand under what conditions $r_n(\Theta)$ is on the order of $r_{\inf}(\Theta)$ and how it compares with $r_l(\Theta)$.

Hyperrectangle The study begins with hyperrectangles which are building blocks for computing the minimax risk over any set $\Theta$. A hyperrectangle
$$\mathcal{R}_x = \{ f \;:\; |f_{\mathcal B}[m]| \le |x_{\mathcal B}[m]| \ \text{ for } 0 \le m < N \}$$
is a separable set along the basis directions $g_m$. The risk lower bound for diagonal estimators is
$$r_{\inf}(\mathcal{R}_x) = \sum_{m=0}^{N-1} \frac{\sigma^2\,|x_{\mathcal B}[m]|^2}{\sigma^2 + |x_{\mathcal B}[m]|^2}.$$
The following theorem proves that for a hyperrectangle, the non-linear minimax risk is very close to the linear minimax risk.

Theorem 10.8 On a hyperrectangle $\mathcal{R}_x$ the linear and non-linear minimax risks are reached by diagonal estimators. They satisfy
$$r_l(\mathcal{R}_x) = r_{\inf}(\mathcal{R}_x), \tag{10.91}$$
and
$$\mu\, r_{\inf}(\mathcal{R}_x) \le r_n(\mathcal{R}_x) \le r_{\inf}(\mathcal{R}_x) \quad\text{with } \mu \ge 1/1.25. \tag{10.92}$$

Proof $^3$. We first show that a linear minimax estimator is necessarily
~
diagonal in B. Let F = DX be the estimator obtained with a linear
operator D represented by the matrix A in B: ~
F B = A XB :
Let trA be the trace of A, and A be its complex transpose. Since
X = f + W where W is a white noise of variance 2 , a direct calculation
shows that
~
r(D f ) = EfkF ; f k2 g = 2 trAA + (AfB ; fB ) (AfB ; fB ): (10.93) If Dd is the diagonal operator whose coe cients are a m] = am m the
risk is then r(Dd f ) = N ;1
X
m=0 2 ja m mj 2 + j1 ; a m mj 2 jf B m]j2 : (10.94) To prove that the maximum risk over Rx is minimized when A is
diagonal, we show that r(Dd Rx ) r(D Rx ). For this purpose, we
use a prior probability distribution 2 Rx corresponding to a random
vector F whose realizations are in Rx : FB m] = S m] xB m] : (10.95) The random variables S m] are independent and equal to 1 or ;1 with
~
probability $1/2$. The expected risk $r(D,\pi) = E\{\|F - \tilde F\|^2\}$ is derived
from (10.93) by replacing $f$ by $F$ and taking the expected value with respect to the probability distribution $\pi$ of $F$. If $m \ne p$ then $E\{F_B[m]\, F_B[p]\} = 0$, so we get
$$\begin{aligned}
r(D,\pi) &\ge \sigma^2 \sum_{m=0}^{N-1} |a_{m,m}|^2 + \sum_{m=0}^{N-1} |x_B[m]|^2 \Big[\, |a_{m,m} - 1|^2 + \sum_{\substack{p=0 \\ p \ne m}}^{N-1} |a_{m,p}|^2 \Big] \\
&\ge \sigma^2 \sum_{m=0}^{N-1} |a_{m,m}|^2 + \sum_{m=0}^{N-1} |1 - a_{m,m}|^2\, |x_B[m]|^2 = r(D_d, x) .
\end{aligned} \tag{10.96}$$
Since the realizations of $F$ are in $R_x$, (10.20) implies that $r(D, R_x) \ge r(D,\pi)$, so $r(D, R_x) \ge r(D_d, x)$. To prove that $r(D, R_x) \ge r(D_d, R_x)$
it is now sufficient to verify that $r(D_d, R_x) = r(D_d, x)$. To minimize
$r(D_d, f)$, (10.94) proves that necessarily $a_{m,m} \in [0,1]$. In this case
(10.94) implies
$$r(D_d, R_x) = \sup_{f \in R_x} r(D_d, f) = r(D_d, x) .$$
Now that we know that the minimax risk is achieved by a diagonal operator, we apply Theorem 10.6, which proves in (10.81) that the minimax
risk among linear diagonal operators is $r_{\inf}(R_x)$ because $R_x$ is quadratically convex. So $r_l(R_x) = r_{\inf}(R_x)$.
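
The first half of the proof can be checked numerically. The sketch below is illustrative only: the corner `x` and the noise level are made-up values, and the helper names are not from the text. It evaluates the diagonal risk (10.94) with the oracle attenuation computed at the corner $x$, and compares it with $r_{\inf}(R_x)$ and with the risk at a signal inside $R_x$.

```python
def oracle_risk(x, sigma):
    """r_inf(x) = sum_m sigma^2 |x[m]|^2 / (sigma^2 + |x[m]|^2)."""
    s2 = sigma ** 2
    return sum(s2 * v * v / (s2 + v * v) for v in x)


def diagonal_risk(a, f, sigma):
    """Risk (10.94) of the diagonal linear estimator with attenuations a[m]."""
    s2 = sigma ** 2
    return sum(s2 * am ** 2 + (1.0 - am) ** 2 * fm ** 2 for am, fm in zip(a, f))


sigma = 1.0
x = [4.0, 2.0, 1.0, 0.5, 0.1]  # corner of the hyperrectangle R_x (assumed values)

# Attenuation a[m] = |x[m]|^2 / (sigma^2 + |x[m]|^2), computed at the corner x.
a = [v * v / (sigma ** 2 + v * v) for v in x]

risk_at_corner = diagonal_risk(a, x, sigma)

# Any f in R_x satisfies |f[m]| <= |x[m]|; with a[m] in [0, 1] its risk is no larger.
inside = [0.5 * v for v in x]
risk_inside = diagonal_risk(a, inside, sigma)

print(risk_at_corner, oracle_risk(x, sigma), risk_inside)
```

With this attenuation the risk at the corner coincides with $r_{\inf}(R_x)$, and signals strictly inside the hyperrectangle have a smaller risk, which is the content of $r(D_d, R_x) = r(D_d, x)$.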
To prove that the non-linear minimax risk is also obtained with a
diagonal operator we use the minimax Theorem 10.3, which proves that
$$r_n(R_x) = \sup_{\pi \in R_x^*}\ \inf_{D \in O_n} r(D,\pi) . \tag{10.97}$$
The set $R_x$ can be written as a product of intervals along each direction $g_m$. As a consequence, to any prior $\pi \in R_x^*$ corresponding to a
random vector $F$ we associate a prior $\pi' \in R_x^*$ corresponding to $F'$ such
that $F'_B[m]$ has the same distribution as $F_B[m]$ but with $F'_B[m]$ independent
from $F'_B[p]$ for $p \ne m$. We then verify that for any operator $D$,
$r(D,\pi) \le r(D,\pi')$. The sup over $R_x^*$ in (10.97) can thus be restricted
to processes that have independent coordinates. This independence also
implies that the Bayes estimator that minimizes $r(D,\pi)$ is diagonal in
$B$. The minimax theorem proves that the minimax risk is reached by
diagonal estimators.
Since $r_n(R_x) \le r_l(R_x)$ we derive the upper bound in (10.92) from
the fact that $r_l(R_x) = r_{\inf}(R_x)$. The lower bound (10.92) is obtained by
computing the Bayes risk $r_n(\pi) = \inf_{D \in O_n} r(D,\pi)$ for the prior $\pi$ corresponding to $F$ defined in (10.95), and verifying that $r_n(\pi) \ge \mu\, r_{\inf}(R_x)$.
We see from (10.97) that $r_n(R_x) \ge r_n(\pi)$, which implies (10.92).

The bound $\mu > 0$ was proved by Ibragimov and Khas'minskii [219]
but the essentially sharp bound $1/1.25$ was obtained by Donoho, Liu
and MacGibbon [172]. They showed that $\mu$ depends on the variance
$\sigma^2$ of the noise and that if $\sigma^2$ tends to $0$ or to $+\infty$ then $\mu$ tends to $1$.
Linear estimators are thus asymptotically optimal compared to non-linear estimators.

Orthosymmetric set. To differentiate the properties of linear and non-linear estimators, we consider more complex sets that can be written as unions of hyperrectangles. We say that $\Theta$ is orthosymmetric in
$B$ if for any $f \in \Theta$ and for any $a[m]$ with $|a[m]| \le 1$,
$$\sum_{m=0}^{N-1} a[m]\, f_B[m]\, g_m \in \Theta .$$
Such a set can be written as a union of hyperrectangles:
$$\Theta = \bigcup_{f \in \Theta} R_f . \tag{10.98}$$
An upper bound of $r_n(\Theta)$ is obtained with the maximum risk $r_t(\Theta) = \sup_{f \in \Theta} r_t(f)$ of a hard or soft thresholding estimator in the basis $B$,
with a threshold $T = \sigma \sqrt{2 \log_e N}$.

Proposition 10.6 If $\Theta$ is orthosymmetric in $B$ then the linear minimax estimator is reached by linear diagonal estimators and
$$r_l(\Theta) = r_{\inf}(QH[\Theta]) . \tag{10.99}$$
The non-linear minimax risk satisfies
$$\frac{1}{1.25}\, r_{\inf}(\Theta) \le r_n(\Theta) \le r_t(\Theta) \le (2 \log_e N + 1) \left( \sigma^2 + r_{\inf}(\Theta) \right) . \tag{10.100}$$

Proof$^2$. Since $\Theta$ is orthosymmetric, $\Theta = \bigcup_{f \in \Theta} R_f$. On each hyperrectangle $R_f$, we showed in (10.96) that the maximum risk of a linear
estimator is reduced by letting it be diagonal in $B$. The minimax linear
estimation in $\Theta$ is therefore diagonal: $r_l(\Theta) = r_{l,d}(\Theta)$. Theorem 10.6
proves in (10.81) that $r_{l,d}(\Theta) = r_{\inf}(QH[\Theta])$, which implies (10.99).
Since $\Theta = \bigcup_{f \in \Theta} R_f$ we also derive that $r_n(\Theta) \ge \sup_{f \in \Theta} r_n(R_f)$. So
(10.92) implies that
$$r_n(\Theta) \ge \frac{1}{1.25}\, r_{\inf}(\Theta) .$$
Theorem 10.4 proves in (10.42) that the thresholding risk satisfies
$$r_t(f) \le (2 \log_e N + 1) \left( \sigma^2 + r_p(f) \right) .$$
A modification of the proof shows that this upper bound remains valid
if $r_p(f)$ is replaced by $r_{\inf}(f)$ [167]. Taking a sup over all $f \in \Theta$ proves
the upper bound (10.100), given that $r_n(\Theta) \le r_t(\Theta)$.

This proposition shows that $r_n(\Theta)$ always remains within a factor $2 \log_e N$
of the lower bound $r_{\inf}(\Theta)$ and that the thresholding risk $r_t(\Theta)$ is at
most $2 \log_e N$ times larger than $r_n(\Theta)$. In some cases, the factor $2 \log_e N$
can even be reduced to a constant independent of $N$.
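
The upper bound of (10.100) can be illustrated with a small Monte Carlo experiment. This is a sketch under stated assumptions, not a proof: the sparse signal `f`, the noise level, the seed and the number of trials are arbitrary choices, and the coefficients are drawn directly in the basis $B$ so that no transform is needed.

```python
import math
import random


def hard_threshold(v, T):
    """Keep v if its amplitude exceeds T, otherwise set it to zero."""
    return v if abs(v) > T else 0.0


def oracle_risk(f, sigma):
    """r_inf(f) = sum_m sigma^2 |f[m]|^2 / (sigma^2 + |f[m]|^2)."""
    s2 = sigma ** 2
    return sum(s2 * v * v / (s2 + v * v) for v in f)


def thresholding_risk(f, sigma, trials, rng):
    """Monte Carlo estimate of E||F~ - f||^2 for T = sigma * sqrt(2 ln N)."""
    N = len(f)
    T = sigma * math.sqrt(2.0 * math.log(N))
    total = 0.0
    for _ in range(trials):
        for fm in f:
            noisy = fm + rng.gauss(0.0, sigma)  # X_B[m] = f_B[m] + W_B[m]
            total += (hard_threshold(noisy, T) - fm) ** 2
    return total / trials


rng = random.Random(0)
N, sigma = 256, 1.0
f = [10.0] * 8 + [0.0] * (N - 8)  # sparse coefficients in B (assumed example)

r_t = thresholding_risk(f, sigma, trials=200, rng=rng)
r_inf = oracle_risk(f, sigma)
upper = (2.0 * math.log(N) + 1.0) * (sigma ** 2 + r_inf)
print(r_inf, r_t, upper)
```

For a sparse signal such as this one, the estimated thresholding risk is typically far below the upper bound, which is consistent with the remark that the $2 \log_e N$ factor is pessimistic in some cases.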
Unlike the non-linear risk $r_n(\Theta)$, the linear minimax risk $r_l(\Theta)$ may
be much larger than $r_{\inf}(\Theta)$. This depends on the convexity of $\Theta$. If
$\Theta$ is quadratically convex then $\Theta = QH[\Theta]$, so (10.99) implies that
$r_l(\Theta) = r_{\inf}(\Theta)$. Since $r_n(\Theta) \ge r_{\inf}(\Theta)/1.25$, the risks of linear and non-linear minimax estimators are of the same order. In this case, there is
no reason for working with non-linear as opposed to linear estimators.
When $\Theta$ is an orthosymmetric ellipsoid, Problem 10.14 computes the
minimax linear estimator of Pinsker [282] and the resulting risk.
If $\Theta$ is not quadratically convex then its hull $QH[\Theta]$ may be much
bigger than $\Theta$. This is the case when $\Theta$ has a star shape that is elongated in the directions of the basis vectors $g_m$, as illustrated in Figure
10.9. The linear risk $r_l(\Theta) = r_{\inf}(QH[\Theta])$ may then be much larger
than $r_{\inf}(\Theta)$. Since $r_n(\Theta)$ and $r_t(\Theta)$ are on the order of $r_{\inf}(\Theta)$, they
are then much smaller than $r_l(\Theta)$. A thresholding estimator thus brings
an important improvement over any linear estimator.
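
The gap between $r_{\inf}(\Theta)$ and $r_{\inf}(QH[\Theta])$ can be made concrete on an extreme star-shaped set. The example below is an assumption made for illustration and does not appear in the text: $\Theta$ is the set of signals with a single non-zero coefficient of amplitude at most $C$ in $B$. Its quadratically convex hull is then the ball $\{f : \sum_m |f_B[m]|^2 \le C^2\}$, over which the oracle risk is maximized by spreading the energy evenly, since $u \mapsto \sigma^2 u/(\sigma^2+u)$ is concave.

```python
def oracle_risk(f, sigma):
    """r_inf(f) = sum_m sigma^2 |f[m]|^2 / (sigma^2 + |f[m]|^2)."""
    s2 = sigma ** 2
    return sum(s2 * v * v / (s2 + v * v) for v in f)


N, sigma = 1024, 1.0
C = sigma * N ** 0.5  # assumed amplitude bound, chosen so that C^2 = N sigma^2

# Worst case in Theta: all the energy C^2 on one basis vector.
spike = [C] + [0.0] * (N - 1)
r_inf_theta = oracle_risk(spike, sigma)

# Worst case in QH[Theta]: the same energy spread evenly over the N vectors.
spread = [C / N ** 0.5] * N
r_inf_hull = oracle_risk(spread, sigma)

print(r_inf_theta, r_inf_hull, r_inf_hull / r_inf_theta)
```

Here $r_{\inf}(\Theta)$ stays below $\sigma^2$ while $r_{\inf}(QH[\Theta]) = N\sigma^2/2$, so for this particular set the linear minimax risk is larger than the oracle bound by a factor on the order of $N$.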
Figure 10.9: (a): Example of orthosymmetric set $\Theta$ in three dimensions.
(b): The quadratically convex hull $QH[\Theta]$ is a larger ellipsoid including
$\Theta$.

Example 10.2 Let $\Theta$ be an $l^p$ ball defined by
$$\Theta = \Big\{ f \,:\, \sum_{m=0}^{N-1} \beta_m^p\, |f_B[m]|^p \le C^p \Big\} . \tag{10.101}$$
It is an orthosymmetric set. Its square is
$$(\Theta)_B^2 = \Big\{ f \,:\, \sum_{m=0}^{N-1} \beta_m^p\, |f_B[m]|^{p/2} \le C^p \Big\} .$$
If $p \ge 2$ then $(\Theta)_B^2$ is convex so $\Theta$ is quadratically convex. If $p < 2$, the
convex hull of $(\Theta)_B^2$ is $\{ f : \sum_{m=0}^{N-1} \beta_m^2\, |f_B[m]| \le C^2 \}$ so the quadratic
convex hull of $\Theta$ is
$$QH[\Theta] = \Big\{ f \,:\, \sum_{m=0}^{N-1} \beta_m^2\, |f_B[m]|^2 \le C^2 \Big\} . \tag{10.102}$$
The smaller $p$, the larger the difference between $\Theta$
and $QH[\Theta]$.

Risk calculation. The value of $r_{\inf}(\Theta)$ depends on the error when
approximating signals in $\Theta$ with few vectors selected from the basis $B$. Theorem 9.4 proves that the non-linear approximation error of $f \in \Theta$
depends on the decay rate of its sorted coefficients $f_B^r[k] = f_B[m_k]$, with
$|f_B^r[k]| \ge |f_B^r[k+1]|$ for $1 \le k \le N$. The following proposition computes
$r_{\inf}(\Theta)$ for two orthosymmetric sets.

Proposition 10.7 Let $s > 1/2$ and $C$ be such that $1 \le C/\sigma \le N^s$. If
$$\Theta_{C,s} = \Big\{ f \,:\, |f_B^r[k]| \le C\, k^{-s} \ \text{ for } 1 \le k \le N \Big\} \tag{10.103}$$
and
$$\tilde\Theta_{C,s} = \Big\{ f \,:\, \Big( \sum_{m=0}^{N-1} |f_B[m]|^{1/s} \Big)^{s} \le C \Big\} \tag{10.104}$$
then
$$r_{\inf}(\Theta_{C,s}) \sim r_{\inf}(\tilde\Theta_{C,s}) \sim C^{1/s}\, \sigma^{2-1/s} . \tag{10.105}$$
Proof$^2$. We first prove that if the sorted coefficients of $f$ satisfy $|f_B^r[k]| \sim C\, k^{-s}$ then
$$r_{\inf}(f) \sim C^{1/s}\, \sigma^{2-1/s} . \tag{10.106}$$
Remember from (10.35) that $r_{\inf}(f) \sim r_p(f)$. Since $|f_B^r[k]| \sim C\, k^{-s}$,
Theorem 9.4 proves that the non-linear approximation error of $f$ in $B$
satisfies $\epsilon_n[M] \sim C^2 M^{1-2s}$, and Proposition 10.3 implies that $r_p(f) \sim \sigma^{2-1/s}\, C^{1/s}$, which verifies (10.106). If the coefficients satisfy only the
upper bound $|f_B^r[k]| = O(C\, k^{-s})$, the same proof shows that $r_{\inf}(f) = O(C^{1/s}\, \sigma^{2-1/s})$. The set $\Theta_{C,s}$ includes $f$ such that $|f_B^r[k]| = C\, k^{-s}$, and
all $f \in \Theta_{C,s}$ satisfy $|f_B^r[k]| = O(C\, k^{-s})$. We thus derive that
$r_{\inf}(\Theta_{C,s}) = \sup_{f \in \Theta_{C,s}} r_{\inf}(f) \sim \sigma^{2-1/s}\, C^{1/s}$.
Let us now consider the set $\tilde\Theta_{C,s}$ defined in (10.104). If $f \in \tilde\Theta_{C,s}$
then (9.29) proves that $|f_B^r[k]| \le C\, k^{-s}$. So $\tilde\Theta_{C,s} \subset \Theta_{C,s}$ and hence
$r_{\inf}(\tilde\Theta_{C,s}) \le r_{\inf}(\Theta_{C,s})$. To get a reverse inequality we consider $f \in \tilde\Theta_{C,s}$
such that $|f_B[m]| = \sigma$ for $0 \le m < \lfloor (C/\sigma)^{1/s} \rfloor$ and $f_B[m] = 0$ for
$m \ge \lfloor (C/\sigma)^{1/s} \rfloor$. In this case
$$r_p(f) = \lfloor (C/\sigma)^{1/s} \rfloor\, \sigma^2 \sim C^{1/s}\, \sigma^{2-1/s} .$$
Since $r_{\inf}(\tilde\Theta_{C,s}) \ge \frac{1}{2}\, r_p(f)$ and $r_{\inf}(\tilde\Theta_{C,s}) \le r_{\inf}(\Theta_{C,s})$, we get
$r_{\inf}(\tilde\Theta_{C,s}) \sim r_{\inf}(\Theta_{C,s}) \sim \sigma^{2-1/s}\, C^{1/s}$.

The hypothesis $C/\sigma \ge 1$ guarantees that the largest signal coefficient
is not dominated by the noise, whereas $C/\sigma \le N^s$ indicates that the
smallest coefficient has an amplitude smaller than the noise. This is
typically the domain of application for noise removal algorithms.
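
The scaling law (10.105) can be observed numerically on the extremal signal of $\Theta_{C,s}$, whose sorted coefficients are exactly $C\,k^{-s}$. In this sketch the values of $C$, $s$, $N$ and the noise levels are assumptions, chosen so that $1 \le C/\sigma \le N^s$ holds; the ratio between the oracle risk and $C^{1/s}\sigma^{2-1/s}$ should then stay within a fixed band as $\sigma$ varies.

```python
def oracle_risk_power_decay(C, s, sigma, N):
    """r_inf(f) for sorted coefficients |f_B^r[k]| = C k^{-s}, 1 <= k <= N."""
    s2 = sigma ** 2
    total = 0.0
    for k in range(1, N + 1):
        c2 = (C * k ** (-s)) ** 2
        total += s2 * c2 / (s2 + c2)
    return total


C, s, N = 1.0, 1.5, 100000  # assumed values, with 1 <= C / sigma <= N ** s below
ratios = []
for sigma in (0.1, 0.01, 0.001):
    r_inf = oracle_risk_power_decay(C, s, sigma, N)
    predicted = C ** (1.0 / s) * sigma ** (2.0 - 1.0 / s)
    ratios.append(r_inf / predicted)

print(ratios)
```

The noise variance changes by four orders of magnitude across the three runs, while the ratios remain of order one, which is what the equivalence $\sim$ in (10.105) expresses.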
The larger $s$, the faster the decay of $r_{\inf}(\Theta)$ when the noise variance
$\sigma^2$ tends to $0$. The exponent $s$ is large if signals in $\Theta$ have sorted
decomposition coefficients with a fast decay, in which case $r_{\inf}(\Theta)$ is
almost on the order of $\sigma^2$. This risk is much smaller than the noise
energy $E\{\|W\|^2\} = N \sigma^2$.

10.3.3 Nearly Minimax with Wavelets

A thresholding estimator in a wavelet basis has a nearly minimax risk
for sets of piecewise regular signals. This result is proved for piecewise
polynomial signals, which have key characteristics that explain the efficiency of wavelet thresholding estimators. The more general case of
bounded variation signals and images is then studied.

Piecewise Polynomials. Piecewise polynomials are among the most difficult bounded signals to estimate with a linear operator. Indeed, the
proof of Proposition 10.5 shows that the maximum risk of an optimal
linear estimator is nearly reached by piecewise constant signals.
The estimation of a piecewise polynomial $f$ is improved by non-linear operators that average the noisy data $X = f + W$ over large
domains where $f$ is regular, but which avoid averaging $X$ across the
discontinuities of $f$. These adaptive smoothing algorithms require estimating the positions of the discontinuities of $f$ from $X$. Let $\Theta_{K,d}$ be
the set of piecewise polynomial signals on $[0, N-1]$, with at most $K$
polynomial components of degree $d$ or smaller. Figure 10.2 gives an
example with $d = 3$ and $K = 9$. The following proposition computes a
lower bound of the minimax risk $r_n(\Theta_{K,d})$.

Proposition 10.8 If $\Theta_{K,d}$ is a set of piecewise polynomial signals then
$$\frac{r_n(\Theta_{K,d})}{N \sigma^2} \ge K\, (d+1)\, N^{-1} . \tag{10.107}$$

Proof$^2$. We consider $f \in \Theta_{K,d}$ which is equal to polynomials of degree
$d$ on a partition of $[0, N-1]$ composed of $K$ sub-intervals $\{[\tau_k, \tau_{k+1} - 1]\}_{0 \le k \le K}$. To compute a lower bound of $r_n(\Theta_{K,d})$, we create an oracle
estimator that knows in advance the position of each interval $[\tau_k, \tau_{k+1} - 1]$. On $[\tau_k, \tau_{k+1} - 1]$, $f$ is equal to a polynomial $p_k$ of degree $d$, which is
characterized by $d+1$ parameters. Problem 10.3 shows that the minimum
risk when estimating $p_k$ on $[\tau_k, \tau_{k+1} - 1]$ from $X = f + W$ is obtained
with an orthogonal projection on the space of polynomials of degree $d$
over $[\tau_k, \tau_{k+1} - 1]$. The resulting risk is $(d+1)\, \sigma^2$. Since $r_n(\Theta_{K,d})$ is
larger than the sum of these risks on the $K$ intervals,
$$r_n(\Theta_{K,d}) \ge K\, (d+1)\, \sigma^2 .$$

The lower bound (10.107) is calculated with an oracle estimator that
knows in advance the positions of the signal discontinuities. One can
prove [227] that the need to estimate the position of the signal discontinuities introduces another $\log_2 N$ factor in the non-linear minimax
risk:
$$\frac{r_n(\Theta_{K,d})}{N \sigma^2} \sim K\, (d+1)\, \frac{\log_e N}{N} .$$
It is much smaller than the normalized linear minimax risk (10.87),
which decays like $N^{-1/2}$.
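
The oracle risk $K(d+1)\sigma^2$ used in the proof of Proposition 10.8 can be checked by simulation in the simplest case $d = 0$, where the orthogonal projection on each interval is the average of the noisy samples. The partition, the signal levels, the seed and the number of trials below are assumed values chosen only for illustration.

```python
import random

rng = random.Random(1)
N, sigma = 256, 1.0
boundaries = [0, 40, 100, 180, 256]  # tau_0 < ... < tau_K, known to the oracle
levels = [1.0, -2.0, 0.5, 3.0]       # value of f on each interval (degree d = 0)
K = len(levels)

trials = 2000
total = 0.0
for _ in range(trials):
    for k in range(K):
        lo, hi = boundaries[k], boundaries[k + 1]
        noisy = [levels[k] + rng.gauss(0.0, sigma) for _ in range(lo, hi)]
        # Orthogonal projection on constants over [tau_k, tau_{k+1} - 1].
        mean = sum(noisy) / len(noisy)
        # Squared error of the estimate, summed over the samples of the interval.
        total += (hi - lo) * (mean - levels[k]) ** 2

risk = total / trials
print(risk, K * sigma ** 2)
```

The estimated risk is close to $K\sigma^2$ whatever the interval lengths, since each interval contributes one estimated parameter.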
The inner product of a wavelet with $d+1$ vanishing moments and
a polynomial of degree $d$ is equal to zero. A wavelet basis thus gives
a sparse representation of piecewise polynomials, with non-zero coefficients located in the neighborhood of their discontinuities. Figure
10.4(a) gives an example. The following proposition shows that a thresholding estimator in a wavelet basis has a risk that is close to the non-linear minimax.

Proposition 10.9 Let $T = \sigma \sqrt{2 \log_e N}$. The risk of a hard or a soft
thresholding in a Daubechies wavelet basis with $d+1$ vanishing moments
satisfies
$$\frac{r_t(\Theta_{K,d})}{N \sigma^2} \le \frac{4 K\, (d+1)}{\log_e 2}\, \frac{\log_e^2 N}{N}\, (1 + o(1)) \tag{10.108}$$
when $N$ tends to $+\infty$.
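
The sparsity behind this result can be observed with the Haar wavelet, which is the Daubechies wavelet with one vanishing moment and therefore corresponds to $d = 0$. The sketch below uses a plain orthonormal Haar transform written for this illustration and an assumed piecewise constant signal; it counts the non-zero wavelet coefficients and compares the count with $K(2d+2)\log_2 N$.

```python
import math


def haar_wavelet_coefficients(signal):
    """Orthonormal Haar transform of a signal whose length is a power of 2.

    Returns the detail coefficients per scale (finest first) and the final
    scaling coefficient.
    """
    approx = list(signal)
    details = []
    while len(approx) > 1:
        half = len(approx) // 2
        detail = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2.0) for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2.0) for i in range(half)]
        details.append(detail)
    return details, approx[0]


N = 1024
breakpoints = [0, 100, 333, 700, N]  # K = 4 constant pieces (assumed example)
levels = [1.0, 4.0, -2.0, 0.5]
f = []
for k, level in enumerate(levels):
    f.extend([level] * (breakpoints[k + 1] - breakpoints[k]))

details, _ = haar_wavelet_coefficients(f)
M = sum(1 for scale in details for c in scale if abs(c) > 1e-9)

K, degree = len(levels), 0
bound = K * (2 * degree + 2) * math.log2(N)
print(M, bound)
```

Only the wavelets whose support crosses a discontinuity produce non-zero coefficients, so $M$ grows like $\log_2 N$ rather than like $N$.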
Proof$^2$. On $[0, N-1]$, the discrete wavelets $\psi_{j,m}[n]$ of a Daubechies basis
with $d+1$ vanishing moments have a support of size $N\, 2^j (2d+2)$. Let
$f \in \Theta_{K,d}$. If the support of $\psi_{j,m}$ is included inside one of the polynomial
components of $f$, then $\langle f, \psi_{j,m} \rangle = 0$. At each scale $2^j$, there are at
most $K (2d+2)$ wavelets $\psi_{j,m}$ whose support includes one of the $K$
transition points of $f$. On at most $\log_2 N$ scales, the number $M$ of non-zero coefficients thus satisfies
$$M \le K\, (2d+2)\, \log_2 N . \tag{10.109}$$
Since $\min(|\langle f, \psi_{j,m} \rangle|^2, \sigma^2) \le \sigma^2$ and $\min(|\langle f, \psi_{j,m} \rangle|^2, \sigma^2) = 0$ if $\langle f, \psi_{j,m} \rangle = 0$, we derive from (10.42) that the thresholding risk satisfies
$$r_t(f) \le (2 \log_e N + 1)\, (M + 1)\, \sigma^2 .$$
Inserting (10.109) yields
$$r_t(\Theta_{K,d}) \le \left( 1 + 2 K (d+1) \log_2 N \right) (2 \log_e N + 1)\, \sigma^2 .$$
Extracting the dominating term for $N$ large gives (10.108).

The wavelet thresholding risk $r_t(\Theta_{K,d})$ is thus larger than $r_n(\Theta_{K,d})$ by
at most a $\log_e N$ factor. This loss comes from a non-optimal choice of
the threshold $T = \sigma \sqrt{2 \log_e N}$. If a different threshold $T_j$ is used to
threshold the wavelet coefficients at each scale $2^j$, then one can prove
[227] that the $\log_e N$ factor disappears:
$$\min_{(T_j)_j} \frac{r_t(\Theta_{K,d})}{N \sigma^2} \sim K\, (d+1)\, \frac{\log_e N}{N} . \tag{10.110}$$
For a soft thresholding, nearly optimal values $T_j$ are calculated from
the noisy data with the SURE estimator (10.66), and the resulting risk
$r_t(\Theta_{K,d})$ has an asymptotic decay equivalent to (10.110) [169].

Bounded Variation. Let $\Theta_V$ be the set of signals having a total
variation bounded by $C$:
$$\Theta_V = \Big\{ f \,:\, \|f\|_V = \sum_{n=0}^{N-1} |f[n] - f[n-1]| \le C \Big\} .$$
To prove that a thresholding estimator in a wavelet basis has nearly a
minimax risk, we show that $\Theta_V$ can be embedded in two sets that are
orthosymmetric in the wavelet basis. This embedding is derived from
the following proposition, which computes an upper bound and a lower
bound of $\|f\|_V$ from the wavelet coefficients of $f$. To simplify notations
we write the scaling vectors of the wavelet basis: $\phi_{J,m} = \psi_{J+1,m}$. Recall
that the minimum scale is $2^L = N^{-1}$.

Proposition 10.10 There exist $A, B > 0$ such that for all $N > 0$
$$\|f\|_V \le B\, N^{-1/2} \sum_{j=L+1}^{J+1} \sum_{m=0}^{2^{-j}-1} 2^{-j/2}\, |\langle f, \psi_{j,m} \rangle| = B\, \|f\|_{1,1,1} \tag{10.111}$$
and
$$\|f\|_V \ge A\, N^{-1/2} \sup_{j \le J} \Big( \sum_{m=0}^{2^{-j}-1} 2^{-j/2}\, |\langle f, \psi_{j,m} \rangle| \Big) = A\, \|f\|_{1,1,\infty} . \tag{10.112}$$

The proof is identical to the proof of Theorem 9.6, replacing integrals by discrete sums. The factor $N^{-1/2}\, 2^{-j/2}$ comes from the fact that
$\|\psi_{j,m}\|_V \sim N^{-1/2}\, 2^{-j/2}$. The indices of the norms $\|f\|_{1,1,1}$ and $\|f\|_{1,1,\infty}$
correspond to the indices of two Besov norms (9.32), calculated at scales
$2^j > N^{-1}$. The two Besov balls
$$\Theta_{1,1,1} = \{ f \,:\, \|f\|_{1,1,1} \le C\, B^{-1} \} \quad \text{and} \quad \Theta_{1,1,\infty} = \{ f \,:\, \|f\|_{1,1,\infty} \le C\, A^{-1} \}$$
are clearly orthosymmetric in the wavelet basis. Proposition 10.10
proves that
$$\Theta_{1,1,1} \subset \Theta_V \subset \Theta_{1,1,\infty} . \tag{10.113}$$
Proposition 10.6 shows that a thresholding risk is nearly minimax over
orthosymmetric sets. The following theorem derives a similar result
over $\Theta_V$ by using the orthosymmetric embedding (10.113).
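Theorem 10.9 (Donoho, Johnstone) Let $T = \sigma \sqrt{2 \log_e N}$. There exist $A_1, B_1 > 0$ such that if $1 \le C/\sigma \le N$ then
$$A_1 \Big( \frac{C}{\sigma} \Big)^{2/3} \frac{1}{N^{2/3}} \le \frac{r_n(\Theta_V)}{N \sigma^2} \le \frac{r_t(\Theta_V)}{N \sigma^2} \le B_1 \Big( \frac{C}{\sigma} \Big)^{2/3} \frac{\log_e N}{N^{2/3}} . \tag{10.114}$$

Proof$^3$. Since $\Theta_{1,1,1}$ and $\Theta_{1,1,\infty}$ are orthosymmetric, Proposition 10.6
proves that
$$\frac{1}{1.25}\, r_{\inf}(\Theta_{1,1,1}) \le r_n(\Theta_{1,1,1})$$
and
$$r_t(\Theta_{1,1,\infty}) \le (2 \log_e N + 1) \left( \sigma^2 + r_{\inf}(\Theta_{1,1,\infty}) \right) .$$
But $\Theta_{1,1,1} \subset \Theta_V \subset \Theta_{1,1,\infty}$ so
$$\frac{1}{1.25}\, r_{\inf}(\Theta_{1,1,1}) \le r_n(\Theta_V) \le r_t(\Theta_V) \le (2 \log_e N + 1) \left( \sigma^2 + r_{\inf}(\Theta_{1,1,\infty}) \right) .$$
The double inequality (10.114) is proved by verifying that
$$\frac{r_{\inf}(\Theta_{1,1,1})}{N \sigma^2} \sim \frac{r_{\inf}(\Theta_{1,1,\infty})}{N \sigma^2} \sim \Big( \frac{C}{\sigma} \Big)^{2/3} \frac{1}{N^{2/3}} . \tag{10.115}$$
Let us first compute an upper bound of $r_{\inf}(\Theta_{1,1,\infty})$. If $f \in \Theta_{1,1,\infty}$
then (10.112) shows that
$$\sum_{m=0}^{2^{-j}-1} 2^{-j/2}\, |\langle f, \psi_{j,m} \rangle| \le \frac{C}{A}\, N^{1/2} .$$
As in (9.38), we derive that there exists $A'$ such that the sorted wavelet
coefficients $f_B^r[k]$ of $f$ satisfy
$$|f_B^r[k]| \le A'\, C\, N^{1/2}\, k^{-3/2} .$$
Let $\Theta_{C',s} = \{ f : |f_B^r[k]| \le C'\, k^{-s} \}$ with $s = 3/2$ and $C' = A'\, C\, N^{1/2}$. Since $\Theta_{1,1,\infty} \subset \Theta_{C',s}$, Proposition 10.7 shows that
$$r_{\inf}(\Theta_{1,1,\infty}) \le r_{\inf}(\Theta_{C',s}) \sim (C\, N^{1/2})^{2/3}\, \sigma^{2-2/3} . \tag{10.116}$$
To compute a lower bound of $r_{\inf}(\Theta_{1,1,1})$, we define a subset $\Theta_l \subset \Theta_{1,1,1}$
of signals $f$ such that $\langle f, \psi_{j,m} \rangle = 0$ for $j \ne l$ and, when $j = l$,
$$\sum_{m=0}^{2^{-l}-1} |\langle f, \psi_{l,m} \rangle| \le \frac{C\, N^{1/2}\, 2^{l/2}}{B} = C_l .$$
Over these $2^{-l}$ non-zero wavelet coefficients, this set $\Theta_l$ is identical to a
set $\tilde\Theta_{C'',s}$ defined in (10.104), for $s = 1$ and $C'' = C_l$. Proposition 10.7
proves that
$$r_{\inf}(\Theta_{1,1,1}) \ge r_{\inf}(\Theta_l) \sim C\, N^{1/2}\, 2^{l/2}\, \sigma . \tag{10.117}$$
Since $1 \le C/\sigma \le N$ one can choose $-\log_2 N \le l \le 0$ such that
$$2^{-l} < \Big( \frac{C\, N^{1/2}}{\sigma} \Big)^{2/3} < 2^{-l+1} .$$
So
$$r_{\inf}(\Theta_{1,1,1}) \ge r_{\inf}(\Theta_l) \sim (C\, N^{1/2})^{2/3}\, \sigma^{2-2/3} . \tag{10.118}$$
Since $r_{\inf}(\Theta_{1,1,1}) \le r_{\inf}(\Theta_{1,1,\infty})$, we derive (10.115) from (10.116) and
(10.118).

This theorem proves that for bounded variation signals, the thresholding risk in a wavelet basis is close to the minimax risk $r_n(\Theta_V)$. The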
theorem proof can be refined [168] to show that
$$\frac{r_n(\Theta_V)}{N \sigma^2} \sim \Big( \frac{C}{\sigma} \Big)^{2/3} \frac{1}{N^{2/3}} \quad \text{and} \quad \frac{r_t(\Theta_V)}{N \sigma^2} \sim \Big( \frac{C}{\sigma} \Big)^{2/3} \frac{\log_e N}{N^{2/3}} .$$
The loss of a factor $\log_e N$ in the thresholding risk is due to a threshold
choice $T = \sigma \sqrt{2 \log_e N}$ that is too high at large scales. If the wavelet
coefficients are thresholded with different thresholds $T_j$ that are optimized for scale $2^j$ then the $\log_e N$ factor disappears [170, 227]. In
this case, when $N$ increases, $r_t(\Theta_V)$ and $r_n(\Theta_V)$ have equivalent decay.
For a soft thresholding, the thresholds $T_j$ can be calculated with the
SURE estimator (10.66). We restrict the study to bounded variation
signals because they have a simple characterization, but the minimax
and thresholding risks can also be calculated in balls of any Besov space,
leading to similar near-optimality results [170].

Bounded Variation Images. We now study the estimation of bounded
variation images of $N^2$ pixels, which we will assume to be periodic to
avoid border problems. The total variation is defined in (2.70):
$$\|f\|_V = \frac{1}{N} \sum_{n_1, n_2 = 0}^{N-1} \Big( \big| f[n_1, n_2] - f[n_1 - 1, n_2] \big|^2 + \big| f[n_1, n_2] - f[n_1, n_2 - 1] \big|^2 \Big)^{1/2} .$$
Let $\Theta_V$ be the set of images that have a total variation bounded by $C$:
$$\Theta_V = \{ f \,:\, \|f\|_V \le C \} .$$
In one dimension, Theorem 10.7 proves that if $\Theta$ is translation invariant, then the linear minimax risk $r_l(\Theta)$ is reached by an estimator
that is diagonal in the discrete Fourier basis, and thus corresponds to
a circular convolution. This result remains valid for images, and is
proved similarly. Since $\Theta_V$ is translation invariant, the minimax linear
estimator can be written as a circular convolution. The next theorem
computes the linear minimax risk $r_l(\Theta_V)$. It is compared with $r_n(\Theta_V)$
and with the maximum risk $r_t(\Theta_V)$ obtained with a thresholding estimator in a separable wavelet basis.

Theorem 10.10 (Donoho, Johnstone) There exists $A > 0$ such that
if $1 \le C/\sigma \le N$ then
$$A \le \frac{r_l(\Theta_V)}{N^2 \sigma^2} \le 1 . \tag{10.119}$$
Let $T = \sigma \sqrt{2 \log_e N^2}$. There exist $A_1, B_1 > 0$ such that if $N^{-1} \le C/\sigma \le N$ then
$$A_1\, \frac{C}{\sigma}\, \frac{1}{N} \le \frac{r_n(\Theta_V)}{N^2 \sigma^2} \le \frac{r_t(\Theta_V)}{N^2 \sigma^2} \le B_1\, \frac{C}{\sigma}\, \frac{\log_e N}{N} . \tag{10.120}$$

Proof$^3$. The linear and non-linear minimax risks are calculated by showing that $\Theta_V$ can be embedded in two sets that are orthosymmetric in a
separable wavelet basis. For this purpose, we establish upper and lower
bounds of $\|f\|_V$ from the wavelet coefficients of $f$.
To simplify notation, we denote by $B = \{g_m\}_{0 \le m < N^2}$ the orthonormal wavelet basis. Let $f_B[m]$ be the wavelet coefficients of $f$, and $f_B^r[k]$
be the sorted coefficients in decreasing amplitude order. This sorting
excludes the large scale "scaling coefficients" that carry the low frequencies of the signal. The discrete version of Theorem 9.7 proves that there
exist $A, B > 0$ such that for all $N > 0$
$$\|f\|_V \le B\, N^{-1} \sum_{m=0}^{N^2-1} |f_B[m]| \tag{10.121}$$
and
$$\|f\|_V \ge A\, N^{-1}\, k\, |f_B^r[k]| . \tag{10.122}$$
The factors $N^{-1}$ of these inequalities come from the fact that the total
variation of a two-dimensional wavelet $g_m$ satisfies $\|g_m\|_V \sim N^{-1}$.
Let $\Theta_1$ and $\Theta_2$ be the two sets defined by
$$\Theta_1 = \Big\{ f \,:\, \sum_{m=0}^{N^2-1} |f_B[m]| \le C\, N\, B^{-1} \Big\}$$
and
$$\Theta_2 = \Big\{ f \,:\, |f_B^r[k]| \le C\, N\, A^{-1}\, k^{-1} \Big\} .$$
These sets are clearly orthosymmetric in the wavelet basis $B$. The upper
bound (10.121) and lower bound (10.122) prove that
$$\Theta_1 \subset \Theta_V \subset \Theta_2 . \tag{10.123}$$
Let us now compute upper and lower bounds for the linear minimax
risk $r_l(\Theta_V)$. The trivial estimator $\tilde F = X$ has a risk equal to $E\{\|W\|^2\} = N^2 \sigma^2$, so $r_l(\Theta_V) \le N^2 \sigma^2$. To get a lower bound, we use the fact that
$\Theta_1 \subset \Theta_V$ so $r_l(\Theta_V) \ge r_l(\Theta_1)$. Since $\Theta_1$ is orthosymmetric in the wavelet
basis $B$, Proposition 10.6 proves in (10.99) that
$$r_l(\Theta_1) = r_{\inf}(QH[\Theta_1]) .$$
We also derive from (10.102) that
$$QH[\Theta_1] = \Big\{ f \,:\, \sum_{m=0}^{N^2-1} |f_B[m]|^2 \le C^2\, N^2\, B^{-2} \Big\} .$$
Since $C/\sigma \ge 1$ we can choose $f_B[m] = \sigma/B$ and $f \in QH[\Theta_1]$. Hence
$$r_l(\Theta_V) \ge r_{\inf}(QH[\Theta_1]) \ge r_{\inf}(f) = N^2 \sigma^2\, \frac{B^{-2}}{B^{-2} + 1} .$$
This proves the lower bound of (10.119).
The non-linear minimax and thresholding risks are calculated by applying Proposition 10.6 to the orthosymmetric sets $\Theta_1$ and $\Theta_2$. Since
$\Theta_1 \subset \Theta_V \subset \Theta_2$,
$$\frac{1}{1.25}\, r_{\inf}(\Theta_1) \le r_n(\Theta_V) \le r_t(\Theta_V) \le (2 \log_e N^2 + 1) \left( \sigma^2 + r_{\inf}(\Theta_2) \right) . \tag{10.124}$$
Proposition 10.7 allows us to compute $r_{\inf}(\Theta_1)$ and $r_{\inf}(\Theta_2)$ since
$\Theta_1 = \tilde\Theta_{C_1,s}$ and $\Theta_2 = \Theta_{C_2,s}$ with $s = 1$,
$C_1 = C\, N\, B^{-1}$ and $C_2 = C\, N\, A^{-1}$. Since
$$N^{-1} \le C/\sigma \le N \tag{10.125}$$
the calculation (10.105) applies and proves that
$$r_{\inf}(\Theta_1) \sim r_{\inf}(\Theta_2) \sim C\, N\, \sigma . \tag{10.126}$$
We thus derive (10.120) from (10.124) and (10.126).

This theorem proves that the linear minimax risk reduces the noise
energy by at most a constant that is independent of $\sigma$, $C$ and $N$. The
normalized risk of a thresholding estimator decays like $N^{-1} \log_e N$ and
is thus much smaller than the linear minimax risk when $N$ is large.
As in one dimension, if the threshold $T = \sigma \sqrt{2 \log_e N^2}$ is replaced by
thresholds $T_j$ that are optimized at each scale, then the $\log_e N$ term
disappears [170, 227] and
$$\frac{r_t(\Theta_V)}{N^2 \sigma^2} \sim \frac{r_n(\Theta_V)}{N^2 \sigma^2} \sim \frac{C}{\sigma}\, \frac{1}{N} . \tag{10.127}$$
For a soft thresholding, the thresholds $T_j$ can be calculated with the
SURE estimator (10.66).

10.4 Restoration$^3$
Measurement devices can introduce important distortions and add noise
to the original signal. Inverting the degradation is often numerically
unstable and thus amplifies the noise considerably. The signal estimation must be performed with a high amplitude noise that is not white.
Deconvolutions are generic examples of such unstable inverse problems.
Section 10.4.1 studies the estimation of signals contaminated by non-white
Gaussian noises. It shows that thresholding estimators are quasi-minimax optimal if the basis nearly diagonalizes the covariance of the
noise and provides sparse signal representations. Inverse problems and
deconvolutions are studied in Section 10.4.2, with an application to the
removal of blur in satellite images.

10.4.1 Estimation in Arbitrary Gaussian Noise

The signal $f$ is contaminated by an additive Gaussian noise $Z$:
$$X = f + Z .$$
The random vector $Z$ is characterized by its covariance operator $K$,
and we suppose that $E\{Z[n]\} = 0$. When this noise is white, Section
10.3.2 proves that diagonal estimators in an orthonormal basis $B = \{g_m\}_{0 \le m < N}$ are nearly minimax optimal if the basis provides a sparse
signal representation. When the noise is not white, the coefficients of
the noise have a variance that depends on each $g_m$:
$$\sigma_m^2 = E\{|Z_B[m]|^2\} = \langle K g_m, g_m \rangle .$$
The basis choice must therefore depend on the covariance $K$.

Diagonal Estimation. We study the risk of estimators that are diagonal in $B$:
$$\tilde F = DX = \sum_{m=0}^{N-1} d_m(X_B[m])\, g_m . \tag{10.128}$$
If $d_m(X_B[m]) = a[m]\, X_B[m]$, we verify as in (10.28) that the minimum
risk $E\{\|\tilde F - f\|^2\}$ is achieved by an oracle attenuation:
$$a[m] = \frac{|f_B[m]|^2}{\sigma_m^2 + |f_B[m]|^2} \tag{10.129}$$
and
$$E\{\|\tilde F - f\|^2\} = r_{\inf}(f) = \sum_{m=0}^{N-1} \frac{\sigma_m^2\, |f_B[m]|^2}{\sigma_m^2 + |f_B[m]|^2} . \tag{10.130}$$
Over a signal set $\Theta$, the maximum risk of an oracle attenuation is
$r_{\inf}(\Theta) = \sup_{f \in \Theta} r_{\inf}(f)$. An oracle attenuation cannot be implemented
because $a[m]$ depends on $|f_B[m]|$, which is not known, so $r_{\inf}(\Theta)$ is only
a lower bound for the minimax risk of diagonal estimators. However,
a simple thresholding estimator has a maximum risk that is close to
$r_{\inf}(\Theta)$. We begin by studying linear diagonal estimators $D$, where
each $a[m]$ is a constant. The following proposition computes an upper
bound of the minimax linear risk. The quadratic convex hull $QH[\Theta]$ of
$\Theta$ is defined in (10.78).

Proposition 10.11 Let $\Theta$ be a closed and bounded set. There exists
$x \in QH[\Theta]$ such that $r_{\inf}(x) = r_{\inf}(QH[\Theta])$. If $D$ is the linear operator
defined by
$$a[m] = \frac{|x_B[m]|^2}{\sigma_m^2 + |x_B[m]|^2} \tag{10.131}$$
then
$$r_l(\Theta) \le r(D, \Theta) = r_{\inf}(QH[\Theta]) . \tag{10.132}$$

Proof$^2$. Let $r_{l,d}(\Theta)$ be the minimax risk obtained over linear operators
that are diagonal in $B$. Clearly $r_l(\Theta) \le r_{l,d}(\Theta)$. The same derivations
as in Theorem 10.6 prove that the diagonal operator defined by (10.131)
satisfies
$$r(D, \Theta) = r_{l,d}(\Theta) = r_{\inf}(QH[\Theta]) .$$
Hence (10.132).

Among non-linear diagonal estimators, we concentrate on thresholding
estimators:
$$\tilde F = \sum_{m=0}^{N-1} \rho_{T_m}(X_B[m])\, g_m \tag{10.133}$$
where $\rho_T(x)$ is a hard or soft thresholding function. The threshold $T_m$
is adapted to the noise variance $\sigma_m^2$ in the direction of $g_m$. Proposition 10.4 computes an upper bound of the risk $r_t(f)$ when $T_m = \sigma_m \sqrt{2 \log_e N}$. If the signals belong to a set $\Theta$, the threshold values are
improved by considering the maximum of signal coefficients:
$$s_B[m] = \sup_{f \in \Theta} |f_B[m]| .$$
If $s_B[m] \le \sigma_m$ then setting $X_B[m]$ to zero yields a risk $|f_B[m]|^2$ that is
always smaller than the risk $\sigma_m^2$ of keeping it. This is done by choosing
$T_m = \infty$ to guarantee that $\rho_{T_m}(X_B[m]) = 0$. Thresholds are therefore
defined by
$$T_m = \begin{cases} \sigma_m \sqrt{2 \log_e N} & \text{if } \sigma_m < s_B[m] \\ \infty & \text{if } \sigma_m \ge s_B[m] . \end{cases} \tag{10.134}$$
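
The threshold rule (10.134) translates directly into code. In the sketch below, the function names and the numerical values of $\sigma_m$, $s_B[m]$ and $X_B[m]$ are illustrative assumptions; `math.inf` plays the role of $T_m = \infty$, so that the corresponding coefficient is always set to zero by a hard thresholding.

```python
import math


def thresholds(sigmas, s_max):
    """Thresholds (10.134): sigma_m sqrt(2 ln N) if sigma_m < s_B[m], else infinity."""
    N = len(sigmas)
    factor = math.sqrt(2.0 * math.log(N))
    return [sig * factor if sig < s else math.inf for sig, s in zip(sigmas, s_max)]


def hard_threshold(x, T):
    """Hard thresholding: keep x if |x| > T, otherwise return zero."""
    return x if abs(x) > T else 0.0


sigmas = [0.5, 1.0, 2.0, 4.0]  # noise standard deviations sigma_m (assumed)
s_max = [3.0, 3.0, 3.0, 3.0]   # s_B[m] = sup over Theta of |f_B[m]| (assumed)
noisy = [5.0, 0.2, 4.0, 9.0]   # observed coefficients X_B[m] (assumed)

T = thresholds(sigmas, s_max)
estimate = [hard_threshold(x, t) for x, t in zip(noisy, T)]
print(T, estimate)
```

The last coefficient is discarded even though its amplitude is large, because its noise level exceeds the largest possible signal amplitude in that direction.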
Proposition 10.12 For the thresholds (10.134), the risk of a thresholding estimator satisfies, for $N \ge 4$,
$$r_t(\Theta) \le (2 \log_e N + 1) \left( \bar\sigma^2 + r_{\inf}(\Theta) \right) \tag{10.135}$$
with $\bar\sigma^2 = \frac{1}{N} \sum_{\sigma_m < s_B[m]} \sigma_m^2$.

Proof$^2$. The thresholding risk $r_t(f)$ is calculated by considering separately the case $T_m = \infty$, which produces a risk of $|f_B[m]|^2$, from the case $T_m < \infty$:
$$r_t(f) = \sum_{\sigma_m \ge s_B[m]} |f_B[m]|^2 + \sum_{\sigma_m < s_B[m]} E\{ |f_B[m] - \rho_{T_m}(X_B[m])|^2 \} . \tag{10.136}$$
A slight modification [167] of the proof of Theorem 10.4 shows that
$$E\{ |f_B[m] - \rho_{T_m}(X_B[m])|^2 \} \le (2 \log_e N + 1) \left( \frac{\sigma_m^2}{N} + \frac{\sigma_m^2\, |f_B[m]|^2}{\sigma_m^2 + |f_B[m]|^2} \right) . \tag{10.137}$$
If $\sigma_m \ge s_B[m]$ then $|f_B[m]|^2 \le 2\, \sigma_m^2\, |f_B[m]|^2\, (\sigma_m^2 + |f_B[m]|^2)^{-1}$, so inserting (10.137) in (10.136) proves (10.135).

This proposition proves that the risk of a thresholding estimator is
not much above $r_{\inf}(\Theta)$. It now remains to understand under what
conditions the minimax risk $r_n(\Theta)$ is also on the order of $r_{\inf}(\Theta)$.

Nearly Diagonal Covariance. To estimate efficiently a signal with
a diagonal operator, the basis $B$ must provide a sparse representation of signals in $\Theta$ but it must also transform the noise into "nearly" independent coefficients. Since the noise $Z$ is Gaussian, it is sufficient to have "nearly" uncorrelated coefficients, which means that the covariance $K$
of $Z$ is "nearly" diagonal in $B$. This approximate diagonalization is
measured by preconditioning $K$ with its diagonal. We denote by $K_d$
the diagonal operator in the basis $B$, whose diagonal coefficients are
equal to the diagonal coefficients $\sigma_m^2$ of $K$. We suppose that $K$ has no
eigenvalue equal to zero, because the noise would then be zero in this
direction, in which case the estimation is trivial. Let $K^{-1}$ be the inverse of $K$, and $K_d^{1/2}$ be the diagonal matrix whose coefficients are the
square root of the diagonal coefficients of $K_d$. The following theorem
computes lower bounds of the minimax risks with a preconditioning
factor defined with the operator sup norm $\| \cdot \|_S$ introduced in (A.16).

Theorem 10.11 (Donoho, Kalifa, Mallat) The preconditioning factor satisfies
$$\lambda_B = \| K_d^{1/2}\, K^{-1}\, K_d^{1/2} \|_S \ge 1 . \tag{10.138}$$
If $\Theta$ is orthosymmetric in $B$ then
$$r_l(\Theta) \ge \frac{1}{\lambda_B}\, r_{\inf}(QH[\Theta]) \tag{10.139}$$
and
$$r_n(\Theta) \ge \frac{1}{1.25\, \lambda_B}\, r_{\inf}(\Theta) . \tag{10.140}$$

Proof$^3$. The proof considers first the particular case where $K$ is diagonal. If $K$ is diagonal in $B$ then the coefficients $Z_B[m]$ are independent
Gaussian random variables of variance $\sigma_m^2$. Estimating $f \in \Theta$ from
$X = f + Z$ is equivalent to estimating $f_0$ from $X_0 = f_0 + Z_0$ where
$$Z_0 = \sum_{m=0}^{N-1} \frac{Z_B[m]}{\sigma_m}\, g_m , \quad X_0 = \sum_{m=0}^{N-1} \frac{X_B[m]}{\sigma_m}\, g_m , \quad f_0 = \sum_{m=0}^{N-1} \frac{f_B[m]}{\sigma_m}\, g_m . \tag{10.141}$$
The signal $f_0$ belongs to an orthosymmetric set $\Theta_0$ and the renormalized
noise $Z_0$ is a Gaussian white noise of variance 1. Proposition 10.6 applies
to the estimation problem $X_0 = f_0 + Z_0$. By reinserting the value of the
renormalized noise and signal coefficients, we derive that
$$r_n(\Theta) \ge \frac{1}{1.25}\, r_{\inf}(\Theta) \quad \text{and} \quad r_l(\Theta) = r_{\inf}(QH[\Theta]) . \tag{10.142}$$
To prove the general case we use inequalities over symmetrical matrices. If $A$ and $B$ are two symmetric matrices, we write $A \ge B$ if the
eigenvalues of $A - B$ are positive, which means that $\langle Af, f \rangle \ge \langle Bf, f \rangle$
for all $f \in \mathbb{C}^N$. Since $\lambda_B$ is the largest eigenvalue of $K_d^{1/2} K^{-1} K_d^{1/2}$, the
inverse $\lambda_B^{-1}$ is the smallest eigenvalue of the inverse $K_d^{-1/2} K K_d^{-1/2}$. It
follows that $\langle K_d^{-1/2} K K_d^{-1/2} f, f \rangle \ge \lambda_B^{-1} \langle f, f \rangle$. By setting $g = K_d^{-1/2} f$
we get $\langle Kg, g \rangle \ge \lambda_B^{-1} \langle K_d^{1/2} g, K_d^{1/2} g \rangle$. Since this is valid for all $g \in \mathbb{C}^N$,
we derive that
$$K \ge \lambda_B^{-1}\, K_d . \tag{10.143}$$
Observe that $\lambda_B \ge 1$ because $\langle K g_m, g_m \rangle = \langle K_d g_m, g_m \rangle$. Lower bounds
for the minimax risks are proved as a consequence of the following lemma.

Lemma 10.2 Consider the two estimation problems $X_i = f + Z_i$ for $i = 1, 2$, where $K_i$ is the covariance of the Gaussian noise $Z_i$. We denote
by $r_{i,n}(\Theta)$ and $r_{i,l}(\Theta)$ the non-linear and linear minimax risks for each
estimation problem $i = 1, 2$. If $K_1 \ge K_2$ then
$$r_{1,n}(\Theta) \ge r_{2,n}(\Theta) \quad \text{and} \quad r_{1,l}(\Theta) \ge r_{2,l}(\Theta) . \tag{10.144}$$
Since $K_1 \ge K_2$ one can write $Z_1 = Z_2 + Z_3$ where $Z_2$ and $Z_3$ are two
independent Gaussian random vectors and the covariance of $Z_3$ is $K_3 = K_1 - K_2 \ge 0$. We denote by $\pi_i$ the Gaussian probability distribution
of $Z_i$. To any estimator $\tilde F_1 = D_1 X_1$ of $f$ from $X_1$ we can associate an
estimator $\tilde F_2$, calculated by augmenting the noise with $Z_3$ and computing
the average with respect to its probability distribution:
$$\tilde F_2 = D_2 X_2 = E_{\pi_3}\{ D_1 (X_2 + Z_3) \} = E_{\pi_3}\{ D_1 X_1 \} .$$
The risk is
$$E_{\pi_2}\{ |D_2 X_2 - f|^2 \} = E_{\pi_2}\{ |E_{\pi_3}\{ D_1 X_1 \} - f|^2 \} \le E_{\pi_2}\{ E_{\pi_3}\{ |D_1 X_1 - f|^2 \} \} = E_{\pi_1}\{ |D_1 X_1 - f|^2 \} .$$
To any estimator $\tilde F_1 = D_1 X_1$ we can thus associate an estimator $\tilde F_2 = D_2 X_2$ of lower risk for all $f \in \Theta$. Taking a sup over all $f \in \Theta$ and the
infimum over linear or non-linear operators proves (10.144).
Since $K \ge \lambda_B^{-1} K_d$, Lemma 10.2 proves that the estimation problem
with the noise $Z$ of covariance $K$ has a minimax risk that is larger than
the minimax risk of the estimation problem with a noise of covariance
$\lambda_B^{-1} K_d$. But since this covariance is diagonal we can apply (10.142).
The definition of $r_{\inf}(\Theta)$ is the same for a noise of covariance $K$ and for
a noise of covariance $K_d$ because $\sigma_m^2 = \langle K g_m, g_m \rangle = \langle K_d g_m, g_m \rangle$. When
multiplying $K_d$ by a constant $\lambda_B^{-1} \le 1$, the value $r_{\inf}(\Theta)$ that appears
in (10.142) is modified into $r'_{\inf}(\Theta)$ with $r'_{\inf}(\Theta) \ge \lambda_B^{-1}\, r_{\inf}(\Theta)$. We thus
derive (10.140) and (10.139).

One can verify that $\lambda_B = 1$ if and only if $K = K_d$ and hence that $K$ is
diagonal in $B$. The closer $\lambda_B$ is to 1, the more diagonal $K$. The main
difficulty is to find a basis $B$ that nearly diagonalizes the covariance
of the noise and provides sparse signal representations so that $\Theta$ is
orthosymmetric or can be embedded in two close orthosymmetric sets.
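
For a two-dimensional noise the preconditioning factor can be computed in closed form, which gives a feel for how $\lambda_B$ measures the distance to a diagonal covariance. The function below is a sketch written for this illustration, restricted to a real symmetric positive definite $2 \times 2$ covariance expressed in $B$; the test covariances are assumed values.

```python
import math


def preconditioning_factor_2x2(k11, k12, k22):
    """Largest eigenvalue of Kd^{1/2} K^{-1} Kd^{1/2} for K = [[k11, k12], [k12, k22]]."""
    det = k11 * k22 - k12 * k12
    if k11 <= 0.0 or det <= 0.0:
        raise ValueError("K must be positive definite")
    # K^{-1} = (1 / det) [[k22, -k12], [-k12, k11]] and Kd = diag(k11, k22).
    m11 = k11 * k22 / det
    m22 = k11 * k22 / det
    m12 = -k12 * math.sqrt(k11 * k22) / det
    # Largest eigenvalue of the symmetric matrix [[m11, m12], [m12, m22]].
    mean = 0.5 * (m11 + m22)
    radius = math.sqrt((0.5 * (m11 - m22)) ** 2 + m12 ** 2)
    return mean + radius


print(preconditioning_factor_2x2(1.0, 0.0, 1.0))   # diagonal covariance
print(preconditioning_factor_2x2(1.0, 0.5, 1.0))   # correlation 0.5
print(preconditioning_factor_2x2(1.0, 0.99, 1.0))  # nearly singular covariance
```

In this $2 \times 2$ case the factor reduces to $1/(1-|\rho|)$, where $\rho$ is the correlation coefficient of the two noise coefficients, so it equals 1 only for uncorrelated coefficients and grows without bound as the correlation approaches 1.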
An upper bound of $r_l(\Theta)$ is computed in (10.132) with a linear
diagonal operator, and together with (10.139) we get
$$\frac{1}{\lambda_B}\, r_{\inf}(QH[\Theta]) \le r_l(\Theta) \le r_{\inf}(QH[\Theta]) . \tag{10.145}$$
Similarly, an upper bound of $r_n(\Theta)$ is calculated with the thresholding
risk calculated by Proposition 10.12. With the lower bound (10.140)
we obtain
$$\frac{1}{1.25\, \lambda_B}\, r_{\inf}(\Theta) \le r_n(\Theta) \le r_t(\Theta) \le (2 \log_e N + 1) \left( \bar\sigma^2 + r_{\inf}(\Theta) \right) . \tag{10.146}$$
If the basis $B$ nearly diagonalizes $K$ so that $\lambda_B$ is on the order of 1 then
$r_l(\Theta)$ is on the order of $r_{\inf}(QH[\Theta])$, whereas $r_n(\Theta)$ and $r_t(\Theta)$ are on the
order of $r_{\inf}(\Theta)$. If $\Theta$ is quadratically convex then $\Theta = QH[\Theta]$ so the
linear and non-linear minimax risks are close. If $\Theta$ is not quadratically
convex then a thresholding estimation in $B$ may significantly outperform an optimal linear estimation.

10.4.2 Inverse Problems and Deconvolution

The measurement of a discrete signal $f$ of size $N$ is degraded by a linear
operator U and a Gaussian white noise W of variance 2 is added: Y = Uf + W : (10.147) 10.4. RESTORATION 3 649 We suppose that U and 2 have been calculated through a calibration
procedure. The restoration problem is transformed into a denoising
problem by inverting the degradation. We can then apply linear or
non-linear diagonal estimators studied in the previous section. When
the inverse U ;1 is not bounded, the noise is ampli ed by a factor that
tends to in nity. This is called an ill-posed inverse problem 96, 323]
The case where U is a convolution operator is studied in more detail
with an application to satellite images.

Pseudo Inverse  The degradation $U$ is inverted with the pseudo-inverse defined in Section 5.1.2. Let $V = \mathrm{Im}\,U$ be the image of $U$ and
$V^\perp$ be its orthogonal complement. The pseudo-inverse $\tilde U^{-1}$ of $U$ is the
left inverse whose restriction to $V^\perp$ is zero. The restoration is said to
be unstable if
$$\lim_{N\to+\infty} \|\tilde U^{-1}\|_S^2 = +\infty.$$
Estimating $f$ from $Y$ is equivalent to estimating it from
$$X = \tilde U^{-1} Y = \tilde U^{-1} U f + \tilde U^{-1} W. \tag{10.148}$$
The operator $\tilde U^{-1} U = P_V$ is an orthogonal projection on $V$ so
$$X = P_V f + Z \quad\text{with}\quad Z = \tilde U^{-1} W. \tag{10.149}$$
The noise $Z$ is not white but remains Gaussian because $\tilde U^{-1}$ is linear. It
is considerably amplified when the problem is unstable. The covariance
operator $K$ of $Z$ is
$$K = \sigma^2\, \tilde U^{-1}\, \tilde U^{-1*}, \tag{10.150}$$
where $A^*$ is the adjoint of an operator $A$.
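The restoration (10.148) and the covariance (10.150) can be illustrated numerically. The following is a minimal sketch, not an implementation taken from the text: it assumes that the Moore-Penrose pseudo-inverse returned by `numpy.linalg.pinv` is an acceptable stand-in for $\tilde U^{-1}$, and the diagonal operator `U`, its gains, and the function name `pseudo_inverse_restoration` are hypothetical choices made for the illustration.

```python
import numpy as np

def pseudo_inverse_restoration(U, Y, sigma):
    """Return X = U^+ Y and K = sigma^2 U^+ (U^+)^*, the covariance of Z = U^+ W."""
    U_pinv = np.linalg.pinv(U)                 # Moore-Penrose pseudo-inverse (assumption)
    X = U_pinv @ Y
    K = sigma**2 * U_pinv @ U_pinv.conj().T
    return X, K

rng = np.random.default_rng(0)
N = 8
# Hypothetical degradation: a diagonal operator whose last gain is zero,
# so that Im U is a strict subspace and U has no bounded inverse.
gains = np.array([1.0, 0.9, 0.7, 0.5, 0.3, 0.1, 0.01, 0.0])
U = np.diag(gains)
f = rng.standard_normal(N)
sigma = 0.01
Y = U @ f + sigma * rng.standard_normal(N)

X, K = pseudo_inverse_restoration(U, Y, sigma)
noise_var = np.diag(K)   # sigma^2 / gain^2 where gain != 0, and 0 where gain == 0
```

With these values the noise variance grows from $\sigma^2$ on the first coordinate to $\sigma^2 / 0.01^2$ on the seventh one. On the last coordinate the numerical pseudo-inverse returns zero, which corresponds to the direction where all information on $f$ is lost.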
To simplify notation, we formally rewrite (10.149) as a standard
denoising problem:
$$X = f + Z, \tag{10.151}$$
while considering that the projection of $Z$ in $V^\perp$ is a noise of infinite
energy to express the loss of all information concerning the projection
of $f$ in $V^\perp$. It is equivalent to write formally $Z = U^{-1} W$.
Let $B = \{g_m\}_{0 \le m < N}$ be an orthonormal basis such that a subset of
its vectors defines a basis of $V = \mathrm{Im}\,U$. The coefficients of the noise
have a variance $\sigma_m^2 = E\{|Z_B[m]|^2\}$, and we set $\sigma_m = \infty$ if $g_m \in V^\perp$.
An oracle attenuation (10.129) yields a lower bound for the risk
$$r_{\inf}(f) = \sum_{m=0}^{N-1} \frac{\sigma_m^2\, |f_B[m]|^2}{\sigma_m^2 + |f_B[m]|^2}. \tag{10.152}$$
The loss of the projection of $f$ in $V^\perp$ appears in the terms
$$\frac{\sigma_m^2\, |f_B[m]|^2}{\sigma_m^2 + |f_B[m]|^2} = |f_B[m]|^2 \quad\text{if } \sigma_m = \infty.$$
Proposition 10.12 proves that a thresholding estimator in $B$ yields
a risk that is above $r_{\inf}(\Theta)$ by a factor $2\log_e N$. Theorem 10.11 relates
linear and non-linear minimax risk to $r_{\inf}(\Theta)$. Let $K_d$ be the diagonal
operator in $B$, equal to the diagonal of the covariance $K$ defined in
(10.150). The inverse of $K$ is replaced by its pseudo inverse $K^{-1} = \sigma^{-2}\, U^* U$ and the preconditioning number is
$$\lambda_B = \|K_d^{1/2} K^{-1} K_d^{1/2}\|_S = \sigma^{-2}\, \|K_d^{1/2} U^*\|_S^2.$$
Thresholding estimators have a risk $r_t(\Theta)$ that is close to $r_n(\Theta)$ if $\Theta$ is
nearly orthosymmetric in $B$ and if $\lambda_B$ is on the order of 1. The main
difficulty is to find such a basis $B$.
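The oracle risk (10.152) is simple to evaluate once the coefficients $f_B[m]$ and the variances $\sigma_m^2$ are known. The sketch below is an illustration only: the function name `oracle_risk` and the sample values are hypothetical, and the convention $\sigma_m = \infty$ for $g_m \in V^\perp$ is encoded with `np.inf`.

```python
import numpy as np

def oracle_risk(f_coeffs, noise_var):
    """Evaluate r_inf(f) = sum_m sigma_m^2 |f_B[m]|^2 / (sigma_m^2 + |f_B[m]|^2).

    noise_var[m] may be np.inf, in which case the term equals |f_B[m]|^2.
    """
    energy = np.abs(np.asarray(f_coeffs)) ** 2
    noise_var = np.asarray(noise_var, dtype=float)
    terms = np.empty(energy.shape, dtype=float)
    finite = np.isfinite(noise_var)
    denom = noise_var[finite] + energy[finite]
    safe = np.where(denom > 0, denom, 1.0)      # a 0/0 term contributes nothing
    terms[finite] = noise_var[finite] * energy[finite] / safe
    terms[~finite] = energy[~finite]            # limit sigma_m -> infinity
    return float(terms.sum())

# Hypothetical coefficients; the last basis vector is assumed to lie in the
# orthogonal complement of V, so the whole energy of f in that direction is lost.
f_coeffs = np.array([3.0, 1.0, 0.5, 2.0])
noise_var = np.array([1.0, 1.0, 0.25, np.inf])
risk = oracle_risk(f_coeffs, noise_var)
```

Each term is at most $\min(\sigma_m^2, |f_B[m]|^2)$, so a coordinate contributes little either when the noise is small or when the signal coefficient is small.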
The thresholds (10.134) define a projector that is non-zero only in
the space $V_0 \subset V$ generated by the vectors $\{g_m\}_{\sigma_m < s_B[m]}$. This means
that the calculation of $X = \tilde U^{-1} Y$ in (10.148) can be replaced by a
regularized inverse $X = P_{V_0}\, \tilde U^{-1} Y$, to avoid numerical instabilities.

Deconvolution  The restoration of signals degraded by a convolution operator $U$ is a generic inverse problem that is often encountered in
signal processing. The convolution is supposed to be circular to avoid
border problems. The goal is to estimate $f$ from
$$Y = f \circledast u + W.$$
The circular convolution is diagonal in the discrete Fourier basis
$$B = \left\{ g_m[n] = \frac{1}{\sqrt N} \exp\left(\frac{i 2\pi m n}{N}\right) \right\}_{0 \le m < N}.$$
The eigenvalues are equal to the discrete Fourier transform $\hat u[m]$, so $V = \mathrm{Im}\,U$ is the space generated
by the sinusoids $g_m$ such that $\hat u[m] \ne 0$. The pseudo inverse of $U$ is
$\tilde U^{-1} f = f \circledast \tilde u^{-1}$, where the discrete Fourier transform of $\tilde u^{-1}$ is
$$\widehat{\tilde u^{-1}}[m] = \begin{cases} 1/\hat u[m] & \text{if } \hat u[m] \ne 0 \\ 0 & \text{if } \hat u[m] = 0. \end{cases}$$
The deconvolved data are
$$X = \tilde U^{-1} Y = Y \circledast \tilde u^{-1}.$$
The noise $Z = \tilde U^{-1} W$ is circular stationary. Its covariance $K$ is a circular convolution with $\sigma^2\, \tilde u^{-1} \circledast \bar{\tilde u}^{-1}$, where $\bar{\tilde u}^{-1}[n] = \tilde u^{-1}[-n]$. The
Karhunen-Loève basis that diagonalizes $K$ is therefore the discrete
Fourier basis $B$. The eigenvalues of $K$ are $\sigma_m^2 = \sigma^2\, |\hat u[m]|^{-2}$. When
$\hat u[m] = 0$ we formally set $\sigma_m^2 = \infty$.
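Since the circular convolution is diagonal in the discrete Fourier basis, the pseudo inverse can be applied with an FFT. The sketch below is an illustration under stated assumptions: the function names, the tolerance `tol` used to decide that $\hat u[m] = 0$, and the $\cos^2$ low-pass filter are hypothetical choices, and the blur is applied without noise so that the effect of the pseudo inverse alone is visible.

```python
import numpy as np

def pseudo_inverse_filter(u_hat, tol=1e-12):
    """Transfer function of the pseudo inverse: 1/u_hat[m] if u_hat[m] != 0, else 0."""
    inv = np.zeros_like(u_hat, dtype=complex)
    nonzero = np.abs(u_hat) > tol               # numerical test for u_hat[m] != 0
    inv[nonzero] = 1.0 / u_hat[nonzero]
    return inv

def deconvolved_noise_variance(u_hat, sigma, tol=1e-12):
    """Eigenvalues sigma_m^2 = sigma^2 |u_hat[m]|^-2, set to infinity where u_hat[m] = 0."""
    var = np.full(u_hat.shape, np.inf)
    nonzero = np.abs(u_hat) > tol
    var[nonzero] = sigma**2 / np.abs(u_hat[nonzero]) ** 2
    return var

N = 64
m = np.arange(N)
u_hat = np.cos(np.pi * m / N) ** 2              # assumed low-pass filter, zero at m = N/2
rng = np.random.default_rng(1)
f = np.cumsum(rng.standard_normal(N))           # arbitrary test signal

Y = np.real(np.fft.ifft(np.fft.fft(f) * u_hat))                      # noise-free blur
X = np.real(np.fft.ifft(np.fft.fft(Y) * pseudo_inverse_filter(u_hat)))
noise_var = deconvolved_noise_variance(u_hat, sigma=0.1)
```

Here `X` recovers every Fourier coefficient of `f` except the one at $m = N/2$, where the filter vanishes, and `noise_var` shows how strongly a white noise would be amplified near that frequency.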
When the convolution filter is a low-pass filter with a zero at a high
frequency, the deconvolution problem is highly unstable. Suppose that
$\hat u[m]$ has a zero of order $p \ge 1$ at the highest frequency $m = N/2$:
$$|\hat u[m]| \sim \left| \frac{2m}{N} - 1 \right|^p. \tag{10.153}$$
The noise variance $\sigma_m^2$ has a hyperbolic growth when the frequency $m$ is
in the neighborhood of $N/2$. This is called a hyperbolic deconvolution
problem of degree $p$.

Linear Estimation  In many deconvolution problems the set $\Theta$ is
translation invariant, which means that if $g \in \Theta$ then any translation of $g$ modulo $N$ also belongs to $\Theta$. Since the amplified noise $Z$ is circular
stationary, the whole estimation problem is translation invariant. In
this case, the following theorem proves that the linear estimator that
achieves the minimax linear risk is diagonal in the discrete Fourier basis.
It is therefore a circular convolution. In the discrete Fourier basis,
$$r_{\inf}(f) = \sum_{m=0}^{N-1} \frac{\sigma_m^2\, N^{-1} |\hat f[m]|^2}{\sigma_m^2 + N^{-1} |\hat f[m]|^2}. \tag{10.154}$$
We denote by $\mathrm{QH}[\Theta]$ the quadratic convex hull of $\Theta$ in the discrete
Fourier basis.

Theorem 10.12  Let $\Theta$ be a translation invariant set. The minimax
linear risk is reached by circular convolutions and
$$r_l(\Theta) = r_{\inf}(\mathrm{QH}[\Theta]). \tag{10.155}$$

Proof$^2$. Proposition 10.11 proves that the linear minimax risk when
estimating $f \in \Theta$ from the deconvolved noisy data $X$ satisfies $r_l(\Theta) \le r_{\inf}(\mathrm{QH}[\Theta])$. The reverse inequality is obtained with the same derivations
as in the proof of Theorem 10.7. The risk $r_{\inf}(\mathrm{QH}[\Theta])$ is reached by
estimators that are diagonal in the discrete Fourier basis.

If $\Theta$ is closed and bounded, then there exists $x \in \mathrm{QH}[\Theta]$ such that
$r_{\inf}(x) = r_{\inf}(\mathrm{QH}[\Theta])$. The minimax risk is then achieved by a filter
whose transfer function $\hat d_1[m]$ is specified by (10.131). The resulting
estimator is
$$\tilde F = D_1 X = d_1 \circledast X = d_1 \circledast \tilde u^{-1} \circledast Y.$$
So $\tilde F = DY = d \circledast Y$, and one can verify (Problem 10.15) that
$$\hat d[m] = \frac{N^{-1} |\hat x[m]|^2\, \hat u^*[m]}{\sigma^2 + N^{-1} |\hat x[m]|^2\, |\hat u[m]|^2}. \tag{10.156}$$
If $\sigma_m^2 = \sigma^2 |\hat u[m]|^{-2} \ll N^{-1} |\hat x[m]|^2$ then $\hat d[m] \approx \hat u^{-1}[m]$, but if $\sigma_m^2 \gg N^{-1} |\hat x[m]|^2$ then $\hat d[m] \approx 0$. The filter $\hat d$ is thus a regularized inverse of $u$.
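The transfer function (10.156) can be evaluated directly to see the two regimes described above. This is a minimal sketch under assumptions that are not in the text: the function name `regularized_inverse_filter`, the flat prior energy `x_hat_abs2`, the $\cos^2$ filter and the value of `sigma` are all hypothetical, and `sigma` must be strictly positive for the formula to be defined where $\hat u[m] = 0$.

```python
import numpy as np

def regularized_inverse_filter(u_hat, x_hat_abs2, sigma):
    """Transfer function d_hat = a conj(u_hat) / (sigma^2 + a |u_hat|^2), with a = |x_hat|^2 / N."""
    N = len(u_hat)
    a = x_hat_abs2 / N
    return a * np.conj(u_hat) / (sigma**2 + a * np.abs(u_hat) ** 2)

N = 64
m = np.arange(N)
u_hat = np.cos(np.pi * m / N) ** 2        # assumed low-pass filter with a zero at N/2
x_hat_abs2 = np.full(N, 100.0 * N)        # hypothetical flat prior energy |x_hat[m]|^2
sigma = 1.0

d_hat = regularized_inverse_filter(u_hat, x_hat_abs2, sigma)
gain = np.abs(d_hat * u_hat)              # overall gain applied to each frequency of f
```

At low frequencies the amplified noise variance is small compared with the prior energy, so `d_hat` is close to $1/\hat u[m]$ and the gain is close to 1. Near $m = N/2$ the gain falls to 0 instead of diverging, which is the regularization.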
Theorem 10.12 can be applied to a set of signals with bounded total
variation
$$\Theta_V = \left\{ f \,:\, \|f\|_V = \sum_{n=0}^{N-1} |f[n] - f[n-1]| \le C \right\}.$$
The set $\Theta_V$ is indeed translation invariant.

Proposition 10.13  For a hyperbolic deconvolution of degree $p$, if $1 \le C/\sigma \le N$ then
$$\frac{r_l(\Theta_V)}{N\sigma^2} \sim \left( \frac{C}{N^{1/2}\sigma} \right)^{(2p-1)/p}. \tag{10.157}$$

Proof$^2$. Since $\Theta_V$ is translation invariant, Theorem 10.12 proves that
$r_l(\Theta_V) = r_{\inf}(\mathrm{QH}[\Theta_V])$. Proposition 10.5 shows in (10.89) that all $f \in \Theta_V$ have a discrete Fourier transform that satisfies
$$|\hat f[m]|^2 \le \frac{C^2}{4\, |\sin(\pi m / N)|^2} = |\hat x[m]|^2. \tag{10.158}$$
Hence $\Theta_V$ is included in the hyperrectangle $R_x$. The convex hull $\mathrm{QH}[\Theta_V]$
is thus also included in $R_x$, which is quadratically convex, and one can verify that
$$r_{\inf}(\mathrm{QH}[\Theta_V]) \le r_{\inf}(R_x) \le 2\, r_{\inf}(\mathrm{QH}[\Theta_V]). \tag{10.159}$$
The value $r_{\inf}(R_x)$ is calculated by inserting (10.158) with $\sigma_m^{-2} = \sigma^{-2} |\hat u[m]|^2$
in (10.154):
$$r_{\inf}(R_x) = \sum_{m=0}^{N-1} \frac{N^{-1} C^2 \sigma^2}{4\sigma^2\, |\sin(\pi m / N)|^2 + N^{-1} C^2\, |\hat u[m]|^2}. \tag{10.160}$$
For $|\hat u[m]| \sim |2m N^{-1} - 1|^p$, if $1 \le C/\sigma \le N$ then an algebraic calculation gives $r_{\inf}(R_x) \sim N\sigma^2\, (C\, N^{-1/2} \sigma^{-1})^{(2p-1)/p}$. So $r_l(\Theta_V) = r_{\inf}(\mathrm{QH}[\Theta_V])$
satisfies (10.157).

For a constant signal to noise ratio $C^2/(N\sigma^2) \sim 1$, (10.157) implies
that
$$\frac{r_l(\Theta_V)}{N\sigma^2} \sim 1. \tag{10.161}$$
Despite the fact that $\sigma$ decreases and $N$ increases, the normalized linear
minimax risk remains on the order of 1.

Example 10.3  Figure 10.10(a) is a signal $Y$ obtained by smoothing
a signal $f$ with the low-pass filter
$$\hat u[m] = \cos^2\left( \frac{\pi m}{N} \right). \tag{10.162}$$
This filter has a zero of order $p = 2$ at $N/2$. Figure 10.10(b) shows
the estimation $\tilde F = Y \circledast d$ calculated with the transfer function $\hat d[m]$
obtained by inserting (10.158) in (10.156). The maximum risk over $\Theta_V$
of this estimator is within a factor 2 of the linear minimax risk $r_l(\Theta_V)$.

[Figure 10.10: (a) Degraded data $Y$, blurred with the filter (10.162)
and contaminated by a Gaussian white noise (SNR = 25.0 db). (b)
Deconvolution calculated with a circular convolution estimator whose
risk is close to the linear minimax risk over bounded variation signals
(SNR = 25.8 db).]

Thresholding Deconvolution  An efficient thresholding estimator
is implemented in a basis $B$ that defines a sparse representation of signals in $\Theta_V$ and which nearly diagonalizes $K$. This approach was
introduced by Donoho [163] to study inverse problems such as inverse
Radon transforms. We concentrate on more unstable hyperbolic deconvolutions.
The covariance operator $K$ is diagonalized in the discrete Fourier
basis and its eigenvalues are
$$\sigma_k^2 = \frac{\sigma^2}{|\hat u[k]|^2} \sim \sigma^2 \left| \frac{2k}{N} - 1 \right|^{-2p}. \tag{10.163}$$
Yet the discrete Fourier basis is not appropriate for the thresholding algorithm because it does not provide efficient approximations of bounded
variation signals. In contrast, periodic wavelet bases provide efficient
approximations of such signals. We denote $\psi_{0,0}[n] = N^{-1/2}$. A
discrete and periodic orthonormal wavelet basis can be written
$$B = \{\psi_{j,m}\}_{L < j \le 0,\ 0 \le m < 2^{-j}}. \tag{10.164}$$
However, we shall see that this basis fails to approximately diagonalize
$K$.

The discrete Fourier transform $\hat\psi_{j,m}[k]$ of a wavelet has an energy
mostly concentrated in the interval $[2^{-j-1}, 2^{-j}]$, as illustrated by Figure
10.11. If $2^j > 2N^{-1}$ then over this frequency interval (10.163) shows
that the eigenvalues $\sigma_k^2$ remain on the order of $\sigma^2$. These wavelets are
therefore approximate eigenvectors of $K$. At the finest scale $2^l = 2N^{-1}$,
$|\hat\psi_{l,m}[k]|$ has an energy mainly concentrated in the higher frequency
band $[N/4, N/2]$, where $\sigma_k^2$ varies by a huge factor on the order of $N^{2p}$.
These fine scale wavelets are thus far from approximating eigenvectors
of $K$.
[Figure 10.11: Wavelets and mirror wavelets are computed with a
wavelet packet filter bank tree, where each branch corresponds to a
convolution with a filter $h$ or $g$ followed by a subsampling. The graphs
of the discrete Fourier transforms $|\hat\psi_{j,n}[k]|$ and $|\hat{\tilde\psi}_{j,n}[k]|$ are shown below
the tree. The variance $\sigma_k^2$ of the noise has a hyperbolic growth but
varies by a bounded factor on the frequency support of each mirror
wavelet.]
To construct a basis of approximate eigenvectors of $K$, the finest
scale wavelets must be replaced by wavelet packets that have a Fourier
transform concentrated in subintervals of $[N/4, N/2]$ where $\sigma_k^2$ varies
by a factor that does not grow with $N$. In order to efficiently approximate piecewise regular signals, these wavelet packets must also have
the smallest possible spatial support, and hence the largest possible frequency support. The optimal trade-off is obtained with wavelet packets
that we denote $\tilde\psi_{j,m}$, which have a discrete Fourier transform $\hat{\tilde\psi}_{j,m}[k]$
mostly concentrated in $[N/2 - 2^{-j}, N/2 - 2^{-j-1}]$, as illustrated by Figure 10.11. This basis is constructed with a wavelet packet filtering tree
that subdecomposes the space of the finest scale wavelets. These particular wavelet packets, introduced by Kalifa and Mallat [232, 233], are
called mirror wavelets because
$$|\hat{\tilde\psi}_{j,m}[k]| = |\hat\psi_{j,m}[N/2 - k]|.$$
Let $L = -\log_2 N$. A mirror wavelet basis is a wavelet packet basis
composed of wavelets $\psi_{j,m}$ at scales $2^j < 2^{L-1}$ and mirror wavelets to
replace the finest scale wavelets $2^{L-1}$:
$$B = \left\{ \psi_{j,m},\ \tilde\psi_{j,m} \right\}_{0 \le m < 2^{-j},\ L-1 < j \le 0}.$$
To prove that the covariance $K$ is "almost diagonalized" in $B$ for all
$N$, the asymptotic behavior of the discrete wavelets and mirror wavelets
must be controlled. The following theorem thus supposes that these
wavelets and wavelet packets are constructed with a conjugate mirror
filter that yields a continuous time wavelet $\psi(t)$ with $q > p$ vanishing
moments and which is $\mathbf{C}^q$. The near diagonalization is verified to prove
that a thresholding estimator in a mirror wavelet basis has a risk whose
decay is equivalent to the non-linear minimax risk.
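The reason for replacing the finest scale wavelets can be checked numerically on the hyperbolic model (10.163). The following sketch compares the variation of $\sigma_k^2$ over the band $[N/4, N/2]$ with its variation over the dyadic intervals $[N/2 - 2^{-j}, N/2 - 2^{-j-1}]$. It is an illustration only: the function names are hypothetical, the model is taken with equality rather than as an equivalence, and the frequency $k = N/2$ where the variance is infinite is excluded.

```python
import numpy as np

def noise_variance(k, N, p, sigma=1.0):
    """Hyperbolic model: sigma_k^2 = sigma^2 |2k/N - 1|^(-2p)."""
    return sigma**2 * np.abs(2.0 * k / N - 1.0) ** (-2.0 * p)

def variation(k_lo, k_hi, N, p):
    """Ratio max/min of sigma_k^2 over the integer frequencies k_lo <= k <= k_hi."""
    k = np.arange(k_lo, k_hi + 1)
    v = noise_variance(k, N, p)
    return v.max() / v.min()

p = 2
results = {}
for N in (256, 1024, 4096):
    # Frequency band of the finest scale wavelets, without k = N/2.
    wavelet_band = variation(N // 4, N // 2 - 1, N, p)
    # Mirror intervals [N/2 - w, N/2 - w/2] for the widths w = 2, 4, ..., N/4.
    widths = [2 ** i for i in range(1, int(np.log2(N)) - 1)]
    mirror_bands = max(variation(N // 2 - w, N // 2 - w // 2, N, p) for w in widths)
    results[N] = (wavelet_band, mirror_bands)
```

Under this model the first ratio equals $(N/4)^{2p}$ and grows with $N$, whereas the second equals $2^{2p}$ for every interval and every $N$.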
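Theorem 10.13 (Kalifa, Mallat)  Let $B$ be a mirror wavelet basis
constructed with a conjugate mirror filter that defines a wavelet that is
$\mathbf{C}^q$ with $q$ vanishing moments. For a hyperbolic deconvolution of degree
$p < q$, if $1 \le C/\sigma \le N^{p+1/2}$ then
$$\frac{r_n(\Theta_V)}{N\sigma^2} \sim \frac{r_t(\Theta_V)}{N\sigma^2} \sim \left( \frac{C}{N^{1/2}\sigma} \right)^{4p/(2p+1)} \left( \frac{\log_e N}{N} \right)^{1/(2p+1)}. \tag{10.165}$$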
Proof$^2$. The main ideas of the proof are outlined. We must first verify
that there exists $\lambda$ such that for all $N > 0$
$$\|K_d^{1/2} K^{-1} K_d^{1/2}\|_S \le \lambda. \tag{10.166}$$
The operator $K^{-1} = \sigma^{-2}\, U^* U$ is a circular convolution whose transfer
function is $\sigma^{-2} |\hat u[m]|^2 \sim \sigma^{-2} |2m/N - 1|^{2p}$. The matrix of this operator in
the mirror wavelet basis is identical to the matrix in the discrete wavelet
basis of a different circular convolution whose transfer function satisfies
$\sigma^{-2} |\hat u[m + N/2]|^2 \sim \sigma^{-2} |2m/N|^{2p}$. This last operator is a discretized and
periodized version of a convolution operator in $L^2(\mathbb{R})$ of transfer function
$\hat u(\omega) \sim \sigma^{-2} N^{-2p} |\omega|^{2p}$. One can prove [47, 221] that this operator is
preconditioned by its diagonal in a wavelet basis of $L^2(\mathbb{R})$ if the wavelet
has $q > p$ vanishing moments and is $\mathbf{C}^q$. We can thus derive that in the
finite case, when $N$ grows, $\|K_d^{1/2} K^{-1} K_d^{1/2}\|_S$ remains bounded.

The minimax and thresholding risk cannot be calculated directly with
the inequalities (10.146) because the set of bounded variation signals $\Theta_V$
is not orthosymmetric in the mirror wavelet basis $B$. The proof proceeds
as in Theorem 10.9. We first show that we can compute an upper bound
and a lower bound of $\|f\|_V$ from the absolute value of the decomposition
coefficients of $f$ in the mirror wavelet basis $B$. The resulting inequalities
are similar to the wavelet ones in Proposition 10.10. This constructs two
orthosymmetric sets $\Theta_1$ and $\Theta_2$ such that $\Theta_1 \subset \Theta_V \subset \Theta_2$. A refinement
of the inequalities (10.146) shows that over these sets the minimax and
thresholding risks are equivalent, with no loss of a $\log_e N$ factor. The
risk over $\Theta_1$ and $\Theta_2$ is calculated by evaluating $r_{\inf}(\Theta_1)$ and $r_{\inf}(\Theta_2)$,
from which we derive (10.165).

This theorem proves that a thresholding estimator in a mirror wavelet
basis yields a quasi-minimax deconvolution estimator for bounded variation signals. If we suppose that the signal to noise ratio $C^2/(N\sigma^2) \sim 1$
then
$$\frac{r_n(\Theta_V)}{N\sigma^2} \sim \frac{r_t(\Theta_V)}{N\sigma^2} \sim \left( \frac{\log_e N}{N} \right)^{1/(2p+1)}. \tag{10.167}$$
As opposed to the normalized linear minimax risk (10.161) which remains on the order of 1, the thresholding risk in a mirror wavelet basis
converges to zero as $N$ increases. The larger the number $p$ of zeroes of
the low-pass filter $\hat u[k]$ at $k = N/2$, the slower the risk decay.
Example 10.4  Figure 10.10(a) shows a signal $Y$ degraded by a convolution with a low-pass filter $\hat u[k] = \cos^2(\pi k / N)$. The result of the
deconvolution and denoising with a thresholding in the mirror wavelet
basis is shown in Figure 10.12. A translation invariant thresholding is
performed to reduce the risk. The SNR is 29.2 db, whereas it was 25.8
db in the linear restoration of Figure 10.10(b).

[Figure 10.12: Deconvolution of the signal in Figure 10.10(a) with a
thresholding in a mirror wavelet basis (SNR = 29.2 db).]

Deconvolution of Satellite Images  Nearly optimal deconvolution of bounded variation images can be calculated with a separable extension of the deconvolution estimator in a mirror wavelet basis. Such a
restoration algorithm is used by the French spatial agency (CNES) for
the production of satellite images.
The exposure time of the satellite photoreceptors cannot be reduced too much because the light intensity reaching the satellite is
small and must not be dominated by electronic noises. The satellite
movement thus produces a blur, which is aggravated by the imperfection of the optics. The electronics of the photoreceptors adds a Gaussian white noise. When the satellite is in orbit, a calibration procedure
measures the impulse response $u$ of the blur and the noise variance $\sigma^2$.
The image in Figure 10.14(b), provided by the CNES, is
a simulated satellite image calculated from an airplane image shown in
Figure 10.14(a). The impulse response is a separable low-pass filter:
$$Uf[n_1, n_2] = f \circledast u[n_1, n_2] \quad\text{with}\quad u[n_1, n_2] = u_1[n_1]\, u_2[n_2].$$
The discrete Fourier transforms of $u_1$ and $u_2$ have respectively a zero of
order $p_1$ and $p_2$ at $N/2$:
$$\hat u_1[k_1] \sim \left| \frac{2k_1}{N} - 1 \right|^{p_1} \quad\text{and}\quad \hat u_2[k_2] \sim \left| \frac{2k_2}{N} - 1 \right|^{p_2}.$$
The deconvolved noise has a covariance $K$ that is diagonalized in a
two-dimensional discrete Fourier basis. The eigenvalues are
$$\sigma_{k_1,k_2}^2 = \frac{\sigma^2}{|\hat u_1[k_1]|^2\, |\hat u_2[k_2]|^2} \sim \sigma^2 \left| \frac{2k_1}{N} - 1 \right|^{-2p_1} \left| \frac{2k_2}{N} - 1 \right|^{-2p_2}. \tag{10.168}$$
Most satellite images are well modeled by bounded variation images.
The main difficulty is again to find an orthonormal basis that provides
a sparse representation of bounded variation images and which nearly
diagonalizes the noise covariance $K$. Each vector of such a basis should
have a Fourier transform whose energy is concentrated in a frequency
domain where the eigenvalues $\sigma_{k_1,k_2}^2$ vary at most by a constant factor.
Rougé [299, 300] has demonstrated numerically that efficient deconvolution estimations can be performed with a thresholding in a wavelet
packet basis.
At low frequencies $(k_1, k_2) \in [0, N/4]^2$ the eigenvalues remain approximately constant: $\sigma_{k_1,k_2}^2 \sim \sigma^2$. This frequency square can thus be
covered with two-dimensional wavelets $\psi^l_{j,m}$. The remaining high frequency annulus is covered by two-dimensional mirror wavelets that are
separable products of two one-dimensional mirror wavelets. One can
verify that the union of these two families defines an orthonormal basis
of images of $N^2$ pixels:
$$B = \left\{ \psi^l_{j,m}[n_1, n_2] \right\}_{j,m,l} \cup \left\{ \tilde\psi_{j,m}[n_1]\, \tilde\psi_{j',m'}[n_2] \right\}_{j,j',m,m'}. \tag{10.169}$$
This two-dimensional mirror wavelet basis segments the Fourier plane
as illustrated in Figure 10.13. It is an anisotropic wavelet packet basis
as defined in Problem 8.4. Decomposing a signal in this basis with a
filter bank requires $O(N^2)$ operations.
To formally prove that a thresholding estimator in $B$ has a risk
$r_t(\Theta_V)$ that is close to the non-linear minimax risk $r_n(\Theta_V)$, one must
prove that there exists $\lambda$ such that $\|K_d^{1/2} K^{-1} K_d^{1/2}\|_S \le \lambda$ and that $\Theta_V$
can be embedded in two close sets that are orthosymmetric in $B$. The
following theorem computes the risk in a particular configuration of the
signal to noise ratio, to simplify the formula. More general results can
be found in [232].

[Figure 10.13: The mirror wavelet basis (10.169) segments the frequency
plane $(k_1, k_2)$ into rectangles over which the noise variance $\sigma_{k_1,k_2}^2 = \sigma_{k_1}^2 \sigma_{k_2}^2$ varies by a bounded factor. The lower frequencies are covered
by separable wavelets $\psi^k_j$, and the higher frequencies are covered by
separable mirror wavelets $\tilde\psi_j\, \tilde\psi_{j'}$.]

[Figure 10.14: (a) Original airplane image. (b) Simulation of a satellite
image provided by the CNES (SNR = 31.1 db). (c) Deconvolution with
a translation invariant thresholding in a mirror wavelet basis (SNR =
34.1 db). (d) Deconvolution calculated with a circular convolution,
which yields a nearly minimax risk for bounded variation images (SNR
= 32.7 db).]

Theorem 10.14 (Kalifa, Mallat)  For a separable hyperbolic deconvolution of degree $p = \max(p_1, p_2) \ge 3/2$, if $C^2/(N^2\sigma^2) \sim 1$ then
$$\frac{r_l(\Theta_V)}{N^2\sigma^2} \sim 1 \tag{10.170}$$
and
$$\frac{r_n(\Theta_V)}{N^2\sigma^2} \sim \frac{r_t(\Theta_V)}{N^2\sigma^2} \sim \left( \frac{\log_e N}{N^2} \right)^{1/(2p+1)}. \tag{10.171}$$
The theorem proves that the linear minimax estimator does not
reduce the original noise energy $N^2\sigma^2$ by more than a constant. In
contrast, the thresholding estimator in a separable mirror wavelet basis
has a quasi-minimax risk that converges to zero as $N$ increases.
Figure 10.14(c) shows an example of deconvolution calculated in the
mirror wavelet basis. The thresholding is performed with a translation
invariant algorithm. This can be compared with the linear estimation in
Figure 10.14(d), calculated with a circular convolution estimator whose
maximum risk over bounded variation images is close to the minimax
linear risk. As in one dimension, the linear deconvolution sharpens the
image but leaves a visible noise in the regular parts of the image. The
thresholding algorithm completely removes the noise in these regions
while improving the restoration of edges and oscillatory parts.

10.5 Coherent Estimation $^3$

If we cannot interpret the information carried by a signal component, it
is often misconstrued as noise. In a crowd speaking a foreign language,
we perceive surrounding conversations as background noise. In contrast, our attention is easily attracted by a remote conversation spoken
in a known language. What is important here is not the information
content but whether this information is in a coherent format with respect to our system of interpretation. The decomposition of a signal in
a dictionary of vectors can similarly be considered as a signal interpretation [259]. Noises are then defined as signal components that do not
have strong correlation with any vector of the dictionary. In the absence
of any knowledge concerning the noise, a signal is estimated by isolating
the coherent structures which have a high correlation with vectors in
the dictionary. If the noise is not Gaussian, computing the estimation
risk is much more difficult. This section introduces algorithms that can
be justified intuitively, but which lack a firm mathematical foundation.

10.5.1 Coherent Basis Thresholding

Let $B = \{g_m\}_{0 \le m < N}$ be an orthonormal basis. If $W[n]$ is a Gaussian
white process of size $N$ and variance $\sigma^2$, then $E\{\|W\|^2\} = N\sigma^2$ and the
coefficients $\langle W, g_m \rangle$ are independent Gaussian random variables. When
$N$ increases there is a probability converging towards 1 that [9]
$$\frac{\max_{0 \le m < N} |\langle W, g_m \rangle|}{\|W\|} \approx \frac{\sqrt{2\log_e N}\, \sigma}{\sqrt N\, \sigma} = \frac{\sqrt{2\log_e N}}{\sqrt N} = C_N. \tag{10.172}$$
The factor $C_N$ is the maximum normalized correlation of a Gaussian
white noise of size $N$.
The correlation of a signal $f$ with the basis $B$ is defined by
$$C(f) = \frac{\sup_{0 \le m < N} |\langle f, g_m \rangle|}{\|f\|}.$$
We say that $f$ is a noise with respect to $B$ if it does not correlate vectors
in $B$ any better than a Gaussian white noise: $C(f) \le C_N$. For example,
$f[n] = e^{i\xi n}$ is a noise in a basis of discrete Diracs $g_m[n] = \delta[n - m]$,
because
$$\frac{\sup_{0 \le m < N} |f[m]|}{\|f\|} = \frac{1}{\sqrt N} < C_N.$$

Coherent Structures  Let $Z$ be an unknown noise. To estimate a
signal $f$ from $X = f + Z$, we progressively extract the vectors of $B$ that
best correlate $X$. Let us sort the inner products $\langle X, g_m \rangle$:
$$|\langle X, g_{m_k} \rangle| \ge |\langle X, g_{m_{k+1}} \rangle| \quad\text{for } 1 \le k < N - 1.$$
The data $X$ is not reduced to a noise if
$$C(X) = \frac{|\langle X, g_{m_1} \rangle|}{\|X\|} > C_N.$$
The vector $g_{m_1}$ is then interpreted as a coherent structure.
For any $k \ge 1$, we consider
$$R^k X = X - \sum_{p=1}^{k} \langle X, g_{m_p} \rangle\, g_{m_p} = \sum_{p=k+1}^{N} \langle X, g_{m_p} \rangle\, g_{m_p}.$$
The residue $R^k X$ is the orthogonal projection of $X$ in a space of dimension $N - k$. The normalized correlation of this residue with vectors
in $B$ is compared with the normalized correlation of a Gaussian white
noise of size $N - k$. This residue is not a noise if
$$C^2(R^k X) = \frac{|\langle X, g_{m_k} \rangle|^2}{\sum_{p=k+1}^{N} |\langle X, g_{m_p} \rangle|^2} > C_{N-k}^2 = \frac{2\log_e(N - k)}{N - k}.$$
The vector $g_{m_k}$ is then also a coherent structure.

Let $M$ be the minimum index such that
$$C(R^M X) \le C_{N-M}. \tag{10.173}$$
Observe that $M$ is a random variable whose values depend on each
realization of $X$. The signal $f$ is estimated by the sum of the $M - 1$
coherent structures:
$$\tilde F = \sum_{p=1}^{M-1} \langle X, g_{m_p} \rangle\, g_{m_p}.$$
This estimator is also obtained by thresholding the coefficients $\langle X, g_m \rangle$
with the threshold value
$$T = C_{N-M} \left( \sum_{p=M}^{N-1} |\langle X, g_{m_p} \rangle|^2 \right)^{1/2}. \tag{10.174}$$
The extraction of coherent structures can thus be interpreted as a calculation of an appropriate threshold for estimating $f$, in the absence of
any knowledge about the noise. This algorithm estimates $f$ efficiently
only if most of its energy is concentrated in the direction of few vectors $g_m$ in $B$. For example, $f = \sum_{m=0}^{N-1} g_m$ has no coherent structures
because $C(f) = N^{-1/2} < C_N$. Even though $Z = 0$, the extraction of
coherent structures applied to $X = f$ yields $\tilde F = 0$. This indicates that
the basis representation is not well adapted to the signal.
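The extraction of coherent structures only needs the sorted coefficients of $X$ in $B$. The sketch below is one possible reading of the procedure and not a reference implementation. It uses a zero-based count: after removing the $k$ largest coefficients, the largest remaining one is compared with the energy of all remaining ones, with the noise level $2\log_e(N-k)/(N-k)$. This index convention, the function name `coherent_structures` and the test data are assumptions made for the illustration.

```python
import numpy as np

def coherent_structures(coeffs):
    """Return (M, T, mask): number of coherent coefficients, threshold, kept positions.

    coeffs holds the inner products <X, g_m> in an orthonormal basis.
    """
    c = np.asarray(coeffs)
    N = c.size
    order = np.argsort(-np.abs(c))              # indices by decreasing magnitude
    energy = np.abs(c[order]) ** 2
    tail = np.cumsum(energy[::-1])[::-1]        # tail[k]: energy left after removing k terms
    M = 0
    while M < N - 1 and tail[M] > 0:
        noise_level = 2.0 * np.log(N - M) / (N - M)     # squared correlation of white noise
        if energy[M] / tail[M] <= noise_level:          # the residue behaves like a noise
            break
        M += 1
    T = np.sqrt(2.0 * np.log(N - M) / (N - M) * tail[M])
    mask = np.zeros(N, dtype=bool)
    mask[order[:M]] = True
    return M, T, mask

rng = np.random.default_rng(0)
N = 1024
coeffs = rng.standard_normal(N)                 # white noise coefficients
spikes = np.array([3, 100, 500, 700, 900])      # hypothetical positions of large coefficients
coeffs[spikes] += 50.0
M, T, mask = coherent_structures(coeffs)

# Flat example from the text: all coefficients equal, so nothing is coherent.
M_flat, _, _ = coherent_structures(np.ones(64))
```

The five large coefficients are always selected. The flat signal, whose coefficients are all equal, gives `M_flat = 0`, in agreement with the example $f = \sum_m g_m$ above.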
[Figure 10.15: (a) The same signal as in Figure 10.4 to which is added
a noisy musical signal (SNR = 19.3 db). (b) Estimation by extracting
coherent structures in a Daubechies 4 wavelet basis (SNR = 23.0 db).]

Figure 10.15(a) shows a piecewise regular signal contaminated by
the addition of a complex noise, which happens to be an old musical recording of Enrico Caruso. Suppose that we want to remove this
"musical noise." The coherent structures are extracted using a wavelet
basis, which approximates piecewise smooth functions efficiently but
does not correlate well with high frequency oscillations. The estimation in Figure 10.15(b) shows that few elements of the musical noise
are coherent structures relative to the wavelet basis. If instead of this
musical noise a Gaussian white noise of variance $\sigma^2$ is added to this
piecewise smooth signal, then the coherent structure algorithm computes an estimated threshold (10.174) that is within 10% of the threshold $T = \sigma\sqrt{2\log_e N}$ used for white noises. The estimation is therefore
very similar to the hard thresholding estimation in Figure 10.4(c).

Pursuit of Bases  No single basis can provide a "coherent" interpretation of complex signals such as music recordings. To remove noise
from historical recordings, Berger, Coifman and Goldberg [92] introduced an orthogonal basis pursuit algorithm that searches a succession
of "best bases." Excellent results have been obtained on the restoration of the recording of Enrico Caruso. In this case, we must extract
coherent structures corresponding to the original musical sound as opposed to the degradations of the recording. The coherent extraction
shown in Figure 10.15(b) demonstrates that hardly any component of
this recording is highly coherent in the Daubechies 4 wavelet basis. It
is therefore necessary to search for other bases that match the signal
properties.
Let $D = \cup_{\gamma \in \Gamma} B^\gamma$ be a dictionary of orthonormal bases. To find a
basis in $D$ that approximates a signal $f$ efficiently, Section 9.3.1 selects
a best basis $B^\alpha$ that minimizes a Schur concave cost function
$$C(f, B^\gamma) = \sum_{m=1}^{N-1} \Phi\left( \frac{|\langle f, g^\gamma_m \rangle|^2}{\|f\|^2} \right),$$
where $\Phi(x)$ is a concave function, possibly an entropy (9.62) or an $l^p$
norm (9.64). A pursuit of orthogonal bases extracts coherent structures
from noisy data $X$ with an iterative procedure that computes successive
residues that we denote $X_p$:
1. Initialization  $X_0 = X$.
2. Basis search  A best basis $B^{\alpha_p}$ is selected in $D$ by minimizing a
cost:
$$C(X_p, B^{\alpha_p}) = \min_{\gamma \in \Gamma} C(X_p, B^\gamma).$$
3. Coherent calculation  Coherent structures are extracted as long
as $C(R^k X_p) > C_{N-k}$ in $B^{\alpha_p}$. Let $M_p$ be the number of coherent
structures defined by $C(R^{M_p} X_p) \le C_{N-M_p}$. The remainder is $X_{p+1} = R^{M_p} X_p$.
4. Stopping rule  If $M_p = 0$, stop. Otherwise, go to step 2.
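The four steps above can be sketched with a small dictionary of real orthonormal bases stored as matrices whose rows are the basis vectors. This is an illustration under several assumptions that are not in the text: the dictionary contains only a Dirac basis and a hand-built cosine basis, the cost is the entropy of the normalized energies, the count of coherent structures uses a zero-based convention, and `max_iter` is an added safeguard. All function names are hypothetical.

```python
import numpy as np

def entropy_cost(coeffs):
    """Entropy of the normalized energies |c_m|^2 / ||c||^2 (a Schur concave cost)."""
    x = np.abs(coeffs) ** 2
    total = x.sum()
    if total == 0:
        return 0.0
    x = x[x > 0] / total
    return float(-(x * np.log(x)).sum())

def count_coherent(coeffs):
    """Number M of coherent coefficients and indices sorted by decreasing magnitude."""
    N = coeffs.size
    order = np.argsort(-np.abs(coeffs))
    energy = np.abs(coeffs[order]) ** 2
    tail = np.cumsum(energy[::-1])[::-1]
    M = 0
    while M < N - 1 and tail[M] > 0:
        if energy[M] / tail[M] <= 2.0 * np.log(N - M) / (N - M):
            break
        M += 1
    return M, order

def pursuit_of_bases(X, bases, max_iter=20):
    """Iterate: best basis search, coherent extraction, update of the residue."""
    residue = np.array(X, dtype=float)          # step 1: X_0 = X
    estimate = np.zeros_like(residue)
    for _ in range(max_iter):
        coeff_sets = [B @ residue for B in bases]
        best = min(range(len(bases)), key=lambda i: entropy_cost(coeff_sets[i]))  # step 2
        M, order = count_coherent(coeff_sets[best])                               # step 3
        if M == 0:                                                                # step 4
            break
        kept = np.zeros_like(coeff_sets[best])
        kept[order[:M]] = coeff_sets[best][order[:M]]
        component = bases[best].T @ kept        # coherent part, back in the signal domain
        estimate += component
        residue -= component
    return estimate, residue

N = 64
n = np.arange(N)
dct = np.sqrt(2.0 / N) * np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / N)
dct[0, :] = np.sqrt(1.0 / N)                    # orthonormal cosine basis (rows)
bases = [np.eye(N), dct]

rng = np.random.default_rng(0)
X = 0.1 * rng.standard_normal(N)                # weak white noise
X[10] += 10.0                                   # one Dirac component
X += 10.0 * dct[5]                              # one cosine component
estimate, residue = pursuit_of_bases(X, bases)
```

By construction `estimate + residue` equals `X`, and each extraction is an orthogonal projection in the selected basis, so the norm of the residue cannot increase from one iteration to the next.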
For musical signals [92], the pursuit of bases is performed in a general dictionary that is the union of a dictionary of local cosine bases
and a dictionary of wavelet packet bases, introduced respectively in
Sections 8.1 and 8.5. In each dictionary, a best basis is calculated with
an entropy function $\Phi(x)$ and is selected by the fast algorithm of Section 9.3.2. The best of these two "best" bases is retained. To take into
account some prior knowledge about the noise and the properties of
musical recordings, the correlation $C(f)$ used to extract coherent structures can be modified, and further ad-hoc refinements can be added
[92].

10.5.2 Coherent Matching Pursuit
A matching pursuit offers the flexibility of searching for coherent structures in arbitrarily large dictionaries of patterns $D = \{g_\gamma\}_{\gamma \in \Gamma}$, which
can be designed depending on the properties of the signal. No orthogonal condition is imposed. The notions of coherent structure and noise
are redefined by analyzing the asymptotic properties of the matching
pursuit residues.

Dictionary Noise  A matching pursuit decomposes $f$ over selected dictionary vectors with the greedy strategy described in Section 9.4.2.
Theorem 9.9 proves that the residue $R^m f$ calculated after $m$ iterations
of the pursuit satisfies $\lim_{m \to +\infty} \|R^m f\| = 0$.

The matching pursuit behaves like a non-linear chaotic map, and it
has been proved by Davis, Mallat and Avellaneda [151] that for particular dictionaries, the normalized residues $R^m f\, \|R^m f\|^{-1}$ converge to an
attractor. This attractor is a set of signals $h$ that do not correlate well
with any $g_\gamma \in D$ because all coherent structures of $f$ in $D$ are removed
by the pursuit. The correlation of a signal $f$ with the dictionary $D$ is
defined by
$$C(f) = \frac{\sup_{\gamma \in \Gamma} |\langle f, g_\gamma \rangle|}{\|f\|}.$$
For signals in the attractor, this correlation has a small amplitude that
remains nearly equal to a constant $C_D$, which depends on the dictionary
$D$ [151]. Such signals do not correlate well with any dictionary vector
and are thus considered as noise with respect to $D$.

The convergence of the pursuit to the attractor implies that after a
sufficiently large number $M$ of iterations the residue $R^M f$ has a correlation $C(R^M f)$ that is nearly equal to $C_D$. Figure 10.16 gives the decay
of $C(R^m f)$ as a function of $m$, for two signals decomposed in a Gabor
dictionary. After respectively $M = 1400$ and $M = 76$ iterations, both
curves reach the attractor level $C_D = 0.06$.
[Figure 10.16: Decay of the correlation $C(R^m f)$ as a function of the
number of iterations $m$, for two signals decomposed in a Gabor dictionary. (a) $f$ is the recording of "greasy" shown in Figure 10.17(a). (b)
$f$ is the noisy "greasy" signal shown in Figure 10.17(b).]

Coherent Pursuit  Coherent structures are progressively extracted to estimate $f$ from $X = f + Z$. These coherent structures are dictionary
vectors selected by the pursuit, and which are above the noise level $C_D$.
For any $k \ge 0$, the matching pursuit projects the residue $R^k X$ on a
vector $g_{\gamma_k} \in D$ such that
$$|\langle R^k X, g_{\gamma_k} \rangle| = \sup_{\gamma \in \Gamma} |\langle R^k X, g_\gamma \rangle|.$$
The vector $g_{\gamma_k}$ is a coherent structure of $R^k X$ if
$$C(R^k X) = \frac{|\langle R^k X, g_{\gamma_k} \rangle|}{\|R^k X\|} > C_D.$$
Let $M$ be the minimum integer such that $C(R^M X) \le C_D$. The residue
$R^M X$ has reached the noise level and is therefore not further decomposed. The signal is estimated from the $M$ coherent structures:
$$\tilde F = \sum_{p=0}^{M-1} \langle R^p X, g_{\gamma_p} \rangle\, g_{\gamma_p}.$$
This estimator can also be interpreted as a thresholding of the matching
pursuit of $X$ with a threshold that is adaptively adjusted to
$$T = C_D\, \|R^M X\|.$$
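The coherent pursuit can be sketched for a finite real dictionary stored as a matrix of unit-norm rows. This is a minimal illustration and not the Gabor implementation used in the text: the dictionary (Diracs plus a cosine basis), the value given to `C_D`, the safeguard `max_iter` and the function name are assumptions. In practice $C_D$ depends on the dictionary and has to be estimated.

```python
import numpy as np

def coherent_matching_pursuit(X, atoms, C_D, max_iter=1000):
    """Greedy pursuit stopped when the residue correlation falls to the noise level C_D.

    atoms: array of shape (P, N) whose rows are unit-norm dictionary vectors.
    Returns (estimate, residue, M) where M is the number of coherent structures.
    """
    residue = np.array(X, dtype=float)
    estimate = np.zeros_like(residue)
    M = 0
    while M < max_iter:
        norm = np.linalg.norm(residue)
        if norm == 0:
            break
        inner = atoms @ residue                       # <R^k X, g_gamma> for all gamma
        best = int(np.argmax(np.abs(inner)))
        if np.abs(inner[best]) / norm <= C_D:         # C(R^k X) <= C_D: noise level reached
            break
        estimate += inner[best] * atoms[best]
        residue -= inner[best] * atoms[best]
        M += 1
    return estimate, residue, M

N = 64
n = np.arange(N)
dct = np.sqrt(2.0 / N) * np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / N)
dct[0, :] = np.sqrt(1.0 / N)
atoms = np.vstack([np.eye(N), dct])                   # hypothetical dictionary of 2N atoms

rng = np.random.default_rng(0)
X = 0.1 * rng.standard_normal(N)
X[10] += 10.0
X += 10.0 * dct[5]
estimate, residue, M = coherent_matching_pursuit(X, atoms, C_D=0.4)
threshold = 0.4 * np.linalg.norm(residue)             # T = C_D ||R^M X||
```

Each selected atom removes the projection of the current residue on that atom, so `estimate + residue` always equals `X`. With this test signal, the Dirac and the cosine components are both extracted before the residue reaches the noise level.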
[Figure 10.17: (a) Speech recording of "greasy." (b) Recording of
"greasy" plus a Gaussian white noise (SNR = 1.5 db). (c) Time-frequency distribution of the $M = 76$ coherent Gabor structures. (d)
Estimation $\tilde F$ reconstructed from the 76 coherent structures (SNR =
6.8 db).]

Example 10.5  Figure 10.17(b) from [259] shows the speech recording of "greasy" contaminated with a Gaussian white noise, with an SNR
of 1.5 db. The curve (b) of Figure 10.16 shows that the correlation
$C(R^m f)$ reaches $C_D$ after $m = M = 76$ iterations. The time-frequency
energy distribution of these 76 Gabor atoms is shown in Figure 10.17(c).
The estimation $\tilde F$ calculated from the 76 coherent structures is shown
in Figure 10.17(d). The SNR of this estimation is 6.8 db. The white
noise has been removed and the restored speech signal has a good intelligibility because its main time-frequency components are retained.

10.6 Spectrum Estimation $^2$
A zero-mean Gaussian process $X$ of size $N$ is characterized by its covariance matrix. For example, unvoiced speech sounds such as "ch"
or "s" can be considered as realizations of Gaussian processes, which
allows one to reproduce intelligible sounds if the covariance is known.
The estimation of covariance matrices is difficult because we generally
have few realizations, and hence few data points, compared to the $N^2$
covariance coefficients that must be estimated. If parametrized models
are available, which is the case for speech recordings [61], then a direct
estimation of the parameters can give an accurate estimation of the covariance [60]. This is however not the case for complex processes such
as general sounds or seismic and underwater signals. We thus follow a
non-parametrized approach that applies to non-stationary processes.

When the Karhunen-Loève basis is known in advance, one can reduce the estimation to the $N$ diagonal coefficients in this basis, which
define the power spectrum. This is the case for stationary processes,
where the Karhunen-Loève basis is known to be the Fourier basis. For
non-stationary processes, the Karhunen-Loève basis is not known, but
it can be approximated by searching for a "best basis" in a large dictionary of orthogonal bases. This approach is illustrated with locally
stationary processes, where the basis search is performed in a dictionary
of local cosine bases. 10.6. SPECTRUM ESTIMATION 671 10.6.1 Power Spectrum We want to estimate the covariance matrix of a zero-mean random vector X of size N from L realizations fXl g0 l<L. Let B = fgmg0 m<N be
an orthonormal basis. The $N^2$ covariance coefficients of the covariance operator $K$ are
$$a[l,m] = \langle K g_l, g_m \rangle = E\{\langle X, g_l \rangle \, \langle X, g_m \rangle^*\}.$$
When $L$ is much smaller than $N$, which is most often the case, a naive estimation of these $N^2$ covariances gives disastrous results. In signal processing, the estimation must often be done with only $L = 1$ realization.
Naive Estimation Let us try to estimate the covariance coefficients with sample mean estimators
$$A[l,m] = \frac{1}{L} \sum_{k=1}^{L} \langle X_k, g_l \rangle \, \langle X_k, g_m \rangle^*. \qquad (10.175)$$
We denote by $\bar K$ the estimated covariance matrix whose coefficients are
the $A[l,m]$. The estimation error is measured with a Hilbert-Schmidt norm. The squared Hilbert-Schmidt norm of an operator $K$ is the sum of its squared matrix coefficients, which is also equal to the trace of the product of $K$ and its complex transpose $K^*$:
$$\|K\|_H^2 = \sum_{l,m=0}^{N-1} |a[l,m]|^2 = \mathrm{tr}(K K^*).$$
The Hilbert-Schmidt error of the covariance estimation is
$$\|K - \bar K\|_H^2 = \sum_{l,m=0}^{N-1} |a[l,m] - A[l,m]|^2.$$
The following proposition computes its expected value when $X$ is a Gaussian random vector.
Proposition 10.14 If $X$ is a Gaussian random vector then
$$E\{|A[l,m] - a[l,m]|^2\} = \frac{1}{L} \left( |a[l,m]|^2 + a[l,l] \, a[m,m] \right) \qquad (10.176)$$
and
$$E\{\|K - \bar K\|_H^2\} = \frac{\|K\|_H^2}{L} + \frac{E^2\{\|X\|^2\}}{L}. \qquad (10.177)$$
Proof $^2$. The sample mean estimator (10.175) is unbiased:
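$$E\{A[l,m]\} = a[l,m]$$
so
$$E\{|A[l,m] - a[l,m]|^2\} = E\{|A[l,m]|^2\} - |a[l,m]|^2. \qquad (10.178)$$
Let us compute $E\{|A[l,m]|^2\}$. Since the realizations are independent,
$$E\{|A[l,m]|^2\} = E\left\{ \left| \frac{1}{L} \sum_{k=1}^{L} \langle X_k, g_l \rangle \, \langle X_k, g_m \rangle^* \right|^2 \right\}$$
$$= \frac{1}{L^2} \sum_{k=1}^{L} E\{ |\langle X_k, g_l \rangle|^2 \, |\langle X_k, g_m \rangle|^2 \} + \frac{1}{L^2} \sum_{\substack{k,j=1 \\ k \ne j}}^{L} E\{ \langle X_k, g_l \rangle \, \langle X_k, g_m \rangle^* \} \, E\{ \langle X_j, g_l \rangle^* \, \langle X_j, g_m \rangle \}. \qquad (10.179)$$
Each $\langle X_k, g_l \rangle$ is a Gaussian random variable and for all $k$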
$$E\{ \langle X_k, g_l \rangle \, \langle X_k, g_m \rangle^* \} = a[l,m].$$
If $A_1, A_2, A_3, A_4$ are jointly Gaussian zero-mean random variables, one can verify that
$$E\{A_1 A_2 A_3 A_4\} = E\{A_1 A_2\} E\{A_3 A_4\} + E\{A_1 A_3\} E\{A_2 A_4\} + E\{A_1 A_4\} E\{A_2 A_3\}.$$
Applying this result to (10.179) yields
$$E\{|A[l,m]|^2\} = \frac{1}{L^2} \, L \left( a[l,l] \, a[m,m] + 2 |a[l,m]|^2 \right) + \frac{1}{L^2} (L^2 - L) \, |a[l,m]|^2$$
so
$$E\{|A[l,m]|^2\} = \left( 1 + \frac{1}{L} \right) |a[l,m]|^2 + \frac{1}{L} \, a[l,l] \, a[m,m].$$
We thus derive (10.176) from (10.178).
The expected squared Hilbert-Schmidt error is
$$E\{\|K - \bar K\|_H^2\} = \sum_{l,m=0}^{N-1} E\{|a[l,m] - A[l,m]|^2\} = \frac{1}{L} \sum_{l,m=0}^{N-1} |a[l,m]|^2 + \frac{1}{L} \sum_{l,m=0}^{N-1} a[l,l] \, a[m,m].$$
Observe that
$$E\{\|X\|^2\} = \sum_{m=0}^{N-1} E\{|\langle X, g_m \rangle|^2\} = \sum_{m=0}^{N-1} a[m,m].$$
Inserting this in the previous equation gives (10.177).
The error calculation (10.176) proves that $E\{|A[l,m] - a[l,m]|^2\}$ depends not only on $|a[l,m]|^2$ but also on the amplitude of the diagonal
coefficients $a[l,l]$ and $a[m,m]$. Even though $a[l,m]$ may be small, the error of the sample mean estimator is large if the diagonal coefficients are large:
$$E\{|A[l,m] - a[l,m]|^2\} \ge \frac{a[l,l] \, a[m,m]}{L}. \qquad (10.180)$$
The error produced by estimating small amplitude covariance coefficients accumulates and produces a large Hilbert-Schmidt error (10.177).
Example 10.6 Suppose that $X$ is a random vector such that $E\{|X[n]|^2\}$
is on the order of $\sigma^2$ but that $E\{X[n] \, X[m]\}$ decreases quickly when $|n - m|$ increases. The Hilbert-Schmidt norm of $K$ can be calculated in a Dirac basis $g_m[n] = \delta[n - m]$, which gives
$$\|K\|_H^2 = \sum_{l,m=0}^{N-1} |E\{X[l] \, X[m]\}|^2 \sim N \sigma^4$$
and
$$E\{\|X\|^2\} = \sum_{n=0}^{N-1} E\{|X[n]|^2\} \sim N \sigma^2.$$
As a consequence, for $N \gg L$,
$$E\{\|K - \bar K\|_H^2\} \ge \frac{E^2\{\|X\|^2\}}{L} \sim \frac{\sigma^4 N^2}{L} \gg \|K\|_H^2.$$
The estimation error is huge; a better result is obtained by simply setting $\bar K = 0$.
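The size of this error can be checked numerically. The following is a minimal Monte Carlo sketch, not taken from the text: it assumes a real Gaussian vector, works in the Dirac basis, and uses an illustrative covariance $0.5^{|l-m|}$ together with arbitrary sizes `N`, `L` and `trials`. It compares the empirical Hilbert-Schmidt error of the sample mean estimator (10.175) with the prediction (10.177), and with the error of the trivial estimate $\bar K = 0$.

```python
import numpy as np

def sample_covariance(X):
    """Sample-mean estimator (10.175) in the Dirac basis: A[l, m] = (1/L) sum_k X_k[l] X_k[m]."""
    return X.T @ X / X.shape[0]

def predicted_hs_error(K, L):
    """Right-hand side of (10.177): (||K||_H^2 + E{||X||^2}^2) / L, with E{||X||^2} = tr(K)."""
    return (np.sum(K**2) + np.trace(K) ** 2) / L

rng = np.random.default_rng(0)
N, L, trials = 32, 4, 2000                      # illustrative sizes (assumptions)
idx = np.arange(N)
K = 0.5 ** np.abs(idx[:, None] - idx[None, :])  # unit variances, fast off-diagonal decay
C = np.linalg.cholesky(K)                       # K = C C^T

errors = np.empty(trials)
for t in range(trials):
    X = rng.standard_normal((L, N)) @ C.T       # L independent zero-mean Gaussian realizations
    errors[t] = np.sum((sample_covariance(X) - K) ** 2)

empirical = errors.mean()                       # Monte Carlo estimate of E{||K - Kbar||_H^2}
predicted = predicted_hs_error(K, L)
error_of_zero = np.sum(K**2)                    # error made by the trivial estimate Kbar = 0
print(empirical, predicted, error_of_zero)
```

With these settings the empirical error should sit close to the predicted value and well above `error_of_zero`, which is the behaviour described in Example 10.6.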
Power Spectrum If we know in advance the Karhunen-Loeve basis that diagonalizes the covariance operator, we can avoid estimating off-diagonal covariance coefficients by working in this basis. The $N$ diagonal coefficients $p[m] = a[m,m]$ are the eigenvalues of $K$, and are called its power spectrum.
We denote by $P[m] = A[m,m]$ the sample mean estimator along the diagonal. The sample mean error is computed with (10.176):
$$E\{|P[m] - p[m]|^2\} = \frac{2 \, |p[m]|^2}{L}. \qquad (10.181)$$
Since the covariance is diagonal,
$$\|K\|_H^2 = \sum_{m=0}^{N-1} |p[m]|^2 = \|p\|^2. \qquad (10.182)$$
The estimated diagonal operator $\bar K$ with diagonal coefficients $P[m]$ has therefore an expected error
$$E\{\|K - \bar K\|_H^2\} = E\{\|P - p\|^2\} = \sum_{m=0}^{N-1} \frac{2 \, |p[m]|^2}{L} = \frac{2 \, \|K\|_H^2}{L}. \qquad (10.183)$$
The relative error $E\{\|K - \bar K\|_H^2\} / \|K\|_H^2$ decreases when $L$ increases but it is independent of $N$. To improve this result, we must regularize
the estimation of the power spectrum $p[m]$.
Regularization Sample mean estimations $P[m]$ can be regularized if $p[m]$ varies slowly when $m$ varies along the diagonal. These random coefficients can be interpreted as "noisy" measurements of $p[m]$:
$$P[m] = p[m] \, (1 + W[m]).$$
Since $P[m]$ is unbiased, $E\{W[m]\} = 0$. To transform the multiplicative noise into an additive noise, we compute
$$\log_e P[m] = \log_e p[m] + \log_e (1 + W[m]). \qquad (10.184)$$
If $X[n]$ is Gaussian, then $W[m]$ has a $\chi_2^2$ distribution [40], and (10.181) proves that
$$E\{|W[m]|^2\} = \frac{2}{L}.$$
The coefficients $\{\langle X, g_m \rangle\}_{0 \le m < N}$ of a Gaussian process in a Karhunen-Loeve basis are independent variables, so $P[m]$ and $P[l]$ and hence $W[m]$ and $W[l]$ are independent for $l \ne m$. As a consequence, $W[m]$ and $\log_e (1 + W[m])$ are non-Gaussian white noises.
In the Gaussian case, computing a regularized estimate $\tilde P[m]$ of $p[m]$ from (10.184) is a white noise removal problem. Let $\tilde K$ be the diagonal matrix whose diagonal coefficients are $\tilde P[m]$. This matrix is said to be a consistent estimator of $K$ if
$$\lim_{N \to +\infty} \frac{E\{\|K - \tilde K\|_H^2\}}{\|K\|_H^2} = \lim_{N \to +\infty} \frac{E\{\|p - \tilde P\|^2\}}{\|p\|^2} = 0.$$
Linear estimations and Wiener type filters perform a weighted average with a kernel whose support covers a domain where $\log_e p[m]$ is expected to have small variations. This is particularly effective if $p[m]$ is uniformly regular.
If $p[m]$ is piecewise regular, then wavelet thresholding estimators improve the regularization of linear smoothings [188]. Following the algorithm of Section 10.2.4, the wavelet coefficients of $\log_e P[m]$ are thresholded. Despite the fact that $\log_e (1 + W[m])$ is not Gaussian, if $X[n]$ is Gaussian then results similar to Theorem 10.4 are proved [344] by verifying that wavelet coefficients have asymptotic Gaussian properties.
Stationary Processes If $X$ is circular wide-sense stationary, then its covariance operator is a circular convolution that is diagonalized in
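the discrete Fourier basis
$$\left\{ g_m[n] = \frac{1}{\sqrt N} \exp\left( \frac{i 2 \pi m n}{N} \right) \right\}_{0 \le m < N}.$$
The power spectrum is the discrete Fourier transform of the covariance $R_X[l] = E\{X[n] \, X[n - l]\}$:
$$\hat R_X[m] = \sum_{l=0}^{N-1} R_X[l] \exp\left( \frac{-i 2 \pi m l}{N} \right) = E\{|\langle X, g_m \rangle|^2\}.$$
It is estimated with only $L = 1$ realization by computing $P[m]$, which is called a periodogram [60]:
$$P[m] = |\langle X, g_m \rangle|^2 = \frac{1}{N} \left| \sum_{n=0}^{N-1} X[n] \exp\left( \frac{-i 2 \pi m n}{N} \right) \right|^2. \qquad (10.185)$$
Most often, the stationarity of $X$ is not circular and we only know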
the restriction of its realizations to $[0, N-1]$. The discrete Fourier basis is thus only an approximation of the true Karhunen-Loeve basis, and this approximation introduces a bias in the spectrum estimation. This bias is reduced by pre-multiplying $X[n]$ with a smooth window $g[n]$ of size $N$, which removes the discontinuities introduced by the Fourier periodization. Such discrete windows are obtained by scaling and sampling one of the continuous time windows $g(t)$ studied in Section 4.2.2. This windowing technique can be improved by introducing several orthogonal windows whose design is optimized in [331].
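The effect of the window can be seen on a deterministic test signal. The sketch below is only an illustration and rests on assumptions: the window is normalized to unit norm so that the rectangular case reproduces (10.185), the Hanning window from NumPy stands in for the smooth windows of Section 4.2.2, and the test signal is a sinusoid whose frequency falls between two discrete Fourier frequencies, so that its periodization is discontinuous.

```python
import numpy as np

def periodogram(x, window=None):
    """Periodogram of one realization; a unit-norm window g replaces the implicit 1/sqrt(N)."""
    N = len(x)
    g = np.ones(N) if window is None else np.asarray(window, dtype=float)
    g = g / np.linalg.norm(g)          # the rectangular case reduces to (10.185)
    return np.abs(np.fft.fft(g * x)) ** 2

N = 256
n = np.arange(N)
x = np.cos(2 * np.pi * 10.5 * n / N)   # frequency between two DFT bins: not N-periodic
P_rect = periodogram(x)                # no window: energy leaks to all frequencies
P_hann = periodogram(x, np.hanning(N)) # smooth window: much less leakage far from the peak
far = N // 4                           # a frequency index far from the sinusoid
print(P_rect[far], P_hann[far])
```

Far from the sinusoid frequency, the windowed periodogram is several orders of magnitude smaller than the unwindowed one, which is the bias reduction described above.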
To obtain a consistent estimator from the periodogram $P[m]$, it is necessary to perform a regularization, as previously explained. If the spectrum is uniformly regular, then a linear filtering can yield a consistent estimator [60]. Figure 10.18(c) shows a regularized log periodogram calculated with such a linear filtering. The random fluctuations are attenuated but the power spectrum peaks are smoothed. A linear filtering of the spectra is more often implemented with a time windowing procedure, described in Problem 10.18. The interval $[0, N-1]$ is divided in $M$ subintervals with windows of size $N/M$. A periodogram is computed over each interval and a regularized estimator of the power spectrum is obtained by averaging these $M$ periodograms. Wavelet thresholdings can also be used to regularize piecewise smooth spectra [344].
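The averaging of $M$ periodograms can be sketched as follows. This is a simplified illustration, assuming non-overlapping rectangular windows rather than the lapped windows of Problem 10.18, and a white Gaussian test signal whose power spectrum is flat and equal to 1; the function name and sizes are hypothetical.

```python
import numpy as np

def averaged_periodogram(x, M):
    """Average of the periodograms of M consecutive segments of size len(x) // M."""
    size = len(x) // M
    segments = x[: M * size].reshape(M, size)
    return np.mean(np.abs(np.fft.fft(segments, axis=1)) ** 2 / size, axis=0)

rng = np.random.default_rng(6)
N, M = 4096, 16                          # illustrative sizes (assumptions)
x = rng.standard_normal(N)               # white noise: flat power spectrum equal to 1
single = averaged_periodogram(x, 1)      # plain periodogram (10.185), N frequencies
averaged = averaged_periodogram(x, M)    # N / M frequencies, smaller fluctuations
print(single.var(), averaged.var())
```

The fluctuations around the true spectrum drop roughly by a factor $M$, at the price of a frequency resolution reduced from $N$ to $N/M$ points.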
Figure 10.18: (a): Log power spectrum $\log_e \hat R_X[m]$ of a stationary process $X[n]$. (b): Log periodogram $\log_e P[m]$ computed from $L = 1$ realization. (c): Linearly regularized estimator $\log_e \tilde P[m]$.

10.6.2 Approximate Karhunen-Loeve Search $^3$

If $X$ is non-stationary, we generally do not know in advance its Karhunen-Loeve basis. But we may have prior information that makes it possible to design a dictionary of orthonormal bases guaranteed to contain
at least one basis that closely approximates the Karhunen-Loeve basis. Locally stationary processes are examples where an approximate Karhunen-Loeve basis can be found in a dictionary of local cosine bases. The algorithm of Mallat, Papanicolaou and Zhang [260] estimates this best basis by minimizing a negative quadratic sum. This is generalized to other Schur concave cost functions, including the entropy used by Wickerhauser [76].
Diagonal Estimation Proposition 10.14 proves that an estimation of all covariance coefficients produces a tremendous estimation error.
Even though a basis $B$ is not a Karhunen-Loeve basis, it is often preferable to estimate the covariance $K$ with a diagonal matrix $\tilde K$, which is equivalent to setting the off-diagonal coefficients to zero. The $N$ diagonal coefficients $\tilde P[m]$ are computed by regularizing the sample mean estimators (10.175). They approximate the spectrum of $K$.
The Hilbert-Schmidt error is the sum of the diagonal estimation errors plus the energy of the off-diagonal coefficients:
$$\|K - \tilde K\|_H^2 = \sum_{m=0}^{N-1} |\tilde P[m] - p[m]|^2 + \sum_{\substack{l,m=0 \\ l \ne m}}^{N-1} |a[l,m]|^2.$$
Since
$$\|K\|_H^2 = \sum_{l,m=0}^{N-1} |a[l,m]|^2 = \sum_{m=0}^{N-1} |p[m]|^2 + \sum_{\substack{l,m=0 \\ l \ne m}}^{N-1} |a[l,m]|^2,$$
we have
$$\|K - \tilde K\|_H^2 = \sum_{m=0}^{N-1} |\tilde P[m] - p[m]|^2 + \|K\|_H^2 - \sum_{m=0}^{N-1} |p[m]|^2. \qquad (10.186)$$
Let us denote
$$C(K, B) = - \sum_{m=0}^{N-1} |p[m]|^2. \qquad (10.187)$$
Clearly $C(K, B) \ge -\|K\|_H^2$ and this sum is minimum in a Karhunen-Loeve basis $B_{KL}$ where $C(K, B_{KL}) = -\|K\|_H^2$. The error (10.186) can thus be rewritten
$$\|K - \tilde K\|_H^2 = \|\tilde P - p\|^2 + C(K, B) - C(K, B_{KL}). \qquad (10.188)$$
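The decomposition (10.188) is an exact identity, which a few lines of code can confirm. The sketch below assumes a real symmetric covariance; the matrix `K`, the orthonormal basis `B` and the perturbed diagonal `P_tilde` are arbitrary stand-ins chosen for the check, not quantities from the text.

```python
import numpy as np

def cost(K, B):
    """Quadratic cost (10.187): C(K, B) = -sum_m <K g_m, g_m>^2 for the columns g_m of B."""
    return -np.sum(np.diag(B.T @ K @ B) ** 2)

rng = np.random.default_rng(2)
N = 16
M = rng.standard_normal((N, N))
K = M @ M.T / N                                   # hypothetical covariance matrix
B, _ = np.linalg.qr(rng.standard_normal((N, N)))  # columns: an arbitrary orthonormal basis

a = B.T @ K @ B                                   # a[l, m] = <K g_l, g_m>
p = np.diag(a).copy()                             # diagonal coefficients p[m] in B
P_tilde = p + 0.1 * rng.standard_normal(N)        # stand-in for a regularized estimate
K_tilde = B @ np.diag(P_tilde) @ B.T              # estimator that is diagonal in B

_, B_kl = np.linalg.eigh(K)                       # Karhunen-Loeve basis of K
lhs = np.sum((K - K_tilde) ** 2)                  # ||K - K_tilde||_H^2
rhs = np.sum((P_tilde - p) ** 2) + cost(K, B) - cost(K, B_kl)
print(lhs, rhs)
```

The two printed values agree up to rounding, and `cost(K, B) - cost(K, B_kl)` is the energy of the off-diagonal coefficients of $K$ in $B$.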
Best Basis Let $D = \{B^\gamma\}_{\gamma \in \Gamma}$ be a dictionary of orthonormal bases $B^\gamma = \{g_m^\gamma\}_{0 \le m < N}$. The error formulation (10.188) suggests defining a "best" Karhunen-Loeve approximation as the basis that minimizes $C(K, B)$. Since we do not know the true diagonal coefficients $p[m]$, this cost is estimated with the regularized sample mean coefficients:
$$\tilde C(K, B) = - \sum_{m=0}^{N-1} |\tilde P[m]|^2. \qquad (10.189)$$
The covariance estimation thus proceeds as follows.
1. Sample means For each vector $g_m^\gamma \in D$, we compute the sample
mean estimator of the variance in its direction:
$$P^\gamma[m] = \frac{1}{L} \sum_{k=1}^{L} |\langle X_k, g_m^\gamma \rangle|^2. \qquad (10.190)$$
2. Regularization Regularized estimators $\tilde P^\gamma[m]$ are calculated with a local averaging or a wavelet thresholding among a particular group of dictionary vectors.
3. Basis choice The cost of $K$ is estimated in each basis $B^\gamma$ by
$$\tilde C(K, B^\gamma) = - \sum_{m=0}^{N-1} |\tilde P^\gamma[m]|^2 \qquad (10.191)$$
and we search for the best basis $B^\alpha$ that minimizes these costs:
$$\tilde C(K, B^\alpha) = \inf_{\gamma \in \Gamma} \tilde C(K, B^\gamma). \qquad (10.192)$$
4. Estimation The covariance $K$ is estimated by the operator $\tilde K$ that is diagonal in $B^\alpha$, with diagonal coefficients equal to $\tilde P^\alpha[m]$.
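The four steps can be sketched on a toy dictionary. Everything specific here is an assumption made for the illustration: the dictionary holds only three bases (Dirac, DCT-II and a random orthonormal basis), the regularization is a three-point moving average, and the test process is synthesized so that the DCT-II basis is its Karhunen-Loeve basis.

```python
import numpy as np

def dct_basis(N):
    """Columns are the orthonormal DCT-II vectors."""
    n = np.arange(N)
    B = np.sqrt(2.0 / N) * np.cos(np.pi * (n[:, None] + 0.5) * n[None, :] / N)
    B[:, 0] /= np.sqrt(2.0)
    return B

def smooth(P, width=3):
    """Step 2: regularization by local averaging along the index m."""
    return np.convolve(P, np.ones(width) / width, mode="same")

def best_basis(X, dictionary):
    """Steps 1-4 for realizations stored as the rows of X."""
    results = {}
    for name, B in dictionary.items():
        P = np.mean((X @ B) ** 2, axis=0)                 # step 1: sample means (10.190)
        P_tilde = smooth(P)                               # step 2: regularization
        results[name] = (-np.sum(P_tilde**2), P_tilde)    # step 3: cost (10.191)
    best = min(results, key=lambda name: results[name][0])   # minimization (10.192)
    B = dictionary[best]
    K_tilde = B @ np.diag(results[best][1]) @ B.T         # step 4: diagonal estimator
    return best, K_tilde, {name: r[0] for name, r in results.items()}

rng = np.random.default_rng(3)
N, L = 64, 20                                             # illustrative sizes (assumptions)
dictionary = {
    "dirac": np.eye(N),
    "dct": dct_basis(N),
    "random": np.linalg.qr(rng.standard_normal((N, N)))[0],
}
p = 50.0 * np.exp(-np.arange(N) / 4.0) + 0.1              # hypothetical power spectrum
X = (rng.standard_normal((L, N)) * np.sqrt(p)) @ dictionary["dct"].T
best, K_tilde, costs = best_basis(X, dictionary)
print(best, costs)
```

On this example the DCT-II basis has by far the lowest estimated cost. A realistic dictionary would be structured, such as the tree of local cosine bases used later in this section, so that the minimization does not require enumerating every basis.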
Since $C(K, B_{KL}) = -\|K\|_H^2$ and $\|K\|_H^2 \ge \|p^\alpha\|^2$, to evaluate the consistency of this algorithm, we derive from (10.188) that
$$\frac{\|K - \tilde K\|_H^2}{\|K\|_H^2} \le \frac{\|\tilde P^\alpha - p^\alpha\|^2}{\|p^\alpha\|^2} + \frac{C(K, B_{KL}) - C(K, B^\alpha)}{C(K, B_{KL})}.$$
This covariance estimator is therefore consistent if there is a probability converging to 1 that
$$\frac{C(K, B_{KL}) - C(K, B^\alpha)}{C(K, B_{KL})} \to 0 \quad \text{when } N \to +\infty \qquad (10.193)$$
and
$$\frac{\|\tilde P^\alpha - p^\alpha\|^2}{\|p^\alpha\|^2} \to 0 \quad \text{when } N \to +\infty. \qquad (10.194)$$
This means that the estimated best basis tends to the Karhunen-Loeve basis and the estimated diagonal coefficients converge to the power spectrum. The next section establishes such a result for locally stationary processes in a dictionary of local cosine bases.
Generalized Basis Search The quadratic cost $C(K, B)$ defined in
(10.187) yields a positive pseudo-distance between any $B$ and $B_{KL}$:
$$d(B, B_{KL}) = C(K, B) - C(K, B_{KL}), \qquad (10.195)$$
which is zero if and only if $B$ is a Karhunen-Loeve basis. The following theorem proves that any Schur concave cost function satisfies this property.
Theorem 10.15 Let $K$ be a covariance operator and $B = \{g_m\}_{0 \le m < N}$ be an orthonormal basis. If $\Phi(x)$ is strictly concave then
$$C(K, B) = \sum_{m=0}^{N-1} \Phi(\langle K g_m, g_m \rangle)$$
is minimum if and only if $K$ is diagonal in $B$.
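The statement of the theorem can be probed numerically before reading its proof. The sketch below is an illustration under assumptions: the covariance `K` is a random positive definite matrix, the competing bases are random orthonormal bases, and three strictly concave functions on $x > 0$ are tried, namely $-x^2$, $\log_e x$ and $-x \log_e x$.

```python
import numpy as np

def additive_cost(K, B, phi):
    """C(K, B) = sum_m phi(<K g_m, g_m>) for the orthonormal columns g_m of B."""
    return float(np.sum(phi(np.diag(B.T @ K @ B))))

rng = np.random.default_rng(4)
N = 12
M = rng.standard_normal((N, N))
K = M @ M.T / N + 0.1 * np.eye(N)            # hypothetical positive definite covariance
_, B_kl = np.linalg.eigh(K)                  # Karhunen-Loeve basis: diagonalizes K

concave_functions = {
    "-x^2": lambda x: -x**2,
    "log x": np.log,
    "-x log x": lambda x: -x * np.log(x),
}
gaps = {}
for name, phi in concave_functions.items():
    kl_cost = additive_cost(K, B_kl, phi)
    other_costs = [
        additive_cost(K, np.linalg.qr(rng.standard_normal((N, N)))[0], phi)
        for _ in range(100)
    ]
    gaps[name] = min(other_costs) - kl_cost  # positive if the Karhunen-Loeve basis is best
print(gaps)
```

For each of the three functions, every random basis has a strictly larger cost than the Karhunen-Loeve basis, as the theorem predicts.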
Proof $^3$. Let $\{h_m\}_{0 \le m < N}$ be a Karhunen-Loeve basis that diagonalizes $K$. As in (9.18), by decomposing $g_m$ in the basis $\{h_i\}_{0 \le i < N}$ we obtain
$$\langle K g_m, g_m \rangle = \sum_{i=0}^{N-1} |\langle g_m, h_i \rangle|^2 \, \langle K h_i, h_i \rangle. \qquad (10.196)$$
Since $\sum_{i=0}^{N-1} |\langle g_m, h_i \rangle|^2 = 1$, applying the Jensen inequality (A.2) to the concave function $\Phi(x)$ proves that
$$\Phi(\langle K g_m, g_m \rangle) \ge \sum_{i=0}^{N-1} |\langle g_m, h_i \rangle|^2 \, \Phi(\langle K h_i, h_i \rangle). \qquad (10.197)$$
Hence
$$\sum_{m=0}^{N-1} \Phi(\langle K g_m, g_m \rangle) \ge \sum_{m=0}^{N-1} \sum_{i=0}^{N-1} |\langle g_m, h_i \rangle|^2 \, \Phi(\langle K h_i, h_i \rangle).$$
Since $\sum_{m=0}^{N-1} |\langle g_m, h_i \rangle|^2 = 1$, we derive that
$$\sum_{m=0}^{N-1} \Phi(\langle K g_m, g_m \rangle) \ge \sum_{i=0}^{N-1} \Phi(\langle K h_i, h_i \rangle).$$
This inequality is an equality if and only if for all $m$ (10.197) is an equality. Since $\Phi(x)$ is strictly concave, this is possible only if all values $\langle K h_i, h_i \rangle$ are equal as long as $\langle g_m, h_i \rangle \ne 0$. We thus derive that $g_m$ belongs to an eigenspace of $K$ and is thus also an eigenvector of $K$. Hence, $\{g_m\}_{0 \le m < N}$ diagonalizes $K$ as well.
The pseudo-distance (10.195) is mathematically not a true distance
since it does not satisfy the triangle inequality. The choice of a particular cost depends on the evaluation of the error when estimating the covariance $K$. If $\Phi(x) = -x^2$, then minimizing the pseudo-distance (10.195) is equivalent to minimizing the Hilbert-Schmidt norm of the estimation error (10.188). Other costs minimize other error measurements, whose properties are often more complex. The cost associated to the strictly concave function $\Phi(x) = \log_e x$ can be related to the Kullback-Leibler discriminant information [173]. The entropy $\Phi(x) = -x \log_e x$ has been used in image processing to search for approximate Karhunen-Loeve bases for face recognition [76].

10.6.3 Locally Stationary Processes $^3$

Locally stationary processes appear in many physical systems, where
random fluctuations are produced by a mechanism that changes slowly in time or which has few abrupt transitions. Such processes can be approximated locally by stationary processes. Speech signals are locally stationary. Over short time intervals, the throat behaves like a steady resonator that is excited by a stationary source. For a vowel the time of stationarity is about $10^{-1}$ seconds, but it may be reduced to $10^{-2}$ seconds for a consonant. The resulting process is therefore locally stationary over time intervals of various sizes.
A locally stationary process $X$ is defined qualitatively as a process that is approximately stationary over small enough intervals, and whose values are uncorrelated outside these intervals of stationarity. A number of mathematical characterizations of these processes have been proposed [143, 260, 266, 267, 286].
Donoho, Mallat and von Sachs [172] give an asymptotic definition of local stationarity for a sequence of random vectors having $N$ samples, with $N$ increasing to $+\infty$. The random vector $X_N[n]$ has $N$ samples over an interval normalized to $[0, 1]$. Its covariance is $R_N[n, m] =$
CN n ] = RN n n + ] : The decorrelation property of locally stationary processes is imposed
by a uniform decay condition along for all n. There exist Q1 and
1 > 1=2 independent of N such that 8n X (1 + 2j j 1 ) jCN n ]j2 Q1 : (10.198) If XN is stationary, then CN n ] = CN ]. A local approximation of
XN with stationary processes is obtained by approximating CN n ]
over consecutive intervals with functions that depend only on . Such
approximations are precise if CN n ] has slow variations in n in each
approximation interval. This occurs when the average total variation
of CN n ] decreases quickly enough as N increases. Since XN n] are
samples separated by 1=N on 0 1], we suppose that there exist Q2 and
0 2 1 independent of N such that 8h 1 N ;h X N ;1;h
n=0 kCN n + h :] ; CN n :]k Q2 h N ;1 with kCN n + h :] ; CN n :]k2 = 2 (10.199) X jCN n + h ] ; CN n ]j2 : Processes that belong to a sequence fXN gN 2N that satis es (10.198)
and (10.199) are said to be locally stationary. Example 10.7 Simple locally stationary processes are obtained by
blending together a collection of unrelated stationary processes. Let
fXl N n]g1 l L be a collection of mutually independent Gaussian stationary processes whose covariances Rl N n n + ] = Cl N ] satisfy for
1>1
X
(1 + 2j j 1 ) jCl N ]j2 Q1 : 10.6. SPECTRUM ESTIMATION
Let fwl n]g1 l L be a family of windows wl n] 0 with
De ne the blended process XN n] = L
X
l=1 wl n] Xl N n] : PL w n]
l=1 l 683
1. (10.200) One can then verify 173] that XN satis es the local stationarity properties (10.198) and (10.199), with 2 = 1.
If the windows wl are indicator functions of intervals al al+1) in
0 N ; 1], then the blend process has L abrupt transitions. The process
XN remains locally stationary because the number of abrupt transitions
does not increase with N . Figure 10.19(a) gives an example. Best Local Cosine Basis The covariance of a circular stationary process is a circular convolution whose eigenvectors are the Fourier
vectors exp (i2 mn=N ). Since the eigenvalues are the same at the frequencies 2 m=N and ;2 m=N , we derive that cos (2 mn=N + ) is
also an eigenvector for any phase . A locally stationary process can
be locally approximated by stationary processes on appropriate intervals f al al+1)gl of sizes bl = al+1 ; al . One can thus expect that its
covariance is \almost" diagonalized in a local cosine basis constructed
on these intervals of approximate stationarity. Corollary 8.108 constructs orthonormal bases of local cosine vectors over any family of
such intervals:
r2
:
(10.201)
gl n] b cos k + 1 n ; al
2 bl
l
0 k<bl 1 l L
Local cosine bases are therefore good candidates for building approximate Karhunen-Loeve bases.
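A simplified version of the family (10.201) is easy to build and test. The sketch below makes one important assumption: the smooth overlapping windows $g_l[n]$ are replaced by indicator functions of the intervals, so each block reduces to a cosine-IV basis and orthonormality holds without any folding. The boundaries and function names are hypothetical.

```python
import numpy as np

def local_cosine_block(N, start, size):
    """Vectors sqrt(2/b) cos(pi (k + 1/2) (n - a) / b) on [start, start + size), with
    a = start - 1/2 and b = size, using a rectangular window (simplifying assumption)."""
    k = np.arange(size)
    n = np.arange(start, start + size)
    a = start - 0.5
    block = np.sqrt(2.0 / size) * np.cos(np.pi * (k[None, :] + 0.5) * (n[:, None] - a) / size)
    vectors = np.zeros((N, size))
    vectors[start:start + size] = block       # zero outside the interval
    return vectors

def local_cosine_basis(N, boundaries):
    """Concatenate the blocks of consecutive intervals [boundaries[l], boundaries[l+1])."""
    blocks = [local_cosine_block(N, s, e - s) for s, e in zip(boundaries[:-1], boundaries[1:])]
    return np.hstack(blocks)

N = 64
B = local_cosine_basis(N, [0, 8, 24, 64])     # three intervals of sizes 8, 16 and 40
gram = B.T @ B                                # identity matrix if the family is orthonormal
print(np.max(np.abs(gram - np.eye(N))))
```

With smooth windows the vectors of adjacent intervals overlap, and orthogonality then relies on the symmetry properties of the windows and cosines rather than on disjoint supports.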
When estimating the covariance of a locally stationary process, the position and sizes of the approximate stationarity intervals are generally not known in advance. It is therefore necessary to search for an approximate Karhunen-Loeve basis among a dictionary of local cosine bases, with windows of varying sizes. For this purpose, the best basis search algorithm of Section 10.6.2 is implemented in the dictionary $D$ of local cosine bases defined in Section 8.5.2. This dictionary is organized as a tree. A family $B_j^q$ of $N 2^{-j}$ orthonormal cosine vectors is stored at depth $j$ and position $q$. The support of these vectors covers an interval $[a_l, a_l + 2^{-j} N]$ with $a_l = q N 2^{-j} - 1/2$:
$$B_j^q = \left\{ g_{q,k,j}[n] = g_l[n] \, \sqrt{\frac{2}{2^{-j} N}} \, \cos\left[ \pi \left( k + \frac{1}{2} \right) \frac{n - a_l}{2^{-j} N} \right] \right\}_{0 \le k < N 2^{-j}}.$$
The maximum depth is $j \le \log_2 N$, so the dictionary includes fewer than $N \log_2 N$ local cosine vectors. The decomposition of a signal of size $N$ over all these vectors requires $O(N \log_2^2 N)$ operations. The power spectrum estimation from $L$ realizations of a locally stationary process $X_N$ proceeds in four steps:
1. Sample means The local cosine coefficients $\langle X_N, g_{q,k,j} \rangle$ of the $L$ realizations are computed. The sample mean estimators $P[q,k,j]$ of their variances are calculated with (10.190). This requires $O(L N \log_2^2 N)$ operations.
2. Regularization The regularization of $P[q,k,j]$ is computed in each family $B_j^q$ of $2^{-j} N$ cosine vectors corresponding to $0 \le k < 2^{-j} N$. A regularized estimate $\tilde P[q,k,j]$ is obtained either with a local averaging along $k$ of $P[q,k,j]$, or by thresholding the wavelet coefficients of $P[q,k,j]$ in a wavelet basis of size $2^{-j} N$. Over the whole dictionary, this regularization is calculated with $O(N \log_2 N)$ operations.
3. Basis choice The cost $\tilde C(K, B^\gamma)$ of each local cosine basis $B^\gamma$ in (10.191) is an additive function of $|\tilde P[q,k,j]|^2$ for the cosine vectors $g_{q,k,j}$ in the basis $B^\gamma$. The algorithm of Section 9.3.2 finds the best basis $B^\alpha$ that minimizes this cost with $O(N \log_2 N)$ operations.
4. Estimation The local cosine power spectrum is estimated by the coefficients $\tilde P[q,k,j]$ for $g_{q,k,j}$ in the best basis $B^\alpha$.
~
pute a diagonal estimator KN of the covariance KN . If the regularization of the local cosine coe cients is performed with a wavelet thresholding, using a conservative threshold that is proportional to the maximum eigenvalue of the process, Donoho, Mallat and von Sachs 172] 10.6. SPECTRUM ESTIMATION 685 prove that this covariance estimation is consistent for locally stationary processes. As N goes to +1, the best local cosine basis converges
to the Karhunen-Loeve basis and the regularized variance estimators
~
converge to the power spectrum. As a result, kKN ; KN kH decreases
to 0 with a probability that converges to 1 as N goes to +1.
6
4
2
0
−2
−4
−6
0 0.2 0.4 (a) 0.6 0.8 500 500 400 400 300 300 200 200 100 1 100 0
0 0.2 0.4 0.6 0.8 1 0
0 0.2 0.4 0.6 0.8 (b)
(c)
Figure 10.19: (a): One realization of a process that is stationary on
0 0:2], 0:2 0:78] and 0:78 1]. (b): Heisenberg boxes of the best local
cosine basis computed with L = 500 realizations of this locally stationary process. Grey levels are proportional to the estimated spectrum.
(c): Best local cosine basis calculated with L = 3 realizations. Example 10.8 Let XN be a locally stationary process constructed in (10.200) by aggregating independent Gaussian stationary processes 1 686 CHAPTER 10. ESTIMATIONS ARE APPROXIMATIONS with three windows wl n] that are indicator functions of the intervals
$[0, 0.2]$, $[0.2, 0.78]$ and $[0.78, 1]$. In each time interval, the power spectrum of the stationary process is composed of harmonics whose amplitude decreases when the frequency increases. Figure 10.19(a) shows one realization of this locally stationary process.
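A process of this kind can be synthesized along the lines of (10.200). The sketch below is a hypothetical construction, not the one used for Figure 10.19: the fundamental frequencies, the number of harmonics, the $1/h$ decay of the harmonic power and the small noise floor are all assumptions chosen for the illustration.

```python
import numpy as np

def stationary_harmonic_process(N, fundamental, n_harmonics, rng):
    """Circular stationary Gaussian process whose power sits at harmonics of decreasing size."""
    spectrum = np.full(N // 2 + 1, 1e-3)                 # small noise floor
    for h in range(1, n_harmonics + 1):
        spectrum[h * fundamental] += 1.0 / h             # harmonic h carries power 1 / h
    noise = rng.standard_normal(N // 2 + 1) + 1j * rng.standard_normal(N // 2 + 1)
    return np.fft.irfft(np.sqrt(spectrum) * noise, n=N)  # real signal with that spectrum shape

def blend(processes, windows):
    """Blended process (10.200): X[n] = sum_l w_l[n] X_l[n]."""
    return np.sum(np.asarray(windows) * np.asarray(processes), axis=0)

rng = np.random.default_rng(5)
N = 1024
edges = [0, int(0.2 * N), int(0.78 * N), N]              # transitions at 0.2 and 0.78
windows = np.zeros((3, N))
for l in range(3):
    windows[l, edges[l]:edges[l + 1]] = 1.0              # indicator windows of the intervals
fundamentals = [40, 15, 70]                              # hypothetical fundamental frequencies
processes = [stationary_harmonic_process(N, f, 5, rng) for f in fundamentals]
X = blend(processes, windows)
print(X.shape)
```

Calling the last four lines repeatedly with the same `rng` yields independent realizations, which is what a best basis search over $L$ realizations would take as input.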
A diagonal covariance is calculated in a best local cosine basis. For a large number $L = 500$ of realizations, the regularized estimator $\tilde P[q,k,j]$ gives a precise estimation of the variance $E\{|\langle X_N, g_{q,k,j} \rangle|^2\}$. The time-frequency tiling of the estimated best basis is shown in Figure 10.19(b). Each rectangle is the Heisenberg box of a local cosine vector $g_{q,k,j}$ of the best basis $B^\alpha$. Its grey level is proportional to $\tilde P[q,k,j]$. As expected, short windows are selected in the neighborhood of the transition points at 0.2 and 0.78, and larger windows are selected where the process is stationary. Figure 10.19(c) gives the time-frequency tiling of the best basis computed with only $L = 3$ realizations. The estimators $\tilde P[q,k,j]$ are not as precise and the estimated best basis $\tilde B^\alpha$ has window sizes that are not optimally adapted to the stationarity intervals of $X_N$.

10.7 Problems
10.1. $^1$ Linear prediction Let $F[n]$ be a zero-mean, wide-sense stationary random vector whose covariance is $R_F[k]$. We predict the future $F[n + l]$ from past values $\{F[n - k]\}_{0 \le k < N}$ with $\tilde F[n + l] = \sum_{k=0}^{N-1} a_k \, F[n - k]$.
(a) Prove that $r = E\{|F[n + l] - \tilde F[n + l]|^2\}$ is minimum if and only if
$$\sum_{k=0}^{N-1} a_k \, R_F[q - k] = R_F[q + l] \quad \text{for } 0 \le q < N.$$
Verify that $r = R_F[0] - \sum_{k=0}^{N-1} a_k \, R_F[k + l]$ is the resulting minimum error. Hint: use Proposition 10.1.
(b) Suppose that $R_F[n] = \rho^{|n|}$ with $|\rho| < 1$. Compute $\tilde F[n + l]$ and $r$.
10.2. $^1$ Let $X = F + W$ where the signal $F$ and the noise $W$ are zero-mean, wide-sense circular stationary random vectors. Let $\tilde F[n] = X \star h[n]$ and $r(D, \pi) = E\{\|F - \tilde F\|^2\}$. The minimum risk $r_l(\pi)$ is obtained with the Wiener filter (10.12). A frequency selective filter $h$ has a discrete Fourier transform $\hat h[m]$ which can only take the values 0 or 1. Find the frequency selective filter that minimizes $r(D, \pi)$. Prove that $r_l(\pi) \le r(D, \pi) \le 2 \, r_l(\pi)$.
10.3. $^1$ Let $\{g_m\}_{0 \le m < N}$ be an orthonormal basis. We consider the space $V_p$ of signals generated by the first $p$ vectors $\{g_m\}_{0 \le m < p}$. We want to estimate $f \in V_p$ from $X = f + W$, where $W$ is a white Gaussian noise of variance $\sigma^2$.
(a) Let $\tilde F = D X$ be the orthogonal projection of $X$ in $V_p$. Prove that the resulting risk is minimax:
$$r(D, V_p) = r_n(V_p) = p \, \sigma^2.$$
(b) Find the minimax estimator over the space of discrete polynomial signals of size $N$ and degree $d$. Compute the minimax risk.
10.4. $^1$ Let $F = f[(n - P) \bmod N]$ be the random shift process (10.15) obtained with a Dirac doublet $f[n] = \delta[n] - \delta[n - 1]$. We want to estimate $F$ from $X = F + W$ where $W$ is a Gaussian white noise of variance $\sigma^2 = 4 N^{-1}$.
(a) Specify the Wiener filter $\tilde F$ and prove that the resulting risk satisfies $r_l(\pi) = E\{\|F - \tilde F\|^2\} \ge 1$.
(b) Show that one can define a thresholding estimator $\tilde F$ whose expected risk satisfies
$$E\{\|F - \tilde F\|^2\} \le 12 \, (2 \log_e N + 1) \, N^{-1}.$$
10.5. $^1$ Let $f = \mathbf{1}_{[0, P-1]}$ be a discrete signal of $N > P$ samples. Let $F = f[(n - P) \bmod N]$ be the random shift process defined in (10.15). We measure $X = F + W$ where $W$ is a Gaussian white noise of variance $\sigma^2$.
(a) Suppose that $\tilde F = X \star h$. Compute the transfer function $\hat h[m]$ of the Wiener filter and resulting risk $r_l(\pi) = E\{\|F - \tilde F\|^2\}$.
(b) Let $\tilde F$ be the estimator obtained by thresholding the decomposition coefficients of each realization of $X$ in a Haar basis, with $T = \sigma \sqrt{2 \log_2 N}$. Prove that $E\{\|F - \tilde F\|^2\} \le \sigma^2 (2 \log_e N + 1)^2$.
(c) Compare the Wiener and Haar thresholding estimators when $N$ is large.
10.6. $^1$ Let $|\langle f, g_{m_k} \rangle| \ge |\langle f, g_{m_{k+1}} \rangle|$ for $k \ge 1$ be the sorted decomposition coefficients of $f$ in $B = \{g_m\}_{0 \le m < N}$. We want to estimate $f$ from $X = f + W$, where $W$ is a Gaussian white noise of variance $\sigma^2$. If $|\langle f, g_{m_k} \rangle| = 2^{-k/2}$, compute the oracle projection risk $r_p$ in (10.34) as a function of $\sigma^2$ and $N$. Give an upper bound on the estimation error if we threshold at $T = \sigma \sqrt{2 \log_e N}$ the decomposition coefficients of $X$. Same question if $|\langle f, g_{m_k} \rangle| = k^{-1}$. Explain why the estimation is more precise in one case than in the other.
10.7. $^1$ Compare the SNR and the visual quality of translation invariant hard and soft thresholding estimators in a wavelet basis, for images contaminated by an additive Gaussian white noise. Perform numerical experiments on the Lena, Barbara and Peppers images in WaveLab. Find the best threshold values $T$ as a function of the noise variance. How does the choice of wavelet (support, number of vanishing moments, symmetry) affect the result?
10.8. $^2$ Let $g(t)$ be a Gaussian of variance 1. Let $g_s[n] = K_s \, g(n/s)$ where $K_s$ is adjusted so that $\sum_n g_s[n] = 1$. An adaptive smoothing of $X = f + W$ is calculated by adapting the scale $s$ as a function of the abscissa:
$$\tilde F[l] = \sum_{n=0}^{N-1} X[n] \, g_{s(l)}[l - n]. \qquad (10.202)$$
The scale $s(l)$ should be large where the signal $f$ seems to be regular, whereas it should be small if we guess that $f$ may have a sharp transition in the neighborhood of $l$.
(a) Find an algorithm that adapts $s(l)$ depending on the noisy data $X[n]$, and implement the adaptive smoothing (10.202). Test your algorithm on the Piece-Polynomial and Piece-Regular signals in WaveLab, as a function of the noise variance $\sigma^2$.
(b) Compare your numerical results with a translation invariant hard wavelet thresholding. Analyze the similarities between your algorithm that computes $s(l)$ and the strategy used by the wavelet thresholding to smooth or not to smooth certain parts of the noisy signal.
10.9. $^3$ Let $r_t(f, T)$ be the risk of an estimator of $f$ obtained by hard thresholding with a threshold $T$ the decomposition coefficients of $X = f + W$ in a basis $B$. The noise $W$ is Gaussian white with a variance $\sigma^2$. This risk is estimated by
$$\tilde r_t(f, T) = \sum_{m=0}^{N-1} \Phi(|X_B[m]|^2)$$
with
$$\Phi(u) = \begin{cases} u - \sigma^2 & \text{if } u \le T^2 \\ \sigma^2 & \text{if } u > T^2. \end{cases}$$
(a) Justify intuitively the definition of this estimator as was done for (10.59) in the case of a soft thresholding estimator.
(b) Let $\phi(x) = (2 \pi \sigma^2)^{-1/2} \exp(-x^2 / (2 \sigma^2))$. With calculations similar to the proof of Theorem 10.5, show that
$$r_t(T) - E\{\tilde r_t(T)\} = 2 \, T \, \sigma^2 \sum_{m=0}^{N-1} \left[ \phi(T - f_B[m]) + \phi(T + f_B[m]) \right].$$
(c) Implement in Matlab an algorithm in $O(N \log_2 N)$ which finds $\tilde T$ that minimizes $\tilde r_t(T, f)$. Study numerically the performance of $\tilde T$ to estimate noisy signals with a hard thresholding in a wavelet basis.
10.10. $^1$ Let $B$ be an orthonormal wavelet basis of the space of discrete signals of period $N$. Let $D$ be the family that regroups all translations of wavelets in $B$.
(a) Prove that $D$ is a dyadic wavelet frame for signals of period $N$.
(b) Show that an estimation by thresholding decomposition coefficients in the dyadic wavelet family $D$ implements a translation invariant thresholding estimation in the basis $B$.
10.11. $^3$ A translation invariant wavelet thresholding is equivalent to thresholding an undecimated wavelet frame. For images, elaborate and implement an algorithm that performs a spatial averaging of the wavelet coefficients above the threshold, by using the geometrical information provided by multiscale edges. Coefficients should not be averaged across edges.
10.12. $^2$ Let $X = f + W$ where $f$ is piecewise regular. A best basis thresholding estimator is calculated with the cost function (10.74) in a wavelet packet dictionary. Compare numerically the results with a simpler wavelet thresholding estimator, on the Piece-Polynomial and Piece-Regular signals in WaveLab. Find a signal $f$ for which a best wavelet packet thresholding yields a smaller estimation error than a wavelet thresholding.
10.13. $^2$ Among signals $f[n]$ of size $N$ we consider $\Theta_V = \{f : \|f\|_V \le C\}$. Let $X = f + W$ where $W$ is a Gaussian white noise of variance $\sigma^2$. We define a linear estimator $D X[n] = X \star h[n]$ with
$$\hat h[m] = \frac{C^2}{C^2 + 4 \sigma^2 N \, |\sin(\pi m / N)|^2}. \qquad (10.203)$$
Prove that the maximum risk of this estimator is close to the minimax linear risk:
$$r_l(\Theta_V) \le r(D, \Theta_V) \le 2 \, r_l(\Theta_V).$$
Hint: follow the approach of the proof of Proposition 10.5.
10.14. $^2$ We want to estimate a signal $f$ that belongs to an ellipsoid
$$\Theta = \left\{ f : \sum_{m=0}^{N-1} \beta_m^2 \, |f_B[m]|^2 \le C^2 \right\}$$
from $X = f + W$, where $W$ is a Gaussian white noise of variance $\sigma^2$. We denote $x_+ = \max(x, 0)$.
(a) Using Proposition 10.6 prove that the minimax linear risk on $\Theta$ satisfies
$$r_l(\Theta) = \sigma^2 \sum_{m=0}^{N-1} a[m] \qquad (10.204)$$
with $a[m] = (\lambda / \beta_m - 1)_+$, where $\lambda$ is a Lagrange multiplier calculated with
$$\sum_{m=0}^{N-1} \beta_m \left( \frac{\lambda}{\beta_m} - 1 \right)_+ = \frac{C^2}{\sigma^2}. \qquad (10.205)$$
(b) By analogy to Sobolev spaces, the set $\Theta$ of signals having a discrete derivative of order $s$ whose energy is bounded by $C^2$ is defined from the discrete Fourier transform:
$$\Theta = \left\{ f : \sum_{m=-N/2+1}^{N/2} |m|^{2s} \, N^{-1} \, |\hat f[m]|^2 \le C^2 \right\}. \qquad (10.206)$$
Show that the minimax linear estimator $D$ in $\Theta$ is a circular convolution $D X = X \star h$. Explain how to compute the transfer function $\hat h[m]$.
(c) Show that the minimax linear risk satisfies
$$r_l(\Theta) \sim C^{2/(2s+1)} \, \sigma^{2 - 2/(2s+1)}.$$
10.15. $^2$ We want to estimate $f \in \Theta$ from $Y = f \star u + W$ where $W$ is a white noise of variance $\sigma^2$. Suppose that $\Theta$ is closed and bounded. We consider the quadratic convex hull $QH[\Theta]$ in the discrete Fourier basis and $x \in QH[\Theta]$ such that $r(x) = r_{\inf}(QH[\Theta])$. Prove that the linear estimator that achieves the minimax linear risk $r_l(\Theta)$ in Theorem 10.12 is $\tilde F = Y \star h$ with
$$\hat h[m] = \frac{N^{-1} \, |\hat x[m]|^2 \, \hat u^*[m]}{\sigma^2 + N^{-1} \, |\hat x[m]|^2 \, |\hat u[m]|^2}.$$
Hint: use the diagonal estimator in Proposition 10.11.
10.16. 2 Implement in WaveLab the algorithm of Section 10.5.1 that
extracts coherent structures with a pursuit of bases. Use a dictionary that is a union of a wavelet packet and a local cosine
dictionary. Apply this algorithm to the restoration of the Caruso
signal in WaveLab. Find stopping rules to improve the auditory
quality of the restored signal [92].
10.17. $^1$ Stationary spectrum estimation. Let $X[n]$ be a zero-mean, infinite size process that is wide-sense stationary. The power spectrum
$\hat R_X(\omega)$ is the Fourier transform of the covariance $R_X[p] = E\{X[n] \, X[n-p]\}$. Let
$$\tilde R_X[p] = \frac{1}{N} \sum_{n=0}^{N-1-|p|} X[n] \, X[n+|p|]$$
be an estimation of $R_X[p]$ from a single realization of $X[n]$.
(a) Show that $E\{\tilde R_X[p]\} = \frac{N - |p|}{N} R_X[p]$ for $|p| \le N$.
(b) Verify that the discrete Fourier transform of $\tilde R_X[p]$ is the
periodogram $P[m]$ in (10.185).
(c) Let $\hat h(\omega) = \frac{1}{N} \Big| \frac{\sin(N\omega/2)}{\sin(\omega/2)} \Big|^2$. Prove that
$$E\{P[m]\} = \frac{1}{2\pi} \, \hat R_X \star \hat h \Big( \frac{2\pi m}{N} \Big) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat R_X(\omega) \, \hat h \Big( \frac{2\pi m}{N} - \omega \Big) \, d\omega .$$
(d) Let $g[n]$ be a discrete window whose support is $[0, N-1]$ and
let $\hat h(\omega) = |\hat g(\omega)|^2$. The periodogram of the windowed data
is
$$P_g[m] = \frac{1}{N} \Big| \sum_{n=0}^{N-1} g[n] \, X[n] \exp\Big( \frac{-i 2\pi m n}{N} \Big) \Big|^2 . \qquad (10.207)$$
Prove that $p[m] = E\{P_g[m]\} = \frac{1}{2\pi} \hat R_X \star \hat h ( \frac{2\pi m}{N} )$. How should
we design $g[n]$ in order to reduce the bias of this estimator of
$\hat R_X(\omega)$?
(e) Verify that the variance is: $E\{|P_g[k] - p[k]|^2\} = 2 \, |d[k]|^2$.
Hint: use Proposition 10.14.
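Part (b) of Problem 10.17 can be checked numerically before proving it. The sketch below assumes that the periodogram of (10.185) is the usual $P[m] = N^{-1} |\sum_n X[n] \exp(-i 2\pi m n / N)|^2$ and that the Fourier transform of the covariance estimate is its Fourier series over $|p| < N$ sampled at $\omega = 2\pi m / N$; both conventions are assumptions, since (10.185) is not reproduced here. The function names are hypothetical.

```python
import cmath
import math
import random


def covariance_estimate(x, p):
    """(1/N) * sum_{n=0}^{N-1-|p|} x[n] x[n+|p|], the estimator of R_X[p]."""
    N = len(x)
    p = abs(p)
    return sum(x[n] * x[n + p] for n in range(N - p)) / N


def periodogram(x):
    """Assumed periodogram: (1/N) |sum_n x[n] exp(-i 2 pi m n / N)|^2."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * m * n / N)
                    for n in range(N))) ** 2 / N
            for m in range(N)]


def fourier_of_covariance(x):
    """Fourier series of the covariance estimate sampled at 2 pi m / N."""
    N = len(x)
    return [sum(covariance_estimate(x, p) * cmath.exp(-2j * math.pi * m * p / N)
                for p in range(-(N - 1), N)).real
            for m in range(N)]


random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(16)]
P = periodogram(x)
F = fourier_of_covariance(x)
# Largest discrepancy between the two computations; it should be at the
# level of floating-point rounding if the identity of part (b) holds.
max_gap = max(abs(a - b) for a, b in zip(P, F))
```

The agreement holds for any real sequence, not only for noise, because it follows from expanding the squared modulus and grouping the terms by lag $p$.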
10.18. $^1$ Lapped spectrum estimation. Let $X[n]$ be a zero-mean, infinite
size process that is Gaussian and wide-sense stationary. Let
$\hat R_X(\omega)$ be the Fourier series of its covariance $R_X[k]$. We suppose
that one realization of $X[n]$ is known on $[-\eta, N + \eta - 1]$. To reduce
the variance of the spectrogram (10.207), we divide $[0, N-1]$ in $Q$
intervals $[a_q, a_{q+1}]$ of size $M$, with $a_q = qM - 1/2$ for $0 \le q < Q$.
We denote by $\{g_{q,k}\}_{q,k}$ the discrete local cosine vectors (8.108)
constructed with windows $g_q$ having a support $[a_q - \eta, a_{q+1} + \eta]$,
with raising and decaying profiles of constant size $2\eta$. Since all
windows are identical but translated, $|\hat g_q(\omega)|^2 = \hat h(\omega)$.
(a) Let $P_q[k] = |\langle X, g_{q,k} \rangle|^2$ and $\tilde P[k] = \frac{1}{L} \sum_{l=0}^{L-1} P_l[k]$. Verify that
$$p[k] = E\{\tilde P[k]\} = \frac{1}{2\pi} \, \hat R_X \star \hat h \Big( \frac{\pi}{M} \Big( k + \frac{1}{2} \Big) \Big) .$$
(b) Suppose that $X[n]$ has a correlation length smaller than $M$ so
that its values on different intervals $[a_q, a_{q+1}]$ can be considered
as independent. Show that $E\{|\tilde P[k] - p[k]|^2\} = 2 \, |p[k]|^2 \, L^{-1}$.
Discuss the trade-off between bias and variance in the choice
of $L$.
(c) Implement this power spectrum estimator in WaveLab.
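The variance formula of Problem 10.18(b) can be explored with a simplified simulation. The Python sketch below is not the WaveLab implementation requested in (c): it assumes rectangular windows (the limit $\eta \to 0$), so that the local cosine vectors reduce to a block cosine (DCT-IV) basis on each interval, and it uses unit-variance Gaussian white noise, for which $p[k] = 1$. The function names and parameter values are illustrative.

```python
import math
import random


def block_cosine_basis(M):
    """Orthonormal cosine vectors sqrt(2/M) cos(pi/M (k+1/2)(n+1/2))."""
    return [[math.sqrt(2.0 / M) * math.cos(math.pi / M * (k + 0.5) * (n + 0.5))
             for n in range(M)]
            for k in range(M)]


def averaged_spectrum(x, M):
    """Average the squared cosine coefficients over the L = len(x)//M blocks."""
    basis = block_cosine_basis(M)
    L = len(x) // M
    P = [0.0] * M
    for l in range(L):
        block = x[l * M:(l + 1) * M]
        for k in range(M):
            coeff = sum(basis[k][n] * block[n] for n in range(M))
            P[k] += coeff ** 2 / L
    return P


random.seed(1)
M, L, trials = 8, 16, 200
samples = []
for _ in range(trials):
    x = [random.gauss(0.0, 1.0) for _ in range(M * L)]
    samples.extend(averaged_spectrum(x, M))

# Empirical mean and variance of the averaged estimator over all k and trials.
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
```

For this white-noise input the empirical mean should be close to 1 and the empirical variance close to $2/L = 0.125$; increasing $L$ for a fixed total size shortens the blocks, which lowers the variance but coarsens the frequency resolution.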
10.19. $^3$ Adaptive spectrum estimation. Problem 10.18 estimates the
power spectrum and hence the covariance $K$ of a stationary Gaussian
process $X[n]$ with a diagonal operator $\tilde K$ in a local cosine
basis. The diagonal values of $\tilde K$ are the regularized coefficients
$\tilde P[k] = \frac{1}{L} \sum_{l=0}^{L-1} P_l[k]$.
(a) Verify with (10.186) that
$$E\big\{ \|K - \tilde K\|_H^2 \big\} = L \sum_{k=1}^{M} E\big\{ |\tilde P[k] - p[k]|^2 \big\} + \|K\|_H^2 - L \sum_{k=1}^{M} |p[k]|^2 . \qquad (10.208)$$
(b) Find a best basis algorithm that chooses the optimal window size $M = 2^j$ by minimizing an estimator of the error (10.208).
Approximate $p[k]$ with $\tilde P[k]$ and find a procedure
for estimating $E\{|\tilde P[k] - p[k]|^2\}$ from the data values
$\{P_l[k]\}_{0 \le l < L}$. Remember that when they are independent
$E\{|\tilde P[k] - p[k]|^2\} = 2 \, |p[k]|^2 / L$.

Chapter 11
Transform Coding
Reducing a liter of orange juice to a few grams of concentrated powder
is what lossy compression is about. The taste of the restored beverage
is similar to the taste of orange juice but has often lost some subtlety.
We are more interested in sounds and images, but we face the same
trade-off between quality and compression. Major applications are data
storage and transmission through channels with a limited bandwidth.
A transform coder decomposes a signal in an orthogonal basis and
quantizes the decomposition coefficients. The distortion of the restored
signal is minimized by optimizing the quantization, the basis and the bit
allocation. The basic information theory necessary for understanding
quantization properties is introduced. Distortion rate theory is first
studied in a Bayes framework, where signals are realizations of a random
vector whose probability distribution is known a priori. This applies to
audio coding, where signals are often modeled with Gaussian processes.
Since no appropriate stochastic model exists for images, a minimax
approach is used to compute the distortion rate of transform coding.
Image compression algorithms in wavelet bases and cosine block
bases are described. These transform codes are improved by embedding
strategies that first provide a coarse image approximation, then
progressively refine the quality by adding more bits. The compression
of video sequences with motion compensation and transform coding is
also explained.
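The transform coding principle described above can be illustrated with a minimal Python sketch: decompose a signal in an orthonormal basis, quantize the coefficients uniformly, and reconstruct. The choice of a cosine (DCT-IV) basis, the test signal, the quantization step, and all function names are assumptions made for illustration; entropy coding and bit allocation are omitted.

```python
import math


def cosine_basis(N):
    """Orthonormal cosine basis sqrt(2/N) cos(pi/N (k+1/2)(n+1/2))."""
    return [[math.sqrt(2.0 / N) * math.cos(math.pi / N * (k + 0.5) * (n + 0.5))
             for n in range(N)]
            for k in range(N)]


def analyze(f, basis):
    """Decomposition coefficients: inner products of f with each basis vector."""
    return [sum(g[n] * f[n] for n in range(len(f))) for g in basis]


def synthesize(c, basis):
    """Reconstruct a signal from its coefficients in the orthonormal basis."""
    N = len(c)
    return [sum(c[k] * basis[k][n] for k in range(N)) for n in range(N)]


def quantize(c, step):
    """Uniform scalar quantizer: round each coefficient to a multiple of step."""
    return [step * round(v / step) for v in c]


N, step = 32, 0.25
# Hypothetical test signal: a sinusoid with a jump in the second half.
f = [math.sin(2 * math.pi * n / N) + (0.5 if n > N // 2 else 0.0) for n in range(N)]
basis = cosine_basis(N)
coeffs = analyze(f, basis)
q_coeffs = quantize(coeffs, step)
f_restored = synthesize(q_coeffs, basis)

# Distortion measured on the signal and on the coefficients.
signal_distortion = sum((a - b) ** 2 for a, b in zip(f, f_restored))
coeff_distortion = sum((a - b) ** 2 for a, b in zip(coeffs, q_coeffs))
```

Because the basis is orthonormal, the two distortions coincide up to rounding error, which is why the quantization can be optimized directly on the coefficients. Each coefficient error is at most half the step, so the total distortion is bounded by $N \, \mathrm{step}^2 / 4$.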
11.1 Signal Compression $^2$
11.1.1 State of the Art
Speech. Speech coding is used for telephony, where it may be of limited quality but good intelligibility, and for higher quality teleconferencing. Telephone speech is limited to the frequency band 200-3400
Hz and is sampled at 8 kHz. A Pulse Code Modulation (PCM) that
quantizes each sample on 8 bits produces a code with 64 kb/s ($64 \times 10^3$
bits per second). This can be considerably reduced by removing some
of the speech redundancy.
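The PCM bit rate quoted above is simply the product of the sampling rate and the number of bits per sample. The short Python check below reproduces that arithmetic; the variable names are illustrative.

```python
sampling_rate_hz = 8_000       # telephone speech is sampled at 8 kHz
bits_per_sample = 8            # PCM quantizes each sample on 8 bits
highest_frequency_hz = 3_400   # upper edge of the telephone band

# Bit rate in bits per second: samples per second times bits per sample.
bit_rate = sampling_rate_hz * bits_per_sample

# The sampling rate exceeds twice the highest frequency of the band.
nyquist_ok = sampling_rate_hz >= 2 * highest_frequency_hz

print(bit_rate // 1000, "kb/s")  # prints: 64 kb/s
```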
The production of speech signals is well understood. Model based
analysis-synthesis cod