Math Camp 1: Functional analysis
Sayan Mukherjee + Alessandro Verri

About the primer
Goal: To briefly review concepts in functional analysis that
will be used throughout the course.∗ The following
concepts will be described
1. Function spaces
2. Metric spaces
3. Convergence
4. Measure
5. Dense subsets
6. Separable spaces
7. Complete metric spaces
8. Compact metric spaces
9. Linear spaces
10. Linear functionals
11. Norms and seminorms of linear spaces
12. Convergence revisited
13. Euclidean spaces
14. Orthogonality and bases
15. Hilbert spaces
16. Delta functions
17. Fourier transform
18. Functional derivatives
19. Expectations
20. Law of large numbers

∗ The definitions and concepts come primarily from "Introductory Real
Analysis" by Kolmogorov and Fomin (highly recommended).

Function space
A function space is a space made of functions. Each
function in the space can be thought of as a point. Examples:
1. C[a, b], the set of all real-valued continuous functions
   in the interval [a, b]
2. L1[a, b], the set of all real-valued functions whose absolute value is integrable in the interval [a, b]
3. L2[a, b], the set of all real-valued functions square integrable in the interval [a, b]
Note that the functions in 2 and 3 are not necessarily
continuous!

Metric space
By a metric space is meant a pair (X, ρ) consisting of a
space X and a distance ρ, a single-valued, nonnegative,
real function ρ(x, y) defined for all x, y ∈ X which has the
following three properties:
1. ρ(x, y) = 0 iff x = y
2. ρ(x, y) = ρ(y, x)
3. Triangle inequality: ρ(x, z) ≤ ρ(x, y) + ρ(y, z)

Examples
1. The set of all real numbers with distance
   $\rho(x, y) = |x - y|$
   is the metric space ℝ^1.
2. The set of all ordered n-tuples
   x = (x1, ..., xn)
   of real numbers with distance
   $\rho(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
   is the metric space ℝ^n.
3. The set of all functions satisfying the criteria
   $\int f^2(x)\,dx < \infty$
   with distance
   $\rho(f_1(x), f_2(x)) = \sqrt{\int (f_1(x) - f_2(x))^2\,dx}$
   is the metric space L2(ℝ).
4. The set of all probability densities with Kullback-Leibler
   divergence
   $\rho(p_1(x), p_2(x)) = \int \ln\left(\frac{p_1(x)}{p_2(x)}\right) p_1(x)\,dx$
   is not a metric space. The divergence is not symmetric:
   $\rho(p_1(x), p_2(x)) \neq \rho(p_2(x), p_1(x))$.

Convergence
An open/closed sphere in a metric space S is the set of
points x ∈ S for which
ρ(x0, x) < r   (open)
ρ(x0, x) ≤ r   (closed).
An open sphere of radius ε with center x0 will be called an
ε-neighborhood of x0, denoted Oε(x0).
A sequence of points {xn} = x1, x2, ..., xn, ... in a metric
space S converges to a point x ∈ S if every neighborhood
Oε(x) of x contains all points xn starting from a certain
integer: given any ε > 0 there is an integer Nε such that
Oε(x) contains all points xn with n > Nε. Equivalently, {xn} converges
to x iff
$\lim_{n \to \infty} \rho(x, x_n) = 0$.
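
The ε-N form of this definition can be illustrated with a minimal sketch. It assumes the sequence xn = 1/n in ℝ with ρ(x, y) = |x − y| (an illustrative choice, not taken from the primer), and the helper name `last_index_outside` is hypothetical. Only finitely many terms are inspected, so this illustrates the definition rather than proving convergence.

```python
def rho(x, y):
    """Usual distance on the real line."""
    return abs(x - y)

def last_index_outside(seq, limit, eps, n_max=10_000):
    """Largest n <= n_max for which seq(n) lies outside the open eps-neighborhood of limit.

    Every checked term with a larger index lies inside the neighborhood,
    so the returned value plays the role of N in the definition.
    """
    last = 0
    for n in range(1, n_max + 1):
        if rho(limit, seq(n)) >= eps:
            last = n
    return last

def x(n):
    """Illustrative sequence x_n = 1/n, which converges to 0."""
    return 1.0 / n

eps = 0.015
N = last_index_outside(x, 0.0, eps)
print(N)
```

Shrinking `eps` forces a larger `N`, which is exactly the dependence of Nε on ε in the definition.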

Measure
Throughout the course we will see integrals of the form
$\int V(f(x), y)\,d\nu(x) \;\to\; \int V(f(x), y)\,p(x)\,dx$
where ν(x) is the measure.
The concept of the measure ν(E) of a set E is a natural
extension of the concepts of
1) The length l(Δ) of a line segment Δ
2) The volume V(G) of a space G
3) The integral of a nonnegative function over a region in
space.

Lebesgue measure
Let f be a ν-measurable function (it has finite measure)
taking no more than countably many distinct values
y1, y2, ..., yn, ...
Then by the Lebesgue integral of f over the set A, denoted
$\int_A f(x)\,d\nu$,
we mean the quantity
$\sum_n y_n\,\nu(A_n)$
where
$A_n = \{x : x \in A,\ f(x) = y_n\}$,
provided the series is absolutely convergent. The measure
ν is the Lebesgue measure.

Lebesgue integral
We can compute the integral
$\int f(x)\,dx$
by adding up the area under the red rectangles.
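
The sum Σ yn ν(An) from the definition of the Lebesgue integral can be sketched for a function taking finitely many values. The representation of f as a list of disjoint half-open intervals with constant values, and the helper name `lebesgue_integral_simple`, are assumptions made for this illustration only.

```python
from collections import defaultdict
from fractions import Fraction

def lebesgue_integral_simple(pieces):
    """Integrate a simple function given as disjoint (left, right, value) pieces.

    Each piece is an interval [left, right) on which f is constant.
    Pieces are grouped by value y_n, the length of each level set A_n is
    accumulated as its Lebesgue measure, and sum_n y_n * nu(A_n) is returned.
    """
    measure_of_level_set = defaultdict(Fraction)
    for left, right, value in pieces:
        measure_of_level_set[value] += Fraction(right) - Fraction(left)
    return sum(y * nu for y, nu in measure_of_level_set.items())

# f = 2 on [0, 1) and on [3, 4), f = 5 on [1, 3)
pieces = [(0, 1, 2), (1, 3, 5), (3, 4, 2)]
integral = lebesgue_integral_simple(pieces)
print(integral)
```

Note that the sum runs over the values of f (the range), not over a partition of the x-axis: the two intervals where f = 2 are merged into one level set of measure 2 before being weighted.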
[Figure: plot of f(x) against x for x from −10 to 10, with rectangles marking the area under the curve.]

Riemann integral
The more traditional form of the integral is the Riemann
integral. The intuition is that of the limit of an infinite sum of
infinitesimally small rectangles,
$\int_A f(x)\,dx = \sum_n f(x_n)\,\Delta x$.
Integrals in the Riemann sense require continuous or piecewise continuous functions; the Lebesgue form shown previously relaxes this. Thus, the integral
$\int_0^1 f(x)\,dx$
with f : [0, 1] → ℝ defined as
$f(x) = \begin{cases} 1 & \text{if } x \text{ is rational} \\ 0 & \text{otherwise} \end{cases}$
does not exist in the Riemann sense.

Lebesgue-Stieltjes integral
Let F be a nondecreasing function defined on a closed
interval [a, b] and suppose F is continuous from the left at
every point of [a, b). F is called the generating function of
the Lebesgue-Stieltjes measure νF.
The Lebesgue-Stieltjes integral of a function f is denoted
by
$\int_a^b f(x)\,dF(x)$
which is the Lebesgue integral
$\int_{[a,b]} f(x)\,d\nu_F$.
An example of dνF is a probability density p(x)dx. Then the generating function F
would correspond to the cumulative distribution function.

Dense
Let A and B be subspaces of a metric space R. A is said
to be dense in B if $B \subset \bar{A}$, where $\bar{A}$ is the closure of the subset
A. In particular, A is said to be everywhere dense in R if
$\bar{A} = R$.
A point x ∈ R is called a contact point of a set A ⊂ R if
every neighborhood of x contains at least one point of A.
The set of all contact points of a set A, denoted by $\bar{A}$, is
called the closure of A.

Examples
1. The set of all rational points is dense in the real line.
2. The set of all polynomials with rational coefficients is
   dense in C[a, b].
3. Let K be a positive definite Radial Basis Function; then
   the set of functions
   $f(x) = \sum_{i=1}^{n} c_i K(x - x_i)$
   is dense in L2.
Note: A hypothesis space that is dense in L2 is a desired
property of any approximation scheme.

Separable
A metric space is said to be separable if it has a countable
everywhere dense subset.
Examples:
1. The spaces ℝ^1, ℝ^n, L2[a, b], and C[a, b] are all separable.
2. The set of real numbers is separable since the set of
   rational numbers is a countable subset of the reals and
   the set of rationals is everywhere dense.

Completeness
A sequence of functions fn is fundamental if
∀ε > 0 ∃Nε such that
∀n and m > Nε, ρ(fn, fm) < ε.
A metric space is complete if all fundamental sequences
converge to a point in the space.
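
A standard illustration of a fundamental sequence with no limit in its own space, sketched here under the assumption that the space is the rationals with ρ(x, y) = |x − y| (this example is not in the primer): the Newton iterates for √2 stay rational and get arbitrarily close to one another, but no rational has square 2.

```python
from fractions import Fraction

def newton_sqrt2(steps):
    """Rational Newton iterates x_{k+1} = (x_k + 2/x_k) / 2 starting from x_0 = 1."""
    x = Fraction(1)
    iterates = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2
        iterates.append(x)
    return iterates

xs = newton_sqrt2(6)
# Distances between consecutive iterates shrink rapidly (fundamental sequence).
gaps = [abs(xs[k + 1] - xs[k]) for k in range(len(xs) - 1)]
for x, gap in zip(xs[1:], gaps):
    print(x, float(gap))
```

Every iterate is an exact `Fraction`, so the sequence lives entirely in the rationals; its would-be limit √2 does not, which is why the rationals are not complete.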
C, L1, and L2 are complete. That C2 is not complete,
instead, can be seen through a counterexample.

Incompleteness of C2
Consider the sequence of functions (n = 1, 2, ...)
$\varphi_n(t) = \begin{cases} -1 & \text{if } -1 \le t < -1/n \\ nt & \text{if } -1/n \le t < 1/n \\ 1 & \text{if } 1/n \le t \le 1 \end{cases}$
and assume that φn converges to a continuous function φ
in the metric of C2. Let
$f(t) = \begin{cases} -1 & \text{if } -1 \le t < 0 \\ 1 & \text{if } 0 \le t \le 1 \end{cases}$

Incompleteness of C2 (cont.)
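Clearly,
$\left(\int (f(t) - \varphi(t))^2\,dt\right)^{1/2} \le \left(\int (f(t) - \varphi_n(t))^2\,dt\right)^{1/2} + \left(\int (\varphi_n(t) - \varphi(t))^2\,dt\right)^{1/2}$.
Now the l.h.s. term is strictly positive, because f(t) is not
continuous and so cannot coincide with the continuous φ, while for n → ∞ we have
$\int (f(t) - \varphi_n(t))^2\,dt \to 0$.
Therefore, contrary to what was assumed, φn cannot converge
to φ in the metric of C2.

Completion of a metric space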
Given a metric space R with closure $\bar{R}$, a complete metric
space R∗ is called a completion of R if R ⊂ R∗ and
$\bar{R} = R^*$.
Examples
1. The space of real numbers is the completion of the
   space of rational numbers.
2. Let K be a positive definite Radial Basis Function; then
   L2 is the completion of the space of functions
   $f(x) = \sum_{i=1}^{n} c_i K(x - x_i)$.

Compact spaces
A metric space is compact iff it is totally bounded and
complete.
Let R be a metric space and ε any positive number. Then
a set A ⊂ R is said to be an ε-net for a set M ⊂ R if for
every x ∈ M, there is at least one point a ∈ A such that
ρ(x, a) < ε.
Given a metric space R and a subset M ⊂ R, suppose M
has a finite ε-net for every ε > 0. Then M is said to be
totally bounded.
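
A minimal sketch of a finite ε-net, assuming M is the unit cube [0, 1]^dim with the Euclidean distance (an illustrative choice); the helper names `grid_net` and `distance_to_net` are hypothetical. The net is a regular grid whose spacing is small enough that every point of the cube is within ε of a grid point.

```python
import itertools
import math
import random

def grid_net(eps, dim=2):
    """Finite eps-net for the unit cube [0, 1]^dim built from a regular grid.

    With spacing h = 1/cells <= eps/sqrt(dim), every point of the cube is
    within sqrt(dim) * h / 2 <= eps / 2 < eps of some grid point.
    """
    cells = math.ceil(math.sqrt(dim) / eps)  # number of cells per axis
    h = 1.0 / cells
    axis = [i * h for i in range(cells + 1)]
    return list(itertools.product(axis, repeat=dim))

def distance_to_net(x, net):
    """Distance from the point x to the nearest point of the net."""
    return min(math.dist(x, a) for a in net)

eps = 0.25
net = grid_net(eps)
random.seed(0)
samples = [(random.random(), random.random()) for _ in range(200)]
worst = max(distance_to_net(p, net) for p in samples)
print(len(net), worst)
```

The size of the net grows as ε shrinks, but it stays finite for every ε > 0, which is what total boundedness requires.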
A compact space has a finite ε-net for all ε > 0.

Examples
1. In Euclidean n-space, ℝ^n, total boundedness is equivalent to boundedness. If M ⊂ ℝ^n is bounded then M
   is contained in some hypercube Q. We can partition
   this hypercube into smaller hypercubes with sides of
   length ε. The vertices of the little cubes form a finite
   (√n ε/2)-net of Q.
2. This is not true for infinite-dimensional spaces. The
   unit sphere S in l2 with constraint
   $\sum_{n=1}^{\infty} x_n^2 = 1$
   is bounded but not totally bounded. Consider the
   points
   e1 = (1, 0, 0, ...), e2 = (0, 1, 0, 0, ...), ...,
   where the nth coordinate of en is one and all others are
   zero. These points lie on S but the distance between
   any two is √2. So S cannot have a finite ε-net with
   ε < √2/2.
3. Infinite-dimensional spaces may be totally bounded. Let
   Π be the set of points x = (x1, ..., xn, ...) in l2 satisfying
   the inequalities
   $|x_1| < 1,\ |x_2| < \frac{1}{2},\ \ldots,\ |x_n| < \frac{1}{2^{n-1}},\ \ldots$
   The set Π, called the Hilbert cube, is an example of
   an infinite-dimensional totally bounded set. Given any
   ε > 0, choose n such that
   $\frac{1}{2^{n-1}} < \frac{\epsilon}{2}$,
   and with each point
   x = (x1, ..., xn, ...)
   in Π associate the point
   x∗ = (x1, ..., xn, 0, 0, ...).   (1)
   Then
   $\rho(x, x^*) = \sqrt{\sum_{k=n+1}^{\infty} x_k^2} < \sqrt{\sum_{k=n}^{\infty} \frac{1}{4^k}} < \frac{1}{2^{n-1}} < \frac{\epsilon}{2}$.
   The set Π∗ of all points in Π that satisfy (1) is totally
   bounded since it is a bounded set in n-space.
4. The RKHS induced by a kernel K with an infinite number of positive eigenvalues that decay exponentially is
   compact. In this case, our vector x = (x1, ..., xn, ...) can be written in terms of its basis functions, the eigenvectors of K. Now for the RKHS norm to be bounded
   $|x_1| < \mu_1,\ |x_2| < \mu_2,\ \ldots,\ |x_n| < \mu_n,\ \ldots$
   and we know that $\mu_n = O(n^{-\alpha})$. So we have the case
   analogous to the Hilbert cube and we can introduce a
   point
   x∗ = (x1, ..., xn, 0, 0, ...)   (2)
   in a bounded n-space which can be made arbitrarily
   close to x.

Compactness and continuity
A family Φ of functions φ defined on a closed interval [a, b]
is said to be uniformly bounded if there exists K > 0 such that
|φ(x)| < K
for all x ∈ [a, b] and all φ ∈ Φ.
A family Φ of functions φ is equicontinuous if for any given
ε > 0 there exists δ > 0 such that |x − y| < δ implies
|φ(x) − φ(y)| < ε
for all x, y ∈ [a, b] and all φ ∈ Φ.
Arzelà's theorem: A necessary and sufficient condition for
a family Φ of continuous functions defined on a closed
interval [a, b] to be (relatively) compact in C[a, b] is that
Φ is uniformly bounded and equicontinuous.

Linear space
A set L of elements x, y, z, ... is a linear space if the following three axioms are satisfied:
1. Any two elements x, y ∈ L uniquely determine a third
   element x + y ∈ L called the sum of x and y such
   that
   (a) x + y = y + x (commutativity)
   (b) (x + y) + z = x + (y + z) (associativity)
   (c) An element 0 ∈ L exists for which x + 0 = x for all
       x ∈ L
   (d) For every x ∈ L there exists an element −x ∈ L
       with the property x + (−x) = 0
2. Any number α and any element x ∈ L uniquely determine an element αx ∈ L called the product such that
   (a) α(βx) = (αβ)x
   (b) 1x = x
3. Addition and multiplication follow two distributive laws
   (a) (α + β)x = αx + βx
   (b) α(x + y) = αx + αy

Linear functional
A functional, F, is a function that maps another function
to a real value
F : f → ℝ.
A linear functional defined on a linear space L satisfies the
following two properties
1. Additive: F(f + g) = F(f) + F(g) for all f, g ∈ L
2. Homogeneous: F(αf) = αF(f)

Examples
1. Let ℝ^n be a real n-space with elements x = (x1, ..., xn),
   and a = (a1, ..., an) be a fixed element in ℝ^n. Then
   $F(x) = \sum_{i=1}^{n} a_i x_i$
   is a linear functional.
2. The integral
   $F[f(x)] = \int_a^b f(x)\,p(x)\,dx$
   is a linear functional.
3. Evaluation functional: another linear functional is the Dirac delta function
   $\delta_t[f(\cdot)] = f(t)$,
   which can be written
   $\delta_t[f(\cdot)] = \int_a^b f(x)\,\delta(x - t)\,dx$.
4. Evaluation functional: a positive definite kernel in a
   RKHS
   $F_t[f(\cdot)] = (K_t, f) = f(t)$.
   This is simply the reproducing property of the RKHS.

Normed space
A normed space is a linear (vector) space N in which a
norm is defined. A nonnegative function ‖·‖ is a norm iff
∀f, g ∈ N and α ∈ ℝ
1. ‖f‖ ≥ 0 and ‖f‖ = 0 iff f = 0
2. ‖f + g‖ ≤ ‖f‖ + ‖g‖
3. ‖αf‖ = |α| ‖f‖.
Note, if all conditions are satisfied except ‖f‖ = 0 iff f = 0
then the space has a seminorm instead of a norm.

Measuring distances in a normed space
In a normed space N, the distance ρ between f and g, or
a metric, can be defined as
ρ(f, g) = ‖g − f‖.
Note that ∀f, g, h ∈ N
1. ρ(f, g) = 0 iff f = g.
2. ρ(f, g) = ρ(g, f).
3. ρ(f, h) ≤ ρ(f, g) + ρ(g, h).

Example: continuous functions
A norm in C[a, b] can be established by defining
$\|f\| = \max_{a \le t \le b} |f(t)|$.
The distance between two functions is then measured as
$\rho(f, g) = \max_{a \le t \le b} |g(t) - f(t)|$.
With this metric, C[a, b] is denoted as C.

Examples (cont.)
A norm in L1[a, b] can be established by defining
$\|f\| = \int_a^b |f(t)|\,dt$.
The distance between two functions is then measured as
$\rho(f, g) = \int_a^b |g(t) - f(t)|\,dt$.
With this metric, L1[a, b] is denoted as L1.

Examples (cont.)
A norm in C2[a, b] and L2[a, b] can be established by defining
$\|f\| = \left(\int_a^b f^2(t)\,dt\right)^{1/2}$.
The distance between two functions now becomes
$\rho(f, g) = \left(\int_a^b (g(t) - f(t))^2\,dt\right)^{1/2}$.
With this metric, C2[a, b] and L2[a, b] are denoted as C2
and L2 respectively.
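
The three distances defined in these examples can be compared numerically. The following is a rough sketch, assuming a midpoint rule for the integrals and a maximum over sample points in place of the true maximum; the helper name `distances` and the test functions f(t) = 0, g(t) = t on [0, 1] are choices made for illustration.

```python
import math

def distances(f, g, a, b, n=100_000):
    """Approximate the C, L1 and L2 distances between f and g on [a, b].

    The integrals use a midpoint rule on n subintervals; the sup-distance is
    approximated by the maximum over the same midpoints.
    """
    h = (b - a) / n
    diffs = [abs(g(a + (k + 0.5) * h) - f(a + (k + 0.5) * h)) for k in range(n)]
    d_sup = max(diffs)
    d_l1 = sum(diffs) * h
    d_l2 = math.sqrt(sum(d * d for d in diffs) * h)
    return d_sup, d_l1, d_l2

d_sup, d_l1, d_l2 = distances(lambda t: 0.0, lambda t: t, 0.0, 1.0)
print(d_sup, d_l1, d_l2)
```

For this pair the exact values are 1, 1/2 and 1/√3, so the same two functions are at different distances depending on which norm is used.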

Convergence revisited
A sequence of functions fn converges to a function f almost
everywhere iff
$\lim_{n \to +\infty} f_n(x) = f(x)$
for all x except possibly on a set of measure zero.
A sequence of functions fn converges to a function f in
measure iff ∀ε > 0
$\lim_{n \to +\infty} \mu\{x : |f_n(x) - f(x)| \ge \epsilon\} = 0$.
A sequence of functions fn converges to a function f uniformly iff
$\lim_{n \to +\infty} \sup_x |f_n(x) - f(x)| = 0$.

Relationship between different types of
convergence
In the case of bounded intervals: uniform convergence (C)
implies
• convergence in the quadratic mean (L2) which implies
  convergence in the mean (L1) which implies convergence in measure
• almost everywhere convergence which implies convergence in measure.

Relationship between different types of
convergence (cont.)
That uniform convergence implies all other types of convergence is clear.
Consider L2 over a bounded interval of width A. Keeping
in mind that the function g = 1 belongs to L2 and that
$\|g\|_{L_2} = \sqrt{A}$, convergence in the quadratic mean implies convergence in the mean because, by the Cauchy-Schwarz inequality, for every function f ∈ L2 we
have
$\|f\|_{L_1} = \int_A |f|\,dx = \int_A |f| \cdot 1\,dx \le \|f\|_{L_2}\,\|1\|_{L_2} = \sqrt{A}\,\|f\|_{L_2}$
and hence that f ∈ L1.

Any convergence implies convergence in
measure
Convergence in measure is obtained from convergence in the
mean through Chebyshev's inequality:
For any real random variable X and t > 0,
$P(|X| \ge t) \le E[X^2]/t^2$.
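
Chebyshev's inequality in this form can be checked exactly on a small example. The discrete random variable below (its values and probabilities) is hypothetical and chosen only for illustration; exact fractions avoid rounding issues.

```python
from fractions import Fraction

# Hypothetical discrete random variable: values and their probabilities.
values = [-2, -1, 0, 1, 3]
probs = [Fraction(1, 10), Fraction(2, 10), Fraction(4, 10), Fraction(2, 10), Fraction(1, 10)]

def tail_probability(t):
    """P(|X| >= t), computed exactly."""
    return sum(p for x, p in zip(values, probs) if abs(x) >= t)

# E[X^2], the second moment appearing on the right-hand side.
second_moment = sum(p * x * x for x, p in zip(values, probs))

def chebyshev_bound(t):
    """Right-hand side E[X^2] / t^2 of the inequality."""
    return second_moment / Fraction(t) ** 2

for t in (1, 2, 3):
    print(t, tail_probability(t), chebyshev_bound(t))
```

Applied to X = fn − f with t = ε, the same bound shows that the measure of the set where |fn − f| ≥ ε is at most the squared quadratic-mean distance divided by ε², which is how convergence in the quadratic mean forces convergence in measure.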
The proof that almost everywhere convergence implies
convergence in measure is somewhat more complicated.

Almost everywhere convergence does not
imply convergence in the (quadratic) mean
Over the interval [0, 1] let fn be
$f_n(x) = \begin{cases} n & x \in (0, 1/n] \\ 0 & \text{otherwise} \end{cases}$
Clearly fn → 0 for all x ∈ [0, 1]. Note that each fn is
not a continuous function and that the convergence is not
uniform (the closer x is to 0, the larger n must be for
fn(x) = 0). However,
$\int_0^1 f_n(x)\,dx = 1$ for all n,
in both the Riemann and the Lebesgue sense.

Convergence in the quadratic mean does
not imply convergence at all!
Over the interval (0, 1], for every n = 1, 2, ..., and i = 1, ..., n
let
$f_i^n(x) = \begin{cases} 1 & \frac{i-1}{n} < x \le \frac{i}{n} \\ 0 & \text{otherwise} \end{cases}$
Clearly the sequence
$f_1^1, f_1^2, f_2^2, \ldots, f_1^n, f_2^n, \ldots, f_{n-1}^n, f_n^n, f_1^{n+1}, \ldots$
converges to 0 both in measure and in the quadratic mean.
However, the same sequence does not converge for any x!

Convergence in probability and almost
surely
Any event with probability 1 is said to happen almost
surely. A sequence of real random variables Yn converges
almost surely to a random variable Y iff P(Yn → Y) = 1.
A sequence Yn converges in probability to Y iff for every
ε > 0, $\lim_{n \to \infty} P(|Y_n - Y| > \epsilon) = 0$.
Convergence almost surely implies convergence in probability.
A sequence X1, ..., Xn, ... satisfies the strong law of large numbers
if for some constant c, $\frac{1}{n}\sum_{i=1}^{n} X_i$ converges to c almost
surely. The sequence satisfies the weak law of large numbers
iff for some constant c, $\frac{1}{n}\sum_{i=1}^{n} X_i$ converges to c in
probability.

Euclidean space
A Euclidean space is a linear (vector) space E in which a
dot product is defined. A real-valued function (·, ·) is a dot
product iff ∀f, g, h ∈ E and α ∈ ℝ
1. (f, g) = (g, f)
2. (f + g, h) = (f, h) + (g, h) and (αf, g) = α(f, g)
3. (f, f) ≥ 0 and (f, f) = 0 iff f = 0.
A Euclidean space becomes a normed linear space when
equipped with the norm
$\|f\| = \sqrt{(f, f)}$.

Orthogonal systems and bases
A set of nonzero vectors {xα} in a Euclidean space E is
said to be an orthogonal system if
(xα, xβ) = 0 for α ≠ β
and an orthonormal system if
(xα, xβ) = 0 for α ≠ β
(xα, xβ) = 1 for α = β.
An orthogonal system {xα} is called an orthogonal basis
if it is complete (the smallest closed subspace containing
{xα} is the whole space E). A complete orthonormal system is called an orthonormal basis.

Examples
1. ℝ^n is a real n-space, the set of n-tuples x = (x1, ..., xn),
   y = (y1, ..., yn). If we define the dot product as
   $(x, y) = \sum_{i=1}^{n} x_i y_i$
   we get Euclidean n-space. The corresponding norms
   and distances in ℝ^n are
   $\|x\| = \sqrt{\sum_{i=1}^{n} x_i^2}$
   $\rho(x, y) = \|x - y\| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$.
   The vectors
   e1 = (1, 0, 0, ..., 0)
   e2 = (0, 1, 0, ..., 0)
   ...
   en = (0, 0, 0, ..., 1)
   form an orthonormal basis in ℝ^n.
2. The space l2 with elements x = (x1, x2, ..., xn, ...), y =
   (y1, y2, ..., yn, ...), ..., where
   $\sum_{i=1}^{\infty} x_i^2 < \infty,\ \sum_{i=1}^{\infty} y_i^2 < \infty,\ \ldots$,
   becomes an infinite-dimensional Euclidean space when
   equipped with the dot product
   $(x, y) = \sum_{i=1}^{\infty} x_i y_i$.
   The simplest orthonormal basis in l2 consists of vectors
   e1 = (1, 0, 0, 0, ...)
   e2 = (0, 1, 0, 0, ...)
   e3 = (0, 0, 1, 0, ...)
   e4 = (0, 0, 0, 1, ...)
   ...
   There are an infinite number of these bases.
3. The space C2[a, b] consisting of all continuous functions
   on [a, b] equipped with the dot product
   $(f, g) = \int_a^b f(t)\,g(t)\,dt$
   is another example of Euclidean space.
   An important example of orthogonal bases in this space
   is the following set of functions
   $1,\ \cos\frac{2\pi n t}{b - a},\ \sin\frac{2\pi n t}{b - a} \quad (n = 1, 2, \ldots)$.
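
The orthogonality of the trigonometric system in the last example can be checked numerically. This is a sketch only: the interval [1, 4], the midpoint-rule quadrature, and the helper name `inner` are assumptions of the illustration, and only the first few functions of the system are tested.

```python
import math

def inner(f, g, a, b, n=20_000):
    """Midpoint-rule approximation of (f, g) = integral over [a, b] of f(t) g(t) dt."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(n)) * h

a, b = 1.0, 4.0  # arbitrary interval chosen for illustration
w = 2 * math.pi / (b - a)

# 1, cos(w t), sin(w t), cos(2 w t), sin(2 w t), cos(3 w t), sin(3 w t)
system = [lambda t: 1.0]
for n in (1, 2, 3):
    system.append(lambda t, n=n: math.cos(n * w * t))
    system.append(lambda t, n=n: math.sin(n * w * t))

# Gram matrix of pairwise dot products: off-diagonal entries should vanish.
gram = [[inner(f, g, a, b) for g in system] for f in system]
for row in gram:
    print(["%.3f" % v for v in row])
```

The diagonal entries are b − a for the constant function and (b − a)/2 for the others, so the system is orthogonal but not orthonormal; dividing each function by the square root of its diagonal entry would normalize it.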

Hilbert space
A Hilbert space is a Euclidean space that is complete,
separable, and generally infinite-dimensional.
A Hilbert space is a set H of elements f, g, ... for which
1. H is a Euclidean space equipped with a scalar product
2. H is complete with respect to the metric ρ(f, g) = ‖f − g‖
3. H is separable (contains a countable everywhere dense
   subset)
4. (generally) H is infinite-dimensional.
l2 and L2 are examples of Hilbert spaces.

The δ function
We now consider the functional which returns the value of
f ∈ C at the location t (an evaluation functional),
$\delta_t[f] = f(t)$.
Note that this functional is degenerate because it does not
depend on the entire function f, but only on the value of
f at the specific location t.
The δ(t) is not a function but a distribution.

The δ function (cont.)
The same functional can be written as
$\delta_t[f] = f(t) = \int_{-\infty}^{\infty} f(s)\,\delta(s - t)\,ds$.
No ordinary function exists (in L2) that behaves like δ(t);
one can think of δ(t) as a function that vanishes for t ≠ 0
and takes infinite value at t = 0 in such a way that
$\int_{-\infty}^{\infty} \delta(t)\,dt = 1$.

The δ function (cont.)
The δ function can be seen as the limit of a sequence of
ordinary functions. For example, if
$r_\epsilon(t) = \frac{1}{\epsilon}\,(U(t) - U(t - \epsilon))$
is a rectangular pulse of unit area, consider the limit
$\lim_{\epsilon \to 0} \int_{-\infty}^{\infty} f(s)\,r_\epsilon(s - t)\,ds$.
By definition of $r_\epsilon$ this gives
$\lim_{\epsilon \to 0} \frac{1}{\epsilon} \int_t^{t+\epsilon} f(s)\,ds = f(t)$
because f is continuous.

Fourier Transform
The Fourier Transform of a real-valued function f ∈ L1 is
the complex-valued function $\tilde{f}(\omega)$ defined as
$\mathcal{F}[f(x)] = \tilde{f}(\omega) = \int_{-\infty}^{+\infty} f(x)\,e^{-j\omega x}\,dx$.
The FT $\tilde{f}$ can be thought of as a representation of the
information content of f(x). The original function f can
be obtained through the inverse Fourier Transform as
$f(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \tilde{f}(\omega)\,e^{j\omega x}\,d\omega$.

Properties
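$f(t) \;\Leftrightarrow\; F(\omega)$
$f(at) \;\Leftrightarrow\; \frac{1}{|a|}\,F\!\left(\frac{\omega}{a}\right)$
$f^*(t) \;\Leftrightarrow\; F^*(-\omega)$
$F(t) \;\Leftrightarrow\; 2\pi f(-\omega)$
$f(t - t_0) \;\Leftrightarrow\; F(\omega)\,e^{-j t_0 \omega}$
$f(t)\,e^{j\omega_0 t} \;\Leftrightarrow\; F(\omega - \omega_0)$
$\frac{d^n f(t)}{dt^n} \;\Leftrightarrow\; (j\omega)^n F(\omega)$
$(-jt)^n f(t) \;\Leftrightarrow\; \frac{d^n F(\omega)}{d\omega^n}$
$\int_{-\infty}^{\infty} f_1(\tau)\,f_2(t - \tau)\,d\tau \;\Leftrightarrow\; F_1(\omega)\,F_2(\omega)$
$\int_{-\infty}^{\infty} f^*(\tau)\,f(t + \tau)\,d\tau \;\Leftrightarrow\; |F(\omega)|^2$

Properties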
The box and the sinc
f(t) = 1 if −a ≤ t ≤ a and 0 otherwise
$F(\omega) = \frac{2\sin(a\omega)}{\omega}$.
[Figure: plots of the box f(t) and the sinc F(ω), each over −10 to 10.]

Properties
The Gaussian
$f(t) = e^{-at^2}$
$F(\omega) = \sqrt{\frac{\pi}{a}}\,e^{-\omega^2/4a}$.
[Figure: plots of the Gaussian f(t) and its transform F(ω), each over −10 to 10.]

Properties
The Laplacian and Cauchy distributions
$f(t) = e^{-a|t|}$
$F(\omega) = \frac{2a}{a^2 + \omega^2}$.
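
The three transform pairs (box and sinc, Gaussian, Laplacian and Cauchy) can be checked numerically against the definition F(ω) = ∫ f(t) e^{−jωt} dt. The sketch below assumes a midpoint rule on a truncated interval, uses the fact that all three functions are even (so only the cosine part of the integral survives), and picks a = 1.5 and ω = 0.7 arbitrarily; the helper name `fourier_transform` is hypothetical.

```python
import math

def fourier_transform(f, omega, t_min, t_max, n=200_000):
    """Midpoint-rule approximation of the integral of f(t) exp(-j omega t) over [t_min, t_max].

    For the even, real-valued functions used here the sine part integrates
    to zero, so only the cosine part is accumulated and the result is real.
    """
    h = (t_max - t_min) / n
    total = 0.0
    for k in range(n):
        t = t_min + (k + 0.5) * h
        total += f(t) * math.cos(omega * t)
    return total * h

a = 1.5      # illustrative parameter
omega = 0.7  # illustrative frequency

box = fourier_transform(lambda t: 1.0, omega, -a, a)
gauss = fourier_transform(lambda t: math.exp(-a * t * t), omega, -20.0, 20.0)
laplace = fourier_transform(lambda t: math.exp(-a * abs(t)), omega, -40.0, 40.0)

print(box, 2 * math.sin(a * omega) / omega)
print(gauss, math.sqrt(math.pi / a) * math.exp(-omega ** 2 / (4 * a)))
print(laplace, 2 * a / (a ** 2 + omega ** 2))
```

Each printed line shows the numerical integral next to the closed form from the corresponding slide; the truncation limits are wide enough that the neglected tails are far below the quadrature error.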
[Figure: plots of the Laplacian f(t) and the Cauchy-shaped transform F(ω), each over −10 to 10.]

Fourier Transform in the distribution sense
With due care, the Fourier Transform c...
Spring '04
RuthRosenholtz
Space, Metric space, Hilbert space