The above is our classic closest point problem, and is optimized by taking $\theta_n = Q^T a_n$ (since the columns of $Q$ are orthonormal). Thus we can write the original problem (2) as

\[
\underset{Q:\,M\times r}{\text{minimize}} \; \sum_{n=1}^{N} \left\| a_n - QQ^T a_n \right\|_2^2 \quad \text{subject to} \quad Q^T Q = I,
\]

and then take $\widehat{\Theta} = \widehat{Q}^T A$.
Expanding the functional and using the fact that $(I - QQ^T)^2 = (I - QQ^T)$, we have

\[
\sum_{n=1}^{N} \left\| a_n - QQ^T a_n \right\|_2^2
= \sum_{n=1}^{N} a_n^T (I - QQ^T) a_n
= \sum_{n=1}^{N} \|a_n\|_2^2 - a_n^T QQ^T a_n.
\]
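Both facts used above can be verified numerically. A sketch (the sizes and random data are my own assumptions):

```python
import numpy as np

# Sketch (assumed sizes/data): I - QQ^T is an orthogonal projector,
# hence idempotent, and the two sides of the expansion agree.
rng = np.random.default_rng(1)
M, r, N = 5, 2, 4
Q, _ = np.linalg.qr(rng.standard_normal((M, r)))
A = rng.standard_normal((M, N))
P = np.eye(M) - Q @ Q.T

assert np.allclose(P @ P, P)  # (I - QQ^T)^2 = I - QQ^T

lhs = sum(np.linalg.norm(A[:, n] - Q @ Q.T @ A[:, n]) ** 2 for n in range(N))
rhs = sum(np.linalg.norm(A[:, n]) ** 2 - A[:, n] @ Q @ Q.T @ A[:, n] for n in range(N))
assert np.isclose(lhs, rhs)
```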
Georgia Tech ECE 6250 Fall 2019; Notes by J. Romberg and M. Davenport. Last updated 23:01, November 5, 2019
Since the first term does not depend on $Q$, our optimization program is equivalent to

\[
\underset{Q:\,M\times r}{\text{maximize}} \; \sum_{n=1}^{N} a_n^T QQ^T a_n \quad \text{subject to} \quad Q^T Q = I.
\]
Now recall that for any vector $v$, $\langle v, v\rangle = \operatorname{trace}(vv^T)$. Thus

\[
\sum_{n=1}^{N} a_n^T QQ^T a_n
= \sum_{n=1}^{N} \operatorname{trace}\left( Q^T a_n a_n^T Q \right)
= \operatorname{trace}\!\left( Q^T \left( \sum_{n=1}^{N} a_n a_n^T \right) Q \right)
= \operatorname{trace}\left( Q^T (AA^T) Q \right).
\]
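This trace identity is easy to sanity-check numerically. A sketch of my own (sizes and data assumed):

```python
import numpy as np

# Sketch (assumed sizes/data): sum_n a_n^T QQ^T a_n = trace(Q^T (AA^T) Q).
rng = np.random.default_rng(2)
M, r, N = 5, 2, 7
Q, _ = np.linalg.qr(rng.standard_normal((M, r)))
A = rng.standard_normal((M, N))

lhs = sum(A[:, n] @ Q @ Q.T @ A[:, n] for n in range(N))
rhs = np.trace(Q.T @ (A @ A.T) @ Q)
assert np.isclose(lhs, rhs)
```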
The matrix $AA^T$ has eigenvalue decomposition

\[
AA^T = U\Sigma^2 U^T,
\]

where $U$ and $\Sigma$ come from the SVD of $A$ (we will take $U$ to be $M\times M$, possibly adding zeros down the diagonal of $\Sigma^2$).
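Numerically, the full SVD gives exactly this eigenvalue decomposition once $\Sigma^2$ is padded with zeros. A sketch (shapes assumed, with $M > N$ so that padding is needed):

```python
import numpy as np

# Sketch (assumed shapes): the full SVD of A yields AA^T = U Sigma^2 U^T,
# with zeros padded onto the diagonal of Sigma^2 when M > N.
rng = np.random.default_rng(3)
M, N = 6, 4
A = rng.standard_normal((M, N))
U, s, Vt = np.linalg.svd(A, full_matrices=True)  # U is M x M

sigma2 = np.zeros(M)
sigma2[: len(s)] = s ** 2  # zeros down the rest of the diagonal
assert np.allclose(A @ A.T, U @ np.diag(sigma2) @ U.T)
```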
Now

\[
\operatorname{trace}\left( Q^T (AA^T) Q \right)
= \operatorname{trace}\left( Q^T U \Sigma^2 U^T Q \right)
= \operatorname{trace}\left( W^T \Sigma^2 W \right),
\]
where $W = U^T Q$. Notice that $W$ also has orthonormal columns, as $W^T W = Q^T UU^T Q = Q^T Q = I$. Thus our optimization program has become

\[
\underset{W:\,M\times r}{\text{maximize}} \; \operatorname{trace}(W^T \Sigma^2 W) \quad \text{subject to} \quad W^T W = I.
\]
After we solve this, we can take any $\widehat{Q}$ such that $\widehat{W} = U^T \widehat{Q}$.
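The change of variables is invertible because $U$ is orthogonal. A quick sketch (my own setup, not from the notes) confirming that $W = U^T Q$ keeps orthonormal columns and that $Q$ is recovered as $UW$:

```python
import numpy as np

# Sketch (assumed setup): W = U^T Q has orthonormal columns,
# and the change of variables is undone by Q = U W.
rng = np.random.default_rng(4)
M, N, r = 6, 4, 2
A = rng.standard_normal((M, N))
U, s, Vt = np.linalg.svd(A, full_matrices=True)
Q, _ = np.linalg.qr(rng.standard_normal((M, r)))

W = U.T @ Q
assert np.allclose(W.T @ W, np.eye(r))  # W^T W = Q^T U U^T Q = I
assert np.allclose(U @ W, Q)            # invert the change of variables
```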
This last optimization program is equivalent to a simple linear program that is solvable by inspection. Let $w_1, \ldots, w_r$ be the columns of $W$. Then

\[
\operatorname{trace}(W^T \Sigma^2 W)
= \sum_{p=1}^{r} w_p^T \Sigma^2 w_p
= \sum_{p=1}^{r} \sum_{m=1}^{M} \left( w_p[m] \right)^2 \sigma_m^2
= \sum_{m=1}^{M} h[m]\, \sigma_m^2,
\]

where

\[
h[m] = \sum_{p=1}^{r} \left( w_p[m] \right)^2.
\]
Notice that

\[
h[m] = \sum_{p=1}^{r} \left( W[m, p] \right)^2
\]

is a sum of the squares of a row of $W$. Since the sum of the squares of every column of $W$ is one, the sum of the squares of all the entries in $W$ must be $r$, and so

\[
\sum_{m=1}^{M} h[m] = r.
\]
It is clear that $h[m]$ is nonnegative, but it is also true that $h[m] \le 1$. Here is why: since the columns of $W$ are orthonormal, they can be considered as part of an orthonormal basis for $\mathbb{R}^M$. That is, there is an $M \times (M - r)$ matrix $W_0$ such that the $M \times M$ matrix $[W \; W_0]$ has both orthonormal columns and orthonormal rows; thus the sum of the squares of each row is equal to one. Since $h[m]$ sums the squares of only the first $r$ entries of row $m$, it cannot be larger than this.
Thus the maximum value $\operatorname{trace}(W^T \Sigma^2 W)$ can take is given by the linear program

\[
\underset{h \in \mathbb{R}^M}{\text{maximize}} \; \sum_{m=1}^{M} h[m]\, \sigma_m^2
\quad \text{subject to} \quad
\sum_{m=1}^{M} h[m] = r, \quad 0 \le h[m] \le 1.
\]
We can intuit the answer to this program. Since all of the $\sigma_m^2$ and all of the $h[m]$ are nonnegative, we want to have as much weight as possible assigned to the largest singular values. Since the weights are constrained to be at most 1, this simply means we “max out” the first $r$ terms; the solution to the program above is

\[
\widehat{h}[m] =
\begin{cases}
1, & m = 1, \ldots, r \\
0, & m = r + 1, \ldots, M.
\end{cases}
\]
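Putting the whole chain together: with $\widehat{h}$ as above, a maximizing $\widehat{W}$ is the first $r$ columns of the identity, so $\widehat{Q} = U\widehat{W}$ is the first $r$ left singular vectors of $A$, and the achieved value is $\sigma_1^2 + \cdots + \sigma_r^2$. A numerical sketch of my own (sizes and data assumed):

```python
import numpy as np

# Sketch (assumed sizes/data): Q_hat = U[:, :r] achieves
# trace(Q^T AA^T Q) = sigma_1^2 + ... + sigma_r^2, and no other Q
# with orthonormal columns does better.
rng = np.random.default_rng(6)
M, N, r = 6, 10, 2
A = rng.standard_normal((M, N))
U, s, _ = np.linalg.svd(A)

Qhat = U[:, :r]
value = np.trace(Qhat.T @ (A @ A.T) @ Qhat)
assert np.isclose(value, np.sum(s[:r] ** 2))

# A random Q with orthonormal columns never beats it.
for _ in range(50):
    Q, _ = np.linalg.qr(rng.standard_normal((M, r)))
    assert np.trace(Q.T @ (A @ A.T) @ Q) <= value + 1e-9
```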