finds the best linear combinations in a column space to approximate the columns of $A$. Since any rank-$r$ matrix can be represented this way, the optimization program (2) is equivalent to (1); if $\hat{Q}, \hat{\Theta}$ solve (2), then $\hat{A} = \hat{Q}\hat{\Theta}$ solves (1).
Also note that
$$\hat{\Theta} = U_r^T U \Sigma V^T = \begin{bmatrix} I & 0 \end{bmatrix} \Sigma V^T,$$
where $I$ is the $r \times r$ identity matrix, and $0$ is an $r \times (R - r)$ matrix of zeros. This matrix of all zeros has the same effect as removing all but the first $r$ terms along the diagonal of $\Sigma$ and all but the first $r$ rows of $V^T$. Thus
$$\hat{Q}\hat{\Theta} = U_r \begin{bmatrix} I & 0 \end{bmatrix} \Sigma V^T = U_r \Sigma_r V_r^T.$$
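The identity above can be checked numerically. The following is a minimal sketch, assuming NumPy and a small random test matrix (the matrix size, the seed, and the variable names are illustrative choices, not part of the notes). It forms $U_r^T U$ and compares it with $\begin{bmatrix} I & 0 \end{bmatrix}$, then compares $U_r^T A$ with $\Sigma_r V_r^T$.

```python
import numpy as np

# Illustrative test matrix; the shape and seed are arbitrary assumptions.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
r = 2

# Reduced SVD: A = U @ diag(s) @ Vt, here with R = 4 singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_r = U[:, :r]

theta_hat = U_r.T @ A                      # U_r^T U Sigma V^T
selector = U_r.T @ U                       # expected to equal [I 0]
sigma_r_vt_r = np.diag(s[:r]) @ Vt[:r, :]  # Sigma_r V_r^T

print(np.allclose(selector, np.eye(r, len(s))))
print(np.allclose(theta_hat, sigma_r_vt_r))
```

Both comparisons rely only on the orthonormality of the columns of $U$, which is why the check does not depend on the particular matrix chosen.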
What is the error between $A$ and its best rank-$r$ approximation $\hat{A}$? Well,
$$A - \hat{A} = \sum_{p=r+1}^{R} \sigma_p u_p v_p^T,$$
and so the error matrix has singular values $\sigma_{r+1}, \ldots, \sigma_R$. Since the Frobenius norm (squared) can be calculated by summing the squares of the singular values,
$$\|A - \hat{A}\|_F^2 = \sum_{p=r+1}^{R} \sigma_p^2.$$
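The error formula can be illustrated with a short numerical experiment. This is a sketch under stated assumptions: it uses NumPy, a random $8 \times 5$ matrix, and a helper named `best_rank_r` that is hypothetical (introduced here only for illustration). It truncates the SVD to $r$ terms and compares the squared Frobenius error against the sum of the squared discarded singular values.

```python
import numpy as np

def best_rank_r(A, r):
    """Hypothetical helper: keep the r largest terms of the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_hat = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
    return A_hat, s

# Illustrative data; the shape and seed are arbitrary assumptions.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
r = 2

A_hat, s = best_rank_r(A, r)

err_sq = np.linalg.norm(A - A_hat, "fro") ** 2  # ||A - A_hat||_F^2
tail_sq = np.sum(s[r:] ** 2)                    # sum of sigma_p^2 for p > r

print(np.isclose(err_sq, tail_sq))
```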
Georgia Tech ECE 6250 Fall 2019; Notes by J. Romberg and M. Davenport. Last updated 23:01, November 5, 2019
Solving TLS – Part II
Recall that the TLS problem reduced to solving a problem of the form
$$\underset{X}{\text{minimize}} \ \|C - X\|_F^2 \quad \text{subject to} \quad \operatorname{rank}(X) = N,$$
and then taking $\widehat{\Delta} = \widehat{X} - C$. In light of our discussion above, we take the SVD of $C$,
$$C = W \Gamma Z^T = \sum_{n=1}^{N+1} \gamma_n w_n z_n^T,$$
and create $\widehat{X}$ by leaving out the last term in the sum above$^3$:
$$\widehat{X} = \sum_{n=1}^{N} \gamma_n w_n z_n^T.$$
Then
$$\widehat{\Delta} = \widehat{X} - C = -\gamma_{N+1} w_{N+1} z_{N+1}^T.$$
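As a quick consequence, the size of the TLS perturbation is set by the smallest singular value of $C$. Applying the low-rank error formula from the previous section with $R = N+1$ and $r = N$ (only the last term of the sum is discarded) gives
$$\|\widehat{\Delta}\|_F^2 = \|C - \widehat{X}\|_F^2 = \gamma_{N+1}^2.$$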
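Now we are ready to construct the actual estimate $\hat{x}$. Recall that we want a vector such that
$$(C + \widehat{\Delta}) \begin{bmatrix} x \\ -1 \end{bmatrix} = 0,$$
meaning
$$\widehat{X} \begin{bmatrix} x \\ -1 \end{bmatrix} = 0.$$
The null space of $\widehat{X}$ is (by construction) simply the span of $z_{N+1}$, meaning we need to find a scalar $\alpha$ such that
$$\begin{bmatrix} x \\ -1 \end{bmatrix} = \alpha z_{N+1}.$$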
$^3$ If $C$ has fewer than $N+1$ nonzero singular values, then it is already rank deficient, and we can take $\widehat{X} = C \Rightarrow \widehat{\Delta} = 0$.
Thus we can take
$$\hat{x}_{\mathrm{TLS}} = \frac{-1}{z_{N+1}[N+1]} \cdot \begin{bmatrix} z_{N+1}[1] \\ z_{N+1}[2] \\ \vdots \\ z_{N+1}[N] \end{bmatrix}.$$
If it happens that $z_{N+1}[N+1] = 0$, this means $\widehat{\Delta}_y = 0$, and we would need an $x$ such that
$$(A + \widehat{\Delta}_A)\, x = y.$$
Such an $x$ may or may not exist (and probably doesn't), so in this case there is no TLS solution.
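The construction above translates fairly directly into code. The following is a minimal sketch, not a definitive implementation: it assumes NumPy, assumes $A$ is $M \times N$ with $M \ge N+1$ (so that the last row of the returned $Z^T$ is $z_{N+1}^T$), and uses a function name `tls_solve` and a threshold `tol` that are hypothetical choices made here for illustration. It raises an error in the degenerate case where the last entry of $z_{N+1}$ is numerically zero.

```python
import numpy as np

def tls_solve(A, y, tol=1e-12):
    """Sketch of the TLS estimate from the SVD of C = [A y].

    Assumes A has shape (M, N) with M >= N + 1. The name and the
    tolerance `tol` are illustrative assumptions.
    """
    C = np.column_stack([A, y])
    W, gamma, Zt = np.linalg.svd(C, full_matrices=False)
    w_last, z_last = W[:, -1], Zt[-1, :]  # w_{N+1} and z_{N+1}
    if abs(z_last[-1]) < tol:
        raise ValueError("last entry of z_{N+1} is numerically zero: no TLS solution")
    # [x; -1] = alpha * z_{N+1} with alpha = -1 / z_{N+1}[N+1]
    x_tls = -z_last[:-1] / z_last[-1]
    # Delta_hat = -gamma_{N+1} * w_{N+1} * z_{N+1}^T
    delta_hat = -gamma[-1] * np.outer(w_last, z_last)
    return x_tls, delta_hat

# Illustrative data: a noisy linear model (sizes, seed, noise level are assumptions).
rng = np.random.default_rng(2)
A = rng.standard_normal((20, 3))
x_true = np.array([1.0, -2.0, 0.5])
y = A @ x_true + 0.01 * rng.standard_normal(20)

x_tls, delta_hat = tls_solve(A, y)

# Check the defining property (C + Delta_hat) [x; -1] = 0.
C = np.column_stack([A, y])
residual = (C + delta_hat) @ np.append(x_tls, -1.0)
print(np.allclose(residual, 0.0, atol=1e-10))
```

The fixed threshold on the last entry of $z_{N+1}$ is a simplification; in practice a scale-aware test may be preferable.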
In the special case where the smallest singular value of $C = \begin{bmatrix} A & y \end{bmatrix}$ is not unique, i.e.
$$\gamma_1 \geq \gamma_2 \geq \cdots \geq \gamma_q > \gamma_{q+1} = \gamma_{q+2} = \cdots = \gamma_{N+1},$$
for some $q < N$, then the TLS solution may not be unique. We take
$$Z' = \begin{bmatrix} z_{q+1} & z_{q+2} & \cdots & z_{N+1} \end{bmatrix},$$
and try to find a vector in the span that has the right form; any vector $x$ such that
$$\begin{bmatrix} x \\ -1 \end{bmatrix} \in \operatorname{Span}(\{z_{q+1}, \ldots, z_{N+1}\})$$
is equally good. All we need is a $\beta$ such that the last entry of $Z'\beta$ is equal to $-1$.
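One way to pick such a $\beta$ is sketched below. This is an assumption on our part, not a choice made in the notes: among all $\beta$ whose image under $Z'$ has last entry $-1$, the code selects the one of smallest norm, $\beta = -c / \|c\|_2^2$, where $c$ is the last row of $Z'$. The function name `tls_min_norm`, the threshold `tol`, and the synthetic test matrix are all hypothetical, and the index $q$ is assumed to be known to the caller.

```python
import numpy as np

def tls_min_norm(A, y, q, tol=1e-12):
    """Sketch: one TLS solution when gamma_{q+1} = ... = gamma_{N+1}.

    Picks the smallest-norm beta with (Z' beta)[last] = -1. The name,
    `tol`, and the minimum-norm choice are illustrative assumptions.
    """
    C = np.column_stack([A, y])
    _, _, Zt = np.linalg.svd(C, full_matrices=False)
    Z_prime = Zt[q:, :].T   # columns z_{q+1}, ..., z_{N+1}
    c = Z_prime[-1, :]      # last row of Z'
    if np.linalg.norm(c) < tol:
        raise ValueError("last row of Z' is numerically zero: no TLS solution")
    beta = -c / (c @ c)     # guarantees c @ beta == -1
    v = Z_prime @ beta      # v = [x; -1]
    return v[:-1]

# Synthetic C with a repeated smallest singular value (all values are assumptions).
rng = np.random.default_rng(3)
M, N, q = 10, 3, 2
W, _ = np.linalg.qr(rng.standard_normal((M, N + 1)))
Z_true, _ = np.linalg.qr(rng.standard_normal((N + 1, N + 1)))
gammas = np.array([3.0, 2.0, 0.5, 0.5])
C = W @ np.diag(gammas) @ Z_true.T
A, y = C[:, :N], C[:, N]

x = tls_min_norm(A, y, q)

# [x; -1] should lie in the span of the last N + 1 - q right singular vectors.
v = np.append(x, -1.0)
Z_tail = Z_true[:, q:]
print(np.allclose(Z_tail @ (Z_tail.T @ v), v))
```

Because the columns of $Z'$ are orthonormal, the smallest-norm $\beta$ also yields the smallest-norm $x$ among the equally good candidates, which is one reasonable tie-breaking rule.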
Principal Components Analysis
Principal Components Analysis (PCA) is a standard technique for dimensionality reduction of data sets. It is a way to automatically find a subspace which approximates the data. It is used everywhere in signal processing, machine learning, and statistics.