Scientific Computing: An Introductory Survey
Chapter 3 – Linear Least Squares

Prof. Michael T. Heath
Department of Computer Science
University of Illinois at Urbana-Champaign

Copyright © 2002. Reproduction permitted for noncommercial, educational use only.

Outline
1. Least Squares Data Fitting
2. Existence, Uniqueness, and Conditioning
3. Solving Linear Least Squares Problems

Method of Least Squares
Measurement errors are inevitable in observational and experimental sciences.

Errors can be smoothed out by averaging over many cases, i.e., taking more measurements than are strictly necessary to determine parameters of system.

Resulting system is overdetermined, so usually there is no exact solution.

In effect, higher dimensional data are projected into lower dimensional space to suppress irrelevant detail.

Such projection is most conveniently accomplished by method of least squares.

Linear Least Squares

For linear problems, we obtain overdetermined linear system Ax = b, with m × n matrix A, m > n.

System is better written Ax ≅ b, since equality is usually not exactly satisfiable when m > n.

Least squares solution x minimizes squared Euclidean norm of residual vector r = b − Ax:

    min_x ‖r‖₂² = min_x ‖b − Ax‖₂²
Data Fitting

Given m data points (tᵢ, yᵢ), find n-vector x of parameters that gives "best fit" to model function f(t, x):

    min_x Σᵢ₌₁ᵐ (yᵢ − f(tᵢ, x))²

Problem is linear if function f is linear in components of x:

    f(t, x) = x₁φ₁(t) + x₂φ₂(t) + · · · + xₙφₙ(t)

where the functions φⱼ depend only on t.

Problem can then be written in matrix form as Ax ≅ b, with aᵢⱼ = φⱼ(tᵢ) and bᵢ = yᵢ.

Data Fitting, continued

Polynomial fitting,

    f(t, x) = x₁ + x₂t + x₃t² + · · · + xₙt^(n−1)

is linear, since polynomial is linear in its coefficients, though nonlinear in independent variable t.

Fitting sum of exponentials,

    f(t, x) = x₁e^(x₂t) + · · · + x_(n−1)e^(xₙt)

is example of nonlinear problem.

For now, we will consider only linear least squares problems.
Example: Data Fitting

Fitting quadratic polynomial to five data points gives linear least squares problem

    [ 1  t₁  t₁² ]            [ y₁ ]
    [ 1  t₂  t₂² ] [ x₁ ]     [ y₂ ]
    [ 1  t₃  t₃² ] [ x₂ ]  ≅  [ y₃ ]  = b
    [ 1  t₄  t₄² ] [ x₃ ]     [ y₄ ]
    [ 1  t₅  t₅² ]            [ y₅ ]

Matrix whose columns (or rows) are successive powers of independent variable is called Vandermonde matrix.

Example, continued

For data

    t   −1.0   −0.5   0.0   0.5   1.0
    y    1.0    0.5   0.0   0.5   2.0

overdetermined 5 × 3 linear system is

    [ 1  −1.0  1.00 ]            [ 1.0 ]
    [ 1  −0.5  0.25 ] [ x₁ ]     [ 0.5 ]
    [ 1   0.0  0.00 ] [ x₂ ]  ≅  [ 0.0 ]  = b
    [ 1   0.5  0.25 ] [ x₃ ]     [ 0.5 ]
    [ 1   1.0  1.00 ]            [ 2.0 ]

Solution, which we will see later how to compute, is

    x = [0.086  0.40  1.4]ᵀ

so approximating polynomial is

    p(t) = 0.086 + 0.4t + 1.4t²
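This fit can be reproduced in a few lines of NumPy. The slides do not use any particular software; this snippet (and the later ones) is an illustrative sketch, not part of the original lecture.

```python
# Least squares fit of p(t) = x1 + x2*t + x3*t^2 to the five data points
# above, using NumPy's built-in least squares solver.
import numpy as np

t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y = np.array([1.0, 0.5, 0.0, 0.5, 2.0])

# Vandermonde-style matrix: columns are successive powers of t
A = np.column_stack([np.ones_like(t), t, t**2])

x, residual, rank, sv = np.linalg.lstsq(A, y, rcond=None)
# x is approximately [0.086, 0.400, 1.429]
```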
Example, continued

Resulting curve and original data points are shown in graph. (Figure: the fitted quadratic passing near the five data points.)

Existence and Uniqueness

Linear least squares problem Ax ≅ b always has solution.

Solution is unique if, and only if, columns of A are linearly independent, i.e., rank(A) = n, where A is m × n.

If rank(A) < n, then A is rank-deficient, and solution of linear least squares problem is not unique.

For now, we assume A has full column rank n.
Normal Equations

To minimize squared Euclidean norm of residual vector r,

    ‖r‖₂² = rᵀr = (b − Ax)ᵀ(b − Ax) = bᵀb − 2xᵀAᵀb + xᵀAᵀAx

take derivative with respect to x and set it to 0:

    2AᵀAx − 2Aᵀb = 0

which reduces to n × n linear system of normal equations

    AᵀAx = Aᵀb

Orthogonality

Vectors v₁ and v₂ are orthogonal if their inner product is zero, v₁ᵀv₂ = 0.

Space spanned by columns of m × n matrix A, span(A) = {Ax : x ∈ ℝⁿ}, is of dimension at most n.

If m > n, b generally does not lie in span(A), so there is no exact solution to Ax = b.

Vector y = Ax in span(A) closest to b in 2-norm occurs when residual r = b − Ax is orthogonal to span(A):

    0 = Aᵀr = Aᵀ(b − Ax)

again giving system of normal equations AᵀAx = Aᵀb.
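The normal equations can be formed and solved directly. A NumPy sketch using the data-fitting example from earlier (illustrative, not from the slides):

```python
# Solve the normal equations A^T A x = A^T b for the 5x3 fitting problem,
# and verify that the residual is orthogonal to span(A).
import numpy as np

t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
b = np.array([1.0, 0.5, 0.0, 0.5, 2.0])
A = np.column_stack([np.ones_like(t), t, t**2])

x = np.linalg.solve(A.T @ A, A.T @ b)
r = b - A @ x          # A^T r should be (numerically) zero
```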
Orthogonality, continued

Geometric relationships among b, r, and span(A) are shown in diagram. (Figure: b decomposed into y = Ax lying in span(A) and residual r orthogonal to it.)

Orthogonal Projectors

Matrix P is orthogonal projector if it is idempotent (P² = P) and symmetric (Pᵀ = P).

Orthogonal projector onto orthogonal complement span(P)⊥ is given by P⊥ = I − P.

For any vector v,

    v = (P + (I − P)) v = Pv + P⊥v

For least squares problem Ax ≅ b, if rank(A) = n, then

    P = A(AᵀA)⁻¹Aᵀ

is orthogonal projector onto span(A), and

    b = Pb + P⊥b = Ax + (b − Ax) = y + r
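The projector identities are easy to check numerically; a short NumPy sketch with the same example matrix (illustrative only):

```python
# Build P = A (A^T A)^{-1} A^T and verify it is idempotent and symmetric,
# and that it splits b into y = Ax plus an orthogonal residual.
import numpy as np

t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
b = np.array([1.0, 0.5, 0.0, 0.5, 2.0])
A = np.column_stack([np.ones_like(t), t, t**2])

P = A @ np.linalg.inv(A.T @ A) @ A.T
y = P @ b                    # projection of b onto span(A), equals A x
r = (np.eye(5) - P) @ b      # residual component, orthogonal to span(A)
```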
Pseudoinverse and Condition Number

Nonsquare m × n matrix A has no inverse in usual sense.

If rank(A) = n, pseudoinverse is defined by

    A⁺ = (AᵀA)⁻¹Aᵀ

and condition number by

    cond(A) = ‖A‖₂ · ‖A⁺‖₂

By convention, cond(A) = ∞ if rank(A) < n.

Just as condition number of square matrix measures closeness to singularity, condition number of rectangular matrix measures closeness to rank deficiency.

Least squares solution of Ax ≅ b is given by x = A⁺b.

Sensitivity and Conditioning

Sensitivity of least squares solution to Ax ≅ b depends on b as well as A.

Define angle θ between b and y = Ax by

    cos(θ) = ‖y‖₂ / ‖b‖₂ = ‖Ax‖₂ / ‖b‖₂

Bound on perturbation ∆x in solution x due to perturbation ∆b in b is given by

    ‖∆x‖₂ / ‖x‖₂  ≤  cond(A) · (1 / cos(θ)) · ‖∆b‖₂ / ‖b‖₂
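The definition cond(A) = ‖A‖₂ · ‖A⁺‖₂ agrees with the condition number NumPy computes from singular values; a quick illustrative check:

```python
# Condition number of a rectangular matrix via the pseudoinverse,
# compared against numpy's own cond (computed from the SVD).
import numpy as np

t = np.linspace(-1, 1, 5)
A = np.column_stack([np.ones(5), t, t**2])

A_pinv = np.linalg.pinv(A)       # equals (A^T A)^{-1} A^T when rank(A) = n
cond = np.linalg.norm(A, 2) * np.linalg.norm(A_pinv, 2)
```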
Sensitivity and Conditioning, continued

Similarly, for perturbation E in matrix A,

    ‖∆x‖₂ / ‖x‖₂  ≲  ( [cond(A)]² tan(θ) + cond(A) ) · ‖E‖₂ / ‖A‖₂

Condition number of least squares solution is about cond(A) if residual is small, but can be squared or arbitrarily worse for large residual.

Normal Equations Method

If m × n matrix A has rank n, then symmetric n × n matrix AᵀA is positive definite, so its Cholesky factorization

    AᵀA = LLᵀ

can be used to obtain solution x to system of normal equations

    AᵀAx = Aᵀb

which has same solution as linear least squares problem Ax ≅ b.

Normal equations method involves transformations

    rectangular  →  square  →  triangular
Example: Normal Equations Method

For polynomial data-fitting example given previously, normal equations method gives

           [  1     1    1    1    1  ] [ 1  −1.0  1.00 ]     [ 5.0  0.0  2.5   ]
    AᵀA =  [ −1.0  −0.5  0.0  0.5  1.0] [ 1  −0.5  0.25 ]  =  [ 0.0  2.5  0.0   ]
           [  1.0   0.25 0.0  0.25 1.0] [ 1   0.0  0.00 ]     [ 2.5  0.0  2.125 ]
                                        [ 1   0.5  0.25 ]
                                        [ 1   1.0  1.00 ]

           [  1     1    1    1    1  ] [ 1.0 ]     [ 4.0  ]
    Aᵀb =  [ −1.0  −0.5  0.0  0.5  1.0] [ 0.5 ]  =  [ 1.0  ]
           [  1.0   0.25 0.0  0.25 1.0] [ 0.0 ]     [ 3.25 ]
                                        [ 0.5 ]
                                        [ 2.0 ]

Example, continued

Cholesky factorization of symmetric positive definite matrix AᵀA gives

    AᵀA = [ 5.0  0.0  2.5   ]   [ 2.236  0      0     ] [ 2.236  0      1.118 ]
          [ 0.0  2.5  0.0   ] = [ 0      1.581  0     ] [ 0      1.581  0     ]  = LLᵀ
          [ 2.5  0.0  2.125 ]   [ 1.118  0      0.935 ] [ 0      0      0.935 ]

Solving lower triangular system Lz = Aᵀb by forward-substitution gives z = [1.789  0.632  1.336]ᵀ.

Solving upper triangular system Lᵀx = z by back-substitution gives x = [0.086  0.400  1.429]ᵀ.
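The same two-triangular-solve computation can be sketched with NumPy (its general solver is used for the triangular stages for brevity; dedicated triangular solvers exist in SciPy):

```python
# Normal equations method: Cholesky-factor A^T A, then forward- and
# back-substitute, reproducing z and x from the worked example.
import numpy as np

t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
b = np.array([1.0, 0.5, 0.0, 0.5, 2.0])
A = np.column_stack([np.ones_like(t), t, t**2])

L = np.linalg.cholesky(A.T @ A)   # A^T A = L L^T, L lower triangular
z = np.linalg.solve(L, A.T @ b)   # forward-substitution stage: L z = A^T b
x = np.linalg.solve(L.T, z)       # back-substitution stage: L^T x = z
```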
Orthogonal Methods
Shortcomings of Normal Equations

Information can be lost in forming AᵀA and Aᵀb. For example, take

    A = [ 1  1 ]
        [ ε  0 ]
        [ 0  ε ]

where ε is positive number smaller than √ε_mach. Then in floating-point arithmetic

    AᵀA = [ 1+ε²  1    ]  =  [ 1  1 ]
          [ 1     1+ε² ]     [ 1  1 ]

which is singular.

Sensitivity of solution is also worsened, since cond(AᵀA) = [cond(A)]².

Augmented System Method

Definition of residual together with orthogonality requirement give (m + n) × (m + n) augmented system

    [ I   A ] [ r ]     [ b ]
    [ Aᵀ  O ] [ x ]  =  [ 0 ]

Augmented system is not positive definite, is larger than original system, and requires storing two copies of A.

But it allows greater freedom in choosing pivots in computing LDLᵀ or LU factorization.
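The information loss in forming AᵀA is easy to demonstrate in double precision (the value of ε below is one illustrative choice, any ε below √ε_mach ≈ 1.5 × 10⁻⁸ behaves the same):

```python
# For eps smaller than sqrt(machine epsilon), 1 + eps^2 rounds to 1,
# so the computed A^T A is exactly singular although A has full rank.
import numpy as np

eps = 1e-9
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])

AtA = A.T @ A     # becomes [[1, 1], [1, 1]] in floating point
```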
Augmented System Method, continued

Introducing scaling parameter α gives system

    [ αI  A ] [ r/α ]     [ b ]
    [ Aᵀ  O ] [ x   ]  =  [ 0 ]

which allows control over relative weights of two subsystems in choosing pivots.

Reasonable rule of thumb is to take α = maxᵢⱼ |aᵢⱼ| / 1000.

Augmented system is sometimes useful, but is far from ideal in work and storage required.

Orthogonal Transformations

We seek alternative method that avoids numerical difficulties of normal equations.

We need numerically robust transformation that produces easier problem without changing solution.

What kind of transformation leaves least squares solution unchanged?

Square matrix Q is orthogonal if QᵀQ = I.

Multiplication of vector by orthogonal matrix preserves Euclidean norm:

    ‖Qv‖₂² = (Qv)ᵀQv = vᵀQᵀQv = vᵀv = ‖v‖₂²

Thus, multiplying both sides of least squares problem by orthogonal matrix does not change its solution.
Solving Linear Least Squares Problems Normal Equations
Orthogonal Methods
SVD Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems Triangular Least Squares Problems Triangular Least Squares Problems, continued
We have no control over second term, b2 2 , but ﬁrst term
2
becomes zero if x satisﬁes n × n triangular system As with square linear systems, suitable target in simplifying
least squares problems is triangular form Rx = b1 Upper triangular overdetermined (m > n) least squares
problem has form
R
∼b
x= 1
b2
O which can be solved by backsubstitution
Resulting x is least squares solution, and minimum sum of
squares is
r 2 = b2 2
2
2 where R is n × n upper triangular and b is partitioned
similarly So our strategy is to transform general least squares
problem to triangular form using orthogonal transformation
so that least squares solution is preserved Residual is
2
2 r = b1 − Rx 2
2 + b2 Michael T. Heath 2
2 Scientiﬁc Computing Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems Normal Equations
Orthogonal Methods
SVD 25 / 61 Scientiﬁc Computing
Normal Equations
Orthogonal Methods
SVD 26 / 61 Orthogonal Bases Given m × n matrix A, with m > n, we seek m × m
orthogonal matrix Q such that If we partition m × m orthogonal matrix Q = [Q1 Q2 ],
where Q1 is m × n, then R
O A=Q A=Q where R is n × n and upper triangular
Linear least squares problem Ax ∼ b is then transformed
=
into triangular least squares problem
QT Ax = R
c
x∼ 1
=
O
c2 = b − Ax 2
2 = b−Q R
x
O R
R
= [Q1 Q2 ]
= Q1 R
O
O is called reduced QR factorization of A
Columns of Q1 are orthonormal basis for span(A), and
columns of Q2 are orthonormal basis for span(A)⊥ = QT b Q1 QT is orthogonal projector onto span(A)
1
∼
Solution to least squares problem Ax = b is given by which has same solution, since
2
2 Michael T. Heath
Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems QR Factorization r Normal Equations
Orthogonal Methods
SVD solution to square system
2
2 = QT b − Michael T. Heath 2
2 Scientiﬁc Computing Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems R
x
O QT Ax = Rx = c1 = QT b
1
1 Normal Equations
Orthogonal Methods
SVD 27 / 61 Computing QR Factorization Michael T. Heath Scientiﬁc Computing Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems Normal Equations
Orthogonal Methods
SVD 28 / 61 Householder Transformations
Householder transformation has form To compute QR factorization of m × n matrix A, with
m > n, we annihilate subdiagonal entries of successive
columns of A, eventually reaching upper triangular form H =I −2 vv T
vT v for nonzero vector v
H is orthogonal and symmetric: H = H T = H −1
Given vector a, we want to choose v so that α
1
0
0
Ha = . = α . = αe1
.
.
.
.
0
0 Similar to LU factorization by Gaussian elimination, but use
orthogonal transformations instead of elementary
elimination matrices
Possible methods include
Householder transformations
Givens rotations Substituting into formula for H , we can take GramSchmidt orthogonalization v = a − α e1
and α = ± a 2 , with sign chosen to avoid cancellation Michael T. Heath Scientiﬁc Computing Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems Normal Equations
Orthogonal Methods
Example: Householder Transformation

If a = [2 1 2]ᵀ, then we take

    v = a − αe₁ = [2; 1; 2] − α[1; 0; 0]

where α = ±‖a‖₂ = ±3.

Since a₁ is positive, we choose negative sign for α to avoid cancellation, so

    v = [2; 1; 2] − [−3; 0; 0] = [5; 1; 2]

To confirm that transformation works,

    Ha = a − 2 (vᵀa)/(vᵀv) v = [2; 1; 2] − 2 (15/30) [5; 1; 2] = [−3; 0; 0]

Householder QR Factorization

To compute QR factorization of A, use Householder transformations to annihilate subdiagonal entries of each successive column.

Each Householder transformation is applied to entire matrix, but does not affect prior columns, so zeros are preserved.

In applying Householder transformation H to arbitrary vector u,

    Hu = (I − 2 (vvᵀ)/(vᵀv)) u = u − 2 ((vᵀu)/(vᵀv)) v

which is much cheaper than general matrix-vector multiplication and requires only vector v, not full matrix H.
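The worked example above can be sketched in NumPy, applying H to a without ever forming H (illustrative, not part of the slides):

```python
# Householder reflection for a = [2, 1, 2]^T: alpha = -3 (opposite sign
# of a_1 to avoid cancellation) and v = a - alpha*e1 = [5, 1, 2]^T.
import numpy as np

a = np.array([2.0, 1.0, 2.0])
alpha = -np.sign(a[0]) * np.linalg.norm(a)
v = a - alpha * np.array([1.0, 0.0, 0.0])

Ha = a - 2 * (v @ a) / (v @ v) * v   # H a computed from v alone
```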
Michael T. Heath I −2 31 / 61 Michael T. Heath Scientiﬁc Computing 32 / 61 Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems Normal Equations
Orthogonal Methods
SVD Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems Householder QR Factorization, continued Normal Equations
Orthogonal Methods
Householder QR Factorization, continued

Process just described produces factorization

    Hₙ · · · H₁ A = [ R ]
                    [ O ]

where R is n × n and upper triangular.

If Q = H₁ · · · Hₙ, then A = Q [R; O].

To preserve solution of linear least squares problem, right-hand side b is transformed by same sequence of Householder transformations.

Then solve triangular least squares problem

    [ R ] x ≅ Qᵀb
    [ O ]

Householder QR Factorization, continued

For solving linear least squares problem, product Q of Householder transformations need not be formed explicitly.

R can be stored in upper triangle of array initially containing A.

Householder vectors v can be stored in (now zero) lower triangular portion of A (almost).

Householder transformations most easily applied in this form anyway.
Orthogonal Methods
SVD 34 / 61 Applying resulting Householder transformation H1 yields
transformed matrix and righthand side −1.789
−2.236
0
−1.118
−0.362
0
−0.191 −0.405 0.309 −0.655 , H1 b = −0.862
H1 A = 0 −0.362
0
0.809 −0.405
1.138
0
1.309
0.345 Householder vector v1 for annihilating subdiagonal entries
of ﬁrst column of A is 1
−2.236
3.236
1 0 1 v1 = 1 − 0 = 1 1 0 1 1
0
1
Michael T. Heath Scientiﬁc Computing Example, continued For polynomial dataﬁtting example given previously, with 1 −1.0 1.0
1.0
1 −0.5 0.25
0.5 0.0 0.0 , b = 0.0
A = 1 1
0.5
0.5 0.25
2.0
1
1.0 1.0 Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems Michael T. Heath
Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems Householder vector v2 for annihilating subdiagonal entries
of second column of H1 A
is 0
0
0
−0.191 1.581 −1.772 v2 = 0.309 − 0 = 0.309 0.809 0 0.809
1.309
0
1.309 Normal Equations
Orthogonal Methods
SVD 35 / 61 Example, continued Michael T. Heath Scientiﬁc Computing Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems Normal Equations
Orthogonal Methods
SVD 36 / 61 Example, continued Applying resulting Householder transformation H2 yields −1.789
−2.236
0
−1.118 0.632
0
1.581
0 0
−0.725 , H2 H1 b = −1.035
H2 H1 A = 0 −0.816
0 0
−0.589
0.404
0
0
0.047 Applying resulting Householder transformation H3 yields −1.789
−2.236
0
−1.118 0.632
0
1.581
0 0
0.935 , H3 H2 H1 b = 1.336
H3 H2 H1 A = 0 0 0.026
0
0
0.337
0
0
0 Householder vector v3 for annihilating subdiagonal entries
of third column of H2 H1 A is 0
0
0 0 0 0 v3 = −0.725 − 0.935 = −1.660 −0.589 0 −0.589
0.047
0
0.047
Michael T. Heath Scientiﬁc Computing Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems Now solve upper triangular system Rx = c1 by
backsubstitution to obtain x = 0.086 0.400 1.429 T < interactive example > Normal Equations
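NumPy's QR factorization (itself Householder-based in LAPACK) reproduces this solution; a brief illustrative check:

```python
# Solve the data-fitting problem via QR: factor A = Q1 R (reduced form),
# then back-solve R x = Q1^T b.
import numpy as np

t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
b = np.array([1.0, 0.5, 0.0, 0.5, 2.0])
A = np.column_stack([np.ones_like(t), t, t**2])

Q, R = np.linalg.qr(A)        # reduced QR: Q is 5x3, R is 3x3
c1 = Q.T @ b
x = np.linalg.solve(R, c1)    # back-substitution stage
```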
Orthogonal Methods
Givens Rotations

Givens rotations introduce zeros one at a time.

Given vector [a₁ a₂]ᵀ, choose scalars c and s so that

    [ c  s ] [ a₁ ]     [ α ]
    [−s  c ] [ a₂ ]  =  [ 0 ]

with c² + s² = 1, or equivalently, α = √(a₁² + a₂²).

Previous equation can be rewritten

    [ a₁   a₂ ] [ c ]     [ α ]
    [ a₂  −a₁ ] [ s ]  =  [ 0 ]

Givens Rotations, continued

Gaussian elimination yields triangular system

    [ a₁    a₂          ] [ c ]     [ α        ]
    [ 0    −a₁ − a₂²/a₁ ] [ s ]  =  [ −αa₂/a₁  ]

Back-substitution then gives

    s = αa₂ / (a₁² + a₂²)   and   c = αa₁ / (a₁² + a₂²)

Finally, c² + s² = 1, or α = √(a₁² + a₂²), implies

    c = a₁ / √(a₁² + a₂²)   and   s = a₂ / √(a₁² + a₂²)
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems Normal Equations
Orthogonal Methods
SVD Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Example: Givens Rotation

Let a = [4 3]ᵀ.

To annihilate second entry we compute cosine and sine

    c = a₁ / √(a₁² + a₂²) = 4/5 = 0.8   and   s = a₂ / √(a₁² + a₂²) = 3/5 = 0.6

Rotation is then given by

    G = [ c  s ] = [ 0.8  0.6 ]
        [−s  c ]   [−0.6  0.8 ]

To confirm that rotation works,

    Ga = [ 0.8  0.6 ] [ 4 ]   [ 5 ]
         [−0.6  0.8 ] [ 3 ] = [ 0 ]

Givens QR Factorization

More generally, to annihilate selected component of vector in n dimensions, rotate target component with another component:

    [ 1  0  0  0  0 ] [ a₁ ]     [ a₁ ]
    [ 0  c  0  s  0 ] [ a₂ ]     [ α  ]
    [ 0  0  1  0  0 ] [ a₃ ]  =  [ a₃ ]
    [ 0 −s  0  c  0 ] [ a₄ ]     [ 0  ]
    [ 0  0  0  0  1 ] [ a₅ ]     [ a₅ ]

By systematically annihilating successive entries, we can reduce matrix to upper triangular form using sequence of Givens rotations.

Each rotation is orthogonal, so their product is orthogonal, producing QR factorization.
SVD 42 / 61 GramSchmidt Orthogonalization
Givens QR Factorization, continued

Straightforward implementation of Givens method requires about 50% more work than Householder method, and also requires more storage, since each rotation requires two numbers, c and s, to define it.

These disadvantages can be overcome, but require more complicated implementation.

Givens can be advantageous for computing QR factorization when many entries of matrix are already zero, since those annihilations can then be skipped.

Gram-Schmidt Orthogonalization

Given vectors a₁ and a₂, we seek orthonormal vectors q₁ and q₂ having same span.

This can be accomplished by subtracting from second vector its projection onto first vector and normalizing both resulting vectors, as shown in diagram.

Process can be extended to any number of vectors a₁, . . . , aₖ, orthogonalizing each successive vector against all preceding ones, giving classical Gram-Schmidt procedure:

    for k = 1 to n
        qₖ = aₖ
        for j = 1 to k − 1
            rⱼₖ = qⱼᵀaₖ
            qₖ = qₖ − rⱼₖ qⱼ
        end
        rₖₖ = ‖qₖ‖₂
        qₖ = qₖ / rₖₖ
    end

Resulting qₖ and rⱼₖ form reduced QR factorization of A.

Modified Gram-Schmidt

Classical Gram-Schmidt procedure often suffers loss of orthogonality in finite precision.

Also, separate storage is required for A, Q, and R, since original aₖ are needed in inner loop, so qₖ cannot overwrite columns of A.

Both deficiencies are improved by modified Gram-Schmidt procedure, with each vector orthogonalized in turn against all subsequent vectors, so qₖ can overwrite aₖ.
SVD 46 / 61 Rank Deﬁciency Modiﬁed GramSchmidt algorithm If rank(A) < n, then QR factorization still exists, but yields
singular upper triangular factor R, and multiple vectors x
give minimum residual norm for k = 1 to n
rkk = ak 2
qk = ak /rkk
for j = k + 1 to n
T
rkj = qk aj
aj = aj − rkj qk
end
end Common practice selects minimum residual solution x
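The modified Gram-Schmidt loop transcribes directly into Python; a self-contained sketch (the helper name mgs_qr is ours, not from the slides):

```python
# Modified Gram-Schmidt QR: q_k overwrites a_k (here, the columns of a
# working copy of A), exactly as in the algorithm above.
import numpy as np

def mgs_qr(A):
    A = A.astype(float).copy()
    m, n = A.shape
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(A[:, k])
        A[:, k] /= R[k, k]
        for j in range(k + 1, n):
            R[k, j] = A[:, k] @ A[:, j]
            A[:, j] -= R[k, j] * A[:, k]
    return A, R        # A now holds Q1, with orthonormal columns

t = np.linspace(-1, 1, 5)
Q, R = mgs_qr(np.column_stack([np.ones(5), t, t**2]))
```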
having smallest norm
Can be computed by QR factorization with column pivoting
or by singular value decomposition (SVD)
Rank of matrix is often not clear cut in practice, so relative
tolerance is used to determine rank < interactive example > Michael T. Heath Michael T. Heath
Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems Scientiﬁc Computing 47 / 61 Michael T. Heath Scientiﬁc Computing 48 / 61 Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems Normal Equations
Orthogonal Methods
SVD Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems Example: Near Rank Deﬁciency QR with Column Pivoting Consider 3 × 2 matrix Instead of processing columns in natural order, select for
Example: Near Rank Deficiency

Consider 3 × 2 matrix

    A = [ 0.641  0.242 ]
        [ 0.321  0.121 ]
        [ 0.962  0.363 ]

Computing QR factorization,

    R = [ 1.1997  0.4527 ]
        [ 0       0.0002 ]

R is extremely close to singular (exactly singular to 3-digit accuracy of problem statement).

If R is used to solve linear least squares problem, result is highly sensitive to perturbations in right-hand side.

For practical purposes, rank(A) = 1 rather than 2, because columns are nearly linearly dependent.

QR with Column Pivoting

Instead of processing columns in natural order, select for reduction at each stage the column of remaining unreduced submatrix having maximum Euclidean norm.

If rank(A) = k < n, then after k steps, norms of remaining unreduced columns will be zero (or "negligible" in finite-precision arithmetic) below row k.
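The near singularity is visible directly from the computed R and the condition number; an illustrative NumPy check of the example:

```python
# QR of the nearly rank-deficient 3x2 example: R's trailing diagonal
# entry is tiny (on the order of the slide's 0.0002) and cond(A) is large.
import numpy as np

A = np.array([[0.641, 0.242],
              [0.321, 0.121],
              [0.962, 0.363]])

Q, R = np.linalg.qr(A)
trailing = abs(R[1, 1])      # nearly zero: A is effectively rank 1
```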
Michael T. Heath Scientiﬁc Computing Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems QT AP = RS
OO where R is k × k , upper triangular, and nonsingular, and
permutation matrix P performs column interchanges Normal Equations
Orthogonal Methods
SVD 49 / 61 Michael T. Heath Scientiﬁc Computing Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems QR with Column Pivoting, continued Normal Equations
Orthogonal Methods
SVD 50 / 61 Singular Value Decomposition Basic solution to least squares problem Ax ∼ b can now
=
be computed by solving triangular system Rz = c1 , where
c1 contains ﬁrst k components of QT b, and then taking
x=P Normal Equations
Orthogonal Methods
SVD Singular value decomposition (SVD) of m × n matrix A has
form
A = U ΣV T z
0 where U is m × m orthogonal matrix, V is n × n
orthogonal matrix, and Σ is m × n diagonal matrix, with Minimumnorm solution can be computed, if desired, at
expense of additional processing to annihilate S
rank(A) is usually unknown, so rank is determined by
monitoring norms of remaining unreduced columns and
terminating factorization when maximum value falls below
chosen tolerance Diagonal entries σi , called singular values of A, are
usually ordered so that σ1 ≥ σ2 ≥ · · · ≥ σn
Columns ui of U and vi of V are called left and right
singular vectors < interactive example >
Michael T. Heath Scientiﬁc Computing Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems 0
for i = j
σi ≥ 0 for i = j σij = Normal Equations
Orthogonal Methods
SVD 51 / 61 Example: SVD Michael T. Heath Scientiﬁc Computing Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems Normal Equations
Orthogonal Methods
Example: SVD

SVD of

    A = [  1   2   3 ]
        [  4   5   6 ]
        [  7   8   9 ]
        [ 10  11  12 ]

is given by A = UΣVᵀ with

    U = [ .141   .825  −.420  −.351 ]      Σ = [ 25.5  0     0 ]      Vᵀ = [  .504   .574  .644 ]
        [ .344   .426   .298   .782 ]          [ 0     1.29  0 ]           [ −.761  −.057  .646 ]
        [ .547   .0278  .664  −.509 ]          [ 0     0     0 ]           [  .408  −.816  .408 ]
        [ .750  −.371  −.542   .0790]          [ 0     0     0 ]

Applications of SVD

Minimum norm solution to Ax ≅ b is given by

    x = Σ_{σᵢ ≠ 0} (uᵢᵀb / σᵢ) vᵢ

For ill-conditioned or rank-deficient problems, "small" singular values can be omitted from summation to stabilize solution.

Euclidean matrix norm: ‖A‖₂ = σ_max

Euclidean condition number of matrix: cond(A) = σ_max / σ_min

Rank of matrix: number of nonzero singular values
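The minimum-norm formula above can be evaluated term by term and compared with the pseudoinverse; an illustrative NumPy sketch using the rank-2 example matrix (the right-hand side b is an arbitrary choice for demonstration):

```python
# Minimum-norm least squares solution from the SVD: sum u_i^T b / sigma_i * v_i
# over the nonzero singular values only.
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [10.0, 11.0, 12.0]])      # rank 2
b = np.array([1.0, 1.0, 1.0, 1.0])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = max(A.shape) * np.finfo(float).eps * s[0]   # relative rank tolerance
x = sum((U[:, i] @ b) / s[i] * Vt[i] for i in range(len(s)) if s[i] > tol)
```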
Michael T. Heath Scientiﬁc Computing Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems Normal Equations
Orthogonal Methods
SVD 53 / 61 Pseudoinverse Michael T. Heath Scientiﬁc Computing Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems Normal Equations
Orthogonal Methods
SVD 54 / 61 Orthogonal Bases Deﬁne pseudoinverse of scalar σ to be 1/σ if σ = 0, zero
otherwise
Deﬁne pseudoinverse of (possibly rectangular) diagonal
matrix by transposing and taking scalar pseudoinverse of
each entry
Then pseudoinverse of general real m × n matrix A is
given by
A+ = V Σ+ U T SVD of matrix, A = U ΣV T , provides orthogonal bases for
subspaces relevant to A
Columns of U corresponding to nonzero singular values
form orthonormal basis for span(A)
Remaining columns of U form orthonormal basis for
orthogonal complement span(A)⊥
Columns of V corresponding to zero singular values form
orthonormal basis for null space of A Pseudoinverse always exists whether or not matrix is
square or has full rank
If A is square and nonsingular, then A+ = A−1
In all cases, minimumnorm solution to Ax ∼ b is given by
=
x = A+ b
Michael T. Heath Scientiﬁc Computing Remaining columns of V form orthonormal basis for
orthogonal complement of null space of A
55 / 61 Michael T. Heath Scientiﬁc Computing 56 / 61 Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems Normal Equations
Orthogonal Methods
SVD Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems LowerRank Matrix Approximation Normal Equations
Orthogonal Methods
Lower-Rank Matrix Approximation

Another way to write SVD is

    A = UΣVᵀ = σ₁E₁ + σ₂E₂ + · · · + σₙEₙ

with Eᵢ = uᵢvᵢᵀ.

Eᵢ has rank 1 and can be stored using only m + n storage locations.

Product Eᵢx can be computed using only m + n multiplications.

Condensed approximation to A is obtained by omitting from summation terms corresponding to small singular values.

Approximation using k largest singular values is closest matrix of rank k to A.

Approximation is useful in image processing, data compression, information retrieval, cryptography, etc.

Total Least Squares

Ordinary least squares is applicable when right-hand side b is subject to random error but matrix A is known accurately.

When all data, including A, are subject to error, then total least squares is more appropriate.

Total least squares minimizes orthogonal distances, rather than vertical distances, between model and data.

Total least squares solution can be computed from SVD of [A, b].
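The rank-k approximation property (the truncated SVD is the closest rank-k matrix in the 2-norm, with error σ_(k+1)) can be checked numerically; an illustrative sketch on a random matrix:

```python
# Truncated SVD: keep the k largest singular values; the 2-norm error
# of the resulting rank-k approximation equals the next singular value.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = (U[:, :k] * s[:k]) @ Vt[:k]     # sigma_1 E_1 + sigma_2 E_2
err = np.linalg.norm(A - A_k, 2)      # equals s[k], the (k+1)-th value
```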
Comparison of Methods

Forming normal equations matrix AᵀA requires about n²m/2 multiplications, and solving resulting symmetric linear system requires about n³/6 multiplications.

Solving least squares problem using Householder QR factorization requires about mn² − n³/3 multiplications.

If m ≈ n, both methods require about same amount of work.

If m ≫ n, Householder QR requires about twice as much work as normal equations.

Cost of SVD is proportional to mn² + n³, with proportionality constant ranging from 4 to 10, depending on algorithm used.

Comparison of Methods, continued

Normal equations method produces solution whose relative error is proportional to [cond(A)]².

Required Cholesky factorization can be expected to break down if cond(A) ≈ 1/√ε_mach or worse.

Householder method produces solution whose relative error is proportional to

    cond(A) + ‖r‖₂ [cond(A)]²

which is best possible, since this is inherent sensitivity of solution to least squares problem.

Householder method can be expected to break down (in back-substitution phase) only if cond(A) ≈ 1/ε_mach or worse.

Comparison of Methods, continued

Householder is more accurate and more broadly applicable than normal equations.

These advantages may not be worth additional cost, however, when problem is sufficiently well conditioned that normal equations provide sufficient accuracy.

For rank-deficient or nearly rank-deficient problems, Householder with column pivoting can produce useful solution when normal equations method fails outright.

SVD is even more robust and reliable than Householder, but substantially more expensive.