Scientific Computing: An Introductory Survey
Chapter 3 – Linear Least Squares

Prof. Michael T. Heath
Department of Computer Science
University of Illinois at Urbana-Champaign

Copyright © 2002. Reproduction permitted for noncommercial, educational use only.

Outline
1. Least Squares Data Fitting
2. Existence, Uniqueness, and Conditioning
3. Solving Linear Least Squares Problems

Method of Least Squares
- Measurement errors are inevitable in observational and experimental sciences
- Errors can be smoothed out by averaging over many cases, i.e., taking more measurements than are strictly necessary to determine parameters of system
- Resulting system is overdetermined, so usually there is no exact solution
- In effect, higher dimensional data are projected into lower dimensional space to suppress irrelevant detail
- Such projection is most conveniently accomplished by method of least squares

Linear Least Squares
- For linear problems, we obtain overdetermined linear system Ax = b, with m × n matrix A, m > n
- System is better written Ax ≈ b, since equality is usually not exactly satisfiable when m > n
- Least squares solution x minimizes squared Euclidean norm of residual vector r = b − Ax,

      min_x ||r||_2^2 = min_x ||b − Ax||_2^2

Data Fitting
- Given m data points (t_i, y_i), find n-vector x of parameters that gives "best fit" to model function f(t, x),

      min_x  sum_{i=1}^m (y_i − f(t_i, x))^2

- Problem is linear if function f is linear in components of x,

      f(t, x) = x_1 φ_1(t) + x_2 φ_2(t) + ··· + x_n φ_n(t)

  where functions φ_j depend only on t
- Problem can then be written in matrix form as Ax ≈ b, with a_ij = φ_j(t_i) and b_i = y_i (see the sketch below)

Data Fitting, continued
- Polynomial fitting,

      f(t, x) = x_1 + x_2 t + x_3 t^2 + ··· + x_n t^{n−1}

  is linear, since polynomial is linear in coefficients, though nonlinear in independent variable t
- Fitting sum of exponentials,

      f(t, x) = x_1 e^{x_2 t} + ··· + x_{n−1} e^{x_n t}

  is example of nonlinear problem
- For now, we will consider only linear least squares problems
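To make the formulation concrete, here is a minimal sketch, not from the slides: the model f(t, x) = x_1 sin(t) + x_2 cos(t), the synthetic data, and the use of NumPy's `lstsq` solver are all illustrative assumptions; only the construction a_ij = φ_j(t_i) comes from the slide above.

```python
import numpy as np

# Hypothetical illustration: any model linear in the parameters x leads to
# an overdetermined system Ax ~ b with a_ij = phi_j(t_i) and b_i = y_i.
# Here f(t, x) = x1*sin(t) + x2*cos(t), with noisy synthetic data.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0, 20)               # m = 20 observation points
y = 2.0*np.sin(t) - 1.0*np.cos(t) + 0.05*rng.standard_normal(t.size)

A = np.column_stack([np.sin(t), np.cos(t)]) # m x n matrix, columns phi_j(t_i)
x, res, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(x)                                    # close to true parameters [2, -1]
```

Changing the model only changes the basis functions used to build the columns of A; the least squares solve itself is unchanged.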
Example: Data Fitting
- Fitting quadratic polynomial to five data points gives linear least squares problem

      Ax = [ 1  t_1  t_1^2            [ y_1
             1  t_2  t_2^2              y_2
             1  t_3  t_3^2    x  ≈      y_3    = b
             1  t_4  t_4^2              y_4
             1  t_5  t_5^2 ]            y_5 ]

- Matrix whose columns (or rows) are successive powers of independent variable is called Vandermonde matrix

Example, continued
- For data

      t   −1.0   −0.5   0.0   0.5   1.0
      y    1.0    0.5   0.0   0.5   2.0

  overdetermined 5 × 3 linear system is

      Ax = [ 1  −1.0  1.0             [ 1.0
             1  −0.5  0.25   [ x_1      0.5
             1   0.0  0.0      x_2  ≈   0.0    = b
             1   0.5  0.25     x_3 ]    0.5
             1   1.0  1.0  ]            2.0 ]

- Solution, which we will see later how to compute, is

      x = [ 0.086  0.40  1.4 ]^T

  so approximating polynomial is p(t) = 0.086 + 0.4 t + 1.4 t^2 (verified in the code sketch below)

Example, continued
- Resulting curve and original data points are shown in graph (figure omitted)

Existence and Uniqueness
- Linear least squares problem Ax ≈ b always has solution
- Solution is unique if, and only if, columns of A are linearly independent, i.e., rank(A) = n, where A is m × n
- If rank(A) < n, then A is rank-deficient, and solution of linear least squares problem is not unique
- For now, we assume A has full column rank n

Normal Equations
- To minimize squared Euclidean norm of residual vector r,

      ||r||_2^2 = r^T r = (b − Ax)^T (b − Ax) = b^T b − 2 x^T A^T b + x^T A^T A x

  take derivative with respect to x and set it to 0,

      2 A^T A x − 2 A^T b = 0

  which reduces to n × n linear system of normal equations

      A^T A x = A^T b

Orthogonality
- Vectors v_1 and v_2 are orthogonal if their inner product is zero, v_1^T v_2 = 0
- Space spanned by columns of m × n matrix A, span(A) = {Ax : x ∈ R^n}, is of dimension at most n
- If m > n, b generally does not lie in span(A), so there is no exact solution to Ax = b
- Vector y = Ax in span(A) closest to b in 2-norm occurs when residual r = b − Ax is orthogonal to span(A),

      0 = A^T r = A^T (b − Ax)

  again giving system of normal equations A^T A x = A^T b
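A minimal check of this worked example, assuming NumPy: the data and the quoted solution come from the slides, while the use of `np.linalg.lstsq` as the solver is an assumption (the slides compute the solution later by other means). The last line verifies the orthogonality condition A^T r = 0 stated above.

```python
import numpy as np

# Build the 5x3 Vandermonde system for the quadratic fit and solve the
# least squares problem, then verify the residual is orthogonal to span(A).
t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
b = np.array([ 1.0,  0.5, 0.0, 0.5, 2.0])

A = np.vander(t, 3, increasing=True)     # columns: 1, t, t^2
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)                                 # approx [0.086, 0.400, 1.429]

r = b - A @ x
print(A.T @ r)                           # approx zero: A^T r = 0
```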
Orthogonality, continued
- Geometric relationships among b, r, and span(A) are shown in diagram (figure omitted)

Orthogonal Projectors
- Matrix P is orthogonal projector if it is idempotent (P^2 = P) and symmetric (P^T = P)
- Orthogonal projector onto orthogonal complement span(P)^⊥ is given by P_⊥ = I − P
- For any vector v,

      v = (P + (I − P)) v = P v + P_⊥ v

- For least squares problem Ax ≈ b, if rank(A) = n, then

      P = A (A^T A)^{−1} A^T

  is orthogonal projector onto span(A), and

      b = P b + P_⊥ b = Ax + (b − Ax) = y + r

  (see the sketch below)

Pseudoinverse and Condition Number
- Nonsquare m × n matrix A has no inverse in usual sense
- If rank(A) = n, pseudoinverse is defined by

      A^+ = (A^T A)^{−1} A^T

  and condition number by

      cond(A) = ||A||_2 · ||A^+||_2

- By convention, cond(A) = ∞ if rank(A) < n
- Just as condition number of square matrix measures closeness to singularity, condition number of rectangular matrix measures closeness to rank deficiency
- Least squares solution of Ax ≈ b is given by x = A^+ b

Sensitivity and Conditioning
- Sensitivity of least squares solution to Ax ≈ b depends on b as well as A
- Define angle θ between b and y = Ax by

      cos(θ) = ||y||_2 / ||b||_2 = ||Ax||_2 / ||b||_2

- Bound on perturbation ∆x in solution x due to perturbation ∆b in b is given by

      ||∆x||_2 / ||x||_2  ≤  cond(A) · (1 / cos(θ)) · ||∆b||_2 / ||b||_2
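A small sketch, assuming NumPy and the 5 × 3 data-fitting matrix from the earlier example: it forms P = A (A^T A)^{−1} A^T and checks the projector properties and the decomposition b = y + r stated above.

```python
import numpy as np

# Form the orthogonal projector onto span(A) and verify it is idempotent
# and symmetric, and that it splits b into y (in span(A)) plus r (orthogonal).
t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
b = np.array([1.0, 0.5, 0.0, 0.5, 2.0])
A = np.vander(t, 3, increasing=True)

P = A @ np.linalg.solve(A.T @ A, A.T)             # P = A (A^T A)^{-1} A^T
print(np.allclose(P @ P, P), np.allclose(P, P.T)) # idempotent, symmetric

y = P @ b                    # y = Ax, component of b in span(A)
r = (np.eye(5) - P) @ b      # r = P_perp b, the residual
print(np.allclose(y + r, b), abs(y @ r) < 1e-12)  # b = y + r, y orthogonal to r
```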
Sensitivity and Conditioning, continued
- Similarly, for perturbation E in matrix A,

      ||∆x||_2 / ||x||_2  ≲  ( [cond(A)]^2 tan(θ) + cond(A) ) · ||E||_2 / ||A||_2

- Condition number of least squares solution is about cond(A) if residual is small, but can be squared or arbitrarily worse for large residual

Normal Equations Method
- If m × n matrix A has rank n, then symmetric n × n matrix A^T A is positive definite, so its Cholesky factorization

      A^T A = L L^T

  can be used to obtain solution x to system of normal equations A^T A x = A^T b, which has same solution as linear least squares problem Ax ≈ b
- Normal equations method involves transformations

      rectangular → square → triangular

Example: Normal Equations Method
- For polynomial data-fitting example given previously, normal equations method gives

      A^T A = [ 5.0  0.0  2.5                [ 4.0
                0.0  2.5  0.0     ,  A^T b =   1.0
                2.5  0.0  2.125 ]              3.25 ]

Example, continued
- Cholesky factorization of symmetric positive definite matrix A^T A gives

      A^T A = L L^T = [ 2.236  0      0        [ 2.236  0      1.118
                        0      1.581  0          0      1.581  0
                        1.118  0      0.935 ]    0      0      0.935 ]

- Solving lower triangular system L z = A^T b by forward-substitution gives z = [ 1.789  0.632  1.336 ]^T
- Solving upper triangular system L^T x = z by back-substitution gives x = [ 0.086  0.400  1.429 ]^T (see the code sketch below)

Shortcomings of Normal Equations
- Information can be lost in forming A^T A and A^T b
- For example, take

      A = [ 1  1
            ε  0
            0  ε ]

  where ε is positive number smaller than sqrt(ε_mach)
- Then in floating-point arithmetic

      A^T A = [ 1 + ε^2  1          [ 1  1
                1        1 + ε^2 ] =  1  1 ]

  which is singular
- Sensitivity of solution is also worsened, since cond(A^T A) = [cond(A)]^2

Augmented System Method
- Definition of residual together with orthogonality requirement give (m + n) × (m + n) augmented system

      [ I    A    [ r     [ b
        A^T  O ]    x ] =   0 ]

- Augmented system is not positive definite, is larger than original system, and requires storing two copies of A
- But it allows greater freedom in choosing pivots in computing LDL^T or LU factorization

Augmented System Method, continued
- Introducing scaling parameter α gives system

      [ αI   A    [ r/α     [ b
        A^T  O ]    x   ] =   0 ]

  which allows control over relative weights of two subsystems in choosing pivots
- Reasonable rule of thumb is to take α = max_{i,j} |a_ij| / 1000
- Augmented system is sometimes useful, but is far from ideal in work and storage required

Orthogonal Transformations
- We seek alternative method that avoids numerical difficulties of normal equations
- We need numerically robust transformation that produces easier problem without changing solution
- What kind of transformation leaves least squares solution unchanged?
- Square matrix Q is orthogonal if Q^T Q = I
- Multiplication of vector by orthogonal matrix preserves Euclidean norm,

      ||Qv||_2^2 = (Qv)^T (Qv) = v^T Q^T Q v = v^T v = ||v||_2^2

- Thus, multiplying both sides of least squares problem by orthogonal matrix does not change its solution
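Returning to the normal equations example above, a sketch assuming NumPy and SciPy: `solve_triangular` stands in for the forward- and back-substitution steps, and the printed values reproduce z and x from the slides.

```python
import numpy as np
from scipy.linalg import solve_triangular

# Normal equations method: form A^T A = L L^T by Cholesky factorization,
# then solve L z = A^T b and L^T x = z by substitution.
t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
b = np.array([1.0, 0.5, 0.0, 0.5, 2.0])
A = np.vander(t, 3, increasing=True)

L = np.linalg.cholesky(A.T @ A)               # lower triangular factor L
z = solve_triangular(L, A.T @ b, lower=True)  # forward substitution
x = solve_triangular(L.T, z, lower=False)     # back substitution
print(z)   # approx [1.789, 0.632, 1.336]
print(x)   # approx [0.086, 0.400, 1.429]
```

Note the caveat from the slides: this route squares the conditioning of the problem, so it is best reserved for well-conditioned A.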
Triangular Least Squares Problems
- As with square linear systems, suitable target in simplifying least squares problems is triangular form
- Upper triangular overdetermined (m > n) least squares problem has form

      [ R ; O ] x ≈ [ b_1 ; b_2 ]

  where R is n × n upper triangular and b is partitioned similarly
- Residual is

      ||r||_2^2 = ||b_1 − R x||_2^2 + ||b_2||_2^2

Triangular Least Squares Problems, continued
- We have no control over second term, ||b_2||_2^2, but first term becomes zero if x satisfies n × n triangular system

      R x = b_1

  which can be solved by back-substitution
- Resulting x is least squares solution, and minimum sum of squares is

      ||r||_2^2 = ||b_2||_2^2

- So our strategy is to transform general least squares problem to triangular form using orthogonal transformation, so that least squares solution is preserved

QR Factorization
- Given m × n matrix A, with m > n, we seek m × m orthogonal matrix Q such that

      A = Q [ R ; O ]

  where R is n × n and upper triangular
- Linear least squares problem Ax ≈ b is then transformed into triangular least squares problem

      Q^T A x = [ R ; O ] x ≈ [ c_1 ; c_2 ] = Q^T b

  which has same solution, since

      ||r||_2^2 = || b − Q [ R ; O ] x ||_2^2 = || Q^T b − [ R ; O ] x ||_2^2

Orthogonal Bases
- If we partition m × m orthogonal matrix Q = [Q_1 Q_2], where Q_1 is m × n, then

      A = Q [ R ; O ] = [Q_1 Q_2] [ R ; O ] = Q_1 R

  is called reduced QR factorization of A
- Columns of Q_1 are orthonormal basis for span(A), and columns of Q_2 are orthonormal basis for span(A)^⊥
- Q_1 Q_1^T is orthogonal projector onto span(A)
- Solution to least squares problem Ax ≈ b is given by solution to square system

      Q_1^T A x = R x = c_1 = Q_1^T b

  (see the sketch below)

Computing QR Factorization
- To compute QR factorization of m × n matrix A, with m > n, we annihilate subdiagonal entries of successive columns of A, eventually reaching upper triangular form
- Similar to LU factorization by Gaussian elimination, but use orthogonal transformations instead of elementary elimination matrices
- Possible methods include
  - Householder transformations
  - Givens rotations
  - Gram-Schmidt orthogonalization

Householder Transformations
- Householder transformation has form

      H = I − 2 (v v^T) / (v^T v)

  for nonzero vector v
- H is orthogonal and symmetric: H = H^T = H^{−1}
- Given vector a, we want to choose v so that

      H a = [ α  0  ···  0 ]^T = α e_1

- Substituting into formula for H, we can take

      v = a − α e_1   with   α = ± ||a||_2

  where sign is chosen to avoid cancellation
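Before turning to how Q is computed by hand, a sketch of the overall QR strategy just described, assuming NumPy/SciPy library routines for the reduced factorization and the triangular solve:

```python
import numpy as np
from scipy.linalg import solve_triangular

# Solve the least squares problem via the reduced QR factorization
# A = Q1 R: back-substitute on R x = c1 = Q1^T b.
t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
b = np.array([1.0, 0.5, 0.0, 0.5, 2.0])
A = np.vander(t, 3, increasing=True)

Q1, R = np.linalg.qr(A)             # reduced QR: Q1 is 5x3, R is 3x3
x = solve_triangular(R, Q1.T @ b)   # back substitution on upper triangular R
print(x)                            # approx [0.086, 0.400, 1.429]
```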
Example: Householder Transformation
- If a = [ 2  1  2 ]^T, then we take

      v = a − α e_1 = [ 2  1  2 ]^T − α [ 1  0  0 ]^T

  where α = ± ||a||_2 = ±3
- Since a_1 is positive, we choose negative sign for α to avoid cancellation, so

      v = [ 2  1  2 ]^T − [ −3  0  0 ]^T = [ 5  1  2 ]^T

- To confirm that transformation works,

      H a = a − 2 (v^T a / v^T v) v = [ 2  1  2 ]^T − 2 (15/30) [ 5  1  2 ]^T = [ −3  0  0 ]^T

Householder QR Factorization
- To compute QR factorization of A, use Householder transformations to annihilate subdiagonal entries of each successive column
- Each Householder transformation is applied to entire matrix, but does not affect prior columns, so zeros are preserved
- In applying Householder transformation H to arbitrary vector u,

      H u = ( I − 2 (v v^T)/(v^T v) ) u = u − 2 (v^T u / v^T v) v

  which is much cheaper than general matrix-vector multiplication and requires only vector v, not full matrix H (see the sketch below)
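A sketch of both steps, assuming NumPy; `householder_vector` and `apply_householder` are illustrative names. The sign choice follows the rule above, and the output reproduces the a = [2, 1, 2]^T example.

```python
import numpy as np

def householder_vector(a):
    # Choose alpha with sign opposite a[0] so v[0] = a[0] - alpha avoids cancellation
    s = 1.0 if a[0] >= 0 else -1.0
    alpha = -s * np.linalg.norm(a)
    v = a.astype(float)
    v[0] -= alpha
    return v, alpha

def apply_householder(v, u):
    # Hu = u - 2 (v^T u / v^T v) v, without ever forming the matrix H
    return u - 2.0 * (v @ u) / (v @ v) * v

a = np.array([2.0, 1.0, 2.0])
v, alpha = householder_vector(a)
print(v)                          # [5, 1, 2]
print(apply_householder(v, a))    # [-3, 0, 0], i.e. alpha * e1
```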
Householder QR Factorization, continued
- Process just described produces factorization

      H_n ··· H_1 A = [ R ; O ]

  where R is n × n and upper triangular
- If Q = H_1 ··· H_n, then A = Q [ R ; O ]
- To preserve solution of linear least squares problem, right-hand side b is transformed by same sequence of Householder transformations
- Then solve triangular least squares problem [ R ; O ] x ≈ Q^T b

Householder QR Factorization, continued
- For solving linear least squares problem, product Q of Householder transformations need not be formed explicitly
- R can be stored in upper triangle of array initially containing A
- Householder vectors v can be stored in (now zero) lower triangular portion of A (almost)
- Householder transformations most easily applied in this form anyway

Example: Householder QR Factorization
- For polynomial data-fitting example given previously, with

      A = [ 1  −1.0  1.0            [ 1.0
            1  −0.5  0.25             0.5
            1   0.0  0.0    ,  b =    0.0
            1   0.5  0.25             0.5
            1   1.0  1.0  ]           2.0 ]

- Householder vector v_1 for annihilating subdiagonal entries of first column of A is

      v_1 = [ 1  1  1  1  1 ]^T − [ −2.236  0  0  0  0 ]^T = [ 3.236  1  1  1  1 ]^T

- Applying resulting Householder transformation H_1 yields transformed matrix and right-hand side

      H_1 A = [ −2.236   0       −1.118             [ −1.789
                 0      −0.191   −0.405               −0.362
                 0       0.309   −0.655  ,  H_1 b =   −0.862
                 0       0.809   −0.405               −0.362
                 0       1.309    0.345 ]              1.138 ]

- Householder vector v_2 for annihilating subdiagonal entries of second column of H_1 A is

      v_2 = [ 0  −0.191  0.309  0.809  1.309 ]^T − [ 0  1.581  0  0  0 ]^T
          = [ 0  −1.772  0.309  0.809  1.309 ]^T
Example, continued
- Applying resulting Householder transformation H_2 yields

      H_2 H_1 A = [ −2.236  0      −1.118                 [ −1.789
                     0      1.581   0                        0.632
                     0      0      −0.725  ,  H_2 H_1 b =   −1.035
                     0      0      −0.589                   −0.816
                     0      0       0.047 ]                  0.404 ]

- Householder vector v_3 for annihilating subdiagonal entries of third column of H_2 H_1 A is

      v_3 = [ 0  0  −0.725  −0.589  0.047 ]^T − [ 0  0  0.935  0  0 ]^T
          = [ 0  0  −1.660  −0.589  0.047 ]^T

Example, continued
- Applying resulting Householder transformation H_3 yields

      H_3 H_2 H_1 A = [ −2.236  0      −1.118                     [ −1.789
                         0      1.581   0                            0.632
                         0      0       0.935  ,  H_3 H_2 H_1 b =    1.336
                         0      0       0                            0.026
                         0      0       0     ]                      0.337 ]

- Now solve upper triangular system R x = c_1 by back-substitution to obtain

      x = [ 0.086  0.400  1.429 ]^T

Givens Rotations
- Givens rotations introduce zeros one at a time
- Given vector [ a_1  a_2 ]^T, choose scalars c and s so that

      [  c  s    [ a_1     [ α
        −s  c ]    a_2 ] =   0 ]

  with c^2 + s^2 = 1, or equivalently, α = sqrt(a_1^2 + a_2^2)
- Previous equation can be rewritten

      [ a_1   a_2     [ c     [ α
        a_2  −a_1 ]     s ] =   0 ]

- Gaussian elimination yields triangular system

      [ a_1  a_2                  [ c     [ α
        0    −a_1 − a_2^2/a_1 ]     s ] =   −α a_2 / a_1 ]

- Back-substitution then gives

      s = α a_2 / (a_1^2 + a_2^2)   and   c = α a_1 / (a_1^2 + a_2^2)

- Finally, c^2 + s^2 = 1, or α = sqrt(a_1^2 + a_2^2), implies

      c = a_1 / sqrt(a_1^2 + a_2^2)   and   s = a_2 / sqrt(a_1^2 + a_2^2)

Example: Givens Rotation
- Let a = [ 4  3 ]^T
- To annihilate second entry we compute cosine and sine

      c = a_1 / sqrt(a_1^2 + a_2^2) = 4/5 = 0.8   and   s = a_2 / sqrt(a_1^2 + a_2^2) = 3/5 = 0.6

- Rotation is then given by

      G = [  c  s    = [  0.8  0.6
            −s  c ]      −0.6  0.8 ]

- To confirm that rotation works (see the sketch below),

      G a = [  0.8  0.6    [ 4     [ 5
              −0.6  0.8 ]    3 ] =   0 ]

Givens QR Factorization
- More generally, to annihilate selected component of vector in n dimensions, rotate target component with another component, e.g.

      [ 1   0  0  0  0     [ a_1     [ a_1
        0   c  0  s  0       a_2       α
        0   0  1  0  0       a_3   =   a_3
        0  −s  0  c  0       a_4       0
        0   0  0  0  1 ]     a_5 ]     a_5 ]

- By systematically annihilating successive entries, we can reduce matrix to upper triangular form using sequence of Givens rotations
- Each rotation is orthogonal, so their product is orthogonal, producing QR factorization
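A sketch of a single Givens rotation, assuming NumPy (`givens` is an illustrative name); it reproduces the a = [4, 3]^T example above.

```python
import numpy as np

def givens(a1, a2):
    # c and s rotating [a1, a2] onto [alpha, 0]; assumes (a1, a2) != (0, 0)
    r = np.hypot(a1, a2)        # alpha = sqrt(a1^2 + a2^2)
    return a1 / r, a2 / r       # c, s

c, s = givens(4.0, 3.0)
G = np.array([[ c, s],
              [-s, c]])
print(c, s)                       # 0.8 0.6
print(G @ np.array([4.0, 3.0]))   # [5, 0]
```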
Givens QR Factorization, continued
- Straightforward implementation of Givens method requires about 50% more work than Householder method, and also requires more storage, since each rotation requires two numbers, c and s, to define it
- These disadvantages can be overcome, but require more complicated implementation
- Givens can be advantageous for computing QR factorization when many entries of matrix are already zero, since those annihilations can then be skipped

Gram-Schmidt Orthogonalization
- Given vectors a_1 and a_2, we seek orthonormal vectors q_1 and q_2 having same span
- This can be accomplished by subtracting from second vector its projection onto first vector and normalizing both resulting vectors, as shown in diagram (figure omitted)

Gram-Schmidt Orthogonalization, continued
- Process can be extended to any number of vectors a_1, ..., a_k, orthogonalizing each successive vector against all preceding ones, giving classical Gram-Schmidt procedure

      for k = 1 to n
          q_k = a_k
          for j = 1 to k − 1
              r_jk = q_j^T a_k
              q_k = q_k − r_jk q_j
          end
          r_kk = ||q_k||_2
          q_k = q_k / r_kk
      end

- Resulting q_k and r_jk form reduced QR factorization of A

Modified Gram-Schmidt
- Classical Gram-Schmidt procedure often suffers loss of orthogonality in finite-precision arithmetic
- Also, separate storage is required for A, Q, and R, since original a_k are needed in inner loop, so q_k cannot overwrite columns of A
- Both deficiencies are remedied by modified Gram-Schmidt procedure, with each vector orthogonalized in turn against all subsequent vectors, so q_k can overwrite a_k

Modified Gram-Schmidt QR Factorization
- Modified Gram-Schmidt algorithm (a runnable version follows below)

      for k = 1 to n
          r_kk = ||a_k||_2
          q_k = a_k / r_kk
          for j = k + 1 to n
              r_kj = q_k^T a_j
              a_j = a_j − r_kj q_k
          end
      end

Rank Deficiency
- If rank(A) < n, then QR factorization still exists, but yields singular upper triangular factor R, and multiple vectors x give minimum residual norm
- Common practice selects minimum residual solution x having smallest norm
- Can be computed by QR factorization with column pivoting or by singular value decomposition (SVD)
- Rank of matrix is often not clear cut in practice, so relative tolerance is used to determine rank
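A runnable version of the modified Gram-Schmidt algorithm above, assuming NumPy; `mgs_qr` is an illustrative name, and the check uses the polynomial data-fitting matrix from earlier examples.

```python
import numpy as np

def mgs_qr(A):
    # Modified Gram-Schmidt: Q overwrites a copy of A, column by column,
    # giving the reduced QR factorization A = Q R (Q is m x n, R is n x n).
    Q = A.astype(float)          # working copy; columns become q_k in place
    m, n = Q.shape
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ Q[:, j]     # orthogonalize later columns
            Q[:, j] -= R[k, j] * Q[:, k]    # against the new q_k
    return Q, R

t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
A = np.vander(t, 3, increasing=True)
Q, R = mgs_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))  # True True
```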
Example: Near Rank Deficiency
- Consider 3 × 2 matrix

      A = [ 0.641  0.242
            0.321  0.121
            0.962  0.363 ]

- Computing QR factorization,

      R = [ 1.1997  0.4527
            0       0.0002 ]

- R is extremely close to singular (exactly singular to 3-digit accuracy of problem statement)
- If R is used to solve linear least squares problem, result is highly sensitive to perturbations in right-hand side
- For practical purposes, rank(A) = 1 rather than 2, because columns are nearly linearly dependent (see the sketch below)

QR with Column Pivoting
- Instead of processing columns in natural order, select for reduction at each stage column of remaining unreduced submatrix having maximum Euclidean norm
- If rank(A) = k < n, then after k steps, norms of remaining unreduced columns will be zero (or "negligible" in finite-precision arithmetic) below row k
- Yields orthogonal factorization of form

      Q^T A P = [ R  S
                  O  O ]

  where R is k × k, upper triangular, and nonsingular, and permutation matrix P performs column interchanges

QR with Column Pivoting, continued
- Basic solution to least squares problem Ax ≈ b can now be computed by solving triangular system R z = c_1, where c_1 contains first k components of Q^T b, and then taking

      x = P [ z ; 0 ]

- Minimum-norm solution can be computed, if desired, at expense of additional processing to annihilate S
- rank(A) is usually unknown, so rank is determined by monitoring norms of remaining unreduced columns and terminating factorization when maximum value falls below chosen tolerance

Singular Value Decomposition
- Singular value decomposition (SVD) of m × n matrix A has form

      A = U Σ V^T

  where U is m × m orthogonal matrix, V is n × n orthogonal matrix, and Σ is m × n diagonal matrix, with σ_ij = 0 for i ≠ j and σ_ii = σ_i ≥ 0
- Diagonal entries σ_i, called singular values of A, are usually ordered so that σ_1 ≥ σ_2 ≥ ··· ≥ σ_n
- Columns u_i of U and v_i of V are called left and right singular vectors
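A sketch, assuming NumPy, applying the SVD just defined to the nearly rank-deficient example above; the tolerance of 10^-3 relative to the largest singular value is an assumption chosen here to reflect the 3-digit accuracy of the data, not a rule from the slides.

```python
import numpy as np

# Singular values of the nearly rank-deficient example expose its effective
# rank; a relative tolerance decides which values count as "zero".
A = np.array([[0.641, 0.242],
              [0.321, 0.121],
              [0.962, 0.363]])

sigma = np.linalg.svd(A, compute_uv=False)
print(sigma)                 # one O(1) value, one value around 1e-4

tol = 1e-3 * sigma[0]        # tolerance matching the 3-digit data accuracy
print(np.sum(sigma > tol))   # 1: effective rank for practical purposes
```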
Example: SVD
- SVD of

      A = [  1   2   3
             4   5   6
             7   8   9
            10  11  12 ]

  is given by A = U Σ V^T with

      U = [ 0.141   0.825   −0.420  −0.351
            0.344   0.426    0.298   0.782
            0.547   0.0278   0.664  −0.509
            0.750  −0.371   −0.542   0.0790 ]

      Σ = [ 25.5  0     0
            0     1.29  0
            0     0     0
            0     0     0 ]

      V^T = [  0.504   0.574   0.644
              −0.761  −0.057   0.646
               0.408  −0.816   0.408 ]

Applications of SVD
- Minimum norm solution to Ax ≈ b is given by

      x = sum over σ_i ≠ 0 of (u_i^T b / σ_i) v_i

- For ill-conditioned or rank deficient problems, "small" singular values can be omitted from summation to stabilize solution (see the sketch below)
- Euclidean matrix norm: ||A||_2 = σ_max
- Euclidean condition number of matrix: cond(A) = σ_max / σ_min
- Rank of matrix: number of nonzero singular values

Pseudoinverse
- Define pseudoinverse of scalar σ to be 1/σ if σ ≠ 0, zero otherwise
- Define pseudoinverse of (possibly rectangular) diagonal matrix by transposing and taking scalar pseudoinverse of each entry
- Then pseudoinverse of general real m × n matrix A is given by

      A^+ = V Σ^+ U^T

- Pseudoinverse always exists whether or not matrix is square or has full rank
- If A is square and nonsingular, then A^+ = A^{−1}
- In all cases, minimum-norm solution to Ax ≈ b is given by x = A^+ b

Orthogonal Bases
- SVD of matrix, A = U Σ V^T, provides orthogonal bases for subspaces relevant to A
- Columns of U corresponding to nonzero singular values form orthonormal basis for span(A)
- Remaining columns of U form orthonormal basis for orthogonal complement span(A)^⊥
- Columns of V corresponding to zero singular values form orthonormal basis for null space of A
- Remaining columns of V form orthonormal basis for orthogonal complement of null space of A

Lower-Rank Matrix Approximation
- Another way to write SVD is

      A = U Σ V^T = σ_1 E_1 + σ_2 E_2 + ··· + σ_n E_n,   with E_i = u_i v_i^T

- E_i has rank 1 and can be stored using only m + n storage locations
- Product E_i x can be computed using only m + n multiplications
- Condensed approximation to A is obtained by omitting from summation terms corresponding to small singular values
- Approximation using k largest singular values is closest matrix of rank k to A
- Approximation is useful in image processing, data compression, information retrieval, cryptography, etc.

Total Least Squares
- Ordinary least squares is applicable when right-hand side b is subject to random error but matrix A is known accurately
- When all data, including A, are subject to error, then total least squares is more appropriate
- Total least squares minimizes orthogonal distances, rather than vertical distances, between model and data
- Total least squares solution can be computed from SVD of [A, b]

Comparison of Methods
- Forming normal equations matrix A^T A requires about n^2 m / 2 multiplications, and solving resulting symmetric linear system requires about n^3 / 6 multiplications
- Solving least squares problem using Householder QR factorization requires about m n^2 − n^3 / 3 multiplications
- If m ≈ n, both methods require about same amount of work
- If m ≫ n, Householder QR requires about twice as much work as normal equations
- Cost of SVD is proportional to m n^2 + n^3, with proportionality constant ranging from 4 to 10, depending on algorithm used

Comparison of Methods, continued
- Normal equations method produces solution whose relative error is proportional to [cond(A)]^2
- Required Cholesky factorization can be expected to break down if cond(A) ≈ 1/sqrt(ε_mach) or worse
- Householder method produces solution whose relative error is proportional to

      cond(A) + ||r||_2 · [cond(A)]^2

  which is best possible, since this is inherent sensitivity of solution to least squares problem
- Householder method can be expected to break down (in back-substitution phase) only if cond(A) ≈ 1/ε_mach or worse

Comparison of Methods, continued
- Householder is more accurate and more broadly applicable than normal equations
- These advantages may not be worth additional cost, however, when problem is sufficiently well conditioned that normal equations provide sufficient accuracy
- For rank-deficient or nearly rank-deficient problems, Householder with column pivoting can produce useful solution when normal equations method fails outright
- SVD is even more robust and reliable than Householder, but substantially more expensive
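Finally, a sketch of the SVD-based minimum-norm solution described above, assuming NumPy; `svd_solve` is an illustrative name, the right-hand side b is hypothetical, and the result is compared against NumPy's own pseudoinverse.

```python
import numpy as np

def svd_solve(A, b, tol=1e-12):
    # Minimum-norm least squares solution x = sum_i (u_i^T b / sigma_i) v_i,
    # omitting singular values below a relative tolerance; this is what the
    # pseudoinverse solution x = A+ b computes.
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    keep = sigma > tol * sigma[0]
    return Vt[keep].T @ ((U[:, keep].T @ b) / sigma[keep])

A = np.array([[1.0, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
b = np.array([1.0, 2.0, 3.0, 4.0])    # hypothetical right-hand side

x = svd_solve(A, b)                   # rank-2 A: third singular value dropped
print(x, np.allclose(x, np.linalg.pinv(A) @ b))   # matches pseudoinverse
```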