Linear Algebra for Computer Vision, Robotics, and Machine Learning

Jean Gallier and Jocelyn Quaintance
Department of Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104, USA
e-mail: [email protected]

© Jean Gallier
August 7, 2019

Preface

In recent years, computer vision, robotics, machine learning, and data science have been some of the key areas that have contributed to major advances in technology. Anyone who looks at papers or books in the above areas will be baffled by a strange jargon involving exotic terms such as kernel PCA, ridge regression, lasso regression, support vector machines (SVM), Lagrange multipliers, KKT conditions, etc. Do support vector machines chase cattle to catch them with some kind of super lasso? No! But one will quickly discover that behind the jargon which always comes with a new field (perhaps to keep the outsiders out of the club) lies a lot of "classical" linear algebra and techniques from optimization theory. And therein lies the main challenge: in order to understand and use tools from machine learning, computer vision, and so on, one needs to have a firm background in linear algebra and optimization theory. To be honest, some probability theory and statistics should also be included, but we already have enough to contend with.

Many books on machine learning struggle with the above problem. How can one understand what the dual variables of a ridge regression problem are if one doesn't know about the Lagrangian duality framework? Similarly, how is it possible to discuss the dual formulation of SVM without a firm understanding of the Lagrangian framework? The easy way out is to sweep these difficulties under the rug. If one is just a consumer of the techniques we mentioned above, the cookbook recipe approach is probably adequate. But this approach doesn't work for someone who really wants to do serious research and make significant contributions. To do so, we believe that one must have a solid background in linear algebra and optimization theory. This is a problem because it means investing a great deal of time and energy studying these fields, but we believe that perseverance will be amply rewarded.
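To give a concrete taste of the first question (a minimal sketch, in notation chosen here for illustration rather than notation from the text): given a data matrix X and a target vector y, ridge regression seeks the weight vector w minimizing
\[
\|Xw - y\|_2^2 + \lambda\,\|w\|_2^2, \qquad \lambda > 0.
\]
The unique minimizer can be written in two equivalent forms,
\[
w \;=\; (X^\top X + \lambda I)^{-1} X^\top y \;=\; X^\top \alpha, \qquad \alpha = (X X^\top + \lambda I)^{-1}\, y,
\]
and the vector α is, up to normalization conventions, exactly the vector of dual variables produced by the Lagrangian framework. Without that framework, the second form, the one that opens the door to kernel methods, looks like an unmotivated algebraic trick.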
Our main goal is to present the fundamentals of linear algebra and optimization theory, keeping in mind applications to machine learning, robotics, and computer vision. This work consists of two volumes, the first one being linear algebra, the second one optimization theory and applications, especially to machine learning.

This first volume covers "classical" linear algebra, up to and including the primary decomposition and the Jordan form. Besides covering the standard topics, we discuss a few topics that are important for applications. These include:

1. Haar bases and the corresponding Haar wavelets.
2. Hadamard matrices.
3. Affine maps (see Section 5.4).
4. Norms and matrix norms (Chapter 8).
5. Convergence of sequences and series in a normed vector space. The matrix exponential e^A and its basic properties (see Section 8.8; a brief preview follows this list).
6. The group of unit quaternions, SU(2), and the representation of rotations in SO(3) by unit quaternions (Chapter 15).
7. An introduction to algebraic and spectral graph theory.
8. Applications of SVD and pseudo-inverses, in particular principal component analysis, PCA for short (Chapter 21).
9. Methods for computing eigenvalues and eigenvectors, with a main focus on the QR algorithm (Chapter 17).
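As the preview promised in item 5 (the precise development is deferred to Chapter 8), the matrix exponential is the sum of the series
\[
e^A \;=\; \sum_{k=0}^{\infty} \frac{A^k}{k!} \;=\; I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots,
\]
which converges for every square matrix A: in any submultiplicative matrix norm, \(\|A^k/k!\| \le \|A\|^k/k!\), so the series is dominated by the scalar series for \(e^{\|A\|}\). Making this argument precise is exactly what the material on norms and convergence of series in a normed vector space is for.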
Four topics are covered in more detail than usual. These are:

1. Duality (Chapter 10).
2. Dual norms (Section 13.7).
3. The geometry of the orthogonal groups O(n) and SO(n), and of the unitary groups U(n) and SU(n).
4. The spectral theorems (Chapter 16).

With only a few exceptions, we provide complete proofs. We did so to make this book self-contained, but also because we believe that no deep knowledge of this material can be acquired without working out some proofs. However, our advice is to skip some of the proofs upon first reading, especially if they are long and intricate. The chapters or sections marked with the symbol ~ contain material that is typically more specialized or more advanced, and they can be omitted upon first (or second) reading.

Acknowledgement: We would like to thank Christine Allen-Blanchette, Kostas Daniilidis, Carlos Esteves, Spyridon Leonardos, Stephen Phillips, João Sedoc, Jianbo Shi, Marcelo Siqueira, and C.J. Taylor for reporting typos and for helpful comments.

Contents

1 Introduction

2 Vector Spaces, Bases, Linear Maps
  2.1 Motivations: Linear Combinations, Linear Independence, Rank
  2.2 Vector Spaces
  2.3 Indexed Families; the Sum Notation Σ_{i∈I} a_i
  2.4 Linear Independence, Subspaces
  2.5 Bases of a Vector Space
  2.6 Matrices
  2.7 Linear Maps
  2.8 Linear Forms and the Dual Space
  2.9 Summary
  2.10 Problems

3 Matrices and Linear Maps
  3.1 Representation of Linear Maps by Matrices
  3.2 Composition of Linear Maps and Matrix Multiplication
  3.3 Change of Basis Matrix
  3.4 The Effect of a Change of Bases on Matrices
  3.5 Summary
  3.6 Problems

4 Haar Bases, Haar Wavelets, Hadamard Matrices
  4.1 Introduction to Signal Compression Using Haar Wavelets
  4.2 Haar Matrices, Scaling Properties of Haar Wavelets
  4.3 Kronecker Product Construction of Haar Matrices
  4.4 Multiresolution Signal Analysis with Haar Bases
  4.5 Haar Transform for Digital Images
  4.6 Hadamard Matrices
  4.7 Summary
  4.8 Problems

5 Direct Sums, Rank-Nullity Theorem, Affine Maps
  5.1 Direct Products
  5.2 Sums and Direct Sums
  5.3 The Rank-Nullity Theorem; Grassmann's Relation
  5.4 Affine Maps
  5.5 Summary
  5.6 Problems

6 Determinants
  6.1 Permutations, Signature of a Permutation
  6.2 Alternating Multilinear Maps
  6.3 Definition of a Determinant
  6.4 Inverse Matrices and Determinants
  6.5 Systems of Linear Equations and Determinants
  6.6 Determinant of a Linear Map
  6.7 The Cayley–Hamilton Theorem
  6.8 Permanents
  6.9 Summary
  6.10 Further Readings
  6.11 Problems

7 Gaussian Elimination, LU, Cholesky, Echelon Form
  7.1 Motivating Example: Curve Interpolation
  7.2 Gaussian Elimination
  7.3 Elementary Matrices and Row Operations
  7.4 LU-Factorization
  7.5 PA = LU Factorization
  7.6 Proof of Theorem 7.5 ~
  7.7 Dealing with Roundoff Errors; Pivoting Strategies
  7.8 Gaussian Elimination of Tridiagonal Matrices
  7.9 SPD Matrices and the Cholesky Decomposition
  7.10 Reduced Row Echelon Form
  7.11 RREF, Free Variables, Homogeneous Systems
  7.12 Uniqueness of RREF
  7.13 Solving Linear Systems Using RREF
  7.14 Elementary Matrices and Column Operations
  7.15 Transvections and Dilatations ~
  7.16 Summary
  7.17 Problems

8 Vector Norms and Matrix Norms
  8.1 Normed Vector Spaces
  8.2 Matrix Norms
  8.3 Subordinate Norms
  8.4 Inequalities Involving Subordinate Norms
  8.5 Condition Numbers of Matrices
  8.6 An Application of Norms: Inconsistent Linear Systems
  8.7 Limits of Sequences and Series
  8.8 The Matrix Exponential
  8.9 Summary
  8.10 Problems
9 Iterative Methods for Solving Linear Systems
  9.1 Convergence of Sequences of Vectors and Matrices
  9.2 Convergence of Iterative Methods
  9.3 Methods of Jacobi, Gauss–Seidel, and Relaxation
  9.4 Convergence of the Methods
  9.5 Convergence Methods for Tridiagonal Matrices
  9.6 Summary
  9.7 Problems

10 The Dual Space and Duality
  10.1 The Dual Space E* and Linear Forms
  10.2 Pairing and Duality Between E and E*
  10.3 The Duality Theorem and Some Consequences
  10.4 The Bidual and Canonical Pairings
  10.5 Hyperplanes and Linear Forms
  10.6 Transpose of a Linear Map and of a Matrix
  10.7 Properties of the Double Transpose
  10.8 The Four Fundamental Subspaces
  10.9 Summary
  10.10 Problems

11 Euclidean Spaces
  11.1 Inner Products, Euclidean Spaces
  11.2 Orthogonality and Duality in Euclidean Spaces
  11.3 Adjoint of a Linear Map
  11.4 Existence and Construction of Orthonormal Bases
  11.5 Linear Isometries (Orthogonal Transformations)
  11.6 The Orthogonal Group, Orthogonal Matrices
  11.7 The Rodrigues Formula
  11.8 QR-Decomposition for Invertible Matrices
  11.9 Some Applications of Euclidean Geometry
  11.10 Summary
  11.11 Problems

12 QR-Decomposition for Arbitrary Matrices
  12.1 Orthogonal Reflections
  12.2 QR-Decomposition Using Householder Matrices
  12.3 Summary
  12.4 Problems

13 Hermitian Spaces
  13.1 Hermitian Spaces, Pre-Hilbert Spaces
  13.2 Orthogonality, Duality, Adjoint of a Linear Map
  13.3 Linear Isometries (Also Called Unitary Transformations)
  13.4 The Unitary Group, Unitary Matrices
  13.5 Hermitian Reflections and QR-Decomposition
  13.6 Orthogonal Projections and Involutions
  13.7 Dual Norms
  13.8 Summary
  13.9 Problems

14 Eigenvectors and Eigenvalues
  14.1 Eigenvectors and Eigenvalues of a Linear Map
  14.2 Reduction to Upper Triangular Form
  14.3 Location of Eigenvalues
  14.4 Conditioning of Eigenvalue Problems
  14.5 Eigenvalues of the Matrix Exponential
  14.6 Summary
  14.7 Problems

15 Unit Quaternions and Rotations in SO(3)
  15.1 The Group SU(2) and the Skew Field H of Quaternions
  15.2 Representation of Rotations in SO(3) by Quaternions in SU(2)
  15.3 Matrix Representation of the Rotation r_q
  15.4 An Algorithm to Find a Quaternion Representing a Rotation
  15.5 The Exponential Map exp : su(2) → SU(2)
  15.6 Quaternion Interpolation ~
  15.7 Nonexistence of a "Nice" Section from SO(3) to SU(2)
  15.8 Summary
  15.9 Problems

16 Spectral Theorems
  16.1 Introduction
  16.2 Normal Linear Maps: Eigenvalues and Eigenvectors
  16.3 Spectral Theorem for Normal Linear Maps
  16.4 Self-Adjoint and Other Special Linear Maps
  16.5 Normal and Other Special Matrices
  16.6 Rayleigh–Ritz Theorems and Eigenvalue Interlacing
  16.7 The Courant–Fischer Theorem; Perturbation Results
  16.8 Summary
  16.9 Problems

17 Computing Eigenvalues and Eigenvectors
  17.1 The Basic QR Algorithm
  17.2 Hessenberg Matrices
  17.3 Making the QR Method More Efficient Using Shifts
  17.4 Krylov Subspaces; Arnoldi Iteration
  17.5 GMRES
  17.6 The Hermitian Case; Lanczos Iteration
  17.7 Power Methods
  17.8 Summary
  17.9 Problems
18 Graphs and Graph Laplacians; Basic Facts
  18.1 Directed Graphs, Undirected Graphs, Weighted Graphs
  18.2 Laplacian Matrices of Graphs
  18.3 Normalized Laplacian Matrices of Graphs
  18.4 Graph Clustering Using Normalized Cuts
  18.5 Summary
  18.6 Problems

19 Spectral Graph Drawing
  19.1 Graph Drawing and Energy Minimization
  19.2 Examples of Graph Drawings
  19.3 Summary

20 Singular Value Decomposition and Polar Form
  20.1 Properties of f* ◦ f
  20.2 Singular Value Decomposition for Square Matrices
  20.3 Polar Form for Square Matrices
  20.4 Singular Value Decomposition for Rectangular Matrices
  20.5 Ky Fan Norms and Schatten Norms
  20.6 Summary
  20.7 Problems

21 Applications of SVD and Pseudo-Inverses
  21.1 Least Squares Problems and the Pseudo-Inverse
  21.2 Properties of the Pseudo-Inverse
  21.3 Data Compression and SVD
  21.4 Principal Components Analysis (PCA)
  21.5 Best Affine Approximation
  21.6 Summary
  21.7 Problems