Numerical Methods in Fluid Dynamics
MPO 662

Instructor: Mohamed Iskandarani
MSC 320, x 4045
miskandarani@rsmas.miami.edu

Grades
• 60% Homework (involves programming)
• 20% Midterm
• 20% Term project

Syllabus
1. Introduction
2. Classification of PDEs and their properties
3. Basics of the finite difference method
4. Finite difference solutions of ODEs
5. Finite difference solutions of time-dependent linear PDEs
   (a) advection equation
   (b) heat equation
   (c) stability and dispersion properties of time-differencing schemes
6. Numerical solution of finite difference approximations of elliptic equations
7. Special advection schemes
8. Energetically consistent finite difference schemes
9. The finite element method
10. Additional topics (time permitting)

Background
1. Name
2. Degree sought (what field)
3. Advisor (if any)
4. Background
5. Scientific interest
6. Background in numerical modeling
7. Programming experience/language

Reserve List
• Dale B. Haidvogel and Aike Beckmann, Numerical Ocean Circulation Modeling, Imperial College Press, 1999. (CGFD)
• Dale R. Durran, Numerical Methods for Wave Equations in Geophysical Fluid Dynamics, Springer, New York, 1998. (CGFD)
• George J. Haltiner and Roger T. Williams, Numerical Prediction and Dynamic Meteorology, Wiley, 1980. (CGFD)
• John C. Tannehill, Dale A. Anderson, and Richard H. Pletcher, Computational Fluid Mechanics and Heat Transfer, Taylor and Francis, 1997. (FDM)
• G. E. Forsythe and W. R. Wasow, Finite-Difference Methods for Partial Differential Equations, John Wiley and Sons, Inc., New York, 1960. (FDM)
• R. D. Richtmyer and K. W. Morton, Difference Methods for Initial-Value Problems, Interscience Publishers (J. Wiley & Sons), New York, 1967.

Useful References
• Gordon D. Smith, Numerical Solution of Partial Differential Equations: Finite Difference Methods, Oxford University Press, New York, 1985. (FDM)
• K. W. Morton and D. F. Mayers, Numerical Solution of Partial Differential Equations: An Introduction, Cambridge University Press, New York, 1994. (FDM)
• P. J. Roache, Computational Fluid Dynamics, Hermosa Publisher, 1972, ISBN 0913478-05-9. (FDM)
• C. A. J. Fletcher, Computational Techniques for Fluid Dynamics, 2 volumes, 2nd ed., Springer-Verlag, New York, 1991-1992. (Num. Sol. of PDEs)
• Roger Peyret and Thomas D. Taylor, Computational Methods for Fluid Flow, Springer-Verlag, New York, 1990. (Num. Sol. of PDEs)
• Roger Peyret, Handbook of Computational Fluid Mechanics, Academic Press, San Diego, 1996. (QA911 .H347 1996)
• Joel H. Ferziger and M. Peric, Computational Methods for Fluid Dynamics, Springer-Verlag, New York, 1996.
• R. S. Varga, Matrix Iterative Analysis, Prentice-Hall, New York, 1962.
• Bengt Fornberg, A Practical Guide to Pseudospectral Methods, Cambridge University Press, Cambridge, 1998. (Spectral methods)
• C. Canuto, M. Y. Hussaini, A. Quarteroni and T. A. Zang, Spectral Methods in Fluid Dynamics, Springer-Verlag, New York, 1991. (Spectral methods)
• John P. Boyd, Chebyshev and Fourier Spectral Methods, Dover Publications, 2000. (Spectral methods)
• O. C. Zienkiewicz and R. L. Taylor, The Finite Element Method, 4th edition, McGraw-Hill, 1989.
• George Em. Karniadakis and Spencer J. Sherwin, Spectral/hp Element Methods for CFD, Oxford University Press, New York, 1999. (Spectral Elements)
• Michel O. Deville, Paul F. Fischer and E. H. Mund, High-Order Methods for Incompressible Fluid Flow, Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, Cambridge, 2002.

Useful Software
• Plotting software (e.g. matlab, NCAR Graphics, gnuplot)
• Linear algebra (e.g. matlab, LAPACK, IMSL)
• Fast Fourier transforms (e.g. matlab, fftpack, ?)
• Fortran compiler (debuggers are useful too)

Numerical Methods in Ocean Modeling
Lecture Notes for MPO662
October 26, 2011

Contents
1 Introduction
  1.1 Justification of CFD
  1.2 Discretization
    1.2.1 Finite Difference Method
    1.2.2 Finite Element Method
    1.2.3 Spectral Methods
    1.2.4 Finite Volume Methods
    1.2.5 Computational Cost
  1.3 Initialization and Forcing
  1.4 Turbulence
  1.5 Examples of system of equations and their properties
2 Basics of PDEs
  2.1 Classification of second order PDEs
    2.1.1 Hyperbolic Equation: b^2 - 4ac > 0
    2.1.2 Parabolic Equation: b^2 - 4ac = 0
    2.1.3 Elliptic Equation: b^2 - 4ac < 0
  2.2 Well-Posed Problems
  2.3 First Order Systems
    2.3.1 Scalar Equation
    2.3.2 System of Equations in one-space dimension
    2.3.3 System of equations in multi-space dimensions
3 Finite Difference Approximation of Derivatives
  3.1 Introduction
  3.2 Finite Difference Approximation
  3.3 Taylor series
    3.3.1 Taylor series and finite differences
    3.3.2 Higher order approximation
    3.3.3 Remarks
    3.3.4 Systematic Derivation of higher order derivative
    3.3.5 Discrete Operator
  3.4 Polynomial Fitting
    3.4.1 Linear Fit
    3.4.2 Quadratic Fit
    3.4.3 Higher order formula
  3.5 Compact Differencing Schemes
    3.5.1 Derivation of 3-term compact schemes
    3.5.2 Families of Fourth order schemes
    3.5.3 Families of Sixth order schemes
    3.5.4 Numerical experiments
4 Application of Finite Differences to ODE
  4.1 Introduction
  4.2 Forward Euler Approximation
  4.3 Stability, Consistency and Convergence
    4.3.1 Lax Richtmeyer theorem
    4.3.2 Von Neumann stability condition
  4.4 Backward Difference
  4.5 Backward Difference
  4.6 Trapezoidal Scheme
    4.6.1 Phase Errors
  4.7 Higher Order Methods
    4.7.1 Multi Stage (Runge Kutta) Methods
    4.7.2 Remarks on RK schemes
    4.7.3 Multi Time Levels Methods
  4.8 Strongly Stable Schemes
    4.8.1 Stability of BDF
  4.9 Systems of ODEs
5 Numerical Solution of PDE's
  5.1 Introduction
    5.1.1 Convergence
    5.1.2 Truncation Error
    5.1.3 Consistency
    5.1.4 Stability
    5.1.5 Lax-Richtmeyer Equivalence theorem
  5.2 Truncation Error
  5.3 The Lax Richtmeyer theorem
  5.4 The Von Neumann Stability Condition
  5.5 Von Neumann Stability Analysis
  5.6 Modified Equation
6 Numerical Solution of the Advection Equation
  6.1 Introduction
  6.2 Donor Cell scheme
    6.2.1 Remarks
  6.3 Backward time centered space (BTCS)
    6.3.1 Remarks
  6.4 Centered time centered space (CTCS)
    6.4.1 Remarks
  6.5 Lax Wendroff scheme
    6.5.1 Remarks
  6.6 Numerical Dispersion
    6.6.1 Analytical Dispersion Relation
    6.6.2 Numerical Dispersion Relation: Spatial Differencing
7 Finite Volume Method
  7.1 The partial differential equation
  7.2 Integral Form of Conservation Law
  7.3 Sketch of Finite Volume Methods
  7.4 Finite Volume in 1D
    7.4.1 Function Reconstruction
    7.4.2 Piecewise constant
    7.4.3 Piecewise Linear
    7.4.4 Piecewise parabolic
    7.4.5 Reconstruction Validation
  7.5 Finite Volume Method for Scalar Advection in 2D
    7.5.1 Function reconstruction in 2D
  7.6 Algorithm Summary
  7.7 Code Design
    7.7.1 Data Structure
    7.7.2 Domain Geometry
    7.7.3 Flow
    7.7.4 T initiations
  7.8 Tracer Advection in a Stommel Gyre
    7.8.1 The flow field
    7.8.2 The initial condition
    7.8.3 Expected result
    7.8.4 Support Code
8 Numerical Dispersion of Linearized SWE
  8.1 Linearized SWE in 1D
    8.1.1 Centered FDA on A-grid
    8.1.2 Centered FDA on C-grid
  8.2 Two-Dimensional SWE
    8.2.1 Inertia gravity waves
  8.3 Rossby waves
9 Solving the Poisson Equations
  9.1 Iterative Methods
    9.1.1 Jacobi method
    9.1.2 Gauss-Seidel method
    9.1.3 Successive Over Relaxation (SOR) method
    9.1.4 Iteration by Lines
    9.1.5 Matrix Analysis
  9.2 Krylov Method-CG
  9.3 Direct Methods
    9.3.1 Periodic Problem
    9.3.2 Dirichlet Problem
10 Nonlinear equations
  10.1 Aliasing
  10.2 1D Burger equation
  10.3 Quadratic Conservation
  10.4 Nonlinear advection equation
    10.4.1 FD Approximation of the advection term
  10.5 Conservation in vorticity streamfunction formulation
  10.6 Conservation in primitive equations
  10.7 Conservation for divergent flows
11 Special Advection Schemes
  11.1 Introduction
  11.2 Monotone Schemes
  11.3 Flux Corrected Transport (FCT)
    11.3.1 One-Dimensional
    11.3.2 One-Dimensional Flux Correction Limiter
    11.3.3 Properties of FCT
    11.3.4 Two-Dimensional FCT
    11.3.5 Time-Differencing with FCT
  11.4 Slope/Flux Limiter Methods
  11.5 MPDATA
  11.6 WENO schemes in vertical
    11.6.1 Function reconstruction
    11.6.2 WENO reconstruction
    11.6.3 ENO and WENO numerical experiments
  11.7 Utopia
  11.8 Lax-Wendroff for advection equation
  11.9 2D Numerical experiments
12 Fourier series
  12.1 Continuous Fourier series
  12.2 Discrete Fourier series
    12.2.1 Fourier Series For Periodic Problems
13 Spectral Methods
  13.1 Spectral Series
  13.2 Fourier Series
    13.2.1 Bounds on Fourier coefficients
  13.3 Equal Error Assumptions
14 Finite Element Methods
  14.1 MWR
    14.1.1 Collocation
    14.1.2 Least Square
    14.1.3 Galerkin
  14.2 FEM example in 1D
    14.2.1 Weak Form
    14.2.2 Galerkin form
    14.2.3 Essential Boundary Conditions
    14.2.4 Choice of interpolation and test functions
    14.2.5 FEM solution using 2 linear elements
    14.2.6 FEM solution using N linear elements
    14.2.7 Local stiffness matrix and global assembly
    14.2.8 Quadratic Interpolation
    14.2.9 Spectral Interpolation
    14.2.10 Numerical Integration
  14.3 Mathematical Results
    14.3.1 Uniqueness and Existence of continuous solution
    14.3.2 Uniqueness and Existence of continuous solution
    14.3.3 Error estimates
  14.4 Two Dimensional Problems
    14.4.1 Linear Triangular Elements
    14.4.2 Higher order triangular elements
    14.4.3 Quadrilateral elements
    14.4.4 Interpolation in quadrilateral elements
    14.4.5 Evaluation of integrals
  14.5 Time-dependent problem in 1D: the Advection Equation
    14.5.1 Numerical Example
  14.6 The Discontinuous Galerkin Method (DGM)
    14.6.1 Gaussian Hill Experiment
    14.6.2 Cone Experiment
15 Linear Analysis
  15.1 Linear Vector Spaces
    15.1.1 Definition of Abstract Vector Space
    15.1.2 Definition of a Norm
    15.1.3 Definition of an inner product
    15.1.4 Basis
    15.1.5 Example of a vector space
    15.1.6 Function Space
    15.1.7 Pointwise versus Global Convergence
  15.2 Linear Operators
  15.3 Eigenvalues and Eigenvectors
  15.4 Sturm-Liouville Theory
  15.5 Application to PDE
16 Rudiments of Linear Algebra
  16.1 Vector Norms and Angles
    16.1.1 Vector Norms
    16.1.2 Inner product
  16.2 Matrix Norms
  16.3 Eigenvalues and Eigenvectors
  16.4 Spectral Radius
  16.5 Eigenvalues of Tridiagonal Matrices
17 Programming Tips
  17.1 Introduction
  17.2 Fortran Example
  17.3 Debugging and Validation
    17.3.1 Programming Rules
    17.3.2 Coding tips and compiler options
    17.3.3 Run time errors and compiler options
    17.3.4 Some common pitfalls
18 Debuggers
  18.1 Preparing the code for debugging
  18.2 Running the debugger

Chapter 1 Introduction

1.1 Justification of CFD

Fluid motion is governed by the Navier-Stokes equations, a set of coupled and nonlinear partial differential equations derived from the basic laws of conservation of mass, momentum and energy. The unknowns are usually the velocity, the pressure and the density (for stratified fluids), and some tracers like temperature and salinity.
An analytical, paper-and-pencil solution of these equations is practically impossible except for the simplest of flows. The simplifications can take the form of geometrical simplification (the flow is in a rectangle or a circle) and/or physical simplification (periodicity, homogeneous density, linearity, etc.). Occasionally it is possible to make headway by using asymptotic analysis techniques, and there have been remarkable successes over the past O(100) years, such as the development of boundary layer theory.

Scientists had to resort to laboratory experiments when theoretical analysis was impossible. Physical complexity can then be restored to the system. The answers delivered are, however, usually qualitatively different, since dynamical and geometric similitude are difficult to enforce simultaneously between the lab experiment and the prototype. A prime example is Reynolds number similarity, which, if violated, can turn a turbulent flow laminar. Furthermore, the design and construction of these experiments can be difficult (and costly), particularly for stratified rotating flows.

Computational fluid dynamics (CFD) is an additional tool in the arsenal of scientists. In its early days CFD was often controversial, as it involved additional approximations to the governing equations and raised additional (legitimate) issues. Nowadays CFD is an established discipline alongside theoretical and experimental methods. This position is in large part due to the exponential growth of computer power, which has allowed us to tackle ever larger and more complex problems.

1.2 Discretization

The central process in CFD is discretization, i.e. taking differential equations with an infinite number of degrees of freedom and reducing them to a system with a finite number of degrees of freedom. Hence, instead of determining the solution everywhere and for all times, we will be satisfied with its calculation at a finite number of locations and at specified time intervals. The partial differential equations are then reduced to a system of algebraic equations that can be solved on a computer.

Figure 1.1: Computation grid for a finite difference ocean model (region around Greenland and Iceland).

Errors creep in during the discretization process. The nature and characteristics of the errors must be controlled in order to ensure that 1) we are solving the correct equations (consistency), and 2) the error can be decreased as we increase the number of degrees of freedom (stability and convergence). Once these two criteria are established, the power of computing machines can be leveraged to solve the problem in a numerically reliable fashion.

Various discretization schemes have been developed to cope with a variety of issues. The most notable for our purposes are finite difference methods, finite volume methods, finite element methods, and spectral methods.

1.2.1 Finite Difference Method

Finite differences replace the infinitesimal limiting process of derivative calculation

  $$ f'(x) = \lim_{\Delta x \to 0} \frac{f(x+\Delta x) - f(x)}{\Delta x} \qquad (1.1) $$

with a finite limiting process, i.e.

  $$ f'(x) = \frac{f(x+\Delta x) - f(x)}{\Delta x} + O(\Delta x) \qquad (1.2) $$

The term O(Δx) gives an indication of the magnitude of the error as a function of the mesh spacing. Expanding f(x + Δx) in a Taylor series about x shows that the leading error term is (Δx/2) f''(x). In this instance the error is therefore halved if the grid spacing Δx is halved, and we say that this is a first-order method. Most FDMs used in practice are at least second-order accurate except in very special circumstances.

We will concentrate mostly on finite difference methods, since they are still among the most popular numerical methods for the solution of PDEs because of their simplicity, efficiency, low computational cost, and ease of analysis. Their major drawback is their geometric inflexibility, which complicates their application to general complex domains.
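The first-order behaviour of Eq. (1.2) can be checked numerically. The sketch below is an illustration, not code from the course: it applies the forward difference to an assumed test function f(x) = sin(x) at x = 1 (exact derivative cos(1)), halves Δx repeatedly, and reports the ratio of successive errors. The function names are hypothetical. For a first-order method the ratio should approach 2, and the first error should be close to the Taylor estimate (Δx/2)|f''(x)|.

```python
import math

def forward_difference(f, x, dx):
    """First-order forward difference approximation of f'(x), Eq. (1.2)."""
    return (f(x + dx) - f(x)) / dx

def error_table(f, dfdx, x, dx0, levels):
    """Return (dx, error, ratio) rows, halving dx at each level.

    ratio = error(2*dx) / error(dx); for a first-order scheme it tends to 2.
    The first row has no predecessor, so its ratio is NaN.
    """
    rows = []
    prev_err = None
    dx = dx0
    for _ in range(levels):
        err = abs(forward_difference(f, x, dx) - dfdx(x))
        ratio = prev_err / err if prev_err is not None else float("nan")
        rows.append((dx, err, ratio))
        prev_err = err
        dx /= 2.0
    return rows

if __name__ == "__main__":
    # Assumed test case: f = sin, f' = cos, evaluated at x = 1.
    for dx, err, ratio in error_table(math.sin, math.cos, 1.0, 0.1, 6):
        print(f"dx = {dx:.5f}  error = {err:.3e}  ratio = {ratio:.3f}")
```

Replacing the forward difference with the centered difference (f(x+Δx) - f(x-Δx))/(2Δx) should push the ratio toward 4, the signature of a second-order method.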
The geometric inflexibility of finite differences can be alleviated by the use of mapping techniques and/or masking to fit the computational mesh to the computational domain.

1.2.2 Finite Element Method

The finite element method was designed to deal with problems with complicated computational regions. The PDE is first recast into a variational form, which essentially forces the mean error to be small everywhere. The discretization step proceeds by dividing the computational domain into elements of triangular or rectangular shape. The solution within each element is interpolated with a polynomial of usually low order. Again, the unknowns are the solution values at the collocation points. The CFD community adopted the FEM in the 1980s, when reliable methods for dealing with advection-dominated problems were devised.

1.2.3 Spectral Methods

Both finite element and finite difference methods are low-order methods, usually of 2nd to 4th order, and have local approximation properties. By local we mean that a particular collocation point is affected by a limited number of points around it. In contrast, spectral methods have global approximation properties: the interpolation functions, either polynomials or trigonometric functions, are global in nature. Their main benefit is their rate of convergence, which depends on the smoothness of the solution (i.e. how many continuous derivatives it admits). For infinitely smooth solutions the error decreases exponentially, i.e. faster than any algebraic rate. Spectral methods are mostly used in computations of homogeneous turbulence, and require relatively simple geometries. Atmospheric models have also adopted spectral methods because of their convergence properties and the regular spherical shape of their computational domain.

Figure 1.2: Elemental partition of the global ocean as seen from the eastern and western equatorial Pacific. The inset shows the master element in the computational plane. The location of the interpolation points is marked with a circle, and the structuredness of this local grid is evident from the predictable adjacency pattern between collocation points.

1.2.4 Finite Volume Methods

Finite volume methods are primarily used in aerodynamics applications where strong shocks and discontinuities in the solution occur. The finite volume method solves an integral form of the governing equations, so that local continuity properties do not have to hold.

1.2.5 Computational Cost

The CPU time needed to solve the system of equations differs substantially from method to method. Finite differences are usually the cheapest on a per-grid-point basis, followed by the finite element method and the spectral method. However, a per-grid-point comparison is a little like comparing apples and oranges: spectral methods deliver more accuracy per grid point than either FEM or FDM. The comparison is more meaningful if the question is recast as "what is the computational cost to achieve a given error tolerance?". The problem then becomes one of defining the error measure, which is a complicated task in general situations.

1.3 Initialization and Forcing

The state of a system is determined by its initial state (conditions at t = 0), the forces acting on the system (sources and sinks of momentum, heat, buoyancy), the boundary conditions, and the governing equations. Given this information one can in principle integrate the equations forward to find the state of the system at a future time. Things are not so simple in practice. First, the initial state is seldom known accurately. In spite of advances in measurements and instrumentation, data deficiencies still exist and manifest themselves as inadequate temporal or spatial coverage or as measurement errors.
The situation is more dire in the ocean than in the atmosphere, because the dynamical scales are smaller and require more measurements per unit area, and because of observational difficulties. Furthermore, the fluxes between the ocean and the atmosphere are not well known and constitute another modeling impediment. The discipline of data assimilation is devoted to the task of integrating data and model in an optimal fashion. This is a topic we will not touch upon in this course.

1.4 Turbulence

Most flows occurring in nature are turbulent, in that they contain energy at all scales ranging from hundreds of kilometers to a few centimeters. It is obviously not possible to model all these scales at once. It is often sufficient to model the "large-scale physics" and to relegate the small unresolvable scales to a subgrid model. Subgrid modeling is a large discipline in CFD; we will use the simplest of these models, which relies on a simple eddy diffusivity coefficient.

1.5 Examples of system of equations and their properties

The numerical solution of any partial differential equation should only proceed after careful consideration of the dynamical setting of the problem. A prudent modeler will try to learn as much as he can about the system he is trying to solve, for otherwise how can he judge the results of his simulations? Errors due to numerical approximation are sometimes notoriously hard to catch and diagnose, and knowledge of the expected behavior of the system will go a long way toward catching these errors. Furthermore, one can often simplify the numerical solution process by taking advantage of special features of the solution, such as symmetry or a reduction in the number of unknowns, to reduce the complexity of the analytical and numerical formulations. In the following sections we consider the two-dimensional incompressible Navier-Stokes equations to illustrate some of these points.
The incompressible Navier-Stokes equations are an example of a system of partial differential equations governing a large class of fluid flow problems. We will confine ourselves to two dimensions for simplicity. The primitive form of these equations is:

  $$ \mathbf{v}_t + \mathbf{v}\cdot\nabla\mathbf{v} = -\frac{1}{\rho_0}\nabla p + \nu\nabla^2\mathbf{v} \qquad \text{(momentum conservation)} \qquad (1.3) $$
  $$ \nabla\cdot\mathbf{v} = 0 \qquad \text{(mass conservation)} \qquad (1.4) $$

supplemented with proper boundary and initial conditions. The unknowns are the two velocity components u and v and the pressure p, so that we have 3 unknown functions to determine. The parameters of the problem are the density and the kinematic viscosity, which are assumed constant in this example. Equation (1.3) expresses the conservation of momentum, and equation (1.4), also referred to as the continuity equation, expresses conservation of mass, which, for a constant-density fluid, amounts to conservation of volume.

The form of equations (1.3)-(1.4) is called primitive because it uses velocity and pressure as its dependent variables. In the early computer days memory was expensive, and modelers had to use their ingenuity to reduce the model's complexity as much as possible. Furthermore, the incompressibility constraint complicates the solution process substantially, since there is no simple evolution equation for the pressure. The streamfunction-vorticity formulation of the two-dimensional Navier-Stokes equations addresses both difficulties by enforcing the continuity requirement implicitly and reducing the number of unknown functions. The streamfunction-vorticity formulation introduces other complications in the solution process that we will ignore for the time being. Alas, this is a typical occurrence where a cure to one set of concerns raises quite a few, and sometimes irritating, side effects. The streamfunction is defined as follows:

  $$ u = -\psi_y, \qquad v = \psi_x. \qquad (1.5) $$

Any velocity derivable from such a streamfunction is thus guaranteed to conserve mass, since

  $$ u_x + v_y = (-\psi_y)_x + (\psi_x)_y = -\psi_{yx} + \psi_{xy} = 0. $$
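The same cancellation carries over to a grid, provided the velocity and the divergence are computed with the same centered difference operators: the discrete x- and y-differences commute, just as ψ_xy = ψ_yx. The sketch below checks this under assumptions that are not part of the notes: a doubly periodic n x n grid, the hypothetical test streamfunction ψ = sin(x) cos(2y), and illustrative function names. The printed divergence should be at round-off level.

```python
import math

def ddx(f, dx):
    """Centered difference in x (first index) on a doubly periodic grid."""
    nx, ny = len(f), len(f[0])
    # f[i - 1] with i = 0 wraps to the last row, giving periodicity.
    return [[(f[(i + 1) % nx][j] - f[i - 1][j]) / (2 * dx) for j in range(ny)]
            for i in range(nx)]

def ddy(f, dy):
    """Centered difference in y (second index) on a doubly periodic grid."""
    nx, ny = len(f), len(f[0])
    return [[(f[i][(j + 1) % ny] - f[i][j - 1]) / (2 * dy) for j in range(ny)]
            for i in range(nx)]

def max_divergence(psi, dx, dy):
    """Velocity from Eq. (1.5), u = -psi_y, v = psi_x; return max |u_x + v_y|."""
    u = [[-val for val in row] for row in ddy(psi, dy)]
    v = ddx(psi, dx)
    ux, vy = ddx(u, dx), ddy(v, dy)
    return max(abs(ux[i][j] + vy[i][j])
               for i in range(len(psi)) for j in range(len(psi[0])))

if __name__ == "__main__":
    n = 32
    h = 2 * math.pi / n
    # Hypothetical test streamfunction, periodic on [0, 2*pi) x [0, 2*pi).
    psi = [[math.sin(i * h) * math.cos(2 * j * h) for j in range(n)]
           for i in range(n)]
    print("max |u_x + v_y| =", max_divergence(psi, h, h))
```

If the divergence were instead evaluated with a different stencil than the one used to build u and v, the cancellation would hold only to truncation error, not to round-off.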
To simplify the equations further we introduce the vorticity $\zeta = v_x - u_y$ (a scalar in 2D flows), which in terms of the streamfunction is

$$\nabla^2\psi = \zeta. \tag{1.6}$$

We can derive an evolution equation for the vorticity by taking the curl of the momentum equation, Eq. (1.3). The final form, after using various vector identities, together with the scale of each term, is:

$$\begin{array}{ccccc}
\zeta_t & + & \mathbf{v}\cdot\nabla\zeta & = & \nu\nabla^2\zeta \\
\text{time rate of change} & & \text{advection} & & \text{diffusion} \\
\dfrac{\Omega}{T} & & \dfrac{U\Omega}{L} & & \dfrac{\nu\Omega}{L^2} \\
1 & & \dfrac{UT}{L} & & \dfrac{\nu T}{L^2} \\
1 & & 1 & & \dfrac{\nu}{UL} = \dfrac{1}{Re}
\end{array} \tag{1.7}$$

Note that the equation simplifies considerably since the pressure does not appear in it (the curl of a gradient is always zero). Equations (1.6) and (1.7) are now a system of two equations in two unknowns: the vorticity and the streamfunction. The pressure has disappeared as an unknown, and its role has been assumed by the streamfunction. The two physical processes governing the evolution of vorticity are advection by the flow and diffusion by viscous action. Equation (1.7) is an example of a parabolic partial differential equation requiring initial and boundary data for its unique solution. Equation (1.6) is an example of an elliptic partial differential equation; in this instance it is a Poisson equation linking a given vorticity distribution to a streamfunction. Occasionally the terms prognostic and diagnostic are applied to the vorticity and streamfunction, respectively, to mean that the vorticity evolves dynamically in time to balance the conservation of momentum, while the streamfunction responds instantly to changes in vorticity to enforce kinematic constraints. A numerical solution of this coupled set of equations would proceed in the following manner: given an initial distribution of vorticity, the corresponding streamfunction is computed from the solution of the Poisson equation, along with the associated velocity; the vorticity equation is then integrated in time using the previous values of the unknown fields; the new streamfunction is subsequently updated.
The process is repeated until the desired time is reached.

In order to gauge which process, advection or diffusion, dominates the vorticity evolution, we non-dimensionalize the variables with the time, length and velocity scales $T$, $L$ and $U$, respectively. The vorticity scale is then $\Omega = U/L$ from the vorticity definition. The time rate of change, advection and diffusion then scale as $\Omega/T$, $U\Omega/L$, and $\nu\Omega/L^2$, as shown in the third line of equation (1.7). The fourth line shows the relative sizes of the terms after multiplying the third line by $T/\Omega$. If the time scale is chosen to be the advective time scale, i.e. $T = L/U$, then we obtain the fifth line, which shows a single dimensionless parameter, the Reynolds number $Re = UL/\nu$, controlling the evolution of $\zeta$. When $Re \ll 1$ diffusion dominates and the equation reduces to the so-called heat equation $\zeta_t = \nu\nabla^2\zeta$. If $Re \gg 1$ advection dominates almost everywhere in the domain. Notice that dropping the viscous term is problematic since it has the highest order derivative, and hence controls the imposition of boundary conditions. Diffusion then has to become important near the boundary, through an increase of the vorticity gradient in thin boundary layers where advection and viscous action become balanced.

What are the implications of all the above analysis for the numerical solution? By carefully analysing the vorticity dynamics we have shown that a low Reynolds number simulation requires attention to the viscous operator, whereas advection dominates in high Reynolds number flow. Furthermore, close attention must be paid to the boundary layers forming near the edges of the domain. Further checks on the solution can be obtained by spatially integrating various forms of the vorticity equation to show that the energy (kinetic energy here) and the enstrophy $\zeta^2/2$ should be conserved in the inviscid case, $Re = \infty$, when the domain is closed.
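The solution cycle of Section 1.5 (invert the Poisson equation (1.6) for $\psi$, form the velocity from (1.5), advance $\zeta$ with (1.7), repeat) can be sketched in a few lines. The sketch rests on assumptions that the notes do not make: a doubly periodic $2\pi \times 2\pi$ domain, so the Poisson equation can be inverted with FFTs; nondimensional variables in which $\nu$ plays the role of $1/Re$; Heun's (second-order Runge-Kutta) time stepping; and no dealiasing. All function names are hypothetical. The test case uses the single mode $\zeta = \sin x \sin y$, for which the advection term vanishes identically and the exact solution decays as $e^{-2\nu t}$, so it checks both the Poisson inversion and the viscous term.

```python
import numpy as np

def spectral_operators(n, length=2.0 * np.pi):
    """Wavenumbers for an n x n doubly periodic grid; fields are indexed [y, x]."""
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    kx, ky = np.meshgrid(k, k)          # kx varies along axis 1 (x), ky along axis 0 (y)
    return kx, ky, kx**2 + ky**2

def invert_laplacian(zeta_hat, ksq):
    """Spectral solution of lap(psi) = zeta, eq. (1.6); the undetermined mean of psi is set to 0."""
    psi_hat = np.zeros_like(zeta_hat)
    nonzero = ksq > 0
    psi_hat[nonzero] = -zeta_hat[nonzero] / ksq[nonzero]
    return psi_hat

def to_grid(f_hat):
    return np.real(np.fft.ifft2(f_hat))

def tendency(zeta, kx, ky, ksq, nu):
    """Right-hand side of (1.7): -v.grad(zeta) + nu*lap(zeta)."""
    zeta_hat = np.fft.fft2(zeta)
    psi_hat = invert_laplacian(zeta_hat, ksq)      # diagnostic step
    u = to_grid(-1j * ky * psi_hat)                 # u = -psi_y, eq. (1.5)
    v = to_grid(1j * kx * psi_hat)                  # v =  psi_x
    advection = u * to_grid(1j * kx * zeta_hat) + v * to_grid(1j * ky * zeta_hat)
    return -advection + nu * to_grid(-ksq * zeta_hat)

def integrate(zeta, nu, dt, nsteps):
    """Prognostic step: advance zeta with Heun's method (second-order Runge-Kutta)."""
    kx, ky, ksq = spectral_operators(zeta.shape[0])
    for _ in range(nsteps):
        k1 = tendency(zeta, kx, ky, ksq, nu)
        k2 = tendency(zeta + dt * k1, kx, ky, ksq, nu)
        zeta = zeta + 0.5 * dt * (k1 + k2)
    return zeta

n = 32
x = 2.0 * np.pi * np.arange(n) / n
X, Y = np.meshgrid(x, x)
zeta0 = np.sin(X) * np.sin(Y)       # lap(zeta0) = -2 zeta0; advection vanishes for this mode
nu, dt, nsteps = 0.1, 0.01, 100     # nondimensional: nu stands for 1/Re
zeta = integrate(zeta0, nu, dt, nsteps)
exact = zeta0 * np.exp(-2.0 * nu * dt * nsteps)
print(np.abs(zeta - exact).max())   # small: set by the time-stepping error
```

With an initial condition containing several modes the advection term no longer vanishes and the same loop exercises the full nonlinear cycle; the explicit time step is then constrained by stability, particularly at small $\nu$ (large $Re$).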
Chapter 2

Basics of PDEs

Partial differential equations are used to model a wide variety of physical phenomena. As such we expect their solution methodology to depend on the physical principles used to derive them. A number of properties can be used to distinguish the different types of differential equations encountered. In order to give concrete examples of the discussions to follow we will use the following partial differential equation as an example:

$$a u_{xx} + b u_{xy} + c u_{yy} + d u_x + e u_y + f = 0. \tag{2.1}$$

The unknown function in equation (2.1) is $u$, which depends on the two independent variables $x$ and $y$. The coefficients of the equation $a, b, \ldots, f$ are yet undefined. The following properties of a partial differential equation are useful in determining its character, properties, method of analysis, and numerical solution:

Order: The order of a PDE is the order of the highest occurring derivative. The order of equation (2.1) is 2. A more detailed description of the equation would require the specification of the order for each independent variable; equation (2.1) is second order in both $x$ and $y$. Most equations derived from physical principles are first order in time, and first or second order in space.

Linear: The PDE is linear if none of its coefficients depend on the unknown function. In our case this is tantamount to requiring that the functions $a, b, \ldots, f$ are independent of $u$. Linear PDEs are important since linear combinations of their solutions are themselves solutions. More precisely, if $u$ and $v$ are solutions of the (homogeneous, $f = 0$) PDE, then so is $w = \alpha u + \beta v$, where $\alpha$ and $\beta$ are constants independent of $u$, $x$ and $y$. The Laplace equation $u_{xx} + u_{yy} = 0$ is linear, while the one-dimensional Burgers equation $u_t + u u_x = 0$ is nonlinear. The majority of numerical analysis results are valid for linear equations, and little is known or generalizable to nonlinear equations.
Quasi linear: A PDE is quasi linear if it is linear in its highest order derivatives, i.e. the coefficients multiplying the highest order derivatives depend at most on the function and its lower order derivatives. Thus, in our example, $a$, $b$ and $c$ may depend on $u$, $u_x$ and $u_y$ but not on $u_{xx}$, $u_{yy}$ or $u_{xy}$. Quasi linear equations form an important subset of the larger nonlinear class because methods of analysis and solution developed for linear equations can safely be applied to quasi linear equations. The vorticity transport equation of quasi-geostrophic motion,

$$\frac{\partial \nabla^2\psi}{\partial t} + \frac{\partial \psi}{\partial x}\frac{\partial \nabla^2\psi}{\partial y} - \frac{\partial \psi}{\partial y}\frac{\partial \nabla^2\psi}{\partial x} = 0, \tag{2.2}$$

where $\nabla^2\psi = \psi_{xx} + \psi_{yy}$, is a third order quasi linear PDE for the streamfunction $\psi$.

2.1 Classification of second order PDEs

The following classifications are rooted in the character of the physical laws represented by the PDEs. However, these characteristics can be given a definite mathematical classification that at first sight has nothing to do with their origins. We will attempt to link the PDE's category to the relevant physical roots. The first step in attempting to solve equation (2.1) is to attempt a transformation of coordinates (the independent variables $x$ and $y$) in order to simplify the equation. The change of variables can take the general form:

$$\xi = \xi(x, y), \qquad \eta = \eta(x, y) \tag{2.3}$$

where $\xi$ and $\eta$ are the new independent variables. This is equivalent to a change of coordinates.
Using the chain rule of differentiation we have:

$$u_x = u_\xi\xi_x + u_\eta\eta_x \tag{2.4}$$
$$u_y = u_\xi\xi_y + u_\eta\eta_y \tag{2.5}$$
$$u_{xx} = u_{\xi\xi}\xi_x^2 + 2u_{\xi\eta}\xi_x\eta_x + u_{\eta\eta}\eta_x^2 + u_\xi\xi_{xx} + u_\eta\eta_{xx} \tag{2.6}$$
$$u_{yy} = u_{\xi\xi}\xi_y^2 + 2u_{\xi\eta}\xi_y\eta_y + u_{\eta\eta}\eta_y^2 + u_\xi\xi_{yy} + u_\eta\eta_{yy} \tag{2.7}$$
$$u_{xy} = u_{\xi\xi}\xi_x\xi_y + u_{\xi\eta}(\xi_x\eta_y + \xi_y\eta_x) + u_{\eta\eta}\eta_x\eta_y + u_\xi\xi_{xy} + u_\eta\eta_{xy} \tag{2.8}$$

Substituting these expressions in PDE (2.1) we arrive at the following equation:

$$A u_{\xi\xi} + B u_{\xi\eta} + C u_{\eta\eta} + D u_\xi + E u_\eta + F = 0, \tag{2.9}$$

where the coefficients are given by the following expressions:

$$A = a\xi_x^2 + b\xi_x\xi_y + c\xi_y^2 \tag{2.10}$$
$$B = 2a\xi_x\eta_x + b(\xi_x\eta_y + \xi_y\eta_x) + 2c\xi_y\eta_y \tag{2.11}$$
$$C = a\eta_x^2 + b\eta_x\eta_y + c\eta_y^2 \tag{2.12}$$
$$D = a\xi_{xx} + b\xi_{xy} + c\xi_{yy} + d\xi_x + e\xi_y \tag{2.13}$$
$$E = a\eta_{xx} + b\eta_{xy} + c\eta_{yy} + d\eta_x + e\eta_y \tag{2.14}$$
$$F = f \tag{2.15}$$

The equation can be simplified if $\xi$ and $\eta$ can be chosen such that $A = C = 0$, which in terms of the transformation factors requires:

$$a\xi_x^2 + b\xi_x\xi_y + c\xi_y^2 = 0, \qquad a\eta_x^2 + b\eta_x\eta_y + c\eta_y^2 = 0. \tag{2.16}$$

Assuming $\xi_y$ and $\eta_y$ are not equal to zero, we can rearrange the above equations into the form $a r^2 + b r + c = 0$, where $r = \xi_x/\xi_y$ or $\eta_x/\eta_y$. The number of roots of this quadratic depends on the sign of the discriminant $b^2 - 4ac$. Before considering the different cases we note that the sign of the discriminant is independent of the choice of the coordinate system. Indeed it can easily be shown that the discriminant in the new system is

$$B^2 - 4AC = (b^2 - 4ac)(\xi_x\eta_y - \xi_y\eta_x)^2,$$

which has the same sign as the discriminant in the old system, since $(\xi_x\eta_y - \xi_y\eta_x)^2$ is the square of the Jacobian of the mapping between $(x, y)$ and $(\xi, \eta)$ space, and the Jacobian must be nonzero (one-signed) for the mapping to be valid.

2.1.1 Hyperbolic Equation: $b^2 - 4ac > 0$

In the case where $b^2 - 4ac > 0$ the quadratic has two distinct real roots and the equation is called hyperbolic. The roots are given by $r = \dfrac{-b \pm \sqrt{b^2 - 4ac}}{2a}$.
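The classification rule and the invariance of the sign of the discriminant can be checked numerically. The sketch below, written in Python with NumPy as an assumption (the notes prescribe no language), classifies three familiar equations and verifies $B^2 - 4AC = (b^2 - 4ac)(\xi_x\eta_y - \xi_y\eta_x)^2$ for a randomly chosen linear change of variables, using (2.10)-(2.12). For the wave equation $u_{tt} - \kappa^2 u_{xx} = 0$, $t$ plays the role of $y$, so $a = -\kappa^2$, $b = 0$, $c = 1$; for the heat equation $u_t = \kappa^2 u_{xx}$ only $a = \kappa^2$ is nonzero among the second-order coefficients. The function names and the tolerance are hypothetical choices.

```python
import numpy as np

def classify(a, b, c, tol=1e-12):
    """Type of a*u_xx + b*u_xy + c*u_yy + (lower order terms) = 0, from the sign of b^2 - 4ac."""
    disc = b * b - 4.0 * a * c
    if disc > tol:
        return "hyperbolic"
    if disc < -tol:
        return "elliptic"
    return "parabolic"

def transformed_coefficients(a, b, c, xi_x, xi_y, eta_x, eta_y):
    """Second-order coefficients A, B, C of the transformed equation, eqs. (2.10)-(2.12)."""
    A = a * xi_x**2 + b * xi_x * xi_y + c * xi_y**2
    B = 2 * a * xi_x * eta_x + b * (xi_x * eta_y + xi_y * eta_x) + 2 * c * xi_y * eta_y
    C = a * eta_x**2 + b * eta_x * eta_y + c * eta_y**2
    return A, B, C

kappa = 2.0
print(classify(-kappa**2, 0.0, 1.0))   # wave equation (y -> t): hyperbolic
print(classify(kappa**2, 0.0, 0.0))    # heat equation: parabolic
print(classify(1.0, 0.0, 1.0))         # Laplace equation: elliptic

# Invariance of the discriminant's sign under a random linear change of variables.
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=3)
xi_x, xi_y, eta_x, eta_y = rng.normal(size=4)
A, B, C = transformed_coefficients(a, b, c, xi_x, xi_y, eta_x, eta_y)
jacobian = xi_x * eta_y - xi_y * eta_x
print(np.isclose(B**2 - 4 * A * C, (b**2 - 4 * a * c) * jacobian**2))   # True
```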
The coordinate transformation required is hence given by:

$$\frac{\xi_x}{\xi_y} = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \tag{2.17}$$
$$\frac{\eta_x}{\eta_y} = \frac{-b - \sqrt{b^2 - 4ac}}{2a} \tag{2.18}$$

The interpretation of the above relations is easily obtained by focusing on constant-$\xi$ curves, where $d\xi = \xi_x\,dx + \xi_y\,dy = 0$, and hence:

$$\left.\frac{dy}{dx}\right|_{\xi} = -\frac{\xi_x}{\xi_y} = \frac{b - \sqrt{b^2 - 4ac}}{2a} \tag{2.19}$$
$$\left.\frac{dy}{dx}\right|_{\eta} = -\frac{\eta_x}{\eta_y} = \frac{b + \sqrt{b^2 - 4ac}}{2a} \tag{2.20}$$

The roots of the quadratic are hence nothing but (up to a sign) the slopes of the constant-$\xi$ and constant-$\eta$ curves. These curves are called the characteristic curves. They are the preferred directions along which information propagates in a hyperbolic system. In the $(\xi, \eta)$ system the equation takes the canonical form:

$$B u_{\xi\eta} + D u_\xi + E u_\eta + F = 0 \tag{2.21}$$

The solution can be easily obtained in the case $D = E = F = 0$, and takes the form:

$$u = G(\xi) + H(\eta) \tag{2.22}$$

where $G$ and $H$ are functions determined by the boundary and initial conditions.

Example 1: The one-dimensional wave equation

$$u_{tt} - \kappa^2 u_{xx} = 0, \qquad -\infty \le x \le \infty \tag{2.23}$$

where $\kappa$ is the wave speed, is an example of a hyperbolic system, since its $b^2 - 4ac = 4\kappa^2 > 0$. The slopes of the characteristic curves are given by

$$\frac{dx}{dt} = \pm\kappa, \tag{2.24}$$

which, for constant $\kappa$, give the two families of characteristics:

$$\xi = x - \kappa t, \qquad \eta = x + \kappa t \tag{2.25}$$

Initial conditions are needed to solve this problem; assuming they are of the form

$$u(x, 0) = f(x), \qquad u_t(x, 0) = g(x), \tag{2.26}$$

and writing the general solution as $u = F(\xi) + G(\eta)$, we get the equations:

$$F(x) + G(x) = f(x) \tag{2.27}$$
$$-\kappa F'(x) + \kappa G'(x) = g(x) \tag{2.28}$$

The second equation can be integrated in $x$ to give

$$-\kappa\left[F(x) - F(x_0)\right] + \kappa\left[G(x) - G(x_0)\right] = \int_{x_0}^{x} g(\alpha)\,d\alpha \tag{2.29}$$

where $x_0$ is arbitrary and $\alpha$ is an integration variable. We now have two equations in two unknowns, $F$ and $G$, and the system can be solved to get:

$$F(x) = \frac{f(x)}{2} - \frac{1}{2\kappa}\int_{x_0}^{x} g(\alpha)\,d\alpha + \frac{F(x_0) - G(x_0)}{2} \tag{2.30}$$
$$G(x) = \frac{f(x)}{2} + \frac{1}{2\kappa}\int_{x_0}^{x} g(\alpha)\,d\alpha - \frac{F(x_0) - G(x_0)}{2} \tag{2.31}$$

To obtain the final solution we replace $x$ by $x - \kappa t$ in the expression for $F$ and by $x + \kappa t$ in the expression for $G$; adding up the resulting expressions, the final solution to the PDE takes the form:

$$u(x, t) = \frac{f(x - \kappa t) + f(x + \kappa t)}{2} + \frac{1}{2\kappa}\int_{x - \kappa t}^{x + \kappa t} g(\tau)\,d\tau \tag{2.32}$$

Figure 2.1 shows the solution of the wave equation for the case where $\kappa = 1$, $g = 0$, and $f(x)$ is a square wave. Time increases from left to right. The succession of plots shows two travelling waves, going in the positive and negative $x$-directions respectively at the speed $\kappa = 1$. Notice that after the crossing, the two square waves travel with no change of shape.

Figure 2.1: Solution to the second order wave equation. The top left panel shows the initial conditions, and the subsequent ones the solution at t = 0.4, 0.8, 1.2, 1.6 and 2.0.

2.1.2 Parabolic Equation: $b^2 - 4ac = 0$

If $b^2 - 4ac = 0$ then there is only one double root, and the equation is called parabolic. The two characteristic curves then coincide:

$$\left.\frac{dy}{dx}\right|_{\xi} = \left.\frac{dy}{dx}\right|_{\eta} = \frac{b}{2a} \tag{2.33}$$

Since the two characteristic curves coincide, only one of the new coordinates, say $\xi$, can be taken constant along the characteristics, which makes $A = 0$; $\eta$ is then any function independent of $\xi$. Because $B^2 - 4AC = 0$ and $A = 0$, it follows that $B = 0$ as well. The canonical form of the parabolic equation is then

$$C u_{\eta\eta} + D u_\xi + E u_\eta + F = 0 \tag{2.34}$$

Example 2: The heat equation is a common example of parabolic equations:

$$u_t = \kappa^2 u_{xx} \tag{2.35}$$

where $\kappa$ now stands for a diffusion coefficient. The solution in an infinite domain can be obtained by Fourier transforming the PDE to get:

$$\tilde{u}_t = -k^2\kappa^2\tilde{u} \tag{2.36}$$

where $\tilde{u}$ is the Fourier transform of $u$:

$$\tilde{u}(k, t) = \int_{-\infty}^{\infty} u(x, t)\,e^{-ikx}\,dx, \tag{2.37}$$

and $k$ is the wavenumber.
The transformed PDE is simply a first order ODE, and can easily be solved with the help of the initial condition $u(x, 0) = u_0(x)$:

$$\tilde{u}(k, t) = \tilde{u}_0\,e^{-k^2\kappa^2 t} \tag{2.38}$$

The final solution is obtained by the inverse Fourier transform $u(x, t) = \mathcal{F}^{-1}(\tilde{u})$. The latter can be written as a convolution since the inverse transforms of $\tilde{u}_0$, namely $u_0 = \mathcal{F}^{-1}(\tilde{u}_0)$, and of the Gaussian, $\mathcal{F}^{-1}\left(e^{-k^2\kappa^2 t}\right) = \frac{1}{2\kappa\sqrt{\pi t}}\,e^{-x^2/(4\kappa^2 t)}$, are known:

$$u(x, t) = \frac{1}{2\kappa\sqrt{\pi t}}\int_{-\infty}^{\infty} u_0(X)\,e^{-\frac{(x - X)^2}{4\kappa^2 t}}\,dX \tag{2.39}$$

Figure 2.2: Solution to the heat equation. The figure shows the initial condition (the top hat profile) and the solution at times t = 0.01, 0.1, 0.2.

As an example we show the solution of the heat equation using the same square initial condition as for the wave equation. The solution can be written in terms of the error function:

$$u(x, t) = \frac{1}{2\kappa\sqrt{\pi t}}\int_{-1}^{1} e^{-\frac{(x - X)^2}{4\kappa^2 t}}\,dX = \frac{1}{2}\left[\operatorname{erf}\left(\frac{x + 1}{2\kappa\sqrt{t}}\right) - \operatorname{erf}\left(\frac{x - 1}{2\kappa\sqrt{t}}\right)\right], \tag{2.40}$$

where $\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}}\int_0^z e^{-s^2}\,ds$. The solution is shown in figure 2.2. Instead of a travelling wave, the solution shows a smearing of the two discontinuities as time increases, accompanied by a decrease of the maximum amplitude. As its name indicates, the heat equation is an example of a diffusion equation, where gradients in the solution tend to be smeared.

2.1.3 Elliptic Equation: $b^2 - 4ac < 0$

If $b^2 - 4ac < 0$ there are no real roots; the equation is then called elliptic. There is no real transformation that can eliminate the second derivatives in $\xi$ and $\eta$. However, it is possible to find a transformation that sets $B = 0$ and $A = C = 1$. The canonical form is then:

$$u_{\xi\xi} + u_{\eta\eta} + D u_\xi + E u_\eta + F = 0 \tag{2.41}$$

Example 3: The Laplace equation (the homogeneous Poisson equation) in two space dimensions is an example of an elliptic PDE:

$$u_{xx} + u_{yy} = 0 \tag{2.42}$$

2.2 Well-Posed Problems

Before attempting to compute a numerical solution to a PDE, we need to find out if its analytical solution makes sense.
In other words, we want to find out if enough information is present in the problem formulation to identify the solution. The definition of a well-posed problem addresses this issue. A well-posed problem is defined as one whose solution satisfies the following properties:

• the solution exists
• the solution is unique
• the solution depends continuously upon the data

Example 4: Consider the system of equations:

$$u_y + v_x = 0, \qquad v_y + \gamma u_x = 0 \tag{2.43}$$

The system can be reduced to a second order PDE in one variable:

$$u_{yy} - \gamma u_{xx} = 0, \qquad v_{yy} - \gamma v_{xx} = 0 \tag{2.44}$$

Clearly, the PDEs are hyperbolic if $\gamma > 0$ and elliptic if $\gamma < 0$. To solve this system of equations we require boundary conditions. To continue with our example, we look for periodic solutions in $x$ and impose the following boundary conditions:

$$u(x, 0) = \frac{\sin(Nx)}{N^2}, \qquad v(x, 0) = 0 \tag{2.45}$$

1. Elliptic, $\gamma = -\beta^2 < 0$. The solution for this case is then:

$$u(x, y) = \frac{\sin(Nx)}{N^2}\cosh(\beta N y), \qquad v(x, y) = \beta\,\frac{\cos(Nx)}{N^2}\sinh(\beta N y) \tag{2.46}$$

Notice that even though the PDEs are identical, the two solutions $u$ and $v$ are different because of the boundary conditions. For $N \to \infty$ the boundary conditions become identical, and hence one would expect the solutions to become identical. However, it is easy to verify that $|u - v| \to \infty$ for any finite $y > 0$ as $N \to \infty$. Hence small changes in the boundary conditions lead to large changes in the solution, and the continuity between the boundary data and the solution has been lost. The problem in this case is that no boundary condition has been imposed at $y \to \infty$, as required by the elliptic nature of the problem.

2. Hyperbolic, $\gamma = \beta^2 > 0$. The solution is then

$$u(x, y) = \frac{\sin(Nx)}{N^2}\cos(\beta N y), \qquad v(x, y) = -\beta\,\frac{\cos(Nx)}{N^2}\sin(\beta N y) \tag{2.47}$$

Notice that in this case $u, v \to 0$ when $N \to \infty$ for any finite value of $y$.

2.3 First Order Systems

The previous classification considered a single, scalar (one dependent variable), second order partial differential equation in two independent variables.
In most physical systems there are usually a number of PDEs, involving higher order derivatives, that must be satisfied simultaneously. We must then be able to classify these systems before attempting their solution. Since systems of PDEs can be recast as first order systems, the classification uses the latter form. For example, the wave equation of the previous section can be cast as two equations in two unknowns:

$$\begin{cases} v_t = \kappa\eta_x \\ \eta_t = \kappa v_x \end{cases} \quad\Longleftrightarrow\quad \frac{\partial}{\partial t}\begin{pmatrix} v \\ \eta \end{pmatrix} = \begin{pmatrix} 0 & \kappa \\ \kappa & 0 \end{pmatrix}\frac{\partial}{\partial x}\begin{pmatrix} v \\ \eta \end{pmatrix} \tag{2.48}$$

where we have defined $\eta = \kappa u_x$ and $v = u_t$. Note that it is possible to consider $v$ and $\eta$ as the components of a vector unknown $\mathbf{w}$ and to write the equations in vector notation as shown in equation (2.48). In the next few sections we will look primarily at the conditions under which a first order system is hyperbolic.

2.3.1 Scalar Equation

A typical equation of this sort is the advection equation:

$$u_t + c u_x = 0, \quad 0 \le x \le L, \qquad u(x, t = 0) = u_0(x), \qquad u(x = 0, t) = u_l(t) \tag{2.49}$$

where $c$ is the advecting velocity. Let us define $c = dx/dt$, so that the equation becomes:

$$\frac{\partial u}{\partial t}dt + \frac{\partial u}{\partial x}dx = du = 0 \tag{2.50}$$

where $du$ is the total differential of the function $u$. Since the right hand side is zero, the function must be constant along the lines $dx/dt = c$, and this constant must be equal to the value of $u$ at the initial time. The solution can then be written as:

$$u(x, t) = u_0(x_0) \quad \text{along} \quad \frac{dx}{dt} = c \tag{2.51}$$

where $x_0$ is the location of the foot of the characteristic, the intersection of the characteristic with the $t = 0$ line.

Figure 2.3: Characteristic lines for the linear advection equation. The solid lines are the characteristics emanating from different locations on the initial axis. The dashed line represents the signal at times t = 0 and t = 1. If the solution at (x, t) is desired, we first need to find the foot of the characteristic, x0 = x − ct, and the value of the function there at the initial time is copied to time t.
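The recipe "find the foot of the characteristic, then copy the initial value" can be sketched for a general speed $c(x, t)$ by integrating $dx/dt = c$ backward in time from $(x, t)$ to $t = 0$. The sketch assumes an infinite domain (no boundaries), a smooth speed field, and the classical fourth-order Runge-Kutta method for the backward integration; these choices and the function names are illustrative assumptions, not the notes' method. For constant $c$ it reduces to $x_0 = x - ct$; for $c = x$ the characteristics are $x = x_0 e^{t}$, so $x_0 = x e^{-t}$.

```python
import math

def foot_of_characteristic(x, t, c, nsteps=200):
    """Trace dx/ds = c(x, s) backward from (x, t) to s = 0 with classical RK4; return x0."""
    h = -t / nsteps                  # negative step: march from s = t down to s = 0
    s, xs = t, x
    for _ in range(nsteps):
        k1 = c(xs, s)
        k2 = c(xs + 0.5 * h * k1, s + 0.5 * h)
        k3 = c(xs + 0.5 * h * k2, s + 0.5 * h)
        k4 = c(xs + h * k3, s + h)
        xs += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        s += h
    return xs

def advect(u0, x, t, c):
    """u(x, t) = u0(x0), with x0 the foot of the characteristic through (x, t), eq. (2.51)."""
    return u0(foot_of_characteristic(x, t, c))

print(foot_of_characteristic(2.0, 1.0, lambda x, s: 1.5))   # constant c: x - c t = 0.5
print(foot_of_characteristic(2.0, 1.0, lambda x, s: x))     # c = x: 2 exp(-1), about 0.7358
print(advect(lambda x: math.exp(-x * x), 3.0, 2.0, lambda x, s: 1.5))  # hump peak moved to x = 3
```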
The simplest way to visualize this picture is to consider the case where $c$ is constant. The characteristic lines can then be obtained analytically: they are straight lines given by $x = x_0 + ct$. A family of characteristic lines is shown in figure 2.3, where $c$ is assumed positive. In this example the information travels from left to right at the constant speed $c$, and the initial hump translates to the right without change of shape.

If the domain is of finite extent, say $0 \le x \le L$, and the characteristic intersects the line $x = 0$ (assuming $c > 0$), then a boundary condition is needed on the left boundary to provide the information needed to determine the solution uniquely. That is, we need to provide the variation of $u$ at the "inlet" boundary in the form $u(x = 0, t) = g(t)$. The solution can now be written as:

$$u(x, t) = \begin{cases} u_0(x - ct) & \text{for } x - ct > 0 \\ g(t - x/c) & \text{for } x - ct < 0 \end{cases} \tag{2.52}$$

Note that since the information travels from left to right, the boundary condition is needed at the left boundary and not at the right. The solution would be impossible to determine had the boundary conditions been given at the right boundary $x = L$; the problem would then be ill-posed for lack of proper boundary conditions. The right boundary may be called an "outlet" boundary since information is leaving the domain. No boundary condition can be imposed there since the solution is determined from "upstream" information. In the case where $c < 0$ the information travels from right to left, and the boundary condition must be imposed at $x = L$.

Figure 2.4: Characteristics for Burgers' equation (left panel), and the solution (right panel) at different times for a periodic problem with u(x, 0) = 1 − sin(πx). The black line is the initial condition, the red line the solution at t = 1/8, the blue at t = 1/4, and the magenta at t = 3/4. The solution becomes discontinuous when characteristics intersect.
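Equation (2.52) translates directly into code: each point is traced back along its straight characteristic, and the value is taken from the initial condition or from the inlet data depending on whether the foot lies on $t = 0$ or on $x = 0$. The sketch assumes $c > 0$ and vectorized callables `u0` and `g` (hypothetical names standing for $u_0$ and $g$); with $c < 0$ the inlet would move to $x = L$.

```python
import numpy as np

def advection_with_inlet(x, t, c, u0, g):
    """Exact solution (2.52) of u_t + c u_x = 0 for x >= 0, c > 0, inlet data u(0, t) = g(t).

    u0 and g must accept NumPy arrays. Points whose characteristic starts on the
    initial line (x - c t > 0) copy u0; the others copy the inlet value injected
    at the earlier time t - x/c."""
    if c <= 0:
        raise ValueError("this sketch assumes c > 0, i.e. an inlet at x = 0")
    x = np.asarray(x, dtype=float)
    foot = x - c * t
    from_initial = foot > 0
    # Evaluate each branch only on arguments inside its domain of validity, then select.
    from_u0 = u0(np.where(from_initial, foot, 0.0))
    from_g = g(np.where(from_initial, 0.0, t - x / c))
    return np.where(from_initial, from_u0, from_g)

# Usage: a Gaussian hump leaving the domain while a sine signal enters at x = 0.
x = np.linspace(0.0, 4.0, 9)
u = advection_with_inlet(x, t=1.0, c=2.0,
                         u0=lambda s: np.exp(-(s - 3.0) ** 2),
                         g=lambda tau: np.sin(np.pi * tau))
print(np.round(u, 3))
```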
If the advection speed $c$ varies in $x$ and $t$, then the characteristics are not necessarily straight lines of constant slope, but are in general curves. Since the slopes of the curves vary, characteristic lines may intersect. These intersection points are places where the solution is discontinuous, with sharp jumps. At these locations the slopes are infinite and space and time derivatives become meaningless, i.e. the PDE is not valid anymore. This breakdown usually occurs because of the neglect of important physical terms, such as dissipative terms, that act to prevent truly discontinuous solutions.

An example of an equation that can lead to discontinuous solutions is the Burgers equation:

$$u_t + u u_x = 0 \tag{2.53}$$

where $c = u$. This equation is nonlinear, with a variable advection speed that depends on the solution. The characteristics are given by the lines

$$\frac{dx}{dt} = u \tag{2.54}$$

along which the PDE takes the form $du = 0$. Hence $u$ is constant along characteristics, which means that their slopes are also constant according to equation (2.54), and hence they must be straight lines. Even in this nonlinear case the characteristics are straight lines, but with varying slopes. The behavior of the solution can become quite complicated as characteristic lines intersect, as shown in figure 2.4. The solution of hyperbolic equations in the presence of discontinuities is a delicate subject; we refer the interested reader to Whitham (1974) and Durran (1999) for further discussion.

2.3.2 System of Equations in one space dimension

A system of PDEs in one space dimension of the form

$$\frac{\partial \mathbf{w}}{\partial t} + \mathbf{A}\frac{\partial \mathbf{w}}{\partial x} = 0, \tag{2.55}$$

where $\mathbf{A}$ is the so-called Jacobian matrix, is said to be hyperbolic if the matrix $\mathbf{A}$ has real eigenvalues and a complete set of linearly independent eigenvectors. For then one can find a bounded matrix $\mathbf{T}$, whose columns are the eigenvectors of $\mathbf{A}$, such that the matrix $\mathbf{D} = \mathbf{T}^{-1}\mathbf{A}\mathbf{T}$ is diagonal.
Reminder: a diagonal matrix is one where all entries are zero save for those on the main diagonal; in the case above the diagonal entries are the eigenvalues of the matrix. The system can then be uncoupled by defining the auxiliary variables $\hat{\mathbf{w}}$ through $\mathbf{w} = \mathbf{T}\hat{\mathbf{w}}$; replacing in the original equation we get

$$\frac{\partial \hat{\mathbf{w}}}{\partial t} + \mathbf{D}\frac{\partial \hat{\mathbf{w}}}{\partial x} = 0 \quad\Longleftrightarrow\quad \frac{\partial \hat{w}_i}{\partial t} + \lambda_i\frac{\partial \hat{w}_i}{\partial x} = 0 \tag{2.56}$$

The equation on the left is written in vector form, whereas the component form on the right shows the uncoupling of the equations. The component form clearly shows how the system is hyperbolic and analogous to the scalar form.

Example 5: The linearized equations governing tidal flow in a channel of constant cross section and depth are the shallow water equations:

$$\frac{\partial}{\partial t}\begin{pmatrix} u \\ \eta \end{pmatrix} + \begin{pmatrix} 0 & g \\ h & 0 \end{pmatrix}\frac{\partial}{\partial x}\begin{pmatrix} u \\ \eta \end{pmatrix} = 0, \qquad \mathbf{A} = \begin{pmatrix} 0 & g \\ h & 0 \end{pmatrix} \tag{2.57}$$

where $u$ and $\eta$ are the unknown velocity and surface elevation, $g$ is the gravitational acceleration, and $h$ the water depth. The eigenvalues of the matrix $\mathbf{A}$ can be found by solving the equation:

$$\det(\mathbf{A} - \lambda\mathbf{I}) = 0 \;\Longleftrightarrow\; \det\begin{pmatrix} -\lambda & g \\ h & -\lambda \end{pmatrix} = \lambda^2 - gh = 0. \tag{2.58}$$

The two real roots of the equation are $\lambda = \pm c$, where $c = \sqrt{gh}$. Hence the eigenvalues are the familiar gravity wave speeds. Two eigenvectors of the system are $\mathbf{u}_1$ and $\mathbf{u}_2$, corresponding to the positive and negative roots, respectively:

$$\mathbf{u}_1 = \begin{pmatrix} 1 \\ c/g \end{pmatrix}, \qquad \mathbf{u}_2 = \begin{pmatrix} 1 \\ -c/g \end{pmatrix}. \tag{2.59}$$

The eigenvectors are the columns of the transformation matrix $\mathbf{T}$, and we have

$$\mathbf{T} = \begin{pmatrix} 1 & 1 \\ c/g & -c/g \end{pmatrix}, \qquad \mathbf{T}^{-1} = \frac{1}{2}\begin{pmatrix} 1 & g/c \\ 1 & -g/c \end{pmatrix}. \tag{2.60}$$

It is easy to verify that

$$\mathbf{D} = \mathbf{T}^{-1}\mathbf{A}\mathbf{T} = \begin{pmatrix} c & 0 \\ 0 & -c \end{pmatrix}, \tag{2.61}$$

as expected.

Figure 2.5: Characteristics for the one-dimensional tidal equations. The new variables $\hat{u}$ and $\hat{\eta}$ are constant along the right- and left-going characteristics, respectively. The solution at the point (x, t) can be computed by finding the feet of the two characteristic curves at the initial time, $x_a$ and $x_b$, and applying the conservation of $\hat{u}$ and $\hat{\eta}$.
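The eigen-decomposition (2.59)-(2.61) is easy to confirm numerically. The sketch below builds $\mathbf{A}$, $\mathbf{T}$ and $\mathbf{T}^{-1}$ for the illustrative values $g = 9.81$ m s$^{-2}$ and $h = 4000$ m (assumed values, not taken from the notes), checks that $\mathbf{T}^{-1}\mathbf{A}\mathbf{T}$ is diagonal, and cross-checks the eigenvalues against NumPy's general eigenvalue solver `numpy.linalg.eigvals`. The helper name is hypothetical.

```python
import numpy as np

def shallow_water_eigensystem(g, h):
    """Return A, T and T^{-1} of eqs. (2.57) and (2.60) for the 1D linearized shallow water system."""
    c = np.sqrt(g * h)
    A = np.array([[0.0, g], [h, 0.0]])
    T = np.array([[1.0, 1.0], [c / g, -c / g]])          # columns: eigenvectors for +c and -c
    T_inv = 0.5 * np.array([[1.0, g / c], [1.0, -g / c]])
    return A, T, T_inv

g, h = 9.81, 4000.0                                      # illustrative open-ocean depth in metres
A, T, T_inv = shallow_water_eigensystem(g, h)
D = T_inv @ A @ T
print(np.round(D, 10))                                   # diag(c, -c), c = sqrt(g h), about 198 m/s
print(np.sort(np.linalg.eigvals(A).real))                # cross-check with a general eigen-solver
```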
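The new variables are given by

$$\begin{pmatrix} \hat{u} \\ \hat{\eta} \end{pmatrix} = \mathbf{T}^{-1}\begin{pmatrix} u \\ \eta \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & g/c \\ 1 & -g/c \end{pmatrix}\begin{pmatrix} u \\ \eta \end{pmatrix} = \frac{1}{2}\begin{pmatrix} u + \dfrac{g\eta}{c} \\[4pt] u - \dfrac{g\eta}{c} \end{pmatrix} \tag{2.62}$$

The equations written in the new variables take the component form:

$$\begin{aligned} \frac{\partial\hat{u}}{\partial t} + c\frac{\partial\hat{u}}{\partial x} &= 0 \\ \frac{\partial\hat{\eta}}{\partial t} - c\frac{\partial\hat{\eta}}{\partial x} &= 0 \end{aligned} \quad\Longleftrightarrow\quad \begin{aligned} \hat{u} &= \text{constant along } \frac{dx}{dt} = c \\ \hat{\eta} &= \text{constant along } \frac{dx}{dt} = -c \end{aligned} \tag{2.63}$$

To illustrate how the characteristic information can be used to determine the solution at any time, given the initial conditions, we consider the case of an infinite domain and calculate the intersections of the two characteristic curves meeting at $(x, t)$ with the axis $t = 0$. If we label the two coordinates $x_a = x - ct$ and $x_b = x + ct$, as shown in figure 2.5, and denote initial values there by the subscripts $a$ and $b$, then we can set up the two equations:

$$\begin{aligned} u + \frac{g\eta}{c} &= u_a + \frac{g\eta_a}{c} \\ u - \frac{g\eta}{c} &= u_b - \frac{g\eta_b}{c} \end{aligned} \quad\Longleftrightarrow\quad \begin{aligned} u &= \frac{u_a + u_b}{2} + g\,\frac{\eta_a - \eta_b}{2c} \\ \eta &= c\,\frac{u_a - u_b}{2g} + \frac{\eta_a + \eta_b}{2} \end{aligned} \tag{2.64}$$

2.3.3 System of equations in multi-space dimensions

A system of PDEs in two space dimensions can be written in the form:

$$\frac{\partial \mathbf{w}}{\partial t} + \mathbf{A}\frac{\partial \mathbf{w}}{\partial x} + \mathbf{B}\frac{\partial \mathbf{w}}{\partial y} = 0 \tag{2.65}$$

In general it is not possible to define a similarity transformation that would diagonalize all matrices at once. To extend the definition of hyperbolicity to multiple space dimensions, the following procedure can be used for the case where $\mathbf{A}$ and $\mathbf{B}$ are constant matrices. First we look for Fourier (plane wave) solutions of the form

$$\mathbf{w} = \hat{\mathbf{w}}\,e^{i(kx + ly - \omega t)} \tag{2.66}$$

where $\hat{\mathbf{w}}$ can be interpreted as the vector of Fourier amplitudes, $k$ and $l$ are the wavenumbers in the $x$ and $y$ directions respectively, and $\omega$ is the frequency. This Fourier representation is then substituted in the partial differential equation to obtain:

$$\left[k\mathbf{A} + l\mathbf{B} - \omega\mathbf{I}\right]\hat{\mathbf{w}} = 0, \tag{2.67}$$

where $\mathbf{I}$ is the identity matrix. Equation (2.67) has the form of an eigenvalue problem, where $\omega$ represents the eigenvalues of the matrix $k\mathbf{A} + l\mathbf{B}$. The system is classified as hyperbolic if and only if the eigenvalues $\omega$ are real for real choices of the wavenumber vector $(k, l)$.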
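The multi-dimensional criterion can be tested numerically by sampling wavenumber directions and checking that the eigenvalues $\omega$ of $k\mathbf{A} + l\mathbf{B}$ are real. Since scaling $(k, l)$ by a positive factor scales $\omega$ by the same factor, unit vectors $(\cos\theta, \sin\theta)$ suffice. The sketch is a numerical check rather than a proof, and the sampling density and tolerance are arbitrary choices. As a hyperbolic example it uses the two-dimensional linearized shallow water equations for $\mathbf{w} = (u, v, \eta)$, namely $u_t + g\eta_x = 0$, $v_t + g\eta_y = 0$, $\eta_t + h(u_x + v_y) = 0$; this extension of Example 5 is our assumption, not given in the notes, and its eigenvalues are $0$ and $\pm\sqrt{gh(k^2 + l^2)}$. As a counter-example it uses system (2.43) of Example 4 with $y$ playing the role of time, $\mathbf{A} = \begin{pmatrix} 0 & 1 \\ \gamma & 0 \end{pmatrix}$, whose eigenvalues $\pm\sqrt{\gamma}$ are imaginary when $\gamma < 0$. The function names are hypothetical.

```python
import numpy as np

def is_hyperbolic_2d(A, B, n_directions=360, tol=1e-10):
    """Numerically check that k*A + l*B has only real eigenvalues for sampled unit (k, l)."""
    for theta in np.linspace(0.0, 2.0 * np.pi, n_directions, endpoint=False):
        k, l = np.cos(theta), np.sin(theta)
        omega = np.linalg.eigvals(k * A + l * B)
        if np.max(np.abs(omega.imag)) > tol:
            return False
    return True

# 2D linearized shallow water, w = (u, v, eta): an assumed example.
g, h = 9.81, 10.0
A_sw = np.array([[0.0, 0.0, g], [0.0, 0.0, 0.0], [h, 0.0, 0.0]])
B_sw = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, g], [0.0, h, 0.0]])
print(is_hyperbolic_2d(A_sw, B_sw))                    # True

def example4_matrix(gamma):
    """System (2.43) written as w_y + A w_x = 0 with w = (u, v)."""
    return np.array([[0.0, 1.0], [gamma, 0.0]])

zero = np.zeros((2, 2))
print(is_hyperbolic_2d(example4_matrix(1.0), zero))    # gamma > 0: True
print(is_hyperbolic_2d(example4_matrix(-1.0), zero))   # gamma < 0 (elliptic): False
```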
The extension above hinges on the matrices $\mathbf{A}$ and $\mathbf{B}$ being constant in space and time; in spite of this limitation, it shows intuitively how the general behavior of wave-like solutions can be extended to multiple spatial dimensions. For the case where the matrices are not constant, the definition can be extended by requiring the existence of bounded matrices $\mathbf{T}$ such that the matrix $\mathbf{T}^{-1}(k\mathbf{A} + l\mathbf{B})\mathbf{T}$ is a diagonal matrix with real eigenvalues for all points within a neighborhood of $(x, y)$.

Chapter 3

Finite Difference Approximation of Derivatives

3.1 Introduction

The standard definition of the derivative in elementary calculus is the following:

$$u'(x) = \lim_{\Delta x \to 0}\frac{u(x + \Delta x) - u(x)}{\Delta x} \tag{3.1}$$

Computers, however, cannot deal with the limit $\Delta x \to 0$, and hence a discrete analogue of the continuous case needs to be adopted. In a discretization step, the set of points on which the function is defined is finite, and the function value is available on a discrete set of points. Approximations to the derivative will have to come from this discrete table of the function.

Figure 3.1 shows the discrete set of points $x_i$ where the function is known. We will use the notation $u_i = u(x_i)$ to denote the value of the function at the $i$-th node of the computational grid. The nodes divide the axis into a set of intervals of width $\Delta x_i = x_{i+1} - x_i$. When the grid spacing is fixed, i.e. all intervals are of equal size, we will refer to the grid spacing as $\Delta x$. There are definite advantages to a constant grid spacing, as we will see later.

3.2 Finite Difference Approximation

The definition of the derivative in the continuum can be used to approximate the derivative in the discrete case:

$$u'(x_i) \approx \frac{u(x_i + \Delta x) - u(x_i)}{\Delta x} = \frac{u_{i+1} - u_i}{\Delta x} \tag{3.2}$$

where now $\Delta x$ is finite and small but not necessarily infinitesimally small. This is known as a forward Euler approximation since it uses forward differencing. Intuitively, the approximation will improve, i.e.
the error will be smaller, as $\Delta x$ is made smaller.

Figure 3.1: Computational grid and example of backward, forward, and central approximation to the derivative at point $x_i$. The dash-dot line shows the centered parabolic interpolation, while the dashed lines show the backward (blue), forward (red) and centered (magenta) linear interpolation to the function.

The above is not the only approximation possible; two equally valid approximations are:

backward Euler:

$$u'(x_i) \approx \frac{u(x_i) - u(x_i - \Delta x)}{\Delta x} = \frac{u_i - u_{i-1}}{\Delta x} \tag{3.3}$$

Centered Difference:

$$u'(x_i) \approx \frac{u(x_i + \Delta x) - u(x_i - \Delta x)}{2\Delta x} = \frac{u_{i+1} - u_{i-1}}{2\Delta x} \tag{3.4}$$

All these definitions are equivalent in the continuum but lead to different approximations in the discrete case. The question becomes which one is better, and whether there is a way to quantify the error committed. The answer lies in the application of Taylor series analysis. We briefly describe Taylor series in the next section, before applying them to investigate the approximation errors of finite difference formulae.

3.3 Taylor series

Starting with the identity:

$$u(x) = u(x_i) + \int_{x_i}^{x} u'(s)\,ds \tag{3.5}$$

Since $u(x)$ is arbitrary, the formula should hold with $u(x)$ replaced by $u'(x)$, i.e.,

$$u'(x) = u'(x_i) + \int_{x_i}^{x} u''(s)\,ds \tag{3.6}$$

Replacing this expression in the original formula and carrying out the integration (since $u'(x_i)$ is constant) we get:

$$u(x) = u(x_i) + (x - x_i)u'(x_i) + \int_{x_i}^{x}\int_{x_i}^{x} u''(s)\,ds\,ds \tag{3.7}$$

The process can be repeated with

$$u''(x) = u''(x_i) + \int_{x_i}^{x} u'''(s)\,ds \tag{3.8}$$

to get:

$$u(x) = u(x_i) + (x - x_i)u'(x_i) + \frac{(x - x_i)^2}{2!}u''(x_i) + \int_{x_i}^{x}\int_{x_i}^{x}\int_{x_i}^{x} u'''(s)\,ds\,ds\,ds \tag{3.9}$$

This process can be repeated under the assumption that $u(x)$ is sufficiently differentiable, and we find:

$$u(x) = u(x_i) + (x - x_i)u'(x_i) + \frac{(x - x_i)^2}{2!}u''(x_i) + \cdots + \frac{(x - x_i)^n}{n!}u^{(n)}(x_i) + R_{n+1} \tag{3.10}$$

where the remainder is given by:

$$R_{n+1} = \int_{x_i}^{x}\cdots\int_{x_i}^{x} u^{(n+1)}(s)\,(ds)^{n+1} \tag{3.11}$$

Equation (3.10) is known as the Taylor series of the function $u(x)$ about the point $x_i$. Notice that the series is a polynomial in $(x - x_i)$ (the signed distance of $x$ to $x_i$), and the coefficients are the (scaled) derivatives of the function evaluated at $x_i$.

If the $(n+1)$-th derivative of the function $u$ has minimum $m$ and maximum $M$ over the interval $[x_i, x]$, then we can write:

$$\int_{x_i}^{x}\cdots\int_{x_i}^{x} m\,(ds)^{n+1} \le R_{n+1} \le \int_{x_i}^{x}\cdots\int_{x_i}^{x} M\,(ds)^{n+1} \tag{3.12}$$
$$m\,\frac{(x - x_i)^{n+1}}{(n+1)!} \le R_{n+1} \le M\,\frac{(x - x_i)^{n+1}}{(n+1)!} \tag{3.13}$$

which shows that the remainder is bounded by the values of the derivative and the distance of the point $x$ to the expansion point $x_i$ raised to the power $(n+1)$. If we further assume that $u^{(n+1)}$ is continuous, then it must take all values between $m$ and $M$, that is

$$R_{n+1} = u^{(n+1)}(\xi)\,\frac{(x - x_i)^{n+1}}{(n+1)!} \tag{3.14}$$

for some $\xi$ in the interval $[x_i, x]$.

3.3.1 Taylor series and finite differences

Taylor series have been widely used to study the behavior of numerical approximations to differential equations. Let us investigate the forward Euler approximation with Taylor series. To do so, we expand the function $u$ at $x_{i+1}$ about the point $x_i$:

$$u(x_i + \Delta x_i) = u(x_i) + \Delta x_i\left.\frac{\partial u}{\partial x}\right|_{x_i} + \frac{\Delta x_i^2}{2!}\left.\frac{\partial^2 u}{\partial x^2}\right|_{x_i} + \frac{\Delta x_i^3}{3!}\left.\frac{\partial^3 u}{\partial x^3}\right|_{x_i} + \cdots \tag{3.15}$$

The Taylor series can be rearranged to read as follows:

$$\frac{u(x_i + \Delta x_i) - u(x_i)}{\Delta x_i} - \left.\frac{\partial u}{\partial x}\right|_{x_i} = \underbrace{\frac{\Delta x_i}{2!}\left.\frac{\partial^2 u}{\partial x^2}\right|_{x_i} + \frac{\Delta x_i^2}{3!}\left.\frac{\partial^3 u}{\partial x^3}\right|_{x_i} + \cdots}_{\text{Truncation Error}} \tag{3.16}$$

where it is now clear that the forward Euler formula (3.2) corresponds to truncating the Taylor series after the second term. The right hand side of equation (3.16) is the error committed in terminating the series and is referred to as the truncation error. The truncation error can be defined as the difference between the partial derivative and its finite difference representation. For sufficiently smooth functions, i.e.
ones that possess continuous higher-order derivatives, and sufficiently small $\Delta x_i$, the first term in the series can be used to characterize the order of magnitude of the error. The first term in the truncation error is the product of the second derivative evaluated at $x_i$ and the grid spacing $\Delta x_i$: the former is a property of the function itself, while the latter is a numerical parameter which can be changed. Thus, for finite $\partial^2 u/\partial x^2$, the truncation error depends linearly on the parameter $\Delta x_i$. If we were to halve $\Delta x_i$, we ought to expect the error to be halved as well, for sufficiently small $\Delta x_i$. We will use the "big Oh" notation to refer to this behavior, so that T.E. $\sim O(\Delta x_i)$. In general, if $\Delta x_i$ is not constant we pick a representative value of the grid spacing, either the average or the largest grid spacing. Note that in general the exact truncation error is not known, and all we can do is characterize the behavior of the error as $\Delta x \to 0$. So now we can write:
$$\left.\frac{\partial u}{\partial x}\right|_{x_i} = \frac{u_{i+1} - u_i}{\Delta x_i} + O(\Delta x) \tag{3.17}$$
The Taylor series expansion can be used to get an expression for the truncation error of the backward difference formula:
$$u(x_i - \Delta x_{i-1}) = u(x_i) - \Delta x_{i-1}\left.\frac{\partial u}{\partial x}\right|_{x_i} + \frac{\Delta x_{i-1}^2}{2!}\left.\frac{\partial^2 u}{\partial x^2}\right|_{x_i} - \frac{\Delta x_{i-1}^3}{3!}\left.\frac{\partial^3 u}{\partial x^3}\right|_{x_i} + \cdots \tag{3.18}$$
where $\Delta x_{i-1} = x_i - x_{i-1}$. We can now get an expression for the error corresponding to the backward difference approximation of the first derivative:
$$\frac{u(x_i) - u(x_i - \Delta x_{i-1})}{\Delta x_{i-1}} - \left.\frac{\partial u}{\partial x}\right|_{x_i} = \underbrace{-\frac{\Delta x_{i-1}}{2!}\left.\frac{\partial^2 u}{\partial x^2}\right|_{x_i} + \frac{\Delta x_{i-1}^2}{3!}\left.\frac{\partial^3 u}{\partial x^3}\right|_{x_i} + \cdots}_{\text{Truncation Error}} \tag{3.19}$$
It is now clear that the truncation error of the backward difference, while not the same as that of the forward difference, behaves similarly in terms of order-of-magnitude analysis, and is linear in $\Delta x$:
$$\left.\frac{\partial u}{\partial x}\right|_{x_i} = \frac{u_i - u_{i-1}}{\Delta x_{i-1}} + O(\Delta x) \tag{3.20}$$
Notice that in both cases we have used the information provided at just two points to derive the approximation, and the error behaves linearly in both instances.
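A quick numerical check of this first-order behavior can be sketched as follows. The test function $u = \sin \pi x$, the point $x_0 = 0.3$, the spacing $\Delta x = 10^{-3}$ and the helper names are our own illustrative choices, not taken from the text. Halving $\Delta x$ should roughly halve the error, so the estimated exponent $p = \log_2(e_{\Delta x}/e_{\Delta x/2})$ should come out close to 1 for both formulas, and the forward-difference error should be close to the leading term $(\Delta x/2)\,u_{xx}$ of (3.16).

```python
import numpy as np

def forward_diff(u, x, dx):
    # Forward difference, eq. (3.17): (u_{i+1} - u_i) / dx
    return (u(x + dx) - u(x)) / dx

def backward_diff(u, x, dx):
    # Backward difference, eq. (3.20): (u_i - u_{i-1}) / dx
    return (u(x) - u(x - dx)) / dx

def observed_order(scheme, u, du, x, dx):
    # If the error behaves like C*dx**p, halving dx divides it by 2**p
    e1 = abs(scheme(u, x, dx) - du(x))
    e2 = abs(scheme(u, x, dx / 2) - du(x))
    return np.log2(e1 / e2)

u = lambda x: np.sin(np.pi * x)
du = lambda x: np.pi * np.cos(np.pi * x)
d2u = lambda x: -np.pi**2 * np.sin(np.pi * x)

x0, dx = 0.3, 1e-3
p_fwd = observed_order(forward_diff, u, du, x0, dx)
p_bwd = observed_order(backward_diff, u, du, x0, dx)

# Compare the actual forward-difference error with the leading term (dx/2) u_xx of (3.16)
err_fwd = forward_diff(u, x0, dx) - du(x0)
print(f"forward:  p = {p_fwd:.3f}, error = {err_fwd:.3e}, (dx/2) u_xx = {dx / 2 * d2u(x0):.3e}")
print(f"backward: p = {p_bwd:.3f}")
```

The `observed_order` helper can be reused unchanged for any of the higher-order formulas derived below.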
Higher-order approximations of the first derivative can be obtained by combining the two Taylor series, equations (3.15) and (3.18). Notice first that the high-order derivatives of the function $u$ are all evaluated at the same point $x_i$, and are the same in both expansions. We can now form a linear combination of the equations whereby the leading error term is made to vanish. In the present case this can be done by inspection of equations (3.16) and (3.19). Multiplying the first by $\Delta x_{i-1}$ and the second by $\Delta x_i$, adding both equations, and dividing by $\Delta x_i + \Delta x_{i-1}$, we get:
$$\frac{1}{\Delta x_i + \Delta x_{i-1}}\left[\Delta x_{i-1}\,\frac{u_{i+1} - u_i}{\Delta x_i} + \Delta x_i\,\frac{u_i - u_{i-1}}{\Delta x_{i-1}}\right] - \left.\frac{\partial u}{\partial x}\right|_{x_i} = \frac{\Delta x_{i-1}\Delta x_i}{3!}\left.\frac{\partial^3 u}{\partial x^3}\right|_{x_i} + \cdots \tag{3.21}$$
There are several points to note about the preceding expression. First, the approximation uses information about the function $u$ at three points: $x_{i-1}$, $x_i$ and $x_{i+1}$. Second, the truncation error is T.E. $\sim O(\Delta x_{i-1}\Delta x_i)$ and is second order; that is, if the grid spacing is decreased by a factor of 2, the T.E. decreases by a factor of $2^2$. Third, the previous point can be made clearer by focusing on the important case where the grid spacing is constant, $\Delta x_{i-1} = \Delta x_i = \Delta x$; the expression then simplifies to:
$$\frac{u_{i+1} - u_{i-1}}{2\Delta x} - \left.\frac{\partial u}{\partial x}\right|_{x_i} = \frac{\Delta x^2}{3!}\left.\frac{\partial^3 u}{\partial x^3}\right|_{x_i} + \cdots \tag{3.22}$$
Hence, for an equally spaced grid the centered difference approximation converges quadratically as $\Delta x \to 0$:
$$\left.\frac{\partial u}{\partial x}\right|_{x_i} = \frac{u_{i+1} - u_{i-1}}{2\Delta x} + O(\Delta x^2) \tag{3.23}$$
Note that, like the forward and backward Euler difference formulae, the centered difference uses information at only two points, but it delivers twice the order of the other two methods. This property will hold in general whenever the grid spacing is constant and the computational stencil, i.e. the set of points used in approximating the derivative, is symmetric.

3.3.2 Higher order approximation

The Taylor expansion provides a very useful tool for the derivation of higher-order approximations to derivatives of any order. There are several approaches to achieve this.
We will first look at an expedient one before elaborating on the more systematic one. In most of the following we will assume the grid spacing to be constant, as is usually the case in most applications.

Equation (3.22) provides us with the simplest way to derive a fourth-order approximation. An important property of this centered formula is that its truncation error contains only odd derivative terms:
$$\frac{u_{i+1} - u_{i-1}}{2\Delta x} = \frac{\partial u}{\partial x} + \frac{\Delta x^2}{3!}\frac{\partial^3 u}{\partial x^3} + \frac{\Delta x^4}{5!}\frac{\partial^5 u}{\partial x^5} + \frac{\Delta x^6}{7!}\frac{\partial^7 u}{\partial x^7} + \cdots + \frac{\Delta x^{2m}}{(2m+1)!}\frac{\partial^{2m+1} u}{\partial x^{2m+1}} + \cdots \tag{3.24}$$
The above formula can be applied with $\Delta x$ replaced by $2\Delta x$ and $3\Delta x$, respectively, to get:
$$\frac{u_{i+2} - u_{i-2}}{4\Delta x} = \frac{\partial u}{\partial x} + \frac{(2\Delta x)^2}{3!}\frac{\partial^3 u}{\partial x^3} + \frac{(2\Delta x)^4}{5!}\frac{\partial^5 u}{\partial x^5} + \frac{(2\Delta x)^6}{7!}\frac{\partial^7 u}{\partial x^7} + O(\Delta x^8) \tag{3.25}$$
$$\frac{u_{i+3} - u_{i-3}}{6\Delta x} = \frac{\partial u}{\partial x} + \frac{(3\Delta x)^2}{3!}\frac{\partial^3 u}{\partial x^3} + \frac{(3\Delta x)^4}{5!}\frac{\partial^5 u}{\partial x^5} + \frac{(3\Delta x)^6}{7!}\frac{\partial^7 u}{\partial x^7} + O(\Delta x^8) \tag{3.26}$$
It is now clear how to combine the different estimates to obtain a fourth-order approximation to the first derivative. Multiplying equation (3.24) by $2^2$, subtracting equation (3.25) from the result, and dividing by 3, we cancel the second-order error term to get:
$$\frac{8(u_{i+1} - u_{i-1}) - (u_{i+2} - u_{i-2})}{12\Delta x} = \frac{\partial u}{\partial x} - \frac{4\Delta x^4}{5!}\frac{\partial^5 u}{\partial x^5} - \frac{20\Delta x^6}{7!}\frac{\partial^7 u}{\partial x^7} + O(\Delta x^8) \tag{3.27}$$
Repeating the process with the factor $3^2$, subtracting equation (3.26), and dividing by 8, we get
$$\frac{27(u_{i+1} - u_{i-1}) - (u_{i+3} - u_{i-3})}{48\Delta x} = \frac{\partial u}{\partial x} - \frac{9\Delta x^4}{5!}\frac{\partial^5 u}{\partial x^5} - \frac{90\Delta x^6}{7!}\frac{\partial^7 u}{\partial x^7} + O(\Delta x^8) \tag{3.28}$$
Although both equations (3.27) and (3.28) are valid, the latter is not used in practice since it does not make sense to disregard neighboring points while using more distant ones. However, the expression is useful to derive a sixth-order approximation to the first derivative: multiply equation (3.27) by 9 and equation (3.28) by 4, subtract, and divide by 5 to get:
$$\frac{45(u_{i+1} - u_{i-1}) - 9(u_{i+2} - u_{i-2}) + (u_{i+3} - u_{i-3})}{60\Delta x} = \frac{\partial u}{\partial x} + \frac{36\Delta x^6}{7!}\frac{\partial^7 u}{\partial x^7} + O(\Delta x^8) \tag{3.29}$$
The process can be repeated to derive higher-order approximations.
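The three centered formulas (3.23), (3.27) and (3.29) can be compared in a short sketch. The test function $\sin \pi x$, the point $x_0 = 0.3$ and the spacing $h = 0.05$ are illustrative choices of ours; at this point none of the relevant derivatives $u_{xxx}$, $u^{(5)}$, $u^{(7)}$ vanish, so the leading error terms are active and the observed orders should be close to 2, 4 and 6.

```python
import numpy as np

def centered2(u, x, h):
    # Second-order centered difference, eq. (3.23)
    return (u(x + h) - u(x - h)) / (2 * h)

def centered4(u, x, h):
    # Fourth-order centered difference, eq. (3.27)
    return (8 * (u(x + h) - u(x - h)) - (u(x + 2 * h) - u(x - 2 * h))) / (12 * h)

def centered6(u, x, h):
    # Sixth-order centered difference, eq. (3.29)
    return (45 * (u(x + h) - u(x - h))
            - 9 * (u(x + 2 * h) - u(x - 2 * h))
            + (u(x + 3 * h) - u(x - 3 * h))) / (60 * h)

u = lambda x: np.sin(np.pi * x)
du = lambda x: np.pi * np.cos(np.pi * x)

x0, h = 0.3, 0.05
orders = {}
for name, scheme in [("2nd", centered2), ("4th", centered4), ("6th", centered6)]:
    e_h = abs(scheme(u, x0, h) - du(x0))
    e_h2 = abs(scheme(u, x0, h / 2) - du(x0))
    orders[name] = np.log2(e_h / e_h2)
    print(f"{name}: error(h) = {e_h:.2e}, observed order = {orders[name]:.2f}")
```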
3.3.3 Remarks

The validity of the Taylor series analysis of the truncation error depends on the existence of higher-order derivatives. If these derivatives do not exist, then the higher-order approximations cannot be expected to hold. To demonstrate the issue more clearly we will look at specific examples.

Example 6. The function $u(x) = \sin \pi x$ is infinitely smooth and differentiable, and its first derivative is given by $u_x = \pi \cos \pi x$. Given the smoothness of the function, we expect the Taylor series analysis of the truncation error to hold. We set about verifying this claim in a practical calculation. We lay down a computational grid on the interval $-1 \le x \le 1$ of constant grid spacing $\Delta x = 2/M$. The approximation points are then $x_i = i\Delta x - 1$, $i = 0, 1, \ldots, M$.

Figure 3.2: Finite difference approximation to the derivative of the function $\sin \pi x$. The top left panel shows the function as a function of $x$. The top right panel shows the spatial distribution of the error using the forward difference (black line), the backward difference (red line), and the centered differences of various orders (magenta lines) for the case $M = 1024$; the centered difference curves lie atop each other because their errors are much smaller than those of the first-order schemes. The lower panels are convergence curves showing the rate of decrease of the rms and maximum errors as the number of grid cells increases.
Let $\epsilon$ be the error between the finite difference approximation to the first derivative, $\tilde{u}_x$, and its analytical value $u_x$:
$$\epsilon_i = \tilde{u}_x(x_i) - u_x(x_i) \tag{3.30}$$
The numerical approximation $\tilde{u}_x$ will be computed using the forward difference, equation (3.17), the backward difference, equation (3.20), and the centered difference approximations of order 2, 4 and 6, equations (3.22), (3.27), and (3.29). We will use two measures to characterize the error $\epsilon_i$ and to measure its rate of decrease as the number of grid points is increased. One is a bulk measure and consists of the root mean square error; the other consists of the maximum error magnitude. We will use the following notations for the rms and max errors:
$$\|\epsilon\|_2 = \left(\Delta x \sum_{i=0}^{M} \epsilon_i^2\right)^{1/2} \tag{3.31}$$
$$\|\epsilon\|_\infty = \max_{0 \le i \le M} |\epsilon_i| \tag{3.32}$$
The top right panel of figure 3.2 shows the variation of $\epsilon$ as a function of $x$ for the case $M = 1024$ for several finite difference approximations to $u_x$. For the first-order schemes the errors peak at $x = \pm 1/2$ and reach 0.01. The error is much smaller for the higher-order centered difference schemes. The lower panels of figure 3.2 show the decrease of the rms error ($\|\epsilon\|_2$, on the left) and of the maximum error ($\|\epsilon\|_\infty$, on the right) as a function of the number of cells $M$. It is seen that the convergence rate increases with the order of the approximation, as predicted by the Taylor series analysis. The slopes on this log-log plot are $-1$ for the forward and backward differences, and $-2$, $-4$ and $-6$ for the centered difference schemes of order 2, 4 and 6, respectively. Notice that the maximum error decreases at the same rate as the rms error, even though it reports a higher error. Finally, if one were to gauge the efficiency of using information most accurately, it is evident that for a given $M$ the high-order methods achieve the lowest error.
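A minimal sketch of the Example 6 experiment, computing the norms (3.31) and (3.32) for the five formulas and estimating the slopes from two grid sizes. One assumption of ours is loudly flagged here: the text does not say how the stencils are closed near $x = \pm 1$, so this sketch simply evaluates $u$ analytically at the points that fall outside the interval. The grid sizes $M = 64$ and $128$ are also our choice.

```python
import numpy as np

def schemes(u, x, h):
    """Approximations to u'(x) from the five formulas used in Example 6.

    ASSUMPTION: u is evaluated analytically at x +/- k*h, so points near
    x = +/-1 use values just outside [-1, 1] instead of one-sided formulas.
    """
    d1 = u(x + h) - u(x - h)
    d2 = u(x + 2 * h) - u(x - 2 * h)
    d3 = u(x + 3 * h) - u(x - 3 * h)
    return {
        "forward": (u(x + h) - u(x)) / h,                 # (3.17)
        "backward": (u(x) - u(x - h)) / h,                # (3.20)
        "centered2": d1 / (2 * h),                        # (3.23)
        "centered4": (8 * d1 - d2) / (12 * h),            # (3.27)
        "centered6": (45 * d1 - 9 * d2 + d3) / (60 * h),  # (3.29)
    }

def error_norms(M, u, du):
    """rms (3.31) and max (3.32) errors on the grid x_i = i*dx - 1, dx = 2/M."""
    dx = 2.0 / M
    x = np.arange(M + 1) * dx - 1.0
    out = {}
    for name, approx in schemes(u, x, dx).items():
        eps = approx - du(x)
        out[name] = (np.sqrt(dx * np.sum(eps**2)), np.max(np.abs(eps)))
    return out

u = lambda x: np.sin(np.pi * x)
du = lambda x: np.pi * np.cos(np.pi * x)

coarse, fine = error_norms(64, u, du), error_norms(128, u, du)
rates = {}
for name in coarse:
    r2 = np.log2(coarse[name][0] / fine[name][0])
    rinf = np.log2(coarse[name][1] / fine[name][1])
    rates[name] = (r2, rinf)
    print(f"{name:10s} slope(rms) = {r2:5.2f}   slope(max) = {rinf:5.2f}")
```

The printed slopes correspond (with the sign flipped) to the slopes of the convergence curves in the lower panels of figure 3.2.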
Example 7. We now investigate the numerical approximation to a function with finite differentiability, more precisely, one that has a discontinuous third derivative. This function and its first three derivatives are defined as follows:
$$\begin{array}{c|cccc}
 & u(x) & u_x(x) & u_{xx}(x) & u_{xxx}(x) \\ \hline
x < 0 & \sin \pi x & \pi \cos \pi x & -\pi^2 \sin \pi x & -\pi^3 \cos \pi x \\
0 < x & \pi x e^{-x^2} & \pi (1 - 2x^2) e^{-x^2} & 2\pi x (2x^2 - 3) e^{-x^2} & -2\pi (3 - 12x^2 + 4x^4) e^{-x^2} \\
x = 0 & 0 & \pi & 0 & -\pi^3 \text{ (left)},\ -6\pi \text{ (right)}
\end{array}$$
Notice that the function and its first two derivatives are continuous at $x = 0$, but the third derivative is discontinuous. An examination of the graph of the function in figure 3.3 shows a smooth-looking curve, at least visually (the so-called eye-ball norm). The error distribution is shown in the top right panel of figure 3.3 for the case $M = 1024$ and the fourth-order centered difference scheme. Notice that the error is very small except for the spike near the discontinuity. The error curves (in the lower panels) show that the second-order centered difference converges faster than the forward and backward Euler schemes, but that the convergence rates of the fourth- and sixth-order centered schemes are no better than that of the second-order one.

Figure 3.3: Finite difference approximation to the derivative of a function with discontinuous third derivative. The top left panel shows the function $u(x)$ which, to the eyeball norm, appears to be quite smooth. The top right panel shows the spatial distribution of the error ($M = 1024$) using the fourth-order centered difference: notice the spike at the discontinuity in the derivative. The lower panels are convergence curves showing the rate of decrease of the rms and maximum errors as the number of grid cells increases.
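The loss of order can be reproduced in a few lines. This sketch applies the fourth-order formula (3.27) to the Example 7 function and estimates the convergence rate of the maximum error between $M = 256$ and $M = 512$ (grid sizes of our choosing). As in the Example 6 sketch, we assume $u$ is evaluated analytically at stencil points outside $[-1, 1]$. Since $M$ is even, $x = 0$ is a grid point, and the points whose stencils straddle the jump in $u_{xxx}$ carry an $O(\Delta x^2)$ error that dominates the maximum norm.

```python
import numpy as np

def u(x):
    # Example 7: sin(pi x) for x < 0 and pi x exp(-x^2) for x >= 0; u, u_x, u_xx
    # are continuous at x = 0, but u_xxx jumps from -pi^3 to -6 pi
    return np.where(x < 0, np.sin(np.pi * x), np.pi * x * np.exp(-x**2))

def du(x):
    return np.where(x < 0, np.pi * np.cos(np.pi * x),
                    np.pi * (1 - 2 * x**2) * np.exp(-x**2))

def centered4(x, h):
    # Fourth-order centered difference, eq. (3.27)
    return (8 * (u(x + h) - u(x - h)) - (u(x + 2 * h) - u(x - 2 * h))) / (12 * h)

def max_error(M):
    dx = 2.0 / M
    x = np.arange(M + 1) * dx - 1.0        # M even, so x = 0 is a grid point
    return np.max(np.abs(centered4(x, dx) - du(x)))

e256, e512 = max_error(256), max_error(512)
p_max = np.log2(e256 / e512)
print(f"max error: {e256:.2e} -> {e512:.2e}, observed order {p_max:.2f}")
```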
The loss of order is a direct consequence of the discontinuity in the third derivative, whereby the Taylor expansion is valid only up to the third term. The effects of the discontinuity are more clearly seen in the maximum error plot (lower right panel) than in the mean error one (lower left panel). The main message of this example is that for functions with a finite number of derivatives, the Taylor series prediction for the high-order schemes does not hold. Notice that the errors of the fourth- and sixth-order schemes are lower than those of the other three, but their rate of convergence is the same as that of the second-order scheme. The lower error level is largely coincidental and would change with the function.

3.3.4 Systematic Derivation of higher order derivative

The Taylor series expansion provides a systematic way of deriving approximations to derivatives of any order (provided, of course, that the function is smooth enough). Here we assume that the grid spacing is uniform for simplicity. Suppose that the chosen stencil includes the points $x_j$ such that $i - l \le j \le i + r$. There are thus $l$ points to the left and $r$ points to the right of the point $i$ where the derivative is desired, for a total of $r + l + 1$ points. The Taylor expansion is:
$$u_{i+m} = u_i + \frac{(m\Delta x)}{1!}u_x + \frac{(m\Delta x)^2}{2!}u_{xx} + \frac{(m\Delta x)^3}{3!}u_{xxx} + \frac{(m\Delta x)^4}{4!}u_{xxxx} + \frac{(m\Delta x)^5}{5!}u_{xxxxx} + \cdots \tag{3.33}$$
for $m = -l, \ldots, r$. Multiplying each of these expansions by a constant $a_m$ and summing them up, we obtain the following equation:
$$\begin{aligned}
\sum_{\substack{m=-l\\ m\neq 0}}^{r} a_m u_{i+m} - \Big(\sum_{\substack{m=-l\\ m\neq 0}}^{r} a_m\Big) u_i
&= \Big(\sum_{\substack{m=-l\\ m\neq 0}}^{r} m\,a_m\Big) \frac{\Delta x}{1!}\left.\frac{\partial u}{\partial x}\right|_i
 + \Big(\sum_{\substack{m=-l\\ m\neq 0}}^{r} m^2 a_m\Big) \frac{\Delta x^2}{2!}\left.\frac{\partial^2 u}{\partial x^2}\right|_i \\
&\quad + \Big(\sum_{\substack{m=-l\\ m\neq 0}}^{r} m^3 a_m\Big) \frac{\Delta x^3}{3!}\left.\frac{\partial^3 u}{\partial x^3}\right|_i
 + \Big(\sum_{\substack{m=-l\\ m\neq 0}}^{r} m^4 a_m\Big) \frac{\Delta x^4}{4!}\left.\frac{\partial^4 u}{\partial x^4}\right|_i \\
&\quad + \Big(\sum_{\substack{m=-l\\ m\neq 0}}^{r} m^5 a_m\Big) \frac{\Delta x^5}{5!}\left.\frac{\partial^5 u}{\partial x^5}\right|_i + \cdots
\end{aligned} \tag{3.34}$$
It is clear that the coefficient of the $k$-th derivative is given by $b_k = \sum_{m=-l,\, m\neq 0}^{r} m^k a_m$. Equation (3.34) allows us to determine the $r + l$ coefficients $a_m$ according to the derivative desired and the order desired.
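Hence, if the first-order derivative is needed at fourth-order accuracy, we would set $b_1$ to 1 and $b_{2,3,4} = 0$. This would provide us with four equations, and hence we need at least four points (besides $x_i$) in order to determine the solution uniquely. More generally, if we need the $k$-th derivative to order $p$, the Taylor terms must be matched up to the derivative of order $k + p - 1$ (the first neglected term involves $u^{(k+p)}$), and hence $k + p - 1$ points besides $x_i$ are needed. The equations will then have the form:
$$b_q = \sum_{\substack{m=-l\\ m\neq 0}}^{r} m^q a_m = \delta_{qk}, \qquad q = 1, 2, \ldots, k + p - 1 \tag{3.35}$$
where $\delta_{qk}$ is the Kronecker delta: $\delta_{qk} = 1$ if $q = k$ and 0 otherwise. For the solution to exist and be unique we must have $l + r = k + p - 1$. With the normalization $b_k = 1$, the approximation reads $u^{(k)}_i \approx \frac{k!}{\Delta x^k}\sum_{m\neq 0} a_m (u_{i+m} - u_i)$. Once the solution is obtained, we can determine the leading-order truncation term by calculating the coefficient multiplying the next higher derivative in the truncation error series:
$$b_{k+p} = \sum_{\substack{m=-l\\ m\neq 0}}^{r} m^{k+p} a_m \tag{3.36}$$
so that the leading error of the approximation is $\frac{k!\,b_{k+p}}{(k+p)!}\Delta x^{p}\,u^{(k+p)}_i$.

Example 8. As an example of the application of the previous procedure, let us fix the stencil to $r = 1$ and $l = 3$. Notice that this is an off-centered scheme. The system of equations then reads as follows in matrix form:
$$\begin{pmatrix} -3 & -2 & -1 & 1 \\ (-3)^2 & (-2)^2 & (-1)^2 & (1)^2 \\ (-3)^3 & (-2)^3 & (-1)^3 & (1)^3 \\ (-3)^4 & (-2)^4 & (-1)^4 & (1)^4 \end{pmatrix}
\begin{pmatrix} a_{-3} \\ a_{-2} \\ a_{-1} \\ a_{1} \end{pmatrix} =
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix} \tag{3.37}$$
If the first derivative is desired to fourth-order accuracy, we would set $b_1 = 1$ and $b_{2,3,4} = 0$, while if the second derivative is required to third-order accuracy we would set $b_{1,3,4} = 0$ and $b_2 = 1$. The coefficients for the first example would be:
$$\begin{pmatrix} a_{-3} \\ a_{-2} \\ a_{-1} \\ a_{1} \end{pmatrix} = \frac{1}{12}\begin{pmatrix} -1 \\ 6 \\ -18 \\ 3 \end{pmatrix} \tag{3.38}$$
which corresponds to $u_x|_i \approx (-u_{i-3} + 6u_{i-2} - 18u_{i-1} + 10u_i + 3u_{i+1})/(12\Delta x)$, the coefficient of $u_i$ being $-\sum_{m\neq 0} a_m$.

3.3.5 Discrete Operator

Operators are often used to describe the discrete transformations needed in approximating derivatives. This reduces the length of formulae and can be used to derive new approximations. We will limit ourselves to the case of the centered difference operator:
$$\delta_{nx} u_i = \frac{u_{i+\frac{n}{2}} - u_{i-\frac{n}{2}}}{n\Delta x} \tag{3.39}$$
$$\delta_x u_i = \frac{u_{i+\frac{1}{2}} - u_{i-\frac{1}{2}}}{\Delta x} = u_x + O(\Delta x^2) \tag{3.40}$$
$$\delta_{2x} u_i = \frac{u_{i+1} - u_{i-1}}{2\Delta x} = u_x + O(\Delta x^2) \tag{3.41}$$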
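Returning to the systematic derivation of Section 3.3.4, the linear system (3.35) is small enough to be solved numerically for any stencil. The sketch below builds the matrix of (3.37) from a list of offsets, solves for the weights $a_m$, and returns the error coefficient $b_{k+p}$ of (3.36); the function name `fd_weights` and the test point are our own. For Example 8 it should reproduce the weights of (3.38) and give $b_5 = 6$, i.e. a leading error of $\frac{6}{5!}\Delta x^4 u^{(5)} = \frac{\Delta x^4}{20}u^{(5)}$.

```python
import numpy as np

def fd_weights(offsets, k):
    """Solve eq. (3.35) for the weights a_m on the stencil offsets m (m != 0).

    Row q (q = 1, ..., n) imposes sum_m m**q a_m = delta_{qk}. With n = len(offsets)
    equations, the approximation
        u^(k)_i ~ k!/dx**k * sum_m a_m (u_{i+m} - u_i)
    has order p = n + 1 - k. Also returns b_{k+p} from eq. (3.36).
    """
    m = np.asarray(offsets, dtype=float)
    n = len(m)
    V = np.array([m**q for q in range(1, n + 1)])   # matrix of eq. (3.37)
    rhs = np.zeros(n)
    rhs[k - 1] = 1.0
    a = np.linalg.solve(V, rhs)
    b_next = np.sum(m**(n + 1) * a)                 # q = k + p = n + 1
    return a, b_next

offsets = [-3, -2, -1, 1]               # Example 8: l = 3, r = 1
a, b5 = fd_weights(offsets, k=1)
print("12 * a =", 12 * a)               # analytically (-1, 6, -18, 3)
print("b_5 =", b5)                      # leading error: b_5 dx^4 / 5! * u^(5)

# Check the fourth-order behavior on u = sin(pi x) at an arbitrary point x0 = 0.3
u = lambda x: np.sin(np.pi * x)
x0, exact = 0.3, np.pi * np.cos(0.3 * np.pi)

def approx(h):
    return sum(am * (u(x0 + mm * h) - u(x0)) for am, mm in zip(a, offsets)) / h

p_obs = np.log2(abs(approx(0.01) - exact) / abs(approx(0.005) - exact))
print("observed order:", p_obs)
```

Calling `fd_weights(offsets, k=2)` on the same stencil gives the third-order second-derivative weights mentioned in Example 8, to be scaled by $2!/\Delta x^2$.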
The second-order derivative can be computed by noticing that
$$\delta_x^2 u_i = \delta_x(\delta_x u_i) = \delta_x\big(u_x + O(\Delta x^2)\big) \tag{3.42}$$
$$\delta_x\left(\frac{u_{i+\frac{1}{2}} - u_{i-\frac{1}{2}}}{\Delta x}\right) = u_{xx} + O(\Delta x^2) \tag{3.43}$$
$$\frac{1}{\Delta x}\left[\delta_x\big(u_{i+\frac{1}{2}}\big) - \delta_x\big(u_{i-\frac{1}{2}}\big)\right] = u_{xx} + O(\Delta x^2) \tag{3.44}$$
$$\frac{u_{i+1} - 2u_i + u_{i-1}}{\Delta x^2} = u_{xx} + O(\Delta x^2) \tag{3.45}$$
The truncation error can be verified by going through the formal Taylor series analysis.

Another application of operator notation is the derivation of higher-order formulae. For example, we know from the Taylor series that
$$\delta_{2x} u_i = u_x + \frac{\Delta x^2}{3!}u_{xxx} + O(\Delta x^4) \tag{3.46}$$
If we can estimate the third-order derivative to second order, then we can substitute this estimate in the above formula to get a fourth-order estimate. Applying the $\delta_x^2$ operator to both sides of the above equation we get:
$$\delta_x^2(\delta_{2x} u_i) = \delta_x^2\left(u_x + \frac{\Delta x^2}{3!}u_{xxx} + O(\Delta x^4)\right) = u_{xxx} + O(\Delta x^2) \tag{3.47}$$
Thus we have
$$\delta_{2x} u_i = u_x + \frac{\Delta x^2}{3!}\left[\delta_x^2\delta_{2x} u_i + O(\Delta x^2)\right] \tag{3.48}$$
Rearranging the equation we have:
$$u_x|_{x_i} = \left(1 - \frac{\Delta x^2}{3!}\delta_x^2\right)\delta_{2x} u_i + O(\Delta x^4) \tag{3.49}$$
Expanding the operators shows that (3.49) is identical to the fourth-order formula (3.27).

3.4 Polynomial Fitting

Taylor series expansions are not the only means to develop finite difference approximations. Another approach is to rely on polynomial fitting, such as splines (which we will not discuss here) and Lagrange interpolation. We will concentrate on the latter in the following section. Lagrange interpolation consists of fitting a polynomial of a specified degree to a given set of $(x_i, u_i)$ pairs. The slope at the point $x_i$ is approximated by taking the derivative of the polynomial at that point. The approach is best illustrated by looking at specific examples.

3.4.1 Linear Fit

The linear polynomial interpolating $u$ at $x_i$ and $x_{i+1}$ is:
$$L_1(x) = \frac{x - x_i}{\Delta x}\,u_{i+1} - \frac{x - x_{i+1}}{\Delta x}\,u_i, \qquad x_i \le x \le x_{i+1} \tag{3.50}$$
The derivative of this function yields the forward difference formula:
$$u_x|_{x_i} = \left.\frac{\partial L_1(x)}{\partial x}\right|_{x_i} = \frac{u_{i+1} - u_i}{\Delta x} \tag{3.51}$$
A Taylor series analysis will show this approximation to be first-order accurate (linear in $\Delta x$).
Likewise, if a linear interpolation is used to interpolate the function in $x_{i-1} \le x \le x_i$, we get the backward difference formula.

3.4.2 Quadratic Fit

It is easily verified that the following quadratic interpolation will fit the function values at the points $x_i$ and $x_{i\pm1}$:
$$L_2(x) = \frac{(x - x_i)(x - x_{i+1})}{2\Delta x^2}\,u_{i-1} - \frac{(x - x_{i-1})(x - x_{i+1})}{\Delta x^2}\,u_i + \frac{(x - x_{i-1})(x - x_i)}{2\Delta x^2}\,u_{i+1} \tag{3.52}$$
Differentiating the function and evaluating it at $x_i$, we get expressions for the first and second derivatives:
$$\left.\frac{\partial L_2}{\partial x}\right|_{x_i} = \frac{u_{i+1} - u_{i-1}}{2\Delta x} \tag{3.53}$$
$$\left.\frac{\partial^2 L_2}{\partial x^2}\right|_{x_i} = \frac{u_{i+1} - 2u_i + u_{i-1}}{\Delta x^2} \tag{3.54}$$
Notice that these expressions are identical to the formulae obtained earlier. A Taylor series analysis would confirm that both expressions are second-order accurate.

3.4.3 Higher order formula

Higher-order formulae can be developed using Lagrange polynomials of increasing degree. A word of caution: high-order Lagrange interpolation is practical only when the evaluation point is in the middle of the stencil. High-order Lagrange interpolation is notoriously noisy near the ends of the stencil when equal grid spacing is used, and leads to the well-known problem of Runge oscillations (Boyd, 1989). Spectral methods that do not use periodic Fourier functions (the usual "sin" and "cos" functions) rely on unevenly spaced points. To illustrate the Runge phenomenon we take the simple example of interpolating the function
$$f(x) = \frac{1}{1 + 25x^2} \tag{3.55}$$
in the interval $|x| \le 1$. The Lagrange interpolation using an equally spaced grid is shown in the upper panels of figure 3.4; the solid line refers to the exact function $f$, while the dashed colored lines refer to the Lagrange interpolants of different orders. In the center of the interval (near $x = 0$), the difference between the dashed lines and the solid black line decreases quickly as the polynomial order is increased. However, near the edges of the interval, the Lagrangian interpolants oscillate between the interpolation points.
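The interpolation experiment behind figure 3.4 can be sketched with a direct evaluation of the Lagrange basis polynomials; the helper names and the 2001-point evaluation grid are our own choices. The sketch measures the maximum interpolation error of (3.55) for equally spaced nodes and for Gauss-Lobatto points of the Chebyshev polynomial $T_{N-1}$, taken here as $x_j = -\cos(\pi j/(N-1))$, $j = 0, \ldots, N-1$. The equally spaced error should grow with $N$ while the Chebyshev-spaced error decreases.

```python
import numpy as np

def lagrange_interp(xn, fn, x):
    """Evaluate the Lagrange interpolant through the points (xn, fn) at x."""
    p = np.zeros_like(x)
    for j in range(len(xn)):
        lj = np.ones_like(x)               # basis polynomial L_j(x)
        for k in range(len(xn)):
            if k != j:
                lj *= (x - xn[k]) / (xn[j] - xn[k])
        p += fn[j] * lj
    return p

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)   # eq. (3.55)
x_fine = np.linspace(-1.0, 1.0, 2001)

def max_interp_error(nodes):
    return np.max(np.abs(lagrange_interp(nodes, f(nodes), x_fine) - f(x_fine)))

errors = {}
for N in (5, 9, 13, 17):
    equi = np.linspace(-1.0, 1.0, N)
    # Gauss-Lobatto points of the Chebyshev polynomial T_{N-1}
    cgl = -np.cos(np.pi * np.arange(N) / (N - 1))
    errors[N] = (max_interp_error(equi), max_interp_error(cgl))
    print(f"N = {N:2d}: equally spaced {errors[N][0]:8.3f}   Chebyshev-Gauss-Lobatto {errors[N][1]:8.3f}")
```

The direct double loop costs $O(N^2)$ per evaluation point, which is fine at these sizes; the barycentric form of Lagrange interpolation is the usual choice when $N$ is large.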
Figure 3.4: Illustration of the Runge phenomenon for equally spaced Lagrangian interpolation (upper figures; panels titled "Equally-Spaced Lagrange Interpolation", with 5, 9 and 13 points on the left and 17 points on the right). The upper right figure illustrates the worsening amplitude of the oscillations as the degree is increased. The Runge oscillations are suppressed if an unequally spaced set of interpolation points is used (lower panel, "Gauss-Lobatto-Spaced Lagrange Interpolation"); here one based on the Gauss-Lobatto roots of Chebyshev polynomials. The solid black line refers to the exact solution and the dashed lines to the Lagrangian interpolants. The location of the interpolation points can be guessed from the crossings of the dashed lines and the solid black line.

At a fixed point near the boundary, the oscillations' amplitude becomes bigger as the polynomial degree is increased: the amplitude of the 16th-order polynomial reaches a value of 17 and has to be plotted separately for clarity of presentation. This is not the case when a non-uniform grid is used for the interpolation, as shown in the lower left panel of figure 3.4. The interpolants approach the true function both in the center and at the edges of the interval. The points used in this case are the Gauss-Lobatto roots of the Chebyshev polynomial of degree $N - 1$, where $N$ is the number of points.

3.5 Compact Differencing Schemes

A major disadvantage of the finite difference approach presented earlier is the widening of the computational stencil as the order of the approximation is increased.
These large stencils are cumbersome near the edge of the domain, where no data is available to perform the differencing. Fortunately, it is possible to derive high-order finite difference approximations with compact stencils at the expense of a small complication in their evaluation: implicit differencing schemes (as opposed to explicit schemes) must be used. Here we show how these schemes can be derived.

3.5.1 Derivation of 3-term compact schemes

The Taylor series expansion of $u_{i+m}$, where $u_{i+m} = u(x_i + m\Delta x)$, about the point $x_i$ can be written as
$$u_{i+m} = \sum_{n=0}^{\infty} \frac{(m\Delta x)^n}{n!}\,u^{(n)} \tag{3.56}$$
where $u^{(n)}$ is the $n$-th derivative of $u$ with respect to $x$ at $x_i$, with $m$ being an arbitrary number. From this expression it is easy to obtain the following sum and difference:
$$u_{i+m} \pm u_{i-m} = \sum_{n=0}^{\infty}\big(1 \pm (-1)^n\big)\frac{(m\Delta x)^n}{n!}\,u^{(n)} \tag{3.57}$$
$$\frac{u_{i+m} + u_{i-m}}{2} = \sum_{n=0,2,4,\ldots}^{\infty}\frac{(m\Delta x)^n}{n!}\,u^{(n)} \tag{3.58}$$
$$\frac{u_{i+m} - u_{i-m}}{2m\Delta x} = \sum_{n=0,2,4,\ldots}^{\infty}\frac{(m\Delta x)^n}{(n+1)!}\,u^{(n+1)} \tag{3.59}$$
These expansions apply to arbitrary functions $u$ as long as the expansion is valid, so they apply in particular to their derivatives. If we substitute $u^{(1)}$ for $u$ in the summation expansion, we obtain an expression for the expansion of the derivatives:
$$\frac{u^{(1)}_{i+m} + u^{(1)}_{i-m}}{2} = \sum_{n=0,2,4,\ldots}^{\infty}\frac{(m\Delta x)^n}{n!}\,u^{(n+1)} \tag{3.60}$$
Consider centered expansions of the following form:
$$\alpha u^{(1)}_{i-1} + u^{(1)}_i + \alpha u^{(1)}_{i+1} = a_1\,\delta_{2x}u_i + a_2\,\delta_{4x}u_i + a_3\,\delta_{6x}u_i \tag{3.61}$$
where $\alpha$ and the $a_i$ are unknown constants. The Taylor series expansions of the left- and right-hand sides can be matched as follows:
$$u^{(1)}_i + 2\alpha\sum_{n=0,2,4,\ldots}^{\infty}\frac{\Delta x^n}{n!}\,u^{(n+1)} = \sum_{n=0,2,4,\ldots}^{\infty}\frac{a_1 + 2^n a_2 + 3^n a_3}{(n+1)!}\,\Delta x^n u^{(n+1)} \tag{3.62}$$
or
$$\sum_{n=0,2,4,\ldots}^{\infty}\frac{(a_1 + 2^n a_2 + 3^n a_3) - (n+1)(\delta_{n0} + 2\alpha)}{(n+1)!}\,\Delta x^n u^{(n+1)} = 0 \tag{3.63}$$
Here $\delta_{n0}$ refers to the Kronecker delta: $\delta_{nm} = 1$ if $n = m$ and 0 otherwise. This leads to the following constraints on the constants $a_i$ and $\alpha$:
$$a_1 + a_2 + a_3 = 1 + 2\alpha \qquad \text{for } n = 0 \tag{3.64}$$
$$a_1 + 2^n a_2 + 3^n a_3 = 2(n+1)\alpha \qquad \text{for } n = 2, 4, \ldots, N \tag{3.65}$$
with a leading truncation error of the form:
$$\frac{(a_1 + 2^{N+2}a_2 + 3^{N+2}a_3) - 2(N+3)\alpha}{(N+3)!}\,\Delta x^{N+2}u^{(N+3)} \tag{3.66}$$
Since we only have a handful of parameters, we cannot hope to satisfy all these constraints for all $n$. However, we can derive progressively better approximations by matching higher-order terms. Indeed, with 4 parameters at our disposal we can satisfy at most 4 constraints. Let us explore some of the possible options.

3.5.2 Families of Fourth order schemes

The smallest computational stencil is obtained by setting $a_2 = a_3 = 0$, in which case only 2 parameters are left to maximize the accuracy of the scheme, and only 2 constraints can be imposed:
$$\left.\begin{aligned} a_1 - 2\alpha &= 1 \\ \frac{a_1}{3!} - \frac{2\alpha}{2!} &= 0 \end{aligned}\right\} \quad\text{with solution}\quad \alpha = \frac{1}{4}, \quad a_1 = \frac{3}{2} \tag{3.67}$$
A family of fourth-order schemes can be obtained if we allow a wider stencil on the right-hand side of the equations, and allow $a_2 \neq 0$. This family of schemes can be generated by the single parameter $\alpha$:
$$\left.\begin{aligned} a_1 + a_2 &= 1 + 2\alpha \\ a_1 + 4a_2 &= 6\alpha \end{aligned}\right\} \quad\text{with solution}\quad a_1 = \frac{2}{3}(\alpha + 2), \quad a_2 = \frac{1}{3}(4\alpha - 1) \tag{3.68}$$
The leading terms in the truncation error would then be
$$\text{TE} = \frac{a_1 + 2^4 a_2 - 2\cdot 5\alpha}{5!}\,\Delta x^4 u^{(5)} + \frac{a_1 + 2^6 a_2 - 2\cdot 7\alpha}{7!}\,\Delta x^6 u^{(7)} \tag{3.69}$$
which simplifies to
$$\text{TE} = \frac{4(3\alpha - 1)}{5!}\,\Delta x^4 u^{(5)} + \frac{4(18\alpha - 5)}{7!}\,\Delta x^6 u^{(7)} \tag{3.70}$$
This family of compact schemes can be made unique by, for example, requiring the scheme to be sixth order by setting $\alpha = 1/3$; the leading truncation error term would then be $\frac{4}{7!}\Delta x^6 u^{(7)}$.

Figure 3.5: Convergence curves for the compact (solid lines) and explicit centered difference (dashed lines) schemes for the sample function $u = \sin \pi x$, $|x| \le 1$. The dash-dot lines refer to lines of slope $-2$, $-4$, $-6$, and $-8$. The left panel shows the 2-norm of the error ($\|\epsilon\|_2$ versus $M$), while the right panel shows the $\infty$-norm ($\|\epsilon\|_\infty$ versus $M$).
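The implicit character of the compact schemes is easiest to see in code: the derivative values are coupled through a tridiagonal system. The sketch below solves (3.61) for members of the family (3.68) on a periodic grid for $u = \sin \pi x$. Periodicity is our simplifying assumption (it avoids the boundary closures a practical implementation needs), as is the use of a dense solver in place of a cyclic tridiagonal one. With $\alpha = 1/4$ the observed order should be close to 4, and with $\alpha = 1/3$ close to 6.

```python
import numpy as np

def compact_derivative(u, h, alpha, a1, a2=0.0, a3=0.0):
    """Solve eq. (3.61) for f ~ u' on a periodic grid:
        alpha*f[i-1] + f[i] + alpha*f[i+1] = a1*d2x u + a2*d4x u + a3*d6x u,
    with d_{2m x} u_i = (u[i+m] - u[i-m]) / (2 m h) as in eq. (3.39)."""
    n = len(u)
    shift = lambda m: np.roll(u, -m)           # shift(m)[i] = u[(i + m) % n]
    rhs = (a1 * (shift(1) - shift(-1)) / (2 * h)
           + a2 * (shift(2) - shift(-2)) / (4 * h)
           + a3 * (shift(3) - shift(-3)) / (6 * h))
    eye = np.eye(n)
    # Cyclic tridiagonal matrix: 1 on the diagonal, alpha on both off-diagonals
    A = eye + alpha * (np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1))
    return np.linalg.solve(A, rhs)             # dense solve is enough for a sketch

def family4(alpha):
    # Coefficients (alpha, a1, a2) of the fourth-order family, eq. (3.68);
    # alpha = 1/4 gives a2 = 0, i.e. the scheme of eq. (3.67)
    return alpha, 2 * (alpha + 2) / 3, (4 * alpha - 1) / 3

def max_error(N, alpha, a1, a2):
    h = 2.0 / N
    x = -1.0 + h * np.arange(N)                # periodic grid on [-1, 1)
    du = compact_derivative(np.sin(np.pi * x), h, alpha, a1, a2)
    return np.max(np.abs(du - np.pi * np.cos(np.pi * x)))

orders = {}
for name, alpha in [("4th", 0.25), ("6th", 1 / 3)]:
    coeffs = family4(alpha)
    e32, e64 = max_error(32, *coeffs), max_error(64, *coeffs)
    orders[name] = np.log2(e32 / e64)
    print(f"{name} (alpha = {alpha:.4f}): error(N=64) = {e64:.2e}, observed order = {orders[name]:.2f}")
```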
3.5.3 Families of Sixth order schemes

Allowing the stencil on the right-hand side to expand further, $a_3 \neq 0$, we can easily generate families of sixth-order schemes. The constraints are given by
$$\left.\begin{aligned} a_1 + a_2 + a_3 &= 1 + 2\alpha \\ a_1 + 2^2 a_2 + 3^2 a_3 &= 6\alpha \\ a_1 + 2^4 a_2 + 3^4 a_3 &= 10\alpha \end{aligned}\right\} \quad\text{with solution}\quad \left\{\begin{aligned} a_1 &= \frac{\alpha + 9}{6} \\ a_2 &= \frac{32\alpha - 9}{15} \\ a_3 &= \frac{-3\alpha + 1}{10} \end{aligned}\right. \tag{3.71}$$
The leading terms in the truncation error would then be
$$\text{TE} = \frac{a_1 + 2^6 a_2 + 3^6 a_3 - 2\cdot 7\alpha}{7!}\,\Delta x^6 u^{(7)} + \frac{a_1 + 2^8 a_2 + 3^8 a_3 - 2\cdot 9\alpha}{9!}\,\Delta x^8 u^{(9)} \tag{3.72}$$
which simplifies to
$$\text{TE} = \frac{12(-8\alpha + 3)}{7!}\,\Delta x^6 u^{(7)} + \frac{72(-20\alpha + 7)}{9!}\,\Delta x^8 u^{(9)} \tag{3.73}$$
Again, the formal order of the scheme can be raised to eight if one chooses $\alpha = 3/8$.

3.5.4 Numerical experiments

Figure 3.5 illustrates the convergence of the various compact schemes presented here for the sample function $u = \sin \pi x$. For comparison, we have shown in dashed lines the convergence curves for the explicit finite difference schemes presented earlier. The dash-dot lines are reference curves with slopes $-2$, $-4$, $-6$ and $-8$, respectively. For this infinitely differentiable function, the various schemes achieve their theoretically expected convergence order as the order of the scheme is increased. It should be noted that, although the convergence curves of the two approaches are parallel (the same slope), the errors of the compact schemes are lower than those of their explicit difference counterparts.

Chapter 4 Application of Finite Differences to ODE

In this chapter we explore the application of finite differences in the simplest setting possible, namely where there is only one independent variable. The equations are then referred to as ordinary differential equations (ODE). We will use the setting of ODEs to introduce several concepts and tools that will be useful in the numerical solution of partial differential equations. Furthermore, time-marching schemes are almost universally reliant on finite difference methods for discretization, and hence a study of ODEs is time well spent.
4.1 Introduction

Here we show how an ODE may be obtained in the process of numerically solving a partial differential equation. Let us consider the problem of solving the following PDE:
$$u_t + c\,u_x = \nu\,u_{xx}, \qquad 0 \le x \le L \tag{4.1}$$
subject to periodic boundary conditions. Equation (4.1) is an advection-diffusion equation, with $c$ being the advecting velocity and $\nu$ the viscosity coefficient. We will take $c$ and $\nu$ to be positive constants. The two independent variables are $t$ for time and $x$ for space. Because of the periodicity, it is sensible to expand the unknown function in a Fourier series:
$$u = \sum_{n=-\infty}^{\infty} \hat{u}_n(t)\,e^{i k_n x} \tag{4.2}$$
where $\hat{u}_n$ are the complex amplitudes and depend only on the time variable, whereas $e^{i k_n x}$ are the Fourier functions with wavenumber $k_n$. Because of the periodicity requirement we have $k_n = 2\pi n/L$, where $n$ is an integer. The Fourier functions form what is called an orthogonal basis, and the amplitudes can be determined as follows: multiply the two sides of equation (4.2) by $e^{-i k_m x}$, where $m$ is an integer, and integrate over the interval $[0, L]$ to get:
$$\int_0^L u\,e^{-i k_m x}\,dx = \sum_{n=-\infty}^{\infty} \hat{u}_n(t)\int_0^L e^{i(k_n - k_m)x}\,dx \tag{4.3}$$
Now notice that the integral on the right-hand side of equation (4.3) satisfies the orthogonality property:
$$\int_0^L e^{i(k_n - k_m)x}\,dx = \begin{cases} \dfrac{e^{i(k_n - k_m)L} - 1}{i(k_n - k_m)} = \dfrac{e^{i2\pi(n - m)} - 1}{i(k_n - k_m)} = 0, & n \neq m \\[2ex] L, & n = m \end{cases} \tag{4.4}$$
The role of the integration is to pick out the $m$-th Fourier component, since all the other integrals are zero. We end up with the following expression for the Fourier coefficients:
$$\hat{u}_m = \frac{1}{L}\int_0^L u(x)\,e^{-i k_m x}\,dx \tag{4.5}$$
Equation (4.5) allows us to calculate the Fourier coefficients for a known function $u$. Note that for a real function, the Fourier coefficients satisfy
$$\hat{u}_{-n} = \hat{u}_n^* \tag{4.6}$$
where the $*$ superscript stands for the complex conjugate. Thus, only the positive Fourier components need to be determined, and the negative ones are simply the complex conjugates of the positive components.
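In practice the integral (4.5) is approximated from samples. A minimal sketch, assuming $N$ equally spaced samples $x_j = jL/N$: the trapezoidal-rule approximation of (4.5) is then exactly NumPy's `np.fft.fft` divided by $N$, and it is exact for a band-limited function such as the test signal below (our choice), $\cos(2\pi x/L) + \tfrac{1}{2}\sin(4\pi x/L)$. Its coefficients are $\hat{u}_1 = 1/2$ and $\hat{u}_2 = \tfrac{1}{2}\cdot\tfrac{1}{2i} = -i/4$, and $\hat{u}_{-2}$ should be the conjugate of $\hat{u}_2$, as (4.6) states.

```python
import numpy as np

# Trapezoidal approximation of (4.5) on N equally spaced points x_j = j L / N;
# for periodic u this sum is exactly numpy's FFT divided by N.
L, N = 2.0, 64
x = L * np.arange(N) / N
u = np.cos(2 * np.pi * x / L) + 0.5 * np.sin(4 * np.pi * x / L)

u_hat = np.fft.fft(u) / N        # u_hat[n] ~ \hat{u}_n, and u_hat[-n] ~ \hat{u}_{-n}
print("u_hat_1  =", u_hat[1])    # cos term: expected 1/2
print("u_hat_2  =", u_hat[2])    # 0.5 * sin term: expected 0.5/(2i) = -0.25i
print("u_hat_-2 =", u_hat[-2])   # complex conjugate of u_hat_2, eq. (4.6)

# The same coefficient from a direct quadrature of (4.5)
n = 2
k_n = 2 * np.pi * n / L
direct = np.sum(u * np.exp(-1j * k_n * x)) * (L / N) / L
print("direct quadrature for n = 2:", direct)
```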
The Fourier series (4.2) can now be differentiated term by term to get an expression for the derivatives of $u$, namely:
$$u_x = \sum_{n=-\infty}^{\infty} i k_n\,\hat{u}_n(t)\,e^{i k_n x} \tag{4.7}$$
$$u_{xx} = \sum_{n=-\infty}^{\infty} -k_n^2\,\hat{u}_n(t)\,e^{i k_n x} \tag{4.8}$$
$$u_t = \sum_{n=-\infty}^{\infty} \frac{d\hat{u}_n}{dt}\,e^{i k_n x} \tag{4.9}$$
Replacing these expressions for the derivatives in the original equation and collecting terms, we arrive at the following equation:
$$\sum_{n=-\infty}^{\infty}\left[\frac{d\hat{u}_n}{dt} + (i c k_n + \nu k_n^2)\,\hat{u}_n\right] e^{i k_n x} = 0 \tag{4.10}$$
Note that the above equation has to be satisfied for all $x$, and hence each of its Fourier amplitudes must be zero (just remember the orthogonality property (4.4), and replace $u$ by zero). Each Fourier component can be studied separately thanks to the linearity and the constant coefficients of the PDE. The governing equation for the Fourier amplitude is now
$$\frac{d\hat{u}}{dt} = \underbrace{-(i c k + \nu k^2)}_{\kappa}\,\hat{u} \tag{4.11}$$
where we have removed the subscript $n$ to simplify the notation, and have introduced the complex number $\kappa$. The solution to this simple ODE is:
$$\hat{u} = \hat{u}_0\,e^{\kappa t} \tag{4.12}$$
where $\hat{u}_0 = \hat{u}(t = 0)$ is the Fourier amplitude at the initial time. Taking the ratio of the solution between times $t$ and $t + \Delta t$, we can see the expected behavior of the solution between two consecutive times:
$$\frac{\hat{u}(t + \Delta t)}{\hat{u}(t)} = e^{\kappa\Delta t} = e^{\mathrm{Re}(\kappa)\Delta t}\,e^{i\,\mathrm{Im}(\kappa)\Delta t} \tag{4.13}$$
where $\mathrm{Re}(\kappa)$ and $\mathrm{Im}(\kappa)$ refer to the real and imaginary parts of $\kappa$. It is now easy to follow the evolution of the amplitude of the Fourier components:
$$|\hat{u}(t + \Delta t)| = |\hat{u}(t)|\,e^{\mathrm{Re}(\kappa)\Delta t} \tag{4.14}$$
The analytical solution predicts an exponential decrease if $\mathrm{Re}(\kappa) < 0$, an exponential increase if $\mathrm{Re}(\kappa) > 0$, and a constant amplitude if $\mathrm{Re}(\kappa) = 0$. For the advection-diffusion problem (4.11), $\mathrm{Re}(\kappa) = -\nu k^2 < 0$ for $k \neq 0$, so every nonzero mode decays. The imaginary part of $\kappa$ influences only the phase of the solution, which changes by an amount $\mathrm{Im}(\kappa)\Delta t$. We now turn to the issue of devising numerical solutions to the ODE.
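Before discretizing in time, the exact modal solution (4.12) can itself be turned into a solver for (4.1): transform the initial condition, multiply each amplitude by $e^{\kappa t}$, and transform back. The sketch below does this with NumPy's FFT on $N$ equally spaced samples; the function name and the parameter values ($L = 1$, $c = 1$, $\nu = 0.01$, $t = 0.3$) are illustrative assumptions, and the result is exact only for initial data resolved by the $N$ retained modes. For the single mode $\sin(k_1 x)$ it should match $e^{-\nu k_1^2 t}\sin(k_1(x - ct))$, i.e. decay by $e^{\mathrm{Re}(\kappa)t}$ and a phase shift of $\mathrm{Im}(\kappa)t = -ck_1 t$.

```python
import numpy as np

def advect_diffuse_exact(u0, L, c, nu, t):
    """Advance u_t + c u_x = nu u_xx (periodic on [0, L)) to time t by
    multiplying each Fourier amplitude by exp(kappa t), eqs. (4.11)-(4.12)."""
    N = len(u0)
    k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)     # wavenumbers k_n = 2 pi n / L
    kappa = -(1j * c * k + nu * k**2)              # eq. (4.11), one value per mode
    return np.real(np.fft.ifft(np.fft.fft(u0) * np.exp(kappa * t)))

L, N, c, nu, t = 1.0, 32, 1.0, 0.01, 0.3           # illustrative values
x = L * np.arange(N) / N
k1 = 2 * np.pi / L
u = advect_diffuse_exact(np.sin(k1 * x), L, c, nu, t)

# Single mode: amplitude decays by exp(-nu k^2 t), profile shifts right by c t
u_exact = np.exp(-nu * k1**2 * t) * np.sin(k1 * (x - c * t))
max_dev = np.max(np.abs(u - u_exact))
print("max deviation:", max_dev)
```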
4.2 Forward Euler Approximation

Let us approximate the time derivative in (4.11) by a forward difference approximation (the name forward Euler is also used) to get:
$$\frac{u^{n+1} - u^n}{\Delta t} \approx \kappa u^n \tag{4.15}$$
where the superscript indicates the time level, $u^n = u(n\Delta t)$, and where we have removed the hat for simplicity. Equation (4.15) is an explicit approximation to the original differential equation, since no information about the unknown function at the future time $(n+1)\Delta t$ has been used on the right-hand side of the equation. In order to derive the error committed in the approximation, we rely again on Taylor series. Expanding $u^{n+1}$ about time level $n\Delta t$ and inserting it in the forward difference expression (4.15), we get:
$$u_t - \kappa u = \underbrace{-\frac{\Delta t}{2}\,u_{tt} - \frac{\Delta t^2}{3!}\,u_{ttt} - \cdots}_{\text{truncation error} \,\sim\, O(\Delta t)} \tag{4.16}$$
The terms on the right-hand side are the truncation errors of the forward Euler approximation. The formal definition of the truncation error is that it is the difference between the analytical and approximate representations of the differential equation. The leading error term (for sufficiently small $\Delta t$) is linear in $\Delta t$, and hence we expect the errors to decrease linearly. Most importantly, the approximation is consistent, in that the truncation error goes to zero as $\Delta t \to 0$.

Given the initial condition $u(t = 0) = u^0$, we can advance the solution in time to get:
$$\begin{aligned}
u^1 &= (1 + \kappa\Delta t)\,u^0 \\
u^2 &= (1 + \kappa\Delta t)\,u^1 = (1 + \kappa\Delta t)^2 u^0 \\
u^3 &= (1 + \kappa\Delta t)\,u^2 = (1 + \kappa\Delta t)^3 u^0 \\
&\;\;\vdots \\
u^n &= (1 + \kappa\Delta t)\,u^{n-1} = (1 + \kappa\Delta t)^n u^0
\end{aligned} \tag{4.17}$$
Let us study what happens when we let $\Delta t \to 0$ for a fixed integration time $t_n = n\Delta t$. The only factor we need to worry about is the numerical amplification factor $(1 + \kappa\Delta t)$:
$$\lim_{\Delta t \to 0}(1 + \kappa\Delta t)^{\frac{t_n}{\Delta t}} = \lim_{\Delta t \to 0} e^{\frac{t_n \ln(1 + \kappa\Delta t)}{\Delta t}} = \lim_{\Delta t \to 0} e^{\frac{t_n(\kappa\Delta t - \kappa^2\Delta t^2/2 + \cdots)}{\Delta t}} = e^{\kappa t_n} \tag{4.18}$$
where we have used the logarithm Taylor series $\ln(1 + \epsilon) = \epsilon - \epsilon^2/2 + \cdots$, assuming that $\kappa\Delta t$ is small.
Hence we have proven convergence of the numerical solution to the analytic solution in the limit Δt → 0. The question is what happens for finite Δt. Notice that, in analogy to the analytic solution, we can define an amplification factor associated with the numerical solution, namely:

$$A = \frac{u^n}{u^{n-1}} = |A|\,e^{i\theta} \tag{4.19}$$

where θ is the argument of the complex number A. The modulus of A determines whether the numerical solution is amplifying or decaying, and its argument determines the change in phase. The numerical amplification factor should mimic the analytical amplification factor, and should lead to an analogous increase or decrease of the solution. For small κΔt it can be seen that A = 1 + κΔt consists of just the first two terms of the Taylor series expansion of $e^{\kappa\Delta t}$, and is hence only first order accurate. Let us investigate the magnitude of A in terms of κ, a problem parameter, and Δt, the numerical parameter; we have:

$$|A|^2 = AA^* = 1 + 2\,\mathrm{Re}(\kappa)\Delta t + |\kappa|^2\Delta t^2 \tag{4.20}$$

We focus in particular on the condition under which the amplification factor is less than 1. The following condition then needs to be fulfilled (assuming Δt > 0):

$$\Delta t \le -2\,\frac{\mathrm{Re}(\kappa)}{|\kappa|^2} \tag{4.21}$$

There are three cases to consider depending on the sign of Re(κ). If Re(κ) > 0 then |A| > 1 for Δt > 0, and the finite difference solution will grow like the analytical solution. For Re(κ) = 0, the numerical solution will also grow in amplitude, whereas the analytical solution $e^{\kappa t}$ predicts a neutral amplification. If Re(κ) < 0, then |A| > 1 for $\Delta t > -2\,\mathrm{Re}(\kappa)/|\kappa|^2$, whereas the analytical solution predicts a decay. The moral of the story is that the numerical solution can behave in unexpected ways. We can rewrite the amplification factor in the following form:

$$|A|^2 = AA^* = [\mathrm{Re}(z) + 1]^2 + [\mathrm{Im}(z)]^2 \tag{4.22}$$

where z = κΔt. The above equation can be interpreted as the equation of a circle centered at (−1, 0) in the complex z plane with radius |A|. Thus z must lie within the unit circle centered at (−1, 0) for |A| ≤ 1.
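The bound (4.21) and its geometric reading (4.22) can be checked with a few lines of Python. The helper names are hypothetical, and κ = −1 + 2i is an arbitrary example for which (4.21) gives Δt ≤ 0.4.

```python
def fe_amplification(kappa: complex, dt: float) -> complex:
    """Forward Euler amplification factor A = 1 + kappa dt, eq. (4.19)."""
    return 1 + kappa * dt


def fe_max_stable_dt(kappa: complex) -> float:
    """Largest dt with |A| <= 1 from eq. (4.21); assumes kappa != 0.

    Returns 0.0 when Re(kappa) >= 0, since then |A| > 1 for every dt > 0.
    """
    if kappa.real >= 0:
        return 0.0
    return -2.0 * kappa.real / abs(kappa) ** 2


kappa = -1.0 + 2.0j
dt_max = fe_max_stable_dt(kappa)  # -2 * (-1) / 5 = 0.4
print(dt_max)
for dt in (0.3, dt_max, 0.5):
    z = kappa * dt
    # |A| <= 1 exactly when z lies in the unit disk centred at (-1, 0), eq. (4.22)
    print(dt, abs(fe_amplification(kappa, dt)), abs(z + 1) <= 1 + 1e-12)
```

For a purely imaginary κ the function returns 0.0: no positive step keeps forward Euler from amplifying a neutral oscillation.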
4.3 Stability, Consistency and Convergence

Let us denote by u the exact analytical solution to the differential equation, and by U the numerical solution obtained using a finite difference scheme. The quantity ‖U − u‖ is a norm of the error. We have the following definitions:

• Convergence: A scheme is said to converge to O(Δt^p) if ‖U − u‖ = O(Δt^p) as Δt → 0, where p is a positive constant.
• Truncation Error: The local difference between the difference approximation and the differential equation. It is the error introduced if the exact solution is plugged into the difference equations.
• Consistency: The finite difference approximation is called consistent with the differential equation if the truncation error goes to zero when the numerical parameters are made arbitrarily small.
• Stability: A method is called stable if there is a constant C independent of the time step or the number of time steps such that:

$$\|U^n\| < C\,\|U^0\| \tag{4.23}$$

Equation (4.23) is a very loose constraint on the numerical approximation to guarantee convergence. This constraint allows the solution to grow in time (indeed the solution can still grow exponentially fast), but rules out growth that depends on the number of time steps or the step size. In situations where the analytical solution is known not to grow, it is entirely reasonable to impose the restriction:

$$\|U^n\| < \|U^0\| \tag{4.24}$$

4.3.1 Lax-Richtmyer theorem

The celebrated Lax-Richtmyer theorem links the notions of consistency and stability for linear differential equations. It states that, for linear differential equations, a consistent finite difference approximation converges to the true solution if the scheme is stable. The converse is also true in that a convergent and consistent numerical solution must be stable.
We will show here a simplified version of the Lax-Richtmyer equivalence theorem to highlight the relationship between consistency and stability, and to justify the constraints we place on the finite difference approximations. A general form for the integration of the ODE is:

$$U^n = A\,U^{n-1} + b^{n-1} \tag{4.25}$$

where A is the multiplicative factor and b a source/sink term that does not depend on u. If the exact solution is plugged into the above recursive formula we get:

$$u^n = A\,u^{n-1} + b^{n-1} + T^{n-1}\Delta t \tag{4.26}$$

where $T^{n-1}$ is the truncation error committed in the step from time level n − 1 to time level n. Subtracting the two equations from each other, and invoking the linearity of the process, we can derive an equation for the evolution of the error in time, namely:

$$e^n = A\,e^{n-1} - T^{n-1}\Delta t \tag{4.27}$$

where $e^n = U^n - u^n$ is the total error at time $t_n = n\Delta t$. Reapplying this formula to $e^{n-1}$ transforms it to:

$$e^n = A\left(A\,e^{n-2} - T^{n-2}\Delta t\right) - T^{n-1}\Delta t = (A)^2 e^{n-2} - \Delta t\left(A\,T^{n-2} + T^{n-1}\right) \tag{4.28}$$

where $(A)^2 = A\cdot A$, and the remaining superscripts indicate time levels. Repeated application of this formula shows that:

$$e^n = (A)^n e^0 - \Delta t\left[(A)^{n-1}T^0 + (A)^{n-2}T^1 + \ldots + A\,T^{n-2} + T^{n-1}\right] \tag{4.29}$$

Equation (4.29) shows that the error at time level n depends on the initial error, on the history of the truncation error, and on the discretization through the factor A. We will now attempt to bound this error and show that this is possible if the truncation error can be made arbitrarily small (the consistency condition), and if the scheme is stable according to the definition shown above. A simple application of the triangle inequality shows that

$$|e^n| \le |(A)^n|\,|e^0| + \Delta t\left[\,|(A)^{n-1}|\,|T^0| + |(A)^{n-2}|\,|T^1| + \ldots + |A|\,|T^{n-2}| + |T^{n-1}|\,\right] \tag{4.30}$$

Now we define $T = \max|T^m|$ for all 0 ≤ m ≤ n − 1, that is, T is the maximum norm of the truncation error encountered during the course of the calculation; the right hand side of the above inequality can then be bounded again:

$$|e^n| \le |(A)^n|\,|e^0| + \Delta t\,T\sum_{m=0}^{n-1}|(A)^m| \tag{4.31}$$

In order to proceed further we also need to introduce a maximum bound on the amplification factor and all its powers. So let us assume that the scheme is stable, i.e. there is a positive constant C, independent of Δt and n, such that

$$\max\left(|(A)^m|\right) \le C, \qquad \text{for } 0 \le m \le n \tag{4.32}$$

Since the individual entries in the summation are smaller than C and the sum involves n terms, the sum must be smaller than nC. The right hand side of inequality (4.31) is then bounded above by $C|e^0| + nT\Delta t\,C$, and we arrive at:

$$|e^n| \le C\,|e^0| + t_n T C \tag{4.33}$$

where $t_n = n\Delta t$ is the final integration time. The right hand side of (4.33) can be made arbitrarily small by the following argument. First, the initial condition is known and so the initial error is (save for round-off errors) zero. Second, since the scheme is consistent, the maximum truncation error T can be made arbitrarily small by choosing smaller and smaller Δt. The end result is that the bound on the error can be made arbitrarily small if the approximation is stable and consistent. Hence the scheme converges to the solution as Δt → 0.

4.3.2 Von Neumann stability condition

Here we derive a practical bound on the amplification factor |A| based on the criterion used in the derivation of the equivalence theorem, $|A^m| \le C$:

$$|A^m| = |A|^m \le C \tag{4.34}$$
$$|A| \le C^{1/m} = e^{\frac{\Delta t}{t_m}\ln C} \tag{4.35}$$
$$|A| \le 1 + O(\Delta t) \tag{4.36}$$

where $t_m = m\Delta t$, and we have used the Taylor series for the exponential in arriving at the final expression. This is the least restrictive condition on the amplification factor that will permit us to bound the error growth for finite times.
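The practical meaning of (4.36) can be illustrated with a small numerical experiment, sketched below in Python. If |A| exceeds 1 by an amount proportional to Δt, the total growth $|A|^n$ over a fixed time $t_n$ stays below $e^{\alpha t_n}$ however small Δt is; a fixed excess that does not shrink with Δt instead grows without bound as Δt → 0. The constant α and the fixed excess of 0.01 are illustrative assumptions.

```python
import math

alpha, t_final = 0.5, 2.0            # |A| = 1 + alpha*dt, integrated up to t_final
bound = math.exp(alpha * t_final)    # e^{alpha t_n}, independent of dt

results = []
for n_steps in (10, 100, 1000, 10000):
    dt = t_final / n_steps
    growth_ok = (1 + alpha * dt) ** n_steps   # excess over 1 is O(dt), eq. (4.36)
    growth_bad = 1.01 ** n_steps              # fixed excess of 0.01 per step
    results.append((n_steps, growth_ok, growth_bad))
    print(f"{n_steps:6d} {growth_ok:.6f} {growth_bad:.3e} bound={bound:.6f}")
```

The first column of growth factors creeps up towards the bound but never crosses it, because $(1+x)^n \le e^{nx}$.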
Thus the modulus of the amplification factor may be greater than 1 by an amount proportional to positive powers of Δt. This gives plenty of latitude for the numerical solution to grow, but prevents this growth from depending on the time step or the number of time steps. In practice, this stability criterion is too generous, particularly when we know the solution is bounded. The growth of the numerical solution should then be bounded at all times by setting C = 1. In this case the Von Neumann stability criterion reduces to

$$|A| \le 1 \tag{4.37}$$

4.4 Backward Difference

The backward difference formula for the ODE is

$$\left.\frac{du}{dt}\right|_{t_{n+1}} \approx \frac{u^{n+1} - u^n}{\Delta t} = \kappa u^{n+1} \tag{4.38}$$

This is an example of an implicit method since the unknown $u^{n+1}$ has been used in evaluating the slope of the solution on the right hand side; this poses no problem when solving for $u^{n+1}$ in this scalar and linear case. For more complicated situations, such as a nonlinear right hand side or a system of equations, a nonlinear system of equations may have to be inverted.
It is easy to show via Taylor series analysis that the truncation error for the backward difference scheme is O(Δt), and the scheme is hence consistent and first order accurate. The numerical solution can be updated according to:

$$u^{n+1} = \frac{u^n}{1 - \kappa\Delta t} \tag{4.42}$$

and the amplification factor is simply A = 1/(1 − κΔt). Its magnitude is given by:

$$|A|^2 = \frac{1}{1 - 2\,\mathrm{Re}(\kappa)\Delta t + |\kappa|^2\Delta t^2} \tag{4.43}$$

The condition under which this amplification factor is bounded by 1 is

$$\Delta t \ge 2\,\frac{\mathrm{Re}(\kappa)}{|\kappa|^2} \tag{4.44}$$

The conclusions again depend on the sign of Re(κ). If Re(κ) < 0, then |A| < 1 for all Δt > 0; this is an instance of unconditional stability. The numerical solution is also damped when Re(κ) = 0, whereas the analytical solution is neutral. The numerical amplification factor can be rewritten as:

$$[1 - \mathrm{Re}(z)]^2 + [\mathrm{Im}(z)]^2 = \frac{1}{|A|^2} \tag{4.45}$$

which shows the contours of constant amplification factor to be circles centered at (1, 0) and of radius 1/|A|.

4.6 Trapezoidal Scheme

The trapezoidal scheme is an example of a second order scheme that uses only two time levels. It is based on applying the derivative at the intermediate time n + 1/2, and using a centered difference formula with step size Δt/2. The derivation is as follows:

$$\left.\frac{du}{dt}\right|_{t_{n+\frac{1}{2}}} = \kappa u^{n+\frac{1}{2}} \tag{4.46}$$
$$\frac{u^{n+1} - u^n}{\Delta t} + O(\Delta t^2) = \kappa\,\frac{u^{n+1} + u^n}{2} + O(\Delta t^2) \tag{4.47}$$

using simple Taylor series expansions about time n + 1/2. The truncation error is O(Δt²) and the method is hence second order accurate. It is implicit since $u^{n+1}$ is used in the evaluation of the right hand side. The unknown function can be updated as:

$$u^{n+1} = \frac{1 + \frac{\kappa\Delta t}{2}}{1 - \frac{\kappa\Delta t}{2}}\,u^n \tag{4.48}$$

The amplification factor is

$$A = \frac{1 + \frac{\kappa\Delta t}{2}}{1 - \frac{\kappa\Delta t}{2}}, \qquad |A|^2 = \frac{1 + \mathrm{Re}(\kappa\Delta t) + \frac{|\kappa|^2\Delta t^2}{4}}{1 - \mathrm{Re}(\kappa\Delta t) + \frac{|\kappa|^2\Delta t^2}{4}} \tag{4.49}$$

The condition for |A| ≤ 1 is simply

$$\mathrm{Re}(\kappa) \le 0 \tag{4.50}$$

The scheme is hence unconditionally stable for Re(κ) < 0 and neutrally stable (|A| = 1) for Re(κ) = 0.
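The amplification factors of the two implicit schemes, (4.42) and (4.48), can be compared with the short Python sketch below; z = κΔt as before, and the sample points are arbitrary. It illustrates unconditional stability for Re(z) ≤ 0 (including very large steps), neutral trapezoidal amplification on the imaginary axis, and the damping that backward differencing introduces at a point with Re(z) > 0 lying outside the circle |1 − z| = 1.

```python
def be_amplification(z: complex) -> complex:
    """Backward difference, eq. (4.42): A = 1 / (1 - z), z = kappa dt."""
    return 1 / (1 - z)


def tz_amplification(z: complex) -> complex:
    """Trapezoidal scheme, eq. (4.48): A = (1 + z/2) / (1 - z/2)."""
    return (1 + z / 2) / (1 - z / 2)


# Sample points with Re(z) <= 0, including very large steps
left_half_plane = [-100.0, -1.0 + 2.0j, -1e-3 + 3.0j, 5.0j, 40.0j]
for z in left_half_plane:
    print(z, abs(be_amplification(z)), abs(tz_amplification(z)))

# Re(z) > 0 but outside the circle |1 - z| = 1: backward differencing damps
# a solution that the analytical equation says should grow.
print(abs(be_amplification(3.0)))
```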
Example 9. To illustrate the application of the different schemes we proceed to evaluate numerically the solution of $u_t = -iu$ with the initial condition u = 1. The analytical solution, $u = e^{-it}$, traces a circle in the complex u plane that starts at (1, 0) and proceeds clockwise. We then compute the numerical solutions using the forward, backward and trapezoidal schemes. The modulus of the forward Euler solution spirals outward, indicative of the unstable nature of the scheme. The backward Euler solution spirals inward, indicative of a numerical solution that is too damped. Finally, the trapezoidal scheme has neutral amplification: its solution remains on the unit circle and tracks the analytical solution quite closely. Notice, however, that the trapezoidal solution seems to lag behind the analytical solution, and this lag increases with time. This is symptomatic of lagging phase errors.

4.6.1 Phase Errors

Stability is primarily concerned with the modulus of the amplification factor. However, the accuracy of the numerical scheme depends on both the amplitude and the phase errors of the scheme. The phase error can be analyzed by inspecting the argument of the amplification factor, the θ term in equation (4.19). The analytical change of phase for our model problem is $\theta_e = \mathrm{Im}(\kappa)\Delta t$; the ratio of the numerical and analytical phase is called the relative phase error:

$$R = \frac{\theta}{\theta_e} \tag{4.51}$$

When R > 1 the numerical phase exceeds the analytical phase and we call the scheme accelerating; if R < 1 the scheme is decelerating. For the forward differencing scheme the relative phase is given by:

$$R = \frac{1}{\mathrm{Im}(z)}\tan^{-1}\frac{\mathrm{Im}(z)}{1 + \mathrm{Re}(z)} \tag{4.52}$$

[Figure 4.1: Solution of the oscillation equation using the forward (x), backward (+) and trapezoidal (◦) schemes. The analytical solution is indicated by a red asterisk.]
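Example 9 and the relative phase (4.51) can be reproduced with the Python sketch below. Since the model problem is linear with constant κ, the numerical solution is simply $u^n = A^n u^0$. The step size Δt = 0.1 and the 100 steps are arbitrary choices. The output shows forward Euler spiralling outward (|u^n| > 1), backward Euler spiralling inward (|u^n| < 1), the neutral trapezoidal scheme, and R < 1 (a lagging phase) for all three, with the two first order schemes sharing the same R.

```python
import cmath


def amplification(scheme: str, z: complex) -> complex:
    """Amplification factors of the three two-level schemes, z = kappa dt."""
    if scheme == "forward":
        return 1 + z
    if scheme == "backward":
        return 1 / (1 - z)
    if scheme == "trapezoidal":
        return (1 + z / 2) / (1 - z / 2)
    raise ValueError(f"unknown scheme: {scheme}")


kappa, dt, n_steps = -1j, 0.1, 100   # Example 9: u_t = -i u, u(0) = 1
z = kappa * dt
exact_phase = z.imag                 # theta_e = Im(kappa) dt

modulus, rel_phase = {}, {}
for scheme in ("forward", "backward", "trapezoidal"):
    A = amplification(scheme, z)
    modulus[scheme] = abs(A ** n_steps)               # |u^n| with u^0 = 1
    rel_phase[scheme] = cmath.phase(A) / exact_phase  # R, eq. (4.51)
    print(f"{scheme:12s} |u^n| = {modulus[scheme]:.4f}  R = {rel_phase[scheme]:.5f}")
```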
[Figure 4.2: Relative phase versus κΔt for several two-level schemes when Re(κ) = 0. The forward and backward differencing schemes (FD and BD) have the same decelerating relative phase. The trapezoidal scheme (TZ) has a lower phase error than the two first order schemes for the same κΔt. The Runge Kutta schemes of order 2 and 3 are accelerating. The best performance is for RK4, which stays closest to the analytical curve over the largest portion of the spectrum.]

In general it is hard to get a simple formula for the phase error since the expressions often involve $\tan^{-1}$ functions with complicated arguments. Figure 4.2 shows the relative phase as a function of κΔt for the case where Re(κ) = 0 for several time integration schemes. The solid black line (R = 1) is the reference for an exact phase. The forward, backward and trapezoidal differencing schemes have negative phase errors (and hence are decelerating), while the RK schemes (to be presented below) have an accelerating phase.

4.7 Higher Order Methods

4.7.1 Multi Stage (Runge Kutta) Methods

One approach to increasing the order of the calculations without using information at previous time levels is to introduce intermediate stages in the calculations. The most popular of these approaches is referred to as the Runge Kutta methods. We will illustrate their derivation for the second order scheme. For generality, we will assume that the ODE takes the form

$$\frac{du}{dt} = f(u, t), \qquad u(0) = u_0 \tag{4.53}$$

The derivation of the second order Runge Kutta method starts with the expression:

$$u^{(1)} = u^n + a_{21}\Delta t\, f(u^n, t_n) \tag{4.54}$$
$$u^{n+1} = u^n + b_1\Delta t\, f(u^n, t_n) + b_2\Delta t\, f(u^{(1)}, t_n + c_2\Delta t) \tag{4.55}$$

where $a_{21}$, $b_1$, $b_2$ and $c_2$ are constants that will be determined on the basis of accuracy. Variants of this approach have the constants determined on the basis of stability considerations, but in the following we follow the accuracy criterion.
The key to determining these constants is to match the Taylor series expansion of the exact solution with that of the approximation. Expanding u as a Taylor series in time we get:

$$u^{n+1} = u^n + \Delta t\,\frac{du}{dt} + \frac{\Delta t^2}{2!}\frac{d^2u}{dt^2} + O(\Delta t^3) \tag{4.56}$$

Notice that the ODE provides the information necessary to compute the derivatives in the Taylor series. Thus we have:

$$\frac{du}{dt} = f(u, t) \tag{4.57}$$
$$\frac{d^2u}{dt^2} = \frac{df}{dt} = \frac{\partial f}{\partial t} + \frac{\partial f}{\partial u}\frac{du}{dt} \tag{4.58}$$
$$= f_t + f_u f \tag{4.59}$$
$$\frac{d^3u}{dt^3} = \frac{df_t}{dt} + \frac{df_u}{dt}\,f + f_u\frac{df}{dt} \tag{4.60}$$
$$= f_{tt} + f_{ut}f + \left(f_{ut} + f_{uu}f\right)f + f_u\left(f_t + f_u f\right) \tag{4.61}$$
$$= f_{tt} + 2f_{ut}f + f_{uu}f^2 + f_u f_t + f_u^2 f \tag{4.62}$$

Replacing these derivatives in the Taylor series expression (and retaining the third order term) we get:

$$u^{n+1} = u^n + \Delta t\,f(u^n, t_n) + \frac{\Delta t^2}{2!}\left[f_u f + f_t\right]_{t_n} + \frac{\Delta t^3}{3!}\left[f_{tt} + 2f_{ut}f + f_{uu}f^2 + f_u f_t + f_u^2 f\right]_{t_n} + O(\Delta t^4) \tag{4.63}$$

We now turn to expanding the expression for the proposed difference equations. We have to proceed carefully to include the effects of changes in both u and t in our expansion. We start by expanding the last term of equation (4.55) about the variable $u^n$:

$$f(u^n + a_{21}\Delta t f,\; t_n + c_2\Delta t) = f(u^n, t_n + c_2\Delta t) + a_{21}\Delta t\, f f_u + \frac{(a_{21}\Delta t f)^2}{2!}\,f_{uu} + O(\Delta t^3) \tag{4.64}$$

Now each term in expansion (4.64) is expanded in the t variable about time $t_n$ to get:

$$f(u^n, t_n + c_2\Delta t) = f(u^n, t_n) + c_2\Delta t\, f_t + \frac{(c_2\Delta t)^2}{2!}\,f_{tt} + O(\Delta t^3) \tag{4.65}$$
$$f_u(u^n, t_n + c_2\Delta t) = f_u(u^n, t_n) + f_{ut}(u^n, t_n)\,c_2\Delta t + O(\Delta t^2) \tag{4.66}$$
$$f_{uu}(u^n, t_n + c_2\Delta t) = f_{uu}(u^n, t_n) + O(\Delta t) \tag{4.67}$$

Substituting these expressions in expansion (4.64) we get the two-variable Taylor series expansion for f. The whole expression is then inserted in (4.55) to get:

$$u^{n+1} = u^n + (b_1 + b_2)\,f\,\Delta t + \left(b_2 a_{21} f f_u + b_2 c_2 f_t\right)\Delta t^2 + \left(\frac{b_2 c_2^2}{2}\,f_{tt} + b_2 a_{21} c_2\, f f_{ut} + \frac{b_2 a_{21}^2}{2}\,f^2 f_{uu}\right)\Delta t^3 + O(\Delta t^4) \tag{4.68}$$

Matching the expansions (4.68) and (4.63) term by term we get the following equations for the different constants.
$$b_1 + b_2 = 1, \qquad 2 b_2 a_{21} = 1, \qquad 2 b_2 c_2 = 1 \tag{4.69}$$

The Δt³ terms cannot all be matched (the term $f_u^2 f$ in (4.63) has no counterpart in (4.68)), so the resulting schemes are second order accurate. A solution can be found in terms of the parameter $b_2$, as follows:

$$b_1 = 1 - b_2, \qquad a_{21} = \frac{1}{2b_2}, \qquad c_2 = \frac{1}{2b_2} \tag{4.70}$$

A family of second order Runge Kutta schemes can be obtained by varying $b_2$. Two common choices are

• Midpoint rule, with $b_2 = 1$, so that $b_1 = 0$ and $a_{21} = c_2 = 1/2$. The scheme becomes:

$$u^{(1)} = u^n + \frac{\Delta t}{2}\,f(u^n, t_n) \tag{4.71}$$
$$u^{n+1} = u^n + \Delta t\,f\!\left(u^{(1)}, t_n + \frac{\Delta t}{2}\right) \tag{4.72}$$

The first phase of the midpoint rule is a forward Euler half step, followed by a centered approximation at the mid-time level.

• Heun rule, with $b_2 = 1/2$ (so that $b_1 = 1/2$) and $a_{21} = c_2 = 1$:

$$u^{(1)} = u^n + \Delta t\,f(u^n, t_n) \tag{4.73}$$
$$u^{n+1} = u^n + \frac{\Delta t}{2}\left[f(u^{(1)}, t_n + \Delta t) + f(u^n, t_n)\right] \tag{4.74}$$

The first step is a forward Euler full step, followed by a centered step with the averaged slope.

Higher order Runge Kutta schemes can be derived by introducing additional stages between the two time levels; their derivation is however very complicated (for more information see Butcher (1987), and see Dormand (1996) for a more readable account). Here we limit ourselves to listing the algorithms for common third and fourth order Runge Kutta schemes.

• RK3

$$\begin{aligned} q_1 &= \Delta t\,f(u^n, t_n), & u^{(1)} &= u^n + \tfrac{1}{3}\,q_1 \\ q_2 &= \Delta t\,f\!\left(u^{(1)}, t_n + \tfrac{\Delta t}{3}\right) - \tfrac{5}{9}\,q_1, & u^{(2)} &= u^{(1)} + \tfrac{15}{16}\,q_2 \\ q_3 &= \Delta t\,f\!\left(u^{(2)}, t_n + \tfrac{3\Delta t}{4}\right) - \tfrac{153}{128}\,q_2, & u^{n+1} &= u^{(2)} + \tfrac{8}{15}\,q_3 \end{aligned} \tag{4.75}$$

• RK4: The fourth order RK scheme is the most well-known for its accuracy and large stability region. It is:

$$\begin{aligned} q_1 &= \Delta t\,f(u^n, t_n) \\ q_2 &= \Delta t\,f\!\left(u^n + \tfrac{q_1}{2},\, t_n + \tfrac{\Delta t}{2}\right) \\ q_3 &= \Delta t\,f\!\left(u^n + \tfrac{q_2}{2},\, t_n + \tfrac{\Delta t}{2}\right) \\ q_4 &= \Delta t\,f\!\left(u^n + q_3,\, t_n + \Delta t\right) \\ u^{n+1} &= u^n + \frac{q_1 + 2q_2 + 2q_3 + q_4}{6} \end{aligned} \tag{4.76}$$

4.7.2 Remarks on RK schemes

The following is a quick summary of the properties of RK schemes of different orders.

1. Implicit Runge Kutta time steps are possible.
2. Runge Kutta offers high order integration with only information at two time levels. Automatic step size control is easy since the order of the method does not depend on maintaining the same step size as the calculation proceeds. Several strategies are then possible to maximize accuracy and reduce CPU cost.
3. For order less than or equal to 4, only one stage is required per additional order, which is optimal. For higher order, it is known that the number of stages exceeds the order of the method. For example, a fifth order method requires 6 stages, and an eighth order RK scheme requires a minimum of 11 stages.
4. Runge Kutta schemes require multiple evaluations of the right hand side per time step. This can be quite costly if a large system of simultaneous ODEs is involved. Alternatives to the RK steps are multi-level schemes.
5. A new family of Runge Kutta schemes was devised in recent years to cope with the requirements of Total Variation Diminishing (TVD) schemes. For second order methods, the Heun scheme is TVD. The third order TVD Runge Kutta scheme is

$$\begin{aligned} q_1 &= \Delta t\,f(u^n, t_n), & u^{(1)} &= u^n + q_1 \\ q_2 &= \Delta t\,f(u^{(1)}, t_n + \Delta t), & u^{(2)} &= \tfrac{3}{4}\,u^n + \tfrac{1}{4}\,u^{(1)} + \tfrac{1}{4}\,q_2 \\ q_3 &= \Delta t\,f\!\left(u^{(2)}, t_n + \tfrac{\Delta t}{2}\right), & u^{n+1} &= \tfrac{1}{3}\,u^n + \tfrac{2}{3}\,u^{(2)} + \tfrac{2}{3}\,q_3 \end{aligned} \tag{4.77}$$

[Figure 4.3: Stability regions in the complex κΔt plane for the Runge-Kutta methods of order 1, 2, 3 and 4 (left panel), and for the Adams Bashforth schemes of order 2 and 3 (right panel). The RK2 and AB2 stability curves are tangent to the imaginary axis at the origin, and hence these methods are not stable for purely imaginary κΔt.]

4.7.3 Multi Time Levels Methods

The Runge-Kutta methods achieve their accuracy by evaluating the right hand side function at intermediate time levels. The cost of this accuracy is multiple evaluations of f for an integration of size Δt. This may prove to be expensive if we are looking at complicated right hand sides and/or systems of ODEs. An alternative is to use information prior to $t_n$ to increase the order of accuracy at $t_{n+1}$.
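The cost comparison above is easier to appreciate with the Runge-Kutta steps of Section 4.7.1 written out: each step calls f once per stage. The Python sketch below implements the midpoint (4.71)-(4.72), Heun (4.73)-(4.74), TVD RK3 (4.77) and RK4 (4.76) steps for a scalar ODE du/dt = f(u, t). The test problems, step counts and function names are illustrative assumptions. It estimates the observed order of each scheme from two step sizes, runs a non-autonomous check, and evaluates the amplification factor A(z) at two points to show the imaginary-axis behaviour seen in Figure 4.3.

```python
import math


def midpoint_step(f, u, t, dt):
    """Midpoint rule, eqs. (4.71)-(4.72)."""
    u1 = u + 0.5 * dt * f(u, t)
    return u + dt * f(u1, t + 0.5 * dt)


def heun_step(f, u, t, dt):
    """Heun rule, eqs. (4.73)-(4.74)."""
    u1 = u + dt * f(u, t)
    return u + 0.5 * dt * (f(u1, t + dt) + f(u, t))


def tvd_rk3_step(f, u, t, dt):
    """Third order TVD Runge-Kutta scheme, eq. (4.77)."""
    u1 = u + dt * f(u, t)
    u2 = 0.75 * u + 0.25 * u1 + 0.25 * dt * f(u1, t + dt)
    return u / 3 + 2 * u2 / 3 + 2 * dt * f(u2, t + 0.5 * dt) / 3


def rk4_step(f, u, t, dt):
    """Classical fourth order Runge-Kutta scheme, eq. (4.76)."""
    q1 = dt * f(u, t)
    q2 = dt * f(u + q1 / 2, t + dt / 2)
    q3 = dt * f(u + q2 / 2, t + dt / 2)
    q4 = dt * f(u + q3, t + dt)
    return u + (q1 + 2 * q2 + 2 * q3 + q4) / 6


def integrate(step, f, u0, t_final, n_steps):
    """Take n_steps equal steps of size t_final / n_steps starting at t = 0."""
    dt = t_final / n_steps
    u = u0
    for n in range(n_steps):
        u = step(f, u, n * dt, dt)
    return u


SCHEMES = {"midpoint": midpoint_step, "heun": heun_step,
           "tvd_rk3": tvd_rk3_step, "rk4": rk4_step}

# Observed order on du/dt = -u, u(0) = 1, exact solution exp(-t)
observed_order = {}
for name, step in SCHEMES.items():
    e1 = abs(integrate(step, lambda u, t: -u, 1.0, 1.0, 40) - math.exp(-1.0))
    e2 = abs(integrate(step, lambda u, t: -u, 1.0, 1.0, 80) - math.exp(-1.0))
    observed_order[name] = math.log2(e1 / e2)  # halving dt divides error by 2**order
    print(f"{name:9s} observed order {observed_order[name]:.2f}")


# Non-autonomous check: du/dt = -u + sin t, u(0) = 1,
# exact solution u(t) = 1.5 exp(-t) + (sin t - cos t) / 2
def forced(u, t):
    return -u + math.sin(t)


exact_forced = 1.5 * math.exp(-2.0) + 0.5 * (math.sin(2.0) - math.cos(2.0))
forced_error = {name: abs(integrate(step, forced, 1.0, 2.0, 80) - exact_forced)
                for name, step in SCHEMES.items()}


def rk_amplification(step, z):
    """Amplification factor A(z): one step of du/dt = z u with dt = 1, u = 1."""
    return step(lambda u, t: z * u, 1.0, 0.0, 1.0)


print(abs(rk_amplification(heun_step, 0.5j)))  # slightly above 1: RK2 amplifies
print(abs(rk_amplification(rk4_step, 2.8j)))   # below 1 at this imaginary point
```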
Leap Frog scheme

The leap frog scheme consists of using a centered difference in time at level n:

$$u^{n+1} = u^{n-1} + 2\Delta t\,f(u^n, t_n) \tag{4.78}$$

It is easy to show that the truncation error is of size O(Δt²). Moreover, unlike the trapezoidal scheme, it is explicit in the unknown $u^{n+1}$, and hence involves neither nonlinear complications nor systems of equations. For our model equation, the leap frog scheme takes the form:

$$u^{n+1} = u^{n-1} + 2\kappa\Delta t\,u^n \tag{4.79}$$

The determination of the amplification factor is complicated by the fact that three time levels are involved in the calculations. Nevertheless, let us assume that the amplification factor is the same for each time step, i.e. $u^n = Au^{n-1}$ and $u^{n+1} = Au^n$. We then arrive at the following equation:

$$A^2 - 2zA - 1 = 0 \tag{4.80}$$

There are two solutions to this quadratic equation:

$$A_\pm = z \pm \sqrt{1 + z^2} \tag{4.81}$$

In the limit of good resolution, |z| → 0, we have $A_+ \to 1$ and $A_- \to -1$. The numerical solution is capable of behaving in two different ways, or modes. The mode associated with $A_+$ is referred to as the physical mode because it approximates the solution to the original differential equation. The mode associated with $A_-$ is referred to as the computational mode since it arises solely as an artifact of the numerical procedure. The origin of the computational mode can be traced back to the fact that the leap frog scheme is an approximation to a higher order equation that requires more initial conditions than necessary for the original ODE. To show this consider the trivial case of κ = 0, where the analytical solution is simply given by $u_a = u^0$; here $u^0$ is the initial condition. The amplification factors for the leap frog scheme are $A_+ = 1$ and $A_- = -1$, and hence the computational mode is expected to keep its amplitude but switch sign at every time step. Applying the leap frog scheme we see that all even time levels will have the correct value: $u^2 = u^4 = \ldots = u^0$.
The odd time levels will be contaminated by the error in estimating the second initial condition needed to jump start the calculations. If $u^1 = u^0 + \epsilon$, where ε is the initial error committed, the solution at all odd time levels will then be $u^{2n+1} = u^0 + \epsilon$. The numerical solution for the present simple case can be written entirely in terms of the physical (initial condition) and computational (initial condition error) modes:

$$u^n = u^0 + \frac{\epsilon}{2} - (-1)^n\,\frac{\epsilon}{2} \tag{4.82}$$

Absolute stability requires that $|A_\pm| \le 1$; notice however that the product of the two roots is $A_+A_- = -1$, which implies that $|A_+|\,|A_-| = 1$. Hence, if one root, say $A_+$, is stable with $|A_+| < 1$, the other one must be unstable with $|A_-| = 1/|A_+| > 1$; the only exception is when both amplification factors are neutral, $|A_+| = |A_-| = 1$. For real nonzero z (Im(z) = 0), one of the two roots has modulus exceeding 1, and the scheme is always unstable. Let us for a moment assume that z = iλ; we then have $A_\pm = i\lambda \pm \sqrt{1 - \lambda^2}$. If |λ| ≤ 1 then the quantity under the square root sign is non-negative and we have two roots such that $|A_+| = |A_-| = 1$. To make further progress on studying the stability of the leap frog scheme, let z = sinh(w) where w is a complex number. Using the identity $\cosh^2 w - \sinh^2 w = 1$ we arrive at the expression $A_\pm = \sinh w \pm \cosh w$. Setting w = a + ib, where a and b are real, substituting in the previous expression for the amplification factor, and calculating its modulus, we get $|A_\pm| = e^{\pm a}$. Hence a = 0 for both amplification factors to be stable. The region of stability is hence z = i sin b, where b is real, and is confined to the unit slit along the imaginary axis, |Im(z)| ≤ 1. The leap frog scheme is a popular scheme to integrate PDEs of primarily hyperbolic type in spite of the existence of the computational mode. The reason lies primarily in its neutral stability and good phase properties.
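The computational mode is easy to exhibit numerically. The Python sketch below runs the leap frog recursion (4.79) for κ = 0 with a perturbed second starting value, compares the result with the two-mode solution (4.82), and evaluates the roots (4.81) at a few sample values of z (arbitrary choices) to illustrate that $|A_+||A_-| = 1$ and that both roots are neutral only on the segment |Im(z)| ≤ 1 of the imaginary axis. The function names are hypothetical.

```python
import cmath


def leapfrog(kappa, dt, u0, u1, n_steps):
    """Leap frog for du/dt = kappa u, eq. (4.79); returns [u^0, u^1, ..., u^n_steps]."""
    history = [u0, u1]
    for _ in range(n_steps - 1):
        history.append(history[-2] + 2 * kappa * dt * history[-1])
    return history


# kappa = 0 with a slightly wrong second starting value u^1 = u^0 + eps
u0, eps = 1.0, 1e-3
history = leapfrog(0.0, 0.1, u0, u0 + eps, 10)
predicted = [u0 + eps / 2 - (-1) ** n * eps / 2 for n in range(len(history))]  # eq. (4.82)
print(history)


def leapfrog_roots(z):
    """Roots A_plus, A_minus of A**2 - 2 z A - 1 = 0, eq. (4.81)."""
    root = cmath.sqrt(1 + z * z)
    return z + root, z - root


for z in (0.5j, 1.0j, 1.5j, 0.1):
    a_plus, a_minus = leapfrog_roots(z)
    print(z, abs(a_plus), abs(a_minus), abs(a_plus * a_minus))  # product modulus is 1
```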
The control of the computational mode can be effectively achieved either with an Asselin time filter (see Durran (1999)) or by periodically discarding the solution at level n − 1 and taking a step with a two time level scheme.

Multi-Step schemes

A family of multi-step schemes can be built upon interpolating the right hand side of the ODE in the interval $[t_n, t_{n+1}]$ and performing the integral. The derivation starts from the exact solution to the ODE:

$$u^{n+1} = u^n + \int_{t_n}^{t_{n+1}} f(u, t)\,dt \tag{4.83}$$

Since the integrand is unknown in $[t_n, t_{n+1}]$ we need to find a way to approximate it given information at specific time levels. A simple way to achieve this is to use a polynomial that interpolates the integrand at the points $(t_k, f^k)$, n − p ≤ k ≤ n, where the solution is known. If we write:

$$f(u, t) = V_p(t) + E_p(t) \tag{4.84}$$

where $V_p$ is the polynomial approximation and $E_p$ the error associated with it, then the numerical scheme becomes:

$$u^{n+1} = u^n + \int_{t_n}^{t_{n+1}} V_p(t)\,dt + \int_{t_n}^{t_{n+1}} E_p(t)\,dt \tag{4.85}$$

If the integration of $V_p$ is performed exactly, then the only approximation errors present are due to the integration of the interpolation error term; this term can be bounded by $\max|E_p|\,\Delta t$. The explicit family of Adams Bashforth schemes relies on Lagrange interpolation. Specifically,

$$V_p(t) = \sum_{k=0}^{p} h_k^p(t)\,f^{n-k} \tag{4.86}$$
$$h_k^p(t) = \prod_{m=0,\,m\neq k}^{p}\frac{t - t_{n-m}}{t_{n-k} - t_{n-m}} \tag{4.87}$$
$$= \frac{t - t_n}{t_{n-k} - t_n}\cdots\frac{t - t_{n-(k-1)}}{t_{n-k} - t_{n-(k-1)}}\;\frac{t - t_{n-(k+1)}}{t_{n-k} - t_{n-(k+1)}}\cdots\frac{t - t_{n-p}}{t_{n-k} - t_{n-p}} \tag{4.88}$$

It is easy to verify that $h_k^p(t)$ is a polynomial of degree p in t, and that $h_k^p(t_{n-m}) = 0$ for m ≠ k and $h_k^p(t_{n-k}) = 1$. These last two properties ensure that $V_p(t_{n-k}) = f^{n-k}$. The error associated with the Lagrange interpolation with p + 1 points is $O(\Delta t^{p+1})$. Inserting the expression for $V_p$ in the numerical scheme, we get:

$$u^{n+1} = u^n + \sum_{k=0}^{p} f^{n-k}\int_{t_n}^{t_{n+1}} h_k^p(t)\,dt + \Delta t\,O(\Delta t^{p+1}) \tag{4.89}$$
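On equally spaced time levels (an assumption of this sketch) the weights in (4.89) depend only on p, so they can be computed once and for all. The Python sketch below does this exactly with rational arithmetic, working in the scaled variable s = (t − t_n)/Δt so that $t_{n-m}$ maps to s = −m and $\int_{t_n}^{t_{n+1}} h_k^p\,dt = \Delta t\int_0^1 h_k^p\,ds$. The helper names are hypothetical. For p = 1 and p = 2 it reproduces the coefficients of (4.93) and (4.95); p = 0 recovers forward Euler.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists in increasing powers."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def adams_bashforth_weights(p):
    """Weights w_k, k = 0..p, with u^{n+1} = u^n + dt * sum_k w_k f^{n-k}."""
    weights = []
    for k in range(p + 1):
        # Lagrange basis h_k^p(s) = prod_{m != k} (s + m) / (m - k), eq. (4.87)
        poly = [Fraction(1)]
        for m in range(p + 1):
            if m != k:
                poly = poly_mul(poly, [Fraction(m, m - k), Fraction(1, m - k)])
        # Exact integral of the basis polynomial over s in [0, 1]
        weights.append(sum(c / (i + 1) for i, c in enumerate(poly)))
    return weights


for p in range(4):
    print(p, [str(w) for w in adams_bashforth_weights(p)])
```

Exact fractions avoid any round-off in the coefficients; a floating point version would only need `Fraction` replaced by `float`.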
Note that the error appearing in the above formula is only the local error; the global error is one order less, i.e. it is $O(\Delta t^{p+1})$. We illustrate the application of this procedure by considering the derivation of its second and third order variants. The second order scheme requires p = 1. Hence, we write:

$$V_1(t) = \frac{t - t_{n-1}}{t_n - t_{n-1}}\,f^n + \frac{t - t_n}{t_{n-1} - t_n}\,f^{n-1} \tag{4.90}$$

The integral on the interval $[t_n, t_{n+1}]$ is

$$\int_{t_n}^{t_{n+1}} V_1(t)\,dt = f^n\int_{t_n}^{t_{n+1}}\frac{t - t_{n-1}}{t_n - t_{n-1}}\,dt + f^{n-1}\int_{t_n}^{t_{n+1}}\frac{t - t_n}{t_{n-1} - t_n}\,dt \tag{4.91}$$
$$= \Delta t\left(\frac{3}{2}f^n - \frac{1}{2}f^{n-1}\right) \tag{4.92}$$

The final expression for the second order Adams Bashforth formula is:

$$u^{n+1} = u^n + \Delta t\left(\frac{3}{2}f^n - \frac{1}{2}f^{n-1}\right) + O(\Delta t^3) \tag{4.93}$$

A third order formula can be designed similarly, starting with the quadratic interpolation polynomial $V_2(t)$:

$$V_2(t) = \frac{[t - t_{n-1}][t - t_{n-2}]}{[t_n - t_{n-1}][t_n - t_{n-2}]}\,f^n + \frac{[t - t_n][t - t_{n-2}]}{[t_{n-1} - t_n][t_{n-1} - t_{n-2}]}\,f^{n-1} + \frac{[t - t_n][t - t_{n-1}]}{[t_{n-2} - t_n][t_{n-2} - t_{n-1}]}\,f^{n-2} \tag{4.94}$$

Its integral can be evaluated and plugged into equation (4.89) to get:

$$u^{n+1} = u^n + \Delta t\left(\frac{23}{12}f^n - \frac{16}{12}f^{n-1} + \frac{5}{12}f^{n-2}\right) \tag{4.95}$$

The stability of the AB2 scheme can be easily determined for the sample problem. The amplification factors are the roots of the equation

$$A^2 - \left(1 + \frac{3}{2}z\right)A + \frac{z}{2} = 0 \tag{4.96}$$

and the two roots are:

$$A_\pm = \frac{1}{2}\left[1 + \frac{3}{2}z \pm \sqrt{\left(1 + \frac{3}{2}z\right)^2 - 2z}\,\right] \tag{4.97}$$

Like the leap frog scheme, AB2 suffers from the existence of a computational mode. In the limit of good resolution, z → 0, we have $A_+ \to 1$ and $A_- \to 0$; that is, the computational mode is heavily damped. Figure 4.4 shows the modulus of the physical and computational modes for Re(z) = 0 and 0 ≤ Im(z) ≤ 1. The modulus of the computational mode amplification factor is quite small for the entire range of z considered. On the other hand, the physical mode is unstable for purely imaginary z, as the modulus of its amplification factor exceeds 1.

[Figure 4.4: Modulus of the amplification factor |A±| for the physical and computational modes of AB2 when Re(κ) = 0, plotted against κΔt.]

Note, however, that a series expansion of $|A_+|$ for small z = iλΔt shows that $|A_+| \approx 1 + (\lambda\Delta t)^4/4$, and hence the instability grows very slowly for sufficiently small Δt. It can be anticipated that AB3 will have one physical mode and two computational modes, since its stability analysis leads to a third order equation for the amplification factor. Like AB2, AB3 strongly damps the two computational modes; it has the added benefit of providing conditional stability for purely imaginary z (Re(z) = 0). The complete stability regions for AB2 and AB3 are shown in the right panel of Figure 4.3.

Like all multilevel schemes, Adams Bashforth methods have some disadvantages. First, a starting method is required to jump start the calculations. Second, the stability region shrinks with the order of the method. The good news is that although AB2 is unstable for imaginary κ, its instability is small and tolerable for finite integration times. The third order Adams Bashforth scheme, on the other hand, includes a portion of the imaginary axis in its stability region, which makes AB3 quite valuable for the integration of advection-like operators. The main advantage of AB schemes over Runge-Kutta is that they require but one evaluation of the right hand side per time step and use a similar amount of storage.

4.8 Strongly Stable Schemes

Occasionally we are interested in enlarging the stability region as much as possible, while maintaining a high convergence order. The lowest order scheme of that sort is the backward difference formula. We now look for equivalent higher order formulas. The common thread is to evaluate the derivative term at the next time level. The Taylor series expansion of $u^{n-k+1}$ about time level n + 1 is:

$$u^{n-k+1} = u^{n+1} - \frac{(k\Delta t)}{1!}\frac{du}{dt} + \frac{(k\Delta t)^2}{2!}\frac{d^2u}{dt^2} - \frac{(k\Delta t)^3}{3!}\frac{d^3u}{dt^3} + \frac{(k\Delta t)^4}{4!}\frac{d^4u}{dt^4} + \ldots \tag{4.98}$$

where k = 1, ..., p. Multiplying each of these expansions by a coefficient $a_k$ and adding the individual terms, we get:

$$\sum_{k=1}^{p} a_k u^{n-k+1} = \left(\sum_{k=1}^{p} a_k\right)u^{n+1} - \left(\sum_{k=1}^{p} k\,a_k\right)\frac{\Delta t}{1!}\frac{du}{dt} + \left(\sum_{k=1}^{p} k^2 a_k\right)\frac{\Delta t^2}{2!}\frac{d^2u}{dt^2} - \left(\sum_{k=1}^{p} k^3 a_k\right)\frac{\Delta t^3}{3!}\frac{d^3u}{dt^3} + \ldots + (-1)^m\left(\sum_{k=1}^{p} k^m a_k\right)\frac{\Delta t^m}{m!}\frac{d^mu}{dt^m} + \ldots \tag{4.99}$$

For a p-th order expression, we require the coefficients of the higher order derivatives, 2 through p, to vanish; this yields p − 1 homogeneous algebraic equations. For a non-trivial solution we need to append one more condition, which we choose to be that the sum of the unknown coefficients is equal to one. This yields the following system of equations

$$\sum_{k=1}^{p} a_k = 1 \tag{4.100}$$
$$\sum_{k=1}^{p} k^m a_k = 0, \qquad m = 2, 3, \ldots, p \tag{4.101}$$

for the p coefficients $a_1, a_2, \ldots, a_p$. In matrix form we have

$$\begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2^2 & 3^2 & \cdots & p^2 \\ 1 & 2^3 & 3^3 & \cdots & p^3 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 2^p & 3^p & \cdots & p^p \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \tag{4.102}$$

The solution of this system for p = 1, 2, 3 and 4 is shown in Table 4.1.

Table 4.1: Coefficients of the Backward Difference Formula of order 1, 2, 3 and 4.

$$\begin{array}{c|cccc|c} p & a_1 & a_2 & a_3 & a_4 & \sum_{k=1}^{p} k\,a_k \\ \hline 1 & 1 & & & & 1 \\ 2 & 4/3 & -1/3 & & & 2/3 \\ 3 & 18/11 & -9/11 & 2/11 & & 6/11 \\ 4 & 48/25 & -36/25 & 16/25 & -3/25 & 12/25 \end{array}$$

The corresponding expressions are:

$$u^{n+1} = u^n + \Delta t\,u_t\big|^{n+1} + O(\Delta t) \tag{4.103}$$
$$u^{n+1} = \frac{4}{3}u^n - \frac{1}{3}u^{n-1} + \frac{2}{3}\Delta t\,u_t\big|^{n+1} + O(\Delta t^2) \tag{4.104}$$
$$u^{n+1} = \frac{18}{11}u^n - \frac{9}{11}u^{n-1} + \frac{2}{11}u^{n-2} + \frac{6}{11}\Delta t\,u_t\big|^{n+1} + O(\Delta t^3) \tag{4.105}$$
$$u^{n+1} = \frac{48}{25}u^n - \frac{36}{25}u^{n-1} + \frac{16}{25}u^{n-2} - \frac{3}{25}u^{n-3} + \frac{12}{25}\Delta t\,u_t\big|^{n+1} + O(\Delta t^4) \tag{4.106}$$

Notice that we have shown explicitly the time level at which the time derivative is approximated. The BDF schemes lead to implicit expressions to update the solution at time level n + 1. Like the Adams-Bashforth formulas, the BDF schemes require a starting method. They also generate computational modes whose number depends on how many previous time levels have been used. Their most important advantage is their stability regions in the complex plane, which are much larger than those of equivalent explicit schemes.

4.8.1 Stability of BDF

We investigate the stability of the BDF schemes for the simple case where f(u, t) = κu. It has already been shown that the backward difference scheme is stable in the entire complex plane save for the inside of the unit circle centered at (1, 0). The equation for the BDF2 amplification factor is easily derived:

$$\left(1 - \frac{2}{3}z\right)A^2 - \frac{4}{3}A + \frac{1}{3} = 0 \tag{4.107}$$

and admits the two roots:

$$A_\pm = \frac{2 \pm \sqrt{1 + 2z}}{3 - 2z} \tag{4.108}$$

The root with the plus sign is the physical mode, while the root with the minus sign is the computational mode. In the limit z → 0, we have $A_+ \to 1$ and $A_- \to 1/3$; the computational mode is hence naturally damped. Figure 4.5 shows the stability regions for the BDF schemes, in the form of the contours |A| = 1. The schemes are unstable within the enclosed regions and stable outside them. The instability region grows with increasing order.

[Figure 4.5: Stability regions for the Backward Differencing schemes of order 1, 2 and 3 (BDF1, BDF2, BDF3). The schemes are unstable within the enclosed region and stable everywhere else. The instability regions grow with the order. The stability regions are symmetric about the real axis.]

4.9 Systems of ODEs

The equations to be solved can form a system of equations:

$$\frac{du}{dt} = Lu \tag{4.109}$$

where now u represents a vector of unknowns and L is a matrix. The preceding schemes can all be made to work with the system of equations by treating all the components in a consistent manner. The major problem is not computational per se, but conceptual, and concerns the stability of a system of equations.
For example, backward differencing the system leads to the following set of equations for the unknowns at the next time level:

$$\frac{\mathbf{u}^{n+1} - \mathbf{u}^{n}}{\Delta t} = L\mathbf{u}^{n+1}, \quad\text{or}\quad \mathbf{u}^{n+1} = P\mathbf{u}^{n}, \quad P = (I - \Delta t L)^{-1} \tag{4.110}$$

If we denote the exact solution of the system by $\mathbf{v}$, then the error between the numerical and exact solutions is $\mathbf{e}^n = \mathbf{u}^n - \mathbf{v}(t^n)$. Several vector norms are useful for measuring the error, namely the 1-, 2- or $\infty$-norms. The numerical solution converges if $\|\mathbf{e}^n\| \to 0$ as $\Delta t \to 0$. The concept of stability also carries over. For a linear system at least, the solution evolves as $\mathbf{u}^n = P^n\mathbf{u}^0$, and hence the solution will remain bounded if

$$\|\mathbf{u}^n\| = \|P^n\mathbf{u}^0\| \le C\|\mathbf{u}^0\| \tag{4.111}$$

We now have to worry about the effect of the amplification matrix $P$. The difficulty in the stability analysis is that $\|P^n\|$ is hard to relate to $\|P\|$. The matrix norms guarantee that $\|P^n\| \le \|P\|^n$, so requiring $\|P\| \le 1$ ensures stability. This condition is sufficient but not necessary. Since the spectral radius is a lower bound on every induced matrix norm, it is necessary to require $\rho(P) \le 1$. Suppose $P$ can be diagonalized as $P = V\Lambda V^{-1}$, which happens when it possesses a complete set of linearly independent eigenvectors. Then $\|P^n\| \le \|V\|\,\|V^{-1}\|\,\rho(P)^n$, and the requirement $\rho(P) \le 1$ is both necessary and sufficient.

Chapter 5 Numerical Solution of PDE's

5.1 Introduction

Suppose we are given a well-posed problem that consists of a partial differential equation

$$\frac{\partial u}{\partial t} = Lu \tag{5.1}$$

where $L$ is a differential operator, initial conditions

$$u(x,0) = u_0(x) \tag{5.2}$$

and appropriate boundary conditions. We now want to devise a numerical scheme, based on the finite difference method, that solves this well-posed problem. Let $v(x,t)$ be the exact solution of the problem.
Solving these equations by finite differences requires us to replace the continuous derivatives by discrete approximations, and to confine ourselves to the solution at a discrete set of space and time points. The numerical solution, denoted by $u$, is therefore determined at the discrete space points $x_j = j\Delta x$ and time points $t^n = n\Delta t$. We will use the notation $u_j^n = u(x_j, t^n)$. To be useful in modeling physical problems, the approximation must be consistent, stable, and convergent. We will define these important concepts shortly.

A simple example of this class of problems is the scalar advection equation in a single space dimension

$$u_t + cu_x = 0, \tag{5.3}$$

where $c$ is the advection speed. In this case $Lu = -cu_x$, and an appropriate boundary condition is to specify the value of $u$ at the upstream boundary. To make the discussion more concrete, let us illustrate the discretization process for this case. For simplicity we assume that $c$ is constant and positive. A simple finite difference scheme for advancing the solution from time level $n$ to $n+1$ uses a forward Euler scheme for the time derivative and a backward difference for the space derivative. This gives the following approximation to the PDE at point $(x_j, t^n)$:

$$\frac{u_j^{n+1} - u_j^n}{\Delta t} + c\,\frac{u_j^n - u_{j-1}^n}{\Delta x} = 0 \tag{5.4}$$

Equation (5.4) provides a simple formula for updating the solution at time level $n+1$ from the values at time level $n$:

$$u_j^{n+1} = (1-\mu)u_j^n + \mu u_{j-1}^n, \quad\text{where}\quad \mu = \frac{c\Delta t}{\Delta x} \tag{5.5}$$

The variable $\mu$ is known as the Courant number and will figure prominently in the study of the stability of the scheme. Equation (5.5) can be written as a matrix operation in the following form:

$$\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_j \\ \vdots \\ u_N \end{pmatrix}^{n+1} = \begin{pmatrix} 1 & & & & & \\ \mu & 1-\mu & & & & \\ & \ddots & \ddots & & & \\ & & \mu & 1-\mu & & \\ & & & \ddots & \ddots & \\ & & & & \mu & 1-\mu \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_j \\ \vdots \\ u_N \end{pmatrix}^{n} \tag{5.6}$$

Here we have assumed that the boundary condition is given by $u(x_1, t) = u_0(x_1)$. The following legitimate questions can now be asked:

1. Consistency: Is the discrete equation (5.4) a correct approximation to the continuous form, eq. (5.3)? Does this discrete form reduce to the PDE in the limit $\Delta t, \Delta x \to 0$?
2. Convergence: Does the numerical solution satisfy $u_j^n \to v_j^n$ as $\Delta t, \Delta x \to 0$?
3. Errors: What errors does the approximation commit, and how should they behave as the numerical resolution is increased?
4. Stability: Does the numerical solution remain bounded by the data specifying the problem, or do the numerical errors grow as the computations are carried out?

We will now define these concepts more precisely and indicate the role they play in devising finite difference schemes. We will return to their application in practical situations later.

5.1.1 Convergence

Let $e_j^n = u_j^n - v_j^n$ denote the error between the numerical and analytical solutions of the PDE at time $n\Delta t$ and point $j\Delta x$. If this error tends to 0 as the grid spacing and time step are decreased, the finite difference solution converges to the analytical solution. Moreover, a finite difference scheme is said to be convergent of order $(p, q)$ if $\|e^n\| = O(\Delta t^p, \Delta x^q)$ as $\Delta t, \Delta x \to 0$.

5.1.2 Truncation Error

If the analytical solution is inserted into the finite difference scheme, we expect a small residual to remain. This residual characterizes the error made in approximating the continuous form by a discrete form. A Taylor series analysis gives an expression for this residual in terms of higher-order derivatives of the solution.

5.1.3 Consistency

Loosely speaking, consistency addresses the question of whether the finite difference approximation really represents the partial differential equation.
A finite difference approximation is said to be consistent with a differential equation if the FD equations converge to the original equations as the time and space grids are refined. Hence, if the truncation error goes to zero under grid refinement, we conclude that the scheme is consistent.

5.1.4 Stability

The notion of stability is a little more complicated to define. Our primary concern is to make sure that numerical errors do not swamp the analytical solution. One way to ensure this is to require the solution to remain bounded by the initial data. One definition of stability is therefore the following requirement:

$$\|u^n\| \le C\|u^0\| \tag{5.7}$$

Here $C$ is a positive constant that may depend on the final integration time $t^n$ but not on the time or space increments. Notice that this definition of stability is very general and does not refer to the behavior of the continuous equation. If the latter is known to preserve the norm of the solution, then the more restrictive condition

$$\|u^n\| \le \|u^0\| \tag{5.8}$$

is more practical, particularly for finite $\Delta t$.

5.1.5 Lax-Richtmyer Equivalence Theorem

The Lax-Richtmyer equivalence theorem ties these different notions together. It states: "Given a properly posed linear initial value problem and a finite difference approximation to it that satisfies the consistency condition, stability is the necessary and sufficient condition for convergence." The theorem's value is that it guarantees convergence once two simpler conditions, consistency and stability, are satisfied. For general problems these two are considerably easier to check than convergence itself.

5.2 Truncation Error

We now analyze the truncation error for the simple advection equation. Inserting the exact solution $v$ into the finite difference form (5.4) gives:

$$\frac{v_j^{n+1} - v_j^n}{\Delta t} + c\,\frac{v_j^n - v_{j-1}^n}{\Delta x} = 0 \tag{5.9}$$

The Taylor series in time and space yield:

$$v_j^{n+1} = v_j^n + \Delta t\, v_t|_j^n + \frac{\Delta t^2}{2}v_{tt}|_j^n + O(\Delta t^3) \tag{5.10}$$

$$v_{j-1}^n = v_j^n - \Delta x\, v_x|_j^n + \frac{\Delta x^2}{2}v_{xx}|_j^n + O(\Delta x^3) \tag{5.11}$$

Substituting these expressions into equation (5.9) gives:

$$\underbrace{v_t + cv_x}_{\text{PDE}} = \underbrace{-\frac{\Delta t}{2}v_{tt} + \frac{c\Delta x}{2}v_{xx} + O(\Delta t^2, \Delta x^2)}_{\text{T.E.}} \tag{5.12}$$

The terms on the left-hand side of eq. (5.12) form the original PDE. All terms on the right-hand side belong to the truncation error; they are the residual by which the exact solution fails to satisfy the difference equation. For sufficiently small $\Delta t$ and $\Delta x$, the leading term in the truncation series is linear in both $\Delta t$ and $\Delta x$. Equation (5.12) can also be regarded as the true partial differential equation that the finite difference equation represents for finite $\Delta t$ and $\Delta x$. Analyzing the individual terms of the truncation error series gives valuable insight into the behavior of the numerical approximation, and forms the basis of the modified equation analysis. We will return to this issue later. For now it is enough to note that the truncation error tends to 0 as $\Delta t, \Delta x \to 0$, so the finite difference approximation is consistent.

5.3 The Lax-Richtmyer Theorem

A general formula for the evolution of the finite difference solution is:

$$\mathbf{u}^n = A\mathbf{u}^{n-1} + \mathbf{b}^{n-1} \tag{5.13}$$

Here $A$ is the evolution matrix, and $\mathbf{b}$ is a vector containing forcing terms and the effects of boundary conditions. The vector $\mathbf{u}^n$ holds the solution values at time level $n$. The truncation error at a specific time level is obtained by applying the same matrix operation to the vector of exact solution values:

$$\mathbf{v}^n = A\mathbf{v}^{n-1} + \mathbf{b}^{n-1} + \mathbf{z}^{n-1}\Delta t \tag{5.14}$$

where $\mathbf{z}^{n-1}$ is the vector of truncation errors at time level $n-1$. Subtracting equation (5.13) from (5.14) gives an evolution equation for the error $\mathbf{e}^n = \mathbf{v}^n - \mathbf{u}^n$. (This is the opposite sign convention to Section 5.1.1; the sign is immaterial because only norms of the error are used below.) The result is:

$$\mathbf{e}^n = A\mathbf{e}^{n-1} + \mathbf{z}^{n-1}\Delta t \tag{5.15}$$

Equation (5.15) shows that the error at time level $n$ is made up of two parts.
The first part, the first term on the right-hand side of eq. (5.15), is the evolution of the error inherited from the previous time level. The second part is the truncation error committed at the present time level. Since this expression applies to a generic time level, the same expression holds for $\mathbf{e}^{n-1}$:

$$\mathbf{e}^{n-1} = A\mathbf{e}^{n-2} + \mathbf{z}^{n-2}\Delta t \tag{5.16}$$

To simplify the discussion we have assumed that the matrix $A$ does not change with time, which is tantamount to assuming constant coefficients in the PDE. Repeated application of this argument gives:

$$\mathbf{e}^n = A^2\mathbf{e}^{n-2} + \left(A\mathbf{z}^{n-2} + \mathbf{z}^{n-1}\right)\Delta t \tag{5.17}$$

$$= A^3\mathbf{e}^{n-3} + \left(A^2\mathbf{z}^{n-3} + A\mathbf{z}^{n-2} + \mathbf{z}^{n-1}\right)\Delta t \tag{5.18}$$

$$\vdots$$

$$= A^n\mathbf{e}^0 + \left(A^{n-1}\mathbf{z}^0 + A^{n-2}\mathbf{z}^1 + \dots + A\mathbf{z}^{n-2} + \mathbf{z}^{n-1}\right)\Delta t \tag{5.19}$$

Equation (5.19) shows that the error growth depends on the truncation error at all time levels, and on the discretization through the matrix $A$. The triangle inequality gives a bound on the norm of the error:

$$\|\mathbf{e}^n\| \le \|A^n\|\,\|\mathbf{e}^0\| + \left(\|A^{n-1}\|\,\|\mathbf{z}^0\| + \|A^{n-2}\|\,\|\mathbf{z}^1\| + \dots + \|A\|\,\|\mathbf{z}^{n-2}\| + \|\mathbf{z}^{n-1}\|\right)\Delta t \tag{5.20}$$

To make further progress, we bound the norm of the truncation error at any time by a constant $\epsilon$ such that

$$\epsilon = \max_{0 \le m \le n-1}\|\mathbf{z}^m\| \tag{5.21}$$

The right-hand side of inequality (5.20) can then be bounded by

$$\|\mathbf{e}^n\| \le \|A^n\|\,\|\mathbf{e}^0\| + \epsilon\Delta t\sum_{m=0}^{n-1}\|A^m\| \tag{5.22}$$

The initial errors and the subsequent truncation errors are thus modulated by the evolution matrices $A^m$. To prevent unbounded growth of the error norm as $n \to \infty$, we must limit the norms of these matrices. This is, in effect, the stability property needed for convergence:

$$\|A^m\| \le C = \max_{0 \le m \le n}\|A^m\| \tag{5.23}$$

where $C$ is a constant independent of $n$, $\Delta t$ and $\Delta x$. The sum in (5.22) can be bounded by $nC$, and the final expression becomes:

$$\|\mathbf{e}^n\| \le C\left(\|\mathbf{e}^0\| + t^n\epsilon\right) \tag{5.24}$$

Here $t^n = n\Delta t$ is the final integration time. When $\Delta x \to 0$, the initial error $\|\mathbf{e}^0\|$ can be made as small as desired. Furthermore, by consistency, the truncation error $\epsilon \to 0$ when $\Delta t, \Delta x \to 0$.
The global error is therefore guaranteed to go to zero as the computational grid is refined, and the scheme is convergent.

5.4 The Von Neumann Stability Condition

The only requirements we have placed on the scheme for convergence are consistency and stability. The latter took the form:

$$\|A^m\| \le C \tag{5.25}$$

where $C$ is independent of $\Delta t$, $\Delta x$ and the number of time steps. By the properties of the matrix norm we have:

$$\|A^m\| \le \|A\|^m \tag{5.26}$$

so it is sufficient to require that $\|A\|^m \le C$, or that

$$\|A\| \le C^{1/m} = e^{\frac{\Delta t}{t^m}\ln C} = 1 + \frac{\ln C}{t^m}\Delta t + \dots = 1 + O(\Delta t) \tag{5.27}$$

where $t^m = m\Delta t$. The von Neumann stability condition is therefore

$$\|A\| \le 1 + O(\Delta t) \tag{5.28}$$

This stability condition says nothing about whether the continuous (exact) solution grows or decays in time. Furthermore, the condition is established for finite integration times in the limit $\Delta t \to 0$. In practice, computations are necessarily carried out with a small but finite $\Delta t$, and the evolution equation frequently bounds the growth of the solution. Since the numerical solution and its errors are subject to the same growth factors through the matrix $A$, it is reasonable, and in most cases essential, to require the stronger condition $\|A\| \le 1$ when the solution does not grow.

One practical detail remains to be settled: which norm should be used to measure the error? The properties of matrix norms immediately give $\rho(A) \le \|A\|$ for the spectral radius. Hence $\rho(A) \le 1$ is necessary for stability but not sufficient. For some classes of matrices $A$ it is sufficient, for example those with a complete set of orthogonal eigenvectors (normal matrices). The circulant matrices that arise when constant-coefficient hyperbolic equations are discretized on a periodic grid belong to this class. If the 1- or $\infty$-norm is used, the condition $\|A\| \le 1$ is sufficient for stability and easy to check, since these norms are the maximum absolute column and row sums of $A$.
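The norms discussed above are easy to evaluate numerically. The sketch below, in Python with NumPy (our choice; the notes do not prescribe a language), builds the FTBS evolution matrix of (5.5) and compares its 1-norm, $\infty$-norm and spectral radius for a few Courant numbers. It replaces the boundary row of (5.6) with a periodic grid, on which the upstream neighbour of the first point is the last point. The helper names `ftbs_matrix` and `norms_and_radius` are illustrative only.

```python
import numpy as np


def ftbs_matrix(n, mu):
    """Evolution matrix of u_j^{n+1} = (1 - mu) u_j^n + mu u_{j-1}^n.

    Assumes a periodic grid of n >= 2 points: the upstream neighbour of
    point 0 is point n - 1 (this replaces the boundary row of eq. 5.6).
    """
    A = (1.0 - mu) * np.eye(n)
    for j in range(n):
        A[j, (j - 1) % n] += mu  # weight of the upstream value u_{j-1}
    return A


def norms_and_radius(A):
    """Return (1-norm, infinity-norm, spectral radius) of A."""
    norm_1 = np.linalg.norm(A, 1)         # maximum absolute column sum
    norm_inf = np.linalg.norm(A, np.inf)  # maximum absolute row sum
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    return norm_1, norm_inf, rho


for mu in (0.5, 1.0, 1.2, -0.2):
    n1, ninf, rho = norms_and_radius(ftbs_matrix(8, mu))
    print(f"mu = {mu:5.2f}: ||A||_1 = {n1:.3f}, "
          f"||A||_inf = {ninf:.3f}, rho(A) = {rho:.3f}")
```

For $0 \le \mu \le 1$ all three quantities equal 1. Outside that range they equal $|1 - 2\mu|$. For this circulant matrix with an even number of points the three values coincide. With the boundary row of (5.6) instead, the first column sum becomes $1 + |\mu|$.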
Example 10

For the advection equation, the matrix $A$ given in equation (5.6) has norms

$$\|A\|_1 = \|A\|_\infty = |\mu| + |1-\mu| \tag{5.29}$$

For the 1-norm, the boundary row is ignored (equivalently, the grid is taken to be periodic). Stability therefore requires $|\mu| + |1-\mu| \le 1$. Three cases need to be considered:

1. $0 \le \mu \le 1$: $\|A\| = \mu + 1 - \mu = 1$, stable.
2. $\mu < 0$: $\|A\| = 1 - 2\mu > 1$, unstable.
3. $\mu > 1$: $\|A\| = 2\mu - 1 > 1$, unstable.

The scheme is hence guaranteed to converge when $0 \le \mu \le 1$.

5.5 Von Neumann Stability Analysis

Matrix analysis can be difficult to carry out for complicated PDEs, particularly since we must know the entire spectrum, or the norm, of the matrix before deriving useful stability criteria. Von Neumann devised a substantially easier stability test, one that does not involve matrices per se and reduces to evaluating scalars. The idea is to restrict attention to periodic problems and to consider the Fourier modes of the solution. Since the solution is periodic, it can be expanded in a Fourier series of the form:

$$u_j^n = \sum_k \hat{u}_k^n e^{ikx_j} \tag{5.30}$$

where $k$ is the wavenumber and $\hat{u}_k^n$ is its (complex) Fourier amplitude. Inserting this expression into the finite difference equation yields an expression for the amplification factor $A$, which depends on $k$, $\Delta x$ and $\Delta t$. Stability of every Fourier mode guarantees the stability of the entire solution. Hence $|A| \le 1$ for all Fourier modes is the necessary and sufficient stability condition for non-growing solutions.

Example 11

Inserting the Fourier series into the finite difference approximation of the advection equation gives, for each mode:

$$\hat{u}_k^{n+1}e^{ikx_j} = (1-\mu)\hat{u}_k^n e^{ikx_j} + \mu\hat{u}_k^n e^{ikx_{j-1}} \tag{5.31}$$

Since $x_{j-1} = x_j - \Delta x$, the exponential factor drops out, leaving the following expression for the growth of the Fourier coefficients:

$$\hat{u}_k^{n+1} = \underbrace{\left[(1-\mu) + \mu e^{-ik\Delta x}\right]}_{A}\,\hat{u}_k^n \tag{5.32}$$

The expression in brackets is simply the amplification factor for Fourier mode $k$.
Stability requires that $|A| \le 1$ for all $k$.

$$|A|^2 = AA^* = \left[(1-\mu) + \mu e^{-ik\Delta x}\right]\left[(1-\mu) + \mu e^{ik\Delta x}\right] \tag{5.33}$$

$$= (1-\mu)^2 + \mu(1-\mu)\left(e^{ik\Delta x} + e^{-ik\Delta x}\right) + \mu^2 \tag{5.34}$$

$$= 1 - 2\mu + 2\mu(1-\mu)\cos k\Delta x + 2\mu^2 \tag{5.35}$$

$$= 1 - 2\mu(1 - \cos k\Delta x) + 2\mu^2(1 - \cos k\Delta x) \tag{5.36}$$

$$= 1 - 4\mu(1-\mu)\sin^2\frac{k\Delta x}{2} \tag{5.37}$$

It is now clear that $|A|^2 \le 1$ if $\mu(1-\mu) \ge 0$, that is, $0 \le \mu \le 1$. Conversely, $|A| > 1$ at $k\Delta x = \pi$ when $\mu(1-\mu) < 0$. This is the same stability criterion as the one derived by matrix analysis.

5.6 Modified Equation

The truncation error series used to establish the consistency of the scheme also yields additional information about the expected behavior of the numerical scheme. The motivation is that the finite difference scheme is in fact solving a perturbed form of the original equation. Equations (5.9) and (5.12) establish that the FTBS scheme approximates the advection equation to first order, $O(\Delta t, \Delta x)$. They also show that FTBS approximates the following equation to second order in time and space:

$$v_t + cv_x = -\frac{\Delta t}{2}v_{tt} + \frac{c\Delta x}{2}v_{xx} + O(\Delta t^2, \Delta x^2) \tag{5.38}$$

The second term on the right-hand side of equation (5.38) has the form of a diffusion-like operator, so we expect it to cause a gradual decrease in the amplitude of the solution. The time derivative term is harder to interpret. To proceed, we express $v_{tt}$ in terms of spatial derivatives. Differentiating equation (5.38) once with respect to time and once with respect to space gives:

$$v_{tt} + cv_{xt} = -\frac{\Delta t}{2}v_{ttt} + \frac{c\Delta x}{2}v_{txx} + O(\Delta t^2, \Delta x^2) \tag{5.39}$$

$$v_{tx} + cv_{xx} = -\frac{\Delta t}{2}v_{ttx} + \frac{c\Delta x}{2}v_{xxx} + O(\Delta t^2, \Delta x^2) \tag{5.40}$$

Multiplying equation (5.40) by $-c$ and adding it to equation (5.39) gives:

$$v_{tt} = c^2 v_{xx} + \frac{\Delta t}{2}\left(-v_{ttt} + c\,v_{ttx}\right) + \frac{c\Delta x}{2}\left(v_{xxt} - c\,v_{xxx}\right) + O(\Delta t^2, \Delta x^2) \tag{5.41}$$

Inserting this first-order approximation of $v_{tt}$ back into equation (5.38) yields the following modified equation.
$$v_t + cv_x = \frac{c}{2}(\Delta x - c\Delta t)\,v_{xx} + O(\Delta x^2, \Delta x\Delta t, \Delta t^2) \tag{5.42}$$

Equation (5.42) is more informative than its earlier version, equation (5.38). It shows that the leading error term in $\Delta t, \Delta x$ behaves like a second-order spatial derivative. Its coefficient is the pseudo, or numerical, viscosity $\nu_n$:

$$\nu_n = \frac{c}{2}(\Delta x - c\Delta t) = \frac{c\Delta x}{2}(1 - \mu). \tag{5.43}$$

For example, with $c = 1$, $\Delta x = 0.01$ and $\mu = 0.5$, $\nu_n = 0.0025$. If $\nu_n > 0$, we expect the solution to be damped to leading order. The numerical scheme then behaves like an advection-diffusion equation whose viscous coefficient is purely an artifact of the finite difference discretization. If the numerical viscosity is negative, $\nu_n < 0$, the solution is amplified exponentially fast. The stability condition $\nu_n \ge 0$ is simply the usual stability criterion encountered earlier: $c > 0$ and $\mu = c\Delta t/\Delta x \le 1$. A more careful analysis of the truncation error that includes higher powers of $\Delta t, \Delta x$ yields:

$$v_t + cv_x = \frac{c\Delta x}{2}(1-\mu)\,v_{xx} - \frac{c\Delta x^2}{6}(2\mu^2 - 3\mu + 1)\,v_{xxx} + O(\Delta t^3, \Delta t^2\Delta x, \Delta t\Delta x^2, \Delta x^3) \tag{5.44}$$

The third-derivative term indicates the presence of dispersive errors in the numerical solution. The coefficient multiplying this third-order derivative sets the magnitude of these errors. Since $2\mu^2 - 3\mu + 1 = (2\mu - 1)(\mu - 1)$, this coefficient is negative for $0 < \mu < 1/2$ and positive for $1/2 < \mu < 1$. The phase error therefore lags the analytical solution for $\mu < 1/2$ and leads it for $1/2 < \mu < 1$. Notice also that the coefficients of the higher-order derivatives on the right-hand side go to zero for $\mu = 1$. According to the modified equation, this "ideal" time step makes the scheme at least third-order accurate. In fact, a characteristic analysis readily shows that the exact solution is recovered, $u_j^{n+1} = u_{j-1}^n$. Notice that the estimates for the high-order derivatives in the modified equation come from the Taylor series form of the finite difference scheme, equation (5.9), rather than from the original partial differential equation.
This is essential to account for the discretization errors. The book by Tannehill et al. (1997) discusses a systematic procedure for deriving the higher-order terms in the modified equation.

Chapter 6 Numerical Solution of the Advection Equation

6.1 Introduction

This chapter applies the notions of the previous chapter to several finite difference schemes for the simple advection equation. We used this equation earlier as an example to illustrate the abstract concepts that frame most theoretical discussions of finite difference methods. These concepts, we repeat, are consistency, convergence and stability. We will investigate several common schemes found in the literature and examine their amplitude and phase errors more closely.

6.2 Donor Cell Scheme

The donor cell scheme is essentially the FTBS scheme seen earlier. Its only distinguishing feature is that it switches the spatial finite difference according to the sign of the advecting velocity $c$. A compact way of writing the scheme is:

$$\frac{u_j^{n+1} - u_j^n}{\Delta t} + \frac{c + |c|}{2}\,\frac{u_j^n - u_{j-1}^n}{\Delta x} + \frac{c - |c|}{2}\,\frac{u_{j+1}^n - u_j^n}{\Delta x} = 0 \tag{6.1}$$

For $c > 0$ the scheme reduces to FTBS, and for $c < 0$ it becomes an FTFS (forward time, forward space) scheme. To simplify things, we consider only the case $c > 0$ here. Figure 6.1 shows plots of the amplification factor for the donor cell scheme.

Figure 6.1: Amplitude and phase diagrams of the donor cell scheme as a function of the wavenumber. Top panel: |A| for FTBS with µ = 0.25, 0.5 and 0.75 (the µ = 0.25 and µ = 0.75 curves coincide). Bottom panel: Φ/Φa for µ = 0.25, 0.5, 0.75 and 1.0. Horizontal axis: kΔx/π from 0 to 1.

Before discussing the figure, we make the following remarks.

6.2.1 Remarks

1. The scheme is conditionally stable: the time step cannot be chosen independently of the spatial discretization and must satisfy $\Delta t \le \Delta t_{max} = \Delta x/c$.
2. The wavelength in the von Neumann stability analysis has not yet been specified. Small values of $k$ correspond to very long wavelengths, that is, Fourier modes that are well represented on the computational grid. Large values of $k$ correspond to very short wavelengths. The relation $k\Delta x = 2\pi\Delta x/\lambda$, where $\lambda$ is the wavelength of the Fourier mode, makes this correspondence evident. For example, a 20 km wave on a grid with $\Delta x = 2$ km has 10 points per wavelength, so its $k\Delta x = 2\pi \cdot 2/20 = \pi/5$.
3. A discrete grid has a lower limit on the representable wavelength. The shortest representable wave has a wavelength of $2\Delta x$ and the form of a see-saw function; its $k\Delta x = \pi$. Any wavelength shorter than this limit is aliased into a longer wavelength. The same phenomenon occurs in the Fourier analysis of time series, where the Nyquist limit sets a lower bound on the smallest measurable wave period.
4. In the previous chapter we focused primarily on the magnitude of the amplification factor, since that is what determines stability. The amplification factor also carries information about the dispersive properties of the finite difference scheme. The analytical amplification factor for a Fourier mode is

$$A_a = e^{-ick\Delta t}. \tag{6.2}$$

Thus the analytical solution has unit amplification per time step, $|A_a| = 1$, and a phase change of $\phi_a = -ck\Delta t = -\mu k\Delta x$.
The amplification factor for the donor cell scheme, however, is:

$$A = |A|e^{i\phi}, \tag{6.3}$$

$$|A| = \left[1 - 4\mu(1-\mu)\sin^2\frac{k\Delta x}{2}\right]^{1/2}, \tag{6.4}$$

$$\phi = -\tan^{-1}\frac{\mu\sin k\Delta x}{1 - \mu(1 - \cos k\Delta x)} \tag{6.5}$$

Here $\phi$ is the argument of the complex number $A$. The arctangent must be taken in the quadrant of $A$, since the denominator becomes negative for large $k\Delta x$ when $\mu > 1/2$. The ratio $\phi/\phi_a$ gives the relative error in the phase. A ratio less than 1 means that the numerical phase change is smaller than the analytical one, and the scheme is decelerating. A ratio greater than 1 indicates an accelerating scheme. We will return to phase errors when we examine the dispersive properties of the scheme.
5. For positive $c$, the donor cell scheme can be written in the form:

$$u_j^{n+1} = (1-\mu)u_j^n + \mu u_{j-1}^n \tag{6.6}$$

This is a linear combination of the two values at the previous time level, at and upstream of point $j$, and it is convex for $0 \le \mu \le 1$. Since the two weights are non-negative, we have

$$\min(u_j^n, u_{j-1}^n) \le u_j^{n+1} \le \max(u_j^n, u_{j-1}^n). \tag{6.7}$$

In plain words, the value at the next time level can neither exceed the maximum of the two upstream values nor fall below their minimum. This is called the monotonicity property. It plays an important role in devising schemes that do not generate spurious oscillations from under-resolved gradients. We will return to this point several times when discussing dispersive errors and special advection schemes.

Figure 6.1 shows $|A|$ and $\phi/\phi_a$ for the donor cell scheme as a function of $k\Delta x$ for several values of the Courant number $\mu$. For $0 \le \mu \le 1$, the long waves (small $k\Delta x$) are damped the least and the high wavenumbers ($k\Delta x \to \pi$) are damped the most. The most vigorous damping occurs for the shortest wavelength at $\mu = 1/2$. There the donor cell scheme reduces to an average of the two upstream values, and the amplification factor magnitude is $|A| = 0$: $2\Delta x$ waves are eliminated after a single time step. The amplification curves are symmetric about $\mu = 1/2$. For a fixed wavelength, the damping lessens as $\mu$ moves away from $1/2$ toward 0 or 1.
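The curves of Figure 6.1 can be regenerated directly from the amplification factor $A = 1 - \mu + \mu e^{-ik\Delta x}$. The Python/NumPy sketch below shows one way to do it; the function names are ours, not from the notes. It takes the argument of $A$ with `np.angle`, which returns the angle in the correct quadrant even when the real part of $A$ is negative. A plain arctangent of the ratio in (6.5) would pick the wrong branch in that regime ($\mu > 1/2$ and large $k\Delta x$).

```python
import numpy as np


def donor_cell_factor(mu, theta):
    """Amplification factor A = 1 - mu + mu*exp(-i*theta), theta = k*dx, c > 0."""
    return 1.0 - mu + mu * np.exp(-1j * theta)


def amplitude(mu, theta):
    """|A|, equivalent to sqrt(1 - 4 mu (1 - mu) sin^2(theta / 2))."""
    return np.abs(donor_cell_factor(mu, theta))


def phase_ratio(mu, theta):
    """phi / phi_a, with phi = arg(A) and phi_a = -mu * theta (theta > 0)."""
    return np.angle(donor_cell_factor(mu, theta)) / (-mu * theta)


# Avoid theta = 0 (phi_a = 0) and theta = pi (A = 0 when mu = 1/2).
theta = np.linspace(0.05, np.pi - 0.05, 200)
for mu in (0.25, 0.5, 0.75, 1.0):
    r = phase_ratio(mu, theta)
    print(f"mu = {mu:4.2f}: min |A| = {amplitude(mu, theta).min():.3f}, "
          f"phi/phi_a in [{r.min():.3f}, {r.max():.3f}]")
```

On this grid the phase ratio stays below 1 for $\mu = 0.25$ and above 1 for $\mu = 0.75$. It equals 1 for $\mu = 1/2$ and $\mu = 1$. The amplitude satisfies $|A(\mu)| = |A(1-\mu)|$.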
The dispersive errors are small for long waves. They are decelerating for all wavelengths when $\mu < 1/2$, vanish for $\mu = 1/2$ and $\mu = 1$, and are accelerating for $1/2 < \mu < 1$. Among the Courant numbers plotted, the acceleration is largest for $\mu = 3/4$.

6.3 Backward Time Centered Space (BTCS)

In this scheme the terms of the equation are evaluated at time level $n+1$. For a two-time-level scheme this means a backward Euler difference for the time derivative. A centered difference in space raises the order of the spatial approximation. This leads to the equations:

$$\frac{u_j^{n+1} - u_j^n}{\Delta t} + c\,\frac{u_{j+1}^{n+1} - u_{j-1}^{n+1}}{2\Delta x} = 0 \tag{6.8}$$

6.3.1 Remarks

1. Truncation error: Taylor series analysis (expanding about time level $n+1$) leads to the following equation:

$$u_t + cu_x = \frac{\Delta t}{2}u_{tt} - \frac{c\Delta x^2}{6}u_{xxx} - \frac{\Delta t^2}{6}u_{ttt} + O(\Delta t^3, \Delta x^4) \tag{6.9}$$

The leading truncation error term is $O(\Delta t, \Delta x^2)$, so the scheme is first order in time and second order in space. Moreover, the truncation error goes to zero as $\Delta t, \Delta x \to 0$, so the scheme is consistent.
2. The von Neumann stability analysis leads to the following amplification factor:

$$A = \frac{1 - i\mu\sin k\Delta x}{1 + \mu^2\sin^2 k\Delta x} \tag{6.10}$$

$$|A| = \frac{1}{\sqrt{1 + \mu^2\sin^2 k\Delta x}} \le 1 \quad\text{for all } \mu, k\Delta x \tag{6.11}$$

$$\frac{\phi}{\phi_a} = \frac{\tan^{-1}(-\mu\sin k\Delta x)}{-\mu k\Delta x} \tag{6.12}$$

The scheme is unconditionally stable, since $|A| \le 1$ for every time step $\Delta t$, with equality only at $k\Delta x = 0$ and $\pi$. By the Lax-Richtmyer theorem, the consistency and stability of the scheme guarantee that it is also convergent.
3. The modified equation for BTCS is

$$u_t + cu_x = \frac{c^2\Delta t}{2}u_{xx} - \left(\frac{c\Delta x^2}{6} + \frac{c^3\Delta t^2}{3}\right)u_{xxx} + \dots \tag{6.13}$$

The numerical viscosity is therefore always positive, which gives the scheme its stable and damping character. Notice that the damping increases with increasing $c$ and $\Delta t$.
4. The scheme cannot update the solution one grid point at a time, because the values $u_{j+1}^{n+1}$ and $u_{j-1}^{n+1}$ are unknown and must be determined simultaneously. BTCS is thus an implicit scheme, which requires solving a system of equations. Collecting the unknowns on the left-hand side of the equation gives:

$$-\frac{\mu}{2}u_{j-1}^{n+1} + u_j^{n+1} + \frac{\mu}{2}u_{j+1}^{n+1} = u_j^n \tag{6.14}$$

This is a matrix equation for the vector of unknowns at the next time level:

$$\begin{pmatrix} 1 & \frac{\mu}{2} & & & \\ -\frac{\mu}{2} & 1 & \frac{\mu}{2} & & \\ & \ddots & \ddots & \ddots & \\ & & -\frac{\mu}{2} & 1 & \frac{\mu}{2} \\ & & & -\frac{\mu}{2} & 1 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_{N-1} \\ u_N \end{pmatrix}^{n+1} = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_{N-1} \\ u_N \end{pmatrix}^{n} \tag{6.15}$$

The only non-zero entries of this matrix lie on the diagonal and on the first upper and lower diagonals; such a matrix is called tridiagonal. Solving a tridiagonal system is far cheaper than solving a full one: the Thomas algorithm for tridiagonal matrices needs only $O(N)$ additions and multiplications, whereas a full matrix requires $O(N^3)$ operations. Finally, the first and last rows of the matrix must be modified to account for the boundary conditions. We will return to the issue of boundary conditions later.

Figure 6.2: Amplitude and phase diagrams of the BTCS scheme as a function of the wavenumber. Top panel: |A| for µ = 0.25, 0.50, 0.75, 1.00 and 2.00. Bottom panel: Φ/Φa for the same Courant numbers. Horizontal axis: kΔx/π from 0 to 1.

Figure 6.2 shows the magnitude of the amplification factor $|A|$ for several Courant numbers. The curves are symmetric about $k\Delta x = \pi/2$. The high and low wavenumbers are damped the least, and the intermediate wavenumbers the most. The departure of $|A|$ from 1 grows with increasing $\mu$. Finally, the scheme is decelerating for all wavenumbers and Courant numbers, and the deceleration is strongest for the shorter wavelengths.
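The Thomas algorithm mentioned above is Gaussian elimination specialised to three diagonals. A forward sweep removes the sub-diagonal, and a backward sweep substitutes. The Python/NumPy sketch below implements it and uses it for one BTCS step (6.14). The boundary treatment is a placeholder assumption of ours, since the text defers boundary conditions: the first and last rows are identity rows that carry $u_1$ and $u_N$ over unchanged from the previous time level. No pivoting is done. With these end rows, every leading principal minor of the BTCS matrix is positive, so the forward sweep never divides by zero. The names `thomas` and `btcs_step` are illustrative.

```python
import numpy as np


def thomas(a, b, c, d):
    """Solve a tridiagonal system in O(N) operations (Thomas algorithm).

    a: sub-diagonal (a[0] unused), b: diagonal, c: super-diagonal
    (c[-1] unused), d: right-hand side. No pivoting is performed.
    """
    n = len(d)
    c_star = np.zeros(n)
    d_star = np.zeros(n)
    c_star[0] = c[0] / b[0]
    d_star[0] = d[0] / b[0]
    for i in range(1, n):  # forward sweep: eliminate the sub-diagonal
        denom = b[i] - a[i] * c_star[i - 1]
        c_star[i] = c[i] / denom if i < n - 1 else 0.0
        d_star[i] = (d[i] - a[i] * d_star[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = d_star[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d_star[i] - c_star[i] * x[i + 1]
    return x


def btcs_step(u, mu):
    """Advance u one BTCS step by solving eq. (6.14).

    Assumed (hypothetical) boundary rows: u[0] and u[-1] are held fixed.
    """
    n = len(u)
    a = np.full(n, -0.5 * mu)  # coefficient of u_{j-1}^{n+1}
    b = np.ones(n)             # coefficient of u_j^{n+1}
    c = np.full(n, 0.5 * mu)   # coefficient of u_{j+1}^{n+1}
    a[0] = c[0] = 0.0          # identity first row
    a[-1] = c[-1] = 0.0        # identity last row
    return thomas(a, b, c, np.asarray(u, dtype=float))


x = np.linspace(0.0, 1.0, 101)
u0 = np.exp(-200.0 * (x - 0.3) ** 2)  # Gaussian pulse
u1 = btcs_step(u0, mu=0.8)
```

Each step costs $O(N)$ operations regardless of $\mu$, which is what makes implicit schemes such as BTCS affordable in one dimension. A periodic domain would instead lead to a cyclic tridiagonal system, which needs a modified algorithm.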
CENTERED TIME CENTERED SPACE (CTCS) 87 1.00 1 0.9 0.25 0.8 0.50 0.75 0.7 Φ/Φ a 0.6 0.5 0.4 0.3 0.2 0.1 0 0 0.1 0.2 0.3 0.4 0.5 k ∆ x/π 0.6 0.7 0.8 0.9 1 Figure 6.3: Phase diagrams of the CTCS scheme as a function of the wavenumber 6.4 Centered time centered space (CTCS) A simple and popular explicit second order scheme in space and time is the centered time and centered space scheme. This is a three time level scheme and takes the form: n n uj +1 − uj −1 un+1 − un−1 j j +c =0 2∆t 2∆x 6.4.1 (6.16) Remarks 1. truncation error The Taylor series analysis leads to the following equation: ut + cux = − − c∆x2 ∆ t2 uttt + uxxx + O(∆t4 , ∆x4 ) 3 3 (6.17) The leading truncation error term is O(∆t2 , ∆x2 ), and hence the scheme is second order in time and space. Moreover, the truncation error goes to zero for ∆t, ∆x → 0, and hence the scheme is consistent. 88 CHAPTER 6. NUMERICAL SOLUTION OF THE ADVECTION EQUATION 2. The Von Neumann stability analysis leads to a quadratic equation for the amplification factor: A2 + 2µ sin k∆xA − 1 = 0. Its two solutions are: A± = −iµ sin k∆x ± 1 − µ2 sin2 k∆x |A± | = 1, for all |µ| < 1 φ −µ sin k∆x 1 tan−1 = φa −µk∆x 1 − µ2 sin2 k∆x (6.18) (6.19) (6.20) The scheme is conditionally stable for |µ| < 1, and its amplification is neutral since |A| = 1 within the stability region. An attribute of the CTCS scheme is that its amplification factor mirror the neutral amplification of the analytical solution. By the Lax-Richtmeyer theorem the consistency and stability of the scheme guarantee it is also convergent. 3. The modified equation for CTCS is ut + cux = c∆x2 2 c∆x4 (µ − 1)uxxx − (9µ4 − 10µ2 + 1)uxxxxx + . . . 6 120 (6.21) The even derivative are absent from the modified equation indicating the total absence of numerical dissipation. The only errors are dispersive in nature due to the presence of odd derivative in the modified equation. 4. The model requires a starting procedure to kick start the computations. 
It also has a computational mode that must be damped.

5. Figure 6.3 shows the phase errors of CTCS for several Courant numbers. All wavenumbers are decelerating, and the shortest waves are decelerated more than the long waves.

6.5 Lax-Wendroff scheme

The idea behind the Lax-Wendroff scheme is to keep the simplicity of two-time-level schemes while raising the order of accuracy in space and time to second order. This is possible if derivatives in time are translated into derivatives in space. The Taylor series in time about time level n is:

$$u^{n+1}_j = u^n_j + \Delta t\,u_t + \frac{\Delta t^2}{2}u_{tt} + \frac{\Delta t^3}{6}u_{ttt} + \dots \tag{6.22}$$

From the PDE we know that $u_t = -cu_x$. To complete second-order accuracy in time we need a second-order expression for $u_{tt}$. This can be obtained by differentiating the advection equation with respect to time to yield:

$$u_{tt} = -cu_{xt} = -c(u_t)_x = -c(-cu_x)_x = c^2u_{xx} \tag{6.23}$$

The Taylor series in time takes the form:

$$u^{n+1}_j = u^n_j - c\Delta t\,u_x + \frac{c^2\Delta t^2}{2}u_{xx} + \frac{\Delta t^3}{6}u_{ttt} + \dots \tag{6.24}$$

Figure 6.4: Amplitude and phase diagrams of the Lax-Wendroff scheme as a function of the wavenumber (upper panel: |A| versus k∆x/π; lower panel: Φ/Φa versus k∆x/π; Courant numbers 0.25, 0.50, 0.75 and 1.00).

All that remains to be done is to use high-order approximations for the spatial derivatives $u_x$ and $u_{xx}$. We use centered differences for both terms, since they are second-order accurate, to get the final expression:

$$\frac{u^{n+1}_j - u^n_j}{\Delta t} = -c\,\frac{u^n_{j+1} - u^n_{j-1}}{2\Delta x} + \frac{c^2\Delta t}{2}\,\frac{u^n_{j+1} - 2u^n_j + u^n_{j-1}}{\Delta x^2} \tag{6.25}$$

6.5.1 Remarks

1.
Truncation error. The Taylor series analysis (expansion about time level n) leads to the following equation:

$$u_t + cu_x = -\frac{\Delta t^2}{6}u_{ttt} - \frac{c\Delta x^2}{6}u_{xxx} + \dots \tag{6.26}$$

Here the O(∆t) terms have cancelled because $u_{tt} = c^2u_{xx}$ (6.23). The leading truncation error term is O(∆t², ∆x²), and hence the scheme is second order in time and space. Moreover, the truncation error goes to zero for ∆t, ∆x → 0, and hence the scheme is consistent.

2. The von Neumann stability analysis leads to:

$$A = 1 - \mu^2(1 - \cos k\Delta x) - i\mu\sin k\Delta x \tag{6.27}$$

$$|A|^2 = \left[1 - \mu^2(1 - \cos k\Delta x)\right]^2 + \mu^2\sin^2 k\Delta x \tag{6.28}$$

$$\frac{\phi}{\phi_a} = \frac{1}{-\mu k\Delta x}\tan^{-1}\frac{-\mu\sin k\Delta x}{1 - \mu^2(1 - \cos k\Delta x)} \tag{6.29}$$

The scheme is conditionally stable for |µ| ≤ 1. By the Lax-Richtmyer theorem, the consistency and stability of the scheme guarantee that it is also convergent.

3. The modified equation for Lax-Wendroff is

$$u_t + cu_x = \frac{c\Delta x^2}{6}(\mu^2 - 1)u_{xxx} - \frac{c\Delta x^3}{8}\mu(1 - \mu^2)u_{xxxx} + \dots \tag{6.30}$$

Unlike CTCS, the modified equation contains an even (fourth) derivative. For 0 < µ < 1 this term is dissipative, which is the damping visible in the amplitude panel of Figure 6.4.

4. Figure 6.4 shows the amplitude and phase errors of the Lax-Wendroff scheme. The phase errors are predominantly lagging. The only accelerating errors are those of the short waves at relatively high values of the Courant number.

6.6 Numerical Dispersion

Consistency and stability are the first issues to consider when contemplating the solution of partial differential equations. They address the theoretical question of convergence in the limit of improving resolution. They should not be the last measure of performance, however, since other error measures can be of equal importance. For hyperbolic equations, where wave dynamics are important, it is critical to consider how the numerical scheme distorts wave propagation. We have already looked at the phase characteristics of the schemes derived so far, but it was hard to get an intuitive feel for their impact on wave propagation. The aim of this section is to derive the numerical dispersion relation and to compare it to the dispersion relation of the continuous equations. We start with the latter.
6.6.1 Analytical Dispersion Relation

The analytical dispersion relation for the wave equation can be obtained by looking for solutions that are periodic in space and time, of the form $\tilde u e^{i(kx - \omega t)}$, where k is the wavenumber and ω the corresponding frequency. Inserting this expression in equation 5.3 we get the dispersion relation:

$$\omega = ck \tag{6.31}$$

The associated phase speed, $C_p$, and group velocity, $C_g$, of the system are:

$$C_p = \frac{\omega}{k} = c \tag{6.32}$$

$$C_g = \frac{\partial\omega}{\partial k} = c \tag{6.33}$$

Both velocities are constant and the system is non-dispersive, i.e. all waves travel with the same phase speed regardless of wavenumber. The group velocity reflects the speed of energy propagation, and it too is constant in the present case. The phase error plots showed that the phase errors differ from one wavenumber to another. We can therefore anticipate that the numerical discretization will violate this property. We will make this assertion clearer by looking at the numerical dispersion relation.

6.6.2 Numerical Dispersion Relation: Spatial Differencing

To keep the algebra tractable, we assume that only the spatial dimension is discretized and the time dimension is kept continuous. The semi-discrete forms of the following schemes are (the upwind-biased schemes assume c > 0):

1. Centered second-order scheme

$$u_t + c\,\frac{u_{j+1} - u_{j-1}}{2\Delta x} = 0 \tag{6.34}$$

2. Centered fourth-order scheme

$$u_t + c\,\frac{8(u_{j+1} - u_{j-1}) - (u_{j+2} - u_{j-2})}{12\Delta x} = 0 \tag{6.35}$$

3. Centered sixth-order scheme

$$u_t + c\,\frac{45(u_{j+1} - u_{j-1}) - 9(u_{j+2} - u_{j-2}) + (u_{j+3} - u_{j-3})}{60\Delta x} = 0 \tag{6.36}$$

4. Donor cell

$$u_t + c\,\frac{u_j - u_{j-1}}{\Delta x} = 0 \tag{6.37}$$

5. Third-order upwind

$$u_t + c\,\frac{2u_{j+1} + 3u_j - 6u_{j-1} + u_{j-2}}{6\Delta x} = 0 \tag{6.38}$$

The dispersion relation for these numerical schemes can also be derived from periodic solutions of the form $u_j = \tilde u e^{i(kx_j - \sigma t)}$. The biggest difference is, of course, that the Fourier expansion is discrete in space.
The following expressions for the phase velocity can be derived for the different schemes:

$$\begin{aligned}
\text{CD2:}\quad \frac{\sigma}{k} &= c\,\frac{\sin k\Delta x}{k\Delta x}\\
\text{CD4:}\quad \frac{\sigma}{k} &= c\,\frac{8\sin k\Delta x - \sin 2k\Delta x}{6k\Delta x}\\
\text{CD6:}\quad \frac{\sigma}{k} &= c\,\frac{\sin 3k\Delta x - 9\sin 2k\Delta x + 45\sin k\Delta x}{30k\Delta x}\\
\text{Donor:}\quad \frac{\sigma}{k} &= c\,\frac{\sin k\Delta x - i(1 - \cos k\Delta x)}{k\Delta x}\\
\text{Third upwind:}\quad \frac{\sigma}{k} &= c\,\frac{(8\sin k\Delta x - \sin 2k\Delta x) - 2i(1 - \cos k\Delta x)^2}{6k\Delta x}
\end{aligned}\tag{6.39}$$

Several things stand out in the numerical dispersion of the various schemes. First, all of them are dispersive. A waveform made up of a sum of individual Fourier components will therefore evolve such that the fast-travelling waves pass the slower-moving ones. Second, all the centered difference schemes have a real frequency, i.e. they introduce no amplitude errors. The off-centered schemes, on the other hand, have both real and imaginary parts. The former influences the phase speed, whereas the latter influences the amplitude. The amplitude decays if Im(σ) < 0 and grows if Im(σ) > 0. Furthermore, each upwind-biased scheme has the same real part as the next higher-order centered scheme. Their dispersive properties are thus as good as those of that centered scheme, except for the damping associated with their imaginary part (this is not necessarily a bad thing, at least for the short waves).

Figure 6.5 shows the dispersion curves of the various schemes discussed in this section against the analytical dispersion curve (the solid straight line). One can immediately see that higher-order spatial differencing improves the propagation characteristics in the intermediate wavenumber range. As the order is increased, the dispersion curves rise further towards the analytical curve, particularly around k∆x = π/2. Hence a larger portion of the spectrum propagates correctly. The lower panel shows the impact of biasing the differencing towards the upstream side. The net effect is the introduction of numerical dissipation. The dissipation is strongest for the short waves, and it decreases as the order of the scheme increases.
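The phase-speed formulas (6.39) can be checked, and the group velocities of Figure 6.6 reproduced, by substituting the discrete Fourier mode into each stencil. The sketch below assumes c = ∆x = 1 by default. The `STENCILS` table and the function names are illustrative and do not come from these notes. The group velocity is obtained here by a centered finite difference in k. At k∆x = π it gives about −1, −5/3 and −2.2 for CD2, CD4 and CD6. This is consistent with the short waves travelling backwards and with the trend worsening as the order increases.

```python
import cmath
import math

# Hypothetical table: offset m -> weight of u_{j+m}, plus a denominator d,
# so that u_x is approximated by sum(w * u_{j+m}) / (d * dx).
STENCILS = {
    "CD2": ({1: 1, -1: -1}, 2),
    "CD4": ({1: 8, -1: -8, 2: -1, -2: 1}, 12),
    "CD6": ({1: 45, -1: -45, 2: -9, -2: 9, 3: 1, -3: -1}, 60),
    "Donor": ({0: 1, -1: -1}, 1),
    "Upwind3": ({1: 2, 0: 3, -1: -6, -2: 1}, 6),
}


def sigma(scheme, theta, c=1.0, dx=1.0):
    """Complex frequency of u_j = exp(i(k x_j - sigma t)), with theta = k*dx."""
    weights, d = STENCILS[scheme]
    symbol = sum(w * cmath.exp(1j * m * theta) for m, w in weights.items()) / (d * dx)
    # u_t = -i*sigma*u and u_t + c*symbol*u = 0  =>  sigma = -i*c*symbol
    return -1j * c * symbol


def phase_speed_ratio(scheme, theta):
    """Re(sigma) / (c k): numerical over analytical phase speed (c = dx = 1)."""
    return sigma(scheme, theta).real / theta


def group_velocity(scheme, theta, h=1e-5):
    """d Re(sigma) / dk by a centered difference in k (c = dx = 1)."""
    return (sigma(scheme, theta + h).real - sigma(scheme, theta - h).real) / (2 * h)


for name in ("CD2", "CD4", "CD6"):
    print(name, phase_speed_ratio(name, math.pi / 2), group_velocity(name, math.pi))
print("Donor damping rate at k*dx = pi:", sigma("Donor", math.pi).imag)
```

The imaginary part returned for the upwind stencils is negative, i.e. the modes decay, in line with the lower panel of Figure 6.5.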
Figure 6.6 shows the phase speed (upper panel) and group velocity (lower panel) of the various schemes. Again it is evident that a larger portion of the wave spectrum propagates correctly as the order is increased. None of the schemes allows the shortest wave (k∆x = π) to propagate. The same trend can be seen in the group velocity plot. However, the impact of the numerical error is more dramatic there, since the short waves have negative group velocities, i.e. they propagate in the opposite direction. This trend worsens as the accuracy is increased.

Figure 6.5: Dispersion relation for various semi-discrete schemes. The upper panel ("Numerical Dispersion of CD of 2, 4 and 6 order") shows the real part of the frequency ω versus k∆x/π, with curves labelled FE1, CD2, CD4 and CD6. The bottom panel shows the imaginary part ωi for the first- and third-order upwind schemes. The real part of the frequency for these two schemes is identical to that of the second- and fourth-order centered schemes.

Figure 6.6: Phase speed (upper panel, cnum/can versus k∆x/π) and group velocity (lower panel, cg versus k∆x/π) for various semi-discrete schemes (curves labelled FE1, CD2, CD4 and CD6).

Chapter 7

Finite Volume Method

This chapter introduces finite volume methods for the solution of partial differential equations. These methods are in widespread use because of their robustness and intuitive formulation. They also offer some clear advantages, chief among them the conservation of a quantity of interest.
We will look at the formulation, discretization, and coding of these methods.

7.1 The partial differential equation

The partial differential equation we will focus on is a scalar equation. It represents the transport of a substance under the influence of advection by the air flow and of mixing. The transport equation is frequently written in the advective form:

$$\frac{\partial T}{\partial t} + \mathbf{u}\cdot\nabla T = \nabla\cdot(\alpha\nabla T) \tag{7.1}$$

Here T is the transported substance, e.g. temperature, humidity or a pollutant concentration, and u is the velocity field, presumed known. α is the diffusion coefficient, which can represent either molecular diffusion or eddy mixing. The advective form can be interpreted as the time evolution of the T field along characteristic lines given by dx/dt = u. In the absence of diffusive effects or source terms, T is constant along these lines. The advective form is thus closest to a Lagrangian description of the flow, where one follows individual particles.

A slightly different form of the equation, called the conservative form, can be derived and forms the starting point for the derivation of finite volume methods. The link between the advective and conservative forms is a statement of conservation of mass. Indeed, the velocity field cannot be arbitrary and must satisfy some sort of mass conservation equation. Here we will assume the flow to be incompressible, so that its mass conservation equation reduces to:

$$\nabla\cdot\mathbf{u} = 0 \tag{7.2}$$

Multiply the continuity equation by T and add the result to the advective form. Recalling that u·∇T + T∇·u = ∇·(uT), we obtain the conservative form of the transport-diffusion equation:

$$\frac{\partial T}{\partial t} + \nabla\cdot(\mathbf{u}T) = \nabla\cdot(\alpha\nabla T). \tag{7.3}$$

The advective and conservative forms are both valid statements at every point in the domain. Although the two forms are equivalent in the continuum, they express different aspects of physical laws.
The advective form describes the evolution of T along fluid trajectories. The conservative form describes the conservation of T at every point in the domain (see section 7.2 for an explanation of this point of view). It is also important to remember that the two statements hold simultaneously thanks to the conservation of mass. The advective and conservative equations are equivalent in the continuum, but this equivalence may be broken in the discretization process. Hence discretizing the advective form or the conservative form will lead to the approximate enforcement of slightly different physical laws.

In some applications conservation is essential: T must be conserved over long simulation times. This concern stems not only from physical considerations, but also from the need to account for the sources and sinks of T in long or complex simulations. Climate simulations are one example: they span centuries, and it is important to account for all the sinks of heat or carbon dioxide. Simulations of chemical reactions or combustion are another. Where conservation is paramount, it is therefore natural to discretize starting from the conservation statement and to ensure that the discretization introduces no spurious sources. Finite volume methods are ideally suited to enforce conservation laws in the discrete case. They have a further virtue. Some solutions develop discontinuities, so that the spatial derivative may fail to exist at a number of locations. The finite volume procedure remains valid for such solutions, because it acts on an integrated form of the equations, as we will see shortly.

7.2 Integral Form of Conservation Law

The partial differential equation 7.3 is valid at all points in the domain, which we could consider as infinitesimal volumes.
Infinitesimal discrete volumes are unaffordable and would have to be "inflated" to a finite size. Anticipating this, we derive the conservative form for a finite volume δV bounded by a surface δS, as shown in figure 7.1. Integrating equation 7.3 over the control volume δV we get:

$$\int_{\delta V}\frac{\partial T}{\partial t}\,dV + \int_{\delta V}\nabla\cdot(\mathbf{u}T)\,dV = \int_{\delta V}\nabla\cdot(\alpha\nabla T)\,dV \tag{7.4}$$

$$\frac{d}{dt}\int_{\delta V}T\,dV + \oint_{\delta S}\mathbf{n}\cdot\mathbf{u}T\,dS = \oint_{\delta S}\mathbf{n}\cdot\alpha\nabla T\,dS \tag{7.5}$$

Remarks

• We assume the volume δV to be fixed in space, so we can interchange the order of integration in space and differentiation in time. The first integral on the left-hand side of equation 7.5 then has a simple interpretation: it is the time rate of change of the T budget inside volume δV.

Figure 7.1: Sketch of the volume δV and its bounding surface δS (with outward normal n and flux Tu).

• We have used the Gauss divergence theorem to turn the volume integrals of the flux and diffusion divergences into surface integrals. Here n is the outward unit normal to the surface δS. The surface integral on the left-hand side accounts for the advective flux carrying T in and out of the volume δV across the surface δS. The one on the right-hand side accounts for the diffusive transport of T across δS.

• Equation 7.5 lends itself to a simple physical interpretation. The rate of change of the T budget in δV equals the rate of transport of T through δS by advective fluxes (transport by the flow, wind, current) and by diffusive fluxes.

• If the volume δV is closed to both advection and diffusion, i.e. u·n = 0 and n·∇T = 0 on δS, then the rate of change is zero and the budget of T is conserved within δV.

• So far the volume δV is arbitrary, and we have not assigned it a specific shape or size. Equation 7.5 applies to any volume δV, whether it is a computational cell, an entire ocean basin, or Earth's atmosphere.

• If additional physical processes affect the budget of T, such as sources or sinks within δV, these should also be accounted for.
No additional process is considered here.

• The derivatives appearing in the differential equation 7.1 are first order for the advective term and second order for the diffusion term. The corresponding terms in the integral form have only zeroth- and first-order derivatives, respectively. This lowering of the derivative order matters for solutions that change so rapidly in space that the spatial derivative does not exist. Examples include supersonic shock waves in the atmosphere and hydraulic jumps in water. There, fluid properties such as density or temperature change so fast that they appear discontinuous. Discontinuous functions have no derivatives at the location of the discontinuity. Mathematically speaking, the partial differential form of the conservation equation is therefore invalid there, even though the underlying conservation law still holds. Discontinuities generally require special treatment, and lowering the order of the spatial derivative helps simplify it. For these reasons finite volume methods are preferred over finite differences for problems whose solutions exhibit local discontinuities.

A slightly different form of equation 7.5 can be derived by introducing the average of T in δV, which we denote by $\overline{T}$ and define as:

$$\overline{T} = \frac{1}{\delta V}\int_{\delta V}T\,dV. \tag{7.6}$$

The integral conservation law can now be recast as a time evolution equation for $\overline{T}$:

$$\delta V\frac{d\overline{T}}{dt} + \oint_{\delta S}\mathbf{n}\cdot\mathbf{u}T\,dS = \oint_{\delta S}\mathbf{n}\cdot\alpha\nabla T\,dS \tag{7.7}$$

7.3 Sketch of Finite Volume Methods

Equations 7.5 and 7.7 are exact, and no approximation was necessary in their derivation. In a numerical model, approximations enter through the temporal integration of the equations and through the calculation of the fluxes in space and time. The traditional finite volume method takes equation 7.7 as its starting point. The domain is first divided into computational cells $\delta V_j$ in which the cell average of the function is known.
The advection and diffusion fluxes are calculated in two steps:

• Function reconstruction. The advective fluxes require the function values at cell edges, while the diffusive fluxes require the function derivative at cell edges. Both are obtained by approximating the function T with a polynomial. Its coefficients are chosen so that it recovers the cell averages over a number of cells:

$$\tilde T = \sum_{n=1}^{P}a_n\phi_n(x) \tag{7.8}$$

$$\int_{\delta V_{j+m}}\tilde T\,dV = \delta V_{j+m}\overline{T}_{j+m}, \quad m = 0, 1, 2, \dots, P-1 \tag{7.9}$$

Here $\delta V_{j+m}$ are P cells surrounding the cell $\delta V_j$, and the $\phi_n$ are P suitably chosen interpolation functions (generally simple polynomials). The unknown coefficients $a_n$ are recovered simply from the linear system

$$\sum_{n=1}^{P}A_{m,n}a_n = \delta V_{j+m}\overline{T}_{j+m}, \qquad A_{m,n} = \int_{\delta V_{j+m}}\phi_n\,dV. \tag{7.10}$$

Once the coefficients $a_n$ are known, the function can be evaluated on cell edges to compute the flux integrals. The error in the spatial approximation is primarily due to equation 7.8.

• Evaluation of the integrals. The integrals are usually evaluated numerically using Gauss-type quadrature. The choice of the approximation functions $\phi_n$ generally determines the number of quadrature points. These are chosen so that the quadrature is exact for polynomials of degree P, and no error is then incurred during the spatial integration. One can then write:

$$F_j = \sum_{q=1}^{Q}\left(\mathbf{u}\cdot\mathbf{n}\,\tilde T\,|\delta S_j|\right)_{x_q}\omega_q \tag{7.11}$$

Here $\omega_q$ are the weights of the Gaussian quadrature (as appropriate for multidimensional integration), $x_q$ are the Q quadrature points, and $|\delta S_j|$ are the mapping factors of the surface $\delta S_j$.

• Temporal integration. The final source of error is the temporal integration. The fluxes are used to advance the solution in time with a time-marching procedure such as Forward Euler, one of the Runge-Kutta methods, or the Adams-Bashforth class of methods.
The time integration cannot be chosen independently of the spatial approximation; it is usually constrained by stability considerations.

7.4 Finite Volume in 1D

Figure 7.2: Discretization of a one-dimensional domain into computational cells of width δx. The cell centers are indicated by filled circles and the cell edges by vertical lines. The cell numbers (1, 2, …, M) are shown above the cells, while the numbers below the line (1/2, 3/2, …) indicate cell edges.

The procedure outlined in the previous section is best illustrated in the simple case of one space dimension. In that case the cells are line segments, as shown in figure 7.2. The cell volumes reduce to the width δx of the segment, which we assume in the following to be the same for all cells. The flux integrals reduce to evaluations of the fluxes at the cell edges. The conservation law can now be written as:

$$\frac{d\overline{T}_j}{dt} = -\frac{F_{j+\frac12} - F_{j-\frac12}}{\delta x} + \frac{D_{j+\frac12} - D_{j-\frac12}}{\delta x} \tag{7.12}$$

Here $F_{j+\frac12} = uT$ and $D_{j+\frac12} = \alpha T_x$ are the advective and diffusive fluxes at $x_{j+\frac12}$.

7.4.1 Function Reconstruction

The function reconstruction procedures can be developed as follows, prior to time integration. Assume

$$T = \sum_{n=0}^{P}a_n\xi^n, \quad \text{where } \xi = \frac{x - x_j}{\delta x_j} \tag{7.13}$$

Here $a_n$ are the P+1 coefficients that must be determined from P+1 conditions on the averages of the polynomial over multiple cells. For convenience, the origin of the coordinate system has been shifted to the centre of cell j and the coordinate scaled by its width. The left and right edges of cell j are thus located at ξ = −1/2 and ξ = 1/2, respectively. The entries of the matrix are now easy to fill (column n multiplies the coefficient $a_{n-1}$):

$$A_{m,n} = \int_{x_{m-\frac12}}^{x_{m+\frac12}}\left(\frac{x - x_j}{\delta x_j}\right)^{n-1}dx, \quad n = 1, 2, \dots, P+1 \tag{7.14}$$

$$= \frac{\delta x_j}{n}\left[\left(\frac{x_{m+\frac12} - x_j}{\delta x_j}\right)^n - \left(\frac{x_{m-\frac12} - x_j}{\delta x_j}\right)^n\right] \tag{7.15}$$

$$= \frac{\delta x_j}{n}\left(\xi^n_{m+\frac12} - \xi^n_{m-\frac12}\right), \quad \text{with } \xi_{m\pm\frac12} = \frac{x_{m\pm\frac12} - x_j}{\delta x_j} \tag{7.16}$$

7.4.2 Piecewise constant

Figure 7.3: Piecewise constant approximation.

The simplest case to consider is one where the function is constant over a cell, as shown in figure 7.3. Then we have the approximation:

$$T = a_0 \tag{7.17}$$

Since there is only one unknown coefficient, $a_0$, we can impose only a single constraint: the integral of T over cell j must equal $\delta x_j\overline{T}_j$:

$$\int_{x_{j-\frac12}}^{x_{j+\frac12}}a_0\,dx = \delta x_j\overline{T}_j \tag{7.18}$$

The solution is then simply $a_0 = \overline{T}_j$, and the function can be reconstructed at $x_{j+\frac12}$. There are a couple of things to notice with respect to implementing the solution algorithm:

• Two approximations are possible at an edge point, one from the left cell j and one from the right cell j+1. For the advective flux, a physically intuitive choice is to take the information reaching the edge from the upstream cell, i.e. the cell the wind is blowing from. Thus we have:

$$T_{j+\frac12} = \begin{cases}\overline{T}_j & \text{if } u_{j+\frac12} \ge 0\\ \overline{T}_{j+1} & \text{if } u_{j+\frac12} < 0\end{cases} \tag{7.19}$$

• Piecewise constant functions have a zero spatial derivative. They are therefore useless for computing the function derivative needed in the diffusion term.

7.4.3 Piecewise Linear

Figure 7.4: Piecewise Linear Approximation.

An improvement over the constant approximation is to assume that the function varies linearly over two cells, as shown in figure 7.4. Then we have the approximation:

$$T = a_0 + a_1\xi \tag{7.20}$$

Since there are two unknown coefficients, $a_0$ and $a_1$, we need to impose two integral constraints. The choice of reconstruction stencil is at our disposal. Here we choose the stencil made of cells j and j+1, which is symmetric about the edge $x_{j+\frac12}$. It covers $x_{j-\frac12} \le x \le x_{j+\frac32}$, or equivalently $-\frac12 \le \xi \le \frac32$.
The two constraints become:

$$\int_{-\frac12}^{\frac12}(a_0 + a_1\xi)\,d\xi = \overline{T}_j \tag{7.21}$$

$$\int_{\frac12}^{\frac32}(a_0 + a_1\xi)\,d\xi = \frac{\delta x_{j+1}}{\delta x_j}\overline{T}_{j+1} \tag{7.22}$$

Performing the integration we obtain the following system of equations for the unknowns:

$$\begin{pmatrix}1 & 0\\ 1 & 1\end{pmatrix}\begin{pmatrix}a_0\\ a_1\end{pmatrix} = \begin{pmatrix}\overline{T}_j\\ \frac{\delta x_{j+1}}{\delta x_j}\overline{T}_{j+1}\end{pmatrix} \tag{7.23}$$

When the grid cells are of constant size, the solution is simply:

$$a_0 = \overline{T}_j, \qquad a_1 = \overline{T}_{j+1} - \overline{T}_j \tag{7.24}$$

The linear interpolation is then

$$T = \overline{T}_j + (\overline{T}_{j+1} - \overline{T}_j)\,\xi \tag{7.25}$$

$$= (1 - \xi)\overline{T}_j + \xi\overline{T}_{j+1}, \quad \text{for } -\frac12 \le \xi \le \frac32 \tag{7.26}$$

To evaluate the function at the cell edges $\xi_{j\pm\frac12}$, it is enough to set $\xi = \pm\frac12$.

7.4.4 Piecewise parabolic

For parabolic interpolation centered on the cells j−1, j and j+1, the polynomial takes the form

$$T = \frac{-\overline{T}_{j+1} + 26\overline{T}_j - \overline{T}_{j-1}}{24} + \frac{\overline{T}_{j+1} - \overline{T}_{j-1}}{2}\,\xi + \frac{\overline{T}_{j+1} - 2\overline{T}_j + \overline{T}_{j-1}}{2}\,\xi^2, \quad \text{for } -\frac32 \le \xi \le \frac32, \quad \xi = \frac{x - x_j}{\delta x} \tag{7.27}$$

The cell edge value at $x_{j+\frac12}$ is obtained by setting ξ = 1/2 in the expression above:

$$T_{j+\frac12} = \frac{-\overline{T}_{j-1} + 5\overline{T}_j + 2\overline{T}_{j+1}}{6} \tag{7.28}$$

7.4.5 Reconstruction Validation

We illustrate the characteristics of the different reconstruction procedures with some important examples. To characterize the procedures accurately, we reconstruct a function with a known solution and compare the result with what the numerical procedure yields. The difference is the reconstruction error. We will look at several examples with distinct characteristics: functions with smooth variations, and functions with local kinks in their graphs. A kink signals a break in the smoothness of the function. Either the slope of the graph changes abruptly (slope discontinuity), or the value of the function itself changes abruptly (function discontinuity). In all cases we monitor how the error decreases with increasing number of cells (decreasing δx), for different functions and for each reconstruction procedure.
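The moment matrix (7.14)-(7.16) can be assembled and solved in exact rational arithmetic to recover the edge formulas above. This sketch assumes uniform cells. Each row is divided by δx, so the right-hand side is simply the vector of cell averages. The helper names are hypothetical. The function returns the weights that multiply the cell averages in the edge value. For the stencils {j}, {j, j+1} and {j−1, j, j+1} it should reproduce the weights of (7.19), of (7.26) at ξ = 1/2, and of (7.28).

```python
from fractions import Fraction


def moment_matrix(offsets):
    """Rows of (7.16) divided by dx, for uniform cells j+m with m in offsets.

    Entry [row][n-1] = (xi_{m+1/2}^n - xi_{m-1/2}^n) / n multiplies a_{n-1}.
    """
    size = len(offsets)
    rows = []
    for m in offsets:
        lo, hi = Fraction(2 * m - 1, 2), Fraction(2 * m + 1, 2)
        rows.append([(hi**n - lo**n) / n for n in range(1, size + 1)])
    return rows


def solve(A, b):
    """Exact Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def edge_weights(offsets, xi_edge=Fraction(1, 2)):
    """Weights w_m such that T(xi_edge) = sum_m w_m * Tbar_{j+m}."""
    A = moment_matrix(offsets)
    weights = []
    for k in range(len(offsets)):
        unit = [Fraction(int(i == k)) for i in range(len(offsets))]
        coeffs = solve(A, unit)  # a_0..a_P for data equal to 1 in cell j+offsets[k]
        weights.append(sum(a * xi_edge**p for p, a in enumerate(coeffs)))
    return weights


print(edge_weights([0]))         # piecewise constant
print(edge_weights([0, 1]))      # piecewise linear, cells j and j+1
print(edge_weights([-1, 0, 1]))  # piecewise parabolic, cells j-1, j, j+1
```

Changing `offsets` (for example to a one-sided stencil near a boundary) gives the corresponding edge weights with no further algebra. This is the appeal of writing the reconstruction as the linear system (7.10).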
Infinitely smooth profile

Our first example consists of reconstructing the function T = cos πx on the interval −1 ≤ x ≤ 1, starting from its analytical cell average over cell j:

$$\overline{T}_j = \frac{1}{\delta x}\int_{x_{j-\frac12}}^{x_{j+\frac12}}\cos\pi x\,dx = \frac{\sin\pi x_{j+\frac12} - \sin\pi x_{j-\frac12}}{\pi\delta x} \tag{7.29}$$

The symbols in figures 7.5-7.7 show the results of the piecewise constant, linear and parabolic reconstructions, respectively. The error at each cell edge is the height difference between the solid curve (reference solution) and the symbol.

Remarks

• The 4-cell discretization is obviously too coarse to represent the cosine wave properly, particularly for the piecewise constant case.

• With the 8-cell discretization (top right panel of the figures), going from linear to parabolic reconstruction brings a dramatic improvement. The piecewise constant reconstruction still incurs substantial errors.

• The piecewise linear and parabolic reconstructions produce out-of-bound values at x = ±1, i.e. values outside the range of the initial data, whose absolute value is bounded by 1. This happens because extrapolation had to be used near the edges. It is a recurrent theme in methods whose stencil exceeds a single cell.

• The error decreases with increasing cell numbers (decreasing δx) for all three reconstruction methods considered.

• It is visually apparent that the error decreases fastest for the piecewise parabolic reconstruction and slowest for the piecewise constant one.

An important issue concerns the rate at which the error decreases for each method. Figure 7.8 shows the convergence curves for the different methods as the number of cells is increased. The slope of the curves on this log-log plot gives the order of the method. For small enough δx, the slopes for the constant, linear and parabolic reconstructions asymptote to −1, −2 and −3, respectively. This indicates convergence of O(δx), O(δx²) and O(δx³).

Figure 7.5: Reconstruction at cell edges using piecewise constant reconstruction with 4 (top left), 8 (top right), 16 (bottom left) and 32 (bottom right) cells. The solid curved line is the exact solution, whereas the staircases represent the cells and the cell averages. In the present case we have assumed that the wind is blowing from left to right except at the rightmost cell. The * symbols are the values of T calculated by the reconstruction procedure. The error is thus the height difference between the solid curve and the symbols.

Figure 7.6: Same as figure 7.5 but using piecewise linear reconstruction across 2 cells. One-sided reconstruction was used at both end-edges.

Figure 7.7: Same as figure 7.5 but using piecewise parabolic reconstruction across 3 cells. One-sided reconstruction was used for the end-edges and for the first internal left edge.

Figure 7.8: Convergence curves (max|T − Te| versus number of cells = 2/δx) showing first-, second- and third-order convergence for the piecewise constant (black), linear (red) and parabolic (blue) reconstruction procedures, respectively. Reference slopes of N⁻¹, N⁻² and N⁻³ are shown as dashed lines for comparison.

Figure 7.9: Quadratic reconstruction for a function with a discontinuity using, from left to right, 10, 20 and 40 cells, respectively, with maximum errors of 0.25, 0.1623 and 0.1580.
Reconstructing Functions with Discontinuities

The function

$$T(x) = \frac12\left[\tanh\frac{x + a}{\delta l} - \tanh\frac{x - a}{\delta l}\right] \tag{7.30}$$

exhibits two transition zones of width δl at x = ±a, as shown in figure 7.9. When δl becomes smaller than the grid spacing, the transition zones look like discontinuities in the function. High-order reconstruction of functions with discontinuities is problematic because of spurious oscillations, visible near the discontinuities in figure 7.9. The maximum error norm decreases only marginally as the number of cells is increased. It does not exhibit the cubic decrease expected for smooth functions. Notice also that the quadratic reconstruction has produced values outside the range of the original data. They are larger than 1 at the left discontinuity and negative near the right discontinuity. These oscillations are commonly known as Gibbs oscillations, and their amplitude does not decrease with increasing resolution. They are a direct consequence of applying the reconstruction across a stencil that includes a discontinuity. Notice that the first- and second-order reconstructions cannot produce out-of-range values, and as such are preferable near sharp transition zones.

Many remedies have been proposed to mitigate these oscillations. They share a common thread. First, they test the smoothness of the solution locally. Second, they vary the order of the reconstruction and/or the stencil to avoid generating the oscillations. Low-order non-oscillatory schemes are used in sharp transition zones, whereas high-order reconstruction is used in smooth regions for improved accuracy. Here we explore one such procedure, the Weighted Essentially Non-Oscillatory scheme (WENO), proposed by Shu and colleagues.
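The two behaviours described above can both be seen on a periodic step profile. Parabolic edge values leave the data range at a jump, while the first-order scheme stays bounded and conservative. This is a sketch assuming u > 0, a uniform periodic grid and forward Euler time stepping for the donor-cell update of (7.12) with α = 0. The helper names are hypothetical. Incidentally, differencing the parabolic edge values (7.28) across a cell reproduces exactly the third-order upwind stencil (6.38).

```python
def edge_values(tbar, scheme):
    """Upwind-biased value at edge j+1/2 of each cell j, for u > 0, periodic grid."""
    n = len(tbar)
    if scheme == "constant":  # eq. (7.19) with u > 0
        return list(tbar)
    if scheme == "parabolic":  # eq. (7.28); tbar[j - 1] wraps around at j = 0
        return [(-tbar[j - 1] + 5 * tbar[j] + 2 * tbar[(j + 1) % n]) / 6 for j in range(n)]
    raise ValueError(scheme)


def donor_cell_step(tbar, u, dx, dt):
    """Forward-Euler step of (7.12) with alpha = 0 and piecewise-constant upwind fluxes."""
    flux = [u * t for t in edge_values(tbar, "constant")]  # flux[j] = F_{j+1/2}
    # flux[j - 1] wraps around for j = 0, giving F_{-1/2} on the periodic grid
    return [tbar[j] - dt / dx * (flux[j] - flux[j - 1]) for j in range(len(tbar))]


step = [0.0] * 10 + [1.0] * 10 + [0.0] * 10
par = edge_values(step, "parabolic")
print(max(par), min(par))  # parabolic edge values leave the range [0, 1]

t = step
for _ in range(40):
    t = donor_cell_step(t, u=1.0, dx=1.0, dt=0.5)  # Courant number 0.5
print(sum(t), min(t), max(t))  # total is conserved, values stay within [0, 1]
```

With a Courant number at most 1, each donor-cell update is a convex combination of neighbouring cell averages. This is why it cannot create new extrema. The flux-difference form makes the sum of the cell averages telescope, so it is conserved to round-off.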
7.5 Finite Volume Method for Scalar Advection in 2D

Figure 7.10: Cartesian finite volume grid with rectangular cells (columns j−1, j, j+1; rows k−1, k, k+1). The solid point represents the average of T over a cell, while the arrows denote the x-y advective fluxes into a cell through its 4 edges.

The spatial discretization proceeds by dividing the domain into finite volume cells whose shape is left to the user. Finite volume codes have commonly used triangular cells (e.g. FVCOM) or rectangular cells. With triangles the grid can be unstructured, resembling those used in finite element methods. With rectangles the grids are structured and look like finite difference grids. In recent years the atmospheric community has spearheaded the development of hexagonal finite volume cells, because they can discretize spherical surfaces efficiently. Here we focus exclusively on rectangular cells, as depicted in figure 7.10.

We now evaluate the different terms appearing in equation 7.7. Here we presume that there is no diffusion (α = 0). We have the following remarks:

• The cell volume is actually an area in 2D. We denote by δA = ∆x∆y the area of the cell, where ∆x and ∆y are the cell sizes in the x and y directions.

• The cells can be referenced by a pair of indices (j, k) along the (x, y) directions. The cell center has coordinates $(x_j, y_k)$, and the cell walls are located at $x_j \pm \frac{\Delta x}{2}$ and $y_k \pm \frac{\Delta y}{2}$.

• Each cell has four edges, each with a constant normal. At $x = x_j + \Delta x/2$ the outward unit normal to cell (j, k) points in the positive x-direction, and we have u·n = u. At $x = x_j - \Delta x/2$ it points in the negative x-direction, and we have u·n = −u.
Similarly, along $y = y_k + \Delta y/2$ we have $\mathbf{u}\cdot\mathbf{n} = v$, whereas along $y = y_k - \Delta y/2$ we have $\mathbf{u}\cdot\mathbf{n} = -v$.

With these remarks we can now write down the finite volume equation for cell $(j, k)$:
$$\delta A\,\frac{d\bar T_{j,k}}{dt} + \int_{y_k-\frac{\Delta y}{2}}^{y_k+\frac{\Delta y}{2}}\left[u(x_{j+\frac12},y)\,T(x_{j+\frac12},y) - u(x_{j-\frac12},y)\,T(x_{j-\frac12},y)\right]dy + \int_{x_j-\frac{\Delta x}{2}}^{x_j+\frac{\Delta x}{2}}\left[v(x,y_{k+\frac12})\,T(x,y_{k+\frac12}) - v(x,y_{k-\frac12})\,T(x,y_{k-\frac12})\right]dx = 0 \tag{7.31}$$
where $x_{j\pm\frac12} = x_j \pm \frac{\Delta x}{2}$ and $y_{k\pm\frac12} = y_k \pm \frac{\Delta y}{2}$. In this form we see that the equation is nothing but an accounting of the flux entering and leaving cell $(j, k)$. For ease of notation we denote the $x$ and $y$ fluxes by $(F, G) = (uT, vT)$, and the equation can be rewritten as:
$$\delta A\,\frac{d\bar T_{j,k}}{dt} + \int_{y_k-\frac{\Delta y}{2}}^{y_k+\frac{\Delta y}{2}}\left[F(x_{j+\frac12},y) - F(x_{j-\frac12},y)\right]dy + \int_{x_j-\frac{\Delta x}{2}}^{x_j+\frac{\Delta x}{2}}\left[G(x,y_{k+\frac12}) - G(x,y_{k-\frac12})\right]dx = 0$$

An added wrinkle of the 2D finite volume formulation is the need to evaluate boundary integrals (which were not encountered in the 1D case). The silver lining is that all the integrals have the form
$$\int_{z_m-\frac{\Delta z}{2}}^{z_m+\frac{\Delta z}{2}} H(z)\,dz \tag{7.32}$$
where $H(z)$ is some function of $z$. Here we use a simple second-order integration scheme, the mid-point rule, and write
$$\int_{z_m-\frac{\Delta z}{2}}^{z_m+\frac{\Delta z}{2}} H(z)\,dz = H(z_m)\,\Delta z + O(\Delta z^3) \tag{7.33}$$
The error of the integral itself is $O(\Delta z^3)$ (it equals $\frac{\Delta z^3}{24}H''$ evaluated at some point of the interval), so the error in the edge-averaged value $\frac{1}{\Delta z}\int H\,dz$ is $O(\Delta z^2)$; this is the sense in which the rule is second order.

[Figure 7.11: The midpoint rule approximates the area under the black curve with the area under the red rectangle. The rectangle height is determined by the function value at the mid-point of the interval, the dashed red curve.]

A geometric interpretation of the mid-point rule is shown in figure 7.11. The area under the function $H(z)$, shown in black, is approximated by the area under the red rectangle, whose height is the function value at the mid-point $z_m$. The integration hence reduces to an evaluation of the function at the edge center multiplied by the size of the edge, $\Delta z$.
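A quick numerical check of the order of the mid-point rule 7.33, sketched in Python with $H(z) = e^z$ and $z_m = 0$ as arbitrary test choices: halving $\Delta z$ should divide the error of the single-interval integral by about 8, consistent with an $O(\Delta z^3)$ integral error (second order for the edge average), while a linear $H$ is integrated exactly.

```python
import math

def midpoint(H, zm, dz):
    """Mid-point rule (eq. 7.33) on [zm - dz/2, zm + dz/2]."""
    return H(zm) * dz

def exact_exp_integral(zm, dz):
    """Exact integral of exp(z) over [zm - dz/2, zm + dz/2]."""
    return math.exp(zm + dz / 2) - math.exp(zm - dz / 2)

errors = [exact_exp_integral(0.0, dz) - midpoint(math.exp, 0.0, dz)
          for dz in (0.1, 0.05, 0.025)]
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print("error ratios when halving dz:", ratios)   # close to 8 = 2**3

# A linear integrand is integrated exactly: the integral of 2 + 3z over [0.5, 1.5] is 5
linear_err = abs(midpoint(lambda z: 2.0 + 3.0 * z, 1.0, 1.0) - 5.0)
```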
Applying the mid-point rule to the edge integrals, we write
$$\int_{y_k-\frac{\Delta y}{2}}^{y_k+\frac{\Delta y}{2}} F(x_{j+\frac12}, y)\,dy \approx F(x_{j+\frac12}, y_k)\,\Delta y \tag{7.34}$$
$$\int_{x_j-\frac{\Delta x}{2}}^{x_j+\frac{\Delta x}{2}} G(x, y_{k+\frac12})\,dx \approx G(x_j, y_{k+\frac12})\,\Delta x \tag{7.35}$$
The above integration formulas are exact if $F$ and $G$ vary linearly along the cell edges. Higher-order integration formulas could be used but would involve substantially more work. The final form of the 2D FV advection equation becomes:
$$\frac{d\bar T_{j,k}}{dt} = -\frac{\left[F(x_{j+\frac12}, y_k) - F(x_{j-\frac12}, y_k)\right]\Delta y + \left[G(x_j, y_{k+\frac12}) - G(x_j, y_{k-\frac12})\right]\Delta x}{\delta A} \tag{7.36}$$
Equation 7.36 is an ordinary differential equation governing the time evolution of the cell-averaged tracer. Its time integration can be performed by one of the time-stepping schemes discussed previously; an appropriate scheme is the third-order Runge-Kutta method (RK3). The right-hand side of this ODE requires only the computation of the flux divergence for cell $(j, k)$. The final missing piece is the reconstruction of the function values at the cell edges prior to computing the fluxes, a topic we follow up on in the next section.

7.5.1 Function reconstruction in 2D

The flux integration requires only the calculation of:
$$F\left(x_j + \tfrac{\Delta x}{2}, y_k\right) = F_{j+\frac12,k} = u\left(x_j + \tfrac{\Delta x}{2}, y_k\right) T\left(x_j + \tfrac{\Delta x}{2}, y_k\right) \tag{7.37}$$
$$G\left(x_j, y_k + \tfrac{\Delta y}{2}\right) = G_{j,k+\frac12} = v\left(x_j, y_k + \tfrac{\Delta y}{2}\right) T\left(x_j, y_k + \tfrac{\Delta y}{2}\right) \tag{7.38}$$
The scheme requires the evaluation of the function $T$ from its cell averages in neighboring cells. The 2D geometry now affords a number of cells over which to do the reconstruction. One can divide finite volume methods into roughly two categories: those that use a skewed stencil to reconstruct the function, and those that rely on a straightforward dimension-by-dimension reconstruction. It is the latter that will be presented in these notes. Again we will map our cells into a unit cell $|\xi|, |\eta| \le \frac12$ according to
$$\xi = \frac{x - x_j}{\Delta x}, \qquad \eta = \frac{y - y_k}{\Delta y} \tag{7.39}$$
where $(x_j, y_k)$ is the center of cell $(j, k)$.
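As a sketch of how equation 7.36 and its time integration might be coded, the Python below stores the x-fluxes $F$ on the $M+1$ vertical edges and the y-fluxes $G$ on the $N+1$ horizontal edges (arrays of shape `(M+1, N)` and `(M, N+1)`, the same layout as the velocity arrays of the code-design section) and returns the flux divergence of every cell. The notes do not say which third-order Runge-Kutta method is meant; the three-stage strong-stability-preserving scheme of Shu and Osher used here is an assumption, as are the function names.

```python
import numpy as np

def flux_divergence(F, G, dx, dy):
    """Right-hand side of eq. 7.36 for every cell.
    F: x-fluxes on vertical edges, shape (M+1, N); G: y-fluxes on horizontal edges, shape (M, N+1).
    Since dA = dx*dy, [dF*dy + dG*dx]/dA reduces to dF/dx + dG/dy."""
    return -(F[1:, :] - F[:-1, :]) / dx - (G[:, 1:] - G[:, :-1]) / dy

def rk3_step(Tb, dt, rhs):
    """One step of the three-stage, third-order SSP Runge-Kutta scheme (Shu-Osher form).
    rhs(Tb) must return dTb/dt, e.g. flux_divergence of fluxes built from Tb."""
    T1 = Tb + dt * rhs(Tb)
    T2 = 0.75 * Tb + 0.25 * (T1 + dt * rhs(T1))
    return Tb / 3.0 + (2.0 / 3.0) * (T2 + dt * rhs(T2))

# Closed-basin check: zero flux through the outer edges conserves the tracer total.
rng = np.random.default_rng(0)
M, N, dx, dy = 5, 4, 1.0, 2.0
F = rng.standard_normal((M + 1, N))
F[0, :] = F[-1, :] = 0.0
G = rng.standard_normal((M, N + 1))
G[:, 0] = G[:, -1] = 0.0
budget_change = flux_divergence(F, G, dx, dy).sum() * dx * dy   # vanishes up to round-off
```

In a full model, `rhs` would reconstruct edge values from `Tb`, multiply them by the edge velocities to form `F` and `G`, and return `flux_divergence(F, G, dx, dy)`; the RK3 routine calls it once per stage, three times per step.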
Piecewise constant 2D reconstruction

In this case the function $T$ is a constant that varies with neither $\xi$ nor $\eta$ within the finite volume cell. This constant must obviously be equal to the cell average, and we have $T = \bar T_{j,k}$. The reconstructed function values on the cell edges are the upwind cell averages:
$$T_{j+\frac12,k} = \begin{cases}\bar T_{j,k} & \text{for } u_{j+\frac12,k} \ge 0\\ \bar T_{j+1,k} & \text{for } u_{j+\frac12,k} < 0\end{cases}, \qquad T_{j,k+\frac12} = \begin{cases}\bar T_{j,k} & \text{for } v_{j,k+\frac12} \ge 0\\ \bar T_{j,k+1} & \text{for } v_{j,k+\frac12} < 0\end{cases} \tag{7.40}$$

Piecewise linear 2D reconstruction

In this case the function $T$ is allowed to vary linearly with $\xi$ and $\eta$. An obvious candidate function is
$$T = a_0 + a_1\xi + b_1\eta \tag{7.41}$$
where $a_0$, $a_1$ and $b_1$ are unknown coefficients that must be determined by imposed conditions, for example by requiring that the cell averages of cells $(j, k)$, $(j+1, k)$ and $(j, k+1)$ be recovered. It is then easy to show that
$$a_0 = \bar T_{j,k}, \qquad a_1 = \bar T_{j+1,k} - \bar T_{j,k}, \qquad b_1 = \bar T_{j,k+1} - \bar T_{j,k} \tag{7.42}$$
This reconstruction is second order. The edge values can now be recovered:
$$T_{j+\frac12,k} = T\left(\xi = \tfrac12, \eta = 0\right) = \frac{\bar T_{j,k} + \bar T_{j+1,k}}{2} \tag{7.43}$$
$$T_{j,k+\frac12} = T\left(\xi = 0, \eta = \tfrac12\right) = \frac{\bar T_{j,k} + \bar T_{j,k+1}}{2} \tag{7.44}$$

Piecewise bilinear 2D reconstruction

The linear variation proposed in 7.41 is not the only possible reconstruction. An alternative is a bilinear reconstruction:
$$T = a_0 + a_1\xi + b_1\eta + c_1\xi\eta \tag{7.45}$$
This function varies linearly along lines of constant $\xi$ or $\eta$, and behaves quadratically along any other line. Four constraints are now needed to determine the four constants: the recovery of the cell averages over cells $(j, k)$, $(j+1, k)$, $(j, k+1)$ and $(j+1, k+1)$.
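A minimal Python sketch of equations 7.40 and 7.43-7.44, assuming the array layout of the code-design section (cell averages `Tb` of shape `(M, N)`, `u` of shape `(M+1, N)` on vertical edges and `v` of shape `(M, N+1)` on horizontal edges, with 0-based indices). Treating the outer edges as closed walls with zero flux is an assumption suited to a closed basin such as the Stommel gyre; the function names are hypothetical. The resulting `F` and `G` are the edge fluxes that the flux divergence of equation 7.36 consumes.

```python
import numpy as np

def edge_values_upwind(Tb, u, v):
    """Piecewise-constant (upwind) edge values, eq. 7.40, on the interior edges.
    Returns Tx of shape (M-1, N) on vertical edges and Ty of shape (M, N-1) on horizontal edges."""
    Tx = np.where(u[1:-1, :] >= 0.0, Tb[:-1, :], Tb[1:, :])
    Ty = np.where(v[:, 1:-1] >= 0.0, Tb[:, :-1], Tb[:, 1:])
    return Tx, Ty

def edge_values_linear(Tb, u, v):
    """Piecewise-linear edge values, eqs. 7.43-7.44 (independent of the velocity sign)."""
    return 0.5 * (Tb[:-1, :] + Tb[1:, :]), 0.5 * (Tb[:, :-1] + Tb[:, 1:])

def advective_fluxes(Tb, u, v, edge_values=edge_values_upwind):
    """F = uT on vertical edges, shape (M+1, N), and G = vT on horizontal edges, shape (M, N+1).
    Outer edges are treated as closed walls with zero flux (assumed closed basin)."""
    M, N = Tb.shape
    Tx, Ty = edge_values(Tb, u, v)
    F = np.zeros((M + 1, N))
    G = np.zeros((M, N + 1))
    F[1:-1, :] = u[1:-1, :] * Tx
    G[:, 1:-1] = v[:, 1:-1] * Ty
    return F, G

# Usage: u > 0 picks the cell on the left, v < 0 picks the cell above (k+1)
Tb = np.arange(6.0).reshape(3, 2)   # M = 3, N = 2
u = np.ones((4, 2))
v = -np.ones((3, 3))
Tx, Ty = edge_values_upwind(Tb, u, v)
F, G = advective_fluxes(Tb, u, v)
```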
The associated system of equations is
$$\int_{q-\frac12}^{q+\frac12}\int_{p-\frac12}^{p+\frac12}\left(a_0 + a_1\xi + b_1\eta + c_1\xi\eta\right)d\xi\,d\eta = \bar T_{j+p,k+q}, \qquad p, q = 0, 1 \tag{7.46}$$
This leads to the matrix equation and solution:
$$\begin{pmatrix}1&0&0&0\\1&1&0&0\\1&0&1&0\\1&1&1&1\end{pmatrix}\begin{pmatrix}a_0\\a_1\\b_1\\c_1\end{pmatrix} = \begin{pmatrix}\bar T_{j,k}\\\bar T_{j+1,k}\\\bar T_{j,k+1}\\\bar T_{j+1,k+1}\end{pmatrix}, \qquad \begin{pmatrix}a_0\\a_1\\b_1\\c_1\end{pmatrix} = \begin{pmatrix}1&0&0&0\\-1&1&0&0\\-1&0&1&0\\1&-1&-1&1\end{pmatrix}\begin{pmatrix}\bar T_{j,k}\\\bar T_{j+1,k}\\\bar T_{j,k+1}\\\bar T_{j+1,k+1}\end{pmatrix} \tag{7.47}$$
It is now a simple matter of evaluating the function at the flux integration points:
$$T_{j+\frac12,k} = T\left(\xi=\tfrac12, \eta=0\right) = a_0 + \tfrac12 a_1 = \frac{\bar T_{j,k} + \bar T_{j+1,k}}{2} \tag{7.48}$$
$$T_{j,k+\frac12} = T\left(\xi=0, \eta=\tfrac12\right) = a_0 + \tfrac12 b_1 = \frac{\bar T_{j,k} + \bar T_{j,k+1}}{2} \tag{7.49}$$
Equations 7.48-7.49 and 7.43-7.44 are identical at the centers of the cell edges (in fact the two reconstructions coincide along the entire lines $\xi = 0$ and $\eta = 0$), but they differ at other points. For example, at the corner of the cell $(\xi, \eta) = (\frac12, \frac12)$ the linear interpolation of 7.41 yields
$$T_{j+\frac12,k+\frac12} = \frac{\bar T_{j+1,k} + \bar T_{j,k+1}}{2} \tag{7.50}$$
whereas the bilinear interpolation of 7.45 yields
$$T_{j+\frac12,k+\frac12} = \frac{\bar T_{j,k} + \bar T_{j+1,k} + \bar T_{j,k+1} + \bar T_{j+1,k+1}}{4} \tag{7.51}$$
Both linear and bilinear interpolations are valid and yield second-order accuracy in the grid spacing. The bilinear interpolation includes information from corner neighbors in its reconstruction. However, this information is not used when the flux is approximated with a single mid-point evaluation at the cell edges. For this information to flow through the algorithm we need to modify the flux integral and evaluate it more accurately. This would be useful when the flow is not aligned with grid lines.

| Q | $z_q$ | $\omega_q$ | $\xi_q$ | $\omega_q$ |
|---|---|---|---|---|
| 2 | $\pm\frac{\sqrt3}{3}$ | 1 | $\pm\frac{\sqrt3}{6}$ | $\frac12$ |
| 3 | $\pm\frac{\sqrt{15}}{5}$ | $\frac59$ | $\pm\frac{\sqrt{15}}{10}$ | $\frac{5}{18}$ |
|   | 0 | $\frac89$ | 0 | $\frac{8}{18}$ |
| 4 | $\pm\sqrt{\frac{15+2\sqrt{30}}{35}}$ | 0.347854845137454 | $\pm\frac12\sqrt{\frac{15+2\sqrt{30}}{35}}$ | 0.173927422568727 |
|   | $\pm\sqrt{\frac{15-2\sqrt{30}}{35}}$ | 0.652145154862546 | $\pm\frac12\sqrt{\frac{15-2\sqrt{30}}{35}}$ | 0.326072577431273 |

Table 7.1: Table of Gauss quadrature points. Columns 2-3 show the values for the standard interval $[-1, 1]$, while columns 4-5 show the values for the interval $|\xi| \le \frac12$.
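The algebra of 7.47-7.51 and the entries of table 7.1 can be checked numerically. The Python sketch below (the cell averages 1, 2, 3, 5 and the edge velocity $u = 1 + \eta$ are made-up illustrative values, and the function names are hypothetical) solves 7.47 for the bilinear coefficients, evaluates the corner and edge-center values, and then integrates the flux $uT$ along the right edge $\xi = \frac12$ with the two-point Gauss rule on $|\eta| \le \frac12$, obtained by halving the points and weights returned by NumPy's `numpy.polynomial.legendre.leggauss`. Because $uT$ is quadratic in $\eta$ along that edge, the two-point rule is exact while a single mid-point evaluation misses the $c_1$ (corner) contribution.

```python
import numpy as np

# Matrix of eq. 7.47: row (p, q) holds the averages of 1, xi, eta and xi*eta over the
# unit cell centred at (xi, eta) = (p, q).
A = np.array([[1.0, 0.0, 0.0, 0.0],    # (p, q) = (0, 0): cell (j, k)
              [1.0, 1.0, 0.0, 0.0],    # (1, 0): cell (j+1, k)
              [1.0, 0.0, 1.0, 0.0],    # (0, 1): cell (j, k+1)
              [1.0, 1.0, 1.0, 1.0]])   # (1, 1): cell (j+1, k+1)

def bilinear_coeffs(T00, T10, T01, T11):
    """(a0, a1, b1, c1) of eq. 7.45 reproducing the four cell averages."""
    return np.linalg.solve(A, np.array([T00, T10, T01, T11]))

def T_bilinear(coef, xi, eta):
    a0, a1, b1, c1 = coef
    return a0 + a1 * xi + b1 * eta + c1 * xi * eta

def gauss_half_interval(Q):
    """Gauss-Legendre points and weights on |xi| <= 1/2 (right half of table 7.1)."""
    z, w = np.polynomial.legendre.leggauss(Q)
    return 0.5 * z, 0.5 * w

def u_edge(eta):
    """Hypothetical normal velocity varying along the right edge."""
    return 1.0 + eta

coef = bilinear_coeffs(1.0, 2.0, 3.0, 5.0)   # illustrative cell averages
corner = T_bilinear(coef, 0.5, 0.5)          # eq. 7.51
edge_center = T_bilinear(coef, 0.5, 0.0)     # eq. 7.48

eta_q, w_q = gauss_half_interval(2)
flux_gauss = np.sum(u_edge(eta_q) * T_bilinear(coef, 0.5, eta_q) * w_q)  # eq. 7.53 per unit dy
flux_mid = u_edge(0.0) * edge_center                                       # eq. 7.34 per unit dy
print(coef, corner, edge_center, flux_gauss, flux_mid)
```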
Higher order 2D reconstruction

Higher-order reconstruction in multiple dimensions is possible; however, it requires a substantial increase in the amount of computation and coding. The first complication has to do with the increased complexity of the reconstruction polynomials. This is relatively easy to handle, as it can simply be done with a dimension-by-dimension approach. The additional complication is the need to replace the mid-point integration rule, equations 7.34-7.35, by something more accurate. One example is replacing the mid-point rule by a Gauss quadrature formula:
$$\int_{-1}^{1} f(z)\,dz \approx \sum_{q=1}^{Q} f(z_q)\,\omega_q \tag{7.52}$$
where the $z_q$ are the quadrature points, the $\omega_q$ are the quadrature weights, and $Q$ is the number of quadrature points. The approximation is exact if $f(z)$ is a polynomial of degree at most $2Q - 1$ in $z$. Gauss quadrature roots and weights are shown for the standard interval $[-1, 1]$ and for $[-\frac12, \frac12]$ in table 7.1. Focusing on the integrals of the fluxes through the right and top edges of the cell, we need to evaluate
$$\int_{y_k-\frac{\Delta y}{2}}^{y_k+\frac{\Delta y}{2}} F(x_{j+\frac12}, y)\,dy = \Delta y\int_{-\frac12}^{\frac12} F\left(\xi = \tfrac12, \eta\right)d\eta \approx \Delta y\sum_{q=1}^{Q} F\left(\tfrac12, \eta_q\right)\omega_q \tag{7.53}$$
$$\int_{x_j-\frac{\Delta x}{2}}^{x_j+\frac{\Delta x}{2}} G(x, y_{k+\frac12})\,dx = \Delta x\int_{-\frac12}^{\frac12} G\left(\xi, \eta = \tfrac12\right)d\xi \approx \Delta x\sum_{q=1}^{Q} G\left(\xi_q, \tfrac12\right)\omega_q \tag{7.54}$$
where the points and weights are those for the interval $|\xi| \le \frac12$ in table 7.1. These Gauss quadrature formulas require the evaluation of the flux at multiple points along the edges; hence the reconstruction should be capable of providing values for $T(\frac12, \eta_q)$ and $T(\xi_q, \frac12)$. Note also that the advective velocity needs to be available at these points.

7.6 Algorithm Summary

We are now in a position to summarize the solution process.

1. Read or set the physical and numerical parameters of the problem.
2. Define the domain's geometry, including the number of cells in the x and y directions, the grid sizes, and the cell areas. It would also be helpful to store the coordinates of the cell centers as well as those of the cell edges.
3. Define the flow field. The code for the Stommel gyre will be provided.
4. Define the output units and the output format of the files, and compute some preliminary diagnostics, like the budget of T over the domain.
5. Define the initial distribution of the tracer $T^0_{j,k}$, from which the cell averages at the initial time can be deduced. In keeping with the second-order accuracy philosophy, we will take the cell averages to be the value of $T$ at the center of the cell. It is easy to convince yourself, using a two-dimensional version of the mid-point rule, that $\bar T_{j,k} = T(x_j, y_k) + O(\Delta x^2, \Delta y^2)$.
6. Start a time loop that calls a subroutine to perform a single time step using the RK3 routine.
7. The RK3 routine calls a function to compute the right-hand side of the ODE, equation 7.36. It needs to call it three times, once for each of the 3 stages of the scheme. The right-hand side function should accept the cell averages and return the flux divergence.
8. The right-hand side computation requires the evaluation of the fluxes at cell edges prior to computing the flux divergence. These are obtained by reconstructing the function values at the cell edges and multiplying by the velocity field.
9. Output some diagnostics, like the T budget within the domain and the extrema of the field. The solution needs to be saved intermittently for examination.

7.7 Code Design

You should start with the one-dimensional finite volume code that was presented in class, as it already includes many of the elements listed above. What is required is to transform the code from 1D to 2D. One sequence of modifications is as follows:

7.7.1 Data Structure

The most efficient way to store the various data is in two-dimensional arrays. The dimensions of the arrays differ: the cell averages must be stored in Tb(M,N), where M and N are the number of cells in the x and y directions, respectively.

7.7.2 Domain Geometry

The domain description needs to be upgraded to account for the multi-dimensionality of the problem. The grid information is contained in module grid.f90.
The following information needs to be added to it:

• The grid size in the y-direction.
• The y-coordinates of the cell edges: $y_{k-\frac12} = (k-1)\Delta y$, with $k = 1, 2, \ldots, N+1$, where N is the number of cells in the y-direction. The y-coordinates of the cell centers may also be needed and should be stored.

7.7.3 Flow

The flow field information should be updated. The u velocity should be made two-dimensional, and the y component of the velocity should also be added. Notice that u needs to be defined at vertical cell edges, and consequently it should be declared as u(M+1,N), whereas v should be declared as v(M,N+1). Refer to figure 7.10 for further information about the grid.

7.7.4 T initialization

The initial condition was set in a subroutine included in module params.f90, which contains all the problem statements. This subroutine should be made two-dimensional and be passed the proper data in its argument list or via the modules it includes.

7.8 Tracer Advection in a Stommel Gyre

Here, we revisit the Hecht advection test problem (???) to characterize the behavior of the high-order scheme in the under-resolved regime. The Hecht advection test consists of advecting a passive tracer in a Stommel gyre, and in particular through an intense western boundary current where the smooth-looking Gaussian hill undergoes intense deformation. Under-resolved features in this regime produce substantial noise in the solution and fail to propagate the solution downstream and out of the boundary current region, where the hill reconstitutes itself.

7.8.1 The flow field

The flow field is given by the so-called Stommel gyre model, which is an idealized version of the Gulf Stream system. The flow is characterized by an intense western boundary current that moves waters northward in a narrow zone whose width depends on the β effect and the bottom friction value. The flow in the rest of the domain is slow and has little shear.
The streamfunction for this flow is given by:
$$\psi = \Psi \sin\left(\frac{\pi y}{b}\right)\left[P e^{Ax} + (1-P)e^{Bx} - 1\right] \tag{7.55}$$
$$P = \frac{1 - e^{Ba}}{e^{Aa} - e^{Ba}} \tag{7.56}$$
where a and b are the zonal and meridional sizes of the basin, respectively (the western and southern boundaries are located at x = 0 and y = 0, respectively). The parameters A and B determine the width of the western boundary current region, and these, in turn, depend on the ratio of the β parameter, β, to the drag coefficient, $C_d$:
$$\alpha = \frac{\beta}{C_d} \tag{7.57}$$
$$A = -\frac{\alpha}{2} + \sqrt{\frac{\alpha^2}{4} + \left(\frac{\pi}{b}\right)^2} \tag{7.58}$$
$$B = -\frac{\alpha}{2} - \sqrt{\frac{\alpha^2}{4} + \left(\frac{\pi}{b}\right)^2} \tag{7.59}$$

[Figure 7.12: Streamlines of the Stommel gyre flow field. The flow is characterized by an intense western boundary current and a slow-moving southward current elsewhere in the basin. The bull's eye in the lower left corner is the initial distribution of T.]

The strength of the transport depends on the wind stress, τ, the water density, ρ, the drag coefficient and the meridional size of the basin:
$$\Psi = \frac{\tau\pi}{\rho C_d b}\left(\frac{b}{\pi}\right)^2 \tag{7.60}$$
The advective velocity components can be obtained by differentiating the streamfunction and dividing by the depth of the basin, D:
$$u = \frac{\psi_y}{D}, \qquad v = -\frac{\psi_x}{D} \tag{7.61}$$

7.8.2 The initial condition

The initial condition for our tracer, which we can think of as a pollutant dumped in the ocean, is a circular Gaussian hill distribution with a decay length scale l:
$$T = \exp\left[-\frac{(x - x_c)^2 + (y - y_c)^2}{l^2}\right] \tag{7.62}$$
and the center of the profile is located at $(x_c, y_c) = \left(\frac{a}{3}, \frac{b}{3}\right)$.
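A Python sketch of equations 7.55-7.62 follows. The numerical parameter values (basin size, β, $C_d$, τ, ρ, D and the decay scale l) are illustrative assumptions, not the settings of the Hecht test used in these notes, and the velocities use the reading $u = \psi_y/D$, $v = -\psi_x/D$ of equation 7.61 with τ > 0, for which the western boundary current flows northward and the interior flow southward. In a finite volume code, `u` would be sampled at the vertical edges and `v` at the horizontal edges, matching the `u(M+1,N)` and `v(M,N+1)` layout of the code-design section.

```python
import numpy as np

# Illustrative (assumed) parameters, not the values used in the notes
a, b = 10.0e6, 6.0e6                 # zonal, meridional basin size [m]
beta, Cd = 2.0e-11, 1.0e-6           # beta [1/(m s)], linear bottom drag [1/s]
tau, rho, D = 0.1, 1000.0, 2500.0    # wind stress [N/m^2], density [kg/m^3], depth [m]

alpha = beta / Cd                                          # eq. 7.57
root = np.sqrt(alpha**2 / 4 + (np.pi / b) ** 2)
A, B = -alpha / 2 + root, -alpha / 2 - root                # eqs. 7.58-7.59
P = (1 - np.exp(B * a)) / (np.exp(A * a) - np.exp(B * a))  # eq. 7.56
Psi = tau * np.pi / (rho * Cd * b) * (b / np.pi) ** 2      # eq. 7.60

def psi(x, y):
    """Streamfunction, eq. 7.55."""
    return Psi * np.sin(np.pi * y / b) * (P * np.exp(A * x) + (1 - P) * np.exp(B * x) - 1)

def velocity(x, y):
    """u = psi_y / D and v = -psi_x / D (eq. 7.61), using analytic derivatives of eq. 7.55."""
    shape = P * np.exp(A * x) + (1 - P) * np.exp(B * x) - 1
    dshape_dx = P * A * np.exp(A * x) + (1 - P) * B * np.exp(B * x)
    u = Psi * (np.pi / b) * np.cos(np.pi * y / b) * shape / D
    v = -Psi * np.sin(np.pi * y / b) * dshape_dx / D
    return u, v

def T_initial(x, y, l=300.0e3):
    """Gaussian hill, eq. 7.62, centred at (a/3, b/3); the scale l is an assumption."""
    xc, yc = a / 3, b / 3
    return np.exp(-((x - xc) ** 2 + (y - yc) ** 2) / l**2)

# v at the west wall, inside the boundary current (x = 20 km) and in the interior
u_w, v_w = velocity(np.array([0.0, 20.0e3, a / 2]), b / 2)
```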
[Figure 7.13: Solution by a method of characteristics that tracks the solution after 1/2 (first), 1 (second), 2 (third) and 3 (fourth) years of integration.]

7.8.3 Expected result

The Gaussian hill will slowly make its way to the western boundary region, where it will be sheared and deformed substantially. The fidelity of the simulation will depend crucially on the spatial resolution used. The western boundary current with the current parameter settings will have a width of about 70 km. If the grid spacing is not small enough to resolve the shear region, numerical noise will be generated and will manifest itself as large positive or negative values outside the bounds of the initial condition (0 ≤ T ≤ 1). One of the aims of this exercise is to decide at what grid resolution, compared to the width of the western boundary current, one can consider the simulation to be adequate. The report should thus include multiple runs with Δx = 100, 50, 20 and 10 km. The integration should be carried out for 3 years (1080 days), taking snapshots every month to record the time evolution of T. The maximum speed is about 1.5 m/s; the time step should be scaled accordingly so that the Courant number, $C = u\Delta t/\Delta x$, does not exceed 1/2. The solution after 3 years of integration is shown in figure 7.13.

7.8.4 Support Code

A number of matlab scripts will be provided to help in visualizing the results in matlab. The scripts are located in directory ~mohamed/Project on metofis. Feel free to modify them to fit your needs.
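The time-step constraint of section 7.8.3 can be tabulated with a short Python sketch. It takes the stated maximum speed of 1.5 m/s, a Courant number of 1/2 and a 1080-day run; rounding the number of steps up, so that the run ends exactly at 1080 days with a slightly smaller $\Delta t$, is one possible design choice and is not prescribed by the notes.

```python
import math

u_max = 1.5                  # maximum speed [m/s], as stated in the text
courant = 0.5                # Courant number not to be exceeded
t_end = 1080 * 86400.0       # 1080 days [s]

steps = {}
for dx_km in (100, 50, 20, 10):
    dx = dx_km * 1.0e3
    dt_max = courant * dx / u_max            # largest dt with u_max*dt/dx <= 1/2
    nsteps = math.ceil(t_end / dt_max)       # round up so that dt <= dt_max
    dt = t_end / nsteps                      # lands exactly on t_end
    steps[dx_km] = nsteps
    print(f"dx = {dx_km:3d} km: dt = {dt:8.1f} s, {nsteps} steps, C = {u_max * dt / dx:.3f}")
```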
Chapter 8 Numerical Dispersion of Linearized SWE

This chapter is concerned with the impact of the FDA and of variable staggering on the fidelity of wave propagation in numerical models. We will use the shallow water equations as the model equations on which to compare various approximations. These equations are the simplest for describing wave motions in the ocean and atmosphere, and they are simple enough to be tractable with pencil and paper. By comparing the dispersion relations of the continuous and discrete systems, we can decide which scales of motion are faithfully represented in the model and which are distorted. Conversely, the diagrams produced can be used to decide on the number of points required to resolve specific wavelengths. The two classes of wave motions encountered here are inertia-gravity waves and Rossby waves. The main reference for the present work is Dukowicz (1995).

The plan is to look at dynamical systems of increasing complexity in order to highlight various aspects of the discrete systems. We start by looking at 1D versions of the linearized shallow water equations, and at unstaggered and staggered versions of the discrete approximation; in particular we contrast these two approaches for several high-order centered difference schemes and show the superiority of the staggered system. Second, we look at the impact of including a second spatial dimension and rotation, but restrict ourselves to second-order schemes; the discussion instead focuses on the impact of the various staggerings on the dispersive properties. Lastly, we look at the dispersion relation for Rossby waves.

8.1 Linearized SWE in 1D

Since we are interested in applying Fourier analysis to study wave propagation, we need to linearize the equations and hold the coefficients of the PDE constant. For the shallow water equations, they are:
$$u_t + g\eta_x = 0 \tag{8.1}$$
$$\eta_t + H u_x = 0 \tag{8.2}$$
8.1.1 Centered FDA on A-grid

The straightforward approach to discretizing the shallow water equations in space is to replace the continuous partial derivatives by their discrete counterparts. The main question is what impact the choice of variable staggering has on the dispersion relationship. We start by looking at the case where u and η are co-located. We also restrict ourselves to centered approximations of the spatial derivatives, which have the form:
$$u_x\big|_j = \sum_{m=1}^{M}\alpha_m\frac{u_{j+m} - u_{j-m}}{2\Delta x} + O(\Delta x^{2M}) \tag{8.3}$$
where M is the width of the stencil; the $\alpha_m$ are coefficients that can be obtained from Taylor series expansions (see equations 3.24, 3.27, and 3.29). The neglected term on an equally spaced grid is of order $O(\Delta x^{2M})$. A similar representation holds for the η derivative. The semi-discrete form of the equations is then:
$$u_t\big|_j + g\sum_{m=1}^{M}\alpha_m\frac{\eta_{j+m} - \eta_{j-m}}{2\Delta x} = 0 \tag{8.4}$$
$$\eta_t\big|_j + H\sum_{m=1}^{M}\alpha_m\frac{u_{j+m} - u_{j-m}}{2\Delta x} = 0 \tag{8.5}$$
To compute the numerical dispersion associated with the spatially discrete system, we need to look at periodic solutions in space and time, and thus we set
$$\begin{pmatrix}u_j\\ \eta_j\end{pmatrix} = \begin{pmatrix}\hat u\\ \hat\eta\end{pmatrix} e^{i(kx_j - \sigma t)} \tag{8.6}$$
Hence we have for the time derivative $u_t = -i\sigma\hat u\, e^{i(kx_j - \sigma t)}$, and for the spatially discrete derivative
$$u_{j+m} - u_{j-m} = \hat u\left[e^{i(kx_{j+m} - \sigma t)} - e^{i(kx_{j-m} - \sigma t)}\right] = \hat u\, e^{i(kx_j - \sigma t)}\, 2i\sin(mk\Delta x) \tag{8.7}$$
Hence the FDA of the spatial derivative has the following expression:
$$\sum_{m=1}^{M}\alpha_m\left(u_{j+m} - u_{j-m}\right) = 2i\,\hat u\, e^{i(kx_j - \sigma t)}\sum_{m=1}^{M}\alpha_m\sin(mk\Delta x) \tag{8.8}$$
Similar expressions can be written for the η variables. Inserting the periodic solutions in 8.4-8.5, we get the homogeneous system of equations for the amplitudes $\hat u$ and $\hat\eta$:
$$-i\sigma\hat u + g\,i\sum_{m=1}^{M}\alpha_m\frac{\sin(mk\Delta x)}{\Delta x}\,\hat\eta = 0 \tag{8.9}$$
$$H\,i\sum_{m=1}^{M}\alpha_m\frac{\sin(mk\Delta x)}{\Delta x}\,\hat u - i\sigma\hat\eta = 0 \tag{8.10}$$
For non-trivial solutions we require the determinant of the system to be equal to zero, a condition that yields the following dispersion relation:
$$\sigma = \pm c\sum_{m=1}^{M}\alpha_m\frac{\sin(mk\Delta x)}{\Delta x} \tag{8.11}$$
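where $c = \sqrt{gH}$ is the gravity wave speed of the continuous system. The phase speed is then
$$C_{A,M} = c\sum_{m=1}^{M}\alpha_m\frac{\sin(mk\Delta x)}{k\Delta x} \tag{8.12}$$
and clearly the departure of the sum from unity determines the phase fidelity of the FDA of a given width M. We thus have the following relations for the schemes of order 2, 4 and 6:
$$C_{A,2} = c\,\frac{\sin k\Delta x}{k\Delta x} \tag{8.13}$$
$$C_{A,4} = c\,\frac{8\sin k\Delta x - \sin 2k\Delta x}{6k\Delta x} \tag{8.14}$$
$$C_{A,6} = c\,\frac{45\sin k\Delta x - 9\sin 2k\Delta x + \sin 3k\Delta x}{30k\Delta x} \tag{8.15}$$

8.1.2 Centered FDA on C-grid

When the variables are staggered on a C-grid, the spatially discrete equations take the following form:
$$u_t\big|_{j+\frac12} + g\sum_{m=0}^{M}\beta_m\frac{\eta_{j+\frac12+\frac{1+2m}{2}} - \eta_{j+\frac12-\frac{1+2m}{2}}}{\Delta x} = 0 \tag{8.16}$$
$$\eta_t\big|_j + H\sum_{m=0}^{M}\beta_m\frac{u_{j+\frac{1+2m}{2}} - u_{j-\frac{1+2m}{2}}}{\Delta x} = 0 \tag{8.17}$$
where the $\beta_m$ are the differentiation coefficients on a staggered grid. These can be obtained by applying the expansion in 3.24 to grids of spacing $(2m+1)\Delta x/2$ with $m = 0, 1, 2, \ldots$, to get:
$$\frac{u_{j+\frac12} - u_{j-\frac12}}{\Delta x} = \frac{\partial u}{\partial x}\bigg|_{x_j} + \frac{(\Delta x/2)^2}{3!}\frac{\partial^3 u}{\partial x^3} + \frac{(\Delta x/2)^4}{5!}\frac{\partial^5 u}{\partial x^5} + \frac{(\Delta x/2)^6}{7!}\frac{\partial^7 u}{\partial x^7} + \ldots \tag{8.18}$$
$$\frac{u_{j+\frac32} - u_{j-\frac32}}{3\Delta x} = \frac{\partial u}{\partial x}\bigg|_{x_j} + \frac{(3\Delta x/2)^2}{3!}\frac{\partial^3 u}{\partial x^3} + \frac{(3\Delta x/2)^4}{5!}\frac{\partial^5 u}{\partial x^5} + \frac{(3\Delta x/2)^6}{7!}\frac{\partial^7 u}{\partial x^7} + \ldots \tag{8.19}$$
$$\frac{u_{j+\frac52} - u_{j-\frac52}}{5\Delta x} = \frac{\partial u}{\partial x}\bigg|_{x_j} + \frac{(5\Delta x/2)^2}{3!}\frac{\partial^3 u}{\partial x^3} + \frac{(5\Delta x/2)^4}{5!}\frac{\partial^5 u}{\partial x^5} + \frac{(5\Delta x/2)^6}{7!}\frac{\partial^7 u}{\partial x^7} + \ldots \tag{8.20}$$
Fourth-order approximations can be obtained by combining these expressions to yield:
$$\frac98\,\frac{u_{j+\frac12} - u_{j-\frac12}}{\Delta x} - \frac18\,\frac{u_{j+\frac32} - u_{j-\frac32}}{3\Delta x} = \frac{\partial u}{\partial x}\bigg|_{x_j} - 9\,\frac{(\Delta x/2)^4}{5!}\frac{\partial^5 u}{\partial x^5} - 90\,\frac{(\Delta x/2)^6}{7!}\frac{\partial^7 u}{\partial x^7} + \ldots \tag{8.21}$$
$$\frac{25}{24}\,\frac{u_{j+\frac12} - u_{j-\frac12}}{\Delta x} - \frac{1}{24}\,\frac{u_{j+\frac52} - u_{j-\frac52}}{5\Delta x} = \frac{\partial u}{\partial x}\bigg|_{x_j} - 25\,\frac{(\Delta x/2)^4}{5!}\frac{\partial^5 u}{\partial x^5} - 650\,\frac{(\Delta x/2)^6}{7!}\frac{\partial^7 u}{\partial x^7} + \ldots \tag{8.22}$$
Finally, the above two expressions can be combined (25 times 8.21 minus 9 times 8.22, divided by 16) to yield the sixth-order approximation:
$$\frac{450}{384}\,\frac{u_{j+\frac12} - u_{j-\frac12}}{\Delta x} - \frac{25}{128}\,\frac{u_{j+\frac32} - u_{j-\frac32}}{3\Delta x} + \frac{9}{384}\,\frac{u_{j+\frac52} - u_{j-\frac52}}{5\Delta x} = \frac{\partial u}{\partial x}\bigg|_{x_j} + 225\,\frac{(\Delta x/2)^6}{7!}\frac{\partial^7 u}{\partial x^7} + \ldots \tag{8.23}$$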
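The coefficients $\alpha_m$ and $\beta_m$ can also be generated by solving the Taylor-series conditions directly, which gives an independent check on 8.13-8.15 and 8.21-8.23, including the leading error of the sixth-order staggered formula (in units of $(\Delta x/2)^6/7!$ its coefficient is $\sum_m \beta_m (2m+1)^7$). The Python sketch below does this and evaluates the A-grid phase speed 8.12 together with the C-grid phase speed $C_{C,M} = c\sum_m \beta_m \sin(\frac{1+2m}{2}k\Delta x)/(k\Delta x/2)$ that the dispersion analysis of this section arrives at in equation 8.31. Function names are hypothetical; `n_terms` counts the stencil terms, so orders 2, 4 and 6 correspond to `n_terms` = 1, 2 and 3.

```python
import numpy as np

def centered_coeffs(n_terms):
    """alpha_m (m = 1..n_terms) in u_x ~ sum_m alpha_m (u_{j+m} - u_{j-m}) / (2 dx), eq. 8.3.
    Taylor conditions: sum_m alpha_m m^(2r+1) = 1 for r = 0 and 0 for r = 1..n_terms-1."""
    m = np.arange(1, n_terms + 1, dtype=float)
    lhs = np.array([m ** (2 * r + 1) for r in range(n_terms)])
    rhs = np.zeros(n_terms)
    rhs[0] = 1.0
    return np.linalg.solve(lhs, rhs)

def staggered_coeffs(n_terms):
    """beta_m (m = 0..n_terms-1) in u_x ~ sum_m beta_m (u_{j+(1+2m)/2} - u_{j-(1+2m)/2}) / dx.
    With half-distances s_m = (1+2m)/2 in units of dx: sum_m 2 beta_m s_m^(2r+1) = delta_{r0}."""
    s = (1 + 2 * np.arange(n_terms)) / 2.0
    lhs = np.array([2 * s ** (2 * r + 1) for r in range(n_terms)])
    rhs = np.zeros(n_terms)
    rhs[0] = 1.0
    return np.linalg.solve(lhs, rhs)

def phase_speed_A(kdx, alpha):
    """C_A / c from eq. 8.12."""
    m = np.arange(1, len(alpha) + 1)
    return np.sin(np.outer(kdx, m)) @ alpha / kdx

def phase_speed_C(kdx, beta):
    """C_C / c from eq. 8.31."""
    half = (1 + 2 * np.arange(len(beta))) / 2.0
    return np.sin(np.outer(kdx, half)) @ beta / (kdx / 2)

beta6 = staggered_coeffs(3)
lead_error = np.sum(beta6 * (1 + 2 * np.arange(3)) ** 7)   # coefficient of (dx/2)^6/7! in eq. 8.23
kdx = np.array([1.0e-3, np.pi / 2, np.pi])
print("beta (6th order):", beta6, " leading error coefficient:", lead_error)
for n in (1, 2, 3):
    print(2 * n, "A:", phase_speed_A(kdx, centered_coeffs(n)), "C:", phase_speed_C(kdx, staggered_coeffs(n)))
```

At $k\Delta x = \pi$ the A-grid phase speeds vanish for every order (the null mode), while the C-grid ones stay positive.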
Going back to the dispersion relation, we now look for periodic solutions of the form:
$$u_{j+\frac12} = \hat u\, e^{i(kx_{j+\frac12} - \sigma t)} \quad\text{and}\quad \eta_j = \hat\eta\, e^{i(kx_j - \sigma t)} \tag{8.24}$$
which, when substituted in the FDA, yield the following:
$$\sum_{m=0}^{M}\beta_m\left(u_{j+\frac{1+2m}{2}} - u_{j-\frac{1+2m}{2}}\right) = \hat u\, e^{i(kx_j - \sigma t)}\sum_{m=0}^{M}\beta_m\left(e^{i\frac{1+2m}{2}k\Delta x} - e^{-i\frac{1+2m}{2}k\Delta x}\right) \tag{8.25}$$
$$= \hat u\, e^{i(kx_j - \sigma t)}\, 2i\sum_{m=0}^{M}\beta_m\sin\left(\frac{1+2m}{2}k\Delta x\right) \tag{8.26}$$
$$\sum_{m=0}^{M}\beta_m\left(\eta_{j+\frac12+\frac{1+2m}{2}} - \eta_{j+\frac12-\frac{1+2m}{2}}\right) = \hat\eta\, e^{i(kx_{j+\frac12} - \sigma t)}\sum_{m=0}^{M}\beta_m\left(e^{i\frac{1+2m}{2}k\Delta x} - e^{-i\frac{1+2m}{2}k\Delta x}\right) \tag{8.27}$$
$$= \hat\eta\, e^{i(kx_{j+\frac12} - \sigma t)}\, 2i\sum_{m=0}^{M}\beta_m\sin\left(\frac{1+2m}{2}k\Delta x\right) \tag{8.28}$$
Inserting these expressions in the FDA for the C-grid and eliminating the exponential factors, we obtain the dispersion equations:
$$-i\sigma\hat u + g\,2i\sum_{m=0}^{M}\beta_m\frac{\sin\frac{1+2m}{2}k\Delta x}{\Delta x}\,\hat\eta = 0 \tag{8.29}$$
$$H\,2i\sum_{m=0}^{M}\beta_m\frac{\sin\frac{1+2m}{2}k\Delta x}{\Delta x}\,\hat u - i\sigma\hat\eta = 0 \tag{8.30}$$
The frequency and the phase speed are then given by
$$\sigma_{C,M} = \pm c\sum_{m=0}^{M}\beta_m\frac{\sin\frac{1+2m}{2}k\Delta x}{\Delta x/2}, \qquad C_{C,M} = \pm c\sum_{m=0}^{M}\beta_m\frac{\sin\frac{1+2m}{2}k\Delta x}{k\Delta x/2} \tag{8.31}$$
We thus have the following phase speeds for the schemes of order 2, 4 and 6:
$$C_{C,2} = c\,\frac{\sin\frac{k\Delta x}{2}}{k\Delta x/2} \tag{8.32}$$
$$C_{C,4} = c\,\frac{27\sin\frac{k\Delta x}{2} - \sin\frac{3k\Delta x}{2}}{24\,k\Delta x/2} \tag{8.33}$$
$$C_{C,6} = c\,\frac{2250\sin\frac{k\Delta x}{2} - 125\sin\frac{3k\Delta x}{2} + 9\sin\frac{5k\Delta x}{2}}{1920\,k\Delta x/2} \tag{8.34}$$
Figure 8.1 compares the shallow water phase speed for the staggered and unstaggered configurations for various orders of centered difference approximation. The unstaggered schemes display a familiar pattern: increasing the order improves the phase speed at intermediate wavelengths, but there is a rapid deterioration for the marginally resolved waves, $k\Delta x \ge 0.6\pi$. The staggered schemes, on the other hand, display a more accurate representation of the phase speed over the entire spectrum. Notice that the second-order staggered approximation provides a phase fidelity comparable to that of the fourth-order unstaggered approximation at intermediate wavelengths, $0.2\pi \le k\Delta x \le 0.6\pi$, and superior to it for $k\Delta x \ge 0.6\pi$. Finally, and most importantly, the unstaggered schemes possess a null mode, where $C = 0$ at $k\Delta x = \pi$, which could manifest itself as a non-propagating $2\Delta x$ spurious mode; the staggered schemes do not have a null mode.

[Figure 8.1: Phase speed of the spatially discrete linearized shallow water equation, $C_{num}/C$ versus $k\Delta x/\pi$ (curves A-2, A-4, A-6, C-2, C-4, C-6). The solid lines show the phase speed for the A-grid configuration for centered schemes of order 2, 4 and 6, while the dashed lines show the phase speed of the staggered configuration for orders 2, 4 and 6.]

[Figure 8.2: Configuration of 4 Arakawa grids (panels: A-Grid, B-Grid, C-Grid, D-Grid).]

8.2 Two-Dimensional SWE

Here we derive the dispersion relation for the two-dimensional shallow water equations in the presence of rotation. We shall consider the two cases of flow on an f-plane and flow on a β-plane. We will also consider various grid configurations, namely the Arakawa grids A, B, C and D.

8.2.1 Inertia gravity waves

The linearized equations are given by
$$u_t - f v + g\eta_x = 0 \tag{8.35}$$
$$v_t + f u + g\eta_y = 0 \tag{8.36}$$
$$\eta_t + H(u_x + v_y) = 0 \tag{8.37}$$
Assuming periodic solutions in time and space of the form $(u, v, \eta) = (\hat u, \hat v, \hat\eta)e^{i(kx + ly - \omega t)}$, where $(k, l)$ are the wavenumbers in the x and y directions, we obtain the following eigenvalue problem for the frequency ω:
$$\begin{vmatrix}-i\omega & -f & gik\\ f & -i\omega & gil\\ iHk & iHl & -i\omega\end{vmatrix} = -i\omega\left[-\omega^2 + f^2 + c^2(k^2 + l^2)\right] = 0 \tag{8.38}$$
Here $c = \sqrt{gH}$ is the gravity wave speed. The non-inertial roots can be written in the following form:
$$\sigma^2 = 1 + a^2\left[(kd)^2 + (ld)^2\right] \tag{8.39}$$
where $\sigma = \omega/f$ is a non-dimensional frequency, d is the grid spacing, assumed uniform in both directions, and a is the ratio of the Rossby radius of deformation to the grid spacing, i.e.
it is the number of grid points per Rossby radius:
$$a = \frac{R_o}{d} = \frac{\sqrt{gH}}{f d} \tag{8.40}$$
Although the continuous dispersion relation does not depend on the grid spacing, it is useful to write it in the above form for comparison with the numerical dispersion relations. The numerical dispersion relations for the various grids are given by Dukowicz (1995):
$$\sigma_A^2 = 1 + a^2\left(\sin^2 kd + \sin^2 ld\right) \tag{8.41}$$
$$\sigma_B^2 = 1 + 2a^2\left[1 - \cos kd\cos ld\right] \tag{8.42}$$
$$\sigma_C^2 = \cos^2\frac{kd}{2}\cos^2\frac{ld}{2} + 4a^2\left(\sin^2\frac{kd}{2} + \sin^2\frac{ld}{2}\right) \tag{8.43}$$
$$\sigma_D^2 = \cos^2\frac{kd}{2}\cos^2\frac{ld}{2} + a^2\left(\cos^2\frac{ld}{2}\sin^2 kd + \cos^2\frac{kd}{2}\sin^2 ld\right) \tag{8.44}$$

8.3 Rossby waves

The Rossby wave dispersion relations are given by
$$\sigma = -a^2 kd\left[1 + a^2\left((kd)^2 + (ld)^2\right)\right]^{-1} \tag{8.45}$$
$$\sigma_A = -a^2\sin kd\cos ld\left[1 + a^2\left(\sin^2 kd + \sin^2 ld\right)\right]^{-1} \tag{8.46}$$
$$\sigma_B = -a^2\sin kd\left[1 + 2a^2\left(1 - \cos kd\cos ld\right)\right]^{-1} \tag{8.47}$$
$$\sigma_C = -a^2\sin kd\cos^2\frac{ld}{2}\left[\cos^2\frac{kd}{2}\cos^2\frac{ld}{2} + 4a^2\left(\sin^2\frac{kd}{2} + \sin^2\frac{ld}{2}\right)\right]^{-1} \tag{8.48}$$
$$\sigma_D = -a^2\sin kd\cos^2\frac{ld}{2}\left[1 + 4a^2\left(\sin^2\frac{kd}{2} + \sin^2\frac{ld}{2}\right)\right]^{-1} \tag{8.49}$$
where the frequency is now normalized by βd. The normalized Rossby wave frequencies and their relative errors for the various grids are displayed in figures 8.8-8.12 for various values of the Rossby radius parameter a. Since contour plots are hard to read, we also supply line plots for the special values l = 0 and l = k in figure 8.13. From these plots one can conclude the following:

1. All grid configurations have a null mode at kΔx = π.
2. The C and D grids have a null mode for all zonal wavenumbers when ld = π.
3. For a ≥ 2 the B, C and D grids perform similarly for the resolved portion of the spectrum, kd ≤ 2π/5.

[Figure 8.3: Comparison of the dispersion relation on the Arakawa A, B, C and D grids (panels: Exact, A-Grid, B-Grid, C-Grid, D-Grid; axes kΔx/π and lΔx/π). The top figure shows the dispersion relation while the bottom one shows the relative error. The parameter a = 8.]

[Figure 8.4: Same as 8.3 but for a = 4.]

[Figure 8.5: Same as 8.3 but for a = 2.]

[Figure 8.6: Same as 8.3 but for a = 1.]

[Figure 8.7: Same as 8.3 but for a = 1/2.]
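The dispersion relations above are easy to evaluate directly. The Python sketch below (function names are hypothetical) encodes equations 8.39 and 8.41-8.44 for inertia-gravity waves and 8.45-8.49 for Rossby waves as written in this section, and can be used to check the listed conclusions, for instance the null modes at $k\Delta x = \pi$, the C- and D-grid null mode at $ld = \pi$, and the Rossby-wave errors for $kd \le 2\pi/5$.

```python
import numpy as np

def inertia_gravity_sigma2(grid, kd, ld, a):
    """(omega/f)^2 of the non-inertial roots: eq. 8.39 ('exact') and eqs. 8.41-8.44."""
    ck2, cl2 = np.cos(kd / 2) ** 2, np.cos(ld / 2) ** 2
    sk2, sl2 = np.sin(kd / 2) ** 2, np.sin(ld / 2) ** 2
    if grid == "exact":
        return 1 + a**2 * (kd**2 + ld**2)
    if grid == "A":
        return 1 + a**2 * (np.sin(kd) ** 2 + np.sin(ld) ** 2)
    if grid == "B":
        return 1 + 2 * a**2 * (1 - np.cos(kd) * np.cos(ld))
    if grid == "C":
        return ck2 * cl2 + 4 * a**2 * (sk2 + sl2)
    if grid == "D":
        return ck2 * cl2 + a**2 * (cl2 * np.sin(kd) ** 2 + ck2 * np.sin(ld) ** 2)
    raise ValueError(f"unknown grid {grid!r}")

def rossby_sigma(grid, kd, ld, a):
    """Rossby frequency normalised by beta*d: eq. 8.45 ('exact') and eqs. 8.46-8.49."""
    ck2, cl2 = np.cos(kd / 2) ** 2, np.cos(ld / 2) ** 2
    sk2, sl2 = np.sin(kd / 2) ** 2, np.sin(ld / 2) ** 2
    if grid == "exact":
        return -a**2 * kd / (1 + a**2 * (kd**2 + ld**2))
    if grid == "A":
        return -a**2 * np.sin(kd) * np.cos(ld) / (1 + a**2 * (np.sin(kd) ** 2 + np.sin(ld) ** 2))
    if grid == "B":
        return -a**2 * np.sin(kd) / (1 + 2 * a**2 * (1 - np.cos(kd) * np.cos(ld)))
    if grid == "C":
        return -a**2 * np.sin(kd) * cl2 / (ck2 * cl2 + 4 * a**2 * (sk2 + sl2))
    if grid == "D":
        return -a**2 * np.sin(kd) * cl2 / (1 + 4 * a**2 * (sk2 + sl2))
    raise ValueError(f"unknown grid {grid!r}")

# Relative Rossby-wave error along l = 0 for a = 2 (cf. the line plots of figure 8.13)
kd = np.linspace(0.05, np.pi, 64)
for g in "ABCD":
    err = rossby_sigma(g, kd, 0 * kd, 2.0) / rossby_sigma("exact", kd, 0 * kd, 2.0) - 1
    print(g, "max |relative error| for kd <= 2*pi/5:", np.abs(err[kd <= 2 * np.pi / 5]).max())
```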
[Figure 8.8: Comparison of Rossby wave dispersion for the different Arakawa grids (panels: Exact, A-Grid, B-Grid, C-Grid, D-Grid; axes kΔx/π and lΔx/π). The top figures show the dispersion while the bottom ones show the relative error. Here a = 8.]

[Figure 8.9: Same as figure 8.8 but for a = 4.]

[Figure 8.10: Same as figure 8.8 but for a = 2.]

[Figure 8.11: Same as figure 8.8 but for a = 1.]
[Figure 8.12: Same as figure 8.8 but for a = 1/2.]

[Figure 8.13: Rossby wave frequency σ versus kΔx for, from top to bottom, a = 8, 4, 2, 1 and 1/2. The left figures show the case l = 0 and the right figures the case l = k. The black line refers to the continuous case and the colored lines to the A (red), B (blue), C (green), and D (magenta) grids.]

Chapter 9 Solving the Poisson Equations

The textbook example of an elliptic equation is the Poisson equation:

$$\nabla^2 u = f, \quad x \in \Omega \tag{9.1}$$

subject to appropriate boundary conditions on ∂Ω, the boundary of the domain. The right-hand side f is a known function. We can approximate the above equation using standard second-order finite differences:

$$\frac{u_{j+1,k} - 2u_{j,k} + u_{j-1,k}}{\Delta x^2} + \frac{u_{j,k+1} - 2u_{j,k} + u_{j,k-1}}{\Delta y^2} = f_{j,k} \tag{9.2}$$

The finite difference representation 9.2 of the Poisson equation results in a coupled system of algebraic equations that must be solved simultaneously.
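To make the structure of this coupled system concrete, here is a minimal Python sketch that assembles the matrix of equation 9.2 (multiplied by Δ², with Δx = Δy = Δ) for an n × n block of interior unknowns under Dirichlet conditions. The function name `poisson_matrix`, the row-by-row numbering (j varying fastest) and the dense list-of-lists storage are illustrative choices, not part of the notes. Boundary neighbours are simply left out, since their values move to the right-hand side.

```python
def poisson_matrix(n):
    """Five-point matrix of equation 9.2, scaled by Delta**2 (dx = dy = Delta),
    for an n x n block of interior unknowns with Dirichlet boundaries.

    Unknowns are numbered row by row (j varies fastest), so u_{j,k} sits at
    position k*n + j with j, k counted from 0 inside the interior.
    Dense storage: only sensible for small n.
    """
    size = n * n
    A = [[0] * size for _ in range(size)]
    for k in range(n):              # grid row (y direction)
        for j in range(n):          # position along the row (x direction)
            p = k * n + j
            A[p][p] = -4
            if j > 0:               # west neighbour is interior
                A[p][p - 1] = 1
            if j < n - 1:           # east neighbour is interior
                A[p][p + 1] = 1
            if k > 0:               # south neighbour is interior
                A[p][p - n] = 1
            if k < n - 1:           # north neighbour is interior
                A[p][p + n] = 1
    return A


for row in poisson_matrix(3):      # the 3 x 3 interior of a 4 x 4-cell grid
    print(" ".join(f"{v:2d}" for v in row))
```

For n = 3 the printed matrix has −4 on the diagonal, 1 on the first and third off-diagonals, and zeros in the first off-diagonals wherever a grid row ends, so the last point of one row is not coupled to the first point of the next.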
In matrix notation the system can be written in the form Ax = b, where x represents the vector of unknowns, b the right-hand side, and A the matrix representing the system. Boundary conditions must be applied before solving the system of equations.

[Figure 9.1: Finite difference grid for a Poisson equation; grid points indexed j = 1, ..., 5 along x and k = 1, ..., 5 along y.]

Example 12 For a square domain divided into 4×4 cells, as shown in figure 9.1, subject to Dirichlet boundary conditions on all boundaries, there are 9 unknowns $u_{j,k}$, with j, k = 2, 3, 4. The finite difference equations applied at these points provide us with the system:

$$
\begin{pmatrix}
-4 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
1 & -4 & 1 & 0 & 1 & 0 & 0 & 0 & 0\\
0 & 1 & -4 & 0 & 0 & 1 & 0 & 0 & 0\\
1 & 0 & 0 & -4 & 1 & 0 & 1 & 0 & 0\\
0 & 1 & 0 & 1 & -4 & 1 & 0 & 1 & 0\\
0 & 0 & 1 & 0 & 1 & -4 & 0 & 0 & 1\\
0 & 0 & 0 & 1 & 0 & 0 & -4 & 1 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 1 & -4 & 1\\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & -4
\end{pmatrix}
\begin{pmatrix} u_{2,2}\\ u_{3,2}\\ u_{4,2}\\ u_{2,3}\\ u_{3,3}\\ u_{4,3}\\ u_{2,4}\\ u_{3,4}\\ u_{4,4} \end{pmatrix}
= \Delta^2
\begin{pmatrix} f_{2,2}\\ f_{3,2}\\ f_{4,2}\\ f_{2,3}\\ f_{3,3}\\ f_{4,3}\\ f_{2,4}\\ f_{3,4}\\ f_{4,4} \end{pmatrix}
-
\begin{pmatrix} u_{2,1} + u_{1,2}\\ u_{3,1}\\ u_{4,1} + u_{5,2}\\ u_{1,3}\\ 0\\ u_{5,3}\\ u_{1,4} + u_{2,5}\\ u_{3,5}\\ u_{4,5} + u_{5,4} \end{pmatrix}
\tag{9.3}
$$

where Δx = Δy = Δ. Notice that the system is symmetric and pentadiagonal (5 nonzero diagonals). This last property precludes the use of an efficient tridiagonal solver.

The crux of the work in solving elliptic PDEs is the need to update the unknowns simultaneously by inverting the system Ax = b. The solution methodologies fall under 2 broad categories:

1. Direct solvers: calculate the solution $x = A^{-1}b$ exactly (up to round-off errors). These methods can be further classified as:
(a) Matrix methods: the most general type of solvers, which work for arbitrary non-singular matrices A. They factor the matrix into lower and upper triangular matrices that can be easily inverted. The most general and robust algorithm for this factorization is Gaussian elimination with partial pivoting. For symmetric positive definite systems (here −A), a faster variant relies on the Cholesky factorization.
The main drawback of matrix methods is that their storage and CPU costs grow rapidly with the number of points. In particular, the CPU cost grows as $O(K^3)$, where K is the number of unknowns. If the grid has N points in each direction, then $K = N^2$ for a 2D problem and the cost scales like $N^6$, and like $N^9$ for 3D problems.
(b) FFT, also referred to as fast solvers. These methods take advantage of the structure of separable equations to diagonalize the system using Fast Fourier Transforms. The efficiency of the method rests on the fact that the FFT cost grows like $N^2 \ln N$ in 2D, a substantial reduction compared to the $N^6$ cost of matrix methods.

2. Iterative methods: calculate an approximation to the solution that minimizes the norm of the residual vector r = b − Ax. There is a large number of iterative solvers; the most efficient ones are those that have a fast convergence rate and a low CPU cost per iteration. Oftentimes it is necessary to exploit the structure of the equations to reduce the CPU cost and accelerate convergence. Here we mention a few of the more common iterative schemes:
(a) Fixed point methods: Jacobi and Gauss-Seidel methods
(b) Multigrid methods
(c) Krylov methods: Preconditioned Conjugate Gradient (PCG)

9.1 Iterative Methods

We will discuss mainly the fixed point iteration methods and (maybe) PCG methods.

9.1.1 Jacobi method

The solution of the Poisson equation can be viewed as the steady state solution to the following parabolic equation:

$$u_t = \nabla^2 u - f \tag{9.4}$$

At steady state, t → ∞, the left-hand side of the equation goes to zero and we recover the original Poisson equation. The artifice of introducing a pseudo-time t is useful because we can now use an explicit method to update the unknowns $u_{j,k}$ individually without solving a system of equations. Using an FTCS approximation we have:

$$u^{n+1}_{j,k} = u^n_{j,k} + \frac{\Delta t}{\Delta^2}\left(u^n_{j+1,k} + u^n_{j-1,k} + u^n_{j,k+1} + u^n_{j,k-1} - 4u^n_{j,k}\right) - \Delta t\, f_{j,k} \tag{9.5}$$

where we have assumed Δx = Δy = Δ.
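Equation 9.5 is a single explicit sweep over the grid. The sketch below performs one such pseudo-time step for an arbitrary Δt. It is plain Python with arrays stored as lists of lists indexed `u[k][j]`, the boundary values kept in the outer ring; the function name `ftcs_pseudo_time_step` and this storage layout are our own assumptions.

```python
def ftcs_pseudo_time_step(u, f, delta, dt):
    """One FTCS pseudo-time step of u_t = lap(u) - f (equation 9.5).

    u, f: lists of lists indexed [k][j]; the outer ring of u holds Dirichlet
    boundary values and is copied unchanged. Assumes dx = dy = delta.
    """
    ny, nx = len(u), len(u[0])
    new = [row[:] for row in u]      # all updates read the old level n only
    c = dt / delta**2
    for k in range(1, ny - 1):
        for j in range(1, nx - 1):
            lap = (u[k][j + 1] + u[k][j - 1] + u[k + 1][j] + u[k - 1][j]
                   - 4.0 * u[k][j])
            new[k][j] = u[k][j] + c * lap - dt * f[k][j]
    return new


# Example: a single interior point surrounded by boundary values.
u = [[0.0, 1.0, 0.0],
     [2.0, 0.0, 3.0],
     [0.0, 4.0, 0.0]]
f = [[0.0] * 3 for _ in range(3)]
f[1][1] = 1.0
delta = 0.5
print(ftcs_pseudo_time_step(u, f, delta, dt=delta**2 / 4)[1][1])   # 2.4375
```

With Δt = Δ²/4 the update reduces to the average of the four neighbours (2.5 here) minus Δ²f/4 (0.0625 here).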
At steady state, after an infinite number of iterations, the solution satisfies

$$u_{j,k} = u_{j,k} + \frac{\Delta t}{\Delta^2}\left(u_{j+1,k} + u_{j-1,k} + u_{j,k+1} + u_{j,k-1} - 4u_{j,k}\right) - \Delta t\, f_{j,k} \tag{9.6}$$

Subtracting equation 9.6 from equation 9.5 we get:

$$e^{n+1}_{j,k} = e^n_{j,k} + \frac{\Delta t}{\Delta^2}\left(e^n_{j+1,k} + e^n_{j-1,k} + e^n_{j,k+1} + e^n_{j,k-1} - 4e^n_{j,k}\right) \tag{9.7}$$

where $e^n_{j,k} = u^n_{j,k} - u_{j,k}$. Thus, the error evolves according to the amplification factor associated with the FDE 9.7. We note, first, that the initial conditions for our pseudo-time are not important for the convergence analysis, although we would of course like to start with an initial guess as close as possible to the right solution. Second, since we are not interested in the transient solution, time accuracy is irrelevant. As a matter of fact, we would like to use the largest time step possible that will lead us to steady state. A von Neumann stability analysis shows that the stability limit is $\Delta t \le \Delta^2/4$; substituting this time step in equation 9.5 we obtain the Jacobi algorithm:

$$u^{n+1}_{j,k} = \frac{u^n_{j+1,k} + u^n_{j-1,k} + u^n_{j,k+1} + u^n_{j,k-1}}{4} - \frac{\Delta^2}{4} f_{j,k} \tag{9.8}$$

A von Neumann analysis shows that the amplification factor for this method is given by

$$G = \frac{\cos\kappa\Delta x + \cos\lambda\Delta y}{2} \tag{9.9}$$

where (κ, λ) are the wavenumbers in the (x, y) directions. A plot of |G| is shown in figure 9.2. It is clear that the shortest (κΔx, λΔy → π) and longest (κΔx, λΔy → 0) error components are damped the least. The intermediate wavelengths (κΔx = λΔy = π/2) are damped most effectively.

[Figure 9.2: Magnitude of the amplification factor for the Jacobi method (left) and Gauss-Seidel method (right) as a function of the (x, y) Fourier components; axes κΔx/π and λΔy/π.]
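Equation 9.9 can be evaluated directly to see which error components Jacobi damps. This short Python sketch (the function name `jacobi_amplification` is ours) tabulates |G| along the diagonal α = β, where α = κΔx and β = λΔy. It also evaluates the slowest-decaying Dirichlet mode on a grid with N points per direction, assuming that mode has α = β = π/(N − 1).

```python
import math

def jacobi_amplification(alpha, beta):
    """Jacobi amplification factor G of equation 9.9 (alpha = kappa*dx, beta = lambda*dy)."""
    return 0.5 * (math.cos(alpha) + math.cos(beta))


# |G| along the diagonal alpha = beta, from long (left) to short (right) waves
for frac in (0.01, 0.25, 0.5, 0.75, 1.0):
    a = frac * math.pi
    print(f"alpha/pi = {frac:4.2f}   |G| = {abs(jacobi_amplification(a, a)):.3f}")

# Slowest-decaying Dirichlet mode with N points per direction:
# alpha = beta = pi/(N-1), so |G| = cos(pi/(N-1)), which tends to 1 as N grows.
for N in (17, 33, 65):
    print(N, math.cos(math.pi / (N - 1)))
```

|G| is close to 1 at both ends of the diagonal and vanishes at α = β = π/2. The second loop shows why plain Jacobi slows down on fine grids: the worst-damped mode approaches |G| = 1.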
9.1.2 Gauss-Seidel method

A simple modification to the Jacobi method can improve both its storage requirements and its convergence rate. Note that in Jacobi, only values at the previous iteration are used to update the solution. An improved algorithm can be obtained if the most recent values are used. Assuming that we are sweeping through the grid by increasing j and k indices, the Gauss-Seidel method can be written in the form:

$$u^{n+1}_{j,k} = \frac{u^n_{j+1,k} + u^{n+1}_{j-1,k} + u^n_{j,k+1} + u^{n+1}_{j,k-1}}{4} - \frac{\Delta^2}{4} f_{j,k} \tag{9.10}$$

The major advantages of this scheme are that only one time level needs to be stored (the values can be updated on the fly), and that the convergence rate can be improved substantially (roughly twice the rate of the Jacobi method). The latter point can be quantified by looking at the error amplification factor, which now takes the form:

$$G = \frac{e^{i\alpha} + e^{i\beta}}{4 - \left(e^{-i\alpha} + e^{-i\beta}\right)}, \qquad |G|^2 = \frac{1 + \cos(\alpha - \beta)}{9 - 4(\cos\alpha + \cos\beta) + \cos(\alpha - \beta)} \tag{9.11}$$

where α = κΔx and β = λΔy. A plot of |G| versus wavenumbers is shown in figure 9.2, and clearly shows the reduction in the area where |G| is close to 1. Notice that, unlike the Jacobi method, the shortest wavelengths (α, β → π) are damped by a factor of 1/3 at every iteration. The error components that are damped the least are the long ones: α, β → 0.

[Figure 9.3: Magnitude of the amplification factor for Gauss-Seidel by rows (left) and Gauss-Seidel by rows and columns (right) as a function of the (x, y) Fourier components; axes κΔx/π and λΔy/π.]

9.1.3 Successive Over Relaxation (SOR) method

A correction factor can be added to the update of the Gauss-Seidel method in order to improve the convergence rate.
Let $u^*_{j,k}$ denote the temporary value obtained from the Gauss-Seidel step; then an improved estimate of the solution is

$$u^{n+1}_{j,k} = u^n_{j,k} + \omega\left(u^*_{j,k} - u^n_{j,k}\right) \tag{9.12}$$

where ω is the correction (relaxation) factor. For ω = 1 we revert to the Gauss-Seidel update, for ω < 1 the correction is under-relaxed, and for ω > 1 the correction is over-relaxed. It can be shown that convergence requires 0 < ω < 2. The optimal ω, $\omega_o$, can be quite hard to compute and depends on the number of points in each direction and on the boundary conditions applied. Analytic values for $\omega_o$ can be obtained for a Dirichlet problem:

$$\omega_o = \frac{2\left(1 - \sqrt{1 - \beta}\right)}{\beta}, \qquad \beta = \left(\frac{\cos\frac{\pi}{M-1} + \frac{\Delta x^2}{\Delta y^2}\cos\frac{\pi}{N-1}}{1 + \frac{\Delta x^2}{\Delta y^2}}\right)^2 \tag{9.13}$$

where M and N are the number of points in the x and y directions, respectively.

9.1.4 Iteration by Lines

A closer examination of the Gauss-Seidel method in equation 9.10 reveals that an efficient algorithm, relying on tridiagonal solvers, can be produced if the iteration is changed to:

$$u^{n+1}_{j,k} = \frac{u^{n+1}_{j+1,k} + u^{n+1}_{j-1,k} + r^2\left(u^n_{j,k+1} + u^{n+1}_{j,k-1}\right) - \Delta x^2 f_{j,k}}{2(1 + r^2)} \tag{9.14}$$

where r = Δx/Δy is the aspect ratio of the grid. Notice that 9.14 has only 3 unknowns, all on row k, since $u^{n+1}_{j,k-1}$ is known from either a boundary condition or the previous row sweep, and $u^n_{j,k+1}$ is still lagged in time. Hence a simple tridiagonal solver can be used to update the rows one by one. The amplification factor for this variation on the Gauss-Seidel method is given by:

$$|G|^2 = \frac{r^4}{\left[2(1 + r^2 - \cos\alpha)\right]^2 - 4r^2(1 + r^2 - \cos\alpha)\cos\beta + r^4} \tag{9.15}$$

A plot of |G| for r = 1 is shown in figure 9.3. The areas with small |G| have expanded with respect to those shown in figure 9.2. In order to symmetrize the iterations along the two directions, it is natural to follow a sweep by rows with a sweep by columns.
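Each row sweep of equation 9.14 requires the solution of a tridiagonal system. Multiplying 9.14 by 2(1 + r²), row k reads $-u_{j-1,k} + 2(1+r^2)u_{j,k} - u_{j+1,k} = r^2(u^n_{j,k+1} + u^{n+1}_{j,k-1}) - \Delta x^2 f_{j,k}$, with the left-hand side at level n+1. The classical Thomas algorithm is one standard way to solve such a system. The sketch below is plain Python with names of our choosing, and it assumes diagonal dominance (true here, since 2(1 + r²) > 2) so that no pivoting is needed.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    a: sub-diagonal (a[0] is ignored), b: diagonal, c: super-diagonal
    (c[-1] is ignored), d: right-hand side; all lists of length n.
    Assumes diagonal dominance, so the pivots b[i] - a[i]*cp[i-1] stay nonzero.
    """
    n = len(b)
    cp = [0.0] * n                  # modified super-diagonal
    dp = [0.0] * n                  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):           # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# One row of equation 9.14 with r = 1: -u_{j-1} + 4 u_j - u_{j+1} = rhs_j
r = 1.0
diag = 2.0 * (1.0 + r * r)
print(thomas([0.0, -1.0, -1.0], [diag] * 3, [-1.0, -1.0, 0.0], [2.0, 4.0, 10.0]))
# approximately [1.0, 2.0, 3.0]
```

The cost is O(n) per row, so a full row sweep costs about as much as a point sweep times a small constant. Table 9.1 shows that this constant can still outweigh the faster convergence.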
The amplification factor for this iteration is shown in the right panel of figure 9.3 and shows a substantial reduction in error amplitude for all wavelengths except the longest ones.

Example 13 In order to illustrate the efficiency of the different methods outlined above, we solve the following Laplace equation

$$\nabla^2 u = 0, \quad 0 \le x, y \le 1 \tag{9.16}$$

$$u(0, y) = u(1, y) = 0 \tag{9.17}$$

$$u(x, 1) = \sin \pi x \tag{9.18}$$

$$u(x, 0) = e^{-16\left(x - \frac{1}{4}\right)^2} \sin \pi x \tag{9.19}$$

We divide the unit square into M × N grid points and use the following methods: Jacobi, Gauss-Seidel, SOR, SOR by line in the x-direction, and SOR by line in both directions. We monitor the convergence history with the rms change in u from one iteration to the next:

$$\|\epsilon\|_2 = \left[\frac{1}{MN}\sum_{j,k}\left(u^{n+1}_{jk} - u^n_{jk}\right)^2\right]^{1/2} \tag{9.20}$$

The stopping criterion is $\|\epsilon\|_2 < 10^{-13}$, and we limit the maximum number of iterations to 7,000. We start all iterations with u = 0 (save for the boundary conditions) as an initial guess. The convergence history is shown in figure 9.4 for M = N = 65. The Jacobi and Gauss-Seidel methods have similar convergence histories, except near the end where Gauss-Seidel converges faster. The SOR iterations are the fastest, reducing the number of iterations required by almost a factor of 100. We have used the optimal relaxation factor since it was computable in our case. The three SOR variants also behave quite similarly, showing a slow decrease of the error in the initial stages but a very rapid decrease in the final stages.

The selection of an iteration algorithm should not rely solely on the algorithm's rate of convergence; it should also account for the operation count needed to complete each iteration. The convergence history for the above example shows that the 2-way line SOR needs the fewest iterations. However, table 9.1 shows that the total CPU time is cheapest for the point SOR. Thus, the overhead of the tridiagonal solver is not compensated by the faster convergence of the SOR by line iterations.
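A small-scale analogue of this experiment is easy to set up. The Python sketch below is an assumption-laden stand-in, not the code behind figure 9.4 or table 9.1. It uses a 17 × 17 grid and only the top boundary condition u(x, 1) = sin πx, with the other three sides set to zero. It stops at a looser tolerance of 10⁻¹⁰ on the rms change of equation 9.20. For SOR it uses the optimal factor from equation 9.13 with M = N and Δx = Δy, which reduces to $\omega_o = 2/(1 + \sin\frac{\pi}{N-1})$. The function name `solve_laplace` is hypothetical.

```python
import math

def solve_laplace(n, method, omega=1.0, tol=1e-10, max_iter=20000):
    """Relaxation solve of lap(u) = 0 on the unit square, n x n points.

    Assumed boundary data: u = sin(pi x) on y = 1, u = 0 on the other sides.
    method: "jacobi", or "sor" (omega = 1 gives Gauss-Seidel).
    Returns (u, iterations); stops when the rms change (eq. 9.20) < tol.
    """
    h = 1.0 / (n - 1)
    u = [[0.0] * n for _ in range(n)]             # u[k][j]: k ~ y, j ~ x
    for j in range(n):
        u[n - 1][j] = math.sin(math.pi * j * h)
    for it in range(1, max_iter + 1):
        # Jacobi reads a frozen copy of the previous iterate; GS/SOR read u in place
        src = [row[:] for row in u] if method == "jacobi" else u
        change2 = 0.0
        for k in range(1, n - 1):
            for j in range(1, n - 1):
                avg = 0.25 * (src[k][j + 1] + src[k][j - 1]
                              + src[k + 1][j] + src[k - 1][j])
                if method == "jacobi":
                    new = avg
                else:                             # over-relaxed Gauss-Seidel value
                    new = u[k][j] + omega * (avg - u[k][j])
                change2 += (new - u[k][j]) ** 2
                u[k][j] = new
        if math.sqrt(change2 / (n * n)) < tol:
            return u, it
    return u, max_iter


n = 17
omega_opt = 2.0 / (1.0 + math.sin(math.pi / (n - 1)))   # eq. 9.13 with M = N = n
for name, method, w in [("Jacobi", "jacobi", 1.0),
                        ("Gauss-Seidel", "sor", 1.0),
                        ("SOR", "sor", omega_opt)]:
    _, iters = solve_laplace(n, method, w)
    print(f"{name:12s} {iters:5d} iterations")
```

Expect Jacobi to need roughly twice as many iterations as Gauss-Seidel, and SOR far fewer than either. The exact counts depend on the grid size and tolerance. The sketch counts iterations only. A CPU-time comparison like table 9.1 also depends on the cost of each iteration.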
Table 9.1 also shows that, where applicable, the FFT-based fast solvers are by far the cheapest.

[Figure 9.4: Convergence history for the Laplace equation, |ε|₂ versus iteration number n on logarithmic axes. The system of equations is solved with: Jacobi (green), Gauss-Seidel (red), SOR (black), line SOR in x (solid blue), and line SOR in x and y (dashed blue). Here M = N = 65.]

Table 9.1: CPU time in seconds to solve the Laplace equation versus the number of points (top row).

Method          33       65       129
Jacobi          0.161    0.682    2.769
Gauss-Seidel    0.131    2.197    10.789
SOR             0.009    0.056    0.793
SOR-Line        0.013    0.164    1.291
SOR-Line 2      0.014    0.251    1.403
FFT             0.000    0.001    0.004

9.1.5 Matrix Analysis

The relaxation schemes presented above are not restricted to the Poisson equation but can be reinterpreted as specific instances of a larger class of schemes. We present the matrix approach in order to unify the different schemes presented. Let the matrix A be split into

$$A = N - P \tag{9.21}$$

where N and P are matrices of the same order as A. The system of equations becomes:

$$N x = P x + b \tag{9.22}$$

Starting with an arbitrary vector $x^{(0)}$, we define a sequence of vectors $x^{(n)}$ by the recursion

$$N x^{(n)} = P x^{(n-1)} + b, \quad n = 1, 2, 3, \ldots \tag{9.23}$$

It is now clear what kind of restrictions need to be imposed on the matrices in order to solve for x, namely: the matrix N must be non-singular, det(N) ≠ 0, and the matrix N must be easily invertible, so that computing y from Ny = z is computationally efficient. In order to study how fast the iterations converge to the correct solution, we introduce the matrix $M = N^{-1}P$ and the error vectors $e^{(n)} = x^{(n)} - x$. Subtracting equation 9.22 from equation 9.23, we obtain an equation governing the evolution of the error:

$$e^{(n)} = M e^{(n-1)} = M^2 e^{(n-2)} = \ldots = M^n e^{(0)} \tag{9.24}$$

where $e^{(0)}$ is the initial error. Thus, it is clear that a sufficient condition for convergence, i.e.
that $\lim_{n\to\infty} e^{(n)} = 0$, is that $\lim_{n\to\infty} M^n = O$. This is also necessary for the method to converge for all $e^{(0)}$. The condition for a matrix to be convergent is that its spectral radius satisfies ρ(M) < 1. (Reminder: the spectral radius of a matrix M is defined as its largest eigenvalue in magnitude: $\rho(M) = \max_i |\lambda_i|$.) Since computing the eigenvalues is usually difficult, and since the spectral radius is a lower bound for any induced matrix norm, we often revert to imposing conditions on the matrix norm to enforce convergence; thus

$$\rho(M) \le \|M\| < 1. \tag{9.25}$$

In particular, it is common to use either the 1- or the infinity-norm since these are the simplest to calculate. The spectral radius is also useful in defining the rate of convergence of the method. Using equation 9.24, one can bound the norm of the error by:

$$\frac{\|e^{(n)}\|}{\|e^{(0)}\|} \le \|M^n\| \tag{9.26}$$

$$\|M^n\| \approx [\rho(M)]^n \quad \text{for large } n \tag{9.27}$$

since $\|M^n\|^{1/n} \to \rho(M)$ as n → ∞ (for a symmetric M and the 2-norm, $\|M^n\|_2 = [\rho(M)]^n$ exactly). Thus the number of iterations needed to reduce the initial error by a factor α < 1 is roughly n ≥ ln α / ln ρ(M). A small spectral radius therefore reduces the number of iterations (and hence the CPU cost) needed for convergence.

Jacobi Method

The Jacobi method derived for the Poisson equation can be generalized by defining the matrix N as the diagonal of matrix A:

$$N = D, \qquad P = D - A \tag{9.28}$$

The matrix $D = a_{ij}\delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta. The iteration matrix is $M = D^{-1}(D - A) = I - D^{-1}A$. In component form the update takes the form:

$$x^n_i = \frac{1}{a_{ii}}\Big(b_i - \sum_{j=1,\, j\ne i}^{K} a_{ij}\, x^{n-1}_j\Big) \tag{9.29}$$

The procedure can be employed if $a_{ii} \ne 0$, i.e. all the diagonal elements of A are different from zero. The rate of convergence is in general difficult to obtain since the eigenvalues are not easily available. However, the infinity- and/or 1-norm of M can be easily obtained:

$$\rho(M) \le \min\left(\|M\|_1, \|M\|_\infty\right) \tag{9.30}$$

$$\|M\|_1 = \max_j \sum_{i=1,\, i\ne j}^{K} \left|\frac{a_{ij}}{a_{ii}}\right| < 1 \tag{9.31}$$

$$\|M\|_\infty = \max_i \sum_{j=1,\, j\ne i}^{K} \left|\frac{a_{ij}}{a_{ii}}\right| < 1 \tag{9.32}$$

Gauss-Seidel Method

A change of splitting leads to the Gauss-Seidel method.
Thus we split the matrix into a lower triangular matrix and an upper triangular matrix:

$$N = \begin{pmatrix} a_{11} & & & \\ a_{21} & a_{22} & & \\ \vdots & & \ddots & \\ a_{K1} & a_{K2} & \cdots & a_{KK} \end{pmatrix}, \qquad P = N - A \tag{9.34}$$

A slightly different way of writing this splitting is A = D + L + U, where D is again the diagonal part of A, L is a strictly lower triangular matrix, and U is a strictly upper triangular matrix; here N = D + L and P = −U. The matrix notation for the SOR iteration, with relaxation factor ω, is a little complicated but can be computed:

$$x^m = M x^{m-1} + \left(I + \omega D^{-1}L\right)^{-1}\omega D^{-1} b \tag{9.35}$$

$$M = \left(I + \omega D^{-1}L\right)^{-1}\left[(1 - \omega)I - \omega D^{-1}U\right] \tag{9.36}$$

9.2 Krylov Method-CG

Consider the system of equations Ax = b, where the matrix A is symmetric positive definite. The solution of the system of equations is equivalent to minimizing the functional:

$$\Phi(x) = \frac{1}{2}x^T A x - x^T b. \tag{9.37}$$

The extremum occurs for $\partial\Phi/\partial x = Ax - b = 0$, thanks to the symmetry of the matrix, and the positivity of the matrix shows that this extremum is a minimum, i.e. $\partial^2\Phi/\partial x^2 = A$. The iterations have the form:

$$x_k = x_{k-1} + \alpha p_k \tag{9.38}$$

where $x_k$ is the k-th iterate, α is a scalar, and $p_k$ are the search directions. The two parameters at our disposal are α and p. We also define the residual vector $r_k = b - Ax_k$. We can now relate $\Phi(x_k)$ to $\Phi(x_{k-1})$:

$$\Phi(x_k) = \frac{1}{2}x_k^T A x_k - x_k^T b = \Phi(x_{k-1}) + \alpha x_{k-1}^T A p_k + \frac{\alpha^2}{2} p_k^T A p_k - \alpha p_k^T b \tag{9.39}$$

For an efficient iteration algorithm, the 2nd and 3rd terms on the right-hand side of equation 9.39 have to be minimized separately. The task is considerably simplified if we require the search directions $p_k$ to be A-orthogonal to the solution:

$$x_{k-1}^T A p_k = 0. \tag{9.40}$$

The remaining task is to choose α such that the last two terms in 9.39 are minimized. It is a simple matter to show that the optimal α occurs for

$$\alpha = \frac{p_k^T b}{p_k^T A p_k}, \tag{9.41}$$

and that the new value of the functional will be:

$$\Phi(x_k) = \Phi(x_{k-1}) - \frac{1}{2}\frac{(p_k^T b)^2}{p_k^T A p_k}. \tag{9.42}$$

We can use the orthogonality requirement 9.40 to rewrite the above two equations as:

$$\alpha = \frac{p_k^T r_{k-1}}{p_k^T A p_k}, \qquad \Phi(x_k) = \Phi(x_{k-1}) - \frac{1}{2}\frac{(p_k^T r_{k-1})^2}{p_k^T A p_k}. \tag{9.43}$$

The remaining task in defining the iteration is to determine how to update the search vectors $p_k$; they must satisfy the orthogonality condition 9.40 and must maximize the decrease in the functional. Let us denote by $P_{k-1}$ the matrix formed by the (k − 1) column vectors $p_i$; then, since the iterates are linear combinations of the search vectors (starting from $x_0 = 0$), we can write:

$$x_{k-1} = \sum_{i=1}^{k-1}\alpha_i p_i = P_{k-1}\, y \tag{9.44}$$

$$P_{k-1} = \begin{pmatrix} p_1 & p_2 & \cdots & p_{k-1}\end{pmatrix} \tag{9.45}$$

$$y = \begin{pmatrix}\alpha_1 & \alpha_2 & \cdots & \alpha_{k-1}\end{pmatrix}^T \tag{9.46}$$

We note that the solution vector $x_{k-1}$ belongs to the space spanned by the search vectors $p_i$, i = 1, ..., k − 1. The orthogonality property can now be written as $y^T P_{k-1}^T A p_k = 0$. This property is easy to satisfy if the new search vector $p_k$ is A-orthogonal to all the previous search vectors, i.e. if $P_{k-1}^T A p_k = 0$. The algorithm can now be summarized as follows. First we initialize our computations by defining our initial guess and its residual; second we perform the following iterations:

while ‖r_k‖ > ε:
1. Choose $p_k$ such that $p_i^T A p_k = 0$ for all i < k, and maximize $p_k^T r_{k-1}$.
2. Compute the optimal $\alpha_k = \dfrac{p_k^T r_{k-1}}{p_k^T A p_k}$.
3. Update the guess $x_k = x_{k-1} + \alpha_k p_k$ and the residual $r_k = r_{k-1} - \alpha_k A p_k$.
end

A vector $p_k$ which is A-orthogonal to all previous search vectors, and such that $p_k^T r_{k-1} \ne 0$, can always be found (unless $r_{k-1} = 0$). Note that if $p_k^T r_{k-1} = 0$, then the functional does not decrease and the minimum has been reached, i.e. the system has been solved. To bring about the largest decrease in $\Phi(x_k)$, we must maximize the inner product $p_k^T r_{k-1}$. This can be done by minimizing the angle between the two vectors $p_k$ and $r_{k-1}$, i.e. minimizing $\|r_{k-1} - p_k\|$. Consider the following update for the search direction:

$$p_k = r_{k-1} - A P_{k-1} z_{k-1} \tag{9.47}$$

where $z_{k-1}$ is chosen to minimize $J = \|r_{k-1} - A P_{k-1} z\|^2$.
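The minimization viewpoint above leads to the conjugate gradient recurrences summarized at the end of this section (equations 9.51 to 9.53). As an illustration, here is a plain-Python sketch of that loop for a small dense symmetric positive definite matrix. The function name, the dense list-of-lists storage, the zero initial guess and the stopping test on ρ = ‖r‖² are our own choices. The matrix of Example 12 is negative definite, so CG would be applied to −A (solving −Ax = −b).

```python
def conjugate_gradient(A, b, tol=1e-20, max_iter=None):
    """Conjugate gradient for a symmetric positive definite matrix A.

    A is a dense list of lists, b a list; the initial guess is x = 0.
    Iterates while rho = ||r||^2 exceeds tol. Returns (x, iterations).
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(v, w):
        return sum(vi * wi for vi, wi in zip(v, w))

    x = [0.0] * n
    r = list(b)                      # r = b - A x with x = 0
    p = list(r)                      # first search direction
    rho = dot(r, r)
    max_iter = 10 * n if max_iter is None else max_iter
    k = 0
    while rho > tol and k < max_iter:
        k += 1
        w = matvec(p)                                   # w = A p
        alpha = rho / dot(p, w)                         # optimal step length
        x = [xi + alpha * pk for xi, pk in zip(x, p)]   # update guess
        r = [ri - alpha * wi for ri, wi in zip(r, w)]   # update residual
        rho_new = dot(r, r)
        beta = rho_new / rho
        p = [ri + beta * pk for ri, pk in zip(r, p)]    # new A-orthogonal direction
        rho = rho_new
    return x, k


x, k = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(x, k)   # x is close to [1/11, 7/11]; k == 2 for this 2 x 2 system
```

In exact arithmetic CG terminates in at most K iterations for a K × K system. In practice it is used as an iterative method whose speed is governed by the condition number, as the error bound in this section indicates.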
It is easy to show that the minimum occurs for

$$P_{k-1}^T A^T A P_{k-1} z = P_{k-1}^T A^T r_{k-1}, \tag{9.48}$$

and under this condition $p_k^T A P_{k-1} = 0$ and $\|p_k - r_{k-1}\|$ is minimized. We have the following property:

$$P_k^T r_k = 0, \tag{9.49}$$

i.e. the search vectors are orthogonal to the residual vectors. We note that (for $x_0 = 0$)

$$\mathrm{Span}\{p_1, p_2, \ldots, p_k\} = \mathrm{Span}\{r_0, r_1, \ldots, r_{k-1}\} = \mathrm{Span}\{b, Ab, \ldots, A^{k-1}b\} \tag{9.50}$$

i.e. these different basis sets span the same vector space. The final step in the conjugate gradient algorithm is that the search vectors can be written in the simple form:

$$p_k = r_{k-1} + \beta_k p_{k-1}, \tag{9.51}$$

$$\beta_k = -\frac{p_{k-1}^T A r_{k-1}}{p_{k-1}^T A p_{k-1}}, \tag{9.52}$$

$$\alpha_k = \frac{r_{k-1}^T r_{k-1}}{p_k^T A p_k}. \tag{9.53}$$

The conjugate gradient algorithm can now be summarized as:

  Initialize: r = b − Ax, p = r, ρ = ‖r‖².
  while ρ > ε:
    k ← k + 1
    w ← Ap
    α = ρ / (pᵀw)
    update guess: x ← x + αp
    update residual: r ← r − αw
    new residual norm: ρ′ ← ‖r‖²
    update search direction: β = ρ′/ρ, p ← r + βp
    update residual norm: ρ ← ρ′
  end

It can be shown that the error in the CG algorithm after k iterations is bounded by:

$$\|x_k - x\|_A \le 2\|x_0 - x\|_A\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^k \tag{9.54}$$

where κ(A) is the condition number,

$$\kappa(A) = \|A\|\,\|A^{-1}\| = \frac{\max_i |\lambda_i|}{\min_i |\lambda_i|}, \tag{9.55}$$

the ratio of the largest to the smallest eigenvalue in magnitude. The error estimate uses the A-norm: $\|w\|_A = \sqrt{w^T A w}$. Note that for very large condition numbers, κ ≫ 1, the rate of error decrease approaches 1:

$$\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \approx 1 - \frac{2}{\sqrt{\kappa}} \approx 1 \tag{9.56}$$

Hence the number of iterations needed to reach convergence increases. For efficient iterations κ must be close to 1, i.e. the eigenvalues must be clustered. The problem becomes one of converting the original problem Ax = b into $\tilde A \tilde x = \tilde b$ with $\kappa(\tilde A) \approx 1$ (preconditioning).

9.3 Direct Methods

9.3.1 Periodic Problem

We will be mainly concerned with FFT based direct methods.
These are based on the efficiency of the FFT in diagonalizing the matrix A through the transformation $D = Q^{-1}AQ$, where Q is the unitary matrix made up of the eigenvectors of A. These eigenvectors depend on the shape of the domain and on the boundary conditions of the problem. The method is applicable to separable elliptic problems only. For a doubly periodic problem, we can write:

$$u_{jk} = \frac{1}{MN}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\hat u_{mn}\, e^{-i\frac{2\pi jm}{M}}\, e^{-i\frac{2\pi kn}{N}} \tag{9.57}$$

where $\hat u_{mn}$ are the discrete Fourier coefficients. A similar expression can be written for the right-hand side function f. Substituting the Fourier expansions into the discrete Poisson equation 9.2 (with Δx = Δy = Δ) we get:

$$\frac{1}{MN}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\hat u_{mn}\, e^{-i\frac{2\pi jm}{M}} e^{-i\frac{2\pi kn}{N}}\left(e^{-i\frac{2\pi m}{M}} + e^{i\frac{2\pi m}{M}} + e^{-i\frac{2\pi n}{N}} + e^{i\frac{2\pi n}{N}} - 4\right) = \Delta^2\,\frac{1}{MN}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\hat f_{mn}\, e^{-i\frac{2\pi jm}{M}} e^{-i\frac{2\pi kn}{N}} \tag{9.58}$$

Since the Fourier functions form an orthogonal basis, the Fourier coefficients must match individually. Thus, one obtains the following expression for the unknowns $\hat u_{mn}$:

$$\hat u_{mn} = \frac{\Delta^2 \hat f_{mn}}{2\left(\cos\frac{2\pi m}{M} + \cos\frac{2\pi n}{N} - 2\right)}, \quad m = 0, 1, \ldots, M-1, \; n = 0, 1, \ldots, N-1 \tag{9.59}$$

The denominator vanishes for m = n = 0: the mean of f must then be zero for a solution to exist, and $\hat u_{00}$ (the mean of u) is arbitrary.

9.3.2 Dirichlet Problem

For a Dirichlet problem with homogeneous boundary conditions on all boundaries, the following expansion satisfies the boundary conditions identically:

$$u_{jk} = \frac{1}{MN}\sum_{m=1}^{M-1}\sum_{n=1}^{N-1}\hat u_{mn}\sin\frac{\pi jm}{M}\sin\frac{\pi kn}{N} \tag{9.60}$$

Again, the sine basis functions are orthogonal and hence the Fourier coefficients can be computed as

$$\hat u_{mn} = \frac{\Delta^2 \hat f_{mn}}{2\left(\cos\frac{\pi m}{M} + \cos\frac{\pi n}{N} - 2\right)}, \quad m = 1, \ldots, M-1, \; n = 1, \ldots, N-1 \tag{9.61}$$

Again, the efficiency of the method rests on the FFT algorithm. Specialized routines for sine transforms are available.

Chapter 10 Nonlinear equations

Linear stability analysis is not sufficient to establish the stability of finite difference approximations to nonlinear PDEs. The nonlinearities add severe complications to the equations by providing a continuous source for the generation of small scales.
Here we investigate how to approach nonlinear problems, and ways to mitigate or control the growth of nonlinear instabilities.

10.1 Aliasing

In a constant-coefficient linear PDE, no new Fourier components are created that are not already present in the initial and boundary conditions or in the forcing functions. This is not the case if nonlinear terms are present or if the coefficients of a linear PDE are not constant. For example, if two periodic functions, $\phi = e^{ik_1 x_j}$ and $\psi = e^{ik_2 x_j}$, are multiplied during the course of a calculation, a new Fourier mode with wavenumber $k_1 + k_2$ is generated:

$$\phi\psi = e^{i(k_1 + k_2)x_j}. \tag{10.1}$$

The new wave will be shorter than its parents if $k_1$ and $k_2$ have the same sign, i.e. $\frac{2\pi}{k_1 + k_2} < \frac{2\pi}{k_{1,2}}$. The representation of this new wave on the finite difference grid can become problematic if its wavelength is smaller than twice the grid spacing. In this case the wave can be mistaken for a longer wave via aliasing. Aliasing occurs because a function defined on a discrete grid has a limit on the shortest wavelength it can represent; all waves shorter than this limit appear as longer waves. The shortest wavelength representable on a finite difference grid with step size Δx is $\lambda_s = 2\Delta x$, and hence the largest wavenumber is $k_{max} = 2\pi/\lambda_s = \pi/\Delta x$. Figure 10.1 shows an example of a long and a short wave aliased on a finite difference grid consisting of 6 cells. The solid line represents the function $\sin\frac{6\pi x}{4\Delta x}$ and is indistinguishable from the function $\sin\frac{-2\pi x}{4\Delta x}$ (dashed line): the two functions coincide at all points of the grid (as marked by the solid circles). This coincidence can be explained by rewriting each Fourier mode as:

$$e^{ikx_j} = e^{ikj\Delta x} = e^{ikj\Delta x}e^{i2\pi nj} = e^{i\left(k + \frac{2\pi n}{\Delta x}\right)j\Delta x}, \tag{10.2}$$

where n = 0, ±1, ±2, ...

[Figure 10.1: Aliasing of the function $\sin\frac{6\pi x}{4\Delta x}$ (solid line) and the function $\sin\frac{-2\pi x}{4\Delta x}$ (dashed line). The two functions have the same values on the FD grid jΔx.]
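Relation 10.2 can be checked directly. The Python sketch below (names ours) folds a wavenumber back into the resolvable band (−π/Δx, π/Δx] by adding the appropriate integer multiple of 2π/Δx. It then confirms that the two sine waves of figure 10.1 agree at the 7 points of a 6-cell grid. The last line folds a product wavenumber $k_1 + k_2 = 1.6\pi/\Delta x$, an arbitrary example value, into the resolvable band.

```python
import math

def alias_wavenumber(k, dx):
    """Return the wavenumber in (-pi/dx, pi/dx] that equals k on a grid of spacing dx.

    Uses relation 10.2: k and k + 2*pi*n/dx are indistinguishable at x_j = j*dx.
    """
    kmax = math.pi / dx
    n = math.floor((kmax - k) / (2 * kmax))   # integer shift n of relation 10.2
    return k + 2 * n * kmax


dx = 1.0
k_short = 6 * math.pi / (4 * dx)          # wavelength 4*dx/3, not resolvable
k_alias = alias_wavenumber(k_short, dx)   # -2*pi/(4*dx): wavelength 4*dx (n = -1)
grid = [j * dx for j in range(7)]         # 6 cells -> 7 grid points
mismatch = max(abs(math.sin(k_short * x) - math.sin(k_alias * x)) for x in grid)
print(f"k_alias*dx/pi = {k_alias * dx / math.pi:.3f}, max mismatch = {mismatch:.1e}")

# A quadratic product with k1 + k2 = 1.6*pi/dx folds back to -0.4*pi/dx
print(alias_wavenumber(1.6 * math.pi / dx, dx) * dx / math.pi)
```

The mismatch is at round-off level. The folded wavenumber of the product agrees with the first case of equation 10.3, $k_1 + k_2 - 2\pi/\Delta x$.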
Relation 10.2 is satisfied at all the FD grid points $x_j = j\Delta x$; it shows that all waves with wavenumber $k + \frac{2\pi n}{\Delta x}$ are indistinguishable on a finite difference grid with grid size Δx. In the case shown in figure 10.1, the long wave has length 4Δx and the short wave has length 4Δx/3, so that equation 10.2 applies with n = −1.

Going back to the example of the quadratic nonlinearity φψ: although the individual functions φ and ψ are representable on the FD grid, i.e. $|k_{1,2}| \le \pi/\Delta x$, their product may not be, since now $|k_1 + k_2| \le 2\pi/\Delta x$. In particular, if $\pi/\Delta x \le |k_1 + k_2| \le 2\pi/\Delta x$, the product will be unresolvable on the discrete grid and will be aliased into wavenumber $\tilde k$ given by

$$\tilde k = \begin{cases} k_1 + k_2 - \frac{2\pi}{\Delta x}, & \text{if } k_1 + k_2 > \frac{\pi}{\Delta x} \\ k_1 + k_2 + \frac{2\pi}{\Delta x}, & \text{if } k_1 + k_2 < -\frac{\pi}{\Delta x} \end{cases} \tag{10.3}$$

Note that very short waves are aliased to very long waves: $\tilde k \to 0$ when $|k_{1,2}| \to \frac{\pi}{\Delta x}$.

[Figure 10.2: Folding of short waves into long waves, shown on the wavenumber axis with marks at $k_c$, $|\tilde k|$, π/Δx, $k_1 + k_2$ and 2π/Δx.]

The aliasing wavenumber can be visualized by looking at the wavenumber axis shown in figure 10.2; note that $k_1 + k_2$ and $|\tilde k|$ are symmetrically located about the point π/Δx. There exists a cut-off wavenumber $k_c$ such that, if only modes with $|k| < k_c$ are retained, all aliased products fall at $|\tilde k| > k_c$ and do not contaminate the retained modes. Since $|k_{1,2}| < k_c$, we have $k_1 + k_2 < 2k_c$. Thus if $k_1 + k_2 > k_{max}$, the product will be aliased into $|\tilde k| = 2k_{max} - (k_1 + k_2) > k_{max} - (2k_c - k_{max})$, and requiring this lower bound to exceed $k_c$ gives

$$k_c < \frac{2}{3}k_{max} \tag{10.4}$$

For a finite difference grid this is equivalent to $k_c < \frac{2\pi}{3\Delta x}$.

10.2 1D Burgers equation

The 1D Burgers equation is the simplest model of the nonlinearities found in the Navier-Stokes momentum equations:

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = 0, \quad 0 \le x \le L \tag{10.5}$$

Multiplying the above equation by $u^m$, where m ≥ 0, we can derive the following conservation law:

$$\frac{1}{m+1}\frac{\partial u^{m+1}}{\partial t} + \frac{1}{m+2}\frac{\partial u^{m+2}}{\partial x} = 0. \tag{10.6}$$

The above equation is a conservation equation for all moments of the solution; in particular, for m = 0 we have momentum conservation, and for m = 1 energy conservation. The spatial integral yields:

$$\frac{1}{m+1}\frac{\partial}{\partial t}\int_0^L u^{m+1}\,dx + \frac{u^{m+2}\big|_L - u^{m+2}\big|_0}{m+2} = 0, \tag{10.7}$$

which shows that the global budget of $u^{m+1}$ depends on the boundary values, and is conserved for periodic boundary conditions. We will explore the impact of spatial differencing on the continuum conservation property of the energy (second moment). A central difference scheme for the advective form of the equation, equation 10.5, yields:

$$\frac{\partial u_j}{\partial t} = -u_j\frac{u_{j+1} - u_{j-1}}{2\Delta x}. \tag{10.8}$$

Multiplying by $u_j$ and summing over the interval we get:

$$\frac{\partial}{\partial t}\sum_{j=0}^{N}\frac{u_j^2}{2} = -\frac{1}{2\Delta x}\sum_{j=0}^{N}\left(u_j^2 u_{j+1} - u_j^2 u_{j-1}\right). \tag{10.9}$$

Notice that the terms within the summation sign do not cancel out, and hence energy is not conserved. Likewise, a finite difference approximation to the conservative form:

$$\frac{\partial u_j}{\partial t} = -\frac{u_{j+1}^2 - u_{j-1}^2}{4\Delta x}, \tag{10.10}$$

is not energy conserving, as its discrete energy equation attests:

$$\frac{\partial}{\partial t}\sum_{j=0}^{N}\frac{u_j^2}{2} = -\frac{1}{4\Delta x}\sum_{j=0}^{N}\left(u_j u_{j+1}^2 - u_j u_{j-1}^2\right). \tag{10.11}$$

[Figure 10.3: Left: Solution of the inviscid Burgers equation at t = 0.318 < 1/π using the advective form (black), momentum conserving form (blue), and energy conserving form (red); the analytical solution is superimposed in green. The initial conditions are u(x, 0) = −sin πx, the boundary conditions are periodic, the time step is Δt = 0.01325, and Δx = 2/16; RK4 was used for the time integration. Right: Energy budget $\Delta x\sum_{i=1}^{N} u_i^2/2$ versus time for the different Burgers schemes: red is the energy conserving, blue is the momentum conserving, and black is the advective form.]

The only energy-conserving combination of these two centered forms is the following:

$$\frac{\partial u_j}{\partial t} = -\frac{u_{j+1} + u_j + u_{j-1}}{3}\,\frac{u_{j+1} - u_{j-1}}{2\Delta x}, \tag{10.12}$$

where the advection velocity is a three-term average of the velocity around the central point. Its discrete energy equation is given by:

$$\frac{\partial}{\partial t}\sum_{j=0}^{N}\frac{u_j^2}{2} = -\frac{1}{6\Delta x}\sum_{j=0}^{N}\left[u_j u_{j+1}(u_j + u_{j+1}) - u_j u_{j-1}(u_j + u_{j-1})\right] \tag{10.13}$$

where the terms inside the summation sign do cancel out (the sum telescopes for periodic boundary conditions). Figure 10.3 shows solutions of the Burgers equation using the 3 schemes listed above. The advective form, shown in black, does not conserve energy and exhibits oscillations near the front region. The oscillations are absent in both the flux and energy conserving forms. The flux form, equation 10.10, exhibits a decrease in the energy and in the amplitude of the waves. Note that the solution is shown just prior to the formation of the shock at time t = 0.318 < 1/π.

10.3 Quadratic Conservation

Building quadratic conserving schemes depends heavily on the system of equations considered. The remaining sections will be devoted to equations commonly found in the CFD/oceanic literature. We will concentrate on the following system of equations:

$$\mathbf{v}_t + \mathbf{v}\cdot\nabla\mathbf{v} + g\nabla\eta = \mathbf{f} \tag{10.14}$$

$$\alpha\eta_t + \nabla\cdot(h\mathbf{v}) = 0 \tag{10.15}$$

Equation 10.14 is the momentum equation and 10.15 is the continuity equation for a fluid moving with velocity v and subject to a pressure η. The parameter α controls the compressibility of the system; for α = 0 we recover the incompressible equations, and for α = 1 the shallow water equations. The parameter h is the thickness of the fluid layer, and f is a term lumping momentum sources and sinks (including dissipation). The system 10.14-10.15 implies conservation laws for energy, vorticity and enstrophy in the incompressible case, and for energy, potential vorticity and potential enstrophy in the compressible case, when the source terms on the right-hand sides are identically zero.
The question is: is it possible to enforce the conservation of these higher order quantities (mainly quadratic in the unknown variables) in the finite difference approximation? We will look in particular at energy/enstrophy conservation in non-divergent shallow water flow, and at energy/potential vorticity conservation in the divergent shallow water equations. We first begin by defining the following operators:

$$\delta_x u = \frac{u\left(x + \frac{\Delta x}{2}\right) - u\left(x - \frac{\Delta x}{2}\right)}{\Delta x} \tag{10.16}$$

$$\overline{u}^x = \frac{u\left(x + \frac{\Delta x}{2}\right) + u\left(x - \frac{\Delta x}{2}\right)}{2} \tag{10.17}$$

The operators defined above are nothing but the centered difference and averaging operators. It is easy to show that the following relationships hold for any two functions $a$ and $b$ defined on a finite difference grid:

$$\delta_x\left(\overline{a}^x\right) = \overline{\delta_x a}^x \tag{10.18}$$

$$\overline{a}^x\,\delta_x b = \delta_x(ab) - \overline{b}^x\,\delta_x a \tag{10.19}$$

$$a\,\delta_x b = \delta_x\left(\overline{a}^x\,b\right) - \overline{b\,\delta_x a}^x \tag{10.20}$$

$$\overline{ab}^x = \overline{a}^x\,\overline{b}^x + \frac{\Delta x^2}{4}\,(\delta_x a)(\delta_x b) \tag{10.21}$$

$$\overline{\overline{a}^x\,b}^x = a\,\overline{b}^x + \frac{\Delta x^2}{4}\,\delta_x\left(b\,\delta_x a\right) \tag{10.22}$$

The first relationship shows that the differencing and averaging operators commute, the second and third relations are the finite difference equivalent of the product differentiation rule, and the fourth and fifth are the finite difference equivalent of product averaging. It is easy to show the additional relationships

$$\overline{a}^x\,\delta_x a = \delta_x\left(\frac{a^2}{2}\right) \tag{10.23}$$

$$\overline{\overline{a}^x\,\delta_x a}^x = \delta_x\,\overline{\left(\frac{a^2}{2}\right)}^x \tag{10.24}$$

$$\left(\overline{a}^x\right)^2 - \overline{\left(\frac{a^2}{2}\right)}^x = \frac{\widetilde{a^2}^x}{2}, \qquad \widetilde{a^2}^x = a\left(x + \frac{\Delta x}{2}\right)a\left(x - \frac{\Delta x}{2}\right) \tag{10.25}$$

10.4 Nonlinear advection equation

The advection equation

$$\frac{DT}{Dt} = T_t + \mathbf{v}\cdot\nabla T = 0 \tag{10.26}$$

of a tracer $T$ by a flow field $\mathbf{v}$ that is divergence-free,

$$\nabla\cdot\mathbf{v} = 0, \tag{10.27}$$

is equivalent to the following conservation law:

$$T_t + \nabla\cdot(\mathbf{v}T) = 0. \tag{10.28}$$

Equation 10.26 is called the advective form, and equation 10.28 is called the flux (or conservative) form. The two forms are equivalent in the continuum provided the flow is divergence free. Note that the above statements hold regardless of the linearity/dimensionality of the system.
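The difference and averaging identities 10.18–10.23 and 10.25, and the end result 10.37 of the derivation below, can be verified numerically. The following sketch (an added illustration, not part of the original notes) assumes a periodic grid of integer points $x_i$ and half points $x_{i+1/2}$, with the half-point value at $x_{i+1/2}$ stored at list index $i$; `delta` and `avg` implement 10.16 and 10.17, and the helper names and random test fields are hypothetical.

```python
import random
from math import prod

def shift(a, s):
    # periodic shift: shift(a, 1)[i] == a[i + 1]
    n = len(a)
    return [a[(i + s) % n] for i in range(n)]

def delta(a, dx, to_half):
    # eq. 10.16. to_half=True: a at x_i, result at x_{i+1/2};
    # to_half=False: a at x_{i+1/2} (index i), result at x_i
    if to_half:
        return [(p - q) / dx for p, q in zip(shift(a, 1), a)]
    return [(p - q) / dx for p, q in zip(a, shift(a, -1))]

def avg(a, to_half):
    # eq. 10.17, same storage convention as delta()
    if to_half:
        return [(p + q) / 2 for p, q in zip(shift(a, 1), a)]
    return [(p + q) / 2 for p, q in zip(a, shift(a, -1))]

def mul(*arrays):
    return [prod(v) for v in zip(*arrays)]

def lin(*terms):
    # elementwise linear combination: lin((c1, a1), (c2, a2)) == c1*a1 + c2*a2
    return [sum(c * a[i] for c, a in terms) for i in range(len(terms[0][1]))]

def close(a, b, tol=1e-9):
    return max(abs(x - y) for x, y in zip(a, b)) < tol

rng = random.Random(0)
n, dx, H, I = 16, 0.1, True, False            # H: result at half points, I: at integer points
a = [rng.uniform(-1, 1) for _ in range(n)]    # integer-point field (plays u in 10.37)
b = [rng.uniform(-1, 1) for _ in range(n)]    # integer-point field (plays T in 10.37)
c = [rng.uniform(-1, 1) for _ in range(n)]    # half-point field

checks = {
    "10.18": close(delta(avg(a, H), dx, I), avg(delta(a, dx, H), I)),
    "10.19": close(mul(avg(a, H), delta(b, dx, H)),
                   lin((1, delta(mul(a, b), dx, H)), (-1, mul(avg(b, H), delta(a, dx, H))))),
    "10.20": close(mul(a, delta(c, dx, I)),
                   lin((1, delta(mul(avg(a, H), c), dx, I)), (-1, avg(mul(c, delta(a, dx, H)), I)))),
    "10.21": close(avg(mul(a, b), H),
                   lin((1, mul(avg(a, H), avg(b, H))), (dx**2 / 4, mul(delta(a, dx, H), delta(b, dx, H))))),
    "10.22": close(avg(mul(avg(a, H), c), I),
                   lin((1, mul(a, avg(c, I))), (dx**2 / 4, delta(mul(c, delta(a, dx, H)), dx, I)))),
    "10.23": close(mul(avg(a, H), delta(a, dx, H)), delta(lin((0.5, mul(a, a))), dx, H)),
    "10.25": close(lin((1, mul(avg(a, H), avg(a, H))), (-0.5, avg(mul(a, a), H))),
                   lin((0.5, mul(a, shift(a, 1))))),
    # 10.37 with u = a, T = b: T delta(u^x T^x) = delta(u^x T~^2/2) + (T^2/2) delta(u^x)
    "10.37": close(mul(b, delta(mul(avg(a, H), avg(b, H)), dx, I)),
                   lin((0.5, delta(mul(avg(a, H), mul(b, shift(b, 1))), dx, I)),
                       (0.5, mul(b, b, delta(avg(a, H), dx, I))))),
}
for name, ok in checks.items():
    print(name, ok)
```

Every entry should print True; a mismatch would point to an error in the corresponding identity or in the index bookkeeping.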
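Integration of the flux form over the domain $\Omega$ shows that

$$\frac{d}{dt}\int_\Omega T\,dV = -\oint_{\partial\Omega}\mathbf{n}\cdot\mathbf{v}\,T\,dS \tag{10.29}$$

where $\mathbf{n}$ is the unit outward normal, $\partial\Omega$ the boundary of the flow domain, and $dS$ an elemental surface on this boundary; the right-hand side is the net amount of $T$ entering $\Omega$. Equation 10.29 shows that the total inventory of $T$ in $\Omega$ depends on the amount of $T$ entering through the boundary. In particular, the total budget of $T$ should be constant if the boundary is closed ($\mathbf{v}\cdot\mathbf{n} = 0$) or if the domain is periodic. The above conservation laws imply higher order conservation. To wit, equation 10.26 can be multiplied by $(m+1)T^m$ (where $m \ge 0$) to derive:

$$\frac{\partial T^{m+1}}{\partial t} + \mathbf{v}\cdot\nabla T^{m+1} = 0, \tag{10.30}$$

i.e. the conservation law implies that all moments $T^{m+1}$ are also conserved. Since the above equation has the same form as the original one, the total inventory of $T^{m+1}$ will also be conserved under the same conditions as equation 10.29.

10.4.1 FD Approximation of the advection term

For a general flow field, the FD approximation of the advective form will not conserve the first moment, while the FD approximation of the flux form will. This is easy to see since the flux divergence is approximated as

$$\nabla\cdot(\mathbf{v}T) \approx \delta_x(uT) + \delta_y(vT), \tag{10.31}$$

whose terms telescope when summed over the grid. The relevant question is: is it possible to come up with a finite difference scheme that will conserve both the first and second moments? Let us look at the following approximation of the flux form

$$\nabla\cdot(\mathbf{v}T) \approx \delta_x\left(\overline{u}^x\,\overline{T}^x\right) + \delta_y\left(\overline{v}^y\,\overline{T}^y\right). \tag{10.32}$$

Can the term $T\,\nabla\cdot(\mathbf{v}T)$ also be written in flux form (for then it will be conserved upon summation)? We concentrate on the $x$-component for simplicity:

$$
\begin{aligned}
T\,\delta_x\left(\overline{u}^x\,\overline{T}^x\right)
&= \delta_x\left(\overline{u}^x\left(\overline{T}^x\right)^2\right) - \overline{\overline{u}^x\,\overline{T}^x\,\delta_x T}^x \qquad \text{(10.33)}\\
&= \delta_x\left(\overline{u}^x\left(\overline{T}^x\right)^2\right) - \overline{\overline{u}^x\,\delta_x\frac{T^2}{2}}^x \qquad \text{(10.34)}\\
&= \delta_x\left(\overline{u}^x\left(\overline{T}^x\right)^2\right) - \delta_x\,\overline{\left(u\,\frac{T^2}{2}\right)}^x + \overline{\overline{\left(\frac{T^2}{2}\right)}^x\delta_x u}^x \qquad \text{(10.35)}\\
&= \delta_x\left(\overline{u}^x\left(\overline{T}^x\right)^2\right) - \delta_x\left(\overline{u}^x\,\overline{\left(\frac{T^2}{2}\right)}^x\right) - \frac{\Delta x^2}{4}\,\delta_x\left(\delta_x u\,\delta_x\frac{T^2}{2}\right) + \frac{T^2}{2}\,\delta_x\overline{u}^x + \frac{\Delta x^2}{4}\,\delta_x\left(\delta_x u\,\delta_x\frac{T^2}{2}\right) \qquad \text{(10.36)}\\
&= \delta_x\left(\overline{u}^x\,\frac{\widetilde{T^2}^x}{2}\right) + \frac{T^2}{2}\,\delta_x\left(\overline{u}^x\right) \qquad \text{(10.37)}
\end{aligned}
$$

Equality 10.33 follows from property 10.20, 10.34 from 10.23, and 10.35 from 10.19.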
The second and third terms of equation 10.35 can be rewritten with the help of equations 10.21 and 10.22, respectively. The third and fifth terms on the right hand side of equation 10.36 cancel. The final equation 10.37 is obtained by combining the first and second terms of equation 10.36 (remember the operators are linear), and using equation 10.25. A similar derivation can be carried out for the $y$-component of the divergence:

$$T\,\delta_y\left(\overline{v}^y\,\overline{T}^y\right) = \delta_y\left(\overline{v}^y\,\frac{\widetilde{T^2}^y}{2}\right) + \frac{T^2}{2}\,\delta_y\left(\overline{v}^y\right) \tag{10.38}$$

Thus, the semi-discrete second moment conservation becomes:

$$\frac{\partial\left(T^2/2\right)}{\partial t} = -\delta_x\left(\overline{u}^x\,\frac{\widetilde{T^2}^x}{2}\right) - \delta_y\left(\overline{v}^y\,\frac{\widetilde{T^2}^y}{2}\right) - \frac{T^2}{2}\left[\delta_x\left(\overline{u}^x\right) + \delta_y\left(\overline{v}^y\right)\right] \tag{10.39}$$

The first and second terms in the semi-discrete conservation equation are in flux form, and hence will cancel out upon summation. The third term on the right hand side is nothing but the discrete divergence constraint. Thus, the second order moment of $T$ will be conserved provided that the velocity field is discretely divergence-free. The following is a FD approximation to $\mathbf{v}\cdot\nabla T$ consistent with the above derivation:

$$
\begin{aligned}
u\frac{\partial T}{\partial x} &= \frac{\partial (uT)}{\partial x} - T\frac{\partial u}{\partial x} \qquad \text{(10.40)}\\
&\approx \delta_x\left(\overline{u}^x\,\overline{T}^x\right) - T\,\delta_x\left(\overline{u}^x\right) \qquad \text{(10.41)}\\
&= T\,\delta_x\left(\overline{u}^x\right) + \overline{\overline{u}^x\,\delta_x T}^x - T\,\delta_x\left(\overline{u}^x\right) \qquad \text{(10.42)}\\
&= \overline{\overline{u}^x\,\delta_x T}^x \qquad \text{(10.43)}
\end{aligned}
$$

where 10.42 follows from 10.20. Thus, we have

$$\mathbf{v}\cdot\nabla T \approx \overline{\overline{u}^x\,\delta_x T}^x + \overline{\overline{v}^y\,\delta_y T}^y \tag{10.44}$$

10.5 Conservation in vorticity streamfunction formulation

Nonlinear instabilities can develop if energy is falsely generated and persistently channeled towards the shortest resolvable wavelengths. Arakawa (1966) and Arakawa and Lamb (1977) devised an elegant method to eliminate these artificial sources of energy. The methodology is based on the streamfunction-vorticity formulation of two dimensional, divergence-free fluid flows.
The continuity constraint can be easily enforced in 2D flow by introducing a streamfunction, $\psi$, such that $\mathbf{v} = \mathbf{k}\times\nabla\psi$; in component form this is:

$$u = -\psi_y, \qquad v = \psi_x \tag{10.45}$$

The vorticity $\zeta = \mathbf{k}\cdot\nabla\times\mathbf{v}$ reduces to

$$\zeta = v_x - u_y = \psi_{xx} + \psi_{yy} = \nabla^2\psi \tag{10.46}$$

and the vorticity advection equation can be obtained by taking the curl of the momentum equation, thus:

$$\frac{\partial\nabla^2\psi}{\partial t} = J\left(\nabla^2\psi, \psi\right) \tag{10.47}$$

where $J$ stands for the Jacobian operator:

$$J(a,b) = a_x b_y - b_x a_y \tag{10.48}$$

The Jacobian operator possesses some interesting properties:

1. It is anti-symmetric, i.e.
$$J(b,a) = -J(a,b) \tag{10.49}$$

2. The Jacobian can be written in the useful forms:
$$J(a,b) = \mathbf{k}\cdot\left(\nabla a\times\nabla b\right) \tag{10.50}$$
$$= -\nabla\cdot\left(a\,\mathbf{k}\times\nabla b\right) \tag{10.51}$$
$$= \nabla\cdot\left(b\,\mathbf{k}\times\nabla a\right) \tag{10.52}$$

3. The integral of the Jacobian over a closed domain can be turned into a boundary integral thanks to the above equations:
$$\int_\Omega J(a,b)\,dA = \oint_{\partial\Omega} a\,\frac{\partial b}{\partial s}\,ds = -\oint_{\partial\Omega} b\,\frac{\partial a}{\partial s}\,ds \tag{10.53}$$
where $s$ is the tangential direction to the boundary (traversed counterclockwise). Hence, the integral of the Jacobian vanishes if either $a$ or $b$ is constant along $\partial\Omega$. In particular, if the boundary is a streamline or a vortex line, the Jacobian integral vanishes. The area-averaged vorticity is hence conserved.

4. The following relations hold:
$$aJ(a,b) = J\left(\frac{a^2}{2}, b\right) \tag{10.54}$$
$$bJ(a,b) = J\left(a, \frac{b^2}{2}\right) \tag{10.55}$$
Thus, the area integrals of $aJ(a,b)$ and $bJ(a,b)$ are zero if either $a$ or $b$ is constant along the boundary.

It is easy to show (multiply equation 10.47 by $\zeta$, respectively by $-\psi$, and use 10.53–10.55) that enstrophy, $\zeta^2/2$, and kinetic energy, $|\nabla\psi|^2/2$, are conserved if the boundary is closed. We would like to investigate if we can conserve vorticity, energy and enstrophy in the discrete equations.
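We begin by noting that the Jacobian in the continuum can be written in one of three ways:

$$J(\zeta,\psi) = \zeta_x\psi_y - \zeta_y\psi_x \tag{10.56}$$
$$= (\zeta\psi_y)_x - (\zeta\psi_x)_y \tag{10.57}$$
$$= (\psi\zeta_x)_y - (\psi\zeta_y)_x \tag{10.58}$$

We can thus define three centered difference approximations to the above definitions:

$$J_1(\zeta,\psi) = \overline{\delta_x\zeta}^x\,\overline{\delta_y\psi}^y - \overline{\delta_y\zeta}^y\,\overline{\delta_x\psi}^x \tag{10.59}$$
$$J_2(\zeta,\psi) = \delta_x\left(\overline{\zeta\,\overline{\delta_y\psi}^y}^x\right) - \delta_y\left(\overline{\zeta\,\overline{\delta_x\psi}^x}^y\right) \tag{10.60}$$
$$J_3(\zeta,\psi) = \delta_y\left(\overline{\psi\,\overline{\delta_x\zeta}^x}^y\right) - \delta_x\left(\overline{\psi\,\overline{\delta_y\zeta}^y}^x\right) \tag{10.61}$$

It is obvious that $J_2$ and $J_3$ will conserve vorticity since they are in flux form; $J_1$ can also be shown to conserve the first moment since:

$$
\begin{aligned}
J_1(\zeta,\psi) &= \delta_x\left(\overline{\zeta}^x\,\overline{\delta_y\psi}^{xy}\right) - \overline{\overline{\zeta}^x\,\delta_x\overline{\delta_y\psi}^y}^x - \delta_y\left(\overline{\zeta}^y\,\overline{\delta_x\psi}^{xy}\right) + \overline{\overline{\zeta}^y\,\delta_y\overline{\delta_x\psi}^x}^y \qquad \text{(10.62)}\\
&= \delta_x\left(\overline{\zeta}^x\,\overline{\delta_y\psi}^{xy}\right) - \delta_y\left(\overline{\zeta}^y\,\overline{\delta_x\psi}^{xy}\right) - \frac{\Delta x^2}{4}\,\delta_x\left(\delta_x\zeta\;\delta_x\overline{\delta_y\psi}^y\right) + \frac{\Delta y^2}{4}\,\delta_y\left(\delta_y\zeta\;\delta_y\overline{\delta_x\psi}^x\right) \qquad \text{(10.63)}
\end{aligned}
$$

where 10.62 follows from 10.18 and 10.20, and 10.63 from 10.22 once the two non-flux terms it produces, each equal to $\zeta\,\delta_x\delta_y\overline{\psi}^{xy}$, cancel. The last equation above shows that $J_1$ can indeed be written in flux form, and hence vorticity conservation is ensured. Now we turn our attention to the conservation of quadratic quantities, namely, kinetic energy and enstrophy. It is easy to show that $J_2$ conserves kinetic energy since:

$$
\begin{aligned}
\psi J_2(\zeta,\psi) &= \psi\,\delta_x\left(\overline{\zeta\,\overline{\delta_y\psi}^y}^x\right) - \psi\,\delta_y\left(\overline{\zeta\,\overline{\delta_x\psi}^x}^y\right) \qquad \text{(10.64)}\\
&= \delta_x\left(\overline{\psi}^x\,\overline{\zeta\,\overline{\delta_y\psi}^y}^x\right) - \delta_y\left(\overline{\psi}^y\,\overline{\zeta\,\overline{\delta_x\psi}^x}^y\right) - \overline{\delta_x\psi\,\overline{\zeta\,\overline{\delta_y\psi}^y}^x}^x + \overline{\delta_y\psi\,\overline{\zeta\,\overline{\delta_x\psi}^x}^y}^y \qquad \text{(10.65)}\\
&= \delta_x\left(\overline{\psi}^x\,\overline{\zeta\,\overline{\delta_y\psi}^y}^x\right) - \delta_y\left(\overline{\psi}^y\,\overline{\zeta\,\overline{\delta_x\psi}^x}^y\right) - \frac{\Delta x^2}{4}\,\delta_x\left(\delta_x\psi\;\delta_x\left(\zeta\,\overline{\delta_y\psi}^y\right)\right) + \frac{\Delta y^2}{4}\,\delta_y\left(\delta_y\psi\;\delta_y\left(\zeta\,\overline{\delta_x\psi}^x\right)\right) \qquad \text{(10.66)}
\end{aligned}
$$

where 10.65 follows from 10.20 and 10.66 from 10.22, the non-flux terms $\zeta\,\overline{\delta_x\psi}^x\,\overline{\delta_y\psi}^y$ cancelling in the last step. Similarly, it can be shown that the average of $J_1$ and $J_2$ conserves enstrophy:

$$\zeta\,\frac{J_1 + J_2}{2} = \delta_x\left(\frac{\widetilde{\zeta^2}^x}{2}\,\overline{\delta_y\psi}^{xy}\right) - \delta_y\left(\frac{\widetilde{\zeta^2}^y}{2}\,\overline{\delta_x\psi}^{xy}\right) \tag{10.67}$$

Notice that the finite difference Jacobians satisfy the following properties:

$$J_1(\psi,\zeta) = -J_1(\zeta,\psi) \tag{10.68}$$
$$J_2(\psi,\zeta) = -J_3(\zeta,\psi). \tag{10.69}$$

Hence, from equation 10.66, $\zeta J_3(\zeta,\psi)$ can be written in flux form, and from equation 10.67, $\psi\,(J_1 + J_3)/2$ can also be written in flux form. These results can be tabulated:

$$\begin{array}{ll} \text{energy conserving:} & J_2, \quad \dfrac{J_1 + J_3}{2} \\[2ex] \text{enstrophy conserving:} & J_3, \quad \dfrac{J_1 + J_2}{2} \end{array}$$

Notice that any linear combination of the energy conserving schemes will also be energy conserving (its weights must sum to one for the result to remain a consistent Jacobian), and likewise for the enstrophy conserving forms.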
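The conservation properties tabulated above are easy to confirm numerically. The sketch below is an added illustration, not the notes' code: it assumes a doubly periodic $n\times n$ grid with equal spacing $d$ in $x$ (index $j$) and $y$ (index $k$), evaluates $J_1$, $J_2$ and $J_3$ from 10.59–10.61 (for which $\overline{\delta_x a}^x$ reduces to the centered difference over $2d$), and sums them against $1$, $\psi$ and $\zeta$. Because the identities are purely algebraic summation-by-parts results, $\zeta$ and $\psi$ can be independent random fields; the function names are hypothetical.

```python
import random

def jacobians(z, p, d=1.0):
    """J1, J2, J3 of eqs. 10.59-10.61 for zeta = z, psi = p on a periodic n x n grid."""
    n = len(z)
    Z = lambda j, k: z[j % n][k % n]
    P = lambda j, k: p[j % n][k % n]
    Dx = lambda f, j, k: (f(j + 1, k) - f(j - 1, k)) / (2 * d)   # overline{delta_x f}^x
    Dy = lambda f, j, k: (f(j, k + 1) - f(j, k - 1)) / (2 * d)   # overline{delta_y f}^y
    J1, J2, J3 = ([[0.0] * n for _ in range(n)] for _ in range(3))
    for j in range(n):
        for k in range(n):
            J1[j][k] = Dx(Z, j, k) * Dy(P, j, k) - Dy(Z, j, k) * Dx(P, j, k)
            # flux form (zeta psi_y)_x - (zeta psi_x)_y
            J2[j][k] = (Z(j + 1, k) * Dy(P, j + 1, k) - Z(j - 1, k) * Dy(P, j - 1, k)
                        - Z(j, k + 1) * Dx(P, j, k + 1) + Z(j, k - 1) * Dx(P, j, k - 1)) / (2 * d)
            # flux form (psi zeta_x)_y - (psi zeta_y)_x
            J3[j][k] = (P(j, k + 1) * Dx(Z, j, k + 1) - P(j, k - 1) * Dx(Z, j, k - 1)
                        - P(j + 1, k) * Dy(Z, j + 1, k) + P(j - 1, k) * Dy(Z, j - 1, k)) / (2 * d)
    return J1, J2, J3

def inner(w, J):
    # grid sum of w * J (the discrete area integral, up to a factor d^2)
    n = len(J)
    return sum(w[j][k] * J[j][k] for j in range(n) for k in range(n))

def combo(*pairs):
    # linear combination of grid functions, e.g. combo((0.5, J1), (0.5, J2))
    n = len(pairs[0][1])
    return [[sum(c * J[j][k] for c, J in pairs) for k in range(n)] for j in range(n)]

rng = random.Random(0)
n = 12
zeta = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
psi = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
ones = [[1.0] * n for _ in range(n)]
J1, J2, J3 = jacobians(zeta, psi)
JA = combo((1 / 3, J1), (1 / 3, J2), (1 / 3, J3))          # eq. 10.71
E13, Z12 = combo((0.5, J1), (0.5, J3)), combo((0.5, J1), (0.5, J2))
report = {
    "vorticity  sum(J1), sum(J2), sum(J3)": (inner(ones, J1), inner(ones, J2), inner(ones, J3)),
    "energy     sum(psi*J2), sum(psi*(J1+J3)/2)": (inner(psi, J2), inner(psi, E13)),
    "enstrophy  sum(zeta*J3), sum(zeta*(J1+J2)/2)": (inner(zeta, J3), inner(zeta, Z12)),
    "J_A        sum(psi*JA), sum(zeta*JA)": (inner(psi, JA), inner(zeta, JA)),
    "J1 alone   sum(psi*J1), sum(zeta*J1)": (inner(psi, J1), inner(zeta, J1)),
}
for label, values in report.items():
    print(label, ["%.1e" % v for v in values])
```

All entries except the last line are zero to round-off; $J_1$ by itself conserves neither quadratic quantity, which is why it only enters the conserving schemes in combination.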
Thus, it is possible to find an energy and enstrophy conserving Jacobian if we can find two constants $\alpha$ and $\beta$ such that:

$$J_A = \alpha J_2 + (1 - \alpha)\,\frac{J_1 + J_3}{2} = \beta J_3 + (1 - \beta)\,\frac{J_1 + J_2}{2} \tag{10.70}$$

Equating like terms in $J_1$, $J_2$ and $J_3$, we can solve the system of equations, which gives $\alpha = \beta = 1/3$. The final result can be written as:

$$J_A = \frac{J_1 + J_2 + J_3}{3} \tag{10.71}$$

Equation 10.71 defines the Arakawa Jacobian, named in honor of Akio Arakawa who first proposed it. The expression for $J_A$ in terms of the FD computational stencil is a little complicated. We give the expression for a square grid spacing, $\Delta x = \Delta y$:

$$
\begin{aligned}
12\,\Delta x\,\Delta y\,J_A(\zeta,\psi) ={}& \left(\zeta_{j+1,k} + \zeta_{j,k}\right)\left(\psi_{j+1,k+1} + \psi_{j,k+1} - \psi_{j+1,k-1} - \psi_{j,k-1}\right)\\
&+ \left(\zeta_{j,k+1} + \zeta_{j,k}\right)\left(\psi_{j-1,k+1} + \psi_{j-1,k} - \psi_{j+1,k+1} - \psi_{j+1,k}\right)\\
&+ \left(\zeta_{j-1,k} + \zeta_{j,k}\right)\left(\psi_{j-1,k-1} + \psi_{j,k-1} - \psi_{j-1,k+1} - \psi_{j,k+1}\right)\\
&+ \left(\zeta_{j,k-1} + \zeta_{j,k}\right)\left(\psi_{j+1,k-1} + \psi_{j+1,k} - \psi_{j-1,k-1} - \psi_{j-1,k}\right)\\
&+ \left(\zeta_{j+1,k+1} + \zeta_{j,k}\right)\left(\psi_{j,k+1} - \psi_{j+1,k}\right)\\
&+ \left(\zeta_{j-1,k+1} + \zeta_{j,k}\right)\left(\psi_{j-1,k} - \psi_{j,k+1}\right)\\
&+ \left(\zeta_{j-1,k-1} + \zeta_{j,k}\right)\left(\psi_{j,k-1} - \psi_{j-1,k}\right)\\
&+ \left(\zeta_{j+1,k-1} + \zeta_{j,k}\right)\left(\psi_{j+1,k} - \psi_{j,k-1}\right)
\end{aligned}
\tag{10.72}
$$

Note that the terms in $\zeta_{j,k}$ cancel out, so the expression for $J_A$ can use $\pm\zeta_{j,k}$ or no value at all. An important property of the Arakawa Jacobian is that it inhibits the pile-up of energy at small scales, a consequence of conserving both enstrophy and energy. Since both quantities are conserved, so is their ratio, which can be used to define an average wavenumber $\kappa$:

$$\kappa^2 = \frac{\int_A \left(\nabla^2\psi\right)^2 dA}{\int_A |\nabla\psi|^2\,dA}. \tag{10.73}$$

Figure 10.4: Configuration of unknowns on an Arakawa C-grid. The C-grid velocity points $u_{i+\frac12,j}$ and $v_{i,j+\frac12}$ are located a distance $d/2$ to the right and top, respectively, of the pressure point $\eta_{i,j}$.

For the case of a periodic problem, the relationship between the above ratio and wavenumbers can be easily demonstrated by expanding the streamfunction and vorticity in terms
of the Fourier components:

$$\psi = \sum_m\sum_n \hat{\psi}_{m,n}\,e^{imx}e^{iny} \tag{10.74}$$

$$\zeta = -\sum_m\sum_n \left(m^2 + n^2\right)\hat{\psi}_{m,n}\,e^{imx}e^{iny} \tag{10.75}$$

where $\hat{\psi}_{m,n}$ are the complex Fourier coefficients and the computational domain has been mapped into the square domain $[0, 2\pi]^2$. Using the orthogonality of the Fourier modes, it is easy to show that the ratio $\kappa^2$ becomes

$$\kappa^2 = \frac{\sum_m\sum_n \left(m^2 + n^2\right)^2\left|\hat{\psi}_{m,n}\right|^2}{\sum_m\sum_n \left(m^2 + n^2\right)\left|\hat{\psi}_{m,n}\right|^2} \tag{10.76}$$

The implication of equation 10.76 is that there can be no one-way cascade of energy in wavenumber space; if some "local" cascading takes place from one part of the spectrum to another, there must be a compensating shift of energy in another part.

10.6 Conservation in primitive equations

The Arakawa Jacobian enforces enstrophy and energy conservation when the vorticity-streamfunction formulation is discretized. There are situations (in the presence of islands, for example) where the vorticity-streamfunction formulation is not appropriate and one must revert to discretizing the primitive equations (momentum and volume conservation). The question becomes: what is the appropriate FD discretization of the nonlinear momentum advection that can guarantee conservation of kinetic energy and enstrophy? The process of obtaining these differencing schemes is to retrace the steps that led to the derivation of the Jacobian operator in the vorticity equation. The Jacobian in the vorticity equation arises from the cross-differentiation of the nonlinear advection terms:

$$\frac{\partial(\mathbf{v}\cdot\nabla v)}{\partial x} - \frac{\partial(\mathbf{v}\cdot\nabla u)}{\partial y} = \frac{\partial}{\partial x}\left(u\frac{\partial v}{\partial x} + v\frac{\partial v}{\partial y}\right) - \frac{\partial}{\partial y}\left(u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y}\right) \tag{10.77}$$

$$= u\frac{\partial\zeta}{\partial x} + v\frac{\partial\zeta}{\partial y} + \zeta\left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y}\right), \qquad \zeta = \frac{\partial v}{\partial x} - \frac{\partial u}{\partial y} \tag{10.78}$$

In order to make use of the results obtained using the vorticity-streamfunction form, it is useful to introduce a fictitious streamfunction in the primitive variables using a staggered grid akin to the Arakawa C-grid shown in figure 10.4.
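The staggered velocities are defined with respect to the "fictitious streamfunction":

$$u = -\delta_y\psi, \qquad v = \delta_x\psi, \qquad \zeta = \delta_x v - \delta_y u \tag{10.79}$$

The energy conserving Jacobian can thus be written as:

$$J_2(\zeta,\psi) = -\delta_x\left(\overline{\overline{u}^y\,\delta_x v}^x + \overline{\overline{v}^y\,\delta_y v}^y\right) + \delta_y\left(\overline{\overline{u}^x\,\delta_x u}^x + \overline{\overline{v}^x\,\delta_y u}^y\right) \tag{10.80}$$

Comparing equations 10.80 and 10.77, we can deduce the following energy conserving momentum advection operators:

$$\mathbf{v}\cdot\nabla u = \overline{\overline{u}^x\,\delta_x u}^x + \overline{\overline{v}^x\,\delta_y u}^y, \qquad \mathbf{v}\cdot\nabla v = \overline{\overline{u}^y\,\delta_x v}^x + \overline{\overline{v}^y\,\delta_y v}^y \tag{10.81}$$

In a similar manner, the enstrophy conserving Jacobian can be rewritten as:

$$\frac{J_1 + J_2}{2} = -\delta_x\left(\overline{u}^{xy}\,\delta_x\overline{v}^x + \overline{v}^{yy}\,\delta_y\overline{v}^y\right) + \delta_y\left(\overline{u}^{xx}\,\delta_x\overline{u}^x + \overline{v}^{xy}\,\delta_y\overline{u}^y\right), \tag{10.82}$$

and we can deduce the following enstrophy-conserving operators:

$$\mathbf{v}\cdot\nabla u = \overline{u}^{xx}\,\delta_x\overline{u}^x + \overline{v}^{xy}\,\delta_y\overline{u}^y, \qquad \mathbf{v}\cdot\nabla v = \overline{u}^{xy}\,\delta_x\overline{v}^x + \overline{v}^{yy}\,\delta_y\overline{v}^y \tag{10.83}$$

If either 10.81 or 10.83 is used in the momentum equation, and if the flow is discretely divergence-free, then energy or enstrophy is conserved in the same manner as it is in the vorticity equation through $J_2$ or $(J_1 + J_2)/2$. Stated differently, only the divergent part of the velocity field is capable of creating or destroying energy or enstrophy, in perfect analogy to the behavior of the continuous equations. We would like to have an operator that conserves both energy and enstrophy, which means converting $J_3$ (and hence $J_A$) to primitive-variable form as well. This is considerably harder. We skip the derivation and show the result; see Arakawa and Lamb (1977) for details:

$$\nabla\cdot(\mathbf{v}u) = \frac{2}{3}\left[\delta_x\left(\overline{u}^{xyy}\,\overline{u}^x\right) + \delta_y\left(\overline{v}^{xyy}\,\overline{u}^y\right)\right] + \frac{1}{3}\left[\delta_{x'}\left(\overline{u'}^{y'}\,\overline{u}^{x'}\right) + \delta_{y'}\left(\overline{v'}^{y'}\,\overline{u}^{y'}\right)\right] \tag{10.84}$$

$$\nabla\cdot(\mathbf{v}v) = \frac{2}{3}\left[\delta_x\left(\overline{u}^{xxy}\,\overline{v}^x\right) + \delta_y\left(\overline{v}^{xxy}\,\overline{v}^y\right)\right] + \frac{1}{3}\left[\delta_{x'}\left(\overline{u'}^{x'}\,\overline{v}^{x'}\right) + \delta_{y'}\left(\overline{v'}^{x'}\,\overline{v}^{y'}\right)\right] \tag{10.85}$$

$$u' = -\delta_{y'}\psi = \frac{\overline{u}^x + \overline{v}^y}{\sqrt{2}} \tag{10.86}$$

$$v' = \delta_{x'}\psi = \frac{-\overline{u}^x + \overline{v}^y}{\sqrt{2}} \tag{10.87}$$

The $(x', y')$ coordinate system is rotated 45 degrees counterclockwise relative to the $(x, y)$ coordinate system; i.e. its axes lie along the diagonals of the original grid.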
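The claim that the operators 10.81 conserve kinetic energy whenever the C-grid velocity is discretely divergence-free can be checked directly. The sketch below is an added illustration under stated assumptions: a doubly periodic $n\times n$ C-grid with spacing $d$, and a storage convention of our own choosing ($\psi$[j][k] at the corner $(j+\frac12, k+\frac12)$, u[j][k] at $(j+\frac12, k)$, v[j][k] at $(j, k+\frac12)$). Velocities built from $\psi$ through 10.79 have zero discrete divergence and give a vanishing energy exchange; arbitrary (divergent) random velocities do not. Function names are hypothetical.

```python
import random

def velocities_from_psi(psi, d=1.0):
    """Eq. 10.79 on a periodic C-grid: u = -delta_y psi, v = delta_x psi.
    Assumed storage: psi[j][k] at (j+1/2, k+1/2), u[j][k] at (j+1/2, k), v[j][k] at (j, k+1/2)."""
    n = len(psi)
    u = [[-(psi[j][k] - psi[j][k - 1]) / d for k in range(n)] for j in range(n)]
    v = [[(psi[j][k] - psi[j - 1][k]) / d for k in range(n)] for j in range(n)]
    return u, v

def max_divergence(u, v, d=1.0):
    # |delta_x u + delta_y v| at the cell centres (j, k)
    n = len(u)
    return max(abs((u[j][k] - u[j - 1][k]) / d + (v[j][k] - v[j][k - 1]) / d)
               for j in range(n) for k in range(n))

def energy_exchange(u, v, d=1.0):
    """sum(u * adv_u + v * adv_v) with the operators of eq. 10.81; the advective
    contribution to d/dt of the discrete kinetic energy is minus this sum (times d^2)."""
    n = len(u)
    U = lambda j, k: u[j % n][k % n]
    V = lambda j, k: v[j % n][k % n]
    total = 0.0
    for j in range(n):
        for k in range(n):
            # x-momentum at (j+1/2, k): products at centres (jj, k) and corners (j+1/2, kk+1/2)
            cu = lambda jj: 0.5 * (U(jj, k) + U(jj - 1, k)) * (U(jj, k) - U(jj - 1, k)) / d
            qu = lambda kk: 0.5 * (V(j, kk) + V(j + 1, kk)) * (U(j, kk + 1) - U(j, kk)) / d
            adv_u = 0.5 * (cu(j) + cu(j + 1)) + 0.5 * (qu(k) + qu(k - 1))
            # y-momentum at (j, k+1/2): products at corners (jj+1/2, k+1/2) and centres (j, kk)
            pv = lambda jj: 0.5 * (U(jj, k) + U(jj, k + 1)) * (V(jj + 1, k) - V(jj, k)) / d
            cv = lambda kk: 0.5 * (V(j, kk) + V(j, kk - 1)) * (V(j, kk) - V(j, kk - 1)) / d
            adv_v = 0.5 * (pv(j) + pv(j - 1)) + 0.5 * (cv(k) + cv(k + 1))
            total += U(j, k) * adv_u + V(j, k) * adv_v
    return total

rng = random.Random(2)
n = 8
psi = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
u, v = velocities_from_psi(psi)
print("max |div| (from psi):       ", max_divergence(u, v))
print("energy exchange, div-free:  ", energy_exchange(u, v))
u2 = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
v2 = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
print("energy exchange, divergent: ", energy_exchange(u2, v2))
```

Summation by parts shows that the $u$-part of the exchange equals $-\sum (u^2/2)\,\overline{(\delta_x u + \delta_y v)}^x$, and similarly for $v$, which is why only the divergent part of the velocity can create or destroy energy.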
10.7 Conservation for divergent flows

So far we have dealt mostly with advection operators which handle conservation laws appropriate for divergence-free flows. There are situations, such as flows over obstacles, where the divergent velocity plays a significant role. Under such conditions, it might be important to incorporate conservation laws appropriate to divergent flows. In the following we will consider primarily the shallow water equations in a rotating frame:

$$\frac{\partial\mathbf{v}}{\partial t} + \mathbf{v}\cdot\nabla\mathbf{v} + f\mathbf{k}\times\mathbf{v} + g\nabla\eta = 0 \tag{10.88}$$

$$\frac{\partial h}{\partial t} + \nabla\cdot(h\mathbf{v}) = 0 \tag{10.89}$$

where the fluid depth $h$ is the sum of the resting layer thickness $H$ and the surface displacement $\eta$. One of the more important conservation principles is the conservation of potential vorticity, $q$:

$$\frac{\partial q}{\partial t} + \mathbf{v}\cdot\nabla q = 0, \qquad q = \frac{\zeta + f}{h} \tag{10.90}$$

The latter is derived by rewriting the nonlinear advection of momentum in the form

$$\mathbf{v}\cdot\nabla\mathbf{v} = \nabla\frac{\mathbf{v}\cdot\mathbf{v}}{2} - \mathbf{v}\times\zeta\mathbf{k} \tag{10.91}$$

prior to taking the curl of the momentum equation 10.88 to arrive at:

$$\frac{\partial(\zeta + f)}{\partial t} + \nabla\cdot\left[(\zeta + f)\,\mathbf{v}\right] = 0. \tag{10.92}$$

The potential vorticity conservation equation is obtained after expanding equation 10.92 and using the continuity equation 10.89. Equation 10.92 is the flux form of equation 10.90, and shows that the area average of $hq$ is conserved if the domain is closed. The best configuration to solve the shallow water equations is that of the C-grid, figure 10.4; the vorticity, streamfunction, and potential vorticity are collocated. In terms of the discrete operators we have

$$\zeta = \delta_x v - \delta_y u, \qquad q = \frac{\zeta + f}{\overline{h}^{xy}} \tag{10.93}$$

The finite difference discretization of the continuity equation on the C-grid takes the form:

$$\frac{\partial h}{\partial t} + \delta_x U + \delta_y V = 0 \tag{10.94}$$

where $U = \overline{h}^x u$ and $V = \overline{h}^y v$. The purpose is to formulate FD expressions for the momentum advection terms compatible with the conservation of PV, equation 10.90.
With this in mind, we start by averaging equation 10.94 with $\overline{(\cdot)}^{xy}$, to bring it to the $q$ collocation points, and multiply the resulting equation by $q$ to obtain:

$$\frac{\partial\left(\overline{h}^{xy}q\right)}{\partial t} + \delta_x\left(\overline{q}^x\,\overline{U}^{xy}\right) + \delta_y\left(\overline{q}^y\,\overline{V}^{xy}\right) = \overline{h}^{xy}\left[\frac{\partial q}{\partial t} + \frac{1}{\overline{h}^{xy}}\left(\overline{\overline{U}^{xy}\,\delta_x q}^x + \overline{\overline{V}^{xy}\,\delta_y q}^y\right)\right] \tag{10.95}$$

Equation 10.95 is a FD expression of the identity

$$\frac{\partial (qh)}{\partial t} + \nabla\cdot(qh\mathbf{v}) = h\,\frac{Dq}{Dt} \tag{10.96}$$

If the right hand side of 10.95 is a FD expression for the PV equation 10.90, the left hand side is a FD analogue to the vorticity equation 10.92. Carrying the steps backwards we have the following component forms for $-\mathbf{v}\times(\zeta + f)\mathbf{k}$:

$$\begin{array}{lcc} & \text{Continuum} & \text{Discrete} \\ x\text{-component:} & -v(\zeta + f) & -\overline{V}^{xy}\,\overline{q}^y \\ y\text{-component:} & u(\zeta + f) & \overline{U}^{xy}\,\overline{q}^x \end{array}$$

The remaining task is to find suitable forms for $\nabla(\mathbf{v}\cdot\mathbf{v}/2)$. The choices available are either squaring the space-averaged velocity components, or averaging the squared velocity. The latter, however, leads to a straightforward FD analogue of the kinetic energy and is therefore preferred. This leads to the following PV-conserving momentum advection and Coriolis force operators:

$$\mathbf{v}\cdot\nabla u - fv = \frac{1}{2}\,\delta_x\left(\overline{u^2}^x + \overline{v^2}^y\right) - \overline{V}^{xy}\,\overline{q}^y \tag{10.97}$$

$$\mathbf{v}\cdot\nabla v + fu = \frac{1}{2}\,\delta_y\left(\overline{u^2}^x + \overline{v^2}^y\right) + \overline{U}^{xy}\,\overline{q}^x \tag{10.98}$$

It can be shown that the above operator also conserves potential enstrophy, $hq^2/2$. The derivation of schemes that conserve both PV and kinetic energy is very complex. Arakawa and Lamb (1981) and Arakawa and Hsu (1981) derived such differencing schemes. Here we quote the final result:

x 1 1 1 1 x y xx y x (10.99) δx u2 + v 2 − V q xy − δ′ x (δ′ y V )δ′ x δ′ y q + δ′ x U δ′ y q x + δ′ x U δ′ y q x 2 48 12 12 y 1′ 1 1 1′ xy y′ y ′ ′′ 2 x + v 2 y − U x q xy + (10.100) δy u δ y (δ x U )δ x δ y q − δ y V δ x q − δ′ y V δ′ x q y 2 48 12 12

where $\delta'$ is the discrete differential operator without division by grid distance.
Chapter 11 Special Advection Schemes

11.1 Introduction

This chapter deals with specialized advection schemes designed to handle problems where, in addition to consistency, stability and conservation, additional constraints on the solution must be satisfied. For example, biological or chemical concentrations must be non-negative for physical reasons; however, numerical errors are capable of generating negative values, which are simply wrong and not amenable to physical interpretation. These negative values can impact the solution adversely, particularly if there is a feedback loop that exacerbates these spurious values by increasing their unphysical magnitude. An example of a feedback loop is a reaction term, valid only for positive values, that leads to moderate growth or decay of the quantity in question, whereas negative values lead to unstable exponential growth. Another example is the equation of state in ocean models, which intimately ties salt, temperature, and density. This equation is empirical in nature and is valid for specific ranges of temperature, salt, and density; the results of out-of-range inputs to this equation are unpredictable and lead quickly to instabilities in the simulation.

The primary culprit in these numerical artifacts is the advection operator, as it is the primary means by which tracers are moved around in a fluid environment. Molecular diffusion is usually too weak to account for much of the transport, and what passes for turbulent diffusion has its roots in "vigorous" advection in straining flow fields. Advection transports a tracer from one place to another without change of shape, and as such preserves the original extrema (maxima and minima) of the field for long times (in the absence of other physical mechanisms). Problems occur when the gradients are too steep to be resolved by the underlying computational grid.
Examples include true discontinuities, such as shock waves or tidal bores, or pseudo-discontinuities such as temperature or salt fronts that are too narrow to be resolved on the grid (a few hundred meters wide, whereas the computational grid spacing is of the order of kilometers). A number of special advection schemes were devised to address some or all of these issues. They are known generically as Total Variation Diminishing (TVD) schemes. They occupy a prominent place in the study and numerical solution of hyperbolic equations like the Euler equations of gas dynamics or the shallow water equations. Here we confine ourselves to the pure advection equation, a scalar hyperbolic equation.

11.2 Monotone Schemes

The property of the pure advection operator to preserve the original extrema of the advected field is referred to as the monotonicity property. Consider a monotone discrete initial condition, $T_j^0 \ge T_{j+1}^0$ for all $j$; a scheme is called monotone if

$$T_j^n \ge T_{j+1}^n \tag{11.1}$$

for all $j$ and $n$. A general advection scheme can be written in the form:

$$T_j^{n+1} = \sum_{k=-p}^{q} \alpha_k\,T_{j+k}^n \tag{11.2}$$

where the $\alpha_k$ are coefficients that depend on the specific scheme used. A linear scheme is one where the coefficients $\alpha_k$ are independent of the solution $T_j$. For a scheme to be monotone with respect to the $T_k^n$, we need the condition

$$\frac{\partial T_j^{n+1}}{\partial T_{j+k}^n} \ge 0 \tag{11.3}$$

Godunov has shown that a linear monotone scheme can be at most first order accurate; the first order upstream (donor cell) scheme is the classic example. Linear schemes of higher order are not monotone and will permit spurious extrema to be generated. High-order schemes must therefore be nonlinear in order to preserve monotonicity.

11.3 Flux Corrected Transport (FCT)

The FCT algorithm was originally proposed by Boris and Book (1973, 1975, 1976) and later modified and generalized to multiple dimensions by Zalesak (1979). Here we present the Zalesak version as it is the more common and more flexible one.
We will first consider the scheme in one dimension before we consider its two-dimensional extension.

11.3.1 One-Dimensional

Consider the advection of a tracer in one dimension written in conservation form:

$$T_t + (uT)_x = 0 \tag{11.4}$$

subject to appropriate initial and boundary conditions. The spatially integrated form of this equation leads to the following:

$$\int_{x_{j-\frac12}}^{x_{j+\frac12}} T_t\,dx + f\big|_{x_{j+\frac12}} - f\big|_{x_{j-\frac12}} = 0 \tag{11.5}$$

where $f|_{x_{j+\frac12}} = [uT]_{x_{j+\frac12}}$ is the flux out of cell $j$ through its right edge. This equation is nothing but the restatement of the partial differential equation as the rate at which the budget of $T$ in cell $j$ increases according to the advective fluxes in and out of the cell. As a matter of fact, the above equation can be reinterpreted as a finite volume method if the integral is replaced by $\frac{\partial\overline{T}_j}{\partial t}\Delta x$, where $\overline{T}_j$ refers to the average of $T$ in cell $j$ whose size is $\Delta x$. We now have:

$$\frac{\partial\overline{T}_j}{\partial t} + \frac{f_{j+\frac12} - f_{j-\frac12}}{\Delta x} = 0 \tag{11.6}$$

If the analytical flux is now replaced by a numerical flux, $F$, we can generate a family of discrete schemes. If we choose an upstream biased scheme where the value within each cell is considered constant, i.e. $F_{j+\frac12} = u_{j+\frac12}T_j$ for $u_{j+\frac12} > 0$ and $F_{j+\frac12} = u_{j+\frac12}T_{j+1}$ for $u_{j+\frac12} < 0$, we get the donor cell scheme. Note that the two cases above can be rewritten (and programmed) as:

$$F_{j+\frac12} = \frac{u_{j+\frac12} + |u_{j+\frac12}|}{2}\,T_j + \frac{u_{j+\frac12} - |u_{j+\frac12}|}{2}\,T_{j+1} \tag{11.7}$$

The scheme will be monotone if it is advanced in time stably with a forward Euler method, i.e. with a Courant number $|u|\Delta t/\Delta x \le 1$. If, on the other hand, we choose to approximate $T$ at the cell edge as the average of the two cells,

$$F_{j+\frac12} = u_{j+\frac12}\,\frac{T_j + T_{j+1}}{2}, \tag{11.8}$$

we obtain the second order centered in space scheme. Presumably the second order scheme will provide a more accurate solution in those regions where the advected profile is smooth, whereas it will create spurious oscillations in regions where the solution is "rough".
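The two numerical fluxes can be compared on a single forward-Euler step of 11.6. The sketch below is an added illustration rather than the notes' code; it assumes a constant velocity, a periodic grid and a square-wave initial condition, and the function names are hypothetical. A single step is shown because forward Euler with centered differences is unstable over many steps (see Section 11.3.5).

```python
def donor_cell_flux(u_face, t_left, t_right):
    # eq. 11.7: picks the upstream cell value according to the sign of u
    return 0.5 * (u_face + abs(u_face)) * t_left + 0.5 * (u_face - abs(u_face)) * t_right

def centered_flux(u_face, t_left, t_right):
    # eq. 11.8: second-order average of the two neighbouring cells
    return u_face * 0.5 * (t_left + t_right)

def euler_step(T, u, dt, dx, flux):
    """One forward-Euler step of eq. 11.6 on a periodic grid with a constant velocity u.
    F[j] holds F_{j+1/2}; F[j - 1] wraps around to the last face when j = 0."""
    n = len(T)
    F = [flux(u, T[j], T[(j + 1) % n]) for j in range(n)]
    return [T[j] - dt / dx * (F[j] - F[j - 1]) for j in range(n)]

T0 = [0.0] * 5 + [1.0] * 5 + [0.0] * 5        # a square wave, bounded by 0 and 1
Td = euler_step(T0, 1.0, 0.5, 1.0, donor_cell_flux)
Tc = euler_step(T0, 1.0, 0.5, 1.0, centered_flux)
print(min(Td), max(Td))   # 0.0 1.0
print(min(Tc), max(Tc))   # -0.25 1.25
```

With a Courant number of 0.5 the donor-cell update is a convex average of neighboring values and stays within the initial bounds, while the centered flux immediately creates an undershoot and an overshoot at the two edges of the wave; both updates conserve $\sum_j T_j$.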
The idea behind the flux corrected transport algorithm is to use a combination of the higher order flux and the lower order flux to prevent the generation of new extrema. The algorithm can be summarized as follows:

1. Compute the low order fluxes $F^L_{j+\frac12}$.
2. Compute the high order fluxes $F^H_{j+\frac12}$, e.g. by second order (or higher) interpolation of $T$ to the cell edges.
3. Define the anti-diffusive flux $A_{j+\frac12} = F^H_{j+\frac12} - F^L_{j+\frac12}$. This flux is dubbed anti-diffusive because the higher order fluxes attempt to correct the overly diffusive effects of the low order fluxes.
4. Update the solution using the low order fluxes to obtain a first order, diffused but monotonic approximation:
$$T_j^d = T_j^n - \frac{F^L_{j+\frac12} - F^L_{j-\frac12}}{\Delta x}\,\Delta t \tag{11.9}$$
5. Limit the anti-diffusive flux so that the corrected solution will be free of extrema not found in $T_j^n$ or $T_j^d$. The limiting is effected through a factor: $A^c_{j+\frac12} = C_{j+\frac12}A_{j+\frac12}$, where $0 \le C_{j+\frac12} \le 1$.
6. Apply the anti-diffusive flux to get the corrected solution
$$T_j^{n+1} = T_j^d - \frac{A^c_{j+\frac12} - A^c_{j-\frac12}}{\Delta x}\,\Delta t \tag{11.10}$$

Notice that for $C = 0$ the anti-diffusive fluxes are not applied, and we end up with $T_j^{n+1} = T_j^d$, while for $C = 1$ they are applied at full strength.

11.3.2 One-Dimensional Flux Correction Limiter

In order to elucidate the role of the limiter, we expand the last expression in terms of the high and low order fluxes to obtain:

$$T_j^{n+1} = T_j^n - \frac{\left[C_{j+\frac12}F^H_{j+\frac12} + \left(1 - C_{j+\frac12}\right)F^L_{j+\frac12}\right] - \left[C_{j-\frac12}F^H_{j-\frac12} + \left(1 - C_{j-\frac12}\right)F^L_{j-\frac12}\right]}{\Delta x}\,\Delta t \tag{11.11}$$

The terms in brackets can thus be interpreted as weighted averages of the low and high order fluxes, and the weights depend on the local smoothness of the solution. Thus for a rough neighborhood we should choose $C \to 0$ to avoid oscillations, while for a smooth neighborhood $C = 1$ to improve accuracy. As one can imagine, the power and versatility of FCT lie in the algorithm that prescribes the limiter. Here we prescribe the Zalesak limiter:

1. Optional step designed to eliminate correction near extrema. Set $A_{j+\frac12} = 0$ if:
$$A_{j+\frac12}\left(T^d_{j+1} - T^d_j\right) < 0 \quad\text{and}\quad\left[A_{j+\frac12}\left(T^d_{j+2} - T^d_{j+1}\right) < 0 \;\;\text{or}\;\; A_{j+\frac12}\left(T^d_j - T^d_{j-1}\right) < 0\right] \tag{11.12}$$
2. Evaluate the range of permissible values for $T_j^{n+1}$:
$$T_j^{\max} = \max\left(T^n_{j-1}, T^n_j, T^n_{j+1}, T^d_{j-1}, T^d_j, T^d_{j+1}\right), \qquad T_j^{\min} = \min\left(T^n_{j-1}, T^n_j, T^n_{j+1}, T^d_{j-1}, T^d_j, T^d_{j+1}\right) \tag{11.13}$$
3. Compute the anti-diffusive fluxes $P_j^+$ going into cell $j$:
$$P_j^+ = \max\left(0, A_{j-\frac12}\right) - \min\left(0, A_{j+\frac12}\right) \tag{11.14}$$
These incoming fluxes will increase $T_j^{n+1}$.
4. Compute the maximum permissible incoming flux that will keep $T_j^{n+1} \le T_j^{\max}$. From the corrective step in equation 11.10 this is given by
$$Q_j^+ = \left(T_j^{\max} - T_j^d\right)\frac{\Delta x}{\Delta t} \tag{11.15}$$
5. Compute the limiter required so that the extrema in cell $j$ are respected:
$$R_j^+ = \begin{cases}\min\left(1, Q_j^+/P_j^+\right) & \text{if } P_j^+ > 0\\ 0 & \text{if } P_j^+ = 0\end{cases} \tag{11.16}$$
6. Steps 3, 4 and 5 must be repeated so as to ensure that the lower bound on the solution, $T_j^{\min} \le T_j^{n+1}$, is respected. So now we define the anti-diffusive fluxes away from cell $j$:
$$P_j^- = \max\left(0, A_{j+\frac12}\right) - \min\left(0, A_{j-\frac12}\right) \tag{11.17}$$
$$Q_j^- = \left(T_j^d - T_j^{\min}\right)\frac{\Delta x}{\Delta t} \tag{11.18}$$
$$R_j^- = \begin{cases}\min\left(1, Q_j^-/P_j^-\right) & \text{if } P_j^- > 0\\ 0 & \text{if } P_j^- = 0\end{cases} \tag{11.19}$$
7. We now choose the limiting factors so as to enforce the extrema constraints simultaneously on adjacent cells:
$$C_{j+\frac12} = \begin{cases}\min\left(R^+_{j+1}, R^-_j\right) & \text{if } A_{j+\frac12} > 0\\ \min\left(R^+_j, R^-_{j+1}\right) & \text{if } A_{j+\frac12} < 0\end{cases} \tag{11.20}$$

11.3.3 Properties of FCT

Figure 11.1 shows the advection of a function using a centered differencing formula of order 8 with 50, 100, 200, 400 and 2000 points around a periodic domain of length 20. The red and blue curves show the analytical and the numerical solution, respectively. The time-stepping is based on an RK3 scheme with a fixed time step of $\Delta t = 10^{-3}$.
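The one-dimensional FCT algorithm of Sections 11.3.1 and 11.3.2 can be sketched in a few dozen lines. The version below is an illustration under stated assumptions, not the code used for figures 11.1–11.3: constant velocity on a periodic grid, the donor-cell flux 11.7 as the low order flux, the centered flux 11.8 as the high order flux, forward Euler for both (Section 11.3.5 discusses better time-differencing), and the optional step 11.12 omitted. Function and variable names are hypothetical.

```python
def fct_step(T, u, dt, dx):
    """One FCT step (Zalesak limiter) for T_t + (uT)_x = 0 on a periodic grid.
    Face j+1/2 is stored at index j, so index j - 1 is face j-1/2 (wrapping at j = 0)."""
    n = len(T)
    nb = lambda j: (j + 1) % n                                     # index of cell j+1
    FL = [0.5 * (u + abs(u)) * T[j] + 0.5 * (u - abs(u)) * T[nb(j)] for j in range(n)]  # 11.7
    FH = [u * 0.5 * (T[j] + T[nb(j)]) for j in range(n)]                                # 11.8
    A = [FH[j] - FL[j] for j in range(n)]                          # anti-diffusive flux
    Td = [T[j] - dt / dx * (FL[j] - FL[j - 1]) for j in range(n)]  # 11.9
    Tmax = [max(T[j - 1], T[j], T[nb(j)], Td[j - 1], Td[j], Td[nb(j)]) for j in range(n)]  # 11.13
    Tmin = [min(T[j - 1], T[j], T[nb(j)], Td[j - 1], Td[j], Td[nb(j)]) for j in range(n)]
    Rp, Rm = [0.0] * n, [0.0] * n
    for j in range(n):
        Pp = max(0.0, A[j - 1]) - min(0.0, A[j])                   # 11.14: into cell j
        Pm = max(0.0, A[j]) - min(0.0, A[j - 1])                   # 11.17: out of cell j
        Qp = (Tmax[j] - Td[j]) * dx / dt                           # 11.15
        Qm = (Td[j] - Tmin[j]) * dx / dt                           # 11.18
        Rp[j] = min(1.0, Qp / Pp) if Pp > 0 else 0.0               # 11.16
        Rm[j] = min(1.0, Qm / Pm) if Pm > 0 else 0.0               # 11.19
    # 11.20: a positive A_{j+1/2} moves T from cell j into cell j+1
    C = [min(Rp[nb(j)], Rm[j]) if A[j] > 0 else min(Rp[j], Rm[nb(j)]) for j in range(n)]
    Ac = [C[j] * A[j] for j in range(n)]
    return [Td[j] - dt / dx * (Ac[j] - Ac[j - 1]) for j in range(n)]                     # 11.10

T = [0.0] * 10 + [1.0] * 10 + [0.0] * 10      # square wave on a periodic grid of 30 cells
for _ in range(25):
    T = fct_step(T, u=1.0, dt=0.4, dx=1.0)
print(f"min = {min(T):.3f}, max = {max(T):.3f}, total = {sum(T):.3f}")
```

By construction the limited update cannot push $T_j^{n+1}$ outside $[T_j^{\min}, T_j^{\max}]$, and because the corrected fluxes are applied in flux form the total $\sum_j T_j$ is conserved; swapping in a higher order flux only requires changing the line that builds FH.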
The function is composed of multiple shapes to illustrate the strengths and weaknesses of different numerical discretizations on various solution profiles with different levels of discontinuity: square and triangular waves, a truncated inverted parabola with a smooth maximum, and an infinitely smooth narrow Gaussian. The discontinuities in the profile are challenging for high-order methods, and this is reflected in the solutions shown in figure 11.1, where Gibbs oscillations pollute the entire solution at all resolutions. One can anticipate that in the limit of infinite resolution the amplitude of these Gibbs oscillations will reach a finite limit independent of the grid size. The top solution with 50 points is severely under-resolved (even the analytical solution does not look smooth since it is drawn on the numerical grid). The situation improves dramatically with increased resolution for the smooth profiles, where the smooth peaks are well represented using $\Delta x = 20/200 = 0.1$. The Gibbs oscillations seen riding on the smooth peaks are caused by the spurious dispersion of grid scale noise. The resolution increase does not pay off for the square wave, where the numerical solution exhibits noise at all resolutions.

Figure 11.1: Uncorrected centered difference of 8th order with RK3, $\Delta t = 10^{-3}$. Panels from top to bottom: $M = 50, 100, 200, 400$ and $2000$ points; $x$ from $-10$ to $10$.

In contrast to the high-order solution, figure 11.2 shows the results of the same computations using a donor-cell scheme to compute the fluxes. As anticipated from our theoretical discussion of the donor-cell scheme, the errors are primarily dissipative. At coarse resolution the signals in the solution have almost disappeared and the solution has been smeared excessively. Even at the highest resolution used here, the smooth peaks have lost much of their amplitude, while the discontinuities (of various strengths) have also been smeared. On the upside, the donor-cell scheme has delivered solutions that are oscillation-free and that respect the TVD properties of the continuous PDE.

Figure 11.2: Upwind 1st order scheme (donor-cell) with RK1, $\Delta t = 10^{-3}$. Panels from top to bottom: $M = 50, 100, 200, 400$ and $2000$ points; $x$ from $-10$ to $10$.

Figure 11.3: FCT-centered difference scheme of 8th order with RK3, $\Delta t = 10^{-3}$. Panels from top to bottom: $M = 50, 100, 200, 400$ and $2000$ points; $x$ from $-10$ to $10$.

The FCT approach, as well as other limiting methods, aims to achieve a happy medium between diffusive but oscillation-free and high-order but oscillatory. Figure 11.3 illustrates FCT's benefits (and drawbacks). First, the oscillations have been eliminated at all resolutions. Second, at the coarsest resolution, there is a significant loss of peak amplitude accompanied by the so-called terracing effect, where the limiter tends to flatten smooth peaks and to introduce intermediate "steps" in the solution where previously there were none. This effect continues well into the well-resolved regime of $\Delta x = 20/400$, where the smooth Gaussian peak has still not recovered its full amplitude and is experiencing a continuing terracing effect.
This contrasts with the uncorrected solution, where the smooth peak was recovered with a more modest resolution of $\Delta x = 20/200$. This is a testament to an overactive FCT limiter that cannot distinguish between smooth peaks and discontinuities. As a result, smooth peaks that ought not to be limited are flattened out. It is possible to mitigate an overzealous limiter by appending a "discriminator" that can turn off the limiting near smooth extrema. In general the discriminator is built upon checking whether the second derivative of the solution is one-signed in neighborhoods where the solution's slope changes sign. Furthermore, the terracing effect can be eliminated by introducing a small amount of scale-selective dissipative flux (essentially a hyperviscosity). For further details see Shchepetkin and McWilliams (1998) and Zalesak (2005).

The FCT procedure has advantages and disadvantages. Its primary benefit is that it is a practical procedure to prevent the generation of spurious extrema. It is also flexible in defining the high order fluxes and the extrema of the fields. Most importantly, the algorithm can be extended to multiple dimensions in a relatively straightforward manner. The disadvantage is that the procedure is costly in CPU time compared to unlimited methods, but hey, nothing comes free.

11.3.4 Two-Dimensional FCT

In two dimensions the advection equation takes the form $T_t + f_x + g_y = 0$, where $f = uT$ and $g = vT$. The FCT algorithm takes the same form as before after taking account of the extra flux $G$ and the extra spatial dimension. We thus have the low order solution:

$$T^d_{j,k} = T^n_{j,k} - \Delta t\,\frac{F^L_{j+\frac12,k} - F^L_{j-\frac12,k}}{\Delta x} - \Delta t\,\frac{G^L_{j,k+\frac12} - G^L_{j,k-\frac12}}{\Delta y} \tag{11.21}$$

and the following anti-diffusive fluxes:

$$A_{j+\frac12,k} = F^H_{j+\frac12,k} - F^L_{j+\frac12,k}, \qquad A_{j,k+\frac12} = G^H_{j,k+\frac12} - G^L_{j,k+\frac12} \tag{11.22}$$

The extrema of the solution are defined in two passes over the data. First we set $T^{\max}_{j,k} = \max\left(T^n_{j,k}, T^d_{j,k}\right)$ and, likewise, $T^{\min}_{j,k} = \min\left(T^n_{j,k}, T^d_{j,k}\right)$.
The final permissible values are determined by computing the extrema of the previous fields in the neighboring cells:

$$T^{max}_{j,k} = \max\left(T^{max}_{j,k},\, T^{max}_{j\pm1,k},\, T^{max}_{j,k\pm1}\right), \qquad T^{min}_{j,k} = \min\left(T^{min}_{j,k},\, T^{min}_{j\pm1,k},\, T^{min}_{j,k\pm1}\right) \qquad (11.23)$$

Finally, the incoming and outgoing anti-diffusive fluxes in cell $(j,k)$ are given by:

$$P^+_{j,k} = \max(0, A_{j-\frac12,k}) - \min(0, A_{j+\frac12,k}) + \max(0, A_{j,k-\frac12}) - \min(0, A_{j,k+\frac12}) \qquad (11.24)$$

$$P^-_{j,k} = \max(0, A_{j+\frac12,k}) - \min(0, A_{j-\frac12,k}) + \max(0, A_{j,k+\frac12}) - \min(0, A_{j,k-\frac12}) \qquad (11.25)$$

11.3.5 Time-Differencing with FCT

A last practical issue with the implementation of FCT is the choice of time-differencing for the combined low- and high-order scheme. In order to preserve the monotone property, the low-order scheme must rely on a first-order Forward Euler scheme in time. For the high-order flux it is desirable to raise the order of the time-differencing to match that of the spatial differencing, and so to improve the time accuracy at least in smooth regions. This dilemma has an additional wrinkle. Forward Euler in time is stable for the donor-cell scheme, but it is unconditionally unstable for centered spatial differences.

The dilemma can be resolved in several ways. One approach is to use a Runge-Kutta scheme and to apply the low- and high-order fluxes at each of its sub-stages. For the RK scheme to be stable with both spatial discretizations we need at least a third-order scheme. Its stability region includes a portion of the imaginary axis as well as of the left half of the complex plane. The drawback of such an approach is an increase in CPU time, since multiple evaluations of the right-hand side are required. The CPU cost is exacerbated if the FCT limiter is applied at each of the sub-steps. Another approach consists of using a multi-level method like the leap-frog trapezoidal scheme. The low-order flux is first computed from the solution at time n.
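Whatever time-differencing is chosen, the two-dimensional limiting step of section 11.3.4 is the same. The sketch below is a minimal NumPy version of it under several assumptions:

- the grid is periodic with ∆x = ∆y = h;
- the anti-diffusive fluxes A and B are stored at the right and top faces of each cell;
- the ratios Q±, R± and the face limiter C follow Zalesak's construction, which this section does not spell out and which is assumed here.

The names `fct_limit_2d` and `_nbr_extreme` are hypothetical.

```python
import numpy as np

def _nbr_extreme(op, a):
    """Apply op (np.maximum or np.minimum) over a cell and its 4 periodic neighbours."""
    out = a
    for shift, axis in ((1, 0), (-1, 0), (1, 1), (-1, 1)):
        out = op(out, np.roll(a, shift, axis=axis))
    return out

def fct_limit_2d(Tn, Td, A, B, dt, h):
    """Limit 2D anti-diffusive fluxes on a periodic grid with dx = dy = h.

    Tn, Td : old and low-order (diffused) solutions, indexed [j, k].
    A[j, k]: anti-diffusive x-flux at face (j+1/2, k); B[j, k]: y-flux at (j, k+1/2).
    Returns the limited fluxes (Cx*A, Cy*B).
    """
    # Eq. (11.23): first pass over (Tn, Td), second pass over the neighbours.
    Tmax = _nbr_extreme(np.maximum, np.maximum(Tn, Td))
    Tmin = _nbr_extreme(np.minimum, np.minimum(Tn, Td))
    Am = np.roll(A, 1, axis=0)          # A at (j-1/2, k)
    Bm = np.roll(B, 1, axis=1)          # B at (j, k-1/2)
    # Eqs. (11.24)-(11.25): total incoming and outgoing anti-diffusive flux.
    Pp = np.maximum(0, Am) - np.minimum(0, A) + np.maximum(0, Bm) - np.minimum(0, B)
    Pm = np.maximum(0, A) - np.minimum(0, Am) + np.maximum(0, B) - np.minimum(0, Bm)
    # Zalesak-type ratios (assumed): allowed change, in flux units, over total flux.
    Qp = (Tmax - Td) * h / dt
    Qm = (Td - Tmin) * h / dt
    Rp = np.where(Pp > 0, np.minimum(1.0, Qp / np.where(Pp > 0, Pp, 1.0)), 0.0)
    Rm = np.where(Pm > 0, np.minimum(1.0, Qm / np.where(Pm > 0, Pm, 1.0)), 0.0)
    # A positive face flux feeds the downstream cell and drains the upstream one.
    Cx = np.where(A >= 0, np.minimum(np.roll(Rp, -1, axis=0), Rm),
                  np.minimum(Rp, np.roll(Rm, -1, axis=0)))
    Cy = np.where(B >= 0, np.minimum(np.roll(Rp, -1, axis=1), Rm),
                  np.minimum(Rp, np.roll(Rm, -1, axis=1)))
    return Cx * A, Cy * B

# Usage with random data: the limited update must stay within the old bounds.
rng = np.random.default_rng(0)
Tn = rng.random((16, 16))
Td = Tn.copy()                          # pretend the low-order step left Tn unchanged
A, B = rng.normal(size=(16, 16)), rng.normal(size=(16, 16))
dt, h = 0.01, 0.1
FA, FB = fct_limit_2d(Tn, Td, A, B, dt, h)
Tnew = Td - dt / h * (FA - np.roll(FA, 1, axis=0) + FB - np.roll(FB, 1, axis=1))
```

Every flux entering a cell is scaled by at most that cell's R+, and every flux leaving it by at most its R−. The update therefore cannot leave the bounds set by (11.23), which is what the random-flux usage exercises.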
The high-order fluxes are obtained with a traditional leap-frog trapezoidal step that does not involve the low-order solution (i.e. no limiter is applied). First the leap-frog step is applied:

$$\hat T^{n+1}_j = T^{n-1}_j - \frac{2\Delta t}{\Delta x}\left[F^H_{j+\frac12}(T^n) - F^H_{j-\frac12}(T^n)\right] \qquad (11.26)$$

A second-order estimate of the function values at the mid-time level is then calculated:

$$T^{n+\frac12} = \frac{T^n + \hat T^{n+1}}{2} \qquad (11.27)$$

and used to compute the high-order fluxes

$$F^H_{j+\frac12} = F^H_{j+\frac12}\left(T^{n+\frac12}\right) \qquad (11.28)$$

It is these last high-order fluxes that are limited using the FCT algorithm.

11.4 Slope/Flux Limiter Methods

The slope/flux limiter class of methods uses an approach similar to the flux correction method, in that a low-order and a high-order flux are used to eliminate spurious oscillations. The slope-limiter schemes, however, do not compute the temporary diffused value. Instead, the limiting is applied directly to the flux based on the values of the solution's gradients. The final flux used in the scheme takes the form:

$$F_{j+\frac12} = F^L_{j+\frac12} + C_{j+\frac12}\left(F^H_{j+\frac12} - F^L_{j+\frac12}\right) \qquad (11.29)$$

where the limiting factor $C = C(r)$ is a function of the slope ratio in neighboring cells:

$$r_{j+\frac12} = \frac{T_j - T_{j-1}}{T_{j+1} - T_j} \qquad (11.30)$$

For a slowly varying smooth function the slope ratio is close to 1. Near an extremum the slopes change sign and the ratio is negative. A family of methods can be generated based entirely on the choice of limiting function; here we list a few of the possible choices:

MinMod: $C(r) = \max(0, \min(1, r))$
Superbee: $C(r) = \max(0, \min(1, 2r), \min(2, r))$
Van Leer: $C(r) = \dfrac{r + |r|}{1 + |r|}$
MC: $C(r) = \max\left(0, \min\left(2r, \frac{1+r}{2}, 2\right)\right)$   (11.31)

[Figure 11.4: Graph of the different limiters (MC, Superbee, Van Leer, minmod) as a function of the slope ratio r, for 0 ≤ r ≤ 3.]

The graph of these limiters is shown in figure 11.4. The different functions have a number of common features. All limiters set C = 0 near extrema (r ≤ 0).
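Before the remaining common features are listed, note that the limiters (11.31) and the blended flux (11.29) are one-liners. The NumPy sketch below evaluates them so that the properties discussed in the text can be checked directly: C = 0 for r ≤ 0, C(1) = 1, the large-r asymptotes, and the relative leniency of the limiters. The function names are illustrative, and `limited_flux` is a hypothetical helper.

```python
import numpy as np

def minmod(r):
    return np.maximum(0.0, np.minimum(1.0, r))

def superbee(r):
    return np.maximum(0.0, np.maximum(np.minimum(1.0, 2.0 * r), np.minimum(2.0, r)))

def van_leer(r):
    return (r + np.abs(r)) / (1.0 + np.abs(r))

def mc(r):
    return np.maximum(0.0, np.minimum(np.minimum(2.0 * r, 0.5 * (1.0 + r)), 2.0))

def limited_flux(FL, FH, r, limiter=minmod):
    """Eq. (11.29): blend the low- and high-order face fluxes with C(r)."""
    return FL + limiter(r) * (FH - FL)

# Sample the limiters on a range of slope ratios, including negative ones.
r = np.linspace(-2.0, 50.0, 2001)
for name, C in [("minmod", minmod), ("superbee", superbee), ("van Leer", van_leer), ("MC", mc)]:
    print(f"{name:9s} C(1) = {float(C(1.0)):.2f}   C(50) = {float(C(50.0)):.3f}")
```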
They all asymptote to 2 when the function changes rapidly (r ≫ 1), save for the minmod limiter, which asymptotes to 1. The minmod limiter is the most stringent of the limiters and prevents the solution gradient from changing quickly between neighboring cells; this limiter is known to be diffusive. The other limiters are more lenient, with the superbee limiter being the most lenient. They permit the gradient in a cell to be up to twice as large as the one in its neighbor. The Van Leer limiter is the smoothest of the limiters and asymptotes to 2 for r → ∞.

11.5 MPDATA

The Multidimensional Positive Definite Advection Transport Algorithm (MPDATA) was presented by Smolarkiewicz (1983) as an algorithm to preserve the positivity of the field throughout the simulation. The motivation behind his work is that chemical tracers must remain positive. Non-oscillatory schemes like FCT are positive definite but are deemed too expensive, particularly since oscillations are tolerable as long as they do not involve negative values. MPDATA is built on the monotone donor-cell scheme and on its modified equation. The latter is used to determine the diffusive errors in the scheme and to correct for them near the zero values of the field. The scheme is presented here in its one-dimensional form for simplicity. The modified equation for the donor-cell scheme, where the fluxes are defined as in equation 11.7, is:

$$\frac{\partial T}{\partial t} + \frac{\partial uT}{\partial x} = \frac{\partial}{\partial x}\left(\kappa\frac{\partial T}{\partial x}\right) + O(\Delta x^2) \qquad (11.32)$$

where κ is the numerical diffusion generated by the donor-cell scheme:

$$\kappa = \frac{|u|\Delta x - u^2\Delta t}{2} \qquad (11.33)$$

The donor-cell scheme produces a first estimate of the field, which is guaranteed to be non-negative if the initial field is non-negative. This estimate, however, is too diffused, and must be corrected to eliminate these first-order errors.
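The correction described next (equations 11.34-11.37) is itself a donor-cell step, so the whole scheme fits in a few lines. The sketch below is a minimal 1D version under these assumptions:

- the grid is periodic, with face velocities stored at j+1/2;
- only a single corrective iteration is applied.

The function names and the narrow-Gaussian test are illustrative.

```python
import numpy as np

def upwind_flux(T, U):
    """Donor-cell flux at faces j+1/2 for face velocities U of either sign (periodic)."""
    return 0.5 * (U + np.abs(U)) * T + 0.5 * (U - np.abs(U)) * np.roll(T, -1)

def donor_cell(T, U, dt, dx):
    """One Forward-Euler donor-cell step in flux form."""
    F = upwind_flux(T, U)
    return T - dt / dx * (F - np.roll(F, 1))

def mpdata_step(T, U, dt, dx, eps=1e-15):
    """Donor-cell step followed by one anti-diffusive donor-cell correction."""
    Td = donor_cell(T, U, dt, dx)                 # first, diffused estimate T^d
    Tp = np.roll(Td, -1)
    # Anti-diffusion velocity at j+1/2, eq. (11.35).
    Ut = (np.abs(U) * dx - U**2 * dt) * (Tp - Td) / ((Tp + Td + eps) * dx)
    return donor_cell(Td, Ut, dt, dx)             # eqs. (11.36)-(11.37)

# Usage: advect a narrow Gaussian once around the unit periodic domain.
M = 100
dx = 1.0 / M
x = dx * (np.arange(M) + 0.5)
T0 = np.exp(-((x - 0.5) / 0.05) ** 2)
U = np.ones(M)                    # uniform face velocity u = 1
dt = 0.5 * dx                     # Courant number 0.5, so kappa = (dx - dt) / 2
T_dc, T_mp = T0.copy(), T0.copy()
for _ in range(2 * M):            # 2M steps of dt = dx/2 cover one period
    T_dc = donor_cell(T_dc, U, dt, dx)
    T_mp = mpdata_step(T_mp, U, dt, dx)
print(f"peak: donor cell {T_dc.max():.3f}, MPDATA {T_mp.max():.3f}")
```

For a non-negative field, |ũ|∆t/∆x ≤ C(1 − C) ≤ 1/4, where C is the Courant number. The corrective donor-cell step is therefore stable whenever the first one is, in line with the claim made below.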
MPDATA achieves the correction by casting the second-order derivative in the modified equation 11.32 as another transport step with a pseudo-velocity $\tilde u$:

$$\frac{\partial T}{\partial t} = -\frac{\partial \tilde u T}{\partial x}, \qquad \tilde u = \begin{cases} \dfrac{\kappa}{T}\dfrac{\partial T}{\partial x}, & T > 0 \\ 0, & T = 0 \end{cases} \qquad (11.34)$$

and re-using the donor-cell scheme to discretize it. The velocity $\tilde u$ plays the role of an anti-diffusion velocity that tries to compensate for the diffusive error of the first step. The correction step takes the form:

$$\tilde u_{j+\frac12} = \frac{\left(|u_{j+\frac12}|\Delta x - u^2_{j+\frac12}\Delta t\right)\left(T^d_{j+1} - T^d_j\right)}{\left(T^d_{j+1} + T^d_j + \epsilon\right)\Delta x} \qquad (11.35)$$

$$\tilde F_{j+\frac12} = \frac{\tilde u_{j+\frac12} + |\tilde u_{j+\frac12}|}{2}\,T^d_j + \frac{\tilde u_{j+\frac12} - |\tilde u_{j+\frac12}|}{2}\,T^d_{j+1} \qquad (11.36)$$

$$T^{n+1}_j = T^d_j - \frac{\Delta t}{\Delta x}\left(\tilde F_{j+\frac12} - \tilde F_{j-\frac12}\right) \qquad (11.37)$$

where $T^d_j$ is the diffused solution from the donor-cell step. The parameter ε is a small positive number, e.g. $10^{-15}$, meant to prevent the denominator from vanishing when $T^d_j = T^d_{j+1} = 0$. The second donor-cell step is stable provided the original one is, and hence the correction does not penalize the stability of the scheme. The procedure to derive the two-dimensional version of the scheme is similar; the major difficulty is in deriving the modified equation and the corresponding anti-diffusion velocity. It turns out that the x-component of the anti-diffusion velocity remains the same, while the y-component takes a similar form with u replaced by v and ∆x by ∆y.

11.6 WENO schemes in vertical

We explore the application of the WENO methodology to computations in the vertical. The WENO methodology is based on two ingredients. First, function values are reconstructed from cell averages using different stencils. Second, the different estimates are combined so as to maximize accuracy while minimizing the impact of non-smooth stencils. We briefly describe the steps of a WENO calculation below; the details can be found in Shu (1998) and Jiang and Shu (1996). Note that the diffusive term requires the calculation of the derivative of the function.
This can also be done accurately with a WENO scheme after the reconstruction of T; the caveat for high-order accuracy is that the grid spacing ∆z must vary smoothly. We first take up the reconstruction step and dwell on differentiation later. The question is, of course, always how much we should pay for an accurate calculation of the vertical diffusion term.

11.6.1 Function reconstruction

[Figure 11.5: Sketch of the stencil S(i; k, l), spanning cells i−l, ..., i−1, i, i+1, ..., i+s, with cell i bounded by z_{i−1/2} and z_{i+1/2} and of width ∆z_i. This stencil is associated with cell i, has left shift l, and contains k = l + s + 1 cells.]

We first focus on the issue of computing function values from cell averages. We divide the vertical into a number of finite volumes, which we also refer to as cells, and define the cells, cell centers and cell sizes by:

$$I_i = \left[z_{i-\frac12},\, z_{i+\frac12}\right] \qquad (11.38)$$

$$z_i = \frac{z_{i-\frac12} + z_{i+\frac12}}{2} \qquad (11.39)$$

$$\Delta z_i = z_{i+\frac12} - z_{i-\frac12} \qquad (11.40)$$

The reconstruction problem can be stated as follows. Given the cell averages of a function T(z):

$$\overline{T}_i = \frac{1}{\Delta z_i}\int_{z_{i-\frac12}}^{z_{i+\frac12}} T(z')\,dz', \qquad i = 1, 2, \ldots, N \qquad (11.41)$$

find, for each cell i, a polynomial $p_i(z)$ of degree at most k − 1 that is a k-th order accurate approximation to the function T(z) inside $I_i$:

$$p_i(z) = T(z) + O(\Delta z^k), \qquad z \in I_i, \quad i = 1, 2, \ldots, N \qquad (11.42)$$

The polynomial $p_i(z)$ interpolates the function within cells. It also provides a discontinuous interpolation at cell boundaries, since each cell boundary is shared by two cells; we thus write:

$$T^-_{i+\frac12} = p_i(z_{i+\frac12}), \qquad T^+_{i-\frac12} = p_i(z_{i-\frac12}) \qquad (11.43)$$

Given the cell $I_i$ and the order of accuracy k, we first choose a stencil S(i; k, l). It is based on $I_i$, l cells to the left of $I_i$ and s cells to the right of $I_i$, with l + s + 1 = k. S(i) consists of the cells:

$$S(i) = \{I_{i-l}, I_{i-l+1}, \ldots, I_{i+s}\} \qquad (11.44)$$

There is a unique polynomial p(z) of degree k − 1 = l + s whose cell average in each of the cells in S(i) agrees with that of T(z):

$$\overline{T}_j = \frac{1}{\Delta z_j}\int_{z_{j-\frac12}}^{z_{j+\frac12}} p(z')\,dz', \qquad j = i-l, \ldots, i+s \qquad (11.45)$$

The polynomial in question is nothing but the derivative of the Lagrange interpolant of the primitive function $\mathcal{T}(z)$ at the cell boundaries. To see this, we look at the primitive function of T(z):

$$\mathcal{T}(z) = \int_{-\infty}^{z} T(z')\,dz' \qquad (11.46)$$

where the choice of lower integration limit is immaterial. The function $\mathcal{T}(z)$ at cell edges can be expressed in terms of the cell averages:

$$\mathcal{T}(z_{i+\frac12}) = \sum_{j=-\infty}^{i}\int_{z_{j-\frac12}}^{z_{j+\frac12}} T(z')\,dz' = \sum_{j=-\infty}^{i}\overline{T}_j\,\Delta z_j \qquad (11.47)$$

Thus, the cell averages define the primitive function at the cell boundaries. Let P(z) denote the unique polynomial of degree at most k which interpolates $\mathcal{T}$ at the k + 1 points $z_{i-l-\frac12}, \ldots, z_{i+s+\frac12}$, and let p(z) denote its derivative. It is easy to verify that:

$$\begin{aligned}
\frac{1}{\Delta z_j}\int_{z_{j-\frac12}}^{z_{j+\frac12}} p(z')\,dz' &= \frac{1}{\Delta z_j}\int_{z_{j-\frac12}}^{z_{j+\frac12}} P'(z')\,dz' && (11.48)\\
&= \frac{P(z_{j+\frac12}) - P(z_{j-\frac12})}{\Delta z_j} && (11.49)\\
&= \frac{\mathcal{T}(z_{j+\frac12}) - \mathcal{T}(z_{j-\frac12})}{\Delta z_j} && (11.50)\\
&= \frac{1}{\Delta z_j}\int_{z_{j-\frac12}}^{z_{j+\frac12}} T(z')\,dz' && (11.51)\\
&= \overline{T}_j, \qquad j = i-l, \ldots, i+s && (11.52)
\end{aligned}$$

This implies that p(z) is the desired polynomial. Standard approximation theory tells us that $P'(z) = \mathcal{T}'(z) + O(\Delta z^k)$ for $z \in I_i$, which is the accuracy requirement. The construction of the polynomial p(z) is now straightforward.
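The argument above translates directly into code: accumulate the cell averages into the primitive at the edges, interpolate, and differentiate. The sketch below relies on NumPy polynomial fitting through exactly k+1 points, so the fit is an interpolation. `reconstruct` is a hypothetical name. For large k, an explicit formula such as (11.55) below would likely be preferable to `np.polyfit`, whose Vandermonde system becomes ill-conditioned.

```python
import numpy as np

def reconstruct(edges, Tbar):
    """Return p(z) = P'(z), with P the degree-k interpolant of the primitive at the edges.

    edges: k+1 increasing edge positions of the stencil cells (any spacing)
    Tbar : the k cell averages on those cells
    """
    dz = np.diff(edges)
    prim = np.concatenate(([0.0], np.cumsum(Tbar * dz)))   # primitive at edges, eq. (11.47)
    P = np.polyfit(edges, prim, len(edges) - 1)             # k+1 points, degree k: interpolation
    return np.poly1d(np.polyder(P))                          # p(z), degree k-1

# Usage: a non-uniform 3-cell stencil; p must reproduce every cell average (11.45).
edges = np.array([0.0, 0.7, 1.5, 2.6])
Tbar = np.array([1.0, 3.0, 2.0])
p = reconstruct(edges, Tbar)
Pint = np.polyint(p)
averages = [(Pint(b) - Pint(a)) / (b - a) for a, b in zip(edges[:-1], edges[1:])]

# On a uniform grid with l = 1, evaluating p at the right edge of the middle cell
# yields the weights multiplying the cell averages (Tbar_{i-1}, Tbar_i, Tbar_{i+1}).
uniform = np.array([-1.5, -0.5, 0.5, 1.5])
weights = [reconstruct(uniform, e)(0.5) for e in np.eye(3)]
print(np.round(weights, 4))
```

The weights come out as (−1/6, 5/6, 1/3), i.e. the coefficients $C_{lj}$ of (11.54) for k = 3 and l = 1 evaluated at $z_{i+\frac12}$.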
We can start with the Lagrange interpolant on the k + 1 cell boundaries and differentiate it with respect to z to obtain:

$$p(z) = \sum_{m=0}^{k}\sum_{j=0}^{m-1}\overline{T}_{i-l+j}\,\Delta z_{i-l+j}\;\frac{\displaystyle\sum_{\substack{n=0\\n\neq m}}^{k}\;\prod_{\substack{q=0\\q\neq m,n}}^{k}\left(z - z_{i-l+q-\frac12}\right)}{\displaystyle\prod_{\substack{n=0\\n\neq m}}^{k}\left(z_{i-l+m-\frac12} - z_{i-l+n-\frac12}\right)} \qquad (11.53)$$

The order of the outer sums can be exchanged to obtain an alternative form which may be computationally more practical:

$$p(z) = \sum_{j=0}^{k-1} C_{lj}(z)\,\overline{T}_{i-l+j} \qquad (11.54)$$

where $C_{lj}(z)$ is given by:

$$C_{lj}(z) = \Delta z_{i-l+j}\sum_{m=j+1}^{k}\frac{\displaystyle\sum_{\substack{n=0\\n\neq m}}^{k}\;\prod_{\substack{q=0\\q\neq m,n}}^{k}\left(z - z_{i-l+q-\frac12}\right)}{\displaystyle\prod_{\substack{n=0\\n\neq m}}^{k}\left(z_{i-l+m-\frac12} - z_{i-l+n-\frac12}\right)} \qquad (11.55)$$

The coefficients $C_{lj}$ need not be computed at each time step if the computational grid is fixed; instead they can be precomputed and stored to save CPU time. The expression for the $C_{lj}$ simplifies (because many terms vanish) when the point z coincides with a cell edge and/or when the grid is equally spaced ($\Delta z_j = \Delta z$ for all j).

ENO reconstruction

The accuracy estimate holds only if the function is smooth inside the entire stencil S(i; k, l) used in the interpolation. If the function is not smooth, Gibbs oscillations appear. The idea behind ENO reconstruction is to vary the stencil S(i; k, l), by changing the left shift l, so as to choose a discontinuity-free stencil; this choice of S(i; k, l) is called an "adaptive stencil". A smoothness criterion is needed to choose the smoothest stencil, and ENO uses Newton divided differences: the stencil with the smoothest Newton divided difference is chosen.

ENO properties:
1. The accuracy condition is valid for any cell which does not contain a discontinuity.
2. $p_i(z)$ is monotone in any cell $I_i$ which does contain a discontinuity.
3. The reconstruction is Total Variation Bounded (TVB, as opposed to TVD). That is, there is a function Q(z) satisfying $Q(z) = p_i(z) + O(\Delta z_i^{k+1})$ for $z \in I_i$, such that $TV(Q) \le TV(T)$.

ENO disadvantages:
1. The choice of stencil is sensitive to round-off errors near the roots of the solution and its derivatives.
2. The numerical flux is not smooth, as the stencil pattern may change at neighboring points.
3. In the stencil choosing process k stencils are considered, covering 2k − 1 cells, but only one of the stencils is used. If information from all cells were used, one could potentially get (2k − 1)-th order accuracy in smooth regions.
4. ENO stencil choosing is not computationally efficient because of the repeated use of "if" structures in the code.

11.6.2 WENO reconstruction

WENO attempts to address the disadvantages of ENO. Primarily, it makes more efficient use of CPU time to gain accuracy in smooth regions without sacrificing the TVB property in the presence of discontinuities. The basic idea is to use a convex combination of all stencils used in ENO to form a better estimate of the function value. Suppose the k candidate stencils S(i; k, l), l = 0, ..., k − 1, produce the k different estimates:

$$T^{(l)}_{i+\frac12} = \sum_{j=0}^{k-1} C_{lj}\,\overline{T}_{i-l+j}, \qquad l = 0, \ldots, k-1 \qquad (11.56)$$

then the WENO estimate is

$$T_{i+\frac12} = \sum_{l=0}^{k-1}\omega_l\,T^{(l)}_{i+\frac12} \qquad (11.57)$$

where the $\omega_l$ are weights satisfying the following requirements for consistency and stability:

$$\omega_l \ge 0, \qquad \sum_{l=0}^{k-1}\omega_l = 1 \qquad (11.58)$$

Furthermore, when the solution has a discontinuity in one or more of the stencils, we would like the corresponding weights to be essentially 0, to emulate the ENO idea. The weights should also be smooth functions of the cell averages; the weights described below are in fact $C^\infty$. Shu et al. propose the following form for the weights:

$$\omega_l = \frac{\alpha_l}{\sum_{n=0}^{k-1}\alpha_n}, \qquad \alpha_l = \frac{d_l}{(\epsilon + \beta_l)^2}, \qquad l = 0, \ldots, k-1 \qquad (11.59)$$

Here, the $d_l$ are the factors needed to maximize the accuracy of the estimate, i.e. to make the truncation error $O(\Delta z^{2k-1})$. Note that the weights must be close to $d_l$ in smooth regions; specifically, we have the requirement that $\omega_l = d_l + O(\Delta z^k)$. The factor ε is introduced to avoid division by zero; a value of $\epsilon = 10^{-6}$ seems standard. Finally, the $\beta_l$ are the smoothness indicators of the stencils S(i; k, l).
The smoothness indicators $\beta_l$ are responsible for the success of WENO; they also account for much of the CPU cost increase over traditional methods. The smoothness indicators must satisfy $\beta_l = O(\Delta z^2)$ in smooth regions and $\beta_l = O(1)$ in the presence of discontinuities. This translates into $\alpha_l = O(1)$ in smooth regions and $O(\Delta z^4)$ in the presence of discontinuities. The smoothness measures advocated by Shu et al. look like weighted $H^{k-1}$ norms of the interpolating functions:

$$\beta_l = \sum_{n=1}^{k-1}\int_{z_{i-\frac12}}^{z_{i+\frac12}}\Delta z^{2n-1}\left(\frac{\partial^n p_l}{\partial z^n}\right)^2 dz \qquad (11.60)$$

The right-hand side is just the sum of the squares of the scaled $L^2$ norms of all derivatives of the polynomial $p_l$ over the interval $[z_{i-\frac12}, z_{i+\frac12}]$. The factor $\Delta z^{2n-1}$ is introduced to remove any ∆z dependence in the derivatives in order to preserve self-similarity: the smoothness indicators are the same regardless of the underlying grid. The smoothness indicators for the case k = 2 are:

$$\beta_0 = \left(\overline{T}_{i+1} - \overline{T}_i\right)^2, \qquad \beta_1 = \left(\overline{T}_i - \overline{T}_{i-1}\right)^2 \qquad (11.61)$$

Higher-order formulae can be found in Shu (1998) and Balsara and Shu (2000). The formulae given here have a one-point upwind bias in the optimal linear stencil, suitable for a problem with wind blowing from left to right. If the wind blows the other way, the procedure should be modified symmetrically with respect to $z_{i+\frac12}$.

11.6.3 ENO and WENO numerical experiments

Figure 11.6 (left panel) shows the convergence rates of ENO interpolation for the function $\sin 2\pi x$ for $-1 \le x \le \frac12$. Two sets of experiments were conducted. One sets the shift to 0, so that the interpolation is right-tilted; the other sets it to (k − 1)/2, where k is the polynomial order, so that the stencil is centered. The two sets of experiments overlap for k = 2, 3. The convergence rates for both experiments are the same, although the centered stencils yield a lower error. The WENO reconstruction (right panel) effectively doubles the convergence rates by using a convex combination of all stencils used in the reconstruction.
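For k = 2 the WENO estimate (11.57), with weights (11.59) and indicators (11.61), fits in a few lines. The sketch below assumes:

- a uniform periodic grid, with wind blowing from left to right;
- ε = 10^{-6};
- the optimal weights d_0 = 2/3 and d_1 = 1/3 (Shu, 1998).

It then couples the reconstruction to a two-stage SSP Runge-Kutta scheme, giving a WENO3-RK2 combination in the spirit of the experiments reported next. All function names are illustrative.

```python
import numpy as np

def weno3_edge(Tbar, eps=1e-6):
    """WENO (k = 2) value at z_{i+1/2} from cell averages; uniform periodic grid, wind > 0."""
    Tm, Tp = np.roll(Tbar, 1), np.roll(Tbar, -1)          # Tbar_{i-1}, Tbar_{i+1}
    t0 = 0.5 * (Tbar + Tp)                                # stencil l = 0: cells {i, i+1}
    t1 = 0.5 * (3.0 * Tbar - Tm)                          # stencil l = 1: cells {i-1, i}
    a0 = (2.0 / 3.0) / (eps + (Tp - Tbar) ** 2) ** 2      # d_0 = 2/3, beta_0 from (11.61)
    a1 = (1.0 / 3.0) / (eps + (Tbar - Tm) ** 2) ** 2      # d_1 = 1/3, beta_1 from (11.61)
    return (a0 * t0 + a1 * t1) / (a0 + a1)                # eqs. (11.57) and (11.59)

def rhs(T, u, dx):
    """Semi-discrete flux-form advection with WENO3 edge values (constant u > 0)."""
    F = u * weno3_edge(T)
    return -(F - np.roll(F, 1)) / dx

def ssp_rk2_step(T, u, dt, dx):
    """Two-stage SSP (Heun) Runge-Kutta step."""
    T1 = T + dt * rhs(T, u, dx)
    return 0.5 * (T + T1 + dt * rhs(T1, u, dx))

# Usage: one period of the cell averages of sin(2 pi x) on 64 cells, Courant number 0.4.
N, u = 64, 1.0
dx = 1.0 / N
dt = 0.4 * dx
x = dx * (np.arange(N) + 0.5)
T0 = (np.cos(2 * np.pi * (x - dx / 2)) - np.cos(2 * np.pi * (x + dx / 2))) / (2 * np.pi * dx)
T = T0.copy()
for _ in range(int(round(1.0 / dt))):
    T = ssp_rk2_step(T, u, dt, dx)
print(f"max error after one period: {np.abs(T - T0).max():.2e}")
```

With ω_l = d_l the combination reduces to $-\frac16\overline{T}_{i-1} + \frac56\overline{T}_i + \frac13\overline{T}_{i+1}$, the third-order upwind-biased edge value. Next to a jump, the weight of the stencil that crosses it collapses, emulating ENO.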
I have coded up a WENO advection scheme that can use variable-order spatial interpolation (up to order 9) and up to third-order Runge-Kutta time stepping. I have also experimented with the scheme in 1D. Figure 11.7 shows the advection of Shchepetkin's narrow profile (top left), wide profile (top right), and hat profile (bottom left). The high-order WENO5-RK3 scheme has less dissipation and better phase properties than the WENO3-RK2 scheme. For the narrow Gaussian hill the peak is well preserved and the profile is narrower. For the wider profile the solution is indistinguishable from the analytical one. Finally, although the scheme does not enforce TVD, there is no evidence of dispersive ripples in the case of the hat profile; there are, however, small negative values.

I have tried to implement the shape-preserving WENO scheme proposed by Suresh and Huynh (1997) and Balsara and Shu (2000). Their limiting attempts to preserve the high-order accuracy of WENO near discontinuities and smooth extrema. As such, it includes a peak discriminator that distinguishes smooth extrema from discontinuous ones. I think the scheme will nonetheless fail to preserve the peaks of the original shape and will allow some new extrema to be generated, because there is no foolproof discriminator. Consider what happens to a square wave advected by a uniform velocity field. The discontinuity is initially confined to one cell; the discriminator will rightly flag it as a discontinuous extremum and will apply the limiter at full strength.

[Figure 11.6: Convergence rate (in the maximum norm) of the error ε versus N for ENO (left panel) and WENO (right panel) reconstruction. The dashed curves are for a left shift set to 0, while the solid curves are for centered interpolation. The numbers refer to the interpolation order.]
Subsequently, numerical dissipation will smear the front across a few cells, so that the front occupies a wider stencil. The discriminator works by comparing the curvature over a fixed number of cells. It will therefore fail to recognize the widening front as a discontinuity, mistake out-of-range values for permissible smooth extrema, and let them through.

In order to test the effectiveness of the limiter, I have tried the 1D advection of a square wave using the limited and unlimited WENO5 (5th-order) scheme coupled with RK2. Figure 11.8 compares the negative minimum obtained with the limited (black) and unlimited (red) schemes. The x-axis represents time (the cone has undergone 4 rotations by the end of the simulation). The different panels show the results using 16, 32, 64 and 128 cells. The trend in all cases is similar for the unlimited scheme: a rapid growth of the negative extremum before it reaches a quasi-steady state. The trend for the limited case is different. Initially, negative values are suppressed, as shown by the black curves starting away from time 0; this initial period lengthens with better resolution. After the first negative values appear, there is a rapid deterioration in the minimum value before it reaches a steady state. This steady-state value decreases with the number of points and becomes negligible for the 128-cell case. Finally, note that the unlimited case produces a slightly better minimum for the 16-cell case, but does not improve substantially as the number of points is increased. For this experiment, the domain is the unit interval and the hat profile is confined to |x| < 1/4. The time step is held fixed at ∆t = 1/80, so the Courant number increases with the number of cells used.

[Figure 11.8: Negative minima of unlimited (red) and limited (black) WENO scheme on a square hat profile versus time, on a logarithmic axis. Top left 16 points, top right 32 points, bottom left 64 points, and bottom right 128 points.]

[Figure 11.7: Advection of several Shchepetkin profiles. The black solid line refers to the analytical solution, the red crosses to WENO3 (RK2 time-stepping), and the blue stars to WENO5 with RK3. WENO5 is indistinguishable from the analytical solution for the narrow profile.]

I have repeated the experiments for the narrow profile case (Shchepetkin's profile) and confirmed that the limiter is indeed able to suppress the generation of negative values, even at a resolution as low as 64 cells (the reference case uses 256 cells). The discriminator, however, allows a very slight and occasional increase of the peak value. By and large, the limiter does a good job. The 2D cone experiments with the limiters are shown in the Cone section.

11.7 Utopia

The uniform third-order polynomial interpolation algorithm (UTOPIA) was derived explicitly to be a multi-dimensional, two-time-level, conservative advection scheme. The formulation is based on a finite volume formulation of the advection equation:

$$\frac{\partial}{\partial t}\left(\Delta V\,\overline{T}\right) + \oint_{\partial\Delta V}\mathbf{F}\cdot\mathbf{n}\,dS = 0 \qquad (11.62)$$

where $\overline{T}$ is the average of T over the cell ∆V and $\mathbf{F}\cdot\mathbf{n}$ are the fluxes passing through the surfaces ∂∆V of the control volume. A further integration in time reduces the solution to the following:

$$\overline{T}^{n+1} = \overline{T}^n - \frac{1}{\Delta V}\int_0^{\Delta t}\oint_{\partial V}\mathbf{F}\cdot\mathbf{n}\,dS\,dt \qquad (11.63)$$

A further definition will help us interpret the above formula. Let the time-averaged flux passing through the surfaces bounding ∆V be $\overline{\mathbf{F}} = \frac{1}{\Delta t}\int_0^{\Delta t}\mathbf{F}\,dt$. We then end up with the following two-time-level expression:

$$\overline{T}^{n+1} = \overline{T}^n - \frac{\Delta t}{\Delta V}\oint_{\partial V}\overline{\mathbf{F}}\cdot\mathbf{n}\,dS \qquad (11.64)$$

UTOPIA takes a Lagrangian point of view, tracing the fluid particles crossing each face.
The situation is depicted in figure 11.9. The particles crossing the left face of a rectangular cell occupy the area within the quadrilateral ABCD; this is effectively the contribution of edge AD to the boundary integral $\frac{\Delta t}{\Delta V}\int_{\partial V_{AD}}\overline{\mathbf{F}}\cdot\mathbf{n}\,dS$. UTOPIA assumes that the advecting velocity is locally constant across the face, in space and time. This amounts to approximating the curved edges of the area by straight lines, as shown in the figure. The distance from the left edge to the straight line BC is u∆t, and can be expressed as p∆x, where p is the Courant number for the x-component of the velocity. Likewise, the vertical distance between point B and edge $k + \frac12$ is $v\Delta t = q\Delta y$, where q is the Courant number in the y direction.

[Figure 11.9: Sketch of the particles entering cell (j, k) through its left edge (j − 1/2, k), assuming positive velocity components u and v.]

We now turn to the issue of computing the integral of the boundary fluxes; we illustrate this for edge AD of cell (j, k). Owing to UTOPIA's Lagrangian estimate of the flux we have:

$$\frac{\Delta t}{\Delta V}\int_{\partial V_{AD}}\overline{\mathbf{F}}\cdot\mathbf{n}\,dS = \frac{1}{\Delta x\Delta y}\iint_{ABCD} T\,dx\,dy \qquad (11.65)$$

The right-hand side is an area integral of T over portions of upstream neighboring cells. Its evaluation requires us to assume a form for the spatial variation of T. Several choices are available, and Leonard et al. (1995) discuss several options. UTOPIA is built on assuming a quadratic variation; for cell (j, k), the interpolation is:

$$\begin{aligned}
T_{j,k}(\xi,\eta) = \overline{T}_{j,k} &- \frac{\overline{T}_{j+1,k} + \overline{T}_{j,k-1} + \overline{T}_{j-1,k} + \overline{T}_{j,k+1} - 4\overline{T}_{j,k}}{24}\\
&+ \frac{\overline{T}_{j+1,k} - \overline{T}_{j-1,k}}{2}\,\xi + \frac{\overline{T}_{j+1,k} - 2\overline{T}_{j,k} + \overline{T}_{j-1,k}}{2}\,\xi^2\\
&+ \frac{\overline{T}_{j,k+1} - \overline{T}_{j,k-1}}{2}\,\eta + \frac{\overline{T}_{j,k+1} - 2\overline{T}_{j,k} + \overline{T}_{j,k-1}}{2}\,\eta^2
\end{aligned} \qquad (11.66)$$

Here, ξ and η are scaled local coordinates:

$$\xi = \frac{x}{\Delta x} - j, \qquad \eta = \frac{y}{\Delta y} - k \qquad (11.67)$$

so that the center of the box is located at (0, 0) and the left and right edges at $(\pm\frac12, 0)$, respectively. The interpolation formula is designed to produce the proper cell averages when the function is integrated over cells (j, k), (j ± 1, k) and (j, k ± 1). The area integral in equation 11.65 must be broken into several integrals for two reasons. First, the area ABCD straddles two cells, (j − 1, k) and (j − 1, k − 1), with two different interpolations for T. Second, the trapezoidal area integral can be simplified. We now have

$$\frac{1}{\Delta x\Delta y}\iint_{ABCD} T\,dx\,dy = I_1(j,k) - I_2(j,k) + I_2(j,k-1) \qquad (11.68)$$

where the individual contributions from each area are given by:

$$I_1(j,k) = \iint_{AEFD} T_{j,k}(\xi,\eta)\,d\eta\,d\xi = \int_{\frac12-p}^{\frac12}\int_{-\frac12}^{\frac12} T_{j,k}(\xi,\eta)\,d\eta\,d\xi \qquad (11.69)$$

$$I_2(j,k) = \iint_{AEB} T_{j,k}(\xi,\eta)\,d\eta\,d\xi = \int_{\frac12-p}^{\frac12}\int_{\eta_{AB}(\xi)}^{\frac12} T_{j,k}(\xi,\eta)\,d\eta\,d\xi \qquad (11.70)$$

The equation for the line AB, in the local coordinates of cell (j, k), is:

$$\eta_{AB}(\xi) = \frac12 + \frac{q}{p}\left(\xi - \frac12\right) \qquad (11.71)$$

Performing the integration is rather tedious; the output of a symbolic computer program is shown in figure 11.10.

[Figure 11.10: Flux $F_{j+\frac12,k}$ for the finite volume form of the UTOPIA algorithm (equation 11.72).]

A different derivation of the UTOPIA scheme can be obtained if we consider the cell values to be function values rather than cell averages. The finite difference form is then given by the equation shown in figure 11.11.
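The consistency property claimed for (11.66) is easy to check numerically: the quadratic should reproduce the averages of cell (j, k) and of its four edge neighbours. The sketch below evaluates the quadratic and integrates it with Gauss-Legendre quadrature (`numpy.polynomial.legendre.leggauss`). The helper names are hypothetical. Only the rectangular strip $I_1$ of (11.69) is computed, not the triangular piece $I_2$.

```python
import numpy as np

def utopia_quadratic(Tc, Te, Tw, Tn, Ts):
    """Quadratic (11.66) in the local coordinates (xi, eta) of cell (j, k).

    Tc = Tbar[j,k]; Te, Tw = Tbar[j+1,k], Tbar[j-1,k]; Tn, Ts = Tbar[j,k+1], Tbar[j,k-1].
    """
    def T(xi, eta):
        return (Tc - (Te + Ts + Tw + Tn - 4.0 * Tc) / 24.0
                + 0.5 * (Te - Tw) * xi + 0.5 * (Te - 2.0 * Tc + Tw) * xi**2
                + 0.5 * (Tn - Ts) * eta + 0.5 * (Tn - 2.0 * Tc + Ts) * eta**2)
    return T

def box_integral(f, xi0, xi1, eta0, eta1, n=3):
    """Gauss-Legendre integral of f over a rectangle (exact for quadratics when n >= 2)."""
    g, w = np.polynomial.legendre.leggauss(n)
    xi = 0.5 * (xi1 - xi0) * g + 0.5 * (xi1 + xi0)
    eta = 0.5 * (eta1 - eta0) * g + 0.5 * (eta1 + eta0)
    X, E = np.meshgrid(xi, eta, indexing="ij")
    return 0.25 * (xi1 - xi0) * (eta1 - eta0) * np.sum(np.outer(w, w) * f(X, E))

# Usage: the interpolant reproduces the averages of its own cell and its 4 neighbours.
vals = dict(Tc=1.0, Te=2.0, Tw=0.5, Tn=1.5, Ts=0.2)
T = utopia_quadratic(**vals)
cells = {"Tc": (-0.5, 0.5, -0.5, 0.5), "Te": (0.5, 1.5, -0.5, 0.5),
         "Tw": (-1.5, -0.5, -0.5, 0.5), "Tn": (-0.5, 0.5, 0.5, 1.5),
         "Ts": (-0.5, 0.5, -1.5, -0.5)}
averages = {key: box_integral(T, *box) for key, box in cells.items()}   # unit-area cells

# Strip integral I_1 of eq. (11.69) for a Courant number p = 0.3.
p = 0.3
I1 = box_integral(T, 0.5 - p, 0.5, -0.5, 0.5)
```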
11.8 Lax-Wendroff for advection equation

We explore the application of the Lax-Wendroff procedure to compute high-order, two-time-level approximations to the advection equation written in the form:

$$T_t + \nabla\cdot(\mathbf{u}T) = 0 \qquad (11.74)$$

The starting point is the time Taylor series expansion, which we carry out to fourth order:

$$T^{n+1} = T^n + \frac{\Delta t}{1!}T_t + \frac{\Delta t^2}{2!}T_{tt} + \frac{\Delta t^3}{3!}T_{ttt} + \frac{\Delta t^4}{4!}T_{tttt} + O(\Delta t^5) \qquad (11.75)$$

The next step is the replacement of the time derivatives above with spatial derivatives using the original PDE. It is easy to derive the following identities:

$$T_t = -\nabla\cdot(\mathbf{u}T) \qquad (11.76)$$

$$T_{tt} = -\nabla\cdot\left[\mathbf{u}_t T + \mathbf{u}T_t\right] = -\nabla\cdot\left[\mathbf{u}_t T - \mathbf{u}\,\nabla\cdot(\mathbf{u}T)\right] \qquad (11.77)$$

$$T_{ttt} = -\nabla\cdot\left[\mathbf{u}_{tt}T - 2\mathbf{u}_t\,\nabla\cdot(\mathbf{u}T) - \mathbf{u}\,\nabla\cdot(\mathbf{u}_t T) + \mathbf{u}\,\nabla\cdot\left(\mathbf{u}\,\nabla\cdot(\mathbf{u}T)\right)\right] \qquad (11.78)$$

[Figure 11.11: Flux $F_{j+\frac12,k}$ of UTOPIA when variables represent function values (equation 11.73). This is the finite difference form of the scheme.]

11.9 2D Numerical experiments

We present here a number of numerical experiments to illustrate the effect of different discretization schemes on the multi-dimensional advection of a passive tracer. The flow is a simple one and consists of a flow rotating inside a square cavity with angular speed π. The initial condition is the famous grooved cylinder experiment. It includes sharp discontinuities and tracer features that cannot align with a square grid. The goal of this exercise is the preservation of the sharp features and of the extrema.
The sharp features in the solution do not favor high-order schemes, because they lead to severe Gibbs oscillations, whereas diffusive schemes smear the sharp gradients unrealistically. Six schemes are considered here:

- donor-cell (DC);
- fourth-order centered finite difference (CD4);
- an FCT-limited CD4;
- a third-order UTOPIA scheme;
- a Universal-limiter-based UTOPIA-ULim scheme;
- a fifth-order WENO scheme.

There are three linear schemes (DC, CD4 and UTOPIA) and three non-linear ones (FCT-CD4, UTOPIA-ULim, and WENO5). Only CD4 is free of numerical diffusion. DC injects second-order (Laplacian-like diffusion) spurious dissipation, and UTOPIA injects fourth-order (biharmonic-like hyper-diffusion) spurious dissipation. The WENO5 scheme is also upwind-biased, but its oscillation control is primarily due to the convex weighting of lower-order stencils.

The numerical results are shown in figures 11.12-11.17 in the form of a convergence study, for resolutions of 40, 80, 160, 320, and 640 cells in each direction. The lower right plot shows the exact solution on the highest resolution grid (it does not reflect the plotting distortion caused by the grid). A comparison among the different schemes at the intermediate resolution of 160 cells is shown in figure 11.18. The extrema of the field at the final time step are stamped for each experiment. The contour levels shown are equispaced between 0.1 and 1.0. We did not include the zero contour, as it leads to a lot of noise because of tiny oscillations. Instead, negative contours are shown as dashed black lines for the two levels $4h_{min}/5$ and $h_{min}/5$. The DC, UTOPIA, and UTOPIA-ULim schemes are single-stage, two-time-level schemes, and hence have the lowest computational cost. The CD4 and WENO5 flavors were integrated with an RK3-TVD scheme; the latter was slightly modified to run the FCT version of the CD4 scheme.
In all cases the Courant number based on the corner velocity was kept constant:

$$C = \frac{U_{max}\Delta t}{\Delta x} = \frac{\omega L\frac{\sqrt2}{2}\,\frac{T}{N_t}}{L/N_c} = 2\pi\,\frac{\sqrt2}{2}\,\frac{N_c}{N_t} = \frac{\sqrt2\,\pi}{80} \approx 0.06 \qquad (11.79)$$

where $N_c$ and $N_t$ are the number of cells and the number of time steps needed to perform a single full rotation, and T here denotes the rotation period.

The donor-cell scheme (figure 11.12) shows a severe problem with excessive numerical diffusion. The cylinder groove has been completely erased in the coarsest resolution run, where one sees a 77% loss of peak. The numerical damping improves with resolution, but even at the highest one the groove is not restored and there is still a 33% loss of peak. On the other hand, the solution is oscillation-free.

CD4 (figure 11.13), in contrast, is plagued by oscillations, as no numerical dissipation is available to damp the small-scale noise. The oscillations persist from the coarsest to the finest grid, and remain at the grid scale. Notice that in the unresolved case the fourth-order scheme does a terrible job at capturing the solution features. Notice also that the amplitude of the Gibbs oscillations, both under- and over-shoots, remains quite high irrespective of resolution.

The FCT-corrected CD4 solution (figure 11.14) achieves its main goal of controlling the Gibbs oscillations; their amplitude is now of the order of machine precision. The shape of the cylinder and its groove are still severely distorted in the under-resolved cases, but improve quickly with resolution. The severe damping of the DC scheme has been eliminated, with only a 29% loss of peak at the coarsest resolution. The resolving power of the high-order scheme is apparent even at a resolution of 80 points.

The UTOPIA scheme (figure 11.15) is a third-order scheme with a hyperviscous numerical dissipation, which acts much more vigorously on short-scale waves than on longer-scale ones. The scheme is not oscillation-free, but the amplitude and extent of its oscillations are much smaller than those of CD4.
The benefits of scale-selective numerical diffusion are quite apparent in this example: the Gibbs oscillations are controlled while the solution gradients are preserved for longer times. At the intermediate resolution of 160 cells, the spurious extrema have a much smaller amplitude than those of CD4, but larger than those of FCT-CD4. UTOPIA-ULim shows the benefit of combining a high-order scheme with a limiter. At the resolved scales the solution is quite good and oscillation-free; at coarse resolution the distortion is not compounded by grid-scale noise. The universal limiter does not belong to the class of FCT algorithms and has been designed to take multidimensionality into account. The WENO5 scheme, shown in figure 11.17, is not oscillation-free, but it shows the impact of resolution on the amplitude of the Gibbs oscillations: unlike CD4, where their amplitude was independent of grid spacing, WENO5 yields Gibbs oscillations whose amplitude decreases as $\Delta x$ decreases. Notice also that the grooved cylinder is decently represented even at the coarse resolution of 80 cells. WENO5 is slightly upstream-biased and hence injects a highly scale-selective numerical dissipation, which accounts for the limited extent of the grid-scale noise. Notice also that the sharp gradient is maintained as soon as it becomes resolvable on the computational grid.

Figure 11.12: Donor Cell with 40, 80, 160, 320, and 640 points (the sixth panel is the exact solution). Stamped extrema (hmin, hmax): 40: (0.0006253, 0.23878); 80: (4.363e-06, 0.39047); 160: (2.9875e-10, 0.54917); 320: (1.5769e-18, 0.63257); 640: (6.347e-35, 0.77175); exact: (0, 1).

Figure 11.13: Centered 4th order with 40, 80, 160, 320, and 640 points. Stamped extrema (hmin, hmax): 40: (-0.56894, 1.1613); 80: (-0.39675, 1.644); 160: (-0.52123, 1.3178); 320: (-0.37856, 1.3952); 640: (-0.3355, 1.4186); exact: (0, 1).

Figure 11.14: FCT-corrected centered 4th order with 40, 80, 160, 320, and 640 points. Stamped extrema (hmin, hmax): 40: (4.567e-12, 0.71853); 80: (2.1219e-19, 0.9012); 160: (-7.3573e-18, 0.99259); 320: (-1.1295e-17, 0.99999); 640: (-1.3224e-17, 1); exact: (0, 1).

Figure 11.15: UTOPIA with 40, 80, 160, 320, and 640 points. Stamped extrema (hmin, hmax): 40: (-0.046307, 0.78887); 80: (-0.053232, 1.0482); 160: (-0.059122, 1.1221); 320: (-0.15352, 1.1081); 640: (-0.10015, 1.1084); exact: (0, 1).

Figure 11.16: Limited UTOPIA with 40, 80, 160, 320, and 640 points. Stamped extrema (hmin, hmax): 40: (3.2358e-24, 0.74802); 80: (3.1356e-36, 0.9478); 160: (1.9394e-59, 1); 320: (8.6597e-154, 1); 640: (8.3198e-196, 1); exact: (0, 1).

Figure 11.17: WENO5 with 40, 80, 160, 320, and 640 points. Stamped extrema (hmin, hmax): 40: (-6.254e-06, 1.0583); 80: (-0.020649, 1.0761); 160: (-0.048594, 1.0169); 320: (-0.00091886, 1.0479); 640: (-8.2062e-06, 1.0005); exact: (0, 1).

These findings are summarized in figure 11.18, where the different schemes are compared at the single resolution of 160 cells in each direction. A high-order, upstream-biased scheme is clearly almost optimal for this example, as it strikes a good balance between high order and control of Gibbs oscillations. Limiters can enhance the performance of such schemes by removing unwanted noise. It should be pointed out, however, that the cost of a scheme escalates quite rapidly with the complexity of the algorithm.
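The convex weighting that distinguishes WENO5 from a fixed-stencil scheme can be seen at a single cell face. This is a minimal sketch assuming the classical Jiang-Shu fifth-order reconstruction (linear weights 1/10, 6/10, 3/10, smoothness indicators, $\varepsilon = 10^{-6}$); the text does not say which WENO5 variant produced figure 11.17, and the function names are hypothetical.

```python
LINEAR_WEIGHTS = (0.1, 0.6, 0.3)   # optimal weights of the fifth-order stencil


def weno5_face_value(v, eps=1e-6):
    """Jiang-Shu WENO5 value at face i+1/2 from v = (v[i-2], ..., v[i+2]).

    Left-biased (information travels in +x). Returns (value, nonlinear weights).
    """
    vm2, vm1, v0, vp1, vp2 = v
    # third-order candidate values from the three three-point sub-stencils
    q = ((2.0 * vm2 - 7.0 * vm1 + 11.0 * v0) / 6.0,
         (-vm1 + 5.0 * v0 + 2.0 * vp1) / 6.0,
         (2.0 * v0 + 5.0 * vp1 - vp2) / 6.0)
    # smoothness indicators: large on a sub-stencil that straddles a jump
    beta = (13.0 / 12.0 * (vm2 - 2.0 * vm1 + v0) ** 2
            + 0.25 * (vm2 - 4.0 * vm1 + 3.0 * v0) ** 2,
            13.0 / 12.0 * (vm1 - 2.0 * v0 + vp1) ** 2
            + 0.25 * (vm1 - vp1) ** 2,
            13.0 / 12.0 * (v0 - 2.0 * vp1 + vp2) ** 2
            + 0.25 * (3.0 * v0 - 4.0 * vp1 + vp2) ** 2)
    alpha = [d / (eps + b) ** 2 for d, b in zip(LINEAR_WEIGHTS, beta)]
    total = sum(alpha)
    omega = [a / total for a in alpha]           # convex: >= 0, sum to 1
    return sum(w * qk for w, qk in zip(omega, q)), omega


def linear_face_value(v):
    """The same five points with fixed (linear) weights: fifth-order upwind."""
    vm2, vm1, v0, vp1, vp2 = v
    return (2.0 * vm2 - 13.0 * vm1 + 47.0 * v0 + 27.0 * vp1 - 3.0 * vp2) / 60.0


smooth, w_smooth = weno5_face_value((0.0, 1.0, 2.0, 3.0, 4.0))
step = (0.0, 0.0, 1.0, 1.0, 1.0)
jump, w_jump = weno5_face_value(step)
print(smooth, [round(w, 3) for w in w_smooth])
print(jump, linear_face_value(step), [round(w, 6) for w in w_jump])
```

On linear data all smoothness indicators are equal, the nonlinear weights reduce to the linear ones, and the face value is exact. Next to a jump almost all of the weight moves to the one sub-stencil that does not straddle it, so the face value stays at 1 where the fixed-weight fifth-order formula overshoots to 71/60. The three candidate values and three smoothness indicators computed at every face are one concrete source of the cost growth mentioned above.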
Although fidelity to the analytical solution is the overriding concern of purists, practical considerations can rule out the use of expensive schemes.

Figure 11.18: DC, CD4, UTOPIA, WENO5, FCT-CENT4, LIM-UTOPIA at 160 × 160. Stamped extrema (hmin, hmax): DC: (2.9875e-10, 0.54917); CD4: (-0.52123, 1.3178); UTOPIA: (-0.059122, 1.1221); WENO5: (-0.048594, 1.0169); FCT-CENT4: (-7.3573e-18, 0.99259); LIM-UTOPIA: (1.9394e-59, 1).

Chapter 12 Fourier series

In this chapter we explore the issues arising in expressing functions as Fourier series. It is extremely useful to recall some of the definitions of norms, inner products, and vector spaces reviewed briefly in Chapter 15.

12.1 Continuous Fourier series

As shown earlier, the set of functions
$$\phi_k(x) = e^{ikx}, \qquad k = 0, \pm 1, \pm 2, \dots \tag{12.1}$$
forms an orthogonal basis over the interval $[-\pi, \pi]$:
$$\int_{-\pi}^{\pi}\phi_j\,\overline{\phi_k}\,dx = 2\pi\,\delta_{jk} \tag{12.2}$$
where the overbar denotes the complex conjugate. The Fourier series of the function $u$ is defined as
$$Su = \sum_{k=-\infty}^{\infty}\hat u_k\,\phi_k. \tag{12.3}$$
It represents the formal expansion of $u$ in the Fourier orthogonal system. The Fourier coefficients $\hat u_k$ are
$$\hat u_k = \frac{1}{2\pi}\int_{-\pi}^{\pi}u(x)\,e^{-ikx}\,dx. \tag{12.4}$$
It is also possible to rewrite the Fourier series in terms of trigonometric functions by using the identities
$$\cos\theta = \frac{e^{i\theta} + e^{-i\theta}}{2}, \qquad \sin\theta = \frac{e^{i\theta} - e^{-i\theta}}{2i}. \tag{12.5}$$
The Fourier series becomes
$$Su = a_0 + \sum_{k=1}^{\infty}\left(a_k\cos kx + b_k\sin kx\right) \tag{12.6}$$
The Fourier coefficients of the trigonometric series are related to those of the complex exponential series by
$$\hat u_0 = a_0, \qquad \hat u_{\pm k} = \frac{a_k \mp i\,b_k}{2}, \quad k > 0. \tag{12.7}$$
If $u$ is a real-valued function, $a_k$ and $b_k$ are real numbers, and $\hat u_{-k} = \overline{\hat u_k}$. Often it is unnecessary to use the full Fourier expansion. If $u$ is an even function, i.e. $u(-x) = u(x)$, then all the sine coefficients $b_k$ are zero, and the series becomes what is called a cosine series. Likewise, if the function $u$ is odd, $u(-x) = -u(x)$, then all the cosine coefficients $a_k$ are zero and the expansion becomes a sine series.
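The relation between the complex and trigonometric coefficients, and the conjugate symmetry of the coefficients of a real function, can be checked numerically. The sketch below approximates (12.4) and the usual formulas $a_k = \frac1\pi\int_{-\pi}^{\pi}u\cos kx\,dx$ and $b_k = \frac1\pi\int_{-\pi}^{\pi}u\sin kx\,dx$ with the trapezoidal rule on a periodic grid; the test function $u(x) = e^{\cos x} + \sin 2x$ (an even part plus an odd part) and the number of quadrature points are assumptions made for illustration.

```python
import cmath
import math

M = 64                                       # quadrature points (assumption)
xs = [-math.pi + 2.0 * math.pi * j / M for j in range(M)]


def u(x):
    """Hypothetical real, smooth, 2*pi-periodic test function."""
    return math.exp(math.cos(x)) + math.sin(2.0 * x)


def u_hat(k):
    """(12.4) by the periodic trapezoidal rule."""
    return sum(u(x) * cmath.exp(-1j * k * x) for x in xs) / M


def a(k):
    """(1/pi) * integral of u(x) cos(kx), k >= 1."""
    return 2.0 / M * sum(u(x) * math.cos(k * x) for x in xs)


def b(k):
    """(1/pi) * integral of u(x) sin(kx), k >= 1."""
    return 2.0 / M * sum(u(x) * math.sin(k * x) for x in xs)


for k in range(1, 4):
    print(k, u_hat(k), (a(k) - 1j * b(k)) / 2, u_hat(-k))
```

Because the even part $e^{\cos x}$ contributes no sine coefficients, `b(1)` vanishes to rounding error and `b(2)` returns the unit amplitude of the $\sin 2x$ term.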
12.2 Discrete Fourier series

In practical applications, numerical methods based on Fourier series cannot be implemented in precisely the way suggested in the previous section. For example, the Fourier coefficients of an arbitrary function may be too difficult to calculate with equation 12.4: either the integral is too complicated to evaluate analytically, or the function $u$ is only known at a discrete set of points. An efficient way must therefore be found to convert the function $u$ from physical space to spectral space. Furthermore, nonlinearities can significantly complicate the application of spectral methods. The key to overcoming these difficulties is the use of a discrete Fourier series and its associated discrete Fourier transform.

Consider the set of $N$ points $x_j = 2\pi j/N$, for $j = 0, 1, \dots, N-1$. We define the discrete Fourier coefficients of a complex-valued function $u$ in $0 \le x \le 2\pi$ as
$$\tilde u_n = \frac{1}{N}\sum_{j=0}^{N-1}u_j\,e^{-inx_j}, \qquad n = -\frac N2, -\frac N2 + 1, \dots, \frac N2 - 1. \tag{12.8}$$
Notice that
$$\tilde u_{\pm N/2} = \frac1N\sum_{j=0}^{N-1}u_j\,e^{\mp i\frac N2\frac{2\pi j}{N}} = \frac1N\sum_{j=0}^{N-1}(-1)^j u_j,$$
and so $\tilde u_{N/2} = \tilde u_{-N/2}$. More generally, since $e^{-i(n+N)x_j} = e^{-inx_j}$, the coefficients are periodic in $n$ with period $N$, so any $N$ consecutive values of $n$ may be used. The definition can be inverted easily: multiply equation 12.8 by $e^{inx_k}$ and sum the resulting series over $n$ to obtain
$$\sum_{n=0}^{N-1}\tilde u_n e^{inx_k} = \sum_{n=0}^{N-1}\frac1N\sum_{j=0}^{N-1}u_j e^{in(x_k - x_j)} = \frac1N\sum_{j=0}^{N-1}u_j\sum_{n=0}^{N-1}e^{in(x_k - x_j)} \tag{12.9}$$
The last sum can be written as a geometric series with factor $e^{i(x_k - x_j)}$. Its sum can then be expressed analytically as
$$\sum_{n=0}^{N-1}e^{in(x_k - x_j)} = 1 + e^{i(x_k - x_j)} + e^{i2(x_k - x_j)} + \dots + e^{i(N-1)(x_k - x_j)} \tag{12.10}$$
$$= \frac{1 - e^{iN(x_k - x_j)}}{1 - e^{i(x_k - x_j)}} = \frac{1 - e^{i2\pi(k-j)}}{1 - e^{i2\pi(k-j)/N}} \tag{12.11}$$
There are two cases to consider: if $k = j$, all the terms in the series are equal to 1 and hence sum to $N$; if $k \ne j$, the numerator of the last fraction is equal to 0 while the denominator is not (since $0 < |k - j| < N$). Thus we can write $\sum_{n=0}^{N-1}e^{in(x_k - x_j)} = N\delta_{jk}$. The set of functions $e^{inx}$ is said to be discretely orthogonal.
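The definition (12.8), the coincidence of the $\pm N/2$ coefficients, the discrete orthogonality relation, and the inversion it leads to can all be verified in a few lines. This is a direct $O(N^2)$ transcription written for clarity, not a fast Fourier transform; the sample data and helper names are hypothetical.

```python
import cmath
import math


def grid(N):
    """x_j = 2*pi*j/N, j = 0, ..., N-1."""
    return [2.0 * math.pi * j / N for j in range(N)]


def dft_coefficient(u, n):
    """tilde u_n of (12.8); the formula accepts any integer n."""
    N = len(u)
    return sum(u_j * cmath.exp(-1j * n * x_j)
               for u_j, x_j in zip(u, grid(N))) / N


def orthogonality_sum(k, j, N):
    """sum_{n=0}^{N-1} exp(i n (x_k - x_j)); equals N if k == j, else 0."""
    dx = 2.0 * math.pi * (k - j) / N
    return sum(cmath.exp(1j * n * dx) for n in range(N))


def inverse_dft(coeffs):
    """u_k = sum_{n=0}^{N-1} tilde u_n exp(i n x_k), with coeffs[n] = tilde u_n."""
    N = len(coeffs)
    return [sum(c * cmath.exp(1j * n * x_k) for n, c in enumerate(coeffs))
            for x_k in grid(N)]


u = [math.exp(math.sin(x)) for x in grid(8)]      # hypothetical samples, N = 8
N = len(u)
print(dft_coefficient(u, N // 2), dft_coefficient(u, -N // 2))
print(abs(orthogonality_sum(2, 2, N)), abs(orthogonality_sum(2, 5, N)))
u_back = inverse_dft([dft_coefficient(u, n) for n in range(N)])
```

`u_back` reproduces the samples to rounding error. Summing over $n = 0, \dots, N-1$ rather than $-N/2, \dots, N/2-1$ is legitimate because `dft_coefficient(u, n)` and `dft_coefficient(u, n - N)` coincide.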
This property can be substituted in equation 12.9 to obtain the inversion formula:
$$\sum_{n=0}^{N-1}\tilde u_n e^{inx_k} = \sum_{j=0}^{N-1}u_j\,\delta_{jk} = u_k \tag{12.12}$$

12.2.1 Fourier Series For Periodic Problems

The structure of the discrete Fourier series can be justified easily for periodic problems. In the continuous case the series takes the form
$$u(x) = \sum_{n=-\infty}^{\infty}\hat u_n e^{-ik_nx} \tag{12.13}$$
Here, unlike in the integral form, the wavenumbers $k$ are not continuous but are quantized on account of periodicity. If the domain is $0 \le x \le a$, the wavenumbers are given by
$$k_n = \frac{2\pi n}{a}, \qquad n = 0, \pm 1, \pm 2, \dots \tag{12.14}$$
In the discrete case the domain is divided into an equally spaced set of points $x_j = j\Delta x$ with $\Delta x = a/N$, where $N + 1$ is the number of points (by periodicity $u_N = u_0$, which leaves $N$ independent values). The discrete Fourier series then takes the form
$$u(x_j) = u_j = \sum_{n=-N_{max}}^{N_{max}}\hat u_n e^{-ik_nx_j} \tag{12.15}$$
where $N_{max}$ is the maximum wave mode that can be represented on the discrete grid. Note that $k_nx_j = \frac{2\pi n}{a}\frac{ja}{N} = \frac{2\pi nj}{N}$. Since the smallest wavelength is $2\Delta x$, the maximum wavenumber is
$$k_{max} = \frac{2\pi}{2\Delta x} = \frac{2\pi N_{max}}{a}, \qquad\text{so that}\qquad N_{max} = \frac{a}{2\Delta x} = \frac N2 \tag{12.16}$$
Furthermore, we have
$$e^{-ik_{\pm N_{max}}x_j} = e^{\mp i\frac{2\pi}{N}\frac N2 j} = e^{\mp i\pi j} = (-1)^j \tag{12.17}$$
Hence the two waves are identical on the grid and it is enough to retain the amplitude of only one of them. The discrete Fourier series can then take the form
$$u(x_j) = u_j = \sum_{n=-N/2}^{N/2-1}\hat u_n e^{-i\frac{2\pi nj}{N}} \tag{12.18}$$
We now have parity between the $N$ degrees of freedom in physical space, $u_j$, and the $N$ degrees of freedom in Fourier space, $\hat u_n$. Further manipulation reduces the above expression to the standard form presented earlier:
$$u_j = \sum_{n=-N/2}^{N/2-1}\hat u_n e^{-i\frac{2\pi nj}{N}} \tag{12.19}$$
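$$u_j = \sum_{n=-N/2}^{-1}\hat u_n e^{-i\frac{2\pi nj}{N}} + \sum_{n=0}^{N/2-1}\hat u_n e^{-i\frac{2\pi nj}{N}} \tag{12.20}$$
$$= \sum_{n=-N/2}^{-1}\hat u_n e^{-i\frac{2\pi nj}{N}}e^{-i2\pi j} + \sum_{n=0}^{N/2-1}\hat u_n e^{-i\frac{2\pi nj}{N}} \tag{12.21}$$
$$= \sum_{n=-N/2}^{-1}\hat u_n e^{-i\frac{2\pi(n+N)j}{N}} + \sum_{n=0}^{N/2-1}\hat u_n e^{-i\frac{2\pi nj}{N}} \tag{12.22}$$
$$= \sum_{n=N/2}^{N-1}\hat u_{n-N}e^{-i\frac{2\pi nj}{N}} + \sum_{n=0}^{N/2-1}\hat u_n e^{-i\frac{2\pi nj}{N}} \tag{12.23}$$
$$= \sum_{n=0}^{N-1}\tilde u_n e^{-i\frac{2\pi nj}{N}} \tag{12.24}$$
where the new (tilded) coefficients are related to the old (hatted) coefficients by
$$\tilde u_n = \begin{cases}\hat u_n, & 0 \le n \le \frac N2 - 1\\ \hat u_{n-N}, & \frac N2 \le n \le N - 1\end{cases} \tag{12.25}$$
The multiplication by $e^{-i2\pi j} = 1$ (for integer $j$) in 12.21 is what allows the negative wavenumbers to be relabeled.

Sine transforms

For problems with homogeneous Dirichlet boundary conditions, one expands the solution in terms of sine functions:
$$u_j = \sum_{n=1}^{N-1}\hat u_n\sin\frac{n\pi x_j}{L} = \sum_{n=1}^{N-1}\hat u_n\sin\frac{n\pi j\Delta x}{L} = \sum_{n=1}^{N-1}\hat u_n\sin\frac{n\pi j}{N} \tag{12.26}$$
where $\Delta x = L/N$. It is easy to verify that $u_0 = u_N = 0$ no matter the values of $\hat u_n$. To derive the inversion formula, multiply the above sum by $\sin\frac{m\pi j}{N}$ and sum over $j$ to get
$$\begin{aligned}
\sum_{j=1}^{N-1}u_j\sin\frac{m\pi j}{N} &= \sum_{j=1}^{N-1}\sum_{n=1}^{N-1}\hat u_n\sin\frac{n\pi j}{N}\sin\frac{m\pi j}{N}\\
&= \sum_{n=1}^{N-1}\hat u_n\sum_{j=1}^{N-1}\sin\frac{n\pi j}{N}\sin\frac{m\pi j}{N}\\
&= \sum_{n=1}^{N-1}\frac{\hat u_n}{2}\sum_{j=1}^{N-1}\left[\cos\frac{(n-m)\pi j}{N} - \cos\frac{(n+m)\pi j}{N}\right]\\
&= \sum_{n=1}^{N-1}\frac{\hat u_n}{4}\sum_{j=1}^{N-1}\left[e^{i\frac{(n-m)\pi j}{N}} + e^{-i\frac{(n-m)\pi j}{N}} - e^{i\frac{(n+m)\pi j}{N}} - e^{-i\frac{(n+m)\pi j}{N}}\right]
\end{aligned}\tag{12.27}$$
The terms in the inner series have the form
$$S_k = r + r^2 + \dots + r^k = r\left(1 + r + \dots + r^{k-1}\right) = r\,\frac{r^k - 1}{r - 1} = \frac{r^{k+1} - r}{r - 1} = \frac{r^{k+1} - 1}{r - 1} - 1 \tag{12.28}$$
which for our trigonometric series, with $r = e^{i\frac{p\pi}{N}}$ and $k = N - 1$, becomes
$$S(p) = \sum_{j=1}^{N-1}e^{i\frac{p\pi j}{N}} = \frac{e^{ip\pi} - e^{i\frac{p\pi}{N}}}{e^{i\frac{p\pi}{N}} - 1} = \frac{(-1)^p - e^{i\frac{p\pi}{N}}}{e^{i\frac{p\pi}{N}} - 1} \tag{12.29}$$
with $p = \pm(n \pm m)$. Note also that $S(0) = N - 1$. Furthermore, for $p \ne 0$ we have
$$S(p) + S(-p) = \frac{(-1)^p - e^{i\frac{p\pi}{N}}}{e^{i\frac{p\pi}{N}} - 1} + \frac{(-1)^p - e^{-i\frac{p\pi}{N}}}{e^{-i\frac{p\pi}{N}} - 1} \tag{12.30}$$
$$= \frac{(-1)^p\left(2\cos\frac{p\pi}{N} - 2\right) - \left(2 - 2\cos\frac{p\pi}{N}\right)}{2 - 2\cos\frac{p\pi}{N}} \tag{12.31}$$
$$= -1 - (-1)^p \tag{12.32}$$
We can now evaluate the combination of the four sums. For $n \ne m$,
$$S(n-m) + S(-n+m) - S(n+m) - S(-n-m) = (-1)^{n+m} - (-1)^{n-m} = 0 \tag{12.33}$$
since $n + m$ and $n - m$ are both even if $n$ and $m$ are both odd or both even, and both odd if one is odd and the other even.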
For the case $m = n$ we have
$$S(0) + S(0) - S(2n) - S(-2n) = 2(N - 1) + 1 + (-1)^{2n} = 2N \tag{12.34}$$
In compact notation we would write
$$S(n-m) + S(-n+m) - S(n+m) - S(-n-m) = 2N\,\delta_{nm} \tag{12.35}$$
Replacing this expression in 12.27, the right-hand side reduces to $\frac N2\hat u_m$, and we get
$$\hat u_n = \frac{2}{N}\sum_{j=1}^{N-1}u_j\sin\frac{\pi nj}{N} \tag{12.36}$$

Chapter 13 Spectral Methods

13.1 Spectral Series

The approximation of a function $u$ by a series of the form
$$u(x) = \sum_{j=1}^{N}\hat u_j\,\phi_j(x) \tag{13.1}$$
raises several questions:
1. How many terms in the series should be kept to achieve a given error? Two related questions are: how fast do the coefficients $\hat u_j$ decrease, and what determines how fast these coefficients decay?
2. How do we choose the basis functions $\phi_j(x)$?
3. How do we compute the coefficients $\hat u_j$?

There are 4 main concepts that are central to the following discussion.
1. The error in the approximation of a PDE comes from several sources; we assume, however, that all of these decay to zero at roughly the same rate as $N$ increases.
2. The second concept is anchored in Darboux's principle, which states that the rate of convergence of a series is determined by the location (distance from the expansion interval) and strength of the closest singularity of the function. Two functions that are different but share similar singularities (in terms of location and strength) have spectral coefficients that decay at the same rate. By studying a few simple examples, we can therefore learn a lot about the strengths and weaknesses of spectral methods.
3. From Darboux's principle and a limited knowledge of the behavior of the function, we can predict rates of convergence. Several rates are possible: algebraic, geometric, subgeometric, and supergeometric.
4. From Darboux's principle and the model functions, we can produce rules of thumb to decide how to pick $N$.
13.2 Fourier Series

The Fourier series of a general function is
$$u(x) = a_0 + \sum_{n=1}^{\infty}a_n\cos nx + \sum_{n=1}^{\infty}b_n\sin nx \tag{13.2}$$
where the cosine and sine Fourier coefficients are
$$a_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi}u(x)\,dx, \qquad a_n = \frac1\pi\int_{-\pi}^{\pi}u(x)\cos nx\,dx, \qquad b_n = \frac1\pi\int_{-\pi}^{\pi}u(x)\sin nx\,dx, \qquad n = 1, 2, \dots \tag{13.3}$$
A more compact form of the same equations makes use of complex notation, by noting that $e^{inx} = \cos nx + i\sin nx$:
$$u(x) = \sum_{n=-\infty}^{\infty}c_n e^{inx}, \qquad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi}u(x)\,e^{-inx}\,dx \tag{13.4}$$
The coefficients of the two forms are related by
$$c_0 = a_0, \qquad c_n = \begin{cases}\dfrac{a_n - ib_n}{2}, & n > 0\\[2mm] \dfrac{a_{|n|} + ib_{|n|}}{2}, & n < 0\end{cases} \tag{13.5}$$
Since the Fourier basis is periodic, it would be reasonable to assume that the basis is useful only for expanding periodic functions. This is only partially true. Fourier series work best for periodic functions; however, the Fourier series will converge, although very slowly, for quite arbitrary functions $u(x)$.

Example 14: Consider the function $u(x) = x$, and let us find its Fourier coefficients. Since the function is odd, $u(-x) = -u(x)$, all its cosine coefficients are zero, while its sine coefficients are given by
$$b_n = \frac1\pi\int_{-\pi}^{\pi}x\sin nx\,dx = (-1)^{n+1}\frac{2}{n} \tag{13.6}$$
The amplitude of the coefficients decreases like $1/n$ with the mode number, and hence the series converges slowly as $n$ increases. Figure 13.1 shows a series of plots comparing the original function with the spectral series for increasing truncation $N$. For low $N$ the two curves are quite different, but the difference decreases as $N$ increases. Notice that the oscillations away from the discontinuity decrease with $N$, while their amplitude remains constant near the discontinuity. These oscillations are referred to as Gibbs oscillations; they produce an $O(1)$ error near the discontinuity, since the function evaluates to $\pi$ while the series sums to 0. Damping these oscillations requires a combination of techniques, such as "sum acceleration", "filtering", and "reconstruction".
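The persistence of the Gibbs overshoot can be quantified directly from (13.6). The sketch below, with hypothetical helper names and an assumed sampling window, measures the largest excess of the truncated series over $u(x) = x$ just to the left of the jump at $x = \pi$, together with the value of the series at the jump itself.

```python
import math


def sawtooth_partial_sum(x, N):
    """S_N(x) = sum_{n=1}^{N} 2(-1)^(n+1) sin(nx)/n, the truncated series of u = x."""
    return sum(2.0 * (-1) ** (n + 1) * math.sin(n * x) / n for n in range(1, N + 1))


def overshoot_near_jump(N, samples=2000):
    """max of S_N(x) - x on (pi - 0.2, pi): the Gibbs overshoot left of x = pi."""
    xs = [math.pi - 0.2 * (k + 1) / samples for k in range(samples)]
    return max(sawtooth_partial_sum(x, N) - x for x in xs)


for N in (18, 72, 288):
    print(N, round(overshoot_near_jump(N), 4), sawtooth_partial_sum(math.pi, N))
```

The overshoot stays near $2\,\mathrm{Si}(\pi) - \pi \approx 0.56$ (about 9% of the jump $2\pi$) however many terms are kept, while at $x = \pi$ the series sums to zero, the midpoint of the jump, up to rounding error.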
Figure 13.1: Approximating the sawtooth function $u(x) = x$ with a sine spectral series with increasing cut-off $N$ (panels N = 1, 5, 9, 18, 36, 72). The solid black line is the original function and the blue line the spectral approximation.

Figure 13.2: Approximating the function $u(x) = \max(0, \sin x)$ with a spectral series with increasing cut-off $N$ (panels N = 2, 4, 6, 8). The solid black line is the original function and the blue line the spectral approximation. The x-axis has been scaled by $1/\pi$.

Figure 13.3: Approximating the function $u(x) = 3/(5 - 4\cos x)$ with a spectral series with increasing cut-off $N$ (panels N = 2, 4, 6). The solid black line is the original function and the blue line is the spectral approximation. The x-axis has been scaled by $1/\pi$.

Example 15: Our second example consists of expanding the function $u(x) = \max(0, \sin x)$ on $0 \le x \le 2\pi$. Notice that this function is continuous but not differentiable at $x = 0$ and $x = \pi$. The coefficients can be calculated as
$$a_0 = \frac1\pi, \qquad a_1 = 0, \qquad a_n = -\frac{1 + (-1)^n}{\pi\,(n^2 - 1)}\ \ (n > 1), \qquad b_1 = \frac12, \qquad b_n = 0\ \ (n > 1) \tag{13.7}$$
The spectral series is rewritten as $u = a_0 + b_1\sin x + \sum_{n=2}^{N}a_n\cos nx$, and its graph is shown in figure 13.2. One can see that the series converges faster than in the previous example. The two curves almost overlap for $N = 8$, and the only noticeable oscillations are near the kinks of the curve ($x = 0$, $\pi$, and $2\pi$). The faster convergence is in fact due to the quadratic, $O(n^{-2})$, decrease of the spectral coefficients, which in turn is due to the increased smoothness of the function compared with the previous example.

Example 16: The function $u(x) = 3/(5 - 4\cos x)$ is a periodic and infinitely differentiable function on the interval $|x| < \pi$.
It has the following Fourier expansion:
$$u(x) = 1 + 2\sum_{n=1}^{\infty}2^{-n}\cos nx \tag{13.8}$$
Notice that in this case the Fourier coefficients decrease geometrically fast, with $a_{n+1}/a_n = 1/2$. This rapid decrease in the amplitude of the coefficients translates into a rapid convergence of the series. The curves for the original function and for the spectral approximations are shown in figure 13.3 for several $N$, and the two curves overlap for $N = 6$. The series exhibits exponential convergence, as its coefficients decay by a factor of 1/2 each time $n$ increases by one. For the previous cases the decay of the coefficients is proportional to $n^{-k}$, so that
$$\frac{a_{n+1}}{a_n} \sim \frac{n^k}{(n+1)^k} \sim 1 - \frac{k}{n}, \qquad n \gg k \tag{13.9}$$
hence this ratio approaches 1 as $n$ increases. In contrast, the geometrically convergent series has a coefficient ratio bounded away from 1.

Figure 13.4: Log-log (left) and semi-log (right) plots of the decay of the Fourier coefficients versus $n$ for example 14 (black curve), example 15 (blue curve), and example 16 (red curve).

Figure 13.4 shows the decay of the Fourier coefficients for the 3 examples presented above on a log-log scale and on a semi-log scale. We notice that functions with finite regularity exhibit an algebraic convergence rate, which translates into a straight line on a log-log scale and a decreasing curve with upward concavity on a semi-log scale. Functions with infinite regularity, like example 16, show a faster decrease of their Fourier coefficients than the algebraic rates, and their graphs are straight lines on semi-log plots. The two additional curves shown illustrate the phenomena of subgeometric convergence, when expanding a function that is infinitely differentiable but singular at $x = 0$, and supergeometric convergence, where the spectral coefficients decrease faster than exponentially, as in $e^{-n\ln n}$.
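The contrast between algebraic and geometric decay can be reproduced numerically. This sketch approximates the cosine coefficients of examples 15 and 16 with the trapezoidal rule on a fine periodic grid (the grid size and function names are assumptions) and compares them with (13.7) and (13.8).

```python
import math

M = 4096                                       # trapezoidal points (assumption)
XS = [2.0 * math.pi * k / M for k in range(M)]


def cosine_coefficient(f, n):
    """a_n = (1/pi) * integral over [0, 2*pi] of f(x) cos(nx), periodic trapezoidal rule."""
    return 2.0 / M * math.fsum(f(x) * math.cos(n * x) for x in XS)


def example15(x):
    return max(0.0, math.sin(x))               # continuous, kinks at 0 and pi


def example16(x):
    return 3.0 / (5.0 - 4.0 * math.cos(x))     # infinitely differentiable, periodic


for n in (2, 4, 8, 16):
    a15 = cosine_coefficient(example15, n)
    ratio16 = cosine_coefficient(example16, n + 1) / cosine_coefficient(example16, n)
    print(f"n={n:2d}  n^2*a_n (Ex. 15) = {n * n * a15:+.4f}   "
          f"a_(n+1)/a_n (Ex. 16) = {ratio16:.6f}")
```

For example 15, $n^2a_n$ settles near $-2/\pi$ for even $n$, the signature of $O(n^{-2})$ decay, whereas for example 16 the ratio $a_{n+1}/a_n$ stays at 1/2.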
13.2.1 Bounds on Fourier coefficients

To gain insight into the behavior of these coefficients we return to their complex form:
$$c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi}u(x)\,e^{-inx}\,dx \tag{13.10}$$
If the function $u$ is differentiable on the interval $|x| < \pi$, we can apply a single integration by parts to obtain
$$c_n = \frac{(-1)^n}{2\pi}\,\frac{i}{n}\left[u(\pi) - u(-\pi)\right] - \frac{1}{2\pi}\,\frac{i}{n}\int_{-\pi}^{\pi}u'(x)\,e^{-inx}\,dx \tag{13.11}$$
since $e^{in\pi} = e^{-in\pi} = \cos n\pi = (-1)^n$. The process can be repeated if $u'(x)$ is continuous within the interval, to get
$$c_n = \frac{(-1)^n}{2\pi}\left[\frac{i}{n}\left(u(\pi) - u(-\pi)\right) - \left(\frac{i}{n}\right)^2\left(u'(\pi) - u'(-\pi)\right)\right] + \frac{1}{2\pi}\left(\frac{i}{n}\right)^2\int_{-\pi}^{\pi}u''(x)\,e^{-inx}\,dx \tag{13.12}$$
The following equation can thus be obtained if the function $u(x)$ is differentiable $k + 1$ times:
$$c_n = \frac{(-1)^n}{2\pi}\sum_{j=0}^{k}(-1)^j\left(\frac{i}{n}\right)^{j+1}\left[u^{(j)}(\pi) - u^{(j)}(-\pi)\right] + \frac{(-1)^{k+1}}{2\pi}\left(\frac{i}{n}\right)^{k+1}\int_{-\pi}^{\pi}u^{(k+1)}(x)\,e^{-inx}\,dx \tag{13.13}$$
where $u^{(j)}$ is the $j$-th derivative of $u$. We note first that if the derivatives $u^{(j)}$ are continuous and periodic, then $u^{(j)}(\pi) = u^{(j)}(-\pi)$ for all $j < k$, and hence the individual entries in the sum are all zero. The first term that does not vanish, i.e. the first one for which this condition fails, stops the integration process. This series allows us to bound the coefficients of the spectral expansion. If $k$ is the maximum number of times a function is differentiable, and if $u^{(j)}$ is periodic for $j \le k - 2$, then the Fourier coefficients decrease as
$$c_n \sim O\!\left(\frac{1}{n^k}\right) \tag{13.14}$$
If the function $u$ is infinitely differentiable and all its derivatives are periodic, then the process can be repeated an infinite number of times. This implies that the coefficients decrease faster than any finite power of $n$. This is the property of infinite-order, or exponential, convergence.

13.3 Equal Error Assumptions

Here we define the errors incurred in approximating the function $u$ by a truncated series. The error can be separated into several components, which we define below:
1. Truncation error $E_T(N)$ is defined to be the error made by neglecting all spectral coefficients $a_n$ with $n > N$.
2. Discretization error $E_D(N)$ is the difference between the first $(N + 1)$ terms of the exact solution and the corresponding terms as computed by a spectral or pseudospectral method using $(N + 1)$ basis functions.
3. Interpolation error $E_I(N)$ is the error made by approximating a function by an $(N + 1)$-term series w

Chapter 14 Finite Element Methods

The discretization of complicated flow domains with finite difference methods is quite cumbersome. Their straightforward application requires the flow to occur in logically rectangular domains, a constraint that severely limits their ability to simulate flows in realistic geometries. The finite element method was developed in large part to address the issue of solving PDEs in arbitrarily complex regions. Briefly, the FEM works by dividing the flow region into cells referred to as elements. The solution within each element is approximated using some form of interpolation, most often polynomial interpolation, and the weights of the interpolation polynomials are adjusted so that the residual obtained after applying the PDE is minimized. There are a number of FE approaches in existence today; they differ primarily in their choice of interpolation procedure, the type of elements used in the discretization, and the sense in which the residual is minimized. Finlayson (1972) showed how the different approaches can be unified from the perspective of the Method of Weighted Residuals (MWR).

14.1 MWR

Consider the following problem: find the function $u$ such that
$$L(u) = 0 \tag{14.1}$$
where $L$ is a linear operator defined on a domain $\Omega$; if $L$ is a differential operator, appropriate initial and boundary conditions must be supplied. The continuum problem as defined in equation 14.1 is an infinite-dimensional problem, as it requires us to find $u$ at every point of $\Omega$.
The essence of numerical discretization is to turn this infinite-dimensional system into a finite-dimensional one:
$$\tilde u = \sum_{j=0}^{N}\hat u_j\,\phi_j(x) \tag{14.2}$$
Here $\tilde u$ stands for the approximate solution of the problem, the $\hat u_j$ are the $N + 1$ degrees of freedom available to minimize the error, and the $\phi_j$ are the interpolation or trial functions. Equation 14.2 can be viewed as an expansion of the solution in terms of a basis defined by the functions $\phi_j$. Applying this series to the operator $L$ we obtain
$$L(\tilde u) = R(x) \tag{14.3}$$
where $R(x)$ is a residual, which is different from zero unless $\tilde u$ is the exact solution of equation 14.1. The degrees of freedom $\hat u_j$ can now be chosen to minimize $R$. In order to determine the problem uniquely, we can impose $N + 1$ constraints. For MWR we require $R$ to be orthogonal to $N + 1$ test functions $v_j$:
$$(R, v_j) = 0, \qquad j = 0, 1, 2, \dots, N. \tag{14.4}$$
Recalling the chapter on linear analysis, this is equivalent to saying that the projection of $R$ on the set of functions $v_j$ is zero. For the inner product defined in equation 15.13, this is equivalent to
$$\int_\Omega R\,v_j\,dx = 0, \qquad j = 0, 1, 2, \dots, N. \tag{14.5}$$
A number of different numerical methods can be derived by assigning different choices to the test functions.

14.1.1 Collocation

If the test functions are defined as the Dirac delta functions $v_j = \delta(x - x_j)$, then constraint 14.4 becomes
$$R(x_j) = 0 \tag{14.6}$$
i.e. it requires the residual to be identically zero at the collocation points $x_j$. Finite differences can thus be cast as an MWR with collocation points defined on the finite difference grid. The residual is free to oscillate between the collocation points, where it is pinned to zero; the amplitude of the oscillations will decrease with the number of collocation points if the residual is a smooth function.

14.1.2 Least Squares

Setting the test functions to $v_j = \frac{\partial R}{\partial\hat u_j}$ is equivalent to minimizing the norm of the residual, $\|R\|^2 = (R, R)$.
Since the only parameters available in the problem are the $\hat u_j$, this is equivalent to finding the $\hat u_j$ that minimize $\|R\|^2$. This minimum occurs for $\hat u_j$ such that
$$\frac{\partial}{\partial\hat u_j}\int_\Omega R^2\,dx = 0 \tag{14.7}$$
$$\int_\Omega 2R\,\frac{\partial R}{\partial\hat u_j}\,dx = 0 \tag{14.8}$$
$$\left(R, \frac{\partial R}{\partial\hat u_j}\right) = 0 \tag{14.9}$$

14.1.3 Galerkin

In the Galerkin method the test functions are taken to be the same as the trial functions, so that $v_j = \phi_j$. This is the most popular choice in the FE community and will be the one we concentrate on. There are variations on the Galerkin method where the test functions are perturbations of the trial functions; this approach is usually referred to as the Petrov-Galerkin method. The perturbations are introduced to improve the numerical stability of the scheme, for example to introduce upwinding in the solution of advection-dominated flows.

14.2 FEM example in 1D

We illustrate the application of the FEM by focusing on a specific problem. Find $u(x)$ in $x \in [0, 1]$ such that
$$\frac{\partial^2u}{\partial x^2} - \lambda u + f = 0 \tag{14.10}$$
subject to the boundary conditions
$$u(x = 0) = 0, \tag{14.11}$$
$$\left.\frac{\partial u}{\partial x}\right|_{x=1} = q \tag{14.12}$$
Equation 14.11 is a Dirichlet boundary condition and is usually referred to as an essential boundary condition, while equation 14.12 is usually referred to as a natural boundary condition. The origin of these terms will become clearer shortly.

14.2.1 Weak Form

In order to cast the equation into a residual formulation, we require that its inner product with suitably chosen test functions $v$ be zero:
$$\int_0^1\left(\frac{\partial^2u}{\partial x^2} - \lambda u + f\right)v\,dx = 0 \tag{14.13}$$
The only condition we impose on the test function is that it be zero on those portions of the boundary where Dirichlet boundary conditions are applied; in this case $v(0) = 0$. Equation 14.13 is called the strong form of the PDE, as it requires the second derivative of the function to exist and be integrable. Imposing this constraint in geometrically complex regions is difficult, and we seek to reformulate the problem in a "weak" form such that only lower-order derivatives are needed.
We do this by integrating the second derivative in equation 14.13 by parts to obtain
$$\int_0^1\left(\frac{\partial u}{\partial x}\frac{\partial v}{\partial x} + \lambda uv\right)dx = \int_0^1 fv\,dx + \left.v\frac{\partial u}{\partial x}\right|_{x=1} - \left.v\frac{\partial u}{\partial x}\right|_{x=0} \tag{14.14}$$
The essential boundary condition on the left edge eliminates the third term on the right-hand side of the equation, since $v(0) = 0$, and the Neumann boundary condition at the right edge can be substituted in the second term on the right-hand side. The final form is thus
$$\int_0^1\left(\frac{\partial u}{\partial x}\frac{\partial v}{\partial x} + \lambda uv\right)dx = \int_0^1 fv\,dx + q\,v(1) \tag{14.15}$$
For the weak form to be sensible, we must require that the integrals appearing in the formulation be finite. The most severe restriction stems from the first-order derivatives appearing on the left-hand side of 14.15. For this term to be finite we must require that the functions $\frac{\partial u}{\partial x}$ and $\frac{\partial v}{\partial x}$ be integrable, i.e. piecewise continuous with finite jump discontinuities; this translates into the requirement that the functions $u$ and $v$ be continuous, the so-called $C^0$ continuity requirement.

14.2.2 Galerkin form

The solution is approximated with a finite sum as
$$u(x) = \sum_{i=0}^{N}\hat u_i\,\phi_i \tag{14.16}$$
and the test functions are taken to be $v = \phi_j$, $j = 1, 2, \dots, N$. The trial functions $\phi_i$ must be chosen such that $\phi_{i>0}(0) = 0$, in accordance with the restriction that $v(0) = 0$. We also set, without loss of generality, $\phi_0(0) = 1$; the first term of the series is then nothing but the value of the function at the edge where the Dirichlet condition is applied: $u(0) = \hat u_0$. The substitution of the expansion and test functions into the weak form yields the following system of $N$ equations in the $N + 1$ variables $\hat u_i$:
$$\sum_{i=0}^{N}\left[\int_0^1\left(\frac{\partial\phi_i}{\partial x}\frac{\partial\phi_j}{\partial x} + \lambda\phi_i\phi_j\right)dx\right]\hat u_i = \int_0^1 f\phi_j\,dx + q\,\phi_j(1) \tag{14.17}$$
In matrix form this can be rewritten as
$$\sum_{i=0}^{N}K_{ji}\hat u_i = b_j, \qquad K_{ji} = \int_0^1\left(\frac{\partial\phi_j}{\partial x}\frac{\partial\phi_i}{\partial x} + \lambda\phi_i\phi_j\right)dx, \qquad b_j = \int_0^1 f\phi_j\,dx + q\,\phi_j(1) \tag{14.18}$$
Note that the matrix $K$ is symmetric, $K_{ji} = K_{ij}$, so that only half the matrix entries need to be evaluated.
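The assembly of (14.18) becomes concrete once a basis is fixed. The sketch below assumes the piecewise-linear hat functions discussed in section 14.2.4, a uniform mesh, and a constant $f$; the function and argument names are hypothetical. It loops over elements and adds each element's exact integrals into the global matrix, which makes the symmetry $K_{ji} = K_{ij}$ visible.

```python
def assemble(num_elements, lam, f, q):
    """Global K and b of (14.18) for uniform linear hat functions on [0, 1].

    Assumptions for this sketch: constant f, node i at x_i = i*h, and the
    boundary row/column 0 still present (essential condition not yet imposed).
    """
    h = 1.0 / num_elements
    n_nodes = num_elements + 1
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    b = [0.0] * n_nodes
    # exact element integrals of phi_j' phi_i' + lam * phi_i phi_j
    k_elem = [[1.0 / h + lam * h / 3.0, -1.0 / h + lam * h / 6.0],
              [-1.0 / h + lam * h / 6.0, 1.0 / h + lam * h / 3.0]]
    for e in range(num_elements):
        nodes = (e, e + 1)                       # local -> global numbering
        for r in range(2):
            b[nodes[r]] += f * h / 2.0           # integral of f * phi over the element
            for c in range(2):
                K[nodes[r]][nodes[c]] += k_elem[r][c]
    b[-1] += q                                   # q * phi_N(1); only phi_N is 1 at x = 1
    return K, b


K, b = assemble(4, lam=2.0, f=1.0, q=0.5)
for row in K:
    print(" ".join(f"{entry:8.4f}" for entry in row))
print("b =", b)
```

With $\lambda = 0$ every row of $K$ sums to zero, because the hat functions sum to one and their derivatives therefore sum to zero. The boundary row and column 0 are still present here; they would be eliminated as in (14.19).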
The Galerkin formulation of the weak variational statement 14.15 will always produce a symmetric matrix regardless of the choice of expansion functions; the necessary condition for symmetry is that the left-hand-side operator in equation 14.15 be symmetric with respect to the $u$ and $v$ variables. The matrix $K$ is usually referred to as the stiffness matrix, a legacy term dating to the early application of the finite element method to problems in solid mechanics.

14.2.3 Essential Boundary Conditions

The series has the $N$ unknowns $\hat u_{1\le i\le N}$, so the matrix equation above must be modified to take the boundary condition into account. We do this by moving all known quantities to the right-hand side, and we end up with the following system of equations:
$$\sum_{i=1}^{N}K_{ji}\hat u_i = c_j, \qquad c_j = b_j - K_{j0}\hat u_0, \qquad j = 1, 2, \dots, N \tag{14.19}$$
Had the boundary condition on the right edge of the domain been of Dirichlet type, we would have to add the restriction $\phi_{1\le i\le N-1}(1) = 0$ on the trial functions. The imposition of Dirichlet conditions on both sides is considerably simplified if we further require that $\phi_0(1) = \phi_N(0) = 0$ and $\phi_0(0) = \phi_N(1) = 1$. Under these conditions $\hat u_0 = u(0) = u_0$ and $\hat u_N = u(1) = u_N$. We end up with the following $(N-1)\times(N-1)$ system of algebraic equations:
$$\sum_{i=1}^{N-1}K_{ji}\hat u_i = c_j, \qquad c_j = b_j - K_{j0}u_0 - K_{jN}u_N, \qquad j = 1, 2, \dots, N-1 \tag{14.20}$$

14.2.4 Choice of interpolation and test functions

To complete the discretization scheme we need to specify the type of interpolation functions to use. The choice is actually quite open, save for the restriction to continuous functions (to integrate the first-order derivative terms) and the imposition of the Dirichlet boundary conditions. There are two aspects to choosing the test functions: their locality and their interpolation properties. If the functions $\phi_i$ are defined over the entire domain, they are termed global expansion functions.
Such functions are most often used in spectral and pseudo-spectral methods. They provide a very accurate representation of smooth functions; in fact, the error decreases faster than any inverse power of $N$ if the solution is infinitely smooth, a property known as exponential convergence. This property is lost if the solution has finite continuity. The main drawback of global expansion functions is that the resulting matrices are full and tightly couple all degrees of freedom. Furthermore, the accurate representation of local features, such as fronts and boundary layers, requires long series expansions with a substantial increase in the computational cost. Finite element methods are based on local expansion functions: the domain is divided into elements wherein the solution is expanded into a finite series. The functions $\phi_i$ are thus non-zero only within one element (or the few elements sharing a node) and zero elsewhere. This local representation of the function is extremely useful if the solution has localized features such as boundary layers, local steep gradients, etc. The resulting matrices are sparser, and hence more efficient solution schemes become possible. The most popular choice of expansion function is the linear interpolation function, commonly referred to as the hat function, which we will explore later on. Higher-order expansions are also possible; in particular, the spectral element method chooses expansions that are high-order polynomials within each element. The nature of the expansion function refers to the nature of the expansion coefficients. If a spectral representation is chosen, then the unknowns become the generalized Fourier (or spectral) coefficients. Again, this is a common choice for spectral methods. The most common choice of expansion functions in finite element methods is Lagrangian interpolation functions, i.e.
functions that interpolate the solution at specified points $x_j$, also referred to as collocation points; in FEM these points are also referred to as nodes. Lagrangian interpolants are chosen to have the following property:

$$\phi_j(x_i) = \delta_{ij} \qquad (14.21)$$

where $\delta_{ij}$ is the Kronecker delta. The interpolation function $\phi_j$ is zero at all points $x_{i\ne j}$; at point $x_j$, $\phi_j(x_j) = 1$. Each interpolation function is associated with one collocation point.

216 CHAPTER 14. FINITE ELEMENT METHODS

[Figure 14.1: 2-element discretization of the interval [0, 1] showing the interpolation functions $\phi_0$, $\phi_1$, $\phi_2$ and the nodal coefficients $\hat u_0$, $\hat u_1$, $\hat u_2$.]

If our expansion functions are Lagrange interpolants, then the coefficients $\hat u_i$ represent the value of the function at the collocation points $x_j$:

$$u(x_j) = u_j = \sum_{i=0}^{N} \hat u_i\,\phi_i(x_j) = \hat u_j, \qquad j = 0, 1, \ldots, N \qquad (14.22)$$

We will omit the circumflex accents on the coefficients whenever the expansion functions are Lagrangian interpolation functions. The use of Lagrangian interpolation simplifies the imposition of the $C^0$ continuity requirement, and the function values at the collocation points are obtained directly without further processing. There are other expansion functions in use in the FE community. For example, Hermitian interpolation functions are used when the solution and its derivatives must be continuous across element boundaries (the solution is then $C^1$ continuous); Hermitian expansions are also used to model infinite elements. These expansion functions are usually reserved for special situations and we will not address them further.

In the following 3 sections we will illustrate how the FEM solution of equation 14.15 proceeds. We will approach the problem from 3 different perspectives in order to highlight the algorithmic steps of the finite element method. The first approach will consider a small expansion for the approximate solution; the matrix equation can then be written and inverted manually.
The second approach repeats this procedure using a longer expansion; the matrix entries are derived, but the solution of the larger system must be done numerically. The third approach considers the same large problem as the second, but introduces the local coordinate and numbering systems, and the mapping between the local and global systems. This local approach to constructing the FE stiffness matrix is key to its success and versatility, since it localizes the computational details to elements and subroutines. A great variety of local finite element approximations can then be introduced at the elemental level with little additional complexity at the global level.

14.2.5 FEM solution using 2 linear elements

We illustrate the application of the Galerkin procedure for a 2-element discretization of the interval [0, 1]. Element 1 spans the interval [0, 1/2] and element 2 the interval [1/2, 1], and we use the following interpolation procedure:

$$u(x) = u_0\,\phi_0(x) + u_1\,\phi_1(x) + u_2\,\phi_2(x) \qquad (14.23)$$

where the interpolation functions and their derivatives are tabulated below

$$\begin{array}{c|ccc|ccc} & \phi_0(x) & \phi_1(x) & \phi_2(x) & \partial\phi_0/\partial x & \partial\phi_1/\partial x & \partial\phi_2/\partial x \\ \hline 0 \le x \le \tfrac12 & 1-2x & 2x & 0 & -2 & 2 & 0 \\ \tfrac12 \le x \le 1 & 0 & 2(1-x) & 2x-1 & 0 & -2 & 2 \end{array} \qquad (14.24)$$

and shown in figure 14.1. It is easy to verify that the $\phi_i$ are Lagrangian interpolation functions at the 3 collocation points x = 0, 1/2, 1, i.e. $\phi_i(x_j) = \delta_{ij}$. Furthermore, the expansion functions are continuous across element interfaces, so that the $C^0$ continuity requirement is satisfied, but their derivatives are discontinuous. It is easy to show that the interpolation 14.23 amounts to a linear interpolation of the solution within each element.

Since the boundary condition at x = 0 is of Dirichlet type, we need only test with functions that satisfy v(0) = 0; in our case the functions $\phi_1$ and $\phi_2$ are the only candidates.
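Notice also that we have only 2 unknowns, $u_1$ and $u_2$, $u_0$ being known from the Dirichlet boundary condition; thus only 2 equations are needed to determine the solution. The matrix entries can now be determined. We illustrate this for two of the entries, assuming λ is constant for simplicity:

$$K_{10} = \int_0^1 \left(\frac{\partial \phi_1}{\partial x}\frac{\partial \phi_0}{\partial x} + \lambda\,\phi_1\phi_0\right) dx = \int_0^{1/2} \left[-4 + \lambda\,(2x - 4x^2)\right] dx = -2 + \frac{\lambda}{12} \qquad (14.25)$$

Notice that the integral over the entire domain reduces to an integral over a single element because the interpolation and test functions $\phi_0$ and $\phi_1$ are both non-zero only over element 1. This property, which localizes the operations needed to build the matrix equation, is key to the success of the method. The entry $K_{11}$ requires integration over both elements:

$$K_{11} = \int_0^1 \left(\frac{\partial \phi_1}{\partial x}\frac{\partial \phi_1}{\partial x} + \lambda\,\phi_1\phi_1\right) dx \qquad (14.26)$$

$$= \int_0^{1/2} \left[4 + 4\lambda x^2\right] dx + \int_{1/2}^{1} \left[4 + 4\lambda (1-x)^2\right] dx \qquad (14.27)$$

$$= \left(2 + \frac{2\lambda}{12}\right) + \left(2 + \frac{2\lambda}{12}\right) = 4 + \frac{4\lambda}{12} \qquad (14.28)$$

The remaining entries can be evaluated in a similar manner. The final matrix equation takes the form:

$$\begin{pmatrix} -2 + \frac{\lambda}{12} & 4 + \frac{4\lambda}{12} & -2 + \frac{\lambda}{12} \\ 0 & -2 + \frac{\lambda}{12} & 2 + \frac{2\lambda}{12} \end{pmatrix} \begin{pmatrix} u_0 \\ u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \qquad (14.29)$$

Note that since the value of $u_0$ is known we can move it to the right hand side to obtain the following system of equations:

$$\begin{pmatrix} 4 + \frac{4\lambda}{12} & -2 + \frac{\lambda}{12} \\ -2 + \frac{\lambda}{12} & 2 + \frac{2\lambda}{12} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} b_1 + \left(2 - \frac{\lambda}{12}\right) u_0 \\ b_2 \end{pmatrix} \qquad (14.30)$$

whose solution is given by

$$\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \frac{1}{\Delta} \begin{pmatrix} 2 + \frac{2\lambda}{12} & 2 - \frac{\lambda}{12} \\ 2 - \frac{\lambda}{12} & 4 + \frac{4\lambda}{12} \end{pmatrix} \begin{pmatrix} b_1 + \left(2 - \frac{\lambda}{12}\right) u_0 \\ b_2 \end{pmatrix} \qquad (14.31)$$

where $\Delta = 8\left(1 + \frac{\lambda}{12}\right)^2 - \left(\frac{\lambda}{12} - 2\right)^2$ is the determinant of the matrix. The only missing piece is the evaluation of the right hand side. This is easy since the function f and the flux q are known. It is possible to evaluate the integrals directly if the global expression for f is available. However, more often than not, f is either a complicated function or is known only at the collocation points. The interpolation methodology that was used for the unknown function can be used to interpolate the forcing function and evaluate the associated integrals.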
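Thus we write:

$$b_j = \int_0^1 f\,\phi_j\,dx + q\,\phi_j(1) \qquad (14.32)$$

$$= \int_0^1 \left(\sum_{i=0}^{2} f_i\,\phi_i\right)\phi_j\,dx + q\,\phi_j(1) \qquad (14.33)$$

$$= \sum_{i=0}^{2} \left(\int_0^1 \phi_i\,\phi_j\,dx\right) f_i + q\,\phi_j(1), \qquad \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \frac{1}{12}\begin{pmatrix} 1 & 4 & 1 \\ 0 & 1 & 2 \end{pmatrix}\begin{pmatrix} f_0 \\ f_1 \\ f_2 \end{pmatrix} + \begin{pmatrix} 0 \\ q \end{pmatrix} \qquad (14.34)$$

The final solution can thus be written as:

$$\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \frac{1}{\Delta}\begin{pmatrix} 2 + \frac{2\lambda}{12} & 2 - \frac{\lambda}{12} \\ 2 - \frac{\lambda}{12} & 4 + \frac{4\lambda}{12} \end{pmatrix} \left[\begin{pmatrix} \frac{f_0 + 4f_1 + f_2}{12} \\ \frac{f_1 + 2f_2}{12} \end{pmatrix} + \begin{pmatrix} 0 \\ q \end{pmatrix} + \begin{pmatrix} \left(2 - \frac{\lambda}{12}\right) u_0 \\ 0 \end{pmatrix}\right] \qquad (14.35)\text{-}(14.36)$$

If $u_0 = 0$, $\lambda = 0$ and $f = 0$, the analytical solution to the problem is $u = qx$. The finite element solution yields:

$$\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \frac{1}{4}\begin{pmatrix} 2 & 2 \\ 2 & 4 \end{pmatrix}\begin{pmatrix} 0 \\ q \end{pmatrix} = \begin{pmatrix} q/2 \\ q \end{pmatrix} \qquad (14.37)$$

which is the exact solution of the PDE. The FEM procedure produces the exact result because the solution to the PDE is linear in x. Notice that the FE solution is exact both at the interpolation points x = 0, 1/2, 1 and inside the elements. If f = −1 and the remaining parameters are unchanged, the exact solution is quadratic, $u_e = x^2/2 + (q-1)x$, and the finite element solution is

$$\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \frac{1}{4}\begin{pmatrix} 2 & 2 \\ 2 & 4 \end{pmatrix}\begin{pmatrix} -\frac{1}{2} \\ q - \frac{1}{4} \end{pmatrix} = \begin{pmatrix} \frac{4q-3}{8} \\ q - \frac{1}{2} \end{pmatrix} \qquad (14.38)$$

Notice that the FE procedure yields the exact value of the solution at the 3 interpolation points. The errors committed are due to the interpolation of the function within the elements; the solution is quadratic, whereas the FE interpolation provides only for a linear variation.

[Figure 14.2: Comparison of analytical (solid line) and FE (dashed line) solutions for the equation $u_{xx} + f = 0$ with homogeneous Dirichlet condition on the left edge and Neumann condition $u_x = 1$ on the right edge; panels: f = 0, f = −x, f = −x². The circles indicate the location of the interpolation points. Two linear finite elements are used in this example.]

For f = −x, the exact solution is $u_e = x^3/6 + (q - 1/2)x$, and the FE solution is:

$$\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \frac{1}{4}\begin{pmatrix} 2 & 2 \\ 2 & 4 \end{pmatrix}\begin{pmatrix} -\frac{1}{4} \\ q - \frac{5}{24} \end{pmatrix} = \begin{pmatrix} \frac{24q-11}{48} \\ q - \frac{1}{3} \end{pmatrix} \qquad (14.39)$$

The solution is again exact at the interpolation points but in error within the element due to the linear interpolation.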
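The two-element results can be reproduced with exact rational arithmetic. The sketch below is illustrative only: it assumes the weak form behind the entries above, namely $K_{ji} = \int_0^1(\phi_j'\phi_i' + \lambda\phi_j\phi_i)\,dx$ and $b_j = \int_0^1 f\phi_j\,dx + q\phi_j(1)$, with f replaced by its linear interpolant, and it evaluates every integral with Simpson's rule, which is exact here because each integrand is at most quadratic on an element. The names `phi`, `K`, `b` and `solve` are hypothetical helpers, not part of the notes.

```python
from fractions import Fraction as F

HALF = F(1, 2)
ELEMENTS = ((F(0), HALF), (HALF, F(1)))          # element 1 and element 2
SLOPES = ((-2, 2, 0), (0, -2, 2))                # d(phi_0, phi_1, phi_2)/dx on each element

def phi(i, x):
    """Hat functions of table (14.24); they are continuous, so x = 1/2 is unambiguous."""
    if i == 0:
        return 1 - 2 * x if x <= HALF else F(0)
    if i == 1:
        return 2 * x if x <= HALF else 2 * (1 - x)
    return F(0) if x <= HALF else 2 * x - 1

def simpson(g, a, c):
    """Simpson's rule on [a, c]; exact for polynomials up to degree 3."""
    return (c - a) / 6 * (g(a) + 4 * g((a + c) / 2) + g(c))

def K(j, i, lam):
    """K_ji = integral over [0, 1] of phi_j' phi_i' + lam phi_j phi_i."""
    return sum(simpson(lambda x: SLOPES[e][j] * SLOPES[e][i] + lam * phi(j, x) * phi(i, x), a, c)
               for e, (a, c) in enumerate(ELEMENTS))

def b(j, f, q):
    """b_j of (14.32)-(14.34): f replaced by its linear interpolant through f0, f1, f2."""
    fi = [f(F(0)), f(HALF), f(F(1))]
    f_h = lambda x: sum(fi[i] * phi(i, x) for i in range(3))
    load = sum(simpson(lambda x: f_h(x) * phi(j, x), a, c) for a, c in ELEMENTS)
    return load + q * phi(j, F(1))

def solve(lam, f, q, u0):
    """Reduced 2x2 system (14.30), solved by Cramer's rule."""
    r1 = b(1, f, q) - K(1, 0, lam) * u0          # known u0 moved to the right-hand side
    r2 = b(2, f, q) - K(2, 0, lam) * u0
    a11, a12, a22 = K(1, 1, lam), K(1, 2, lam), K(2, 2, lam)
    det = a11 * a22 - a12 * a12                  # the matrix is symmetric
    return (a22 * r1 - a12 * r2) / det, (a11 * r2 - a12 * r1) / det

q = F(1)
cases = {   # forcing and exact solution of u_xx + f = 0, u(0) = 0, u_x(1) = q
    "f = 0":  (lambda x: F(0),  lambda x: q * x),
    "f = -1": (lambda x: F(-1), lambda x: x**2 / 2 + (q - 1) * x),
    "f = -x": (lambda x: -x,    lambda x: x**3 / 6 + (q - HALF) * x),
}
for name, (f, ue) in cases.items():
    u1, u2 = solve(0, f, q, F(0))
    print(name, "nodal errors:", u1 - ue(HALF), u2 - ue(F(1)))
```

All three printed error pairs are zero. Calling `solve(0, lambda x: -x**2, q, F(0))` in the same way gives nodal values that no longer match the exact solution, since a quadratic forcing is not reproduced by its linear interpolant.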
This fortuitous state of affairs is due to the exact evaluation of the forcing term f, which is itself exactly interpolated by the linear functions. For $f = -x^2$, the exact solution is $u_e = x^4/12 + (q - 1/3)x$, and the FE solution is:

$$\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \frac{1}{4}\begin{pmatrix} 2 & 2 \\ 2 & 4 \end{pmatrix}\begin{pmatrix} -\frac{1}{6} \\ q - \frac{9}{48} \end{pmatrix} = \begin{pmatrix} \frac{48q-17}{96} \\ q - \frac{13}{48} \end{pmatrix} \qquad (14.40)$$

whereas the exact nodal values are $u_e(1/2) = (96q - 31)/192$ and $u_e(1) = q - 1/4$. This time the solution is in error at the interpolation points also, because the quadratic forcing is no longer represented exactly by its linear interpolant. Figure 14.2 compares the analytical and FE solutions for the 3 cases after setting q = 1.

14.2.6 FEM solution using N linear elements

[Figure 14.3: An element partition of the domain into N linear elements, showing the nodes $x_0, \ldots, x_{j-1}, x_j, x_{j+1}, \ldots, x_N$, the element lengths $\Delta x_j$, and the interpolation functions $\phi_0$, $\phi_{j-1}$, $\phi_j$, $\phi_{j+1}$, $\phi_N$. The element edges are indicated by the filled dots.]

In order to decrease the error further for cases where f is a more complex function, we need to increase the number of elements. This will increase the size of the matrix system and the computational cost. We go through the process of developing the stiffness equations for this system since it will be useful for understanding general FE concepts. Suppose that we have divided the interval into N elements (not necessarily of equal size); the interpolation formula then becomes:

$$u(x) = \sum_{i=0}^{N} u_i\,\phi_i(x) \qquad (14.41)$$

Element number j, shown in figure 14.3, spans the interval $[x_{j-1}, x_j]$, for j = 1, 2, ..., N; its left neighbor is element j − 1 and its right neighbor is element j + 1. The length of each element is $\Delta x_j = x_j - x_{j-1}$. The linear interpolation function associated with node j is

$$\phi_j(x) = \begin{cases} 0 & x < x_{j-1} \\ \dfrac{x - x_{j-1}}{x_j - x_{j-1}} & x_{j-1} \le x \le x_j \\ \dfrac{x_{j+1} - x}{x_{j+1} - x_j} & x_j \le x \le x_{j+1} \\ 0 & x_{j+1} < x \end{cases} \qquad (14.42)$$

Let us now focus on building the stiffness matrix row by row. The j-th row of K corresponds to setting the test function to $\phi_j$.
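Since the function $\phi_j$ is non-zero only on the interval $[x_{j-1}, x_{j+1}]$, the integrals reduce to integrations over that interval. We thus have:

$$K_{ji} = \int_{x_{j-1}}^{x_{j+1}} \left(\frac{\partial \phi_j}{\partial x}\frac{\partial \phi_i}{\partial x} + \lambda\,\phi_i\phi_j\right) dx \qquad (14.43)$$

$$b_j = \int_{x_{j-1}}^{x_{j+1}} f\,\phi_j\,dx + q\,\phi_j(1) \qquad (14.44)$$

Likewise, the function $\phi_i$ is non-zero only on elements i and i + 1, and hence $K_{ji} = 0$ unless i = j − 1, j, j + 1; the system of equations is hence tridiagonal. We now derive explicit expressions for the stiffness matrix entries for i = j, j ± 1:

$$K_{j,j-1} = \int_{x_{j-1}}^{x_j} \left(\frac{\partial \phi_j}{\partial x}\frac{\partial \phi_{j-1}}{\partial x} + \lambda\,\phi_{j-1}\phi_j\right) dx \qquad (14.45)$$

$$= \int_{x_{j-1}}^{x_j} \left[-\frac{1}{(x_j - x_{j-1})^2} + \lambda\,\frac{x - x_{j-1}}{x_j - x_{j-1}}\,\frac{x_j - x}{x_j - x_{j-1}}\right] dx \qquad (14.46)$$

$$= -\frac{1}{\Delta x_j} + \lambda\,\frac{\Delta x_j}{6} \qquad (14.47)$$

The entry $K_{j,j+1}$ can be deduced immediately by using symmetry and applying the above formula for j + 1; thus:

$$K_{j,j+1} = K_{j+1,j} = -\frac{1}{\Delta x_{j+1}} + \lambda\,\frac{\Delta x_{j+1}}{6} \qquad (14.48)$$

The sole entry remaining is i = j; in this case the integral spans elements j and j + 1:

$$K_{j,j} = \int_{x_{j-1}}^{x_j} \left(\frac{\partial \phi_j}{\partial x}\frac{\partial \phi_j}{\partial x} + \lambda\,\phi_j\phi_j\right) dx + \int_{x_j}^{x_{j+1}} \left(\frac{\partial \phi_j}{\partial x}\frac{\partial \phi_j}{\partial x} + \lambda\,\phi_j\phi_j\right) dx \qquad (14.49)$$

$$= \frac{1}{\Delta x_j^2}\int_{x_{j-1}}^{x_j} \left[1 + \lambda\,(x - x_{j-1})^2\right] dx + \frac{1}{\Delta x_{j+1}^2}\int_{x_j}^{x_{j+1}} \left[1 + \lambda\,(x_{j+1} - x)^2\right] dx \qquad (14.50)$$

$$= \frac{1}{\Delta x_j} + \lambda\,\frac{2\Delta x_j}{6} + \frac{1}{\Delta x_{j+1}} + \lambda\,\frac{2\Delta x_{j+1}}{6}$$

For a uniform grid all the interior rows of the matrix are identical; only the rows associated with the end points differ, in their diagonal entries. It is easy to show that we must have:

$$K_{0,0} = \frac{1}{\Delta x_1} + \lambda\,\frac{2\Delta x_1}{6} \qquad (14.51)$$

$$K_{N,N} = \frac{1}{\Delta x_N} + \lambda\,\frac{2\Delta x_N}{6} \qquad (14.52)$$

The evaluation of the right hand sides leads to the following equations for $b_j$:

$$b_j = \int_{x_{j-1}}^{x_{j+1}} f\,\phi_j\,dx + q\,\phi_j(1) \qquad (14.53)$$

$$= \sum_{i=j-1}^{j+1} \left(\int_{x_{j-1}}^{x_{j+1}} \phi_i\,\phi_j\,dx\right) f_i + q\,\phi_N(1)\,\delta_{Nj} \qquad (14.54)$$

$$= \frac{1}{6}\left[\Delta x_j\,f_{j-1} + 2(\Delta x_j + \Delta x_{j+1})\,f_j + \Delta x_{j+1}\,f_{j+1}\right] + q\,\phi_N(1)\,\delta_{Nj} \qquad (14.55)$$

Again, special consideration must be taken when dealing with the edge points to account for the boundary conditions properly. In the present case $b_0$ is not needed since a Dirichlet boundary condition is applied on the left boundary.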
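The row-by-row formulas can be turned into a small solver. This is a minimal sketch, assuming the model problem $u_{xx} - \lambda u + f = 0$ with $u(x_0) = u_0$ and $u_x(x_N) = q$ that these entries discretize, and using the Thomas algorithm for the tridiagonal solve (a standard choice, not one prescribed by the notes). The right-boundary row uses $b_N = \Delta x_N (f_{N-1} + 2f_N)/6 + q$. `thomas` and `linear_fe` are hypothetical names.

```python
import math

def thomas(a, d, c, r):
    """Solve a tridiagonal system: a = sub-diagonal, d = diagonal, c = super-diagonal."""
    n = len(d)
    cp, rp = [0.0] * n, [0.0] * n
    cp[0], rp[0] = c[0] / d[0], r[0] / d[0]
    for i in range(1, n):
        m = d[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        rp[i] = (r[i] - a[i] * rp[i - 1]) / m
    u = [0.0] * n
    u[-1] = rp[-1]
    for i in range(n - 2, -1, -1):
        u[i] = rp[i] - cp[i] * u[i + 1]
    return u

def linear_fe(x, lam, f, q, u0):
    """Nodal values of the linear-element solution with u(x[0]) = u0 and flux q at x[-1],
    assembled row by row from (14.47)-(14.52) and (14.55)."""
    N = len(x) - 1
    dx = [None] + [x[j] - x[j - 1] for j in range(1, N + 1)]   # dx[j] = length of element j
    fv = [f(s) for s in x]
    off = lambda j: -1.0 / dx[j] + lam * dx[j] / 6.0           # coupling across element j
    half = lambda j: 1.0 / dx[j] + lam * dx[j] / 3.0           # element j's share of a diagonal
    a, d, c, r = [], [], [], []
    for j in range(1, N + 1):                                  # u_0 is known: rows 1..N only
        last = j == N
        a.append(off(j) if j > 1 else 0.0)
        d.append(half(j) + (0.0 if last else half(j + 1)))
        c.append(0.0 if last else off(j + 1))
        rj = dx[j] * (fv[j - 1] + 2.0 * fv[j]) / 6.0
        rj += q if last else dx[j + 1] * (2.0 * fv[j] + fv[j + 1]) / 6.0
        if j == 1:
            rj -= off(1) * u0                                  # K_{1,0} u_0 moved to the rhs
        r.append(rj)
    return [u0] + thomas(a, d, c, r)

grid = [j / 10 for j in range(11)]                             # usage: lam = 4, f = 0, q = 1
u = linear_fe(grid, 4.0, lambda s: 0.0, 1.0, 0.0)
print(max(abs(ui - math.sinh(2 * s) / (2 * math.cosh(2))) for ui, s in zip(u, grid)))
```

With λ = 0 and a forcing that is linear in x, the returned nodal values match the exact solution to round-off on any grid, as in the two-element example; with λ ≠ 0 the nodal values are no longer exact, and the printed error should decrease at second order as the grid is refined.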
On the right boundary the right hand side term is given by:

$$b_N = \frac{1}{6}\left[\Delta x_N\,f_{N-1} + 2\Delta x_N\,f_N\right] + q \qquad (14.56)$$

Note that the flux boundary condition affects only the last entry of the right hand side. If the grid spacing is constant, a typical row of the matrix equation is:

$$\left(-\frac{1}{\Delta x} + \frac{\lambda\Delta x}{6}\right)u_{j-1} + 2\left(\frac{1}{\Delta x} + \frac{2\lambda\Delta x}{6}\right)u_j + \left(-\frac{1}{\Delta x} + \frac{\lambda\Delta x}{6}\right)u_{j+1} = \frac{\Delta x}{6}\left(f_{j-1} + 4f_j + f_{j+1}\right) \qquad (14.57)$$

For λ = 0 it is easy to show that the left hand side reduces to −Δx times the centered finite difference approximation of the second order derivative, $(u_{j+1} - 2u_j + u_{j-1})/\Delta x^2$. The finite element discretization produces a more complex approximation for the right hand side, involving a weighting of the function at several neighboring points.

14.2.7 Local stiffness matrix and global assembly

We note that in the previous section we built the stiffness matrix by constantly referring to a global node numbering system and a global coordinate system across all elements. Although this is practical and simple in one-dimensional problems, it becomes very tedious in two and three dimensions, where elements can have arbitrary orientation with respect to the global coordinate system. It is thus useful to transform the needed computations to a local coordinate system and a local numbering system in order to simplify and automate the building of the stiffness matrix. We illustrate these local entities in the one-dimensional case since they are easiest to grasp in this setting. For each element we introduce a local coordinate system that maps element j, defined over $x_{j-1} \le x \le x_j$, into $-1 \le \xi \le 1$. The following linear map is the simplest transformation that accomplishes that:

$$\xi = 2\,\frac{x - x_{j-1}}{\Delta x_j} - 1 \qquad (14.58)$$

This linear transformation maps the point $x_{j-1}$ into ξ = −1 and $x_j$ into ξ = 1; its inverse is simply

$$x = \Delta x_j\,\frac{\xi + 1}{2} + x_{j-1} \qquad (14.59)$$

[Figure 14.4: Local coordinate system and local numbering system: element j of length $\Delta x_j$ with local nodes $\xi_1$, $\xi_2$, nodal values $u^j_1$, $u^j_2$, and local interpolation functions $h_1(\xi)$, $h_2(\xi)$.]

We also introduce a local numbering system so the unknowns can be identified locally. The superscript j, whenever it appears, indicates that a local numbering system is used to refer to entities defined over element j. In the present instance $u^j_1$ refers to the global unknown $u_{j-1}$ and $u^j_2$ refers to the global unknown $u_j$. Finally, the global expansion functions $\phi_j$ are transformed into local expansion functions, so that the interpolation of the solution u within element j is:

$$u^j(\xi) = u^j_1\,h_1(\xi) + u^j_2\,h_2(\xi) \qquad (14.60)$$

where the functions $h_{1,2}$ are the local Lagrangian functions

$$h_1(\xi) = \frac{1-\xi}{2}, \qquad h_2(\xi) = \frac{1+\xi}{2} \qquad (14.61)$$

It is easy to show that $h_1$ should be identified with the right limb of the global function $\phi_{j-1}$, while $h_2$ should be identified with the left limb of the global function $\phi_j(x)$. The operations that must be carried out in the computational space include differentiation and integration. The differentiation in physical space is evaluated with the help of the chain rule:

$$\frac{\partial u^j}{\partial x} = \frac{\partial u^j}{\partial \xi}\frac{\partial \xi}{\partial x} = \frac{2}{\Delta x_j}\frac{\partial u^j}{\partial \xi} \qquad (14.62)$$

where $\partial\xi/\partial x$ is the metric of the mapping of element j from physical space to computational space. For the linear mapping used here this metric is constant. The derivative of the function in computational space is obtained by differentiating the interpolation formula 14.60:

$$\frac{\partial u^j}{\partial \xi} = u^j_1\,\frac{\partial h_1}{\partial \xi} + u^j_2\,\frac{\partial h_2}{\partial \xi} \qquad (14.63)$$

$$= \frac{u^j_2 - u^j_1}{2} \qquad (14.64)$$

For the linear interpolation functions used in this example, the derivative is constant throughout the element. We now introduce the local stiffness matrices, which are the contributions of the local element integration to the global stiffness matrix:

$$K^j_{m,n} = \int_{x_{j-1}}^{x_j} \left(\frac{\partial h_m}{\partial x}\frac{\partial h_n}{\partial x} + \lambda\,h_m(\xi)\,h_n(\xi)\right) dx, \qquad (m,n) = 1, 2 \qquad (14.65)$$

Notice that the local stiffness matrix has a small dimension, 2 × 2 for the linear interpolation functions, and is symmetric.
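The mapping and the chain rule are easy to check numerically. The helpers below (hypothetical names) implement (14.58)-(14.64) for a single linear element; they are only meant as a sanity check of the formulas, with an arbitrary element [0.3, 0.5].

```python
def to_xi(x, xl, dx):
    """Map x in the element [xl, xl + dx] onto xi in [-1, 1], equation (14.58)."""
    return 2.0 * (x - xl) / dx - 1.0

def to_x(xi, xl, dx):
    """Inverse map, equation (14.59)."""
    return dx * (xi + 1.0) / 2.0 + xl

def h1(xi):
    return (1.0 - xi) / 2.0           # local Lagrangian functions (14.61)

def h2(xi):
    return (1.0 + xi) / 2.0

def local_u(xi, u1, u2):
    """Interpolation inside the element, equation (14.60)."""
    return u1 * h1(xi) + u2 * h2(xi)

def local_dudx(u1, u2, dx):
    """Chain rule (14.62): metric 2/dx times du/dxi = (u2 - u1)/2 from (14.64)."""
    return (2.0 / dx) * (u2 - u1) / 2.0

xl, dx = 0.3, 0.2
print(to_xi(0.37, xl, dx), local_u(to_xi(0.37, xl, dx), 1.0, 2.5), local_dudx(1.0, 2.5, dx))
```

The physical derivative is simply the slope $(u^j_2 - u^j_1)/\Delta x_j$ of the straight line through the two nodal values, and $h_1(\xi(x))$ coincides with $(x_j - x)/\Delta x_j$, the right limb of $\phi_{j-1}$.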
We evaluate these integrals in computational space by breaking them up into 2 pieces, $D^j_{m,n}$ and $M^j_{m,n}$, defined as follows:

$$D^j_{m,n} = \int_{x_{j-1}}^{x_j} \frac{\partial h_m}{\partial x}\frac{\partial h_n}{\partial x}\,dx = \int_{-1}^{1} \frac{\partial h_m}{\partial \xi}\frac{\partial h_n}{\partial \xi}\left(\frac{\partial \xi}{\partial x}\right)^2 \frac{\partial x}{\partial \xi}\,d\xi \qquad (14.66)$$

$$M^j_{m,n} = \int_{x_{j-1}}^{x_j} h_m h_n\,dx = \int_{-1}^{1} h_m h_n\,\frac{\partial x}{\partial \xi}\,d\xi \qquad (14.67)$$

The integrals in physical space have been replaced with integrals in computational space in which the metric of the mapping appears. For the linear mapping and interpolation functions, these integrals can be easily evaluated:

$$M^j = \frac{\Delta x_j}{8}\int_{-1}^{1} \begin{pmatrix} (1-\xi)^2 & 1-\xi^2 \\ 1-\xi^2 & (1+\xi)^2 \end{pmatrix} d\xi = \frac{\Delta x_j}{6}\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \qquad (14.68)$$

Similarly, the matrix $D^j$ can be shown to be:

$$D^j = \frac{1}{2\Delta x_j}\int_{-1}^{1} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} d\xi = \frac{1}{\Delta x_j}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \qquad (14.69)$$

The local stiffness matrix is $K^j = D^j + \lambda M^j$. The matrix $M^j$ appears frequently in FEM; it is usually associated with a time-derivative term (absent here) and is referred to as the mass matrix.

Having derived expressions for the local stiffness matrix, what remains is to map them into the global stiffness matrix. The following relationships hold between the global stiffness matrix and the local stiffness matrices:

$$K_{j,j-1} = K^j_{2,1} \qquad (14.70)$$

$$K_{j,j} = K^j_{2,2} + K^{j+1}_{1,1} \qquad (14.71)$$

$$K_{j,j+1} = K^{j+1}_{1,2} \qquad (14.72)$$

The left hand sides in the above equations are the global entries, while those on the right hand sides are the local entries. The process of adding up the local contributions is called the stiffness assembly. Note that some global entries require contributions from different elements. In practical computer codes, the assembly is effected most efficiently by keeping track of the map between the local and global numbers in an array imap, where imap(1,j) gives the global node number of local node 1 in element j, and imap(2,j) gives the global node number of local node 2 in element j. For the one-dimensional case a simple, straightforward implementation is shown in the first loop of figure 14.5, where P stands for the number of collocation points within each element.
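A Python transcription of the local-to-global assembly for linear elements is sketched below. It assumes 0-based node numbering (the Fortran of figure 14.5 is 1-based) and a dense list-of-lists global matrix, which is adequate only for small illustrations; `local_stiffness` and `assemble` are hypothetical names. The assembled entries can be compared with the global formulas (14.47)-(14.51).

```python
def local_stiffness(dx, lam):
    """K^j = D^j + lam M^j for a linear element, from (14.68)-(14.69)."""
    D = [[1.0 / dx, -1.0 / dx], [-1.0 / dx, 1.0 / dx]]
    M = [[dx / 3.0, dx / 6.0], [dx / 6.0, dx / 3.0]]
    return [[D[m][n] + lam * M[m][n] for n in range(2)] for m in range(2)]

def assemble(x, lam):
    """Stiffness assembly over the elements [x[j], x[j+1]] through a connectivity table."""
    N = len(x) - 1                                  # number of elements
    imap = [[j, j + 1] for j in range(N)]           # imap[j][m] = global node of local node m
    K = [[0.0] * (N + 1) for _ in range(N + 1)]
    for j in range(N):
        Kl = local_stiffness(x[j + 1] - x[j], lam)
        for m in range(2):
            for n in range(2):
                K[imap[j][m]][imap[j][n]] += Kl[m][n]   # add the local contribution
    return K

K = assemble([0.0, 0.2, 0.5, 1.0], 2.0)             # three unequal elements, lambda = 2
```

The diagonal entry of an interior node collects one contribution from each of its two elements, exactly as in (14.71), and entries between nodes that share no element stay zero, which is why the global matrix is tridiagonal.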
For linear interpolation, P = 2. The scatter, gather and assembly operations between local and global nodes can now be easily coded, as shown in the second and third loops of figure 14.5.

```fortran
program fe_connectivity
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: N  = 10               ! number of elements
  integer, parameter :: P  = 3                ! number of nodes per element
  integer, parameter :: Nt = N*(P-1) + 1      ! total number of nodes
  integer  :: imap(P,N)                       ! connectivity
  real(dp) :: ul(P,N), vl(P,N)                ! local variables
  real(dp) :: u(Nt), v(Nt)                    ! global variables
  real(dp) :: Kl(P,P,N)                       ! local stiffness matrices
  real(dp) :: K(Nt,Nt)                        ! global stiffness matrix
  integer  :: j, m, n, mg, ng

  u = 0.0_dp; v = 0.0_dp; vl = 0.0_dp; Kl = 0.0_dp   ! placeholder data

  ! Assign connectivity in 1D
  do j = 1, N                                 ! element loop
     do m = 1, P                              ! loop over collocation points
        imap(m,j) = (j-1)*(P-1) + m           ! assign global node numbers
     enddo
  enddo

  ! Gather/scatter operation
  do j = 1, N                                 ! element loop
     do m = 1, P                              ! loop over local node numbers
        mg = imap(m,j)                        ! global node number of node m in element j
        ul(m,j) = u(mg)                       ! local gathering operation
        v(mg) = vl(m,j)                       ! local scattering; an edge node shared by two
                                              ! elements keeps the last value written
     enddo
  enddo

  ! Assembly operation
  K = 0.0_dp                                  ! global stiffness matrix
  do j = 1, N                                 ! element loop
     do n = 1, P
        ng = imap(n,j)                        ! global node number of local node n
        do m = 1, P
           mg = imap(m,j)                     ! global node number of local node m
           K(mg,ng) = K(mg,ng) + Kl(m,n,j)
        enddo
     enddo
  enddo
end program fe_connectivity
```

Figure 14.5: Gather, scatter and stiffness assembly codes.

14.2.8 Quadratic Interpolation

With the local approach to stiffness assembly, it is now simple to define more complicated local approximations. Here we explore the possibility of using quadratic interpolation to improve our solution. The local interpolation takes the form

$$u^j(\xi) = u^j_1\,h_1(\xi) + u^j_2\,h_2(\xi) + u^j_3\,h_3(\xi) \qquad (14.73)$$

$$h_1(\xi) = -\xi\,\frac{1-\xi}{2} \qquad (14.74)$$

$$h_2(\xi) = 1 - \xi^2 \qquad (14.75)$$

$$h_3(\xi) = \xi\,\frac{1+\xi}{2} \qquad (14.76)$$

It is easy to verify that the $h_i(\xi)$ are Lagrangian interpolants at the collocation points $\xi_i = -1, 0, 1$. These functions are shown in the top right panel of figure 14.6.
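The local matrices of any polynomial basis can be computed exactly with a little polynomial arithmetic. In this sketch a polynomial in ξ is a Python list of `Fraction` coefficients (lowest degree first), an illustrative representation chosen for exactness, and the constant metric $\partial x/\partial\xi = \Delta x_j/2$ of the linear map (14.59) is assumed. Applied to the linear basis (14.61) it reproduces (14.68)-(14.69), and it applies unchanged to the quadratic basis just defined. All function names are hypothetical.

```python
from fractions import Fraction as F

def pmul(p, q):
    """Product of two polynomials stored as coefficient lists (lowest degree first)."""
    r = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for k, c in enumerate(q):
            r[i + k] += a * c
    return r

def pder(p):
    """Derivative of a coefficient list."""
    return [i * c for i, c in enumerate(p)][1:]

def pint(p):
    """Exact integral over [-1, 1]: odd powers vanish, xi^i gives 2/(i+1) for even i."""
    return sum(F(2, i + 1) * c for i, c in enumerate(p) if i % 2 == 0)

def peval(p, xi):
    """Evaluate a coefficient list at xi."""
    return sum(c * xi**i for i, c in enumerate(p))

LINEAR = [[F(1, 2), F(-1, 2)],               # h1 = (1 - xi)/2, equation (14.61)
          [F(1, 2), F(1, 2)]]                # h2 = (1 + xi)/2
QUADRATIC = [[F(0), F(-1, 2), F(1, 2)],      # h1 = -xi (1 - xi)/2, equation (14.74)
             [F(1), F(0), F(-1)],            # h2 = 1 - xi^2,       equation (14.75)
             [F(0), F(1, 2), F(1, 2)]]       # h3 =  xi (1 + xi)/2, equation (14.76)

def local_matrices(basis, dx):
    """D^j and M^j of (14.66)-(14.67) for the linear map, whose metric dx/dxi is dx/2."""
    dx = F(dx)
    P = len(basis)
    D = [[2 / dx * pint(pmul(pder(basis[m]), pder(basis[n]))) for n in range(P)]
         for m in range(P)]
    M = [[dx / 2 * pint(pmul(basis[m], basis[n])) for n in range(P)] for m in range(P)]
    return D, M

D, M = local_matrices(QUADRATIC, F(1, 4))    # quadratic element of length 1/4
print(D)
print(M)
```

Exact rational arithmetic makes the comparison with closed-form matrices an equality test rather than a tolerance check.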
Notice that there are now 3 degrees of freedom per element, and that the interpolation function associated with the interior node does not interact with the interpolation functions defined in other elements. The local matrices can be evaluated analytically:

$$D^j = \frac{1}{3\Delta x_j}\begin{pmatrix} 7 & -8 & 1 \\ -8 & 16 & -8 \\ 1 & -8 & 7 \end{pmatrix}, \qquad M^j = \frac{\Delta x_j}{30}\begin{pmatrix} 4 & 2 & -1 \\ 2 & 16 & 2 \\ -1 & 2 & 4 \end{pmatrix} \qquad (14.77)$$

The assembly procedure can now be done as before, with the proviso that the local node numbers m run from 1 to 3. In the present instance the global system of equations is pentadiagonal and is more expensive to invert than the tridiagonal system obtained with the linear interpolation functions. One would expect improved accuracy, however.

14.2.9 Spectral Interpolation

Generalizing the approach to higher order polynomial interpolation is possible. As the degree of the interpolating polynomial increases, however, the well-known Runge phenomenon rears its ugly head when the collocation points are equally spaced. This phenomenon manifests itself in oscillations near the edges of the interpolation interval. It can be cured by a more judicious choice of the collocation points. This is the approach followed by the spectral element method, where the polynomial interpolation is still cast in terms of high order Lagrangian interpolation polynomials, but the collocation points are chosen to be the Gauss-Lobatto roots of special polynomials. The most common polynomials used are the Legendre polynomials, since their Gauss-Lobatto roots possess excellent interpolation and quadrature properties. The Legendre spectral interpolation takes the form

$$u^j(\xi) = \sum_{m=1}^{P} u^j_m\,h_m(\xi) \qquad (14.78)$$

$$h_m(\xi) = \prod_{n=1,\,n\ne m}^{P} \frac{\xi - \xi_n}{\xi_m - \xi_n} = \frac{-(1-\xi^2)\,L'_{P-1}(\xi)}{P(P-1)\,L_{P-1}(\xi_m)\,(\xi - \xi_m)}, \qquad m = 1, 2, \ldots, P \qquad (14.79)$$

$L_{P-1}$ denotes the Legendre polynomial of degree (P − 1) and $L'_{P-1}$ denotes its derivative. The P collocation points $\xi_n$ are the P Gauss-Lobatto roots of the Legendre polynomial of degree P − 1, i.e. they are the roots of the equation:

$$(1 - \xi_n^2)\,L'_{P-1}(\xi_n) = 0 \qquad (14.80)$$

[Figure 14.6: Plot of the Lagrangian interpolants for different polynomial degrees: linear (top left), quadratic (top right), cubic (bottom left), and fifth (bottom right). The collocation points are shown by circles, and are located at the Gauss-Lobatto roots of the Legendre polynomial. The superscript indicates the total number of collocation points, and the subscript the collocation point with which the polynomial interpolant is associated.]

[Figure 14.7: Distribution of collocation points $\xi_1, \xi_2, \ldots, \xi_m, \ldots, \xi_P$ in a spectral element. In this example there are 8 collocation points (polynomial of degree 7).]

The collocation points are shown in figure 14.7. Equation 14.79 shows the two different forms in which we can express the Lagrangian interpolant: the traditional notation expresses $h_m$ as a product of P − 1 factors chosen so as to guarantee the Lagrangian property; the second form is particular to the choice of Legendre Gauss-Lobatto points, see Boyd (1989); Canuto et al. (1988). It is easy to show that $h_m(\xi_n) = \delta_{mn}$, and that the $h_m$ are polynomials of degree P − 1. Note that, unlike the previous cases, the collocation points are not equally spaced within each element but cluster near the element boundaries: the collocation spacing is $O(1/(P-1)^2)$ near the boundary and $O(1/(P-1))$ near the center of the element. These functions are shown in figure 14.6 for P = 4 and 6. The interpolants associated with the P − 2 interior points are localized to a single element and do not interact with the interpolation functions of neighboring elements; the edge interpolation polynomials have support in two neighboring elements.
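The Gauss-Lobatto points (14.80) and the two forms of (14.79) can be compared numerically. This sketch (hypothetical helper names) evaluates Legendre polynomials by their three-term recurrence and locates the points with the iteration $\xi \leftarrow \xi - (\xi L_N - L_{N-1})/((N+1)L_N)$, $N = P - 1$, whose fixed points are the roots of $(1-\xi^2)L'_N(\xi) = N(L_{N-1}(\xi) - \xi L_N(\xi))$. The Chebyshev-Lobatto starting guesses are a common choice, assumed here rather than taken from the notes.

```python
import math

def legendre(n, x):
    """Return (L_n(x), L_{n-1}(x)) from the three-term recurrence, n >= 1."""
    p_prev, p = 1.0, x
    for k in range(2, n + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    return p, p_prev

def gll_points(P):
    """The P roots of (1 - xi^2) L'_{P-1}(xi) = 0, equation (14.80), in increasing order."""
    N = P - 1
    pts = []
    for k in range(P):
        x = -math.cos(math.pi * k / N)          # Chebyshev-Lobatto starting guess
        for _ in range(100):
            LN, LNm1 = legendre(N, x)
            step = (x * LN - LNm1) / ((N + 1) * LN)
            x -= step
            if abs(step) < 1e-15:
                break
        pts.append(x)
    return pts

def h_product(pts, m, xi):
    """First form of (14.79): product of P - 1 linear factors."""
    val = 1.0
    for n, xn in enumerate(pts):
        if n != m:
            val *= (xi - xn) / (pts[m] - xn)
    return val

def h_legendre(pts, m, xi):
    """Second form of (14.79) for xi != pts[m], with (1 - xi^2) L'_N = N (L_{N-1} - xi L_N)."""
    N = len(pts) - 1
    LN, LNm1 = legendre(N, xi)
    LNm, _ = legendre(N, pts[m])
    return -N * (LNm1 - xi * LN) / (N * (N + 1) * LNm * (xi - pts[m]))

print(gll_points(4))       # -1, -1/sqrt(5), 1/sqrt(5), 1 up to round-off
```

The product form works for any set of distinct points, while the second form is cheaper to evaluate but holds only at Legendre Gauss-Lobatto points, which is the distinction drawn in the text.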
The evaluation of the derivative of the solution at specified points $\eta_n$ is equivalent to:

$$u'(\eta_n) = \sum_{m=1}^{P} h'_m(\eta_n)\,u_m \qquad (14.81)$$

and can be cast in the form of a matrix-vector product, where the matrix entries are the derivatives of the interpolation polynomials at the specified points $\eta_n$. The only complication is that the integrals needed for the local matrices become more involved. For this reason, it is common to evaluate the integrals numerically using high order Gaussian quadrature; see section 14.2.10. Once the local matrices are computed, the assembly procedure can be performed with the local node numbering m running from 1 to P.

14.2.10 Numerical Integration

Although it is possible to evaluate the integrals analytically for each interpolation polynomial, the task becomes complicated and error prone. Furthermore, the presence of variable coefficients in the equations may complicate the integrands and raise their order. The problem becomes compounded in multi-dimensional problems. It is thus customary to revert to numerical integration to fill in the entries of the different matrices. Gauss quadrature estimates the definite integral of a function with the weighted sum of the function evaluated at specified points called quadrature points:

$$\int_{-1}^{1} g(\xi)\,d\xi = \sum_{p=1}^{Q} g(\xi^G_p)\,\omega_p + R_Q \qquad (14.82)$$

where Q is the order of the quadrature formula and $\xi^G_p$ are the Gauss quadrature points; the superscript is meant to distinguish the quadrature points from the collocation points. $R_Q$ is the remainder of approximating the integral with a weighted sum; it is usually proportional to a high order derivative of the function g.

Gauss Quadrature

The Gauss quadrature is one of the most common quadrature formulas in use. Its quadrature points are given by the roots of the Q-th degree Legendre polynomial; its weights and remainder are:

$$\omega^G_p = \frac{2}{\left(1 - (\xi^G_p)^2\right)\left[L'_Q(\xi^G_p)\right]^2}, \qquad p = 1, 2, \ldots, Q \qquad (14.83)$$

$$R_Q = \frac{2^{2Q+1}\,(Q!)^4}{(2Q+1)\left[(2Q)!\right]^3}\,\frac{\partial^{2Q} g}{\partial \xi^{2Q}}\bigg|_{\tilde\xi}, \qquad |\tilde\xi| < 1 \qquad (14.84)$$

Gauss quadrature omits the end points of the interval, ξ = ±1, and considers only interior points. Notice that if the integrand is a polynomial of degree at most 2Q − 1, its 2Q-th derivative vanishes and so does the remainder. Thus a Q-point Gauss quadrature integrates all polynomials of degree less than 2Q exactly.

Gauss-Lobatto Quadrature

The Gauss-Lobatto quadrature formula includes the end points in its estimate of the integral. The roots, weights, and remainder of a Q-order Gauss-Lobatto quadrature are:

$$\left(1 - (\xi^{GL}_p)^2\right) L'_{Q-1}(\xi^{GL}_p) = 0 \qquad (14.85)$$

$$\omega^{GL}_p = \frac{2}{Q(Q-1)\left[L_{Q-1}(\xi^{GL}_p)\right]^2}, \qquad p = 1, 2, \ldots, Q \qquad (14.86)$$

$$R_Q = \frac{-Q(Q-1)^3\,2^{2Q-1}\left[(Q-2)!\right]^4}{(2Q-1)\left[(2Q-2)!\right]^3}\,\frac{\partial^{2Q-2} g}{\partial \xi^{2Q-2}}\bigg|_{\tilde\xi}, \qquad |\tilde\xi| < 1 \qquad (14.87)$$

A Q-point Gauss-Lobatto quadrature integrates exactly all polynomials of degree less than or equal to 2Q − 3.

Quadrature order and FEM

The most frequently encountered integrals are those associated with the mass matrix, equation 14.67, and the diffusion matrix, equation 14.66. We will illustrate how the order of integration can be determined to estimate these integrals accurately. We assume for the time being that the interpolation polynomial $h_m$ is of degree P − 1 (there are P collocation points within each element), and that the metric of the mapping is constant. The integrand in equation 14.66 is of degree 2(P − 2), and the one in equation 14.67 is of degree 2(P − 1). If Gauss quadrature is used and exact integration is desired, then, since a Q-point rule is exact up to degree 2Q − 1, the order of the quadrature must be Q ≥ P − 1 in order to evaluate 14.66 exactly and Q ≥ P in order to evaluate 14.67 exactly. Usually a single quadrature rule is chosen to effect all needed integrations; in this case Q = P points suffice to evaluate the matrices M and D in equations 14.66-14.67 exactly. Higher quadrature may be required if additional terms are included in the integrand, for example when the metric of the mapping is not constant.
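The exactness statements for both rules can be verified directly on monomials. The sketch below is an illustration, not a production quadrature library: Gauss nodes come from Newton's method on $L_Q$, with the derivative obtained from $(x^2-1)L'_Q = Q(xL_Q - L_{Q-1})$, and the Gauss-Lobatto rule uses the standard weights $2/(Q(Q-1)[L_{Q-1}(\xi_p)]^2)$. Function names are hypothetical.

```python
import math

def legendre(n, x):
    """(L_n(x), L_{n-1}(x)) from the three-term recurrence, n >= 1."""
    p_prev, p = 1.0, x
    for k in range(2, n + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    return p, p_prev

def gauss(Q):
    """Q-point Gauss rule: roots of L_Q and the weights of (14.83)."""
    nodes, weights = [], []
    for k in range(1, Q + 1):
        x = math.cos(math.pi * (k - 0.25) / (Q + 0.5))   # standard starting guess
        for _ in range(100):
            L, Lm1 = legendre(Q, x)
            step = L / (Q * (x * L - Lm1) / (x * x - 1.0))   # Newton step L_Q / L'_Q
            x -= step
            if abs(step) < 1e-15:
                break
        L, Lm1 = legendre(Q, x)
        dL = Q * (x * L - Lm1) / (x * x - 1.0)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dL * dL))
    return nodes, weights

def gauss_lobatto(Q):
    """Q-point Gauss-Lobatto rule (Q >= 2): roots of (1 - x^2) L'_{Q-1} and their weights."""
    N = Q - 1
    nodes, weights = [], []
    for k in range(Q):
        x = -math.cos(math.pi * k / N)
        for _ in range(100):
            L, Lm1 = legendre(N, x)
            step = (x * L - Lm1) / (Q * L)
            x -= step
            if abs(step) < 1e-15:
                break
        L, _ = legendre(N, x)
        nodes.append(x)
        weights.append(2.0 / (Q * N * L * L))
    return nodes, weights

def integrate(rule, g):
    nodes, weights = rule
    return sum(w * g(x) for x, w in zip(nodes, weights))

def monomial_integral(k):
    """Exact integral of xi^k over [-1, 1]."""
    return 2.0 / (k + 1) if k % 2 == 0 else 0.0

print(gauss_lobatto(3))    # Simpson's rule: weights close to 1/3, 4/3, 1/3
```

Raising the monomial degree by one even step beyond 2Q − 1 (Gauss) or 2Q − 3 (Gauss-Lobatto) produces a visible error, consistent with the remainders (14.84) and (14.87).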
Exact evaluation of the integrals using Gauss-Lobatto quadrature requires more points, since it is less accurate than Gauss quadrature: Q ≥ (2P + 1)/2, i.e. Q ≥ P + 1, Gauss-Lobatto points are needed to compute the mass matrix exactly.

Although the order of Gauss quadrature is higher, it is not always the optimal choice; other considerations may favor Gauss-Lobatto quadrature and reduced (inexact) integration. Consider a quadrature rule where the collocation and quadrature points are identical; such a rule is possible if one chooses the Gauss-Lobatto quadrature of order P, where P is the number of points in each element; then $\xi^{GL}_m = \xi_m$ for m = 1, ..., P. The evaluation of the local mass matrix becomes:

$$M^j_{m,n} = \int_{-1}^{1} h_m(\xi)\,h_n(\xi)\,\frac{\partial x}{\partial \xi}\,d\xi \qquad (14.88)$$

$$\approx \sum_{p=1}^{P} h_m(\xi_p)\,h_n(\xi_p)\,\frac{\partial x}{\partial \xi}\bigg|_{\xi_p}\omega_p \qquad (14.89)$$

$$= \sum_{p=1}^{P} \delta_{m,p}\,\delta_{n,p}\,\frac{\partial x}{\partial \xi}\bigg|_{\xi_p}\omega_p \qquad (14.90)$$

$$= \delta_{m,n}\,\frac{\partial x}{\partial \xi}\bigg|_{\xi_m}\omega_m \qquad (14.91)$$

Equation 14.91 shows that the mass matrix becomes diagonal when the quadrature and collocation points are identical. This rule applies equally well had we chosen the Gauss points for quadrature and collocation. However, the Gauss-Lobatto roots are preferred since they include the element end points and thus simplify the imposition of $C^0$ continuity across elements (in formulations where $C^0$ continuity is not necessary, Gauss quadrature and collocation become sensible). The implication of a diagonal mass matrix is profound, for it considerably simplifies the computations of time-dependent problems. As we will see later, the time integration requires the inversion of the mass matrix, and this task is trivial when the mass matrix is diagonal. The process of reducing the mass matrix to a diagonal is occasionally referred to as mass lumping. One should be careful when low order finite elements are used to build the mass matrix, as the reduced quadrature introduces error.
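The effect of choosing the quadrature points equal to the collocation points can be seen on a single quadratic element (P = 3). In this sketch (hypothetical helper names) the consistent mass matrix uses the 3-point Gauss rule, which is exact for the degree-4 integrand, while the lumped one uses the 3-point Gauss-Lobatto rule on the collocation points −1, 0, 1; the element length 0.5 is arbitrary.

```python
import math

def lagrange(nodes, m, xi):
    """Lagrangian interpolant h_m on the collocation points `nodes`."""
    val = 1.0
    for n, xn in enumerate(nodes):
        if n != m:
            val *= (xi - xn) / (nodes[m] - xn)
    return val

def mass_matrix(nodes, qpts, qwts, dx):
    """M_mn ~ sum_p h_m(xi_p) h_n(xi_p) (dx/2) w_p, equation (14.89) with metric dx/2."""
    P = len(nodes)
    return [[sum(lagrange(nodes, m, x) * lagrange(nodes, n, x) * w * dx / 2.0
                 for x, w in zip(qpts, qwts)) for n in range(P)] for m in range(P)]

GLL3 = ([-1.0, 0.0, 1.0], [1.0 / 3.0, 4.0 / 3.0, 1.0 / 3.0])                  # Gauss-Lobatto
G3 = ([-math.sqrt(0.6), 0.0, math.sqrt(0.6)], [5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0])  # Gauss

dx = 0.5
nodes = GLL3[0]                                        # quadratic element, P = 3
consistent = mass_matrix(nodes, G3[0], G3[1], dx)      # exact: degree 4 <= 2*3 - 1
lumped = mass_matrix(nodes, GLL3[0], GLL3[1], dx)      # quadrature points = collocation points
for row_c, row_l in zip(consistent, lumped):
    print([round(v, 6) for v in row_c], [round(v, 6) for v in row_l])
```

The lumped matrix is diagonal, and each diagonal entry equals the corresponding row sum of the consistent matrix, because the 3-point Gauss-Lobatto rule still integrates each $h_m$ (degree 2) exactly; what is lost is the coupling between neighboring nodes.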
For low order interpolation (linear and quadratic), mass lumping reduces the accuracy of the finite element method substantially; the impact is less pronounced for higher order interpolation. The rule of thumb is that mass lumping is terrible for linear elements and has little impact for P > 3.

Example 17 We solve the 1D problem $u_{xx} - 4u = 0$ in $0 \le x \le 1$, subject to the boundary conditions u(0) = 0 and $u_x(1) = 1$. The analytical solution is $u = \sinh(2x)/(2\cosh 2)$. The rms error between the finite element solution and the analytical solution is shown in figure 14.8 as a function of the number of degrees of freedom. The plots show the error for the linear (red) and quadratic (blue) interpolation, and for the spectral element method (black). The left panel shows a semi-logarithmic plot to highlight the exponential convergence property of the spectral element method, while the right panel shows a log-log plot of the same quantities to show the algebraic decrease of the error as a function of resolution for the linear and quadratic interpolation, as evidenced by the straight convergence lines. The slopes of the convergence curves for the spectral element method keep increasing as the number of degrees of freedom is increased. This decrease is most pronounced when the degrees of freedom are added by increasing the interpolation order within each element rather than by increasing the number of elements.

[Figure 14.8: Convergence curves for the finite element solution of $u_{xx} - 4u = 0$: rms error $\|\epsilon\|_2$ versus the number of degrees of freedom K(N − 1) + 1, on semi-logarithmic (left) and log-log (right) axes. The red and blue lines show the solutions obtained using linear and quadratic interpolation, respectively, using exact integration; the black lines show the spectral element solution.]

Note that the spectral element computations used
Gauss-Lobatto quadrature to evaluate the integrals, whereas exact integration was used for the linear and quadratic finite elements. The inexact quadrature does not destroy the spectral character of the method.

14.3 Mathematical Results

The following mathematical results are presented without proof, given the substantial mathematical sophistication of their derivation.

14.3.1 Uniqueness and Existence of continuous solution

The existence and uniqueness of the solution to the weak form is guaranteed by the Lax-Milgram theorem:

Theorem 1 (Lax-Milgram) Let V be a Hilbert space with norm $\|\cdot\|_V$, and consider the bilinear form $A(u,v): V \times V \to \mathbb{R}$ and the bounded linear functional $F(v): V \to \mathbb{R}$. If the bilinear form is bounded and coercive, i.e. there exist positive constants ρ and β such that

• continuity of A: $|A(u,v)| \le \beta\,\|u\|_V\,\|v\|_V \quad \forall u, v \in V$
• coercivity of A: $\rho\,\|u\|_V^2 \le A(u,u) \quad \forall u \in V$

then there exists a unique $\hat u \in V$ such that

$$A(\hat u, v) = F(v) \quad \forall v \in V \qquad (14.92)$$

The continuity condition guarantees that the operator A is bounded: $|A(u,u)| \le \beta\,\|u\|_V^2$. This, in combination with the coercivity condition, guarantees that $A(u,u)$ is equivalent to the squared norm of V: $\rho\,\|u\|_V^2 \le A(u,u) \le \beta\,\|u\|_V^2$. The above theorem guarantees the existence and uniqueness of the continuous solution. We take up the issue of the discrete solution in the following section.

14.3.2 Uniqueness and Existence of the discrete solution

The infinite-dimensional continuous solution $\hat u$ must be approximated with a finite dimensional approximation $u_N$, where N characterizes the dimensionality (number of degrees of freedom) of the discrete solution. Let $V_N \subset V$ be a finite dimensional subspace of V providing a dense coverage of V, so that $\lim_{N\to\infty} V_N = V$. Since $V_N$ is a subset of V, the conditions of the Lax-Milgram theorem are fulfilled and hence a unique solution exists for the discrete problem. The case where the discrete space $V_N \subset V$ ($V_N$ is a subset of V) is called the conforming case.
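The Lax-Milgram conditions have a simple discrete counterpart: on $V_N$ the bilinear form becomes $A(u_N, u_N) = U^T K U$, so coercivity implies that the stiffness matrix with the Dirichlet node removed is symmetric positive definite. The sketch below checks this with a Cholesky factorization of the uniform linear-element matrix, assuming the form $A(u,v) = \int_0^1 (u_x v_x + \lambda u v)\,dx$ whose entries appear in (14.43)-(14.52). It illustrates the property rather than proving it, and the helper names are hypothetical.

```python
import math
import random

def reduced_stiffness(N, lam):
    """Stiffness matrix of N equal linear elements on [0, 1], entries (14.47)-(14.52),
    with the Dirichlet node x = 0 removed (unknowns u_1..u_N)."""
    dx = 1.0 / N
    K = [[0.0] * N for _ in range(N)]
    for i in range(N):
        if i < N - 1:
            K[i][i] = 2.0 / dx + 2.0 * lam * dx / 3.0      # interior node
            K[i][i + 1] = K[i + 1][i] = -1.0 / dx + lam * dx / 6.0
        else:
            K[i][i] = 1.0 / dx + lam * dx / 3.0            # node x = 1 (flux boundary)
    return K

def cholesky(A):
    """Lower-triangular L with A = L L^T; raises ValueError if A is not positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

random.seed(1)
K = reduced_stiffness(20, 4.0)
cholesky(K)                                 # succeeds: K is symmetric positive definite
v = [random.uniform(-1.0, 1.0) for _ in range(20)]
print(sum(v[i] * K[i][k] * v[k] for i in range(20) for k in range(20)) > 0.0)
```

For a sufficiently negative λ (for example λ = −200) the form is no longer coercive on this space and the factorization fails.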
The proofs of existence and uniqueness follow from the simple fact that $V_N$ is a subset of $V$. Additionally, the Galerkin approximation satisfies the stability condition
$$\|u_N\|_V \le C \|f\| \tag{14.93}$$
where $C$ is a positive constant independent of $N$. Moreover,
$$\|u_N - \hat{u}\|_V \le \frac{\beta}{\rho} \inf_{v \in V_N} \|\hat{u} - v\|_V. \tag{14.94}$$

Inequality (14.93) shows that the $V$-norm of the numerical solution is bounded by the $L^2$-norm of the forcing function $f$ (the data). This is essentially the stability criterion of the Lax-Richtmyer theorem.

Inequality (14.94) says that the $V$-norm of the error is bounded, up to the constant $\beta/\rho$, by the smallest error possible when any $v \in V_N$ is used to describe $\hat{u} \in V$. By making further assumptions about the smoothness of $\hat{u}$, it is possible to derive error estimates in terms of the element size $h$. Estimate (14.94) guarantees that its left-hand side goes to zero as $N \to \infty$, since $V_N \to V$. Hence the numerical solution converges to the true solution.

According to the Lax-Richtmyer equivalence theorem, since two of the conditions (stability and convergence) hold, the third (consistency) must follow. The discretization is hence also consistent.

14.3.3 Error estimates

Inequality (14.94) provides the means to bound the error in the numerical solution. Let $\hat{u}_I = I_N \hat{u}$ be the interpolant of $\hat{u}$ in $V_N$. Since $\hat{u}_I$ is one of the candidates $v \in V_N$, an upper bound on the approximation error is
$$\|u_N - \hat{u}\|_V \le \frac{\beta}{\rho} \inf_{v \in V_N} \|v - \hat{u}\|_V \le C \|\hat{u}_I - \hat{u}\|_V. \tag{14.95}$$

Let $h$ be the characteristic size of an element; for a uniform discretization in 1D, this is $\Delta x$. One expects $\|u_N - \hat{u}\|_V \to 0$ as $h \to 0$. The rate at which this occurs depends on the smoothness of the solution $\hat{u}$. For linear interpolation we have the estimate
$$\|u_N - \hat{u}\|_{H^1} \le C h |\hat{u}|_{H^2}, \qquad |\hat{u}|_{H^2} = \left( \int_0^1 \hat{u}_{xx}^2 \, dx \right)^{1/2} \tag{14.96}$$
Here $|\hat{u}|_{H^2}$ is the so-called $H^2$ semi-norm, essentially a measure of the "size" of the second derivative, and $C$ is a generic positive constant independent of $h$. If the solution
admits square-integrable second derivatives, then the $H^1$-norm of the error decays linearly with the grid size $h$. The $L^2$-norm of the error, however, decreases quadratically:
$$\|u_N - \hat{u}\| \le \tilde{C} h^2 |\hat{u}|_{H^2}, \qquad |\hat{u}|_{H^2} = \left( \int_0^1 (\hat{u}_{xx})^2 \, dx \right)^{1/2}. \tag{14.97}$$

The two error norms converge at different rates because the $H^1$-norm takes the derivatives of the function into account. The first derivative of $\hat{u}$ is approximated to first order in $h$, while $\hat{u}$ itself is approximated to second order.

For an interpolation using polynomials of degree $k$, the $L^2$-norm of the error is bounded by
$$\|u_N - \hat{u}\| \le C h^{k+1} |\hat{u}|_{H^{k+1}}, \qquad |\hat{u}|_{H^{k+1}} = \left( \int_0^1 \left( \frac{d^{k+1}\hat{u}}{dx^{k+1}} \right)^2 dx \right)^{1/2} \tag{14.98}$$
This bound holds provided the solution is smooth enough, i.e. its $(k+1)$-th derivative is square-integrable.

For the spectral element approximation using a single element, the error depends on $N$ and on the regularity of the solution. If $\hat{u} \in H^m$ with $m > 0$, then the approximation error in the $L^2$-norm is bounded by
$$\|u_N - \hat{u}\| \le C N^{-m} \|\hat{u}\|_{H^m}. \tag{14.99}$$

The essential difference between the h-version of the finite element method and the spectral element method lies in the exponent of the error (note that $h \sim N^{-1}$):

- In the h-version, the exponent of $N$ is limited by the degree of the interpolating polynomial.
- In the spectral element case, it is limited by the smoothness of the solution.

If the solution is infinitely smooth, then $m \approx N$ and the decay is exponential, $N^{-N}$.
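Example 17 gives a concrete check of these estimates. The sketch below is a minimal illustration, not the code used for figure 14.8. It makes these assumptions:

- a uniform mesh of K linear elements;
- the weak form $\int_0^1 (u_x v_x + 4uv)\,dx = v(1)$, with the natural condition $u_x(1) = 1$ entering the last row;
- a Thomas-algorithm solve of the tridiagonal system;
- error norms measured with 3-point Gauss quadrature: the $L^2$ error and the energy-norm error $A(e,e)^{1/2}$.

All function names are hypothetical. Because $A$ is symmetric, the Galerkin solution is the best approximation in the energy norm. Its energy error should therefore not exceed that of the nodal interpolant; this is the case $\beta/\rho = 1$ of (14.94). Halving $h$ should roughly halve the energy error and quarter the $L^2$ error, as (14.96) and (14.97) predict.

```python
import math


def thomas(a, b, c, d):
    """Solve a tridiagonal system: a = sub-diagonal, b = diagonal, c = super-diagonal."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def exact(x):
    return math.sinh(2.0 * x) / (2.0 * math.cosh(2.0))


def exact_x(x):
    return math.cosh(2.0 * x) / math.cosh(2.0)


def galerkin_nodes(K):
    """Linear FEM for u_xx - 4u = 0, u(0) = 0, u_x(1) = 1 on K uniform elements.
    Weak form: int (u_x v_x + 4 u v) dx = v(1); the Dirichlet node u_0 = 0 is eliminated."""
    h = 1.0 / K
    kd = 1.0 / h + 4.0 * h / 3.0      # diagonal of (1/h)[[1,-1],[-1,1]] + 4 (h/6)[[2,1],[1,2]]
    ko = -1.0 / h + 2.0 * h / 3.0     # off-diagonal entry of the same element matrix
    diag = [2.0 * kd] * K
    diag[-1] = kd                     # node x = 1 belongs to a single element
    rhs = [0.0] * K
    rhs[-1] = 1.0                     # natural boundary term u_x(1) v(1) with u_x(1) = 1
    return [0.0] + thomas([ko] * K, diag, [ko] * K, rhs)


def error_norms(nodes, K):
    """(L2 error, energy-norm error) of the piecewise-linear function through `nodes`,
    with A(w, w) = int (w_x^2 + 4 w^2) dx, using 3-point Gauss quadrature per element."""
    h = 1.0 / K
    gx = (-math.sqrt(0.6), 0.0, math.sqrt(0.6))
    gw = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)
    l2 = energy = 0.0
    for e in range(K):
        x0, u0 = e * h, nodes[e]
        slope = (nodes[e + 1] - u0) / h
        for xi, w in zip(gx, gw):
            x = x0 + 0.5 * h * (xi + 1.0)
            err = exact(x) - (u0 + slope * (x - x0))
            l2 += 0.5 * h * w * err ** 2
            energy += 0.5 * h * w * ((exact_x(x) - slope) ** 2 + 4.0 * err ** 2)
    return math.sqrt(l2), math.sqrt(energy)


for K in (8, 16, 32):
    gal = error_norms(galerkin_nodes(K), K)
    interp = error_norms([exact(j / K) for j in range(K + 1)], K)
    print(K, "Galerkin (L2, energy):", gal, "interpolant energy:", interp[1])
```

The interpolant comparison makes the role of $\hat{u}_I$ in (14.95) concrete. The Galerkin energy error is bounded by the interpolation error and, in practice, is very close to it.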
14.4 Two Dimensional Problems

The extension of the finite element method to two-dimensional elliptic problems is straightforward. It follows the same steps as the one-dimensional examples:

1. transformation of the PDE into a variational problem;
2. restriction of this problem to a finite-dimensional space (effectively the Galerkin step);
3. discretization of the geometry and spatial interpolation;
4. assembly of the stiffness matrix;
5. imposition of boundary conditions;
6. solution of the linear system of equations.

Two-dimensional problems have more complicated geometries than one-dimensional problems. The issue of geometry discretization therefore becomes important, and it is explored in the present section. We take as our sample PDE the following problem:
$$\nabla^2 u - \lambda u + f = 0, \quad \mathbf{x} \in \Omega \tag{14.100}$$
$$u(\mathbf{x}) = u_b(\mathbf{x}), \quad \mathbf{x} \in \Gamma_D \tag{14.101}$$
$$\nabla u \cdot \mathbf{n} = q, \quad \mathbf{x} \in \Gamma_N \tag{14.102}$$
Here $\Omega$ is the region of interest with boundary $\partial\Omega$. $\Gamma_D$ is the portion of $\partial\Omega$ where Dirichlet conditions are imposed, and $\Gamma_N$ the portions where Neumann conditions are imposed. We suppose that $\Gamma_D \cup \Gamma_N = \partial\Omega$.

The variational statement is obtained as follows:

- multiply the PDE by test functions $v(\mathbf{x})$ that satisfy $v(\mathbf{x} \in \Gamma_D) = 0$;
- integrate over the domain $\Omega$;
- integrate the Laplacian term by parts;
- apply the boundary conditions.

The variational statement boils down to:
$$\int_\Omega (\nabla u \cdot \nabla v + \lambda u v)\, dA = \int_\Omega f v \, dA + \int_{\Gamma_N} v q \, ds, \quad \forall v \in H_0^1 \tag{14.103}$$
Here $H_0^1$ is the space of functions that are square integrable on $\Omega$, together with their first derivatives, and that satisfy homogeneous boundary conditions on $\Gamma_D$. The quantity $ds$ is the arclength along the boundary.

The Galerkin formulation comes from restricting the test and interpolation functions to a finite set:
$$u = \sum_{i=1}^{N} u_i \phi_i(\mathbf{x}), \qquad v = \phi_j, \quad j = 1, 2, \ldots, N \tag{14.104}$$
where the $\phi_i(\mathbf{x})$ are now two-dimensional interpolation functions. Here we restrict ourselves to Lagrangian interpolants.
The Galerkin formulation becomes: find $u_i$ such that
$$\sum_{i=1}^{N} \left[ \int_\Omega (\nabla\phi_i \cdot \nabla\phi_j + \lambda \phi_i \phi_j)\, dA \right] u_i = \int_\Omega f \phi_j \, dA + \int_{\Gamma_N} \phi_j q \, ds, \quad j = 1, 2, \ldots, N. \tag{14.105}$$
We thus recover the matrix formulation $K\mathbf{u} = \mathbf{b}$, where the stiffness matrix and forcing vector are given by:
$$K_{ji} = \int_\Omega (\nabla\phi_i \cdot \nabla\phi_j + \lambda \phi_i \phi_j)\, dA \tag{14.106}$$
$$b_j = \int_\Omega f \phi_j \, dA + \int_{\Gamma_N} \phi_j q \, ds \tag{14.107}$$

In two space dimensions we have a greater choice of element topology (shape) than in the simple 1D case. Triangular elements, the simplex elements of 2D space, are very common because they are very flexible and allow the discretization of very complicated domains. Quadrilateral elements, with either straight or curved edges, are also used extensively. In the following, we explore the interpolation functions for each of these elements. We also address the practical issues that must be overcome to compute the integrals of the Galerkin formulation.

14.4.1 Linear Triangular Elements

One of the necessary ingredients of FE computations is the localization of the computations to an element. This requires a natural coordinate system in which to perform them. For the triangle, the natural coordinate system is the area coordinate system shown in figure 14.9.

The element is identified by the three nodes $i$, $j$ and $k$ forming its vertices. Let $P$ be a point inside the triangle with coordinates $\mathbf{x}$. By connecting $P$ to the three vertices $i$, $j$, and $k$, we can divide the element into three small triangles with areas $A_i$, $A_j$ and $A_k$, respectively. These satisfy $A_i + A_j + A_k = A$, where $A$ is the area of the original element.

Figure 14.9 (triangle with vertices $i$, $j$, $k$ and interior point $P$; sub-triangle areas $A_i$, $A_j$, $A_k$): Natural area coordinate system for triangular elements.

If the point $P$ is located along edge $j-k$, then $A_i = 0$. If it is located at $i$, then $A_i = A$ and $A_j = A_k = 0$. The nodes $j$ and $k$ have similar properties.
This natural division of the element into three parts of similar structure allows us to define the area coordinates of point $P$ as:
$$a_i = \frac{A_i}{A}, \quad a_j = \frac{A_j}{A}, \quad a_k = \frac{A_k}{A}, \qquad a_i + a_j + a_k = 1. \tag{14.108}$$
The area of a triangle, with its vertices listed counter-clockwise, is given by the determinant
$$A = \frac{1}{2} \begin{vmatrix} 1 & x_i & y_i \\ 1 & x_j & y_j \\ 1 & x_k & y_k \end{vmatrix} \tag{14.109}$$
The other areas $A_i$, $A_j$ and $A_k$ can be obtained similarly. Their dependence on the coordinates $(x, y)$ of the point $P$ is linear.

If we set the local interpolation functions to $\phi_i = a_i$, we obtain the linear Lagrangian interpolant on the triangle. This is easy to verify: $\phi_i(\mathbf{x}_m) = \delta_{i,m}$, where $m$ can take the value $i$, $j$, or $k$. The linear Lagrangian interpolant for point $i$ can be expressed in terms of the global coordinate system $(x, y)$:
$$\phi_i(x, y) = \frac{1}{2A} (\alpha_i + \beta_i x + \gamma_i y) \tag{14.110}$$
$$\alpha_i = x_j y_k - x_k y_j \tag{14.111}$$
$$\beta_i = y_j - y_k \tag{14.112}$$
$$\gamma_i = -(x_j - x_k) \tag{14.113}$$
The other interpolation functions are obtained by a cyclic permutation of the indices. Note that the derivatives of the linear interpolation functions are constant over the element. The linear interpolation now takes the simple form:
$$u(\mathbf{x}) = u_i \phi_i(\mathbf{x}) + u_j \phi_j(\mathbf{x}) + u_k \phi_k(\mathbf{x}) \tag{14.114}$$
where $u_i$, $u_j$ and $u_k$ are the values of the solution at the nodes $i$, $j$ and $k$.

The interpolation formula for the triangle guarantees the continuity of the solution across element boundaries. On edge $j-k$, for example, the interpolation does not involve node $i$. It is simply a linear combination of the function values at nodes $j$ and $k$, which ensures continuity. Figure 14.10 shows the linear interpolation functions for the triangle as a three-dimensional plot.

Figure 14.10: Linear interpolation functions over linear triangular elements. The triangle is shown in dark shades. The three interpolation functions are shown in a light shade and appear as inclined planes in the plot.
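Here is a quick numerical check of (14.108)–(14.113). The sketch computes the area coordinates of a point in two independent ways:

- from sub-triangle areas;
- from the coefficients $\alpha_i$, $\beta_i$, $\gamma_i$, with cyclic index permutation.

It then confirms that the two agree and that $\phi_i(\mathbf{x}_m) = \delta_{i,m}$. The triangle coordinates and helper names are illustrative assumptions. The vertices are listed counter-clockwise so that $A > 0$.

```python
def signed_area(p, q, r):
    """Signed area of triangle (p, q, r); positive when the vertices are counter-clockwise."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))


def area_coordinates(pt, verts):
    """(a_i, a_j, a_k) = (A_i, A_j, A_k) / A, A_i being the sub-triangle opposite vertex i."""
    vi, vj, vk = verts
    A = signed_area(vi, vj, vk)
    return (signed_area(pt, vj, vk) / A,
            signed_area(vi, pt, vk) / A,
            signed_area(vi, vj, pt) / A)


def linear_shapes(pt, verts):
    """phi_n = (alpha_n + beta_n x + gamma_n y) / (2A), indices permuted cyclically."""
    A = signed_area(*verts)
    phis = []
    for n in range(3):
        (xj, yj), (xk, yk) = verts[(n + 1) % 3], verts[(n + 2) % 3]
        alpha, beta, gamma = xj * yk - xk * yj, yj - yk, -(xj - xk)
        phis.append((alpha + beta * pt[0] + gamma * pt[1]) / (2.0 * A))
    return phis


verts = [(0.0, 0.0), (2.0, 0.5), (0.5, 1.5)]   # counter-clockwise vertices i, j, k
print(area_coordinates((0.8, 0.6), verts))
print(linear_shapes((0.8, 0.6), verts))
```

Both printed triples should coincide and sum to 1. Evaluating `linear_shapes` at a vertex returns a unit vector.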
The usefulness of the area coordinates stems from the following integration formula over the triangle:
$$\int_A a_i^p a_j^q a_k^r \, dA = 2A \frac{p!\, q!\, r!}{(p + q + r + 2)!} \tag{14.115}$$
where $p! = 1 \cdot 2 \cdots p$ denotes the factorial of the integer $p$ (with $0! = 1$). It is now easy to verify that the local mass matrix is given by
$$M^e = \int_A \begin{pmatrix} \phi_i\phi_i & \phi_i\phi_j & \phi_i\phi_k \\ \phi_j\phi_i & \phi_j\phi_j & \phi_j\phi_k \\ \phi_k\phi_i & \phi_k\phi_j & \phi_k\phi_k \end{pmatrix} dA = \frac{A}{12} \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}. \tag{14.116}$$
The entries of the matrix arising from the discretization of the Laplace operator are easy to compute. The gradients of the interpolation and test functions are constant over an element, so we have:
$$D^e = \int_A \begin{pmatrix} \nabla\phi_i\cdot\nabla\phi_i & \nabla\phi_i\cdot\nabla\phi_j & \nabla\phi_i\cdot\nabla\phi_k \\ \nabla\phi_j\cdot\nabla\phi_i & \nabla\phi_j\cdot\nabla\phi_j & \nabla\phi_j\cdot\nabla\phi_k \\ \nabla\phi_k\cdot\nabla\phi_i & \nabla\phi_k\cdot\nabla\phi_j & \nabla\phi_k\cdot\nabla\phi_k \end{pmatrix} dA \tag{14.117}$$
$$= \frac{1}{4A} \begin{pmatrix} \beta_i\beta_i + \gamma_i\gamma_i & \beta_i\beta_j + \gamma_i\gamma_j & \beta_i\beta_k + \gamma_i\gamma_k \\ \beta_j\beta_i + \gamma_j\gamma_i & \beta_j\beta_j + \gamma_j\gamma_j & \beta_j\beta_k + \gamma_j\gamma_k \\ \beta_k\beta_i + \gamma_k\gamma_i & \beta_k\beta_j + \gamma_k\gamma_j & \beta_k\beta_k + \gamma_k\gamma_k \end{pmatrix} \tag{14.118}$$

As an example of the application of the FEM, we solve the Laplace equation in a circular annulus subject to Dirichlet boundary conditions, using linear triangular elements. Figure 14.11 shows the FEM grid and contours of the solution.

- The grid contains 754 nodes and 1364 elements.
- The boundary conditions were set to $\cos\theta$ on the outer radius and $\sin\theta$ on the inner radius, where $\theta$ is the azimuthal angle.
- The inner and outer radii are 1/2 and 1, respectively.
- The contour lines were obtained by interpolating the FEM solution to a high-resolution (401 x 401) structured grid prior to contouring.

Figure 14.11 (axes $x$ and $y$ from -1 to 1): FEM grid and contours of the solution to the Laplace equation in a circular annulus.

14.4.2 Higher order triangular elements

It is possible to define higher-order interpolation formulas within an element without changing the shape of the element.
For example, consider the quadratic triangular element, with collocation points at the triangle vertices and at the mid-points of the edges. Its interpolation is given by
$$u(\mathbf{x}) = u_i \phi_i^2(\mathbf{x}) + u_j \phi_j^2(\mathbf{x}) + u_k \phi_k^2(\mathbf{x}) + u_{i-j}\phi_{i-j}^2(\mathbf{x}) + u_{j-k}\phi_{j-k}^2(\mathbf{x}) + u_{k-i}\phi_{k-i}^2(\mathbf{x}) \tag{14.119}$$
$$\phi_i^2 = a_i(2a_i - 1) \tag{14.120}$$
$$\phi_j^2 = a_j(2a_j - 1) \tag{14.121}$$
$$\phi_k^2 = a_k(2a_k - 1) \tag{14.122}$$
$$\phi_{i-j}^2 = 4 a_i a_j \tag{14.123}$$
$$\phi_{j-k}^2 = 4 a_j a_k \tag{14.124}$$
$$\phi_{k-i}^2 = 4 a_k a_i \tag{14.125}$$
Here $i-j$ denotes the midpoint of edge $i-j$, and similarly for the other edges. The superscript 2 marks the quadratic interpolants; it is not a power. There are 6 degrees of freedom associated with each quadratic triangular element.

It is possible to define even higher-order interpolation in the triangle by proceeding as before, but it is not so simple to implement. The alternative is to use a mapping to a structured computational space and define the interpolation and collocation functions in that space. The collocation points must be chosen appropriately to avoid Gibbs oscillations as the polynomial degree increases. Spectral triangular elements are the optimal choice in this regard. We will not discuss spectral element triangles here; we refer the interested reader to Karniadakis and Sherwin (1999).

14.4.3 Quadrilateral elements

Change of Coordinates

The derivation of the interpolation and integration formulas for quadrilateral finite elements follows the lines of the previous section. The main task resides in defining the local coordinate system in the master element shown in figure 14.12.

Figure 14.12 (nodes 1 to 4 at the corners of the master element in the $(\xi, \eta)$ plane, and the corresponding quadrilateral in the $(x, y)$ plane): Mapping of a quadrilateral between the square $[-1, 1] \times [-1, 1]$ in computational space (left) and physical space (right).
For straight-edged elements, the mapping between physical and computational space can easily be effected through the following bilinear map:
$$\mathbf{x}(\xi, \eta) = \frac{1-\xi}{2}\left[\frac{1-\eta}{2}\mathbf{x}_1 + \frac{1+\eta}{2}\mathbf{x}_3\right] + \frac{1+\xi}{2}\left[\frac{1-\eta}{2}\mathbf{x}_2 + \frac{1+\eta}{2}\mathbf{x}_4\right] \tag{14.126}$$
To express the Galerkin integrals in computational space, it is useful to introduce a basis $(\mathbf{e}_\xi, \mathbf{e}_\eta)$ tangent to the coordinate lines $(\xi, \eta)$. We denote by $(x, y)$ the coordinates in physical space. Let $\mathbf{r} = x\mathbf{i} + y\mathbf{j}$ denote the position vector of a point $P$ located inside the element, where $(\mathbf{i}, \mathbf{j})$ forms an orthonormal basis in physical space. We define the basis vectors tangent to the coordinate lines as
$$\mathbf{e}_\xi = \frac{\partial\mathbf{r}}{\partial\xi} = x_\xi\mathbf{i} + y_\xi\mathbf{j}, \qquad \mathbf{e}_\eta = \frac{\partial\mathbf{r}}{\partial\eta} = x_\eta\mathbf{i} + y_\eta\mathbf{j}. \tag{14.127}$$
Inverting the above relationship, one obtains
$$\mathbf{i} = \frac{y_\eta}{J}\mathbf{e}_\xi - \frac{y_\xi}{J}\mathbf{e}_\eta, \qquad \mathbf{j} = -\frac{x_\eta}{J}\mathbf{e}_\xi + \frac{x_\xi}{J}\mathbf{e}_\eta$$
where $J = x_\xi y_\eta - x_\eta y_\xi$ is the Jacobian of the mapping. The norms of $\mathbf{e}_\xi$ and $\mathbf{e}_\eta$ are given by
$$|\mathbf{e}_\xi|^2 = \mathbf{e}_\xi\cdot\mathbf{e}_\xi = (x_\xi)^2 + (y_\xi)^2, \qquad |\mathbf{e}_\eta|^2 = \mathbf{e}_\eta\cdot\mathbf{e}_\eta = (x_\eta)^2 + (y_\eta)^2.$$
The basis in the computational plane is orthogonal if $\mathbf{e}_\xi\cdot\mathbf{e}_\eta = x_\xi x_\eta + y_\xi y_\eta = 0$. In general the basis is not orthogonal unless the element is rectangular.

It is now possible to derive expressions for length and area segments in computational space. These are needed to compute the boundary and area integrals arising from the Galerkin formulation. Using the definition $(ds)^2 = d\mathbf{r}\cdot d\mathbf{r}$ with $d\mathbf{r} = \mathbf{r}_\xi\, d\xi + \mathbf{r}_\eta\, d\eta$, we have:
$$(ds)^2 = |\mathbf{e}_\xi\, d\xi + \mathbf{e}_\eta\, d\eta|^2 = |\mathbf{e}_\xi|^2 d\xi^2 + |\mathbf{e}_\eta|^2 d\eta^2 + 2\,\mathbf{e}_\xi\cdot\mathbf{e}_\eta\, d\xi\, d\eta \tag{14.128}$$
The differential area of a curved surface is defined as the area of its tangent-plane approximation (in 2D, the area is always flat). The area of the parallelogram defined by the vectors $d\xi\,\mathbf{e}_\xi$ and $d\eta\,\mathbf{e}_\eta$ is
$$dA = \|d\xi\,\mathbf{e}_\xi \times d\eta\,\mathbf{e}_\eta\| = \left| \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x_\xi & y_\xi & 0 \\ x_\eta & y_\eta & 0 \end{vmatrix} \right| d\xi\, d\eta = |x_\xi y_\eta - x_\eta y_\xi|\, d\xi\, d\eta = |J|\, d\xi\, d\eta \tag{14.129}$$
after using the definition of $(\mathbf{e}_\xi, \mathbf{e}_\eta)$ in terms of $(\mathbf{i}, \mathbf{j})$.
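The basis vectors (14.127) and the area element (14.129) can be checked on a concrete quadrilateral. This sketch uses hypothetical function names and an arbitrary convex element. It proceeds in three steps:

1. differentiate the bilinear weights of (14.126) to get $\mathbf{e}_\xi$ and $\mathbf{e}_\eta$;
2. integrate $|J|$ over the master element with 2x2 Gauss quadrature, which is exact here because $J$ is linear in $\xi$ and $\eta$ for a bilinear map;
3. compare the result with the shoelace formula for the polygon area.

It assumes the node numbering implied by (14.126): nodes 1 to 4 sit at $(\xi,\eta) = (-1,-1), (1,-1), (-1,1), (1,1)$.

```python
import math


def weight_derivs(xi, eta):
    """xi- and eta-derivatives of the bilinear weights of (14.126), nodes 1..4."""
    w_xi = [-(1 - eta) / 4, (1 - eta) / 4, -(1 + eta) / 4, (1 + eta) / 4]
    w_eta = [-(1 - xi) / 4, -(1 + xi) / 4, (1 - xi) / 4, (1 + xi) / 4]
    return w_xi, w_eta


def basis_vectors(xi, eta, X, Y):
    """e_xi = (x_xi, y_xi) and e_eta = (x_eta, y_eta), equation (14.127)."""
    w_xi, w_eta = weight_derivs(xi, eta)
    e_xi = (sum(a * b for a, b in zip(w_xi, X)), sum(a * b for a, b in zip(w_xi, Y)))
    e_eta = (sum(a * b for a, b in zip(w_eta, X)), sum(a * b for a, b in zip(w_eta, Y)))
    return e_xi, e_eta


def jacobian(xi, eta, X, Y):
    (x_xi, y_xi), (x_eta, y_eta) = basis_vectors(xi, eta, X, Y)
    return x_xi * y_eta - x_eta * y_xi


def element_area(X, Y):
    """int int |J| dxi deta (14.129) with 2x2 Gauss-Legendre (unit weights)."""
    g = 1.0 / math.sqrt(3.0)
    return sum(abs(jacobian(a, b, X, Y)) for a in (-g, g) for b in (-g, g))


def shoelace(points):
    """Polygon area; points must be listed around the boundary."""
    s = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        s += x0 * y1 - x1 * y0
    return 0.5 * abs(s)


X, Y = [0.0, 2.0, 0.5, 2.5], [0.0, 0.0, 1.0, 1.5]   # nodes 1..4 in the order of (14.126)
boundary = [(X[0], Y[0]), (X[1], Y[1]), (X[3], Y[3]), (X[2], Y[2])]
print(element_area(X, Y), shoelace(boundary))
```

Replacing the vertices by those of a rectangle makes $\mathbf{e}_\xi\cdot\mathbf{e}_\eta$ vanish everywhere in the element, as stated above.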
Since $x = x(\xi, \eta)$ and $y = y(\xi, \eta)$, derivatives in physical space can be expressed in terms of derivatives in computational space using the chain rule. In matrix form:
$$\begin{pmatrix} u_x \\ u_y \end{pmatrix} = \begin{pmatrix} \xi_x & \eta_x \\ \xi_y & \eta_y \end{pmatrix} \begin{pmatrix} u_\xi \\ u_\eta \end{pmatrix} \tag{14.130}$$
The chain rule involves the derivatives of $\xi$, $\eta$ with respect to $x$, $y$. The bilinear map, however, readily delivers the derivatives of $x$, $y$ with respect to $\xi$, $\eta$. To avoid inverting the mapping from physical to computational space, we derive expressions for $\nabla\xi$ and $\nabla\eta$ in terms of $x_\xi$, $x_\eta$, etc. Applying the chain rule to $x$ and $y$, and noting that these two variables are independent, we obtain the system of equations:
$$\begin{pmatrix} x_\xi & x_\eta \\ y_\xi & y_\eta \end{pmatrix} \begin{pmatrix} \xi_x & \xi_y \\ \eta_x & \eta_y \end{pmatrix} = \begin{pmatrix} x_x & x_y \\ y_x & y_y \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \tag{14.131}$$
The solution is
$$\begin{pmatrix} \xi_x & \xi_y \\ \eta_x & \eta_y \end{pmatrix} = \frac{1}{J} \begin{pmatrix} y_\eta & -x_\eta \\ -y_\xi & x_\xi \end{pmatrix}, \qquad J = x_\xi y_\eta - x_\eta y_\xi \tag{14.132}$$

Figure 14.13 (four surface plots over $-1 \le \xi, \eta \le 1$): Bilinear shape functions in quadrilateral elements. The upper left panel shows $h_1(\xi)h_1(\eta)$, the upper right panel $h_2(\xi)h_1(\eta)$, the lower left panel $h_1(\xi)h_2(\eta)$, and the lower right panel $h_2(\xi)h_2(\eta)$.

For the bilinear map of equation 14.126, the metrics and Jacobian can easily be computed by differentiation:
$$x_\xi = \frac{1-\eta}{2}\frac{x_2 - x_1}{2} + \frac{1+\eta}{2}\frac{x_4 - x_3}{2}, \qquad y_\xi = \frac{1-\eta}{2}\frac{y_2 - y_1}{2} + \frac{1+\eta}{2}\frac{y_4 - y_3}{2} \tag{14.133}$$
$$x_\eta = \frac{1-\xi}{2}\frac{x_3 - x_1}{2} + \frac{1+\xi}{2}\frac{x_4 - x_2}{2}, \qquad y_\eta = \frac{1-\xi}{2}\frac{y_3 - y_1}{2} + \frac{1+\xi}{2}\frac{y_4 - y_2}{2} \tag{14.134}$$
The remaining expressions can be obtained simply by plugging in the various expressions derived earlier.

14.4.4 Interpolation in quadrilateral elements

The interpolation of the solution within quadrilateral elements is easily accomplished if tensor products of one-dimensional Lagrangian interpolants are used to build the two-dimensional formula. For the bilinear map shown in figure 14.12, for example, we
can use the collocation points located at the vertices of the quadrilateral to define the following interpolation:
$$u(\xi, \eta) = \sum_{m=1}^{4} u_m \phi_m(\xi, \eta) = u_1\phi_1 + u_2\phi_2 + u_3\phi_3 + u_4\phi_4 \tag{14.135}$$
where the two-dimensional Lagrangian interpolants are tensor products of the one-dimensional interpolants defined in equation 14.61:
$$\phi_1(\xi, \eta) = h_1(\xi)h_1(\eta), \quad \phi_2(\xi, \eta) = h_2(\xi)h_1(\eta), \quad \phi_3(\xi, \eta) = h_1(\xi)h_2(\eta), \quad \phi_4(\xi, \eta) = h_2(\xi)h_2(\eta), \tag{14.136}$$
These interpolants are shown in figure 14.13.

The bilinear interpolation functions above satisfy the $C^0$ continuity requirement. To verify this, note that the interpolation along an edge involves only the collocation points along that edge. Hence neighboring elements sharing an edge interpolate the solution identically, provided the function values at the shared collocation points are unique. Another important feature of the bilinear interpolation is that, unlike the linear interpolation in triangular elements, it contains the second-degree term $\xi\eta$. Hence the interpolation within an element is non-linear; it is only linear along edges.

Before proceeding further, we introduce a new notation for interpolation within quadrilaterals that brings out explicitly the tensorized form of the interpolation functions. We break the single two-dimensional index $m$ of equation 14.135 into two one-dimensional indices $(i, j)$ such that $m = (j-1)2 + i$, where $i, j = 1, 2$. The index $i$ runs along the $\xi$ direction and the index $j$ along the $\eta$ direction. Thus $m = 1$ is identified with $(i, j) = (1, 1)$, $m = 2$ with $(i, j) = (2, 1)$, and so on. The interpolation formula can now be written as
$$u(\xi, \eta) = \sum_{j=1}^{2}\sum_{i=1}^{2} u_{ij}\, h_i(\xi)\, h_j(\eta) \tag{14.137}$$
where $u_{ij}$ are the function values at point $(i, j)$. With this notation in hand, it is now possible to write down arbitrarily high-order Lagrangian interpolation in 2D using the tensor product formula.
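Here is a small sketch of (14.137) with the index map $m = (j-1)2 + i$. It shows two properties:

- Sampling $\xi\eta$ at the four vertices and interpolating recovers $\xi\eta$ exactly inside the element. This illustrates the second-degree term noted above.
- On the edge $\eta = -1$, only nodes 1 and 2 contribute.

Function and variable names are illustrative.

```python
def h(i, s):
    """1D linear Lagrange interpolants on [-1, 1]: h_1 = (1 - s)/2, h_2 = (1 + s)/2."""
    return (1 - s) / 2 if i == 1 else (1 + s) / 2


def interpolate(u, xi, eta):
    """Bilinear interpolation (14.137); u[m - 1] holds u_ij with m = (j - 1) * 2 + i."""
    total = 0.0
    for j in (1, 2):          # eta direction
        for i in (1, 2):      # xi direction
            m = (j - 1) * 2 + i
            total += u[m - 1] * h(i, xi) * h(j, eta)
    return total


u_xieta = [1.0, -1.0, -1.0, 1.0]           # xi*eta sampled at nodes m = 1, 2, 3, 4
print(interpolate(u_xieta, 0.3, -0.7))     # should match 0.3 * (-0.7) up to rounding
```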
Thus, a 1D interpolation using $N$ points per element can be extended to 2D via:
$$u(\xi, \eta) = \sum_{j=1}^{N}\sum_{i=1}^{N} u_{ij}\, h_i^N(\xi)\, h_j^N(\eta) \tag{14.138}$$
The superscript $N$ on the Lagrangian interpolants stresses that they are polynomials of degree $N-1$ that use $N$ collocation points per direction. Figure 14.14 shows the collocation points for 7th-degree interpolation polynomials.

Sum factorization algorithms should be used to compute quantities on structured sets of points $(p, q)$, in order to reduce the computational overhead. For example, the derivative of the function at point $(p, q)$ can be computed as:
$$\left. u_\xi \right|_{\xi_p, \eta_q} = \sum_{j=1}^{N} \left( \sum_{i=1}^{N} u_{ij} \left. \frac{dh_i^N}{d\xi} \right|_{\xi_p} \right) h_j^N(\eta_q). \tag{14.139}$$

Figure 14.14: Collocation points within a spectral element using 8 collocation points per direction to interpolate the solution; there are $8^2 = 64$ points in total per element.

The computation proceeds in two steps:

1. The term in parentheses is computed and saved in a temporary array.
2. The final expression is computed and saved.

This reduces the operation count, over all $N^2$ points, from $O(N^4)$ to $O(N^3)$. Further reductions are possible under special circumstances. For instance, if $\eta_q$ happens to be a collocation point, then $h_j^N(\eta_q) = \delta_{jq}$ and the formula reduces to a single sum:
$$\left. u_\xi \right|_{\xi_p, \eta_q} = \sum_{i=1}^{N} u_{iq} \left. \frac{dh_i^N}{d\xi} \right|_{\xi_p} \tag{14.140}$$

14.4.5 Evaluation of integrals

The integrals needed to build the stiffness matrix for two-dimensional quadrilateral elements are a bit more complicated than those for triangular elements. There are two reasons:

- there is no "magic" integration formula such as (14.115);
- the mapping between physical and computational space is more complex (non-constant).
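Before turning to the integrals, here is a short sketch illustrating the sum factorization of (14.139)–(14.140). It is an assumption-laden illustration rather than production code. It uses $N = 5$ Chebyshev-Gauss-Lobatto points; any distinct points would do. It tabulates $h_i'(\xi_p)$ and $h_j(\eta_q)$ once, then evaluates $u_\xi$ at all $N^2$ collocation points in two ways:

- by the direct double sum;
- by the factored form, which stores the inner sum in a temporary array.

All names are hypothetical.

```python
import math


def lagrange(nodes, i, x):
    """h_i(x): Lagrange interpolant equal to 1 at nodes[i] and 0 at the other nodes."""
    val = 1.0
    for m, xm in enumerate(nodes):
        if m != i:
            val *= (x - xm) / (nodes[i] - xm)
    return val


def lagrange_deriv(nodes, i, x):
    """dh_i/dx, obtained by differentiating the product formula term by term."""
    total = 0.0
    for m, xm in enumerate(nodes):
        if m == i:
            continue
        term = 1.0 / (nodes[i] - xm)
        for l, xl in enumerate(nodes):
            if l != i and l != m:
                term *= (x - xl) / (nodes[i] - xl)
        total += term
    return total


def tables(nodes, pts):
    """Dh[p][i] = h_i'(xi_p) and H[q][j] = h_j(eta_q), computed once and reused."""
    N = len(nodes)
    Dh = [[lagrange_deriv(nodes, i, x) for i in range(N)] for x in pts]
    H = [[lagrange(nodes, j, x) for j in range(N)] for x in pts]
    return Dh, H


def u_xi_direct(u, Dh, H):
    """Double sum (14.139) at every (p, q): O(N^2) work per point, O(N^4) in total."""
    N, P = len(u), len(Dh)
    return [[sum(u[i][j] * Dh[p][i] * H[q][j] for i in range(N) for j in range(N))
             for q in range(P)] for p in range(P)]


def u_xi_factored(u, Dh, H):
    """Sum factorization: inner sum over i stored in w[p][j], then the sum over j; O(N^3)."""
    N, P = len(u), len(Dh)
    w = [[sum(u[i][j] * Dh[p][i] for i in range(N)) for j in range(N)] for p in range(P)]
    return [[sum(w[p][j] * H[q][j] for j in range(N)) for q in range(P)] for p in range(P)]


N = 5
nodes = [-math.cos(math.pi * k / (N - 1)) for k in range(N)]   # Chebyshev-Gauss-Lobatto
u = [[x * x * y + x for y in nodes] for x in nodes]           # u_ij = f(xi_i, eta_j)
Dh, H = tables(nodes, nodes)
direct, fact = u_xi_direct(u, Dh, H), u_xi_factored(u, Dh, H)
print(fact[0])    # u_xi = 2 xi eta + 1 along xi = -1
```

The points here are the collocation points themselves, so $h_j(\eta_q) = \delta_{jq}$. The factored result therefore also equals the single sum (14.140).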
We start by considering the calculation of the elemental mass matrix $M_{m,n} = \int_A \phi_m \phi_n \, dA$. In computational space, and using the two-index notation introduced earlier, this integral becomes:
$$M^e_{ij,kl} = \int_{-1}^{1}\int_{-1}^{1} h_i(\xi)\, h_j(\eta)\, h_k(\xi)\, h_l(\eta)\, |J| \, d\xi\, d\eta \tag{14.141}$$
where $m = (j-1)P + i$ and $n = (l-1)P + k$. Evaluating analytically each integral that may appear in the Galerkin formulation is tedious. To bypass this, it is common practice to evaluate the integrals with Gauss quadrature. The quadrature order needed depends on the polynomial degree of the integrand and on whether exact integration is required.

We now determine the quadrature order needed to integrate the mass matrix exactly. For the bilinear map of equation 14.126, the Jacobian varies bilinearly over the element; hence it is linear in each of the variables $\xi$ and $\eta$. Assume the Lagrange interpolation uses $P$ points in each direction. Then the integrand of the mass matrix is a polynomial of degree $2(P-1) + 1$ in each of the variables $\xi$ and $\eta$. A $Q$-point Gauss rule is exact for polynomials of degree up to $2Q - 1$. Gauss quadrature of order $Q$ will therefore evaluate the integrals exactly provided $Q \ge P$:
$$M_{ij,kl} = \sum_{n=1}^{Q}\sum_{m=1}^{Q} h_i(\xi_m^G)\, h_j(\eta_n^G)\, h_k(\xi_m^G)\, h_l(\eta_n^G)\, |J_{mn}^G|\, \omega_m^G \omega_n^G \tag{14.142}$$
Here $J_{mn}^G$ denotes the Jacobian at the Gauss quadrature points $(\xi_m^G, \eta_n^G)$, and $\omega_m^G$ are the Gauss quadrature weights. The only required operations are the evaluation of the Lagrangian interpolants at the points $\xi_m^G$ and the summation of the terms in 14.142.

If Gauss-Lobatto integration is used, the roots and weights become those of Gauss-Lobatto quadrature. That rule is exact only up to degree $2Q - 3$, so the number of quadrature points needed to evaluate the integral exactly increases to $Q \ge P + 1$. As in the one-dimensional case, the mass matrix can be rendered diagonal if inexact (but accurate enough) integration is acceptable.
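To illustrate the quadrature-order argument, the sketch below builds the bilinear ($P = 2$) mass matrix (14.142) on an arbitrary convex quadrilateral with three rules:

- 2-point Gauss and 3-point Gauss, both exact since $Q \ge P$;
- 2-point Gauss-Lobatto, whose points coincide with the collocation points and therefore produce a diagonal (lumped) matrix.

The element and names are assumptions for illustration. Rows use the 0-based index $2j + i$ and columns use $2l + k$.

```python
import math


def h(i, s):
    """1D linear Lagrange interpolants (P = 2), 0-based: h_0 = (1 - s)/2, h_1 = (1 + s)/2."""
    return (1 - s) / 2 if i == 0 else (1 + s) / 2


def jac(xi, eta, X, Y):
    """Jacobian of the bilinear map (14.126); nodes ordered (-1,-1), (1,-1), (-1,1), (1,1)."""
    x_xi = ((1 - eta) * (X[1] - X[0]) + (1 + eta) * (X[3] - X[2])) / 4
    y_xi = ((1 - eta) * (Y[1] - Y[0]) + (1 + eta) * (Y[3] - Y[2])) / 4
    x_eta = ((1 - xi) * (X[2] - X[0]) + (1 + xi) * (X[3] - X[1])) / 4
    y_eta = ((1 - xi) * (Y[2] - Y[0]) + (1 + xi) * (Y[3] - Y[1])) / 4
    return x_xi * y_eta - x_eta * y_xi


RULES = {   # 1D rules on [-1, 1]: (points, weights)
    "gauss2": ([-1 / math.sqrt(3), 1 / math.sqrt(3)], [1.0, 1.0]),
    "gauss3": ([-math.sqrt(0.6), 0.0, math.sqrt(0.6)], [5 / 9, 8 / 9, 5 / 9]),
    "lobatto2": ([-1.0, 1.0], [1.0, 1.0]),   # Gauss-Lobatto points = collocation points
}


def mass_matrix(X, Y, rule):
    """Tensor-product quadrature of (14.142); row 2*j + i, column 2*l + k (0-based)."""
    pts, wts = RULES[rule]
    M = [[0.0] * 4 for _ in range(4)]
    for j in range(2):
        for i in range(2):
            for l in range(2):
                for k in range(2):
                    M[2 * j + i][2 * l + k] = sum(
                        h(i, xi) * h(j, eta) * h(k, xi) * h(l, eta)
                        * abs(jac(xi, eta, X, Y)) * wx * we
                        for eta, we in zip(pts, wts) for xi, wx in zip(pts, wts))
    return M


X, Y = [0.0, 2.0, 0.5, 2.5], [0.0, 0.0, 1.0, 1.5]
for rule in RULES:
    print(rule, [round(v, 6) for v in mass_matrix(X, Y, rule)[0]])
```

The two Gauss rules should give identical matrices. The Gauss-Lobatto matrix keeps only the diagonal entries $\omega_k\omega_l|J_{kl}|$ of (14.143). All three sum to the element area.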
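If the quadrature points and collocation points coincide, $h_k(\xi_m) = \delta_{km}$. The expression for the mass matrix entries then reduces to:
$$M_{ij,kl} = \delta_{ik}\,\delta_{jl}\,\omega_k \omega_l\, |J_{kl}| \tag{14.143}$$
The integrals involved in evaluating the discrete Laplace operator, the matrix $D$, are substantially more complicated in the present instance. We continue to rely on Gauss quadrature and derive the terms that must be computed. Keeping with our two-index notation, we have
$$D_{ij,kl} = \int_A \nabla\phi_{ij}\cdot\nabla\phi_{kl}\, dA. \tag{14.144}$$
Most of the work comes from expressing the inner product in the integrand in computational space:
$$\nabla\phi_{ij}\cdot\nabla\phi_{kl} = \frac{\partial\phi_{ij}}{\partial x}\frac{\partial\phi_{kl}}{\partial x} + \frac{\partial\phi_{ij}}{\partial y}\frac{\partial\phi_{kl}}{\partial y} \tag{14.145}$$
$$= \left(\frac{\partial\phi_{ij}}{\partial\xi}\nabla\xi + \frac{\partial\phi_{ij}}{\partial\eta}\nabla\eta\right)\cdot\left(\frac{\partial\phi_{kl}}{\partial\xi}\nabla\xi + \frac{\partial\phi_{kl}}{\partial\eta}\nabla\eta\right) \tag{14.146}$$
$$= \frac{\partial\phi_{ij}}{\partial\xi}\frac{\partial\phi_{kl}}{\partial\xi}\nabla\xi\cdot\nabla\xi + \frac{\partial\phi_{ij}}{\partial\eta}\frac{\partial\phi_{kl}}{\partial\eta}\nabla\eta\cdot\nabla\eta + \left(\frac{\partial\phi_{ij}}{\partial\xi}\frac{\partial\phi_{kl}}{\partial\eta} + \frac{\partial\phi_{ij}}{\partial\eta}\frac{\partial\phi_{kl}}{\partial\xi}\right)\nabla\xi\cdot\nabla\eta. \tag{14.147}$$
Setting $\phi_{ij} = h_i(\xi)h_j(\eta)$ and $\phi_{kl} = h_k(\xi)h_l(\eta)$, and evaluating the integrals using Gauss quadrature, we get:
$$\begin{aligned} D_{ij,kl} = {} & \sum_{n=1}^{Q}\sum_{m=1}^{Q} h_i'(\xi_m)\, h_j(\eta_n)\, h_k'(\xi_m)\, h_l(\eta_n)\, [\nabla\xi\cdot\nabla\xi\,|J|]_{m,n}\, \omega_m\omega_n \\ & + \sum_{n=1}^{Q}\sum_{m=1}^{Q} h_i(\xi_m)\, h_j'(\eta_n)\, h_k(\xi_m)\, h_l'(\eta_n)\, [\nabla\eta\cdot\nabla\eta\,|J|]_{m,n}\, \omega_m\omega_n \\ & + \sum_{n=1}^{Q}\sum_{m=1}^{Q} h_i'(\xi_m)\, h_j(\eta_n)\, h_k(\xi_m)\, h_l'(\eta_n)\, [\nabla\xi\cdot\nabla\eta\,|J|]_{m,n}\, \omega_m\omega_n \\ & + \sum_{n=1}^{Q}\sum_{m=1}^{Q} h_i(\xi_m)\, h_j'(\eta_n)\, h_k'(\xi_m)\, h_l(\eta_n)\, [\nabla\xi\cdot\nabla\eta\,|J|]_{m,n}\, \omega_m\omega_n \end{aligned} \tag{14.148}$$
The expressions in brackets are evaluated at the quadrature points $(\xi_m, \eta_n)$; we have omitted the superscript $G$ from the quadrature points. Again, substantial savings can be achieved if Gauss-Lobatto quadrature of the same order as the interpolation polynomial is used.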
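Here is a sketch of the Laplacian matrix (14.144)–(14.148) for a bilinear element. The gradients are formed as $\partial_\xi\phi\,\nabla\xi + \partial_\eta\phi\,\nabla\eta$, with $\nabla\xi$ and $\nabla\eta$ from (14.132). The integral is approximated with 2x2 Gauss quadrature. This rule is exact only when $J$ is constant (parallelograms). For a general quadrilateral the integrand is rational and the result is an approximation. The element and names are illustrative assumptions.

Two sanity checks follow from the construction:

- Rows sum to zero, because the gradient of a constant vanishes.
- $\mathbf{x}^T D\,\mathbf{x}$, with the nodal $x$-coordinates, equals the element area, because the bilinear map reproduces $x$ exactly.

```python
import math

G = (-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0))   # 2-point Gauss-Legendre, unit weights


def h(i, s):
    """Linear 1D interpolants, 0-based: h_0 = (1 - s)/2, h_1 = (1 + s)/2."""
    return (1 - s) / 2 if i == 0 else (1 + s) / 2


def dh(i):
    """Derivatives of the linear interpolants (constant)."""
    return -0.5 if i == 0 else 0.5


def laplacian_matrix(X, Y):
    """D_{ij,kl} = int grad(phi_ij) . grad(phi_kl) dA on a bilinear quad, 2x2 Gauss.
    Local index 2*j + i; nodes ordered (-1,-1), (1,-1), (-1,1), (1,1)."""
    D = [[0.0] * 4 for _ in range(4)]
    for eta in G:
        for xi in G:
            x_xi = ((1 - eta) * (X[1] - X[0]) + (1 + eta) * (X[3] - X[2])) / 4
            y_xi = ((1 - eta) * (Y[1] - Y[0]) + (1 + eta) * (Y[3] - Y[2])) / 4
            x_eta = ((1 - xi) * (X[2] - X[0]) + (1 + xi) * (X[3] - X[1])) / 4
            y_eta = ((1 - xi) * (Y[2] - Y[0]) + (1 + xi) * (Y[3] - Y[1])) / 4
            J = x_xi * y_eta - x_eta * y_xi
            gxi = (y_eta / J, -x_eta / J)     # grad(xi), from (14.132)
            geta = (-y_xi / J, x_xi / J)      # grad(eta)
            grads = []
            for j in range(2):
                for i in range(2):
                    p_xi, p_eta = dh(i) * h(j, eta), h(i, xi) * dh(j)
                    grads.append((p_xi * gxi[0] + p_eta * geta[0],
                                  p_xi * gxi[1] + p_eta * geta[1]))
            for a in range(4):
                for b in range(4):
                    dot = grads[a][0] * grads[b][0] + grads[a][1] * grads[b][1]
                    D[a][b] += dot * abs(J)   # unit Gauss weights
    return D


X, Y = [0.0, 2.0, 0.5, 2.5], [0.0, 0.0, 1.0, 1.5]
D = laplacian_matrix(X, Y)
for row in D:
    print([round(v, 4) for v in row])
```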
With Gauss-Lobatto quadrature of the same order as the interpolation polynomial, the expression for $D_{ij,kl}$ reduces to:
$$\begin{aligned} D_{ij,kl} = {} & \delta_{jl} \sum_{m=1}^{Q} h_i'(\xi_m)\, h_k'(\xi_m)\, [\nabla\xi\cdot\nabla\xi\,|J|]_{m,j}\, \omega_m\omega_j + \delta_{ik} \sum_{n=1}^{Q} h_j'(\eta_n)\, h_l'(\eta_n)\, [\nabla\eta\cdot\nabla\eta\,|J|]_{i,n}\, \omega_i\omega_n \\ & + h_i'(\xi_k)\, h_l'(\eta_j)\, [\nabla\xi\cdot\nabla\eta\,|J|]_{k,j}\, \omega_k\omega_j + h_k'(\xi_i)\, h_j'(\eta_l)\, [\nabla\xi\cdot\nabla\eta\,|J|]_{i,l}\, \omega_i\omega_l \end{aligned} \tag{14.149}$$

14.5 Time-dependent problem in 1D: the Advection Equation

Assume we attempt to solve the 1D constant-coefficient advection equation:
$$u_t + c u_x = 0, \quad x \in \Omega \tag{14.150}$$
subject to appropriate initial and boundary conditions. The variational statement that solves the above problem is: find $u$ such that
$$\int_\Omega (u_t + c u_x)\, v\, dx = 0, \quad \forall v. \tag{14.151}$$
The Galerkin formulation reduces the problem to a finite-dimensional space by replacing the solution with a finite expansion of the form
$$u = \sum_{i=1}^{N} u_i(t)\, \phi_i(x) \tag{14.152}$$
and setting the test functions to $v = \phi_j$, $j = 1, 2, \ldots, N$. Unlike in the steady problems encountered earlier, the nodal values $u_i$ (the values of the function at the collocation points) now depend on time. Substituting the finite expansion into the variational form, we get the following system of ordinary differential equations (ODEs):
$$M \frac{d\mathbf{u}}{dt} + C\mathbf{u} = 0 \tag{14.153}$$
Here $\mathbf{u}$ is the vector of solution values at the collocation points, $M$ is the mass matrix, and $C$ is the matrix resulting from the Galerkin discretization of the advection term. The entries of $M$ and $C$ are given by:
$$M_{j,i} = \int_0^L \phi_i \phi_j \, dx \tag{14.154}$$
$$C_{j,i} = c \int_0^L \frac{\partial\phi_i}{\partial x} \phi_j \, dx \tag{14.155}$$
We follow the procedure outlined in section 14.2.7 to build the matrices $M$ and $C$. We start with linear interpolation functions such as those defined in equation 14.60. The local mass matrix is given in equation 14.68. Here we derive expressions for the advection matrix $C$, assuming the advective velocity $c$ is constant.
The local advection matrix is
$$C_{ji} = c \int_{x_{j-1}}^{x_j} h_j \frac{dh_i}{dx}\, dx = c \int_{-1}^{1} h_j(\xi) \frac{dh_i(\xi)}{d\xi}\, d\xi, \qquad C^e = \frac{c}{2}\begin{pmatrix} -1 & 1 \\ -1 & 1 \end{pmatrix}. \tag{14.156}$$
After assembly, the global system of equations becomes:
$$\frac{\Delta x}{6}\frac{d}{dt}\left(u_{j-1} + 4u_j + u_{j+1}\right) + \frac{c}{2}\left(-u_{j-1} + u_{j+1}\right) = 0 \tag{14.157}$$
The linear finite element approximation of the advective term has produced a centered-difference approximation for that term. The time-derivative term, by contrast, has produced the mass matrix.

Any time-integration scheme, even an explicit one such as leap-frog, therefore requires the inversion of the mass matrix. One can already anticipate that solving the advection equation with FEM will cost more than a similarly configured explicit finite difference method. The size of the extra cost depends on the structure of the mass matrix:

- Linear elements in 1D give a tridiagonal mass matrix. The increase in cost is minimal, since tridiagonal solvers are very efficient.
- Quadratic elements in 1D lead to a pentadiagonal system, which is costlier to solve. This increased cost may be justified if it is compensated by a sufficient increase in accuracy.
- In multiple dimensions the mass matrix is not tridiagonal. It is sparse, with a bandwidth that depends on the global numbering of the nodes. Even linear elements then require the solution of a global linear system at every time step.

Many solutions have been proposed to tackle the extra cost of this global matrix. One solution is reduced integration using Gauss-Lobatto quadrature, which, as we saw in the Gaussian quadrature section, leads immediately to a diagonal matrix. This procedure is often referred to as mass lumping. For low-order elements, mass lumping significantly degrades the accuracy of the finite element method, particularly its phase properties. For the 1D advection equation, mass lumping of linear elements is equivalent to a centered difference approximation. For high-order interpolation, i.e.
higher than degree 3, the loss of accuracy due to the inexact quadrature is tolerable, and is of the same order as the interpolation error. Another alternative uses discontinuous test functions and is appropriate for the solution of mostly hyperbolic equations. This approach is dubbed the Discontinuous Galerkin method and will be examined in a following section.

The system of ODEs can now be integrated using one of the time-stepping algorithms, for example second-order leap-frog, third-order Adams-Bashforth, or one of the Runge-Kutta schemes. For linear finite elements, the stability limit can easily be studied with Von Neumann stability analysis. For example, a leap-frog scheme applied to equation 14.157 can be shown to have the stability limit $\mu = c\Delta t/\Delta x < 1/\sqrt{3}$. This is much more restrictive than the centered finite difference scheme, which merely requires that the Courant number be less than 1. However, an examination of the phase properties of the linear FE scheme reveals its superiority over centered differences.

Figure 14.15 (left panel "Numerical phase speed of various spatial discretization": $c_{num}/c_{an}$ versus $k\Delta x/\pi$; right panel "Group velocity": $c_g$ versus $k\Delta x/\pi$; curves FE1, CD2, CD4, CD6): Comparison of the dispersive properties of linear finite elements with centered differences. The left panel shows the ratio of the numerical phase speed to the analytical phase speed. The right panel shows the ratio of the numerical group velocity to the analytical one.

We study the phase properties of the linear finite element discretization by looking for periodic solutions, in space and time, of the system of equations 14.157: $u(x, t) = e^{i(kx_j - \sigma t)}$.
We thus get the following dispersion relation and phase speed:
$$\sigma = \frac{3c}{\Delta x}\frac{\sin k\Delta x}{2 + \cos k\Delta x} \tag{14.158}$$
$$\frac{c_F}{c} = \frac{3\sin k\Delta x}{k\Delta x\,(2 + \cos k\Delta x)} \tag{14.159}$$
This numerical phase speed should be contrasted with those obtained from centered second- and fourth-order finite differences:
$$\frac{c_{CD2}}{c} = \frac{\sin k\Delta x}{k\Delta x} \tag{14.160}$$
$$\frac{c_{CD4}}{c} = \frac{1}{3}\left(4\frac{\sin k\Delta x}{k\Delta x} - \frac{\sin 2k\Delta x}{2k\Delta x}\right) \tag{14.161}$$
Figure 14.15 compares the dispersion of linear finite elements with that of centered difference schemes of 2nd, 4th and 6th order.

- Phase speed: the FE formulation is more accurate at all wavenumbers. Linear interpolation is equivalent to, if not slightly better than, a sixth-order centered FD approximation; in particular, intermediate to short waves travel slightly faster than in FD.
- Group velocity (right panel): the results are similar for long to intermediate waves. The group velocities of the short waves, however, are seriously in error for the FE formulation. They are negative, so these waves propagate upstream of the signal, and faster than with the finite difference schemes.
- Mass lumping: a mass-lumped version of the FE method would collapse the FE curve onto that of the centered second-order method.

14.5.1 Numerical Example

As an example of the application of the FEM to the inviscid 1D advection equation, we solve the following problem:
$$u_t + u_x = 0 \ \text{on}\ -1 \le x \le 1, \qquad u(-1, t) = 0, \qquad u(x, 0) = e^{-256\,(x + \frac{1}{2})^2} \tag{14.162}$$
The initial condition is an infinitely smooth Gaussian hill centered at $x = -1/2$. At time $t = 1$ the solution should be the same Gaussian hill, centered at $x = 1/2$. Figure 14.16 shows the FEM solutions and compares the consistent mass (exact integration) and "lumped" mass solutions for various interpolation orders. The number of elements was kept fixed, so the convergence study can be considered a p-refinement strategy.
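The dispersion relations behind figure 14.15, and hence behind the consistent-versus-lumped comparisons of this example, can be tabulated with a few lines of code. The sketch evaluates:

- the phase speeds (14.159)–(14.161);
- the FE group velocity, obtained by differentiating (14.158);
- the leap-frog limit quoted earlier. Requiring $|\sigma\Delta t| \le 1$ for all resolvable $k$ gives $\mu \le 1/\sqrt{3}$ for consistent-mass linear FE and $\mu \le 1$ for centered differences.

The sixth-order stencil coefficients are the standard ones. They are an assumption, since the text does not list them. Function names are hypothetical.

```python
import math


def fe(t):
    """c_F / c for consistent-mass linear FE, (14.159); t = k dx."""
    return 3.0 * math.sin(t) / (t * (2.0 + math.cos(t)))


def cd2(t):
    """c_CD2 / c, (14.160); also the lumped-mass linear FE result."""
    return math.sin(t) / t


def cd4(t):
    """c_CD4 / c, (14.161)."""
    return (4.0 * math.sin(t) / t - math.sin(2.0 * t) / (2.0 * t)) / 3.0


def cd6(t):
    """Sixth-order centred stencil (45, -9, 1)/60; assumed here, not given in the text."""
    return (1.5 * math.sin(t) - 0.3 * math.sin(2.0 * t) + math.sin(3.0 * t) / 30.0) / t


def fe_group(t):
    """c_g / c = d(sigma dx / c) / d(k dx) = 3 (1 + 2 cos t) / (2 + cos t)^2."""
    return 3.0 * (1.0 + 2.0 * math.cos(t)) / (2.0 + math.cos(t)) ** 2


def leapfrog_limit(scaled_sigma, samples=100000):
    """Largest mu = c dt / dx keeping |sigma dt| <= 1 for every k dx in (0, pi]."""
    peak = max(abs(scaled_sigma(math.pi * s / samples)) for s in range(1, samples + 1))
    return 1.0 / peak


for frac in (0.25, 0.5, 0.75):
    t = frac * math.pi
    print(frac, [round(f(t), 4) for f in (cd2, cd4, cd6, fe)])
print("FE group velocity of the 2-dx wave:", fe_group(math.pi))
print("leap-frog limit, FE:", leapfrog_limit(lambda t: t * fe(t)), "vs", 1 / math.sqrt(3))
```

For the $2\Delta x$ wave, the FE group velocity is $-3c$, versus $-c$ for CD2. This is the fast upstream propagation mentioned above.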
The number of elements was chosen so that the hill is barely resolved with linear elements (the case $m = 2$). The solution clearly improves rapidly as $m$, the number of interpolation points within an element, is increased.

- Linear case: the lumped-mass solution is very poor indeed, because its poor dispersive properties generate substantial oscillatory behavior.
- Quadratic case: the situation improves substantially, with the biggest discrepancy occurring near the upstream foot of the hill.
- $m = 5$: the lumped and consistent mass solutions are indistinguishable.

One may object that the improvements are solely due to the increased resolution. We therefore repeated the experiment while keeping the total number of degrees of freedom fixed, to isolate the effect of the improved interpolation order.

The first case was an under-resolved one using 61 total degrees of freedom; figure 14.17 shows the comparison. The figure indeed shows the improvement in the dispersive characteristics of the lumped mass approximation as the polynomial degree increases. For $m \ge 4$, the consistent and lumped solutions overlap over the main portion of the signal. They differ, however, over the (numerically) dispersed waves trailing the hill. This is evidence that both approximations are still under-resolved.

Figure 14.18 shows the resolved case, where the total number of degrees of freedom is increased to 121. Here the two solutions overlap for $m \ge 3$. For $m = 5$ they are indistinguishable over the entire domain, evidence that the solution is now well resolved. The lumped mass solution with linear interpolation still has substantial dispersive errors and needs more elements. These errors are entirely due to mass lumping, since the consistent mass solution is free of dispersive ripples.
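A reduced version of this experiment can be reproduced with linear elements only ($m = 2$). The sketch below is a hedged approximation of the setup, not the code behind figures 14.16 to 14.18. Its assumptions are:

- the linear-element rows of (14.157) are assembled on $[-1, 1]$, with the inflow value $u(-1,t) = 0$ eliminated;
- the outflow node uses the plain Galerkin rows $(\Delta x/6)(u_{E-1} + 2u_E)$ and $(c/2)(u_E - u_{E-1})$;
- time stepping uses the Shu-Osher form of TVD-RK3 at a Courant number of 0.25;
- lumping replaces each mass-matrix row by its row sum.

The sketch reports the maximum nodal error at $t = 1$. Function names are hypothetical.

```python
import math


def thomas(a, b, c, d):
    """Tridiagonal solve: a = sub-diagonal, b = diagonal, c = super-diagonal."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def advect(E, lumped, T=1.0, cfl=0.25):
    """Linear-element Galerkin solution of u_t + u_x = 0 on [-1, 1], u(-1, t) = 0,
    with TVD-RK3 time stepping; returns the maximum nodal error at time T."""
    dx = 2.0 / E
    x = [-1.0 + j * dx for j in range(E + 1)]
    u = [math.exp(-256.0 * (xj + 0.5) ** 2) for xj in x[1:]]   # unknowns u_1..u_E
    if lumped:                        # row sums of the consistent mass matrix
        sub, diag, sup = [0.0] * E, [dx] * E, [0.0] * E
        diag[-1] = dx / 2.0           # the outflow node belongs to one element
    else:                             # (dx/6)[1 4 1] rows, (dx/6)[1 2] at the outflow node
        sub, diag, sup = [dx / 6.0] * E, [4.0 * dx / 6.0] * E, [dx / 6.0] * E
        diag[-1] = 2.0 * dx / 6.0

    def rate(v):
        """du/dt = -M^{-1} C u, C rows (c/2)(-1, 0, 1) and (c/2)(-1, 1) at outflow, c = 1."""
        full = [0.0] + v              # inflow value u_0 = 0
        r = [-0.5 * (full[j + 1] - full[j - 1]) for j in range(1, E)]
        r.append(-0.5 * (full[E] - full[E - 1]))
        return thomas(sub, diag, sup, r)

    steps = max(1, math.ceil(T / (cfl * dx)))
    dt = T / steps
    for _ in range(steps):            # Shu-Osher TVD-RK3
        k1 = rate(u)
        u1 = [a + dt * b for a, b in zip(u, k1)]
        k2 = rate(u1)
        u2 = [0.75 * a + 0.25 * (b + dt * k) for a, b, k in zip(u, u1, k2)]
        k3 = rate(u2)
        u = [a / 3.0 + 2.0 / 3.0 * (b + dt * k) for a, b, k in zip(u, u2, k3)]
    exact = [math.exp(-256.0 * (xj - T + 0.5) ** 2) for xj in x[1:]]
    return max(abs(a - b) for a, b in zip(u, exact))


for E in (60, 120):
    print(E, "consistent:", advect(E, False), "lumped:", advect(E, True))
```

With 60 elements (61 degrees of freedom, the $m = 2$ case of figure 14.17), the lumped error should be clearly larger than the consistent one. The exact numbers depend on the assumed time step.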
The lesson of the previous example is that the dispersive properties of the lumped-mass solution are quite good provided enough resolution is used. The number of elements needed to reach the resolution threshold decreases dramatically as the polynomial order is increased. The dispersive properties of the lumped-mass linear-interpolation FEM appear to be the worst, and seem to require a very well resolved solution before the effect of mass lumping is eliminated. One then has to weigh the cost of a large mass-lumped calculation against that of a smaller consistent-mass calculation to decide whether lumping is cost-effective.

[Figure 14.16: four panels, m = 2, 3, 4, 5; u versus x on −1 ≤ x ≤ 1.]
Figure 14.16: Solution of the advection equation with FEM. The black lines show the result of the consistent mass matrix calculation and the red lines show that of the "lumped" mass matrix calculation. The discretization consisted of 40 equally spaced elements on the interval [−1, 1], and a linear (m = 2), quadratic (m = 3), cubic (m = 4) and quartic (m = 5) interpolation. The time stepping is done with a TVD-RK3 scheme (no explicit dissipation included).

[Figure 14.17: six panels, (m, E) = (2, 60), (3, 30), (4, 20), (5, 15), (6, 12), (7, 10); u versus x on −1 ≤ x ≤ 1.]
Figure 14.17: Consistent versus lumped solution of the advection equation with FEM for a coarse resolution with the total number of degrees of freedom fixed, N ≈ 61.
The black and red lines refer to the consistent and lumped mass solutions, respectively.

[Figure 14.18: four panels, (m, E) = (2, 120), (3, 60), (4, 40), (5, 30); u versus x on −1 ≤ x ≤ 1.]
Figure 14.18: Consistent versus lumped solution of the advection equation with FEM for a finer resolution with the total number of degrees of freedom fixed, N ≈ 121. The black and red lines refer to the consistent and lumped mass solutions, respectively.

14.6 The Discontinuous Galerkin Method (DGM)
A major drawback of the continuous Galerkin method is the requirement to maintain $C^0$ continuity, a requirement that leads to a tight coupling between neighboring elements. In particular, it makes the mass matrix global, which necessitates a matrix inversion to time-step the solution. There are classes of problems where $C^0$ continuity is not necessary; test and trial functions can then be made discontinuous, and the solution process becomes considerably more cost-effective. This formulation of the Galerkin method has been dubbed the Discontinuous Galerkin Method (DGM). It is most suitable for problems governed by purely hyperbolic equations (like the pure advection equation or the shallow water equations) or predominantly hyperbolic ones (like the advection-diffusion equation at high Peclet number). In the following we describe the formulation of DGM for the simple case of the pure advection equation:
$$T_t + \mathbf{v}\cdot\nabla T = 0 \tag{14.163}$$
where v is the advective velocity field. If v is divergence-free, that is ∇ · v = 0, the advection equation can be written in the conservative form
$$T_t + \nabla\cdot\mathbf{F} = 0, \qquad \mathbf{F} = \mathbf{v}\,T \tag{14.164}$$
where F is the flux and is a function of T; equation (14.164) is written in the form of a conservation law.
We suppose that the domain of interest has been divided into elements. Then the following variational statement applies inside each element:
$$\int_E \left(T_t + \nabla\cdot\mathbf{F}\right) w\,dV = 0 \tag{14.165}$$
where w are the test functions and E is the element of interest. If w is sufficiently smooth, we can integrate the divergence term by parts using the Gauss theorem to obtain
$$\int_E \left(T_t\,w - \mathbf{F}\cdot\nabla w\right) dV + \oint_{\partial E} w\,\mathbf{F}\cdot\mathbf{n}\,dS = 0 \tag{14.166}$$
where ∂E is the boundary of element E and n is the unit outward normal. The boundary integral represents a weighted sum of the fluxes leaving or entering the element. The discretization consists of replacing the infinite-dimensional space of test functions by a finite-dimensional one, and representing the solution by a finite expansion $T_h$. Since $T_h$ is discontinuous at element edges, we must also replace the flux F(T) by a numerical flux G that depends on the values of $T_h$ both inside and outside the element:
$$\mathbf{G} = \mathbf{G}(T^i, T^o) \tag{14.167}$$
where $T^i$ and $T^o$ represent the values of the function on the edge as seen from inside the element and from the neighboring element, respectively. Physical intuition dictates that the right flux is the one obtained by following the characteristics (Lagrangian trajectories). For the advection equation this means that the outside value is used if n · v < 0 (inflow edge) and the inside value is used if n · v > 0 (outflow edge).

Figure 14.19: Flow field and initial condition for the Gaussian Hill experiment.

Proceeding as before and defining the local (elemental) approximation to $T_h$ as
$$T_h(\mathbf{x}) = \sum_{i=1}^{N} T_i(t)\,\phi_i(\mathbf{x}) \tag{14.168}$$
the degrees of freedom $T_i$ are determined by the following ODE:
$$\mathbf{M}\,\frac{d\mathbf{T}}{dt} - \mathbf{R} + \mathbf{G} = 0 \tag{14.169}$$
$$M_{ji} = \int_E \phi_i\,\phi_j\,dV \tag{14.170}$$
$$G_j = \oint_{\partial E} \phi_j\,\mathbf{G}\cdot\mathbf{n}\,dS \tag{14.171}$$
where M is the local mass matrix (no assembly is required) and $R_j = \int_E \mathbf{F}(T_h)\cdot\nabla\phi_j\,dV$ is the volume term of (14.166) with $w = \phi_j$. The great value of DGM is that the variational statement operates one element at a time, and hence the mass matrix arising from the time derivative is purely local.
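A minimal one-dimensional sketch of this semi-discrete system, under assumptions that are not part of the text: constant velocity c > 0, a periodic domain [0, 1], linear elements whose two nodes sit at the element ends, the exactly integrated local mass matrix (h/6)[[2, 1], [1, 2]], and a three-stage SSP (TVD) Runge-Kutta time stepper. The upwind flux takes the left neighbour's value on an element's inflow edge and the element's own value on its outflow edge, as described above. All function names are hypothetical.

```python
import math

def dg_p1_rhs(T, h, c):
    """dT/dt for 1D DG with linear nodal elements and an upwind flux (periodic, c > 0).

    T[e] = [value at left node, value at right node] of element e.
    """
    rhs = []
    for e, (t1, t2) in enumerate(T):
        g_in = c * T[e - 1][1]   # inflow (left) edge: upwind value from the left neighbour; T[-1] wraps
        g_out = c * t2           # outflow (right) edge: value from inside the element
        vol = c * (t1 + t2) / 2  # volume term: int F dphi_2/dx dx = vol, int F dphi_1/dx dx = -vol
        r1 = -vol + g_in         # test function phi_1 (1 at left node, 0 at right node)
        r2 = vol - g_out         # test function phi_2 (0 at left node, 1 at right node)
        # Inverse of the local mass matrix (h/6)[[2, 1], [1, 2]] is (2/h)[[2, -1], [-1, 2]]
        rhs.append([(2 / h) * (2 * r1 - r2), (2 / h) * (2 * r2 - r1)])
    return rhs


def combine(a, A, b, B):
    """Element-wise a*A + b*B for lists of [left, right] pairs."""
    return [[a * x + b * y for x, y in zip(pa, pb)] for pa, pb in zip(A, B)]


def ssp_rk3_step(T, dt, h, c):
    """One step of the three-stage SSP (TVD) Runge-Kutta scheme."""
    T1 = combine(1.0, T, dt, dg_p1_rhs(T, h, c))
    T2 = combine(0.75, T, 0.25, combine(1.0, T1, dt, dg_p1_rhs(T1, h, c)))
    return combine(1.0 / 3.0, T, 2.0 / 3.0, combine(1.0, T2, dt, dg_p1_rhs(T2, h, c)))


def advect_sine(K, c=1.0, cfl=0.1):
    """Advect sin(2 pi x) once around the periodic domain [0, 1] with K elements."""
    h = 1.0 / K
    T0 = [[math.sin(2 * math.pi * e * h), math.sin(2 * math.pi * (e + 1) * h)] for e in range(K)]
    n_steps = math.ceil(1.0 / (cfl * h))  # makes c*dt/h roughly equal to cfl
    dt = 1.0 / (c * n_steps)              # n_steps * dt = 1/c, one full period
    T = T0
    for _ in range(n_steps):
        T = ssp_rk3_step(T, dt, h, c)
    return T0, T, h


def total_mass(T, h):
    """Integral of T_h over the domain (exact for piecewise-linear T_h)."""
    return sum(h * (t1 + t2) / 2 for t1, t2 in T)


for K in (20, 40):
    T0, T, h = advect_sine(K)
    err = max(abs(a - b) for p0, p in zip(T0, T) for a, b in zip(p0, p))
    drift = total_mass(T, h) - total_mass(T0, h)
    print(f"K = {K:2d}: max nodal error {err:.1e}, mass drift {drift:.1e}")
```

Each element only needs its own 2×2 mass matrix and one value from its upwind neighbour, and the interface fluxes cancel in pairs when summed over the periodic domain, so the total mass is conserved to round-off.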
The only linkage to neighboring elements comes from the fluxes along the boundary. Thus, even if the matrix M is full, it is usually a small system that can be readily inverted, and time integration becomes considerably cheaper. Another benefit of DGM is that it satisfies the local conservation property so dear to many oceanographic/coastal modelers, since the flux defined on each edge is unique. Finally, dropping the continuity requirement makes it quite feasible to build locally and dynamically adaptive solution strategies without worrying about the global continuity of the function. In the next section we compare the performance of DGM with that of the continuous formulation on several problems; all simulations are performed using the spectral element interpolation.

14.6.1 Gaussian Hill Experiment
The first experiment is designed to establish the convergence properties of the different methods. To this end we choose the classical problem of advecting a passive tracer in the unit square (0 ≤ x, y ≤ 1) by a rotating non-divergent flow given by
$$u = -\omega\left(y - \tfrac{1}{2}\right), \qquad v = \omega\left(x - \tfrac{1}{2}\right) \tag{14.172}$$
where ω is set to 2π. The initial condition is infinitely smooth and given by a Gaussian distribution
$$\phi = e^{-r^2/l^2}, \qquad r^2 = \left(x - \tfrac{1}{4}\right)^2 + \left(y - \tfrac{1}{2}\right)^2 \tag{14.173}$$
with an e-folding length scale l = 1/16. Periodic boundary conditions are imposed on all sides. All models were integrated for one rotation, at which time the solution should be identical to the initial condition. The convergence curves for the $l_2$ norm of the error are displayed in Figure 14.20 for the different formulations. The time integration uses RK4 for both the traditional Galerkin method (which we will refer to as CGM) and DGM; the time step was chosen so that spatial errors are dominant.
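Because the flow (14.172) is a solid-body rotation about (1/2, 1/2), the exact solution at any time can be obtained by following the characteristic through a point backward in time, i.e. rotating the point by −ωt and evaluating the initial condition (14.173) there. The Python sketch below does this with hypothetical function names; it is one way to obtain a reference solution for the error, not necessarily the one used for Figure 14.20, and it ignores periodic images of the hill, which is harmless here because the hill (l = 1/16) stays at least a distance 1/4 from the boundaries.

```python
import math

OMEGA = 2.0 * math.pi   # angular velocity: one full rotation takes t = 1
L = 1.0 / 16.0          # e-folding length scale of the hill

def velocity(x, y):
    """Solid-body rotation about (1/2, 1/2), eq. (14.172)."""
    return -OMEGA * (y - 0.5), OMEGA * (x - 0.5)

def initial_hill(x, y):
    """Gaussian hill centred at (1/4, 1/2), eq. (14.173)."""
    r2 = (x - 0.25) ** 2 + (y - 0.5) ** 2
    return math.exp(-r2 / L ** 2)

def exact_solution(x, y, t):
    """Trace the characteristic through (x, y) back to t = 0 and sample the initial condition."""
    a = -OMEGA * t                  # rotate backward by the angle swept in time t
    dx, dy = x - 0.5, y - 0.5
    x0 = 0.5 + dx * math.cos(a) - dy * math.sin(a)
    y0 = 0.5 + dx * math.sin(a) + dy * math.cos(a)
    return initial_hill(x0, y0)

print(exact_solution(0.75, 0.5, 0.5))   # half a rotation: the peak (value close to 1) is now at (3/4, 1/2)
```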
The convergence curves for CGM and DGM are similar and indicate an exponential decrease of the error as the spectral truncation is increased for a fixed elemental partition. In order to compare the benefits of h- versus p-refinement, we plot in the right panels of Figure 14.20 the $l_2$ error versus the total number of collocation points in each direction. The benefit of p-refinement for this infinitely smooth problem is readily apparent for CG: given a fixed number of collocation points, the error is smallest for the smallest number of elements, and hence for the highest spectral truncation. (The number of collocation points is given by K(N − 1) + 1, where N is the number of points per element and K is the number of elements.) The situation is not as clear for DGM, where the different curves tend to overlap. This is probably due to the discontinuous character of the interpolation: adjacent elements hold duplicate information about the solution, and the total number of degrees of freedom grows like KN.

14.6.2 Cone Experiment
The second experiment is designed to establish the behavior of the different methods in the presence of "mild" discontinuities. The flow field is the same as for the Gaussian hill, but the initial profile consists of a cone:
$$u(x, y, t = 0) = \max(0,\, 1 - 8r), \qquad r = \sqrt{\left(x - \tfrac{1}{4}\right)^2 + \left(y - \tfrac{1}{2}\right)^2} \tag{14.174}$$
The cone has a peak at r = 0 and decreases linearly to 0; there is a slope discontinuity at r = 1/8. The initial condition contours are similar to the ones depicted in Figure 14.19. The same numerical parameters were used as for the Gaussian hill problem. The presence of the discontinuity ruins the spectral convergence of the spectral element method. This is borne out by the convergence curves (not shown), which display only a 1/N convergence rate in the $l_2$ norm for a fixed number of elements; h-refinement is a more effective way to reduce the errors in the present case.
[Figure 14.20: two rows of panels, CGM (top) and DG (bottom). Left: error ε versus number of points per element N; right: ε versus total number of degrees of freedom N_df; logarithmic error axis; curves labeled by the number of elements in each direction.]
Figure 14.20: Convergence curves in the L2 norm for the Gaussian Hill initial condition using, from top to bottom, CGM and DG. The labels indicate the number of elements in each direction. The abscissa of the left graphs represents the spectral truncation, and that of the right graphs the total number of collocation points.

Figure 14.21: Contour lines of the rotating cone problem after one revolution for CG (left) and DG (right), using the 10×6 grid. The contours are irregularly spaced to highlight the Gibbs oscillations.

In the following, we compare the performance of the different schemes using a single resolution: a 10×10 elemental partition with 6 points per element. Figure 14.21 compares the solutions of the two schemes at the end of the simulations. We note that the contour levels are irregularly spaced and were chosen to highlight the presence of Gibbs oscillations around the 0-level contour. For CG, the oscillations are present in the entire computational region and have peaks that reach −0.03. Although the DG solution also exhibits Gibbs oscillations, these are confined to the immediate neighborhood of the cone, and their largest amplitude is one third of that observed with CG. Further reduction of these oscillations requires some form of dissipation, e.g. Laplacian, high-order filters, or slope limiters.
We observe that CG and DG show a similar decay in the peak amplitude of the cone, with DG doing a slightly better job at preserving it. Figure 14.22 shows the largest negative value of T as a function of the grid resolution for CG and DG. The DG simulation produces negative values that are smaller in magnitude than those of CG at the same resolution, by up to a factor of 5.

[Figure 14.22: log-log plot of −min(T) versus the number of elements K, with curves labeled 5, 7 and 9.]
Figure 14.22: Min(T) as a function of the number of elements at the end of the simulation. The red lines are for DG and the blue lines for CG. The number of points per element is fixed at 5, 7 and 9, as indicated by the labels.

Chapter 15 Linear Analysis

15.1 Linear Vector Spaces
In the following we will use the convention that bold roman letters, such as x, denote vectors, Greek symbols denote scalars (real or complex), and capital roman letters denote operators.

15.1.1 Definition of Abstract Vector Space
We call a set V of vectors a linear vector space if the following requirements are satisfied:
1. We can define an addition operation, denoted by '+', such that for any 2 elements x and y of the vector space, the result of the operation z = (x + y) belongs to V. We say that the set V is closed under addition. Furthermore, the addition must have the following properties:
(a) commutative: x + y = y + x
(b) associative: (x + y) + z = x + (y + z)
(c) neutral element: there exists a null vector 0 such that x + 0 = x
(d) for each vector x ∈ V there exists a vector y such that x + y = 0; we denote this vector by y = −x.
2. We can define a scalar multiplication operation between any vector x ∈ V and a scalar α such that αx ∈ V, i.e. V is closed under scalar multiplication.
The following properties must also hold for any 2 scalars α and β and any 2 vectors x and y:
(a) α(βx) = (αβ)x
(b) distributivity over scalar addition: (α + β)x = αx + βx
(c) distributivity over vector addition: α(x + y) = αx + αy
(d) 1x = x
(e) 0x = 0

15.1.2 Definition of a Norm
In order to provide the abstract vector space with a sense of length and distance, we define the norm, or length, of a vector x as the number ‖x‖. In order for this number to make sense as a distance, we put the following restrictions on the definition of the norm:
1. ‖αx‖ = |α| ‖x‖
2. Positivity: ‖x‖ > 0 ∀x ≠ 0, and ‖x‖ = 0 ⇔ x = 0
3. Triangle or Minkowski inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖
With the help of the norm we can now define the distance between 2 vectors x and y as ‖x − y‖, i.e. the norm of their difference. Two vectors are thus equal, or identical, if their distance is zero. Furthermore, we can now talk about the convergence of a sequence of vectors. Specifically, we say that a sequence $\mathbf{x}_n$ converges to x as n → ∞ if for any ε > 0 there is an N such that $\|\mathbf{x}_n - \mathbf{x}\| < \varepsilon$ for all n > N.

15.1.3 Definition of an Inner Product
It is generally useful to introduce the notion of angle between 2 vectors x and y. We thus define the scalar (x, y), which must satisfy the following properties:
1. conjugate symmetry: $(\mathbf{x},\mathbf{y}) = \overline{(\mathbf{y},\mathbf{x})}$, where the overbar denotes the complex conjugate.
2. linearity: (αx + βy, z) = α(x, z) + β(y, z)
3. positiveness: (x, x) > 0 ∀x ≠ 0, and (x, x) = 0 if x = 0.
The above properties imply the Schwarz inequality:
$$|(\mathbf{x},\mathbf{y})| \le (\mathbf{x},\mathbf{x})^{1/2}\,(\mathbf{y},\mathbf{y})^{1/2} \tag{15.1}$$
This inequality suggests that the inner product can be used to define a norm, $\|\mathbf{x}\| = (\mathbf{x},\mathbf{x})^{1/2}$. With the inner product we call 2 vectors orthogonal iff (x, y) = 0. Moreover, iff |(x, y)| = ‖x‖ ‖y‖ the 2 vectors are collinear, or aligned.

15.1.4 Basis
A set of vectors $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n$ different from 0 is called a basis for the vector space V if it has the following 2 properties:
1. Linear independence:
$$\sum_{i=1}^{N}\alpha_i\,\mathbf{e}_i = 0 \;\Leftrightarrow\; \alpha_i = 0 \;\;\forall i \tag{15.2}$$
If a vanishing combination with at least one nonzero $\alpha_i$ exists, the set is called linearly dependent, and one of the vectors can be written as a linear combination of the others.
2. Completeness: any vector z ∈ V can be written as a linear combination of the basis vectors.

The number of vectors needed to form a basis is called the dimension of the space V. Put another way, V is N-dimensional if it contains a set of N independent vectors but no set of (N + 1) independent vectors. If N independent vectors can be found for every N, no matter how large, we say that the vector space is infinite-dimensional.

A basis is very useful since it allows us to describe any element x of the vector space. Thus any vector x can be "expanded" in the basis $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n$ as
$$\mathbf{x} = \sum_{i=1}^{n}\alpha_i\,\mathbf{e}_i = \alpha_1\mathbf{e}_1 + \alpha_2\mathbf{e}_2 + \dots + \alpha_n\mathbf{e}_n \tag{15.3}$$
This representation is also unique, by the independence of the basis vectors. The question, of course, is how to find the coefficients $\alpha_i$, which are nothing but the coordinates of x in the basis $\mathbf{e}_i$. Taking the inner product of both sides of the equation with each basis vector, we come up with the following linear system of algebraic equations for the $\alpha_i$:
$$\begin{aligned}
\alpha_1(\mathbf{e}_1,\mathbf{e}_1) + \alpha_2(\mathbf{e}_2,\mathbf{e}_1) + \dots + \alpha_n(\mathbf{e}_n,\mathbf{e}_1) &= (\mathbf{x},\mathbf{e}_1)\\
\alpha_1(\mathbf{e}_1,\mathbf{e}_2) + \alpha_2(\mathbf{e}_2,\mathbf{e}_2) + \dots + \alpha_n(\mathbf{e}_n,\mathbf{e}_2) &= (\mathbf{x},\mathbf{e}_2)\\
&\;\;\vdots\\
\alpha_1(\mathbf{e}_1,\mathbf{e}_n) + \alpha_2(\mathbf{e}_2,\mathbf{e}_n) + \dots + \alpha_n(\mathbf{e}_n,\mathbf{e}_n) &= (\mathbf{x},\mathbf{e}_n)
\end{aligned} \tag{15.4}$$
The coupling between the equations makes it hard to compute the coordinates of a vector in a general basis, particularly for large N. Suppose, however, that the basis vectors are mutually orthogonal, that is, every vector $\mathbf{e}_i$ is orthogonal to every other vector $\mathbf{e}_j$: $(\mathbf{e}_i,\mathbf{e}_j) = 0$ for all i ≠ j. Another way of denoting this mutual orthogonality is by using the Kronecker delta $\delta_{ij}$:
$$\delta_{ij} = \begin{cases} 1, & i = j\\ 0, & i \ne j \end{cases} \tag{15.5}$$
The orthogonality property can then be written as $(\mathbf{e}_i,\mathbf{e}_j) = \delta_{ij}\,(\mathbf{e}_j,\mathbf{e}_j)$, and the basis is called orthogonal.
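The coupled system (15.4) is easy to solve directly when n is small. The Python sketch below assumes real vectors and the Euclidean inner product, and its function names are ours: it builds the matrix of inner products $(\mathbf{e}_i,\mathbf{e}_j)$ and solves for the coordinates by Gaussian elimination with partial pivoting.

```python
def inner(x, y):
    """Euclidean inner product of two real tuples."""
    return sum(a * b for a, b in zip(x, y))

def coordinates(x, basis):
    """Solve system (15.4) for the coordinates alpha_i of x in a general basis."""
    n = len(basis)
    # Augmented matrix: row j holds (e_i, e_j) for i = 0..n-1, followed by (x, e_j)
    A = [[inner(basis[i], basis[j]) for i in range(n)] + [inner(x, basis[j])]
         for j in range(n)]
    # Forward elimination with partial pivoting
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        A[k], A[p] = A[p], A[k]
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    # Back substitution
    alpha = [0.0] * n
    for k in reversed(range(n)):
        s = sum(A[k][i] * alpha[i] for i in range(k + 1, n))
        alpha[k] = (A[k][n] - s) / A[k][k]
    return alpha

basis = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)]   # independent, not orthogonal
print(coordinates((2.0, 3.0, 4.0), basis))   # approximately [-1, -1, 4]
```

For an orthogonal basis the matrix of inner products is diagonal, and the same routine simply returns $(\mathbf{x},\mathbf{e}_j)/(\mathbf{e}_j,\mathbf{e}_j)$ for each j.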
For an orthogonal basis the system reduces to the uncoupled (diagonal) system
$$(\mathbf{e}_j,\mathbf{e}_j)\,\alpha_j = (\mathbf{x},\mathbf{e}_j) \tag{15.6}$$
and the coordinates can be computed easily as
$$\alpha_j = \frac{(\mathbf{x},\mathbf{e}_j)}{(\mathbf{e}_j,\mathbf{e}_j)} = \frac{(\mathbf{x},\mathbf{e}_j)}{\|\mathbf{e}_j\|^2} \tag{15.7}$$
The basis can be made orthonormal by normalizing the basis vectors (i.e. rescaling each $\mathbf{e}_j$ so that its norm is 1); then $\alpha_j = (\mathbf{x},\mathbf{e}_j)$.

15.1.5 Example of a Vector Space
Consider a set V whose elements are N-tuples, i.e. each element x of V is identified by N scalars $(\xi_1, \xi_2, \dots, \xi_N)$. Let us define the addition of 2 vectors x, defined as above, and y, defined by the N-tuple $(\eta_1, \eta_2, \dots, \eta_N)$, as the vector z whose N-tuple is $\sigma_i = \xi_i + \eta_i$. Furthermore, we define the vector z = αx by its N-tuple $\sigma_i = \alpha\xi_i$. It is then easy to verify that this space, endowed with the vector addition and scalar multiplication defined above, fulfills the requirements of a vector space. A norm for this vector space can easily be defined by the so-called p-norm:
$$\|\mathbf{x}\|_p = \left(\sum_{i=1}^{N}|\xi_i|^p\right)^{1/p} \tag{15.8}$$
A particularly useful norm is the 2-norm (p = 2), also called the Euclidean norm. Other useful norms are the 1-norm (p = 1) and the infinity norm:
$$\|\mathbf{x}\|_\infty = \lim_{p\to\infty}\|\mathbf{x}\|_p = \max_{1\le j\le N}|\xi_j| \tag{15.9}$$
An inner product for this vector space can be defined as
$$(\mathbf{x},\mathbf{y}) = \sum_{i=1}^{N}\xi_i\,\overline{\eta_i} \tag{15.10}$$
It can easily be verified that this inner product satisfies all the required properties. Furthermore, the norm induced by this inner product is nothing but the 2-norm mentioned above. An orthonormal basis for the vector space V is given by
$$\begin{aligned}
\mathbf{e}_1 &= (1, 0, 0, \dots, 0)\\
\mathbf{e}_2 &= (0, 1, 0, \dots, 0)\\
&\;\;\vdots\\
\mathbf{e}_N &= (0, 0, 0, \dots, 1)
\end{aligned} \tag{15.11}$$
which is a complete and independent set of vectors. Thus V is N-dimensional.

15.1.6 Function Space
Spaces where the vectors are functions occupy an important place in linear analysis. Their properties as linear spaces are harder to manipulate since they are infinite-dimensional.
For example, the space of all continuous functions defined on the interval a ≤ t ≤ b, which we denote by C(a, b), is a linear vector space where vector addition and scalar multiplication are defined in the obvious way. The inner product on this vector space is the continuous analogue of the inner product defined for the N-tuple space. Suppose for a moment that the functions x(t) and y(t) are defined at N equally spaced points $t_i$ on the interval [a, b] (i.e. a discrete space) by their pointwise values $x_i$ and $y_i$, and let us define the discrete inner product as the Riemann-type sum
$$(x, y) = \frac{b-a}{N}\sum_{i=1}^{N} x_i\,y_i \tag{15.12}$$
In the limit as N tends to infinity the above discrete sum becomes
$$(x, y) = \lim_{N\to\infty}\frac{b-a}{N}\sum_{i=1}^{N} x_i\,y_i = \int_a^b x(t)\,y(t)\,dt \tag{15.13}$$
Two functions are said to be orthogonal if (x, y) = 0. Similarly, we can define the p-norm of a function as
$$\|x\|_p = \left(\int_a^b |x(t)|^p\,dt\right)^{1/p} \tag{15.14}$$
The 1-norm, 2-norm and ∞-norm follow by setting p = 1, 2 and ∞.

The main difficulties with function spaces are twofold. First, they are infinite-dimensional and thus require an infinite set of vectors to define a basis; proving completeness is hence difficult. Second, functions are usually defined on a continuum, where limits may lie inside or outside the vector space. Some interesting issues arise. Consider for example the sequence of functions
$$x_n(t) = \begin{cases} 0, & -1 \le t \le 0\\ nt, & 0 \le t \le \frac{1}{n}\\ 1, & \frac{1}{n} \le t \le 1 \end{cases} \tag{15.15}$$
defined in C(−1, 1). This sequence converges to the step function, also known as the Heaviside function H(t), which does not belong to C(−1, 1) since it is discontinuous. Thus, although every $x_n$ is in C(−1, 1), its limit as n → ∞ is not; we say that the space does not contain its limit points and hence is not closed. This is akin to the space C(−1, 1) having holes in it. This is rather unfortunate, as closed spaces are more tractable.
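Whether the sequence (15.15) converges to H depends on the norm used. The Python sketch below (function names are ours) approximates $\|x_n - H\|_2$ with a midpoint rule and compares it with the exact value $1/\sqrt{3n}$, which follows from $\int_0^{1/n}(1 - nt)^2\,dt = 1/(3n)$. It also records the largest pointwise difference on the grid, which stays close to 1 for every n (the exact supremum is 1). The sequence therefore converges to H in the 2-norm but not in the maximum norm.

```python
import math

def x_n(t, n):
    """Continuous ramp of eq. (15.15)."""
    if t <= 0.0:
        return 0.0
    return min(n * t, 1.0)

def heaviside(t):
    """Step function H(t); its value at t = 0 does not affect the integrals."""
    return 1.0 if t > 0.0 else 0.0

def distances(n, m=100_000):
    """Midpoint-rule estimate of ||x_n - H||_2 and the grid maximum of |x_n - H| on [-1, 1]."""
    dt = 2.0 / m
    sq_sum, sup = 0.0, 0.0
    for i in range(m):
        t = -1.0 + (i + 0.5) * dt
        d = abs(x_n(t, n) - heaviside(t))
        sq_sum += d * d * dt
        sup = max(sup, d)
    return math.sqrt(sq_sum), sup

for n in (10, 100, 1000):
    l2, sup = distances(n)
    print(f"n = {n:4d}: ||x_n - H||_2 = {l2:.5f} (1/sqrt(3n) = {1 / math.sqrt(3 * n):.5f}), max |x_n - H| = {sup:.4f}")
```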
It is possible to create a closed space by changing the definition of the space slightly to that of the Lebesgue space $L^2(a, b)$, namely the space of functions that are square integrable on the interval [a, b], i.e.
$$\|x\|_2^2 = \int_a^b |x(t)|^2\,dt < \infty \tag{15.16}$$
$L^2(a, b)$ is an example of a Hilbert space: a closed inner product space with the norm $\|x\| = (x, x)^{1/2}$.

The issue of defining a basis for a function space is complicated by the infinite dimension of the space. Assume we have an infinite set of linearly independent vectors; if we remove a single element of that set, the set is still infinite but clearly cannot generate the space. It turns out that it is possible to prove completeness, but we defer that discussion until later. For the moment, we assume that it is possible to define such a basis. Furthermore, this basis can be made orthogonal. The Legendre polynomials $P_n$ are an orthogonal set of functions over the interval [−1, 1], and the trigonometric functions $e^{inx}$ are orthogonal over [−π, π]. Suppose that an orthogonal and complete basis $e_i(t)$ has been defined; then we can expand a vector in this basis:
$$x = \sum_{i=1}^{\infty}\alpha_i\,e_i \tag{15.17}$$
The above expansion is referred to as a generalized Fourier series, and the $\alpha_i$ are the Fourier coefficients of x in the basis $e_i$. We can also follow the procedure outlined for finite-dimensional spaces to compute the $\alpha_i$ by taking the inner product of both sides of the expansion. The determination of the coordinates is, again, particularly easy if the basis is orthogonal:
$$\alpha_i = \frac{(x, e_i)}{(e_i, e_i)} \tag{15.18}$$
In particular, if the basis is orthonormal, $\alpha_i = (x, e_i)$. In the following we show that the Fourier coefficients give the best approximation to the function in the 2-norm, i.e. the coefficients $\alpha_i$ minimize the norm $\|x - \sum_i \alpha_i e_i\|_2$.
We have:
$$\Big\|x - \sum_i \alpha_i e_i\Big\|_2^2 = \Big(x - \sum_i \alpha_i e_i,\; x - \sum_j \alpha_j e_j\Big) \tag{15.19}$$
$$= (x, x) - \sum_i \overline{\alpha_i}\,(x, e_i) - \sum_i \alpha_i\,(e_i, x) + \sum_i\sum_j \alpha_i\,\overline{\alpha_j}\,(e_i, e_j) \tag{15.20}$$
Assuming the basis is orthonormal, the last term on the right-hand side simplifies to $\sum_i |\alpha_i|^2$. If we furthermore define $a_i = (x, e_i)$, we have:
$$\Big\|x - \sum_i \alpha_i e_i\Big\|_2^2 = \|x\|_2^2 + \sum_i\left[\alpha_i\overline{\alpha_i} - a_i\overline{\alpha_i} - \overline{a_i}\,\alpha_i\right] \tag{15.21}$$
$$= \|x\|_2^2 + \sum_i\left[(a_i - \alpha_i)\overline{(a_i - \alpha_i)} - a_i\overline{a_i}\right] \tag{15.22}$$
$$= \|x\|_2^2 + \sum_i |a_i - \alpha_i|^2 - \sum_i |a_i|^2 \tag{15.23}$$
Note that since the first and last terms are fixed, and the middle term is always greater than or equal to zero, the left-hand side is minimized by the choice $\alpha_i = a_i = (x, e_i)$. The minimum has the value $\|x\|^2 - \sum_i |a_i|^2$. Since this value must always be non-negative,
$$\sum_i |a_i|^2 \le \|x\|^2 \tag{15.24}$$
a result known as the Bessel inequality. If the basis set is complete, the minimum tends to zero as the number of basis functions increases to infinity, and we have
$$\sum_i |a_i|^2 = \|x\|^2 \tag{15.25}$$
which is known as the Parseval equality; this is a generalized "Pythagorean theorem".

15.1.7 Pointwise versus Global Convergence
Consider the 2 functions x(t) and y(t) defined on [−1, 1] as follows:
$$x(t) = |t| \tag{15.26}$$
$$y(t) = \begin{cases} |t|, & t \ne 0\\ 1, & t = 0 \end{cases} \tag{15.27}$$
Both functions belong to $L^2(-1, 1)$ (y is not continuous at t = 0) and are identical for all t except t = 0. If we use the 2-norm to judge the distance between the 2 functions, we get $\|x - y\|_2 = 0$, hence the functions are the same. However, in the maximum norm the 2 functions are not identical, since $\|x - y\|_\infty = 1$. This example makes apparent that the choice of norm is critical in deciding whether 2 functions are the same or different: 2 functions that are considered identical in one norm can be different in another. This is only an apparent contradiction, and reflects the fact that different norms measure different things.
The 2-norm, for example, looks at the global picture and asks whether the 2 functions are the same over the interval; this is the so-called mean-square convergence. The infinity norm, on the other hand, measures pointwise convergence.

15.2 Linear Operators
By an operator (or transformation) L we mean a mapping from one vector space, called the domain D, to another vector space, called the range R. L thus describes functions operating on vectors. As with functions, we require L to be single-valued, although not necessarily one-to-one, i.e. we may have Lx = Ly = z for x ≠ y. If, however, Lx = Ly implies x = y, we say that L is one-to-one, i.e. each vector z in R has a single corresponding x such that Lx = z. Two operators A and B are equal if they have the same domain and if Ax = Bx ∀x. Finally, we say that I is the identity operator if Ix = x ∀x, and Ø is the null operator if Øx = 0 ∀x. Operator addition and multiplication by a scalar can be defined: the operator C = A + B is defined by Cx = Ax + Bx, and C = αA is defined by Cx = α(Ax). The product of two operators C = AB is defined by Cx = A(Bx). We can easily see that operator addition commutes (since vector addition commutes), whereas AB need not equal BA; when they are equal, we say that A and B commute. We call an operator linear if
$$L(\alpha\mathbf{x} + \beta\mathbf{y}) = \alpha L\mathbf{x} + \beta L\mathbf{y} \tag{15.28}$$
An operator is bounded if there is a positive constant c such that ‖Lx‖ ≤ c‖x‖ ∀x ∈ D. The smallest such bound is called the norm of the operator, thus:
$$\|L\| = \sup_{\mathbf{x}\ne 0}\frac{\|L\mathbf{x}\|}{\|\mathbf{x}\|} \tag{15.29}$$
An adjoint to the operator L is the operator L* such that
$$(L\mathbf{x}, \mathbf{y}) = (\mathbf{x}, L^*\mathbf{y}) \quad \forall\,\mathbf{x}, \mathbf{y} \tag{15.30}$$
It is easy to show the following properties:
$$(L^*)^* = L \tag{15.31}$$
$$(A + B)^* = A^* + B^* \tag{15.32}$$
$$(AB)^* = B^*A^* \tag{15.33}$$
If L* = L we call the operator self-adjoint.

15.3 Eigenvalues and Eigenvectors
Nontrivial solutions x ≠ 0 of the equation
$$L\mathbf{x} = \lambda\mathbf{x} \tag{15.34}$$
are called eigenvectors; the scalars λ are called eigenvalues.
The statement above essentially asks whether there are special vectors which, when transformed by L, produce parallel vectors; the ratio of the lengths of these two vectors is the eigenvalue. The above equation can be restated as: find the nontrivial solutions of the homogeneous equation
$$(L - \lambda I)\,\mathbf{x} = 0 \tag{15.35}$$
Eigenvalues and eigenvectors thus occur in pairs. If the operator L is a matrix, the λ's can be determined by solving the characteristic equation det(L − λI) = 0. The following results are very important:
1. The eigenvalues of a self-adjoint operator are all real, and the eigenvectors corresponding to distinct eigenvalues are mutually orthogonal.
2. For a self-adjoint operator L on a finite-dimensional domain V, k mutually orthogonal eigenvectors can be found for each eigenvalue of multiplicity k.
3. The 2 properties above imply that the eigenvectors of a self-adjoint operator form a basis for the finite-dimensional space V.
The situation is substantially more complicated for infinite-dimensional spaces, and is taken care of by the Sturm-Liouville theory.

15.4 Sturm-Liouville Theory
We will change our notation and drop the bold face of the vector notation. Given 2 functions f and g on the interval a ≤ t ≤ b, we first define an inner product of the form
$$(f, g) = \int_a^b f(t)\,g(t)\,w(t)\,dt \tag{15.36}$$
where w(t) > 0 on a < t < b (this is a more general inner product than the one defined earlier, which corresponds to w(t) = 1). Let the operator L be defined as follows:
$$Ly = \frac{1}{w(t)}\left[\frac{d}{dt}\left(p(t)\,\frac{dy}{dt}\right) + r(t)\,y\right] \tag{15.37}$$
$$\alpha\,y(a) + \beta\,y'(a) = 0, \qquad \gamma\,y(b) + \delta\,y'(b) = 0 \tag{15.38}$$

[Figure 15.1: two panels of truncated Fourier series, left on 0 ≤ x ≤ 1, right on −1 ≤ x ≤ 1.]
Figure 15.1: Left: step function fit f = 1; blue line is for N = 1, black line for N = 7, and red line for N = 15.
Right: Fourier expansion of $f = e^{\sin\pi x}$; blue line is for N = 1, red for N = 2, and black for N = 3; the circles show the function f.

In the present case L is a differential operator subject to homogeneous boundary conditions. It is easy to show that the operator L is self-adjoint under the inner product defined in (15.36). Furthermore, the eigenvalue problem defined by
$$Ly + \lambda y = 0 \tag{15.39}$$
is called a Sturm-Liouville system. Sturm-Liouville systems satisfy the following crucial theorem:

Theorem 2. If both p(t) and w(t) are analytic and positive over the interval [a, b], where a and b are finite, then the eigenfunctions of the Sturm-Liouville system are complete over $L^2(a, b)$.

Completeness is also known to hold under other circumstances. For example, if p(t) vanishes at one or both end points, then the boundary conditions can be replaced with the requirement that y or y′ be finite there. If it happens that p(a) = p(b), we can replace the boundary conditions by the periodicity conditions y(a) = y(b) and y′(a) = y′(b). Note carefully that the domain of the Sturm-Liouville system is the set of functions that are twice differentiable and that satisfy certain boundary conditions, and yet the theorem asserts completeness over the much broader space $L^2$, which contains functions that need not even be continuous and that need not satisfy the boundary conditions. The various special forms of the Sturm-Liouville problem give rise to various commonly known Fourier series. For example, the choice w = p = 1 and r = 0 gives rise to the Fourier trigonometric series. For w = 1, p = 1 − t² we get the Fourier-Legendre series, etc.

Example 18 The function y(t) = 1 on 0 ≤ t ≤ 1 can be expanded in the trigonometric series $\sum_{m=1}^{N} f_m \sin(m\pi x)$. It is easy to verify that
$$(\sin m\pi x, \sin k\pi x) = \int_0^1 \sin k\pi x\,\sin m\pi x\,dx = \frac{\delta_{km}}{2} \tag{15.40}$$
and that the Fourier coefficients are given by $f_m = 0$ for even m, and $f_m = \frac{4}{m\pi}$ for odd m. Figure 15.1 illustrates the convergence of the series as the number of Fourier functions retained, N, increases.

Example 19 The function $f = e^{\sin\pi x}$ is periodic over the interval [−1, 1] and can be expanded into a Fourier series of the following form:
$$f(x) = \sum_{m=0}^{N}\left(A_m\cos m\pi x + B_m\sin m\pi x\right) \tag{15.41}$$
The Fourier coefficients can be determined from
$$A_m = \frac{\int_{-1}^{1} e^{\sin\pi x}\cos m\pi x\,dx}{\int_{-1}^{1}\cos^2 m\pi x\,dx}, \qquad B_m = \frac{\int_{-1}^{1} e^{\sin\pi x}\sin m\pi x\,dx}{\int_{-1}^{1}\sin^2 m\pi x\,dx} \tag{15.42}$$
Since the integrals cannot easily be evaluated analytically, we compute them numerically using a very high-order method. The first few Fourier coefficients are listed in Table 15.1. Notice in particular the rapid decrease of $|A_m|$ and $|B_m|$ as m increases, and the fact that with 3 Fourier modes the series expansion and the original function are visually identical.

Table 15.1: Fourier coefficients of Example 19.
m     A_m                          B_m
0     0.000000000000000            0.532131755504017
1     1.13031820798497             4.097072104153782×10^-17
2     4.968164449421373×10^-18     -0.271495339534077
3     -4.433684984866381×10^-02    -5.790930955238388×10^-17
4     -3.861917048379140×10^-17    5.474240442093724×10^-03
5     5.429263119140263×10^-04     2.457952426063564×10^-17
6     -3.898111864745672×10^-16    -4.497732295430222×10^-05
7     -3.198436462370136×10^-06    5.047870580187706×10^-17
8     3.363168791013840×10^-16     1.992124806619515×10^-07
9     1.103677177269183×10^-08     -1.091272736411650×10^-16
10    -4.748619297285671×10^-17    -5.505895970193617×10^-10
11    -2.497959777896152×10^-11    1.281761515677824×10^-16

15.5 Application to PDE
The method of separation of variables relies on the Sturm-Liouville theory to generate a basis made up of eigenfunctions for the function space where the solution is sought. The problem boils down to finding the Fourier coefficients of the solution in that basis. The type of eigenfunctions used depends on the geometry of the domain, the boundary conditions and the partial differential equation.
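Computing Fourier coefficients in an eigenfunction basis, the central step of the procedure just described, can be tried on Example 18. The Python sketch below (function names are ours) evaluates $f_m = 2\int_0^1 \sin(m\pi x)\,dx$ with a midpoint rule, using the normalization (15.40), and sums the truncated series; it should reproduce $f_m = 4/(m\pi)$ for odd m and 0 for even m. The overshoot of the truncated sum near x = 0 is the Gibbs phenomenon.

```python
import math

def sine_coefficient(f, m, n_quad=4000):
    """f_m = (f, sin m pi x) / (sin m pi x, sin m pi x) on [0, 1]; the denominator
    is 1/2 by eq. (15.40), the numerator is computed with the midpoint rule."""
    dx = 1.0 / n_quad
    s = sum(f((i + 0.5) * dx) * math.sin(m * math.pi * (i + 0.5) * dx) for i in range(n_quad))
    return 2.0 * s * dx

def partial_sum(coeffs, x):
    """Evaluate sum_{m=1}^{N} f_m sin(m pi x), with coeffs = [f_1, ..., f_N]."""
    return sum(fm * math.sin(m * math.pi * x) for m, fm in enumerate(coeffs, start=1))

def step(x):
    """y(t) = 1 on 0 <= t <= 1 (Example 18)."""
    return 1.0

coeffs = [sine_coefficient(step, m) for m in range(1, 16)]   # N = 15 terms
for m, fm in enumerate(coeffs[:5], start=1):
    exact = 4.0 / (m * math.pi) if m % 2 else 0.0
    print(f"m = {m}: quadrature {fm:+.6f}, exact {exact:+.6f}")
print("S_15(1/2)  =", partial_sum(coeffs, 0.5))      # about 0.96; tends to 1 as N grows
print("S_15(1/16) =", partial_sum(coeffs, 1 / 16))   # about 1.18: Gibbs overshoot near x = 0
```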
Example 20. Let us take the example of the Laplace equation $\nabla^2 u = 0$ defined on the rectangular domain $0 \le x \le a$ and $0 \le y \le b$, and subject to the boundary conditions that $u = 0$ on all boundaries except the top boundary, where $u(x, y = b) = v(x)$. Separation of variables assumes that the solution can be written as the product of functions that each depend on a single independent variable: $u(x, y) = X(x)Y(y)$. When this trial solution is substituted in the PDE, we can derive the identity:

$$\frac{X_{xx}}{X} = -\frac{Y_{yy}}{Y} \tag{15.43}$$

Since the left-hand side is a function of $x$ only, the right-hand side a function of $y$ only, and the equality must hold for arbitrary $x$ and $y$, the two ratios must be equal to a constant, which we set to $-\lambda^2$, and we end up with the 2 equations

$$X_{xx} + \lambda^2 X = 0 \tag{15.44}$$

$$Y_{yy} - \lambda^2 Y = 0 \tag{15.45}$$

Notice that the 2 equations are in the form of a Sturm-Liouville problem. The solutions of the above 2 equations are the following sets of functions:

$$X = A \cos \lambda x + B \sin \lambda x \tag{15.46}$$

$$Y = C \cosh \lambda y + D \sinh \lambda y \tag{15.47}$$

where $A$, $B$, $C$ and $D$ are integration constants. Applying the boundary conditions at $x = 0$ and $y = 0$, we deduce that $A = C = 0$. The boundary condition at $x = a$ produces the equation

$$B \sin \lambda a = 0 \tag{15.48}$$

The solution $B = 0$ is rejected since it would result in the trivial solution $u = 0$, and hence we require that $\sin \lambda a = 0$, which yields the permissible values of $\lambda$:

$$\lambda_n = \frac{n\pi}{a}, \quad n = 1, 2, \ldots \tag{15.49}$$

There are thus an infinite number of possible $\lambda$, which we have tagged by the subscript $n$, with corresponding $X_n(x)$, $Y_n(y)$, and unknown constants. Since the problem is linear, the sum of these solutions is also a solution, and hence we set

$$u(x, y) = \sum_{n=1}^{\infty} E_n \sin \lambda_n x \, \sinh \lambda_n y \tag{15.50}$$

where we have set $E_n = B_n D_n$. The last unused boundary condition determines the constants $E_n$ through the equation

$$v(x) = \sum_{n=1}^{\infty} E_n \sin \lambda_n x \, \sinh \lambda_n b \tag{15.51}$$

It is easy to show that the functions $\sin \lambda_n x$ are orthogonal over the interval $0 \le x \le a$, i.e.

$$\int_0^a \sin \lambda_n x \, \sin \lambda_m x \, dx = \frac{a}{2}\,\delta_{nm} \tag{15.52}$$

which, after multiplying (15.51) by $\sin \lambda_n x$ and integrating, leads to

$$E_n = \frac{2}{a \sinh \lambda_n b} \int_0^a \sin \lambda_n x \, v(x) \, dx. \tag{15.53}$$

The procedure outlined above can be reinterpreted as follows: the $\sin \lambda_n x$ are the eigenfunctions of the Sturm-Liouville problem (15.44) and the eigenvalues are given by (15.49); the basis is complete and hence can be used to represent the solution as in equation (15.50); the coordinates of the solution in that basis are determined by (15.53).

The eigenvalues and eigenfunctions depend on the partial differential equation, the boundary conditions, and the shape of the domain. The next few examples illustrate this dependence.

Example 21. The heat equation $u_t = \nu(u_{xx} + u_{yy})$ in the same domain as Example 20, subject to homogeneous Dirichlet boundary conditions on all sides, will generate the eigenvalue-eigenfunction pairs $\sin\frac{m\pi x}{a}$ in the $x$-direction and $\sin\frac{n\pi y}{b}$ in the $y$-direction. The solution can be written as the double series:

$$u(x, y, t) = \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} A_{mn} \sin\frac{m\pi x}{a}\, \sin\frac{n\pi y}{b}\, e^{-\alpha_{mn} t}, \qquad \alpha_{mn} = \nu\left(\frac{m^2\pi^2}{a^2} + \frac{n^2\pi^2}{b^2}\right) \tag{15.54}$$

Changing the boundary conditions to homogeneous Neumann conditions in the $y$-direction will change the eigenfunctions to $\cos\frac{n\pi y}{b}$. If the boundary condition is homogeneous Neumann at the bottom and Dirichlet at the top, the eigenfunctions in the $y$-direction become $\cos\frac{(2n+1)\pi y}{2b}$, $n = 0, 1, 2, \ldots$

Example 22. Solution of the wave equation $u_{tt} = c^2 \nabla^2 u$ in the disk $0 \le r \le a$ using cylindrical coordinates will produce the following set of equations in each independent variable:

$$T_{tt} + \kappa^2 c^2 T = 0 \tag{15.55}$$

$$\Theta_{\theta\theta} + \lambda^2 \Theta = 0 \tag{15.56}$$

$$r^2 R_{rr} + r R_r + (\kappa^2 r^2 - \lambda^2) R = 0 \tag{15.57}$$

Since the domain is periodic in the azimuthal direction, we should expect a periodic solution, and hence $\lambda$ must be an integer, $\lambda = n$. The radial equation is nothing but the Bessel equation in the variable $\kappa r$; its solutions are given by $R = A_n J_n(\kappa r) + B_n Y_n(\kappa r)$. $B_n = 0$ must be imposed if the solution is to be finite at $r = 0$, since $Y_n$ is singular there.
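The radial wavenumbers of Example 22 are fixed by the zeros $\xi_{mn}$ of $J_n$ (the Dirichlet condition at $r = a$ gives $\kappa_{mn} = \xi_{mn}/a$), and these zeros are usually computed numerically. The module below is a minimal sketch, assuming the Fortran 2008 intrinsic bessel_jn is available, and that the zeros can be bracketed by scanning $J_n$ with a step (0.1) much smaller than their spacing, which is close to $\pi$. Each bracket is then refined by bisection. The module and routine names are hypothetical; a short main program that uses the module is needed to run it.

```fortran
! Minimal sketch: first nz positive zeros of the Bessel function J_n,
! found by scanning for sign changes and refining each one by bisection.
! Requires the Fortran 2008 intrinsic bessel_jn. Names are hypothetical.
module bessel_roots
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine jn_zeros(n, nz, xi)
    integer,  intent(in)  :: n, nz
    real(dp), intent(out) :: xi(nz)
    real(dp), parameter :: h = 0.1_dp     ! scan step, well below the zero spacing
    real(dp) :: x0, x1, xm
    integer  :: k, it
    k  = 0
    x0 = h                                ! start past x = 0, where J_n(0) = 0 for n >= 1
    do while (k < nz)
      x1 = x0 + h
      if (bessel_jn(n, x0)*bessel_jn(n, x1) < 0.0_dp) then
        ! sign change: a zero lies in (x0, x1); halve the bracket repeatedly
        do it = 1, 60
          xm = 0.5_dp*(x0 + x1)
          if (bessel_jn(n, x0)*bessel_jn(n, xm) <= 0.0_dp) then
            x1 = xm
          else
            x0 = xm
          end if
        end do
        k = k + 1
        xi(k) = 0.5_dp*(x0 + x1)
      end if
      x0 = x1                             ! continue scanning past this interval
    end do
  end subroutine jn_zeros
end module bessel_roots
```

For example, `call jn_zeros(0, 2, xi)` should return approximately 2.404826 and 5.520078, the first two zeros of $J_0$; dividing by $a$ then gives the radial wavenumbers.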
The eigenvalues $\kappa$ are determined by imposing a boundary condition at $r = a$. For a homogeneous Dirichlet condition, the eigenvalues are determined by the roots $\xi_{mn}$ of the Bessel functions, $J_n(\xi_{mn}) = 0$, and hence $\kappa_{mn} = \xi_{mn}/a$. Note that $\kappa_{mn}$ is the radial wavenumber. The solution can now be expanded as:

$$u(r, \theta, t) = \sum_{n=0}^{\infty}\sum_{m=0}^{\infty} J_n(\kappa_{mn} r)\left[(A_{mn}\cos n\theta + B_{mn}\sin n\theta)\cos\sigma_{mn} t + (C_{mn}\cos n\theta + D_{mn}\sin n\theta)\sin\sigma_{mn} t\right] \tag{15.58}$$

where $\sigma_{mn} = \kappa_{mn} c$ is the time frequency. The integration constants must be determined from the initial conditions, of which there must be 2 since we have a second derivative in time. In the present example the radial eigenfunctions are given by the Bessel functions of the first kind and order $n$, and the eigenvalues are determined by the roots of $J_n$. Notice that the Bessel equation is also a Sturm-Liouville problem, and hence the basis $J_n(\kappa_{mn} r)$ must be complete and orthogonal (with respect to the weight $r$). Periodicity in the azimuthal direction yields the trigonometric functions, and quantizes the eigenvalues to the set of integers.

Chapter 16 Rudiments of Linear Algebra

16.1 Vector Norms and Angles

We need to generalize the notions of distance and angle to multi-dimensional systems. These notions are intuitive for two- and three-dimensional vectors and provide a basis for the generalization. The generalization does not prescribe a single formula for a norm or distance; rather, it lists the properties that a measure must satisfy to be called a distance or an angle.

16.1.1 Vector Norms

A vector norm is a measure of the size of a vector: it is a function that associates a real, non-negative number with each member of the vector space. The norm of a vector $\mathbf{u}$ is denoted by $\|\mathbf{u}\|$ and must satisfy the following properties:

1. Positivity: $\|\mathbf{u}\| \ge 0$, and if $\|\mathbf{u}\| = 0$ then $\mathbf{u} = 0$. All norms are non-negative and only the null vector has zero norm.
2. Scalar multiplication: $\|\alpha\mathbf{u}\| = |\alpha|\,\|\mathbf{u}\|$ for any scalar $\alpha$.
3. Triangle inequality: $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$ for any two vectors $\mathbf{u}$ and $\mathbf{v}$.

For a vector $\mathbf{u} = (u_1, u_2, \ldots, u_N)$, an operator that defines a norm is the so-called $L_p$ norm, defined as:

$$\|\mathbf{u}\|_p = \left(\sum_{i=1}^{N} |u_i|^p\right)^{1/p} \tag{16.1}$$

The following are frequently used values of $p$:

• 1-norm, $p = 1$: $\|\mathbf{u}\|_1 = \sum_{i=1}^{N} |u_i|$;
• 2-norm, $p = 2$: $\|\mathbf{u}\|_2 = \left(\sum_{i=1}^{N} |u_i|^2\right)^{1/2}$;
• max-norm, $p = \infty$: $\|\mathbf{u}\|_\infty = \max_i |u_i|$.

16.1.2 Inner product

An inner product is an operation that associates a real number with any two vectors $\mathbf{u}$ and $\mathbf{v}$. It is usually denoted by $(\mathbf{u}, \mathbf{v})$ or $\mathbf{u}\cdot\mathbf{v}$, and has the following properties:

1. Commutative: $(\mathbf{u}, \mathbf{v}) = (\mathbf{v}, \mathbf{u})$.
2. Linear under scalar multiplication: $(\alpha\mathbf{u}, \mathbf{v}) = \alpha(\mathbf{u}, \mathbf{v})$.
3. Linear under vector addition: $(\mathbf{u}, \mathbf{v} + \mathbf{w}) = (\mathbf{u}, \mathbf{v}) + (\mathbf{u}, \mathbf{w})$.
4. Positivity: $(\mathbf{u}, \mathbf{u}) \ge 0$, and $(\mathbf{u}, \mathbf{u}) = 0$ implies $\mathbf{u} = 0$.

The norm and inner product definitions allow us to define the cosine of the angle between two vectors as:

$$\cos\theta = \frac{(\mathbf{u}, \mathbf{v})}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \tag{16.2}$$

The properties of the inner product allow it to define a vector norm, often called the inner-product induced norm: $\|\mathbf{u}\| = \sqrt{(\mathbf{u}, \mathbf{u})}$.

16.2 Matrix Norms

Properties of a matrix norm:

• Positivity: $\|L\| \ge 0$ for all $L$, and $\|L\| = 0$ implies $L = 0$.
• Scalar multiplication: $\|\alpha L\| = |\alpha|\,\|L\|$, where $\alpha$ is a scalar.
• Triangle inequality: $\|L + M\| \le \|L\| + \|M\|$.
• Sub-multiplicativity: $\|LM\| \le \|L\|\,\|M\|$.

Since matrices and vectors occur together, it is reasonable to put some conditions on their respective norms. Matrix and vector norms are compatible if:

$$\|L\mathbf{u}\| \le \|L\|\,\|\mathbf{u}\| \quad \forall\, \mathbf{u} \ne 0 \tag{16.3}$$

It is possible to use a vector norm to define a matrix norm, the so-called subordinate norm:

$$\|L\| = \max_{\mathbf{u} \ne 0}\frac{\|L\mathbf{u}\|}{\|\mathbf{u}\|} = \max_{\|\mathbf{u}\| = 1}\|L\mathbf{u}\| \tag{16.4}$$

Here are some common matrix norms; the three listed are subordinate to the vector 1-, ∞- and 2-norms respectively, and hence satisfy the compatibility condition:

1. 1-norm: $\|L\|_1 = \max_j \left(\sum_{i=1}^{N} |l_{ij}|\right)$. This is also referred to as the maximum column sum.
2. ∞-norm: $\|L\|_\infty = \max_i \left(\sum_{j=1}^{N} |l_{ij}|\right)$. This is also referred to as the maximum row sum.
3. 2-norm: $\|L\|_2 = \sqrt{\rho(L^T L)}$, where $\rho$ is the spectral radius (see below). If the matrix $L$ is symmetric, $L^T = L$, then $\|L\|_2 = \sqrt{\rho(L^2)} = \rho(L)$.
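The norms and the angle formula above translate directly into array operations. The module below is a sketch with hypothetical names; it relies only on the standard intrinsics sum, maxval, abs and dot_product, and computes the matrix 1-norm and ∞-norm as the maximum column and row sums. A short main program that uses the module is needed to run it.

```fortran
! Sketch: vector norms (16.1), the angle formula (16.2), and the matrix
! 1- and infinity-norms of Section 16.2. Names are hypothetical.
module norms
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! L_p norm (16.1), for finite p >= 1
  pure real(dp) function norm_p(u, p)
    real(dp), intent(in) :: u(:), p
    norm_p = sum(abs(u)**p)**(1.0_dp/p)
  end function norm_p

  ! max-norm, the limit p -> infinity
  pure real(dp) function norm_inf(u)
    real(dp), intent(in) :: u(:)
    norm_inf = maxval(abs(u))
  end function norm_inf

  ! cosine of the angle between u and v (16.2),
  ! using the inner-product induced norm sqrt((u,u))
  pure real(dp) function cos_angle(u, v)
    real(dp), intent(in) :: u(:), v(:)
    cos_angle = dot_product(u, v) / &
                (sqrt(dot_product(u, u))*sqrt(dot_product(v, v)))
  end function cos_angle

  ! matrix 1-norm: maximum column sum (dim=1 sums over the row index)
  pure real(dp) function mat_norm1(L)
    real(dp), intent(in) :: L(:,:)
    mat_norm1 = maxval(sum(abs(L), dim=1))
  end function mat_norm1

  ! matrix infinity-norm: maximum row sum (dim=2 sums over the column index)
  pure real(dp) function mat_norm_inf(L)
    real(dp), intent(in) :: L(:,:)
    mat_norm_inf = maxval(sum(abs(L), dim=2))
  end function mat_norm_inf
end module norms
```

For $\mathbf{u} = (3, -4)$, norm_p(u, 1.0_dp), norm_p(u, 2.0_dp) and norm_inf(u) give 7, 5 and 4; for the 2 × 2 matrix with rows (1, -2) and (3, 4), mat_norm1 gives 6 and mat_norm_inf gives 7.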
16.3 Eigenvalues and Eigenvectors

Eigenvalues and eigenvectors of a matrix $L$ are the pairs $(\lambda, \mathbf{u})$ such that

$$L\mathbf{u} = \lambda\mathbf{u}, \quad \mathbf{u} \ne 0 \tag{16.5}$$

The eigenvalues can be determined by rewriting the definition as $(L - \lambda I)\mathbf{u} = 0$ and requiring that this equation have a non-trivial solution. The condition is that the determinant of the matrix be zero: $\det(L - \lambda I) = 0$. This is the characteristic equation of the system. For an $N \times N$ matrix, the characteristic equation is a polynomial of degree $N$ that admits $N$ complex roots (counted with multiplicity). These roots are precisely the eigenvalues of the system. Denoting the eigenvalues and eigenvectors of the matrix $L$ by $\lambda_i$ and $\mathbf{u}_i$, the following properties hold:

• The transpose of the matrix, $L^T$, has the same eigenvalues as the matrix $L$ (its eigenvectors, however, generally differ unless $L$ is symmetric).
• The matrix $L^2 = LL$ has the same eigenvectors as $L$ and has the eigenvalues $\lambda_i^2$.
• If the inverse of the matrix, $L^{-1}$, exists, then its eigenvectors are the same as those of $L$ and its eigenvalues are $1/\lambda_i$.
• If $B = a_0 I + a_1 L + a_2 L^2 + \cdots + a_p L^p$ is a polynomial in $L$, then it has the same eigenvectors as $L$ and its eigenvalues are $a_0 + a_1\lambda_i + a_2\lambda_i^2 + \cdots + a_p\lambda_i^p$.
• If $L$ is a real symmetric matrix ($L^T = L$), then all its eigenvalues are real, and eigenvectors corresponding to distinct eigenvalues are mutually orthogonal.

16.4 Spectral Radius

The spectral radius, $\rho(L)$, of a matrix $L$ is the largest magnitude of its eigenvalues:

$$\rho(L) = \max_i |\lambda_i| \tag{16.6}$$

The spectral radius is a lower bound for all compatible matrix norms. The proof is simple. Let the matrix $L$ have the eigenvalues and eigenvectors $\lambda_i$, $\mathbf{u}_i$. Then

$$|\lambda_i|\,\|\mathbf{u}_i\| = \|\lambda_i\mathbf{u}_i\| = \|L\mathbf{u}_i\| \le \|L\|\,\|\mathbf{u}_i\| \tag{16.7}$$

and hence $|\lambda_i| \le \|L\|$. This result holds for all eigenvalues, and in particular for the eigenvalue with the largest magnitude, i.e. the spectral radius:

$$\rho(L) = \max_i |\lambda_i| \le \|L\| \tag{16.8}$$

The above result holds for all matrix norms.
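The bound (16.8) is easy to check numerically. The sketch below estimates $\rho(L)$ by power iteration, a standard technique that the text does not describe, assuming that $L$ is real and symmetric with a single eigenvalue of largest magnitude, and that the starting vector $(1, 1, \ldots, 1)$ has a component along its eigenvector. It returns the Rayleigh quotient of the final iterate. The module and function names are hypothetical, and norm2 is a Fortran 2008 intrinsic.

```fortran
! Minimal sketch: power-iteration estimate of the spectral radius of a
! real symmetric matrix with a unique eigenvalue of largest magnitude.
! Module and function names are hypothetical.
module power_iteration
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  function spectral_radius(L, niter) result(rho)
    real(dp), intent(in) :: L(:,:)
    integer,  intent(in) :: niter            ! number of power iterations
    real(dp) :: rho
    real(dp) :: u(size(L,1)), w(size(L,1))
    integer  :: k
    u = 1.0_dp                               ! starting vector (1,1,...,1)
    u = u/norm2(u)
    do k = 1, niter
      w = matmul(L, u)                       ! amplifies the dominant eigenvector
      u = w/norm2(w)                         ! renormalize to avoid overflow
    end do
    rho = abs(dot_product(u, matmul(L, u)))  ! Rayleigh quotient (u has unit norm)
  end function spectral_radius
end module power_iteration
```

For the 3 × 3 matrix with 2 on the diagonal and -1 on the first off-diagonals, the eigenvalues are $2 - \sqrt{2}$, $2$ and $2 + \sqrt{2}$, so spectral_radius should return about 3.414, below $\|L\|_1 = \|L\|_\infty = 4$.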
For the case of the 1- and ∞-norms, we have Gershgorin's first theorem: the spectral radius is less than or equal to the largest sum of the absolute values of the row or column entries, namely $\rho(L) \le \|L\|_1$ and $\rho(L) \le \|L\|_\infty$.

Gershgorin's second theorem puts a limit on where to find the eigenvalues of a matrix in the complex plane. Each eigenvalue $\lambda$ lies within at least one of the circles centered at the diagonal entries $l_{i,i}$ of the matrix, with radius $R_i$; that is, for some $i$,

$$|\lambda - l_{i,i}| \le R_i = \sum_{j=1,\, j \ne i}^{N} |l_{i,j}| = |l_{i,1}| + |l_{i,2}| + \cdots + |l_{i,i-1}| + |l_{i,i+1}| + \cdots + |l_{i,N}| \tag{16.9}$$

16.5 Eigenvalues of Tridiagonal Matrices

A tridiagonal matrix is a matrix whose only non-zero entries are on the main diagonal and on the first upper and lower diagonals:

$$L = \begin{pmatrix} l_{1,1} & l_{1,2} & & & \\ l_{2,1} & l_{2,2} & l_{2,3} & & \\ & l_{3,2} & l_{3,3} & l_{3,4} & \\ & & \ddots & \ddots & \ddots \\ & & & l_{N,N-1} & l_{N,N} \end{pmatrix} \tag{16.10}$$

Tridiagonal matrices occur often in the discretization of one-dimensional partial differential equations. The eigenvalues of some of these tridiagonal matrices can be derived analytically for constant-coefficient PDEs. In this case the matrix takes the form:

$$L = \begin{pmatrix} a & b & & & \\ c & a & b & & \\ & \ddots & \ddots & \ddots & \\ & & c & a & b \\ & & & c & a \end{pmatrix} \tag{16.11}$$

The eigenvalues and corresponding eigenvectors are given by

$$\lambda_i = a + 2\sqrt{bc}\,\cos\frac{i\pi}{N+1}, \qquad \mathbf{u}_i = \begin{pmatrix} (c/b)^{1/2}\sin\frac{i\pi}{N+1} \\ (c/b)^{2/2}\sin\frac{2i\pi}{N+1} \\ \vdots \\ (c/b)^{j/2}\sin\frac{ji\pi}{N+1} \\ \vdots \\ (c/b)^{N/2}\sin\frac{Ni\pi}{N+1} \end{pmatrix} \tag{16.12}$$

for $i = 1, 2, \ldots, N$.

For periodic partial differential equations, the tridiagonal system is often of the form

$$L = \begin{pmatrix} a & b & & & c \\ c & a & b & & \\ & \ddots & \ddots & \ddots & \\ & & c & a & b \\ b & & & c & a \end{pmatrix} \tag{16.13}$$

The eigenvalues are then given by:

$$\lambda_i = a + (b + c)\cos\frac{2\pi(i-1)}{N} + \sqrt{-1}\,(c - b)\sin\frac{2\pi(i-1)}{N}, \quad i = 1, 2, \ldots, N \tag{16.14}$$

A final useful result is the following. If a real tridiagonal matrix has either all its off-diagonal elements positive or all its off-diagonal elements negative, then all its eigenvalues are real.
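Formula (16.12) can be checked by forming $L\mathbf{u}_i - \lambda_i\mathbf{u}_i$ directly. The sketch below assumes $bc > 0$ and evaluates $\lambda_i$ as $a + 2\,\mathrm{sign}(b)\sqrt{bc}\cos\frac{i\pi}{N+1}$. When $b$ and $c$ are both negative, this sign is what pairs $\lambda_i$ with the eigenvector $\mathbf{u}_i$ of (16.12); the set of eigenvalues is unchanged, because $\cos\frac{(N+1-i)\pi}{N+1} = -\cos\frac{i\pi}{N+1}$. The matrix is applied without being stored, and each eigenvalue can also be checked against the Gershgorin bound (16.9), whose interior discs are centered at $a$ with radius $|b| + |c|$. Module and function names are hypothetical.

```fortran
! Sketch: eigenpairs (16.12) of the constant-coefficient tridiagonal
! matrix (16.11): a on the diagonal, b above it, c below it, b*c > 0.
! Module and function names are hypothetical.
module tridiag_eig
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: pi = 4.0_dp*atan(1.0_dp)
contains
  ! Eigenvalue i of the N x N matrix
  pure real(dp) function tri_eigval(a, b, c, n, i)
    real(dp), intent(in) :: a, b, c
    integer,  intent(in) :: n, i
    ! sign(x, b) is |x| with the sign of b; it pairs lambda_i with u_i
    tri_eigval = a + 2.0_dp*sign(sqrt(b*c), b)*cos(real(i, dp)*pi/real(n+1, dp))
  end function tri_eigval

  ! Eigenvector i: components (c/b)**(j/2) * sin(j*i*pi/(N+1))
  pure function tri_eigvec(b, c, n, i) result(u)
    real(dp), intent(in) :: b, c
    integer,  intent(in) :: n, i
    real(dp) :: u(n)
    integer  :: j
    do j = 1, n
      u(j) = sqrt(c/b)**j * sin(real(j*i, dp)*pi/real(n+1, dp))
    end do
  end function tri_eigvec

  ! w = L u for the matrix (16.11), without storing L
  pure function tri_matvec(a, b, c, u) result(w)
    real(dp), intent(in) :: a, b, c, u(:)
    real(dp) :: w(size(u))
    integer  :: j, n
    n = size(u)
    do j = 1, n
      w(j) = a*u(j)
      if (j > 1) w(j) = w(j) + c*u(j-1)   ! sub-diagonal entry
      if (j < n) w(j) = w(j) + b*u(j+1)   ! super-diagonal entry
    end do
  end function tri_matvec
end module tridiag_eig
```

In a driver program, the residual maxval(abs(tri_matvec(a, b, c, u) - lam*u)), with u = tri_eigvec(b, c, n, i) and lam = tri_eigval(a, b, c, n, i), should be at round-off level for every i.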
Chapter 17 Programming Tips

17.1 Introduction

The implementation of numerical algorithms requires familiarity with a number of software packages and utilities. Here is a short list of the minimum required to get started in programming:

1. Basics of the operating system, such as file manipulation. The RSMAS library has UNIX: The Basics. The library also has a number of electronic books on the subject. Two titles I came across are Unix Unleashed and Learning the Unix Operating System (Nutshell Handbook).
2. A text editor to write the computer program. Most Unix books have a short tutorial on using either vi or emacs for editing text files. There are a number of simpler visual editors too, such as jed. Web sites for vi or its close cousin vim are:
   • http://www.asu.edu/it/fyi/dst/helpdocs/editing/vi/ This is a very concise and fast introduction to vi. Start with it and then go to the other web sites for more in-depth information.
   • http://docs.freebsd.org/44doc/usd/12.vi/paper.html
   • http://www.vim.org/
   To learn about emacs, and its visual counterpart, xemacs, visit:
   • http://www.math.utah.edu/lab/unix/emacs.html This seems like a good and brief introduction, so that you can be editing files with simple commands.
   • http://cmgm.stanford.edu/classes/unix.emacs.html
   • http://www.lib.chicago.edu/keith/tclcourse/emacs-tutorial.html-
   • http://www.xemacs.org/ This is the graphical user interface version of emacs. It is much like Notepad in Windows in that the commands can be entered visually.
3. Knowledge of a programming language. Fortran is still the preferred language for numerical programming, although C and C++ have been used in certain applications requiring sophisticated data structures. Section 17.2 has an example to introduce various elements of the language.
4. A compiler to turn the program into machine instructions.
   Section 17.3 discusses compiler issues and how to use the compiler options to help you track down errors in the coding.
5. A debugger to track down bugs in a computer program.
6. Visualization software to plot the results of computations.
7. Various mathematical libraries, such as LAPACK for linear algebra routines and FFTPACK for Fast Fourier Transforms.

17.2 Fortran Example

The following is a simple Fortran 90 code that makes use of simple language syntax such as declaring variables, looping, opening a file, and writing data out. You should type in the program to practice the editor commands, compile it, and run it. Then use a visualization package to plot the results.

!
! This is a sample fortran program to introduce the language.
! It divides the interval [-1 1] into M points and computes the
! functions sin(pi*x) and cos(pi*x). The results are written to the
! terminal and to a file called waves.out
!
! Comments are marked with an exclamation mark; the compiler
! ignores all characters after the "!" sign.
!
! A fortran statement that does not fit into a single line can
! be continued on the next one by terminating it with a "&" sign.
program waves                   ! name of program unit
  implicit none                 ! prevents compiler from assigning default types
                                ! and forces user to declare every variable
!.Variable Declaration starts here
  integer, parameter :: M=21    ! declares the value of a constant
                                ! that does not change throughout
                                ! the calculation
  integer :: i                  ! declares an integer counter
  real, parameter :: xmin=-1.0, xmax=1.0 ! single precision constants
  real :: f(M)                  ! real array with M entries
  real :: pi,x,y,dx
!.End of Variable Declaration

! Executable statements are below

! Unit 6 is the terminal also called stdout
  write(6,*)'Hello World'

! Open a file to write out the data.
  open(unit=9, &                ! file unit is number 9
       file='waves.out', &      ! output file name is waves.out
       form='formatted', &      ! data written in ASCII
       status='unknown', &      ! create file if it does not exist already
       action='write')          ! file meant for writing

  pi = 2.0*asin(1.0)            ! initialize pi
  dx = (xmax-xmin)/real(M-1)    ! grid-size

  do i = 1,M                    ! counter: starts at 1, increments by 1 and ends at M
!...indent statements within loop for clarity
     x = (i-1)*dx + xmin        ! location on interval
     y = sin(pi*x)              ! compute function 1
     f(i) = cos(pi*x)           ! compute function 2
     write(6,*) x,y             ! write two columns to terminal
     write(9,*) x,y,f(i)        ! write three columns to file
  enddo                         ! end of do loop must be marked.

  close(9)                      ! close file (optional)

  write(6,*)'Done'
  stop
end program waves
!
!
! Compiling the program and creating an executable called waves:
! $ f90 waves.f90 -o waves
! If "-o waves" is omitted the executable will be called a.out
! by default. The Fortran 90 compiler (f90) may have a different name
! on your system. Possible alternatives are pgf90 (Portland Group
! compiler), ifc (Intel Fortran Compiler), and xlf90 (on IBMs).
!
!
! Running the program
! $ ./waves
!
!
! Expected Terminal output is:
! Hello World
! -1.000000 8.7422777E-08
! -0.9000000 -0.3090170
! -0.8000000 -0.5877852
! -0.7000000 -0.8090170
! -0.6000000 -0.9510565
! -0.5000000 -1.000000
! -0.4000000 -0.9510565
! -0.3000000 -0.8090171
! -0.2000000 -0.5877852
! -9.9999964E-02 -0.3090169
! 0.0000000E+00 0.0000000E+00
! 0.1000000 0.3090171
! 0.2000000 0.5877854
! 0.3000001 0.8090171
! 0.4000000 0.9510565
! 0.5000000 1.000000
! 0.6000000 0.9510565
! 0.7000000 0.8090169
! 0.8000001 0.5877850
! 0.9000000 0.3090170
! 1.000000 -8.7422777E-08
! Done
!
!
! Visualizing results with matlab:
! $ matlab
!> z = load('waves.out');           % read file into array z
!> size(z)                          % get dimensions of z
!> plot(z(:,1),z(:,2),'k')          % plot second column versus first in black
!> hold on;                         % add additional lines
!> plot(z(:,1),z(:,3),'r')          % plot third column versus first in red
!> xlabel('x');                     % add labels to x-axis.
!> ylabel('f(x)');                  % add labels to y-axis.
!> title('sine and cosine curves'); % add title to plot
!> legend('sin','cos',0)            % add legend
!> print -depsc waves               % save results to a color
!>                                  % encapsulated postscript file
!>                                  % called waves.eps. The extension
!>                                  % eps will be added automatically.
! Viewing the postscript file:
! $ ghostscript waves.eps
!
! Printing the file to color printer
! $ lpr -Pmpocol waves.eps

17.3 Debugging and Validation

Errors are invariably introduced when implementing a numerical algorithm in a computer code. There are actually two kinds of errors to watch for:

1. Algorithmic errors: these are conceptual in nature and are independent of the actual computer code. These kinds of errors can usually be studied theoretically when the algorithm is devised. Examples include stability and convergence of a numerical scheme to its mathematically continuous form. It is usually very hard to solve algorithmic problems with computer experimentation; the latter can only be a guide and/or a confirmation of theory.
2. Programming errors: these are introduced when translating an algorithm into actual computer code. These errors are often referred to as bugs, and there are techniques that make tracking them simpler. This is what we will concentrate on in this section.

17.3.1 Programming Rules

The following are rules of thumb devised to help a programmer write "good" code. The primary advice is to write clear, readable code that is easy to validate and maintain. Other advice towards that goal:

• Write modular programs with clearly defined functional units. By separating the different steps of an algorithm, it becomes easier to "abstract" the code, and to make connections with the theory.
• Each modular unit should be validated. By gaining confidence in the basic working of the building units, it becomes easier to cobble together more complicated programs. Moreover, some basic units can be reused in new programs without having to rewrite them.
• Comment the program to explain the different steps. Choose meaningful names for variables. Document the input and output of subroutines and their basic tasks.
• Write clear code, and do not worry about efficiency in either CPU or memory. Modern computers are vastly superior to the ones common during the early years of computing, when memory and CPU speed constraints forced programmers to use coding tricks that obfuscated the code.
• Improve the readability of the code. Do not be afraid to leave white space. Indent do loops, logical statements, and functional units so they are easy to identify.
• Last but not least, make sure you are using the appropriate algorithm for what you intend to do.

17.3.2 Coding tips and compiler options

If you write programs in the "right" way, you will have few syntax errors; a good compiler (called with the right options) will flag them for you, and you will correct them easily. It is important to emphasize that most compilers are rather lenient by default, and must be invoked with special options in order to get many useful warnings.

Some of these compiler options enable compile-time checks, i.e. the compiler will spot errors during the translation of the code to machine language and prior to executing the program. Examples include disabling implicit type declarations and flagging standard language violations. Other options enable various run-time checks (when the code is actually executed), such as checking array bounds, trapping floating-point exceptions, and special variable initializations that are designed to produce floating-point exceptions if uninitialized variables are used.

The following is a list of programming tips that will help minimize the number of bugs in the code.

• Do not use implicitly declared types. In Fortran, variables whose names start with the letters i, j, k, l, m, n are integers by default and all others are reals by default. Use the statement implicit none in your code to force the compiler to list every undeclared variable, and to catch mistyped variables. It is also possible to do that using compiler options (see the manual pages for the specific options; on UNIX machines it is usually -u).
• Make sure every variable is initialized properly before using it, and do not assume that the compiler does it automatically. Some compilers will allow you to initialize variables to a bogus value, a NaN (short for "not a number"), so that the code trips if that variable is used in an operation before the NaN is overwritten.
• Use the include 'header.h' statement to make sure that common blocks are identical across program units. The common declaration would thus go into a separate file (header.h), which can subsequently be included in other subroutines without retyping it.
• Check that the argument list of a call statement matches, in type and number, the argument list of the subroutine declaration.
• Remarks on the pitfalls list: the improved syntax of Fortran 90 eliminates some common programming pitfalls. Note that a Fortran 90 compiler will often only be able to help you if you make use of the stricter checking features of the new standard:
   1. IMPLICIT NONE
   2. Explicit interfaces
   3. INTENT attributes
   4. PRIVATE attributes for module-wide data that should not be accessible to the outside
• Use list files to catch compiler reports. When compiling, the compiler rapidly throws a list of errors at you; out of context, they are hard to understand, and even harder to remember when you return to the editor. Using two windows, one for editing and one for debugging, is helpful. You can also ask the compiler to generate a LIST FILE that you can look at (or print) in a separate window.
• Use modules and interfaces to double-check the argument lists of calling subroutines.
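The last tip can be combined with several of the earlier ones: a procedure placed in a module has an explicit interface wherever the module is used, so the compiler can check the number, type, kind and intent of the arguments at every call. The sketch below (module and routine names are hypothetical) also uses implicit none, INTENT attributes and a PRIVATE default, as recommended above. A short main program that uses the module is needed to run it.

```fortran
! Sketch: a module procedure with an explicit interface and INTENT
! attributes. Module and routine names are hypothetical.
module grid_utils
  implicit none                          ! every variable must be declared
  private                                ! module entities are hidden by default...
  public :: dp, make_grid                ! ...except those listed here
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Fill x with size(x) >= 2 equally spaced points on [xmin, xmax]
  subroutine make_grid(xmin, xmax, x)
    real(dp), intent(in)  :: xmin, xmax  ! inputs: cannot be modified here
    real(dp), intent(out) :: x(:)        ! output: assumed shape, size set by caller
    integer :: i, m
    m = size(x)
    do i = 1, m
      ! real(...) avoids integer division, one of the pitfalls of 17.3.4
      x(i) = xmin + real(i-1, dp)*(xmax - xmin)/real(m-1, dp)
    end do
  end subroutine make_grid
end module grid_utils
```

After use grid_utils, a call such as call make_grid(-1.0, 1.0, x) is rejected at compile time, because the default-real constants do not match the real(dp) dummy arguments; with an external subroutine compiled in a separate file and no interface, the same mismatch could go unnoticed.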
17.3.3 Run time errors and compiler options

Some bugs cannot be caught at compile time and produce errors only when the code is executed. Compiler switches can help you catch some common bugs. Here again is a list of tips.

1. Do not optimize the code in the first run. Rather, compile it with a debugging option (usually -g) to produce traceback information. The program can then let you know the statement number that caused the fatal error.
2. Do array bounds checking. With this option the code stops with an error if you try to access memory beyond that allocated for an array. The common flag for this is -C, but it changes from compiler to compiler. This flag will slow down the code. However, your first concern should be correct code rather than fast code.
3. Some floating point operations can result in a NaN. The code should then stop and issue an error report. Various switches trap different floating point exceptions. You want to catch division by zero, overflows (where the number is too large to be represented in the machine precision), and underflows (similar, but for very small numbers). Underflows are not as problematic as overflows and can usually be ignored. Check the manual for the right compiler switches.
4. Test routines individually to check if they are working according to their specification. Try "trivial" cases first, where you know the answers, to check the results you get.
5. Use print statements liberally to spot-check the values of variables.
6. Use a symbolic debugger to trace the execution of your program.

The first rule about debugging is to stay cool: treat the misbehaving program as an intellectual exercise, or as detective work.

1. You got some strange results that made you think there is a bug. Think again: are you sure they are not the correct output for some special input?
2. If you are not sure what causes the bug, DON'T try semi-random code modifications; that seldom works. Your aim should be to gather as much information as possible!
3. If you have a modular program, each part does a clearly defined task, so properly placed 'debug statements' can ISOLATE the malfunctioning procedure or code section.
4. If you are familiar with a debugger, use it, but be careful not to get carried away by its many options and start playing.

17.3.4 Some common pitfalls

• Data types
   1. Using implicit variable declarations
   2. Using a non-intrinsic function without a type declaration
   3. Incompatible argument lists in the call and the routine definition
   4. Incompatible declarations of a common block
   5. Using constants and parameters of incorrect (smaller) type
   6. Assuming that untyped integer constants get typed properly
   7. Assuming intrinsic conversion functions take care of the result type
• Arithmetic
   1. Using constants and parameters of incorrect type
   2. Careless use of automatic type promotions
   3. Assuming that dividing two integers will give a floating-point result
   4. Assuming integer exponents (e.g. 2**(-3)) are computed as floating-point numbers
   5. Using floating-point comparison tests; .EQ. and .NE. are particularly risky
   6. Loops with a REAL or DOUBLE PRECISION control variable
   7. Assuming that the MOD function with REAL arguments is exact
   8. Assuming real-to-integer assignment will work in all cases
• Miscellaneous
   1. Code lines longer than the allowed maximum
   2. Common blocks losing their values while the program runs
   3. Aliasing of dummy arguments and common block variables, or other dummy arguments in the same subprogram invocation
   4. Passing constants to a subprogram that modifies them
   5. Bad DO-loop parameters (see the DO loops chapter)
   6. TABs in input files: what you see is not what you get!
• General
   1. Assuming variables are initialized to zero
   2. Assuming variables keep their value between the execution of a RETURN statement and subsequent invocations
   3. Letting array indexes go out of bounds
   4. Depending on assumptions about the computing order of subexpressions
   5. Assuming short-circuit evaluation of expressions
   6. Using trigonometric functions with large arguments on some machines
   7. Inconsistent use of physical units

Chapter 18 Debuggers

A debugger is a very useful tool for developing codes, and for finding and fixing bugs introduced during code development. Most compiler providers include a debugger as part of their software bundle. Since we are using the Portland Group compiler for the class, the discussion here will center primarily on its debugger, even though much of the information applies equally to other compiler/debugger systems. Check the manual for the specific compiler/debugger in case of problems.

18.1 Preparing the code for debugging

A debuggable executable must have extra information embedded in it for debugging purposes. The debugger then makes use of this information to report the values of variables, and to allow the programmer to follow the execution of the program line by line. Compiler options must be used to instruct the compiler to embed this information in the object files. Here is a useful subset of these options pertinent to the PG compiler:

1. -g: Generate information for the debugger; this is necessary on most computers to enable the printing of useful debugging information. Avoid using any optimization in the code development phase (so avoid the -O options). Occasionally you want to debug with optimization on; the -gopt option would then be necessary.
2. -C or -Mbounds: generate code to check array bounds.
3. -Ktrap=list-of-options: helps trap floating point problems. The list is a comma-separated list of strings that controls which floating point operations to catch. These include:
   (a) -Ktrap=divz: trap divide by zero.
   (b) -Ktrap=ovf: trap floating point overflows.
   (c) -Ktrap=unf: trap underflow (number too small to be representable).
   (d) -Ktrap=inv: trap invalid operands (e.g. square root of negative numbers).
   A commonly useful subset is -Ktrap=divz,inv,ovf.
4. -Mchkptr: Check if the code mistakenly references NULL pointers.
5. -Mchkstk: Check the stack for available space upon entry to and before the start of a parallel region. Useful when many private variables are declared.

Compiler    Enable debugging   Bounds checking   Uninitialized variables   Floating point trap   Floating point stack
PGI         -g                 -C                NA                        -Ktrap                -Mchkstk
Intel       -g                 -CB               -ftrapuv                  -fpe0                 -fpstkchk
gcc         -g                 -C                                          -ffpe-trap
IBM         -g                 -C                -qinitauto                -qflttrap
Pathscale   -g

18.2 Running the debugger

The command pgdbg executable will run the code executable under the control of the debugger. The default is to start the debugger under a Graphical User Interface (GUI). If you have a slow connection, the GUI can slow down your work because of the graphics overhead. Most debuggers also include a text command-line interface; for PG it is invoked with the -text option. The command will start the debugger, which in turn will "load" the executable and all its pertinent information. The debugger then passes control to the user and waits for further instructions. Most debuggers come with an on-line help facility that lets users learn the command list interactively; the command to initiate it is usually called help. Here we briefly list a subset of these commands that have proved most useful to the author.

The debugger GUI is fairly intuitive; it will open at least one window to display source lines, and another one for I/O operations. Both the GUI and text versions of the debugger can be controlled via command lines. Here we cover some of the most useful commands, and refer the user to the manual for further information.

1. run will cause the code to start executing.
2. list 10,40 will list lines 10 to 40; list without an argument will list lines 10 to 20 from the current statement.
3. stop at xyz will put a breakpoint at line xyz.
Execution stops at that line so the user can examine variables.
4. print sn prints the value of the variable sn.
5. assign sn=expression assigns the value of expression to the variable sn.
6. whatis sn reports the data type of the variable sn.
7. where reports the current statement and the chain of calls that led to it.
8. step executes the next source line, stepping into a function or subroutine if the line calls one.
9. stepi executes a single machine instruction (as opposed to an entire source line); stepi <count> executes <count> instructions.
10. display lists the expressions being printed at breakpoints; display <exp1>,<exp2> prints <exp1> and <exp2> at every breakpoint.
11. next executes the next source line but steps over any function or subroutine it calls, i.e. it executes the call in its entirety.
12. continue resumes execution from the point where it stopped and runs to the next breakpoint.
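To make the floating point trap column of the table concrete, the Python sketch below emulates the effect of -Ktrap=divz,inv,ovf. It is an illustration only: it assumes NumPy is available and uses its np.errstate switch, whereas the compiler flags enable hardware traps in compiled code. The names OPERATIONS, run_untrapped and run_trapped are hypothetical. Without traps, each faulty operation silently produces inf or nan and the computation carries on; with traps enabled, the first offending operation raises an error at the statement that caused it, which is exactly where a debugger would stop.

```python
# Sketch (assumes NumPy): emulate trapping of division by zero (divz),
# invalid operations (inv) and overflow (ovf), as -Ktrap=divz,inv,ovf does.
import numpy as np

# Hypothetical table: -Ktrap suboption -> operation raising that IEEE exception.
OPERATIONS = {
    "divz": lambda: np.float64(1.0) / np.float64(0.0),       # division by zero
    "inv": lambda: np.sqrt(np.float64(-1.0)),                 # sqrt of a negative number
    "ovf": lambda: np.float64(1.0e308) * np.float64(10.0),    # result too large
}


def run_untrapped(name):
    """No traps: the exception is silently turned into inf or nan."""
    with np.errstate(all="ignore"):
        return OPERATIONS[name]()


def run_trapped(name):
    """Traps on: the offending operation raises FloatingPointError.

    Returns the error message, or None if no exception was raised.
    """
    with np.errstate(divide="raise", invalid="raise", over="raise"):
        try:
            OPERATIONS[name]()
        except FloatingPointError as err:
            return str(err)
    return None


if __name__ == "__main__":
    for name in OPERATIONS:
        print(name, "untrapped:", run_untrapped(name), "| trapped:", run_trapped(name))
```

The untrapped results are inf, nan and inf; the trapped runs stop at the faulty operation instead, which is why trapping makes such errors much easier to locate than hunting for the first nan in the output.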