Synchronization and Linearity
An Algebra for Discrete Event Systems

François Baccelli
INRIA and École Normale Supérieure, Département d'Informatique, Paris, France

Guy Cohen
École Nationale des Ponts et Chaussées-CERMICS, Marne la Vallée, France, and INRIA

Geert Jan Olsder
Delft University of Technology, Faculty of Technical Mathematics, Delft, the Netherlands

Jean-Pierre Quadrat
Institut National de la Recherche en Informatique et en Automatique, INRIA-Rocquencourt, Le Chesnay, France

Preface to the Web Edition

The first edition of this book was published in 1992 by Wiley (ISBN 0 471 93609 X). Since this book is now out of print, and to answer the request of several colleagues, the authors have decided to make it available freely on the Web, while retaining the copyright, for the benefit of the scientific community.

Copyright Statement

This electronic document is in PDF format. One needs Acrobat Reader (available freely for most platforms from the Adobe web site) to benefit from the full interactive machinery: using the package hyperref by Sebastian Rahtz, the table of contents and all LaTeX cross-references are automatically converted into clickable hyperlinks, bookmarks are generated automatically, etc. So, do not hesitate to click on references to equation or section numbers, on items of the table of contents and of the index, etc.

One may freely use and print this document for one's own purpose or even distribute it freely, but not commercially, provided it is distributed in its entirety and without modifications, including this preface and copyright statement. Any use of the contents should be acknowledged according to the standard scientific practice. The authors will appreciate receiving any comments by e-mail or other means; all modifications resulting from these comments in future releases will be adequately and gratefully acknowledged.

About This and Future Releases

We have taken the opportunity of this electronic edition to make corrections of misprints and slight mistakes we have become aware of since the book was published for the first time. In the present release, alterations of the original text are mild and need no special mention in the text. In a more remote future, we may consider providing a true "second edition" in which these changes will be incorporated in the main text itself, sometimes removing the obsolete or wrong corresponding portions.

François Baccelli      francois.baccelli@ens.fr
Guy Cohen              guy.cohen@mail.enpc.fr
Geert Jan Olsder       g.j.olsder@math.tudelft.nl
Jean-Pierre Quadrat    jean-pierre.quadrat@inria.fr

October 2001

Contents

Preface

I  Discrete Event Systems and Petri Nets

1  Introduction and Motivation
  1.1  Preliminary Remarks and Some Notation
  1.2  Miscellaneous Examples
    1.2.1  Planning
    1.2.2  Communication
    1.2.3  Production
    1.2.4  Queuing System with Finite Capacity
    1.2.5  Parallel Computation
    1.2.6  Traffic
    1.2.7  Continuous System Subject to Flow Bounds and Mixing
  1.3  Issues and Problems in Performance Evaluation
  1.4  Notes
2  Graph Theory and Petri Nets
  2.1  Introduction
  2.2  Directed Graphs
  2.3  Graphs and Matrices
    2.3.1  Composition of Matrices and Graphs
    2.3.2  Maximum Cycle Mean
    2.3.3  The Cayley-Hamilton Theorem
  2.4  Petri Nets
    2.4.1  Definition
    2.4.2  Subclasses and Properties of Petri Nets
  2.5  Timed Event Graphs
    2.5.1  Simple Examples
    2.5.2  The Basic Autonomous Equation
    2.5.3  Constructiveness of the Evolution Equations
    2.5.4  Standard Autonomous Equations
    2.5.5  The Nonautonomous Case
    2.5.6  Construction of the Marking
    2.5.7  Stochastic Event Graphs
  2.6  Modeling Issues
    2.6.1  Multigraphs
    2.6.2  Places with Finite Capacity
    2.6.3  Synthesis of Event Graphs from Interacting Resources
  2.7  Notes

II  Algebra

3  Max-Plus Algebra
  3.1  Introduction
    3.1.1  Definitions
    3.1.2  Notation
    3.1.3  The min Operation in the Max-Plus Algebra
  3.2  Matrices in Rmax
    3.2.1  Linear and Affine Scalar Functions
    3.2.2  Structures
    3.2.3  Systems of Linear Equations in (Rmax)^n
    3.2.4  Spectral Theory of Matrices
    3.2.5  Application to Event Graphs
  3.3  Scalar Functions in Rmax
    3.3.1  Polynomial Functions P(Rmax)
    3.3.2  Rational Functions
    3.3.3  Algebraic Equations
  3.4  Symmetrization of the Max-Plus Algebra
    3.4.1  The Algebraic Structure S
    3.4.2  Linear Balances
  3.5  Linear Systems in S
    3.5.1  Determinant
    3.5.2  Solving Systems of Linear Balances by the Cramer Rule
  3.6  Polynomials with Coefficients in S
    3.6.1  Some Polynomial Functions
    3.6.2  Factorization of Polynomial Functions
  3.7  Asymptotic Behavior of A^k
    3.7.1  Critical Graph of a Matrix A
    3.7.2  Eigenspace Associated with the Maximum Eigenvalue
    3.7.3  Spectral Projector
    3.7.4  Convergence of A^k with k
    3.7.5  Cyclic Matrices
  3.8  Notes

4  Dioids
  4.1  Introduction
  4.2  Basic Definitions and Examples
    4.2.1  Axiomatics
    4.2.2  Some Examples
    4.2.3  Subdioids
    4.2.4  Homomorphisms, Isomorphisms and Congruences
  4.3  Lattice Properties of Dioids
    4.3.1  Basic Notions in Lattice Theory
    4.3.2  Order Structure of Dioids
    4.3.3  Complete Dioids, Archimedian Dioids
    4.3.4  Lower Bound
    4.3.5  Distributive Dioids
  4.4  Isotone Mappings and Residuation
    4.4.1  Isotony and Continuity of Mappings
    4.4.2  Elements of Residuation Theory
    4.4.3  Closure Mappings
    4.4.4  Residuation of Addition and Multiplication
  4.5  Fixed-Point Equations, Closure of Mappings and Best Approximation
    4.5.1  General Fixed-Point Equations
    4.5.2  The Case Π(x) = a \ x ∧ b
    4.5.3  The Case Π(x) = ax ⊕ b
    4.5.4  Some Problems of Best Approximation
  4.6  Matrix Dioids
    4.6.1  From 'Scalars' to Matrices
    4.6.2  Residuation of Matrices and Invertibility
  4.7  Dioids of Polynomials and Power Series
    4.7.1  Definitions and Properties of Formal Polynomials and Power Series
    4.7.2  Subtraction and Division of Power Series
    4.7.3  Polynomial Matrices
  4.8  Rational Closure and Rational Representations
    4.8.1  Rational Closure and Rational Calculus
    4.8.2  Rational Representations
    4.8.3  Yet Other Rational Representations
    4.8.4  Rational Representations in Commutative Dioids
  4.9  Notes
    4.9.1  Dioids and Related Structures
    4.9.2  Related Results

III  Deterministic System Theory

5  Two-Dimensional Domain Description of Event Graphs
  5.1  Introduction
  5.2  A Comparison Between Counter and Dater Descriptions
  5.3  Daters and their Embedding in Nonmonotonic Functions
    5.3.1  A Dioid of Nondecreasing Mappings
    5.3.2  γ-Transforms of Daters and Representation by Power Series in γ
  5.4  Moving to the Two-Dimensional Description
    5.4.1  The Z_max Algebra through Another Shift Operator
    5.4.2  The M^ax_in[[γ,δ]] Algebra
    5.4.3  Algebra of Information about Events
    5.4.4  M^ax_in[[γ,δ]] Equations for Event Graphs
  5.5  Counters
    5.5.1  A First Derivation of Counters
    5.5.2  Counters Derived from Daters
    5.5.3  Alternative Definition of Counters
    5.5.4  Dynamic Equations of Counters
  5.6  Backward Equations
    5.6.1  M^ax_in[[γ,δ]] Backward Equations
    5.6.2  Backward Equations for Daters
  5.7  Rationality, Realizability and Periodicity
    5.7.1  Preliminaries
    5.7.2  Definitions
    5.7.3  Main Theorem
    5.7.4  On the Coding of Rational Elements
    5.7.5  Realizations by γ- and δ-Transforms
  5.8  Frequency Response of Event Graphs
    5.8.1  Numerical Functions Associated with Elements of B[[γ,δ]]
    5.8.2  Specialization to M^ax_in[[γ,δ]]
    5.8.3  Eigenfunctions of Rational Transfer Functions
  5.9  Notes

6  Max-Plus Linear System Theory
  6.1  Introduction
  6.2  System Algebra
    6.2.1  Definitions
    6.2.2  Some Elementary Systems
  6.3  Impulse Responses of Linear Systems
    6.3.1  The Algebra of Impulse Responses
    6.3.2  Shift-Invariant Systems
    6.3.3  Systems with Nondecreasing Impulse Response
  6.4  Transfer Functions
    6.4.1  Evaluation Homomorphism
    6.4.2  Closed Concave Impulse Responses and Inputs
    6.4.3  Closed Convex Inputs
  6.5  Rational Systems
    6.5.1  Polynomial, Rational and Algebraic Systems
    6.5.2  Examples of Polynomial Systems
    6.5.3  Characterization of Rational Systems
    6.5.4  Minimal Representation and Realization
  6.6  Correlations and Feedback Stabilization
    6.6.1  Sojourn Time and Correlations
    6.6.2  Stability and Stabilization
    6.6.3  Loop Shaping
  6.7  Notes
IV  Stochastic Systems

7  Ergodic Theory of Event Graphs
  7.1  Introduction
  7.2  A Simple Example in Rmax
    7.2.1  The Event Graph
    7.2.2  Statistical Assumptions
    7.2.3  Statement of the Eigenvalue Problem
    7.2.4  Relation with the Event Graph
    7.2.5  Uniqueness and Coupling
    7.2.6  First-Order and Second-Order Theorems
  7.3  First-Order Theorems
    7.3.1  Notation and Statistical Assumptions
    7.3.2  Examples in Rmax
    7.3.3  Maximal Lyapunov Exponent in Rmax
    7.3.4  The Strongly Connected Case
    7.3.5  General Graph
    7.3.6  First-Order Theorems in Other Dioids
  7.4  Second-Order Theorems; Nonautonomous Case
    7.4.1  Notation and Assumptions
    7.4.2  Ratio Equation in a General Dioid
    7.4.3  Stationary Solution of the Ratio Equation
    7.4.4  Specialization to Rmax
    7.4.5  Multiplicative Ergodic Theorems in Rmax
  7.5  Second-Order Theorems; Autonomous Case
    7.5.1  Ratio Equation
    7.5.2  Backward Process
    7.5.3  From Stationary Ratios to Random Eigenpairs
    7.5.4  Finiteness and Coupling in Rmax; Positive Case
    7.5.5  Finiteness and Coupling in Rmax; Strongly Connected Case
    7.5.6  Finiteness and Coupling in Rmax; General Case
    7.5.7  Multiplicative Ergodic Theorems in Rmax
  7.6  Stationary Marking of Stochastic Event Graphs
  7.7  Appendix on Ergodic Theorems
  7.8  Notes

8  Computational Issues in Stochastic Event Graphs
  8.1  Introduction
  8.2  Monotonicity Properties
    8.2.1  Notation for Stochastic Ordering
    8.2.2  Monotonicity Table for Stochastic Event Graphs
    8.2.3  Properties of Daters
    8.2.4  Properties of Counters
    8.2.5  Properties of Cycle Times
    8.2.6  Comparison of Ratios
  8.3  Event Graphs and Branching Processes
    8.3.1  Statistical Assumptions
    8.3.2  Statistical Properties
    8.3.3  Simple Bounds on Cycle Times
    8.3.4  General Case
  8.4  Markovian Analysis
    8.4.1  Markov Property
    8.4.2  Discrete Distributions
    8.4.3  Continuous Distribution Functions
  8.5  Appendix
    8.5.1  Stochastic Comparison
    8.5.2  Markov Chains
  8.6  Notes

V  Postface

9  Related Topics and Open Ends
  9.1  Introduction
  9.2  About Realization Theory
    9.2.1  The Exponential as a Tool; Another View on Cayley-Hamilton
    9.2.2  Rational Transfer Functions and ARMA Models
    9.2.3  Realization Theory
    9.2.4  More on Minimal Realizations
  9.3  Control of Discrete Event Systems
  9.4  Brownian and Diffusion Decision Processes
    9.4.1  Inf-Convolutions of Quadratic Forms
    9.4.2  Dynamic Programming
    9.4.3  Fenchel and Cramer Transforms
    9.4.4  Law of Large Numbers in Dynamic Programming
    9.4.5  Central Limit Theorem in Dynamic Programming
    9.4.6  The Brownian Decision Process
    9.4.7  Diffusion Decision Process
  9.5  Evolution Equations of General Timed Petri Nets
    9.5.1  FIFO Timed Petri Nets
    9.5.2  Evolution Equations
    9.5.3  Evolution Equations for Switching
    9.5.4  Integration of the Recursive Equations
  9.6  Min-Max Systems
    9.6.1  General Timed Petri Nets and Descriptor Systems
    9.6.2  Existence of Periodic Behavior
    9.6.3  Numerical Procedures for the Eigenvalue
    9.6.4  Stochastic Min-Max Systems
  9.7  About Cycle Times in General Petri Nets
  9.8  Notes

Bibliography

Notation

Preface

The mathematical theory developed in this book finds its initial motivation in the modeling and the analysis of the time behavior of a class of dynamic systems now often referred to as 'discrete event (dynamic) systems' (DEDS).
This class essentially contains man-made systems that consist of a finite number of resources (processors or memories, communication channels, machines) shared by several users (jobs, packets, manufactured objects) which all contribute to the achievement of some common goal (a parallel computation, the end-to-end transmission of a set of packets, the assembly of a product in an automated manufacturing line). The coordination of the user access to these resources requires complex control mechanisms which usually make it impossible to describe the dynamic behavior of such systems in terms of differential equations, as in physical phenomena. The dynamics of such systems can in fact be described using the two (Petri net like) paradigms of 'synchronization' and 'concurrency'. Synchronization requires the availability of several resources or users at the same time, whereas concurrency appears for instance when, at a certain time, some user must choose among several resources. The following example only contains the synchronization aspect which is the main topic of this book.

Consider a railway station. A departing train must wait for certain incoming trains so as to allow passengers to change, which reflects the synchronization feature. Consider a network of such stations where the traveling times between stations are known. The variables of interest are the arrival and departure times, assuming that trains leave as soon as possible. The departure time of a train is related to the maximum of the arrival times of the trains conditioning this departure. Hence the max operation is the basic operator through which variables interact. The arrival time at a station is the sum of the departure time from the previous station and the traveling time. There is no concurrency since it has tacitly been assumed that each train has been assigned a fixed route.

The thesis developed here is that there exists an algebra in which DEDS that do not involve concurrency can naturally be modeled as linear systems. A linear model is a set of equations in which variables can be added together and in which variables can also be multiplied by coefficients which are a part of the data of the model. The train example showed that the max is the essential operation that captures the synchronization phenomenon by operating on arrival times to compute departure times. Therefore the basic idea is to treat the max as the 'addition' of the algebra (hence this max will be written ⊕ to suggest 'addition'). The same example indicates that we also need conventional addition to transform variables from one end of an arc of the network to the other end (the addition of the traveling time, the data, to the departure time). This is why + will be treated as multiplication in this algebra (and it will be denoted ⊗). The operations ⊕ and ⊗ will play their own roles and in other examples they are not necessarily confined to operate as max and +, respectively. The basic mathematical feature of ⊕ is that it is idempotent: x ⊕ x = x. In practice, it may be the max or the min of numbers, depending on the nature of the variables which are handled (either the times at which events occur, or the numbers of events during given intervals). But the main feature is again idempotency of addition. The role of ⊗ is generally played by conventional addition, but the important thing is that it behaves well with respect to addition (e.g. that it distributes with respect to ⊕).
The algebraic structure outlined is known under the name of 'dioid', among other names. It has connections with standard linear algebra with which it shares many combinatorial properties (associativity and commutativity of addition, etc.), but also with lattice-ordered semigroup theory, for speaking of an idempotent addition is equivalent to speaking of the 'least upper bound' in lattices.

Conventional system theory studies networks of integrators or 'adders' connected in series, parallel and feedback. Similarly, queuing theory or Petri net theory build up complex systems from elementary objects (namely, queues, or transitions and places). The theory proposed here studies complex systems which are made up of elementary systems interacting through a basic operation, called synchronization, located at the nodes of a network.

The mathematical contributions of the book can be viewed as the first steps toward the development of a theory of linear systems on dioids. Both deterministic and stochastic systems are considered. Classical concepts of system theory such as 'state space' recursive equations, input-output (transfer) functions, feedback loops, etc. are introduced. Overall, this theory offers a unifying framework for systems in which the basic 'engine' of dynamics is synchronization, when these systems are considered from the point of view of performance evaluation. In other words, dioid algebra appears to be the right tool to handle synchronization in a linear manner, whereas this phenomenon seems to be very nonlinear, or even nonsmooth, 'through the glasses' of conventional algebraic tools. Moreover, this theory may be a good starting point to encompass other basic features of discrete event systems such as concurrency, but at the price of considering systems which are nonlinear even in this new framework. Some perspectives are opened in this respect in the last chapter.

Although the initial motivation was essentially found in the study of discrete event systems, it turns out that this theory may be appropriate for other purposes too. This happens frequently with mathematical theories which often go beyond their initial scope, as long as other objects can be found with the same basic features. In this particular case the common feature may be expressed by saying that the input-output relation has the form of an inf- (or a sup-) convolution. In the same way, the scope of conventional system theory is the study of input-output relations which are convolutions. In Chapter 1 it is suggested that this theory is also relevant for some systems which either are continuous or do not involve synchronization. Systems which mix fluids in certain proportions and which involve flow constraints fall in the former category. Recursive 'optimization processes', of which dynamic programming is the most immediate example, fall in the latter category. All these systems involve max (or min) and + as the basic operations. Another situation where dioid algebra naturally shows up is the asymptotic behavior of exponential functions. In mathematical terms, the conventional operations + and × over positive numbers, say, are transformed into max and +, respectively, by the mapping: x → lim_{s→+∞} exp(sx). This is relevant, for example, in the theory of large deviations, and, coming back to conventional system theory, when outlining Bode diagrams by their asymptotes.

There are numerous concurrent approaches for constructing a mathematical framework for discrete event systems.
An important dichotomy arises depending on whether the framework is intended to assess the logical behavior of the system or its temporal behavior. Among the first class, we would quote theoretical computer science languages like CSP or CCS and recent system-theoretic extensions of automata theory [114]. The algebraic approach that is proposed here is clearly of the latter type, which makes it comparable with such formalisms as timed (or stochastic) Petri nets [1], generalized semi-Markov processes [63] and in a sense queuing network theory. Another approach, that emphasizes computational aspects, is known as Perturbation Analysis [70].

A natural question of interest concerns the scope of the methodology that we develop here. Most DEDS involve concurrency at an early stage of their design. However, it is often necessary to handle this concurrency by choosing certain priority rules (by specifying routing and/or scheduling, etc.), in order to completely specify their behavior. The theory developed in this book may then be used to evaluate the consequences of these choices in terms of performance. If the delimitation of the class of queuing systems that admit a max-plus representation is not an easy task within the framework of queuing theory, the problem becomes almost transparent within the setting of Petri networks developed in Chapter 2: stochastic event graphs coincide with the class of discrete event systems that have a representation as a max-plus linear system in a random medium (i.e. the matrices of the linear system are random); any topological violation of the event graph structure, be it a competition like in multiserver queues, or a superimposition like in certain Jackson networks, results in a min-type nonlinearity (see Chapter 9). Although it is beyond the scope of the book to review the list of queuing systems that are stochastic event graphs, several examples of such systems are provided ranging from manufacturing models (e.g. assembly/disassembly queues, also called fork-join queues, jobshop and flowshop models, production lines, etc.) to communication and computer science models (communication blocking, wave front arrays, etc.).

Another important issue is that of the design gains offered by this approach. The most important structural results are probably those pertaining to the existence of periodic and stationary regimes. Within the deterministic setting, we would quote the interpretation of the pair (cycle time, periodic regime) in terms of eigenpairs together with the polynomial algorithms that can be used to compute them. Moreover, because bottlenecks of the systems are explicitly revealed (through the notion of critical circuits), this approach provides an efficient way not only to evaluate the performance but also to assess certain design choices made at earlier stages. In the stochastic case, this approach first yields new characterizations of throughput or cycle times as Lyapunov exponents associated with the matrices of the underlying linear system, whereas the steady-state regime receives a natural characterization in terms of 'stochastic eigenvalues' in max-plus algebra, very much in the flavor of Oseledec's multiplicative ergodic theorems. Thanks to this, queuing theory and timed Petri nets find some sort of (linear) garden where several known results concerning small dimensional systems can be derived from a few classical theorems (or more precisely from the max-plus counterpart of classical theorems).
The theory of DEDS came into existence only at the beginning of the 1980s, though it is fair to say that max-plus algebra is older, see [49], [130], [67]. The field of DEDS is in full development and this book presents in a coherent fashion the results obtained so far by this algebraic approach. The book can be used as a textbook, but it also presents the current state of the theory. Short historical notes and other remarks are given in the note sections at the end of most chapters. The book should be of interest to (applied) mathematicians, operations researchers, electrical engineers, computer scientists, probabilists, statisticians, management scientists and in general to those with a professional interest in parallel and distributed processing, manufacturing, etc. An undergraduate degree in mathematics should be sufficient to follow the flow of thought (though some parts go beyond this level). Introductory courses in algebra, probability theory and linear system theory form an ideal background. For algebra, [61] for instance provides suitable background material; for probability theory this role is for instance played by [20], and for linear system theory it is [72] or the more recent [122].

The heart of the book consists of four main parts, each of which consists of two chapters. Part I (Chapters 1 and 2) provides a natural motivation for DEDS, it is devoted to a general introduction and relationships with graph theory and Petri nets. Part II (Chapters 3 and 4) is devoted to the underlying algebras. Once the reader has gone through this part, he will also appreciate the more abstract approach presented in Parts III and IV. Part III (Chapters 5 and 6) deals with deterministic system theory, where the systems are mostly DEDS, but continuous max-plus linear systems also are discussed in Chapter 6. Part IV (Chapters 7 and 8) deals with stochastic DEDS. Many interplays of comparable results between the deterministic and the stochastic framework are shown. There is a fifth part, consisting of one chapter (Chapter 9), which deals with related areas and some open problems. The notation introduced in Parts I and II is used throughout the other parts.

The idea of writing this book took form during the summer of 1989, during which the third author (GJO) spent a mini-sabbatical at the second author's (GC's) institute. The other two authors (FB and JPQ) joined in the fall of 1989. During the process of writing, correcting, cutting, pasting, etc., the authors met frequently, mostly in Fontainebleau, the latter being situated close to the center of gravity of the authors' own home towns. We acknowledge the working conditions and support of our home institutions that made this project possible. The Systems and Control Theory Network in the Netherlands is acknowledged for providing some financial support for the necessary travels. Mr. J. Schonewille of Delft University of Technology is acknowledged for preparing many of the figures using Adobe Illustrator. Mr. G. Ouanounou of INRIA-Rocquencourt deserves also many thanks for his help in producing the final manuscript using the high-resolution equipment of this Institute. The contents of the book have been improved by remarks of P. Bougerol of the University of Paris VI, and of A. Jean-Marie and Z. Liu of INRIA-Sophia Antipolis who were all involved in the proofreading of some parts of the manuscript. The authors are grateful to them.
The second (GC) and fourth (JPQ) authors wish to acknowledge the permanent interaction with the other past or present members of the so-called Max Plus working group at INRIA-Rocquencourt. Among them, M. Viot and S. Gaubert deserve special mention. Moreover, S. Gaubert helped us to check some examples included in this book, thanks to his handy computer software MAX manipulating the M^ax_in[[γ,δ]] algebra. Finally, the publisher, in the person of Ms. Helen Ramsey, is also to be thanked, specifically because of her tolerant view with respect to deadlines.

We would like to stress that the material covered in this book has been and is still in fast evolution. Owing to our different backgrounds, it became clear to us that many different cultures within mathematics exist with regard to style, notation, etc. We did our best to come up with one, uniform style throughout the book. Chances are, however, that, when the reader notices a higher density of Theorems, Definitions, etc., GC and/or JPQ were the primary authors of the corresponding parts.¹ As a last remark, the third author can always be consulted on the problem of coping with three French co-authors.

François Baccelli, Sophia Antipolis
Guy Cohen, Fontainebleau
Geert Jan Olsder, Delft
Jean-Pierre Quadrat, Rocquencourt
June 1992

¹ GC: I do not agree. FB is more prone to that than any of us!

Part I
Discrete Event Systems and Petri Nets

Chapter 1
Introduction and Motivation

1.1 Preliminary Remarks and Some Notation

Probably the most well-known equation in the theory of difference equations is

    x(t + 1) = Ax(t) ,    t = 0, 1, 2, . . . .                                (1.1)

The vector x ∈ R^n represents the 'state' of an underlying model and this state evolves in time according to this equation; x(t) denotes the state at time t. The symbol A represents a given n × n matrix. If an initial condition

    x(0) = x_0                                                                (1.2)

is given, then the whole future evolution of (1.1) is determined. Implicit in the text above is that (1.1) is a vector equation. Written out in scalar equations it becomes

    x_i(t + 1) = Σ_{j=1}^{n} A_ij x_j(t) ,    i = 1, . . . , n ;  t = 0, 1, . . . .     (1.3)

The symbol x_i denotes the i-th component of the vector x; the elements A_ij are the entries of the square matrix A. If A_ij, i, j = 1, . . . , n, and x_j(t), j = 1, . . . , n, are given, then x_j(t + 1), j = 1, . . . , n, can be calculated according to (1.3). The only operations used in (1.3) are multiplication (A_ij × x_j(t)) and addition (the Σ symbol). Most of this book can be considered as a study of formulæ of the form (1.1), in which the operations are changed. Suppose that the two operations in (1.3) are changed in the following way: addition becomes maximization and multiplication becomes addition. Then (1.3) becomes

    x_i(k + 1) = max(A_i1 + x_1(k), A_i2 + x_2(k), . . . , A_in + x_n(k))
               = max_j (A_ij + x_j(k)) ,    i = 1, . . . , n .                (1.4)

If the initial condition (1.2) also holds for (1.4), then the time evolution of (1.4) is completely determined again. Of course the time evolutions of (1.3) and (1.4) will be different in general. Equation (1.4), as it stands, is a nonlinear difference equation. As an example take n = 2, such that A is a 2 × 2 matrix. Suppose

    A = ( 3  7 )                                                              (1.5)
        ( 2  4 )

and that the initial condition is

    x_0 = ( 1 ) .                                                             (1.6)
          ( 0 )

The time evolution of (1.1) becomes

    x(0) = ( 1 ) ,  x(1) = ( 3 ) ,  x(2) = ( 23 ) ,  x(3) = ( 167 ) , . . .
           ( 0 )           ( 2 )           ( 14 )           ( 102 )
and the time evolution of (1.4) becomes

    x(0) = ( 1 ) ,  x(1) = ( 7 ) ,  x(2) = ( 11 ) ,  x(3) = ( 16 ) , . . . .     (1.7)
           ( 0 )           ( 4 )           (  9 )           ( 13 )

We are used to thinking of the argument t in x(t) as time; at time t the state is x(t). With respect to (1.4) we will introduce a different meaning for this argument. In order to emphasize this different meaning, the argument t has been replaced by k. For this new meaning we need to think of a network, which consists of a number of nodes and some arcs connecting these nodes. The network corresponding to (1.4) has n nodes; one for each component x_i. Entry A_ij corresponds to the arc from node j to node i. In terms of graph theory such a network is called a directed graph ('directed' because the individual arcs between the nodes are one-way arrows). Therefore the arcs corresponding to A_ij and A_ji, if both exist, are considered to be different.

The nodes in the network can perform certain activities; each node has its own kind of activity. Such activities take a finite time, called the activity time, to be performed. These activity times may be different for different nodes. It is assumed that an activity at a certain node can only start when all preceding ('directly upstream') nodes have finished their activities and sent the results of these activities along the arcs to the current node. Thus, the arc corresponding to A_ij can be interpreted as an output channel for node j and simultaneously as an input channel for node i. Suppose that this node i starts its activity as soon as all preceding nodes have sent their results (the rather neutral word 'results' is used, it could equally have been messages, ingredients or products, etc.) to node i, then (1.4) describes when the activities take place. The interpretation of the quantities used is:

• x_i(k) is the earliest epoch at which node i becomes active for the k-th time;

• A_ij is the sum of the activity time of node j and the traveling time from node j to node i (the rather neutral expression 'traveling time' is used instead of, for instance, 'transportation time' or 'communication time').

The fact that we write A_ij rather than A_ji for a quantity connected to the arc from node j to node i has to do with matrix equations which will be written in the classical way with column vectors, as will be seen later on.

[Figure 1.1: Network corresponding to Equation (1.5)]

For the example given above, the network has two nodes and four arcs, as given in Figure 1.1. The interpretation of the number 3 in this figure is that if node 1 has started an activity, the next activity cannot start within the next 3 time units. Similarly, the time between two subsequent activities of node 2 is at least 4 time units. Node 1 sends its results to node 2 and once an activity starts in node 1, it takes 2 time units before the result of this activity reaches node 2. Similarly it takes 7 time units after the initiation of an activity of node 2 for the result of that activity to reach node 1. Suppose that an activity refers to some production. The production time of node 1 could for instance be 1 time unit; after that, node 1 needs 2 time units for recovery (lubrication say) and the traveling time of the result (the final product) from node 1 to node 2 is 1 time unit. Thus the number A_11 = 3 is made up of a production time 1 and a recovery time 2 and the number A_21 = 2 is made up of the same production time 1 and a traveling time 1.
Similarly, if the production time at node 2 is 4, then this node does not need any time for recovery (because A_22 = 4), and the traveling time from node 2 to node 1 is 3 (because A_12 = 7 = 4 + 3).

If we now look at the sequence (1.7) again, the interpretation of the vectors x(k) is different from the initial one. The argument k is no longer a time instant but a counter which states how many times the various nodes have been active. At time 14, node 1 has been active twice (more precisely, node 1 has started two activities, respectively at times 7 and 11). At the same time 14, node 2 has been active three times (it started activities at times 4, 9 and 13). The counting of the activities is such that it coincides with the argument of the x vector. The initial condition is henceforth considered to be the zeroth activity.

In Figure 1.1 there was an arc from any node to any other node. In many networks referring to more practical situations, this will not be the case. If there is no arc from node j to node i, then node i does not need any result from node j. Therefore node j does not have a direct influence on the behavior of node i. In such a situation it is useful to consider the entry A_ij to be equal to −∞. In (1.4) the term −∞ + x_j(k) does not influence x_i(k + 1) as long as x_j(k) is finite. The number −∞ will occur frequently in what follows and will be indicated by ε. For reasons which will become clear later on, Equation (1.4) will be written as

    x_i(k + 1) = ⊕_j A_ij ⊗ x_j(k) ,    i = 1, . . . , n ,

or in vector notation,

    x(k + 1) = A ⊗ x(k) .                                                     (1.8)

The symbol ⊕_j c(j) refers to the maximum of the elements c(j) with respect to all appropriate j, and ⊗ (pronounced 'o-times') refers to addition. Later on the symbol ⊕ (pronounced 'o-plus') will also be used; a ⊕ b refers to the maximum of the scalars a and b. If the initial condition for (1.8) is x(0) = x_0, then

    x(1) = A ⊗ x_0 ,
    x(2) = A ⊗ x(1) = A ⊗ (A ⊗ x_0) = (A ⊗ A) ⊗ x_0 = A^2 ⊗ x_0 .

It will be shown in Chapter 3 that indeed A ⊗ (A ⊗ x_0) = (A ⊗ A) ⊗ x_0. For the example given above it is easy to check this by hand. Instead of A ⊗ A we simply write A^2. We obtain

    x(3) = A ⊗ x(2) = A ⊗ (A^2 ⊗ x_0) = (A ⊗ A^2) ⊗ x_0 = A^3 ⊗ x_0 ,

and in general

    x(k) = (A ⊗ A ⊗ · · · ⊗ A) ⊗ x_0 = A^k ⊗ x_0     (k factors A).

The matrices A^2, A^3, . . . , can be calculated directly. Let us consider the A-matrix of (1.5) again, then

    A^2 = ( max(3 + 3, 7 + 2)   max(3 + 7, 7 + 4) )  =  ( 9  11 )
          ( max(2 + 3, 4 + 2)   max(2 + 7, 4 + 4) )     ( 6   9 )

In general

    (A^2)_ij = ⊕_l A_il ⊗ A_lj = max_l (A_il + A_lj) .                        (1.9)
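As a quick illustration of these rules, the following short Python fragment (a sketch of our own, not code from the book; the function names are ours) evaluates the max-plus matrix product of (1.9) and the recurrence (1.8) for the data of (1.5)-(1.6).

def mp_mat_mul(A, B):
    """Max-plus matrix product: (A (x) B)_ij = max_l (A_il + B_lj), cf. (1.9)."""
    return [[max(a + b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[3, 7],
     [2, 4]]          # the matrix of (1.5)
x = [[1], [0]]        # the initial condition (1.6), as a column vector

print(mp_mat_mul(A, A))        # A (x) A = [[9, 11], [6, 9]], as computed above
for k in range(1, 4):          # reproduces (1.7): x(1)=(7,4), x(2)=(11,9), x(3)=(16,13)
    x = mp_mat_mul(A, x)
    print(k, [row[0] for row in x])

A missing arc would simply be encoded by the value float('-inf') for ε, which is absorbing for the addition inside max.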
An extension of (1.8) is

    x(k + 1) = (A ⊗ x(k)) ⊕ (B ⊗ u(k)) ,
    y(k)     = C ⊗ x(k) .                                                     (1.10)

The symbol ⊕ in this formula refers to componentwise maximization. The m-vector u is called the input to the system; the p-vector y is the output of the system. The components of u refer to nodes which have no predecessors. Similarly, the components of y refer to nodes with no successors. The components of x now refer to internal nodes, i.e. to nodes which have both successors and predecessors. The matrices B = {B_ij} and C = {C_ij} have sizes n × m and p × n, respectively. The traditional way of writing (1.10) would be

    x_i(k + 1) = max(A_i1 + x_1(k), . . . , A_in + x_n(k), B_i1 + u_1(k), . . . , B_im + u_m(k)) ,   i = 1, . . . , n ;
    y_i(k)     = max(C_i1 + x_1(k), . . . , C_in + x_n(k)) ,   i = 1, . . . , p .

Sometimes (1.10) is written as

    x(k + 1) = A ⊗ x(k) ⊕ B ⊗ u(k) ,
    y(k)     = C ⊗ x(k) ,                                                     (1.11)

where it is understood that multiplication has priority over addition. Usually, however, (1.10) is written as

    x(k + 1) = Ax(k) ⊕ Bu(k) ,
    y(k)     = Cx(k) .                                                        (1.12)

If it is clear where the '⊗'-symbols are used, they are sometimes omitted, as shown in (1.12). This practice is exactly the same one as with respect to the more common multiplication '×' or '·' symbol in conventional algebra. In the same vein, in conventional algebra 1 × x is the same as 1x, which is usually written as x. Within the context of the ⊗ and ⊕ symbols, 0 ⊗ x is exactly the same as x. The symbol ε is the neutral element with respect to maximization; its numerical value equals −∞. Similarly, the symbol e denotes the neutral element with respect to addition; it assumes the numerical value 0. Also note that 1 ⊗ x is different from x.

If one wants to think in terms of a network again, then u(k) is a vector indicating when certain resources become available for the k-th time. Subsequently it takes B_ij time units before the j-th resource reaches node i of the network. The vector y(k) refers to the epoch at which the final products of the network are delivered to the outside world. Take for example

    x(k + 1) = ( 3  7 ) x(k) ⊕ ( ε ) u(k) ,
               ( 2  4 )        ( 1 )
    y(k)     = ( 3  ε ) x(k) .                                                (1.13)

The corresponding network is shown in Figure 1.2.

[Figure 1.2: Network with input and output]

Because B_11 = ε (= −∞), the input u(k) only goes to node 2. If one were to replace B by (2 1)′ for instance, where the prime denotes transposition, then each input would 'spread' itself over the two nodes. In this example from epoch u(k) on, it takes 2 time units for the input to reach node 1 and 1 time unit to reach node 2. In many practical situations an input will enter the network through one node. That is why, as in this example, only one B-entry per column is different from ε. Similar remarks can be made with respect to the output. Suppose that we have (1.6) as an initial condition and that

    u(0) = 1 ,  u(1) = 7 ,  u(2) = 13 ,  u(3) = 19 , . . . ,

then it easily follows that

    x(0) = ( 1 ) ,  x(1) = ( 7 ) ,  x(2) = ( 11 ) ,  x(3) = ( 16 ) , . . . ,
           ( 0 )           ( 4 )           (  9 )           ( 14 )

    y(0) = 4 ,  y(1) = 10 ,  y(2) = 14 ,  y(3) = 19 , . . . .
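The little simulation below (again our own sketch, not taken from the book) runs system (1.13) with the input sequence above and prints the same x(k) and y(k) values; ε is represented by float('-inf').

EPS = float('-inf')

def mp_mat_vec(M, v):
    """Max-plus matrix-vector product: (M (x) v)_i = max_j (M_ij + v_j)."""
    return [max(m + x for m, x in zip(row, v)) for row in M]

A = [[3, 7], [2, 4]]
B = [[EPS], [1]]
C = [[3, EPS]]
x = [1, 0]                      # initial condition (1.6)
inputs = [1, 7, 13, 19]         # u(0), u(1), u(2), u(3)

for k, u in enumerate(inputs):
    y = mp_mat_vec(C, x)[0]
    print(f"x({k}) = {x},  y({k}) = {y}")
    # x(k+1) = A (x) x(k) (+) B (x) u(k), with (+) the componentwise max
    x = [max(a, b) for a, b in zip(mp_mat_vec(A, x), mp_mat_vec(B, [u]))]
# prints x(k) = (1,0), (7,4), (11,9), (16,14) and y(k) = 4, 10, 14, 19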
We started this section with the difference equation (1.1), which is a first-order linear vector difference equation. It is well known that a higher order linear scalar difference equation

    z(k + 1) = a_1 z(k) + a_2 z(k − 1) + · · · + a_n z(k − n + 1)             (1.14)

can be written in the form of Equation (1.1). If we introduce the vector (z(k), z(k − 1), . . . , z(k − n + 1))′, then (1.14) can be written as

    ( z(k + 1)     )     ( a_1  a_2  . . .  a_n ) ( z(k)         )
    ( z(k)         )     ( 1    0    . . .  0   ) ( z(k − 1)     )
    (     :        )  =  ( 0    1    . . .  0   ) (     :        )            (1.15)
    (     :        )     (       .    .         ) (     :        )
    ( z(k − n + 2) )     ( 0    . . . 1     0   ) ( z(k − n + 1) )

This equation has exactly the form of (1.1). If we change the operations in (1.14) in the standard way, addition becomes maximization and multiplication becomes addition; then the numerical evaluation of (1.14) becomes

    z(k + 1) = max(a_1 + z(k), a_2 + z(k − 1), . . . , a_n + z(k − n + 1)) .  (1.16)

This equation can also be written as a first-order linear vector difference equation. In fact this equation is almost Equation (1.15), which must now be evaluated with the operations maximization and addition. The only difference is that the 1's and 0's in (1.15) must be replaced by e's and ε's, respectively.

1.2 Miscellaneous Examples

In this section, seven examples from different application areas are presented, with a special emphasis on the modeling process. The examples can be read independently. It is shown that all problems formulated lead to equations of the kind (1.8), (1.10), or related ones. Solutions to the problems which are formulated are not given in this section. To solve these problems, the theory must first be developed and that will be done in the next chapters. Although some of the examples deal with equations with the look of (1.8), the operations used will again be different. The mathematical expressions are the same for many applications. The underlying algebra, however, differs. The emphasis of this book is on these algebras and their relationships.

1.2.1 Planning

Planning is one of the traditional fields in which the max-operation plays a crucial role. In fact, many problems in planning areas are more naturally formulated with the min-operation than with the max-operation. However, one can easily switch from minimization to maximization and vice versa. Two applications will be considered in this subsection; the first one is the shortest path problem, the second one is a scheduling problem. Solutions to such problems have been known for some time, but here the emphasis is on the notation introduced in §1.1 and on some analysis pertaining to this notation.

1.2.1.1 Shortest Path

Consider a network of n cities; these cities are the nodes in a network. Between some cities there are road connections; the distance between city j and city i is indicated by A_ij. A road corresponds to an arc in the network. If there is no road from j to i, then we set A_ij = ε. In this example ε = +∞; nonexisting roads get assigned a value +∞ rather than −∞. The reason is that we will deal with minimization rather than maximization. Owing to the possibility of one-way traffic, it is allowed that A_ij ≠ A_ji. Matrix A is defined as A = (A_ij). The entry A_ij denotes the distance between j and i if only one link is allowed. Sometimes it may be more advantageous to go from j to i via k. This will be the case if A_ik + A_kj < A_ij. The shortest distance from j to i using exactly two links is

    min_{k=1,...,n} (A_ik + A_kj) .                                           (1.17)

When we use the shorthand symbol ⊕ for the minimum operation, then (1.17) becomes

    ⊕_k A_ik ⊗ A_kj .

Note that the symbol ⊕_k has been used for both the maximum and the minimum operation. It should be clear from the context which is meant. The (binary) symbol ⊕ will be used similarly. The reason for not distinguishing between these two operations is that (R ∪ {−∞}, max, +) and (R ∪ {+∞}, min, +) are isomorphic algebraic structures. Chapters 3 and 4 will deal with such structures. It is only when the operations max and min appear in the same formula that this convention would lead to ambiguity. This situation will occur in Chapter 9 and different symbols for the two operations will be used there. Expression (1.17) is the ij-th entry of A^2:

    (A^2)_ij = ⊕_k A_ik ⊗ A_kj .

Note that the expression A^2 can have different meanings also. In (1.9) the max-operation was used whereas the min-operation is used here. If one is interested in the shortest path from j to i using one or two links, then the length of the shortest path becomes (A ⊕ A^2)_ij.
If we continue, and if one, two or three links are allowed, then the length of the shortest path from j to i becomes (A ⊕ A^2 ⊕ A^3)_ij, where A^3 = A^2 ⊗ A, and so on for more than three links. We want to find the shortest path whereby any number of links is allowed. It is easily seen that a road connection consisting of more than n − 1 links can never be optimal. If it were optimal, the traveler would visit one city at least twice. The road from this city to itself forms a part of the total road connection and is called a circuit. Since it is (tacitly) assumed that all distances are nonnegative, this circuit adds to the total distance and can hence be disregarded. The conclusion is that the length of the shortest path from j to i is given by

    (A ⊕ A^2 ⊕ · · · ⊕ A^{n−1})_ij .

Equivalently one can use the following infinite series for the shortest path (the terms A^k, k ≥ n, do not contribute to the sum):

    A^+  =  A ⊕ A^2 ⊕ · · · ⊕ A^n ⊕ A^{n+1} ⊕ · · ·     (by definition).      (1.18)

The matrix A^+, sometimes referred to as the shortest path matrix, also shows up in the scheduling problem that we define below shortly. Note that (A^+)_ii refers to a path which first leaves node i and then comes back to it. If one wants to include the possibility of staying at a node, then the shortest path matrix should be defined as e ⊕ A^+, where e denotes the identity matrix of the same size as A. An identity matrix in this set-up has zeros on the diagonal and the other entries have the value +∞. In general, e is an identity matrix of appropriate size.

The shortest path problem can also be formulated according to a difference equation of the form (1.8). To that end, consider an n × n matrix X: the ij-th entry of X refers to a connection from city j to city i; X_ij(k) is the minimum length with respect to all roads from j to i with k links. Then it is not difficult to see that this matrix satisfies the equation

    X_ij(k) = min_{l=1,...,n} (X_il(k − 1) + A_lj) ,    i, j = 1, . . . , n .   (1.19)

Formally this equation can be written as

    X(k) = X(k − 1)A = X(k − 1) ⊗ A ,

but it cannot be seen from this equation that the operations to be used are minimization and addition. Please note that the matrix A in the last equation, which is of size n^2 × n^2, is different from the original A, of size n × n, as introduced at the beginning of this subsection. The principle of dynamic programming can be recognized in (1.19). The following formula gives exactly the same results as (1.19):

    X_ij(k) = min_{l=1,...,n} (A_il + X_lj(k − 1)) ,    i, j = 1, . . . , n .

The difference between this formula and (1.19) is that one uses the principle of forward dynamic programming and the other one uses backward dynamic programming.
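The following sketch (our own illustration with a small made-up four-city network; the data and function names are not taken from the book) computes A^+ of (1.18) by accumulating min-plus powers, i.e. by repeated application of (1.17).

EPS = float('inf')                          # epsilon = +infinity: nonexisting road

def min_plus(A, B):
    """Min-plus matrix product: (A (x) B)_ij = min_l (A_il + B_lj)."""
    n = len(A)
    return [[min(A[i][l] + B[l][j] for l in range(n)) for j in range(n)] for i in range(n)]

def plus_closure(A):
    """A+ = A (+) A^2 (+) ... (+) A^(n-1), cf. (1.18); (+) is the entrywise min."""
    n = len(A)
    power, result = A, A
    for _ in range(n - 2):
        power = min_plus(power, A)
        result = [[min(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(result, power)]
    return result

# hypothetical distances: A[i][j] is the direct road length from city j to city i
A = [[EPS, 2,   EPS, EPS],
     [EPS, EPS, 3,   1  ],
     [4,   EPS, EPS, EPS],
     [EPS, EPS, 1,   EPS]]
print(plus_closure(A))
# e.g. the entry with row index 0 and column index 2 equals 4:
# the cheapest route from city j=2 to city i=0 uses three links (2 -> 3 -> 1 -> 0).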
If it were to have a nonzero cost, a fictitious node 7 could be added to node 6. Node 7 would represent the final activity. The arcs between the nodes in Figure 1.3 denote the precedence constraints. For instance, node 4 cannot start before nodes 2 and 5 have finished their activities. The number Ai j associated with the arc from node j to node i denotes the minimum time that should elapse between the beginning of an activity at node j and the beginning of an activity at node i . By means of the principle of dynamic programming it is not difficult to calculate the critical path in the graph. Critical here refers to ‘slowest’. The total duration of 12 Synchronization and Linearity the overall project cannot be smaller than the summation of all numbers Ai j along the critical path. Another way of finding the time at which the activity at node i can start at the earliest, which will be denoted x i , is the following. Suppose activity 1 can start at epoch u at the earliest. This quantity u is an input variable which must be given from the outside. Hence x 1 = u . For the other x i ’s we can write x i = max ( Ai j + x j ) . j =1,... ,6 If there is no arc from node i to j , then Ai j gets assigned the value ε (= −∞). If x = (x 1 , . . . , x 6 ) and A = ( Ai j ), then we can compactly write x = Ax ⊕ Bu , where A= εε 5ε 3ε ε2 ε1 εε ε ε ε ε 4 8 ε ε ε ε ε 2 ε ε ε 5 ε 4 ε ε ε ε ε ε (1.20) , B = e ε ε ε ε ε . Note that e in B equals 0 in this context. Here we recognize the form of (1.11), although in (1.20) time does not play a role; x in the left-hand side equals the x in the right-hand side. Hence (1.20) is an implicit equation for the vector x . Let us see what we obtain by repeated substitution of the complete right-hand side of (1.20) into x of this same right-hand side. After one substitution: x = = A2 x ⊕ ABu ⊕ Bu A2 x ⊕ ( A ⊕ e) Bu , and after n substitutions: x = An x ⊕ ( An−1 ⊕ An−2 ⊕ · · · ⊕ A ⊕ e) Bu . In the formulæ above, e refers to the identity matrix; zeros on the diagonal and ε’s elsewhere. The symbol e will be used as the identity element for all spaces that will be encountered in this book. Similarly, ε will be used to denote the zero element of any space to be encountered. Since the entries of An denote the weights of paths with length n in the corresponding graph and A does not have paths of length greater than 4, we get An = −∞ for n ≥ 5. Therefore the solution x in the current example becomes x = ( A4 ⊕ A3 ⊕ A2 ⊕ A ⊕ e) Bu , for which we can write x = (e ⊕ A+ ) Bu , (1.21) 1.2. Miscellaneous Examples 13 where A+ was defined in (1.18). In (1.21), we made use of the series A∗ = e ⊕ A ⊕ · · · ⊕ An ⊕ An+1 ⊕ · · · , def (1.22) although it was concluded that Ak , k > n , does not contribute to the sum. With the conventional matrix calculus in mind one might be tempted to write for (1.22): (e ⊕ A ⊕ A2 ⊕ · · · ) = (e A)−1 . (1.23) Of course, we have not defined the inverse of a matrix within the current setting and so (1.23) is an empty statement. It is also strange to have a ‘minus’ sign in (1.23) and it is not known how to interpret this sign in the context of the max-operation at the left-hand side of the equation. It should be the reverse operation of ⊕. If we dare to continue along these shaky lines, one could write the solution of (1.20) as (e A)x = Bu ⇒ x = (e A)−1 Bu . Quite often one can guide one’s intuition by considering formal expressions of the kind (1.23). One tries to find formal analogies in the notation using conventional analysis. 
In Chapter 3 it will be shown that an inverse as in (1.23) does not exist in general and therefore we get ‘stuck’ with the series expansion. There is a dual way to analyze the critical path of Figure 1.3. Instead of starting at the initial node 1, one could start at the final node 6 and then work backward in time. This latter approach is useful when a target time for the completion of the project has been set. The question then is: what is the latest moment at which each node has to start its activity in such a way that the target time can still be fulfilled? If we call the starting times x i again, then it is not difficult to see that x i = min min A j where A= ε ε ε ε ε ε 5 ε ε ε ε ε ij + xj , B 3ε ε2 εε εε ε5 εε ε 1 4 ε ε ε ε ε 8 2 4 ε i +u , i = 1, . . . , 6 , , B= ε ε ε ε ε e . It is easily seen that A is equal to the transpose of A in (1.20); x 6 has been chosen as the completion time of the project. In matrix form, we can write x = A⊗x ⊕ B ⊗u , where ⊗ is now the matrix multiplication using min as addition of scalars and + as multiplication, whereas ⊕ is the min of vectors, componentwise. This topic of target times will be addressed in §5.6. 14 Synchronization and Linearity 1.2.2 Communication This subsection focuses on the Viterbi algorithm. It can conveniently be described by a formula of the form (1.1). The operations to be used this time are maximization and multiplication. The stochastic process of interest in this section, ν(k ), k ≥ 0, is a time homogeneous Markov chain with state space {1, 2, . . . , n }, defined on some probability space ( , F, P). The Markov property means that P ν (k + 1) = ik+1 | ν(0) = i0 , . . . , ν(k ) = ik = P ν (k + 1) = ik+1 | ν(k ) = ik , where P [A | B] denotes the conditional probability of the event A given the event B and A and B are in F. Let Mi j denote the transition probability1 from state j to i . The initial distribution of the Markov chain will be denoted p . The process ν = (ν(0), . . . , ν( K )) is assumed to be observed with some noise. This means that there exists a sequence of {1, 2, . . . , n }-valued random variables def z (k ), k = 0, . . . , K , called the observation, and such that Nik jk = P[z (k ) = ik | ν(k ) = jk ] does not depend on k and such that the joint law of (ν, z ), where z = (z (0), . . . , z ( K )), is given by the relation K P[ν = j , z = i ] = Nik jk M jk jk−1 Ni0 j0 p j0 , (1.24) k =1 where i = (i0 , . . . , i K ) and j = ( j0 , . . . , j K ). Given such a sequence z of observations, the question to be answered is to find the sequence j for which the probability P [ν = j | z ] is maximal. This problem is a highly simplified version of a text recognition problem. A machine reads handwritten text, symbol after symbol, but makes mistakes (the observation errors). The underlying model of the text is such that after having read a symbol, the probability of the occurrence of the next one is known. More precisely, the sequence of symbols is assumed to be produced by a Markov chain. We want to compute the quantity x jK ( K ) = max P [ν = j , z = i ] . (1.25) j0 ,... , j K −1 This quantity is also a function of i , but this dependence will not be made explicit. The argument that achieves the maximum in the right-hand side of (1.25) is the most likely text up to the ( K − 1)-st symbol for the observation i ; similarly, the argument j K which maximizes x jK ( K ) is the most likely K -th symbol given that the first K observations are i . From (1.24), we obtain K x jK ( K ) = = 1 Classically, max j0 ,... 
, j K −1 max j K −1 Nik jk M jk jk−1 Ni0 j0 p j0 k =1 NiK jK M jK jK −1 x jK −1 ( K − 1) in Markov chains, Mi j would rather be denoted M j i . , 1.2. Miscellaneous Examples 15 with initial condition x j0 (0) = Ni0 j0 p j0 . The reader will recognize the above algorithm as a simple version of (forward) dynamic programming. If Nik jk M jk jk−1 is denoted A jk jk−1 , then the general formula is x m (k ) = max ( Am x (k − 1)) , =1,... ,n m = 1, . . . , n . (1.26) This formula is similar to (1.1) if addition is replaced by maximization and multiplication remains multiplication. The Viterbi algorithm maximizes P[ν, z ] as given in (1.24). If we take the logarithm of (1.24), and multiply the result by −1, (1.26) becomes − ln(x m (k )) = min [− ln( Am ) − ln(x (k − 1))] , =1,... ,n m = 1, . . . , n . The form of this equation exactly matches (1.19). Thus the Viterbi algorithm is identical to an algorithm which determines the shortest path in a network. Actually, it is this latter algorithm—minimizing − ln(P[ν | z ])—which is quite often referred to as the Viterbi algorithm, rather than the one expressed by (1.26). 1.2.3 Production Consider a manufacturing system consisting of three machines. It is supposed to produce three kinds of parts according to a certain product mix. The routes to be followed by each part and each machine are depicted in Figure 1.4 in which Mi , i = 1, 2, 3, are the machines and Pi , i = 1, 2, 3, are the parts. Processing times are given in P1 P2 P3 M1 M2 M3 Figure 1.4: Routing of parts along machines Table 1.1. Note that this manufacturing system has a flow-shop structure, i.e. all parts follow the same sequence on the machines (although they may skip some) and every machine is visited at most once by each part. We assume that there are no set-up times on machines when they switch from one part type to another. Parts are carried on a limited number of pallets (or, equivalently, product carriers). For reasons of simplicity it is assumed that 1. only one pallet is available for each part type; 2. the final product mix is balanced in the sense that it can be obtained by means of a periodic input of parts, here chosen to be P1 , P2 , P3 ; 3. there are no set-up times or traveling times; 16 Synchronization and Linearity Table 1.1: Processing times P1 M1 M2 M3 3 4 P2 1 2 3 P3 5 3 4. the sequencing of part types on the machines is known and it is ( P2 , P3 ) on M1 , ( P1 , P2 , P3 ) on M2 and ( P1 , P2 ) on M3 . The last point mentioned is not for reasons of simplicity. If any machine were to start working on the part which arrived first instead of waiting for the appropriate part, the modeling would be different. Manufacturing systems in which machines start working on the first arriving part (if it has finished its current activity) will be dealt with in Chapter 9. We can draw a graph in which each node corresponds to a combination of a machine and a part. Since M1 works on 2 parts, M2 on 3 and M3 on 2, this graph has seven nodes. The arcs between the nodes express the precedence constraints between operations due to the sequencing of operations on the machines. To each node i in Figure 1.5 corresponds a number x i which denotes the earliest epoch at which the node can start its activity. In order to be able to calculate these quantities, the epochs at which the machines and parts (together called the resources) are available must be given. This is done by means of a six-dimensional input vector u (six since there are six resources: three machines and three parts). 
There is an output vector also; the components of the six-dimensional vector y denote the epochs at which the parts are ready and the machines have finished their jobs (for one cycle). The model becomes x = Ax ⊕ Bu ; (1.27) y = Cx , (1.28) in which the matrices are A= εε 1ε εε 1ε ε5 εε εε ε ε ε 3 ε 3 ε ε ε ε ε 2 ε 2 ε ε ε ε ε ε ε ε ε ε ε ε ε 4 ε ε ε ε ε ε ε ; B= e ε ε ε ε ε ε ε ε e ε ε ε ε ε ε ε ε ε e ε ε ε e ε ε ε ε e ε ε ε ε ε ε ε e ε ε ε ε ε ; 1.2. Miscellaneous Examples 17 C = ε ε ε ε ε ε 5 ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε 3 ε ε ε 3 ε ε ε 4 ε ε ε ε 3 ε 3 ε . Equation (1.27) is an implicit equation in x which can be solved as we did in the P1 P3 u6 u4 1 2 3 4 5 y6 6 y4 7 y5 u1 M1 M 2 u2 M3 P2 u5 u3 y3 y1 y2 Figure 1.5: The ordering of activities in the flexible manufacturing system subsection on Planning; x = A∗ Bu . Now we add feedback arcs to Figure 1.5 as illustrated in Figure 1.6. In this graph Figure 1.6: Production system with feedback arcs the feedback arcs are indicated by dotted lines. The meaning of these feedback arcs is the following. After a machine has finished a sequence of products, it starts with the next sequence. If the pallet on which product Pi was mounted is at the end, the finished product is removed and the empty pallet immediately goes back to the starting point to pick up a new part Pi . If it is assumed that the feedback arcs have zero cost, then u (k ) = y (k − 1), where u (k ) is the k -th input cycle and y (k ) the k -th output. Thus we 18 Synchronization and Linearity can write y (k ) = Cx (k ) = = C A∗ Bu (k ) C A∗ By (k − 1) . (1.29) The transition matrix from y (k − 1) to y (k ) can be calculated (it can be done by hand, but a simple computer program does the job also): def ∗ M = CA B = 6 εε ε6 9 8ε 89 6 10 7 10 6 ε 74 7ε 6 10 7 10 6 9 8ε 89 5 8 ε ε ε 8 . This matrix M determines the speed with which the manufacturing system can work. We will return to this issue in §1.3 1.2.4 Queuing System with Finite Capacity Let us consider four servers, Si , i = 1, . . . , 4, in series (see Figure 1.7). Each cus- S1 S2 S3 S4 Figure 1.7: Queuing system with four servers tomer is to be served by S1 , S2, S3 and S4 , and specifically in this order. It takes τi (k ) time units for Si to serve customer k (k = 1, 2, . . . ). Customer k arrives at epoch u (k ) into the buffer associated with S1. If this buffer is empty and S1 is idle, then this customer is served directly by S1 . Between the servers there are no buffers. The consequence is that if Si , i = 1, 2, 3, has finished serving customer k , but Si +1 is still busy serving customer k − 1, then Si cannot start serving the new customer k + 1. He must wait. To complete the description of the queuing system, it is assumed that the traveling times between the servers are zero. Let x i (k ) denote the beginning of the service of customer k by server Si . Before Si can start serving customer k + 1, the following three conditions must be fulfilled: • Si must have finished serving customer k ; • Si +1 must be idle (for i = 4 this condition is an empty one); • Si −1 must have finished serving customer k + 1 (for i = 1 this condition is an empty one and must be related to the arrival of customer k + 1 in the queuing system). 1.2. Miscellaneous Examples 19 It is not difficult to see that the vector x , consisting of the four x -components, satisfies ε ε ε ε τ (k + 1) ε ε ε x (k + 1) x (k + 1) = 1 ε τ2 (k + 1) ε ε ε ε τ3 (k + 1) ε (1.30) e τ1 (k ) e ε ε ε ε τ2 (k ) e ε x (k ) ⊕ u (k + 1) . 
⊕ ε ε ε τ3 (k ) e ε ε ε ε τ4 (k ) We will not discuss issues related to initial conditions here. For those questions, the reader is referred to Chapters 2 and 7. Equation (1.30), which we write formally as x (k + 1) = A2 (k + 1, k + 1)x (k + 1) ⊕ A1 (k + 1, k )x (k ) ⊕ Bu (k + 1) , is an implicit equation in x (k + 1) which can be solved again, as done before. The result is x (k + 1) = ( A2 (k + 1, k + 1))∗ ( A1 (k + 1, k )x (k ) ⊕ Bu (k + 1)) , where ( A2 (k + 1, k + 1))∗ equals e τ1 (k + 1) τ1 (k + 1)τ2 (k + 1) τ1 (k + 1)τ2 (k + 1)τ3 (k + 1) ε e τ2 (k + 1) τ2 (k + 1)τ3 (k + 1) ε ε ε ε . e ε τ3 (k + 1) e The customers who arrive in the queuing system and cannot directly be served by S1 , wait in the buffer associated with S1 . If one is interested in the buffer contents, i.e. the number of waiting customers, at a certain moment, one should use a counter (of customers) at the entry of the buffer and one at the exit of the buffer. The difference of the two counters yields the buffer contents, but this operation is nonlinear in the maxplus algebra framework. In §1.2.6 we will return to the ‘counter’-description of discrete event systems. The counters just mentioned are nondecreasing with time, whereas the buffer contents itself is fluctuating as a function of time. The design of buffer sizes is a basic problem in manufacturing systems. If the buffer contents tends to go to ∞, one speaks of an unstable system. Of course, an unstable system is an example of a badly designed system. In the current example, buffering between the servers was not allowed. Finite buffers can also be modeled within the max-plus algebra context as shown in the next subsection and more generally in §2.6.2. Another useful parameter is the utilization factor of a server. It is defined by the ‘busy time’ divided by the total time elapsed. Note that we did not make any assumptions on the service time τi (k ). If one is faced with unpredictable breakdowns (and subsequently a repair time) of the servers, then the service times might be modeled stochastically. For a deterministic and invariant (‘customer invariant’) system, the serving times do not, by definition, depend on the particular customer. 20 1.2.5 Synchronization and Linearity Parallel Computation The application of this subsection belongs to the field of VLSI array processors (VLSI stands for ‘Very Large Scale Integration’). The theory of discrete events provides a method for analyzing the performances of so-called systolic and wavefront array processors. In both processors, the individual processing nodes are connected together in a nearest neighbor fashion to form a regular lattice. In the application to be described, all individual processing nodes perform the same basic operation. The difference between systolic and wavefront array processors is the following. In a systolic array processor, the individual processors, i.e. the nodes of the network, operate synchronously and the only clock required is a simple global clock. The wavefront array processor does not operate synchronously, although the required processing function and network configuration are exactly the same as for the systolic processor. The operation of each individual processor in the wavefront case is controlled locally. It depends on the necessary input data available and on the output of the previous cycle having been delivered to the appropriate (i.e. directly downstream) nodes. For this reason a wavefront array processor is also called a data-driven net. 
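Returning for a moment to the queuing system of §1.2.4 before the wavefront example is developed, the solved form x(k + 1) = (A2(k + 1, k + 1))^* (A1(k + 1, k)x(k) ⊕ Bu(k + 1)) can be sketched as follows. The service times and the arrival epoch used below are hypothetical; the matrices A1, A2 and B are filled in according to the structure of (1.30).

```python
import numpy as np

EPS = -np.inf   # epsilon of the max-plus algebra

def mp_mat(A, B):
    """Max-plus matrix product."""
    return np.array([[np.max(A[i, :] + B[:, j]) for j in range(B.shape[1])]
                     for i in range(A.shape[0])])

def mp_star(A):
    """A* = e (+) A (+) A^2 (+) ...; A is nilpotent here, so the sum is finite."""
    n = A.shape[0]
    S = np.where(np.eye(n) == 1, 0.0, EPS)
    P = S.copy()
    for _ in range(n - 1):
        P = mp_mat(P, A)
        S = np.maximum(S, P)
    return S

def next_start_times(x, u_next, tau, tau_next):
    """x(k+1) = (A2)* ( A1 x(k) (+) B u(k+1) ), the solved form of (1.30)."""
    n = len(x)
    A2 = np.full((n, n), EPS)        # x_i(k+1) >= tau_{i-1}(k+1) + x_{i-1}(k+1)
    for i in range(1, n):
        A2[i, i - 1] = tau_next[i - 1]
    A1 = np.full((n, n), EPS)        # x_i(k+1) >= tau_i(k) + x_i(k)  and  x_i(k+1) >= x_{i+1}(k)
    for i in range(n):
        A1[i, i] = tau[i]
        if i + 1 < n:
            A1[i, i + 1] = 0.0
    B = np.full((n, 1), EPS)
    B[0, 0] = 0.0
    rhs = np.maximum(mp_mat(A1, x.reshape(-1, 1)), B + u_next)
    return mp_mat(mp_star(A2), rhs).ravel()

# Hypothetical service times and an arrival of customer k+1 at epoch 9.
x_k = np.array([0.0, 2.0, 5.0, 9.0])     # start of service of customer k by S1,...,S4
print(next_start_times(x_k, u_next=9.0,
                       tau=[2.0, 3.0, 4.0, 1.0],
                       tau_next=[2.0, 3.0, 4.0, 1.0]))   # here [9, 11, 14, 18]
```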
We will consider a network in which the execution times of the nodes (the individual processors) depend on the input data. In the case of a simple multiplication, the difference in execution time is a consequence of whether at least one of the operands is a zero or a one. We assume that if one of the operands is a zero or a one, the multiplication becomes trivial and, more importantly, faster. Data driven networks are at least as fast as systolic networks since in the latter case the period of the synchronization clock must be large enough to include the slowest local cycle or largest execution time. . . . B 21 B 11 . . . B 22 B 12 0 . . . ; A 12 ; A 11 0 2 1 0 . . . ; A 22 ; A 21 3 0 4 Figure 1.8: The network which multiplies two matrices Consider the network shown in Figure 1.8. In this network four nodes are connected. Each of these nodes has an input/output behavior as given in Figure 1.9. The purpose of this network is to multiply two matrices A and B ; A has size 2 × n and B has size n × 2. The number n is large ( 2), but otherwise arbitrary. The entries of the rows of A are fed into the network as indicated in Figure 1.8; they enter the two nodes on the left. Similarly, the entries of B enter the two top nodes. Each node will start its activity as soon as each of the input channels contains a data item, provided it has 1.2. Miscellaneous Examples 21 . . . Bl+1,m Blm ηγ . . . , Ai,j +1 , Aij Ai, j−1 . . . Bl−1,m . . . . . . Bl+1,m . . . Ai,j +1 ηγ Aij Blm + Aij , A i,j −1 . . . Blm Bl−1,m . . . Figure 1.9: Input and output of individual node, before and after one activity finished its previous activity and sent the output of that previous activity downstream. Note that in the initial situation—see Figure 1.8—not all channels contain data. Only node 1 can start its activity immediately. Also note that initially all loops contain the number zero. The activities will stop when all entries of A and B have been processed. Then each of the four nodes contains an entry of the product AB . Thus for instance, node 2 contains ( AB )12 . It is assumed that each node has a local memory at each of its input channels which functions as a buffer and in which at most one data item can be stored if necessary. Such a necessity arises when the upstream node has finished an activity and wants to send the result downstream while the downstream node is still busy with an activity. If the buffer is empty, the output of the upstream node can temporarily be stored in this buffer and the upstream node can start a new activity (provided the input channels contain data). If the upstream node cannot send out its data because one or more of the downstream buffers are full, then it cannot start a new activity (it is ‘blocked’). Since it is assumed that each node starts its activities as soon as possible, the network of nodes can be referred to as a wavefront array processor. The execution time of a node is either τ1 or τ2 units of time. It is τ1 if at least one of the input items, from the left and from above ( Ai j and Bi j ), is a zero or a one. Then the product to be performed becomes a trivial one. The execution time is τ2 if neither input contains a zero or a one. It is assumed that the entry Ai j of A equals zero or one with probability p , 0 ≤ p ≤ 1, and that Ai j is neither zero nor one with probability 1 − p . The entries of B are assumed to be neither zero nor one (or, if such a number would occur, it will not be detected and exploited). 
If x i (k ) is the epoch at which node i becomes active for the k -th time, then it follows from the description above that x 1 (k + 1) = α1 (k )x 1 (k ) ⊕ x 2 (k − 1) ⊕ x 3 (k − 1) , x 2 (k + 1) = α1 (k )x 2 (k ) ⊕ α1 (k + 1)x 1 (k + 1) ⊕ x 4 (k − 1) , x 3 (k + 1) = α2 (k )x 3 (k ) ⊕ α1 (k + 1)x 1 (k + 1) ⊕ x 4 (k − 1) , (1.31) x 4 (k + 1) = α2 (k )x 4 (k ) ⊕ α1 (k + 1)x 2 (k + 1) ⊕ α2 (k + 1)x 3 (k + 1) . In these equations, the coefficients αi (k ) are either τ1 (if the entry is either a zero or a 22 Synchronization and Linearity one) or τ2 (otherwise); αi (k + 1) have the same meaning with respect to the next entry. Systems of this type will be considered in Chapter 7. There is a correlation among the coefficients of (1.31) during different time steps. By replacing x 2 (k ) and x 3 (k ) by x 2 (k + 1) and x 3 (k + 1), respectively, x 4 (k ) by x 4 (k + 2) and α2 (k ) by α2 (k + 1), we obtain x 1 (k + 1) = α1 (k )x 1 (k ) ⊕ x 2 (k ) ⊕ x 3 (k ) , x 2 (k + 1) = α1 (k − 1)x 2 (k ) ⊕ α1 (k )x 1 (k ) ⊕ x 4 (k ) , (1.32) x 3 (k + 1) = α2 (k )x 3 (k ) ⊕ α1 (k )x 1 (k ) ⊕ x 4 (k ) , x 4 (k + 1) = α2 (k − 1)x 4 (k ) ⊕ α1 (k − 1)x 2 (k ) ⊕ α2 (k )x 3 (k ) . The correlation between some of the αi -coefficients still exists. The standard procedure to avoid problems connected to this correlation is to augment the state vector x . Two new state variables are introduced: x 5 (k + 1) = α1 (k ) and x 6 (k + 1) = α2 (k ). Equation (1.32) can now be written as x 1 (k + 1) = α1 (k )x 1 (k ) ⊕ x 2 (k ) ⊕ x 3 (k ) , x 2 (k + 1) = x 5 (k )x 2 (k ) ⊕ α1 (k )x 1 (k ) ⊕ x 4 (k ) , x 3 (k + 1) = α2 (k )x 3 (k ) ⊕ α1 (k )x 1 (k ) ⊕ x 4 (k ) , (1.33) x 4 (k + 1) = x 6 (k )x 4 (k ) ⊕ x 5 (k )x 2 (k ) ⊕ α2 (k )x 3 (k ) , x 5 (k + 1) = α1 (k ) , x 6 (k + 1) = α2 (k ) . The correlation in time of the coefficients αi has disappeared at the expense of a larger state vector. Also note that Equation (1.33) has terms x j (k ) ⊗ xl (k ), which cause the equation to become nonlinear (actually, bilinear). For our purposes of calculating the performance of the array processor, this does not constitute basic difficulties, as will be seen in Chapter 8. Equation (1.32) is non-Markovian and linear, whereas (1.33) is Markovian and nonlinear. 1.2.6 Traffic In a metropolitan area there are three railway stations, S1 , S2 , and S3 , which are connected by a railway system as indicated in Figure 1.10. The railway system consists of two inner circles, along which the trains run in opposite direction, and of three outer circles. The trains on these outer circles deliver and pick up passengers at local stations. These local stations have not been indicated in Figure 1.10 since they do not play any role in the problem to be formulated. There are nine railway tracks. Track Si S j denotes the direct railway connection from station Si to station S j ; track Si Si denotes the outer circle connected to station Si . Initially, a train runs along each of these nine tracks. At each station the trains must wait for the trains arriving from the other directions (except for the trains coming from the direction the current train is heading for) in order to allow for transfers. Another assumption to be satisfied is that trains on the same circle cannot bypass one another. If x i (k ) denotes the k -th departure time of the train in direction i (see Figure 1.10) then these departure times are described by x (k + 1) = A1 ⊗ x (k ), where A1 is the 9 × 9 1.2. Miscellaneous Examples 23 8 6 2 9 5 S2 3 S3 S1 1 7 4 Figure 1.10: The railway system, routing no. 
1 matrix A1 = e s12 ε ε ε ε ε s12 ε ε e s23 ε ε ε ε ε s23 s31 ε e ε ε ε s31 ε ε ε ε ε e ε s13 ε ε s13 ε ε ε s21 e ε s21 ε ε ε ε ε ε s32 e ε s32 ε s11 ε ε s11 ε ε s11 ε ε ε s22 ε ε s22 ε ε s22 ε ε ε s33 ε ε s33 ε ε s33 . An entry si j refers to the traveling time on track Si S j . These quantities include transfer times at the stations. The diagonal entries e prevent trains from bypassing one another on the same track at a station. The routing of the trains was according to Figure 1.10; trains on the two inner circles stay on these inner circles and keep the same direction; trains on the outer circles remain there. Other routings of the trains are possible; two such different routings are given in Figures 1.11 and 1.12. If x i (k ) denotes the k -th departure time from the same station as given in Figure 1.10, then the departure times are described again by a model of the form x (k + 1) = A ⊗ x (k ). The A-matrix corresponding to Figure 1.11 is indicated by A2 and the A-matrix corresponding to Figure 1.12 by A3 . If we define matrices Fi of size 3 × 3 in the following way: εεε ε ε s31 ε , F1 = ε ε ε , F2 = s12 ε εεε ε s23 ε s11 F3 = ε ε ε s22 ε ε ε , s33 ε F4 = ε s13 s21 ε ε ε s32 , ε 24 Synchronization and Linearity S3 S3 S2 S2 S1 S1 Figure 1.11: Routing no. 2 Figure 1.12: Routing no. 3 then the matrices A1 , A2 and A3 can be compactly written as F1 F3 e ⊕ F2 e ⊕ F2 e ⊕ F4 F3 , A2 = F2 A1 = F1 F2 F4 F3 F2 e ⊕ F2 A3 = F1 F2 F1 e F4 F4 e ⊕ F4 F4 F3 F3 , F3 F3 F3 . e It is seen that for different routing schemes, different models are obtained. In general, depending on the numerical values of si j , the time evolutions of these three models will be different. One needs a criterion for deciding which of the routing schemes is preferable. Depending on the application, it may be more realistic to introduce stochastic si j quantities. Suppose for instance that there is a swing bridge in the railway track from S3 to S1 (only in this track; there is no bridge in the track from S1 to S3 ). Each time a train runs from S3 to S1 there is a probability p , 0 ≤ p ≤ 1, that the train will be delayed, i.e. s31 becomes larger. Thus the matrices Ai become k -dependent; Ai (k ). The system has become stochastic. In this situation one may also have a preference for one of the three routings, or for another one. The last part of the discussion of this example will be devoted to the deterministic model of routing no. 1 again. However, to avoid some mathematical subtleties which are not essential at this stage, we assume that there is at least a difference of τ time units between two subsequent departures of trains in the directions 1 to 6 (τ > e). Consequently, the equations are now x (k + 1) = A1 ⊗ x (k ), where A1 equals the previous A1 except for the first six diagonal entries that are replaced by τ (instead of e earlier), and the last three diagonal entries sii that are replaced by sii ⊕ τ respectively. We then introduce a quantity χi (t ) which is related to x i (k ). The argument t of χi (t ) refers to the actual clock time and χi (t ) itself refers to the number of train departures 1.2. Miscellaneous Examples 25 in the direction i which have occurred up to (and including) time t . The quantity χi can henceforth only assume the values 0, 1, 2, . . . . 
At an arbitrary time t , the number of trains which left in the direction 1 can exceed at most by one: • the same number of trains τ units of time earlier; • the number of trains that left in the direction 3 s31 time units earlier (recall that initially a train was already traveling on each track); • the number of trains that left in the direction 7 s11 time units earlier. Therefore, we have χ1 (t ) = min (χ1 (t − τ ) + 1, χ3 (t − s31 ) + 1, χ7 (t − s11 ) + 1) . For χ2 one similarly obtains χ2 (t ) = min (χ1 (t − s12 ) + 1, χ2 (t − τ ) + 1, χ8 (t − s2 ) + 1) , etc. If all quantities si j and τ are equal to 1, then one can compactly write χ(t ) = A ⊗ χ(t − 1) , where χ(t ) = (χ1 (t ), . . . , χ9 (t )) , and where the matrix A is derived from A1 by replacing all entries equal to ε by +∞ and the other entries by 1. This equation must be read in the min-plus algebra setting. In case we want something more general than si j and τ equal to 1, we will consider the situation when all these quantities are integer multiples of a positive constant gc ; then gc can be interpreted as the time unit along which the evolution will be expressed. One then obtains χ(t ) = A 1 ⊗ χ(t − 1) ⊕ A2 ⊗ χ(t − 2) ⊕ · · · ⊕ Al ⊗ χ(t − l ) (1.34) for some finite l . The latter equation (in the min-plus algebra) and x (k + 1) = A1 ⊗ x (k ) (in the max-plus algebra) describe the same system. Equation (1.34) is referred to as the counter description and the other one as the dater description. The word ‘dater’ must be understood as ‘timer’, but since the word ‘time’ and its declinations are already used in various ways in this book, we will stick to the word ‘dater’. The awareness of these two different descriptions for the same problem has far-reaching consequences as will be shown in Chapter 5. The reader should contemplate that the stochastic problem (in which some of the si j are random) is more difficult to handle in the counter description, since the delays in (1.34) become stochastic (see §8.2.4). 1.2.7 Continuous System Subject to Flow Bounds and Mixing So far, the examples have been related to the realm of discrete event systems, and dynamic equations have been obtained under the form of recursive equations involving max (or min) and +-operations. We close this section by showing that a continuoustime system may naturally lead to essentially the same type of equations whenever 26 Synchronization and Linearity some flow limitation and mixing phenomena are involved. Had we adopted the point of view of conventional system theory, the model of such a system in terms of differential equations would have exhibited very complex nonlinearities. With the following approach, we will see that this system is ‘linear’ in a precise mathematical sense (see Chapter 6). An analogous discrete event model will be discussed in Example 2.48. In Figure 1.13, a fluid is poured through a long pipe into a first reservoir (empty Figure 1.13: A continuous system at time t = 0). The input u (t ) denotes the cumulated flow at the inlet of the pipe up to time t (hence u (t ) is a nondecreasing time function and u (t ) = 0 for t ≤ 0). It is assumed that it takes a delay of 2 units of time (say, 2 seconds) for the fluid to travel through the pipe. From the first reservoir, the fluid drops into a second reservoir through an aperture which limits the instantaneous flow to a maximum value of, say, 0.5 liter per second. The volume of fluid at time t in this second reservoir is denoted y (t ), and it is assumed that y (0) = 3 liters. 
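Before establishing the dynamic equations of this fluid example, the counter and dater descriptions introduced for the railway system can be put side by side numerically. The sketch below uses a hypothetical 3-direction example in which all traveling times equal 1, and applies the rule stated above: the counter matrix is obtained from the dater matrix by replacing every ε by +∞ and every other entry by 1.

```python
import numpy as np

EPS, TOP = -np.inf, np.inf   # zero elements of the max-plus and min-plus algebras

def maxplus(A, x):
    """Dater step: x(k+1) = A (x) x(k) in the max-plus algebra."""
    return np.array([np.max(A[i, :] + x) for i in range(len(x))])

def minplus(A, x):
    """Counter step: chi(t) = A (x) chi(t-1) in the min-plus algebra."""
    return np.array([np.min(A[i, :] + x) for i in range(len(x))])

# Hypothetical 3-direction example with all traveling times equal to 1.
A_dater = np.array([[1.0, EPS, 1.0],
                    [1.0, 1.0, EPS],
                    [EPS, 1.0, 1.0]])
# Rule from the text: replace eps entries by +inf and every other entry by 1.
A_counter = np.where(np.isneginf(A_dater), TOP, 1.0)

x = np.zeros(3)                    # x_i(k): k-th departure time, x(0) = 0
for k in range(5):
    x = maxplus(A_dater, x)
print("departure times x(5):", x)

chi = np.zeros(3)                  # chi_i(t): number of departures up to time t
for t in range(5):
    chi = minplus(A_counter, chi)
print("counters     chi(5):", chi)
```

With these data the k-th departure in every direction occurs at time k, and the counters indeed report k departures up to time k; the general relation between the two descriptions, including the treatment of initial conditions, is taken up in Chapter 5.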
Let us establish dynamic equations for such a system relating the output y to the input u . Because of the flow limitation into the second reservoir, we have: ∀t , ∀s ≥ 0 , y (t + s ) ≤ y (t ) + 0.5s . (1.35) On the other hand, since there is a delay of 2 seconds caused by the pipe, y (t ) should be compared with u (t − 2), and because there is a stock of 3 liters in the second reservoir at t = 0, we have: ∀t , y (t ) ≤ u (t − 2) + 3 . (1.36) It follows that ∀t , ∀s ≥ 0 , y (t ) ≤ y (t − s ) + 0.5s ≤ u (t − 2 − s ) + 3 + 0.5s , hence, ∀t , y (t ) ≤ = inf[u (t − 2 − s ) + 3 + 0.5s ] s ≥0 inf [u (t − τ ) + 3 + 0.5(τ − 2)] . τ ≥2 (1.37) 1.2. Miscellaneous Examples 27 Let def h (t ) = 3 3 + 0.5(t − 2) if t ≤ 2; otherwise. (1.38) and consider ∀t , def y (t ) = inf [u (t − τ ) + h (τ )] . τ ∈R (1.39) Indeed, in (1.39), the range of τ may be limited to τ ≥ 2 since, for τ < 2, h (τ ) remains equal to 3 whereas u (t − τ ) ≥ u (t − 2) (remember that u (·) is nondecreasing). Therefore, comparing (1.39) with (1.37), it is clear that y (t ) ≤ y (t ), ∀t . Moreover, choosing τ = 2 at the right-hand side of (1.39), we see that y satisfies (1.36). In addition, since for all s and all ϑ ≥ 0, h (s + ϑ) ≤ h (s ) + 0.5ϑ , then ∀t , ∀θ ≥ 0 , y (t + ϑ) = = ≤ = inf [u (t + ϑ − τ ) + h (τ )] τ ∈R inf [u (t − s ) + h (s + ϑ)] s ∈R inf [u (t − s ) + h (s )] + 0.5ϑ s ∈R y (t ) + 0.5ϑ . Thus, y satisfies (1.35). Finally, we have proved that y is the maximum solution of (1.35)–(1.36). It can also be checked that (1.39) yields y (t ) = 3, ∀t ≤ 2. Therefore, y is the solution which will be physically realized if we assume that, subject to (1.37)–(1.39), the fluid flows as fast as possible. This output trajectory is related to the input history u by an ‘infconvolution’ (see Equation (1.39)). In order to make this inf-convolution more visible, the inf-operator should be viewed as an ‘integration’ (which is nothing else than the ⊕-operator ranging over the real numbers). If moreover + in (1.39) is replaced by ⊗ one obtains the appearance of the conventional convolution. The same kind of inputoutput relationship (indeed, a ‘sup-convolution’ in that context) can be obtained from the recursive equations (1.12) by developing the recursion from any initial condition. As a final remark, observe that if we have two systems similar to the one shown in Figure 1.13, one producing a red fluid and the other producing a white fluid, and if we want to produce a pink fluid by mixing them in equal proportions, then the new output is related to the two inputs by essentially the same type of equations. More specifically, let yr (t ) and yw (t ) be the quantities of red and white fluids that have been produced in the two downstream reservoirs up to time t (including the initial reserves). Suppose that the two taps at their outlets are opened so that the same (very large) outflow of red and white liquids can be obtained unless one of the two reservoirs is empty, in which case the two taps are closed immediately. Then, min( yr (t ), yw (t )) is directly related to the quantity yp (t ) of pink fluid produced up to time t . Therefore, this mixing operation does not introduce new mathematical operators. 28 1.3 Synchronization and Linearity Issues and Problems in Performance Evaluation In the previous sections we dealt with equations of the form x (k + 1) = Ax (k ), or more generally x (k + 1) = A(k )x (k ) ⊕ B (k )u (k ). 
In the applications three different interpretations in terms of the operations were given: maximization and addition, minimization and addition and lastly, maximization and multiplication. In this section only the first interpretation will be considered (we will say that the system under consideration is in the max-plus algebra framework). Before that a brief introduction to the solution of the conventional equation (1.1) is needed. Assume that the initial vector (1.2) equals an eigenvector of A; the corresponding eigenvalue is denoted by λ. The solution of (1.1) can be written as x (t ) = λt x 0 , t = 0, 1, . . . . (1.40) More generally, if the initial vector can be written as a linear combination of the set of linearly independent eigenvectors, x0 = cjvj , (1.41) j where v j is the j -th eigenvector with corresponding eigenvalue λ j , the c j are coefficients, then x (t ) = c j λtj v j . j If the matrix A is diagonalizable, then the set of linearly independent eigenvectors spans Rn , and any initial condition x 0 can be expressed as in (1.41). If A is not diagonalizable, then one must work with generalized eigenvectors and the formula which expresses x (t ) in terms of eigenvalues and x 0 is slightly more complicated. This complication does not occur in the max-plus algebra context and therefore will not be dealt with explicitly. In Chapter 3 it will be shown that under quite general conditions an eigenvalue (λ) and corresponding eigenvector (v ) also exist in the max-plus algebra context for a square matrix ( A). The definition is A⊗v = λ⊗v . To exclude degenerate cases, it is assumed that not all components of v are identical to ε. As an example of a (nondegenerate) eigenvalue: 3 2 7 4 2.5 e = 4.5 2.5 e . Thus it is seen that the matrix A of (1.5) has an eigenvalue 4.5. Equation (1.40) is also valid in the current setting. If x 0 is an eigenvector of A, with corresponding eigenvalue λ, then the solution of the difference equation (1.8) can be written as x (k ) = λk x 0 ( = λk ⊗ x 0 ) , k = 0, 1, . . . . (1.42) 1.3. Issues and Problems in Performance Evaluation 29 The numerical evaluation of λk in this formula equals k λ in conventional analysis. The eigenvalue λ can be interpreted as the cycle time (defined as the inverse of the throughput) of the underlying system; each node of the corresponding network becomes active every λ units of time, since it follows straightforwardly from (1.42). Also, the relative order in which the nodes become active for the k -th time, as expressed by the components x i (k ), is exactly the same as the relative order in which the nodes become active for the (k + 1)-st time. More precisely, Equation (1.42) yields xl (k + 1) − x j (k + 1) = xl (k ) − x j (k ) , j , l = 1, . . . , n . Thus the solution (1.42) exhibits a kind of periodicity. Procedures exist for the calculation of eigenvalues and eigenvectors; an efficient one is the procedure known as Karp’s algorithm for which the reader is referred to Chapter 2. More discussion about related issues can be found in Chapter 3. Under suitable conditions the eigenvalue turns out to be unique (which differs from the situation in conventional analysis). It can be shown for instance that A of (1.5) has only one eigenvalue. Similarly, the matrix M of §1.2.3 also has a unique eigenvalue: e e 6 εε ε65 3 9 8 ε 8 9 8 3 6 10 7 10 6 ε 3.5 = 9.5 3.5 . 
0.5 0.5 ε 7 4 7 ε ε 3.5 6 10 7 10 6 ε 3.5 3 3 9 8ε 898 It follows that the eigenvalue equals 9.5, which means in more practical terms that the manufacturing system ‘delivers’ an item (a product or a machine) at all of its output channels every 9.5 units of time. The eigenvector of this example is also unique, apart from adding the same constant to all components. If v is an eigenvector, then cv , where c is a scalar, also is an eigenvector, as it follows directly from the definition of eigenvalue. It is possible that several eigenvectors can be associated with the only eigenvalue of a matrix, i.e. eigenvectors may not be identical up to an additional constant. Suppose that we deal with the system characterized by the matrix of (1.5); then it is known from earlier that the ‘cycle time’ is 9/2 units of time. The throughput is defined as the inverse of the cycle time and equals 2/9. If we had the choice of reducing one arbitrary entry of A by 2, which entry should we choose such that the cycle time becomes as small as possible? To put it differently, if a piece of equipment were available which reduces the traveling time at any connection by 2, where should this piece of equipment be placed? By trial and error it is found that either A12 or A21 should be reduced by 2; in both cases the new cycle time becomes 4. If one reduces A11 or A22 by this amount instead of A12 or A21 , then the cycle time remains 9/2. The consequences of the four potential ways of reduction are expressed by 17 24 2.5 e 37 04 3 e = 4.5 =4 3 e 2.5 e ; ; 35 24 1 e 37 22 2.5 e =4 1 e = 4.5 ; 2.5 e . 30 Synchronization and Linearity To answer the question of which ‘transportation line’ to speed up for more general networks, application of the trial and error method as used above would become very laborious. Fortunately more elegant and more efficient methods exist. For those one needs the notion of a critical circuit, which is elaborated upon in Chapters 2 and 3. Without defining such a circuit in this section formally let us mention that, in Figure 1.1, this critical circuit consists of the arcs determined by A12 and A21 . Note that ( A12 + A21 )/2 = λ = 9/2, and this equality is not a coincidence. Stochastic extensions are possible. Towards that end, consider x (k + 1) = A(k )x (k ) , where the matrix A now depends on k in a stochastic way. Assume that x ∈ R2 and that for each k the matrix A is one of the following two matrices: 37 24 , 3 2 5 4 . Both matrices occur with probability 1/2 and there is no correlation in time. A suitable definition of cycle time turns out to be lim E [x i (k + 1) − x i (k )] , k →∞ where E denotes mathematical expectation. Application of the theory presented in Chapters 7 and 8 shows that this cycle time is independent of i and is equal to 13/3. Conventional linear systems with inputs and outputs are of the form (1.10), although (1.10) itself has the max-plus algebra interpretation. This equation is a representation of a linear system in the time domain. Its representation in the z -domain equals Y (z ) = C (z I − A)−1 BU (z ) , where Y (z ) and U (z ) are defined by ∞ Y (z ) = y (i )z −i ∞ , U (z ) = i =0 u (i )z −i , i =0 def where it is tacitly assumed that the system was at rest for t ≤ 0. The matrix H (z ) = C (z I − A)−1 B is called the transfer matrix of the system. Here I refers to the identity matrix in conventional algebra. The notion of transfer matrix is especially useful when subsystems are combined to build larger systems, by means of parallel, series and feedback connections. 
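Before turning to the max-plus counterpart of the transfer matrix, the eigenvalue and cycle time statements made earlier in this section can be checked numerically. The sketch below, written for the matrix of (1.5), verifies the eigenpair (4.5, (2.5, e)) and estimates the cycle time as an average growth rate of x(k + 1) = Ax(k); it then repeats the experiment of reducing a single entry by 2.

```python
import numpy as np

def mp_vec(A, x):
    """Max-plus matrix-vector product."""
    return np.array([np.max(A[i, :] + x) for i in range(len(x))])

def cycle_time(A, k0=50, k1=250):
    """Average growth rate (x_i(k1) - x_i(k0)) / (k1 - k0) of x(k+1) = A x(k)."""
    x = np.zeros(A.shape[0])
    for k in range(k1):
        if k == k0:
            x0 = x.copy()
        x = mp_vec(A, x)
    return (x - x0) / (k1 - k0)

A = np.array([[3.0, 7.0],
              [2.0, 4.0]])              # the matrix of (1.5)
v = np.array([2.5, 0.0])                # candidate eigenvector (2.5, e)
print(mp_vec(A, v) - v)                 # -> [4.5, 4.5]: eigenvalue 4.5
print(cycle_time(A))                    # -> [4.5, 4.5]

# Reducing a single entry by 2: only A12 or A21 lowers the cycle time to 4,
# in agreement with the critical circuit formed by the arcs of A12 and A21.
for i, j in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    A_red = A.copy()
    A_red[i, j] -= 2.0
    print("reduce A%d%d:" % (i + 1, j + 1), cycle_time(A_red))
```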
In the max-plus algebra context, the z -transform also exists (see [72]), but here we will rather refer to the γ -transform where γ operates as z −1 . For instance, the γ -transform of u is defined as ∞ u (i ) ⊗ γ i , U (γ ) = i =0 1.3. Issues and Problems in Performance Evaluation 31 and Y (γ ) and X (γ ) are defined likewise. Multiplication of (1.12) by γ k yields γ −1 x (k + 1)γ k+1 y (k )γ k = = A ⊗ x (k )γ k ⊕ B ⊗ u (k )γ k , C ⊗ x (k )γ k . (1.43) If these equations are summed with respect to k = 0, 1, . . . , then we obtain γ −1 X (γ ) Y (γ ) = = A ⊗ X (γ ) ⊕ B ⊗ U (γ ) ⊕ γ −1 x 0 , C ⊗ X (γ ) . (1.44) The first of these equations can be solved by first multiplying (max-plus algebra), equivalently adding (conventional) the left- and right-hand sides by γ and then repeatedly substituting the right-hand side for X (γ ) within this right-hand side. This results in X (γ ) = (γ A)∗ (γ BU (γ ) ⊕ x 0 ) . Thus we obtain Y (γ ) = H (γ )U (γ ), provided that x 0 = ε, and where the transfer matrix H (γ ) is defined by H (γ ) = C ⊗ (γ A)∗ ⊗ γ ⊗ B = γ C B ⊕ γ 2 C AB ⊕ γ 3 C A2 B ⊕ · · · . (1.45) The transfer matrix is defined by means of an infinite series and the convergence depends on the value of γ . If the series is convergent for γ = γ , then it is also convergent for all γ ’s which are smaller than γ . If the series does not converge, it still has a meaning as a formal series. Exactly as in conventional system theory, the product of two transfer matrices (in which it is tacitly assumed that the sizes of these matrices are such that the multiplication is possible), is a new transfer matrix which refers to a system which consists of the original systems connected in series. In the same way, the sum of two transfer matrices refers to two systems put in parallel. This section will be concluded by an example of such a parallel connection. We are given two systems. The first one is given in (1.13), and is characterized by the 1 × 1 transfer matrix H1 = εγ ⊕ 11γ 2 ⊕ 15γ 3 ⊕ 20γ 4 ⊕ 24γ 5 ⊕ 29γ 6 ⊕ · · · . It is easily shown that this series converges for γ ≤ −4.5; the number 4.5 corresponds to the eigenvalue of A. The second system is given by eε4 ε x (k + 1) = 1 1 ε x (k ) ⊕ 2 u (k ) , ε63 e y (k ) = ( 1 1 4 )x (k ) , and its transfer matrix is H2 = 4γ ⊕ 12γ 2 ⊕ 15γ 3 ⊕ 18γ 4 ⊕ 23γ 5 ⊕ 26γ 6 ⊕ · · · . 32 Synchronization and Linearity The transfer matrix of the two systems put in parallel has size 1 × 1 again (one can talk about a transfer function) and is obtained as Hpar = H1 ⊕ H2 = 4γ ⊕ 12γ 2 ⊕ 15γ 3 ⊕ 20γ 4 ⊕ 24γ 5 ⊕ 29γ 6 ⊕ · · · . (1.46) A transfer function can easily be visualized. If H (γ ) is a scalar function, i.e. the system has one input and one output, then it is a continuous and piecewise linear function. As an example, the transfer function of the parallel connection considered above is pictured in Figure 1.14. Above it was shown how to derive the transfer matrix of a system if the representation of the system in the ‘time domain’ is given. This time domain representation is characterized by the matrices A, B and C . Now one could pose the opposite question: How can we obtain a time domain representation, or equivalently, how can we find A, B and C if the transfer matrix is given? A partial answer to this question is given in [103]. For the example above, one would like to obtain a time domain representation of the two systems put in parallel starting from (1.46). This avenue will not be pursued now. 
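The coefficients of a transfer matrix such as (1.45) can be generated term by term as CB, CAB, CA^2B, . . . . The sketch below does this for the second system above, with its matrices A, B and C taken from the display preceding H2, and then forms the parallel connection by the coefficient-wise ⊕ of (1.46); it is an illustration only.

```python
import numpy as np

EPS = -np.inf

def mp_mat(A, B):
    """Max-plus matrix product."""
    return np.array([[np.max(A[i, :] + B[:, j]) for j in range(B.shape[1])]
                     for i in range(A.shape[0])])

def transfer_coeffs(A, B, C, K=6):
    """Coefficients of H(gamma) = C B gamma (+) C A B gamma^2 (+) ..., cf. (1.45)."""
    coeffs, P = [], B
    for _ in range(K):
        coeffs.append(mp_mat(C, P)[0, 0])
        P = mp_mat(A, P)
    return coeffs

# Second system of the example (entries as displayed above).
A2 = np.array([[0.0, EPS, 4.0],
               [1.0, 1.0, EPS],
               [EPS, 6.0, 3.0]])
B2 = np.array([[EPS], [2.0], [0.0]])
C2 = np.array([[1.0, 1.0, 4.0]])

H2 = transfer_coeffs(A2, B2, C2)
print(H2)                             # -> [4, 12, 15, 18, 23, 26], the coefficients of H2

# Coefficients of H1 as listed above; the parallel connection is their entrywise max.
H1 = [EPS, 11.0, 15.0, 20.0, 24.0, 29.0]
print(np.maximum(H1, H2))             # -> [4, 12, 15, 20, 24, 29], i.e. Hpar of (1.46)
```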
Instead, one can always obtain such a representation by connecting the underlying networks of the two original systems in the appropriate (‘parallel’) way and then derive the state equations directly. In this way one gets for the above example, 37εεε ε 2 4 ε ε ε 1 x (k + 1) = ε ε e ε 4 x (k ) ⊕ ε u (k ) , ε ε 1 1 ε 2 εεε63 e y (k ) = ( 3 ε 1.4 1 1 4 )x (k ) . Notes A few times, reference has been made to (linear) system theory. Classic texts are [81], [32] or [72]. A few examples could be phrased in terms of dynamic programming. There are many elementary texts which explain the theory of dynamic programming. A more advanced text is [18]. Systems in the context of max-plus algebra were probably first described in [49], though most theory in this book is algebraically rather than system oriented. It was in [39] where the relation between system theory and max-plus algebra was clearly shown. The shortest path problem is a standard example in dynamic programming texts. The Viterbi algorithm was found in [59]. The example on production was presented in [38], the examples of parallel computation and traffic can be found in [97]. Other simple examples can be found in [101]. A max-plus modeling of the Dutch intercity railway net is given in [27]. An application to chemical batch processing is given in [107]. A good introductory text to design methods of processors for parallel computation is [79]. The relation between eigenvalue and cycle time was developed in [39]. Stochastic extensions were given in [104]. The connection between transfer matrices and state equations in the max-plus algebra context was investigated in [103]; see also §9.2.3. 33 24 γγ 5 20 γγ 4 15 γγ 3 1.4. Notes 7 3 3 γγ H par 6 29 γγ 6 4 2 −6 −4 −2 0 12 γ 2 γ −8 −2 15 3 20 γγ γγ 4 24 γγ 5 6 29 γγ 7 33 γγ 4γ γ −4 Figure 1.14: The transfer function Hpar as a function of γ −6 34 Synchronization and Linearity Chapter 2 Graph Theory and Petri Nets 2.1 Introduction An overview of various results in the theories of graphs and Petri nets will be given. Several properties of DEDS can be expressed in terms of graph-theoretic notions such as strong connectedness and critical circuits. A rich relationship exists between graphs and matrices, about which many books have been written. Here, we will emphasize some of these relationships between directed graphs and matrices, together with their consequences if such matrices and graphs are composed to build larger ones. The way to construct such compositions is by means of parallel and series connections. Petri nets, which describe DEDS pictorially, can be viewed as bipartite graphs. An essential feature of Petri nets, not present in conventional graphs, is that they are dynamic systems. Tokens are used to reflect this dynamic behavior. There is an equivalence between DEDS without concurrency and a subclass of Petri nets called ‘event graphs’. For any timed event graph, we will show how to obtain a mathematical model in terms of recurrence equations. In the proper algebraic framework, these equations are linear and the model offers a strong analogy with conventional linear dynamic systems. In the last part of this chapter, starting from the point of view of resources involved in DEDS, we propose a methodolgy to go from the specifications of a concrete system to its modeling by event graphs. 
2.2 Directed Graphs A directed graph G is defined as a pair (V , E ), where V is a set of elements called nodes and where E is a set the elements of which are ordered (not necessarily different) pairs of nodes, called arcs. The possibility of several arcs between two nodes exists (one then speaks about a multigraph); in this chapter, however, we almost exclusively deal with directed graphs in which there is at most one (i.e. zero or one) arc between any two nodes. One distinguishes graphs and directed graphs. The difference between the two is that in a graph the elements of E are not ordered while they are in a directed graph. Instead of nodes and arcs, one also speaks about vertices and edges, respectively. The origin of the symbols V and E in the definition of a (directed) graph is due to the first 35 36 Synchronization and Linearity letters of the latter two names. Instead of directed graph one often uses the shorter word ‘digraph’, or even ‘graph’ if it is clear from the context that digraph is meant. In this chapter we will almost exclusively deal with digraphs (hence also called graphs). Denote the number of nodes by n , and number the individual nodes 1, 2, . . . , n . If (i, j ) ∈ E , then i is called the initial node or the origin of the arc (i, j ), and j the final node or the destination of the arc (i, j ). Graphically, the nodes are represented by points, and the arc (i, j ) is represented by an ‘arrow’ from i to j . We now give a list of concepts of graph theory which will be used later on. Predecessor, successor. If in a graph (i, j ) ∈ E , then i is called a predecessor of j and j is called a successor of i . The set of all predecessors of j is indicated by π( j ) and the set of all successors of i is indicated by σ (i ). A predecessor is also called an upstream node and a successor is also called a downstream node. Source, sink. If π(i ) = ∅, then node i is called a source; if σ (i ) = ∅ then i is called a sink. Depending on the application, a source, respectively sink, is also called an input(-node), respectively an output(-node) of the graph. Path, circuit, loop, length. A path ρ is a sequence of nodes (i1 , i2 , . . . , i p ), p > 1, such that i j ∈ π(i j +1 ), j = 1, . . . , p − 1. Node i1 is the initial node and i p is the final one of this path. Equivalently, one also says that a path is a sequence of arcs which connects a sequence of nodes. An elementary path is a path in which no node appears more than once. When the initial and the final nodes coincide, one speaks of a circuit. A circuit (i1 , i2 , . . . , i p = i1 ) is an elementary circuit if the path (i1 , i2 , . . . , i p −1 ) is elementary. A loop is a circuit (i, i ), that is, a circuit composed of a single node which is initial and final. This definition assumes that i ∈ π(i ), that is, there does exist an arc from i to i . The length of a path or a circuit is equal to the sum of the lengths of the arcs of which it is composed, the lengths of the arcs being 1 unless otherwise specified. With this convention, the length of a loop is 1. The length of path ρ is denoted |ρ |l . The subscript ‘l’ here refers to the word ‘length’ (later on, another subscript ‘w’ will appear for a different concept). The set of all paths and circuits in a graph is denoted R . A digraph is said to be acyclic if R contains no circuits. Descendant, ascendant. The set of descendants σ + (i ) of node i consists of all nodes j such that a path exists from i to j . 
Similarly the set of ascendants π + (i ) of node i is the set of all nodes j such that a path exists from j to i . One has, e.g., π + (i ) = π(i ) ∪ π(π(i )) ∪ . . . . The mapping i → π ∗ (i ) = {i } ∪ π + (i ) is the transitive closure of π ; the mapping i → σ ∗ (i ) = {i } ∪ σ + (i ) is the transitive closure of σ . Subgraph. Given a graph G = (V , E ), a graph G = (V , E ) is said to be a subgraph of G if V ⊂ V and if E consists of the set of arcs of G which have their origins and destinations in V . Chain, connected graph. A graph is called connected if for all pairs of nodes i and j there exists a chain joining i and j . A chain is a sequence of nodes (i1 , i2 , . . . , i p ) such that between each pair of successive nodes either the 2.2. Directed Graphs 37 arc (i j , i j +1 ) or the arc (i j +1 , i j ) exists. If one disregards the directions of the arcs in the definition of a path, one obtains a chain. Strongly connected graph. A graph is called strongly connected if for any two different nodes i and j there exists a path from i to j . Equivalently, i ∈ σ ∗ ( j ) for all i, j ∈ V , with i = j . Note that, according to this definition, an isolated node, with or without a loop, is a strongly connected graph. Bipartite graph. If the set of nodes V of a graph G can be partitioned into two disjoint subsets V1 and V2 such that every arc of G connects an element of V1 with one of V2 or the other way around, then G is called bipartite. In §2.3, it will be useful to introduce the notion of an ‘empty circuit’ the length of which is equal to 0 by definition. An empty circuit contains no arcs. The circuit (i ) is an empty circuit which should not be confused with the loop (i, i ) of length 1 (the latter makes sense only if there exists an arc from node i to itself). Empty circuits are not included in the set R of paths. To exemplify the various concepts introduced, consider the graph presented in Figure 2.1. It is a digraph since the arcs are indeed directed. The graph has seven nodes. Node 3 is a predecessor of node 6; 3 ∈ π(6). Similarly, 6 ∈ σ (3). The sequence of nodes 1, 3, 6, 4, 3, 2 is a nonelementary path. The arc (1, 1) is a loop and the sequence of nodes 3, 6, 4, 3 is an elementary circuit of length 3. The sequence of nodes 2, 3, 6 is a chain. It should be clear that the graph of Figure 2.1 is connected. Definition 2.1 (Equivalence relation R ) Let i, j ∈ V be two nodes of a graph. We say that i R j , if either i = j or there exist paths from i to j and from j to i. Then V is split up into equivalence classes V1 , . . . , Vq , with respect to the relation R. Note that if node i belongs to V , then V = σ ∗ (i ) ∩ π ∗ (i ). To each equivalence class V corresponds a subgraph G = (V , E ), where E is the restriction of E to V , which is strongly connected. Definition 2.2 (Maximal strongly connected subgraphs–m.s.c.s.) The subgraphs Gi = (Vi , Ei ) corresponding to the equivalence classes determined by R are the maximal strongly connected subgraphs of G . Notation 2.3 • The subset of nodes of the m.s.c.s. containing node i (and possibly reduced to i ) is denoted [i ]. • The subset of nodes j ∈π ∗ (i ) [ j ] is denoted [≤ i ]. • The symbol [< i ] represents the subset of nodes [≤ i ] \ [i ]. The graph of Figure 2.1 has two m.s.c.s.’s, namely the subgraphs consisting of the nodes 1, 3, 6, 4 and 2, 5, 7, respectively. If one ‘lumps’ the nodes of each m.s.c.s. into a single node, one obtains the so-called reduced graph. 
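Definition 2.2 can be turned into a small computation: the m.s.c.s. containing node i is σ^*(i) ∩ π^*(i). The sketch below does exactly this; since the complete arc set of Figure 2.1 is not listed in the text, the adjacency lists used here are only one hypothetical choice consistent with the two m.s.c.s.'s just mentioned.

```python
def successors_closure(adj, i):
    """sigma*(i): node i together with all of its descendants."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def mscs(nodes, adj):
    """Maximal strongly connected subgraphs: the class of i is sigma*(i) & pi*(i)."""
    radj = {}
    for u, vs in adj.items():
        for v in vs:
            radj.setdefault(v, []).append(u)
    classes = []
    for i in nodes:
        cls = successors_closure(adj, i) & successors_closure(radj, i)
        if cls not in classes:
            classes.append(cls)
    return classes

# Hypothetical arc list, consistent with the description of Figure 2.1
# (the nodes 1, 3, 6, 4 and 2, 5, 7 must form the two m.s.c.s.'s).
adj = {1: [1, 3], 3: [2, 6], 6: [4], 4: [1, 3],
       2: [5], 5: [7], 7: [2]}
print(mscs(range(1, 8), adj))   # -> [{1, 3, 4, 6}, {2, 5, 7}]
```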
38 Synchronization and Linearity 2 5 7 1 3 6 4 Figure 2.1: A digraph Figure 2.2: The reduced graph def Definition 2.4 (Reduced graph) The reduced graph of G is the graph with nodes V = {1, . . . , q } (one node per m.s.c.s.), and with arcs E , where (i, j ) ∈ E if (k , l ) ∈ E for some node k of Vi and some node l of V j . Figure 2.2 shows the reduced graph of the graph in Figure 2.1. Notation 2.5 A node i of the reduced graph corresponds to a collection of nodes of the original graph. Let x be a vector the entries of which are associated with nodes of the original graph. If we want to refer to the subvector associated with the nodes of m.s.c.s. i , we will use the notation x (i ) . Similarly, for a matrix A, A(i )( j ) is the block extracted from A by keeping the rows associated with the nodes of m.s.c.s. i and the columns associated with the nodes of m.s.c.s. j . If node of the original graph belongs to m.s.c.s. i , the notation x [ ] is equivalent to x (i ) . Similarly, x (<i ) , respectively x (≤i ) , is equivalent to x [< ] , respectively x [≤ ] . Lemma 2.6 The reduced graph is acyclic. Proof If there is a path from k ∈ Vi to l ∈ V j , then there is no path from any node of V j to any node of Vi (otherwise, k and l would be in the same m.s.c.s.). Denote the existence of a path from one subgraph Gi to another one G j by the binary relation R ; Gi R G j . Then these subgraphs G1 , . . . , Gq , together with the relation R form a partially ordered set, see Chapter 4 and also [85]. 2.3 Graphs and Matrices In this section we consider matrices with entries belonging to an abstract alphabet C in which some algebraic operations will be defined in §2.3.1. Some relationships between these matrices and ‘weighted graphs’ will be introduced. Consider a graph G = (V , E ) and associate an element Ai j ∈ C with each arc ( j , i ) ∈ E : then G is called a weighted graph. The quantity Ai j is called the weight of arc ( j , i ). Note that the second subscript of Ai j refers to the initial (and not the final) node. The reason is that, in the algebraic context, we will work with column vectors (and not with row vectors) 2.3. Graphs and Matrices 39 later on. In addition, we will also consider compositions of matrices and the resulting consequences for the corresponding graphs. The alphabet C contains a special symbol ε the properties of which will be given in §2.3.1. Definition 2.7 (Transition graph) If an m × n matrix A = ( Ai j ) with entries in C is given, the transition graph of A is a weighted, bipartite graph with n + m nodes, labeled 1, . . . , m , m + 1, . . . , m + n , such that each row of A corresponds to one of the nodes 1, . . . , m ; each column of A corresponds to one of the nodes m + 1, . . . , m + n. An arc from j to n + i, 1 ≤ i ≤ m, 1 ≤ j ≤ n, is introduced with weight Ai j if Ai j = ε. As an example, consider the matrix 3ε ε ε e ε A= ε ε ε 4 4 ε εε ε 2 ε ε ε 1 ε 7 ε 2 ε ε ε ε ε ε ε ε ε ε e ε ε ε 5 8 ε ε ε 1 ε ε 6 ε ε . (2.1) Its transition graph is depicted in Figure 2.3. 1 2 3 4 5 6 7 3 0 4 4 2 1 7 2 0 5 8 1 6 8 9 10 11 12 13 14 Figure 2.3: The transition graph of A Definition 2.8 (Precedence graph) The precedence graph of a square n × n matrix A with entries in C is a weighted digraph with n nodes and an arc ( j , i ) if Ai j = ε, in which case the weight of this arc receives the numerical value of Ai j . The precedence graph is denoted G ( A). It is not difficult to see that any weighted digraph G = (V , E ) is the precedence graph of an appropriately defined square matrix. 
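The correspondence between square matrices and weighted digraphs stated in the last sentence can be made concrete with a short sketch. It converts a hypothetical weighted arc list into the matrix whose precedence graph it is, and back; recall that the second subscript of A_{ij} refers to the initial node of the arc (nodes are 0-indexed in the code).

```python
import numpy as np

EPS = -np.inf   # the special symbol eps: "no arc"

def precedence_arcs(A):
    """Arcs (j, i) of the precedence graph G(A), with weight A_ij, whenever A_ij is not eps."""
    n = A.shape[0]
    return [(j, i, A[i, j]) for i in range(n) for j in range(n) if A[i, j] != EPS]

def matrix_of_graph(n, arcs):
    """Converse direction: the square matrix whose precedence graph is the given digraph."""
    A = np.full((n, n), EPS)
    for (j, i, w) in arcs:
        A[i, j] = w          # second subscript = initial node of the arc
    return A

# A small hypothetical weighted digraph on 3 nodes, including a loop on node 0.
arcs = [(0, 1, 5.0), (1, 2, 3.0), (2, 0, 2.0), (0, 0, 1.0)]
A = matrix_of_graph(3, arcs)
print(A)
print(precedence_arcs(A))   # recovers the arc list (up to ordering)
```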
The weight Ai j of the arc from node j to 40 Synchronization and Linearity node i defines the i j -th entry of a matrix A. If an arc does not exist, the corresponding entry of A is set to ε. The matrix A thus defined has G as its precedence graph. The transition graph of a square n × n matrix, which has 2n nodes, can be transformed into a precedence graph of n nodes. Towards that end, one combines the nodes i and n + i of the transition graph into one single node for the precedence graph, i = 1, . . . , n . As an example, Figure 2.4 gives the precedence graph of the matrix A 4 0 3 1 2 2 7 32 4 1 4 0 8 5 5 6 7 6 1 Figure 2.4: The precedence graph of A defined in (2.1). One directly recognizes the relation of this graph with the transition graph in Figure 2.3. The latter graph has been ‘folded’ so as to obtain the precedence graph. It may be convenient to consider that entries Ai j equal to ε define dummy arcs (which are not drawn) in the associated precedence or transition graph. Later, a path including a dummy arc will be called a dummy path. Dummy paths are not included in the set R of paths (which were taken into account in Definition 2.1; therefore these dummy paths are not involved in the definition of m.s.c.s.’s). The interest of the notion of dummy arcs is that these arcs may be considered as being of the same length as arcs associated with entries Ai j = ε (generally this length is 1); hence arcs of the same length can be associated with all entries of a matrix. Notation 2.9 The number of m.s.c.s.’s of G ( A) is denoted N A . For later reference, the following definitions are given. Definition 2.10 (Incidence matrix) The incidence matrix F = ( Fi j ) of a graph G = (V , E ) is a matrix the number of columns of which equals the number of arcs and the number of rows of which equals the number of nodes of the graph. The entries of F can take the values 0, 1 or −1. If l = (i, j ) ∈ E , i = j , then Fil = 1, F jl = −1 and the other entries of column l are 0. If l = (i, i ) ∈ E , then Fil = 0. Definition 2.11 (Adjacency matrix) The adjacency matrix G = (G i j ) of a graph G = (V , E ) is a matrix the numbers of rows and columns of which are equal to the number of nodes of the graph. The entry G i j is equal to 1 if j ∈ π(i ) and to 0 otherwise. Note that if G = G ( A), then G i j = 1 if and only if Ai j = ε (G describes the ‘support’ of A). 2.3. Graphs and Matrices 2.3.1 41 Composition of Matrices and Graphs We now study two kinds of compositions of matrices, and the relation between the transition graph of these compositions and the original transition graphs. These compositions are, respectively, the parallel composition, denoted ⊕, and the series composition, denoted ⊗. These compositions will be defined by means of the corresponding composition operations of elements in the alphabet C , for which the same symbols ⊕ and ⊗ will be used. The operation ⊕ is usually referred to as ‘addition’ or ‘sum’, and the operation ⊗ as ‘multiplication’ or ‘product’. The alphabet C includes two special elements ε and e with specific properties to be defined in the following set of axioms: Associativity of addition: ∀a , b , c ∈ C , (a ⊕ b ) ⊕ c = a ⊕ (b ⊕ c) . Commutativity of addition: ∀a , b ∈ C , a⊕b = b⊕a . Associativity of multiplication: ∀a , b , c ∈ C , (a ⊗ b ) ⊗ c = a ⊗ (b ⊗ c) . Right and left distributivity of multiplication over addition: ∀a , b, c ∈ C , (a ⊕ b ) ⊗ c = (a ⊗ c) ⊕ (b ⊗ c) , ∀a , b , c ∈ C , c ⊗ (a ⊕ b ) = (c ⊗ a ) ⊕ (c ⊗ b ) . 
Existence of a zero element: ∃ε ∈ C : ∀a ∈ C , a⊕ε=a . Absorbing zero element: ∀a ∈ C , a⊗ε=ε . Existence of an identity element: ∃e ∈ C : ∀a ∈ C , a⊗e = e⊗a = a . In Chapter 3 other related axioms will be discussed in detail. There the notion of a semifield will be introduced and its relation to axioms of this type will be made clear. The parallel composition ⊕ of matrices is defined for matrices of the same size by the following rule: if A = ( Ai j ) and B = ( Bi j ) have the same size, then ( A ⊕ B )i j = Ai j ⊕ Bi j . 42 Synchronization and Linearity A transition graph can of course be associated with the matrix C = A ⊕ B . This transition graph has the same set of nodes as the transition graph of A (and therefore of B ) and there exists a (nondummy) arc from node j to node i if and only if at least one of the transition graphs of A and B has a (nondummy) arc from j to i . This is a consequence of the axiom Ai j ⊕ ε = Ai j . In general, this arc receives the weight Ai j ⊕ Bi j . It may be viewed as the arc resulting from the merging of two parallel arcs (if both Ai j and Bi j are different from ε). The symbol ε is called the zero element of the operation ⊕. Two other axioms are that ⊕ is associative and commutative and they have obvious consequences for the parallel composition of transition graphs. The series composition ⊗ of matrices A and B is defined only when the number of columns of A equals the number of rows of B (say, A is m × n and B is n × p ) by the following rule: n ( A ⊗ B )i j = Aik ⊗ Bk j . (2.2) k =1 A transition graph can be associated with the matrix C = A ⊗ B of size m × p . With the help of Figure 2.5, we explain how this graph is obtained. First, the transition B p nodes A n nodes m nodes j k i Figure 2.5: The series composition of two transition matrices graphs of matrices B and A are concatenated. The graph so obtained is not a transition graph since it has n intermediate nodes in addition of its p input nodes and its m output nodes. These intermediate nodes are removed, and an arc from node j to node i exists in the transition graph of C if and only if there exists at least one (nondummy) path from node j to node i in the concatenated graph shown in Figure 2.5. This arc receives the weight indicated by Equation (2.2). In order to interpret this formula, let us first define the weight of a path ρ = (i1 , . . . , i p ) in a weighted graph as the product Ai p ,i p−1 ⊗ · · · ⊗ Ai2 ,i1 of weights of arcs composing this path (observe the order). This weight is denoted |ρ |w , where the subscript w refers to the word ‘weight’. Note that a dummy path always has a weight ε thanks to the absorbing property of ε in products. Then, each term of the sum in (2.2) can be interpreted as the weight of some parallel path of length 2 from node j to node i , characterized by the intermediate node k it passes through (see Figure 2.5). The previous rule pertaining to the weights of parallel compositions of arcs is thus extended to parallel paths. Since ε is the zero element for addition, dummy paths do not contribute to (nonzero) weights in parallel compositions. 2.3. Graphs and Matrices 43 For the series composition of matrices, we have ( A ⊗ B ) ⊗ C = A ⊗ ( B ⊗ C ). This associativity property of matrix multiplication is a direct consequence of the axioms given above, namely the associativity of ⊗ and the right and left distributivity of ⊗ over ⊕. It is easily seen that these right and left distributivities also hold for matrices of appropriate sizes. 
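In the max-plus interpretation of these axioms (⊕ = max, ⊗ = +, ε = −∞, e = 0, the structure used again in §2.3.2), both compositions take only a few lines of code. This is a minimal sketch with made-up matrices; the function names oplus and otimes are ours.

```python
# A minimal sketch of the parallel (⊕) and series (⊗) compositions of matrices
# under the max-plus reading of the axioms: ⊕ = max, ⊗ = +, eps = -inf, e = 0.
eps = float('-inf')

def oplus(A, B):
    """Entrywise A ⊕ B for matrices of the same size."""
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def otimes(A, B):
    """(A ⊗ B)_ij = ⊕_k A_ik ⊗ B_kj, i.e. max over k of (A_ik + B_kj)."""
    return [[max(A[i][k] + B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[2, eps], [0, 3]]
B = [[1, 4], [eps, 2]]
print(oplus(A, B))   # [[2, 4], [0, 3]]
print(otimes(A, B))  # [[3, 6], [1, 5]]
```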
The notion ( A p ) j i , p = 1, 2, . . . , where A is square, is defined by ( A p ) j i = ( A ⊗ p −1 A ) j i . Observe that this involves the enumeration and the sums of weights of all paths of length p with initial node i and final node j . This definition makes sense for both the transition graph of A, concatenated p times, and the precedence graph of A. We get ( Ap ) ji = Ai p ,i p−1 ⊗ Ai p−1 ,i p−2 ⊗ · · · ⊗ Ai1 ,i0 . {ρ | |ρ |l = p ;i0 =i ;i p = j } Because we removed the intermediate nodes in the graph of Figure 2.5 in order to obtain the transition graph of C = A ⊗ B , and similarly when one considers the transition graph of a matrix C equal to A p , the information that weights of the graph of C have been obtained as weights of paths of length larger than 1 (namely 2 and p , respectively) has been lost. In order to keep track of this information, one may introduce the notion of length of a transition graph. This length is an integer number associated with the transition graph and hence also with all its individual arcs, including dummy arcs, and finally with the matrix associated with this graph. These considerations explain why the lengths of arcs may be taken greater than 1 and why dummy arcs may also have a nonzero length. We now consider the transition graph corresponding to matrices A0 where A is an n × n matrix. In the same way as A p , for p ≥ 1, describes the weights of paths of length p , A0 should describe the weights of paths of length 0, that is, empty circuits (i ) corresponding to ‘no transition at all’. Pictorially, the corresponding transition graph of such matrices has the special form depicted in Figure 2.6a in which input e e e (a) (b) (c) Figure 2.6: The transition graphs of A0 , ε and e and output nodes are not distinguishable. This transition graph must not be confused either with that of the ‘zero matrix’ (with all entries equal to ε) or with that of the ‘identity matrix’ (with diagonal entries equal to e and all off-diagonal entries equal to ε). The transition graph of the zero matrix is depicted in Figure 2.6b: all its arcs are dummy with length 1. The transition graph of the identity matrix is depicted in Figure 2.6c: weights are indicated in the figure and the length is 1 for all dummy and nondummy arcs. The length associated with all entries of A0 is 0, that is, the series 44 Synchronization and Linearity composition of the transition graph of A0 with any other bipartite graph do not modify the lengths of paths. In the same way, we would like the weights not to be modified in that operation: this requires that the entries of A0 be the same as those of the identity matrix e. But again the respective lengths and transition graphs of A0 and e are different (see Figure 2.6a–c). Also, dummy loops (i, i ) of the transition graph of the zero matrix should not be confused with empty circuits (i ) (see Figure 2.6a–b) even if it is difficult to distinguish them with the help of precedence, rather than transition, graphs. In Chapter 1 we already met several examples of ⊕ and ⊗ operations which satisfy the axioms just stated. A particular example is that C equals R, e = 0, ε = −∞, ⊕ equals maximization and ⊗ equals addition. Note that the max-operation is idempotent, i.e. a ⊕ a = a , but this property is not (yet) assumed as an axiom. Remark 2.12 An example of operations which do not satisfy some of the axioms is one where ⊕ is addition and ⊗ is minimization. The axioms of distributivity are not satisfied. Indeed, min(5, 3 + 6) = min(5, 3) + min(5, 6) . 
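Returning to the interpretation given at the beginning of this subsection, the entry (A^p)_{ji} is the ⊕-sum of the weights of all paths of length p from node i to node j; in R_max this can be checked against a brute-force enumeration of paths. The matrix below is a made-up example, not one from the text, and the helper names are ours.

```python
# A sketch (max-plus reading): (A^p)_{ji} is the maximum weight over all paths
# of length p from node i to node j in the precedence graph of A.
from itertools import product

eps = float('-inf')

def otimes(A, B):
    return [[max(A[i][k] + B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def power(A, p):
    M = A
    for _ in range(p - 1):
        M = otimes(M, A)
    return M

# Made-up 3x3 example; A[i][j] is the weight of the arc j -> i (eps if absent).
A = [[eps, 2, eps], [3, eps, 1], [eps, 4, eps]]
p = 3
Ap = power(A, p)

n = len(A)
for i, j in product(range(n), repeat=2):
    best = eps
    for mid in product(range(n), repeat=p - 1):
        seq = (i,) + mid + (j,)                       # a candidate path i -> ... -> j
        w = sum(A[seq[l + 1]][seq[l]] for l in range(p))
        best = max(best, w)
    assert Ap[j][i] == best
print("each entry (A^p)_{ji} equals the maximum weight of length-p paths from i to j")
```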
As a consequence of this failure of distributivity, associativity of the matrix product does not hold either. If, for instance,

A = [ 1 3 ; 2 2 ] ,   B = [ 3 2 ; 1 5 ] ,   C = [ 4 2 ; 2 5 ]   (rows separated by semicolons),

then

( A B ) C = [ 4 6 ; 5 6 ]   ≠   [ 4 4 ; 4 4 ] = A ( B C ) .

A practical interpretation of a system with these operations is the calculation of the capacity of a network in which the arcs are pipes carrying a continuous flow. The capacity of a pipe is assumed to be proportional to its diameter. Then it is easily seen that the capacity of two pipes in parallel equals the sum of the two capacities, while the capacity of two pipes in series equals the minimum of their capacities. The reader should contemplate the physical consequences of the lack of associativity.

Definition 2.13 (Irreducibility) The (square) matrix A is called irreducible if no permutation matrix P exists such that the matrix Ã, defined by Ã = P A P' (P' being the transpose, hence also the inverse, of P), has an upper triangular block structure.

The reader should be aware of the fact that this definition is invariant with respect to the algebra used. Premultiplication of A by P and postmultiplication by P' simply amount to a renumbering of the nodes of the corresponding graph; hence different numberings of the nodes of the same graph lead to different A-matrices. In an upper triangular block structure, diagonal blocks with non-ε entries are allowed. If one also wants the diagonal blocks to have ε-entries only, one should speak of a strictly upper triangular block structure.

Theorem 2.14 A necessary and sufficient condition for the square matrix A to be irreducible is that its precedence graph be strongly connected.

Proof Suppose that A is such that, by an appropriate renumbering of the nodes, A has an upper triangular block structure. Call the diagonal blocks A_{d_1}, ..., A_{d_q}. If A_{d_q} has size n_q × n_q, then there are no paths from any of the (renumbered) nodes 1, ..., n − n_q to any of the nodes n − n_q + 1, ..., n. Hence this graph is not strongly connected. On the other hand, if the graph is not strongly connected, determine its m.s.c.s.'s G_i, i = 1, ..., q. These subgraphs form a partially ordered set. Number the individual nodes of V in such a way that if G_i R G_j, then the nodes of V_i have lower indices than those of V_j (R was defined in §2.2). With this numbering of the nodes, the corresponding matrix A will be upper block triangular.

Definition 2.15 (Aperiodicity) The square matrix A is aperiodic if there exists an integer N such that for all n ≥ N and for all i, j, (A^n)_{ij} ≠ ε.

Theorem 2.16 An irreducible matrix A such that A_{jj} ≠ ε for all j is aperiodic.

Proof From Theorem 2.14, the irreducibility assumption implies that for all i, j, there exists n such that (A^n)_{ij} ≠ ε. This, together with the assumption A_{jj} ≠ ε, in turn implies that (A^m)_{ij} ≠ ε for all m ≥ n. The assertion of the theorem follows immediately from this since the number of nodes is finite.

Definition 2.17 A digraph is called a tree if there exists a single node such that there is a unique path from this node to any other node.

[Figure 2.7: Weighted digraph consisting of two m.s.c.s.'s]

In order to determine whether A is irreducible, one can calculate A+ (see (1.18)). Matrix A is irreducible if and only if all entries of A+ are different from ε. This algorithm for determining whether A is irreducible can be simplified by considering only Boolean variables.
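The Boolean simplification just mentioned (and spelled out in the next paragraph) amounts to computing a transitive closure: mark each non-ε entry, close under composition of arcs, and check that every entry of the closure is marked. A small sketch, with made-up matrices and helper names of our own:

```python
# A sketch of the Boolean irreducibility test: A is irreducible iff every entry
# of the transitive closure of its precedence graph is marked (i.e. iff the
# matrix G+ of the next paragraph contains no eps entry).
eps = float('-inf')

def irreducible(A):
    n = len(A)
    plus = [[A[i][j] != eps for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                plus[i][j] = plus[i][j] or (plus[i][k] and plus[k][j])
    return all(all(row) for row in plus)

A_irr = [[eps, 2], [3, eps]]     # a two-node circuit: strongly connected
A_red = [[eps, 2], [eps, eps]]   # a single arc: not strongly connected
print(irreducible(A_irr), irreducible(A_red))   # True False
```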
Replace A by the adjacency matrix G of its precedence graph (see Definition 2.11), except that 0 and 1 are replaced by ε and e, respectively, in the present context. Then A is irreducible if and only if all entries of G + are identical to e. 46 Synchronization and Linearity As an example, consider the matrix given in (2.1). The matrix G + becomes eεeeεeε e e e e e e e e ε e e ε e ε e ε e e ε e ε . e e e e e e e e ε e e ε e ε eeeeeee (2.3) Hence A is not irreducible. In fact it follows directly from (2.3) that the nodes 1, 3, 4 and 6 form a m.s.c.s., as do the nodes 2, 5 and 7. If one rearranges the nodes and arcs of Figure 2.4, one obtains Figure 2.7, in which the weights of the arcs have been indicated, and the two m.s.c.s.’s are clearly visible. Figure 2.7 is identical to Figure 2.1 apart from the fact that weights are given. 2.3.2 Maximum Cycle Mean In this subsection the maximum cycle mean will be defined and some of its properties will be derived. The maximum cycle mean has a clear relation with eigenvalues of matrices within the context of the max algebra and with periodic regimes of systems described by linear equations, also within the same context. These relationships will be described in Chapter 3. Let G = (V , E ) be a weighted digraph with n nodes. The weights are real numbers here and are given by means of the n × n matrix A. As discussed before, the numerical value of Ai j equals the weight of the arc from node j to node i . If no such arc exists, then Ai j = ε. It is known from Chapter 1 that the entry (i, j ) of Ak = A ⊗ · · · ⊗ A, def considered within the algebraic structure Rmax = (R ∪ {−∞}, max , +), denotes the maximum weight with respect to all paths of length k which go from node j to node i . If no such path exists, then ( Ak )i j = ε. Within this algebraic structure, ε gets assigned the numerical value −∞ and e = 0. In this subsection we will confine ourselves to the algebraic structure Rmax . Definition 2.18 (Cycle mean) The mean weight of a path is defined as the sum of the weights of the individual arcs of this path, divided by the length of this path. If the path is denoted ρ , then the mean weight equals |ρ |w /|ρ |l. If such a path is a circuit one talks about the mean weight of the circuit, or simply the cycle mean. We are interested in the maximum of these cycle means, where the maximum is taken over all circuits in the graph (empty circuits are not considered here). Consider an n × n matrix A with corresponding precedence graph G = (V , E ). The maximum weight of all circuits of length j which pass through node i of G can be written as ( A j )ii . The maximum of these maximum weights over all nodes is n=1 ( A j )ii which i can be written trace( A j ). The average weight is obtained by dividing this number by j in the conventional sense, but this can be written (trace ( A j ))1/ j in the max-plus algebra notation. Finally, we have to take the maximum with respect to the length j . It is not necessary to consider lengths larger than the number n of nodes since it is enough to 2.3. Graphs and Matrices 47 limit ourselves to elementary circuits. It follows that a formula for the maximum cycle mean λ in the max-plus algebra notation is n (trace( A j ))1/ j . λ= j =1 The following theorem provides an expression, due to R. Karp, for this maximum value. All circuits with a cycle mean equal to the maximum cycle mean are called critical circuits. Karp’s theorem does not give the critical circuit(s). 
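The trace formula above can itself be turned into a (less efficient) computation of λ. The sketch below uses a made-up 3 × 3 matrix; the powers and traces are evaluated in R_max, while the final division by j is a conventional one, and the function names are ours.

```python
# A minimal sketch of λ = ⊕_{j=1..n} (trace(A^j))^{1/j}, i.e.
# max over j of (max_i (A^j)_{ii}) / j, with A^j taken in Rmax.
eps = float('-inf')

def otimes(A, B):
    return [[max(A[i][k] + B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def max_cycle_mean(A):
    n, best, M = len(A), eps, A
    for j in range(1, n + 1):
        trace = max(M[i][i] for i in range(n))     # ⊕-trace of A^j
        if trace != eps:
            best = max(best, trace / j)            # (trace)^{1/j} read in Rmax
        if j < n:
            M = otimes(M, A)
    return best

# Made-up example: a circuit through nodes 0 and 1 of mean (2+3)/2 = 2.5,
# and a loop at node 2 of mean 4.
A = [[eps, 2, eps], [3, eps, eps], [eps, eps, 4]]
print(max_cycle_mean(A))   # 4.0
```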
Theorem 2.19 (Karp's theorem) Given is an n × n matrix A with corresponding precedence graph G = (V, E). The maximum cycle mean is given by

λ = max_{i=1,...,n}  min_{k=0,...,n−1}  [ ( (A^n)_{ij} − (A^k)_{ij} ) / (n − k) ] ,   ∀ j .      (2.4)

In this equation, A^n and A^k are to be evaluated in R_max; the other operations are conventional ones.

Proof Note that the index j in (2.4) is arbitrary (it will be shown in this proof that one can take any j ∈ {1, ..., n}); the resulting value of λ is independent of j. Without loss of generality, we may assume that G is strongly connected. If it were not, we would consider each of its m.s.c.s.'s (there are only finitely many of them, since G is finite), determine the maximum cycle mean of each, and take the largest one.

We first assume that the maximum cycle mean is 0. Then it must be shown that

max_{i=1,...,n}  min_{k=0,...,n−1}  [ ( (A^n)_{ij} − (A^k)_{ij} ) / (n − k) ] = 0 .

Since λ = 0, there exists a circuit of weight 0 and there exists no circuit with positive weight. Because there are no circuits (or loops) with positive weight, there is a maximum weight of all paths from node j to node i, which is equal to

χ_{ij}  ≝  max  Σ_{l=1}^{k}  A_{i_l, i_{l−1}} ,   subject to i_0 = j and i_k = i ,

where the maximum is taken with respect to all paths (i_0, i_1, ..., i_k) and all k. Since for k ≥ n the path would contain a circuit, and since all circuits have nonpositive weight, we can restrict ourselves to k < n. Therefore we get

χ_{ij} = max_{k=0,...,n−1} (A^k)_{ij} .

Also, (A^n)_{ij} ≤ χ_{ij}, and hence

(A^n)_{ij} − χ_{ij} = min_{k=0,...,n−1} [ (A^n)_{ij} − (A^k)_{ij} ] ≤ 0 .

Equivalently,

min_{k=0,...,n−1}  [ ( (A^n)_{ij} − (A^k)_{ij} ) / (n − k) ]  ≤ 0 .      (2.5)

Equality in (2.5) will only hold if (A^n)_{ij} = χ_{ij}. It will be shown that indeed an index i exists such that this is true. Let ζ be a circuit of weight 0 and let l be a node of ζ. Let ρ_{lj} be a path from j to l with corresponding maximum weight |ρ_{lj}|_w = χ_{lj}. Now this path is extended by appending to it a number of repetitions of ζ such that the total length of this extended path, denoted ρ_e, becomes greater than or equal to n. This is again a path of maximum weight from j to l. Now consider the path consisting of the first n arcs of ρ_e; its initial node is j, and denote its final node l'. Of course l' ∈ ζ. Since any subpath of a path of maximum weight is itself of maximum weight, the path from j to l' is of maximum weight. Therefore (A^n)_{l'j} = χ_{l'j}. Now choose i = l' and we get

max_{i=1,...,n}  min_{k=0,...,n−1}  [ ( (A^n)_{ij} − (A^k)_{ij} ) / (n − k) ] = 0 .

This completes the part of the proof with λ = 0.

Now consider an arbitrary finite λ. A constant c is subtracted from each weight A_{ij}. Then clearly λ is reduced by c. Since (A^k)_{ij} is reduced by kc (and (A^n)_{ij} by nc), the quantity ((A^n)_{ij} − (A^k)_{ij})/(n − k) is reduced by c, for all i, j and k, and hence

max_{i=1,...,n}  min_{k=0,...,n−1}  [ ( (A^n)_{ij} − (A^k)_{ij} ) / (n − k) ]

is also reduced by c. Hence both sides of (2.4) are affected equally when all weights A_{ij} are reduced by the same amount. Now choose this amount such that λ becomes 0, and we are back in the previous situation where λ = 0.

2.3.3 The Cayley-Hamilton Theorem

The Cayley-Hamilton theorem states that, in conventional algebra, a square matrix satisfies its own characteristic equation. In mathematical terms, let A be an n × n matrix and let

p_A(x)  ≝  det(x I − A)  =  x^n + c_1 x^{n−1} + · · · + c_{n−1} x + c_n x^0 ,      (2.6)

where I is the identity matrix, be its characteristic polynomial. The term x^0 in the polynomial equals 1.
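As a computational aside before continuing with the Cayley-Hamilton theorem, formula (2.4) can be implemented almost literally. The sketch below (our own helper names) fixes j = 0, assumes the precedence graph is strongly connected as in the proof above, and skips terms with an ε entry since such terms cannot attain the minimum.

```python
# A sketch of Karp's formula (2.4): λ = max_i min_k ((A^n)_ij - (A^k)_ij)/(n-k),
# powers taken in Rmax, outer operations conventional, for one fixed column j.
eps = float('-inf')

def otimes(A, B):
    return [[max(A[i][k] + B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def karp(A, j=0):
    """Assumes the precedence graph of A is strongly connected;
    otherwise apply the function m.s.c.s. by m.s.c.s."""
    n = len(A)
    D = [[[0 if a == b else eps for b in range(n)] for a in range(n)]]  # A^0
    for _ in range(n):
        D.append(otimes(D[-1], A))                                      # A^1 .. A^n
    best = eps
    for i in range(n):
        if D[n][i][j] == eps:
            continue                            # no path of length n from j to i
        ratios = [(D[n][i][j] - D[k][i][j]) / (n - k)
                  for k in range(n) if D[k][i][j] != eps]
        best = max(best, min(ratios))
    return best

# Made-up strongly connected example: circuit (0,1) of mean 2.5
# and circuit (0,1,2) of mean (3+1+5)/3 = 3.
A = [[eps, 2, 5], [3, eps, eps], [eps, 1, eps]]
print(karp(A))   # 3.0
```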
Then p A ( A) = 0, where 0 is the zero matrix. The coefficients ci , i = 1, . . . , n , in (2.6) satisfy Ai1 ,i1 . . . Ai1 ,ik . . . . (2.7) det . ck = (−1)k . . i1 <i2 <···<ik Aik ,i1 ... Aik ,ik 2.3. Graphs and Matrices 49 The reason for studying the Cayley-Hamilton theorem is the following. In conventional system theory, this theorem is used for the manipulation of different system descriptions and for analyzing such properties as controllability (see [72]). In the context of discrete event systems, a utilization of this theorem will be shown in §9.2.2. In this section it will be shown that a Cayley-Hamilton theorem also exists in an algebraic structure defined by a set C of elements supplied with two operations denoted ⊕ and ⊗ which obey some of the axioms given in §2.3.1, namely • associativity of addition, • commutativity of addition, • associativity of multiplication, • both right and left distributivity of multiplication over addition, • existence of an identity element, provided we also have Commutativity of multiplication: ∀a , b ∈ C , a⊗b = b⊗a . Note that the existence of a zero element (and its absorbing property) is not required in this subsection. A partial permutation of {1, . . . , n } is a bijection ς of a subset of {1, . . . , n } onto itself. The domain of ς is denoted by dom(ς ) and its cardinality is denoted |ς |l . A partial permutation ς for which |ς |l = n is called a complete permutation. The completion ς of a partial permutation ς is defined by ς (i ) = ς (i ) i if i ∈ dom(ς ) , if i ∈ {1, . . . , n } \ dom(ς ) . The signature ∗ of a partial permutation ς , denoted sgn∗ (ς ), is defined by sgn∗ (ς ) = sgn(ς )(−1)|ς |l , where sgn(ς ), sometimes also written as (−1)ς , denotes the conventional signature of the permutation ς , see [61]. Every (partial) permutation has a unique representation as a set of disjoint circuits. For example, the permutation 1 4 2345 6351 6 2 has the circuit representation {(1, 4, 5), (3), (2, 6)} . 50 Synchronization and Linearity With the graph-theoretic interpretation of permutations in mind, these disjoint circuits correspond to m.s.c.s.’s. The unique partial permutation of cardinality 0 has the empty set as its circuit representation. If ς is a partial permutation with cardinality k consisting of a single circuit, then sgn∗ = (−1)k−1 (−1)k = −1. It easily follows that for any partial permutation ς , sgn∗ = (−1)r , where r is the number of circuits appearing in the circuit representation of ς . Given an n × n matrix A = ( Ai j ), the weight of ς is defined by |ς |w = Aς(i ),i . i ∈dom(ς) The weight of the partial permutation with cardinality 0 equals e, in accordance with the theory presented in §2.3.1. Let 1 ≤ i, j ≤ n and let T j+ be the set of all pairs (ς, ρ) i where ς is a partial permutation and where ρ is a path from i (the initial node) to j (the final node) in such a way that |ς |l + |ρ |l = n , sgn∗ (ς ) = 1 . The set T j− is defined identically except for the fact that the condition sgn∗ (ς ) = 1 is i replaced by sgn∗ (ς ) = −1. Lemma 2.20 For each pair ( j , i ), with 1 ≤ i, j ≤ n, there is a bijection η j i : T j+ → i T j− in such a way that η j i (ς, ρ) = (ς , ρ ) implies |ς |w ⊗ |ρ |w = |ς |w ⊗ |ρ |w . i Proof Each pair (ς, ρ) ∈ T j+ ∪ T j− is represented by a directed graph with nodes i i {1, . . . , n }. The set of arcs consists of two classes of arcs: E ς = {(i, ς (i )) | i ∈ dom(ς )} , Eρ = {(l , k ) | (l , k ) is an arc of ρ } . 
This graph will, in general, contain multiple arcs, since ρ may traverse the same arc more than once, or the same arc may appear in both E ς and Eρ . The expression |ς |w ⊗ |ρ |w is the series composition, with multiplicities taken into account, of all the Alk for which (k , l ) is an arc of the graph associated with (ς, ρ). Let ρ be the path connecting the nodes i0 , i1 , . . . , iq , in this order. There is a smallest integer v ≥ 0 such that either iu = iv for some u < v , or iv ∈ dom(ς ). If such a v did not exist, then ρ must have |ρ |l + 1 distinct nodes (because there are no u and v with iv = iu ). But then there are at least |ς |l + |ρ |l + 1 = n + 1 distinct nodes (because ς and ρ do not have any node in common), which is a contradiction. Furthermore, it is easily seen that this smallest integer v cannot have both properties. Hence either iu = iv for some u < v or iv ∈ dom(ς ). An example of the first property is given in Figure 2.8 and an example of the second property is given in Figure 2.9. The dashed arcs refer to the set E ς and the solid arcs refer to the set Eρ . In Figure 2.8, v equals 3, and in Figure 2.9, v equals 1 (i.e. i1 = 2). Consider the situation with the first property such as pictured in Figure 2.8. We have iu = iv for some u and v . The circuit passing through iu is removed from ρ and adjoined as a new circuit to ς . The new path from i0 to iq is denoted ρ and the new, longer, partial permutation is denoted ς . The mapping η j i is defined as η j i (ς, ρ) = 2.3. Graphs and Matrices 1 3 2 4 51 5 1 4 6 Figure 2.8: Example of a graph with the first property 1 4 5 5 3 6 Figure 2.9: Example of a graph with the second property 3 2 2 1 4 6 Figure 2.10: η31 applied to Figure 2.8 2 5 3 6 Figure 2.11: η31 applied to Figure 2.9 (ς , ρ ). Application of the mapping η31 to Figure 2.8 is given in Figure 2.10. Since the number of circuits of ς and of ς differ by one, we have sgn∗ (ς ) = −sgn∗ (ς ) and therefore η j i maps T j+ into T j− and vice versa. i i Consider next the situation with the second property such as pictured in Figure 2.9. The mapping η j i (ς, ρ) = (ς , ρ ) is obtained by removing the circuit containing iv from ς and adjoining it to ρ . Application of η31 to Figure 2.9 is given in Figure 2.11. Also in this situation, the numbers of circuits of ς and of ς differ by one, and again we have sgn∗ (ς ) = −sgn∗ (ς ) which results in the fact that η j i maps T j+ into T j− and i i vice versa. In both situations |ς |w ⊗ |ρ |w = |ς |w ⊗ | |w , since nothing has changed in the graph of (ς, ρ). It is in the derivation of this equality that the associativity and commutativity of multiplication, and the existence of an identity element, have been used. What remains to be shown is that the mapping η j i is surjective. For this reason consider the iteration η j i ◦η j i . It easily follows that this mapping is the identity on T j+ ∪ T j− , which is only possible if η j i is surjective. i i Definition 2.21 (Characteristic equation) The characteristic equation is given by p + (x ) = p − (x ) , A A where n p + (x ) A = and n p − (x ) = A k =0 |ς |l k =0 Aς(i ),i x n−k , =k ,sgn∗ (ς)=1 i ∈dom(ς) =k ,sgn∗ (ς)=−1 |ς |l (2.8) Aς(i ),i x n−k . i ∈dom(ς) 52 Synchronization and Linearity It is easily verified that this characteristic equation, if considered the conventional algebra, coincides with the equation obtained by setting the characteristic polynomial (2.6) equal to zero. 
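Definition 2.21 can also be checked mechanically: enumerate all partial permutations of {1, ..., n}, sort their weights into p+ or p− according to sgn∗ = (−1)^r (r being the number of circuits), and compare the two matrix polynomials evaluated at A; the theorem stated in the next paragraph asserts that they coincide. The sketch below (our own helper names, 0-based indices) does this for the 3 × 3 matrix used as a worked example further on, with entries read in R_max.

```python
# A brute-force check of p+(A) = p-(A) in Rmax (⊕ = max, ⊗ = +, eps = -inf, e = 0).
from itertools import combinations, permutations

eps = float('-inf')

def otimes(A, B):
    return [[max(A[i][k] + B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def num_circuits(domain, images):
    """Number of circuits of the partial permutation dom[l] -> images[l]."""
    perm, seen, r = dict(zip(domain, images)), set(), 0
    for s in domain:
        if s not in seen:
            r += 1
            i = s
            while i not in seen:
                seen.add(i)
                i = perm[i]
    return r

def char_coeffs(A):
    """Coefficients of x^(n-k) in p_A^+ and p_A^- (index k = 0..n)."""
    n = len(A)
    cp, cm = [eps] * (n + 1), [eps] * (n + 1)
    cp[0] = 0                                    # the empty partial permutation
    for k in range(1, n + 1):
        for dom in combinations(range(n), k):
            for img in permutations(dom):
                w = sum(A[img[l]][dom[l]] for l in range(k))     # ⊗-weight of ς
                if num_circuits(dom, img) % 2 == 0:              # sgn* = +1
                    cp[k] = max(cp[k], w)
                else:                                            # sgn* = -1
                    cm[k] = max(cm[k], w)
    return cp, cm

def evaluate(coeffs, A):
    """⊕_k coeffs[k] ⊗ A^(n-k), with A^0 the max-plus identity matrix."""
    n = len(A)
    powers = [[[0 if i == j else eps for j in range(n)] for i in range(n)]]
    for _ in range(n):
        powers.append(otimes(powers[-1], A))
    return [[max(coeffs[k] + powers[n - k][i][j] for k in range(n + 1))
             for j in range(n)] for i in range(n)]

A = [[1, 2, 3], [4, 1, eps], [0, 5, 3]]
cp, cm = char_coeffs(A)
print(cp, cm)      # coefficients of x^3 ⊕ 4x ⊕ 9 on one side, 3x^2 ⊕ 6x ⊕ 12 on the other
assert evaluate(cp, A) == evaluate(cm, A)
print("p+(A) = p-(A) holds entrywise")
```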
The crucial feature of (2.8) is that there are no terms with ‘negative’ coefficients (in contrast with the conventional characteristic equation, which can have negative coefficients). Since the inverse of ⊕ does not exist, the terms in (2.8) cannot freely be moved from one side to the other side of the equation. Theorem 2.22 (The Cayley-Hamilton theorem) The following identity holds true: p+ ( A) = p− ( A) . A A Proof For k = 0, . . . , n , ( An−k ) j i = |ρ |w . |ρ |l = n−k initial node of ρ equals i final node of ρ equals j It follows that p+ ( A) j i = A |ς |w ⊗ |ρ |w , p − ( A) j i = A (ς,ρ)∈T j+ i |ς |w ⊗ |ρ |w . (ς,ρ)∈T j− i Owing to Lemma 2.20, these two sums are identical. It is in these two equalities that the associativity and commutativity of addition, the distributivity and the existence of an identity element have been used. Let us give an example. For the 3 × 3 matrix A = ( Ai j ), the characteristic equation is p+ (x ) = p − (x ), where A A p + (x ) A = p − (x ) A = x 3 ⊕ ( A11 A22 ⊕ A11 A33 ⊕ A22 A33 )x ⊕ A13 A22 A31 ⊕ A12 A21 A33 ⊕ A11 A32 A23 , ( A11 ⊕ A22 ⊕ A33 )x 2 ⊕ ( A12 A21 ⊕ A13 A31 ⊕ A23 A32 )x ⊕ A11 A22 A33 ⊕ A12 A23 A31 ⊕ A21 A13 A32 , where, as usual, the ⊗-symbols have been omitted. If we consider 123 A= 4 1 ε e53 in the algebraic structure Rmax , then the characteristic equation becomes x 3 ⊕ 4 x ⊕ 9 = 3 x 2 ⊕ 6 x ⊕ 12 , which can be simplified to x 3 = 3 x 2 ⊕ 6 x ⊕ 12 , 2.4. Petri Nets 53 since the omitted terms are dominated by the corresponding terms at the other side of the equality. A simple calculation shows that if one substitutes A in the latter equation, one obtains an identity indeed: 12 ε ε 7 89 9 11 9 12 11 9 10 12 10 = 8 9 10 ⊕ 10 7 ε ⊕ ε 12 ε . ε ε 12 6 11 9 12 11 9 12 11 12 This section will be concluded with some remarks on minimal polynomial equations. The Cayley-Hamilton theorem shows that there exists at least one polynomial equation satisfied by a given n × n matrix A. This polynomial equation is of degree n . For example, if A is the 3 × 3 identity matrix, then p + (x ) = x 3 ⊕ x , A p − (x ) = x 2 ⊕ e , A and A satisfies the equation A3 ⊕ A = A2 ⊕ e. There may exist equations of lower degree also satisfied by A. With the previous A, we also have A = e and x = e is a polynomial equation of degree 1 also satisfied by the identity matrix. A slightly less trivial example is obtained for ε1ε A = 1 ε ε . εε1 The characteristic equation is x 3 ⊕ 3 = 1x 2 ⊕ 2x . It is easily seen that A satisfies both x 3 = 2 x and 3 = 1 x 2 . These equations have been obtained by a ‘partitioning’ of the characteristic equation; ‘adding’ these partitioned equations, one obtains the characteristic equation again. In this case, 3 = 1 x 2 is of degree 2. We may call a polynomial equation of least degree satisfied by a matrix, with the additional requirement that the coefficient of the highest power be equal to e, a minimal polynomial equation of this matrix. This is the counterpart of the notion of the minimal polynomial in conventional algebra and it is known that this minimal polynomial is a divisor of the characteristic polynomial [61]. In the present situation, it is not clear whether the minimal polynomial equation of a matrix is unique and how to extend the idea of division of polynomials to polynomial equations. In Chapter 3, a more detailed discussion on polynomials is given. 2.4 Petri Nets 2.4.1 Definition Petri nets are directed bipartite graphs. They are named after C.A. Petri, see [96]. 
The set of nodes V is partitioned into two disjoint subsets P and Q. The elements of P are called places and those of Q are called transitions. Places will be denoted 54 Synchronization and Linearity pi , i = 1, . . . , |P |, and transitions, q j , j = 1, . . . , |Q|. The directed arcs go from a place to a transition or vice versa. Since a Petri net is bipartite, there are no arcs from place to place or from transition to transition. In the graphical representation of Petri nets, places are drawn as circles and transitions as bars (the orientation of these bars can be anything). An example of a Petri net is given in Figure 2.12. Figure 2.12: A Petri net with sources and sinks In order to complete the formal definition of a Petri net, an initial marking must be introduced. The initial marking assigns a nonnegative integer µi to each place pi . It is said that pi is marked with µi initial tokens. Pictorially, µi dots (the tokens) are placed in the circle representing place pi . The components µi form the vector µ, called the initial marking of the Petri net. Definition 2.23 A Petri net is a pair (G , µ), where G = (V , E ) is a bipartite graph with a finite number of nodes (the set V ) which are partitioned into the disjoint sets P and Q; E consists of pairs of the form ( pi , q j ) and (q j , pi ), with pi ∈ P and q j ∈ Q; the initial marking µ is a |P |-vector of nonnegative integers. Notation 2.24 If pi ∈ π(q j ) (or equivalently ( pi , q j ) ∈ E ), then pi is an upstream place for q j . Downstream places are defined likewise. The following additional notation will also be used when we have to play with indices: if pi ∈ π(q j ), we write i ∈ π q ( j ), i = 1, . . . , |P |, j = 1, . . . , |Q|; similarly, if q j ∈ π( pi ), we write j ∈ π p (i ), with an analogous meaning for σ p or σ q . Roughly speaking, places represent conditions and transitions represent events. A transition (i.e. an event) has a certain number of input and output places representing the pre-conditions and the post-conditions of the event, respectively. The presence of a token in a place is interpreted as the condition associated with that place being fulfilled. In another interpretation, µi tokens are put into a place to indicate that µi data items or resources are available. If a token represents data, then a typical example of a transition is a computation step for which these data are needed as an input. In Figure 2.13, the Petri net of the production example in §1.2.3 is given. The tokens in this figure are located in such a way that machine M1 starts working on product P2 and M2 on P1 . Note that M1 cannot work on P3 in Figure 2.13. 2.4. Petri Nets 55 5 1 q1 1 q3 3 3 q4 3 2 q2 3 q5 3 2 4 5 4 q6 3 q7 Figure 2.13: Petri net of the manufacturing system of §1.2.3 Within the classical Petri-net setting, the marking of the Petri net is identified with the state. Changes occur according to the following rules: • A transition is said to be enabled if each upstream place contains at least one token. • A firing of an enabled transition removes one token from each of its upstream places and adds one token to each of its downstream places. Remark 2.25 The enabling rule given above is not the most general one. Sometimes integer valued ‘weights’ are attached to arcs. A transition is enabled if the upstream place contains at least the number of tokens given by the weight of the connecting arc. Similarly, after the firing of a transition, a downstream place receives the number of tokens given by the weight of the connecting arc. 
Instead of talking about such ‘weights’, one sometimes talks about multi-arcs; the weight equals the number of arcs between a transition and a place or between a place and a transition. In terms of ‘modeling power’, see [96] and [108] for a definition, this generalization is not more powerful than the rules which will be used here. The word ‘weight’ of an arc will be used in a different sense later on. For q j to be enabled, we need that µi ≥ 1 , ∀ pi ∈ π(q j ) . If the enabled transition q j fires, then a new marking µ is obtained with µi − 1 if pi ∈ π(q j ) , µi = µi + 1 if pi ∈ σ (q j ) , µi otherwise. 56 Synchronization and Linearity In case both pi ∈ π(q j ) and pi ∈ σ (q j ) for the same place pi , then µi = µi . In Figure 2.13, once M1 has completed its work on P2 and M2 its work on P1 , then Figure 2.14 is obtained. The next transitions that are now enabled are described by the 5 1 q1 1 q3 3 3 q4 3 2 q2 3 q5 3 2 4 5 4 q6 3 q7 Figure 2.14: The tokens after the firing of q1 and q3 combinations ( M1 , P3 ), ( M2 , P2 ) and ( M3 , P1 ). Note that in general the total amount of tokens in the net is not left invariant by the firing of a transition, although this does not happen in Figure 2.13. If we have a ‘join’-type transition (Figure 2.15), which is called an and-convergence, or a ‘fork’-type transition (Figure 2.16), called an anddivergence, then clearly the number of tokens changes after a firing has taken place. In Figure 2.15: And-convergence before and after firing Figure 2.16: And-divergence before and after firing the same vein, an or-convergence refers to two or more arcs entering one place, and an or-divergence refers to two or more arcs originating from one place. A transition without predecessor(s) is called a source transition or simply a source; it is enabled by the outside world. Similarly, a transition which does not have successor(s), is called a sink (or sink transition). Sink transitions deliver tokens to the outside world. In Figure 2.12 there are two transitions which are sources and there is one transition which is a sink. If there are no sources in the network, as in Figure 2.17, then we talk about an autonomous network. It is assumed that only transitions can be sources or sinks. This is no loss of generality, since one can always add a transition upstream 2.4. Petri Nets 57 or downstream of a place if necessary. A source transition is an input of the network, a sink transition is an output of the network. The structure of a place pi having two or more output transitions, as shown in Figure 2.18, is referred to as a conflict , since the transitions are competing for the Figure 2.17: An autonomous Petri net Figure 2.18: Part of a Petri net with a conflict token in the place. The transitions concerned will be said to be rivals. There are no general rules as to which transition should fire first. One says that a Petri net with such an or-divergence exhibits nondeterminism. Depending on the application, one can also talk about a choice (between the transitions) or a decision. The firing of an enabled transition will change the distribution of tokens. A sequence of firings will result in a sequence of markings. A marking µ is said to be reachable from a marking µ if there exists a sequence of enabled firings that transforms µ into µ. 
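The enabling rule, the firing rule and the notion of reachable marking just introduced are easy to animate. The sketch below encodes a small made-up net (not one of the book's figures): two transitions and four places, two of which are self-loop places whose marking never changes; the class and its method names are ours, not a standard API.

```python
# A minimal sketch of the firing rules and of the set of reachable markings.
from collections import deque

class PetriNet:
    def __init__(self, pre, post, mu0):
        self.pre = pre      # pre[j]  = upstream places of transition j
        self.post = post    # post[j] = downstream places of transition j
        self.mu0 = tuple(mu0)

    def enabled(self, mu, j):
        return all(mu[p] >= 1 for p in self.pre[j])

    def fire(self, mu, j):
        mu = list(mu)
        for p in self.pre[j]:
            mu[p] -= 1
        for p in self.post[j]:
            mu[p] += 1
        return tuple(mu)

    def reachable_markings(self):
        seen, queue = {self.mu0}, deque([self.mu0])
        while queue:
            mu = queue.popleft()
            for j in range(len(self.pre)):
                if self.enabled(mu, j):
                    nxt = self.fire(mu, j)
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
        return seen

# q0 consumes {p0, p2} and produces {p0, p3}; q1 consumes {p1, p3} and produces {p1, p2}.
net = PetriNet(pre=[[0, 2], [1, 3]], post=[[0, 3], [1, 2]], mu0=(1, 1, 1, 1))
print(sorted(net.reachable_markings()))
# [(1, 1, 0, 2), (1, 1, 1, 1), (1, 1, 2, 0)]; the search terminates because
# this particular net is bounded.
```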
Definition 2.26 (Reachability tree) The reachability tree of a Petri net (G , µ) is a tree with nodes in N|P | which is obtained as follows: the initial marking µ is a node of this tree; for each q enabled in µ, the marking µ obtained by firing q is a new node of the reachability tree; arcs connect nodes which are reachable from one another in one step; this process is applied recursively from each such µ. Definition 2.27 (Reachability graph) The reachability graph is obtained from the reachability tree by merging all nodes corresponding to the same marking into a single node. Take as an example the Petri net depicted in Figure 2.19. The initial marking is (1, 1, 1, 1). Both transitions are enabled. If q1 fires first, the next marking is (1, 1, 0, 2). If q2 fires instead, the marking becomes (1, 1, 2, 0). From (1, 1, 0, 2), only the initial marking can be reached immediately by firing q2 ; starting from (1, 1, 2, 0), only q1 can fire, which also leads to the initial marking. Thus it has been shown that there are three different markings in the reachability graph of (1, 1, 1, 1). 58 Synchronization and Linearity p3 p1 q1 (1,1,1,1) p2 q2 p4 (1,1,0,2) (1,1,2,0) (1,1,1,1) (1,1,1,1) (1,1,1,1) (1,1,0,2) (1,1,0,2) (1,1,2,0) (1,1,0,2) (1,1,2,0) (1,1,2,0) Figure 2.19: A Petri net with corresponding reachability graph and reachability tree Definition 2.28 For a Petri net with n transitions and m places, the incidence matrix G = (G i j ) is an n × m matrix of integers −1, 0 and +1. The entry G i j is defined by G i j = G out − G in , ij ij where G out = 1 (0) if there is an (no) arc from qi to p j and G in = 1 (0) if there is ij ij an (no) arc from p j to qi . Matrices G out and G in are defined as G out = (G out ) and ij G in = (G in ), respectively. ij Note that G does not uniquely define a Petri net since, if G ii = 0, a path including exactly one place around the transition qi is also possible. A circuit consisting of one transition and one place is called a loop in the context of Petri nets. If each place in the Petri net had only one upstream and one downstream transition, then the incidence matrix G would reduce to the well-known incidence matrix F introduced in Definition 2.10 by identifying each place p with the unique arc from π( p ) to σ ( p ). Transition q j is enabled if and only if a marking µ is given such that µ ≥ (G in ) e j , where e j = (0, . . . , 0, 1, 0, . . . , 0) , with the 1 being the j -th component. If this transition fires, then the next marking µ is given by µ = µ + G ej . A destination marking µ is reachable from µ if a firing sequence e j1 , . . . , e jd exists such that d µ=µ+G e jl . l =1 Hence a necessary condition for µ to be reachable from µ is that an n -vector x of nonnegative integers exists such that G x = µ−µ . (2.9) 2.4. Petri Nets 59 The existence of such a vector x is not a sufficient condition; for a counterexample see for instance [108]. The vector x does not reflect the order in which the firings take place. In the next subsection a necessary and sufficient condition for reachability will be given for a subclass of Petri nets. An integer solution x to (2.9), with its components not necessarily nonnegative, exists if and only if µ y = µ y , for any y that satisfies Gy = 0. The necessity of this statement easily follows if one takes the inner products of the left- and right-hand sides of (2.9) with respect to y . The sufficiency is easily shown if one assumes that x does not exist, i.e. 
rank[G ] <rank[G, µ −µ]—the notation [G, µ − µ] refers to the matrix consisting of G and the extra column µ − µ. Then a vector y exists with y G = 0 and y (µ − µ) = 0, which is a contradiction. 2.4.2 Subclasses and Properties of Petri Nets In this subsection we introduce some subclasses of Petri nets and analyze their basic properties. Not all of these properties are used later on; this subsection is also meant to give some background information on distinct features of Petri nets. The emphasis will be on event graphs. Definition 2.29 (Event graph) A Petri net is called an event graph if each place has exactly one upstream and one downstream transition. Definition 2.30 (State machine) A Petri net is called a state machine if each transition has exactly one upstream and one downstream place. Event graphs have neither or-divergences nor or-convergences. In event graphs each place together with its incoming and outgoing arcs can be interpreted as an arc itself, connecting the upstream and downstream transition, directly. In the literature, event graphs are sometimes also referred to as marked graphs or as decision free Petri nets. Figure 2.20 shows both a state machine which is not an Figure 2.20: A state machine and an event graph event graph and an event graph which is not a state machine. An event graph does not allow and cannot model conflicts; a token in a place can be consumed by only one predetermined transition. In an event graph several places can precede a given transition. It is said that event graphs can model synchronization. State machines do 60 Synchronization and Linearity not admit synchronization; however, they do allow competition. The number of tokens in an autonomous state machine never changes (Petri nets in which the number of tokens remains constant are called strictly conservative; a discussion follows later). It can be shown [108] that state machines are equivalent to the finite state machines or automata in theoretical computer science. Each automaton can be rephrased as a state machine Petri net. This shows that Petri nets have more modeling power than automata. Basic definitions and properties of Petri nets are now given. Definition 2.31 (Bounded and safe nets) A Petri net, with initial marking µ, is said to be k-bounded if the number of tokens in each place does not exceed a finite number k for any marking reachable from µ. Instead of 1-bounded (k = 1) Petri nets, one speaks of safe Petri nets. Concerning (practical) applications, it is important to know whether one deals with a bounded or safe Petri net, since one is then sure that there will be no overflows in the buffers or registers, no matter what the firing sequence will be. Definition 2.32 (Live net) A Petri net is said to be live for the initial marking µ if for each marking ν reachable from µ and for each transition q, there exists a marking o which is reachable from ν and such that q is enabled on o. A Petri net which is not live is called deadlocked. A Petri net is deadlocked if its reachability tree has a marking where a transition, or a set of transitions, can never fire whatever the firing sequences of the other transitions. For a live Petri net, whatever the finite initial sequence of firings, from that point onwards, any arbitrary transition can be fired an infinite number of times. An example of a live net is a state machine the underlying graph of which is strongly connected and the initial marking of which has at least one token. 
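For event graphs, several of the properties defined below reduce to simple checks on circuits. In particular (this is the content of Theorem 2.38, stated a few paragraphs further on), an autonomous event graph is live if and only if no circuit is free of tokens; equivalently, the subgraph formed by the token-free places must contain no circuit. A hedged sketch of that check, with an event graph described, as Definition 2.29 allows, by its list of places (upstream transition, downstream transition, initial tokens); the function name and the example are ours.

```python
# A sketch of the liveness test for autonomous event graphs: peel off the
# token-free subgraph; if everything can be removed, no token-free circuit exists.
def is_live(n_transitions, places):
    """places: list of (upstream_transition, downstream_transition, tokens)."""
    arcs = [(u, d) for (u, d, tokens) in places if tokens == 0]   # token-free places
    indeg = [0] * n_transitions
    for (u, d) in arcs:
        indeg[d] += 1
    stack = [q for q in range(n_transitions) if indeg[q] == 0]
    removed = 0
    while stack:
        q = stack.pop()
        removed += 1
        for (u, d) in arcs:
            if u == q:
                indeg[d] -= 1
                if indeg[d] == 0:
                    stack.append(d)
    return removed == n_transitions

# A circuit q0 -> q1 -> q0 plus a place from q1 to q2; live iff the circuit holds a token.
print(is_live(3, [(0, 1, 1), (1, 0, 0), (1, 2, 0)]))   # True
print(is_live(3, [(0, 1, 0), (1, 0, 0), (1, 2, 0)]))   # False: a token-free circuit
```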
Definition 2.33 (Consistent net) A Petri net is called consistent (weakly consistent) if there exists a firing sequence, characterized by the x-vector with positive (nonnegative) integers as components, such that G x = 0, where G is the incidence matrix. In a consistent Petri net we can choose a finite sequence of firings such that repeating this sequence results in a periodic behavior. Definition 2.34 (Synchronous net) A consistent net is called synchronous if the only solutions x of G x = 0 are of the form x = k (1, 1, . . . , 1) . Definition 2.35 (Strictly conservative net) A Petri net with initial marking µ is called strictly conservative if, for all reachable markings µ, we have pi ∈P µi = pi ∈P µi . Definition 2.36 (Conservative net) A Petri net with initial marking µ is called conservative if positive integers ci exist such that, for all reachable markings µ, we have pi ∈P ci µi = pi ∈P ci µi . Theorem 2.37 The number of tokens in any circuit of an event graph is constant. 2.4. Petri Nets 61 Proof If a transition is part of an elementary circuit, then exactly one of its incoming arcs and one of its outgoing arcs belong to the circuit. The firing of the transition removes one token from the upstream place connected to the incoming arc (it may remove tokens from other places as well, but they do not belong to the circuit) and it adds one token to the downstream place connected to the outgoing arc (it may add tokens to other downstream places as well, but they do not belong to the circuit). An event graph is not necessarily strictly conservative. Consider an andconvergence, where two circuits merge. The firing of this transition removes one token from each of the upstream places and adds one token to the only downstream place. At an and-divergence, where two circuits split up, the firing of the transition removes one token from the only upstream place and adds one token to each of the downstream places. Theorem 2.38 An autonomous event graph is live if and only if every circuit contains at least one token with respect to the initial marking. Proof Only if part: If there are no tokens in a circuit of the initial marking of an event graph, then this circuit will remain free of tokens and thus all transitions along this circuit never fire. If part: If a transition is never enabled by any firing sequence, then by backtracking token-free places, one can find a token-free circuit. Indeed, if in an event graph a transition never fires, there is at least one upstream transition that never fires also (this statement cannot be made for general Petri nets). This backtracking is only possible if each place has a transition as predecessor and each transition has at least one place as predecessor. This holds for autonomous event graphs. Thus the theorem has been proved. Theorem 2.39 For a connected event graph, with initial marking µ, a firing sequence can lead back to µ if and only if it fires every transition an equal number of times. Proof In a connected event graph all transitions are either and-divergences, andconvergences or they are simple, i.e. they have one upstream place as well as one downstream place. These categories may overlap one another. If an and-divergence is enabled and it fires, then the number of tokens in all downstream places is increased by one. 
In order to dispose of these extra tokens, the downstream transitions in each of these places must fire also (in fact, they must fire as many times as the originally enabled transition fired in order to keep the number of tokens of the places in between constant). If an and-convergence wants to fire, then the upstream transitions of its upstream places must fire first in order that the number of tokens of the places in between do not change. Lastly, if a transition is simple and it can fire, both the unique downstream transition and upstream transition must fire the same number of times in order that the number of tokens in the places in between do not change. The reasoning above only fails for loops. Since loops are connected to the event graph also and since a firing 62 Synchronization and Linearity of a transition in a loop does not change the number of tokens in the place in the loop, these loops can be disregarded in the above reasoning. This theorem states that the equation G x = 0 has only one positive independent solution x = (k , . . . , k ) . An immediate consequence is that every connected event graph is synchronous. Theorem 2.40 Consider an autonomous live event graph. It is safe if and only if the total number of tokens in each circuit equals one. Proof The if part of the proof is straightforward. Now consider the only if part. Assume that the graph is safe and that the total number of tokens in a circuit ζk , indicated by µ(ζk ), is not necessarily one. Consider all circuits (ζ1 , ζ2 , . . . , ζm ) passing through a place pi and its upstream transition t j . Bring as many tokens as possible to each of the upstream places of t j and subsequently fire t j as many times as possible. It can be seen that the maximum number of tokens that can be brought in pi is bounded from above by min{µ(ζ1 ), µ(ζ2 ), . . . , µ(ζm )}. In particular, if this minimum equals one, then this maximum number of tokens is less than or equal to one. Since the event graph is live, t j can be enabled, and therefore this maximum equals one. The following theorem is stated in [95]. Theorem 2.41 In a live event graph µ is reachable from µ if and only if µ y = µ y , for any y that satisfies Gy = 0. This last theorem sharpens the result mentioned at the end of §2.4.1, where the condition µ y = µ y was only a necessary condition. 2.5 Timed Event Graphs The original theory of Petri nets deals with the ordering of events, and questions pertaining to when events take place are not addressed. However, for questions related to performance evaluation (how fast can a network produce?) it is necessary to introduce time. This can be done in two basic ways by associating durations with either transition firings or with the sojourn of tokens in places. Durations associated with firing times can be used to represent production times in a manufacturing environment, where transitions represent machines, the length of a code in a computer science setting, etc. We adopt the following definition. Definition 2.42 (Firing time) The firing time of a transition is the time that elapses between the starting and the completion of the firing of the transition. We also adopt the additional convention that the tokens to be consumed by a transition remain in the preceding places during the firing time; they are called reserved tokens. Durations associated with places can be used to represent transportation or communication time. 
When a transition produces a token into a place, this token cannot immediately contribute to the enabling of the downstream transitions; it must first spend some holding time in that place, which actually represents the time it takes to transport this token from the initial transition to the place. 2.5. Timed Event Graphs 63 Definition 2.43 (Holding time) The holding time of a place is the time a token must spend in the place before contributing to the enabling of the downstream transitions. Observe that there is a basic asymmetry between both types of durations: firing times represent the actual time it takes to fire a transition, while holding times can be viewed as the minimal time tokens have to spend in places (indeed it is not because a specific token has completed its holding time in a place that it can immediately be consumed by some transition; it may be that no transition capable of consuming this token is enabled at this time). In practical situations, both types of durations may be present. However, as we shall see later on, if one deals with event graphs, one can disregard durations associated with transitions without loss of generality. Roughly speaking, a Petri net is said to be timed if such durations are given as new data associated with the network. A basic dichotomy arises depending on whether these durations are constant or variable. Throughout the book, only the dependence on the index of the firing (the firing of transition q of index k is the k -th to be initiated), or on the index of the token (the token of p of index k is the k -th token of p to contribute enabling σ ( p )) will be considered. The other possible dependences, like for instance the dependence on time, or on some possibly changing environment, will not be addressed. The timing of a Petri net will be said to be variable if the firing times of a transition depend on the index of the firing or if the holding times of tokens in a place depend on the index of the token. The timing is constant otherwise. In the constant case, the first, the k -th and (k + 1)-st firings of transition q take the same amount of time; this common firing time may however depend on q (see the examples below). Remark 2.44 With our definitions, nothing prevents a transition from having several ongoing firings (indeed, a transition does not have to wait for the completion of an ongoing firing in order to initiate a new firing). If one wants to prevent such a phenomenon, one may add an extra place associated with this transition. This extra place should have the transition under consideration as unique predecessor and successor, and one token in the initial marking, as indicated in Figure 2.21. The addition of this loop models a mechanism that will be called a recycling of the transition. Owing to this mechanism, the firings of the transition are properly serialized in the sense that its (k + 1)-st firing can only start after the completion of the k -th firing. In the rest of this chapter, unless otherwise specified, ⊕ will be maximization and ◦ ⊗ addition, so that ε = −∞ and e = 0. We will also use the symbol / to denote subtraction. 2.5.1 Simple Examples The global aim of the present section is to derive evolution equations for the variables x i (k ), i = 1, . . . , |Q|, k ≥ 0, not counting the sources and sinks, and where x i (k ) is defined as the epoch at which transition qi starts firing for the k -th time. 
Both constant 64 Synchronization and Linearity p1 p1 or q1 q1 p2 p2 Figure 2.21: Recycling of a transition and variable timings will be considered. For general Petri nets, such equations are difficult to derive. This problem will only be addressed in Chapter 9. For the moment, we shall confine ourselves to event graphs. We start with some simple examples with constant timing before addressing the general case. The issue of the initial condition will not be addressed in these simple examples either (see §2.5.2.1). The general rule (to which we will return in the next subsection) is that transitions start firing as soon as they are enabled. Example 2.45 (An autonomous timed event graph) The first example deals with the manufacturing system of §1.2.3, as depicted in Figure 1.5. The related Petri net, given in Figure 2.13, is the starting point for discussing the way the evolution equations are derived. The timing under consideration is limited to constant holding times on places, which are indicated in the figure. The firing times are all assumed to be 0. We have the following evolution equations: x 1 (k x 2 (k x 3 (k x 4 (k x 5 (k x 6 (k x 7 (k + 1) = 5 x 2 (k ) ⊕ 3 x 7 (k ) , + 1) = 1 x 1 (k + 1) ⊕ 3 x 5 (k ) , + 1) = 3 x 5 (k ) ⊕ 4 x 6 (k ) , + 1) = 1 x 1 (k + 1) ⊕ 3 x 3 (k + 1) , + 1) = 5 x 2 (k + 1) ⊕ 2 x 4 (k + 1) , + 1) = 3 x 3 (k + 1) ⊕ 3 x 7 (k ) , + 1) = 2 x 4 (k + 1) ⊕ 4 x 6 (k + 1) . In order to get this set of equations, one must first observe that owing to our assumption that holding times are constant, overtaking of tokens in places is not possible: the k -th token to enter a place will be the k -th token to leave that place (at least if the initial marking in that place is 0). If one uses this observation, the equation for x 6 (for instance) is obtained as follows: q6 is enabled for the (k + 1)-st time at the latest of the two epochs when the (k + 1)-st token to enter the place between q3 and q6 completes its holding time there, and when the k -th token to enter the place between q7 and q6 completes its holding time. The difference between the arguments k and k + 1 comes from the fact that the place between q7 and q6 has one token in the initial marking. If one now uses the definition of holding times, it is easily seen that the first of these two 2.5. Timed Event Graphs 65 epochs is x 3 (k + 1) + 3, while the second one is x 7 (k ) + 3, which concludes the proof. There are clearly some further problems to be addressed regarding the initial condition, but let us forget them for the moment. In matrix form this equation can be written as x (k + 1) = A0 x (k + 1) ⊕ A1 x (k ) , (2.10) where A0 = εε 1ε εε 1ε ε5 εε εε ε ε ε 3 ε 3 ε ε ε ε ε 2 ε 2 ε ε ε ε ε ε ε ε ε ε ε ε ε 4 ε ε ε ε ε ε ε , A1 = ε ε ε ε ε ε ε ε ε ε ε ε ε ε 5 ε ε ε ε ε ε ε ε ε ε ε ε ε εε 3ε 34 εε εε εε εε 3 ε ε ε ε 3 ε . The equation is written in a more convenient way as x (k + 1) = Ax (k ) , where A = A∗ A1 (see (1.22)), or, written out, 0 ε 5εε ε 6εε ε εεε 6εε A= ε ε 11 ε ε ε εεε ε 8εε ε ε 3 ε 34 67 89 67 10 11 (2.11) 3 4 ε 4 9 3 7 . Remark 2.46 Both equations (2.11) and (1.29) describe the evolution of the firing times, the first equation with respect to the state x , the second one with respect to the output y . Using the notation of §1.2.3, one can check that y (k + 1) = Cx (k + 1) = C Ax (k ) and that y (k + 1) = M y (k ) = MCx (k ) , where A is defined above, and where C A equals MC . 
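The passage from the implicit form (2.10) to the explicit form (2.11) can be reproduced numerically: read A0 and A1 off the scalar evolution equations listed at the beginning of this example, compute A0* (a finite ⊕-sum here, since the precedence graph of A0 contains no circuit), and multiply. The sketch below, with helper names of our own, rebuilds A and then iterates x(k + 1) = A ⊗ x(k) from an arbitrary initial vector.

```python
# A sketch of A = A0* ⊗ A1 for Example 2.45, in Rmax (⊕ = max, ⊗ = +, eps = -inf).
eps = float('-inf')

def otimes(A, B):
    return [[max(A[i][k] + B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def oplus(A, B):
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def star(A):
    """A* = e ⊕ A ⊕ A^2 ⊕ ...; truncating at A^(n-1) is enough when A is acyclic."""
    n = len(A)
    S = [[0 if i == j else eps for j in range(n)] for i in range(n)]
    P = [row[:] for row in S]
    for _ in range(n - 1):
        P = otimes(P, A)
        S = oplus(S, P)
    return S

def mat(entries, n=7):
    """Build an n x n Rmax matrix from 1-based (row, column, weight) triples."""
    M = [[eps] * n for _ in range(n)]
    for (i, j, w) in entries:
        M[i - 1][j - 1] = w
    return M

# Entries read directly from the seven scalar equations of Example 2.45.
A0 = mat([(2, 1, 1), (4, 1, 1), (4, 3, 3), (5, 2, 5), (5, 4, 2),
          (6, 3, 3), (7, 4, 2), (7, 6, 4)])
A1 = mat([(1, 2, 5), (1, 7, 3), (2, 5, 3), (3, 5, 3), (3, 6, 4), (6, 7, 3)])
A = otimes(star(A0), A1)
print(A[6])   # row 7 of A: [-inf, 8, -inf, -inf, 10, 11, 7]

x = [[0] for _ in range(7)]        # an arbitrary initial firing-time vector
for k in range(3):
    x = otimes(A, x)
print([row[0] for row in x])       # firing epochs after three iterations
```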
Conversely, it is easy to derive a Petri net from (2.11); such a net has 7 transitions (the dimension of the state vector) and 22 places (the number of entries in A which are not equal to ε). Each of these places has a token in the initial marking. The holding time associated with the place connecting transition j to transition i is given by the 66 Synchronization and Linearity u1 u2 3 1 x1 5 x2 4 3 4 x3 2 20 y Figure 2.22: Petri net of Example 2.47 appropriate Ai j entry. Thus at least two different Petri nets exist which both yield the same set of evolution equations. In this sense these Petri nets are equivalent. Example 2.47 (A nonautonomous timed event graph) The starting point for the next example is Figure 2.22, which coincides with Figure 2.12 with an initial marking added. The timing is again limited to places. The evolution equations are given by x (k + 1) = A0 x (k + 1) ⊕ A1 x (k ) ⊕ A2 x (k − 1) ⊕ B0 u (k + 1) ⊕ B1 u (k ) , y (k ) = C0 x (k ) ⊕ C1 x (k − 1) , where εε A0 = 3 ε 34 ε ε , ε ε A1 = ε ε 1 B0 = ε ε C0 = ( ε ε 4 ε ε ε ε , ε ε ε , ε ε B1 = ε ε 2), C1 = ε ε A2 = ε ε 1 ⊕ 4 8 y (k ) = ε ε ε e ε ε ε ε ε ε u (k + 1) ⊕ ε ε ε 2 x (k ) ⊕ ε e ε ε , 2 ε 5 , ε If one uses A∗ , this system can be written as 0 ε4ε ε x (k + 1) = ε 7 ε x (k ) ⊕ ε ε 11 ε ε ε ε ε ε . ε ε x (k − 1) 2 ε 5 u (k ) , 9 x (k − 1) . 2.5. Timed Event Graphs 67 This equation can be put into standard form by augmenting the state space: if one defines x (k ) = (x 1 (k ), x 1 (k − 1), x 2 (k ), x 2 (k − 1), x 3 (k ), x 3 (k − 1), u 1 (k ), u 2 (k )) , the system can be written as x (k + 1) y (k ) = = A x (k ) ⊕ B u (k + 1) , C x (k ) , (2.12) where A= ε e ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε 4 ε 7 e 11 ε ε ε ε ε ε ε ε e ε ε ε ε ε ε 2 ε ε ε ε ε ε ε ε ε ε ε ε ε 5 ε 9 ε ε ε , B= 1 ε 4 ε 8 ε e ε ε ε ε ε ε ε ε e , and C= ε ε ε e 2ε ε ε . Further simplifications are possible in this equation. Since x 1 is not observable (see [72]) and since it does not influence the dynamics of x 2 or x 3 either, it can be discarded from the state, as can u 1 . Thus a five-dimensional state vector suffices. Equation (2.12) is still not in the standard form since the argument of u is k + 1 instead of k . If one insists on the ‘precise’ standard form, it can be shown that (2.12) and x (k + 1) y (k ) = = A x (k ) ⊕ A B u (k ) , C x (k ) ⊕ C B u (k ) , (2.13) are identical in the sense that their γ -transforms (see Chapter 1) are identical. This is left as an exercise to the reader. The latter equation does have the standard form, though there is a direct throughput term (the input u (k ) has a direct influence on the output y (k )). Example 2.48 (Discrete analogue of the system of §1.2.7) The purpose of this example is to play again with the ‘counter’ description already alluded to in Chapter 1, and to show that discrete event systems may obey similar equations as some continuous systems, up to the problem of ‘quantization’. Figure 2.23 (left-hand side) represents a simple event graph: the three transitions are labelled u , x and y , holding times of places are indicated by numbers (firing times are all zero) and an initial marking is shown. With each transition, e.g. x , is associated a function of time t having the same name, e.g. t → x (t ), with the following meaning: x (t ) represents the number of firings 68 Synchronization and Linearity u 2 x 2 0 y Figure 2.23: An event graph and its continuous analogue of transition x up to time t , and it is assumed that x (t ) = e for t < 0. 
The following equations are obtained in the min-plus algebra: x (t ) = 1 x (t − 2) ⊕ u (t − 2) ; y (t ) = 3 x (t ) , where the delays in time originate from the holding times and where the coefficients are connected with the initial number of tokens in the places. By successive substitutions, one obtains: y (t ) = = . . . 3u (t − 2) ⊕ 4 x (t − 2) 3u (t − 2) ⊕ 4u (t − 4) ⊕ 5 x (t − 4) = h (τ )u (t − τ ) , 2≤τ ≤t +2 τ even where h is the function defined by (1.38). Observe that the min-summation can be limited to t + 2 because u (−1) = u (−2) = · · · = x (−1) = x (−2) = · · · = e and the coefficient of x (t ) is larger than that of u (t ). Indeed, for the same reason, the minsummation can be extended to τ < +∞, and also to −∞ < τ because h (τ ) remains equal to 3 for values of τ below 2, whereas u (t − τ ) is nonincreasing with τ . Finally, one obtains that y (t ) = h (τ )u (t − τ ) , −∞<τ<+∞ τ even which compares with (1.39), except that now τ ranges in 2Z instead of R. The right-hand side of Figure 2.23 suggests the correspondence of continuous elements (extracted from Figure 1.13) with their discrete counterparts. Recalling the mixing operation explained in the last paragraph of §1.2.7, we see that the discrete analogous operation consists here in ‘synchronizing’ two event graphs similar to that of Figure 2.23 by a join at their output transition y . 2.5.2 The Basic Autonomous Equation The event graphs of this section are assumed to be autonomous. The nonautonomous case will be considered in §2.5.5. We now turn to the derivation of a set of evolution 2.5. Timed Event Graphs 69 equations for event graphs with variable timing, under the general assumptions that • the transitions start firing as soon as they are enabled; • the tokens of a place start enabling the transitions downstream as soon as they have completed their holding times. The problems that arise with variable timing are slightly more complex than in the preceding examples. The main reason for this lies in the fact that tokens can then overtake one another when traversing places or transitions. As we will see in Chapter 9, this precludes a simple ordering of events to hold, and event graphs with overtaking lose the nice linearity property emphasized in the preceding examples. The discussion will hence be limited to the case of event graphs with First In First Out (FIFO) places and transitions, where the ordering of events preserves linearity. 2.5.2.1 Initial Condition We start with a first discussion of the initial condition which is here understood as a set of initial delays attached to the tokens of the initial marking in a way that generalizes what is often done in queuing theory. This topic will be revisited in §5.4.4.1 and §5.4.4.2, and further considered from a system-theoretic point of view in §5.4.4.3. Assume that one starts looking at the system evolution at time t = 0, and that the piecewise constant function Ni (t ) describing the evolution of the number of tokens present in pi , i = 1, . . . , |P |, at time t ∈ R, is right continuous. Let Ni (0) = µi , where µi denotes the initial marking in place pi . The general idea behind the initial condition is as follows: the Ni (0) ( = µi ) tokens visible at time t = 0 in pi are assumed to have entered pi before time 0; at time 0, each token is completing its holding time or it is being consumed by the transition (namely it is a reserved token), or it is ready to be consumed. 
We can equivalently define the initial condition through the entrance times of the initial tokens, or through the vector of R-valued lag times, where Definition 2.49 (Lag time) The lag time of a token of the initial marking of pi is the epoch when this token starts contributing to enabling σ ( pi ). However, these lag times should be compatible with the general rules that transitions fire as soon as they are enabled and that tokens start enabling the transition downstream as soon as they have completed their holding times. For instance • if the lag time of an initial token exceeds its holding time, this token cannot have entered the place before time 0; • if the lag times (which are possibly negative) are such that one of the transitions completes firing and consumes tokens of the initial marking before t = 0, these tokens cannot be part of the marking seen at time 0 since they must have left before time 0. Definition 2.50 (Weakly compatible initial condition) The initial condition of a timed event graph consists of an initial marking and a vector of lag times. This initial condition is weakly compatible if 70 Synchronization and Linearity 1. the lag time of each initial token does not exceed its holding time; 2. the first epoch when a transition completes firing is nonnegative. 2.5.2.2 FIFO Places and Transitions A basic assumption that will be made throughout the chapter is that both places and transitions are First In First Out (FIFO) channels. Definition 2.51 (FIFO place) A place pi is FIFO if the k-th token to enter this place is also the k-th which becomes available in this place. In view of the interpretation of holding times as communication or transportation times, this definition just means that the transportation or communication medium is overtake free. For instance, a place with constant holding times is FIFO. Definition 2.52 (FIFO transition) A transition q j is FIFO if the k-th firing of q j to start is also the k-th to complete. The interpretation is that tokens cannot overtake one another because of the firing mechanism, namely the tokens produced by the (k + 1)-st firing of q j to be initiated cannot enter the places of σ (q j ) earlier than those of the k -th firing. For instance, a transition with constant firing times is always FIFO. If a transition is recycled, its (k + 1)-st firing cannot start before the completion of the k -th one, so that a recycled transition is necessarily FIFO, regardless of the firing times. Definition 2.53 (FIFO event graph) An event graph is FIFO if all its places and transitions are FIFO. A typical example of a FIFO timed event graph is that of a system with constant holding times and recycled transitions with possibly variable firing times. An event graph with constant holding and firing times is always FIFO, even if its transitions are not recycled. Since the FIFO property is essential in order to establish the evolution equations of the present section, it is important to keep in mind that: The classes of timed event graphs considered throughout the book are those with 1. constant firing and holding times; 2. constant holding times and variable firing times, provided all transitions are recycled. 2.5.2.3 Numbering of Events The following way of numbering the tokens that traverse a place and the firings of a transition will be adopted. 2.5. 
Timed Event Graphs 71 By convention, the k -th token, k ≥ 1, of place pi is the k -th token to contribute enabling σ ( pi ) during the evolution of the event graph, including the tokens of the initial marking. The k -th firing, k ≥ 1, of transition qi is the k -th firing of qi to be initiated, including the firings that consume initial tokens. It may happen that two tokens in pi contribute enabling σ ( pi ) at the same epoch, or that two firings of a transition are initiated at the same time (when the transition is not recycled). In this case, some ordering of these simultaneous events is chosen, keeping in mind that it should be compatible with the FIFO assumptions. 2.5.2.4 Dynamics In what follows, the sequences of holding times αi (k ), i = 1, . . . , |P |, k ∈ Z, and of firing times β j (k ), j = 1, . . . , |Q|, k ∈ Z, are assumed to be given nonnegative and finite real numbers. Initially, only the restriction of these sequences to k ≥ 1 will be needed. However, we assume that these sequences can be continued to k ≤ 0. Such a continuation is clear in the case of a constant timing, and we will see in due time how to define the continuation in more general circumstances (see §2.5.7). We are now in a position to define the dynamics of the event graph more formally. • The k -th token of place pi incurs the holding time αi (k ). • Once the k -th firing of transition q j is enabled, the time for q j to complete its k -th firing is the firing time β j (k ). When this firing is completed, the reserved token is removed from each of the places of π(q j ), and each place of σ (q j ) receives one token. We now state a few basic properties of the numbering in a FIFO event graph with a weakly compatible initial condition. For i such that µi ≥ 1, denote wi (1) ≤ wi (2) ≤ · · · ≤ wi (µi ) ∈ R, the lag times of the initial tokens of place pi ordered in a nondecreasing way. Lemma 2.54 If the initial condition is weakly compatible and if the timed event graph is FIFO, then for all i and for all k such that 1 ≤ k ≤ µi and µi ≥ 1, the initial token with lag time wi (k ) is also the k-th token of place pi (that is the k-th token to enable σ ( pi )). Proof If this last property does not hold for some place pi , then a token which does not belong to the initial marking of pi , and which hence enters pi after time 0 (the initial condition is weakly compatible), contributes to enabling σ ( pi ) before one of the tokens of the initial marking does. Since the tokens of the initial marking enter pi before time 0 (the initial condition is weakly compatible), this contradicts the assumption that pi is FIFO. Lemma 2.55 The firing of q j that consumes the k-th token of pi (for all pi ∈ π(q j )) is the k-th firing of qi . 72 Synchronization and Linearity Proof Owing to the numbering convention, the set of k -th tokens of pi ∈ π(q j ) enables q j before the set of (k + 1)-st tokens. Lemma 2.56 The completion of the k-th firing of q j , k ≥ 1, produces the (k + µi )-th token of pi , for all pi ∈ σ (q j ). Proof The FIFO assumptions on transitions imply that the completion of the k -th firing of q j produces the k -th token to enter the places that follow pi . The property follows immediately from the FIFO assumption on places and from Lemma 2.54. 2.5.2.5 Evolution Equations Definition 2.57 (State variables, daters) The state variable x j (k ), j = 1, . . . 
, |Q|, k ≥ 1, of the event graph is the epoch when transition q j starts firing for the k-th time, with the convention that for all qi , x i (k ) = ∞ if qi fires less than k times. These state variables will be called daters. These state variables are continued to negative values of k by the relation x j (k ) = ε, for all k ≤ 0. Let M= max µi . i =1,... ,|P | (2.14) In what follows, we will adopt the convention that the ⊕-sum over an empty set is ε. Define the |Q| × |Q| matrices A(k , k ), A(k , k − 1), . . . , A(k , k − M ), by def A jl (k , k − m ) = αi (k ) ⊗ βl (k − m ) , (2.15) {i ∈π q ( j )|π p (i )=l ,µi =m } and the |Q|-dimensional vector v(k ), k = 1, . . . , M , by def v j (k ) = wi (k ) . (2.16) {i ∈π q ( j )|µi ≥k } Theorem 2.58 For a timed event graph with recycled transitions, the state vector x (k ) = (x j (k )) satisfies the evolution equations: x (k ) = A(k , k ) x (k ) ⊕ A(k , k − 1)x (k − 1) ⊕ · · · ⊕ A(k , k − M )x (k − M ) , k = M + 1, M + 2, . . . , (2.17) with the initial conditions x (k ) def = A(k , k ) x (k ) ⊕ · · · ⊕ A(k , k − M )x (k − M ) ⊕ v(k ) , k = 1, 2, . . . , M , where x j (k ) = ε for all k ≤ 0. (2.18) 2.5. Timed Event Graphs 73 Proof We first prove that the variables x j (k ), j = 1, . . . , |Q|, satisfy the evolution equations: x j (k ) = αi (k ) ⊗ βπ p (i ) (k − µi ) ⊗ x π p (i ) (k − µi ) {i ∈π q ( j )|k >µi } ⊕ wi (k ) , k = 1, 2, . . . . (2.19) {i ∈π q ( j )|k ≤µi } The k -th firing, k ≥ 1, of transition q j starts as soon as, for all i ∈ π q ( j ), the k -th token of pi contributes to enabling q j . In view of Lemmas 2.55 and 2.56, for k > µi , this k -th token is produced by the (k − µi )-th firing of transition π( pi ), so that the epoch when this token contributes enabling σ ( pi ) is αi (k ) ⊗ βπ p (i ) (k − µi ) ⊗ x π p (i ) (k − µi ). For k ≤ µi , this event takes place at time wi (k ), in view of Lemma 2.54, which completes the proof of (2.19). We now use associativity and commutativity of ⊕, together with our convention on ⊕-sums over empty sets, to rewrite x j (k ), k > M , as M |Q| αi (k ) ⊗ βl (k − m ) ⊗ xl (k − m ) . m =0 l =1 {i ∈π q ( j )|π p (i )=l , µi =m } The distributivity of ⊗ with respect to ⊕ implies in turn M |Q| x j (k ) = m =0 l =1 αi (k ) ⊗ βl (k − m ) ⊗ xl (k − m ) , {i ∈π q ( j )|π p (i )=l , µi = m } which completes the proof of (2.17), in view of the definition of A. The proof of (2.18) follows the same lines (using the continuation of the functions x j (k ) to ε for k ≤ 0). Remark 2.59 Owing to the dynamics, the first transition to complete its firing is necessarily within the set of transitions q j having at least one token in the initial marking of pi for all pi ∈ π(q j ). Since the set of tokens with the smallest lag times is the first to be consumed, the second weak compatibility condition in Definition 2.50 can be translated into the requirement that β j (1) ⊗ v j (1) ≥ e , (2.20) for all j such that µi ≥ 1 ∀ pi ∈ π(q j ), which can be seen as a first set of linear constraints on the lag times in view of (2.16). Similarly, the first weak compatibility relation is translated into the following additional set of linear constraints: wi (k ) ≤ αi (k ) , i = 1 , . . . , |P | , 1 ≤ k ≤ µi . (2.21) For instance, if wi (k ) = αi (k ) for all i = 1, . . . , |P |, 1 ≤ k ≤ µi , then the initial condition is weakly compatible, provided the condition wi (1) ≤ wi (2) ≤ . . . ≤ wi (µi ) is satisfied. 
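A small sketch of how the matrices A(k, k - m) of (2.15) and the vectors v(k) of (2.16) could be assembled mechanically from a place-oriented description of the event graph. The data layout assumed here (each place given as a triple (l, j, mu): upstream transition l, downstream transition j, initial marking mu_i, together with parallel lists of holding-time, firing-time and lag-time sequences) is ours and chosen only for illustration.

EPS = float('-inf')   # epsilon; the ⊕-sum over an empty set stays at this value

def build_A(places, alpha, beta, nQ, k, m):
    # A_jl(k, k-m) = ⊕ over {i in π^q(j) : π^p(i) = l, µ_i = m}
    #                of α_i(k) ⊗ β_l(k-m)                       (cf. (2.15))
    A = [[EPS] * nQ for _ in range(nQ)]
    for i, (l, j, mu) in enumerate(places):
        if mu == m:
            A[j][l] = max(A[j][l], alpha[i](k) + beta[l](k - m))
    return A

def build_v(places, w, nQ, k):
    # v_j(k) = ⊕ over {i in π^q(j) : µ_i >= k} of w_i(k)        (cf. (2.16))
    v = [EPS] * nQ
    for i, (l, j, mu) in enumerate(places):
        if mu >= k:
            v[j] = max(v[j], w[i](k))
    return v

Constant timings would simply be passed as constant functions, e.g. alpha = [lambda k, a=a: a for a in holding_times].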
74 Synchronization and Linearity α 14 ( k ) α 21 ( k ) q1 α 42 ( k ) q4 q2 α 31 ( k ) q3 α 43 ( k ) α 11 ( k ) α 44 ( k ) Figure 2.24: Event graph of Example 2.60 Example 2.60 Consider the timed event graph of Figure 2.24. The firing times are assumed to be equal to e. The place connecting q j to ql is denoted pl j . The holding times in this place are denoted αl j (k ), and the lag times of the initial tokens wl j (k ). In order to have a FIFO event graph, places should be overtake free. This will always be true for p11 and p22 , regardless of the holding time sequences in these places (since there is always at most one token each of these places). A simple sufficient condition ensuring that the other places are overtake free is that the associated holding time sequences are non decreasing in k (for instance constant). Under the assumption that the event graph is FIFO, the matrices and vectors involved in the evolution equations are ε ε ε ε α11 (k ) ε ε α14 (k ) ε ε ε ε ε , A(k , k − 1) = α21 (k ) ε ε , A(k , k ) = ε ε εε ε ε ε ε ε ε ε α44 (k ) ε α42 (k ) α43 (k ) ε ε ε A(k , k − 2) = α31 (k ) ε and w11 (1) ⊕ w14 (1) w21(1) , v(1) = w31(1) w44(1) ε ε ε ε ε ε ε ε ε ε , ε ε ε ε v(2) = w31 (2) . ε The constraints (2.21) and (2.20) are translated into the bounds wl j (k ) ≤ αl j (k ), and w11 (1) ⊕ w14 (1) ≥ e, w21 (1) ≥ e, w31 (1) ≥ e. 2.5.2.6 Simplifications Firing Times The evolution equations (2.19) are unchanged if one sets all the firing times equal to e and if αi (k ) receives the value αi (k ) ⊗ βπ p (i ) (k − µi ). Thus, one can 2.5. Timed Event Graphs 75 always modify the holding times in order to get an ‘equivalent’ event graph with firing times of duration e, where equivalence means that the epochs when transitions fire are the same in both systems. It may be assumed without loss of generality that the firing times are equal to e = 0. Observe that under this assumption the state variable x j (k ) is also the epoch when transition q j completes its k -th firing. Graphically, this is exemplified in Figure 2.25. If a firing time were assigned to a transition within an event graph, then this time can always be assigned to (i.e. added to) the holding times of all the upstream places. Consider Figure 2.25a, in which the transition has a firing time of 5 time units. The 3 4 9 8 5 4 3 0 6 4 3 4 3 0 11 6 5 0 5 6 (a) (b) (c) (d) 6 (e) Figure 2.25: Firing times set to e holding times of the places are 3, 4 and 6 as indicated in this figure. Figure 2.25b shows assignment of the firing time to the places. The firing of the transition in Figure 2.25b, which is instantaneous, corresponds to the completion of the firing of the transition in Figure 2.25a. Another, similar, solution is provided in Figure 2.25c, where the holding time has been assigned to all the downstream places. The firing of the transition in this figure corresponds to the initiation of the firing of the transition in Figure 2.25a. A different solution is provided in Figure 2.25d. In this figure, both the beginning and completion of the firing are now explicitly represented. In Figure 2.25e, the holding time at the transitions is also 0, but in contrast to the previous solutions the transitions cannot fire twice (or more times) within 5 time units. The transitions cannot be engaged in (partly) parallel activities. In what follows, we shall therefore often assume that the firing times are zero. The practical implication of this mathematical simplification is clear in the constant firing and holding time case. 
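Before turning to the variable case, here is a small sketch of this absorption of firing times into the holding times of the upstream places, i.e. the substitution of alpha_i(k) by alpha_i(k) ⊗ beta_{pi^p(i)}(k - mu_i) mentioned above. The place-oriented data layout is the same assumed one as in the earlier sketches and is not taken from the book.

def absorb_firing_times(places, alpha, beta):
    # returns modified holding-time sequences
    #   α'_i(k) = α_i(k) + β_{π^p(i)}(k - µ_i);
    # afterwards all firing times may be taken equal to e = 0
    new_alpha = []
    for i, (l, j, mu) in enumerate(places):
        new_alpha.append(lambda k, i=i, l=l, mu=mu: alpha[i](k) + beta[l](k - mu))
    return new_alpha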
In the variable case, one should however always keep in mind that the only meaningful initial situation is that of variable firing times (on recycled transitions) and constant holding times. Initial Condition The end of this subsection is devoted to a discussion of the initial condition. It is shown that Equation (2.18), defining the initial condition, can be further simplified whenever the lag times satisfy certain additional and natural constraints. 76 Synchronization and Linearity For all i such that µi > 0, denote by yi (k ), k ≤ 0, the entrance time function associated with place pi , defined by the relation def yi (k − µi ) = ◦ wi (k )/αi (k ) ε if 1 ≤ k ≤ µi ; if k > µi , (2.22) ◦ where we recall that / denotes conventional subtraction. The initial condition is said to be compatible if it is wealky compatible and if for any pair of places pi and p j which follow the same transition, the entrance times yi (k ) and y j (k ) coincide provided k ≥ min(µi , µ j ). Definition 2.61 (Compatible initial condition) The initial condition is compatible if it is weakly compatible and if there exist functions z j (k ), j = 1, . . . , |Q|, k ≤ 0, such that yi (k ) = z π p (i ) (k ) , ∀i, k such that − µi + 1 ≤ k ≤ 0 . (2.23) This condition is quite natural, should the initial condition result from a past evolution of the event graph: for instance, the last tokens of the initial marking to enter two places pi and pi that follow the same transition q j , have then been produced at the same time z j (0) by a firing of q j . Let def M j = max (µi ) . q i ∈σ ( j ) Observe that the function z j (k ) is only defined through (2.23) for − M j < k ≤ 0, provided M j ≥ 1. For other values of k , or if M j = 0, we take z j (k ) = ε. Instead of the former continuation of x (k ) to k ≤ 0 (which consisted in taking x j (k ) = ε for k ≤ 0), we now take x j (k ) = z j (k ) , ∀k ≤ 0 , j = 1 , . . . , | Q| . (2.24) Corollary 2.62 For a FIFO timed event graph with a compatible initial condition, the state vector x (k ) = (x j (k )) satisfies the evolution equations: x (k ) = A(k , k ) x (k ) ⊕ A(k , k − 1)x (k − 1) ⊕ · · · ⊕ A(k , k − M )x (k − M ) , k = 1, 2, . . . , (2.25) provided the continuation of x (k ) to negative values of k is the one defined by (2.24). Proof By successively using (2.22) and (2.23), one gets wi (k ) = {i ∈π q ( j )|k ≤µi } αi (k ) ⊗ yi (k − µi ) {i ∈π q ( j )|k ≤µi } = αi (k ) ⊗ z π p (i ) (k − µi ) , {i ∈π q ( j )|k ≤µi } 2.5. Timed Event Graphs 77 for all k = 1, 2, . . . , so that one can rewrite (2.19) as x j (k ) = x π p (i ) (k − µi ) ⊗ αi (k ) , k = 1, 2, . . . , (2.26) {i ∈π q ( j )} when using the continuation of x proposed in (2.24). Equation (2.25) follows immediately from (2.26). Remark 2.63 A simple example of compatible initial condition is obtained when choosing wi (k ) = αi (k ) for all 1 ≤ k ≤ µi . Practically speaking, this means that all the initial tokens enter at time 0. This corresponds to the continuation x j (k ) = e ε if − M j < k ≤ 0 ; if k ≤ M j . (2.27) Example 2.64 (Example 2.60 continued) If the initial condition is compatible, let z 1 (0) = ◦ ◦ ◦ w11(1)/α11 (1) = w21 (1)/α21 (1) = w31(2)/α31 (2) , z 1 (−1) def = ◦ w31(1)/α31 (1) , z 4 (0) Define def def ◦ ◦ w14(1)/α14 (1) = w44 (1)/α44 (1) . = z 1 (0) ε x (0) = ε , z 4 (0) z 1 (−1) ε . x (−1) = ε ε It is easily checked that v(2) = A(2, 0)x (0) and that v(1) = A(1, 0)x (0) ⊕ A(1, −1)x (−1) . Thus x (k ) = A(k , k )x (k ) ⊕ A(k , k − 1)x (k − 1) ⊕ A(k , k − 2)x (k − 2), k = 1, 2, . . . . 
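The computations carried out by hand in Example 2.64 can be automated: the continuation (2.24) of x to k <= 0 is read off the entrance times (2.22) of the initial tokens. A minimal sketch under the same assumed place-oriented data layout; with a compatible initial condition, places following the same transition yield consistent values, which the code does not re-check.

EPS = float('-inf')

def entrance_times(mu_i, alpha_i, w_i):
    # y_i(k - µ_i) = w_i(k) - α_i(k) (conventional subtraction), 1 <= k <= µ_i   (cf. (2.22))
    return {k - mu_i: w_i(k) - alpha_i(k) for k in range(1, mu_i + 1)}

def continuation(places, alpha, w):
    # x_j(k) = z_j(k) for k <= 0, with z_{π^p(i)}(k) = y_i(k)    (cf. (2.23)-(2.24))
    z = {}
    for i, (l, j, mu) in enumerate(places):
        for k, t in entrance_times(mu, alpha[i], w[i]).items():
            z[(l, k)] = t
    return lambda j, k: z.get((j, k), EPS)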
2.5.3 Constructiveness of the Evolution Equations The first natural question concerning Equation (2.17) is: is it implicit or constructive? The main result of this section establishes that the evolution equations (2.17) are not implicit and that they allow one to recursively define the value of x j (k ) for all j = 1, . . . , |Q|, and k ≥ 1, provided the event graph under consideration is live. 78 Synchronization and Linearity Lemma 2.65 The event graph is live if and only if there exists a permutation P of the coordinates for which the matrix P A(k , k ) P is strictly lower triangular for all k. Proof If the matrix P A(k , k ) P is strictly lower triangular for some permutation P , then there is no circuit with 0 initial marking, in view of the definition of A(k , k ) (see (2.15)). Conversely, if the event graph is live, the matrix A(k , k ) has no circuit, and there exists a permutation of the coordinates that makes A strictly lower triangular. The proof is then concluded from Theorem 2.38. Observe that the fact that P does not depend on k comes from the fact that the support of A(k , k ) does not depend on k (by ‘the support’ of matrix A we mean the matrix S with the same dimension as A defined by Si j = 1 Ai j =ε ). If the matrix P A(k , k ) P is strictly lower triangular, An (k , k ) = ε for n ≥ |Q|, and the matrix def A∗ (k , k ) = e ⊕ A(k , k ) ⊕ A2 (k , k ) ⊕ · · · is finite. Let A(k , k − l ) = A∗ (k , k ) A(k , k − l ) , def and k∈Z , v(k ) = A∗ (k , k )v(k ) , def l = 1, . . . , M , (2.28) k ∈Z , def with v j (k ) = ε for k ≤ 0 or k > M . Theorem 2.66 If the event graph is live, the evolution equations (2.17) and (2.18) can be rewritten as x (k ) = A (k , k − 1)x (k − 1) ⊕ · · · ⊕ A(k , k − M )x (k − M ) ⊕ v(k ) , k = 1, 2, . . . , (2.29) def where x j (k ) = ε, for all k ≤ 0. Proof From (2.17) and (2.18), we obtain by induction on n that x (k ) = An+1 (k , k )x (k ) n M A (k , k ) ⊕ A(k , k − l )x (k − l ) ⊕ v(k ) , k = 1, 2, . . . . m m =0 l =1 Equation (2.29) follows from the last relation by letting n go to ∞. Remark 2.67 If the initial condition is compatible and if one now takes the continuation of x (k ) for k ≤ 0, as defined in (2.24), the same type of arguments shows that (2.25) becomes M x (k ) = A(k , k − l )x (k − l ) , l =1 k = 1, 2, . . . . 2.5. Timed Event Graphs 79 Corollary 2.68 If the event graph is live and if the holding times and the lag times are all finite, so are the state variables x j (k ), j = 1, . . . , |Q|, k ≥ 1. Proof The proof is by induction based on (2.29). Remark 2.69 The matrix A(k , k −l ), l ≥ 1, has a simple graph-theoretic interpretation. Let S ( j , j , l ) be the set of paths in the graph G of the event graph, of length at least 2, with initial transition q j , with final transition q j , and such that the first two transitions of the path are connected by a place with initial marking equal to l , while the other transitions are connected by places with 0 initial marking. It is easily checked, using the results of §2.4, that A j j (k , k − l ) is defined by the relation h −1 A j j (k , k − l ) = αin (k ) , (2.30) {ρ =( j1,i1 , j2 ,i2 ··· ,ih −1 , jh )∈S ( j , j ,l )} n=1 with the usual convention if the set S ( j , j , l ) is empty. The entry A j j (k , k − l ) is hence simply the longest path in S ( j , j , l ). 
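Once the matrices A-hat(k, k - l) of (2.28) and the vectors v-hat(k) are available, the recursion (2.29) is no longer implicit and can be iterated directly. A minimal sketch, assuming these data are supplied as functions Ahat(k, l) (the |Q| x |Q| array for lag l, with epsilon = -infinity entries) and vhat(k); these names are ours.

import numpy as np

EPS = -np.inf

def mp_mat_vec(A, x):
    # (A ⊗ x)_j = max_l (A_jl + x_l)
    return np.array([np.max(row + x) for row in A])

def iterate_daters(Ahat, vhat, M, nQ, K):
    # x(k) = Â(k,k-1) x(k-1) ⊕ ... ⊕ Â(k,k-M) x(k-M) ⊕ v̂(k), k = 1,...,K,
    # with the continuation x_j(k) = ε for k <= 0               (cf. (2.29))
    x = {k: np.full(nQ, EPS) for k in range(1 - M, 1)}
    for k in range(1, K + 1):
        xk = np.array(vhat(k), dtype=float)
        for l in range(1, M + 1):
            Akl = np.asarray(Ahat(k, l), dtype=float)
            xk = np.maximum(xk, mp_mat_vec(Akl, x[k - l]))
        x[k] = xk
    return x

Liveness of the event graph guarantees (Lemma 2.65) that A*(k, k) is finite, so that A-hat(k, k - l) and v-hat(k) are themselves obtained from the data of (2.15)-(2.16) by one star computation and a few max-plus products.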
Example 2.70 (Example 2.60 continued) We have e ε ε ε e ε ∗ A (k , k ) = ε ε e ε α42 (k ) α43 (k ) so that α11 (k ) α21 (k ) A (k , k − 1) = ε α42 (k )α21 (k ) ε ε A (k , k − 2) = α31 (k ) α43 (k )α31 (k ) and ε ε , ε e ε ε ε ε ε ε ε ε α14 (k ) ε , ε α44 (k ) ε ε ε ε ε ε ε ε ε ε , ε ε w11 (1) ⊕ w14 (1) ε ε w21 (1) , v(2) = v(1) = w31(2) . w31 (1) w21 (1)α42 (1) ⊕ w31 (1)α43 (1) ⊕ w44 (1) w31(2)α43 (2) Remark 2.71 (Equivalent event graph with positive initial marking) With the evolution equation (2.29), one can associate a derived event graph with the same set of transitions as the initial event graph, and where the initial marking is such that µi > 0 80 Synchronization and Linearity for all places pi . This event graph is equivalent to the initial one in the sense that corresponding transitions fire at the same times. The derived event graph associated with the event graph of Example 2.60 is given in Figure 2.26, left-hand side. This derived event graph can be defined from the original one by the following transformation rules: 1. take the same set of transitions as in the original event graph; 2. for each path of S ( j , j , l ), l ≥ 1, in the original event graph, create a place connecting j to j with l tokens and with the weight of the path as holding time. α 14 q1 α 14 q4 α 11 α 31 q2 α 21 q1 α 44 α 42 α 21 α 43 α 31 α 11 q4 α 42 α 21 α 43 α 31 α 44 q3 Figure 2.26: Illustration of Remarks 2.71 and 2.72 Remark 2.72 (Equivalent event graph with reduced state space) The dimension of the state space can be reduced using the following observation: the transitions followed by places all having a 0 initial marking, or equivalently the transitions q j such that the entries of the j -th column of A(k , k − l ) are ε, are not present in the right-hand side of (2.29). Let Q be the set of transitions followed by at least one place with a positive initial marking. One can take as reduced state variables x j (k ), j ∈ Q , k ≥ 1. The remaining variables are obtained from them via (2.29). Example 2.73 (Example 2.60 continued) Here we have Q = {q1 , q4 }. With these new state variables, the evolution equations are reduced to x 1 (k ) x 4 (k ) α11 (k ) α42 (k )α21 (k ) = ⊕ α14(k ) α44(k ) ε α43 (k )α31 (k ) ε ε x 1 (k − 1) x 4 (k − 1) x 1 (k − 2) x 4 (k − 2) ⊕ v 1 (k ) v 4 (k ) . The event graph corresponding to these equations is depicted in Figure 2.26, right-hand side. It is obtained from the derived graph by deleting the transitions that do not belong to Q . 2.5. Timed Event Graphs 81 The other state variables are obtained from the reduced state variables by the relation x 2 (k ) x 3 (k ) α21 (k ) ε = ⊕ ε ε ε 31(k ) x 1 (k − 1) x 4 (k − 1) x 1 (k − 2) x 4 (k − 2) ε ε ⊕ v 2 (k ) v 3 (k ) . The variables (x 2 (k ), x 3 (k )) are output variables in the derived event graph. Equivalently, transitions q2 and q3 are sinks of the derived event graph. 2.5.4 Standard Autonomous Equations The data of this section is a live event graph satisfying the evolution equations (2.29), with the reduction of the state space mentioned in Remark 2.72. We will assume that the transitions of Q are numbered 1, . . . , |Q |, which introduces no loss of generality. It may be desirable to replace the initial recurrence (2.29), which is of order M , by an equivalent recurrence of order 1. This is done by using the standard technique which consists in extending the state vector. As a new state vector, take the |Q | × M dimensional vector x (k ) x (k − 1) def x (k ) = . . . . 
x (k + 1 − M ) Let A(k ), k ∈ Z, be the (|Q | × M ) × (|Q | × M ) matrix defined by the relation A (k + 1, k ) A (k + 1, k − 1) . . . . . . A(k + 1, k + 1 − M ) e ε ... ε ε . . .. . . , . . ε e . A(k ) = . .. . . . e ε ε ε ... ε e ε where e and ε denote the |Q | × |Q | identity and zero matrices, respectively, and let v (k ) be the |Q | × M -dimensional vector def v (k ) = v(k + 1) ε . . . , ε where ε represents here the |Q |-dimensional zero vector. Adopting the convention that x j (k ) and v j (k ) are equal to ε for k ≤ 0, it should be clear that Equation (2.31) in the following corollary is a mere rewriting of the evolution equations (2.29). 82 Synchronization and Linearity Corollary 2.74 The extended state space vector x (k ) satisfies the dimensional recurrence relation of order 1 x (k + 1) = A (k )x (k ) ⊕ v (k ) , k = 1, 2, . . . . M × |Q | (2.31) Equation (2.31) will be referred to as the standard form of the evolution equations of an autonomous timed event graph satisfying (2.29). Remark 2.75 In the particular case of a compatible initial condition, these equations read x (k + 1) = A(k )x (k ) , k = 1, 2, . . . , (2.32) provided the continuation of x j (k ) for k ≤ 0, is that of Equation (2.24). Whenever the entrance times of the tokens of the initial marking are all equal to e (see Remark 2.63), it is easily checked from (2.27) that in this case e ε xl|Q |+ j (0) = if 0 ≤ l < M j ; for l ≥ M j , (2.33) for l = 0, . . . , M − 1; j = 1, . . . , |Q |. Example 2.76 (Example 2.60 continued) Here we have x 1 (k ) v 1 (k + 1) x 4 (k ) v 4 (k + 1) x (k ) = x 1 (k − 1) , v (k ) = ε ε x 4 (k − 1) , and α11 (k + 1) α42 (k + 1)α21 (k + 1) A(k ) = e ε α14 (k + 1) α44 (k + 1) ε e ε α43 (k + 1)α31 (k + 1) ε ε ε ε . ε ε In the special case mentioned at the end of the preceding remark, we have x (0) = (e, e, e, ε) . Remark 2.77 (Equivalent net with at most one token in the initial marking) One can associate a timed event graph with the evolution equations (2.31). The interesting property of this event graph is that its initial marking is such that M = 1 (more precisely, each µi in this event graph is 1). In view of Corollary 2.74, one can hence state that for any timed event graph, one can construct another ‘equivalent’ event graph with initial marking equal to 1 everywhere. The equivalence means here that one can find a bijective mapping from the set of transitions of the initial event graph to a subset of the transitions of the second 2.5. Timed Event Graphs 83 one, such that two corresponding transitions fire at the same time. In particular, any observation of these transitions will be identical. For instance, in the case of a compatible initial condition, this event graph can be obtained from the original one by first transforming it into an event graph with positive initial marking (as it was done in Remark 2.71), and by then applying the following transformation rules: 1. for each transition q j of Q in the original event graph, create M transitions q jl , l = 0, . . . , M − 1; 2. for each q jl , l = 0, . . . , M − 2, create a place that connects q jl to qi,l+1 , with 0 holding times. Put one token in its initial marking, with initial lag time z j (−l ); 3. for each place connecting qi ∈ Q to q j , and with l + 1 initial tokens, l ≥ 0, in the original system, create a place with one token with initial lag time z i (−l ), and with the same holding times sequence as the original place. This new place has qil as input transition, and q j 0 as output transition. 
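The companion matrix A(k) of Corollary 2.74 is the algebraic counterpart of this construction; it can be assembled mechanically from the blocks A-hat(k + 1, k + 1 - l). A sketch with the same assumed helper Ahat(k, l) as before (returning the block for lag l, with epsilon = -infinity entries):

import numpy as np

EPS = -np.inf

def companion(Ahat, k, M, nQ):
    # top block row: Â(k+1,k), Â(k+1,k-1), ..., Â(k+1,k+1-M);
    # below it, max-plus identity blocks e on the block subdiagonal, ε elsewhere
    n = M * nQ
    A = np.full((n, n), EPS)
    for l in range(1, M + 1):
        A[0:nQ, (l - 1) * nQ:l * nQ] = np.asarray(Ahat(k + 1, l), dtype=float)
    for l in range(1, M):
        for j in range(nQ):
            A[l * nQ + j, (l - 1) * nQ + j] = 0.0    # identity block e
    return A

With this matrix and the vector (v-hat(k + 1), epsilon, ..., epsilon), one step of (2.31) is a single max-plus matrix-vector product.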
For Example 2.60, the corresponding event graph is given in Figure 2.27. The behavior of q10 in this graph is the same as that of q1 in Figure 2.24. The same property holds for q40 and q4 respectively. α 14 α 42 α 21 z 4 (0) q10 q40 0 α 11 z 1 (0) q11 0 α 43 α 31 z 1 ( −1) q41 α 44 Figure 2.27: Illustration of Remark 2.77 2.5.5 The Nonautonomous Case This subsection focuses on FIFO timed event graphs with external inputs. The firing times can be taken equal to 0, without loss of generality. To the framework of the preceding sections, we add a new class of transitions called input transitions. This set of transitions will be denoted I . Definition 2.78 (Input transition) An input transition consists of a source transition and of a nondecreasing real valued sequence, called the input sequence . 84 Synchronization and Linearity The input sequence associated with transition q j ∈ I will be denoted u j (k ), k ≥ 1; the interpretation of this sequence is that u j (k ), k ≥ 1, gives the epoch when q j fires for the k -th time, due to some external trigger action. The input sequences are assumed to be given. Definition 2.79 (Weakly compatible input sequence) The input sequence u j (k ), k ∈ Z, is weakly compatible if u j (1) ≥ 0. In what follows, all input sequences will be assumed to be weakly compatible. As in the autonomous case, the initial condition is said to be weakly compatible if in addition • the entrance times of the tokens of the initial marking are nonpositive; • the firings which consume tokens of the initial marking complete at nonnegative epochs (the assumption that the input sequences are weakly compatible may contribute ensuring that this holds). The definition of compatibility is the same as in the autonomous case. For instance, if the lag times of a nonautonomous event graph are compatible, one can continue the input sequence {u j (k )} (with j ∈ I ) to a nondecreasing sequence {u j (k )}k∈Z , with u j (0) ≤ 0, such that for all pi ∈ σ (q j ) with µi ≥ 1, wi (k ) = αi (k ) ⊗ u j (k − µi ) , 2.5.5.1 ∀k : 1 ≤ k ≤ µi . (2.34) Basic Nonautonomous Equations The derivation of the evolution equations is based on the same type of assumptions as in the autonomous case, namely the event graph is FIFO and the initial condition is weakly compatible. Define the |Q| × |I | matrices B (k , k ), . . . , B (k , k − M ) by def B jl (k , k − m ) = αi (k ) , (2.35) {i ∈π q ( j )|π p (i )=l ,µi =m } and the |I |-dimensional vector u (k ) = (u 1 (k ), . . . , u |I | (k )), k = 1, 2, . . . . Using the same arguments as in Theorem 2.58, we get the following result. Theorem 2.80 Under the foregoing assumptions, the state vector x (k ) (x 1 (k ), . . . , x |Q| (k )) satisfies the evolution equations: x (k ) = A(k , k ) x (k ) ⊕ · · · ⊕ A(k , k − M )x (k − M ) ⊕ B (k , k )u (k ) ⊕ · · · ⊕ B (k , k − M )u (k − M ) ⊕ v(k ) , k = 1, 2, . . . , def = (2.36) where x j (k ) = ε and u j (k ) = ε for all k ≤ 0; v j (k ) is defined as in (2.16) for 1 ≤ k ≤ M and it is equal to ε otherwise. 2.5. Timed Event Graphs 85 If the initial lag times and the input sequences are both compatible, this equation can be simplified by using the same arguments as in Corollary 2.62, which leads to the equation x (k ) = A(k , k ) x (k ) ⊕ · · · ⊕ A(k , k − M )x (k − M ) ⊕ B (k , k )u (k ) ⊕ · · · ⊕ B (k , k − M )u (k − M ) , k = 1, 2, . . . , (2.37) where the continuations that are taken for x (k ) and u (k ), k ≤ 0, are now those defined in Corollary 2.62 and Equation 2.34, respectively. 
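The matrices B(k, k - m) of (2.35) are assembled exactly like the A matrices of (2.15), except that the upstream transitions are now input transitions and no firing time enters. A sketch under the same assumed data layout (in_places lists the places fed by input transitions as triples (l, j, mu), where l is the index of the input transition in I):

EPS = float('-inf')

def build_B(in_places, alpha, nQ, nI, k, m):
    # B_jl(k, k-m) = ⊕ over {i in π^q(j) : π^p(i) is input l, µ_i = m}
    #                of α_i(k)                                   (cf. (2.35))
    B = [[EPS] * nI for _ in range(nQ)]
    for i, (l, j, mu) in enumerate(in_places):
        if mu == m:
            B[j][l] = max(B[j][l], alpha[i](k))
    return B

Once the implicit term A(k, k)x(k) is eliminated (Theorem 2.81 below), iterating (2.38) only adds the terms B-hat(k, k - m)u(k - m) to the autonomous recursion sketched earlier.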
In what follows, we will say that the nonautonomous event graph is live if the associated autonomous event graph (namely the one associated with the equation x (k ) = A(k , k ) x (k ) ⊕ · · · ⊕ A(k , k − M )x (k − M )) is live. Let B (k , k − l ) = A∗ (k , k ) B (k , k − m ) , def k∈Z , l = 0, . . . , M . The following theorem is proved like Theorem 2.66. Theorem 2.81 If the event graph is live, the evolution equations (2.36) can be rewritten as x (k ) = A(k , k − 1)x (k − 1) ⊕ · · · ⊕ A (k , k − M )x (k − M ) ⊕ B (k , k )u (k ) ⊕ · · · ⊕ B (k , k − M )u (k − M ) ⊕ v(k ) , k = 1, 2, . . . , (2.38) with the same simplification as in Corollary 2.62, provided the initial lag times and the input sequences are compatible. The graph theoretic interpretation of B j j (k , k −l ) is again the longest path in S ( j , j , l ) (see Remark 2.69). 2.5.5.2 Standard Nonautonomous Equations Define the ( M × |I |)-dimensional vector def u (k ) = u (k + 1) u (k ) . . . , u (k + 2 − M ) and the (|I | × M ) × (|Q | × M ) matrix B (k + 1, k + 1) B (k + 1, k ) ε ε B (k ) = . . . . . . ε ε ... ... . . . ... B (k + 1, k + 2 − M ) ε . . . . ε 86 Synchronization and Linearity Corollary 2.82 (Standard nonautonomous equation) The extended state space vector x (k ) satisfies the M × |Q | -dimensional recurrence of order 1: x (k + 1) = A(k )x (k ) ⊕ B (k )u (k ) ⊕ v(k ) , k = 1, 2, . . . , (2.39) with the same simplification as in Corollary 2.62, provided the initial lag times and the input sequences are compatible. Remark 2.83 Formally, the autonomous equation (2.17) can also be seen as that of a nonautonomous event graph with a set of input transitions I = {q1 , q2 , . . . } of the same cardinality as the set Q , with input sequence vector v(k ), and with B (k , k ) = e, B (k , k − l ) = ε for l = 1, . . . , M (the input transition q j is connected to the internal transition q j by a single place with 0 initial marking). However, the requirement that an input sequence should be nondecreasing contradicts our foregoing assumption on v(k ) (with our definitions, v j (k ) eventually becomes ε for k large, as it can be seen from (2.16)). However, when using the fact that the sequences x j (k ) are nondecreasing, it is easy to check that one can take the input sequence u j (k ) defined by the function v j (k ) v j (M j ) def u j (k ) = if 1 ≤ k ≤ M j ; if k ≥ M j , instead of v(k ), without altering the values of x (k ). This representation of an autonomous event graph as a nonautonomous one, where all initial lag times can be taken equal to ε, is exemplified on the event graph of Example 2.60 in Figure 2.28. u 2 ( k ) = w 21 (1) , q1 k≥1 q4 q2 q3 u 1 ( k ) = w 14 (1) ⊕ w 11 (1) k≥1 u 3 (1) = w 31 (1) u 3 ( k ) = w 31 (2) , k ≥ 2 u 4 ( k ) = w 44 (1) k≥1 Figure 2.28: Illustration of Remark 2.83 2.5. Timed Event Graphs 2.5.6 87 Construction of the Marking This subsection is concerned with the construction of the marking from the state variables. We will limit ourselves to the construction of the marking at certain epochs for which their expression is quite simple. However, the formulæ that are obtained are nonlinear in the max-plus algebra. We will therefore return to classical algebra, at least in this subsection. For the sake of simplicity, it is assumed that the initial condition is compatible. Pick some place pi in P , and let q j = π( pi ), ql = σ ( pi ). Let Ni (t ) be the number of tokens in place pi at time t , t ≥ 0, with the convention that this piecewise constant function is right-continuous. 
Let Ni− (k ) def = Ni x π p (i ) (k ) , k≥1 , (2.40) Ni+ (k ) def Ni x σ p (i ) (k ) , k1 . (2.41) = Owing to our definitions, Ni− (k ) is the number of tokens in pi just after the k -th token entrance into pi after t = 0, while Ni+ (k ) is the number of tokens pi just after the departure of the k -th token to leave pi after t = 0. Lemma 2.84 Under the foregoing assumptions, Ni− (k ) Ni+ (k ) k + µi = 1{ xl (h)> x j (k)} , k = 1, 2, . . . , (2.42) h =1 ∞ = 1 { xl ( k ) ≥ x j ( h ) } , k = 1, 2, . . . . (2.43) h = k + 1 − µi Proof The tokens present in pi at time (just after) x j (k ) are those that arrived into pi no later than x j (k ) and which are still in pi at time x j (k ). The tokens that arrived no later than x j (k ) are those with index h with respect to this place, with 1 ≤ h ≤ k + µi . Among these tokens, those which satisfy the relation xl (h ) > x j (k ) are still in pi at time x j (k ). Similarly, the only tokens that can be present in place pi just after time xl (k ) are those with index h > k with respect to pi . The token of index h with respect to pi is the token produced by transition j at time x j (h − µi ) (where the continuation of x j (k ) to k ≤ 0 is the one defined in §2.5.2.6). Among these tokens, those which entered pi no later than time xl (k ) are in pi at time xl (k ). 2.5.7 Stochastic Event Graphs Definition 2.85 (Stochastic event graph) A timed event graph is a stochastic event graph if the holding times, the firing times and the lag times are all random variables defined on a common probability space. Different levels of generality can be considered. The most general situation that will be considered in Chapters 7 and 8 is the case when the sequences {αi (k )}k∈Z , i = 1, . . . , |P |, and {β j (k )}k∈Z , j = 1, . . . , |Q|, are jointly stationary and ergodic 88 Synchronization and Linearity sequences of nonnegative and integrable random variables defined on a common probability space ( , F, P). Similarly, the lag times wi (k − µl ), 1 ≤ k ≤ µi , are assumed to be finite and integrable random variables defined on ( , F, P). More specific situations will also be considered, like for instance the case when the sequences {αl (k )}k∈N , l = 1, . . . , |P |, and {βi (k )}k∈N , i = 1, . . . , |Q|, are mutually independent sequences of independent and identically distributed (i.i.d.) random variables. For instance, all these variables could be exponentially distributed, with a parameter that depends on l or i , namely P[αl (k ) ≤ x ] = 1 − exp(al x ) 0 if x ≥ 0 ; otherwise, (2.44) P[βi (k ) ≤ x ] = 1 − exp(bi x ) 0 if x ≥ 0 ; otherwise, (2.45) and where a j > 0 and b j > 0. Another particular case arises when all the sequences are constant and deterministic, and the case with constant timing is thus a special (and degenerate) case of the i.i.d. situation. 2.6 Modeling Issues In this section some issues related to the modeling of Petri nets will be described briefly. 2.6.1 Multigraphs Multigraphs are graphs in which more than one arc between two nodes is allowed. The fact that in this chapter no such multigraphs have been considered is twofold. The first reason is that the modeling power of Petri nets with multiple arcs and Petri nets with single arcs is the same [108]. This ‘modeling power’ is defined in terms of ‘reachability’, as discussed in §2.4.1. Petri nets with multiple arcs can straightforwardly be represented by Petri nets with single arcs, as shown in Figure 2.29. 
Note, Figure 2.29: The change of multiple arcs into single arcs however, that in the second of these figures a conflict situation has arisen. In order that 2.6. Modeling Issues 89 this single arc representation of the originally multiple arc from place to transition behaves in the same way as this multiple arc, it is necessary that the two rival transitions receive tokens alternately. The second reason is that it is not at all clear how to obtain equations for timed Petri nets in which there are multiple arcs. Observe that such nets are not event graphs, as is directly seen from Figure 2.29. It seems that some transitions fire in the long run twice as often as some other transitions; the 2k -th enabling of such a ‘fast’ transition is caused by the k -th firing (approximately) of such a ‘slow’ transition. 2.6.2 Places with Finite Capacity For the rule for enabling transitions it has tacitly been assumed that each place can accommodate an unlimited number of tokens. For the modeling of many physical systems it is natural to consider an upper limit for the number of tokens that each place can hold. Such a Petri net is referred to as a finite capacity net. In such a net, each place has an associated capacity K i , being the maximum number of tokens that pi can hold at any time. For a transition in a finite capacity net to be enabled, there is the additional condition that the number of tokens in each pi ∈ σ (q j ) cannot exceed its capacity after the firing of q j . In the discussion to come we confine ourselves to event graphs, although the extension to Petri nets is quite straightforward, see [96]. Suppose that place pi has a capacity constraint K i , then the finite capacity net will be ‘remodeled’ as another event graph, without capacity constraints. If π( pi ) ∩ σ ( pi ) = ∅, then there is a loop. The number of tokens in a loop before and after the firing is the same and hence the capacity constraint is never violated (provided the initial number of tokens was admissible). Assume now that π( pi ) ∩ σ ( pi ) = ∅. Add another place pi to the net. This new place will have µi = K i − µi tokens. Add an arc from σ ( pi ) to pi and an arc from pi to π( pi ). The number of tokens in this new circuit is constant according to Theorem 2.37. The liveness of the event graph is not influenced by the addition of such a new circuit, see Theorem 2.38. An example is provided in Figure 2.30 where K i = 3. pi Ki = 3 pi pi Figure 2.30: Node with and without a capacity constraint It is easily verified that this newly constructed Petri net, without capacity constraints, behaves in exactly the same way in terms of possible firing sequences as the original finite capacity net. In this sense the nets are ‘equivalent’. 90 Synchronization and Linearity 2.6.3 Synthesis of Event Graphs from Interacting Resources This aim of this subsection is to show how, starting from the physical understanding of ‘resources’, and some assumptions, one can build up event graphs in a somewhat systematic way. This approach leads to a subclass of event graphs (essentially restricted by the kind of initial marking they can receive) for which the issue of initial conditions is transparent. 2.6.3.1 General Observations The exercise of modeling is more an art than an activity which obeys rigid and precise rules. For example, the degree of details retained in a model is a matter of appraisal with respect to future uses of the model. Models may be equivalent in some respects but they may differ in the physical insights they provide. 
These observations are classical in conventional system theory: it is well known that different state space realizations, even with different dimensions, may yield the same input-output behavior, but some of these realizations may capture a physical meaning of the state variables whereas others may not. To be more specific, a clear identification of what corresponds to ‘resources’ in an abstract Petri net model may not be crucial if available resources are given once and for all and if the problem only consists in evaluating the performance of a specified system, but it may become a fundamental issue when resource quantities enter the decision variables, e.g. in optimal resource sizing at the design stage. It has already been noticed that, in an event graph, although the number of tokens in each circuit is invariant during the evolution of the system, this is not generally the case of the total number of tokens in the system, even if the graph is strongly connected and autonomous. In this case, it is unclear how tokens and physical resources are related to each other. Figure 2.31 shows the simplest example of this type for which there are only two distinct situations for the distribution of tokens, the total number of tokens being either one or two. Indeed, if we redraw this event graph as in Figure 2.32, it becomes pos- (a) (b) Figure 2.31: Merging two resources (a) (b) Figure 2.32: An alternative model sible to interpret the two tokens as two resources circulating in the system sometimes alone (case (a)) and sometimes jointly (case (b)). The problem with the former model of Figure 2.31 is that, when they stay together (case (b)), the two resources are represented by a single token. Obviously, these two models are equivalent if one is only 2.6. Modeling Issues 91 interested in the number of transition firings within a certain time (assuming that some holding times have been defined properly in corresponding places), but the difference becomes relevant if one is willing to play with individual resource quantities in order to ‘optimize’ the design. As long as modeling is an art, what is described hereafter should be considered as a set of practical guidelines—rather than as a rigorous theory—which should help in trying to construct models which capture as much physical meaning as possible. The dynamic systems we have in mind are supposed to involve the combined evolution of several interacting resources designed to achieve some specific overall task or service. Our approach is in three stages: 1. we first describe the evolution of each type of resource individually; 2. then we describe the mechanism of interaction between resources; 3. finally, we discuss the problem of initialization. 2.6.3.2 State Evolution of Individual Resources The word ‘resource’ should be understood in the broadest sense: a machine, a tool, a part, a channel, a transportation link, a position in a storage or in a buffer, etc. are all resources. Of course, in practice, it is worthwhile modeling a resource explicitly as long as it is a ‘scarce’ resource, that is a resource the limited availability of which may have some influence on the evolution of the system at some time. The capacity of a buffer may be large enough for the buffer to behave as if it were infinite. Again, it is a matter of feeling to decide whether an explicit model of the finite capacity is a priori needed or not. The evolution of each resource in the system is modeled by a sequence of ‘stages’. 
What is considered a ‘stage’ is a matter of judgment since several consecutive stages may be aggregated into a single stage. For example, considering the evolution of a part in a workshop, traveling to the next machine and then waiting for the temperature to be cool enough before entering this next machine may be considered a single stage. We assume that • the nature and the order of the stages experienced by each type of resource are known in advance; • for a given resource, every stage is preceded (followed)—if it is not the first (last) stage—by a single other stage. Consequently, for a given type of resource, if we draw a graph such that nodes (drawn as places in a Petri net) represent stages, and arcs indicate precedence of the upstream stages over the downstream stages, then this graph is a path (i.e. it has neither circuits nor branches) and it reflects the total order of stages. Obviously, in a workshop, this assumes that a ‘deterministic scheduling’ of operations has been defined a priori. For example, a machine is supposed to work on a specified sequence of parts in a specified order, each operation being represented by a particular stage. If the stage ‘idle’ is possible between two such operations (i.e. the 92 Synchronization and Linearity machine may deliver the part it holds without receiving its next part—hence it stays alone for a while), then additional stages should be introduced in between. Notice that • the stage ‘idle’ is not represented by a unique node, but that a node of this type is introduced if necessary between any two successive operations; • a machine may be idle, while still holding the part it was working on, because there is no storage available downstream, and the next operation of this machine cannot be undertaken before the downstream machine can absorb the present part. In this case, it is not necessary to introduce an additional stage, that is, the stage ‘working’ and the stage ‘waiting for downstream delivery’ need not be distinguished. For a part, it is also assumed that the sequence of machines visited is specified in advance, but notice that storages visited should be considered as an operation of the same type as machines visited. Therefore, a storage position, if it is common to several types of parts, must know in advance the order in which it will be visited by these different parts, which is certainly a restrictive constraint in the modeling phase, but this is the price to pay for remaining eventually in the class of event graphs. Finally, the evolution of a resource is simply a path (in other words, this evolution is represented by a serial automaton). For convenience, each arc will be ‘cut’ by a bar resembling a transition, which will serve later on for synchronization purposes. Each stage receives a sequence of holding times representing minimal times spent in this stage by the successive resources. For example, waiting in a storage should involve a minimal time equal to 0. At any time, the present stage of a given resource is represented by a token marking the corresponding place. Transitions represent changes of stages and they are instantaneous. Resources may enter the system and leave it after a while (nonreusable resources) or they may revisit the same stages indefinitely because they are ‘recycled’ (reusable resources). For example, raw materials come in a workshop and leave after a transformation, whereas machines may indefinitely resume their work on the same repetitive sequences of parts. 
Sometimes, nonreusable resources are tightly associated with reusable resources so that it is only important to model these reusable resources: for example, parts may be fixed on pallets that are recycled after the parts leave the workshop. For reusable resources, we introduce an additional stage called the ‘recycling stage’, and we put an arc from the last stage (of an elementary sequence) to this recycling stage and another arc from the recycling stage to the first stage. Hence we obtain a circuit. Physically, the recycling stage might represent a certain ‘reconditioning’ operation (possibly involving other resources too), and therefore it might receive a nonzero holding time (transportation time, set-up time, etc.). However, it will be preferable to suppose that the recycling stage of any resource corresponds to an abstract operation which involves only this resource, and which is immediate (holding time 0). The positioning of this recycling stage with respect to the true reconditioning operation (before or after it) is left to the appraisal of the user in each specific situation. Indeed, each stage along the circuit of the reusable resource where this resource stands alone may be a candidate to play the role of the recycling stage, a remark which should be kept in 2.6. Modeling Issues 93 mind when we speak of canonical initialization later on (this is related to the issue of the point at which one starts a periodic behavior). 2.6.3.3 Synchronization Mechanism Any elementary operation may involve only one particular type of resource, or it may also involve several different resources. For example, a part waiting in a storage involves both the part and the storage position (whereas an idle machine involves only this machine). It should be realized that, so far, the same physical operation, as long as it involves n different resources simultaneously, has been represented by n different places. If two stages belonging to two distinct resource paths (or circuits) correspond to the same operation, we must express that these stages are entered and left simultaneously by the two resources (moreover, the holding times of these stages should be the same). This may be achieved by putting two synchronization circuits, one connecting the ‘beginning’ transitions, the other connecting the ‘end’ transitions, as indicated in Figure 2.33. In order to comply with the standard event graph representation, we have put new places—represented here in grey color—over the arcs of these circuits. However, these grey places do not represent ‘stages’ as other places do. They are never marked with tokens and they have holding times equal to 0. Then, it is realized that these circuits involving no tokens and having a total holding time equal to 0 express simultaneity of events, that is entering or leaving the considered stage by anyone of the two resources precedes the same type of event achieved by the other resource and vice versa. Remark 2.86 What we have just done here is nonstandard from the point of view of Petri net theory, although it is mathematically correct for the purpose of expressing simultaneity. Indeed, having a live Petri net which includes circuits with no tokens seems to contradict Theorem 2.38. More specifically, every transition having a ‘grey place’ upstream (which will never receive tokens) will never fire if we stick to the general rule about how transition firings are enabled. 
We propose the following (tricky) adaptation of this rule to get out of this contradiction: ‘in a timed Petri net, a transition may “borrow” tokens to enable its own firing during a duration of 0 time units, that is, provided that it can “return” the same amount of tokens immediately’. This condition is satisfied for transitions preceded and followed by (the same number of) ‘grey places’ since tokens may be ‘borrowed’ in upstream ‘grey places’ only at the epoch of firing (since those tokens are then immediately available—holding time 0), and, for the same reason, tokens produced in downstream ‘grey places’ can immediately be ‘returned’ after firing (they are immediately available when produced). Mathematically, consider the pair of transitions at the left-hand side of Figure 2.33 and let x 1 (k ) and x 2 (k ) denote their respective daters (see Definition 2.57). The ‘grey circuit’ translates into the following two inequalities: x 1 (k ) ≥ x 2 (k ) and x 2 (k ) ≥ x 1 (k ) , which imply equality: this is exactly what we want and everything is consistent. 94 Synchronization and Linearity On the contrary, the reader should think of what happens if one of the holding times put on ‘grey places’ is strictly positive and there are still no tokens in the initial marking of these places. To avoid this discussion, two alternative solutions can be adopted. The first one consists in merging the simultaneous transitions as shown by Figure 2.34, which re- Figure 2.33: Synchronization mechanism Figure 2.34: Alternative representation of synchronization moves the synchronization circuits and the ‘grey places’. We then come up with a representation similar to that of Figure 2.32. A further step towards simplification would be to merge the places and arcs in between so as to arrive at a representation similar to that of Figure 2.31, but then the resource interpretation would be obscured. The second solution involves the introduction of fake transitions x 1 and x 2 upstream the real transitions to be synchronized. This mechanism is explained by Figure 2.35 with the equations proving that the firing times of x 1 and x 2 (denoted after the name of the transitions) are equal. x1 x1 x2 x2 x1 = x1 ⊕ x2 x2 = x1 ⊕ x2 ⇒ x1 = x2 Figure 2.35: Another synchronization mechanism Notice that the holding times of two synchronized stages, which were equal by construction, can now be different—e.g. one of the two may be taken to be 0—since only the greater holding time is important. All the above considerations extend to the 2.6. Modeling Issues 95 case of n , rather than 2, resources simultaneously. It is realized that, with this approach, every transition has as many output arcs as it has input arcs. A problem arises with resources which are physically merged into a single product (assembly) or split up into two or more products (disassembly). That is, the end nodes of a resource path do not correspond to inlet or outlet nodes of the system. There is a choice of considering that a cluster of two resources that enter the system separately is still made of two resources traveling together (which amounts to doubling all arcs and places after the assembly operation) or to accept that the number of output arcs at some transition (representing the beginning of an assembly operation) be less than the number of input arcs (which means that a new type of resource starts its life at that point of the system). Dual considerations apply to the problem of disassembly. 
2.6.3.4 Initialization As already indicated, during the evolution of the system the present stage of each resource will be marked on the graph by a token in a certain place along the path or the circuit of that resource. Two resources of the same type, e.g. two machines performing exactly the same sequence of operations, may use the same path or circuit as long as they never need to be distinguished. For example, a storage with n positions, and which is dedicated to a single type of stored resource, will be represented by a circuit with two stages, ‘occupied’ and ‘available’ (the only distinguishable stages of a single position), and with n tokens the distribution of which in the circuit indicates how many positions are available at any time. For a storage accommodating several types of resources, we refer the reader to the example at the end of this section. Epochs at which resources move from one stage to the next one will be given by the dater attached to the transition in between. We now define a canonical initial condition. For reusable resources, it corresponds to all tokens put at the corresponding recycling stages. As discussed earlier, these recycling stages are supposed to involve a single type of resource each, and a holding time equal to 0 (therefore, it is irrelevant to know when the tokens had been put there). For any nonreusable resource, since it passes through the system, we first complete its path by adding an inlet transition (upstream from the first place) and an outlet transition (downstream from the last place) so as to attach the epochs of inputs and outputs to these transitions (unless one of these transitions is already represented because the resource participates in an assembly operation, or because it is issued from a disassembly operation). The canonical initial condition for nonreusable resources corresponds to their paths being empty: all tokens must be introduced at the inlet transitions after the origin of time. Observe that the canonical initial condition is compatible in the sense of Definition 2.61. From this given canonical initial condition, and given a sequence of epochs at all input transitions at which tokens are introduced into the system (see Definition 2.78), tokens will evolve within the system (whereas other tokens will leave) according to the general rules of timed event graphs, and they will reach some positions at some other given epoch. Obviously, all situations thus obtained, which we may call ‘reachable conditions’, are also acceptable ‘initial conditions’ by changing the origin of time to the present time. Such a candidate to play the role of an initial condition obviously fulfills some constraints (see hereafter), and it is not defined only by the positions of 96 Synchronization and Linearity (a) (b) Figure 2.36: Two plus one resources tokens (called the ‘initial marking’): the time already spent by each token of the initial marking in this position before the (new) origin of time is also part of the definition of the ‘initial condition’ (at least for places where the holding time is nonzero). Alternatively, the first epoch at which each token of the initial marking may be consumed must be given: this corresponds to the notion of lag time introduced in Definition 2.49. Necessary, but maybe not sufficient, requirements of reachability from the canonical initial condition can be stated for all places included between any pair of synchronized (Figure 2.33) or merged (Figure 2.34) transitions. 
Say there are n such places; then the same number of tokens must mark all these places, say p tokens per place, and there must exist p n-tuples of tokens (with one token per place) with the same difference between their holding times and their lag times. This requirement that exactly the same number p of tokens mark each of the n places makes the notion of an 'initial condition reachable from the canonical initial condition' even more restrictive than the notion of a 'compatible initial condition' of Definition 2.61. For example, if we return to the graph of Figure 2.32, with the interpretation of two reusable resources, position (a) corresponds to the canonical initial condition, and position (b) is another acceptable initial marking. Suppose now that we add a second copy of the resource represented by the left-hand circuit. Figure 2.36a represents an initial marking which we can interpret, but Figure 2.36b does not, although it is perfectly correct from the abstract point of view of event graphs. Moreover, it may be noted that the graph of Figure 2.36b is not reducible to something similar to Figure 2.31.

2.6.3.5 An Example

We consider two types of parts, say ♠ and ♥, which can wait in separate buffers of large capacity before being heated individually by a furnace. The furnace deals with parts ♠ and ♥ alternately. The parts can then wait in a stove with room for three parts, each of either type, the purpose being to maintain the temperature of the parts until they are assembled in pairs (♠, ♥) by a machine. Finally, parts leave the workshop. A part must stay in the furnace if it cannot enter the stove. In the same way, parts can leave the stove in pairs (♠, ♥) only when the assembly machine can handle them. Figure 2.37 represents the stage sequences of parts ♠ and ♥ as vertical paths on the left-hand and right-hand sides, respectively. In the middle, from top to bottom, the three circuits represent the stage sequences of the furnace, of the stove and of the machine. Each stage is labeled by a letter and Table 2.1 gives the interpretation of these stages together with their holding times.

Table 2.1: An example: stage interpretation

  Stage label   Stage interpretation                               Holding time
  A             Part ♠ waits in buffer                             0
  B             Part ♠ stays in furnace                            α1
  C             Part ♠ waits in stove                              0
  D             Part ♠ is assembled with part ♥                    α2
  E             Part ♥ waits in buffer                             0
  F             Part ♥ stays in furnace                            α3
  G             Part ♥ waits in stove                              0
  H             Part ♥ is assembled with part ♠                    α2
  I             Furnace waits for part of type ♠                   0
  J             Furnace holds part of type ♠                       α1
  K             Furnace waits for part of type ♥                   0
  L             Furnace holds part of type ♥                       α3
  M             One position in stove waits for part of type ♠     0
  N             One position in stove holds part of type ♠         0
  O             One position in stove waits for part of type ♥     0
  P             One position in stove holds part of type ♥         0
  Q             Machine assembles a pair (♠, ♥)                    α2
  R             Machine waits for a pair (♠, ♥)                    0

The transitions which are synchronized are connected by a dotted line. The initial marking assumes that the furnace starts with a part ♠ and that two positions out of the three in the stove accept a part ♠ first. Figure 2.38 shows the reduced event graph obtained by merging the synchronized transitions and the stages representing the same operation. However, we keep multiple labels when appropriate so that the order of multiplicity of places between synchronized transitions is still apparent. Single labels on places in circuits indicate possible recycling stages.
Observe that, for any transition, the total number of labels of places upstream balances the total number of labels of places downstream. Indeed, the Petri net of Figure 2.37 (with transitions connected by dotted lines merged together) is strictly conservative (Definition 2.35) whereas that of Figure 2.38 is conservative (Definition 2.36) with weights (used in that definition) indicated by the number of labels. 2.7 Notes Graph theory is a standard course in many mathematical curricula. The terminology varies, however. Here we followed the terminology as used in [67] as much as possible. Karp’s algorithm, 98 Synchronization and Linearity A E I A K E IK J B C D N L M O Q F P R Figure 2.37: Expanded model G H BJ M FL CN DHQ GP O R Figure 2.38: Reduced model the subject of §2.3.2, was first published in [73]. The proof of the Cayley-Hamilton theorem in §2.3.3 is due to Straubing [124]. Since the 1970s, a lot of attention has been paid to the theory of Petri nets. This theory has the potential of being suitable as an excellent modeling aid in many fields of application. Its modeling power is larger than the one of automata. If one were to allow inhibitor arcs, see [1], then the modeling power is the same as that of Turing machines. A ‘system theory’ including maximization, addition as well as negation (which is what an inhibitor arc represents), has not yet been developed. An excellent overview of the theory of Petri nets is given in [96], where also many other references can be found. The section on timed Petri nets, however, is rather brief in [96]. Some material on cycle times can be found in [47] and [115], which is also discussed in Chapter 9. A good, though somewhat dated, introduction to Petri nets is [108]. Other sources which contain a lot of material on Petri nets are [30] and [29]. For a recent discussion on modeling power related to Petri nets, see [82]. Section 2.5 on equations for timed Petri nets is mainly based on [39] for the constant timing case and on [11] for the general case. Equivalence of systems represented by different graphs is also discussed in [84]. This reference, however, deals with systolic systems, in which there is a fixed clock frequency. Some of the results obtained there also seem plausible within the timed event graph setting. Relationships between graphs and binary dynamic systems are described in [25]. In [92] a novel scheme, called kanban, is described and analyzed for the coordination of tokens in an event graph. The essence is that the introduction of a new circuit with enough tokens regulates the ‘speed’ of the original event graph and it controls the number of tokens in various places. For a recent development in continuous Petri nets, where the the number of tokens is real rather than integer valued, see [53] and [99]. In the latter reference some results given in Chapter 3 for ‘discrete’ Petri nets are directly extended to continuous ones. Part II Algebra 99 Chapter 3 Max-Plus Algebra 3.1 Introduction In this chapter we systematically revisit the classical algebraic structures used in conventional calculus and we substitute the idempotent semifield Rmax (the set of real numbers endowed with the operations max and plus) for the field of scalars. The purpose is to provide the mathematical tools needed to study linear dynamical systems in (Rmax )n . This chapter is divided in three main parts. In the first part we study linear systems of equations and polynomial functions in Rmax . 
In the second part we consider a more advanced topic which can be skipped in a first reading. This topic is the problem of the linear closure of Rmax and its consequences for solving systems of linear and polynomial equations (the linear closure is the extension of a set in such a way that any nondegenerated linear equation has one and only one solution). The third part is concerned with a max-plus extension of the Perron-Frobenius theory. It gives conditions under which event graphs reach a periodic behavior and it characterizes their periodicities. It can be seen as a more advanced topic of the spectral theory of max-plus matrices given in the first part of this chapter. We first introduce the algebraic structure Rmax and we study its basic properties. 3.1.1 Definitions Definition 3.1 (Semifield) A semifield K is a set endowed with two operations ⊕ and ⊗ such that: • the operation ⊕ is associative, commutative and has a zero element ε; def • the operation ⊗ defines a group on K ∗ = K \ {ε}, it is distributive with respect to ⊕ and its identity element e satisfies ε ⊗ e = e ⊗ ε = ε. We say that the semifield is • idempotent if the first operation is idempotent, that is, if a ⊕ a = a, ∀a ∈ K ; • commutative if the group is commutative. 101 102 Synchronization and Linearity Theorem 3.2 The zero element ε of an idempotent semifield is absorbing for the second operation, that is ε ⊗ a = a ⊗ ε = ε, ∀a ∈ K . Proof We have that ε = εe = ε(ε ⊕ e) = ε2 ⊕ ε = ε2 , and then, ∀a ∈ K ∗ , ε = εe = εa −1 a = ε(a −1 ⊕ ε)a = εa −1 a ⊕ ε2 a = ε2 a = εa . Definition 3.3 (The algebraic structure Rmax ) The symbol Rmax denotes the set R ∪ {−∞} with max and + as the two binary operations ⊕ and ⊗, respectively. We call this structure the max-plus algebra. Sometimes this is also called an ordered group. We remark that the natural order on Rmax may be defined using the ⊕ operation a≤b if a ⊕ b = b . Definition 3.4 (The algebraic structure Rmax ) The set R ∪ {−∞} ∪ {+∞} endowed with the operations max and + as ⊕ and ⊗ and with the convention that (−∞) +∞ = −∞ is denoted Rmax . The element +∞ is denoted . Theorem 3.5 The algebraic structure Rmax is an idempotent commutative semifield. The proof is straightforward. If we compare the properties of ⊕ and ⊗ with those of + and ×, we see that: • we have lost the symmetry of addition (for a given a , an element b does not exist such that max (b, a ) = −∞ whenever a = −∞); • we have gained the idempotency of addition; • there are no zero divisors in Rmax (a ⊕ b = −∞ ⇒ a = −∞ or b = −∞). If we try to make some algebraic calculations in this structure, we soon realize that idempotency is as useful as the existence of a symmetric element in the simplification of formulæ. For example, the analogue of the binomial formula (a + b)n = n 0 an + n 1 a n −1 b + · · · + n n −1 a b n −1 + n 0 bn is n max(a , b ) = max(na , nb ), which is much simpler. On the other hand, we now face the difficulty that the max operation is no longer cancellative, e.g. max(a , b ) = b does not imply that a = −∞. 3.1. Introduction 3.1.2 103 Notation First of all, to emphasize the analogy with conventional calculus, max has been denoted ◦ ⊕ , and + has been denoted ⊗. We also introduce the symbol / for the conventional − (the inverse operation of + which plays the role of multiplication, that is, the ‘di◦ ◦ vision’). Hence a/b means a − b . Another notation for a/b is the two-dimensional display notation a . b We will omit the sign ⊗ if this does not lead to confusion. 
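For readers who want to experiment with these rules, the following short Python sketch (the helper names oplus, otimes and power are ours, not the book's notation) encodes the two operations of Rmax and checks the idempotency of ⊕ and the simplified 'binomial formula' (a ⊕ b)^n = a^n ⊕ b^n on sample values.

```python
# A minimal sketch of the scalar structure R_max: (+) is max, (x) is the
# conventional +, epsilon is -infinity and e is 0.
EPS = float("-inf")            # zero element epsilon
E = 0.0                        # identity element e

def oplus(a, b):               # a (+) b = max(a, b)
    return max(a, b)

def otimes(a, b):              # a (x) b = a + b, with epsilon absorbing
    return EPS if EPS in (a, b) else a + b

def power(a, n):               # a^n, that is, n * a in conventional notation
    if a == EPS:
        return E if n == 0 else EPS
    return n * a

a, b, n = 2.0, 5.0, 3
assert oplus(a, a) == a                                   # idempotency of (+)
assert otimes(a, b) == 7.0                                # 2 (x) 3 would be 5, here 2 (x) 5 = 7
assert power(oplus(a, b), n) == oplus(power(a, n), power(b, n))   # (a (+) b)^n = a^n (+) b^n
```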
To prevent mistakes, we use ε and e for the ‘zero’ and the ‘one’, that is, the neutral elements of ⊕ and ⊗, respectively, namely −∞ and 0. To get the reader acquainted with this new notation, we propose the following table. Rmax notation Conventional notation = 2⊕3 max(2, 3) 3 1⊕2⊕3⊕4⊕5 max (1, 2, 3, 4, 5) 5 2⊗3 = 5 2+3 5 2⊕ε max(2, −∞) 2 ε=ε⊗2 −∞ + 2 −∞ (−1) ⊗ 3 −1 + 3 2 e⊗3 0+3 3 32 = 23 = 3 ⊗ 3 = 2 ⊗ 2 ⊗ 2 3 × 2 = 2 × 3 = 3 + 3 = 2 + 2 + 2 6 e = e 2 = 20 0×2 = 2×0 0 ◦ (2 ⊗ 3)/(2 ⊕ 3) (2 + 3) − max(2, 3) 2 (2 ⊕ 3)3 = 23 ⊕ 33 3 × max(2, 3) = max(3 × 2, 3 × 3) 9 ◦ 6/e 6−0 6 ◦ e/ 0−3 −3 √3 2 8 8/ 2 4 √ 5 15 15/5 3 There is no distinction, hence there are risks of confusion, between the two systems of notation as far as the power operation is concerned. As a general rule, a formula is written in one system of notation. Therefore if, in a formula, an operator of the max-plus algebra appears explicitly, then usually all the operators of this formula are max-plus operators. 3.1.3 The min Operation in the Max-Plus Algebra It is possible to derive the min operation from the two operations ⊗ and ⊕ as follows: min(a , b ) = ab . a⊕b Let us now prove the classical properties of the min by pure rational calculations in the max-plus algebra. 104 Synchronization and Linearity • − min(a , b ) = max (−a , −b ): e a⊕b e e = = ⊕ ; ab ab b a a⊕b • − max (a , b ) = min(−a , −b ): e e e ⊗ e b; = ab = a e e a⊕b a⊕b ⊕ a b ab • min(a , min(b , c)) = min(min(a , b ), c): bc abc b⊕c = bc ab ⊕ ac ⊕ bc a⊕ b⊕c a and the symmetry of the formula with respect to a , b and c proves the result; • max (c, min(a , b )) = min(max (c, a ), max (c, b )): c⊕ ab (c ⊕ a )(c ⊕ b ) = a⊕b (c ⊕ a ) ⊕ (c ⊕ b ) ⇔ ca ⊕ cb ⊕ ab (c ⊕ a )(c ⊕ b ) = a⊕b a⊕b⊕c ⇔ {(ca ⊕ cb ⊕ ab )(a ⊕ b ⊕ c) = (c ⊕ a )(c ⊕ b )(a ⊕ b )} . To check the last identity, we consider the expressions in both sides as polynomials in c and we first remark that the coefficient of c2 , namely a ⊕ b , is the same in both sides. The coefficient of c0 , namely ab (a ⊕ b ), also is the same in both sides. Now, considering the coefficient of c, it is equal to (a ⊕ b )2 ⊕ ab in the left-hand side, and to (a ⊕ b )2 in the right-hand side: these two expressions are clearly always equal. • min(c, max (a , b )) = max (min(c, a ), min(c, b )): c(a ⊕ b ) ca cb = ⊕ c⊕a⊕b c⊕a c⊕b ⇔ a⊕b a b = ⊕ c⊕a⊕b c⊕a c⊕b . The latter identity is amenable to the same verification as earlier. 3.2 Matrices in Rmax In this section we are mainly concerned with systems of linear equations. There are two kinds of linear systems in Rmax for which we are able to compute solutions: x = Ax ⊕ b and Ax = b (the general system being Ax ⊕ b = Cx ⊕ d ). We also study the spectral theory of matrices. There exist good notions of eigenvalue and eigenvector but there is often only one eigenvalue: this occurs when the precedence graph associated with the matrix is strongly connected (see Theorem 2.14). 3.2. Matrices in Rmax 3.2.1 105 Linear and Affine Scalar Functions Definition 3.6 (Linear function) The function f : Rmax → Rmax is linear if it satisfies f (c) = c ⊗ f (e) , ∀c ∈ Rmax . Thus any linear function is of the form y = f (c) = a ⊗ c, where a = f (e). The graph of such a function consists of a straight line with slope equal to one and which intersects the y -axis at a (see Figure 3.1). Definition 3.7 (Affine function) The function f : Rmax → Rmax , f (c) = ac ⊕ b , a ∈ Rmax , b ∈ Rmax is called affine. 
◦ Observe that, as usual, b = f (ε) and a = limc→∞ f (c)/c, but here the limit is reached for a finite value of x (see Figure 3.2). Rmax a b Rmax y = ac y slo = pe ac = 1 ⊕ b Rmax a Rmax 0 0 Figure 3.1: A linear function Figure 3.2: An affine function In the primary school, the first algebraic problem that we have to solve is to find the solution of a scalar affine equation. Definition 3.8 (Affine equation) The general scalar affine equation is ax ⊕ b = a x ⊕ b . (3.1) Indeed, since ⊕ has no inverse, Equation (3.1) cannot be reduced to the usual form ax ⊕ b = ε, which motivates the definition above. Theorem 3.9 The solution of the general scalar affine equation is obtained as follows: • if ((a < a ) and (b < b )) or ((a < a ) and (b < b )) (3.2) ◦ hold true, then the solution is unique and it is given by x = (b ⊕ b )/(a ⊕ a ); • if a = a , b = b , and (3.2) does not hold, no solutions exist in Rmax ; • if a = a and b = b , the solution is nonunique and all solutions are given by ◦ x ≥ (b ⊕ b )/a; 106 Synchronization and Linearity • if a = a and b = b , the solution is nonunique and all solutions are given by ◦ x ≤ b/(a ⊕ a ); • if a = a and b = b , all x ∈ R are solutions. The proof is straightforward from the geometric interpretation of affine equations as depicted in Figure 3.3. = y y = a ax x ⊕ ⊕ b b Rmax Rmax 0 Figure 3.3: An affine equation In practice, it is better to simplify (3.1) before solving it. For example, if a > a and b > b , then ax ⊕ b = a x ⊕ b ⇔ ax = b . Let us give all different kinds of simplified equations that may come up. Definition 3.10 (Canonical form of an affine equation) An affine equation is in canonical form if it is in one of the simplified forms: • ax = b; • ax ⊕ b = ε; • ax ⊕ b = ax; • ax ⊕ b = b. 3.2.2 Structures The moduloid structure is a kind of module structure which, in turn, is a kind of vector space, that is, a set of vectors with an internal operation and an external operation defined over an idempotent semifield. Definition 3.11 (Moduloid) A moduloid M over an idempotent semifield K (with operations ⊕ and ⊗, zero element ε and identity element e) is a set endowed with • an internal operation also denoted ⊕ with a zero element also denoted ε; • an external operation defined on K × M with values in M indicated by the simple juxtaposition of the scalar and vector symbols; which satisfies the following properties: 3.2. Matrices in Rmax 107 • ⊕ is associative, commutative; • α(x ⊕ y ) = α x ⊕ α y; • (α ⊕ β)x = α x ⊕ β x; • α(β x ) = (αβ)x; • ex = x; • ε x = ε; for all α, β ∈ K and all x , y ∈ M. We will only be concerned with some special cases of such a structure. Example 3.12 (Rmax )n is a moduloid over Rmax . Its zero element is (ε, . . . , ε) . Other examples of moduloids will be presented later on. Definition 3.13 (Idempotent algebra) A moduloid with an additional internal operation also denoted ⊗ is called an idempotent algebra if ⊗ is associative, if it has an identity element also denoted e, and if it is distributive with respect to ⊕. This idempotent algebra is the main structure in which the forthcoming system theory is going to be developed. Example 3.14 Let (Rmax )n×n be the set of n × n matrices with coefficients in Rmax endowed with the following two internal operations: • the componentwise addition denoted ⊕; • the matrix multiplication already used in Chapters 1 and 2 denoted ⊗: n ( A ⊗ B )i j = Aik ⊗ Bk j ; k =1 and the external operation: • ∀α ∈ Rmax , ∀ A ∈ (Rmax )n×n , α A = (α Ai j ). 
The set (Rmax)^{n×n} is an idempotent algebra with

• the zero matrix, again denoted ε, which has all its entries equal to ε;
• the identity matrix, again denoted e, which has its diagonal entries equal to e and its other entries equal to ε.

3.2.3 Systems of Linear Equations in (Rmax)^n

In this subsection we are mainly interested in systems of linear equations. To define such systems, we use matrix notation. A linear mapping (Rmax)^n → (Rmax)^n will be represented by a matrix as was done in the previous chapters. With the max-plus algebra, the general system of equations is Ax ⊕ b = Cx ⊕ d, where A and C are n × n matrices and b and d are n-vectors. This system can be put in canonical form in the same way as we have done in the scalar case.

Definition 3.15 (Canonical form of a system of affine equations) The system Ax ⊕ b = Cx ⊕ d is said to be in canonical form if A, C, b, and d satisfy

• C_{ij} = ε if A_{ij} > C_{ij}, and A_{ij} = ε if A_{ij} < C_{ij};
• d_i = ε if b_i > d_i, and b_i = ε if b_i < d_i.

Example 3.16 Consider the system
\[
\begin{pmatrix} 3 & 2 \\ \varepsilon & 2 \end{pmatrix} x \oplus \begin{pmatrix} 1 \\ 2 \end{pmatrix}
= \begin{pmatrix} 4 & 1 \\ 1 & 1 \end{pmatrix} x \oplus \begin{pmatrix} e \\ 3 \end{pmatrix} ,
\]
which can be simplified as follows:
\[
\begin{pmatrix} \varepsilon & 2 \\ \varepsilon & 2 \end{pmatrix} x \oplus \begin{pmatrix} 1 \\ \varepsilon \end{pmatrix}
= \begin{pmatrix} 4 & \varepsilon \\ 1 & \varepsilon \end{pmatrix} x \oplus \begin{pmatrix} \varepsilon \\ 3 \end{pmatrix} ,
\]
which implies
\[
2x_2 \oplus 1 = 4x_1 , \quad 2x_2 = 1x_1 \oplus 3
\;\Rightarrow\; 4x_1 = 1x_1 \oplus 3 \;\Rightarrow\; 4x_1 = 3 \;\Rightarrow\; x_1 = -1 \;\Rightarrow\; x_2 = 1 .
\]
This system has a solution. In general, a linear system may or may not have a solution. Moreover, even if a solution exists, it may be nonunique. There are two classes of linear systems for which we have a satisfactory theory, namely,

• x = Ax ⊕ b;
• Ax = b.

Let us study the former case first.

3.2.3.1 Solution of x = Ax ⊕ b

Theorem 3.17 If there are only circuits of nonpositive weight in G(A), there is a solution to x = Ax ⊕ b which is given by x = A*b. Moreover, if the circuit weights are negative, the solution is unique.

The reader should recall the definition of A* as given by (1.22).

Proof If A*b does exist, it is a solution; as a matter of fact, A(A*b) ⊕ b = (e ⊕ AA*)b = A*b.

Existence of A*b. The meaning of (A*)_{ij} is the maximum weight of all paths of any length from j to i. Thus, a necessary and sufficient condition for the existence of (A*)_{ij} is that no strongly connected component of G(A) has a circuit with positive weight. Otherwise, there would exist a path from j to i of arbitrarily large weight for all j and i belonging to the strongly connected component which includes the circuit of positive weight (by traversing this circuit a sufficient number of times).

Uniqueness of the solution. Suppose that x is a solution of x = Ax ⊕ b. Then x satisfies
\[
x = b \oplus Ab \oplus A^2 x , \quad \ldots \quad , \quad x = b \oplus Ab \oplus \cdots \oplus A^{k-1} b \oplus A^k x , \qquad (3.3)
\]
and thus x ≥ A*b. Moreover, if all the circuits of the graph have negative weights, then A^k → ε when k → ∞. Indeed, the entries of A^k are the weights of the paths of length k, which necessarily traverse some circuits of A a number of times going to ∞ with k, but the weights of these circuits are all negative. Using this property in Equation (3.3) for k large enough, we obtain that x = A*b.

Remark 3.18 If the maximum circuit weight is zero, a solution does exist, but there is no uniqueness anymore. For example, the equation x = x ⊕ b admits the solution x = b (here A*b = b), but all x > b are solutions too.

Example 3.19 Consider the two-dimensional equation
\[
x = \begin{pmatrix} -1 & 2 \\ -3 & -1 \end{pmatrix} x \oplus \begin{pmatrix} e \\ 2 \end{pmatrix} .
\]
Then,
\[
b = \begin{pmatrix} e \\ 2 \end{pmatrix} , \quad
Ab = \begin{pmatrix} 4 \\ 1 \end{pmatrix} , \quad
A^2 b = \begin{pmatrix} 3 \\ 1 \end{pmatrix} , \quad
A^3 b = \begin{pmatrix} 3 \\ e \end{pmatrix} , \quad
A^4 b = \begin{pmatrix} 2 \\ e \end{pmatrix} , \quad \ldots .
\]
Thus,
\[
x = \begin{pmatrix} e \\ 2 \end{pmatrix} \oplus \begin{pmatrix} 4 \\ 1 \end{pmatrix} \oplus \begin{pmatrix} 3 \\ 1 \end{pmatrix} \oplus \cdots
= \begin{pmatrix} 4 \\ 2 \end{pmatrix} .
\]
This is the unique solution.
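The computations of Theorem 3.17 and Example 3.19 are easy to reproduce numerically. The sketch below (an illustration under the stated hypotheses, with helper names of our own) implements ⊕ and ⊗ for matrices, truncates the star A* at A^{n-1} (which suffices when no circuit of G(A) has positive weight, as stated in the theorem that follows), and recovers the solution (4, 2)' of Example 3.19.

```python
# A sketch of max-plus matrix arithmetic and of the solution x = A*b of
# x = Ax (+) b; the helper names are ours, not the book's notation.
EPS = float("-inf")            # epsilon, the zero element

def mat_add(A, B):             # entrywise (+) = max
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_mul(A, B):             # (A (x) B)_ij = max_k (A_ik + B_kj)
    return [[max(A[i][k] + B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def identity(n):               # diagonal entries e = 0, other entries epsilon
    return [[0.0 if i == j else EPS for j in range(n)] for i in range(n)]

def star(A):
    # A* truncated at A^(n-1): enough when no circuit of G(A) has positive weight
    S, P = identity(len(A)), identity(len(A))
    for _ in range(len(A) - 1):
        P = mat_mul(P, A)
        S = mat_add(S, P)
    return S

# Example 3.19: A = [[-1, 2], [-3, -1]], b = (e, 2)'
A = [[-1.0, 2.0], [-3.0, -1.0]]
b = [[0.0], [2.0]]
print(mat_mul(star(A), b))     # [[4.0], [2.0]], the unique solution of x = Ax (+) b
```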
110 Synchronization and Linearity Theorem 3.20 If G ( A) has no circuit with positive weight, then A∗ = e ⊕ A ⊕ · · · ⊕ An−1 , where n is the dimension of matrix A. Proof All the paths of length greater than or equal to n are necessarily made up of a circuit and of a path with length strictly less than n . Therefore, because the weights of circuits are nonpositive by assumption, we have Am ≤ e ⊕ · · · ⊕ An−1 . ∀m ≥ n , Returning to Example 3.19, we remark that A2 b , A3 b , . . . are less than b ⊕ Ab. 3.2.3.2 Solution of Ax = b The second class of linear systems for which we can obtain a general result consists of the systems Ax = b . However, we must first consider the problem in Rmax rather than in Rmax , and second, we must somewhat weaken the notion of ‘solution’. A subsolution of Ax = b is an x which satisfies Ax ≤ b , where the order relation on the vectors can also be defined by x ≤ y if x ⊕ y = y . Theorem 3.21 Given an n × n matrix A and an n-vector b with entries in Rmax , the greatest subsolution of Ax = b exists and is given by −x j = max (−bi + Ai j ) . i For reasons that will become apparent in §4.4.4 and §4.6.2, the vector form of this ◦ ◦ formula can be written e/ x = (e/ b ) A. Proof We have that { Ax ≤ b } ⇔ Ai j x j ≤ bi , ∀i j ⇔ x j ≤ bi − Ai j , ⇔ x j ≤ min bi − Ai j ⇔ −x j ≥ max −bi + Ai j i i ∀i, j , ∀j , ∀j . Conversely, it can be checked similarly that the vector x defined by −x j = maxi −bi + Ai j , ∀ j , is a subsolution. Therefore, it is the greatest one. As a consequence, in order to attempt to solve the system Ax = b , we may first compute its greatest subsolution and then check by inspection whether it satisfies the equality. 3.2. Matrices in Rmax 111 Example 3.22 Let us compute the greatest subsolution of the following equality: 23 45 x1 x2 = 6 7 . According to the preceding considerations, let us first compute ◦ (e/ b ) A = −6 2 4 −7 3 5 = −3 −2 . Then the greatest subsolution is (x 1 , x 2 ) = (3, 2); indeed, 23 45 3 2 5 7 = ≤ 6 7 . It is easily verified that the second inequality would not be satisfied if we increase x 1 and/or x 2 . Therefore, the first inequality cannot be reduced to an equality. 3.2.4 Spectral Theory of Matrices Given a matrix A with entries in Rmax , we consider the problem of existence of eigenvalues and eigenvectors, that is, the existence of (nonzero) λ and x such that Ax = λ x . (3.4) The main result is as follows. Theorem 3.23 If A is irreducible, or equivalently if G ( A) is strongly connected, there exists one and only one eigenvalue (but possibly several eigenvectors). This eigenvalue is equal to the maximum cycle mean of the graph (see § 2.3.2): λ = max ζ |ζ |w , |ζ |l where ζ ranges over the set of circuits of G ( A). Proof def ◦ ◦ Existence of x and λ. Consider matrix B = A/λ = (e/λ) A, where λ = maxζ |ζ |w /|ζ |l . The maximum circuit weight of G ( B ) is e. Hence B ∗ and B + = B B ∗ exist. Matrix B + has some columns with diagonal entries equal to e. To prove this claim, pick a node k of a circuit ξ such that ξ ∈ arg maxζ |ζ |w /|ζ |l . + The maximum weight of paths from k to k is e. Therefore we have e = Bkk . Let B·k denote the k -th column of B . Then, since, generally speaking, B + = B B ∗ and B ∗ = e ⊕ B + (e the identity matrix), for that k , B·+ = B·∗ ⇒ B B·∗ = B·+ = B·∗ ⇒ AB·∗ = λ B·∗ . k k k k k k k Hence x = B·∗ = B·+ is an eigenvector of A corresponding to the eigenvalue k k λ. The set of nodes of G ( A) corresponding to nonzero entries of x is called the support of x . 
112 Synchronization and Linearity Graph interpretation of λ. If λ satisfies Equation (3.4), there exists a nonzero component of x , say x i1 . Then we have ( Ax )i1 = λ x i1 and there exists an index i2 such that Ai1 i2 x i2 = λ x i1 . Hence x i2 = ε and A1i2 = ε. We can repeat this argument and obtain a sequence {i j } such that Ai j −1 i j x i j = λ x i j −1 , x i j = ε and Ai j −1 i j = ε. At some stage we must reach an index il already encountered in the sequence since the number of nodes is finite. Therefore, we obtain a circuit β = (il , im , . . . , il+1 , il ). By multiplication along this circuit, we obtain Ail il+1 Ail+1 il+2 . . . Aim il x il+1 x il+2 . . . x im x il = λm −l+1 x il x il+1 . . . x im . Since x i j = ε for all i j , we may simplify the equation above which shows that λm −l+1 is the weight of the circuit of length m − l + 1, or, otherwise stated, λ is the average weight of circuit β . Observe that this part of the proof did not use the irreducibility assumption. If A is irreducible, all the components of x are different from ε. Suppose that the support of x does not cover the whole graph. Then, there are arcs going from the support of x to other nodes because the graph G ( A) has only one strongly connected component. Therefore, the support of Ax is larger than the support of x , which contradicts Equation (3.4). Uniqueness in the irreducible case. Consider any circuit γ = (i1 , . . . , i p , i1 ) such that its nodes belong to the support of x (here any node of G ( A)). We have Ai2 i1 x i1 ≤ λ x i2 , ... , Ai p i p−1 x i p−1 ≤ λ x i p , Ai1 i p x i p ≤ λ x i1 . Hence, by the same argument as in the paragraph on the graph interpretation of λ, we see that λ is greater than the average weight of γ . Therefore λ is the maximum cycle mean and thus it is unique. It is important to understand the role of the support of x in the previous proof. If G ( A) is not strongly connected, the support of x is not necessarily the whole set of nodes and, in general, there is no unique eigenvalue (see Example 3.26 below). Remark 3.24 The part of the proof on the graph interpretation of λ indeed showed that, for a general matrix A, any eigenvalue is equal to some cycle mean. Therefore the maximum cycle mean is equal to the maximum eigenvalue of the matrix. Example 3.25 (Nonunique eigenvector) With the only assumption of Theorem 3.23 on irreducibility, the uniqueness of the eigenvector is not guaranteed as is shown by the following example: 1e e1 e −1 = 1 e =1 e −1 , 1 e −1 e = e 1 =1 −1 e . and e 1 The two eigenvectors are obviously not ‘proportional’. 3.2. Matrices in Rmax 113 Example 3.26 ( A not irreducible) • The following example is a trivial counterexample to the uniqueness of the eigenvalue when G ( A) is not connected: 1ε ε2 e ε e ε =1 1 ε , ε 2 ε e =2 ε e . • In the following example G ( A) is connected but not strongly connected. Nevertheless there is only one eigenvalue: 1e εe e ε =1 e ε 1 ε a e =λ a e , but e e has no solutions because the second equation implies λ = e, and then the first equation has no solutions for the unknown a . • In the following example G ( A) is connected but not strongly connected and there are two eigenvalues: e ε e 1 e ε =e e ε e ε , e 1 e 1 =1 e 1 . More generally, consider the block triangular matrix F= A B ε C , where G ( A) and G (C ) are strongly connected, and G (C ) is downstream of G ( A). Let λ A and λC , be the eigenvalues of blocks A and C , respectively, and let x A and x C be the corresponding eigenvectors. 
Observe that ε x C is an eigenvector of F for the ◦ eigenvalue λC . In addition, if λ A > λC , the expression (C/λ A )∗ is well-defined. The vector xA ◦ ◦ (C/λ A )∗ ( B/λ A )x A is an eigenvector of F for the eigenvalue λ A . In conclusion, F has two eigenvalues if the upstream m.s.c.s. is ‘slower’ than the downstream one. Clearly this kind of result can be generalized to a decomposition into an arbitrary number of blocks. This generalization will not be considered here (see [62]). 114 3.2.5 Synchronization and Linearity Application to Event Graphs Consider an autonomous event graph with n transitions, that is, an event graph without sources, with constant holding times and zero firing times. In §2.5.2 we saw that it can be modeled by M x (k ) = A(i ) x (k − i ) , (3.5) i =0 where the A(i ) are n × n matrices with entries in Rmax . We assume that the event graph (in which transitions are viewed as the nodes and places as the arcs) is strongly connected. In §2.5.4, it was shown that an equation in the standard form x (k + 1) = A x (k ) (3.6) can also describe the same physical system. A new event graph (with a different number of transitions in general) can be associated with (3.6). In this new event graph, each place has exactly one token in the initial marking. Therefore, in this graph, the length of a circuit or path can either be defined as the total number of arcs, or as the total number of tokens in the initial marking along this circuit or path. We refer the reader to the transformations explained in §2.5.2 to §2.5.4 to see that some transitions in the two graphs can be identified to each other and that the circuits are in one-to-one correspondence. Since the original event graph is assumed to be strongly connected, the new event graph can also be assumed to be strongly connected, provided unnecessary transitions (not involved in circuits) be canceled. Then A is irreducible. Hence there exists a unique eigenvalue λ, and at least one eigenvector x . By starting the recurrence in (3.6) with the initial value x (0) = x , we obtain that x (k ) = λk x for all k in N. Therefore, a token leaves each transition every λ units of time, or, otherwise stated, the throughput of each transition is 1/λ. It was shown that λ can be evaluated as the maximum cycle mean of G A , that is, as the maximum ratio ‘weight divided by length’ over all the circuits of G A . The purpose of the following theorem is to show that λ can also be evaluated as the same maximum ratio over the circuits of the original graph, provided the length of an arc be understood as the number of tokens in the initial marking of the corresponding place (the weight is still defined as the holding time). Let us return to (3.5). The graph G ( A(i )) describes the subgraph of the original graph obtained by retaining only the arcs corresponding to places marked with i tokens. Since A(i ) is an n × n matrix, all original nodes are retained in this subgraph which is however not necessarily connected. Consider the following n × n matrix: M λ−i A(i ) , def B (λ) = i =0 where λ is any real number. Remark 3.27 For a given value of λ, a circuit of B (λ) can be defined by a sequence of nodes (in which the last node equals the first). Indeed, once this sequence of nodes 3.2. Matrices in Rmax 115 is given, the arc between a pair of successive nodes (a , b ) is selected by the argument iλ (a , b ) of the maxi which is implicit in the expression of ( B (λ))ba . 
If this iλ (a , b ) is not unique, it does not matter which one is selected since any choice leads to the same weight. Therefore, the set of circuits of G ( B (λ)) is a subset of the set of circuits of the original graph (a circuit of the original graph can be specified by a given sequence of nodes and by a mapping (a , b ) → i (a , b ) in order to specify one of the possible parallel arcs between nodes a and b ). The set of circuits of G ( B (λ)) is thus changing with λ. However, for any given value of λ, if we are only interested in the maximum circuit weight of G ( B (λ)) (or in the maximum cycle mean, assuming that the length of a circuit is defined as the number of arcs in G ( B (λ))), the maximum can be taken over the whole set of circuits of the original graph (this set is independent of λ). Indeed, the additional circuits thus considered do not contribute to the maximum since they correspond to choices of i (a , b ) which are not optimal. Theorem 3.28 We assume that 1. G ( B (e)) is strongly connected; 2. G ( A(0)) has circuits of negative weight only; 3. there exists at least one circuit of G ( B (λ)) containing at least one token. Then, there exist a vector x and a unique scalar λ satisfying x = B (λ)x. The graph interpretation of λ is λ = max ζ |ζ |w , |ζ |t (3.7) where ζ ranges over the set of circuits of the original event graph and |ζ |t denotes the number of tokens in circuit ζ . Proof To solve the equation x = B (λ)x , we must find λ and x such that e is an eigenvalue of the matrix B (λ). The graph G ( B (e)) being strongly connected, G ( B (λ)) is also strongly connected for any real value of λ and therefore B (λ) admits a unique eigenvalue (λ). Owing to the graph interpretation of (λ), (λ) = maxζ |ζ |w /|ζ |l , where ζ ranges over the set of circuits of G ( B (λ)). However, Remark 3.27 showed that we can as well consider that ζ ranges over the set of circuits of the original graph. Hence, in conventional notation, we have 1 ( B (λ))ba (λ) = max ζ |ζ |l (a ,b )∈ζ 1 = max max (( A(i ))ba − i × λ) ζ i ∈{1,... , M } |ζ |l (a ,b )∈ζ 1 max (( A(i (a , b ))ba − i (a , b ) × λ) . (3.8) = max ζ |ζ |l i (a ,b ) (a ,b )∈ζ 116 Synchronization and Linearity If we assume that there exists a λ such that (λ) = e = 0, then, for any circuit ζ of the original graph, and thus for any mapping i (·, ·) which completes the specification of the circuit, we have (a ,b )∈ζ ( A(i (a , b )))ba λ≥ , (a ,b )∈ζ i (a , b ) and the equality is obtained for some circuit ζ . This justifies the interpretation (3.7) of λ. Let us now prove that (λ) = e has a unique solution. Because of (3.8), and since, according to Remark 3.27, the mappings i (·, ·) can be viewed as ranging in a set independent of λ, (·) is the upper hull of a collection of affine functions (of λ). Each affine function has a nonpositive slope which, in absolute value, equals the number of tokens in a circuit divided by the number of arcs in the circuit. Therefore, is a nonincreasing function of λ. Moreover, due to the third assumption of the theorem, there is at least one strictly negative slope. Hence limλ→−∞ (λ) = +∞. On the other hand, owing to the second assumption, and since the affine functions with zero slope stem necessarily from the circuits of A(0), limλ→+∞ (λ) < 0. Finally is a convex nonincreasing function which decreases from + to a strictly negative value, and thus its graph crosses the x -axis at a single point. It is easy to see that if we start the recurrence (3.5) with x (0) = x , x (1) = λ ⊗ x , . . . 
, x ( M ) = λ M ⊗ x , then x (k ) = λk ⊗ x for all k . Hence 1/λ is the throughput of the system at the periodic regime. At the end of this chapter, we will give conditions under which this regime is asymptotically reached, and conditions under which it is reached after a finite time, whatever the initial condition is. 3.3 Scalar Functions in Rmax In this section we discuss nonlinear real-valued functions of one real variable, considered as mappings from Rmax into Rmax . We classify them in polynomial, rational and algebraic functions in the max-plus algebra sense. 3.3.1 Polynomial Functions P (Rmax) Polynomial functions are a subset of piecewise linear functions (in the conventional sense) for which we have the analogue of the fundamental theorem of algebra. This set is not isomorphic to the set of formal polynomials of Rmax , that is, the set of finite sequences endowed with a product which is the sup-convolution of sequences. 3.3.1.1 Formal Polynomials and Polynomial Functions Definition 3.29 (Formal polynomials) We consider the set of finite real sequences of any length p = ( p (k ), . . . , p (i ) . . . p (n )) , k , i, n ∈ N , p (i ) ∈ Rmax . 3.3. Scalar Functions in Rmax 117 If the extreme values k and n are such that p (k ) and p (n ) are different from ε, then def def val( p ) = k is called the valuation of p, and deg( p ) = n is called the degree of p. This set is endowed with the following two internal operations: • componentwise addition ⊕; • sup-convolution ⊗ of sequences, that is, def ( p ⊗ q )(l ) = p (i )q ( j ) , i + j =l val( p )≤i ≤deg( p ) val(q )≤ j ≤deg(q ) and with the following external operation involving scalars in Rmax : • multiplication of all the elements of the sequence p by the same scalar of Rmax , We thus define Rmax [γ ] which is called the set of formal polynomials. Note that if the polynomial γ is defined by γ (k ) = e ε if k = 1 ; otherwise, then any polynomial of Rmax [γ ] can be written as p = ln=k p (l )γ l . Let us give a list of definitions related to the notion of formal polynomials. Definition 3.30 Polynomial functions: associated with a formal polynomial p, we define the polynomial function by p : Rmax → Rmax , c → p (c) = p (k )ck ⊕ · · · ⊕ p (n )cn . The set of polynomial functions is denoted P (Rmax ). Support: the support supp( p ) is the set of indices of the nonzero elements of p, that is, supp( p ) = {i | k ≤ i ≤ n , p (i ) = ε}. Monomial: a formal polynomial reduced to a sequence of one element is called a monomial1. Head monomial: the monomial of highest degree one can extract from a polynomial p, that is p (n )γ n , is called the head monomial. Tail monomial: the monomial of lowest degree out of p, that is, p (k )γ k , is called the tail monomial. 1 We do not to make the distinction between formal monomials and monomial functions because, unlike polynomials-see Remark 3.34 below-a formal monomial is in one-to-one correspondence with its associated function. 118 Synchronization and Linearity Full Support: we say that a formal polynomial has a full support if p (i ) = ε , ∀i : k ≤ i ≤ n . The following two theorems are obvious. Theorem 3.31 The set of formal polynomials Rmax [γ ] is an idempotent algebra. Remark 3.32 Because we can identify scalars with monomials of degree 0, this idempotent algebra can be viewed as the idempotent semiring obtained by considering the two internal operations only, since the external multiplication by a scalar is amenable to the internal multiplication by a monomial. 
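The following sketch (the dictionary representation and helper names are ours) encodes a formal polynomial by its finite family of coefficients, implements the componentwise ⊕ and the sup-convolution ⊗ of Definition 3.29, and evaluates the associated polynomial function of Definition 3.30. The final assertions check on sample points that evaluation turns the formal sum and product into the pointwise sum and product, the property stated formally below as the evaluation homomorphism.

```python
# Sketch: a formal polynomial as a dict {degree: coefficient}; the associated
# polynomial function is, in conventional terms, c -> max over l of (p(l) + l*c).
NEG = float("-inf")

def p_add(p, q):                      # componentwise (+)
    return {k: max(p.get(k, NEG), q.get(k, NEG)) for k in set(p) | set(q)}

def p_mul(p, q):                      # sup-convolution (x)
    out = {}
    for i, a in p.items():
        for j, b in q.items():
            out[i + j] = max(out.get(i + j, NEG), a + b)
    return out

def evaluate(p, c):                   # the polynomial function p^(c)
    return max(a + k * c for k, a in p.items())

p = {1: 0.0, 0: 1.0}                  # gamma (+) 1
q = {1: 0.0, 0: 2.0}                  # gamma (+) 2
assert all(evaluate(p_add(p, q), c) == max(evaluate(p, c), evaluate(q, c))
           for c in range(-5, 6))
assert all(evaluate(p_mul(p, q), c) == evaluate(p, c) + evaluate(q, c)
           for c in range(-5, 6))
```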
Theorem 3.33 The set of polynomial functions P (Rmax ) endowed with the two internal operations: def • pointwise addition denoted ⊕, that is, ( p ⊕ q )(c) = p (c) ⊕ q (c); def • pointwise multiplication denoted ⊗, that is, ( p ⊗ q )(c) = p (c) ⊗ q (c), and the external operation over Rmax × P (Rmax ), namely, def • (b p)(c) = b ⊗ p (c), is an idempotent algebra. The same remark as above applies to this idempotent algebra too. Polynomial functions are convex piecewise linear integer-sloped nondecreasing functions (see Figure 3.4). Indeed the monomial p (i )ci is nothing but the conventional affine function ic + p(i ). Owing to the meaning of addition of monomials, polynomial functions are thus upper hulls of such affine functions. Remark 3.34 There is no one-to-one correspondence between formal polynomials and polynomial functions. For example, ∀c , c2 ⊕ 2 = (c ⊕ 1)2 = c2 ⊕ 1c ⊕ 2 . The monomial 1c is dominated by c2 ⊕ 2. In other words, 1c does not contribute to the graph of c2 ⊕ 1c ⊕ 2 (see Figure 3.5), and thus, two different formal polynomials are associated with the same function. The following lemma should be obvious. Lemma 3.35 (Evaluation homomorphism) Consider the mapping F : Rmax [γ ] → P (Rmax ), p → p . Then, F is a homomorphism between the algebraic structures defined in Theorems 3.31 and 3.33. It will be referred to as the evaluation homomorphism. 3.3. Scalar Functions in Rmax 119 Rmax Rmax 2 2 1 1 Rmax 0 0 Rmax −1 Figure 3.4: The graph of y = (−1)c2 ⊕ 1c ⊕ 2 Figure 3.5: The graph of y = (c ⊕ 1)2 Unlike in conventional algebra, the evaluation homomorphism is not a one-to-one correspondence, as shown in Remark 3.34. More precisely, it is surjective but not injective. This is why it is important to distinguish between formal objects and their associated numerical functions. Remark 3.36 The mapping F is in fact closely related to the Fenchel transform [58]. In convexity theory [119], with a numerical function f over a Hilbert space V , the Fenchel transform Fe associates a numerical function Fe ( f ) over the dual Hilbert space V ∗ as follows: ∀c ∈ V ∗ , [Fe( f )] (c) = sup ( c, z − f (z )) , z∈V where ·, · denotes the duality product over V ∗ × V . If we consider the formal polynomial p as a function from N into Rmax , p : l → p (l ) (the domain of which can be extended to the whole N by setting p (l ) = ε if l < val( p ) or l > deg( p )), then, p (c) = max(lc + p (l )) = Fe (− p )(c) . l ∈N (3.9) Before studying the properties of F and the equivalence relation it induces in Rmax [γ ], let us first give some terminology related to convexity that we will use later on. Definition 3.37 Convex set in a disconnected domain: we say that a subset F ⊂ E = N × R is convex if, for all µ ∈ [0, 1] and for all x , y ∈ F for which µx + (1 − µ) y ∈ E, it follows that µx + (1 − µ) y ∈ F. Hypograph and epigraph of a real function f : X → R: these are the sets defined respectively by def hypo( f ) = {(x , y ) | x ∈ X , y ∈ R, y ≤ f (x )} , 120 Synchronization and Linearity def epi( f ) = {(x , y ) | x ∈ X , y ∈ R, y ≥ f (x )} . Convex (concave) mapping: a function f : X → R is convex (respectively concave) if its epigraph (respectively its hypograph) is convex. Extremal point of a convex set F : it is a point which is not a convex combination of two other distinct points of F. Theorem 3.38 Consider the equivalence relation F p ≡ q ⇔ p = q ⇔ q ∈ F −1 ( p ) . ∀ p , q ∈ Rmax [γ ] , 1. 
With a given p ∈ Rmax [γ ], we associate two other elements of Rmax [γ ] denoted p1 and p2 , such that, for all l ∈ N, def p1 (l ) = max (µp (i ) + (1 − µ) p ( j )) , 0≤µ≤1 i, j ∈N (3.10) subject to l = µi + (1 − µ) j , def p2 (l ) = min ( p (c) − lc) . (3.11) c ∈Rmax Then p1 = p2 (denoted simply p ) and p belongs to the same equivalence class as p of which it is the maximum element. The mapping l → p (l ) is the concave upper hull of l → p (l ). Hence hypo p is convex. 2. Let now p ∈ Rmax [γ ] be obtained from p by canceling the monomials of p which do not correspond to extremal points of hypo p . Then p belongs to the same equivalence class as p of which it is the minimum element. 3. Two members p and q of the same equivalence class have the same degree and valuation. Moreover p = q and p = q . Proof 1. Using (3.10) with the particular values µ = 1, hence l = i , we first prove that p1 ≥ p for the pointwise conventional order (which is also the natural order associated with the addition in Rmax [γ ]). Combining (3.9) (written for p1 ) with (3.10), we obtain p1 (c) = max l c + l max 0≤µ≤1, i, j ∈N (µp (i ) + (1 − µ) p ( j )) subject to l = µi + (1 − µ) j = = = max 0 ≤ µ≤ 1 µ max (ic + p (i )) + (1 − µ) max ( j c + p ( j )) i max (µ p (c) + (1 − µ) p (c)) 0 ≤ µ≤ 1 p (c) . j 3.3. Scalar Functions in Rmax 121 This shows that p1 belongs to the same equivalence class as p and that it is greater than any such p , hence it is the maximum element in this equivalence class. On the other hand, combining (3.11) with (3.9), we get p2 (l ) = min c ∈Rmax max (mc + p (m )) − lc m ∈N . By choosing particular value m = l , it is shown that p2 ≥ p . Since F is a homomorphism, it preserves the order, and thus p2 ≥ p . But, if we combine (3.9) (written for p2 ) with (3.11), we get p2 (c) = max l c + min l ∈N c ∈Rmax p (c ) − lc . By picking the particular value c = c, it is shown that p2 ≤ p . Hence, we have proved that p2 = p and that p2 ≥ p . Therefore p2 is the maximum element in the equivalence class of p. Since the maximum element is unique (see § 4.3.1), it follows that p1 = p2 . From (3.11), it is apparent that p2 is concave as the lower hull of a family of affine functions. Hence, since it is greater than p , it is greater than its concave upper hull, but (3.10) shows that indeed it coincides with this hull. 2. It is now clear that the equivalence class of p can be characterized by p , or equivalently by its hypograph which is a convex set. Since a convex set is fully characterized by the collection of its extreme points, this collection is another characterization of the class. Since p has precisely been defined from this collection of extreme points, it is clearly an element of the same class and the minimum one (dropping any further monomial would change the collection of extreme points and thus the equivalence class). 3. In particular, the head and tail monomials of a given p correspond to members of the collection of extreme points. Therefore, all elements of an equivalence class have the same degree and valuation. Definition 3.39 (Canonical forms of polynomial functions) According to the previous theorem, we may call p and p the concavified polynomial and the skeleton of p, respectively. • The skeleton p is also called the minimum canonical form of the polynomial function p; • the concavified polynomial p is also called the maximum canonical form of p. Figure 3.6 illustrates these notions. It should be clear that necessarily p has full support. 
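The two canonical forms of Definition 3.39 can be computed with a standard upper-hull pass over the points (l, p(l)). The sketch below (helper names ours) returns both the skeleton (the extreme points) and the concavified polynomial (the hull completed to full support), illustrated on p = γ² ⊕ γ ⊕ 2, whose skeleton is γ² ⊕ 2 and whose concavified form is γ² ⊕ 1γ ⊕ 2.

```python
# Sketch: canonical forms of a formal polynomial given as {degree: coefficient}.
def canonical_forms(p):
    pts = sorted(p.items())
    hull = []                                   # extreme points, i.e. the skeleton
    for l, v in pts:
        while len(hull) >= 2:
            (l1, v1), (l2, v2) = hull[-2], hull[-1]
            # drop (l2, v2) when it lies on or below the chord (l1,v1)-(l,v)
            if (v2 - v1) * (l - l2) <= (v - v2) * (l2 - l1):
                hull.pop()
            else:
                break
        hull.append((l, v))
    conc = {}                                   # concavified polynomial, full support
    for (l1, v1), (l2, v2) in zip(hull, hull[1:]):
        for l in range(l1, l2):
            conc[l] = v1 + (v2 - v1) * (l - l1) / (l2 - l1)
    conc[hull[-1][0]] = hull[-1][1]
    return dict(hull), conc

skel, conc = canonical_forms({2: 0.0, 1: 0.0, 0: 2.0})   # gamma^2 (+) gamma (+) 2
print(skel)   # {0: 2.0, 2: 0.0}          -> gamma^2 (+) 2
print(conc)   # {0: 2.0, 1: 1.0, 2: 0.0}  -> gamma^2 (+) 1 gamma (+) 2
```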
122 Synchronization and Linearity Example 3.40 For p = γ 2 ⊕ γ ⊕ 2, we have p = γ 2 ⊕ 2. For p = γ 3 ⊕ 3, we have p = γ 3 ⊕ 1γ 2 ⊕ 2γ ⊕ 3. Lemma 3.41 A formal polynomial p of valuation k and degree n is a maximal canonical form of a polynomial function p , that is, p = p , if and only if p has full support (hence p(l ) = ε, for l = k , . . . , n), and p (n − 1) p (n − 2) p (k ) ≥ ≥ ··· ≥ . p (n ) p (n − 1) p (k + 1) (3.12) Proof The fact that a maximal canonical form must have full support has been established earlier and this ensures that the ‘ratios’ in (3.12) are well defined. In the proof of Theorem 3.38 it has also been shown that p is the concave upper hull of the function l → p (l ) (concavity in the sense of Definition 3.37). Conversely, if p is concave, it is equal to its own concave upper hull, and thus p = p . Now, (3.12) simply expresses that the slopes of the lines defined by the successive pairs of points ((l − 1, p (l − 1)), (l , p (l ))) are decreasing with l , which is obviously a necessary and sufficient condition for p to be concave. 3.3.1.2 Factorization of Polynomials Let us now show that polynomial functions and concave formal polynomials can be factored into a product of linear factors. Definition 3.42 (Corners, multiplicity) The nonzero corners of a polynomial function p are the abscissæ of the extremal points of the epigraph of p. Since p is convex, at such a corner, the (integer) slope increases by some integer which is called the multiplicity of the corner. A corner is called multiple if the multiplicity is larger than one. The zero corner exists if the least slope appearing in the graph of p is nonzero: the multiplicity of this zero corner is then equal to that (integer) slope. Figure 3.7 shows a nonzero corner of multiplicity 2. Rmax Rmax p p Corner of multiplicity 2 p 0 N Figure 3.6: The functions p , p and p 0 Rmax Figure 3.7: Corner of a polynomial function 3.3. Scalar Functions in Rmax 123 Theorem 3.43 (Fundamental theorem of algebra) n i =1 (γ 1. Any formal polynomial of the form p = p (n ) be equal to ε) satisfies p = p . ⊕ ci ) (where some ci may 2. Conversely, if a formal polynomial p = p (k )γ k ⊕ p (k + 1)γ k+1 ⊕ · · · ⊕ p (n )γ n is such that p = p , then it has full support, the numbers ci ∈ Rmax defined by def ci = ◦ p (n − i )/ p (n − i + 1) for 1 ≤ i ≤ n − k ; ε for n − k < i ≤ n , (3.13) are such that c1 ≥ c2 ≥ · · · ≥ cn , n i =1 (γ and p can be factored as p = p (n ) (3.14) ⊕ ci ) . 3. Any polynomial function p can be factored as n (c ⊕ ci ) , p (c) = p (n ) i =1 where the ci are the zero and/or nonzero corners of the polynomial function p repeated with their order of multiplicity. These corners are obtained by Equation (3.13) using the maximum canonical form. Proof n 1. Let p = i =1 p (n )(γ ⊕ ci ) and assume without loss of generality that the ci have been numbered in such a way that (3.14) holds. We consider the nontrivial case p (n ) = ε. By direct calculation, it can be seen that p (n − k ) = p (n )σk , where σk is the k -th symmetric product of the corners ci , that is, n σ1 = ci , i =1 σ2 = ci c j , . . . . i = j =1 Owing to our assumption on the ordering of the ci , it is clear that σk = Therefore, pn − k σk pn − k + 1 = = ck ≤ ck−1 = . pn − k + 1 σk−1 pn − k + 2 k i =1 cj. Thus p = p by Lemma 3.41. 2. If p = p , the ci can be defined by (3.13) unambiguously and then (3.14) follows from Lemma 3.41. The fact that p can be factored as indicated is checked as previously by direct calculation. 124 Synchronization and Linearity 3. 
From the preceding considerations, provided that we represent p with the help of its maximum canonical form, the factored form can be obtained if we define the ci with Equation (3.13). To complete the proof, it must be shown that any such ci is a corner in the sense of Definition 3.42 and that, if a ci appears ki times, then the slope jumps by ki at ci . To see this, rewrite the factored form in conventional notation, which yields n p (c) = max(c, ci ) . i =1 Each elementary term c ⊕ ci has a graph represented in Figure 3.2 with a slope discontinuity equal to one at ci . If a ci appears ki times, the term ki × max(c, ci ) causes a slope discontinuity equal to ki . All other terms with c j = ci do not cause any slope discontinuity at ci . Example 3.44 The formal polynomial γ 2 ⊕ 3γ ⊕ 2 is a maximum canonical form ◦ ◦ because c1 = 3/e = 3 ≥ c2 = 2/3 = −1, and therefore it can be factored into (γ ⊕ 3)(γ ⊕ (−1)). The formal polynomial (γ 2 ⊕ 2) = γ 2 ⊕ 1γ ⊕ 2 can be factored into (γ ⊕ 1)2 . 3.3.2 Rational Functions In this subsection we study rational functions in the Rmax algebra. These functions are continuous, piecewise linear, integer-sloped functions. We give the multiplicative form of such functions, which completely defines the points where the slope changes. Moreover, we show that the Euclidean division and the decomposition into simple elements is not always possible. 3.3.2.1 Definitions Definition 3.45 Given p (0), . . . , p (n ), q (0), . . . , q (m ) ∈ Rmax , p (n ) and q (m ) = ε, the rational function r , associated with these coefficients, is given by r : Rmax → Rmax , c → r (c) = p (0) ⊕ · · · ⊕ p (n )cn . q (0) ⊕ · · · ⊕ q (m )cm Such a function is equal to the difference of two polynomial functions (see Figure 3.8): hence it is still continuous, piecewise linear, integer-sloped, but it is neither convex nor increasing anymore. Definition 3.46 The corners of the numerator are called zero corners or root corners, and the corners of the denominator are called pole corners. Using the fundamental theorem of algebra, we can write any rational function r as r (c) = a n i =1 (c ⊕ ci ) m j =1 (c ⊕ d j ) , 3.3. Scalar Functions in Rmax 125 where the zero and pole corners are possibly repeated with their order of multiplicity. At a zero corner of multiplicity ki , the change of slope is ki (unless this zero corner coincides with some pole corner). At a pole corner of multiplicity l j , the change of slope is −l j (see Figure 3.9). Rmax y = c 2 ⊕ 2c ⊕ 3 Rmax lj pole c orner Rmax ki 0 root corner Rmax ◦ y = e/ (1c2 ⊕ 4c ⊕ 5) Figure 3.8: Graph of ◦ y = (c2 ⊕ 2c ⊕ 3)/(1c2 ⊕ 4c ⊕ 5) 3.3.2.2 0 Figure 3.9: Root and pole corners Euclidean Division In general, a polynomial p cannot be expressed as b q ⊕ r with deg(r ) < deg (b ) for some given polynomial b as shown by Example 3.47. Nevertheless, sometimes we can obtain such a decomposition, as shown by Example 3.48. Example 3.47 The equation c2 ⊕ e = q (c ⊕ 1) ⊕ r has no solutions. Indeed, q must be of degree 1 and r of degree 0; thus q = q (1)γ ⊕ q (0), r = r (0), with q (0), q (1), r (0) ∈ Rmax . By identifying the coefficients of degree 2 in both sides, we must have q (1) = e. Now, since the maximal canonical form in the left-hand side is c2 ⊕ c ⊕ e, by considering the coefficient of degree 1 in the right-hand side, we must have q (0) ⊕ 1q (1) ≤ e, which contradicts q (1) = e. Example 3.48 For p (c) = c2 ⊕ 3 and b = c ⊕ 1, we have c2 ⊕ 3 = (c ⊕ 1)2 ⊕ 3 = (c ⊕ 1)(c ⊕ 1.5) ⊕ 3 . 
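Theorem 3.43 reduces the factorization of a polynomial function to 'ratios' of consecutive coefficients. The sketch below (helper names ours) assumes the coefficient list is already a maximum canonical form with full support, computes the corners c_i = p(n-i)/p(n-i+1) as conventional differences, and checks on Example 3.44 that the factored and developed forms coincide.

```python
# Sketch of Theorem 3.43(2): p(c) = p(n) (x) product over i of (c (+) c_i),
# for a concave coefficient list p indexed by degree (maximum canonical form).
def corners(p):
    n = len(p) - 1
    return [p[n - i] - p[n - i + 1] for i in range(1, n + 1)]

def eval_factored(p, c):
    val = p[-1]                        # head coefficient p(n)
    for ci in corners(p):
        val += max(c, ci)              # (x) of the linear factors (c (+) c_i)
    return val

def eval_poly(p, c):
    return max(coef + k * c for k, coef in enumerate(p))

# Example 3.44: gamma^2 (+) 3 gamma (+) 2 factors into (gamma (+) 3)(gamma (+) (-1))
p = [2.0, 3.0, 0.0]                    # coefficients of degrees 0, 1, 2
print(corners(p))                      # [3.0, -1.0]
assert all(eval_factored(p, c) == eval_poly(p, c) for c in range(-5, 6))
```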
As in conventional algebra, this issue of the Euclidean division leads to solving a triangular system of linear equations. However, in Rmax , the difficulty dwells in the fact that a triangular system of linear equations with nonzero diagonal elements may have no solutions. Example 3.49 The system (x 1 = 1, x 1 ⊕ x 2 = e) has no solutions in Rmax . 126 3.3.2.3 Synchronization and Linearity Decomposition of a Rational Function into Simple Elements ◦ Definition 3.50 A proper rational function r /q is a rational function which satisfies deg(r ) < deg(q ). ◦ ◦ ◦ In general, it is not possible to express a rational function p/q as s ⊕ r/q , where r/q is proper and s is a polynomial function. Nevertheless, given a proper rational function, we may attempt to decompose it into simple elements. Definition 3.51 A proper rational function r is decomposable into simple elements if it can be written as n Ki ◦ aik/(c ⊕ ci )k , r= i =1 k =1 where the aik are constants. Such a decomposition is not always possible. Example 3.52 We first consider a rational function for which the decomposition into simple elements is possible: c⊕1 c⊕e 1 e 1 = ⊕ = ⊕ . (c ⊕ e)2 (c ⊕ e)2 (c ⊕ e)2 c⊕e (c ⊕ e)2 ◦ The rational function (c ⊕ e)/(c ⊕ 1)2 , however, cannot be decomposed. Indeed, if such a decomposition exists, we would have c⊕e a b = ⊕ . 2 (c ⊕ 1) c⊕1 (c ⊕ 1)2 Then a (c ⊕ 1) ⊕ b = c ⊕ e, hence a = e, and also a 1 ⊕ b = 1 ⊕ b = e, which is impossible. The graph of a proper rational function which can be decomposed into simple elements is necessarily nonincreasing because it is the upper hull of nonincreasing functions. But a rational function with the degree of the numerator lower than the degree of the denominator is not always nonincreasing: this depends on the relative ordering of its pole and zero corners. ◦ Example 3.53 The function y = 2(c ⊕ e)/(c ⊕ 1)2 is proper but not monotonic (see Figure 3.10). However, being nonincreasing is a necessary but not a sufficient condition to be decomposable into simple elements as shown by the following example. ◦ Example 3.54 The function r (c) = e/(c2 ⊕ c), the graph of which is displayed in Figure 3.11, cannot be decomposed into simple elements. Indeed, r (c) = a b ⊕ c c⊕e which is impossible in Rmax . ⇒ {a (c ⊕ e) ⊕ bc = e} ⇒ a⊕b a =ε =e , 3.3. Scalar Functions in Rmax 127 Rmax 1 Rmax 1 1 −1 Rmax 0 Rmax 0 1 2 −2 ◦ Figure 3.10: Graph of y = 2(c ⊕ e)/(c ⊕ 1)2 ◦ Figure 3.11: Graph of y = e/(c2 ⊕c) Theorem 3.55 A proper rational function has a simple element decomposition if and only if it has general root corners and special pole corners which are at the intersection of a zero-sloped line and a negative-integer-sloped line (Figure 3.12). Proof If r is decomposable, it can be written as r = ri with Ki ri (c) = ◦ aik/(c ⊕ ci )k . k =1 Reducing the right-hand side to a common denominator (c ⊕ ci ) K i , we obtain ri = ◦ p (c ⊕ ci )/(c ⊕ ci ) K i , where p is a polynomial function of degree K i − 1. The polynomial function p is characterized by the fact that the abscissæ of its corners are greater than ci . Therefore ri (c) is constant on the left-hand side of ci , has a pole corner of order K i at ci , and a root corner on the right-hand side of ci . Conversely, a function having this shape can easily be realized by an ri . The proof is completed by considering the fact that r is the supremum of a finite number of such ri . 
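The decomposition of Example 3.52 is easy to check numerically. In conventional terms the max-plus 'division' is a subtraction, so the sketch below (function names ours) simply compares the two sides on a grid of points.

```python
# Sketch: (c (+) 1)/(c (+) e)^2 = e/(c (+) e) (+) 1/(c (+) e)^2 (Example 3.52),
# written with max for (+) and conventional subtraction for the division.
def lhs(c):
    return max(c, 1.0) - 2 * max(c, 0.0)

def rhs(c):
    return max(0.0 - max(c, 0.0), 1.0 - 2 * max(c, 0.0))

assert all(abs(lhs(c) - rhs(c)) < 1e-12 for c in [x * 0.25 for x in range(-40, 41)])
```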
3.3.3 Algebraic Equations Definition 3.56 (Polynomial equation) Given two polynomial functions p and q of degree n and m, respectively, the equality p (c) = q (c) is called a polynomial equation. Solutions to this equation are called roots. The degree of the polynomial equation is the integer max (n , m ). Some√ polynomial equations have roots, some do not. For example, cn = a has the root c = n a , that is, c = a / n in conventional algebra. On the other hand, the equation p (c) = ε has no roots when p is a general polynomial of degree n ≥ 1. 3.3.3.1 Canonical Form of an Equation Before studying equations, it is useful to write them in their simplest form. An equation p (c) = q (c) can generally be simplified even if it is in the form p (c) = q (c). Indeed, 128 Synchronization and Linearity if two monomials p (k )γ k and q (k )γ k of the same degree appear simultaneously, and if p (k ) < q (k ), we can further simplify the equation by canceling the monomial p (k )γ k . Example 3.57 The equation c2 ⊕ 3c ⊕ 2 = 3c2 ⊕ 2c ⊕ e can be reduced to 3c ⊕ 2 = 3c2 (see Figure 3.13). Rmax 3c 2 ⊕2 c⊕ e Rmax root corners 0 y= admissible pole corner y 2 = c ⊕ ⊕ 3c 2 Rmax Rmax 0 Figure 3.12: A rational function Figure 3.13: Equation decomposable into simple elements c 2 ⊕ 3c ⊕ 2 = 3c 2 ⊕ 2c ⊕ e Definition 3.58 If there exist two identical monomials on both sides of the equation, we say that the equation is degenerated. The nondegenerated equation p(c) = q (c) is in minimal canonical form if ( p ⊕ q ) = p ⊕ q. In the case of a degenerated equation, there is a segment of solutions. 3.3.3.2 Solving Polynomial Equations Theorem 3.59 Any root of the nondegenerated equation p (c) = q (c) is a corner of p ⊕ q. Proof Let us take the canonical form of the equation. Any root of p (c) = q (c) is the solution of p(k )ck = q (l )cl for some different k and l and thus it is a corner of p ⊕ q because the equation is in minimal canonical form. The converse of this theorem is not true, i.e. a corner is not always a root of the equation p(c) = q (c). Example 3.60 The polynomial equation 3c ⊕ 2 = 3c2 has the corner c = −1 which is not a root (3(−1) ⊕ 2 = 2, 3(−1)2 = 1). 3.4. Symmetrization of the Max-Plus Algebra 129 Now we give a characterization of the situation where the polynomial equation of degree n has exactly n roots. Theorem 3.61 Suppose that n is even (the case when n is odd is similar). Let p (c) = p(0) ⊕ p (2)c2 ⊕ · · · ⊕ p (2k )c2k , q (c) = p (1)c ⊕ p (3)c3 ⊕ · · · ⊕ p (2k − 1)c2k−1 , and suppose that p (c) = q (c) is a nondegenerated equation in canonical form, then def ◦ ci = p (i − 1)/ p (i ), i = 1, . . . , n, are the n roots of the equation. Proof No corner of p ⊕ q is multiple. Hence each one is obtained as the intersection of two monomials of consecutive degrees. Therefore, these monomials belong to different sides of the equation. Thus the corresponding corners are roots of the equation. Conversely if the equation has n roots, p ⊕ q has n corners which therefore are distinct. But each of these corners is a root, hence these corners are characterized by the intersection of two monomial functions of consecutive degrees, the monomials being in different sides of the equation. 3.4 Symmetrization of the Max-Plus Algebra We have seen that the theory of linear system of equations in the max-plus algebra is not satisfactory at all, not even in the scalar case. We now extend Rmax to a larger set S for which Rmax can be viewed as the positive part. 
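To make the distinction between corners and roots concrete before moving on to the symmetrization, here is a small numerical check (ours) of Theorem 3.59 and Example 3.60: both corners of p ⊕ q are tested against the equation 3c ⊕ 2 = 3c², and only one of them turns out to be a root.

def mp_poly(coeffs, c):
    # evaluate (+)_k coeffs[k] c^k in Rmax; None stands for epsilon
    return max(a + k * c for k, a in enumerate(coeffs) if a is not None)

lhs = [2.0, 3.0]           # 2 (+) 3c
rhs = [None, None, 3.0]    # 3c^2
for c in (-1.0, 0.0):      # the two corners of p (+) q = 2 (+) 3c (+) 3c^2
    l, r = mp_poly(lhs, c), mp_poly(rhs, c)
    print("c =", c, "lhs =", l, "rhs =", r, "root:", l == r)
# c = -1 is a corner of p (+) q but not a root; c = 0 is a root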
The construction is similar to the construction of Z as an extension of N in conventional algebra, but we cannot avoid some complications coming from the idempotency of the max operator. With this new set, we can generalize the notion of an equation that we call a balance. Then all nondegenerated scalar linear balances have a unique solution. Thus a linear closure of Rmax has been achieved. 3.4.1 The Algebraic Structure S A natural approach to our problem is to embed the max-plus algebra into a structure in which every nontrivial scalar linear equation has at least one solution. In particular we would like to have a solution to the equation a ⊕ x = ε, that is, we would like to find a symmetric element to a . But this not possible, as shown by the following theorem. Theorem 3.62 Every idempotent group is reduced to the zero element. Proof Assume that the group (G , ⊕) is idempotent with zero element ε. Let b be the symmetric element of a ∈ G . Then a = a ⊕ ε = a ⊕ (a ⊕ b ) = (a ⊕ a ) ⊕ b = a ⊕ b = ε . Nevertheless we can adapt the idea of the construction of Z from N to build a ‘balancing’ element rather than a symmetric one. This is the purpose of the following subsections. 130 3.4.1.1 Synchronization and Linearity The Algebra of Pairs Let us consider the set of pairs R2 endowed with the natural idempotent semifield max structure: (x , x ) ⊕ ( y , y ) = (x ⊕ y , x ⊕ y ) , (x , x ) ⊗ ( y , y ) = (x y ⊕ x y , x y ⊕ x y ) , with (ε, ε) as the zero element and (e, ε) as the identity element. Let x = (x , x ) and define the minus sign as x = (x , x ), the absolute value of x as |x | = x ⊕ x and the balance operator as x • = x x = (|x |, |x |). Clearly, these operators have the following properties: 1. a • = ( a )• ; 2. a •• = a • ; 3. ab• = (ab)• ; 4. ( a) = a; 5. (a ⊕ b ) = ( a ) ⊕ ( b ); 6. (a ⊗ b) = ( a ) ⊗ b . These properties allow us to write a ⊕ ( b ) = a 3.4.1.2 b as usual. Quotient Structure Definition 3.63 (Balance relation) Let x = (x , x ) and y = ( y , y ). We say that x balances y (which is denoted x ∇ y) if x ⊕ y = x ⊕ y . It is fundamental to notice that ∇ is not transitive and thus is not an equivalence relation. For instance, consider (e, 1) ∇ (1, 1), (1, 1) ∇ (1, e), but (e, 1) ∇ (1, e)! Since ∇ cannot be an equivalence relation, it is not possible to define the quotient structure of R2 by max means of ∇ (unlike in conventional algebra in which N2 / ∇ Z). However, we can introduce the equivalence relation R on R2 closely related to the balance relation, max namely, (x , x )R( y , y ) ⇔ x ⊕y =x ⊕y (x , x ) = ( y , y ) if x = x , y = y , otherwise. It is easy to check that R is compatible with the addition and multiplication of R2 , max the balance relation ∇ , and the , | · | and • operators. Definition 3.64 The symmetrized algebra R2 /R of Rmax is called S. max We distinguish three kinds of equivalence classes: (t , −∞) = {(t , x ) | x < t } , (−∞, t ) = {(x , t ) | x < t } , (t , t ) = {(t , t )} , called positive elements; called negative elements; called balanced elements. 3.4. Symmetrization of the Max-Plus Algebra 131 By associating (t , −∞) with t ∈ Rmax , we can identify Rmax with the semifield of positive or zero classes denoted S⊕ . The set of negative or zero classes (of the form x for x ∈ S⊕ ) will be denoted S . This set is not stable by multiplication and thus it is not a semifield. The set of balanced classes (of the form x • ) is denoted S• ; it is also isomorphic to Rmax . This yields the decomposition S = S⊕ ∪ S ∪ S• . 
(3.15) The element ε is the only element common to S⊕ and S and S• . This decomposition of S should be compared with Z = N+ ∪ N− . This notation allows us to write 3 2 instead of (3, −∞) ⊕ (−∞, 2) . We thus have 3 2 = (3, 2) = (3, −∞) = 3. More generally, calculations in S can be summarized as follows: a b a b=a , if a > b ; a = a , if a > b ; a = a• . (3.16) Because of its importance, we introduce the notation S∨ for the set S⊕ ∪ S and S = S∨ \ {ε}. The elements of S∨ are called signed elements. They are either positive, negative or zero. ∨ Theorem 3.65 The set S∨ = S \ S• is the set of all invertible elements of S. Proof The obvious identity t ⊗ (−t ) = ( t ) ⊗ ( − t ) = e for t ∈ Rmax \ {ε} implies that every nonzero element of S∨ is invertible. Moreover, the absorbing properties of the balance operator show that S∨ is absorbing for the product. Thus, x • y = e for all y ∈ S since e ∈ S• . Remark 3.66 Thus, in S, with each element a ∈ S∨ , we can associate an element a such that b = a a ∈ S• but in general b = ε. This is the main difference with the usual symmetrization. Here the whole set S• plays the role of the usual zero element. 3.4.2 Linear Balances Before solving general linear balances, we need to understand the meaning of the generalization of equations in Rmax by balances in S. This can be done by studying the properties of balances. Theorem 3.67 The relation ∇ satisfies the following properties: 1. a ∇ a; 2. a ∇ b ⇔ b ∇ a; 3. a ∇ b ⇔ a b ∇ ε; 4. {a ∇ b, c ∇ d } ⇒ a ⊕ c ∇ b ⊕ d; 5. a ∇ b ⇒ ac ∇ bc. 132 Synchronization and Linearity Proof Let us prove Property 5. Obviously, a ∇ b ⇔ a absorbing, (a b)c = ac b c ∈ S• , i.e. ac ∇ bc. b ∈ S• and, since S• is Although ∇ is not transitive, when some variables are signed, we can manipulate balances in the same way as we manipulate equations. Theorem 3.68 1. Weak substitution If x ∇ a, cx ∇ b and x ∈ S∨ , we have ca ∇ b. 2. Weak transitivity If a ∇ x, x ∇ b and x ∈ S∨ , we have a ∇ b. 3. Reduction of balances If x ∇ y and x , y ∈ S∨ , we have x = y. Proof 1. We have either x ∈ S⊕ or x ∈ S . Assume for instance that x ∈ S⊕ , that is, x = (x , ε). With the usual notation, x ⊕ a = a and c x ⊕ b = c x ⊕ b . Adding c a ⊕ c a to the last equality, we get c x ⊕ca ⊕c a ⊕b = c x ⊕ca ⊕c a ⊕b , which yields, by using x ⊕ a = a , ca ⊕c a ⊕b = c a ⊕ca ⊕b , that is, ca ∇ b . 2. This a consequence of the weak substitution for c = e. 3. This point is trivial but is important in order to derive equalities from balances. The introduction of these new notions is justified by the fact that any linear balance (which is not degenerated) has one and only one solution in S∨ . Theorem 3.69 Let a ∈ S∨ and b ∈ S∨ , then x = balance a −1 b is the unique solution of the ax ⊕ b ∇ ε , (3.17) which belongs to S∨ . Proof From the properties of balances it follows that ax ⊕ b ∇ ε ⇔ x ∇ a −1 b . Then using the reduction property and the fact that a −1 b ∈ S∨ , we obtain x = a −1 b . Remark 3.70 If b ∈ S∨ , we lose the uniqueness of signed solutions. Every x such that |ax | ≤ |b | (i.e. |x | ≤ |a −1b |) is a solution of the balance (3.17). If a ∈ S∨ , we again lose uniqueness. Assume b ∈ S∨ (otherwise, the balance holds for all values of x ), then every x such that |ax | ≥ |b | is a solution. 3.5. Linear Systems in S 133 Remark 3.71 We can describe all the solutions of (3.17). For all t ∈ Rmax , we obviously have at • ∇ ε. Adding this balance to ax ⊕ b ∇ ε, where x is the unique signed solution, we obtain a (x ⊕ t • ) ⊕ b ∇ ε. 
Thus, xt = x ⊕ t • (3.18) • is a solution of (3.17). If t ≥ |x |, then x t = t is balanced. Conversely, it can be checked that every solution of (3.17) may be written as in (3.18). Finally the unique signed solution x is also the least solution. Remark 3.72 Nontrivial linear balances (with data in S∨ ) always have solutions in S; this is why S may be considered as a linear closure of Rmax . 3.5 Linear Systems in S It is straightforward to extend balances to the vector case. Theorems 3.67 and 3.68 still hold when a , b , x , y and c are matrices with appropriate dimensions, provided we replace ‘belongs to S∨ ’ by ‘every entry belongs to S∨ ’. Therefore, we say that a vector or a matrix is signed if all its entries are signed. We now consider a solution x ∈ Rmax of the equation Ax ⊕ b = Cx ⊕ d . (3.19) Then the definition of the balance relation implies that (A C )x ⊕ (b d) ∇ ε . (3.20) Conversely, assuming that x is a positive solution of (3.20), we obtain Ax ⊕ b ∇ Cx ⊕ d , with Ax ⊕ b and Cx ⊕ d ∈ S⊕ . Using Theorem 3.68, we obtain Ax ⊕ b = Cx ⊕ d . Therefore we have the following theorem. Theorem 3.73 The set of solutions of the general system of linear equations (3.19) in Rmax and the set of positive solutions of the associated linear balance (3.20) in S coincide. Hence, studying Equation (3.19) is reduced to solving linear balances in S. Remark 3.74 The case when a solution x of (3.20) has some negative and some positive entries is also of interest. We write x = x + x − with x + , x − ∈ (S⊕ )n . Partitioning the columns of A and C according to the sign of the entries of x , we obtain A = A+ ⊕ A− , C = C + ⊕ C − , so that Ax = A+ x + A− x − and Cx = C + x + C − x − . We can thus claim the existence of a solution in Rmax to the new problem A+ x + ⊕ C − x − ⊕ b = A− x − ⊕ C + x + ⊕ d . The solution of nondegenerated problems is not unique, but the set of solutions forms a single class of R2 (for the equivalence relation R ). max 134 3.5.1 Synchronization and Linearity Determinant Before dealing with general systems, we need to extend the determinant machinery to the S-context. We define the signature of a permutation σ by sgn(σ ) = if σ is even; otherwise. e e Then the determinant of an n × n matrix A = ( Ai j ) is given (as usual) by n sgn(σ ) σ Ai σ (i ) , i =1 and is denoted either | A| or det( A). The transpose of the matrix of cofactors is denoted def A ( A i j = cof j i ( A)). The classical properties of the determinant are still true. Theorem 3.75 The determinant has the following properties: linearity: |(u 1 , . . . , λu i ⊕ µvi , . . . , u n )| = λ|(u 1 , .. , u i , .. , u n )| ⊕ µ|(u 1 , .. , vi , .. , u n )| ; antisymmetry: |(u σ (1), . . . , u σ (n))| = sgn(σ )|(u 1 , . . . , u n )| ; and consequently |(u 1 , . . . , v, . . . , v, . . . , u n )| ∇ ε ; expansion with respect to a row: n | A| = aik cofik ( A) ; k =1 transposition: | A| = | A |. A direct consequence is that some classical proofs lead to classical identities in this new setting. Sometimes weak substitution limits the scope of this approach. Theorem 3.76 For an n × n matrix A with entries in S, we have Cramer formula: AA ∇ | A|e, and if | A| is signed, then the diagonal of AA is signed; recursive computation of the determinant: | A| = F H G ann = | F |ann for a partition of matrix A where ann is a scalar; HF G 3.5. Linear Systems in S 135 Cayley-Hamilton theorem: p being the characteristic polynomial of matrix A, i.e. def p (λ) = | A λe|, we have p ( A) ∇ ε. 
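These determinantal identities can be tested numerically by coding the elements of S as pairs, exactly as in the algebra of pairs of §3.4.1.1. The sketch below (ours; the 2 × 2 matrix and all names are our own choices) checks the Cramer-type identity AA♯ ∇ |A|e of Theorem 3.76 entry by entry:

EPS = float("-inf")
ZERO = (EPS, EPS)                      # epsilon of S, coded as a pair

def oplus(x, y):  return (max(x[0], y[0]), max(x[1], y[1]))
def otimes(x, y): return (max(x[0] + y[0], x[1] + y[1]),
                          max(x[0] + y[1], x[1] + y[0]))
def ominus(x):    return (x[1], x[0])
def balances(x, y):                    # the balance relation x nabla y
    return max(x[0], y[1]) == max(x[1], y[0])

def pos(t): return (t, EPS)            # embedding of Rmax into S

a11, a12, a21, a22 = pos(1.0), pos(3.0), ominus(pos(2.0)), pos(0.0)
A   = [[a11, a12], [a21, a22]]
det = oplus(otimes(a11, a22), ominus(otimes(a12, a21)))   # |A|
adj = [[a22, ominus(a12)], [ominus(a21), a11]]            # A#, 2 x 2 case

for i in range(2):
    for j in range(2):
        acc = ZERO
        for k in range(2):
            acc = oplus(acc, otimes(A[i][k], adj[k][j]))
        assert balances(acc, det if i == j else ZERO)
print("A A# balances |A| e; here |A| =", det)             # (5.0, -inf), i.e. 5

The off-diagonal entries of AA♯ come out balanced rather than equal to ε, which is precisely why only the balance relation, and not an equality, can be asserted.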
Remark 3.77 We define the positive determinant of a matrix A, denoted | A|+ , by the sum of terms i Ai σ (i ) where the sum is limited to even permutations, and a negative determinant, denoted | A|− , by the same sum limited to odd permutations. The matrix of positive cofactors is defined by A+ ij | A j i |+ | A j i |− = if i + j is even, if i + j is odd, where A j i denotes the matrix derived from A by deleting row j and column i . The matrix of negative cofactors A − is defined similarly. With this notation, Theorem 3.76 can be rewritten as follows: AA + ⊕ | A|− e = AA − ⊕ | A|+ e . This formula does not use the sign and is valid in any semiring. The symmetrized algebra appears to be a natural way of handling such identities (and giving proofs in an algebraic way). 3.5.2 Solving Systems of Linear Balances by the Cramer Rule In this subsection we study solutions of systems of linear equations with entries in S. We only consider the solutions belonging to (S∨ )n , that is, we only consider signed solutions. Indeed, in a more general setting we cannot hope to have a result of uniqueness; see Remark 3.70. We can now state the fundamental result for the existence and uniqueness of signed solutions of linear systems. Theorem 3.78 (Cramer system) Let A be an n × n matrix with entries in S, | A| ∈ S∨ , b ∈ Sn and A b ∈ (S∨ )n . Then, in S∨ there exists a unique solution of Ax ∇ b , (3.21) ◦ x ∇ A b/| A| . (3.22) and it satisfies Proof By right-multiplying the identity AA ∇ | A|e by | A|−1 b , we see that x is a solution. Let us now prove uniqueness. The proof is by induction on the size of the matrix. It is based on Gauss elimination in which we manage the balances using weak substitution. Let us prove (3.22) for the last row, i.e. | A|x n ∇ ( A b )n . Developing | A| with respect to the last column, | A| = n=1 akn cofkn ( A), we see that at least one k term is invertible, say a1n cof1n ( A). We now partition A, b and x in such a way that the scalar a1n becomes a block: A= H F a1 n G , b= b1 B , x= X xn . 136 Synchronization and Linearity Then Ax ∇ b can be written as H X ⊕ a1 n x n ∇ b1 , (3.23) F X ⊕ Gx n ∇ B . (3.24) Since | F | = ( e)n+1 cof1n ( A) is invertible, we can apply the induction hypothesis to (3.24). This implies that X ∇ | F |− 1 F ( B G x n ) . Using the weak substitution property, we can replace X ∈ (S∨ )n−1 in Equation (3.23) to obtain | F |− 1 H F ( B G x n ) ⊕ a 1 n x n ∇ b 1 , that is, x n (| F |a1n H F G ) ∇ | F |b 1 HF B . Here we recognize the developments of | A| and ( A b )n , therefore ◦ x n ∇ ( A b )n/| A| . Since the same reasoning can be applied to the entries of x other than n , this concludes the proof. Remark 3.79 Let us write Di for the determinant of the matrix obtained by replacing def the i -th column of A by the column vector b ; then Di = ( A b )i . Assume that D = | A| is invertible, then Equation (3.22) is equivalent to x i ∇ D − 1 Di , ∀i . If A b ∈ (S∨ )n , then by using the reduction of balances (see Theorem 3.68), we obtain x i = D − 1 Di , which is exactly the classical Cramer formula. Example 3.80 The Rmax equation e −4 3 2 ⊗ x1 x2 1 −5 ⊕ −1 ε = 1 2 x1 x2 ⊗ ⊕ 2 7 , (3.25) corresponds to the balance e 3 x1 x2 1 2• ∇ 2 7 . (3.26) Its determinant is D = 4. D1 = 2 7 1 2• =8 , D2 = e2 37 =7 , 3.5. Linear Systems in S S⊕ S y= y ax⊕ = b ax ⊕ b 137 S⊕ ε S Figure 3.14: A (2, 2) linear system of equations Ab= D1 D2 = 8 7 ∈ (S∨ )2 . ◦ The system is invertible and has a unique solution. 
Thus (x 1 = D1/ D = 8 − 4 = ◦ D = 7 − 4 = 3) is the unique positive solution in S of the balance (3.26). 4, x 2 = D2/ Hence it is the unique solution in Rmax of Equation (3.25). Example 3.81 In the two-dimensional case the condition A b ∈ (S∨ )n has a very clear geometric interpretation (see Figure 3.14) as the intersection of straight lines in S. First we can choose an exponential scaling of the x 1 and x 2 axes. The exponential maps S⊕ to R+ and S to R− , if we identify S with i π + R ∪ {−∞}. We do not represent the balance axis in this representation. Therefore the straight line ax 2 ⊕ bx 1 ⊕ c ∇ ε is a broken line (in the usual sense) composed of four segments: • two of them are symmetric with respect of the origin; they correspond to the contribution of ax 2 ⊕ bx 1 to the balance (they belong to the conventional line ax 2 + bx 1 = 0); • the horizontal segment corresponds to the contribution of ax 2 ⊕ c to the balance; • the vertical segment corresponds to the contribution of bx 1 ⊕ c to the balance. Then it is easy to see that two such lines have one and only one point of intersection, or there exists a complete segment of solutions. This latter case is called singular and is not further considered here. Remark 3.82 The invertibility of | A| is not a necessary condition for the existence of a signed solution to the system Ax ∇ b for some value of b . Let us consider eeε A = e e ε . eeε 138 Synchronization and Linearity Because | A| = ε, the matrix A is not invertible. Let t ∈ S∨ be such that |bi | ≤ |t | for all i , and let x = t t ε . Then Ax ∇ b . But in this case, the signed solution is not unique. Remark 3.83 As already noticed, in [65] for example, determinants have a natural interpretation in terms of assignment problems. So the Cramer calculations have the same complexity as n + 1 assignment problems, which can be solved using flow algorithms. 3.6 Polynomials with Coefficients in S We have built the linear closure S of Rmax . Therefore any linear equation in S has in general a solution in this set. The purpose of this section is to show that S is almost algebraically closed, that is, any ‘closed’ polynomial equation of degree n in S has n solutions. The term ‘closed’ refers to the fact that the class of formal polynomials having the same polynomial function defined on S is not always closed (in a topological sense that will be made precise later). We will see that S is algebraically closed only for the subset of ‘closed’ polynomial equations. Moreover, we will see that any polynomial function can be transformed into a closed function by modifying it in a finite number of points. 3.6.1 Some Polynomial Functions We can generalize the notions of formal polynomials and of polynomial functions to def the set S. We restrict our study to polynomial functions in P (S∨ ) = F (S∨ [γ ]), where S∨ [γ ] denotes the class of formal polynomials with coefficients in S∨ and F is the straightforward extension of the evaluation homomorphism introduced in Lemma 3.35. We will see that such functions assume values in S∨ when their argument ranges in S∨ , except perhaps at a finite number of points. For the more general class of polynomial functions with coefficients in S, denoted P (S), it may happen that the function takes a balanced value on a continuous set of points of S∨ (in which case we say that we have a balanced facet). Because we are mainly interested in the analogy with conventional polynomials, we do not deal with this latter situation. 
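Looking back at Example 3.80, the solution can also be verified directly with the same pair encoding of S. The sketch below (ours) checks that the signed vector x = (4, 3) satisfies the balance (3.26):

EPS = float("-inf")
def oplus(x, y):  return (max(x[0], y[0]), max(x[1], y[1]))
def otimes(x, y): return (max(x[0] + y[0], x[1] + y[1]),
                          max(x[0] + y[1], x[1] + y[0]))
def balances(x, y): return max(x[0], y[1]) == max(x[1], y[0])

pos = lambda t: (t, EPS)       # t
neg = lambda t: (EPS, t)       # ominus t
bal = lambda t: (t, t)         # t balanced

M = [[pos(0.0), neg(1.0)],     # the matrix of the balance (3.26)
     [pos(3.0), bal(2.0)]]
b = [pos(2.0), pos(7.0)]
x = [pos(4.0), pos(3.0)]       # x1 = D1/D = 4, x2 = D2/D = 3

for i in range(2):
    row = oplus(otimes(M[i][0], x[0]), otimes(M[i][1], x[1]))
    assert balances(row, b[i])
print("x = (4, 3) satisfies the balance (3.26)")

Since x is positive, Theorem 3.73 then guarantees that it also solves the original Rmax equation (3.25).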
To get a better understanding of polynomial functions on S we study some particular cases. Let us start by plotting the graphs of polynomial functions of degree one, two and three (see Figure 3.15). We must study the graphs over each of the three components S⊕ , S and S• . The value of the function itself, however, is an element of S which also belongs to one of these components. Therefore the plot is quite complicated. In order to simplify the study, only the plot over S∨ is considered. Moreover we will use the exponential system of coordinates that we have discussed in the previous section. Figure 3.15 shows the points of discontinuity of the three component functions. These discontinuities always appear at abscissæ which are symmetric with respect to corner abscissæ. At these discontinuities the polynomial functions take balanced values (in the graph, we see sign changes). They correspond to what we call ‘roots’ of polyno- 3.6. Polynomials with Coefficients in S 139 S⊕ S ε y = ax ⊕ b S S⊕ S⊕ S S⊕ S⊕ S ε y = ax 2 ⊕ bx ⊕ c S ε S⊕ y = ax 3 ⊕ bx 2 ⊕ cx ⊕d S Figure 3.15: Polynomial functions of degree one, two and three mial functions. If a polynomial is of degree n , it has in general n corners and n points in S∨ where the polynomial function takes balanced values. Let us study the polynomial functions of degree two in more detail. We consider p (c) = p (0) ⊕ p (1)c ⊕ p(2)c2 , p (i ) ∈ S∨ , i = 0, 1, 2 . The polynomial function | p| ∈ P (Rmax ) is defined as p (c) = | p (0)| ⊕ | p (1)|c ⊕ | p(2)|c2 . We will later prove, but it should be clear, that if c is a root of p , then |c| is a corner of | p |. Owing to Theorem 3.43, | p| can be factored into the product of two linear polynomials. We have the following four possibilities: ◦ ◦ ◦ 1. If | p (1)/ p (2)| > | p(0)/ p (1)|, then | p | has two distinct corners c1 = | p (1)/ p (2)| ◦ ◦ and c2 = | p (0)/ p (1)|. It can be checked that p = p (2)(c ⊕ p (1)/ p (2))(c ⊕ ◦ p(0)/ p (1)) and that this factorization is unique. In addition, the moduli of the roots are c1 and c2 . ◦ ◦ ◦ 2. If | p (1)/ p (2)| = | p (0)/ p (1)|, then | p| has a corner c1 = | p (0)/ p (2)|1/2 of ◦ p (2))(c ⊕ p (0)/ p (1)) have ◦ multiplicity 2 and the roots of p = p (2)(c ⊕ p (1)/ modulus c1 . ◦ ◦ 3. If | p (1)/ p (2)| < | p (0)/ p (1)| and p (2) p (0) ∈ S⊕ , then | p | has a corner c1 = ◦ | p(0)/ p (2)|1/2 of multiplicity 2 and p cannot be factored. Indeed at c1 and c1 , ◦ p has signed values. We have p = p (2)(c2 ⊕ p (0)/ p (2)). ◦ ◦ 4. If | p (1)/ p (2)| < | p(0)/ p (1)| and p (2) p (0) ∈ S , then the polynomial | p | has a ◦ corner c1 = | p(0)/ p (2)|1/2 of multiplicity 2. We have p = p (2)(c ⊕ c1 )(c c1 ) and thus p has been factored. This discussion suggests that if the corners of | p | (now a polynomial function of degree n ) are distinct, then there are n roots; but if | p| has multiple corners, then we are not guaranteed to have a factorization in linear factors. We now study the situation in more detail. 3.6.2 Factorization of Polynomial Functions We can consider formal polynomials S[γ ] and polynomial functions P (S) with coefficients in S as we have done for polynomials with coefficients in Rmax . They define 140 Synchronization and Linearity algebras. The following mapping from P (S) into P (Rmax ) p = p (k )γ k ⊕ · · · ⊕ p (n )γ n → | p | = | p (k )|γ k ⊕ · · · ⊕ | p (n )|γ n is a surjective morphism. Definition 3.84 (Root) The root of a polynomial function p ∈ P (S) is an element c ∈ S∨ such that p(c) ∇ ε. 
Remark 3.85 It should be clear that the computation of the linear factors of a polynomial yields its roots. Indeed we have (c ci ) ∇ ε ⇔ c ci ∇ ε . i Lemma 3.86 If, for a polynomial p ∈ P (S∨ ), c is a root of p (c), then |c| is a corner of | p|. Proof If c is a root, then p (c) is balanced and hence | p (c)| = | p(|c|)| is a quantity which is achieved by two distinct monomials of p since, by assumption, the coefficients of p belong to S∨ . Thus |c| is a corner of | p |. For the same reason as in the Rmax case, the mapping F : S[γ ] → P (S) , p→p , is not injective. In order to study the set valued-inverse mapping F −1 , we introduce the notion of a closed polynomial function. Definition 3.87 (Closed polynomial function) We say that the polynomial function p ∈ P (S∨ ) is closed if F −1 ( p ) admits a maximum element denoted p ∈ P (S). This element is called the maximum representative of p . Example 3.88 The polynomial function c2 e is closed because it has the same graph as c2 ⊕ e• c e. The polynomial function c2 ⊕ e is not closed because its graph is different from that of any polynomial pa = c2 ⊕ ac ⊕ e with a ∈ {e, e, e• } although it is the same for all a < e. The notion of closed polynomial function is relevant because the inverse of the evaluation homomorphism is simple for this class, and because this class is exactly the set of polynomial functions which can be factored into linear factors. Theorem 3.89 A polynomial function p ∈ P (S∨ ) can be factored in linear factors if and only if it is closed. 3.6. Polynomials with Coefficients in S 141 Proof Any polynomial which can be factored is closed. Let us suppose that the degree of p is n , its valuation 0, and the coefficient of its head monomial is e (the proof can be generalized to avoid these assumptions). Let us suppose that the degree of p is n , its valuation 0, and the coefficient of its head monomial is e (the proof can be adapted to the general case). Let ci , i = 1, . . . , n , denote the roots of p numbered according to the decreasing order of their modulus. If the ci are all distinct, we have n p (c) = mi def with m i = i =0 i c j c n −i . j =1 Because mi > m j , ∀j > i , ∀c : ci +1 < c < ci , we cannot increase the coefficient of a monomial without changing the graph of p and therefore it is closed. The situation is more complicated when there is a multiple root because then at least three monomials take the same value in modulus at this root. To understand what happens at such a multiple root, let us consider the case when p has only the roots e or e, that is, p (c) = (c ⊕ e)n−m (c e)m . The expansion of this polynomial gives five kinds of polynomials: • c n ⊕ c n −1 ⊕ · · · ⊕ e ; • cn c n −1 ⊕ c n −2 · · · ⊕ e ; • cn c n −1 ⊕ c n −2 · · · e; • c n ⊕ e • c n −1 ⊕ e • c n −2 · · · ⊕ e ; • c n ⊕ e • c n −1 ⊕ e • c n −2 · · · e. By inspection, we verify that we cannot increase any coefficient of these polynomials without changing their graphs. Therefore, they are closed. We remark that some polynomials considered in this enumeration do not belong to P (S∨ ). For example, it is the case of (c e)2 (c ⊕ e). Some other polynomials have their coefficients in S• and do not seem to belong to the class that we study here, but they have other representatives in P (S∨ ). For example, we have (c e)(c ⊕ e) = c2 ⊕ e• c e = c2 e. Any closed polynomial can be factored. If p is closed, let p denote its maximum ◦ representative. Its coefficients ci = p (n − i )/ p (n − i + 1) are nonincreasing with i in modulus. 
Indeed, if that were not the case, there would exist i and k such that |ci −k | > |ci +1 | > |ci |. Then it would be possible to increase pn−i while preserving the inequalities |ci −k | > |ci +1 | > |ci |. Because this operation would not change the graph of p, we would have contradicted the maximality of p . Now, if the |ci | are strictly decreasing with i , we directly verify by expansion that p (c) = i (c ⊕ ci ). If the ci are simply nonincreasing, we will only examine the particular case when the ci have their modulus equal to e. There are four subcases: 142 Synchronization and Linearity 1. p (e) ∇ ε and p ( e) ∇ ε; 2. p (e) ∇ ε and p ( e) ∇ ε; 3. p (e) ∇ ε and p( e) ∇ ε; 4. p (e) ∇ ε and p ( e) ∇ ε. The first case can appear only if n is even and p (c) = cn ⊕ cn−2 ⊕ · · · ⊕ e , which contradicts the fact the ci are nonincreasing. The other cases correspond to a factorization studied in the first part of the proof. Corollary 3.90 A sufficient condition for p ∈ P (S∨ ) to be closed is that | p | has distinct corners. Example 3.91 • c2 ⊕ e is always positive and therefore cannot be factored; • c2 ⊕ c ⊕ e is closed and can be factored into (c ⊕ e)2 ; • (γ e)(γ ⊕ e)2 = (γ ⊕ e)(γ e)2 = γ 3 ⊕ e• γ 2 ⊕ e• γ ⊕ e is a maximum representative of a closed polynomial; • γ 3 ⊕ e• γ 2 ⊕ eγ ⊕ e is not a maximum representative of a closed polynomial. In the following theorem the form of the inverse function of F , which is a setvalued mapping, is made precise. Theorem 3.92 The set F −1 ( p ), with p ∈ P (S∨ ), admits the minimum element p ∈ S∨ [γ ] and the maximum element p ∈ S[γ ] which satisfy • p ≤ F −1 ( p ) ≤ p ; • p , p ∈ F −1 ( p ); • | p | = | p| ; • | p | = | p| . Proof The proof follows from the epimorphism property of p → | p |, from the corresponding result in Rmax , and from the previous result on closed polynomial functions. Example 3.93 For p = c2 ⊕ 1c ⊕ e we have p = p = γ 2 ⊕ 1γ ⊕ e. For p = c2 ⊕ (−1)c ⊕ e we have p = γ 2 ⊕ e, but p does not exist in this case. 3.7. Asymptotic Behavior of Ak 3.7 143 Asymptotic Behavior of Ak In this section we prove a max-plus algebra analogue of the Perron-Frobenius theorem, that is, we study the asymptotic behavior of the mapping k → Ak , where A is an n × n matrix with entries in Rmax . We can restrict ourselves to the case when G ( A) contains at least a circuit since, otherwise, A is nilpotent, that is, Ak = ε for k sufficiently large. We suppose that the maximum cycle mean is equal to e. If this is not the case, the matrix A is normalized by dividing all its entries by the maximum cycle mean λ. Then the behavior of the general recurrent equation is easily derived from the formula Ak = λk (λ−1 A)k . Therefore, in this section all circuits have nonpositive weights and some do have a weight equal to e = 0 which is also the maximum cycle mean. We recall that, in this situation, e is the maximum eigenvalue of A (see Remark 3.24). 3.7.1 Critical Graph of a Matrix A Definition 3.94 For an n × n normalized matrix A, the following notions are defined: Critical circuit: a circuit ζ of the precedence graph G ( A) is called critical if it has maximum weight, that is, |ζ |w = e. Critical graph: the critical graph G c ( A) consists of those nodes and arcs of G ( A) which belong to a critical circuit of G ( A). Its nodes constitute the set V c . Saturation graph: given an eigenvector y associated with the eigenvalue e, the saturation graph S ( A, y ) consists of those nodes and arcs of G ( A) such that Ai j y j = yi for some i and j with yi , y j = ε. 
Cyclicity of a graph: the cyclicity of a m.s.c.s. is the gcd (greatest common divisor) of the lengths of all its circuits. The cyclicity c(G ) of a graph G is the lcm (least common multiple) of the cyclicities of all its m.s.c.s.’s. Example 3.95 Consider the matrix e −1 A= ε ε e ε −2 ε −1 −1 ε e ε ε . e e • Its precedence graph G ( A) has three critical circuits {1}, {3, 4},{4}. • Its critical graph is the precedence graph of the matrix eεεε ε ε ε ε C= ε ε ε e . εεee 144 Synchronization and Linearity • Matrix A has the eigenvector e −1 −2 −2 associated with the eigenvalue e. The corresponding saturation graph is the precedence graph of the matrix e εεε −1 ε ε ε S= ε −1 ε e . ε εee • The cyclicity of the critical graph is 1. Indeed, the critical graph has two m.s.c.s.’s with nodes {1} and {3, 4}, respectively. The second one has two critical circuits, {4} and {3, 4}, of length 1 and 2, respectively. The cyclicity of the first m.s.c.s. is 1, the cyclicity of the second m.s.c.s. is gcd(1, 2) = 1. Therefore the cyclicity of G c ( A) is lcm(1, 1) = 1. Let us give now some simple results about these graphs which will be useful in the following subsections. Theorem 3.96 Every circuit of G c ( A) is critical. Proof If this were not the case, we could find a circuit ζ , composed of subpaths ζi of critical circuits γi , with a weight different from e. If this circuit had a weight greater than e, it would contradict the assumption that the maximum circuit weight of G ( A) is e. If the weight of ζ were less than e, the circuit ζ composed of the union of the complements of ζi in γi would be a circuit of weight greater than e and this would also be a contradiction. Corollary 3.97 Given a pair of nodes (i, j ) in G c ( A), all paths connecting i to j in G c ( A) have the same weight. Proof If there exists a path p from i to j in G c ( A), it can be completed by a path p from j to i also in G c ( A) to form a critical circuit. If there exists another path p from i to j in G c ( A), the concatenations of p and p on the one hand, and of p and p on the other hand, form two critical circuits with the same weight. Hence p and p must have the same weight. Theorem 3.98 For each node in a saturation graph, there exists a circuit upstream in this graph. The circuits of any saturation graph belong to G c ( A). Proof Indeed, if i is one of its nodes, there exists another node j , upstream with respect to i , such that yi = Ai j y j ; yi , y j = ε. The same reasoning shows that there exists another node upstream with respect to j , etc. Because the number of nodes of S ( A, y ) is finite, the path (i, j , . . . ) obtained by this construction contains a circuit. A circuit (i0 , i1 , . . . , ik , i0 ) of a saturation graph S ( A, y ) satisfies yi1 = Ai1 i0 yi0 , . . . , yi0 = Ai0 ik yik . The multiplication of all these equalities shows that the weight of the circuit (i, i1 , . . . , ik , i ) is e. 3.7. Asymptotic Behavior of Ak 145 Example 3.99 In Example 3.95, node 2 has the critical circuit (indeed a loop) {1} upstream. 3.7.2 Eigenspace Associated with the Maximum Eigenvalue In this subsection we describe the set of eigenvectors of matrix A associated with the eigenvalue e. Clearly, this set is a moduloid. We characterize a nonredundant set of generators of this moduloid as a subset of the columns of A+ . Here the word ‘eigenvector’ must be understood as ‘eigenvector associated p with the eigenvalue e’. We will use the notation Ai j for ( A p )i j and A+ for ij + ( A )i j . 
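Before stating the structure theorems, the notions of Definition 3.94 can be illustrated by brute force on a small matrix. The sketch below (ours; the 3 × 3 matrix is our own toy example, not the matrix of Example 3.95, and the enumeration is only intended for toy sizes) computes the maximum cycle mean and the critical circuits:

from itertools import permutations

EPS = float("-inf")
A = [[ 0.0, -2.0,  EPS],   # A[i][j] = weight of the arc from node j to node i
     [-1.0,  EPS,  EPS],
     [ EPS, -1.0,  0.0]]
n = len(A)

def elementary_circuits():
    # brute-force enumeration, one representative per rotation
    for r in range(1, n + 1):
        for nodes in permutations(range(n), r):
            if nodes[0] == min(nodes):
                yield nodes

def weight(c):
    return sum(A[c[(k + 1) % len(c)]][c[k]] for k in range(len(c)))

means = {c: weight(c) / len(c) for c in elementary_circuits() if weight(c) > EPS}
lam = max(means.values())
critical = [c for c, m in means.items() if m == lam]
print("maximum cycle mean:", lam)       # 0.0: the matrix is already normalized
print("critical circuits :", critical)  # the loops at nodes 0 and 2 (0-indexed)
# each m.s.c.s. of the critical graph is a loop of length 1, so its cyclicity
# is 1 and the cyclicity of the critical graph is lcm(1, 1) = 1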
Theorem 3.100 If y is an eigenvector of A, it is also an eigenvector of A+ . It is the linear combination of the columns A+ , i ∈ V c . More precisely, ·i yi A+ . ·i y= (3.27) i ∈V c Proof The first part of the theorem is trivial. Let us prove Formula (3.27). Consider two nodes i and j in the same m.s.c.s. of the saturation graph S ( A, y ). There exists a path (i, i1 , . . . , ik , j ) which satisfies yi1 = Ai1 i yi , ... , y j = A j ik yik . Therefore, y j = w yi with w = A j ik · · · Ai1 i ≤ A+i , j and we have + Al+ y j = Al+ w yi ≤ Al+ A+i yi ≤ Ali yi , j j jj ∀l . (3.28) We could have chosen i in a circuit of the saturation graph according to Theorem 3.98. This i will be called i ( j ) in the following. We have yl Al+ y j j = j ∈S ( A , y ) ≤ ≤ (by definition of S ( A, y ) + Ali ( j ) yi ( j ) j ∈S ( A , y ) + Ali yi i ∈V c , (by (3.28)) ∀l , where the last inequality stems from the fact that i ( j ), belonging to a circuit of a saturation graph, belongs also to V c by Theorem 3.98. The reverse inequality is derived immediately from the fact that y is an eigenvector of A+ . 146 Synchronization and Linearity Theorem 3.101 Given a matrix A with maximum circuit weight e, any eigenvector c associated with the eigenvalue e is obtained by a linear combination of N A columns of c + c A , where N A denotes the number of m.s.c.s.’s of G ( A). More precisely we have 1. the columns A+ , i ∈ V c , are eigenvectors; ·i 2. if nodes i and j belong to the same m.s.c.s. of G c ( A) , then A+ and A+ are ·i ·j ‘proportional’; 3. no A+ can be expressed as a linear combination of columns A+ which only ·i ·j makes use of nodes j belonging to m.s.c.s.’s of G c ( A) distinct from [i ]. Proof The first statement has already been proved in Theorem 3.23. Consider now the second statement: since A+ A+ ≤ A+ and A+ A+i = e, if nodes i and j belong to the ij j same m.s.c.s. of G c ( A), hence to the same critical circuit by Theorem 3.96, we have + + Al+ A+i ≤ Ali = Ali A+ A+i ≤ Al+ A+i , jj ij j jj ∀l , which shows that + Al+ A+i = Ali , jj ∀l . c This result, together with (3.27), show that N A columns of A+ are sufficient to generate all eigenvectors. The third statement of the theorem claims that we cannot further reduce this number of columns. Otherwise, one column of A+ , say i , could be expressed as a linear combination of other columns of A+ selected in other m.s.c.s.’s of G c ( A). Let K denote the set of columns involved in this linear combination. Then, construct a matrix B as follows. Let J = K ∪ {i }. Matrix B is obtained from A+ by deleting all rows and columns with indices out of J . By construction, Bkk = e, ∀k , and the weights of all circuits of G ( B ) are less than or equal to e. The linear combination of the columns of A+ is preserved when restricting this combination to matrix B . Owing to the multilinearity and antisymmetry of the determinant, and to the decomposition of any permutation into circular permutations, det B = e• . Since k Bkk = e, there must exist another permutation ς such that k Bkς(k) = e. Stated differently, there must exist a critical circuit connecting some m.s.c.s.’s of G c ( A) and this yields a contradiction. Example 3.102 If we return to Example 3.95, matrix A+ is equal to e −1 −2 −2 eε −1 ε −1 e −1 e ε ε . e e Nodes 1, 3, 4 belong to G c ( A) which has two m.s.c.s.’s, namely {1} and {3,4}. Columns 1 and 3 are independent eigenvectors. Column 4 is equal to column 3. 3.7. 
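Theorem 3.101 can also be checked numerically. The sketch below (ours, on the same toy matrix as in the previous sketch) computes A+ = A ⊕ A² ⊕ ··· ⊕ Aⁿ and verifies that its columns indexed by nodes of the critical graph are eigenvectors associated with the eigenvalue e = 0:

EPS = float("-inf")
A = [[ 0.0, -2.0,  EPS],
     [-1.0,  EPS,  EPS],
     [ EPS, -1.0,  0.0]]
n = len(A)

def mat_mul(X, Y):
    return [[max(X[i][k] + Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    return [[max(X[i][j], Y[i][j]) for j in range(n)] for i in range(n)]

A_plus, power = A, A
for _ in range(n - 1):
    power = mat_mul(power, A)
    A_plus = mat_add(A_plus, power)

critical_nodes = [0, 2]                 # the two loops of weight e found above
for i in critical_nodes:
    col = [A_plus[k][i] for k in range(n)]
    image = [max(A[k][j] + col[j] for j in range(n)) for k in range(n)]
    assert image == col                 # A (x) col = col
print("the columns of A+ indexed by the critical nodes are eigenvectors")

Here the two columns are not proportional, because the two loops lie in different m.s.c.s.'s of the critical graph, in accordance with the third statement of the theorem.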
Asymptotic Behavior of Ak 3.7.3 147 Spectral Projector In this subsection we build the spectral projector on the eigenspace (invariant moduloid) associated with the eigenvalue e. Definition 3.103 (Spectral projector) A matrix Q satisfying AQ = Q A = Q 2 is called a spectral projector of A associated with the eigenvalue e. Theorem 3.104 The matrices Q i = A+ A+ , ·i i · def i ∈ Vc , (3.29) are spectral projectors of A. Proof The properties AQ i = Q i A = Q i follow from the fact that the columns A+ , i ∈ ·i V c , are eigenvectors of A (from which we deduce by transposition that the rows of A+ such that A+ = e are left eigenvectors). Let us prove that Q 2 = Q i , that is, ii i A+ A+ = ·i i · A+ A+ A+ A+ . ·i ik ki i · k This relation is true because A+ ii = e implies def Theorem 3.105 The matrix Q = (3.29), is a spectral projector. i ∈V c k A+ A+ = e. ik ki Q i , where the matrices Q i are defined by Proof The only nontrivial fact to prove is that Q 2 = Q . This relation will be proved if we prove that Q i Q j ≤ Q i ⊕ Q j . This last inequality is true because it means that the greatest weight of the paths connecting a pair of nodes and traversing i and j is less than the maximum weight of the paths connecting the same pair of nodes and traversing either i or j . Example 3.106 Continuing with Example 3.95, we obtain two elementary spectral projectors: e eεε e −1 −1 ε ε −1 Q1 = −2 e e ε ε = −2 −2 ε ε , −2 −2 ε ε −2 ε εεε ε ε ε ε ε ε Q 2 = −2 −1 e e = −2 −1 e e , e −2 −1 e e e e eεε −1 −1 ε ε Q = Q1 ⊕ Q2 = −2 −1 e e . −2 −1 e e 148 3.7.4 Synchronization and Linearity Convergence of Ak with k In this subsection we give a necessary and sufficient condition for the convergence of the powers of matrix A. To achieve this goal, we equip Rmax with the topology: when n → +∞, def x n → x ⇔ |x n − x |e = | exp(x n ) − exp(x )| → 0 . The purpose of this topology is to simplify the study of the convergence towards ε. Indeed, because exp (ε) = e, limn |x n − ε|e = 0 and limn |x n − ε|e = 0 imply that limn |x n − x n |e = 0. This property, which is not true with respect to the usual absolute value in R, is useful for the asymptotic cyclicity notion that we will introduce later on. We first recall the following result on the diophantine linear equation (see [33]). Lemma 3.107 For all p and n which are coprime and for all q ≥ ( p − 1)(n − 1), there exist two integers a (q ) and b (q ) such that q = a (q ) p + b (q )n. Theorem 3.108 A necessary and sufficient condition to have limk→∞ Ak = Q is that the cyclicity-see Definition 3.94-of each m.s.c.s. of G c ( A) is equal to 1. Proof Let us prove the sufficient condition first. Consider a node i in G c ( A). For any other node j , there exists a path from i to j of maximum weight (possibly equal to ε). If there happens to be more than one such path, we take the one with the least length and call this length p (i, j ) which is less than n − 1 if A is an n × n matrix. If the maximum weight is ε, we consider that the length is equal to 1. By Lemma 3.107 and the assumption on cyclicity, there exists some integer M (i ) such that, for all m greater than M (i ), there exists a critical circuit of length m in [i ]G c ( A) (the m.s.c.s. of G c ( A) to which i belongs). Therefore, because the maximum weight of the circuits is e, any maximum-weight path from i to j of length q greater than M (i ) + p is composed of a critical circuit of length q − p traversing i and a maximum-weight path from i to j (of q any length). 
Therefore A j i = A+i for all q greater than p + M (i ) and this holds for all j q + i in G c ( A). Since Aii = e, we also have A j i = A+i A+ . j ii Consider now another node l which does not belong to G c ( A). Let i be a node in p G c ( A), let q be large enough and let p ≤ n be such that Ail = A+ (such a p exists il because circuits have weights less than e, hence the lengths of maximum-weight paths between any two nodes do not need to exceed n ). We have q− p q A jl ≥ A j i Ail = A+i A+ , j il p where the inequality is a consequence of the matrix product in Rmax whereas the equality arises from the previous part of the proof and from the property of p . If we have a strict inequality, it means that the paths with maximum weight from j to l do not traverse i , and since this is true for any i in G c ( A), these paths do not traverse G c ( A). On the other hand, for q large enough, they must traverse some circuits which therefore have a strictly negative weight. When q increases, these paths have weights arbitrarily close to ε. Finally, this situation is possible only if there is no node of G c ( A) located downstream of l in G ( A). In this case A+ = ε for all i in G c ( A) and therefore il A+i A+ . j il q lim A jl = ε = q →∞ i ∈V c 3.7. Asymptotic Behavior of Ak 149 By collecting all the above results, we have proved that A+i A+ , j il q lim A jl = q →∞ ∀ j , l ∈ G ( A) . i ∈V c Conversely, suppose that the above limit property holds true and that at the same time the cyclicity of G c ( A) is strictly greater than 1. Let us consider a node i ∈ G c ( A) (hence A+ = e). We have ii exp( Ak×d ) = exp(e) = exp( Ak×d +1 ) + η , ii ii where η can be arbitrarily small because of the assumed limit. But Ak×d +1 = Aii ii for 0 ≤ p ≤ n (again because circuits have nonpositive weights). Therefore, Ak×d +1 ii can assume values out of a finite set. From the relation above, it should be clear that Ak×d +1 = e. This means that there exists a circuit of length k × d + 1. But the gcd of ii kd and k × d + 1 is 1, which is a contradiction. p Theorem 3.109 Suppose that G ( A) is strongly connected. Then there exists a K such that ∀k ≥ K , Ak = Q , if and only if the cyclicity of each m.s.c.s. of G c ( A) is equal to 1. Proof The proof is similar to the previous one. The only difference lies in the second part of the ‘if’ part. Under the assumption that G ( A) is strongly connected, a path of maximum weight from l to j with length large enough necessarily crosses G c ( A). Therefore, for q large enough we have A jl = A+i A+ , j il q where i belongs to G c ( A). Example 3.110 • Using Example 3.95 once more, we have e e −1 −1 2 A = −2 −2 ε −1 e −1 3 4 A = A = ··· = −2 −2 ε ε e e ε ε , e e eε −1 ε −1 e −1 e ε ε . e e Therefore An , n ≥ 3, is equal to Q 1 ⊕ Q 2 given in Example 3.106. 150 Synchronization and Linearity • In the previous example the periodic regime is reached after a finite number of steps. This is true for the submatrix associated with the nodes of the critical graph but it is not true in general for the complete matrix. To show this, take the example −1 ε A= . εe • In the previous example there is an entry which goes to ε. When all entries converge to a finite number, the periodic regime can be reached also, but the time needed may be arbitrarily long. Consider the matrix A= −η e −1 e . The matrix Ak converges to the matrix −1 −1 e e if η is a small positive number. But we have to wait for a power of order 1/η to reach the asymptote. 
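On the same toy matrix as before, the convergence asserted by Theorem 3.108 can be watched directly: every m.s.c.s. of the critical graph has cyclicity 1, so the powers A^k converge to the spectral projector Q, and in this particular example they actually reach it exactly from k = 2 on. The following sketch (ours) builds Q from the columns and rows of A+ as in Formula (3.29) and compares it with A^k:

EPS = float("-inf")
A = [[ 0.0, -2.0,  EPS],
     [-1.0,  EPS,  EPS],
     [ EPS, -1.0,  0.0]]
n = len(A)

def mat_mul(X, Y):
    return [[max(X[i][k] + Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    return [[max(X[i][j], Y[i][j]) for j in range(n)] for i in range(n)]

A_plus, power = A, A
for _ in range(n - 1):
    power = mat_mul(power, A)
    A_plus = mat_add(A_plus, power)

def projector(i):                      # Q_i = A+_{.i} (x) A+_{i.}
    return [[A_plus[k][i] + A_plus[i][j] for j in range(n)] for k in range(n)]

Q = mat_add(projector(0), projector(2))

Ak = A
for k in range(2, 6):
    Ak = mat_mul(Ak, A)
    print("A^%d equals Q:" % k, Ak == Q)   # True from k = 2 onwards

On a matrix whose critical graph has cyclicity greater than 1, the same experiment would instead exhibit the periodic behavior described in the next subsection.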
3.7.5 Cyclic Matrices In this subsection we use the previous theorem to describe the general behavior of the successive powers of matrix A which turns out to be essentially cyclic. Definition 3.111 Cyclicity of a matrix: a matrix A is said to be cyclic if there exist d and M such that ∀m ≥ M , Am +d = Am . The least such d is called the cyclicity of matrix A and A is said to be d-cyclic. Asymptotic cyclicity: a matrix A is said to be asymptotically cyclic if there exists d such that, for all η > 0, there exists M such that, for all m ≥ M, supi j |( Am +d )i j −( Am )i j |e ≤ η . The least such d is called the asymptotic cyclicity of matrix A and A is said to be d-asymptotically cyclic. Theorem 3.112 Any matrix is asymptotically cyclic. The asymptotic cyclicity d of matrix A is equal to the cyclicity ρ of G c ( A). Moreover if G ( A) and G ( Aρ ) are connected, the matrix is ρ -cyclic. Proof This result has already been proved in the case ρ = 1. For any matrix A, if we consider B = Aρ , then the asympotic cyclicity ρ of B is equal to 1. Indeed, the nodes of the critical graph of B are a subset of G c ( A), and around each such node there exists a loop. The necessary and sufficient conditions of convergence of the powers of a matrix can be applied to B (see Theorems 3.108 and 3.109). They show the convergence (possibly in a finite number of stages) of B k = Ak×ρ to the spectral 3.8. Notes 151 projector Q associated with B . Because any m can be written h + k × ρ , Am = Ah+k×ρ = Ah B k converges to Ah Q when k goes to infinity. This is equivalent to saying that matrix A is d -asymptotically-cyclic (or d -cyclic in the case of finite convergence), with d ≤ ρ . Let us prove that the asymptotic cyclicity d of matrix A is greater than or equal to ρ (hence it is equal to ρ ). The proof when the matrix is not only asymptotically cyclic but cyclic is similar. Consider a node of the m.s.c.s. l = [i ]G c ( A) of G c ( A) and let ρl denote its cyclicity. By definition of d , we have exp Ak×ρl = exp(e) = exp( Ak×ρl +d ) + η , ii ii k ×ρ k ×ρ + d for η arbitrarily small and for k large enough. Therefore Aii l = Aii l and there is a circuit of length d in the m.s.c.s. l of G c ( A). Therefore ρl divides d . But this is true for all m.s.c.s.’s of G c ( A) and therefore d is divided by the lcm of all the ρl which is ρ . Example 3.113 The matrix −1 A= ε ε has cyclicity 2. Indeed, −2n ε A2n = ε 3.8 ε e ε ε ε , e ε ε e ε e ε A2n+1 −(2n + 1) ε = ε ε ε e ε e . ε Notes The max-plus algebra is a special case of a more general structure which is called a dioid structure. This is the topic of the next chapter. Nevertheless the max-plus algebra, and the algebras of vector objects built up on it, are important examples of dioids for this book because they are perfectly tailored to describe synchronization mechanisms. They were also used to compute paths of maximum length in a graph in operations research. This is why they are sometimes called ‘path algebra’ [67]. Linear systems of equations in the max-plus algebra were systematically studied in [49]. Some other very interesting references on this topic are [67], [130]. In these references the Gauss elimination algorithm can be found. It was not discussed here. Linear dependence has been studied in [49], [65], [93], [126] and [62]. Several points of view exist but none of them is completely satisfactory. Moreover the geometry of linear manifolds in the max-plus algebra is not well understood (on this aspect, see [126]). 
The only paper that we know on a systematic study of polynomial and rational functions in the max-plus algebra is [51]. In this paper one can find some results on rational functions not detailed in this book. 152 Synchronization and Linearity The symmetrization of the max-plus algebra was discussed earlier in [109] and [110]. The presentation given here is based on these references. This symmetrization is more deeply studied in [62]. Reference [65] has been an important source of ideas even though symmetrization has been avoided in this paper. The proof of the Cramer formula is mainly due to S. Gaubert and M. Akian. Relevant references are [118], [124], [93]. The attempt made here to discuss polynomials in S is new. It could give a new insight into the eigenvalue problem. Because of the lack of space this discussion has not been continued here. The section on the max-plus Perron-Frobenius theorem is a new version of the report [37]. The proof is mainly due to M. Viot. Some other relevant references are [64], [49], [127]. Chapter 4 Dioids 4.1 Introduction In previous chapters, the set R ∪ {−∞} (respectively R ∪ {+∞}) endowed with the max (respectively the min) operation as addition and the usual addition as multiplication has appeared as a suitable algebraic structure for obtaining ‘linear’ models of some discrete event systems. In Chapter 5 it will be shown that another slightly more complex structure is also appropriate for the same class of systems. All these algebraic structures share some common features that will be studied in the present chapter. However, there is yet no universal name nor a definite set of axioms everybody agrees upon in this general field. We refer the reader to the notes at the end of this chapter where some related works are briefly discussed. Here we adopt the following point of view: we introduce a first ‘minimal’ set of axioms according to what seems to be the intersection of axioms generally retained in the works alluded to above, and also according to what seems to be appropriate for the linear system theory we are going to develop in Chapter 5. Starting from this minimal set of axioms, we derive some basic results. To obtain further results we may need to introduce some additional assumptions or properties, which we do only when necessary: it is then clear where this added structure is really needed. We use the word ‘dioid’ as the generic name for the algebraic structure studied in this chapter. The linguistic roots of this name and its introduction in the literature are discussed in the notes section. Dioids are structures that lie somewhere between conventional linear algebra and semilattices endowed with an internal operation generally called multiplication. With the former, it shares combinatorial properties such as associativity and commutativity of addition, associativity of multiplication, distributivity of multiplication with respect to addition, and of course the existence of zero and identity elements. With the latter, it shares the features of an ordered structure (adding is then simply taking the upper bound) endowed with another ‘compatible’ operation. Therefore one may expect that the results of linear algebra which depend only on combinatorial properties will generalize to dioids. A typical case is the Cayley-Hamilton theorem. 
On the other hand, since neither addition nor multiplication are invertible in general dioids (in this respect the max-plus algebra is special since the structure associated with + is a group), one appeals to the classical theory of residuation in lattice structures to provide alternative 153 154 Synchronization and Linearity notions of inversion of the basic operations and of other order-preserving mappings. This yields a way to ‘solve’ some equations in a certain sense related to the order structure even if there is no solution in a more classical sense. A section of this chapter is devoted to rational calculus, in the sense in which this expression is used in automata and formal languages theory. The motivation in terms of system theory and especially in terms of realization theory should be clear and this will be illustrated by Chapter 5. However, the problem of minimal realization is yet unsolved in the present framework (see also Chapters 6 and 9). In this chapter we will also be interested in constructing more elaborate dioids from given basic dioids. This is generally done by considering the quotient of simple dioids by certain ‘congruences’ (equivalence relations which are compatible with the original dioid structure). Particular congruences will be considered for their usefulness regarding the developments in Chapter 5. As a motivation, the reader may think of this quotient operation yielding a ‘coarser’ dioid as a way to ‘filter’ elements of the original dioid that are ‘undesirable’ for the system theory one is concerned with. For example, if trajectories of discrete event systems are the basic objects, one is often interested in nondecreasing trajectories whereas the basic dioid may also contain nonmonotonic trajectories. Nonmonotonic trajectories are then mapped to nondecreasing ones in a canonical way by special congruences. 4.2 Basic Definitions and Examples 4.2.1 Axiomatics Definition 4.1 (Dioid) A dioid is a set D endowed with two operations denoted ⊕ and ⊗ (called ‘sum’ or ‘addition’, and ‘product’ or ‘multiplication’) obeying the following axioms: Axiom 4.2 (Associativity of addition) ∀a , b , c ∈ D , (a ⊕ b ) ⊕ c = a ⊕ (b ⊕ c) . Axiom 4.3 (Commutativity of addition) ∀a , b ∈ D , a⊕b = b⊕a . Axiom 4.4 (Associativity of multiplication) ∀a , b , c ∈ D , (a ⊗ b ) ⊗ c = a ⊗ (b ⊗ c) . Axiom 4.5 (Distributivity of multiplication with respect to addition) ∀a , b , c ∈ D , (a ⊕ b ) ⊗ c = (a ⊗ c) ⊕ (b ⊗ c) , c ⊗ (a ⊕ b ) = c ⊗ a ⊕ c ⊗ b . This is right, respectively left, distributivity of product with respect to sum. One statement does not follow from the other since multiplication is not assumed to be commutative. 4.2. Basic Definitions and Examples 155 Axiom 4.6 (Existence of a zero element) ∃ε ∈ D : ∀a ∈ D , a⊕ε = a . Axiom 4.7 (Absorbing zero element) ∀a ∈ D , a⊗ε = ε⊗a =ε . Axiom 4.8 (Existence of an identity element) ∃e ∈ D : ∀a ∈ D , a⊗e = e⊗a = a . Axiom 4.9 (Idempotency of addition) ∀a ∈ D , a⊕a =a . Definition 4.10 (Commutative dioid) A dioid is commutative if multiplication is commutative. Most of the time, the symbol ‘⊗’ is omitted as is the case in conventional algebra. Moreover, a k , k ∈ N, will of course denote a ⊗ · · · ⊗ a and a 0 = e. k times With the noticeable exception of Axiom 4.9, most of the axioms of dioids are required for rings too. Indeed, Axiom 4.9 is the most distinguishing feature of dioids. 
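Since the reader is invited to check the axioms, here is a small numerical spot-check (ours) for the most familiar of these dioids, Rmax = (R ∪ {−∞}, max, +):

import random

EPS, E = float("-inf"), 0.0            # zero and identity elements
oplus  = max
otimes = lambda a, b: a + b

sample = [EPS, E] + [random.uniform(-5, 5) for _ in range(15)]
for a in sample:
    assert oplus(a, EPS) == a          # zero element
    assert otimes(a, E) == a           # identity element
    assert otimes(a, EPS) == EPS       # absorbing zero
    assert oplus(a, a) == a            # idempotency of addition
    for b in sample:
        assert oplus(a, b) == oplus(b, a)
        for c in sample:
            assert otimes(oplus(a, b), c) == oplus(otimes(a, c), otimes(b, c))
print("dioid axioms hold on the sampled elements of Rmax")

Such a test proves nothing, of course, but it is a convenient way to experiment with the less familiar examples that follow.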
Because of this axiom, addition cannot be cancellative, that is, a ⊕ b = a ⊕ c does not imply b = c in general, for otherwise D would be reduced to ε (see Theorem 3.62). In fact, Axiom 4.9 is at the basis of the introduction of an order relation; as mentioned in the introduction, this is the other aspect of dioids, their lattice structure. This aspect is dealt with in §4.3. Multiplication is not necessarily cancellative either (of course, because of Axiom 4.7, cancellation would anyway only apply to elements different from ε). We refer the reader to Example 4.15 below. A weaker requirement would be that the dioid be ‘entire’. Definition 4.11 (Entire dioid) A dioid is entire if ab = ε ⇒ a = ε or b = ε . If a = ε, b = ε, and ab = ε, then a and b are called zero divisors. Hence, an entire dioid is a dioid which does not contain zero divisors. Not every dioid is entire (see Example 4.81 below). If multiplication is cancellative, the dioid is entire. As a matter of fact, ab = ε ⇒ ab = a ε ⇒ b = ε if a = ε by cancellation of a . 4.2.2 Some Examples For the following examples of dioids, we let the reader check the axioms and define what ε and e should be. All of them are commutative dioids. 156 Synchronization and Linearity Example 4.12 The first example of a dioid encountered in this book was R ∪ {−∞} with max as ⊕ and + as ⊗. It was denoted Rmax . Example 4.13 (R ∪ {+∞}, min, +) is another dioid which is isomorphic—this terminology is precisely defined later on—to the previous one by the compatible bijection: x → −x . It will be denoted Rmin . Example 4.14 Using the bijection x → exp (x ), R ∪ {−∞} is mapped onto R+ . For this bijection to preserve the dioid structure of Rmax , one has to define ⊕ in R+ as max again and ⊗ as × (the conventional product). This yields the dioid (R+ , max , ×). Example 4.15 Consider the set R ∪ {−∞} ∪ {+∞} and define ⊕ as max and ⊗ as min. Example 4.16 In the previous example, replace the set by {0, 1} and keep the same operations: this is the Boole algebra and also the unique dioid (up to an isomorphism) reduced to {ε, e}. Example 4.17 Let 2R denote the set of all subsets of the R2 plane, including ∅ and the whole R2 itself. Then define ⊕ as ∪ and ⊗ as +, that is, the ‘vector sum’ of subsets 2 ∀ A , B ⊆ R2 , A ⊗ B = A + B = x ∈ R2 | x = y + z , y ∈ A , z ∈ B . Example 4.18 A similar example in dimension 1 is provided by considering the subset of 2R consisting only of half-lines infinite to the left, that is, intervals (−∞, x ] for all x ∈ R, including ∅ but not R itself, with again ∪ as ⊕ and + as ⊗. Observe that this subset of half-lines is closed—see below—for these two operations. This dioid is isomorphic to Rmax by the bijection x ∈ R → (−∞, x ] ∈ 2R and ε = −∞ → ∅. In all the examples above, except Examples 4.15 and 4.17, ⊗ induces a group structure on D \ {ε} (D minus ε). This implies of course that ⊗ is cancellative. Obviously ⊗ is not cancellative in Example 4.15. This is also true for Example 4.17: this fact follows from Theorem 4.35 below. However, in both cases the dioid is entire. 4.2.3 Subdioids Definition 4.19 (Subdioid) A subset C of a dioid is called a subdioid of D if • ε ∈ C and e ∈ C ; • C is closed for ⊕ and ⊗. The second statement means that ∀a , b ∈ C , a ⊕ b ∈ C and a ⊗ b ∈ C . We emphasize the first condition. For example, the dioid in Example 4.16 (Boole algebra) is not a subdioid of the one in Example 4.15. The dioid (N ∪ {−∞}, max , +) is a subdioid of Rmax . 4.2. 
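Example 4.17 can be experimented with directly when the subsets are finite sets of integer points. In the sketch below (ours), ⊕ is set union, ⊗ is the vector sum, ε is the empty set and e = {(0, 0)}; the three sample sets are arbitrary:

def oplus(X, Y):
    return X | Y

def otimes(X, Y):
    return {(x1 + y1, x2 + y2) for (x1, x2) in X for (y1, y2) in Y}

EPS, E = set(), {(0, 0)}               # zero and identity elements
A = {(0, 0), (1, 2)}
B = {(2, -1)}
C = {(0, 1), (3, 3)}

assert oplus(A, A) == A                # idempotency of addition
assert oplus(A, EPS) == A              # zero element
assert otimes(A, E) == A               # identity element
assert otimes(A, EPS) == EPS           # absorbing zero
assert otimes(oplus(A, B), C) == oplus(otimes(A, C), otimes(B, C))   # distributivity
print("dioid axioms of Example 4.17 checked on small point sets")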
Basic Definitions and Examples 4.2.4 157 Homomorphisms, Isomorphisms and Congruences Most of the material in this subsection is not very specific to the dioid structure and can be found in elementary textbooks on algebra. Here, we reconsider this material in the framework of dioids. from a dioid D into another dioid C Definition 4.20 (Homomorphism) A mapping is a homomorphism if ∀a , b ∈ D , (a ⊕ b ) = (a ) ⊕ (b ) and (ε) = ε , (4.1) (a ⊗ b) = (a ) ⊗ (b ) and (e) = e . (4.2) Of course the operations and neutral elements on the left-hand (respectively right-hand) side are those of D (respectively C ). If is surjective, it is clear that the former part of (4.1) (respectively (4.2)) implies the latter part which is thus redundant. A mapping having only property (4.1) will be called ‘a ⊕-morphism’, and a mapping having property (4.2) will be called ‘a ⊗-morphism’. Definition 4.21 (Isomorphism) A mapping from a dioid D into another dioid C is an isomorphism if −1 is defined over C and and −1 are homomorphisms. is a homomorphism from D to C and if it is a bijection, then it is an Lemma 4.22 If isomorphism. Proof It suffices to prove that −1 satisfies (4.1)–(4.2). Applying (4.1) to a = −1 (x ), x ∈ C and b = −1 ( y ), y ∈ C , we get −1 (x ) ⊕ −1 −1 ( y) = −1 (x ) ⊕ ( y) = x ⊕ y , and therefore −1 (x ) ⊕ −1 ( y) = −1 (x ⊕ y ) . Also (ε) = ε ⇒ ε = which proves that −1 −1 (ε) , is a ⊕-morphism. The same reasoning can be applied to (4.2). Definition 4.23 (Congruence) A congruence in a dioid D is an equivalence relation (denoted ≡) in D which is compatible with ⊕ and ⊗, that is, ∀a , b , c ∈ D , a ≡ b ⇒ a⊕c ≡ b⊕c , and the same for ⊗. Lemma 4.24 The quotient of D by a congruence (that is, the set of equivalence classes) is a dioid for the addition and multiplication induced by those of D. 158 Synchronization and Linearity Proof The main difficulty here is to show how ⊕ and ⊗ can be properly defined in the quotient. Let [a ] denote the equivalence class of a (b ∈ [a ] ⇔ b ≡ a ⇔ [b ] = [a ]). Then define [a ] ⊕ [b ] by [a ⊕ b ]. This definition is correct because if a ∈ [a ] and b ∈ [b ], then [a ⊕ b ] = [a ⊕ b ] from the compatibility of ≡ with ⊕, that is, [a ⊕ b ] only depends on [a ] and [b ], not on particular representatives of these classes. The same considerations apply to ⊗ too. Example 4.25 One special instance of a congruence is the following. Let be a homomorphism from a dioid D to another dioid C . We can define an equivalence relation in D as follows: ∀a , b ∈ D , a≡b⇔ (a ) = (b ) . (4.3) Corollary 4.26 If is a homomorphism, ≡ is a congruence. Therefore, the quotient set denoted D/ is a dioid; it is isomorphic to (D). The proof is straightforward. Of course, D/ is isomorphic to D if 4.3 Lattice Properties of Dioids 4.3.1 is injective. Basic Notions in Lattice Theory Hereafter, we list a few basic notions from lattice theory, the main purpose being to make our vocabulary more precise, especially when there are some variations with respect to other authors. The interested reader may refer to [22] or [57]. In a set, we adopt the following definitions. Order relation: a binary relation (denoted ≥) which is reflexive, transitive and antisymmetric. Total (partial) order: the order is total if for each pair of elements (a , b ), the order relation holds true either for (a , b ) or for (b , a ), or otherwise stated, if a and b are always ‘comparable’; otherwise, the order is partial. 
Ordered set: a set endowed with an order relation; it is sometimes useful to represent an ordered set by an undirected graph the nodes of which are the elements of the set; two nodes are connected by an arc if the corresponding elements are comparable, the greater one being higher in the diagram; the minimal number of arcs is represented, the other possible comparisons being derived by transitivity. Figure 4.1 below gives an example of such a graph called a ‘Hasse diagram’. Chain: a totally ordered set; its Hasse diagram is ‘linear’. Please note that the following elements do not necessarily exist. Top element (of an ordered set): an element which is greater than any other element of the set (elsewhere also called ‘universal’). 4.3. Lattice Properties of Dioids 159 Bottom element (of an ordered set): similar definition (elsewhere also called ‘zero’, but we keep this terminology for the neutral element of addition, although, as it will be seen hereafter, both notions coincide in a dioid). Maximum element (of a subset): an element of the subset which is greater than any other element of the subset; if it exists, it is unique; it coincides with the top element if the subset is equal to the whole set. Minimum element (of a subset): similar definition. Maximal element (of a subset): an element of the subset which is not less than any other element of the subset; Figure 4.1 shows the difference between a maximum and a maximal element; if the subset has a maximum element, it is the unique maximal element. Majorant (of a subset): an element not necessarily belonging to the subset which is greater than any other element of the subset (elsewhere also called ‘upper bound’ but we keep this terminology for a notion introduced below); if a majorant belongs to the subset, it is the maximum element. Minorant (of a subset): similar definition (elsewhere also called ‘lower bound’, but we reserve this for a more specific notion). Upper bound (of a subset): the least majorant, that is, the minimum element of the subset of majorants (elsewhere, when ‘majorant’ is called ‘upper bound’, this notion is called ‘least upper bound’). Lower bound (of a subset): similar definition (elsewhere also called ‘greatest lower bound’). Maximum element of subset Top element Majorants of subsets and Maximal elements of subset Figure 4.1: Top, maximum, majorants and maximal elements The following items introduce more specific ordered sets and mappings between these sets. 160 Synchronization and Linearity Sup-semilattice: an ordered set such that there exists an upper bound for each pair of elements. Inf-semilattice: similar definition. Lattice: an ordered set which is both a sup- and an inf-semilattice. Complete sup-semilattice: an ordered set such that there exists an upper bound for each finite or infinite subset. Complete inf-semilattice: similar definition. Complete lattice: obvious definition. Distributive lattice: let a ∨ b (respectively, a ∧ b ) denote the upper (respectively, lower) bound of a and b in a lattice; then the lattice is distributive if ∀a , b , c , a ∨ (b ∧ c) = (a ∨ b ) ∧ (a ∨ c) ; in fact, as shown in [57, p. 188], if this equality holds true, the same equality with ∨ and ∧ interchanged also holds true, and conversely. Isotone mapping: a mapping from an ordered set D into an ordered set C such that ∀a , b ∈ D , a≥b⇒ (a ) ≥ (b ) . We conclude this brief enumeration (more facts from lattice theory will be recalled later on) by mentioning a fundamental result [57, pp. 175–176]. 
Theorem 4.27 A complete sup-semilattice having a bottom element is a complete lattice. Proof Let C be a subset of a complete sup-semilattice D; we must prove that it admits a lower bound. Consider the subset T of minorants of C . This subset is nonempty since it contains at least the bottom element. Let c be the upper bound of T , which exists since D is a complete sup-semilattice. Let us check whether c obeys the definition of the lower bound of C . First, c itself is below C (that is, it belongs to T —it is thus the maximum element of T ). As a matter of fact, T is bounded from above by all b ∈ C (by definition). Since c is less than, or equal to, every element greater than T (by definition of the upper bound), c ≤ b , ∀b ∈ C , hence c ∈ T . Therefore, c is less than, or equal to, all elements of C and greater than every other element below C , namely the elements in T . Hence it is the lower bound of C . 4.3.2 Order Structure of Dioids Theorem 4.28 (Order relation) In a dioid D, one has the following equivalence: ∀a , b : a = a ⊕ b ⇔ ∃c : a = b ⊕ c . 4.3. Lattice Properties of Dioids 161 Moreover these equivalent statements define a (partial) order relation denoted ≥ as follows: a ≥ b ⇔ a = a⊕b . This order relation is compatible with addition, namely a ≥ b ⇒ {∀c , a ⊕ c ≥ b ⊕ c} , and multiplication, that is, a ≥ b ⇒ {∀c , ac ≥ bc} (the same for the left product). Two elements a and b in D always have an upper bound, namely a ⊕ b, and ε is the bottom element of D. Proof • Clearly, if a = a ⊕b , then ∃c : a = b ⊕c, namely c = a . Conversely, if a = b ⊕c, then adding b on both sides of this equality yields a ⊕b = b ⊕(b ⊕c) = b ⊕c = a . • The relation ≥ is reflexive (a = a ⊕a from Axiom 4.9), antisymmetric (a = a ⊕b and b = b ⊕ a implies a = b), and transitive since a ⊕ c = a ⊕ b ⊕ c a⊕c = a⊕b a =a⊕b b = b⊕c ⇒ ⇒ a⊕c=a . ⇒ a =a⊕b b =b⊕c a = a⊕b Therefore ≥ is an order relation. • The compatibility of ≥ with addition is a straightforward consequence of Axioms 4.2 and 4.3. The compatibility of multiplication involves Axiom 4.5. The expression ‘the (left or right) multiplication is isotone’ is also used for this property. But, as will be discussed in §4.4.1, the mapping x → ax is more than simply isotone: it is a ⊕-morphism. • Obviously, a ⊕ b is greater than a and b . Moreover, if c ≥ a and c ≥ b, then c = c ⊕ c ≥ a ⊕ b . Hence a ⊕ b is the upper bound of a and b . • Finally, {∀a , a = a ⊕ ε} ⇔ {∀a , a ≥ ε} , which means that ε is the bottom element of D. Notation 4.29 As usual, we may use a ≤ b as an equivalent statement for b ≥ a , and b > a (or a < b) as an equivalent statement for [b ≥ a and b = a ]. The following lemma deals with the problem of whether the order relation induced by ⊕ is total or only partial. 162 Synchronization and Linearity Lemma 4.30 (Total order) The order relation defined in Theorem 4.28 is total if and only if ∀a , b ∈ D , a ⊕ b = either a or b . Proof It is just a matter of rewriting the claim ‘either a ≥ b or b ≥ a ’ using ⊕ and the very definition of ≥. Let us revisit the previous examples of dioids and discover what is the order relation associated with ⊕. It is the conventional order of numbers for Examples 4.12, 4.14, 4.15 and 4.16. However, in Example 4.13 it is the reversed order: 2 ≥ 3 in this dioid since 2 = 2 ⊕ 3. As for Examples 4.17 and 4.18, ≥ is simply ⊇. All these dioids are chains except for Example 4.17. Theorem 4.28 essentially shows that an idempotent addition in D induces a structure of sup-semilattice over D. 
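The order relation of Theorem 4.28 can be checked numerically: in Rmax it is the usual order of numbers, whereas in Rmin it is the reversed order, as noted above. A minimal Python sketch (illustrative only, with ad hoc names):

    import math

    def geq_rmax(a, b):           # R_max: oplus is max
        return max(a, b) == a     # coincides with the usual order on numbers

    def geq_rmin(a, b):           # R_min: oplus is min
        return min(a, b) == a     # the usual order reversed

    print(geq_rmax(3, 2))   # True : 3 >= 2 in R_max
    print(geq_rmin(2, 3))   # True : 2 >= 3 in R_min, since 2 = 2 (+) 3 = min(2, 3)
    print(geq_rmax(2, 3))   # False
    # a (+) b is an upper bound of a and b, and epsilon = -inf is the bottom element:
    assert geq_rmax(max(2, 3), 2) and geq_rmax(max(2, 3), 3)
    assert geq_rmax(2, -math.inf)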
But we could have done it the other way around: considering a sup-semilattice, we can define the result of the addition of two elements as their upper bound; this obviously defines an idempotent addition. The sup-semilattice has then to be endowed with another operation called ⊗. This multiplication should be assumed not only isotone but also distributive, except that isotony is sufficient if D is a chain (see §4.4.1). We now present a counterexample to the statement that isotony of multiplication implies distributivity (and therefore of the statement that isotony of a mapping would imply that this mapping is a ⊕-morphism). Example 4.31 Consider Example 4.17 again but change addition to ∩ instead of ∪. Now A ≥ B means A ⊆ B and it is true that this implies A⊗C ≥ B ⊗C or equivalently A + C ⊆ B + C . Since ⊗ = + is isotone, we do have, as a translation of (4.12), ( B ∩ C ) + D ⊆ ( B + D ) ∩ (C + D ) (4.4) (because here ≥ is ⊆, not ⊇!), but equality does not hold in general, as shown by the particular case: B is the subset reduced to the point (1, 0) ∈ R2 , C is similarly reduced to the point (0, 1), whereas D is the square [−1, 1] × [−1, 1]. Clearly, the left-hand side of (4.4) is equal to ∅, whereas the right-hand side is the square [0, 1] × [0, 1] (see 2 Figure 4.2). In conclusion, 2R , ∩, + is not a dioid. 4.3.3 Complete Dioids, Archimedian Dioids In accordance with the definition of complete sup-semilattices, we adopt the following definition. Definition 4.32 (Complete dioid) A dioid is complete if it is closed for infinite sums and Axiom 4.5 extends to infinite sums. With the former requirement, the upper bound of any subset is simply the sum of all its elements. The latter requirement may be viewed as a property of ‘lowersemicontinuity’ of multiplication. 4.3. Lattice Properties of Dioids C +D 163 B +D D Figure 4.2: + is not distributive with respect to ∩ In a complete dioid the top element of the dioid, denoted , exists and is equal to the sum of all elements in D. The top element is always absorbing for addition since obviously ∀a , ⊕ a = . Also ⊗ε = ε , (4.5) because of Axiom 4.7. If we consider our previous examples again, Examples 4.12, 4.13, 4.14 and 4.18 are not complete dioids, whereas Examples 4.15, 4.16 and 4.17 are. For Rmax to be complete, we should add the top element = +∞ with the rule −∞ + ∞ = −∞ which is a translation of (4.5). This completed dioid is called Rmax and R denotes R ∪ {−∞} ∪ {+∞}. Similarly, the dioid in Example 4.18 is not complete but can be completed by adding the ‘half-line’ R itself to the considered set. However something is lost when doing this completion since multiplication does not remain cancellative (see Theorem 4.35 below). Of course, a subdioid of a complete dioid may not be complete. For example (Q ∪ {−∞} ∪ {+∞}, max , +) is a subdioid of Rmax which is not complete. The question arises whether is in general absorbing for multiplication, that is, ∀a ∈ D , a=ε , ⊗a = a⊗ = ? (4.6) Property (4.6) can be proved for ‘Archimedian’ dioids. Definition 4.33 (Archimedian dioid) A dioid is Archimedian if ∀a = ε , ∀b ∈ D , ∃c and d ∈ D : ac ≥ b and da ≥ b . Theorem 4.34 In a complete Archimedian dioid, the absorbing property (4.6) holds true. Proof We give the proof only for right multiplication by . From Definition 4.33, given a , for all b , there exists cb such that acb ≥ b . One has that 164 Synchronization and Linearity a =a b ≥a b ∈D cb b ∈D acb ≥ = b ∈D b= . 
b ∈D Among our previous examples, all dioids, except for the one in Example 4.15, are Archimedian, but only Examples 4.16 and 4.17 correspond to complete dioids for which (4.6) holds true. Example 4.15 is a case of a dioid which is complete but not Archimedian, and (4.6) fails to be true. Theorem 4.35 If a dioid is complete, Archimedian, and if it has a cancellative multiplication, then it is isomorphic to the Boole algebra. Proof Since (4.6) holds true and since ⊗ is cancellative, it is realized that every element different from ε is equal to . Hence, the dioid is reduced to {ε, }. 4.3.4 Lower Bound Since a complete dioid is a complete sup-semilattice, and since there is also a bottom element ε, the lower bound can be constructed for any subset C of elements of D and the semilattice becomes then a complete lattice (Theorem 4.27). If C = {x , y , z , . . . }, its lower bound is denoted x ∧ y ∧ z ∧ . . . . In general, we use the notation x ∈C x . One has the following equivalences: a ≥b ⇔a =a⊕b ⇔ b =a∧b . (4.7) This operation ∧ is also associative, commutative, idempotent and has as neutral element ( ∧ a = a , ∀a ). The following property, called ‘absorption law’, holds true [57, p. 184]: ∀a , b ∈ D , a ∧ (a ⊕ b ) = a ⊕ (a ∧ b ) = a . Returning to our examples, the reader should apply the formal construction of the lower bound recalled in Theorem 4.27 to Example 4.17 (a complete dioid) and prove that ∧ is simply ∩ in this case. As for the other examples, since all of them are chains, and even when the dioid is not complete, a simpler definition of a ∧ b can be adopted: indeed, owing to Lemma 4.30, (4.7) may serve as a definition. Moreover, in the case of a chain, since a lower bound can be defined anyway, and because there exists a bottom element ε, the dioid is a complete inf-semilattice even if it is not a complete sup-semilattice. Equivalences (4.7) may leave the impression that ⊕ and ∧ play symmetric roles in a complete dioid. This is true from the lattice point of view, but this is not true when considering the behavior with respect to the other operation of the dioid, namely ⊗. Since multiplication is isotone, from Lemma 4.42 in §4.4.1 below it follows that (a ∧ b )c ≤ (ac) ∧ (bc) (4.8) (similarly for left multiplication) what we may call ‘subdistributivity’ of ⊗ with respect to ∧. The same lemma shows that distributivity holds true for chains. But this is not true 4.3. Lattice Properties of Dioids 165 in general for partially-ordered dioids. A counterexample is provided by Example 4.17 (⊕ is ∪, ⊗ is + and ∧ is ∩). In Example 4.31 we showed that + is not distributive with respect to ∩. There are, however, situations in which distributivity of ⊗ with respect to ∧ occurs for certain elements. Here is such a case. Lemma 4.36 If a admits a left inverse b and a right inverse c, then • b = c and this unique inverse is denoted a −1 ; • moreover, ∀x , y , a (x ∧ y ) = ax ∧ ay . The same holds true for right multiplication by a, and also for right and left multiplication by a −1 . Proof • One has b = b(ac) = (ba )c = c, proving uniqueness of a right and left inverse. • Then, ∀x , y , define ξ and η according to (ξ = ax , η = ay ), which is equivalent to (x = a −1 ξ, y = a −1 η). One has ξ ∧ η = aa −1 (ξ ∧ η) ≤ a [a −1ξ ∧ a −1 η ] = a [x ∧ y ] ≤ ax ∧ ay = ξ ∧ η . Hence equality holds throughout. 4.3.5 Distributive Dioids Once the lower bound has been introduced, this raises the issue of the mutual behavior of ⊕ and ∧. 
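Before turning to distributivity, note that the lower bound just constructed can be checked numerically in Rmax, where ∧ is simply min: the equivalences (4.7), the absorption law, and the fact that (4.8) holds with equality on a chain are all immediate. A minimal Python sketch (illustrative only, with ad hoc names):

    def oplus(a, b): return max(a, b)   # addition / upper bound
    def wedge(a, b): return min(a, b)   # lower bound

    a, b = 5.0, 2.0
    # Equivalences (4.7): a >= b  <=>  a = a (+) b  <=>  b = a /\ b
    assert (oplus(a, b) == a) and (wedge(a, b) == b)
    for x, y, c in [(5.0, 2.0, 1.0), (2.0, 5.0, -3.0), (3.0, 3.0, 0.0)]:
        # Absorption law: x /\ (x (+) y) = x (+) (x /\ y) = x
        assert wedge(x, oplus(x, y)) == x
        assert oplus(x, wedge(x, y)) == x
        # (4.8) holds with equality on a chain: (x /\ y) c = xc /\ yc
        assert wedge(x, y) + c == wedge(x + c, y + c)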
In fact, ∧ is not necessarily distributive with respect to ⊕ and conversely, except again for chains. The following inequalities are again consequences of Lemma 4.42 in §4.4.1 below, and of the fact that x → x ⊕ c and x → x ∧ c are isotone: ∀a , b , c ∈ D , (a ∧ b ) ⊕ c ≤ (a ⊕ c) ∧ (b ⊕ c) , (a ⊕ b ) ∧ c ≥ (a ∧ c) ⊕ (b ∧ c) , which means that ⊕ is subdistributive with respect to ∧, and ∧ is superdistributive with respect to ⊕. As already defined, a lattice is distributive when equality holds true in the two inequalities above. Example 4.37 Here is an example of a complete lattice which is not distributive. Consider all the intervals of R (including ∅ and R itself) with ⊆ as ≤. The upper bound ⊕ of any (finite or infinite) collection of intervals is the smallest interval which contains the whole collection, that is, it is the convex hull of the union of all the intervals in the collection. The lower bound ∧ is simply ∩. Then, consider a = [−3, −2], b = [2, 3] and c = [−1, 1]. We have that (a ⊕ b ) ∧ c = c, whereas (a ∧ c) ⊕ (b ⊕ c) = ∅. The following theorem can be found in [57, p. 207]. 166 Synchronization and Linearity Theorem 4.38 A necessary and sufficient condition for a lattice to be distributive is that a∧c = b∧c ∀a , b , ∃c : ⇒ {a = b } . a⊕c = b⊕c In [57] it is also shown that if G is a multiplicative ‘lattice-ordered group’, which means that, in addition to being a group and a lattice, the multiplication is isotone, then • the multiplication is necessarily distributive with respect to both the upper and the lower bounds (G is called a ‘reticulated group’), • moreover, the lattice is distributive (that is, upper and lower bounds are distributive with respect to one another). Also, one has the remarkable formulæ: (a ∧ b )−1 = a −1 ⊕ b−1 , (4.9) (a ⊕ b )−1 = a −1 ∧ b −1 , (4.10) a ∧ b = a (a ⊕ b )−1 b , (4.11) which should remind us of the De Morgan laws in Boolean algebra, and also the simple formula min(a , b) = − max (−a , −b ). However, this situation is far from being representative for the general case as shown by Examples 4.15 (total order) and 4.17 (partial order). Nevertheless, these examples correspond to ‘distributive dioids’ in the following sense. Definition 4.39 (Distributive dioid) A dioid D is distributive if it is complete and, for all subsets C of D, c ⊕a = ∀a ∈ D , c ∈C (c ⊕ a ) , c ∈C c ∧a = c ∈C (c ∧ a ) . c ∈C Notice that here distributivity is required to extend to infinite subsets. Both properties should be required now since one does not imply the other in the infinite case [57, p. 189]. Using the terminology of §4.4.1, we may state the preceding definition in other words by saying that a dioid is distributive if and only if the mappings x → a ∧ x and x → a ⊕ x are both continuous for every a . All complete dioids considered so far are distributive. Example 4.37 can be extended to provide a nondistributive dioid. It suffices to define ⊗ as the ‘sum’ of intervals (conventional arithmetic sum). Of course, ⊕ is the upper bound as defined in that example (the reader may check the distributivity of ⊗ with respect to ⊕). 4.4. Isotone Mappings and Residuation 167 Remark 4.40 A distributive dioid may also be considered as a dioid with the two def def def def operations ⊕ = ⊕ and ⊗ = ∧. But one can also choose ⊕ = ∧ and ⊗ = ⊕. Special features of these dioid structures are that ∀x , ε ≤ x ≤ e and that multiplication is commutative and idempotent. Examples 4.15 and 4.16 are instances of such dioids. 
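In Rmax every element other than ε is invertible (the inverse of a finite a is −a, since ⊗ is +), so formulæ (4.9)–(4.11) can be spot-checked directly; (4.11) is just the familiar identity min(a, b) = a + b − max(a, b). A minimal Python sketch (illustrative only, with ad hoc names oplus, wedge, inv):

    def oplus(a, b): return max(a, b)   # upper bound
    def wedge(a, b): return min(a, b)   # lower bound
    def inv(a):      return -a          # a^{-1} for finite a, since otimes is +

    for a, b in [(3.0, 7.0), (-2.0, 5.0), (4.0, 4.0)]:
        assert inv(wedge(a, b)) == oplus(inv(a), inv(b))   # (4.9)
        assert inv(oplus(a, b)) == wedge(inv(a), inv(b))   # (4.10)
        assert wedge(a, b) == a + inv(oplus(a, b)) + b     # (4.11): a (a (+) b)^{-1} b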
4.4 Isotone Mappings and Residuation Most of the material in this section is classical in Lattice Theory. The structure added by ⊗ in dioids plays virtually no role, except of course when the mappings considered themselves involve multiplication (for example when considering the residual of the mapping x → ax ). A basic reference is the book by Blyth and Janowitz [24]. 4.4.1 Isotony and Continuity of Mappings We are going to characterize isotone mappings in terms of ‘lower’ and ‘upper sets’. Definition 4.41 (Lower, upper set) A lower set is a nonempty subset L of D such that (x ∈ L and y ≤ x ) ⇒ y ∈ L . A closed lower set (generated by x) is a lower set denoted [←, x ] of the form { y | y ≤ x }. An upper set is a subset U such that (x ∈ U and y ≥ x ) ⇒ y ∈ U . A closed upper set (generated by x) is an upper set denoted [x , →] of the form { y | y ≥ x }. The names ‘(principal) ideal’ and ‘(principal) filter’ are used for ‘(closed) lower set’ and ‘(closed) upper set’, respectively, in [24]. A closed lower set is a lower set which contains the upper bound of its elements. Similarly, a closed upper set is an upper set containing the lower bound of its elements. For a chain, say Rmax , closed lower sets correspond to closed half-lines (−∞, x ], lower sets are open or closed half-lines, whereas closed upper sets are of the type [x , +∞). Figure 4.3 gives examples of such sets in a partially-ordered lattice. Obviously, if is a ⊕- or a ∧-morphism, it is isotone (see §4.3.1). For example, for every a ∈ D, the mapping x → ax from D into itself is a ⊕-morphism, hence it is isotone. But, conversely, if is isotone, it is neither necessarily a ⊕- nor necessarily a ∧-morphism. Lemma 4.42 Let be a mapping from a dioid D into another dioid C . The following statements are equivalent: 1. the mapping is isotone; 2. the ‘pre-image’ empty; −1 ([←, x ]) of every closed lower set is a lower set or it is 168 Synchronization and Linearity closed lower set lo w e r s e t Figure 4.3: Lower set and closed lower set 3. the pre-image empty; 4. the mapping −1 ([x , →]) of every closed upper set is an upper set or it is is a ⊕-supermorphism, that is, ∀a , b ∈ D , (a ⊕ b ) ≥ 5. if lower bounds exist in D and C , ∀a , b ∈ D , (a ) ⊕ (b ) ; (4.12) is a ∧-submorphism, that is, (a ∧ b ) ≤ (a ) ∧ (b ) . (4.13) Proof Suppose that is isotone. Let a ∈ −1 ([←, x ]) if this subset is nonempty. Then (a ) ≤ x . Let b ≤ a . Then (b ) ≤ (a ), hence (b ) ≤ x and b ∈ −1 ([← , x ]). Therefore −1 ([←, x ]) is a lower set. Conversely, let b ≤ a . Since obviously a ∈ −1 ([←, (a )]) and since this latter is a lower set by assumption, then b belongs to this subset. Hence (b ) ≤ (a ) and is isotone. A similar proof involving upper sets is left to the reader. Suppose that is isotone. Since a and b are less than a ⊕ b , then (a ) and (b), and thus their upper bound (a ) ⊕ (b ), are less than (a ⊕ b ) proving (4.12). Conversely, let b ≤ a , or equivalently, a = a ⊕ b . Then, under the assumption that (4.12) holds true, (a ) = (a ⊕ b ) ≥ (a ) ⊕ (b ), proving that (b ) ≤ (a ). Thus is isotone. A similar proof involving ∧ instead of ⊕ can be given. If D is a chain, a ⊕ b is equal to either a or b , hence (a ⊕ b ) is equal to either (a ) or (b ), hence to (a ) ⊕ (b ). That is, is a ⊕-morphism in this case. Similarly, it is a ∧-morphism too. If D and C are complete dioids, it is easy to see that (4.12) (respectively (4.13)) extends to ⊕ (respectively ∧) operating over infinite subsets of D. 
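Lemma 4.42 can be illustrated numerically: the mapping (x1, x2) → x1 ⊗ x2 = x1 + x2 from R²max into Rmax is isotone for the componentwise order, hence it satisfies (4.12) and (4.13), but it is neither a ⊕- nor a ∧-morphism, since both inequalities may be strict. A minimal Python sketch (illustrative only, with ad hoc names):

    def pi(x):        return x[0] + x[1]                        # isotone, but not a morphism
    def oplus2(x, y): return (max(x[0], y[0]), max(x[1], y[1])) # componentwise oplus
    def wedge2(x, y): return (min(x[0], y[0]), min(x[1], y[1])) # componentwise wedge

    a, b = (0.0, 1.0), (1.0, 0.0)
    print(pi(oplus2(a, b)), max(pi(a), pi(b)))   # 2.0 >= 1.0 : (4.12), strict here
    print(pi(wedge2(a, b)), min(pi(a), pi(b)))   # 0.0 <= 1.0 : (4.13), strict here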
Definition 4.43 (Continuity) A mapping from a complete dioid D into a complete dioid C is lower-semicontinuous, abbreviated as l.s.c. respectively, upper-semicontinuous, abbreviated as u.s.c. if, for every (finite or infinite) subset X of D, x x∈X = (x ) , x∈X 4.4. Isotone Mappings and Residuation 169 respectively, = x x∈X The mapping (x ) . x∈X is continuous if it is both l.s.c. and u.s.c. Of course, a l.s.c. (respectively an u.s.c.) mapping is a ⊕- (respectively a ∧-) morphism. If is a ⊕-morphism, it is isotone and thus it is a ∧-submorphism, but not necessarily a ∧-morphism. This has already been illustrated by Example 4.31. To justify the terminology ‘l.s.c.’, one should consider the example of a nondecreasing mapping from R to itself (because here we are interested in isotone mappings between ordered sets), and check that, in this case, the lower-semicontinuity in the previous sense coincides with the conventional notion of lower-semicontinuity (which requires that lim infxi →x (x i ) ≥ (x )). The same observation holds true for uppersemicontinuity. Lemma 4.44 One has the following equivalences: • is l.s.c.; • the pre-image is empty. −1 ([←, x ]) of every closed lower set is a closed lower set or it Similarly, the following two statements are equivalent: • is u.s.c.; • the pre-image is empty. −1 ([x , →]) of every closed upper set is a closed upper set or it Proof We prove the former equivalence only. Suppose that is l.s.c. In particular it is isotone and the pre-image X = −1 ([←, x ]) of every closed lower set, if nonempty, is a lower set. If a ∈ X , then (a ) ≤ x . Thus = a a ∈X (a ) ≤ x . a ∈X Hence the upper bound of X belongs to X : X is closed. Conversely, suppose that the pre-image of every closed lower set is a closed lower set. In particular, according to Lemma 4.42, is isotone and for every nonempty subset X ⊆ D, one has that x ≥ x ∈X (x ) . x ∈X On the other hand, it is obvious that X⊆ −1 (x )] . [← , x ∈X (4.14) 170 Synchronization and Linearity But the latter subset is a closed lower set by assumption, hence it contains the upper bound of its elements and a fortiori it contains the upper bound of X . This implies the reverse inequality in (4.14). Therefore equality holds true and is l.s.c. Example 4.45 Let D = Rmax and C = Nmax , let : D → C defined by y, :x→y= y ∈C , y ≤ x where the symbol ≤ has its usual meaning. Indeed, is the residual of the mapping x → x from C into D (not from D to C !)—see §4.4.2 below. More simply, (x ) is just the integer part of the real number x . Then, is a ⊕- and a ∧-morphism, it is u.s.c. (this is a consequence of being a residual) but not l.s.c. Lemma 4.46 The set of l.s.c. mappings from a complete dioid D into itself is a complete dioid when endowed with the following addition ⊕ and multiplication ⊗: ⊕ ⊗ : : x x → → (x ) ⊕ (x ) ; ( (x )) . (4.15) Similarly, the set of u.s.c. mappings from D into D is a complete dioid when endowed with the following addition ⊕ and multiplication ⊗: ⊕ ⊗ : : x x → → (x ) ∧ (x ) ; ( (x )) . (4.16) Proof We only prove the former statement. It is easy to check that l.s.-continuity is preserved by addition and composition of mappings. The other axioms of dioids are also easily checked. In particular, ε is the mapping identically equal to ε and e = ID (identity of D). Distributivity of right multiplication with respect to addition is straightforward from the very definitions of ⊕ and ⊗, whereas left distributivity involves the assumption of l.s.-continuity. 
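Example 4.45 can also be checked numerically: the 'integer part' mapping is a ⊕-morphism, yet it fails to be l.s.c. because it does not commute with the upper bound of the infinite subset X = {1 − 1/n : n ≥ 1}. A minimal Python sketch (illustrative only, with the floor function playing the role of the mapping of Example 4.45):

    import math

    def pi(x): return math.floor(x)

    # oplus-morphism on a pair of elements:
    assert pi(max(2.7, 3.2)) == max(pi(2.7), pi(3.2))

    # Failure of lower-semicontinuity on X = {1 - 1/n : n >= 1}, whose upper bound is 1:
    X = [1 - 1/n for n in range(1, 1000)]
    sup_of_images = max(pi(x) for x in X)   # = 0
    image_of_sup = pi(1.0)                  # sup X = 1, so this equals 1
    print(sup_of_images, image_of_sup)      # 0 != 1 : pi(sup X) > sup pi(X)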
Remark 4.47 Observe that since Lemma 4.46 defines a complete dioid structure for the set of l.s.c. mappings from a complete dioid D into itself, this dioid of mappings has a lower bound operation (Theorem 4.27) also denoted ∧. However, in general, for two mappings and , ( ∧ )(x ) = (x ) ∧ (x ) since the right-hand side is in general not a l.s.c. function of x . Example 4.48 Consider D = Rmax 2 (operations of D are those of Rmax operating 2 componentwise). Let C = R , ⊕, ⊗ for which the underlying set is the same as that of D and ⊗ is also the same. But ⊕ is defined as follows: 2 ∀x , y ∈ R , x⊕y = x y if (x 1 > y1 ) or (x 1 = y1 and x 2 ≥ y2 ) ; if (x 1 < y1 ) or (x 1 = y1 and x 2 ≤ y2 ) . 4.4. Isotone Mappings and Residuation 171 2 The order in D is the usual partial order in R whereas C is totally ordered by the lexicographic order. Let : D → C be simply the canonical bijection x → x . This is an isotone mapping since x ≤ y in D implies x ≤ y in C . However, this mapping is neither l.s.c. nor u.s.c. Figure 4.4 depicts the shape of a closed lower set generated by the point x = (2, 3) in C (shaded area). It is a half-plane including the border Figure 4.4: A closed lower set for the lexicographic order 2 for x 2 ≤ 3 but not for x 2 > 3. Since −1 is also the canonical bijection in R , the pre-image of this closed lower set is itself and it is not a closed lower set in D: closed lower sets in D consist of closed south-west orthants. Hence is not l.s.c. This example is also interesting because it shows that an isotone bijection may have an inverse mapping −1 which is not isotone. As a matter of fact, x ≤ y in C does not imply the same inequality in D in general. However, we leave to the reader to prove that an isotone bijection from a totally ordered dioid D onto another dioid C has an isotone inverse and, moreover, if both dioids are complete, the mapping and its inverse are continuous. Lemma 4.49 Let D and C be complete dioids and be a homomorphism from D into C . Consider the congruence defined in Example 4.25. If is l.s.c. (respectively u.s.c.), then every equivalence class has a maximum (respectively minimum) element, which is therefore a canonical representative of the equivalence class. Proof We consider the case of a l.s.c. mapping. Let [x ] denote the equivalence class def of any x ∈ D, then x = y ∈[ x ] y is the upper bound of [x ], and it belongs to [x ] since, by lower-semicontinuity, (x ) = y = y ∈[ x ] 4.4.2 ( y) = (x ) . y ∈[ x ] Elements of Residuation Theory Residuation has to do with ‘inverting’ isotone mappings and with solving equations. Let be an isotone mapping from a dioid D into another dioid C . To guarantee the 172 Synchronization and Linearity existence of upper and lower bounds, we assume throughout this subsection that D and C are complete. If is not surjective, the equation in x : (x ) = b will have no solution for some values of b , and if is not injective, the same equation may have nonunique solutions. One way to always give a unique answer to this problem of equation solving is to consider the subset of so-called ‘subsolutions’, that is, values of x satisfying (x ) ≤ b , if this subset is nonempty, and then to take the upper bound of the subset, if it exists: it remains to be checked whether the upper bound itself is a subsolution, namely, that it is the maximum element of the subset of subsolutions, which has to do with l.s.-continuity of . In this case, this maximum element will be denoted (b ) and we have (b ) = x and (b ) ≤ b . 
(4.17) { x | ( x ) ≤b } Dually, one may consider ‘supersolutions’ satisfying (x ) ≥ b , if again this subset is nonempty, and then take the lower bound assuming it exists: again it remains to check whether the lower bound is itself a supersolution, namely, that it is the minimum element of the subset of supersolutions, which has to do with u.s.-continuity of . In this case this minimum element will be denoted (b ) and we have (b ) = x and (b ) ≥ b . (4.18) { x | ( x ) ≥b } Theorem 4.50 Let be an isotone mapping from the complete dioid D into the complete dioid C . The following three statements are equivalent: 1. For all b ∈ C , there exists a greatest subsolution to the equation (x ) = b (by this we mean that the subset of subsolutions is nonempty and that it has a maximum element). 2. (ε) = ε and is l.s.c. (or equivalently, the pre-image of every closed lower set is nonempty and it is a closed lower set). from C into D which is isotone and u.s.c. such that 3. There exists a mapping ◦ ◦ is unique. When Consequently, ated and is called its residual. ≤ ≥ IC ID (identity of C ) ; (identity of D) . (4.19) (4.20) satisfies these properties, it is said to be residu- Proof First of all, it should be clear that the two statements in ‘2.’ above are equivalent. In the rest of the proof, we always refer to the former of these two statements. 1 ⇒ 3: As a matter of fact, ∀b ∈ C , there exists a greatest subsolution that we denote (b ). It is obvious that the mapping thus defined is isotone. Inequality (4.19) is immediate from the definition of a subsolution. Now, ∀x ∈ D, let 4.4. Isotone Mappings and Residuation 173 b = (x ). Since x is a subsolution corresponding to that b , from the definition of (b ), x ≤ (b ) = ◦ (x ), from which (4.20) follows. is u.s.c. Since We now prove that B ⊆ C , one has that is isotone, using (4.13), for a subset b≤ b∈B (b ) . (4.21) b∈B Using (4.13) again, we obtain (b ) ≤ b∈B (b ) ≤ ◦ b∈B b, b∈B in which the latter inequality follows from (4.19). Hence b ∈ B (b ) is a subsolution corresponding to the right-hand side b ∈ B b . Thus, the reverse inequality also holds true in (4.21), and equality is obtained, proving that is u.s.c. 3 ⇒ 2: From (4.19), ◦ (ε) ≤ ε ⇒ ◦ (ε) = ε. But (ε). If we combine the two facts, it follows that ε ≥ of the two sides. Let X ⊆ D. Since is isotone, it follows from (4.12) that (x ) ≤ x. x∈X For all x ∈ X , let bx = x ≤ x∈X (ε) ≥ ε ⇒ ◦ (ε) ≥ (ε), proving the equality (4.22) x∈X (x ). Because of (4.20), (bx ) ≤ bx ◦ x∈X (bx ) ≥ x , hence bx = ≤ x∈X x∈X (x ) , x∈X where we used (4.12) for and then (4.19). This is the reverse inequality of (4.22), hence equality holds true and the l.s.-continuity of is proved. 2 ⇒ 1: Since (ε) = ε, the subset of subsolutions X b ⊆ D is nonempty ∀b ∈ C . Then, by l.s.-continuity of , and since every x ∈ X b is a subsolution, x∈X b This proves that x∈X b x = (x ) ≤ b . x∈X b x is a subsolution too. Finally, since the greatest subsolution is unique by definition, the equivalences above imply that is unique as well. Remark 4.51 It is clear that −1 ([←, x ]) = [←, , then ( ) ≥ , hence ( ) = . (x )]. Moreover, since ( )≤ Now, instead of being interested in the greatest subsolution, we may search for the least supersolution. This is dual residuation. The dual of Theorem 4.50 can be stated. 174 Synchronization and Linearity Theorem 4.52 Let be an isotone mapping from the complete dioid D into the complete dioid C . The following three statements are equivalent: 1. 
For all b ∈ C , there exists a least supersolution to the equation (x ) = b (by this we mean that the subset of supersolutions is nonempty and that it has a minimum element). 2. ( ) = and is u.s.c. (or equivalently, the pre-image of every closed upper set is nonempty and it is a closed upper set). from C into D which is isotone and l.s.c. such that 3. There exists a mapping ◦ ◦ Consequently, residuated and ≥ ≤ IC ID (identity of C ) ; (identity of D) . (4.23) (4.24) is unique. When satisfies these properties, it is said to be dually is called its dual residual. Remark 4.53 One has that −1 ([x , →]) = [ (x ), →] , and (ε) = ε . It should also be clear that if is residuated, its residual is dually residuated and ( )= . Example 4.54 An example of a residuated mapping was encountered in Example 4.45. Indeed, if we let D = Nmax and C = Rmax , the canonical injection from N into R is residuated and its residual is the mapping described in that example, that is, the ‘integer part’ of a real number ‘from below’. The same injection is also dually residuated and its dual residual is the ‘integer part from above’. Example 4.55 Another interesting example is provided by the mapping :x→ (x , x ) from a complete dioid D into D2 . This mapping again is residuated and dually residuated and it is easy to check that (x , y ) = x ∧ y and (x , y ) = x ⊕ y . Subsection 4.4.4 provides other examples on residuation. The following theorem lists additional properties of residuated mappings and residuals, and dual properties when appropriate. 4.4. Isotone Mappings and Residuation 175 Theorem 4.56 • If is a residuated mapping from D into C , then ◦ ◦ ◦ ◦ = ; = (4.25) . (4.26) One has the following equivalences: ◦ = ID ⇔ ◦ injective ⇔ = IC ⇔ surjective ; surjective . injective ⇔ (4.27) (4.28) The same statements hold true for dually residuated mappings by changing into . • If : D → C and residuated and : C → B are residuated mappings, then ( ◦ )= ◦ . ◦ is also (4.29) Again, the same statement holds true with instead of . • If , , and ated, then are mappings from D into itself, and if ◦ ≤ ◦ ⇔ ≤ ◦ ◦ and are residu- . (4.30) As corollaries, one has that ≤ ⇔ ≤ ID ≥ ID ≤ ⇔ ⇔ , (4.31) and ≥ ID , ≤ ID . (4.32) (4.33) Similar statements hold true for dual residuals with appropriate assumptions; in particular, the analogue of (4.30) is ◦ ≤ ◦ ⇔ ≤ ◦ ◦ . (4.34) • If and are two residuated mappings from a dioid D (in which ∧ exists) into itself, then ⊕ is residuated and ( If and ⊕ )= are dually residuated, then ( ∧ )= ∧ ∧ . (4.35) is dually residuated and ⊕ . (4.36) 176 Synchronization and Linearity • If and are two residuated mappings from a dioid D (in which ∧ exists) into itself and if ∧ is residuated, then ( ∧ )≥ ⊕ . (4.37) ( ⊕ )≤ ∧ . (4.38) With dual assumptions, Proof About (4.25)–(4.26): One has that ◦ ◦ = ≥ , ≤ ◦ ◦ , which follows from (4.20). But one also has that ◦ ◦ = ◦ ◦ by making use of (4.19), hence (4.25) follows. Equation (4.26) is similarly proved by remembering that is isotone. ◦ = ID and suppose that (x ) = ( y ). About (4.27)–(4.28): Assume that Applying , we conclude that x = y , hence is injective. Also, since ◦ (x ) = x , it means that every x belongs to Im , hence is surjective. Conversely, if ◦ = ID , there exists y such that x = ◦ ( y ) = y . However, because of (4.25), (x ) = ( y ). Hence cannot be injective. On the other hand, if is surjective, ∀x ∈ D, ∃b ∈ C : (b ) = x . Since x is a subsolution corresponding to the right-hand side b , (x ) ≤ b , hence ◦ (x ) ≤ (b ) = x . 
◦ We conclude that ≤ ID , but equality must hold true because of (4.20). This completes the proof of (4.27). The proof of (4.28) is similar. About (4.29): As already noticed l.s.- or u.s.-continuity is preserved by composition of two similarly semicontinuous mappings and the same holds true for the property of being residuated (consider the conditions stated in item 2 of Theorem 4.50). Also ◦ is an isotone and u.s.c. mapping. Finally, ◦ ◦ ◦ = ◦ ◦ ◦ ≤ ◦ ≤ IC , by repeated applications of (4.19), showing that ◦ satisfies (4.19) together with ◦ . Likewise, it can be proved that (4.20) is met by the two composed functions. From the uniqueness of the residual, we conclude that (4.29) holds true. ◦ ◦◦ ◦◦ ≤ ◦ About (4.30)–(4.34): If ◦ ≤ ◦ , then which implies that ◦ ≤ ◦ using (4.19)–(4.20). The converse proof is left to the reader (use a dual trick). Then (4.31) is obvious whereas to prove (4.32)–(4.33), we use the straightforward fact that ID is residuated and is its own residual. The proof of (4.34) is similar to that of (4.30). 4.4. Isotone Mappings and Residuation 177 About (4.35)–(4.36): We give a proof of (4.35) only. First it is clear that the sum of two residuated mappings is residuated (l.s.-continuity is preserved by ⊕). Now consider the composition = 3 ◦ 2 ◦ 1 of the following three mappings: 1 2 3 :D : D2 : D2 → D2 , → D2 , → D, → → → x x x y y xx , (x ) ( y) x⊕y . , Thus :D→D , x →( ⊕ )(x ) . Then, 1 2 3 : D2 → D , : D2 → D2 , : D → D2 , x x y y x →x∧y → (x ) →xx ( y) (see Example 4.55), (trivial), , the last statement following also from Example 4.55 since it was explained there that ( 3 ) is indeed 3 . Then, it suffices to calculate by using (4.29) repeatedly to prove (4.35). About (4.37)–(4.38): We prove (4.37) only. Observe first that it is necessary to assume that ∧ is residuated since this is not automatically true. Then ( ∧ )◦ ⊕ =( )◦ ⊕( ∧ )◦ , since ∧ is assumed residuated and hence l.s.c. The former term at the righthand side is less than ◦ which is less than ID ; the latter term is less than ◦ which again is less than ID , and so is the left-hand side. This suffices to prove (4.37). Remark 4.57 Returning to Lemma 4.49, if is residuated, it should be clear that x considered in the proof of this lemma is nothing but ◦ (x ). 4.4.3 Closure Mappings Here we study a special class of mappings of a dioid into itself which will be of interest later on. Definition 4.58 (Closure mapping) Let D be an ordered set and : D → D be an isotone mapping such that = ◦ ≥ ID , then is a called a closure mapping. If = ◦ ≤ ID , then is called a dual closure mapping. Theorem 4.59 If : D → D is a residuated mapping, then the following four statements are equivalent: = ◦ ◦ = = ≥ ID = ≤ ID , ◦ ◦ . (i.e. (i.e. is a closure mapping), is a dual closure mapping), (4.39) (4.40) (4.41) (4.42) 178 Synchronization and Linearity Proof (4.39) ⇒ (4.40): This follows from (4.29) and (4.33). (4.40) ⇒ (4.41): From (4.40) it follows that ◦ ◦ = ◦ . The left-hand side is less than or equal to because of (4.19). The right-hand side is greater than or equal to because ≤ ID ⇒ ≥ ID (see (4.33)). Hence (4.41) is proved. (4.41) ⇒ (4.42): From (4.41), it follows that ◦ ◦ = is equal to (see (4.25)). Hence (4.42) results. (4.42) ⇒ (4.39): Since hand, (4.42) ⇒ ◦ = ◦ ◦ = ◦ . But the left-hand side , then ≥ ID because of (4.20). On the other but the left-hand side is equal to (see (4.25)). ◦ Theorem 4.59 states that all residuated closure mappings can be expressed as in (4.42). 
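For instance, on R with the usual order, the 'integer part from above' of Example 4.54 is a closure mapping in the sense of Definition 4.58, while the 'integer part from below' of Example 4.45 is a dual closure mapping. A minimal Python sketch checking the defining properties (illustrative only):

    import math

    for x, y in [(2.3, 5.0), (-1.7, -1.2), (4.0, 4.5)]:
        # ceiling: isotone, >= identity, and idempotent  -> closure mapping
        assert math.ceil(x) >= x and math.ceil(math.ceil(x)) == math.ceil(x)
        assert (x <= y) <= (math.ceil(x) <= math.ceil(y))    # isotony on this pair
        # floor: isotone, <= identity, and idempotent     -> dual closure mapping
        assert math.floor(x) <= x and math.floor(math.floor(x)) == math.floor(x)
        assert (x <= y) <= (math.floor(x) <= math.floor(y))  # isotony on this pair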
Indeed, all closure mappings on D can be factored as ◦ for some : D → C , where C is another ordered set [24, Theorem 2.7]. Another characterization of l.s.c. closure mappings will be given in Corollary 4.69. Theorem 4.60 If : D → D is a dually residuated mapping, then the following four statements are equivalent: = ◦ ≤ ID ◦ = = = ◦ is a dual closure mapping), (i.e. ≥ ID (i.e. is a closure mapping), , ◦ . (4.43) (4.44) (4.45) (4.46) Lemma 4.61 If and are closure mappings on D and if they are ∧-morphisms, then ∧ also is a closure mapping. Likewise, if and are dual closure mappings and if they are ⊕-morphisms, then ⊕ is a dual closure mapping. These statements extend to infinite numbers of mappings if the mappings are u.s.c., respectively l.s.c. ∧ Proof Let us prove the former statement. Clearly, ( ∧ )◦( ∧ )= ∧ ◦ ∧ ◦ ∧ ≥ ID . Moreover, = ∧ , are greater than ID . since and 4.4.4 Residuation of Addition and Multiplication In this subsection we consider the following mappings from a dioid D into itself: Ta : La : Ra : x →a⊕x x →a⊗x x →x⊗a (translation by a ); (left multiplication by a ); (right multiplication by a ). Observe that Ta ◦ Tb = Tb ◦ Ta = Ta ⊕b = Ta ⊕ Tb . (4.47) 4.4. Isotone Mappings and Residuation 179 Moreover, if D is a distributive dioid, Ta ∧ Tb = Ta ∧b . (4.48) As for multiplication, the associativity of ⊗ implies that L a ◦ L b = L ab , (4.49) L a ◦ Rb = Rb ◦ L a . (4.50) and also that The distributivity of ⊗ with respect to ⊕ implies that L a ⊕ L b = L a ⊕b , (4.51) L a ◦ Tb = Tab ◦ L a . (4.52) and also that Observe that L a is l.s.c. if and only if (left) multiplication is distributive with respect to addition of infinitely many elements, which we assume here, and, since moreover L a (ε) = ε, L a is residuated. The same considerations apply to right multiplication Ra . ◦ Notation 4.62 We use the one-dimensional display notation L a (x ) = a \ x (‘left divi◦ sion’ by a —reads ‘a (left) divides x ’), respectively, Ra (x ) = x/a (‘right division’ by a —reads ‘x (right) divided by a ’), and the two-dimensional display notation x x L a (x ) = , Ra (x ) = . a a As for Ta , since Ta (ε) = ε unless a = ε, this mapping is not residuated. Actually, by restraining the range of Ta to a ⊕ D, that is, to the subset of elements greater than or equal to a (call this new mapping Aa : D → a ⊕ D), we could define a residual Aa with domain equal to a ⊕ D. However, this is not very interesting since Aa is simply the identity of a ⊕ D. Indeed, since Aa is surjective (by definition), Aa ◦ Aa is the identity according to (4.28). On the other hand, since Aa is obviously a closure mapping, Aa ◦ Aa = Aa (see (4.41)). This is why we assume that D is a distributive dioid—see Definition 4.39—and, as a consequence of that, Ta is u.s.c. Since moreover Ta ( ) = , Ta is dually residuated. ◦ Notation 4.63 We use the notation Ta (x ) = x − a . It should be clear that: x− a = ε ⇔ a ≥ x . ◦ We are going to list a collection of formulæ and properties for these two new operations, ‘division’ and ‘subtraction’, which are direct consequences of the general properties enumerated in §4.4.2. For the sake of easy reference, the main formulæ have been gathered in Tables 4.1 and 4.2. In Table 4.1, left and right multiplication and division are both considered. 180 Synchronization and Linearity Remember that, when we consider properties involving − , the dioid D is ◦ tacitly assumed to be complete (hence multiplication is infinitely distributive) and also distributive. 
Table 4.1: Formulæ involving division x∧y x y = ∧ a a a x⊕y x y ≥ ⊕ a a a x x x = ∧ a⊕b a b x x x ≥ ⊕ a∧b a b x a ≤x a ax ≥x a ax a = ax a ◦ a (a \ x ) x = a a ◦ x a \x = ab b ◦ ◦ a \x x/b = b a x x b≤ ◦ a a/b x xb b≤ a a x x ⊕ ab ⊕b ≤ a a x∧y x y = ∧ a a a x⊕y x y ≥ ⊕ a a a x x x = ∧ a⊕b a b x x x ≥ ⊕ a∧b a b x a≤x a xa ≥x a xa a = xa a ◦ (x/a )a x = a a ◦ x x/a = ba b ◦ ◦ x/a b \x = b a x x b≤ ◦ a b \a x bx b≤ a a x x ⊕ ba ⊕b≤ a a (f.1) (f.2) (f.3) (f.4) (f.5) (f.6) (f.7) (f.8) (f.9) (f.10) (f.11) (f.12) (f.13) For Table 4.1, we only prove the left-hand side versions of the formulæ. Formulæ (f.1) and (f.2) are consequences of the fact that L a is u.s.c. and isotone. Dually, Formulæ (f.14) and (f.15) result from Ta being l.s.c. and isotone. Inequalities (4.19)– (4.20) imply (f.5)–(f.6). However, if multiplication is cancellative, then L a is injective ◦ ◦ and (see (4.27)) a \(ax ) = x and x → a \x is surjective. If a is invertible, then 4.4. Isotone Mappings and Residuation 181 Table 4.2: Formulæ involving subtraction (x ⊕ y ) − a = (x − a ) ⊕ ( y − a ) ◦ ◦ ◦ (f.14) (x ∧ y ) − a ≤ (x − a ) ∧ ( y − a ) ◦ ◦ ◦ (f.15) (x − a ) ⊕ a = x ⊕ a ◦ (f.16) (x ⊕ a ) − a = x − a ◦ ◦ (f.17) x − (a ⊕ b) = (x − a ) − b = (x − b ) − a ◦ ◦ ◦ ◦◦ (f.18) x − (a ∧ b) = (x − a ) ⊕ (x − b ) ◦ ◦ ◦ (f.19) ax − ab ≤ a (x − b ) ◦ ◦ (f.20) x = (x ∧ y ) ⊕ (x − y ) ◦ (f.21) ◦ a \x = a −1 x . Dually, (4.23)–(4.24) yields (x − a ) ⊕ a ≤ x and (x ⊕ a ) − a ≤ x . ◦ ◦ But this is weaker than what results from Theorem 4.60. Clearly Ta is a closure mapping, hence from (4.44), (x − a ) − a = x − a , ◦ ◦ ◦ and one also gets (f.16)–(f.17) from (4.45)–(4.46). It follows that x ≥ a ⇒ (x − a ) ⊕ a = x and (x ≥ a , y ≥ a , x − a = y − a ) ⇒ (x = y ) , ◦ ◦ ◦ which may also be viewed as consequences of the dual of (4.28) by observing that Ta is surjective if its range is restrained to a ⊕ D. Formulæ (f.7)–(f.8) are consequences of (4.25)–(4.26). The dual result stated for ⊕ and − is weaker than (f.16)–(f.17). ◦ As a consequence of (4.31) or its dual, a≤b⇔ x x ≥ , a b ∀x ⇔ { x − a ≥ x − b , ◦ ◦ ∀x } . ◦ In particular, a ≥ e ⇔ a \x ≤ x , ∀x and x − a ≤ x , ∀a , ∀x since a is always greater ◦ than or equal to ε and x − ε = x . ◦ Using (4.29) and (4.49), one gets (f.9). Dually, using (4.47), Formula (f.18) is derived. Formula (f.10) is a consequence of (4.50) and (4.29). To obtain (f.12), one makes use of (4.30) with = = L a and = = Rb , and also of (4.50). The ◦ ◦ ◦ proof of (f.11) essentially uses (f.5) twice: x ≥ a (a \x ) ≥ (a/b ) b (a \x ) (the latter inequality arising from the version of (f.5) written for right multiplication and division applied to the pair (a , b ) instead of (x , a )); by associativity of the product, and from the very definition of L a/b (x ), one obtains (f.11). ◦ 182 Synchronization and Linearity Equations (4.35) and (4.51) yield Formula (f.3) (which should be compared with (4.10)), whereas (4.36) and (4.48) yield (f.19). Because of (4.8), L a ∧b ≤ L a ∧ L b . If L a ∧ L b were residuated, we could use Inequality (4.37) to get that L a ∧b ≥ ( L a ∧ L b ) ≥ L a ⊕ L b , which would prove (f.4). Unfortunately, L a ∧ L b is not residuated in general, unless multiplication is distributive with respect to ∧. A direct proof of (f.4) is as follows: L a ∧b ≤ L a , hence L a ∧b ≥ L a ; similarly L a ∧b ≥ L b ; hence L a ∧b ≥ L a ⊕ L b . As for (4.38) applied to Ta ⊕ Tb , it would yield a weaker result than Equality (f.18). Finally, consider (4.52) and use (4.30) with = = L a , = Tb , = Tab ; this yields (f.13). 
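For finite elements of Rmax, left 'division' a\x reduces to the conventional difference x − a (this is made precise in Example 4.65 below), so several formulæ of Table 4.1 can be spot-checked numerically. A minimal Python sketch (illustrative only, with ad hoc names; (f.5) and (f.6) hold with equality here because finite elements are invertible):

    def oplus(a, b): return max(a, b)
    def wedge(a, b): return min(a, b)
    def ldiv(a, x):  return x - a      # residual of x -> a (x) x, for finite a

    a, b, x, y = 2.0, 5.0, 8.0, 3.0
    assert ldiv(a, wedge(x, y)) == wedge(ldiv(a, x), ldiv(a, y))   # (f.1)
    assert a + ldiv(a, x) <= x                                     # (f.5), equality here
    assert ldiv(a, a + x) >= x                                     # (f.6), equality here
    assert ldiv(a + b, x) == ldiv(b, ldiv(a, x))                   # (f.9)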
Considering (4.52) again, but now in connection with (4.34), and setting = = L a , = Tb , = Tab yield (f.20). An interesting consequence of some of these formulæ is the decomposition of any x with respect to any y as given by (f.21). Indeed, (x ∧ y ) ⊕ (x − y ) ◦ = (x ⊕ (x − y )) ∧ ( y ⊕ (x − y )) ◦ ◦ = = x ∧ (x ⊕ y ) x, the first equality following from the assumption of distributivity, the second based on the fact that x − y ≤ x on the one hand, and (f.16) on the other hand, the last equality ◦ being obvious. As a corollary, x ⊕ y = (x − y ) ⊕ (x ∧ y ) ⊕ ( y − x ) , ◦ ◦ which is straightforward using the decompositions of x with respect to y and of y with respect to x . Remark 4.64 Formula (f.3) can be written L a ⊕b (x ) = L a (x ) ∧ L b (x ), whereas (f.9) can be written L a b (x ) = L b ◦ L a (x ). Then considering the dioid structure of u.s.c. mappings from D into D described in Lemma 4.46 (see (4.16)), it is realized that the mapping a → L a is a homomorphism from D into that dioid of u.s.c. mappings. Likewise, (f.19) can be written Ta ∧b (x ) = Ta (x ) ⊕ Tb (x ), whereas (f.18) can be written Ta ⊕b (x ) = Tb (x )◦ Ta (x ). Remember that now D is supposed to be a distributive dioid. Consider the dioid of l.s.c. mappings with the operations defined by (4.15). Observe that ⊗ is commutative and idempotent when restricted to elements of the form Ta . For the mapping a → Ta to be a homomorphism, we must supply D with the def def addition ⊕ = ∧ and the multiplication ⊗ = ⊕ (see Remark 4.40). Example 4.65 Let us consider the complete dioid Rmax . From the very definition of − , we have that ◦ a if b < a ; ◦ ∀a , b ∈ R , a − b = ε otherwise. ◦ ◦ As for a/b (or b \a , which is the same since the multiplication is commutative), it is equal to a − b (conventional subtraction) whenever there is no ambiguity in this expression, that is, in all cases except when a = b = ε = −∞ and when a = b = 4.4. Isotone Mappings and Residuation 183 ◦ ◦ ◦ = +∞. Returning to the definition of /, it should be clear that ε/ε = / = , which yields the rule ∞ − ∞ = +∞ in conventional notation. Note that the conventional notation should be avoided because it may be misleading. As a matter of fact, we also have that ⊗ ε = ε (according to Axiom 4.7), which yields the rule ∞ − ∞ = −∞: this seems to contradict the previous rule, at least when using conventional notation. ◦ Example 4.66 In order to illustrate the operations − and / in the case of the commu◦ tative dioid of Example 4.17, consider first Figure 4.5 in which A and B are two disks (respectively, transparent and grey). The subset C = B − A is the smallest subset such ◦ that C ∪ A ⊇ B : it is depicted in the figure which also illustrates Formula (f.16) in this particular case. Consider now Figure 4.6 in which A is a disk centered at the origin, ◦ whereas B is a square. Then, C = B/ A is the largest subset such that C + A ⊆ B : it is the dark small square in the middle of the figure. The right-hand side of this figure illustrates Formula (f.5) (written here for right division and multiplication). A B B−A ◦ (B − A ) ⊕ A = B ⊕ A ◦ Figure 4.5: The operation − in 2R , ∪, + ◦ 2 ◦ B /A ◦ (B /A ) ⊗ A ≤ B A B ◦ Figure 4.6: The operation / in 2R , ∪, + 2 We conclude this subsection by considering the problem of ‘solving’ equations of the form ax ⊕ b = c (4.53) in the sense of the greatest subsolution. This amounts to computing (c) with = Ab ◦ L a . 
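The operation −° of Example 4.65 and the decomposition formula (f.21) can likewise be spot-checked in Rmax. A minimal Python sketch (illustrative only; ominus implements x −° a as the least y such that y ⊕ a ≥ x, following Example 4.65):

    import math
    EPS = -math.inf

    def oplus(a, b):  return max(a, b)
    def wedge(a, b):  return min(a, b)
    def ominus(a, b): return a if b < a else EPS   # a -o b in R_max (Example 4.65)

    for x, y in [(7.0, 3.0), (3.0, 7.0), (4.0, 4.0)]:
        assert oplus(ominus(x, y), y) == oplus(x, y)        # (f.16)
        assert ominus(oplus(x, y), y) == ominus(x, y)       # (f.17)
        assert oplus(wedge(x, y), ominus(x, y)) == x        # (f.21): decomposition of x w.r.t. y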
Notice that, as discussed at the beginning of this subsection, we have to restrain the image of Tb to b ⊕ D for the mapping Ab to be residuated. More directly, it is 184 Synchronization and Linearity obvious that the subset of subsolutions of (4.53) is nonempty if and only if c ≥ b . Then (c) = L a because of (4.29) and since Ab = Ib ⊕D as already discussed. We summarize this discussion in the following lemma. Lemma 4.67 There exists a greatest subsolution x to (4.53) if and only if b ≤ c. Then ◦ x = a \c. Of course, other similar equations may be considered as well, for example those involving the residuated mappings Tb ◦ L a and L a ◦ Tb , or the dually residuated mappings Tb ◦ L a and L a ◦ Tb . 4.5 Fixed-Point Equations, Closure of Mappings and Best Approximation In this section we first study general fixed-point equations in complete dioids. The general results are then applied to special cases of interest. These special cases are motivated by some problems of ‘best approximation’ of elements of a dioid by other elements subject to ‘linear constraints’. This kind of questions will arise frequently in Chapter 5 and therefore they are discussed here in some detail. 4.5.1 General Fixed-Point Equations Let D be a complete dioid and consider the following ‘fixed-point’ equation and inequalities: (x ) = (x ) ≤ (x ) ≥ x, x, x, (4.54) (4.55) (4.56) where is an isotone mapping from D into D. Observe that the notion of ‘inequality’ is somewhat redundant in complete dioids: indeed, Inequalities (4.55) and (4.56) can be written as equations, namely (x ) ⊕ x = x and (x ) ∧ x = x , respectively. Let us again consider the operations defined by (4.15) or (4.16). We will use ⊕ instead of ⊕ and ∧ instead of ⊕. Accordingly, ≥ will mean = ⊕ , or else = ∧ (which would not be the case if ∧ were considered as the ‘sum’); k will denote ◦ . . . ◦ and 0 = ID . We introduce the following additional notation: k times ∗ +∞ = +∞ k ; ∗ = k =0 k . (4.57) k =0 Although (4.15) (respectively (4.16)) has been defined only for l.s.c. (respectively u.s.c.) mappings, there is no difficulty in defining ∗ or ∗ for mappings which are neither l.s.c. nor u.s.c. We will also use the following notation: + +∞ = +∞ k k =1 ; + = k k =1 . 4.5. Fixed-Point Equations, Closure of Mappings and Best Approximation 185 Obviously ∗ + = ID ⊕ ∗ hence + ≥ , (4.58) ≥ ID . Also, because of (4.12), but equality holds true if ∗ ◦ but equality holds true if ∗ , ≥ ∗ + = ◦ , (4.59) is l.s.c. Similarly, = ID ∧ , + hence ∗ ≤ , + (4.60) ≤ ID . Also, because of (4.13), but equality holds true if ◦ ∗ ≤ = ∗◦ + , (4.61) but equality holds true if is u.s.c. If is a closure mapping, then ∗ = , and if is a dual closure mapping, then . Indeed, when is not a closure (respectively dual closure) mapping, ∗ ∗= (respectively ∗ ) is its ‘closure’ (respectively ‘dual closure’) under a semicontinuity assumption. Lemma 4.68 Let be a mapping from a complete dioid D into itself. If then ∗ is the least closure mapping which is greater than . Likewise, if then ∗ is the greatest dual closure mapping which is less than . is l.s.c., is u.s.c., Proof We give the proof of the first statement only. By direct calculations using the assumption of lower-semicontinuity, it is first checked that ∗2 = ∗ . (4.62) Since moreover ∗ ≥ ID , ∗ meets properties (4.39) and it is a closure mapping greater than . Then, assume that is another closure mapping greater than . We have ≥ ID , and successively, ≥ , = 2 ≥ ◦ ≥ 2 and k ≥ k , ∀k ∈ N. 
Therefore, by summing up these inequalities, +∞ ≥ k = ∗ , k =0 which completes the proof. Corollary 4.69 Let then be a mapping from a complete dioid D into itself. If is a closure mapping ⇔ If ∗ = . is u.s.c., then is a dual closure mapping ⇔ = ∗ . is l.s.c., 186 Synchronization and Linearity This corollary provides an additional equivalent statement to Theorem 4.59, respectively Theorem 4.60. We now return to (4.54), (4.55) and (4.56) and we set D = {x | (x ) = x }, D = {x | (x ) ≤ x }, D = {x | (x ) ≥ x }. (4.63) Obviously, D = D ∩ D . With the only assumption that is isotone, Tarski’s fixed-point theorem (see [22]) states that D is nonempty (thus, D and D are also nonempty). Theorem 4.70 1. Given two mappings and ≥ , if , then D ⊆ D . 2. If C ⊆ D , then x ∈C x ∈ D ; otherwise stated, the set D with the order induced by that of D is a complete inf-semilattice having the same lower bound operation ∧ as D. Moreover, ∈ D . Hence (by Theorem 4.27, or rather by its dual), D is also a complete lattice, but the upper bound operation does not need to be the same as that of D, the latter being denoted ⊕. 3. If is l.s.c., then C ⊆ D implies x ∈C x ∈ D ; otherwise stated, the set D with the order induced by that of D is a complete sup-semilattice having the same upper bound operation ⊕ as D. 4. Statement 3 holds true also for D . 5. In general, D = D ∗ = D ∗ . Otherwise stated, (x ) ≤ x ⇔ ∗ (x ) ≤ x ⇔ ∗ (x ) = x . (4.64) 6. If is l.s.c., then D = ∗ (D). The minimum element is ∗ (ε) which also belongs to D , and thus is the minimum element of this subset too. Proof 1. Straightforward. 2. Since is isotone, if x , y ∈ D , then x ∧ y ≥ (x ) ∧ ( y ) ≥ (x ∧ y ) (see (4.13)). Hence x ∧ y ∈ D . This result obviously extends to any finite or infinite subset of D . Also ≥ ( ), hence ∈ D . The dual of Theorem 4.27 shows that any (finite or infinite) number of elements of D admit an upper bound in D . But this does not mean that, if x , y ∈ D , then x ⊕ y ∈ D , where of course ⊕ denotes the addition of D. 3. Since is assumed to be l.s.c., for x , y ∈ D , x ⊕ y ≥ (x )⊕ ( y ) = (x ⊕ y ). Hence x ⊕ y ∈ D . This result obviously extends to any finite or infinite subset of D . 4.5. Fixed-Point Equations, Closure of Mappings and Best Approximation 187 4. Same argument. 5. If ≥ ID and if x ∈ D , then x ≥ (x ) ≥ x , hence x = (x ) and x ∈ D , thus D = D . In particular, this is true for = ∗ . Now, if x ∈ D , 2 x ≥ (x ) ≥ (x ) ≥ · · · , and by summing up, x ≥ ∗(x ), hence x ∈ D ∗ and D ⊆ D ∗ . But since ∗ ≥ , the reverse inclusion also holds true and thus D = D ∗ = D ∗. 6. From its very definition, D ∗ ⊆ ∗ (D). On the other hand, let x ∈ ∗ (D), hence ∃ y ∈ D : x = ∗( y ). Then, (x ) = ◦ ∗ ( y ) = + ( y ), the latter equality being true because is assumed to be l.s.c. From (4.58), it follows that (x ) = + ( y ) ≤ ∗ ( y ) = x , hence x ∈ D and ∗ (D) ⊆ D . But since it has been proved that D = D ∗ , finally D = D ∗ = D ∗ = ∗ (D). Since ξ = ∗ (ε) is clearly the minimum element of ∗ (D), it is also that of D , but since D ⊆ D , ξ is a minorant of D . It remains to be proved that ξ indeed belongs to D . This follows from the fact that ∗ (ε) = + (ε) = ◦ ∗ (ε), hence ξ = (ξ ). Lemma 4.71 If is residuated, then x≥ (x ) ⇔ ∗ is also residuated, and (x ) ≥ x , that is, D =D . (4.65) Moreover, ∗ = ∗ . (4.66) Proof If is residuated, the fact that ∗ is also residuated is immediate from its definition. Then (4.65) is a direct consequence of (4.19)–(4.20). 
Equality (4.66) can be proved using (4.29) and (4.35) (or rather its extension to infinitely many operations ⊕ and ∧). The following dual of Theorem 4.70 is stated without proof. Theorem 4.72 1. If ≥ , then D ⊇ D . 2. If C ⊆ D , then x ∈C x ∈ D ; otherwise stated, the set D with the order induced by that of D is a complete sup-semilattice having the same upper bound operation ⊕ as D. Moreover, ε ∈ D . Hence (by Theorem 4.27), D also is a complete lattice, but the lower bound operation does not need to be the same as that of D, the latter being denoted ∧. 3. If is u.s.c., then C ⊆ D implies x ∈C x ∈ D ; otherwise stated, the set D with the order induced by that of D is a complete inf-semilattice having the same lower bound operation ∧ as D. 4. The same statement holds true for D . 188 Synchronization and Linearity 5. In general, D = D ∗ = D ∗. 6. If is u.s.c., then D = ∗ (D). The maximum element is ∗ ( ) which also belongs to D , and thus is the maximum element of this subset too. 4.5.2 The Case ◦ ( x ) = a \x ∧ b Given a and b in a complete dioid D, we consider the equation ◦ x = a \x ∧ b , (4.67) ◦ x ≤ a \x ∧ b . (4.68) and the inequality We may use all the conclusions of Theorem 4.72 since the corresponding is u.s.c. ( is the composition of two u.s.c. mappings, namely L a introduced in §4.4.4, and x → x ∧ b ). Let us evaluate ∗ (x ). By using (f.1) and (f.9) of Table 4.1, it follows that ◦ a \x ∧ b x b 2 (x ) = ∧b = 2 ∧b∧ , a a a . . . x b b k (x ) = k ∧ b ∧ ∧ · · · ∧ k −1 . a a a Taking the lower bound on both sides of these equalities for k = 0, 1, 2, . . . , and using (f.3) (more properly, using its extension to infinitely many operations), it follows that ∗ (x ) = = = = x x b b ∧ 2 ∧ ···∧ b∧ ∧ 2 ∧ ··· a a a a x b ∧ e ⊕ a ⊕ a2 ⊕ · · · e ⊕ a ⊕ a2 ⊕ · · · x∧b e ⊕ a ⊕ a2 ⊕ · · · x ∧b , a∗ x∧ (4.69) with a∗ = e ⊕ a ⊕ a2 ⊕ · · · . (4.70) ◦ Returning to Theorem 4.72, we know that ∗ ( ) = a ∗ \b is the maximum element of both subsets of solutions to (4.67) and (4.68). On the other hand, ε solves (4.68), ◦ but it also solves (4.67), unless a = ε and b = ε (note that ε \ε = ). Because of statement 5 of Theorem 4.72, if x solves (4.68), and a fortiori if it solves (4.67), that ◦ is, if x ∈ D , then x ∈ D ∗ , that is, x = ∗(x ) = a ∗ \(x ∧ b ). This implies that ∗◦ x = a \ x since x ≤ b as a solution of (4.68). We summarize these results in the following theorem. 4.5. Fixed-Point Equations, Closure of Mappings and Best Approximation 189 Theorem 4.73 Consider Equation (4.67) and Inequality (4.68) with a and b given in a complete dioid D. Then, ◦ 1. a ∗ \b is the greatest solution of (4.67) and (4.68); ◦ 2. every solution x of (4.67) and (4.68) satisfies x = a ∗ \x; 3. ε is the least solution of (4.68), and it is also the least solution of (4.67) provided that a = ε or b = ε. Remark 4.74 (Some identities involving a ∗ ) Observe that the notation a ∗ of (4.70) may be justified by the fact that L ∗ = L a∗ , a (4.71) which is a consequence of (4.57), (4.49) and (4.51). Since L ∗ is a closure mapping, a 2 L ∗ = L ∗ and with (4.49) and (4.71), it follows that a a (a ∗ )2 = a ∗ , hence (a ∗ )∗ = a ∗ . (4.72) ◦ Consider again (x ) = a \x (derived from the previous by letting b = mapping is nothing but L a . By using (4.69) in this case, we see that ( L a )∗ = ( L a ∗ ) = L ∗ a , ). This (4.73) the latter equality following from (4.71). Indeed, this is a particular instance of Equation (4.66). Since L ∗ is a dual closure mapping, it is equal to its square, hence, with a (4.73), ∀x , ◦ x a ∗ \x = . 
∗ ∗ a a (4.74) Since L a ∗ is a closure mapping, and since its residual is ( L a ∗ ) , from (4.42), it follows that ∀x , a∗x = (a ∗ x ) a∗ in particular, a ∗ = a∗ a∗ . (4.75) From (4.41), we obtain ∀x , 4.5.3 The Case x x = a∗ ∗ a a∗ . (4.76) (x ) = ax ⊕ b Given a and b in a complete dioid D, we consider the equation x = ax ⊕ b , (4.77) 190 Synchronization and Linearity and the inequality x ≥ ax ⊕ b . We may use all the conclusions of Theorem 4.70 since the corresponding direct calculation shows that ∗ (4.78) is l.s.c. A (x ) = a ∗ (x ⊕ b ) . Then, ∗ (ε) = a ∗ b is the minimum element of both subsets of solutions to (4.77) and (4.78). On the other hand, solves (4.78), but it also solves (4.77) if D is Archimedian, unless a = ε and b = . Because of (4.64), if x solves (4.78), and a fortiori if it solves (4.77), then x = a ∗ (x ⊕ b ). This implies that x = a ∗ x since x ≥ b as a solution of (4.78). We summarize these results in the following theorem. Theorem 4.75 Consider Equation (4.77) and Inequality (4.78) with a and b given in a complete dioid D. Then, 1. a ∗ b is the least solution of (4.77) and (4.78); 2. every solution x of (4.77) and (4.78) satisfies x = a ∗ x; 3. is the greatest solution of (4.78), and, if D is Archimedian, it is also the greatest solution of (4.77) provided that a = ε or b = . We conclude this subsection by showing a result which is analogous to a classical result in conventional linear algebra. Namely, in conventional algebra, let A be an n × n matrix and b be an n -dimensional column vector, it is known that all the solutions of Ax = b can be obtained by summing up a particular solution of this equation with all solutions of the ‘homogeneous’ equation Ax = 0. More precisely, if Ax = b and if Ay = 0, then, by summing up the two equations, one obtains A(x + y ) = b . This statement and proof also hold true for equation (4.77) in a dioid, where x = ax (4.79) plays the part of the homogeneous equation. Conversely, in conventional algebra, if Ax = b and Ax = b , by subtraction, y = x − x satisfies Ay = 0. This latter argument cannot be translated straightforwardly to the dioid situation. Indeed, one should first observe that, since ‘adding’ also means ‘increasing’ in a dioid, one cannot recover all solutions of (4.77) by adding something to a particular solution, unless this is the least solution. Moreover, the proof by subtraction has to be replaced by another argument. We are going to see that the ‘minus’ operation − indeed plays a part in proving essentially the expected result, al◦ though, admittedly, things are somewhat more tricky. Since we are playing with − , ◦ we recall that D has to be assumed distributive. Theorem 4.76 Let D be a distributive dioid (which may as well be a matrix dioid—see §4.6). A necessary and sufficient condition for x to be a solution of (4.77) is that x can be written y ⊕ a ∗ b, where y is a solution of (4.79). 4.5. Fixed-Point Equations, Closure of Mappings and Best Approximation 191 Proof Let x be a solution of (4.77). Consider the decomposition of x with respect to a ∗ b (see (f.21) of Table 4.2), that is, x = = ◦ (x ∧ a ∗ b ) ⊕ (x − a ∗ b ) ◦ a ∗ b ⊕ (x − a ∗ b ) ◦ since x ≥ a ∗b by Theorem 4.75. Let r = x − a ∗ b . One has that r = = = ≤ ≤ (ax ⊕ b) − a ∗ b ◦ (ax − a ∗ b ) ⊕ (b − a ∗ b ) ◦ ◦ ax − a ∗ b ◦ ax − a + b ◦ a (x − a ∗ b ) = ar ◦ owing to (4.77), using (f.14), since b − a ∗ b = ε because e ≤ a ∗ , ◦ since a ∗ ≥ a + , using (f.20). Since x = a ∗ b ⊕ r , one also has a ∗ x = a ∗ a ∗ b ⊕ a ∗r = a ∗ b ⊕ a ∗r from (4.62). 
But def x = a ∗ x (by Theorem 4.75) and thus x = a ∗ b ⊕ y with y = a ∗r . Observe that y = a ∗ y (from (4.62) again). Since r ≤ ar , then y ≤ ay , and hence, multiplying by a ∗ , we obtain y = a ∗ y ≤ a + y ≤ a ∗ y = y . Finally, we have proved that y = ay and that x = a ∗ ⊕ y . 4.5.4 Some Problems of Best Approximation Let us give the practical rationale behind solving inequalities such as (4.68) or (4.78) in the sense of finding an ‘extremal’ solution (respectively the maximum or the minimum). This motivation will be encountered several times in Chapter 5. In a complete dioid D, for some given a , D L a is the subset {x ∈ D | x ≥ ax } . Such subsets enjoy nice properties that will be described later on. Let I be the ‘canonical injection’ from D L a into D, namely I : x → x . Given any b , if b ∈ D L a , there is no solution to I (x ) = b , x ∈ D L a . However, residuation theory provides an answer by looking for the maximum element in D L a which is less than b , or the minimum element in D L a which is greater than b , as long as I is both residuated and dually residuated (which we will check later on). In some sense, these solutions can be viewed as ‘best approximations from above or from below’ of b by elements of D L a . It will be shown that these two residuation problems are directly related to the problems of §4.5.2 and §4.5.3. We first study several equivalent characterizations of D L a and the structure of this subset. Lemma 4.77 1. We have the following equivalences: ◦ ◦ x ≥ ax ⇔ x = a ∗ x ⇔ x ≤ a \x ⇔ x = a ∗ \x . (i) (ii) 2. The subset D L a contains ε and tive ideal, that is, ∀x ∈ D L a , (iii) (4.80) (iv) ; it is closed for addition; it is a left multiplica∀y ∈ D , a fortiori, it is closed for multiplication. x y ∈ DLa ; 192 Synchronization and Linearity 3. The subset D L a is the image of D by L a ∗ and also by ( L a ∗ ) , that is, ∀ y ∈ D, ◦ a ∗ y ∈ D L a and a ∗ \ y ∈ D L a ; the subset D L a is a complete dioid with a ∗ as its identity element (it is a subdioid of D only if a ≤ e and D L a = D). Proof 1. The equivalences (i) ⇔ (ii) and (iii) ⇔ (iv) are direct consequences of Theorems 4.70 and 4.72 (statement 5), respectively, applied to = L a and = L a . The equivalence (i) ⇔ (iii) comes from (4.65). 2. We may use any of the equivalent characterizations of D L a to prove the rest of the statements of this lemma. For each statement, we choose the most adequate characterization, but, as an exercise, we invite the reader to use the other ones to prove the same statements. Of course ε ≥ a ε and ≥ a . We have [x ≥ ax , y ≥ ay ] ⇒ x ⊕ y ≥ a (x ⊕ y ). Also x ≥ ax ⇒ ∀ y ∈ D, x y ≥ a (x y ). 3. The first part of the statement is a direct consequence of statements 6 of Theorems 4.70 and 4.72 (applied to = L a and to = L a , respectively), and of (4.73). For all x ∈ D L a , a ∗ x = x , and hence a ∗ behaves as the identity element in D L a . Therefore, D L a satisfies all the axioms of a dioid; it is even a complete dioid since it is also closed for infinite sums. It is a subdioid of D if a ∗ coincides with e. Since a ≤ a ∗, this implies that a ≤ e. In this case, D L a coincides with D, which is a rather trivial situation. Since D L a = L a ∗ (D), from now on we will prefer the more suggestive notation a ∗D instead of D L a . Let us now return to the problem of the best approximation of b by the ‘closest’ element of a ∗D among those which are either ‘below’ or ‘above’ b . 
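Before making this approximation problem precise, here is a minimal numerical sketch, taking D to be the max-plus dioid Rmax (one possible choice; the helper names and the numerical data below are ours, not the book's). It computes a* as e ⊕ a ⊕ a² ⊕ ..., checks that a*b solves x = ax ⊕ b and satisfies x = a*x (Theorem 4.75), and checks that any element of the form a*y satisfies x ≥ ax, as stated in Lemma 4.77.

```python
import numpy as np

EPS = -np.inf                       # epsilon, the zero element of R_max

def oplus(A, B):                    # dioid addition: entrywise max
    return np.maximum(A, B)

def otimes(A, B):                   # dioid multiplication: (AB)_ij = max_k (A_ik + B_kj)
    return np.array([[(A[i, :] + B[:, j]).max() for j in range(B.shape[1])]
                     for i in range(A.shape[0])])

def star(A, kmax=100):              # a* = e + a + a^2 + ... , summed until it stabilizes
    S = P = np.where(np.eye(A.shape[0], dtype=bool), 0.0, EPS)   # identity of R_max^{n x n}
    for _ in range(kmax):
        P = otimes(P, A)
        S_next = oplus(S, P)
        if np.array_equal(S_next, S):
            return S
        S = S_next
    raise ValueError("a* does not stabilize (some circuit has weight greater than e)")

a = np.array([[EPS, -1.0],
              [-2.0, EPS]])         # all circuits have nonpositive weight, so a* is finite
b = np.array([[3.0], [0.0]])

a_star = star(a)
x = otimes(a_star, b)                                 # least solution of x = a x + b
print(np.array_equal(x, oplus(otimes(a, x), b)))      # True: x is a fixed point
print(np.array_equal(x, otimes(a_star, x)))           # True: x = a* x, so x belongs to a*D
y = np.array([[7.0], [-4.0]])
z = otimes(a_star, y)                                 # any a* y lies in a*D (Lemma 4.77)
print(np.all(z >= otimes(a, z)))                      # True: z >= a z
```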
More precisely, we look for the greatest x ∈ a ∗ D such that I (x ) ≤ b or for the least x ∈ a ∗ D such that I (x ) ≥ b. Such problems are well-posed if I is residuated or dually residuated respectively. This is indeed the case thanks to the fact that a ∗ D is a complete dioid containing ε and which are mapped to the same elements of D (and all continuity assumptions needed are satisfied by I ). Consider the former problem of approximation from below, the solution of which is I (b ) by definition. We show that this problem is the same as that of finding the ◦ greatest element of D with (x ) = a \x ∧ b . Indeed, x must be less than b ; it must ◦ ◦ also belong to a ∗ D, hence x ≤ a \x , thus x ≤ a \ x ∧ b . Conversely, this inequality ◦ x , hence it belongs to a ∗ D . Therefore, implies that x is less than b and less than a \ ◦ from the results of §4.5.2, we conclude that I (b ) = a ∗ \b . Similarly, it can be shown that finding I (b ) is the same problem as finding the least element of D with (x ) = ax ⊕ b . The solution has been given in §4.5.3 and therefore I (b ) = a ∗b . We consider the mapping which associates with any b ∈ D its best approximation from below (or from above) in a ∗ D. This mapping is of course surjective (any element of a ∗ D is its own approximation) but not injective: several b having the same best approximation are said to be ‘equivalent’. We can partition D into equivalence classes. The following theorem summarizes and completes this discussion. 4.5. Fixed-Point Equations, Closure of Mappings and Best Approximation 193 Theorem 4.78 1. Let I : a ∗ D → D be such that I (x ) = x. The canonical injection I is both residuated and dually residuated and I (b ) = b , a∗ I (b ) = a ∗ b . 2. The mapping I : D → a ∗ D is a surjective l.s.c. dioid homomorphism. ConsidI ering the equivalence relation ≡ in D (see (4.3)), then for any b ∈ D, its equivalence class [b ] contains one and only one element which can also be viewed as an element of a ∗ D and which, moreover, is the maximum element in [b ]. This element is precisely given by I I (b ) = a ∗ b. 3. The mapping I : D → a ∗ D is surjective and u.s.c. (it is not a homomorphism). I Considering the equivalence relation ≡ in D, then for any b ∈ D, its equivalence class [b] contains one and only one element which can also be viewed as an element of a ∗ D and which, moreover, is the minimum element in [b ]. This ◦ element is precisely given by I I (b ) = a ∗ \b. Proof 1. Already done. 2. The fact that I is a homomorphism is obvious from its explicit expression; it is l.s.c. (as a dual residual) and surjective as already discussed. Each equivalence I class by ≡ has a maximum element b by Lemma 4.49, and an explicit expression for b has been given in Remark 4.57: here = I , thus = I and hence b = I (a ∗ b ). Clearly, b may be considered as an element of a ∗D. If another b belongs at the same time to [b ] (hence a ∗ b = a ∗ b ) and to a ∗ D (hence b = a ∗ b ), then b = a ∗ b = a ∗ b = b and b coincides with b . I 3. Dual arguments can be used here. The main difference is that ≡ is not a congruence because I is only a ∧-morphism (indeed it is u.s.c.), but it does not behave well with respect to ⊕ and ⊗. Concrete applications of these results will be given in Chapter 6 and 5. Remark 4.79 Most of the results of this subsection can be generalized to the situation when the subset a ∗ D characterized by (4.80) is replaced by D (see (4.63)) with residuated. 
Theorems 4.70 and 4.72, and Lemma 4.71 show that other characterizations of D are x= ∗ (x ) ⇔ x ≤ (x ) ⇔ x = ∗ (x ) = ∗ (x ) ; that this subset contains ε and ; that it is closed for addition (but it is no longer a left multiplicative ideal, unless satisfies (x ) y ≥ (x y ), ∀x , y , which is equivalent to 194 Synchronization and Linearity (x ) y ≤ (x y )); that it is the image of the whole D by best approximation of some b from above in D is given by ∗ and also by ( ∗ ) . The (b ). It is the maximum ∗ ∗ representative of the equivalence class [b ] of b for the equivalence relation ≡ and the only element in [b ] ∩ D . Dual statements hold true for the best approximation from below given by ( ∗) (b ). 4.6 Matrix Dioids 4.6.1 From ‘Scalars’ to Matrices Starting from a ‘scalar’ dioid D, consider square n × n matrices with entries in D. The sum and product of matrices are defined conventionally after the sum and product of scalars in D. The set of n × n matrices endowed with these two operations is also a dioid which is denoted Dn×n . The only point that deserves some attention is the existence of an identity element. Thanks to Axiom 4.7, the usual identity matrix with entries equal to e on the diagonal and to ε elsewhere is the identity element of Dn×n . This identity matrix will also be denoted e and the zero matrix will simply be denoted ε. Remark 4.80 We prefer to move from ‘scalars’ directly to square matrices. In this way the product of two matrices is a matrix of the same type and Dn×n can be given a dioid structure too (multiplication remains an ‘internal’ operation). In fact, from a practical point of view and for most issues that will be considered later on, in particular linear equations, we can deal with nonsquare matrices, and especially with row or column vectors, as well. This is just a matter of completing the nonsquare matrices by rows or columns with entries equal to ε in order to convert them into square matrices, and to check that, for the problem considered, this artificial part does not interfere with the real part of the problem and that it only adds a trivial part to that problem. Notice that if D is a commutative dioid, this is not the case for Dn×n in general. Even if D is entire, Dn×n is not so. Example 4.81 Let n = 2 and A= ε ε a ε . Then A2 = A ⊗ A = ε although A = ε. Of course A ≥ B in Dn×n ⇔ { Ai j ≥ Bi j in D , i = 1, . . . , n , j = 1, . . . , n } . Even if D is a chain, Dn×n is only partially ordered. If D is complete, Dn×n is complete too. Moreover ( A ∧ B )i j = Ai j ∧ Bi j . 4.6. Matrix Dioids 195 If D is distributive, Dn×n is also distributive. Even if D is Archimedian, Dn×n is not Archimedian. Here is a counterexample. Example 4.82 Let n = 2 and consider the matrices A= a ε ε ε and B = ε ε ε b . Then there is obviously no matrix C such that AC ≥ B . In §2.3 it was shown how weighted graphs can be associated with matrices, and moreover, in the case when the entries lie in sets endowed with two operations ⊕ and ⊗ satisfying certain axioms, how the sum and the product of two matrices can be interpreted in terms of those graphs (see §2.3.1). These considerations are valid for matrices with entries belonging to a general dioid. The only point that deserves some attention is the notion of ‘circuit of maximum weight’ in the case when the underlying dioid is not a chain. We will discuss this issue in the case of polynomial matrices in §4.7.3. 
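As a small illustration before moving on to residuation, here is a sketch of the matrix dioid with D taken to be Rmax (our choice; the helper names and numerical values are ours). It reproduces the phenomenon of Example 4.81 (a nonzero matrix whose square is the zero matrix, so that the matrix dioid is not entire) and the obstruction of Example 4.82 to the Archimedian property.

```python
import numpy as np

eps = -np.inf                      # epsilon of the scalar dioid D = R_max

def mtimes(A, B):                  # product in D^{n x n}: (AB)_ij = max_k (A_ik + B_kj)
    return np.array([[(A[i, :] + B[:, j]).max() for j in range(B.shape[1])]
                     for i in range(A.shape[0])])

# Example 4.81: D^{n x n} is not entire even when D is.
a = 5.0                            # any element of D different from epsilon
A = np.array([[eps, a],
              [eps, eps]])
print(mtimes(A, A))                # [[-inf, -inf], [-inf, -inf]] : A^2 = eps although A != eps

# Example 4.82: D^{n x n} is not Archimedian even when D is.
# With A = diag(a, eps), the second row of A C is identically eps for every C,
# so no C can satisfy A C >= diag(eps, b).
A2 = np.array([[2.0, eps], [eps, eps]])
C = np.array([[100.0, 100.0], [100.0, 100.0]])   # even a 'very large' C fails
print(mtimes(A2, C))               # the second row stays [-inf, -inf]
```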
4.6.2 Residuation of Matrices and Invertibility We consider the mapping L A from Dn into Dn defined by x → Ax , where A ∈ Dn×n and D is a dioid in which ∧ exists. Returning to Remark 4.80, we could rather define a mapping from Dn×n (which is a dioid unlike Dn ) into Dn×n , namely X → AX and then use it for X ∈ Dn×n having its first column equal to x ∈ Dn and its n − 1 last columns identically equal to ε. The purpose here is to establish a formula for L A and then to study conditions of exact invertibility to the left of matrix A. Indeed, it is not more difficult to consider a ‘matrix of operators’ in the following way. To keep notation simple, we take n = 3 but the generalization is straightforward. Then, consider six dioids {Di }i =1,2,3 and {C j } j =1,2,3 and nine residuated mappings i j from D j to Ci . The mapping maps D1 × D2 × D3 into C1 × C2 × C3 and is defined as follows: x1 y1 11(x 1 ) ⊕ 12 (x 2 ) ⊕ 13 (x 3 ) : x = x 2 → y = y2 = 21(x 1 ) ⊕ 22 (x 2 ) ⊕ 23 (x 3 ) . x3 y3 31(x 1 ) ⊕ 32 (x 2 ) ⊕ 33 (x 3 ) It is interesting to consider as the sum of the following three mappings: 11 (x 1 ) 12(x 2 ) 13 (x 3 ) 1 (x ) = 22 (x 2 ) , 2 (x ) = 23(x 3 ) , 3 (x ) = 21 (x 1 ) . 33 (x 3 ) 31(x 1 ) 32 (x 2 ) The reason for considering these mappings is that their residuals should be obvious since each yi depends upon a single x j (or otherwise stated, they are ‘diagonal’ up to a permutation of ‘rows’). For instance, 21 ( y2 ) x = 3 ( y ) = 32 ( y3 ) . 13 ( y1 ) 196 Then, since Synchronization and Linearity = 1 ⊕ 2 ⊕ 3, ( y) = by application of (4.35), one obtains 11 ( y1 ) ∧ ( y1 ) ∧ 12 13 ( y1 ) ∧ 21 ( y2 ) ∧ ( y2 ) ∧ 22 23 ( y2 ) ∧ 31 ( y3 ) 32 ( y3 ) . 33 ( y3 ) Returning to the mapping L A : x → Ax , we will use the natural notation A \ y for L A ( y ). It should be kept in mind that L A is not a ‘linear’ operator in general, that is, it is not expressible as the left product by some matrix. The following lemma is indeed just a corollary of the considerations just made. Lemma 4.83 If A = Ai j ∈ Dn×n where D is a dioid in which ∧ exists, and y ∈ Dn×1 , then n ◦ ( A \ y )i = ◦ A ji \y j . (4.81) j =1 ◦ Therefore, calculating A \ y amounts to performing a kind of (left) matrix product of the vector y by the transpose of matrix A where multiplication is replaced by (left) ◦ division and addition is replaced by lower bound. Recall that \ is distributive with respect to ∧ as shown by Formula (f.1). With A, D ∈ Dm ×n , B ∈ Dm × p , C ∈ Dn× p , it is straightforward to obtain the ◦ ◦ following more general formulæ for C = A \ B and D = B/C : p m Ci j = ◦ Aki \ Bk j , Di j = k =1 ◦ Bik/C j k . (4.82) k =1 We now consider conditions under which there exists a left inverse to A ∈ Dn×n , that is, an operator B from Dn to Dn such that B ◦ A = I (here we use I instead of IDn to denote identity). If D is a commutative dioid, and B ∈ Dn×n (as A does), Reutenauer and Straubing [118] proved that B A = I ⇔ AB = I . In what follows we do not assume that the operator B can be expressed as the left product by a matrix (see Remark 4.85 below) nor that there exists a right inverse to A. Lemma 4.84 Let D be a complete Archimedian dioid and let A be an n × n matrix with entries in D. A necessary and sufficient condition for the existence of a left inverse operator to A is that there is one and only one entry in each row and column of A which is different from ε and each such an entry has a left inverse. 
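Before the proof, here is a small numerical sketch of formula (4.81) with D = Rmax, where the scalar residuation a\y is conventional subtraction y − a (with the convention ε\y = ⊤ = +∞); the helper names and the data are ours.

```python
import numpy as np

EPS, TOP = -np.inf, np.inf          # bottom and top of the complete dioid R_max

def otimes(A, x):                   # max-plus matrix-vector product, with eps * anything = eps
    out = np.full(A.shape[0], EPS)
    for i in range(A.shape[0]):
        terms = [A[i, j] + x[j] for j in range(A.shape[1]) if A[i, j] > EPS and x[j] > EPS]
        out[i] = max(terms, default=EPS)
    return out

def ldiv_scalar(a, y):              # a \ y : greatest z such that a + z <= y
    if a == EPS or y == TOP:
        return TOP
    return y - a

def ldiv(A, y):                     # formula (4.81): (A \ y)_i = min_j ( A_ji \ y_j )
    return np.array([min(ldiv_scalar(A[j, i], y[j]) for j in range(A.shape[0]))
                     for i in range(A.shape[1])])

A = np.array([[2.0, EPS],
              [1.0, 3.0]])
y = np.array([5.0, 4.0])

x = ldiv(A, y)                      # greatest x such that A x <= y componentwise
print(x)                            # [3. 1.]
print(otimes(A, x) <= y)            # [ True  True ]
print(otimes(A, x + 0.5) <= y)      # [False False] : increasing x breaks A x <= y
```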
Proof Notice first that if B ◦ A = I , it can be proved that A is injective by using a similar ◦ argument as that used in the proof of (4.27). Then (4.27) again shows that A \ A = I . ◦ Hence x = A \ ( Ax ), ∀x . Fix any i ∈ {1, . . . , n } and set x i = ε and x j = , ∀ j = i . Using (4.81) and the conventional matrix product formula, one obtains n xi = ε = k =1 j =i Ak j x j Aki . (4.83) 4.7. Dioids of Polynomials and Power Series 197 For the lower bound, we can limit ourselves to the indices k ∈ K (i ), where K (i ) = ◦ {k | Aki = ε}, since Aki = ε ⇒ ( Aki \ y = , ∀ y ). This subset is nonempty for all i since otherwise the right-hand side of (4.83) would be equal to for the reason just given, yielding a contradiction (or equivalently, because A is injective and there is no column of A which is identically equal to ε). For all k ∈ K (i ), we consider J (i, k ) = j | j = i, Ak j = ε . If none of these J (i, k ) was empty, we would again reach a contradiction since the right-hand side of (4.83) would again be equal to . Therefore we have proved that for every column i , there exists at least one row k such that Aki = ε and all other entries in the same row are equal to ε. Since such a row can obviously be associated with only one index i , there are exactly n rows in A with a single nonzero entry. Hence A contains exactly n nonzero entries, but since it has no column identically zero, each column must also contain exactly one nonzero entry. Therefore, up to a permutation of rows and columns, A is a diagonal matrix. Then, using vectors x which are columns of the identity matrix, it is easy to prove that each diagonal term has a left inverse. This result generalizes similar results by Wedderburn [128] and Rutherford [120] for Boolean matrices (observe that the Boole algebra is a complete Archimedian dioid). Other extensions using different assumptions on the dioid D are discussed in the notes section. ◦ ◦ Remark 4.85 Of course, the mapping y → A \ y (denoted A \·) is a ∧-morphism, but ◦ when A has a left inverse, A \· is also a ⊕-morphism when restricted to the image of A. As a matter of fact, ◦ ◦ ◦ x ⊕ y = A \ ( Ax ⊕ Ay ) ≥ A \ ( Ax ) ⊕ A \ ( Ay ) , ◦ since A \· is isotone. But the last term is also equal to x ⊕ y , hence equality holds throughout. However, when D is not commutative, this is not sufficient to associate a matrix with this operator. 4.7 Dioids of Polynomials and Power Series 4.7.1 Definitions and Properties of Formal Polynomials and Power Series Starting from a ‘scalar’ dioid D, we can consider the set of formal polynomials and power series in one or several variables with coefficients in D. If several variables are involved (e.g. z 1 and z 2 ), we only consider the situation of commutative variables (e.g. z 1 z 2 and z 2 z 1 are considered to be the same object). Exponents ki of z i can be taken in N or in Z: in the latter case, one usually speaks of ‘Laurent series’. Definition 4.86 (Formal power series) A formal power series in p (commutative) variables with coefficients in D is a mapping f from N p or Z p into D: ∀k = (k1 , . . . , k p ) 198 Synchronization and Linearity k ∈ N p or Z p , f (k ) represents the coefficient of z k1 . . . z pp . Another equivalent represen1 tation is k f (k1 , . . . , k p )z k1 . . . z pp . 1 f= (4.84) k ∈N p or Z p Remember that e.g. f (3) denotes the coefficient of z 3 , not the ‘numerical’ value of the series for z = 3. First, this has no meaning if D is not a dioid of numbers but just an abstract dioid. 
Second, even if D is a set of numbers, we are not dealing here with numerical functions defined by either polynomials or series, we only deal with formal objects. The relationship between a formal polynomial and its related numerical function was discussed in Chapter 3. Definition 4.87 (Support, degree, valuation) The support supp( f ) of a series f in p variables is defined as supp( f ) = k ∈ Z p | f (k ) = ε . The degree deg( f ) (respectively valuation val( f )) is the upper bound (respectively p lower bound) of supp( f ) in the completed lattice Z , where Z denotes Z ∪ {−∞} ∪ {+∞}. Example 4.88 For p = 2 and f = z 1 z 4 ⊕ z 2 z 3 , deg ( f ) = (2, 4) and val( f ) = (1, 3). 2 12 Definition 4.89 (Polynomial, monomial) A polynomial (respectively a monomial) is a series with a finite support (respectively with a support reduced to a singleton). The set of formal series is endowed with the following two operations: f ⊕ g : ( f ⊕ g )(k ) = f (k ) ⊕ g (k ) , f ⊗ g : ( f ⊗ g )(k ) = f (i ) ⊗ g ( j ) . (4.85) i + j =k These are the conventional definitions of sum and product of power series. The product is nothing other than a ‘convolution’. As usual, there is no ambiguity in using the same ⊕ symbol in (4.84) and for the sum of series. It is easy to see that the set of series endowed with these two operations is a dioid denoted D[[z 1 , ..., z p ]]. In particular, its zero element, still denoted ε, is defined by f (k ) = ε, ∀k , and its identity element e corresponds to f (0, . . . , 0) = e and f (k ) = ε otherwise. Most of the time, we will consider exponents ki ∈ Z; we will not use a different notation when ki ∈ N but we will state it explicitly when necessary. Notice that when k lies in Z p , the definition of f ⊗ g involves infinite sums: for this definition to make sense, it is then necessary to assume that D is complete. This is not required for polynomials. The subset of polynomials is a subdioid of D[[z 1 , ..., z p ]] denoted D[z 1 , ..., z p ]. One has that f ≥ g ⇔ { f (k ) ≥ g (k ) , ∀k } . 4.7. Dioids of Polynomials and Power Series 199 Of course, D[[z 1 , ..., z p ]] is only partially ordered even if D is a chain. The dioid D[[z 1 , ..., z p ]] is commutative if D is commutative (this holds true because we consider commutative variables only). If D is complete, D[[z 1 , ..., z p ]] is complete, but D[z 1 , ..., z p ] is not. Here is a counterexample. Example 4.90 For p = 1, consider the infinite subset of polynomials z k sum is not a polynomial. k ∈N . Their However, if lower bounds can be defined in D, in particular when D is complete, these lower bounds extend to D[z 1 , ..., z p ] and D[[z 1 , ..., z p ]] ‘coefficientwise’. Distributivity of D implies distributivity of D[[z 1 , ..., z p ]]. But even if D is Archimedian, D[[z 1 , ..., z p ]] and D[z 1 , ..., z p ] are not necessarily so when exponents are in N p . Here is a counterexample. Example 4.91 Let p = 1, f = z and g = e. Obviously, there is no h such that f h ≥ g , since z is always a factor of f h , that is, ( f h )(0) = ε, which cannot dominate g (0) = e. Lemma 4.92 If D is Archimedian, D[z 1 , ..., z p ] and D[[z 1 , ..., z p ]] are Archimedian too provided the exponents lie in Z p . Proof Given f = ε and g (Laurent series or polynomials), we must find h such that f h ≥ g . Since f = ε, there exists at least one such that f ( ) = ε. Let f denote the corresponding monomial, that is, f ( ) = f ( ) and f (k ) = ε when k = . Of course, f ≥ f , hence it suffices to find h such that f h ≥ g . 
One has that ( f h )(k ) = f ( )h (k − ). Since D is Archimedian, for all k , there exists an ak such that f ( )ak ≥ g (k ). It suffices to set h (k ) = ak+ . Of course, if g is a polynomial, h can be a polynomial too. Lemma 4.93 We consider supp(.) as a mapping from the dioid D[[z 1 , ..., z p ]] into the p dioid 2Z , ∪, + in which ∧ is ∩, and deg (.) and val(.) as mappings from the dioid p D[[z 1 , ..., z p ]] into the dioid Z, max, + in which all operations are componentwise, in particular ∧ is min componentwise. Then supp( f ⊕ g ) = supp( f ) ⊕ supp(g ) , (4.86) supp( f ∧ g ) = supp( f ⊗ g ) ≤ deg( f ⊕ g ) = supp( f ) ∧ supp(g ) , supp( f ) ⊗ supp(g ) , deg( f ) ⊕ deg(g ) , (4.87) (4.88) (4.89) deg( f ) ∧ deg(g ) , deg( f ) ⊗ deg(g ) , val( f ) ∧ val(g ) , val( f ) ⊕ val(g ) , (4.90) (4.91) (4.92) (4.93) val( f ) ⊗ val(g ) . (4.94) deg( f deg( f val( f val( f ∧ g) ⊗ g) ⊕ g) ∧ g) = ≤ = = val( f ⊗ g ) ≥ 200 Synchronization and Linearity Of course, equalities and inequalities involving the lower bound in D[[z 1 , ..., z p ]] are meaningful only if this lower bound exists. Moreover, all inequalities become equalities if D is entire, and then supp and deg are homomorphisms, whereas val would be a homomorphism if considered as a mapping from D[[z 1 , ..., z p ]] into Z, min, + p . Proof Equation (4.86)—respectively, (4.87)—results from the fact that f (k ) ⊕ g (k ) = ε ⇔ { f (k ) = ε or g (k ) = ε} —respectively, f (k ) ∧ g (k ) = ε ⇔ { f (k ) = ε and g (k ) = ε} . Inequality (4.88) results from the fact that ( f ⊗ g )(k ) = ε ⇒ {∃i, j : i + j = k , f (i ) = ε , g ( j ) = ε} . But the converse statement is also true if D is entire, proving equality in (4.88). Now, to prove the corresponding statements for deg (respectively, val), it suffices to take the upper bound (respectively, the lower bound) at both sides of (4.86)–(4.88) and to observe that, in the particular case of Z, max, + respect to one another. p , ⊕, ⊗, ∧ are distributive with Remark 4.94 Since ⊕, and therefore ≤, operate componentwise for power series, it is clear that ∧ operates also componentwise, as was claimed in Lemma 4.93. However, there is another interesting way of viewing this question. Consider a family { f j } j ∈ J ⊆ D[[z ]] (we limit ourselves to a single variable z simply to alleviate the notation) and the expression fj = f j (k )z k . j∈J j ∈ J k ∈Z Note that the general formula of distributivity of any abstract operation to some other operation is a jk = j ∈ J k∈ K a j ϕ( j ) , with respect (4.95) ϕ∈ K J j ∈ J where K J is the set of mappings from J into K . Applying this formula to our situation, we obtain fj = f j (ϕ( j ))z ϕ( j ) . j∈J ϕ∈Z J j ∈ J Then, since for any a , b ∈ D, az k ∧ bz = ε whenever k = , we can limit ourselves to constant mappings ϕ in the above formula. Therefore, we finally obtain fj = j∈J which is the expected result. f j (k )z k , k ∈Z j ∈ J (4.96) 4.7. Dioids of Polynomials and Power Series 4.7.2 201 Subtraction and Division of Power Series ◦ Since ⊕ operates componentwise, so does − for power series. Let us consider \ ◦ which is more involved since ⊗ is a ‘convolution’. We again limit ourselves to a single variable z without loss of generality. We also assume that the exponent k ranges in Z rather than in N. A power series f with exponents in N is nothing but a series with exponents in Z for which f (k ) = ε for k < 0. 
However, if one considers f = z , for ◦ ◦ example, it should be clear, from the very definition of \, that z \e = z −1 if exponents ◦ are allowed to range in Z and z \e = ε if exponents are restricted to belong to N. Since we consider k ∈ Z, recall that D should be complete. Lemma 4.95 Under the foregoing assumptions, for any given f and h in D[[z ]], one has h = f def g= k ∈Z ∈Z h( ) zk . f ( − k) (4.97) Proof This is another consequence of the considerations preceding Formula (4.81). If h = f ⊗ g , then h ( ) = k (g (k )), where k (x ) = f ( − k )x . Therefore, k g (k ) = k (h ( )), which yields (4.97). Remark 4.96 There is another way to derive (4.97), which makes use of Formula (f.3) of Table 4.1, plus a remark concerning the division by monomial, namely: h ( )z = f (m )z m h( ) z f (m ) −m . (4.98) This formula should be obvious, but note that it is stronger than the inequality derived from (f.2). Now, to derive (4.97), we have k g (k )z k = n m h(n) zn f (m ) zm by definition, h(n) zn = = m h (n ) n f (m ) = m k h (m + k ) k f (m ) z by setting n = m + k , = k m h (m + k ) k z f (m ) by (4.96), = 4.7.3 m k n by (f.3), f (m ) zm z n −m h( ) f ( −k ) zk by (4.98), by setting m = − k . Polynomial Matrices Since D[z 1 , ..., z p ] is a dioid, we may consider square n × n matrices with entries n ×n in this dioid: this is the dioid D[z 1 , ..., z p ] . Here, we just want to return to the 202 Synchronization and Linearity interpretation of such matrices in terms of precedence graphs, and discuss the issue of ‘path or circuit of maximum weight’ through an example. Example 4.97 Suppose D is the dioid of Example 4.12 and let p = 1 and n = 2. Consider the matrix ε e⊕z A= , 3 ⊕ z e ⊕ 2z Figure 4.7 features the weighted graph G ( A). We have 3 ⊕z node 1 node 2 e ⊕ 2z e⊕z Figure 4.7: A graph representation of a polynomial matrix A2 = 3 ⊕ 3z ⊕ z 2 3 ⊕ 5z ⊕ 2z 2 e ⊕ 2z ⊕ 2z 2 3 ⊕ 3z ⊕ 4z 2 . The term ( A2 )22 = 3 ⊕ 3 z ⊕ 4 z 2 gives the upper bound of weights of circuits of length 2 passing through node 2. But no circuit of length 2 corresponds to this weight in Figure 4.7. This is due to the fact that D[z 1 , ..., z p ] is only partially ordered. To figure out what happens, one may adopt the alternative representation shown in Figure 4.8, which amounts to viewing A as being equal to the sum B ⊕ C of two matrices with monomial 3 z node 1 node 2 2z e z e Figure 4.8: Another graph representation of the same matrix entries (according to the rule of parallel composition of graphs explained in §2.3.1— the pair ( B , C ) is of course not uniquely defined). The advantage is that monomials of the same degree can always be compared. It is then seen that the monomial 3 of ( A2 )22 is obtained by going from node 2 to node 1 using the arc weighted 3 and coming back using the arc weighted e; the monomial 3 z of ( A2 )22 is obtained by going from node 2 4.8. Rational Closure and Rational Representations 203 to node 1 using the arc weighted 3 and coming back using the arc weighted z ; finally, the monomial 4 z 2 is obtained by using the loop weighted 2 z twice. Therefore, each entry of A2 can always be interpreted as the weight of a path or a circuit made up of arcs belonging to either G ( B ) or G (C ). 4.8 Rational Closure and Rational Representations The main motivation of this section arises from system theory over dioids, and in particular from realization theory. Therefore, this material forms a bridge to Chapter 5. 
However, since the results hold true in general dioids, this theory of rational calculus has its natural place in the present chapter. 4.8.1 Rational Closure and Rational Calculus We consider a complete dioid D and a subset T ⊆ D which contains ε and e. For example, think of D as the set of formal power series in one or several variables with coefficients in a complete dioid, and of T as the corresponding subset of polynomials. In general, T does not need to be a subdioid. Definition 4.98 (Dioid closure) The dioid closure of a subset T of a dioid D, denoted T , is the least subdioid of D containing T . This definition is well-posed since the set of subdioids containing T is nonempty (it contains D itself) and this set has a minimum element (for the order relation ⊆) since the intersection (lower bound) of a collection of subdioids is a subdioid. The termi= T . Notice that we do not nology ‘closure’ is justified because T ⊇ T and T require T to be complete. It should be clear that T contains, and is indeed reduced to, all elements of D which can be obtained by finite sets of operations ⊕ and ⊗ involving elements of T only. The idea is now to consider ‘scalar’ equations like (4.77), subsequently called ‘affine equations’, with data a and b in T (or, equivalently, in T ). The least solution a ∗b exists in D since D is complete, but it does not necessarily belong to T since the star operation involves an infinite sum. Thus, one may produce elements out of T from data in T or T . One can then use these new elements as data of other affine equations, and so on and so forth. The ‘rational closure’ of T , hereafter defined, is essentially the stable structure that contains all elements one can produce by repeating these operations a finite number of times. We shall see that if we consider matrix, instead of scalar, affine equations (with data in T ), but of arbitrary large, albeit finite, dimensions, it is not necessary to repeat the process of using solutions as data for further equations. In the case when D is a commutative dioid, it is even enough to limit ourselves to weighted sums of solutions to sets of decoupled scalar equations (the weights belonging to T ). Definition 4.99 (Rational closure) The rational closure of a subset T of a complete dioid D, denoted T , is the least subdioid of D containing T and all finite sums, products and star operations over its elements. A subset T is rationally closed if T = T. 204 Synchronization and Linearity This definition is well-posed for the same reason as previously. Moreover, it is clear that (T ) = T and that T = T (hence the terminology ‘closure’). If we go from scalars to matrices, we may first consider the subset T n×n ⊆ Dn×n of n × n matrices with entries in T and its rational closure T n×n . This is a subdioid of Dn×n . On the other hand, we may consider the subdioid (T )n×n ⊆ Dn×n . We state a first result which will be needed soon in its present form, but which will be improved later on (Theorem 4.104). Lemma 4.100 The subdioid T n×n is included in the subdioid (T )n×n . The proof is based on the following technical lemma. Lemma 4.101 For a ∈ Dn×n partitioned into four blocks, namely a= a12 a22 a11 a21 , (4.99) a ∗ is equal to ∗ ∗ ∗ ∗ a11 ⊕ a11 a12 (a21 a11 a12 ⊕ a22 )∗ a21 a11 ∗ ∗ (a21 a11 a12 ⊕ a22 )∗ a21 a11 ∗ ∗ a11 a12 (a21 a11 a12 ⊕ a22 )∗ ∗ (a21 a11a12 ⊕ a22 )∗ . 
(4.100) Proof We use the fact that a ∗ is the least solution of equation x = ax ⊕ e, which yields the system x 11 x 12 x 21 = = = a11 x 11 ⊕ a12 x 21 ⊕ e , a11 x 12 ⊕ a12 x 22 , a21 x 11 ⊕ a22 x 21 , (4.101) (4.102) (4.103) x 22 = a21 x 12 ⊕ a22 x 22 ⊕ e . (4.104) We can solve this system in a progressive manner, using Gaussian elimination. From (4.101) and (4.102), we first calculate ∗ x 11 = a11 (a12 x 21 ⊕ e) and ∗ x 12 = a11 a12 x 22 , which we substitute into (4.103)–(4.104). These equations are then solved for x 21 , x 22, and the solutions are placed back in the equations above, yielding the claimed formulæ. Note that placing partial least solutions in other equations preserves the objective of getting overall least solutions since all operations involved are isotone. Another path to solve the system is to first get x 21 and x 22 from (4.103)–(4.104), and then to calculate x 11 and x 12 . This amounts to interchanging the roles of a11 and a12 with those of a22 and a21 , respectively. Identifying the expression of the solution which one gets in this way with the previous expression, the following identity is obtained: ∗ ∗ ∗ ∗ ∗ (a21 a11 a12 ⊕ a22 )∗ = a22 ⊕ a22 a21(a12 a22 a21 ⊕ a11 )∗ a12 a22 . (4.105) Proof of Lemma 4.100 The sums and products of matrices in T n×n belong to (T )n×n . To prove that T n×n is included in (T )n×n , it remains to be proved that one stays 4.8. Rational Closure and Rational Representations 205 in (T )n×n when performing star operations over elements of T n×n . This is done by induction over the dimension n . The statement holds true for n = 1. Assuming it holds true up to some n − 1, let us prove it is also true for n . It suffices to consider a partitioning of an element of T n×n into blocks as in (4.99) such that a11 is (n − 1) × (n − 1)-dimensional. By inspection of (4.100), and by using the induction assumption, the proof is easily completed. 4.8.2 Rational Representations We are going to establish some results on representations of rational elements. Here the connection with realization theory of rational transfer functions should be clear. For reasons that will become more apparent in Chapter 5, we distinguish two particular subsets of T , namely B and C . There is no special requirement about these subsets except that they both must contain ε and e. Hence, we allow B and C to be overlapping and even identical. The extreme cases are B = C = {ε, e} and B = C = T . Theorem 4.102 The rational closure T coincides with the set of elements x which can be written as x = cx A∗ bx , x (4.106) where Ax ∈ T n x ×n x , bx ∈ Bn x ×1 (column vector), and cx ∈ C 1×n x (row vector). The dimension n x is finite but may depend on x. For short, a representation of x like in (4.106) will be called a (B, C )-representation. Proof Let F be the subset of all elements of D having a (B, C )-representation. This subset includes T because of the following identity: x= e ε ε ε x ε ∗ ε e . Suppose that we have already proved that F is stable by addition, multiplication and star operation, which we postpone to the end of this proof, then F is of course equal to its rational closure F . Since F includes T , F = F includes T . On the other hand, from Lemma 4.100, A∗ has its entries in T . From (4.106), it is thus clear that F is x included in T . Finally, we conclude that F = F = T . 
For the proof to be complete, we have to show that, considering two elements of F , say x and y , which, by definition, have (B, C )-representations, x ⊕ y , x ⊗ y and x ∗ also have (B, C )-representations. This is a consequence of the following formulæ: cx A∗ bx ⊕ c y A∗ b y = x y cx Ax ε cy cx A∗ bx ⊗ c y A∗ b y = x y cx ε ε Ax ε ε ε Ay bx ε ε ∗ bx by , ∗ ε ε cy ε , Ay by 206 Synchronization and Linearity (cx A∗ bx )∗ = x ε Ax cx e bx ε ∗ ε e . These formulæ can be proved by making repeated use of (4.100). However, the reader already familiar with system theory will have recognized the arithmetics of transfer functions in parallel, series and feedback. Remark 4.103 As already mentioned, B and C can be any subsets of T ranging from {ε, e} to T itself. Let B = {ε, e}. For a fixed x ∈ T , and for two pairs (B , C ) and (B, C ) such that B ⊆ B and C ⊆ C , a (B , C )-representation can also be considered as a (B, C )-representation. Conversely, every (B, C )-representation can yield a (B, B)representation thanks to the formula (which is again a consequence of (4.100) used repeatedly) ∗ Abε ε c A∗ b = ε ε e ε ε ε e . cεε ε However, we note that the corresponding inner dimension n increases when passing from the (B, C )- to the (B, B)-representation (which is also a (B , C )-representation). In fact, this discussion cannot be pursued satisfactorily until one is able to clarify the issue of ‘minimal representation’, that is, for a given pair (B, C ), and for a given x ∈ T , a representation yielding the minimal (canonical) value of n x . This problem is yet unsolved. Theorem 4.104 The subdioids T n×n (T )n×n is rationally closed. and (T )n×n are identical. Consequently, Proof The inclusion in one direction has been stated in Lemma 4.100. Therefore, we need only to prove the reverse inclusion. Let X ∈ (T )n×n and assume that n = 2 for the sake of simplicity and without loss of generality. Then X can be written as X= x1 x3 x2 x4 with entries x i ∈ T . Every x i has a (B, C )-representation consisting of a triple ( Axi , bxi , cxi ), with Axi ∈ T ni ×ni . Then X = = cx1 A∗1 bx1 x cx3 A∗3 bx3 x c x1 ε c x2 ε cx2 A∗2 bx2 x cx4 A∗4 bx4 x ε c x3 ε c x4 A x1 ε ε ε ε A x2 ε ε ε ε A x3 ε ∗ b x1 ε ε ε ε b x3 A x4 ε ε b x2 . ε b x4 The inner dimension is 4=1 n i , but it can be artificially augmented to the next multiple i of 2 (and more generally of n ) by adding enough rows and columns with entries equal to ε in the matrices. Then, since the outer dimension is 2 and the inner dimension 4.8. Rational Closure and Rational Representations 207 is now a multiple of 2, by appropriately partitioning these matrices in 2 × 2 blocks, one may consider this representation as a B2×2 , C 2×2 -representation. Application of Theorem 4.102 in the dioid D2×2 proves that X belongs to T 2×2 . 4.8.3 Yet Other Rational Representations So far, we have considered representations of elements of T by triples ( A, b , c), such that the entries of A are taken in T , whereas those of b and c are allowed to lie in subsets B and C of T which are arbitrary, up to the fact that they must contain B = {ε, e}. Recall that B and C need to be neither distinct nor disjoint. As an example to be encountered in Chapter 5, consider again T as the subset of polynomials of D which is the dioid of formal power series in one or several variables. Then B and C may be subsets of particular polynomials, or they may be reduced to B. 
Since formal variables are going to be interpreted as ‘shift’ or ‘delay’ operators in the system theory setting, it means that no ‘dynamics’ is allowed in b and c in the latter case, whereas ‘some’ dynamics is allowed in the former case. In Chapter 5, we are going to consider a two-dimensional domain description involving two shift operators γ and δ in the event, respectively the time, domain. To describe the connection between this two-dimensional description and the more classical one-dimensional description (either in the event or in the time domain), it is necessary to study other rational representations. They correspond to other choices for the subsets in which the entries of A, b , c assume their values. Let us introduce the following notation. For two subsets U and V of D, let k def U ⊗V = x ci bi , ci ∈ U , bi ∈ V ∃k ∈ N : x = . i =1 The notation V ⊗U is similarly defined. Notice that ε belongs to the subsets so defined. We now consider a ‘covering’ (U , V ) of T (that is, T = U ∪ V but U ∩ V does not need to be empty). We always assume that B ⊆ U when considering U . Theorem 4.105 The rational closure T coincides with the set of elements x which can be written as in (4.106), but with entries of Ax lying in U ⊗ V , those of bx in U ⊗ B and those of cx in C (we call this an observer representation). Alternatively, there exist other representations such that the entries of Ax are in V ⊗ U , those of bx are in B, and those of cx are in C ⊗ U (we call these controller representations). Proof Only the former statement will be proved. The latter can be proved similarly. We first prove that if x ∈ T , then x does have an observer representation. From Theorem 4.102, we know that x has a (B, C )-representation, say ( A, b , c). The matrix A can be written AV ⊕ AU in such a way that AV contains only entries which are elements of V , and AU only elements of U . If V ∩ U is nonempty, entries of A which lie in the intersection of those sets may be arbitrarily put either in AV or in AU , or even in both matrices thanks to Axiom 4.9. Therefore, we have x = c ( AV ⊕ AU )∗ b . 208 Synchronization and Linearity Consider (4.105) with a11 = ε, a12 = e, a21 = a and a22 = b . We obtain (a ⊕ b )∗ = = b∗ ⊕ b∗ a (b∗ a )∗ b∗ , (e ⊕ (b∗ a )+ )b∗ , hence the identity (a ⊕ b )∗ = (b∗ a )∗ b∗ . (4.107) If we use this with a = AV and b = AU , we obtain x = c A∗ AV U ∗ A∗ b , U which is an observer representation. Conversely, if x has an observer representation ( Ax , bx , cx ), then x ∈ T . As a matter of fact, it is easy to realize that the entries of Ax , bx , cx lie in subsets of T (in particular, remember that T = T ). The conclusion follows from Theorem 4.102. Remark 4.106 Another form of (4.107) is obtained by letting a11 = ε, a21 = e, a12 = a , a22 = b in (4.105), which yields (a ⊕ b )∗ = b∗ (ab∗ )∗ . (4.108) If we return to our example of D being the dioid of power series in two variables γ and δ (say, with exponents in N), we may for example assume that T = {ε, e, γ , δ }— the dioid closure of which is the dioid of polynomials in γ , δ —and we may choose B = C = {ε, e}, U = {ε, e, γ } and V = {δ }. A more explicit interpretation of this situation will be discussed in Chapter 5. 4.8.4 Rational Representations in Commutative Dioids We have defined ‘rational elements’ (i.e. elements of T ) as those elements which can be obtained by a finite number of operations such as sums, products and stars, starting from elements of T . 
This can also be viewed as the process of obtaining (least) solutions from equations like (4.77), which in turn serve as coefficients of further equations of the same type, this process being repeated a finite number of times, starting with coefficients in T . The results of the previous subsections showed that, indeed, all rational elements can also be obtained by solving equations with coefficients in T only once, but these should be matrix equations—or systems of equations—of arbitrary, albeit finite, dimensions. What we are going to discuss here is the possibility, in the context of commutative dioids (Definition 4.10), of limiting ourselves to linear combinations of solutions of scalar equations with coefficients in T , or otherwise stated, of solving only ‘decoupled’ systems of equations with coefficients in T . 4.8. Rational Closure and Rational Representations 209 Lemma 4.107 Let D be a complete commutative dioid, then ∀a , b ∈ D , (a ⊕ b )∗ = a ∗ b∗ . (4.109) Proof One way to prove this is by direct calculations, starting from the very definition of the left-hand side above, and reducing it to the right-hand side using commutativity. Alternatively, one may start from (4.107) (for scalars) and remark that (a ⊕ b )∗ = (b∗ a )∗ b∗ = (b∗ a ∗ )b∗ = a ∗ (b∗ )2 = a ∗b∗ when commutativity holds true. A third, maybe more involved, but interesting, argument is based on considering an equation like (4.54) with (x ) = ax ⊕ xb ⊕ c. With or without commutativity, the least solution ∗ (ε) is easily proved to be equal to a ∗ cb∗ . But, with commutativity, the same equation can be written x = (a ⊕ b )x ⊕ c, the least solution of which is (a ⊕ b )∗ c. Setting c = e, we obtain the identity (4.109). With this formula at hand, (4.100) can be given a new useful form, at least when a22 is a scalar (i.e. a 1 × 1 block). Lemma 4.108 In a commutative dioid, for a matrix a partitioned into four blocks as in (4.99), where a22 is 1 × 1, and a12 and a21 are respectively column and row vectors, then a ∗ is equal to ∗ ∗ a11 (e ⊕ a22 a12 a21 (a11 ⊕ a12 a21 )∗ ) ∗ a22 a21 (a11 ⊕ a12 a21 )∗ ∗ a22(a11 ⊕ a12 a21 )∗ a12 ∗ a22 (e ⊕ a21 (a11 ⊕ a12 a21 )∗ a12 ) . (4.110) ∗ Proof Since a22 and a21 a11a12 are scalars, using (4.109), one obtains ∗ ∗ ∗ (a21 a11 a12 ⊕ a22 )∗ = (a21 a11 a12 )∗ a22 . Moreover, from (4.105) with a22 = ε, we find that ∗ (a21 a11 a12 )∗ = e ⊕ a21 (a11 ⊕ a12 a21)∗ a12 . Therefore ∗ ∗ ∗ (a21 a11 a12 ⊕ a22 )∗ = a22 ⊕ a22 a21 (a11 ⊕ a12 a21)∗ a12 . These are the lower right-hand blocks of (4.100) and (4.110), respectively. Consider now the upper right-hand block of (4.100) which is equal (see (4.100)) to ∗ the lower right-hand block premultiplied by a11 a12 . Using (4.108), ∗ ∗ a11 a12 (a21 a11 a12 ⊕ u )∗ ∗∗ = a22 a11 a12 e ⊕ a21 (a11 ⊕ a12 a21)∗ a12 ∗∗ ∗ ∗ = a22 a11 a12 e ⊕ a21 a11(a12 a21 a11 )∗ a12 ∗∗ ∗ = a22 a11 e ⊕ (a12 a21 a11 )+ a12 ∗∗ ∗ = a22 a11 (a12 a21 a11)∗ a12 ∗ ∗ = a22 (a11 ⊕ a12 a21 )∗ a12 . Similar calculations yield the left-hand blocks of (4.110). 210 Synchronization and Linearity Theorem 4.109 Let a ∈ Dn×n where D is a complete commutative dioid. Then all entries of a ∗ are finite sums of the form i ci (bi )∗ , where each ci is a finite product of entries of a and each bi is a finite sum of weights of circuits of the precedence graph G (a ). Proof The proof is by induction. The statement is true for n = 1. Suppose that it also holds true up to dimension n − 1. Consider the partitioning (4.99) of A with a22 scalar. 
In the graph associated with a , matrix a12 a21 describes the weights of paths of length 2 which start from one of the first n − 1 nodes, then go to the n -th node, and finally come back to one of the first n − 1 nodes. The paths coming back to their initial nodes are circuits of length 2, among other circuits of the graph associated with a . Matrix a12 a21 can be considered as describing a graph with n − 1 nodes in which the previous paths or circuits of length 2 can be considered as arcs (i.e. paths of length 1) or loops. As for a11 , it describes the subgraph associated with the first n − 1 nodes. Matrix a11 ⊕ a12 a21 corresponds to a graph with the same n − 1 nodes but with weights calculated as upper bounds of the weights of the two previous graphs. The weights of paths of this graph are among the weights of paths of the graph of a . The induction assumption applies to (a11 ⊕ a12 a21 )∗ . The conclusion follows easily by considering the expressions of the four blocks in (4.110) and by remembering that products of stars of scalar elements can be converted to stars of sums of these elements using (4.109). Theorem 4.110 Let T be a subset of the complete commutative dioid D. Then, T coincides with the set of elements x which can be written as kx x= ci (bi )∗ , (4.111) i =1 where kx is an arbitrary finite integer and ci , bi ∈ T (the dioid closure of T ). This is a straightforward consequence of Theorems 4.60 and 4.109. 4.9 4.9.1 Notes Dioids and Related Structures Dioids, as defined and studied in this chapter, are members of a larger family of algebraic structures that stem from various fields of mathematics and from several works motivated by a wide range of applications. We shall not attempt to be exhaustive in describing the origins of these theories. The interested may refer e.g. to [66] where some references are given. In all these works, the set of axioms and the terminology are subject to some variations. The notion of ‘semiring’ has already been defined in Chapter 3. ‘Absorbing semiring’ is sometimes used when the first operation is supposed to be idempotent (Axiom 4.9), but ‘idempotent semiring’ would be a more appropriate denomination in this case. As already discussed, this axiom prevents the addition from being cancellative. This is why Gondran and Minoux reject the name ‘semiring’ which may suggest that the structure can be embedded into that of a ring. Hence they propose the appellation ‘dioid’ which they attribute to Kuntzmann [80]. In French (or Latin), ‘di’ is a prefix for ‘two’ as ‘mono’ is a prefix for ‘one’. A ‘dioid’ is thus ‘twice a monoid’. 4.9. Notes 211 As discussed in §4.3.2, Axiom 4.9 is closely related to the introduction of a partial order relation and to a semilattice structure. However, weaker axioms may serve the same purpose. The following axiom is proposed in [66]: {a = b ⊕ c and b = a ⊕ d } ⇒ a = b , (4.112) and this axiom is sufficient for stating Theorem 4.28. We retained Axiom 4.9 because all dioids of interest to us naturally satisfy it. An example of a dioid satisfying (4.112) but not Axiom 4.9 is (R+ , +, ×). However, this example corresponds to a cancellative addition and it is natural to embed this structure in (R, +, ×), that is, in the conventional algebra. Helbig [69], who himself refers to Zimmermann [130], defines an ‘extremal algebra’ with axioms which are very close to but stronger than ours on two points: • the multiplication is commutative; • Axiom 4.9 is replaced by the stronger one: x ⊕ y = either x or y . 
As stated by Lemma 4.30, the latter axiom corresponds to a total order. Cuninghame-Green [49] studies structures that we called Rmax and Rmin under the name of ‘minimax algebra’. The term ‘path algebra’ may also be found, owing to the relevance of these particular dioids in graph theory. Reference [34] is about ‘incline algebra’ which is a structure close to our dioid algebra, but with the following additional axiom: ∀a , b , a ⊕ ab = a , (4.113) which says that ab ≤ a . This suggests that the multiplication is close to the lower bound (although these two operations may be different), and that every element is less than e (the identity element—although the existence of an identity element is not required a priori). Indeed, Proposition 1.1.1 of [34] states that an incline algebra is exactly a distributive lattice (that is, multiplication and lower bounds are the same) if a 2 = a (that is, the multiplication itself is idempotent). The dioid of Example 4.15 is an incline algebra. The structure ([0, 1], max, ×) is an example of an incline algebra for which multiplication and lower bound do not coincide. Observe that Axiom (4.113) prevents the corresponding dioid from being Archimedian, unless it is isomorphic to the Boole algebra (Example 4.16). Finally, since an idempotent addition can indirectly be introduced through the introduction of a semilattice or a lattice structure, in the literature on ordered sets, owing to the properties of the second operation (multiplication), the name ‘lattice-ordered semigroup’ is frequently encountered. 4.9.2 Related Results Results of §4.3 and §4.4, which are not very specific to dioid theory, are largely based on the corresponding quoted references, with a few variations with respect to terminology (these variations have been indicated) and to presentation. The main topic of §4.5 is about solving implicit equations like x = ax ⊕ b for example. Unlike [67] or Chapter 3 of this book, we only considered the case of complete dioids (in which a ∗ always exists), which makes the problem of the existence of a solution easier, but at the price of losing uniqueness in general (for example, in an Archimedian dioid, is a trivial solution of x = ax ⊕ b ). Theorem 4.76 is an original result, first published in [44] with a slightly different proof. In this same reference, a discussion of the form of the general solution of the homogeneous equation (4.79) can be found. 212 Synchronization and Linearity The problem of invertibility of matrices (§4.6.2) has been considered by several authors, first for Boolean matrices ([128], [120]), then for more general dioids ([23], [34]). Formula (4.81) appears in [49] in the special case of Rmax . As for the condition of exact invertibility (see Lemma 4.84 which appears here for the first time), it is similar to that obtained in the above mentioned references, but under quite different assumptions: like [34], reference [23] is more or less in the context of an incline algebra—or at least of an algebra in which every element lies between ε and e —whereas our result deals with Archimedian dioids. Finally, the rational theory of §4.8, which appeared first in [44], is largely inspired by the use of it we are going to make in Chapter 5 in a system theoretic context. Part III Deterministic System Theory 213 Chapter 5 Two-Dimensional Domain Description of Event Graphs 5.1 Introduction In Chapter 2 a class of Petri nets called event graphs has been discussed. This class pictorially describes discrete event systems. 
The dynamics of such systems is essentially driven by synchronization phenomena. In §2.5, it was shown that linear equations can be obtained for event graphs by appealing to some descriptive variables and to some associated dioid algebras. To be precise, we will call an ‘event’ any occurrence which is instantaneous, such as the beginning of a transition firing, the end of a transition firing (these two events are simultaneous if transition firings are themselves immediate), the arrival of a token at, or the departure of a token from, a place, etc. In fact, we distinguish ‘events’, which are unique since they occur only once, from ‘types of events’ which refer to families of events of the same nature. For example, ‘a message pops up on the screen of my computer’ is a type of event, whereas ‘a message pops up on the screen of my computer at five o’clock’ is a particular event of this type. In the context of event graphs, a type of event will very often correspond to the successive firings of a particular transition (we assume that firings have a zero duration). In the ‘dater’ description, one essentially deals with variables d (k ) associated with types of events such that, for a given type: • k is an index in Z which numbers successive events of this type (from an initial, possibly negative, value onwards); • d (k ) is the epoch (or ‘date’) at which the event numbered k takes place. The mapping k → d (k ) is called the dater associated with the type of event. Because of the meaning of the index k , one may call this an ‘event-domain description’. For this description, the appropriate underlying dioid is Rmax in continuous time or Zmax in discrete time. Using the γ -transform (which is analogous to the z -transform of conventional system theory—see Chapter 1), daters can be represented by formal power series with exponents in Z and with coefficients in Rmax or Zmax . In conventional system theory, a ‘time-domain’ description is rather used. For event graphs, this description involves variables c(t ) such that: 215 216 Synchronization and Linearity • t has the usual meaning of time (either in a continuous or in a discrete domain); • c(t ) is the number1 of the last event of the considered type which happens before or at time t . In fact, there is a discrepancy between the definitions of daters and counters. To each k , at least from a certain k0 (the initial value of the numbering process) to a certain k1 which can be infinite, corresponds a unique d (k ) which is well defined. On the contrary, for any t , it may be that no event takes place at t , a single event happens at t , or several events occur simultaneously at t . Consequently, the definition of c(t ) adopted above is just one among several possible definitions. A purpose of this chapter is to discuss two ‘canonical’ definitions and their relationship with daters. In any case, the mapping t → c(t ), defined over the whole time domain, will be called a counter. The appropriate dioid algebra of counters turns out to be Zmin (see e.g. Example 2.48). In order to enhance the symmetry between counter and dater descriptions, from now on in this chapter, time will be discrete. Then, the δ -transform of c(·) is classically defined as the formal power series t ∈Z c(t )δ t with coefficients in Zmin . In view of what happens in conventional system theory, this dual possibility of describing event graphs by models written down either in the event domain or in the time domain is not usual. 
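To make the two descriptions concrete before discussing their duality further, here is a minimal plain-Python sketch (not taken from the book; the numerical dater values are an arbitrary illustration) which recovers the counter t → c(t) from a dater k → d(k), with numbering starting at k = 0.

    NEG_INF = float("-inf")   # plays the role of "no event has occurred yet"

    def counter_from_dater(d, t):
        """Number of the last event with date <= t (the definition of c(t) above)."""
        ks = [k for k, date in d.items() if date <= t]
        return max(ks) if ks else NEG_INF

    # Hypothetical dater trajectory: events 0,...,4 occur at these nondecreasing dates.
    d = {0: 1, 1: 3, 2: 3, 3: 6, 4: 8}

    print([counter_from_dater(d, t) for t in range(9)])
    # -> [-inf, 0, 0, 2, 2, 2, 3, 3, 4]

The two simultaneous events at date 3 (numbers 1 and 2) illustrate the discrepancy just mentioned: c(3) jumps from 0 to 2, and taking the largest such number is only one of the possible conventions.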
This arises because of the fact that trajectories exhibit a monotonic behavior, due to the numbering of events in the order they take place. Roughly speaking, the mappings k → d (k ) and t → c(t ) are inverses of each other. Indeed, to give to this statement a precise meaning, it will be necessary to appeal to residuation theory (see §4.4). Anyway, this inversion is a nonlinear operation. Nevertheless, the dater and counter descriptions are both ‘linear’, but of course not in the same dioid. We will discuss the fact that neither description has a definite superiority over the other one. Then, we will study another description, namely in a two-dimensional domain which is the cartesian product of the event and time domains. In this new domain, a description involving formal power series in (γ , δ) will be proposed. Unlike Zmax and Zmin , the corresponding dioid is no longer totally ordered, and it is not the straightforward product of these two dioids. Section 5.6 addresses the issue of obtaining equations for ‘dual’ systems. We assume that desired outputs of an event graph are given and we wish to find the ‘best possible’ inputs which meet this target, that is, to compute the latest input dates which cause output dates to be less than or equal to the given target. This problem of ‘inverting a system’ is solved via residuation and the equations so obtained are reminiscent of adjoint- or co-state equations in conventional optimal control. Section 5.7 discusses the equivalence of three notions related to transfer functions, namely rationality, periodicity and realizability. Finally, §5.8 studies the response of rational systems to some periodic inputs which are shown to be eigenfunctions of rational transfer functions (in the same way as sine functions are eigenfunctions in conventional system theory). The notions of phase shift, amplification gain and Black plots can then be demonstrated for timed event graphs. 1 In French, ‘num´ ro’ rather than ‘nombre’, the former being a numerical label assigned to each event. e 5.2. A Comparison Between Counter and Dater Descriptions 5.2 217 A Comparison Between Counter and Dater Descriptions We consider the simple example of Figure 5.1 and we compare the equations obtained u x1 x2 y Figure 5.1: An event graph for daters and counters. Bars in places indicate the holding times of these places (in time units). Each transition receives a name (indicated in the figure) and this name is also that of the descriptive variable attached to this transition, be it a dater or a counter. The name of the argument, either k or t , will indicate whether we are dealing with a dater or with a counter description. It should also be remembered that the symbol ‘⊕’ has a different meaning in each context: it stands for the max operation when used in conjunction with daters, and for the min operation in conjunction with counters. According to §5.1, we consider that, e.g., x (t ) is the number of the last firing of transition x occurring before or at time t . The numbering of firings starts with 1, say, for all transitions. For the event graph of Figure 5.1, the following equations are then obtained (we do not discuss the issue of initial conditions at this moment—see §5.4.4.1, page 241). Dater equations: x 1 (k ) = 1 x 1 (k − 2) ⊕ 1 x 2 (k − 1) ⊕ 1u (k ) ; x 2 (k ) = 1 x 1 (k − 1) ⊕ 2u (k ) ; y (k ) = x 1 (k ) ⊕ x 2 (k ) . (5.1) Counter equations: x 1 (t ) = 2 x 1 (t − 1) ⊕ 1 x 2 (t − 1) ⊕ u (t − 1) ; x 2 (t ) = 1 x 1 (t − 1) ⊕ u (t − 2) ; y (t ) = x 1 (t ) ⊕ x 2 (t ) . 
(5.2) 218 Synchronization and Linearity Using the former representation, we derive y (k ) = = = = x 1 (k ) ⊕ x 2 (k ) 1(x 1 (k − 1) ⊕ x 1 (k − 2)) ⊕ 1 x 2 (k − 1) ⊕ (1 ⊕ 2)u (k ) 1x 1 (k − 1) ⊕ 1 x 2 (k − 1) ⊕ 2u (k ) 1 y (k − 1) ⊕ 2u (k ) . Thus a first order input-output relation has been obtained. It should be noticed that we have used two different rules in our simplifications. On the one hand, 2 ⊕ 1 = 2 because we are working with the dioid Zmax . On the other hand, we have used that x 1 (k − 1) ⊕ x 1 (k − 2) = x 1 (k − 1) because we are interested only in trajectories of x 1 which are nondecreasing functions of k . Remark 5.1 The nondecreasingness is not an intrinsic property of solutions of (5.1). For example, if u (k ) = ε for k < 0 and u (k ) = e( = 0) for k ≥ 0 (such inputs will be interpreted as ‘impulses’ in §5.4.4.1), then one can check that ∀k ∈ Z , x 1 (k ) x 2 (k ) = k+1 k+3 if k even; k+3 k+1 if k odd, is a nonmonotonic solution to (5.1). In terms of γ -transforms, the preceding simplification rules can be summarized as follows: t γ ⊕ τ γ = max (t , τ )γ ; t γ ⊕ t γ m = t γ min( ,m ) . (5.3) In terms of event graphs, this corresponds to the graph reductions displayed in Figure 5.2. Figure 5.2: Two rules for graph reduction Remark 5.2 Since we left apart the issue of initial conditions, one should be aware of the fact that the reduction shown on the right-hand side of Figure 5.2 is only valid for certain initial conditions (in particular, it holds true for canonical initial conditions discussed at §5.4.4.1, page 241). 5.2. A Comparison Between Counter and Dater Descriptions 219 Now, using the counter representation, we derive y (t ) = = = = x 1 (t ) ⊕ x 2 (t ) (2 ⊕ 1)x 1 (t − 1) ⊕ 1 x 2 (t − 1) ⊕ u (t − 1) ⊕ u (t − 2) 1(x 1 (t − 1) ⊕ x 2 (t − 1)) ⊕ u (t − 2) 1 y (t − 1) ⊕ u (t − 2) . We have used that 1 ⊕ 2 = 1 in Zmin , and that u (t − 1) ⊕ u (t − 2) = u (t − 2) because u is a nondecreasing function of t . In terms of δ -transforms, these rules can be summarized by k δ τ ⊕ δ τ = min(k , )δ τ ; k δ τ ⊕ k δ θ = k δ max(τ,θ ) . (5.4) These rules are similar to those of (5.3) but the roles of the exponents and coefficients are, roughly speaking, interchanged. In terms of event graphs, the rules (5.4) also express the graph reductions of Figure 5.2 (in reverse order). The above example also shows that in both approaches we reach a kind of ‘ARMA’ (Auto-Regressive-Moving-Average) equation which, in this specific case, involves the same delay in the AR part in both representations, but different delays in the MA part. Consequently, we would need state vectors of different dimensions in both cases to convert this ARMA equation into standard state space equations (with only unit delays on the right-hand side). Otherwise stated, the same physical system appears to be of a different order in the dater and in the counter descriptions. These discrepancies and dissymmetries are not very satisfactory and we could further accumulate remarks in the same vein. Let us just mention another intriguing fact. Figure 5.3 represents an event graph before and after the firing of the transition named x 1 or ξ1 . The following equations are obtained for the dater description before and after firing. Before firing x 1 (k ) = 1 x 1 (k − 1) ⊕ x 2 (k − 1) , x 2 (k ) = x 1 (k ) ⊕ u (k ) , y (k ) = x 2 (k ) , After firing ξ1 (k ) = 1ξ1 (k − 1) ⊕ ξ2 (k ) , ξ2 (k ) = ξ1 (k − 1) ⊕ u (k ) , y (k ) = ξ2 (k ) . 
Some substitutions yield the following equivalent descriptions: Before firing x 1 (k ) = x 2 (k ) y (k ) = ε e After firing ξ1 (k ) = ξ2 (k ) y (k ) = ε 1 e e e e x 1 (k − 1) x 2 (k − 1) x 1 (k ) x 2 (k ) 1 1 , ε ε ξ1 (k − 1) ξ2 (k − 1) ξ1 (k ) ξ2 (k ) . ⊕ ε e u (k ) , ⊕ e e u (k ) , 220 Synchronization and Linearity These are two state space realizations of the same γ -transfer function (which can be proved to be equal to e ⊕ γ (1γ )∗ provided that all possible simplification rules be used). In matrix notation, we have u x1 u ξ2 ξ1 x2 before firing after firing y y Figure 5.3: Firing a transition x (k ) = Ax (k − 1) ⊕ Bu (k ) , ξ(k ) = Aξ(k − 1) ⊕ Bu (k ) , y (k ) = Cx (k ) , y (k ) = C ξ(k ) . But one cannot nd a linear coordinate transformation to pass from one realization to the other. As a matter of fact, this would require that an invertible 2 × 2 matrix T exists such that x = T ξ , implying for example that B =TB , i.e. ε e = T11 T21 T12 T22 e e . The first row of this matrix relation implies that T11 ⊕ T12 = ε, hence T11 = T12 = ε, which is not compatible with the fact that T is invertible. Indeed, from the physical interpretation of this situation (remember that an internal transition fired once), or directly from the equations, it is apparent that the true relationship between ξ and x is ξ2 (k ) = x 2 (k ); ξ1 (k ) = x 1 (k + 1). However, this cannot be captured by a (static) linear change of basis in the state space. Because, in the counter description, coefficients and delays are, roughly speaking, exchanged, this issue of finding a linear change of basis in the state space can be solved positively when moving to the counter description for the same example. In this description, entries of matrices correspond to numbers of tokens in the initial marking. Firing an internal transition removes one token from each upstream place: this subtracts 1—in conventional algebra—from each entry of the row of matrix A corresponding to that transition, and it does the same on the corresponding row of B . Similarly, the same transition firing adds one token to each downstream place: algebraically, this adds 1 to each entry of the corresponding column of A and C . These operations can be realized in Zmin by pre-, respectively post-multiplication by appropriate matrices which are inverse of each other. For the above example, the pre-multiplication involves the matrix: −1 ε ε e . 5.3. Daters and their Embedding in Nonmonotonic Functions 221 We let the reader works out this example completely in the counter description and check this claim. Remark 5.3 From the above example, one should not conclude any superiority of the counter over the dater description. When it is possible, consider removing one bar from all places upstream of a given internal transition, and adding one bar to all downstream places: this leaves the input-output relation unchanged, and this is indeed the dual situation of firing a transition (which moves tokens instead of bars). Therefore, playing with bars instead of tokens will correspond to a change of basis in the dater description, but not in the counter description. 
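As a numerical check of the claim that the two realizations above have the same input-output behavior, the following plain-Python sketch (not from the book) simulates both dater recursions in Zmax under the assumptions of an impulse input (u(k) = 0 for k ≥ 0, ε = −∞ otherwise), ε initial state before k = 0, and earliest firing.

    EPS = float("-inf")                 # epsilon of Zmax
    def oplus(*a): return max(a)        # addition in Zmax (max)
    def otimes(a, b): return a + b      # multiplication in Zmax (+); -inf absorbs

    def u(k): return 0 if k >= 0 else EPS   # impulse input

    K = 8
    # Before firing: x1(k) = 1 x1(k-1) + x2(k-1), x2(k) = x1(k) + u(k), y(k) = x2(k)
    x1, x2, y_before = EPS, EPS, []
    for k in range(K):
        x1 = oplus(otimes(1, x1), x2)   # uses x1(k-1) and x2(k-1)
        x2 = oplus(x1, u(k))            # uses the freshly computed x1(k)
        y_before.append(x2)

    # After firing: xi1(k) = 1 xi1(k-1) + xi2(k), xi2(k) = xi1(k-1) + u(k), y(k) = xi2(k)
    xi1, xi2, y_after = EPS, EPS, []
    for k in range(K):
        xi2 = oplus(xi1, u(k))          # uses xi1(k-1)
        xi1 = oplus(otimes(1, xi1), xi2)
        y_after.append(xi2)

    print(y_before)                 # [0, 0, 1, 2, 3, 4, 5, 6]
    print(y_before == y_after)      # True: identical impulse responses

The printed sequence 0, 0, 1, 2, ... is, under these assumptions, the coefficient sequence of e ⊕ γ(1γ)*, the common γ-transfer function mentioned above.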
In the next sections we move smoothly to a two-dimensional description in which, roughly speaking, monomials such as t γ k in γ -transforms and k δ t in δ -transforms will be represented by monomials of the form γ k δ t ; the basic objects will be power series in (γ , δ) with Boolean coefficients; and, in addition to the conventional sum and product of series, we will be allowed to use the rules γ k δ t ⊕ γ δ t = γ min(k, )δ t , γ k δ t ⊕ γ k δ τ = γ k δ max(t ,τ ) , (5.5) which are just the synthesis of (5.3) and (5.4). However, this requires an algebraic foundation which appeals to isomorphisms, quotients of dioids by congruences and residuation theory. As an introduction to these algebraic manipulations, the next section discusses how the set of daters can be embedded into a more general set of nonmonotonic functions from Z to Zmax . 5.3 5.3.1 Daters and their Embedding in Nonmonotonic Functions A Dioid of Nondecreasing Mappings Recall that with each type of event is associated a numbering mechanism which assigns a number to each event of this type in the order of these events taking place, starting from an initial finite (but possibly negative) value k0 ∈ Z. We also consider a special type of event which corresponds to ticks of an absolute clock. These ticks are also numbered in increasing order, starting from an initial value t0 ∈ Z (the origin of time). At each event of a given type, the current clock value is instantaneously read and the pair (event number, clock value) is saved. The dater associated with a type of event is just the mapping from Z into Z the graph of which is the set of all such pairs. Obviously, daters are nondecreasing functions, but they may not be strictly increasing since several events of the same type may occur simultaneously. We use d as a generic notation for a dater. Strictly speaking, the function d is defined over an interval of Z possibly extending to infinity to the right (if events of the same type occur infinitely often), and, wherever it is defined, d assumes finite, but a priori unbounded values in Z. Indeed, in order to extend the definition of d to the whole domain Z, it is 222 Synchronization and Linearity def convenient to assume that the range set is Z = Z ∪ {−∞} ∪ {+∞}. The convention is that if k < k0 (the initial value of the numbering); −∞ +∞ if the k -th event of the considered type never d (k ) = took place; any finite value otherwise. From a mathematical point of view, it may sometimes be useful to see daters as mappings from a complete dioid into a complete dioid. For this reason, we may extend the domain of d by setting d (−∞) = −∞ and d (+∞) = +∞ . (5.6) Obviously, these end-point conditions are always compatible with the nondecreasingness property of d . As already discussed, the natural algebra for the range space of d is Zmax , that is, Z, max, + . It should be remembered that, in Zmax , (−∞) + (+∞) = ε ⊗ = ε = −∞ (5.7) according to Axiom 4.7. As for the domain space of d , the algebraic structure we need consists of the conventional order relation of Z (this is necessary in order to speak of the nondecreasingness property of d ), and the conventional addition (which will be needed for defining the product of daters). At this stage, it is immaterial to decide whether the domain will be called Zmin or Zmax . 
Indeed, if we adopt the former option, the only consequence is that we should speak of ‘nonincreasing’, rather than ‘nondecreasing’ functions d with regard to the order relations implied by the dioid structures in the domain and in the range. There is however a more important criterion to decide which name is to be given to the domain of daters. In this dioid, do we wish that +∞ − ∞ = +∞ ⊗ (−∞) = −∞ or + ∞? This question involves +, i.e. ⊗, rather than ⊕ which is related to the order relation. We leave the answer open until Remark 5.4 below. The next stage is to endow the set of daters with a dioid structure which already appeared to be appropriate for our purpose. Namely, • addition is just the conventional pointwise maximum, or otherwise stated ∀k ∈ Z , (d1 ⊕ d2 )(k ) = d1 (k ) ⊕ d2 (k ) , in which the symbol ‘⊕’ on the left-hand side denotes the addition of daters, whereas it denotes addition in the range dioid Zmax on the right-hand side; this definition is extended to infinite sums without difficulty since the range is a complete dioid; • multiplication is the conventional ‘sup-convolution’, that is, for all k ∈ Z, (d1 ⊗ d2 )(k ) = (d1 ( ) ⊗ d2 (k − )) = sup (d1 ( ) + d2 (k − )) . ∈Z ∈Z 5.3. Daters and their Embedding in Nonmonotonic Functions 223 Remark 5.4 The above formula can be written (d1 ⊗ d2 )(k ) = sup d1 (−∞) + d2 (k + ∞), d1 (+∞) + d2 (k − ∞), sup (d1 ( ) + d2 (k − )) . ∈Z Using (5.6) and (5.7), it can be proved by inspection that • for finite k , (d1 ⊗ d2 )(k ) = sup (d1 ( ) + d2 (k − )) , ∈Z that is, the result is the same whether we consider that the domain is Z or Z; • for k = −∞, we obtain (d1 ⊗ d2 )(−∞) = −∞, whatever we decide upon the value to be given to +∞ − ∞ in the domain of (event domain); • for k = +∞, one has that (d1 ⊗ d2 )(+∞) = sup −∞, +∞ + d2 (+∞ − ∞), sup (d1 ( ) + ∞) . ∈Z For the class of functions satisfying (5.6) to be closed by multiplication (it is obviously closed by addition), we want to ensure that (d1 ⊗ d2 )(+∞) = +∞, even if d1 ( ) = −∞, ∀ < +∞. Then, we must decide that +∞ − ∞ = +∞ in the event domain. (5.8) In conclusion, • we should consider that the event domain is Zmin rather than Zmax (however, we will keep on speaking of ‘nondecreasing’ functions); • we also observed that one may first consider that addition and multiplication operate on functions from Z (instead of Z) into Zmax , and then complete the results of these operations by the end-point conditions (5.6). We summarize this subsection with the following definition. Definition 5.5 (Daters) Daters are nondecreasing mappings from Zmin into Zmax obeying the end-point conditions (5.6) (‘nondecreasing’ refers to the conventional order of Z in both the domain and the range). The set of daters is endowed with the pointwise maximum of functions as the addition, and with the sup-convolution as the multiplication. One can check that the zero and identity elements of the dioid of daters are respectively: −∞ if k < 0 ; −∞ if k < +∞ ; ε(k ) = ´ e (k ) = 0 ´ (5.9) if 0 ≤ k < +∞ ; +∞ otherwise; +∞ otherwise. 224 Synchronization and Linearity 5.3.2 γ -Transforms of Daters and Representation by Power Series in γ 5.3.2.1 Power Series in γ and the Nondecreasingness Property A convenient way to manipulate daters is to encode them using their γ -transforms. This yields formal power series with coefficients in Zmax . As for exponents, owing to the last observation in Remark 5.4, we may restrict them to belong to Z. For a dater d , D will denote its γ -transform and we have D= d (k )γ k . 
k ∈Z As is usual, if some monomial γ k is missing in the explicit expression of some D , this just means that the corresponding coefficient is ‘zero’, that is, it is equal to ε. If the set of γ -transforms of daters is endowed with the addition and multiplication introduced in Chapter 4 (see (4.85)), then daters and their γ -transforms constitute two, isomorphic dioids. The latter will be denoted D[[γ ]]. In D[[γ ]], the zero element can be denoted simply ε because, owing to (5.9), it is the zero series with all coefficients equal to ε = −∞. As for the identity element, it is the γ -transform of e given in (5.9), ´ and this is γ ∗ = γ 0 ⊕ γ ⊕ γ 2 ⊕ · · · . Remark 5.6 Observe that the interpretation of γ is that of the ‘backward shift operator in numbering’ (or ‘in the event domain’) since the series γ D corresponds to the γ transform of the dater k → d (k − 1). The expression ‘backward shift’ is traditional in system theory as is the name ‘forward shift’ for the operator z (see [72]). However, this appellation is somewhat misleading since it should be realized that, if we plot the graphs of k → d (k ) and k → d (k − 1), then the latter is shifted to the right with respect to the former. Note that γ itself, viewed as a formal power series which has all its coefficients equal to ε except that of γ 1 which is equal to e, may be considered as the γ -transform of the function def k → γ (k ) = e ε if k = 1 ; otherwise. (5.10) Shifting a dater may be considered as achieving its sup-convolution with γ ; with γ transforms, this operation amounts to ‘multiplying by γ ’. Of course, the function γ itself is not a dater since it is not monotonic, hence γ ∈ D[[γ ]]. Therefore, to give a meaning to this ‘multiplication by γ ’, we must embed elements of D[[γ ]] into a larger set, namely the set of (general) formal power series with coefficients in Zmax and exponents in Z. According to §4.7.1, once endowed with the same operations as D[[γ ]] (see (4.85)), this set is a complete commutative distributive Archimedian dioid denoted Zmax [[γ ]]. 5.3. Daters and their Embedding in Nonmonotonic Functions 225 The zero element ε of Zmax [[γ ]] is again the zero series (all coefficients equal to ε), but the identity element e of Zmax [[γ ]] is the series which has only one coefficient different from ε, namely that of γ 0 which is equal to e = 0 (in Zmax ). It is realized that this e = γ 0 of Zmax [[γ ]] is not formally equal to γ ∗ which is the identity element in the dioid D[[γ ]]. Hence D[[γ ]] is not a subdioid of Zmax [[γ ]]. Actually, this situation is pretty much related to that considered in Theorem 4.78. To show this, let us first observe that the property of a function f : Z → Zmax to be nondecreasing can be characterized by ∀k ∈ Z , f (k ) ≥ f (k − 1) . In terms of the γ -transform F ∈ Zmax [[γ ]], this translates into f nondecreasing ⇔ F ≥ γ F . (5.11) This should be compared with (4.80) which provides other characterizations of nondecreasing functions. Remark 5.7 If we let k range in Z instead of Z, without imposing the end-point conditions (5.6), then (5.11) is no longer a characterization of nondecreasing functions. For instance, consider the function f such that f (−∞) = 2 and f (k ) = 1 for k > −∞: it satisfies f (k ) ≥ f (k − 1), and thus also (5.11), although it is not nondecreasing over Z. If (5.11) cannot be retained as a characterization of nondecreasing functions, then it is not clear how to solve in a simple way the best approximation problems addressed below. 
It is thus realized that, as a subset of elements of Zmax [[γ ]] meeting condition (5.11), D[[γ ]] is nothing but what we have denoted γ ∗ Zmax [[γ ]] in §4.5.4. The following theorem is just a rephrasing of Theorem 4.78 in the present context. Theorem 5.8 Let I denote the canonical injection from D[[γ ]] into Zmax [[γ ]], and consider some F ∈ Zmax [[γ ]]. 1. The greatest element F in I (D[[γ ]]) which is less than or equal to F is given by ◦ F = γ ∗ \ F = F ∧ γ −1 F ∧ γ −2 F ∧ · · · . (5.12) In the equivalence class of elements F of Zmax [[γ ]] which have the same ‘best approximation from below’ F , this F is the unique element which belongs to I (D[[γ ]]) and it is also the minimum representative in the equivalence class. 2. The least element F in I (D[[γ ]]) which is greater than or equal to F is given by F = γ ∗F . (5.13) In the equivalence class of elements F of Zmax [[γ ]] which have the same ‘best approximation from above’ F , this F is the unique element which belongs to I (D[[γ ]]) and it is also the maximum representative in the equivalence class. 226 Synchronization and Linearity Corollary 5.9 The greatest dater f which is less than or equal to a given (not necessarily monotonic) mapping f from the event domain into Zmax is obtained by the formula ∀k ∈ Z , f ( ) = inf f ( ) . f (k ) = ≥k ≥k (5.14) The least dater f which is greater than or equal to f is obtained by ∀k ∈ Z , f ( ) = sup f ( ) . f (k ) = ≤k ≤k (5.15) Of course, these formulæ should be completed by the end-point conditions (5.6). Proof The formulæ (5.14) and (5.15) are straightforward consequences of (5.12) and (5.13). The mapping I which associates with F ∈ Zmax [[γ ]] its best approximation from below in D[[γ ]] is u.s.c., but it is neither a ⊕- nor a ⊗-morphism. On the contrary, the mapping I which selects the best approximation from above is a l.s.c. surjective dioid homomorphism. This is why in what follows we concentrate on this type of approximation. Remark 5.10 Because of (5.13) and of Lemma 4.77, statement 2, it should be clear that D[[γ ]] is a multiplicative ideal and that ∀ F , G ∈ Zmax [[γ ]] , F⊗G = F ⊗G = F⊗G = F⊗G . Figure 5.4 explains how to geometrically construct the graph of the mapping k → Figure 5.4: Featuring the construction of γ ∗ F f (k ) associated with the power series F for a given F (represented by the graph of k → f (k )): to each point of this discrete graph is attached a ‘horizontal half line’ extending to the right (corresponding to the multiplication by γ ∗) and then, the graph 5.3. Daters and their Embedding in Nonmonotonic Functions 227 of f is obtained as the ‘upper hull’ of this set of half lines. Of course, when we speak of ‘lines’, only the trace of those lines over Z2 is significant. The practical consequence of the preceding results is that an element of D[[γ ]] can be viewed as a particular representative (indeed the maximum one) of an equivalence class of elements of Zmax [[γ ]] for the equivalence relation F ≡ G ⇔ γ ∗ F = γ ∗G . (5.16) Calculations in D[[γ ]] can be performed using the general rules of power series with any representative of an equivalence class (i.e. not necessarily a series with nondecreasing coefficients); however, the second simplification rule (5.3) is now available (the first one is of course also valid since it arises from the fact that the coefficients lie in Zmax ). The symbol ‘=’ in the second rule must be understood as the fact that both sides are in the same equivalence class. 
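The two approximations of Corollary 5.9 are easy to compute for a function with finite support. The following plain-Python sketch (not from the book) does so on an arbitrary nonmonotonic illustration, taking f(ℓ) = −∞ to the left of the support and +∞ to the right of it so that both formulæ stay finite on the support.

    EPS, TOP = float("-inf"), float("inf")

    def best_above(f, k):
        """(5.15): least nondecreasing majorant, sup of f(l) over l <= k."""
        return max([v for l, v in f.items() if l <= k], default=EPS)

    def best_below(f, k):
        """(5.14): greatest nondecreasing minorant, inf of f(l) over l >= k."""
        return min([v for l, v in f.items() if l >= k], default=TOP)

    f = {0: 1, 1: 4, 2: 3, 3: 6, 4: 5}        # an arbitrary nonmonotonic function

    print([best_above(f, k) for k in range(5)])   # [1, 4, 4, 6, 6]
    print([best_below(f, k) for k in range(5)])   # [1, 3, 3, 5, 5]

Both outputs are nondecreasing; the first is the maximum representative discussed above (multiplication by γ*), the second the approximation from below.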
Because of (5.16), there is no meaning in speaking of the degree of an element of D[[γ ]], because an element is an equivalence class, and two representatives of the same class may have different degrees. For example, e = γ 0 (which is of degree zero in Zmax [[γ ]]) and γ ∗ (which is of infinite degree in Zmax [[γ ]]) are the same element in D[[γ ]]. The situation is better for the valuation which is invariant in an equivalence class. This is stated by the next lemma which also exhibits another invariant of equivalence classes. Lemma 5.11 Consider F = k∈Z f (k )γ k and G = and G are in the same equivalence class, then k ∈Z g (k )γ k in Zmax [[γ ]]. If F 1. val( F ) = val(G ), i.e. inf{k | f (k ) = ε} = inf{k | g (k ) = ε}; 2. k ∈Z f (k ) = k ∈Z g (k ), i.e. supk∈Z f (k ) = supk∈Z g (k ). Proof 1. We have γ ∗ F = γ ∗ G . But val(γ ∗ F ) = val(γ ∗ ) ⊗ val( F ) from (4.94) (equality holds true since Zmax is entire), but val(γ ∗ ) = e and hence val( F ) = val(γ ∗ F ) = val(γ ∗ G ) = val(G ). 2. Since the two formal power series in (5.16) must be equal, the corresponding values in Zmax , obtained by substituting a numerical value in Zmax for γ , must also be the same. Therefore, set γ = e and the result follows. 5.3.2.2 Minimum Representative In general, there is no minimum representative of an equivalence class because I is not a ∧-morphism (check that I ( F ∧ G ) < I ( F ) ∧ I (G ) with F = e ⊕ 1γ and G = 1 ⊕ γ ). However, it turns out that some equivalence classes do have a minimum representative. We address this question now. Let F define an equivalence class. If there were to exist a minimum member, say F , of that equivalence class, then we would have that γ ∗F = γ ∗F = F ⊕ γ γ ∗F = F ⊕ γ γ ∗F , 228 Synchronization and Linearity where the following identities have been used: γ∗ = e ⊕γ+ and γ+ = γγ∗ . (5.17) Hence, if such an F were to exist, it should be the smallest one satisfying γ ∗ F = F ⊕ γ + F , and therefore it should be equal to γ ∗ F − γ + F . ◦ Theorem 5.12 (Minimum representative) Let F = γ ∗ F − γ + F. Then, one also has that ◦ def f (k )γ k ∈ Zmax [[γ ]] and F = F = F − γ +F ◦ (5.18) (equality in Zmax [[γ ]]). Moreover, this F depends only on the equivalence class of F of which it is a minorant. Finally, the following three statements are equivalent: 1. F belongs to the equivalence class of F of which it is the minimum representative; 2. val( F ) = val( F ); 3. limk→−∞ f (k ) = ε. Before giving a proof, let us consider Figure 5.5 which illustrates how F is obtained in practice using a geometric construction of γ ∗ F − γ + F : the black graph represents ◦ Points belonging to the graph of the minimal representative Figure 5.5: Featuring the construction of γ ∗ F − γ + F ◦ γ ∗ F (a nondecreasing mapping from Z to Z); the grey graph represents γ + F and is obtained from the previous one by a unit shift along the x -axis; finally, only the coefficients corresponding to points where the black graph differs from the grey graph are nonzero coefficients of F . Proof of Theorem 5.12 First, we have that F = γ ∗ F − γ + F = (F ⊕ γ + F ) − γ + F = F − γ + F ◦ ◦ ◦ according to Formula (f.17) of Table 4.2. The former expression shows that F depends only on the equivalence class of F (since γ + F = γ γ ∗ F and γ ∗ F characterizes an equivalence class). The latter expression shows that F ≤ F , hence F is a minorant of the equivalence class of F since this inequality can be obtained for any F in this subset. 5.3. 
Daters and their Embedding in Nonmonotonic Functions 229 1 ⇒ 2 If F belongs to the equivalence class of F , then Lemma 5.11 shows that val( F ) = val( F ). 2 ⇒ 3 Suppose that val( F ) = val( F ). We also assume that F = ε since otherwise F = ε and the theorem is trivial. Then, either val( F ) > −∞ —in this case the equality with val( F ) need not be assumed since it can indeed be proved, see Remark 5.13 below—or it is equal to −∞. In the former case, clearly f (k ) = ε for all k < val( F ) and statement 3 is trivially true. In the latter case, we are going to prove that statement 3 also holds true. Indeed, since val( F ) = −∞, for all k0 , there exists k0 ≤ k0 such that f (k0 ) > ε. Since (5.18) says that f (k0 ) = f (k0 ) − sup ≤k0 −1 f ( ), it is necessary that sup ≤k0 −1 f ( ) ≤ f (k0 ) − 1. Let ◦ k1 = k0 − 1 < k0 . By repeating the same argument, we can construct a strictly decreasing subsequence {ki } such that sup ≤ki f ( ) ≤ f (ki + 1) − 1. This clearly shows that limki →−∞ sup ≤ki f ( ) = ε, and since the mapping k → sup ≤k f (l ) is nondecreasing, then limk→−∞ sup ≤k f ( ) = ε. This property is equivalent to statement 3. def def 3 ⇒ 1 Statement 1 is equivalent to the fact that A = B with A = γ ∗ F and B = γ ∗ F , which is also equivalent to the fact that B − A = ε because F ≤ F and thus ◦ A ≤ B . From (5.17), we have that A = γ A ⊕ F and B = γ B ⊕ F . With the help of Formula (f.16) of Table 4.2, we have that B = γ B ⊕ ( F − γ B ) = γ B ⊕ F . ◦ Moreover, B− A ◦ = = = = ≤ ≤ (γ B ⊕ F ) − (γ A ⊕ F ) ◦ (γ B − (γ A ⊕ F )) ⊕ ( F − (γ A ⊕ F )) ◦ ◦ using (f.14), γ B − (γ A ⊕ F ) ◦ since F − (γ A ⊕ F ) = ε, ◦ (γ B − γ A) − F ◦ ◦ using (f.18), γ ( B − A) − F ◦ ◦ using (f.20), γ ( B − A) ◦ (obvious). def It follows that X = B − A satisfies X ≤ γ X which means that x (·) = b (·) − a (·) ◦ ◦ is nonincreasing. In addition, for all k , x (k ) ≤ b (k ) = sup f ( ) and ≤k lim b (k ) = ε k →−∞ from the assumption that statement 3 holds true. Therefore, x being nonincreasing and tending to ε at −∞ is always equal to ε. Remark 5.13 If −∞ < val( F ) < +∞, then val(γ + F ) = val(γ + ) ⊗ val( F ) = 1 + val( F ) > val( F ). From (5.18), we have that F ≤ γ + F ⊕ F , hence, with (4.92), val( F ) ≥ min(val(γ + F ), val( F )). But val(γ + F ) > val( F ), hence val( F ) ≥ val( F ). On the other hand, since F ≤ F , val( F ) ≥ val( F ) and finally val( F ) = val( F ). If val( F ) = +∞, then F = F = ε. Therefore, statement 2 of the theorem can be replaced by the statement {val( F ) = −∞ ⇒ val( F ) = −∞}. 230 5.4 Synchronization and Linearity Moving to the Two-Dimensional Description In (5.3), the simplification rule on exponents is dual to the one which applies to coefficients. Therefore, it seems more natural to preserve the symmetry between exponents and coefficients. This is realized by a new coding of daters using two shift operators instead of one, the so-called two-dimensional domain description. Then a nice interpretation of this new coding will be given in terms of ‘information’ about events. 5.4.1 The Zmax Algebra through Another Shift Operator In Examples 4.17 and 4.18 we observed that a dioid of vectors or scalars can be made isomorphic to a dioid of some subsets of this vector or scalar set in which ∪ plays the role of addition and + (‘vector sum’) that of multiplication. For our present purpose, we consider Zmax on the one hand, and (L, ∪, +) on the other hand, where L is the subset of 2Z consisting of ‘half lines of Z’ extending to the left, and including ∅ and Z itself. 
More precisely, we consider the mapping {s ∈ Z | s ≤ t } if t ∈ Z ; Z : Z → 2 , t −→ ∅ (5.19) if t = ε = −∞ ; Z if t = = +∞ . Hence Z = L and is a dioid isomorphism between the two complete dioids Zmax and (L, ∪, +) (in the latter, ε = ∅ and e = (−∞, 0]). We now consider the set of power series in one variable δ , with ‘Boolean coefficients’ (denoted ε and e) belonging to the dioid of Example 4.16, and with exponents in Z, this set of series being endowed with the conventional sum and product of series; this dioid is denoted B[[δ ]]. With any subset S of Z, we associate a power series via the mapping S = {t }t ∈ JS → δt . (5.20) t ∈ JS This expression should be interpreted as a series in which only coefficients equal to e are explicitly mentioned, the missing monomials having a coefficient equal to ε. Clearly, S ∪ S is represented by the series obtained by summing up the series related to S and S . The empty subset is represented by the zero series (all coefficients equal to ε) and the subset Z is represented by the series having all coefficients equal to e. Also, if def S ⊗ S = S + S = t + t | t ∈ JS , t ∈ JS , then the product of the series associated with S and S is the series associated with S ⊗ S . The identity element consists of the subset {0}, and is represented by the series δ 0 also denoted e. The mapping (5.20) is an isomorphism between the two complete dioids 2Z , ∪, + and B[[δ ]]. The subset L of 2Z is mapped to some subset of B[[δ ]] which we are going to characterize. Note first that δ is the series representing the subset {1}, and ‘multiplying 5.4. Moving to the Two-Dimensional Description 231 by δ ’ amounts to shifting a subset to the right by one (later on, δ will be called a ‘backward shift operator in timing’ or ‘in the time domain’ for reasons akin to those put forward in Remark 5.6). Then, a half line L ∈ L is a subset characterized by the fact that it is included in its own image obtained by translation to the right: in terms of associated series, and keeping the same letter to denote the half line and its coding series, this means that L ≤ δ L or L ≥ δ −1 L . (5.21) Given any subset S , we may look for the smallest half line L larger than (i.e. containing) S : in the algebraic setting, this amounts to solving the algebraic problem of the ‘best approximation from above’ of a series S by a series L satisfying (5.21). By direct application of Theorem 4.78, the solution of this problem is obtained by using the formula L = (δ −1 )∗ S . The dioid L[[δ ]] of series representing half lines L is isomorphic to the quotient of B[[δ ]] by the congruence ∀ S , S ∈ B[[δ ]] , S ≡ S ⇔ (δ −1 )∗ S = (δ −1 )∗ S , (5.22) and it is also isomorphic to the multiplicative ideal (δ −1 )∗ B[[δ ]]. Calculations in L[[δ ]], which amount to manipulations of half lines—and hence also of numbers in Zmax according to the mapping (5.19)—can be done with any representative of an equivalence class in B[[δ ]], provided that the following simplification rule be remembered (which should remind us of the second rule in (5.4)): δ t ⊕ δ τ = δ max(t ,τ ) . This indeed expresses the equivalence of both sides of the equation. Remark 5.14 The composition of (5.19) and (5.20) (direct correspondence from Zmax to (δ −1 )∗ B[[δ ]]) is given by δ t (δ −1 )∗ t → ε (zero series) −1 ∗ ∗ (δ ) δ = (δ −1 ⊕ δ)∗ = (δ −1 )∗ ⊕ δ ∗ if t ∈ Z ; if t = ε = −∞ ; if t = = +∞ . 
(5.23) In the first two cases there exist minimum representatives in the corresponding equivalence classes of B[[δ ]] which are respectively δ t and ε (the latter class contains only this element), but there is no minimum representative in the class of (the last case). If we attempt to allow infinite exponents for power series in B[[δ ]] in order to say that δ +∞ is a minimum representative of , then expression (5.22) of the congruence is no longer valid since δ + and δ ∗ , which should both represent , do not appear to be ∗ ∗ algebraically equivalent through (5.22), that is, δ +∞ δ −1 = δ ∗ δ −1 . The reason is that a subset of Z which is a left half line can no longer be characterized by the fact that this subset is included in its image by a right unit shift: this fails for the subset {+∞}. This observation is similar to that of Remark 5.7. 232 5.4.2 Synchronization and Linearity The Max[[γ , δ ]] Algebra in We start from the set of formal power series in two variables (γ , δ) with Boolean coefficients and with exponents in Z, this set being endowed with the conventional addition and multiplication of series: this dioid is called B[[γ , δ ]]. In two stages, that is, by two successive quotients by equivalence relations, we reach an algebraic structure, called ax Min [[γ , δ ]] (pronounced ‘min max γ δ ’), which is isomorphic to D[[γ ]] (the dioid of γ -transforms of nondecreasing functions from Z to Zmax ). At the first stage, we reach a dioid which is isomorphic to Zmax [[γ ]] (γ -transforms of general functions). We also show that the two steps can be combined into a single one. For each stage, we give algebraic and geometric points of view. 5.4.2.1 From Sets of Points in the Plane to Hypographs of Functions The dioid B[[γ , δ ]] is complete, commutative, distributive and Archimedian. It is iso2 morphic to the dioid 2Z , ∪, + via the one-to-one correspondence: F ∈ 2Z , F = {(k , t )}(k,t )∈ JF −→ 2 γ k δ t ∈ B[[γ , δ ]] . (5.24) (k , t )∈ J F The lower bound operation ∧ in B[[γ , δ ]] corresponds to the intersection ∩ in 2Z . Instead of subsets of points in Z2 , we can manipulate their indicator functions over 2 Z which assume the value e at a point belonging to the corresponding subset and the value ε elsewhere. This set of Boolean functions is a complete dioid once endowed with the pointwise maximum as the addition and the two-dimensional max-convolution as the multiplication. Then, elements of B[[γ , δ ]] appear as (γ , δ)-transforms of these functions in an obvious sense. From an algebraic point of view, B[[γ , δ ]] is also isomorphic to B[[δ ]][[γ ]] which is the dioid of power series in one variable γ with coefficients in B[[δ ]]. The equivalence relation (5.22) can be extended to elements of B[[γ , δ ]] by using the same definition (note that (δ −1 )∗ is another notation for (δ −1 )∗ γ 0 in B[[γ , δ ]]). The quotient of B[[γ , δ ]] by this equivalence relation, denoted (δ −1 )∗ B[[γ , δ ]] because it is isomorphic to this multiplicative ideal, is also isomorphic to (δ −1 )∗ B[[δ ]] [[γ ]] which is the dioid of power series in γ with coefficients in (δ −1 )∗ B[[δ ]]. Since this one is isomorphic to Zmax by the correspondence (5.23), we are back to the dioid Zmax [[γ ]]. We summarize these considerations with the following lemma. 2 Lemma 5.15 The dioids (δ −1 )∗ B[[γ , δ ]] and Zmax [[γ ]] are isomorphic. 
Geometrically, if one starts from a collection of points in Z2 (coded by an element of B[[γ , δ ]] as indicated by (5.24)), the quotient by (5.22) corresponds to ‘hanging a vertical half line’ (extending downwards) at each point as shown in Figure 5.6. This operation is the counterpart of the isomorphism described in §5.4.1 which associates a half line extending to the left with each number of Zmax (but now Zmax is disposed vertically along the y -axis). All the subsets of Z2 yielding the same collection of points under this transformation are equivalent. We obtain the geometric representation of an element of (δ −1 )∗ B[[δ ]] [[γ ]] (in fact, a maximum representative of an equivalence 5.4. Moving to the Two-Dimensional Description 233 Figure 5.6: Hypograph class). Since (δ −1 )∗ B[[δ ]] [[γ ]] is isomorphic to Zmax [[γ ]], this geometric figure is in turn in one-to-one correspondence with the γ -transform of some function k → f (k ) of which it is the ‘hypograph’ (see Definition 3.37). This function is determined by f : k → f (k ) = sup t . (5.25) (k , t )∈ J F Conversely, given a function f : Z → Zmax , it follows from (5.23) that one representative of the corresponding element F ∈ (δ −1 )∗ B[[γ , δ ]] is obtained by F= γ k δ f (k ) ⊕ {k |−∞< f (k )<+∞} 5.4.2.2 γ k δ ∗ . (5.26) {k | f (k )=+∞} From Hypographs of General Functions to Hypographs of Nondecreasing Functions The next step is to restrict ourselves to nondecreasing functions. This amounts to making the quotient of (δ −1 )∗ B[[γ , δ ]]—isomorphic to Zmax [[γ ]]—by the equivalence relation (5.16): the result is isomorphic to the multiplicative ideal γ ∗ (δ −1 )∗ B[[γ , δ ]], and it will be denoted Max[[γ , δ ]]. This dioid is isomorphic to D[[γ ]]. in ax def Lemma 5.16 The dioids D[[γ ]] (dioid of γ -transforms of daters) and Min [[γ , δ ]] = γ ∗ (δ −1 )∗ B[[γ , δ ]] are isomorphic. Geometrically, this new quotient amounts to attaching a horizontal right half line to each point of the hypograph of a general function (as we did in Figure 5.4) to obtain the hypograph of a nondecreasing function d which is determined by d : k → d (k ) = sup t . (5.27) ( ,t ) ∈ J F ≤k This formula is derived from (5.25) and (5.15). Conversely, given a nondecreasing ax function d , one representative in Min [[γ , δ ]] of this dater is obtained by (5.26) with d replacing f , that is, F= {k |−∞<d (k )<+∞} γ k δ d (k ) ⊕ {k |d (k )=+∞} γ k δ∗ . (5.28) 234 Synchronization and Linearity 5.4.2.3 ax Directly from B[[γ , δ ]] to Min [[γ , δ ]] It is realized that the quotients associated with (δ −1 )∗ and with γ ∗ done sequentially can be condensed into a single one using the new equivalence relation in B[[γ , δ ]]: A ≡ B ⇔ γ ∗ (δ −1 )∗ A = γ ∗ (δ −1 )∗ B . ∀ A, B ∈ B[[γ , δ ]] , (5.29) Because of Formula (4.109), Min [[γ , δ ]] is also equal to (γ ⊕ δ −1 )∗ B[[γ , δ ]]. Geometrically, starting from a collection of points in Z2 (in one-to-one correspondence with an element of B[[γ , δ ]]), one first attaches vertical half lines down from each point, and then horizontal right half lines to all the points so obtained: this amounts to fixing a cone extending in south-east directions with vertical and horizontal borders to each original point. Note that the cone with its vertex at the origin is coded by γ ∗ (δ −1 )∗ in B[[γ , δ ]]; it corresponds to the identity element in the quotient dioid. 
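The geometric construction just described is easy to emulate. The plain-Python sketch below (not from the book, with an arbitrary pixel set) reads a finite collection of exponent pairs (k, t) coding an element of B[[γ, δ]] as the nondecreasing function (5.27) obtained by attaching a south-east cone to each point.

    EPS = float("-inf")

    def dater_from_pixels(pixels, k):
        """Formula (5.27): d(k) = sup of t over the pixels (l, t) with l <= k."""
        return max([t for (l, t) in pixels if l <= k], default=EPS)

    # Hypothetical element of B[[gamma, delta]]:
    # gamma^0 delta^2 + gamma^1 delta^1 + gamma^3 delta^5
    F = {(0, 2), (1, 1), (3, 5)}

    print([dater_from_pixels(F, k) for k in range(5)])
    # -> [2, 2, 2, 5, 5]: gamma^1 delta^1 lies in the south-east cone of
    #    gamma^0 delta^2 and leaves no trace in the dater.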
ax Notation 5.17 We introduce the following notation: ∀(k , t ), ( , τ ) ∈ Z2 , {( , τ ) (k , t ) or (k , t ) {( , τ ) ≺ (k , t ) or (k , t ) ( , τ )} ⇔ { ≥ k and τ ≤ t } , ( , τ )} ⇔ {( , τ ) (k , t ) and ( , τ ) = (k , t )} . Geometrically, the point ( , τ ) lies in a south, east or south-east direction with ax respect to (k , t ). The elements of Min [[γ , δ ]] could have been obtained by raising the geometric problem of finding the smallest set of points in Z2 containing a given set of points and closed by translations to the right and downwards (that is, a set containing its own images by these translations). The corresponding algebraic formulation in B[[γ , δ ]] is the following: for a given A ∈ B[[γ , δ ]], find the ‘best approximation from above’ by a B satisfying B ≥γB , B ≥ δ −1 B or equivalently B ≥ (γ ⊕ δ −1) B . By application of Theorem 4.78, this B is equal to γ ∗ (δ −1 )∗ A and it is the maximum representative of the equivalence class of A. The problem of minimum representatives is addressed later on. ax Note that there is another path to obtain Min [[γ , δ ]] from B[[γ , δ ]]: it consists in ∗ making the quotient by γ first , followed by the quotient by (δ −1 )∗ . This procedure may be interpreted in terms of functions t → g (t ), which amounts to inverting the role of the x - and y -axes. This is what we will do when considering counter descriptions in §5.5. Finally, we obtain the commutative diagram of Figure 5.7. ax The practical rule for manipulating elements of Min [[γ , δ ]] is to use any representative in each equivalence class and the usual rules of addition and multiplication of power series in two variables plus the rules (5.5) which should be understood as equivax alences with respect to the congruence (5.29). The symbol Min [[γ , δ ]] is supposed to suggest the rules (5.5) which involve min and max. These rules can be summarized by the following one: ( , τ) (k , t ) ⇒ γ k δ t ⊕ γ δ τ = γ k δ t . (5.30) 5.4. Moving to the Two-Dimensional Description 235 B[[γ , δ ]] (δ −1 )∗ (δ −1 )∗ B[[γ , δ ]] γ∗ (γ ⊕ δ −1 )∗ γ∗ γ ∗ B[[γ , δ ]] (δ −1 )∗ Min [[γ , δ ]] ax Figure 5.7: Commutative diagram The comments preceding Lemma 5.11 can be repeated here with some adaptation. Because of (5.29), it is clearly meaningless to speak of the degree in γ , or of the ax valuation in δ , of an element of Min [[γ , δ ]]: two members of the same equivalence class may have different such characteristics. The next lemma is a rephrasing of Lemma 5.11 ax in the context of Min [[γ , δ ]]. Lemma 5.18 Consider F and G in B[[γ , δ ]]. If F and G represent the same element of Max[[γ , δ ]], then the valuations of F and G in γ are equal, and so are the degrees in of F and G in δ . Proof Essentially, the proof uses the same argument as at point 1 of the proof of Lemma 5.11. We start from the equality of γ ∗ (δ −1 )∗ F and γ ∗ (δ −1 )∗ G (equality in B[[γ , δ ]]). Then we apply (4.91), respectively (4.94), to the degree in δ , respectively the valuation in γ , of those series. These are equalities since {ε, e} is an entire dioid. We finally observe that deg((δ −1 )∗ ) = val(γ ∗ ) = e. ax Definition 5.19 For any element F of Min [[γ , δ ]], its valuation (still denoted val( F )), respectively its degree (still denoted deg ( F )), is the valuation in γ , respectively the degree in δ , of any representative (in B[[γ , δ ]]) of F. Such an F is a polynomial if it is equal to ε, or if its valuation and its degree are both finite. 
It is a monomial if it is a ax polynomial and, when it is not equal to ε, if it is equal (in Min [[γ , δ ]]) to γ val( F ) δ deg( F ) . Of course, we cannot claim here that val( F ) ≤ deg ( F ) as is the case for conventional nonzero polynomials or power series. However, it is straightforward to see that the relevant properties of Lemma 4.93 are still valid (with equality). Also, for a given polynomial F , any monomial which is greater than or equal to F must have a valuation not larger than val( F ) and a degree not smaller than deg( F ). Therefore, the smallest such monomial is γ val( F ) δ deg( F ) . 5.4.2.4 Minimum Representative Let us return to the problem of the minimum representative in each equivalence class. 236 Synchronization and Linearity ax Theorem 5.20 (Minimum representative in Min [[γ , δ ]]) Let F = f (k , t )γ k δ t ∈ def ◦ B[[γ , δ ]] ( f (k , t ) ∈ B) and F = (γ ⊕ δ −1 )∗ F − (γ ⊕ δ −1 )+ F. Then, one has that F = F − (γ ⊕ δ −1 )+ F ◦ (5.31) (equality in B[[γ , δ ]]). Moreover, F depends only on the equivalence class of F of which it is a minorant (the equivalence relation is (5.29)). Finally, the following three statements are equivalent: 1. F belongs to the equivalence class of F of which it is the minimum representative; 2. val( F ) = val( F ) and deg ( F ) = deg( F ); 3. the following two conditions are satisfied: ∀t ∈ Z , ∀k ∈ Z , ∃k ∈ Z : ∀( , τ ) ∃t ∈ Z : ∀( , τ ) (k , t ) , (k , t ) , f ( , τ) = ε , f ( , τ) = ε . (5.32) (5.33) Figure 5.8 illustrates the geometric construction of the minimum representative: the set Points belonging to the minimal representative Figure 5.8: Featuring the construction of (γ ⊕ δ −1 )∗ F − (γ ⊕ δ −1)+ F ◦ of points of γ ∗ F (in white) is shifted downwards (shift by δ −1 which yields the light grey set) and to the right (shift by γ which yields the dark grey set), and only the points of the white set which are not ‘covered’ by points of at least one of the grey sets are kept for the minimum representative. Proof of Theorem 5.20 The fact that F only depends on the equivalence class of F and that it is a minorant of this equivalence class is proved in the same way as in Theorem 5.12, γ ⊕ δ −1 now replacing γ . 1 ⇒ 2 This is an immediate consequence of Lemma 5.18. 5.4. Moving to the Two-Dimensional Description 237 2 ⇒ 3 We consider the equality of valuations and show that it implies (5.32). A similar proof, not given here, can be made for the equality of degrees implying (5.33). The case when F = ε is trivial. Moreover, when val( F ) is finite, it can be proved, as shown in Remark 5.13, that this implies that val( F ) is equal to it, and (5.32) is again obvious. Finally, suppose that val( F ) = val( F ) = −∞ and F = ε. Pick some k0 ; there must exist some k0 ≤ k0 and some t0 such that f (k0 , t0 ) = e, which implies that f (k0 , t0) = e and, for all ( , τ ) (k0 , t0 ), f ( , τ ) = ε, since otherwise f (k0 , t0 ) would be equal to ε by (5.31). Set (k1 , t1 ) = (k0 − 1, t0 − 1). We can repeat the same argument and find some (k1 , t1 ) such that k1 ≤ k1 , f (k1 , t1 ) = e and, for all ( , τ ) (k1 , t1 ), f ( , τ ) = ε. Necessarily, t1 ≤ t0 − 1 since otherwise, we would have found a (k1 , t1 ) (k0 , t0 ) such that f (k1 , t1 ) = e, which is a contradiction. Hence we can construct a sequence (ki , ti ) with the mentioned property and such that ki +1 < ki and ti +1 < ti . Hence (ki , ti ) → (−∞, −∞) as i → +∞. Let any t be given. Pick the next (ki , ti ) such that ti ≤ t . Set k = ki − 1. This k fulfills the condition expressed by (5.32). 
def def def 3 ⇒ 1 Let X = B − A with A = (γ ⊕ δ −1 )∗ F and B = (γ ⊕ δ −1 )∗ F . To prove ◦ that F belongs to the equivalence class determined by F , we need to prove that A = B . Since we know that A ≤ B , it suffices to prove that X = ε. In a way similar to that of the proof of Theorem 5.12, it can be proved that X ≤ (γ ⊕ δ −1) X . (5.34) Suppose that there exists (k0 , t0) such that x (k0 , t0 ) = e. Then, because of (5.34), either x (k , t + 1) or x (k − 1, t ) is also equal to e. Call this new point (k1 , t1 ), where x assumes the value e. This argument can be repeated at the point (k1 , t1 ), providing the next point (k2 , t2 ) (k1 , t1 ) at which x is equal to e, etc. Therefore, we can construct an infinite sequence of points such that (ki +1 , ti +1 ) (ki , ti ) and at each point x (ki , ti ) = e, which also implies that b (ki , ti ) = e, since x = b − a . One of the following three possibilities must then occur: ◦ • the sequence {ti } is bounded from above, hence it stays at some value t for i large enough; then ki must go to −∞ when i increases; consequently, there exist arbitrarily small values of k such that b (k , t ) = e; on the other hand, according to (5.32), there exists a k such that f ( , τ ) = ε for all def ( , τ ) (k , t ) which also implies that b ( , t ) = ( ,τ ) ( ,t) f ( , τ ) = ε; this yields a contradiction; • the sequence {ki } is bounded from below, hence it stays at some value k for i large enough; then ti must go to +∞ when i increases; consequently, there exist arbitrarily large values of t such that b (k , t ) = e; on the other hand, according to (5.33), there exists a t such that f ( , τ ) = ε for all def ( , τ ) (k , t ) which also implies that b (k , τ ) = ( ,τ ) (k ,τ ) f ( , τ ) = ε; this yields a contradiction; • the sequences {ki } and {ti } are both unbounded and converge to −∞ and +∞, respectively; this again yields a contradiction with both (5.32) and (5.33). 238 Synchronization and Linearity Finally, x cannot assume the value e anywhere. Remark 5.21 With the aid of (5.27), it is seen that the condition (5.32) is equivalent to the fact that limk→−∞ d (k ) = ε which is statement 2 of Theorem 5.12. But now there is an extra condition, namely (5.33), which is equivalent to saying that d (k ) remains finite for all finite k . The reason for this extra condition is that, in Zmax [[γ ]], the point +∞ does belong to the y -axis, whereas in B[[γ , δ ]] it does not. About this issue, the reader should refer to Remark 5.14. 5.4.3 Algebra of Information about Events We are going to provide an interpretation of the algebraic manipulation of power series of B[[γ , δ ]] using the additional rule (5.30) in terms of the manipulation of information about events. Consequently, the relation order introduced earlier will be interpreted as the domination of pieces of information over one another. Given a power series F (see (5.24)), we may view each pair (k , t ) ∈ J F as the coordinates of a ‘pixel’ in Z2 which is ‘on’ (whereas pairs of exponents of (γ , δ) corresponding to zero coefficients in F represent pixels which are ‘off’). Each such pixel which is ‘on’ gives a piece of information about the associated dater d evaluated by (5.27): it says that ∀ ≥ k , d ( ) ≥ t , or, in words, ‘the event numbered k and the subsequent ones take place at the earliest at time t ’. Geometrically, the graph of d cannot cross the region of Z2 delineated by the south-east cone {(ell , τ ) | ( , τ ) (k , t − 1)}. 
Now, given two pixels (k1 , t1) and (k2 , t2 ), the forbidden region is of course the union of the two corresponding cones. Obviously, if (k1 , t1 ) (k2 , t2 ), the piece of information associated with the latter pixel is at least as informative as the two pieces of information together (for one cone is included in the other one) and hence the latter piece of information only may be kept. Indeed, we are just rephrasing the rule (5.30). In ax summary, power series in Min [[γ , δ ]] can be interpreted as representations of collections of pieces of information about events, and summing up two power series consists in gathering all the pieces of information brought by the two series about the same type of event. At any stage, the reduction of the representation using the rule (5.30) amounts to canceling the pieces of information which are redundant. The relation order associated with this idempotent addition expresses the domination of collections of information over one another. The particular element ε, which corresponds to the power series with zero coefficients, has all its pixels ‘off’ and therefore it brings no information at all. It is the neutral element for the addition of information. ax To complete our interpretation of the manipulations in Min [[γ , δ ]], we discuss the product operation in the next subsection in which we return to event graphs. 5.4.4 Max[[γ , δ ]] Equations for Event Graphs in 5.4.4.1 Transfer Function Let us refer back to Figure 5.1. With each transition is associated a power series in ax Min [[γ , δ ]] (with the same name as the transition itself) which encodes the information available about the corresponding dater trajectory. For the sake of simplicity, in the 5.4. Moving to the Two-Dimensional Description 239 same way as we have assumed that there is a global clock delivering the ticks numbered t for all the transitions, we assume that there is a common initial value of the numbering mechanisms at all transitions (assigning numbers k at successive transition firings). Then, each arc between two transitions, indeed the place on this arc, transmits information from upstream to downstream, but this information is ‘shifted’ by the number of ‘bars’ in terms of timing and by the number of ‘dots’ in terms of numbering. Algebraically, this shift is obtained by multiplication of the corresponding series by the appropriate monomial. For example, since the place between u and x 1 has one bar, and since e.g. u denotes the information available about the transition with the same name, the arc u → x 1 carries the information δ u . Hence x 1 ≥ δ u , that is, the information available at x 1 is at least δ u . In the same way, x 1 ≥ γ 2 δ x 1 and x 1 ≥ γ δ x 2. The transition x 1 gathers the information brought by all incoming arcs. Finally, x1 ≥ γ 2δ x1 ⊕ γ δ x2 ⊕ δ u . In the same way, we can obtain inequalities for x 2 and y . In matrix form (remember ax that all elements belong to Min [[γ , δ ]]), we obtain ≥ x1 x2 y γ 2δ γδ ≥ e γδ ε e x1 x2 x1 x2 ⊕ δ δ2 u, , of the general form x ≥ Ax ⊕ Bu , y ≥ Cx . (5.35) These inequalities should be compared with the equations obtained in §5.2. Remark 5.22 Without our assumption of a common initial value of all the numbering mechanisms, corrections should have been made as for the exponent of γ in the shift operator associated with each place in order to account for the difference in numbering initial values between the upstream and downstream transitions of this place. 
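A hedged sketch of how one may compute with these objects: the plain-Python code below (not from the book; the helper names are ours) stores a polynomial element of Min-ax[[γ, δ]] as a finite set of exponent pairs, reduced so that no monomial lies in the south-east cone of another, builds the matrices of (5.35) for the event graph of Figure 5.1, and prints partial sums of C(e ⊕ A ⊕ ··· ⊕ A^n)B. Their terms stabilize one by one towards the transfer function derived in the next paragraphs.

    def reduce_monos(m):
        """Keep only monomials (k,t) not absorbed by another (l,u) with l<=k, u>=t."""
        return {(k, t) for (k, t) in m
                if not any(l <= k and u >= t and (l, u) != (k, t) for (l, u) in m)}

    def oplus(a, b):                    # addition: union of monomials, then reduce
        return reduce_monos(a | b)

    def otimes(a, b):                   # multiplication: add exponents pairwise
        return reduce_monos({(ka + kb, ta + tb) for (ka, ta) in a for (kb, tb) in b})

    EPSILON = set()                     # zero series
    E = {(0, 0)}                        # identity gamma^0 delta^0
    gd = lambda k, t: {(k, t)}          # monomial gamma^k delta^t

    def mat_mul(A, B):
        return [[reduce_monos(set().union(*(otimes(A[i][k], B[k][j])
                                            for k in range(len(B)))))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def mat_add(A, B):
        return [[oplus(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

    A = [[gd(2, 1), gd(1, 1)],          # [ gamma^2 delta   gamma delta ]
         [gd(1, 1), EPSILON]]           # [ gamma delta     epsilon     ]
    B = [[gd(0, 1)],                    # [ delta   ]
         [gd(0, 2)]]                    # [ delta^2 ]
    C = [[E, E]]                        # [ e  e ]

    I2 = [[E, EPSILON], [EPSILON, E]]
    term, total = I2, I2
    for n in range(4):
        print(n, sorted(mat_mul(mat_mul(C, total), B)[0][0]))
        term = mat_mul(term, A)
        total = mat_add(total, term)
    # 0 [(0, 2)]
    # 1 [(0, 2), (1, 3)]
    # 2 [(0, 2), (1, 3), (2, 4)]
    # 3 [(0, 2), (1, 3), (2, 4), (3, 5)]

Only partial sums of the star are formed here: the full series has infinitely many monomials in its minimum representative, and handling it exactly would require a symbolic treatment of the periodic part (γδ)*.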
We make the following assumption which is different from that made in §2.5.2.3, but which will find a justification later on in this section. The initial global clock value is t = 0 by convention, and the numbering mechanism at each transition assigns the value k = 0 to the first transition firing occurring at or after time 0. This convention does not mean that tokens cannot be brought from the outside world before time 0: it suffices to include these tokens in the initial marking of the place connecting the input transition with other internal transitions. Remark 5.23 (Interpretation of ε inputs) Because ε is the bottom element, an ε input is the least constraining input possible: it is less constraining than any input γ n δ −t for 240 Synchronization and Linearity arbitrarily large n and t . Therefore, one may view ε inputs as those which correspond to bringing an infinity of tokens at time −∞. So far, only the inequalities (5.35) have been obtained. Without further information, the behavior of the system is not completely specified and nothing more can be said. In particular, lag times of tokens of the initial marking (see Definition 2.49) have not been stipulated. For the time being, we give a mathematical answer to this lack of information by selecting a ‘canonical’ solution to the system of inequalities. Later on, a more concrete interpretation of this uniquely defined solution in terms of arrival times of tokens of the initial marking in places will be given. From Theorem 4.75, we know that the least solution of (5.35) is given by x = A∗ Bu , y = Cx = C A∗ Bu , (5.36) and that it satisfies the equalities in (5.35). This solution corresponds to the earliest possible occurrences of all events. For the time being, let us assume that the least solution is indeed the one of interest. We will return to this issue later on. As an exercise, the reader may try to evaluate the expression C A∗ B for the considered example. This can be done either by Gaussian elimination (hint: first express x 2 with respect to x 1 and u , then solve a fixed-point equation in x 1 ), or, equivalently, by using the formulæ (4.100), or preferably (4.110), and (4.109), plus the simplification rules (5.5). Finally, one obtains y = δ 2 (γ δ)∗ u . (5.37) Under the assumption that the earliest possible behavior is the one which will occur, we reach the conclusion that the input-output relation of the event graph of Figure 5.1 is given by (5.37): in general C A∗ B will be called the transfer function of the system (for single-input-single-output (SISO) systems). Note that C A∗ B encodes the output dater caused by the input dater e. One possible representation of e is δ 0 (γ 0 ⊕ γ 1 ⊕ γ 2 ⊕ · · · ). Due to the convention adopted earlier regarding the initial time and the initial numbering value of events, the input e may be interpreted as the action of firing the transition u an infinite number of times at time 0 (or putting an infinite amount of tokens at time 0 at the inlet of this transition)2. This is the analogue of an impulse in conventional system theory, and therefore the transfer function may be viewed as the coding of the impulse response. We now return to the issue of selecting the ‘earliest possible solution’ of (5.35). This solution corresponds to the least constraining conditions. 
It does not only mean that transitions are fired immediately after being enabled, but also that ‘the best initial condition’ must be selected: this concerns the time at which tokens of the initial marking are available; these tokens must not determine the firing epochs of transitions they contribute to enable, whatever the input u is, and whatever the holding times are. For the relations (5.36) to be valid for all u , irrespective of the initial marking and holding times, we thus assume the following condition. 2 With the convention of §2.5.2.3 regarding the numbering of events, the same sequence of events would be coded by δ0 (γ 1 ⊕ γ 2 ⊕ · · · ) = γ . 5.4. Moving to the Two-Dimensional Description 241 Tokens of the initial marking are available at time −∞. This convention corresponds to always choosing lag times equal to −∞. These lag times may fail to fulfill item 2 of Definition 2.50 of weakly compatible lag times. If other lag times are desired, there is a way to introduce them without changing the above convention. This is the topic of the following discussion. 5.4.4.2 Introduction of More General Lag Times Consider a place p of an event graph with, say, two tokens in the initial marking and a holding time equal to two time units (see Figure 5.9a). Suppose that for the two tokens w x1 p x2 x1 (a) p x2 (b) Figure 5.9: The introduction of lag times of the initial marking, we wish to have the lag times w(0) and w(1), respectively. Then the event graph is modified locally in such a way as to introduce an additional place and two additional transitions, one of which is an input transition (labeled w ) as shown in Figure 5.9b. The new additional place keeps the original initial marking and holding time. The original place p is now free of any initial marking and holding time. The lag times are forced by the additional input w = γ 0 δ w(0) ⊕ γ 1 δ w(1) ⊕ γ 2 δ w(1) ⊕ · · · , (5.38) that is, the first token is introduced at time w(0) and infinitely many tokens are introduced at time w(1). Since the convention that tokens of the initial marking are available since −∞ is still assumed, it is seen that indeed the first token of the initial marking starts enabling transition x 2 at time w(0) and the second token does the same at time w(1), which is consistent with the definition of lag times. After time w(1), the input defined by (5.38) is no longer constraining for the rest of the life of the system. Consider again Figure 5.9a, and assume now that this figure represents an isolated event graph, instead of a part of a larger event graph (that is, grey arrows are discarded). Rename x 1 and x 2 as u and y , respectively. The input-output relation of such a system is y = γ 2 δ 2 u . If, say, u = e (u is an impulse at time 0), then we obtain that the corresponding output is equal to y = δ 2 (γ 2 ⊕ γ 3 ⊕ · · · ) . (5.39) In terms of information, we thus learn that the third token (numbered 2) and the next ones get out at time 2. Nothing is said about the first two tokens (those of the initial 242 Synchronization and Linearity marking). Alternatively, by completing the missing information in a canonical way (behavior ‘at the earliest possible time’), it can be said that these two initial tokens went out at −∞. This contradicts the fact that they are part of the initial marking, if this initial marking is meant to represent the exact position of tokens at time 0. 
This paradox occurs because the lag times, equal to −∞ after our convention, are not weakly compatible in this case (transition y is enabled twice before the initial time). We now discuss two different ways of resolving this contradiction. The first one is described below. The second one is the topic of the next paragraph. Along the lines of Chapter 2, we consider the modification shown in Figure 5.9b, and we only accept weakly compatible lag times (they serve to determine the additional input w as in (5.38)). In this specific case, these lag times must be nonnegative and less than or equal to 2, the holding time of p . Then, the input-output relation is given by yw = γ 2 δ 2 u ⊕ w . (5.40) Notice that this is now an affine, rather than a linear, function of u . For u = e and for w given by (5.38), we obtain the information already provided by (5.39) for the output tokens numbered 2, 3, . . . , but we obtain additional information regarding the epochs at which tokens numbered 0 and 1 get out. We see that yw , given by (5.40), is not ax less than y , given by (5.39), from both the Min [[γ , δ ]] (algebraic) and the informational points of view (of course, these two points of view are consistent according to §5.4.3). The input-output relation (5.40) can be considered to be linear, rather than affine, if we restrict ourselves to inputs u not less than γ −2 δ −2 w (w being given by (5.38)), which amounts to considering that the two tokens of the initial marking have also been produced by the input: this discussion will not be pursued further here but it obviously related to the notion of a compatible initial condition (see Definition 2.61). 5.4.4.3 System-Theoretic View of Event Graphs From a different point of view, we may consider event graphs as playing the role of block-diagrams in conventional system theory. Recall that, for block-diagrams, the ‘initial conditions’ are not directly related to the operators shown in the blocks of the diagram, but they are either set to zero canonically (in which case the input-output relation is indeed linear), or they are forced by additional (Dirac-like) inputs which make the ‘states’ (i.e. the initial values of the integrators in continuous time models) jump to nonzero values at the initial time. In an analogous way, we may view places of event graphs as serving the only purpose of representing elementary shift operators in the event domain (number of ‘dots’) and in the time domain (number of ‘bars’) in a pictorial way. In this more abstract (or more system-theoretic) point of view, there is no notion of ‘circulation of tokens’ involved, and therefore no applicability of any ‘initial position of tokens’. The conventional rules of Petri nets which make tokens ‘move’ inside the net (and possibly get outside) are not viewed as describing any dynamic evolution, but they are rather considered as transformation rules affecting the internal representation but not the input-output relation (at least when tokens do not get outside the system; they add 5.4. Moving to the Two-Dimensional Description 243 a shift in counting between input and output when some tokens get outside the system during these ‘moves’). As discussed in Remark 5.3, there is a counterpart to transformations which move tokens, namely transformations which move bars, and both classes of transformations correspond to linear changes of basis in the internal representation. 
That is, the vectors x —see (5.35)—of two such equivalent representations can be obtained from each other by multiplication by an invertible matrix with entries ax in Min [[γ , δ ]] (a shift of the output y may also be necessary when tokens or bars cross the output transitions during their moves). To illustrate this point with an example which is even simpler than that of Figure 5.9, consider an output transition y connected to an input transition u by a place with one token and a zero holding time. This is the representation of the elementary shift operator γ . There is no way to represent this elementary input-output relation y = γ u for any u by an event graph which at the same time preserves the elementary view of this object: for the token of the initial marking to be ‘here’ at time zero, we need an extra input w = e (as in Figure 5.9b), but this modified graph represents the input-output relation y = γ u ⊕ e which coincides with y = γ u only for u ≥ γ −1. Remark 5.24 Note also that this is not the first time in this book that we meet a situation in which the naive interpretation of event graphs raises problems which do not appear in a purely algebraic conception: recall the discussion of Remark 2.86 on circuits with no tokens and no bars. 5.4.4.4 Reduction of the Internal Representation It should be realized that (5.37) is also the input-output relation of the event graph shown in Figure 5.10 which is thus equivalent to the previous one from this ‘external’ u y Figure 5.10: An equivalent simpler event graph (i.e. input-output) point of view: this illustrates the dramatic simplifications provided by algebraic manipulations. As in conventional system theory, the internal structure is not uniquely determined by the transfer function, and several more or less complex realizations of the same transfer function can be given. Of course, this equivalence of internal realizations assumes our convention that tokens of the initial marking are available since −∞. To handle different lag times, one must first appeal to the transformation described above, then compute the transfer function, and finally find a reduced representation of this transfer function taking lag times (that is, additional inputs) into account. As an example, the left-hand side of Figure 5.11 displays the event graph of Figure 5.1 for which additional inputs allow the choice of lag times for all tokens of the initial marking. For this new event graph, 244 Synchronization and Linearity w1 w2 w3 w1 u w2 w3 u x3 x1 x5 x4 x2 y y Figure 5.11: Event graph with lag times and its reduced representation the input-output relation turns out to be y = (γ δ)∗ δ 2 u ⊕ w1 ⊕ w2 ⊕ w3 : another simpler event graph which has the same transfer function. Remark 5.25 (Multiple-Input Multiple-Output—MIMO—systems) All the notions presented in this section extend without difficulty to the case when the input u and the output y are (column) vectors of respective dimension m and p . In particular, C A∗ B (see (5.36)) is then a p × m matrix (called the transfer matrix), the entries of which are polynomial elements of Max[[γ , δ ]]. in 5.5 Counters 5.5.1 A First Derivation of Counters As already mentioned in §5.1, discrete event systems, and in particular event graphs, can be described in the event domain by daters, or in the time domain by counters. 
We also mentioned that for a counter t ↦ c(t) associated with some type of event, the precise meaning given to c(t) deserves some attention: roughly speaking, it represents the value reached by the corresponding numbering mechanism of events at time t; however, several events may take place simultaneously at time t. On the contrary, there is no ambiguity in speaking of the epoch d(k) of the event numbered k. From a mathematical point of view, this issue about counters will receive a natural solution in this section. Yet, from an intuitive point of view, the interpretation given to c(t) might not appear as the most natural one. This interpretation will be given only after the relationship with the corresponding dater has been established.

At this stage, we can discuss the two possibilities of describing event graphs, by daters or by counters, from an abstract point of view. Consider two different ways of moving from the upper left-hand corner to the lower right-hand corner of the commutative diagram of Figure 5.7, namely via the eastern then southern directions on the one hand, and via the southern then eastern directions on the other hand. The former path has been described in some detail: the first move is developed in §5.4.2.1, and the second move in §5.4.2.2. We can interpret the latter path in the same way. The first move goes from B[[γ, δ]] to γ*B[[γ, δ]], which is isomorphic to (γ*B[[γ]])[[δ]]. For reasons dual to those given in §5.4.1, γ*B[[γ]] is isomorphic to Zmin by the correspondence from Zmin to γ*B[[γ]]:

    k ↦ γ^k γ*                                    if k ∈ Z ;
        ε (zero series)                           if k = ε = +∞ ;                       (5.41)
        (γ⁻¹)* γ* = (γ⁻¹ ⊕ γ)* = (γ⁻¹)* ⊕ γ*      if k = ⊤ = −∞ ,

which is the counterpart of (5.23). Therefore, (γ*B[[γ]])[[δ]] can be viewed as the subset of B[[γ, δ]] (the latter encodes collections of points in Z²) corresponding to epigraphs of mappings g : t ↦ g(t) from Z into Zmin, or alternatively, as the set of δ-transforms of those mappings g.

Remark 5.26 We still speak of 'epigraph', that is, the part of the plane above the graph (see Definition 3.37), although, in Zmin, the dioid order is reversed with respect to the conventional order.

The second move in the diagram, namely that going from γ*B[[γ, δ]] to M^ax_in[[γ, δ]] through the south, corresponds to selecting only the nondecreasing mappings from Z to Zmin, which are precisely the counters. The approximation of nonmonotonic functions by nondecreasing ones is 'from below'. Again, the words 'nondecreasing' and 'from below' should be understood with reference to the conventional order. These two moves can be summarized by the following formulæ, which are the counterparts of (5.25) and (5.27): the mappings g and c defined by

    g : t ↦ g(t) = inf_{(k,t) ∈ J_F} k ,       c : t ↦ inf_{s ≥ t} g(s) = inf_{s ≥ t, (k,s) ∈ J_F} k        (5.42)

are successively associated with a power series F ∈ B[[γ, δ]], or with the corresponding collection of points in Z² (see (5.24)). In terms of 'information', a pixel (k, t) ∈ J_F tells that, at time t, the counter reaches at most the value k (since, from (5.42), c(s) ≤ k for all s ≤ t). Conversely, to a counter c : Z → Zmin corresponds a power series in M^ax_in[[γ, δ]], namely

    ⊕_{t | −∞ < c(t) < +∞} γ^{c(t)} δ^t  ⊕  ⊕_{t | c(t) = −∞} (γ⁻¹)* δ^t ,                                   (5.43)

a formula which follows from (5.41). This formula is the counterpart of (5.28).

5.5.2 Counters Derived from Daters

Let us now discuss the relationship that exists between the dater and the counter associated with the same type of event.
We are going to prove that under some mild condition, the counter is the dual residual of the dater, i.e. c = d . 246 Synchronization and Linearity However, for this to be possible, d must first be considered as an isotone function between two complete dioids. Indeed, as already discussed, d is a monotonic mapping from Zmin into Zmax ; as such, it is antitone rather than isotone (that is, with the dioid orders in the domain and the range, it is ‘decreasing’ rather than ‘increasing’). However, we may as well consider the mapping d from Z (with the natural order) into itself having the same graph as d . This d is an isotone mapping from a complete lattice into itself. Because of the end-point conditions (5.6) that we imposed on purpose, d can be both residuated and dually residuated, provided that the required semicontinuity conditions be fulfilled (see Theorems 4.50 and 4.52, statements 2). The following theorem discusses the dual residuation. By abuse of expression, we speak of the residuation of d rather than of d . Theorem 5.27 A dater d is dually residuated if and only if lim d (k ) = −∞ . (5.44) k →−∞ ax Then, if a power series in Min [[γ , δ ]] is associated with d by (5.28), and if c is derived from this series by (5.42), then c = d . Proof Since d is a mapping between two totally ordered and discrete sets, the required upper-semicontinuity condition (see Definition 4.43) has to be checked only for subsets {ki } such that i ki = −∞. Since we imposed d (−∞) = −∞, condition (5.44) follows. Then, the set of pixels {(k , t )} defined by (k , d (k )) ∪ −∞<d (k )<+∞ (k , τ ) d (k )=+∞ τ ≥0 is associated with d . Indeed, this is one possible collection, since a south-east cone can be attached to any pixel of the collection. The above formula follows from (5.28). Finally, looking at (5.42), it should be clear that c(t ) = inf k , (5.45) d ( k ) ≥t which is nothing but a possible definition of d (see statement 1 of Theorem 4.52). Remark 5.28 If (5.44) is satisfied, then it follows from (4.23) and (4.24) that ∀t , d (c(t )) ≥ t and ∀k , c(d (k )) ≤ k . (5.46) Moreover, since c = d , then d = c , hence d (k ) = sup t . (5.47) c ( t ) ≤k Relation (5.45) always holds true, even if condition (5.44) is not fulfilled. But then, we cannot say that c is the dual residual of d since the other statements of Theorem 4.52 5.5. Counters 247 do not hold true, in particular (5.46) and (5.47) may be wrong. For example, consider the mapping defined by (5.6) and d (k ) = 0 for all finite k . Then the corresponding c is such that c(t ) = +∞ for t > 0 and c(t ) = −∞ otherwise. Therefore, d (c(0)) = d (−∞) = −∞ < 0, in contradiction with (5.46). Also, supc(t )≤−∞ t = 0 > d (−∞) = −∞, in contradiction with (5.47). However, this discussion is of purely mathematical interest since, given our conventions, any realistic dater, even after some finite shift in numbering due to the initial marking, will be such that d (k ) = −∞, ∀k < k0 for some finite k0 . That is, condition (5.44) will always be satisfied in practice. Remark 5.29 (Interpretation of counters) In words, the relation (5.45) expresses that c(t ) is the smallest value the numbering mechanism will reach at or after time t ; otherwise stated, c(t ) is the number of the next event to come at or after time t . This explains the inequalities (5.46) which may seem counterintuitive at first sight. An alternative definition of counters is considered in the next subsection. 
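A small numerical illustration may be useful before moving on. The Python sketch below (the example dater and all names are ours, chosen arbitrarily) implements (5.45) for a dater known on a finite range, with d(k) = ε = −∞ before the first event and d(k) = +∞ beyond the range, and checks the two inequalities (5.46).

# Illustrative sketch (ours): the counter (5.45) derived from a finite portion of a
# dater, and a check of the residuation inequalities (5.46).
NEG, POS = float("-inf"), float("inf")
d_vals = [0, 0, 2, 5, 5, 7]              # d(0), ..., d(5): an arbitrary nondecreasing example

def dater(k):
    if k < 0:
        return NEG                       # no event before the one numbered 0
    if k < len(d_vals):
        return d_vals[k]
    return POS                           # events beyond the range never occur

def counter(t):
    """(5.45): c(t) = smallest k with d(k) >= t, the number of the next event at or after t."""
    k = 0
    while dater(k) < t:
        k += 1
    return k

print([counter(t) for t in range(9)])    # [0, 2, 2, 3, 3, 3, 5, 5, 6]
# The inequalities (5.46): d(c(t)) >= t and c(d(k)) <= k.
assert all(dater(counter(t)) >= t for t in range(9))
assert all(counter(dater(k)) <= k for k in range(6))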
5.5.3 Alternative Definition of Counters Another possible definition of counters is c = d , provided that d be residuated. A necessary and sufficient condition is that d be l.s.c., which here is implied by the dual condition of (5.44), namely lim d (k ) = +∞ . k →+∞ (5.48) This condition means that no infinite numbers of events occur in finite time. Then, the residuated mapping c of d is, by definition, equal to c(t ) = sup k . d ( k ) ≤t (5.49) Even if (5.48) is not fulfilled, we can take (5.49) as the definition of c (but then, we cannot say that, conversely, d (k ) = infc(t )≥k t ). The meaning of this new counter c(t ) is the number of the last event which occurs before or at t . This might seem a more natural definition for counters than the one corresponding to c(t ). However, from a mathematical point of view, there is a drawback in manipulating this alternative notion of counters. To explain this point, we first establish the following result. Lemma 5.30 If c and c are derived from the same d by (5.45) and (5.49), respectively, then ∀t , c(t ) = c(t − 1) + 1 . Proof Let us start from (5.45). A c(t ) so defined is characterized by the fact that d (c(t )) ≥ t but also d (c(t ) − 1) < t , that is, d (c(t ) − 1) ≤ t − 1. Observe that if we def set k = c(t ) − 1, these inequalities also tell us that k = supd ( )≤t −1 , which is nothing but c(t − 1). 248 Synchronization and Linearity ax Given c, a two-dimensional representative in Min [[γ , δ ]] can be associated with c by (5.43). Therefore, owing to the above lemma, given c, the corresponding twodimensional representative is given by γ c (t )+ 1 δ t + 1 ⊕ {t |−∞<c(t )<+∞} (γ −1 )∗ δ t +1 , {t |c(t )=−∞} which is of course not as convenient as with c. 5.5.4 Dynamic Equations of Counters We return to the definition (5.45) of counters. We introduce the following notation: • Equation (5.45) defines a functional Z from the set of nondecreasing mappings d : Z → Zmax (referred to as the ‘dater set’) to the set of nondecreasing mappings c : Z → Zmin (referred to as the ‘counter set’); if we restrict ourselves to mappings d which satisfy (5.6) and (5.44), then Z (d ) is simply d (and then, Z −1 (c) = c ); • in the dater set, the pointwise addition of mappings (denoted ⊕) is the operation of upper hull; in the counter set, the pointwise addition (denoted ⊕) is the lower hull; • in the dater set, the ‘shift’ {k → d (k )} → {k → d (k − 1)} is denoted γ for obvious reasons (see Remark 5.6); similarly, the same kind of shift in the counter set is denoted δ ; • in the dater set, the ‘gain’ {k → d (k )} → {k → d (k ) + 1} is denoted 1d ; the analogous unit gain in the counter set is denoted 1c . Lemma 5.31 With the previous notation, we have, for all d or di in the dater set: Z (d1 ⊕ d2 ) = Z (d1 )⊕ Z (d2 ) , (5.50) Z (γ d ) = 1c Z (d ) , (5.51) Z (1d d ) = δ Z (d ) . (5.52) Proof A proof can be given by playing with Definition (5.45) of Z and with the meaning of the notation γ , δ , . . . . A smarter proof is obtained in the following way. For a given d , let D denote the element in Max[[γ , δ ]] defined by the right-hand side of (5.28). in ax Similarly, for a given c, let C denote the element in Min [[γ , δ ]] defined by (5.43). If ax c = Z (d ), it should be clear that C = D in Min [[γ , δ ]] (that is, C and D are two representatives of the same equivalence class). 
The series γ D corresponds to the dater γ d , hence the series associated with the counter Z (γ d ) must be γ C , but this series corresponds, through (5.43), to the series associated with the counter 1c c = 1c Z (d ). This proves (5.51). Formula (5.52) can be similarly proved (by noticing that the series associated with 1d d is δ D ). 5.6. Backward Equations 249 As for (5.50), let Di , i = 1, 2, be the series associated with the daters di , i = 1, 2. Then, because of the second rule (5.5), which can be used in Max[[γ , δ ]], D1 ⊕ D2 in ax (here ⊕ is the addition in Min [[γ , δ ]]) is associated with d1 ⊕ d2 (pointwise maximum of the mappings d1 and d2 ). Similarly, for counters, C1 ⊕ C2 is associated with c1 ⊕c2 because of the first rule (5.5). With these observations at hand, the proof of (5.50) is easily completed. Remark 5.32 In the case when Z (d ) = d , and with the necessary adaptation of notation, (5.50) can be viewed as a stronger version of (4.38). As a consequence of Lemma 5.31, Equations (5.2) can be derived from (5.1). For example, 1x 1 (k −2) is the value at k of the dater 1d γ 2 x 1 with which the counter δ(1c )2 x 1 is associated according to (5.51)–(5.52) (here, the dater and its associated counter are denoted with the same symbol, as we did in §5.2). Therefore, the term 2x 1 (t − 1) corresponds to the term 1 x 1 (k − 2) in the counter equations. And ⊕ in dater equations (that is, max) is converted to ⊕ in counter equations (that is, min) according to (5.50). Afterwards, it is realized that 1d , respectively 1c , could have been denoted δ , respectively γ . Using Lemma 5.30, once Equations (5.2) have been established using one notion of counters (given by (5.45)), it is clear that these equations are also valid with the alternative notion of counters (given by (5.49)). 5.6 Backward Equations So far we have been interested in computing outputs produced by given inputs. In the dater description, outputs are sequences of the earliest dates at which events (numbered sequentially) can occur. Sometimes, it may also be useful to derive inputs from outputs, which, roughly speaking, corresponds to ‘inverse’ the system. More precisely, and still in the dater setting, we may be given a sequence of dates at which one would like to see events occur at the latest , and we are asked to provide the latest input dates that would meet this objective. It is the topic of this section to discuss this problem. From a mathematical point of view, as long as the transfer function (matrix, in the MIMO case) has to be inverted, it is no surprise that residuation plays an essential role. This inversion translates into the fact that recurrent equations in the event domain for daters, respectively in the time domain for counters, now proceed backwards in ◦ event numbering, respectively in time. Moreover, the ‘algebra’ (∧, \) is substituted for (⊕, ⊗) in these backward equations. These equations offer a strong analogy with the adjoint-state (or co-state) equations of optimal control theory. 5.6.1 Max[[γ , δ ]] Backward Equations in Consider a system in Max[[γ , δ ]] described by Equation (5.35) or (5.36) (recall that in (5.36) yields the least solution y of (5.35) with either inequalities or equalities). Let y 250 Synchronization and Linearity be given. The greatest u such that z = H u = C A∗ Bu ≤ y (5.53) ◦ u = H y = C A∗ B \ y . 
(5.54) def is, by definition, obtained as From the previous results, in practice (5.53) means that, in the dater description, the output events produced by u occur not later than those described by y ; moreover, u being the ‘greatest’ input having property (5.53), the input events corresponding to u occur not earlier than with any other input having property (5.53). Recall that y = C A∗ Bu can be also described as the least solution of x = Ax ⊕ Bu , y = Cx . (5.55) We are going to give a similar ‘internal’ representation for the mapping H defined by (5.54). Lemma 5.33 Let u be derived from y by Equation (5.54). Then u is the greatest solution of the system ξ u ξ y ∧ , A C ξ . B = = (5.56) (5.57) This is equivalent to saying that ξ must be selected as the greatest solution of (5.56). Moreover, the symbol ‘=’ can be replaced by ‘≤’ in (5.56)–(5.57). Proof We have u = = = ◦ C A∗ B \ y ◦ ◦ A∗ B \ (C \ y ) ◦ ◦ ◦ B \ ( A∗ \ (C \ y )) owing to (5.54), thanks to (f.9), (same reason). def ◦ ◦ Let ξ = A∗ \ (C \ y ). By Theorem 4.73, we know that ξ is the greatest solution of (5.56) with equality or with the inequality ≤. If x (hence ξ ) is n -dimensional, and if u , respectively y , is p -dimensional, respectively m -dimensional, then, by using Formula (4.81), (5.56)–(5.57) can be more explicitly written as p n ξj yr ∧ , ∀i = 1, . . . , n , ξi = A ji Cri j =1 r =1 (5.58) n ξs ∀ = 1, . . . , m , u= . Bs s =1 5.6. Backward Equations 251 Observe again the transposition of matrices and the substitution of the operation ∧, ◦ respectively \, to the operation ⊕, respectively ⊗. However, these equations are not ‘linear’. Owing to (f.1) and (f.9), the mapping H rather obeys the dual properties: H ( y ∧ z ) = H ( y ) ∧ H (z ) , ◦ ◦ H (α \ y ) = α \ H ( y ) , ax where α is a ‘scalar’ (i.e. α ∈ Min [[γ , δ ]]). 5.6.2 Backward Equations for Daters We are going to translate the preceding equations in the setting of the dater description. Consider a system described by equations of the form (2.36), but with matrices (that is, holding times) which do not depend on the event number k (hence we will write e.g. A( ) for A(k , k − )). More specifically, we consider a system described by Rmax equations of the form: x (k ) = A(0)x (k ) ⊕ · · · ⊕ A( M )x (k − M ) ⊕ B (0)u (k ) ⊕ · · · ⊕ B ( M )u (k − M ) , y (k ) = C (0)x (k ) ⊕ · · · ⊕ C ( M )x (k − M ) . (5.59) There is of course no loss of generality in assuming that there is the same delay M for x and u and in both equations (this possibly amounts to completing the expressions with terms having zero matrix coefficients). The γ -transforms of these equations yield (5.55) where x , u , y now denote the γ -transforms of signals, and similarly, e.g. M A= A( )γ . =1 In the same way, (5.56)–(5.57) are still valid with the new interpretation of notation. Using (4.97), we can write these equations more explicitly in terms of power series in γ . Taking into account that A, B , C are in fact polynomials, i.e. power series for which coefficients are ε for powers of γ out of the set {0, . . . , M }, we finally obtain the relations, for all k : ξ(k ) ξ(k + M ) y (k ) y (k + M ) ∧ ... ∧ ∧ ∧ ... ∧ , ξ(k ) = A(0) A( M ) C (0) C(M ) ξ(k ) ξ(k + M ) u (k ) = ∧ ... ∧ . B (0) B( M ) (5.60) From these equations, the backward recursion is clear. To alleviate notation, let us now limit ourselves to M = 1, that is, consider the standard form (2.39), namely, x (k + 1) = Ax (k ) ⊕ Bu (k ) ; y (k ) = Cx (k ) . 
(5.61) 252 Synchronization and Linearity Then, combining (5.60) with (5.58), we obtain n ξ j (k + 1) ∀i = 1, . . . , n , ξi (k ) = ∧ A ji j =1 n ∀ = 1, . . . , m , u (k ) = s =1 ξs (k ) . Bs p r =1 yr (k + 1) , Cri (5.62) Let us rewrite these equations with conventional notation. We refer the reader to Example 4.65, to the rule regarding the ambiguous expression ∞ − ∞ that may show up ◦ if ratios such as ε \ε are encountered, and finally to the warning about the ambiguity of conventional notation since the expression ∞−∞ may also be obtained as the result of ε ⊗ which yields a different value. With this warning in mind, (5.62) can be written: p n ξi (k ) = min min ξ j (k + 1) − A j i , min ( yr (k + 1) − Cri ) , j =1 r =1 (5.63) n u (k ) = min (ξs (k ) − Bs ) . s =1 The reader is invited to establish these equations by a direct reasoning with an event graph for which the sequence { y (k )}k∈Z of desired output dates is given. It is then realized that the recursion is not only backward in event numbering (index k ) but also backward in the graph (from output transitions to input transitions through internal transitions). A few words are in order regarding ‘initial conditions’ of the recursion (5.63). This problem arises when the backward recursion starts at some finite event number, say kf (‘f ’ for ‘final’), because desired objectives { y (k )} are only given up to this number kf . In accordance with the idea of finding the ‘latest input dates’, that is the greatest subsolution of C A∗ Bu ≤ y as supposed by residuation theory, the missing information must be set to the maximum possible value. This amounts to saying that y (k ) must be for k > kf . As for set to = +∞ beyond kf , and more importantly ξ(k ) = the tokens of the initial marking, they are still supposed to be available at time −∞ (see page 241) since this is the assumption under which the ‘direct’ system obeys the input-output relation at the left-hand side of Inequality (5.53). At the end of this section, let us consider the following situation. Suppose that some output trajectory y (·) has been produced by processing some input v(·) through a system obeying Equations (5.61). This output trajectory is taken as the desired latest output trajectory. Of course, it is feasible since it is an actual output of the system. Then, if we compute the latest possible input, say u , that meets the given objective by using Equations (5.62), this u will also produce the output y , and it will be greater than or equal to v . Therefore, the pairs (v, y ) and (u , y ) are both solutions of (5.61), but two different internal state trajectories, say x and x , respectively, would be obtained with x ≤ x . Moreover, the trajectory x is also different from ξ which is computed by (5.62) using y as the input. The differences ξi (k ) − x i (k ), i = 1, . . . , n ; k ∈ Z, are nonnegative since (5.62) corresponds to the backward operation at ‘the latest time’ 5.7. Rationality, Realizability and Periodicity 253 whereas (5.61) describes the forward operation ‘at the earliest time’ for the same inputoutput pair (u , y ). For the k -th firing of the i -th transition, the difference ξi (k ) − x i (k ) indicates the time margin which is available, that is, the maximum delay by which this firing may be postponed, with respect to its earliest possible occurrence, without affecting the output transition firing. This kind of information is of course very useful in practical situations. Let us finally summarize the equations satisfied by the pair (x , ξ ). 
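Before doing so, the mechanics of the backward pass (transposition of the matrices, min and − in place of max and +, recursion backward in the event index) can be illustrated on a small made-up example. In the Python sketch below, the matrices, the desired output dates and the horizon are arbitrary choices of ours; the forward model coded is x(k+1) = Ax(k) ⊕ Bu(k), y(k) = Cx(k), and the index shifts of the backward recursion are written for that convention, so they may differ by a unit shift from the way (5.62)-(5.63) are displayed.

# Illustrative sketch (ours): latest input dates by a backward (min, -) recursion,
# dual to the forward (max, +) recursion, followed by a forward check.
NEG, POS = float("-inf"), float("inf")

A = [[1, NEG], [2, 0]]       # internal holding times (epsilon = NEG)
B = [[0], [NEG]]             # the input feeds transition x1
C = [[2, 1]]                 # y(k) = max(2 + x1(k), 1 + x2(k))
n, p = 2, 1
kf = 4
y_goal = [5, 6, 7, 9, 11]    # desired latest output dates y(0..kf)

def rdiv(b, a):
    """Residual 'b minus a' in Rmax: greatest z with a + z <= b."""
    return POS if a == NEG else b - a

# Backward pass: xi at the final event from the objective, then xi(k) and u(k).
xi = [[POS] * n for _ in range(kf + 1)]
u = [POS] * kf
for i in range(n):
    xi[kf][i] = min(rdiv(y_goal[kf], C[r][i]) for r in range(p))
for k in range(kf - 1, -1, -1):
    for i in range(n):
        xi[k][i] = min(
            min(rdiv(xi[k + 1][j], A[j][i]) for j in range(n)),
            min(rdiv(y_goal[k], C[r][i]) for r in range(p)),
        )
    u[k] = min(rdiv(xi[k + 1][s], B[s][0]) for s in range(n))
print("latest inputs u(0..3):", u)                       # [4, 5, 7, 9]

# Forward pass at the earliest behaviour with these latest inputs.
x = [[NEG] * n for _ in range(kf + 1)]
for k in range(kf):
    for i in range(n):
        x[k + 1][i] = max(max(A[i][j] + x[k][j] for j in range(n)), B[i][0] + u[k])
y = [max(C[0][i] + x[k][i] for i in range(n)) for k in range(kf + 1)]
print("output dates:", y)                                # [-inf, 6, 7, 9, 11]
print("margins xi - x at k = 3:", [xi[3][i] - x[3][i] for i in range(n)])   # [0, 1]

The computed u gives the latest input dates meeting the objective; feeding them forward confirms that the output does not exceed y_goal, and the slacks ξ(k) − x(k) are nonnegative, as discussed above.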
The following is derived from (5.61)–(5.62) in which u and y are replaced by their expressions, namely ◦ u = B \ξ and y = Cx , respectively. Then we obtain the system ξ(k ) x (k + 1) = Ax (k ) ⊕ B , B (5.64) Cx (k + 1) ξ(k + 1) ∧ . ξ(k ) = C A This system is very reminiscent of state/co-state (or Hamiltonian) equations derived for example from Pontryagin’s minimum principle for optimal control problems in conventional control theory. Moreover, the difference ξi (k ) − x i (k ) alluded to above is ◦ the i -th diagonal entry of the matrix ξ(k )/ x (k ). Pursuing the analogy with conventional control problems, it is known that introducing the ‘ratio’ of the co-state vector by the state vector yields a matrix which satisfies a Riccati equation. 5.7 Rationality, Realizability and Periodicity 5.7.1 Preliminaries At this point, we know that event graphs can be described by general equations of the form (5.55) in which the mathematical form of u , x , y , A, B , C depends on the description adopted. Essentially u , x , y may be: • power series in γ with coefficients in Zmax ; • power series in δ with coefficients in Zmin ; • power series in (γ , δ) with coefficients in {ε, e}. As for A, B , C , they are matrices with polynomial entries of the same nature as the power series u , x , y , but with only nonnegative exponents since tokens of the initial marking and holding times introduce nonnegative ‘backward’ shifts (see Remark 5.6) in event numbering, respectively in time. The input-output relation u → y is given by y = C A∗ Bu , hence the entries of this so-called transfer matrix belong to the rational closure (see Definition 4.99) of the corresponding class of polynomials with nonnegative exponents. One purpose of this section is to study the converse implication, namely that a system with a ‘rational’ transfer matrix does have a finite-dimensional ‘realization’ of the form (5.55) with polynomial matrices. In fact, this is an immediate consequence of Theorem 4.102. Moreover, playing with the various possibilities of 254 Synchronization and Linearity realizations provided by Theorem 4.105, we will recover essentially the three possible descriptions of systems alluded to above by starting from a single framework, namely ax the one offered by the Min [[γ , δ ]] algebra. Remark 5.34 In an event graph, if there are direct arcs from input to output transitions, one obtains an output equation of the form y = Cx ⊕ Du and a transfer matrix of the form C A∗ B ⊕ D . However, by redefining the ‘state’ vector as x = x u , it is possible to come back to the form C A∗ B with C = C D , B = B e and A = diag( A, ε), but at the price of increasing the ‘state’ dimensionality. Indeed, in what follows the issue of the ‘minimal’ realization will not be addressed since only partial results have been obtained so far. The equivalence between rationality of the transfer matrix and its ‘realizability’ is a classical result in both conventional linear system theory and in automata and formal language theory. However, there is here a third ingredient coming in: rationality is also equivalent to some ‘periodicity’ property of the transfer function or the impulse response. This is analogous to the situation of rational numbers which have a periodic decimal expansion. We will only address the SISO case. The MIMO case (e.g. 2 inputs, 2 outputs) can be dealt with in a trivial manner by considering all the individual scalar transfer functions Hi j : u j → yi , j = 1, 2, i = 1, 2, first. 
Suppose that 3-tuples ( Ai j , Bi j , Ci j ) have been found to realize Hi j in the form (5.55) ( Ai j is in general a matrix, not a scalar). Then, it is easy to check that the 3-tuple ε ε ε ε A11 B11 ε A12 B12 ε ε , B = ε , A= ε B21 ε A21 ε ε ε ε ε A22 ε B22 C= C11 ε C12 ε ε C21 ε C22 , is a realization of the 2 × 2 transfer matrix. Of course, this way of handling MIMO systems does not consider the dimensionality of the realization explicitly. In the following, we will comment on the MIMO case when appropriate. 5.7.2 Definitions We start with the following definitions. ax Definition 5.35 (Causality) An element h of Min [[γ , δ ]] is causal either if h = ε or if val(h ) ≥ 0 and h ≥ γ val(h) . ax This definition is somewhat technical, due to the fact that h , as an element of Min [[γ , δ ]], has various formal representations, and among them e.g. the maximal one which involves the multiplication by (δ −1 )∗ (hence it may have monomials with negative exax ponents in δ ). However, the definition, while using the language of Min [[γ , δ ]], clearly 5.7. Rationality, Realizability and Periodicity 255 says that the graph of the associated dater lies in the right-half plane and above the ax x -axis. It can then be formally checked that the set of causal elements of Min [[γ , δ ]] ax is a subdioid of Min [[γ , δ ]]. For example, if p and q are causal, then val( p ⊕ q ) = min(val( p), val(q )) ≥ 0, and γ val( p ⊕q ) = γ min(val( p ),val(q )) = γ val( p ) ⊕ γ val(q ) ≤ p ⊕ q , proving that p ⊕ q is also causal. A similar proof can be given for p ⊗ q . ax Definition 5.36 (Rationality) An element of Min [[γ , δ ]] is rational if it belongs to the def rational closure of the subset T = {ε, e, γ , δ }. A vector or matrix is rational if its entries are all rational. Indeed, because of the choice of the basic set, the rational elements will also be causal. ax Definition 5.37 (Realizability) A matrix H ∈ Min [[γ , δ ]] be written as H = C (γ A1 ⊕ δ A2 )∗ B p ×m is realizable if it can (5.65) where A1 and A2 are n × n matrices, n being an arbitrary but finite integer (depending on H ), C and B are n × m and p × n matrices respectively, and each entry of these matrices is equal to either ε or e. Definition 5.38 (Periodicity) An element h of Max[[γ , δ ]] is periodic if there exist two in polynomials p and q and a monomial m (all causal) such that h = p ⊕ qm ∗ . (5.66) A matrix H is periodic if its entries are all periodic. Here, we adopt a ‘mild’ definition of periodicity. It is however mathematically equivalent to seemingly other more sophisticated definitions which put further constraints on the polynomials p and q . This point will be discussed in §5.7.4. At this stage, it suffices to understand that, if one considers h as the (γ , δ)-transform of a trajectory, say an impulse response, the intuitive meaning of Formula (5.66) is that a certain pattern represented by q is reproduced indefinitely, since the multiplication by m = γ r δ s represents a shift by r units along the x -axis—event domain—and s units along the y -axis—time domain—and qm ∗ = q ⊕ qm ⊕ qm 2 ⊕ · · · is the union of all these shifted versions of q . This periodic behavior occurs after a certain transient which is essentially (but not always exactly, as we shall see) represented by p . The ratio s / r represents the asymptotic slope of the graph of the dater associated with h , and thus the asymptotic output rate: on the average, r events occur every s time units. The extreme cases s = 0 and r = 0 will be discussed in §5.7.4. 
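To make the definition of periodicity concrete, the impulse response encoded by p ⊕ q(γ^r δ^s)* can be unfolded into its dater. The Python sketch below is only an illustration (p, q, r and s are arbitrary choices of ours); it exhibits the eventually periodic behavior d(k + r) = d(k) + s announced above, with asymptotic slope s/r.

# Illustrative sketch (ours): the eventually periodic impulse response encoded by
# h = p (+) q (gamma^r delta^s)*.  Monomials are exponent pairs (k, t); the dater
# associated with h is d(k) = max{ t : (kappa, t) a monomial of h, kappa <= k }.
NEG = float("-inf")

p = {(0, 0), (1, 2)}          # transient part
q = {(2, 3), (4, 4)}          # periodic pattern
r, s = 3, 2                   # m = gamma^3 delta^2 : 3 events every 2 time units

def dater(k, horizon=30):
    """d(k) for the series p (+) q m*, the star truncated to 'horizon' terms."""
    pixels = set(p) | {(kq + j * r, tq + j * s) for (kq, tq) in q for j in range(horizon)}
    return max((t for (kappa, t) in pixels if kappa <= k), default=NEG)

d = [dater(k) for k in range(15)]
print(d)                      # [0, 2, 3, 3, 4, 5, 5, 6, 7, 7, 8, 9, 9, 10, 11]
# After the transient, d(k + r) = d(k) + s: on the average, r events every s time units.
assert all(d[k + r] == d[k] + s for k in range(2, 15 - r))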
5.7.3 Main Theorem ax Theorem 5.39 For H ∈ Min [[γ , δ ]] lent p ×m , the following three statements are equiva- 256 Synchronization and Linearity (i) H is realizable; (ii) H is rational; (iii) H is periodic. Proof The implication (i) ⇒ (ii) is straightforward. The converse (ii) ⇒ (i) follows, at least for the SISO case, from Theorem 4.105 with B = C = U = B = {ε, e} and V = {γ , δ }. We then observe that U = U hence U ⊗ V = {ε, γ , δ, γ ⊕ δ }. It remains to split up matrix Ax , which appears in the (B, C )-representation (4.106) (with entries in U ⊗ V ), into γ A1 ⊕ δ A2 , which offers no difficulty. The MIMO case is handled as indicated previously. We now outline the proof of the equivalence (ii) ⇔ (iii). Since the definitions of periodicity and rationality refer to the entries of H individually, it suffices to deal with the SISO case. The implication (iii) ⇒ (ii) is obvious: if h can be written as in (5.66), ax then clearly h ∈ T . Conversely, if h is rational, since Min [[γ , δ ]] is a commutative dioid, we can use Theorem 4.110 (applied to the dioid closure of T ) to see that h can be written as ∗ h γ αi δ βi = i∈I γ rj δsj j ∈ Ji αi βi = γδ i∈I γ rj δsj ∗ , (5.67) j ∈ Ji where I and the Ji are finite sets, αi , βi , r j , s j are nonnegative integers, and (5.67) follows from (4.109). The proof is then completed by showing that (5.67) is amenable to the form (5.66) where m is essentially the monomial γ r j δ s j with maximal ‘slope’ s j / r j . Indeed, the term (γ r j δ s j )∗ tends to asymptotically dominate all other similar terms in sums and products. When the monomial with maximal slope is nonunique, the precise rules for obtaining the monomial m used in (5.66) will be given in the proof of Theorem 6.32 in the next chapter. The precise derivation of this last part of the proof, which is rather technical, will be skipped here. The reader is referred to [44] for a more detailed outline and to [62] for a full treatment. In the above proof, instead of attempting to prove the implication (ii) ⇒ (iii), we might have proved the implication (i) ⇒ (iii) using Theorem 4.109 and Formula (4.110). This would have provided some insight into how the monomial m i j appearing in the representation (5.66) of each Hi j (in the form of m ∗j ) is related to the weights of ciri cuits of G ( A), where A = γ A1 ⊕ δ A2 appears in (5.65). The graph G ( A) is drawn as explained in §4.7.3 (see Figure 4.8). Then, for an input-output pair (u j , yi ), we consider all (oriented) paths connecting these transitions and all circuits which have at least one node in common with those paths. These are the circuits of interest to determine the maximal ratio si j / ri j . If there is no such circuit, this means that the polynomial q of (5.66) is ε and thus m i j is irrelevant. As a consequence, if matrix A is strongly connected (which in particular precludes any direct path from the input to the output, 5.7. Rationality, Realizability and Periodicity 257 given that we have not included a term Du — see Remark 5.34), then the ratios si j / ri j take a unique value for all the m i j . We could also have proved that (iii) ⇒ (i) directly by providing an explicit realization of a periodic element. This essentially follows the scheme illustrated by Figure 6.4 in the next chapter. 
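The role played by the circuits can be illustrated numerically. For an event graph written in the standard dater form x(k+1) = Ax(k) with exactly one token per internal place, the slope s/r of the periodicity monomial coincides, when the graph is strongly connected, with the maximum circuit mean of A; the Python sketch below (the 2 x 2 matrix is a made-up example of ours) estimates this slope by simply iterating the (max,+) recursion.

# Illustrative sketch (ours): asymptotic slope of the dater trajectory of
# x(k+1) = A x(k) in (max,+), for a strongly connected made-up matrix whose two
# circuits have means 0 (self-loop) and (3 + 2)/2 = 2.5.
NEG = float("-inf")
A = [[NEG, 3],
     [2, 0]]
n = len(A)

x = [0.0] * n
history = [x]
for _ in range(50):
    x = [max(A[i][j] + x[j] for j in range(n)) for i in range(n)]
    history.append(x)

# Average growth per event over the last 2 steps (2 is a multiple of the cyclicity here):
slope = (history[-1][0] - history[-3][0]) / 2
print(slope)   # 2.5 : asymptotically 2 events every 5 time units, i.e. m = gamma^2 delta^5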
5.7.4 On the Coding of Rational Elements In this subsection, it is helpful to use the pictorial representation of (causal) elements of Max[[γ , δ ]] by collection of points in the N2 -plane: a monomial γ k δ t is represented in by a point with coordinates (k , t ) and we assume that polynomials are represented by their minimum representatives (which do exist—see Theorem 5.20). Let the monomial m involved in (5.66) be equal to γ r δ s . • If s = 0, then a = p ⊕ qm ∗ = p ⊕ q since m ∗ = (γ r )∗ = e, that is, the impulse response a is a polynomial : this is the behavior of an event graph having either no circuits or only circuits with zero holding time. Such a system is able to work at infinite speed. This ‘finite’ impulse response thus corresponds to the situation when an infinite number of tokens get out at time t = deg(a ); the first tokens get out earlier as described by p ⊕ q . From the point of view of the minimum coding of a , there no way but to retain the minimum representative of p ⊕ q . • If s = 0 but r = 0 (and q = ε), and since m ∗ = (δ s )∗ corresponds to an infinite slope, the impulse response is ‘frozen’ after a transient during which some tokens get out (those numbered from 0 to val(q ) − 1): this is indeed the kind of behavior one would like to call a ‘finite response’. It happens when there is a circuit without tokens but with a positive holding time: the system is ‘deadlocked’. The nontrivial part of a is provided by p − γ val(q )δ ∗ (the part of ◦ the plot of a which lies at the left hand of the vertical asymptote) which should again be coded by its minimum representative. These two particular cases will no longer be considered in detail. Hence, from now on, we assume that r > 0 and s > 0 unless stated otherwise. With (5.66), we have adopted a characterization of rational (or periodic) elements which is mathematically simple. The forthcoming lemma shows that this characterization is equivalent to another one which puts further constraints on the polynomials p and q . The latter definition will be used in Chapter 6 (see Theorem 6.32). It has the advantage of making the interpretation of the new p as the transient part and of the new q as the periodic pattern (see Figure 6.2) possible. Indeed, p can then be represented by points contained in a box of width ν − 1 and height τ − 1 with its lower left-hand corner located at the origin. The periodic pattern q can be encapsulated in a box of width r − 1 and height s − 1 with its lower left-hand corner located at the point (ν, τ ). This box is translated indefinitely by the vector (r, s ). These conditions are now expressed mathematically in the following lemma. ax Lemma 5.40 An element a ∈ Min [[γ , δ ]] is rational if and only if it is a polynomial, or an element of the form p ⊕ γ ν δ ∗ ( p is a polynomial), or if there exist positive integers 258 Synchronization and Linearity r and s, nonnegative integers ν and τ , and polynomials p= γ k δt with k ≤ ν − 1, t ≤ τ − 1, ∀(k , t ) ∈ J p , (k , t )∈ J p γ κ δθ q= with ν ≤ κ ≤ ν + r − 1, τ ≤ θ ≤ s + τ − 1, ∀(κ, θ) ∈ Jq , (κ,θ )∈ Jq ( J p and Jq are necessarily finite) such that a = p ⊕ q (γ r δ s )∗ . Proof Only the case when r > 0, s > 0 and q = ε is of interest here. Moreover, since the new characterization of rational elements is a particular case of (5.66), it suffices to prove that, conversely, (5.66) is amenable to the new characterization. Consider a = def p ⊕ q (γ r δ s )∗ for any polynomials p and q . Let τ = max (deg ( p ) + 1, deg (q ) − s + 1). 
For any (k , t ) ∈ Jq , there exists a unique (k,t ) ∈ Z such that τ ≤ t + (k,t ) s ≤ τ + s − 1. Because deg(q ) ≤ τ + s − 1 and s > 0, necessarily (k,t ) ≥ 0. We consider γ k+ def q= ( k ,t ) r δt + ( k ,t ) s . (5.68) ( k , t ) ∈ Jq def def In addition, let α = max(k,t )∈ Jq (k,t ) and ν = val (q ). Observe that all points representing monomials appearing at the right-hand side of (5.68) lie in a strip of height s − 1 delimited by the horizontal lines y = τ and y = τ + s − 1, and at the right-hand closed half plane bordered by the vertical line at x = ν . Let a minimum representative of a be written (k,t )∈ Ja γ k δ t ( Ja is countably infinite). This minimum representative does exist since we deal with causal elements and since we assume that r > 0 (see Theorem 5.20). Consider any (k , t ) ∈ Ja . If t ≥ τ , then the corresponding monomial cannot belong to p since τ > deg( p ). Hence, it necessarily belongs to some qm n . If t < τ , then this monomial may belong to either p or to some qm n but then n < α : indeed, for n ≥ α , we have by construction that def t ≥ τ . Hence, if we set p = p ⊕ q ⊕ qm ⊕ · · · ⊕ qm α −1 , we can consider all pairs (k , t ) ∈ Ja with t < τ as coming from p . We now prove that the other pairs can be explained by monomials of q m ∗ . If (k , t ) ∈ Ja and t ≥ τ , then there exist k , t ∈ Jq and ≥ 0 such that (k , t ) = k , t + × (r, s ). Moreover, ≥ (k ,t ) , hence (k , t ) = k , t + − (k,t ) × (r, s ) , where γ k δ t is one of the polynomials involved in (5.68). At this point, we have proved that all monomials of a = p ⊕ qm ∗ are among the monomials of p ⊕ q m ∗ , hence a ≤ p ⊕ q m ∗ . But the converse statement is also true since the monomials in p and q have been obtained from a . Hence p ⊕ q m ∗ is another expression for a . To complete the proof, we must delete monomials which are useless from this expression (because they are dominated by other polynomials in the same expression ax in the sense of the order relation of Min [[γ , δ ]]). Firstly, concerning p , if monomials of 5.7. Rationality, Realizability and Periodicity 259 p, thus also of a , have a degree greater than or equal to τ , they can also be obtained by other monomials of q m ∗ (proceed as previously) and thus they can be dropped from p. Let m = γ ν δ τ be the monomial of q with valuation ν (recall that ν = val (q )). Observe that τ ≥ τ . This monomial dominates the monomials γ k δ t of p with t < τ but k ≥ ν which can thus also be dropped from p . Finally, the new p stays in the lower left-hand part of the plane delimited by the horizontal line y = τ − 1 and the vertical line x = ν − 1. Secondly, concerning q , consider the monomials of q with valuation greater than ν + s − 1: their degree being necessarily less than τ + s , they are dominated by the monomial γ ν +r δ τ +s contained in q m . This observation is of course preserved by the successive translations (r, s ). Therefore, a new q can be used which stays in the box given in the statement of the lemma. We refer the reader to [62] in which an algorithm is provided to obtain the ‘best’ possible representation of the type described by the lemma. The following example shows that by redefining not only p and q but also m , more compact representations may be obtained (the reader is invited to make the drawing corresponding to each example, which is the best way to quickly grasp the situation). 
Example 5.41 The expression (e ⊕ γ δ)(γ 2 δ 2 )∗ is already in the form of Lemma 5.40 but it can be simplified to (γ δ)∗ by redefining m as γ δ instead of γ 2 δ 2 . Consider now the following example. Example 5.42 Let a = p ⊕ qm ∗ with p = e ⊕ γ 2 δ 2 ⊕ γ 5 δ 3 ⊕ γ 6 δ 4 ⊕ γ 8 δ 6 ⊕ γ 11 δ 7 , q = γ 12 δ 8 (e ⊕ γ 2 δ) and m = γ 3 δ 2 . Another representation of a as p ⊕ q m ∗ involves p = γ 2 δ 2 ⊕ γ 8 δ 6 and q = e ⊕ γ 2 δ . In this example, what happens is that p can partly be explained by ‘noncausal shifts’ qm −l of q . Algebraically, there exists some n (here n = 4) such that adding b = n −l to a does not change a . Hence l =1 qm a = a ⊕ b = p ⊕ qm ∗ ⊕ b = ( p − b ) ⊕ b ⊕ qm ∗ = p ⊕ q m ∗ , ◦ where p = p − b and q = qm −n . Now p does not any longer appear as the transient ◦ part, but rather as a transient perturbation of the periodic regime. Now the contributions of p and q to the transient part are interweaved. 5.7.5 Realizations by γ - and δ -Transforms In the proof of Theorem 5.39, Theorem 4.105 has been used with the choice B = C = U = B and V = {γ , δ }. Other possibilities have been suggested at the end of §4.8.3. 5.7.5.1 Dater Realization If we consider the possibility B = C = B, U = B ∪ {δ } and V = {γ }, then U = {ε, e, δ, δ 2, . . . , δ ∗ } since all sums of such elements are reducible to one of them by the rule δ t ⊕ δ τ = δ max(t ,τ ) . Hence U , being obviously a complete dioid in 260 Synchronization and Linearity this case, is isomorphic to (and will be identified with) Nmax (with δ ∗ identified with = +∞). Consequently, in an observer representation (C , A , B ) of a rational matrix H as provided by Theorem 4.105, B has its entries in Nmax and A = γ A where A is also a matrix with entries in Nmax ; C is a Boolean matrix. This realization may be interpreted as the one directly derived, by the γ -transform, from the dater equations x (k ) = Ax (k − 1) ⊕ Bu (k ) , y (k ) = Cx (k ) , where in addition y is a subvector of x (hence the name ‘observer representation’). For a controller representation, B is a Boolean matrix whereas C makes any linear combination of the x i with weights in Nmax . Remark 5.43 It is thus always possible, starting from any timed event graph, to obtain an equivalent event graph with a structure corresponding to the observer (respectively, the controller) representation, that is, the initial marking consists of exactly one token in internal places, no tokens in the input and output places, and in addition, zero holding times for the output (respectively, the input) places. It should be realized that there is a trick here to represent deadlocked systems in this way, i.e. event graphs circuits with no tokens and positive holding times. These systems will be realized in ‘state space’ form by matrices A, B or C having some entries equal to δ ∗ . Indeed, an arc with weight δ ∗ introduces an unbounded holding time, which is sufficient to block some parts of the system. For example, for the graph of Figure 5.12 (the transfer function of which is δ ∗ ), an observer (dater) realization is C = e, A = γ , u x y Figure 5.12: A deadlocked event graph B = δ∗. 5.7.5.2 Counter Realization The counter realization corresponds to the dual possibility offered by Theorem 4.105, namely to choose U = B ∪ {γ } and V = {δ }. Then U = {ε, e, γ , γ 2 , . . . }. This is a complete dioid. Due to the rule γ k ⊕ γ κ = γ min(k,κ) , this U is now identified with Nmin (note also that γ ∗ = e). Remark 5.44 In Nmax , all elements are greater than e except ε. 
In Nmin , all elements lie between ε = +∞ and = e = 0. In this new context, the realizations one obtains appear to be derived, by the δ transform, from counter equations in ‘state space’ form with, in addition, Boolean matrices C or B . 5.8. Frequency Response of Event Graphs 261 Deadlocked systems can be represented directly since a weight of e means no tokens in the corresponding place (or arc). For the graph of Figure 5.12, an observer (counter) realization is C = B = e and A = δ . 5.8 Frequency Response of Event Graphs In conventional system theory, with a continuous time-domain, rational transfer functions may be viewed as rational expressions of a formal operator denoted s , which can be interpreted as the derivative with respect to time. However, transfer functions, say H (s ), are also used as numerical functions when an imaginary numerical value j ω is substituted for s . It is well known that when a signal of pure frequency ω is used as the input of a (stable) linear time-invariant system, the corresponding output is, after a possible transient, a signal of the same frequency ω which is phase-shifted by arg H ( j ω) and amplified by the gain | H ( j ω)| with respect to the input. The transient can be avoided by starting at time −∞. Then sine functions of any frequency ω appear as eigenfunctions of any rational transfer function H with corresponding eigenvalue H ( j ω). The main purpose of this section is to discuss analogous results for event graphs. We will confine ourselves to the SISO case. Since we can consider transfer functions under three different forms as recalled at the beginning of §5.7 (referred to as dater, counter or two-dimensional representations), the following developments could be made from these three different points of view. We will favor the two-dimensional point of view, but the analogy of certain quantities to phase shift or amplification gain depends on which of the first two points of view is adopted. In §6.4, this topic will be revisited from the dater point of view and with a continuous domain. 5.8.1 Numerical Functions Associated with Elements of B[[γ , δ ]] ax In this section, an element F of B[[γ , δ ]] or of Min [[γ , δ ]] will be written either as γ k δt (5.69) (k , t )∈ J F or as F (k , t )γ k δ t with F (k , t ) = (k ,t )∈Z2 e ε if (k , t ) ∈ J F ; otherwise. (5.70) Guided by the analogy with conventional system theory alluded to above, we are going to use such a formal expression as a numerical function. This means that numerical integer values will be substituted for the formal variables γ and δ , and the quantity thus obtained will be evaluated numerically. Therefore, the symbols ⊕ and ⊗ (the latter being implicit in the above expressions) must be given a meaning in order to operate on numbers. We choose the max-plus interpretation, that is, ⊕ will be interpreted as max (or sup) and ⊗ as +. Consistently, a coefficient e is interpreted as 0 and ε as −∞. For 262 Synchronization and Linearity the time being, let us consider F as an element of B[[γ , δ ]] and define the associated numerical function, denoted F ( F ), as the mapping from Z2 into Z: (g , d ) → F (k , t )g k d t (k ,t )∈Z2 = = sup ( F (k , t ) + gk + dt ) (k ,t )∈Z2 sup (gk + dt ) . (5.71) (k , t )∈ J F In this section, according to our general convention, the multiplication, denoted by mere juxtaposition of elements, must be interpreted as the conventional × when the +, −, sup or inf symbols are involved in the same expression. 
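On a series given by finitely many pixels, the evaluation (5.71) reduces to a finite maximum. The short Python sketch below (the function name and the example series are ours) computes it and shows how different pixels are the 'active' ones for different values of (g, d), which is the support-function behavior discussed below.

# Illustrative sketch (ours): the numerical function F(F) of (5.71) for a series
# given by finitely many 'on' pixels; it is the support function of the pixel set.
def evaluate(pixels, g, d):
    """[F(F)](g, d) = sup over pixels (k, t) of g*k + d*t (-inf for the empty set)."""
    return max((g * k + d * t for (k, t) in pixels), default=float("-inf"))

# Example: F = e (+) gamma delta^2 (+) gamma^3 delta^3
F = {(0, 0), (1, 2), (3, 3)}
print(evaluate(F, -1, 1))   # 1 : attained at the pixel (1, 2)
print(evaluate(F, -2, 1))   # 0 : attained at (0, 0) and (1, 2)
print(evaluate(F, -1, 3))   # 6 : attained at (3, 3)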
Lemma 5.45 The set of mappings from Z2 into Z endowed with the pointwise maximum as addition, and the pointwise (conventional) addition as multiplication, is a complete dioid. The mapping F introduced above is a l.s.c. dioid homomorphism from B[[γ , δ ]] into this dioid of numerical functions. The proof is straightforward. The mapping F will be referred to as the evaluation homomorphism in this chapter and the next one (see §6.4.1), although the context is somewhat different in the two chapters. Equations (5.71) show that not all numerical functions are in the range of F for at least two reasons. • If the function F ( F ) was extended to the continuous domain R2 instead of Z2 (with range in R instead of Z) in an obvious manner, then it would be a convex function as the supremum of a family of linear functions. • The function F ( F ) is positively homogeneous of degree 1, that is, [F ( F )](a × g , a × d ) = a × [F ( F )](g , d ) for any nonnegative (integer) number a (in particular, [F ( F )](0, 0) = 0). For the latter reason, it suffices practically to know the value of F ( F ) for all values of the ratio g /d which ranges in Q. From the geometric point of view, since an element F of B[[γ , δ ]] encodes a subset of points J F in the Z2 plane, it is realized that F ( F ) is nothing but the so-called ‘support function’ of this subset [119]. It is well known that support functions characterize only the convex hulls of subsets: this amounts to saying that F is certainly not injective; its value at F depends only on the extreme points of the subset associated with F . Being l.s.c. and such that F (ε) = ε, the mapping F is residuated. Lemma 5.46 Let F be a mapping from Z2 into Z. If F is positively homogeneous of degree 1, then F = F F is obtained by F (k , t ) = inf ( g ,d )∈Z2 F (g , d ) − gk − dt . (5.72) 5.8. Frequency Response of Event Graphs 263 Proof Recall that F is the largest element in B[[γ , δ ]] such that F ( F ) ≤ F . Because of (5.71), for all (k , t ) ∈ Z2 , we must have ∀(g , d ) ∈ Z2 , F (k , t ) ≤ F (g , d ) − ( gk + dt ) , hence, F (k , t ) ≤ inf ( g ,d )∈Z2 F (g , d ) − (gk + dt ) . (5.73) The largest such F is defined by the equality in (5.73). For this F , we must prove that: 1. F (k , t ) assumes only the values −∞ and 0; 2. the inequality F ( F ) ≤ F is still verified for this F . The first fact stems from the homogeneity of F . Indeed, if we set g = d = 0 at the right-hand side of (5.73), this shows that F (k , t ) ≤ 0. Then, it suffices to realize that the inf cannot be equal to any finite and strictly negative value: as a matter of fact, for any positively homogeneous function ϕ , we have inf ϕ(g , d ) = inf a × a ∈N ( g ,d )∈Z2 inf ϕ(g , d ) ( g ,d )∈Z2 , and a contradiction would be obtained for any value of this inf different from 0 and −∞. As for item 2 above, according to (5.71) and the definition of F , we obtain [F ( F )] (g , d ) = ≤ = 5.8.2 sup (k ,t )∈Z2 g k + dt + inf F (g , d ) − gk − dt inf sup ( g,d ) (k ,t )∈Z2 ( g,d ) (g − g )k + (d − d )t + F (g , d ) F (g , d ) . Specialization to Max[[γ , δ ]] in We are now interested in redefining a similar evaluation homomorphism, but for elax ax ements of Min [[γ , δ ]]. We have seen that elements of Min [[γ , δ ]], which are indeed equivalence classes, may be represented by different formal expressions in B[[γ , δ ]]. All these expressions are characterized by the fact that they yield the same element of B[[γ , δ ]] (the maximum representative) when they are multiplied by γ ∗(δ −1 )∗ . 
By mere application of (5.71), we have that F γ ∗ (δ −1 )∗ (g , d ) = 0 +∞ if g ≤ 0 and d ≥ 0 ; otherwise. 264 Synchronization and Linearity Therefore, by application of the homomorphism property of F , it is seen that, for all (g , d ) such that g ≤ 0 (denoted g ∈ (−N)) and d ≥ 0, F γ ∗(δ −1 )∗ = G γ ∗ (δ −1 )∗ (in B[[γ , δ ]]) ⇒ [F ( F )](g , d ) = [F (G )](g , d ) . It is readily checked that we can repeat all the proofs and results of §5.8.1, using any representatives in B[[γ , δ ]] of elements in Max[[γ , δ ]], provided that we restrict the pairs in (g , d ) to belong to (−N) × N. Remark 5.47 For (g , d ) ∈ (−N) × N, the value of F ( F ) can consistently be set to ax +∞ whenever F is considered as an element of Min [[γ , δ ]], except if F = ε, for this is the value obtained with the maximum representative of F . This may be explained by recalling the geometric interpretation of F ( F ) as a support function of a subset, and by observing that the ‘cones of information’ introduced in §5.4.3 extend indefinitely in the South and East directions (characterized by g > 0 or d < 0). Observe that the subset of numerical functions equal to +∞ outside (−N) × N, plus the function ε equal to −∞ everywhere, is also a complete dioid for the operations defined at Lemma 5.45. ax We will keep on using the notation F for this mapping defined over Min [[γ , δ ]] (since the previous F defined over B[[γ , δ ]] will no longer be in use). The following definition and lemma, which should now be clear, summarizes the situation. ax Definition 5.48 (Evaluation homomorphism) The mapping F from Min [[γ , δ ]] into the dioid of numerical functions (introduced at Lemma 5.45) is defined as follows: • F (ε) = ε; • if F = ε, – if (g , d ) ∈ (−N) × N, then [F ( F )](g , d ) is defined by Equations (5.71) using any representative of F; – if (g , d ) ∈ (−N) × N, then [F ( F )](g , d ) = +∞. ax Lemma 5.49 The mapping F just defined is a l.s.c. dioid homomorphism over Min [[γ , δ ]] which is residuated, and F can be defined by (5.72) in which the inf is restricted to g ∈ (−N) and d ∈ N. 5.8.3 Eigenfunctions of Rational Transfer Functions ax We now introduce particular elements of Min [[γ , δ ]] which will be shown to play the role of sine functions in conventional system theory. Definition 5.50 For two positive integers k and t , we set def L (k , t ) = F F γ k δ t ⊕ γ −k δ −t ∗ . 5.8. Frequency Response of Event Graphs 265 It is easy to check that M (k , t ) = γ k δ t ⊕ γ − k δ − t def ∗ = γ k δt ∗ γ −k δ −t ∗ = γ k δt ∗ ⊕ γ −k δ −t ∗ . Lemma 5.51 The element L (k,t ) depends only on the ratio c = t / k > 0 (therefore it will be denoted simply L c with c > 0) and it is given explicitly by Lc = γ l δs . s ≤c ×l If one draws the line of slope c > 0 in the R2 -plane, L c is the coding of the points of Z2 which lie below this line. In other words, L c represents the best discrete approximation of this line from below. For example, with k = 3 and t = 2 hence c = 2/3 (see Figure 5.13), then ∗ ∗ L c = (e ⊕ γ 2 δ) γ 3 δ 2 γ −3δ −2 . Figure 5.13: A ‘linear’ function Proof of Lemma 5.51 We have M(k,t )(l , s ) = e if (l , s ) = (n × k , n × t ) with n ∈ Z and M(k,t )(l , s ) = ε otherwise. Then, according to (5.71), F M(k,t ) (g , d ) = sup n (gk + dt ) = n∈Z 0 +∞ if gk + dt = 0 ; otherwise. Obviously, this expression depends only on the ratio c = t / k . Finally, according to Lemma 5.49, [F F M(k,t ) ](l , s ) = = = inf g ∈(−N),d ∈N gk +dt =0 (−(gl + ds )) inf d (lc − s ) d ∈N 0 −∞ if s ≤ cl ; otherwise. 
(5.74) 266 Synchronization and Linearity The element L c describes a sequence of events which occurs at the average rate of 1/c events per unit of time. Consider a SISO event graph with transfer function H . Since H is realizable, hence periodic (see Definition 5.38), it can be written as P ⊕ Q γ k δt ∗ ( P and Q are polynomials). We will confine ourselves to the nontrivial cases when Q = ε and k > 0, t > 0. The ratio k /t (the inverse of the asymptotic ‘slope’ of the impulse response) characterizes the limit of the rate of events the system can process. If L c is used as the input of the event graph, and if 1/c exceeds this limit, then there will be an indefinite accumulation of tokens inside the system and the output is indefinitely delayed with respect to the input. Otherwise, the following theorem states that, using L c as the input produces an output γ κc δ θc L c , that is, essentially the same trajectory as the input up to shifts by κc along the x -axis (event domain) and θc along the y -axis (time domain). The theorem also shows how (κc , θc ) is related to [F ( H )](g , d ) for any (g , d ) such that g = −d × c. Theorem 5.52 Consider a SISO system with rational transfer function H = P ⊕ Q γ k δt ∗ , where P and Q are polynomials in Max[[γ , δ ]], Q is supposed to be differin ent from ε, and k and t are supposed to be strictly positive. Then, 1. for all g ≤ 0 and d ≥ 0, [F ( H )](g , d ) is different from +∞ if and only if c = −g /d ≥ t /k , (5.75) [F ( H )](g , d ) = [F ( P ⊕ Q )](g , d ) = κc g + θc d (5.76) and then for some finite nonnegative integers κc and θc ; 2. those κc and θc are not necessarily unique, but any selection yields nonincreasing functions of c; def 3. let c = t / k and assume that c satisfies (5.75); then we have H L c = γ κc δ θc L c . (5.77) Proof 1. Let R = γ k δ t ∗ . Since for the minimum representative of R , R (l , s ) = e when (l , s ) = n (k , t ) with n ∈ N, and R (l , s ) = ε otherwise, we have that [F ( R )](g , d ) = sup n g k + d t = n∈N 0 +∞ if (g , d ) satisfies (5.75); otherwise. On the other hand, for a polynomial, say P , the minimum representative involves a finite number of points, and F ( P ) is the convex hull of a finite number of linear functions. Therefore, by the homomorphism property of F , F ( H ) = F ( P ) ⊕ F ( Q ) ⊗ F ( R ), and since Q = ε, [F ( H )](g , d ) is finite if and only 5.8. Frequency Response of Event Graphs 267 if (g , d ) satisfies (5.75), and then F ( H ) = F ( P ) ⊕ F ( Q ). For any such pair (g , d ) ∈ (−N) × N, we thus have [F ( H )](g , d ) = sup (l ,s )∈ J P ∪ J Q (gl + ds ) , and the supremum is reached at some (not necessarily unique) point (κc , θc ) which clearly depends only on the ratio c = −g /d > 0. 2. By a well known result of convexity theory [119], (κc , θc ) belongs to the subdifferential of the convex function F ( H ) at the point (g , d ). Hence the mappings (g , d ) → (κc , θc ), for any choices of (κc , θc ), are monotone, that is, for two pairs (gi , di ), i = 1, 2, and any associated subgradients (κi , θi ), we have (g1 − g2 )(κ1 − κ2 ) + (d1 − d2 )(θ1 − θ2 ) ≥ 0 . Since we are only concerned with the ratios ci = −gi /di , we can either take g1 = g2 or d1 = d2 in the above inequality. This shows the monotonicity property claimed for κ and θ as functions of c. 3. Let Y = H L c with c = t / k satisfying (5.75). 
Then, for all (l , s ) ∈ Z2 , Y (l , s ) = = = sup ( H (m , r ) + L c (l − m , s − r )) (m ,r )∈Z2 sup (m ,r )∈Z2 sup H (m , r ) + inf d ((l − m )c − (s − r )) d ∈N (from (5.74)) inf ( H (m , r ) + d (−mc + r ) + d (lc − s )) . (m ,r )∈Z2 d ∈N (5.78) On one hand, by inverting the sup and the inf, we obtain a new expression which is larger than (5.78), and which turns out to be equal to inf ([F ( H )](−dc, d ) + d (lc − s )) d ∈N = = inf d ((l − κc )c + θc − s ) d ∈N L c (l − κc , s − θc ) , (5.79) the latter equality being true because of (5.74). On the other hand, if we choose the particular value (m , r ) = (κc , θc ) instead of performing the supremum, we obtain an expression which is less than (5.78), and which turns out to be identical to (5.79) (the clue here is that (κc , θc ) which realizes the maximum in the evaluation of [F ( H )](−dc, d ) does not depend on d indeed). Finally, we have proved that ∀(l , s ) ∈ Z2 , Y (l , s ) = L c (l − κc , s − θc ) , which is equivalent to (5.77). It is intuitively appealing that the ‘shifts’ κc and θc are nonincreasing with c. Indeed, recall that when c decreases, the average time between two successive events at the input decreases, hence the input is faster: the delays introduced by the system, in terms of counters or in terms of daters, are likely to increase. Moreover, there is a ‘threshold’ 268 Synchronization and Linearity effect (or a ‘low-pass’ effect) in that, above a certain speed which is defined by the asymptotic slope of the impulse response, the system, driven by an input which is too fast, ‘blows up’, and the delays become infinite. This corresponds to an unstable situation (using the same calculations as in the above proof, it can be proved in this case that Y = ). This is also similar to conventional system theory in which the sine functions are eigenfunctions only in the stable case. The difference is here that the stability property is not an intrinsic feature of the system (at least in the SISO case considered here), but it depends on the mutual speeds of the input and of the system itself. Let us conclude this section by an example. Example 5.53 Consider H = γ δ 2 ⊕ (γ 2 δ)∗ , the impulse response of which is represented at the left-hand side of Figure 5.14. This system cannot process events faster Figure 5.14: The impulse response of H = γ δ 2 ⊕ (γ 2 δ)∗ and the response to L 2/3 than 2 events per time unit. Let us study the functions κc and θc with respect to c: the subset of points with coordinates (κc , θc ) in the N2 -plane, when c varies, may be considered as the Black plot by analogy with conventional system theory in which the Black plot is the curve generated by the points (arg H ( j ω), log | H ( j ω)|) when ω varies. In this case, it is easy to see that (κc , θc ) = (0, 0) (1, 2) if 2 ≤ c < +∞ ; if 1/2 ≤ c ≤ 2 . The points belonging to the Black plot are circled in the figure. At the right-hand side of this figure, the ‘trajectory’ of L c is represented by a solid black line for c = 2/3 (see also Figure 5.13) and the response of the system to this input is indicated by a gray line. The shifts along the two axes is indicated by a curved arrow. In the dater point of view, one may say that the ‘phase shift’ is κc whereas the ‘amplification gain’ is θc . In the counter point of view (which is closer to conventional system theory since the time domain is usually represented as the x -axis), the role of κc and θc as phase shift and amplification gain are reversed. 
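The pairs (κ_c, θ_c) of Example 5.53 can also be recovered numerically: by item 1 of the proof of Theorem 5.52, it suffices to maximize gl + ds over the exponent pairs of P ⊕ Q with g = −dc, for any slope c ≥ t/k = 1/2. A small Python sketch under that reading (the function name and the explicit list of pairs are ours):

```python
# For H = γδ² ⊕ (γ²δ)*, the exponent pairs of P ⊕ Q are {(0,0), (1,2)} and (k, t) = (2, 1).
# For a slope c >= t/k = 1/2, set g = -d*c and maximize g*l + d*s over these pairs.

def kappa_theta(pairs, c, d=1):
    g = -d * c
    return max(pairs, key=lambda p: g * p[0] + d * p[1])

pairs = [(0, 0), (1, 2)]
for c in (3, 2, 1, 2/3, 1/2):
    print(c, kappa_theta(pairs, c))
# c = 3 gives (0, 0); c = 2/3 gives (1, 2): the shifts shown in Figure 5.14.
```

At c = 2 both pairs attain the maximum, which is consistent with item 2 of the theorem: the selection of (κ_c, θ_c) is not necessarily unique.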
5.9 Notes

The first papers on a new linear theory of some discrete event systems were published in early 1983 [37, 38]. They were based on the dater representation and, hence, on the max-plus algebra. The connection of these linear models with timed event graphs was established in 1984 [41]. In 1985, it was realized that specific algebraic problems arise from the fact that daters, as time sequences, are nondecreasing [42]. At about the same time, the idea of the counter representation was introduced by Caspi and Halbwachs [35], to whom this terminology is due. This finally resulted in the two-dimensional representation first published in [43]. A more detailed account of the so-called M_in^ax[[γ, δ]] algebra was given in [44], together with some of the formulæ about residuation of ⊕ and ⊗. A large part of the material of this (and the previous) chapter(s) is based on that paper, e.g. the sections on backward equations and on rationality, realizability and periodicity, although the presentation has been somewhat improved here. In particular, the role of residuation theory has been clarified in the process of establishing backward equations, and in the relationships between dater and counter representations. Only the results on 'minimum representatives' appear for the first time in this book. The idea of using the formal transfer function as a numerical function, the fact that the Fenchel transform plays a role similar to that played by the Laplace transform in conventional system theory, the parallel notion of eigenfunctions of linear transfer functions in discrete event and conventional system theories, etc., were all discovered and published in 1989 [40]. However, here, the presentation has been more tightly confined to the two-dimensional point of view.

Chapter 6
Max-Plus Linear System Theory

6.1 Introduction

In this chapter a linear system theory is developed for a class of max-plus linear systems with discrete or continuous domain and range. This class provides a generalization of the class of event graphs which have been considered in Chapter 2. We will start with an input-output point of view. We will view a max-plus linear system as a max-plus linear operator mapping an input, which is a function over some domain, to an output, which is another function over the same domain. For this max-plus linear operator, we will review the classical notions of conventional linear system theory. In particular, the notions of causality, shift-invariance, impulse response, convolution, transfer function, rationality, realization and stability will be considered.

The outline of the chapter is as follows. In §6.2 we give general definitions, we present the system algebra and we discuss some fundamental elementary systems. In §6.3 we define some subalgebras of the system algebra by progressively specializing the systems, starting with the most general ones, and finishing with causal shift-invariant systems with nondecreasing impulse responses. Most practical examples of discrete event systems fall into this last category. In the dater description, their output is the result of a sup-convolution between their input and their impulse response. In §6.4 we introduce the notion of transfer functions, which are related to impulse responses by means of the Fenchel transform. In §6.5 we discuss rationality in the max-plus context, and characterize rational elements in terms of periodicity.
We also discuss the problem of minimal realization of these max-plus systems. In §6.6 we give a definition of internally stable systems and characterize them in terms of equations which are the analogue of the conventional Lyapunov equation. In this chapter, mainly single-input single-output (SISO) linear max-plus systems are considered. 6.2 6.2.1 System Algebra Definitions Definition 6.1 (Signal) A signal u is a mapping from R into Rmax . When a signal is a nondecreasing function it is called a dater. R The signal set is Rmax . This signal set is endowed with two operations, namely 271 272 Synchronization and Linearity • pointwise maximum of signals which plays the role of addition: ∀k ∈ R , R def ∀u , v ∈ Rmax , (u ⊕ v)(k ) = u (k ) ⊕ v(k ) = max(u (k ), v(k )) ; • addition of a constant to a signal, which plays the role of the external product of a signal by a scalar: ∀k ∈ R , ∀a ∈ Rmax , R ∀u ∈ Rmax , def (au )(k ) = a ⊗ u (k ) = a + u (k ) . Therefore the set of signals is endowed with a moduloid structure. This algebraic structure is called U . In previous chapters the domain of signals was Z (event domain) and trajectories were nondecreasing. In this chapter we develop the theory in the more general framework of Definition 6.1. Definition 6.2 (Max-plus linear system) A system is an operator S : U → U , u → y. The signal u (respectively y) is called the input (respectively output) of the system. We say that the system is max-plus linear when the corresponding operator satisfies S ui = S (u i ) , i∈I (6.1) i∈I for any finite or infinite set {u i }i ∈ I , and S (au ) = a S (u ) , ∀a ∈ Rmax , ∀u ∈ U . Remark 6.3 Equation (6.1) is indeed the requirement of lower-semicontinuity of S and not only the requirement that S is an ⊕-morphism. Here is an example of a system which is an ⊕-morphism but which fails to be l.s.c.: [ S (u )](t ) = lim sup u (s ) . s →t This system is clearly an ⊕-morphism, but to show that it is not l.s.c., consider 0 if k ≤ 0 ; 1 ∀n ≥ 1, u n (k ) = n × k if 0 < k < n ; 1 if 1 ≤ k . n For all n ≥ 1, we have [ S (u n )](0) = 0, and u n (t ) = n ≥1 This yields S n un 0 1 if t ≤ 0 ; otherwise. (0) = 1, which is different from n S (u n ) (0) = 0 . The set of linear systems is endowed with two internal and one external operations, namely 6.2. System Algebra 273 parallel composition: S = S1 ⊕ S2 is defined as follows: [ S (u )](k ) = [ S1(u )](k ) ⊕ [ S2(u )](k ) ; (6.2) series composition: S = S1 ⊗ S2, 1 or more briefly, S1 S2 is defined as follows: [ S (u )](k ) = [ S1( S2 (u ))](k ) ; (6.3) amplification: T = a ⊗ S , a ∈ Rmax is defined by: T (k ) = a ⊗ S (k ) . In addition to these basic operations, we have another important one, the feedback: feedback: S ∗ defined by the mapping from U into U : u → y , where y (see Figure 6.1) is the least solution of y = S ( y) ⊕ u . (6.4) The notation S ∗ is justified by the fact that the least solution of (6.4) does exist by Theorem 4.75 and it is given by u ⊕ S (u ) ⊕ S ( S (u )) ⊕ · · · . u ⊕ y S Figure 6.1: The feedback operation y = S ∗ u The set of systems endowed with the first three operations defines an idempotent algebra called the algebra of systems. We do not lose anything by considering only the two internal operations because the amplification can be realized by a series composition where the downstream system is what we call a gain in the next section. Whenever we speak of the set of systems endowed with the two internal operations, we refer to it as the ‘dioid of systems’. 
The second operation is not invertible, and therefore this set is not an idempotent semifield (see Chapter 3), it is only a dioid. 6.2.2 Some Elementary Systems We have discussed how we can combine systems using compositions and feedbacks. Here, we describe some elementary though fundamental systems with which more complex systems can be built up. We first introduce the following notation. 1 We make the usual abuse of notation which consists in using the same symbol for external multiplication by a scalar and for internal multiplication of systems. This will be justified later on. 274 Synchronization and Linearity b Notation 6.4 For f : R → Rmax , sa f (s ) denotes the supremum of f (s ) when s ranges in the interval [a , b ] (or e.g. (a , b ] if a = −∞). We may also use the notation s f (s ) R if a = −∞ and b = +∞. The following elementary systems are now introduced. Zero system ε: this system produces the constant output ε whatever the input is: y (k ) = ε for all k . It satisfies ε ⊕ ε = ε ⊗ ε = ε∗ = ε . Identity e: this system produces an output equal to the input y (k ) = u (k ) for all k . It satisfies e ⊕ e = e ⊗ e = e∗ = e . Shift : this system maps inputs to outputs according to the equation y (k ) = u (k − g ) for all k . The notation g is justified by the following rule of series composition which should be obvious to the reader: g ⊗ g g = g+g = g ⊗g . Therefore 1 may be denoted . If we restrict ourselves to signals that are nondecreasing signals (see the discussion just above Remark 5.1), we have the simplification rule g ⊕ g = min( g , g ) . (6.5) In the context of event graphs, an initial stock of c tokens in a place introduces such a shift between inputs and outputs in the domain where we ‘count’ events. In the framework of the continuous system depicted in Figure 1.13, the same role is played by the initial amount of fluid in the reservoir at the outlet. Note however that, in that example, equations were written in a counter, rather than in a dater, representation, and consequently, this device operated as a gain rather than as a shift. Gain d : this system maps inputs to outputs according to the equation y (k ) = d ⊗ u (k ) = d + u (k ) for all k . Again, the notation d is justified by the following rule of series composition: d ⊗ d Therefore 1 may be denoted holds true for any input signal) d ⊕ = d +d = d ⊗d . . We also have the simplification rule (which d = d ⊕d = max(d ,d ) . In the context of timed event graphs, this is the general input-output relation induced by a place with holding time d . For the system of Figure 1.13, this input-output relation is that of along pipe at the inlet of a funnel. 6.2. System Algebra Flow limiter a: 275 this system maps inputs to outputs according to the relation y (k ) = k s −∞ u (s )a k−s . (6.6) Unlike c and d , we use the notation a with a as a subscript, because a does not behave like an exponent. Indeed, the following parallel, series and feedback composition rules can be checked by direct calculation: = a ⊗ > e and hence a =( a Moreover, a ⊕ a a = max(a ,a ) = a ⊕a . (6.7) ∗ a) . Physically, this system corresponds to the input-output relation between the cumulated quantities traversing a pipe which limits the flow to 1/a (of course, here a is a positive number). This is the case of the aperture of the funnel in Figure 1.13 (recall that this example is worked out using counter rather than dater equations). 
This system plays the role of the SISO system governed by the differential equation y = ay + u ˙ in conventional system theory, the solution of which is t y (t ) = −∞ u (s ) exp (a (t − s )) ds , which is the analogue of (6.6). Integrator e: this is another notation for 0 . It maps inputs to outputs according to ´ k the equation y (k ) = s u (s ). The output of such a system is always nonde−∞ creasing. It plays the role of an identity element for shift-invariant systems with nondecreasing impulse responses as we shall see later on. This role justifies the notation e. It satisfies ´ e ⊕ e = e ⊗ e = (e)∗ = e . ´´´´ ´ ´ Local integrator w : this system maps inputs to outputs according to the relation k y (k ) = s u (s ). It is the analogue of a conventional system recursively avk −w eraging the input in a window of width w . The following series, parallel and feedback compositions of local integrators can be easily checked: w w ( ⊗ w∗ w = w⊕w = w +w = ∞ =e , ´ ⊕ w )= , w⊗w , ∀w > 0 . 276 6.3 Synchronization and Linearity Impulse Responses of Linear Systems In this section we introduce the notion of impulse response for a max-plus linear system. The algebra of impulse responses is isomorphic to the algebra of systems. The former algebra is first specialized to the case of shift-invariant systems and subsequently to the case of systems with nondecreasing impulse responses. 6.3.1 The Algebra of Impulse Responses We saw that the set of systems can be endowed with a moduloid structure. The next step is to introduce a kind of ‘canonical basis’ for this algebraic structure. Classically, for time functions, this basis is provided by the Dirac function at 0, and all its shifted versions at other time instants. Therefore, we now introduce def e(·) : k → e(k ) = e ε if k = 0 ; otherwise, (6.8) and def γ s (·) = s (e(·)) γ s (k ) = e(k − s ) , i.e. ∀k . (6.9) The justification of the notation e(·) will come from the fact that this particular signal is the identity element for sup-convolution which will be the internal multiplication in the system set. Indeed, it can be checked by direct calculation that ∀u , ∀k , u (k ) = s R u (s )e(k − s ) . (6.10) In view of (6.9), this can be rewritten u= s R u (s )γ s , (6.11) which shows that u is obtained as a linear combination of the signals γ s . This is the decomposition of signals with respect to the canonical basis. This decomposition is unique since, if there exists another function v : R → Rmax such that u = s v(s )γ s , R we conclude that v(s ) = u (s ), ∀s , because of Identity (6.10) applied to the function v . Now we can state the following theorem which introduces the notion of impulse response. Theorem 6.5 Let S be a linear system, then there exists a unique function h (k , s ) (called the impulse response) such that y = S (u ) can be obtained by ∀k , y (k ) = sup[h (k , s ) + u (s )] = for all input-output pairs (u,y). s ∈R s R h (k , s )u (s ) , (6.12) 6.3. Impulse Responses of Linear Systems 277 Proof We have y (k ) = [ S (u )](k ) = S s R (k ) , u (s )γ s which, owing to the linearity assumption, implies y (k ) = s R S (γ s ) (k ) u (s ) = s R h (k , s )u (s ) , def where we have set h (k , s ) = [ S (γ s )](k ). To prove uniqueness, suppose that there exists another function f (·, ·) which satisfies (6.12). Then using inputs u = γ s , we obtain h (k , s ) def = = = [ S (γ s )](k ) τ R f (k , τ )γ s (τ ) f (k , s ) , for all s , k ∈ R, where the last equality is (6.10) applied to the function f (k , ·). 
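On a finite grid of indices, the kernel action (6.12) becomes a finite supremum and can be computed directly. A minimal sketch under this finiteness assumption (the book works over all of R; the helper names are ours):

```python
# Sketch on a finite grid (the book works over all of R): equation (6.12),
#   y(k) = sup_s [ h(k, s) + u(s) ],
# computed as a max-plus kernel action; epsilon is float('-inf').

NEG_INF = float('-inf')

def apply_kernel(h, u, grid):
    return {k: max((h(k, s) + u[s] for s in grid), default=NEG_INF) for k in grid}

# Kernel of a gain of 2 composed with a shift of 1: h(k, s) = 2 if k = s + 1, epsilon otherwise.
def h(k, s):
    return 2 if k == s + 1 else NEG_INF

grid = range(6)
u = {s: float(s) for s in grid}        # an arbitrary input signal
print(apply_kernel(h, u, grid))        # y(k) = u(k-1) + 2 for k >= 1, epsilon at k = 0
```

The kernel used here combines a gain of 2 with a shift of 1, two of the elementary systems of §6.2.2.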
To the series, parallel, amplification and feedback compositions of systems correspond operations on the impulse responses. Theorem 6.6 Given a ∈ Rmax and the systems S, S1 and S2 with respective impulse responses h, h 1 and h 2 , then, def • the impulse response of S1 ⊕ S2 is [h 1 ⊕ h 2 ](k , s ) = h 1 (k , s ) ⊕ h 2 (k , s ); def • the impulse response of S1 ⊗ S2 is [h 1 ⊗ h 2 ](k , s ) = r R h 1 (k , r )h 2 (r, s ); def • the impulse response of a S is [ah ](k , s ) = ah (k , s ); def • the impulse response of S ∗ is h ∗ = i ∈N hi . The set of impulse responses endowed with the first three operations (respectively the first two operations) defines an idempotent algebra (respectively a dioid), called the algebra (respectively the dioid) of impulse responses which is denoted H. Impulse responses are representations of systems written in a canonical basis, just like matrices are finite dimensional linear operators written in a particular basis. Definition 6.7 A linear system S is causal if, for all inputs u 1 and u 2 with corresponding outputs y1 and y2 , ∀s , ∀k ≤ s , u 1 (k ) = u 2 (k ) ⇒ y1 (s ) = y2 (s ) . Theorem 6.8 A system S is causal if its impulse response h (k , s ) equals ε for k ≤ s. 278 Synchronization and Linearity Proof If S is causal, S (γ s ) = h (k , s ) coincides with S (ε) = ε for k ≤ s . Remark 6.9 The impulse response h of a series composition of two causal systems of impulse responses h 1 and h 2 has the simplified form k h (k , s ) = r h 1 (k , r )h 2 (r, s ) . s 6.3.2 Shift-Invariant Systems Let us specialize the algebra H to shift-invariant systems. Definition 6.10 A linear system S is called shift-invariant if it commutes with all shift operators, that is, if ∀u , ∀c , S( c (u )) = c ( S (u )) . Theorem 6.11 A system S is shift-invariant if and only if its impulse response h (k , s ) depends only on the difference k − s. With the usual abuse of notation, the impulse response is denoted h (k − s ) in this case. It is equal to h (·) = [ S (e)](·). Proof We have def h (k , s ) = [ S (γ s )](k ) = [ S ( s (e))](k ) = [ s ( S (e))](k ) = [ S (e)](k − s ) . Consequently, in the shift-invariant case, the kernel defining the impulse response is reduced to a function. The input-output relation can be expressed as follows: def y (k ) = (h ⊗ u )(k ) = s R h (k − s )u (s ) . This new operation, also denoted ⊗, is nothing but the sup-convolution which plays the role of the convolution in conventional system theory. We also note that the series composition corresponds to the sup-convolution of the corresponding impulse responses. Definition 6.12 The algebra of shift-invariant impulse responses, denoted S , is the set R Rmax endowed with: • the pointwise maximum of functions denoted ⊕; • the sup-convolution denoted ⊗; • the external operation which consists in adding a constant to the function. The zero element denoted ε(·) is defined by ε(k ) = ε, ∀k . It is absorbing for multiplication. The identity element denoted e(·) is described by (6.8). 6.3. Impulse Responses of Linear Systems 279 Remark 6.13 1. This idempotent algebra S can simply be considered as a dioid. 2. Because signals and SISO systems can be represented by functions, we do not have to distinguish them. 3. Impulse responses of shift-invariant causal systems satisfy h (k ) = ε for k < 0. Example 6.14 The elementary systems introduced in §6.2.2 are shift-invariant linear systems. Their impulse responses are given later in Table 6.2. 
Notice that γ 0 = δ 0 = φε = e if φε denotes the pointwise limit of φa when a goes to −∞. 6.3.3 Systems with Nondecreasing Impulse Response In the context of event graphs, input signals have the meaning of sequences of dates at which successive events occur, and therefore they are nondecreasing functions. In the case of the continuous system depicted in Figure 1.13, the input and the output are also nondecreasing. A nondecreasing signal u can be characterized by the inequality u ≥ w u for any arbitrary positive w . From an algebraic point of view, this situation is identical to that described by Inequality (5.11) if w plays the role earlier played by γ , now that the def domain is continuous. Hence from Theorem 5.8, we know that v = ( w )∗ v is the ´ best approximation from above of a signal v in the subset of nondecreasing signals. Recall that ( w )∗ = e (see end of §6.2.2). In particular, a nondecreasing function u is ´ characterized by u = u = eu . ´ ´ Consider a system with impulse response h . Then, if only nondecreasing inputs u are considered, the outputs are also nondecreasing as shown by the following equalities: y = h ⊗ u = h ⊗ e ⊗ u = e ⊗ (h ⊗ u ) = e ⊗ y . ´ ´ ´ We also notice that, for this class of nondecreasing inputs, the systems with impulse ´ ´ responses h and h = e ⊗ h yield the same outputs. This h is called the ‘nondecreasing ´ version’ of the impulse response h . The subset of nondecreasing signals and impulse ´ responses, denoted S , is a dioid with the same addition and multiplication as S , but the identity element is e . ´ The following are the nondecreasing versions of the impulse responses of some elementary systems encountered earlier, the nonmonotonic versions of which are given in Table 6.2 below: e(k ) = ´ e ε if k ≥ 0 ; otherwise; γ´c (k ) = e ε if k ≥ c ; otherwise; d δ´d (k ) = ε if k ≥ 0 ; otherwise. 280 6.4 6.4.1 Synchronization and Linearity Transfer Functions Evaluation Homomorphism In this section we discuss the notion of transfer functions associated with shift-invariant max-plus linear systems. Transfer functions are related to impulse responses by a transformation which plays the role of the Fourier or Laplace transform in conventional system theory, and which, in our case, is similar to the Fenchel transform of convex analysis. We saw that signals and impulse responses are functions belonging to the same idempotent algebra and that, in the canonical basis, they can be written f= s R f (s )γ s . We associate a transfer function g , which will be a mapping from Rmax into Rmax , with such an impulse response viewed as a generalization of a formal polynomial introduced in Chapter 3. The value at a point of this latter function is obtained by substituting a numerical variable in Rmax for γ in the expression of f . The resulting expression is evaluated using the calculation rules of Rmax . This substitution of a numerical value for the generator should be compared with what one does in conventional system theory when substituting numerical values in C for the formal operator of the derivative (denoted s ) in continuous time, or the shift operator (denoted z ) in discrete time. To formalize this notion, we introduce the idempotent algebra of convex functions. Recall that a closed convex function is a function which is 1. l.s.c. in the conventional sense, that is, it satisfies limxn →x f (x n ) ≥ f (x ); 2. convex; 3. proper, that is, nowhere equal to −∞; or a function which is always equal to −∞. 
It is exactly the set of the upper hulls of collections of affine functions [119, Theorem 12.1]. Definition 6.15 The set of closed convex functions endowed with the pointwise maximum denoted ⊕, the pointwise addition denoted ⊗ and the addition of a scalar as external operation, is called the algebra of convex functions and is denoted Ccx . Once more, there is no loss of generality in considering the dioid of convex functions endowed with two internal operations only. Indeed, the product by a scalar or the pointwise product by a constant function gives the same result. Definition 6.16 For f = s R f (s )γ s ∈ S , let g : R → Rmax , c→ s R f (s ) ⊗ cs . (6.13) 6.4. Transfer Functions 281 Then g is called the numerical transfer function2 associated with f . The transform F which maps f to g is called the evaluation homomorphism (as will be justified by the forthcoming Theorem 6.17). Five different complete and commutative idempotent algebras and dioids have been considered, and consequently five different meanings of ⊕ and ⊗ have been used. As usual, the context should indicate which one is meant according to the nature of elements on which these binary operations operate. Table 6.1 recalls the meaning of these operations. The application of the evaluation homomorphism to the impulse Table 6.1: Five dioids Dioid ⊕ ⊗ ε Rmax max + −∞ e 0 Dioid ⊕ ⊗ ε e H S pointwise max max-plus kernel product sup-convolution ε(k , l ) = −∞ , ∀k , l ε(k ) = −∞ , ∀k 0 ifk=l 0 if k = 0 e(k , l ) = e(k ) = −∞ otherwise −∞ otherwise ´´ S = e⊗S Ccx pointwise max sup-convolution pointwise addition ε(k ) = −∞ , ∀k ε(c) = −∞ , ∀c 0 if k ≥ 0 e(k ) = ´ e(c) = 0 , ∀c −∞ otherwise responses of the elementary systems is given in Table 6.2. Theorem 6.17 The evaluation homomorphism F is a l.s.c. (in the sense of Definition 4.43) epimorphism (surjective homomorphism) from S onto Ccx . Proof The homomorphism and l.s.c. properties are true by construction. Indeed, F ( f ⊕ f ) = F ( f ) ⊕ F ( f ), and its extension to infinite sums is true by commutativity of the sup operation. Finally F ( f ⊗ f ) = F ( f ) ⊗ F ( f ) is true by definition of the ⊗ operation in S . Surjectivity arises from the fact that F (S ) is the set of the upper hulls of families of affine functions which coincides with the set of closed convex functions. Remark 6.18 1. Clearly F is not injective, for example 2 c ⊕ c2 = F γ ⊕ γ 2 (c) = F s γs (c) . 1 2 In this chapter we will call it simply a transfer function because the notion of formal transfer is not used. 282 Synchronization and Linearity Table 6.2: Impulse responses and transfer functions of the elementary systems System ε e Impulse response ε(k ) = ε , e ε e(k ) = Transfer function ∀k [F (ε)](c) = ε , ∀c if k = e otherwise [F (e)](c) = e , ∀c g γ g (k ) = e ε if k = g otherwise [F (γ s )](c) = g c , ∀c d δ d (k ) = d ε if k = 0 otherwise F δ d (c) = d , ∀c a φa (k ) = ak ε if k ≥ 0 otherwise e(k ) = ´ e ε e ´ w e ε ς w (k ) = [F (φa )](c) = if w ≥ k ≥ 0 otherwise if c ≤ −a otherwise F e (c) = ´ otherwise e e if c ≤ 0 otherwise [F ( w )](c) = e wc if c ≤ 0 otherwise 2. The convex function g (c) = ε if c = 0 , otherwise, is not closed. Neither is it the upper hull of a set of affine functions. Indeed, each affine function would be below g and therefore would be equal to ε everywhere. Nevertheless this function is l.s.c. because the subsets {c | g (c) ≤ a }, which are equal to {0} for all a ∈ R, are closed. 3. By returning to conventional notation, F can be interpreted in terms of the Fenchel transform. 
More precisely we have [F ( f )](c) = sup[kc + f (k )] = [Fe (− f )](c) , (6.14) k def where [Fe ( f )](c) = supk (kc − f (k )) denotes the classical Fenchel transform of convex analysis [58]. Recalling that the Fenchel transform converts infconvolutions into pointwise (conventional) additions, we see that the choice of multiplication in Ccx is consistent with this property of the Fenchel transform. 6.4.2 Closed Concave Impulse Responses and Inputs It is well known that the Fenchel transform only characterizes closed convex functions; or otherwise stated, all functions having the same convex hull have the same Fenchel 6.4. Transfer Functions 283 transform. Rephrasing this result in terms of the evaluation homomorphism, we obtain that only the closed concave impulse responses are completely characterized by their transfer functions. For this subclass, the evaluation homomorphism is a tool as powerful as the Laplace transform in conventional system theory. Theorem 6.19 For g ∈ Ccx , the subset F −1 (g ) admits a maximum element F (g ) defined by def F (g ) (k ) = ◦ g (x )/ck = inf [g (c) − ck ] c c (6.15) (where the latter expression in conventional notation requires the convention about ∞ − ∞ discussed in Example 4.65). Moreover, F (g ) is the concave upper hull of any other element of F −1 (g ). Proof From the preceding considerations, we see that all the assumptions required in Theorem 4.50 are fulfilled. Then (6.15) is a straightforward extension of (3.11) to the continuous domain case. Definition 6.20 The subset Scv of S consists of closed concave functions, that is, the functions which are concave, upper-semicontinuous (u.s.c. in the conventional sense) and either nowhere equal to or always equal to . Remark 6.21 The set Scv is closed for multiplication (sup-convolutions of concave u.s.c. functions yield concave u.s.c. functions), but not for addition (the upper hull of concave functions is not in general a concave function). It is closed for pointwise infimum. Therefore, this subset is not a subdioid of S . The next theorem tells us that the computation of the sup-convolution of two concave functions is equivalent to a pointwise addition and three Fenchel transforms. Knowing that there exists a fast Fenchel transform which is the analogue of the fast Fourier transform [31], this formula gives an efficient algorithm to compute sup-convolutions. Theorem 6.22 We have the formula ∀ f , g ∈ Scv , h = f ⊗ g = F (F ( f ) ⊗ F (g )) , which, in conventional notation, means h (k ) = sup [ f (x ) + g ( y )] = F (F ( f ) + F (g )) . x + y =k Proof Equation (6.15) shows that Scv equals F (Ccx ), since lower hulls of families of affine functions are closed concave functions. Therefore, ∀ f ∈ Scv , F ◦F ( f ) = f . Then, using the closedness of Scv and the homomorphism property of F , we have f ⊗ g = F ◦F ( f ⊗ g ) = F (F ( f ) ⊗ F (g )) . 284 6.4.3 Synchronization and Linearity Closed Convex Inputs In conventional system theory any L 2 function can be decomposed with respect to the basis of sine functions. In the present situation, any closed convex function can be decomposed with respect to conventional linear functions: def y = lc (x ) = c × x = x c , which may be considered as the max-plus exponentials (the last expression is in maxplus notation). This decomposition can be used to compute the outputs of a shiftinvariant max-plus system driven by convex inputs. 
Indeed, the max-plus exponentials are eigenvectors for any shift-invariant max-plus linear system in the same way as the sine functions are eigenvectors for any shift-invariant linear system in conventional linear system theory. Definition 6.23 The subset Scx of S consists of closed convex functions. The canonical injection of Scx into Ccx is denoted I . Remark 6.24 1. The difference between Scx and Ccx is the ⊗ operation (see Table 6.1). 2. Unlike Scv which is closed for multiplication but not for addition, Scx is closed for addition and multiplication. But in general the multiplication of two convex functions is equal to . The only exception is the product of two affine functions with the same slope. 3. The identity element e(·) is not convex. Therefore, Scx is not a subdioid of S either. 4. The intersection of Scv and Scx is the subset of weighted exponentials in the max-plus sense (b ⊗ lc (·)) or affine functions in the conventional sense. In the max-plus framework, the decomposition of closed convex functions tells us that these functions are integrals of weighted exponentials. Moreover, the corresponding weights are explicitly given by the Fenchel transform. Theorem 6.25 For all f ∈ Scx , we have f = I −1 ◦F ◦F ◦I ( f ) , which can be written ∀k , f (k ) = c R ck F ◦I ( f ) (c) = to emphasize the exponential decomposition. c R k c F ◦I ( f ) (c) , (6.16) 6.4. Transfer Functions 285 Proof A function f ∈ Scx may be viewed as a transfer function in Ccx because it is a closed convex function. Therefore, I ( f ) equals f but considered as an element of Ccx . Because I ( f ) ∈ Ccx , we can solve F (g ) = I ( f ) for g . But we have an explicit formula for g , namely g = F (I ( f )). Then using the fact that F ◦F = ICcx , we have proved the result. Let us show now that the max-plus exponentials (conventional linear functions) lc are eigenvectors for the operator defined as the sup-convolution with a given impulse response. Theorem 6.26 For all impulse responses h ∈ S and all scalars c, we have h ⊗ lc = [F (h )](−c) lc . (6.17) Therefore [F (h )](−c) is the eigenvalue (called the gain of h for the exponential lc ) associated with the eigenvector lc of the operator g → h ⊗ g. Proof The proof is the same as in conventional algebra: [h ⊗ lc ](k ) = s R ck−s h (s ) = ck s R c−s h (s ) = ck [F (h )](−c) . We may use this property of the exponentials to compute the output of a shift-invariant system driven by a convex input. Theorem 6.27 We have ∀ f ∈ Scx , ∀h ∈ S , h⊗ f = c R F ◦I ( f ) (c) [F (h )](−c) lc , where F ◦I ( f ) (c) is the weight of the exponential lc in the spectral decomposition of f and [F (h )](−c) is the gain of h for the same exponential. Proof Using the distributivity of ⊗ (sup-convolution in S ) with respect to , we have h⊗ f = h⊗ c R F ◦I ( f ) (c) lc = c F ◦I ( f ) (c) (h ⊗ lc ) = c F ◦I ( f ) (c) [F (h )](−c) lc R R by (6.16), by linearity, by (6.17), and the function h ⊗ f also belongs to Scx . In conclusion, we have encountered two situations of special interest to compute the response of a system to an input 286 Synchronization and Linearity • if the input and the impulse response are concave, we can use the evaluation homomorphism to transform this inf-convolution into a pointwise conventional sum; • if the input is convex, we can first decompose it as a sum of exponential functions (in the dioid sense), and then, using the linearity of the system, we can sum up the responses to these exponentials inputs. 
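As a numerical illustration of the eigenfunction property (6.17), one can check on a truncated domain that sup-convolving an impulse response with the exponential l_c reproduces l_c shifted by the gain [F(h)](−c). A sketch with helper names of our own, using the flow limiter φ_a with a = 1 and a slope c = 3 ≥ a so that the gain is finite:

```python
# Check (6.17) on a truncated domain: (h ⊗ l_c)(k) = [F(h)](-c) + c*k,
# where l_c(x) = c*x and [F(h)](-c) = sup_s (h(s) - c*s) is the gain of h for l_c.

NEG_INF = float('-inf')
S = range(-50, 51)                                 # truncation of the domain R

h = lambda s: float(s) if s >= 0 else NEG_INF      # flow limiter phi_a with a = 1
c = 3.0
l_c = lambda x: c * x

gain = max(h(s) - c * s for s in S)                # [F(h)](-c); here the sup is 0, at s = 0

def sup_conv(k):
    return max(h(s) + l_c(k - s) for s in S)       # (h ⊗ l_c)(k)

for k in (0, 1, 5):
    print(sup_conv(k), gain + c * k)               # the two columns agree: 0.0, 3.0, 15.0
```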
6.5 Rational Systems The set S is a nice algebraic structure but its elements are functions and therefore cannot be coded by finite sets of numbers in general. It is useful to consider subsets of these functions which can be coded in a finite way. The algebraic functions would constitute such a set. Those functions are described as the solutions of a polynomial systems of equations in S . But even in classical system theory the study of these systems is in its infancy. Therefore we restrict ourselves to a simpler situation. We only consider systems which can be described by a finite set of special linear equations y = hy ⊕ u . These equations describe the input-output relation of systems obtained by series, parallel and feedback compositions of elementary systems for which the impulse responses are explicitly known. Such systems are called rational. Clearly this notion of rationality depends on the elementary systems considered. Rational systems can be described in terms of the star operation ( y = h ∗ u ). This story is not specific to max-plus algebra, but the rationals of these max-plus algebras have simple characterizations in terms of their periodic asymptotic behavior which is similar to the periodicity property of the decimal expansion of rational numbers. The aim of this section is to characterize max-plus rational systems by their asymptotic behavior. 6.5.1 Polynomial, Rational and Algebraic Systems Let us consider 1. a subset K of S which also has a structure of idempotent algebra but not necessarily the same identity element as S (for example, nondecreasing functions define an algebra with e as the identity element—see §6.3.3); ´ 2. a finite set α = {α1 , . . . , α } of elements of S . Let us define five subsets of S which may be considered as extensions of K and which have a structure of idempotent algebra: polynomial or dioid closure K [α ] of K ∪ α : its elements are obtained by combining the elements of K ∪ α using a finite number of ⊕ and ⊗ operations; rational closure K (α) of K ∪ α : its elements are obtained by combining the elements of K ∪ α using a finite number of ⊕, ⊗ and ∗ operations; 6.5. Rational Systems 287 algebraic closure K {α } of K ∪ α : its elements are obtained by combining the elements of K ∪ α using a finite number of ⊕ and ⊗ operations and by solving polynomial equations with coefficients in K ∪ α ; series closure K [[α ]] of K ∪ α : its elements are obtained by combining the elements of K ∪ α using a countable number of ⊕ and ⊗ operations: this is the completion of the polynomial closure; topological closure K { α } of K ∪ α : its elements are obtained by combining the elements of K ∪ α using an infinite number of ⊕ and ⊗ operations and by solving polynomial equations: this is the completion of the algebraic closure. Note that S is a subset of Rmax { γ }. In general we have K ⊂ K [α ] ⊂ K (α) ⊂ K [[α ]] ⊂ K {α } . K {α } For example, consider K = B = {ε, e} and α = {δ }, where δ is the impulse response mentioned in Table 6.2. Recall that in S , δ d1 ⊕ δ d2 = δ max(d1 ,d2) . In this particular case we have B [δ ] Nmax ⊂ B(δ) = B[[δ ]] Nmax ⊂ B{δ } Qmax ⊂ B{ δ } Rmax , where Amax for a set A means A ∪ {ε} endowed with the max and the + operations, A is Amax ∪ {+∞}, and the isomorphisms above identify scalars d with impulse responses δd . Remark 6.28 Observe that in B[δ ] it is difficult to speak of the notion of valuation. 
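Returning to the defining equation y = hy ⊕ u, the least solution y = h*u can be computed on a finite horizon by iterating the fixed-point map, exactly as in the feedback construction of §6.2.1. A sketch assuming a causal impulse response given by its values on the horizon (all names are ours):

```python
# Solve y = h ⊗ y ⊕ u on the finite horizon {0, ..., T-1}: the least solution is y = h* u.
# Signals and impulse responses are lists indexed by k; epsilon is float('-inf').

NEG_INF = float('-inf')

def oplus(f, g):
    return [max(a, b) for a, b in zip(f, g)]

def otimes(h, u):                       # causal sup-convolution on the horizon
    T = len(u)
    return [max((h[s] + u[k - s] for s in range(k + 1)), default=NEG_INF)
            for k in range(T)]

def star_apply(h, u):                   # iterate y <- h ⊗ y ⊕ u up to the fixed point
    y = list(u)
    while True:
        y_new = oplus(otimes(h, y), u)
        if y_new == y:
            return y
        y = y_new

T = 8
h = [NEG_INF, 2.0] + [NEG_INF] * (T - 2)    # gain of 2 composed with a shift of 1: h(1) = 2
u = [0.0] * T                               # input u(k) = e for all k
print(star_apply(h, u))                     # [0.0, 2.0, 4.0, ..., 14.0]
```

With this h, the computation reproduces the star of a single elementary shift-and-gain composition applied to u, the simplest instance of a rational system built from one ⊕, one ⊗ and one * operation.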
For example, the valuation of δ d1 ⊕ δ d2 would formally be equal to min(d1 , d2 ); but at the same time δ d1 ⊕ δ d2 is equal to δ max(d1 ,d2 ); the latter is a monomial the valuation of which is thus the same as the degree, namely max(d1 , d2 ). ´´ Similarly, consider B[γ ] which is equal to e ⊗ B[γ ], and observe that the notion ´ of degree is equally difficult to define. Indeed, owing to (6.5) which holds true for nondecreasing signals, γ g1 ⊕ γ g2 is equal to γ min(g1, g2) , whereas it would formally have ´ ´ ´ a degree equal to max (g1 , g2). ´´´ The same difficulties arise in other polynomial (or dioid) closures such as B[γ , δ] = ax e B[γ , δ ]. This dioid is isomorphic to the polynomial subdioid of Min [[γ , δ ]] (‘polyno´ ´´´ mial’ in the sense of Definition 5.19). Any element in B[γ , δ] can be represented, in a nonunique way, as the product of e by an element of B[γ , δ ]: the latter may be called a ´ ‘representative’ of the former. It is thus possible to speak of the valuations and degrees ´´´ in γ and δ of such a representative in B[γ , δ ] of an element of B[γ , δ]. However, these notions can be given an intrinsic meaning only if we restrict ourselves to the ‘minimum representative’ which exists for polynomials (see Theorem 5.20). 6.5.2 Examples of Polynomial Systems Table 6.3 gives the main examples of polynomial closures of Rmax or of B used in this book. They are obtained from the set of scalars (identified with impulse responses δ d 288 Synchronization and Linearity as explained earlier) augmented with the impulse responses of some of the elementary systems encountered previously. We set φ = {φc1 , . . . , φc }, where ci is defined in Table 6.2. In the following, the ci are assumed to be positive and therefore φci is nondecreasing. Table 6.3: Polynomial extensions. K α R h ∈ K [α ] ⊂ Rmax Rmax {γ } h (k ) = ε only for k ∈ N. {γ } ´ h (k ) = ε for k ∈ R− ; over R+ , h is nondecreasing and piecewise constant with a finite number of discontinuities at integer abscissæ. B{ δ } ´ Rmax def = eRmax ´ ´ Rmax φ = {φc1 , . . . , φc } {γ , δ} ´´ h (k ) = ε for k ∈ R− ; over R+ , h is nondecreasing, piecewise constant and integer-valued with a finite number of discontinuities at integer abscissæ. {γ , δ} ∪ φ ´´ h (k ) = ε for k ∈ R− ; over R+ , h is convex, nondecreasing, piecewise linear with slopes in {c1 , . . . , c } and with a finite number of discontinuities at integer abscissæ. ´ B def = e{ε, e} ´ ´ B 6.5.3 h (k ) = ε for k ∈ R− ; over R+ , h is convex, nondecreasing and piecewise linear with slopes in {c 1 , . . . , c }. Characterization of Rational Systems A characterization of elements of K (α) is given under the assumption that K is rationally closed (see Definition 4.99); this is the representation problem. In the present 6.5. Rational Systems 289 context, we defined a rational element as an element h obtained by a finite number of ⊕, ⊗ and ∗ operations applied to elements of K ∪ α . The following result shows that only one ∗ operation is needed with respect to the elements in α . Consequently, it is easy to obtain a linear system which admits h as its impulse response. Theorem 6.29 (Representation of rational impulse responses) We assume that K is rationally closed (for example, K is a complete dioid). Then, for all h ∈ K (α), there exist n ∈ N, B , C ∈ K n and Ai ∈ K n×n , i = 1, . . . , , such that ∗ αi Ai h=C B. (6.18) i =1 Proof Let us refer to Theorem 4.105 and set B = K , C = K , U = K and V = α . 
Since K is supposed to be rationally closed, then U = K , U ⊗ B = K and U ⊗ V consists of linear combinations of elements of α with coefficients in K . These observations lead to (6.18). Example 6.30 Let us consider the element h of Rmax (γ ) defined by h = ((1γ 3 )∗ ⊕ (γ 2 )∗ )∗ . Observe that Rmax is a complete dioid. Using (4.109) and the fact that (a ∗ )∗ = a ∗ , we have h = ((1γ 3 )∗ )∗ ((γ 2 )∗ )∗ = (1γ 3 )∗ (γ 2 )∗ = (1γ 3 ⊕ γ 2 )∗ for which we obtain the realization x2 = γ x1 , x3 = γ x2 , x 1 = 1γ x 3 ⊕ γ x 2 ⊕ u , y = x1 . In the case of nondecreasing impulse responses, the form of the rational functions may be explicited by specializing Theorem 6.29. ´ Corollary 6.31 Every h ∈ Rmax (φ) can be written h i φc i , h= h i ∈ Rmax , i = 1, . . . , . i =1 ´ Proof Using Theorem 6.29 with K = Rmax (this is a complete, hence rationally closed, dioid) and α = φ , we can write ∗ a i φc i h=c b, i =1 ´ where the entries of b, c and ai belong to Rmax . By expanding the ∗ expression and by using the simplification rules given in (6.7), we obtain the form claimed in the statement ´ ´ of the corollary, but with coefficients h of the φ belonging to R . As such, they can i ci max ´ be written h i = eh i for some h i ∈ Rmax . On the other hand, recall that we assumed ´ ci > 0 for all i , which implies that the φci are nondecreasing. Hence eφci = φci . This ´ observation allows us to adopt the h i as the coefficients of the φci . 290 Synchronization and Linearity ´´´ Theorem 6.32 Every h ∈ B(γ , δ) can be written h = e ( p ⊕ γ ν δ τ (γ r δ s )∗ q ), where ´ • p ∈ B[γ , δ ] is a polynomial of degree at most ν − 1 in γ and τ − 1 in δ ; • q ∈ B[γ , δ ] is a polynomial of degree at most r − 1 in γ and s − 1 in δ . For a given h, the ratio s / r is independent of the particular representation of this type. The above form expresses a periodic behavior of the impulse response h . The polynomial p represents the transient part having a ‘width’ of ν and a ‘height’ of τ . The polynomial q represents a pattern having a ‘width’ of r and a ‘height’ of s . This pattern is reproduced indefinitely after the transient part (see Figure 6.2). The ratio s / r represents the ‘asymptotic slope’ (see Definition 6.46 below). For the extreme cases r = 0 or s = 0, the reader may return to the discussion in §5.7.4. Proof Because we are in the commutative case, we can refer to Theorem 4.110 with ´´´ T = {ε, e, γ , δ} and D = B[[γ , δ]]. In fact, in the following, we will also need to ´´´ ´´´ use elements of B{γ , δ}, hence we may embed all these structures into a larger one, ∗ l ´´´ namely B{ γ , δ} . From Theorem 4.110 we have that h = ´´ , for some i = 1 ai bi ´´´ ´ ´ ´ ai and bi which are elements of T = B[γ , δ] = eB[γ , δ ] (see §6.3.3). Since ai ´ ´ and bi are polynomials, we may consider their minimum representatives in B [γ , δ ] (see Remark 6.28), denoted ai and bi , respectively, and thus obtain the new form h = e li =1 ai (bi )∗ . It remains to show that this form can be reduced to the form given in ´ the theorem statement, which essentially uses the star of a single monomial in (γ , δ). This proof is outlined below. def Considering monomials m = γ r δ s , we first introduce the rational number sl(m ) = def s / r , called the ‘slope’ (with the convention that sl(e) = 0). This notion is extended to polynomials (or power series) as follows. If m 1 and m 2 are two monomials, then sl(m 1 ⊕ m 2 ) = sl(m 1 ) ⊕ sl(m 2 ) . 
The expression of sl(m 1 ⊗m 2 ) is a direct consequence of the definition since the product of two monomials is also a monomial. Using these rules, we notice that, if p is a polynomial, then sl( p∗ ) = sl( p ) and this is the maximum slope among the monomials which form the polynomial. We now propose the following inequalities: ´ ´ ´ x = eδ s (γ δ s / r )∗ ≥ y = e (γ r δ s )∗ ≥ z = eγ r (γ δ s / r )∗ def def def (note that δ s / r is an element of the algebraic closure of T ). Only the inequality y ≥ z will be proved. The other inequality can be proved using similar calculations. With n = α r + β, α, β ∈ N and β < r , all the monomials of z namely eγ r (γ δ s / r )n , n ∈ N, ´ can be written e(γ r δ s )α +1 γ β δ (β/ r −1)s . The monomial e (γ r δ s )α +1 appear in y , whereas ´ ´ the multiplicative monomial eγ β δ (β/ r −1)s is less than e (it has a nonnegative exponent ´ ´ in γ and a negative exponent in δ ), owing to the simplification rules for ‘shifts’ and ‘gains’ given in §6.2.2. Thus each monomial of z is dominated by a monomial of y . From these inequalities and from Lemma 3.107, we can derive the following four rules. 6.5. Rational Systems 291 Rule 1: sl(m 2 ) < sl(m 1 ) ⇒ p1 (m 1 )∗ ⊕ p2 (m 2 )∗ = p ⊕ p1 (m 1 )∗ , where p is a polynomial depending on the given polynomials pi and monomials m i . Rule 2: sl(m 2 ) = sl(m 1 ) ⇒ (m 1 )∗ ⊕ (m 2 )∗ = p ⊗ (lcm(m 1 , m 2 ))∗ , where m i = def ´ eγ ri δ si , lcm(m 1 , m 2 ) = e γ lcm(r1 ,r2 ) δ lcm(s1 ,s2 ) , and p is a polynomial depending ´ on the m i . Rule 3: sl(m 2 ) < sl(m 1 ) ⇒ (m 1 )∗ ⊗ (m 2 )∗ = (m 1 ⊕ m 2 )∗ = p ⊕ q (m 1 )∗ , where p and q are polynomials depending on the given monomials m i . This rule can be derived from (m 1 )∗ ⊗ (m 2 )∗ = r mr m ∗ . 12 Rule 4: sl(m 2 ) = sl(m 1 ) ⇒ (m 1 )∗ ⊗ (m 2 )∗ = (m 1 ⊕ m 2 )∗ = p ⊕ m gcd(m 1 , m 2 )∗ , where the gcd of two monomials is defined in a similar way as the lcm previously, m is a monomial and p a polynomial, both depending on the m i . The possibility of reducing h to the claimed form comes from the recursive utilization of these four rules. Finally, it should be clear that h cannot have two representations with different values of the ratio s / r . ´´´ Figure 6.2: An element of B(γ , δ) ´´´ Figure 6.3: An element of B(γ , δ, φ) ´ ´ Remark 6.33 The representation of rationals in Rmax (γ ) is a simple extension of ´´´ ´ Theorem 6.32. Indeed B(γ , δ) B δ (γ ) ´ Nmax γ . Therefore we have to ´ generalize the situation to the case when the coefficients of power series in γ are real instead of integer. This extension is straightforward. The result becomes: for each ´ ´ h ∈ R (γ ), there exist p , q ∈ R [γ ] of degrees ν − 1 and r − 1, respectively, and max max a ∈ R such that h = e ( p ⊕ q γ ν (a γ r )∗ ). For a given h (recall this is a nondecreasing ´ impulse response), a can be restrained to be nonnegative, and then the nonnegative slope a / r is independent of the particular representation chosen for h . Finally, the following corollary is just the synthesis of the previous results. ´´´ Corollary 6.34 Every h ∈ B(γ , δ, φ) can be written as h = e( p ⊕ γ ν δ τ (γ r δ s )∗ q ) , ´ where 292 Synchronization and Linearity • p ∈ B[γ , δ, φ ] is a polynomial of degree at most ν − 1 in γ and τ − 1 in δ , and it is linear in φci ; • q ∈ B[γ , δ, φ ] is a polynomial of degree at most r − 1 in γ and s − 1 in δ , and it is linear in φci . 
This theorem describes the asymptotically periodic behavior of the impulse response, the periodic pattern being a piecewise nondecreasing convex function (see Figure 6.3). 6.5.4 Minimal Representation and Realization The minimal representation problem can be stated in different terms depending on the ´ elementary subsystems that we consider. Let us discuss the two examples R [φ ] and max ´ Rmax (γ ). ´ ´ ´ Definition 6.35 (Minimal representation in Rmax [φ ]) Given h ∈ Rmax [φ ], where φ = {φc1 , . . . , φc } (the ci are nonnegative), the minimal representation problem consists in min finding a subset of φ with minimal cardinality min such that h = i =1 h i φci , with h i ∈ Rmax , i = 1, . . . , min . In conventional system theory, this problem corresponds to finding the minimal number of exponentials of which the impulse response is a linear combination. Observe that this representation directly corresponds to a realization ( A, B , C ) with a diagonal matrix A (see the theorem below). Indeed, in conventional system theory, the impulse response of a continuous-time shift-invariant system may contain functions of the form t n exp(kt ). The max-plus case is simpler because a t = t a and therefore the impulse response is only composed of max-plus exponentials. Theorem 6.36 Given h = i =1 h i φci , the realization x 1 = φc 1 u , . . . , x = φc u , y = Cx , with C = h 1 . . . h , is minimal if and only if the points (ci , h i ) ∈ R+ × R are the corners of the graph of a decreasing and concave piecewise linear function. Proof Let cimin = mini ci and cimax = max i ci . Over R+ , h is the upper hull of affine functions x → ci x + h i , whereas h (x ) = −∞ for x < 0. Since we are interested in determining whether the affine functions are all needed to represent h , it does not matter if we replace h by a new function H such that H (x ) = h (x ) for x ≥ 0 and H (x ) = +∞ for x > 0. This H is convex and is fully characterized by its Fenchel transform (see Remark 3.36). This latter function also is convex and piecewise linear, and it admits some of the points (ci , −h i ) as the corners of its graph. Moreover, owing to our assumption that H (x ) = +∞ for x < 0, this function is constant at the value −h imin on the left of cimin and is equal to +∞ beyond cimax . Because of the horizontal branch of the graph at the left-hand side, the first slope is zero and the next slopes are all positive since the slope of a convex function is nondecreasing in general, and moreover it strictly increases when a corner is traversed. Any pair (ci , −h i ) which is not a corner 6.5. Rational Systems 293 of the graph of the Fenchel transform can be discarded without changing this function. The corresponding affine function of x can also be discarded without changing h . The statement of the theorem expresses these conditions, which are obviously necessary and sufficient, up to the change of −h i into h i . ´ ´ Definition 6.37 (Minimal realization in Rmax (γ )) Given h ∈ Rmax (γ ), the minimal ´ ´ n ×n n n realization problem consists in finding a triple ( A, B , C ) ∈ Rmax × Rmax × Rmax with minimal n such that h = C (γ A)∗ B. Equivalently, if y = hu, then there exists x ∈ ´ ´ (γ ) n such that (u , x , y ) satisfy: R ´ max x y =S x u =e ´ A C B ε x u . Matrix S is called the system matrix. The problem of finding a minimal realization is still open. 
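As an illustration of Theorem 6.36 (ours, with hypothetical data), the minimality criterion can be checked mechanically: assuming the c_i are distinct, the points (c_i, h_i) are the corners of the graph of a decreasing and concave piecewise linear function exactly when, once sorted by increasing c_i, the heights h_i strictly decrease and the chord slopes between consecutive points strictly decrease. This reading of the criterion, and the sketch below, are ours.

```python
def is_minimal(terms):
    """terms: list of (c_i, h_i) with distinct nonnegative c_i (assumption)."""
    pts = sorted(terms)                          # sort by growth rate c_i
    heights_ok = all(h2 < h1 for (_, h1), (_, h2) in zip(pts, pts[1:]))
    slopes = [(h2 - h1) / (c2 - c1)
              for (c1, h1), (c2, h2) in zip(pts, pts[1:])]
    slopes_ok = all(s2 < s1 for s1, s2 in zip(slopes, slopes[1:]))
    return heights_ok and slopes_ok

print(is_minimal([(1, 5), (2, 4), (4, 0)]))   # True : every point is a corner
print(is_minimal([(1, 5), (2, 3), (4, 0)]))   # False: (2, 3) lies below a chord
```

In the second call the middle point lies below the chord joining its neighbors, so the corresponding term can be discarded from the representation without changing h.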
The previous minimal representation theorem in the case of continuous impulse responses cannot be extended to the present discrete situation in a straightforward manner, essentially because it is difficult to precisely identify the underlying ‘exponentials’ (that is, the expressions of the form (a γ ri )∗ which may contribute to the transient part of the impulse response). Many attempts to solve this problem have not been successful yet. In Chapter 9 partial results are given. Nevertheless the following theorem gives a realization which is not necessarily minimal but which contains only one star operation. ´ Theorem 6.38 (Realization in Rmax (γ )) Every element ´ with a system matrix S given by ε γε · · ε εγ · · · ·· · · aγ ε· · γ S = e ´ ε ·· · · · ·· · · · ·· · · q (r − 1) · · q (0) p (ν − 1) ´ of Rmax (γ ) can be realized ´ · · · ε γ · · · · · · · · γ · p (0) ε ε · · ε ε e ε corresponding to the event graph given in Figure 6.4. ´ Proof Remark 6.33 showed that if h ∈ Rmax (γ ), it can be represented as h = ´ ν −1 r −1 ν r∗ i j e ( p ⊕ q γ (a γ ) ), with p = ´ p (i )γ and q = i =0 j =0 q ( j )γ . By direct calculation, which consists in eliminating x in x y =S x u , with the above expression of S , one can check that y = hu with the given expression of h . 294 Synchronization and Linearity u e x r +ν e x r +ν −1 e p(0) xr p(1) e p(2) a x r −1 e q(0) q(1) x1 y q( r − 1) ´ Figure 6.4: One-star realization of rational systems in Rmax (γ ) ´ 6.6 Correlations and Feedback Stabilization In this section we develop a second-order max-plus system theory. This theory offers algebraic similarities with conventional second-order system theory. In the context of event graphs, its main application is the evaluation of sojourn times of tokens in places or in broader portions of the graph. A notion of internal stability is introduced by saying that a system is stable if all its sojourn times are bounded. Finally it is shown that a structurally observable and controllable system can be stabilized by a dynamic feedback while preserving the asymptotic open-loop performance. 6.6.1 Sojourn Time and Correlations We consider the problem of computing the sojourn times in timed event graphs. Let v and u be two daters associated with a pair of transitions (also named v and u ) surrounding a place p containing µ tokens initially (v corresponds to the upstream transition, u to the downstream). The token involved in the firing of transition u numbered k (this firing occurs at time u (k )) corresponds to the token which was produced by the firing of transition v numbered k − µ (occurring at v(k − µ)). This is because we deal with deterministic event graphs with constant holding times and the FIFO rule may be assumed for places. Therefore, we define the sojourn time Tuv (k , µ) of this token in place p (along arc (v, u ), marked with µ tokens initially) by Tuv (k , µ) = u (k ) − v(k − µ) . More generally, for two transitions v and u connected by a path ρ containing µρ tokens initially (i.e. µρ = |ρ |t ), Tuv (k , µρ ) = u (k ) − v(k − µρ ) represents the time spent along the path ρ by the token numbered k at u . These notions can be generalized to continuous systems, like the one presented in §1.2.7, by considering that tokens are ‘molecules’ of fluid in pipes. More formally, we introduce the following notions. Definition 6.39 Let u, respectively v , be an n-dimensional, respectively p-dimensional, vector with entries in S . 
Sojourn-time matrix The sojourn time (Tuv )i j (k , µ) of the token participating in the k-th firing of transition u i , and using a path from v j to u i which contains µ tokens initially, is defined as ◦ ◦ (Tuv )i j (k , µ) = u i (k )/v j (k − µ) = (u (k )/v(k − µ)) i j . 6.6. Correlations and Feedback Stabilization 295 ◦ Here / denotes the residuation of the instantaneous matrix product. The n × p matrix function Tuv (·, ·), which gives the sojourn times between paths going from the set v of p transitions to the set u of n transitions, is called the sojourn-time matrix. ◦ Correlation matrix Let Ruv be the matrix with entries in S defined by Ruv = u/v . ◦ is considered as the residuation of the (matrix) ‘convolution’ product in Here / force in S (which is similar to power series product). Therefore (see (4.97)), ◦ ∀m ∈ R, Ruv (µ) = [u/v ](µ) = ◦ (u (k )/v(k − µ)) = k Tuv (k , µ), (6.19) k This Ruv is called the correlation matrix of u with v . If u = v , it is called the autocorrelation matrix of u. There might be parallel paths with different initial markings. ´ Lemma 6.40 If v ∈ S , the mappings µ → Tuv (k , µ) and µ → Ruv (µ) are nondecreasing. Proof Since v is nondecreasing, ∀k , ◦ ◦ µ ≤ µ ⇒ v(k − µ ) ≥ v(k − µ) ⇒ u (k )/v(k − µ ) ≤ u (k )/v(k − µ) . The results follow immediately. ◦ Remark 6.41 We refer the reader to Example 4.65 for the manipulation of / in Rmax ◦ and to §4.6.2 for the matrix formulæ involving /. It may be useful to recall the point of view adopted in [49]. With the choice of primitives used therein, the residuation operator can be evaluated as follows in the case of vectors over Rmax . Let us first introduce the following notation: an overlined (square or nonsquare) matrix or vector will denote the transposed matrix or vector in which, moreover, the (conventional) sign of all entries has been changed to the opposite. For example, if a = 2 3 , then ◦ a = −2 −3 ; if h ∈ S , then h (t ) = −h (−t ).3 Then we have u/v = v ⊗ u and ◦ u = u ⊗ v , where ⊗ still denotes the matrix product in Rmax . These formulæ hold v\ also true in Rmax : we have ◦ = ε/ε = ε ⊗ ε = ε ⊗ , ◦ =/ = ⊗ = ⊗ε . It is also useful to recall the De Morgan formulæ for the sup and the inf (see (4.9)– (4.11)) and in particular the following formula: a⊗b=b a, 3 Indeed, if we view the convolution as an extension of the matrix product with infinite-dimensional elements (for special matrices in which entry (i, j ) depends only on the difference t = i − j ), then h (t ) = −h(−t ) is the composition of transposition and of change of sign. 296 Synchronization and Linearity where denotes the matrix product based on min and + (the absorbing element for scalar multiplication being now = +∞: we have ε ⊗ = ε but ε = ). For example, ◦ a/b = b ⊗ a = a b . Remark 6.42 1. Another interesting quantity is Tu+ (k , µ) = u (k ) − v((k − µ)− ) where v(k− ) = v lims ↑k v(s ). This T + is different from T when v is not left-continuous. 2. Let us consider the case when u and v are scalar functions. The analogy between the conventional correlation and the max-plus correlation should be clear: 1 T →∞ 2 T Suv (µ) = lim T −T u (s )v(s − µ) ds , Ruv (µ) = ◦ (u (s )/v(s − µ)) . s ∈R 3. For finite functions u i and v j , the classical distance supk |u i (k ) − v j (k )| can be expressed as − inf ( Ruv )i j (0), ( Rvu ) j i (0) , which shows some connection between the notions of distance and correlation. ◦ 4. 
From (6.19), it is clear that (Tuv )i j (k , µ) = u i (k )/v j (k − µ) is bounded from below by ( Ruv )i j (µ) for all i, j , k , m . On the other hand, ( Rvu ) j i (−µ) = ◦ v j (l )/ u i (l + µ) = l ◦ v j (k − µ)/u i (k ) , k ◦ ◦ ◦ hence e/ v j (k − µ)/u i (k ) is bounded from above by e/ ( Rvu ) j i (−µ) . This ◦◦ would provide an upper bound for (Tuv )i j (k , µ) if it were true that e/(x/ y ) = ◦ x . In Rmax this equality obviously holds true whenever x and y are scalars y/ assuming finite values. Otherwise it may not hold, as shown by the following ◦ ◦ ◦ example: let x = y = ε, then x/ y = , e/ = ε but y/x = . Let us now give the evolution equation of the sojourn time for a shift-invariant autonomous linear system. n ×n Theorem 6.43 For the system x (k + 1) = Ax (k ), where A ∈ Rmax , the sojourn time matrix Tx x (·, µ) follows the dynamics ◦ ◦ Tx x (k + 1, µ) = ( ATx x (k , µ))/ A = A(Tx x (k , µ)/ A) , provided that Tx x (·, µ) never assumes infinite values. More generally, the following inequalities always hold true ◦ ◦ Tx x (k + 1, µ) ≥ ( ATx x (k , µ))/ A ≥ A(Tx x (k , µ)/ A) . 6.6. Correlations and Feedback Stabilization 297 Proof We have ◦ ◦ Tx x (k + 1, µ) = x (k + 1)/ x (k + 1 − µ) = ( Ax (k ))/( Ax (k − µ)) ◦ ◦ by (f.9), = (( Ax (k ))/ x (k − µ))/ A ◦ x (k − µ))/ A ◦ ≥ ( A(x (k )/ by (f.12), ◦ ◦ ≥ A((x (k )/ x (k − µ))/ A) by (f.12). The two inequalities become equalities in the case when Tuv (k , µ) has finite entries only. Indeed, in Rmax , the only counterexamples to equality in (f.12) are the cases ◦ ◦ when ε and/or are involved: for example, ε = ε ⊗ (ε/ε) < (ε ⊗ ε)/ε = . The following result provides insight into how correlations are transformed by linear systems. Theorem 6.44 (Nondecreasing correlation principle) Consider a (MIMO) shiftinvariant system with (matrix) impulse response H ∈ S and two inputs signals u and v with their corresponding outputs y and z, respectively. Then ◦ y/ z ≥ ◦ ◦ (v \u )( H / H ) , (6.20) ◦ z \y ≥ ◦ (v \u ) (6.21) ◦ Hi j / Hi j . i, j Proof Observe first that, for all i, j , ◦ (u/v)i j (k ) = ◦ u i (l )/v j (l − k ) = l ◦ v j (l − k ) \ u i (l ) l because ⊗ is commutative for scalars. Using this equality for i = j and the obvious ◦ ◦ ◦ fact that (u/v)i j ≥ ε for i = j , we have that u/v ≥ (v \u )e, where e is the identity matrix. Then, we have ◦ y/ z = = ≥ ≥ ≥ ◦ ( H u )/( H v) ◦◦ (( H u )/v)/ H ◦ ◦ ( H (u/v))/ H ◦ u ) H )/ H ◦ ((v \ ◦ ◦ (v \u )( H / H ) by (f.9), by (f.12), as explained above, by (f.12). This proves (6.20). Inequality (6.21) is obtained easily from (6.20) and (4.82). Remark 6.45 ◦ ◦ ◦ 1. Since H / H ≥ e, by (f.6), Inequality (6.21) implies that z \ y ≥ v \u , which means that, in the SISO case, the correlation of output signals is not less than the correlation of inputs. ◦ ◦ ◦ ◦ ◦ 2. For autocorrelations, (6.20) becomes y/ y ≥ (u \u )( H / H ) ≥ H / H since (u \u ) ≥ e. This is a second correlation principle, which states that the autocorrelation of ◦ outputs is not less than the intrinsic correlation H / H of the system. ◦ Theorem 6.44 suggests the importance of quotients of the form A/ A. Theorem 4.59 and Corollary 4.69 gave an algebraic characterization of these quotients. 298 Synchronization and Linearity 6.6.2 Stability and Stabilization In this subsection we are concerned with what we call the internal stability of systems. The discussion will be limited to systems modeling timed event graphs. 
This notion of internal stability means that there is no accumulation of tokens in places or, dually, that the sojourn times of tokens remain finite. Let us start our discussion on stability by studying the relation between the asymptotic slopes of functions and their correlations. ´ Definition 6.46 (Asymptotic slope) Let h ∈ Rmax (γ ) be represented by ´ ∗ e( p ⊕ q γ ν (a γ r ) ) ´ (see Remark 6.33; without loss of generality, we assume that a is nonnegative). Then, ´ the asymptotic slope of h ∈ R (γ ), denoted sl (h ), is defined by the ratio a / r. ´ ∞ max As observed in Remark 6.33, the ratio a / r is independent of the particular representation of this type which was chosen for h . Note the difference between the slope introduced in the proof of Theorem 6.32 and this asymptotic slope: in the context ´ of R (γ ), the former would be the maximum ratio a (n )/ n among the monomials ´ max a (n )γ n appearing in h ; the latter is the limit of such ratios when n goes to infinity. ´ ´ ´ Theorem 6.47 Given a realization of some h ∈ Rmax (γ ) by an event graph with ‘internal state’ x, for any rational input (dater) u such that u (k ) = ε, ∀k < 0, the corresponding dater x is also rational and such that x (k ) = ε, ∀k < 0. The following equivalence holds true: ( A) : ∀i, j , ( Rx x )i j = ε ⇔ (B) : ∀i, j , sl∞ (x i ) = sl∞ (x j ) . Proof The case of zero slopes must be handled separately. Suppose that for some i and j , sl∞ (x i ) ≥ sl∞ (x j ) and that, moreover, sl∞ (x i ) > 0. Then it is easy to see that there exists a shift γ µ such that x i ≥ γ µ x j . Therefore, for all k ∈ Z, x i (k ) ≥ x j (k − µ) and ( Rx x )i j (µ) ≥ e. Consequently, if sl∞ (x i ) = sl∞ (x j ) > 0, ( Rx x )i j > ε and ( A) holds true. If sl∞ (x i ) = sl∞ (x j ) = 0, x i and x j are then polynomials, that is, they can be (minimally) represented by a finite number of coefficients not equal to ε. In this case, ◦ it is easy to conclude this part of the proof by remembering that ε/ε = . Conversely, if (B) does not hold, that is, there exists a pair (i, j ) such that sl∞ (x j ) > sl∞ (x j ), then whatever µ ∈ Z, x i (k ) increases to infinity strictly faster than x j (k − µ) ◦ when k → +∞. Hence, for all µ ∈ Z, k x i (k )/ x j (k − µ) = ε (the ∧ is obtained as a limit when k → +∞) and (A) is contradicted. Definition 6.48 (Internal stability) When the equivalent conditions (A) and (B) hold true for all inputs of the type described in Theorem 6.47, we say that the realization is internally stable. Remark 6.49 Owing to Remark 6.42 (point 4), in the situation of Definition 6.47, and if all daters x i remain finite, one can obtain an upper bound for the sojourn times of tokens in any internal path of the event graph (using appropriate shifts µ). The 6.6. Correlations and Feedback Stabilization 299 condition that the x i remain finite is satisfied if the inputs remain finite (no indefinite ‘starving’ of tokens at the input transitions) and if the system has no deadlocks. A deadlock would be revealed by infinite asymptotic slopes for the daters associated with transitions belonging to the deadlocked circuits. 
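The equivalence (A) ⇔ (B) of Theorem 6.47 suggests a crude numerical test of internal stability (this is our own sketch, not the symbolic machinery developed in this chapter): iterate the Rmax recursion x(k + 1) = A x(k) ⊕ B u(k) for a rational input of given rate and compare the empirical slopes x_i(K)/K. The data below are hypothetical and chosen so that the two internal slopes differ, i.e. the realization is not internally stable.

```python
# Numerical sketch (ours): estimate the asymptotic slopes of Definition 6.46
# by iterating the R_max recursion x(k+1) = A x(k) ⊕ B u(k) and test
# condition (B) of Theorem 6.47 (all internal slopes equal).
import numpy as np

EPS = -np.inf   # ε of R_max

def maxplus_matvec(A, x):
    return np.array([np.max(A[i, :] + x) for i in range(A.shape[0])])

def estimate_slopes(A, B, input_rate, K=2000):
    x = np.zeros(A.shape[0])
    for k in range(K):
        u = np.array([input_rate * k])          # a periodic input dater
        x = np.maximum(maxplus_matvec(A, x), maxplus_matvec(B, u))
    return x / K                                # crude slope estimates

# hypothetical data: two internal transitions, only x1 is fed by the input
A = np.array([[1.0, EPS],
              [3.0, 2.0]])
B = np.array([[0.0],
              [EPS]])
print(estimate_slopes(A, B, input_rate=1.0))    # roughly [1.0, 2.0]
```

Here x_1 follows the input (slope 1) while x_2 cannot fire faster than once every 2 time units (slope 2), so the slopes differ and tokens accumulate in the place between them.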
However, even for internally stable event graphs with finite inputs and no deadlocks, it might happen that tokens incur unbounded sojourn times: this typically occurs in places which are located immediately downstream of u -transitions (sources), when one uses inputs which are too fast with respect to the potential throughput of the event graph (that is, when the asymptotic slopes of these inputs are less than the common value of the asymptotic slopes of the x i ). Corollary 6.50 Given a rational impulse response h and a realization described by a triple of matrices4 ( A, B , C ), this realization is internally stable if and only ∀i, j , ◦ ( H / H )i j = ε , def ´ where H = e (γ A)∗ B. ◦ ◦ Proof The condition is sufficient because x = H u and x/x ≥ H / H by Remark 6.45 ◦ ◦ ◦ (point 2). Conversely, since ( H / H )i j = l Hil/ H jl and if ( H / H )i j = ε, then there ◦ exists lo such that Hilo / H jlo = ε. The system does not satisfy the requirement of Definition 6.48 for the input ulo = e, ul = ε, l = k . ´ Theorem 6.51 If the internal subgraph of an event graph (that is, the subgraph obtained by deleting input and output transitions together with the arcs connecting them to other transitions) is strongly connected, then this system is internally stable. Proof Indeed if the internal subgraph is strongly connected, for any pair of internal nodes (i, j ), there exists a path ρ from j to i containing µρ tokens (µρ may be equal to 0). Then we have x i (k ) ≥ αρ ⊗ x j (k − µρ ), where αρ is the sum of the holding times of the places in the path ρ (i.e. αρ = |ρ |w ; αρ > ε, indeed αρ ≥ e). Therefore (Tx x )i j (k , µρ ) ≥ tρ for all k . This holds for any input and Definition 6.48 is satisfied. When a given (open-loop) event graph is not internally stable, we consider the problem of obtaining this property by ‘closing the loop’ between inputs and outputs. By this we mean that u will be obtained from y by u = F y ⊕ v , where F is a ‘feed´ ´ back’ matrix of appropriate dimensions with entries in R (γ ), and v is the new input max (of the same dimension as u ). The situation is depicted in Figure 6.5 from which it appears that (at least some components of) u , respectively y , do no longer correspond to sources, respectively sinks. The feedback should in general be dynamic in the sense that F should indeed contain terms a (n )γ n with n ≥ 1 and a (n ) ≥ e, in order to avoid ´ deadlocks in the closed-loop system. A term of this type in Fi j means that there exists a path from y j to u i (in grey in the figure) with a total number of n tokens in the initial marking and a total holding time of a (n ) time units. 4 See Definition 6.37, except that minimality is not required here. 300 Synchronization and Linearity v1 v2 u1 u2 x1 x2 fee db ack in g r e y y Figure 6.5: An unstable timed event graph with a stabilizing feedback The stabilization of event graphs by output feedback requires the introduction of the following notions. Definition 6.52 Structural Controllability An event graph is structurally controllable if every internal transition can be reached by a path from at least one input transition. Structural Observability An event graph is structurally observable if, from every internal transition, there exists a path to at least one output transition. Theorem 6.53 (Feedback stabilization) Any structurally controllable and observable event graph can be made internally stable by output feedback. 
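Structural controllability and observability in the sense of Definition 6.52 are purely graph-theoretic properties, so they can be checked by ordinary reachability computations. The following sketch (ours; the transition graph is a toy example in the spirit of Figure 6.5) tests both conditions by breadth-first search.

```python
# Sketch (ours): checking Definition 6.52 by plain graph reachability.  The
# event graph is abstracted as a directed graph on transitions (an arc
# q -> q' for each place between them); 'inputs'/'outputs' are the input and
# output transitions.
from collections import deque

def reachable(starts, succ):
    seen, queue = set(starts), deque(starts)
    while queue:
        q = queue.popleft()
        for q2 in succ.get(q, ()):
            if q2 not in seen:
                seen.add(q2)
                queue.append(q2)
    return seen

def structurally_controllable_observable(internal, inputs, outputs, arcs):
    succ, pred = {}, {}
    for a, b in arcs:
        succ.setdefault(a, []).append(b)
        pred.setdefault(b, []).append(a)
    ctrl = internal <= reachable(inputs, succ)    # reached from some input
    obs  = internal <= reachable(outputs, pred)   # reaches some output
    return ctrl, obs

# hypothetical graph: u_i input transitions, x_i internal, y output
arcs = [("u1", "x1"), ("u2", "x2"), ("x1", "x2"), ("x2", "x2"), ("x2", "y")]
print(structurally_controllable_observable(
    {"x1", "x2"}, {"u1", "u2"}, {"y"}, arcs))     # (True, True)
```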
Proof The idea of the proof is to fulfill the sufficient condition of strong connectedness (mentioned in Theorem 6.51) for the internal subgraph of the closed-loop graph. Under the assumptions of structural controllability and observability of the open-loop system, it should not be difficult to see that this sufficient condition can indeed be satisfied if an effective feedback connection is established from any output to any input. Of course, one can imagine more refined strategies in order to attempt to minimize the number of feedback links so introduced. Obviously, input transitions which are upstream of several m.s.c.s.’s of the internal subgraph must be preferably used and a similar remark applies to output transitions. Example 6.54 The timed event graph represented in Figure 6.5 is not internally stable in the open-loop configuration. For instance, if tokens are input through u 1 at the rate of 2 tokens per time unit, then tokens accumulate indefinitely in the place between x 1 and x 2 since the throughput of x 2 is limited to one token per time unit, whereas x 1 can process tokens at the given input rate. On the other hand, the system can be stabilized by the feedback shown in Figure 6.5 (grey lines). 6.7. Notes 6.6.3 301 Loop Shaping In the previous subsection we saw how to obtain an internally stable system by closing the loop between input and output, provided the system is structurally controllable and observable. However, this operation creates new circuits whereas it preserves the circuits already existing in the open-loop system. Therefore, the maximum cycle mean may only increase when passing from the open-loop to the closed-loop system, which means that the throughput (inverse of the maximum cycle mean) may only be worse, resulting in a loss of performance. The newly created circuits traverse the feedback arcs. If any such circuit, say ζ , happens to be critical, it suffices to increase the number of tokens in the corresponding feedback path in such a way that the cycle mean |ζ |w /|ζ |t ceases to be critical. This reasoning justifies the following theorem which improves the previous one. Theorem 6.55 Any structurally controllable and observable event graph can be made internally stable by output feedback without altering its original open-loop throughput. Another more algebraic view on this problem can be explained in the simple case of a SISO system. Let h be its (rational) impulse response. The open-loop throughput is 1/sl∞ (h ). If one uses the feedback law u = f y ⊕ v , the closed-loop system is y = (h f )∗ h v . Then it can be proved that if f = γ µ, there exists µ large enough such that sl∞ ((h γ µ )∗ h ) = sl∞ (h ). An interesting question is to determine the minimum number of tokens (which may represent costly resources practically) such that a desired throughput is achieved. This problem is discussed in [62]. 6.7 Notes This chapter is based on the two articles [111] and [112]. The idea of extending the application of the max-plus algebra to continuous systems was proposed by R. Nikoukhah during a Max-Plus’ working group meeting. It is quite natural once one realizes that time-invariant max-plus linear systems indeed perform sup-convolutions of the inputs with their impulse responses. Continuous Petri nets have also been studied in [14] and [99]. Formal and numerical transfer functions are isomorphic in conventional algebra, and therefore they are not always clearly distinguished in the literature on system theory. 
The situation is quite different in the max-plus context. The terminology ‘transfer function’ was reserved for the Fenchel transform of the impulse response in this chapter. In the literature on optimization, the idea of considering dynamic systems based on vector sums of convex objects appeared from time to time but with no connection to the modeling of synchronization mechanisms. The characterization of rational impulse responses in terms of periodicity was given for the first time in [41]. A program for symbolic computation based on this periodic characterization of rational systems has been developed by S. Gaubert. It is called MAX [62]. An analogous notion of periodicity exists in the Petri net literature [36]. The second-order theory developed in the second part has two origins: the first stems from [112], the second from [4]. The first is concerned with finding a max-plus equivalent of the autocorrelation of a process, the second with describing the recurrent equation of differences. 302 Synchronization and Linearity The application to stability provides another view on the stabilization by feedback described for the first time in [41]. The nondecreasing correlation principle was found by S. Gaubert. The interesting problem of the optimization of the number of tokens involved in the loop shaping issue has been solved in [62] but was not discussed here. Part IV Stochastic Systems 303 Chapter 7 Ergodic Theory of Event Graphs 7.1 Introduction The main practical concerns of this chapter are the construction of the stationary regime of stochastic event graphs and the conditions on the statistics of the holding times under which such a stationary regime exists. The basis for the analysis is the set of equations which govern the evolution of daters, established in Chapter 2. In §7.6 we will see that this construction also allows us to determine the stationary regime of the marking process. The main tool for addressing these problems is ergodic theory: the existence problem is stated in terms of a ‘random eigenpair problem’ which generalizes the eigenpair problem formulation of Chapter 3 in the deterministic case, and which can be seen as the Rmax -analogue of that of a multiplicative ergodic theorem in conventional algebra. Section 7.2 focuses on a simple one-dimensional nonautonomous example. This example is the Petri net analogue of the classical G/ G/1 queue. Most of the basic probabilistic tools to be used in this chapter are introduced through this simple example. These tools are based on the probabilistic formalism of [6]. More advanced probabilistic material, and in particular the ergodic theorems which are used or referred to in the chapter, are gathered in §7.7. Section 7.3 gives the basic first-order theorems which indicate how the daters grow in such a stochastic framework. The growth rates given in these first-order theorems are shown to be the Rmax -analogues of Lyapunov exponents in conventional algebra, and generalizations of cycle times in the deterministic case. Second-order theorems are concerned with the construction of the eigenpairs. This construction is based on the analysis of ratios of daters (in the Rmax sense). This second-order theory is first presented for multidimensional nonautonomous systems in §7.4. 
It is shown that, under appropriate statistical assumptions, this type of system admits a unique stationary regime which is reached in finite time, regardless of the initial condition, provided the 'Lyapunov exponents' of the m.s.c.s.'s are less than the asymptotic rate of the input. Section 7.5 focuses on the autonomous case. We provide a simple and natural condition for the uniqueness and the reachability of the stationary regime.

Throughout the chapter, we will consider two levels of abstraction.

• The first level is that of stochastic Petri nets, for which we will use the notation of Chapter 2, and for which the reference dioid will be Rmax. This level will provide examples and will require particular attention because of the difficulties related to the initial conditions.

• The second one is that of linear algebra in a stochastic context. The discussions will rely upon the notion of residuation introduced in Chapter 4. We will try to consider general dioids, although most of the practical results which we have at this stage are limited to Rmax.

For each section of the chapter, we will try to indicate which level is considered.

7.2 A Simple Example in Rmax

7.2.1 The Event Graph

We first consider a simple example of an event graph with a single input: the event graph has two transitions q1 and q2, two places p1 and p2, and the following topology: p2 is a recycling of q2, π(p1) = q1, σ(p1) = q2, and q1 is an input transition with input sequence {u(k)}k≥1. The initial marking has one token in pi, with lag time wi, i = 1, 2, so that the set Q of transitions followed by at least one place with nonzero initial marking is {q2} (see Figure 7.1). The holding times in p1 are all zero, whereas those in p2 are given by the sequence {α(k)}.

Figure 7.1: A simple example

Observe that M = 1 and that both |Q| and |I| are equal to 1. Accordingly, the matrices A(k, k − 1) and B(k, k − 1) in (2.38) are one-dimensional:

A(k, k − 1) = (α(k)) ,   B(k, k − 1) = (e) .

Let A(k) = α(k + 1). Equation (2.38) reads

x(k + 1) = A(k)x(k) ⊕ u(k) ⊕ v(k + 1) ,   k ≥ 0 ,      (7.1)

where v(1) = w1 ⊕ w2 and v(k) = ε for k ≠ 1. In this equation the continuation for (u(0), x(0)) is (u(0), x(0)) = (ε, ε). The input is weakly compatible if u(1) ≥ e. The initial lag times are weakly compatible if

w2 ≤ α(1) ,   w1 ≤ e   and   w1 ⊕ w2 ≥ e .      (7.2)

Since each transition is followed by at most one place with a nonzero initial marking, any weakly compatible initial condition is compatible, so that we can rewrite the preceding equation as

x(k + 1) = A(k)x(k) ⊕ u(k) ,   k ≥ 0 ,      (7.3)

provided we now take x(0) = w2 ◦/ α(1) (residuation of w2 by α(1)) and u(0) = w1.

Remark 7.1 With the statistical assumptions described in the next subsection, this system is very close to the FIFO G/G/1/∞ queue. The G/G part states that both the ratios of the input sequence (see (7.4) below) and the holding times form general stationary sequences. The 1/∞ part states that there is a single server and that there is an infinite buffer in front of the server. This system is also known as the producer-consumer system in theoretical computer science. The input u into transition q1 features the external input stream of customers, p1 is the infinite buffer which stores customers to be served, q2 features the single server, and the holding times in p2 represent the service times.
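A minimal simulation sketch of recursion (7.3) makes the queueing interpretation of Remark 7.1 concrete (the code, the distributions and the finite initial condition x(0) = 0 are our own arbitrary choices, not those of the text): in conventional notation, x(k + 1) = max(α(k + 1) + x(k), u(k)) gives the successive firing epochs of q2.

```python
# Simulation sketch (ours) of recursion (7.3) for the event graph of
# Figure 7.1:  x(k+1) = max(alpha(k+1) + x(k), u(k))  in conventional algebra.
import random

def simulate(alpha, u, x0):
    """x(k+1) = max(alpha(k+1) + x(k), u(k)): successive firing epochs of q2."""
    x, out = x0, []
    for a, uk in zip(alpha, u):
        x = max(a + x, uk)
        out.append(x)
    return out

random.seed(0)
K = 10
alpha = [random.uniform(0.5, 1.5) for _ in range(K)]     # holding times in p2
gaps  = [random.uniform(1.0, 2.0) for _ in range(K)]     # inter-input times U(k)
u = [sum(gaps[:k + 1]) for k in range(K)]                # input daters u(k)
print(simulate(alpha, u, x0=0.0))
```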
7.2.2 Statistical Assumptions The statistical assumptions are as follows: the firing times are 0, as well as the holding times in p1 . The holding times α(k ), k ≥ 1, (or equivalently A(k )) and the ratios {U (k )}k ≥1 of the input sequence, where def ◦ U (k ) = u (k + 1)/u (k ) , (7.4) form two jointly stationary and ergodic sequences on some probability space ( , F, P). This assumption can be seen as the stochastic generalization of the case of periodic input considered in Chapter 3: the constant ratios of the input and the constant holding times in Chapter 3 are now replaced by stationary ratios. Whenever needed, we will def stress the fact that A(k ), U (k ) and w = (w1 , w2 ) are random variables, namely measurable functions from into R, by writing A(k ; ω), U (k ; ω) and w(ω) instead of A(k ), U (k ), and w , respectively. Observe that this is tantamount to using the same notation for a function and for the value which it takes at a particular point. The context should always allow the reader to decide what is meant. Before going further in the analysis of the system, we comment on the statistical framework, and on what will be meant by joint stationarity and ergodicity of the two sequences { A(k )} and {U (k )} throughout this chapter and the next one. Definition 7.2 (θ -shift) The mapping θ : → is a shift operator on ( , F, P) if it is bijective and measurable from onto itself, and if it is such that the probability law P is left invariant by θ , namely E[ f ] = E[ f ◦θ ], for all measurable and integrable functions f : → R, where E denotes the mathematical expectation with respect to P. By convention, the composition operator ‘◦’ has the highest priority in all formulæ. For instance, f ◦gh means ( f ◦ g )h . 308 Synchronization and Linearity Definition 7.3 (θ -stationarity) We say that a sequence of R-valued random variables {a (k ; ω)}k∈Z defined on ( , F, P) is θ -stationary if the relation a (k ; ω) = a (0; θ k (ω)) (7.5) holds for all k ≥ 0, where θ k is the composition of θ by itself k times: θ k+1 = θ k ◦θ , and θ 0 = I , the identity. Remark 7.4 Another way of stating the preceding definition consists in requiring that a (0)◦θ k = γ −k a (0), for all k ∈ Z, where γ is the backward shift operator on the numbering of sequences which was defined in §5.3.2. In the present example, we will assume that the data of the problem, namely both sequences { A(k )} and {U (k )}, are θ -stationary. We immediately obtain from this and from (7.5) that for all integers m , the relation E[h ( A(0), U (0), . . . , A(k ), U (k ))] = E[h ( A(m ), U (m ), . . . , A(m + k ), U (m + k ))] holds for all measurable functions h : R2(k+1) → R such that the expectation exists. This is a natural property to expect from joint stationarity indeed. Starting from this assumption, we will then be interested in proving that other quantities associated with the event graph also satisfy the θ -stationarity property. Similarly, the joint ergodicity of the sequences { A(k )} and {U (k )} is obtained when assuming that θ is P-ergodic: Definition 7.5 (Ergodic shift) The shift θ is said to be ergodic if the almost sure (a.s.) limit 1 k →∞ k k f ◦θ l = E[ f ] lim a.s. (7.6) l =1 → R. holds for all measurable and integrable functions f : Owing to (7.5), the last property implies in particular that 1 k →∞ k k A(l ) = E[ A(0)] a.s. lim and l =1 1 k →∞ k k U (l ) = E[U (0)] a.s. , lim l =1 provided A(0) and U (0) are integrable, which corresponds to the conventional meaning of the ergodicity of both sequences. 
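A toy illustration (ours) of Definitions 7.2 and 7.3: representing a sample ω by the function k ↦ (A(k; ω), U(k; ω)), the shift acts by θ(ω)(k) = ω(k + 1), and θ-stationarity is exactly the identity a(k; ω) = a(0; θ^k(ω)) of (7.5). The sample path below is deterministic and purely illustrative.

```python
# Toy illustration (ours) of Definitions 7.2 and 7.3 on a sequence space:
# a sample ω is a function k -> (A(k; ω), U(k; ω)); θ shifts it by one step.

def theta(omega):
    return lambda k: omega(k + 1)

def iterate(f, n, x):
    for _ in range(n):
        x = f(x)
    return x

# an arbitrary (deterministic, purely illustrative) sample path
omega = lambda k: (0.8 + 0.1 * ((k % 3) - 1), 1.0 + 0.2 * (k % 2))
A = lambda k, w: w(k)[0]                              # the coordinate A(k; ω)
print(A(5, omega), A(0, iterate(theta, 5, omega)))    # equal, as (7.5) requires
```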
The joint ergodicity becomes more apparent from the formula 1/ k k h ( A(l ), U (l )) lim k →∞ = E[h ( A(0), U (0))] a.s. l =1 for all measurable functions h : R2 → R such that the expectation exists, which is also a direct consequence of our definition. 7.2. A Simple Example in Rmax 309 A measurable set A of F (an ‘event’) is said to be θ -invariant if the indicator function of A, which will be denoted 1{A}, satisfies the relation 1{A}◦θ = 1{A}. We will often make use of Birkhoff’s pointwise ergodic theorem (see [20]). Theorem 7.6 (Birkhoff) The shift operator θ is ergodic if and only if the only sets of the σ -algebra F which are θ -invariant are of measure 0 or 1. Example 7.7 (Canonical probability space) In the particular example which we consider, the data consist of the two sequences { A(k )} and {U (k )}. A concrete example of such a shift operator is provided by the translation operator γ −1 on the canonical space of the two sequences, which is defined as follows: • is the space of bi-infinite sequences of the form . . . , z (−2), z (−1), z (0), z (1), z (2), . . . , where z (k ) = (s (k ), t (k )) ∈ R2 for all k ∈ Z; • F is the σ -algebra generated by the coordinate mappings ek (ω), k ∈ Z, where ek (ω) = z (k ); • P is a probability measure on the measurable space ( , F). On this probability space, we can then take θ(. . . , z (−1), z (0), z (1), . . . ) = γ −1(. . . , z (−1), z (0), z (1), . . . ) , that is ek (θ(ω)) = ek+1 (ω) for all k ∈ Z. Within this framework, the θ -stationarity assumption boils down to the assumption that the probability law P of the two sequences is left invariant by γ . If A(k ; ω) denotes the first component of ek (ω), and U (k ; ω) the second one, we obtain that (7.5) is indeed satisfied by both sequences. Remark 7.8 If we consider a sequence of random variables, say {b (k ; ω)}, defined on this canonical probability space, which is different from the coordinate process, it is not true in general that b (0)◦θ k = b (k ). It is clear that b (0; θ(ω)) = b (0; γ −1 (ω)), but in general, it is not true that b (0; γ −1(ω)) = γ −1b (0; ω) because the translation operator which is used at the right-hand side of the last relation has nothing to do with the specific one used at the left-hand side, which operates on the sequences of . For instance, take b (k ; ω) = k A(k ; ω). We have b (0)◦θ k = 0, which clearly differs from k A(k ) unless A(·) = 0. 7.2.3 Statement of the Eigenvalue Problem We can rewrite the equations governing this system as u (k + 1) = U (k )u (k ) , x (k + 1) = A(k )x (k ) ⊕ u (k ) , or equivalently as X (k + 1) = D (k ) X (k ) , k≥0 , (7.7) 310 Synchronization and Linearity where X (k ) = u (k ) x (k ) and D (k ) = U (k ) e ε A(k ) . The variables A(k ) and U (k ) are assumed to be integrable random variables, defined on a probability space ( , F, P, θ), and such that A(k ) = A◦ θ k , def U (k ) = U ◦θ k , k∈Z , def where A = A(0) and U = U (0). Under this assumption, D (k ) = D ◦θ k , with D = D (0). In the deterministic setting, we looked for periodic regimes in terms of eigenpairs associated with the Rmax matrix describing the dynamics of the event graph (see §3.7). In the stochastic setting defined above, we state the problem as the following ‘random eigenpair’ problem. Can we find an eigenvector X = ( X 1 , X 2 ), normalized in such a way that X 1 = e, and an eigenvalue λ, which are both random variables defined on ( , F, P, θ), and such that D X = λ X ◦θ ? 
(7.8) In view of the specific form of D , this eigenpair property reads λ e ⊕ AX 2 = U, = U X 2 ◦θ . (7.9) So the true unknown is X 2 , and the equation it satisfies is a stochastic fixed point equation. Assume that the above eigenpair is finite; whenever we take x (0) = X 2 and u (0) = X 1 = e in Equation (7.7), we obtain u (1) x (1) = U, = e ⊕ AX 2 . (7.10) From (7.9) and (7.10), we see that x (1) − u (1) = X 2 ◦θ + U − U = (x (0) − u (0))◦θ . More generally, we prove in the same way that for all k ≥ 0, x (k ) − u (k ) = (x (0) − u (0))◦ θ k . Therefore, if the above eigenpair problem has a finite solution, we can find an initial condition such that the random variables x (k ) − u (k ) are stationary. Let us show that for this initial condition the marking process is also stationary: let N + (k ) denote the number of tokens in p1 at the epoch when transition q2 fires for the k -th time, namely at x (k ). Within our setting, N + (k ) is a random variable. For instance N + (1) is given by the following expression (see §2.5.6): + ∞ N (1) = ∞ 1 { x ( 1 ) ≥u ( h ) } = h =1 h =1 1{ X 2 ◦θ ≥ h −1 l =1 U ◦θ l } , 7.2. A Simple Example in Rmax where obtain 0 1 311 = 0 by convention. Similarly, when using the convention N + (2) ∞ 1 2 = 0, we ∞ = 1 { x ( 2 ) ≥u ( h ) } = h =2 ∞ = h =2 1{ ◦θ ≥ h =2 h −1 l =1 U ◦ θ l } ◦θ 1{ X 2 ◦θ 2 ≥ h −1 l =2 U ◦θ l } = N + (1)◦ θ , and more generally N + (k + 1) = N + (1)◦θ k . 7.2.3.1 Solution of the Eigenpair Problem For this classical example, it is customary to take def = X 2 ◦θ U as unknown, rather than X 2 , mainly because this new unknown is nonnegative; indeed it is immediately seen from (7.9) that ◦θ ◦ = e ⊕ ( A◦θ/U ) = e ⊕ F , (7.11) where def ◦ F = A◦ θ/U . The construction of a solution to (7.11) is based on a backward construction which is common in ergodic theory, and which will be used on several occasions in this chapter. The backward process (k ), k ≥ 0, associated with (7.11) is the random process on the probability space ( , F, P) defined by (0) = e and (k + 1) = = (e ⊕ F (k )) ◦θ −1 e ⊕ F ◦θ −1 (k )◦ θ −1 , k≥0 . (7.12) We will return to the physical interpretation of this process in the next subsection. A nice property of the backward process is that (k ) = (k ; ω) is nondecreasing in k for all ω. This is obtained by induction: it is true that (1) ≥ (0) = e; assuming that (k ) ≥ (k − 1), we obtain from (7.12) that (k + 1)◦θ = ≥ e ⊕ F (k ) e ⊕ F (k − 1) = (k )◦ θ . Let be the a.s. limit of (k ) as k goes to ∞ (the a.s. limit exists because (k ) is nondecreasing for all ω). The random variable may be finite or infinite. In both cases we obtain that satisfies (7.11) by letting k go to ∞ in (7.12). 312 7.2.3.2 Synchronization and Linearity Finiteness of the Eigenvector The main result of this subsection is the following theorem. Theorem 7.9 (Stability condition) If E[ A] < E[U ], then the eigenvector ◦ X = e, ( /U )◦θ −1 is P-a.s. finite. Since U is a.s. finite (U is assumed to have a finite mean), it follows from the very definition of that X is a.s. finite if and only if is a.s. finite. The event { = ∞} is θ -invariant. Indeed, if (ω) = ∞, then (θ(ω)) = ∞, in view of (7.11) and of the assumption that U (ω) < ∞ a.s. Similarly, (θ(ω)) = ∞ implies (ω) = ∞, since A(ω) < ∞ a.s. Therefore, in view of the ergodic assumption, P[ = ∞] is either 0 or 1: either is finite with probability 1, or it is infinite with probability 1 (see Theorem 7.6). 
Lemma 7.10 (Backward star) The following relation holds: l k F ◦θ −h , (k ) = (7.13) l =0 h =1 where the ⊗-product over an empty set (when l = 0) is e by convention. Proof The proof is by induction on k . The relation holds true for k = 0 since both sides are equal to e. Assume the relation holds up to some k ≥ 0. Then using (7.12), we obtain k (k + 1) = l F ◦ θ −1 ⊗ F ◦ θ −h ◦θ −1 ⊕e l =0 h =1 k +1 l F ◦ θ −h ⊕ e = = l =1 h =1 k +1 l F ◦θ −h , l =0 h =1 where we used the distributivity of ⊗ with respect to ⊕ and the associativity and commutativity of ⊕ to pass from the first expression to the second in the last equation. Remark 7.11 The property that (k ) is nondecreasing, which was already shown in the preceding subsection, is obvious from (7.13), since this relation shows that (k ) consists of the maximum of an increasing set of random variables. Proofof Theorem 7.9 If E[ F ] < 0 (or equivalently E[ A] < E[U ]), from the pointwise ergodic theorem we obtain that the a.s. limit 1/ k k F ◦θ lim k →∞ h =1 −h 1 = k k ( A◦ θ −h+1 − U ◦θ −h ) = E[ A − U ] < 0 h =1 7.2. A Simple Example in Rmax 313 holds (we used the obvious property that θ is P-ergodic if and only if θ −1 is P-ergodic). Therefore, k ( A◦ θ −h − U ◦θ −h ) def S (k ) = h =1 tends to −∞ a.s. as k goes to ∞, which in turn implies that S (k ) < 0 for all k greater than a finite random integer L . Hence (k ) is a.s. finite in view of (7.13), since it is the maximum of an a.s. finite number L of finite random variables. Remark 7.12 A partial converse of the preceding result is the following: if E[ A] > E[U ], then no P-a.s. finite solution of (7.11) exists. To prove this, it is enough to show that = ∞, P-a.s., and that is the least nonnegative solution of (7.11). The latter is proved by induction. If we start with a nonnegative solution of (7.11) for which ≥ (0) = e, it is easily checked that ≥ (k ) implies ≥ (k + 1) (this is true because (k + 1)◦θ = e ⊕ F (k ) ≤ e ⊕ F = ◦θ ). As for the proof of = ∞, P-a.s., it follows from the fact that S (l ) then tends to ∞ a.s. as l goes to ∞. This in turn implies that (k ) tends to infinity as well, in view of (7.13). Remark 7.13 The random variable may be finite and nonintegrable. A simple example of this situation is provided by the M / G/1 case (namely u (k ) is the k -th epoch of a Poisson process and {α(k )} is an independent i.i.d. sequence), whenever the service times α(k ) have infinite second moments (see [46]). 7.2.4 Relation with the Event Graph This section focuses on the relationship between the eigenpair which was constructed in the previous section and the stochastic event graph which motivated our preliminary def ◦ example. Consider the ‘ratios’ δ(k ) = x (k + 1)/u (k ) = x (k + 1) − u (k ), k ≥ 0. By using (7.3), we obtain x (k + 2) − u (k + 1) = max ( A(k + 1) + x (k + 1), u (k + 1)) − u (k + 1) = = max ( A(k + 1) + x (k + 1) − u (k + 1), 0) max ( A(k + 1) + δ(k ) − U (k ), 0) , which corresponds to the Rmax relation δ(k + 1) = e ⊕ F (k )δ(k ) , k≥0 , (7.14) where the initial condition δ(0) is given by the relation def ◦ ◦ δ(0) = ( A(0)x (0) ⊕ u (0))/ u (0) = (w2 ⊕ w1 )/w1 , (7.15) ◦ and F (k ) = A(k + 1)/U (k ) = F ◦θ k , k ≥ 0. When making use of Assumption (7.2), we obtain ◦ δ(0) = w2/w1 ≥ e , (7.16) 314 Synchronization and Linearity for all weakly compatible initial lag times w . This lower bound is achievable whenever w1 = w2 = e. 
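Lemma 7.10 and Theorem 7.9 above translate into a very simple numerical scheme (ours, with i.i.d. exponential data standing in for the stationary sequences): Φ(k) is the running maximum of the backward partial sums of F, so it can be computed in one pass along a sample path. Under E[A] < E[U] the running maximum stabilizes at a finite value, whereas for E[A] > E[U] it drifts to +∞ (Remark 7.12).

```python
# Numerical sketch (ours) of the backward construction (7.13): the running
# maximum of the partial sums of the increments F∘θ^{-h} = A∘θ^{1-h} − U∘θ^{-h}.
import random

def backward_star(F_samples):
    """Return Φ(0), Φ(1), ...: running max of the backward partial sums of F."""
    phi, s, out = 0.0, 0.0, [0.0]
    for f in F_samples:
        s += f
        phi = max(phi, s)
        out.append(phi)
    return out

random.seed(1)
K = 10000
A = [random.expovariate(1 / 0.8) for _ in range(K)]   # E[A] = 0.8
U = [random.expovariate(1 / 1.0) for _ in range(K)]   # E[U] = 1.0
F = [a - u for a, u in zip(A, U)]                     # one sample of the increments
phi = backward_star(F)
print(phi[100], phi[1000], phi[-1])   # nondecreasing, eventually constant a.s.
```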
In what follows we will emphasize the dependence on the initial condition by adding a second optional argument to the δ function: for instance, δ(k ; z ) will denote the value of δ(k ) whenever δ(0) = z . Of special interest to us will be the sequence {δ(k ; e)} defined by (7.14), and by the initial condition δ(0; e) = e. It is immediately checked by induction that (k ) = δ(k ; e)◦ θ −k , (7.17) which shows that δ(k ; e) and (k ) have the same probability law. Indeed, this is true for k = 0, and assuming it is true for some k ≥ 0, we obtain from (7.12) that (k + 1) = = = e ⊕ F (δ(k ; e)◦ θ −k ) ◦θ −1 (e ⊕ F (k )δ(k ; e)) ◦θ −k−1 δ(k + 1; e)◦θ −k−1 , where we used the property that F = F (k )◦ θ −k . Therefore, the random variable (k ) stands for the value of δ(k ; e), when replacing the sequence U (0), U (1), . . . by U (−k ), U (1 − k ), . . . , and the sequence of holding times A(0), A(1), . . . by A(−k ), A(1 − k ), . . . , respectively. Remark 7.14 Another interpretation is as follows: we go backward in time and we define the continuation of u (k ) by 0 u (k ) = u (1) − U (l ) , k≤0 . l =k If we assume that the holding time of the k -th token in p2 is α(k ), and that the entrance of the k -th token in p1 takes place at time u (k ), for all k ∈ Z, we can then interpret ◦ ◦ (k ) as the value of x (1)/u (0) given that the value of x (−k + 1)/u (−k ) is e. Remark 7.15 Assume is a.s. finite. If we take the initial lag times such that w2 = w1 , (7.18) then δ(0) = , so that δ(1) is equal to ◦θ , in view of (7.16); more generally we have δ(k ) = ◦θ k for all k ≥ 0. In words, we found initial lag times which make the ratio process δ(k ) θ -stationary. Observe, however, that these initial lag times are not weakly compatible in general. For instance, if we take w1 = u (0) = e (w1 and w2 are only defined through (7.18) up to an additive constant), and if > α(1) with a positive probability, then (7.18) shows that the compatibility relation w2 ≤ α(1) cannot hold almost surely. Remark 7.16 If the ratios δ(k ) are finite and θ -stationary, the θ -stationarity of the ◦ ratios x (k + 1)/x (k ) is easily obtained from the following relation x (k + 1) x (k + 1) u (k ) u (k − 1) = . x (k ) u (k ) u (k − 1) x (k ) 7.2. A Simple Example in Rmax ◦ If x (k + 1)/u (k ) = ◦θ k 315 ◦ , since u (k + 1)/u (k ) = U ◦θ k , we then have ◦ x (k + 1)/x (k ) = k◦ k −1 ◦ ◦θ /U ◦θ / Therefore ◦ ◦ /U ◦θ −1 / ◦ x (k + 1)/ x (k ) = ◦θ ◦θ −1 k −1 ◦θ . k . Remark 7.17 By computing the star operation forward (or more directly by using (7.17) and (7.13)), we obtain the expression k k −1 δ(k ; e) = F (h ) . (7.19) l =0 h=k −l The sequence {δ(k ; e)} is not monotone in general. 7.2.5 Uniqueness and Coupling The main result of this section is the following theorem. Theorem 7.18 If the stability condition E[ A] < E[U ] is satisfied, there exists a unique finite random eigenvalue λ and a unique finite random eigenvector X = ( X 1 , X 2 ), with X 1 = e, such that (7.8) holds. In addition, for all finite random initial conditions X (0) = ( X 1 (0), X 2 (0)) with X 1 (0) = e, there exists a finite integer-valued random variable K such that, for k ≥ K , X (k + 1) = D (k ) D (k − 1) . . . D (1) D X (0) = λ(k )λ(k − 1) . . . λ(1)λ(0) X ◦θ k+1 , (7.20) where X and λ are defined as above. The main tool for proving this theorem is the notion of coupling. 
Definition 7.19 (Coupling) The random sequence {W (k )}k≥0 defined on the probability space ( , F, P) couples in finite time (or simply couples) with the stationary sequence generated by the random variable V if there exists a finite integer-valued random variable K such that W (k ) = V ◦θ k , ∀k ≥ K . We also say that the sequence {V ◦θ k } is reached by coupling by the sequence {W (k )}. Coupling implies convergence in total variation; in particular, if {W (k )} couples with the sequence generated by V , then W (k ) converges weakly to V as k goes to ∞. (see [6, Chapter 2]). We start with the following lemma which deals with the considered stochastic event graph. Lemma 7.20 Assume that E[ A] < E[U ]. Then for all finite and compatible initial lag times w = (w1 , w2 ), there exists a positive integer H (w ; ω) such that for all k ≥ H (w), δ(k ; z ) = δ(k ; e), where z = z (w ; ω) is the initial condition defined in (7.15). 316 Synchronization and Linearity Proof We first prove that H (w) = H (w), where def H (w) = inf {k ≥ 0 | δ(k ; z ) = δ(k ; e)} . In words, after the first time when δ(k ; z ) and δ(k ; e) meet, their paths are identical forever. The proof is by induction: if for some ω, δ(k ; z ; ω) = δ(k ; e; ω), then from (7.14) we obtain that δ(k + 1; z ; ω) = δ(k + 1; e; ω). It is easily checked by induction on k that for all weakly compatible initial lag times δ(k ; z ) ≥ δ(k ; e) ≥ e, for all k ≥ 0. Assume that the statement of the theorem does not hold. Then the paths δ(k ; z ) and δ(k ; e) never meet with a positive probability, so that the event A = {δ(k ; z ) > δ(k ; e) ≥ e, ∀k ≥ 0} has a positive probability. For all ω in A, we obtain δ(k ; z ) = = e ⊕ F (k − 1)δ(k − 1; z ) F (k − 1)δ(k − 1; z ) , for all k ≥ 0. Therefore, if A has a positive probability, the relation k −1 δ(k ; z ) = z ⊗ F ◦θ l l =0 −1 holds with a positive probability. Owing to the ergodic assumption, lk=0 F ◦θ l tends to −∞ a.s. if E[ A] < E[U ]. Therefore, under the assumption E[ A] < E[U ], the last relation readily implies that δ(k ; z ) → −∞ when k → ∞, with a positive probability, which is impossible since δ(k ; z ) ≥ 0. The general coupling property for the ratio process of the event graph is summarized in the following lemma. Lemma 7.21 Let w be an arbitrary finite and compatible initial lag time vector. The sequence {δ(k ; z )} couples with the sequence generated by the random variable , so that δ(k ; δ(0; w)) converges weakly to when k tends to ∞. If E[ A] > E[U ], then δ(k ; z ) converges a.s. to ∞ when k tends to ∞. More precisely, ◦ lim (δ(k , z )) 1/ k = E[ A/U ] > e k a.s. Proof From Lemma 7.20, there exists a finite integer H = H (z ) > 0 such that for all k ≥ H , δ(k ; z ) = δ(k ; e) a.s. Using again Lemma 7.20, we obtain another finite integer H = H ( ) such that for all k ≥ H , ◦θ k = δ(k ; ) = δ(k ; e) a.s. Hence, for all k ≥ max { H , H }, ◦θ k = δ(k ; z ) a.s. As for the case E[ A] > E[U ], we should use the bound δ(k ; z ) ≥ δ(k ; e) and the fact that ◦ lim (δ(k ; e))1/ k = E[ A]/E[U ] > e , k which follows from (7.13), to prove that lim δ(k ; z ) = ∞ a.s. 7.2. A Simple Example in Rmax Corollary 7.22 If E[ A] < E[U ], then 317 is the unique finite solution of (7.11). Proof The uniqueness of the possible stationary regimes follows from the coupling property: if is another stationary regime, namely a finite solution of (7.11), then we first obtain ◦θ ≥ e a.s. from the fact that satisfies (7.11), so that is necessarily nonnegative. 
In addition, from the coupling property we obtain that ◦θ k = δ(k ; ) = δ(k ; e) = δ(k ; ) = ◦θ k , for all k ≥ max { H ( ), H ( )} < ∞. Therefore = . Proof of Theorem 7.18 The existence part is established in §7.2.3.1–7.2.3.2. For λ and X as in the theorem, we must have λ = U and X 2 ◦θ U necessarily satisfies (7.11). Therefore, X 2 ◦θ U = in view of Corollary 7.22. The last property of the theorem is a mere rephrasing of Lemma 7.21, once we notice that the coupling of {δ(k )} with ◦ { ◦ θ k } implies that of {x (k )/ u (k )} with { X 2 ◦θ k }. In fact, the last assertion is only proved for initial conditions such that δ(0) ≥ e (see the proof of the Lemma 7.21); the extension to more general finite initial conditions is obtained in the same way. Remark 7.23 The only difficulty in the preceding eigenpair problem lies in finding X 2 , or equivalently . In the case when the sequences { A(k )} and {U (k )} are both i.i.d. (independent and identically distributed) and mutually independent, the problem of finding the distribution function of X 2 is solved using Wiener-Hopf factorization [46]. 7.2.6 First-Order and Second-Order Theorems The aim of what follows is primarily to extend Theorem 7.18 to more general classes of matrices D . Of particular interest to us will be matrices which correspond to certain types of autonomous and nonautonomous event graphs, like those introduced in Chapter 2. The results generalizing the eigenpair property of Theorem 7.18 will be referred to as second-order theorems, because they are concerned with ratios of the state variables. These theorems can be seen as Rmax -instances of multiplicative ergodic theorems (see Theorem 7.108). In what follows, the constants which characterize the growth rates of the state variables x j (k ), and which generalize those in Theorem 7.24 below, will be referred to as Lyapunov exponents; these theorems will be called first-order or rate theorems. We conclude the section with the first-order theorem associated with our simple example. Let (e1 , e2 ) denote the following vectors of R2 : e1 = (e, ε) and e2 = (ε, e). Theorem 7.24 The growth rate of X (k ) is characterized by the relations lim X 1 (k ) 1/ k k →∞ = = lim (e1 D (k ) D (k − 1) . . . D (1) D (0) X (0)) 1/ k k →∞ E[U ] a.s., and lim X 2 (k ) 1/ k k →∞ = = lim (e2 D (k ) D (k − 1) . . . D (1) D (0) X (0)) 1/ k k →∞ E[ A] ⊕ E[U ] a.s., 318 Synchronization and Linearity regardless of the (finite) initial condition X (0). Proof The first assertion of the theorem is trivial. As for the second, we have lim (x (k )) 1/ k = ◦ lim (u (k − 1))1/ k lim ( x (k )/ u (k − 1))1/ k = k ◦ E[U ] lim ( x (k )/ u (k − 1))1/ k . k k k If E[ A] > E[U ], then Lemma 7.21 implies that ◦ ◦ lim ( x (k )/ u (k − 1))1/ k = E[ A]/E[U ] . k If E[ A] < E[U ], we obtain from the coupling property of Lemma 7.21 that ◦ lim (x (k )/ u (k − 1))1/ k k = lim ◦θ k k 1/ k 1/ k k = lim ◦ (/ k If E ◦θ −1 )◦θ i . (7.21) i =1 ◦ is integrable in addition to being finite, so is / ◦θ −1 , and we therefore have ◦ / ◦θ −1 = 0; thus Birkhoff’s Theorem and (7.21) immediately imply that ◦ lim ( x (k )/ u (k − 1))1/ k = E ◦ / k ◦θ −1 =0 . Even if is not integrable (which may happen even in this simple case, see Re◦ mark 7.13), the random variable / ◦θ −1 is integrable as can be seen when using the following bounds obtained from (7.11): F ◦ θ −1 ≤ ◦ / ◦θ −1 ≤ A◦ θ −1 . Therefore, in this case too, when using (7.21), we also obtain that ◦ lim (x (k )/ u (k − 1))1/ k = E ◦ / k ◦θ −1 , from Birkhoff’s Theorem. 
We now prove that E − ◦θ −1 =0 , (7.22) which implies that in this case too limk ( x (k )) 1/ k = E[U ]. In order to prove (7.22), observe that min( , t ) − min( ◦ θ −1 , t ) ≤ − ◦ θ −1 , for all t ∈ R+ . Thus, from the Lebesgue dominated convergence theorem, we obtain that 0= = lim E min( , t ) − min( t →∞ E lim (min( , t ) − min( t →∞ ◦θ −1 ◦θ , t) −1 , t )) = E − ◦θ −1 . 7.3. First-Order Theorems 319 If E[ A] = E[U ], either is a.s. finite, in which case the preceding method applies, or it is a.s. infinite. We will consider the case = ∞ a.s. later on (see Theorem 7.36); the result is that lim ( x (k ))1/ k = E[U ] k in this case too. The proof of the following lemma is contained in the proof of the preceding theorem, and will be important in what follows. Lemma 7.25 If is a finite (not necessarily integrable) random variable such that − ◦θ −1 < ∞, where |x | denotes conven− ◦θ −1 is integrable, namely E tional absolute value in R, then E − ◦ θ − 1 = 0. 7.3 First-Order Theorems 7.3.1 Notation and Statistical Assumptions Let D be a general dioid. For A ∈ D p ×q , let p q def | A|⊕ = Ai j (7.23) Ai j . (7.24) i =1 j =1 and p q def | A|∧ = i =1 j =1 We will often use the following properties. Lemma 7.26 For all pairs of matrices ( A, B ) such that the product AB is well defined, we have | AB |⊕ ≤ | A|⊕ | B |⊕ , | AB |⊕ ≥ | A|∧ | B |⊕ , (7.25) | AB |⊕ ≥ | A|⊕ | B |∧ , and | AB |∧ | AB |∧ | AB |∧ ≥ ≤ ≤ | A|∧ | B |∧ , | A|∧ | B |⊕ , | A|⊕ | B |∧ , where ≤ is the order associated with ⊕ in D. Proof Since Aik ≤ | A|⊕ for all i, k , Aik Bk j ≤ | A|⊕ i, j k Bk j = | A| ⊕ | B |⊕ . j ,k (7.26) 320 Synchronization and Linearity The proof of the other formulæ is similar. The equation of interest in this section is x (k + 1) = A(k ) x (k ) , (7.27) where A(k ), respectively x (k ), is a random square matrix, respectively a random column vector, with entries taking their values in D. We will stress the dependence of x (k ) on the initial condition by writing x (k ; x 0 ). The random variables A(k ), k ∈ Z, and the initial condition x 0 are assumed to be defined on a common probability space ( , F, P, θ), where θ is a shift which leaves P invariant, and is ergodic, with A(k ) = A◦ θ k , k∈Z . (7.28) Most of the section will be devoted to the case when D is Rmax . Within this setting, x (k ) will be an n-dimensional column vector and A(k ) an n × n matrix. In this case, each entry of A is either a.s. equal to ε or nonnegative, and each diagonal entry of A is nonnegative. We start with a few examples in this dioid. 7.3.2 Examples in Rmax 7.3.2.1 Example 1: Autonomous Event Graphs Consider the evolution equation of a FIFO and autonomous stochastic event graph in its standard form, as given in Equation (2.31). If the initial condition of this event graph is compatible, (2.31) is of the type (7.27). In addition assume that the holding times αi (k ), pi ∈ P , and the initial lag times of this event graph are random variables defined on a common probability space ( , F, P, θ), and that the sequence {αi (k )} is θ -stationary, i.e. αi (k ) = αi ◦θ k , k ∈ Z , pi ∈ P , where αi is finite, nonnegative and integrable. Then it easily checked that the matrices A(k ) in (2.31) satisfy the θ -stationarity property and that each entry of A is either a.s. equal to ε or nonnegative and integrable. In view of the FIFO assumption, it is always true that x j (k + 1) ≥ x j (k ), so that the diagonal entry A j j (k ) can be assumed to satisfy the bound A j j (k ) ≥ e without loss of generality. 
Therefore, under the foregoing statistical assumptions, any FIFO and autonomous stochastic event graph with compatible initial condition satisfies an evolution equation which falls into the framework considered above. Conversely, as was pointed out in §2.5.4, we can also view any equation of the type (7.27) as the standard evolution equation of an event graph with compatible initial condition and where the initial marking is (0, 1)-valued. 7.3.2.2 Example 2: Nonautonomous Event Graphs Similarly, consider the evolution equation of a FIFO nonautonomous stochastic event graph in its standard form (2.39). If the initial condition is compatible, this equation then reads x (k + 1) = A(k )x (k ) ⊕ B (k )u (k ) . (7.29) 7.3. First-Order Theorems 321 If we define X (k ) to be the following M (|Q| + |I |)-dimensional vector and A(k ) to be the following matrix: u (k ) x (k ) X (k ) = , U (k ) B (k ) A(k ) = ε A(k ) , (7.30) where U (k ) is the diagonal matrix with entries ◦ U j j (k ) = u j (k + 1)/u j (k ) , qj ∈ I , (7.31) then it is immediate that (2.39) can also be rewritten as X (k + 1) = A(k ) X (k ) , k ≥1 . (7.32) This transformation is tantamount to viewing each input transition j as a recycled transition where the holding times of the recycling place are given by the sequence U j j (k ) . If the holding times αi (k ) and the inter-input times U j j (k ) satisfy the θ stationarity conditions αi (k ) = αi ◦θ k , pi ∈ P , U j j (k ) = U j j ◦θ k , qj ∈ I , k ∈Z , where the random variables αi and U j j are positive and integrable, then the matrices A(k ) satisfy the θ -stationarity condition (7.28) and the additional conditions mentioned above. Hence, the framework described at the beginning of this section also covers the nonautonomous case, provided we make additional θ -stationarity assumptions on the inter-input times. 7.3.3 Maximal Lyapunov Exponent in Rmax We assume that the nonnegative entries of A are all integrable. Under this condition the sequence {x (k ; x 0 )} defined by (7.27) converges to ∞ a.s. in a way which is quantified by the following theorem. Theorem 7.27 There exists a constant e ≤ a < conditions x 0 , the a.s. limit = ∞ such that, for all finite initial lim |x (k ; x 0 )|1/ k = lim | A(k − 1) A(k − 2) . . . A(k ) A(0) x 0 |1/ k = a a.s. ⊕ ⊕ k →∞ k →∞ (7.33) holds. If the initial condition is integrable, in addition we have lim E |x (k ; x 0 )|1/ k = lim E |x (k ; x 0 )|⊕ ⊕ k →∞ 1/ k k →∞ =a . (7.34) Proof By induction, we obtain that |x (k ; e)| ⊕ is integrable for all k ≥ 0 (using the integrability assumptions together with the fact that max(a , b ) ≤ |a | + |b |, for a and b in R). Therefore we have e ≤ E |x (k ; e)| ⊕ < , ∀k ≥ 0 . 322 Synchronization and Linearity Let ξm ,m +k = |x (k ; e)| ⊕ ◦θ m , Since |x (k , e)| ⊕ = A◦ θ k−1 . . . Ae m∈Z , ⊕ k≥0 . = A◦ θ k−1 . . . A ⊕ (7.35) , we obtain from Lemma 7.26 that for all k ≥ 1, and all 0 ≤ p ≤ k , A◦ θ k−1 . . . A◦ θ p A◦ θ p −1 . . . A ⊕ ◦θ m ≤ A◦ θ k−1 . . . A◦ θ p ⊕ ◦θ m A◦ θ p −1 . . . A ⊕ ◦θ m , that is, ξm ,m +k ≤ ξm ,m + p + ξm + p ,m +k, so that ξm ,m +k is a nonnegative and integrable subadditive process. From Kingman’s Theorem on subadditive ergodic processes (see Theorem 7.106), we obtain lim (ξ0k )1/ k = lim E (ξ0k )1/ k = a a.s., k →∞ k →∞ for some constant a < ∞, which concludes the proof for |x (k ; e)| ⊕ . From the relation x (k ) = A(k − 1) . . . A(0)x 0 and from (7.25), we obtain the immediate bounds |x (k ; e)| ⊕ |x 0 |∧ ≤ |x (k ; x 0 )|⊕ ≤ |x (k ; e)| ⊕ |x 0 |⊕ , k≥0 , ∀x 0 finite . 
Therefore |x (k ; e)| 1/ k |x 0 |1/ k ≤ |x (k ; x 0 )|1/ k ≤ |x (k ; e)| 1/ k |x 0 |1/ k , ⊕ ∧ ⊕ ⊕ ⊕ (7.36) for all k ≥ 0. Property (7.33) follows immediately when letting k go to ∞. If, in addition, x 0 is integrable, we first prove by induction that x (k ; x 0 ) is integrable for all k ≥ 0. We can hence take expectations in (7.36) and use the fact that limk→∞ E (ξ0k )1/ k = a to obtain (7.34). Remark 7.28 Certain representations of stochastic event graphs considered in Chapter 2, such as the representation of Corollary 2.62 for instance, involve initial conditions with ε entries, for which Theorem 7.27 cannot be applied directly. However, it is easy to check that one can replace these entries by appropriate finite entries without altering the value of x (·). Remark 7.29 It will also be useful to know when the constant a is strictly positive. A sufficient condition for this is that there exists at least a circuit of the precedence graph of A and two nodes i0 and j0 in this circuit such that E A j0 i0 (k ) > e. Under this condition the positiveness of a is obtained from the bound x j0 (k n) ≥ A j0 i0 (k n − 1)x i0 ((k − 1)n) . This in turn implies E x j0 (k n; e) ≥ k E A j0 i0 (k ) = kC , with C > 0, which implies that a > e. Note that in the stochastic event graph setting, this condition is tantamount to having a circuit of the event graph with at least one place with a positive mean holding time. 7.3. First-Order Theorems 7.3.4 323 The Strongly Connected Case The framework is that of the previous section. Let G ( A) denote the precedence graph of the square matrix A (see §2.3). Although matrix A depends on ω, the assumption that its entries are either a.s. equal to ε or a.s. finite implies that G ( A) is either a.s. strongly connected or a.s. nonstrongly connected (or equivalently, either A is a.s. irreducible or it is a.s. nonirreducible). In this subsection we assume that we are in the former case. Remark 7.30 The assumption that A is irreducible and the assumption that the diagonal entries of A are different from ε imply that A is aperiodic (see Definition 2.15 and the theorem which follows this definition). More precisely, the matrix def G (k ) = A(k + n − 1) A(k + n − 2) . . . A(k ) , k∈Z , (7.37) is such that G i j (k ) ≥ e for all pairs (i, j ) ∈ {1, . . . , n}2 . We know from the preceding subsection that |x (k )| ⊕ grows like ak . In fact, in the case considered here, each individual state variable x j (k ) has the same growth rate, as shown in the following lemma. Corollary 7.31 If matrix A is irreducible, then for all finite initial conditions x 0 , and for all j = 1, . . . , n, we have lim x j (k ; x 0 ) 1/ k k →∞ = a a.s., (7.38) where a is the maximal Lyapunov exponent of Theorem 7.27. If the initial condition is integrable, we also have lim E k →∞ x j (k ; x 0 ) 1/ k =a . (7.39) Proof From Remark 7.30, we obtain that x j (k ; x 0 ) ≥ x i (k − n; x 0 ) for all i, j = 1, . . . , n, and k > n. The property (7.38) follows then from the bounds |x (k − n)|⊕ ≤ x j (k ) ≤ |x (k )| ⊕ , ∀ j = 1, . . . , n , (7.40) and from Theorem 7.27. Corollary 7.32 Under the foregoing assumptions, if A is irreducible, the a.s. limits lim |x (k )| 1/ k = a a.s. ∧ (7.41) lim | A(k − 1) . . . A(1) A(0)| 1/ k = a a.s. ∧ (7.42) k →∞ and k →∞ hold. 324 Synchronization and Linearity Proof Equation (7.41) follows from (7.40). As for the second relation, it is immediate that lim sup | A(k − 1) . . . A(1) A(0)| 1/ k ≤ a a.s. ∧ k In addition, we have ( A(k − 1 + n) A(k − 2 + n) . . . 
A(1) A(0)) j i n = ( A(k − 1 + n) A(k − 2 + n) . . . A(n + 1) A(n)) jl l =1 ⊗ ( A(n − 1) A(n − 2) . . . A(1) A(0)) li . In view of Remark 7.30, this implies ( A(k − 1 + n) A(k − 2 + n) . . . A(1) A(0)) j i n ≥ ( A(k − 1 + n) A(k − 2 + n) . . . A(n + 1) A(n)) jl l =1 = x (k , e)◦ θ n ⊕ . Thus the a.s. limit 1/ k lim inf( A(k − 1) . . . A(1) A(0)) j i ≥ a a.s. k holds as a direct consequence of Theorem 7.27. Remark 7.33 Thus, in the case of a strongly connected stochastic event graph, all transitions have the same asymptotic firing rate; the constant a is also called the cycle time of the strongly connected event graph. Its inverse a−1 is often called its throughput. 7.3.5 General Graph Consider the decomposition of G ( A) into its m.s.c.s.’s (§2.2). For the same reasons as above, the number N A of its m.s.c.s.’s and their topologies are nonrandom. We will use the notations of §2.2 for the m.s.c.s.’s and the reduced graph (V , E ) of G ( A). The reduced graph is acyclic and connected (provided the precedence graph is connected, which will be assumed in the what follows). Remember that a m.s.c.s. (Vn , En ) is said to be a source subgraph if node n of the reduced graph has no predecessors, and that it is said to be a nonsource subgraph otherwise. Remark 7.34 In the particular case when the equation of interest is that of a nonautonomous event graph of the form (7.32), each recycled transition associated with an input transition will be seen as a source subgraph. When there is no ambiguity, the notation π , π ∗ and π + will be used to represent the usual sets of predecessor nodes in the reduced graph. Without loss of generality, 7.3. First-Order Theorems 325 the numbering of the nodes is assumed to be compatible with the graph in the sense that (m , n ) ∈ E implies m < n . In particular, the source subgraphs are numbered {1, . . . , N0 }. For all 1 ≤ n ≤ N A , we will make use of the restrictions A(n)(m ) , x (n) , A(≤n)(≤m ) , x (≤n) , etc. defined in Notation 2.5. def The maximal Lyapunov exponent associated with the matrix A(n) (k ) = A(n)(n) (k ) def def (respectively A(≤n) = A(≤n)(≤n) or A(<n) = A(<n)(<n) ) will be denoted a(n) (respectively a(≤n) or a(<n) ). Observe that in general, x (n) (k ) does not coincide with the solution of the evolution equation y (k + 1) = A(n) (k ) y (k ) , k ≥ 0 , with initial condition y (0) = x (n) (0). However, the sequence x (≤n) (k ) (respectively x (<n) (k ) ) is the solution of the evolution equation (respectively x (≤n) (k + 1) x (<n) (k + 1) = = A(≤n) (k )x (≤n) (k ) , A(<n) (k )x (<n) (k )) , k≥0 , with initial condition x (≤n) (0) (respectively x (<n) (0)). Lemma 7.35 For all finite initial conditions, the following a.s. limits hold: lim x (n) (k ) k →∞ 1/ k ⊕ = a ( ≤n ) a.s., (7.43) ∀ j ∈ Vn . (7.44) and lim x j (k ) 1/ k k →∞ = a ( ≤n ) a.s., If the initial condition is integrable, we also have lim E x (n) (k ) k →∞ 1/ k ⊕ = a ( ≤n ) (7.45) and lim E x j (k ) k →∞ 1/ k = a ( ≤n ) , Proof It is obvious from the definition that x (n) (k ) lim inf x (n) (k ) k 1/ k ⊕ ∀ j ∈ Vn . ⊕ (7.46) ≤ x (≤n) (k ) ⊕ , so that ≤ a ( ≤n ) . When using the fact that there exists a path of length less than n from h to j in G ( A), for all j ∈ Vn and h ∈ m ∈π ∗ (n) Vm , together with the assumption on the diagonal entries of A, we obtain the following bound from (7.27): x j (k + 1) ≥ x h (k − n) , {h∈Vm ,m ∈π ∗ (n)} ∀ j ∈ Vn , 326 Synchronization and Linearity provided k ≥ n. 
Therefore x (n) (k + 1) ⊕ ≥ x (≤n) (k − n) ⊕ , for k ≥ n, so that lim sup x (n) (k ) k 1/ k ⊕ ≥ a ( ≤n ) a.s. The proof of the individual a.s. limits in (7.44) follows the same lines as in Corollary 7.31. In the integrable case, the proof of the convergence of the expectations is immediate. Owing to the acyclic nature of the reduced graph, the vector x (n) (k ) satisfies the equation x (n) (k + 1) = A(n) (k )x (n) (k ) ⊕ s (n , k + 1) , (7.47) where def s (n , k + 1) = A(n)(<n) (k )x (<n) (k ) . (7.48) Equation (7.47) is the basis for proving the following property. Theorem 7.36 The constant a(≤n) , which characterizes the growth rate of the variables x j (k ), j ∈ Vn , is obtained from the constants a(m ), 1 ≤ m ≤ n, by the relation a ( ≤n ) = a (m ) . (7.49) m ∈π ∗ (n) Proof We first prove that lim |s (n , k )| 1/ k = a(<n) ⊕ k →∞ a.s., (7.50) for all N0 < n ≤ N . From (7.48), we obtain that |s (n , k + 1)|⊕ ≤ | A(k )| ⊕ x (m )(k ) ⊕ , m ∈π + (n) so that |s (n , k + 1)|1/ k ≤ | A(k )| 1/ k x (<n) (k ) ⊕ ⊕ 1/ k ⊕ . (7.51) The integrability assumption on A implies that lim | A(k )| 1/ k = e ⊕ k a.s. (using the same technique as in the proof of Theorem 7.24). Letting k go to ∞ in (7.51) then implies lim inf |s (n , k )| 1/ k ≤ a(<m ) a.s. ⊕ k →∞ By using the same type of arguments as in Lemma 7.35, from (7.48) we obtain that |s (n , k + 1)|⊕ ≥ x (<n) (k − n) ⊕ a.s., 7.3. First-Order Theorems which in turn implies 327 lim sup |s (n , k )|1/ k ≥ a(<m ) ⊕ k →∞ This concludes the proof of (7.50). It is clear from (7.48) that x (n) (k + 1) 1/ k 1/ k ⊕ a.s. ≥ |s (n , k + 1)|⊕ , so that necessarily x (n) (k + 1) ⊕ ≥ |s (n , k + 1)|⊕ , and hence a(≤n) ≥ a(<n) . Owing to the individual limits of Equation 7.44, for all j ∈ Vn , (x (n) ) j (k ) ∼ ak≤n) , whereas |s (n , k )| ⊕ ∼ ak n) , ( (< so that if a(≤n) > a(<n) , then there exists a finite integer-valued random variable K such that A(n) (k )x (n) (k ) ≥ s (n , k ) , ∀k ≥ K . Accordingly, Equation (7.48) reads x (n) (k + 1) = A(n) (k )x (n) (k ) , for k ≥ K . Let y (k ; x 0 ) denote the solution of the equation y (k + 1) = A(n) (k ) y (k ) , k ≥0 , with initial condition y (0) = ( x 0 )(n) . On { K = h }, we have x (n) (k ) = A(n) (k ) . . . A(n) (h )x (n) (h ) = = ( A(n) (k − h ) . . . A(n) (0)x (n) (h )◦ θ −h )◦θ h y (k − h ; x (n) (h )◦ θ −h )◦θ h , for all k ≥ h . Thus, on the event { K = h } lim x (n) (k ) k 1/ k ⊕ = lim y (k − h ; x (n) (h )◦ θ −h ) k 1/ k ⊕ ◦θ h = a (n ) a.s., (7.52) where we used the a.s. convergence result of Theorem 7.27 applied to matrix A(n) (k ). Since K is finite, h { K = h } = , so that lim x j (k ) k 1/ k = a (n ) a.s., ∀ j ∈ Vn . Therefore, a(≤n) ≥ a(<n) , and a(≤n) > a(<n) implies a(≤n) = a(n) , that is, a(≤n) = a(<n) ⊕ a(n) . The proof of (7.49) is obtained from the last relation by an immediate induction on n . Example 7.37 (Acyclic fork-join networks of queues) Consider the stochastic event graph of Figure 7.2. This example features an acyclic fork-join queuing network which is characterized by an acyclic graph G = (V , E ), with nodes {0, 1, . . . , n}. In this graph, π ( j ) will denote the set of predecessors of node j and σ ( j ) the set of its successors. This graph has a single source node denoted 0. With this graph, we associate a FIFO stochastic event graph, for which we use the conventional notation. The set of transitions is {q0 , q1 , . . . 
, qn }; each transition q j is recycled, with associated place p j j , 0 ≤ j ≤ n; in addition, a place p j i is associated with each pair of nodes 0 ≤ i, j ≤ n such that i ∈ π ( j ). Transition q0 , 328 Synchronization and Linearity q3 p33 p31 p10 q0 p00 q1 p41 p53 p54 q4 p11 p65 p55 p44 p20 q5 p62 q6 p66 q2 p22 Figure 7.2: An acyclic fork-join network of queues which generates the input to the queuing network, is a recycled transition such that σ (q0 ) = { p00 , pi 0 }i ∈σ (0), and π(q0 ) = { p0 }. Transition q j , which represents queue j of the queuing network, admits the set { p j j , p j i }i ∈π ( j ) as a predecessor set and the set { p j j , pi j }i ∈σ ( j ) as a successor set. If σ ( j ) has two or more elements, one says that there is a ‘fork’ from queue j to the queues of σ ( j ). As a result of this fork, when a departure takes place from queue j , this creates simultaneous arrivals into successor queues. Similarly, when π ( j ) has two or more elements, one says that there is a ‘join’. Clearly, the effect of a join in queue j is to synchronize the outputs of the queues of π ( j ). Let α j (k ) denote the holding times in p j j and U (k ) denote those in p00 , and assume that all the other holding times are zero. For instance, the matrix A(k ) associated with the autonomous event graph of Figure 7.2 is characterized by the formula α0 (k ) e e ε ε ε ε ε α1 (k ) ε e e ε ε ε ε α2 (k ) ε ε ε e ε ε α3 (k ) ε e ε . A(k − 1) = ε ε ε ε ε α4 (k ) e ε ε ε ε ε ε α5 (k ) e ε ε ε ε ε ε α6 (k ) It is easily checked that N A = n + 1, and that each m.s.c.s. consists of exactly one transition, so that the firing rate of transition j is simply a (≤ j ) = E [α j ] . i ∈π ∗ ( j ) 7.3.6 First-Order Theorems in Other Dioids The rate theorem (Theorem 7.27) which was established in the preceding sections for ⊕ = max and ⊗ = + is essentially based on the following two ingredients: 7.4. Second-Order Theorems; Nonautonomous Case 329 • the relation | AB |⊕ ≤ | A|⊕ | B |⊕ ; (7.53) • Kingman’s subadditive ergodic theorem which relies on the fact that ⊗ = +. The first result holds in a general dioid (see Lemma 7.26). Example 7.38 For instance, if D = Rmin , the first relation of (7.25) translates into the property | AB | ⊕ ≥ | A|⊕ | B |⊕ (where ≥ denotes here the conventional ordering in R), since ≤ corresponds to the ‘reverse’ of ≤min . In order to extend the second result to a general dioid, we need a ‘sub-⊗ ergodic theorem’ stating that any D-valued random sequence {amn }m <n∈N which satisfies the conditions amn ≤ amp ⊗ a pn , ∀m < p < n , and am ,m +k = a0k ◦θ m , ∀m , k ∈ N , is such that ∃ lim (a0k )1/ k = a a.s. , k →∞ where a is some constant (the meaning of the limit will not be discussed precisely here). For instance, such a theorem will follow from Kingman’s subadditive ergodic theorem if ⊗ is + or ×. Thus, under the same type of statistical assumptions as in Theorem 7.27, we can prove the existence of maximal Lyapunov exponents for linear systems of the type (7.27) in Rmin . 7.4 7.4.1 Second-Order Theorems; Nonautonomous Case Notation and Assumptions In this section the dioid D under consideration is general. The basic equation of interest is the evolution equation x (k + 1) = A(k ) x (k ) ⊕ B (k )u (k ) , k≥0 , (7.54) where x (k ), u (k ) and A(k ) all belong to D. The sequences { A(k ), B (k )}k ∈Z and {u (k )}k ≥0 are assumed to be given, as well as the initial condition x 0 . The sequence {u (k )} is assumed to be finite and nondecreasing in k . 
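Equation (7.54) is also easy to explore by simulation. The sketch below (our own illustration; the distributions, the value of E[U] and the choice of B are arbitrary assumptions) first estimates the maximal Lyapunov exponent a of the sequence {A(k)} from the autonomous recursion, then simulates (7.54) with a finite nondecreasing input, and compares the empirical growth rate of the x_j(k) with a ⊕ E[U], as suggested by the first-order results of §7.3 (Theorem 7.36 applied to the augmented matrix (7.30)):

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 2, 20000

def mp_mv(A, x):
    """Max-plus matrix-vector product: (A (x) x)_i = max_k (A_ik + x_k)."""
    return np.max(A + x[None, :], axis=1)

def sample_A():
    # i.i.d. finite holding times; diagonal entries >= e = 0 hold here trivially
    return rng.uniform(0.0, 1.0, size=(n, n))

mean_U = 0.8                # assumed mean inter-input time E[U]
B = np.zeros(n)             # B(k) = (e,...,e)': the input feeds every state variable

# Estimate the maximal Lyapunov exponent a of A(.) (Theorem 7.27): |y(K)|_oplus / K.
y = np.zeros(n)
for k in range(K):
    y = mp_mv(sample_A(), y)
a_hat = np.max(y) / K

# Simulate x(k+1) = A(k) x(k) (+) B u(k) with u(k+1) = U(k) u(k).
x, u = np.zeros(n), 0.0
for k in range(K):
    x = np.maximum(mp_mv(sample_A(), x), B + u)
    u += rng.uniform(0.0, 2 * mean_U)          # U(k), with mean mean_U

print("estimated a      :", a_hat)
print("x_j(K)/K         :", x / K)             # should approach max(a, E[U])
print("a (+) E[U]       :", max(a_hat, mean_U))
```

Any other integrable holding and inter-input time distributions could be substituted; only E[U] and the (simulation-estimated) exponent a enter the limiting rates.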
Let U (k ) be defined by U (k ) = u (k + 1) , u (k ) k≥0 . (7.55) 330 Synchronization and Linearity 7.4.2 Ratio Equation in a General Dioid As in the preliminary example, we define the ratio process, which consists of the sequence δ(k ) = x (k + 1) , u (k ) k≥0 . (7.56) The first aim of this subsection is to derive an evolution equation for this ratio process. Theorem 7.39 (Ratio Equation) The variables δ(k ) satisfy the inequalities δ(k + 1) ≥ A(k + 1)δ(k )U− (k ) ⊕ B (k + 1)U+ (k )U− (k ) , k≥0 , (7.57) ◦ where the initial condition is δ(0) = δ0 is equal to x (1)/ u (0), and where def U− (k ) = u (k ) , u (k + 1) def U+ (k ) = U (k ) = u (k + 1) . u (k ) (7.58) Proof By successively using formulæ (f.2) and (f.12) of Table 4.1, we obtain x (k + 1) u (k − 1) = ≥ ≥ A(k )x (k ) ⊕ B (k )u (k ) u (k − 1) A(k )x (k ) B (k )u (k ) ⊕ u (k − 1) u (k − 1) x (k ) u (k ) A(k ) ⊕ B (k ) . u (k − 1) u (k − 1) From (f.5), we obtain u (k − 1) ≥ u (k − 1) u (k ) , u (k ) so that x (k + 1) u (k − 1) ≤ = x (k + 1) ◦ (u (k − 1)/u (k )) u (k ) ◦ x (k + 1)/u (k ) , ◦ u (k − 1)/u (k ) where we used (f.9) in order to obtain the last relation. By using the notation U− (k ) and U+ (k ) defined in the statement of the theorem, we finally obtain δ(k ) ≥ A(k )δ(k − 1) ⊕ B (k )U+ (k − 1) , U− (k − 1) k≥1 . Therefore, δ(k ) U− (k − 1) ≥ U− (k − 1) A(k )δ(k − 1)U− (k − 1) ⊕ B (k )U+ (k − 1)U− (k − 1) , k≥1 , 7.4. Second-Order Theorems; Nonautonomous Case 331 and the proof is completed since the left-hand side of the last expression is less than δ(k ) (owing to (f.5)). In what follows, we will concentrate on the least solution of Inequalities (7.57). Lemma 7.40 The least solution of (7.57) is also the solution of the set of equations δ(k + 1) = A(k + 1)δ(k )U− (k ) ⊕ B (k + 1)U+ (k )U− (k ) , k≥0 , (7.59) ◦ with initial condition δ(0) = x (1)/ u (0). Proof The minimum element of the set {x ≥ a } being a , the result of the lemma follows from an immediate induction. Example 7.41 Let the dioid of scalars be Rmax and let x and u be vectors in this dioid, of dimension n and m respectively. Taking Remark 4.80 into account, the above calculations are still valid. A more direct derivation of the same result is obtained by subtracting u i (k ) from the j -th line of (7.54); we directly obtain n δ j i (k + 1) = ◦ A jl (k + 1) ( xl (k + 1)/u i (k + 1)) l =1 m ⊕ ◦ B jl (k + 1) (ul (k + 1)/u i (k + 1)) . l =1 ◦ By using the property that a/a = e for all a ∈ R, a = ε, it is immediately verified that under the finiteness assumptions which were made on u (k ), this can be rewritten as n δ j i (k + 1) m = ◦ A jl (k + 1) xl (k + 1)/u p (k ) ◦ u p (k )/ u i (k + 1) l =1 p =1 m m ⊕ ◦ B jl (k + 1) ul (k + 1)/u p (k ) ◦ u p (k )/ u i (k + 1) , l =1 p =1 which is a mere rephrasing of the matrix relation (7.59). 7.4.3 Stationary Solution of the Ratio Equation All the data are now assumed to be defined on a common probability space ( , F, P, θ), with the usual assumptions on θ . The variables A(k ) and B (k ) are assumed to be θ -stationary, with A(k ) = A◦ θ k and B (k ) = B ◦θ k . The dioid D is assumed to be complete. In addition to this, it is assumed that the sequence of ratios {U+ (k )} and {U− (k )} defined in (7.58) are such that U+ (k ) = U+ ◦θ k , U− (k ) = U− ◦θ k , k∈Z . As in the preliminary example, we are interested in the possibility of making the ratios δ(k ) stationary. For this, we will use a backward construction which generalizes the construction of the preliminary example. 
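Before constructing this stationary regime, Lemma 7.40 and Example 7.41 can be checked numerically: along any trajectory of (7.54) with finite input, the ratios δ(k) = x(k+1)◦/u(k) satisfy (7.59) with equality. The following sketch (made-up finite data, helper names of our own; ε = −∞, e = 0) performs this check:

```python
import numpy as np

def mp_mul(A, B):
    """Max-plus product: (A (x) B)_ij = max_k (A_ik + B_kj)."""
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def mp_rdiv(A, B):
    """Right residuation A o/ B in R_max: (A o/ B)_ij = min_k (A_ik - B_jk), cf. (7.97)."""
    return np.min(A[:, None, :] - B[None, :, :], axis=2)

rng = np.random.default_rng(2)
n, m, K = 3, 2, 30
A = [rng.uniform(0.0, 1.0, (n, n)) for _ in range(K + 2)]   # finite A(k)
B = [rng.uniform(0.0, 1.0, (n, m)) for _ in range(K + 2)]   # finite B(k)

# A finite nondecreasing input u(k) and the corresponding trajectory of (7.54).
u = [np.zeros((m, 1))]
for k in range(K + 1):
    u.append(u[-1] + rng.uniform(0.0, 1.0, (m, 1)))
x = [rng.uniform(0.0, 1.0, (n, 1))]
for k in range(K + 1):
    x.append(np.maximum(mp_mul(A[k], x[k]), mp_mul(B[k], u[k])))

# delta(k) = x(k+1) o/ u(k) satisfies the ratio equation (7.59) with equality.
for k in range(K):
    delta_k = mp_rdiv(x[k + 1], u[k])
    U_minus = mp_rdiv(u[k], u[k + 1])          # U_-(k) = u(k) o/ u(k+1)
    U_plus  = mp_rdiv(u[k + 1], u[k])          # U_+(k) = u(k+1) o/ u(k)
    rhs = np.maximum(mp_mul(mp_mul(A[k + 1], delta_k), U_minus),
                     mp_mul(mp_mul(B[k + 1], U_plus), U_minus))
    assert np.allclose(mp_rdiv(x[k + 2], u[k + 1]), rhs)
```

The equality relies on the finiteness of u(k), in agreement with Example 7.41; in a general dioid only the inequality (7.57) is available.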
332 Synchronization and Linearity Definition 7.42 (Backward process) The backward process associated with the least solution of (7.57) is the sequence of variables { (k )}k≥0 defined as the solution of the set of relations (0)◦ θ (k + 1)◦θ def = = C, A◦θ (k )U− ⊕ C , k ≥0 , (7.60) def def where U+ = U+ (0), U− = U− (0) and C = B ◦θ U+ U− . Lemma 7.43 The sequence { (k )} is coordinatewise nondecreasing in k with respect to the order in D. Proof It is clear that (1) ≥ C ◦θ −1 = (k ) ≥ (k − 1). Then (k + 1)◦θ = ≥ (0). Assume now that for some k ≥ 1, A◦ θ (k )U− ⊕ C A◦ θ (k − 1)U− ⊕ C = (k )◦ θ , where we used the fact that the mapping x → Ax B ⊕ C is isotone. Lemma 7.44 The random variable ◦θ = k ≥0 (k ) satisfies the relation = A◦ θ U− ⊕ C . (7.61) Proof The sum belongs to D because this dioid is complete. Since the mapping x → Ax B ⊕ C is l.s.c., we also have ◦θ = ( A◦ θ (k )U− ⊕ C ) k ≥0 = A◦ θ (k ) U− ⊕ C k ≥0 = A◦ θ U− ⊕ C . Lemma 7.45 In the case when A ∈ Dn×n , U + and U− ∈ Dm×m , and C and ∈ Dn×m , if the diagonal elements of A are greater than or equal to e and U− has finite entries, then the event B = {| |⊕ = } is of probability either 0 or 1. Proof Owing to the assumption that the diagonal elements of A are a.s. finite, | |⊕ = implies | A◦ θ |⊕ = , which in turn implies that | A◦θ U− |⊕ = (since U− has all its entries a.s. finite). Therefore, | |⊕ = implies that | ◦ θ |⊕ ≥ | A◦ θ U− |⊕ = . 7.4. Second-Order Theorems; Nonautonomous Case 333 Thus, the measurable set B is such that the indicator functions 1B ◦θ k are nondecreasing in k . This is enough to ensure that B is of probability 0 or 1; indeed, when using the ergodicity assumption on the shift θ , and the nondecreasingness property, we obtain 1 k →∞ k k 1B ◦ θ l ≥ 1B P[B] = lim a.s. , l =1 so that if P[B] > 0, then the indicator function 1B is equal to 1 a.s. (an indicator function which is positive is necessarily equal to 1). The following expansion of the backward process generalizes the star operation of Lemma 7.10. Lemma 7.46 The relation k l (k ) = l A(−h + 1) C (−l − 1) l =0 h =1 U− (h − l − 1) (7.62) h =1 holds, where the ⊗-product over an empty set (when l = 0) is equal to e by convention, and C (k ) = B (k + 1)U+ (k )U− (k ). This formula holds true for k = ∞ when taking (∞) = . As in our previous example, the nondecreasingness of the sequence { (k )} becomes transparent from this formula. The main question we are now interested in consists in determining the conditions under which the limiting value is a.s. finite. The answer to this question is based on ergodic theory arguments, and is therefore dependent on the specific dioid which is considered. We will concentrate on the Rmax case for the rest of this section. 7.4.4 Specialization to Rmax 7.4.4.1 Statistical Assumptions In this subsection the underlying scalar dioid is Rmax . In view of the results of the preceding subsection and of those of the preliminary example, the most natural statistical assumptions would consist in taking • U+ (k ) = U+ ◦θ k (which implies that U− (k ) = U− ◦θ k ); • U+ integrable (which implies U− integrable). We will rather take the weaker assumptions • {U+ (k )} couples with {U+ ◦θ k }, where all the entries of U+ are a.s. finite (which implies that {U− (k )} couples with {U− ◦θ k }, where U− has finite entries); • (U+ )ii integrable for all i = 1, . . . , m. 334 Synchronization and Linearity The motivations for taking these weaker assumptions will be commented upon in the next subsection. −1 Remark 7.47 Let ui = E[(U+ )ii ]. 
Since u i (k ) = u i (0) lk=0 (U+ )ii (l ), k ≥ 1, from the coupling assumption and the assumption E (U+ )ii = ui , we obtain that def lim (u i (k )) 1/ k = ui k →∞ a.s., ∀i = 1, . . . , m . (7.63) This can only be compatible with the assumption that U+ (k ) couples with a stationary and finite sequence if ui = u j for all i, j = 1, . . . , m (since u i (k ) − u j (k ) cannot couple with a finite stationary sequence if ui = u j ). Therefore, a direct conclusion of our assumptions is that lim |u (k )| ⊕ 1/ k = lim |u (k )|1/ k = u ∧ k →∞ k →∞ a.s. (7.64) More general cases with E[(U+ )ii ]= ui and ui = u j for some pair (i, j ), are of limited practical interest, as shown by the following example. Example 7.48 Matrix A has a single m.s.c.s., and its Lyapunov exponent a is such that a < ui < u j . In view of Theorem 7.36, we then have xl (k ) ∼ (a ⊕ ui ⊕ u j )k = ukj , for all l = 1, . . . , n. Therefore, xl (k ) − u i (k ) tends a.s. to ∞ for all l as k goes to ∞. Thus, in such a situation, some of the ratios necessarily become infinite. We conclude this section with an algebraic interpretation of the assumptions on U+ , and a statement of the Rmax -eigenvalue problem. Lemma 7.49 Let V (k ), k ≥ 0 be a sequence of m × m matrices. The two conditions ◦ • V (k ) = v(k + 1)/v(k ), k ≥ 0, where v(k ), k ≥ 0 is a sequence of finite mdimensional vectors, • V (k ) = V ◦θ k , k ≥ 0, are equivalent to the existence of a unique finite Rm eigenvector y, with y1 = e, and a ◦ unique eigenvalue β ∈ R such that V y = β y ◦ θ and V = (β y ◦ θ)/ y. Proof We first show that under the first two conditions, there exists a unique pair of ◦ ◦ ◦ finite vectors ( y , z ) such that y1 = e and V = z/ y . We have V = v(1)/v(0) = z/ y , def def ◦ ◦ where yi = vi (0)/v1 (0) and z i = vi (1)/v1 (0). We have y1 = e; let us show that y and ◦ z are uniquely defined from V : from the very definition of /, we have ◦ Vi j = z i/ y j . By taking j = 1 in the last relation, we see that z is uniquely defined from V , since ◦ z i = Vi 1 ; therefore y j = z i/Vi j does not depend on i and is uniquely determined from V . 7.4. Second-Order Theorems; Nonautonomous Case 335 ◦ Let β be the random variable β = z 1 . We have z = βw , where wi = z i/β = ◦ vi (1)/v1 (1). We now conclude the proof of the eigenpair property by showing that def ◦ ◦ ◦ w = y ◦θ . We have V ◦θ = v(2)/v(1) = v/w , where vi = vi (2)/v1 (1). Since w1 = e, the above uniqueness property shows that w = y ◦θ indeed. ◦ Conversely, if we assume that V = (β y ◦ θ)/ y , for some finite (β, y ), we obtain ◦ V (0) = v(1)/v(0) with v(0) = y and v(1) = β y ◦θ . More generally, we have V (k ) = ◦ v(k + 1)/v(k ), for all k ≥ 1, when taking v(k ) = β k . . . β y ◦ θ k . Under the assumptions listed above, the system of interest can be rewritten as u (k + 1) = U+ (k )u (k ) , x (k + 1) = A(k )x (k ) ⊕ B (k )u (k ) , (7.65) or equivalently as X (k + 1) = D (k ) X (k ) , where D (k ) = (7.66) u (k ) x (k ) X (k ) = and k≥0 , U+ (k ) B (k ) ε A(k ) . In view of the preceding lemma, the assumptions on D (k ) can be summarized as follows: the matrices D (k ) couple in finite time with a stationary sequence { D ◦θ k }, where D= U+ B ε A ; the matrix U+ is such that U+ = λu ◦θ , u (7.67) where (λ, u ) are uniquely defined (u is a finite random vector u ∈ Rm with u 1 = e and λ is a nonnegative and finite random variable). 
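Lemma 7.49 itself can be illustrated numerically: from the matrices V(k) = v(k+1)◦/v(k) alone, one recovers the normalized vector y and the scalar β, and V y = β y◦θ holds, where in the deterministic sketch below y◦θ is simply the vector recovered from V(1) in the same way as y is recovered from V(0). The data and helper names are arbitrary illustrations of our own:

```python
import numpy as np

def mp_mul(A, B):
    """Max-plus product: (A (x) B)_ij = max_k (A_ik + B_kj)."""
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def mp_rdiv(A, B):
    """Right residuation: (A o/ B)_ij = min_k (A_ik - B_jk)."""
    return np.min(A[:, None, :] - B[None, :, :], axis=2)

rng = np.random.default_rng(3)
m = 4
v = [rng.uniform(0.0, 5.0, (m, 1)) for _ in range(3)]   # finite vectors v(0), v(1), v(2)

V0 = mp_rdiv(v[1], v[0])          # V(0) = v(1) o/ v(0)
V1 = mp_rdiv(v[2], v[1])          # V(1), playing the role of V o theta

# Recovery as in the proof of Lemma 7.49: beta = V_11 and y_j = V_i1 - V_ij (any i).
beta = V0[0, 0]
y = (V0[0, 0] - V0[[0], :]).T
y_shift = (V1[0, 0] - V1[[0], :]).T   # y o theta, recovered from V(1)

assert np.isclose(y[0, 0], 0.0)                      # normalization y_1 = e
assert np.allclose(mp_mul(V0, y), beta + y_shift)    # V y = beta (y o theta)
```

In the θ-stationary case, this decomposition is precisely the eigenpair (λ, u) used for U_+ in (7.67).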
The problem of interest is then similar to the random eigenpair problem of §7.2.3: can we continue the random eigenpair property U+ u = λu ◦θ , which follows from (7.67), to the following eigenpair property of D : D X = λ X ◦θ ? (7.68) Remark 7.50 The assumption that (U+ )i j (k ) couples with a stationary sequence for all i, j = 1, . . . , m, which is equivalent to the eigenpair property of (7.67), is necessary for the second-order theorems of the following subsections; in particular, if this property is only satisfied by the diagonal terms (U+ )ii (k ) (like for instance in the formulation (7.30) of the evolution equation), then these stability theorems do not hold, as shown by Example 7.98. 336 Synchronization and Linearity 7.4.4.2 Example: Nonautonomous Event Graphs We know from Corollary 2.82 that a nonautonomous FIFO stochastic event graph with recycled transitions and with a compatible initial condition in its standard form satisfies an equation of the form (7.54) in Rmax . We make the following stationarity assumptions: • the holding times αi (k ), pi ∈ P , k ∈ Z, are θ -stationary and integrable; def ◦ • the ratios Ui j (k ) = u i (k + 1)/u j (k ), qi , q j ∈ I , k ∈ N, are finite and couple in finite time with a θ -stationary sequence Ui j ◦θ k , which satisfies the integrability and rate conditions mentioned above. As a direct consequence of the first assumption, the sequences { A(k )} and { B (k )} are both θ -stationary and the entries of these matrices which are not a.s. equal to ε are integrable. In particular, the diagonal entries of A(k ) are a.s. nonnegative and integrable owing to the assumptions that the transitions are all recycled. Remark 7.51 We now list a few motivations for the assumptions on U+ (k ), which will become more apparent in §7.4.4.5. In (7.54), we would like to be able to take the input vector u (k ) ∈ Rm equal to the output of some other stochastic event graph (incidentally, this is the most practical way of building input vectors which satisfy the conditions of §7.4.4.1). For instance, consider the vector (u (k ), x (k + 1)) ∈ R2 , associated with Equation (7.3), as an output signal of the system analyzed in the preliminary example. The first assumption (coupling of U+ (k ) with U+ ◦θ k ) is motivated by the following observations: • Even in the stable case, the output process of a stochastic event graph is usually not such that U+ (k ) is θ -stationary from k = 0 on. For instance, if we take the specific vector mentioned above as an input vector of Equation (7.54), with m = 2, we know from the preliminary example that the corresponding sequences {U+ (k )} and {U− (k )} couple with stationary sequences, provided appropriate rate conditions are satisfied. • More generally, for stochastic event graphs, the assumption that {U+ (k )} is θ stationary may not be consistent with the assumption that the initial condition is compatible. As for the second assumption (integrability of the diagonal terms only), we also know from the preliminary example that the nondiagonal entries of U+ or U− are not integrable in general (see Remark 7.13). 7.4.4.3 Finiteness of the Stationary Solution of the Ratio Equation; Strongly Connected Case Matrix A associated with (7.54) is assumed to have a strongly connected precedence graph. Let a denote the maximal Lyapunov exponent associated with A (see Corollary 7.31). Let def ξ(k ; x 0 ) = | A(k ) . . . A(2) A(1)δ(0; x 0 )U− (0)U− (1) . . . U− (k − 1)|⊕ , (7.69) 7.4. 
Second-Order Theorems; Nonautonomous Case 337 where x 0 is a finite random initial condition in Rn×m . Lemma 7.52 For all finite initial conditions x 0 ∈ Rn×m , ◦ lim (ξ(k , x 0 ))1/ k = a/u a.s. k →∞ (7.70) Proof The property stated in (7.70) is similar to the one proved in Theorem 7.27. Indeed, from (7.25) we obtain that |ξ(k , x 0 )|⊕ ≤ | A(k ) . . . A(2) A(1)| ⊕ |δ(0; x 0 )|⊕ |U− (0)U− (1) . . . U− (k − 1)|⊕ . We know from Theorem 7.27 that lim | A(k ) A(k − 1) . . . A(2) A(1)| 1/ k = a a.s. ⊕ k As for the product U− (0)U− (1) . . . U− (k − 1), we cannot apply the same theorem because the entries of U− (k ) are not assumed to be integrable anymore, and because the θ -stationarity is replaced by the coupling assumption. However, owing to the specific form of U− (k ), and to the fact that the scalar dioid is Rmax , the relation U− (0) . . . U− (k − 1) = holds, so that u (0) u (k ) ◦ lim |U− (0) . . . U− (k − 1)|1/ k = e/u ⊕ k a.s. , in view of (7.63). This immediately implies that ◦ lim sup (ξ(k ; x 0 ))1/ k ≤ a/u a.s. k On the other hand, (7.25) and (7.26) imply that ◦ ξ(k ; x 0 ) ≥ |( A(k − 1) . . . A(0))| ⊕ |δ(0; x 0 )|∧ |u (0)|∧ / |u (k )| ⊕ . By using this inequality together with the above a.s. limits, we finally obtain ◦ lim inf (ξ(k ; x 0 ))1/ k ≥ a/u k a.s. Let ζ (k ) = A . . . A◦ θ −k+1 C ◦ θ −k−1 U− ◦θ −k . . . U− ◦θ −1 def ⊕ , (7.71) where C was defined in Definition 7.42. The following lemma is very similar to the preceding one, although it is somewhat more difficult to prove. Lemma 7.53 Under the foregoing assumptions ◦ lim (ζ (k ))1/ k = a/u k →∞ a.s. (7.72) 338 Synchronization and Linearity Proof When using the same arguments as in Theorem 7.27 and Corollary 7.32 (applied to the shift θ −1 , which is also ergodic), we obtain that for all i, j , k A . . . A◦ θ −k+1 1/ k = ij lim A . . . A◦ θ −k+1 = lim lim E k 1/ k ⊕ A . . . A◦ θ −k+1 k 1/ k ⊕ =a , for some positive and finite constant a (the first two limits are understood in the a.s. sense). Since 1/ k A . . . A◦ θ −k+1 E ⊕ 1/ k A◦ θ k−1 . . . A =E ⊕ , we necessarily have a = a . Therefore, for all i, j , lim k 1/ k A . . . A◦ θ −k+1 ij =a a.s. (7.73) We now show that we also have lim k U− ◦ θ − k . . . U− ◦ θ − 1 1/ k ij ◦ = e/u a.s., (7.74) for all i, j = 1, . . . , m. For this, we will use the specific forms of U+ and U− , which imply that there exists a unique pair of vectors (b (0), b (1)) in (Rm)2 such that b1 (0) = e and b (1) b (0) U+ = , U− = b (0) b (1) (see Lemma 7.49). Let {b (k )}k≥1 be the sequence of Rm -valued vectors defined by the relations b (k ) = U+ ◦θ k−1 b (k − 1) = U+ ◦θ k−1 . . . U+ ◦θ 1 b (1) , k ≥2 . We have U+ ◦θ k = b (k + 1) , b (k ) U− ◦ θ k = b (k ) , b (k + 1) k ≥0 . (7.75) This implies U− ◦θ −k . . . U− ◦θ −1 = U− . . . U− ◦θ k−1 = ij ◦ bi (0)◦ θ −k/b j (k )◦ θ ◦θ ij −k Therefore for i = j U− ◦ θ − k . . . U− ◦ θ − 1 −k ii = ◦ (e/Uii )◦θ l , l =−1 −k . (7.76) (7.77) 7.4. Second-Order Theorems; Nonautonomous Case 339 so that the ergodicity of θ −1 and the assumption E[Uii ] = u show that (7.74) holds at least for i = j . In view of (7.76), we have U− ◦ θ − k . . . U− ◦ θ − 1 = U− ◦θ −k . . . U− ◦θ −1 ij jj ◦ bi (0)◦θ −k/b j (0)◦ θ −k . We conclude the proof of (7.74) by showing that lim bi (0)◦θ −k 1/ k k =e a.s. , (7.78) for all i = 1, . . . , m (this is immediate if the entries of U− or U+ are integrable, but we have not assumed this). We know from Lemma 7.49 that ◦ b (0)◦θ = b (1)/(b1 (1)) . (7.79) Therefore, for all i = 1, . . . 
, m, the relation ◦ bi (0)◦ θ (bi (1)/bi (0)) (U+ )ii = = ◦ bi (0) (b1 (1)/b1 (0)) (U+ )11 ◦ ◦ holds, which shows that bi (0)◦θ/bi (0) and hence bi (0)◦θ −1/bi (0) are integrable. This ◦ implies that E[bi (0)◦θ −1/bi (0)] = e (see Lemma 7.25). Since bi (0)◦θ −k = bi (0) −k +1 ◦ bi (0)◦ θ −1/bi (0) ◦θ h , h =0 this and Birkhoff’s Theorem imply that lim bi (0)◦θ −k 1 / k k 1/ k −k +1 = lim bi (0) k bi (0)◦ θ /bi (0) ◦θ h h =0 1/ k −k +1 = lim k −1 ◦ −1 ◦ bi (0)◦θ /bi (0) ◦θ h h =0 ◦ = E bi (0)◦θ −1/bi (0) = e . (7.80) ◦ Finally, since C = B ◦θ U+ U− = B ◦θ b (1)/b (1), we obtain from (7.79) that C = B ◦θ ◦ b (1) b (1)/b1 (1) = B ◦θ = ◦ b (1) b (1)/b1 (1) B b (0) b (0) ◦θ . (7.81) Then either the j -th line of B is ε, and C j i (k ) = ε for all i and k , or it is different from ε and (7.81), the integrability assumption on the non ε elements of B , and (7.78) imply that lim C j i ◦θ −k k →∞ 1/ k =e a.s. , for all i . The proof of the lemma is concluded from (7.73), (7.74) and (7.82). (7.82) 340 Synchronization and Linearity Remark 7.54 When using the notation of the preceding theorem, the initial condition for the backward recurrence should be (0) = Bb (0) . b (0) If U+ (k ) is stationary, then for all x 0 , the initial condition δ0 of the ratio process satisfies the bound δ0 = A(0)x 0 ⊕ B (0)u (0) B (0)u (0) Bb (0) ≥ = = C ◦ θ −1 . u (0) b (0) b (0) Remark 7.55 If A is not strongly connected, the proof of Lemma 7.53 allows one to conclude that ◦ lim sup(ζ (k ))1/ k ≤ a/u k →∞ a.s. , (7.83) where a is the maximal Lyapunov exponent of A. In the following theorem, which holds regardless of the strong connectedness of A, a is the maximal Lyapunov exponent of A. Theorem 7.56 If a < u, then | |⊕ < ∞ a.s., and there exists an initial condition δ(0) such that the solution δ(k ) of the ratio equation (7.59) forms a stationary and ergodic process. Proof From Lemma 7.45, either | (k )| ⊕ tends to ∞ a.s., or (k ) tends to | |⊕ < . Assume we are in the first case. Then in view of (7.62), lim sup ζ(k ) = ∞ k →∞ with a.s., which contradicts (7.83) if a < u. Therefore a < u implies that | |⊕ < ∞ a.s., and hence < ∞ a.s. Taking δ(0) = makes δ(k ) stationary in view of (7.61). Remark 7.57 If we return to the initial system (7.54), we may ask whether this system has an initial condition (x (0), u (0)) which renders the ratio process δ(k ) stationary. The answer to this question is equivalent to the existence of a solution to the equation Ax (0) ⊕ Bu (0) = u (0) , where the unknowns are x (0), and u (0), and where is the random variable defined in Lemma 7.44. We will not pursue this line of thought since we will see that the stationary regime { ◦θ k } is actually reached by coupling regardless of the initial condition δ(0). 7.4. Second-Order Theorems; Nonautonomous Case Lemma 7.58 In the strongly connected case, if a > u, the variables to ∞ a.s. 341 (k ) all converge Proof In view of (7.62), a > u implies | (k )| ⊕ tends to ∞ and hence | |⊕ = ∞. From (7.61), we obtain n ◦θ ≥ G ◦θ H , where n def G= A◦ θ l , l =1 def n−1 H= U− ◦ θ l . l =0 In view of our assumptions on A, the matrix G (k ) defined in (see (7.37)) is such that G i j ≥ e for all i, j = 1, . . . , n. This together with | |⊕ = a.s. implies j ◦θ n = a.s. for all j = 1, . . . , n (since U− is a.s. finite). Therefore a > u implies j = a.s. for all j = 1, . . . , n. Corollary 7.59 The random variable is the least stationary solution of (7.61). Proof Starting with a solution of (7.61), we necessarily have ≥ C ◦ θ −1 = (0). 
It is easily checked that ≥ (k ) implies ≥ (k + 1) (the proof is essentially the same as for the preliminary example). Therefore ≥ . ◦ Hence if a > u, there is no finite stationary regime for ratios of the type x j (k )/ u i (k ). Remark 7.60 A few remarks are in order concerning stochastic event graphs. If there exist initial lag times x (0) and u (0) which make the ratio process stationary, these lag times are not compatible in general (see Remark 7.15). • Nothing general can be said about the critical case u = a. As in queuing theory, it may happen that is finite or infinite, depending on higher order statistics. For instance, if the holding times are deterministic, the variable is finite (see Chapter 3). In the case of i.i.d. exponentially distributed holding times, it will be infinite. • It is not always true that the variables are integrable. Simple counterexamples can be found in queuing theory. For instance, the stationary waiting times in the G I / G I /1 queue (see §7.2) fall in this category of ratios and are only integrable under certain specific conditions on the second moments of the holding (service and interarrival) times. • Assume that the constant a is fixed; what is the minimal value of u for which the system can be stabilized? The preceding remarks show that the threshold is for u = a, the cycle time of the strongly connected component. In other words, the minimal asymptotic rate of the input for which the system can be stabilized, is equal to the cycle time of the system with an infinite supply of tokens in each of the inputs. This result can then be seen as a generalization to Petri nets of a result known as Lavenberg’s Theorem [83] in queuing theory. 342 7.4.4.4 Synchronization and Linearity Coupling; Strongly Connected Case Theorem 7.61 Under the statistical assumptions of §7.4.4.1, if a < u, there exists a unique random matrix such that for all finite initial conditions δ(0), the random variables {δ(k ; δ(0))} couple in finite time with the sequence { ◦θ k }, regardless of the initial conditions provided they are finite. If a > u, then for all j = 1, . . . , n, and i = 1, . . . , m, δ j i (k ; δ(0)) converges a.s. to ∞ when k tends to ∞. The proof is based on the following lemma. Lemma 7.62 Assume that a < u. Then for any finite initial condition δ(0), there exists a positive integer K (δ(0)) such that for all k ≥ K (δ(0)), δ(k ; δ(0)) = δ(k ; C ◦ θ −1 ). Proof The proof is by contradiction. Assume that δ(k ; δ(0)) = δ(k ; C ◦ θ −1 ) for all k ≥ 0. This means that for any fixed k ≥ 1, there exists a pair of integers ( jk , ik ), with 1 ≤ jk ≤ n and 1 ≤ ik ≤ m, such that δ jk ik (k ; z 1 ) > δ jk ik (k ; z 2 ), where z 1 is either z or C ◦θ −1 and z 2 is the other one. In view of (7.59), we necessarily have δ jk ik (k ; z 1 ) = ( A(k )δ(k − 1; z 1 )U− (k − 1)) jk ik , for if it were not the case, we would have δ jk ik (k ; z 1 ) = δ jk ik (k ; z 2 ) = ( B (k )U+ (k − 1)U− (k − 1)) jk ik (in Rmax , a ⊕ b = a implies a ⊕ b = b ). This in turn implies the existence of a pair of integers ( jk−1 , ik−1 ), with 1 ≤ jk−1 ≤ n and 1 ≤ ik−1 ≤ m, such that δ jk ik (k ; z 1 ) = A jk jk−1 (k )δ jk−1 ik−1 (k − 1; z 1 )U− (k − 1)ik−1 ik . It is easy to see that necessarily δ jk−1 ik−1 (k − 1; z 1) > δ jk−1 ik−1 (k − 1; z 2 ). 
If this were not true, we would then have δ jk ik (k ; z 1 ) = A jk jk−1 (k )δ jk−1 ik−1 (k − 1; z 1 )U− (k − 1)ik−1 ik ≤ A jk jk−1 (k )δ jk−1 ik−1 (k − 1; z 2 )U− (k − 1)ik−1 ik n m ≤ A jk p (k )δ pq (k − 1; z 2 )U− (k − 1)qik p =1 q =1 ⊕ ( B (k )U+ (k − 1)U− (k − 1)) jk ik = δ jk ik (k ; z 2 ) , which would contradict the definition of jk and ik . More generally, by using the same argument iteratively, we can find a sequence of pairs {( jk−l , ik−l )}l=1,2,... ,k such that for all l in this range, δ jk−l+1 ik−l+1 (k − l + 1; z 1 ) = A(k − l + 1) jk−l+1 jk−l δ(k − l ; z 1 ) jk−l ik−l U− (k − l )ik−l ik−l+1 . Therefore, there exists a sequence of pairs such that k −1 k δ jk ik (k ; z 1 ) = A jk−l+1 jk−l (k − l + 1)δ j0 i0 (0; z 1 ) l =1 U− (l )il il+1 , l =0 7.4. Second-Order Theorems; Nonautonomous Case 343 which implies that δ jk ik (k ; z 1 ) ≤ ξ(k ; z 1 ), where ξ(k ; z 1 ) is defined by (7.69). On the other hand, we know from (7.59) that δ(k ; z 1 ) ≥ B (k )U+ (k − 1)U− (k − 1); thus for some pair ( jk , ik ) such that the line B jk . has some nonvanishing entries, (ξ(k ; z 1 ))1/ k ≥ ( B (k )U+ (k − 1)U− (k − 1)) jk ik 1/ k . (7.84) ◦ Owing to (7.70), ξ(k ; z 1 )1/ k → a/u when k → ∞. Similarly, our assumptions on the rates (7.63) and on the integrability of the entries of B imply that 1/ k lim ( B (k )U+ (k − 1)U− (k − 1)) jk ik k →∞ = = = B jk l (k ) lim k →∞ 1/ k l 1/ k l lim B jk l (k ) e ◦ (u l (k )) 1/ k / u ik (k ) k →∞ 1/ k ◦ lim (ul (k )) 1/ k / lim u ik (k ) k →∞ 1/ k k →∞ a.s. ◦ By letting k go to ∞ in (7.84), we finally obtain a/u ≥ e, where the contradiction is ◦ obtained, since we have assumed that a/u < e. The proof of Theorem 7.61 is obtained from the last lemma using the same arguments as in the preliminary example (see Lemma 7.21). Corollary 7.63 Under the stability condition a < u, Equation (7.61) admits a unique finite solution. Proof Given the coupling property of Theorem 7.61, the proof is the same as the one of Corollary 7.22. ◦ Remark 7.64 The coupling of the ratios x (k + 1)/u (k ) with a finite θ -stationary se◦ quence implies the coupling of other ratios like x (k + 1)/x (k ) with a θ -stationary and finite sequence. In order to see this, write x (k + 1) = δ(k )U− (k )δ− (k ) , x (k ) ◦ where δ− (k ) is the matrix with entries (δ− (k ))i j = e/δ j i (k ). The coupling of {δ(k )} with a stationary sequence implies that of {δ− (k )}, which in turn implies that of {x (k + ◦ 1)/ x (k )}, provided {U− (k )} couples with a stationary sequence too. Example 7.65 (Blocking after service) Consider a network of n machines in series. The first machine has an infinite input buffer and is fed by an external arrival stream of items. There are no intermediate buffers between machine j and machine j + 1, 1 ≤ j ≤ n − 1, and an item having completed its service in machine j is blocked there as long as machine j + 1 is not empty. It is easily checked that this mechanism is adequately described by the timed event graph of Figure 7.3. The input transition q0 is associated with the input function u (k ), 344 Synchronization and Linearity q0 p10 q11 q12 p23 p20 p33 p30 q21 p22 p12 pJ −1 ,3 pJ −1 ,0 pJ 3 qJ −1 ,1 p qJ −1 ,2 pJ 0 J −1 ,1 pJ −1 ,2 qJ 1 pJ 1 qJ 2 pJ 2 Figure 7.3: Blocking after service with inter-input times U (k ); transitions q j 1 and q j 2 are associated with the behavior of machine j , with j = 1, . . . , n. Place p j 0 , which precedes q j 1 , represents the device for transporting items from machine j − 1 to machine j . 
The holding times associated with these places (namely the transportation times) will be assumed to be identically equal to e. Transition q j 1 represents the admittance to machine j , and q j 2 the exit out of machine j . The holding times in p j 1 will be denoted α j (k ) and represent the processing time of the k -th item to enter machine j . Finally, the feedback arc which contains p j +1,3 forbids one to send an item from machine j to machine j + 1 if the latter still has an item (this is the blocking mechanism). Similarly, the feedback arc which contains p j 2 prevents an item from entering machine j if this one contains another item. The two places p j 2 and p j 3 are assumed to have zero holding times. All these variables are defined on the probability space ( , F, P, θ), as well as the compatible initial lag times v j ∈ R, where v j is both the initial lag time of the token in p j 2 , and the one of the token in p j 3 . It is easily checked that the state space can be reduced to the set of transitions Q = {q12 , . . . , qn2 }, and that the corresponding canonical form of the evolution equation of Corollary 2.82 reads x (k + 1) = A(k )x (k ) ⊕ B (k )u (k ) , where the n × n matrix A(k ) is given by the relation j l=i αl (k + 1) for i = 1, . . . , j ; A j i (k ) = e for i = j + 1 ; ε for i = j + 2, . . . , n , j whereas the n× 1 matrix B (k ) is defined by B j 1 (k ) = l=1 αl (k +1), and u (k ) = u (k + ◦ 1). The precedence graph is strongly connected, and ratios of the form x j (k + 1)/u (k ) with j = 1, . . . , n, admit a unique stationary regime if u = E[U ] > a, where a is the ◦ cycle time associated with matrices A(k ). The ratio x j (k + 1)/u (k ) here represents the 7.4. Second-Order Theorems; Nonautonomous Case 345 delay between the arrival of an item in the network and the time at which it starts being processed by machine j . If this rate condition is fulfilled, the unique stationary regime is reached with coupling regardless of the initial lag times. 7.4.4.5 More on Coupling; General Case The notations and the statistical assumptions of the present section are those of §7.4.4.3. The only difference concerns the precedence graph associated with A, which is not supposed to be strongly connected here. We assume that A has N A m.s.c.s.’s. As in the preceding subsections, the basic process of interest is the ratio process def η(k ) = X (k + 1) , X (k ) (7.85) where { X (k )} is the sequence defined in (7.66). For the proof of the next theorem, it will be convenient to use Equation (7.66). By construction, the number of m.s.c.s.’s of D is N D = N A + 1. Remark 7.66 The δ(k ) process defined in (7.56) is the ratio of the state variables x (k + 1) and the input u (k ), whereas the ratio η(k ) defined above is u (k + 1) x (k + 1) ◦ / u (k ) x (k ) . The restriction of the latter matrix to the set of coordinates (> 1), (1) coincides with δ(k ). Theorem 7.67 If the Lyapunov exponents a(n) of Theorem 7.36 satisfy the condition NA a (n ) < u , (7.86) n =1 then there exists a unique finite random (n + m) × (n + m) matrix E such that the sequence {η(k )}, defined in (7.85), couples in finite time with a uniquely defined stationary and ergodic process { E ◦θ k }, regardless of the initial condition. If NA a (n ) > u , (7.87) n =1 let n 0 be the first n = 1, . . . , N A , such that a(n) > u. Then all ratios of the form η j i (k ), j ∈ Vn0 , i ∈ Vm , m < n 0 , tend to ∞ a.s., for all finite initial conditions. Proof We prove by induction on n = 1, . . . 
, N D , that under the rate condition (7.86), the matrices X (≤n) (k + 1) , X (≤n) (k ) couple in finite time with a uniquely defined and finite θ -stationary process, and that the stationary regime is such that ◦ E[ X (≤n) (k + 1) i / X (≤n) (k ) i ] = u , ∀i . 346 Synchronization and Linearity This induction assumption is satisfied for the source m.s.c.s. of D , namely for n = 1, which corresponds to the set of input variables, because of the assumptions which were made on the input functions. It is also satisfied for all source m.s.c.s.’s of A, namely for all n = 2, . . . , N0 + 1 (the numbering we adopt here for m.s.c.s.’s is that relative to D ), in view of Theorem 7.61 and Remark 7.64. Assume that the induction assumption is true up to rank n − 1, where n − 1 ≥ N0 + 1 and that a(n) < u. From (7.66) we obtain X (n) (k + 1) = D(n),(n) (k ) X (n) (k ) ⊕ D(n),(≤n−1) (k ) X (≤n−1) (k ) , where the notation should be clear in view of our conventions. From the assumption ◦ that the ratios of X (≤n−1) (k + 1)/ X (≤n−1) (k ) couple in finite time with θ -stationary sequences, and that the rates of the coordinates of X (≤n−1)(k ) are all u, we obtain from ◦ Theorem 7.61 that the ratios X (n) (k + 1)/ X (≤n−1)(k ) also couple with a finite and sta◦ tionary sequence. This immediately implies that the sequence X (≤n) (k + 1)/ X (≤n) (k ) also couples with a θ -stationary sequence. ◦ For all i ∈ Vn and in the stationary regime, E[ X i (k + 1)/ X i (k )] = u. This follows from the rate property of Theorem 7.36, which implies that lim ( X i (k ))1/ k = u k →∞ a.s. , (7.88) ◦ for all i ∈ Vn , and from the following simple argument: the variables X i (k + 1)/ X i (k ) are nonnegative (the diagonal elements of D are all assumed to be greater than e), and couple with a θ -stationary sequence { i i ◦θ k }; therefore, either the random variable i i is integrable, or E[ i i ] = ∞. Since k X i (k ) = X i (0) ◦ ( X i (l )/ X i (l − 1)) , l =1 if E[ ii ] = ∞, Birkhoff’s Theorem implies that 1/ k k lim ( X i (k )) k 1/ k = ◦ X i (l )/ X (l lim k 1/ k k = i i ◦θ lim k − 1)i l =1 l =∞ a.s., l= K ◦ where K is the coupling time of X i (k + 1)/ X i (k ) with its stationary regime. This is in contradiction with (7.88). Therefore i i is integrable and necessarily of mean value u. The uniqueness property follows from Corollary 7.63. Remark 7.68 The only general statement which can be made concerning integrability ◦ is as follows: ratios of the form x i (k + 1)/x i (k ) are always integrable, whereas ratios ◦ of the form x j (k + 1)/ x i (k ), where i and j belong to different m.s.c.s.’s, say i ∈ Vm and j ∈ Vn with m = n , may be finite and nonintegrable. 7.4. Second-Order Theorems; Nonautonomous Case 7.4.5 347 Multiplicative Ergodic Theorems in Rmax We are now in a position to answer the eigenpair problem associated with Equation (7.54) and stated in (7.68). The notations concerning the m.s.c.s.’s of A and D are those of Theorem 7.67. Theorem 7.69 Under the statistical assumptions of §7.4.4.1, if the stability condition NA n=1 a(n) < u is satisfied, there exist a unique finite random eigenvalue λ and a unique finite random eigenvector X ∈ Rm+n , with X 1 = e, and such that (7.68) holds. Proof Owing to (7.67), we know that the restriction D(1),(1) = U+ of D to the first m coordinates, satisfies the eigenpair property D(1),(1)u = λu ◦θ , for a uniquely defined finite eigenpair (λ, u ), with u 1 = e. Therefore, the first m coordinates of X and λ are uniquely defined, and in particular X (1) = u . 
Let be the n × m random matrix of Lemma 7.44 (the finiteness of is obtained from Theorem 7.56). Let y be the Rn vector defined by def ◦ y ◦θ = ( u )/λ , so that (7.89) u = λ y ◦θ . When multiplying (7.61) by u , we obtain λ y ◦θ = A ◦θ −1 U− ◦θ −1 u ⊕ B (U+ U− )◦θ −1 u . (7.90) In view of the definition of U+ , we have U+ U− = λu ◦θ u u ◦θ = . u λu ◦θ u ◦θ Therefore (U+ U− )◦θ −1 u = u . Similarly, ◦θ −1 U− ◦ θ − 1 u = ◦θ −1 u ◦θ −1 u=y . λ◦θ −1 u Thus, the right-hand side of (7.90) reads Ay ⊕ Bu , and we conclude the proof of the existence part by taking X (>1) = y . The uniqueness property follows from the following observations: if X and X are two different eigenvectors, they can only differ through their last coordinates (i.e. the coordinates corresponding to (> 1)), since u and λ are uniquely defined. Then there exist two different vectors y and y in Rn such that λ y ◦θ = Ay ⊕ Bu , By defining λ y ◦θ = Ay ⊕ Bu . y ◦θ y ◦θ def , =λ , u u we obtain two different finite matrices which are easily shown to satisfy Equation (7.61). Since this equation has a unique finite solution, we reach a contradiction. def =λ 348 Synchronization and Linearity We conclude this subsection by the following rewriting of the coupling property of Theorem 7.67, which holds whenever U+ (k ) is stationary, and which generalizes Theorem 7.18. NA Theorem 7.70 If the stability condition n=1 a(n) < u is satisfied, and if the matrices U+ (k ) are θ -stationary, then for all finite random initial conditions X (0) with X 1 (0) = e, a finite integer-valued random variable K exists, such that for all k ≥ K , X (k + 1) = D (k ) . . . D (1) D (0) X (0) = λ◦θ k . . . λ◦θλ X ◦θ k+1 , (7.91) where ( X , λ) is the eigenpair of Theorem 7.69. ◦ Proof Owing to the coupling property of Theorem 7.67, we have that X (k )/ X 1 (k ) couples in finite time with X ◦θ k . We obtain (7.91) by ⊗-multiplying the equality ◦ X (k )/ X 1 (k ) = X ◦θ k , which holds for k greater than the coupling time K , by λ◦θ k−1 . . . λ = X 1 (k ) X 1 (1) ... , X 1 (k − 1) X 1 (0) and by using the assumption that X 1 (0) = e. 7.5 Second-Order Theorems; Autonomous Case The equation of interest in this section is x (k + 1) = A(k )x (k ) , k≥0 , (7.92) with initial condition x 0 , where x (k ) and A(k ) have their entries in a dioid D. 7.5.1 Ratio Equation As in the preceding sections, the basic process of interest is the ratio process1 δ(k ; x 0 ) = x (k + 1; x 0 ) . x (k ; x 0 ) (7.93) The aim of this subsection is to determine the condition under which this process admits a stationary regime where δ(k ) = δ ◦θ k , k≥0 , (7.94) and to quantify the nature of the convergence of δ(k ; x 0 ) to this regime. 1 We will use the same symbol δ(·) to represent this ratio process and the one defined in Equation (7.56) in the nonautonomous case; the context should help determining which one is meant; in this section, the optional argument of δ(k ) will be x0 . 7.5. Second-Order Theorems; Autonomous Case 349 Lemma 7.71 The state variables δ(k ; x 0 ) satisfy the inequalities A(k + 1)δ(k ) , A(k ) δ(k + 1; x 0 ) ≥ k≥0 , (7.95) with the initial condition δ(0; x 0 ) = A(0)x (0) . x (0) (7.96) Proof We have δ(k + 1) = = ≥ x (k + 2) A(k + 1)x (k + 1) = x (k + 1) A(k )x (k ) ◦ ( A(k + 1)x (k + 1)) / x (k ) A(k ) ◦ A(k + 1) (x (k + 1)/x (k )) , A(k ) where we successively used (f.9) and (f.12) in the second and the third relations. Example 7.72 Consider the case of matrices with entries in Rmax ; matrix A(k ) is such that Ai j (k ) = ε , ∀i, j = 1, . . . , n , k ≥ 0 . 
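A small numerical companion to this example (a sketch of our own, using the entrywise formula (7.97) for ◦/ recalled just below): with finite matrices A(k), the trajectory ratios δ(k) = x(k+1)◦/x(k) satisfy the recursion δ(k+1) = (A(k+1)δ(k))◦/A(k) of (7.98) with equality.

```python
import numpy as np

def mp_mul(A, B):
    """Max-plus product: (A (x) B)_ij = max_k (A_ik + B_kj)."""
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def mp_rdiv(A, B):
    """Right residuation in R_max: (A o/ B)_ij = min_k (A_ik - B_jk), cf. (7.97)."""
    return np.min(A[:, None, :] - B[None, :, :], axis=2)

rng = np.random.default_rng(4)
n, K = 3, 25
A = [rng.uniform(0.0, 1.0, (n, n)) for _ in range(K + 2)]   # finite entries: A_ij(k) != eps

x = [rng.uniform(0.0, 1.0, (n, 1))]
for k in range(K + 1):
    x.append(mp_mul(A[k], x[k]))                            # x(k+1) = A(k) x(k)

for k in range(K):
    delta_k = mp_rdiv(x[k + 1], x[k])                       # delta(k) = x(k+1) o/ x(k)
    rhs = mp_rdiv(mp_mul(A[k + 1], delta_k), A[k])          # (A(k+1) delta(k)) o/ A(k)
    assert np.allclose(mp_rdiv(x[k + 2], x[k + 1]), rhs)    # equality in (7.98)
```

The same computation is carried out entrywise in the continuation of the example.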
◦ In Rmax , if A and B are finite n × n matrices, the n × n matrix X = A/ B is given by the relation n Xij = ◦ Aik / B j k , ∀i, j = 1, . . . , n (7.97) k =1 (see Equation (4.82)). Using this relation and the finiteness assumption on A(k ), we directly obtain that for all k ≥ 1, i, j = 1, . . . , n, δ j i (k + 1; x 0 ) = ◦ x j (k + 2; x 0)/ n Aig (k )x g (k ; x 0 ) g =1 n ◦ x j (k + 2; x 0)/ Aig (k )x g (k ; x 0 ) = g =1 n n = ◦ A j h (k + 1)δhg (k ; x 0 ) / Aig (k ) g =1 h =1 n = ◦ ( A(k + 1)δ(k ; x 0 )) j g / Aig (k ) g =1 = A(k + 1)δ(k ; x 0 ) A(k ) . ji The following lemma is proved exactly in the same way as Lemma 7.40. 350 Synchronization and Linearity Lemma 7.73 The least solution of (7.95) is also the solution of the set of equations A(k + 1)δ(k ; x 0 ) , A(k ) δ(k + 1; x 0) = (7.98) ◦ with the initial condition δ(0; x 0 ) = ( A(0)x (0))/ x (0). 7.5.2 Backward Process The statistical assumptions on A(k ) are those of §7.3.1. The dioid D is assumed to be complete. Theorem 7.74 Under the above assumption, Equation (7.98) has a θ -stationary subsolution in the sense that there exists a random variable satisfying the relation ◦θ A◦ θ A ≤ . (7.99) ◦ This subsolution is a solution if the right division x → x/b is l.s.c. The proof is based on the backward process defined by (0) = A , (k + 1)◦θ = A◦ θ (k ) , A k≥0 . (7.100) Lemma 7.75 The sequence { (k )} is nondecreasing. Proof By using (f.6), we obtain A◦ θ A ≥ A◦ θ , A (1)◦ θ = so that (1) ≥ (0). Assume (k ) ≥ (k + 1)◦θ (k − 1), for k ≥ 1. Then = ≥ = A◦ θ (k ) A A◦ θ (k − 1) A (k )◦ θ , ◦ where we used the isotony of the mapping x → (ax )/b . Proofof Theorem 7.74 From the preceding lemma, we obtain that the sum def = (k ) k ≥0 (7.101) 7.5. Second-Order Theorems; Autonomous Case 351 exists since D is assumed to be complete. By summing up the equations in (7.100) and ◦ by using the nondecreasingness of (k ) and the fact that the mapping x → (ax )/ b is isotone, we directly obtain = (k )◦ θ = k ≥1 k ≥0 A◦ θ (k ) A◦ θ ≤ A A . ◦ If the mapping x → (ax )/ b is l.s.c., there is equality in the last relation. A sufficient ◦ condition for this is that the right division operator x → x/b be l.s.c. The subsolution satisfies the following extremal property. Lemma 7.76 The random variable greater than or equal to A. Proof Let we obtain be an arbitrary solution of (7.99) such that ◦θ Thus, ≥ is less than or equal to any solution of (7.99) = A◦ θ A ≥ A◦ θ (k ) ≥ A ≥A= (0). If ≥ (k ), (k + 1)◦θ . . As in the nonautonomous case, the question of finiteness of the minimal (sub)solution will only be addressed in specific dioids. However, we have the following general bound. Lemma 7.77 For all k ≥ 1, (k )◦ θ ≤ A◦θ A . . . A◦ θ −k+1 . A . . . A◦ θ −k+1 Proof The bound clearly holds for k = 1, in view of the definition. Assume it is true for k ≥ 1. Then (k + 1)◦θ = ≤ ≤ = A◦ θ (k ) A ◦ A◦ θ A A◦ θ −1 . . . A◦θ −k / A◦θ −1 . . . A◦ θ −k A ◦ A◦ θ A◦ θ −1 . . . A◦θ −k / A◦θ −1 . . . A◦ θ −k A A◦ θ A◦ θ −1 . . . A◦ θ −k , AA◦ θ −1 . . . A◦ θ −k where we used (f.12) and (f.9) to obtain the last two relations. 352 7.5.3 Synchronization and Linearity From Stationary Ratios to Random Eigenpairs In this subsection we make use of the following inequalities which hold in any dioid D. Lemma 7.78 For all x ∈ D, x≤ e ◦ x \e and x≤ e . ◦ e/ x (7.102) Proof When taking x = e and a = x in Formula (f.5) of Table 4.1, we obtain ◦ x (x \e) ≤ e, which immediately implies the first relation in view of the definition of the residuation of right multiplication. 
The second relation is obtained in the same way. The results of this subsection are concerned with Equation (7.92) in a general dioid D, where A(k ) = A◦ θ k . Theorem 7.79 Assume that the ratio equation associated with (7.92) admits a stationary subsolution in D such that ≤ A◦θ A = ◦θ Ax x , (7.103) and such that , (7.104) for some x in D. Then there exists a right super-eigenpair (λ, X ) such that AX ≥ X ◦θλ . (7.105) Proof Let y = Ax . From (7.103), (f.12) and (f.9), we obtain ◦θ ≤ ◦ ◦ A◦ θ( y/ x ) ( A◦ θ y )/ x A◦ θ y A◦ θ y ≤ = = . A A Ax y ◦ Therefore ◦θ y ≤ A◦ θ y , that is, ( y ◦θ)/(x ◦ θ) y ≤ A◦ θ y . By using (7.102), we obtain ◦ ◦ x ◦θ ≤ (e/ x ◦θ) \e, so that A◦ θ y ≥ y ◦θ y ◦θ y≥ ◦ ◦ (e/ x ◦θ) \e e e y, x ◦θ ◦ where we used (f.11) in order to obtain the last relation. Since x/e = x , we finally obtain e y. A◦ θ y ≥ y ◦θ x ◦θ When taking X = ( Ax )◦ θ −1 (7.106) 7.5. Second-Order Theorems; Autonomous Case 353 and λ= e X, x (7.107) we directly obtain (7.105). Remark 7.80 The terminology which is used in the preceding theorem comes from the case when the dioid of interest is a matrix dioid associated with some scalar dioid D. In this case, it is easily checked from (4.82) that if A is n × n and x (k ) n × 1, then λ, defined in (7.107), is a scalar and X , defined in (7.106), is an n × 1 matrix. 7.5.4 Finiteness and Coupling in Rmax; Positive Case In this subsection, D = Rmax . By using (7.97), it is easily checked that the rightdivision operator of matrices with finite entries in Rmax is l.s.c., so that the subsolution of Theorem 7.74 is a solution. The statistical assumptions are those of the previous section. In addition, we assume that A is positive in the sense that Ai j (k ) ≥ e for all i, j = 1 . . . , n. More general conditions will be considered in the next subsection. 7.5.4.1 Finiteness of the Minimal Stationary Solution We start with a few preliminary lemmas. Lemma 7.81 For all initial conditions x 0 , and all k ≥ 0, δ(k ; x 0 ) satisfies the bounds A(k ) ≤ δ(k ; x 0 ) , (7.108) |δ(k ; x 0 )|⊕ ≤ | A(k )| ⊕ | A(k − 1)|⊕ . (7.109) and for all k ≥ 1, Proof Assume that for some k ≥ 0, δ(k ; x 0 ) ≥ A(k ); this is true for k = 0 in view of (7.96) and (f.6) which imply that δ(0) = A(0)x 0 ≥ A(0) . x0 Then δ(k + 1; x 0 ) = A(k + 1)δ(k ; x 0 ) A(k + 1) A(k ) ≥ ≥ A(k + 1) , A(k ) A(k ) ◦ where we successively used the isotony of the mapping x → (ax )/b and (f.6). This completes the proof of the lower bound. As for the upper bound, let δ− (k ; x 0 ) be the matrix δ− (k ; x 0 ) = x (k ) , x (k + 1) k≥0 . 354 Synchronization and Linearity For all k ≥ 1, we have δ− (k ; x 0 ) = = ◦ x (k ) x (k )/ x (k − 1) = A(k ) A(k − 1)x (k − 1) A(k ) A(k − 1) δ(k − 1; x 0 ) A(k − 1) ≥ , A(k ) A(k − 1) A(k ) A(k − 1) where we successively used (f.9) and the lower bound on δ(k ). Since we are in Rmax , this reads (see Equation (4.82)) n ◦ x i (k )/ x j (k + 1) ≥ ◦ Ail (k − 1)/ ( A(k ) A(k − 1)) jl , l =1 which can be rewritten as n ◦ x j (k + 1)/x i (k ) ≤ ◦ ( A(k ) A(k − 1)) jl / Ail (k − 1) . l =1 Since Ail (k − 1) ≥ e for all i, l , we finally obtain n |δ(k )| ⊕ ≤ ( A(k ) A(k − 1)) jl j ,l =1 = | A(k ) A(k − 1)|⊕ ≤ | A(k )| ⊕ | A(k − 1)|⊕ , which concludes the proof of the upper bound. Lemma 7.82 For all k ≥ 0, the random variable A≤ (k ) satisfies the bounds (k ) , (7.110) and | (k )| ⊕ ≤ | A| ⊕ A◦ θ −1 ⊕ . (7.111) Proof The fact that (k ) ≥ A is clear since (k ) is nondecreasing and (0) = A. 
In order to prove the upper bound in (7.111), we first establish the property that for all x 0 , (k ) ≤ δ(k ; x 0 )◦θ −k , k≥0 . (7.112) The proof is by induction; the property holds for k = 0, in view of (7.108) considered for k = 0. By assuming it holds for some k ≥ 0, we then obtain (k + 1)◦θ = = A◦ θ (k ) A◦θδ(k ; x 0 )◦θ −k ≤ A A A◦ θ k+1 δ(k ; x 0 ) −k ◦θ = δ(k + 1; x 0 )◦θ −k , A◦ θ k 7.5. Second-Order Theorems; Autonomous Case 355 which concludes the proof of (7.112). From (7.112) and (7.109), we immediately obtain that | (k )| ⊕ ≤ |δ(k ; x 0 )|⊕ ◦θ −k ≤ | A|⊕ A◦ θ −1 ⊕ , ∀k ≥ 0 . By putting together the results obtained in this section, we see that the nondecreasing and bounded sequence { (k )} necessarily converges to a finite and integrable limit. Theorem 7.83 In Rmax , if A is integrable and such that Ai j ≥ e for all i, j , the equation ◦θ = A◦θ A (7.113) always admit the limiting value of the sequence { (k )} as a finite and integrable solution. Any other solution of (7.113) is bounded from below by . However, the situation is slightly different from the one encountered in the nonautonomous case: in particular, the equality in law (k ) = δ(k )◦ θ −k has no reason to hold here (because it is not true in general that we can take δ(0) = (0)). Therefore, nothing ensures a priori that δ(k ) converges weakly to as k goes to ∞; the only thing which we know from Lemma 7.76 is that any stationary regime of the ratio process is bounded from below by . The conditions under which this minimal solution can be a weak limit for the ratio process are the main focus of the following subsections. 7.5.4.2 Reachability Definition 7.84 (Direct reachability) A stationary solution of (7.113) is directly reachable if there exists an initial condition x 0 for which the ratio process defined in (7.93) coincides with the stationary process { ◦θ k }, in the sense that δ(k ; x 0 ) = Lemma 7.85 The stationary solution of equations ◦θ k , k≥0 . of (7.113) is reachable if and only if the system = Ax x (7.114) has a finite solution x ∈ Rn with x 1 = e. If such a solution exists, it is unique. Proof If such a solution exists, the ratio process (7.98) can then be made stationary by adopting x 0 = x as the initial condition (see (7.96)). It is clear that there is no loss of generality in assuming that x 1 = e. We now prove that (7.114) has at most one finite solution with x 1 = e. Indeed, for all n-dimensional column vectors a and b with finite entries in Rmax , the relation ◦ a/b b = ◦ a/b b 356 Synchronization and Linearity holds (use (4.82) repeatedly). Therefore, x= x x ·1 ◦ Ax/ x ◦ Ax/ x = so that x is uniquely defined from ·1 = ·1 , . Remark 7.86 Using the construction in the preceding proof, it is easy to check that the set A(ω)x (ω) ω ∃ x (ω) ∈ Rn : x 1 (ω) = 0, (ω) = x (ω) can be rewritten as ω ◦ A(ω) ( (ω) \ (ω)) ·1 ◦ ( (ω) \ (ω)) ·1 (ω) = , (7.115) where its measurability becomes more apparent. Similarly, a stationary solution of (7.113) is said to be reachable by coupling if there exists an initial condition x 0 and a finite random variable K such that δ(k ; x 0 ) = ◦θ k , k ≥ K . The aim of the following subsections is to give sufficient conditions under which the minimal stationary solution of Theorem 7.74 satisfies the above reachability properties. 
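The backward construction of §7.5.2 lends itself to direct numerical experimentation. In the sketch below, a[m] stands for the sample of A∘θ^{-m} along one trajectory; the i.i.d. uniform model and the truncation depth K are illustrative assumptions. The recursion (7.100) is evaluated at the shifted points θ^{-m}ω, the value of the backward variable at ω is read off at m = 0, and the monotonicity of Lemma 7.75 together with the convergence asserted in Theorem 7.83 can be observed on the printed increments.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 3, 12

def otimes(A, B):
    """Max-plus product: (A otimes B)_ij = max_k (A_ik + B_kj)."""
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def rdiv(A, B):
    """Greatest X with X otimes B <= A; entrywise formula (7.97)."""
    return np.min(A[:, None, :] - B[None, :, :], axis=2)

# a[m] represents A(theta^{-m} omega) along one sample path (illustrative i.i.d. model)
a = [rng.uniform(0.0, 1.0, size=(n, n)) for _ in range(K + 1)]

# D[m] holds the backward variable Delta(k) evaluated at theta^{-m} omega;
# start from Delta(0) = A, i.e. D[m] = a[m]
D = [a[m].copy() for m in range(K + 1)]
prev = D[0].copy()
for k in range(K):
    # (7.100): Delta(k+1)(theta^{-m} omega)
    #          = ( A(theta^{-m} omega) otimes Delta(k)(theta^{-m-1} omega) ) / A(theta^{-m-1} omega)
    D = [rdiv(otimes(a[m], D[m + 1]), a[m + 1]) for m in range(K - k)]
    assert np.all(D[0] >= prev - 1e-12)   # Lemma 7.75: the sequence is nondecreasing
    print(f"k = {k+1:2d}  max increase of Delta(k) at omega = {np.max(D[0] - prev):.4f}")
    prev = D[0].copy()
```

The increments typically shrink to zero, in agreement with the pathwise bound of Lemma 7.82 and the convergence of Theorem 7.83.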
7.5.4.3 Conditions for Reachability For k ≥ 0, let A(k ) denote the event def A(k ) = ω where event ∃ x ∈ Rn : (k ) = Ax x , (7.116) (k ) is the nondecreasing backward process defined in (7.100), and let A be the def A= ω ∃ x ∈ Rn : = Ax x , (7.117) where is the a.s. limit of (k ). The following notation will be used in what follows: for all events B, B◦θ will denote the set B◦θ = {ω ∈ | 1B ◦θ(ω) = 1}. Lemma 7.87 For all k ≥ 0, A(k ) is included in A(k + 1)◦θ and A is included in A◦θ. Proof For ω ∈ A(k ), there exists a finite random vector x such that Hence, ◦ A◦ θ (k ) A◦ θ ( Ax/ x ) = A A ◦ ( A◦ θ Ax ) / x A◦ θ Ax = = A Ax A◦ θ ( (k )) ·1 = , ( (k )) ·1 ◦ (k ) = Ax/ x . (k + 1)◦θ = (from (f.12) and (f.9)) 7.5. Second-Order Theorems; Autonomous Case 357 so that θ(ω) belongs to A(k + 1) (take x = ( (k )) ·1 ◦θ −1 ). The proof of the second inclusion is similar. Lemma 7.88 If P[A(k )] > 0 for some k, then lim supk→∞ A(k ) = a.s. Proof If P[A(k )] > 0 for some k ≥ 0, the ergodic assumption implies that h 1{A(k)}◦θ −m / h = P[A(k )] > 0 a.s., lim h→∞ m =1 so that necessarily lim supm →∞ A(k )◦ θ −m = a.s. From Lemma 7.87, we obtain by an immediate induction that for all m ≥ 0, A(k + m ) ⊇ A(k )◦ θ −m . Thus, the last relation implies that lim supk→∞ A(k ) = a.s. Lemma 7.89 If P[A(k )] > 0 for some k, then P[A] = 1. Proof Let H be the subset of Rn×n defined by ∈ Rn×n H= ∃ x ∈ Rn : = Ax x . When using (7.115), we obtain the equivalent representation ∈ Rn×n H= = ◦ A ( \ )·1 ◦ ( \ )·1 , from which it is immediate that H is a closed subset of Rn×n . Owing to Lemma 7.88, if P [A(k )] > 0 for some k , then for almost all ω ∈ there exists a sequence of integers kn ↑ ∞ such that ω ∈ A(kn ) or equivalently such that (kn ) ∈ H for all n ≥ 1. Since H is closed, the a.s. limit of (kn ) when n goes to ∞ is also in H, so that ω ∈ A. Let h be a fixed integer such that 1 ≤ h ≤ n, and let B be the event B = { A◦ θ·h Ah · = A◦ θ A} , (7.118) or equivalently, n B= A j h ◦θ Ah i = A j h ◦θ Ahi , ∀i, j = 1, . . . , n . (7.119) h =1 Theorem 7.90 If there exists h such that P[B] > 0, the stationary regime defined by is directly reachable. Proof If a is an n-dimensional row (respectively column) vector with finite entries in Rmax , then we can use the group structure of ⊗ to write a= e ◦ a \e respectively a = e ◦ e/ a , 358 Synchronization and Linearity ◦◦ where e is 1 × 1. If c is a scalar, we have c = e/(e/ c). On B, we therefore have ◦ ◦ A◦ θ A A◦ θ·h Ah · A◦ θ·h (e/ ( Ah · \e)) = = A A A ◦ ( Ah · \e) ◦ A◦ θ·h / A◦ θ·h = = . ◦ A A ( Ah · \e) (1)◦ θ = In these equalities, we used (f.12) in order to obtain the fourth equality, and (f.9) in order to obtain the last one. Both are equalities because D = Rmax and the entries of the matrices which are dealt with are finite. ◦ Let Z = A ( Ah · \e). On B, we have A◦θ Z = A◦ θ A e e = A◦ θ·h Ah · = A◦ θ·h . Ah · Ah · Therefore, on B, (1)◦ θ = A◦θ·h Z = A◦ θ Z , Z so that B ⊂ A(1)◦θ . The proof is immediately concluded from Lemma 7.89. Remark 7.91 Consider the particular case of an autonomous event graph for which the random variables α j (k ) are mutually independent. It is easily checked that a sufficient condition for P[B] > 0 is that there exists one transition in Q such that all the places which follow it have holding times with an infinite support. As we will see in Example 7.92, weaker conditions, like for instance having one transition followed by at least one place with infinite holding times, may be enough to ensure this property. 
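The condition P[B] > 0 of Theorem 7.90 can also be explored by simulation: one draws A and A∘θ and checks whether some index h achieves the product A∘θ ⊗ A through its column and row alone, as in (7.118)–(7.119). The i.i.d. exponential entries used below form an illustrative model with unbounded support (in the spirit of Remark 7.91); the independence of the two sampled matrices is likewise an assumption of the sketch, not a property required by the theorem.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 3, 20_000

def otimes(A, B):
    """Max-plus product: (A otimes B)_ij = max_k (A_ik + B_kj)."""
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def in_B(A0, A1):
    """True if some h satisfies A1_{.h} otimes A0_{h.} = A1 otimes A0, cf. (7.118)-(7.119);
    here A1 plays the role of A(theta omega) and A0 that of A(omega)."""
    P = otimes(A1, A0)
    for h in range(n):
        if np.allclose(A1[:, h][:, None] + A0[h, :][None, :], P):
            return True
    return False

hits = sum(in_B(rng.exponential(1.0, (n, n)), rng.exponential(1.0, (n, n)))
           for _ in range(trials))
print(f"estimated P[B] ~ {hits / trials:.4f}")
```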
As in the nonautonomous case, there is no reason for the initial condition, the existence of which is proved in Theorem 7.90, to be compatible. Example 7.92 (Manufacturing blocking) Consider a closed cyclic network of n machines. There are no intermediate buffers between machines. The migration of items is controlled by manufacturing blocking, as defined in Example 7.65: when an item is finished in machine j , 0 ≤ j ≤ n − 1, it enters machine s ( j ) = ( j + 1) mod n if s ( j ) is empty. Otherwise it is blocked in j until s ( j ) is empty. In this example, all machine indices are understood to be modulo n. A network of this type, with n = 3, is described by the timed event graph of Figure 7.4. The interpretation of the various types of places and transitions is the same as in Example 7.65. The only nonzero holding times are those in p j 1 , j = 0, . . . , n, which will be denoted α j (k ), representing the service time of the k -th item in machine j . Let µ j denote the initial number of items in p j 1 . It is assumed that µ j = 0 or 1, that n−1 0< µj < n , j =0 and that if µ j = 0, then there is an initial token in place p j 2 and in place p j 3 . 7.5. Second-Order Theorems; Autonomous Case 359 p00 p13 q01 p02 p01 q02 p10 p23 q11 p11 q12 p20 q21 p21 q22 p22 p12 p03 Figure 7.4: Closed manufacturing blocking The state variables are the firing times of the transitions followed by at least one place with a nonzero initial marking, namely x j (k ) = x j1 x j2 if µ j = 1 ; if µ j = 0 . The initial lag times are assumed to be compatible, and are given under the form of the n vector z ∈ R+ , where z j represents the lag time of the initial item in machine j if µ j = 1, and the lag times of the two tokens in p j 2 and p j 3 otherwise. In the first case, z j can be seen as the epoch when machine j starts working. In the second one, z j is the time when some past workload of machine j has been completed. Let r ( j ) be the number of machines to the right of j such that µ j = 1, plus 1 and let l ( j ) be the number of machines to the left of j such that µ j = 0, plus 1. It is easily checked that the n × n matrix A(k ) is given by the relation j− h=iµ j αh (k + 1) if i ∈ { j − l ( j ), . . . , j − 1} ; α (k + 1) i if i ∈ { j , . . . , j + r ( j ) − 1} and A j i (k ) = i ∈ { j − l ( j ), . . . , j − 1} ; / e if i = j + r ( j ) and i ∈ { j − l ( j ), . . . , j − 1} ; / ε otherwise, that the precedence graph is strongly connected, and that under the condition of Theo◦ rem 7.90, the ratios of the form x j (k + 1)/ x i (k ), i, j = 0, . . . , n − 1, admit a stationary regime which is reached with coupling regardless of the initial condition. For the example of Figure 7.4, A(k ) reads α0 (k + 1) α1 (k + 1)α2 (k + 1) α2 (k + 1) . (7.120) α1 (k + 1) e A(k ) = α0 (k + 1) α0 (k + 1) α1 (k + 1)α2 (k + 1) α2 (k + 1) 360 Synchronization and Linearity Matrix A being positive, we can directly apply Theorems 7.90 and 7.93. The condition P[B] > 0 is satisfied if the random variables α j (k ) are mutually independent and if one of the distribution functions S0 and S1 has an infinite support, where S j (t ) = P[α j ≤ t ], t ∈ R+ . 7.5.4.4 Coupling The main result of this subsection is the following theorem. Theorem 7.93 If there exists h such that P[B] > 0, the stationary sequence { ◦θ k } is the unique stationary solution of (7.98). For any initial condition x 0 , the sequence {δ(k ; x 0 )} couples with this stationary sequence. 
Proof Let C (k ), k ≥ 1, be the event C (k ) = { A(k + 1)x (k + 1) = A·h (k + 1)x h (k + 1)} . We first prove that on the event C (k ), the relation δ(k + 1) = A·h (k + 1) ◦ A(k ) (δh · (k ) \ e) (7.121) holds. On C (k ), we have indeed δ(k + 1) = = = A(k + 1)x (k + 1) A·h (k + 1)x h (k + 1) = A(k )x (k ) A(k ) x (k ) ◦ (e/ x h (k + 1))) ◦ A·h (k + 1) (e/ A·h (k + 1) = ◦ A(k )x (k ) A(k ) x (k ) ( e/ x h (k + 1)) A·h (k + 1) A·h (k + 1) = . ◦ ◦ A(k ) ( x (k )/ x h (k + 1)) A(k ) (δh · (k ) \ e) Therefore, on C (k ), δ(k + 2) = = A(k + 2)x (k + 2) x (k + 2) = A(k + 2) x (k + 2) x (k + 2) δ·1 (k + 1) A·h (k + 1) A(k + 2) = A(k + 2) , δ·1 (k + 1) A·h (k + 1) where the last relation follows from (7.121). This last formula shows that on the event C (k ), regardless of x 0 , δ(k + 2; x 0 ) = φ( A(k + 1), A(k + 2)) , (7.122) where φ is a measurable function which we will not need in explicit form. If we can show that for all k , D◦θ k ⊂ C (k ), where D is an event of positive probability, then Equation (7.122) implies that δ(k )◦ θ −k couples with a uniquely defined finite stationary sequence. This result is a direct consequence of Borovkov’s renovating events theorem (Theorem 7.107 shows that C (k ) is a renovating event of length 2). 7.5. Second-Order Theorems; Autonomous Case 361 The second step of the proof consists in showing that B◦θ k ⊂ C (k ). In order to do so, we first prove that B⊂ A e A◦ θ·h ≤ Ah · A◦ θ . (7.123) ◦ The property A◦ θ A = A◦ θ·h Ah · implies that A◦θ A ( Ah · \e) = A◦θ·h , which in turn ◦ ◦ implies that A ( Ah · \e) ≤ ( A◦ θ \ A◦ θ·h ), where we used the very definition of left division in order to obtain the last implication. This immediately implies (7.123). We are now in a position to conclude the proof by showing that B◦θ k ⊂ C (k ) . (7.124) Inequality (7.109) implies x (k + 1) x (k + 1) δ(k ) A(k ) = = ≥ . x (k + 1) A(k ) x (k ) A(k ) A(k ) Therefore, for all h , x h (k + 1) Ah· (k ) ≥ , x (k + 1) A(k ) or, equivalently, x (k + 1) e ≤ A(k ) . x h (k + 1) Ah· (k ) (7.125) On the event B◦θ k A(k + 1) A(k ) = A·h (k + 1) Ah · (k ) . By using (7.123) and (7.125), we therefore obtain that on this event, x (k + 1) e A·h (k + 1) ≤ A(k ) ≤ . x h (k + 1) Ah · (k ) A(k + 1) From the very definition of left division, the last relation implies that A(k + 1) x (k + 1) ≤ A·h (k + 1) , x h (k + 1) so that A(k + 1)x (k + 1) ≤ A·h (k + 1)x h (k + 1) . which concludes the proof of (7.124). Therefore, there exists a stationary sequence of renovating events of length 2. The coupling property is then a direct application of Borovkov’s Theorem. 362 7.5.5 Synchronization and Linearity Finiteness and Coupling in Rmax; Strongly Connected Case The assumptions of this subsection are the same as in §7.5.4, except for the positiveness assumption which is replaced by the assumption that the precedence graph of A has a deterministic topology (namely the entries which are equal to ε with a positive probability are a.s. equal to ε) and is strongly connected. We also make the usual assumption that the diagonal entries are nonvanishing. Under these assumptions, the matrices def E (k ) = A(nk + n − 1) A(nk + n − 2) . . . A(nk + 1) A(nk ) , k∈Z , are such that E i j (k ) ≥ e for all pairs (i, j ) (this follows from Remark 7.30 and from the fact that E (k ) = G (nk ). In this subsection, it will also be assumed that the shift def = θn is ergodic (the ergodicity of θ does not grant the ergodicity of θ k , k > 1, in general). def Observe that E (k ) = E ◦ k , where E = E (0). 
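A quick numerical check of the positivity claim made for E(k): for a matrix A with strongly connected deterministic support and nonvanishing diagonal, the product of n consecutive samples has all its entries finite. The support chosen below (recycled transitions on the diagonal plus one circuit visiting all of them) and the uniform finite entries are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
EPS = -np.inf                                  # epsilon of R_max

def otimes(A, B):
    """Max-plus product: (A otimes B)_ij = max_k (A_ik + B_kj)."""
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def sample_A():
    """Illustrative A(k): fixed strongly connected support, random finite positive entries."""
    A = np.full((n, n), EPS)
    for i in range(n):
        A[i, i] = rng.uniform(0.5, 1.5)            # nonvanishing diagonal (recycling)
        A[(i + 1) % n, i] = rng.uniform(0.5, 1.5)  # circuit 1 -> 2 -> ... -> n -> 1
    return A

# E(0) = A(n-1) otimes ... otimes A(1) otimes A(0)
E = sample_A()
for _ in range(n - 1):
    E = otimes(sample_A(), E)
print(E)
print("all entries of E(0) are finite and >= e:", bool(np.all(E > EPS) and np.all(E >= 0.0)))
```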
n Let X (k ) ∈ R be defined by the relation X (k ) = x (nk ) , k≥0 . (7.126) It is easily checked from (7.92) that the state variables X (k ) satisfy the relation X (0) = x (0) , X (k + 1) = E (k ) X (k ) , k≥0 . (7.127) Following our usual notation, we will stress the dependence on the initial condition def x 0 = x (0) by writing X (k ; x 0 ) when needed. Theorem 7.94 Under the assumption that there exists h such that P [E◦ ·h E h · = E ◦θ E ] > 0 , (7.128) ◦ the ratios δ(k ) = x (k + 1)/x (k ) also admit a stationary regime {δ ◦θ k }. This stationary regime is unique, integrable and directly reachable. Whatever the initial condition x 0 , δ(k ; x 0 ) couples with it in finite time. ◦ Proof Under Assumption (7.128), the ratio process X (k + 1)/ X (k ) couples with a stationary regime ◦ k , which is directly reachable (Theorems 7.90 and 7.93). Therefore, the equation = Ex x (7.129) has a unique solution satisfying the condition x 1 = e. From the very definition, taking x as the initial condition makes the ratios def (k ; x ) = x ((k + 1)n; x ) x (k n; x ) 7.5. Second-Order Theorems; Autonomous Case 363 stationary in k , and more precisely such that (k ; x ) = ◦ k , k ≥ 0. Therefore, the ◦ ratios x ((k + 1)n + 1; x )/x (k n + 1; x ) are stationary in k , as can be seen when writing them as x ((k + 1)n + 1; x ) x (k n + 1; x ) = = = A((k + 1)n)x ((k + 1)n) A(k n)x (k n) ◦ A((k + 1)n)x ((k + 1)n)/x (k n) A(k n)x (k n) A((k + 1)n) (k , x ) , A(k n) x (k n) and when using the stationarity of (k ; x ). But this ratio process is the one generated by the event graph when taking { A(k )◦ θ }k≥0 as the timing sequence, and y as the initial ◦ condition, where y = x (1; x )/ x 1 (1; x ). In view of the uniqueness property mentioned ◦ in Theorem 7.93 we immediately obtain that x (n + 1; x )/x (1; x ) = ◦θ . Since y1 = e, this in turn implies that y = x ◦θ , owing to the uniqueness property mentioned in Lemma 7.85. We show that the ratio process δ(k ; x ) satisfies (7.94). We have δ(1; x ) = = = A(1)x (1) x (1) = A(1) x (1) x (1) ◦ x (1)/ x 1 (1) x Ax A(1) = A◦θ ◦θ = ◦θ ◦ x (1)/ x 1 (1) x x δ(0; x )◦θ , (7.130) so that δ(k ; x ) satisfies (7.94) for k = 1. In addition, δ(k ) satisfies the equation δ(k + 1) = A(k + 1)x (k + 1) A(k + 1)δ(k ) = . A(k ) A(k ) From this relation and (7.130), we prove by an immediate induction that δ(k ) satisfies (7.94) for all k ≥ 0. One proves in the same way that the coupling of the ratios (k ) with a uniquely defined stationary process implies the same property for δ(k ). The integrability property follows from the integrability of x and of the finite entries of A and from the relation ◦ δ(0; x ) = Ax/ x . Remark 7.95 If we replace n in (7.126) by another integer n , such that • (G (n k )i j ≥ e for all i, j = 1, . . . , n (this condition is satisfied for all n ≥ n); • θ n is ergodic, then the whole construction is unchanged. As a consequence, whenever the variables A associated with n do not satisfy the reachability and coupling conditions of Theorems 7.90 and 7.93, we still have the option to test this condition on the variables A associated with n . 364 7.5.6 Synchronization and Linearity Finiteness and Coupling in Rmax; General Case The framework is the same as in §7.5.5, but A is not supposed to have a strongly connected precedence graph anymore. The notations concerning the decomposition of the precedence graph into m.s.c.s.’s are those of the end of §7.3.5. In particular, we will number the source subgraphs 1, . . . , N0 and the nonsource ones N0 + 1, . . . , N . 
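The decomposition of the precedence graph into m.s.c.s.'s, and the distinction between source and nonsource subgraphs, can be computed mechanically from the support of A. The sketch below uses SciPy's strongly connected components routine on a small illustrative support matrix; the convention adopted is that a finite entry A_ij corresponds to an arc from transition j to transition i.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

EPS = -np.inf
# illustrative precedence matrix (entries not taken from any example of the text)
A = np.array([[1.0, EPS, EPS, EPS],
              [2.0, 1.5, EPS, EPS],
              [EPS, 1.0, 2.0, 0.5],
              [EPS, EPS, 1.0, 1.0]])

support = (A != EPS).astype(int)               # deterministic topology assumed
n_scc, label = connected_components(support, directed=True, connection='strong')

# a m.s.c.s. is a source of the reduced graph if no arc enters it from another m.s.c.s.
is_source = np.ones(n_scc, dtype=bool)
rows, cols = np.nonzero(support)
for i, j in zip(rows, cols):                   # finite A_ij means arc j -> i
    if label[i] != label[j]:
        is_source[label[i]] = False

for c in range(n_scc):
    nodes = np.nonzero(label == c)[0].tolist()
    kind = "source" if is_source[c] else "nonsource"
    print(f"m.s.c.s. {c}: transitions {nodes} ({kind})")
```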
We know from §7.5.5 that the stationary regime of the ratio process of the source subgraphs can be constructed using the techniques developed there. The only remaining problem consists in the construction of the stationary regime of nonsource subgraphs. Consider first the case when the reduced graph has a single source, namely N0 = 1, and assume that it satisfies the assumption of Theorem 7.94. Then we obtain from this ◦ theorem that the ratios x (1)(k + 1)/ x (1)(k ) couple in finite time with a stationary, ergodic k and integrable process δ(1)◦θ , which satisfies the property E δ(1) i i = a(1), where a(1) is the maximal Lyapunov exponent associated with A(1) . The same technique as in the general nonautonomous case (see §7.4.4.5) allows us to prove the following theorem. Theorem 7.96 If A(1) satisfies the assumptions of Theorem 7.94, and if the condition N n=2 a(n) < a(1) holds true, then a unique finite random matrix δ exists such that ◦ the ratio process δ(k ) = x (k + 1)/ x (k ) couples in finite time with the stationary and N k ergodic process δ ◦θ , regardless of the initial condition. If n=2 a(n) > a(1) , let n 0 be the first n ∈ {2, . . . , N } such that a(n) > a(1). Then all ratios of the form δ j i (k ), j ∈ Vn0 , i ∈ Vm , m ∈ π + (n 0 ) tend to ∞ a.s. for all initial conditions. Remark 7.97 Nothing general can be said with respect to the critical case, namely when n≥2 a(n) = a(1) (e.g. queuing theory). Consider now the case when the reduced graph has several sources, namely N0 > 1. If the sources have different cycle times, it is clear that some of the ratios of the processes x j (k ), j ∈ Vn , n = 1, . . . , N0 , can neither be made stationary nor couple with a stationary sequence. Even if all these m.s.c.s.’s have the Lyapunov exponents, nothing general can be said about the stationarity of the variables δi j (k ) for j ∈ Vn , i ∈ Vm , m , n = 1, . . . , N0 , m = n , as exemplified in the following simple situation. Example 7.98 Consider a timed event graph with three recycled transitions q1 , q2 and t and five places p1 , p2 , p1 , p2 and r . Place pi (respectively r ) is the place associated with the recycling of qi , i = 1, 2 (respectively t ) and pi is the place connecting qi to t (see Figure 7.5). Within the terminology of Example 7.37, this system is a join queue with one server (transition t ) and with two sources (transitions q1 and q2 ). This example can be seen as the simplest assembly problem in manufacturing: engines are produced by q1 and car bodies by q2 , whereas t is the assembly machine. With our terminology, we have two source m.s.c.s.’s Gi , with Vi = {qi } and Ei = (qi , qi ), i = 1, 2, and one nonsource subgraph G3 , with V3 = {t } and E3 = (t , t ). Assume the holding times in r , p1 and p2 are zero and that the holding times in p1 and p2 are mutually independent i.i.d. sequences {α1 (k )} and {α2 (k )} with common mean λ. If the variables α1 (k ) and ◦ α2 (k ) are deterministic, the ratios x 1 (k + 1)/ x 2 (k ) (with obvious notation) are stationary 7.5. Second-Order Theorems; Autonomous Case 365 q1 p1 p1 t p2 q2 r p2 Figure 7.5: Counterexample and finite whatever the initial condition. However, if the two sequences are made of exponentially distributed random variables with parameter λ, these ratios form a null recurrent Markov chain on R which admits no invariant measure with finite mass, so that they cannot be made stationary. 
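The dichotomy just described is easy to reproduce by simulation. For the two source transitions of Figure 7.5 one has x_i(k) = x_i(k-1) + α_i(k), so that the ratio of x_1(k) to x_2(k) is, in conventional algebra, a random walk with increments α_1(k) − α_2(k). The sketch below contrasts the deterministic case (constant ratio) with the exponential case with equal means (zero-mean random walk, hence no stationary regime); the rate λ = 1 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(4)
lam, K = 1.0, 200_000

def ratio_path(alpha1, alpha2):
    """x_1(k) - x_2(k) up to an additive constant (conventional algebra)."""
    return np.cumsum(alpha1 - alpha2)

# deterministic holding times with mean 1/lam: the ratio stays constant
det = ratio_path(np.full(K, 1.0 / lam), np.full(K, 1.0 / lam))
# exponential holding times with the same mean: zero-mean random walk (null recurrent)
expo = ratio_path(rng.exponential(1.0 / lam, K), rng.exponential(1.0 / lam, K))

for k in (10, 1_000, 100_000):
    print(f"k = {k:>7}   deterministic ratio = {det[k-1]:7.2f}   "
          f"exponential ratio = {expo[k-1]:7.2f}")
print("the exponential ratio fluctuates on a scale of order sqrt(k): no stationary regime")
```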
This example can also be formulated as a nonautonomous system with a twodimensional input vector (u 1 (k ), u 2 (k )) whenever we remove the recycling place pi associated with transitions qi , and replace it by an input function u i (k ), i = 1, 2. In this formulation we see that in the exponential case (i.e. u 1 (k ) and u 2 (k ) are the epochs of two independent Poisson processes with the same intensity), the matrices U+ (k ) and U− (k ) do not satisfy the assumptions of §7.4.4.1 (although the diagonal terms of these matrices are stationary and integrable, the nondiagonal terms do not couple with finite stationary processes). We see that in this case, our second-order theorems do not apply. 7.5.7 Multiplicative Ergodic Theorems in Rmax We will limit ourselves to the case when A is positive (see §7.5.4). Theorem 7.99 If the event B = {ω | ∃h : A◦ θ A = A◦θ·h Ah · } has a positive probability, then there exists a unique finite eigenpair {λ, X }, with X 1 = e and such that AX = λ X ◦θ . This eigenpair is integrable, and E [λ ] = a , (7.131) where a is the maximal Lyapunov exponent of A. In addition, the following coupling property takes place: for all finite initial conditions x (0) = x 0 , with (x 0 )1 = e, there exists a finite integer-valued random variable K such that, for all k ≥ K , x (k + 1; x 0 ) X ◦θ =λ x (k ; x 0 ) X ◦θ k . Proof We know from Theorem 7.90 that under the above assumptions, Equation (7.105) has a unique finite solution for which (7.105) is satisfied with equality, and such that (7.104) has a solution. When specializing the formulæ of Theorem 7.79 to vectors and matrices with entries in Rmax as considered here, it is easily checked that whenever the 366 Synchronization and Linearity subsolution of (7.103) is finite and is a solution, then the super-eigenvalue inequality of (7.105) becomes an eigenvalue equality, so that there exists a pair (λ, X ) satisfying the eigenpair property (7.68). ◦ ◦ First it is easily checked that if ( X , λ) is a finite eigenpair, then ( AX )/ X = (λ X ◦θ)/ X is a solution of the ratio equation (7.113). Under the foregoing assumptions, this equa◦ tion has a unique solution . Since the equation = ( Ax )/ x has a unique solution x such that x 1 = e (Lemma 7.85), the eigenvector X is uniquely defined if we decide ◦ that X 1 = e. The same property holds for λ since λ( X ◦θ)/ X = . Property (7.131) follows from the relation x (k + 1; X ) = λ◦θ k . . . λ X ◦θ k+1 , which implies that 1/ k k 1/ k ( x 1 (k + 1; X )) = λ◦θ h . h =0 The result follows immediately from the pointwise ergodic theorem and from Corollary 7.31. 7.6 Stationary Marking of Stochastic Event Graphs In this section we consider a stochastic event graph, with all its transitions recycled, and where the places in the recycling all have positive holding times. We return to the notation of conventional algebra. Definition 7.100 (Stable place) A place of the event graph is said to be stable if the number of tokens in this place at time t (the marking at time t ), converges weakly to a finite random variable when time goes to ∞. The event graph is said to be stable if all the places are stable. The aim of this section is to determine the conditions under which the event graph is stable and to construct the stationary regime of the marking process, under the usual stationarity and ergodicity on the holding times. 
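Before turning to markings, the coupling statement of Theorem 7.99 can be observed directly in simulation: with a positive random matrix whose entries have unbounded support (so that P[B] > 0 is plausible), the states issued from two different finite initial conditions become equal once normalized by their first coordinate after a finite random time, and then remain so. The i.i.d. exponential entries and the particular second initial condition used below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 4, 2000

def otimes_vec(A, x):
    """Max-plus matrix-vector product: (A otimes x)_i = max_j (A_ij + x_j)."""
    return np.max(A + x[None, :], axis=1)

x = np.zeros(n)                          # x0 = (e, ..., e)'
y = np.array([0.0, 5.0, -3.0, 8.0])      # another finite initial condition (illustrative)
coupled_at = None
for k in range(K):
    A = rng.exponential(1.0, (n, n))     # positive matrix, unbounded support
    x, y = otimes_vec(A, x), otimes_vec(A, y)
    if coupled_at is None and np.allclose(x - x[0], y - y[0]):
        coupled_at = k + 1               # once equal up to a scalar, they remain so

if coupled_at is None:
    print(f"no coupling observed within {K} steps (increase K)")
else:
    print(f"normalized states from the two initial conditions coincide from k = {coupled_at} on")
```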
Remark 7.101 Let P 0 be the subset of places connecting two transitions belonging to the same strongly connected subgraph, and P 1 be the subset of places connecting transitions which belong to no circuit. The marking of a place in P 0 is bounded, and the only places which can possibly have an infinite marking when time goes to ∞ are those of P 1 (see Chapter 2). Pick some place pi in P and let q j = π( pi ), and ql = σ ( pi ). Assume that there exists an initial condition x 0 such that x j (0; x 0 ) = 0, and such that the ratios x (k + ◦ 1; x 0 )/ x (k ; x 0 ) are stationary and ergodic (the conditions for such an initial condition to exist are given in Theorems 7.67 and 7.96, respectively, for the nonautonomous and the autonomous case). Since the sequence def ◦ b (k ) = x (k ; x 0 )/ x j (k ; x 0 ) , k≥0 , 7.6. Stationary Marking of Stochastic Event Graphs 367 is stationary and ergodic, it can be continued to a bi-infinite stationary and ergodic def sequence by the relation b (k ) = b ◦θ k , k ∈ Z, where b = b (0). A similar continuation also holds for the sequence def def d (k ) = δ j j (k ; x 0 ) = d ◦θ k , k≥0 . Because of the assumption that q j is recycled and with positive holding times in the recycling, d > 0. Therefore, we can consider the sequence {d (k )} as the stationary inter-event times of a point process defined on ( , F, P, θ). Definition 7.102 We define N as the marked point process on ( , F, P, θ) with interevent times sequence {d (k )}k∈Z and with the R|Q| -valued mark sequence (bh (k )) qh ∈Q k ∈Z . Namely, the k-th point of N is def t (k ) = x j (k ; x 0 ) −1 h=k −d (k ) for k ≥ 0 ; for k < 0 , and its mark is {bh (k ), qh ∈ Q}. The interarrival times and the marks being θ -stationary, this point process is stationary (in its so-called Palm version). Owing to our assumptions, N has a finite intensity and no double points. Let T (k ) = (T1 (k ), . . . , Tn (k )), where n = |Q|, be the sequence def Th (k ) = t (k ) + bh (k ) , and let Ni− qh ∈ Q , k∈Z , be the random variable Ni− = 1{ Tl(k+µi )>0} , (7.132) k ≤0 where ql = σ ( pi ). This variable is a.s. finite. Indeed, Tl (k ) satisfies the relations lim k →∞ Tl (k ) =c>0 , k where c is a positive constant. Therefore {Tl (k )} is an increasing sequence such that limk→−∞ Tl (k ) = −∞ a.s. Hence there exists a finite integer-valued random variable H such that Tl (k ) ≤ 0 for all k ≤ − H . Theorem 7.103 Under the assumptions of Theorem 7.67 (respectively 7.96), if a(n) < u, for all n = 1, . . . , N , (respectively a(n) < a(1), for all n = 2, . . . , N ), where N denotes the number of m.s.c.s.’s of the event graph, then the event graph is stable whatever the initial condition, and the marking in place pi at arrival epochs converges weakly to the random variable Ni− . Conversely, if n 0 is the first n = 1, . . . , N , such that a{n} > u (respectively the first n = 2, . . . , N , such that a(n) > a(n) ), then the places connecting the transitions of Q m ∪ I (respectively Q m ), m < n 0 , to transitions of Q n0 are all unstable whatever the initial condition. 368 Synchronization and Linearity Proof Let Ni− (k ) be the number of tokens in pi just after time x j (k ), k ≥ 1, where q j = π( pi ). From (2.42) we obtain that Ni− (k ) = k + µi k + µi 1{ xl (h)> x j (k)} = h =1 1{ xl (k+µi −h)> x j (k)} . (7.133) h =0 We first prove the last assertion of the theorem. Assume that pi is the place connecting a transition of Q m ∪ I (respectively Q m ) to a transition of Q n0 . 
Owing to the property that xl (k ) − x j (k ) = a (n 0 ) − u > 0 , lim k k xl (k ) − x j (k ) respectively lim = a (n 0 ) − a (1 ) > 0 , k k and to the increasingness of the sequences {xl (k )} and {x j (k )}, we obtain that, for all H , there exits K such that, for all k ≥ K and h = 1, . . . , H , xl (k − h ) − x j (k ) ≥ 0. It follows immediately from this that Ni− (k ) ≥ H for k ≥ K . Therefore, Ni− (k ) tends to ∞ a.s. We now prove the first part. We know that the ratios of x h (k ), qh ∈ Q, couple with their stationary regime in a finite random time K . This implies that for all fixed h , the sequence {xl (k + µi − h ) − x j (k )} couples with a stationary process. More precisely, for all k ≥ K + h , and h > µi , xl (k + µi − h ) − x j (k ) = −ρl (µi − h )◦θ k , where k d ◦θ n − bl ◦θ −k = T j (0) − Tl (k ) , def ρl (k ) = k<0 , n=−1 in view of the uniqueness of the stationary regimes of the ratios. Define H = inf k | k ≥ K , xl (h ) − x j (k ) < 0 , ∀h = 1, . . . , K . This H is a.s. finite since K is finite and x j (k ) tends to ∞ a.s. Therefore, Ni− (k ) = 1{ x (k+µi −h)l − x j (k)>0} = 1 ≤h ≤k − K 1{−ρl (µi −h)◦θ k >0} , 1 ≤h ≤k − K for all k ≥ H . On the other hand, Ni− ◦θ k = 1{ Tl(k+µi −h)− T j (k)>0} = 0 ≤h 1{−ρl (µi −h)◦θ k >0} . 0 ≤h Since T j (k ) tends to ∞ as k goes to ∞, we obtain that there exists an L such that 1{ Tl(k+µi −h)− T j (k)>0} = 0 , k −h ≤ K for all k ≥ L . Therefore, Ni− (k ) = Ni− ◦θ k for k ≥ max( H , L ), and the stationary regime of the marking process is reached with coupling, regardless of the initial condition. 7.7. Appendix on Ergodic Theorems 369 Remark 7.104 This stationary regime is unique, owing to the uniqueness of the stationary regime of the ratio process. Remark 7.105 The preceding construction gives the Palm probability of the number of tokens in pi at arrival epochs. The stationary distribution of the number of tokens in pi in ‘continuous time’ is then obtained via the Palm inversion formula (see [5, p. 17]). 7.7 Appendix on Ergodic Theorems The aim of this section is to state in a concise form a few basic ergodic theorems (in conventional algebra) which are either used or referred to in this chapter. The basic data are a probability space ( , F, P) on which a shift operator θ is defined (see Definition 7.2). This shift is assumed to be stationary and ergodic. Theorem 7.106 (Kingman’s subadditive ergodic theorem) Let ξm ,n , m > n ∈ Z be an integrable random process on ( , F, P) such that ξm ,m + p = ξ0, p ◦θ m , ∀m ∈ Z , ∀ p > 0 (stationarity) , and ξm ,n ≤ ξm , p + ξ p ,n , ∀m < p < n (subadditivity) . Assume in addition that there exists a positive constant A such that E[ξ0, p ] ≥ − Ap, for all p > 0. Then there exists a constant γ such that the following two equations hold: ξ0 , p E [ξ 0 , p ] lim = γ a.s., lim =γ . p →∞ p p →∞ p For the proof, see [75], [76]. Theorem 7.107 (Borovkov’s renovating events theorem) Let {u (k )} be a θ -stationary Rn -valued sequence of random variables defined on ( , F, P). Let {x (k )} be the R K -valued sequence of random variables defined by the recurrence relation x (k + 1) = a (x (k ), u (k )) , k≥0 , (7.134) where a is a continuous mapping R K × Rn → R K , and by the random initial condition x (0). The event A(k ) ∈ F is said to be a renovating event of length m ≥ 1 and of associated function φ : Rm n → R K if, on A(k ), the relation x (k + m ) = φ (u (k ), . . . , u (k + m − 1)) holds. 
If the random process x (k ) admits a sequence {A(k )} of renovating events, all of length m and associated function φ , such that A(k ) = A(0)◦θ k , ∀k ≥ 0, and P[A(0)] > 0, then, the sequence {x (k )◦ θ −k } converges a.s. to a finite random variable z, which does not depend upon the initial condition x (0). The sequence {z ◦θ k } is a finite solution of (7.134), and the sequence {x (k )} couples with it in finite time for all finite initial conditions. 370 Synchronization and Linearity For the proof, see [26], [6]. For any matrix A, let | A| denote its operator norm, namely | A| = sup x =1 Ax , where x denotes the Euclidean norm of vector x . Theorem 7.108 (Oseledec ’s multiplicative ergodic theorem) Let { A(k )} be a sequence ¸ of n × n random matrices with nonnegative entries, defined on the probability space ( , F, P). Assume that A(k ) = A(0)◦ θ k , for all k ∈ Z, and that E max (log(| A(0)|), 0) < ∞ . Then there exists a constant γ (the maximal Lyapunov exponent of the sequence) such that 1 lim log (| A(k ) . . . A(1)|) = γ , a .s . k →∞ k In addition, there exists a random eigenspace V (ω) of dimension d constant, d ≤ n, such that A(1)V = V ◦θ and such that for all random vectors x in V , lim k →∞ 1 log ( A(k ) . . . A(1)x ) = γ , k a .s . Whenever d = 1, there exists an eigenpair {λ, X } such that A(0) X = λ X ◦θ and E [λ ] = γ . In fact, Oseledec’s Theorem gives the existence of other eigenvalues as well. Our state¸ ment of this theorem is limited to the maximal eigenvalue and its associated eigenspace (see [106], [45]). 7.8 Notes The preliminary example of §7.2 was first analyzed by R.M. Loynes in 1962 [87]. The probabilistic formalism introduced in §7.2 is that developed for queues by P. Br´ maud and one of e the coauthors in [6]. The existence of Lyapunov exponents for products of random matrices of Rmax was first proved by J.E. Cohen, in the case of matrices with non ε entries, in Subadditivity, generalized product of random matrices and operations research, SIAM Review, volume 30, number 1, pages 69–86, 1988. The extension of this result to reducible matrices, and the sections on the relationship between stochastic event graphs and Rmax -multiplicative ergodic theory (§7.3–7.6) are mainly based on [11], [13]. As to the writing of this book, this approach provides a more or less systematic way for analyzing nonautonomous systems. The situation is somewhat less satisfactory in the autonomous case: in particular, only the case when the eigenspace associated with the maximal exponent has dimension 1 was considered. This practically covers cases with ‘sufficiently random’ entries of A, as shown by the results of §7.5; however, we know from the analysis of Chapter 3 that an eigenspace of dimension 1 is rarely sufficient to handle the case of deterministic systems. Autonomous deterministic systems can fortunately be addressed via the spectral methods of Chapter 3. However, some systems are neither deterministic nor random 7.8. Notes 371 enough to satisfy the conditions of §7.4. Filling up this ‘theoretical gap’ between purely deterministic and sufficiently random systems is clearly tantamount to understanding the structure of the eigenspace associated with the maximal exponent when this eigenspace is of dimension greater than 1. 372 Synchronization and Linearity Chapter 8 Computational Issues in Stochastic Event Graphs 8.1 Introduction This chapter gathers miscellaneous results pertaining to the computation of the cycle times and the stationary regimes of stochastic event graphs. 
The existence and uniqueness of these two quantities are discussed in Chapter 7: the cycle time of a stochastic event graph is the maximal Lyapunov exponent associated with the matrices A(k ) of its standard equation, and stationary regimes correspond to stochastic eigenpairs of A. Section 8.2 focuses on monotonicity properties of daters and counters considered as functions of the data (e.g. firing and holding times, initial marking, topology of the graph, etc.). These results lead to the derivation of a lower bound for the cycle time, which is based on the results of Chapter 3 concerning the deterministic case. It is also shown that the throughput is a concave function of the initial marking, provided that the firing and holding times satisfy appropriate statistical properties. Section 8.3 is concerned with the relationship between stochastic event graphs and a class of age-dependent branching processes. Large deviation techniques are used to provide an estimate for the cycle time, which is also shown to be an upper bound. The last section contains miscellaneous computational results which can be obtained in the Markovian case. Whenever the firing and the holding times have discrete distribution functions with finite support, simple sufficient conditions for the ratio process to have a finite state space Markov chain structure are given. In the continuous and infinite support case, partial results on functional equations satisfied by the stationary distribution functions are provided. These results are then used for computing the distribution of the stationary regime. The sections of this chapter can be read (almost) independently. Each section has its own prerequisites: basic properties of stochastic orders for §8.2 (see [123]); notions of branching processes and of large deviations ([3]) in §8.3; elementary Markov chain theory in §8.4. Throughout the whole chapter, the scalar dioid of reference is Rmax , unless otherwise specified. 373 374 Synchronization and Linearity 8.2 Monotonicity Properties 8.2.1 Notation for Stochastic Ordering Let x and x † be Rn -valued random variables. Three classical stochastic ordering relations between x and x † will be considered in this section. Notation 8.1 Stochastic ordering ≤st : x ≤st x † if E[ f (x )] ≤ E[ f (x † )], for all nondecreasing functions f : Rn → R. Convex ordering ≤cx : x ≤cx x † if E[ f (x )] ≤ E[ f (x † )], for all convex functions f : Rn → R . Increasing convex ordering ≤icx : x ≤icx x † if E[ f (x )] ≤ E[ f (x † )], for all convex and nondecreasing functions f : Rn → R. Let x = {x (1), . . . , x (k ), . . . } (respectively x (·) = x (t ), t ∈ R+ ) and x † = {x † (1), . . . , x † (k ), . . . } (respectively x † (·) = {x † (t )}t ∈R+ ) be two Rn -valued stochastic sequences (respectively processes) defined on the probability space ( , F, P). The sequence x † is said to dominate x (respectively the process x † (·) dominates x (·)) for one of the above ordering relations, say ≤st , which is denoted x ≤st x † (respectively x (·) ≤st x † (·)), if all corresponding finite dimensional distributions compare for this ordering. For basic properties of these orderings, see §8.5. 8.2.2 Monotonicity Table for Stochastic Event Graphs The basic model of this section is a live autonomous event graph, where all transitions are assumed to be recycled. The nonautonomous case leads to similar results and will not be considered in this chapter. The notation and basic definitions concerning stochastic event graphs are those of Chapter 2 and Chapter 7. 
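A one-line numerical illustration of the convex ordering introduced in §8.2.1 may help fix ideas: a deterministic variable equal to the mean is a ≤cx lower bound of any integrable variable (Jensen's inequality), which is the comparison underlying Remark 8.4 below. The exponential law and the three convex test functions used in the sketch are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(6)
samples = rng.exponential(1.0, 1_000_000)    # x_dag ~ Exp(1), so E[x_dag] = 1
x_det = 1.0                                  # x identically equal to E[x_dag]

# for every convex f, f(E[x_dag]) = E[f(x)] <= E[f(x_dag)], i.e. x <=cx x_dag
tests = [("max(t-1, 0)", lambda t: np.maximum(t - 1.0, 0.0)),
         ("t**2",        lambda t: t ** 2),
         ("exp(t/2)",    lambda t: np.exp(t / 2.0))]
for name, f in tests:
    print(f"f = {name:<12}  E[f(x)] = {float(f(x_det)):7.4f}   "
          f"E[f(x_dag)] ~ {float(f(samples).mean()):7.4f}")
```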
The following concise notation will be used: Data Firing times: β(k ) denotes the vector β j (k ), j = 1, . . . , |Q|, and β the sequence {β(k )}. Holding times: α(k ) denotes the vector αi (k ), i = 1, . . . , |P |, and α the sequence {α(k )}. Timing sequence: η(k ) denotes the vector (β(k ), α(k )) and η the sequence {η(k )}. Initial marking: µ denotes the vector µi , i = 1 . . . , |P |. 8.2. Monotonicity Properties 375 State Variables With each dater sequence {x j (k )}k∈N we associate the N-valued counter function x j (t ), t ∈ R+ , defined by the relation x j (t ) = sup{k | x j (k ) ≤ t } . Daters: x (k ) denotes the vector x j (k ), j = 1, . . . |Q|, and x the sequence {x (k )}. Counters: x (t ) denotes the vector x j (t ), j = 1, . . . |Q|, t ∈ R, and x (·) the function x (t ). The aim of the following sections is to prove various monotonicity properties of the state variables and of their asymptotic characteristics considered as functions of the data. By asymptotic characteristics, we mean the cycle time a of the event graph and its (conventional) inverse τ , the throughput. A typical question is as follows: if one replaces one of the data, say µ or β , by µ† or β † respectively, where the new data are greater than the initial ones for some partial ordering, what result do we obtain on the various state variables? The main properties along these lines are summarized in Table 8.1. The reader should refer to the following subsections in order to obtain the specific assumptions under which the reported monotonicity properties hold. These assumptions are not always the most general ones under which these properties hold. For instance, we have tried to avoid the intricate issues associated with the initial conditions by choosing assumptions leading to short proofs, although most of the properties of the table extend to more general initial conditions. Table 8.1: Monotonicity of the state variables Data Variation Daters of data Counters Throughput time † a ≥ a† x (·) ≤ x † (·) τ ≤ τ† x ≤ x† a ≤ a† x (·) ≥ x † (·) τ ≥ τ† η ≤st η † x ≤st x † a ≤ a† x (·) ≥st x † (·) τ ≥ τ† η ≤cx η † x ≤icx x † a ≤ a† τ ≥ τ† η ≤icx η † µ † Cycle x ≤icx x † a ≤ a† τ ≥ τ† µ≤µ x ≥x P ⊂ P† G Q ⊂ Q† E ⊂ E† η 8.2.3 Properties of Daters 8.2.3.1 Stochastic Monotonicity Monotonicity with respect to the Timing Sequence In this paragraph, we assume that the entrance times are all equal to e (see Remark 2.75). Let η † be another timing sequence associated with the same event graph (i.e. the topology and the initial 376 Synchronization and Linearity marking of the event graph are unchanged and each timing variable is replaced by the corresponding dagger variable) and let x † be the resulting daters. We first compare the sequences x † and x , whenever the timing sequences can be compared for some integral ordering. Theorem 8.2 If η ≤st η † , then x ≤st x † . Proof The initial condition x (0) is untouched by the transformation of the variables, in view of the definition given in Remark 2.75. The matrices A(k ) in (2.31) are nondecreasing functions of the variables α, β (see Equations (2.15), (2.28) and the definition of A(k ) ). Therefore, from (8.40), we obtain that {x (0), A (0), A (1), . . . } ≤st {x † (0), A† (0), A† (1), . . . } . We now use the canonical representation (2.31) to represent the evolution equations of interest as recursions of the type (8.43), where the mapping a is coordinatewise nondecreasing and such that the sequences defined in (8.44) satisfy the relation {ξ(k )} ≤st {ξ † (k )}. 
The proof is then concluded from Theorem 8.60. In the next theorem, the timing variables are assumed to be integrable. Theorem 8.3 If η ≤icx η † , then x ≤icx x † . Proof The entries of matrices A (k ), k ≥ 1, are nondecreasing and convex functions of the variables α, β . So, the assumption η ≤icx η † and (8.41) imply that {x (0), A (0), A (1), . . . } ≤icx {x † (0), A† (0), A† (1), . . . } . Since the mapping a of the preceding proof is nondecreasing and convex, the result immediately follows from Theorem 8.62. Remark 8.4 Assume that the holding and firing times are all integrable. Then, it follows from (8.42) that the ‘deterministic version’ of the event graph , with firing times β j (k ) = E β j (k ) and holding times α i (k ) = E [αi (k )], leads to a sequence of daters { x (k )} which is a lower bound of {x (k )} in the ≤icx sense. In particular, since the daters are integrable under these assumptions, it follows from Lemma 8.59 that for all k ≥ 1 and q j ∈ Q, E x j (k ) ≥ x j (k ). Example 8.5 For example, by applying Theorem 8.3 to the cyclic queuing network with finite buffers of Example 7.92, one obtains that the departure times from the queues are ≤icx -nondecreasing functions of the service times. Monotonicity with respect to the Initial Marking Here, the discussion will be limited to the case when all initial lag times are equal to e (which is a compatible initial condition). It will be convenient to use (2.19) as the basic evolution equation. Under our assumption on the initial lag times, this equation reads x j (k ) = βπ p (i ) (k − µi )αi (k )x π p (i ) (k − µi ) ⊕ e , {i ∈π q ( j )|k >µi } (8.1) 8.2. Monotonicity Properties 377 for k ≥ 1, where x j (k ) = ε if k ≤ 0. Consider the same event graph as above, but with † the initial marking µi , pi ∈ P , in place of µi , and let {x † (k )} denote the corresponding dater sequence. Theorem 8.6 Under the assumption that all the initial lag times are equal to e, if † µi ≤ µi , ∀ pi ∈ P , then the coordinatewise inequality x ≥ x † holds. Proof Owing to the assumption that all transitions are recycled and to the preceding convention concerning the continuation of x (k ) for k ≤ 0, it is easy to check that x j (k ) ≥ β j (k − 1)x j (k − 1) , ∀ j = 1 , . . . , n, k∈Z . Therefore x j (k ) ≥ x j (k − 1) , ∀ j = 1 , . . . , n, k∈Z , and β j (k )x j (k ) ≥ β j (k − 1)x j (k − 1) , ∀ j = 1 , . . . , n, k∈Z . (8.2) † We prove that x j (k ) ≤ x j (k ) for all j = 1, . . . , n, and k ≥ 1. The proof is by induction on ( j , k ). Since the event graph is assumed to be live, the numbering of the transitions can be chosen in such a way that for all ( j , k ), j = 1, . . . , n, k ≥ 1, the variables x i (l ) which are found at the right-hand side of (8.1) are always such that either l < k or l = k , but i < j . Therefore, there exists a way of numbering the transitions such that the daters x j (k ) can be computed recursively in the order x 1 (1), x 2 (1), . . . , x n (1), x 1 (2), . . . , x n (2), . . . , x 1 (k ), . . . , x n (k ), . . . . Assume that the property holds up to ( j , k ) excluding this point (it holds for (1, 1) † since with our assumptions, we necessarily have x 1 (1) = x 1 (1) = e). 
Then, we have † x j (k ) † = † † βπ p (i ) (k − µi )αi (k )x π p (i ) (k − µi ) ⊕ e † {i ∈π q ( j )|k >µi } † βπ p (i ) (k − µi )αi (k )x π p (i ) (k − µi ) ⊕ e ≤ † {i ∈π q ( j )|k >µi } βπ p (i ) (k − µi )αi (k )x π p (i ) (k − µi ) ⊕ e ≤ † {i ∈π q ( j )|k >µi } βπ p (i ) (k − µi )αi (k )x π p (i ) (k − µi ) ⊕ e ≤ {i ∈π q ( j )|k >µi } = x j (k ) , where we successively used the monotonicity property (8.2), the induction assumption, and finally the fact that we sum up with respect to a larger set. Example 8.7 As an application of this result, one obtains the monotonicity of departure times in closed cyclic networks with blocking, as a function of the population and the buffer sizes. 378 Synchronization and Linearity Stochastic Monotonicity with respect to the Topology Consider two event graphs with associated graphs = ((P ∪ Q), E ) and † = ((P † ∪ Q† ), E † ), where P ⊆ P† , Q ⊆ Q† , E ⊆ E† . (8.3) The event graph † is such that the initial marking, the initial lag times, the firing and holding times of those places and transitions which belong both to and † are the same. Let x † denote the sequence of daters of † , and [x † ] the restriction of x † to the set of transitions which belong to Q and Q † . Theorem 8.8 Under the foregoing assumptions, x ≤ x † coordinatewise. Proof The proof is based on Equation (2.19) and is again by induction on ( j , k ). † Assume that the property x i (l ) ≤ x i (l ) holds up to ( j , k ) excluding this point. The point (1, 1) is not necessarily the first one to compute in the total order associated with † , but we are sure that all the places present in preceding q1 are present in † too, and with the same number of initial tokens and the same lag times, so that the property necessarily holds for (1,1). Then, denoting π † the predecessor function in †, we obtain the following relation for all q j ∈ Q: † x j (k ) † = † † † † βπ †, p (i ) (k − µi )αi (k )x π †, p (i ) (k − µi ) † {i ∈π †,q ( j )|k >µi } † wi (k ) ⊕ † {i ∈π †,q ( j )|k ≤µi } † † † † † βπ p (i ) (k − µi )αi (k )x π p (i ) (k − µi ) ≥ † {i ∈π q ( j )|k >µi } † wi (k ) ⊕ † {i ∈π q ( j )|k ≤µi } † βπ p (i ) (k − µi )αi (k )x π p (i ) (k − µi ) = {i ∈π q ( j )|k >µi } ⊕ wi (k ) {i ∈π q ( j )|k ≤µi } ≥ βπ p (i ) (k − µi )αi (k )x π p (i ) (k − µi ) {i ∈π q ( j )|k >µ i} ⊕ wi (k ) {i ∈π q ( j )|k ≤µi } = x j (k ) , where we successively used the assumptions (8.3), the assumption that the initial condition and the firing and holding times of the nodes of ∩ † are the same, and finally the induction assumption. 8.2. Monotonicity Properties 8.2.4 379 Properties of Counters In this subsection we assume that tokens incur no holding times in places (so that the only timing variables come from the firing times of the transitions) and that there is at most one place between two subsequent transitions qi and q j , which will be denoted ( j , i ), with initial marking µ( j , i ) ∈ N. It will also be assumed that the initial lag times are all equal to e (which leads to a compatible initial condition). Under these assumptions, the evolution equation (2.19) in Chapter 2 reads x j (k ) = βi (k − µ( j , i )) x i (k − µ( j , i )) ⊕ e , k≥1 , i ∈ p( j ) def with initial condition x j (k ) = ε for all k ≤ 0. In this relation, p ( j ) = π p (π q ( j )). 8.2.4.1 Evolution Equations Let x j (t ) (respectively y j (t )) denote the number of firings which transition j initiated (respectively completed) by time t , t ∈ R+ . Without loss of generality, we assume that both x j (t ) and y j (t ) are right continuous. 
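The correspondence between a dater sequence and its counter function, made precise in Remark 8.9 below, can be checked on a toy sequence; the firing epochs used in the sketch are arbitrary.

```python
import numpy as np

# an illustrative nondecreasing dater sequence x_j(1), x_j(2), ... (firing epochs)
daters = np.array([0.0, 1.3, 2.1, 2.1, 4.0, 6.5])

def counter(t):
    """x_j(t) = sup{k | x_j(k) <= t}: number of firings initiated by time t."""
    return int(np.searchsorted(daters, t, side="right"))

def dater(k):
    """x_j(k) = inf{t | x_j(t) >= k}: epoch of the k-th firing, k = 1, 2, ..."""
    return daters[k - 1]

for t in (0.0, 2.0, 2.1, 5.0):
    print(f"x_j({t}) = {counter(t)}")
for k in range(1, len(daters) + 1):
    assert counter(dater(k)) >= k          # residuation pair (8.4)-(8.5)
print("counter/dater residuation checks pass")
```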
Remark 8.9 The mappings x j (k ) : N → R and x j (t ) : R → N are related by the formulae x j (t ) = sup{k | x j (k ) ≤ t } , (8.4) x j (k ) = inf t | x j (t ) ≥ k (8.5) . When using the definitions of §4.4, we see that the isotone and l.s.c. mapping x j (k ) : N → R admits the u.s.c. mapping x j (t ) : R → N as its residual; similarly, the isotone and u.s.c. mapping x j (t ) : R → N admits the l.s.c. mapping x j (k ) : N → R as its dual residual. In this subsection it is assumed that the firing times are all strictly positive. Theorem 8.10 The random variables x j (t ) and y j (t ), 1 ≤ j ≤ n, t ≥ 0, satisfy the following evolution equations: x j (t ) = y j (t ) min ( yi (t ) + µ( j , i )) , = i ∈ p( j ) t 1{β j (x j (u))≤t −u} x j (du ) , (8.6) (8.7) 0 where, for all j = 1, . . . , n, y j (0) = 0 and x j (t ) = 0, ∀t < 0. Proof By time t , transition j initiated exactly as many firings as the minimum over i ∈ p( j ) of the number of tokens which entered place ( j , i ) by time t (including the initial tokens). Since a place is preceded by exactly one transition, the number of tokens which entered place ( j , i ) by t equals µ( j , i ) plus the number of firings which 380 Synchronization and Linearity transition i completed by time t . The Stieljes integral in (8.7) is a compact way of writing the sum ∞ 1 { x j ( k ) + β j ( k ) ≤t } . k =0 In the deterministic case, (8.7) takes the simpler form y j (t ) = x j (t − β j ), which leads back to the following familiar result. Corollary 8.11 If the firing times are deterministic (i.e. β j (k ) = β j ), then (8.6) and (8.7) reduce to the Rmax equation x j (t ) = µ( j , i )x i (t − βi ) , (8.8) i ∈ p( j ) where x i (t ) = e for t < 0. 8.2.4.2 Stochastic Monotonicity From (8.4), one obtains that for any fixed n -tuple t1 < . . . < tn in R+ , the vector (x (t1 ), . . . , x (tn )) is a nonincreasing function of x . Therefore, each ≤st -monotonicity property of the sequence x with respect to ≤st yields a dual stochastic monotonicity property of x (·) (see Table 8.1). 8.2.4.3 Concavity with respect to the Initial Marking Throughout this subsection it is assumed that the sequences {β j (k )}k are mutually independent in j . Theorem 8.12 If the random variables β j (k ) are i.i.d., with exponential distribution of parameter λ j , then, for any t ≥ 0, and any 1 ≤ j ≤ n, x j (t ) and y j (t ) are stochastically increasing and concave (see Definition 8.63) in the initial marking µ ∈ N|E | . Proof Let {b j (n )}∞ 1 , 1 ≤ j ≤ n, be mutually independent sequences of i.i.d. n= random variables where b j (n ) is exponentially distributed with parameter λ j . Let t0 , t1, t2 , . . . , tn , . . . be the times defined by t0 = 0 and tn = tn−1 + min b j (n ) , 1 ≤ j ≤n n≥1 , and let χ j (n ) be the indicator function χ j (n ) = 1{tn =tn−1 +b j (n)} . Let † be an event graph with the same topology and initial marking as , and with the following dynamics (which differs from the dynamics defined in Chapter 2): in † , for + + all transitions j enabled at time tn , the residual firing time of j at time tn (namely the time which elapses between tn and the completion of the ongoing firing of transition j ) is resampled and taken equal to b j (n + 1). 8.2. Monotonicity Properties 381 • If χ j (n + 1) = 1 for a transition j which belongs to the set of transitions enabled − + at time tn in † , then transition j is fired at time tn+1 , which defines a new set of + enabled transitions at time tn+1 by the usual token production and consumption rule. 
+ • If χ j (n + 1) = 1 for a transition j which is not enabled at time tn in happens as far as the marking process is concerned. † , nothing + For each transition j which is enabled at tn+1 (either still ongoing or newly enabled), one resamples a new residual firing time equal to b j (n + 2), etc. † For † defined above, it is easily checked that the variables x j (t ) (respectively † y j (t )) representing the number of firings of transition j initiated (respectively completed) by time t , satisfy the following equations: † = 0, † = y j (tn ) = Y j (n ) , = def † x j (tn ) = y j (0) y j (t ) † x j (t ) † def X j (n ) , t n ≤ t < t n +1 , t n ≤ t < t n +1 , (8.9) and X j (n ) = (Yi (n )µ(i, j )) , (8.10) (Yi (n )χi (n + 1)) ∧ X i (n ) , (8.11) i ∈ p( j ) Yi (n + 1) = for all n = 0, 1, 2, . . . . Equation (8.10) is obtained in the same way as (8.6). In order to obtain Equation (8.11) observe that Yi (n + 1) ≤ Yi (n ) + χi (n + 1) + (equality holds if i is enabled at time tn ), and + i enabled at tn + i not enabled at tn ⇔ ⇔ Yi (n ) = X i (n ) − 1 ; Yi (n ) = X i (n ) (because of the recycling of transition i , there is at most one uncompleted firing initiated on this transition). Equation (8.11) is obtained from the following observation: + either i is not enabled at tn , and the smaller term in the right-hand side of (8.11) is X i (n ), or i is enabled and Yi (n + 1) = Yi (n ) + χi (n + 1). It is now immediate to prove by induction that, for all realizations of the random variables b j (k ), the state variables X j (n ) and Y j (n ), 1 ≤ j ≤ n, n ≥ 0, are nonde† † creasing and concave functions of µ. The variables x j (t ) and y j (t ) satisfy the same property in view of (8.9) and of the fact that the variables b (n ) do not depend upon µ. Thus, if µ and µ are initial markings such that ν = ρµ + (1 − ρ)µ is in N|P | for some real parameter ρ ∈ (0, 1), we have, with obvious notations, † † † x j (t ; ν) ≥ ρ x j (t ; µ) + (1 − ρ)x j (t ; µ ) , 382 Synchronization and Linearity for all ω, t and j , with a similar result for y . † Owing to the memoryless property of the exponential distribution, the counters x j (t ) † and y j (t ), 1 ≤ j ≤ n, are equal in distribution to x j (t ) and y j (t ), respectively. Therefore, under appropriate integrability assumptions, E x j (t ; ν) ≥ ρ E x j (t ; µ) + (1 − ρ)E x j (t ; µ ) , so that x j (t ) (and y j (t )) are stochastically increasing and concave in µ indeed. We now define the class of PERT-exponential distribution functions, which will allow us to generalize Theorem 8.12. Definition 8.13 (Stochastic PERT graph) A stochastic PERT graph is a connected, directed, acyclic and weighted graph with a single source node and a single sink node, where the weights are random variables associated with nodes. There is no loss of generality in the assumption that only nodes are weighted; one can equivalently weight arcs or both arcs and nodes. In any weighted directed acyclic graph, the path with maximal weight is called the critical path. Definition 8.14 (PERT-exponential distribution function) The distribution function of a random variable X is of PERT-exponential type if X can be expressed as the weight of the critical path of a stochastic PERT graph G where the weights of the nodes are mutually independent random variables with exponential distribution functions. Notation 8.15 Such a distribution function will be denoted F (G , λ), where G is the underlying graph and λ = (λ1 , . . . 
, λ|G | ), where λi is the parameter of the exponential distribution associated with node i in G (we will assume that the source and sink nodes are numbered 1 and |G | respectively). Definition 8.16 (Log-concave functions) A function f : Rn → R+ is log-concave if for all x , y ∈ Rn and 0 < ρ < 1, the inequality f (ρ x + (1 − ρ) y ) ≥ f ρ (x ) f (1−ρ)( y ) holds. Theorem 8.17 PERT-exponential distribution functions are log-concave. For a proof, see [10]. Theorem 8.18 If the firing times of a stochastic event graph are all mutually independent, and if for all transitions j , 1 ≤ j ≤ n, the firing times β j (k ) are i.i.d. random variables with PERT-exponential distribution function, then, for all t ≥ 0, and all 1 ≤ j ≤ n, x j (t ) and y j (t ) are stochastically increasing and concave in the initial marking µ. Proof Let F (G j , λ j ) be the distribution function associated with transition q j of j def . Let n = |G |. For all j , consider the stochastic event graph j defined from the PERT graph G j j as follows: with each node i of G j , we associate a transition qi in j ; similarly, to j 8.2. Monotonicity Properties 383 each arc in G j corresponds an arc in j and a place on this arc. The initial marking and j the holding times of each of these places are zero. The firing times of transition qi are j i.i.d. random variables with exponential distribution function of parameter λi . We now construct a stochastic event graph † which is defined from and j , j = 1, . . . , n, as follows: for all j , 1 ≤ j ≤ n, we ‘replace’ transition q j in by the event graph j ; all the places of are kept in † together with their initial marking; we j j take π † (q1 ) equal to the set π(q j ) and σ † (qn j ) equal to σ (q j ), for all j = 1, . . . , n; j j finally, we add a new feedback arc from transition qn j to transition q1 , for all j ; the place on this arc is assigned one token in its initial marking and zero holding and lag times. This transformation is depicted in Figure 8.1. If the number of firings j † j q1 qj j q1 j qn j j qn j Figure 8.1: Transformation of an event graph with PERT-exponential firing times j of transition q1 initiated (respectively completed) by time t in (respectively † y j ,1(t )), † † † is denoted x j ,1(t ) then an immediate coupling argument shows that x j (t ) =st x j ,1(t ) , † y j (t ) =st y j ,1(t ) , ∀t ≥ 0 , ∀1 ≤ j ≤ n , where the symbol =st denotes equivalence in law. Let µ† denote the initial marking of †. Applying Theorem 8.12 to † implies that for all t ≥ 0, and all 1 ≤ j ≤ n, x †,1(t ) and y †,1(t ) are stochastically increasing j j † and concave in the initial marking µ† ∈ N|Q | . Consequently, x j (t ) and y j (t ) are stochastically increasing and concave in the initial marking µ ∈ N|Q| , t ≥ 0, 1 ≤ j ≤ n. Remark 8.19 It is easy to see that PERT-exponential distribution functions include Erlang distribution functions as a special case. Therefore one can approximate step functions with PERT-exponential distributions. Theorem 8.18 can be shown to hold when some of the firing times are deterministic, by using some adequate limiting argument. In the particular case when all firing times are deterministic and integer-valued, one can also prove the concavity of the counters by an immediate induction argument based on Formula (8.8). 8.2.5 Properties of Cycle Times Throughout this subsection we suppose that the sequences of holding and firing times satisfy the joint stationary and ergodic assumptions of §7.3 and are integrable. 
We also 384 Synchronization and Linearity assume that the event graph under consideration is strongly connected and its cycle time (see §7.3.4) is denoted a, so that the following a.s. limits hold: lim (x (k ))1/ k k →∞ ⊕ = lim E |x (k )| ⊕ 1/ k k →∞ = a a.s., (8.12) provided that the initial lag times are integrable. The throughput of the event graph will def be denoted τ = a−1 . 8.2.5.1 First-Order Theorems for Counters Theorem 8.20 For a strongly connected stochastic event graph satisfying the foregoing assumptions, the following a.s. limits hold: lim |x (t )|1/ t = lim |x (t )|1/ t = lim (x j (t ))1/ t = τ ∧ ⊕ t →∞ t →∞ t →∞ a.s. (8.13) for all 1 ≤ j ≤ n. Proof For all k ≥ 1 and 1 ≤ j ≤ n, we have x j (t ) = k , for t in the interval x j (k ) ≤ t < x j (k + 1) , which implies that x j (k ) t x j (k + 1) ≤ < , k x j (t ) k for t in x j (k ) ≤ t < x j (k + 1) . When letting t or k go to ∞ in the last relation and when using (8.12), we obtain lim (x j (t ))1/ t = τ t →∞ a.s. The proofs for |x (t )|∧ and |x (k )| ⊕ follow immediately. Remark 8.21 In the particular case of a deterministic event graph, a is given by the formula |ζ |w a = max , ζ |ζ |t where ζ ranges over the set of circuits of , |ζ |w is the sum of the firing and holding times in circuit ζ and |ζ |t is the number of tokens in the initial marking of ζ (see (3.7)). The remainder of this section focuses on the derivation of monotonicity, convexity properties of cycle time and throughput. 8.2. Monotonicity Properties 8.2.5.2 385 Bounds on Cycle Times The stochastic comparison properties obtained in §8.2.3 admit the following corollaries. Corollary 8.22 If both sequences {α(k ), β(k )} and α † (k ), β † (k ) satisfy the above stationarity, ergodicity and integrability assumptions, and the assumptions of Theorem 8.2 (respectively Theorem 8.3), then the associated cycle times a and a† are such that a ≤ a† . Proof If the initial condition is integrable, the relation E |x (k )| ⊕ ≤ E x † (k ) ⊕ , holds for all k ≥ 1, as a direct consequence of Theorem 8.2 (respectively Theorem 8.3) because the function x → f (x ) = x is nondecreasing (and convex). Dividing this inequality by k and letting k go to ∞ yield the result in view of (8.12). Remark 8.23 The preceding result extends immediately to the non strongly connected case. The observation which was made in Remark 8.4 allows one to provide a general lower bound for the cycle time a as shown by the following corollary. Corollary 8.24 Under the assumptions of Corollary 7.31, the cycle time a of the stochastic event graph satisfies the bound a ≥ max ζ E [|ζ |w ] . |ζ |t where E [|ζ |w ] denotes the mathematical expectation of the sum of holding and firing times in the circuit ζ . The right-hand side of the last expression is also the cycle time of a deterministic event graph with the same topology as the initial one, but with the firing and holding times replaced by their mean values. Under the above statistical assumptions, one obtains the following corollary in the same way. Corollary 8.25 Under the assumptions of Theorem 8.6 (respectively Theorem 8.8) a ≥ a† (respectively a ≤ a† ), where a† is the cycle time of the event graph † which is considered in this theorem. 
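As a concrete companion to Remark 8.21 and to the lower bound of Corollary 8.24, the following sketch computes the cycle time of a deterministic event graph written in the canonical form x(k+1) = A ⊗ x(k) (one token per arc of the precedence graph of A), namely the maximum cycle mean of that graph, using Karp's algorithm. This is not taken from the book: the function name, the Karp implementation and the 2×2 test matrix (the first matrix of Example 8.45 below) are ours, and the precedence graph is assumed strongly connected.

```python
import numpy as np

NEG_INF = float("-inf")

def max_cycle_mean(A):
    """Maximum cycle mean of the precedence graph of A, by Karp's algorithm.

    A[i][j] is the weight of the arc from node j to node i (the convention of
    x_i(k+1) = max_j (A[i][j] + x_j(k))), with -inf standing for 'no arc'.
    The precedence graph is assumed to be strongly connected.
    """
    n = A.shape[0]
    # D[k][v] = maximal weight of a path with exactly k arcs from node 0 to node v.
    D = np.full((n + 1, n), NEG_INF)
    D[0, 0] = 0.0
    for k in range(1, n + 1):
        for v in range(n):
            D[k, v] = np.max(D[k - 1, :] + A[v, :])   # extend every path by one arc into v
    lam = NEG_INF
    for v in range(n):
        if D[n, v] == NEG_INF:
            continue
        lam = max(lam, min((D[n, v] - D[k, v]) / (n - k)
                           for k in range(n) if D[k, v] > NEG_INF))
    return lam

# First matrix of Example 8.45 below: its circuits have means 3, 4 and (7 + 2)/2 = 4.5.
A = np.array([[3.0, 7.0],
              [2.0, 4.0]])
print(max_cycle_mean(A))   # 4.5, hence the throughput is 1/4.5
```

Fed with the matrix of mean firing and holding times of a stochastic event graph, the same routine returns the lower bound on the cycle time given by Corollary 8.24.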
Example 8.26 For instance, the throughput in queuing networks with blocking, open or closed, is stochastically decreasing in the service times (as a consequence of the property of line 4 in the last column of Table 8.1), and increasing in the buffer sizes and in the customer population (line 1), regardless of the statistical assumptions. 386 Synchronization and Linearity 8.2.5.3 Concavity with respect to the Initial Marking The concavity properties established in §8.2.4 for counters together with relation (8.13) readily imply a related concavity property of the throughput with respect to the initial marking. Corollary 8.27 Under the assumptions of §8.2.4.3, if the firing times are mutually independent and if in addition the firing times {β j (k )} are i.i.d. with PERT-exponential distribution functions, then the throughput τ is increasing and concave in the initial marking µ ∈ N|E | . Example 8.28 For instance, the throughput in queuing networks with blocking and with PERT-exponential service times is stochastically decreasing in the service times, and increasing and concave in the buffer sizes or the customer population. 8.2.6 Comparison of Ratios Certain ratios of the state process x (k ), and hence the marking in the corresponding places, also exhibit interesting stochastic ordering properties. Roughly speaking, the places in question are those which do not belong to a strongly connected component of the event graph, which corresponds to the set of places with a marking which is not structurally bounded. The properties of interest are established through simple examples. 8.2.6.1 Assumptions We consider an event graph with several strongly connected subgraphs. We assume that this event graph is in its canonical form, namely all places have exactly one token in the initial marking (see Remark 2.77). Let Q(n) be the set of transitions corresponding to one of the subgraphs, where n is not a source node in the reduced graph. The evolution equation (7.47) of Chapter 7 reads x (n) (k + 1) = A(n)(n) (k )x (n) (k ) ⊕ A(n)(<n) (k )x (<n) (k ) , Let def δ(k ) = def U+ (k ) = k≥1 . x (n) (k + 1) , x (<n) (k ) x (<n) (k + 1) , x (<n) (k ) def U− (k ) = x (<n) (k ) . x (<n) (k + 1) Then, the ratio process δ(k ) satisfies the equation δ(k + 1) = A(k + 1)δ(k )U− (k ) ⊕ B (k + 1)U+ (k )U− (k ) , where A(k ) = A(n)(n) (k ) and B (k ) = A(n)(<n) (k ) (see §7.4.2). k ≥0 , (8.14) 8.2. Monotonicity Properties 8.2.6.2 387 Stochastic Ordering Results Let v(k ) = ( A(k + 1), B (k + 1), U− (k )) and y (k ) = δ(k ). Equation (8.14) can be rewritten as y (k + 1) = a ( y (k ), v(k )). Lemma 8.29 The mapping a (·) satisfies the assumptions of Theorem 8.61. Proof The nondecreasingness of y → a ( y , v) is obvious. As for the convexity of the mapping ( y , v) → a ( y , v), observe that a ( y (k ), v(k )) is given as the maximum of two functions, A(k + 1)δ(k )U− (k ) and B (k + 1)U+ (k )U− (k ), so that it is sufficient to prove that each of these functions has the desired convexity property, which is clear for the first one. In order to prove the convexity property for the second function, we rewrite the entries of U− (k )U+ (k ) as ◦ (U+ (k )U− (k )) i j = U− (k )1 j /U− (k )1i , so that the entries of the second function can be rewritten as ( B (k + 1)U+ (k )U− (k )) i j = ◦ Bil (k + 1)U− (k )1 j /U− (k )1l l = U− (k )1 j . 
◦ Bil (k + 1)/U− (k )1l l ◦ Since the mapping ( B (k + 1), U− (k )) → Bil (k + 1)/U− (k )1l is convex (it is linear in the conventional sense), each of these entries is the sum of two convex functions in the variables U− (k ), B (k + 1), which concludes the proof. As a direct consequence of Theorem 8.61, we obtain the following result. Corollary 8.30 If one replaces the sequence {δ(0), A(k ), B (k ), U− (k )} by a sequence † δ † (0), A† (k ), B † (k ), U− (k ) such that † {δ(0), A(k ), B (k ), U− (k )} ≥cx δ † (0), A† (k ), B † (k ), U− (k ) , then the resulting ratio sequence δ † is such that δ ≥icx δ † . Interesting applications of this property arise when the firing and holding times of the event graph are all mutually independent, so that the two sequences of random variables { A(k )} and { B (k ), U− (k )} of the preceding theorem are also mutually independent. For example, when applying the result of Corollary 8.30 to the sequences B † (k ) = E [ B (k )] , † U− (k ) = E U− (k ) , A† (k ) = A(k ) , we obtain {δ(k )} ≥icx δ † (k ) , (8.15) 388 Synchronization and Linearity provided the initial conditions which are chosen for both systems satisfy the assumption of the corollary. In particular, we obtain the relation † ◦ x j (k + 1)/ x i (k ) ≥icx δ j i (k ) , † for all q j ∈ Q(n) and qi ∈ π 2 (q j ) ∩ {Q \ Q(n) }. The random variable δ j i (k ) can be † † ◦ interpreted as the ratio x j (k + 1)/u i (k ) of a nonautonomous event graph † with the same topology as Q(n) , with a |Q(<n) |-dimensional input u † (k ) and with the evolution equation x † (k + 1) = A† (k )x † (k ) ⊕ B † (k )u † (k ) . The input process u † (k ) is determined by the relations † U− (k ) = u † (k ) + 1) u † (k and by the initial condition u † (0). This second event graph is ‘simpler’ than the previous one in that the influence of the predecessors of Q(n) is captured by the first moments of the variables U− (k ) and B (k ) only. Another example of application of Corollary 8.30 consists in choosing B † (k ) = E [ B (k )] , † U− (k ) = E U− (k ) , A† (k ) = E[ A(k )] . With such a definition, we always have v ≥cx v † , which leads to a comparison result between the ratio process of a stochastic event graph and that of a deterministic one (for which one can use the results of Chapter 3). The conditions under which a stationary solution of (8.14) and its †-counterpart exist, are given in Theorem 7.96. Let us assume that these conditions are satisfied for both systems, so that one can construct the stationary marking in the initial event graph and in † . One can then apply Little’s formula ([77]) and (8.15) to derive the following bound on the stationary marking Ni in a place pi = π(q j ), where q j ∈ Q(n) and π( pi ) ∈ Q(n) . / † E δ j ,π p (i ) E δ j ,π p (i ) E [ Ni ] = ≥ . a1 a1 In this relation, a1 is the cycle time of transitions q j and qπ p (i ) (these cycle times must coincide since the place is assumed to be stable). The real number E δ j ,π p (i ) represents the average time spent by a token in pi in the stationary regime (the time spent by the k -th token of pi in this place is x j (k ) − x π p (i ) (k − 1) indeed). Similarly, under the preceding independence assumptions, it is easily seen that the variables δ(k ) are stochastically increasing and convex in { A(n)(n) (k )}. 8.3 Event Graphs and Branching Processes This section focuses on the derivation of bounds and estimates for cycle times of strongly connected stochastic event graphs with i.i.d. firing times. 
We use association properties satisfied by partial sums of the firing times in order to prove that the 8.3. Event Graphs and Branching Processes 389 daters can be compared for ≤st with the last birth in a multitype branching process, the structure of which is determined from the characteristics of the event graph. Classical large deviation estimates are then used to compute the growth rate of this last birth epoch, following the method developed in [19]. This allows one to derive a computable upper bound for the cycle time, which is exemplified on tandem queuing networks with communication blocking. 8.3.1 Statistical Assumptions The assumptions are those of §8.2.4. In addition to this, we assume that the sequences {β j (k )}+∞ , j = 1, . . . , n, are mutually independent sequences of i.i.d. nonnegative k =1 and integrable random variables defined on a common probability space ( , F, P), and that the initial number of tokens in any place is at most 1 (this last assumption introduces no loss of generality, see Remark 2.77). We know from Chapter 2 that whenever the event graph under consideration is live, it is possible to rewrite its equation as x (k ) = A(k )x (k − 1) , k≥1 , (8.16) where matrix A(k ) is defined as follows: h −1 A j j (k ) = β j (k − 1) ⊗ {( j =i0 ,i1 ,i2 ... ,ih −1 ,ih = j )∈S ( j , j ,1)} βim (k ) , (8.17) m =1 with the usual convention if the set S ( j , j , 1) is empty (see Remark 2.69). It is assumed that the event graph under consideration is strongly connected, and that the initial condition x (0) is equal to e (since we are only interested in determining the cycle time, this last assumption introduces no loss of generality). The following theorem is based on the notion of association (see Definition 8.64). 8.3.2 Statistical Properties Lemma 8.31 Under the foregoing statistical assumptions, { Ai j (k ), x j (k ), i, j = 1, . . . , n, k ≥ 0} forms a set of associated random variables (see Definition 8.64). Proof The independence assumption on the firing times implies that the random variables { A(k )} are associated since they are obtained as increasing functions of associated random variables (see (8.17)). The result for x j (k ) follows immediately from (8.16) and Theorem 8.67. Lemma 8.32 For all j0 , j1, . . . , jh ∈ {1, . . . , n}, the random variables A jk+1 jk (k ), k = 0, . . . , h , are mutually independent. 390 Synchronization and Linearity Proof In view of (8.17), the random variables A jk+1 jk (k ) can all be written in the form A jk+1 jk (k ) = φk (βl1 (k ), . . . , βl pk (k ), βl1 (k − 1)) , for some indices l1 , . . . , l1 . Since the random variables {β j (k )} j ,k are mutually independent, it is enough to show that the arguments of the functions φk , k = 0, . . . , h , are disjoint to prove the property. The only situation where these sets of arguments could fail being disjoint is for two adjacent terms A jk+1 jk (k ) and A jk+2 jk+1 (k + 1) having one argument of the type β j (k ) in common. The only such argument in A jk+2 jk+1 (k + 1) is β jk+1 (k ). Assume that this is also an argument of A jk+1 jk (k ). Then, there exists a circuit crossing jk+1 and jk+1 with zero initial marking in all the places of the circuit, which contradicts the liveness assumption. 8.3.3 Simple Bounds on Cycle Times Let a be the cycle time of A(k ). Since we assumed strong connectedness, we have lim E x j (k ) k →∞ 1/ k = lim (x j (k ))1/ k = a a.s. , k →∞ ∀ j = 1, . . . , n . 
(8.18) Let • N be the maximal degree of the transitions which are followed by at least one place with a nonzero initial marking (the degree of a node is the number of arcs incident with this node). • b be a random variable which is a ≤st upper bound of each of the random variables Ai j (0), namely Ai j (0) ≤st b , ∀i, j = 1, . . . , n . • b (z ) be the Laplace transform1 E exp(zb ) , which is assumed to be finite in a neighborhood of z = 0. • M (x ) be the Cramer-Legendre transform of the distribution function of the random variable b, namely M (x ) = inf (log(b (z )) − zx ) . z∈R The present section is devoted to the proof of the following result. Theorem 8.33 Let γ = inf{x | x > E[b ], M (x ) + log( N ) < 0}. Under the foregoing assumptions, the cycle time of the event graph admits the upper bound a ≤ γ . We start with two preliminary lemmas. 1 This is not the usual Laplace transform which would read E exp(− zb ) . 8.3. Event Graphs and Branching Processes Lemma 8.34 For all 391 > 0, and for all j = 1, . . . , n, lim P x j (k ) <a+ k =1 , lim P x j (k ) <a− k =0 . k →∞ and k →∞ Proof The property follows immediately from the fact that a.s. convergence implies convergence in probability and from (8.18). Lemma 8.35 If c ∈ R is such that lim P x j (k ) − kc ≤ 0 = 1 , (8.19) k →∞ for some j = 1, . . . , n, then c ≥ a. Proof Under the assumption (8.19), lim P k →∞ x j (k ) ≤c =1 , k so that we cannot have c = a − for some c ≥ a. > 0, in view of Lemma 8.34. Therefore, Proof of Theorem 8.33 Under our assumptions, Equation (8.16) reads x (k + 1) = A(k ) ⊗ x (k ) with the initial condition x (0) = e and it is easily checked by induction that k −1 x j (k ) = A jh +1 , jh (h ) , (8.20) j0 ,... , jk−1 ∈{i,... ,n} h=0 where jk = j . Therefore, k −1 P x j (k ) − ck ≤ 0 = P C jh +1 , jh (h ) ≤ 0 , max j0 ,... , jk−1 ∈{i,... ,n} h =0 def where Ci j (k ) = Ai j (k ) − c. For k fixed, Lemma 8.31 implies that the variables k−1 C jh +1 , jh (h ), where j0 , . . . , jk−1 h =0 vary over the set {1, . . . , n}k , are associated. Therefore, from Lemma 8.66, k −1 k −1 P C jh +1 jh (h ) ≤ 0 ≥ max j0 ,... , jk−1 ∈{i,... ,n} h =0 P j0 ,... , jk−1 ∈{1,... ,n} C jh +1 , jh (h ) ≤ 0 . h =0 392 Synchronization and Linearity Since the random variables A jh +1 , jh (h ) are independent (see Lemma 8.32), and ≤st -bounded from above by b , we have k −1 k −1 C jh +1 , jh (h ) ≤ 0 ≥ P P h =0 (b (h ) − c) ≤ 0 , h =0 where {b (h )} is a sequence of i.i.d. random variables with the same distribution function as b . Now, Chernoff’s Theorem ([3]) implies k −1 b (h ) > ck = exp ( M (c)k + o(k )) , P h =0 for all c > E[b ], so that P x j (k ) − ck ≤ 0 ≥ (1 − exp( M (c)k + o(k )))C j (k) , where C j (k ) denotes the number of paths j0 , . . . , jk−1 which satisfy the property k −1 C jh +1 , jh (h ) = −∞ . h =0 Therefore, it is enough to have the limit C j (k ) exp (k M (c)) → 0 when k goes to ∞, in order to obtain lim P x j (k ) − ck ≤ 0 = 1 . k →∞ (8.21) Clearly, the bound C j (k ) ≤ N k holds, so that a sufficient condition for (8.21) to hold is M (c) + log( N ) < 0. In other words, for c > E[b ] such that M (c) + log N < 0, (8.21) holds, so that c ≥ a in view of Lemma 8.35. In fact, we proved the following and more general result. Corollary 8.36 If log(C j (k )) = Ck + o(k ), then a ≤ inf{c | M (c) + C < 0}. Example 8.37 (Blocking queues in tandem) Consider the example of Figure 8.2, which represents a line of processors with blocking before service, also called communication blocking in the exponential case. 
Let n denote the number of processors, each of which is represented by a transition. In Figure 8.2, n equals 4. The first processor (on the left of the figure) has an infinite buffer of items to serve. Between two successive processors, the buffer is of capacity one (which is captured by the fact that there are two tokens in any of the upper circuits originating from a processor). The processors are single servers with a FIFO discipline (which is captured by the lower 8.3. Event Graphs and Branching Processes 393 Figure 8.2: Communication blocking: 4 nodes, 1 buffer circuit associated with each transition). It is assumed that all transitions have exponentially distributed firing times with parameter 1. In this example, we have N = 3, b (z ) = (1 − z )−1 . The Cramer-Legendre transform of b (z ) is given by M (x ) = inf (−zx − log(1 − z )) . z∈[0,1) The derivative of the function −zx −log(1−z ) with respect to z vanishes for z = 1−x −1 and this point is a minimum. Therefore M (x ) = 1 − x + log(x ) . As a direct application of Theorem 8.33, we obtain a ≤ inf{x | 1 − x + log(x ) + log(3) < 0} , which provides the following uniform bound in n: a ≤ 3.33 . In other words, the throughput of the systems is always greater than 0.3, regardless of the number of processors. If we apply Theorem 8.36 using the following more precise estimate of C j (k ) 1 π log C j (k ) = 1 + 2 cos k n+1 + o(k ) ≤ 3 (8.22) (see below for its proof), we obtain a ≤ inf x 1 − x + log(x ) < log 1 + 2 cos π 5 3.09 . If the service times are Erlang-3 with mean 1, namely if b (z ) = (3/(3 − z ))3 , in the same way, from Theorem 8.33, we obtain that a ≤ inf{x | 3(x − 1 − log(x )) > log(3)} 2.11 , which corresponds to a throughput greater than 0.48. Proof of Formula (8.22) Let K denote the adjacency matrix of the precedence graph of A(0), namely the n × n matrix such that K i, j = 1 if i ∈ π( j ), and 0 otherwise. For 394 Synchronization and Linearity n = 4, we obtain for instance 1 1 K = 0 0 1 1 1 0 0 1 1 1 0 0 . 1 1 It is easily checked by induction that K ik, j counts the number of paths of length k from i to j . Let P be the substochastic matrix defined by P = K /3, and let λn denote the Perron-Frobenius eigenvalue associated with P . From the irreducibility of P , we obtain n C j (k ) = K ik, j = O(3λn )k . i =1 In order to evaluate λn , we introduce the Markov chain Z k with substochastic transition matrix P and uniform initial measure. We then have P [ Z k = 1] = O(λn )k . We can now evaluate P [ Z k = j ] using the recurrence relations P Z k +1 = j = P Z k +1 = 1 = P [ Z k = j − 1] /3 + P [ Z k = j ] /3 + P [ Z k = j + 1] /3 , 1< j <n , P [ Z k = 1] /3 + P [ Z k = 2] /3 , P Z k +1 = n = P [ Z k = n − 1] /3 + P [ Z k = n] /3 . Let ∞ (8.23) n P (x , y ) = xk y j P [ Zk = j ] . k =0 j =1 From (8.23), we obtain P (x , y ) = G ( y ) − F (x )(1 + y n+1 ) , 3 − x ( y + 1 + y −1 ) (8.24) where F (x ) is the function ∞ P [ Z k = 1] x k , F (x ) = x k =0 and G( y) = 1 + y + y2 + . . . + yn . The denominator of (8.24) vanishes for x = x ( y ) = 3/( y + 1 + y −1 ) ≤ 1. Therefore, we necessarily have G( y) , F (x ) = 1 + y n+1 8.3. Event Graphs and Branching Processes 395 for x = x ( y ). The poles of F (x ) are for y n+1 = −1, namely for y (l ) = exp i π + 2il π n+1 , We have x ( y (l )) = l = 0, . . . , n . 3 1 + 2 cos , π (1+2l ) n+1 the smallest of which is for l = 0. 
Therefore, from classical theorems on generating functions, P [ Z k = 1] = O(x ( y (0))−k ) , Or, equivalently, λn = 1 + 2 cos 3 which in turn implies C1 (k ) = 1 + 2 cos 8.3.4 π n+1 , . General Case 8.3.4.1 π n+1 Multitype Branching Processes The class of age-dependent multitype branching processes considered in this section is a special case of those considered in [19]. There are n types; the branching process is characterized by a family of integer-valued random processes Z kli (t ) ; j t ∈ R+ ; i , j = 1, . . . , n ; k , l = 1, 2, . . . , where • the n × n matrices Z kl (·) are i.i.d. for l , k = 1, 2, . . . ; • the variables Z kli (·) are mutually independent in i, j , and this for all l , k = j 1, 2, . . . . Index k refers to generation levels from an initial generation called 1, and index l is used to count individuals of a given type within a generation level. If the branching process is initiated by a single generation-1 individual of type j , born at time 0, this individual gives birth to a total of Z 11 (∞) generation-2 individuals of type i , one at ji each jump time of Z 11(·). Once a generation-k individual of type i is born, it is assigned ji an integer l , different from all the integers assigned to already born individuals of the same generation and type (for instance, the individuals of generation 2 and type i can kl be numbered 1, . . . , Z 11(∞)). Then, the random function Z ih (t ) is used to determine ji the number of generation-(k + 1) individuals of type h born from the latter individual in less than t ∈ R+ after its own birth. 396 Synchronization and Linearity Let T j(ik) (t ) ∈ N denote the total number of generation-k individuals of type i born by time t in a branching process initiated at time 0 from a single generation-1 individual of type j . Let F j i (t ) be the monotonic function defined by the relation F j i (t ) = E Z 11 (t ) ji t ∈ R+ , , (z ) be the n × n matrix with entries and let j i (z ) ∞ = exp(zt ) F j i (dt ) . 0 We assume that there exists a real neighborhood of 0 where the matrix (z ) is finite. Lemma 8.38 (Biggins) Under the above assumptions, ∞ E 0 where k exp(zt ) T j(ik) (dt ) = denotes the k-th power of k j i (z ) , (8.25) . Proof Let Fk denote the σ -field of the events up to the k -th generation. Owing to the independence assumptions, we obtain the vector relation ∞ E 0 (k + 1 ) exp(zt ) T j (k )(·) ∞ (dt ) Fk = 0 (k ) (k ) exp(zt ) T j (dt ) (z ) , (k ) where T j denotes the vector (T j 1 (·), . . . , T j n (·)). By taking expectations in the last expression, we obtain (8.25). 8.3.4.2 Comparison between Event Graphs and Branching Processes Consider now the following specific age-dependent branching process associated with the stochastic event graph under consideration: • there are as many types as there are transitions followed by at least one place with a nonzero initial marking, namely n; • the random vector Z 11 (t ) is defined through its probability law by the relation j Z 11 (t ) =st 1i ∈ p ( j ) 1 A j i (0)≤t , for all i, j ∈ {1, . . . , n}. This fully defines the probji ability law of the matrices Z kl (·) in view of the independence assumptions. Observe that, for this specific branching process, an individual of type j gives birth to at most one individual of type i , for all i, j . Let x j (k ) be the epoch of the latest birth of all generation-k individuals ever born in the above branching process, when this one is initiated by an individual of type j at time 0. 
Lemma 8.39 Under the foregoing statistical assumptions, for all j ∈ {1, . . . , n} and k ≥ 1, x j (k ) ≤st x j (k ) , provided x (0) = e. 8.3. Event Graphs and Branching Processes 397 Proof From the definition of x (k ), for all t ∈ R+ , k −1 P[x j (k ) ≤ t ] = P A jh +1 , jh (h ) ≤ t , j0 ,... , jk−1 ∈{1,... ,n} h=0 where jk = j . Therefore, the association property of Lemma 8.31 implies that k −1 P x j (k ) ≤ t ≥ P j0 ,... , jk−1 ∈{i,... ,n} A jh +1 , jh (h ) ≤ t h =0 k −1 = P j0 ,... , jk−1 ∈{i,... ,n} jh ∈ p ( jh +1 ) A jh +1 , jh (h ) ≤ t . (8.26) h =0 Now, from its very definition, the event {x j (k ) ≤ t } can be written as k −1 A jh +1 , jh (h ) ≤ t j0 ,... , jk−1 ∈{i,... ,n} jh ∈ p ( jh +1 ) , h =0 where the random variables A jh +1 , jh are all mutually independent, and where A jh +1 , jh has the same probability law as A jh +1 , jh . Since the random variables in the right-hand side of (8.26) are also mutually independent (see Lemma 8.32), the latter expression coincides with P x j (k ) ≤ t . 8.3.4.3 Upper Bounds for Cycle times Whenever the integrals defining the entries of (z ) converge, this matrix is positive; its Perron-Frobenius eigenvalue ([61]) is denoted φ(z ) . Let M (x ) be the CramerLegendre transform of φ(z ): M (x ) = inf (log(φ(z )) − zx ) . z>0 It is well known that M (x ) is decreasing for x ≥ 0 (see [3]). Let γ be defined by γ = inf{x | M (x ) < 0} . Theorem 8.40 Under the foregoing statistical assumptions, the cycle time a of the event graph is such that a≤γ . (8.27) Proof We first prove that lim sup k x j (k ) ≤γ k a.s. (8.28) 398 Synchronization and Linearity Let v(z ) be the right eigenvector associated with the maximal eigenvalue φ(z ). From (8.25), we obtain that ∞ E 0 exp(zt ) T j(k) (dt ) , v(z ) = φ k (z )v j (z ) , so that ∞ E 0 exp(zt ) T j(k) (dt ) , 1 ≤ φ k (z )v j (z )u (z ) , (8.29) where u (z ) = (mini vi (z ))−1 (v(z ) is strictly positive due to the Perron-Frobenius theorem). Now, since x j (k ) = sup{t | ∃i = 1, . . . , n, T j(ik) (t ) = 0}, we have ∞ E 0 ∞ n exp(zt ) T j(k) (dt ) , 1 = E 0 i =1 ≥ exp (zt ) T j(ik) (dt ) E exp(z x j (k )) . (8.30) In addition, for z ≥ 0, P x j (k ) ≥ c ≤ E exp z k x j (k ) −c k . This, plus (8.29) and (8.30), in turn imply lim k 1 log P x j (k ) ≥ kc ≤ inf(log(φ(z )) − zc) = M (c) . z>0 k Therefore, for all c such that M (c) < 0, k≥1 P x j (k ) ≥ kc < ∞, so that the BorelCantelli Lemma immediately implies (8.28). From Lemma 8.39, for all bounded and nondecreasing functions f , x j (k ) k Ef ≤E f x j (k ) k , for all j = 1, . . . , n. In view of (8.18), from the Lebesgue dominated convergence theorem we obtain that, for f bounded and continuous, x j (k ) k lim E f k = f (a) . In addition, for f continuous, monotonic, nondecreasing and bounded, we also have lim supE f k x j (k ) k ≤ E lim sup f k x j (k ) k =E f lim sup k x j (k ) k ≤ f (γ ) , where we successively used Fatou’s lemma, the monotonicity and continuity of f , and finally (8.28). Therefore f (a) ≤ f (γ ), for all nondecreasing, continuous and bounded f , which immediately implies (8.27). 8.3. Event Graphs and Branching Processes 399 Observe that in the particular case when all non −∞ entries of A(1) have the same distribution characterized by the function b (z ), the eigenvalue of interest is precisely φ(z ) = b (z )C , where C is the Perron-Frobenius eigenvalue of the adjacency matrix associated with the matrix A, namely the maximal eigenvalue of matrix (0). 
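Before turning to further examples, it may help to see these bounds evaluated numerically. The sketch below is ours, not part of the original text: it checks the Perron-Frobenius eigenvalue C = 1 + 2 cos(π/(n+1)) of the adjacency matrix K used in Formula (8.22), and then computes the upper bounds of Theorem 8.33, Corollary 8.36 and Theorem 8.40 for the tandem example (Example 8.37), with exponential and Erlang-3 firing times of mean 1, by bisection on the relevant decreasing function of x.

```python
import math
import numpy as np

def smallest_root(f, lo, hi, iters=200):
    """Bisection for a decreasing f with f(lo) > 0 > f(hi): returns inf{x : f(x) < 0}."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 4                                  # number of processors, as in Figure 8.2
K = np.zeros((n, n))                   # adjacency matrix of the precedence graph of A(0)
for i in range(n):
    K[i, i] = 1.0
    if i > 0:
        K[i, i - 1] = 1.0
    if i < n - 1:
        K[i, i + 1] = 1.0
C = max(np.linalg.eigvals(K).real)     # Perron-Frobenius eigenvalue of K
print(C, 1.0 + 2.0 * math.cos(math.pi / (n + 1)))     # both ~2.618, cf. Formula (8.22)

# Cramer-Legendre transform of an exponential firing time with mean 1: M(x) = 1 - x + log x.
M_exp = lambda x: 1.0 - x + math.log(x)
# Erlang-3 firing time with mean 1: b(z) = (3/(3 - z))^3, hence M(x) = 3 (1 - x + log x).
M_erl = lambda x: 3.0 * (1.0 - x + math.log(x))

# Theorem 8.33 with N = 3: the uniform-in-n bound of Example 8.37 (about 3.3).
print(smallest_root(lambda x: M_exp(x) + math.log(3.0), 1.0 + 1e-9, 50.0))
# Corollary 8.36 / Theorem 8.40 with phi(z) = b(z) C: bound ~3.09 for n = 4.
print(smallest_root(lambda x: M_exp(x) + math.log(C), 1.0 + 1e-9, 50.0))
# Erlang-3 firing times: bound ~2.11, i.e. a throughput greater than 0.48.
print(smallest_root(lambda x: M_erl(x) + math.log(3.0), 1.0 + 1e-9, 50.0))
```

Note that, because φ(z) = b(z)C here, the numerical bound of Theorem 8.40 coincides with that of Corollary 8.36.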
Example 8.41 (Blocking queues with transportation times) The example is that of the line of processors described previously, but with a deterministic transportation time between processors. The associated event graph is obtained from that of Figure 8.2 by replacing each buffer by two buffers connected by a transportation transition with deterministic firing times δ as shown in Figure 8.3. The statistical assumptions buffer buffer 1 transportation buffer 2 Figure 8.3: Blocking with transportation times concerning the firing times of transitions associated with processors are those of the previous example. In this example, we have 1 1 0 0 1− z 1− z 1 1 exp(δ z) 0 1− z 1− z 1− z (z ) = . exp(δ z) 1 1 0 1− z 1− z 1− z exp(δ z) 1 0 0 1− z 1− z The Perron-Frobenius eigenvalue of this matrix is φ(z ) = 1 1−z 1 + 2 exp δz 2 cos π n+1 (8.31) (the proof is similar to that of the previous case). The technique is then the same as above for deriving the upper bound γ . The lower bounds a+ given in the following arrays are those obtained by convex ordering following the method indicated in Theorem 8.24. For n = 4, one obtains the following array at the left-hand side. For n large, the lower bound is unchanged, and we obtain the following upper bound indicated in the array at the right-hand side. δ γ a+ 0 3.1 1 1 3.3 1.5 2 3.7 2 3 4.2 2.5 δ γ a+ 0 3.3 1 1 3.6 1.5 2 4.0 2 3 4.4 2.5 Remark 8.42 Better upper bounds can be derived when considering the constant γl , l ≥ 0, associated with the n-type, age-dependent branching process of probability law Z 11 (t ) =st 1{−∞<( A(0) A(1)...A(l)) j i ≤t } . ji The constant γ referred to in Theorem 8.40 corresponds to γ0 . It can be shown that the sequence {γl } decreases to a limit when l tends to ∞. 400 Synchronization and Linearity Remark 8.43 Lower bounds on cycle times based on convex ordering were discussed in the previous section. We have shown how to derive upper bounds based on large deviations in the present section. Since the stability region of a non strongly connected event graph is obtained by comparing the cycle times of its strongly connected components (see Theorems 7.69 and 7.96), these two bounding methods also provide a way to analyze the stability region of this class of systems. 8.4 Markovian Analysis 8.4.1 Markov Property The evolution equation studied in this section is x (k + 1) = A(k )x (k ) , k = 0, 1, 2, . . . , (8.32) with initial condition x 0 ∈ Rn . Theorem 8.44 If the matrices A(k ) are i.i.d. and independent of the initial condidef ◦ tion x 0 , the sequence {z (k )} = {x (k )/ x 1 (k )} forms an Rn -valued Markov chain. Proof The Markov property follows immediately from the relation z (k + 1) = x (k + 1) A(k ) x (k ) A(k )z (k ) = = , x 1 (k + 1) ( A(k )x (k ))1 ( A(k ) z (k ))1 and from the independence assumptions (see Theorem 8.68). There is no general theory available for computing the invariant measure of this Markov chain. The following sections will therefore focus on simple examples. These examples are obtained either from specific problems described in Chapter 1 or from simplifying mathematical assumptions2 on the structure of matrices A(k ). The quantities of interest are ◦ lim E [x i (k + 1)/ x i (k )] , k →∞ for an arbitrary i , which coincides with the Lyapunov exponent of m.s.c.s. [i ], and the distribution of the stationary ratios. 8.4.2 Discrete Distributions Example 8.45 Consider the case when x ∈ R2 and when matrix A(k ) is one of the following two matrices: 37 35 , , 24 24 each with probability 1/2. 
This example was also mentioned in §1.3. Starting from an arbitrary x 0 -vector, say x 0 = 0 2 , we will set up the reachability tree of 8.4. Markovian Analysis 401 Table 8.2: Transitions of the Markov chain A12 = 7 A12 = 5 2 n2 9 n3 7 Initial state n1 = 0 n2 = 0 −3 n4 4 n3 3 n3 = 0 −1 n2 6 n3 4 n4 = 0 −2 n2 5 n3 3 all possible normalized states. This is indicated in Table 8.2 which gives the list of state transitions of z (k ) (the normalization here means that the first component of the normalized state is always 0), together with the corresponding value of ( A(k ) z (k ))1 (the normalization factor). In order to obtain a concise notation, the different normalized state vectors are denoted n i , i = 1, . . . . The table is obtained in the following way. The initial state def is n 1 = 02 . From there, two states can be reached in one step, depending on and 0 −1 . Both normalized states are added the value of A(0): 0 −3 to the list and denoted n 2 and n 3 , respectively. The normalization factors are 9 and 7, respectively. When taking n 2 as initial state, two states can be reached: 0 −2 and 0 −1 . Only the rst of these normalized states is new; it is added to the list and called n 4 , and so on. For the current example, it turns out that there exist four different states (see Table 8.2). From this table, one directly notices that the system never returns to n 1 . Hence this state is transient. In fact, the Markov chain has a single recurrence class, which consists of the three states n 2 , n 3 and n 4 : from the definition of A(k ), we obtain z 2 (k + 1) = = x 2 (k + 1) − x 1 (k + 1) max (2 + x 1 (k ), 4 + x 2 (k )) − max (3 + x 1 (k ), A12 (k ) + x 2 (k )) , where A12 (k ) is equal to either 7 or 5. Rewriting the right-hand side results in z 2 (k + 1) = max (0, 2 + z 2 (k )) − max (1, A12 (k ) − 2 + z 2 (k )) . Whatever integer value we assume for z 2 (k ), z 2 (k + 1) can only assume one of the values −1, −2 or −3. The transition matrix of the restriction of this Markov chain to this recurrence class is 0 1/ 2 1/ 2 1/ 2 1/ 2 1/ 2 . 1/ 2 0 0 2 These examples are defined through (8.32) as stochastic R max -linear systems; they do not necessarily have an interpretation in terms of FIFO stochastic event graphs, as defined in Chapter 2. 402 Synchronization and Linearity The stationary distribution of this chain is easily calculated to be µ(n 2 ) = 1/3 , µ(n 3 ) = 1/2 , µ(n 4 ) = 1/6 . The average cycle time is then µ(n 2 )(4µ( A1 ) + 3µ( A2 )) + µ(n 3 )(6µ( A1 ) + 4µ( A2 )) +µ(n 4 )(5µ( A1 ) + 3µ( A2 )) = 13/3 . The crucial feature in this method is that the number of different normalized state vectors is finite. We now give a few theorems which provide simple sufficient conditions for the finiteness of the state space within this context. Theorem 8.46 Consider the n-dimensional equation (8.32). Assume that for all entries Ai j (k ) there exist finite real numbers Ai j and Ai j such that P[ Ai j ≤ Ai j (k ) ≤ A i j ] = 1 , ∀k ≥ 0 . Suppose that z (0) is finite. Then, for k = 1, 2, . . . , all elements z j (k ) of the Markov chain are bounded and we have min ( A j i − A1i ) ≤ z j (k ) ≤ max ( A j i − A 1i ) , 1 ≤i ≤n 1 ≤i ≤n j = 2, . . . , n . (8.33) Proof We have z j (0) = x j (0) − x 1 (0), which is finite. From the definition of z it follows that z j (1) = A j 1 (0) ⊕ A j 2 (0)z 2 (0) ⊕ · · · ⊕ A j n (0)z n (0) ◦ / ( A11 (0) ⊕ A12 (0)z 2 (0) ⊕ · · · ⊕ A1n (0)z n (0)) . Let q ∈ {1, . . . 
, n} be such that A j q (0)z q (0) = A j 1 (0) ⊕ A j 2 (0)z 2 (0) ⊕ · · · ⊕ A j n (0)z n (0) , and let r ∈ {1, . . . , n} be such that A1r (0)zr (0) = A11 (0) ⊕ A12 (0)z 2 (0) ⊕ · · · ⊕ A1n (0)z n (0) . Then, z j (1) = = A j q (0)z q (0) − A1r (0)zr (0) A jr (0) − A1r (0) ≥ ≥ A jr (0)zr (0) − A1r (0)zr (0) A jr − A1r , ≤ ≤ A j q (0)z q (0) − A1q (0)z q (0) A j q − A 1q . whereas on the other hand, z j (1) = = A j q (0)z q (0) − A1r (0)zr (0) A j q (0) − A1q (0) 8.4. Markovian Analysis 403 The property extends immediately to z (k ), k ≥ 1. Remark 8.47 Theorem 8.46 can straightforwardly be generalized to matrices A(k ) which are such that, for some l , lk=0 A(k ) has all its entries bounded from below and from above. The preceding theorem admits the following obvious corollary. Corollary 8.48 If the matrices A(k ) are i.i.d. with integer-valued entries which satisfy the conditions of Theorem 8.46, and if all entries of z (0) are finite, integer-valued, and independent of matrices A(k ), then the Markov chain z (k ) has a finite state space. Remark 8.49 It is possible that under certain conditions, bounds exist which are better than those given in Theorem 8.46, as shown by the following two-dimensional example with P[ Ai j (k ) = 0] = P[ Ai j (k ) = 1] = 1/2 , P[ A11 (k ) = 1] = P[ A11 (k ) = 2] = 1/2 . except for i = j = 1 ; Then the greatest lower bound and least upper bound of the random variables are A11 = 1 ; A 11 = 2 ; Ai j = 0 ; Ai j = 1 . According to Theorem 8.46, we have −2 = min(0 − 1, 0 − 2) ≤ z (k ) ≤ max(1 − 0, 1 − 1) = 1 . In the integer-valued case, it follows from this that the state space of z (k ) is given by the set {−2, −1, 0}. Hence, in this case, z (k ) will not achieve the upper bound of Theorem 8.46 with positive probability. This theorem can easily be extended to include rational values. Corollary 8.50 If all entries of A(k ) are rational-valued and satisfy the conditions of Theorem 8.46 for all k a.s. and if all entries of z (0) are rational, then the state space of the Markov chain remains finite. We now give an example in which the number of elements in the state space does depend on the actual values of the random variables, but in which on the other hand the rationality does not play a role. Example 8.51 Consider (8.32), with n = 2, and where the random variables Ai j (k ) have the following support A11 = 0 or 1 , A12 = 0 or α , A21 = 1 , A22 = 0 or α . Let α > 1 and let all possible outcomes have positive probabilities. For this twodimensional system, the Markov chain reduces to z (k ) = x 2 (k ) − x 1 (k ) ∈ R. Using Theorem 8.46, one obtains that −α ≤ z (k ) ≤ α . From Corollaries 8.48 and 8.50, we 404 Synchronization and Linearity know that if α is rational, then the Markov chain has a finite state space (at least for a proper choice of z (0)). Depending on the value of α , the recurrent state space of z (k ) can be determined. For all α > 1, z (k ) can assume the following six states with positive probability: 0, 1, 1−α , α, α−1 , −α . For α ≥ 2, these are the only values z (k ) can assume. For 1 < α < 2, the following states are also in the state space: 2−α , 2 − 2α , 2α − 2 , −1 . For 3/2 ≤ α < 2 the state space consists of just these ten states. But for 1 < α < 3/2 the following values can also be assumed: 3 − 2α , 3 − 3α , α−3 , α−2 . Again, for 4/3 ≤ α < 3/2, the state space consists of the given fourteen states. 
But, for 1 < α < 4/3, four other states are also possible, resulting in eighteen states, whereas for 1 < α < 5/4 again four new states are possible, etc. We see that if α comes closer to one, the number of states increases (stepwise). But for any value of α the total number of states remains finite. Also for α = 1 the number of states is finite (in fact, the state space is then equal to {-1,0,1}). Also, for all values of α (both rational and irrational) within a certain interval, the number of elements of the state space of the Markov chain z (k ) is the same. Example 8.52 Consider the following six-dimensional case: ε eεεεε ε ε e ε e ε ε ε ε e ε ε , A(k ) = e ε ε ε e ε ε ε ε ε ε e a (k ) ε e ε ε ε with P[a (k ) = e] = P[a (k ) = 1] = 1/2 . The matrix is irreducible since the graph corresponding to the matrix is strongly connected. But it turns out that, in this case, the state space of the Markov chain becomes infinite. This can be seen as follows. From a state l e l e l e the following states are possible: e l e l e l and e l e l l +1 e After the last state, the state l+1 e l+1 e e l+1 . 8.4. Markovian Analysis 405 Table 8.3: Markov chains of the first three routing schemes Initial state First routing scheme s31 = 1 s31 = 2 n1 = 00 0000 000 n1 1 n2 1 n2 = 10 0000 100 n3 1 n3 1 n3 = 11 0100 110 n1 2 n1 2 Second routing scheme n1 = 00 0000 000 n1 1 n2 1 n2 = 10 0100 100 n1 2 n1 2 Third routing scheme n1 = 00 0000 000 n1 1 n2 1 n2 = 00 0100 100 n3 1 n4 1 n3 = 10 0100 001 n5 1 n6 1 n4 = 10 0100 101 n7 1 n8 1 n5 = 01 1001 011 n9 1 n2 2 n6 = 01 1101 111 n1 2 n2 2 n7 = 11 1101 011 n9 1 n2 2 n8 = 11 1101 111 n1 2 n2 2 n9 = 11 1011 111 n1 2 n2 2 406 Synchronization and Linearity can be reached with positive probability, and from this state the state e l +1 e l+1 e l +2 is possible, etc. Example 8.53 (Railway Traffic Example Revisited) In §1.2.6 a railway system in a metropolitan area was studied. This example will be studied in greater detail now. The three railway stations S1 , S2 and S3 are connected by a railway system as indicated in Figure 1.10. The railway system consists of two inner circles, along which the trains run in opposite direction, and of three outer circles. The model which describes the departure times of the nine trains is x (k + 1) = A1 x (k ). Two other models were described in §1.2.6 as well, depending on other routing schemes of the trains. These models were characterized by the transition matrices A2 and A3 respectively. The elements of these matrices were e, ε or si j , the latter referring to the traveling time from station Si to station S j . The quantity sii refers to the traveling time of the outer circle connected to station Si . It is assumed that all si j -quantities are equal to 1, except for s31 . The latter quantity is random and is either 1 or 2. Each time a train runs from S3 to S1 there is a probability p , 0 ≤ p ≤ 1, that the train will be delayed, i.e. s31 = 2 rather than s31 = 1. Thus matrices Ai become k -dependent and will be denoted Ai (k ). The system is now stochastic. It is assumed that no correlation with respect to the ‘counter’ k exists. In this situation, one may also have a preference for one of these three routings or another one. In this context four routings will be studied: the ones characterized by the matrices Ai , i = 1, 2, 3, and the routing in which the trains move in opposite directions compared with the routing characterized by A3 . The matrix corresponding to the latter routing, though not explicitly given, will be indicated by A4 . 
In fact, the results corresponding to this fourth routing will be obtained by using A3 in which s13 is now the uncertain factor rather than s31 . In Tables 8.3 and 8.4 the normalized states corresponding to the stationary situations of the four routings are given (one must check again in each of these four cases whether the set of normalized states in the stationary situation is unique, which turns out to be true). These states have been normalized in such a way that the least component equals zero. The transition matrices of these Markov chains follow directly from these tables. As an example, if p = 0.2, the transition matrix for the Markov chain of the first routing scheme becomes −0.2 0 1 0.2 −1 0 , 0 1 −1 from which the stationary distribution can be calculated: it is The cycle time then becomes 5/ 7 1/ 7 1/ 7 . (0.8 × 1 + 0.2 × 1) × 5/7 + (0.8 × 1 + 0.2 × 1)/7 + (0.8 × 2 + 0.2 × 2)/7 = 8/7 . The results for varying p are given in Figure 8.4 for all four routing schemes. Note 8.4. Markovian Analysis 407 Table 8.4: Markov chain of the fourth routing scheme s31 = 1 Initial state n1 = n2 = n3 = n4 = n5 = n6 = n7 = n8 = n9 = n 10 = n 11 = n 12 = n 13 = n 14 = n 15 = n 16 = n 17 = n 18 = n 19 = n 20 = n 21 = n 22 = n 23 = n 24 = n 25 = n 26 = n 27 = n 28 = n 29 = n 30 = n 31 = 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 0 0 1 1 0 0 1 1 0 0 1 2 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 2 2 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 2 1 2 1 2 1 2 1 2 1 1 1 2 1 2 s31 = 2 n1 n3 n5 n7 n9 n 11 n 13 n 15 n 16 n 18 n 20 n 22 n 24 n1 n 24 n 26 n 27 n1 n3 n1 n3 n1 n3 n1 n3 n 28 n 30 n1 n3 n9 n 11 n2 n4 n6 n8 n 10 n 12 n 14 n1 n 17 n 19 n 21 n 23 n 25 n2 n 25 n 26 n 27 n1 n3 n1 n3 n1 n3 n2 n4 n 29 n 31 n2 n4 n 10 n 12 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 2 2 2 2 2 2 1 1 2 2 2 2 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 2 2 2 2 2 2 2 2 1 1 2 2 2 2 408 Synchronization and Linearity cycle time 1.5 1.4 2 ting 3 rou 1 and tings rou g4 routin 1.3 1.2 1.1 p 1.0 0 0.2 0.4 0.6 0.8 1.0 Figure 8.4: Cycle times of the four routings that the cycle times of routings one and three completely coincide. If one could choose among the routings, then routing 4 would be preferred since it has the least cycle time for any p-value. Example 8.54 (Example of Parallel Computation Revisited) The starting point for this subsection is Equation (1.33). This is a nonlinear equation which describes the evolution of the speed of a simple matrix multiplication on a wavefront array processor. Suppose that αi , i = 1, 2, are either 1 (ordinary multiplication) or 0 (multiplication by a 0 or a 1). Let z (k ) be defined as in Theorem 8.44. Using the same type of arguments as in the proof of this theorem, one shows that z (k ) is a Markov chain, provided that the random variables αi (k ) are independent. It is possible to aggregate the state space of this Markov chain into twelve macrostates, as defined in Table 8.5, while preserving the Markov property (the transition probabilities satisfy the conditions of the ‘lumping theorem’ 6.3.2 in [74]). 
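As a numerical companion to the railway example (Example 8.53) above, the short sketch below rebuilds the Markov chain of the first routing scheme from Table 8.3, solves for its stationary distribution and evaluates the mean cycle time as a function of p; for p = 0.2 it returns 8/7, as computed in the text. The encoding of the table and the function name are ours, and the same approach carries over to the other routing schemes of Tables 8.3 and 8.4.

```python
import numpy as np

# Table 8.3, first routing scheme: for each normalized state, the next state and the
# normalization factor (the increment x_1(k+1) - x_1(k)), under s31 = 1 and s31 = 2.
# States 0, 1, 2 stand for n1, n2, n3.
step = {
    0: ((0, 1.0), (1, 1.0)),   # n1 -> n1 (factor 1) if s31 = 1, n1 -> n2 (factor 1) if s31 = 2
    1: ((2, 1.0), (2, 1.0)),   # n2 -> n3 (factor 1) in both cases
    2: ((0, 2.0), (0, 2.0)),   # n3 -> n1 (factor 2) in both cases
}

def cycle_time(p):
    """Mean cycle time of the first routing scheme when P[s31 = 2] = p."""
    n = len(step)
    P = np.zeros((n, n))       # transition matrix of the normalized-state Markov chain
    f = np.zeros(n)            # expected normalization factor in each state
    for s, ((s1, f1), (s2, f2)) in step.items():
        P[s, s1] += 1.0 - p
        P[s, s2] += p
        f[s] = (1.0 - p) * f1 + p * f2
    # Stationary distribution: solve pi P = pi together with sum(pi) = 1.
    lhs = np.vstack([P.T - np.eye(n), np.ones(n)])
    rhs = np.concatenate([np.zeros(n), [1.0]])
    pi = np.linalg.lstsq(lhs, rhs, rcond=None)[0]
    return float(pi @ f)

print(cycle_time(0.2))                     # 8/7 ~ 1.143, as in the text
for p in (0.0, 0.5, 1.0):
    print(p, round(cycle_time(p), 4))      # points on the routing-1 curve of Figure 8.4
```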
The other steps of the analysis (computation of the transition matrix of the aggregated Markov chain and of the invariant measure) can then be carried out in the standard way. Table 8.5: States of Markov chain; example of parallel computation; l = 0, 1, 2, . . . n1 n2 n3 n4 l l l l l l l l l l+1 l l l l+1 l+1 l 0 0 0 1 0 1 0 0 n5 n6 n7 l l l l+1 l l l+1 l+1 l l +1 l +1 l +1 0 1 1 1 1 0 n8 n9 n 10 n 11 n 12 l l l l l l l+1 l+1 l l l l+1 l l l l l+1 l+1 l l−1 1 0 0 0 1 1 0 0 1 0 8.4. Markovian Analysis 8.4.3 409 Continuous Distribution Functions The starting point is Equation (8.32). It is assumed that the sequence of matrices { A(k )}k ≥0 is i.i.d., that for each k the entries Ai j (k ), i, j = 1, . . . , n, are mutually independent, and that the random variables Ai j (0) all have the same distribution function on R+ , which will be denoted F . We will assume that F admits a density. Under these assumptions, there is one m.s.c.s. Whenever the support of F is infinite, the increment process {δ(k )} of Chapter 7, and hence the process {z (k )} couple in finite time with stationary processes which do not depend on the initial condition. From Theorem 8.44, the variables z (k ) form a Markov chain on Rn−1 (the first coordinate is zero), the transition matrix of which is characterized by the relation z i (k + 1) = Ai 1 (k ) ⊕ A11 (k ) ⊕ n j =2 ( Ai j (k ) ⊗ z j (k )) n j =2( A1 j (k ) ⊗ z j (k )) , i = 2, . . . , n . (8.34) The transition kernel of the Markov chain, or equivalently the distribution of z 2 (k + 1) . . . z n (k + 1) z 2 (k ) . . . z n(k ) given is obtained from (8.34): K (x 2 , . . . , x n ; y2 , . . . , yn ) def = = P [z 2 (k + 1) ≤ y2 , . . . , z n(k + 1) ≤ yn | z 2 (k ) = x 2 , . . . , z n (k ) = x n ] ◦ ◦ P [ X 2/ X 1 ≤ y2 , . . . , X n/ X 1 ≤ yn ] , where the random variables X i , i = 2, . . . , n, defined by n X i =st Ai 1 (k ) ⊕ Ai j (k )x j , j =2 are independent. The notation =st refers here to equality in distribution. From the last two equations and the fact that all Ai j (k ) possess the same distribution F , it follows that K (x 2 , . . . , x n ; y2 , . . . , yn ) = ∞ n −∞ ∞ −∞ n P[ X j ≤ y j + t ] dP [ X 1 ≤ t ] = j =2 j =2 H (t + y j , t + y j − x 2 , . . . , t + y j − x n ) where d H (t , t − x 2 , . . . , t − x n ) dt dt n def H (u 1 , u 2 , . . . , u n ) = F (u i ) . i =1 The distribution function def ζk (x 2 , . . . , x n) = P [z 2 (k ) ≤ x 2 , . . . , z n (k ) ≤ x n ] 410 Synchronization and Linearity satisfies the functional equation ∞ ζk+1 ( y2 , . . . , yn) = −∞ ∞ ... −∞ K (x 2 , . . . , x n ; y2 , . . . , yn )ζk (d x 2 , . . . , d x n ) . Whenever the infinite support condition is satisfied, we know that the limit limk→∞ ζk = ζ exists in the (weak convergence) distributional sense. This limit is a solution of the functional equation ζ ( y2 , . . . , yn ) = ∞ −∞ ... ∞ −∞ K (x 2 , . . . , x n ; y2 , . . . , yn ) ζ (d x 2 , . . . , d x n) . (8.35) Let ζ be the unique solution of this equation. It is immediate to see that the distribution function D (t ) of the stationary ratio δ11 is given by the relation D (t ) = = = lim P [x 1 (k + 1) − x 1 (k ) ≤ t ] k →∞ lim P [ A11 (k ) ≤ t , A12 (k ) + z 2 (k ) ≤ t , . . . , A1n (k ) + z n (k ) ≤ t ] k →∞ lim P [ A11 (k ) ≤ t , A12 (k ) ≤ t − z 2 (k ), . . . , A1n (k ) ≤ t − z n (k )] k →∞ = F (t ) lim = F (t ) k →∞ ∞ −∞ ∞ −∞ ... ... ∞ ∞ n −∞ i =2 n −∞ i =2 F (t − yi )ζk (d y2 , . . . , d yn) F (t − yi )ζ (d y2 , . . . , d yn ) . 
(8.36) The limit and the integral in (8.36) can be interchanged by the definition of weak convergence, since F is continuous, see [21]. Example 8.55 Consider (8.32) with n = 2 and with the assumptions made in the previous subsection. The transition kernel K (x ; y ) of the Markov chain {z 2 (k )} is given by def K (x ; y ) = P [z 2 (k + 1) ≤ y | z 2 (k ) = x ] ∞ d = H (t + y , t + y − x ) H (t , t − x ) dt dt −∞ ∞ d d = F (t + y ) F (t + y − x ) F (t − x ) F (t ) + F (t ) F (t − x ) dt dt −∞ dt . Explicit calculations will be made for F (x ) = (1 − exp (−x ))1 [0,∞)(x ) . (8.37) It follows from (8.35) that the density d of the stationary distribution ζ satisfies the equation ∞ d d ( y) = K (x ; y ) d (x ) d x , dy −∞ 8.4. Markovian Analysis 411 or, equivalently, after some calculations, d ( y) = ∞ 1 2 − exp (−| y | − 2|x |) + exp (−2| y | − 2|x |) (8.38) 3 3 −∞ 2 4 + exp (−| y | − |x |) − exp (−2| y | − |x |) + exp (−| y |) d (x ) d x . 3 3 1 2 This is an integral equation, the kernel of which is degenerate, see [90]. The solution d ( y ) must be of the form d ( y ) = c1 exp (−| y |) + c2 exp (−2| y |) (see [90, Chapter 1, §4]), where the coefficients ci still must be determined. Substitution of this form into the integral equation leads to 8c1 + 23c2 = 0. This, together with ∞ the normalization condition −∞ d ( y ) d y = 1 results in 2c1 + c2 = 1, which uniquely determines these coefficients. The stationary density is hence given by d ( y) = 23 4 exp (−| y |) − exp (−2| y |) , 38 19 y ∈ (−∞, ∞) . (8.39) It is easy to show that d ( y ) ≥ 0, ∀ y ∈ (−∞, ∞), and hence d is indeed a probability density function. With the aid of (8.39), one now obtains, after some straightforward analysis, 407 ◦ lim E [x 1 (k + 1)/ x 1 (k )] = = 1.79 . k →∞ 228 This expression also equals limk→∞ E (x i (k ))1/ k , provided that the random variables x 1 (0) and x 2 (0) are integrable. Remark 8.56 The fact that dk , defined as the density of ζk , indeed approaches the limit d as k goes to infinity, can easily be illustrated for this example. If one starts with an arbitrary density d0 , then d1 is already the sum of the two exponentials exp(− y ) and exp(−2 y ), as follows from dk + 1 ( y ) = ∞ −∞ K (x ; y )dk (x ) d x , where the kernel K is the same as in (8.38). In general, dk ( y ) = c1 (k ) exp (−| y |) + c2 (k ) exp (−2| y |) , k≥1 , and the coefficients satisfy c1 (k + 1) c2 (k + 1) = 11/9 23/36 −4/9 −5/18 c1 (k ) c2 (k ) . If one starts with with a probability density function ζ0 , then 2c1 (k ) + c2 (k ) = 1, k = 1, 2, . . . , and lim c1 (k ) = 23/38 , k →∞ lim c2 (k ) = −4/19 . k →∞ 412 Synchronization and Linearity Hence the transient behavior converges to the limit (stationary) behavior. Remark 8.57 The outcome of the above example will be compared with the outcomes of two other examples. The models of all three examples will be the same, i.e. (8.32) with n = 2. The difference is the stochastic behavior of Ai j . In the above example, it was characterized by (8.37). In the next two examples we have Example 2: µ( Ai j = 0) = µ( Ai j = 2) = 1/2; Example 3: Ai j is uniformly distributed on the interval [0, 2). In all these three examples, E[ Ai j ] = 1. In spite of this, it will turn out that the throughput for all three examples is different. In the second example, the elements Ai j have a discrete distribution. The method of §8.4.2 can be applied, which results in ◦ lim E [x 1 (k + 1)/x 1 (k )] = k →∞ 12 = 1.71 . 
7 For the third example, the method described at the beginning of this subsection can be used. The same type of analysis leads to ◦ lim E [x 1 (k + 1)/ x 1 (k )] = 1.44 . k →∞ The third example leads to the best throughput. This is not surprising: for instance, the comparison between the exponential case and the case of Example 3 follows from Theorem 8.3; indeed, the Karlin-Novikoff cut criterion [123, Proposition 1.5.1, p. 12] immediately implies that an exponential random variable of mean 1 is ≤cx -bounded from below by a uniform random variable on [0, 2). 8.5 8.5.1 Appendix Stochastic Comparison This subsection gathers a few basic properties of the three stochastic orders introduced in §8.2.1, and related definitions. For proofs and details, see [123] or [6]. From the very definitions, it should be clear that x ≤st x † ⇒ f (x ) ≤st f (x † ) , (8.40) for all coordinatewise nondecreasing functions f : Rn → Rm . In the same vein, x ≤icx x † ⇒ f (x ) ≤icx f (x † ) , (8.41) for all nondecreasing and convex functions f : R → R . n m Remark 8.58 From Jensen’s inequality, it is immediately checked that for all integrable random variables x ∈ Rn , the following relation holds: E [x ] ≤cx x . (8.42) 8.5. Appendix 413 Lemma 8.59 If x and x † are nonnegative, real-valued random variables, each of the properties x ≤st x † , x ≤cx and x ≤icx x † implies the moment relation E [x n ] ≤ E (x † )n , for all n ≥ 0. Consider an Rn -valued sequence x generated by the recursion x (k + 1) = a (x (k ), u (k )) , k≥0 , (8.43) for some Borel mapping a : Rn × Rp → Rn , for some given sequence of Rp -valued random variables u = {u (0), . . . , u (k ), . . . } and some initial condition x (0), all defined on the probability space ( , F, P). Let x † be the sequence defined as above, but for the initial condition and the sequence, which are respectively replaced by x † (0) and u†. In what follows, ξ denotes the sequence ξ = {x (0), u (0), u (1), . . . } , (8.44) with a similar definition for ξ † . The proofs of the following results can be found in [6, Chapter 4]. Theorem 8.60 Assume that the mapping ( X , U ) → a ( X , U ) is nondecreasing; then ξ ≤st ξ † implies that x ≤st x. Theorem 8.61 Assume that the random variables x (0) and u (0), u (1), . . . are integrable, that the mapping ( X , U ) → a ( X , U ) is convex, and that the mapping X → a ( X , U ) is nondecreasing for all U ; then ξ ≤cx ξ † implies that x ≤icx x. Theorem 8.62 Assume that the random variables x (0) and u (0), u (1), . . . are integrable, and that the mapping ( X , U ) → a ( X , U ) is convex and nondecreasing; then ξ ≤icx ξ † , implies that x ≤icx x. Definition 8.63 (Stochastic convexity) A collection of Rn -valued random variables { Z (ρ)}ρ ∈R with a convex parameter set R ⊂ Rm is said to be stochastically (increasing and) convex in ρ if E [φ( Z (ρ))] is (nondecreasing and) convex in ρ ∈ R, for all nondecreasing functions φ : Rn → R. The stochastic concavity with respect to a parameter is defined in a similar way. Definition 8.64 (Association) The (set of) R-valued random variables x 1 , . . . , x n , all defined on the same probability space, are (is) said to be associated if E [ f (x 1 , . . . , x n )g (x 1 , . . . , x n )] ≥ E [ f (x 1 , . . . , x n )] E [g (x 1 , . . . , x n )] , for all pairs of increasing functions f , g : Rn → R such that the integrals are well defined. This definition is extended to sets of Rn -valued random variables by requiring that the set of all coordinates be associated. 
It is also extended to sequences and to random processes in the usual way: the sequence is said to be associated if all finite subsequences are associated. 414 Synchronization and Linearity Remark 8.65 The association property can often be established without computing the joint distribution of the variables explicitly: for instance, the union of independent sets of associated random variables forms a set of associated random variables; as can easily be checked, for any nondecreasing function φ : Rn → R, and any set of associated random variables {x 1 , . . . , x n }, the variables {φ(x ), x 1 , . . . , x n } are associated, where def φ(x ) = φ(x 1 , . . . , x n ). Lemma 8.66 If the random variables {x 1 , . . . , x n } are associated, then n n x i ≤st i =1 and xi , i =1 n n x i ≥st i =1 xi , i =1 where x is the product form version of x, namely the random vector such that • x i =st x i for all i = 1, . . . , n ; • the marginals of x are mutually independent. Note that the product form version x of x is only characterized through its probability law. Concerning recursions of the type (8.43), we also have the following theorem. Theorem 8.67 Assume that the function ( X , U ) → a ( X , U ) is nondecreasing. If the set {ξ(0), ξ(1), . . . } and the initial condition x (0) form a set of associated random variables, then the random sequence x , ξ is also associated. 8.5.2 Markov Chains Sequences generated like in (8.43) satisfy the following property: Theorem 8.68 If the sequence u (k ) is i.i.d., then {x (k )} forms a homogeneous Markov chain. For this and related results on Markov chain theory, see [121]. 8.6 Notes A good survey on the methods for deriving stochastic monotonicity results for classical queuing systems can be found in the book by D. Stoyan [123]. The interest of these techniques to analyze synchronization constraints was first stressed by A.M. Makowski, Z. Liu and one of the coauthors, in [12] for queuing systems, and in [10] for stochastic event graphs. The uniformization method for proving the concavity of throughput generalizes an idea of L.E. Meester and J.G. Shanthikumar (see [89] and [10]). The use of large deviation techniques for deriving growth rates for age-dependent branching processes was initiated by J.F. Kingman and D. Biggins [76] and [19]. The relation between 8.6. Notes 415 event graphs and branching processes which is presented in §8.3, is that considered in [8]. This approach has interesting connections with the work of B. Derrida on directed polymers in a random medium (see [56], [48]), from which the idea of Remark 8.42 originates. Most of the results mentioned in §8.4 come from [97], [101], [104] and [117]. In the latter reference, one can also find further results on the asymptotic normality of daters. The analysis of the finiteness of the state space in §8.4.2 is mainly drawn from [54]. The type of functional equation which is established in §8.4.3 is also considered in certain nonautonomous cases (see [9]). 416 Synchronization and Linearity Part V Postface 417 Chapter 9 Related Topics and Open Ends 9.1 Introduction In this chapter various items will be discussed which either did not find a natural place in one of the preceding chapters, which are only related to discrete events, or which are not yet fully grown in a scientific way. The personal bias of the authors will be clearly reflected in this chapter. There are no direct relations between the sections. 
Section 9.2 is concerned with various continuations of the linear theory developed throughout the book. Section 9.3 is devoted to the control of discrete event systems, whereas §9.4 gives a picture of the analogies between the theory of optimization and that of Markov chains. The last three sections are devoted to some (limited) incursions into the realm of general Petri nets and nonlinear systems. 9.2 About Realization Theory 9.2.1 The Exponential as a Tool; Another View on Cayley-Hamilton If a and b are reals (or −∞) then the following identities are easily verified: a ⊕ b = max (a , b) = lim s −1 (ln(exp (as ) + exp(bs ))) , s →∞ a ⊗ b = a + b = s −1 ln(exp (as ) exp (bs )) . Rather than working in the max-plus algebra setting with variables a , b , . . . , one can now envisage working with the variables exp(as ), exp (bs ), . . . , where s is a positive real, in conventional algebra. After having obtained results in conventional algebra, we must translate these results back into corresponding results in the max-plus algebra by using careful limit arguments when s → ∞. This procedure will be elucidated in this subsection and in §9.2.3. Instead of working with exp(as ), exp (bs ), . . . , we will work with z a , z b , . . . , z real, and study the behavior for z → ∞. In conventional calculus, the Cayley-Hamilton theorem states that every square matrix satisfies its own characteristic equation. To be more explicit, let A be an n × n matrix with entries in R. If det(λ I − A) = λn + c1 λn−1 + · · · + cn−1 λ + cn , 419 (9.1) 420 then Synchronization and Linearity An + c1 An−1 + · · · + cn−1 A + cn I = 0 . In these equations I is the conventional identity matrix and 0 is the zero matrix. The coefficients ci , i = 1, . . . , n , in (9.1) satisfy Ai1 i1 · · · Ai1 ik . . . . (9.2) ck = (−1)k det . . . i1 <i2 <···<ik Aik i1 ... Aik ik def Now consider the matrix z A = z Ai j , i.e. the i j -th entry of z A equals z Ai j . The CayleyHamilton theorem applied to matrix z A yields (z A )n + ζ1 (z A )n−1 + · · · + ζn−1 z A + ζn I = 0 . (9.3) If the principal k × k submatrix occurring on the right-hand side of (9.2) is denoted A(i1 , i2 , . . . , ik ), then the coefficients ζk are given by det z A(i1 ,i2 ,... ,ik ) . ζk = (−1)k i1 <i2 <···<ik If we take the limit when z → ∞, then we obtain ζk ≈ (−1)k ζ k z maxi1 <i2 <···<ik dom A(i1 ,i2 ,... ,ik ) , (9.4) where dom (for dominant ) is a concept similar to per (for permanent ); for the latter see [91]. For an arbitrary square matrix B , dom ( B ) is defined as dom ( B ) = greatest exponent in det(z B ) ε if det(z B ) = 0 , otherwise. (9.5) The coefficient ζk in (9.4) equals the number of even permutations minus the number of odd permutations contributing to the highest-degree term in the exponents of z : max i1 <i2 <···<ik dom ( A(i1 , i2 , . . . , ik )) . Now let us consider the asymptotic behavior of (z A )k as z → ∞. One may easily understand that k (z A )k ≈ z A , (9.6) where Ak on the right-hand side denotes the k -th power of A for the matrix product in Rmax . Define ζk∗ = (−1)k ζ k , I = {k | 1 ≤ k ≤ n , ζk∗ > 0} , ∗ ck = dom ( A(i1 , i2 , . . . , ik )) . i1 <i2 <···<ik 9.2. About Realization Theory 421 Substitution of (9.4) and (9.6) into (9.3) yields the following: ∗ ζk∗ z ck z A n zA + n−k ∗ ζk∗ z ck z A ≈ k ∈I n−k . k ∈I Since all terms now have positive coefficients, the comparison of the highest degree terms in both members of this approximation leads to the following identity in Rmax : ∗ ck An−k = An ⊕ k ∈I ∗ ck An−k . 
(9.7) k ∈I / It is this identity that we consider as a version of the Cayley-Hamilton theorem in the max-plus algebra sense. ∗ Remark 9.1 The dominant, as it appears implicitly in (9.7) through the coefficients ck , can be directly obtained from A dom ( A(i1 i2 , . . . , ik )) = Ai1 j1 · · · Aik jk , where j1 , . . . , jk is a permutation of i1 , . . . , ik , and where the spect to all such permutations. -symbol is with re- Remark 9.2 It is important to realize that this version of the Cayley-Hamilton theorem differs slightly from the one given in §2.3.3. The reason is that in the derivation of the current version terms have been canceled in the calculation of ζ k as it appears in (9.4). If terms of equal magnitude but of opposite signature (of the permutations) had been kept, then one would have obtained the ‘original’ Cayley-Hamilton theorem in the max-plus algebra. Example 9.3 Consider 1 A= 4 e 23 1 ε , 53 ∗ which was also considered in §2.3.3. First the coefficients ck will be calculated: ∗ c1 = i1 ∗ c2 i1 <i2 = dom A(i1 ) = i1 dom ( Ai1 i1 ) = 1 ⊕ 1 ⊕ 3 = 3 , dom A(i1 , i2 ) = dom A(1, 2) ⊕ dom A(1, 3) ⊕ dom A(2, 3) =6⊕4⊕4 = 6 , ∗ c3 = dom A(1, 2, 3) = 12 . The quantity ζ 1 equals the number of even permutations minus the number of odd ∗ permutations needed to obtain c1 . The permutations of the diagonal elements are even ∗ end hence ζ 1 = +1. The permutation which realized c2 = 6, where the number 6 was obtained by A12 A21 , is odd and therefore ζ 2 = −1. Similarly, the permutation 422 Synchronization and Linearity ∗ which realized c3 = 12, where the number 12 was obtained by A13 A21 A32 , is even and therefore ζ 3 = +1. Thus one obtains ζk∗ = −1, k = 1, 2, 3, and (9.7) becomes A3 = 3 A2 ⊕ 6 A ⊕ 12e . Note that this equation was also given in §2.3.3, with the actual A substituted. However, in that section the characteristic equation was first simplified before A was substituted into this characteristic equation. From the above example it is clear that for any square matrix A, ζ 1 = +1 and hence An and An−1 always appear on different sides of the equality symbol in the Cayley-Hamilton theorem. The lower order exponentials of A can appear at either side (but not on both sides simultaneously) of the equality symbol in the current version of the Cayley-Hamilton theorem. 9.2.2 Rational Transfer Functions and ARMA Models In conventional discrete time system theory a rational transfer function can be exm n i j pressed as the ratio of two polynomials p (z ) = i =0 pi z and q (z ) = j =0 q j z (z is the delay operator). Let U (z ) and Y (z ) denote the z -transforms of the input and of the output trajectories u (·) and y (·) respectively. We have Y (z ) = p (z ) U (z ) ⇔ q (z )Y (z ) = p (z )U (z ) ⇔ q (z ) n m q j y (t + j ) = j =0 pi u (t + i ) . i =0 In Statistics the last equation is known as an ‘ARMA’ model: the ‘autoregressive’ (AR) part of the model corresponds to the left-hand side of the equation, whereas the ‘moving average’ (MA) part is the right-hand side. ax In §5.7 rational transfer functions H (γ , δ) ∈ Min [[γ , δ ]] were identified with functions which can be written as C A∗ B , where C (respectively B ) is a row (respectively a column) vector and A is a square matrix. The entries of C and B may be restrained to be Boolean and those of A are elements of Max[[γ , δ ]] which can be represented by in polynomials of degree 1 in γ and δ (see Theorem 5.39). Our main objective here is to show that rational transfer functions are amenable to ARMA models as previously. 
However, since there is no possibility of having ‘negative’ coefficients of polynomials, the AR and the MA part should both appear in both sides of the equation, which yields an implicit equation. This implicit equation in Y (U is given) may have several solutions in general, and among them, there is the ‘true’ solution of Y = C A∗ BU . No results are available yet to select this true solution among the possibly many other solutions. Our purpose is just to show a utilization of the Cayley-Hamilton theorem to pass from the (C , A, B )-form to the ARMA form. Lemma 9.4 If Y is the output of a rational transfer function when U is the input, then ax there exist four polynomials p1 , p2 , q1 , q2 ∈ Min [[γ , δ ]], with deg( p1 ) < deg(q2 ) and ax deg( p2 ) < deg (q1 ), such that, for all U ∈ Min [[γ , δ ]], Y satisfies q1 Y ⊕ p1 U = q2 Y ⊕ p2 U . (9.8) 9.2. About Realization Theory 423 Proof Note that Y can be written as C X with X = AX ⊕ BU (conditions on (C , A, B ) have been recalled earlier). In Theorem 2.22 it was shown that, in a commutative ax dioid such as Min [[γ , δ ]], there exist two polynomials p+ (z ) and p − (z ) of an abstract variable z with coefficients belonging to the dioid (here Max[[γ , δ ]]), such that in p+ ( A) = p− ( A) . The explicit form of these polynomials, given in Definition 2.21, shows that their coefficients are themselves polynomials in (γ , δ) since A is a polynomial matrix. Now, for any k N, we have X = Ak X ⊕ e ⊕ A ⊕ · · · ⊕ Ak−1 B U . (9.9) + 1 Let p+ = n=0 pk z k and consider a similar expression for p− (with degree n 2 ). One k + can multiply both sides of Equation (9.9) by pk and sum up all these equations for k = 0, . . . , n 1 . This yields n1 + pk X = p+ ( A) X ⊕ r + ( A) BU k =0 a2 a3 a1 for some polynomial r + (z ) of degree less than n 1 , the form of which is not given in detail here. In a similar way, one can obtain n2 pl− X = p− ( A) X ⊕ r − ( A) BU l =0 a5 a6 a4 for some polynomial r − (z ) of degree less than n 2 . Note that a2 = a5 by the CayleyHamilton theorem. Then, we have a1 ⊕ a6 = a2 ⊕ a3 ⊕ a6 = a3 ⊕ a5 ⊕ a6 = a3 ⊕ a4 . To complete the proof, it suffices to multiply both sides of this equation by C (which commutes with ‘scalars’) to let Y appear (Y = C X ). 9.2.3 Realization Theory For this subsection the reader is assumed to be familiar with conventional realization theory, see e.g. [72]. In §1.3 the following question was posed: How do we obtain a time-domain representation, or equivalently, how do we find A, B and C , if the p × m transfer matrix H (γ ) = γ C B ⊕ γ 2 C AB ⊕ γ 3 C A2 B ⊕ · · · is given? For the sake of simplicity we will confine ourselves to SISO systems, i.e. matrices B and C are vectors (m = p = 1). One may be tempted to study the related 424 Synchronization and Linearity semi-infinite Hankel matrix G defined by g1 g2 g2 g3 G = g3 g4 g4 : . . . . . . g3 g4 g5 : ··· ··· ··· , where gi = C Ai −1 B ; these quantities are sometimes called Markov parameters. Only partial results, some of which will be shown now, have been obtained along this line. The matrix G (≤i )(≤ j ) is, as in Chapter 2, defined as the submatrix of G , consisting of the intersection of the first i columns and the first j rows of G . As an example, consider the Markov parameters g1 = 1, g2 = 3, g3 = 0, g4 = 1, g5 = −2, g6 = −1, g7 = −4, . . . . (9.10) It is easily verified that for this series, dom G (≤1)(≤1) = 1 , dom G (≤2)(≤2) = 6 , dom G (≤3)(≤3) = 0 , and dom G (≤i )(≤i ) = ε for i ≥ 4 , where dom was defined in (9.5). 
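For small matrices, the dominant introduced in (9.5) can be evaluated directly by a signed permutation expansion of det(z^B). The following sketch (in Python; the helper names dom and hankel are ours, and this brute-force expansion is only meant for small sizes) recovers, for instance, the first two values quoted above.

    import itertools
    from collections import defaultdict

    def dom(B, eps=float('-inf')):
        # Definition (9.5): the greatest exponent occurring in det(z^B) with a
        # nonzero coefficient; eps if all signed contributions cancel.
        n = len(B)
        coeff = defaultdict(int)                  # exponent -> signed number of permutations
        for perm in itertools.permutations(range(n)):
            inv = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
            sign = -1 if inv % 2 else 1
            if any(B[i][perm[i]] == eps for i in range(n)):
                continue                          # a factor z^eps kills the whole term
            coeff[sum(B[i][perm[i]] for i in range(n))] += sign
        surviving = [e for e, c in coeff.items() if c != 0]
        return max(surviving) if surviving else eps

    def hankel(g, i):
        # Leading i x i submatrix G_(<=i)(<=i) of the Hankel matrix of g1, g2, ...
        return [[g[r + c] for c in range(i)] for r in range(i)]

    g = [1, 3, 0, 1, -2, -1, -4]                  # the Markov parameters (9.10)
    print([dom(hankel(g, i)) for i in (1, 2)])    # [1, 6]
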
This, with the conventional theory in mind, might lead to the conclusion that the minimal realization would have order 3. This is false since the Markov parameters above were derived from the system with A= ε −2 e −3 , B= ε e , C= 31 , (9.11) and hence the minimal realization will maximally have order 2. Studying dom G (≤i )(≤i ) turns out not to be very fruitful. A better approach is to consider linear dependences among the rows of G . For the current example, for instance, we have G ·i = (−3)G ·i −1 ⊕ (−2)G ·i −2 , i = 3, 4, . . . . Now we can use the following theorem. Theorem 9.5 Given the series {gi }∞ 1 such that for the corresponding Hankel matrix, i= G ·i = c1 G ·i −1 ⊕ · · · ⊕ cn G ·i −n , i = n + 1, n + 2, . . . , holds true for certain coefficients c1 , . . . , cn , and where n is the smallest integer for which this, or another linear dependence (see below), is possible, then the discreteevent system characterized by e ε e ε ... ε g1 ε ε g2 ε e ... ε . . . . , . A= . B= . C = . , , . . ε ... ... gn − 1 ε e . . gn cn . . . . . . c2 c1 ε is a minimal realization. 9.2. About Realization Theory 425 The proof can be found in [105]. The essence of the proof consists in converting the statement of the theorem into the conventional algebra setting by means of the exponential transformation as introduced in §9.2.1, giving the proof there and then returning to the max-plus algebra setting. In the statement of the theorem above, the notion of linear dependence of columns is used. Definition 9.6 Column vectors v1 , . . . , vn are said to be linearly dependent if scalars c1 , . . . , cn , not all ε, and a subset I ∈ {1, . . . , n } exist such that ck vk = ck vk . k ∈I k ∈I If this theorem is applied to the series in (9.10), then the result is A= ε −2 e −3 , B= 1 3 , C= e ε , which is different from (9.11), although both 3-tuples ( A, B , C ) characterize the same series of Markov parameters. Unfortunately, Theorem 9.5 is of limited use. The reason is that it cannot deal with general linear dependences of column vectors. Take as an example g1 = 5 , g2 = 8 , g3 = 11.5 , g4 = 15.5 , g5 = 19.5 , . . . . (9.12) For the corresponding Hankel matrix the following dependence is true: G ·i ⊕ 7G ·i −2 = 4 G ·i −1 , i = 3, 4, . . . , but Theorem 9.5 does not cover this kind of linear dependence. The system characterized by 37 5 A= , B= , C = e 3.5 , −2 4 e however, is a minimal realization of the Markov parameters given in (9.12). The conclusion of this subsection is that, given an arbitrary series of Markov parameters, it is not known how to obtain a minimal state space realization (if it exists). In the next subsection, however, the reader will find a recent development. 9.2.4 More on Minimal Realizations R.A. Cuninghame-Green [50] has recently come up with a promising method to obtain a state space realization from a series of Markov parameters. The following two theorems are used, the proofs of which can be found in [49]. In these theorems, if K is a matrix, then K is the matrix obtained from K by transposition and a change of sign. Theorem 9.7 For a general matrix K , ( K K) ⊗ K = K . The symbol refers to the multiplication of two matrices (or of a matrix and a vector) in which the min-operation rather than the max-operation is used; it will be discussed more extensively in §9.6. The theorem just formulated states that all columns of K are eigenvectors of K K . 
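The identity of Theorem 9.7, written here as (K ⊙ K♮) ⊗ K = K with K♮ the transpose of K with a change of sign and ⊙ the min-plus product, is easy to check numerically. In the sketch below (Python; the helpers maxplus, minplus and conjugate are ours), the assertion states that (K ⊙ K♮) ⊗ K returns K itself, so that every column of K is an eigenvector of K ⊙ K♮ with eigenvalue e.

    import random

    def maxplus(A, B):
        # Max-plus product: (A ⊗ B)_ij = max_k (A_ik + B_kj).
        return [[max(A[i][k] + B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def minplus(A, B):
        # Min-plus product: (A ⊙ B)_ij = min_k (A_ik + B_kj).
        return [[min(A[i][k] + B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def conjugate(K):
        # K♮: transposition and change of sign.
        return [[-K[j][i] for j in range(len(K))] for i in range(len(K[0]))]

    random.seed(0)
    K = [[random.randint(-5, 5) for _ in range(4)] for _ in range(4)]
    D = minplus(K, conjugate(K))
    assert maxplus(D, K) == K      # every column of K is an eigenvector of D = K ⊙ K♮
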
426 Synchronization and Linearity Theorem 9.8 For a given matrix K , consider D, where D = K ⊗ K d and K d is derived from K K by replacing all diagonal elements by ε. Then a column of K is linearly dependent on the other columns, i.e. one column can be written as a linear combination of the other columns in the max-plus algebra sense, if and only if it is identical to the corresponding column of D. The corresponding column of K d then yields the coefficients expressing the linear dependence. This theorem gives a routine method of finding linear dependences among the columns of a given matrix. This linear dependence is to be understood as one column being written as a linear combination of the others. Note that this definition of linear dependence is more restrictive than the definition used in §9.2.3, Definition 9.6. For the realization one forms a Hankel matrix G (≤n+1)(≤n+1) for some n sufficiently large. From Theorem 9.7 we know that the columns of G (≤n+1)(≤n+1) are preserved by the action of G (≤n+1)(≤n+1) (G c )(≤n+1)(≤n+1) . It follows that, if A is the matrix obtained by dropping the first row and last column of G (≤n+1)(≤n+1) (G c )(≤n+1)(≤n+1) , then g2 g1 g2 g3 A ⊗ . = . , . . . . gn gn + 1 i.e. A moves the Markov parameters ‘one position up’. A state space realization is now obtained by A, by B as the first column of G (≤m )(≤m ) and by C = e ε . . . ε . In general, the realization found will not have minimal dimension. In order to reduce the dimension, Theorem 9.8 is used. One searches for column linear dependences, as well as for row linear dependences of A. By simultaneously deleting dependent rows and columns of the same index from A, the state space dimension is reduced. As an example, consider the Markov parameters g1 = 0, g2 = 3, g3 = 6, g4 = 10, g5 = 14, g6 = 18, . . . , and take n = 3. It is easily verified, by following the procedure described above, that 3 e −4 e e , B = 3 , C = e ε ε . A= 6 3 10 7 4 6 Since the second column of A depends linearly on the first one, and the second row depends linearly on the other rows (it is linearly dependent on the last row), the second row and second column can be deleted so as to obtain a state space realization of lower dimension: A= 3 −4 10 4 , B= e 6 , C= e ε . This latter realization turns out to have minimal dimension. It is left as an exercise to show that if started with n = 2 rather than with n = 3, one would have obtained the wrong result. This in spite of the fact that the Hankel matrix G has ‘rank’ 2; for i ≥ 1 we have that (−4)G ·i +2 ⊕ 3 G ·i = G ·i +1 . 9.3. Control of Discrete Event Systems 9.3 427 Control of Discrete Event Systems In this section special instances of nonlinear systems will be described, for which the max-plus setting is still appropriate. The system equations to be considered have the form x (k + 1) = A(u (k ), u (k − 1))x (k ) . (9.13) Matrix A can be controlled by the decision variable u to be defined. In Chapter 1 a decision variable u was encountered also. There it had the function of an input to the system; nodes of the underlying network had to wait for external inputs. In the current setting as expressed by (9.13), u influences the entries of the system matrix A. For a motivation of the system described by (9.13), think of a production planning where the holding times at the nodes are zero and where the standard traveling time from node j to node i is indicated by Ai j . This traveling time can be reduced maximally by an amount c if an extra piece of equipment is used. 
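It is straightforward to check numerically that the reduced triple (A, B, C) above indeed reproduces the series 0, 3, 6, 10, 14, 18, ... through g_i = C ⊗ A^(i-1) ⊗ B. A short sketch (in Python; the helper maxplus is ours and ε is encoded as −∞):

    NEG_INF = float('-inf')                       # epsilon of the max-plus algebra

    def maxplus(A, B):
        # Max-plus product: (A ⊗ B)_ij = max_k (A_ik + B_kj).
        return [[max(A[i][k] + B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    A = [[3, -4],
         [10, 4]]
    B = [[0],                                     # e = 0
         [6]]
    C = [[0, NEG_INF]]

    g, CAk = [], [row[:] for row in C]            # CAk holds C ⊗ A^(i-1)
    for _ in range(6):
        g.append(maxplus(CAk, B)[0][0])           # g_i = C ⊗ A^(i-1) ⊗ B
        CAk = maxplus(CAk, A)
    print(g)                                      # [0, 3, 6, 10, 14, 18]
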
Rather than Ai j , it then becomes def A i j = max ( Ai j − c, 0). It is assumed that only one such a piece of equipment is available and that it can be used only once (i.e. at one arc, connecting two nodes) during each k -step. One can envisage situations in which this piece of equipment could be used a number of times during the same k -step, at different arcs of the network. Although such a generalization can in principle be handled within the context of this section, the analysis becomes rather laborious and such a generalization will therefore not be considered. Suppose we are given a network with two nodes. If no extra piece of equipment were available, the evolution of the state vector x (k ) ∈ R is according to x (k + 1) = Ax (k ) in Rmax , where 31 A= . 42 Boolean variables u i j (k ), i, j = 1, 2, are now introduced to describe the control actions; they are defined subject to u i j (k ) = 0 or = 1. Hence maximally one of the u i j (k ) can be 1, which indicates that the piece of extra equipment has been set in for arc ai j , from node j to node i during the k -th cycle. Thus there are five possibilities; the piece of equipment is not applied or it is applied to one of the arcs corresponding to Ai j , i, j = 1, 2. If c = 2, these possibilities result in the following matrices A: 3 4 1 2 11 42 , , 30 42 , 3 2 1 2 , 31 40 . Formally, we can write x (k + 1) = ◦ A11/u 2 11 ◦ A21/u 2 21 ◦ A12/u 12 ◦ A22/u 2 22 x (k ) . This equation does not take into account the fact that the piece of equipment might not be available at the appropriate node at the right time. For this reason, an extra state variable x 3 will be introduced; x 3 (k ) denotes the epoch of release of the equipment 428 Synchronization and Linearity during the (k − 1)-st cycle. The correct evolution equations then become x 1 (k + 1) x 2 (k + 1) ◦ A11/u 2 11 ◦ A21/u 2 21 = ⊕ 2 x 3 (k ) ◦ A12/u 12 ◦ A22/u 2 22 x 1 (k ) x 2 (k ) ln(u 11 (k )) ⊕ ln(u 21 (k )) ln(u 12 (k )) ⊕ ln(u 22 (k )) x 3 (k ) , (9.14) 2 = A j i ln(u j i (k − 1))x i (k ) , (9.15) i =1 j =1 where we made the convention that ln(1) = e and ln(0) = ε. If (9.15) were substituted into (9.14), then an equation of the form (9.13) would result. If we start at state 0 0 0 for k = 0, then the five possible next states are respectively1 3 1 3 3 3 4 , 4 , 4 , 2 , 4 . 3 1 3 2 3 From these states, new states can be reached again. Thus a tree of states can be found. We will not count states as such if they are linearly dependent on an already existing state. The states will be normalized by adding a same scalar to all components of a state vector, such that the last component becomes zero. (other normalizations are possible, such as for instance setting the least component equal to zero). It turns out that five different normalized states exist. Some trial and error will show that whatever the initial condition is, the evolution will always end up in these five states in a finite number of steps. These states are indicated by n i , i = 1, . . . , 5, and are given, together with the possible follow-ups, in Table 9.1. In the same table the normalization factors are given. If according to this table, n i is mapped to n j with a normalization factor a , then the actual mapping is n i → a ⊗ n j . This table defines a Markov chain the ‘transition matrix’ of which is given below. The j i -th entry equals the normalization factor corresponding to the mapping from n i to n j by means of an appropriate control, if this control exists. 
If it does not exist, then the entry is indicated by ε: ε34εε 3 3 4 4 3 V = 1 ε ε 2 ε . 2 ε ε 3 ε ε23εε Suppose that in Table 9.1 an initial state n i would have been mapped to another state n j twice, with normalization factors a and b respectively, with a < b (which does not occur in this example though). Matrix V should then have contained the smaller of the two factors a and b . Since the controls should be chosen in such a way that the network 1 The first one is the result of applying u = 0 for all i, j ; for the second one, we used u ij 11 = 1 and the other ui j = 0; for the third, fourth and fifth ones, u12 = 1, u12 = 1 and u22 = 1, respectively, where the nonmentioned u-entries remain zero. 9.4. Brownian and Diffusion Decision Processes 429 Table 9.1: Possible transitions Initial state New states according to the five different controls n1 = 0 00 n2 3 n3 1 n2 3 n4 2 n2 3 n2 = 0 10 n2 3 n5 2 n2 3 n1 3 n2 3 n3 = 0 30 n2 4 n2 4 n5 3 n2 4 n1 4 n4 = 1 00 n2 4 n3 2 n2 4 n4 3 n2 4 n5 = 0 20 n2 3 n2 3 n2 3 n2 3 n2 3 operates as fast as possible, this matrix V will be considered in the min-plus algebra; hence ε = +∞. The eigenvalue of this matrix is found by applying Karp’s algorithm; it turns out to be equal to 2.5. There are two critical circuits, namely n1 → n3 → n1 , n2 → n5 → n2 . There are two different periodic solutions to our problem; they are characterized by the two critical circuits. From Table 9.1, it will be clear how to control the network, i.e. where to use this extra piece of equipment, such that the evolution of the state equals one of these periodic solutions. 9.4 Brownian and Diffusion Decision Processes We show the analogy between probability calculus and dynamic programming. In the former area, iterated convolutions of probability laws play a central role; in the latter area, this role is played by the inf-convolution of cost functions. The main analysis tool is the Fourier transform for the former situation, and it is the Fenchel transform for the latter. Quadratic forms, which form a stable set by inf-convolution, correspond to Gaussian laws, which are stable by convolution. Asymptotic theorems for the value function of dynamic programming correspond to the law of large numbers and the central limit theorem. Straight line optimal trajectories correspond to Brownian motion trajectories. The operator v → ∂v/∂ t − (∂v/∂ x )2 , which will be appear as a min-plus linear operator, corresponds to the operator v → ∂v/∂ t + ∂ 2 v/∂ x 2 . The min-plus √ function x 2 /2t corresponds to the Green function (1/ 2π t ) exp(−x 2 /2t ). A diffusion decision process with generator v → ∂v/∂ t − b (x )∂v/∂ x − a (x )(∂v/∂ x )2 corresponds to the diffusion process with generator ∂/∂ t + b (x )∂/∂ x + a (x )∂ 2 v/∂ x 2 . 430 9.4.1 Synchronization and Linearity Inf-Convolutions of Quadratic Forms For m ∈ R and σ ∈ R+ , let Q m ,σ (x ) denote the quadratic form in x defined by Q m ,σ (x ) = x −m σ 1 2 Q m , 0 ( x ) = δm ( x ) = 2 for σ = 0 , 0 +∞ for x = m ; otherwise. These quadratic forms take a zero value at m . Given two mappings f and g from R into R, we define the inf-convolution of f and g [119] as the mapping from R into R (with the convention ∞ − ∞ = ∞ ) defined by z → inf ( f (x ) + g ( y )) . x + y =z It is denoted f ⊗ g . Theorem 9.9 We have Q m 1 ,σ1 ⊗ Q m 2 ,σ2 = Q m 1 +m 2 , √ 2 2 σ 1 +σ 2 . 
This result is the analogue of the (conventional) convolution of Gaussian laws (denoted ∗): 2 2 N (m 1 , σ1 ) ∗ N (m 2 , σ2 ) = N (m 1 + m 2 , σ1 + σ2 ) , where N (m , σ ) denotes the Gaussian law with mean m and standard deviation σ . Therefore, there exists a morphism between the set of quadratic forms endowed with the inf-convolution operator and the set of exponentials of quadratic forms endowed with the convolution operator. Clearly this result can be generalized to the vector case. 9.4.2 Dynamic Programming Given the simplest decision process: x (n + 1) = x (n ) − u (n ) , x 0 given, for x (n ) ∈ R, u (n ) ∈ R, n ∈ N, and the particular additive cost function N −1 c(u (i )) + φ(x ( N )) min u(0),u(1),... ,u( N −1) , i =0 where c and φ are mappings from R into R which are supposed to be convex, lowersemicontinuous in the conventional sense, equal to zero at their minimum and thus nonnegative. Let m denote the abscissa where c achieves its minimum, then min c(·) = c(m ) = 0 . 9.4. Brownian and Diffusion Decision Processes 431 The assumptions retained here are not minimal but they will simplify our discussion. The value function defined by N −1 v(n , x ) = c(u ( p )) + φ(x ( N )) min u(n),... ,u( N −1) x (n ) = x p =n satisfies the dynamic programming equation v(n , x ) = min (c(u ) + v(n + 1, x − u )) , u v( N , x ) = φ(x ) . It can be written using the inf-convolution: v(n , ·) = c ⊗ v(n + 1, ·) , v( N , ·) = φ , that is (with the change of time index p = N − n , and the choice φ = δ0 ), v( p , ·) = c p (·) ⊗ δ0 = c p (·) . This, in words, means that the solution of the dynamic programming equation in this particular case of an ‘independent increment decision process’ is obtained by iterated inf-convolutions of the instantaneous cost function. In a more general case, the instantaneous cost c depends on the initial and the final states of a decision period, namely x (n ) and x (n + 1) (and not only on the state variation u (n ) = x (n + 1) − x (n )). Moreover the dynamics is a general Markovian process, namely, x (n + 1) ∈ (x (n )) (where denotes a set-valued function from R into 2R ). Then the dynamic programming equation becomes v(n , x ) = min (c(x , y ) + v(n + 1, y )) , y ∈ (x ) v( N , x ) = δ0 (x ) , the solution of which can be written, with the same change of time, as v(n , ·) = cn ⊗ δ0 (·) , where the product of two kernels is now defined as [c1 ⊗ c2 ] (x , z ) = min (c1 (x , y ) + c2 ( y , z )) . y ∈ (x ) This more general case is the analogue of the general Markov chain case. In addition to the analogues of the law of large numbers and of the central limit theorem, we will show the analogue of the Brownian motion and of diffusion processes. Before addressing this issue, let us recall, once more, that the role of the Fourier transform in probability theory is played by the Fenchel transform in dynamic programming as it was noticed for the first time in [17]. 432 9.4.3 Synchronization and Linearity Fenchel and Cramer Transforms Let f be a mapping from R → R, supposed to be convex, l.s.c. and proper (i.e. never equal to −∞) and let f : R → R be its Fenchel transform (see Remark 3.36). Then it can be shown that f is convex, l.s.c. and proper. Example 9.10 The function defined by Fe Q m ,σ ( p ) = (1/2) p2 σ 2 + pm is the analogue of the characteristic function of a Gaussian law. The transform Fe behaves as an involution, that is, Fe (Fe ( f )) = f for all convex, proper, l.s.c. functions f . 
As already noticed, the main interest of the Fenchel transform is its ability to convert inf-convolutions into sums, that is, Fe ( f ⊗ g ) = Fe ( f ) + Fe ( g ) . Applying the Fenchel transform to the dynamic programming equation in the case when c depends only on x , we obtain v( N , ·) = Fe φ + N c . Using the fast Fenchel algorithm [31], this formula gives a fast algorithm to solve this particular instance of the dynamic programming equation. Moreover, let us recall that the Fenchel transform is continuous for the epigraph topology, that is, the epigraphs of the transformed functions converge if the epigraphs of the source functions converge for a well chosen topology. We can use, for example, the Hausdorff topology for the epigraphs which are closed convex sets of R2 , but this may be too strong (see [71] and [2] for discussions of these topological aspects). Here we will be more concerned with the analogies between probability and deterministic control. Example 9.11 Let ν : x → ν x . One has [Fe ( ν )]( p ) = δν ( p ). When ν → 0, then δν → δ0 in the epigraph sense, but it does not converge pointwise even if ν → 0 pointwise. Moreover, the pointwise convergence of numerical convex, l.s.c. functions towards a function in the same class implies the convergence of their epigraphs. The Cramer transform is defined by Fe ◦ log ◦L, where L denotes the Laplace transform. Therefore, it transforms the convolutions into inf-convolutions. Thus it is exactly the morphism which we are interested in. Unfortunately, it is only a morphism for a set of functions endowed with one operation, the convolution. It is not a morphism for the sum (the pointwise sum of two functions is not transformed by the Cramer transform into the pointwise min of the transformed functions). Moreover, the Cramer transform convexifies the functions but the inf-convolution is defined on a more general set of functions. Nevertheless the mapping limν →0 logν defines a morphism of algebra between the asymptotics (around zero) of positive real functions of a real variable and the real numbers endowed with the two operations min and plus. Indeed, lim logν (ν a + ν b ) = min(a , b ) , ν →0 logν (ν a ν b ) = a + b . This transformation has been already utilized in §9.2 under a slightly different form. We can now study the analogues of the limit theorems of probability calculus. 9.4. Brownian and Diffusion Decision Processes 9.4.4 433 Law of Large Numbers in Dynamic Programming Suppose we are given two numerical mappings c and φ which are nonnegative, convex, l.s.c. and which are equal to zero at their unique minimum. The first and second order derivatives of a function c are denoted c and c respectively. ˙ ¨ ¨ To simplify the discussion, let us suppose that c ∈ C 2 and |1/c(u )|∞ < ∞ in a neighborhood of the minimum. Let m denote the abscissa where c achieves its minimum (zero) value. Let w N (x ) be the mapping x → v( N , N x ). For the value function, this scaling operation corresponds to the conventional averaging of the sample. Theorem 9.12 (Weak law of large numbers for dynamic programming) Under the foregoing assumptions, we have lim v( N , N x ) = δm (x ) , N →∞ the limit being in the sense of the epigraph convergence. Proof We have v ( N , p ) = φ ( p / N ) + N c( p / N ) , lim φ ( p / N ) = φ (0) = 0 , N →∞ since φ admits a zero minimum by assumption. Moreover, c(0) = 0 for the same reason. Then c( p ) admits a Taylor expansion around 0 of the form pm + O( p2 ). 
Indeed, ˙ c( p) = x o ( p ) + x o ( p )( p − c (x o ( p )) = x o ( p ) = m + O( p ) , ˙ where x o ( p ) denotes the point at which the maximum is achieved in the definition of the Fenchel transform of c. Therefore, v ( N , p ) = pm + O(1/ N ). Then, using the continuity of the Fenchel transform, we obtain lim Fe (v( N , ·)) = Fe ( pm ) = δm . N →∞ 9.4.5 Central Limit Theorem in Dynamic Programming We have the analogue of the central limit theorem √ probability calculus. The value of function, centered and normalized with the scaling N , is asymptotically quadratic. Theorem 9.13 (Central Limit Theorem) Under the foregoing assumptions, we have lim v N , N →∞ √ N (y + N m) = The limit is in the sense of epigraph convergence. 1 c(m ) y 2 . ¨ 2 434 Synchronization and Linearity Proof We make the expansion up to the second order of p → r N ( p ) where r N is the √ mapping y → v N , N ( y + N m ) . But p p r N ( p) = φ √ + N cm √ , N N where cm ( y ) = c( y + m ). Then we have φ (0) = 0 and cm (0) = 0 because the minima of φ and cm are zero. ˙ Let us expand cm up to the second order. We have seen that cm ( p ) = x o ( p ), and ¨ therefore c m ( p ) = x o ( p ). Moreover, we know that x o ( p ) is defined by p −cm (x o ( p )) = ˙ ˙ 0, and therefore 1 − cm (x o ( p ))x o ( p ) = 0, that is, x o ( p ) = 1/cm (x o ( p )). Finally, ¨ ˙ ˙ ¨ r N ( p) = 1 p2 + o(1) . 2 cm (0) ¨ We obtain the result by passing to the limit using the continuity of the epigraph of the Fenchel transform. These results can be extended to the vector case, to the case when c depends on time, etc. 9.4.6 The Brownian Decision Process Let us consider the discrete time decision process ( T / h )− 1 min u i =0 (u (ih ))2 + 2h (x (T )) , x (t + h ) = x (t ) − u (t ) . It satisfies the dynamic programming equation v(t , x ) = min u u2 + v(t + h , x − u ) , 2h v(T , ·) = . The cost function Q 0,√h is therefore the analogue of the increment of Brownian motion on a time step of h . The analogue of the independence of the increments of the Brownian motion is the independence of the instantaneous cost function u 2 / h from the state variable x . Let us make the change of control u = w h in the dynamic programming equation. We obtain h w2 v(t , x ) = min + v(t + h , x − w h ) . w 2 Passing to the limit when h → 0, we obtain the Hamilton-Jacobi-Bellman (HJB) equation ∂v ∂v w2 + min −w + = 0, v(T , ·) = , w ∂t ∂x 2 that is, ∂v 1 ∂v , − ( )2 = 0 , v(T , ·) = ∂t 2 ∂x 9.4. Brownian and Diffusion Decision Processes 435 which is the analogue of the heat equation ∂v 1 ∂2v =0 , + ∂t 2 ∂x2 v(T , ·) = . Therefore, we can see the Brownian decision process as the Sobolev space H 1 (0, T ) T ˙ endowed with the cost function W (ω) = 0 (ω)2 dt for any function ω ∈ H 1 (0, T ). Then the decision problem can be written def MW (x (T )) = min ω ∈ H 1 (0 , T ) (W (ω) + (x (T ; ω))) (9.16) by analogy with probability theory. The function W is the analogue of the Brownian measure, and it can be interpreted as the cost of choosing ω. Then (x (T ; ω)) is the cost of a decision function (x (T ; ·)) once we have chosen ω. But the solution of the Hamilton-Jacobi equation ∂v 1 − ∂t 2 2 ∂v ∂x =0 , v(T , ·) = δ y , is unique [86], and is explicitly given by v(t , x ) = ( y − x )2 , 2( T − t ) t≤T . It can be considered as the min-plus Green kernel of the dynamic programming equation and as the analogue of the Green kernel of the Kolmogorov equation for the Brownian equation, namely 1 ( y − x )2 exp − √ . 
2( T − t ) 2π(T − t ) Therefore, by min-plus linearity, we can derive the solution of min ∂v ∂x 1 ∂v − ∂t 2 2 ,c − v =0 , v(T , ·) = , which is the solution of the control problem v(t , y ) = MW min min c(x s (ω)), (x (T ; ω)) t ≤s ≤ T x (t ) = y , where s denotes a stopping time that we also want to optimize. This cost is clearly the min-plus analogue of T v(t , y ) = E W c(x (s ; ω)) ds + (x (T ; ω)) x (t ) = y . t The solution of the decision problem is v(t , x ) = min min y ( y) + ( y − x )2 ( y − x )2 , min min c( y ) + 2 ( T − t ) t ≤s ≤ T y 2(s − t ) . 436 Synchronization and Linearity This formula is the analogue of v(t , x ) = ( y ) exp − ( y − x )2 2( T − t ) T dy + ds c( y ) exp − t ( y − x )2 2(s − t ) dy . Using the change of time s = T − t , we can summarize this part by the following theorem. Theorem 9.14 We have lim ( Q 0,√h )[s / h] = Q 0,√s , h →0 where [x ] denotes the integer part of x. Moreover, Q 0,√s is the unique solution of ∂Q 1 ∂Q 2 +( ) =0 , ∂s 2 ∂x 9.4.7 s≥0 , Q 0 , 0 = δ0 . Diffusion Decision Process In the previous subsection, the system dynamics was trivial and the instantaneous cost depended on the control only. Let us generalize this situation with a more general instantaneous cost, which will induce more complex optimal trajectories and which is the complete analogue of the diffusion process. We consider the discrete decision process ( T / h )− 1 min u i =0 (u (ih ) − b (ih )h )2 + 2h (σ (ih ))2 (x (T )) , x (t + h ) = x (t ) − u (t ) . It satisfies the dynamic programming equation v(t , x ) = min u (u − b (x )h )2 + v(t + h , x − u ) , 2h σ 2 v(T , ·) = . By the change of control u = w h in the dynamic programming equation and by passing to the limit when h → 0, we obtain the HJB equation defined, for t ≤ T , by ∂v ∂v σ (x )2 − b (x ) − ∂t ∂x 2 ∂v ∂x 2 =0 , v(T , ·) = . This is the HJB equation corresponding to the variational problem T v(t , x ) = min x∈H 1 t 1 2 x −b ˙ σ 2 dt + (x (T )) . (9.17) This HJB equation is the analogue of the Kolmogorov equation ∂v ∂v σ (x )2 ∂ 2 v =0 , + b (x ) + ∂t ∂x 2 ∂x2 v(T , ·) = . It is not necessary that the instantaneous cost be quadratic for the discrete decision process to converge to the diffusion decision process. 9.5. Evolution Equations of General Timed Petri Nets 437 Theorem 9.15 The discrete decision process ( T / h )− 1 ch (u (ih ), x (ih )) + min u (x (T )) , x (t + h ) = x (t ) − u (t ) , i =0 admits the discrete dynamic programming equation v(t , x ) = min (ch (u , x ) + v(t + h , x − u )) , u v(T , ·) = , which converges to the continuous dynamic programming equation ∂v ∂v σ (x )2 − b (x ) − ∂t ∂x 2 ∂v ∂x 2 =0 , v(T , ·) = , as long as ch (x , p ) = b (x ) p + σ (x )2 2 p h + o(h ) , 2 where ch denotes the Fenchel transform of the mapping u → ch (u , x ). The variational problem (9.17) was encountered by researchers in large deviation when they studied differential equations perturbed by a small Brownian noise. For example, we have the following estimate: lim ν log (Pν [x (T ) ∈ (z − →0 ν →0 ,z + = ) | x (0) = y ]) T min x ∈ H 1 (0, T ), x (0)= y , x ( T )= z t 1 2 x −b ˙ σ 2 dt , where Pν denotes the probability law of a diffusion process with drift term b and diffusion term νσ . We conclude this section by summarizing the analogy between probability and dynamic programming in Table 9.2. 
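As a final illustration of this analogy, the dynamic programming recursion of §9.4.6 can be iterated numerically on a grid and compared with the min-plus Green kernel (y − x)²/(2(T − t)). In the sketch below (Python), the choice of h, T, y and of the grid is ours, the final cost δ_y is encoded as 0 at the grid point y and +∞ elsewhere, and the controls are restricted to grid displacements; for the test points chosen, the two printed columns coincide.

    INF = float('inf')
    h, T, y = 0.1, 1.0, 0.0
    dx = 0.05
    xs = [k * dx for k in range(-60, 61)]                  # grid on [-3, 3]

    v = [0.0 if abs(x - y) < dx / 2 else INF for x in xs]  # v(T, .) = delta_y
    steps = 5                                              # go back from t = T to t = T - 5h = 0.5

    for _ in range(steps):
        # v(t, x) = min_u ( u^2 / (2h) + v(t + h, x - u) ), with u = x - x'
        v = [min((xs[i] - xs[j]) ** 2 / (2 * h) + v[j] for j in range(len(xs)))
             for i in range(len(xs))]

    t = T - steps * h
    for x in (-1.0, 0.0, 1.5):
        i = min(range(len(xs)), key=lambda k: abs(xs[k] - x))
        green = (y - x) ** 2 / (2 * (T - t))               # min-plus Green kernel
        print(x, round(v[i], 3), round(green, 3))
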
9.5 Evolution Equations of General Timed Petri Nets The aim of this section is to provide the basic equations that govern the evolution of general Petri nets, when structural consumption conflicts are resolved by a predefined ‘switching’ mechanism. These equations can be viewed as a nonlinear extension of the evolution equations for event graphs (see §2.5). The system of notation concerning timed Petri nets is that introduced in Chapter 2. 9.5.1 FIFO Timed Petri Nets We will adopt the following definition concerning the numbering of tokens traversing a place and the numbering of firings of a transition, that generalizes that of §2.5. 438 Synchronization and Linearity Table 9.2: Analogy between probability and dynamic programming Probability Dynamic programming + min × + N (m , σ ) Q m ,σ minx c(x ) = 0 d F (x ) = 1 EF f = f (x ) d F (x ) Mc f = infx { f (x ) + c(x )} Convolution Inf-convolution Fourier: F (s ) = E F (exp ( j s X )) Fenchel: c( p ) = −Mc (− p X ) d ds log( F )(0) = j x d F (x ) = j m ˙ c(0) = m : c(m ) = minx c(x ) − ds 2 log( F )(0) = (x − m )2 d F (x ) ¨ c(0) = 1/c(m ) ¨ Brownian motion Brownian decision process d2 v→ ∂2 v v→ ∂x2 ∂ v → ∂v + 1 ∂ xv ∂t 22 √ 1/ 2π t exp(−x 2 /(2t )) 2 v→ ∂v ∂t − 1 2 ∂v 2 ∂x Diffusion decision process + b(x ) ∂v + a (x ) ∂ xv ∂x ∂2 Invariance principle ∂v ∂t x 2 /(2t ) Diffusion process v→ ∂v 2 ∂x 2 v→ ∂v ∂t − b (x ) ∂v − a (x ) ∂x ∂v 2 ∂x Min-plus invariance principle 9.5. Evolution Equations of General Timed Petri Nets 439 • The initial tokens of place p are numbered 1, . . . , M p , whereas the n -th token, n > M p , of place p is the (n − M p )-th to enter p after the beginning of the network evolution. Tokens entering p at the same time are numbered arbitrarily. • The n -th firing, n ≥ 1, of transition q is the n -th firing of q to be enabled from the beginning of the network evolution. Firings of q enabled at the same time are numbered arbitrarily (nothing prevents the same transition from being enabled twice at the same epoch). Timing is involved in the evolution of the system through the following two rules. • The n -th initial token of place p , n ≤ M p , is not considered immediately available for downstream transitions. It is put in place p at time z p (n ) (where the function z p is given), and it has then to stay in p for a minimal holding time α p (n ) before enabling the transitions that follow p . Similarly, the n -th token of place p, n > M p (or equivalently the (n − M p )-th to enter p ) can only be taken into account by the transitions that follow p , α p (n ) units of time after its arrival. • Each transition starts firing as soon as it is enabled (we will discuss the problem that arises with conflicts later on). Once transition q is enabled for the n -th time, the tokens that it intends to consume become reserved tokens (they cannot contribute to enabling another transition before being consumed by the firing of transition q ). Once it is enabled, the time for transition q to complete its n -th firing takes βq (n ) units. Once the firing time is completed, the transition completes its firing. This firing completion consists in withdrawing one token from each of the places that precede q (the reserved tokens), and adding one new token into the places that follow q . These two actions are supposed to be simultaneous. FIFO places and transitions have been defined at §2.5.2.2. Example 9.16 To make the following results more tangible, we deal throughout the section with the FIFO Petri net of Figure 9.1. 
This Petri net can be considered as a closed queuing network with two servers and two infinite buffers. The customers served by server 1 (respectively 2) are routed to buffers 1 or 2 (respectively 2 or 1). In order to obtain simple results, we have chosen constant holding times on places and zero firing times (i.e. βq (n ) = 0, ∀q ∈ Q, n ∈ N and α p (n ) = α p ∈ R+ , ∀ p ∈ P , n ∈ N). 9.5.2 Evolution Equations Let U = (U1 , . . . , Un ) be a vector of Rn . The symbol R(U ) denotes the vector R(U ) = Ui (1) , . . . , Ui (n) ∈ Rn , where i : {1, . . . , n } → {1, . . . , n } is a bijection such that Ui ( 1 ) ≤ Ui ( 2 ) . . . ≤ Ui ( n ) . This notation is extended to vectors of RN whenever meaningful. 440 Synchronization and Linearity p4 p1 p2 p3 en ev od d d od p6 p5 ev en p7 p8 Figure 9.1: A FIFO Petri net 9.5.2.1 State Variables Let • x q (n ), q ∈ Q, n ≥ 1, denote the time when transition q starts firing for the n -th time, with the convention that for all q ∈ Q, x q (n ) = ∞ if transition q never fires for the n -th time and x q (0) = −∞; • yq (n ), q ∈ Q, n ≥ 1, denote the time when transition q completes its n -th firing, with the same convention as above; • v p (n ), p ∈ P , n ≥ 1, denote the time when place p receives its n -th token, with the convention that for all p ∈ P , v p (n ) = ∞ if the place never receives its n -th token and v p (0) = −∞; • w p (n ), p ∈ P , n ≥ 1, denote the time when place p releases its n -th token, with the usual convention if the n -th token is never released and w p (0) = −∞. Owing to our conventions, βq (n ) denotes the firing time of q that starts at x q (n ), n ≥ 1, whereas α p (n ) denotes the holding time of the token that enters p at v p (n ), n ≥ 1. If transition q is FIFO, we have the obvious relation yq (n ) = x q (n ) + βq (n ) . (9.18) More generally, ( yq (n ))n≥1 = R (x q (n ) + αq (n ))n≥1 If place p is FIFO, we can write w p (n ) ≥ v p (n ) + α p (n ) , . 9.5. Evolution Equations of General Timed Petri Nets 441 since the token that enters p at time v p (n ) stays there for at least α p (n ) time units. More generally, (w p (n ))n≥1 ≥ R (v p (n ) + α p (n ))n≥1 . 9.5.2.2 Initial Conditions It is assumed that the origin of time and the initial marking have been fixed in such a way that the variables v p (n ) and w p (n ) satisfy the bounds v p (n ) = z p (n ) ≤ 0 ≥ 0, for n = 1, . . . , M p , if M p ≥ 1 ; for n > M p , and w p (n ) ≥ 0 for n ≥ 1. These conventions are natural: they mean that tokens that arrived in place p prior to the initial time and which left p before that initial time are not considered to belong to the initial marking. Similarly, tokens that arrived in p ‘at or after’ the initial time do not belong to the initial marking. 9.5.2.3 Upstream Equations Associated with Transitions We first look at the relationships induced by a transition q due to the places preceding q . We first consider the case without structural consumption conflicts, namely for every place p preceding q , the set of transitions that follow p is reduced to q . No Structural Consumption Conflicts For all p ∈ π(q ), one token leaves p at time w p (n ). Since q is the only transition that can consume the tokens of p , this corresponds to the starting of the n -th firing of q . Hence, x q (n ) = max w p (n ) . ∀ p ∈π(q ) In the FIFO case, the n -th token of place p to become available for enabling q must be the n -th to enter p, so that x q (n ) = max (v p (n ) + α p (n )) . 
p ∈π(q ) More generally, x q (n ) = max U p (n ) , p ∈π(q ) where (U p (n ))n≥1 = R (v p (n ) + α p (n ))n≥1 . General Case Without further specifications on how the conflict is resolved, we can only state the following inequalities: in the FIFO case, x q (n ) ≥ max (v p (n ) + α p (n )) , p ∈π(q ) (9.19) and more generally x q (n ) ≥ max U p (n ) . p ∈π(q ) These inequalities are not very satisfactory, and we will return to this point later on. 442 9.5.2.4 Synchronization and Linearity Downstream Equations Associated with Transitions We now look at the relationships induced by a transition q due to the places following q . We first consider the case without structural supply conflicts, namely, for every place p following q , the set of transitions that precede p is reduced to q (that is, p is only fed by this q ). No Structural Supply Conflicts If no other transition than q can feed the places following q , the token entering place p ∈ σ (q ) with rank ( M p + n ) has been produced by the n -th firing of transition q ; therefore, yq (n ) = v p ( M p + n ) , ∀ p ∈ σ (q ) . In the FIFO case, this leads to the relation x q (n ) + βq (n ) = v p ( M p + n ) , ∀ p ∈ σ (q ) , whereas in the general case R (x q (k ) + βq (k ))k≥1 n = vp (M p + n) , ∀ p ∈ σ (q ) . General Case Without further specifications, we can only state the following inequalities: in the FIFO case, x q (n ) + βq (n ) ≥ v p ( M p + n ) , ∀ p ∈ σ (q ) , whereas, in the general case, R (x q (k ) + βq (k ))k≥1 9.5.2.5 n ≥ vp (M p + n) , ∀ p ∈ σ (q ) . Upstream Equations Associated with Places We now focus on the upstream relationships induced by place p . Consider the sequences { yq (n )}n≥1 , for all q ∈ π( p ). With each of them, associate a point process on the real line, where the points are located at yq (n ). We can look at the arrival process into p as the superimposition of these |π( p )| point processes. With all q ∈ π( p), we associate an integer iq ∈ N representing a number of complete firings of q . If transition q has completed exactly iq firings for all q ∈ π( p ), then place p has received exactly q ∈π( p ) iq tokens. The set of vectors (i ) = (iq )q ∈π( p ) such that the n -th token has entered place p is hence p An = i ∈ N|π( p )| iq = n . q ∈π( p ) p The last token produced by the transition firings specified by some i ∈ An enters p at time maxq ∈π( p ) yq (iq ), where yq (0) = −∞ by convention. Since n tokens have reached p once all the firings specified by i have been completed, one obtains v p (n + M p ) ≤ infp max yq (iq ) . i ∈An q ∈π( p ) (9.20) 9.5. Evolution Equations of General Timed Petri Nets 443 But v p (n + M p ) should be equal to some yt0 (n 0 ) since at least one collection of events p puts n tokens in place p (unless An is empty and v p (n + M p ) = ∞). Hence equality must hold true in (9.20). We then obtain the following final relation: v p (n + M p ) = {i ∈N|π( p)| | inf q∈π( p) iq =n max yq (iq ) , } q ∈π( p ) (9.21) where yq (0) = −∞ by convention. 9.5.2.6 Downstream Equations Associated with Places We now concentrate on the downstream relationships induced by a place p . It is in this type of equations that the structural consumption conflicts associated with general Petri nets become apparent. Consider the sequences {x q (n )}n≥1 , for all q ∈ σ ( p ). With all q ∈ σ ( p ), we associate an integer iq ∈ N representing some number of firing initiations of q . If q has started exactly iq firings for all q ∈ σ ( p ), then exactly q ∈σ ( p ) iq tokens have been withdrawn from p . 
The set of vectors i = (iq )q ∈σ ( p ) such that the n -th token has left place p is hence p Bn = i ∈ N|σ ( p )| iq = n . q ∈σ ( p ) For any i in this set, the last token to leave p leaves at time maxq ∈σ ( p ) x q (iq ). Hence w p (n ) ≤ infp max x q (iq ) . i ∈Bn q ∈σ ( p ) Using a similar reasoning as previously, we obtain the final relation w p (n ) = {i ∈N|σ ( p)| | inf q∈σ ( p) iq =n max x q (iq ) . } q ∈σ ( p ) (9.22) Relations (9.21) and (9.22) exhibit nothing but a superficial symmetry. Indeed, while (9.21) allows one to construct the sequence {v p (n )} from the knowledge of what happens upstream of p and earlier, this is not true at all for (9.22) which only provides some sort of backward property stating that the knowledge of what will happen following p in the future allows one to reconstruct what happens in p now. The reason for this is that the way the conflict is solved is not yet sufficiently precise. We show now one natural way of solving conflicts, which we will call switching. Several other ways are conceivable like competition, which we will also outline. 9.5.2.7 Switching Within this setting, each place that has several transitions downstream receives a switching sequence {ρ p (n )} with values in σ ( p )N . In the same way as the n -th token to enter place p receives a holding time α p (n ), it also receives a route to which it must be switched. This information is given by ρ p (n ), which specifies which transition it must be routed to. In other words, only those tokens such that ρ p (n ) = q should be taken 444 Synchronization and Linearity into account by q ∈ σ ( p ). By doing so, one completely specifies the behavior of the system. For instance, in the FIFO case, one obtains the inequality x q (n ) ≥ w p (η p ,q (n )) , ∀ p ∈ π(q ) , where the switching function η p ,q is defined by m η p ,q (0) = 0, η p ,q (n ) = inf m ≥ 1 1{ρ p (k ) = q } ≥ n , n≥1 . (9.23) k =1 Whenever the behavior of the places upstream of q is specified, one can go further and obtain the desired forward equation, as we will see in the next section. Example 9.17 In our example (see Figure 9.1), the switchings are deterministic. They are chosen as follows: ρ3 (2n ) = 1 , 9.5.2.8 ρ3 (2n + 1) = 5 , ρ7 (2n ) = 5 , ρ7 (2n + 1) = 1 , ∀n ∈ N . Competition The places following p compete for the tokens of p on a First Come First Served (FCFS) basis: within this interpretation, the tokens that have been served in place p can be seen as building up some queue of tokens. Once a transition q following p is enabled except for the condition depending on p , it puts in a request for one token in some FCFS queue of requests. This request is served (and the corresponding transition enabled) as soon as it is at the head of the request line and there is one token in the token queue. 9.5.3 Evolution Equations for Switching In this subsection it is assumed that all places receive some switching. For places with a single downstream transition, this sequence is trivial in the sense that it always routes tokens to this transition. Theorem 9.18 Under the foregoing assumptions, the state variables v p (n ), p ∈ P , n ≥ 1, of a FIFO Petri net satisfy the (nonlinear) recurrence equations v p (n + M p ) = vt (ηt ,q (iq ))αt (ηt ,q (iq ))βq (iq ) , i ∈N|π( p)| q∈π( p) iq =n {q ∈π( p ),t ∈π(q )} (9.24) for n ≥ 1, with the initial condition v p (n ) = z p (n ) for 1 ≤ n ≤ M p , if M p ≥ 1. 9.5. 
Evolution Equations of General Timed Petri Nets 445 Proof In addition to the variables v p (n ), we will make use of the auxiliary variables x q (n ), q ∈ Q, n ≥ 1. Owing to the switching assumptions, Inequality (9.19) can be replaced by the relation x q (n ) = v p ( j p )α p ( j p ) , p ∈π(q ) j p ≥1 jp k =1 n≥1 , 1{ρ p (k )=q }=n or, equivalently, x q (n ) = v p (η p ,q (n ))α p (η p ,q (n )) , n≥1 , (9.25) p ∈π(q ) where we used the switching function η p ,q defined in (9.23), and the FIFO assumption, which implies that the mapping i → v p (i )α p (i ) is nondecreasing. Similarly, using (9.18) in (9.21) yields v p (n + M p ) = x q (iq )βq (iq ) , {i ∈N|π( p)| | q∈π( p) n≥1 . (9.26) iq =n } q ∈π( p ) Equation (9.24) follows immediately from (9.25) and (9.26). Remark 9.19 In the case when the Petri net is not FIFO, Equations (9.25) and (9.26) have to be replaced by x q (n ) = R(v p (m )α p (m ))m ≥1 p ∈π(q ) (η p,q (n)) , n≥1 , and v p (n + M p ) = R(x q (m )βq (m ))m ≥1 {i ∈N|π( p)| | q∈π( p) iq =n } q ∈π( p ) (iq ) , n≥1 , respectively. Remark 9.20 In (9.24), we can get rid of the firing times βq (n ) by changing the holding times α p (η p ,q (n )), ∀ p ∈ π(q ), into α p (η p ,q (n ))βq (n ). Thus we obtain an equivalent net with βq (n ) = 0 and α p (n ) > 0, q ∈ Q, p ∈ P , n ≥ 1, where the equivalence means that the entrance times are the same in both systems. 446 Synchronization and Linearity Example 9.21 In our example, we obtain v1 (n + 1) = (1v3 (2n 3 ) ⊕ 1v7 (2n 7 + 1)) , v1 (1) = 0 , v2 (n ) = v1 (n ) ⊕ v4 (n ) , v3 (n ) = 2v2 (n ) , v4 (n + 1) = 2v2 (n ) , (1v3 (2n 3 + 1) ⊕ 1v7 (2n 7 )) , v5 (n + 1) = v4 (1) = 0 , v5 (1) = 0 , n 3 +n 7 =n n 3 +n 7 =n v6 (n ) = v5 (n ) ⊕ v8 (n ) , v7 (n ) = 3v6 (n ) , v8 (n + 1) = 3v6 (n ) , 9.5.4 v8 (1) = 0 . Integration of the Recursive Equations We assume that the Petri net is FIFO. We use Remark 9.20 to assume (without loss of generality) that βq (n ) = 0, q ∈ Q, n ≥ 1. Finally, we assume that the switching is given as well as the holding times in the places and that in every circuit of the Petri net there is a place p with 0 < α p (n ) < ∞, n ≥ 1. In what follows we will use weighted trees where the weights are associated with the nodes. We call the weight of a directed path the sum of the weights of all its nodes but its source. A node N1 is said to be deeper than a node N2 if we can find a directed path from N2 to N1 . Finally, the depth of a tree is the length of its longest directed path. Definition 9.22 (Evolution tree) Let ( p , n ) ∈ P × N. An evolution tree A associated with ( p, n ) is a tree with root ( p , n ) defined recursively as follows. • If n ≤ M p , then A is reduced to a single node ( p , n ) with weight α p (n ) + z p (n ). • If n > M p , choose one i ∈ N|π( p )| satisfying t ∈π( p ) iq = n − M p . Then A is the tree with root ( p , n ) and with |π(π( p ))| subtrees being evolution trees associated with the nodes (q , ηq ,t (iq )), t ∈ π( p ), q ∈ π(q ). The root ( p , n ) is given a weight α p (n ). The set of all the evolution trees of the pair ( p , n ) will be denoted E ( p , n ). In Equation (9.24), we can replace the variables vt (ηt ,q (n )) by using Equation (9.24) once more. We obtain v p (n + M p ) = i ∈N|π( p)| q∈π( p) iq =n {q ∈π( p ),t ∈π(q )} vr (ηr,s ( js ))αr (ηr,s ( js ))αt (ηt ,q (it )) . j ∈N|π(t )| s ∈π(t ) js =ηt ,q (it )− Mt {s ∈π(t ),r ∈π(s )} 9.6. 
If we use the distributivity of ⊕ with respect to ∧ (see (4.95)), this equality becomes

\[
v_p(n+M_p) = \bigwedge_{i\in I} \; \bigwedge_{j^{tq}\in J_{tq}} \;
\bigoplus_{q\in\pi(p)} \bigoplus_{t\in\pi(q)} \bigoplus_{s\in\pi(t)} \bigoplus_{r\in\pi(s)}
v_r(\eta_{rs}(j^{tq}_s))\, \alpha_r(\eta_{rs}(j^{tq}_s))\, \alpha_t(\eta_{tq}(i_q)) ,
\]

where

\[
I \stackrel{\mathrm{def}}{=} \Bigl\{\, i \in \mathbb{N}^{|\pi(p)|} \;\Big|\; \sum_{q\in\pi(p)} i_q = n \,\Bigr\} , \qquad
J_{tq} \stackrel{\mathrm{def}}{=} \Bigl\{\, j^{tq} \in \mathbb{N}^{|\pi(t)|} \;\Big|\; \sum_{s\in\pi(t)} j^{tq}_s = \eta_{tq}(i_q) - M_t \,\Bigr\} .
\]

This equation represents the first step in the 'integration' of the recurrence equations. Indeed, we obtain a tree of depth 2 from the root (p, n + M_p). If we continue to develop this equation, we obtain trees with increasing depths. We stop when each path ends with a leaf, namely, when it terminates with a node (q, m) with m ≤ M_q. We eventually obtain the integration of Equation (9.24):

\[
v_p(n) = \inf_{A\in E(p,n)} C(A) , \quad n \ge M_p , \qquad \text{with} \quad C(A) = \sup_{T\in\mathcal{T}(A)} w(T) .
\]

The quantity C(A) is the weight of tree A, T(A) is the set of all the directed paths from the root to any leaf of the tree A, and w(T) is the weight of the directed path T (i.e. the sum of the weights of all its nodes except its root).

Remark 9.23 The set E(p, n) may contain infinite trees; thus E(p, n) is not constructible, and this transformation of the recursive equations does not obviously preserve the 'constructive' character of these equations. However, it is useful for preliminary results. The reader is referred to [7], where this issue is further analyzed.

9.6 Min-Max Systems

In this section we will be concerned with systems whose evolution is determined by three rather than two different operations, namely addition, maximization and minimization. Because these operations occur simultaneously, a different notation for max and min is necessary: ⊕ is reserved for max, and ∧ will denote min. The most general system to be considered is of the form

\[
\begin{aligned}
x(k+1) &= A_1 \otimes x(k) \oplus B_1 \otimes y(k) \oplus C_1 \otimes v(k) \oplus D_1 \otimes w(k) , && (9.27)\\
y(k+1) &= A_2\, x(k) \wedge B_2\, y(k) \wedge C_2\, v(k) \wedge D_2\, w(k) , && (9.28)\\
v(k)   &= A_3 \otimes x(k) \oplus B_3 \otimes y(k) \oplus C_3 \otimes v(k) \oplus D_3 \otimes w(k) , && (9.29)\\
w(k)   &= A_4\, x(k) \wedge B_4\, y(k) \wedge C_4\, v(k) \wedge D_4\, w(k) . && (9.30)
\end{aligned}
\]

The notation A_2 x(k), etc., refers to the multiplication of two matrices (or of a matrix and a vector) in which the operation ∧ plays the role played by ⊕ in the ⊗-product (see §6.6.1 and §9.2.4). The expressions a ⊗ b and a b are identical if at least one of a and b is a scalar. The operation ⊕ has the neutral element ε, whereas ∧ has the neutral element ⊤. The following convention, in accordance with (5.7) and (5.8), is made:

\[
\top \otimes \varepsilon = \varepsilon , \qquad \varepsilon\, \top = \top ,
\]

the latter product being the ∧-based one. In analogy with conventional system theory, system (9.27)–(9.30) is called a descriptor system. It is assumed that the vectors x(k), y(k), v(k) and w(k) are, respectively, n-, m-, p-, and q-dimensional. The matrices A_l, B_l, C_l and D_l, l = 1, ..., 4, have appropriate dimensions. The elements of the matrices with an odd index are either finite or ε, and the elements of the matrices with an even index are either finite or ⊤. Equations (9.29) and (9.30) are implicit equations in v(k) and w(k), respectively. It is assumed that the precedence graph G(E), where E is the matrix

\[
E = \begin{pmatrix} C_3 & C_4 \\ D_3 & D_4 \end{pmatrix} ,
\]

contains neither circuits nor loops. For later reference this condition is called Condition C1:

Condition C1 The graph G(E) contains neither circuits nor loops.
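As an added illustration (not part of the original text), the two matrix products entering (9.27)–(9.30) can be sketched as follows, with ε = −∞ and ⊤ = +∞. The small matrices at the end are purely illustrative; the convention ⊤ ⊗ ε = ε, needed when ε and ⊤ meet, is not handled by this sketch.

```python
import math

EPS = -math.inf   # epsilon, neutral element of "max" (the oplus operation)
TOP = math.inf    # top, neutral element of "min" (the wedge operation)

def maxplus_mat_vec(A, x):
    """(A (x) x)_i = max_j (a_ij + x_j), with eps = -inf."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]

def minplus_mat_vec(A, x):
    """Dual product: i-th entry = min_j (a_ij + x_j), with top = +inf."""
    return [min(a + xj for a, xj in zip(row, x)) for row in A]

def elementwise_max(u, v):
    return [max(a, b) for a, b in zip(u, v)]   # oplus on vectors

def elementwise_min(u, v):
    return [min(a, b) for a, b in zip(u, v)]   # wedge on vectors

# Illustrative data (not the book's): one explicit step of (9.27)-(9.28) for a system
# without v and w parts, i.e. x(k+1) = A1 (x) x(k) (+) B1 (x) y(k),
#                             y(k+1) = A2 x(k) /\ B2 y(k).
A1 = [[2.0, EPS], [2.0, 3.0]]
B1 = [[0.0], [0.0]]
A2 = [[5.0, 3.0]]
B2 = [[TOP]]
x, y = [0.0, 0.0], [0.0]
x_next = elementwise_max(maxplus_mat_vec(A1, x), maxplus_mat_vec(B1, y))
y_next = elementwise_min(minplus_mat_vec(A2, x), minplus_mat_vec(B2, y))
print(x_next, y_next)   # [2.0, 3.0] [3.0]
```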
Because of this condition, a finite number of repeated substitutions of the whole right-hand side of (9.29) and (9.30) into these same equations leads to, respectively,

\[
\begin{aligned}
v(k) &= C_3^{*} \otimes \bigl( A_3 \otimes x(k) \oplus B_3 \otimes y(k) \oplus D_3 \otimes w(k) \bigr) ,\\
w(k) &= D_4^{*} \bigl( A_4\, x(k) \wedge B_4\, y(k) \wedge C_4\, v(k) \bigr) ,
\end{aligned}
\]

where

\[
C_3^{*} = e \oplus C_3 \oplus C_3^{2} \oplus \cdots , \qquad
D_4^{*} = e \wedge D_4 \wedge D_4^{2} \wedge \cdots .
\]

In these equations, the matrix product (which is used in the computation of the powers of the matrices) must be understood as being the ⊗-product, respectively the ∧-based product, when used in conjunction with ⊕, respectively ∧. Similarly, the symbol e denotes the identity matrix in R_max, respectively R_min. Condition C1 is sufficient, but not necessary, for C_3^* and D_4^* to exist in the expressions above. Now the equations in v(k) and w(k) can be solved in a suitable order, and the solutions can be expressed in terms of x(k) and y(k). These solutions are written symbolically as

\[
v(k) = f_1(x(k), y(k)) , \qquad w(k) = f_2(x(k), y(k)) .
\]

If these equations are substituted into (9.27) and (9.28), then the new expressions for x(k+1) and y(k+1) will show a finite nesting of max- and min-operations. For later reference, Equations (9.27)–(9.30), which define a mapping from R^{n+m+p+q} to itself, will symbolically be denoted M. Similarly, the mapping defined by the corresponding nested equations is denoted M̃ (M̃ maps R^{n+m} to itself). Hence

\[
\begin{pmatrix} x(k+1)\\ y(k+1)\\ v(k)\\ w(k) \end{pmatrix}
= M \begin{pmatrix} x(k)\\ y(k)\\ v(k)\\ w(k) \end{pmatrix} ,
\qquad
\begin{pmatrix} x(k+1)\\ y(k+1) \end{pmatrix}
= \widetilde{M} \begin{pmatrix} x(k)\\ y(k) \end{pmatrix} .
\tag{9.31}
\]

9.6.1 General Timed Petri Nets and Descriptor Systems

Consider the network depicted in Figure 9.2. Each of the three nodes performs activities.

Figure 9.2: A network (nodes q1, q2, q3, with recycling times τ1, τ2, τ3)

The loops around these nodes, with time durations τ1, τ2 and τ3, refer to processing or recycling times of one activity at the respective nodes. All other time durations are assumed to be zero. Node q1 delivers products to nodes q2 and q3 simultaneously. Node q3 starts processing on the first incoming product. To start an activity, each node must have delivered its product(s) of the previous activity to its destination node(s). If the destination node is q2, its buffer, indicated by a rectangle in the figure, can store one incoming item (while q2 works on the present activity). Hence, if this buffer is full, node q1 cannot yet deliver a product and must wait until this buffer becomes empty. Similarly, there is a buffer just before node q3, which can contain at most two incoming items. If each buffer contains one token initially, one may be tempted to model the succession of firing times as follows:

\[
\begin{aligned}
x_1(k+1) &= \max(x_1(k)+\tau_1,\; x_2(k),\; x_3(k)) ,\\
x_2(k+1) &= \max(x_1(k)+\tau_1,\; x_2(k)+\tau_2,\; x_3(k)) ,\\
x_3(k+1) &= \max(\min(x_1(k)+\tau_1,\; x_2(k)+\tau_2),\; x_3(k)+\tau_3) ,
\end{aligned}
\tag{9.32}
\]

where the quantities x_i(k), k = 1, 2, ..., are the successive firing times of node q_i. This model can be rewritten in the form (9.27)–(9.30) by adding w(k) = min(x_1(k), x_2(k)) to (9.32) and by replacing the appropriate part in the last of the equations of (9.32) by w(k). Indeed, node q3 processes whichever k-th product, of q1 or of q2, arrives first. The last arriving product at q3, however, is not processed at all according to (9.32): it apparently leaves the system in some mysterious way. There is a discrepancy between the problem statement and its model (9.32).
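As an added illustration (not part of the original text), recursion (9.32) is easy to simulate directly; the processing times τ1 = 1, τ2 = 2, τ3 = 3 used below are illustrative values, not values taken from the text.

```python
# A minimal sketch (values illustrative) of the min-max recursion (9.32):
#   x1(k+1) = max(x1(k)+tau1, x2(k), x3(k))
#   x2(k+1) = max(x1(k)+tau1, x2(k)+tau2, x3(k))
#   x3(k+1) = max(min(x1(k)+tau1, x2(k)+tau2), x3(k)+tau3)

tau1, tau2, tau3 = 1.0, 2.0, 3.0   # illustrative processing/recycling times

def step_932(x):
    x1, x2, x3 = x
    return (max(x1 + tau1, x2, x3),
            max(x1 + tau1, x2 + tau2, x3),
            max(min(x1 + tau1, x2 + tau2), x3 + tau3))

x = (0.0, 0.0, 0.0)
for k in range(6):
    print(k, x)
    x = step_932(x)
# For these illustrative values, after a short transient the differences x(k+1) - x(k)
# settle down to the constant vector (3, 3, 3): the slow recycling of q3 dominates.
```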
In order to model the processing of the last arriving product as well, one can introduce a fictive node q4, which is actually node q3 again, and which takes care of the last arriving of the two products coming from q1 and q2. If this fictive node has firing times x_4(k), then the new model becomes

\[
\begin{aligned}
x_1(k+1) &= \max(x_1(k)+\tau_1,\; x_2(k),\; x_4(k)) ,\\
x_2(k+1) &= \max(x_1(k)+\tau_1,\; x_2(k)+\tau_2,\; x_4(k)) ,\\
x_3(k+1) &= \max(\min(x_1(k)+\tau_1,\; x_2(k)+\tau_2),\; x_4(k)+\tau_3) ,\\
x_4(k+1) &= \max(\max(x_1(k)+\tau_1,\; x_2(k)+\tau_2),\; x_3(k+1)+\tau_3) \\
         &= \max(x_1(k)+\tau_1,\; x_2(k)+\tau_2,\; x_4(k)+2\tau_3,\; \min(x_1(k)+\tau_1+\tau_3,\; x_2(k)+\tau_2+\tau_3)) .
\end{aligned}
\tag{9.33}
\]

Model (9.33) assumes that the buffer just before q3 must be emptied before a new cycle (k → k+1) can be started. Note that x_3 does not appear on the right-hand side anymore and therefore the equation for x_3(k+1) can be disregarded. It is obvious that this model can also be rewritten in the form (9.27)–(9.30). Though model (9.33) does not throw away half-finished products, node q3 still might not always take the product which arrives first. Node q3 and its image node q4 process the batch of the two arriving k-th products (from q1 and q2) according to first arrival. If the (k+1)-st product of q1, say, arrives before the k-th product of q2, it has to wait until this last k-th product has arrived.

Yet another remark with respect to (9.33) must be made. According to (9.33), nodes q1 and q2 can start another cycle only after q4 has started its current activity. However, the performance of the network can be increased if x_4(k) in either the first or the second equation of (9.33) is replaced by x_3(k), depending on whether x_1(k)+τ_1 < x_2(k)+τ_2 or not. Such a conditional dependence can neither be expressed in terms of the operations min, max and +, nor can it be shown graphically by means of a Petri net as introduced in this book. This dependence can be expressed in Petri nets in which so-called inhibitor arcs are allowed. The reader is referred to [1] about such arcs.

One can enlarge the batch size from which node q3 takes its products according to the FIFO priority rule. If, for instance, one introduces two fictive nodes, one for q1 and one for q2, and another pair of fictive nodes, one for q3 and one for q4, then one can construct a model which has a batch size of four. The original products numbered k and k+1 coming from q1 and the original products numbered k and k+1 coming from q2 are processed by q3, or by one of its images, according to FIFO. The next batch will then consist of the four original products numbered k+2 and k+3 coming from both q1 or its image and q2 or its image. The corresponding model will not be written down explicitly; its (eight) scalar equations become rather unwieldy expressions with nested forms of the operations max and min.

9.6.2 Existence of Periodic Behavior

In the following definition, the symbols M and M̃ are those of (9.31).

Definition 9.24 A scalar λ, ε ≤ λ ≤ ⊤, is called an eigenvalue of the mapping M, respectively M̃, if a vector (x; y; v; w), respectively (x; y), exists, where either x or y has at least one finite element, such that

\[
\begin{pmatrix} \lambda\otimes x\\ \lambda\otimes y\\ v\\ w \end{pmatrix}
= M \begin{pmatrix} x\\ y\\ v\\ w \end{pmatrix} ,
\qquad\text{respectively}\qquad
\begin{pmatrix} \lambda\otimes x\\ \lambda\otimes y \end{pmatrix}
= \widetilde{M} \begin{pmatrix} x\\ y \end{pmatrix} .
\]

Such a vector is called an eigenvector of M, respectively M̃. It will be clear that, provided Condition C1 holds (see §9.6), an eigenvalue of M is also an eigenvalue of M̃, and vice versa.
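As an added illustration (not part of the original text), the eigenvalue relation of Definition 9.24 for the reduced mapping M̃ is easy to test numerically: (λ, a) is an eigenpair precisely when M̃(a) − a is the constant vector (λ, ..., λ). The two-dimensional map used below is purely illustrative.

```python
# A minimal sketch: check the eigenvalue/eigenvector relation of Definition 9.24
# for the reduced mapping, i.e. whether M(a) - a equals (lambda, ..., lambda).

def is_eigenpair(M, lam, a, tol=1e-9):
    image = M(a)
    return all(abs((mi - ai) - lam) < tol for mi, ai in zip(image, a))

# Illustrative two-dimensional min-max map (not an example from the book):
#   x' = max(1 + x, 2 + y),  y' = min(3 + x, 1 + y).
M = lambda a: (max(1 + a[0], 2 + a[1]), min(3 + a[0], 1 + a[1]))

print(is_eigenpair(M, 1.0, (1.0, 0.0)))   # True:  M(1, 0) = (2, 1) = 1 + (1, 0)
print(is_eigenpair(M, 2.0, (1.0, 0.0)))   # False
```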
A motivation to study eigenvectors is that the system has a very regular behavior if the initial condition coincides with an eigenvector. In fact, the firing times of the (k+1)-st activities then take place exactly λ time units later than the firing times of the k-th activities. Conditions will be given under which the eigenvalue and a corresponding eigenvector exist.

System (9.27)–(9.30) can be written as

\[
\begin{aligned}
\begin{pmatrix} x(k+1)\\ v(k) \end{pmatrix}
&= \begin{pmatrix} A_1 & C_1\\ A_3 & C_3 \end{pmatrix} \otimes \begin{pmatrix} x(k)\\ v(k) \end{pmatrix}
\;\oplus\; \begin{pmatrix} B_1 & D_1\\ B_3 & D_3 \end{pmatrix} \otimes \begin{pmatrix} y(k)\\ w(k) \end{pmatrix} ,\\[1mm]
\begin{pmatrix} y(k+1)\\ w(k) \end{pmatrix}
&= \begin{pmatrix} B_2 & D_2\\ B_4 & D_4 \end{pmatrix} \begin{pmatrix} y(k)\\ w(k) \end{pmatrix}
\;\wedge\; \begin{pmatrix} A_2 & C_2\\ A_4 & C_4 \end{pmatrix} \begin{pmatrix} x(k)\\ v(k) \end{pmatrix} ,
\end{aligned}
\]

the two 'autonomous' equations of which are

\[
\begin{pmatrix} x(k+1)\\ v(k) \end{pmatrix}
= \begin{pmatrix} A_1 & C_1\\ A_3 & C_3 \end{pmatrix} \otimes \begin{pmatrix} x(k)\\ v(k) \end{pmatrix} ,
\tag{9.34}
\]

\[
\begin{pmatrix} y(k+1)\\ w(k) \end{pmatrix}
= \begin{pmatrix} B_2 & D_2\\ B_4 & D_4 \end{pmatrix} \begin{pmatrix} y(k)\\ w(k) \end{pmatrix} .
\tag{9.35}
\]

These two sets of autonomous equations can be considered as two subsystems of (9.27)–(9.30), connected by means of the matrices

\[
\begin{pmatrix} B_1 & D_1\\ B_3 & D_3 \end{pmatrix} , \qquad
\begin{pmatrix} A_2 & C_2\\ A_4 & C_4 \end{pmatrix} .
\tag{9.36}
\]

Condition C2 The first matrix in (9.36) is not identically ε and the second one is not identically ⊤.

This amounts to saying that the two connections are actual. If Condition C1 is satisfied, then v(k) can be solved from (9.34) and subsequently substituted into the right-hand side of (9.34):

\[
x(k+1) = (A_1 \oplus C_1 \otimes C_3^{*} \otimes A_3) \otimes x(k) .
\tag{9.37}
\]

Similarly, we obtain

\[
y(k+1) = (B_2 \wedge D_2\, D_4^{*}\, B_4)\, y(k) .
\tag{9.38}
\]

Condition C3 The transition matrices of (9.37) and (9.38) are irreducible.

If Conditions C1 and C3 hold, then the matrices which govern the evolution of the systems in (9.37) and (9.38) have unique eigenvalues, denoted λ_max and λ_min, respectively. The existence and uniqueness of these eigenvalues is a direct consequence of the theory of Chapter 3. Now the following theorem, proved in [102], holds.

Theorem 9.25 Assume Conditions C1, C2 and C3 are fulfilled. The operator M̃ has an eigenvalue λ and a corresponding eigenvector (x; y), all of whose components are finite, i.e.

\[
\lambda \otimes \begin{pmatrix} x\\ y \end{pmatrix} = \widetilde{M} \begin{pmatrix} x\\ y \end{pmatrix} ,
\tag{9.39}
\]

if and only if λ_max ≤ λ_min. Under these conditions, λ is unique and satisfies λ_max ≤ λ ≤ λ_min.

The condition that the components of the eigenvector must all be finite is essential for the statement of this theorem to hold. As a counterexample, consider

\[
x(k+1) = 2\, x(k) \oplus 3\, y(k) , \qquad y(k+1) = 4\, x(k) \wedge 5\, y(k) .
\]

The unique eigenvalue which falls within the scope of the above theorem is λ = 3.5, with corresponding eigenvector (0.5, 1)ᵀ. However, λ = λ_max = 2 is also an eigenvalue, with eigenvector (e, ε)ᵀ; similarly, λ = λ_min = 5 is an eigenvalue, with eigenvector (⊤, e)ᵀ.

In Chapter 3 we saw that, within the max-plus algebra setting, the evolution of a linear system such as (9.37) converges in a finite number of steps to a periodic behavior, the period being related to the length(s) of the critical circuit(s). Such a property has not (yet) been shown for systems within the min-max algebra setting, though simulations do point in this direction. Procedure 1, to be presented in the next subsection, is based on this observation.

9.6.3 Numerical Procedures for the Eigenvalue

Three numerical procedures for the calculation of the eigenvalue and the corresponding eigenvector of M̃ will be discussed briefly by means of examples. Of course, these procedures can also be applied to systems in R_max only.

Procedure 1 Consider (9.27) and (9.28) with

\[
A_1 = \begin{pmatrix} \varepsilon & 1 & \varepsilon\\ \varepsilon & e & 1\\ 2 & 1 & \varepsilon \end{pmatrix} , \qquad
B_1 = \begin{pmatrix} 3 & 3 & \varepsilon\\ 3 & \varepsilon & \varepsilon\\ \varepsilon & \varepsilon & 1 \end{pmatrix} ,
\]

with A_2 and B_2 two further given 3 × 3 matrices, and with C_3 = C_4 = (ε), B_3 = B_4 = (⊤). The evolution of this system will be studied by starting from an arbitrary initial vector.
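Before following the numbers of this example, here is a minimal sketch (added here; it is not the book's code) of the iteration-and-detection scheme on which Procedure 1 rests: iterate the map and stop as soon as the current state equals an earlier state plus a constant, exactly as described in the lines that follow. It is demonstrated on the small two-dimensional counterexample system given after Theorem 9.25 above, for which it recovers λ = 3.5; the function names and the tolerance are our own choices.

```python
# A minimal sketch of Procedure 1 (power-type iteration with dependence detection),
# illustrated on the two-dimensional min-max system used as a counterexample above:
#   x(k+1) = max(2 + x(k), 3 + y(k)),   y(k+1) = min(4 + x(k), 5 + y(k)).

def step(v):
    x, y = v
    return (max(2 + x, 3 + y), min(4 + x, 5 + y))

def procedure1(step, v0, max_iter=1000, tol=1e-9):
    """Iterate v(k+1) = M(v(k)) until v(k) = c (x) v(l) for some earlier l,
    i.e. until v(k) - v(l) is a constant vector; then return the eigenvalue
    lambda = c / (k - l) and the eigenvector (1/(k-l)) * sum_{j=l}^{k-1} v(j)
    (conventional addition and division, as in the text)."""
    traj = [tuple(v0)]
    for k in range(1, max_iter + 1):
        v = step(traj[-1])
        for l, u in enumerate(traj):
            diffs = [vi - ui for vi, ui in zip(v, u)]
            if max(diffs) - min(diffs) < tol:          # v = c (x) u with c = diffs[0]
                lam = diffs[0] / (k - l)
                eigvec = [sum(col) / (k - l) for col in zip(*traj[l:k])]
                return lam, eigvec
        traj.append(v)
    raise RuntimeError("no periodic regime detected")

lam, vec = procedure1(step, (0.0, 0.0))
print(lam, vec)   # expected: 3.5 and an eigenvector such as (1.5, 2.0) = 1 (x) (0.5, 1)
```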
If (x(0); y(0)) = (1, 2, 3, 4, 5, 6)ᵀ, then

\[
\begin{pmatrix} x(0)\\ y(0) \end{pmatrix} = \begin{pmatrix} 1\\2\\3\\4\\5\\6 \end{pmatrix} , \quad
\begin{pmatrix} x(1)\\ y(1) \end{pmatrix} = \begin{pmatrix} 8\\7\\7\\6\\5\\5 \end{pmatrix} , \quad \ldots ,
\]
\[
\begin{pmatrix} x(12)\\ y(12) \end{pmatrix} = \begin{pmatrix} 38\\37\\37\\37\\37\\37 \end{pmatrix} ,\;
\begin{pmatrix} x(13)\\ y(13) \end{pmatrix} = \begin{pmatrix} 40\\40\\40\\40\\40\\40 \end{pmatrix} ,\;
\begin{pmatrix} x(14)\\ y(14) \end{pmatrix} = \begin{pmatrix} 43\\43\\42\\43\\43\\43 \end{pmatrix} ,\;
\begin{pmatrix} x(15)\\ y(15) \end{pmatrix} = \begin{pmatrix} 46\\46\\45\\45\\46\\45 \end{pmatrix} ,\;
\begin{pmatrix} x(16)\\ y(16) \end{pmatrix} = \begin{pmatrix} 49\\48\\48\\48\\49\\49 \end{pmatrix} ,\;
\begin{pmatrix} x(17)\\ y(17) \end{pmatrix} = \begin{pmatrix} 52\\51\\51\\51\\51\\51 \end{pmatrix} ,\; \ldots .
\]

This evolution is continued until (x(k); y(k)) becomes linearly dependent on one of the previous states (x(l); y(l)), l = 1, ..., k−1. For this example, this occurs for k = 17:

\[
\begin{pmatrix} x(17)\\ y(17) \end{pmatrix} = 14 \otimes \begin{pmatrix} x(12)\\ y(12) \end{pmatrix} .
\]

It is now claimed that λ = 14/(17 − 12) = 14/5 and that

\[
\frac{1}{5} \sum_{j=12}^{16} \begin{pmatrix} x(j)\\ y(j) \end{pmatrix}
\]

is the eigenvector. Note that in this expression for the eigenvector the conventional operations addition and division occur. These are nonlinear operations within the min-max algebra! For the example, the eigenvector thus becomes

\[
\frac{1}{5} \begin{pmatrix} 216\\ 214\\ 212\\ 213\\ 215\\ 213 \end{pmatrix} .
\]

It can be verified by means of substitution that the quantities thus obtained are indeed the eigenvalue and an eigenvector. No general proof exists, however, of the fact that this method always yields the correct answers. If the same method is used for systems in the max-plus algebra only, it is known that it does not always give the correct results. In situations where it does not, a slightly more complicated algorithm exists which does give the correct results (see [28]).

Procedure 2 Consider (9.27)–(9.30) with sizes n = 2, m = 0, p = 0, q = 1. The matrices concerned are given by

\[
A_1 = \begin{pmatrix} 2 & \varepsilon\\ 2 & 3 \end{pmatrix} , \quad
D_1 = \begin{pmatrix} e\\ e \end{pmatrix} , \quad
A_4 = \begin{pmatrix} 5 & 3 \end{pmatrix} , \quad
D_4 = (\top) .
\]

If the exponential approach of §9.2 is applied to the definition of the eigenvalue given in Definition 9.24, we obtain

\[
\begin{aligned}
z^{\lambda + x_1} &= z^{2 + x_1} + z^{w_1} , && (9.40)\\
z^{\lambda + x_2} &= z^{2 + x_1} + z^{3 + x_2} + z^{w_1} , && (9.41)\\
z^{-w_1} &= z^{-5 - x_1} + z^{-3 - x_2} . && (9.42)
\end{aligned}
\]

The quantities z^{x_1} and z^{x_2} can be solved from (9.40) and (9.41), and expressed in z^{w_1}. These solutions can be substituted into (9.42), which yields

\[
z^{-w_1} = z^{-w_1} z^{-5} \bigl( z^{\lambda} - z^{2} \bigr)
+ \frac{z^{-w_1} z^{-3} \bigl( z^{\lambda} - z^{3} \bigr)}{z^{2} \bigl( z^{\lambda} - z^{2} \bigr)^{-1} + 1} .
\]

Dividing this expression by z^{-w_1} and after some rearranging, we obtain

\[
z^{2\lambda+3} + z^{2\lambda-5} + z^{-1} + z^{8}
= z^{0} + z^{2} + 2\, z^{\lambda-3} + z^{\lambda+6} + z^{\lambda+5} .
\tag{9.43}
\]

The essence of this rearrangement is that all the exponential terms have been moved across the equality symbol in such a way that only positive coefficients remain. Equation (9.43) must remain valid as z → ∞. Hence λ must satisfy

\[
\max(2\lambda+3,\; 2\lambda-5,\; -1,\; 8) = \max(0,\; 2,\; \lambda-3,\; \lambda+6,\; \lambda+5) .
\]

This equation is most easily solved graphically. The result is λ = 3, and thus the eigenvalue has been found. This method is only suitable when n, m, p and q are small. It is essential that an explicit equation in z^λ can be obtained.

Procedure 3 This procedure, which always works for systems satisfying the conditions of Theorem 9.25, will be described by means of an algorithm. For ease of explanation, (9.39) is rewritten as λ ⊗ a = M̃(a), where a ∈ R^{n+m}. The vector function M̃ has components M̃_i, i = 1, ..., n+m. If the eigenvector is a, then M̃_i(a) − a_i must be equal to λ for all i. We will say that the accuracy η, where η is a given arbitrary positive number, is achieved if we find an a such that

\[
\max_i \bigl( \widetilde{M}_i(a) - a_i \bigr) - \min_i \bigl( \widetilde{M}_i(a) - a_i \bigr) < \eta .
\]

We then use the following algorithm.

1. Choose an arbitrary a ∈ R^{n+m} with all components finite.

2. Calculate c_i = M̃_i(a) − a_i, i = 1, ..., n+m. Define c̲ = min_i c_i and c̄ = max_i c_i.

3. If c̄ − c̲ < η, then stop.
4. Construct disjoint subsets ϒ_1, ϒ_2, ϒ_3 of ϒ := {1, ..., n+m} such that

• ϒ = ϒ_1 ∪ ϒ_2 ∪ ϒ_3;
• j ∈ ϒ_1 ⇔ c_j < c̲ + η/2;
• j ∈ ϒ_2 ⇔ η/2 ≤ c_j − c̲ < η;
• j ∈ ϒ_3 ⇔ c_j ≥ c̲ + η.

5. Change a_j into a_j − η/2 for all j ∈ ϒ_1. Do not change the other components of a.

6. Go to step 2.

This algorithm always ends in a finite number of steps. If k denotes the iteration index of the algorithm, then this k will, as an argument, specify the quantities related to the k-th iteration of the algorithm, and

\[
c_i(k+1) \ge c_i(k) \quad \text{for } i \in \Upsilon_1(k) , \qquad
c_i(k+1) \le c_i(k) \quad \text{for } i \in \Upsilon_2(k) \cup \Upsilon_3(k) .
\]

Therefore c̄(k) is a nonincreasing function of k and, similarly, c̲(k) is nondecreasing. At each iteration of the algorithm some elements of ϒ_1 may have moved to ϒ_2 or vice versa. Some elements of ϒ_3 may have moved to ϒ_2, but not vice versa. As k increases, ϒ_1 and ϒ_2 will ultimately catch all c_i.

By means of the following example it will be shown how the algorithm works:

\[
\begin{aligned}
a_1(k+1) &= \max(a_1(k)+1,\; a_2(k)+2,\; a_3(k)) ,\\
a_2(k+1) &= \max(a_1(k)+2,\; a_2(k),\; a_3(k)+1) ,\\
a_3(k+1) &= \min(a_1(k)+2,\; a_2(k)+4,\; a_3(k)+3) .
\end{aligned}
\]

We take η = 0.2 and start with a(0) = (1, 2, 3)ᵀ. Application of the algorithm yields the following results:

\[
a(0) = \begin{pmatrix} 1\\2\\3 \end{pmatrix} ,\;
c(0) = \begin{pmatrix} 3\\2\\0 \end{pmatrix} ,\;
a(1) = \begin{pmatrix} 1\\2\\2.9 \end{pmatrix} ,\;
c(1) = \begin{pmatrix} 3\\1.9\\0.1 \end{pmatrix} ,\; \ldots ,
\]
\[
a(9) = \begin{pmatrix} 1\\2\\2.1 \end{pmatrix} ,\;
c(9) = \begin{pmatrix} 3\\1.1\\0.9 \end{pmatrix} ,\;
a(10) = \begin{pmatrix} 1\\2\\2 \end{pmatrix} ,\;
c(10) = \begin{pmatrix} 3\\1\\1 \end{pmatrix} ,
\]
\[
a(11) = \begin{pmatrix} 1\\1.9\\1.9 \end{pmatrix} ,\;
c(11) = \begin{pmatrix} 2.9\\1.1\\1.1 \end{pmatrix} ,\; \ldots ,\;
a(20) = \begin{pmatrix} 1\\1\\1 \end{pmatrix} ,\;
c(20) = \begin{pmatrix} 2\\2\\2 \end{pmatrix} .
\]

For this example even the exact results are obtained: the eigenvector is a(20) = (1, 1, 1)ᵀ and the eigenvalue is 2.
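As an added illustration (not the book's code), the algorithm above can be implemented in a few lines; applied to the example just treated, with η = 0.2 and a(0) = (1, 2, 3)ᵀ, it reproduces the behavior shown above.

```python
# A minimal sketch of Procedure 3: repeatedly lower, by eta/2, those components a_j
# whose "local cycle time" c_j = M_j(a) - a_j lies within eta/2 of the current minimum,
# until max_i c_i - min_i c_i < eta.

def procedure3(M, a, eta=0.2, max_iter=10_000):
    a = list(a)
    for _ in range(max_iter):
        c = [mi - ai for mi, ai in zip(M(a), a)]
        c_min, c_max = min(c), max(c)
        if c_max - c_min < eta:                 # accuracy eta achieved (step 3)
            return a, c
        for j, cj in enumerate(c):              # step 5: only the Upsilon_1 components move
            if cj < c_min + eta / 2:
                a[j] -= eta / 2
    raise RuntimeError("accuracy not reached")

# The example treated above:
M = lambda a: (max(a[0] + 1, a[1] + 2, a[2]),
               max(a[0] + 2, a[1], a[2] + 1),
               min(a[0] + 2, a[1] + 4, a[2] + 3))

a, c = procedure3(M, (1.0, 2.0, 3.0))
print(a, c)   # approaches a = (1, 1, 1) and c = (2, 2, 2), i.e. the eigenvalue 2
```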
9.6.4 Stochastic Min-Max Systems

We are given the system described by (9.27)–(9.30). In contrast with the previous subsections, it is now assumed that the matrices in these formulæ are event-dependent: A_l(k), B_l(k), etc. The reason for this dependence is that (some of) the entries of these matrices will be assumed to be stochastic. For each k, the stochastic entries are assumed to be mutually independent and, moreover, it is assumed that there is no correlation for different k-values. The underlying probability distributions are assumed to be finite, i.e. the entries can only assume a finite number of values. For the calculation of the average throughput, the same technique as used in Chapter 8 for max-plus algebra systems will be used. As an example, consider

\[
\begin{aligned}
x_1(k+1) &= \max(x_1(k)+\tau_1(k),\; x_2(k),\; x_3(k)) ,\\
x_2(k+1) &= \max(x_1(k)+\tau_1(k),\; x_2(k)+\tau_2(k),\; x_3(k)) ,\\
x_3(k+1) &= \max(\min(x_1(k)+\tau_1(k),\; x_2(k)+\tau_2(k)),\; x_3(k)+1) ,
\end{aligned}
\tag{9.44}
\]

which resembles (9.32). The stochastic quantities τ_i(k) are supposed to be independent of each other (i.e. for all i and all k). Assume that τ_i(k) = 0 or τ_i(k) = 2, both with probability 0.5. Starting from an 'arbitrary' x_0-vector, say x_0 = (0, 0, 0)ᵀ, we will set up the reachability tree of all possible states of (9.44). From x_0, four new states can in principle be reached in one step, since there are four possibilities for the combination (τ_1(0), τ_2(0)). Actually, one of these states, (2, 2, 2)ᵀ, is a translation of x_0 and hence is not considered to be a new state. For this example, it turns out that the reachability tree consists of ten states, which will be denoted n_1, ..., n_10. Here n_1 = (0, 0, 0)ᵀ. If for instance τ_1(0) = τ_2(0) = 0, then we get to state n_2 = (0, 0, 1)ᵀ. The ten states, together with their immediate successors, are given in Table 9.3.

It is not difficult to show that, from any initial condition, the state x_k will converge in a finite number of steps to the Markov chain consisting of the given ten nodes. Since the probabilities with which the different (τ_1(k), τ_2(k)) occur are known, the transition probabilities of the Markov chain in which the ten states n_i are the nodes can be calculated. Subsequently, the stationary behavior of this Markov chain can be calculated. Once this stationary distribution is known, it is not difficult to calculate

\[
\lim_{k\to\infty} \mathsf{E}\bigl( x_i(k+1) - x_i(k) \bigr) ,
\tag{9.45}
\]

which turns out to be independent of the subscript i.

Table 9.3: Reachable states (each entry gives the successor state, followed by the translation with which it occurs; e = 0)

  state               τ1=0, τ2=0   τ1=0, τ2=2   τ1=2, τ2=0   τ1=2, τ2=2
  n1  = (0,0,0)ᵀ      n2 , e       n3 , e       n4 , 1       n1 , 2
  n2  = (0,0,1)ᵀ      n2 , 1       n5 , 1       n1 , 2       n1 , 2
  n3  = (0,2,1)ᵀ      n1 , 2       n6 , 2       n1 , 2       n6 , 2
  n4  = (1,1,0)ᵀ      n1 , 1       n6 , 1       n7 , 1       n1 , 3
  n5  = (0,1,1)ᵀ      n2 , 1       n3 , 1       n1 , 2       n8 , 2
  n6  = (0,2,0)ᵀ      n4 , 1       n9 , 1       n1 , 2       n6 , 2
  n7  = (2,2,0)ᵀ      n1 , 2       n6 , 2       n7 , 2       n1 , 4
  n8  = (0,1,0)ᵀ      n1 , 1       n6 , 1       n4 , 1       n8 , 2
  n9  = (1,3,0)ᵀ      n7 , 1       n10, 1       n1 , 3       n6 , 3
  n10 = (2,4,0)ᵀ      n7 , 2       n10, 2       n1 , 4       n6 , 4

This method, together with its properties, has been described more extensively in Chapter 8 for max-plus systems only. As long as the reachability tree consists of a finite number of states, the method works equally well for min-max systems. For the example treated here, it turns out that the expression in (9.45) equals 1.376 time units. This quantity can be considered as the average cycle time of system (9.44).
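As an added illustration (not the book's method, which is exact), the limit (9.45) can also be estimated by straightforward simulation of (9.44); the sample size and seed below are arbitrary choices.

```python
import random

# A minimal Monte-Carlo sketch: estimate the average cycle time
# lim E(x_i(k+1) - x_i(k)) of (9.44) by direct simulation.
# The text computes this limit exactly, via the Markov chain of Table 9.3, as 1.376.

def step_944(x, tau1, tau2):
    x1, x2, x3 = x
    return (max(x1 + tau1, x2, x3),
            max(x1 + tau1, x2 + tau2, x3),
            max(min(x1 + tau1, x2 + tau2), x3 + 1))

def estimate_cycle_time(num_steps=200_000, seed=0):
    rng = random.Random(seed)
    x = (0.0, 0.0, 0.0)
    for _ in range(1000):                      # discard a transient
        x = step_944(x, rng.choice([0, 2]), rng.choice([0, 2]))
    start = x[0]
    for _ in range(num_steps):
        x = step_944(x, rng.choice([0, 2]), rng.choice([0, 2]))
    return (x[0] - start) / num_steps          # growth rate of x_1; the limit is the same for all i

print(estimate_cycle_time())                   # should be close to 1.376
```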
9.7 About Cycle Times in General Petri Nets

In this section we are interested in finding out how fast each transition can initiate firing in a periodically operated Petri net (not necessarily an event graph). Connections with Chapter 2 will be made. It will not be assumed here that each transition fires as soon as it is enabled. The order and timing of the initiation of firings of enabled transitions must be chosen in such a way (if at all possible) that a periodic behavior results. During each period, each transition must fire at least once. The smallest possible such period τ is called the cycle time; it is defined as the time needed to complete a firing sequence leading back to the initial marking. Therefore we will confine ourselves to consistent Petri nets, i.e. nets for which

\[
\exists x > 0 : \quad G x = 0 ,
\]

where G was defined in §2.4.1. Later on in this section we will narrow the class of consistent Petri nets down to event graphs.

It is assumed that the holding and firing times are constant in time. Suppose there is a firing time of at least β_i time units associated with transition q_i, i = 1, ..., n. This means that when q_i is enabled and it initiates a firing, it takes β_i time units before the firing is completed. Hence one token is reserved in each place p_j ∈ π(q_i) for at least β_i time units before the transition completes its firing. The 'resource-time multiplication' associated with place p_j is defined as (the number of tokens in place p_j) × (the length of time that these tokens remain in that place). A popular interpretation is: if a token represents a potential buyer who has to wait, then this product is proportional to the amount of coffee you will offer to all these buyers. In matrix form, the m-dimensional vector of these resource-time multiplications, which has one entry per place, equals (G^in) D x, where D is the diagonal matrix with the elements β_i on the diagonal and 0's elsewhere (in conventional algebra). Here we only considered the reserved tokens (reserved for the duration of the firing time).

Now suppose that there are on average (µ_a)_j tokens in place p_j during one period (this average is taken with respect to clock time). Then the corresponding resource-time multiplication is given by the vector µ_a τ (popular again: the amount of coffee which you will need during one cycle for all waiting clients). Since this latter quantity was calculated for both the reserved and the non-reserved tokens, the following inequality holds:

\[
\mu_a \tau \ge (G^{\mathrm{in}}) D x .
\tag{9.46}
\]

Since µ_a is the average of a finite number of markings µ, and since µ y, with y satisfying Gy = 0, does