DYNAMIC GAMES: THEORY AND APPLICATIONS

GERAD 25th Anniversary Series

Essays and Surveys in Global Optimization
Charles Audet, Pierre Hansen, and Gilles Savard, editors

Graph Theory and Combinatorial Optimization
David Avis, Alain Hertz, and Odile Marcotte, editors
Numerical Methods in Finance
Hatem Ben-Ameur and Michèle Breton, editors

Analysis, Control and Optimization of Complex Dynamic Systems
El-Kébir Boukas and Roland Malhamé, editors

Column Generation
Guy Desaulniers, Jacques Desrosiers, and Marius M. Solomon, editors

Statistical Modeling and Analysis for Complex Data Problems
Pierre Duchesne and Bruno Rémillard, editors

Performance Evaluation and Planning Methods for the Next Generation Internet
André Girard, Brunilde Sansò, and Felisa Vázquez-Abad, editors

Dynamic Games: Theory and Applications
Alain Haurie and Georges Zaccour, editors

Logistics Systems: Design and Optimization
André Langevin and Diane Riopel, editors

Energy and Environment
Richard Loulou, Jean-Philippe Waaub, and Georges Zaccour, editors

DYNAMIC GAMES: THEORY AND APPLICATIONS

Edited by

ALAIN HAURIE
Université de Genève & GERAD, Switzerland

GEORGES ZACCOUR
HEC Montréal & GERAD, Canada

Springer

ISBN-10: 0-387-24601-0 (HB)
ISBN-10: 0-387-24602-9 (e-book)
ISBN-13: 978-0-387-24601-7 (HB)
ISBN-13: 978-0-387-24602-4 (e-book)

© 2005 by Springer Science+Business Media, Inc.

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, Inc., 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
Printed in the United States of America.

9 8 7 6 5 4 3 2 1    SPIN 11308140

Foreword

GERAD celebrates this year its 25th anniversary. The Center was created in 1980 by a small group of professors and researchers of HEC Montréal, McGill University and of the École Polytechnique de Montréal. GERAD's activities achieved sufficient scope to justify its conversion in June 1988 into a Joint Research Centre of HEC Montréal, the École Polytechnique de Montréal and McGill University. In 1996, the Université du Québec à Montréal joined these three institutions. GERAD has fifty members (professors), more than twenty research associates and postdoctoral students and more than two hundred master's and Ph.D. students.

GERAD is a multi-university center and a vital forum for the development of operations research. Its mission is defined around the following four complementary objectives:
- The original and expert contribution to all research fields in GERAD's area of expertise;
- The dissemination of research results in the best scientific outlets as well as in the society in general;
- The training of graduate students and postdoctoral researchers;
- The contribution to the economic community by solving important problems and providing transferable tools.

GERAD's research thrusts and fields of expertise are as follows:
- Development of mathematical analysis tools and techniques to solve the complex problems that arise in management sciences and engineering;
- Development of algorithms to resolve such problems efficiently;
- Application of these techniques and tools to problems posed in related disciplines, such as statistics, financial engineering, game theory and artificial intelligence;
- Application of advanced tools to optimization and planning of large technical and economic systems, such as energy systems, transportation/communication networks, and production systems;
- Integration of scientific findings into software, expert systems and decision-support systems that can be used by industry.

One of the marking events of the celebrations of the 25th anniversary of GERAD is the publication of ten volumes covering most of the Center's research areas of expertise. The list follows: Essays and Surveys in Global Optimization, edited by C. Audet, P. Hansen and G. Savard; Graph Theory and Combinatorial Optimization, edited by D. Avis, A. Hertz and O. Marcotte; Numerical Methods in Finance, edited by H. Ben-Ameur and M. Breton; Analysis, Control and Optimization of Complex Dynamic Systems, edited by E.K. Boukas and R. Malhamé; Column Generation, edited by G. Desaulniers, J. Desrosiers and M.M. Solomon; Statistical Modeling and Analysis for Complex Data Problems, edited by P. Duchesne and B. Rémillard; Performance Evaluation and Planning Methods for the Next Generation Internet, edited by A. Girard, B. Sansò and F. Vázquez-Abad; Dynamic Games: Theory and Applications, edited by A. Haurie and G. Zaccour; Logistics Systems: Design and Optimization, edited by A. Langevin and D. Riopel; Energy and Environment, edited by R. Loulou, J.P. Waaub and G. Zaccour.

I would like to express my gratitude to the Editors of the ten volumes,
to the authors who accepted with great enthusiasm to submit their work, and to the reviewers for their benevolent work and timely response. I would also like to thank Mrs. Nicole Paradis, Francine Benoît and Louise Letendre and Mr. André Montpetit for their excellent editing work.

The GERAD group has earned its reputation as a worldwide leader in its field. This is certainly due to the enthusiasm and motivation of GERAD's researchers and students, but also to the funding and the infrastructures available. I would like to seize the opportunity to thank the organizations that, from the beginning, believed in the potential and the value of GERAD and have supported it over the years. These are HEC Montréal, École Polytechnique de Montréal, McGill University, Université du Québec à Montréal and, of course, the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Fonds québécois de la recherche sur la nature et les technologies (FQRNT).

Georges Zaccour
Director of GERAD

Avant-propos

The Groupe d'études et de recherche en analyse des décisions (GERAD) celebrates its twenty-fifth anniversary this year. Founded in 1980 by a handful of professors and researchers of HEC Montréal engaged in team research with colleagues from McGill University and the École Polytechnique de Montréal, the Center now comprises some fifty members, more than twenty research professionals and postdoctoral fellows, and more than 200 graduate students. GERAD's activities grew sufficiently in scope to justify, in June 1988, its transformation into a joint research centre of HEC Montréal, the École Polytechnique de Montréal and McGill University. In 1996, the Université du Québec à Montréal joined these institutions to sponsor GERAD. GERAD is a grouping of researchers around the discipline of operations research.
Its mission is structured around the following complementary objectives:

- the original and expert contribution in all the research axes of its fields of competence;
- the dissemination of research results in the leading journals of the field as well as to the various publics that form the Centre's environment;
- the training of graduate students and postdoctoral fellows;
- the contribution to the economic community through the resolution of problems and the development of transferable toolkits.

GERAD's main research axes, going from the most theoretical to the most applied, are the following:

- the development of mathematical analysis tools and techniques of operations research for the resolution of the complex problems that arise in the management sciences and engineering;
- the design of algorithms permitting the efficient resolution of these problems;
- the application of these tools to problems posed in disciplines related to operations research, such as statistics, financial engineering, game theory and artificial intelligence;
- the application of these tools to the optimization and planning of large technico-economic systems, such as energy systems, telecommunication and transport networks, and logistics and distribution in the manufacturing and service industries;
- the integration of scientific results into software, expert systems and decision-support systems transferable to industry.

The highlight of the celebrations of GERAD's 25th anniversary is the publication of ten volumes covering the Centre's fields of expertise. The list follows: Essays and Surveys in Global Optimization, edited by C. Audet, P. Hansen and G. Savard; Graph Theory and Combinatorial Optimization, edited by D. Avis, A. Hertz and O. Marcotte; Numerical Methods in Finance, edited by H. Ben-Ameur and M. Breton; Analysis, Control and Optimization of Complex Dynamic Systems, edited by E.K. Boukas and R. Malhamé; Column Generation, edited by G. Desaulniers, J. Desrosiers and M.M. Solomon; Statistical Modeling and Analysis for Complex Data Problems, edited by P. Duchesne and B. Rémillard; Performance Evaluation and Planning Methods for the Next Generation Internet, edited by A. Girard, B. Sansò and F. Vázquez-Abad; Dynamic Games: Theory and Applications, edited by A. Haurie and G. Zaccour; Logistics Systems: Design and Optimization, edited by A. Langevin and D. Riopel; Energy and Environment, edited by R. Loulou, J.P. Waaub and G. Zaccour.

I would like to thank most sincerely the editors of these volumes, the many authors who very willingly responded to the editors' invitation to submit their work, and the reviewers for their volunteer work and punctuality. I would also like to thank Mmes Nicole Paradis, Francine Benoît and Louise Letendre, as well as M. André Montpetit, for their expert editing work.

GERAD's position at the forefront of the world stage is certainly due to the passion that drives its researchers and students, but also to the funding and infrastructure available. I would like to take this opportunity to thank the organizations that believed from the outset in the potential and the value of GERAD and have supported us over the years: HEC Montréal, the École Polytechnique de Montréal, McGill University, the Université du Québec à Montréal and, of course, the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Fonds québécois de la recherche sur la nature et les technologies (FQRNT).
Georges Zaccour
Director of GERAD

Contents

Foreword
Avant-propos
Contributing Authors
Preface
1 Dynamical Connectionist Network and Cooperative Games
J.-P. Aubin
2 A Direct Method for Open-Loop Dynamic Games for Affine Control Systems
D.A. Carlson and G. Leitmann
3 Braess Paradox and Properties of Wardrop Equilibrium in some Multiservice Networks
R. El Azouzi, E. Altman and O. Pourtallier
4 Production Games and Price Dynamics
S.D. Flåm

5 Consistent Conjectures, Equilibria and Dynamic Games
A. Jean-Marie and M. Tidball
6 Cooperative Dynamic Games with Incomplete Information
L.A. Petrosjan

7 Electricity Prices in a Game Theory Context
M. Bossy, N. Maïzi, G.J. Olsder, O. Pourtallier and E. Tanré
8 Efficiency of Bertrand and Cournot: A Two Stage Game
M. Breton and A. Turki

9 Cheap Talk, Gullibility, and Welfare in an Environmental Taxation Game
H. Dawid, C. Deissenberg, and Pavel Ševčík
10 A Two-Timescale Stochastic Game Framework for Climate Change Policy Assessment
A. Haurie
11 A Differential Game of Advertising for National and Store Brands
S. Karray and G. Zaccour
12 Incentive Strategies for Shelf-space Allocation in Duopolies
G. Martín-Herrán and S. Taboubi
13 Subgame Consistent Dormant-Firm Cartels
D.W.K. Yeung

Contributing Authors

EITAN ALTMAN
INRIA, France
altman@sophia.inria.fr

JEAN-PIERRE AUBIN
Réseau de Recherche Viabilité, Jeux, Contrôle, France
J.P.Aubin@wanadoo.fr

MIREILLE BOSSY
INRIA, France
Mireille.Bossy@sophia.inria.fr

NADIA MAÏZI
École des Mines de Paris, France
Nadia.Maizi@ensmp.fr

GUIOMAR MARTÍN-HERRÁN
Universidad de Valladolid, Spain
guiomar@eco.uva.es

GEERT JAN OLSDER
Delft University of Technology, The Netherlands
G.J.Olsder@ewi.tudelft.nl

LEON A. PETROSJAN
St. Petersburg State University, Russia
spbuoasis7@peterlink.ru

ODILE POURTALLIER
INRIA, France
Odile.Pourtallier@sophia.inria.fr

MICHÈLE BRETON
HEC Montréal and GERAD, Canada
Michele.Breton@hec.ca

DEAN A. CARLSON
The University of Toledo, USA
dcarlson@math.utoledo.edu

HERBERT DAWID
University of Bielefeld
hdawid@wiwi.uni-bielefeld.de

CHRISTOPHE DEISSENBERG
University of Aix-Marseille II, France
deissenb@univ-aix.fr

RACHID EL AZOUZI
Université d'Avignon, France
elazouzi@lia.univ-avignon.fr

SJUR DIDRIK FLÅM
Bergen University, Norway
sjur.flaam@econ.uib.no

ALAIN HAURIE
Université de Genève and GERAD, Switzerland
Alain.Haurie@hec.unige.ch

ALAIN JEAN-MARIE
University of Montpellier 2, France
ajm@lirmm.fr

SALMA KARRAY
University of Ontario Institute of Technology, Canada
salma.karray@uoit.ca

GEORGE LEITMANN
University of California at Berkeley, USA
gleit@clink4.berkeley.edu

PAVEL ŠEVČÍK
University of Aix-Marseille II, France
paulenfrance@yahoo.fr

SIHEM TABOUBI
HEC Montréal, Canada
sihem.taboubi@hec.ca

ETIENNE TANRÉ
INRIA, France
Etienne.Tanre@sophia.inria.fr

MABEL TIDBALL
INRA-LAMETA, France
tidball@ensam.inra.fr

ABDALLA TURKI
HEC Montréal and GERAD, Canada
Abdalla.Turki@hec.ca

DAVID W.K. YEUNG
Hong Kong Baptist University
wkyeung@hkbu.edu.hk

GEORGES ZACCOUR
HEC Montréal and GERAD, Canada

Preface

This volume collects thirteen chapters dealing with a wide range of topics in (mainly) differential games. It is divided into two parts. Part I groups six contributions which deal essentially, but not exclusively, with theoretical or methodological issues arising in different dynamic games. Part II contains seven application-oriented chapters in economics and management science.

Part I

In Chapter 1, Aubin deals with cooperative games defined on networks, which could be of different kinds (socioeconomic, neural or genetic networks), and where he allows for coalitions to evolve over time. Aubin provides a class of control systems, coalitions and multilinear connectionist operators under which the architecture of the network remains viable. He next uses the viability/capturability approach to study the problem of characterizing the dynamic core of a dynamic cooperative game defined in characteristic function form.

In Chapter 2, Carlson and Leitmann provide a direct method for open-loop dynamic games with dynamics affine with respect to controls. The direct method was first introduced by Leitmann in 1967 for problems of the calculus of variations. It has been the topic of recent contributions aiming to extend it to the differential games setting. In particular, the method has been successfully adapted for differential games where each player has its own state. Carlson and Leitmann investigate here the utility of the direct method in the case where the state dynamics are described by a single equation which is affine in the players' strategies.

In Chapter 3, El Azouzi et al. consider the problem of routing in networks in a context where a number of decision makers each have their own utility to maximize.
If each decision maker wishes to find a minimal path for each routed object (e.g., a packet), then the solution concept is the Wardrop equilibrium. It is well known that equilibria may exhibit inefficiencies and paradoxical behavior, such as the famous Braess paradox (in which the addition of a link to a network results in worse performance for all users). The authors provide guidelines for the network administrator on how to modify the network so that it indeed results in improved performance.

Flåm considers in Chapter 4 production or market games with transferable utility. These games, which are actually of frequent occurrence and great importance in theory and practice, involve parties concerned with the issue of finding a fair sharing of efficient production costs. Flåm shows that, in many cases, explicit core solutions may be defined by shadow prices, and reached via quite natural dynamics.

Jean-Marie and Tidball discuss in Chapter 5 the relationships between conjectures, conjectural equilibria, consistency and Nash equilibria in the classical theory of discrete-time dynamic games. They propose a theoretical framework in which they define conjectural equilibria with several degrees of consistency. In particular, they introduce feedback-consistency, and prove that the corresponding conjectural equilibria and Nash-feedback equilibria of the game coincide. Finally, they discuss the relationship between these results and previous studies based on differential games and supergames.

In Chapter 6, Petrosjan defines on a game tree a cooperative game in characteristic function form with incomplete information. He next introduces the concept of imputation distribution procedure in connection with the definitions of time-consistency and strong time-consistency. Petrosjan derives sufficient conditions for the existence of time-consistent solutions.
He also develops a regularization procedure and constructs a new characteristic function for games where these conditions cannot be met. The author also defines the regularized core and proves that it is strongly time-consistent. Finally, he investigates the special case of stochastic games.

Part II

Bossy et al. consider in Chapter 7 a deregulated electricity market formed of few competitors. Each supplier announces the maximum quantity he is willing to sell at a certain fixed price. The market then decides the quantities to be delivered by the suppliers which satisfy demand at minimal cost. Bossy et al. characterize Nash equilibria for the two scenarios where the producers in turn maximize their market shares and their profits. A close analysis of the equilibrium results points to some difficulties in predicting players' behavior.

Breton and Turki analyze in Chapter 8 a differentiated duopoly where firms engage in research and development (R&D) to reduce their production cost. The authors first derive and compare Bertrand and Cournot equilibria in terms of quantities, prices, investments in R&D, consumer's surplus and total welfare. The results are stated with reference to the productivity of R&D and the degree of spillover in the industry. Breton and Turki also assess the robustness of their results and those obtained in the literature. Their conclusion is that the relative efficiencies of Bertrand and Cournot equilibria are sensitive to the specifications that are used, and hence the results are far from being robust.

In Chapter 9, Dawid et al. consider a dynamic model of environmental taxation where the firms are of two types: believers, who take the tax announcement by the Regulator at face value, and non-believers, who perfectly anticipate the Regulator's decisions at a certain cost. The authors assume that the proportion of the two types evolves over time depending on the relative profits of both groups. Dawid et al.
show that the Regulator can use misleading tax announcements to steer the economy to an equilibrium which is Pareto-improving compared with the solutions proposed in the literature.

In Chapter 10, Haurie shows how a multi-timescale hierarchical noncooperative game paradigm can contribute to the development of integrated assessment models of climate change policies. He exploits the fact that the climate and economic subsystems evolve at very different time scales. Haurie formulates the international negotiation at the level of climate control as a piecewise deterministic stochastic game played in the "slow" time scale, whereas the economic adjustments in the different nations take place in a "faster" time scale. He shows how the negotiations on emissions abatement can be represented in the slow time scale, whereas the economic adjustments are represented in the fast time scale as solutions of general economic equilibrium models. He finally provides some indications on the integration of different classes of models that could be made, using a hierarchical game-theoretic structure.

In Chapter 11, Karray and Zaccour consider a differential game model for a marketing channel formed by one manufacturer and one retailer. The latter sells the manufacturer's national brand and may also introduce a private label offered at a lower price. The authors first assess the impact of a private label introduction on the players' payoffs. Next, in the event where it is beneficial for the retailer to propose his brand to consumers and detrimental to the manufacturer, they investigate whether a cooperative advertising program could help the manufacturer mitigate the negative impact of the private label.

Martín-Herrán and Taboubi (Chapter 12) aim at determining equilibrium shelf-space allocation in a marketing channel with two competing manufacturers and one retailer. The manufacturers control advertising expenditures in order to build a brand image.
They also offer to the retailer an incentive designed to increase their share of the shelf space. The problem is formulated as a Stackelberg infinite-horizon differential game with the manufacturers as leaders. Stationary feedback equilibria are characterized and numerical experiments are conducted to illustrate how the players set their marketing efforts.

In Chapter 13, Yeung considers a duopoly in which the firms agree to form a cartel. In particular, one firm has an absolute and marginal cost advantage over the other, forcing one of the firms to become a dormant firm. The author derives a subgame consistent solution based on the Nash bargaining axioms. Subgame consistency is a fundamental element in the solution of cooperative stochastic differential games. In particular, it ensures that the extension of the solution policy to a later starting time and any possible state brought about by prior optimal behavior of the players would remain optimal. Hence no player will have an incentive to deviate from the initial plan.

Acknowledgements

The Editors would like to express their gratitude to the authors for their contributions and timely responses to our comments and suggestions. We wish also to thank Francine Benoît and Nicole Paradis of GERAD for their expert editing of the volume.

Chapter 1

DYNAMICAL CONNECTIONIST NETWORK AND COOPERATIVE GAMES
Jean-Pierre Aubin
Abstract

Socioeconomic networks, neural networks and genetic networks describe collective phenomena through constraints relating actions of several players, coalitions of these players and multilinear connectionist operators acting on the set of actions of each coalition. Static and dynamical cooperative games also involve coalitions. Allowing "coalitions to evolve" requires the embedding of the finite set of coalitions in the compact convex subset of "fuzzy coalitions". This survey presents results obtained through this strategy. We provide first a class of control systems governing the evolution of actions, coalitions and multilinear connectionist operators under which the architecture of a network remains viable. The controls are the "viability multipliers" of the "resource space" in which the constraints are defined. They are involved as "tensor products" of the actions of the coalitions and the viability multiplier, allowing us to encapsulate in this dynamical and multilinear framework the concept of Hebbian learning rules in neural networks in the form of "multi-Hebbian" dynamics in the evolution of connectionist operators. They are also involved in the evolution of coalitions through the "cost" of the constraints under the viability multiplier regarded as a price, describing a "herd behavior". We use next the viability/capturability approach for studying the problem of characterizing the dynamic core of a dynamic cooperative game defined in characteristic function form. We define the dynamic core as a set-valued map associating with each fuzzy coalition and each time the set of imputations such that their payoffs at that time to the fuzzy coalition are larger than or equal to the one assigned by the characteristic function of the game, and study it.

1. Introduction

Collective phenomena deal with the coordination of actions by a finite number n of players labelled i = 1, . . .
, n, using the architecture of a network of players, such as socioeconomic networks (see for instance Aubin (1997, 1998a), Aubin and Foray (1998), Bonneuil (2000, 2001)), neural networks (see for instance Aubin (1995, 1996, 1998b), Aubin and Burnod (1998)) and genetic networks (see for instance Bonneuil (1998b, 2005), Bonneuil and Saint-Pierre (2000)). This coordinated activity requires a network of communications or connections of actions xi ∈ Xi ranging over n finite dimensional vector spaces Xi, as well as coalitions of players. The simplest general form of a coordination is the requirement that a relation between actions of the form

g(A(x1, . . . , xn)) ∈ M

must be satisfied. Here

1. A : ∏ᵢ₌₁ⁿ Xi → Y is a connectionist operator relating the individual actions in a collective way,
2. M ⊂ Y is the subset of the resource space Y and g is a map, regarded as a propagation map.

We shall study this coordination problem in a dynamic environment, by allowing actions x(t) and connectionist operators A(t) to evolve according to dynamical systems we shall construct later. In this case, the coordination problem takes the form

∀ t ≥ 0, g(A(t)(x1(t), . . . , xn(t))) ∈ M

However, in the fields of motivation under investigation, the number n of variables may be very large. Even though the connectionist operators A(t) defining the "architecture" of the network are allowed to operate a priori on all variables xi(t), they actually operate at each instant t on a coalition S(t) ⊂ N := {1, . . . , n} of such variables, varying naturally with time according to the nature of the coordination problem.

On the other hand, a recent line of research, dynamic cooperative game theory, has been opened by Leon Petrosjan (see for instance Petrosjan (1996) and Petrosjan and Zenkevitch (1996)), Alain Haurie (Haurie (1975)), Georges Zaccour, Jerzy Filar and others.
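The viability constraint ∀ t ≥ 0, g(A(t)(x1(t), . . . , xn(t))) ∈ M can be illustrated numerically along a sampled trajectory. The sketch below is an invented toy instance, not a construction from the survey: the time-varying linear operator A(t), the identity propagation map g, the interval chosen for M and the sample trajectory are all assumptions made purely for illustration.

```python
# Toy check of the coordination constraint g(A(t)(x1(t), x2(t))) in M
# for two players with scalar actions. All maps and numbers are invented.

def A(t):
    # time-varying linear "connectionist operator": weights drift with t
    return [1.0 + 0.1 * t, 1.0 - 0.1 * t]

def g(y):
    # propagation map: here simply the identity on the aggregate resource
    return y

def in_M(y):
    # resource constraint set M = [0, 3]
    return 0.0 <= y <= 3.0

def viable(x_traj, times):
    """Check g(A(t)(x1(t), x2(t))) in M at each sampled time t."""
    for t, (x1, x2) in zip(times, x_traj):
        w = A(t)
        y = g(w[0] * x1 + w[1] * x2)
        if not in_M(y):
            return False
    return True

times = [0, 1, 2]
traj = [(1.0, 1.0), (0.8, 1.1), (0.6, 1.2)]
print(viable(traj, times))   # this trajectory stays in M
```

In the survey the constraint is maintained not by checking it after the fact but by steering the dynamics with viability multipliers; the sketch only shows what "remaining viable" means pointwise.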
We quote the first lines of Filar and Petrosjan (2000): "Bulk of the literature dealing with cooperative games (in characteristic function form) do not address issues related to the evolution of a solution concept over time. However, most conflict situations are not 'one shot' games but continue over some time horizon which may be limited a priori by the game rules, or terminate when some specified conditions are attained."

We propose here a concept of dynamic core of a dynamical fuzzy cooperative game as a set-valued map associating with each fuzzy coalition and each time the set of imputations such that their payoffs at that time to the fuzzy coalition are larger than or equal to the one assigned by the characteristic function of the game. We shall characterize this core through the (generalized) derivatives of a valuation function associated with the game, provide its explicit formula, characterize its epigraph as a viable-capture basin of the epigraph of the characteristic function of the fuzzy dynamical cooperative game, use the tangential properties of such basins for proving that the valuation function is a solution to a Hamilton-Jacobi-Isaacs partial differential equation, and use this function and its derivatives for characterizing the dynamic core. In a nutshell, this survey deals with the evolution of fuzzy coalitions both for regulating the viable architecture of a network and for governing the evolution of imputations in the dynamical core of a dynamical fuzzy cooperative game.

Outline

The survey is organized as follows:

1. We begin by recalling what fuzzy coalitions are in the framework of convexification procedures,
2. we proceed by studying the evolution of networks regulated by viability multipliers, showing how Hebbian rules emerge in this context,
3. and by introducing fuzzy coalitions of players in this network and showing how a herd behavior emerges in this framework.
4.
We next define dynamical cores of dynamical fuzzy cooperative games (with side-payments),
5. and explain briefly why the viability/capturability approach is relevant to answer the questions we have raised.

2. Fuzzy coalitions

The first definition of a coalition which comes to mind, that of a subset of players S ⊂ N, is not adequate for tackling dynamical models of the evolution of coalitions, since the 2ⁿ coalitions range over a finite set, preventing us from using analytical techniques. One way to overcome this difficulty is to embed the family of subsets of a (discrete) set N of n players in the space Rⁿ. This canonical embedding is more adapted to the nature of the power set P(N) than the universal embedding of a discrete set M of m elements in Rᵐ by the Dirac measure associating with any j ∈ M the jth element of the canonical basis of Rᵐ; the convex hull of the image of M by this embedding is the probability simplex of Rᵐ. Hence:

We embed the family of subsets of a (discrete) set N of n players in the space Rⁿ through the map χ associating with any coalition S ∈ P(N) its characteristic function χS ∈ {0, 1}ⁿ ⊂ Rⁿ, since Rⁿ can be regarded as the set of functions from N to R. By definition, the family of fuzzy sets is the convex hull [0, 1]ⁿ of the power set {0, 1}ⁿ in Rⁿ.

Fuzzy sets offer a "dedicated convexification" procedure of the discrete power set M := P(N) instead of the universal convexification procedure of frequencies, probabilities and mixed strategies derived from its embedding in Rᵐ = R^(2ⁿ). By definition, the family of fuzzy sets¹ is the convex hull [0, 1]ⁿ of the power set {0, 1}ⁿ in Rⁿ. Therefore, we can write any fuzzy set in the form

χ = Σ_{S ∈ P(N)} mS χS  where  mS ≥ 0  and  Σ_{S ∈ P(N)} mS = 1

The memberships are then equal to

∀ i ∈ N,  χi = Σ_{S ∋ i} mS

Consequently, if mS is regarded as the probability for the set S to be formed, the membership of player i in the fuzzy set χ is the sum of the probabilities of the coalitions to which player i belongs. Player i participates fully in χ if χi = 1, does not participate at all if χi = 0 and participates in a fuzzy way if χi ∈ ]0, 1[. We associate with a fuzzy coalition χ the set P(χ) := {i ∈ N | χi ≠ 0} ⊂ N of players i participating in the fuzzy coalition χ. We also introduce the membership

γS(χ) := ∏_{j ∈ S} χj

of a coalition S in the fuzzy coalition χ as the product of the memberships of the players i in the coalition S. It vanishes whenever the membership of one player does, and reduces to individual memberships for one-player coalitions. When two coalitions are disjoint (S ∩ T = ∅), then γS∪T(χ) = γS(χ)γT(χ). In particular, for any player i ∈ S, γS(χ) = χi γS\i(χ).

Actually, this idea of using fuzzy coalitions has already been used in the framework of static cooperative games with and without side-payments in Aubin (1979, 1981a,b) and Aubin (1998, 1993), Chapter 13. Further developments can be found in Mares (2001), Nishizaki and Sakawa (2001), Basile (1993, 1994, 1995), Basile, De Simone and Graziano (1996) and Florenzano (1990). Fuzzy coalitions have also been used in dynamical models of cooperative games in Aubin and Cellina (1984), Chapter 4, and of economic theory in Aubin (1997), Chapter 5.

This idea of fuzzy sets can be adapted to more general situations relevant in game theory. We can, for instance, introduce negative memberships when players enter a coalition with aggressive intents. This is mandatory if one wants to be realistic! A positive membership is interpreted as a cooperative participation of player i in the coalition, while a negative membership is interpreted as a noncooperative participation of the ith player in the generalized coalition.

¹ This concept of fuzzy set was introduced in 1965 by L. A. Zadeh. Since then, it has been wildly successful, even in many areas outside mathematics! We found in "La lutte finale", Michel Lafon (1994), p. 69, by A. Bercoff the following quotation of the late François Mitterrand, president of the French Republic (1981-1995): "Aujourd'hui, nous nageons dans la poésie pure des sous-ensembles flous" . . . (Today, we swim in the pure poetry of fuzzy subsets)!
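The two formulas above, χi = Σ_{S ∋ i} mS and γS(χ) = ∏_{j ∈ S} χj, can be checked with a small script. The coalition probabilities mS below are made-up toy numbers chosen only so that they are nonnegative and sum to 1:

```python
# Memberships of a fuzzy coalition induced by coalition probabilities m_S,
# and the coalition membership gamma_S(chi) as a product of memberships.

from math import prod

N = (1, 2, 3)
# toy weights m_S >= 0 over a few coalitions S in P(N), summing to 1
m = {frozenset(): 0.1, frozenset({1}): 0.2, frozenset({1, 2}): 0.3,
     frozenset({1, 2, 3}): 0.4}

def membership(i):
    # chi_i = sum of m_S over the coalitions S containing player i
    return sum(p for S, p in m.items() if i in S)

chi = {i: membership(i) for i in N}

def gamma(S):
    # gamma_S(chi) = product of chi_j over j in S
    return prod(chi[j] for j in S)

print(chi)  # memberships chi_i induced by the coalition probabilities
S, T = {1}, {2, 3}
# disjoint coalitions multiply: gamma_{S u T}(chi) = gamma_S(chi) gamma_T(chi)
print(abs(gamma(S | T) - gamma(S) * gamma(T)) < 1e-12)
```

With these weights, player 1 belongs to coalitions of total probability 0.9, player 2 to 0.7 and player 3 to 0.4, and the product rule for disjoint coalitions holds as stated in the text.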
In what follows, one can replace the cube [0, 1]^n by any product ∏_{i=1}^n [λ_i, μ_i] for describing the cooperative or noncooperative behavior of the consumers.

We can still enrich the description of the players by representing each player i by what psychologists call her 'behavior profile', as in Aubin, Louis-Guerin and Zavalloni (1979). We consider q 'behavioral qualities' k = 1, …, q, each with a unit of measurement. We also suppose that a behavioral quality can be measured (evaluated) in terms of a real number (positive or negative) of units. A behavior profile is a vector a = (a^1, …, a^q) ∈ R^q which specifies the quantities a^k of the q qualities k attributed to the player. Thus, instead of representing each player by a letter of the alphabet, she is described as an element of the vector space R^q. We then suppose that each player may implement all, none, or only some of her behavioral qualities when she participates in a social coalition. Consider n players represented by their behavior profiles in R^q. Any matrix χ = (χ_i^k) describing the levels of participation χ_i^k ∈ [−1, +1] of the behavioral qualities k for the n players i is called a social coalition. The extension of the following results to social coalitions is straightforward.

Technically, the choice of the scaling [0, 1] inherited from the tradition built on integration and measure theory is not adequate for describing convex sets. When dealing with convex sets, we have to replace the characteristic functions by indicators taking their values in [0, +∞] and take their convex combinations to provide an alternative allowing us to speak of "fuzzy" convex sets. Therefore, "toll-sets" are nonnegative cost functions assigning to each element its cost of belonging, +∞ if it does not belong to the toll-set. The set of elements with finite positive cost forms the "fuzzy boundary" of the toll-set, the set of elements with zero cost its "core".
This has been done to adapt viability theory to "fuzzy viability theory". Actually, the Cramer transform

C_μ(p) := sup_{χ∈R^n} ( ⟨p, χ⟩ − log ∫_{R^n} e^{⟨χ,y⟩} dμ(y) )

maps probability measures to toll-sets. In particular, it transforms convolution products of density functions into inf-convolutions of extended functions, Gaussian functions into squares of norms, etc. See Chapter 10 of Aubin (1991) and Aubin and Dordan (1996) for more details and information on this topic.

The components of the state variable χ := (χ_1, …, χ_n) ∈ [0, 1]^n are the rates of participation in the fuzzy coalition χ of the players i = 1, …, n. Hence convexification procedures and the need for functional analysis justify the introduction of fuzzy sets and their extensions. In the examples presented in this survey, we use only classical fuzzy sets.

3. Regulation of the evolution of a network

3.1 Definition of the architecture of a network
We introduce

1. n finite dimensional vector spaces X_i describing the action spaces of the players,

2. a finite dimensional vector space Y regarded as a resource space, and a subset M ⊂ Y of scarce resources².

Definition 1.1 The architecture of a dynamical network involves the evolution

1. of actions x(t) := (x_1(t), …, x_n(t)) ∈ ∏_{i=1}^n X_i,

2. of connectionist operators A_{S(t)}(t) : ∏_{i=1}^n X_i → Y,

3. acting on coalitions S(t) ⊂ N := {1, …, n} of the n players,

and requires that

∀ t ≥ 0, g({A_{S(t)}(x(t))}_{S⊂N}) ∈ M

where g : ∏_{S⊂N} Y_S → Y.

² For simplicity, the set M of scarce resources is assumed to be constant. But sets M(t) of scarce resources could also evolve through mutational equations, and the following results can be adapted to this case. Curiously, the overall architecture is not changed when the set of available resources evolves under a mutational equation. See Aubin (1999) for more details on mutational equations.

We associate with any coalition S ⊂ N the product X^S := ∏_{i∈S} X_i and denote by L^S(X^S, Y) the space of S-linear operators A_S : X^S → Y, i.e., operators that are linear with respect to each variable x_i (i ∈ S) when the other ones are fixed. Linear operators A_i ∈ L(X_i, Y) are obtained when the coalition S := {i} is reduced to a singleton, and we identify L^∅(X^∅, Y) := Y with the vector space Y. In order to tackle this problem mathematically, we shall

1. restrict the connectionist operators A := Σ_{S⊂N} A_S to be multiaffine, i.e., sums over all coalitions of S-linear operators³ A_S ∈ L^S(X^S, Y),

2. allow coalitions S to become fuzzy coalitions so that they can evolve continuously.

So, a network is not just any kind of relationship between variables: it involves both connectionist operators and the coalitions of players they operate on.

3.2 Constructing the dynamics

The question we raise is the following: Assume that we know the intrinsic laws of evolution of the variables x_i (independently of the constraints), of the connectionist operators A_{S(t)} and of the coalitions S(t). Is the above architecture viable under these dynamics, in the sense that the collective constraints defining the architecture of the dynamical network are satisfied at each instant?

There is no reason why, left on their own, the collective constraints defining the above architecture should remain viable under these dynamics. Then the question arises of how to reestablish the viability of the system. One may

1. either delineate those states (actions, connectionist operators, coalitions) from which viable evolutions begin,
³ Also called (or regarded as) tensors. They are nothing other than matrices when the operators are linear instead of multilinear. Tensors are the matrices of multilinear operators, so to speak, and their "entries" depend upon several indices instead of the two involved in matrices.

2. or correct the dynamics of the system in order that the architecture of the dynamical network be viable under the altered dynamical system.

The first approach leads to taking the viability kernel of the constrained subset K of states (x_i, A_S, S) satisfying the constraints defining the architecture of the network. We refer to Aubin (1997, 1998a) for this approach. We present in this section a class of methods for correcting the dynamics without touching the architecture of the network.

One may indeed be able, with a lot of ingeniousness and intimate knowledge of a given problem, and for "simple constraints", to derive dynamics under which the constraints are viable. However, we can investigate whether there is a kind of mathematical factory providing classes of dynamics "correcting" the initial (intrinsic) ones in such a way that the viability of the constraints is guaranteed. One way to achieve this aim is to use the concept of "viability multipliers" q(t), ranging over the dual Y* of the resource space Y, that can be used as "controls" for modifying the initial dynamics. This allows us to provide an explanation of the formation and evolution of the architecture of the network and of the active coalitions, as well as of the evolution of the actions themselves.
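Before the formal discussion, the mechanism can be shown on a minimal numerical sketch (entirely made up: the drift f, the constraint map h(x) := x_1² + x_2², the resource set M := {1} and the explicit formula for q are our own illustrative choices, not the general construction of the text). The multiplier q is picked at each instant so that the corrected dynamics x' = f(x) − h'(x)* q keep h(x) in M:

```python
def f(x):
    # Hypothetical intrinsic drift that, left alone, leaves the circle ||x|| = 1
    return [1.0 - x[0], 0.5 - x[1]]

def corrected_velocity(x):
    grad = [2.0 * x[0], 2.0 * x[1]]            # h'(x) for h(x) = x1^2 + x2^2
    dot = lambda a, b: sum(u * v for u, v in zip(a, b))
    # Choose the multiplier q so that d/dt h(x(t)) = <h'(x), f(x) - q h'(x)> = 0
    q = dot(grad, f(x)) / dot(grad, grad)
    return [fi - q * gi for fi, gi in zip(f(x), grad)]

x = [1.0, 0.0]                                  # initial state on the constraint set
dt = 1e-3
for _ in range(2000):
    v = corrected_velocity(x)
    x = [xi + dt * vi for xi, vi in zip(x, v)]  # explicit Euler step

# The constraint h(x) = 1 is maintained up to the discretization error
assert abs(x[0] ** 2 + x[1] ** 2 - 1.0) < 0.05
```

Without the subtracted multiplier term, the same Euler loop drifts toward the unconstrained equilibrium (1, 0.5), which violates the constraint.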
A few words about viability multipliers are in order here. If a constrained set K is of the form

K := {x ∈ X such that h(x) ∈ M}

where h : X → Z := R^m is the constraint map from the state space X to the resource space Z and M ⊂ Z is a subset of available resources, we regard elements u ∈ Z* = Z in the dual of the resource space Z (identified with Z) as viability multipliers, since they play a role analogous to Lagrange multipliers in optimization under constraints. Recall that the minimization of a function x → J(x) over a constrained set K is equivalent to the minimization without constraints of the function

x → J(x) + Σ_{k=1}^m u_k h_k(x)

for an adequate Lagrange multiplier u ∈ Z* = Z in the dual of the resource space Z (identified with Z). See for instance Aubin (1993, 1998) and Rockafellar and Wets (1997) among many other references on this topic.

In an analogous way, but with unrelated methods, it has been proved that a closed convex subset K is viable under the control system

x_j'(t) = f_j(x(t)) + Σ_{k=1}^m (∂h_k(x(t))/∂x_j) u_k(t)

obtained by adding to the initial dynamics a term involving regulons that belong to the dual of the same resource space Z. See for instance Aubin and Cellina (1984) and Aubin (1991, 1997) for more details. Therefore, these viability multipliers used as regulons benefit from the same economic interpretation as virtual prices, like the one provided for Lagrange multipliers in optimization theory.

The viability multipliers q(t) ∈ Y* can thus be regarded as regulons, i.e., regulation controls or parameters, or virtual prices in the language of economists. These are chosen at each instant in order that the viability constraints describing the network be satisfied at each instant. The main theorem guarantees this possibility. Another theorem tells us how to choose such regulons at each instant (the regulation law). Even though viability multipliers do not provide all the dynamics under which a constrained set is viable, they do provide important and noticeable classes of dynamics exhibiting interesting structures that deserve to be investigated and tested in concrete situations.

3.3 An economic interpretation

Although the theory applies to general networks, the problem we face has an economic interpretation that may help the reader in interpreting the main results that we summarize below. Actors here are economic agents (producers) i = 1, …, n ranging over the set N := {1, …, n}. Each coalition S ⊂ N of economic agents is regarded as a production unit (a firm) using resources of its agents to produce (or not produce) commodities. Each agent i ∈ N provides a resource vector (capital, competencies, etc.) x_i in a resource space X_i := R^{m_i} used in production processes involving coalitions S ⊂ N of economic agents (regarded as firms employing economic agents).

We describe the production process of a firm S ⊂ N by an S-linear operator A_S : ∏_{i∈S} X_i → Y associating with the resources x := (x_1, …, x_n) provided by the economic agents a commodity A_S(x). The supply constraints are described by a subset M ⊂ Y of the commodity space, representing the set of commodities that must be produced by the firms: the condition

Σ_{S⊂N} A_S(t)(x(t)) ∈ M

expresses that at each instant, the total production must belong to M.

The connectionist operators among economic agents are the input-output production processes operating on the resources provided by the economic agents to the production units described by coalitions of economic agents. The architecture of the network is then described by the supply constraints requiring that at each instant, agents supply adequate resources to the firms in order that the production objectives are fulfilled. When fuzzy coalitions χ of economic agents⁴ are involved, the supply constraints are described by

Σ_{S⊂N} ( ∏_{j∈S} χ_j(t) ) A_S(t)(x(t)) ∈ M    (1.1)

since the production operators are assumed to be multilinear.

3.4 Linear connectionist operators

We summarize the case, studied in Aubin (1997, 1998a,b), in which there is only one player and the operator A : X → Y is affine:

∀ x ∈ X, A(x) := Wx + y, where W ∈ L(X, Y) and y ∈ Y

The coordination problem takes the form

∀ t ≥ 0, W(t)x(t) + y(t) ∈ M

where the state x, the resource y and the connectionist operator W all evolve. These constraints are not necessarily viable under an arbitrary dynamical system of the form

(i) x'(t) = c(x(t))
(ii) y'(t) = d(y(t))    (1.2)
(iii) W'(t) = α(W(t))

We can reestablish viability by involving multipliers q ∈ Y*, ranging over the dual Y* := Y of the resource space Y, to correct the initial dynamics. We denote by W* ∈ L(Y*, X*) the transpose of W:

∀ q ∈ Y*, ∀ x ∈ X, ⟨W*q, x⟩ := ⟨q, Wx⟩

and by x ⊗ q ∈ L(X*, Y*) the tensor product defined by

x ⊗ q : p ∈ X* := X → (x ⊗ q)(p) := ⟨p, x⟩ q

the matrix of which is made of the entries (x ⊗ q)_i^j = x_i q^j.
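In coordinates, this tensor product is just an outer product. A small sketch (dimensions and vectors made up) checks that (x ⊗ q)(p) = ⟨p, x⟩ q entry by entry:

```python
def tensor(x, q):
    """Matrix of x ⊗ q, with entries (x ⊗ q)_i^j = x_i * q^j."""
    return [[xi * qj for qj in q] for xi in x]

def apply(x, q, p):
    """(x ⊗ q)(p) := <p, x> q, computed through the matrix of x ⊗ q."""
    M = tensor(x, q)
    # p acts on the first factor: sum_i p_i (x ⊗ q)_i^j = <p, x> q^j
    return [sum(p[i] * M[i][j] for i in range(len(x))) for j in range(len(q))]

x, q, p = [1.0, 2.0], [3.0, 4.0, 5.0], [0.5, 0.25]
scal = sum(pi * xi for pi, xi in zip(p, x))   # <p, x> = 1.0 here
assert apply(x, q, p) == [scal * qj for qj in q]
```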
⁴ Whenever the resources involved in production processes are proportional to the intensity of labor, one could interpret, in such specific economic models, the rate of participation χ_i of economic agent i as (the rate of) the labor he uses in the production activity.

The contingent cone T_M(y) to M ⊂ Y at y ∈ M is the set of directions v ∈ Y such that there exist sequences h_n > 0 converging to 0 and v_n converging to v satisfying y + h_n v_n ∈ M for every n. The (regular) normal cone to M ⊂ Y at y ∈ M is defined by

N_M(y) := {q ∈ Y* | ∀ v ∈ T_M(y), ⟨q, v⟩ ≤ 0}

(see Aubin and Frankowska (1990) and Rockafellar and Wets (1997) for more details on these topics). We proved that the viability of the constraints can be reestablished when the initial system (1.2) is replaced by the control system

(i) x'(t) = c(x(t)) − W*(t)q(t)
(ii) y'(t) = d(y(t)) − q(t)
(iii) W'(t) = α(W(t)) − x(t) ⊗ q(t)
where q(t) ∈ N_M(W(t)x(t) + y(t))

where N_M(y) ⊂ Y* denotes the normal cone to M at y ∈ M ⊂ Y and x ⊗ q the tensor product defined above.

In other words, the correction of a dynamical system for reestablishing the viability of constraints of the form W(t)x(t) + y(t) ∈ M involves the rule proposed by Hebb in his classic 1949 book The Organization of Behavior as the basic learning process of synaptic weights, called the Hebbian rule. Taking α(W) = 0, the evolution of the synaptic matrix W := (w_i^j) obeys the differential equation

(d/dt) w_i^j(t) = −x_i(t) q^j(t)

The Hebbian rule states that the velocity of the synaptic weight is the product of the presynaptic activity and the postsynaptic activity.
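A minimal simulation of this correction (illustrative scalar data; the intrinsic dynamics c and d, the choice α(W) = 0 and the constraint set M := {w x + y = constant} are our own assumptions): the regulon q is computed so that d/dt (w x + y) = 0, and the weight w evolves by the Hebbian rule w' = −x q.

```python
def simulate(T=2.0, dt=1e-3):
    # Illustrative intrinsic dynamics (our own choice, not from the text)
    c = lambda x: 0.3 - x          # drift of the state x
    d = lambda y: 0.1              # drift of the resource y
    x, y, w = 1.0, 0.0, 2.0        # so that w*x + y = 2.0 initially
    target = w * x + y             # M = {2.0}: keep w x + y constant
    for _ in range(int(T / dt)):
        # Regulon chosen so that d/dt (w x + y) = 0:
        # w' x + w x' + y' = -x^2 q + w c(x) + d(y) - q = 0
        q = (w * c(x) + d(y)) / (1.0 + x * x)
        dx = dt * c(x)
        dy = dt * (d(y) - q)
        dw = dt * (-x * q)         # Hebbian rule: w' = -x q
        x, y, w = x + dx, y + dy, w + dw
    return x, y, w, target

x, y, w, target = simulate()
assert abs(w * x + y - target) < 1e-2   # constraint maintained up to Euler error
```

The scalar division by (1 + x²) is the one-dimensional shadow of the inverse (AA* + ‖x‖² I)⁻¹ appearing in the regulation map below.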
Such a learning rule "pops up" (or, more pedantically, emerges) whenever the synaptic matrices are involved in regulating the system in order to maintain the "homeostatic" constraint W(t)x(t) + y(t) ∈ M. (See Aubin (1996) for more details on the relations between Hebbian rules and tensor products in the framework of neural networks.)

Viability multipliers q(t) ∈ Y* regulating viable evolutions satisfy the regulation law

∀ t ≥ 0, q(t) ∈ R_M(A(t), x(t))

where the regulation map R_M is defined by

R_M(A, x) := (AA* + ‖x‖² I)⁻¹ (Ac(x) + α(A)(x) − T_M(A(x)))

One can even require that, on top of it, the viability multiplier satisfies

q(t) ∈ N_M(A(t)x(t)) ∩ R_M(A(t), x(t))

The norm ‖q(t)‖ of the viability multiplier q(t) measures the intensity of the viability discrepancy of the dynamics, since

(i) ‖c(x(t)) − x'(t)‖ ≤ ‖A*(t)‖ ‖q(t)‖
(ii) ‖α(A(t)) − A'(t)‖ = ‖x(t)‖ ‖q(t)‖

When α(A) ≡ 0, the viability multipliers with minimal norm in the regulation map provide both the smallest error ‖c(x(t)) − x'(t)‖ and the smallest velocities of the connection matrix, because ‖A'(t)‖ = ‖x(t)‖ ‖q(t)‖. The inertia of the connection matrix, which can be regarded as an index of dynamic connectionist complexity, is proportional to the norm of the viability multiplier.

3.5 Hierarchical architecture and complexity
The constraints are of the form

A_H^{H−1} ∘ ⋯ ∘ A_{h+1}^h ∘ ⋯ ∘ A_2^1 x_1 ∈ M_H

This describes, for instance, a production process associating with the resource x_1 the intermediate output x_2 := A_2^1 x_1, which itself produces an output x_3 := A_3^2 x_2, and so on, until the final output x_H := A_H^{H−1} ⋯ A_{h+1}^h ⋯ A_2^1 x_1, which must belong to the production set M_H. The evolution without constraints of the commodities and the operators is governed by dynamical systems of the form

(i) x_h'(t) = c_h(x_h(t))
(ii) (d/dt) A_{h+1}^h(t) = α_{h+1}^h(A_{h+1}^h(t))

The constraints

∀ t ≥ 0, A_H^{H−1}(t) ⋯ A_{h+1}^h(t) ⋯ A_2^1(t) x_1(t) ∈ M_H

are viable under the system

x_1'(t) = c_1(x_1(t)) + A_2^1(t)* p^1(t)    (h = 1)
x_h'(t) = c_h(x_h(t)) − p^{h−1}(t) + A_{h+1}^h(t)* p^h(t)    (h = 2, …, H − 1)
x_H'(t) = c_H(x_H(t)) − p^{H−1}(t)    (h = H)
(d/dt) A_{h+1}^h(t) = α_{h+1}^h(A_{h+1}^h(t)) + x_h(t) ⊗ p^h(t)    (h = 1, …, H − 1)

involving viability multipliers p^h(t) (intermediate "shadow prices"). The input-output matrices A_{h+1}^h(t) obey dynamics involving the tensor product of x_h(t) and p^h(t). The viability multipliers p^h(t) at level h (h = 1, …, H − 1) both regulate the evolution at level h and send a message to the upper level h + 1.

We can actually tackle more complex hierarchical situations with non-ordered hierarchies. Assume that X := ∏_{h=1}^H X_h, Y := ∏_{k=1}^K Y_k and that A := (A_h^k), where A_h^k ∈ L(X_k, Y_h). We introduce a set-valued map J : {1, …, H} ⇝ {1, …, K}. The constraints are defined by

∀ h = 1, …, H, Σ_{k∈J(h)} A_h^k(t) x_k(t) ∈ M_h ⊂ Y_h

We consider a system of differential equations

(i) x_h'(t) = c_h(x_h(t)), h = 1, …, H
(ii) (d/dt) A_h^k(t) = α_h^k(A_h^k(t))

Then the constraints

∀ h = 1, …, H, Σ_{k∈J(h)} A_h^k(t) x_k(t) ∈ M_h ⊂ Y_h

are viable under the corrected system

(i) x_k'(t) = c_k(x_k(t)) − Σ_{h∈J^{−1}(k)} A_h^k(t)* p_h(t), k = 1, …, K
(ii) (d/dt) A_h^k(t) = α_h^k(A_h^k(t)) − x_k(t) ⊗ p_h(t), (h, k) ∈ Graph(J)

3.6 Connectionist tensors

In order to handle more explicit and tractable formulas and results, we shall assume that the connectionist operator A : X := ∏_{i=1}^n X_i → Y is multiaffine. For defining such a multiaffine operator, we associate with any coalition S ⊂ N its characteristic function χ_S : N → R associating with any i ∈ N

χ_S(i) := 1 if i ∈ S, 0 if i ∉ S

It defines a linear operator χ_S ∘ ∈ L(∏_{i=1}^n X_i, ∏_{i=1}^n X_i) that associates with any x = (x_1, …, x_n) ∈ ∏_{i=1}^n X_i the sequence χ_S ∘ x defined by

∀ i = 1, …, n, (χ_S ∘ x)_i := x_i if i ∈ S, 0 if i ∉ S
We associate with any coalition S ⊂ N the subspace

X^S := χ_S ∘ ∏_{i=1}^n X_i = { x ∈ ∏_{i=1}^n X_i such that ∀ i ∉ S, x_i = 0 }

since χ_S ∘ is nothing other than the canonical projector from ∏_{i=1}^n X_i onto X^S. In particular, X^N := ∏_{i=1}^n X_i and X^∅ := {0}.

Let Y be another finite dimensional vector space. We associate with any coalition S ⊂ N the space L^S(X^S, Y) of S-linear operators A_S. We extend such an S-linear operator A_S to an n-linear operator (again denoted by) A_S ∈ L_n(∏_{i=1}^n X_i, Y) defined by:

∀ x ∈ ∏_{i=1}^n X_i, A_S(x) = A_S(x_1, …, x_n) := A_S(χ_S ∘ x)

A multiaffine operator A ∈ A_n(∏_{i=1}^n X_i, Y) is a sum of S-linear operators A_S ∈ L^S(X^S, Y) when S ranges over the family of coalitions:

A(x_1, …, x_n) := Σ_{S⊂N} A_S(χ_S ∘ x) = Σ_{S⊂N} A_S(x)

We identify A_∅ with a constant A_∅ ∈ Y. Hence the collective constraint linking multiaffine operators and actions can be written in the form

∀ t ≥ 0, Σ_{S⊂N} A_S(t)(x(t)) ∈ M

For any i ∈ S, we shall denote by (x_{−i}, u_i) ∈ X^N the sequence y ∈ X^N where y_j := x_j when j ≠ i and y_i := u_i when j = i. The linear operator A_S(x_{−i}) ∈ L(X_i, Y) is defined by u_i → A_S(x_{−i})u_i := A_S(x_{−i}, u_i). We shall use its transpose A_S(x_{−i})* ∈ L(Y*, X_i*) defined by

∀ q ∈ Y*, ∀ u_i ∈ X_i, ⟨A_S(x_{−i})* q, u_i⟩ = ⟨q, A_S(x_{−i}) u_i⟩
We associate with q ∈ Y* and elements x_i ∈ X_i the multilinear operator⁵

x_1 ⊗ ⋯ ⊗ x_n ⊗ q ∈ L_n(∏_{i=1}^n X_i*, Y*)

associating with any p := (p_1, …, p_n) ∈ ∏_{i=1}^n X_i* the element (∏_{i=1}^n ⟨p_i, x_i⟩) q:

x_1 ⊗ ⋯ ⊗ x_n ⊗ q : p := (p_1, …, p_n) ∈ ∏_{i=1}^n X_i* → (x_1 ⊗ ⋯ ⊗ x_n ⊗ q)(p) := (∏_{i=1}^n ⟨p_i, x_i⟩) q ∈ Y*

This multilinear operator x_1 ⊗ ⋯ ⊗ x_n ⊗ q is called the tensor product of the x_i's and q. We recall that the duality product on

L_n(∏_{i=1}^n X_i*, Y*) × L_n(∏_{i=1}^n X_i, Y)

for pairs (x_1 ⊗ ⋯ ⊗ x_n ⊗ q, A) can be written in the form:

⟨x_1 ⊗ ⋯ ⊗ x_n ⊗ q, A⟩ := ⟨q, A(x_1, …, x_n)⟩

3.7 Multi-Hebbian learning process

Assume that we start with intrinsic dynamics of the actions x_i and of the connectionist operators A_S:

(i) x_i'(t) = c_i(x(t)), i = 1, …, n
(ii) A_S'(t) = α_S(A(t)), S ⊂ N

Using viability multipliers, we can modify the above dynamics by introducing regulons that are elements q ∈ Y* of the dual Y* of the space Y:
⁵ We recall that the space L_n(∏_{i=1}^n X_i, Y) of n-linear operators from ∏_{i=1}^n X_i to Y is isometric to the tensor product (⊗_{i=1}^n X_i*) ⊗ Y, the dual of which is (⊗_{i=1}^n X_i) ⊗ Y*, which is isometric with L_n(∏_{i=1}^n X_i*, Y*).

Theorem 1.1 Assume that the functions c_i and α_S are continuous and that M ⊂ Y is closed. Then the constraints

∀ t ≥ 0, Σ_{S⊂N} A_S(t)(x(t)) ∈ M

are viable under the control system

(i) x_i'(t) = c_i(x_i(t)) − Σ_{S∋i} A_S(t)(x_{−i}(t))* q(t), i = 1, …, n
(ii) A_S'(t) = α_S(A(t)) − (⊗_{j∈S} x_j(t)) ⊗ q(t), S ⊂ N
where q(t) ∈ N_M( Σ_{S⊂N} A_S(t)(x(t)) )

Remark: Multi-Hebbian Rule — When we regard the multilinear operator A_S as a tensor of components A_S^{j ∏_{i∈S} k_i}, j = 1, …, p, k_i = 1, …, n_i, i ∈ S, differential equation (ii) can be written in the form: ∀ i ∈ S, j = 1, …, p, k_i = 1, …, n_i,

(d/dt) A_S^{j ∏_{i∈S} k_i} = α_S^{j ∏_{i∈S} k_i}(A(t)) − (∏_{i∈S} x_{i k_i}(t)) q^j(t)

The correction term of the component A_S^{j ∏_{i∈S} k_i} of the S-linear operator is the product of the components x_{i k_i}(t) of the actions x_i in the coalition S and of the component q^j of the viability multiplier. This can be regarded as a multi-Hebbian rule in neural network learning algorithms since, for linear operators, we find the product of the component x_k of the presynaptic action and the component q^j of the postsynaptic action.

Indeed, when the vector spaces X_i := R^{n_i} are supplied with bases (e_{ik}), k = 1, …, n_i, when we denote by (e_i^{k*}) their dual bases, and when Y := R^p is supplied with a basis (f^j) and its dual with the dual basis (f_j*), then the tensor products

(⊗_{i∈S} e_{i k_i}) ⊗ f_j*    (j = 1, …, p, k_i = 1, …, n_i)

form a basis of the dual of L^S(X^S, Y). Hence the components of the tensor product (⊗_{i∈S} x_i) ⊗ q in this basis are the products (∏_{i∈S} x_{i k_i}) q^j of the components q^j of q and x_{i k_i} of the x_i's, where q^j := ⟨q, f^j⟩ and x_{ik} := ⟨e_i^{k*}, x_i⟩. Indeed, we can write

(⊗_{i∈S} x_i) ⊗ q = Σ_{j=1}^p Σ_{k_i=1}^{n_i} ⟨q, f^j⟩ (∏_{i∈S} ⟨e_i^{k_i*}, x_i⟩) (⊗_{i∈S} e_{i k_i}) ⊗ f_j*
4. Regulation involving fuzzy coalitions

Let A ∈ A_n(∏_{i=1}^n X_i, Y), a sum of S-linear operators A_S ∈ L^S(X^S, Y) when S ranges over the family of coalitions, be a multiaffine operator. When χ is a fuzzy coalition, we observe that

A(χ ∘ x) = Σ_{S⊂P(χ)} γ_S(χ) A_S(x) = Σ_{S⊂P(χ)} (∏_{j∈S} χ_j) A_S(x)

We wish to encapsulate the idea that at each instant, only a number of fuzzy coalitions χ are active. Hence the collective constraint linking multiaffine operators, fuzzy coalitions and actions can be written in the form

∀ t ≥ 0, Σ_{S⊂P(χ(t))} γ_S(χ(t)) A_S(t)(x(t)) = Σ_{S⊂P(χ(t))} (∏_{j∈S} χ_j(t)) A_S(t)(x(t)) ∈ M

4.1 Constructing viable dynamics

Assume that we start with intrinsic dynamics of the actions x_i, the connectionist operators A_S and the fuzzy coalitions χ:

(i) x_i'(t) = c_i(x(t)), i = 1, …, n
(ii) χ_i'(t) = κ_i(χ(t)), i = 1, …, n
(iii) A_S'(t) = α_S(A(t)), S ⊂ N

Using viability multipliers, we can modify the above dynamics by introducing regulons that are elements q ∈ Y* of the dual Y* of the space Y:

Theorem 1.2 Assume that the functions c_i, κ_i and α_S are continuous and that M ⊂ Y is closed. Then the constraints

∀ t ≥ 0, Σ_{S⊂P(χ(t))} A_S(t)(χ(t) ∘ x(t)) = Σ_{S⊂P(χ(t))} (∏_{j∈S} χ_j(t)) A_S(t)(x(t)) ∈ M

are viable under the control system

(i) x_i'(t) = c_i(x_i(t)) − Σ_{S∋i} (∏_{j∈S} χ_j(t)) A_S(t)(x_{−i}(t))* q(t), i = 1, …, n
(ii) χ_i'(t) = κ_i(χ(t)) − Σ_{S∋i} (∏_{j∈S\i} χ_j(t)) ⟨q(t), A_S(t)(x(t))⟩, i = 1, …, n
(iii) A_S'(t) = α_S(A(t)) − (∏_{j∈S} χ_j(t)) (⊗_{j∈S} x_j(t)) ⊗ q(t), S ⊂ N
where q(t) ∈ N_M( Σ_{S⊂P(χ(t))} (∏_{j∈S} χ_j(t)) A_S(t)(x(t)) )

Let us comment on these formulas. First, the viability multipliers q(t) ∈ Y* can be regarded as regulons, i.e., regulation controls or parameters, or virtual prices in the language of economists. They are chosen adequately at each instant in order that the viability constraints describing the network be satisfied at each instant, and the above theorem guarantees this possibility. The next section tells us how to choose such regulons at each instant (the regulation law).

For each player i, the velocity x_i'(t) of the state and the velocity χ_i'(t) of his membership in the fuzzy coalition χ(t) are corrected by subtracting

1. the sum over all coalitions S to which he belongs of the A_S(t)(x_{−i}(t))* q(t), weighted by the membership γ_S(χ(t)):

x_i'(t) = c_i(x_i(t)) − Σ_{S∋i} γ_S(χ(t)) A_S(t)(x_{−i}(t))* q(t)

2. the sum over all coalitions S to which he belongs of the costs ⟨q(t), A_S(t)(x(t))⟩ of the constraints associated with the connectionist tensor A_S of the coalition S, weighted by the membership γ_{S\i}(χ(t)):

χ_i'(t) = κ_i(χ(t)) − Σ_{S∋i} γ_{S\i}(χ(t)) ⟨q(t), A_S(t)(x(t))⟩

The (algebraic) increase of player i's membership in the fuzzy coalition aggregates, over all coalitions to which he belongs, the cost of their constraints weighted by the products of the memberships of the other players in the coalition. It can be interpreted as an incentive for an economic agent to increase or decrease his participation in the economy in terms of the cost of the constraints and of the memberships of the other economic agents, encapsulating a mimetic — or "herd", panurgean — behavior (from a famous story by François Rabelais (1483-1553), where Panurge sent overboard the head sheep, followed by the whole herd):

Panurge … jette en pleine mer son mouton criant et bellant. Tous les aultres moutons, crians et bellans en pareille intonation, commencerent soy jecter et saulter en mer après, à la file … comme vous sçavez estre du mouton le naturel, tous jours suyvre le premier, quelque part qu'il aille. Aussi le dict Aristoteles, lib. 9, de Histo. animal., estre le plus sot et inepte animant du monde.

As for the correction of the velocities of the connectionist tensors A_S, their correction is a weighted "multi-Hebbian" rule: for each component A_S^{j ∏_{i∈S} k_i} of A_S, the correction term is the product of the membership γ_S(χ(t)) of the coalition S, of the components x_{i k_i}(t) and of the component q^j(t) of the regulon:

(d/dt) A_S^{j ∏_{i∈S} k_i} = α_S^{j ∏_{i∈S} k_i}(A(t)) − γ_S(χ(t)) (∏_{i∈S} x_{i k_i}(t)) q^j(t)

4.2 The regulation map

Actually, the viability multipliers q(t) regulating viable evolutions of the actions x_i(t), the fuzzy coalitions χ(t) and the multiaffine operators A(t) obey the regulation law (an "adjustment law", in the vocabulary of economists) of the form

∀ t ≥ 0, q(t) ∈ R_M(x(t), χ(t), A(t))

where R_M : X^N × R^n × A_n(X^N, Y) ⇝ Y* is the regulation map that we shall now compute. For this purpose, we introduce the operator h : X^N × R^n × A_n(X^N, Y) → Y defined by

h(x, χ, A) := Σ_{S⊂N} A_S(χ ∘ x)

and the linear operator H(x, χ, A) : Y* := Y → Y defined by:

H(x, χ, A) := Σ_{S⊂N} (∏_{j∈S} χ_j² ‖x_j‖²) I
+ Σ_{R,S⊂N} Σ_{i∈R∩S} ( γ_R(χ)γ_S(χ) A_R(x_{−i}) A_S(x_{−i})* + γ_{R\i}(χ)γ_{S\i}(χ) A_R(x) ⊗ A_S(x) )

Then the regulation map is defined by

R_M(x, χ, A) := H(x, χ, A)⁻¹ ( Σ_{S⊂N} ( α_S(A)(x) + Σ_{i∈S} ( γ_S(χ) A_S(x_{−i}, c_i(x)) + γ_{S\i}(χ) κ_i(χ) A_S(x) ) ) − T_M(h(x, χ, A)) )

Indeed, the regulation map R_M associates with any (x, χ, A) the subset R_M(x, χ, A) of those q ∈ Y* such that

h'(x, χ, A)( (c(x), κ(χ), α(A)) − h'(x, χ, A)* q ) ∈ co(T_M(h(x, χ, A)))

We next observe that h'(x, χ, A) h'(x, χ, A)* = H(x, χ, A) and that

h'(x, χ, A)(c(x), κ(χ), α(A)) = Σ_{S⊂N} ( α_S(A)(x) + Σ_{i∈S} ( γ_S(χ) A_S(x_{−i}, c_i(x)) + γ_{S\i}(χ) κ_i(χ) A_S(x) ) )

Remark: Links between viability and Lagrange multipliers — The point made in this paper is to show how the mathematical methods presented in a general way can be useful in designing other models, as the Lagrange multiplier rule does in the static framework. By comparison, we see that if we minimize a collective utility function

Σ_{i=1}^n u_i(x_i) + Σ_{i=1}^n v_i(χ_i) + Σ_{S⊂N} w_S(A_S)

under the constraints (1.1), then first-order optimality conditions at an optimum ((x_i)_i, (χ_i)_i, (A_S)_{S⊂N}) imply the existence of Lagrange multipliers p such that:

∇u_i(x_i) = Σ_{S∋i} (∏_{j∈S} χ_j) A_S(x_{−i})* p, i = 1, …, n
∇v_i(χ_i) = Σ_{S∋i} (∏_{j∈S\i} χ_j) ⟨p, A_S(x)⟩, i = 1, …, n
∇w_S(A_S) = (∏_{j∈S} χ_j) (⊗_{j∈S} x_j) ⊗ p, S ⊂ N

5. Dynamical fuzzy cooperative games under tychastic uncertainty

5.1 Static fuzzy cooperative games

Definition 1.2 A fuzzy game with side-payments is defined by a characteristic function u : [0, 1]^n → R_+ of a fuzzy game, assumed to be positively homogeneous.
When the characteristic function u of the static cooperative game is concave, positively homogeneous and continuous on the interior of R_+^n, one checks⁶ that the generalized gradient ∂u(χ_N) is not empty and coincides with the subset of imputations p := (p_1, …, p_n) ∈ R_+^n accepted by all fuzzy coalitions, in the sense that

∀ χ ∈ [0, 1]^n, ⟨p, χ⟩ = Σ_{i=1}^n p_i χ_i ≥ u(χ)    (1.3)

and that, for the grand coalition χ_N := (1, …, 1),

⟨p, χ_N⟩ = Σ_{i=1}^n p_i = u(χ_N)

It has been shown that in the framework of static cooperative games with side-payments involving fuzzy coalitions, the concepts of Shapley value and core coincide with the (generalized) gradient ∂u(χ_N) of the "characteristic function" u : [0, 1]^n → R_+ at the "grand coalition" χ_N := (1, …, 1), the characteristic function of N := {1, 2, …, n}. The difference between these concepts for usual games is explained by the different ways one "fuzzyfies" a characteristic function defined on the set of usual coalitions.
⁶ See Aubin (1981a,b), Aubin (1979), Chapter 12, and Aubin (1993, 1998), Chapter 13.

5.2 Three examples of game rules

In a dynamical context, (fuzzy) coalitions evolve, so that the static conditions (1.3) should be replaced by conditions⁷ stating that for any evolution t → χ(t) of fuzzy coalitions, the payoff y(t) := ⟨p(t), χ(t)⟩ should be larger than or equal to u(χ(t)) according (at least) to one of the three following rules:

1. at a prescribed final time T of the end of the game:

y(T) := Σ_{i=1}^n p_i(T) χ_i(T) ≥ u(χ(T))

2. during the whole time span of the game:

∀ t ∈ [0, T], y(t) := Σ_{i=1}^n p_i(t) χ_i(t) ≥ u(χ(t))

3. at the first winning time t* ∈ [0, T] when

y(t*) := Σ_{i=1}^n p_i(t*) χ_i(t*) ≥ u(χ(t*))

at which time the game stops.

Summarizing, the above conditions require finding — for each of the above three rules of the game — an evolution of an imputation p(t) ∈ R_+^n such that, for all evolutions of fuzzy coalitions χ(t) ∈ [0, 1]^n starting at χ, the corresponding rule of the game

i) Σ_{i=1}^n p_i(T) χ_i(T) ≥ u(χ(T))
ii) ∀ t ∈ [0, T], Σ_{i=1}^n p_i(t) χ_i(t) ≥ u(χ(t))    (1.4)
iii) ∃ t* ∈ [0, T] such that Σ_{i=1}^n p_i(t*) χ_i(t*) ≥ u(χ(t*))

must be satisfied. Therefore, for each one of the above three rules of the game (1.4), a concept of dynamical core should provide a set-valued map Γ : R_+ × [0, 1]^n ⇝ R_+^n associating with each time t and any fuzzy coalition χ a set Γ(t, χ) of imputations p ∈ R_+^n such that, taking p(t) ∈ Γ(T − t, χ(t)), and in particular p(0) ∈ Γ(T, χ(0)), the chosen condition above is satisfied. This is the purpose of this study.
⁷ Naturally, the privileged role played by the grand coalition in the static case must be abandoned: since the coalitions evolve, the grand coalition eventually loses its capital status.

5.3 A general class of game rules

Actually, in order to treat the three rules of the game (1.4) as particular cases of a more general framework, we introduce two nonnegative extended functions b and c (characteristic functions of the cooperative games) satisfying

∀ (t, χ) ∈ R_+ × R_+^n, 0 ≤ b(t, χ) ≤ c(t, χ) ≤ +∞

By associating with the initial characteristic function u of the game adequate pairs (b, c) of extended functions, we shall replace the requirements (1.4) by the requirement

i) ∀ t ∈ [0, t*], y(t) ≥ b(T − t, χ(t))  (dynamical constraints)
ii) y(t*) ≥ c(T − t*, χ(t*))  (objective)    (1.5)

We extend the functions b and c as functions from R × R_+^n to R_+ ∪ {+∞} by setting

∀ t < 0, b(t, χ) = c(t, χ) = +∞

so that nonnegativity constraints on time are automatically taken into account. For instance, problems with prescribed final time are obtained with objective functions satisfying the condition

∀ t > 0, c(t, χ) := +∞

In this case, t* = T and condition (1.5) boils down to

i) ∀ t ∈ [0, T], y(t) ≥ b(T − t, χ(t))
ii) y(T) ≥ c(0, χ(T))

Indeed, since y(t*) is finite and since c(T − t*, χ(t*)) is infinite whenever T − t* > 0, we infer from inequality (1.5)ii) that T − t* must be equal to 0.

Allowing the characteristic functions to take infinite values (i.e., to be extended) allows us to acclimate many examples. For example, the three rules (1.4) associated with a same characteristic function u : [0, 1]^n → R ∪ {+∞} can be written in the form (1.5) by adequate choices of pairs (b, c) of functions associated with u.
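The pairs (b, c) encoding the three rules can be sketched concretely (illustrative code; the characteristic function u and the test values are made up, and the extended functions mirror the u_∞ and 0 used in this section):

```python
INF = float("inf")

u = lambda chi: sum(chi) / 2                      # arbitrary illustrative characteristic function

u_inf = lambda t, chi: u(chi) if t == 0 else INF  # u_infty: finite only at t = 0
zero  = lambda t, chi: 0.0 if t >= 0 else INF     # the extended function "0"
u_run = lambda t, chi: u(chi) if t >= 0 else INF  # b(t, chi) := u(chi), extended for t < 0

rules = {
    "prescribed_final_time": (zero,  u_inf),      # rule (1.4)i)
    "whole_time_span":       (u_run, u_inf),      # rule (1.4)ii)
    "first_winning_time":    (zero,  u_run),      # rule (1.4)iii)
}

chi = (1.0, 0.5)                                  # u(chi) = 0.75
b, c = rules["whole_time_span"]
assert b(0.3, chi) == 0.75            # running constraint y(t) >= u(chi(t)) holds throughout
assert c(0.3, chi) == INF             # objective infinite while T - t* > 0, forcing t* = T
assert c(0.0, chi) == 0.75
b, c = rules["first_winning_time"]
assert b(0.3, chi) == 0.0 and c(0.3, chi) == 0.75 # stop as soon as y(t*) >= u(chi(t*))
```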
Indeed, denoting by u∞ the function defined by

\[
u_\infty(t,\chi) := \begin{cases} u(\chi) & \text{if } t = 0\\ +\infty & \text{if } t > 0 \end{cases}
\]

and by 0 the function defined by

\[
\mathbf{0}(t,\chi) := \begin{cases} 0 & \text{if } t \ge 0\\ +\infty & \text{otherwise,} \end{cases}
\]

we can recover the three rules of the game:

1. Taking b(t,χ) := 0(t,χ) and c(t,χ) := u∞(t,χ), we obtain the prescribed final time rule (1.4)i).
2. Taking b(t,χ) := u(χ) and c(t,χ) := u∞(t,χ), we obtain the time span rule (1.4)ii).
3. Taking b(t,χ) := 0(t,χ) and c(t,χ) := u(χ), we obtain the first winning time rule (1.4)iii).

5.4 Dynamics of fuzzy cooperative games

Naturally, games are played under uncertainty. In games arising in the social or biological sciences, uncertainty is rarely of a probabilistic and stochastic nature (with statistical regularity), but of a tychastic nature, according to a terminology borrowed from Charles Peirce.
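These encodings can be sanity-checked mechanically: implement b and c for each rule and test condition (1.5) on a sampled payoff path. The following sketch is our own illustration; the toy characteristic function u, the time grid, and the payoff path are invented for the example:

```python
import math

def u(chi):  # toy characteristic function on [0, 1]^n
    return sum(chi) ** 2 / len(chi)

INF = math.inf

def u_inf(t, chi):        # u_infinity: finite only at t = 0 (prescribed final time)
    return u(chi) if t == 0 else INF

def zero(t, chi):         # the function "0": no running constraint for t >= 0
    return 0.0 if t >= 0 else INF

RULES = {                 # (b, c) pairs encoding the three rules of (1.4)
    "prescribed_end": (zero, u_inf),
    "time_span":      (lambda t, chi: u(chi), u_inf),
    "first_winning":  (zero, lambda t, chi: u(chi)),
}

def satisfies_1_5(rule, T, ts, chis, ys):
    """Check (1.5): y(t) >= b(T-t, chi(t)) on [0, t*] and y(t*) >= c(T-t*, chi(t*))
    for some sample index playing the role of t*."""
    b, c = RULES[rule]
    for k, t_star in enumerate(ts):
        if all(ys[i] >= b(T - ts[i], chis[i]) for i in range(k + 1)) \
           and ys[k] >= c(T - t_star, chis[k]):
            return True
    return False

# A constant coalition whose payoff always dominates u: every rule is satisfied.
T = 1.0
ts = [i * T / 10 for i in range(11)]
chis = [[0.5, 0.5]] * 11
ys = [u([0.5, 0.5]) + 0.1] * 11
assert all(satisfies_1_5(r, T, ts, chis, ys) for r in RULES)
```

Note how the prescribed-end rule only succeeds at the final sample (where T − t* = 0 makes c finite), while the first winning time rule may succeed earlier.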
State-dependent uncertainty can also be translated mathematically by parameters over which actors, agents, decision makers, etc. have no control. These parameters are often perturbations or disturbances (as in "robust control" or "differential games against nature") or, more generally, tyches (meaning "chance" in classical Greek, from the goddess Tyche), ranging over a state-dependent tychastic map. They could be called "random variables" if this vocabulary were not already confiscated by probabilists. This is why we borrow the term tychastic evolution from Charles Peirce, who introduced it in a paper published in 1893 under the title "Evolutionary Love". One can prove that stochastic viability is a (very) particular case of tychastic viability. The size of the tychastic map captures mathematically the concept of "versatility (tychastic volatility)" instead of "(stochastic) volatility": the larger the graph of the tychastic map, the more "versatile" the system.

Next, we define the dynamics of the coalitions and of the imputations, assumed to be given:

1. the evolution of coalitions χ(t) ∈ ℝⁿ is governed by the differential inclusions

\[
\chi'(t) = f(\chi(t), v(t)),\quad \text{where } v(t)\in Q(\chi(t)),
\]

where the v(t) are tyches;

2. static constraints ∀ χ ∈ [0,1]ⁿ, p ∈ P(χ) ⊂ ℝⁿ₊, and dynamic constraints on the velocities of the imputations p(t) ∈ ℝⁿ₊ of the form

\[
\langle p'(t), \chi(t)\rangle = -m(\chi(t), p(t), v(t))\,\langle p(t), \chi(t)\rangle,
\]

stating that the cost ⟨p′, χ⟩ of the instantaneous change of imputation of a coalition is proportional to ⟨p, χ⟩ by a discount factor m(χ, p, v);

3. from which we deduce the velocity

\[
y'(t) = \langle p(t), f(\chi(t), v(t))\rangle - m(\chi(t), p(t), v(t))\, y(t)
\]

of the payoff y(t) := ⟨p(t), χ(t)⟩ of the fuzzy coalition χ(t).
The evolution of the fuzzy coalitions is thus parameterized by imputations and tyches, i.e., is governed by the dynamic game

\[
\begin{cases}
\text{i)} & \chi'(t) = f(\chi(t), v(t))\\
\text{ii)} & y'(t) = \langle p(t), f(\chi(t), v(t))\rangle - m(\chi(t), p(t), v(t))\, y(t)\\
\text{iii)} & \text{where } p(t)\in P(\chi(t))\ \&\ v(t)\in Q(\chi(t))
\end{cases}
\tag{1.6}
\]

A feedback p is a selection of the set-valued map P in the sense that, for any χ ∈ [0,1]ⁿ, p(χ) ∈ P(χ). We thus associate with any feedback p the set C_p(χ) of triples (χ(·), y(·), v(·)) solutions to

\[
\begin{cases}
\text{i)} & \chi'(t) = f(\chi(t), v(t))\\
\text{ii)} & y'(t) = \langle p(\chi(t)), f(\chi(t), v(t))\rangle - y(t)\, m(\chi(t), p(\chi(t)), v(t))\\
& \text{where } v(t)\in Q(\chi(t))
\end{cases}
\tag{1.7}
\]

5.5 Valuation of the dynamical game

We shall characterize the dynamical core of the fuzzy dynamical cooperative game in terms of the derivatives of a valuation function that we now define. For each rule of the game (1.5), the set V of initial conditions (T, χ, y) such that there exists a feedback χ → p(χ) ∈ P(χ) such that, for all tyches t ∈ [0,T] → v(t) ∈ Q(χ(t)) and all solutions to the system (1.7) of differential equations satisfying χ(0) = χ, y(0) = y, the corresponding condition (1.5) is satisfied, is called the guaranteed valuation set^8. Knowing it, we deduce the valuation function

\[
V(T,\chi) := \inf\,\{\, y \mid (T,\chi,y)\in \mathcal{V} \,\}
\]

providing the cheapest initial payoff allowing one to satisfy the viability/capturability conditions (1.5). It satisfies the initial condition V(0,χ) := u(χ).

In each of the three cases, we shall compute explicitly the valuation functions as inf-sup of underlying criteria that we shall uncover. For that purpose, we associate with the characteristic function u : [0,1]ⁿ → ℝ ∪ {+∞} of the dynamical cooperative game the functional

\[
J_u(t; (\chi(\cdot), v(\cdot)); p)(\chi) := e^{\int_0^t m(\chi(s),\, p(\chi(s)),\, v(s))\,ds}\, u(\chi(t)) - \int_0^t e^{\int_0^\tau m(\chi(s),\, p(\chi(s)),\, v(s))\,ds}\, \langle p(\chi(\tau)), f(\chi(\tau), v(\tau))\rangle\, d\tau
\]

We shall associate with it and with each of the three rules of the game (1.4) the three corresponding valuation functions:

1. prescribed end rule:

\[
V_{(0,u_\infty)}(T,\chi) := \inf_{p(\chi)\in P(\chi)}\ \sup_{(\chi(\cdot), v(\cdot))\in \mathcal{C}_p(\chi)} J_u(T; (\chi(\cdot), v(\cdot)); p)(\chi) \tag{1.8}
\]

2. time span rule:

\[
V_{(u,u_\infty)}(T,\chi) := \inf_{p(\chi)\in P(\chi)}\ \sup_{(\chi(\cdot), v(\cdot))\in \mathcal{C}_p(\chi)}\ \sup_{t\in[0,T]} J_u(t; (\chi(\cdot), v(\cdot)); p)(\chi) \tag{1.9}
\]

3. first winning time rule:

\[
V_{(0,u)}(T,\chi) := \inf_{p(\chi)\in P(\chi)}\ \sup_{(\chi(\cdot), v(\cdot))\in \mathcal{C}_p(\chi)}\ \inf_{t\in[0,T]} J_u(t; (\chi(\cdot), v(\cdot)); p)(\chi) \tag{1.10}
\]

A general formula for game rules (1.5) does exist, but is too involved to be reproduced in this survey.
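The closed-form functional J_u can be checked against the payoff dynamics (1.6): integrating the linear equation y′ = ⟨p(χ), f(χ,v)⟩ − m(χ,p,v) y forward from y(0) = J_u(T; (χ(·), v(·)); p)(χ) should return y(T) = u(χ(T)), i.e., the prescribed-end condition holds with equality. A minimal Euler sketch, with toy choices of u, f, m, the feedback p, and the tyche v (all invented for the illustration):

```python
import math

n, T, N = 2, 1.0, 20000
dt = T / N

u = lambda chi: sum(chi)                       # toy characteristic function
f = lambda chi, v: [v, -v]                     # toy coalition dynamics chi' = f(chi, v)
m = lambda chi, p, v: 0.3 + 0.1 * v            # toy discount factor
p_fb = lambda chi: [1.0, 0.5]                  # toy feedback imputation p(chi)
tyche = lambda t: 0.5 * math.sin(6 * t)        # one fixed tyche v(t)

# Forward pass: record chi(t), the integrand <p, f>, and the running integral of m.
chi = [0.6, 0.4]
chis, pf, ms = [chi[:]], [], []
for k in range(N):
    t = k * dt
    v = tyche(t)
    p = p_fb(chi)
    vel = f(chi, v)
    pf.append(sum(pi * vi for pi, vi in zip(p, vel)))
    ms.append(m(chi, p, v))
    chi = [c + dt * w for c, w in zip(chi, vel)]
    chis.append(chi[:])

# J_u(T): e^{int_0^T m} u(chi(T)) - int_0^T e^{int_0^tau m} <p, f> dtau
M = [0.0]
for mk in ms:
    M.append(M[-1] + dt * mk)                  # cumulative int_0^t m ds
Ju = math.exp(M[-1]) * u(chis[-1]) - sum(math.exp(M[k]) * pf[k] * dt for k in range(N))

# Integrate y' = <p, f> - m y forward from y(0) = J_u(T); expect y(T) close to u(chi(T)).
y = Ju
for k in range(N):
    y += dt * (pf[k] - ms[k] * y)
assert abs(y - u(chis[-1])) < 1e-3
```

The check mirrors the variation-of-constants derivation: multiplying y′ + m y = ⟨p, f⟩ by e^{∫₀ᵗ m} and integrating over [0, T] gives exactly the formula for J_u when one imposes y(T) = u(χ(T)).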
^8 One can also define the conditional valuation set V of initial conditions (T, χ, y) such that, for all tyches v, there exists an evolution of the imputation p(·) such that the viability/capturability conditions (1.5) are satisfied. We omit this study for the sake of brevity, since it is parallel to that of the guaranteed valuation sets.

5.6 Hamilton-Jacobi equations and dynamical core

Although these functions are only lower semicontinuous, one can define epiderivatives of lower semicontinuous functions (or generalized gradients) in adequate ways and compute the core Γ. For instance, when the valuation function is differentiable, we shall prove that Γ associates with any (t,χ) ∈ ℝ₊ × ℝⁿ the subset Γ(t,χ) of imputations p ∈ P(χ) satisfying

\[
\sup_{v\in Q(\chi)} \left( \sum_{i=1}^{n} \left( \frac{\partial V(t,\chi)}{\partial \chi_i} - p_i \right) f_i(\chi,v) + m(\chi,p,v)\, V(t,\chi) \right) \le \frac{\partial V(t,\chi)}{\partial t}
\]

The valuation function V is actually a solution to the nonlinear Hamilton-Jacobi-Isaacs partial differential equation

\[
-\frac{\partial \mathbf v(t,\chi)}{\partial t} + \inf_{p\in P(\chi)}\ \sup_{v\in Q(\chi)} \left( \sum_{i=1}^{n} \left( \frac{\partial \mathbf v(t,\chi)}{\partial \chi_i} - p_i \right) f_i(\chi,v) + m(\chi,p,v)\, \mathbf v(t,\chi) \right) = 0
\]

satisfying the initial condition v(0,χ) = u(χ) on the subset

\[
\Omega_{(b,c)}(\mathbf v) := \{(t,\chi) \mid c(t,\chi) > \mathbf v(t,\chi) \ge b(t,\chi)\}
\]

For each of the game rules (1.4), these subsets are written:

1. prescribed end rule: Ω_{(0,u∞)}(v) := {(t,χ) | t > 0 & v(t,χ) ≥ 0}
2. time span rule: Ω_{(u,u∞)}(v) := {(t,χ) | t > 0 & v(t,χ) ≥ u(χ)}
3. first winning time rule: Ω_{(0,u)}(v) := {(t,χ) | t > 0 & u(χ) > v(t,χ) ≥ 0}

Actually, the solution of the above partial differential equation is taken in the "contingent sense", where the directional derivatives are the contingent epiderivatives D↑v(t,χ) of v at (t,χ). They are defined by

\[
D_\uparrow \mathbf v(t,\chi)(\lambda, u) := \liminf_{h\to 0+,\ u'\to u} \frac{\mathbf v(t+h\lambda,\ \chi+hu') - \mathbf v(t,\chi)}{h}
\]

(see for instance Aubin and Frankowska (1990) and Rockafellar and Wets (1997)).

Definition 1.3 (Dynamical Core) Consider the dynamic fuzzy cooperative game with game rules (1.5). The dynamical core Γ of the corresponding fuzzy dynamical cooperative game is equal to

\[
\Gamma(t,\chi) := \Big\{\, p\in P(\chi)\ \text{such that}\ \sup_{v\in Q(\chi)} \big( D_\uparrow V(t,\chi)(-1, f(\chi,v)) - \langle p, f(\chi,v)\rangle + m(\chi,p,v)\, V(t,\chi) \big) \le 0 \,\Big\}
\]
where V is the corresponding valuation function.

We can prove that, for each feedback p(t,χ) ∈ Γ(t,χ) being a selection of the dynamical core Γ, all evolutions (χ(·), y(·), v(·)) of the system

\[
\begin{cases}
\text{i)} & \chi'(t) = f(\chi(t), v(t))\\
\text{ii)} & y'(t) = \langle p(T-t,\chi(t)), f(\chi(t), v(t))\rangle - m(\chi(t),\, p(T-t,\chi(t)),\, v(t))\, y(t)\\
\text{iii)} & v(t)\in Q(\chi(t))
\end{cases}
\tag{1.11}
\]

satisfy the corresponding condition (1.5).

5.7 The static case as infinite versatility

Let us consider the case when m(χ, p, v) = 0 (self-financing of fuzzy coalitions) and when the evolution of coalitions is governed by f(χ, v) = v and Q(χ) = rB. Then the dynamical core is the subset Γ(t,χ) of imputations p ∈ P(χ) satisfying on Ω(V) the equation^9

\[
r\left\| \frac{\partial V(t,\chi)}{\partial \chi} - p \right\| = \frac{\partial V(t,\chi)}{\partial t}
\]

Now, assuming that the data and the solution are smooth, we deduce formally that, letting the versatility r → ∞, we obtain as a limiting case
^9 When p = 0, we find the eikonal equation.
that p = ∂V(t,χ)/∂χ and that ∂V(t,χ)/∂t = 0. Since V(0,χ) = u(χ), we infer that in this case Γ(t,χ) = ∂u(χ)/∂χ, i.e., the Shapley value of the fuzzy static cooperative game when the characteristic function u is differentiable and positively homogeneous, and the core of the fuzzy static cooperative game when the characteristic function u is concave, continuous and positively homogeneous. □

6. The viability/capturability strategy

6.1 The epigraphical approach

The epigraph of an extended function v : X → ℝ ∪ {+∞} is defined by

\[
\mathcal{E}p(\mathbf v) := \{(\chi,\lambda)\in X\times\mathbb{R} \mid \mathbf v(\chi) \le \lambda\}
\]

We recall that an extended function v is convex (resp. positively homogeneous) if and only if its epigraph is convex (resp. a cone), and that the epigraph of v is closed if and only if v is lower semicontinuous:

\[
\forall\,\chi\in X,\quad \mathbf v(\chi) = \liminf_{y\to\chi} \mathbf v(y)
\]
With these definitions, we can translate the viability/capturability conditions (1.5) into the following geometric form:

\[
\begin{cases}
\text{i)} & \forall\, t\in[0,t^{*}],\quad (T-t,\ \chi(t),\ y(t)) \in \mathcal{E}p(b) \quad \text{(viability constraint)}\\
\text{ii)} & (T-t^{*},\ \chi(t^{*}),\ y(t^{*})) \in \mathcal{E}p(c) \quad \text{(capturability of a target)}
\end{cases}
\tag{1.12}
\]

This "epigraphical approach", proposed by J.-J. Moreau and R.T. Rockafellar in convex analysis in the early 1960s^10, has been used in optimal control by H. Frankowska in a series of papers (Frankowska (1989a,b, 1993) and Aubin and Frankowska (1996)) for studying the value function of optimal control problems and characterizing it as a generalized solution (episolutions and/or viscosity solutions) of (first-order) Hamilton-Jacobi-Bellman equations; in Aubin (1981c), Aubin and Cellina (1984), and Aubin (1986, 1991) for characterizing and constructing Lyapunov functions; in Cardaliaguet (1994, 1996, 1997, 2000) for characterizing the minimal time function; in Pujal (2000) and Aubin, Pujal and Saint-Pierre (2001) in finance; and by other authors since. It is this approach that we adopt and adapt here, since the concepts of "capturability of
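The passage from (1.5) to (1.12) is nothing more than the definition of the epigraph: (T−t, χ(t), y(t)) ∈ Ep(b) if and only if y(t) ≥ b(T−t, χ(t)). A short sketch (our own, with the illustrative b of the time-span rule) makes the equivalence explicit:

```python
import math

def b(t, chi):                       # illustrative running constraint: u of the time-span rule
    return sum(chi) if t >= 0 else math.inf

def in_epigraph(fn, t, chi, y):      # (t, chi, y) in Ep(fn)  <=>  fn(t, chi) <= y
    return fn(t, chi) <= y

T, t, chi, y = 2.0, 0.5, [0.3, 0.4], 1.0
# Membership in the epigraph is exactly the running inequality of (1.5)i).
assert in_epigraph(b, T - t, chi, y) == (y >= b(T - t, chi))
```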
^10 See for instance Aubin and Frankowska (1990) and Rockafellar and Wets (1997), among many other references.

a target" and of "viability" of a constrained set allow us to study this problem in a new light (see for instance Aubin (1991, 1997) for economic applications): the evolution of the state of a tychastic control system subjected to viability constraints, in control theory and in dynamical games against nature or robust control (see Quincampoix (1992), Cardaliaguet (1994, 1996, 1997, 2000), and Cardaliaguet, Quincampoix and Saint-Pierre (1999)). Numerical algorithms for finding viability kernels have been designed in Saint-Pierre (1994) and adapted to our type of problems in Pujal (2000). The properties and characterizations of the valuation function are thus derived from those of guaranteed viable-capture basins, which are easier to study, and which have been studied, in the framework of plain constrained sets K and targets C ⊂ K (see Aubin (2001a, 2002) and Aubin and Catté (2002) for recent results on that topic).

6.2 Introducing auxiliary dynamical games

We observe that the evolution of (T−t, χ(t), y(t)), made up of the backward time τ(t) := T − t, of the fuzzy coalitions χ(t) of the players, of the imputations, and of the payoff y(t), is governed by the dynamical game

\[
\begin{cases}
\text{i)} & \tau'(t) = -1\\
\text{ii)} & \chi'(t) = f(\chi(t), v(t))\\
\text{iii)} & y'(t) = -y(t)\, m(\chi(t), p(t), v(t)) + \langle p(t), f(\chi(t), v(t))\rangle\\
& \text{where } p(t)\in P(\chi(t))\ \&\ v(t)\in Q(\chi(t))
\end{cases}
\tag{1.13}
\]

starting at (T, χ, y). We summarize it in the form of the dynamical game

\[
\begin{cases}
\text{i)} & z'(t) = g(z(t), u(t), v(t))\\
\text{ii)} & u(t)\in P(z(t))\ \&\ v(t)\in Q(z(t))
\end{cases}
\]

where z := (τ, χ, y) ∈ ℝ × ℝⁿ × ℝ, where the controls u := p are the imputations, and where the map g is defined by

\[
g(z, u, v) = \big({-1},\ f(\chi,v),\ -m(\chi,u,v)\, y + \langle u, f(\chi,v)\rangle\big),
\]

where u ranges over P(z) := P(χ) and v over Q(z) := Q(χ).
We say that a selection z → p(z) ∈ P(z) is a feedback, regarded as a strategy. One associates with such a feedback, chosen by the decision maker or the player, the evolutions governed by the tychastic differential equation

\[
z'(t) = g(z(t),\ p(z(t)),\ v(t))
\]

starting at time 0 at z.

6.3 Introducing guaranteed capture basins

We now define the guaranteed viable-capture basins that are involved in the definition of guaranteed valuation subsets.

Definition 1.4 Let K and C ⊂ K be two subsets of Z. The guaranteed viable-capture basin of the target C viable in K is the set of elements z ∈ K such that there exists a continuous feedback p(z) ∈ P(z) such that, for every v(·) ∈ Q(z(·)) and every solution z(·) to z′ = g(z, p(z), v), there exists t* ∈ ℝ₊ such that the viability/capturability conditions

i) ∀ t ∈ [0, t*], z(t) ∈ K
ii) z(t*) ∈ C
are satisfied.

6.4 The strategy

We thus observe that:

Proposition 1.1 The guaranteed valuation subset V is the guaranteed viable-capture basin, under the dynamical game (1.13), of the epigraph of the function c viable in the epigraph of the function b.
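To make Definition 1.4 concrete, here is a coarse grid sketch of a guaranteed viable-capture basin for a one-dimensional toy game. The dynamics, sets, and step sizes are our own illustrative choices (serious computations use the viability algorithms of Saint-Pierre (1994)); the "exists u, for all v" order of the loops matches the "exists feedback, for all tyches" quantifiers of the definition:

```python
import numpy as np

# Toy game:  x' = u + v,  u in {-1, +1} (controller),  v in {-0.5, +0.5} (tyche),
# constraint K = [0, 1], target C = [0.9, 1.0].  Backward iteration:
#   Capt_{n+1} = C  union  {x in K : exists u, forall v, x + dt*(u+v) in Capt_n}.
dt, xs = 0.01, np.linspace(0.0, 1.0, 201)
capt = (xs >= 0.9)                          # start from the target C

def nearest_in(basin, y):
    """Membership of y in the current basin via the nearest grid node (False outside K)."""
    if y < 0.0 or y > 1.0:
        return False
    return bool(basin[int(round(y * 200))])

for _ in range(500):                        # iterate until a fixed point is reached
    new = capt.copy()
    for i, x in enumerate(xs):
        if capt[i]:
            continue
        for u in (-1.0, 1.0):               # the controller picks u ...
            if all(nearest_in(capt, x + dt * (u + v)) for v in (-0.5, 0.5)):
                new[i] = True               # ... guaranteeing progress for every tyche v
                break
    if np.array_equal(new, capt):
        break
    capt = new

# The control dominates the tyche here, so the whole of K ends up in the basin.
assert capt.all()
```

Had the tyche been stronger than the control (say v ∈ [−2, 2]), the iteration would stop at the target itself, illustrating how the guaranteed basin shrinks with the size of the tychastic map.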
Since we have related the guaranteed valuation problem to the much simpler — although more abstract — study of guaranteed viablecapture basin of a target and other guaranteed viability/capturability issues for dynamical games, 1. we ﬁrst “solve” these “viability/capturability problems” for dynamical games at this general level, and in particular, study the tangential conditions enjoyed by the guaranteed viablecapture basins, 2. and use setvalued analysis and nonsmooth analysis for translating the general results of viability theory to the corresponding results of the auxiliary dynamical game, in particular translating tangential conditions to give a meaning to the concept of a generalized solution (Frankowska’s episolutions or, by duality, viscosity solutions) to HamiltonJacobiIsaacs variational inequalities. References
Aubin, J.-P. (2004). Dynamic core of fuzzy dynamical cooperative games. Annals of Dynamic Games, Ninth International Symposium on Dynamical Games and Applications, Adelaide, 2000.

Aubin, J.-P. (2002). Boundary-value problems for systems of Hamilton-Jacobi-Bellman inclusions with constraints. SIAM Journal on Control and Optimization, 41(2):425–456.

Aubin, J.-P. (2001a). Viability kernels and capture basins of sets under differential inclusions. SIAM Journal on Control and Optimization, 40:853–881.

Aubin, J.-P. (2001b). Regulation of the evolution of the architecture of a network by connectionist tensors operating on coalitions of actors. Preprint.

Aubin, J.-P. (1999). Mutational and Morphological Analysis: Tools for Shape Regulation and Morphogenesis. Birkhäuser.

Aubin, J.-P. (1998, 1993). Optima and Equilibria. Second edition, Springer-Verlag.

Aubin, J.-P. (1998a). Connectionist complexity and its evolution. In: Équations aux dérivées partielles, Articles dédiés à J.-L. Lions, pages 50–79, Elsevier.

Aubin, J.-P. (1998b). Minimal complexity and maximal decentralization. In: H.J. Beckmann, B. Johansson, F. Snickars, and D. Thord (eds.), Knowledge and Information in a Dynamic Economy, pages 83–104, Springer.

Aubin, J.-P. (1997). Dynamic Economic Theory: A Viability Approach. Springer-Verlag.

Aubin, J.-P. (1996). Neural Networks and Qualitative Physics: A Viability Approach. Cambridge University Press.

Aubin, J.-P. (1995). Learning as adaptive control of synaptic matrices. In: M. Arbib (ed.), The Handbook of Brain Theory and Neural Networks, pages 527–530, Bradford Books and MIT Press.

Aubin, J.-P. (1993). Beyond neural networks: Cognitive systems. In: J. Demongeot and Capasso (eds.), Mathematics Applied to Biology and Medicine, Wuerz, Winnipeg.

Aubin, J.-P. (1991). Viability Theory. Birkhäuser, Boston, Basel, Berlin.

Aubin, J.-P. (1986). A viability approach to Lyapunov's second method. In: A. Kurzhanski and K. Sigmund (eds.), Dynamical Systems, Lecture Notes in Economics and Mathematical Systems, 287:31–38, Springer-Verlag.

Aubin, J.-P. (1982). An alternative mathematical description of a player in game theory. IIASA WP-82-122.

Aubin, J.-P. (1981a). Cooperative fuzzy games. Mathematics of Operations Research, 6:1–13.

Aubin, J.-P. (1981b). Locally Lipschitz cooperative games. Journal of Mathematical Economics, 8:241–262.

Aubin, J.-P. (1981c). Contingent derivatives of set-valued maps and existence of solutions to nonlinear inclusions and differential inclusions. In: L. Nachbin (ed.), Advances in Mathematics, Supplementary Studies, pages 160–232.

Aubin, J.-P. (1979). Mathematical Methods of Game and Economic Theory. North-Holland, Studies in Mathematics and its Applications, 7:1–619.

Aubin, J.-P. and Burnod, Y. (1998). Hebbian Learning in Neural Networks with Gates. Cahiers du Centre de Recherche Viabilité, Jeux, Contrôle, #981.

Aubin, J.-P. and Catté, F. (2002). Bilateral fixed-points and algebraic properties of viability kernels and capture basins of sets. Set-Valued Analysis, 10(4):379–416.

Aubin, J.-P. and Cellina, A. (1984). Differential Inclusions. Springer-Verlag.

Aubin, J.-P. and Dordan, O. (1996). Fuzzy systems, viability theory and toll sets. In: H. Nguyen (ed.), Handbook of Fuzzy Systems, Modeling and Control, pages 461–488, Kluwer.

Aubin, J.-P. and Foray, D. (1998). The emergence of network organizations in processes of technological choice: A viability approach. In: P. Cohendet, P. Llerena, H. Stahn, and G. Umbhauer (eds.), The Economics of Networks, pages 283–290, Springer.

Aubin, J.-P. and Frankowska, H. (1990). Set-Valued Analysis. Birkhäuser, Boston, Basel, Berlin.

Aubin, J.-P. and Frankowska, H. (1996). The viability kernel algorithm for computing value functions of infinite horizon optimal control problems. Journal of Mathematical Analysis and Applications, 201:555–576.

Aubin, J.-P., Louis-Guerin, C., and Zavalloni, M. (1979). Compatibilité entre conduites sociales réelles dans les groupes et les représentations symboliques de ces groupes : un essai de formalisation mathématique. Mathématiques et Sciences Humaines, 68:27–61.

Aubin, J.-P., Pujal, D., and Saint-Pierre, P. (2001). Dynamic Management of Portfolios with Transaction Costs under Tychastic Uncertainty. Preprint.

Basile, A. (1993). Finitely additive nonatomic coalition production economies: Core-Walras equivalence. International Economic Review, 34:993–995.

Basile, A. (1994). Finitely additive correspondences. Proceedings of the American Mathematical Society, 121:883–891.

Basile, A. (1995). On the range of certain additive correspondences. Università di Napoli, (to appear).

Basile, A., De Simone, A., and Graziano, M.G. (1996). On the Aubin-like characterization of competitive equilibria in infinite-dimensional economies. Rivista di Matematica per le Scienze Economiche e Sociali, 19:187–213.

Bonneuil, N. (2001). History, differential inclusions, and narrative. History and Theory, Theme Issue 40, Agency after Postmodernism, pages 101–115, Wesleyan University.

Bonneuil, N. (1998a). Games, equilibria, and population regulation under viability constraints: An interpretation of the work of the anthropologist Fredrik Barth. Population: An English Selection, special issue on New Methods in Demography, pages 151–179.

Bonneuil, N. (1998b). Population paths implied by the mean number of pairwise nucleotide differences among mitochondrial sequences. Annals of Human Genetics, 62:61–73.

Bonneuil, N. (2005). Possible coalescent populations implied by pairwise comparisons of mitochondrial DNA sequences. Theoretical Population Biology, (to appear).

Bonneuil, N. (2000). Viability in dynamic social networks. Journal of Mathematical Sociology, 24:175–182.

Bonneuil, N. and Saint-Pierre, P. (2000). Protected polymorphism in the two-locus haploid model with unpredictable fitnesses. Journal of Mathematical Biology, 40:251–377.

Bonneuil, N. and Saint-Pierre, P. (1998). Domaine de victoire et stratégies viables dans le cas d'une correspondance non convexe : application à l'anthropologie des pêcheurs selon Fredrik Barth. Mathématiques et Sciences Humaines, 132:43–66.

Cardaliaguet, P. (1994). Domaines discriminants en jeux différentiels. Thèse de l'Université de Paris-Dauphine.

Cardaliaguet, P. (1996). A differential game with two players and one target. SIAM Journal on Control and Optimization, 34(4):1441–1460.

Cardaliaguet, P. (1997). On the regularity of semipermeable surfaces in control theory with application to the optimal exit-time problem (Part II). SIAM Journal on Control and Optimization, 35(5):1638–1652.

Cardaliaguet, P. (2000). Introduction à la théorie des jeux différentiels. Lecture Notes, Université Paris-Dauphine.

Cardaliaguet, P., Quincampoix, M., and Saint-Pierre, P. (1999). Set-valued numerical methods for optimal control and differential games. In: Stochastic and Differential Games: Theory and Numerical Methods, Annals of the International Society of Dynamic Games, pages 177–247, Birkhäuser.

Cornet, B. (1976). On planning procedures defined by multivalued differential equations. Systèmes dynamiques et modèles économiques (C.N.R.S.).

Cornet, B. (1983). An existence theorem of slow solutions for a class of differential inclusions. Journal of Mathematical Analysis and Applications, 96:130–147.

Filar, J.A. and Petrosjan, L.A. (2000). Dynamic cooperative games. International Game Theory Review, 2:47–65.

Florenzano, M. (1990). Edgeworth equilibria, fuzzy core and equilibria of a production economy without ordered preferences. Journal of Mathematical Analysis and Applications, 153:18–36.

Frankowska, H. (1987a). L'équation d'Hamilton-Jacobi contingente. Comptes Rendus de l'Académie des Sciences, Paris, 1(304):295–298.

Frankowska, H. (1987b). Optimal trajectories associated to a solution of contingent Hamilton-Jacobi equations. IEEE 26th CDC Conference, Los Angeles.

Frankowska, H. (1989a). Optimal trajectories associated to a solution of contingent Hamilton-Jacobi equations. Applied Mathematics and Optimization, 19:291–311.

Frankowska, H. (1989b). Hamilton-Jacobi equation: Viscosity solutions and generalized gradients. Journal of Mathematical Analysis and Applications, 141:21–26.

Frankowska, H. (1993). Lower semicontinuous solutions of Hamilton-Jacobi-Bellman equation. SIAM Journal on Control and Optimization, 31(1):257–272.

Haurie, A. (1975). On some properties of the characteristic function and core of a multistage game of coalitions. IEEE Transactions on Automatic Control, 20:238–241.

Henry, C. (1972). Differential equations with discontinuous right-hand side. Journal of Economic Theory, 4:545–551.

Mareš, M. (2001). Fuzzy Cooperative Games: Cooperation with Vague Expectations. Physica-Verlag.

Nishizaki, I. and Sakawa, M. (2001). Fuzzy and Multiobjective Games for Conflict Resolution. Physica-Verlag.

Petrosjan, L.A. (1996). The time consistency (dynamic stability) in differential games with a discount factor. Game Theory and Applications, pages 47–53, Nova Science Publishers, Commack, NY.

Petrosjan, L.A. and Zenkevitch, N.A. (1996). Game Theory. World Scientific, Singapore.

Pujal, D. (2000). Valuation et gestion dynamiques de portefeuilles. Thèse de l'Université de Paris-Dauphine.

Quincampoix, M. (1992). Differential inclusions and target problems. SIAM Journal on Control and Optimization, 30:324–335.

Rockafellar, R.T. and Wets, R. (1997). Variational Analysis. Springer-Verlag.

Saint-Pierre, P. (1994). Approximation of the viability kernel. Applied Mathematics & Optimization, 29:187–209.

Chapter 2

A DIRECT METHOD FOR OPEN-LOOP DYNAMIC GAMES FOR AFFINE CONTROL SYSTEMS
Dean A. Carlson
George Leitmann
Abstract   Recently, in Carlson and Leitmann (2004), some improvements were given on Leitmann's direct method, first presented for problems in the calculus of variations in Leitmann (1967) and extended to open-loop dynamic games in Dockner and Leitmann (2001). In these papers each player has its own state, which it controls with its own control inputs; that is, there is a state equation for each player. However, many applications involve the players competing for a single resource (e.g., two countries competing for a single species of fish). In this note we investigate the utility of the direct method for a class of games whose dynamics are described by a single equation for which the state dynamics are affine in the players' strategies. An illustrative example is also presented.

1. The direct method

In Carlson and Leitmann (2004) a direct method for finding open-loop Nash equilibria for a class of differential N-player games is presented. A particular case included in this study concerns the situation in which the j-th player's dynamics at any time t ∈ [t₀, t_f] is a vector-valued function t → x_j(t) ∈ ℝ^{n_j} that is described by an ordinary control system of the form

\[
\dot x_j(t) = f_j(t, x(t)) + g_j(t, x(t))\, u_j(t) \quad \text{a.e. } t_0 \le t \le t_f, \tag{2.1}
\]
\[
x_j(t_0) = x_{jt_0} \quad\text{and}\quad x_j(t_f) = x_{jt_f}, \tag{2.2}
\]

with control constraints

\[
u_j(t) \in U_j(t) \subset \mathbb{R}^{m_j} \quad \text{a.e. } t\in[t_0,t_f], \tag{2.3}
\]

and state constraints

\[
x(t) \in X(t) \subset \mathbb{R}^{n} \quad \text{for } t\in[t_0,t_f], \tag{2.4}
\]

in which, for each j = 1, 2, …, N, the function f_j(·,·) : [t₀,t_f] × ℝⁿ → ℝ^{n_j} is continuous, g_j(·,·) is a continuous n_j × m_j matrix-valued function having a left inverse, U_j(·) is a set-valued mapping, and X(t) is a given set in ℝⁿ for t ∈ [t₀,t_f]. Here we use the notation x = (x₁, x₂, …, x_N) ∈ ℝ^{n₁} × ℝ^{n₂} × ⋯ × ℝ^{n_N} = ℝⁿ, where n = n₁ + n₂ + ⋯ + n_N; similarly u = (u₁, u₂, …, u_N) ∈ ℝᵐ, m = m₁ + m₂ + ⋯ + m_N.
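Because g_j has a left inverse, the affine dynamics (2.1) determine u_j uniquely from the trajectory's velocity, which is what the variational reformulation exploits. A numerical sketch of this inversion with invented data, using the Moore-Penrose pseudoinverse (which coincides with a left inverse when the columns of g_j are independent):

```python
import numpy as np

# Affine dynamics for player j: xdot_j = f_j(t, x) + g_j(t, x) u_j, with g_j having
# a left inverse. Given a velocity, recover the control that produced it.
def f_j(t, x):
    return np.array([x[0], -x[1], 0.0])

def g_j(t, x):
    return np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # 3x2, full column rank

t, x = 0.0, np.array([1.0, 2.0, 0.5])
u_true = np.array([0.7, -0.2])
xdot = f_j(t, x) + g_j(t, x) @ u_true

# Left-inverse recovery of the control from the velocity.
u_rec = np.linalg.pinv(g_j(t, x)) @ (xdot - f_j(t, x))
assert np.allclose(u_rec, u_true)
```

When n_j = m_j and g_j is square and invertible, `pinv` reduces to the ordinary matrix inverse.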
Additionally we assume that the sets

\[
M_j = \{(t, x, u_j) \in [t_0,t_f]\times\mathbb{R}^{n}\times\mathbb{R}^{m_j} : u_j \in U_j(t)\}
\]

are closed and nonempty. The objective of each player is to minimize an objective function of the form

\[
J_j(x(\cdot), u_j(\cdot)) = \int_{t_0}^{t_f} f_j^{0}(t, x(t), u_j(t))\, dt, \tag{2.5}
\]

where we assume that for each j = 1, 2, …, N the function f⁰_j(·,·,·) : [t₀,t_f] × ℝⁿ × ℝ^{m_j} → ℝ is continuous.

With the above model description we now define the set of admissible trajectory-strategy pairs.

Definition 2.1 We say a pair of functions {x(·), u(·)} : [t₀,t_f] → ℝⁿ × ℝᵐ is an admissible trajectory-strategy pair iff t → x(t) is absolutely continuous on [t₀,t_f], t → u(t) is Lebesgue measurable on [t₀,t_f], for each j = 1, 2, …, N the relations (2.1)–(2.3) are satisfied, and for each j = 1, 2, …, N the functionals (2.5) are finite Lebesgue integrals.

Remark 2.1 For brevity we will refer to an admissible trajectory-strategy pair as an admissible pair. Also, for a given admissible pair {x(·), u(·)}, we will follow the traditional convention and refer to x(·) as an admissible trajectory and u(·) as an admissible strategy.
For a fixed j = 1, 2, …, N, x ∈ ℝⁿ, and y_j ∈ ℝ^{n_j}, we use the notation [xʲ, y_j] to denote the new vector in ℝⁿ in which x_j ∈ ℝ^{n_j} is replaced by y_j ∈ ℝ^{n_j}. That is,

\[
[x^{j}, y_j] := (x_1, x_2, \ldots, x_{j-1}, y_j, x_{j+1}, \ldots, x_N).
\]

Analogously, [uʲ, v_j] := (u₁, u₂, …, u_{j−1}, v_j, u_{j+1}, …, u_N) for all u ∈ ℝᵐ, v_j ∈ ℝ^{m_j}, and j = 1, 2, …, N. With this notation we now have the following two definitions.

Definition 2.2 Let j = 1, 2, …, N be fixed and let {x(·), u(·)} be an admissible pair. We say that the pair of functions {y_j(·), v_j(·)} : [t₀,t_f] → ℝ^{n_j} × ℝ^{m_j} is an admissible trajectory-strategy pair for player j relative to {x(·), u(·)} iff the pair {[x(·)ʲ, y_j(·)], [u(·)ʲ, v_j(·)]} is an admissible pair.

Definition 2.3 An admissible pair {x*(·), u*(·)} is a Nash equilibrium iff for each j = 1, 2, …, N and each pair {y_j(·), v_j(·)} that is admissible for player j relative to {x*(·), u*(·)}, it is the case that
\[
J_j(x^{*}(\cdot), u^{*}(\cdot)) = \int_{t_0}^{t_f} f_j^{0}(t, x^{*}(t), u_j^{*}(t))\, dt
\le \int_{t_0}^{t_f} f_j^{0}(t, [x^{*}(t)^{j}, y_j(t)], v_j(t))\, dt
= J_j([x^{*}(\cdot)^{j}, y_j(\cdot)], v_j(\cdot)).
\]

Our goal in this paper is to provide a "direct method" which in some cases will enable us to determine a Nash equilibrium. We point out that, relative to a fixed Nash equilibrium {x*(·), u*(·)}, each of the players in the above game solves an optimization problem taking the form of a standard problem of optimal control. Thus, under suitable additional assumptions, it is relatively easy to derive a set of necessary conditions (in the form of a Pontryagin-type maximum principle) that must be satisfied by all Nash equilibria. Unfortunately these conditions are only necessary and not sufficient. Further, it is well known that nonuniqueness is always a source of difficulty in dynamic games, so that in general the necessary conditions are not uniquely solvable (as is often the case in optimal control theory when sufficient convexity is imposed). Therefore it is important to be able to find usable sufficient conditions for Nash equilibria.

The associated variational game

We observe that, under our assumptions, the algebraic equations

\[
z_j = f_j(t,x) + g_j(t,x)\, u_j, \quad j = 1, 2, \ldots, N, \tag{2.6}
\]

can be solved for u_j in terms of t, z_j, and x to obtain

\[
u_j = g_j(t,x)^{-1}\big(z_j - f_j(t,x)\big), \quad j = 1, 2, \ldots, N, \tag{2.7}
\]

where g_j(t,x)^{−1} denotes the (left) inverse of the matrix g_j(t,x). As a consequence we can define the extended real-valued functions L_j(·,·,·) : [t₀,t_f] × ℝⁿ × ℝ^{n_j} → ℝ ∪ {+∞} as
\[
L_j(t, x, z_j) = f_j^{0}\big(t,\ x,\ g_j(t,x)^{-1}(z_j - f_j(t,x))\big) \tag{2.8}
\]

if g_j(t,x)^{−1}(z_j − f_j(t,x)) ∈ U_j(t), with L_j(t, x, z_j) = +∞ otherwise. With these functions we can consider the N-player variational game in which the objective functional for the j-th player is defined by

\[
I_j(x(\cdot)) = \int_{t_0}^{t_f} L_j(t, x(t), \dot x_j(t))\, dt. \tag{2.9}
\]

With this notation we have the following additional definitions.

Definition 2.4 An absolutely continuous function x(·) : [t₀,t_f] → ℝⁿ is said to be admissible for the variational game iff it satisfies the boundary conditions given in equation (2.2) and the map t → L_j(t, x(t), ẋ_j(t)) is finitely Lebesgue integrable on [t₀,t_f] for each j = 1, 2, …, N.

Definition 2.5 Let x(·) : [t₀,t_f] → ℝⁿ be admissible for the variational game and let j ∈ {1, 2, …, N} be fixed. We say that y_j(·) : [t₀,t_f] → ℝ^{n_j} is admissible for player j relative to x(·) iff [xʲ(·), y_j(·)] is admissible for the variational game.

Definition 2.6 We say that x*(·) : [t₀,t_f] → ℝⁿ is a Nash equilibrium for the variational game iff for each j = 1, 2, …, N,
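The extended integrand (2.8) is straightforward to code: substitute the control recovered from the velocity into f⁰_j and return +∞ when that control is infeasible. A scalar sketch with invented f_j, g_j, U_j, and f⁰_j:

```python
import math

# Scalar toy data: xdot = f(t, x) + g(t, x) u with g(t, x) != 0, u constrained to U(t).
f  = lambda t, x: -x
g  = lambda t, x: 2.0
U  = lambda t: (-1.0, 1.0)                   # u must lie in [-1, 1]
f0 = lambda t, x, u: u * u + x * x           # running cost

def L(t, x, z):
    """Extended integrand (2.8): f0 evaluated at the control realizing velocity z,
    +inf when that control violates the constraint."""
    u = (z - f(t, x)) / g(t, x)              # scalar version of g^{-1}(z - f)
    lo, hi = U(t)
    return f0(t, x, u) if lo <= u <= hi else math.inf

assert L(0.0, 1.0, -1.0) == 1.0              # u = (-1 - (-1))/2 = 0 -> cost 0 + 1
assert L(0.0, 1.0, 4.0) == math.inf          # u = (4 + 1)/2 = 2.5, outside [-1, 1]
```

The +∞ value is what transfers the control constraint (2.3) into the unconstrained-looking variational functional (2.9).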
\[
I_j(x^{*}(\cdot)) \le I_j([x^{*j}(\cdot), y_j(\cdot)])
\]

for all functions y_j(·) : [t₀,t_f] → ℝ^{n_j} that are admissible for player j relative to x*(·).

Clearly the variational game and our original game are related. In particular we have the following theorem, given in Carlson and Leitmann (2004).

Theorem 2.1 Let x*(·) be a Nash equilibrium for the variational game defined above. Then there exists a measurable function u*(·) : [t₀,t_f] → ℝᵐ such that the pair {x*(·), u*(·)} is an admissible trajectory-strategy pair for the original dynamic game. Moreover, it is a Nash equilibrium for the original game as well.
Proof. See Carlson and Leitmann (2004), Theorem 7.1. □

Remark 2.2 The above result holds in a much more general setting than indicated above. We chose the restricted setting since it is sufficient for our needs in the analysis of the model we will consider in the next section.

With the above result we now focus our attention on the variational game. In 1967, for the case of one-player variational games (i.e., the calculus of variations), Leitmann (1967) presented a technique (the "direct method") for determining solutions of these games by comparing their solutions to those of an equivalent problem whose solution is more easily determined than that of the original game. This equivalence was obtained through a coordinate transformation. Since then this method has been used successfully to solve a variety of problems. Recently, Carlson (2002) presented an extension of this method that expands the utility of the approach, as well as making some useful comparisons with a technique originally presented by Carathéodory in the early twentieth century (see Carathéodory (1982)). Also, Dockner and Leitmann (2001) extended the original direct method to include the case of open-loop dynamic games. Finally, Carlson's extension of the method was also modified, in Leitmann (2004) and Carlson and Leitmann (2004), to include the case of open-loop differential games.

We begin by stating the following lemma, found in Carlson and Leitmann (2004).

Lemma 2.1 Let x_j = z_j(t, x̃_j) be a transformation of class C¹ having a unique inverse x̃_j = z̃_j(t, x_j) for all t ∈ [t₀,t_f], such that there is a one-to-one correspondence x(t) ⇔ x̃(t) for all admissible trajectories x(·) satisfying the boundary conditions (2.2) and for all x̃(·) satisfying
$$\tilde{x}_j(t_0) = z_j(t_0, x_{0j}) \quad\text{and}\quad \tilde{x}_j(t_f) = z_j(t_f, x_{t_f j})$$

for all $j = 1, 2, \ldots, N$. Furthermore, for each $j = 1, 2, \ldots, N$ let $\tilde{L}_j(\cdot,\cdot,\cdot) : [t_0, t_f] \times \mathbb{R}^n \times \mathbb{R}^{n_j} \to \mathbb{R}$ be a given integrand. For a given admissible $x^*(\cdot) : [t_0, t_f] \to \mathbb{R}^n$ suppose the transformations $\tilde{x}_j = z_j(t, x_j)$ are such that there exists a $C^1$ function $H_j(\cdot,\cdot) : [t_0, t_f] \times \mathbb{R}^{n_j} \to \mathbb{R}$ so that the functional identity

$$L_j\big(t, [x^*_j(t), x_j(t)], \dot{x}_j(t)\big) - \tilde{L}_j\big(t, [x^*_j(t), \tilde{x}_j(t)], \dot{\tilde{x}}_j(t)\big) = \frac{d}{dt} H_j(t, \tilde{x}_j(t)) \tag{2.10}$$

holds on $[t_0, t_f]$. If $\tilde{x}^*_j(\cdot)$ yields an extremum of $\tilde{I}_j([x^*_j(\cdot), \cdot])$ with $\tilde{x}^*_j(\cdot)$ satisfying the transformed boundary conditions, then $x^*_j(\cdot)$ with $x^*_j(t) = \tilde{z}_j(t, \tilde{x}^*_j(t))$ yields an extremum for $I_j([x^*_j(\cdot), \cdot])$ with the boundary conditions (2.2). Moreover, the function $x^*(\cdot)$ is an open-loop Nash equilibrium for the variational game.

Proof. See Carlson and Leitmann (2004), Lemma 5.1. □

This lemma has three useful corollaries, which we state below.

Corollary 2.1 The existence of $H_j(\cdot,\cdot)$ in (2.10) implies that the following identities hold for $(t, \tilde{x}_j) \in (t_0, t_f) \times \mathbb{R}^{n_j}$ and for $j = 1, 2, \ldots, N$:
$$L_j\Big(t, [x^*_j(t), \tilde{z}_j(t,\tilde{x}_j)], \frac{\partial \tilde{z}_j}{\partial t}(t,\tilde{x}_j) + \nabla_{\tilde{x}_j}\tilde{z}_j(t,\tilde{x}_j)\,\tilde{p}_j\Big) - \tilde{L}_j\big(t, [x^*_j(t), \tilde{x}_j], \tilde{p}_j\big) \equiv \frac{\partial H_j(t,\tilde{x}_j)}{\partial t} + \big\langle \nabla_{\tilde{x}_j} H_j(t,\tilde{x}_j),\, \tilde{p}_j \big\rangle, \tag{2.11}$$

in which $\nabla_{\tilde{x}_j} H_j(\cdot,\cdot)$ denotes the gradient of $H_j(\cdot,\cdot)$ with respect to the variables $\tilde{x}_j$ and $\langle\cdot,\cdot\rangle$ denotes the usual scalar or inner product in $\mathbb{R}^{n_j}$.

Corollary 2.2 For each $j = 1, 2, \ldots, N$ the left-hand side of the identity (2.11) is linear in $\tilde{p}_j$; that is, it is of the form

$$\theta_j(t, \tilde{x}_j) + \big\langle \psi_j(t, \tilde{x}_j),\, \tilde{p}_j \big\rangle,$$

and

$$\frac{\partial H_j(t, \tilde{x}_j)}{\partial t} = \theta_j(t, \tilde{x}_j) \quad\text{and}\quad \nabla_{\tilde{x}_j} H_j(t, \tilde{x}_j) = \psi_j(t, \tilde{x}_j)$$

on $[t_0, t_f] \times \mathbb{R}^{n_j}$.

Corollary 2.3 For integrands $L_j(\cdot,\cdot,\cdot)$ of the form

$$L_j\big(t, [x^*_j(t), x_j(t)], \dot{x}_j(t)\big) = \dot{x}_j(t)^{\top} a_j\big(t, [x^*_j(t), x_j(t)]\big)\dot{x}_j(t) + b_j\big(t, [x^*_j(t), x_j(t)]\big)^{\top}\dot{x}_j(t) + c_j\big(t, [x^*_j(t), x_j(t)]\big)$$

and

$$\tilde{L}_j\big(t, [x^*_j(t), \tilde{x}_j(t)], \dot{\tilde{x}}_j(t)\big) = \dot{\tilde{x}}_j(t)^{\top} \alpha_j\big(t, [x^*_j(t), \tilde{x}_j(t)]\big)\dot{\tilde{x}}_j(t) + \beta_j\big(t, [x^*_j(t), \tilde{x}_j(t)]\big)^{\top}\dot{\tilde{x}}_j(t) + \gamma_j\big(t, [x^*_j(t), \tilde{x}_j(t)]\big),$$

with $a_j(t, [x^*_j(t), x_j(t)]) \neq 0$ and $\alpha_j(t, [x^*_j(t), \tilde{x}_j(t)]) \neq 0$, the class of transformations that permit us to obtain (2.11) must satisfy

$$\Big(\frac{\partial \tilde{z}_j}{\partial \tilde{x}_j}(t, \tilde{x}_j)\Big)^{\!\top} a_j\big(t, [x^*_j(t), \tilde{z}_j(t, \tilde{x}_j)]\big)\, \Big(\frac{\partial \tilde{z}_j}{\partial \tilde{x}_j}(t, \tilde{x}_j)\Big) = \alpha_j\big(t, [x^*_j(t), \tilde{x}_j]\big)$$

for $(t, \tilde{x}_j) \in [t_0, t_f] \times \mathbb{R}^{n_j}$.

A class of dynamic games to which the above method has not been applied is that in which there is a single state equation which is controlled by all of the players. A simple example of such a problem is the competitive harvesting of a renewable resource (e.g., a single-species fishery model). In the next section we show how the direct method described above can be applied to a class of these models.

2. The model

Consider an $N$-player game where a single state $x(t) \in \mathbb{R}^n$ satisfies an ordinary control system of the form
$$\dot{x}(t) = F(t, x(t)) + \sum_{i=1}^N G_i(t, x(t))u_i(t) \quad \text{a.e. } t_0 \le t \le t_f, \tag{2.12}$$

with initial and terminal conditions

$$x(t_0) = x_{t_0} \quad\text{and}\quad x(t_f) = x_{t_f}, \tag{2.13}$$

a fixed state constraint,

$$x(t) \in X(t) \subset \mathbb{R}^n \quad \text{for } t_0 \le t \le t_f, \tag{2.14}$$

with $X(t)$ a convex set for each $t_0 \le t \le t_f$, and control constraints,

$$u_i(t) \in U_i(t) \subset \mathbb{R}^{m_i} \quad \text{a.e. } t_0 \le t \le t_f, \quad i = 1, 2, \ldots, N. \tag{2.15}$$

In this system each player has a strategy, $u_i(\cdot)$, which influences the state variable $x(\cdot)$ over time.

Definition 2.7 A set of functions

$$\{x(\cdot), u(\cdot)\} \doteq \{x(\cdot), u_1(\cdot), u_2(\cdot), \ldots, u_N(\cdot)\}$$

defined for $t_0 \le t \le t_f$ is called an admissible trajectory-strategy pair iff $x(\cdot)$ is absolutely continuous on its domain, $u(\cdot)$ is Lebesgue measurable on its domain, and the equations (2.12)–(2.15) are satisfied.

We assume that $F(\cdot,\cdot) : [t_0, +\infty) \times \mathbb{R}^n \to \mathbb{R}^n$ and $G_i(\cdot,\cdot) : [t_0, +\infty) \times \mathbb{R}^n \to \mathbb{R}^{n \times m_i}$ are sufficiently smooth so that for each selection of strategies $u(\cdot)$ (i.e., measurable functions) the initial value problem given by (2.12)–(2.13) has a unique solution $x_u(\cdot)$. These conditions can be made more explicit for particular models and are not unduly restrictive. For brevity we do not indicate them explicitly.

Each of the players in the dynamic game wishes to minimize a performance criterion of the form

$$J_j(x(\cdot), u_j(\cdot)) = \int_{t_0}^{t_f} f_j(t, x(t), u_j(t))\,dt, \quad j = 1, 2, \ldots, N, \tag{2.16}$$

in which we assume that $f_j(\cdot,\cdot,\cdot) : [t_0, t_f] \times \mathbb{R}^n \times \mathbb{R}^{m_j} \to \mathbb{R}$ is continuous.

To place the above dynamic game into a form amenable to the direct method, consider a set of strictly positive weights, say $\alpha_i > 0$, $i = 1, 2, \ldots, N$, which satisfy $\sum_{i=1}^N \alpha_i = 1$, and consider the related ordinary control system

$$\dot{x}_i(t) = F\Big(t, \sum_{j=1}^N \alpha_j x_j(t)\Big) + \frac{1}{\alpha_i}\, G_i\Big(t, \sum_{j=1}^N \alpha_j x_j(t)\Big) u_i(t) \quad \text{a.e. } t \ge t_0, \tag{2.17}$$

for $i = 1, 2, \ldots, N$, with boundary conditions

$$x_i(t_0) = x_{t_0} \quad\text{and}\quad x_i(t_f) = x_{t_f}, \quad i = 1, 2, \ldots, N, \tag{2.18}$$

and control constraints and state constraints,

$$u_i(t) \in U_i(t) \subset \mathbb{R}^{m_i} \quad \text{a.e. } t_0 \le t \le t_f, \quad i = 1, 2, \ldots, N, \tag{2.19}$$

$$x_i(t) \in X_i(t) \doteq X(t) \subset \mathbb{R}^n \quad \text{for } t_0 \le t \le t_f, \quad i = 1, 2, \ldots, N. \tag{2.20}$$

Definition 2.8 A set of functions

$$\{\boldsymbol{x}(\cdot), u(\cdot)\} \doteq \{x_1(\cdot), x_2(\cdot), \ldots, x_N(\cdot), u_1(\cdot), u_2(\cdot), \ldots, u_N(\cdot)\}$$
defined for $t_0 \le t \le t_f$ is called an admissible trajectory-strategy pair for the related system iff $\boldsymbol{x}(\cdot) : [t_0, +\infty) \to \mathbb{R}^{\bar{n}}$, where $\bar{n} = nN$, is absolutely continuous on its domain, $u(\cdot) : [t_0, +\infty) \to \mathbb{R}^m$, where $m = m_1 + m_2 + \cdots + m_N$, is Lebesgue measurable on its domain, and the equations (2.17)–(2.19) are satisfied.

For this related system it is easy to see that the conditions guaranteeing uniqueness for the original system also ensure the existence of the solution $\boldsymbol{x}(\cdot)$ for a fixed set of strategies $u_i(\cdot)$.

Proposition 2.1 Let $\{\boldsymbol{x}(\cdot), u(\cdot)\}$ be an admissible trajectory-strategy pair for the related control system. Then the pair $\{x(\cdot), u(\cdot)\}$, with $x(t) \doteq \sum_{i=1}^N \alpha_i x_i(t)$, is an admissible trajectory-strategy pair for the original control system. Conversely, if $\{x(\cdot), u(\cdot)\}$ is an admissible trajectory-strategy pair for the original control system, then there exists a function $\boldsymbol{x}(\cdot) \doteq (x_1(\cdot), \ldots, x_N(\cdot))$ such that $x(t) = \sum_{i=1}^N \alpha_i x_i(t)$ for $t_0 \le t \le t_f$ and $\{\boldsymbol{x}(\cdot), u(\cdot)\}$ is an admissible trajectory-strategy pair for the related control system.
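Before turning to the proof, the first claim of Proposition 2.1, that averaging the related-system trajectories with the weights $\alpha_i$ reproduces a trajectory of the original system, can be sanity-checked numerically. The toy instance below ($F$, $G_i$, and all parameter values) is our own illustrative choice, not from the chapter:

```python
# Sanity check of Proposition 2.1 on a toy scalar instance: the weighted
# average of the related-system states (2.17) follows the original
# dynamics (2.12). All choices below are illustrative, not from the text.
F = lambda t, x: -x                    # toy drift
G = [lambda t, x: 1.0, lambda t, x: 1.0]
u = [0.3, 0.7]                         # constant strategies
alpha = [0.4, 0.6]                     # positive weights summing to 1
x0, h, steps = 1.0, 1e-4, 20000        # Euler scheme on [0, 2]

xi = [x0, x0]                          # related system: x_i(t0) = x_{t0}
x = x0                                 # original system (2.12)
t = 0.0
for _ in range(steps):
    xb = sum(a * z for a, z in zip(alpha, xi))   # sum_i alpha_i x_i(t)
    xi = [z + h * (F(t, xb) + G[i](t, xb) * u[i] / alpha[i])
          for i, z in enumerate(xi)]
    x = x + h * (F(t, x) + sum(G[i](t, x) * u[i] for i in range(2)))
    t += h

x_bar = sum(a * z for a, z in zip(alpha, xi))
print(abs(x_bar - x))                  # agreement up to floating-point error
```

Since $\sum_i \alpha_i = 1$, the Euler update of the weighted average coincides step by step with the Euler update of the original system, which is exactly the computation carried out in the proof below.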
Proof. We begin by first letting $\{\boldsymbol{x}(\cdot), u(\cdot)\}$ be an admissible trajectory-strategy pair for the related control system. Then, defining $x(t) = \sum_{i=1}^N \alpha_i x_i(t)$ for $t_0 \le t \le t_f$, we observe that

$$\dot{x}(t) = \sum_{i=1}^N \alpha_i \dot{x}_i(t) = \sum_{i=1}^N \alpha_i \Big[ F(t, x(t)) + \frac{1}{\alpha_i} G_i(t, x(t)) u_i(t) \Big] = \sum_{i=1}^N \alpha_i F(t, x(t)) + \sum_{i=1}^N G_i(t, x(t)) u_i(t) = F(t, x(t)) + \sum_{i=1}^N G_i(t, x(t)) u_i(t),$$

since $\sum_{i=1}^N \alpha_i = 1$. Further, we also have that

$$x(t_0) = \sum_{i=1}^N \alpha_i x_{t_0} = x_{t_0},$$
$$u_j(t) \in U_j(t) \quad \text{for almost all } t_0 \le t \le t_f \text{ and } j = 1, 2, \ldots, N,$$
$$x_j(t) \in X(t) \quad \text{for } t_0 \le t \le t_f \text{ and } j = 1, 2, \ldots, N,$$

implying that $\{x(\cdot), u(\cdot)\}$ is an admissible trajectory-strategy pair. Now assume that $\{x(\cdot), u(\cdot)\}$ is an admissible trajectory-strategy pair for the original dynamical system (2.12)–(2.15) and consider the system of differential equations given by (2.17) with the initial conditions (2.18). By our hypotheses this system has a unique solution $\boldsymbol{x}(\cdot) : [t_0, +\infty) \to \mathbb{R}^{nN}$. Furthermore, from the above computation we know that the function $y(\cdot) \doteq \sum_{i=1}^N \alpha_i x_i(\cdot)$, along with the strategies $u(\cdot)$, satisfies the differential equation (2.12) as well as the initial condition (2.13). However, this initial value problem has a unique solution, namely $x(\cdot)$, so that we must have $y(t) \equiv x(t)$ for all $t_0 \le t \le t_f$. Further, the constraints (2.19) and (2.20) hold as well. Hence $\{\boldsymbol{x}(\cdot), u(\cdot)\}$ is an admissible trajectory-strategy pair for the related system, as desired. □

In light of the above result it is clear that to use the direct method to solve the dynamic game described by (2.12)–(2.16) we consider the game described by the dynamic equations (2.17)–(2.19), where now the objective for player $j$, $j = 1, 2, \ldots, N$, is given as

$$J_j(\boldsymbol{x}(\cdot), u_j(\cdot)) = \int_{t_0}^{t_f} f_j\Big(t, \sum_{i=1}^N \alpha_i x_i(t), u_j(t)\Big)\,dt. \tag{2.21}$$

In the next section we demonstrate this process with an example from mathematical economics.

Remark 2.3 In solving constrained optimization or dynamic game problems, one of the biggest difficulties is finding reasonable candidates for the solution that meet the constraints. Perhaps the most often used method is to solve the unconstrained problem and hope that its solution satisfies the constraints. To understand why this technique works, we observe that, whether in a game or in an optimization problem, the set of admissible trajectory-strategy pairs that satisfy the constraints is a subset of the set of all admissible pairs for the problem without constraints. Consequently, if one can find an admissible trajectory-strategy pair which is an optimal (or Nash equilibrium) solution for the problem without constraints (say via the direct method for the unconstrained problem), and if additionally it actually satisfies the constraints, then one indeed has a solution for the original problem with constraints. It is this technique that is used in the next section to obtain the Nash equilibrium.

3. Example

We consider two firms which produce an identical product. The production cost for each firm is given by the total cost function

$$C(u_j) = \frac{1}{2}u_j^2, \quad j = 1, 2,$$

in which $u_j$ refers to the $j$th firm's production level. Each firm supplies all that it produces to the market at all times. The amount supplied at each time affects the price, $P(t)$, and the total inventory of the market determines the price according to the ordinary control system

$$\dot{P}(t) = s[a - u_1(t) - u_2(t) - P(t)] \quad \text{a.e. } t \in [t_0, t_f]. \tag{2.22}$$

Here $s > 0$ refers to the speed at which the price adjusts to the price corresponding to the total quantity supplied (i.e., $u_1(t) + u_2(t)$). The model assumes a linear demand rate given by $\Pi = a - X$, where $X$ denotes total supply related to a price $P$.
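The adjustment dynamics (2.22) can be simulated directly: under constant production rates the price relaxes exponentially toward $a - u_1 - u_2$. A minimal sketch (all numbers are illustrative choices, not from the chapter):

```python
# Minimal Euler simulation of the price dynamics (2.22) under constant
# production rates; parameter values are illustrative, not from the text.
s, a = 2.0, 10.0            # adjustment speed and demand intercept
u1, u2 = 2.0, 3.0           # constant production levels
P, h = 0.0, 1e-3            # initial price and Euler step
for _ in range(20000):      # integrate over t in [0, 20]
    P += h * s * (a - u1 - u2 - P)
print(round(P, 3))          # relaxes to the quasi-steady price a - u1 - u2 = 5.0
```

With time-varying strategies the same loop applies with $u_1(t), u_2(t)$ evaluated at each step; the fixed endpoint condition $P(t_f) = P_f$ of the game is what forces the nontrivial boundary-value analysis that follows.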
Thus the dynamics above say that the rate of change of the price at time $t$ is proportional to the difference between the actual price $P(t)$ and the idealized price $\Pi(t) = a - u_1(t) - u_2(t)$. We assume that (through negotiation, perhaps) the firms have agreed to move from the price $P_0$ at time $t_0$ to a price $P_f$ at time $t_f$. This leads to the boundary conditions

$$P(t_0) = P_0 \quad\text{and}\quad P(t_f) = P_f. \tag{2.23}$$

Additionally we also impose the constraints

$$u_j(t) \ge 0 \quad \text{for almost all } t \in [t_0, t_f], \tag{2.24}$$

$$P(t) \ge 0 \quad \text{for } t \in [t_0, t_f]. \tag{2.25}$$

The goal of each firm is to maximize its accumulated profit over the interval $[t_0, t_f]$, assuming that it sells all that it produces, given by the integral functional

$$J_j(P(\cdot), u_j(\cdot)) = \int_{t_0}^{t_f} \Big( P(t)u_j(t) - \frac{1}{2}u_j^2(t) \Big)\,dt. \tag{2.26}$$

To put the above dynamic game into the framework of the direct method, let $\alpha, \beta > 0$ satisfy $\alpha + \beta = 1$ and consider the ordinary two-dimensional control system

$$\dot{x}(t) = -s(\alpha x(t) + \beta y(t) - a) - \frac{s}{\alpha}u_1(t) \quad \text{a.e. } t_0 \le t \le t_f, \tag{2.27}$$

$$\dot{y}(t) = -s(\alpha x(t) + \beta y(t) - a) - \frac{s}{\beta}u_2(t) \quad \text{a.e. } t_0 \le t \le t_f, \tag{2.28}$$

with the boundary conditions

$$x(t_0) = y(t_0) = P_0, \tag{2.29}$$

$$x(t_f) = y(t_f) = P_f, \tag{2.30}$$

and of course the control constraints given by (2.24) and state constraints (2.25). The payoffs for each of the players now become

$$J_j(x(\cdot), y(\cdot), u_j(\cdot)) = \int_{t_0}^{t_f} \Big( (\alpha x(t) + \beta y(t))u_j(t) - \frac{1}{2}u_j(t)^2 \Big)\,dt \tag{2.31}$$

for $j = 1, 2$. This gives a dynamic game to which the direct method can be applied.

We now put the above game in the equivalent variational form by solving the dynamic equations (2.27) and (2.28) for the individual strategies. That is, we have

$$u_1 = \alpha\Big(a - (\alpha x + \beta y) - \frac{1}{s}\dot{x}\Big), \tag{2.32}$$

$$u_2 = \beta\Big(a - (\alpha x + \beta y) - \frac{1}{s}\dot{y}\Big), \tag{2.33}$$

which gives (after a number of elementary algebraic steps) the new objectives (with a negative sign, to pose the variational problems as minimization problems)

$$J_1(x(\cdot), y(\cdot), \dot{x}(\cdot)) = \int_{t_0}^{t_f} \bigg[ \frac{\alpha^2}{2s^2}\dot{x}(t)^2 + \frac{\alpha^2 a^2}{2} + \Big(\frac{\alpha^2}{2} + \alpha\Big)(\alpha x(t) + \beta y(t))^2 + \Big( \frac{\alpha^2}{s}(\alpha x(t) + \beta y(t)) - \frac{\alpha}{s}\big(\alpha a - (\alpha x(t) + \beta y(t))\big) \Big)\dot{x}(t) - a(\alpha^2 + \alpha)(\alpha x(t) + \beta y(t)) \bigg]\,dt \tag{2.34}$$

and

$$J_2(x(\cdot), y(\cdot), \dot{y}(\cdot)) = \int_{t_0}^{t_f} \bigg[ \frac{\beta^2}{2s^2}\dot{y}(t)^2 + \frac{\beta^2 a^2}{2} + \Big(\frac{\beta^2}{2} + \beta\Big)(\alpha x(t) + \beta y(t))^2 + \Big( \frac{\beta^2}{s}(\alpha x(t) + \beta y(t)) - \frac{\beta}{s}\big(\beta a - (\alpha x(t) + \beta y(t))\big) \Big)\dot{y}(t) - a(\beta^2 + \beta)(\alpha x(t) + \beta y(t)) \bigg]\,dt. \tag{2.35}$$

For the remainder of our discussion we focus on the first player, as the computation for the second player is the same. We begin by observing that the integrand for player 1 is

$$L_1(x, y, p) = \frac{\alpha^2}{2s^2}p^2 + \frac{\alpha^2 a^2}{2} + \Big(\frac{\alpha^2}{2} + \alpha\Big)(\alpha x + \beta y)^2 + \Big( \frac{\alpha^2}{s}(\alpha x + \beta y) - \frac{\alpha}{s}\big(\alpha a - (\alpha x + \beta y)\big) \Big)p - a(\alpha^2 + \alpha)(\alpha x + \beta y).$$

Inspecting this integrand we choose $\tilde{L}_1(\cdot,\cdot,\cdot)$ to be

$$\tilde{L}_1(\tilde{x}, \tilde{y}, \tilde{p}) = \frac{\alpha^2}{2s^2}\tilde{p}^2 + \frac{\alpha^2 a^2}{2},$$

from which we immediately deduce, applying Corollary 2.3, that the appropriate transformation $z_1(\cdot,\cdot)$ (here written so that $x = z_1(t,\tilde{x})$) must satisfy the partial differential equation

$$\Big(\frac{\partial z_1}{\partial \tilde{x}}\Big)^2 = 1,$$

giving us that $z_1(t, \tilde{x}) = f(t) \pm \tilde{x}$ and that

$$\frac{\partial z_1}{\partial t} + \frac{\partial z_1}{\partial \tilde{x}}\tilde{p} = \dot{f}(t) \pm \tilde{p}. \tag{2.36}$$

From this we now compute

$$\Delta L_1 = L_1\big(f(t) \pm \tilde{x},\, y^*(t),\, \dot{f}(t) \pm \tilde{p}\big) - \tilde{L}_1\big(\tilde{x},\, y^*(t),\, \tilde{p}\big).$$

Writing $Q = \alpha(f(t) \pm \tilde{x}) + \beta y^*(t)$ for brevity, the quadratic terms in $\tilde{p}$ cancel and

$$\Delta L_1 = \frac{\alpha^2}{2s^2}\big(\dot{f}(t)^2 \pm 2\dot{f}(t)\tilde{p}\big) + \Big(\frac{\alpha^2}{2}+\alpha\Big)Q^2 + \Big(\frac{\alpha^2}{s}Q - \frac{\alpha}{s}(\alpha a - Q)\Big)\big(\dot{f}(t) \pm \tilde{p}\big) - a(\alpha^2+\alpha)Q = \frac{\partial H_1(t, \tilde{x})}{\partial t} + \frac{\partial H_1(t, \tilde{x})}{\partial \tilde{x}}\,\tilde{p},$$

so that we identify

$$\frac{\partial H_1}{\partial t} = \frac{\alpha^2}{2s^2}\dot{f}(t)^2 + \Big(\frac{\alpha^2}{2}+\alpha\Big)Q^2 + \Big(\frac{\alpha^2}{s}Q - \frac{\alpha}{s}(\alpha a - Q)\Big)\dot{f}(t) - a(\alpha^2+\alpha)Q, \qquad \frac{\partial H_1}{\partial \tilde{x}} = \pm\Big[\frac{\alpha^2}{s^2}\dot{f}(t) + \frac{\alpha^2}{s}Q - \frac{\alpha}{s}(\alpha a - Q)\Big].$$

From this we compute the mixed partial derivatives, using $\partial Q/\partial\tilde{x} = \pm\alpha$ and $\partial Q/\partial t = \alpha\dot{f}(t) + \beta\dot{y}^*(t)$, to obtain

$$\frac{\partial^2 H_1(t, \tilde{x})}{\partial \tilde{x}\,\partial t} = \pm\Big[\alpha^2(\alpha+2)Q + \frac{\alpha^2}{s}(\alpha+1)\dot{f}(t) - a\alpha^2(\alpha+1)\Big]$$

and

$$\frac{\partial^2 H_1(t, \tilde{x})}{\partial t\,\partial \tilde{x}} = \pm\Big[\frac{\alpha^2}{s^2}\ddot{f}(t) + \frac{\alpha^2}{s}(\alpha+1)\dot{f}(t) + \frac{\alpha\beta}{s}(\alpha+1)\dot{y}^*(t)\Big].$$

Assuming sufficient smoothness and equating the mixed partial derivatives, we obtain (after multiplying through by $s^2/\alpha^2$) the following equation:

$$\ddot{f}(t) - \alpha s^2(\alpha+2)f(t) = \beta s^2(\alpha+2)y^*(t) - \frac{\beta s}{\alpha}(\alpha+1)\dot{y}^*(t) \pm \alpha s^2(\alpha+2)\tilde{x} - a s^2(\alpha+1).$$

A similar analysis for player 2 yields

$$L_2(x, y, q) = \frac{\beta^2}{2s^2}q^2 + \frac{\beta^2 a^2}{2} + \Big(\frac{\beta^2}{2}+\beta\Big)(\alpha x+\beta y)^2 + \Big(\frac{\beta^2}{s}(\alpha x+\beta y) - \frac{\beta}{s}\big(\beta a - (\alpha x+\beta y)\big)\Big)q - a(\beta^2+\beta)(\alpha x+\beta y),$$

and so choosing

$$\tilde{L}_2(\tilde{x}, \tilde{y}, \tilde{q}) = \frac{\beta^2}{2s^2}\tilde{q}^2 + \frac{\beta^2 a^2}{2} \tag{2.37}$$

gives us that the transformation $z_2(\cdot,\cdot)$ is obtained by solving the partial differential equation

$$\Big(\frac{\partial z_2}{\partial \tilde{y}}\Big)^2 = 1,$$

which of course gives us $z_2(t, \tilde{y}) = g(t) \pm \tilde{y}$. Proceeding as above, we arrive at the following differential equation for $g(\cdot)$:

$$\ddot{g}(t) - \beta s^2(\beta+2)g(t) = \alpha s^2(\beta+2)x^*(t) - \frac{\alpha s}{\beta}(1+\beta)\dot{x}^*(t) \pm \beta s^2(\beta+2)\tilde{y} - a s^2(\beta+1).$$

Now the auxiliary variational problem we must solve consists of minimizing the two functionals
$$\int_{t_0}^{t_f} \Big( \frac{\alpha^2}{2s^2}\dot{\tilde{x}}(t)^2 + \frac{\alpha^2 a^2}{2} \Big)\,dt \quad\text{and}\quad \int_{t_0}^{t_f} \Big( \frac{\beta^2}{2s^2}\dot{\tilde{y}}(t)^2 + \frac{\beta^2 a^2}{2} \Big)\,dt$$

over some appropriately chosen boundary conditions. We observe that these two minimization problems are easily solved if these conditions take the form

$$\tilde{x}(t_0) = \tilde{x}(t_f) = c_1 \quad\text{and}\quad \tilde{y}(t_0) = \tilde{y}(t_f) = c_2$$

for arbitrary but fixed constants $c_1$ and $c_2$. The solutions are in fact

$$\tilde{x}^*(t) \equiv c_1 \quad\text{and}\quad \tilde{y}^*(t) \equiv c_2.$$

According to our theory we then have that the solution to our variational game is

$$x^*(t) = f(t) \pm c_1 \quad\text{and}\quad y^*(t) = g(t) \pm c_2.$$

In particular, using this information in the equations for $f(\cdot)$ and $g(\cdot)$ with $\tilde{x} = c_1$ and $\tilde{y} = c_2$, we obtain the following equations for $x^*(\cdot)$ and $y^*(\cdot)$:

$$\ddot{x}^*(t) - \alpha s^2(\alpha+2)x^*(t) = \beta s^2(\alpha+2)y^*(t) - \frac{\beta s}{\alpha}(\alpha+1)\dot{y}^*(t) - a s^2(\alpha+1),$$

$$\ddot{y}^*(t) - \beta s^2(\beta+2)y^*(t) = \alpha s^2(\beta+2)x^*(t) - \frac{\alpha s}{\beta}(1+\beta)\dot{x}^*(t) - a s^2(\beta+1),$$

with the end conditions

$$x^*(t_0) = y^*(t_0) = P_0 \quad\text{and}\quad x^*(t_f) = y^*(t_f) = P_f.$$

These equations coincide exactly with the Euler-Lagrange equations, as derived by the Maximum Principle, for the open-loop variational game without constraints. Additionally, we note that as these equations are derived here via the direct method, they become sufficient conditions for a Nash equilibrium of the unconstrained system, and hence for the constrained system for solutions which satisfy the constraints (see the comments in Remark 2.3). Moreover, we also observe that we can recover the functions $H_j(\cdot,\cdot)$, for $j = 1, 2$, since we can recover both $f(\cdot)$ and $g(\cdot)$ from the formulas $f(t) = x^*(t) \mp c_1$ and $g(t) = y^*(t) \mp c_2$. The required functions are then recovered by integrating the partial derivatives of $H_1(\cdot,\cdot)$ and $H_2(\cdot,\cdot)$, which can be computed. Consequently, we see that in this instance the solution to our variational game is given by the solutions of the above Euler-Lagrange system, provided the resulting strategies and the price satisfy the requisite constraints.
Finally, we can obtain the solution to the original problem by taking

$$P^*(t) = \alpha x^*(t) + \beta y^*(t), \qquad u_1^*(t) = \alpha\Big(a - P^*(t) - \frac{1}{s}\dot{x}^*(t)\Big), \qquad u_2^*(t) = \beta\Big(a - P^*(t) - \frac{1}{s}\dot{y}^*(t)\Big).$$

Of course, we still must check that these functions meet whatever constraints are required (i.e., $u_i(t) \ge 0$ and $P(t) \ge 0$).

There is one special case of the above analysis in which the solution can be obtained easily. This is the case when $\alpha = \beta = \frac{1}{2}$. In this case the above Euler-Lagrange system becomes

$$\ddot{x}^*(t) - \frac{5}{4}s^2 x^*(t) = \frac{5}{4}s^2 y^*(t) - \frac{3}{2}s\dot{y}^*(t) - \frac{3}{2}a s^2,$$

$$\ddot{y}^*(t) - \frac{5}{4}s^2 y^*(t) = \frac{5}{4}s^2 x^*(t) - \frac{3}{2}s\dot{x}^*(t) - \frac{3}{2}a s^2.$$

Using the fact that $P^*(t) = \frac{1}{2}(x^*(t) + y^*(t))$ for all $t \in [t_0, t_f]$, we can multiply each of these equations by $\frac{1}{2}$ and add them together to obtain the following equation for $P^*(\cdot)$:

$$\ddot{P}^*(t) + \frac{3}{2}s\dot{P}^*(t) - \frac{5}{2}s^2 P^*(t) = -\frac{3}{2}a s^2, \quad t_0 \le t \le t_f.$$

This equation is an elementary nonhomogeneous second-order linear equation with constant coefficients whose general solution is given by

$$P^*(t) = Ae^{r_+(t-t_0)} + Be^{r_-(t-t_0)} + \frac{3}{5}a,$$

in which $r_\pm$ are the characteristic roots of the equation and $A$ and $B$ are arbitrary constants. More specifically, the characteristic roots are roots of the polynomial

$$r^2 + \frac{3}{2}sr - \frac{5}{2}s^2 = 0$$

and are given by

$$r_+ = s \quad\text{and}\quad r_- = -\frac{5}{2}s.$$

Thus, to solve the dynamic game in this case we select $A$ and $B$ so that $P^*(\cdot)$ satisfies the fixed boundary conditions. Further, we note that we can also take

$$x^*(t) = y^*(t) = P^*(t)$$

and so obtain the optimal strategies as

$$u_1^*(t) = u_2^*(t) = \frac{1}{2}\Big(a - P^*(t) - \frac{1}{s}\dot{P}^*(t)\Big).$$

It remains to verify that there exists some choice of parameters for which the optimal price, $P^*(\cdot)$, and the optimal strategies, $u_1^*(\cdot), u_2^*(\cdot)$, remain nonnegative.
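The closed form for the symmetric case is easy to check numerically: solve the 2-by-2 linear system for $A$ and $B$ coming from the boundary conditions (via Cramer's rule, as in the text) and verify that $P^*$ satisfies both the boundary data and the price equation. A sketch, with illustrative parameter values of our own choosing:

```python
import math

# Numerical check of the closed form for the symmetric case alpha = beta = 1/2.
# Parameter values are illustrative choices, not from the chapter.
s, a, t0, tf, P0, Pf = 1.0, 5.0, 0.0, 3.0, 4.0, 4.5
rp, rm = s, -2.5 * s          # characteristic roots r+ = s, r- = -(5/2)s
T = tf - t0

# Cramer's rule for A, B: rows (1, 1) and (e^{r+ T}, e^{r- T})
D = math.exp(rm * T) - math.exp(rp * T)
A = ((P0 - 0.6 * a) * math.exp(rm * T) - (Pf - 0.6 * a)) / D
B = ((Pf - 0.6 * a) - (P0 - 0.6 * a) * math.exp(rp * T)) / D

def P(t):   return A * math.exp(rp * (t - t0)) + B * math.exp(rm * (t - t0)) + 0.6 * a
def dP(t):  return A * rp * math.exp(rp * (t - t0)) + B * rm * math.exp(rm * (t - t0))
def d2P(t): return A * rp**2 * math.exp(rp * (t - t0)) + B * rm**2 * math.exp(rm * (t - t0))

bc_err = max(abs(P(t0) - P0), abs(P(tf) - Pf))
# Residual of the price equation P'' + (3/2)s P' - (5/2)s^2 P = -(3/2)a s^2
ode_err = max(abs(d2P(t) + 1.5 * s * dP(t) - 2.5 * s**2 * P(t) + 1.5 * a * s**2)
              for t in [t0 + k * T / 6 for k in range(7)])
print(bc_err, ode_err)        # both at floating-point level
```

With these parameters $P_0, P_f > \frac{3}{5}a$, so $A$ and $B$ come out positive, which is exactly the sufficient condition for $P^*(t) \ge 0$ derived below.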
To this end we observe that imposing the fixed boundary conditions gives the following linear system of equations for the unknowns $A$ and $B$:

$$\begin{pmatrix} 1 & 1 \\ e^{s(t_f-t_0)} & e^{-\frac{5}{2}s(t_f-t_0)} \end{pmatrix} \begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} P_0 - \frac{3}{5}a \\[2pt] P_f - \frac{3}{5}a \end{pmatrix}.$$

Using Cramer's rule we obtain the following formulas for $A$ and $B$:

$$A = \frac{1}{D}\Big[ \Big(P_0 - \frac{3}{5}a\Big)e^{-\frac{5}{2}s(t_f-t_0)} - \Big(P_f - \frac{3}{5}a\Big) \Big], \qquad B = \frac{1}{D}\Big[ \Big(P_f - \frac{3}{5}a\Big) - \Big(P_0 - \frac{3}{5}a\Big)e^{s(t_f-t_0)} \Big],$$

in which $D$ is the determinant of the coefficient matrix and is given by

$$D = e^{-\frac{5}{2}s(t_f-t_0)} - e^{s(t_f-t_0)} = e^{s(t_f-t_0)}\big( e^{-\frac{7}{2}s(t_f-t_0)} - 1 \big).$$

We observe that $D$ is clearly negative since $t_f > t_0$. Also, to ensure that $P^*(t)$ is nonnegative for $t \in [t_0, t_f]$ it is sufficient to ensure that $A$ and $B$ are both positive. This means we must have

$$0 > \Big(P_0 - \frac{3}{5}a\Big)e^{-\frac{5}{2}s(t_f-t_0)} - \Big(P_f - \frac{3}{5}a\Big) \quad\text{and}\quad 0 > \Big(P_f - \frac{3}{5}a\Big) - \Big(P_0 - \frac{3}{5}a\Big)e^{s(t_f-t_0)},$$

which can be equivalently expressed as

$$\Big(P_0 - \frac{3}{5}a\Big)e^{-\frac{5}{2}s(t_f-t_0)} < P_f - \frac{3}{5}a < \Big(P_0 - \frac{3}{5}a\Big)e^{s(t_f-t_0)}. \tag{2.38}$$

Observe that as long as $P_0$ and $P_f$ are chosen to be larger than $\frac{3}{5}a$, this last inequality can be satisfied if we choose $t_f - t_0$ sufficiently large. In this case we have explicitly given the optimal price $P^*(\cdot)$ in terms of the model parameters $P_0$, $P_f$, $t_0$, $t_f$, $a$, and $s$ (all strictly positive).

It remains to check that the strategies are nonnegative. To this end we notice that

$$\dot{P}^*(t) = Ase^{s(t-t_0)} - \frac{5}{2}Bse^{-\frac{5}{2}s(t-t_0)},$$

so that the admissible strategies are given, for $j = 1, 2$, by

$$u_j^*(t) = \frac{1}{2}\Big(a - P^*(t) - \frac{1}{s}\dot{P}^*(t)\Big) = \frac{1}{5}a - Ae^{s(t-t_0)} + \frac{3}{4}Be^{-\frac{5}{2}s(t-t_0)}.$$

Taking the time derivative of $u_j^*(\cdot)$ we obtain

$$\dot{u}_j^*(t) = -Ase^{s(t-t_0)} - \frac{15}{8}Bse^{-\frac{5}{2}s(t-t_0)} < 0,$$

since $A$ and $B$ are positive. This implies that $u_j^*(t) \ge u_j^*(t_f)$ for all $t \in [t_0, t_f]$. Thus, to ensure that $u_j^*(\cdot)$ is nonnegative it is sufficient to ensure $u_j^*(t_f) \ge 0$, which holds if we have

$$\frac{1}{5}a - Ae^{s(t_f-t_0)} + \frac{3}{4}Be^{-\frac{5}{2}s(t_f-t_0)} \ge 0.$$

To investigate this inequality we first observe that we have, from the solution $P^*(\cdot)$, that

$$P_f = Ae^{s(t_f-t_0)} + Be^{-\frac{5}{2}s(t_f-t_0)} + \frac{3}{5}a.$$

This allows us to rewrite the last inequality in the form

$$\frac{1}{5}a - \frac{7}{4}Ae^{s(t_f-t_0)} + \frac{3}{4}\Big(P_f - \frac{3}{5}a\Big) \ge 0,$$

or equivalently (using the explicit expression for $A$ and recalling that $e^{-\frac{7}{2}s(t_f-t_0)} - 1 < 0$),

$$\Big(P_f - \frac{3}{5}a\Big)\Big( 4 + 3e^{-\frac{7}{2}s(t_f-t_0)} \Big) \le 7\Big(P_0 - \frac{3}{5}a\Big)e^{-\frac{5}{2}s(t_f-t_0)} + \frac{4a}{5}\Big(1 - e^{-\frac{7}{2}s(t_f-t_0)}\Big);$$

that is,

$$P_f - \frac{3}{5}a \le \frac{7e^{-\frac{5}{2}s(t_f-t_0)}}{4 + 3e^{-\frac{7}{2}s(t_f-t_0)}}\Big(P_0 - \frac{3}{5}a\Big) + \frac{1 - e^{-\frac{7}{2}s(t_f-t_0)}}{4 + 3e^{-\frac{7}{2}s(t_f-t_0)}}\cdot\frac{4a}{5}. \tag{2.39}$$

Thus, to ensure that the state and control constraints, $P^*(t) \ge 0$ and $u_i(t) \ge 0$ for $t \in [t_0, t_f]$, hold, we must check that the parameters of the system satisfy inequalities (2.38) and (2.39). We have already observed that for $P_0, P_f \ge \frac{3}{5}a$ we can choose $t_f - t_0$ sufficiently large to ensure that (2.38) holds. Further, we observe that as $t_f - t_0 \to +\infty$ the right side of (2.39) tends to the strictly positive limit $\frac{1}{5}a$, while the lower and upper bounds in (2.38) tend to $0$ and $+\infty$ respectively; hence for $t_f - t_0$ sufficiently large the right side of (2.39) lies strictly between $\big(P_0 - \frac{3}{5}a\big)e^{-\frac{5}{2}s(t_f-t_0)}$ and $\big(P_0 - \frac{3}{5}a\big)e^{s(t_f-t_0)}$, so that (2.38) and (2.39) can be satisfied simultaneously. Combining these observations allows us to conclude that, for $t_f - t_0$ sufficiently large, $\{P^*(\cdot), u_1^*(\cdot), u_2^*(\cdot)\}$ is a Nash equilibrium for the original dynamic game.

4. Conclusion

In this paper we have presented a means to utilize the direct method to obtain open-loop Nash equilibria for differential games in which there is a single state whose time evolution is determined by the competitive strategies of several players appearing linearly in the equation; that is, a so-called affine control system with "many inputs and one output."

References
Carathéodory, C. (1982). Calculus of Variations and Partial Differential Equations. Chelsea, New York, NY.

Carlson, D.A. (2002). An observation on two methods of obtaining solutions to variational problems. Journal of Optimization Theory and Applications, 114:345–362.

Carlson, D.A. and Leitmann, G. (2004). An extension of the coordinate transformation method for open-loop Nash equilibria. Journal of Optimization Theory and Applications, to appear.

Dockner, E.J. and Leitmann, G. (2001). Coordinate transformation and derivation of open-loop Nash equilibrium. Journal of Optimization Theory and Applications, 110(1):1–16.

Leitmann, G. (1967). A note on absolute extrema of certain integrals. International Journal of Non-Linear Mechanics, 2:55–59.

Leitmann, G. (2004). A direct method of optimization and its application to a class of differential games. Dynamics of Continuous, Discrete and Impulsive Systems, Series A: Mathematical Analysis, 11:191–204.

Chapter 3

BRAESS PARADOX AND PROPERTIES OF WARDROP EQUILIBRIUM IN SOME MULTISERVICE NETWORKS
Rachid El Azouzi Eitan Altman Odile Pourtallier
Abstract  In recent years there has been a growing interest in mathematical models for routing in networks in which the decisions are taken in a noncooperative way. Instead of a single decision maker (that may represent the network) that chooses the paths so as to maximize a global utility, one considers a number of decision makers, each having its own utility to maximize by routing its own flow. This gives rise to the use of noncooperative game theory and the Nash equilibrium concept for optimality. In the special case in which each decision maker wishes to find a minimal path for each routed object (e.g. a packet), the solution concept is the Wardrop equilibrium. It is well known that equilibria may exhibit inefficiencies and paradoxical behavior, such as the famous Braess paradox (in which the addition of a link to a network results in worse performance for all users). This raises the challenge for the network administrator of how to upgrade the network so that an upgrade indeed results in improved performance. We present in this paper some guidelines for that.

1. Introduction

In this paper, we consider the problem of routing, in which the performance measure to be minimized is some general cost (which could represent the expected delay). We assume that some objects are routed over shortest paths computed in terms of that cost. An object could correspond to a whole session, in case all packets of a connection are assumed to follow the same path. It could correspond to a single packet, if each packet could have its own path.
A routing approach in which each packet follows a shortest-delay path has been advocated for ad-hoc networks (Gupta and Kumar (1998)), in which the large amount of mobility of both users and routers requires frequent route updates; it has further been argued that by minimizing the delay of each packet we minimize resequencing delays, which may be harmful in real-time applications, but also in data communications (indeed, the throughput of TCP/IP connections may deteriorate considerably when packets arrive out of sequence, since out-of-sequence arrival is frequently interpreted, wrongly, as a signal of a loss or of congestion). When the above type of routing approach is used, the expected load at different links in the network can be predicted as an equilibrium which can be computed in a way similar to the equilibria that arise in road traffic. The latter is known as a Wardrop equilibrium (Wardrop (1952)); it is known to exist and to be unique under general assumptions on the topology and on the cost (see Patriksson (1994), pp. 74–75). We study in this paper some properties of the equilibrium. In particular, we are interested in the impact of the demand, of link capacities, and of the topology on the performance measures at equilibrium. This has particular significance for the network administrator or designer when it comes to upgrading the network. A frequently used heuristic approach for upgrading a network is through Bottleneck Analysis. A system bottleneck is defined as "a resource or service facility whose capacity seriously limits the performance of the entire system" (Kobayashi (1978), p. 13). Bottleneck analysis consists of adding capacity to identified bottlenecks until they cease to be bottlenecks.
In a noncooperative framework, however, this approach may have devastating effects: adding capacity to a link (and in particular, to a bottleneck link) may cause the delays of all users to increase; in an economic context in which users pay the service provider, this may further cause a decrease in the revenues of the provider. The first problem has already been identified in the road-traffic context by Braess (1968) (see also Dafermos and Nagurney (1984a); Smith (1979)), and has further been studied in the context of queuing networks (Beans, Kelly and Taylor (1997); Calvert, Solomon and Ziedins (1997); Cohen and Jeffries (1997); Cohen and Kelly (1990)). In the latter references both queuing delays and rejection probabilities have been considered as performance measures. The focus of the Braess paradox on the bottleneck link in a queuing context, as well as the paradoxical impact on the service provider, have been studied in Massuda (1999). In all the above references, the paradoxical behavior occurs in models in which the number of users is infinitely large, and the equilibrium concept is that of Wardrop equilibrium (Wardrop (1952)). Yet the problem may occur also in models involving a finite number of players (e.g. service providers) for which the Nash equilibrium is the optimality concept. This has been illustrated in Korilis, Lazar and Orda (1995); Korilis, Lazar and Orda (1999). The Braess paradox has further been identified and studied in the context of distributed computing (Kameda, Altman and Kozawa (1999); Kameda et al. (2000)), where arriving jobs may be routed to and performed on different processors. Several papers that are scheduled to appear in JACM identify the Braess paradox that occurs in the context of noncooperative communication networks and of load balancing; see Kameda and Pourtallier (2002) and Roughgarden and Tardos (2002). Both papers illustrate how harmful the Braess paradox can be.
These papers reflect the interest in understanding the degradation that is due to the noncooperative nature of networks, in which added capacity can result in degraded performance. In view of the interest in this identified problem, it seems important to come up with engineering design tools that can predict how to upgrade a network (by adding links or capacity) so as to avoid the harmful paradox. This is what our paper proposes. The Braess paradox illustrates that the network designer or service providers, or more generally whoever is responsible for the network topology and link capacities, have to take into consideration the reaction of noncooperative users to their decisions. Some upgrading guidelines have been proposed in Altman, El Azouzi and Pourtallier (2001); Kameda and Pourtallier (2002); Korilis, Lazar and Orda (1999) so as to avoid the Braess paradox or so as to obtain better performance. They considered not only the framework of the Wardrop equilibrium, but also the Nash equilibrium concept, in which a finite number of service providers try to minimize the average delays (or cost) for all the flow generated by their subscribers. The results obtained for the Wardrop equilibrium were restricted to a particular cost representing the delay of an M/M/1 queue at each link. In this paper we extend the above results to general costs. We further consider a more general routing structure (between paths and not just between links) and allow for several classes of users (so that the cost of a path or of a link may depend on the class in some way). Some other guidelines for avoiding the Braess paradox in the setting of Nash equilibrium have been obtained in Altman, El Azouzi and Pourtallier (2001), yet in that setting the guidelines turn out to be much more restrictive than those we obtain for the setting of Wardrop equilibrium.
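The harm the paradox can do is easiest to see on the classic four-node example of Braess (1968) with affine link costs and unit demand. This is the standard textbook instance, not the queuing model studied later in this chapter; the flow values below are the known Wardrop equilibria of the two networks:

```python
# Wardrop equilibria in the classic Braess network: unit demand from s to t,
# two parallel routes, each with one constant-latency edge (cost 1) and one
# load-dependent edge (cost equal to its flow x). Standard textbook instance.
f_up = f_low = 0.5                  # equilibrium split without the extra link
cost_before = f_up + 1.0            # route cost 0.5 + 1 = 1.5 on both routes
# Wardrop condition: both used routes have equal cost
assert abs((f_up + 1.0) - (f_low + 1.0)) < 1e-12

# Add a free (zero-latency) link between the midpoints. The new route uses
# both load-dependent edges; all traffic takes it, so each carries flow 1.
cost_new_route = 1.0 + 0.0 + 1.0    # = 2.0
cost_old_routes = 1.0 + 1.0         # deviating to an old route also costs 2
cost_after = cost_new_route         # equilibrium: no profitable deviation
print(cost_before, cost_after)      # adding a link raised everyone's cost
```

After the upgrade, every unit of flow pays 2.0 instead of 1.5 even though no route's capacity was reduced, which is precisely the non-monotonicity in link capacities discussed next.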
The main objective of the present paper is to pursue that direction and to provide new guidelines for avoiding the Braess paradox when upgrading the network. The Braess paradox implies that there is no monotonicity of performance measures with respect to link capacities. Another objective of this paper is to check under what conditions the delays, as well as the marginal costs at equilibrium, are increasing in the demands. The answer to this question turns out to be useful for the analysis of the Braess paradox. Some results on monotonicity in the demand are already available in Dafermos and Nagurney (1984b).

The paper is organized as follows. In the next section (Section 2), we present the network model, define the concept of Wardrop equilibrium, and formulate the problem. In Section 3 we present a framework of that equilibrium that allows for different costs for different classes of users (which may reflect, for example, that packets of different users may have different priorities and thus different delays, due to appropriate buffer management schemes). In Section 4 we present a sufficient condition for the monotonicity of performance measures when the demands increase. This allows us then to study in Section 5 methods for capacity addition. In Section 6, we demonstrate the efficiency of the proposed capacity addition by means of a numerical example in a BCMP queuing network.

2. Problem formulation and notation

We consider an open network model that consists of a set $I_M$ containing $M$ nodes and a set $I_L$ containing $L$ links. We call the unit that has to be routed a "job". It may stand for a packet (in a packet-switched network) or for a whole session (if indeed all packets of a session follow the same path). The network is crossed by infinitely many jobs that have to choose their routing. Jobs are classified into $K$ different classes (we will denote by $I_K$ the set of classes).
For example, in the context of road traffic a class may represent the set of a given type of vehicles, such as buses, trucks, cars or bicycles. In the context of telecommunications a class may represent the jobs sent by all the users of a given service provider. We assume that jobs do not change their class while passing through the network. We suppose that the jobs of a given class $k$ may arrive in the system at several possible points, and leave the system at several possible points. Nevertheless the origin and destination points of a given job are determined when the job arrives in the network, and cannot change while in the system. We call a pair of one origin point and one destination point an OD pair. A job with a given OD pair $(od)$ arrives in the system at node $o$ and leaves it at node $d$ after visiting a series of nodes and links, which we refer to as a path. In many previous papers (Orda, Rom and Shimkin (1993); Korilis, Lazar and Orda (1995)), routing could be done at each node. In this paper we follow the approach in which a job of class $k$ with OD pair $(od)$ has to choose one of a given finite set of paths (see also Kameda and Zhang (1995); Patriksson (1994)). We suppose that the routing decision scheme is completely decentralized: each single job has to decide among a set of possible paths that connect the OD pair of that job. This choice is made in order to minimize the cost of that job. The solution concept we are thus going to use is the Wardrop equilibrium (Wardrop (1952)). Let $l$ denote a link of the network connecting a pair of nodes and let $p$ denote a path, consisting of a sequence of links connecting an OD pair of nodes. Let $W^k$ denote the set of OD pairs for the jobs of class $k$. Denote also by $W$ the union $W = \cup_k W^k$.
The set of paths connecting the OD pair $w \in W^k$ is denoted by $P_w^k$, and the entire set of paths in the network for the jobs of class $k$ by $P^k$. There are $n_p^k$ paths in the network for jobs of class $k$, and $n_p$ paths in all. Let $y_l^k$ denote the flow of class $k$ on link $l$ and let $x_p^k$ denote the nonnegative flow of class $k$ on path $p$. The relationship between the link loads by class and the path flows is

$y_l^k = \sum_{p \in P^k} x_p^k \, \delta_{lp}$,

where $\delta_{lp} = 1$ if link $l$ is contained in path $p$, and $\delta_{lp} = 0$ otherwise. Let $\mu_l^k$ be the service rate of class $k$ at link $l$. Hence the utilization of link $l$ by class $k$ is given by $\rho_l^k = y_l^k / \mu_l^k$, and the total utilization of link $l$ is

$\rho_l = \sum_{k \in \mathcal{I}_K} \rho_l^k$.

Let $r_w^k$ denote the demand of class $k$ for OD pair $w$, where the following conditions are satisfied:

$r_w^k = \sum_{p \in P_w^k} x_p^k, \quad \forall k, \ \forall w$.

In addition, let $x_p$ denote the total flow on path $p$, where

$x_p = \sum_{k \in \mathcal{I}_K} x_p^k, \quad \forall p \in P$.

We group the class path flows into the column vector $X$ with components $x_p^k$; we refer to such a vector as a flow configuration. We also group the total path flows into an $n_p$-dimensional column vector $x$ with components $[x_{p_1}, \ldots, x_{p_{n_p}}]^T$; we call this vector the total path flow vector. A flow configuration $X$ is said to be feasible if it satisfies, for each class $k$ and each OD pair $w \in W^k$,

$\sum_{p \in P_w^k} x_p^k = r_w^k. \qquad (3.1)$

We are now ready to describe the cost functions associated with the paths. We consider a feasible flow configuration $X$. Let $T_p^k(X)$ denote the travel cost incurred by a job of class $k$ for using the path $p$ if the flow configuration resulting from the routing of each job is $X$.

3. Wardrop equilibrium for a multiclass network

Each individual job of class $k$ with OD pair $w$ chooses its routing through the system by means of the choice of a path $p \in P_w^k$. A flow configuration $X$ follows from the choices of each of the infinitely many jobs. A flow configuration $X$ is said to be a Wardrop equilibrium, or individually optimal, if none of the jobs has any incentive to change its decision unilaterally. This equilibrium concept was first introduced by Wardrop (1952) in the field of transportation and can be defined through two principles:

Wardrop's first principle: the costs for crossing the used paths between a source and a destination are equal, and the cost of any unused path with the same OD pair is larger than or equal to that of the used ones.

Wardrop's second principle: the cost is minimum for each job.

Formally, in the multiclass context this can be defined as follows.

Definition 3.1 A feasible flow configuration $X$ (i.e., one satisfying equation (3.1)) is a Wardrop equilibrium for the multiclass problem if for any class $k$, any $w \in W^k$ and any path $p \in P_w^k$ we have
$T_p^k(X) \ge \lambda_w^k$ if $x_p^k = 0$, and $T_p^k(X) = \lambda_w^k$ if $x_p^k > 0$, (3.2)

where $\lambda_w^k = \min_{p \in P_w^k} T_p^k(X)$. The minimal cost $\lambda_w^k$ will be referred to as "the travel cost" associated with class $k$ and OD pair $w$.

We need one of the following assumptions on the cost function.

Assumption A

1. There exists a function $T_p$ that depends only upon the total flow vector $x$ (and not on the flow sent by each class), such that the average cost per flow unit for jobs of class $k$ can be written as $T_p^k(X) = c^k T_p(x)$, $\forall p \in P^k$, where $c^k$ is some class-dependent positive constant.

2. $T_p$ is positive, continuous and strictly increasing. We denote by $T = (T_{p_1}, T_{p_2}, \ldots)$ the vector of functions $T_p$.

Assumption B

1. The average cost per flow unit for jobs of class $k$ that pass through path $p \in P^k$ is

$T_p^k(X) = \sum_{l \in \mathcal{I}_L} \frac{\delta_{lp}}{\mu_l^k} \, T_l(\rho_l)$,

where $T_l(\rho_l)$ is the weighted cost per unit flow on link $l$ (the function $T_l$ does not depend on the class $k$).

2. $T_l(\cdot)$ is positive, continuous and strictly increasing.

3. $\mu_l^k$ can be represented as $\mu_l / c^k$, where $c^k$ is some class-dependent positive constant and $0 < \mu_l < \infty$.
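The cost structure of Assumption B can be sketched numerically. The following Python fragment (with hypothetical rates and utilizations, not taken from this chapter) computes the class-$k$ path cost $T_p^k = \sum_l (\delta_{lp}/\mu_l^k)\,T_l(\rho_l)$ with $\mu_l^k = \mu_l/c^k$, and illustrates that under Assumption B the costs of two classes on the same path are proportional to their constants $c^k$:

```python
def path_cost(delta_p, mu, c_k, T_l, rho):
    """Class-k cost of a path under Assumption B:
    T_p^k = sum_l delta_lp * (1 / mu_l^k) * T_l(rho_l),
    where mu_l^k = mu_l / c^k, so 1 / mu_l^k = c^k / mu_l."""
    return sum(d * (c_k / m) * T_l(r) for d, m, r in zip(delta_p, mu, rho))

# Hypothetical 3-link network; the path uses links 0 and 2.
delta_p = [1, 0, 1]            # incidence of links in the path
mu = [10.0, 8.0, 5.0]          # base service rates mu_l
rho = [0.5, 0.2, 0.4]          # total link utilizations rho_l
T = lambda r: 1.0 / (1.0 - r)  # positive, strictly increasing link cost

cost_class1 = path_cost(delta_p, mu, c_k=1.0, T_l=T, rho=rho)
cost_class2 = path_cost(delta_p, mu, c_k=2.0, T_l=T, rho=rho)
# cost_class2 == 2 * cost_class1: class costs scale with c^k.
```

This proportionality is exactly what Lemma 3.1 below exploits when it shows that $\lambda_w^k / c^k$ does not depend on the class.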
We denote by $v_w = \sum_{k \in \mathcal{I}_K} c^k r_w^k$ the weighted total demand for OD pair $w$. We make the following observation.

Lemma 3.1 Consider a cost vector $T$ satisfying Assumption A or B. Then the Wardrop equilibrium conditions (3.1) and (3.2) become: for all $k$, all $w \in W^k$ and all $p \in P_w^k$,

$T_p^k(X) \ge \lambda_w^k$ if $x_p = 0$, and $T_p^k(X) = \lambda_w^k$ if $x_p > 0$. (3.3)

Moreover, the ratio $\lambda_w^k / c^k$ is independent of the class $k$, so that we can define $\lambda_w$ by $\lambda_w := \lambda_w^k / c^k$.

Proof. Consider first the case of a cost vector $T$ that satisfies Assumption A. Let $w \in W^k$ and $p \in P^k$. If $x_p = 0$ then $x_p^k = 0$ for all $k \in \mathcal{I}_K$, and the first part of (3.3) follows from the first part of (3.2). Suppose that $x_p > 0$ and, by contradiction, that there exists $\bar k \in \mathcal{I}_K$ such that

$T_p^{\bar k}(X) = T_p(x)\, c^{\bar k} > \lambda_w^{\bar k}. \qquad (3.4)$

Since $x_p > 0$, there exists $k_0 \in \mathcal{I}_K$ such that $x_p^{k_0} > 0$. From the second part of (3.2), we have

$T_p^{k_0}(X) = T_p(x)\, c^{k_0} = \lambda_w^{k_0}. \qquad (3.5)$

Because $r_w^{\bar k} > 0$, there exists $\bar p \in P_w^{\bar k}$ such that $x_{\bar p}^{\bar k} > 0$. Then, from (3.2) we get

$T_{\bar p}^{\bar k}(X) = T_{\bar p}(x)\, c^{\bar k} = \lambda_w^{\bar k}. \qquad (3.6)$

It follows from (3.4) and (3.6) that

$T_p(x) > T_{\bar p}(x). \qquad (3.7)$

Since $\lambda_w^{k_0} \le T_{\bar p}(x)\, c^{k_0}$, from (3.5) we obtain $T_p(x) \le T_{\bar p}(x)$, which contradicts (3.7). This establishes (3.3).

For any $w \in W$, let $p \in \cup_k P_w^k$ be a path such that $x_p > 0$. From (3.3), it follows that for any class $k$ such that $p \in P_w^k$,

$T_p(x) = \frac{T_p^k(X)}{c^k} = \frac{\lambda_w^k}{c^k} = \lambda_w$.

The second part of Lemma 3.1 follows, since the terms in the above equation do not depend on $k$. The proof for a cost function vector satisfying Assumption B follows along similar lines. □
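The Wardrop equilibrium conditions can be illustrated numerically. The following sketch (a hypothetical single-class OD pair with two paths and strictly increasing costs; the instance is illustrative and not from the chapter) computes the equilibrium split by bisection: at equilibrium, the two used paths have equal cost.

```python
def wardrop_two_paths(t1, t2, r, tol=1e-10):
    """Bisection on the flow x1 sent on path 1; path 2 carries r - x1.

    At a Wardrop equilibrium with both paths used, t1(x1) == t2(r - x1).
    Since t1 is increasing and t2(r - x1) is decreasing in x1, the
    difference t1(x1) - t2(r - x1) is monotone and bisection applies."""
    # If one path dominates over the whole range, all flow takes the other.
    if t1(0.0) >= t2(r):
        return 0.0, r
    if t1(r) <= t2(0.0):
        return r, 0.0
    lo, hi = 0.0, r
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t1(mid) - t2(r - mid) < 0.0:
            lo = mid   # path 1 still cheaper: push more flow onto it
        else:
            hi = mid
    x1 = 0.5 * (lo + hi)
    return x1, r - x1

# Example: T_1(x) = 1 + 2x, T_2(x) = 2 + x, total demand r = 3.
x1, x2 = wardrop_two_paths(lambda x: 1 + 2 * x, lambda x: 2 + x, 3.0)
# Equilibrium: x1 = 4/3, x2 = 5/3, and both path costs equal 11/3.
```

With several classes under Assumption A or B, Lemma 3.1 says the same computation on the total flows $x_p$ determines the common normalized cost $\lambda_w$, each class paying $c^k \lambda_w$.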
4. Impact of throughput variation on the equilibrium

In this section, we study the impact of a variation of the demands $r_w^k$ of some class $k$ on the cost vector $T(X)$ at the (Wardrop) equilibrium $X$. The results of this section extend those of Dafermos and Nagurney (1984b), where a simpler cost structure was considered. Namely, for any class $k$, the cost for using a path was the sum of the link costs along that path, and the link costs did not depend on $k$. The following theorem states that under Assumption A or B, an increase in the demands associated with a particular OD pair $w$ always leads to an increase of the cost associated with $w$ for all classes $k$.

Theorem 3.1 Consider two throughput demand profiles $(\tilde r_w^k)_{(w,k)}$ and $(\hat r_w^k)_{(w,k)}$. Let $\tilde X$ and $\hat X$ be the Wardrop equilibria associated with these throughput demands, and let $\tilde\lambda_w^k$ and $\hat\lambda_w^k$ be class $k$'s travel costs associated with these two equilibria.

1. For a cost vector $T$ satisfying Assumption A, if $\tilde r_{\bar w} < \hat r_{\bar w}$ for some $\bar w \in W$ and $\tilde r_w = \hat r_w$ for all $w \ne \bar w$, then $\tilde\lambda_{\bar w}^k < \hat\lambda_{\bar w}^k$, $\forall k \in \mathcal{I}_K$.

2. For a cost vector $T$ satisfying Assumption B, if $\tilde v_{\bar w} < \hat v_{\bar w}$ for some $\bar w \in W$ and $\tilde v_w = \hat v_w$ for all $w \ne \bar w$, then $\tilde\lambda_{\bar w}^k < \hat\lambda_{\bar w}^k$, $\forall k \in \mathcal{I}_K$.

Proof. Consider first the case of a cost vector that satisfies Assumption A.

1. From (3.3) and Assumption A we have

$\hat\lambda_w = T_p(\hat x)$ if $\hat x_p > 0$, and $\hat\lambda_w \le T_p(\hat x)$ if $\hat x_p = 0$,

and

$\tilde\lambda_w = T_p(\tilde x)$ if $\tilde x_p > 0$, and $\tilde\lambda_w \le T_p(\tilde x)$ if $\tilde x_p = 0$.

Thus

$\hat\lambda_w \hat x_p = T_p(\hat x)\hat x_p$, $\tilde\lambda_w \tilde x_p = T_p(\tilde x)\tilde x_p$, $\hat\lambda_w \tilde x_p \le T_p(\hat x)\tilde x_p$, and $\tilde\lambda_w \hat x_p \le T_p(\tilde x)\hat x_p$.

Now, by summing over $p \in P_w$, we obtain

$\hat r_w \hat\lambda_w = \sum_{p \in P_w} T_p(\hat x)\hat x_p$ and $\tilde r_w \hat\lambda_w \le \sum_{p \in P_w} T_p(\hat x)\tilde x_p$,

and

$\tilde r_w \tilde\lambda_w = \sum_{p \in P_w} T_p(\tilde x)\tilde x_p$ and $\hat r_w \tilde\lambda_w \le \sum_{p \in P_w} T_p(\tilde x)\hat x_p$.

By summing over $w \in W$, it follows that

$\sum_{w \in W} (\hat r_w - \tilde r_w)(\hat\lambda_w - \tilde\lambda_w) \ge (T(\hat x) - T(\tilde x))^T(\hat x - \tilde x) > 0. \qquad (3.8)$

The last inequality follows from Assumption A. Since $\tilde r_w = \hat r_w$ for $w \ne \bar w$, inequality (3.8) yields $(\hat r_{\bar w} - \tilde r_{\bar w})(\hat\lambda_{\bar w} - \tilde\lambda_{\bar w}) > 0$, which implies, since $\hat r_{\bar w} > \tilde r_{\bar w}$, that $\hat\lambda_{\bar w} > \tilde\lambda_{\bar w}$. It follows that $\hat\lambda_{\bar w}^k > \tilde\lambda_{\bar w}^k$ for all $k \in \mathcal{I}_K$.

Consider now the case of a cost vector that satisfies Assumption B.

2. From (3.3) and Assumption B we have

$\hat\lambda_w = \sum_{l \in \mathcal{I}_L} \frac{\delta_{lp}}{\mu_l} T_l(\hat\rho_l)$ if $\hat x_p > 0$, and $\hat\lambda_w \le \sum_{l \in \mathcal{I}_L} \frac{\delta_{lp}}{\mu_l} T_l(\hat\rho_l)$ if $\hat x_p = 0$,

and

$\tilde\lambda_w = \sum_{l \in \mathcal{I}_L} \frac{\delta_{lp}}{\mu_l} T_l(\tilde\rho_l)$ if $\tilde x_p > 0$, and $\tilde\lambda_w \le \sum_{l \in \mathcal{I}_L} \frac{\delta_{lp}}{\mu_l} T_l(\tilde\rho_l)$ if $\tilde x_p = 0$.

Let $z_p = \sum_{k \in \mathcal{I}_K} c^k x_p^k$. The above relations become

$\hat z_p \hat\lambda_w = \sum_{l \in \mathcal{I}_L} \frac{\delta_{lp}}{\mu_l} T_l(\hat\rho_l)\hat z_p$ and $\tilde z_p \hat\lambda_w \le \sum_{l \in \mathcal{I}_L} \frac{\delta_{lp}}{\mu_l} T_l(\hat\rho_l)\tilde z_p$,

and

$\tilde z_p \tilde\lambda_w = \sum_{l \in \mathcal{I}_L} \frac{\delta_{lp}}{\mu_l} T_l(\tilde\rho_l)\tilde z_p$ and $\hat z_p \tilde\lambda_w \le \sum_{l \in \mathcal{I}_L} \frac{\delta_{lp}}{\mu_l} T_l(\tilde\rho_l)\hat z_p$.

By summing over $p \in P_w$ and $w \in W$, we obtain

$\sum_{w \in W} \hat v_w \hat\lambda_w = \sum_{l \in \mathcal{I}_L} \hat\rho_l T_l(\hat\rho_l)$ and $\sum_{w \in W} \tilde v_w \hat\lambda_w \le \sum_{l \in \mathcal{I}_L} \tilde\rho_l T_l(\hat\rho_l)$,

and

$\sum_{w \in W} \tilde v_w \tilde\lambda_w = \sum_{l \in \mathcal{I}_L} \tilde\rho_l T_l(\tilde\rho_l)$ and $\sum_{w \in W} \hat v_w \tilde\lambda_w \le \sum_{l \in \mathcal{I}_L} \hat\rho_l T_l(\tilde\rho_l)$.

Indeed, we have from (3.1) $r_w^k = \sum_{p \in P_w^k} x_p^k$; multiplying by $c^k$ and summing over $k \in \mathcal{I}_K$, we obtain

$v_w = \sum_{k \in \mathcal{I}_K} \sum_{p \in P_w^k} c^k x_p^k = \sum_{p \in P_w} \sum_{k \in \mathcal{I}_K} c^k x_p^k = \sum_{p \in P_w} z_p$.

It follows that

$\sum_{w \in W} (\hat v_w - \tilde v_w)(\hat\lambda_w - \tilde\lambda_w) \ge \sum_{l \in \mathcal{I}_L} (\hat\rho_l - \tilde\rho_l)(T_l(\hat\rho_l) - T_l(\tilde\rho_l)) > 0. \qquad (3.9)$

The last inequality follows from Assumption B. Proceeding as in the first part of the proof, we obtain $\hat\lambda_{\bar w}^k > \tilde\lambda_{\bar w}^k$ for all $k \in \mathcal{I}_K$. □

Remark 3.1 In the case where all classes ship flow from a common source $s$ to a common destination $d$, i.e., $W^k = \{(sd)\}$ $\forall k$, Theorem 3.1 establishes the monotonicity of the performance (given by the travel cost $\lambda_{(sd)}^k$) at the Wardrop equilibrium for all $k \in \mathcal{I}_K$ when the demands of the classes increase.

5. Avoiding the Braess paradox

The purpose of this section is to provide methods for adding resources to a general network, with one source $s$ and one destination $d$, that guarantee an improvement in performance. This guarantees in particular that the well-known Braess paradox (in which adding a link results in a deterioration of the performance for all users) does not occur. For a given network with one source and one destination, the designer's problem is to distribute some additional capacity among the links of the network so as to improve the performance at the (Wardrop) equilibrium. Adding capacity to the network can be done in several ways, among them: (1) adding a new direct path from the source $s$ to the destination $d$; (2) improving an existing direct path; (3) improving all the paths connecting $s$ to $d$.

We first consider (1), i.e., the addition of a direct path from $s$ to $d$ that can be used by the jobs of all classes. That direct path could in fact be a whole new network, provided that it is disjoint from the previous network; it may also have new sources and destinations in addition to $s$ and $d$, and new traffic from new classes that use these new sources and destinations. The next theorem shows that this may lead to a decrease of the costs of all paths used at equilibrium.

Theorem 3.2 Consider a cost vector $T$ that satisfies Assumption A or B.
Let $\hat X$ and $\tilde X$ be the Wardrop equilibria after and before the addition of a direct path $\hat p$ from $s$ to $d$. Consider $\hat\lambda_{(sd)}^k$ and $\tilde\lambda_{(sd)}^k$ the travel costs for class $k$ at $\hat X$ and $\tilde X$, respectively. Then $\hat\lambda_{(sd)}^k \le \tilde\lambda_{(sd)}^k$, $\forall k \in \mathcal{I}_K$. Moreover, the last inequality is strict if $\hat x_{\hat p} > 0$.
Proof. Consider the same network $(\mathcal{I}_M, \mathcal{I}_L)$ with the initial service rate configuration $\tilde\mu$ and throughput demands $(\bar r_{(sd)}^k)_{k \in \mathcal{I}_K}$, where $\bar r_{(sd)}^k = \tilde r_{(sd)}^k - \hat x_{\hat p}^k$ for each class $k \in \mathcal{I}_K$. Let $\bar X$ represent the Wardrop equilibrium associated with this new throughput demand and $\bar\lambda_{(sd)}^k$ the travel cost for class $k$ at the Wardrop equilibrium $\bar X$. From conditions (3.1) and (3.2) we have $\bar\lambda_{(sd)}^k = \hat\lambda_{(sd)}^k$, $\forall k \in \mathcal{I}_K$. If $\hat x_{\hat p} = 0$, then $\bar\lambda_{(sd)}^k = \tilde\lambda_{(sd)}^k$, which implies that $\hat\lambda_{(sd)}^k = \tilde\lambda_{(sd)}^k$.

Assume, then, that $\hat x_{\hat p} > 0$. We have $\bar r_{(sd)} < \tilde r_{(sd)}$ (which will be used for Assumption A) and $\bar v_{(sd)} < \tilde v_{(sd)}$ (which will be used for Assumption B); following Theorem 3.1, we conclude that $\hat\lambda_{(sd)}^k = \bar\lambda_{(sd)}^k < \tilde\lambda_{(sd)}^k$ for all $k \in \mathcal{I}_K$, and this completes the proof. □

We now examine the second way of adding capacity to the network, namely the improvement of an existing direct path. We consider a network $(\mathcal{I}_M, \mathcal{I}_L)$ that contains a direct path $\hat p$ from $s$ to $d$ that can be used by the jobs of all classes. We derive sufficient conditions that guarantee an improvement in the performance when we increase the capacity of this direct path.

Theorem 3.3 Let $T$ be a cost vector satisfying Assumption A. We consider an improvement of the path $\hat p$ so that the cost associated with this path is smaller for all classes, i.e., $\hat T_{\hat p}(x) < \tilde T_{\hat p}(x)$. Let $\hat X$ and $\tilde X$ be the Wardrop equilibria respectively after and before this improvement. Consider $\hat\lambda_{(sd)}^k$ and $\tilde\lambda_{(sd)}^k$ the travel costs of class $k$ at the equilibria. Then $\hat\lambda_{(sd)}^k \le \tilde\lambda_{(sd)}^k$, $\forall k \in \mathcal{I}_K$. Moreover, the inequality is strict if $\hat x_{\hat p} > 0$ or $\tilde x_{\hat p} > 0$.

Proof. From Lemma 3.1 we have

$\lambda_{(sd)} = T_p(x)$ if $x_p > 0$; $\lambda_{(sd)} \le T_p(x)$ if $x_p = 0$; $\sum_{p \in \cup_k P_{(sd)}^k} x_p = r_{(sd)}$, $x_p \ge 0$. (3.10)

We know from Theorems 3.2 and 3.14 in Patriksson (1994) that $\hat x$ and $\tilde x$ must satisfy the variational inequalities

$\hat T(\hat x)^T (x - \hat x) \ge 0 \quad \forall x$ that satisfies (3.10), (3.11)

$\tilde T(\tilde x)^T (x - \tilde x) \ge 0 \quad \forall x$ that satisfies (3.10). (3.12)

By adding (3.11) with $x = \tilde x$ and (3.12) with $x = \hat x$, we obtain $[\hat T(\hat x) - \tilde T(\tilde x)]^T[\hat x - \tilde x] \le 0$, thus

$[\hat T(\hat x) - \tilde T(\hat x) + \tilde T(\hat x) - \tilde T(\tilde x)]^T[\hat x - \tilde x] \le 0$,

and

$[\hat T(\hat x) - \tilde T(\hat x)]^T[\hat x - \tilde x] \le [\tilde T(\tilde x) - \tilde T(\hat x)]^T[\hat x - \tilde x] < 0. \qquad (3.13)$

Since the costs of the other paths are unchanged, i.e., $\hat T_p = \tilde T_p$ for all $p \ne \hat p$, inequality (3.13) becomes

$(\hat T_{\hat p}(\hat x) - \tilde T_{\hat p}(\hat x))(\hat x_{\hat p} - \tilde x_{\hat p}) < 0 \quad \text{if } \hat x \ne \tilde x. \qquad (3.14)$

Since $\hat T_{\hat p}(\hat x) < \tilde T_{\hat p}(\hat x)$, we have $\hat x_{\hat p} > \tilde x_{\hat p}$ if $\hat x \ne \tilde x$. Now we have two cases:

– If $\hat x = \tilde x$, then since $\hat T_{\hat p}(x) < \tilde T_{\hat p}(x)$ for all $x$, it follows that $\hat x_{\hat p} = \tilde x_{\hat p} = 0$, which implies that $\hat\lambda_{(sd)}^k = \tilde\lambda_{(sd)}^k$.

– If $\hat x \ne \tilde x$, then from (3.14) we have $\hat x_{\hat p} > \tilde x_{\hat p}$. Consider now two networks that differ only by the presence or absence of the direct path $\hat p$ from $s$ to $d$. In both networks we have the same initial capacity configuration and the same set $\mathcal{I}_K$ of classes, with respective demands $\check r_{(sd)}^k = r_{(sd)}^k - \hat x_{\hat p}^k$ and $\bar r_{(sd)}^k = r_{(sd)}^k - \tilde x_{\hat p}^k$. Let $\check\lambda_{(sd)}^k$ and $\bar\lambda_{(sd)}^k$ be the travel costs of class $k$ associated with these throughput demands. Since $\hat x_{\hat p} > \tilde x_{\hat p}$, we have $\check r_{(sd)} < \bar r_{(sd)}$, and from Theorem 3.1,

$\check\lambda_{(sd)}^k < \bar\lambda_{(sd)}^k \quad \forall k \in \mathcal{I}_K. \qquad (3.15)$

On the other hand, for the network with demands $(\check r_{(sd)}^k)_{k \in \mathcal{I}_K}$, it is easy to see that the equilibrium conditions (3.1) and (3.2) are satisfied by the flow configuration $\hat X$, with $\check\lambda_{(sd)}^k = \hat\lambda_{(sd)}^k$. Similarly, the network with demands $(\bar r_{(sd)}^k)_{k \in \mathcal{I}_K}$ has the equilibrium flow configuration $\tilde X$, with $\bar\lambda_{(sd)}^k = \tilde\lambda_{(sd)}^k$. Hence from (3.15) we obtain $\hat\lambda_{(sd)}^k < \tilde\lambda_{(sd)}^k$. □
Theorem 3.4 Consider a cost function vector that satisfies Assumption B. Let $\hat\mu_l^k$ and $\tilde\mu_l^k$ be, respectively, the service rate configurations after and before adding capacity to the path $\hat p$, i.e., $\hat\mu_l > \tilde\mu_l$ for $l \in \hat p$ and $\hat\mu_l = \tilde\mu_l$ for $l \notin \hat p$. Let $\hat X$ and $\tilde X$ be, respectively, the Wardrop equilibria after and before this improvement. Consider $\hat\lambda_{(sd)}^k$ and $\tilde\lambda_{(sd)}^k$ the travel costs of class $k$ at the equilibria. Then $\hat\lambda_{(sd)}^k \le \tilde\lambda_{(sd)}^k$, $\forall k \in \mathcal{I}_K$. Moreover, the inequality is strict if $\hat x_{\hat p} > 0$ or $\tilde x_{\hat p} > 0$.
Proof. Note that if there exists a link $l_1$ belonging to the path $\hat p$ such that $\hat\rho_{l_1} < \tilde\rho_{l_1}$, then $\hat\rho_l < \tilde\rho_l$ for each link $l$ that belongs to the path $\hat p$.

Assume first that $\hat\rho_l \le \tilde\rho_l$ for $l \in \hat p$. We have two possibilities. First, if $\tilde x_{\hat p} = 0$, then $\hat\lambda_{(sd)}^k = \tilde\lambda_{(sd)}^k$ for all $k \in \mathcal{I}_K$. Second, if $\tilde x_{\hat p} > 0$, then we have

$\tilde\lambda_{(sd)} = \sum_{l \in \hat p} \frac{T_l(\tilde\rho_l)}{\tilde\mu_l}$ and $\hat\lambda_{(sd)} \le \sum_{l \in \hat p} \frac{T_l(\hat\rho_l)}{\hat\mu_l}$.

Since $T_l(\cdot)$ is strictly increasing and $\tilde\mu_l < \hat\mu_l$ for all $l \in \hat p$, we have

$\hat\lambda_{(sd)} \le \sum_{l \in \hat p} \frac{T_l(\hat\rho_l)}{\hat\mu_l} < \sum_{l \in \hat p} \frac{T_l(\tilde\rho_l)}{\tilde\mu_l} = \tilde\lambda_{(sd)}$.

It follows that $\hat\lambda_{(sd)}^k < \tilde\lambda_{(sd)}^k$ for all $k \in \mathcal{I}_K$.

Now assume that $\hat\rho_l > \tilde\rho_l$ for $l \in \hat p$. Let us consider the two networks that differ only by the presence or absence of the direct path $\hat p$ from $s$ to $d$. In both networks we have the same initial capacity configuration and the same set $\mathcal{I}_K$ of classes, with respective demands $\check r_{(sd)}^k = r_{(sd)}^k - \hat x_{\hat p}^k$ and $\bar r_{(sd)}^k = r_{(sd)}^k - \tilde x_{\hat p}^k$. Let $\check\lambda_{(sd)}^k$ and $\bar\lambda_{(sd)}^k$ be the travel costs of class $k$ associated with these throughput demands. Since $\hat\rho_l > \tilde\rho_l$ and $\hat\mu_l > \tilde\mu_l$ for $l \in \hat p$, and since the links of the direct path $\hat p$ carry exactly the flow of $\hat p$, so that $\mu_l \rho_l = \sum_{k} c^k x_{\hat p}^k$ for each $l \in \hat p$, we have

$\bar v_{(sd)} - \check v_{(sd)} = \sum_{k \in \mathcal{I}_K} \left( c^k \bar r_{(sd)}^k - c^k \check r_{(sd)}^k \right) = \sum_{k \in \mathcal{I}_K} \left( c^k (r_{(sd)}^k - \tilde x_{\hat p}^k) - c^k (r_{(sd)}^k - \hat x_{\hat p}^k) \right) = \sum_{k \in \mathcal{I}_K} c^k \left( \hat x_{\hat p}^k - \tilde x_{\hat p}^k \right) = \hat\mu_l \hat\rho_l - \tilde\mu_l \tilde\rho_l > 0 \quad \text{for } l \in \hat p$.

From Theorem 3.1, we conclude that $\check\lambda_{(sd)}^k < \bar\lambda_{(sd)}^k$ for all $k \in \mathcal{I}_K$. Proceeding as in the proof of Theorem 3.3, we obtain $\hat\lambda_{(sd)}^k < \tilde\lambda_{(sd)}^k$ for all $k \in \mathcal{I}_K$. □

We now examine the last way of adding capacity to the network, i.e., the addition of capacity on all the paths connecting $s$ to $d$. Consider a network $(\mathcal{I}_M, \mathcal{I}_L)$ and a cost vector $T$ that satisfies Assumption A or B. We consider the improvement of the capacity of all paths so that the following holds:

$\hat T_p^k(X) = \frac{1}{\alpha} \tilde T_p^k\!\left(\frac{X}{\alpha}\right), \quad \text{with } \alpha > 1. \qquad (3.16)$

We observe that for any $\alpha > 1$, $\hat T_p^k(X) = \frac{1}{\alpha} \tilde T_p^k(X/\alpha) < \tilde T_p^k(X/\alpha) < \tilde T_p^k(X)$.

Theorem 3.5 Consider a cost vector $T$ that satisfies Assumption A or B. Let $\tilde X$ and $\hat X$ be the Wardrop equilibria associated respectively with the cost functions $\tilde T_p^k$ and $\hat T_p^k$. Consider $\tilde\lambda_{(sd)}^k$ and $\hat\lambda_{(sd)}^k$ the travel costs of class $k$ at the respective Wardrop equilibria $\tilde X$ and $\hat X$. Then $\hat\lambda_{(sd)}^k < \tilde\lambda_{(sd)}^k$, $\forall k \in \mathcal{I}_K$.
Proof. We consider now the network $(\mathcal{I}_M, \mathcal{I}_L)$, with travel costs $\tilde T_p^k$ and throughput demands $\bar r_{(sd)}^k = r_{(sd)}^k / \alpha$, $k \in \mathcal{I}_K$. Let $\bar\lambda_{(sd)}^k$ be the travel cost of class $k$ associated with these throughput demands. At the equilibrium $\hat X$, by redefining the costs and path flows as $\alpha \hat\lambda_{(sd)}^k$ and $(1/\alpha)\hat x_p^k$, respectively, it is straightforward to show that changing the demands from $r_{(sd)}^k$ to $\bar r_{(sd)}^k$ using the cost functions $\tilde T_p^k(X)$ is equivalent to changing the cost functions from $\tilde T_p^k(X)$ to $\tilde T_p^k(X/\alpha)$ using the demands $r_{(sd)}^k$. Hence the corresponding travel costs are $\bar\lambda_{(sd)}^k = \alpha \hat\lambda_{(sd)}^k$. On the other hand, we have $\bar r_{(sd)} = r_{(sd)}/\alpha < r_{(sd)}$ and $\bar v_{(sd)} = v_{(sd)}/\alpha < v_{(sd)}$; hence from Theorem 3.1, $\bar\lambda_{(sd)}^k < \tilde\lambda_{(sd)}^k$, and so $\hat\lambda_{(sd)}^k = \bar\lambda_{(sd)}^k / \alpha < \bar\lambda_{(sd)}^k < \tilde\lambda_{(sd)}^k$, which concludes the proof. □

6. An open BCMP queuing network

In this section we study an example of the Braess paradox in a network consisting entirely of BCMP queues (Baskett et al. (1975); BCMP stands for the initials of the authors); see also Kelly (1979).

6.1 BCMP queuing network

We consider an open BCMP queuing network model that consists of $L$ service links. Each service center contains a single-server queue with the processor-sharing (PS) discipline. We assume that the service rate of each single server is state independent. Jobs are classified into $K$ different classes. The arrival process of jobs of each class forms a Poisson process and is independent of the state of the system.

Let us denote the state of the network by $n = (n_1, n_2, \ldots, n_L)$, where $n_l = (n_l^1, n_l^2, \ldots, n_l^K)$, $n_l^k$ denotes the number of jobs of class $k$ at link $l$, and $n_l = \sum_k n_l^k$ is the total number of jobs at link $l$. For an open queuing network (Kelly (1979); Baskett et al. (1975)), the equilibrium probability of the network state $n$ is obtained as follows:

$p(n) = \prod_{l \in \mathcal{I}_L} \frac{p_l(n_l)}{G_l}$, where $p_l(n_l) = n_l! \prod_{k \in \mathcal{I}_K} \frac{(\rho_l^k)^{n_l^k}}{n_l^k!}$ and $G_l = \frac{1}{1 - \rho_l}$.

Let $E[n_l^k]$ be the average number of class $k$ jobs at link $l$. We have $E[n_l^k] = \rho_l^k / (1 - \rho_l)$. By using Little's formula, we have

$T_l^k = \frac{E[n_l^k]}{y_l^k} = \frac{1/\mu_l^k}{1 - \rho_l}$,

from which the average delay of a class $k$ job that flows through path $p \in P^k$ is given by

$T_p^k = \sum_{l \in \mathcal{I}_L} \delta_{lp} T_l^k = \sum_{l \in \mathcal{I}_L} \frac{\delta_{lp} / \mu_l^k}{1 - \rho_l}$.

We assume that $\mu_l^k$ can be represented by $\mu_l / c^k$; hence the average delays satisfy Assumption B.
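The per-class link delay formula above is simple to evaluate. The following sketch (with hypothetical rates and flows, not the chapter's numerical instance) computes $T_l^k = (1/\mu_l^k)/(1-\rho_l)$ for a single processor-sharing link shared by two classes:

```python
def class_delay(mu_l, c, y):
    """Per-class expected delay at a processor-sharing BCMP link.

    mu_l : base service rate of the link
    c    : class constants c^k, with mu_l^k = mu_l / c^k
    y    : class flows y_l^k on the link

    Returns the list T_l^k = (1 / mu_l^k) / (1 - rho_l), where
    rho_l = sum_k y_l^k / mu_l^k; valid only for rho_l < 1."""
    rho = sum(yk * ck / mu_l for yk, ck in zip(y, c))
    assert rho < 1.0, "link is overloaded"
    return [(ck / mu_l) / (1.0 - rho) for ck in c]

# Hypothetical link: mu_l = 10, classes with c = [1, 2], flows y = [2, 1].
delays = class_delay(10.0, [1.0, 2.0], [2.0, 1.0])
# rho_l = 0.4; the class-2 delay is exactly twice the class-1 delay,
# as required by Assumption B.
```

All classes share the same congestion factor $1/(1-\rho_l)$; only the class constant $c^k$ differentiates their delays, which is what makes the multiclass analysis tractable.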
[Figure 3.1. Network: input node 1, output node 4; classes k = 1, 2, 3.]

6.2 Braess paradox

Consider the network shown in Figure 3.1. Packets are classified into three different classes. Links (1,2) and (3,4) each have the service rates $\mu^1 = \mu_1$, $\mu^2 = 2\mu_1$ and $\mu^3 = 3\mu_1$, where $\mu_1 = 2.7$. Link (1,3) represents a path of $n$ tandem links, each with service rates $\mu^1 = \mu_2$, $\mu^2 = 2\mu_2$ and $\mu^3 = 3\mu_2$, with $\mu_2 = 27$. Similarly, link (2,4) is a path made of $n$ consecutive links, each with service rates $\mu^1 = 27$, $\mu^2 = 54$ and $\mu^3 = 81$. Link (2,3) is a path of $n$ consecutive links, each with service rates $\mu^1 = \mu$, $\mu^2 = 2\mu$ and $\mu^3 = 3\mu$, where $\mu$ varies
from 0 (absence of the link) to infinity. We denote by $x_{p_1}^k$ the left flow of class $k$, using links (1,2) and (2,4); by $x_{p_2}^k$ the right flow of class $k$, using links (1,3) and (3,4); and by $x_{p_3}^k$ the zigzag flow of class $k$, using links (1,2), (2,3) and (3,4). The total cost for each class is given by

$T^k = x_{p_1}^k T_{p_1}^k + x_{p_2}^k T_{p_2}^k + x_{p_3}^k T_{p_3}^k$, where $x_{p_1}^k + x_{p_2}^k + x_{p_3}^k = r^k$.

We first consider the scenario where additional capacity $\mu$ is added to path (2,3), for $n = 54$, $r^1 = 0.6$, $r^2 = 1.6$ and $r^3 = 1.8$. In Figure 3.2 we observe that no traffic uses the zigzag path for $0 \le \mu \le 36.28$. For $36.28 \le \mu \le 96.49$, all three paths are used. For $\mu > 96.49$, all traffic uses the zigzag path. For $\mu$ between 36.28 and 96.49, the delay is, paradoxically, worse than it would be without the zigzag path. The delay of class 1 (resp. 2, 3) decreases to 2.85 (resp. 1.42, 0.95) as $\mu$ goes to infinity.
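The mechanism behind this numerical example is the classical Braess construction. The following sketch reproduces it in its standard textbook form (a single class with linear link costs, a hypothetical instance and not the BCMP network above): adding a free link raises the equilibrium travel cost.

```python
def braess_costs(demand=1.0, with_link=True):
    """Wardrop equilibrium travel cost in the classic Braess network.

    Link costs (the standard textbook instance): c(s->v) = flow,
    c(w->t) = flow, c(v->t) = c(s->w) = 1, and the added link v->w
    has zero cost.  demand is the total flow shipped from s to t."""
    if not with_link:
        # Two paths, s->v->t and s->w->t, each with cost x + 1;
        # by symmetry the equal split is the unique equilibrium.
        x = demand / 2.0
        return x + 1.0
    # All flow on the zigzag path s->v->w->t: its cost is d + 0 + d,
    # while each two-link path costs d + 1 >= 2d whenever d <= 1,
    # so no job gains by deviating and the Wardrop conditions hold.
    return 2.0 * demand

before = braess_costs(1.0, with_link=False)  # 1.5
after = braess_costs(1.0, with_link=True)    # 2.0, worse: the paradox
```

The design rules of Section 5 rule out exactly this kind of upgrade: the link (v,w) is neither a direct s-to-t path nor a uniform improvement of all paths.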
[Figure 3.2. Delay of each class as a function of the added capacity in path (2,3). Regimes: zigzag path not used; all paths used; only zigzag path used.]

6.3 Adding a direct path between source and destination

Now we use the method proposed in Theorems 3.2 and 3.3, i.e., the upgrade achieved by adding a direct path connecting source 1 and destination 4. The results in Theorems 3.2 and 3.3 suggest that another good design practice is to focus the upgrades on direct connections between source and destination; Figure 3.4 illustrates that this approach indeed decreases the delay of each class.
[Figure 3.3. New network, with a direct path added between node 1 and node 4.]

[Figure 3.4. Delay of each class as a function of the added capacity in path (1,4). Regimes: new path not used; all paths used; only zigzag path and new path used; only new path used.]

6.4 Multiplying the capacity of all links $l \in \mathcal{I}_L$ by a constant factor $\alpha > 1$

Now we use the method proposed in Theorem 3.5 for efficiently adding resources to this network.
[Figure 3.5. New network, with the service rate of every link multiplied by $\alpha$.]

Figure 3.6 shows the delay of each class as a function of the additional capacity $\mu$, where $\mu = (\alpha - 1)(2\mu_1 + 2\mu_2 + \mu_3)$ with $\mu_1 = 2.7$, $\mu_2 = 27$ and $\mu_3 = 40$.

[Figure 3.6. Delay of each class as a function of the added capacity in all links. Regimes: only zigzag path used; all paths used.]

Figure 3.6 indicates that the delay of each class decreases when the additional capacity $\mu$ increases. Hence the Braess paradox is indeed avoided.

Acknowledgments. This work was partially supported by a research contract with France Telecom R&D No. 001B001.

References
Altman, E., El Azouzi, R., and Pourtallier, O. (2001). Avoiding paradoxes in routing games. Proceedings of the 17th International Teletraffic Congress, Salvador da Bahia, Brazil, September 24–28.
Baskett, F., Chandy, K.M., Muntz, R.R., and Palacios, F. (1975). Open, closed and mixed networks of queues with different classes of customers. Journal of the ACM, 22(2):248–260.
Bean, N.G., Kelly, F.P., and Taylor, P.G. (1997). Braess's paradox in a loss network. Journal of Applied Probability, 34:155–159.
Braess, D. (1968). Über ein Paradoxon aus der Verkehrsplanung. Unternehmensforschung, 12:258–268.
Calvert, B., Solomon, W., and Ziedins, I. (1997). Braess's paradox in a queueing network with state-dependent routing. Journal of Applied Probability, 34:134–154.
Cohen, J.E. and Jeffries, C. (1997). Congestion resulting from increased capacity in single-server queueing networks. IEEE/ACM Transactions on Networking, 5(2):1220–1225.
Cohen, J.E. and Kelly, F.P. (1990). A paradox of congestion in a queueing network. Journal of Applied Probability, 27:730–734.
Dafermos, S. and Nagurney, A. (1984a). On some traffic equilibrium theory paradoxes. Transportation Research B, 18:101–110.
Dafermos, S. and Nagurney, A. (1984b). Sensitivity analysis for the asymmetric network equilibrium problem. Mathematical Programming, 28:174–184.
Gupta, P. and Kumar, P.R. (1998). A system and traffic dependent adaptive routing algorithm for ad hoc networks. Proceedings of the 37th IEEE Conference on Decision and Control, Tampa, Florida, USA.
Kameda, H. and Pourtallier, O. (2002). Paradoxes in distributed decisions on optimal load balancing for networks of homogeneous computers. Journal of the ACM, 49(3):407–433.
Kameda, H. and Zhang, Y. (1995). Uniqueness of the solution for optimal static routing in open BCMP queueing networks. Mathematical and Computer Modelling, 22(10–12):119–130.
Kameda, H., Altman, E., and Kozawa, T. (1999).
A case where a paradox like Braess's occurs in the Nash equilibrium but does not occur in the Wardrop equilibrium: a situation of load balancing in distributed computer systems. Proceedings of the 38th IEEE Conference on Decision and Control, Phoenix, Arizona, USA.
Kameda, H., Altman, E., Kozawa, T., and Hosokawa, Y. (2000). Braess-like paradoxes of Nash equilibria for load balancing in distributed computer systems. IEEE Transactions on Automatic Control, 45(9):1687–1691.
Kelly, F.P. (1979). Reversibility and Stochastic Networks. John Wiley & Sons, Ltd., New York.
Kobayashi, H. (1978). Modeling and Analysis: An Introduction to System Performance Evaluation Methodology. Addison-Wesley.
Korilis, Y.A., Lazar, A.A., and Orda, A. (1995). Architecting noncooperative networks. IEEE Journal on Selected Areas in Communications, 13:1241–1251.
Korilis, Y.A., Lazar, A.A., and Orda, A. (1997). Capacity allocation under noncooperative routing. IEEE Transactions on Automatic Control, 42(3):309–325.
Korilis, Y.A., Lazar, A.A., and Orda, A. (1999). Avoiding the Braess paradox in noncooperative networks. Journal of Applied Probability, 36:211–222.
Masuda, Y. (1999). Braess's paradox and capacity management in decentralised networks. Manuscript.
Orda, A., Rom, R., and Shimkin, N. (1993). Competitive routing in multiuser communication networks. IEEE/ACM Transactions on Networking, 1:510–521.
Patriksson, M. (1994). The Traffic Assignment Problem: Models and Methods. VSP BV, Topics in Transportation, The Netherlands.
Roughgarden, T. and Tardos, É. (2002). How bad is selfish routing? Journal of the ACM, 49(2):236–259.
Smith, M.J. (1979). The marginal cost taxation of a transportation network. Transportation Research B, 13:237–242.
Wardrop, J.G. (1952). Some theoretical aspects of road traffic research. Proceedings of the Institution of Civil Engineers, Part 2, pages 325–378.

Chapter 4

PRODUCTION GAMES AND PRICE DYNAMICS
Sjur Didrik Flåm
Abstract This note considers production (or market) games with transferable utility. It brings out that, in many cases, explicit core solutions may be defined by shadow prices, and reached via quite natural dynamics.

1. Introduction

Noncooperative game theory has, during recent decades, come to occupy central ground in economics. It now unifies diverse fields and links many branches (Forgó, Szép and Szidarovszky (1999), Gintis (2000), Vega-Redondo (2003)). Much progress came with circumventing the strategic form, focusing instead on extensive games. Important in such games are the rules that prescribe who can do what, when, and on the basis of which information.

Cooperative game theory (Forgó, Szép and Szidarovszky (1999), Peyton Young (1994)) has, however, in the same period, seen less expansion and fewer new applications. This fact might mirror some dissatisfaction with the plethora of solution concepts, or with applying only the characteristic function. Notably, that function subsumes, and often conceals, underlying activities, choices and data, all indispensable for a proper understanding of the situation at hand. By directing full attention to payoff (or cost) sharing, the said function presumes that each relevant input has already been processed. Such a predilection to work only with reduced, essential data may entail several risks. One is to overlook prospective "devils hidden in the details." Others could come with ignoring particular features crucial for the formation of viable coalitions. Also, the timing of players' decisions, and the associated information flow, might easily escape into the background (Flåm (2002)).

All these risks apply, in strong measure, to so-called production (or market) games, and they are best mitigated by keeping data pretty much as is. The said games occur frequently and are of great importance. Each instance involves parties concerned with the equitable sharing of efficient production costs.
Given a nonempty finite player set I, coalition S ⊆ I presumably incurs a stand-alone cost CS ∈ R ∪ {+∞} that results from explicit planning and optimization.¹ Along that line I consider here the quite general format

    CS := inf { fS(x) + hS(gS(x)) }.    (4.1)

In (4.1) the function fS takes a prescribed set XS into R ∪ {+∞}; the operator gS maps that same set XS into a real vector space E; and finally, hS : E → R ∪ {+∞} is a sort of penalty function. Section 2 provides examples.

As customary, a cost profile c = (ci) ∈ R^I is declared in the core, written c ∈ core, iff it embodies

    full cost cover:         Σ_{i∈I} ci ≥ CI,    and
    coalitional stability:   Σ_{i∈S} ci ≤ CS for each nonempty subset S ⊆ I.

Plainly, this solution concept makes good sense when core is neither empty, nor too large, nor very sensitive to data.

Given the characteristic function S → CS, defined by (4.1), my objects below are three: first, to study duality without invoking integer programming²; second, to display explicit core solutions, generated by so-called shadow prices; and third, to elaborate how such entities might be reached.

Motivation stems from several sources. There is the recurrent need to reach beyond instances with convex preferences and production sets. Notably, some room should be given to discrete activity (decision) sets XS — as well as to nonconvex objectives fS and constraint functions gS. Besides, the well-known bridge connecting competitive equilibrium to core outcomes (Ellickson (1993)), while central in welfare economics, deserves easier and more frequent crossings — in both directions. Also, it merits emphasis that Lagrangian duality, the main vehicle here, often invites more tractable computations than might commonly be expected. And, not least, as in microeconomic theory (Mas-Colell, Whinston and Green (1995)), one wonders about the emergence and stability of equilibrium prices.

What imports in the sequel is that key resources — seen as private endowments, and construed as vectors in E — be perfectly divisible and transferable. Resource scarcity generates common willingness to pay for appropriation. Thus emerge endogenous shadow prices that equilibrate intrinsic exchange markets. Regarding the grand coalition, Section 3 argues that — absent a duality gap, and granted attainment of optimal dual values — these prices determine specific core imputations. Section 4 brings out that equilibrating prices can be reached via repeated play.

¹ References include Dubey and Shapley (1984), Evstigneev and Flåm (2001), Flåm (2002), Flåm and Jourani (2003), Granot (1986), Kalai and Zemel (1982), Owen (1975), Samet and Zemel (1994), Sandsmark (1999) and Shapley and Shubik (1969).
² Important issues concern indivisibilities and mathematical programming, but these will be avoided here; see Gomory and Baumol (1960), Scarf (1990), Scarf (1994), Wolsey (1981).

2. Production games

As said, coalition S, if it were to form, would attempt to solve problem (4.1). For interpretation, construe fS : XS → R ∪ {+∞} as an aggregate cost function. Further, let gS : XS → E govern — and account for — resource consumption or technological features. Finally, hS : E → R ∪ {+∞} should be seen as a penalty mechanism meant to enforce feasibility. Coalition S has XS as activity (decision) space. In the sequel no linear or topological structure will be imposed on the latter. Note though that E, the vector space of "resource endowments", is common to all agents and coalitions. By tacit assumption hS(gS(x)) = +∞ when x ∉ XS.

To my knowledge, TU production (or market) games have rarely been defined in such generality. Format (PS) can accommodate a wide variety of instances. To wit, consider

Example 4.1 (Nonlinear constrained, cooperative programming.) For each i ∈ I there is a nonempty set Xi, two functions fi : Xi → R ∪ {+∞}, gi : Xi → E, and a constraint gi(xi) ∈ Ki ⊂ E. Let then XS := Π_{i∈S} Xi. Further, posit fS(x) := Σ_{i∈S} fi(xi) and gS(x) := Σ_{i∈S} gi(xi). Finally, define hS(e) = 0 if e ∈ Σ_{i∈S} Ki, and let hS(e) = +∞ otherwise.

Example 4.2 (Inf-convolution.)
Of particular importance is the special case of the preceding example where Xi = E, gi(xi) = xi − ei, and Ki = {0}. Coalition cost is then defined by the so-called infimal convolution

    CS := inf { Σ_{i∈S} fi(xi) : Σ_{i∈S} xi = Σ_{i∈S} ei }.

In Example 4.2 only convexity is needed to have a nonempty core. This is brought out by the following

Proposition 4.1 (Convex separable cost yields nonempty core.) Suppose XS = Π_{i∈S} Xi with each Xi convex. Also suppose

    CS ≥ inf { Σ_{i∈S} fi(xi) : Σ_{i∈S} Ai xi = 0, xi ∈ Xi }  for all S ⊂ I,

with each fi convex, each Ai : Xi → E affine, and equality when S = I. Then the core is nonempty.

Proof. Let S → wS ≥ 0 be any balanced collection of weights. That is, assume Σ_{S:i∈S} wS = 1 for all i. Pick any positive ε and, for each coalition S, a profile xS = (xiS) ∈ XS such that Σ_{i∈S} fi(xiS) ≤ CS + ε and Σ_{i∈S} Ai xiS = 0. Posit xi := Σ_{S:i∈S} wS xiS. Then xi ∈ Xi, Σ_{i∈I} Ai xi = 0, and

    CI ≤ Σ_{i∈I} fi( Σ_{S:i∈S} wS xiS ) ≤ Σ_{i∈I} Σ_{S:i∈S} wS fi(xiS)
       = Σ_S wS Σ_{i∈S} fi(xiS) ≤ Σ_S wS [CS + ε].

Since ε > 0 was arbitrary, it follows that CI ≤ Σ_S wS CS. The Bondareva–Shapley result (Shapley (1967)) now certifies that the core is nonempty. □

Proposition 4.1 indicates good prospects for finding nonempty cores, but it provides less than full satisfaction. No explicit solution is listed. "Too much" convexity is required of the activity sets Xi and cost functions fi. Resource aggregation is "too linear." And the original data do not enter explicitly. Together these drawbacks motivate a closer look at the grand coalition S = I.

3. Lagrange multipliers, subgradients and minmax

This section contains auxiliary, quite useful material. It takes out time and space to consider the problem and cost

    (PI)    CI := inf { fI(x) + hI(gI(x)) }

of the grand coalition. For easier notation, write simply P, C, f, g, h, X instead of PI, CI, fI, gI, hI, XI, respectively. Much analysis revolves hereafter around the perturbed function

    (x, e, y) ∈ X × E × Y → f(x) + h(g(x) + e) − ⟨y, e⟩.    (4.2)

Here Y is a judiciously chosen, convex, nonempty set of linear functionals y : E → R. The appropriate nature of Y is made precise later. This means that additional properties of the functionals y (besides linearity) will be invoked only when needed. As customary, the expression ⟨y, e⟩ stands for y(e).

Objective (4.2) features a perturbation e available at a premium ⟨y, e⟩. Thus (4.2) relaxes and imbeds problem (P) into a competitive market where any endowment e ∈ E is evaluated at constant "unit price" y. To mirror this situation, associate to h the Fenchel conjugate

    h*(y) := sup { ⟨y, e⟩ − h(e) : e ∈ E }.

If h(e) denotes the cost of an enterprise that produces output e at revenues ⟨y, e⟩, then h*(y) is the corresponding profit. In economic jargon the firm at hand is a price-taker in the output market.
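The conjugacy operation is easy to experiment with numerically. The sketch below (pure Python, one-dimensional E, grid search; the quadratic penalty and all names are illustrative assumptions, not taken from the text) computes h*(y) = sup { ⟨y, e⟩ − h(e) } and the biconjugate h**(e) = sup { ⟨y, e⟩ − h*(y) }, and checks that h** never exceeds h:

```python
# Numerical sketch of the Fenchel conjugate and biconjugate on a 1-D grid.
# Illustrative penalty: h(e) = e^2 / 2, whose conjugate is again y^2 / 2.

GRID = [i / 20.0 for i in range(-100, 101)]  # e- and y-values in [-5, 5]

def h(e):
    return 0.5 * e * e

def conjugate(fun):
    # Returns the map y -> sup over the grid of { y*e - fun(e) }.
    def star(y):
        return max(y * e - fun(e) for e in GRID)
    return star

h_star = conjugate(h)            # h*(y)
h_star_star = conjugate(h_star)  # h**(e), conjugating twice

# For this convex h the conjugate is y^2/2 and h** recovers h itself.
for y in (-2.0, 0.0, 1.5):
    assert abs(h_star(y) - 0.5 * y * y) < 1e-3
for e in (-1.0, 0.5, 2.0):
    # Biconjugate never exceeds the function: h** <= h.
    assert h_star_star(e) <= h(e) + 1e-9
```

For nonconvex h the same code would return the convex envelope as h**, strictly below h somewhere, which is exactly the gap the text is concerned with.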
Clearly, h* : Y → R ∪ {±∞} is convex, and the biconjugate function

    h**(e) := sup { ⟨y, e⟩ − h*(y) : y ∈ Y }

satisfies h** ≤ h. Anyway, the relaxed objective (4.2) naturally generates a Lagrangian

    L(x, y) := inf { f(x) + h(g(x) + e) − ⟨y, e⟩ : e ∈ E } = f(x) + ⟨y, g(x)⟩ − h*(y),

defined on X × Y. Call now ȳ ∈ Y a Lagrange multiplier iff it belongs to the set

    M := { y ∈ Y : inf_x L(x, y) ≥ C =: inf(P) }.

Note that M is convex, but maybe empty. Intimately related to problem (P) is also the marginal (optimal value) function

    e ∈ E → V(e) := inf { f(x) + h(g(x) + e) : x ∈ X }.

Of prime interest are differential properties of V(·) at the distinguished point e = 0. The functional ȳ ∈ Y is called a subgradient of V : E → R ∪ {±∞} at 0, written ȳ ∈ ∂V(0), iff

    V(e) ≥ V(0) + ⟨ȳ, e⟩ for all e.

And V is declared subdifferentiable at 0 iff ∂V(0) is nonempty. The following two propositions are, at least in part, well known. They are included and proven here for completeness:

Proposition 4.2 (Subgradient = Lagrange multiplier.) Suppose C = inf(P) = V(0) is finite. Then ∂V(0) = M.

Proof. Letting ē := g(x) + e we get

    ȳ ∈ ∂V(0)
    ⇔ f(x) + h(ē) = f(x) + h(g(x) + e) ≥ V(e) ≥ V(0) + ⟨ȳ, e⟩ for all (x, e) ∈ X × E
    ⇔ f(x) + ⟨ȳ, g(x)⟩ + h(ē) − ⟨ȳ, ē⟩ ≥ V(0) for all (x, ē) ∈ X × E
    ⇔ f(x) + ⟨ȳ, g(x)⟩ − h*(ȳ) ≥ V(0) for all x ∈ X
    ⇔ inf_x L(x, ȳ) ≥ inf(P) ⇔ ȳ ∈ M. □

Proposition 4.3 (Strong stability, minmax, biconjugacy and value attainment.) The value function V is subdifferentiable at 0 — and problem (P) is then called strongly stable — iff inf(P) is finite and equals the saddle value

    inf_x L(x, ȳ) = inf_x sup_y L(x, y)  for each Lagrange multiplier ȳ.

In that case V**(0) = V(0). Moreover, if problem (P) is strongly stable, then

• for each optimal solution x̄ to (P) and Lagrange multiplier ȳ it holds that ȳ ∈ ∂h(g(x̄)) and

    f(x̄) + ⟨ȳ, g(x̄)⟩ = min { f(x) + ⟨ȳ, g(x)⟩ : x ∈ X };    (4.3)

• for each pair (x̄, ȳ) ∈ X × Y that satisfies (4.3) with ȳ ∈ ∂h(g(x̄)), the point x̄ solves (P) optimally.

Proof. By Proposition 4.2, ∂V(0) ≠ ∅ ⇔ M ≠ ∅ ⇔ ∃ȳ ∈ Y such that inf_x L(x, ȳ) ≥ V(0). In this string, any ȳ ∈ ∂V(0) = M applies to yield

    sup_y inf_x L(x, y) ≥ inf_x L(x, ȳ) ≥ inf(P).
In addition, the inequality f(x) + h(g(x)) ≥ f(x) + ⟨y, g(x)⟩ − h*(y) is valid for all (x, y) ∈ X × Y. Consequently, inf(P) ≥ inf_x sup_y L(x, y). Thus the inequality inf_x L(x, ȳ) ≥ inf(P) is squeezed in a sandwich:

    sup_y inf_x L(x, y) ≥ inf_x L(x, ȳ) ≥ inf(P) ≥ inf_x sup_y L(x, y).    (4.4)

Equalities hold in (4.4) because sup_y inf_x L(x, y) ≤ inf_x sup_y L(x, y). From V*(y) = −inf_x L(x, y) it follows that V**(0) = sup_y inf_x L(x, y). If (P) is strongly stable, then — by the preceding argument — the last entity equals inf(P) = V(0).

Finally, given any minimizer x̄ of (P), pick an arbitrary ȳ ∈ M = ∂V(0). It holds for each x ∈ X that

    f(x̄) + h(g(x̄)) = inf(P) ≤ L(x, ȳ) = f(x) + ⟨ȳ, g(x)⟩ − h*(ȳ).

Insert x = x̄ on the right-hand side to have h(g(x̄)) ≤ ⟨ȳ, g(x̄)⟩ − h*(ȳ), whence

    h(g(x̄)) + h*(ȳ) = ⟨ȳ, g(x̄)⟩.    (4.5)

This implies first, ȳ ∈ ∂h(g(x̄)) and second, (4.3). For the last bullet, (4.3) and ȳ ∈ ∂h(g(x̄)) (⇔ (4.5)) yield

    f(x̄) + h(g(x̄)) = min_x { f(x) + ⟨ȳ, g(x)⟩ − h*(ȳ) } = min_x L(x, ȳ).

This tells, in view of (4.4), that x̄ minimizes (P) — and that ȳ ∈ M. □

So far, using only algebra and numerical ordering, Lagrange multipliers — or equivalently, subgradients — were proven expedient for Lagrangian duality. It remains to be argued next that such multipliers do indeed exist in common circumstances. For that purpose recall that a point c in a subset C of a real vector space is declared absorbing if for each nonzero direction d in that space there exists a real r > 0 such that c + ]0, r[ d ⊂ C. Also recall that conv C denotes the convex hull of C, and epi V := { (e, r) ∈ E × R : V(e) ≤ r } is shorthand for the epigraph of V.

Proposition 4.4 (Linear support of V at 0.) Suppose 0 is absorbing in dom V. Also suppose conv(epi V) contains an absorbing point, but that (0, V(0)) is not of such sort. Then, letting Y consist of all linear y : E → R, the subdifferential ∂V(0) is nonempty.

Proof. By the Hahn–Banach separation theorem there is a hyperplane that supports C := conv(epi V) at the point (0, V(0)). That hyperplane is defined in terms of a linear functional (e*, r*) ≠ 0 by

    ⟨e*, e⟩ + r* r ≥ r* V(0) for all (e, r) ∈ C.    (4.6)

Plainly, (e, r) ∈ C & r̄ > r ⇒ (e, r̄) ∈ C. Consequently, r* ≥ 0. If r* = 0 then, since 0 is absorbing in dom V, it holds that ⟨e*, e⟩ ≥ 0 for all e ∈ E, whence e* = 0, and the contradiction (e*, r*) = 0 obtains. So, divide through (4.6) by r* > 0, define ȳ := −e*/r*, and put r = V(e) to have V(e) ≥ V(0) + ⟨ȳ, e⟩ for all e ∈ E. That is, ȳ ∈ ∂V(0). □

Proposition 4.5 (Continuous linear support of V at 0.) Let E be a topological, locally convex, separated, real vector space. Denote by V̌ the largest convex function ≤ V. Suppose V̌ is finite-valued, bounded above on a neighborhood of 0, and V̌(0) = V(0). Then, letting Y consist of all continuous linear functionals y : E → R, the subdifferential ∂V(0) is nonempty.
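A one-dimensional numerical sketch may help fix ideas about subgradients of the marginal function. The perturbed problem below is an illustrative assumption, not taken from the text: V(e) = inf { x² : x ≥ 1 − e }, for which ȳ = −2 (the derivative V′(0)) supports V at 0:

```python
# Marginal value of a perturbed one-dimensional problem (illustrative):
#   V(e) = inf { x^2 : x >= 1 - e },  so V(e) = max(1 - e, 0)^2.

def V(e):
    x = max(1.0 - e, 0.0)   # the minimizer clips against the constraint
    return x * x

y_bar = -2.0                # candidate subgradient: V'(0) = -2 here

# Subgradient inequality V(e) >= V(0) + y_bar * e over a range of
# perturbations e; it holds with slack e^2 for e <= 1 and trivially beyond.
for e in [i / 10.0 for i in range(-30, 31)]:
    assert V(e) >= V(0.0) + y_bar * e - 1e-12
```

Loosening the constraint (e > 0) lowers cost at marginal rate |ȳ| = 2, which is the shadow-price reading of the subgradient.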
Propositions 4.4 and 4.5 emphasize the convenience of (0, V(0)) being "non-interior" to conv(epi V). In particular, it simplifies things to have epi V — or equivalently, V itself — convex.

4. Saddle points and core solutions

After so much preparation it is time to reconsider the production game having coalition costs CS defined by (4.1). Quite reasonably, suppose henceforth that a weak form of subadditivity holds, namely

    CI ≤ Σ_{i∈I} Ci < +∞.

As announced, the objective is to find explicit cost allocations c = (ci) ∈ core. For that purpose, in view of Example 4.2, recall that

    LS(x, y) = fS(x) + ⟨y, gS(x)⟩ − h*_S(y),

and introduce a standing

Hypothesis on additive estimates: Let henceforth XS := Π_{i∈S} Xi and suppose there exist, for each i, functions fi : Xi → R ∪ {+∞}, gi : Xi → E, and hi : E → R ∪ {+∞} such that for all S ⊆ I and y ∈ Y,

    inf_x LS(x, y) ≥ inf { Σ_{i∈S} [ fi(xi) + ⟨y, gi(xi)⟩ − h*_i(y) ] : xi ∈ Xi }.    (4.7)

Further, for the grand coalition S = I it should hold that

    inf_x LI(x, y) ≤ sup_y inf { Σ_{i∈I} [ fi(xi) + ⟨y, gi(xi)⟩ − h*_i(y) ] : xi ∈ Xi }.

Proposition 4.6 The standing hypothesis holds if for all S ⊆ I, x ∈ XS, y ∈ Y,

    fS(x) + ⟨y, gS(x)⟩ ≥ Σ_{i∈S} { fi(xi) + ⟨y, gi(xi)⟩ },    (4.8)

and for all e ∈ E,

    hS(e) ≥ inf { Σ_{i∈S} hi(ei) : Σ_{i∈S} ei = e },    (4.9)

with equalities when S = I.

Proof. (4.9) implies h*_S(y) ≤ Σ_{i∈S} h*_i(y). □

Example 4.3 (Positively homogeneous penalty.) Let h : E → R ∪ {+∞} be positively homogeneous. For example, h could be the support function of some nonempty subset of a predual to E. Then h*, restricted to Y, is the extended indicator δ_Ŷ of some convex set Ŷ ⊆ Y. That is, h*(y) = 0 if y ∈ Ŷ, and +∞ otherwise. Suppose h*_S = h* for all S ⊆ I. Then (4.8) implies (4.7).

Example 4.4 (Cone constraints.) Of special notice is the instance where h, as described in Example 4.3, equals the extended indicator δ_K of a convex cone K ⊂ E. Then h* = δ_{K*}, where K* := { y : ⟨y, K⟩ ≤ 0 } is the negative dual cone. In Example 4.1 let all Ki be the same convex cone K ⊂ E and posit h*_S := h* for all S ⊆ I. Then the standing hypothesis is satisfied, and coalition S incurs cost
    CS := inf { Σ_{i∈S} fi(xi) : Σ_{i∈S} gi(xi) ∈ K, xi ∈ Xi }.

Observe that costs and constraints are here pooled additively. However, no activity set can be transferred from any agent to another.

Example 4.5 (Inf-convolution of penalties.) When

    hS(e) := inf { Σ_{i∈S} hi(xi) : Σ_{i∈S} xi = e },

we get h*_S(y) = Σ_{i∈S} h*_i(y).

Theorem 4.1 (Nonemptiness of the core and explicit allocations.) Suppose V_I**(0) = V_I(0). Then, under the standing hypothesis, core ≠ ∅. If, moreover, V_I(·) is subdifferentiable at 0 — that is, if (PI) is strongly stable — then each Lagrange multiplier ȳ for problem (PI) generates a cost allocation c = (ci) ∈ core by the formula

    ci = ci(ȳ) := inf { fi(xi) + ⟨ȳ, gi(xi)⟩ − h*_i(ȳ) : xi ∈ Xi }.    (4.10)

Proof. By the standing hypothesis it holds, for any price y ∈ Y and each coalition S, that

    Σ_{i∈S} ci(y) ≤ inf_x LS(x, y) ≤ sup_y inf_x LS(x, y) ≤ inf_x sup_y LS(x, y)
    = inf_x { fS(x) + h**_S(gS(x)) } ≤ inf_x { fS(x) + hS(gS(x)) } = CS.

So no coalition ought reasonably to block a payment scheme of the said sort i → ci(y). In addition, if ȳ is a Lagrange multiplier, then

    Σ_{i∈I} ci(ȳ) = inf_x LI(x, ȳ) ≥ CI.

In lack of strong stability, when merely V_I**(0) = V_I(0), choose for each integer n a "price" y^n ∈ Y such that the numbers c^n_i = ci(y^n), i ∈ I, satisfy

    Σ_{i∈I} c^n_i = inf_x LI(x, y^n) ≥ CI − 1/n.

As argued above, Σ_{i∈S} c^n_i ≤ CS for all S ⊆ I. In particular, c^n_i ≤ Ci < +∞. From c^n_i ≥ CI − 1/n − Σ_{j≠i} Cj it follows that the sequence (c^n) is bounded. Clearly, any accumulation point c belongs to core. □

Example 4.6 (Cooperative linear programming.) A special and important version of Example 4.1 — and Example 4.2 — has Xi := R^{ni}_+ with linear cost k_i^T xi, ki ∈ R^{ni}, and linear constraints gi(xi) := Ai xi − ei, ei ∈ R^m, Ai being an m × ni matrix. Posit Ki := {0} for all i to get, for coalition S, cost given by the standard linear program
    (PS)    CS := inf { Σ_{i∈S} k_i^T xi : Σ_{i∈S} Ai xi = Σ_{i∈S} ei, xi ≥ 0 for all i }.

Suppose that the primal problem (PI), as just defined, and its dual

    (DI)    sup { y^T Σ_{i∈I} ei : A_i^T y ≤ ki for all i }

are both feasible. Then the optimal dual value is attained and, by Theorem 4.1, for any optimal solution ȳ to (DI), the payment scheme ci := ȳ^T ei yields (ci) ∈ core. Thus, regarding ei as the production target of "factory" or corporate division i, those targets are evaluated by a common price ȳ, generated endogenously.

Example 4.7 (Inf-convolution continued.) Each Lagrange multiplier ȳ that applies to Example 4.2 generates a cost allocation (ci) ∈ core via

    ci := ⟨ȳ, ei⟩ − f*_i(ȳ).

This formula is quite telling: each agent is charged for his "production target" less the price-taking profit he can generate, both entities calculated at shadow prices.

5. Price dynamics

Suppose henceforth that there exists at least one Lagrange multiplier. That is, suppose M ≠ ∅. Further, for simplicity, let the endowment space E, and its topological dual Y, be real (finite-dimensional) Euclidean.³ Denote by D(y) := inf_{x∈XI} LI(x, y) the so-called dual objective. Most importantly, the function y ∈ Y → D(y) ∈ R ∪ {−∞} so defined is upper semicontinuous concave. And M, the optimal solution set of the dual problem

    sup { D(y) : y ∈ Y },

is nonempty, closed and convex. Consequently, each y ∈ Y has a unique orthogonal projection (closest approximation) ȳ = P_M y in M. Let Y := dom D and suppose Y is closed. Denote by T_y := cl R_+ (Y − y) the tangent cone of Y at the point y ∈ Y, and by N_y := { y* : ⟨y*, T_y⟩ ≤ 0 } the associated negative dual or normal cone.

Proposition 4.7 (Continuous-time price convergence to Lagrange multipliers.) For any initial point y(0) ∈ Y at which D is superdifferentiable, the two differential inclusions
    ẏ ∈ P_{T_y} ∂D(y)    and    ẏ ∈ ∂D(y) − N_y    (4.11)

admit the same, unique, absolutely continuous, infinitely extendable trajectory 0 ≤ t → y(t) ∈ Y. Moreover, y(t) − P_M y(t) → 0.

Proof. Let Δ(y) := min { ‖y − ȳ‖ : ȳ ∈ M } denote the Euclidean distance from y to the set M of optimal dual solutions. The system ẏ ∈ ∂D(y) − N_y has a unique, infinitely extendable solution y(·) along which

    (d/dt) Δ(y(t))²/2 = ⟨y − P_M y, ẏ⟩ ∈ ⟨y − P_M y, ∂D(y) − N_y⟩
                      ≤ ⟨y − P_M y, ∂D(y)⟩ ≤ D(y) − D(P_M y).

Concavity explains the last inequality. Because D(y) − D(P_M y) ≤ 0, it follows that y(t) − P_M y(t) → 0. The system ẏ ∈ P_{T_y} ∂D(y) also has an infinitely extendable solution, by Nagumo's theorem (Aubin (1991)). Since P_{T_y} ∂D(y) ⊆ ∂D(y) − N_y, that solution is unique as well. □

³ Real Hilbert spaces E can also be accommodated.

Proposition 4.8 (Discrete-time price convergence to Lagrange multipliers.) Suppose D is superdifferentiable on Y with ‖∂D(y)‖² ≤ κ(1 + ‖y − P_M y‖²) for some constant κ > 0. Select step sizes sk > 0 such that Σ sk = +∞ and Σ s²_k < +∞. Then, for any initial point y0 ∈ Y, the sequence {yk} generated iteratively by the difference inclusion
    yk+1 ∈ P_Y [ yk + sk ∂D(yk) ],    (4.12)

is bounded, and every accumulation point ȳ must be a Lagrange multiplier.

Proof. This result is well known, but its proof is outlined for completeness. Let ȳk := P_M yk and αk := ‖yk − ȳk‖². Then (4.12) implies

    αk+1 = ‖yk+1 − ȳk+1‖² ≤ ‖yk+1 − ȳk‖² ∈ ‖P_Y[yk + sk ∂D(yk)] − P_Y ȳk‖²
         ≤ ‖yk + sk ∂D(yk) − ȳk‖²
         ≤ ‖yk − ȳk‖² + s²_k ‖∂D(yk)‖² + 2 sk ⟨yk − ȳk, ∂D(yk)⟩
         ≤ αk(1 + βk) + γk − δk,

with βk := s²_k κ, γk := s²_k κ, and δk := −2 sk ⟨yk − ȳk, ∂D(yk)⟩. The demonstration of Proposition 4.7 brought out that

    ‖y − P_M y‖ > 0 ⇒ sup ⟨y − P_M y, ∂D(y)⟩ < 0.

Thus δk ≥ 0. Since Σ βk < +∞ and Σ γk < +∞, it must be the case that αk converges, and Σ δk < +∞; see Benveniste, Métivier, and Priouret (1990), Chapter 5, Lemma 2. If lim αk > 0, the property Σ sk = +∞ would imply the contradiction Σ δk = +∞. Thus αk → 0, and the proof is complete. □

Acknowledgments. I thank INDAM for generous support, the Università degli Studi di Trento for great hospitality, and Gabriele H. Greco for stimulating discussions.

References
Aubin, J.-P. (1991). Viability Theory. Birkhäuser, Boston.
Benveniste, A., Métivier, M., and Priouret, P. (1990). Adaptive Algorithms and Stochastic Approximations. Springer, Berlin.
Bourass, A. and Giner, E. (2001). Kuhn–Tucker conditions and integral functionals. Journal of Convex Analysis, 8(2):533–553.
Dubey, P. and Shapley, L.S. (1984). Totally balanced games arising from controlled programming problems. Mathematical Programming, 29:245–276.
Ellickson, B. (1993). Competitive Equilibrium. Cambridge University Press.
Evstigneev, I.V. and Flåm, S.D. (2001). Sharing nonconvex cost. Journal of Global Optimization, 20:257–271.
Flåm, S.D. (2002). Stochastic programming, cooperation, and risk exchange. Optimization Methods and Software, 17:493–504.
Flåm, S.D. and Jourani, A. (2003). Strategic behavior and partial cost sharing. Games and Economic Behavior, 43:44–56.
Forgó, F., Szép, J., and Szidarovszky, F. (1999). Introduction to the Theory of Games. Kluwer Academic Publishers, Dordrecht.
Gintis, H. (2000). Game Theory Evolving. Princeton University Press.
Gomory, R.E. and Baumol, W.J. (1960). Integer programming and pricing. Econometrica, 28(3):521–550.
Granot, D. (1986). A generalized linear production model: A unifying model. Mathematical Programming, 43:212–222.
Kalai, E. and Zemel, E. (1982). Generalized network problems yielding totally balanced games. Operations Research, 30(5):998–1008.
Mas-Colell, A., Whinston, M.D., and Green, J.R. (1995). Microeconomic Theory. Oxford University Press.
Owen, G. (1975). On the core of linear production games. Mathematical Programming, 9:358–370.
Peyton Young, H. (1994). Equity. Princeton University Press.
Samet, D. and Zemel, E. (1994). On the core and dual set of linear programming games. Mathematics of Operations Research, 9(2):309–316.
Sandsmark, M. (1999). Production games under uncertainty. Computational Economics, 3:237–253.
Scarf, H.E. (1990). Mathematical programming and economic theory. Operations Research, 38:377–385.
Scarf, H.E. (1994). The allocation of resources in the presence of indivisibilities. Journal of Economic Perspectives, 8(4):111–128.
Shapley, L.S. (1967). On balanced sets and cores. Naval Research Logistics Quarterly, 14:453–461.
Shapley, L.S. and Shubik, M. (1969). On market games. Journal of Economic Theory, 1:9–25.
Vega-Redondo, F. (2003). Economics and the Theory of Games. Cambridge University Press.
Wolsey, L.A. (1981). Integer programming duality: Price functions and sensitivity analysis. Mathematical Programming, 20:173–195.

Chapter 5

CONSISTENT CONJECTURES, EQUILIBRIA AND DYNAMIC GAMES
Alain Jean-Marie
Mabel Tidball
Abstract    We discuss in this paper the relationships between conjectures, conjectural equilibria, consistency and Nash equilibria in the classical theory of discrete-time dynamic games. We propose a theoretical framework in which we define conjectural equilibria with several degrees of consistency. In particular, we introduce feedback-consistency, and we prove that the corresponding equilibria and Nash-feedback equilibria of the game coincide. We discuss the relationship between these results and previous studies based on differential games and supergames.

1. Introduction

This paper discusses the relationships between the concept of conjectures and the classical theory of equilibria in dynamic games. The idea of introducing conjectures in games has a long history, which goes back to the work of Bowley (1924) and Frisch (1933). There are at least two related reasons for this. One is the wish to capture the idea that economic agents seem to have a tendency, in practice, to anticipate the moves of their opponents. The other is the necessity to cope with the lack, or the imprecision, of the information available to players.

The first notion of conjectures was developed for static games and has led to the theory of conjectural variations equilibria. The principle is that each player i assumes that her opponent j will "respond" to (infinitesimal) variations of her strategy δei by a proportional variation δej = rij δei. Considering this, player i faces an optimization problem in which her payoff Πi is perceived as depending only on her own strategy ei. A set of conjectural variations rij and a strategy profile (e1, ..., en) is a conjectural variations equilibrium if it solves simultaneously all players' optimization problems. The first-order conditions for those are:

    ∂Πi/∂ei (e1, ..., en) + Σ_{j≠i} rij ∂Πi/∂ej (e1, ..., en) = 0.    (5.1)

Conjectural variations equilibria generalize Nash equilibria, which correspond to "zero conjectures" rij = 0.

The concept of conjectural variations equilibria has received numerous criticisms. First, there is a problem of rationality: under the assumptions of complete knowledge, and common knowledge of rationality, the process of elimination of dominated strategies usually rules out anything but the Nash equilibrium. Second, the choice of the conjectural variations rij is a priori arbitrary, and without a way to single out particularly reasonable conjectures, the theory seems able to explain any observed outcome. Bresnahan (1981) has proposed to select conjectures that are consistent, in the sense that reaction functions and conjectured actions mutually coincide. Yet the principal criticisms persist.

These criticisms rest on the assumption of complete knowledge, and on the fact that conjectural variations games are static. Yet, from the onset, it was clear to the various authors discussing conjectural variations in static games that the proper (but less tractable) way of modeling agents would be a dynamic setting. Only the presence of a dynamic structure, with repeated interactions and the observation of what rivals have actually played, gives substance to the idea that players have responses. In a dynamic game, the structure of information is made precise, specifying in particular what information is observed and available to agents, based on which they choose their decisions. This allows one to state the principle of consistency as the coincidence of prior assumptions and observed facts. Making precise how conjectures and consistency can be defined is the main purpose of this paper.

Before turning to this point, let us note that there exists a second approach linking conjectures and dynamic games.
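For a concrete instance of condition (5.1), consider the textbook symmetric Cournot duopoly with inverse demand a − e1 − e2 and unit cost c, so that Πi = ei(a − e1 − e2) − c·ei. This minimal sketch (an illustrative model with illustrative parameters, not taken from the chapter) solves the symmetric first-order condition under a common conjectural variation r:

```python
# Conjectural variations equilibrium of a symmetric Cournot duopoly.
# Profits: Pi_i = e_i * (a - e_1 - e_2) - c * e_i  (illustrative model).
# Condition (5.1) with common conjecture r_ij = r reads
#   dPi_i/de_i + r * dPi_i/de_j = (a - c) - 2*e_i - e_j - r*e_i = 0,
# so a symmetric profile e_1 = e_2 = e satisfies e = (a - c) / (3 + r).

def cve_output(a, c, r):
    return (a - c) / (3.0 + r)

def foc(e, a, c, r):
    # Left-hand side of (5.1) at a symmetric profile (e, e).
    return (a - c) - 2.0 * e - e - r * e

a, c = 10.0, 1.0
nash = cve_output(a, c, 0.0)     # r = 0: zero conjectures = Cournot-Nash
assert abs(foc(nash, a, c, 0.0)) < 1e-12
assert abs(nash - (a - c) / 3.0) < 1e-12

# Negative matching conjectures expand output toward the competitive
# level; positive ones contract it toward collusion.
competitive = cve_output(a, c, -1.0)   # total output a - c, price = c
collusive = cve_output(a, c, 1.0)      # total output (a - c)/2, monopoly
assert competitive > nash > collusive
```

This also illustrates the arbitrariness criticism: by varying r, any output between the collusive and competitive levels can be "explained."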
Several authors have pointed out that computing stationary Nash-feedback equilibria in certain dynamic games leads to steady-state solutions which are identifiable with the conjectural variations equilibria of the associated static game. In that case, the value of the conjectural variation rij is precisely defined from the parameters of the dynamic game. This correspondence illustrates the idea that static conjectural variations equilibria are worth studying, if not as a factual description of interactions between players, then at least as a technical "shortcut" for studying more complex dynamic interactions. This principle has been applied in dynamic games, in particular by Driskill and McCafferty (1989), Wildasin (1991), Dockner (1992), Cabral (1995), Itaya and Shimomura (2001), Itaya and Okamura (2003) and Figuières (2002). These results are reported in Figuières et al. (2004), Chapter 2.

The rest of this paper is organized as follows. In Section 2, we present a model in which the notion of consistent conjecture is embedded in the definition of the game. We show in this context that consistent conjectures and feedback strategies are deeply related. In particular, we prove that, in contrast with the static case, Nash equilibria can be seen as consistent conjectural equilibria. This part is a development of ideas sketched in Chapter 3 of Figuières et al. (2004). These results are the discrete-time analogue of results of Fershtman and Kamien (1985), who first incorporated the notion of consistent conjectural equilibria into the theory of differential games. Section 3 is devoted to this case, in which conjectural equilibria provide a new interpretation of open-loop and closed-loop equilibria. We finish with a description of the model of Friedman (1968), who was the first to develop the idea of consistent conjectures, in the case of supergames with an infinite time horizon.
We show how Friedman's proposal of reaction-function equilibria fits in our general framework. We also review existence results obtained for such equilibria, in particular for Cournot's duopoly in a linear-quadratic setting (Section 4). We conclude the discussion in Section 5.

2. Conjectures for dynamic games, equilibria and consistency

The purpose of this section is to present a theoretical framework for defining consistent conjectures in discrete-time dynamic games, based essentially on the ideas of Fershtman and Kamien (1985) and Friedman (1968). The general principle is that (i) players form conjectures on how the other players react (or would react) to their actions, (ii) they optimize their payoff based on this assumption, and (iii) conjectures should be consistent with facts, in the sense that the evolution of the game should be the same as what was believed before implementing the decisions.

The idea of individual optimization based on some assumed "reaction" of other players is the heart of the conjectural principle, as we have seen in the introduction when discussing conjectural variations equilibria. We shall see, however, that differences appear in the way consistency is enforced. This depends on the information which is assumed available to players, in a way very similar to the definition of different equilibrium concepts in dynamic games. Among the possibilities, one postulates that consistency exists if conjectures and best responses coincide. This requirement is in the spirit of Bresnahan's (1981) definition of consistency in static conjectural variations games.

We review the general framework and different variants of the notion of consistency in Section 2.1. The principal contribution of this survey is a terminology distinguishing between the different variants of conjectures and conjectural equilibria used so far in the literature.
Next, in Section 2.2, we establish the links which exist between a particular concept of conjectural equilibria ("feedback-consistent" equilibria) and Nash-feedback equilibria for discrete-time dynamic games.

2.1 Principle

Consider a dynamic game with n players and time horizon T, finite or infinite. The state of the game at time t is described by a vector x(t) ∈ S. Player i has an influence on the evolution of the state, through a control variable. Let ei(t) be the control performed by player i in the t-th period of the game, that is, between times t and t + 1. Let E^i be the set in which player i chooses her control, E = E^1 × ··· × E^n, and E^{−i} = E^1 × ··· × E^{i−1} × E^{i+1} × ··· × E^n be the space of controls of player i's opponents. Let also e(t) = (e1(t), ..., en(t)) ∈ E denote the vector of controls applied by each player. An element of E^{−i} will be denoted by e^{−i}.

The state of the game at time t + 1 results from the combination of the controls e(t) and the state x(t). Formally, the state evolves according to some dynamics

    x(t + 1) = f_t(x(t), e(t)),    x(0) = x0,    (5.2)

for some sequence of functions f_t : S × E → S. The instantaneous payoff of player i is a function Π^i_t : S × E → R of the state and controls, and her total payoff is given by

    V^i(x0; e(0), e(1), e(2), ..., e(T)) = Σ_{t=0}^{T} Π^i_t(x(t), e(t)),    (5.3)

for some sequence of payoff functions Π^i_t. The definition of conjectural equilibria involves the definition of conjectures and the resolution of individual optimization problems.

Conjectures.    Each player has a belief on the behavior of the other ones. More precisely, player i thinks that player j chooses her control by applying some function φ^{ij}_t to observed quantities. Several degrees of behavioral complexity are possible here. We identify some of those found in the current literature in the following definition. In the sequel, we shall use the superscript "i†" as a shorthand for "believed by player i".

Definition 5.1    The conjecture of player i about player j is a sequence of functions φ^{ij}_t, t = 0, 1, ..., T, which define the conjectured value of player j's control, e_j^{i†}(t). Depending on the situation, we may have:
    φ^{ij}_t : S → E^j,           with e_j^{i†}(t) = φ^{ij}_t(x(t))              (state-based conjectures),    (5.4)
    φ^{ij}_t : S × E → E^j,       with e_j^{i†}(t) = φ^{ij}_t(x(t), e(t − 1))    (state- and control-based conjectures),    (5.5)
    φ^{ij}_t : S × E^{−j} → E^j,  with e_j^{i†}(t) = φ^{ij}_t(x(t), e^{−j}(t))   ("complete" conjectures).    (5.6)

The first form (5.4) is the basic one, and is used by Fershtman and Kamien (1985) with differential games. The second one (5.5) is inspired by the supergame¹ model of Friedman (1968), in which the conjecture involves the last observed move of the opponents. We have generalized the idea in the definition, and we come back to the specific situation of supergames in the sequel. Conjectures that are based on the state and the control need the specification of an initial control e(−1), observed at the beginning of the game, in addition to the initial state x0.

¹ We adopt here the terminology of Myerson (1991), according to which a supergame is a dynamic game with a constant state and complete information.

The third form was also introduced by Fershtman and Kamien (1985), who termed it "complete". In discrete time, endowing players with such conjectures brings forth the problem of justifying how all players can simultaneously think that their opponents observe their moves and react, as if they were all leaders in a Stackelberg game. Indeed, the conjecture involves the quantity e^{−j}(t), that is, the controls that player j's opponents are about to choose, which are not a priori observable at the moment player i's decision is made. In that sense, this form is more in the spirit of conjectural variations. We shall see indeed in Section 2.2 that the first two forms are related to Nash-feedback equilibria, while the third is more related to static conjectural variations equilibria.

Laitner (1980) has proposed a related form in a discrete-time supergame. He assumes a conjecture of the form e_j^{i†}(t) = φ^{ij}_t(e^{−j}(t),
x^{i†}(t + 1) = f_t( x^{i†}(t), φ_t^{i1}(x^{i†}(t)), . . . , φ_t^{i,i−1}(x^{i†}(t)), e_i(t), φ_t^{i,i+1}(x^{i†}(t)), . . . , φ_t^{in}(x^{i†}(t)) ),  (5.7)

Π_t^{i†}(x, e_i) = Π_t^i( x, φ_t^{i1}(x), . . . , φ_t^{i,i−1}(x), e_i, φ_t^{i,i+1}(x), . . . , φ_t^{in}(x) ).  (5.8)

For player i, the system evolves therefore according to some dynamics of the form:

x^{i†}(t + 1) = f̃_t^i(x^{i†}(t), e_i(t)),  x^{i†}(0) = x_0.  (5.9)

If conjectures are of the form (5.5), a difficulty arises. Replacing the conjectures in the state dynamics (5.2), we obtain:

x^{i†}(t + 1) = f_t( x^{i†}(t), φ_t^{i1}(x^{i†}(t), e(t − 1)), . . . , φ_t^{i,i−1}(x^{i†}(t), e(t − 1)), e_i(t), φ_t^{i,i+1}(x^{i†}(t), e(t − 1)), . . . , φ_t^{in}(x^{i†}(t), e(t − 1)) ).

This equation involves the e_j(t − 1). Replacing them by their conjectured values φ_{t−1}^{ij}(x^{i†}(t − 1), e(t − 2)) makes the previous state x^{i†}(t − 1) appear, and still involves the unresolved quantities e_k(t − 2). Unless there are only two players, this elimination process necessitates going backwards in time until t = 0. The resulting formula for x^{i†}(t + 1) will therefore involve all previous states as well as all previous controls e_i(s), s ≤ t. Such an evolution is improper for setting up a classical control problem with Markovian dynamics.

In order to circumvent this difficulty, it is possible to define a proper control problem for player i in an enlarged state space. Indeed, define y(t) = e(t − 1). Then the previous equation rewrites as:

x^{i†}(t + 1) = f_t( x^{i†}(t), φ_t^{i1}(x^{i†}(t), y(t)), . . . , φ_t^{i,i−1}(x^{i†}(t), y(t)), e_i(t), φ_t^{i,i+1}(x^{i†}(t), y(t)), . . . , φ_t^{in}(x^{i†}(t), y(t)) ),  (5.10)

y_j(t + 1) = φ_t^{ij}(x^{i†}(t), y(t)),  j ≠ i,  (5.11)
y_i(t + 1) = e_i(t).  (5.12)

The initial conditions are x(0) = x_0 and y(0) = e_{−1}. Similarly, the conjectured cost function can be written as:

Π_t^{i†}(x, y, e_i) = Π_t^i( x, φ_t^{i1}(x, y), . . . , φ_t^{i,i−1}(x, y), e_i, φ_t^{i,i+1}(x, y), . . . , φ_t^{in}(x, y) ).

With this cost function and the state dynamics (5.10)–(5.12), player i faces a well-defined control problem.

Consistency. In general terms, consistency of a conjectural mechanism is the requirement that the outcome of each player's individual optimization problem corresponds to what has been conjectured about her. But different interpretations of this general rule are possible, depending on what kind of "outcome" is selected.

It is well known that solving a deterministic dynamic control problem can be done in an "open-loop" or in a "feedback" perspective. In the first case, player i will obtain an optimal action profile {e_i^{i†}(t), t = 0, . . . , T}, assumed unique. In the second case, player i will obtain an optimal feedback {γ_t^{i†}(·), t = 0, . . . , T}, where each γ_t^{i†} is a function from the state space into the action space E^i.

Select first the open-loop point of view. Starting from the computed optimal profile e_i^{i†}(t), player i deduces the conjectured actions of her opponents using her conjecture scheme φ_t^{ij}. She therefore obtains {e_j^{i†}(t)}, j ≠ i. Replacing in turn these values in the dynamics, she obtains a conjectured state path {x^{i†}(t)}.

If all players actually implement their decision rules e_j^{j†}, the evolution of the state will follow the real dynamics (5.2), and result in some actual trajectory {x^a(t)}. Specifically, the actual evolution of the state is:
x^a(t + 1) = f_t(x^a(t), e_1^{1†}(t), . . . , e_n^{n†}(t)),  x^a(0) = x_0.  (5.13)

Players will observe a discrepancy with their beliefs unless the actual path coincides with their conjectured path. If it does, no player will have a reason to challenge her conjectures and deviate from the "optimal" control she has computed. This leads to the following definition of a state-consistent equilibrium. Denote by φ_t^i the vector of functions (φ_t^{i1}, . . . , φ_t^{i,i−1}, φ_t^{i,i+1}, . . . , φ_t^{in}).

Definition 5.2 (State-consistent conjectural equilibrium) The vector of conjectures (φ_t^1, . . . , φ_t^n) is a state-consistent conjectural equilibrium if

x^{i†}(t) = x^a(t),  (5.14)

for all i and t, and for all initial states x_0.

An alternative definition, proposed by Fershtman and Kamien (1985), requires the coincidence of control paths, given the initial condition of the state:

Definition 5.3 (Weak control-consistent conjectural equilibrium) The vector of conjectures (φ_t^1, . . . , φ_t^n) is a weak control-consistent conjectural equilibrium if

e_j^{i†}(t) = e_j^{j†}(t),  (5.15)
for all i ≠ j and t, given the initial state x_0.

The stronger notion proposed by Fershtman and Kamien (1985) (where it is termed the "perfect" equilibrium) requires the coincidence for all possible initial states:

Definition 5.4 (Control-consistent conjectural equilibrium) The vector of conjectures (φ_t^1, . . . , φ_t^n) is a control-consistent conjectural equilibrium if the coincidence of controls (5.15) holds for all i ≠ j, all t, and all initial states x_0.
Clearly, any control-consistent conjectural equilibrium is state-consistent, provided the dynamics (5.2) have a unique solution. It is of course possible to define a weak state-consistent equilibrium, where coincidence of trajectories is required only for some particular initial value. This concept does not seem to be used in the existing literature.

Now, consider that the solution of the deterministic control problems is expressed as a state feedback. Accordingly, when solving her (conjectured) optimization problem, player i concludes that there exists a sequence of functions γ_t^{i†} : S → E^i such that: e_i^{i†}(t) = γ_t^{i†}(x(t)). Consistency can then be defined as the requirement that optimal feedbacks coincide with conjectures.

Definition 5.5 (Feedback-consistent conjectural equilibrium) The vector of state-based conjectures (φ_t^1, . . . , φ_t^n) is a feedback-consistent conjectural equilibrium if γ_t^{i†} = φ_t^{ji} for all j ≠ i and all t.
Obviously, consistency in this sense implies that the conjectures of two different players about some third player i coincide:

φ_t^{ji} = φ_t^{ki},  ∀ j ≠ i, k ≠ i,  ∀t.  (5.16)

In addition, if the time horizon T is infinite, and if there exists a stationary feedback γ_∞^{i†}, then a conjecture which is consistent with this stationary feedback should coincide with it at any time. This implies that the conjecture does not vary over time. For a simple equilibrium in the sense of Definition 5.2, none of these requirements is necessary a priori. It may happen that trajectories coincide in a "casual" way, resulting from discrepant conjectures of the different players.

2.2 Feedback consistency and Nash-feedback equilibria

We now turn to the principal result establishing the link between conjectural equilibria and the classical Nash-feedback equilibria of dynamic games. We assume in this section that T is finite.

Theorem 5.1 Consider a game with state-based conjectures φ_t^{ji} : S → E^i. In such a game, feedback-consistent equilibria and Nash-feedback equilibria coincide.
Proof. The proof consists in identifying the value functions of both problems. According to the description above, looking for a feedback-consistent conjectural equilibrium involves the solution of the control problem:
max_{e_i(·)} ∑_{t=0}^{T} Π_t^{i†}(x^{i†}(t), e_i(t)),

with the state evolution (5.9):

x^{i†}(t + 1) = f̃_t^i(x^{i†}(t), e_i(t)),  x^{i†}(0) = x_0,

where f̃_t^i and Π_t^{i†} have been defined by Equations (5.7)–(5.9). The optimal feedback control in this case is given by the solution of the dynamic programming equation:

W_{t−1}^i(x) = max_{e_i ∈ E^i} [ Π_t^{i†}(x, e_i) + W_t^i( f̃_t^i(x, e_i) ) ],

which defines recursively the sequence of value functions W_t^i(·), starting with W_{T+1}^i ≡ 0.

Consider now the Nash-feedback equilibria (NFBE) of the dynamic game: for each player i, maximize

∑_{t=0}^{T} Π_t^i(x(t), e(t))

with the state dynamics (5.2):

x(t + 1) = f_t(x(t), e(t)),  x(0) = x_0.

According to Theorem 6.6 of Başar and Olsder (1999), the set of feedback strategies {(γ_t^{1∗}(·), . . . , γ_t^{n∗}(·)), t = 0, . . . , T}, where each γ_t^{i∗} is a function from S to E^i, is a NFBE if and only if there exists a sequence of functions defined recursively by:
V_{t−1}^i(x) = max_{e_i ∈ E^i} [ Π_t^i( x, γ_t^{1∗}(x), . . . , e_i, . . . , γ_t^{n∗}(x) ) + V_t^i( f̂_t^i(x, e_i) ) ],  (5.17)

with V_{T+1}^i ≡ 0, and where

f̂_t^i(x, e_i) = f_t( x, γ_t^{1∗}(x), . . . , γ_t^{(i−1)∗}(x), e_i, γ_t^{(i+1)∗}(x), . . . , γ_t^{n∗}(x) ).  (5.18)

Now, replacing γ_t^{j∗} by φ_t^{ij} in Equations (5.17) and (5.18), we see that the dynamics f̃_t^i and f̂_t^i coincide, as well as the sequences of value functions W_t^i and V_t^i. This means that every NFBE will provide a feedback-consistent system of conjectures. Conversely, if feedback-consistent conjectures φ_t^{ij} are found, then φ_t^{ki}(x) will solve the dynamic programming equation (5.17) in which γ_t^{j∗} is set to φ_t^{kj} (recall that in a feedback-consistent system of conjectures, the functions φ_t^{ij} are actually independent from i). Therefore, such a conjecture will be a NFBE. □

Using the same arguments, we obtain a similar result for state- and control-based conjectures.

Theorem 5.2 Consider a game with state- and control-based conjectures φ_t^{ij} : S × E → E^j. The feedback-consistent equilibria of this game coincide with the Nash-feedback equilibria of the game with extended state space S × E and dynamics defined in Equations (5.10)–(5.12).
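Theorem 5.1 suggests a constructive route: to obtain a feedback-consistent system of state-based conjectures, one can simply run the backward dynamic-programming recursion (5.17) of a Nash-feedback equilibrium. Here is a minimal sketch for a symmetric linear-quadratic example of our own devising (scalar state, dynamics x(t+1) = x + e1 + e2, stage payoff −(x² + e_i²); none of these parameters come from the chapter):

```python
# Backward recursion in the spirit of (5.17) for a toy symmetric
# linear-quadratic game: x(t+1) = x + e1 + e2, stage payoff of player i
# equal to -(x^2 + ei^2).  Model and parameters are illustrative.

T = 10
a = 0.0            # value function V_t^i(x) = -a_t * x^2, with a_{T+1} = 0
coeffs = []        # feedback coefficients: gamma_t^*(x) = c_t * x

for t in range(T, -1, -1):
    # Player i maximizes -(x^2 + ei^2) - a*(x + ei + c*x)^2 given the
    # opponent's feedback c*x.  The first-order condition, combined
    # with the symmetry ei = c*x, gives c*(1 + 2a) = -a.
    c = -a / (1.0 + 2.0 * a)
    # Substituting back yields the value-coefficient update:
    # a_{t-1} = 1 + c^2 + a*(1 + 2c)^2.
    a = 1.0 + c * c + a * (1.0 + 2.0 * c) ** 2
    coeffs.append(c)

coeffs.reverse()   # coeffs[t] is now c_t, for t = 0, ..., T
print(coeffs[0], coeffs[-1])
```

By Theorem 5.1, the resulting γ_t^*(x) = c_t x is at once the NFBE feedback and the state-based conjecture φ_t^{ji}(x) = c_t x that every player may consistently hold about the others.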
Let us now turn to complete conjectures of the form (5.6). As in the case of state- and control-based conjectures, using the conjectures does not allow player i to formulate a control problem, unless there are only two players. We therefore assume n = 2. The optimization problem of player i is then, with j ≠ i,
max_{e_i(·)} ∑_{t=0}^{T} Π_t^i( x(t), e_i(t), φ_t^{ij}(x(t), e_i(t)) ),

with the conjectured evolution of the state:

x(t + 1) = f_t( x(t), e_i(t), φ_t^{ij}(x(t), e_i(t)) ),  x(0) = x_0.

Accordingly, the optimal reaction satisfies the following necessary conditions (see Theorem 5.5 of Başar and Olsder (1999)):

Theorem 5.3 Consider a two-player game, with complete conjectures φ_t^{ij} which are differentiable. Let e_i^{i†}(t) be the conjectured optimal control path of player i, and

x^{i†}(t + 1) = f_t( x^{i†}(t), e_i^{i†}(t), φ_t^{ij}(x^{i†}(t), e_i^{i†}(t)) ),  x^{i†}(0) = x_0

be the conjectured optimal state path. Then there exists for each player a sequence p^i(t) of costate vectors such that:

p^i(t) = ∂Π_t^i/∂x + (∂Π_t^i/∂e_j)(∂φ_t^{ij}/∂x) + (p^i(t + 1))′ · [ ∂f_t/∂x + (∂f_t/∂e_j)(∂φ_t^{ij}/∂x) ],

with p^i(T + 1) = 0, where the functions f_t and Π_t^i are evaluated at (x^{i†}(t), e_i^{i†}(t), φ_t^{ij}(x^{i†}(t), e_i^{i†}(t))) and φ_t^{ij} is evaluated at (x^{i†}(t), e_i^{i†}(t)). Also, for all i ≠ j,

e_i^{i†}(t) ∈ arg max_{e_i ∈ E^i} [ Π_t^i( x^{i†}(t), e_i, φ_t^{ij}(x^{i†}(t), e_i) ) + (p^i(t + 1))′ · f_t( x^{i†}(t), e_i, φ_t^{ij}(x^{i†}(t), e_i) ) ].

For this type of conjectures, several remarks arise. First, the notions of consistency which can be appropriate are state consistency or control consistency. Clearly, from the necessary conditions above, computing consistent equilibria will be more complicated than for state-based conjectures. Next, the first-order conditions of the maximization problem are:

0 = ∂Π_t^i/∂e_i + (∂Π_t^i/∂e_j)(∂φ_t^{ij}/∂e_i) + (p^i(t + 1))′ · [ ∂f_t/∂e_i + (∂f_t/∂e_j)(∂φ_t^{ij}/∂e_i) ].

One recognizes in the first term of the right-hand side the formula for the conjectural variations equilibrium of a static game, see Equation (5.1).

Observe also that when the conjecture φ_t^{ij} is just a function of the state, we are back to state-based conjectures, and the consistency conditions obtained with the theorem above are those of a Nash-feedback equilibrium. This was observed by Fershtman and Kamien (1985) for differential games.

Finally, if there are more than two players holding complete conjectures, the decision problem at each instant in time is not a collection of individual control problems, but a game. We are not aware of results in the literature concerning this situation.

3. Consistent conjectures in differential games

The model of Fershtman and Kamien (1985) is a continuous-time, finite-horizon game. With respect to the general framework of the previous section, the equation of the dynamics (5.2) becomes ẋ(t) = f_t(x(t), e(t)), and the total payoff is:
∫_0^T Π_t^i(x(t), e(t)) dt.

Players have conjectures of the form (5.4), or "complete conjectures" of the form (5.6). Conjectures are assumed to be the same for all players (Condition (5.16)). Classically, the definition of a dynamic game must specify the space of strategies with which players can construct their action e_i(t) at time t. The information potentially available being the initial state x_0 and the current state x(t), three classes of strategies are considered by Fershtman and Kamien: i) closed-loop no-memory strategies, where e_i(t) = ψ^i(x_0, x, t); ii) feedback Nash strategies, where e_i(t) = ψ^i(x, t); and iii) open-loop strategies, where e_i(t) = ψ^i(x_0, t).

The first concept of consistent conjectural equilibrium studied is the one of Definition 5.3 (weak control-consistent equilibrium). The following results are then obtained:

Open-loop Nash equilibria are weak control-consistent conjectural equilibria.

Weak control-consistent conjectural equilibria are closed-loop no-memory equilibria.

In other words, the class of weak control-consistent conjectural equilibria is situated between open-loop and closed-loop no-memory equilibria. Fershtman and Kamien further define perfect conjectural equilibria as in Definition 5.4: control-consistent equilibria. The result is then:

Control-consistent conjectural equilibria and feedback Nash equilibria coincide.

Further results of the paper include the statement of the problem of calculating complete conjectural equilibria (defined as Definition 5.3 with conjectures of the form (5.6)). The particular case of a duopoly market is studied. The price is the state variable x(t) of this model; it evolves according to a differential equation depending on the quantities (e_1(t), e_2(t)) produced by both firms. The complete conjectures have here the form: φ^{ij}(x; e_j). The feedback Nash equilibrium is computed, as well as the complete conjectural equilibrium with affine conjectures.
The two equilibria coincide when conjectures are actually constant. When ∂φij (x; ej )/∂ej = 1, the stationary price is the monopoly price. 4. Consistent conjectures for supergames In this section, we consider the problem set in Friedman (1968) (see also Friedman (1976) and Friedman (1977), Chapter 5). The model is a discretetime, inﬁnitehorizon supergame with n players. The total payoﬀ of player i has the form:
V^i(x_0; e(0), e(1), . . . ) = ∑_{t=0}^{∞} θ_i^t Π^i(e(t)),

for some discount factor θ_i, and where e(t) ∈ E is the profile of strategies played at time t. Friedman assumes that players have (time-independent) conjectures of the form e_j(t + 1) = φ^{ij}(e(t)), and proposes the following notion of equilibrium. Given this conjecture, player i is faced with an infinite-horizon control problem, and when solving it, she hopefully ends up with a stationary feedback policy γ^i : E → E^i. It can be seen as a reaction function to the vector of conjectures φ^i. This gives the name to the equilibrium advocated in Friedman (1968):

Definition 5.6 (Reaction function equilibrium) The vector of conjectures (φ^1, . . . , φ^n) is a reaction function equilibrium if
γ^i = φ^{ki},  ∀i, ∀k ≠ i.

In the terminology we have introduced in Section 2.1, the conjectures are "state- and control-based" (Definition 5.1). Applying Theorem 5.2 to this special situation, where the state space is reduced to a single element, we have:

Theorem 5.4 Reaction function equilibria coincide with the stationary Nash-feedback equilibria of the game described by the dynamics:

x(t) = e(t − 1),

and the payoff functions: Π_0^i ≡ 0 and

Π_t^i(x, e) = θ_i^{t−1} Π^i(x),  t ≥ 1.

In the process of finding reaction function equilibria, Friedman introduces a refinement. He suggests to solve the control problem with a finite horizon T, and then let T tend to infinity to obtain a stationary optimal feedback control. If the finite-horizon solutions converge to a stationary feedback control, this has the effect of selecting certain solutions among the possible solutions of the infinite-horizon problem.

Once the stationary feedbacks γ^i are computed, the problem is to match them with the conjectures φ^{ji}. No concrete example of such an equilibrium is known so far in the literature, except for the obvious one consisting in the repetition of the Nash equilibrium of the static game. Indeed, we have:

Theorem 5.5 (The repeated static Nash equilibrium is a reaction function equilibrium) Assume there exists a unique Nash equilibrium (e_1^N, . . . , e_n^N) for the one-stage (static) game. If some player i conjectures that the other players will play the strategies e_{−i}^N at each stage, then her own optimal response is unique and is to play e_i^N at each stage.
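Theorem 5.5 can be checked numerically on a standard Cournot duopoly. The following sketch is our own illustration (inverse demand 1 − e_1 − e_2 and zero costs; none of these numbers come from the chapter):

```python
# Numerical illustration of Theorem 5.5 on a Cournot duopoly with
# inverse demand 1 - q1 - q2 and zero costs (our own toy example).
# Stage payoff: Pi(ei, ej) = ei * (1 - ei - ej).

def best_response(ej):
    # argmax over ei of ei * (1 - ei - ej)  =>  ei = (1 - ej) / 2
    return (1.0 - ej) / 2.0

# Static Nash equilibrium by iterating best responses (fixed point 1/3).
e = 0.0
for _ in range(100):
    e = best_response(e)
eN = e

# If player i conjectures e_{-i}(t) = eN for all t, her discounted
# problem separates across stages, so the optimal control repeats the
# one-stage best response to eN -- which is eN itself.
repeated_optimal = best_response(eN)
print(abs(repeated_optimal - eN) < 1e-9)   # True: play Nash repeatedly
```

Because the conjecture freezes the opponents at e_{−i}^N, the discounted objective decomposes stage by stage, so repeating the one-stage best response, that is, the static Nash strategy, is optimal for any discount factor θ_i ∈ (0, 1).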
Proof. Let (e_i^N, e_{−i}^N) denote the unique Nash equilibrium of the static game. Since player i assumes that her opponents systematically play e_{−i}^N, we have e_{−i}(t) = e_{−i}^N for all t. Therefore, her perceived optimization problem is:

max_{e_i(0), e_i(1), . . . } ∑_{t=0}^{∞} θ_i^t Π^i( e_i(t), e_{−i}^N ).

Since e_i^N is the best response to e_{−i}^N, the optimal control of player i is e_i(t) = e_i^N for all t ≥ 0. In other words, player i should respond to her "Nash conjecture" by playing Nash repeatedly. □

Based on Theorem 5.4 and tools of optimal control and games, it is possible to develop the calculation of Friedman's reaction function equilibria in the case of linear-quadratic games. Even for such simple games, finding Nash-feedback equilibria usually involves the solution of algebraic Riccati equations, which cannot always be done in closed form. However, the games we have here have a special form, due to their simplified dynamics.

The detailed analysis, reported in Figuières et al. (2004), proceeds in several steps. First, we consider a finite time horizon game with stationary affine conjectures, and we construct the optimal control of each player. We obtain that the optimal control is also affine, and that its multiplicative coefficients are always proportional to those of the conjecture. We deduce conditions for the existence of a consistent equilibrium in the sense of Definition 5.5, in the case of symmetric players. Those conditions bear on the sign of quantities involving the coefficients of the conjectures and the parameters of the model. Applied to Cournot's and Bertrand's duopoly models, we demonstrate that the repeated-Nash strategy of Theorem 5.5 is the unique reaction function equilibrium of the game. We thereby answer Friedman's question about the multiplicity of such equilibria in the duopoly.

5. Conclusion

In this paper, we have put together several concepts of consistent conjectural equilibria in a dynamic setting, collected from different papers. This allows us to draw a number of perspectives. First, finding in which class of dynamic games the different definitions of Section 2.1 coincide is an interesting research direction.
As we have observed above, an equilibrium according to Deﬁnition 5.4 (controlconsistent) is an equilibrium according to Deﬁnition 5.2 (stateconsistent). We feel however that when conjectures are of the simple form (5.4), it may be that actions of the players are not observable, so that players should be happy with the coincidence of state paths. Stateconsistency is then the natural notion. The stronger controlconsistency is however appropriate when conjectures are of the “complete” form. We have further observed that for repeated games, Deﬁnitions 5.2 and 5.4 coincide. The problem disappears when consistency in feedback is considered, since the requirements of feedbackconsistency (Deﬁnition 5.5) imply the coincidence of conjectures and actual values for both controls and states. Indeed, if the feedback functions of diﬀerent players coincide, their conjectured state paths and control paths will coincide, since the initial state x0 is common knowledge. Another issue is that of the information available to optimizing agents. In the models of Section 2, agents do not know the payoﬀ functions of their opponents when they compute their optimal control, based on their own conjectures. Computing an equilibrium requires however the complete knowledge of the payoﬀs. This is not in accordance with the idea that players hold conjectures in order to compensate for the lack of information. On the other hand, verifying that a conjecture is consistent requires less information. Weak controlconsistency (and the weak stateconsistency that could have been deﬁned in the same spirit) is veriﬁed by the observation of the equilibrium path. A possibility in this case is to develop learning models such as in JeanMarie and Tidball (2004). The stronger stateconsistency, controlconsistency and feedbackconsistency 108 DYNAMIC GAMES: THEORY AND APPLICATIONS can be checked by computations based on the knowledge of one’s payoﬀ function and the conjectures of the opponents. 
Finally, we point out that other interesting dynamic game models with conjectures and/or consistency have been left out of this survey. We have already mentioned the work of Laitner (1980). Other ideas are possible: for instance, it is possible to assume, as in Başar, Turnovsky and d'Orey (1986), that players consider the game as a static conjectural variations game at each instant in time. Consistency in the sense of Bresnahan is then used. Also related is the paper of Kalai and Stanford (1985), in which a model similar to that of Friedman is analyzed. These papers illustrate the fact that some types of conjectures may lead to a multiplicity of consistent equilibria.

References
Başar, T. and Olsder, G.J. (1999). Dynamic Noncooperative Game Theory. Second Ed., SIAM, Philadelphia.
Başar, T., Turnovsky, S.J., and d'Orey, V. (1986). Optimal strategic monetary policies in dynamic interdependent economies. Springer Lecture Notes on Economics and Management Science, 265:134–178.
Bowley, A.L. (1924). The Mathematical Groundwork of Economics. Oxford University Press, Oxford.
Bresnahan, T.F. (1981). Duopoly models with consistent conjectures. American Economic Review, 71(5):934–945.
Cabral, L.M.B. (1995). Conjectural variations as a reduced form. Economics Letters, 49:397–402.
Dockner, E.J. (1992). A dynamic theory of conjectural variations. The Journal of Industrial Economics, XL(4):377–395.
Driskill, R.A. and McCafferty, S. (1989). Dynamic duopoly with adjustment costs: A differential game approach. Journal of Economic Theory, 49:324–338.
Fershtman, C. and Kamien, M.I. (1985). Conjectural equilibrium and strategy spaces in differential games. Optimal Control Theory and Economic Analysis, 2:569–579.
Figuières, C. (2002). Complementarity, substitutability and the strategic accumulation of capital. International Game Theory Review, 4(4):371–390.
Figuières, C., Jean-Marie, A., Quérou, N., and Tidball, M. (2004). The Theory of Conjectural Variations. World Scientific Publishing.
Friedman, J.W. (1968). Reaction functions and the theory of duopoly. Review of Economic Studies, 35:257–272.
Friedman, J.W. (1976). Reaction functions as Nash equilibria. Journal of Economic Studies, 43:83–90.
Friedman, J.W. (1977). Oligopoly and the Theory of Games. North-Holland, Amsterdam.
Frisch, R. (1933). Monopole, Polypole – La notion de force en économie. Nationaløkonomisk Tidsskrift, 71:241–259. (Reprinted as "Monopoly, Polypoly: The concept of force in the economy", International Economic Papers, 1:23–36, 1951.)
Itaya, J.I. and Okamura, M. (2003).
Conjectural variations and voluntary public good provision in a repeated game setting. Journal of Public Economic Theory, 5(1):51–66.
Itaya, J.I. and Shimomura, K. (2001). A dynamic conjectural variations model in the private provision of public goods: A differential game approach. Journal of Public Economics, 81:153–172.
Jean-Marie, A. and Tidball, M. (2004). Adapting behaviors through a learning process. To appear in Journal of Economic Behavior and Organization.
Kalai, E. and Stanford, W. (1985). Conjectural variations strategies in accelerated Cournot games. Intl. Journal of Industrial Organization, 3:133–152.
Laitner, J. (1980). Rational duopoly equilibria. The Quarterly Journal of Economics, 641–662.
Myerson, R.B. (1991). Game Theory. Harvard University Press.
Wildasin, D.E. (1991). Some rudimentary 'duopolity theory'. Regional Science and Urban Economics, 21:393–421.

Chapter 6

COOPERATIVE DYNAMIC GAMES WITH INCOMPLETE INFORMATION
Leon A. Petrosjan
Abstract The definition of a cooperative game in characteristic function form with incomplete information on a game tree is given. The notion of an optimality principle and the solution concepts based on it are introduced. A new concept of "imputation distribution procedure" is defined, connected with the basic definitions of time-consistency and strong time-consistency. Sufficient conditions for the existence of time-consistent solutions are derived. For a large class of games where these conditions cannot be satisfied, a regularization procedure is developed and a new characteristic function is constructed. The "regularized" core is defined and its strong time-consistency is proved. The special case of stochastic games is also investigated in detail.

1. Introduction

In n-person games in extensive form, as in classical simultaneous game theory, different solution concepts are used. The most common approach is a noncooperative setting, where the Nash equilibrium is considered as the solution. At the same time, not much attention is given to the problem of time-consistency of the solution considered in each specific case. This may follow from the fact that in most cases the Nash equilibrium turns out to be time-consistent, though not always, as was shown in Petrosjan (1996).

The problem becomes more serious when cooperation in games in extensive form is considered. Usually in cooperative settings players agree to use strategies which maximize the sum of their payoffs. As a result, the game then develops along the cooperative trajectory (conditionally optimal trajectory). The corresponding maximal total payoff satisfies Bellman's equation and thus is time-consistent. But the values of the characteristic function for each subcoalition of players naturally do not satisfy this property along the conditionally optimal trajectory. The characteristic function plays a key role in the construction of solution concepts in cooperative game theory.
The impossibility of satisfying Bellman's equation for the values of the characteristic function for subcoalitions implies the time-inconsistency of cooperative solution concepts. This was first seen for n-person differential games in Filar and Petrosjan (2000), Haurie (1975), and Kaitala and Pohjola (1988); in Petrosjan (1995), Petrosjan (1993), and Petrosjan and Danilov (1985) it was proposed to introduce a special rule for distributing the players' gain under cooperative behavior over the time interval, in such a way that time-consistency of the solution could be restored in a given sense.

In this paper we formalize the notions of time-consistency and strong time-consistency for cooperative games in extensive form with incomplete information, and propose a regularization method which makes it possible to restore classical simultaneous solution concepts in a way that makes them useful in games in extensive form. We prove theorems concerning the strong time-consistency of regularized solutions and give a constructive method for computing such solutions.

2. Definition of the multistage game with incomplete information in characteristic function form

To define the multistage cooperative game (in characteristic function form) with incomplete information, we have first to define the multistage game in extensive form. In this definition we follow H. Kuhn (1953), with the only difference that we shall not allow chance moves, and the payoffs of the players will be defined at each vertex of the game tree.

Definition 6.1 The n-person multistage game in extensive form is defined by
1. Specifying the finite graph tree G = (X, F) with initial vertex x_0, referred to as the initial position of the game (here X is the set of vertices and F : X → 2^X is a point-to-set mapping; let F_x = F(x)).

2. A partition of the set of all vertices X into n + 1 sets X_1, X_2, . . ., X_n, X_{n+1}, called the players partition, where X_i is interpreted as the set of vertices (positions) where player i "makes a move", i = 1, . . . , n, and X_{n+1} = {x : F_x = ∅} is called the set of final positions.

3. For each x ∈ X, specifying the vector function h(x) = (h_1(x), . . . , h_n(x)), h_i(x) ≥ 0, i = 1, . . . , n; the function h_i(x) is called the instantaneous payoff of the ith player.

4. A subpartition of each set X_i, i = 1, . . . , n, into nonoverlapping subsets X_i^j, referred to as information sets of the ith player. In this case, for any positions of one and the same information set, the sets of their subsequent vertices should contain one and the same number of vertices, i.e., for any x, y ∈ X_i^j, |F_x| = |F_y| (|F_x| is the number of elements of the set F_x), and no vertex of an information set should follow another vertex of this set, i.e., if x ∈ X_i^j then there is no other vertex y ∈ X_i^j such that

y ∈ F̂_x = {x ∪ F_x ∪ F_x^2 ∪ · · · ∪ F_x^r ∪ · · · }  (6.1)

(here F_x^k is defined by induction: F_x^1 = F_x, F_x^k = F(F_x^{k−1}) = ∪_{y ∈ F_x^{k−1}} F_y).

The conceptual meaning of the informational partition is that when a player makes his move in a position x ∈ X_i under incomplete information, he does not know the position x itself, but knows that this position is in a certain set X_i^j ⊂ X_i (x ∈ X_i^j).

Some restrictions are imposed by Condition 4 on the players' information sets. The requirement |F_x| = |F_y| for any two vertices of the same information set is introduced to make the vertices x, y ∈ X_i^j indistinguishable. In fact, with |F_x| ≠ |F_y|, player i could distinguish between the vertices x, y ∈ X_i^j by the number of arcs emanating therefrom. If one information set could have two vertices x, y such that y ∈ F̂_x, this would mean that a play of the game could intersect an information set twice; but this in turn is equivalent to the fact that the player has no memory of the number of moves he made before the given stage, which can hardly be conceived in the actual play of the game.

Denote the multistage game in extensive form starting from the vertex x_0 ∈ X by Γ(x_0). For the purpose of further discussion we need to introduce some additional notions.

Definition 6.2 The arcs incidental with x, i.e., {(x, y) : y ∈ F_x}, are called alternatives at the vertex x ∈ X.
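The objects of Definitions 6.1 and 6.2 — the tree (X, F), the players partition, instantaneous payoffs, information sets, and alternatives — can be represented quite directly. The following sketch is a toy two-player tree of our own (all vertex names, payoffs, and information sets are illustrative); it also anticipates the pure strategies and the payoff sum (6.2) introduced below:

```python
# Toy instance of Definitions 6.1-6.2: a two-player tree (all data are
# our own illustration).  F maps each vertex to its ordered list of
# successors (the alternatives); final positions map to [].

F = {"x0": ["a", "b"], "a": ["c", "d"], "b": ["e", "f"],
     "c": [], "d": [], "e": [], "f": []}
h = {"x0": (0, 0), "a": (1, 0), "b": (0, 1),
     "c": (2, 0), "d": (0, 3), "e": (1, 1), "f": (4, 0)}  # h(x) = (h1, h2)
player = {"x0": 1, "a": 2, "b": 2}      # who moves at each nonfinal vertex
# Player 2 cannot distinguish a from b: they form one information set.
info_sets = {1: [("x0",)], 2: [("a", "b")]}

def play(u):
    """Follow the path generated by the pure strategies u[i] (maps from
    information sets to 1-based alternative numbers) and sum the
    instantaneous payoffs along it, as in (6.2)."""
    x, path = "x0", ["x0"]
    while F[x]:
        i = player[x]
        Xij = next(s for s in info_sets[i] if x in s)
        x = F[x][u[i][Xij] - 1]         # follow the chosen alternative
        path.append(x)
    K = tuple(sum(h[x][k] for x in path) for k in range(2))
    return path, K

# Player 2's strategy must choose the same alternative at a and at b,
# since both vertices lie in a single information set.
path, K = play({1: {("x0",): 1}, 2: {("a", "b"): 2}})
print(path, K)   # ['x0', 'a', 'd'] (1, 3)
```

Note how the information-set structure is exactly what forces player 2's choice to be measurable with respect to what she can observe: the strategy assigns one alternative number per information set, not per vertex.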
If Fx  = k then there are k alternatives at vertex x. We assume that if at the vertex x there are k alternatives then they are designated by integers 1, . . . , k with the vertex x bypassed in clockwise sense. The ﬁrst alternative at the vertex x0 is indicated in an arbitrary way. If some vertex x = x0 is bypassed in a clockwise sense, then an alternative which − follows a single arc (Fx 1 , x) entering into x is called the ﬁrst alternative at x. Suppose that in the game all alternatives are enumerated as above. Let Ak be the set of all vertices x ∈ X having exactly k alternatives, i.e., Ak = {x : Fx  = k }. 114 DYNAMIC GAMES: THEORY AND APPLICATIONS Let Ii = {Xij : Xij ⊂ Xi } be the set of all information sets of player i. By deﬁnition the pure strategy of player i means the function ui mapping Ii into the set of positive numbers so that ui (Xij ) ≤ k if Xij ⊂ Ak . We say that the strategy ui chooses alternative l in position x ∈ Xij if ui (Xij ) = l, where l is the number of the alternative. One may see that to each ntuple in pure strategies u(·) = (u1 (·), . . . , un (·)) uniquely corresponds a path (trajectory) w = x0 , . . . , xl , xl ∈ Xn+1 , and hence the payoﬀ Ki (x0 ; u1 (·), . . . , un (·)) = hi ≥ 0.
    K_i(x_0; u_1(·), …, u_n(·)) = \sum_{k=0}^{l} h_i(x_k),        (6.2)

    h_i(x) ≥ 0.        (6.3)

Here x_l ∈ X_{n+1} is a final position (vertex) and w = {x_0, x_1, …, x_l} is the only path (F is a tree) leading from x_0 to x_l. The condition that the position (vertex) y belongs to w will be written as y ∈ w.

Consider the cooperative form of the game Γ(x_0). In this formalization we suppose that, before starting the game, the players agree to play u_1^*, …, u_n^* such that the corresponding path (trajectory) w^* = {x_0^*, …, x_k^*, …, x_l^*} (x_l^* ∈ X_{n+1}) maximizes the sum of the payoffs:

    \max_u \sum_{i=1}^{n} K_i(x_0; u_1(·), …, u_n(·)) = \sum_{i=1}^{n} K_i(x_0; u_1^*(·), …, u_n^*(·)) = \sum_{i=1}^{n} \sum_{k=0}^{l} h_i(x_k^*) = v(N; x_0),

where x_0 is the initial vertex of the game Γ(x_0) and N is the set of all players in Γ(x_0). The trajectory w^* is called conditionally optimal.

To define the cooperative game one has to introduce the characteristic function. The value of the characteristic function for each coalition is defined in the classical way as the value of an associated zero-sum game. Consider a zero-sum game defined over the structure of the game Γ(x_0), with the coalition S as first player and the coalition N \ S as second player, and suppose that the payoff of S is equal to the sum of the payoffs of the players from S. Denote this game by Γ_S(x_0) and let v(S; x_0) be its value. The characteristic function is defined for each S ⊂ N as the value v(S; x_0) of Γ_S(x_0). From the definition of v(S; x_0) it follows that v is superadditive (see Owen (1968)). It follows from the superadditivity condition that it is advantageous for the players to form the maximal coalition N and obtain the maximal total payoff v(N; x_0) that is possible in the game. Indeed, the quantity v(S; x_0) (S ≠ N) is equal to the maximal guaranteed payoff of the coalition S, obtained irrespective of the behavior of the other players, even if the others form a coalition N \ S against S.

6 Cooperative Dynamic Games with Incomplete Information 115

Note that the positiveness of the payoff functions h_i, i = 1, …, n, implies that of the characteristic function. From the superadditivity of v it follows that v(S′; x_0) ≥ v(S; x_0) for any S, S′ ⊂ N such that S ⊂ S′, i.e., the superadditivity of the function v implies that v is monotone in S. The pair ⟨N, v(·; x_0)⟩, where N is the set of players and v the characteristic function, is called the cooperative game with incomplete information in characteristic function form. For short, it will be denoted by Γ_v(x_0).
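The superadditivity and monotonicity properties just stated are easy to check mechanically. The sketch below uses a hypothetical three-player characteristic function standing in for v(·; x_0); all numbers are illustrative, not taken from the text.

```python
# Check superadditivity (v(S ∪ T) >= v(S) + v(T) for disjoint S, T) and
# monotonicity (v(S) <= v(T) for S ⊆ T) of a toy characteristic function.
from itertools import combinations

N = frozenset({1, 2, 3})
v = {frozenset(): 0, frozenset({1}): 1, frozenset({2}): 1, frozenset({3}): 2,
     frozenset({1, 2}): 3, frozenset({1, 3}): 4, frozenset({2, 3}): 4,
     N: 7}

def all_subsets(N):
    return [frozenset(c) for r in range(len(N) + 1) for c in combinations(N, r)]

def is_superadditive(v, N):
    subs = all_subsets(N)
    return all(v[S | T] >= v[S] + v[T]
               for S in subs for T in subs if not (S & T))

def is_monotone(v, N):
    subs = all_subsets(N)
    return all(v[S] <= v[T] for S in subs for T in subs if S <= T)

assert is_superadditive(v, N)
assert is_monotone(v, N)   # implied by superadditivity together with v >= 0
```

As the chapter notes, with nonnegative stage payoffs the monotonicity check is redundant once superadditivity holds; it is included here only to make the implication visible on the toy data.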
Various methods of "equitable" allocation of the total profit among the players are treated as solutions of cooperative games. The set of allocations satisfying an optimality principle is called a solution of the cooperative game (in the sense of this optimality principle). We will now define solutions of the game Γ_v(x_0). Denote by ξ_i the share of player i ∈ N in the total gain v(N; x_0).

Definition 6.3 The vector ξ = (ξ_1, …, ξ_n), whose components satisfy the conditions

1. ξ_i ≥ v({i}; x_0), i ∈ N,
2. \sum_{i∈N} ξ_i = v(N; x_0),

is called an imputation in the game Γ_v(x_0).

Denote the set of all imputations in Γ_v(x_0) by L_v(x_0). By a solution of Γ_v(x_0) we will understand a subset W_v(x_0) ⊂ L_v(x_0) of the imputation set which satisfies additional "optimality" conditions. The equity of an allocation ξ = (ξ_1, …, ξ_n) representing an imputation lies in the fact that each player receives at least his maximal guaranteed payoff and the entire maximal total payoff is distributed without remainder.

3. Principle of time-consistency (dynamic stability)

Formalization of the notion of optimal behavior constitutes one of the fundamental problems in the theory of n-person games. At present, different solution concepts have been constructed for the various classes of games. Recall that the players' behavior (strategies in noncooperative games or imputations in cooperative games) satisfying some given optimality principle is called a solution of the game in the sense of this principle, and it must possess two properties. On the one hand, it must be feasible under the conditions of the game in which it is applied. On the other hand, it must adequately reflect the conceptual notion of optimality, accounting for the special features of the class of games for which it is defined. In dynamic games, one more requirement is naturally added to those mentioned: the purposefulness and feasibility of an optimality principle are to be preserved throughout the game. This requirement is called time-consistency (dynamic stability) of a solution of the game. Time-consistency of a solution of a dynamic game is the property that, when the game proceeds along a "conditionally optimal" trajectory, at each instant of time the players are guided by the same optimality principle, and hence have no ground for deviating from the previously adopted "optimal" behavior throughout the game.
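Conditions 1 and 2 of Definition 6.3 are immediate to verify for a candidate allocation. The sketch below uses illustrative values for v({i}; x_0) and v(N; x_0), not values from the text.

```python
# An allocation xi is an imputation iff xi_i >= v({i}; x0) for every player
# (individual rationality) and sum(xi) == v(N; x0) (collective rationality).

def is_imputation(xi, v_singletons, v_N, tol=1e-9):
    return (all(x >= vi - tol for x, vi in zip(xi, v_singletons))
            and abs(sum(xi) - v_N) <= tol)

v_singletons = [1.0, 1.0, 2.0]   # toy values v({i}; x0)
v_N = 7.0                        # toy value v(N; x0)

assert is_imputation([2.0, 2.0, 3.0], v_singletons, v_N)       # valid imputation
assert not is_imputation([0.5, 3.0, 3.5], v_singletons, v_N)   # violates condition 1
assert not is_imputation([2.0, 2.0, 2.0], v_singletons, v_N)   # does not exhaust v(N; x0)
```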
When time-consistency is violated, at some instant of time there arise conditions under which the continuation of the initial behavior becomes non-optimal, and hence the initially chosen solution proves to be unfeasible. Assume that at the start of the game the players adopt an optimality principle and construct a solution based on it (an imputation set satisfying the chosen principle of optimality, say the core, the nucleolus, the NM-solution, etc.). From the definition of the cooperative game it follows that the evolution of the game is to be along the trajectory providing the maximal total payoff for the players. When moving along this "conditionally optimal" trajectory, the players pass through subgames with current initial states and current durations. In due course, not only the conditions of the game and the players' opportunities, but even the players' interests may change. Therefore, at some stage (at some vertex y on the conditionally optimal trajectory) the initially optimal solution of the current game may fail to exist or to satisfy the players at this stage. Then, at this stage (starting from the vertex y), the players will have no ground to keep to the initially chosen "conditionally optimal" trajectory. This is exactly what is meant by the time-inconsistency of the chosen optimality principle and, as a result, by the instability of the motion itself.

We now focus our attention on time-consistent solutions in cooperative games with incomplete information. Let an optimality principle be chosen in the game Γ_v(x_0). The solution of this game constructed in the initial state x_0 on the basis of the chosen principle of optimality is denoted by W_v(x_0). The set W_v(x_0) is a subset of the imputation set L_v(x_0) in the game Γ_v(x_0). Assume that W_v(x_0) ≠ ∅. Let w^* = {x_0^*, …, x_k^*, …, x_l^*} be the conditionally optimal trajectory.
The definition suggests that along the conditionally optimal trajectory the players obtain the largest total payoff. For further consideration an important assumption is needed.

Assumption A. The n-tuple u^*(·) = (u_1^*(·), …, u_n^*(·)) and the corresponding trajectory w^* = {x_0^*, …, x_k^*, …, x_l^*} are common knowledge in Γ_v(x_0).

This assumption means that, being at a vertex x_k^* ∈ X_i, player i knows that he is in x_k^*. This changes the informational structures of the subgames Γ(x_k^*) along w^* in the following natural way. Denote by G(x_k^*) the subtree of the tree G corresponding to the subgame Γ(x_k^*) with initial vertex x_k^*. The information sets in Γ(x_k^*) coincide with the intersections G(x_k^*) ∩ X_{ij} = X_{ij}(k) for all i, j, where X_{ij} is the information set in Γ(x_0). The informational structure of Γ(x_k^*) consists of the sets X_{ij}(k), for all i, j. As before, we can define the current cooperative subgame Γ_v(x_k^*) of the subgame Γ(x_k^*).

We will now consider the behavior of the set W_v(x_0) along the conditionally optimal trajectory w^*. Towards this end, in each current state x_k^* a current subgame Γ_v(x_k^*) is defined as follows. In the state x_k^*, we define the characteristic function v(S; x_k^*) as the value of the zero-sum game Γ_S(x_k^*) between the coalitions S and N \ S from the initial state x_k^* (as was already done for the game Γ(x_0)). The current cooperative subgame Γ_v(x_k^*) is defined as ⟨N, v(S; x_k^*)⟩. The imputation set in the game Γ_v(x_k^*) is of the form

    L_v(x_k^*) = { ξ ∈ R^n | ξ_i ≥ v({i}; x_k^*), i = 1, …, n; \sum_{i∈N} ξ_i = v(N; x_k^*) },

where

    v(N; x_k^*) = v(N; x_0) − \sum_{m=0}^{k−1} \sum_{i∈N} h_i(x_m^*).

The quantity

    \sum_{m=0}^{k−1} \sum_{i∈N} h_i(x_m^*)

is interpreted as the total gain of the players on the first k steps (0, …, k − 1) when the motion is carried out along the trajectory w^*. Consider the family of current games

    {Γ_v(x_k^*) = ⟨N, v(S; x_k^*)⟩, 0 ≤ k ≤ l},

determined along the conditionally optimal trajectory w^*, and their solutions W_v(x_k^*) ⊂ L_v(x_k^*) generated by the same principle of optimality as the initial solution W_v(x_0^*). It is obvious that the set W_v(x_l^*) is a solution of the terminal game Γ_v(x_l^*) and is composed of the single imputation h(x_l^*) = {h_i(x_l^*), i = 1, …, n}, where h_i(x_l^*) is the terminal part of player i's payoff along the trajectory w^*.

4. Time-consistency of the solution

Let the conditionally optimal trajectory w^* be such that W_v(x_k^*) ≠ ∅, 0 ≤ k ≤ l. If this condition is not satisfied, it is impossible for the players to adhere to the chosen principle of optimality, since at the very first stage k at which W_v(x_k^*) = ∅ the players have no possibility to follow this principle. Assume that in the initial state x_0 the players agree upon an imputation ξ^0 ∈ W_v(x_0). This means that in the state x_0 the players agree upon such an allocation of the total maximal gain that (when the game terminates at x_l^*) the share of the i-th player is equal to ξ_i^0, the i-th component of the imputation ξ^0. Suppose player i's payoff (his share) on the first k stages x_0^*, x_1^*, …, x_{k−1}^* is ξ_i(x_{k−1}^*). Then, on the remaining stages x_k^*, …, x_l^*, according to ξ^0 he has to receive the gain η_i^k = ξ_i^0 − ξ_i(x_{k−1}^*). For the original agreement (the imputation ξ^0) to remain in force at the instant k, it is essential that the vector η^k = (η_1^k, …, η_n^k) belong to the set W_v(x_k^*), i.e., to the solution of the current subgame Γ_v(x_k^*). If such a condition is satisfied at each stage along the trajectory w^*, then the imputation ξ^0 is realized.
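The bookkeeping above can be sketched numerically. The stage payoffs h_i(x_m^*), the agreed imputation ξ^0, and the sharing rule on the first stages below are all illustrative assumptions (each player is simply assumed to pocket his own stage payoffs, which is one admissible way of collecting the first-stage gains):

```python
# h[m][i] = h_i(x_m^*) along a toy conditionally optimal trajectory (l = 2);
# we compute v(N; x_k^*) and the remainder eta^k that must lie in W_v(x_k^*).

h = [
    [1.0, 0.0, 2.0],
    [2.0, 1.0, 0.0],
    [0.0, 3.0, 1.0],
]
n = 3
v_N_x0 = sum(sum(stage) for stage in h)   # v(N; x0) = total payoff along w^*

def v_N(k):
    """v(N; x_k^*) = v(N; x0) minus the payoff collected on stages 0..k-1."""
    return v_N_x0 - sum(sum(h[m]) for m in range(k))

xi0 = [4.0, 3.0, 3.0]                     # agreed imputation of v(N; x0) (toy)
collected = [sum(h[m][i] for m in range(2)) for i in range(n)]  # first k = 2 stages
eta = [xi0[i] - collected[i] for i in range(n)]                 # remainder eta^2

assert v_N(2) == sum(h[2])      # only the last stage remains
assert sum(eta) == v_N(2)       # eta must exhaust v(N; x_2^*)
```

Whether `eta` actually belongs to the current solution W_v(x_2^*) depends, of course, on the optimality principle chosen; the code only checks the accounting identity.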
Such is the conceptual meaning of the time-consistency of the imputation. Along the trajectory w^*, the coalition N obtains the payoff

    v(N; x_k^*) = \sum_{i∈N} \sum_{m=k}^{l} h_i(x_m^*).

Then the difference

    v(N; x_0) − v(N; x_k^*) = \sum_{m=0}^{k−1} \sum_{i∈N} h_i(x_m^*)

is equal to the payoff the coalition N obtains on the first k stages (0, …, k − 1). The share of the i-th player in this payoff, considering the transferability of the payoffs, may be represented as

    γ_i(k − 1) = \sum_{m=0}^{k−1} β_i(m) \sum_{i=1}^{n} h_i(x_m^*) = γ_i(x_{k−1}^*, β),        (6.4)

where β_i(m) satisfies the conditions

    \sum_{i=1}^{n} β_i(m) = 1,  β_i(m) ≥ 0,  m = 0, 1, …, l,  i ∈ N.        (6.5)

From (6.4) we necessarily get

    γ_i(k) − γ_i(k − 1) = β_i(k) \sum_{i=1}^{n} h_i(x_k^*).

This quantity may be interpreted as the instantaneous gain of player i at the stage k. Hence it is clear that the vector β(k) = (β_1(k), …, β_n(k)) prescribes the distribution of the total gain among the members of the coalition N. By properly choosing β(k), the players can ensure the desirable outcome, i.e., regulate the receipt of the players' gains over time so that at each stage k there will be no objection against the realization of the original agreement (the imputation ξ^0).

Definition 6.4 The imputation ξ^0 ∈ W_v(x_0) is called time-consistent in the game Γ_v(x_0) if the following conditions are satisfied:
1. there exists a conditionally optimal trajectory w^* = {x_0^*, …, x_k^*, …, x_l^*} along which W_v(x_k^*) ≠ ∅, k = 0, 1, …, l;

2. there exist vectors β(k) = (β_1(k), …, β_n(k)) such that for each k = 0, 1, …, l, β_i(k) ≥ 0, \sum_{i=1}^{n} β_i(k) = 1, and

    ξ^0 ∈ \bigcap_{k=0}^{l} [γ(x_{k−1}^*, β) ⊕ W_v(x_k^*)],        (6.6)

where γ(x_k^*, β) = (γ_1(x_k^*, β), …, γ_n(x_k^*, β)), and W_v(x_k^*) is a solution of the current game Γ_v(x_k^*). The sum ⊕ in the above definition has the following meaning: for η ∈ R^n and A ⊂ R^n, η ⊕ A = {η + a | a ∈ A}.

The game Γ_v(x_0) has a time-consistent solution W_v(x_0) if all the imputations ξ ∈ W_v(x_0) are time-consistent. A conditionally optimal trajectory along which there exists a time-consistent solution of the game Γ_v(x_0) is called an optimal trajectory.

The time-consistent imputation ξ^0 ∈ W_v(x_0) may be realized as follows. From (6.6), at any stage k we have

    ξ^0 ∈ [γ(x_{k−1}^*, β) ⊕ W_v(x_k^*)],        (6.7)

where

    γ(x_{k−1}^*, β) = \sum_{m=0}^{k−1} β(m) \sum_{i∈N} h_i(x_m^*)

is the payoff vector on the first k stages, player i's share in this gain being

    γ_i(x_{k−1}^*, β) = \sum_{m=0}^{k−1} β_i(m) \sum_{i∈N} h_i(x_m^*).

When the game proceeds along the optimal trajectory, the players on the first k stages share the total gain

    \sum_{m=0}^{k−1} \sum_{i∈N} h_i(x_m^*)

among themselves so that the inclusion

    ξ^0 − γ(x_{k−1}^*, β) ∈ W_v(x_k^*)        (6.8)

is satisfied. Furthermore, (6.8) implies the existence of a vector ξ^k ∈ W_v(x_k^*) such that ξ^0 = γ(x_{k−1}^*, β) + ξ^k. That is, with the above method of choosing β(m), the vector of the gains to be obtained by the players at the remaining stages of the game,

    ξ_i^k = ξ_i^0 − γ_i(x_{k−1}^*, β) = \sum_{m=k}^{l} β_i(m) \sum_{i∈N} h_i(x_m^*),

belongs to the set W_v(x_k^*). We also have

    ξ_i^0 = \sum_{m=0}^{l} β_i(m) \sum_{i∈N} h_i(x_m^*).

The vector α(m) with components

    α_i(m) = β_i(m) \sum_{i∈N} h_i(x_m^*),  i ∈ N,  m = 0, 1, …, l,

is called the imputation distribution procedure (IDP). In general, it is fairly easy to see that there may exist an infinite number of vectors β(m) satisfying conditions (6.4), (6.5). Therefore the sharing method proposed here may seem to lack uniqueness. However, for any vector β(m) satisfying conditions (6.4), (6.5), at each stage k the players are guided by an imputation ξ^k ∈ W_v(x_k^*) and by the same optimality principle throughout the game, and hence have no reason to violate the previously achieved agreement. Let us make the following additional assumption.

Assumption B. The vectors ξ^k ∈ W_v(x_k^*) may be chosen as a monotone nonincreasing sequence in the argument k.

We now show that, under Assumption B and the first condition of Definition 6.4 (i.e., W_v(x_k^*) ≠ ∅ at each stage k along the conditionally optimal trajectory), by properly choosing β(m) we may always ensure the time-consistency of the imputation ξ^0 ∈ W_v(x_0). Choose ξ^k ∈ W_v(x_k^*) to be a monotone nonincreasing sequence and construct the difference γ(k − 1) = ξ^0 − ξ^k; then ξ^0 = ξ^k + γ(k − 1) ∈ γ(k − 1) ⊕ W_v(x_k^*). Let β(k) = (β_1(k), …, β_n(k)) be vectors satisfying conditions (6.4), (6.5). Instead of writing γ(x_k^*, β) we will, for simplicity, write γ(k). Rewriting (6.4) in vector form we get
    \sum_{m=0}^{k−1} β(m) \sum_{i∈N} h_i(x_m^*) = γ(k − 1),

and we get the following expression for β(k):

    β(k) = \frac{γ(k) − γ(k − 1)}{\sum_{i∈N} h_i(x_k^*)} = − \frac{ξ^k − ξ^{k−1}}{\sum_{i∈N} h_i(x_k^*)} ≥ 0.        (6.9)

Here the last expression follows from the equality ξ^0 = γ(k) + ξ^k, and the inequality follows from the monotone nonincreasing choice of the ξ^k together with the positivity of the payoffs.

Theorem 6.1 If Assumption B is satisfied and

    W_v(x_k^*) ≠ ∅,  k = 0, 1, …, l,        (6.10)

then the solution W_v(x_0) is time-consistent.

Theoretically, the main problem is to study the conditions to be imposed on the vector function β(m) in order to ensure the time-consistency of specific forms of the solutions W_v(x_0) in various classes of games. We now consider the new concept of strong time-consistency and define time-consistent solutions for cooperative games with terminal payoffs.

5. Strongly time-consistent solutions

For a time-consistent imputation ξ^0 ∈ W_v(x_0) there exist, as follows from the definition, a sequence of vectors β(m) and imputations ξ^k (generally nonunique) from the solutions W_v(x_k^*) of the current games Γ_v(x_k^*) such that ξ^0 = γ(x_{k−1}^*, β) + ξ^k. The conditions of time-consistency do not affect the imputations from the set W_v(x_k^*) which fail to satisfy this equation. Furthermore, of interest is the case when any imputation from the current solution W_v(x_k^*) may provide a "good" continuation of the original agreement, i.e., when for a time-consistent imputation ξ^0 ∈ W_v(x_0), at any stage k and for every ξ^k ∈ W_v(x_k^*), the condition γ(x_{k−1}^*, β) + ξ^k ∈ W_v(x_0), where γ(x_l^*, β) = ξ^0, is satisfied. By slightly strengthening this requirement, we obtain a qualitatively new time-consistency concept for the solution W_v(x_0) of the game Γ_v(x_0), which we call strong time-consistency.

Definition 6.5 The imputation ξ^0 ∈ W_v(x_0) is called strongly time-consistent (STC) in the game Γ_v(x_0) if the following conditions are satisfied:
1. the imputation ξ^0 is time-consistent;

2. for any 0 ≤ q ≤ r ≤ l and the β^0 corresponding to the imputation ξ^0 we have

    γ(x_r^*, β^0) ⊕ W_v(x_r^*) ⊂ γ(x_q^*, β^0) ⊕ W_v(x_q^*).        (6.11)

The game Γ_v(x_0) has a strongly time-consistent solution W_v(x_0) if all the imputations from W_v(x_0) are strongly time-consistent.

6. Terminal payoffs

In (6.2) let h_i(x_k) ≡ 0, i = 1, …, n, k = 1, …, l − 1. The cooperative game with terminal payoffs is denoted by the same symbol Γ_v(x_0). In such games the payoffs are paid out when the game terminates (at the terminal vertex x_l ∈ X_{n+1}).

Theorem 6.2 In the cooperative game Γ_v(x_0) with terminal payoffs h_i(x_l), i = 1, …, n, only the vector h(x_l^*) = {h_i(x_l^*), i = 1, …, n}, whose components are equal to the players' payoffs at the terminal point of the conditionally optimal trajectory, may be time-consistent.

Proof. It follows from the time-consistency of the imputation ξ^0 ∈ W_v(x_0) that

    ξ^0 ∈ \bigcap_{0≤k≤l} W_v(x_k^*).

But since the current game Γ_v(x_l^*) is of zero duration, therein L_v(x_l^*) = W_v(x_l^*) = h(x_l^*). Hence ξ^0 = h(x_l^*), and there are no other time-consistent imputations. □

Theorem 6.3 For the existence of a time-consistent solution in the game with terminal payoffs it is necessary and sufficient that for all 0 ≤ k ≤ l

    h(x_l^*) ∈ W_v(x_k^*),

where h(x_l^*) is the players' payoff vector at the terminal point of the conditionally optimal trajectory w^* = {x_0^*, …, x_k^*, …, x_l^*}, and W_v(x_k^*), 0 ≤ k ≤ l, are the solutions of the current games along the conditionally optimal trajectory generated by the chosen principle of optimality.

This theorem is a corollary of the previous one. Thus, if in the game with terminal payoffs there is a time-consistent imputation, then the players in the initial state x_0 have to agree upon the realization of the vector (imputation) h(x_l^*) ∈ W_v(x_0), and, with the motion along the optimal trajectory w^* = {x_0^*, …, x_k^*, …, x_l^*}, at each stage 0 ≤ k ≤ l this imputation h(x_l^*) belongs to the solutions of the current games Γ_v(x_k^*). As the theorem shows, in the game with terminal payoffs only a unique imputation from the set W_v(x_0) may be time-consistent. This is a highly improbable event, since it means that the imputation h(x_l^*) belongs to the solutions of all subgames along the conditionally optimal trajectory. Therefore, in such games there is no point in discussing either the time-consistency of the solution W_v(x_0) as a whole or its strong time-consistency.

7. Regularization

For some economic applications it is necessary that the instantaneous gain of player i at the stage k, which by a proper choice of β(k) regulates the i-th player's receipt of his gain over time,

    β_i(k) \sum_{i∈N} h_i(x_k^*) = α_i(k),

be nonnegative (an IDP with α_i ≥ 0). Unfortunately, this condition cannot always be guaranteed. At the same time, we shall propose a new characteristic function (c.f.), based on the classical one defined earlier, such that the solution defined in games with this new c.f. is strongly time-consistent and guarantees a nonnegative instantaneous gain for player i at each stage k. Let v(S; x_k^*), S ⊂ N, be the c.f. defined in the subgame Γ(x_k^*) in Section 2 using the classical maxmin approach.
i∈N hi (x∗ ) = αi (k ) k be nonnegative (IDP, αi ≥ 0). Unfortunately this condition cannot be always guaranteed. In the same time we shall purpose a new characteristic function (c. f.) based on classical one deﬁned earlier, such that solution deﬁned in games with this new c. f. would be strongly timeconsistent and would guarantee nonnegative instantaneous gain of player i at each stage k . Let v (S ; x∗ ) S ⊂ N be the c. f. deﬁned in subgame Γ(x∗ ) in Section 2 k k using classical maxmin approach. 124
w∗ DYNAMIC GAMES: THEORY AND APPLICATIONS For the function V (N ; x∗ ) (S = N ) the Bellman’s equation along k = {x∗ , . . . , x∗ , . . . , x∗ } is satisﬁed, i.e., 0 k l
k−1 n V (N ; x0 ) =
m=0 i=1 hi (x∗ ) + V (N ; x∗ ). m k (6.12) Deﬁne the new “regularized” function v (S ; x0 ), S ⊂ N by formula
l v (S ; x0 ) =
m=0 v (S ; x∗ ) m n ∗ i=1 hi (xm ) v (N ; x∗ ) m (6.13) And in the same manner for 0 ≤ k ≤ l v (S ; x∗ ) k
l =
m=k v (S, x∗ ) m n ∗ i=1 hi (xm ) v (N ; x∗ ) m (6.14) It can be proved that v is superadditive and v (N ; x∗ ) = v (N ; x∗ ) k k Denote the set of imputations deﬁned by characteristic functions v (S ; x∗ ), v (S ; x∗ ), k = 0, 1, . . . , l by L(x∗ ) and L(x∗ ) correspondingly. k k k k Let ξ k ∈ L(x∗ ) be a selector, 0 ≤ k ≤ l, deﬁne k
    ξ̄ = \sum_{k=0}^{l} ξ^k \frac{\sum_{i=1}^{n} h_i(x_k^*)}{v(N; x_k^*)},        (6.15)

    ξ̄^k = \sum_{m=k}^{l} ξ^m \frac{\sum_{i=1}^{n} h_i(x_m^*)}{v(N; x_m^*)}.        (6.16)

Definition 6.6 The set L̄(x_0) consists of the vectors defined by (6.15) for all possible selectors ξ^k, 0 ≤ k ≤ l, with values in L(x_k^*).
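The identity v̄(N; x_k^*) = v(N; x_k^*) noted above holds because, for S = N, the weights \sum_i h_i(x_m^*)/v(N; x_m^*) multiply v(N; x_m^*) itself and the sum telescopes to the remaining stage payoffs. A toy numerical check (the stage payoffs and the values v(S; x_m^*) of one fixed coalition S are invented for the illustration):

```python
# Two players, three stages (l = 2): compute v̄(S; x_k^*) per (6.14) and
# verify that the regularization leaves v(N; .) unchanged.

h = [[1.0, 2.0], [2.0, 1.0], [1.0, 1.0]]          # h[m][i] = h_i(x_m^*)
stage_sum = [sum(hm) for hm in h]                  # sum_i h_i(x_m^*)
v_N = [sum(stage_sum[m:]) for m in range(3)]       # v(N; x_m^*) along w^*
v_S = [2.0, 1.5, 1.0]                              # toy values v(S; x_m^*)

def v_bar(k):
    """Regularized value v̄(S; x_k^*), formula (6.14)."""
    return sum(v_S[m] * stage_sum[m] / v_N[m] for m in range(k, 3))

def v_bar_N(k):
    """Same formula with S = N: should reproduce v(N; x_k^*)."""
    return sum(v_N[m] * stage_sum[m] / v_N[m] for m in range(k, 3))

for k in range(3):
    assert abs(v_bar_N(k) - v_N[k]) < 1e-9   # regularization preserves v(N; .)
```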
Let ξ̄ ∈ L̄(x_0) and let the functions α_i(k), i = 1, …, n, 0 ≤ k ≤ l, satisfy the condition

    \sum_{k=0}^{l} α_i(k) = ξ̄_i,  α_i(k) ≥ 0.        (6.17)

The vector function α(k) = {α_i(k)} defined by formula (6.17) is called an "imputation distribution procedure" (IDP) (see Section 4). Define

    ξ̄_i(k − 1) = \sum_{m=0}^{k−1} α_i(m),  i = 1, …, n.

The following formula connects α_i and β_i (see Section 4):

    α_i(k) = β_i(k) \sum_{i∈N} h_i(x_k^*).

Let C(x_0) ⊂ L(x_0) be any of the known classical optimality principles of cooperative game theory (the core, the nucleolus, the NM-solution, the Shapley value, or any other OP). Consider C(x_0) as an optimality principle in Γ(x_0). In the same manner let C(x_k^*) be an optimality principle in Γ(x_k^*), 0 ≤ k ≤ l.

The STC of the optimality principle means that if an imputation ξ̄ ∈ C̄(x_0) and an IDP α(k) = {α_i(k)} of ξ̄ are selected, then after the players have received, on the first k stages, the amounts

    ξ̄_i(k − 1) = \sum_{m=0}^{k−1} α_i(m),  i = 1, …, n,

the optimal income (in the sense of the optimality principle C(x_k^*)) on the last l − k stages in the subgame Γ(x_k^*), together with ξ̄(k − 1), constitutes an imputation belonging to the OP in the original game Γ(x_0). This condition is stronger than time-consistency, which means only that part of the previously selected "optimal" imputation belongs to the solution of the corresponding current subgame Γ(x_{k−1}^*).

Suppose C(x_0) = L(x_0) and C(x_k^*) = L(x_k^*); then

    L̄(x_0) ⊃ ξ̄(k − 1) ⊕ L̄(x_k^*)

for all 0 ≤ k ≤ l, and this implies that the set of all imputations L̄(x_0), if considered as a solution of Γ(x_0), is strongly time-consistent (here a ⊕ B, a ∈ R^n, B ⊂ R^n, is the set of vectors a + b, b ∈ B).

Suppose the set C(x_0) consists of a unique imputation, the Shapley value. In this case strong time-consistency follows immediately from time-consistency. Suppose now that C(x_0) ⊂ L(x_0) and C(x_k^*) ⊂ L(x_k^*), 0 ≤ k ≤ l, are the cores of Γ(x_0) and, correspondingly, of the subgames Γ(x_k^*). We suppose that the sets C(x_k^*), 0 ≤ k ≤ l, are nonempty. Let C̄(x_0) and C̄(x_k^*), 0 ≤ k ≤ l, be the sets of all possible vectors ξ̄, ξ̄^k from (6.15), (6.16) with selectors ξ^k ∈ C(x_k^*), 0 ≤ k ≤ l. And let C_{v̄}(x_0) and C_{v̄}(x_k^*), 0 ≤ k ≤ l, be the cores of Γ(x_0), Γ(x_k^*) defined for the c.f. v̄(S; x_0), v̄(S; x_k^*).

Proposition 6.1 The following inclusions hold:
    C̄(x_0) ⊂ C_{v̄}(x_0),        (6.18)

    C̄(x_k^*) ⊂ C_{v̄}(x_k^*),  0 ≤ k ≤ l.        (6.19)

Proof. A necessary and sufficient condition for an imputation ξ̄ to belong to the core C_{v̄}(x_0) is

    \sum_{i∈S} ξ̄_i ≥ v̄(S; x_0),  S ⊂ N.

If ξ̄ ∈ C̄(x_0), then

    ξ̄ = \sum_{m=0}^{l} ξ^m \frac{\sum_{i=1}^{n} h_i(x_m^*)}{v(N; x_m^*)},

where ξ^m ∈ C(x_m^*). Thus

    \sum_{i∈S} ξ_i^m ≥ v(S; x_m^*),  S ⊂ N,  0 ≤ m ≤ l,

and we get

    \sum_{i∈S} ξ̄_i = \sum_{m=0}^{l} \Big( \sum_{i∈S} ξ_i^m \Big) \frac{\sum_{i=1}^{n} h_i(x_m^*)}{v(N; x_m^*)} ≥ \sum_{m=0}^{l} v(S; x_m^*) \frac{\sum_{i=1}^{n} h_i(x_m^*)}{v(N; x_m^*)} = v̄(S; x_0).

The inclusion (6.19) is proved similarly. □

Define a new solution of Γ(x_0) as C̄(x_0), which we will call the "regularized" subcore. C̄(x_0) is always time-consistent and strongly time-consistent:

    C̄(x_0) ⊃ \Big[ \sum_{m=0}^{k−1} ξ^m \frac{\sum_{i=1}^{n} h_i(x_m^*)}{v(N; x_m^*)} \Big] ⊕ C̄(x_k^*),  0 ≤ k ≤ l.

Here, under a ⊕ A, where a ∈ R^n, A ⊂ R^n, the set of all vectors a + b, b ∈ A, is understood. The quantity

    α_i(m) = ξ_i^m \frac{\sum_{i=1}^{n} h_i(x_m^*)}{v(N; x_m^*)} ≥ 0

is an IDP and is nonnegative. □

8. Cooperative stochastic games

Stochastic games (Shapley (1953b)) constitute a special subclass of extensive form games, but our previous construction cannot be used directly to create the cooperative theory for such games, since we considered only games in extensive form (with incomplete information) which do not contain chance moves, and chance moves play an essential role in stochastic games. Although the theory is very close to the one discussed in the previous sections, the results for cooperative stochastic games cannot be derived from it immediately, and we shall provide here the corresponding investigation in detail.

8.1 Cooperative game

Consider a finite graph tree G = (Z, L), where Z is the set of all vertices and L : Z → 2^Z is a point-to-set mapping (L_z = L(z) ⊂ Z, z ∈ Z). In our setting each vertex z ∈ Z is considered as an n-person simultaneous (one-stage) game
    Γ(z) = ⟨N; X_1^z, …, X_n^z; K_1^z, …, K_n^z⟩,

where N = {1, …, n} is the set of players, which is the same for all z ∈ Z, X_i^z is the set of strategies of player i ∈ N, and K_i^z(x_1^z, …, x_n^z) (we suppose that K_i^z ≥ 0) is the payoff of player i (i ∈ N, x_i^z ∈ X_i^z). The n-tuple x^z = (x_1^z, …, x_n^z) is called a situation in the game Γ(z). The game Γ(z) is called a stage game. For each z ∈ Z the transition probabilities

    p(z, y; x_1^z, …, x_n^z) = p(z, y; x^z) ≥ 0,  \sum_{y∈L_z} p(z, y; x^z) = 1,

are defined; p(z, y; x^z) is the probability that the game Γ(y), y ∈ L_z, will be played next after the game Γ(z) under the condition that in Γ(z) the situation x^z = (x_1^z, …, x_n^z) was realized. Also p(z, y; x^z) ≡ 0 if L_z = ∅. Suppose that in the game the path z_0, z_1, …, z_l (L_{z_l} = ∅) is realized. Then the payoff of player i ∈ N is defined as

    K_i(z_0) = \sum_{j=0}^{l} K_i^{z_j}(x_1^{z_j}, …, x_n^{z_j}) = \sum_{j=0}^{l} K_i^{z_j}(x^{z_j}).

But since the transition from one stage game to another has a stochastic character, one has to consider the mathematical expectation of a player's payoff, E_i(z_0) = E K_i(z_0). The following formula holds:

    E_i(z_0) = K_i^{z_0}(x^{z_0}) + \sum_{y∈L_{z_0}} p(z_0, y; x^{z_0}) E_i(y),        (6.20)

where E_i(y) is the mathematical expectation of player i's payoff in the stochastic subgame starting from the stage game Γ(y), y ∈ L_{z_0}.

The strategy π_i(·) of player i ∈ N is a mapping which, for each stage game Γ(y), determines which local strategy x_i^y in this stage game is to be selected; thus π_i(y) = x_i^y for y ∈ Z. We shall denote the described stochastic game by G(z_0), and by G(z) any subgame of G(z_0) starting from the stage game Γ(z) (played on the subgraph of the graph G starting from the vertex z ∈ Z). If π_i(·) is a strategy of player i ∈ N in G(z_0), then the trace π_i^y(·) of this strategy defined on a subtree G(y) of G is a strategy in the subgame G(y) of the game G(z_0). The following version of (6.20) holds for a subgame G(z) (for the mathematical expectation of player i's payoff in G(z)):

    E_i(z) = K_i^z(x^z) + \sum_{y∈L_z} p(z, y; x^z) E_i(y).

As mixed strategies in G(z_0) we consider behavior strategies (Kuhn (1953)). Denote them by q_i(·), i ∈ N, and the corresponding situation by q^N(·) = (q_1(·), …, q_n(·)). Here q_i(z), for each z ∈ Z, is a mixed strategy of player i in the stage game Γ(z). Denote by π̄^N(·) = (π̄_1(·), …, π̄_n(·)) the n-tuple of pure strategies in G(z_0) which maximizes the sum of the players' expected payoffs (the cooperative solution). Denote this maximal sum by V(z_0):
    V(z_0) = \max E(z_0) = \max \sum_{i∈N} E_i(z_0).

It can be easily seen that the maximal sum of the players' expected payoffs over the set of behavior strategies does not exceed V(z_0). In the same way we can define the cooperative n-tuple of strategies for any subgame G(z), z ∈ Z, starting from the stage game Γ(z). From Bellman's optimality principle it follows that each of these n-tuples is a trace of π̄^N(·) in the subgame G(z). The following Bellman equation holds (Bellman (1957)):

    V(z) = \max_{x_i^z ∈ X_i^z} \Big[ \sum_{i∈N} K_i^z(x^z) + \sum_{y∈L_z} p(z, y; x^z) V(y) \Big] = \sum_{i∈N} K_i^z(x̄^z) + \sum_{y∈L_z} p(z, y; x̄^z) V(y),        (6.21)

with the initial condition

    V(z) = \max_{x_i^z ∈ X_i^z} \sum_{i∈N} K_i^z(x^z),  z ∈ {z : L_z = ∅}.        (6.22)

The maximizing n-tuple π̄^N(·) = (π̄_1(·), …, π̄_n(·)) defines a probability measure over the game tree G(z_0). Consider the subtree Ĝ(z_0) of G(z_0) which consists of the paths in G(z_0) having positive probability under the measure generated by π̄^N(·). We shall call Ĝ(z_0) the cooperative subtree, and denote the set of vertices in Ĝ(z_0) by CZ ⊂ Z.

For each z ∈ CZ define a zero-sum game over the structure of the graph G(z) between the coalition S ⊂ N as the maximizing player and the coalition N \ S as the minimizing one. Let V(S, z) be the value of this game in behavior strategies (its existence follows from Kuhn (1953)). Thus for each subgame G(z), z ∈ CZ, we can define a characteristic function V(S, z), S ⊂ N, with V(N, z) = V(z) defined by (6.21), (6.22).

Consider the cooperative version Ḡ(z), z ∈ Z, of a subgame G(z), with characteristic function V(S, z). Let I(z) be the imputation set in Ḡ(z):

    I(z) = { α^z : \sum_{i∈N} α_i^z = V(z) = V(N, z),  α_i^z ≥ V({i}, z) }.        (6.23)

As a solution of Ḡ(z) we can understand any given subset C(z) ⊂ I(z). This can be any of the classical cooperative solutions (the nucleolus, the core, the NM-solution, the Shapley value). For simplicity, in what follows we suppose that C(z) is the Shapley value,

    C(z) = Sh(z) = {Sh_1(z), …, Sh_n(z)} ⊂ I(z),

but all conclusions apply automatically to other cooperative solution concepts.

8.2 Cooperative Payoff Distribution Procedure (CPDP)
The vector function β(z) = (β_1(z), …, β_n(z)) is called a CPDP if

    \sum_{i∈N} β_i(z) = \sum_{i∈N} K_i^z(x̄_1^z, …, x̄_n^z),        (6.24)

where x̄^z = (x̄_1^z, …, x̄_n^z) satisfies (6.21). In each subgame G(z), with each path z̄ = z, …, z_m (L_{z_m} = ∅) in this subgame one can associate a random variable: the sum of the β_i(z) along this path z̄. Denote the expected value of this sum in G(z) by B_i(z). It can be easily seen that B_i(z) satisfies the following functional equation:

    B_i(z) = β_i(z) + \sum_{y∈L_z} p(z, y; x̄^z) B_i(y).        (6.25)

Calculate the Shapley value (Shapley (1953a)) for each subgame G(z), z ∈ CZ:

    Sh_i(z) = \sum_{S⊂N, i∈S} \frac{(|S| − 1)!(n − |S|)!}{n!} (V(S, z) − V(S \ {i}, z)),        (6.26)

where |S| is the number of elements of S. Define γ_i(z) by the formula

    Sh_i(z) = γ_i(z) + \sum_{y∈L_z} p(z, y; x̄^z) Sh_i(y).        (6.27)

Since Sh(z) ∈ I(z), we get from (6.27)

    V(N; z) = \sum_{i∈N} γ_i(z) + \sum_{y∈L_z} p(z, y; x̄^z) V(N; y),        (6.28)

and V(N; z) = \sum_{i∈N} γ_i(z) for z ∈ {z : L_z = ∅}. Comparing (6.28) and (6.21) we get that

    \sum_{i∈N} γ_i(z) = \sum_{i∈N} K_i^z(x̄^z)        (6.29)

for x̄^z = (x̄_1^z, …, x̄_n^z), x̄_i^z ∈ X_i^z, i ∈ N, and thus the following lemma holds.

Lemma 6.1 γ(z) = (γ_1(z), …, γ_n(z)) defined by (6.27) is a CPDP.

Definition 6.7 The Shapley value {Sh(z_0)} is called time-consistent in G(z_0) if there exists a nonnegative CPDP (β_i(z) ≥ 0) such that the following condition holds:
Shi (z ) = βi (z ) +
y ∈Lz p(z, y ; xz )Shi (y ), i ∈ N, z ∈ Z. (6.30) From (6.30) we get βi (z ) = Shi (z ) −
y ∈Lz p(z, y ; xz )Shi (y ) and the nonnegativity of CPDP βi (z ) can follow from the monotonicˆ ˆ ity of Shapley Value along the paths on cooperative subgame G(z0 ) (Shi (y ) ≤ Shi (z ) for y ∈ Lz ). In the same time the nonnegativity of CPDP βi (z ) from (6.30) in general does not hold. Denote as before by Bi (z ) the expected value of the sums of βi (y ) ˆ ˆ from (6.30), y ∈ Z along the paths in the cooperative subgame G(z ) of ˆ ˆ the game G(z0 ). Lemma 6.2
B̂_i(z) = Sh_i(z), i ∈ N.   (6.31)

Proof. For B̂_i(z) we have equation (6.25):

B̂_i(z) = β̂_i(z) + Σ_{y∈L_z} p(z, y; x̄^z) B̂_i(y),   (6.32)

with the initial condition

B̂_i(z) = Sh_i(z) for z ∈ {z : L_z = ∅},   (6.33)

and for the Shapley Value we have

Sh_i(z) = β̂_i(z) + Σ_{y∈L_z} p(z, y; x̄^z) Sh_i(y).   (6.34)

From (6.32), (6.33), (6.34) it follows that B̂_i(z) and Sh_i(z) satisfy the same functional equations with the same initial condition (6.33), and the proof follows by backward induction.

Lemma 6.2 gives a natural interpretation of the CPDP β̂_i(z): it can be interpreted as the instantaneous payoff which player i has to get in the stage game Γ(z), when this game actually occurs along the paths of the cooperative subtree Ĝ(z₀), if his payoff in the whole game equals the ith component of the Shapley Value. So the CPDP shows the distribution in time of the Shapley Value, in such a way that in each subgame the players are oriented to get the current Shapley Value of this subgame.

8.3 Regularization

In this section we propose a procedure, similar to the one used in cooperative differential games (Petrosjan (1993)), which guarantees the existence of a time-consistent Shapley Value in the cooperative stochastic game (a nonnegative CPDP). Introduce

β̄_i(z) = [Σ_{i∈N} K_i(x̄₁^z, …, x̄ₙ^z) / V(N, z)] Sh_i(z),   (6.35)

where x̄^z = (x̄₁^z, …, x̄ₙ^z), z ∈ Z, is the realization of the n-tuple of strategies π̄(·) = (π̄₁(·), …, π̄ₙ(·)) maximizing the mathematical expectation of the sum of the players' payoffs in the game G(z₀) (the cooperative solution), and V(N, z) is the value of the characteristic function for the grand coalition N in the subgame G(z). Since Σ_{i∈N} Sh_i(z) = V(N, z), it follows from (6.35) that β̄_i(z), i ∈ N, z ∈ Z, is a CPDP. From (6.35) it also follows that the instantaneous payoff of a player in the stage game Γ(z) must be proportional to his Shapley Value in the subgame G(z) of the game G(z₀). Define the regularized Shapley Value (RSV) in G(z) by induction as follows:

Ŝh_i(z) = [Σ_{i∈N} K_i(x̄^z) / V(N, z)] Sh_i(z) + Σ_{y∈L_z} p(z, y; x̄^z) Ŝh_i(y),   (6.36)

with the initial condition

Ŝh_i(z) = [Σ_{i∈N} K_i(x̄^z) / V(N, z)] Sh_i(z) = Sh_i(z) for z ∈ {z : L_z = ∅}.   (6.37)

Since K_i(x̄) ≥ 0, it follows from (6.35) that β̄_i(z) ≥ 0, and the new regularized Shapley Value Ŝh_i(z) is time-consistent by (6.36).

Introduce the new characteristic function V̂(S, z) in G(z) by induction, using the formula (S ⊂ N)
V̂(S, z) = [Σ_{i∈N} K_i(x̄^z) / V(N, z)] V(S, z) + Σ_{y∈L_z} p(z, y; x̄^z) V̂(S, y),   (6.38)

with the initial condition V̂(S, z) = V(S, z) for z ∈ {z : L_z = ∅}.

Here V(S, z) is superadditive, so is V̂(S, z), and V̂(N, z) = V(N, z), since both functions V̂(N, z) and V(N, z) satisfy the same functional equation (6.21) with the initial condition (6.22). Rewriting (6.38) for S\{i} we get

V̂(S\{i}, z) = [Σ_{i∈N} K_i(x̄^z) / V(N, z)] V(S\{i}, z) + Σ_{y∈L_z} p(z, y; x̄^z) V̂(S\{i}, y).   (6.39)

Subtracting (6.39) from (6.38), multiplying by (|S|−1)!(n−|S|)!/n!, and summing over S ⊂ N, S ∋ i, we get

Σ_{S⊂N, S∋i} [(|S|−1)!(n−|S|)!/n!] [V̂(S, z) − V̂(S\{i}, z)]
  = [Σ_{i∈N} K_i(x̄^z) / V(N, z)] Σ_{S⊂N, S∋i} [(|S|−1)!(n−|S|)!/n!] [V(S, z) − V(S\{i}, z)]
  + Σ_{y∈L_z} p(z, y; x̄^z) Σ_{S⊂N, S∋i} [(|S|−1)!(n−|S|)!/n!] [V̂(S, y) − V̂(S\{i}, y)].   (6.40)

From (6.36), (6.37) and (6.40) it follows that the RSV Ŝh(z) is a Shapley Value for the characteristic function V̂(S, z), since Ŝh_i(z) and the function

Σ_{S⊂N, S∋i} [(|S|−1)!(n−|S|)!/n!] [V̂(S, z) − V̂(S\{i}, z)]

satisfy the same functional equations with the same initial condition: for z ∈ {z : L_z = ∅},

Σ_{S⊂N, S∋i} [(|S|−1)!(n−|S|)!/n!] [V̂(S, z) − V̂(S\{i}, z)] = Σ_{S⊂N, S∋i} [(|S|−1)!(n−|S|)!/n!] [V(S, z) − V(S\{i}, z)] = Sh_i(z),

which coincides with (6.37). Thus

Ŝh_i(z) = Σ_{S⊂N, S∋i} [(|S|−1)!(n−|S|)!/n!] [V̂(S, z) − V̂(S\{i}, z)].

Theorem 6.4 The RSV is time-consistent and is a Shapley Value for the regularized characteristic function V̂(S, z) defined for any subgame G(z) of the stochastic game G(z₀).

References
Bellman, R.E. (1957). Dynamic Programming. Princeton University Press, Princeton, NJ.
Filar, J. and Petrosjan, L.A. (2000). Dynamic cooperative games. International Game Theory Review, 2(1):42–65.
Haurie, A. (1975). On some properties of the characteristic function and core of multistage game of coalitions. IEEE Transactions on Automatic Control, 20(2):238–241.
Haurie, A. (1976). A note on nonzero-sum differential games with bargaining solution. Journal of Optimization Theory and Applications, 18(1):31–39.
Kaitala, V. and Pohjola, M. (1988). Optimal recovery of a shared resource stock: A differential game model with efficient memory equilibria. Natural Resource Modeling, 3(1):191–199.
Kuhn, H.W. (1953). Extensive games and the problem of information. Annals of Mathematics Studies, 28:193–216.
Owen, G. (1968). Game Theory. W.B. Saunders Co., Philadelphia.
Petrosjan, L.A. (1993). Differential Games of Pursuit. World Scientific, London.
Petrosjan, L.A. (1995). The Shapley value for differential games. In: G.J. Olsder (ed.), New Trends in Dynamic Games and Applications, pages 409–417, Birkhäuser.
Petrosjan, L.A. (1996). On the time-consistency of the Nash equilibrium in multistage games with discount payoffs. Applied Mathematics and Mechanics (ZAMM), 76(3):535–536.
Petrosjan, L.A. and Danilov, N.N. (1979). Stability of solutions in nonzero-sum differential games with transferable payoffs. Vestnik of Leningrad University, 1:52–59.
Petrosjan, L.A. and Danilov, N.N. (1985). Cooperative Differential Games. Tomsk University Press, Vol. 276.
Petrosjan, L.A. and Zaccour, G. (2003). Time-consistent Shapley value allocation of pollution cost reduction. Journal of Economic Dynamics and Control, 27:381–398.
Shapley, L.S. (1953a). A value for n-person games. Annals of Mathematics Studies, 28:307–317.
Shapley, L.S. (1953b). Stochastic games. Proceedings of the National Academy of Sciences of the U.S.A., 39:1095–1100.

Chapter 7

ELECTRICITY PRICES IN A GAME THEORY CONTEXT
Mireille Bossy
Nadia Maïzi
Geert Jan Olsder
Odile Pourtallier
Etienne Tanré
Abstract We consider a model of an electricity market in which S suppliers offer electricity: each supplier S_i offers a maximum quantity q_i at a fixed price p_i. The response of the market to these offers is the set of quantities bought from the suppliers. The objective of the market is to satisfy its demand at minimal price. We investigate two cases. In the first case, each supplier strives to maximize its market share; in the second case, each supplier strives to maximize its profit. We show that in both cases a Nash equilibrium exists. Nevertheless, a close analysis of the equilibrium for profit maximization shows that it is not realistic. This highlights the difficulty of predicting the behavior of a market in which the suppliers are known to be mainly interested in profit maximization.

1. Introduction

Since the deregulation process of electricity exchanges was initiated in European countries, many different market structures have appeared (see e.g. Stoft (2002)). Among them are the so-called day-ahead markets, where suppliers face a decision process that relies on a centralized auction mechanism. It consists in submitting bids, more or less complicated depending on the design of the day-ahead market (power pools, power exchanges, . . . ). The problem is to determine the quantity and price that will win the selection process on the market. Our aim in this paper is to describe the behavior of the participants (suppliers) through a static game approach. We consider a market where S suppliers are involved. Each supplier offers on the market a maximal quantity of electricity, q, that it is ready to deliver at a fixed price p. The response of the market to these offers is the quantities bought from each supplier. The objective of the market is to satisfy its demand at minimal price. Closely related papers are Supatchiat, Zhang and Birge (2001) and Madrigal and Quintana (2001).
They also consider optimal bids on electricity markets. Nevertheless, in Supatchiat, Zhang and Birge (2001) the authors take the quantity of electricity proposed on the market as exogenous, whereas here we consider the quantity as part of the bid. In Madrigal and Quintana (2001) the authors do not consider exactly the same kind of market mechanism; in particular, they consider open bids and fix the market clearing price as the highest price among the accepted bids. They consider fixed demand but also stochastic demand.

The paper is organized as follows. The model is described in Section 2, together with the proposed solution concept. In Section 3 we consider the case where the suppliers strive to maximize their market share, while in Section 4 we analyze the case where the goal is profit maximization. We conclude in Section 5 with some comparative remarks on the two criteria used, and some possible directions for future work.

2. Problem statement

2.1 The agents and their choices

We consider a single market that has an inelastic demand for d units of electricity, provided by S local suppliers called S_j, j = 1, 2, …, S.

2.1.1 The suppliers. Each supplier S_j sends an offer to the market that consists in a price function p_j(·), which associates to any quantity of electricity q the unit price p_j(q) at which it is ready to sell this quantity. We shall use the following special form of the price function:

Definition 7.1 For supplier S_j, a quantity-price strategy, referred to as the pair (q_j, p_j), is a price function p_j(·) defined by
p_j(q) = { p_j ≥ 0, for q ≤ q_j;  +∞, for q > q_j. }   (7.1)

q_j is the maximal quantity S_j offers to sell at the finite price p_j. For higher quantities the price becomes infinite.

Note that we use the same notation, p_j, for the price function and for the fixed price. This should not cause any confusion.

2.1.2 The market. The market collects the offers made by the suppliers, i.e., the price functions p₁(·), p₂(·), …, p_S(·), and has to choose the quantities q̄_j to buy from each supplier S_j, j = 1, …, S. The unit price paid to S_j is p_j(q̄_j). We suppose that an admissible choice of the market is such that the demand is fully satisfied at finite price, i.e., such that

Σ_{j=1}^S q̄_j = d,  q̄_j ≥ 0,  and p_j(q̄_j) < +∞, ∀j.   (7.2)

When the set of admissible choices is empty, i.e., when the demand cannot be satisfied at finite cost (for example when the demand is too large with respect to some finite production capacity), then the market buys the maximal quantity of electricity it can at finite price, though the full demand is not satisfied.

2.2 Evaluation functions and objective

2.2.1 The market. We suppose that the objective of the market is to choose an admissible strategy (i.e., satisfying (7.2)) (q̄₁, …, q̄_S), in response to the offers p₁(·), …, p_S(·) of the suppliers, so as to minimize the total cost. More precisely, the market problem is:
min_{{q̄_j}, j=1,…,S} φ_M(p₁(·), …, p_S(·), q̄₁, …, q̄_S),   (7.3)

with

φ_M(p₁(·), …, p_S(·), q̄₁, …, q̄_S) := Σ_{j=1}^S p_j(q̄_j) q̄_j,   (7.4)

subject to constraints (7.2).

2.2.2 The suppliers. Two criteria, profit and market share, will be studied for the suppliers:
The profit — When the market buys quantities q̄_j, j = 1, …, S, supplier S_j's profit to be maximized is

φ_{S_j}(p₁(·), …, p_S(·), q̄_j) := p_j(q̄_j) q̄_j − C_j(q̄_j),   (7.5)

where C_j(·) is the production cost function.

Assumption 7.1 We suppose that, for each supplier S_j, the production cost C_j(·) is a piecewise C¹ and convex function. When C_j is not differentiable at q we define the marginal cost as the left limit

C′_j(q) = lim_{ε→0⁺} (dC_j/dq)(q − ε).

Because of the assumptions made on C_j, the marginal cost C′_j is monotonic and nondecreasing; in particular it can be a piecewise constant increasing function.

A typical special case in electricity production is that of piecewise constant marginal costs. It corresponds to the fact that a producer starts producing in its cheapest production facility; if the market asks for more electricity, the producer starts up the next-cheapest production facility, and so on.

The market share — For supplier S_j, q̄_j is the quantity bought from it by the market, i.e., we define this criterion as

φ_{S_j}(p₁(·), …, p_S(·), q̄_j) := q̄_j.   (7.6)

For this criterion it is necessary to introduce a price constraint. As a matter of fact, the obvious, but unrealistic, solution without a price constraint would be to set the price to zero whatever the quantity bought is. We need a constraint such that, for example, the profit is nonnegative, or such that the unit price is always above the marginal cost C′_j. For the sake of generality we suppose the existence of a minimal unit price function L_j for each supplier: supplier S_j is not allowed to sell the quantity q at a unit price lower than L_j(q). A natural choice for L_j is C′_j, which expresses the usual constraint that the unit price is above the marginal cost.

2.3 Equilibria

From a game-theoretical point of view, a two-time-step problem with S + 1 players will be formulated. At the first time step the suppliers announce their offers (the price functions) to the market, and at the second time step the market reacts to these offers by choosing the quantities q̄_j of electricity to buy from each supplier. Each player strives to optimize (i.e., maximize for the suppliers, minimize for the market) his own criterion function (φ_{S_j}, j = 1, …, S, φ_M) by properly choosing his own decision variable(s). The numerical outcome of each criterion function will in general depend on all decision variables involved. In contrast to conventional optimization problems, in which there is only one decision maker and where the word "optimum" has an unambiguous meaning, the notion of "optimality" in games is open to discussion and must be defined properly. Various notions of "optimality" exist (see Başar and Olsder (1999)). Here the structure of the problem leads us to use a combined Nash-Stackelberg equilibrium. Note that the "leaders", i.e., the suppliers, choose and announce functions p_j(·). In Başar and Olsder (1999) the corresponding equilibrium is referred to as inverse Stackelberg.
More precisely, define {q̄*_j(p₁(·), …, p_S(·)), j = 1, …, S}, the best response of the market to the offers (p₁(·), …, p_S(·)) of the suppliers, i.e., a solution of the problem ((7.2)–(7.3)). The choices ({p*_j(·)}, {q̄*_j}, j = 1, …, S) will be said to be optimal if the following holds true:

q̄*_j := q̄*_j(p*₁(·), …, p*_S(·)).   (7.7)

For every supplier S_j, j = 1, …, S, and any admissible price function p̃_j(·) we have

φ_{S_j}(p*₁(·), …, p*_S(·), q̄*_j) ≥ φ_{S_j}(p*₁(·), …, p̃_j(·), …, p*_S(·), q̃_j),   (7.8)

where

q̃_j := q̄*_j(p*₁(·), …, p̃_j(·), …, p*_S(·)).   (7.9)

The Nash equilibrium condition (7.8) tells us that supplier S_j cannot increase its outcome by deviating unilaterally from its equilibrium choice p*_j(·). Note that in the second term of equation (7.8) the action of the market is given by (7.9): if S_j deviates from p*_j(·) by offering the price function p̃_j(·), the market reacts by buying from S_j the quantity q̃_j instead of q̄*_j.

Remark 7.1 As already noticed, the minimization problem ((7.2)–(7.3)) defining the behavior of the market may not have any solution; in that case the market reacts by buying the maximal quantity of electricity it can at finite price. At the other extreme, it may have infinitely many solutions (for example when several suppliers use the same price function). In that case q̄*_j(·) is not uniquely defined by equation (7.7), nor consequently is the Nash equilibrium defined by equation (7.8). We would need an additional rule that says how the market reacts when its minimization problem has several (possibly infinitely many) solutions. Such an additional rule could be, for example, that the market first buys from supplier S₁, then from supplier S₂, etc., or that the market prefers the offers with larger quantities, etc. Nevertheless, it is not necessary to make this additional rule explicit in this paper. So we assume that there is an additional rule, known by all the suppliers, that ensures that the reaction of the market is unique.

3. Suppliers maximize market share

In this section we analyze the case where the suppliers strive to maximize their market shares by appropriately choosing the price functions p_j(·) at which they offer their electricity on the market. We restrict our attention to price functions p_j(·) of the form given in Definition 7.1, referred to as the quantity-price pair (q_j, p_j). For supplier S_j we denote by L_j(·) its minimal unit price function, which we suppose nondecreasing with respect to the quantity sold.
Classically, this minimal unit price function may represent the marginal production cost. Using a quantity-price pair (q_j, p_j) for each supplier, the market problem (7.3) can be written as
min_{{q̄_j}, j=1,…,S} Σ_{j=1}^S p_j q̄_j,  under 0 ≤ q̄_j ≤ q_j,  Σ_{j=1}^S q̄_j = d.

To define a unique reaction of the market we use Remark 7.1 when Problem (7.9) does not have any solution (i.e., when Σ_{j=1}^S q_j < d) or, at the other extreme, when Problem (7.9) has possibly infinitely many solutions. Hence we can define the evaluation function of the suppliers by

J_{S_j}((q₁, p₁), …, (q_S, p_S)) := φ_{S_j}(p₁(·), …, p_S(·), q̄*_j(p₁(·), …, p_S(·))),

where the price function p_j(·) is the pair (q_j, p_j) and q̄*_j(p₁(·), …, p_S(·)) is the unique optimal reaction of the market. Now the Nash-Stackelberg solution can be expressed simply as a Nash solution, i.e., find u* = (u*₁, …, u*_S), u*_j := (q*_j, p*_j), such that for any supplier S_j and any pair ũ_j = (q̃_j, p̃_j) we have

J_{S_j}(u*) ≥ J_{S_j}(u*_{−j}, ũ_j),   (7.10)

where (u*_{−j}, ũ_j) denotes the vector (u*₁, …, u*_{j−1}, ũ_j, u*_{j+1}, …, u*_S).

Assumption 7.2 We suppose that there exist quantities Q_j, j = 1, …, S, such that

Σ_{j=1}^S Q_j ≥ d,   (7.11)

and such that the minimal price functions L_j are defined from [0, Q_j] to R₊, with finite values for any q in [0, Q_j]. The quantities Q_j represent the maximal quantities of electricity supplier S_j can offer to the market. They may reflect the maximal production capacity of the producers or, more generally, other constraints such as transportation constraints.

Remark 7.2 Condition (7.11) ensures that shortage can be avoided, even if this implies high, but finite, prices.
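The market problem above (choose quantities q̄_j minimizing Σ_j p_j q̄_j under the capacity and demand constraints) is solved by buying in merit order, i.e., from the cheapest offers first. The sketch below is a minimal illustration of this reaction, under stated assumptions: the function name `market_reaction` and the numerical offers are hypothetical, and price ties are broken by offer order, standing in for the unspecified additional rule of Remark 7.1.

```python
# Minimal sketch of the market's reaction to quantity-price offers (q_j, p_j):
# buy from the cheapest offers first until the demand d is met (merit order).
# All names and data here are hypothetical illustrations, not the authors' code.

def market_reaction(offers, d):
    """offers: list of (q_j, p_j) pairs; returns the quantities bought.

    If total capacity is below d, the market buys everything it can at
    finite price and part of the demand stays unsatisfied (cf. Remark 7.1)."""
    bought = [0] * len(offers)
    remaining = d
    # Visit suppliers in merit order, i.e., sorted by announced price;
    # Python's stable sort breaks price ties by offer order.
    for j in sorted(range(len(offers)), key=lambda k: offers[k][1]):
        q_j, _ = offers[j]
        bought[j] = min(q_j, remaining)
        remaining -= bought[j]
        if remaining <= 0:
            break
    return bought

# Hypothetical example: three suppliers offering (quantity, price).
offers = [(4, 15), (5, 20), (3, 12)]
q = market_reaction(offers, d=10)
total_cost = sum(p * qj for (_, p), qj in zip(offers, q))
print(q, total_cost)  # prints [4, 3, 3] 156
```

Under Assumption 7.2 the demand can always be met at finite price; the shortage branch only matters when the total offered capacity Σ_j q_j falls below d.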
We consider successively, in the next subsections, the cases where the minimal price functions L_j are continuous (Subsection 3.1) or discontinuous (Subsection 3.2). The latter case is the most important from the application point of view, since we often take L_j = C′_j, which is not in general continuous.

3.1 Continuous strictly increasing minimal price

We suppose the following assumption holds:

Assumption 7.3 For any supplier S_j, j ∈ {1, …, S}, the minimal price function L_j is continuous and strictly increasing from [0, Q_j] to R₊.

Proposition 7.1

1. Suppose that Assumption 7.3 holds. Then any strategy profile u* = (u*₁, u*₂, …, u*_S) with u*_j = (q*_j, p*_j) such that

L_j(q*_j) = p*, for all j ∈ {1, …, S} such that q*_j > 0,
L_j(0) ≥ p*, for all j ∈ {1, …, S} such that q*_j = 0,   (7.12)
Σ_{j∈{1,2,…,S}} q*_j = d,

is a Nash equilibrium.

2. Suppose furthermore that Assumption 7.2 holds; then the equilibrium exists and is unique.

We omit the proof of this proposition, which is in the same vein as the proof of the following Proposition 7.2. It can be found in Bossy et al. (2004).

3.2 Discontinuous nondecreasing minimal price

We now address the problem where the minimal price functions L_j are not necessarily continuous and not necessarily strictly increasing. Nevertheless, we assume that they are nondecreasing. We set the following assumption:

Assumption 7.4 We suppose that the minimal price functions L_j are nondecreasing, piecewise continuous, and that lim_{y→x⁻} L_j(y) = L_j(x) for any x ≥ 0.

Replacing Assumption 7.3 by Assumption 7.4, there may not be any strategy profile or, at the other extreme, there may be infinitely many strategy profiles that satisfy equations (7.12); Proposition 7.1 then fails to characterize the Nash equilibria. For any p ≥ 0, we define ρ_j(p), the maximal quantity supplier S_j can offer at price p, i.e.,

ρ_j(p) = { max{q ≥ 0, L_j(q) ≤ p}, if j is such that L_j(0) ≤ p;  0, otherwise. }   (7.13)

Hence ρ_j(p) is determined only by the structure of the minimal price function L_j; in particular it does not depend on any choice of the suppliers. As a consequence of Assumption 7.4, ρ_j(p) increases with p, and for any p ≥ 0, lim_{y→p⁺} ρ_j(y) = ρ_j(p). Denote by O(·) the function from R₊ to R₊ defined by
O(p) = Σ_{j=1}^S ρ_j(p).   (7.14)

O(p) is the maximal total offer that can be achieved at price p by the suppliers respecting the price constraints. The function O is possibly discontinuous, nondecreasing (but not necessarily strictly increasing) and satisfies lim_{y→p⁺} O(y) = O(p). Assumption 7.2 implies that

O(sup_j L_j(Q_j)) ≥ Σ_{j=1}^S ρ_j(L_j(Q_j)) ≥ Σ_{j=1}^S Q_j ≥ d,

hence there exists a unique p* ≤ sup_j L_j(Q_j) < +∞ such that

O(p*) = Σ_{j=1}^S ρ_j(p*) ≥ d,  and O(p* − ε) < d for all ε > 0.   (7.15)

The price p* represents the minimal price at which the demand can be fully satisfied, taking into account the minimal price constraint.

Assumption 7.5 For p* defined by (7.15), one of the following two conditions holds:
1. There exists a unique j̄ ∈ {1, …, S} such that L_j̄⁻¹(p*) ≠ ∅, where L_j⁻¹(p) := {q ∈ [0, d], L_j(q) = p}. In particular, there exists a unique j̄ ∈ {1, …, S} such that L_j̄(ρ_j̄(p*)) = p*, and for j ≠ j̄ we have L_j(ρ_j(p*)) < p*.

2. At price p* the maximal total quantity the suppliers are allowed to propose is exactly d, i.e., Σ_{j=1}^S ρ_j(p*) = d.

Proposition 7.2 Suppose Assumptions 7.4 and 7.5 hold. Consider the strategy profile u* = (u*₁, …, u*_S), u*_j = (q*_j, p*_j), such that:

p* is defined by equation (7.15);

for j ≠ j̄, i.e., such that L_j(ρ_j(p*)) < p* (see Assumption 7.5), we have q*_j = ρ_j(p*) and p*_j ∈ [L_j(q*_j), p*[ ;

for j = j̄, i.e., such that L_j̄(ρ_j̄(p*)) = p* (see Assumption 7.5), we have q*_j̄ ∈ [min(d − Σ_{k≠j̄} q*_k, ρ_j̄(p*)), ρ_j̄(p*)] and p*_j̄ ∈ [p*, p̄[, where p̄ is defined by

p̄ := min{L_k(q*_k +), k ≠ j̄};   (7.16)

then u* is a Nash equilibrium.

Remark 7.3 There exists an infinite number of strategy profiles that satisfy the conditions of Proposition 7.2 (the prices p*_j are defined as elements of some intervals). Nevertheless, we can observe that there is no need for any coordination among the suppliers to get a Nash equilibrium: each supplier can choose a strategy as described in Proposition 7.2 independently, and the resulting strategy profile is a Nash equilibrium. Note that this property does not hold in general for nonzero-sum games (see the classical "battle of the sexes" game, Luce and Raiffa (1957)). We can also observe that for each supplier the outcome is the same whatever the Nash equilibrium in this set; in that sense we can say that all these Nash equilibria are equivalent.

A reasonable manner to select a particular Nash equilibrium is to suppose that the suppliers strive for the maximization of their profits as an auxiliary criterion. More precisely, among the equilibria with market share maximization as criterion, they choose the equilibrium that brings them the maximal income. Because the equilibria we have found are independent, it is possible for each supplier to choose its preferred equilibrium. More precisely, with this auxiliary criterion, the equilibrium selected will be
q*_j = ρ_j(p*), p*_j = p* − ε, for j ≠ j̄ (i.e., such that L_j(ρ_j(p*)) < p*),
q*_j̄ = ρ_j̄(p*), p*_j̄ = p̄ − ε,

where ε can be defined as the smallest monetary unit.

Remark 7.4 Assumption 7.5 is necessary for the market problem (7.9) to have a unique solution for the strategies described in Proposition 7.2, which are consequently well defined. If Assumption 7.5 does not hold, we would need to make the additional decision rule of the market explicit (see Remark 7.1). This is shown in the following example (Figure 7.1), with S = 2: the Nash equilibrium may depend upon the additional decision rule of the market. In Figure 7.1 we have L₁(ρ₁(p*)) = L₂(ρ₂(p*)) = p* and ρ₁(p*) + ρ₂(p*) > d, where p* is the price defined by (7.15). This means that Assumption 7.5 does not hold.

Figure 7.1. Example (minimal price curves of two suppliers; the marked values are p*, p̃₂ and the quantities q̃₁, q̂₁, d, d − q̂₂, d − q̃₂).

Suppose the additional decision rule of the market is to give preference to supplier S₁, i.e., for a pair of strategies ((q₁, p), (q₂, p)) such that q₁ + q₂ > d, the market reacts by buying the quantities q₁ and d − q₁ from supplier S₁ and supplier S₂, respectively. The Nash equilibria for market share maximization are

u*₁ = (q₁ ∈ [d − q̃₂, q̂₁], p*),  u*₂ = (q̃₂, p*₂ ∈ [p̃₂, p*[),

where q̃_i = ρ_i(p*), q̂_i = ρ_i(p* − ε), and p̃₂ = L₂(q̃₂).

Suppose now that the additional decision rule of the market is a preference for supplier S₂. The previous pair of strategies is no longer a Nash equilibrium: indeed, supplier S₂ can increase its offer, at price p*, to the quantity q̂₂. The equilibrium in that case is

u*₁ = (q₁ ∈ [d − q̂₂, q̂₁], p*),  u*₂ = (q̂₂, p*).

Remark 7.5 In Proposition 7.2 we see that at equilibrium the maximal price p̄ that can be proposed is given by (7.16). A sufficient condition for that price to be finite is that for any j ∈ {1, 2, …, S} we have

Σ_{k≠j} Q_k > d.   (7.17)

Equation (7.17) means that, even with the withdrawal of an individual supplier, the demand can still be satisfied. This ensures that none of the suppliers can create a fictitious shortage and then increase the price of electricity without limit.

Proof of Proposition 7.2. We have to prove that for supplier S_j there is no profitable deviation of strategy, i.e., for any u_j ≠ u*_j we have J_{S_j}(u*_{−j}, u*_j) ≥ J_{S_j}(u*_{−j}, u_j).

Suppose first that j ≠ j̄, so that L_j(ρ_j(p*)) < p*. Since for the proposed Nash strategy u*_j = (q*_j, p*_j) we have p*_j < p*, the total quantity proposed by S_j is bought by the market (q̄_j = q*_j). Hence J_{S_j}(u*) = q*_j.
– If the deviation u_j = (q_j, p_j) is such that q_j ≤ q*_j, then clearly J_{S_j}(u*) = q*_j ≥ q_j ≥ J_{S_j}(u*_{−j}, u_j), whatever the price p_j is.

– If the deviation u_j = (q_j, p_j) is such that q_j > ρ_j(p*), then necessarily, by the minimal price constraint, Assumption 7.5 and the definition of q*_j = ρ_j(p*), we have

p_j ≥ L_j(q_j) ≥ L_j(q*_j +) > sup_{k≠j} p*_k.

Hence supplier S_j is now the supplier with the highest price. Consequently the market first buys from the other suppliers and satisfies the demand, when necessary, with the electricity produced by supplier S_j (instead of supplier S_j̄). Hence the market share of S_j cannot increase with this deviation.

Suppose now that j = j̄, i.e., we have L_j̄(ρ_j̄(p*)) = p*.

– If the first item of Assumption 7.5 holds, then at the proposed Nash equilibrium supplier S_j̄ is the supplier that meets the demand, since it proposes the highest price. Hence if supplier S_j̄ wants to increase its market share, it has to sell a quantity q̃_j̄ ≥ d − Σ_{k≠j̄} q*_k. But we have

L_j̄(q̃_j̄) ≥ L_j̄(d − Σ_{k≠j̄} q*_k) = p* > max_{k≠j̄} p*_k.

This proves that the quantity q̃_j̄ cannot be offered at a price such that the market would buy it.

– If the second item of Assumption 7.5 holds, then the proposition states that the quantity proposed, and bought by the market, is ρ_j̄(p*). An increase in the quantity proposed would imply a higher price, which would not imply a higher quantity bought by the market, since the supplier would then have the highest price. □

Now we suppose that Assumption 7.5 does not hold, so for the price p* defined by (7.15) there is more than one supplier S_j such that L_j(ρ_j(p*)) = p*. As shown in the example of Remark 7.4 (see Figure 7.1), the Nash equilibria may depend upon the reaction of the market when two suppliers S_i and S_j have the same price p_i = p_j = p*. It is clear that for a supplier S_j such that L_j(ρ_j(p*)) = p*, two possibilities may occur at equilibrium: either, when supplier S_j fixes its price to p_j = p*, the market reacts in such a way that q̄_j < ρ_j(p* − ε), in which case at equilibrium we will have p*_j = p* − ε; or the market reacts such that q̄_j ≥ ρ_j(p* − ε), and in that case we will have p*_j = p*.

Although the existence of Nash equilibria seems clear for any possible reaction of the market, we restrict our attention to the case where the market reacts by choosing quantities (q̄_j)_{j=1,…,S} that are monotonically nondecreasing with respect to the quantity q_j proposed by each supplier S_j. More precisely, we have the following assumption:

Assumption 7.6 Let u = (u₁, …, u_S) be a strategy profile of the suppliers with u_i = (q_i, p) for i ∈ {1, …, k}. Suppose the market has to use its additional rule to decide how to share the quantity d̃ ≤ d among suppliers S₁ to S_k (the quantity d − d̃ has already been bought from suppliers with a price lower than p). For i ∈ {1, …, k}, consider the function that associates to any q_i ≥ 0 the quantity q̄_i, the ith component of the reaction (q̄₁, …, q̄_S) of the market to the strategy profile (u_{−i}, (q_i, p)). We suppose that this function is nondecreasing with respect to the quantity q_i.

The meaning of this assumption is that the market does not penalize an "over-offer" of a supplier. For fixed strategies u_{−i} of all the suppliers but S_i, if supplier S_i, such that p_i = p, increases its quantity q_i, then the quantity bought by the market from S_i cannot decrease: it can increase or stay constant. In particular, this encompasses the case where the market has a preference order between the suppliers (for example, it first buys from supplier S_{j₁}, then from supplier S_{j₂}, etc.), or where the market buys some fixed proportion from each supplier. It does not encompass the case where the market prefers the smallest offer.

Proposition 7.3 Suppose Assumption 7.5 does not hold while Assumption 7.6 does. Let the strategy profile ((q*₁, p*₁), …, (q*_S, p*_S)) be defined by
– If S_j is such that L_j(ρ_j(p*)) < p*, then

p*_j = p* − ε,  q*_j = ρ_j(p* − ε).

– If S_j is such that L_j(ρ_j(p*)) = p*, then either

p*_j = p*,  q*_j = ρ_j(p*),   (7.18)

when the reaction of the market is such that q̄_j ≥ ρ_j(p* − ε), or

p*_j = p* − ε,  q*_j = ρ_j(p* − ε),   (7.19)

when the new reaction of the market, for a deviation p_j = p*, would be such that q̄_j < ρ_j(p* − ε) for all ε > 0.

This strategy profile is a Nash equilibrium.

Proof. The proof follows directly from the discussion preceding the proposition and from the proof of Proposition 7.2. □

Example. We consider a market with 5 suppliers and a demand d equal to 10. We suppose that the minimal price functions L_j of the suppliers are increasing staircase functions, given in the following table (the notation (]a, b]; c) indicates that the value of the function on the interval ]a, b] is c):
supplier 1:  ([0, 1]; 10), (]1, 3]; 15), (]3, 4]; 25), (]4, 10]; 50)
supplier 2:  ([0, 5]; 20), (]5, 6]; 23), (]6, 7]; 40), (]7, 10]; 70)
supplier 3:  ([0, 2]; 15), (]2, 6]; 25), (]6, 7]; 30), (]7, 10]; 50)
supplier 4:  ([0, 1]; 10), (]1, 4]; 15), (]4, 5]; 20), (]5, 10]; 50)
supplier 5:  ([0, 4]; 30), (]4, 8]; 90), (]8, 10]; 100)

We display in the following table the values of ρ_j(p) and O(p), defined by equations (7.13) and (7.14) respectively.

p               ρ₁(p)  ρ₂(p)  ρ₃(p)  ρ₄(p)  ρ₅(p)  O(p)
p ∈ [0, 10[       0      0      0      0      0      0
p ∈ [10, 15[      1      0      0      1      0      2
p ∈ [15, 20[      3      0      2      4      0      9
p ∈ [20, 23[      3      5      2      5      0     15

The table shows that for a price p in [15, 20[, only suppliers S₁, S₃ and S₄ can bring a positive quantity of electricity; the total maximal quantity that can be provided is 9, which is strictly lower than the demand d = 10. For a price in [20, 23[, supplier S₂ can also bring a positive quantity of electricity; the total maximal quantity is then 15, which is higher than the demand. We conclude that the price p* defined by equation (7.15) is p* = 20. Moreover, L₂(ρ₂(p*)) = L₄(ρ₄(p*)) = p*, which means that Assumption 7.5 is not satisfied.

Notice that for supplier S₅ we have L₅(0) = 30 > p*. Supplier S₅ will not be able to sell anything to the market; hence, whatever its bid is, we have q̄₅ = 0. We suppose that Assumption 7.6 holds. According to Proposition 7.3, we have the following equilibria:

u*₁ = (3, p*₁ ∈ [15, 20[),  u*₃ = (2, p*₃ ∈ [15, 20[),  and u*₅ = (q₅, p₅), p₅ ≥ L₅(q₅),

to which the market reacts by buying the quantities q̄₁(u*) = 3, q̄₃(u*) = 2 and q̄₅(u*) = 0. The quantity 5 remains to be shared between S₂ and S₄ according to the additional rule of the market. For example, suppose that the market prefers S₂ to all other suppliers. Then
u2∗ = (q2∗ ∈ [1, 5], p2∗ = 20)  and  u4∗ = (4, p4∗ ∈ [15, 20[),

to which the market reacts by buying q̄2(u∗) = 1 and q̄4(u∗) = 4. If now the market prefers S4 to any other, then
u2∗ = (q2∗ ∈ [1, 5], p2∗ = 20)  and  u4∗ = (5, p4∗ = 20),

to which the market reacts by buying q̄2(u∗) = 0 and q̄4(u∗) = 5.

7 Electricity Prices in a Game Theory Context 149

4. Suppliers maximize profit

In this section, the objective of the suppliers is to maximize their profit, i.e., for a strategy profile u = (u1, . . . , uS), uj = (qj, pj), their evaluation functions are

JSj(u) = pj q̄j − Cj(q̄j),    (7.20)

where Cj(·) denotes supplier Sj's production cost function, and q̄j is the optimal reaction of the market, i.e., the solution of Problem (7.9) together with an additional decision rule, known by all the suppliers, in case of non-unique solutions (see Remark 7.1). As before, we do not need to make this rule explicit. In contrast to the market share maximization case, we do not need the minimal price functions Lj. Nevertheless, we need a maximal unit price pmax under which the suppliers are allowed to sell their electricity. This maximal price can either be finite and fixed by the market, or be infinite. Of all the assumptions previously made, we only retain in this section Assumption 7.1.

Lemma 7.1 We define, for any finite price p ≥ 0, Qj(p) as the set of quantities that maximize the quantity qp − Cj(q), i.e.,
Qj(p) = arg max_{q∈[0,d]} (qp − Cj(q)),

and, for an infinite price, Qj(+∞) =def min(Q̄j, d), where Q̄j is the maximal production capacity of Sj. We have, for finite p,

Qj(p) = {0} if C′j(0) > p,
Qj(p) = {d} if C′j(d) < p,
Qj(p) = {q, C′j(q−) ≤ p ≤ C′j(q+)} otherwise.    (7.21)

Proof. We prove the last equality of (7.21). For any q ∈ Qj(p) we have, for any ε > 0,

pq − Cj(q) ≥ p(q + ε) − Cj(q + ε),

from which we deduce that

Cj(q + ε) − Cj(q) ≥ pε,

and, dividing by ε and letting ε tend to zero, it follows that C′j(q+) ≥ p. The other inequality is obtained with negative ε. The first two equalities of (7.21) follow directly from the fact that C′j is supposed to be nondecreasing. □

Note that if C′j(·) is a continuous and nondecreasing function, then (7.21) is equivalent to the classical first order condition for the evaluation function of the supplier.

Lemma 7.2 The function p → max_{q∈[0,d]} (qp − Cj(q)) is continuous and strictly increasing.

Proof. We recognize the Legendre-Fenchel transform of the convex function Cj. The continuity follows from classical properties of this transform. The function is strictly increasing since, for p > p′, if we denote by q̃ a quantity in arg max_{q∈[0,d]} (qp′ − Cj(q)), we have

max_{q∈[0,d]} (qp − Cj(q)) ≥ q̃p − Cj(q̃) > q̃p′ − Cj(q̃) = max_{q∈[0,d]} (qp′ − Cj(q)). □

We now restrict our attention to the two suppliers' case, i.e., S = 2. Our aim is to determine the Nash equilibrium, if such an equilibrium exists. Hence we need to find a pair ((q1∗, p1∗), (q2∗, p2∗)) such that (q1∗, p1∗) is the best strategy of supplier S1 if supplier S2 chooses (q2∗, p2∗) and, conversely, (q2∗, p2∗) is the best strategy of supplier S2 if supplier S1 chooses (q1∗, p1∗). Equivalently, we need to find a pair ((q1∗, p1∗), (q2∗, p2∗)) such that there is no profitable deviation for any supplier Si, i = 1, 2.

Let us determine the conditions which a pair ((q1∗, p1∗), (q2∗, p2∗)) must satisfy in order to be a Nash equilibrium, i.e., such that no profitable deviation exists for any supplier. We will successively examine the case where we have an excess demand (q1∗ + q2∗ ≤ d) and the case where we have an excess supply (q1∗ + q2∗ > d).
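Before proceeding to the case analysis, note that the characterization (7.21) and the monotonicity of Lemma 7.2 can be checked numerically. The sketch below is illustrative and not part of the chapter; the staircase marginal cost used is an assumed example.

```python
# Illustrative sketch: check Lemma 7.1's characterization (7.21) and the
# monotonicity of Lemma 7.2 for an assumed convex cost whose marginal cost
# is the staircase 10 on [0,1], 15 on ]1,3], 25 on ]3,4], 50 on ]4,10].
d = 10.0
PIECES = [(0.0, 1.0, 10.0), (1.0, 3.0, 15.0), (3.0, 4.0, 25.0), (4.0, 10.0, 50.0)]

def cost(q):
    """Convex cost C(q): integral of the staircase marginal cost up to q."""
    return sum(m * max(0.0, min(q, b) - a) for a, b, m in PIECES)

GRID = [i * 0.01 for i in range(1001)]   # discretization of [0, d]

def argmax_set(p):
    """Brute-force Q(p) = arg max over q of q*p - C(q) on the grid (Lemma 7.1)."""
    best = max(q * p - cost(q) for q in GRID)
    return [round(q, 6) for q in GRID if abs(q * p - cost(q) - best) < 1e-9]

# (7.21): at p = 20 the maximizer is q = 3, where C'(3-) = 15 <= 20 <= 25 = C'(3+);
# at p = 5 < C'(0) = 10 the maximizer is q = 0.
print(argmax_set(20.0))   # -> [3.0]
print(argmax_set(5.0))    # -> [0.0]

# Lemma 7.2: p -> max over q of (q*p - C(q)) is (weakly) increasing on the grid.
vals = [max(q * p - cost(q) for q in GRID) for p in range(0, 60)]
assert all(u <= v for u, v in zip(vals, vals[1:]))
```

The brute-force grid search only approximates the exact arg max set, but for a staircase marginal cost the maximizers sit at breakpoints, so the check is reliable here.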
Excess demand: q1∗ + q2∗ ≤ d. In that case the market buys all the quantities proposed by the suppliers, i.e., q̄i = qi∗, i = 1, 2.

1. Suppose that for at least one supplier, say supplier S1, we have p1∗ < pmax. Then supplier S1 can increase its profit by increasing its price to pmax. Since q1∗ + q2∗ ≤ d, the reaction of the market to the new pair of strategies ((q1∗, pmax), (q2∗, p2∗)) is still q1∗, q2∗. Hence the new profit of S1 is now q1∗pmax − C1(q1∗) > q1∗p1∗ − C1(q1∗). We have exhibited a profitable deviation, (q1∗, pmax), for supplier S1. This proves that a pair of strategies such that q1∗ + q2∗ ≤ d with at least one price pi∗ < pmax cannot be a Nash equilibrium.

2. Suppose that p1∗ = p2∗ = pmax, and that there exists at least one supplier, say supplier S1, such that q̄1 = q1∗ ∉ Q1(pmax), i.e., such that the reaction of the market does not maximize S1's profit (see Lemma 7.1). Consequently, the profit for S1 associated with the pair ((q1∗, pmax), (q2∗, pmax)) is such that

q1∗ pmax − C1(q1∗) < max_{q∈[0,d]} (q pmax − C1(q)).

Since

lim_{ε→0+} max_{q∈[0,d]} (q(pmax − ε) − C1(q)) = max_{q∈[0,d]} (q pmax − C1(q)),

there exists some ε̄ > 0 such that

max_{q∈[0,d]} (q(pmax − ε̄) − C1(q)) > q1∗ pmax − C1(q1∗).

This proves that any deviation (q̂1, pmax − ε̄) of supplier S1, such that q̂1 ∈ Q1(pmax − ε̄), is profitable for S1.
Hence, a pair of strategies such that q1∗ + q2∗ ≤ d, p1∗ = p2∗ = pmax, to which the market reacts with, for at least one supplier, a quantity q̄i∗ ∉ Qi(pmax), cannot be a Nash equilibrium.

3. Suppose that p1∗ = p2∗ = pmax, q̄1 = q1∗ ∈ Q1(pmax) and q̄2 = q2∗ ∈ Q2(pmax) (i.e., the market reacts optimally for both suppliers). In that case the pair ((q1∗, p1∗), (q2∗, p2∗)) is a Nash equilibrium. As a matter of fact, no deviation by changing the quantity can be profitable; since q̄i∗ is optimal for pmax, the price cannot be increased, and a decrease of the profit will follow from a decrease of the price of one supplier (Lemma 7.2).

Excess supply: q1∗ + q2∗ > d. Two possibilities occur, depending on whether the prices pj, j = 1, 2, differ or not.

1. The prices are different, i.e., p1∗ < p2∗ for example. In that case the market first buys from the supplier with the lower price (hence q̄1 = inf(q1∗, d)), and then completes its demand with the supplier with the higher price, S2 (hence q̄2 = d − q̄1). For ε̄ > 0 such that p1∗ + ε̄ < p2∗, we have

q1∗ p1∗ − C1(q1∗) ≤ max_{q∈[0,d]} {q p1∗ − C1(q)} < max_{q∈[0,d]} {q(p1∗ + ε̄) − C1(q)}.

Hence supplier S1 is better off increasing its price to p1∗ + ε̄ and proposing a quantity q̂1 ∈ Q1(p1∗ + ε̄). As a matter of fact, since p1∗ + ε̄ < p2∗, the reaction of the market will be q̄1 = q̂1. So a pair of strategies with p1∗ ≠ p2∗ cannot be a Nash equilibrium.

2. The prices are equal, i.e., p1∗ = p2∗ = p∗. Now the market faces an optimization problem (7.9) with several solutions. Hence it has to use its additional rule in order to determine the quantities q̄1∗, q̄2∗ to buy from each supplier in response to their offers q1∗, q2∗.

If this response is not optimal for one of the suppliers, i.e., q̄1∗ ∉ Q1(p∗) or q̄2∗ ∉ Q2(p∗), the same line of reasoning as in Item 2 of the excess demand case proves that the pair ((q1∗, p∗), (q2∗, p∗)) cannot be a Nash equilibrium.

Suppose now that the reaction of the market is optimal for both suppliers, i.e., q̄1∗ ∈ Q1(p∗) and q̄2∗ ∈ Q2(p∗). A necessary condition for a supplier, say S1, to increase its profit is to increase its price and consequently to complete the offer of the other supplier S2. We have two possibilities.

(a) If for at least one supplier, say supplier S1, we have
(d − q2∗) pmax − C1(d − q2∗) > max_{q∈[0,d]} {q p∗ − C1(q)},

then supplier S1 is better off increasing its price to pmax and completing the market to sell the quantity d − q2∗.

(b) Conversely, if none of the suppliers can increase its profit by "completing the offer of the other", i.e., if

(d − q2∗) pmax − C1(d − q2∗) ≤ max_{q∈[0,d]} {q p∗ − C1(q)},    (7.22)

(d − q1∗) pmax − C2(d − q1∗) ≤ max_{q∈[0,d]} {q p∗ − C2(q)},    (7.23)

then the pair ((q1∗, p∗), (q2∗, p∗)) is a Nash equilibrium. As a matter of fact, for one supplier, say S1, changing only the quantity is not profitable since q̄1 ∈ Q1(p∗), and decreasing the price is not profitable because of Lemma 7.2. Inequality (7.22) prevents S1 from increasing its price.

Remark 7.6 Note that a sufficient condition for Inequalities (7.22) and (7.23) to hold is that both suppliers choose qj∗ = d, j = 1, 2. With this choice, each supplier prevents the other supplier from completing its demand at the maximal price. This can be interpreted as a wish for the suppliers to obtain a Nash equilibrium. Nevertheless, to do that, the suppliers have to propose to the market a quantity d at price p∗, which may be very risky and hence may not be credible. As a matter of fact, suppose S1 chooses the strategy q1 = d, p1 = p∗ < pmax. If for some reason supplier S2 proposes a quantity q2 at a price p2 > p1, then S1 has to provide the market with the quantity d at price p1, since q̄1 = d, which may be disastrous for S1. Note also that if pmax is not very high compared with p∗, Inequalities (7.22) and (7.23) will not be satisfied. Hence these inequalities could be used by the market to choose a maximal price pmax such that the equilibrium may be possible.
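To illustrate, the no-deviation conditions (7.22) and (7.23) can be checked numerically. The quadratic costs and the parameter values below are assumptions made for this sketch; they are not taken from the chapter.

```python
# Sketch: check the excess-supply no-deviation conditions (7.22)-(7.23)
# for assumed quadratic costs C_j(q) = c_j * q**2 / 2 (illustrative values).
d, p_max = 10.0, 100.0
c = {1: 4.0, 2: 5.0}          # hypothetical cost coefficients

def C(j, q):
    return c[j] * q * q / 2.0

def best_margin(j, p):
    """max over q in [0, d] of q*p - C_j(q); attained at q = min(p/c_j, d)."""
    q = min(p / c[j], d)
    return q * p - C(j, q)

def no_deviation(p_star, q1_star, q2_star):
    """True if neither supplier gains by raising its price to p_max and
    serving the residual demand, i.e., conditions (7.22) and (7.23) hold."""
    lhs1 = (d - q2_star) * p_max - C(1, d - q2_star)
    lhs2 = (d - q1_star) * p_max - C(2, d - q1_star)
    return lhs1 <= best_margin(1, p_star) and lhs2 <= best_margin(2, p_star)

# With q_j* = d, both residual demands are 0, so (7.22)-(7.23) hold
# trivially (compare Remark 7.6); with small offers they typically fail.
print(no_deviation(p_star=22.0, q1_star=d, q2_star=d))        # -> True
print(no_deviation(p_star=22.0, q1_star=3.0, q2_star=3.0))    # -> False
```

The second call fails precisely because the residual demand d − qj∗ left by a small offer makes "completing the offer of the other" at pmax profitable, which is the deviation that (7.22)-(7.23) rule out.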
The previous discussion shows that, in case of excess supply, the only possibility to have a Nash equilibrium is that both suppliers propose the same price p∗ and quantities q1∗, q2∗ such that, together with the additional rule, the market can choose optimal quantities q̄1∗, q̄2∗ that satisfy its demand and such that q̄j∗ ∈ Qj(p∗).

This is clearly not possible for every price p∗. If the price p∗ is too small, then the optimal quantities the suppliers can bring to the market are small, and for any q ∈ Q1(p∗) + Q2(p∗) we have q < d. If the price p∗ is too high, then the optimal quantities the suppliers are willing to bring to the market are large, and for any q ∈ Q1(p∗) + Q2(p∗) we have q > d. The following lemma characterizes the possible values of p∗ for which it is possible to find q1 and q2 that satisfy q1 ∈ Q1(p∗), q2 ∈ Q2(p∗) and q1 + q2 = d.

Let us first define the interval-valued function C̃j from [0, d] to the set of intervals of R+ as

C̃j(q) = [C′j(q−), C′j(q+)],

for q smaller than the maximal production quantity Q̄j, and C̃j(q) = ∅ for q > Q̄j. Clearly C̃j(q) = {C′j(q)} except when C′j has a discontinuity at q. We can now state the lemma.

Lemma 7.3 It is possible to find q1, q2 such that q1 + q2 = d, q1 ∈ Q1(p) and q2 ∈ Q2(p) if and only if

p ∈ I,

where

I =def ∪_{q∈[0,d]} (C̃1(q) ∩ C̃2(d − q)),

or, equivalently, when Q̄1 + Q̄2 ≥ d, I = [I−, I+], where

I− =def min{p, max(q ∈ Q1(p)) + max(q ∈ Q2(p)) ≥ d},
I+ =def max{p, min(q ∈ Q1(p)) + min(q ∈ Q2(p)) ≤ d},

and I = ∅ when Q̄1 + Q̄2 < d.

Proof. If p ∈ I, then there exists q ∈ [0, d] such that p ∈ C̃1(q) and p ∈ C̃2(d − q). We take q1 = q, q2 = d − q and conclude by applying Lemma 7.1. Conversely, if p ∉ I, then it is not possible to find q1, q2 such that q1 + q2 = d and such that p ∈ C̃1(q1) ∩ C̃2(q2), i.e., such that, according to Lemma 7.1, q1 ∈ Q1(p) and q2 ∈ Q2(p). Straightforwardly, if I− ≤ p ≤ I+, then there exist q1 ∈ Q1(p) and q2 ∈ Q2(p) such that q1 + q2 = d. □

We sum up the previous analysis in the following proposition.
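As an aside, the bounds I− and I+ can be computed numerically once the sets Qj(p) are known. The sketch below is illustrative and not part of the chapter; the two staircase marginal costs are hypothetical, and Qj(p) is read off from the characterization (7.21).

```python
# Sketch: compute the interval I = [I-, I+] of Lemma 7.3 for two suppliers
# with hypothetical staircase marginal costs (right endpoint, level per piece).
d = 10.0
STEPS = {
    1: [(5, 20), (6, 23), (7, 40), (10, 70)],
    2: [(1, 10), (4, 15), (5, 20), (10, 50)],
}

def Q_interval(j, p):
    """[min Q_j(p), max Q_j(p)] via (7.21): q with C'(q-) <= p <= C'(q+)."""
    qmin = qmax = 0.0
    for b, m in STEPS[j]:
        if m < p:          # marginal cost below p: the whole piece is produced
            qmin = qmax = min(b, d)
        elif m == p:       # indifferent on this piece: any q up to b is optimal
            qmax = min(b, d)
    return qmin, qmax

def I_bounds(prices):
    """Scan candidate prices for I- and I+ as defined in Lemma 7.3."""
    lo = min(p for p in prices
             if Q_interval(1, p)[1] + Q_interval(2, p)[1] >= d)
    hi = max(p for p in prices
             if Q_interval(1, p)[0] + Q_interval(2, p)[0] <= d)
    return lo, hi

print(I_bounds(range(0, 100)))   # -> (20, 23)
```

With these data, I− = 20 is the lowest price at which the suppliers' maximal optimal offers cover the demand, and I+ = 23 the highest price at which their minimal optimal offers do not exceed it.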
Proposition 7.4 In a market with maximal price pmax and two suppliers, each having to propose a quantity and a price to the market, and each one wanting to maximize its profit, we have the following Nash equilibria:
1. If pmax < min{p ∈ I} (Excess demand case), any strategy profile ((q1∗, pmax), (q2∗, pmax)), with q1∗ ∈ Q1(pmax) and q2∗ ∈ Q2(pmax), is a Nash equilibrium. In that case we have q1∗ + q2∗ < d.
2. If pmax = min{p ∈ I}, any pair ((q1∗, pmax), (q2∗, pmax)) where q1∗ ∈ Q1(pmax) and q2∗ ∈ Q2(pmax) is a Nash equilibrium. In that case we may have q1∗ + q2∗ ≥ d or q1∗ + q2∗ < d.

3. If pmax > min{p ∈ I} (Excess supply case), any pair ((q1∗, p∗), (q2∗, p∗)), such that p∗ ∈ I, p∗ ≤ pmax, and which induces a reaction (q̄1, q̄2), q̄1 ≤ q1∗, q̄2 ≤ q2∗, such that

(a) q̄1 + q̄2 = d,
(b) q̄1 ∈ Q1(p∗), q̄2 ∈ Q2(p∗),
(c) (d − q2∗) pmax − C1(d − q2∗) ≤ q̄1 p∗ − C1(q̄1),
    (d − q1∗) pmax − C2(d − q1∗) ≤ q̄2 p∗ − C2(q̄2),

is a Nash equilibrium.

We now want to generalize the previous result to a market with S ≥ 2 suppliers. Let ((q1∗, p1∗), . . . , (qS∗, pS∗)) be a strategy profile, and let (q̄1, . . . , q̄S) be the induced reaction of the market. This strategy profile is a Nash equilibrium if, for any two suppliers Si, Sj, the pair of strategies ((qi∗, pi∗), (qj∗, pj∗)) is a Nash equilibrium for a market with two suppliers (with evaluation functions defined by Equation (7.20)) and demand d̃ = d − Σ_{k∉{i,j}} q̄k. Hence, using the above Proposition 7.4, we know that necessarily, at equilibrium, the prices proposed by the suppliers are equal, and the quantities qi∗ induce a reaction of the market such that q̄i ∈ Qi(p∗).

Let us first extend the previous definition of the set I by I = ∅ if Σ_{j=1}^S Q̄j < d, and otherwise I = [I−, I+], where

I− = min{p, Σ_{j=1}^S max(q ∈ Qj(p)) ≥ d},
I+ = max{p, Σ_{j=1}^S min(q ∈ Qj(p)) ≤ d}.    (7.24)

We have the following

Theorem 7.1 Suppose we have S suppliers on a market with maximal price pmax and demand d.
1. If pmax < min{p ∈ I} (Excess demand case), any strategy profile ((q1∗, pmax), . . . , (qS∗, pmax)), with qj∗ ∈ Qj(pmax), j = 1, . . . , S, is a Nash equilibrium. In that case we have Σ_{j=1}^S qj∗ < d.
2. If pmax = min{p ∈ I}, any strategy profile ((q1∗, pmax), . . . , (qS∗, pmax)) where qj∗ ∈ Qj(pmax), j = 1, . . . , S, is a Nash equilibrium. In that case we may have Σ_{j=1}^S qj∗ ≥ d or Σ_{j=1}^S qj∗ < d.

3. If pmax > min{p ∈ I} (Excess supply case), any strategy profile ((q1∗, p∗), . . . , (qS∗, p∗)), such that p∗ ∈ I, p∗ ≤ pmax, and which induces a reaction (q̄1, . . . , q̄S), q̄j ≤ qj∗, j = 1, . . . , S, such that

(a)
Σ_{j=1}^S q̄j = d,

(b) q̄j ∈ Qj(p∗), j = 1, . . . , S,

(c) for any j = 1, . . . , S,

(d − Σ_{k≠j} qk∗) pmax − Cj(d − Σ_{k≠j} qk∗) ≤ q̄j p∗ − Cj(q̄j),    (7.25)

is a Nash equilibrium.

The previous results show that a Nash equilibrium always exists in the case where profit is used by the suppliers as the evaluation function. For convenience we have supposed the existence of a maximal price pmax. On real markets we observe that this maximal price is usually infinite, since most markets do not impose a maximal price on electricity. Hence the interesting case is the one where pmax > min{p ∈ I}. The case pmax ≤ min{p ∈ I} can be interpreted as a monopolistic situation: the demand is so large compared with the maximal price that each supplier can behave as if it were alone on the market. When pmax is large enough, Proposition 7.4 and Theorem 7.1 exhibit conditions for a strategy profile to be a Nash equilibrium. We can make several remarks.
Remark 7.7 Note that conditions (7.25) are satisfied for qj∗ = d. Hence we can conclude that, provided the market reacts in such a way that q̄j ∈ Qj(p∗), the strategy profile ((d, p∗), . . . , (d, p∗)) is a Nash equilibrium. Nevertheless, this equilibrium is not realistic. As a matter of fact, to implement this equilibrium, the suppliers have to propose to the market a quantity that is higher than the optimal quantity, which possibly may lead to a negative profit (when pmax is very large). The second aspect that may appear unrealistic is the fact that the suppliers give up their power of decision. As a matter of fact, they announce high quantities, so that (7.25) is satisfied, and let the market choose the appropriate q̄j.

Example. We consider the market, with demand d = 10 and maximal price pmax = +∞, with the five suppliers already described on page 147. In order to be able to compare the equilibria for both criteria, market share and profit, we suppose that the marginal cost is equal to the minimal price function displayed in the table on page 148, i.e., C′j = Lj.
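The additional preference rule of the market, as used in this example, can be sketched as a greedy allocation. The code is illustrative and not part of the chapter; the interval bounds below restate the sets Qj(20) that the example derives for the five suppliers.

```python
# Sketch: given the optimal sets Q_j(20) of the five-supplier example,
# allocate the demand d = 10, preferring suppliers in a given order while
# keeping every final quantity inside its interval [lo_j, hi_j].
d = 10
Q20 = {1: (3, 3), 2: (0, 5), 3: (2, 2), 4: (4, 5), 5: (0, 0)}   # Q_j(20)

def allocate(order):
    lo = {j: a for j, (a, b) in Q20.items()}
    q = dict(lo)                    # start from the forced minima
    rest = d - sum(lo.values())     # demand left to share (here 10 - 9 = 1)
    for j in order:                 # hand the slack to preferred suppliers first
        a, b = Q20[j]
        extra = min(b - q[j], rest)
        q[j] += extra
        rest -= extra
    return q

print(allocate([1, 2, 3, 4, 5]))   # prefers S2: {1: 3, 2: 1, 3: 2, 4: 4, 5: 0}
print(allocate([5, 4, 3, 2, 1]))   # prefers S4: {1: 3, 2: 0, 3: 2, 4: 5, 5: 0}
```

Both orders reproduce the two market reactions discussed in the example: only the single unit of slack between the forced minima (summing to 9) and the demand d = 10 moves between S2 and S4.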
The following table displays the quantities o(p) =def Σ_{j=1}^5 min{q ∈ Qj(p)} and O(p) =def Σ_{j=1}^5 max{q ∈ Qj(p)}.

p              o(p)   O(p)
p ∈ [0, 10[      0      0
p = 10           0      2
p ∈ ]10, 15[     2      2
p = 15           2      9
p ∈ ]15, 20[     9      9
p = 20           9     15
p ∈ ]20, 23[    15     15

From the above table we deduce that I = {20}; hence the only possible equilibrium price is p∗ = 20. As a matter of fact, we have O(20) = 15 > 10 = d and, for any p < 20, O(p) < 10 = d; and o(20) = 9 < 10 = d and, for any p > 20, o(p) > 10 = d. Hence p∗ = 20 ∈ I as defined by Equation (7.24).

Now, concerning the quantities, the equilibrium depends upon the additional rule of the market. We suppose that the market chooses q̄i ∈ Qi(p∗), ∀i ∈ {1, . . . , 5}, and then gives preference to S1, then to S2, etc. The equilibrium is q1∗ ≥ 3, q2∗ ≥ 0, q3∗ ≥ 2, q4∗ ≥ 4, q5∗ ≥ 0. The fact that the market wants to choose quantities q̄i ∈ Qi(20) implies that

q̄1 ∈ Q1(20) = {3}, q̄2 ∈ Q2(20) = [0, 5], q̄3 ∈ Q3(20) = {2}, q̄4 ∈ Q4(20) = [4, 5], q̄5 = 0,

and the preference for S2 compared to S4 implies that q̄1 = 3, q̄2 = 1, q̄3 = 2, q̄4 = 4, q̄5 = 0. If the preference had been S5, then S4, then S3, etc., the equilibrium would have been the same, but we would have had q̄1 = 3, q̄2 = 0, q̄3 = 2, q̄4 = 5, q̄5 = 0.

5. Conclusion

We have shown in the previous sections that for both criteria, market share and profit maximization, it is possible to find a Nash equilibrium for any number S of suppliers. It is noticeable that in both cases the equilibrium price involved is the same (i.e., p∗ given by Equation (7.15) for market share maximization and p∗ ∈ I defined by Equation (7.24) for profit maximization); only the quantities proposed by the suppliers differ. Nevertheless, as already discussed in Remark 7.7, for profit maximization the equilibrium strategies involved are not realistic in the interesting cases (pmax large). This may suggest that on these unregulated markets, where suppliers are interested in instantaneous profit maximization, an equilibrium never occurs.
Prices may become arbitrarily high, and anticipation of the market behavior, and particularly of the market price, becomes basically impossible. An extended discussion on this topic can be found in Bossy et al. (2004).

This paper contains some modeling aspects that could be considered in more detail in future work. A first extension would be to consider more general suppliers. As a matter of fact, in the current paper, the evaluation functions chosen are more suitable for producers. Indeed, for profit maximization, we assumed that we had a production cost only for the part of the electricity which is actually sold. This fits the case where suppliers are producers: they produce only the electricity sold. The evaluation function chosen does not fit the case of traders, who may have already bought some electricity and try to sell at the best price the maximal quantity of electricity they have. The extension of our results to that case should not be difficult.

We supposed that every supplier perfectly knows the evaluation functions of the other suppliers, and in particular their marginal costs. In general this is not true. Hence some imperfect information version of the model should probably be investigated.

Another worthwhile extension would be to consider the multi-market case, since the suppliers have the possibility to sell their electricity on several markets. This aspect has been briefly discussed in Bossy et al. (2004). It leads to a much more complex model, which in particular involves a two-level game where at both levels the agents strive to reach a Nash equilibrium.

Acknowledgments. This work was partially performed while Geert Jan Olsder was on leave at CNRS/i3s/UNSA, Sophia Antipolis, France, during the academic year 02/03.

References
Başar, T. and Olsder, G.J. (1999). Dynamic Noncooperative Game Theory. SIAM, Philadelphia.
Bossy, M., Maïzi, N., Olsder, G.J., Pourtallier, O., and Tanré, E. (2004). Using game theory for the electricity market. Research report INRIA RR-5274.
Hämäläinen, R.P. (1996). Agent-based modeling of the electricity distribution system. Proceedings of the 15th IASTED International Conference, pages 344–346, Innsbruck, Austria.
Kemfert, C. and Tol, R. (2000). The Liberalisation of the German Electricity Market: Modeling an Oligopolistic Structure by a Computational Game Theoretic Modeling Tool. Working paper, Department of Economics I, Oldenburg University, Oldenburg, Germany.
Luce, D. and Raiffa, H. (1957). Games and Decisions. Wiley, New York.
Madlener, R. and Kaufmann, M. (2002). Power Exchange Spot Market Trading in Europe: Theoretical Considerations and Empirical Evidence. Contract no. ENK5-CT-2000-00094 of the 5th Framework Programme of the European Community.
Madrigal, M. and Quintana, V.H. (2001). Existence and determination of competitive equilibrium in unit commitment power pool auctions: Price setting and scheduling alternatives. IEEE Transactions on Power Systems, 16:380–388.
Murto, P. (2003). On Investment, Uncertainty and Strategic Interaction with Applications to Energy Markets. Ph.D. Thesis, Helsinki University of Technology.
Office of Electricity Regulation (OFFER, now OFGEM). (1998). Pool Price, A consultation by OFFER. Technical Report, Birmingham, U.K.
Ruusunen, J. (1994). Barter contracts in intertemporal energy exchange. European Journal of Operational Research, 75:589–599.
Stoft, S. (2002). Power System Economics: Designing Markets for Electricity. IEEE Press, John Wiley & Sons.
Supatgiat, C., Zhang, R.Q., and Birge, J.R. (2001). Equilibrium values in a competitive power exchange market. Computational Economics, 17:93–121.

Chapter 8

EFFICIENCY OF BERTRAND AND COURNOT: A TWO STAGE GAME
Michèle Breton
Abdalla Turki
Abstract We consider a differentiated duopoly where firms invest in research and development (R&D) to reduce their production costs. The objective of this study is to derive and compare Bertrand and Cournot equilibria, and then examine the robustness of the literature's results, especially those of Qiu (1997). The main results of this study are as follows: (a) Bertrand competition is more efficient if R&D productivity is low, industry spillovers are weak, or products are very different. (b) Cournot competition is more efficient if R&D productivity is high and R&D spillovers and the products' degree of substitutability are not very small. (c) Cournot competition may lead to higher outputs, higher consumer surpluses and lower prices, provided that R&D productivity is very high and the spillovers and degree of substitutability of the firms' products are moderate to high. (d) Cournot competition results in higher R&D investments than Bertrand competition. These results show that the relative efficiencies of Bertrand and Cournot equilibria are sensitive to the suggested specifications, and hence far from robust.

1. Introduction

This paper compares the relative efficiency of Cournot and Bertrand equilibria in a differentiated duopoly. In a Cournot game, players compete by choosing their outputs, while in a Bertrand game the strategies are the prices of these products. In a seminal paper, Singh and Vives (1984) show that (i) Bertrand competition is always more efficient than Cournot competition, (ii) Bertrand prices (quantities) are lower (higher) than Cournot prices (quantities) if the goods are substitutes (complements), and (iii) it is a dominant strategy for a firm to choose quantity (price) as its strategic variable provided that the goods are substitutes (complements). These findings attracted the economists' attention and gave rise to two main streams in the literature. The first one extends the above model in different ways.
However, it treats the firms' costs of production as constants and assumes that firms face the same demand and cost structure in both types of competition (Vives (1985), Okuguchi (1987), Dastidar (1997), Häckner (2000), Amir and Jin (2001), and Tanaka (2001)). The second stream of research aims at examining the robustness of the findings in Singh and Vives (1984) in a duopoly where firms' strategies include investments in research and development (R&D), in addition to prices or quantities (see for instance Delbono and Denicolò (1990), Motta (1993), Qiu (1997) and Symeonidis (2003)). A general result is that the findings in Singh and Vives (1984) may not always hold. For instance, in a two stage model of a duopoly producing substitute goods and engaging in process R&D (to reduce their unit production cost), Qiu (1997) finds that (i) although Cournot competition induces more R&D effort than Bertrand competition, the latter still results in lower prices and higher quantities, (ii) Bertrand competition is more efficient if R&D productivity is low, spillovers are weak, or products are very different, and (iii) Cournot competition is more efficient if R&D productivity is high, spillovers are strong, and products are close substitutes. Similar results are obtained by Symeonidis (2003) for the case where the duopoly engages in product R&D.

This paper belongs to this second stream, extending the model of Qiu (1997) in three ways. First, costs of production are assumed to be quadratic in the production levels, rather than linear (decreasing returns to scale). Second, we incorporate the specification of the R&D cost function suggested by Amir (2000), making it depend on the spillover level to correct for the eventual biases introduced by postulating additive spillovers on R&D outputs rather than inputs.
Finally, the firms' benefits from each other's R&D in reducing their costs of production are assumed to depend not only on the R&D spillovers but also on the degree of substitutability of their products. We show in this setting that Cournot competition may lead to higher outputs, higher consumer surpluses and lower prices when R&D productivity is high. We also show that when R&D productivity is high, Cournot competition is more efficient. Finally, we show that our results are robust whether or not the investment cost depends on the spillover level.

The rest of this paper is organized as follows. In Section 2, we outline the model. In Sections 3 and 4, we respectively characterize Cournot and Bertrand equilibria. In Section 5, we compare the results, and we briefly conclude in Section 6.

2. The Model

Consider an industry formed of two firms producing differentiated but substitutable goods. Each firm independently undertakes cost-reducing R&D investments, and chooses the price pi or the quantity qi of its product, so as to maximize its profits. If the firms choose quantity (price) levels, then it is said that they engage in a Cournot (Bertrand) game. Following Singh and Vives (1984), the representative consumer's preferences are described by the following utility function:

U(q1, q2) = a(q1 + q2) − (1/2)(q1² + q2²) − ηq1q2,    (8.1)

where a is a positive constant and 0 ≤ η < 1. The parameter η represents the degree of product differentiation; products become closer substitutes as this parameter approaches 1. The resulting market inverse demands are linear and given by

pi = a − qi − ηqj,  i, j = 1, 2, i ≠ j.    (8.2)

Denote by xi firm i's R&D investment. The unit production cost is assumed to depend on both firms' investments in R&D as well as on the quantity produced, and has the following form:

Ci(qi, xi, xj) = c + (r/2)qi − xi − ηβxj,  i, j = 1, 2, i ≠ j,    (8.3)

where 0 < c < a, r ≥ 0 and 0 ≤ β ≤ 1.
The parameter β is the industry degree of R&D spillover, and ηβ represents the effective R&D spillover. We assume that the unit cost is strictly positive. The cost specification in (8.3) differs from the one proposed by Qiu (1997) in two respects. First, the benefit in cost reduction from the rival's R&D depends on the effective spillover rate ηβ and not only on β. An explanation of our assumption lies in the fact that the products must be related, i.e., η ≠ 0, for spillovers to be beneficial to firms (another way to achieve this is to restrict the spillover rate in Qiu's model to values that are smaller than η). Second, firm i's production cost function is assumed to be quadratic in its level of output rather than linear, which allows us to model decreasing returns to scale.

The investment cost incurred by player i is assumed quadratic in R&D effort, i.e.,

Fi(xi) = ((ν + δηβ)/2) xi²,  i = 1, 2,    (8.4)

where δ ≥ 0. For δ > 0, the cost is increasing in the effective spillover level ηβ; thus a higher effective R&D spillover leads to higher R&D costs for each firm. For δ = 0, the cost is independent of the spillover, as in Qiu's model. When δ > 0, the cost function is steeper in R&D and is similar to the one proposed by Amir (2000)¹.

The total profit of firm i, to be maximized, is given by

πi = (pi − Ci(qi, xi, xj)) qi − Fi(xi).    (8.5)

We shall in the sequel compare consumer surplus (CS) and total welfare (TW) under the Bertrand and Cournot modes of play. Recall that consumer surplus is defined as the consumer's utility minus total expenditures, evaluated at equilibrium, i.e.,

CS = U(q1∗, q2∗) − p1∗q1∗ − p2∗q2∗.    (8.6)

Total welfare is defined as the sum of consumer surplus and industry profit, i.e.,

TW = CS + π1∗ + π2∗,    (8.7)

where the superscript ∗ refers to (Bertrand or Cournot) equilibrium values.

Remark 8.1 Our model is symmetric, i.e., all parameters involved are the same for both players. This assumption allows us to compare Bertrand and Cournot equilibria in a setting where any difference would be due to the choice of the strategic variables and nothing else. We shall confine our interest to symmetric equilibria in both the Cournot and Bertrand games.

3. Cournot Equilibrium

In the noncooperative two stage Cournot game, firms select their R&D investments and output levels, independently and sequentially. To
¹ Notice that, following d'Aspremont and Jacquemin (1988, 1990), the models used in Qiu (1997) postulate that the possible R&D spillovers take place in the final R&D outcomes. However, Kamien et al. (1992), among others, presume that such spillovers take place in the R&D dollars (spending). Amir (2000) makes an extensive comparison between these two models (i.e., d'Aspremont and Jacquemin (1988, 1990), and Kamien et al. (1992)) and assesses their validity. He concludes that the latter's predictions and results are more valid and robust. Furthermore, in order to make the above two models equivalent, Amir (2000) suggests a certain specification of the R&D cost function of the first above-mentioned models. As he shows, such a specification does make the R&D cost functions steeper, and hence does correct for the biases introduced by postulating additive spillovers on R&D outputs rather than inputs.

characterize Cournot equilibrium, it will be convenient to use the following notation:

A  = a − c,
V  = ν + δηβ,
B  = ηβ + 1,
R1 = r + 2,
G1 = r + 2 + η,
D1 = r + 2 − η,
E1 = r + 2 − η²β,
Δ1 = D1G1²V − BE1R1.

We assume that the parameters satisfy the conditions:

D1²G1²V − E1²R1 > 0,    (8.8)

D1G1V (aG1 − (a − c)(1 + η)) − aBE1R1 > 0.    (8.9)

The following proposition characterizes the Cournot equilibrium.

Proposition 8.1 Assuming (8.8)-(8.9), in the unique symmetric subgame perfect Cournot equilibrium, output and R&D investment strategies are given by
qC xC = = AD1 G1 V , Δ1 AE1 R1 . Δ1 (8.10) (8.11) Proof. Firm i’s proﬁt function is given by: πi = r ν + δηβ 2 a − qi − ηqj − (c + qi − xi − ηβxj ) qi − xi , 2 2 i, j = 1, 2, i = j. (cournot proﬁt function) Cournot subgame perfect equilibria are derived by maximizing the above proﬁt function sequentially as follows: Given any ﬁrststage R&D outcome (xi , xj ), ﬁrms choose output to maximize their respective market proﬁts. Individual proﬁt functions are strictly concave in qi and, assuming an interior solution, the ﬁrst order conditions yield the unique NashCournot equilibrium output:
qi* = (AD1 − ηxj(1 − βR1) + xiE1)/(D1G1),   i, j = 1, 2, i ≠ j.

In the first stage, firms choose R&D levels to maximize their respective profits, taking the equilibrium output into account. After substituting for the equilibrium output levels qi*, individual profit functions are concave in xi if (8.8) is satisfied. The first order conditions yield the unique symmetric R&D equilibrium given in (8.11), which is interior if D1G1²V − BE1R1 = Δ1 > 0, which is implied by (8.9). Substituting for xC in qi* yields the Cournot equilibrium output (8.10). □

The equilibrium Cournot price and profit are given by

pC = a − AD1G1V(1 + η)/Δ1,
πC = A²R1V(D1²G1²V − E1²R1)/(2Δ1²),

which are respectively positive if (8.9) and (8.8) are satisfied. Inserting the Cournot equilibrium values in (8.6) and (8.7) provides the following consumer surplus and total welfare:

CSC = (AD1G1V)²(1 + η)/Δ1²,   (8.12)
TWC = A²V(D1²G1²V(1 + η + R1) − E1²R1²)/Δ1².   (8.13)

The following proposition compares the results under the two specifications of the investment cost function in R&D, i.e., for δ > 0 and δ = 0.

Proposition 8.2 Cournot output, investment in R&D, consumer surplus and total welfare are lower and price is higher when the cost function is steeper in R&D investments (i.e., δ > 0).
Proof. Compute

dxC/dδ = −AD1E1G1²R1ηβ/Δ1² < 0,

indicating that equilibrium R&D investment is decreasing in δ. Denote by xCδ the Cournot equilibrium investment corresponding to a given value of δ. Straightforward computations lead to

xCδ − xC0 = −AD1E1G1²R1δηβ / [(D1G1²(ν + δηβ) − BE1R1)(D1G1²ν − BE1R1)].

Notice that:

qC = (A + BxC)/G1,
pC = a − qC(1 + η),
CSC = qC²(1 + η),

yielding the results for output, price and consumer surplus. To assess the effect of δ on profits, compute
dπC/dδ = −(A²E1R1ηβ/2) · (D1G1²V(2BD1 − E1) − BE1²R1)/Δ1³.

Using 2BD1 − E1 > 0 and Δ1 > 0,

D1G1²V(2BD1 − E1) − BE1²R1 > 2BE1R1(BD1 − E1),

we see that the sign of the derivative depends on the parameters' values, so that profit can increase with δ when ν is small and β is less than 1/R1 (see appendix for an illustration). However, the impact of δ on total welfare remains negative:

dTWC/dδ = −A²E1R1ηβ · (D1G1²V(2BD1(1 + R1 + η) − E1R1) − BE1²R1²)/Δ1³
        < −2A²BE1²R1²ηβ · (BD1(1 + R1 + η) − E1R1)/Δ1³ < 0. □

4. Bertrand Equilibrium

In the noncooperative two-stage Bertrand game, firms select their R&D investments and prices, independently and sequentially. From (8.2), we define the demand functions

qi = ((1 − η)a − pi + ηpj)/(1 − η²),   i, j = 1, 2, i ≠ j.   (8.14)

To characterize Bertrand equilibrium, it will be convenient to use the following notation:

G2 = r + 2 + η − η²,
D2 = r + 2 − η − η²,
E2 = r + 2 − η²β − η²,
R2 = r + 2 − 2η²,
Δ2 = D2G2²V − BE2R2.

We assume that the parameters satisfy the conditions:
D2²G2²V − E2²R2 > 0,   (8.15)
D2G2V(aG2 − (a − c)(η + 1)) − aBE2R2 > 0.   (8.16)

The following proposition characterizes Bertrand equilibrium.

Proposition 8.3 Assuming (8.15)–(8.16), in the unique symmetric subgame perfect Bertrand equilibrium, price and R&D investment strategies are given by
pB = (D2G2V(aG2 − A(η + 1)) − aBE2R2)/Δ2,   (8.17)
xB = AE2R2/Δ2.   (8.18)

Proof. The proof is similar to that of Proposition 8.1 and is omitted. □

The equilibrium Bertrand output and profit are given by

qB = AD2G2V/Δ2,   (8.19)
πB = A²R2V(D2²G2²V − E2²R2)/(2Δ2²),

which are respectively positive if (8.16) and (8.15) are satisfied. Consumer surplus and total welfare are then given by:

CSB = (AD2G2V)²(1 + η)/Δ2²,   (8.20)
TWB = A²V(D2²G2²V(1 + η + R2) − E2²R2²)/Δ2².   (8.21)

The following proposition compares the Bertrand equilibrium under the two specifications of the investment cost function in R&D, i.e., for δ > 0 and δ = 0.

Proposition 8.4 Bertrand output, investment in R&D, consumer surplus and total welfare are lower and price is higher when the cost function is steeper in R&D investments (i.e., δ > 0).

Proof. The impact of δ on output, investment, consumer surplus and total welfare is obtained as in the proof of Proposition 8.2. Again, the effect of δ on profit depends on the parameters' values, and can be positive when η is large and r is less than (1 − β)/(2β + 1) (see appendix for an illustration). □

5. Comparison of Equilibria

In this section we compare Cournot outputs, prices, profits, R&D levels, consumer surplus and total welfare with their Bertrand counterparts. We assume that conditions (8.8), (8.9), (8.15) and (8.16) are satisfied by the parameters. These conditions are necessary and sufficient for the equilibrium prices, quantities and profits to be positive in both modes of play.

Proposition 8.5 Cournot R&D investments are always higher than Bertrand's.
Proof. From (8.11) and (8.18) we have

xC − xB = AV(D2G2²E1R1 − D1G1²E2R2)/(Δ1Δ2),

where the denominator is positive under conditions (8.9) and (8.16). Substituting the parameter values in the numerator yields

D2G2²E1R1 − D1G1²E2R2 = η³ [ (1 + η + ηβ)r³
+ (6 + 4η − η² − η³ + 3ηβ(2 − η²))r²
+ ((12 − 11η² − η³ + η⁴)(1 + ηβ) + η(2 − η)(η + 2)(η + 1))r
+ 2(1 − η)(2 − η)(η + 2)(η + 1)(1 + ηβ) ],

whose coefficients are all positive for 0 < η < 1, so the expression is positive. □

This finding is similar to that of Qiu (1997) and can be explained in the same way. Qiu analyzes the factors that induce firms to undertake R&D under both Cournot and Bertrand settings, using general demand and R&D cost functions. He decomposes these factors into four main categories: (i) strategic effects, (ii) spillover effects, (iii) size effects and (iv) cost effects. He shows that while the last three factors have a similar impact under both Cournot and Bertrand, the strategic factor induces more R&D under Cournot and discourages it under Bertrand. Therefore he concludes that, due to the strategic factor, R&D investment under Cournot will always be higher than under Bertrand.

Proposition 8.6 Cournot output and consumer surplus could be higher than Bertrand's.
Proof. Use (8.10) and (8.19) to get

qC − qB = AV(D1G1Δ2 − D2G2Δ1)/(Δ1Δ2).

The numerator is the difference of two positive numbers, and is positive if

V < B(D2G2E1R1 − D1G1E2R2)/(D2G2D1G1η²),

that is, when V is small while satisfying conditions (8.8), (8.9), (8.15) and (8.16), which are lower bounds on V. The intersection of these conditions is not empty (see appendix for examples). This shows that the output under Cournot could be higher than that of Bertrand when R&D productivity is high. Consequently, consumer surplus under Cournot could be higher than under Bertrand. □

These results are different from those of Qiu, and from those of Singh and Vives as well. Furthermore, they contrast with fundamental results of oligopoly theory. As expected, if R&D productivity is low, then the traditional results, which state that output and consumer surplus (prices) under Bertrand are higher (lower) than Cournot's, still hold. Notice that Proposition 8.6 holds even when r = 0 and δ = 0.²

The difference in profits can be written:

πC − πB = A²V (R1(D1²G1²V − E1²R1)Δ2² − R2(D2²G2²V − E2²R2)Δ1²) / (2Δ1²Δ2²)
        = A²V² (αV² + γV + λ) / (2Δ1²Δ2²),

where

α = D1²G1²D2²G2²η³(R1(2 + η) + 2η) > 0,
γ = 2D2R2R1G2²D1G1²Bη³(ηβ − 1) + D1²G1⁴E2²R2² − D2²G2⁴E1²R1²,
λ = (1 − ηβ)R2R1B(D2G2³E1²R1 − E2²G1³D1R2).

The sign of the above expression is not obvious. Extensive numerical investigations failed to provide an instance where the difference is negative, provided conditions (8.8), (8.9), (8.15) and (8.16) are satisfied. Since V can take arbitrarily large values and α > 0, the difference is increasing in ν when ν is sufficiently large. This result is similar to that of Qiu (1997) and Singh and Vives (1984) and indicates that firms should prefer the Cournot equilibrium, and more so when the productivity of R&D is low. As a consequence, it is apparent that total welfare under Cournot can be higher than under Bertrand (when, for instance, both consumer surplus and profit are higher). Examples of positive and negative differences in total welfare are provided in the appendix. With respect to the conclusions in Qiu, we find that the efficiency of Cournot competition does not require that η and β both be very high, provided R&D productivity is high. Again, this is true for r = 0 and δ = 0.

² This apparent contradiction with Qiu's result is due to his using a restrictive condition on the parameter values which is sufficient but not necessary for the solutions to make sense.

Remark 8.2 It can be easily verified that if η = 0, i.e., the products are not substitutes, then the Bertrand and Cournot equilibria coincide. This means that each firm becomes a monopoly in its market and it does not matter whether the players choose prices or quantities as strategic variables.

6. Concluding Remarks

In this paper, we developed a two-stage game for a differentiated duopoly. Each of the two firms produces one variety of substitute goods. It also engages in process R&D investment activities, which aim at reducing its production costs, incidentally reducing also the competitor's costs because of spillovers which are proportional to the degree of substitutability of the products. The firms maximize their total profit functions by choosing the optimal levels of either their outputs or prices, as well as their R&D investments.
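As an illustrative sketch (not part of the original chapter; variable names are ours), the closed-form equilibrium expressions (8.10)–(8.11) and (8.18)–(8.19) can be evaluated directly; the parameter values below are those of column 1 of the appendix table, and the comparisons of Propositions 8.5 and 8.6 can be checked numerically:

```python
# Sketch: evaluate the closed-form Cournot and Bertrand equilibria
# for column 1 of the appendix table (a=400, c=300, eta=0.8, ...).
a, c, eta, beta, r, nu, delta = 400, 300, 0.8, 0.1, 1, 0.4, 0

A = a - c                      # demand intercept net of unit cost
V = nu + delta * eta * beta    # effective R&D cost parameter
B = eta * beta + 1             # effective spillover factor

# Cournot notation
R1, G1, D1 = r + 2, r + 2 + eta, r + 2 - eta
E1 = r + 2 - eta**2 * beta
Delta1 = D1 * G1**2 * V - B * E1 * R1

xC = A * E1 * R1 / Delta1        # (8.11) R&D investment
qC = A * D1 * G1 * V / Delta1    # (8.10) output
piC = A**2 * R1 * V * (D1**2 * G1**2 * V - E1**2 * R1) / (2 * Delta1**2)

# Bertrand notation
R2, G2, D2 = r + 2 - 2 * eta**2, r + 2 + eta - eta**2, r + 2 - eta - eta**2
E2 = r + 2 - eta**2 * beta - eta**2
Delta2 = D2 * G2**2 * V - B * E2 * R2

xB = A * E2 * R2 / Delta2        # (8.18)
qB = A * D2 * G2 * V / Delta2    # (8.19)
piB = A**2 * R2 * V * (D2**2 * G2**2 * V - E2**2 * R2) / (2 * Delta2**2)

print(round(xC), round(qC), round(piC))  # column 1: 276 105 1232
print(round(xB), round(qB), round(piB))  # column 1: 201 100 581
assert xC > xB   # Proposition 8.5: Cournot R&D is always higher
assert qC > qB   # Proposition 8.6: possible here because V is small
```

The computed values reproduce the rounded entries of the appendix table.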
We derived and compared the Bertrand and the Cournot equilibria, and then examined the robustness of the literature's results, especially those of Qiu (1997). The main results of this study are as follows: (a) Bertrand competition is more efficient if R&D productivity is low and effective spillovers are weak. (b) Cournot competition may lead to higher outputs, higher consumer surpluses and lower prices, provided that R&D productivity is very high. (c) Cournot competition results in higher R&D investments compared to Bertrand's. (d) A steeper investment cost specification lowers output, investment and consumer surplus in both kinds of competition but does not change qualitatively the results about their comparative efficiencies. (e) These results are robust to a convex production cost specification.

Appendix:
               1        2        3        4        5        6
a            400      400      400      400      900      300
c            300      300      300      300      800      200
η            0.8      0.8      0.9      0.9      0.9      0.9
β            0.1      0.1      0.2      0.2      0.9      0.7
r            1        1        0.3      0.3      0        1
ν            0.4      0.4      1        1        0.6      10
δ            0        1        0        1        0        0
xC           276      154      60       46       268      2
qC           105      70       54       49       202      27
pC           212      274      297      307      517      249
πC           1232     1682     1510     1453     19129    1036
CSC          19724    8810     5573     4542     77192    1348
TWC          22188    12174    8593     7448     115450   3419
xB           201      123      40       32       40       2
qB           100      74       63       59       82       33
pB           220      267      280      288      744      237
πB           581      1039     444      493      811      750
CSB          18108    9768     7533     6553     12805    2113
TWB          19271    11846    8421     7539     14428    3613
qC − qB      4        −4       −9       −10      119      −7
πC − πB      651      643      1066     960      18318    286
CSC − CSB    1616     −958     −1960    −2010    64387    −766
TWC − TWB    2917     328      172      −90      101022   −194

Columns 1 and 2 show an example where increasing δ increases Cournot profit. Columns 3 and 4 show an example where increasing δ increases Bertrand profit. Columns 1 and 5 show examples where Cournot quantities and consumer surplus are larger than Bertrand's. Columns 1, 2, 3 and 5 show examples where total welfare is larger under Cournot; columns 4 and 6, where it is smaller.

References
Amir, R. (2000). Modelling imperfectly appropriable R&D via spillovers. International Journal of Industrial Organization, 18:1013–1032.
Amir, R. and Jin, J.Y. (2001). Cournot and Bertrand equilibria compared: substitutability, complementarity, and concavity. International Journal of Industrial Organization, 19:303–317.
d'Aspremont, C. and Jacquemin, A. (1988). Cooperative and noncooperative R&D in duopoly with spillovers. American Economic Review, 78:1133–1137.
d'Aspremont, C. and Jacquemin, A. (1990). Cooperative and noncooperative R&D in duopoly with spillovers: Erratum. American Economic Review, 80:641–642.
Dastidar, K.G. (1997). Comparing Cournot and Bertrand in a homogeneous product market. Journal of Economic Theory, 75:205–212.
Delbono, F. and Denicolò, V. (1990). R&D in a symmetric and homogeneous oligopoly. International Journal of Industrial Organization, 8:297–313.
Häckner, J. (2000). A note on price and quantity competition in differentiated oligopolies. Journal of Economic Theory, 93:233–239.
Kamien, M., Muller, E., and Zang, I. (1992). Research joint ventures and R&D cartels. American Economic Review, 82:1293–1306.
Motta, M. (1993). Endogenous quality choice: price versus quality competition. Journal of Industrial Economics, 41:113–131.
Okuguchi, K. (1987). Equilibrium prices in Bertrand and Cournot oligopolies. Journal of Economic Theory, 42:128–139.
Qiu, L. (1997). On the dynamic efficiency of Bertrand and Cournot equilibria. Journal of Economic Theory, 75:213–229.
Singh, N. and Vives, X. (1984). Price and quantity competition in a differentiated duopoly. Rand Journal of Economics, 15:546–554.
Symeonidis, G. (2003). Comparing Cournot and Bertrand equilibria in a differentiated duopoly with product R&D. International Journal of Industrial Organization, 21:39–55.
Tanaka, Y. (2001). Profitability of price and quantity strategies in an oligopoly. Journal of Mathematical Economics, 35:409–418.
Vives, X. (1985). On the efficiency of Bertrand and Cournot equilibria with product differentiation. Journal of Economic Theory, 36:166–175.

Chapter 9

CHEAP TALK, GULLIBILITY, AND WELFARE IN AN ENVIRONMENTAL TAXATION GAME
Herbert Dawid, Christophe Deissenberg, and Pavel Ševčík
Abstract  We consider a simple dynamic model of environmental taxation that exhibits time inconsistency. There are two categories of firms: Believers, who take the tax announcements made by the Regulator at face value, and Non-Believers, who perfectly anticipate the Regulator's decisions, albeit at a cost. The proportion of Believers and Non-Believers changes over time depending on the relative profits of both groups. We show that the Regulator can use misleading tax announcements to steer the economy to an equilibrium that is Pareto superior to the solutions usually suggested in the literature. Depending upon the initial proportion of Believers, the Regulator may prefer a fast or a slow speed of reaction of the firms to differences in Believers/Non-Believers profits.

1. Introduction

The use of taxes as a regulatory instrument in environmental economics is a classic topic. In a nutshell, the need for regulation usually arises because production causes detrimental emissions. Due to the lack of a proper market, the firms do not internalize the impact of these emissions on the utility of other agents. Thus, they take their decisions on the basis of prices that do not reflect the true social costs of their production. Taxes can be used to modify the prices confronting the firms so that the socially desirable decisions are taken. The problem has been exhaustively investigated in static settings where there is no room for strategic interaction between the Regulator and the firms. Consider, however, the following situation: (a) The emission taxes have a dual effect: they incite the firms to reduce production and to undertake investments in abatement technology. This is typically the case when the emissions are increasing in the output and decreasing in the abatement technology; (b) Emission reduction is socially desirable, the reduction of production is not; and (c) The investments are irreversible.
In that case, the Regulator must find an optimal compromise between implementing high taxes to motivate high investments, and keeping the taxes low to encourage production. The fact that the investments are irreversible introduces a strategic element into the problem. If the firms are naive and believe his announcements, the Regulator can ensure high production and substantial investments by first declaring high taxes and reducing them once the corresponding investments have been realized. More sophisticated firms, however, recognize that the initially high taxes will not be implemented, and are reluctant to invest in the first place. In other words, one is confronted with a typical time inconsistency problem, which has been extensively treated in the monetary policy literature following Kydland and Prescott (1977) and Barro and Gordon (1983). In environmental economics, the time inconsistency problem has as yet received only limited attention, although it frequently occurs. See among others Gersbach and Glazer (1999) for a number of examples and for an interesting model, Abrego and Perroni (1999), Batabyal (1996a), Batabyal (1996b), Dijkstra (2002), Marsiliani and Renström (2000), Petrakis and Xepapadeas (2003). The time inconsistency is directly related to the fact that the situation described above defines a Stackelberg game between the Regulator (the leader) and the firms (the followers). As noted in the seminal work of Simaan and Cruz (1973a,b), inconsistency arises because the Stackelberg equilibrium is not defined by mutual best responses. It implies that the follower uses a best response in reaction to the leader's action, but not that the leader's action is itself a best response to the follower's. This opens the door to a reoptimization by the leader once the follower has played. Thus, a Regulator who announces that he will implement the Stackelberg solution is not credible.
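The reoptimization incentive just described can be illustrated with a small numerical sketch (ours, not in the original text; it anticipates the functional forms (9.9)–(9.10) introduced below, takes all firms to be Believers, π = 1, and uses the parameter values of Figure 9.1):

```python
# Sketch of the time-inconsistency problem: with only Believers (pi = 1),
# the Regulator announces ta, firms sink abatement v = ta/(2*cv), and the
# Regulator then reoptimizes the realized tax t.
cv, cx, p, k, kappa = 5, 3, 6, 4, 3   # Figure 9.1 parameter values

def phi(ta, t):
    # Regulator's instantaneous objective (9.9) with pi = 1: employment
    # term plus emission/tax term, abatement fixed by the announcement ta.
    return (k - kappa + t) * (p - t) / (2 * cx) + (kappa - t) * ta / (2 * cv)

ta = p * cv / (cv + cx)   # highest credible announcement, cf. (9.10): 3.75
# Once investments are sunk, the Regulator picks t to maximize phi(ta, .):
grid = [i / 1000 for i in range(4001)]
t_star = max(grid, key=lambda t: phi(ta, t))
print(ta, t_star)    # announced 3.75, implemented 1.375
assert t_star < ta   # the announced tax is not implemented: time inconsistency
```

The implemented tax falls well short of the announcement, which is exactly why sophisticated firms refuse to believe it.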
A usual conclusion is that, in the absence of additional mechanisms, the economy is doomed to converge towards the less desirable Nash solution. A number of options to ensure credible solutions have been considered in the literature – credible binding commitments by the Stackelberg leader, reputation building, use of trigger strategies by the followers, etc. See McCallum (1997) for a review in a monetary policy context. Schematically, these solutions aim at assuring the time consistency of the Stackelberg solution with either the Regulator or the firms as leader. Usually, these solutions are not efficient and can be Pareto-improved.

In this paper, we suggest a new solution to the time inconsistency problem in environmental policy. We show that nonbinding tax announcements can increase the payoff not only of the Regulator, but also of all firms, if these include any number of naive Believers who take the announcements at face value. Moreover, if firms tend to adopt the behavior of the most successful ones, a stable equilibrium may exist where a positive fraction of firms are Believers. This equilibrium Pareto-dominates the one where all firms anticipate perfectly the Regulator's actions. To attain the superior equilibrium, the Regulator builds reputation and leadership by making announcements and implementing taxes in a way that generates good results for the Believers, rather than by precommitting to his announcements. This Pareto-superior equilibrium does not always exist. Depending upon the model parameters (most crucially: upon the speed with which the firms that follow different strategies react to differences in their respective profits, i.e., upon the flexibility of the firms), it may be rational for the Regulator to steer the economy to the Pareto-inferior, fully rational equilibrium.
This paper thus stresses the importance of this flexibility in explaining the policies followed by a Regulator, the welfare level realized, and the persistence or decay of private confidence in the Regulator's announcements. The potential usefulness of employing misleading announcements to Pareto-improve upon standard game-theoretic equilibrium solutions was suggested for the case of general linear-quadratic dynamic games in Vallée et al. (1999) and developed by the same authors in subsequent papers. An early application to environmental economics is Vallée (1998). The Believers/Non-Believers dichotomy was introduced in Deissenberg and Alvarez Gonzalez (2002), who study the credibility problem in monetary economics in a discrete-time framework with reinforcement learning. A similar monetary policy problem has been investigated in Dawid and Deissenberg (2004) in a continuous-time setting akin to the one used in the present work.

The paper is organized as follows. In Section 2, we present the model of environmental taxation, introduce the imitation-type dynamics that determine the evolution of the number of Believers in the economy, and derive the optimal reaction functions of the firms. In Section 3, we discuss the solution of the static problem one obtains by assuming a constant proportion of Believers. Section 4 is devoted to the analysis of the dynamic problem and to the presentation of the main results. Section 5 concludes.

2. The Model

We consider an economy consisting of a Regulator R and of a continuum of atomistic, profit-maximizing firms i with identical production technology. Time τ is continuous. To keep notation simple, we do not index the variables with either i or τ, unless useful for a better understanding. In a nutshell, the situation we consider is the following. The Regulator can tax the firms in order to incite them to reduce their emissions. Taxes, however, have a negative impact on employment.
Thus, R has to choose them in order to achieve an optimal compromise between emissions reduction and employment. The following sequence of events occurs in every τ:

– R makes a nonbinding announcement ta ≥ 0 about the tax level he will implement. The tax level is defined as the amount each firm has to pay per unit of its emissions.
– Given ta, the firms form expectations te about the actual level of the environmental tax. As will be described in more detail later, there are two different ways for an individual firm to form its expectations.
– Each firm i decides about its level of emission reduction vi based on its expectation tei and makes the necessary investments.
– R chooses the actual level of tax t ≥ 0.
– Each firm i produces a quantity xi, generating emissions xi − vi.
– The individual firms revise the way they form their expectations (that is, revise their beliefs) depending on the profits they have realized.

The Firms

Each firm produces the same homogeneous good using a linear production technology: the production of x units of output requires x units of labor and generates x units of environmentally damaging emissions. The production costs are given by:

c(x) = wx + cx x²,   (9.1)

where x is the output, w > 0 the fixed wage rate, and cx > 0 a parameter. For simplicity's sake, the demand is assumed infinitely elastic at the given price p > w. Let p̃ := p − w > 0.

At each point of time, each firm can spend an additional amount of money γ in order to reduce its current emissions. The investment

γ(v) = cv v²,   (9.2)

with cv > 0 a given parameter, is needed in order to reduce the firm's current emissions by v ∈ [0, x]. The investment in one period has no impact on the emissions in future periods. Rather than expenditures in emission-reducing capital, γ can therefore be interpreted as the additional costs resulting from a temporary switch to a cleaner resource – say, a switch from coal to natural gas.
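The technology (9.1)–(9.2) already pins down a firm's myopic choices; the following sketch (ours, with illustrative numbers; the closed-form solutions are derived below in (9.5)–(9.6)) recovers them by grid search:

```python
# Sketch: a firm's myopic choices under the technology (9.1)-(9.2).
# Output x is chosen knowing the realized tax t; abatement v is chosen
# earlier, using the expected tax te.  With ptilde = p - w, profit is
# ptilde*x - cx*x**2 - t*(x - v) - cv*v**2.
cx, cv = 3.0, 5.0
ptilde, t, te = 6.0, 2.0, 3.0   # illustrative values only

def profit(x, v, t):
    return ptilde * x - cx * x**2 - t * (x - v) - cv * v**2

xs = [i / 10000 for i in range(20001)]
x_opt = max(xs, key=lambda x: profit(x, 0.0, t))
v_opt = max(xs, key=lambda v: te * v - cv * v**2)  # expected tax saving less cost
print(x_opt, v_opt)
assert abs(x_opt - (ptilde - t) / (2 * cx)) < 1e-3  # matches (9.5)
assert abs(v_opt - te / (2 * cv)) < 1e-3            # matches (9.6)
```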
Depending on the way they form their expectations te, we consider two types of firms, Believers (B) and Non-Believers (NB). The fraction of Believers in the population is denoted by π ∈ [0, 1]. Believers consider the Regulator's announcement to be truthful and set te = ta. Non-Believers disregard the announcement and anticipate perfectly the actual tax level, te = t. Making perfect anticipations at every point of time, however, is costly. Thus, Non-Believers incur costs of δ > 0 per unit of time. The firms are profit-maximizers. As will become apparent in the following, one can assume without loss of substance that they are myopic, that is, that they maximize in every τ their current profit.

The Regulator R

The Regulator's goal is to maximize over an infinite horizon the cumulated discounted value of an objective function with the employment, the emissions, and the tax revenue as arguments. In order to realize this objective, he has two instruments at his disposal, the announced instantaneous tax level ta and the actually realized level t. The objective function is given by:
Φ(ta, t) = ∫₀^∞ e^(−ρτ) φ(ta, t) dτ
        := ∫₀^∞ e^(−ρτ) [ k(πxb + (1 − π)xnb) − κ(π(xb − vb) + (1 − π)(xnb − vnb)) + t(π(xb − vb) + (1 − π)(xnb − vnb)) ] dτ,   (9.3)

where xb, xnb, vb, vnb denote the optimal production and investment chosen by the Believers B and the Non-Believers NB, and where k and κ are strictly positive weights placed by R on the average employment and on the average emissions, respectively (remember that output and employment are in a one-to-one relationship in this economy). The strictly positive parameter ρ is a social discount rate.

Belief Dynamics

The firms' beliefs (B or NB) change according to an imitation-type dynamics; see Dawid (1999), Hofbauer and Sigmund (1998). The firms meet randomly two-by-two, each pairing being equiprobable. At each encounter, the firm with the lower current profit adopts the belief of the other firm with a probability proportional to the current difference between the individual profits. This gives rise to the dynamics:

π̇ = βπ(1 − π)(gb − gnb),   (9.4)

where gb and gnb denote the profits of a Believer and of a Non-Believer:

gb = pxb − cx(xb)² − t(xb − vb) − cv(vb)²,
gnb = pxnb − cx(xnb)² − t(xnb − vnb) − cv(vnb)² − δ.

Notice that π(1 − π) reaches its maximum for π = 1/2 (the value of π for which the probability of an encounter between firms with different profits is maximized), and tends towards 0 for π → 0 and π → 1 (for extreme values of π, almost all firms have the same profits). The parameter β ≥ 0, which depends on the adoption probability of the other's strategy, may be interpreted as a measure of the willingness to change strategies, that is, of the flexibility of the firms. Equation (9.4) implies that by choosing the value of (ta, t) at time τ, the Regulator not only influences his instantaneous objective but also the future proportion of Bs in the economy. This, in turn, has an impact on the values of his objective function.
Hence, although there are no explicit dynamics for the economic variables v and x, R faces a nontrivial intertemporal optimization problem.

Optimal Decisions of the Firms

Since the firms are atomistic, each single producer is too small to influence the dynamics of π. Thus, the single firm does not take into account any intertemporal effect and, independently of its true planning horizon, de facto maximizes its current profit in every τ. Each firm chooses its investment v after it has learned ta but before t has been made public. However, it fixes its production level x after v and t are known. The firms being price takers, the optimal production decision is:

x = (p − t)/(2cx).   (9.5)

The optimal x thus defined does not depend upon ta, either directly or indirectly. As a consequence, both Bs and NBs choose the same production level (9.5) as a function of the realized tax t alone, xb = xnb = x. The profit of a firm, given that an investment v has been realized, is:

g(v; t) = (p − t)²/(4cx) + tv.

When the firms determine their investment v, the actual tax rate is not known. Therefore, they solve:

max_v [g(v; te) − cv v²],

with te = t if the firm is a NB and te = ta if the firm is a B. The interior solution to this problem is:

vb = ta/(2cv),   vnb = t/(2cv).   (9.6)

The net emissions x − v of any firm will remain nonnegative after the investment, i.e., v ∈ [0, x] will hold, if:

p ≥ ((cv + cx)/cv) max[t, ta].   (9.7)

Given the above expressions (9.5) for x and (9.6) for v, it is straightforward to see that the belief dynamics can be written as:

π̇ = βπ(1 − π)(−(ta − t)²/(4cv) + δ).   (9.8)

The two effects that govern the evolution of π now become apparent. Large deviations of the realized tax level t from ta induce a decrease in the stock of believers, whereas the stock of believers tends to grow if the cost δ necessary to form rational expectations is high. Using (9.5) and (9.6), the instantaneous objective function φ of the Regulator becomes:

φ(ta, t) = (k − κ + t)(p − t)/(2cx) + ((κ − t)/(2cv))(πta + (1 − π)t).   (9.9)

3. The static problem

In the model, there is only one source of dynamics, the belief updating (9.4). Before investigating the dynamic problem, it is instructive to cursorily consider the solution (denoted by the superscript $) of the static case in which R maximizes the integrand in (9.3) for a given value of π.

From (9.9), one recognizes easily that at the optimum ta$ will either take its highest possible value or be zero, depending on whether κ − t$ is positive or negative. The case κ − t$ < 0 corresponds to the uninteresting situation where the Regulator values tax income more than emissions reduction and thus tries to increase the volume of emissions. We therefore restrict our analysis to the environmentally friendly case t$ < κ.

Note that (9.7) provides a natural upper bound t̄a for ta, namely:

t̄a = pcv/(cv + cx).   (9.10)

Assuming that the optimal announcement ta$ takes the upper value t̄a just defined (the choice of another bound is inconsequential for the qualitative results), the optimal tax t$ is:

t$ = (1/2)(κ + t̄a − cv k/(cv + cx − cx π)).   (9.11)

Note that t$ is decreasing in π. (As π increases, the announcement ta becomes a more powerful instrument, making recourse to t less necessary.) The requirement κ > t$ is fulfilled for all π iff:

cv(k + κ − p) + cx κ > 0.   (9.12)

In the remainder of this paper, we assume that (9.12) holds. Turning to the firms' profits, one recognizes that the difference gnb − gb between the NBs' and Bs' profits is increasing in t$ − ta$:

gnb − gb = (t$ − ta$)²/(4cv) − δ.   (9.13)

For δ = 0, the profit of the NBs is always higher than the profit of the Bs whenever ta ≠ t, reflecting the fact that the latter make a systematic error about the true value of t. The profit of the Bs, however, can exceed that of the NBs if the learning cost δ is high enough. Since ta$ is constant and t$ decreasing in π, and since t$ < ta$, the difference ta$ − t$ increases in π. Therefore, the difference between the profits of the NBs and Bs, (9.13), is increasing in π. Further analytical insights are exceedingly cumbersome to present due to the complexity of the functions involved. We therefore illustrate a remarkable, generic feature of the solution with the help of Figure 9.1.¹
¹ The parameter values underlying the figure are cv = 5, cx = 3, δ = 0, p = 6, k = 4, κ = 3. The figure is qualitatively robust with respect to changes in these values.

Not only does the Regulator's utility φ increase with π; so do the profits of the Bs and NBs. For π = 0, the outcome is the Nash solution of a game between the NBs and the Regulator. This outcome is not efficient, leaving room for Pareto improvement. As π increases, the Bs enjoy the benefits of a decreasing t, while their investment remains fixed at v(t̄a). Likewise, the NBs benefit from the decrease in taxation. The lowering of the taxation is made rational by the existence of the Believers, who are led by R's announcements to invest more than they would otherwise, and subsequently to emit less. Accordingly, the marginal tax income goes down as π increases, and therefore R is induced to reduce taxation if the proportion of believers goes up.

The only motive that could lead R to reduce the spread between t and ta, and in particular to choose ta < t̄a, lies in the impact of the tax announcements on the belief dynamics. Ceteris paribus, R prefers a high proportion of Bs to a low one, since he has one more instrument (that is, ta) to influence the Bs than to control the NBs. A high spread ta − t, however, implies that the profits of the Bs will be low compared to those of the NBs. This, by (9.4), reduces π̇ and leads over time to a lower proportion of Bs, diminishing the instantaneous utility of R. Therefore, in the dynamic problem, R will have to find an optimal compromise between choosing a high value of ta, which allows a low value of t and ensures the Regulator a high instantaneous utility, and choosing a lower one, leading to a more favorable proportion of Bs in the future.
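The static solution just discussed can be reproduced numerically (a sketch, ours, using the parameter values of footnote 1 and the closed forms (9.9)–(9.13); function names are ours):

```python
# Sketch: static optimum for the Figure 9.1 parameters.  The announced
# tax is fixed at its upper bound (9.10); the realized tax follows (9.11).
cv, cx, delta, p, k, kappa = 5, 3, 0, 6, 4, 3
ta = p * cv / (cv + cx)   # (9.10): 3.75

def t_static(pi):
    # (9.11): optimal realized tax for a fixed fraction pi of Believers
    return 0.5 * (kappa + ta - cv * k / (cv + cx - cx * pi))

def phi(pi, t):
    # integrand (9.9) of the Regulator's objective
    return ((k - kappa + t) * (p - t) / (2 * cx)
            + (kappa - t) * (pi * ta + (1 - pi) * t) / (2 * cv))

def profits(pi):
    # firm profits g(v; t) - cv*v**2: Believers invest ta/(2cv), NBs t/(2cv)
    t = t_static(pi)
    g = lambda v, cost: (p - t)**2 / (4 * cx) + t * v - cv * v**2 - cost
    return g(ta / (2 * cv), 0.0), g(t / (2 * cv), delta)

pis = [i / 10 for i in range(10)]
taxes = [t_static(pi) for pi in pis]
assert all(t1 > t2 for t1, t2 in zip(taxes, taxes[1:]))   # t$ decreasing in pi
for pi in pis:
    gb, gnb = profits(pi)
    assert gnb - gb >= 0   # with delta = 0, NBs always do at least as well
    assert abs((gnb - gb) - (t_static(pi) - ta)**2 / (4 * cv)) < 1e-9  # (9.13)
# Regulator's utility increases with the fraction of Believers (Figure 9.1)
assert phi(0.9, t_static(0.9)) > phi(0.1, t_static(0.1))
```

The assertions confirm the three features read off Figure 9.1: the tax falls, the Non-Believers' profit advantage (9.13) grows, and the Regulator's utility rises as π increases.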
Figure 9.1. Profits of the Believers (solid line), of the Non-Believers (dashed line), and the Regulator's utility (bold line) as a function of π.

4. Dynamic analysis

4.1 Characterization of the optimal paths

As pointed out earlier, the Regulator faces a dynamic optimization problem because of the effect of his current action on the future stock of believers. This problem is given by:
    max_{0 ≤ ta(τ), t(τ)} Φ(ta, t)   subject to (9.8).

The Hamiltonian of the problem is given by:

    H(ta, t, π, λ) = (k − κ + t)(p − t)/(2cx) + (π ta + (1 − π) t)(κ − t)/(2cv)
                     + λβπ(1 − π)(δ − (ta − t)²/(4cv)),

where λ denotes the costate variable. The Hamiltonian is concave in (t, ta) iff:

    2λβ(1 − π)(cv + cx) − cx π > 0.  (9.14)

The optimal controls are then given by:

    t*(π, λ) = [λβ(1 − π)(cv(p − k + κ) + cx κ) − cx κ π] / [2λβ(1 − π)(cv + cx) − cx π],  (9.15)

    ta*(π, λ) = [λβ(1 − π)(cv(p − k + κ) + cx κ) − cv(p − k − κ) + cx κ(1 − π)] / [2λβ(1 − π)(cv + cx) − cx π].  (9.16)

Otherwise, there are no interior optimal controls. In what follows we assume that (9.14) is satisfied along the optimal path. It can be easily checked that this is the case at the equilibrium discussed below. The difference between the optimal announced and realized tax levels is given by:

    ta* − t* = (κ − t*) / (λβ(1 − π)).  (9.17)

Hence the optimal announced tax exceeds the realized one if and only if t* < κ. As in the static case, we restrict the analysis to the environmentally friendly case t* < κ and assume that (9.12) holds. According to Pontryagin's Maximum Principle, an optimal solution (π(τ), λ(τ)) has to satisfy the state dynamics (9.8) plus:

    λ̇ = ρλ − ∂H/∂π (ta*(π, λ), t*(π, λ), π, λ),  (9.18)

    lim_{τ→∞} e^{−ρτ} λ(τ) = 0.  (9.19)
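As a quick consistency check, the closed-form controls (9.15)-(9.16) can be evaluated numerically and tested against the relation (9.17). This is a sketch: the parameter values are the chapter's illustration values (cv = 5, cx = 3, p = 6, k = 4, κ = 3), while the sample point (π, λ, β) is arbitrary, chosen so that the concavity condition (9.14) holds.

```python
# Evaluate the optimal taxes (9.15)-(9.16) and verify identity (9.17).
def optimal_taxes(pi, lam, beta, cv, cx, p, k, kappa):
    """Return (t_star, ta_star) from (9.15)-(9.16); None if (9.14) fails."""
    D = 2.0 * lam * beta * (1 - pi) * (cv + cx) - cx * pi  # concavity condition (9.14)
    if D <= 0:
        return None
    num = lam * beta * (1 - pi) * (cv * (p - k + kappa) + cx * kappa)
    t_star = (num - cx * kappa * pi) / D
    ta_star = (num - cv * (p - k - kappa) + cx * kappa * (1 - pi)) / D
    return t_star, ta_star

pi, lam, beta = 0.5, 1.0, 10.0
t_star, ta_star = optimal_taxes(pi, lam, beta, 5.0, 3.0, 6.0, 4.0, 3.0)
# Identity (9.17): ta* - t* = (kappa - t*) / (lam * beta * (1 - pi))
assert abs((ta_star - t_star) - (3.0 - t_star) / (lam * beta * (1 - pi))) < 1e-12
assert t_star < 3.0   # the environmentally friendly case t* < kappa
```

The identity holds exactly at any admissible point, which is a useful sanity check on the algebra.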
In our case the costate dynamics are given by:

    λ̇ = ρλ − (κ − t)(ta − t)/(2cv) − λβ(1 − 2π)(δ − (ta − t)²/(4cv)).  (9.20)

In order to analyze the long run behavior of the system we now provide a characterization of the steady states for different values of the public flexibility parameter β. Due to the structure of the state dynamics there are always trivial steady states at π = 0 and π = 1. For π = 0 the announcement is irrelevant. Its optimal value is therefore indeterminate. The optimal tax level is t = κ. For π = 1, the concavity condition (9.14) is violated and the optimal controls are as in the static case. In the following, we restrict our attention to π < 1. In Proposition 9.1, we first show under which conditions there exists an interior steady state where Believers coexist with Non-Believers. Furthermore, we discuss the stability of the steady states.

Proposition 9.1 Steady states and their stability:
(i) For 0 < β ≤ ρ/(2δ) there exists no interior steady state. The steady state at π = 0 is stable.

(ii) For β > ρ/(2δ) there exists a unique stable interior steady state π⁺ with:

    π⁺ = 1 − ρ/(2βδ).  (9.21)

The steady state at π = 0 is unstable.

Proof. We prove the existence and the stability of the interior steady state. The claims about the boundary steady state at π = 0 follow directly. Equation (9.8) implies that, in order for π̇ = 0 to hold, one must have:

    (ta* − t*)² = 4cv δ.  (9.22)

That is, taking into account (9.17):

    t* = κ − 2λ⁺β(1 − π⁺)√(cv δ).  (9.23)

Equation (9.22) implies that λ̇ = 0 is satisfied iff:

    ρλ − (κ − t*)(ta* − t*)/(2cv) = 0.  (9.24)

Using (9.17) for ta* − t* in (9.24) we obtain the condition:

    t* = κ − λ⁺√(2ρβ(1 − π⁺)cv).  (9.25)

Combining (9.23) and (9.25) shows that 2λ⁺β(1 − π⁺)√(cv δ) = λ⁺√(2ρβ(1 − π⁺)cv) must hold at an interior steady state. This condition is equivalent to (9.21). For β ≤ ρ/(2δ), (9.21) becomes smaller than or equal to zero. Therefore, an interior steady state is only possible for β > ρ/(2δ). Using (9.21), (9.15), and (9.23), one obtains for the value of the costate at the steady state:

    λ⁺ = [δβ(cx(2√(cv δ) + κ) + cv(k + κ − p)) − ρcx√(cv δ)] / [2ρβ√(cv δ)(cv + cx)].  (9.26)

To determine the stability of the interior steady state we investigate the Jacobian matrix J of the canonical system (9.8), (9.20) at the steady state. The eigenvalues of J are given by:

    e₁,₂ = (tr(J) ± √(tr(J)² − 4 det(J))) / 2.

Therefore, the steady state is saddle-point stable if and only if the determinant of J is negative. Inserting (9.15), (9.16) into the canonical system (9.8), (9.20), taking the derivatives with respect to (π, λ) and then inserting (9.21), (9.26) gives the matrix J. Tedious calculations show that its determinant is given by:

    det(J) = C (−√δ β(cv(k + κ − p) + cx κ) − cx√cv (2βδ − ρ)),

where C is a positive constant.
The first of the two terms in the bracket is negative due to assumption (9.12); the second is negative whenever β > ρ/(2δ). Hence, det(J) < 0 whenever an interior steady state exists, implying that the interior steady state is always stable. For β = ρ/(2δ) this stable steady state collides with the unstable one at π = 0. The steady state at π = 0 therefore becomes stable for β ≤ ρ/(2δ). □

Since there is always only one stable steady state and since cycles are impossible in a control problem with a one-dimensional state, we can conclude from Proposition 9.1 that the stable steady state is always a global attractor. Convergence to the unique steady state is always monotone. The long run fraction of Believers in the population is independent of the original level of trust. From (9.21) one recognizes that it decreases with ρ. An impatient Regulator will not attempt to build up a large proportion of Bs, since the time and efforts needed now for an additional increase of π weigh heavily compared to the future benefits. By contrast, π⁺ is increasing in β and δ. A high flexibility β of the firms means that the cumulated loss of potential utility incurred by the Regulator en route to π⁺ will be small and easily compensated by the gains in the vicinity of and at π⁺. Reinforcing this, the Regulator does not have to make Bs much better off than NBs in order to insure a fast reaction. As a result, for β large, the equilibrium π⁺ is characterized by a large proportion of Bs and provides high profits and utility, respectively, to all players. A high learning cost δ means that the Regulator can make the Bs better off than the NBs at low or no cost, implying again a high value of π at the steady state. Note that it is never optimal for the Regulator to follow a policy that would ultimately insure that all firms are Believers, π⁺ = 1. There are two concurrent reasons for that.
On the one hand, as π increases, the Regulator has to deviate more and more from the statically optimal solution ta$(π), t$(π) to make believing more profitable than non-believing. On the other hand, the beliefs dynamics slow down. Thus, the discounted benefits from increasing π decrease.

For ρ = 0.8, cv = 5, cx = 3, δ = 0.15, p = 6, k = 4, κ = 3, e.g., the profits and utility at the steady state π⁺ = 0.733333 are g^b = g^nb = 1.43785, φ = 2.33043. This steady state Pareto-dominates the fully rational equilibrium with π = 0, where g^nb = 1.32708, φ = 2.30844. It also dominates the equilibrium attained when the belief dynamics (9.8) holds but the Regulator maximizes in each period his instantaneous utility φ instead of Φ. At this last equilibrium, π = 0.21036, g^b = g^nb = 1.375, φ = 2.23689. Note that the last two equilibria cannot be compared, since the latter provides a higher profit to the firms but a lower utility to the Regulator. This ranking of equilibria is robust with respect to parameter variations.

A clear message emerges. As we contended at the beginning of this paper, the suggested solution Pareto-improves on the static Nash equilibrium. This solution implies both a beliefs dynamics among the firms and a farsighted Regulator. A farsighted Regulator without beliefs dynamics is pointless. Beliefs dynamics with a myopic Regulator lead to a more modest Pareto-improvement. But it is the combination of beliefs dynamics and farsightedness that Pareto-dominates all other solutions.

4.2 The influence of public flexibility

An interesting question is whether the Regulator would prefer a population that reacts quickly to profit differences, shifting from Believing to Non-Believing or vice versa in a short time, or if it would prefer a less reactive population. In other words, would the Regulator prefer a high or a low value of β? From Proposition 9.1 we know that the long run fraction of Believers is given by:

    π⁺ = max{0, 1 − ρ/(2βδ)}.
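The steady-state formula can be checked numerically. A caveat on the sketch below: β = 10 is an assumed value (the chapter's example lists ρ, cv, cx, δ, p, k, κ but not β); it is the value that reproduces the reported steady state π⁺ = 0.733333 with ρ = 0.8 and δ = 0.15.

```python
# Long-run fraction of Believers, pi+ = max{0, 1 - rho/(2*beta*delta)} (9.21).
def pi_plus(beta, rho=0.8, delta=0.15):
    return max(0.0, 1.0 - rho / (2.0 * beta * delta))

# beta = 10 (assumed, see lead-in) reproduces the reported steady state.
assert abs(pi_plus(10.0) - 0.733333) < 1e-6
# Below the threshold beta = rho/(2 delta) ~ 2.67 only pi = 0 remains (case (i)).
assert pi_plus(2.0) == 0.0
```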
A minimum level β > ρ/(2δ) of flexibility is necessary for the system to converge towards an interior steady state with a positive fraction of Bs. For β greater than ρ/(2δ), the fraction of Bs at the steady state increases with β, converging towards π = 1 as β goes to infinity. One might think that, since the Regulator always prefers a high proportion of Bs at equilibrium, he would also prefer a high value of β. Stated in a more formal manner, one might expect that the value function of the Regulator, V^R(π₀), increases with β regardless of π₀. This, however, is not the case. The dependence of V^R(π₀) on β is non-monotone and depends crucially on π₀. An analytical characterization of V^R(π₀) being impossible, we use a numerical example to illustrate that point. The results are very robust with respect to parameter variations. Figure 9.2 shows the steady state value π⁺ = π⁺(β) of π for β ∈ [1, 30]. Figure 9.3 compares² V^R(0.2) and V^R(0.8) for the same values of β. The other parameter values are as before, ρ = 0.8, cv = 5, cx = 3, δ = 0.15, p = 6, k = 4, κ = 3, in both cases.
Figure 9.2. The proportion π of Believers at the steady state for β ∈ [1, 30].

² The numerical calculations underlying this figure were carried out using a powerful proprietary computer program for the treatment of optimal control problems graciously provided by Lars Grüne, whose support is most gratefully acknowledged. See Grüne and Semmler (2002).
Figure 9.3. The value function of the Regulator for π = 0.2 (dotted line) and π = 0.8 (solid line), for β ∈ [1, 30].

Figure 9.3 reveals that one always has V^R(0.8) > V^R(0.2), reflecting the general result that V^R(π₀) is increasing in π₀ for any β. Both value functions are not monotone in β but U-shaped. Combining Figure 9.2 and Figure 9.3 shows that the minimum of V^R(π₀) is always attained for the value of β at which the steady state value π⁺(β) coincides with the initial fraction of Believers π₀. This result is quite intuitive. If π⁺(β) < π₀, it is optimal for the Regulator to reduce π over time. The Regulator does so by announcing a tax ta much greater than the tax t he will implement, making the Bs worse off than the NBs, but also increasing his own instantaneous benefits φ. Thus, R prefers that the convergence towards the steady state be as slow as possible. That is, V^R is decreasing in β. On the other hand, if π⁺ > π₀, it is optimal for R to increase π over time. To do so, he must follow a policy that makes the Bs better off than the NBs, but this is costly in terms of his instantaneous objective function φ. It is therefore better for R if the firms react fast to the profit difference. The value function V^R increases with β.

Summarizing, the Regulator prefers (depending on π₀) to be confronted with either very flexible or very inflexible firms. In-between values of β provide him with smaller discounted benefit streams. Whether a low or a high flexibility is preferable for R depends on the initial fraction π₀ of Believers. Our numerical analysis suggests that a Regulator facing a small π₀ prefers large values of β, whereas he prefers a low value of β when π₀ is large. This result may follow from the specific functional form used in the model rather than reflect any fundamental property of the solution.
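The β at which V^R(π₀) is minimized can be made explicit by inverting (9.21). The sketch below uses the chapter's ρ = 0.8 and δ = 0.15; `beta_star` is an invented helper name, and the inversion itself follows directly from the steady-state formula.

```python
# The minimizing flexibility level solves pi+(beta) = pi0, i.e.
# beta* = rho / (2 * delta * (1 - pi0)), from inverting (9.21).
def pi_plus(beta, rho=0.8, delta=0.15):
    return max(0.0, 1.0 - rho / (2.0 * beta * delta))

def beta_star(pi0, rho=0.8, delta=0.15):
    """Flexibility whose steady state reproduces the initial fraction pi0."""
    return rho / (2.0 * delta * (1.0 - pi0))

for pi0 in (0.2, 0.8):   # the two initial fractions compared in Figure 9.3
    assert abs(pi_plus(beta_star(pi0)) - pi0) < 1e-12
```

Consistent with the discussion, a larger π₀ pushes the worst-case β upward, so the trough of V^R moves to the right as π₀ grows.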
If β = 0, the proportion of Bs remains fixed over time at π₀: any initial value of π₀ ∈ (0, 1) corresponds to a stable equilibrium. The Pareto-improving character of the inner equilibrium then disappears. Given β = 0, condition (9.14) is violated. The Regulator can announce any tax level without having to fear negative long term consequences. Thus, it is in his interest to exploit the gullibility of the Bs to the maximum. To obtain a meaningful, Pareto-improving solution, some flexibility is necessary to assure that the firms are not held captive by beliefs that penalize them. Only then will the Regulator be led to take the Bs' interest into account.

5. Conclusions

The starting point of this paper is a situation frequently encountered in environmental economics (and similarly in other economic contexts as well): if all firms are perfectly rational Non-Believers who make perfect predictions of the Regulator's actions and discard the Regulator's announcements as cheap talk, standard optimizing behavior leads to a Pareto-inferior outcome, although there are no conflicts of interest between the different firms and although the objectives of the firms and of the Regulator largely concur. We show that, in a static world, the existence of a positive fraction of Believers who take the Regulator's announcement at face value Pareto-improves the economic outcome. This property crucially hinges on the fact that the firms are atomistic and thus do not anticipate the collective impact of their individual decisions. The static model is extended by assuming that the proportion of Believers and Non-Believers changes over time depending on the difference in the profits made by the two types of firms. The Regulator is assumed to recognize his ability to influence the evolution of the proportion of Believers by his choice of announced and realized taxes, and to be interested not only in his instantaneous but also in his future utility.
It is shown that a rational Regulator will never steer the economy towards a Pareto-optimal equilibrium. However, his optimal policy may lead to a stable steady state with a strictly positive proportion of Believers that is Pareto-superior to the equilibrium where all agents perfectly anticipate his actions. The prerequisites for this are a sufficiently patient Regulator and firms that incur sufficiently high costs for building perfect anticipations of the government's actions and/or are suitably flexible, i.e., switch adequately fast between Believing and Non-Believing. The conjunction of beliefs dynamics for the firms and of a farsighted Regulator allows for a larger Pareto-improvement than either beliefs dynamics or farsightedness alone. Depending upon the initial proportion of Believers, the Regulator is better off if the firms are very flexible or very inflexible. Intermediate values of the flexibility parameter are never optimal for the Regulator.

References
Abrego, L. and Perroni, C. (1999). Investment Subsidies and Time-Consistent Environmental Policy. Discussion paper, University of Warwick, U.K.

Barro, R. and Gordon, D. (1983). Rules, discretion and reputation in a model of monetary policy. Journal of Monetary Economics, 12:101–122.

Batabyal, A. (1996a). Consistency and optimality in a dynamic game of pollution control I: Competition. Environmental and Resource Economics, 8:205–220.

Batabyal, A. (1996b). Consistency and optimality in a dynamic game of pollution control II: Monopoly. Environmental and Resource Economics, 8:315–330.

Biglaiser, G., Horowitz, J., and Quiggin, J. (1995). Dynamic pollution regulation. Journal of Regulatory Economics, 8:33–44.

Dawid, H. (1999). On the dynamics of word of mouth learning with and without anticipations. Annals of Operations Research, 89:273–295.

Dawid, H. and Deissenberg, C. (2004). On the efficiency-effects of private (dis-)trust in the government. Forthcoming in Journal of Economic Behavior and Organization.

Deissenberg, C. and Alvarez Gonzalez, F. (2002). Pareto-improving cheating in a monetary policy game. Journal of Economic Dynamics and Control, 26:1457–1479.

Dijkstra, B. (2002). Time Consistency and Investment Incentives in Environmental Policy. Discussion paper 02/12, School of Economics, University of Nottingham, U.K.

Gersbach, H. and Glazer, A. (1999). Markets and regulatory hold-up problems. Journal of Environmental Economics and Management, 37:151–164.

Grüne, L. and Semmler, W. (2002). Using Dynamic Programming with Adaptive Grid Scheme for Optimal Control Problems in Economics. Working paper, Center for Empirical Macroeconomics, University of Bielefeld, Germany. Forthcoming in Journal of Economic Dynamics and Control.

Hofbauer, J. and Sigmund, K. (1998). Evolutionary Games and Population Dynamics. Cambridge University Press.

Kydland, F. and Prescott, E. (1977).
Rules rather than discretion: The inconsistency of optimal plans. Journal of Political Economy, 85:473–491.

Marsiliani, L. and Renström, T. (2000). Time inconsistency in environmental policy: Tax earmarking as a commitment solution. Economic Journal, 110:C123–C138.

McCallum, B. (1997). Critical issues concerning central bank independence. Journal of Monetary Economics, 39:99–112.

Petrakis, E. and Xepapadeas, A. (2003). Location decisions of a polluting firm and the time consistency of environmental policy. Resource and Energy Economics, 25:197–214.

Simaan, M. and Cruz, J.B. (1973a). On the Stackelberg strategy in nonzero-sum games. Journal of Optimization Theory and Applications, 11:533–555.

Simaan, M. and Cruz, J.B. (1973b). Additional aspects of the Stackelberg strategy in nonzero-sum games. Journal of Optimization Theory and Applications, 11:613–626.

Vallée, T., Deissenberg, C., and Başar, T. (1999). Optimal open-loop cheating in dynamic reversed linear-quadratic Stackelberg games. Annals of Operations Research, 88:247–266.

Vallée, T. (1998). Comparison of Different Stackelberg Solutions in a Deterministic Dynamic Pollution Control Problem. Discussion Paper, LEN-C3E, University of Nantes, France.

Chapter 10

A TWO-TIMESCALE STOCHASTIC GAME FRAMEWORK FOR CLIMATE CHANGE POLICY ASSESSMENT
Alain Haurie
Abstract   In this paper we show how a multi-timescale hierarchical non-cooperative game paradigm can contribute to the development of integrated assessment models of climate change policies. We exploit the well recognized fact that the climate and economic subsystems evolve at very different time scales. We formulate the international negotiation at the level of climate control as a piecewise deterministic stochastic game played in the "slow" time scale, whereas the economic adjustments in the different nations take place in a "faster" time scale. We show how the negotiations on emissions abatement can be represented in the slow time scale, whereas the economic adjustments are represented in the fast time scale as solutions of general economic equilibrium models. We provide some indications on the integration of different classes of models that could be made, using a hierarchical game theoretic structure.

1. Introduction

The design and assessment of climate change policies, both at national and international levels, is a major challenge of the 21st century. The problem of climate change concerns all the developed and developing countries that together influence the same climate system through their emissions of greenhouse gases (GHG) associated with their economic development, and which will be differently affected by the induced climate change. Therefore the problem is naturally posed in a multi-agent dynamic decision analysis setting. The Kyoto and Marrakech protocols and the eventual continuation of the international negotiations in this domain illustrate this point. To summarize the issues, let us quote Edwards et al., 2005, who conclude their introductory paper of a book¹ on the coupling of climate and economic dynamics as follows:
- Knowledge of the dynamics of the carbon cycle and the forcing by greenhouse gases permits us to predict global climate change due to anthropogenic influences on a time scale of a century (albeit with uncertainty).
- Stabilizing the temperature change to an acceptable level calls for a drastic worldwide reduction of the GHG emissions level (to around a quarter of the 1990 level) over the next 50 years.
- Climate inertia implies that many of those who will benefit (suffer) most from our mitigation actions (lack of mitigation) are not yet born.
- The climate change impacts may be large and unequally distributed over the planet, with a heavier toll for some DCs.
- The rapid rise of GHG emissions has accompanied economic development since the beginning of the industrial era; new ways of bypassing the Kuznets curve phenomenon have to be found for permitting DCs to enter into a necessary global emissions reduction scheme.
- The energy system sustaining the world economy has to be profoundly modified; there are possible technological solutions but their implementation necessitates a drastic reorganization of the infrastructures with considerable economic and geopolitical consequences.
- The policies to implement at international level must take explicitly into account the intergenerational and interregional equity issues.
- The magnitude of the changes that will be necessary imposes the implementation of market-based instruments to limit the welfare losses of the different parties (groups of consumers) involved.

The global anthropogenic climate change problem is now relatively well identified... viable policy options will have to be designed as equilibrium solutions to dynamic games played by different groups of nations.
The paradigm of dynamic games is particularly well suited to represent the conflict of a set of economic agents (here the nations) involved jointly in the control of a complex dynamical system (the climate), over a very long time horizon, with distributed and highly unequal costs and benefits...

Integrated assessment models (IAMs) are the main tools for analyzing the interactions between climatic change and socio-economic development (see the recent survey by Toth, 2005). In a first category of IAMs one finds the models based on a paradigm of optimal economic growth à la Ramsey, 1928, to describe the world economy, associated with a simplified representation of the climate dynamics in the form of a set of differential equations. The main models in this category are DICE94 (Nordhaus, 1994), DICE99 and RICE99 (Nordhaus and Boyer, 2000), MERGE (Manne et al., 1995), and more recently ICLIPS (Toth et al., 2003 and Toth, 2005). In a second category of IAMs one finds a representation of the world economy in the form of a computable general equilibrium model (CGEM), whereas the climate dynamics is studied through a (sometimes simplified) land-and-ocean-resolving (LO) model of the atmosphere, coupled to a 3D ocean general circulation model (GCM). This is the case of the MIT Integrated Global System Model (IGSM), which is presented by Prinn et al., 1999 as "designed for simulating the global environmental changes that may arise as a result of anthropogenic causes, the uncertainties associated with the projected changes, and the effect of proposed policies on such changes". Early applications of dynamic game paradigms to the analysis of GHG induced climate change issues are reported in the book edited by Carraro and Filar, 1995.

¹ We refer the reader to this book for a more detailed presentation of the climate change policy issues.
Haurie, 1995, and Haurie and Zaccour, 1995, propose a general differential game formalism to design emission taxes in an imperfect competition context. Kaitala and Pohjola, 1995, address the issue of designing transfer payments that would make international agreements on greenhouse warming sustainable. More recent dynamic game models dealing with this issue are those by Petrosjan and Zaccour, 2003, and Germain et al., 2003. Carraro and Siniscalco, 1992, Carraro and Siniscalco, 1993, Carraro and Siniscalco, 1996, and Buchner et al., 2005, have studied the dynamic game structure of Kyoto and post-Kyoto negotiations with particular attention given to the "issue-linking" process, where agreement on the environmental agenda is linked with other possible international trade agreements (for example, R&D sharing). These game theoretic models have used very simple qualitative models or adaptations of the DICE or RICE models to represent the climate-economy interactions. Another class of game theoretic models of climate change negotiations has been developed on the basis of IAMs incorporating a CGEM description of the world economy. Kemfert, 2005, uses such an IAM (WIAGEM, which combines a CGEM with a simplified climate description) to analyze a game of climate policy cooperation between developed and developing nations. Haurie and Viguier, 2003b, Bernard et al., 2002, Haurie and Viguier, 2003a, and Viguier et al., 2004, use a two-level game structure to analyze different climate change policy issues. At a lower level the World or European economy is described as a CGEM; at an upper level a negotiation game is defined where strategies correspond to strategic negotiation decisions taken by countries or groups of countries in the Kyoto-Marrakech agreement implementation. In this paper we propose a general framework based on a multi-timescale stochastic game theoretic paradigm to build IAMs for global climate change policies.
The particular feature that we shall try to represent in our modeling exercise is the difference in timescales between the interacting economic and climate systems. In Haurie, 2003, and Haurie, 2002, some considerations have already been given to that issue. In the present paper we propose to use the formalism of hierarchical control and singular perturbation theory to take into account these features (we shall use in particular the formalism developed by Filar et al., 2001). The paper is organized as follows: In Section 2 we propose a general modeling framework for the interactions between economic development and climate change. In particular we show that the combined economy-climate dynamical system is characterized by two timescales. In Section 3 we formulate the long term game of economy and climate control, which we call the game of sustainable development. In Section 4 we exploit the multi-timescale structure of the controlled system to define a reduced order game, involving only the slowly varying climate related variables. In Section 5 we propose a research agenda for developing IAMs based on the formalism proposed in this paper.

2. Climate and economic dynamics

In this section we propose a general control-theoretic modeling framework for the representation of the interactions between the climate and economic systems.

2.1 The linking of climate and economic dynamics

2.1.1 The variables of the climate-economy system. We view the climate and the economy as two dynamical systems that are coupled. We represent the state of the economy at time t by a hybrid vector e(t) = (ζ(t), x(t)), where ζ(t) ∈ I is a discrete variable that represents a particular mode of the world economy (describing for example major technological advances, or major geopolitical reorganizations), whereas the variable x(t) represents physical capital stocks, production output, household consumption levels, etc. in different countries.
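The hybrid (discrete mode, continuous state) structure just described can be sketched as a pair of containers. This is purely illustrative, with invented names: the paper defines no such classes, and the spatial field y(t, ω) is collapsed here to a small dictionary over sample locations.

```python
# Sketch of the hybrid states e(t) = (zeta, x) and c(t) = (kappa, y) of Sec. 2.1.1.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class EconomyState:
    zeta: int                 # discrete economic/geopolitical mode, zeta(t) in I
    x: List[float]            # capital stocks, output, consumption levels, ...

@dataclass
class ClimateState:
    kappa: int                # discrete climate mode, kappa(t) in L
    y: Dict[str, float] = field(default_factory=dict)  # y(t, omega), omega in Omega

e = EconomyState(zeta=0, x=[1.0, 0.5])
c = ClimateState(kappa=0, y={"region_1": 14.2, "region_2": 9.8})  # e.g. temperatures
assert (e.zeta, c.kappa) == (0, 0)   # the pooled mode xi(t) = (zeta(t), kappa(t))
```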
We also represent the climate state as a hybrid vector c(t) = (κ(t), y(t, ·)), where κ(t) ∈ L is a discrete variable describing different possible climate modes, like e.g. those that may result from threshold events like the disruption of the thermohaline circulation or the melting of the ice sheets in Greenland or the Antarctic, and y(t, ·) = (y(t, ω) : ω ∈ Ω) is a spatially distributed variable, where the set Ω represents all the different locations on Earth where climate matters. The climate state variable y represents typically the average surface temperatures, the average precipitation levels, etc.

The world is divided into a set M of economic regions that we shall call nations. The linking between climate and economics occurs because climate change may cause damages, measured through a vulnerability index² υj(t) of region j ∈ M, and also because emissions abatement policies may exert a toll on the economy. The climate policies of the nations are summarized by the state variables zj(t), j ∈ M, which represent the cap on GHG emissions that they impose on their respective economies at time t. A control variable uj(t) will represent the abatement effort of region j ∈ M, which acts on the cap trajectory z(·) and through it on the evolution of the climate variables c(·). We summarize the set of variables and indices in Table 10.1.
Table 10.1. List of variables in the climate-economy model

    Variable name                         Meaning
    t                                     time index (fast)
    τ = εt                                "stretched out" time index
    ε                                     timescales ratio
    ω ∈ Ω                                 spatial index
    j ∈ M = {1, . . . , m}                economic region (nation, player) index
    Lj(t), j ∈ M                          population in region j ∈ M at time t
    ζ(t) ∈ I                              economic/geopolitical mode
    κ(t) ∈ L                              climate mode
    ξ(t) = (ζ(t), κ(t)) ∈ I × L           pooled mode indicator
    xj(t) ∈ Xj                            economic state variable for Nation j
    x(t) ∈ X                              economic state variable (all countries)
    y(t) = (y(t, ω) ∈ Y(ω))               climate state variable
    zj(t) ∈ Zj                            cap on GHG emissions for Nation j
    z(t) ∈ Z                              world global cap on GHG emissions
    s(t) = (z(t), y(t))                   slow paced economy-climate state variables
    uj(t) ∈ Uj                            GHG abatement effort of Nation j
    υj(t) ∈ Υj                            vulnerability indicator for Nation j

² This index is a functional of the climate state c(t).

2.1.2 The dynamics. Our understanding of the carbon cycle and of the temperature forcing due to the accumulation of CO2 in the atmosphere is that the dynamics of anthropogenic climate change has a slow pace compared to the speed of adjustment in the "fast" economic systems. We shall therefore propose a modeling framework characterized by two timescales with a ratio ε between the fast and the slow one.

Economic and climate mode dynamics. We assume that the economic and climate modes may evolve according to controlled jump processes described by the following transition rates
    P[ζ(t + δ) = ℓ | ζ(t) = k and x(t)] = p_kℓ(x(t)) δ + o(δ),  (10.1)

    lim_{δ→0} o(δ)/δ = 0,  (10.2)

and

    P[κ(t + δ) = ι | κ(t) = i and y(t)] = q_iι(y(t)) δ + o(δ),  (10.3)

    lim_{δ→0} o(δ)/δ = 0.  (10.4)

Combining the two jump processes, we have to consider the pooled mode (k, i) ∈ I × L and the possible transitions (k, i) → (ℓ, i) with rate p_kℓ(x(t)), or transitions (k, i) → (k, ι) with rate q_iι(y(t)).

Climate dynamics. The climate state variable is spatially distributed (it is typically the average surface temperatures and the precipitation levels on different points of the planet). The climate state evolves according to complex earth system dynamics with a forcing term which is directly linked with the global GHG emissions levels z(t). We may thus represent the climate state dynamics as a distributed parameter system obeying a (generalized) differential equation which is indexed over the mode (climate regime) κ(t)
    ẏ(t, ·) = g^κ(t)(z(t), y(t, ·)),  (10.5)

    y(0, ·) = y°(·).  (10.6)

Economic dynamics. The world economy can be described as a set of interconnected dynamical systems. The state dynamics of each region j ∈ M depends on the state of the other regions due to international trade and technology transfers. Each region is also characterized by the current cap on GHG emissions zj(t) and by its abatement effort uj(t). The nations j ∈ M occupy respective territories Ωj ⊂ Ω. The economic performance of Nation j will be affected by its vulnerability to climate change. In the simplest way one can represent this indicator as a vector υj(t) ∈ IR^p (e.g. the percentage loss of output for each of the p economic sectors) defined as a functional (e.g. an integral over Ωj) of a distributed damage function d̃_j^κ(t)(ω, y(t, ω)) associated with climate mode κ(t) and distributed climate state y(t, ·).
υj (t) =
Ωj ˜ξ(t) ˜ dj (ω, y (t, ω )) dω. (10.7) 10 A Two Timescale Dynamic Game Framework for Climate Change 199 Now the dynamics of the nationj economy is described by the diﬀerential equation 1 ζ (t) f (t, x(t), υj (t), zj (t)) εj xj (0) = xo j x(t) = (xj (t))j ∈M . xj (t) = ˙
ζ (t) j∈M (10.8) (10.9) (10.10) We have indexed the velocities fj (t, x(t), υj (t), zj (t)) over the discrete economic/geopolitical state ζ (t) since these diﬀerent possible modes have an inﬂuence on the evolution of the economic variables3 . The factor 1 ε where ε is small expresses the fact that the economic adjustments take place in a much faster timescale than the ones for climate or modal states. Notice also that the feedback of climate on the economies is essentially represented by the inﬂuence of the vulnerability variables on the economic dynamics of diﬀerent countries. The GHG emission cap variable zj (t) for nation j ∈ M is a controlled variable which evolves according to the dynamics zj (t) = hj ˙ zj (0) =
o zj ζ (t) (xj (t), zj (t), uj (t)) (10.11) (10.12) where uj (t) is the reduction eﬀort. This formulation should permit the analyst to represent the R&D and other adaptation actions that have been taken by a government in order to implement a GHG emissions ζ (t) reduction. Again we have indexed the velocity hj (xj (t), zj (t), uj (t)) over the economic/geopolitical modes ζ (t) ∈ I . The absence of factor 1 ε in front of velocities in Eqs. (10.11) indicates that we assume a slow pace in the emissions cap adjustments. 2.2 A two timescale system We summarize below the equations characterizing the climateeconomy system. 1 ζ (t) f (t, x(t), zj (t), υj (t)) εj xj (0) = xo j ∈ M j xj (t) = ˙ j∈M 3 We could have also indexed these state equations over the pooled modal state indicator (ζ (t), κ(t)), assuming that the climate regime might also have an inﬂuence on the economic dynamics. For the sake of simplifying the model structure we do not consider the climate regime inﬂuence on economic dynamics. 200 DYNAMIC GAMES: THEORY AND APPLICATIONS x(t) = (xj (t))j ∈M zj (t) = hζ (t) (xj (t), zj (t), uj (t)) ˙ z(t) = zj (0) = zj (t) j ∈M o zj j ∈ M κ(t) y (t, ·) = g (z(t), y (t, ·)) ˙ y (0, ·) = y o (·) υj (t) =
Ωj ˜κ(t) ˜ dj (ω, y (t, ω )) dω P[ζ (t + δ ) = ζ (t) = k and x(t)] = pk (x(t))δ + o(δ ) k, ∈ I o(δ ) = 0 k, ∈ I lim δ →0 δ P[κ(t + δ ) = ικ(t) = i and y(t)] = qiι (y(t))δ + o(δ ) k, ∈ L o(δ ) = 0 i, ι ∈ L. lim δ →0 δ In the parlance of control systems we have constructed a twotimescale piecewise deterministic setting. The time index t refers to the “climate” and slow paced socioeconomic evolutions. The parameter ε is the timescale ratio between the slow and fast timescales. The stretched out timescale is deﬁned as τ = t/ε. It will be used when we study the fast adjustments of the economy given climate, socioeconomic and abatement conditions. 3. The game of sustainable development In this section we describe the dynamic game that nations4 play when they negotiate to deﬁne the GHG emissions cap in international agreements. The negotiation process is represented by the control variables uj (t), j ∈ M . We ﬁrst deﬁne the strategy space then the payoﬀs that will be considered by the nations and we propose a characterization of the equilibrium solutions. 3.1 Strategies We assume that the nations use piecewise openloop (POL) strategies. This means that the controls uj (·) are functions of time that can be adapted after each jump of the ξ (·) process. A strategy for Nation j is therefore a mapping from (t, k, i, x, z, y) ∈ [0, ∞) × I × L × X × Z × Y
4 We identify the players as being countries (possibly group of countries in some applications). 10 A Two Timescale Dynamic Game Framework for Climate Change 201 t into the class Uj of control functions uj (τ ) : τ > t. We denote γ the vector strategy of the m nations. According to these strategies the nations (players) decide about the pace at which they will decrease their total GHG emissions over time. These decision can be revised if a modal change occurs (new climatic regime or new geopolitical conﬁguration). 3.2 Payoﬀs In this model, the nations control the system by modifying their GHG emission caps. As a consequence the climate change is altered, the damages are controlled and the economies are adapting in the fast timescale through the market dynamics. Climate and economic state have also an inﬂuence on the probability of switching to a diﬀerent modal state (geopolitical or climate regime). The payoﬀ to nations is based on a discounted sum of their welfare5 . So, we associate with an initial state (ξ o ; xo , zo , yo ) at time to = 0 and a strategy mtuple γ a payoﬀ for each nation j ∈ M deﬁned as follows
J_j^{ξ^o}(γ; t^o, x^o, z^o, y^o) =
    E_γ [ ∫_{t^o}^{∞} e^{−ρ_j (t − t^o)} W_j^{ξ(t)}(L_j(t), x_j(t), z_j(t), υ_j(t), u_j(t)) dt |
          ξ(0) = ξ^o, x(0) = x^o, z(0) = z^o, y(0) = y^o ],   j = 1, . . . , m,      (10.13)

where W_j^{(k,i)}(L_j, x_j, z_j, υ_j, u_j) represents the welfare of Nation j at a given time when the population level is L_j, the cap level is z_j, the vulnerability is υ_j and the cap reduction effort is u_j, while the economy/geopolitical-climate mode is (k, i). The parameter ρ_j is the discount rate used in Nation j.

^5 The simplest expression of it could be L(t)U(C(t)/L(t)), where L(t) is the population size, C(t) is the total consumption by households, and U(·) is the utility of per-capita consumption.

3.3 Equilibrium and DP equations

It is natural to assume that the nations will play a Nash equilibrium, i.e. a strategy m-tuple γ* such that, for each nation j, the strategy γ_j* is the best reply to the strategy choices of the other nations. The equilibrium strategy m-tuple should therefore satisfy, for all j ∈ M and all initial conditions^6 (t^o, k, i, x^o, s^o),

V_j*(k, i; t^o, x^o, s^o) = J_j^{(k,i)}(γ*; t^o, x^o, s^o) ≥ J_j^{(k,i)}([γ*^{−j}, γ_j]; t^o, x^o, s^o),      (10.14)

where the expression [γ*^{−j}, γ_j] stands for the strategy m-tuple obtained when Nation j unilaterally modifies its strategy choice. The climate-economy system model has a piecewise deterministic structure; we say that we have defined a piecewise deterministic differential game (PDDG). A dynamic programming (DP) functional equation will characterize the optimal payoff function. It is obtained by applying the Bellman optimality principle when one considers the time T of the first jump of the ξ process after the initial time t^o. This yields

V_j*(k, i; t^o, x^o, s^o) = equil. E [ ∫_{t^o}^{T} e^{−ρ_j (t − t^o)} W_j^{(k,i)}(L_j(t), x_j(t), z_j(t), υ_j(t), u_j(t)) dt
    + e^{−ρ_j (T − t^o)} V_j*(ζ(T), κ(T); T, x(T), s(T)) ],   j ∈ M,      (10.15)

where the equilibrium is taken with respect to the trajectories defined by the solutions of the state equations

ẋ_j(t) = (1/ε) f_j^{k}(t, x(t), z_j(t), υ_j(t)),   x_j(0) = x_j^o,   j ∈ M
x(t) = (x_j(t))_{j∈M}
ż_j(t) = h_j^{k}(x_j(t), z_j(t), u_j(t)),   z_j(0) = z_j^o,   j ∈ M
z(t) = Σ_{j∈M} z_j(t)
ẏ(t, ·) = g^{i}(z(t), y(t, ·)),   y(0, ·) = y^o(·)
υ_j(t) = ∫_{Ω_j} d̃_j^{i}(ω, y(t, ω)) dω.

^6 Here we use the notation s = (z, y) for the slow varying continuous state variables.

The random time T of the next jump and the new discrete state (ζ(T), κ(T)) reached at this jump time are random events, with probability distribution obtained from the transition rates p_{kℓ}(x(t)) and q_{iι}(y(t)). We can use this information to define a family of associated open-loop differential games.

3.4 A family of implicit OLE games

We recall here^7 that we can associate with a POL equilibrium in a PDDG a family of implicitly defined Open-Loop Equilibrium (OLE) problems for a class of deterministic differential games:

V_j*(k, i; x^o, s^o) = equil._{u(·)} ∫_0^{∞} e^{−[ρ_j t + ∫_0^t λ^{(k,i)}(x(s), y(s)) ds]}
    [ L_j^{k,i}(x_j(t), z_j(t), υ_j(t), u_j(t))
      + Σ_{ℓ∈I−k} p_{kℓ}(x(t)) V_j*((ℓ, i); x(t), s(t))
      + Σ_{ι∈L−i} q_{iι}(y(t)) V_j*((k, ι); x(t), s(t)) ] dt,   j ∈ M,      (10.16)

where the equilibrium is taken with respect to the trajectories defined by the solutions of the state equations

ẋ_j(t) = (1/ε) f_j^{k}(t, x(t), z_j(t), υ_j(t)),   x_j(0) = x_j^o,   j ∈ M
x(t) = (x_j(t))_{j∈M}
ż_j(t) = h_j^{k}(x_j(t), z_j(t), u_j(t)),   z_j(0) = z_j^o,   j ∈ M
z(t) = Σ_{j∈M} z_j(t)
ẏ(t, ·) = g^{i}(z(t), y(t, ·)),   y(0, ·) = y^o(·)
υ_j(t) = ∫_{Ω_j} d̃_j^{i}(ω, y(t, ω)) dω,

and where we have introduced the notation

λ^{(k,i)}(x, y) = Σ_{ℓ∈I−k} p_{kℓ}(x) + Σ_{ι∈L−i} q_{iι}(y).      (10.17)

^7 See Haurie, 1989, and Haurie and Roche, 1994, for details.

3.5 Economic interpretation

These auxiliary OLE problems offer an interesting economic interpretation. The nations that are competing through their abatement policies have to take into account the combined dynamic effect on welfare of the climate change damages and the abatement policy costs; this is represented by the term L_j^{(k,i)}(x_j(t), z_j(t), υ_j(t), u_j(t)) in the reward expression (10.16). But they also have to trade off the risks of modifying the geopolitical mode or the climate regime; the valuation of these risks at each time t is given by the terms

Σ_{ℓ∈I−k} p_{kℓ}(x(t)) V_j*((ℓ, i); x(t), s(t)) + Σ_{ι∈L−i} q_{iι}(y(t)) V_j*((k, ι); x(t), s(t))

in the integrand of (10.16). Furthermore, the associated deterministic problem involves an endogenous discount term e^{−[ρ_j t + ∫_0^t λ^{(k,i)}(x(s), y(s)) ds]}, which combines pure time preference (discount rate ρ_j) with the controlled probabilities of jumps (pooled jump rate λ^{(k,i)}(x(s), y(s))).

In solving these dynamic games the analysts will face several difficulties. The first one, related to the DP approach, is to obtain a correct evaluation of the value functions V_j*((k, i); x, s); this implies a computationally demanding fixed point calculation in a high dimensional space. A second difficulty will be to find the solutions of the associated OLEs. These are problems with a very large state space, in particular in the representation of the coupled economies of the different nations j ∈ M. A possible way to circumvent this difficulty is to exploit the hierarchical structure of this dynamic game, induced by the difference in timescale between the evolution of the economy related state variables and those linked with climate.

4. The hierarchical game and its limit equilibrium problem

In this section we propose to define an approximate dynamic game which is much simpler to analyze and to solve numerically in IAMs. This approximation is proposed by extending formally, using^8 "analogy reasoning", some results obtained in the field of control systems (one-player games) under the generic name of singular perturbation theory.
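Before moving to the hierarchical construction, the modal jump mechanism can be made concrete. The sketch below computes the pooled jump rate (10.17) and samples the time and target of the next modal switch by the standard thinning algorithm for state-dependent rates. The rate functions (two economic modes, two climate modes) are invented placeholders, not calibrated quantities from the model:

```python
import random

# Hypothetical state-dependent transition rates (placeholders, not calibrated):
# p[k][l] is the rate of the economic-mode jump k -> l as a function of x,
# q[i][j] is the rate of the climate-mode jump i -> j as a function of y.
p = {0: {1: lambda x: 0.02 + 0.01 * x}, 1: {0: lambda x: 0.03}}
q = {0: {1: lambda y: 0.005 * y}, 1: {0: lambda y: 0.001}}

def pooled_rate(k, i, x, y):
    """lambda^{(k,i)}(x, y): sum of all exit rates from pooled mode (k, i), cf. (10.17)."""
    return sum(r(x) for r in p[k].values()) + sum(r(y) for r in q[i].values())

def next_jump(k, i, x_of_t, y_of_t, rate_bound, rng=None):
    """Sample the time and target of the next modal switch by thinning,
    given an upper bound rate_bound >= lambda^{(k,i)} along the trajectory."""
    rng = rng or random.Random(0)
    t = 0.0
    while True:
        t += rng.expovariate(rate_bound)            # candidate jump time
        lam = pooled_rate(k, i, x_of_t(t), y_of_t(t))
        if rng.random() < lam / rate_bound:         # accept with prob lam / bound
            u = rng.random() * lam                  # pick which transition fired
            for l, rate in p[k].items():
                u -= rate(x_of_t(t))
                if u <= 0:
                    return t, (l, i)                # economic jump: (k,i) -> (l,i)
            for j2, rate in q[i].items():
                u -= rate(y_of_t(t))
                if u <= 0:
                    return t, (k, j2)               # climate jump: (k,i) -> (k,j2)
            return t, (k, i)                        # float guard (unreachable in exact arithmetic)

# Frozen toy trajectories x(t) = 1, y(t) = 2; the pooled rate is then 0.04.
T, new_mode = next_jump(0, 0, lambda t: 1.0, lambda t: 2.0, rate_bound=0.2)
```

The endogenous discount term of Section 3.5 is then just this survival mechanism in integral form: the factor e^{−∫λ ds} is the probability that no modal switch has yet occurred, multiplied by the pure time preference e^{−ρ_j t}.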
^8 Analogy reasoning is a sin in mathematics. It is used here to delineate the results (theorems to prove) that are needed to justify the proposed approach.

We will take our inspiration mostly from Filar et al., 2001. The reader will find in this paper a complete list of references on the theory of singularly perturbed control systems.

4.1 The singularly perturbed dynamic game

We describe here the possible extension of the fundamental technique used in singular perturbation theory for control systems, which leads to an averaging of the fast part of the system and a "lifting up" of the control problem to the upper-level slow paced system.

4.1.1 The local economic equilibrium problem. If at a given time t̄ the nations have adopted GHG emissions caps represented by z̄^{t̄}, and the state of climate ȳ^{t̄} is generating damages ῡ_j^{t̄}, j ∈ M, we call local economic equilibrium problem the solution x̄^{t̄} of the set of algebraic equations

0 = f_j^{k̄}(t̄, x̄^{t̄}, z̄_j^{t̄}, ῡ_j^{t̄}),   j ∈ M.      (10.18)
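As a purely numerical illustration of (10.18), take a hypothetical linear velocity f_j with trade spillovers between two economies; the local equilibrium is then the root of a small algebraic system, which a damped fixed-point iteration finds quickly. All coefficients below are invented for the sketch:

```python
# Hypothetical linear velocity for economy j: own decay, spillover from the
# other economies (trade), and a drag from the emissions cap z[j].
# The coefficients a, b, c, d are made up for this illustration.
def f(j, x, z, a=(1.0, 1.2), b=2.0, c=0.3, d=0.5):
    spill = sum(x[l] for l in range(len(x)) if l != j)
    return a[j] - b * x[j] + c * spill - d * z[j]

def local_equilibrium(z, n=2, tol=1e-12, max_iter=10_000):
    """Damped fixed-point iteration driving every velocity f_j to zero,
    i.e. solving 0 = f_j(x) for all j as in (10.18)."""
    x = [0.0] * n
    for _ in range(max_iter):
        deltas = [0.4 * f(j, x, z) for j in range(n)]   # Jacobi-style updates
        x = [xj + dj for xj, dj in zip(x, deltas)]
        if max(abs(d) for d in deltas) < tol:
            break
    return x

x_bar = local_equilibrium(z=[0.2, 0.4])
# At the returned point every velocity f_j is (numerically) zero.
```

In the two-timescale scheme, this is the point at which the fast variables are frozen when the slow-level problem is averaged; the damping factor 0.4 is chosen so that the iteration contracts for the invented coefficients.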
We shall now use a modiﬁcation of the time scale, called the stretched t out timescale. It is obtained when we pose τ = ε . Then we denote x(τ ) = x(τ ε) the economic trajectory when represented in this stretched ˜ out time. Assumption 10.1 We assume that the following holds for ﬁxed values ¯t ¯¯ t ¯ of time t, and slow paced state and control variables (zj , υj , ut ), j ∈ M . j
t t Lk,i (¯t , zj , υj , ut ) + j j xj ¯ ¯ ¯ ¯ pk, (¯ t )vj (( , i); st ) x
∈I−k ¯ ¯ = θ→∞ lim 1 θ θ 0 {Lk,i (˜j (t), zj (t), υj (t), uj (t)) ˜ ˜ jx
¯ +
∈I−k pk, (˜ (s))vj (( , i); , st )} dt x j∈M j∈M (10.19) s.t. ¯ k¯ t ˙ ˜ xj (τ ) = fj (t, x(τ ), zj ) ˜ xj (0) = xo j ∈ M ˜ j xj (θ) = xf ˜ j j ∈ M. (10.20) (10.21) (10.22) The problem (10.19)(10.22) consists in averaging the part of the instantaneous reward in the associated OLE game that depends on the fast ˜ economic variable x. This averaging is made over the x(·) trajectory, when the timescale has been stretched out and when a potential func¯ tion vj (( , ι); , st ) is used instead of the true equilibrium value function 206 DYNAMIC GAMES: THEORY AND APPLICATIONS Vj∗ ((k, ι); x, s). The condition says that this averaging problem has a value which is given by the reward associated with the local economic ¯ equilibrium xt , j ∈ M corresponding to the solution of ¯j
¯t ¯ k¯ ¯ t ¯ 0 = fj (t, xt , zj , υj ) j ∈ M. (10.23) Clearly this assumption holds if the economic equilibrium is a stable point for the dynamical system (10.20) in the stretched out timescale. 4.1.2 The limit equilibrium problem. In control systems an assumption similar to Assumption 10.1 permits one to “lift” the optimal control problem to the “upper level” system which is uniquely concerned with the slow varying variables. The basic idea consists in splitting the time axis [0, ∞) into a succession of small intervals of length Δ(ε) which will tend to 0 together with the timescale ratio ε in such a way that Δ(ε) → ∞. Then using the averaging property (10.19)(10.22) when ε ε → 0 one deﬁnes an approximate control problem called the limit control problem which is uniquely deﬁned at the level of the slow paced variable. We propose here the analogous limit equilibrium problem for the multiagent system that we are studying. We look for equilibrium (potential) ∗ value functions vj ((k, i); , s), j ∈ M (where we use the notation s = (z, y)) that satisfy the family of associated OLE deﬁned as follows
v_j*((k, i); s^o) = equil. ∫_0^{∞} e^{−[ρ_j t + ∫_0^t λ^{(k,i)}(x̄(s), y(s)) ds]}
    [ L_j^{k,i}(x̄_j(t), z_j(t), υ_j(t), u_j(t))
      + Σ_{ℓ∈I−k} p_{kℓ}(x̄(t)) v_j*((ℓ, i); s(t))
      + Σ_{ι∈L−i} q_{iι}(y(t)) v_j*((k, ι); s(t)) ] dt,   j ∈ M,

s.t.

0 = f_j^{k}(t, x̄(t), z_j(t), υ_j(t)),   j ∈ M
ż_j(t) = h_j^{k}(x̄_j(t), z_j(t), u_j(t)),   z_j(0) = z_j^o,   j ∈ M
z(t) = Σ_{j∈M} z_j(t)
ẏ(t, ·) = g^{i}(z(t), y(t, ·)),   y(0, ·) = y^o(·)
υ_j(t) = ∫_{Ω_j} d̃_j^{i}(ω, y(t, ω)) dω,   j ∈ M.

In this problem, the economic variables x̄(t) are no longer state variables; they do not appear in the arguments of the equilibrium value functions v_j*((k, i); s), j ∈ M. They now serve as auxiliary variables in the definition of the nations' rewards once an emission cap policy has been selected. The reduction in state space dimension is therefore very important, and it can be envisioned to solve these equations numerically in simulations of IAMs.

5. A research agenda

The reduction of complexity obtained in the limit equilibrium problem is potentially very important. An attempt to solve the resulting dynamic game numerically could be considered, although a high dimensional state variable, the climate descriptor y(t), remains in this limit problem. More research is needed before such an attempt could succeed. We give below a few hints about the topics that need further clarification.

5.1 Comparison of economic and climate timescales

The pace of anthropogenic climate change is still a matter of controversy. A better understanding of the influence of GHG emissions on climate change should emerge from the development of better intermediate complexity models. Recent experiments by Drouet et al., 2005a, and Drouet et al., 2005b, on the coupling of economic growth models with climate models tend to clarify the difference in adjustment speeds between the two systems.

5.2 Approximations of equilibrium in a two-timescale game

The study of control systems with two timescales has been developed under the generic name of "singular perturbation" theory. A rigorous extension of the control results to a game-theoretic equilibrium solution environment still remains to be done.

5.3 Viability approach

Aubin et al., 2005, propose an approach more encompassing than game theory to study the dynamics of climate-economy systems. The concept of viability could be introduced in the piecewise deterministic formalism proposed here, instead of the more "teleonomic" equilibrium solution concept.

6. Conclusion

In this paper we have proposed to use a formalism directly inspired by the system control and dynamic game literature to model the climate-economy interplay that characterizes climate policy negotiations. The Kyoto protocol is the first example of an international agreement on GHG emissions abatement. It should be followed by other complex negotiations between nations, with long term economic and geopolitical consequences at stake. The framework of stochastic piecewise deterministic games with two timescales offers an interesting paradigm for the construction of IAMs dealing with long term international climate policy. The examples, given in the introduction, of the first experiments with the use of hierarchical dynamic games to study real life policies in the realm of the Kyoto protocol tend to show that the approach could lead to interesting policy evaluation tools. It is remarkable that economic growth models as well as climate models are very close to the general system control paradigm. In the proposed research agenda we have indicated the type of developments that are needed to make this approach operational for climate policy assessment.

Acknowledgments. This research has been supported by the Swiss NSF under the NCCR-Climate program. I thank in particular Laurent Viguier for the enlightening collaboration on the economic modeling of climate change, and Biancamaria D'Onofrio and Francesco Moresino for fruitful exchanges on singularly perturbed stochastic games.

References
Aubin, J.-P., Bernardo, T., and Saint-Pierre, P. (2005). A viability approach to global climate change issues. In: Haurie, A. and Viguier, L. (eds), The Coupling of Climate and Economic Dynamics, pp. 113–140, Springer.

Bernard, A., Haurie, A., Vielle, M., and Viguier, L. (2002). A Two-level Dynamic Game of Carbon Emissions Trading Between Russia, China, and Annex B Countries. Working Paper 11, NCCR-WP4, Geneva. To appear in Journal of Economic Dynamics and Control.

Buchner, B., Carraro, C., Cersosimo, I., and Marchiori, C. (2005). Back to Kyoto? US participation and the linkage between R&D and climate cooperation. In: Haurie, A. and Viguier, L. (eds), The Coupling of Climate and Economic Dynamics, Advances in Global Change Research, Springer.

Carraro, C. and Filar, J.A. (1995). Control and Game-Theoretic Models of the Environment, volume 2 of Annals of the International Society of Dynamic Games. Birkhäuser, Boston.

Carraro, C. and Siniscalco, D. (1992). The international dimension of environmental policy. European Economic Review, 36:379–387.

Carraro, C. and Siniscalco, D. (1993). Strategies for the international protection of the environment. Journal of Public Economics, 52:309–328.

Carraro, C. and Siniscalco, D. (1996). R&D cooperation and the stability of international environmental agreements. In: Carraro, C. and Siniscalco, D. (eds), International Environmental Negotiations, Kluwer Academic Publishers.

Drouet, L., Beltran, C., Edwards, N.R., Haurie, A.B., Vial, J.-P., and Zachary, D.S. (2005a). An oracle method to couple climate and economic dynamics. In: Haurie, A. and Viguier, L. (eds), The Coupling of Climate and Economic Dynamics, Springer.

Drouet, L., Edwards, N.R., and Haurie, A. (2005b). Coupling climate and economic models in a cost-benefit framework: A convex optimization approach. Environmental Modeling and Assessment.

Edwards, N.R., Greppin, H., Haurie, A.B., and Viguier, L. (2005). Linking climate and economic dynamics. In: Haurie, A. and Viguier, L. (eds), The Coupling of Climate and Economic Dynamics, Advances in Global Change Research, Springer.

Filar, J., Gaitsgory, V., and Haurie, A. (2001). Control of singularly perturbed hybrid stochastic systems. IEEE Transactions on Automatic Control, 46(2):179–190.

Germain, M., Toint, P., Tulkens, H., and de Zeeuw, A. (2003). Transfers to sustain dynamic core-theoretic cooperation in international stock pollutant control. Journal of Economic Dynamics and Control, 28:79–99.

Haurie, A. (1989). Piecewise deterministic differential games. In: Bernhard, P. and Başar, T. (eds), Differential Games and Applications, Lecture Notes in Control and Information Sciences, volume 119, pp. 114–127, Springer Verlag.

Haurie, A. (1995). Environmental coordination in dynamic oligopolistic markets. Group Decision and Negotiation, 4:46–67.

Haurie, A. (2002). Turnpikes in multi-discount rate environments and GCC policy evaluation. In: Zaccour, G. (ed.), Optimal Control and Differential Games: Essays in Honor of Steffen Jørgensen, volume 5 of Advances in Computational Management Science, Kluwer Academic Publishers.

Haurie, A. (2003). Integrated assessment modeling for global climate change: an infinite horizon optimization viewpoint. Environmental Modeling and Assessment, 8(3):117–132.

Haurie, A. and Roche, M. (1994). Turnpikes and computation of piecewise open-loop equilibria in stochastic differential games. Journal of Economic Dynamics and Control, 18:317–344.

Haurie, A. and Viguier, L. (2003a). A stochastic dynamic game of carbon emissions trading. Environmental Modeling & Assessment, 8(3):239–248.

Haurie, A. and Viguier, L. (2003b). A stochastic game of carbon emissions trading. Environmental Modeling and Assessment, 8(3):239–248.

Haurie, A. and Zaccour, G. (1995). Differential game models of global environmental management. In: Carraro, C. and Filar, J. (eds), Control and Game-Theoretic Models of the Environment, volume 2 of Annals of the International Society of Dynamic Games, pp. 3–23, Birkhäuser.

Kaitala, V. and Pohjola, M. (1995). Sustainable international agreements on greenhouse warming: a game theory study. In: Carraro, C. and Filar, J. (eds), Control and Game-Theoretic Models of the Environment, volume 2 of Annals of the International Society of Dynamic Games, pp. 67–87, Birkhäuser.

Kemfert, C. (2005). Climate policy cooperation games between developed and developing nations: A quantitative applied analysis. In: Haurie, A. and Viguier, L. (eds), The Coupling of Climate and Economic Dynamics, Advances in Global Change Research, Springer.

Manne, A.S., Mendelsohn, R., and Richels, R.G. (1995). MERGE: A model for evaluating regional and global effects of GHG reduction policies. Energy Policy, 23:17–34.

Nordhaus, W.D. (1994). Managing the Global Commons: The Economics of Climate Change. MIT Press, Cambridge, MA.

Nordhaus, W.D. and Boyer, J. (2000). Warming the World: Economic Models of Global Warming. MIT Press, Cambridge, MA.

Petrosjan, L. and Zaccour, G. (2003). Time-consistent Shapley value allocation of pollution cost reduction. Journal of Economic Dynamics and Control, 27:381–398.

Prinn, R., Jacoby, H., Sokolov, A., Wang, C., Xiao, X., Yang, Z., Eckaus, R., Stone, P., Ellerman, D., Melillo, J., Fitzmaurice, J., Kicklighter, D., Holian, G., and Liu, Y. (1999). Integrated global system model for climate policy assessment: Feedbacks and sensitivity studies. Climatic Change, 41(3/4):469–546.

Ramsey, F.P. (1928). A mathematical theory of saving. Economic Journal, 38(152):543–559.

Toth, F.L. (2005). Coupling climate and economic dynamics: recent achievements and unresolved problems. In: Haurie, A. and Viguier, L. (eds), The Coupling of Climate and Economic Dynamics, Springer.

Toth, F.L., Bruckner, T., Füssel, H.-M., Leimbach, M., and Petschel-Held, G. (2003). Integrated assessment of long-term climate policies: Part 1, model presentation. Climatic Change, 56:37–56.

Viguier, L., Vielle, M., Haurie, A., and Bernard, A. (2004). A two-level computable equilibrium model to assess the strategic allocation of emission allowances within the European Union. To appear in Computers & Operations Research.

Chapter 11

A DIFFERENTIAL GAME OF ADVERTISING FOR NATIONAL AND STORE BRANDS
Salma Karray Georges Zaccour
Abstract We consider a differential game model for a marketing channel formed by one manufacturer and one retailer. The latter sells the manufacturer's product and may also introduce a private label at a lower price than the manufacturer's brand. The aim of this paper is twofold. We first assess, in a dynamic context, the impact of a private label introduction on the players' payoffs. If introducing his own brand turns out to be beneficial for the retailer and detrimental to the manufacturer, we then investigate whether a cooperative advertising program could help the manufacturer mitigate the negative impact of the private label.

1. Introduction

Private labels (or store brands) are taking increasing shares of the retail market in Europe and North America. National manufacturers are threatened by such private labels, which can cannibalize their market shares and steal their consumers, but they can also benefit from the store traffic generated by their presence. In any event, the store brand introduction in a product category affects both retailers' and manufacturers' marketing decisions and profits. This impact has been studied using static game models with prices as sole decision variables. Mills (1995, 1999) and Narasimhan and Wilcox (1998) showed that, for a bilateral monopoly, the presence of a private label gives a bigger bargaining power to the retailer and increases her profit, while the manufacturer gets a lower profit. Adding competition at the manufacturing level, Raju et al. (1995) identified factors favorable to the introduction of a private label for the retailer. They showed in a static context that price competition between the store and the national brands, and between national brands, has a considerable impact on the profitability of the private label introduction.
Although price competition is important to understand the competitive interactions between national and private labels, the retailer's promotional decisions also affect the sales of both products (Dhar and Hoch 1997). Many retailers do indeed accompany the introduction of a private label with heavy store promotions and, in some product categories, invest more funds to promote their own brand than to promote the national ones (Chintagunta et al. 2002). In this paper, we present a dynamic model for a marketing channel formed by one manufacturer and one retailer. The latter sells the manufacturer's product (the national brand) and may also introduce a private brand, which would be offered to consumers at a lower price than the manufacturer's brand. The aim of this paper is twofold. We first assess in a dynamic context the impact of a private label introduction on the players' profits. If we find the same results obtained from static models, i.e., that it is beneficial for the retailer to propose his brand to consumers and detrimental to the manufacturer, we then wish to investigate whether a cooperative advertising program could help the manufacturer to mitigate, at least partially, the negative impact of the private label. A cooperative advertising (or promotion) program is a cost sharing mechanism whereby a manufacturer pays part of the cost incurred by a retailer to promote the manufacturer's brand. One of the first attempts to study cooperative advertising using a (static) game model is Berger (1972). He studied a case where the manufacturer gives an advertising allowance to his retailer as a fixed discount per item purchased, and showed that quantitative analysis is a powerful tool to maximize the profits in the channel. Dant and Berger (1996) used a Stackelberg game to demonstrate that an advertising allowance increases the retailer's level of local advertising and total channel profits.
Bergen and John (1997) examined a static game with two channel structures: a manufacturer with two competing retailers, and two manufacturers with two competing retailers. They showed that the participation of the manufacturers in the advertising expenses of their dealers increases with the degree of competition between these dealers, with advertising spillover, and with consumers' willingness to pay. Kim and Staelin (1999) also explored the two-manufacturers, two-retailers channel, where the cooperative strategy is based on advertising allowances. Studies of cooperative advertising as a coordinating mechanism in a dynamic context are of recent vintage (see, e.g., Jørgensen et al. (2000, 2001), Jørgensen and Zaccour (2003), Jørgensen et al. (2003)). Jørgensen et al. (2000) examine a case where both channel members make both long and short term advertising efforts, to stimulate current sales and build up goodwill. The authors suggest a cooperative advertising program that can take different forms, i.e., a full-support program where the manufacturer contributes to both types of the retailer's advertising expenditures (long and short term), or a partial-support program where the manufacturer supports only one of the two types of retailer advertising. The authors show that all three cooperative advertising programs are Pareto-improving (profit-wise) and that both players prefer the full support program. The conclusion is thus that a coop advertising program is a coordinating mechanism also in a dynamic setting. Due to the special structure of the game, long term advertising strategies are constant over time. This is less realistic in a dynamic game with an infinite time horizon. A more intuitive strategy is obtained in Jørgensen et al. (2001).
This paper reconsiders the issue of cooperative advertising in a two-member channel in which there is, however, only one type of advertising for each player. The manufacturer advertises in national media while the retailer promotes the brand locally. The sales response function is linear in promotion and concave in goodwill. The dynamics are a Nerlove-Arrow-type goodwill evolution equation, depending only on the manufacturer's national advertising. In this case, one obtains a nondegenerate Markovian advertising strategy, linearly decreasing in goodwill. In Jørgensen et al. (2000, 2001), it is an assumption that the retailer's promotion positively affects the brand image (goodwill stock). Jørgensen et al. (2003) study the case where promotions damage the brand image and ask whether a cooperative advertising program is meaningful in such a context. The answer is yes if the initial brand image is "weak", or if the initial brand image is at an "intermediate" level and retailer promotions are not "too" damaging to the brand image. Jørgensen and Zaccour (2003) suggest an extension of the setup in Jørgensen et al. (2003). The idea now is that excessive promotion, and not the instantaneous action, is harmful to the brand image. To achieve our objective, we shall consider three scenarios or games:

1. Game N: the retailer carries only the National brand and no cooperative advertising program is available. The manufacturer and the retailer play a noncooperative game and a feedback Nash equilibrium is found.

2. Game S: the retailer offers a Store brand along with the manufacturer's product and there is no cooperative advertising program.
The mode of play is noncooperative and a feedback Nash equilibrium is the solution concept.

3. Game C: the retailer still offers both brands and the manufacturer proposes to the retailer a Cooperative advertising program. The game is played à la Stackelberg with the manufacturer as leader. As in the two other games, we adopt a feedback information structure.

Comparing the players' payoffs in the first two games allows us to measure the impact of the private label introduction by the retailer. Comparing the players' payoffs in the last two games permits us to see whether a cooperative advertising program reduces the harm of the private label for the manufacturer. A necessary condition for the coop plan to be attractive is that it also improves the retailer's profit; otherwise the retailer will not accept to implement it.

The remainder of this paper is organized as follows: In Section 2 we introduce the differential game model and rigorously define the three above games. In Section 3 we derive the equilibria for the three games and we compare the results in Section 4. In Section 5 we conclude.

2. Model

Let the marketing channel be formed of a manufacturer (player M) and a retailer (player R). The manufacturer controls the rate of national advertising for his brand, A(t), t ∈ [0, ∞). Denote by G(t) the goodwill of the manufacturer's brand, whose dynamics evolve à la Nerlove and Arrow (1962):

Ġ(t) = λA(t) − δG(t),   G(0) = G_0 ≥ 0,      (11.1)

where λ is a positive scaling parameter and δ > 0 is the decay rate. The retailer controls the promotion efforts for the national brand, denoted by p_1(t), and for the store brand, denoted by p_2(t). We consider that promotions have an immediate impact on sales and do not affect the goodwill of the brand. The demand functions for the national brand (Q_1) and for the store brand (Q_2) are as follows:

Q_1(p_1, p_2, G) = α p_1(t) − β p_2(t) + θ G(t) − μ G²(t),      (11.2)
Q_2(p_1, p_2, G) = α p_2(t) − ψ p_1(t) − γ G(t),                (11.3)

where α, β, θ, μ, ψ, and γ are positive parameters. Thus, the demand for each brand depends on the retailer's promotions for both brands and on the goodwill of the national brand. Both demands are linear in promotions.
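As a numerical illustration, the goodwill dynamics (11.1) and the demands (11.2)-(11.3) can be simulated under constant controls. The parameter values below are arbitrary choices made only for the sketch; they satisfy assumption A1 introduced in the next paragraphs (α > ψ > β > 0):

```python
# Arbitrary illustrative parameters (not calibrated); they satisfy
# alpha > psi > beta > 0 and keep dQ1/dG = theta - 2*mu*G positive
# over the simulated goodwill range.
lam, delta = 1.0, 0.1
alpha, beta, psi = 2.0, 0.3, 0.8
theta, mu, gamma = 1.0, 0.05, 0.2

def demands(p1, p2, G):
    Q1 = alpha * p1 - beta * p2 + theta * G - mu * G**2  # national brand, (11.2)
    Q2 = alpha * p2 - psi * p1 - gamma * G               # store brand, (11.3)
    return Q1, Q2

def simulate(A, p1, p2, G0=0.0, T=50.0, dt=0.01):
    """Euler integration of Gdot = lam*A - delta*G under constant controls."""
    G = G0
    for _ in range(int(T / dt)):
        G += dt * (lam * A - delta * G)
    return G, demands(p1, p2, G)

G_T, (Q1, Q2) = simulate(A=0.5, p1=1.0, p2=1.0)
# With constant advertising, G(t) approaches the steady state lam*A/delta = 5.
```

The simulation makes the Nerlove-Arrow structure visible: advertising builds the goodwill stock toward λA/δ, which in turn raises the national-brand demand and depresses the store-brand demand through the γG(t) term.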
11 A Differential Game of Advertising for National and Store Brands

We have assumed for simplicity that the sensitivity of demand to own promotion is the same for both brands, considering that the retailer usually uses the same media and methods to promote both brands. However, the cross effect is different, allowing for asymmetry in brand substitution. We assume that own-brand promotion has a greater impact on sales, in absolute value, than competitive-brand promotion, i.e., α > β and α > ψ. This assumption mirrors the one usually made on prices in oligopoly theory. We further suppose that the marginal effect of promoting the national brand on the sales of the store brand is higher than the marginal effect of promoting the store brand on the sales of the national brand, i.e., ψ > β. This actually means that the manufacturer's brand enjoys a priori a stronger consumer preference than the retailer's. Putting these inequalities together leads to the following assumption:

    A1: α > ψ > β > 0.

Finally, the demand for the national brand is concave increasing in its goodwill (i.e., ∂Q₁/∂G = θ − 2μG > 0, ∀G > 0) and the demand for the store brand is decreasing in that goodwill.

Denote by D(t), 0 ≤ D(t) ≤ 1, the coop participation rate of the manufacturer in the retailer's promotion cost for the national brand. We assume, as in, e.g., Jørgensen et al. (2000, 2003), that the players face quadratic advertising and promotion costs. The net costs incurred by the manufacturer and the retailer are as follows:

    C_M(A) = (u_M/2) A²(t) + (u_R/2) D(t) p₁²(t),
    C_R(p₁, p₂) = (u_R/2) [(1 − D(t)) p₁²(t) + p₂²(t)],

where u_R, u_M > 0. Denote by m₀ the manufacturer's margin, by m₁ the retailer's margin on the national brand, and by m₂ her margin on the store brand. Based on empirical observations, we suppose that the retailer has a higher margin on the private label than on the national brand, i.e., m₂ > m₁.
Ailawadi and Harlam (2004) indeed found that, for product categories where national brands are heavily advertised, percent retail margins are significantly higher for store brands than for national brands.

We denote by r the common discount rate and assume that each player maximizes her stream of discounted profit over an infinite horizon. Omitting the time argument when no ambiguity can arise, the optimization problems of players M and R in the different games are as follows.

Game C: Both brands are offered and a coop program is available.
    max_{A,D} J_M^C = ∫₀^∞ e^{−rt} [ m₀(αp₁ − βp₂ + θG − μG²) − (u_M/2)A² − (u_R/2) D p₁² ] dt,

    max_{p₁,p₂} J_R^C = ∫₀^∞ e^{−rt} [ m₁(αp₁ − βp₂ + θG − μG²) + m₂(αp₂ − ψp₁ − γG) − (u_R/2)((1 − D)p₁² + p₂²) ] dt.

Game S: Both brands are available and there is no coop program.
    max_A J_M^S = ∫₀^∞ e^{−rt} [ m₀(αp₁ − βp₂ + θG − μG²) − (u_M/2)A² ] dt,

    max_{p₁,p₂} J_R^S = ∫₀^∞ e^{−rt} [ m₁(αp₁ − βp₂ + θG − μG²) + m₂(αp₂ − ψp₁ − γG) − (u_R/2)(p₁² + p₂²) ] dt.

Game N: Only the manufacturer's brand is offered and there is no coop program.
    max_A J_M^N = ∫₀^∞ e^{−rt} [ m₀(αp₁ + θG − μG²) − (u_M/2)A² ] dt,

    max_{p₁} J_R^N = ∫₀^∞ e^{−rt} [ m₁(αp₁ + θG − μG²) − (u_R/2)p₁² ] dt.

3. Equilibria

We characterize in this section the equilibria of the three games. In all cases, we assume that the players adopt stationary Markovian strategies, which is rather standard in infinite-horizon differential games. The following proposition gives the result for Game N.

Proposition 11.1 When the retailer does not sell a store brand and the manufacturer does not provide any coop support to the retailer, stationary feedback Nash advertising and promotional strategies are given by
    p₁^N = αm₁/u_R,   A^N(G) = X + YG,

where

    X = 2m₀θλ / ((r + 2√Δ₁) u_M),   Y = (r + 2δ − 2√Δ₁) / (2λ),   Δ₁ = (δ + r/2)² + 2μm₀λ²/u_M.

Proof. A sufficient condition for a stationary feedback Nash equilibrium is the following: suppose there exists a unique and absolutely continuous solution G(t) to the initial value problem, and there exist bounded and continuously differentiable functions V_i : ℝ₊ → ℝ, i ∈ {M, R}, such that the Hamilton-Jacobi-Bellman (HJB) equations are satisfied for all G ≥ 0:

    rV_M(G) = max_{A ≥ 0} { m₀(αp₁ + θG − μG²) − (u_M/2)A² + V_M'(G)(λA − δG) },   (11.4)
    rV_R(G) = max_{p₁ ≥ 0} { m₁(αp₁ + θG − μG²) − (u_R/2)p₁² + V_R'(G)(λA − δG) }.   (11.5)

The maximization of the right-hand sides of equations (11.4) and (11.5) yields the following advertising and promotional rates:

    A(G) = (λ/u_M) V_M'(G),   p₁ = αm₁/u_R.

Substituting the above into (11.4) and (11.5) leads to the following expressions:

    rV_M(G) = m₀(α²m₁/u_R + θG − μG²) + (λ²/(2u_M)) (V_M'(G))² − δG V_M'(G),   (11.6)
    rV_R(G) = m₁(α²m₁/(2u_R) + θG − μG²) + V_R'(G) ((λ²/u_M) V_M'(G) − δG).   (11.7)

It is easy to show that the following quadratic value functions solve the HJB equations:

    V_M(G) = a₁ + a₂G + (a₃/2)G²,   V_R(G) = b₁ + b₂G + (b₃/2)G²,

where a₁, a₂, a₃, b₁, b₂, b₃ are constants. Substitute V_M(G), V_R(G) and their derivatives into equations (11.6) and (11.7) to obtain:

    r(a₁ + a₂G + (a₃/2)G²) = m₀α²m₁/u_R + λ²a₂²/(2u_M) + (m₀θ − δa₂ + λ²a₂a₃/u_M) G − (μm₀ + δa₃ − λ²a₃²/(2u_M)) G²,

    r(b₁ + b₂G + (b₃/2)G²) = α²m₁²/(2u_R) + λ²a₂b₂/u_M + (m₁θ − δb₂ + (λ²/u_M)(a₂b₃ + a₃b₂)) G − (m₁μ + δb₃ − (λ²/u_M) a₃b₃) G².

By identification, we obtain the following values for the coefficients of the value functions:

    a₃ = (δ + r/2 ± √Δ₁) / (λ²/u_M),   b₃ = −m₁μ / (r/2 + δ − (λ²/u_M)a₃),
    a₂ = m₀θ / (r + δ − (λ²/u_M)a₃),   b₂ = (m₁θ + (λ²/u_M) a₂b₃) / (r + δ − (λ²/u_M)a₃),
    a₁ = m₀α²m₁/(ru_R) + λ²a₂²/(2ru_M),   b₁ = α²m₁²/(2ru_R) + λ²a₂b₂/(ru_M),

where Δ₁ = (δ + r/2)² + 2μm₀λ²/u_M. To obtain an asymptotically stable steady state, choose the negative root for a₃. Note that the identified solution must satisfy the constraint A(G) > 0. Since (λ/u_M)V_M'(G) = A(G), this holds for G ∈ [0, Ḡ^N), where Ḡ^N = −a₂/a₃ and

    A(G) = (λ/u_M) V_M'(G) = 2m₀θλ/((r + 2√Δ₁)u_M) + ((r + 2δ − 2√Δ₁)/(2λ)) G.  □

The above proposition shows that the retailer always promotes the manufacturer's brand at a positive constant rate and that the advertising strategy is decreasing in the goodwill. The next proposition characterizes the feedback Nash equilibrium in Game S.

Proposition 11.2 When the retailer does sell a store brand and the manufacturer does not provide any coop support to the retailer, assuming an interior solution, stationary feedback Nash advertising and promotional strategies are given by
    p₁^S = (αm₁ − ψm₂)/u_R,   p₂^S = (αm₂ − βm₁)/u_R,   A^S(G) = A^N(G).

Proof. The proof proceeds exactly as the previous one, so we report only the important steps. The HJB equations are given by:

    rV_M(G) = max_{A ≥ 0} { m₀(αp₁ − βp₂ + θG − μG²) − (u_M/2)A² + V_M'(G)(λA − δG) },
    rV_R(G) = max_{p₁, p₂ ≥ 0} { m₁(αp₁ − βp₂ + θG − μG²) + m₂(αp₂ − ψp₁ − γG) − (u_R/2)(p₁² + p₂²) + V_R'(G)(λA − δG) }.

The maximization of the right-hand sides of the above equations yields the following advertising and promotional rates:

    A(G) = (λ/u_M) V_M'(G),   p₁ = (αm₁ − ψm₂)/u_R,   p₂ = (αm₂ − βm₁)/u_R.

We next insert the values of A(G), p₁ and p₂ into the HJB equations and assume that the resulting equations are solved by the following quadratic functions:

    V_M(G) = s₁ + s₂G + (s₃/2)G²,   V_R(G) = k₁ + k₂G + (k₃/2)G²,

where s₁, s₂, s₃, k₁, k₂, k₃ are constants. Following the same procedure as in the proof of the previous proposition, we obtain

    s₃ = (δ + r/2 ± √Δ₂) / (λ²/u_M),   k₃ = −m₁μ / (r/2 + δ − (λ²/u_M)s₃),
    s₂ = m₀θ / (r + δ − (λ²/u_M)s₃),   k₂ = (m₁θ − m₂γ + (λ²/u_M) k₃s₂) / (r + δ − (λ²/u_M)s₃),
    s₁ = m₀(α(m₁α − m₂ψ) − β(m₂α − m₁β))/(ru_R) + λ²s₂²/(2ru_M),
    k₁ = ((m₁α − m₂ψ)² + (m₂α − m₁β)²)/(2ru_R) + λ²k₂s₂/(ru_M),

where Δ₂ = Δ₁ = (δ + r/2)² + 2μm₀λ²/u_M. In order to obtain an asymptotically stable steady state, we choose the negative root for s₃. The assumption A(G) > 0 holds for G ∈ [0, Ḡ^S), where Ḡ^S = −s₂/s₃. Note also that s₃ = a₃, s₂ = a₂ and k₃ = b₃. Thus A^S(G) = A^N(G) and Ḡ^S = Ḡ^N.  □

For p₁^S = (αm₁ − ψm₂)/u_R to be positive, and thus the solution to be interior, it is necessary that αm₁ − ψm₂ > 0. This means that the retailer will promote the national brand only if the marginal revenue from doing so exceeds the marginal loss on the store brand. This condition thus has an important impact on the results, and we shall come back to it in the conclusion.

Remark 11.1 Under A1 (α > ψ > β > 0) and the assumption that m₂ > m₁, the retailer will always promote her own brand, i.e., p₂^S = (αm₂ − βm₁)/u_R > 0.

In the last game, the manufacturer offers a coop promotion program to her retailer and acts as leader in a Stackelberg game. The results are summarized in the following proposition.

Proposition 11.3 When the retailer does sell a store brand and the manufacturer provides coop support to the retailer, assuming an interior solution, stationary feedback Stackelberg advertising and promotional strategies are given by
    p₁^C = (2αm₀ + (αm₁ − ψm₂))/(2u_R),   p₂^C = (αm₂ − βm₁)/u_R,
    A^C(G) = A^S(G),   D = (2αm₀ − (αm₁ − ψm₂))/(2αm₀ + (αm₁ − ψm₂)).

Proof. We first obtain the reaction functions of the follower (retailer) to the leader's announcement of an advertising strategy and a coop support rate. The retailer's HJB equation is the following:

    rV_R(G) = max_{p₁, p₂ ≥ 0} { m₁(αp₁ − βp₂ + θG − μG²) + m₂(αp₂ − ψp₁ − γG) − (u_R/2)((1 − D)p₁² + p₂²) + V_R'(G)(λA − δG) }.   (11.8)

Maximization of the right-hand side of (11.8) yields

    p₁ = (αm₁ − ψm₂)/(u_R(1 − D)),   p₂ = (αm₂ − βm₁)/u_R.   (11.9)

The manufacturer's HJB equation is:

    rV_M(G) = max_{A ≥ 0, 0 ≤ D ≤ 1} { m₀(αp₁ − βp₂ + θG − μG²) − (u_M/2)A² − (u_R/2) D p₁² + V_M'(G)(λA − δG) }.

Substituting the promotion rates from (11.9) into the manufacturer's HJB equation yields

    rV_M(G) = max_{A,D} { m₀( α(αm₁ − ψm₂)/(u_R(1 − D)) − β(αm₂ − βm₁)/u_R + θG − μG² ) − (u_M/2)A² − (u_R/2) D ((αm₁ − ψm₂)/(u_R(1 − D)))² + V_M'(G)(λA − δG) }.

Maximizing the right-hand side leads to

    A(G) = (λ/u_M) V_M'(G),   D = (2αm₀ − (αm₁ − ψm₂))/(2αm₀ + (αm₁ − ψm₂)).   (11.10)

Using (11.9) and (11.10) provides the retailer's promotional strategies:

    p₁ = (2αm₀ + (αm₁ − ψm₂))/(2u_R),   p₂ = (αm₂ − βm₁)/u_R.

Following a procedure similar to the one in the proof of Proposition 11.1, it is easy to check that the following quadratic value functions solve the HJB equations:

    V_M(G) = n₁ + n₂G + (n₃/2)G²,   V_R(G) = l₁ + l₂G + (l₃/2)G²,

where n₁, n₂, n₃, l₁, l₂, l₃ are constants given by:

    n₃ = (δ + r/2 ± √Δ₃) / (λ²/u_M),   l₃ = −m₁μ / (r/2 + δ − (λ²/u_M)n₃),
    n₂ = m₀θ / (r + δ − (λ²/u_M)n₃),   l₂ = (m₁θ − m₂γ + (λ²/u_M) l₃n₂) / (r + δ − (λ²/u_M)n₃),
    n₁ = (2αm₀ + (αm₁ − ψm₂))²/(8ru_R) − m₀β(αm₂ − βm₁)/(ru_R) + λ²n₂²/(2ru_M),
    l₁ = (αm₁ − ψm₂)(2αm₀ + (αm₁ − ψm₂))/(4ru_R) + (αm₂ − βm₁)²/(2ru_R) + λ²l₂n₂/(ru_M),

where Δ₃ = Δ₂ = Δ₁ = (δ + r/2)² + 2μm₀λ²/u_M. To obtain an asymptotically stable steady state, we choose the negative root for n₃. Note that n₃ = s₃ = a₃, n₂ = s₂ = a₂, l₃ = k₃ = b₃ and l₂ = k₂. Thus A^C(G) = A^S(G) = A^N(G).  □

Remark 11.2 As in Game S, the retailer will always promote her own brand at a positive constant rate. The condition for promoting the manufacturer's brand is 2αm₀ + αm₁ − ψm₂ > 0 (the numerator of p₁^C has to be positive). The condition for an interior solution in Game S was αm₁ − ψm₂ > 0. Thus, if p₁^S is positive, then p₁^C is also positive.

Remark 11.3 The support rate is constrained to lie between 0 and 1. It is easy to verify that if p₁^C > 0, then a necessary condition for D < 1 is αm₁ − ψm₂ > 0, i.e., p₁^S > 0. Assuming p₁^C > 0 (otherwise there is no reason for the manufacturer to provide support), the necessary condition for D > 0 is 2αm₀ − αm₁ + ψm₂ > 0.

4. Comparison

In making the comparisons, we assume that the solutions in the three games are interior. The following table collects the equilibrium strategies and value functions obtained in the three games.

In terms of strategies, it is readily seen that the manufacturer's advertising strategy A(G) is the same in all three games. This is probably a by-product of the structure of the model. Indeed, advertising does not affect sales directly but does so through the goodwill. Although the latter has an impact on the sales of the store brand, this does not affect the profits earned by the manufacturer. The retailer adopts the same promotional strategy for the private label in the games where such a brand is available, i.e., whether a coop program is offered or not. This is also due to the simple structure of our model.
Table 11.1. Summary of results

            Game N                 Game S                  Game C
    p₁      αm₁/u_R                (αm₁ − ψm₂)/u_R         (2αm₀ + (αm₁ − ψm₂))/(2u_R)
    p₂      —                      (αm₂ − βm₁)/u_R         (αm₂ − βm₁)/u_R
    A(G)    A^N(G)                 A^N(G)                  A^N(G)
    D       —                      —                       (2αm₀ − (αm₁ − ψm₂))/(2αm₀ + (αm₁ − ψm₂))
    V_M(G)  a₁ + a₂G + (a₃/2)G²    s₁ + a₂G + (a₃/2)G²     n₁ + a₂G + (a₃/2)G²
    V_R(G)  b₁ + b₂G + (b₃/2)G²    k₁ + k₂G + (b₃/2)G²     l₁ + k₂G + (b₃/2)G²

The remaining, and most interesting, item is how the retailer promotes the manufacturer's brand in the different games. The introduction of the store brand leads to a reduction in the promotional effort for the manufacturer's brand (p₁^N − p₁^S = ψm₂/u_R > 0). The coop program can, however, reverse this course of action and increase the promotional effort for the manufacturer's brand (p₁^C − p₁^S = (2αm₀ − αm₁ + ψm₂)/(2u_R) > 0). This result is expected and has also been obtained in the literature cited in the introduction. What is not clear-cut is whether the level of promotion can climb back to the one in the game without the store brand. Indeed, p₁^N − p₁^C is positive if the condition αm₁ + ψm₂ > 2αm₀ is satisfied.

We now compare the players' payoffs in the different games and thus answer the questions raised in this paper.

Proposition 11.4 The store brand introduction is harmful for the manufacturer for all values of the parameters.
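The promotion rankings discussed above can be checked numerically from the closed-form expressions collected in the table. The parameter values below are illustrative only, chosen to satisfy A1 and the interior-solution condition αm₁ − ψm₂ > 0:

```python
# Numerical check of the promotion rankings across the three games.
# Parameter values are illustrative; they satisfy alpha > psi > beta (A1)
# and alpha*m1 > psi*m2 (interior solution in Game S).
alpha, beta, psi = 1.0, 0.3, 0.5
m0, m1, m2, uR = 0.8, 0.6, 0.7, 1.0

Q = alpha * m1 - psi * m2                 # = uR * p1^S; positive here
p1N = alpha * m1 / uR                     # Game N promotion of the national brand
p1S = Q / uR                              # Game S
p1C = (2 * alpha * m0 + Q) / (2 * uR)     # Game C
D = (2 * alpha * m0 - Q) / (2 * alpha * m0 + Q)  # coop participation rate

assert abs((p1N - p1S) - psi * m2 / uR) < 1e-12  # store brand cuts promotion
assert p1C > p1S                                 # the coop program raises it again
assert 0 < D < 1                                 # admissible participation rate
print(p1N, p1S, p1C, D)
```

With these values 2αm₀ > αm₁ + ψm₂, so p₁^C even exceeds p₁^N, illustrating that the comparison between p₁^N and p₁^C can go either way, as noted above.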
Proof. From the results of Propositions 11.1 and 11.2, we have:
    V_M^S(G₀) − V_M^N(G₀) = s₁ − a₁ = −(m₀/(ru_R)) (m₂ψα + β(m₂α − m₁β)) < 0.  □

For the retailer, we cannot state a clear-cut result. Compute
    V_R^S(G₀) − V_R^N(G₀) = k₁ − b₁ + (k₂ − b₂)G₀
      = ((m₁α − m₂ψ)² + (m₂α − m₁β)² − α²m₁²)/(2ru_R) + 4λ²m₀m₂θγ/(ru_M(r + 2√Δ₂)²) + (2m₂γ/(r + 2√Δ₂)) G₀.

Thus, for the retailer to benefit from the introduction of a store brand, the following condition must be satisfied:

    V_R^S(G₀) − V_R^N(G₀) > 0 ⇔ G₀ > ((r + 2√Δ₂)/(4rm₂γu_R)) (α²m₁² − (m₁α − m₂ψ)² − (m₂α − m₁β)²) − 2λ²m₀θ/(ru_M(r + 2√Δ₂)).

The above inequality says that the retailer will benefit from the introduction of a store brand unless the initial goodwill of the national brand is "too low." One conjecture is that in such a case the two brands would be too close and no benefit would be generated for the retailer from the product variety. The result that the introduction of a private label is not always in the best interest of a retailer has also been obtained by Raju et al. (1995), who considered price competition between two national brands and a private label.

Turning now to the question of whether a coop advertising program can mitigate, at least partially, the losses of the manufacturer, we have the following result.

Proposition 11.5 The cooperative advertising program is profit Pareto-improving for both players.
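A quick numerical sanity check of this proposition, evaluating the closed-form payoff differences n₁ − s₁ and l₁ − k₁ used in the proof (parameter values are illustrative only, chosen so that the solution is interior):

```python
# Sanity check of Proposition 11.5 via the closed-form payoff differences.
# Illustrative parameter values with an interior solution (alpha*m1 > psi*m2).
alpha, psi = 1.0, 0.5
m0, m1, m2 = 0.8, 0.6, 0.7
r, uR = 0.1, 1.0

Q = alpha * m1 - psi * m2                        # = uR * p1^S > 0
dVM = (2 * alpha * m0 - Q) ** 2 / (8 * r * uR)   # V_M^C - V_M^S = n1 - s1
dVR = Q * (2 * alpha * m0 - Q) / (4 * r * uR)    # V_R^C - V_R^S = l1 - k1

assert dVM > 0 and dVR > 0   # the coop program benefits both players here
print(dVM, dVR)
```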
Proof. Recall that k₂ = l₂, k₃ = l₃ and n₂ = s₂. Thus, for the manufacturer, we have
    V_M^C(G₀) − V_M^S(G₀) = n₁ − s₁ = (2αm₀ − (αm₁ − ψm₂))²/(8ru_R) > 0.

For the retailer,
    V_R^C(G₀) − V_R^S(G₀) = l₁ − k₁ = (m₁α − m₂ψ)(2αm₀ − m₁α + m₂ψ)/(4ru_R),

which is positive. Indeed, m₁α − m₂ψ = u_R p₁^S, which is positive by the assumption of an interior solution, and 2αm₀ − m₁α + m₂ψ is also positive (it is the numerator of D).  □

The above proposition shows that the answer to our question is indeed yes and, importantly, that the retailer would be willing to accept a coop program when suggested by the manufacturer.

5. Concluding Remarks

The results obtained so far rely heavily on the assumption that the solution of Game S is interior. Indeed, we have assumed that the retailer will promote the manufacturer's brand in that game. A natural question is what would happen if this were not the case. Recall that we required

    p₁^S = (αm₁ − ψm₂)/u_R > 0 ⇔ αm₁ > ψm₂.

If αm₁ > ψm₂ is not satisfied, then p₁^S = 0 and the players' payoffs should be adjusted accordingly. The crucial point, however, is that in such an event the constraint on the participation rate in Game C would be impossible to satisfy. Indeed, recall that

    D = (2αm₀ − (αm₁ − ψm₂))/(2αm₀ + αm₁ − ψm₂),

and compute

    1 − D = 2(αm₁ − ψm₂)/(2αm₀ + αm₁ − ψm₂).

Hence, under the condition αm₁ − ψm₂ < 0, the retailer does not invest in any promotion for the national brand after introducing the private label (p₁^S = 0). In this case, the cooperative advertising program can be implemented only if it induces the retailer to promote the national brand, i.e., only with a positive coop participation rate, which is possible only if 2αm₀ + αm₁ − ψm₂ > 0. Now, suppose that we are in a situation where the following conditions hold:

    αm₁ − ψm₂ < 0 and 2αm₀ + αm₁ − ψm₂ > 0.   (11.11)

In this case, the retailer does promote the manufacturer's product (p₁^C > 0); however, we obtain D > 1.
This means that the manufacturer would have to pay more than the actual cost to get her brand promoted by the retailer in Game C, and the constraint D ≤ 1 would have to be removed. For p₁^S = 0, and when the conditions in (11.11) are satisfied, it is easy to show that the effects of the cooperative advertising program on the profits of the retailer and the manufacturer are given by
    V_R^C(G₀) − V_R^S(G₀) = (αm₁ − ψm₂)(2αm₀ + m₁α − m₂ψ)/(4ru_R) < 0,
    V_M^C(G₀) − V_M^S(G₀) = (2αm₀ + αm₁ − ψm₂)²/(8ru_R) > 0.

In this case, even if the manufacturer is willing to pay the retailer more than the costs incurred by advertising the national brand, the retailer will not implement the cooperative program.

To wrap up, the message is that the implementability of a coop promotion program depends on the type of competition one assumes between the two brands and on the revenues their sales generate for the retailer. The model we used here is rather simple, and some extensions are desirable, such as, e.g., letting the margins or prices be endogenous.

References
Ailawadi, K.L. and Harlam, B.A. (2004). An empirical analysis of the determinants of retail margins: The role of store-brand share. Journal of Marketing, 68(1):147-165.
Bergen, M. and John, G. (1997). Understanding cooperative advertising participation rates in conventional channels. Journal of Marketing Research, 34(3):357-369.
Berger, P.D. (1972). Vertical cooperative advertising ventures. Journal of Marketing Research, 9(3):309-312.
Chintagunta, P.K., Bonfrer, A., and Song, I. (2002). Investigating the effects of store brand introduction on retailer demand and pricing behavior. Management Science, 48(10):1242-1267.
Dant, R.P. and Berger, P.D. (1996). Modeling cooperative advertising decisions in franchising. Journal of the Operational Research Society, 47(9):1120-1136.
Dhar, S.K. and Hoch, S.J. (1997). Why store brand penetration varies by retailer. Marketing Science, 16(3):208-227.
Jørgensen, S., Sigué, S.P., and Zaccour, G. (2000). Dynamic cooperative advertising in a channel. Journal of Retailing, 76(1):71-92.
Jørgensen, S., Taboubi, S., and Zaccour, G. (2001). Cooperative advertising in a marketing channel. Journal of Optimization Theory and Applications, 110(1):145-158.
Jørgensen, S., Taboubi, S., and Zaccour, G. (2003). Retail promotions with negative brand image effects: Is cooperation possible? European Journal of Operational Research, 150(2):395-405.
Jørgensen, S. and Zaccour, G. (2003). A differential game of retailer promotions. Automatica, 39(7):1145-1155.
Kim, S.Y. and Staelin, R. (1999). Manufacturer allowances and retailer pass-through rates in a competitive environment. Marketing Science, 18(1):59-76.
Mills, D.E. (1995). Why retailers sell private labels. Journal of Economics & Management Strategy, 4(3):509-528.
Mills, D.E. (1999). Private labels and manufacturer counterstrategies. European Review of Agricultural Economics, 26:125-145.
Narasimhan, C. and Wilcox, R.T. (1998). Private labels and the channel relationship: A cross-category analysis. Journal of Business, 71(4):573-600.
Nerlove, M. and Arrow, K.J. (1962). Optimal advertising policy under dynamic conditions. Economica, 29:129-142.
Raju, J.S., Sethuraman, R., and Dhar, S.K. (1995). The introduction and performance of store brands. Management Science, 41(6):957-978.

Chapter 12

INCENTIVE STRATEGIES FOR SHELF-SPACE ALLOCATION IN DUOPOLIES
Guiomar Martín-Herrán
Sihem Taboubi
Abstract  We examine the issue of shelf-space allocation in a marketing channel where two manufacturers compete for limited shelf space at a retail store. The retailer controls the shelf space allocated to the brands, while the manufacturers make advertising decisions to build their brand images and to increase final demand (pull strategy). The manufacturers also offer an incentive designed to induce the retailer to allocate more shelf space to their brands (push strategy). The incentive takes the form of a shelf-dependent display allowance. The problem is formulated as a Stackelberg differential game played over an infinite horizon, with the manufacturers as leaders. Stationary feedback equilibria are computed, and numerical simulations are carried out in order to illustrate how channel members should allocate their marketing efforts.

1. Introduction

The increasing competition in almost all industries and the proliferation of retailers' private brands in recent decades have induced a hard battle between manufacturers for shelf space at retail stores. Indeed, according to a Food Marketing Institute report (1999), about 100,000 grocery products are available on the market nowadays, and every year thousands of new products are introduced. Comparing this number to the number of products that can be placed on the shelves of a typical supermarket (40,000 products) justifies the huge marketing efforts deployed by manufacturers to persuade their dealers to keep their brands on the shelves.

Manufacturers invest in promotional and advertising activities aimed at final consumers (pull strategies) and spend on trade promotions designed for their dealers (push strategies). Trade promotions are incentives granted to retailers in return for promoting merchandise in their stores. When these incentives are designed to induce retailers to give a better display to the brand, they are called slotting (or display) allowances.
Shelf-space allocation is a decision that has to be taken by the retailer. However, this issue also involves the other channel members (i.e., the manufacturers), insofar as they can influence the retailer's shelf-space allocation decisions.

Studies on shelf-space allocation can be found in the marketing and operational research literature. Some of these studies adopted a normative perspective, investigating optimal shelf-space allocation decisions (e.g., Corstjens and Doyle (1981); Corstjens and Doyle (1983); Zufreyden (1986); Yang (2001)), while others examined the issue of shelf-space allocation in a descriptive manner by proving that shelf space has a positive impact on sales (e.g., Curhan (1973); Drèze, Hoch and Purk (1994); Desmet and Renaudin (1998)). All the studies mentioned above suppose that shelf-space allocation is a decision taken at the retail level, but they neglect the impact of this decision on the marketing decisions of the manufacturers, and the impact of the manufacturers' decisions on the shelf-space allocation policy of the retailers.

Studies that examined marketing decisions in channels by taking into account the dynamic long-term interactions between channel members adopted differential games as a framework; these include, among others, Chintagunta and Jain (1992); Jørgensen, Sigué and Zaccour (2000); Jørgensen, Sigué and Zaccour (2001); Jørgensen, Taboubi and Zaccour (2001); Jørgensen and Zaccour (1999); Taboubi and Zaccour (2002). An exhaustive survey of this literature is presented in Jørgensen and Zaccour (2004).

In the marketing channels literature, studies that suggested the use of incentive strategies as a coordinating mechanism for the channel mainly used a static framework (e.g., Jeuland and Shugan (1983); Bergen and John (1997)).
More recently, Jørgensen and Zaccour (2003) extended this work to a dynamic setting where a manufacturer and a retailer implement two-sided incentives designed to induce each other to choose the coordinating pricing and advertising levels.

To the best of our knowledge, the only studies that investigated the shelf-space allocation issue by considering the whole marketing channel are those of Jeuland and Shugan (1983) and Wang and Gerchack (2001). Both studies examined shelf-space allocation decisions as a way to reach channel cooperation (i.e., total channel profit maximization). Wang and Gerchack (2001) design an incentive that takes the form of an inventory-holding subsidy and that leads the retailer to select the coordinating inventory level (i.e., the inventory level that the manufacturer would have allocated to the brand in the case of total channel profit maximization). An important shortcoming of both studies is that the shelf-space allocation decision is considered as a single variable related to the brand of a unique manufacturer (manufacturers of competing brands are passive players in the game). Furthermore, both studies used a static setting, thus ignoring the long-term effects of some marketing variables (e.g., advertising and promotional efforts) and the intertemporal interactions that take place between channel members.

Recent works by Martín-Herrán and Taboubi (2004) and Martín-Herrán, Taboubi and Zaccour (2004) examine the issue of shelf-space allocation by taking into account the interactions in the marketing channel, and these interactions are not confined to a one-shot game. Both studies assumed that the competing manufacturers can influence the shelf-space allocation decisions of the unique retailer through advertising targeted at their consumers (pull strategies), which builds the brand image and increases sales.
In both studies, the manufacturers' influence on the retailer's shelf-space allocation is indirect: manufacturers do not use push mechanisms to influence the retailer's decisions directly. In the present study we examine the case of a channel where a retailer sells the brands of two competing manufacturers. The retailer's shelf-space allocation decisions can be influenced directly through the use of incentive strategies by both manufacturers (push strategies), or indirectly through advertising (pull strategies). By considering a dynamic model, we take into account the carryover effects of the manufacturers' advertising investments that build their brand images, and the long-term interactions between the partners in the marketing channel.

The paper is organized as follows. Section 2 introduces the model. Section 3 gives the analytical solutions to the shelf-space, advertising, and incentive-strategy problems of the channel members. Sections 4 and 5 present some numerical results to illustrate the findings, and Section 6 concludes.

2. The model

The model is designed to solve the problem of shelf-space allocation, advertising investments, and incentive decisions in a competitive marketing channel. The network is composed of a unique retailer selling the brands of two competing manufacturers.

The retailer has a limited shelf space in her store. She must decide how to allocate this limited resource between the two brands. Let S_i(t) denote the shelf space allocated to brand i at time t, and consider that the total shelf space available at the retail store is a constant, normalized to 1. Hence, the relationship between the shelf spaces given to the two brands can be written as

    S₂(t) = 1 − S₁(t).

We assume that shelf-space costs are linear and the unit cost equal for both brands¹. Without loss of generality, we consider these shelf-space allocation costs to be equal to zero. The manufacturers compete for the shelf space available at the retail store.
They control their advertising strategies in national media, A_i(t), in order to increase their goodwill stocks, G_i(t). We consider that the advertising costs are quadratic:

    C(A_i(t)) = (1/2) u_i A_i²(t),  i = 1, 2,

where u_i is a positive parameter. Furthermore, in order to increase the final demand for their brands at the retail store, each manufacturer can offer an incentive with the aim that the retailer assign a greater shelf space to his brand. The display allowance takes the form of a shelf-dependent incentive to the retailer. We suppose that the incentives given by manufacturers 1 and 2 are, respectively,

    I₁(S₁) = ω₁S₁,   I₂(S₁) = ω₂(1 − S₁).

The manufacturers control the incentive coefficient functions ω₁(t) and ω₂(t), which have to take positive values. The incentive I_i(S_i) is a linear side-payment which rewards the retailer, with the objective that she allocate more shelf space to brand i. The retailer faces a demand function for brand i, D_i = D_i(S_i, G_i), of the following form:

    D_i(t) = S_i(t) (a_i G_i(t) − (1/2) b_i S_i(t)),  i = 1, 2,   (12.1)

where a_i, b_i are positive parameters and a_i captures the cross effect of shelf space and goodwill on sales. The interaction between G_i(t) and
S_i(t) means that the goodwill effect on the sales of the brand is enhanced by its share of the shelf space. Notice that the quadratic term (1/2) b_i S_i²(t) in the sales function of each brand captures the decreasing marginal effect of shelf space on sales, which means that every additional unit of the brand on the shelf leads to lower additional sales than the previous one (see, for example, Bultez and Naert (1988)).

The demand function must have some features that, in turn, impose conditions on shelf space and the model parameters: D_i(t) ≥ 0, ∂D_i(t)/∂G_i(t) ≥ 0, ∂D_i(t)/∂S_i(t) ≥ 0, i = 1, 2, and the resulting constraints on shelf space are:

    1 − (a₂/b₂) G₂(t) ≤ S₁(t) ≤ (a₁/b₁) G₁(t).   (12.2)

The goodwill for brand i is a stock that captures the long-term effects of the advertising of manufacturer i. It evolves according to the Nerlove and Arrow (1962) dynamics:

    dG_i(t)/dt = α_i A_i(t) − δG_i(t),   G_i(0) = G_i0 > 0,  i = 1, 2,   (12.3)

where α_i is a positive parameter that captures the efficiency of the advertising investments of manufacturer i, and δ is a decay rate that reflects the depreciation of the goodwill stock because of oblivion, product obsolescence or competitive advertising.

The game is played over an infinite horizon and the firms have a constant and equal discount rate ρ. To focus on the shelf-space allocation and incentive problems, we consider a situation that involves brands in the same product category with relatively similar prices. Hence, the retailer and both manufacturers have constant retail and wholesale margins for brand i, fixed at the beginning of the game and denoted by π_Ri and π_Mi².

¹ Shelf-space costs are the costs of removing one item from the shelves and replacing it with another, and of putting price information on products.
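A minimal sketch of the model primitives (12.1)-(12.3); all parameter values are illustrative and do not come from the chapter:

```python
# Illustrative parameter values for the two-brand model (none come from the chapter).
a1, a2 = 0.9, 0.8          # cross effects of shelf space and goodwill, eq. (12.1)
b1, b2 = 0.6, 0.5          # decreasing marginal effect of shelf space
alpha1, alpha2 = 0.7, 0.7  # advertising efficiencies in (12.3)
delta = 0.1

def demands(S1, G1, G2):
    """D_i = S_i (a_i G_i - b_i S_i / 2), eq. (12.1), with S2 = 1 - S1."""
    S2 = 1.0 - S1
    return S1 * (a1 * G1 - 0.5 * b1 * S1), S2 * (a2 * G2 - 0.5 * b2 * S2)

def feasible(S1, G1, G2):
    """Shelf-space constraints (12.2): both demands nondecreasing in own shelf space."""
    return 1.0 - (a2 / b2) * G2 <= S1 <= (a1 / b1) * G1

def goodwill_step(G1, G2, A1, A2, dt=0.01):
    """One Euler step of dG_i/dt = alpha_i A_i - delta G_i, eq. (12.3)."""
    return (G1 + dt * (alpha1 * A1 - delta * G1),
            G2 + dt * (alpha2 * A2 - delta * G2))

print(feasible(0.5, 1.0, 1.0), demands(0.5, 1.0, 1.0))
```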
By choosing the amount of shelf space to allocate to brand 1, the retailer aims at maximizing her profit flow, derived from selling the products of the two brands plus the side-payments received from the manufacturers:
    J_R = ∫₀^∞ exp(−ρt) Σ_{i=1}^{2} (π_Ri D_i(t) + ω_i(t) S_i(t)) dt.   (12.4)

² This assumption was used in Chintagunta and Jain (1992), Jørgensen, Sigué and Zaccour (2000), and Jørgensen, Taboubi and Zaccour (2001).
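The retailer's objective (12.4) can be approximated by discretizing time. The sketch below evaluates it under constant, arbitrary strategies; every parameter value is illustrative, not from the chapter:

```python
import math

# Rough numerical evaluation of the retailer's objective (12.4) under constant
# strategies; every parameter value here is illustrative, not from the chapter.
rho, delta = 0.1, 0.1
a1, a2, b1, b2 = 0.9, 0.8, 0.6, 0.5
alpha1, alpha2 = 0.7, 0.7
piR1, piR2 = 0.5, 0.6            # retail margins pi_Ri
w1, w2 = 0.05, 0.05              # incentive coefficients omega_i
A1, A2, S1 = 0.4, 0.4, 0.5
S2 = 1.0 - S1

dt, T = 0.01, 200.0
G1, G2, JR = 1.0, 1.0, 0.0
for k in range(int(T / dt)):
    D1 = S1 * (a1 * G1 - 0.5 * b1 * S1)          # demand (12.1)
    D2 = S2 * (a2 * G2 - 0.5 * b2 * S2)
    flow = piR1 * D1 + w1 * S1 + piR2 * D2 + w2 * S2
    JR += math.exp(-rho * k * dt) * flow * dt    # discounted profit flow
    G1 += dt * (alpha1 * A1 - delta * G1)        # goodwill dynamics (12.3)
    G2 += dt * (alpha2 * A2 - delta * G2)
print(JR)
```

Truncating the infinite horizon at T = 200 is harmless here because exp(−ρT) is already negligible at the chosen discount rate.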
JMi = ∫₀∞ exp(−ρt) ( πMi Di(t) − (1/2) ui Ai(t)² − ωi(t) Si(t) ) dt.  (12.5)

To recapitulate, we have defined by (12.3), (12.4) and (12.5) a differential game that takes place between two competing manufacturers selling their brands through a common retailer. The game has five control variables, S1(t), A1(t), A2(t), ω1(t), ω2(t) (one for the retailer and two for each manufacturer), and two state variables, G1(t), G2(t). The controls are constrained by 0 < S1(t) < 1, Ai(t) ≥ 0, ωi(t) ≥ 0, i = 1, 2, and the conditions given in (12.2)³. The state constraints Gi(t) ≥ 0, i = 1, 2, are automatically satisfied.

3. Stackelberg game

The differential game played between the different channel members is a Stackelberg game where the retailer is the follower and the manufacturers are the leaders. The sequence of the game is the following: the manufacturers, as leaders, announce simultaneously their advertising and incentive strategies. The retailer reacts to this information by choosing the shelf-space level that maximizes her objective functional (12.4). The manufacturers play a noncooperative game à la Nash. We employ the hypothesis that both manufacturers observe only the evolution of their own goodwill, not that of their competitor⁴. Since the game is played over an infinite time horizon and is autonomous, we suppose that strategies depend on the current level of the state variables only. The following proposition characterizes the retailer's reaction function.

Proposition 12.1 If S1 > 0, the retailer's reaction function for shelf-space allocation is given by⁵:
S1(G1, G2) = [ω1 − ω2 + a1πR1G1 − a2πR2G2 + b2πR2] / [b1πR1 + b2πR2].  (12.6)

3 We do not take into account these conditions in the problem resolution. However, we check their fulfillment a posteriori.
4 This assumption is mainly made for the model's tractability; see Roberts and Samuelson (1988), Jørgensen, Taboubi and Zaccour (2003) and Taboubi and Zaccour (2002). In Martín-Herrán and Taboubi (2004), we prove that the qualitative results still hold whenever the hypothesis is removed.
5 From now on, the time argument is often omitted when no confusion can arise.

Proof. The retailer's optimization problem is to choose the shelf-space level that maximizes (12.4) subject to the dynamics of the goodwill stocks given in (12.3). The shelf-space decision does not affect the differential equations in (12.3); therefore, her optimal shelf-space decision is the solution of the following static optimization problem:
max_{S1} Σᵢ₌₁,₂ (πRi Di + ωi Si).

The expression in (12.6) is the unique interior shelf-space allocation solving the problem above. □

The proposition states that the shelf-space allocated to each brand is positively affected by its own goodwill and negatively affected by the goodwill stock of the competing brand. Shelf-space allocation depends also on the retail margins of both brands and on the parameters of the demand functions. Furthermore, the state-dependent (see Proposition 12.2 below) coefficient functions of both manufacturers' incentive strategies (ωi) affect the shelf-space allocation decisions. Indeed, the term ω1 − ω2 in the numerator indicates that the shelf-space allocated to brand 1 is greater under the implementation of the incentive (than without it) if and only if ω1 − ω2 > 0. That is, manufacturer 1 attains his objective by giving the incentive to the retailer only if his incentive coefficient ω1 is greater than the incentive coefficient ω2 selected by the other manufacturer. Furthermore, when only one manufacturer offers an incentive, the shelf-space allocated to the other brand is reduced compared to the case where neither gives an incentive.

3.1 Manufacturers' incentive strategies

Manufacturers play a Nash game and, as leaders in the Stackelberg game, they know the shelf-space that will be allocated to the brands by the retailer. Both manufacturers decide at the same time their advertising investments and the values of the incentive coefficients ωi. The manufacturers maximize their objective functionals, where the shelf-space has been replaced by its expression in (12.6), subject to the dynamics of their own brand goodwill. The following proposition characterizes the manufacturers' state-dependent incentive coefficient functions at the equilibrium.

Proposition 12.2 If ωi > 0, the manufacturers' equilibrium incentive coefficients are
ωi(Gi, Gj) = { −[bi(πMi + πRi) + bjπRj][biπRi + bj(πMj + 2πRj)]
  + [(biπRi + bjπRj)(πMi − πRi) + bjπMi(πMj + πRj)] aiGi
  + (πMj + πRj)[bi(πMi + πRi) + bjπRj] ajGj } / [bi(πMi + 3πRi) + bj(πMj + 3πRj)],
  i, j = 1, 2, i ≠ j.  (12.7)

Proof. Since the incentives do not affect the dynamics of the goodwill stocks, the manufacturers solve the static optimization problem:

max_{ωi} { πMi Di − (1/2) ui Ai² − ωi Si },  i = 1, 2,

where S2 = 1 − S1 and S1 is given in (12.6). Equating to zero the partial derivative of manufacturer i's objective function with respect to ωi, we obtain a system of two equations for ωi, i = 1, 2. Solving this system gives the manufacturers' incentive coefficient functions at the equilibrium in (12.7). □

The proposition states that the incentive coefficients at the equilibrium depend on both channel members' goodwill stocks. The equilibrium value of ωi is increasing in Gj, j ≠ i. This means that each manufacturer increases his incentive coefficient when the goodwill stock of the competing brand increases, a behavior that can be explained by the retailer's shelf-space allocation rule: the shelf-space given to a brand increases with its own goodwill stock. Hence, the manufacturer of the competing brand has to increase his incentive in order to try to increase his share of the shelf-space. The incentive coefficient of a manufacturer could be increasing or decreasing in his own goodwill stock, depending on the parameter values. The following corollary gives necessary and sufficient conditions ensuring a negative relationship between a manufacturer's incentive coefficient function and his own goodwill stock.

Corollary 12.1 Necessary and sufficient conditions guaranteeing that ωi is a decreasing function of Gi are given by:
(biπRi + bjπRj)(πMi − πRi) + bjπMi(πMj + πRj) < 0,  i, j = 1, 2, i ≠ j.  (12.8)

In the case of symmetric demand functions, a1 = a2 = a, b1 = b2 = b, and symmetric margins, πM1 = πM2 = πM, πR1 = πR2 = πR, the inequalities in (12.8) reduce to:

πR > ((3 + √17)/4) πM.

Proof. Inequalities (12.8) can be derived straightforwardly from the expressions in (12.7): up to a positive factor, the left-hand side of (12.8) is the coefficient of Gi in ωi. The inequality applying in the symmetric case is obtained from (12.8), which then reads πM² + 3πMπR − 2πR² < 0; solving this quadratic inequality for πR gives the stated threshold. □

The inequality for the symmetric case indicates that the manufacturer of one brand will decrease his shelf-space incentive when the goodwill level of his brand increases if the retailer's margin is large enough compared to the manufacturers' margins.

The next corollary establishes a necessary and sufficient condition guaranteeing that the implementation of the incentive mechanism allows manufacturer i to attain his objective of having a greater shelf-space allocation at the retail store.

Corollary 12.2 The shelf-space allocated to brand i is greater when the incentive strategies are implemented than without, if and only if the following condition holds:
bi²πRi² − bj²πRj² + bibj(πMjπRi − πMiπRj)
  − [2biπRi² + bj(πMjπRi − (πMi − 2πRi)πRj)] aiGi
  + [2bjπRj² − bi(πMjπRi − (πMi + 2πRi)πRj)] ajGj > 0,  i, j = 1, 2, i ≠ j.  (12.9)

In the symmetric case, inequality (12.9) reduces to: Gi − Gj < 0, i, j = 1, 2, i ≠ j.

Proof. From the expression of the retailer's reaction function in Proposition 12.1, we have that the shelf-space given to brand i is increased with the incentive whenever the difference ωi − ωj is positive. Replacing the optimal expressions of the ωk from (12.7), this latter inequality can be rewritten as inequality (12.9). Exploiting the symmetry hypothesis, inequality (12.9) becomes:
− [2a(Gi − Gj)πR²] / (πM + 3πR) > 0,  i, j = 1, 2, i ≠ j,  (12.10)

which is equivalent to Gi − Gj < 0. □

The result in (12.10) indicates that the shelf-space allocated by the retailer to brand i under the incentive policy is greater than without the incentive if and only if the goodwill of brand i is lower than that of his competitor. This means that if the main objective of manufacturer i, when applying the incentive program, is to attain a greater shelf-space allocation for his brand, then he attains this objective only when his brand has a lower goodwill than that of the other manufacturer. The intuition behind this behavior is that the manufacturer with the lower goodwill stock will be given a lower share of total shelf-space; thus, he reacts by offering an incentive.

3.2 Manufacturers' advertising decisions

The following proposition gives the advertising strategies and value functions of both manufacturers.

Proposition 12.3 Assuming interior solutions, the manufacturers' advertising strategies and value functions are the following:

(i) Advertising strategies:
A1(G1, G2) = α1 (K11 G1 + K13 G2 + K14) / u1,  (12.11)
A2(G1, G2) = α2 (K23 G1 + K22 G2 + K25) / u2,  (12.12)

where the Kij, i = 1, 2, j = 1, . . . , 6, are parameters given in the Appendix. Furthermore, K11, K22 ≥ 0, and Ki3 ≥ 0 if the root Kii⁻ is chosen and Γ > 0, while Ki3 ≤ 0 if Kii⁻ is chosen and Γ < 0, or if Kii⁺ is chosen, where i, j = 1, 2, i ≠ j, the two roots Kii⁻ and Kii⁺ are given in the Appendix, and
Γ = δ(δ + ρ) ui (biπRi + bjπRj)² − αi² πMi πRi ai² (biπRi + 2bjπRj).

(ii) The manufacturers' value functions are the following:

VMi(G1, G2) = (1/2) Ki1 G1² + (1/2) Ki2 G2² + Ki3 G1G2 + Ki4 G1 + Ki5 G2 + Ki6,  i = 1, 2.

Proof. See Appendix. □

Item (i) indicates that the Markovian advertising strategies in oligopolies are linear in the goodwill levels of both brands in the market. As in situations of monopoly, they satisfy the classical rule equating marginal revenues to marginal costs. However, for competitive situations, the results state that each manufacturer reacts to an increase of his own goodwill by raising his advertising investments, while his reaction to an increase of his competitor's brand image could be either an increase or a decrease of his advertising effort. Such a reaction differs from that of a monopolist manufacturer, who must decrease his advertising when his goodwill increases. This reaction is mainly driven by the model's parameters. According to Roberts and Samuelson (1988), it indicates whether the manufacturers' advertising effect is informative or predatory. When the advertising effect is informative, the advertising investment by one manufacturer leads to higher sales for both brands and a higher total market size. In the case of predatory advertising, each manufacturer increases his advertising effort in order to increase his goodwill stock. This increase leads the competitor to decrease his advertising effort, which leads to a decrease of his goodwill stock⁶.

Item (ii) in the proposition indicates that the value functions of both manufacturers are quadratic in the goodwill levels of both brands.

3.3 Shelf-space allocation

The shelf-space allocated to brand 1 can be computed once the optimal values for the incentive coefficients in (12.7) have been substituted. The following proposition characterizes the shelf-space allocation decision and the value function of the retailer.
Proposition 12.4 If S1 > 0, the retailer's shelf-space allocation decision at the equilibrium and her value function are the following:

(i) Shelf-space allocation at the equilibrium:
S1(G1, G2) = [a1(πM1 + πR1)G1 − a2(πM2 + πR2)G2 + b2(πM2 + 2πR2) + b1πR1] / [b1(πM1 + 3πR1) + b2(πM2 + 3πR2)].  (12.13)

(ii) Retailer's value function:

VR(G1, G2) = (1/2) L1 G1² + (1/2) L2 G2² + L3 G1G2 + L4 G1 + L5 G2 + L6,

where the constants Lk, k = 1, . . . , 6, are given in the Appendix.
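A small numerical cross-check (a sketch, not part of the chapter): assuming the quadratic sales specification implied by the text, Di = aiGiSi − (1/2)biSi² with S2 = 1 − S1, the equilibrium allocation (12.13) should both coincide with the reaction function (12.6) evaluated at the equilibrium incentives (12.7) and maximize the retailer's static objective. The parameter values below are arbitrary illustrations, not the chapter's example:

```python
# Sketch: numerical consistency check of (12.6), (12.7) and (12.13) under the
# assumed quadratic sales form D_i = a_i*G_i*S_i - 0.5*b_i*S_i**2.
# All parameter values here are arbitrary illustrations.

def omega(i, p, G):
    """Equilibrium incentive coefficient (12.7) for manufacturer i in {1, 2}."""
    j = 3 - i
    bi, bj = p["b"][i], p["b"][j]
    pMi, pMj = p["pM"][i], p["pM"][j]
    pRi, pRj = p["pR"][i], p["pR"][j]
    P = bi * (pMi + 3 * pRi) + bj * (pMj + 3 * pRj)
    const = -(bi * (pMi + pRi) + bj * pRj) * (bi * pRi + bj * (pMj + 2 * pRj))
    ci = (bi * pRi + bj * pRj) * (pMi - pRi) + bj * pMi * (pMj + pRj)
    cj = (pMj + pRj) * (bi * (pMi + pRi) + bj * pRj)
    return (const + ci * p["a"][i] * G[i] + cj * p["a"][j] * G[j]) / P

def reaction_s1(om1, om2, p, G):
    """Retailer's reaction function (12.6)."""
    num = (om1 - om2 + p["a"][1] * p["pR"][1] * G[1]
           - p["a"][2] * p["pR"][2] * G[2] + p["b"][2] * p["pR"][2])
    return num / (p["b"][1] * p["pR"][1] + p["b"][2] * p["pR"][2])

def s1_equilibrium(p, G):
    """Equilibrium shelf-space allocation (12.13)."""
    P = (p["b"][1] * (p["pM"][1] + 3 * p["pR"][1])
         + p["b"][2] * (p["pM"][2] + 3 * p["pR"][2]))
    num = (p["a"][1] * (p["pM"][1] + p["pR"][1]) * G[1]
           - p["a"][2] * (p["pM"][2] + p["pR"][2]) * G[2]
           + p["b"][2] * (p["pM"][2] + 2 * p["pR"][2]) + p["b"][1] * p["pR"][1])
    return num / P

def retailer_objective(s1, p, G, om1, om2):
    """Retailer's static objective under the assumed quadratic sales form."""
    d1 = p["a"][1] * G[1] * s1 - 0.5 * p["b"][1] * s1 ** 2
    d2 = p["a"][2] * G[2] * (1 - s1) - 0.5 * p["b"][2] * (1 - s1) ** 2
    return p["pR"][1] * d1 + om1 * s1 + p["pR"][2] * d2 + om2 * (1 - s1)

# Arbitrary asymmetric parameters and goodwill stocks.
par = {"a": {1: 0.5, 2: 0.6}, "b": {1: 1.62, 2: 1.5},
       "pM": {1: 1.0, 2: 1.1}, "pR": {1: 1.8, 2: 1.7}}
G = {1: 8.0, 2: 7.5}
om1, om2 = omega(1, par, G), omega(2, par, G)
s_closed = s1_equilibrium(par, G)             # (12.13) directly
s_composed = reaction_s1(om1, om2, par, G)    # (12.6) evaluated at (12.7)
s_grid = max((k / 10000 for k in range(10001)),
             key=lambda s: retailer_objective(s, par, G, om1, om2))
```

The grid search is only a brute-force sanity check; the retailer's objective is strictly concave in S1, so the interior closed form is the unique maximizer.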
6 Items (i) and (ii) in the proposition above are qualitatively similar to previous results obtained in Martín-Herrán and Taboubi (2004). For more details about the advertising strategies of the manufacturers, and for proofs concerning the boundedness of the goodwill stocks and the stability of the optimal time paths, see Martín-Herrán and Taboubi (2004).

Proof. Substituting into the retailer's reaction function (12.6) the incentive coefficient functions at the equilibrium in (12.7), the expression (12.13) is obtained. The proof of item (ii) follows the same steps as that of Proposition 12.3 and is therefore omitted here; the expressions of the coefficients of the retailer's value function are stated in the Appendix. □

From item (i) in the proposition, the shelf-space allocated to brand 1 is always increasing with its own goodwill stock and decreasing with the goodwill of the competing brand, 2. Item (ii) states that the retailer's value function is quadratic in both goodwill stocks. Let us note that the expression of the retailer's value function is not needed to determine her optimal policy; it is only necessary to compute the retailer's profit at the equilibrium.

Corollary 12.3 The shelf-space allocated to brand i is greater than that of the competitor if and only if the following condition is fulfilled:
(2aiGi − bi)(πMi + πRi) − (2ajGj − bj)(πMj + πRj) > 0,  i, j = 1, 2, i ≠ j.  (12.14)

Under the symmetry assumption, (12.14) reduces to

Gi − Gj > 0,  i, j = 1, 2, i ≠ j.  (12.15)

Proof. Expression (12.13) can be rewritten as:

S1 = 1/2 + [(2a1G1 − b1)(πM1 + πR1) − (2a2G2 − b2)(πM2 + πR2)] / [2(b1(πM1 + 3πR1) + b2(πM2 + 3πR2))].

Therefore, the retailer allocates a greater shelf-space to brand 1 than to brand 2 if and only if the following condition is satisfied:

(2a1G1 − b1)(πM1 + πR1) − (2a2G2 − b2)(πM2 + πR2) > 0. □

In the symmetric case, the condition in (12.15) means that the retailer gives the highest share of her available shelf-space to the brand with the highest goodwill stock.

4. A numerical example

In order to illustrate the behavior of the retailer's and the manufacturers' equilibrium strategies, we present a numerical example. The values of the model parameters are shown in Table 12.1, except that α1 = 2.7 and α2 = 2.74. The subscript k in Table 12.1 indicates that the same value of the parameter has been fixed for brands 1 and 2.
Table 12.1. Values of model parameters.

Parameters     πMk   πRk   ak    bk     uk   αk    δ     ρ
Fixed values   1     1.8   0.5   1.62   1    2.7   0.5   0.35

We assume that both players choose the root Kii for which Ki3 is negative. The steady-state equilibrium for the goodwill variables, (G1∞, G2∞) = (7.9072, 7.7000), is a saddle point⁷.

Figure 12.1. Shelf-space feedback strategy S1(G1, G2) (surface plotted over G1, G2 ∈ [7, 10]).

Figure 12.1 shows the retailer's feedback equilibrium strategy and displays how the shelf-space for brand 1 varies according to G1 and G2. The slope of the plane shows that the shelf-space for each brand increases with his own goodwill and decreases with that of the competitor. For high values of G2 and low values of G1, the highest shelf-space is allocated to brand 2.

Figure 12.2 shows the incentive strategies of both manufacturers. The slopes of the two planes illustrate that the state-dependent coefficient in the incentive strategy of each manufacturer depends negatively on his own goodwill and positively on the goodwill of his competitor. It is easy to verify that both manufacturers choose the same coefficients if and only if G1 equals G2. Therefore, as Figure 12.2 depicts, for values of G1 greater than those of G2, manufacturer 1 chooses an incentive coefficient lower than that of his competitor. The result is reversed when the goodwill stock of the first brand is lower than that of the second brand.
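This ranking can be reproduced without the Riccati coefficients (a sketch, not from the chapter): under the quadratic sales form Di = aiGiSi − (1/2)biSi² implied by the text, the first-order conditions of the manufacturers' static problem in Proposition 12.2 reduce to a 2×2 linear system; solving it at the reported steady state yields incentive coefficients of about 0.123 and 0.228, confirming ω1 < ω2 when G1 > G2:

```python
# Sketch (assumption flagged): with the quadratic sales form
# D_i = a_i*G_i*S_i - 0.5*b_i*S_i**2, the manufacturers' first-order
# conditions  pM*(a*G_i - b*S_i) - Q*S_i - omega_i = 0,  with S_1 from (12.6)
# and S_2 = 1 - S_1, form a linear system in (omega_1, omega_2).

a, b, pM, pR = 0.5, 1.62, 1.0, 1.8   # Table 12.1 values (symmetric brands)
G1, G2 = 7.9072, 7.7000             # reported steady-state goodwill stocks

Q = 2 * b * pR                      # b1*pR1 + b2*pR2 under symmetry
h = pM * b + Q                      # common marginal-sales slope term
c = a * pR * (G1 - G2) + b * pR     # constant of the reaction function (12.6)

m11, m12, r1 = Q + h, -h, pM * a * G1 * Q - h * c
m21, m22, r2 = -h, Q + h, pM * a * G2 * Q - h * (Q - c)
det = m11 * m22 - m12 * m21         # Cramer's rule for the 2x2 system
om1 = (r1 * m22 - m12 * r2) / det
om2 = (m11 * r2 - m21 * r1) / det
```

The resulting values match the last column of Table 12.2 below (0.1234 and 0.2282 after rounding), and the ranking ω1 < ω2 is exactly the one displayed in Figure 12.2.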
7 The expressions of the steady-state equilibrium values are shown in the Appendix.

Figure 12.2. Incentive coefficient feedback strategies ω1(G1, G2) and ω2(G1, G2) (surfaces plotted over G1, G2 ∈ [7, 10]).

Figure 12.3. Advertising feedback strategies A1(G1, G2) and A2(G1, G2) (surfaces plotted over G1, G2 ∈ [7, 10]).

Figure 12.3 shows the advertising feedback strategies of the manufacturers. The slopes of the two planes illustrate how the advertising strategy of each manufacturer is positively affected by his own goodwill and negatively by the goodwill of his competitor. Note that this behavior is just the opposite of the one presented above for the incentive coefficient feedback strategies. We also notice that for high values of G1 and low values of G2, manufacturer 1 invests more in advertising than his competitor, who acts in the opposite way. Both manufacturers invest equally in advertising when their goodwill stocks satisfy the equality G2 = 1.2744 G1 − 2.3043. As Figure 12.3 illustrates, manufacturer 1 advertises more than his competitor if and only if G2 < 1.2744 G1 − 2.3043.

An easy computation allows us to establish that if the goodwill stock of the first brand, G1, exceeds 8.3976, then whenever G2 < G1, manufacturer 1 invests more in advertising but chooses a lower incentive coefficient than that of his competitor. When G1 belongs to the interval (1.8081, 8.3976), the same behavior as before can be guaranteed if the goodwill stocks of the two brands satisfy the inequality G2 < 1.2744 G1 − 2.3043.

5. Sensitivity analysis

In order to understand the behavior of the strategies and the outcomes at the steady state, we use specific values of the parameters. We examine the sensitivity of the strategies and outcomes to these parameters by fixing all except one.
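The advertising claims of the numerical example above admit a quick arithmetic check (a sketch; the locus G2 = 1.2744 G1 − 2.3043 and all numerical values are the chapter's own, with the steady-state advertising rates taken from Table 12.2 below): at the reported steady state the inequality G2 < 1.2744 G1 − 2.3043 indeed holds, consistent with A1 > A2, and advertising exactly offsets goodwill decay as (12.3) requires at a rest point:

```python
# Sketch: consistency checks of the advertising discussion at the reported
# steady state, using the chapter's own numbers (alpha1 = 2.7, alpha2 = 2.74,
# delta = 0.5; A-values as reported in the last column of Table 12.2).

G1, G2 = 7.9072, 7.7000          # steady-state goodwill stocks
A1, A2 = 1.4643, 1.4051          # steady-state advertising rates
alpha1, alpha2, delta = 2.7, 2.74, 0.5

# At a steady state of (12.3), advertising must exactly offset decay.
resid1 = alpha1 * A1 - delta * G1
resid2 = alpha2 * A2 - delta * G2

# The advertising-equality locus quoted in the text: G2 = 1.2744*G1 - 2.3043.
below_locus = G2 < 1.2744 * G1 - 2.3043
```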
The effects of the parameters are identified by comparing the results to a "base" case, whose parameter values are given in Table 12.1. We examined the sensitivity of the strategies and outcomes to the effect of advertising on goodwill (αk), the retail margins (πRk), and the long-term effect of shelf-space on sales (bk), under symmetric and asymmetric conditions. In Tables 12.2–12.4 we report the results at the steady states⁸. Moreover, for all the numerical simulations reported below, one of the eigenvalues of the Jacobian matrix associated with the dynamical system is negative, leading to asymptotically stable steady states for the goodwill stocks. All the results we present correspond to values of the parameters for which the incentive coefficient of a manufacturer is a decreasing function of his own goodwill stock⁹. Each table shows the steady-state values of the goodwill stocks, advertising investments, incentive coefficients, shelf-space for the first brand, demand for each brand, and channel members' individual profits.

5.1 Sensitivity to the advertising effect on goodwill

We begin by analyzing the sensitivity of channel members' strategies and outcomes to a variation in the advertising effect on goodwill, under symmetric and asymmetric conditions. The values in the first two columns of Table 12.2 are obtained under the hypothesis of symmetry
in all the parameters, while the values of the last column give the results of a scenario where all the parameters are set equal, except for the parameter that we vary. This case corresponds to a situation of nonsymmetry.

8 For all the numerical simulations, it has been verified that the positiveness and monotonicity conditions on the demand functions leading to conditions (12.2) are satisfied.
9 The results of the different sensitivity analyses remain qualitatively the same when the values of the parameters lead to incentive coefficient functions which increase with the manufacturer's own goodwill stock.
Table 12.2. Summary of sensitivity results to the advertising effect on goodwill.

Sensitivity to   αk = 2.7   αk = 2.74   α2 = 2.74
G1               7.6920     7.9216      7.9072
G2               7.6920     7.9216      7.7000
A1               1.4244     1.4456      1.4643
A2               1.4244     1.4456      1.4051
ω1               0.1200     0.2348      0.1234
ω2               0.1200     0.2348      0.2282
S1               0.5000     0.5000      0.5140
D1               1.7205     1.7779      1.8181
D2               1.7205     1.7779      1.6798
JM1              1.8456     1.7591      1.9503
JM2              1.8456     1.7591      1.6621
JR               18.0395    18.9579     18.4875

The results in the first column of Table 12.2 are intuitive. They indicate that, under full symmetry, the manufacturers use both pull and push strategies at the steady state: they invest equally in advertising and give the same incentive to the retailer, who allocates the shelf-space equally to both brands. Interestingly, we can notice that even though the shelf-space is equally shared by both brands, the manufacturers still offer an incentive to the retailer. This behavior can be explained by the fact that both manufacturers act simultaneously without sharing any information about their advertising and incentive decisions. Each manufacturer gives an incentive with the aim of getting a higher share of the shelf-space than his competitor.

The second column indicates that a symmetric increase of the advertising effect on the goodwill stock of both brands leads to an increase of advertising and of the goodwill stocks. The shelf-space is still equally shared by both brands, and the manufacturers offer the same incentive coefficient to the retailer, but this coefficient is increased.

Now we remove the hypothesis that the effect of advertising on goodwill is the same for both brands, and suppose that these effects are α1 = 2.7 for brand 1 and α2 = 2.74 for brand 2.
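The last column of Table 12.2 can be cross-checked without the Riccati coefficients (a sketch, assuming the quadratic sales form Di = aiGiSi − (1/2)biSi² implied by the text): the equilibrium allocation (12.13), the implied demands, and the steady-state identity αiAi = δGi all reproduce the reported figures:

```python
# Sketch: cross-check of the asymmetric column of Table 12.2 (alpha2 = 2.74).
# Assumes the quadratic sales form D_i = a_i*G_i*S_i - 0.5*b_i*S_i**2.

a, b, pM, pR, delta = 0.5, 1.62, 1.0, 1.8, 0.5   # Table 12.1 (symmetric brands)
alpha1, alpha2 = 2.7, 2.74
G1, G2 = 7.9072, 7.7000                          # reported steady state
A1, A2 = 1.4643, 1.4051                          # reported advertising rates

# Equilibrium shelf-space (12.13), written out for symmetric a, b and margins.
S1 = ((a * (pM + pR) * (G1 - G2) + b * (pM + 2 * pR) + b * pR)
      / (2 * b * (pM + 3 * pR)))

D1 = a * G1 * S1 - 0.5 * b * S1 ** 2             # assumed sales form
D2 = a * G2 * (1 - S1) - 0.5 * b * (1 - S1) ** 2
```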
The results are reported in the last column of Table 12.2 and state that, when the advertising efficiency of manufacturer 2 is increased¹⁰ compared to that of manufacturer 1, manufacturer 2 allocates his marketing efforts differently (compared to the symmetric case): he invests more in push than in pull strategies. Indeed, manufacturer 2 lowers his advertising investment while his competitor invests more in advertising. The resulting goodwill stock of brand 1 becomes higher than that of brand 2. The retailer then gives a higher share of the available shelf-space to brand 1, and the manufacturer of brand 2 has to fix a higher incentive coefficient in order to influence the retailer's shelf-space allocation decision.

5.2 Sensitivity to the retailer's profit margins

The results in the first and second columns of Table 12.3 indicate the effects of an increase of the retailer's profit margins on channel members' strategies and outcomes under symmetric conditions. We notice that an increase of πRk leads both manufacturers to allocate more effort to the pull than to the push strategies: both of them increase their advertising investments and decrease their display allowances.
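The symmetric column πRk = 1.85 of Table 12.3 below admits the same kind of Riccati-free cross-check (a sketch, again under the assumed quadratic sales form): with the equal split S1 = 1/2 that full symmetry implies, the reported demand and the steady-state identity of (12.3) both hold:

```python
# Sketch: cross-check of the symmetric column (piR_k = 1.85) of Table 12.3.
# Assumes the quadratic sales form D_i = a_i*G_i*S_i - 0.5*b_i*S_i**2.

a, b, alpha, delta = 0.5, 1.62, 2.7, 0.5   # base parameters (Table 12.1)
G = 7.8367                                 # reported symmetric steady state
A = 1.4512                                 # reported advertising rate
S = 0.5                                    # equal split under full symmetry

D = a * G * S - 0.5 * b * S ** 2           # per-brand demand
steady_residual = alpha * A - delta * G    # vanishes at a rest point of (12.3)
```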
Table 12.3. Summary of sensitivity results to the retailer's margin.

Sensitivity to   πRk = 1.8   πRk = 1.85   πR2 = 1.85
G1               7.6920      7.8367       7.9294
G2               7.6920      7.8367       7.5951
A1               1.4244      1.4512       1.4684
A2               1.4244      1.4512       1.4273
ω1               0.1200      0.1113       0.0838
ω2               0.1200      0.1113       0.1455
S1               0.5000      0.5000       0.5152
D1               1.7205      1.7567       1.8276
D2               1.7205      1.7567       1.6507
JM1              1.8456      1.8513       2.0180
JM2              1.8456      1.8513       1.6887
JR               18.0395     18.8886      18.4491

Under nonsymmetric conditions, the manufacturer of the brand with the lowest retail margin (the first one in this example) increases his advertising effort, which becomes higher than that of his competitor. This leads to a higher goodwill stock for his brand and a higher share of the
shelf-space. The manufacturer of the other brand reacts by increasing the incentive that he offers to the retailer (compared to the symmetric case). The results indicate that, even though the display allowance of manufacturer 2 is higher than that of manufacturer 1, the retailer's shelf-space allocation decision is driven by the goodwill differential of the brands. Sales and profits are higher for the brand with the lowest retail margin, since sales are affected by the shelf-space and the goodwill levels of the brands, which are higher for brand 1.

10 We can imagine a situation where manufacturer 2 chooses a more efficient advertising medium or message.

5.3 Sensitivity to the effect of shelf-space on sales

The results in the first two columns of Table 12.4 indicate that the parameter capturing the long-term effect of shelf-space on sales (bk) has no effect on the steady-state values of the manufacturers' advertising strategies. However, it does affect their incentive strategies, which are increased when this parameter decreases. Hence, when the long-term effect of shelf-space on sales is decreased, manufacturers are better off allocating more marketing effort to push strategies in order to get an immediate effect on the shelf-space allocation.
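The incentive coefficients reported in the last column of Table 12.4 below can also be recovered directly from the closed form (12.7) (a sketch; only the shelf-space parameter differs across brands, b1 = 1.62 versus b2 = 1.58, all other values coming from Table 12.1):

```python
# Sketch: evaluating the equilibrium incentive rule (12.7) at the reported
# steady state of the b2 = 1.58 scenario (other parameters from Table 12.1).

a = {1: 0.5, 2: 0.5}
b = {1: 1.62, 2: 1.58}
pM = {1: 1.0, 2: 1.0}
pR = {1: 1.8, 2: 1.8}
G = {1: 7.7195, 2: 7.6645}   # reported steady-state goodwill stocks

def omega(i):
    """Incentive coefficient (12.7) for manufacturer i in {1, 2}."""
    j = 3 - i
    P = b[i] * (pM[i] + 3 * pR[i]) + b[j] * (pM[j] + 3 * pR[j])
    const = (-(b[i] * (pM[i] + pR[i]) + b[j] * pR[j])
             * (b[i] * pR[i] + b[j] * (pM[j] + 2 * pR[j])))
    ci = ((b[i] * pR[i] + b[j] * pR[j]) * (pM[i] - pR[i])
          + b[j] * pM[i] * (pM[j] + pR[j]))
    cj = (pM[j] + pR[j]) * (b[i] * (pM[i] + pR[i]) + b[j] * pR[j])
    return (const + ci * a[i] * G[i] + cj * a[j] * G[j]) / P

om1, om2 = omega(1), omega(2)
```

The values come out at about 0.1622 and 0.1698, matching the table and the observation that manufacturer 2 sets the higher incentive.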
Table 12.4. Summary of sensitivity results to the long-term effect of shelf-space on sales.

Sensitivity to   bk = 1.62   bk = 1.58   b2 = 1.58
G1               7.6920      7.6920      7.7195
G2               7.6920      7.6920      7.6645
A1               1.4244      1.4244      1.4295
A2               1.4244      1.4244      1.4194
ω1               0.1200      0.2120      0.1622
ω2               0.1200      0.2120      0.1698
S1               0.5000      0.5000      0.5010
D1               1.7205      1.7255      1.7305
D2               1.7205      1.7255      1.7155
JM1              1.8456      1.7285      1.7927
JM2              1.8456      1.7285      1.6676
JR               18.0395     18.3538     18.3045

Finally, under nonsymmetric conditions, the results in the last column indicate that the manufacturer of the brand with the lowest long-term effect of shelf-space on sales (in our numerical example, the second one) increases his incentive but lowers his advertising investment. Thus, his goodwill stock decreases while the goodwill stock of his competitor increases (since the competitor reacts by increasing his advertising investment). The retailer gives a higher share of her shelf-space to the brand with the highest goodwill level, and manufacturer 2 reacts by increasing his incentive coefficient and setting it higher than that of his competitor.

6. Concluding remarks
Manufacturers can affect the retailer's shelf-space allocation decisions through the use of incentive strategies (push) and/or advertising investments (pull). The numerical results indicate that a manufacturer who wants to influence the retailer's shelf-space allocation decisions can choose between using incentive strategies and/or advertising; this choice depends on the model's parameters. In further research, we should remove the hypothesis of myopia and relax the assumption of constant margins by introducing retail and wholesale prices as control variables for the retailer and both manufacturers.

Acknowledgments. Research completed while the first author was visiting professor at GERAD, HEC Montréal. The first author's research was partially supported by MCYT under project BEC2002-02361 and by JCYL under project VA051/03, co-financed by FEDER funds. The second author's research was supported by NSERC, Canada. The authors thank Georges Zaccour and Steffen Jørgensen for helpful comments.

Appendix: Proof of Proposition 12.3
We apply the sufficient conditions for a stationary feedback Nash equilibrium and wish to find bounded and continuously differentiable functions VMi(G1, G2), i = 1, 2, which satisfy, for all Gi(t) ≥ 0, i = 1, 2, the following HJB equations:

ρ VMi(G1, G2) = max_{Ai} { πMi Di − (1/2) ui Ai² − ωi Si + (∂VMi/∂Gi)(G1, G2) (αi Ai − δGi) },  i = 1, 2,

where Di, ωi and Si are given in (12.1), (12.7) and (12.6), respectively. The maximization of the right-hand side of the above equation with respect to Ai leads to

Ai(G1, G2) = (αi/ui) (∂VMi/∂Gi)(G1, G2).

Substitution of Ai, i = 1, 2, by these values into the HJB equations leads to the conjecture of the following quadratic functions for the manufacturers:

VMi(Gi, Gj) = (1/2) Ki1 G1² + (1/2) Ki2 G2² + Ki3 G1G2 + Ki4 G1 + Ki5 G2 + Ki6.

Inserting these expressions, as well as their partial derivatives, into the HJB equations and identifying coefficients, we obtain the following:

K11 = [ (2δ + ρ)u1 ± √( ((2δ + ρ)u1)² − 4α1²u1Z1 ) ] / (2α1²),

with

Y = (b1(πM1 + 3πR1) + b2(πM2 + 3πR2))²,
Z1 = a1²(πM1 + πR1)²(b1(πM1 + 2πR1) + 2b2πR2) / Y,
K13 = a1a2u1(πM1 + πR1)(πM2 + πR2)(b1(πM1 + 2πR1) + 2b2πR2) / [Y(K11α1² − u1(δ + ρ))],
K12 = [ (K13α1)²Y + a2²u1(πM2 + πR2)²(b1(πM1 + 3πR1) + 2b2πR2) ] / (u1ρY),
K14 = − a1u1(πM1 + πR1)(b1(πM1 + 2πR1) + 2b2πR2)(b1πR1 + b2(πM2 + 2πR2)) / [Y(K11α1² − u1(δ + ρ))],
K15 = [ K13K14α1²Y − a1²u1(πM2 + πR2)( b1²πR1(πM1 + 2πR1) + 2b2²πR2(πM2 + 2πR2) + b1b2(πM1(πM2 + 2πR2) + 2πR1(πM2 + 3πR2)) ) ] / (u1ρY),
K16 = [ u1( b1²πR1(b1πR1(πM1 + 2πR1) + 2b2(πM2(πM1 + 2πR1) + πR2(2πM1 + 5πR1))) + b2(πM2 + πR2)(2b2πR2(πM2 + 2πR2) + b1(πM1(πM2 + 2πR2) + 2πR1(πM2 + 4πR2))) ) + (K14α1)²Y ] / (2u1ρY).

The coefficients of the value function of manufacturer 2 can be obtained, as in the standard case, by applying the following rule: K12 → K21, K11 → K22, K13 → K23, K15 → K24, K14 → K25, K16 → K26, where the arrow indicates that in each parameter the subscripts 1 and 2 have been interchanged.

Appendix: Parameters of the retailer's value function
The parameters of the retailer's value function are the following:

Ni = ui(2δ + ρ) − 2Kiiαi²,  Ri = πMi + πRi,  Q = b1πR1 + b2πR2,
P = b1(πM1 + 3πR1) + b2(πM2 + 3πR2),
Ti = (2πMi + 3πRi)(πMi + 3πRi) + πRi(πMi + 2πRi),
Xi = πMi² + 5πRiπMi + 5πRi²,  Yi = πMjRi + πRj(5πMi + 9πRi),
Zi = 12πMjπRj(πMi + 2πRi) + πMj²(2πMi + 3πRi) + 2πRj²(7πMi + 17πRi),  i, j = 1, 2, i ≠ j,

L1 = u1(2K23L3α2²P² + a1²u2R1²Q) / (u2P²N1),
L2 = u2(2K13L3α1²P² + a2²u1R2²Q) / (u1P²N2),
L3 = u1u2Q(a2²K23α2²R2N1 + a1²K13α1²R1N2 − a1a2R1R2N1N2) / [P²(K11u2α1² + u1(K22α2² − u2(2δ + ρ)))(4K13K23α1²α2² − N1N2)],
L4 = u1[2K14K23L3α1²α2²P² + α2²(K25L3 + K23L5)P²N1 + a1²K14u2α1²R1Q] / [u2P²N1(α1²K11 − u1(δ + ρ))]
  + u1a1R1N1[b1²πR1(πM1 + 4πR1) + b2²X2 + b1b2Y1] / [2P²N1(α1²K11 − u1(δ + ρ))],
L5 = u2[N2P²α1²(L4K13 + L3K14) + K25α2²(Qa2²u1R2 + 2L3α1²P²)] / [u1P²N2(u2(δ + ρ) − K22α2²)]
  + a2u1u2R2[b1²X1 + b2²πR2(πM2 + 4πR2) + b1b2Y2] / [2u1P²(u2(δ + ρ) − K22α2²)],
L6 = [2P²(u1K25L5α2² + u2K14L4α1²) − u1u2(b1³πR1T1 + b2³πR2T2 + b1b2(b1Z1 − b2Z2))] / (2ρu1u2P²).

Appendix: Steady-state equilibrium values for the goodwill stocks
The steady-state values are given by:

G1∞ = − α1²( K13K25α2² + (u2δ − K22α2²)K14 ) / [ K13K23α1²α2² − (u1δ − K11α1²)(u2δ − K22α2²) ],
G2∞ = − α2²( K23K14α1² + (u1δ − K11α1²)K25 ) / [ K13K23α1²α2² − (u1δ − K11α1²)(u2δ − K22α2²) ].

References
Bergen, M. and John, G. (1997). Understanding cooperative advertising participation rates in conventional channels. Journal of Marketing Research, 34:357–369.
Bultez, A. and Naert, P. (1988). SHARP: Shelf allocation for retailers' profit. Marketing Science, 7:211–231.
Chintagunta, P.K. and Jain, D. (1992). A dynamic model of channel member strategies for marketing expenditures. Marketing Science, 11:168–188.
Corstjens, M. and Doyle, P. (1981). A model for optimizing retail space allocation. Management Science, 27:822–833.
Corstjens, M. and Doyle, P. (1983). A dynamic model for strategically allocating retail space. Journal of the Operational Research Society, 34:943–951.
Curhan, R.C. (1973). Shelf space allocation and profit maximization in mass retailing. Journal of Marketing, 37:54–60.
Desmet, P. and Renaudin, V. (1998). Estimation of product category sales responsiveness to allocated shelf space. International Journal of Research in Marketing, 15:443–457.
Drèze, X., Hoch, S.J., and Purk, M.E. (1994). Shelf management and space elasticity. Journal of Retailing, 70:301–326.
Food Marketing Institute Report (1999). Slotting allowances in the supermarket industry. FMI News Center. http://www.fmi.org
Jeuland, A.P. and Shugan, S.M. (1983). Coordination in marketing channels. In: D.A. Gautschi (ed.), Productivity and Efficiency in Distribution Systems, Elsevier Publishing Co.
Jørgensen, S., Sigué, S.P., and Zaccour, G. (2000). Dynamic cooperative advertising in a channel. Journal of Retailing, 76:71–92.
Jørgensen, S., Sigué, S.P., and Zaccour, G. (2001). Stackelberg leadership in a marketing channel. International Game Theory Review, 3:13–26.
Jørgensen, S., Taboubi, S., and Zaccour, G. (2001). Cooperative advertising in a marketing channel. Journal of Optimization Theory and Applications, 110:145–158.
Jørgensen, S., Taboubi, S., and Zaccour, G. (2003). Retail promotions with negative brand image effects: Is cooperation possible? European Journal of Operational Research, 150:395–405.
Jørgensen, S. and Zaccour, G. (1999). Equilibrium pricing and advertising strategies in a marketing channel. Journal of Optimization Theory and Applications, 102:111–125.
Jørgensen, S. and Zaccour, G. (2003). Channel coordination over time: Incentive equilibria and credibility. Journal of Economic Dynamics and Control, 27:801–822.
Jørgensen, S. and Zaccour, G. (2004). Differential Games in Marketing. Kluwer Academic Publishers.
Martín-Herrán, G. and Taboubi, S. (2004). Shelf-space allocation and advertising decisions in the marketing channel: A differential game approach. To appear in International Game Theory Review.
Martín-Herrán, G., Taboubi, S., and Zaccour, G. (2004). A Time-Consistent Open-Loop Stackelberg Equilibrium of Shelf-Space Allocation. GERAD Discussion Paper G-2004-16.
Nerlove, M. and Arrow, K.J. (1962). Optimal advertising policy under dynamic conditions. Economica, 29:129–142.
Roberts, M.J. and Samuelson, L. (1988). An empirical analysis of dynamic nonprice competition in an oligopolistic industry. RAND Journal of Economics, 19:200–220.
Taboubi, S. and Zaccour, G. (2002). Impact of retailer's myopia on channel's strategies. In: G. Zaccour (ed.), Optimal Control and Differential Games, Essays in Honor of Steffen Jørgensen, pages 179–192, Advances in Computational Management Science, Kluwer Academic Publishers.
Wang, Y. and Gerchak, Y. (2001). Supply chain coordination when demand is shelf-space dependent. Manufacturing & Service Operations Management, 3:82–87.
Yang, M.H. (2001). An efficient algorithm to allocate shelf space. European Journal of Operational Research, 131:107–118.
Zufryden, F.S. (1986). A dynamic programming approach for product selection and supermarket shelf-space allocation. Journal of the Operational Research Society, 37:413–422.
Chapter 13

SUBGAME CONSISTENT DORMANT-FIRM CARTELS
David W.K. Yeung
Abstract: Subgame consistency is a fundamental element in the solution of cooperative stochastic differential games. In particular, it ensures that the extension of the solution policy to a later starting time and any possible state brought about by prior optimal behavior of the players would remain optimal. Hence no player will have an incentive to deviate from the initial plan. Recently a general mechanism for the derivation of payoff distribution procedures of subgame consistent solutions in stochastic cooperative differential games has been found. In this paper, we consider a duopoly in which the firms agree to form a cartel. In particular, one firm has absolute and marginal cost advantage over the other, forcing the other firm to become a dormant firm. A subgame consistent solution based on the Nash bargaining axioms is derived.

1. Introduction

Formulation of optimal behaviors for players is a fundamental element in the theory of cooperative games. The players' behaviors satisfying some specific optimality principles constitute a solution of the game. In other words, the solution of a cooperative game is generated by a set of optimality principles (for instance, the Nash bargaining solution (1953) and the Shapley value (1953)). For dynamic games, an additional stringent condition on their solutions is required: the specific optimality principle must remain optimal at any instant of time throughout the game along the optimal state trajectory chosen at the outset. This condition is known as dynamic stability or time consistency. In particular, the dynamic stability of a solution of a cooperative differential game is the property that, when the game proceeds along an "optimal" trajectory, at each instant of time the players are guided by the same optimality principles, and hence do not have any ground for deviation from the previously adopted "optimal" behavior throughout the game.
The question of dynamic stability in differential games has been explored rigorously in the past three decades. Haurie (1976) discussed the problem of instability in extending the Nash bargaining solution to differential games. Petrosyan (1977) formalized mathematically the notion of dynamic stability in solutions of differential games. Petrosyan and Danilov (1979 and 1985) introduced the notion of "imputation distribution procedure" for cooperative solutions. Tolwinski et al. (1986) considered cooperative equilibria in differential games in which memory-dependent strategies and threats are introduced to maintain the agreed-upon control path. Petrosyan and Zenkevich (1996) provided a detailed analysis of dynamic stability in cooperative differential games. In particular, the method of regularization was introduced to construct time consistent solutions. Yeung and Petrosyan (2001) designed a time consistent solution in differential games and characterized the conditions that the allocation distribution procedure must satisfy. Petrosyan (2003) used the regularization method to construct time consistent bargaining procedures. A cooperative solution is subgame consistent if an extension of the solution policy to a situation with a later starting time and any feasible state would remain optimal. Subgame consistency is a stronger notion than time consistency. Petrosyan (1997) examined agreeable solutions in differential games. In the presence of stochastic elements, subgame consistency is required in addition to dynamic stability for a credible cooperative solution. In the field of cooperative stochastic differential games, little research has been published to date due to the inherent difficulties in deriving tractable subgame consistent solutions. Haurie et al. (1994) derived cooperative equilibria of a stochastic differential game of fishery with the use of monitoring and memory strategies.
As pointed out by Jørgensen and Zaccour (2001), conditions ensuring time consistency of cooperative solutions could be quite stringent and analytically intractable. The recent work of Yeung and Petrosyan (2004) developed a generalized theorem for the derivation of an analytically tractable "payoff distribution procedure" for subgame consistent solutions. Being capable of delivering analytically tractable solutions, the work is not only theoretically interesting but enables hitherto intractable problems in cooperative stochastic differential games to be fruitfully explored. In this paper, we consider a duopoly game in which one of the firms enjoys absolute cost advantage over the other. A subgame consistent solution is developed for a cartel in which one firm becomes a dormant partner. The paper is organized as follows. Section 2 presents the formulation of a dynamic duopoly game. In Section 3, Pareto optimal trajectories under cooperation are derived. Section 4 examines the notion of subgame consistency and the subgame consistent payoff distribution. Section 5 presents a subgame consistent cartel based on the Nash bargaining axioms. An illustration is provided in Section 6. Concluding remarks are given in Section 7.

2. A generalized dynamic duopoly game

Consider a duopoly in which two firms are allowed to extract a renewable resource within the duration [t0, T]. The dynamics of the resource is characterized by the stochastic differential equation:

$$dx(s) = f[s, x(s), u_1(s) + u_2(s)]\,ds + \sigma[s, x(s)]\,dz(s), \qquad x(t_0) = x_0 \in X, \tag{13.1}$$

where u_i ∈ U_i is the (nonnegative) amount of resource extracted by firm i, for i ∈ [1, 2], σ[s, x(s)] is a scaling function and z(s) is a Wiener process. The extraction cost for firm i ∈ [1, 2] depends on the quantity of resource extracted u_i(s) and the resource stock size x(s). In particular, firm i's extraction cost can be specified as c_i[u_i(s), x(s)].
This formulation of unit cost follows from two assumptions: (i) the cost of extraction is proportional to extraction effort, and (ii) the amount of resource extracted, seen as the output of a production function of two inputs (effort and stock level), is increasing in both inputs (see Clark (1976)). In particular, firm 1 has absolute and marginal cost advantage so that c_1(u_1, x) < c_2(u_2, x) and ∂c_1(u_1, x)/∂u_1 < ∂c_2(u_2, x)/∂u_2. The market price of the resource depends on the total amount extracted and supplied to the market. The price-output relationship at time s is given by the downward sloping inverse demand curve P(s) = g[Q(s)], where Q(s) = u_1(s) + u_2(s) is the total amount of resource extracted and marketed at time s. At time T, firm i will receive a termination bonus q_i(x(T)). There exists a discount rate r, and profits received at time t have to be discounted by the factor exp[−r(t − t_0)]. At time t_0, the expected profit of firm i ∈ [1, 2] is:
$$E_{t_0}\left\{\int_{t_0}^{T}\Big\{g[u_1(s)+u_2(s)]\,u_i(s) - c_i[u_i(s), x(s)]\Big\}\exp[-r(s-t_0)]\,ds + \exp[-r(T-t_0)]\,q_i[x(T)] \;\Big|\; x(t_0)=x_0\right\},\tag{13.2}$$

where E_{t_0} denotes the expectation operator performed at time t_0.

We use Γ(x_0, T − t_0) to denote the game (13.1)–(13.2), and Γ(x_τ, T − τ) to denote an alternative game with state dynamics (13.1) and payoff structure (13.2) which starts at time τ ∈ [t_0, T] with initial state x_τ ∈ X. A noncooperative Nash equilibrium solution of the game Γ(x_τ, T − τ) can be characterized with the techniques introduced by Fleming (1969), Isaacs (1965) and Bellman (1957) as:

Definition 13.1 A set of feedback strategies {u_i(t) = φ_i^{(τ)*}(t, x), for i ∈ [1, 2]} provides a Nash equilibrium solution to the game Γ(x_τ, T − τ) if there exist twice continuously differentiable functions V^{(τ)i}(t, x): [τ, T] × R → R, i ∈ [1, 2], satisfying the following partial differential equations:

$$-V_t^{(\tau)i}(t,x) - \frac{1}{2}\,\sigma(t,x)^2\, V_{xx}^{(\tau)i}(t,x) = \max_{u_i}\left\{\Big[g\big(u_i + \phi_j^{(\tau)*}(t,x)\big)\,u_i - c_i[u_i, x]\Big]\exp[-r(t-\tau)] + V_x^{(\tau)i}(t,x)\, f\big(t, x, u_i + \phi_j^{(\tau)*}(t,x)\big)\right\},$$

$$V^{(\tau)i}(T,x) = q_i(x)\exp[-r(T-\tau)], \qquad \text{for } i \in [1,2],\ j \in [1,2] \text{ and } j \neq i.$$

Remark 13.1 From Definition 13.1, one can readily verify that V^{(τ)i}(t, x) = exp[−r(s − τ)] V^{(s)i}(t, x) and φ_i^{(τ)*}(t, x) = φ_i^{(s)*}(t, x), for i ∈ [1, 2], t_0 ≤ τ ≤ s ≤ t ≤ T and x ∈ X.

3. Dynamic cooperation and Pareto optimal trajectory

Assume that the firms agree to form a cartel. Since profits are in monetary terms, these firms are required to solve the following joint profit maximization problem to achieve a Pareto optimum:

$$E_{t_0}\left\{\int_{t_0}^{T}\Big\{g[u_1(s)+u_2(s)]\,[u_1(s)+u_2(s)] - c_1[u_1(s), x(s)] - c_2[u_2(s), x(s)]\Big\}\exp[-r(s-t_0)]\,ds \right.$$
$$\left. \qquad +\ \exp[-r(T-t_0)]\,\big(q_1[x(T)] + q_2[x(T)]\big) \;\Big|\; x(t_0)=x_0\right\},\tag{13.3}$$

subject to dynamics (13.1). An optimal solution of the problem (13.1) and (13.3) can be characterized with Fleming's (1969) stochastic control techniques as:
Definition 13.2 A set of feedback strategies {ψ_1^{(t_0)*}(s, x), ψ_2^{(t_0)*}(s, x), for s ∈ [t_0, T]} provides an optimal control solution to the problem (13.1) and (13.3) if there exists a twice continuously differentiable function W^{(t_0)}(t, x): [t_0, T] × R → R satisfying the following partial differential equations:

$$-W_t^{(t_0)}(t,x) - \frac{1}{2}\,\sigma(t,x)^2\, W_{xx}^{(t_0)}(t,x) = \max_{u_1, u_2}\left\{\Big[g(u_1+u_2)(u_1+u_2) - c_1(u_1,x) - c_2(u_2,x)\Big]\exp[-r(t-t_0)] + W_x^{(t_0)}(t,x)\, f(t, x, u_1+u_2)\right\},$$

$$W^{(t_0)}(T,x) = [q_1(x) + q_2(x)]\exp[-r(T-t_0)].$$

Performing the indicated maximization in Definition 13.2 yields:
$$g'(u_1+u_2)\,u_1 + g(u_1+u_2) - \partial c_1(u_1,x)/\partial u_1 + W_x^{(t_0)}(t,x)\, f_{u_1+u_2}(t,x,u_1+u_2) \le 0, \tag{13.4}$$

and

$$g'(u_1+u_2)\,u_2 + g(u_1+u_2) - \partial c_2(u_2,x)/\partial u_2 + W_x^{(t_0)}(t,x)\, f_{u_1+u_2}(t,x,u_1+u_2) \le 0. \tag{13.5}$$

Since ∂c_1(u_1, x)/∂u_1 < ∂c_2(u_2, x)/∂u_2, firm 2 has to refrain from extraction.

Upon substituting ψ_1^{(t_0)*}(t, x) and ψ_2^{(t_0)*}(t, x) into (13.1), the optimal cooperative state dynamics are obtained as:

$$dx(s) = f\big[s, x(s), \psi_1^{(t_0)*}(s, x(s))\big]\,ds + \sigma[s, x(s)]\,dz(s), \qquad x(t_0) = x_0 \in X. \tag{13.6}$$
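The corner-solution logic behind the dormant firm can be checked numerically. The sketch below is ours, not the chapter's: it specializes to the inverse demand P = Q^{-1/2} and extraction costs c_i u_i / x^{1/2} used in the illustration of Section 6, fixes the stock x, and grid-searches the cartel's instantaneous profit; with c_1 < c_2 every maximizer sets u_2 = 0.

```python
# Joint cartel profit at a fixed stock level x, with inverse demand
# P(Q) = Q**(-0.5) and extraction costs c_i * u_i / sqrt(x), as in the
# Section 6 illustration.  Grid-search the joint profit over (u1, u2):
# whenever c1 < c2, the optimum puts u2 = 0, i.e. the high-cost firm
# stays dormant.

def joint_profit(u1, u2, c1, c2, x):
    q = u1 + u2
    if q <= 0.0:
        return 0.0
    revenue = q ** 0.5                      # P(Q) * Q = Q**0.5
    cost = (c1 * u1 + c2 * u2) / x ** 0.5
    return revenue - cost

def best_allocation(c1, c2, x, grid=81, u_max=2.0):
    # exhaustive search over a (grid x grid) lattice of extraction pairs
    best = (0.0, 0.0, 0.0)                  # (profit, u1, u2)
    for i in range(grid):
        for j in range(grid):
            u1 = u_max * i / (grid - 1)
            u2 = u_max * j / (grid - 1)
            p = joint_profit(u1, u2, c1, c2, x)
            if p > best[0]:
                best = (p, u1, u2)
    return best

profit, u1_star, u2_star = best_allocation(c1=1.0, c2=1.5, x=4.0)
```

Shifting any output to firm 2 at a fixed total Q leaves revenue unchanged but raises cost, which is exactly the content of conditions (13.4)–(13.5).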
The solution to (13.6) yields a Pareto optimal trajectory, which can be expressed as:

$$x^*(t) = x_0 + \int_{t_0}^{t} f\big[s, x^*(s), \psi_1^{(t_0)*}(s, x^*(s))\big]\,ds + \int_{t_0}^{t} \sigma[s, x^*(s)]\,dz(s). \tag{13.7}$$

We denote the set containing realizable values of x^*(t) by X_t^*, for t ∈ (t_0, T]. We use Γ_c(x_0, T − t_0) to denote the cooperative game (13.1)–(13.2), and Γ_c(x_τ, T − τ) to denote an alternative game with state dynamics (13.1) and payoff structure (13.2) which starts at time τ ∈ [t_0, T] with initial state x_τ ∈ X_τ^*.

Remark 13.2 One can readily show that:
$$W^{(\tau)}(s, x) = \exp[-r(t-\tau)]\,W^{(t)}(s, x), \qquad \text{and} \qquad \psi_i^{(\tau)*}(s, x(s)) = \psi_i^{(t)*}(s, x(s)),$$

for s ∈ [t, T], i ∈ [1, 2] and t_0 ≤ τ ≤ t ≤ s ≤ T.

4. Subgame consistency and payoff distribution

Consider the cooperative game Γ_c(x_0, T − t_0) in which the total cooperative payoff is distributed between the two firms according to an agreed-upon optimality principle. At time t_0, with the state being x_0, we use the term ξ^{(t_0)i}(t_0, x_0) to denote the expected share/imputation of the total cooperative payoff (received over the time interval [t_0, T]) given to firm i under the agreed-upon optimality principle. We use Γ_c(x_τ, T − τ) to denote the cooperative game which starts at time τ ∈ [t_0, T] with initial state x_τ ∈ X_τ^*. Once again, the total cooperative payoff is distributed between the two firms according to the same agreed-upon optimality principle as before. Let ξ^{(τ)i}(τ, x_τ) denote the expected share/imputation of the total cooperative payoff given to firm i over the time interval [τ, T]. The vector ξ^{(τ)}(τ, x_τ) = [ξ^{(τ)1}(τ, x_τ), ξ^{(τ)2}(τ, x_τ)], for τ ∈ (t_0, T], yields valid imputations if the following conditions are satisfied.

Definition 13.3 The vector ξ^{(τ)}(τ, x_τ) is an imputation of the cooperative game Γ_c(x_τ, T − τ), for τ ∈ (t_0, T], if it satisfies:
(i) $\sum_{j=1}^{2} \xi^{(\tau)j}(\tau, x_\tau) = W^{(\tau)}(\tau, x_\tau)$, and

(ii) $\xi^{(\tau)i}(\tau, x_\tau) \ge V^{(\tau)i}(\tau, x_\tau)$, for i ∈ [1, 2] and τ ∈ [t_0, T].

In particular, part (i) of Definition 13.3 ensures Pareto optimality, while part (ii) guarantees individual rationality.

A payoff distribution procedure (PDP) of the cooperative game (as proposed in Petrosyan (1997) and Yeung and Petrosyan (2004)) must now be formulated so that the agreed imputations can be realized. Let the vectors B^τ(s) = [B_1^τ(s), B_2^τ(s)] denote the instantaneous payoffs at time s ∈ [τ, T] for the cooperative game Γ_c(x_τ, T − τ). In other words, firm i, for i ∈ [1, 2], is offered a payoff equaling B_i^τ(s) at time instant s. A terminal payment $\bar q_i(x(T))$ is given to firm i at time T. In particular, B_i^τ(s) and $\bar q_i(x(T))$ constitute a PDP for the game Γ_c(x_τ, T − τ) in the sense that ξ^{(τ)i}(τ, x_τ) equals:
$$E_\tau\left\{\int_{\tau}^{T} B_i^\tau(s)\exp[-r(s-\tau)]\,ds + \bar q_i(x(T))\exp[-r(T-\tau)] \;\Big|\; x(\tau) = x_\tau\right\},\tag{13.8}$$

for i ∈ [1, 2] and τ ∈ [t_0, T]. Moreover, for i ∈ [1, 2] and t ∈ [τ, T], we use ξ^{(τ)i}(t, x_t), which equals

$$E_\tau\left\{\int_{t}^{T} B_i^\tau(s)\exp[-r(s-\tau)]\,ds + \bar q_i(x(T))\exp[-r(T-\tau)] \;\Big|\; x(t) = x_t\right\},\tag{13.9}$$

to denote the expected present value of firm i's cooperative payoff over the time interval [t, T], given that the state is x_t at time t ∈ [τ, T], for the game which starts at time τ with state x_τ ∈ X_τ^*.

Definition 13.4 The imputation vectors ξ^{(t)}(t, x_t) = [ξ^{(t)1}(t, x_t), ξ^{(t)2}(t, x_t)], for t ∈ [t_0, T], as defined by (13.8) and (13.9), are subgame consistent imputations of Γ_c(x_τ, T − τ) if they satisfy Definition 13.3 and the condition that ξ^{(τ)i}(t, x_t) = exp[−r(t − τ)] ξ^{(t)i}(t, x_t), where t_0 ≤ τ ≤ t ≤ T, i ∈ [1, 2] and x_t ∈ X_t^*.
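In the deterministic special case (σ = 0) the mechanics of (13.8)–(13.9) can be verified directly. The sketch below is ours: ζ(t, x) is an arbitrary smooth stand-in for the current-value imputation ξ^{(t)i}(t, x), f is a toy drift, and the payment rate is obtained by differentiating the discounted imputation along the state trajectory (the σ = 0 form of the rate derived in Theorem 13.1 below). Integrating the payments plus the discounted terminal payment then recovers ζ(τ, x_τ).

```python
# Deterministic (sigma = 0) sanity check of the payoff distribution
# procedure: with xi^{(tau)i}(t,x) = exp(-r*(t-tau)) * zeta(t,x) and
# B_i(tau) = r*zeta - zeta_t - zeta_x*f evaluated along the trajectory,
# the discounted integral of B_i plus the discounted terminal payment
# reproduces zeta(tau, x_tau).  zeta and f are illustrative toy
# functions, not the chapter's model.
import math

r, T = 0.1, 1.0

def f(t, x):                  # toy deterministic state dynamics dx/dt
    return -0.5 * x

def zeta(t, x):               # toy current-value imputation xi^{(t)i}(t, x)
    return (1.0 + t) * x + t * t

def zeta_t(t, x):
    return x + 2.0 * t

def zeta_x(t, x):
    return 1.0 + t

def B(t, x):                  # instantaneous imputation rate (sigma = 0)
    return r * zeta(t, x) - zeta_t(t, x) - zeta_x(t, x) * f(t, x)

# integrate the state and the discounted payments from tau = 0 to T
n, x, t, acc = 100000, 2.0, 0.0, 0.0
dt = T / n
for _ in range(n):
    acc += B(t, x) * math.exp(-r * t) * dt   # left-endpoint Riemann sum
    x += f(t, x) * dt                        # Euler step of dx = f dt
    t += dt

total = acc + math.exp(-r * T) * zeta(T, x)  # payments + terminal payment
target = zeta(0.0, 2.0)                      # imputation promised at tau = 0
```

Up to the O(dt) discretization error, `total` coincides with `target`, which is the condition (13.8) imposes on a valid PDP.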
The conditions in Definition 13.4 guarantee subgame consistency of the solution imputations throughout the game interval, in the sense that the extension of the solution policy to a situation with a later starting time and any possible state brought about by prior optimal behavior of the players remains optimal.

For Definition 13.4 to hold, it is required that B_i^τ(s) = B_i^t(s), for i ∈ [1, 2] and τ, t ∈ [t_0, T]. Adopting the notation B_i^τ(s) = B_i^t(s) = B_i(s) and applying Definition 13.4, the PDP of the subgame consistent imputation vectors ξ^{(τ)}(τ, x_τ) has to satisfy the following conditions.

Corollary 13.1 The PDP with B(s) and $\bar q(x(T))$ corresponding to the subgame consistent imputation vectors ξ^{(τ)}(τ, x_τ) must satisfy the following conditions:

(i) $\sum_{j=1}^{2} B_j(s) = g\big[\psi_1^{(\tau)*}(s) + \psi_2^{(\tau)*}(s)\big]\big[\psi_1^{(\tau)*}(s) + \psi_2^{(\tau)*}(s)\big] - c_1\big[\psi_1^{(\tau)*}(s), x^*(s)\big] - c_2\big[\psi_2^{(\tau)*}(s), x^*(s)\big]$, for s ∈ [t_0, T];

(ii) $E_\tau\left\{\int_{\tau}^{T} B_i(s)\exp[-r(s-\tau)]\,ds + \bar q_i(x(T))\exp[-r(T-\tau)] \;\Big|\; x(\tau) = x_\tau\right\} \ge V^{(\tau)i}(\tau, x_\tau)$, for i ∈ [1, 2] and τ ∈ [t_0, T]; and

(iii) $\xi^{(\tau)i}(\tau, x_\tau) = E_\tau\left\{\int_{\tau}^{\tau+\Delta t} B_i(s)\exp[-r(s-\tau)]\,ds + \exp\left[-\int_{\tau}^{\tau+\Delta t} r\,dy\right]\xi^{(\tau+\Delta t)i}(\tau+\Delta t,\, x_\tau+\Delta x_\tau) \;\Big|\; x(\tau) = x_\tau\right\}$, for τ ∈ [t_0, T] and i ∈ [1, 2];

where

$$\Delta x_\tau = f\big[\tau, x_\tau, \psi_1^{(\tau)*}(\tau, x_\tau)\big]\,\Delta t + \sigma[\tau, x_\tau]\,\Delta z_\tau + o(\Delta t),$$

x(τ) = x_τ ∈ X_τ^*, Δz_τ = z(τ + Δt) − z(τ), and E_τ[o(Δt)]/Δt → 0 as Δt → 0.

Consider the following condition concerning the subgame consistent imputations ξ^{(τ)}(τ, x_τ), for τ ∈ [t_0, T]:

Condition 13.1 For i ∈ [1, 2], t ≥ τ and τ ∈ [t_0, T], the terms ξ^{(τ)i}(t, x_t) are functions that are continuously twice differentiable in t and x_t.

If the subgame consistent imputations ξ^{(τ)}(τ, x_τ), for τ ∈ [t_0, T], satisfy Condition 13.1, a PDP with B(s) and $\bar q(x(T))$ will yield the relationship:
$$E_\tau\left\{\int_{\tau}^{\tau+\Delta t} B_i(s)\exp\left[-\int_{\tau}^{s} r\,dy\right]ds \;\Big|\; x(\tau) = x_\tau\right\}$$
$$= E_\tau\left\{\xi^{(\tau)i}(\tau, x_\tau) - \exp\left[-\int_{\tau}^{\tau+\Delta t} r\,dy\right]\xi^{(\tau+\Delta t)i}(\tau+\Delta t,\, x_\tau+\Delta x_\tau)\right\}$$
$$= E_\tau\left\{\xi^{(\tau)i}(\tau, x_\tau) - \xi^{(\tau)i}(\tau+\Delta t,\, x_\tau+\Delta x_\tau)\right\},\tag{13.10}$$

for all τ ∈ [t_0, T] and i ∈ [1, 2]. With Δt → 0, condition (13.10) can be expressed as:

$$E_\tau\big\{B_i(\tau)\,\Delta t + o(\Delta t)\big\} = E_\tau\left\{-\,\xi_t^{(\tau)i}(t, x_t)\Big|_{t=\tau}\Delta t - \xi_{x_t}^{(\tau)i}(t, x_t)\Big|_{t=\tau} f\big[\tau, x_\tau, \psi_1^{(\tau)*}(\tau, x_\tau)\big]\Delta t \right.$$
$$\left. -\ \frac{1}{2}\,\sigma[\tau, x_\tau]^2\, \xi_{x_t x_t}^{(\tau)i}(t, x_t)\Big|_{t=\tau}\Delta t - \xi_{x_t}^{(\tau)i}(t, x_t)\Big|_{t=\tau}\,\sigma[\tau, x_\tau]\,\Delta z_\tau - o(\Delta t)\right\}.\tag{13.11}$$

Taking expectation and dividing (13.11) throughout by Δt, with Δt → 0, yields:

$$B_i(\tau) = -\,\xi_t^{(\tau)i}(t, x_t)\Big|_{t=\tau} - \xi_{x_t}^{(\tau)i}(t, x_t)\Big|_{t=\tau} f\big[\tau, x_\tau, \psi_1^{(\tau)*}(\tau, x_\tau)\big] - \frac{1}{2}\,\sigma[\tau, x_\tau]^2\, \xi_{x_t x_t}^{(\tau)i}(t, x_t)\Big|_{t=\tau}.\tag{13.12}$$

Therefore, one can establish the following theorem.

Theorem 13.1 (Yeung-Petrosyan Equation (2004)) If the solution imputations ξ^{(τ)i}(τ, x_τ), for i ∈ [1, 2] and τ ∈ [t_0, T], satisfy Definition 13.4 and Condition 13.1, a PDP with a terminal payment $\bar q_i(x(T))$ at time T and an instantaneous imputation rate at time τ ∈ [t_0, T]:

$$B_i(\tau) = -\,\xi_t^{(\tau)i}(t, x_t)\Big|_{t=\tau} - \xi_{x_t}^{(\tau)i}(t, x_t)\Big|_{t=\tau} f\big[\tau, x_\tau, \psi_1^{(\tau)*}(\tau, x_\tau)\big] - \frac{1}{2}\,\sigma[\tau, x_\tau]^2\, \xi_{x_t x_t}^{(\tau)i}(t, x_t)\Big|_{t=\tau}, \qquad \text{for } i \in [1, 2],$$

yields a subgame consistent solution to the cooperative game Γ_c(x_0, T − t_0).

5. A Subgame Consistent Cartel

In this section, we present a subgame consistent solution in which the firms agree to maximize the sum of their expected profits and divide the total cooperative profit according to the Nash bargaining outcome, that is, they maximize the product of expected profits in excess of the noncooperative profits. The Nash bargaining solution is perhaps the most popular cooperative solution concept, possessing properties not dominated by those of any other solution concept. Assume that the agents agree to act and share the total cooperative profit according to an optimality principle satisfying the Nash bargaining axioms: (i) Pareto optimality, (ii) symmetry, (iii) invariance to affine transformations, and (iv) independence of irrelevant alternatives. In economic cooperation where profits are measured in monetary terms, Nash bargaining implies that agents agree to maximize the sum of their profits and then divide the total cooperative profit according to the Nash bargaining outcome, that is, they maximize the product of the agents' gains in excess of the noncooperative profits. In the two-player case with transferable payoffs, the Nash bargaining outcome also coincides with the Shapley value. Let S^i denote the aggregate cooperative gain imputed to agent i; the Nash product can be expressed as
$$\big[S^i\big]\left[W^{(t_0)}(t_0, x_0) - \sum_{j=1}^{2} V^{(t_0)j}(t_0, x_0) - S^i\right].$$

Maximization of the Nash product yields

$$S^i = \frac{1}{2}\left[W^{(t_0)}(t_0, x_0) - \sum_{j=1}^{2} V^{(t_0)j}(t_0, x_0)\right], \qquad \text{for } i \in [1, 2].$$

The sharing scheme satisfies the so-called Nash formula (see Dixit and Skeath (1999)) for splitting a total value W^{(t_0)}(t_0, x_0) symmetrically. To extend the scheme to a dynamic setting, we first propose that the optimality principle guided by the Nash bargaining outcome be maintained not only at the outset of the game but at every instant within the game interval. Dynamic Nash bargaining can therefore be characterized as: the firms agree to maximize the sum of their expected profits and distribute the total cooperative profit among themselves so that the Nash bargaining outcome is maintained at every instant of time τ ∈ [t_0, T]. According to the optimality principle generated by dynamic Nash bargaining as stated above, the imputation vectors must satisfy:
Proposition 13.1 In the cooperative game Γ_c(x_τ, T − τ), for τ ∈ [t_0, T], under dynamic Nash bargaining,

$$\xi^{(\tau)i}(\tau, x_\tau) = V^{(\tau)i}(\tau, x_\tau) + \frac{1}{2}\left[W^{(\tau)}(\tau, x_\tau) - \sum_{j=1}^{2} V^{(\tau)j}(\tau, x_\tau)\right], \qquad \text{for } i \in [1, 2].$$

Note that each firm will receive an expected profit equaling its expected noncooperative profit plus half of the expected gains in excess of expected noncooperative profits over the period [τ, T], for τ ∈ [t_0, T]. The imputations in Proposition 13.1 satisfy Condition 13.1 and Definition 13.4. Note that:

$$\xi^{(t)i}(t, x_t) = \exp\left[\int_{\tau}^{t} r\,dy\right]\xi^{(\tau)i}(t, x_t) \equiv \exp\left[\int_{\tau}^{t} r\,dy\right]\left\{V^{(\tau)i}(t, x_t) + \frac{1}{2}\left[W^{(\tau)}(t, x_t) - \sum_{j=1}^{2} V^{(\tau)j}(t, x_t)\right]\right\},\tag{13.13}$$

for t_0 ≤ τ ≤ t, and hence the imputations satisfy Definition 13.4. Therefore Proposition 13.1 gives the imputations of a subgame consistent solution to the cooperative game Γ_c(x_0, T − t_0). Using Theorem 13.1, we obtain a PDP with a terminal payment $\bar q_i(x(T))$ at time T and an instantaneous imputation rate at time τ ∈ [t_0, T]:

$$B_i(\tau) = -\frac{1}{2}\left\{V_t^{(\tau)i}(t, x_t)\Big|_{t=\tau} + V_{x_t}^{(\tau)i}(t, x_t)\Big|_{t=\tau} f\big[\tau, x_\tau, \psi_1^{(\tau)*}(\tau, x_\tau)\big] + \frac{1}{2}\,\sigma[\tau, x_\tau]^2\, V_{x_t x_t}^{(\tau)i}(t, x_t)\Big|_{t=\tau}\right\}$$
$$-\ \frac{1}{2}\left\{W_t^{(\tau)}(t, x_t)\Big|_{t=\tau} + W_{x_t}^{(\tau)}(t, x_t)\Big|_{t=\tau} f\big[\tau, x_\tau, \psi_1^{(\tau)*}(\tau, x_\tau)\big] + \frac{1}{2}\,\sigma[\tau, x_\tau]^2\, W_{x_t x_t}^{(\tau)}(t, x_t)\Big|_{t=\tau}\right\}$$
$$+\ \frac{1}{2}\left\{V_t^{(\tau)j}(t, x_t)\Big|_{t=\tau} + V_{x_t}^{(\tau)j}(t, x_t)\Big|_{t=\tau} f\big[\tau, x_\tau, \psi_1^{(\tau)*}(\tau, x_\tau)\big] + \frac{1}{2}\,\sigma[\tau, x_\tau]^2\, V_{x_t x_t}^{(\tau)j}(t, x_t)\Big|_{t=\tau}\right\},\tag{13.14}$$

for i ∈ [1, 2] and j ∈ [1, 2], j ≠ i.

6. An Illustration

Consider a duopoly in which two firms are allowed to extract a renewable resource within the duration [t_0, T]. The dynamics of the resource is characterized by the stochastic differential equation:

$$dx(s) = \big[a\,x(s)^{1/2} - b\,x(s) - u_1(s) - u_2(s)\big]\,ds + \sigma\,x(s)\,dz(s), \qquad x(t_0) = x_0 \in X,\tag{13.15}$$

where u_i ∈ U_i is the (nonnegative) amount of resource extracted by firm i, for i ∈ [1, 2], a, b and σ are positive constants, and z(s) is a Wiener process. Similar stock dynamics of a biomass of renewable resource have been used in Jørgensen and Yeung (1996 and 1999) and Yeung (2001 and 2003). The extraction cost for firm i ∈ [1, 2] depends on the quantity of resource extracted u_i(s), the resource stock size x(s), and a parameter c_i. In particular, firm i's extraction cost can be specified as c_i u_i(s) x(s)^{−1/2}.

This formulation of unit cost follows from two assumptions: (i) the cost of extraction is proportional to extraction effort, and (ii) the amount of resource extracted, seen as the output of a production function of two inputs (effort and stock level), is increasing in both inputs (see Clark (1976)). In particular, firm 1 has absolute cost advantage and c_1 < c_2. The market price of the resource depends on the total amount extracted and supplied to the market. The price-output relationship at time s is given by the downward sloping inverse demand curve P(s) = Q(s)^{−1/2}, where Q(s) = u_1(s) + u_2(s) is the total amount of resource extracted and marketed at time s. At time T, firm i will receive a termination bonus q_i x(T)^{1/2}, where q_i is nonnegative. There exists a discount rate r, and profits received at time t have to be discounted by the factor exp[−r(t − t_0)]. At time t_0, the expected profit of firm i ∈ [1, 2] is:
$$E_{t_0}\left\{\int_{t_0}^{T}\left[\frac{u_i(s)}{[u_1(s)+u_2(s)]^{1/2}} - \frac{c_i}{x(s)^{1/2}}\,u_i(s)\right]\exp[-r(s-t_0)]\,ds + \exp[-r(T-t_0)]\,q_i\,x(T)^{1/2} \;\Big|\; x(t_0) = x_0\right\},\tag{13.16}$$

where E_{t_0} denotes the expectation operator performed at time t_0. A set of feedback strategies {u_i(t) = φ_i^{(τ)*}(t, x), for i ∈ [1, 2]} provides a Nash equilibrium solution to the game Γ(x_τ, T − τ) if there exist twice continuously differentiable functions V^{(τ)i}(t, x): [τ, T] × R → R, i ∈ [1, 2], satisfying the following partial differential equations:

$$-V_t^{(\tau)i}(t,x) - \frac{1}{2}\,\sigma^2 x^2\, V_{xx}^{(\tau)i}(t,x) = \max_{u_i}\left\{\left[\frac{u_i}{\big(u_i + \phi_j^{(\tau)*}(t,x)\big)^{1/2}} - \frac{c_i}{x^{1/2}}\,u_i\right]\exp[-r(t-\tau)] + V_x^{(\tau)i}(t,x)\Big[a\,x^{1/2} - b\,x - u_i - \phi_j^{(\tau)*}(t,x)\Big]\right\},$$

$$V^{(\tau)i}(T,x) = q_i\,x^{1/2}\exp[-r(T-\tau)], \qquad \text{for } i \in [1,2],\ j \in [1,2] \text{ and } j \neq i.\tag{13.17}$$

Proposition 13.2 The value function of firm i in the game Γ(x_τ, T − τ) is:
$$V^{(\tau)i}(t, x) = \exp[-r(t-\tau)]\left[A_i(t)\,x^{1/2} + B_i(t)\right], \qquad \text{for } i \in [1, 2] \text{ and } t \in [\tau, T],\tag{13.18}$$

where A_i(t), B_i(t), A_j(t) and B_j(t), for i ∈ [1, 2] and j ∈ [1, 2] and i ≠ j, satisfy:

$$\dot A_i(t) = \left[r + \frac{1}{8}\sigma^2 + \frac{b}{2}\right]A_i(t) - \frac{3}{2}\,\frac{\big[2c_j - c_i + A_j(t) - A_i(t)/2\big]}{\big[c_1 + c_2 + A_1(t)/2 + A_2(t)/2\big]^2} + \frac{9}{4}\,\frac{c_i\big[2c_j - c_i + A_j(t) - A_i(t)/2\big]}{\big[c_1 + c_2 + A_1(t)/2 + A_2(t)/2\big]^3} + \frac{9}{8}\,\frac{A_i(t)}{\big[c_1 + c_2 + A_1(t)/2 + A_2(t)/2\big]^2},$$

$$A_i(T) = q_i; \qquad \dot B_i(t) = r\,B_i(t) - \frac{a}{2}\,A_i(t), \qquad B_i(T) = 0.$$

Proof. Perform the indicated maximization in (13.17) and then substitute the results back into the set of partial differential equations. Solving this set of equations yields Proposition 13.2. □

Assume that the firms agree to form a cartel and seek to solve the following joint profit maximization problem to achieve a Pareto optimum:
$$E_{t_0}\left\{\int_{t_0}^{T}\left[\big(u_1(s)+u_2(s)\big)^{1/2} - \frac{c_1 u_1(s) + c_2 u_2(s)}{x(s)^{1/2}}\right]\exp[-r(s-t_0)]\,ds + \exp[-r(T-t_0)]\,[q_1+q_2]\,x(T)^{1/2} \;\Big|\; x(t_0) = x_0\right\},\tag{13.19}$$

subject to dynamics (13.15).

A set of feedback strategies {ψ_1^{(t_0)*}(s, x), ψ_2^{(t_0)*}(s, x), for s ∈ [t_0, T]} provides an optimal control solution to the problem (13.15) and (13.19) if there exists a twice continuously differentiable function W^{(t_0)}(t, x): [t_0, T] × R → R satisfying the following partial differential equations:

$$-W_t^{(t_0)}(t,x) - \frac{1}{2}\,\sigma^2 x^2\, W_{xx}^{(t_0)}(t,x) = \max_{u_1, u_2}\left\{\Big[(u_1+u_2)^{1/2} - (c_1 u_1 + c_2 u_2)\,x^{-1/2}\Big]\exp[-r(t-t_0)] + W_x^{(t_0)}(t,x)\Big[a\,x^{1/2} - b\,x - u_1 - u_2\Big]\right\},$$

$$W^{(t_0)}(T,x) = (q_1+q_2)\,x^{1/2}\exp[-r(T-t_0)].$$

The indicated maximization operation in the above definition requires:

$$\psi_1^{(t_0)*}(t, x) = \frac{x}{4\Big[c_1 + W_x^{(t_0)}(t,x)\exp[r(t-t_0)]\,x^{1/2}\Big]^2} \qquad \text{and} \qquad \psi_2^{(t_0)*}(t, x) = 0.\tag{13.20}$$

Along the optimal trajectory, firm 2 has to refrain from extraction.

Proposition 13.3 The value function of the stochastic control problem (13.15) and (13.19) can be obtained as:

$$W^{(t_0)}(t, x) = \exp[-r(t-t_0)]\left[A(t)\,x^{1/2} + B(t)\right],\tag{13.21}$$

where A(t) and B(t) satisfy:

$$\dot A(t) = \left[r + \frac{1}{8}\sigma^2 + \frac{b}{2}\right]A(t) - \frac{1}{4\big[c_1 + A(t)/2\big]}, \qquad A(T) = q_1 + q_2;$$

$$\dot B(t) = r\,B(t) - \frac{a}{2}\,A(t), \qquad B(T) = 0.$$

Proof. Substitute the results from (13.20) into the above partial differential equations. Solving these equations yields Proposition 13.3. □

Upon substituting ψ_1^{(t_0)*}(t, x) and ψ_2^{(t_0)*}(t, x) into (13.15), the optimal cooperative state dynamics are obtained as:

$$dx(s) = \left[a\,x(s)^{1/2} - b\,x(s) - \frac{x(s)}{4\big[c_1 + A(s)/2\big]^2}\right]ds + \sigma\,x(s)\,dz(s), \qquad x(t_0) = x_0 \in X.\tag{13.22}$$
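Proposition 13.3 reduces the cartel's value function to a terminal-value problem for A(t) and B(t). The numerical sketch below is ours (all parameter values are illustrative choices, not taken from the chapter): it steps the two equations backward from t = T with an Euler scheme, then uses A(t) to integrate the deterministic (σ = 0) version of the cooperative dynamics (13.22) forward.

```python
# Backward Euler for the terminal-value system of Proposition 13.3,
# followed by forward Euler for the sigma = 0 version of (13.22).
# Parameter values are illustrative only.
r, sigma, a, b = 0.05, 0.2, 1.0, 0.3
c1, q1, q2 = 1.0, 0.5, 0.5
T, n = 1.0, 2000
dt = T / n

def dA(A):
    # right-hand side of the A-dot equation in Proposition 13.3
    return (r + sigma ** 2 / 8.0 + b / 2.0) * A - 1.0 / (4.0 * (c1 + A / 2.0))

A = [0.0] * (n + 1)
B = [0.0] * (n + 1)
A[n] = q1 + q2                 # terminal condition A(T) = q1 + q2
B[n] = 0.0                     # terminal condition B(T) = 0
for k in range(n, 0, -1):      # step backward: A(t - dt) = A(t) - dt * A-dot(t)
    A[k - 1] = A[k] - dt * dA(A[k])
    B[k - 1] = B[k] - dt * (r * B[k] - (a / 2.0) * A[k])

# forward Euler for dx = [a*sqrt(x) - b*x - x/(4*(c1 + A/2)**2)] dt
x = 1.0
for k in range(n):
    u1 = x / (4.0 * (c1 + A[k] / 2.0) ** 2)   # cartel output, firm 2 dormant
    x += dt * (a * x ** 0.5 - b * x - u1)
```

With these parameters the stock grows toward its cooperative steady state, and B(t) is positive for t < T because the shadow value A(t) of the stock stays positive.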
The solution to (13.22) yields a Pareto optimal trajectory, which can be expressed as:

$$x^*(t) = \Phi(t, t_0)\left[x_0^{1/2} + \int_{t_0}^{t}\frac{a}{2}\,\Phi^{-1}(s, t_0)\,ds\right]^2,\tag{13.23}$$

where

$$\Phi(t, t_0) = \exp\left\{\int_{t_0}^{t}\left[-\frac{b}{2} - \frac{3\sigma^2}{8} - \frac{1}{8\big[c_1 + A(s)/2\big]^2}\right]ds + \int_{t_0}^{t}\frac{\sigma}{2}\,dz(s)\right\}.$$

We denote the set containing realizable values of x^*(t) by X_t^*, for t ∈ (t_0, T].

Using Theorem 13.1, we obtain a PDP with a terminal payment $\bar q_i(x(T))$ at time T and an instantaneous imputation rate at time τ ∈ [t_0, T]:

$$B_i(\tau) = -\frac{1}{2}\left\{V_t^{(\tau)i}(t, x_t)\Big|_{t=\tau} + V_{x_t}^{(\tau)i}(t, x_t)\Big|_{t=\tau}\left[a\,x_\tau^{1/2} - b\,x_\tau - \frac{x_\tau}{4\big[c_1 + A(\tau)/2\big]^2}\right] + \frac{\sigma^2 x_\tau^2}{2}\,V_{x_t x_t}^{(\tau)i}(t, x_t)\Big|_{t=\tau}\right\}$$
$$-\ \frac{1}{2}\left\{W_t^{(\tau)}(t, x_t)\Big|_{t=\tau} + W_{x_t}^{(\tau)}(t, x_t)\Big|_{t=\tau}\left[a\,x_\tau^{1/2} - b\,x_\tau - \frac{x_\tau}{4\big[c_1 + A(\tau)/2\big]^2}\right] + \frac{\sigma^2 x_\tau^2}{2}\,W_{x_t x_t}^{(\tau)}(t, x_t)\Big|_{t=\tau}\right\}$$
$$+\ \frac{1}{2}\left\{V_t^{(\tau)j}(t, x_t)\Big|_{t=\tau} + V_{x_t}^{(\tau)j}(t, x_t)\Big|_{t=\tau}\left[a\,x_\tau^{1/2} - b\,x_\tau - \frac{x_\tau}{4\big[c_1 + A(\tau)/2\big]^2}\right] + \frac{\sigma^2 x_\tau^2}{2}\,V_{x_t x_t}^{(\tau)j}(t, x_t)\Big|_{t=\tau}\right\},\tag{13.24}$$

for i ∈ [1, 2] and j ∈ [1, 2], j ≠ i, which yields a subgame consistent solution to the cooperative game Γ_c(x_0, T − t_0), in which the firms agree to divide their cooperative gains according to Proposition 13.1. Using (13.18), we obtain:

$$V_{x_t}^{(\tau)i}(t, x_t)\Big|_{t=\tau} = \frac{1}{2}\,A_i(\tau)\,x_\tau^{-1/2}, \qquad V_{x_t x_t}^{(\tau)i}(t, x_t)\Big|_{t=\tau} = -\frac{1}{4}\,A_i(\tau)\,x_\tau^{-3/2}, \qquad \text{and}$$
$$V_t^{(\tau)i}(t, x_t)\Big|_{t=\tau} = -r\left[A_i(\tau)\,x_\tau^{1/2} + B_i(\tau)\right] + \left[\dot A_i(\tau)\,x_\tau^{1/2} + \dot B_i(\tau)\right], \qquad \text{for } i \in [1, 2],\tag{13.25}$$

where Ȧ_i(τ) and Ḃ_i(τ) are given in Proposition 13.2. Using (13.21), we obtain:

$$W_{x_t}^{(\tau)}(t, x_t)\Big|_{t=\tau} = \frac{1}{2}\,A(\tau)\,x_\tau^{-1/2}, \qquad W_{x_t x_t}^{(\tau)}(t, x_t)\Big|_{t=\tau} = -\frac{1}{4}\,A(\tau)\,x_\tau^{-3/2}, \qquad \text{and}$$
$$W_t^{(\tau)}(t, x_t)\Big|_{t=\tau} = -r\left[A(\tau)\,x_\tau^{1/2} + B(\tau)\right] + \left[\dot A(\tau)\,x_\tau^{1/2} + \dot B(\tau)\right],$$

where Ȧ(τ) and Ḃ(τ) are given in Proposition 13.3. B_i(τ) in (13.24) yields the instantaneous imputation that will be offered to firm i given that the state is x_τ at time τ.

7. Concluding Remarks

The complexity of stochastic differential games generally leads to great difficulties in the derivation of game solutions. The stringent requirement of subgame consistency imposes additional hurdles on the derivation of solutions for cooperative stochastic differential games. In this paper, we consider a duopoly in which the firms agree to form a cartel. In particular, one firm has absolute cost advantage over the other, forcing the other firm to become a dormant firm. A subgame consistent solution based on the Nash bargaining axioms is derived. The analysis can be readily applied to the deterministic version of the duopoly game by setting σ equal to zero.

Acknowledgments. This research was supported by the Research Grant Council of Hong Kong (Grant # HKBU2056/99H) and Hong Kong Baptist University FRG grants (Grant # FRG/0203/II16).

References
Bellman, R. (1957). Dynamic Programming. Princeton University Press, Princeton, NJ.

Clark, C.W. (1976). Mathematical Bioeconomics: The Optimal Management of Renewable Resources. John Wiley, New York.

Dixit, A. and Skeath, S. (1999). Games of Strategy. W.W. Norton & Company, New York.

Fleming, W.H. (1969). Optimal continuous-parameter stochastic control. SIAM Review, 11:470–509.

Haurie, A. (1976). A note on nonzero-sum differential games with bargaining solution. Journal of Optimization Theory and Applications, 18:31–39.

Haurie, A., Krawczyk, J.B., and Roche, M. (1994). Monitoring cooperative equilibria in a stochastic differential game. Journal of Optimization Theory and Applications, 81:79–95.

Isaacs, R. (1965). Differential Games. Wiley, New York.

Jørgensen, S. and Yeung, D.W.K. (1996). Stochastic differential game model of a common property fishery. Journal of Optimization Theory and Applications, 90:391–403.

Jørgensen, S. and Yeung, D.W.K. (1999). Inter- and intragenerational renewable resource extraction. Annals of Operations Research, 88:275–289.

Jørgensen, S. and Zaccour, G. (2001). Time consistency in cooperative differential games. In: G. Zaccour (ed.), Decision and Control in Management Sciences: Essays in Honor of Alain Haurie, pages 349–366, Kluwer Academic Publishers, London.

Nash, J.F. (1953). Two-person cooperative games. Econometrica, 21:128–140.

Petrosyan, L.A. (1977). Stable solutions of differential games with many participants. Viestnik of Leningrad University, 19:46–52.

Petrosyan, L.A. (1997). Agreeable solutions in differential games. International Journal of Mathematics, Game Theory and Algebra, 7:165–177.

Petrosyan, L.A. (2003). Bargaining in dynamic games. In: L. Petrosyan and D. Yeung (eds.), ICM Millennium Lectures on Games, pages 139–143, Springer-Verlag, Berlin, Germany.

Petrosyan, L. and Danilov, N. (1979). Stability of solutions in nonzero-sum differential games with transferable payoffs. Viestnik of Leningrad University, 1:52–59.

Petrosyan, L.A. and Zenkevich, N.A. (1996). Game Theory. World Scientific, Republic of Singapore.

Shapley, L.S. (1953). A value for N-person games. In: H.W. Kuhn and A.W. Tucker (eds.), Contributions to the Theory of Games, vol. 2, pages 307–317, Princeton University Press, Princeton, New Jersey.

Tolwinski, B., Haurie, A., and Leitmann, G. (1986). Cooperative equilibria in differential games. Journal of Mathematical Analysis and Applications, 119:182–202.

Yeung, D.W.K. (2001). Infinite horizon stochastic differential games with branching payoffs. Journal of Optimization Theory and Applications, 111:445–460.

Yeung, D.W.K. (2003). Randomly furcating stochastic differential games. In: L. Petrosyan and D. Yeung (eds.), ICM Millennium Lectures on Games, pages 107–126, Springer-Verlag, Berlin, Germany.

Yeung, D.W.K. and Petrosyan, L.A. (2001). Proportional time-consistent solutions in differential games. International Conference on Logic, Game Theory and Social Choice, pages 254–256, St. Petersburg, Russia.

Yeung, D.W.K. and Petrosyan, L.A. (2004). Subgame consistent cooperative solutions in stochastic differential games. Journal of Optimization Theory and Applications, 120:651–666.