Patrick Cousot and Radhia Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints.

Conference Record of the FOURTH ACM SYMPOSIUM ON PRINCIPLES OF PROGRAMMING LANGUAGES
Papers Presented at the Symposium, Los Angeles, California, January 17-19, 1977
Sponsored by the ASSOCIATION FOR COMPUTING MACHINERY, SPECIAL INTEREST GROUP ON AUTOMATA AND COMPUTABILITY THEORY, SPECIAL INTEREST GROUP ON PROGRAMMING LANGUAGES

ABSTRACT INTERPRETATION: A UNIFIED LATTICE MODEL FOR STATIC ANALYSIS OF PROGRAMS BY CONSTRUCTION OR APPROXIMATION OF FIXPOINTS

Patrick Cousot* and Radhia Cousot*
Laboratoire d'Informatique, U.S.M.G., B.P. 53, 38041 Grenoble cedex, France

* Attaché de Recherche au C.N.R.S., Laboratoire Associé no 7.
** This work was supported by IRIA-SESORI under grants 75-035 and 76-160.

1. Introduction

A program denotes computations in some universe of objects. Abstract interpretation of programs consists in using that denotation to describe computations in another universe of abstract objects, so that the results of abstract execution give some information on the actual computations. An intuitive example (which we borrow from Sintzoff [72]) is the rule of signs. The text -1515 * 17 may be understood to denote computations on the abstract universe {(+), (-), (±)} where the semantics of arithmetic operators is defined by the rule of signs. The abstract execution

    -1515 * 17 ==> -(+) * (+) ==> (-) * (+) ==> (-)

proves that -1515 * 17 is a negative number. Abstract interpretation is concerned with a particular underlying structure of the usual universe of computations (the sign, in our example). It gives a summary of some facets of the actual executions of a program. In general this summary is simple to obtain but inaccurate (e.g. -1515 + 17 ==> -(+) + (+) ==> (-) + (+) ==> (±)). Despite its fundamentally incomplete results, abstract interpretation allows the programmer or the compiler to answer questions which do not need full knowledge of program executions or which tolerate an imprecise answer (e.g. partial correctness proofs of programs ignoring the termination problems, type checking, program optimizations which are not carried out in the absence of certainty about their feasibility, ...).

2. Summary

Section 3 describes the syntax and mathematical semantics of a simple flowchart language, Scott and Strachey [71]. This mathematical semantics is used in section 4 to build a more abstract model of the semantics of programs, in that it ignores the sequencing of control flow. This model is taken to be the most concrete of the abstract interpretations of programs. Section 5 gives the formal definition of the abstract interpretations of a program. Abstract program properties are modeled by a complete semilattice, Birkhoff [61]. Elementary program constructs are locally interpreted by order preserving functions which are used to associate a system of recursive equations with a program. The program global properties are then defined as one of the extreme fixpoints of that system, Tarski [55]. The abstraction process is defined in section 6. It is shown that the program properties obtained by an abstract interpretation of a program are consistent with those obtained by a more refined interpretation of that program. In particular, an abstract interpretation may be shown to be consistent with the formal semantics of the language. Levels of abstraction are formalized by showing that consistent abstract interpretations form a lattice (section 7). Section 8 gives a constructive definition of abstract properties of programs based on constructive definitions of fixpoints. It shows that various classical algorithms such as Kildall [73], Wegbreit [75] compute program properties as limits of finite Kleene [52]'s sequences.
Section 9 introduces finite fixpoint approximation methods to be used when Kleene's sequences are infinite, Cousot [76]. They are shown to be consistent with the abstraction process. Practical examples illustrate the various sections. The conclusion points out that abstract interpretation of programs is a unified approach to apparently unrelated program analysis techniques.

3. Syntax and Semantics of Programs

We will use finite flowcharts as a language independent representation of programs.

3.1 Syntax of a Program

A program is built from a set "Nodes". Each node has successor and predecessor nodes:

    n-succ, n-pred : Nodes → 2^Nodes | (m ∈ n-succ(n)) <==> (n ∈ n-pred(m))

Hereafter, we note |S| the cardinality of a set S. When |S| = 1 so that S = {x} we sometimes use S to denote x.

The node subsets "Entries", "Assignments", "Tests", "Junctions" and "Exits" partition the set Nodes.

- An entry node (n ∈ Entries) has no predecessors and one successor, ((n-pred(n) = ∅) and (|n-succ(n)| = 1)).
- An assignment node (n ∈ Assignments) has one predecessor and one successor, ((|n-pred(n)| = 1) and (|n-succ(n)| = 1)). Let "Ident" and "Expr" be the distinct syntactic categories of identifiers and expressions. An assignment node n assigns the value of the right hand-side expression expr(n) to the left hand-side identifier id(n):
      expr : Assignments → Expr
      id : Assignments → Ident
- A test node (n ∈ Tests) has a predecessor and two successors, ((|n-pred(n)| = 1) and (|n-succ(n)| = 2)). The true and false successor nodes are respectively denoted n-succ-t(n) and n-succ-f(n):
      n-succ-t, n-succ-f : Tests → Nodes | (∀n ∈ Tests, n-succ(n) = {n-succ-t(n), n-succ-f(n)})
  Let "Bexpr" be the syntactic category of boolean expressions. Each test node n contains a boolean expression test(n):
      test : Tests → Bexpr
- A junction node (n ∈ Junctions) has one successor and more than one predecessor, ((|n-succ(n)| = 1) and (|n-pred(n)| > 1)). Immediate predecessor nodes of a junction node are not junction nodes, (∀n ∈ Junctions, ∀m ∈ n-pred(n), not(m ∈ Junctions)).
- An exit node n has one predecessor and no successor, ((|n-pred(n)| = 1) and (n-succ(n) = ∅)).

The set "Arcs" of edges of a program is a subset of Nodes × Nodes defined by

    Arcs = {<n,m> | (n ∈ Nodes) and (m ∈ n-succ(n))}

which may be equivalently defined by

    Arcs = {<n,m> | (m ∈ Nodes) and (n ∈ n-pred(m))}

We will assume that the directed graph <Nodes, Arcs> is connected. We will use the following functions:

    origin, end : Arcs → Nodes | (∀a ∈ Arcs, a = <origin(a), end(a)>)
    a-succ : Nodes → 2^Arcs | a-succ(n) = {<n,m> | m ∈ n-succ(n)}
    a-pred : Nodes → 2^Arcs | a-pred(n) = {<m,n> | m ∈ n-pred(n)}
    a-succ-t : Tests → Arcs | a-succ-t(n) = <n, n-succ-t(n)>
    a-succ-f : Tests → Arcs | a-succ-f(n) = <n, n-succ-f(n)>

Example: [flowchart figure not recovered]

3.2 Semantics of Programs

This section develops a simple "mathematical semantics" of programs, in the style of Scott and Strachey [71].

If S is a set we denote S⁰ the complete lattice obtained from S by adjoining {⊥_S, ⊤_S} to it, and imposing the ordering ⊥_S ≤ x ≤ ⊤_S for all x ∈ S.

The semantic domain "Values" is a complete lattice which is the sum of the lattice Bool = {true, false}⁰ and some other primitive domains.

Environments are used to hold the bindings of identifiers to their values:

    Env = Ident⁰ → Values

We assume that the meaning of an expression expr ∈ Expr in the environment e ∈ Env is given by val⟦expr⟧(e) so that val : Expr → [Env → Values]. In particular the projection val|Bexpr of the function val in domain Bexpr has the functionality val|Bexpr : Bexpr → [Env → Bool].

The state set "States" consists of the set of all information configurations that can occur during computations:

    States = Arcs⁰ × Env

A state (s ∈ States) consists of a control state (cs(s)) and an environment (env(s)), such that ∀s ∈ States, s = <cs(s), env(s)>.
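The flowchart syntax of section 3.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary encoding, the node names and the example program (x := 1; while x < 3 do x := x + 1) are hypothetical and do not come from the paper. The sketch derives n-pred, Arcs, a-succ and a-pred from n-succ and checks the degree constraints attached to each node category.

```python
# Hypothetical flowchart for:  x := 1; while x < 3 do x := x + 1
# node name -> (category, list of successor nodes)
NODES = {
    "entry": ("Entries", ["init"]),
    "init": ("Assignments", ["join"]),
    "join": ("Junctions", ["test"]),
    "test": ("Tests", ["incr", "exit"]),  # [n-succ-t, n-succ-f]
    "incr": ("Assignments", ["join"]),
    "exit": ("Exits", []),
}


def n_succ(n):
    return set(NODES[n][1])


def n_pred(n):
    # m is a predecessor of n exactly when n is a successor of m
    return {m for m in NODES if n in n_succ(m)}


# Arcs = {<n, m> | n in Nodes and m in n-succ(n)}
ARCS = {(n, m) for n in NODES for m in n_succ(n)}


def a_succ(n):
    return {(n, m) for m in n_succ(n)}


def a_pred(n):
    return {(m, n) for m in n_pred(n)}


def well_formed():
    """Check the predecessor/successor constraints of each node category."""
    for n, (kind, _) in NODES.items():
        p, s = len(n_pred(n)), len(n_succ(n))
        ok = {
            "Entries": p == 0 and s == 1,
            "Assignments": p == 1 and s == 1,
            "Tests": p == 1 and s == 2,
            "Junctions": p > 1 and s == 1,
            "Exits": p == 1 and s == 0,
        }[kind]
        # immediate predecessors of a junction node are not junction nodes
        if kind == "Junctions":
            ok = ok and all(NODES[m][0] != "Junctions" for m in n_pred(n))
        if not ok:
            return False
    return True


if __name__ == "__main__":
    print(sorted(ARCS))
    print(well_formed())
```

Storing only n-succ and deriving everything else keeps the equivalence (m ∈ n-succ(n)) <==> (n ∈ n-pred(m)) true by construction.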
We use a continuous conditional function cond(b, e1, e2) equal to ⊥, e1, e2 or ⊤ respectively as the value of b is ⊥, true, false or ⊤. We also use if b then e1 else e2 fi to denote cond(b, e1, e2).

If e ∈ Env, v ∈ Values, x ∈ Ident then e[v/x] = λy. cond(y = x, v, e(y)).

The state transition function defines for each state a next state (we consider deterministic programs):

    n-state : States → States
    n-state(s) =
      let n be end(cs(s)), e be env(s) within
      case n in
        Assignments ==> <a-succ(n), e[val⟦expr(n)⟧(e) / id(n)]>
        Tests ==> cond(val|Bexpr⟦test(n)⟧(e), <a-succ-t(n), e>, <a-succ-f(n), e>)
        Junctions ==> <a-succ(n), e>
        Exits ==> s
      esac

(Each partial function f on a set S is extended to a continuous total function on the corresponding domain S⁰ by f(⊥) = ⊥, f(⊤) = ⊤ and f(x) = ⊥ if the partial function is undefined at x.)

Let ⊥_Env be the bottom function on Env such that (∀x ∈ Ident⁰, ⊥_Env(x) = ⊥_Values). Let I-states be the subset of initial states:

    I-states = {<a-succ(m), ⊥_Env> | m ∈ Entries}

- A "computation sequence" with initial state i_s ∈ I-states is the sequence s_n = n-state^n(i_s) for n = 0, 1, ... where f^0 is the identity function and f^(n+1) = f ∘ f^n.
- The initial to final state transition function n-state^∞ : States → States is the minimal fixpoint of the functional λF.(n-state ∘ F). Therefore
      n-state^∞ = Y_{States→States}(λF.(n-state ∘ F))
  where Y_D(f) denotes the least fixpoint of f : D → D, Tarski [55].

4. Static Semantics of Programs

The constructive or operational semantics of programs defined in section 3 considers the sequence in which states occur during execution. The fundamental remark of Floyd [67] is that to prove static properties of programs it is often sufficient to consider the sets of states associated with each program point.
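As an illustration of the state transition function n-state and of the computation sequence s_n = n-state^n(i_s), the following Python sketch runs a small flowchart and then groups the environments by arc, in the spirit of Floyd's remark. Everything specific here is an assumption made for the example: the program (x := 1; while x < 3 do x := x + 1), the encoding of expressions as Python functions, and the use of an empty dictionary for the bottom environment. The ⊥ and ⊤ cases of cond are not modelled.

```python
# Hypothetical flowchart for:  x := 1; while x < 3 do x := x + 1
# node -> (category, payload, successors); a test lists [true, false] successors.
NODES = {
    "entry": ("Entries", None, ["init"]),
    "init": ("Assignments", ("x", lambda e: 1), ["join"]),
    "join": ("Junctions", None, ["test"]),
    "test": ("Tests", lambda e: e["x"] < 3, ["incr", "exit"]),
    "incr": ("Assignments", ("x", lambda e: e["x"] + 1), ["join"]),
    "exit": ("Exits", None, []),
}


def n_state(state):
    """One transition step; a state is a pair (arc, environment)."""
    arc, env = state
    n = arc[1]  # end(cs(s))
    kind, payload, succ = NODES[n]
    if kind == "Assignments":
        ident, expr = payload
        new_env = dict(env)
        new_env[ident] = expr(env)  # e[val[[expr(n)]](e) / id(n)]
        return ((n, succ[0]), new_env)
    if kind == "Tests":
        target = succ[0] if payload(env) else succ[1]
        return ((n, target), env)
    if kind == "Junctions":
        return ((n, succ[0]), env)
    return state  # Exits: n-state(s) = s


def computation_sequence(initial, max_steps=1000):
    """s_0, s_1, ... until the state no longer changes (or max_steps is hit)."""
    seq = [initial]
    for _ in range(max_steps):
        nxt = n_state(seq[-1])
        if nxt == seq[-1]:
            break
        seq.append(nxt)
    return seq


def contexts(seq):
    """Group the environments met during the run by program point (arc)."""
    ctx = {}
    for arc, env in seq:
        ctx.setdefault(arc, set()).add(tuple(sorted(env.items())))
    return ctx


# Initial state: the arc leaving the entry node, with no identifier bound.
INITIAL = (("entry", "init"), {})
seq = computation_sequence(INITIAL)
ctx = contexts(seq)

if __name__ == "__main__":
    for arc in sorted(ctx):
        print(arc, sorted(ctx[arc]))
```

The run has to execute the program to obtain the sets of environments, which is why such sets cannot be computed this way at compile time for arbitrary programs and inputs.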
Hence, we define the context Cq at some program point q ∈ Arcs of a program P to be the set of all environments which may be associated to q in all the possible computation sequences of P:

    Cq ∈ Contexts = 2^Env
    Cq = {e | (∃n ≥ 0, ∃i_s ∈ I-states | <q,e> = n-state^n(i_s))}

The context vector Cv associates a context to each of the program points of a program:

    Cv ∈ Context-Vectors = Arcs⁰ → Contexts
    Cv = λq.{e | (∃n ≥ 0, ∃i_s ∈ I-states | <q,e> = n-state^n(i_s))}

According to the semantics of programs, the context Cv(r) associated to arc r is related to the contexts Cv(q) at arcs q adjacent to r (end(q) = origin(r)). From the definition of the state transition function we can prove the equation

    Cv(r) = n-context(r, Cv)

where n-context : Arcs⁰ × Context-Vectors → Contexts is defined by

    n-context(r, Cv) =
      case origin(r) in
        Entries ==> {⊥_Env}
        Assignments ∪ Tests ∪ Junctions ==>
          ⋃ {env-on(r)(n-state(<q,e>)) | q ∈ a-pred(origin(r)), e ∈ Cv(q)}
      esac

(We define env-on : Arcs⁰ → [States → 2^Env] to be λr.(λs. cond(r = cs(s), {env(s)}, ∅)).)

Since the equation Cv(r) = n-context(r, Cv) must be valid for each arc, Cv is a solution to the system of "forward" equations

    Cv = F-cont(Cv)

where F-cont : Context-Vectors → Context-Vectors is defined by

    F-cont(Cv) = λr. n-context(r, Cv)

Context-Vectors is a complete lattice with the pointwise union ∪~ such that Cv1 ∪~ Cv2 = λr.(Cv1(r) ∪ Cv2(r)). F-cont is order preserving for the ordering ⊆~ of Context-Vectors which is defined by

    {Cv1 ⊆~ Cv2} <==> {∀r ∈ Arcs, Cv1(r) ⊆ Cv2(r)}

Hence it is known that F-cont has fixpoints, Tarski [55]. However, it is trivial to exhibit examples which show that these fixpoints are not always unique. Fortunately, it can be shown that Cv is included in any solution S to the system of equations X = F-cont(X), (Cv ⊆~ S). Tarski [55] shows that this property uniquely determines Cv as the least fixpoint of F-cont. Thus Cv can be equivalently defined by

    D1: Cv = λq.{e | (∃n ≥ 0, ∃i_s ∈ I-states | <q,e> = n-state^n(i_s))}

or

    D2: Cv = Y_{Context-Vectors}(F-cont)

The concrete context vector Cv is such that for any program point q ∈ Arcs of the program P,

(α) Cv(q) contains at least the environments e which may be associated to q during any execution of P:
    {∃i ≥ 0, ∃i_s ∈ I-states | <q,e> = n-state^i(i_s)} ==> {e ∈ Cv(q)}

(β) Cv(q) contains only the environments e which may be associated to q during an execution of P:
    {e ∈ Cv(q)} ==> {∃i ≥ 0, ∃i_s ∈ I-states | <q,e> = n-state^i(i_s)}

Cv is merely a static summary of the possible executions of the program. However, our definitions D1 or D2 of Cv cannot be utilized at compile time since the computation of Cv consists in fact in running the program (for all the possible input data). In practice compilers may consider states which can never occur during program execution (e.g. some compilers consider that any program may always perform a division by zero although this is not the case for most programs). Hence compilers may use "abstract" contexts satisfying (α) but not necessarily (β), which therefore correctly approximate the concrete contexts we considered until now.

5. Abstract Interpretation of Programs

5.1 Formal Definition

An abstract interpretation I of a program P is a tuple

    I = <A-Cont, ∘, ≤, ⊤, ⊥, Int>

where the set of abstract contexts is a complete ∘-semilattice with ordering ≤, ({x ≤ y} <==> {x ∘ y = y}). This implies that A-Cont has a supremum ⊤. We suppose also A-Cont to have an infimum ⊥. This implies that A-Cont is in fact a complete lattice, but we need only one of the two join and meet operations.

The set of context vectors is defined by A-Cont~ = Arcs⁰ → A-Cont. Whatever (Cv', Cv'') ∈ A-Cont~² may be, we define:

    Cv' ∘~ Cv'' = λr. Cv'(r) ∘ Cv''(r)
    Cv' ≤~ Cv'' = {∀r ∈ Arcs⁰, Cv'(r) ≤ Cv''(r)}
    ⊤~ = λr. ⊤ and ⊥~ = λr. ⊥

<A-Cont~, ∘~, ≤~, ⊤~, ⊥~> can be shown to be a complete lattice.
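A small Python sketch may help to see what the lattice structure required by this definition looks like. It assumes a sign lattice in the spirit of the rule of signs from the introduction, extended with a zero element and a bottom element; this particular lattice, the arc names and all identifiers are choices made for the illustration, not definitions from the paper. The ordering is derived from the join exactly as in {x ≤ y} <==> {x ∘ y = y}, and the operations are lifted pointwise to context vectors.

```python
from functools import reduce

# Assumed abstract contexts: signs of a single variable, plus bottom and top.
BOT, NEG, ZERO, POS, TOP = "bot", "-", "0", "+", "+-"
A_CONT = [BOT, NEG, ZERO, POS, TOP]


def join(x, y):
    """The semilattice operation: least upper bound of two signs."""
    if x == y or y == BOT:
        return x
    if x == BOT:
        return y
    return TOP


def leq(x, y):
    """Ordering derived from the join: x <= y  iff  join(x, y) == y."""
    return join(x, y) == y


def join_all(xs):
    """Join of an arbitrary finite family; the empty family gives bottom."""
    return reduce(join, xs, BOT)


# Hypothetical program points (arcs) of some flowchart.
ARCS = [("entry", "init"), ("init", "join"), ("join", "exit")]


def vec_join(cv1, cv2):
    """Pointwise join of two context vectors."""
    return {r: join(cv1[r], cv2[r]) for r in ARCS}


def vec_leq(cv1, cv2):
    """Pointwise ordering of two context vectors."""
    return all(leq(cv1[r], cv2[r]) for r in ARCS)


VEC_TOP = {r: TOP for r in ARCS}
VEC_BOT = {r: BOT for r in ARCS}

if __name__ == "__main__":
    cv = {ARCS[0]: TOP, ARCS[1]: POS, ARCS[2]: NEG}
    print(vec_leq(VEC_BOT, cv), vec_leq(cv, VEC_TOP))
    print(vec_join(cv, {r: POS for r in ARCS}))
```

Because the lattice is finite, every family of elements has a join, so the completeness requirement holds trivially in this example.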
The function

    Int : Arcs⁰ × A-Cont~ → A-Cont

defines the interpretation of basic instructions. If {C(q) | q ∈ a-pred(n)} is the set of input contexts of node n, then the output context on exit arc r of n (r ∈ a-succ(n)) is equal to Int(r, C). Int is supposed to be order-preserving:

    ∀a ∈ Arcs, ∀(Cv', Cv'') ∈ A-Cont~², {Cv' ≤~ Cv''} ==> {Int(a, Cv') ≤ Int(a, Cv'')}

The local interpretation of elementary program constructs which is defined by Int is used to associate a system of equations with the program. We define

    Int~ : A-Cont~ → A-Cont~ | Int~(Cv) = λr. Int(r, Cv)

It is easy to show that Int~ is order-preserving. Hence it has fixpoints, Tarski [55]. Therefore the context vector resulting from the abstract interpretation I of program P, which defines the global properties of P, may be chosen to be one of the extreme solutions to the system of equations Cv = Int~(Cv).

5.2 Typology of Abstract Interpretations

The restriction that "A-Cont" must be a complete semi-lattice is not drastic since Mac Neille [37] showed that any partly ordered set S can be embedded in a complete lattice so that inclusion is preserved, together with all greatest lower bounds and least upper bounds existing in S. Hence in practice the set of abstract contexts will be a lattice, which can be considered as a join (∪) semi-lattice or a meet (∩) semi-lattice, thus giving rise to two dual abstract interpretations.

It is a pure coincidence that in most examples (see 5.3.2) the ∩ or ∪ operator represents the effect of path converging. The real need for this operator is to define completeness which ensures Int~ to have extreme fixpoints (see 8.4).

The result of an abstract interpretation was defined as a solution to forward (→) equations: the output contexts on exit arcs of node n are defined as a function of the input contexts on entry arcs of node n. One can as well consider a system of backward (←) equations: a context may be related to its successors.
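The forward system Cv = Int~(Cv) of section 5.1 and its two extreme solutions can be illustrated on a sign analysis of a small loop. The sketch below is hedged in several ways: the program (x := 1; while x < 3 do x := x + 1), the sign lattice, the choice of ⊤ on the entry arc and the function names are all assumptions made for the example. The solutions are obtained by plain iteration from ⊥~ and from ⊤~, which terminates here only because the assumed lattice is finite; the paper treats constructive fixpoint definitions in its section 8.

```python
BOT, NEG, ZERO, POS, TOP = "bot", "-", "0", "+", "+-"


def join(x, y):
    if x == y or y == BOT:
        return x
    if x == BOT:
        return y
    return TOP


# Arcs of the hypothetical flowchart  x := 1; while x < 3 do x := x + 1
ARCS = [("entry", "init"), ("init", "join"), ("incr", "join"),
        ("join", "test"), ("test", "incr"), ("test", "exit")]


def a_pred(n):
    return [a for a in ARCS if a[1] == n]


def add_pos(s):
    """Abstract effect of x := x + 1 on the sign of x."""
    return {BOT: BOT, NEG: TOP, ZERO: POS, POS: POS, TOP: TOP}[s]


def interp(r, cv):
    """Int(r, Cv): output context on arc r from the input contexts of origin(r)."""
    n = r[0]
    inputs = [cv[q] for q in a_pred(n)]
    if n == "entry":
        return TOP  # assumption: nothing is known about x initially
    if n == "init":  # x := 1
        return BOT if inputs[0] == BOT else POS
    if n == "incr":  # x := x + 1
        return add_pos(inputs[0])
    # junction and test nodes: merge the inputs, the test condition is ignored
    out = BOT
    for c in inputs:
        out = join(out, c)
    return out


def int_tilde(cv):
    """Int~(Cv) = lambda r. Int(r, Cv)"""
    return {r: interp(r, cv) for r in ARCS}


def iterate(start):
    """Apply Int~ repeatedly until the context vector is stable."""
    cv = dict(start)
    while True:
        nxt = int_tilde(cv)
        if nxt == cv:
            return cv
        cv = nxt


least = iterate({r: BOT for r in ARCS})
greatest = iterate({r: TOP for r in ARCS})

if __name__ == "__main__":
    for r in ARCS:
        print(r, least[r], greatest[r])
```

Both vectors solve the same equations, yet only the one reached from ⊥~ records that x is positive on the exit arc; the one reached from ⊤~ is correct but less informative there.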
Both systems (→, ←) may also be combined.

Finally we usually consider a maximal (↑) or minimal (↓) solution to the system of equations (by agreement, maximal and minimal are related to the ordering ≤ defined by (x ≤ y) <==> (x ∪ y = y) <==> (x ∩ y = x)). However known examples such as Manna and Shamir [75] show that the suitable solution may be somewhere between the extreme ones.

These choices give rise to types of abstract interpretations written as triples (operator, direction, solution), with operator ∩ or ∪, direction → or ←, and solution ↑ or ↓.

Examples: Kildall [73] uses (∩,→,↑), Wegbreit [75] uses (∪,→,↓), Tenenbaum [74] uses both (∪,→,↓) and (∩,←,↑).

5.3 Examples

5.3.1 Static Semantics of Programs

The static semantics of programs we defined in section 4 is an abstract interpretation

    I_SS = <Contexts, ∪, ⊆, Env, ∅, n-context>

where Contexts, ∪, ⊆, Env, ∅, n-context, Context-Vectors, ∪~, ⊆~, F-cont respectively correspond to A-Cont, ∘, ≤, ⊤, ⊥, Int, A-Cont~, ∘~, ≤~, Int~.

5.3.2 Data Flow Analysis

Data flow analysis problems (see references in Ullman [75]) may be formalized as abstract interpretations of programs.

"Available expressions" give a classical example. An expression is available on arc r if, whenever control reaches r, the value of the expression has been previously computed, and since the last computation of the expression, no argument of the expression has had its value changed.

Let Expr_P be the set of expressions occurring in a program P. Abstract contexts will be sets of available expressions, represented by boolean vectors:

    B-vect : Expr_P → {true, false}

B-vect is clearly a complete boolean lattice. The interpretation of basic nodes is defined by:

    avail(r, Bv) =
      let n be origin(r) within
      case n in
        Entries ==> λe. false
        Assignments ∪ Tests ∪ Junctions ==>
          λe.(generated(n)(e) or ((⋀_{p ∈ a-pred(n)} Bv(p)(e)) and transparent(n)(e)))
      esac

(Nothing is available on entry arcs. An expression e is available on arc r (exit of node n) if either the expression e is generated by n or for all predecessors p of n, e is available on p and n does not modify arguments of e.)

The available expressions are determined by the maximal solution (for ordering λe.false ≤ λe.true) of the system of equations

    Bv = avail~(Bv)

The determination of available expressions, back-dominators and intervals requires a forward system of equations. Some global flow problems, notably the live variables and very busy expressions, require propagating information backward through the program graph; they are examples of backward systems of equations.

5.3.3 Remarks

Our formal definition of abstract interpretations has the completeness property since the model ensures the existence of a particular solution to the system of equations and therefore defines at least some global property of the program. It must also have the consistency property, that is define only correct properties of programs.

One can distinguish between syntactic and semantic abstract interpretations of a program. Syntactic interpretations are proved to be correct by reference to the program syntax (e.g. the algorithm for finding available expressions is justified by reasoning on paths of the program graph). By contrast semantic abstract interpretations must be proved to be consistent with the formal semantics of the language (e.g. constant propagation).

6. Consistent Abstract Interpretations

An "abstract" interpretation Ī = <A-Cont, ∘̄, ≤̄, ⊤̄, ⊥̄, Īnt> of a program is consistent with a "concrete" interpretation I = <C-Cont, ∘, ≤, ⊤, ⊥, Int> if the context vector C̄v resulting from Ī is a correct approximation of the context vector Cv resulting from the more refined interpretation I.
This may be rigorously defined by establishing a correspondence (α : abstraction) between concrete and abstract context vectors, and inversely (γ : concretization), and requiring

    6.0 {C̄v ≥̄~ α~(Cv)} and {γ~(C̄v) ≥~ Cv}

In words the abstract context vector must at least contain the concrete one (but not only the concrete one).

If f : D → D' we note D~ = Arcs⁰ → D and D'~ = Arcs⁰ → D' and f~ : D~ → D'~ = λd.(λr. f(d(r))).

We will suppose α and γ to satisfy the following hypotheses:

    6.1 α : C-Cont → A-Cont, γ : A-Cont → C-Cont
    6.2 α and γ are order-preserving
    6.3 ∀x̄ ∈ A-Cont, x̄ = α(γ(x̄))
    6.4 ∀x ∈ C-Cont, x ≤ γ(α(x))

Intuitively, hypothesis 6.2 is necessary because context inclusion (that is property comparison) must be preserved by the abstraction or concretization process. 6.3 requires that concretization introduces no loss of informa...