Abstract Interpretation Based Formal Methods and Future Challenges
Patrick Cousot
École normale supérieure, Département d'informatique, 45 rue d'Ulm, 75230 Paris cedex 05, France
Patrick.Cousot@ens.fr
http://www.di.ens.fr/~cousot/

Abstract. In order to contribute to the solution of the software reliability problem, tools have been designed to analyze statically the run-time behavior of programs. Because the correctness problem is undecidable, some form of approximation is needed. The purpose of abstract interpretation is to formalize this idea of approximation. We illustrate informally the application of abstraction to the semantics of programming languages as well as to static program analysis. The main point is that in order to reason or compute about a complex system, some information must be lost, that is the observation of executions must be either partial or at a high level of abstraction. A few challenges for static program analysis by abstract interpretation are finally briefly discussed. The electronic version of this paper includes a comparison with other formal methods: typing, model-checking and deductive methods.

1 Introductory Motivations

The evolution of hardware by a factor of $10^6$ over the past 25 years has led to the explosion of the size of programs in similar proportions. The scope of application of very large programs (from 1 to 40 million lines) is likely to widen rapidly in the next decade. Such big programs will have to be designed at a reasonable cost and then modified and maintained during their lifetime (which is often over 20 years). The size and efficiency of the programming and maintenance teams in charge of their design and follow-up cannot grow in similar proportions. At a not so uncommon (and often optimistic) rate of one bug per thousand lines, such huge programs might rapidly become hardly manageable, in particular for safety critical systems. Therefore in the next 10 years, the software reliability problem is likely to become a major concern and challenge to modern highly computer-dependent societies.
In the past decade a lot of progress has been made both on thinking/methodological tools (to enhance the human intellectual ability) to cope with complex software systems and mechanical tools (using the computer) to help the programmer to reason about programs.

Mechanical tools for computer aided program verification started by executing or simulating the program in as many environments as possible. However debugging of compiled code or simulation of a model of the source program hardly scale up and often offer a low coverage of dynamic program behavior.

Formal program verification methods attempt to mechanically prove that program execution is correct in all specified environments. This includes deductive methods, model checking, program typing and static program analysis.

Since program verification is undecidable, computer aided program verification methods are all partial or incomplete. The undecidability or complexity is always solved by using some form of approximation. This means that the mechanical tool will sometimes suffer from practical time and space complexity limitations, rely on finiteness hypotheses or provide only semi-algorithms, require user interaction or be able to consider restricted forms of specifications or programs only. The mechanical program verification tools are all quite similar and essentially differ in their choices regarding the approximations which have to be done in order to cope with undecidability or complexity. The purpose of abstract interpretation is to formalize this notion of approximation in a unified framework (10; 17).

2 Abstract Interpretation

Since program verification deals with properties, that is sets (of objects with these properties), abstract interpretation can be formulated in an application independent setting, as a theory for approximating sets and set operations as considered in set (or category) theory, including inductive definitions (25).
A more restricted understanding of abstract interpretation is to view it as a theory of approximation of the behavior of dynamic discrete systems (e.g. the formal semantics of programs or a communication protocol specification). Since such behaviors can be characterized by fixpoints (e.g. corresponding to iteration), an essential part of the theory provides constructive and effective methods for fixpoint approximation and checking by abstraction (19; 23).

2.1 Fixpoint Semantics

The semantics of a programming language defines the semantics of any program written in this language. The semantics of a program provides a formal mathematical model of all possible behaviors of a computer system executing this program in interaction with any possible environment. In the following we will try to explain informally why the semantics of a program can be defined as the solution of a fixpoint equation. Then, in order to compare semantics, we will show that all the semantics of a program can be organized in a hierarchy by abstraction. By observing computations at different levels of abstraction, one can approximate fixpoints hence organize the semantics of a program in a lattice (15).

2.2 Trace Semantics

Our finest grain of observation of program execution, that is the most precise of the semantics that we will consider, is that of a trace semantics (15; 19). An execution of a program for a given specific interaction with its environment is a sequence of states, observed at discrete intervals of time, starting from an initial state, then moving from one state to the next state by executing an atomic program step or transition and either ending in a final regular or erroneous state or non terminating, in which case the trace is infinite (see Fig. 1).

[Fig. 1. Examples of Computation Traces: finite traces running from initial states through intermediate states to final states, and infinite traces, plotted against discrete time.]
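The notion of trace described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transition system (a hypothetical loop that subtracts 2 until reaching 0), the function names `step` and `traces`, and the length bound `max_len` are not from the paper. Since infinite traces cannot be enumerated, the sketch cuts them at `max_len` states and reports the cut prefixes separately.

```python
def step(x):
    """Successor states of a hypothetical program `while x != 0: x = x - 2`.

    A state without successors is a final state.
    """
    return [] if x == 0 else [x - 2]


def traces(initial, step, max_len):
    """Enumerate traces of a transition system, observed for at most max_len states.

    Returns (finite, cut): `finite` holds complete traces ending in a final
    state; `cut` holds prefixes of length max_len that have not terminated
    and therefore stand in for longer (possibly infinite) traces.
    """
    finite, cut = [], []
    stack = [(s,) for s in initial]
    while stack:
        t = stack.pop()
        successors = step(t[-1])
        if not successors:            # final state reached: the trace is finite
            finite.append(t)
        elif len(t) == max_len:       # observation bound reached: give up
            cut.append(t)
        else:                         # extend the trace by one transition
            stack.extend(t + (nxt,) for nxt in successors)
    return finite, cut


finite, cut = traces([4, 3], step, max_len=5)
print(finite)  # traces that end in the final state 0
print(cut)     # prefixes of the non-terminating execution from 3
```

Starting from 4 the execution terminates, whereas from 3 it never reaches 0, which is why only a bounded prefix of that trace can be listed.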
2.3 Least Fixpoint Trace Semantics

Introducing the computational partial ordering (15), we define the trace semantics in fixpoint form (15), as the least solution of an equation of the form $X = F(X)$ where $X$ ranges over sets of finite and infinite traces.

More precisely, let $\mathit{Behaviors}$ be the set of execution traces of a program, possibly starting in any state. We denote by $\mathit{Behaviors}^{+}$ the subset of finite traces and by $\mathit{Behaviors}^{\infty}$ the subset of infinite traces. States are written by their labels $a$, $b$, ..., $z$.

A finite trace $a \to \cdots \to z$ in $\mathit{Behaviors}^{+}$ is either reduced to a final state (in which case there is no possible transition from state $a = z$) or the initial state $a$ is not final and the trace consists of a first computation step $a \to b$ after which, from the intermediate state $b$, the execution goes on with the shorter finite trace $b \to \cdots \to z$ ending in the final state $z$. The finite traces are therefore all well defined by induction on their length.

An infinite trace $a \to \cdots \to \cdots$ in $\mathit{Behaviors}^{\infty}$ starts with a first computation step $a \to b$ after which, from the intermediate state $b$, the execution goes on with an infinite trace $b \to \cdots \to \cdots$ starting from the intermediate state $b$. These remarks and $\mathit{Behaviors} = \mathit{Behaviors}^{+} \cup \mathit{Behaviors}^{\infty}$ lead to the following fixpoint equation:

\[
\begin{aligned}
\mathit{Behaviors} ={} & \{\, a \mid a \text{ is a final state} \,\} \\
{}\cup{} & \{\, a \to b \to \cdots \to z \mid a \to b \text{ is an elementary step} \;\&\; b \to \cdots \to z \in \mathit{Behaviors}^{+} \,\} \\
{}\cup{} & \{\, a \to b \to \cdots \to \cdots \mid a \to b \text{ is an elementary step} \;\&\; b \to \cdots \to \cdots \in \mathit{Behaviors}^{\infty} \,\}
\end{aligned}
\]

In general, the equation has multiple solutions. For example if there is only one non-final state $a$ and only one possible elementary step $a \to a$ then the equation is

\[
\mathit{Behaviors} = \{\, a \to a \to \cdots \to \cdots \mid a \to \cdots \to \cdots \in \mathit{Behaviors} \,\}.
\]

One solution is $\{\, a \to a \to a \to \cdots \to \cdots \,\}$ but another one is the empty set $\emptyset$. Therefore, we choose the least solution for the computational partial ordering (15): "more finite traces & less infinite traces".

2.4 Abstractions & Abstract Domains

A programming language semantics is more or less precise according to the considered observation level of program execution. This intuitive idea can be formalized by abstract interpretation (15) and applied to different languages, including for proof methods.

The theory of abstract interpretation formalizes this notion of approximation and abstraction in a mathematical setting which is independent of particular applications. In particular, abstractions must be provided for all mathematical constructions used in semantic definitions of programming and specification languages (19; 23).

An abstract domain is an abstraction of the concrete semantics in the form of abstract properties (approximating the concrete properties $\mathit{Behaviors}$) and abstract operations (including abstractions of the concrete approximation and computational partial orderings, an approximation of the concrete fixpoint transformer $F$, etc.). Abstract domains for complex approximations are designed by composing abstract domains for simpler components (19), see Sec. 2.10.

If the approximation is coarse enough, the abstraction of a concrete semantics can lead to an abstract semantics which is less precise, but is effectively computable by a computer. By effective computation of the abstract semantics, the computer is able to analyze the behavior of programs and of software before and without executing them (16).
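To make "abstract properties and abstract operations" concrete, here is a minimal sketch of an abstract domain for the sign abstraction discussed in Sec. 2.6. The five-element lattice, the names `leq`, `join`, `alpha` and `add`, and the choice of integer addition as the abstracted operation are assumptions made for illustration, not the paper's definitions.

```python
# Abstract properties: a five-element lattice of signs (illustrative choice).
BOT, NEG, ZERO, POS, TOP = "bot", "neg", "zero", "pos", "top"


def leq(a, b):
    """Approximation ordering: a is at least as precise as b."""
    return a == BOT or b == TOP or a == b


def join(a, b):
    """Least upper bound: the most precise sign covering both a and b."""
    if leq(a, b):
        return b
    if leq(b, a):
        return a
    return TOP


def alpha(values):
    """Abstraction of a finite set of integers by a sign."""
    result = BOT
    for v in values:
        result = join(result, NEG if v < 0 else ZERO if v == 0 else POS)
    return result


def add(a, b):
    """Abstract addition: sound for every pair of integers with signs a and b."""
    if BOT in (a, b):
        return BOT
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    if a == b:          # neg + neg, pos + pos, top + top
        return a
    return TOP          # mixed signs: the sign of the sum is unknown


xs, ys = {1, 5}, {-2, 7}
concrete_sums = {x + y for x in xs for y in ys}
# Soundness on this sample: the abstract result covers the concrete one.
print(leq(alpha(concrete_sums), add(alpha(xs), alpha(ys))))
```

The abstract operation never inspects concrete values, which is what makes it computable even when the concrete sets are infinite; the price is that `add(POS, NEG)` can only answer `TOP`.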
Abstract interpretation algorithms provide approximate methods for computing this abstract semantics. The most important algorithms in abstract interpretation are those providing effective methods for the exact or approximate iterative resolution of fixpoint equations (17).

We will first illustrate formal and effective abstractions for sets. Then we will show that such abstractions can be lifted to functions and finally to fixpoints.

The abstraction idea and its formalization are equally applicable in other areas of computer science such as artificial intelligence e.g. for intelligent planning, proof checking, automated deduction, theorem proving, etc.

2.5 Hierarchy of Abstractions

As shown in Fig. 2 (from (15), where $\mathit{Behaviors}$, denoted $\tau^{\infty}$ for short, is the lattice infimum), all abstractions of a semantics can be organized in a lattice (which is part of the lattice of abstract interpretations introduced in (19)). The approximation partial ordering of this lattice formally corresponds to logical implication, intuitively to the idea that one semantics is more precise than another one.

[Fig. 2. The Hierarchy of Semantics: a lattice, ordered by abstraction, relating the trace semantics, the relational, denotational, natural and transition semantics, the weakest precondition semantics (wp, wlp, gwp) and Hoare logics; the variants shown are angelic, natural, demoniac, deterministic and infinite, related by equivalence and restriction.]

Fig. 3 illustrates the derivation of a relational semantics (denoted $\tau^{\infty}$ in Fig. 2) from a trace semantics (denoted $\tau^{\infty}$ in Fig. 2). The abstraction $\alpha_r$ from trace to relational semantics consists in replacing the finite traces $a \to \cdots \to z$ by the pair $\langle a, z\rangle$ of the initial and final states. The infinite traces $a \to \cdots \to \cdots$ are replaced by the pair $\langle a, \bot\rangle$ where the symbol $\bot$ denotes non-termination. Therefore the abstraction is:

\[
\alpha_r(X) = \{\, \langle a, z\rangle \mid a \to \cdots \to z \in X \,\} \cup \{\, \langle a, \bot\rangle \mid a \to \cdots \to \cdots \in X \,\}.
\]

The denotational semantics (denoted $\tau$ in Fig. 2) is the isomorphic representation of a relation by its right-image:

\[
\alpha_d(R) = \lambda a \cdot \{\, x \mid \langle a, x\rangle \in R \,\}.
\]

The abstraction from relational to big-step operational or natural semantics (denoted $\tau^{+}$ in Fig. 2) simply consists in forgetting everything about non-termination, so

\[
\alpha_n(R) = \{\, \langle a, x\rangle \in R \mid x \neq \bot \,\},
\]

as illustrated in Fig. 3.

A non comparable abstraction consists in collecting the set of initial and final states as well as all transitions $\langle x, y\rangle$ appearing along some finite or infinite trace $a \to \cdots\, x \to y\, \cdots$ of the trace semantics. One gets the small-step operational or transition semantics (denoted $\tau$ in Fig. 2 and also called Kripke structure in modal logic) as illustrated in Fig. 4.

A further abstraction consists in collecting all states appearing along some finite or infinite trace as illustrated in Fig. 5. This is the partial correctness semantics or the static/collecting semantics for proving invariance properties of programs.
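The abstractions of this section can be sketched on a small set of traces. The representation is an assumption made so that infinite traces fit in memory: each trace is a pair `(prefix, cycle)` of tuples, where an empty `cycle` means the trace is finite and a non-empty `cycle` is repeated forever (so only ultimately periodic infinite traces are covered). The names `alpha_t` and `alpha_c` are hypothetical labels for the transition and collecting abstractions, and `alpha_t` keeps only the transitions, not the initial and final states.

```python
BOT = "bot"  # stands for the non-termination symbol


def alpha_r(traces):
    """Trace -> relational semantics: (initial, final) or (initial, BOT)."""
    return {((p + c)[0], BOT if c else p[-1]) for p, c in traces}


def alpha_d(rel):
    """Relational -> denotational semantics: the right-image function."""
    return lambda a: {x for (a2, x) in rel if a2 == a}


def alpha_n(rel):
    """Relational -> natural semantics: forget non-termination."""
    return {(a, x) for (a, x) in rel if x != BOT}


def alpha_t(traces):
    """Trace -> set of transitions appearing along some trace."""
    steps = set()
    for p, c in traces:
        seq = p + c + c[:1]          # close the cycle of an infinite trace
        steps |= set(zip(seq, seq[1:]))
    return steps


def alpha_c(traces):
    """Trace -> set of states appearing along some trace (collecting semantics)."""
    return {s for p, c in traces for s in p + c}


T = {
    (("a", "b", "d"), ()),           # finite trace a -> b -> d
    (("e", "f"), ()),                # finite trace e -> f
    (("k",), ("m", "n")),            # infinite trace k -> m -> n -> m -> n -> ...
}
R = alpha_r(T)
print(R)
print(alpha_n(R))
print(alpha_d(R)("k"))
print(alpha_t(T))
print(alpha_c(T))
```

Each function discards something the previous level still had: `alpha_r` drops intermediate states, `alpha_n` drops non-termination, and `alpha_c` drops even the order in which states occur.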
[Fig. 3. Abstraction from Trace to Relational and Natural Semantics: the trace semantics (initial, intermediate and final states over discrete time, with finite and infinite traces) is abstracted into the relational semantics (pairs of initial and final states, with ⊥ for infinite traces), which is in turn abstracted into the natural semantics.]

[Fig. 4. Transition Semantics: initial states, transitions and final states.]

All abstractions considered in this paper are "from above" so that the abstract semantics describes a superset or logical consequence of the concrete semantics. Abstractions "from below" are dual and consider a subset of the concrete semantics. An example of approximation "from below" is provided by debugging techniques which consider a subset of the possible program executions or by existential checking where one wants to prove the existence of an execution trace prefix fulfilling some given specification. In order to avoid stating dual concepts twice, and as is usual, we only consider approximations "from above", knowing that approximations "from below" can be easily derived by applying the duality principle (as found e.g. in lattice theory).

2.6 Effective Abstractions

Numerical Abstractions

Assume that a program has two integer variables X and Y. The trace semantics of the program (Fig. 1) can be abstracted in the static/collecting semantics (Fig. 5). A further abstraction consists in forgetting in a state all but the values x and y of variables X and Y. In this way the trace semantics is abstracted to a set of points (pairs of values), as illustrated in the plane by Fig. 6(a).

[Fig. 5. Static / Collecting / Partial Correctness Semantics: initial states, reachable states and final states.]
[Fig. 6. Non-relational Abstractions: (a) [in]finite set of points $\{\ldots, \langle 5, 7\rangle, \ldots, \langle 13, 21\rangle, \ldots\}$; (b) sign abstraction, $x \geq 0$, $y \geq 0$; (c) interval abstraction, $x \in [3, 27]$, $y \in [4, 32]$; (d) simple congruence abstraction, $x = 5 \bmod 8$, $y = 7 \bmod 9$.]

We now illustrate informally a number of effective abstractions of an [in]finite set of points.

Non-relational Abstractions

The non-relational, attribute independent or cartesian abstractions (19, example 6.2.0.2) consist in ignoring the possible relationships between the values of the X and Y variables. So a set of pairs is approximated through projection by a pair of sets. Each such set may still be infinite and in general not exactly computer representable. Further abstractions are therefore needed.

The sign abstraction (19) illustrated in Fig. 6(b) consists in replacing integers by their sign thus ignoring their absolute value. The interval abstraction (16) illustrated in Fig. 6(c) is more precise since it approximates a set of integers by its minimal and maximal values (including $-\infty$ and $+\infty$ as well as the empty set if necessary).

The congruence abstraction (38) (generalizing the parity abstraction (19)) is not comparable, as illustrated in Fig. 6(d).

[Fig. 7. Relational Abstractions: (a) octagonal abstraction, $3 \leq x \leq 7$, $x + y \leq 8$, $4 \leq y \leq 5$, $x - y \leq 9$; (b) polyhedral abstraction, $7x + 3y \leq 5$, $2x + 7y \geq 0$; (c) relational congruence abstraction, $3x + 5y = 8 \bmod 7$, $2x - 9y = 3 \bmod 5$; (d) trapezoidal congruence abstraction, $3x + 7y \in [2, 7] \bmod 8$, $2x - 5y \in [0, 9] \bmod 4$.]

Relational Abstractions

Relational abstractions are more precise than non-relational ones in that some of the relationships between values of the program states are preserved by the abstraction.

For example the polyhedral abstraction (31) illustrated in Fig. 7(b) approximates a set of points by its convex hull. Only non-linear relationships between the values of the program variables are forgotten.

The use of an octagonal abstraction illustrated in Fig. 7(a) is less precise since only some shapes of polyhedra are retained or equivalently only linear relations between any two variables are considered with coefficients +1 or −1 (of the form $\pm x \pm y \leq c$ where $c$ is an integer constant).

A non comparable relational abstraction is the linear congruence abstraction (39) illustrated in Fig. 7(c).

A combination of non-relational dense approximations (like intervals) and relational sparse approximations (like congruences) is the trapezoidal linear congruence abstraction (48) as illustrated in Fig. 7(d).

Symbolic Abstractions

Most structures manipulated by programs are symbolic structures such as control structures (call graphs), data structures (search trees, pointers (33; 34; 54; 58)), communication structures (distributed & mobile programs (36; 41; 57)), etc. It is very difficult to find compact and expressive abstractions of such sets of objects (sets of languages, sets of automata, sets of trees or graphs, etc.). For example Büchi automata or automata on trees are very expressive but algorithmically expensive. A compromise between semantic expressivity and algorithmic efficiency was recently introduced by (49) using Binary Decision Graphs and Tree Schemata to abstract infinite sets of infinite trees.

[Fig. 8. Is 1/(X+1−Y) well-defined? (a) yes; (b) unknown; (c) yes.]

2.7 Information Loss

Any abstraction introduces some loss of information. For example the abstraction of the trace semantics into relational or denotational semantics loses all information on the computation cost since all intermediate steps in the execution are removed.

All answers given by the abstract semantics are always correct with respect to the concrete semantics. For example, if termination is proved using the relational semantics then there is no execution abstracted to $\langle a, \bot\rangle$, so there is no infinite trace $a \to b \to \cdots \to \cdots$ in the trace semantics, whence non-termination is impossible when starting execution in initial state $a$.

However, because of the information loss, not all questions can be definitely answered with the abstract semantics. For example, the natural semantics cannot answer questions about termination as can be done with the relational or denotational semantics. These semantics cannot answer questions about concrete computation costs.

The more concrete the semantics, the more questions it can answer. The more abstract semantics are simpler. Non comparable abstract semantics (such as intervals and congruences) answer non comparable sets of questions.
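The remark that intervals and congruences answer non comparable sets of questions can be checked on a small example. The sketch below abstracts a finite, non-empty set of integers by an interval and by a simple congruence; the sample set and the function names are assumptions chosen for illustration.

```python
from functools import reduce
from math import gcd


def alpha_interval(points):
    """Interval abstraction: keep only the minimal and maximal values."""
    return (min(points), max(points))


def alpha_congruence(points):
    """Simple congruence abstraction: the smallest class r mod m covering points.

    m == 0 encodes a single value (the class contains only r).
    """
    base = min(points)
    m = reduce(gcd, (p - base for p in points), 0)
    return (base % m if m else base, m)


def may_be_interval(iv, v):
    """Can a variable abstracted by the interval iv take the value v?"""
    lo, hi = iv
    return lo <= v <= hi


def may_be_congruence(cg, v):
    """Can a variable abstracted by the congruence cg take the value v?"""
    r, m = cg
    return v == r if m == 0 else v % m == r


S = {5, 13, 21, 29}
iv, cg = alpha_interval(S), alpha_congruence(S)
for v in (17, 37):
    print(v, may_be_interval(iv, v), may_be_congruence(cg, v))
```

A `False` answer is a proof that the value is impossible, while `True` only means "possibly". For 17 only the congruence can rule the value out, and for 37 only the interval can, so neither abstraction subsumes the other.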
To illustrate the loss of information, let us consider the problem of deciding whether the operation 1/(X+1−Y) appearing in a program is always well defined at run-time. The answer can certainly be given by the concrete semantics since it has no point on the line $x + 1 - y = 0$, as shown in Fig. 8(a).

In practice the concrete semantics is not computable so it is hardly usable in a useful effective tool. The dense abstractions that we have considered are too approximate, as is illustrated in Fig. 8(b).

However the answer is positive when using the relational congruence abstraction, as shown in Fig. 8(c).

2.8 Function Abstraction

We now show how the abstraction of complex mathematical objects used in the semantics of programming or specification languages can be defined by composing abstractions of simpler mathematical structures.

For example knowing abstractions of the parameter and result of a monotonic function $F$ on sets, the function $F$ can be abstracted into an abstract function $F^{\sharp}$ as illustrated in Fig. 9 (19). Mathematically, $F^{\sharp}$ takes its parameter $x$ in the abstract domain. Let $\gamma(x)$ be the corresponding concrete set ($\gamma$ is the adjoined, intuitively the inverse of the abstraction function $\alpha$). The function $F$ can be applied to get the concrete result $F \circ \gamma(x)$. The abstraction function $\alpha$ can then be applied to approximate the result: $F^{\sharp}(x) = \alpha \circ F \circ \gamma(x)$.

[Fig. 9. Function Abstraction: $F^{\sharp} = \alpha \circ F \circ \gamma$, relating the abstract domain to the concrete domain.]

In general, neither $F$, $\alpha$ nor $\gamma$ are computable even though the abstraction $\alpha$ may be effective. So we have got a formal specification of the abstract function $F^{\sharp}$ and an algorithm has to be found for an effective implementation.

2.9 Fixpoint Abstraction

A fixpoint of a function $F$ can often be obtained as the limit of the iterations of $F$ from a given initial value $\bot$. In this case the abstraction of the fixpoint can often be obtained as the abstract limit of the iteration of the abstraction $F^{\sharp}$ of $F$ starting from the abstraction $\alpha(\bot)$ of the initial value $\bot$. The basic result is that the concretization of the abstract fixpoint is related to the concrete fixpoint by the approximation relation expressing the soundness of the abstraction (19). This is illustrated in Fig. 10.

Often states have some finite component (e.g. a program counter) which can be used to partition the fixpoint equation into a system of equations by projection along that component. Then chaotic (18) and asynchronous iteration strategies (10) can be used to solve the equations iteratively. Various efficient iteration strategies have been studied, including ones taking particular properties of abstractions into account and others to speed up the convergence of the iterates (24).
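The soundness relation between concrete and abstract fixpoints can be observed on a toy example. The sketch assumes a hypothetical loop `x = 0; while x < 100: x = x + 2` and studies the values of `x` at the loop head; the concrete transformer `F`, its hand-written interval counterpart `F_sharp`, and the helper names are illustrative assumptions. The concrete iteration is only feasible here because the example is finite.

```python
def F(xs):
    """Concrete transformer on sets of values of x at the loop head."""
    return {0} | {x + 2 for x in xs if x < 100}


def F_sharp(iv):
    """Interval transformer over-approximating F; None is the empty interval."""
    result = (0, 0)                                   # contribution of x = 0
    if iv is not None and iv[0] < 100:
        lo, hi = iv[0], min(iv[1], 99)                # restrict to x < 100
        result = (min(0, lo + 2), max(0, hi + 2))     # add 2, join with [0, 0]
    return result


def lfp(f, bottom):
    """Iterate f from bottom until the iterates stabilize."""
    x = bottom
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt


def gamma(iv):
    """Concretization of an interval as a set of integers."""
    return set() if iv is None else set(range(iv[0], iv[1] + 1))


concrete = lfp(F, set())          # the even numbers from 0 to 100
abstract = lfp(F_sharp, None)     # an interval covering them
print(abstract)
print(concrete <= gamma(abstract))
```

The interval obtained is strictly larger than the concrete fixpoint (it contains odd values and 101), which shows the information loss, yet every concrete value is covered, which is the soundness property. The abstract iteration here needs about as many steps as the loop itself, which is the kind of cost that faster iteration strategies aim to reduce.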
[Fig. 10. Fixpoint Abstraction: the iterates of $F$ from $\bot$ in the concrete domain and of its abstraction in the abstract domain, related by $\alpha$; the concrete fixpoint lfp $F$ is related to the concretization $\gamma$ of the abstract fixpoint by the approximation relation.]

2.10 Composing Abstractions

Abstractions hence abstract interpreters for static program analysis can be designed compositionally by stepwise abstraction, combination or refinement (37; 13).

An example of stepwise abstraction is the functional abstraction of Sec. 2.8. The abstraction of a function is parameterized by abstractions for the function parameters and the function result which can be chosen later in the modular design of the abstract interpreter.

An example of abstraction combination is the reduced product of two abstractions (19) which is the most abstract abstraction more precise than these two abstractions or the reduced cardinal power (19) generalizing case analysis. Such combination of abstract domains can be implemented as parameterized modules in static analyzer generators (e.g. (46)) so as to partially automate the design of expressive analyses from simpler ones.

An example of refinement is the disjunctive completion (19) which completes an abstract domain by adding concrete disjunctions missing in the abstract domain. Another example of abstract domain refinement is the complementation (8) adding concrete negations missing in the abstract domain.

2.11 Sound and Complete Abstractions

Abstract interpretation theory has mainly been concerned with the soundness of the abstract semantics/interpreter, relative to which questions can be answered correctly despite the loss of information (17). Soundness is essential in practice and leads to a formal design method (19). However completeness, relative to the formalization of the loss of information in a controlled way so as to answer a given set of questions, has also been intensively studied (19; 37), including in the context of model checking (14).

In practice complete abstractions, including a most abstract one, always exist to check that a given program semantics satisfies a given specification.
Moreover any given abstraction can be refined to a complete one. Nevertheless this approach has severe practical limitations since, in general, the design of such complete abstractions or the refinement of a given one is logically equivalent to the design of an inductive argument for the formal proof that the given program satisfies the given specification, while the soundness proof of this abstraction logically amounts to checking the inductive verification conditions or proof obligations of this formal proof (14). Such proofs can hardly be fully automated hence human interaction is unavoidable. Moreover the whole process has to be repeated each time the program or specification is modified.

Instead of considering such strong specifications for a given specific program, the objective of static program analysis is to consider (often predefined) specifications and all possible programs. The practical problem in static program analysis is therefore to design useful abstractions which are computable for all programs and expressive enough to yield interesting information for most programs.

3 Static Program Analysis

Static program analysis is the automatic static determination of dynamic run-time properties of programs.

3.1 Foundational Ideas of Static Program Analysis

Given a program and a specification, a program analyzer will check if the program semantics satisfies the specification (Fig. 11). In case of failure, the analyzer will provide hints to understand the origin of errors (e.g. by a backward analysis providing necessary conditions to be satisfied by counter-examples).

[Fig. 11. Program Analysis: a program analyzer takes a specification and a program and produces a diagnosis.]

The principle of the analysis is to compute an approximate semantics of the program in order to check a given specification. Abstract interpretation is used to derive, from a standard semantics, the approximate and computable abstract semantics.
The derivation can often be done by composing standard abstractions to fit a particular kind of information which has to be discovered about program execution. This derivation is itself not (fully) mechanizable but static analyzer generators such as PAG (47) and others can provide generic abstractions to be composed with problem specific ones.

In practice, the program analyzer contains a generator reading the program text and producing equations or constraints whose solution is a computer representation of the program abstract semantics. A solver is then used to solve these abstract equations/constraints. A popular resolution method is to use iteration. Of the numerical abstractions considered in Sec. 2.6, only the sign and simple congruence abstractions ensure the finite convergence of the iterates. If the limit of the iterates does not exist (which may be the case e.g. for the polyhedral abstraction) or it is reached after infinitely many iteration steps (e.g. interval and octagonal abstractions), the convergence may have to be ensured and/or accelerated using a widening to overestimate the solution in finitely many steps followed by a narrowing to improve it (10; 17; 24).

In abstract compilation, the generator and solver are directly compiled into a program which directly yields the approximate solution.

This solution is an approximation of the abstract semantics which is then used by a diagnoser to check the specification. Because of the loss of information, the diagnosis is always of the form "yes", "no", "unknown" or "irrelevant" (e.g. a safety specification for unreachable code). The general structure of program analyzers is illustrated in Fig. 12. Besides diagnosis, static program analysis is also used for other applications in which case the diagnoser is replaced by an optimiser (for compile-time optimization), a program transformer (for partial evaluation (44)), etc.

[Fig. 12. Principle of Program Analysis: from the specification and the program, a generator produces a system of fixpoint equations/constraints; a solver computes an (approximate) solution; a diagnoser produces the diagnosis. Generator, solver and diagnoser together form the program analyzer.]

3.2 Shortcomings of Static Program Analysis

Static program analysis can be used for large programs (e.g. 220,000 lines of C) without user interaction. The abstractions are chosen to be of wide scope without specialization to a particular program. Abstract algebras can be designed and implemented into libraries which are reusable for different programming languages. The objective is to discover invariants that are likely to appear in many programs so that the abstraction must be widely reusable for the program analyzer to be of economic interest.

The drawback of this general scope is that the considered abstract specifications and properties are often simple, mainly concerning elementary safety properties such as absence of run-time errors. For example non-linear abstractions of sets of points are very difficult and very few mathematical results are of practical interest and directly applicable to program analysis. Checking termination and similar liveness properties is trivial with finite state systems, at least from a theoretical if not algorithmic point of view (e.g. finding loops in finite graphs). The same problem is much more difficult for infinite state systems because of fairness (49) or of potentially infinite data structures (as considered e.g. in partial evaluation) which do not amount to finite cycles so that termination or inevitability proofs require the discovery of variant functions on well-founded sets which is very difficult in full generality.

Even when considering restricted simple abstract properties, the semantics of real-life programming languages is very complex (recursion, concurrency, modularity, etc.) whence so is the corresponding abstract interpreter.
The abstraction of this semantics, hence the design of the analyzer is mostly manual (and beyond the ability of casual programmers or theorem provers) whence costly. The con sidered abstractions must have a large scope of application and must be easily reusable to be of economic interest. From a user point of view, the results of the analysis have to be presented in a simple way (for example by pointing at errors only or by providing abstract counterexamples, or less frequently concrete ones). Experience shows that the cases of uncertainty represent 5 to 10 % of the possible cases. They must be handled with other empirical or formal methods (including more reﬁned abstract interpretations). 3.3 Applications of Static Program Analysis Among the numerous applications of static program analysis, let us cite data ﬂow analysis (53; 28); program optimization and transformation (including par tial evaluation and program specialization (44) and data dependence analy sis for the parallelisation of sequential languages); setbased analysis (27); type inference (12) (including undecidable systems and soft typing); veriﬁcation of reactive (40; 43), realtime and (linear) hybrid systems including state space re duction; cryptographic protocol analysis; abstract modelchecking of inﬁnite sys tems (28); abstract debugging, testing and veriﬁcation ; cache and pipeline behav ior prediction (35); probabilistic analysis (50); communication topology analysis for mobile/distributed code (36; 41; 57); automatic diﬀerentiation of numeri cal programs; abstract simulation of temporal speciﬁcations; Semantic tattoo ing/watermarking of software (30); etc. Static program analysis has been intensively studied for a variety of pro gramming languages including procedural languages (e.g. for alias and pointer analysis (33; 34; 54; 58)), functional languages (e.g. 
for binding time (56), strict ness (4; 51) and comportment analysis (26), exception analysis (59)), parallel functional languages, data parallel languages, logic languages including Prolog (1; 22; 32) (e.g. for groundness (9), sharing (7), freeness (5) and their combina tions (6), parallelizatiion (3), etc.), database programming languages, concurrent logic languages, functional logic languages, constraint logic languages, concur rent constraint logic languages, speciﬁcation languages, synchronous languages, procedural/functional concurrent/parallel languages (21), communicating and distributed languages (20) and more recently objectoriented languages (2; 55). Abstract interpretation based static program analyses have been used for the static analysis of the embedded ADA software of the Ariane 5 launcher1 and the ARD2 (45). The static program analyser aims at the automatic detection of
the definiteness, potentiality, impossibility or inaccessibility of runtime errors such as scalar and floating-point overflows, array index errors, divisions by zero and related arithmetic exceptions, uninitialized variables, data races on shared data structures, etc. The analyzer was able to automatically discover the Ariane 501 flight error. The static analysis of embedded safety critical software (such as avionic software (52)) is very promising (29).

¹ Flight software (60,000 lines of Ada code) and Inertial Measurement Unit (30,000 lines of Ada code).
² Atmospheric Reentry Demonstrator.

3.4 Industrialization of Static Analysis by Abstract Interpretation

The impressive results obtained by the static analysis of real-life embedded critical software (45; 52) are quite promising for the industrialization of abstract interpretation. This is the explicit objective of AbsInt Angewandte Informatik GmbH, created in Germany by R. Wilhelm and C. Ferdinand in 1998, which commercializes the program analyzer generator PAG and an application to determine the worst-case execution time for modern computer architectures with memory caches, pipelines, etc. (35). Polyspace Technologies was created in France by A. Deutsch and D. Pilaud in 1999 to develop and commercialize Ada and C program analyzers. Other companies like Connected Components Corporation, created in the U.S.A. by W.L. Harrison in 1993, use abstract interpretation internally, e.g. for compiler design (42).

4 Grand Challenge for the Next Decade

We believe that in the next decade the software industry will certainly have to face its responsibility imposed by a computer-dependent society, in particular for safety critical systems. Consequently, software reliability³ will be a grand challenge for computer science and practice.
The grand challenge for formal methods, in particular abstract interpretation based formal tools, is both the large scale industrialization and the intensification of the fundamental research effort.

General-purpose, expressive and cost-effective abstractions have to be developed e.g. to handle floating point numbers, data dependences (e.g. for parallelization), liveness properties with fairness (to extend finite-state model-checking to software), timing properties for embedded software, probabilistic properties, etc. Present-day tools will have to be enhanced to handle higher-order compositional modular analyses and to cope with new programming paradigms involving complex data and control concepts (such as objects, concurrent threads, distributed/mobile programming, etc.), to automatically combine and locally refine abstractions, in particular to cope with “unknown” answers, and to interact nicely with users and other formal or informal methods.
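As an informal illustration of what an "unknown" answer and its local refinement can look like, the following minimal Python sketch judges a division `1 / d` with `d = x - 10`, first under a coarse sign abstraction and then, only if the verdict is inconclusive, under a finer interval abstraction. It is not taken from any of the analyzers cited here: the names `Interval`, `to_sign`, `division_verdict`, `analyze` and `analyze_with_refinement` are hypothetical, integer-valued variables are assumed, and the four verdicts simply mirror the definiteness, potentiality, impossibility and inaccessibility of errors mentioned in Sect. 3.3.

```python
from dataclasses import dataclass
import math

INF = math.inf


@dataclass(frozen=True)
class Interval:
    """Abstract value: the set of integers between lo and hi (inclusive)."""
    lo: float
    hi: float  # lo > hi encodes the empty set (unreachable code)

    def is_empty(self):
        return self.lo > self.hi

    def __sub__(self, other):
        # Sound abstract subtraction: every concrete difference is covered.
        if self.is_empty() or other.is_empty():
            return EMPTY
        return Interval(self.lo - other.hi, self.hi - other.lo)


EMPTY = Interval(INF, -INF)


def identity(iv):
    """Finer abstraction: keep the interval as it is."""
    return iv


def to_sign(iv):
    """Coarser abstraction: keep only the sign of the values."""
    if iv.is_empty():
        return EMPTY
    if iv.lo > 0:
        return Interval(1, INF)       # strictly positive
    if iv.hi < 0:
        return Interval(-INF, -1)     # strictly negative
    if iv.lo == 0 and iv.hi == 0:
        return Interval(0, 0)         # exactly zero
    return Interval(-INF, INF)        # sign unknown


def division_verdict(divisor):
    """Classify the division-by-zero error for an abstract divisor."""
    if divisor.is_empty():
        return "inaccessible"         # the division is never executed
    if divisor.lo == 0 and divisor.hi == 0:
        return "definite"             # every execution divides by zero
    if divisor.lo <= 0 <= divisor.hi:
        return "potential"            # the "unknown" answer
    return "impossible"               # no execution divides by zero


def analyze(x, abstraction):
    """Abstractly evaluate d = x - 10 and judge 1 / d."""
    x = abstraction(x)
    d = abstraction(x - abstraction(Interval(10, 10)))
    return division_verdict(d)


def analyze_with_refinement(x):
    """Try the cheap abstraction first; refine only on an unknown answer."""
    verdict = analyze(x, to_sign)
    if verdict == "potential":
        verdict = analyze(x, identity)
    return verdict


if __name__ == "__main__":
    x = Interval(1, 5)
    print(analyze(x, to_sign))            # coarse verdict
    print(analyze_with_refinement(x))     # verdict after refinement
```

With `x` in [1, 5], the sign abstraction only knows that a positive number is subtracted from a positive one, so it cannot exclude zero. The interval abstraction computes [-9, -5] for `d` and rules the error out. Refinement does not always succeed: for `x` in [5, 15] the answer remains "potential" even with intervals, which is the kind of residual uncertainty that has to be handled by further refinement or by other methods.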
The most challenging objective might be to integrate formal analysis by abstract interpretation in the full software development process, from the initial specifications to the ultimate program development.

³ Other suggestions were “trustworthiness” (C. Jones) and “robustness” (R. Leino).

Acknowledgements

I thank Radhia Cousot and Reinhard Wilhelm for their comments on a preliminary version of this paper. This work was supported by the DAEDALUS (29) and TUAMOTU (30) projects.

References

[1] R. Barbuti, R. Giacobazzi, and G. Levi. A general framework for semantics-based bottom-up abstract interpretation of logic programs. TOPLAS, 15(1):133–181, Jan. 1993.
[2] B. Blanchet. Escape analysis for object-oriented languages: Application to Java. In Proc. ACM SIGPLAN Conf. OOPSLA ’99. ACM SIGPLAN Not. 34(10), pages 20–34, Denver, CO, US, 1–5 Nov. 1999.
[3] F. Bueno, M.J. García de la Banda, and M.V. Hermenegildo. Effectiveness of abstract interpretation in automatic parallelization: A case study in logic programming. TOPLAS, 21(2):189–239, Mar. 1999.
[4] G.L. Burn, C.L. Hankin, and S. Abramsky. Strictness analysis of higher-order functions. Sci. Comput. Programming, 7:249–278, Nov. 1986.
[5] M. Codish, D. Dams, G. Filé, and M. Bruynooghe. Freeness analysis for logic programs – and correctness? In D.S. Warren, editor, Proc. 10th ICLP ’93, Budapest, HU, pages 116–131. MIT Press, 21–25 June 1993.
[6] M. Codish, H. Søndergaard, and P.J. Stuckey. Sharing and groundness dependencies in logic programs. TOPLAS, 21(5):948–976, Sep. 1999.
[7] A. Cortesi and G. Filé. Sharing is optimal. J. Logic Programming, 38(3):371–386, 1999.
[8] A. Cortesi, G. Filé, R. Giacobazzi, C. Palamidessi, and F. Ranzato. Complementation in abstract interpretation. TOPLAS, 19(1):7–47, Jan. 1997.
[9] A. Cortesi, G. Filé, and W.H. Winsborough. Optimal groundness analysis using propositional logic. J. Logic Programming, 27(2):137–167, 1996.
[10] P. Cousot.
Méthodes itératives de construction et d’approximation de points fixes d’opérateurs monotones sur un treillis, analyse sémantique de programmes. Thèse d’État ès sciences mathématiques, Université scientifique et médicale de Grenoble, Grenoble, FR, 21 Mar. 1978.
[11] P. Cousot. Constructive design of a hierarchy of semantics of a transition system by abstract interpretation. ENTCS, 6, 1997. http://www.elsevier.nl/locate/entcs/volume6.html, 25 pages.
[12] P. Cousot. Types as abstract interpretations, invited paper. In 24th POPL, pages 316–331, Paris, FR, Jan. 1997. ACM Press.
[13] P. Cousot. The calculational design of a generic abstract interpreter. In M. Broy and R. Steinbrüggen, editors, Calculational System Design, volume 173, pages 421–505. NATO Science Series, Series F: Computer and Systems Sciences. IOS Press, 1999.
[14] P. Cousot. Partial completeness of abstract fixpoint checking, invited paper. In B.Y. Choueiry and T. Walsh, editors, Proc. 4th Int. Symp. SARA ’2000, Horseshoe Bay, TX, US, LNAI 1864, pages 1–25. Springer-Verlag, 26–29 Jul. 2000.
[15] P. Cousot. Constructive design of a hierarchy of semantics of a transition system by abstract interpretation. Theoret. Comput. Sci., To appear (Preliminary version in (11)).
[16] P. Cousot and R. Cousot. Static determination of dynamic properties of programs. In Proc. 2nd Int. Symp. on Programming, pages 106–130. Dunod, 1976.
[17] P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In 4th POPL, pages 238–252, Los Angeles, CA, 1977. ACM Press.
[18] P. Cousot and R. Cousot. Automatic synthesis of optimal invariant assertions: mathematical foundations. In ACM Symposium on Artificial Intelligence & Programming Languages, Rochester, NY, ACM SIGPLAN Not. 12(8):1–12, 1977.
[19] P. Cousot and R. Cousot. Systematic design of program analysis frameworks.
In 6th POPL, pages 269–282, San Antonio, TX, 1979. ACM Press.
[20] P. Cousot and R. Cousot. Semantic analysis of communicating sequential processes. In J.W. de Bakker and J. van Leeuwen, editors, 7th ICALP, LNCS 85, pages 119–133. Springer-Verlag, Jul. 1980.
[21] P. Cousot and R. Cousot. Invariance proof methods and analysis techniques for parallel programs. In A.W. Biermann, G. Guiho, and Y. Kodratoff, editors, Automatic Program Construction Techniques, chapter 12, pages 243–271. Macmillan, 1984.
[22] P. Cousot and R. Cousot. Abstract interpretation and application to logic programs. J. Logic Programming, 13(2–3):103–179, 1992. (The editor of
J. Logic Programming has mistakenly published the unreadable galley proof. For a correct version of this paper, see http://www.di.ens.fr/~cousot).
[23] P. Cousot and R. Cousot. Abstract interpretation frameworks. J. Logic and Comp., 2(4):511–547, Aug. 1992.
[24] P. Cousot and R. Cousot. Comparing the Galois connection and widening/narrowing approaches to abstract interpretation, invited paper. In M. Bruynooghe and M. Wirsing, editors, Proc. 4th Int. Symp. PLILP ’92, Leuven, BE, 26–28 Aug. 1992, LNCS 631, pages 269–295. Springer-Verlag, 1992.
[25] P. Cousot and R. Cousot. Inductive definitions, semantics and abstract interpretation. In 19th POPL, pages 83–94, Albuquerque, NM, 1992. ACM Press.
[26] P. Cousot and R. Cousot. Higher-order abstract interpretation (and application to comportment analysis generalizing strictness, termination, projection and PER analysis of functional languages), invited paper. In Proc. 1994 ICCL, pages 95–112, Toulouse, FR, 16–19 May 1994. IEEE Comp. Soc. Press.
[27] P. Cousot and R. Cousot. Formal language, grammar and set-constraint-based program analysis by abstract interpretation. In Proc. 7th FPCA, pages 170–181, La Jolla, CA, 25–28 June 1995. ACM Press.
[28] P. Cousot and R. Cousot. Temporal abstract interpretation. In 27th POPL, pages 12–25, Boston, MA, Jan. 2000. ACM Press.
[29] P. Cousot, R. Cousot, A. Deutsch, C. Ferdinand, É. Goubault, N. Jones, D. Pilaud, F. Randimbivololona, M. Sagiv, H. Seidel, and R. Wilhelm. DAEDALUS: Validation of critical software by static analysis and abstract testing. Project IST-1999-20527 of the European 5th Framework Programme (FP5), Oct. 2000 – Oct. 2002.
[30] P. Cousot, R. Cousot, and M. Riguidel. TUAMOTU: Tatouage électronique sémantique de code mobile Java. Project RNRT 1999 n° 95, Oct. 1999 – Oct. 2001.
[31] P. Cousot and N. Halbwachs. Automatic discovery of linear restraints among variables of a program.
In 5th POPL, pages 84–97, Tucson, AZ, 1978. ACM Press.
[32] S.K. Debray. Formal bases for dataflow analysis of logic programs. In G. Levi, editor, Advances in Logic Programming Theory, Int. Schools for Computer Scientists, section 3, pages 115–182. Clarendon Press, 1994.
[33] A. Deutsch. Semantic models and abstract interpretation techniques for inductive data structures and pointers, invited paper. In Proc. PEPM ’95, pages 226–229, La Jolla, CA, 21–23 June 1995. ACM Press.
[34] N. Dor, M. Rodeh, and M. Sagiv. Checking cleanness in linked lists. In J. Palsberg, editor, Proc. 7th Int. Symp. SAS ’2000, Santa Barbara, CA, US, LNCS 1824, pages 115–134. Springer-Verlag, 29 June – 1 Jul. 2000.
[35] C. Ferdinand, F. Martin, R. Wilhelm, and M. Alt. Cache behavior prediction by abstract interpretation. Sci. Comput. Programming, Special Issue on SAS’96, 35(1):163–189, September 1999.
[36] J. Feret. Confidentiality analysis of mobile systems. In J. Palsberg, editor, Proc. 7th Int. Symp. SAS ’2000, Santa Barbara, CA, US, LNCS 1824, pages 135–154. Springer-Verlag, 29 June – 1 Jul. 2000.
[37] R. Giacobazzi, F. Ranzato, and F. Scozzari. Making abstract interpretations complete. J. ACM, 47(2):361–416, 2000.
[38] P. Granger. Static analysis of arithmetical congruences. Int. J. Comput. Math., 30:165–190, 1989.
[39] P. Granger. Static analysis of linear congruence equalities among variables of a program. In S. Abramsky and T.S.E. Maibaum, editors, Proc. Int. J. Conf. TAPSOFT ’91, Volume 1 (CAAP ’91), Brighton, GB, LNCS 493, pages 169–192. Springer-Verlag, 1991.
[40] N. Halbwachs. About synchronous programming and abstract interpretation. Sci. Comput. Programming, 31(1):75–89, May 1998.
[41] R.R. Hansen, J.G. Jensen, F. Nielson, and H. Riis Nielson. Abstract interpretation of mobile ambients. In A. Cortesi and G. Filé, editors, Proc. 6th Int. Symp. SAS ’99, Venice, IT, 22–24 Sep. 1999, LNCS 1694, pages 134–138. Springer-Verlag, 1999.
[42] W.L. Harrison.
Can abstract interpretation become a main stream compiler technology? (abstract). In P. Van Hentenryck, editor, Proc. 4th Int. Symp. SAS ’97, Paris, FR, 8–10 Sep. 1997, LNCS 1302, page 395. Springer-Verlag, 1997.
[43] T.A. Henzinger, R. Majumdar, F. Mang, and J.F. Raskin. Abstract interpretation of game properties. In J. Palsberg, editor, Proc. 7th Int. Symp. SAS ’2000, Santa Barbara, CA, US, LNCS 1824, pages 220–239. Springer-Verlag, 29 June – 1 Jul. 2000.
[44] N.D. Jones. Combining abstract interpretation and partial evaluation (brief overview). In P. Van Hentenryck, editor, Proc. 4th Int. Symp. SAS ’97, Paris, FR, 8–10 Sep. 1997, LNCS 1302, pages 396–405. Springer-Verlag, 1997.
[45] P. Lacan, J.N. Monfort, L.V.Q. Ribal, A. Deutsch, and G. Gonthier. The software reliability verification process: The Ariane 5 example. In Proceedings DASIA 98 – DAta Systems In Aerospace, Athens, GR. ESA Publications, SP-422, 25–28 May 1998.
[46] B. Le Charlier and P. Van Hentenryck. Experimental evaluation of a generic abstract interpretation algorithm for Prolog. In Proc. 1992 ICCL, Oakland, CA, pages 137–146. IEEE Comp. Soc. Press, 20–23 Apr. 1992.
[47] F. Martin. Generating Program Analyzers. Pirrot Verlag, Saarbrücken, DE, 1999.
[48] F. Masdupuy. Semantic analysis of interval congruences. In D. Bjørner, M. Broy, and I.V. Pottosin, editors, Proc. FMPA, Akademgorodok, Novosibirsk, RU, LNCS 735, pages 142–155. Springer-Verlag, 28 June – 2 Jul. 1993.
[49] L. Mauborgne. Tree schemata and fair termination. In J. Palsberg, editor, Proc. 7th Int. Symp. SAS ’2000, Santa Barbara, CA, US, LNCS 1824, pages 302–321. Springer-Verlag, 29 June – 1 Jul. 2000.
[50] D. Monniaux. Abstract interpretation of probabilistic semantics. In J. Palsberg, editor, Proc. 7th Int. Symp. SAS ’2000, Santa Barbara, CA, US, LNCS 1824, pages 322–339. Springer-Verlag, 29 June – 1 Jul. 2000.
[51] A. Mycroft.
Abstract Interpretation and Optimising Transformations for Applicative Programs. Ph.D. Dissertation, CST-15-81, Department of Computer Science, University of Edinburgh, Edinburgh, UK, Dec. 1981.
[52] F. Randimbivololona, J. Souyris, and A. Deutsch. Improving avionics software verification cost-effectiveness: Abstract interpretation based technology contribution. In Proceedings DASIA 2000 – DAta Systems In Aerospace, Montreal, CA. ESA Publications, 22–26 May 2000.
[53] D.A. Schmidt and B. Steffen. Program analysis as model checking of abstract interpretations. In G. Levi, editor, Proc. 5th Int. Symp. SAS ’98, Pisa, IT, 14–16 Sep. 1998, LNCS 1503, pages 351–380. Springer-Verlag, 1998.
[54] J. Stransky. A lattice for abstract interpretation of dynamic (lisp-like) structures. Inform. and Comput., 101(1):70–102, Nov. 1992.
[55] R. Vallée-Rai, L. Hendren, P. Lam, É. Gagnon, and P. Co. Soot - a Java™ optimization framework. In CASCON ’99, Sep. 1999.
[56] F. Védrine. Binding-time analysis and strictness analysis by abstract interpretation. In A. Mycroft, editor, Proc. 2nd Int. Symp. SAS ’95, Glasgow, UK, 25–27 Sep. 1995, LNCS 983, pages 400–417. Springer-Verlag, 1995.
[57] A. Venet. Automatic determination of communication topologies in mobile systems. In G. Levi, editor, Proc. 5th Int. Symp. SAS ’98, Pisa, IT, 14–16 Sep. 1998, LNCS 1503, pages 152–167. Springer-Verlag, 1998.
[58] A. Venet. Automatic analysis of pointer aliasing for untyped programs. Sci. Comput. Programming, Special Issue on SAS’96, 35(1):223–248, September 1999.
[59] Kwangkeun Yi. An abstract interpretation for estimating uncaught exceptions in standard ML programs. Sci. Comput. Programming, 31(1):147–173, May 1998.
The electronic version of this paper includes additional material on static program analysis applications as well as a comparison with other formal methods (typing, model-checking and deductive methods) which, for lack of space, could not be included in this published version. A broader bibliography is available in its extended version.