Conference Record
of the FOURTH ACM SYMPOSIUM ON
PRINCIPLES OF PROGRAMMING LANGUAGES
Papers Presented at the Symposium
Los Angeles, California
January 17-19, 1977
Sponsored by the
ASSOCIATION FOR COMPUTING MACHINERY
SPECIAL INTEREST GROUP ON AUTOMATA AND COMPUTABILITY THEORY
SPECIAL INTEREST GROUP ON PROGRAMMING LANGUAGES

ABSTRACT INTERPRETATION: A UNIFIED LATTICE MODEL FOR STATIC ANALYSIS OF PROGRAMS BY CONSTRUCTION OR APPROXIMATION OF FIXPOINTS

Patrick Cousot* and Radhia Cousot**
Laboratoire d'Informatique, U.S.M.G., BP. 53, 38041 Grenoble cedex, France

1. Introduction

A program denotes computations in some universe of objects. Abstract interpretation of programs consists in using that denotation to describe computations in another universe of abstract objects, so that the results of abstract execution give some information on the actual computations. An intuitive example (which we borrow from Sintzoff[72]) is the rule of signs. The text $-1515 * 17$ may be understood to denote computations on the abstract universe $\{(+), (-), (\pm)\}$ where the semantics of arithmetic operators is defined by the rule of signs. The abstract execution $-1515 * 17 \Rightarrow -(+) * (+) \Rightarrow (-) * (+) \Rightarrow (-)$ proves that $-1515 * 17$ is a negative number. Abstract interpretation is concerned with a particular underlying structure of the usual universe of computations (the sign, in our example). It gives a summary of some facets of the actual executions of a program. In general this summary is simple to obtain but inaccurate (e.g. $-1515 + 17 \Rightarrow -(+) + (+) \Rightarrow (-) + (+) \Rightarrow (\pm)$). Despite its fundamentally incomplete results, abstract interpretation allows the programmer or the compiler to answer questions which do not need full knowledge of program executions or which tolerate an imprecise answer (e.g. partial correctness proofs of programs ignoring the termination problems, type checking, program optimizations which are not carried out in the absence of certainty about their feasibility, ...).

2. Summary

Section 3 describes the syntax and mathematical
semantics of a simple flowchart language, Scott and Strachey[71]. This mathematical semantics is used in section 4 to build a more abstract model of the semantics of programs, in that it ignores the sequencing of control flow. This model is taken to be the most concrete of the abstract interpretations of programs. Section 5 gives the formal definition of the abstract interpretations of a program. Abstract program properties are modeled by a complete semilattice, Birkhoff[61]. Elementary program constructs are locally interpreted by order-preserving functions which are used to associate a system of recursive equations with a program. The program global properties are then defined as one of the extreme fixpoints of that system, Tarski[55]. The abstraction process is defined in section 6. It is shown that the program properties obtained by an abstract interpretation of a program are consistent with those obtained by a more refined interpretation of that program. In particular, an abstract interpretation may be shown to be consistent with the formal semantics of the language. Levels of abstraction are formalized by showing that consistent abstract interpretations form a lattice (section 7). Section 8 gives a constructive definition of abstract properties of programs based on constructive definitions of fixpoints. It shows that various classical algorithms such as Kildall[73] and Wegbreit[75] compute program properties as limits of finite Kleene[52] sequences. Section 9 introduces finite fixpoint approximation methods to be used when Kleene's sequences are infinite, Cousot[76]. They are shown to be consistent with the abstraction process. Practical examples illustrate the various sections. The conclusion points out that abstract interpretation of programs is a unified approach to apparently unrelated program analysis techniques.

* Attaché de Recherche au C.N.R.S., Laboratoire Associé no 7.
** This work was supported by IRIA-SESORI under grants 75035 and 76160.

3. Syntax and Semantics of Programs

We will use finite flowcharts as a language-independent representation of programs.

3.1 Syntax of a Program

A program is built from a set "Nodes". Each node has successor and predecessor nodes:

$$n\text{-}succ, n\text{-}pred : Nodes \to 2^{Nodes} \mid (m \in n\text{-}succ(n)) \Leftrightarrow (n \in n\text{-}pred(m))$$

Hereafter, we note $|S|$ the cardinality of a set S.
When $|S| = 1$ so that $S = \{x\}$, we sometimes use S to denote x.

The node subsets "Entries", "Assignments", "Tests", "Junctions" and "Exits" partition the set Nodes.

- An entry node ($n \in Entries$) has no predecessors and one successor, $((n\text{-}pred(n) = \emptyset)$ and $(|n\text{-}succ(n)| = 1))$.
- An assignment node ($n \in Assignments$) has one predecessor and one successor, $((|n\text{-}pred(n)| = 1)$ and $(|n\text{-}succ(n)| = 1))$. Let "Ident" and "Expr" be the distinct syntactic categories of identifiers and expressions. An assignment node n assigns the value of the right-hand side expression $expr(n)$ to the left-hand side identifier $id(n)$:
  $expr : Assignments \to Expr$, $\quad id : Assignments \to Ident$
- A test node ($n \in Tests$) has one predecessor and two successors, $((|n\text{-}pred(n)| = 1)$ and $(|n\text{-}succ(n)| = 2))$. The true and false successor nodes are respectively denoted $n\text{-}succ\text{-}t(n)$ and $n\text{-}succ\text{-}f(n)$:
  $n\text{-}succ\text{-}t, n\text{-}succ\text{-}f : Tests \to Nodes \mid (\forall n \in Tests, n\text{-}succ(n) = \{n\text{-}succ\text{-}t(n), n\text{-}succ\text{-}f(n)\})$
  Let "Bexpr" be the syntactic category of boolean expressions; each test node n contains a boolean expression $test(n)$:
  $test : Tests \to Bexpr$
- A junction node ($n \in Junctions$) has one successor and more than one predecessor, $((|n\text{-}succ(n)| = 1)$ and $(|n\text{-}pred(n)| > 1))$. Immediate predecessor nodes of a junction node are not junction nodes, $(\forall n \in Junctions, \forall m \in n\text{-}pred(n), \neg(m \in Junctions))$.
- An exit node n has one predecessor and no successor, $((|n\text{-}pred(n)| = 1)$ and $(n\text{-}succ(n) = \emptyset))$.

The set "Arcs" of edges of a program is a subset of
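$Nodes \times Nodes$ defined by

$$Arcs = \{\langle n, m\rangle \mid (n \in Nodes) \text{ and } (m \in n\text{-}succ(n))\}$$

which may be equivalently defined by

$$Arcs = \{\langle n, m\rangle \mid (m \in Nodes) \text{ and } (n \in n\text{-}pred(m))\}$$

We will assume that the directed graph $\langle Nodes, Arcs\rangle$ is connected. We will use the following functions:

$origin, end : Arcs \to Nodes \mid (\forall a \in Arcs, a = \langle origin(a), end(a)\rangle)$
$a\text{-}succ : Nodes \to 2^{Arcs}$, $\quad a\text{-}succ(n) = \{\langle n, m\rangle \mid m \in n\text{-}succ(n)\}$
$a\text{-}pred : Nodes \to 2^{Arcs}$, $\quad a\text{-}pred(n) = \{\langle m, n\rangle \mid m \in n\text{-}pred(n)\}$
$a\text{-}succ\text{-}t : Tests \to Arcs$, $\quad a\text{-}succ\text{-}t(n) = \langle n, n\text{-}succ\text{-}t(n)\rangle$
$a\text{-}succ\text{-}f : Tests \to Arcs$,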
$\quad a\text{-}succ\text{-}f(n) = \langle n, n\text{-}succ\text{-}f(n)\rangle$

[Figure: Example]

3.2 Semantics of Programs

This section develops a simple "mathematical semantics" of programs, in the style of Scott and Strachey[71].

If S is a set, we denote $S^\circ$ the complete lattice obtained from S by adjoining $\{\bot_S, \top_S\}$ to it, and imposing the ordering $\bot_S \sqsubseteq x \sqsubseteq \top_S$ for all $x \in S$.
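The $S^\circ$ construction can be sketched in a few lines of Python. This is an illustration under our own assumptions, not code from the paper: the sentinel objects `BOT` and `TOP` stand for $\bot_S$ and $\top_S$, elements of S are arbitrary hashable values compared with `==`, and `leq`, `join` and `meet` are hypothetical helper names implementing the ordering just defined, in which two distinct elements of S are incomparable.

```python
from typing import Hashable


class _Sentinel:
    """Adjoined element of S° (compared by identity, so it never equals an element of S)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return self.name


BOT = _Sentinel("BOT")   # plays the role of bottom_S
TOP = _Sentinel("TOP")   # plays the role of top_S


def leq(x: Hashable, y: Hashable) -> bool:
    """BOT <= x <= TOP for every x; two distinct elements of S are incomparable."""
    return x is y or x == y or x is BOT or y is TOP


def join(x: Hashable, y: Hashable) -> Hashable:
    """Least upper bound in S°."""
    if leq(x, y):
        return y
    if leq(y, x):
        return x
    return TOP              # distinct elements of S: only TOP is above both


def meet(x: Hashable, y: Hashable) -> Hashable:
    """Greatest lower bound in S°."""
    if leq(x, y):
        return x
    if leq(y, x):
        return y
    return BOT              # distinct elements of S: only BOT is below both
```

For the two-element set {true, false}, `join(True, False)` returns `TOP` and `meet(True, False)` returns `BOT`, since true and false are incomparable in the flat ordering.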
The semantic domain "Values" is a complete lattice which is the sum of the lattice $Bool = \{true, false\}^\circ$ and some other primitive domains. Environments are used to hold the bindings of identifiers to their values:

$$Env = Ident^\circ \to Values$$

We assume that the meaning of an expression $expr \in Expr$ in the environment $e \in Env$ is given by $val[\![expr]\!](e)$, so that $val : Expr \to [Env \to Values]$. In particular the projection $val|_{Bexpr}$ of the function val in domain Bexpr has the functionality

$$val|_{Bexpr} : Bexpr \to [Env \to Bool]$$
The state set "States" consists of the set of all information configurations that can occur during computations:

$$States = Arcs^\circ \times Env$$

A state ($s \in States$) consists of a control state ($cs(s)$) and an environment ($env(s)$), such that $\forall s \in States, s = \langle cs(s), env(s)\rangle$.

We use a continuous conditional function $cond(b, e_1, e_2)$ equal to $\bot$, $e_1$, $e_2$ or $\top$ respectively as the value of b is $\bot$, true, false or $\top$. We also use $\textbf{if } b \textbf{ then } e_1 \textbf{ else } e_2 \textbf{ fi}$ to denote $cond(b, e_1, e_2)$. If $e \in Env$, $v \in Values$ and $x \in Ident$, then

$$e[v/x] = \lambda y.\, cond(y = x, v, e(y))$$

The state transition function defines for each
state a next state (we consider deterministic programs):

$$n\text{-}state : States \to States$$

$$n\text{-}state(s) = \begin{cases} \langle a\text{-}succ(n),\ e[val[\![expr(n)]\!](e)/id(n)]\rangle & \text{if } n \in Assignments \\ cond(val|_{Bexpr}[\![test(n)]\!](e),\ \langle a\text{-}succ\text{-}t(n), e\rangle,\ \langle a\text{-}succ\text{-}f(n), e\rangle) & \text{if } n \in Tests \\ \langle a\text{-}succ(n), e\rangle & \text{if } n \in Junctions \\ s & \text{if } n \in Exits \end{cases} \quad \text{where } n = end(cs(s)),\ e = env(s)$$

(Each partial function f on a set S is extended to a continuous total function on the corresponding domain $S^\circ$ by $f(\bot) = \bot$, $f(\top) = \top$ and $f(x) = \bot$ if the partial function is undefined at x.)

Let $\bot_{Env}$ be the bottom function on Env such that $(\forall x \in Ident, \bot_{Env}(x) = \bot_{Values})$. Let I-states be the subset of initial states:

$$I\text{-}states = \{\langle a\text{-}succ(m), \bot_{Env}\rangle \mid m \in Entries\}$$

A "computation sequence" with initial state $is \in I\text{-}states$ is the sequence $s_n = n\text{-}state^n(is)$ for $n = 0, 1, \ldots$, where $f^0$ is the identity function and $f^{n+1} = f \circ f^n$.

The initial to final state transition function

$$n\text{-}state^\infty : States \to States$$

is the minimal fixpoint of the functional $\lambda F.(n\text{-}state \circ F)$. Therefore

$$n\text{-}state^\infty = Y_{States \to States}(\lambda F.(n\text{-}state \circ F))$$
where $Y_D(f)$ denotes the least fixpoint of $f : D \to D$, Tarski[55].

4. Static Semantics of Programs

The constructive or operational semantics of programs defined in section 3 considers the sequence in which states occur during execution. The fundamental remark of Floyd[67] is that to prove static properties of programs it is often sufficient to consider the sets of states associated with each
program point.

Hence, we define the context $C_q$ at some program point $q \in Arcs$ of a program P to be the set of all environments which may be associated to q in all the possible computation sequences of P:

$$C_q \in Contexts = 2^{Env}, \qquad C_q = \{e \mid \exists n \ge 0, \exists is \in I\text{-}states \mid \langle q, e\rangle = n\text{-}state^n(is)\}$$

The context vector $Cv$ associates a context to each of the program points of a program:

$$Cv \in Context\text{-}Vectors = Arcs^\circ \to Contexts, \qquad Cv = \lambda q.\{e \mid \exists n \ge 0, \exists is \in I\text{-}states \mid \langle q, e\rangle = n\text{-}state^n(is)\}$$

According to the semantics of programs, the context $Cv(r)$ associated to arc r is related to the contexts $Cv(q)$ at arcs q adjacent to r ($end(q) = origin(r)$). From the definition of the state transition function we can prove the equation

$$Cv(r) = n\text{-}context(r, Cv)$$

where $n\text{-}context : Arcs^\circ \times Context\text{-}Vectors \to Contexts$ is defined by

$$n\text{-}context(r, Cv) = \begin{cases} \{\bot_{Env}\} & \text{if } origin(r) \in Entries \\ \bigcup_{q \in a\text{-}pred(origin(r))} \ \bigcup_{e \in Cv(q)} env\text{-}on(r)(n\text{-}state(\langle q, e\rangle)) & \text{if } origin(r) \in Assignments \cup Tests \cup Junctions \end{cases}$$

(We define $env\text{-}on : Arcs^\circ \to [States \to 2^{Env}]$ to be $\lambda r.(\lambda s.\, cond(r = cs(s), \{env(s)\}, \emptyset))$.)

Since the equation $Cv(r) = n\text{-}context(r, Cv)$ must be valid for each arc, $Cv$ is a solution of the system of "forward" equations

$$Cv = F\text{-}cont(Cv)$$

where $F\text{-}cont : Context\text{-}Vectors \to Context\text{-}Vectors$ is defined by $F\text{-}cont(Cv) = \lambda r.\, n\text{-}context(r, Cv)$.

Context-Vectors is a complete lattice with union $\cup$ such that $Cv_1 \cup Cv_2 = \lambda r.(Cv_1(r) \cup Cv_2(r))$. F-cont is order-preserving for the ordering $\subseteq$ of Context-Vectors, which is defined by $\{Cv_1 \subseteq Cv_2\} \Leftrightarrow \{\forall r \in Arcs, Cv_1(r) \subseteq Cv_2(r)\}$. Hence it is known that F-cont has fixpoints, Tarski[55]. However, it is trivial to exhibit examples which show that these fixpoints are not always unique. Fortunately, it can be shown that $Cv$ is included in any solution S of the system of equations $X = F\text{-}cont(X)$, ($Cv \subseteq S$). Tarski[55] shows that this property uniquely determines $Cv$ as the least fixpoint of F-cont. Thus $Cv$ can be equivalently defined by

D1: $Cv = \lambda q.\{e \mid \exists n \ge 0, \exists is \in I\text{-}states \mid \langle q, e\rangle = n\text{-}state^n(is)\}$

or

D2: $Cv = Y_{Context\text{-}Vectors}(F\text{-}cont)$

The concrete context vector $Cv$ is such that for any
program point $q \in Arcs$ of the program P,

(α) $Cv(q)$ contains at least the environments e which may be associated to q during any execution of P:

$$\{\exists i \ge 0, \exists is \in I\text{-}states \mid \langle q, e\rangle = n\text{-}state^i(is)\} \Rightarrow \{e \in Cv(q)\}$$

(β) $Cv(q)$ contains only the environments e which may be associated to q during an execution of P:

$$\{e \in Cv(q)\} \Rightarrow \{\exists i \ge 0, \exists is \in I\text{-}states \mid \langle q, e\rangle = n\text{-}state^i(is)\}$$
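For a program whose set of reachable environments is finite, both definitions of $Cv$ can be executed and compared. The Python sketch below is our own illustration, not code from the paper: the six-node flowchart for `x := 0; while x < 3 do x := x + 1`, the dictionary encoding of Nodes and Arcs, and all helper names are assumptions. `n_state` transcribes the case analysis of section 3.2, `d1_by_execution` follows every computation sequence and keeps only the set of environments met on each arc (D1), and `d2_by_kleene` computes $Y(F\text{-}cont)$ as the limit of the Kleene sequence starting from $\lambda r.\emptyset$ (D2), with env-on realised as a filter on the target arc.

```python
# Hypothetical flowchart (our encoding, not from the paper):
#   E: entry; A1: x := 0; J: junction; T: test x < 3; A2: x := x + 1; X: exit
from typing import Callable, Dict, FrozenSet, Set, Tuple

Arc = Tuple[str, str]
Env = FrozenSet[Tuple[str, int]]        # identifier/value bindings; frozenset() plays BOT_Env
State = Tuple[Arc, Env]
CtxVector = Dict[Arc, Set[Env]]

KIND = {"E": "entry", "A1": "assign", "J": "junction", "T": "test", "A2": "assign", "X": "exit"}
N_SUCC = {"E": ["A1"], "A1": ["J"], "J": ["T"], "T": ["A2", "X"], "A2": ["J"], "X": []}
ASSIGN: Dict[str, Tuple[str, Callable[[dict], int]]] = {
    "A1": ("x", lambda e: 0),
    "A2": ("x", lambda e: e["x"] + 1),
}
TEST: Dict[str, Tuple[Callable[[dict], bool], str, str]] = {
    "T": (lambda e: e["x"] < 3, "A2", "X"),   # (test(n), n-succ-t(n), n-succ-f(n))
}
ARCS = [(n, m) for n, succs in N_SUCC.items() for m in succs]


def n_state(s: State) -> State:
    """One application of the state transition function n-state."""
    arc, env = s
    n = arc[1]                                # n = end(cs(s))
    e = dict(env)
    if KIND[n] == "assign":
        ident, expr = ASSIGN[n]
        e[ident] = expr(e)                    # e[val[expr(n)](e) / id(n)]
        return (n, N_SUCC[n][0]), frozenset(e.items())
    if KIND[n] == "test":
        test, succ_t, succ_f = TEST[n]
        return ((n, succ_t) if test(e) else (n, succ_f)), env
    if KIND[n] == "junction":
        return (n, N_SUCC[n][0]), env
    return s                                  # exits: n-state(s) = s


def d1_by_execution(max_steps: int = 1000) -> CtxVector:
    """D1: record the environments met on each arc along every computation sequence."""
    cv: CtxVector = {r: set() for r in ARCS}
    for m in (n for n, k in KIND.items() if k == "entry"):
        s: State = ((m, N_SUCC[m][0]), frozenset())   # <a-succ(m), BOT_Env>
        for _ in range(max_steps):                    # guard against non-termination
            cv[s[0]].add(s[1])
            nxt = n_state(s)
            if nxt == s:                              # exit reached: sequence is constant
                break
            s = nxt
    return cv


def n_context(r: Arc, cv: CtxVector) -> Set[Env]:
    """n-context(r, Cv), with env-on(r) implemented as a filter on the arc."""
    origin = r[0]
    if KIND[origin] == "entry":
        return {frozenset()}                          # {BOT_Env}
    out: Set[Env] = set()
    for q in (a for a in ARCS if a[1] == origin):     # q in a-pred(origin(r))
        for e in cv[q]:
            arc, e2 = n_state((q, e))
            if arc == r:
                out.add(e2)
    return out


def d2_by_kleene() -> CtxVector:
    """D2: least fixpoint of F-cont as the limit of the Kleene sequence from the empty vector."""
    cv: CtxVector = {r: set() for r in ARCS}
    while True:
        nxt = {r: n_context(r, cv) for r in ARCS}     # F-cont(Cv)
        if nxt == cv:
            return cv
        cv = nxt
```

On this example both functions return the same context vector, for instance the four environments $x = 0, 1, 2, 3$ on the arc from the junction to the test, as (α) and (β) require. With an unbounded loop counter neither computation would terminate.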
$Cv$ is merely a static summary of the possible executions of the program. However, our definitions D1 or D2 of $Cv$ cannot be utilized at compile time, since the computation of $Cv$ consists in fact in running the program (for all the possible input data). In practice compilers may consider states which can never occur during program execution (e.g. some compilers consider that any program may always perform a division by zero, although this is not the case for most programs). Hence compilers may use "abstract" contexts satisfying (α) but not necessarily (β), which therefore correctly approximate the concrete contexts we have considered until now.

5. Abstract Interpretation of Programs
5.1 Formal Definition

An abstract interpretation I of a program P is a tuple

$$I = \langle A\text{-}Cont, \circ, \le, \top, \bot, Int\rangle$$

where the set A-Cont of abstract contexts is a complete $\circ$-semilattice with ordering $\le$, ($\{x \le y\} \Leftrightarrow \{x \circ y = y\}$). This implies that A-Cont has a supremum $\top$. We suppose also A-Cont to have an infimum $\bot$. This implies that A-Cont is in fact a complete lattice, but we need only one of the two join and meet operations. The set of context vectors is defined by $\widetilde{A\text{-}Cont} = Arcs^\circ \to A\text{-}Cont$. Whatever $(Cv', Cv'') \in \widetilde{A\text{-}Cont}^2$ may be, we define

$$Cv' \mathbin{\tilde\circ} Cv'' = \lambda r.\, Cv'(r) \circ Cv''(r)$$
$$\{Cv' \mathrel{\tilde\le} Cv''\} \Leftrightarrow \{\forall r \in Arcs^\circ, Cv'(r) \le Cv''(r)\}$$
$$\tilde\top = \lambda r.\top \quad \text{and} \quad \tilde\bot = \lambda r.\bot$$

$\langle \widetilde{A\text{-}Cont}, \tilde\circ, \tilde\le, \tilde\top, \tilde\bot\rangle$ can be shown to be a complete lattice. The function
$$Int : Arcs \times \widetilde{A\text{-}Cont} \to A\text{-}Cont$$

defines the interpretation of basic instructions. If $\{C(q) \mid q \in a\text{-}pred(n)\}$ is the set of input contexts of node n, then the output context on exit arc r of n ($r \in a\text{-}succ(n)$) is equal to $Int(r, C)$. Int is supposed to be order-preserving:

$$\forall a \in Arcs, \forall (Cv', Cv'') \in \widetilde{A\text{-}Cont}^2, \ \{Cv' \mathrel{\tilde\le} Cv''\} \Rightarrow \{Int(a, Cv') \le Int(a, Cv'')\}$$

The local interpretation of elementary program constructs which is defined by Int is used to associate a system of equations with the program. We define

$$\widetilde{Int} : \widetilde{A\text{-}Cont} \to \widetilde{A\text{-}Cont}, \qquad \widetilde{Int}(Cv) = \lambda r.\, Int(r, Cv)$$

It is easy to show that $\widetilde{Int}$ is order-preserving. Hence it has fixpoints, Tarski[55]. Therefore the context vector resulting from the abstract interpretation I of program P, which defines the global properties of P, may be chosen to be one of the extreme solutions to the system of equations $Cv = \widetilde{Int}(Cv)$.

5.2 Typology of Abstract Interpretations

The restriction that "A-Cont" must be a complete
semilattice is not drastic, since MacNeille[37] showed that any partly ordered set S can be embedded in a complete lattice so that inclusion is preserved, together with all greatest lower bounds and least upper bounds existing in S. Hence in practice the set of abstract contexts will be a lattice, which can be considered as a join ($\cup$) semilattice or a meet ($\cap$) semilattice, thus giving rise to two dual abstract interpretations.

It is a pure coincidence that in most examples (see 5.3.2) the $\cap$ or $\cup$ operator represents the effect of path converging. The real need for this operator is to define completeness, which ensures that $\widetilde{Int}$ has
extreme fixpoints (see 8.4).

The result of an abstract interpretation was defined as a solution to forward ($\rightarrow$) equations: the output contexts on exit arcs of node n are defined as a function of the input contexts on entry arcs of node n. One can as well consider a system of backward ($\leftarrow$) equations: a context may be related to its successors. Both systems ($\rightarrow$, $\leftarrow$) may also be combined.

Finally, we usually consider a maximal ($\uparrow$) or minimal ($\downarrow$) solution to the system of equations (by agreement, maximal and minimal are related to the ordering $\le$ defined by $(x \le y) \Leftrightarrow (x \cup y = y) \Leftrightarrow (x \cap y = x)$). However, known examples such as Manna and Shamir[75] show that the suitable solution may be somewhere between the extreme ones.

These choices give rise to the following types of
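abstract interpretations:

$(\cap, \rightarrow, \uparrow)$   $(\cup, \rightarrow, \downarrow)$
$(\cap, \leftarrow, \uparrow)$   $(\cup, \leftarrow, \downarrow)$
$(\cap, \leftrightarrow, \uparrow)$   $(\cup, \leftrightarrow, \downarrow)$

Examples: Kildall[73] uses $(\cap, \rightarrow, \uparrow)$, Wegbreit[75] uses $(\cup, \rightarrow, \downarrow)$, Tenenbaum[74] uses both $(\cup, \rightarrow, \downarrow)$ and $(\cap, \leftarrow, \uparrow)$.

5.3 Examples

5.3.1 Static Semantics of Programs

The static semantics of programs we defined in section 4 is an abstract interpretation

$$I_{SS} = \langle Contexts, \cup, \subseteq, Env, \emptyset, n\text{-}context\rangle$$

where Contexts, $\cup$, $\subseteq$, Env, $\emptyset$, n-context, Context-Vectors, $\tilde\cup$, $\tilde\subseteq$, F-cont respectively correspond to A-Cont, $\circ$, $\le$, $\top$, $\bot$, Int, $\widetilde{A\text{-}Cont}$, $\tilde\circ$, $\tilde\le$, $\widetilde{Int}$.

5.3.2 Data Flow Analysis

Data flow analysis problems (see references in Ullman[75]) may be formalized as abstract interpretations of programs. "Available expressions" give a classical example.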
An expression is available on arc r if, whenever control reaches r, the value of the expression has been previously computed, and since the last computation of the expression, no argument of the expression has had its value changed.

Let $Expr_P$ be the set of expressions occurring in a program P. Abstract contexts will be sets of available expressions, represented by boolean vectors:

$$Bvect = Expr_P \to \{true, false\}$$

Bvect is clearly a complete boolean lattice. The interpretation of basic nodes is defined by

$$avail(r, Bv) = \begin{cases} \lambda e.\, false & \text{if } n \in Entries \\ \lambda e.\, \Big(generated(n)(e) \text{ or } \big(\bigwedge_{p \in a\text{-}pred(n)} Bv(p)(e) \text{ and } transparent(n)(e)\big)\Big) & \text{if } n \in Assignments \cup Tests \cup Junctions \end{cases} \quad \text{where } n = origin(r)$$

(Nothing is available on entry arcs. An expression e is available on arc r (exit of node n) if either the expression e is generated by n, or for all predecessors p of n, e is available on p and n does not modify arguments of e.)

The available expressions are determined by the maximal solution (for the ordering $\lambda e.\, false \le \lambda e.\, true$) of the system of equations $Bv = \widetilde{avail}(Bv)$.

The determination of available expressions, back dominators and intervals requires a forward system of equations. Some global flow problems, notably live variables and very busy expressions, require propagating information backward through the program graph; they are examples of backward systems of equations.

5.3.3 Remarks

Our formal definition of abstract interpretations
has the completeness property, since the model ensures the existence of a particular solution to the system of equations and therefore defines at least some global property of the program. It must also have the consistency property, that is, define only correct properties of programs.

One can distinguish between syntactic and semantic abstract interpretations of a program. Syntactic interpretations are proved to be correct by reference to the program syntax (e.g. the algorithm for finding available expressions is justified by reasoning on paths of the program graph). By contrast, semantic abstract interpretations must be proved to be consistent with the formal semantics of the
language (e.g. constant propagation).

6. Consistent Abstract Interpretations

An "abstract" interpretation $I' = \langle A\text{-}Cont, \circ', \le', \top', \bot', Int'\rangle$ of a program is consistent with a "concrete" interpretation $I = \langle C\text{-}Cont, \circ, \le, \top, \bot, Int\rangle$ if the context vector $Cv'$ resulting from I' is a correct approximation of the context vector $Cv$ resulting from the more refined interpretation I. This may be rigorously defined by establishing a correspondence ($\alpha$: abstraction) between concrete and abstract context vectors, and inversely ($\gamma$: concretization), and requiring

6.0 $\{Cv \mathrel{\tilde\le} \tilde\gamma(Cv')\}$ and $\{\tilde\alpha(Cv) \mathrel{\tilde\le}' Cv'\}$

In words, the abstract context vector must at least contain the concrete one (but not only the concrete one). If $f : D \to D'$, we note $\tilde D = Arcs^\circ \to D$, $\tilde D' = Arcs^\circ \to D'$ and $\tilde f : \tilde D \to \tilde D' = \lambda d.(\lambda r.\, f(d(r)))$. We will suppose $\alpha$ and $\gamma$ to satisfy the following
hypotheses:

6.1 $\alpha : C\text{-}Cont \to A\text{-}Cont$, $\quad \gamma : A\text{-}Cont \to C\text{-}Cont$
6.2 $\alpha$ and $\gamma$ are order-preserving
6.3 $\forall x' \in A\text{-}Cont$, $x' = \alpha(\gamma(x'))$
6.4 $\forall x \in C\text{-}Cont$, $x \le \gamma(\alpha(x))$

Intuitively, hypothesis 6.2 is necessary because context inclusion (that is, property comparison) must be preserved by the abstraction or concretization process. 6.3 requires that concretization introduces no loss of informa...
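The rule of signs from the introduction gives a small instance of hypotheses 6.1 to 6.4. In the Python sketch below, which is our own encoding rather than the paper's, concrete contexts are sets of integers and abstract contexts are the five signs `bot`, `neg`, `zero`, `pos`, `top` (the paper's $(\pm)$ corresponds to `top`; distinguishing zero is our assumption). `alpha` maps a set to the least sign covering it and `gamma` maps a sign back to the integers it describes. Because $\gamma$ of `top` is infinite, `gamma` is restricted to a finite `universe` so that the hypotheses can be checked by enumeration; such a check on a small universe illustrates the hypotheses but does not prove them. All function names are hypothetical.

```python
# Sign abstraction (our own encoding, not the paper's):
# C-Cont = sets of integers; A-Cont = {bot, neg, zero, pos, top} with bot <= neg, zero, pos <= top.
from itertools import chain, combinations
from typing import Dict, FrozenSet, Iterable, List

SIGNS = ("bot", "neg", "zero", "pos", "top")


def leq(a: str, b: str) -> bool:
    return a == b or a == "bot" or b == "top"


def join(a: str, b: str) -> str:
    if leq(a, b):
        return b
    if leq(b, a):
        return a
    return "top"


def sign(n: int) -> str:
    return "neg" if n < 0 else "zero" if n == 0 else "pos"


def alpha(xs: Iterable[int]) -> str:
    """Abstraction: the least sign that covers every element of the set."""
    result = "bot"
    for n in xs:
        result = join(result, sign(n))
    return result


def gamma(a: str, universe: Iterable[int]) -> FrozenSet[int]:
    """Concretization, restricted to a finite universe so that it can be enumerated."""
    return frozenset(n for n in universe if leq(sign(n), a))


def mul(a: str, b: str) -> str:
    """Rule of signs for multiplication."""
    if "bot" in (a, b):
        return "bot"
    if "zero" in (a, b):
        return "zero"
    if "top" in (a, b):
        return "top"
    return "pos" if a == b else "neg"


def add(a: str, b: str) -> str:
    """Rule of signs for addition: adding opposite signs loses all sign information."""
    if "bot" in (a, b):
        return "bot"
    if a == "zero":
        return b
    if b == "zero":
        return a
    return a if a == b else "top"


def powerset(items: List[int]) -> List[FrozenSet[int]]:
    return [frozenset(c) for c in chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))]


def check_hypotheses(universe: Iterable[int]) -> Dict[str, bool]:
    """Check 6.2 to 6.4 by enumeration over a finite universe (an illustration, not a proof)."""
    u = list(universe)
    subsets = powerset(u)
    return {
        "6.2 alpha order-preserving": all(leq(alpha(x), alpha(y)) for x in subsets for y in subsets if x <= y),
        "6.2 gamma order-preserving": all(gamma(a, u) <= gamma(b, u) for a in SIGNS for b in SIGNS if leq(a, b)),
        "6.3 alpha(gamma(x')) = x'": all(alpha(gamma(a, u)) == a for a in SIGNS),
        "6.4 x <= gamma(alpha(x))": all(x <= gamma(alpha(x), u) for x in subsets),
    }
```

`check_hypotheses(range(-2, 3))` reports all four properties as `True`. `mul(alpha([-1515]), alpha([17]))` gives `"neg"`, whereas `add(alpha([-1515]), alpha([17]))` gives `"top"`, the imprecise $(\pm)$ of the introduction.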