The role of the parser
[Figure: source code → scanner → tokens → parser → IR, with an errors output]
Parser
• performs context-free syntax analysis
• guides context-sensitive analysis
• constructs an intermediate representation
• produces meaningful error messages
• attempts error correction
Copyright © 2007 by Antony L. Hosking. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from hosking@cs.purdue.edu.
CS502
Parsing
1
Syntax analysis
Context-free syntax is specified with a context-free grammar.

Formally, a CFG $G$ is a 4-tuple $(V_t, V_n, S, P)$, where:

$V_t$ is the set of terminal symbols in the grammar. For our purposes, $V_t$ is the set of tokens returned by the scanner.

$V_n$, the nonterminals, is a set of syntactic variables that denote sets of (sub)strings occurring in the language. These are used to impose a structure on the grammar.

$S$ is a distinguished nonterminal ($S \in V_n$) denoting the entire set of strings in $L(G)$. This is sometimes called a goal symbol.

$P$ is a finite set of productions specifying how terminals and nonterminals can be combined to form strings in the language. Each production must have a single nonterminal on its left-hand side.

The set $V = V_t \cup V_n$ is called the vocabulary of $G$.
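To make the 4-tuple concrete, here is a minimal Python sketch of a grammar record that checks the conditions stated above. The class name `Grammar`, its field names, and the toy grammar S → a S b | a b are illustrative assumptions, not part of the course material. The disjointness check on the two symbol sets is the usual convention and is also an assumption here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A context-free grammar G = (V_t, V_n, S, P). Names are illustrative."""
    terminals: frozenset      # V_t
    nonterminals: frozenset   # V_n
    start: str                # S, the goal symbol
    productions: tuple        # P, as (lhs, rhs) pairs; rhs is a tuple of symbols

    def __post_init__(self):
        # Assumption (standard convention): V_t and V_n share no symbols.
        if self.terminals & self.nonterminals:
            raise ValueError("V_t and V_n must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("S must be a member of V_n")
        for lhs, rhs in self.productions:
            # Each production has a single nonterminal on its left-hand side.
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a nonterminal")
            for symbol in rhs:
                if symbol not in self.vocabulary:
                    raise ValueError(f"unknown symbol {symbol!r} in production")

    @property
    def vocabulary(self):
        """V = V_t union V_n."""
        return self.terminals | self.nonterminals


# Hypothetical toy grammar: S -> a S b | a b
toy = Grammar(
    terminals=frozenset({"a", "b"}),
    nonterminals=frozenset({"S"}),
    start="S",
    productions=(("S", ("a", "S", "b")), ("S", ("a", "b"))),
)
```

Storing each right-hand side as a tuple of symbols keeps productions hashable and makes later substitution a matter of tuple slicing.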
Notation and terminology
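• $a, b, c, \ldots \in V_t$
• $A, B, C \in V_n$
• $U, V, W \in V$
• $\alpha, \beta, \gamma \in V^*$
• $u, v, w \in V_t^*$

If $A \rightarrow \gamma$ then $\alpha A \beta \Rightarrow \alpha \gamma \beta$ is a single-step derivation using $A \rightarrow \gamma$.

Similarly, $\Rightarrow^*$ and $\Rightarrow^+$ denote derivations of $\geq 0$ and $\geq 1$ steps.

If $S \Rightarrow^* \beta$ then $\beta$ is said to be a sentential form of $G$.

$L(G) = \{ w \in V_t^* \mid S \Rightarrow^+ w \}$; $w \in L(G)$ is called a sentence of $G$.

Note, $L(G) = \{ \beta \in V^* \mid S \Rightarrow^* \beta \} \cap V_t^*$.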
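The closing note says that the language is the set of sentential forms that consist only of terminals. The sketch below illustrates this on a hypothetical toy grammar, S → a S b | a b, which is not from the course material. It enumerates sentential forms breadth-first up to a length bound and then keeps the all-terminal ones. The helper names and the bound `max_len` are assumptions. Pruning by length is only safe here because no production in this toy grammar shortens a form.

```python
from collections import deque

# Hypothetical toy grammar: S -> a S b | a b
TERMINALS = {"a", "b"}
PRODUCTIONS = [("S", ("a", "S", "b")), ("S", ("a", "b"))]


def one_step(form):
    """Yield every form reachable from `form` by one single-step derivation."""
    for i, symbol in enumerate(form):
        for lhs, rhs in PRODUCTIONS:
            if symbol == lhs:
                # alpha A beta => alpha gamma beta, with A at position i
                yield form[:i] + rhs + form[i + 1:]


def sentential_forms(start="S", max_len=6):
    """Collect every form derivable from `start` in >= 0 steps, up to max_len symbols."""
    seen = {(start,)}
    queue = deque(seen)
    while queue:
        form = queue.popleft()
        for successor in one_step(form):
            if len(successor) <= max_len and successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


forms = sentential_forms()
# Intersecting with V_t* keeps only the forms made entirely of terminals.
sentences = {form for form in forms if all(s in TERMINALS for s in form)}
```

With the bound of six symbols, `forms` contains mixed forms such as `("a", "S", "b")`, while `sentences` keeps only the terminal strings ab, aabb, and aaabbb.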
Syntax analysis
Grammars are often written in Backus-Naur form (BNF).

Example:

  1  <goal> ::= <expr>
  2  <expr> ::= <expr> <op> <expr>
  3          |  num
  4          |  id
  5  <op>   ::= +
  6          |  -
  7          |  *
  8          |  /

This describes simple expressions over numbers and identifiers.

In a BNF for a grammar, we represent
1. nonterminals with angle brackets or capital letters
2. terminals with typewriter font or underline
3. productions as in the example
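As an illustration of how the numbered productions drive a derivation, the following Python sketch encodes the example grammar and replays a sequence of rule numbers, always rewriting the leftmost nonterminal. The function names, the leftmost-rewriting choice, and the sample sequence are assumptions made for this sketch. Rule 6 is taken to be subtraction.

```python
# Productions of the example grammar, keyed by rule number.
PRODUCTIONS = {
    1: ("goal", ("expr",)),
    2: ("expr", ("expr", "op", "expr")),
    3: ("expr", ("num",)),
    4: ("expr", ("id",)),
    5: ("op", ("+",)),
    6: ("op", ("-",)),   # assumed to be subtraction
    7: ("op", ("*",)),
    8: ("op", ("/",)),
}
NONTERMINALS = {"goal", "expr", "op"}


def leftmost_derivation(rule_numbers, start="goal"):
    """Return the sentential forms obtained by applying each rule in turn
    to the leftmost nonterminal of the current form."""
    form = (start,)
    forms = [form]
    for number in rule_numbers:
        lhs, rhs = PRODUCTIONS[number]
        index = next((i for i, s in enumerate(form) if s in NONTERMINALS), None)
        if index is None or form[index] != lhs:
            raise ValueError(f"rule {number} does not apply to {form}")
        form = form[:index] + rhs + form[index + 1:]
        forms.append(form)
    return forms


def show(form):
    """Render a form in BNF style, with nonterminals in angle brackets."""
    return " ".join(f"<{s}>" if s in NONTERMINALS else s for s in form)


for step in leftmost_derivation([1, 2, 4, 5, 3]):
    print(show(step))
```

Applying rules 1, 2, 4, 5, 3 in that order prints the forms `<goal>`, `<expr>`, `<expr> <op> <expr>`, `id <op> <expr>`, `id + <expr>`, and finally the sentence `id + num`.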