The role of the parser
[Figure: source code → scanner → tokens → parser → IR, with errors as a side output]
Parser
• performs context-free syntax analysis
• guides context-sensitive analysis
• constructs an intermediate representation
• produces meaningful error messages
• attempts error correction
Copyright © 2007 by Antony L. Hosking.
Permission to make digital or hard copies of part or all of this work for
personal or classroom use is granted without fee provided that copies are not made or distributed for profit or
commercial advantage and that copies bear this notice and full citation on the first page. To copy otherwise, to
republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request
permission to publish from [email protected]
CS502
Parsing
1
Syntax analysis
Context-free syntax is specified with a context-free grammar.
Formally, a CFG $G$ is a 4-tuple $(V_t, V_n, S, P)$, where:
$V_t$ is the set of terminal symbols in the grammar.
For our purposes, $V_t$ is the set of tokens returned by the scanner.
$V_n$, the nonterminals, is a set of syntactic variables that denote sets of (sub)strings
occurring in the language.
These are used to impose a structure on the grammar.
$S$ is a distinguished nonterminal ($S \in V_n$) denoting the entire set of strings in $L(G)$.
This is sometimes called a goal symbol.
$P$ is a finite set of productions specifying how terminals and nonterminals can be
combined to form strings in the language.
Each production must have a single nonterminal on its left-hand side.
The set $V = V_t \cup V_n$ is called the vocabulary of $G$.
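The 4-tuple can be written down directly as a data structure. The following is a minimal sketch, not part of the original slides: the class name `Grammar`, its field names, and the toy grammar S → a S b | ε are illustrative assumptions, and the disjointness check on $V_t$ and $V_n$ is the usual convention rather than something stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A context-free grammar as the 4-tuple (V_t, V_n, S, P)."""
    terminals: frozenset      # V_t
    nonterminals: frozenset   # V_n
    start: str                # S, the goal symbol
    productions: tuple        # P: pairs (lhs, rhs), rhs a tuple of symbols

    def vocabulary(self):
        """Return V = V_t united with V_n."""
        return self.terminals | self.nonterminals

    def check(self):
        """Return True if the tuple is well formed, else raise ValueError."""
        if self.terminals & self.nonterminals:
            # Assumed convention: V_t and V_n do not overlap.
            raise ValueError("V_t and V_n must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("S must be a member of V_n")
        vocab = self.vocabulary()
        for lhs, rhs in self.productions:
            # Each production has a single nonterminal on its left-hand side.
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a nonterminal")
            for sym in rhs:
                if sym not in vocab:
                    raise ValueError(f"symbol {sym!r} is not in the vocabulary")
        return True


# Hypothetical toy grammar: S -> a S b | (empty)
G = Grammar(
    terminals=frozenset({"a", "b"}),
    nonterminals=frozenset({"S"}),
    start="S",
    productions=(("S", ("a", "S", "b")), ("S", ())),
)
```

Calling `G.check()` validates the conditions listed above, and `G.vocabulary()` returns the set $V$.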
Notation and terminology
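• $a, b, c, \ldots \in V_t$
• $A, B, C \in V_n$
• $U, V, W \in V$
• $\alpha, \beta, \gamma \in V^*$
• $u, v, w \in V_t^*$
If $A \rightarrow \gamma$ then $\alpha A \beta \Rightarrow \alpha \gamma \beta$ is a single-step derivation using $A \rightarrow \gamma$.
Similarly, $\Rightarrow^*$ and $\Rightarrow^+$ denote derivations of $\geq 0$ and $\geq 1$ steps.
If $S \Rightarrow^* \beta$ then $\beta$ is said to be a sentential form of $G$.
$L(G) = \{ w \in V_t^* \mid S \Rightarrow^+ w \}$; $w \in L(G)$ is called a sentence of $G$.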
Note, $L(G) = \{ \beta \in V^* \mid S \Rightarrow^* \beta \} \cap V_t^*$
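The derivation relation can be explored mechanically. The sketch below is an illustration only: the toy grammar S → a S b | a b and the names `step` and `sentences` are assumptions. It computes all single-step derivations of a sentential form and collects the sentences of $L(G)$ up to a length bound. Pruning by length is safe here only because no production in this toy grammar shrinks a sentential form.

```python
from collections import deque

# Hypothetical toy grammar: S -> a S b | a b   (the language a^n b^n, n >= 1)
NONTERMINALS = {"S"}
PRODUCTIONS = [("S", ("a", "S", "b")), ("S", ("a", "b"))]


def step(form):
    """Yield every form reachable from `form` by one derivation step."""
    for i, sym in enumerate(form):
        if sym in NONTERMINALS:
            for lhs, rhs in PRODUCTIONS:
                if lhs == sym:
                    # alpha A beta  =>  alpha gamma beta
                    yield form[:i] + rhs + form[i + 1:]


def sentences(start, max_len):
    """Collect sentences of length <= max_len by breadth-first search."""
    seen = {(start,)}
    queue = deque([(start,)])
    found = set()
    while queue:
        form = queue.popleft()
        if all(sym not in NONTERMINALS for sym in form):
            # Only terminals remain: the sentential form is a sentence.
            found.add("".join(form))
            continue
        for nxt in step(form):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(found, key=len)
```

Every tuple visited by the search is a sentential form; only those made up entirely of terminals are reported as sentences.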
Syntax analysis
Grammars are often written in Backus-Naur form (BNF).
Example:
    1  <goal> ::= <expr>
    2  <expr> ::= <expr> <op> <expr>
    3           | num
    4           | id
    5  <op>   ::= +
    6           | -
    7           | *
    8           | /
This describes simple expressions over numbers and identifiers.
In a BNF for a grammar, we represent
1. nonterminals with angle brackets or capital letters
2. terminals with typewriter font or underline
3. productions as in the example
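The numbered productions of the expression grammar can be applied one at a time to trace a derivation. This is a sketch under stated assumptions: the dictionary layout and the helper `leftmost_derivation` are hypothetical names, and the choice to always rewrite the leftmost nonterminal is made here for illustration, not taken from the slides.

```python
# Productions of the example grammar, keyed by their numbers.
PRODUCTIONS = {
    1: ("goal", ["expr"]),
    2: ("expr", ["expr", "op", "expr"]),
    3: ("expr", ["num"]),
    4: ("expr", ["id"]),
    5: ("op", ["+"]),
    6: ("op", ["-"]),
    7: ("op", ["*"]),
    8: ("op", ["/"]),
}
NONTERMINALS = {"goal", "expr", "op"}


def leftmost_derivation(numbers, start="goal"):
    """Apply the numbered productions in order to the leftmost nonterminal.

    Returns the list of sentential forms, starting with the goal symbol.
    """
    form = [start]
    history = [" ".join(form)]
    for n in numbers:
        lhs, rhs = PRODUCTIONS[n]
        # Locate the leftmost nonterminal in the current sentential form.
        idx = next((i for i, s in enumerate(form) if s in NONTERMINALS), None)
        if idx is None or form[idx] != lhs:
            raise ValueError(f"production {n} does not apply to: {' '.join(form)}")
        form[idx:idx + 1] = rhs
        history.append(" ".join(form))
    return history


# One derivation of the sentence: id + num * id
for sentential_form in leftmost_derivation([1, 2, 4, 5, 2, 3, 7, 4]):
    print(sentential_form)
```

Each printed line is a sentential form; the last one contains only terminals and is therefore a sentence of the grammar.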
Scanning vs. parsing
Where do we draw the line?
    term ::= [a-zA-Z]([a-zA-Z] | [0-9])* | 0 | [1-9][0-9]*
    op   ::= + | - | * | /
    expr ::= (term op)* term
Regular expressions are used to classify:
• identifiers, numbers, keywords
• REs are more concise and simpler for tokens than a grammar
• more efficient scanners can be built from REs (DFAs) than grammars
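As an illustration of classifying tokens with regular expressions, here is a minimal scanner sketch using Python's standard `re` module. The patterns follow the identifier, number, and operator definitions given for term and op; the token kind names (`id`, `num`, `op`), the function `scan`, and the decision to skip whitespace are assumptions made for this example. Keywords are not handled.

```python
import re

IDENT = r"[a-zA-Z][a-zA-Z0-9]*"   # letter followed by letters or digits
NUMBER = r"0|[1-9][0-9]*"         # zero, or a nonzero digit followed by digits
OP = r"[+\-*/]"

# One named group per token class; leading whitespace is skipped (assumption).
TOKEN_RE = re.compile(
    rf"\s*(?:(?P<id>{IDENT})|(?P<num>{NUMBER})|(?P<op>{OP}))"
)


def scan(text):
    """Return a list of (kind, lexeme) pairs, or raise ValueError."""
    tokens = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        kind = match.lastgroup          # name of the group that matched
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


print(scan("x1 + 42 * y"))
```

The resulting token kinds are what a parser for expr would consume as its terminal symbols.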
Context-free grammars are used to count:
• brackets: (), begin ...
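Counting matched brackets to arbitrary depth is beyond a finite automaton, but a grammar handles it directly. The sketch below is an assumption-laden illustration: it recognises the hypothetical grammar S → ( S ) S | ε over parentheses only, using a small recursive-descent routine named `balanced`.

```python
def balanced(text):
    """Return True if `text` is derivable from S -> '(' S ')' S | empty."""

    def parse_s(pos):
        # S -> '(' S ')' S : the trailing S is handled by looping.
        while pos < len(text) and text[pos] == "(":
            pos = parse_s(pos + 1)            # nested S
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("missing closing bracket")
            pos += 1
        # S -> empty : consume nothing further.
        return pos

    try:
        # The whole input must be consumed by a single S.
        return parse_s(0) == len(text)
    except ValueError:
        return False


print(balanced("(()())"), balanced("(()"))
```

The recursion depth tracks the current nesting depth, which is the counting that a regular expression cannot express.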