CS143
Handout 11
Summer 2008
July 09, 2008
LALR Parsing
Handout written by Maggie Johnson and revised by Julie Zelenski.
Motivation
Because a canonical LR(1) parser splits states based on differing lookahead sets, it can
have many more states than the corresponding SLR(1) or LR(0) parser. Potentially it
could require splitting a state with just one item into a different state for each subset of
the possible lookaheads; in a pathological case, this means the entire power set of its
follow set (which theoretically could contain all terminals, yikes!). It never actually
gets that bad in practice, but a canonical LR(1) parser for a programming language
might have an order of magnitude more states than an SLR(1) parser. Is there
something in between?
With LALR (lookahead LR) parsing, we attempt to reduce the number of states in an
LR(1) parser by merging similar states. This reduces the number of states to the same as
SLR(1), but still retains some of the power of the LR(1) lookaheads. Let’s examine the
LR(1) configurating sets from an example given in the LR parsing handout.
S' –> S
S –> XX
X –> aX
X –> b
I0:
S' –> •S, $
S –> •XX, $
X –> •aX, a/b
X –> •b, a/b
I1:
S' –> S•, $
I2:
S –> X•X, $
X –> •aX, $
X –> •b, $
I3:
X –> a•X, a/b
X –> •aX, a/b
X –> •b, a/b
I4:
X –> b•, a/b
I5:
S –> XX•, $
I6:
X –> a•X, $
X –> •aX, $
X –> •b, $
I7:
X –> b•, $
I8:
X –> aX•, a/b
I9:
X –> aX•, $
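The configurating sets above can be reproduced mechanically. The following is a minimal Python sketch, not the handout's own code: the names (`GRAMMAR`, `closure`, `goto`, `canonical_collection`) are illustrative, an item is written as a tuple (lhs, rhs, dot position, lookahead), and the FIRST computation assumes the grammar has no empty productions, which holds for this example.

```python
from collections import deque

# Grammar from the handout; "S'" is the augmented start symbol.
GRAMMAR = {
    "S'": [("S",)],
    "S": [("X", "X")],
    "X": [("a", "X"), ("b",)],
}
NONTERMINALS = set(GRAMMAR)


def first_sets():
    """FIRST for each nonterminal (assumes no production is empty)."""
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, prods in GRAMMAR.items():
            for rhs in prods:
                head = rhs[0]
                new = first[head] if head in NONTERMINALS else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = first_sets()


def closure(items):
    """Close a set of LR(1) items (lhs, rhs, dot, lookahead)."""
    result = set(items)
    work = deque(items)
    while work:
        lhs, rhs, dot, la = work.popleft()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            rest = rhs[dot + 1:]
            # Lookaheads of the new items: FIRST(rest followed by la).
            if rest:
                sym = rest[0]
                lookaheads = FIRST[sym] if sym in NONTERMINALS else {sym}
            else:
                lookaheads = {la}
            for prod in GRAMMAR[rhs[dot]]:
                for b in lookaheads:
                    item = (rhs[dot], prod, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto(state, symbol):
    """Advance the dot over `symbol` in every item that allows it."""
    moved = {(l, r, d + 1, la) for (l, r, d, la) in state
             if d < len(r) and r[d] == symbol}
    return closure(moved)


def canonical_collection():
    """Breadth-first construction of all LR(1) configurating sets."""
    start = closure({("S'", ("S",), 0, "$")})
    states = [start]
    work = deque([start])
    while work:
        state = work.popleft()
        symbols = {r[d] for (_, r, d, _) in state if d < len(r)}
        for sym in sorted(symbols):
            nxt = goto(state, sym)
            if nxt not in states:
                states.append(nxt)
                work.append(nxt)
    return states


STATES = canonical_collection()
print(len(STATES))  # 10
```

Each lookahead is stored as a separate item here, so an entry written as a/b in the listing corresponds to two tuples in the sketch. Dropping the lookahead from every item leaves only seven distinct cores among the ten states.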
Notice that some of the LR(1) states look suspiciously similar. Take I3 and I6, for
example. These two states are virtually identical: they have the same number of items,
the core of each item is identical, and they differ only in their lookahead sets. This
observation may make you wonder if it is possible to merge them into one state. The
same is true of I4 and I7, and of I8 and I9. If we did merge, we would end up replacing
those six states with just these three:
I36:
X –> a•X, a/b/$
X –> •aX, a/b/$
X –> •b, a/b/$
I47:
X –> b•, a/b/$
I89:
X –> aX•, a/b/$
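The merging step itself can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the handout's own algorithm: the ten states are typed in by hand from the listing, the dot is written as "." instead of "•", each lookahead set is a string of single-character terminals, and the names `LR1_STATES`, `core`, and `merge_by_core` are hypothetical.

```python
from collections import defaultdict

# LR(1) states copied from the handout: (item core, lookahead characters).
LR1_STATES = {
    "I0": [("S' -> .S", "$"), ("S -> .XX", "$"),
           ("X -> .aX", "ab"), ("X -> .b", "ab")],
    "I1": [("S' -> S.", "$")],
    "I2": [("S -> X.X", "$"), ("X -> .aX", "$"), ("X -> .b", "$")],
    "I3": [("X -> a.X", "ab"), ("X -> .aX", "ab"), ("X -> .b", "ab")],
    "I4": [("X -> b.", "ab")],
    "I5": [("S -> XX.", "$")],
    "I6": [("X -> a.X", "$"), ("X -> .aX", "$"), ("X -> .b", "$")],
    "I7": [("X -> b.", "$")],
    "I8": [("X -> aX.", "ab")],
    "I9": [("X -> aX.", "$")],
}


def core(items):
    """The core of a state: its items with the lookaheads stripped."""
    return frozenset(item for item, _ in items)


def merge_by_core(states):
    """Merge states that share a core, taking the union of lookaheads."""
    groups = defaultdict(list)
    for name, items in states.items():
        groups[core(items)].append(name)

    merged = {}
    for names in groups.values():
        # Name the merged state after its parts, e.g. I3 + I6 -> I36.
        new_name = "I" + "".join(n[1:] for n in names)
        lookaheads = defaultdict(set)
        for n in names:
            for item, la in states[n]:
                lookaheads[item] |= set(la)
        merged[new_name] = dict(lookaheads)
    return merged


MERGED = merge_by_core(LR1_STATES)
for name, items in MERGED.items():
    print(name, {item: sorted(la) for item, la in items.items()})
```

Running this leaves seven states: I0, I1, I2, and I5 are unchanged, and I36, I47, and I89 each carry the lookahead set a/b/$. A full LALR(1) construction would also have to redirect the goto transitions to the merged states, which this sketch leaves out.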
But isn’t this just SLR(1) all over again? In the above example, yes, since after the
merging we coincidentally end up with the complete follow sets as the lookahead. This
is not always the case, however. Consider this example:
S' –> S
S –> Bbb | aab | bBa
B –> a
I0:
S' –> •S, $
S –> •Bbb, $
S –> •aab, $
S –> •bBa, $
B –> •a, b
I1:
S' –> S•, $
I2:
S –> B•bb, $
I3:
S –> a•ab, $
B –> a•, b
....
In an SLR(1) parser there is a shift-reduce conflict in state 3: the parser would reduce
on anything in Follow(B), which includes a and b, but state 3 also shifts on a.
In LALR(1), state 3 will shift on a and reduce on b.
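The difference between the two parsers in state 3 can be checked with a short Python sketch. It is an illustration only: the names (`follow_sets`, `STATE_3`, and so on) are hypothetical, the Follow computation assumes a grammar with no empty productions, and the lookaheads in `STATE_3` are copied from the listing above rather than computed.

```python
# Second example grammar from the handout.
GRAMMAR = {
    "S'": [("S",)],
    "S": [("B", "b", "b"), ("a", "a", "b"), ("b", "B", "a")],
    "B": [("a",)],
}


def first_sets(grammar):
    """FIRST for each nonterminal (assumes no empty productions)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    """FOLLOW for each nonterminal (assumes no empty productions)."""
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, prods in grammar.items():
            for rhs in prods:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # only nonterminals have Follow sets
                    if i + 1 < len(rhs):
                        nxt = rhs[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = follow_sets(GRAMMAR, "S'")

# State 3 from the handout: (lhs, rhs, dot position, lookahead set).
STATE_3 = [
    ("S", ("a", "a", "b"), 1, {"$"}),
    ("B", ("a",), 1, {"b"}),
]

# Terminals the state can shift: the symbol right after the dot.
shift_on = {rhs[dot] for _, rhs, dot, _ in STATE_3 if dot < len(rhs)}

# SLR(1) reduces a completed item on Follow(lhs); LR(1)/LALR(1) on its lookaheads.
slr_reduce_on = set()
lalr_reduce_on = set()
for lhs, rhs, dot, lookaheads in STATE_3:
    if dot == len(rhs):
        slr_reduce_on |= FOLLOW[lhs]
        lalr_reduce_on |= lookaheads

slr_conflicts = shift_on & slr_reduce_on
lalr_conflicts = shift_on & lalr_reduce_on

print("Follow(B):", sorted(FOLLOW["B"]))
print("SLR(1) conflicts:", sorted(slr_conflicts))
print("LALR(1) conflicts:", sorted(lalr_conflicts))
```

Under these assumptions Follow(B) comes out as {a, b}, so the SLR(1) table has both a shift and a reduce entry on a, while the lookahead-based table reduces only on b and has no conflict.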
Intuitively, this is because the LALR(1) state "remembers" that we arrived at state 3
after seeing an a. Thus we are trying to parse either Bbb or aab. In order for that first
a to be a valid reduction to B, the next input has to be exactly b since that is the
only symbol that can follow