CS143
Handout 14
Summer 2011
July 6th, 2011
LALR Parsing
Handout written by Maggie Johnson, revised by Julie Zelenski.
Motivation
Because a canonical LR(1) parser splits states based on differing lookahead sets, it can
have many more states than the corresponding SLR(1) or LR(0) parser. Potentially it
could require splitting a state with just one item into a different state for each subset of
the possible lookaheads; in a pathological case, this means the entire power set of its
follow set (which theoretically could contain all terminals, yikes!). It never actually
gets that bad in practice, but a canonical LR(1) parser for a programming language
might have an order of magnitude more states than an SLR(1) parser. Is there
something in between?
With LALR (lookahead LR) parsing, we attempt to reduce the number of states in an
LR(1) parser by merging similar states. This reduces the number of states to the same as
SLR(1), but still retains some of the power of the LR(1) lookaheads. Let’s examine the
LR(1) configurating sets from an example given in the LR parsing handout.
S' –> S
S –> XX
X –> aX
X –> b
I0:
S' –> •S, $
S –> •XX, $
X –> •aX, a/b
X –> •b, a/b
I1:
S' –> S•, $
I2:
S –> X•X, $
X –> •aX, $
X –> •b, $
I3:
X –> a•X, a/b
X –> •aX, a/b
X –> •b, a/b
I4:
X –> b•, a/b
I5:
S –> XX•, $
I6:
X –> a•X, $
X –> •aX, $
X –> •b, $
I7:
X –> b•, $
I8:
X –> aX•, a/b
I9:
X –> aX•, $
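The ten sets above can be reproduced mechanically. The following is a minimal sketch, not the handout's own code: it assumes single-character grammar symbols, no empty productions, and no left recursion, all of which hold for this grammar. The function names (first, closure, goto, canonical_collection) are chosen here for illustration. Each item carries a single lookahead, so an item written above as "X –> •b, a/b" appears as two tuples.

```python
# Grammar from the handout. Keys are nonterminals; each right side is a
# string of single-character symbols.
GRAMMAR = {"S'": ["S"], "S": ["XX"], "X": ["aX", "b"]}

def first(symbol):
    """FIRST set of one symbol (assumes no empty productions, no left recursion)."""
    if symbol not in GRAMMAR:          # a terminal is its own FIRST set
        return {symbol}
    result = set()
    for rhs in GRAMMAR[symbol]:
        result |= first(rhs[0])
    return result

def closure(items):
    """LR(1) closure of a set of items (lhs, rhs, dot, lookahead)."""
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot == len(rhs) or rhs[dot] not in GRAMMAR:
            continue                   # dot at the end, or before a terminal
        # New lookaheads: FIRST of the symbol after the nonterminal, or the
        # inherited lookahead when the nonterminal is last (no nullable symbols).
        lookaheads = first(rhs[dot + 1]) if dot + 1 < len(rhs) else {la}
        for production in GRAMMAR[rhs[dot]]:
            for b in lookaheads:
                item = (rhs[dot], production, 0, b)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return frozenset(items)

def goto(state, symbol):
    """Advance the dot over `symbol` in every item that allows it, then close."""
    moved = [(l, r, d + 1, la) for l, r, d, la in state
             if d < len(r) and r[d] == symbol]
    return closure(moved)

def canonical_collection():
    start = closure([("S'", "S", 0, "$")])
    states, seen = [start], {start}
    for state in states:               # the list grows while we iterate over it
        symbols = {r[d] for _, r, d, _ in state if d < len(r)}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in seen:
                seen.add(target)
                states.append(target)
    return states

states = canonical_collection()
# A core is an item with its lookahead dropped.
cores = {frozenset((l, r, d) for l, r, d, _ in s) for s in states}
print(len(states), "LR(1) states,", len(cores), "distinct cores")
```

For this grammar the sketch finds ten LR(1) states, matching I0 through I9, but only seven distinct cores.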
Notice that some of the LR(1) states look suspiciously similar. Take I3 and I6 for
example. These two states are virtually identical: they have the same number of items,
the core of each item is identical, and they differ only in their lookahead sets. This
observation may make you wonder if it is possible to merge them into one state. The
same is true of I4 and I7, and I8 and I9. If we did merge, we would end up replacing those
six states with just these three:
I36:
X –> a•X, a/b/$
X –> •aX, a/b/$
X –> •b, a/b/$
I47:
X –> b•, a/b/$
I89:
X –> aX•, a/b/$
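The merge itself is a grouping by core followed by a union of lookaheads. Here is a small sketch of that idea, using a hypothetical tuple representation (lhs, rhs, dot position, lookahead set) for the six states listed earlier. It illustrates the merging step only and is not a full LALR table builder.

```python
from collections import defaultdict

# The six similar LR(1) states from the handout.
# Each item is (lhs, rhs, dot position, set of lookaheads).
STATES = {
    "I3": [("X", "aX", 1, {"a", "b"}), ("X", "aX", 0, {"a", "b"}), ("X", "b", 0, {"a", "b"})],
    "I6": [("X", "aX", 1, {"$"}), ("X", "aX", 0, {"$"}), ("X", "b", 0, {"$"})],
    "I4": [("X", "b", 1, {"a", "b"})],
    "I7": [("X", "b", 1, {"$"})],
    "I8": [("X", "aX", 2, {"a", "b"})],
    "I9": [("X", "aX", 2, {"$"})],
}

def core(items):
    """The core of a state: its items with the lookaheads dropped."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in items)

def merge_by_core(states):
    """Merge states that share a core, taking the union of their lookaheads."""
    groups = defaultdict(list)
    for name, items in states.items():
        groups[core(items)].append(name)

    merged = {}
    for names in groups.values():
        lookaheads = defaultdict(set)
        for name in names:
            for lhs, rhs, dot, las in states[name]:
                lookaheads[(lhs, rhs, dot)] |= las
        # Label the merged state by joining the original numbers, e.g. I36.
        label = "I" + "".join(name[1:] for name in names)
        merged[label] = dict(lookaheads)
    return merged

merged = merge_by_core(STATES)
for label, items in merged.items():
    for (lhs, rhs, dot), las in items.items():
        print(label, f"{lhs} -> {rhs[:dot]}.{rhs[dot:]}", "/".join(sorted(las)))
```

Grouping on a frozenset of cores means two states merge only when every item core matches, which is the condition described above.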
But isn’t this just SLR(1) all over again? In the above example, yes, since after the
merging we coincidentally end up with the complete follow sets as the lookahead. This
is not always the case, however. Consider this example:
S' –> S
S –> Bbb | aab | bBa
B –> a
I0:
S' –> •S, $
S –> •Bbb, $
S –> •aab, $
S –> •bBa, $
B –> •a, b
I1:
S' –> S•, $
I2:
S –> B•bb, $
I3:
S –> a•ab, $
B –> a•, b
....
In an SLR(1) parser there is a shift-reduce conflict in state 3 when the next input is the
terminal a: the parser can shift a, but it would also reduce B –> a on anything in
Follow(B), which includes both a and b. In LALR(1), state 3 will shift on a and
reduce on b. Intuitively, this is because the LALR(1) state "remembers" that we arrived
at state 3 after seeing an a. Thus we are trying to parse either Bbb or aab. In order for
that first a to be a valid reduction to B, the next input has to be exactly b since that is the
only symbol that can follow B in this particular context. Although elsewhere an
expansion of B can be followed by an a, we consider only the subset of the follow set
that can appear here, and thus avoid the conflict an SLR(1) parser would have.
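To make the comparison concrete, the sketch below computes Follow(B) for this grammar and compares the reduce lookaheads an SLR(1) parser would use in state 3 with the lookahead recorded on the item B –> a• in that state. It assumes single-character symbols with uppercase nonterminals, no empty productions, and no left recursion. The helper names are illustrative, and the state 3 lookahead is copied from the listing above rather than computed.

```python
# Grammar from the handout: uppercase letters are nonterminals,
# lowercase letters are terminals.
GRAMMAR = {
    "S'": ["S"],
    "S": ["Bbb", "aab", "bBa"],
    "B": ["a"],
}

def first(symbol, grammar):
    """FIRST set of one symbol (assumes no empty productions, no left recursion)."""
    if not symbol.isupper():
        return {symbol}
    result = set()
    for rhs in grammar[symbol]:
        result |= first(rhs[0], grammar)
    return result

def follow_sets(grammar, start="S'"):
    """Iterate to a fixed point, as in the usual Follow-set construction."""
    follow = {nonterminal: set() for nonterminal in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, symbol in enumerate(rhs):
                    if not symbol.isupper():
                        continue
                    # What can come next: FIRST of the following symbol,
                    # or Follow(lhs) when the nonterminal ends the production.
                    if i + 1 < len(rhs):
                        new = first(rhs[i + 1], grammar)
                    else:
                        new = follow[lhs]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow

follow = follow_sets(GRAMMAR)

# State 3 contains S -> a.ab (shift on a) and B -> a. (a reduction).
shift_symbols = {"a"}
slr_reduce = follow["B"]    # SLR(1): reduce on everything in Follow(B)
lalr_reduce = {"b"}         # lookahead on B -> a. in state 3, from the listing

slr_conflicts = shift_symbols & slr_reduce
lalr_conflicts = shift_symbols & lalr_reduce
print("Follow(B):", sorted(follow["B"]))
print("SLR(1) conflicts in state 3:", sorted(slr_conflicts))
print("LALR(1) conflicts in state 3:", sorted(lalr_conflicts))
```

Any symbol on which state 3 would both shift and reduce shows up in the intersection, so an empty intersection means the state is conflict-free under that choice of lookaheads.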