CS143
Handout 11
Summer 2008
July 09, 2008
LALR Parsing
Handout written by Maggie Johnson and revised by Julie Zelenski.
Motivation
Because a canonical LR(1) parser splits states based on differing lookahead sets, it can have many more states than the corresponding SLR(1) or LR(0) parser. Potentially it could require splitting a state with just one item into a different state for each subset of the possible lookaheads; in a pathological case, this means the entire power set of its follow set (which theoretically could contain all terminals, yikes!). It never actually gets that bad in practice, but a canonical LR(1) parser for a programming language might have an order of magnitude more states than an SLR(1) parser. Is there something in between?
With LALR (lookahead LR) parsing, we attempt to reduce the number of states in an LR(1) parser by merging similar states. This reduces the number of states to the same as SLR(1), but still retains some of the power of the LR(1) lookaheads.
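The merging step itself can be sketched in a few lines of Python. This is a minimal illustration, not a full LALR construction: it assumes a representation of our own choosing in which an LR(1) item is a tuple (lhs, rhs, dot, lookahead) and a state is a frozenset of items, and the names core and merge_by_core are hypothetical. Two states are merged when their cores (the items with lookaheads dropped) are identical, and the merged state carries the union of their lookaheads. A real generator would also have to redirect the goto transitions to the merged states, which this sketch omits.

```python
# Assumed representation (not from the handout): an LR(1) item is the tuple
# (lhs, rhs, dot, lookahead) and a state is a frozenset of such items.

def core(state):
    """Drop the lookaheads: the core is the set of (lhs, rhs, dot) triples."""
    return frozenset((lhs, rhs, dot) for (lhs, rhs, dot, _la) in state)

def merge_by_core(states):
    """Merge LR(1) states with identical cores, unioning their items
    (and hence their lookaheads). Merged states keep the order in which
    each core first appears."""
    groups = {}
    for state in states:
        groups.setdefault(core(state), set()).update(state)
    return [frozenset(items) for items in groups.values()]

# I4 and I7 from the example: same core X -> b., different lookaheads.
i4 = frozenset({("X", ("b",), 1, "a"), ("X", ("b",), 1, "b")})
i7 = frozenset({("X", ("b",), 1, "$")})
merged = merge_by_core([i4, i7])
print(sorted(la for (_, _, _, la) in merged[0]))  # ['$', 'a', 'b']
```

Applied to I4 and I7 of the example that follows, this yields a single state X –> b•, a/b/$.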
Let’s examine the LR(1) configurating sets from an example given in the LR parsing handout.
S' –> S
S –> XX
X –> aX
X –> b
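The lookaheads in the configurating sets are built from FIRST sets. When closure adds the X items for S –> •XX, $, their lookahead is whatever can begin the string that follows the first X, namely FIRST(X $) = FIRST(X) = {a, b}; that is where the a/b on the X items of I0 comes from. A minimal Python sketch of the usual fixed-point computation of FIRST for this grammar is shown here. The dictionary encoding of the grammar and the name first_sets are illustrative choices, and the code assumes no production derives the empty string, which holds for this grammar.

```python
# The example grammar; S' is the augmented start symbol.
# Keys are nonterminals; each right-hand side is a tuple of symbols.
GRAMMAR = {
    "S'": [("S",)],
    "S": [("X", "X")],
    "X": [("a", "X"), ("b",)],
}

def first_sets(grammar):
    """Fixed-point computation of FIRST for each nonterminal.

    Assumes no epsilon productions, so FIRST of a right-hand side is
    simply FIRST of its leftmost symbol.
    """
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                sym = rhs[0]
                # A terminal contributes itself; a nonterminal its FIRST set.
                new = first[sym] if sym in grammar else {sym}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first

print(first_sets(GRAMMAR))
```

For this grammar every nonterminal ends up with FIRST = {a, b}.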
I0: S' –> •S, $
    S –> •XX, $
    X –> •aX, a/b
    X –> •b, a/b

I1: S' –> S•, $

I2: S –> X•X, $
    X –> •aX, $
    X –> •b, $

I3: X –> a•X, a/b
    X –> •aX, a/b
    X –> •b, a/b

I4: X –> b•, a/b

I5: S –> XX•, $

I6: X –> a•X, $
    X –> •aX, $
    X –> •b, $

I7: X –> b•, $

I8: X –> aX•, a/b

I9: X –> aX•, $
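These ten sets follow mechanically from the standard LR(1) closure and goto operations. The sketch below rebuilds them in Python under assumptions of our own: an item is a tuple (lhs, rhs, dot, lookahead), the FIRST sets are hard-coded for this grammar rather than computed, and the names closure, goto, and canonical_collection are illustrative. It is a teaching sketch, not a parser generator.

```python
# Example grammar; S' is the augmented start symbol.
GRAMMAR = {
    "S'": [("S",)],
    "S": [("X", "X")],
    "X": [("a", "X"), ("b",)],
}
# FIRST sets worked out by hand: every string derived from S or X
# begins with a or b, and no production derives the empty string.
FIRST = {"S'": {"a", "b"}, "S": {"a", "b"}, "X": {"a", "b"}}
SYMBOLS = ["S", "X", "a", "b"]

def closure(items):
    """For each [A -> alpha . B beta, la], add [B -> . gamma, t]
    for every terminal t in FIRST(beta la)."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            beta = rhs[dot + 1:]
            if not beta:
                lookaheads = {la}            # nothing follows B: inherit la
            elif beta[0] in GRAMMAR:
                lookaheads = FIRST[beta[0]]  # no epsilon, so beta[0] decides
            else:
                lookaheads = {beta[0]}       # beta starts with a terminal
            for gamma in GRAMMAR[rhs[dot]]:
                for t in lookaheads:
                    item = (rhs[dot], gamma, 0, t)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)

def goto(state, symbol):
    """Move the dot past `symbol` wherever possible, then take the closure."""
    moved = {(lhs, rhs, dot + 1, la)
             for (lhs, rhs, dot, la) in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved) if moved else None

def canonical_collection():
    """Build every LR(1) set reachable from [S' -> . S, $], in discovery order."""
    states = [closure({("S'", ("S",), 0, "$")})]
    i = 0
    while i < len(states):
        for sym in SYMBOLS:
            nxt = goto(states[i], sym)
            if nxt is not None and nxt not in states:
                states.append(nxt)
        i += 1
    return states

for n, state in enumerate(canonical_collection()):
    print(f"I{n}: {len(state)} items")
```

With the symbols tried in the order S, X, a, b, the ten sets come out in the same order as I0 through I9 above (each a/b lookahead counts as two separate items here).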
Notice that some of the LR(1) states look suspiciously similar. Take I3 and I6, for example. These two states are virtually identical: they have the same number of items, the core of each item is identical, and they differ only in their lookahead sets. This observation may make you wonder whether it is possible to merge them into one state. The same is true of I4 and I7, and of I8 and I9. If we did merge, we would end up replacing those six states with just these three: