Introduction to Compiler Design
Lexical Analysis II
Professor YiPing You, Department of Computer Science
http://www.cs.nctu.edu.tw/~ypyou/
Introduction to Compiler Design, Spring 2010

Outline
Finite Automata
  Nondeterministic Finite Automata (NFA)
  Deterministic Finite Automata (DFA)
Regular Expression to Automata
  Regular Expression to NFA
    Thompson’s Construction
  NFA to DFA
    Subset Construction
Efficiency of DFA and NFA
Implementation of Lexical Analyzer Generator
Optimization of DFA-Based Pattern Matcher

Review: Transition Diagram
Used in a lexical analyzer to recognize a token
  id → letter (letter | digit)*
[Figure: transition diagram for id. Start state 9 goes on letter to state 10; state 10 loops on letter or digit; on any other character state 10 goes to state 11 (marked *), which executes return(id)]

Finite (State) Automata
Similar to transition diagrams
Recognizers, which simply say “yes” or “no” about each possible input string
Finite automata
  A = (S, Σ, s0, F, T)
  S: all states in the FA
  Σ: all symbols accepted by the language
  s0: start state
  F: accepting states
  T: all transitions
    S × Σ → S (or S × (Σ ∪ {ε}) → S)
Why finite?

Finite (State) Automata (Cont’d)
Processing input string
  Starting from s0, for each input character, make a transition on the automaton
    If no transition is possible for the input character → reject
  When the input string is fully consumed
    If at a final state → accept
    Otherwise → reject
[Figure: input string s → FA representing RE → Accept/Reject]
If accept, then s ∈ L(RE)
  L(RE): the language defined by RE

Finite Automata State Graphs
[Figure: legend of the graph symbols: a state, the start state, an accepting state, and a transition labeled a from state A to state B, written A × a → B]

A Simple Example
A finite automaton that accepts only “1”
[Figure: start state with a transition on 1 to an accepting state]
A finite automaton accepts a string if we can follow transitions labeled with the characters in the string from the start state to some accepting state

Another Simple Example
A finite automaton accepting any number of 1’s followed by a single 0
Alphabet: {0,1}
[Figure: start state with a self-loop on 1 and a transition on 0 to an accepting state]

Epsilon Moves
Another kind of transition: ε-moves
[Figure: state A with an ε-transition to state B]
The machine can move from state A to state B without reading input

Two Kinds of FA
Nondeterministic Finite Automata (NFA)
  Can have multiple transitions for one input in a given state
  Can have ε-moves
Deterministic Finite Automata (DFA)
  One transition per input per state
  No ε-moves
NFAs and DFAs recognize the same set of languages (regular languages)

NFA and DFA
(a|b)*abb
[Figure: NFA with states 0 to 3 (3 accepting): state 0 loops on a and b, 0 →a 1, 1 →b 2, 2 →b 3]
[Figure: DFA with states 0 to 3 (3 accepting): 0 →a 1, 0 →b 0, 1 →a 1, 1 →b 2, 2 →a 1, 2 →b 3, 3 →a 1, 3 →b 0]

NFA vs DFA
DFA
  Action on each input is fully determined
    Each state has exactly |Σ| outgoing transitions
  Easier to implement
    There are no choices to consider
NFA
  May have choices at each step
    Accepts a string if there is ANY path to an accepting state
  More complex to implement
    May need to backtrack
    May end up exploring all the paths in the NFA

Acceptance of NFAs
An NFA can get into multiple states
Input: 1 0 1
Rule: NFA accepts if it can get into a final state
[Figure: NFA with transitions labeled 1, 0, 1, and 0]

NFA vs DFA (Cont’d)
For a given language, the NFA can be simpler than the DFA
[Figure: an NFA and an equivalent DFA over the alphabet {0,1}]
The DFA can be exponentially larger than the NFA

Transition Table
Rows are current states
Columns are input symbols (and ε)
Entries are resulting states
[Figure: NFA for (a|b)*abb with states 0 to 3]
  State   a       b     ε
  0       {0,1}   {0}   Ø
  1       Ø       {2}   Ø
  2       Ø       {3}   Ø
  3       Ø       Ø     Ø
Along with the table, a starting state and a set of accepting states are also given

Transition Table (Cont’d)
Pro
  We can easily find the transition for a given state and input
Con
  It takes a lot of space when the input alphabet is large, yet most states do not have any moves on most of the input symbols
A more compact representation of the transition table is discussed in Section 3.9.8

Simulating a DFA
(a|b)*abb
[Figure: DFA for (a|b)*abb with states 0 to 3]
s = s0;
c = nextChar();
while (c != eof) {
    s = move(s, c);
    c = nextChar();
}
if (s is in F) return “yes”;
else return “no”;
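The loop above can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: the names `MOVE`, `START`, `ACCEPTING`, and `simulate_dfa` are hypothetical, and the transition table is assumed to be the standard four-state DFA for (a|b)*abb.

```python
# Transition table of a four-state DFA for (a|b)*abb (assumed; state 3 accepts).
MOVE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
START = 0
ACCEPTING = {3}


def simulate_dfa(text):
    """Return True iff the DFA is in an accepting state after reading text."""
    s = START
    for c in text:
        if (s, c) not in MOVE:      # no transition possible: reject
            return False
        s = MOVE[(s, c)]            # s = move(s, c)
    return s in ACCEPTING           # "yes" iff s is in F


if __name__ == "__main__":
    for word in ["abb", "ababb", "ab", ""]:
        print(repr(word), "yes" if simulate_dfa(word) else "no")
```

Each input character costs one dictionary lookup, which is why DFA simulation runs in time proportional to the length of the input.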

Recognition of Regular Expression Using NFA
Simulating NFA
  Backtrack/Backup:
    remember the next alternative configuration (current input & next alternative state) when alternative choices are possible
  Parallelism:
    trace every possible alternative in parallel
  Lookahead:
    look at more input symbols to make the choice deterministic

Simulating an NFA
(a|b)*abb
[Figure: NFA for (a|b)*abb with states 0 to 3]
S = ε-closure(s0);
c = nextChar();
while (c != eof) {
    S = ε-closure(move(S, c));
    c = nextChar();
}
if (S ∩ F != Ø) return “yes”;
else return “no”;
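A possible Python rendering of this set-based ("parallel") simulation is sketched below. The dictionary `NFA` encodes the four-state NFA for (a|b)*abb from the transition table earlier in the deck; the names `EPS`, `eps_closure`, `move`, and `simulate_nfa` are illustrative choices, and this particular NFA happens to have no ε-transitions.

```python
EPS = None  # key used for ε-transitions (assumed encoding)

# state -> {symbol: set of next states}
NFA = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
START, ACCEPTING = 0, {3}


def eps_closure(states):
    """All states reachable from `states` using ε-transitions only."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in NFA[s].get(EPS, set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(states, c):
    """All states reachable from `states` by one transition on symbol c."""
    return {t for s in states for t in NFA[s].get(c, set())}


def simulate_nfa(text):
    S = eps_closure({START})
    for c in text:
        S = eps_closure(move(S, c))
    return bool(S & ACCEPTING)      # "yes" iff S ∩ F is not empty
```

Tracking the whole set S avoids backtracking: every alternative path is followed at once, at the cost of touching up to all NFA states per input character.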

Regular Expressions to Finite Automata
High-level sketch
  Lexical Specification → Regular Expression
  Regular Expression → NFA: Thompson’s construction (3.7.4)
  NFA → DFA: Subset Construction (3.7.1)
  DFA → Table-driven Implementation of DFA

Regular Expressions to NFA
It’s possible to construct an NFA from a regular expression
Thompson’s construction algorithm
  Build the NFA inductively
  Define rules for each base RE
  Combine for more complex RE’s
[Figure: the NFA for an RE drawn as a box with start state i and accepting state f]

Review: How to Describe Tokens
Use regular expressions (REs) to describe programming language tokens!
A regular expression is defined inductively
  a      ordinary character stands for itself
  ε      empty string
  R|S    either R or S (alternation or union), where R, S = RE
  RS     R followed by S (concatenation)
  R*     concatenation of R zero or more times (Kleene closure)

Thompson’s Construction (1/3)
For expression ε
  [Figure: start state i with an ε-transition to accepting state f]
For any subexpression a in Σ
  [Figure: start state i with a transition on a to accepting state f]

Thompson’s Construction (2/3)
Suppose N(s) and N(t) are NFA’s for RE s and t, respectively
r = (s)
  L(r) = L(s)
  N(r) = N(s)
r = s|t (Alternation)
  [Figure: new start state i with ε-transitions to the start states of N(s) and N(t); the accepting states of N(s) and N(t) each have an ε-transition to the new accepting state f]

Thompson’s Construction (3/3)
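r = st (Concatenation)
  [Figure: start state i of N(s); the accepting state of N(s) is merged with the start state of N(t); f is the accepting state of N(t)]
r = s* (Kleene closure)
  [Figure: new start state i with ε-transitions to the start state of N(s) and to the new accepting state f; the accepting state of N(s) has ε-transitions back to the start state of N(s) and to f]

Precedence of Operators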
Level of precedence
  Kleene closure (*), ?, + (highest level)
  concatenation
  alternation (|) (lowest level)
  Ex: a*b|cd* = ((a*)b)|(c(d*))
All operators are left associative
  Ex: a|b|c = (a|b)|c
      abc = (ab)c

RE to NFA: An Example (1/4)
Parse tree of (a|b)*abb
[Figure: parse tree with subexpressions r1 = a, r2 = b, r3 = r1|r2, r4 = (r3), r5 = r4*, r6 = a, r7 = r5 r6, r8 = b, r9 = r7 r8, r10 = b, r11 = r9 r10]

RE to NFA: An Example (2/4)
(a|b)*abb
a (r1)
  [Figure: 2 →a 3]
b (r2)
  [Figure: 4 →b 5]
(a|b) (r4)
  [Figure: 1 →ε 2, 1 →ε 4, 2 →a 3, 4 →b 5, 3 →ε 6, 5 →ε 6]

RE to NFA: An Example (3/4)
(a|b)* (r5)
  [Figure: the NFA for r4 plus 0 →ε 1, 0 →ε 7, 6 →ε 1, 6 →ε 7]
(a|b)*a (r7)
  [Figure: the NFA for r5 plus 7 →a 8]

RE to NFA: An Example (4/4)
(a|b)*abb (r11)
[Figure: NFA with states 0 to 10 (10 accepting): 0 →ε 1, 0 →ε 7, 1 →ε 2, 1 →ε 4, 2 →a 3, 4 →b 5, 3 →ε 6, 5 →ε 6, 6 →ε 1, 6 →ε 7, 7 →a 8, 8 →b 9, 9 →b 10]

Properties of the Constructed NFA
N(r) has at most twice as many states as there are operators and operands in r
N(r) has
  one start state
    No incoming transitions
  one accepting state
    No outgoing transitions
Each state of N(r) other than the accepting state has either
  One outgoing transition on a symbol in Σ ∪ {ε}, or
  Two outgoing transitions, both on ε

Regular Expressions to Finite Automata
High-level sketch
  Lexical Specification → Regular Expression
  Regular Expression → NFA: Thompson’s construction
  NFA → DFA: Subset Construction
  DFA → Table-driven Implementation of DFA
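The Regular Expression → NFA step can be illustrated with a small Python sketch of Thompson's construction. It assumes the regular expression has already been parsed, so the NFA is built by calling one combinator per operator; the class name `Thompson`, its methods, and the helper functions are hypothetical names for this illustration, and state numbers will differ from those in the slides' figures.

```python
from itertools import count


class Thompson:
    """Builds NFA fragments (start, accept); label None stands for ε."""

    def __init__(self):
        self.edges = {}                 # state -> list of (label, target)
        self._ids = count()

    def _new(self):
        s = next(self._ids)
        self.edges[s] = []
        return s

    def symbol(self, a):                # i --a--> f
        i, f = self._new(), self._new()
        self.edges[i].append((a, f))
        return i, f

    def union(self, s, t):              # r = s|t
        i, f = self._new(), self._new()
        self.edges[i] += [(None, s[0]), (None, t[0])]
        self.edges[s[1]].append((None, f))
        self.edges[t[1]].append((None, f))
        return i, f

    def concat(self, s, t):             # r = st: merge accept(s) with start(t)
        self.edges[s[1]] = self.edges.pop(t[0])
        return s[0], t[1]

    def star(self, s):                  # r = s*
        i, f = self._new(), self._new()
        self.edges[i] += [(None, s[0]), (None, f)]
        self.edges[s[1]] += [(None, s[0]), (None, f)]
        return i, f


def closure(edges, states):
    stack, seen = list(states), set(states)
    while stack:
        for label, t in edges[stack.pop()]:
            if label is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def accepts(edges, start, final, text):
    cur = closure(edges, {start})
    for c in text:
        cur = closure(edges, {t for s in cur for lab, t in edges[s] if lab == c})
    return final in cur


# (a|b)*abb built bottom-up, mirroring its parse tree
builder = Thompson()
frag = builder.star(builder.union(builder.symbol("a"), builder.symbol("b")))
for ch in "abb":
    frag = builder.concat(frag, builder.symbol(ch))
start, final = frag
```

The merge in `concat` relies on two properties of the constructed NFA: the accepting state of a fragment has no outgoing transitions, and its start state has no incoming ones.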

NFA to DFA: The Basic Concept
Merge NFA states that are connected by ε-transitions
[Figure: NFA states 1 and 2, joined by an ε-transition, merge into DFA state 12; in a second NFA, states 1, 2, and 4 merge into DFA state 124, which goes on a to DFA state 34]

Operations on NFA States
ε-closure(s)
  The set of NFA states reachable from NFA state s on ε-transitions alone
ε-closure(T)
  The set of NFA states reachable from some NFA state s in set T on ε-transitions alone
  ε-closure(T) = ∪ (s in T) ε-closure(s)
move(T, a)
  The set of NFA states to which there is a transition on input symbol a from some state s in set T
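These two operations can be sketched in a few lines of Python. The edge list `EDGES` is assumed to be the eleven-state NFA that Thompson's construction yields for (a|b)*abb (states 0 to 10), with `None` standing for ε; the function names are illustrative.

```python
# (source, label, target); label None stands for ε
EDGES = [
    (0, None, 1), (0, None, 7), (1, None, 2), (1, None, 4),
    (2, "a", 3), (4, "b", 5), (3, None, 6), (5, None, 6),
    (6, None, 1), (6, None, 7), (7, "a", 8), (8, "b", 9), (9, "b", 10),
]


def move(T, a):
    """States reachable from some state in T by one transition labeled a."""
    return {dst for src, label, dst in EDGES if src in T and label == a}


def eps_closure(T):
    """States reachable from some state in T on ε-transitions alone."""
    closure, frontier = set(T), set(T)
    while frontier:
        frontier = move(frontier, None) - closure   # newly reached states only
        closure |= frontier
    return closure


A = eps_closure({0})
B = eps_closure(move(A, "a"))
print(sorted(A))   # [0, 1, 2, 4, 7]
print(sorted(B))   # [1, 2, 3, 4, 6, 7, 8]
```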

Converting NFA to DFA
Subset construction
  Construct the initial state of the DFA
    By finding the ε-closure of the initial state of the NFA
  From a state T in the DFA, for each input character a
    Find the ε-closure of the set of states in move(T, a), say U
    Make U a state in the DFA if it is not there yet
      – If U contains at least one final state of the NFA, then mark it as a final state in the DFA
    Make T × a → U a transition in the DFA
  Repeat the step above for all states in the DFA that have not been processed yet (use a stack to keep track of them)

The Algorithm of Subset Construction
add ε-closure(s0) as an unmarked state to Dstates;
while (there is an unmarked state T in Dstates) {
    mark T;
    for (each input symbol a) {
        U = ε-closure(move(T, a));
        if (U is not in Dstates) {
            add U as an unmarked state to Dstates;
            mark as final if U contains the original final state;
        }
        Dtran[T, a] = U;
    }
}
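A runnable Python sketch of this algorithm follows, applied to the eleven-state NFA for (a|b)*abb (assumed here as the edge list `EDGES`). All names are illustrative. One deliberate deviation from the slides: unmarked states are kept in a queue rather than a stack, so that DFA states are named A, B, C, … in discovery order.

```python
from collections import deque

EDGES = [  # (source, label, target); label None stands for ε
    (0, None, 1), (0, None, 7), (1, None, 2), (1, None, 4),
    (2, "a", 3), (4, "b", 5), (3, None, 6), (5, None, 6),
    (6, None, 1), (6, None, 7), (7, "a", 8), (8, "b", 9), (9, "b", 10),
]
NFA_START, NFA_FINAL, ALPHABET = 0, 10, "ab"


def move(T, a):
    return {dst for src, label, dst in EDGES if src in T and label == a}


def eps_closure(T):
    closure, frontier = set(T), set(T)
    while frontier:
        frontier = move(frontier, None) - closure
        closure |= frontier
    return closure


def subset_construction():
    start = frozenset(eps_closure({NFA_START}))
    dstates = [start]                 # every DFA state found so far
    unmarked = deque([start])         # states whose transitions are pending
    dtran = {}
    while unmarked:
        T = unmarked.popleft()        # "mark T"
        for a in ALPHABET:
            U = frozenset(eps_closure(move(T, a)))
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    names = {T: chr(ord("A") + i) for i, T in enumerate(dstates)}
    table = {(names[T], a): names[U] for (T, a), U in dtran.items()}
    finals = {names[T] for T in dstates if NFA_FINAL in T}
    return names, table, finals


names, table, finals = subset_construction()
for (state, symbol), target in sorted(table.items()):
    print(f"Dtran[{state}, {symbol}] = {target}")
```

In this example every DFA state has a move on both symbols; for an NFA where some `move(T, a)` is empty, the sketch would add the empty set as an explicit dead state.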

NFA to DFA: An Example (1/4)
(a|b)*abb
[Figure: NFA for (a|b)*abb with states 0 to 10]
add ε-closure(s0) as an unmarked state to Dstates;
while (there is an unmarked state T in Dstates) {
    mark T;
ε-closure({0}) = {0, 1, 2, 4, 7} = A
Dstates: A

NFA to DFA: An Example (2/4)
[Figure: NFA for (a|b)*abb with states 0 to 10]
for (each input symbol a) {
    U = ε-closure(move(T, a));
    if (U is not in Dstates) {
        add U as an unmarked state to Dstates;
        mark as final if U contains the original final state;
    }
    Dtran[T, a] = U;
move(A, a) = move({0, 1, 2, 4, 7}, a) = {3, 8}
ε-closure(move(A, a)) = ε-closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8} = B
Dtran[A, a] = B
Dstates: A, B

NFA to DFA: An Example (3/4)
[Figure: NFA for (a|b)*abb with states 0 to 10]
move(A, b) = move({0, 1, 2, 4, 7}, b) = {5}
ε-closure(move(A, b)) = ε-closure({5}) = {1, 2, 4, 5, 6, 7} = C
Dtran[A, b] = C
move(B, a) = move({1, 2, 3, 4, 6, 7, 8}, a) = {3, 8}
ε-closure(move(B, a)) = ε-closure({3, 8}) = B
Dtran[B, a] = B
move(B, b) = move({1, 2, 3, 4, 6, 7, 8}, b) = {5, 9}
ε-closure(move(B, b)) = ε-closure({5, 9}) = {1, 2, 4, 5, 6, 7, 9} = D
Dtran[B, b] = D
Dstates: A, B, C, D

NFA to DFA: An Example (4/4)
Transition table Dtran
  NFA State            DFA State   a   b
  {0,1,2,4,7}          A           B   C
  {1,2,3,4,6,7,8}      B           B   D
  {1,2,4,5,6,7}        C           B   C
  {1,2,4,5,6,7,9}      D           B   E
  {1,2,4,5,6,7,10}     E           B   C
[Figure: DFA with states A to E (E accepting) and the transitions listed in Dtran]

Properties of the Converted DFA
Each state of the constructed DFA corresponds to a set of NFA states
It is possible that the number of DFA states is exponential in the number of NFA states
However, for real languages, the NFA and the DFA have approximately the same number of states

Observations on DFA’s
Many DFA’s recognize the same language
  E.g., both of the following DFA’s recognize language L((a|b)*abb)
[Figure: left, the four-state DFA with states 0 to 3; right, the five-state DFA with states A to E]
We would generally prefer a DFA with as few states as possible
  E.g., states A and C of the right figure are equivalent states

Regular Expressions to Finite Automata
High-level sketch
  Lexical Specification → Regular Expression
  Regular Expression → NFA: Thompson’s construction
  NFA → DFA: Subset Construction
  DFA → Minimized DFA (Section 3.9.6)
  Minimized DFA → Table-driven Implementation of DFA

DFA Minimization: An Example
Before merging
  DFA State   a   b
  A           B   C
  B           B   D
  C           B   C
  D           B   E
  E           B   C
Merge A and C
  DFA State   a   b
  A           B   A
  B           B   D
  D           B   E
  E           B   A
[Figures: the five-state DFA before merging and the four-state DFA after merging]

DFA Minimization: Another Example
The previous method does not always yield the minimal DFA
[Figure: a five-state DFA with states 0 to 4 and its transition table]
Cannot merge further? Actually, the DFA can be minimized further
[Figure: an equivalent DFA with states 0 to 3]

DFA Minimization
How to identify states that can be merged?
  Final states and nonfinal states can never be merged
  Starting from states s and t, for all strings x
    If the acceptance decision is always the same, then s and t are indistinguishable (equivalent)

Distinguishability between States
String x distinguishes state s from state t if exactly one of the states reached from s and t by following the path with label x is an accepting state
State s is distinguishable from state t if there is some string that distinguishes them
  E.g., the empty string distinguishes any accepting state from any nonaccepting state
  E.g., string bb distinguishes state A from state B
[Figure: the five-state DFA with states A to E]

DFA Minimization
Method
  Initialization:
    Divide the states into two groups:
      – final states
      – nonfinal states
  Division within a group G
    If, for every input symbol a, two states s and t in G have transitions on a to the same group, then s and t stay in the same group
    Otherwise, divide G and put s and t into different groups
  Repeat the division until the grouping no longer changes
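The method above can be sketched as partition refinement in Python. The table `DTRAN` is the five-state DFA for (a|b)*abb produced by subset construction; `minimize` and the other names are hypothetical, and the sketch assumes a complete DFA (every state has a transition on every symbol).

```python
DTRAN = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
FINALS = {"E"}
ALPHABET = "ab"


def minimize(dtran, finals):
    """Return the final grouping of states as a list of sets."""
    # Initialization: nonfinal states and final states
    groups = [g for g in (set(dtran) - set(finals), set(finals)) if g]
    while True:
        group_of = {s: i for i, g in enumerate(groups) for s in g}
        new_groups = []
        for g in groups:
            # Two states stay together only if, for every symbol, their
            # transitions lead into the same current group.
            buckets = {}
            for s in sorted(g):
                key = tuple(group_of[dtran[s][a]] for a in ALPHABET)
                buckets.setdefault(key, set()).add(s)
            new_groups.extend(buckets.values())
        if len(new_groups) == len(groups):   # no group was divided
            return new_groups
        groups = new_groups


print(sorted(sorted(g) for g in minimize(DTRAN, FINALS)))
# [['A', 'C'], ['B'], ['D'], ['E']]
```

Because a round can only split groups, comparing the number of groups before and after is enough to detect that the grouping has stopped changing.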

DFA Minimization: An Example
1. Final states: {E}
   Nonfinal states: {A,B,C,D}
2. Try to split {A,B,C,D}
   A,B,C,D all go to {A,B,C,D} on a
   A,B,C go to {A,B,C,D} on b
   D goes to {E} on b
   {A,B,C}, {D}, {E}
3. Try to split {A,B,C}
   A,B,C all go to {A,B,C} on a
   A,C go to {A,B,C} on b
   B goes to {D} on b
   {A,C}, {B}, {D}, {E}
4. A,C go to the same states on each input
[Figures: the five-state DFA with states A to E and the minimized four-state DFA]

Efficiency of String-Processing Algorithms
Algorithm 3.18 (DFA simulation): O(|x|)
Algorithm 3.22 (NFA simulation): O(|x| × size of graph)
One issue that may favor NFA
  Subset construction can exponentiate the number of states in the worst case
Reading: Section 3.7.3 (p. 157) and Section 3.7.5 (p. 163)
(|r| is the size of the regular expression, |x| is the length of the input string)
  Automaton            Initial            Per String
  NFA                  O(|r|)             O(|r| × |x|)
  DFA typical case     O(|r|^3)           O(|x|)
  DFA worst case       O(|r|^3 2^|r|)     O(|x|)

DFA or NFA? Which One?
If the string processor is to be executed many times, then the cost of converting to a DFA is worthwhile
In grep, it may be more efficient to skip the step of constructing a DFA and simulate the NFA directly
  Ex:
    grep “finite automata” file1
    grep “auto*” files

Design of a Lexical-Analyzer Generator
[Figure: a Lex program (P1 {action A1}, P2 {action A2}, …, Pn {action An}) is fed to the Lex compiler, which produces a transition table and actions; the automaton simulator uses them to scan the input buffer, where the pointers lexemeBegin and forward delimit the current lexeme]

Scanner using FA: Ambiguity Resolution (1/3)
Longest match
  Continue until no further transition is feasible for the next input
  Implies the need to “look ahead” before accepting
  E.g., for the following DFA, the input is abbbbccd
    Need to look ahead to determine when to accept
      After processing abbbb, need to determine whether to accept or to go to the next state
      If the following string is cca, then go for the longest match and accept at state 6
      If the following string is not cca, then accept at state 3
[Figure: DFA 1 →a 2 →b 3 →c 4 →c 5 →a 6 with a loop on b; the part after state 3 is labeled Lookahead]

Scanner using FA: Ambiguity Resolution (2/3)
First match
  Patterns are listed sequentially
  When there is a conflict, choose the pattern that is listed first
  E.g.
    P1: abc
    P2: abc*
    Input: abc
      – Can be P1 or P2
      – Using the first-match rule, it should be P1
  Should arrange the REs properly
    E.g., keyword REs should appear before the id RE
    In practice, there is no need to have a DFA with all keywords in it
      – Reducing the number of states to save space

Scanner using FA: Ambiguity Resolution (3/3)
Lookahead
  E.g., in Fortran, a keyword can also be an identifier
    IF (a = b) THEN …
    IF (I, J) = 3
    IF = 1.0
  Previous rules (longest match and first match) do not resolve this problem
  Solution: allow specification of a lookahead RE
    r1/r2
    E.g., IF / \( .* \) letter {return IF}
    Otherwise, return identifier

Pattern Matching Based on NFA’s
Combine all NFA’s into one
  Lex program
    P1 {action A1}
    P2 {action A2}
    …
    Pn {action An}
[Figure: new start state s0 with ε-transitions to N(p1), N(p2), …, N(pn)]

Pattern Matching (NFA): An Example
  a       {action A1 for pattern p1}
  abb     {action A2 for pattern p2}
  a*b+    {action A3 for pattern p3}
[Figure: combined NFA: 0 →ε 1, 0 →ε 3, 0 →ε 7, 1 →a 2, 3 →a 4, 4 →b 5, 5 →b 6, 7 →a 7, 7 →b 8, 8 →b 8; states 2, 6, and 8 accept p1, p2, and p3, respectively]

Pattern Matching (NFA): An Example
Input: aaba
[Figure: the combined NFA for patterns a, abb, and a*b+]
Processing sequence
  {0, 1, 3, 7} →a {2, 4, 7} →a {7} →b {8} →a none
  The last accepting state seen is 8 (pattern a*b+), reached after reading aab
  a*b+    {action A3 for pattern p3}
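The processing sequence above can be reproduced with a short Python sketch that simulates the combined NFA, remembers the last point at which an accepting state was seen (longest match), and breaks ties by listing order (first match). The encoding and the names `EDGES`, `ACCEPT`, and `longest_match` are assumptions made for this illustration.

```python
EDGES = [  # combined NFA for a, abb, a*b+; label None stands for ε
    (0, None, 1), (0, None, 3), (0, None, 7),
    (1, "a", 2),
    (3, "a", 4), (4, "b", 5), (5, "b", 6),
    (7, "a", 7), (7, "b", 8), (8, "b", 8),
]
# accepting NFA state -> (listing order, pattern); lower order wins ties
ACCEPT = {2: (1, "a"), 6: (2, "abb"), 8: (3, "a*b+")}


def move(T, a):
    return {dst for src, label, dst in EDGES if src in T and label == a}


def eps_closure(T):
    closure, frontier = set(T), set(T)
    while frontier:
        frontier = move(frontier, None) - closure
        closure |= frontier
    return closure


def longest_match(text):
    """Return (pattern, lexeme) for the longest matching prefix, or None."""
    S = eps_closure({0})
    best = None
    for i, c in enumerate(text):
        S = eps_closure(move(S, c))
        if not S:                       # no next state: stop scanning
            break
        hits = [ACCEPT[s] for s in S if s in ACCEPT]
        if hits:                        # remember the latest accepting point
            best = (min(hits)[1], text[: i + 1])
    return best


print(longest_match("aaba"))   # ('a*b+', 'aab')
print(longest_match("abb"))    # ('abb', 'abb')
```

A real scanner would then retract the input pointer to the end of the reported lexeme and restart from the start state.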

Pattern Matching Based on DFA’s
The NFA for all patterns is converted into an equivalent DFA
Simulate the DFA until at some point there is no next state (the next state is a dead state)
[Figure: DFA with states 0137 (start), 247 (accepts a), 7, 8 (accepts a*b+), 58 (accepts a*b+), and 68 (accepts abb); on input aaba the DFA visits 0137, 247, 7, 8 and then has no move on a]

Implementation of DFA’s
A DFA can be implemented by a 2D table T
  One dimension is “states”
  Other dimension is “input symbols”
  For every transition Si →a Sk, define T[i,a] = k
DFA “execution”
  If in state Si and input a, read T[i,a] = k and skip to state Sk
  Very efficient

Implementing the Lookahead Operator
IF / \( .* \) letter
[Figure: NFA 0 →I 1 →F 2 →ε (/) 3 →( 4 →) 5 →letter 6, with a loop labeled any]

Automatic Generation of Lexers
Two programs developed at Bell Labs in the mid-70’s for use with UNIX
  Lex: transforms an input stream into the alphabet of the grammar processed by yacc
    Flex = fast lex, later developed by Free Software Foundation
  Yacc: yet another compiler-compiler
Input to lexer generator
  List of regular expressions in priority order
  Associated action with each RE
Output
  Program that reads the input stream and breaks it up into tokens according to the REs

How Does Lex Work?
[Figure: Lex / Flex takes a Lex program (REs for tokens) as input, runs RE → NFA, NFA → DFA, and Optimize DFA, and outputs a C program; the generated DFA simulation turns a character stream into a token stream (and errors)]

Rules for Pattern Matching
REs alone are not enough; we need rules for choosing when we get multiple matches
  Longest matching token wins
  Ties in length are resolved by priorities
    Token specification order often defines priority
RE’s + priorities + longest matching token rule = definition of a lexer

Converting a RE Directly to a DFA
Sections 3.9.1-3.9.5
Method
  Construct a syntax tree T from the augmented regular expression (r)#
  Compute nullable, firstpos, lastpos, and followpos for T
  Construct Dstates and Dtran by Figure 3.62 (p. 180)

Syntax Tree Construction
Each leaf not labeled ε is assigned a unique integer, called its position
(a|b)*abb#
[Figure: syntax tree of cat-nodes (o) over the star-node for (a|b)* and the leaves a, b, b, #; positions: a = 1 and b = 2 under the or-node, then a = 3, b = 4, b = 5, # = 6]

Function Computation
n: a syntax-tree node
nullable(n)
  True iff n can generate ε
firstpos(n)
  A set of positions (corresponding to the symbols that can be the first symbol of a string that n can generate)
lastpos(n)
  A set of positions (corresponding to the symbols that can be the last symbol of a string that n can generate)
followpos(p)
  p: a position
  A set of positions (corresponding to the symbols that can follow position p)

Rules for Computing nullable, firstpos, and lastpos
  Node n: a leaf labeled ε
    nullable(n) = true
    firstpos(n) = ø
    lastpos(n) = ø
  Node n: a leaf with position i
    nullable(n) = false
    firstpos(n) = {i}
    lastpos(n) = {i}
  Node n: an or-node n = c1|c2
    nullable(n) = nullable(c1) or nullable(c2)
    firstpos(n) = firstpos(c1) ∪ firstpos(c2)
    lastpos(n) = lastpos(c1) ∪ lastpos(c2)
  Node n: a cat-node n = c1c2
    nullable(n) = nullable(c1) and nullable(c2)
    firstpos(n) = if (nullable(c1)) firstpos(c1) ∪ firstpos(c2) else firstpos(c1)
    lastpos(n) = if (nullable(c2)) lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
  Node n: a star-node n = c1*
    nullable(n) = true
    firstpos(n) = firstpos(c1)
    lastpos(n) = lastpos(c1)

Computing followpos
Two ways that a position of a regular expression can be made to follow another
  If n is a cat-node (with left child c1 and right child c2), then for every position i in lastpos(c1), all positions in firstpos(c2) are in followpos(i)
  If n is a star-node, and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i)

Continuing the Example
(a|b)*abb#
[Figure: syntax tree annotated with firstpos (left) and lastpos (right) of each node; the root has firstpos {1,2,3} and lastpos {6}]
  Position n   followpos(n)
  1            {1,2,3}
  2            {1,2,3}
  3            {4}
  4            {5}
  5            {6}
  6            ø

RE to DFA Algorithm (Figure 3.62)
s0 = firstpos(root), where root is the root of the syntax tree
Dstates = {s0}, and s0 is unmarked
while there is an unmarked state T in Dstates do
    mark T
    for each input symbol a ∈ Σ do
        let U be the set of positions that are in followpos(p) for some position p in T such that the symbol at position p is a
        if U is not empty and not in Dstates then
            add U as an unmarked state to Dstates
        end if
        Dtran[T, a] = U
    end do
end do
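A Python sketch of this algorithm for (a|b)*abb# is given below. It takes the followpos table and firstpos(root) from the running example as given inputs rather than computing them from the syntax tree; `SYMBOL_AT` (the symbol at each position) and `build_dfa` are names introduced here for illustration.

```python
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
FIRSTPOS_ROOT = {1, 2, 3}
ALPHABET = "ab"
END_POSITION = 6          # position of the end marker #


def build_dfa():
    start = frozenset(FIRSTPOS_ROOT)
    dstates, unmarked, dtran = [start], [start], {}
    while unmarked:
        T = unmarked.pop(0)                       # mark T
        for a in ALPHABET:
            # union of followpos(p) over positions p in T labeled a
            U = frozenset().union(
                *(FOLLOWPOS[p] for p in T if SYMBOL_AT[p] == a)
            )
            if U and U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    accepting = [T for T in dstates if END_POSITION in T]
    return dstates, dtran, accepting


dstates, dtran, accepting = build_dfa()
for T in dstates:
    print(sorted(T), {a: sorted(dtran[(T, a)]) for a in ALPHABET})
```

A DFA state is accepting exactly when it contains the position of #, which is why the regular expression is augmented before the tree is built.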

Continuing the Example
firstpos(root) = {1,2,3} = A
Dtran[A,a] = followpos(1) ∪ followpos(3) = {1,2,3,4} = B
Dtran[A,b] = followpos(2) = {1,2,3} = A
Dtran[B,a] = followpos(1) ∪ followpos(3) = B
Dtran[B,b] = followpos(2) ∪ followpos(4) = {1,2,3,5} = C
Dtran[C,a] = …
Dtran[C,b] = …
…
[Figure: resulting DFA with states 123 (start), 1234, 1235, and 1236]

Trading Time for Space in DFA Simulation
The simplest and fastest way to represent the transition function of a DFA is a two-dimensional table indexed by states and characters
  A typical lexical analyzer has several hundred states in its DFA and involves the ASCII alphabet of 128 input characters
    The array consumes less than a megabyte
Alternative:
  Represent each state by a list of transitions (character-state pairs) ended by a default state that is to be chosen for any input character not on the list

Trading Time for Space in DFA Simulation
A more compact representation for the transition table
Four arrays
  base: base location of entries for state s
  next: r is the next state if the check is valid
  check: checks whether t == s
  default: alternative location to use if the check array tells us that the one given by base[s] is invalid
[Figure: arrays default, base, next, and check; for state s and input a, entry base[s]+a of next holds r and the same entry of check holds t, while default[s] holds q]
int nextState(s, a) {
    if (check[base[s]+a] == s) return next[base[s]+a];
    else return nextState(default[s], a);
}

An Example
[Figure: transition diagram for id: state 9 →letter 10, state 10 loops on letter or digit, and state 10 →other 11 (marked *), which executes return(id)]
[Figure: default, base, next, and check arrays for this diagram; the entries for letters and digits hold 10]
Suppose state 10 is entered after seeing the letters th, and e is the next input character
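The four-array scheme can be tried out on a small case in Python. The arrays below are a hand-packed encoding, made up for this sketch, of the four-state DFA for (a|b)*abb: state 0 stores both of its entries, states 1 and 2 store only the entry on b that differs from state 0, and state 3 stores nothing and relies entirely on its default. The names `next_`, `next_state`, and `FULL` are illustrative.

```python
SYM = {"a": 0, "b": 1}          # column index of each input symbol

base    = [0, 1, 2, 3]          # base[s]: where the entries of state s start
default = [-1, 0, 0, 0]         # state to consult when the check fails
next_   = [1, 0, 2, 3, -1]      # packed next-state entries
check   = [0, 0, 1, 2, -1]      # owner of each packed entry (-1: unused)

# The uncompressed table, kept only to verify the packed arrays
FULL = [[1, 0], [1, 2], [1, 3], [1, 0]]


def next_state(s, a):
    """Look up the transition of state s on symbol index a."""
    slot = base[s] + a
    if check[slot] == s:                 # the entry really belongs to s
        return next_[slot]
    return next_state(default[s], a)     # otherwise fall back to the default


def run(text, start=0, accepting=(3,)):
    s = start
    for c in text:
        s = next_state(s, SYM[c])
    return s in accepting


print(all(next_state(s, a) == FULL[s][a] for s in range(4) for a in range(2)))
print(run("abb"), run("abab"))
```

The packed arrays hold five entries instead of the eight in the full table; the saving comes at the price of an extra check, and possibly a chain of default lookups, per transition.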