L3-Lexical2 - Introduction to Compiler Design Lexical...

Introduction to Compiler Design
Lexical Analysis II
Professor Yi-Ping You
Department of Computer Science
http://www.cs.nctu.edu.tw/~ypyou/
Introduction to Compiler Design, Spring 2010

Outline
- Finite Automata
  - Nondeterministic Finite Automata (NFA)
  - Deterministic Finite Automata (DFA)
- Regular Expression to Automata
  - Regular Expression to NFA: Thompson's Construction
  - NFA to DFA: Subset Construction
- Efficiency of DFA and NFA
- Implementation of Lexical Analyzer Generator
- Optimization of DFA-Based Pattern Matcher

Review: Transition Diagram
- Used in a lexical analyzer to recognize a token
- Example token: id → letter (letter | digit)*
  [Figure: start → state 9; 9 --letter--> 10; 10 loops on letter or digit; 10 --other--> 11, marked * (retract one character), with action return(id).]

Finite (State) Automata
- Similar to transition diagrams
- Recognizers: they simply say “yes” or “no” about each possible input string
- A finite automaton is a 5-tuple A = (S, Σ, s0, F, T):
  - S: all states in the FA
  - Σ: all symbols accepted by the language (the input alphabet)
  - s0: the start state
  - F: the accepting states
  - T: all transitions, S × Σ → S (for an NFA, S × (Σ ∪ {ε}) → sets of states)
- Why finite?
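As an illustration (not part of the original slides), the five-tuple can be written down directly. The Python sketch below assumes a dictionary for T, treats a missing entry as "no transition possible" (reject), and accepts only if the input ends in a state of F. It encodes the id transition diagram from the review slide, using the character classes letter, digit, and other as Σ; the names FiniteAutomaton, char_class, ID_AUTOMATON, and is_identifier are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple


@dataclass
class FiniteAutomaton:
    """A = (S, Σ, s0, F, T) with a deterministic (partial) transition map T."""
    states: FrozenSet[int]                    # S
    alphabet: FrozenSet[str]                  # Σ
    start: int                                # s0
    accepting: FrozenSet[int]                 # F
    transitions: Dict[Tuple[int, str], int]   # T: S × Σ → S

    def accepts(self, symbols: Iterable[str]) -> bool:
        state = self.start
        for sym in symbols:
            if (state, sym) not in self.transitions:
                return False                  # no transition possible -> reject
            state = self.transitions[(state, sym)]
        return state in self.accepting        # input consumed: accept iff in F


def char_class(c: str) -> str:
    """Hypothetical helper: map a raw character to the diagram's input symbols."""
    if c.isalpha():
        return "letter"
    if c.isdigit():
        return "digit"
    return "other"


# id -> letter (letter | digit)*, keeping the diagram's state numbers 9 and 10.
# State 11 and the retract step (*) are left out: this is a pure recognizer.
ID_AUTOMATON = FiniteAutomaton(
    states=frozenset({9, 10}),
    alphabet=frozenset({"letter", "digit", "other"}),
    start=9,
    accepting=frozenset({10}),
    transitions={(9, "letter"): 10, (10, "letter"): 10, (10, "digit"): 10},
)


def is_identifier(text: str) -> bool:
    return ID_AUTOMATON.accepts(char_class(c) for c in text)
```

For instance, is_identifier("x1") is True, while is_identifier("1x") is False because state 9 has no transition on a digit.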
Finite (State) Automata (Cont'd)
- Processing an input string
  - Starting from s0, for each input character, make a transition on the automaton
  - If no transition is possible for the input character → reject
  - When the input string is fully consumed:
    - If at a final state → accept
    - Otherwise → reject
- [Figure: input string s → FA representing RE → Accept/Reject]
- If the FA accepts s, then s ∈ L(RE), where L(RE) is the language defined by RE

Finite Automata State Graphs
- [Figure: graphical notation for a state, the start state, an accepting state, and a transition A --a--> B, i.e., A × a → B]

A Simple Example
- A finite automaton that accepts only “1”
  [Figure: start state --1--> accepting state]
- A finite automaton accepts a string if we can follow transitions labeled with the characters in the string from the start state to some accepting state

Another Simple Example
- A finite automaton accepting any number of 1's followed by a single 0
- Alphabet: {0,1}
  [Figure: the start state loops on 1 and moves to an accepting state on 0]

Epsilon Moves
- Another kind of transition: ε-moves, drawn A --ε--> B
- The machine can move from state A to state B without reading input

Two Kinds of FA
- Nondeterministic Finite Automata (NFA)
  - Can have multiple transitions for one input in a given state
  - Can have ε-moves
- Deterministic Finite Automata (DFA)
  - One transition per input per state
  - No ε-moves
- NFAs and DFAs recognize the same set of languages (the regular languages)

NFA and DFA for (a|b)*abb
- NFA: start state 0 loops on a and b; 0 --a--> 1 --b--> 2 --b--> 3, with 3 accepting
- DFA: start state 0, accepting state 3; 0 --a--> 1, 0 --b--> 0, 1 --a--> 1, 1 --b--> 2, 2 --a--> 1, 2 --b--> 3, 3 --a--> 1, 3 --b--> 0

NFA vs DFA
- DFA
  - The action on each input is fully determined
  - Each state has exactly |Σ| outgoing transitions
  - Easier to implement: there are no choices to consider
- NFA
  - May have choices at each step
  - Accepts a string if there is ANY path to an accepting state
  - More complex to implement
    - May need to backtrack
    - May end up exploring all the paths in the NFA

Acceptance of NFAs
- An NFA can get into multiple states
- Input: 1 0 1
- Rule: the NFA accepts if it can get into a final state
  [Figure: an NFA over {0,1} traced on the input 1 0 1]

NFA vs DFA (Cont'd)
- For a given language, the NFA can be simpler than the DFA
  [Figure: an NFA and a DFA over {0,1} for the same language; the DFA has more states]
- A DFA can be exponentially larger than an equivalent NFA

Transition Table
- Rows are states
- Columns are input symbols (and ε)
- Entries are the resulting sets of states
- Transition table of the (a|b)*abb NFA:

  State | a      | b   | ε
  0     | {0, 1} | {0} | Ø
  1     | Ø      | {2} | Ø
  2     | Ø      | {3} | Ø
  3     | Ø      | Ø   | Ø

- Along with the table, a start state and a set of accepting states are also given

Transition Table (Cont'd)
- Pro: we can easily find the transitions for a given state and input
- Con: it takes a lot of space when the input alphabet is large, yet most states do not have any moves on most of the input symbols
- A more compact representation of the transition table is discussed in Section 3.9.8

Simulating a DFA
- Example: the DFA for (a|b)*abb

    s = s0;
    c = nextChar();
    while (c != eof) {
        s = move(s, c);
        c = nextChar();
    }
    if (s is in F) return “yes”;
    else return “no”;

Recognition of Regular Expressions Using an NFA
- Ways to simulate an NFA:
  - Backtracking/backup: remember the next alternative configuration (current input and next alternative state) whenever alternative choices are possible
  - Parallelism: trace every possible alternative in parallel
  - Look-ahead: look at more input symbols to make the choice deterministic

Simulating an NFA
- Example: the NFA for (a|b)*abb

    S = ε-closure(s0);
    c = nextChar();
    while (c != eof) {
        S = ε-closure(move(S, c));
        c = nextChar();
    }
    if (S ∩ F != Ø) return “yes”;
    else return “no”;

Outline: Regular Expression to Automata

Regular Expressions to Finite Automata
- High-level sketch:
  Lexical Specification → Regular Expression → NFA (Thompson's construction, Section 3.7.4) → DFA (subset construction, Section 3.7.1) → Table-driven Implementation of DFA

Regular Expressions to NFA
- It is possible to construct an NFA from a regular expression
- Thompson's construction algorithm
  - Build the NFA inductively
  - Define rules for each base RE
  - Combine them for more complex REs
  [Figure: the NFA for an RE, drawn as a box with start state i and accepting state f]

Review: How to Describe Tokens
- Use regular expressions (REs) to describe programming-language tokens!
- A regular expression is defined inductively:
  a       an ordinary character, which stands for itself
  ε       the empty string
  R|S     either R or S (alternation, or union), where R and S are REs
  RS      R followed by S (concatenation)
  R*      concatenation of R zero or more times (Kleene closure)

Thompson's Construction (1/3)
- For the expression ε: start → i --ε--> f
- For any subexpression a in Σ: start → i --a--> f

Thompson's Construction (2/3)
- Suppose N(s) and N(t) are NFAs for the REs s and t, respectively
- r = (s): L(r) = L(s), and N(r) = N(s)
- r = s|t (alternation): a new start state i has ε-moves to the start states of N(s) and N(t), and the accepting states of N(s) and N(t) have ε-moves to a new accepting state f

Thompson's Construction (3/3)
- r = st (concatenation): start → i → N(s) → N(t) → f, where the accepting state of N(s) and the start state of N(t) are merged into one state
- r = s* (Kleene closure): new states i and f, with ε-moves from i to the start of N(s), from i to f, from the accepting state of N(s) back to the start of N(s), and from the accepting state of N(s) to f

Precedence of Operators
- Levels of precedence:
  - Kleene closure (*), ?, + (highest level)
  - concatenation
  - alternation (|) (lowest level)
- Ex: a*b|cd* = ((a*)b)|(c(d*))
- All operators are left associative
- Ex: abc = (ab)c, and a|b|c = (a|b)|c

RE to NFA: An Example (1/4)
- Parse tree of (a|b)*abb:
  r1 = a, r2 = b, r3 = r1|r2, r4 = (r3), r5 = r4*, r6 = a, r7 = r5 r6, r8 = b, r9 = r7 r8, r10 = b, r11 = r9 r10

RE to NFA: An Example (2/4)
- N(r1) for a: start 2 --a--> 3
- N(r2) for b: start 4 --b--> 5
- N(r3) = N(r4) for (a|b): start 1; ε-moves 1 → 2 and 1 → 4; 2 --a--> 3 and 4 --b--> 5; ε-moves 3 → 6 and 5 → 6

RE to NFA: An Example (3/4)
- N(r5) for (a|b)*: new start state 0 and new accepting state 7; ε-moves 0 → 1, 0 → 7, 6 → 1, and 6 → 7
- N(r7) for (a|b)*a: add 7 --a--> 8

RE to NFA: An Example (4/4)
- N(r11) for (a|b)*abb: add 8 --b--> 9 --b--> 10; state 10 is the accepting state

Properties of the Constructed NFA
- N(r) has at most twice as many states as there are operators and operands in r
- N(r) has one start state, with no incoming transitions, and one accepting state, with no outgoing transitions
- Each state of N(r) other than the accepting state has either
  - one outgoing transition on a symbol in Σ ∪ {ε}, or
  - two outgoing transitions, both on ε

NFA to DFA: The Basic Concept
- Merge the NFA's multiple states that are connected by ε-transitions
  [Figure: NFA states linked by ε-moves collapse into one DFA state, e.g., 1 --ε--> 2 becomes the DFA start state 12; likewise states 1, 2, and 4 become the DFA state 124, which moves on a to the DFA state 34]

Operations on NFA States
- ε-closure(s): the set of NFA states reachable from NFA state s on ε-transitions alone
- ε-closure(T): the set of NFA states reachable from some NFA state s in set T on ε-transitions alone
  - ε-closure(T) = ∪ of ε-closure(s) over all s in T
- move(T, a): the set of NFA states to which there is a transition on input symbol a from some state s in set T

Converting NFA to DFA
- Subset construction
  - Construct the initial state of the DFA by finding the ε-closure of the initial state of the NFA
  - From a state T in the DFA, for each input character a:
    - Find the ε-closure of the set of states in move(T, a); call it U
    - Make U a state in the DFA if it is not there yet
      - If U contains at least one final state of the NFA, mark it as a final state of the DFA
    - Make T × a → U a transition in the DFA
  - Repeat the step above for all DFA states that have not been processed yet (use a stack to keep track of them)

The Algorithm of Subset Construction

    add ε-closure(s0) as an unmarked state to Dstates;
    while (there is an unmarked state T in Dstates) {
        mark T;
        for (each input symbol a) {
            U = ε-closure(move(T, a));
            if (U is not in Dstates) {
                add U as an unmarked state to Dstates;
                mark U as final if U contains an original final state;
            }
            Dtran[T, a] = U;
        }
    }

NFA to DFA: An Example (1/4)
- Input: the NFA for (a|b)*abb constructed above (states 0 to 10, start state 0, accepting state 10)
- ε-closure({0}) = {0, 1, 2, 4, 7} = A
- Dstates: A

NFA to DFA: An Example (2/4)
- move(A, a) = move({0, 1, 2, 4, 7}, a) = {3, 8}
- ε-closure(move(A, a)) = ε-closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8} = B
- Dtran[A, a] = B
- Dstates: A, B

NFA to DFA: An Example (3/4)
- move(A, b) = move({0, 1, 2, 4, 7}, b) = {5}
- ε-closure(move(A, b)) = ε-closure({5}) = {1, 2, 4, 5, 6, 7} = C
- Dtran[A, b] = C
- move(B, a) = move({1, 2, 3, 4, 6, 7, 8}, a) = {3, 8}
- ε-closure(move(B, a)) = ε-closure({3, 8}) = B
- Dtran[B, a] = B
- move(B, b) = move({1, 2, 3, 4, 6, 7, 8}, b) = {5, 9}
- ε-closure(move(B, b)) = ε-closure({5, 9}) = {1, 2, 4, 5, 6, 7, 9} = D
- Dtran[B, b] = D
- Dstates: A, B, C, D

NFA to DFA: An Example (4/4)
- Transition table Dtran:

  NFA State          | DFA State | a | b
  {0,1,2,4,7}        | A         | B | C
  {1,2,3,4,6,7,8}    | B         | B | D
  {1,2,4,5,6,7}      | C         | B | C
  {1,2,4,5,6,7,9}    | D         | B | E
  {1,2,4,5,6,7,10}   | E         | B | C

- Start state: A; accepting state: E (the only DFA state containing NFA state 10)
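Thompson's construction and the NFA-simulation loop (S = ε-closure(move(S, c))) can be sketched together in Python. This is a minimal sketch, assuming each NFA fragment is a dictionary from a state to its outgoing (symbol, target) edges, with None standing for ε; the helper names (symbol, alt, concat, star, eps_closure, move, nfa_accepts, ABB) are hypothetical. Concatenation merges the accepting state of N(s) with the start state of N(t), as in the (3/3) rule, so (a|b)*abb comes out with the same 11 states as the worked example, although the state numbers differ.

```python
import itertools

EPS = None                      # label used for ε-moves
_ids = itertools.count()        # global supply of fresh state numbers


class NFA:
    """A Thompson NFA fragment with one start and one accepting state."""

    def __init__(self, start, accept, edges):
        self.start, self.accept = start, accept
        self.edges = edges      # state -> list of (symbol or EPS, target)


def _fresh(edges):
    """Allocate a new state with no outgoing edges yet."""
    s = next(_ids)
    edges[s] = []
    return s


def _copy_edges(*fragments):
    """Copy the edge lists so that building r never mutates N(s) or N(t)."""
    return {q: list(out) for f in fragments for q, out in f.edges.items()}


def symbol(a):
    """Base rules: i --a--> f (pass EPS for the expression ε)."""
    edges = {}
    i, f = _fresh(edges), _fresh(edges)
    edges[i].append((a, f))
    return NFA(i, f, edges)


def alt(s, t):
    """r = s|t: new i and f joined to N(s) and N(t) by ε-moves."""
    edges = _copy_edges(s, t)
    i, f = _fresh(edges), _fresh(edges)
    edges[i] += [(EPS, s.start), (EPS, t.start)]
    edges[s.accept].append((EPS, f))
    edges[t.accept].append((EPS, f))
    return NFA(i, f, edges)


def concat(s, t):
    """r = st: merge N(s)'s accepting state with N(t)'s start state."""
    edges = _copy_edges(s, t)
    # t.start has no incoming edges and s.accept has no outgoing ones,
    # so giving t.start's edges to s.accept merges the two states.
    edges[s.accept] = edges.pop(t.start)
    return NFA(s.start, t.accept, edges)


def star(s):
    """r = s*: new i and f; ε-moves i->N(s), i->f, N(s) accept->N(s) start, ->f."""
    edges = _copy_edges(s)
    i, f = _fresh(edges), _fresh(edges)
    edges[i] += [(EPS, s.start), (EPS, f)]
    edges[s.accept] += [(EPS, s.start), (EPS, f)]
    return NFA(i, f, edges)


def eps_closure(nfa, states):
    """States reachable from `states` on ε-moves alone."""
    stack, closure = list(states), set(states)
    while stack:
        for sym, r in nfa.edges[stack.pop()]:
            if sym is EPS and r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def move(nfa, states, a):
    """States reachable from `states` by one transition on symbol a."""
    return {r for q in states for sym, r in nfa.edges[q] if sym == a}


def nfa_accepts(nfa, text):
    S = eps_closure(nfa, {nfa.start})
    for c in text:
        S = eps_closure(nfa, move(nfa, S, c))
    return nfa.accept in S


# (a|b)*abb, built bottom-up along the parse tree r1 ... r11
ABB = concat(concat(concat(star(alt(symbol("a"), symbol("b"))), symbol("a")),
                    symbol("b")), symbol("b"))
```

For example, nfa_accepts(ABB, "babb") is True and nfa_accepts(ABB, "abba") is False. The merge in concat is safe only because of the properties listed under Properties of the Constructed NFA: no start state has incoming edges and no accepting state has outgoing ones.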
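The subset construction can be checked against the Dtran table above with a short Python sketch. It hard-codes the 11-state NFA for (a|b)*abb from the worked example (ε written as None), follows the algorithm's marking loop with an explicit stack, and names DFA states A, B, C, and so on in order of discovery; the names NFA_EDGES, subset_construction, NAME, and dfa_accepts are hypothetical. The last function replays the loop from Simulating a DFA on the result.

```python
EPS = None  # label for ε-moves

# NFA for (a|b)*abb from the worked example: start state 0, accepting state 10.
NFA_EDGES = {
    0: [(EPS, 1), (EPS, 7)],
    1: [(EPS, 2), (EPS, 4)],
    2: [("a", 3)],
    3: [(EPS, 6)],
    4: [("b", 5)],
    5: [(EPS, 6)],
    6: [(EPS, 1), (EPS, 7)],
    7: [("a", 8)],
    8: [("b", 9)],
    9: [("b", 10)],
    10: [],
}
NFA_START, NFA_FINAL, ALPHABET = 0, {10}, ("a", "b")


def eps_closure(T):
    """NFA states reachable from the set T on ε-moves alone."""
    stack, closure = list(T), set(T)
    while stack:
        for sym, t in NFA_EDGES[stack.pop()]:
            if sym is EPS and t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(T, a):
    """NFA states reachable from T by one transition on symbol a."""
    return {t for s in T for sym, t in NFA_EDGES[s] if sym == a}


def subset_construction():
    start = eps_closure({NFA_START})
    dstates = [start]            # every DFA state, in discovery order
    unmarked = [start]           # stack of DFA states not yet processed
    dtran = {}
    while unmarked:
        T = unmarked.pop()       # "mark T"
        for a in ALPHABET:
            U = eps_closure(move(T, a))   # an empty U would act as a dead state
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[T, a] = U
    accepting = {T for T in dstates if T & NFA_FINAL}
    return start, dstates, dtran, accepting


START, DSTATES, DTRAN, ACCEPTING = subset_construction()
NAME = {T: chr(ord("A") + i) for i, T in enumerate(DSTATES)}


def dfa_accepts(text):
    """DFA simulation: s = move(s, c) for each input character."""
    s = START
    for c in text:
        if (s, c) not in DTRAN:
            return False         # character outside the alphabet
        s = DTRAN[s, c]
    return s in ACCEPTING
```

Reading the result back through NAME gives the rows A: (B, C), B: (B, D), C: (B, C), D: (B, E), and E: (B, C) for inputs (a, b), with E as the only accepting state, which matches Dtran.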
Properties of the Converted DFA
- Each state of the constructed DFA corresponds to a set of NFA states
- The number of DFA states can be exponential in the number of NFA states
- However, for real languages, the NFA and the DFA have approximately the same number of states

Observations on DFAs
- Many DFAs recognize the same language
  - E.g., both of the following DFAs recognize the language L((a|b)*abb)
  [Figure, left: the four-state DFA with states 0, 1, 2, 3 shown earlier. Right: the five-state DFA with states A to E produced by the subset construction.]
- We would generally prefer a DFA with as few states as possible
  - E.g., states A and C of the right-hand DFA are equivalent

Regular Expressions to Finite Automata
- High-level sketch, now including minimization:
  Lexical Specification → Regular Expression → NFA (Thompson's construction) → DFA (subset construction) → Minimized DFA (Section 3.9.6) → Table-driven Implementation of DFA

DFA Minimization: An Example
- In Dtran, A and C have identical rows (on a → B, on b → C), so merge A and C; every transition into C now goes to A:

  DFA State | a | b
  A         | B | A
  B         | B | D
  D         | B | E
  E         | B | A

DFA Minimization: Another Example
- The previous method does not always produce the minimal DFA
  [Figure and table: a five-state DFA (states 0 to 4) in which the previous method cannot merge any states ("Cannot merge further?"); it can actually be minimized further, to four states (0 to 3).]

DFA Minimization
- How do we identify states that can be merged?
  - Final states and non-final states can never be merged
  - Starting from states s and t, consider all strings x: if the acceptance decision is always the same, then s and t are indistinguishable (equivalent)

Distinguishability between States
- String x distinguishes state s from state t if exactly one of the states reached from s and t by following the path labeled x is an accepting state
- State s is distinguishable from state t if some string distinguishes them
- E.g., the empty string distinguishes any accepting state from any non-accepting state
- E.g., the string bb distinguishes state A from state B: from A, bb leads to C (non-accepting), while from B it leads to D and then E (accepting)

DFA Minimization Method
- Initialization: divide the states into two groups:
  - final states
  - non-final states
- Division within a group G:
  - If, for every input symbol a, two states s and t in G have transitions on a into the same group, then s and t stay in the same group
  - Otherwise, divide G, putting s and t into different groups
- Repeat the division until the grouping no longer changes

DFA Minimization: An Example
1. Final states: {E}; non-final states: {A, B, C, D}
2. Try to split {A, B, C, D}:
   - A, B, C, D all go to {A, B, C, D} on a
   - A, B, C go to {A, B, C, D} on b
   - D goes to {E} on b
   → {A, B, C}, {D}, {E}
3. Try to split {A, B, C}:
   - A, B, C all go to {A, B, C} on a
   - A, C go to {A, B, C} on b
   - B goes to {D} on b
   → {A, C}, {B}, {D}, {E}
4. A and C go to the same states on each input, so {A, C} cannot be split; the final grouping is {A, C}, {B}, {D}, {E}

Outline: Efficiency of DFA and NFA

Efficiency of String-Processing Algorithms
- Algorithm 3.18 (DFA simulation): O(|x|)
- Algorithm 3.22 (NFA simulation): O(|x| × size of graph)
- One issue that may favor the NFA: in the worst case, the subset construction can make the number of states grow exponentially
- Reading: Section 3.7.3 (p. 157) and Section 3.7.5 (p. 163)

  Automaton          | Initial           | Per String
  NFA                | O(|r|)            | O(|r| × |x|)
  DFA, typical case  | O(|r|^3)          | O(|x|)
  DFA, worst case    | O(|r|^3 × 2^|r|)  | O(|x|)

DFA or NFA? Which One?
- If the string processor is to be executed many times, then the cost of converting to a DFA is worthwhile
- In grep, it may be more efficient to skip the step of constructing a DFA and simulate the NFA directly
  - Ex: grep “finite automata” file1
  - Ex: grep “auto*” files

Outline: Implementation of Lexical Analyzer Generator

Design of a Lexical-Analyzer Generator
- [Figure: a Lex program (P1 {action A1}, P2 {action A2}, …, Pn {action An}) is fed to the Lex compiler, which produces a transition table and actions; an automaton simulator uses them to scan the input buffer, whose current lexeme lies between the pointers lexemeBegin and forward.]

Scanner Using FA: Ambiguity Resolution (1/3)
- Longest match
  - Continue until no further transition is feasible on the next input
  - This implies the need to “look ahead” before accepting
- E.g., for the following DFA, the input is abbbbccd
  [Figure: start 1 --a--> 2 --b--> 3, with 3 looping on b; 3 --c--> 4 --c--> 5 --a--> 6; states 3 and 6 accept]
  - We need to look ahead to determine when to accept
  - After processing abbbb, we must decide whether to accept or to go on to the next state
  - If the following string is cca, go for the longest match and accept at state 6
  - If the following string is not cca, accept at state 3

Scanner Using FA: Ambiguity Resolution (2/3)
- First match
  - Patterns are listed sequentially
  - When there is a conflict, choose the pattern that is listed first
  - E.g.,
    P1: abc
    P2: abc*
    Input: abc
    - It can match P1 or P2
    - By the first-match rule, it should be P1
- The REs must be arranged properly
  - E.g., the keyword REs should appear before the id RE
  - In practice, there is no need to have a DFA with all the keywords in it
    - This reduces the number of states, saving space

Scanner Using FA: Ambiguity Resolution (3/3)
- Lookahead
  - E.g., in Fortran, a keyword can also be an identifier:
    IF (a = b) THEN …
    IF (I, J) = 3
    IF = 1.0
  - The previous rules (longest match and first match) do not resolve this problem
  - Solution: allow specification of a lookahead RE, r1/r2
    - E.g., IF / \( .* \) letter {return IF}
    - Otherwise, return identifier

Pattern Matching Based on NFAs
- Combine all the NFAs into one
  [Figure: for a Lex program P1 {action A1}, P2 {action A2}, …, Pn {action An}, a new start state s0 has ε-moves to the start states of N(p1), N(p2), …, N(pn)]

Pattern Matching (NFA): An Example
- Lex program:
    a      {action A1 for pattern p1}
    abb    {action A2 for pattern p2}
    a*b+   {action A3 for pattern p3}
- Combined NFA: start state 0 has ε-moves to 1, 3, and 7; 1 --a--> 2 (accepts p1); 3 --a--> 4 --b--> 5 --b--> 6 (6 accepts p2); 7 loops on a, 7 --b--> 8, and 8 loops on b (8 accepts p3)

Pattern Matching (NFA): An Example (Cont'd)
- Input: aaba
- Sequence of NFA state sets:
    start:    {0, 1, 3, 7}
    after a:  {2, 4, 7}    (2 accepts p1)
    after a:  {7}
    after b:  {8}          (8 accepts p3)
    after a:  none
- No further move is possible, so the scanner falls back to the last accepting point: the longest prefix matched is aab, by a*b+, which triggers {action A3 for pattern p3}

Pattern Matching Based on DFAs
- The NFAs for all the patterns are converted into an equivalent DFA
- Simulate the DFA until at some point there is no next state (the next state is a dead state)
  [Figure: combined DFA with states 0137, 247, 7, 8, 58, and 68; on aaba it goes 0137 --a--> 247 --a--> 7 --b--> 8 and has no move on the final a; state 8 accepts a*b+, while 68 accepts abb]

Implementation of DFAs
- A DFA can be implemented by a 2D table T
  - One dimension is “states”
  - The other dimension is “input symbols”
  - For every transition Si --a--> Sk, define T[i, a] = k
- DFA “execution”
  - If in state Si with input a, read T[i, a] = k and skip to state Sk
  - Very efficient

Implementing the Lookahead Operator
- Pattern: IF / \( .* \) letter
  [Figure: start 0 --I--> 1 --F--> 2 --ε (the / operator)--> 3 --(--> 4 --)--> 5 --letter--> 6, with state 4 looping on any character]

Automatic Generation of Lexers
- Two programs developed at Bell Labs in the mid-1970s for use with UNIX:
  - Lex: transforms an input stream into the alphabet of the grammar processed by yacc
    - Flex = fast lex, later developed by the Free Software Foundation
  - Yacc: yet another compiler-compiler
- Input to the lexer generator
  - A list of regular expressions in priority order
  - An action associated with each RE
- Output
  - A program that reads an input stream and breaks it up into tokens according to the REs

How Does Lex Work?
- [Figure: Lex/Flex takes a Lex program (REs for tokens) as input, converts RE → NFA, NFA → DFA, and optimizes the DFA, and outputs a C program; that program performs DFA simulation on a character stream and produces a token stream (and errors)]

Rules for Pattern Matching
- REs alone are not enough; we need rules for choosing when we get multiple matches
  - The longest matching token wins
  - Ties in length are resolved by priorities
    - Token specification order often defines priority
- REs + priorities + longest-matching-token rule = definition of a lexer

Outline: Optimization of DFA-Based Pattern Matcher

Converting a RE Directly to a DFA
- Sections 3.9.1-3.9.5
- Method
  - Construct a syntax tree T from the augmented regular expression (r)#
  - Compute nullable, firstpos, lastpos, and followpos for T
  - Construct Dstates and Dtran by Figure 3.62 (p. 180)

Syntax Tree Construction
- Each leaf not labeled ε is given a unique integer, called its position
- Example: (a|b)*abb#
  [Figure: syntax tree with cat-nodes (o) along its left spine; the lowest cat-node joins the star-node over (a|b) with a; the leaves are numbered a 1, b 2, a 3, b 4, b 5, # 6]

Function Computation
- n: a syntax-tree node
- nullable(n): true iff n can generate ε
- firstpos(n): a set of positions, corresponding to the symbols that can be the first symbols of the strings n can generate
- lastpos(n): a set of positions, corresponding to the symbols that can be the last symbols of the strings n can generate
- followpos(p), where p is a position: a set of positions, corresponding to the symbols that can immediately follow the symbol at position p

Rules for Computing nullable, firstpos, and lastpos
- A leaf labeled ε: nullable = true; firstpos = Ø; lastpos = Ø
- A leaf with position i: nullable = false; firstpos = {i}; lastpos = {i}
- An or-node n = c1|c2: nullable = nullable(c1) or nullable(c2); firstpos = firstpos(c1) ∪ firstpos(c2); lastpos = lastpos(c1) ∪ lastpos(c2)
- A cat-node n = c1c2: nullable = nullable(c1) and nullable(c2); firstpos = firstpos(c1) ∪ firstpos(c2) if nullable(c1), else firstpos(c1); lastpos = lastpos(c1) ∪ lastpos(c2) if nullable(c2), else lastpos(c2)
- A star-node n = c1*: nullable = true; firstpos = firstpos(c1); lastpos = lastpos(c1)

Computing followpos
- There are two ways that a position of a regular expression can be made to follow another:
  - If n is a cat-node with left child c1 and right child c2, then for every position i in lastpos(c1), all positions in firstpos(c2) are in followpos(i)
  - If n is a star-node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i)

Continuing the Example: (a|b)*abb#
- firstpos (left) and lastpos (right) of each node:
  - leaves: a(1): {1} {1}; b(2): {2} {2}; a(3): {3} {3}; b(4): {4} {4}; b(5): {5} {5}; #(6): {6} {6}
  - or-node a|b: {1,2} {1,2}; star-node (a|b)*: {1,2} {1,2}
  - cat-nodes, from the innermost outward: {1,2,3} {3}; {1,2,3} {4}; {1,2,3} {5}; {1,2,3} {6}
- followpos:

  Position n | followpos(n)
  1          | {1,2,3}
  2          | {1,2,3}
  3          | {4}
  4          | {5}
  5          | {6}
  6          | Ø

RE to DFA Algorithm (Figure 3.62)

    s0 = firstpos(root), where root is the root of the syntax tree
    Dstates = {s0}, and s0 is unmarked
    while there is an unmarked state T in Dstates do
        mark T
        for each input symbol a ∈ Σ do
            let U be the set of positions that are in followpos(p)
                for some position p in T such that the symbol at position p is a
            if U is not empty and not in Dstates then
                add U as an unmarked state to Dstates
            end if
            Dtran[T, a] = U
        end do
    end do

Continuing the Example
- firstpos(root) = {1,2,3} = A
- Dtran[A, a] = followpos(1) ∪ followpos(3) = {1,2,3,4} = B
- Dtran[A, b] = followpos(2) = {1,2,3} = A
- Dtran[B, a] = followpos(1) ∪ followpos(3) = B
- Dtran[B, b] = followpos(2) ∪ followpos(4) = {1,2,3,5} = C
- Dtran[C, a] = …
- Dtran[C, b] = …
- …
  [Figure: the resulting DFA with states 123 (start), 1234, 1235, and 1236 (accepting)]

Trading Time for Space in DFA Simulation
- The simplest and fastest way to represent the transition function of a DFA is a two-dimensional table indexed by states and characters
- A typical lexical analyzer has several hundred states in its DFA and involves the ASCII alphabet of 128 input characters
  - The array consumes less than a megabyte
- Alternative: represent each state by a list of transitions (character-state pairs), ended by a default state that is chosen for any input character not on the list

Trading Time for Space in DFA Simulation (Cont'd)
- A more compact representation of the transition table uses four arrays:
  - base: the base location of the entries for state s
  - next: r is the next state if the check is valid
  - check: checks whether the entry belongs to state s (t == s)
  - default: an alternative state to try if the check array tells us the entry given by base[s] is invalid

    int nextState(s, a) {
        if (check[base[s] + a] == s)
            return next[base[s] + a];
        else
            return nextState(default[s], a);
    }

  [Figure: the default and base arrays are indexed by state s, with base[s] = q; entry q + a of the next and check arrays holds r and t]

An Example
- [Figure: the transition diagram for identifiers (start 9 --letter--> 10, 10 loops on letter or digit, 10 --other--> 11 *, return(id)) together with its default, base, next, and check arrays]
- Suppose state 10 is entered after seeing the letters th, and e is the next input character
...
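To see the four-array scheme in action, the Python sketch below packs a few sparse rows with a naive first-fit strategy and looks them up with an iterative version of nextState. Assumptions: the packing strategy is one simple choice, not necessarily what lex tools do; the next array is spelled nxt to avoid shadowing Python's built-in next; a missing default is treated as "no transition"; and state 12 (with its move to 13 on f) is purely hypothetical, added so that one state actually relies on a default state. The names pack, next_state, ROWS, and DEFAULT are also hypothetical.

```python
ALPHABET_SIZE = 128  # ASCII input characters


def pack(rows):
    """First-fit packing of sparse rows into base/next/check arrays.

    rows: state -> {character code: next state}, holding only the entries
    that are NOT supplied by the state's default state.
    """
    base, nxt, check = {}, [], []
    for s, row in rows.items():
        b = 0
        # slide b forward until every slot b + a that s needs is unused
        while any(b + a < len(check) and check[b + a] != -1 for a in row):
            b += 1
        base[s] = b
        for a, target in row.items():
            while len(check) <= b + a:       # grow the arrays on demand
                check.append(-1)
                nxt.append(-1)
            nxt[b + a], check[b + a] = target, s
    return base, nxt, check


def next_state(base, nxt, check, default, s, c):
    """Iterative form of nextState(s, a): follow default states on a miss."""
    a = ord(c)
    while s is not None:
        i = base.get(s, 0) + a
        if i < len(check) and check[i] == s:
            return nxt[i]                    # the entry really belongs to s
        s = default.get(s)                   # otherwise consult the default state
    return None                              # no transition (error state)


# Identifier DFA (states 9, 10, 11) plus a HYPOTHETICAL state 12 that behaves
# like state 10 except that it moves to (hypothetical) state 13 on 'f'.
ROWS = {
    10: {a: 10 if chr(a).isalnum() else 11 for a in range(ALPHABET_SIZE)},
    9: {ord(c): 10 for c in map(chr, range(ALPHABET_SIZE)) if c.isalpha()},
    12: {ord("f"): 13},
}
DEFAULT = {12: 10}
BASE, NEXT, CHECK = pack(ROWS)
```

With these rows the packed arrays need 186 slots instead of the 3 × 128 = 384 entries of a full table. A lookup such as next_state(BASE, NEXT, CHECK, DEFAULT, 12, "x") finds no entry owned by state 12, falls back to state 10, and returns 10.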
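The grouping procedure from DFA Minimization Method can be sketched in Python as well. It is a minimal sketch, assuming a complete transition function (every state has a move on every symbol), as the DFA with states A to E from the subset-construction example does; each round re-splits every group by the tuple of groups its states move to, which is an equivalent way of applying the slide's division rule. The names minimize, DELTA, GROUPS, REP, and MIN_DELTA are hypothetical.

```python
def minimize(states, alphabet, delta, accepting):
    """Split groups until all states in each group agree on target groups."""
    groups = [g for g in (set(accepting), set(states) - set(accepting)) if g]
    while True:
        group_of = {s: i for i, g in enumerate(groups) for s in g}
        refined = []
        for g in groups:
            buckets = {}
            for s in sorted(g):
                # signature: which group each input symbol leads to
                key = tuple(group_of[delta[s, a]] for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            refined.extend(buckets.values())
        if len(refined) == len(groups):      # no group was divided
            return refined
        groups = refined


DELTA = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}
GROUPS = minimize("ABCDE", "ab", DELTA, {"E"})

# Minimized DFA: each group is represented by its smallest state.
REP = {s: min(g) for g in GROUPS for s in g}
MIN_DELTA = {(REP[s], a): REP[t] for (s, a), t in DELTA.items()}
```

The rounds reproduce the example's steps: {E} and {A, B, C, D}, then {A, B, C}, {D}, {E}, and finally {A, C}, {B}, {D}, {E}. Collapsing each group to its smallest state yields the same four-row table as merging A and C by hand.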
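The two ambiguity-resolution rules (longest match first, then first match among equally long matches) can be demonstrated without building an automaton. The sketch below deliberately swaps the combined NFA/DFA for Python's re module. For each start position, it tries the longest remaining prefix first and, at each length, tests the patterns in listing order. It uses the three patterns a, abb, and a*b+ from the Lex example; the function name tokenize is hypothetical, and the quadratic prefix search is for illustration only.

```python
import re

# Lex example patterns in listing (priority) order: p1 = a, p2 = abb, p3 = a*b+
PATTERNS = [("p1", "a"), ("p2", "abb"), ("p3", "a*b+")]


def tokenize(text, patterns=PATTERNS):
    """Split text into (lexeme, pattern) pairs: longest match, then first match."""
    compiled = [(name, re.compile(rx)) for name, rx in patterns]
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        # try the longest remaining prefix first, shrinking one character at a time
        for end in range(len(text), pos, -1):
            # at a given length, the pattern listed first wins the tie
            best = next((name for name, rx in compiled
                         if rx.fullmatch(text, pos, end)), None)
            if best is not None:
                tokens.append((text[pos:end], best))
                pos = end
                break
        if best is None:
            raise ValueError(f"no pattern matches at position {pos}")
    return tokens
```

On aaba it returns [("aab", "p3"), ("a", "p1")], the same decision the NFA trace reaches. On abb, both abb and a*b+ match all three characters, and the tie goes to p2 because it is listed first.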
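The direct RE-to-DFA method (nullable, firstpos, lastpos, followpos, then the Figure 3.62 loop) fits in a short Python sketch. It assumes the syntax tree is built by hand with positions numbered as on the syntax-tree slide, omits ε-leaves (the example does not need them), and skips empty U sets instead of adding a dead state. All names (Node, annotate, re_to_dfa, ROOT) are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Node(NamedTuple):
    kind: str              # "leaf", "or", "cat", or "star"
    children: tuple = ()
    symbol: str = ""       # leaves only
    pos: int = 0           # leaves only: the position number


def leaf(sym, pos): return Node("leaf", symbol=sym, pos=pos)
def alt(l, r): return Node("or", (l, r))
def cat(l, r): return Node("cat", (l, r))
def star(c): return Node("star", (c,))


def annotate(n, followpos, symbol_at):
    """Return (nullable, firstpos, lastpos) of n; fill followpos and symbol_at."""
    if n.kind == "leaf":
        symbol_at[n.pos] = n.symbol
        return False, {n.pos}, {n.pos}
    if n.kind == "star":
        _, first, last = annotate(n.children[0], followpos, symbol_at)
        for i in last:                        # star-node rule
            followpos[i] |= first
        return True, first, last
    (n1, f1, l1), (n2, f2, l2) = (annotate(c, followpos, symbol_at) for c in n.children)
    if n.kind == "or":
        return n1 or n2, f1 | f2, l1 | l2
    for i in l1:                              # cat-node rule
        followpos[i] |= f2
    return n1 and n2, (f1 | f2 if n1 else f1), (l1 | l2 if n2 else l2)


def re_to_dfa(root, alphabet, end_marker="#"):
    followpos, symbol_at = defaultdict(set), {}
    _, first, _ = annotate(root, followpos, symbol_at)
    start = frozenset(first)
    dstates, unmarked, dtran = [start], [start], {}
    while unmarked:
        T = unmarked.pop()
        for a in alphabet:
            U = frozenset(q for p in T if symbol_at[p] == a for q in followpos[p])
            if not U:
                continue                      # no transition on a
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[T, a] = U
    accepting = {T for T in dstates if any(symbol_at[p] == end_marker for p in T)}
    return start, dstates, dtran, accepting, followpos


# (a|b)*abb# with positions numbered as on the syntax-tree slide
ROOT = cat(cat(cat(cat(star(alt(leaf("a", 1), leaf("b", 2))), leaf("a", 3)),
                   leaf("b", 4)), leaf("b", 5)), leaf("#", 6))
START, DSTATES, DTRAN, ACCEPTING, FOLLOWPOS = re_to_dfa(ROOT, "ab")
```

This reproduces the followpos table, from {1, 2, 3} for position 1 down to Ø for position 6, and yields four DFA states, {1,2,3}, {1,2,3,4}, {1,2,3,5}, and {1,2,3,6}. Only {1,2,3,6} is accepting, because it contains the position of #.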