# lecture-02 - Scanners


Scanners Friday, August 26, 2011

## Scanners

- Sometimes called lexers
- Recall: scanners break the input stream up into a set of tokens
  - Identifiers, reserved words, literals, etc.
- What do we need to know?
  - How do we define tokens?
  - How can we recognize tokens?
  - How do we write scanners?

## Regular expressions

- Regular sets: sets of strings defined by regular expressions
- Strings are regular sets (with one element): `purdue`, `3.14159`
- So is the empty string: λ (sometimes ɛ is used instead)
- Concatenations of regular sets are regular: `purdue3.14159`
  - To avoid ambiguity, can use `( )` to group regexps together
- A choice between two regular sets is regular, using `|`: `(purdue | 3.14159)`
- 0 or more of a regular set is regular, using `*`: `(purdue)*`
- Some other notation used for convenience:
  - Use Not to accept all strings except those in a regular set
  - Use `?` to make a string optional: `x?` is equivalent to `(x | λ)`
  - Use `+` to mean 1 or more strings from a set: `x+` is equivalent to `xx*`
  - Use `[ ]` to represent a range of choices: `[1-3]` is equivalent to `(1 | 2 | 3)`
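
The convenience notation above can be spot-checked with Python's `re` module. This is only a sketch: the helper `same_language` and the sample strings are made up for illustration, and agreeing on a handful of samples is evidence of equivalence, not a proof. The empty alternative in `(x|)` plays the role of λ.

```python
import re

def same_language(p, q, samples):
    """True if patterns p and q agree, as whole-string matchers, on every sample."""
    return all(
        bool(re.fullmatch(p, s)) == bool(re.fullmatch(q, s))
        for s in samples
    )

# Hypothetical sample strings, chosen to include the empty string and near-misses.
samples = ["", "x", "xx", "xxx", "1", "2", "3", "4", "12", "y"]

optional_ok = same_language(r"x?", r"(x|)", samples)        # x?    vs (x | λ)
plus_ok = same_language(r"x+", r"xx*", samples)             # x+    vs xx*
range_ok = same_language(r"[1-3]", r"(1|2|3)", samples)     # [1-3] vs (1 | 2 | 3)
```
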

## Examples of regular expressions

- Numbers: `D = [0-9]+`
- Words: `L = [A-Za-z]+`
- Literals (integers or floats): `-? D+ (. D*)?`
- Identifiers: `(_ | L)(_ | L | D)*`
- Comments (as in LITTLE): `-- Not(\n)* \n`
- More complex comments (delimited by ##, can use # inside comment): `## ((# | λ) Not(#))* ##`

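
One possible translation of these examples into Python `re` syntax is sketched below. The constant names are invented for illustration. Two assumptions are worth flagging: `Not(c)` is written as the negated class `[^c]`, and the nested repetitions are flattened (`D+` with `D = [0-9]+` describes the same strings as `[0-9]+`, and `(_ | L)(_ | L | D)*` the same strings as `[_A-Za-z][_A-Za-z0-9]*`).

```python
import re

# Hypothetical names; the patterns follow the slide's definitions.
LITERAL = r"-?[0-9]+(\.[0-9]*)?"          # -? D+ (. D*)?
IDENTIFIER = r"[_A-Za-z][_A-Za-z0-9]*"    # (_ | L)(_ | L | D)*
LINE_COMMENT = r"--[^\n]*\n"              # -- Not(\n)* \n
BLOCK_COMMENT = r"##(#?[^#])*##"          # ## ((# | λ) Not(#))* ##

def matches(pattern, text):
    """True if the whole of text is in the set described by pattern."""
    return re.fullmatch(pattern, text) is not None
```

For instance, `matches(LITERAL, "3.")` holds because `D*` may be empty, while `matches(LITERAL, ".5")` does not because at least one digit must come before the point.
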
## How do we build a scanner?

- Idea: represent each token as a regular expression
  - Match a token if its regular expression matches
- Big problem: a string of characters can contain multiple tokens
- Simpler problem for now: decide if a regular expression matches the entire string
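
The simpler problem can be sketched with `re.fullmatch`, which succeeds only when the pattern matches the entire string. The token names, the reserved-word list, and the function `classify` are all hypothetical; reserved words are tried before identifiers here so that a keyword is not reported as an identifier.

```python
import re

# Hypothetical token table: (name, pattern) pairs tried in order.
TOKEN_PATTERNS = [
    ("RESERVED", r"if|else|while"),            # made-up reserved words
    ("LITERAL", r"-?[0-9]+(\.[0-9]*)?"),
    ("IDENTIFIER", r"[_A-Za-z][_A-Za-z0-9]*"),
]

def classify(text):
    """Return the name of the first token whose regex matches all of text, else None."""
    for name, pattern in TOKEN_PATTERNS:
        if re.fullmatch(pattern, text):
            return name
    return None
```

A string such as `"x y"` is classified as `None`: it holds more than one token, which is exactly the big problem this sketch leaves unsolved.
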

## Finite automata

- Finite state machine which will only accept a string if it is in the set defined by the regular expression

[Figure: finite automaton for `(a b c+)+`, with transitions labeled a, b, c, a, c; legend: start state, transition, state, final state]

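
A table-driven automaton for `(a b c+)+` might look like the sketch below. The state numbers are an assumption (the slide's drawing does not survive in this text), but the five edges agree with the transition labels a, b, c, a, c: a chain a, b, c into the final state, a loop on c, and an edge on a back into the chain.

```python
# Assumed numbering: 0 is the start state, 3 is the only final state.
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 2,
    (2, "c"): 3,
    (3, "c"): 3,   # c+ : stay in the final state on further c's
    (3, "a"): 1,   # outer + : start another "a b c+" group
}
START = 0
FINAL = {3}

def accepts(text):
    """Run the automaton over text; a missing transition means rejection."""
    state = START
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in FINAL
```
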
## λ transitions

- Transitions between states that aren't triggered by seeing another character
  - Can optionally take the transition, but do not have to
  - Can be used to link states together

[Figure: states linked by a λ transition]
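
Because a λ transition consumes no input, an automaton sitting in one state may equally be in any state reachable through λ edges alone. A common way to capture this is a closure computation, sketched here with a made-up edge table (`LAMBDA_EDGES` and the state numbers are illustrative, not taken from the slides).

```python
def lambda_closure(states, lambda_edges):
    """Return every state reachable from `states` using only λ transitions."""
    closure = set(states)
    worklist = list(states)
    while worklist:
        state = worklist.pop()
        for nxt in lambda_edges.get(state, ()):
            if nxt not in closure:
                closure.add(nxt)
                worklist.append(nxt)
    return closure

# Hypothetical λ edges: 4 -> 1 -> 2 -> 3.
LAMBDA_EDGES = {1: {2}, 2: {3}, 4: {1}}
```
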

## Non-deterministic FAs (NFAs)

- What happens when we have an FA that offers multiple choices?
- An FA is non-deterministic if, from one state, reading a single character could result in a transition to multiple states
- If a finite automaton has a λ-transition in it, it may be non-deterministic (do we take the transition or not?)

[Figure: NFA with states 1 to 5 and transitions labeled λ, a, "a, b", a, a, b]

## Simulating NFAs

- To run an NFA, simulate every possible path
- Intuition: deterministic FAs (DFAs) have a "pointer" that follows the single path from one state to the next
  - When we come to a non-deterministic choice, we can "split" the pointer into two, one for each path
- Termination conditions
  - If any pointer is in an accept state at the end of input, the NFA accepts (intuitively: there was one possible path that took us to the accept state)
  - If all pointers enter an error state, the NFA enters the error state (intuitively: no possible path avoids the error state)
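
The pointer intuition can be sketched by keeping the set of states that currently hold a pointer. The NFA below is a made-up example, not the one drawn on the slides: it accepts strings over {a, b} that end in "ab", with a λ edge into the accept state. A pointer with no transition on the current character simply drops out of the set, and an empty set stands for "all pointers are in the error state".

```python
def closure(states, lam):
    """Add every state reachable through λ edges alone."""
    result, worklist = set(states), list(states)
    while worklist:
        for nxt in lam.get(worklist.pop(), ()):
            if nxt not in result:
                result.add(nxt)
                worklist.append(nxt)
    return result

def nfa_accepts(text, start, finals, delta, lam):
    """Simulate every path at once by tracking the set of live pointers."""
    current = closure({start}, lam)
    for ch in text:
        moved = set()
        for state in current:
            moved |= delta.get((state, ch), set())   # split or drop pointers
        current = closure(moved, lam)
        if not current:
            return False                             # every pointer has died
    return bool(current & finals)                    # any pointer accepting?

# Hypothetical NFA: state 0 loops on a and b, and also guesses where "ab" begins.
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
LAM = {2: {3}}
START, FINALS = 0, {3}
```

On input "ab", reading a splits the pointer at state 0 into pointers at 0 and 1; reading b then leaves pointers at 0, 2, and (through the λ edge) 3, so the input is accepted.
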

## Example

[Figure: NFA with states 1 to 5 and transitions labeled λ, a, "a, b", a, a, b]