Scanner

[Figure: source code → scanner → tokens → parser → IR; both the scanner and the parser report errors]

  - maps characters into tokens, the basic unit of syntax; for example, x = x + y; becomes <id, x> = <id, x> + <id, y> ;
  - the character string value for a token is a lexeme
  - typical tokens: number, id, +, −, *, /, do, end
  - eliminates white space (tabs, blanks, comments)
  - a key issue is speed, so use a specialized recognizer (as opposed to lex)

Copyright © 2010 by Antony L. Hosking. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from hosking@cs.purdue.edu.

Specifying patterns

A scanner must recognize the units of syntax. Some parts are easy:
  - white space: <ws> ::= <ws> ' ' | <ws> '\t' | ' ' | '\t'
  - keywords and operators, specified as literal patterns: do, end
  - comments, with opening and closing delimiters: /* … */

Specifying patterns (continued)

A scanner must recognize the units of syntax. Other parts are much harder:
  - identifiers: alphabetic followed by k alphanumerics (_, $, &, …)
  - numbers:
      integers: 0, or a digit from 1–9 followed by digits from 0–9
      decimals: integer '.' digits from 0–9
      reals: (integer or decimal) 'E' (+ or −) digits from 0–9
      complex: '(' real ',' real ')'

We need a powerful notation to specify these patterns.

Operations on languages

  - union of L and M, written $L \cup M$: $L \cup M = \{ s \mid s \in L \text{ or } s \in M \}$
  - concatenation of L and M, written $LM$: $LM = \{ st \mid s \in L \text{ and } t \in M \}$
  - Kleene closure of L, written $L^*$: $L^* = \bigcup_{i=0}^{\infty} L^i$
  - positive closure of L, written $L^+$: $L^+ = \bigcup_{i=1}^{\infty} L^i$

Regular expressions

Patterns are often specified as regular languages. Notations used to describe a regular language (or a regular set) include both regular expressions and regular grammars.

Regular expressions (over an alphabet $\Sigma$):
  1. $\epsilon$ is a RE denoting the set $\{\epsilon\}$
  2. if $a \in \Sigma$, then $a$ is a RE denoting $\{a\}$
  3. if $r$ and $s$ are REs, denoting $L(r)$ and $L(s)$, then:
       $(r)$ is a RE denoting $L(r)$
       $(r) \mid (s)$ is a RE denoting $L(r) \cup L(s)$
       $(r)(s)$ is a RE denoting $L(r)L(s)$
       $(r)^*$ is a RE denoting $L(r)^*$

If we adopt a precedence for operators, the extra parentheses can go...
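The first slide's example, x = x + y; becoming <id, x> = <id, x> + <id, y> ;, can be illustrated with a minimal hand-written scanner in Python, in the spirit of a specialized recognizer rather than lex. This is a sketch under stated assumptions, not the course's scanner: the token kinds, the identifier rule (a letter followed by letters, digits, or _), and the simplification of numbers to plain digit strings are hypothetical choices made for illustration.

```python
# Minimal hand-written scanner sketch. Token kinds are hypothetical:
# identifiers, digit-string numbers, keywords "do"/"end", and single-character
# operators. Blanks, tabs, newlines and /* ... */ comments are discarded.

KEYWORDS = {"do", "end"}
OPERATORS = set("+-*/=;")


def scan(text):
    """Return a list of (kind, lexeme) pairs for `text`."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c in " \t\n":
            i += 1  # white space is eliminated
        elif text.startswith("/*", i):  # checked before "/" as an operator
            end = text.find("*/", i + 2)
            if end < 0:
                raise SyntaxError("unterminated comment")
            i = end + 2  # comments are eliminated too
        elif c.isalpha():
            j = i + 1
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            lexeme = text[i:j]
            # Keywords share the identifier shape; classify after scanning.
            kind = lexeme if lexeme in KEYWORDS else "id"
            tokens.append((kind, lexeme))
            i = j
        elif c.isdigit():
            j = i + 1
            while j < n and text[j].isdigit():
                j += 1
            tokens.append(("number", text[i:j]))
            i = j
        elif c in OPERATORS:
            tokens.append((c, c))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {c!r} at offset {i}")
    return tokens


print(scan("x = x + y;"))
# [('id', 'x'), ('=', '='), ('id', 'x'), ('+', '+'), ('id', 'y'), (';', ';')]
```

Each token pairs a kind with its lexeme, so <id, x> becomes ('id', 'x'); operators and keywords simply reuse the lexeme as their kind.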
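The number patterns from the second "Specifying patterns" slide can be written as Python `re` patterns and checked with `re.fullmatch`. This sketch assumes that "digits from 0–9" means one or more digits and that the exponent sign is required, as the slide's wording suggests; the names INTEGER, DECIMAL, REAL, COMPLEX and classify are hypothetical.

```python
import re

# Assumed reading of the slide: "digits from 0-9" = one or more digits,
# and the exponent sign (+ or -) is mandatory.
INTEGER = r"(?:0|[1-9][0-9]*)"          # 0, or 1-9 followed by digits
DECIMAL = INTEGER + r"\.[0-9]+"          # integer '.' digits
REAL = r"(?:" + DECIMAL + "|" + INTEGER + r")E[+-][0-9]+"
COMPLEX = r"\(" + REAL + "," + REAL + r"\)"

PATTERNS = {
    "integer": re.compile(INTEGER),
    "decimal": re.compile(DECIMAL),
    "real": re.compile(REAL),
    "complex": re.compile(COMPLEX),
}


def classify(lexeme):
    """Return the names of every pattern that matches the entire lexeme."""
    return [name for name, pat in PATTERNS.items() if pat.fullmatch(lexeme)]


for s in ["0", "42", "007", "3.14", "1E+10", "2.5E-3", "(1E+0,2.0E-1)"]:
    print(s, classify(s))
```

Building REAL and COMPLEX out of INTEGER and DECIMAL mirrors how the slide defines the larger number forms in terms of the smaller ones; note that "007" matches nothing, because an integer may not have a leading zero.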
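The operations on languages can be tried out on small finite languages represented as Python sets of strings. Because $L^*$ and $L^+$ are infinite unions, this sketch truncates them at a hypothetical bound k; it also uses the standard convention $L^0 = \{\epsilon\}$, with $\epsilon$ represented by the empty string. The function names are illustrative only.

```python
from itertools import product


def union(L, M):
    """L ∪ M = { s | s in L or s in M }."""
    return L | M


def concat(L, M):
    """LM = { st | s in L and t in M }."""
    return {s + t for s, t in product(L, M)}


def power(L, i):
    """L^i, starting from L^0 = {""} (the empty string stands for epsilon)."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result


def kleene(L, k):
    """Finite approximation of L*: the union of L^0 .. L^k."""
    out = set()
    for i in range(k + 1):
        out |= power(L, i)
    return out


def positive(L, k):
    """Finite approximation of L+: the union of L^1 .. L^k."""
    out = set()
    for i in range(1, k + 1):
        out |= power(L, i)
    return out


print(sorted(concat({"a", "b"}, {"c"})))
print(sorted(kleene({"a"}, 3)))
```

The only difference between the two closures is whether $L^0$ is included, which is why $\epsilon \in L^*$ always holds but $\epsilon \in L^+$ holds only when $\epsilon \in L$.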
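The inductive definition of regular expressions maps directly onto a small abstract syntax tree. This sketch (the class names Eps, Sym, Alt, Cat, Star and the function lang are hypothetical) computes the strings of length at most n in L(r), with one case per rule; the parenthesized form (r) needs no node of its own because grouping is already captured by the shape of the tree. The length bound n is an assumption needed to keep the closure finite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eps:          # rule 1: epsilon denotes {epsilon}
    pass


@dataclass(frozen=True)
class Sym:          # rule 2: a symbol a denotes {a}
    a: str


@dataclass(frozen=True)
class Alt:          # (r) | (s) denotes L(r) ∪ L(s)
    r: object
    s: object


@dataclass(frozen=True)
class Cat:          # (r)(s) denotes L(r)L(s)
    r: object
    s: object


@dataclass(frozen=True)
class Star:         # (r)* denotes L(r)*
    r: object


def lang(r, n):
    """Return the strings of length <= n in L(r)."""
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Sym):
        return {r.a} if len(r.a) <= n else set()
    if isinstance(r, Alt):
        return lang(r.r, n) | lang(r.s, n)
    if isinstance(r, Cat):
        left, right = lang(r.r, n), lang(r.s, n)
        return {x + y for x in left for y in right if len(x + y) <= n}
    if isinstance(r, Star):
        base = lang(r.r, n)
        result, frontier = {""}, {""}
        # Keep appending strings from L(r) until no new short strings appear.
        while frontier:
            new = {x + y for x in frontier for y in base if len(x + y) <= n}
            new -= result
            result |= new
            frontier = new
        return result
    raise TypeError(f"not a regular expression: {r!r}")


print(sorted(lang(Cat(Star(Alt(Sym("a"), Sym("b"))), Sym("c")), 2)))
# ['ac', 'bc', 'c']
```

Here (a|b)*c is built as Cat(Star(Alt(Sym("a"), Sym("b"))), Sym("c")); the loop in the Star case terminates because only finitely many strings of length at most n exist over the symbols involved.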
This note was uploaded on 02/23/2012 for the course CS 352 taught by Professor Staff during the Fall '08 term at Purdue University, West Lafayette.