02-lex


Scanner

[Figure: source code -> scanner -> tokens -> parser -> IR, with errors reported]

- maps characters into tokens, the basic unit of syntax
      x = x + y;
  becomes
      <id, x> = <id, x> + <id, y> ;
- the character string value for a token is a lexeme
- typical tokens: number, id, +, -, *, /, do, end
- eliminates white space (tabs, blanks, comments)
- a key issue is speed, so use a specialized recognizer (as opposed to lex)

Copyright © 2007 by Antony L. Hosking. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from [email protected]

CS502 Scanning

Specifying patterns

A scanner must recognize the units of syntax.

Some parts are easy:

- white space
      <ws> ::= <ws> ' ' | <ws> '\t' | ' ' | '\t'
- keywords and operators
      specified as literal patterns: do, end
- comments
      opening and closing delimiters: /* ··· */

Other parts are much harder:

- identifiers
      alphabetic followed by k alphanumerics (_, $, &, ...)
- numbers
      integers: 0 or digit from 1-9 followed by digits from 0-9
      decimals: integer '.' digits from 0-9
      reals: (integer or decimal) 'E' (+ or -) digits from 0-9
      complex: '(' real ',' real ')'

We need a powerful notation to specify these patterns.

Operations on languages

Operation                                   Definition
union of L and M, written $L \cup M$        $L \cup M = \{ s \mid s \in L \text{ or } s \in M \}$
concatenation of L and M, written $LM$      $LM = \{ st \mid s \in L \text{ and } t \in M \}$
Kleene closure of L, written $L^*$          $L^* = \bigcup_{i=0}^{\infty} L^i$
positive closure of L, written $L^+$        $L^+ = \bigcup_{i=1}^{\infty} L^i$
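The four operations above can be tried out on small finite languages. The following is a minimal Python sketch, not part of the original slides: languages are modeled as sets of strings, the function names (`union`, `concat`, `power`, `closure`) are made up for illustration, and because the true closures are infinite, `closure` only unions the powers up to an assumed bound `max_i`.

```python
from itertools import product

def union(L, M):
    """L ∪ M = { s | s in L or s in M }"""
    return set(L) | set(M)

def concat(L, M):
    """LM = { st | s in L and t in M }"""
    return {s + t for s, t in product(L, M)}

def power(L, i):
    """L^i: L concatenated with itself i times, with L^0 = {""}."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result

def closure(L, max_i, start=0):
    """Finite approximation of a closure: union of L^i for start <= i <= max_i.

    start=0 approximates the Kleene closure L*,
    start=1 approximates the positive closure L+.
    """
    result = set()
    for i in range(start, max_i + 1):
        result |= power(L, i)
    return result

letters = {"a", "b"}
digits = {"0", "1"}

print(sorted(union(letters, digits)))    # ['0', '1', 'a', 'b']
print(sorted(concat(letters, digits)))   # ['a0', 'a1', 'b0', 'b1']
print(sorted(closure(letters, 2)))       # ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
```

Note that the only difference between the two closures is whether $L^0$, the language containing just the empty string, is included: `closure(letters, 2, start=1)` yields the same set as above without `''`.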
Regular expressions

Patterns are often specified as regular languages.

Notations used to describe a regular language (or a regular set) include both regular expressions and regular grammars.

Regular expressions (over an alphabet $\Sigma$):

1. $\varepsilon$ is a RE denoting the set $\{\varepsilon\}$
2. if $a \in \Sigma$, then $a$ is a RE denoting $\{a\}$
3. if $r$ and $s$ are REs, denoting $L(r)$ and $L(s)$, then:
   - $(r)$ is a RE denoting $L(r)$
   - $(r) \mid (s)$ is a RE denoting $L(r) \cup L(s)$
   - $(r)(s)$ is a RE denoting $L(r)L(s)$
   - $(r)^*$ is a RE denoting $L(r)^*$

If we adopt a precedence for operators, the extra parentheses can go away. We assume closure, then concatenation, then alternation as the order of precedence.
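As a rough illustration of how regular expressions specify tokens, here is a small Python sketch built on the standard `re` module. It is an assumption-laden toy, not the scanner from the course: the token names, the pattern set in `TOKEN_SPEC`, and the `scan` function are all hypothetical, keywords such as do and end are not distinguished from identifiers, and Python's `|` picks the first alternative that matches rather than the longest one. The first few lines also check the precedence rule stated above: `ab*|c` is read as `(a(b*))|c`.

```python
import re

# Precedence: closure binds tightest, then concatenation, then alternation,
# so "ab*|c" means (a(b*))|c.
assert re.fullmatch(r"ab*|c", "abbb")          # a followed by b*
assert re.fullmatch(r"ab*|c", "c")             # the other alternative
assert re.fullmatch(r"ab*|c", "ac") is None    # would match if | bound tighter
assert re.fullmatch(r"ab*|c", "abab") is None  # would match if * applied to ab

# Hypothetical token specification (names and patterns are assumptions).
TOKEN_SPEC = [
    ("number", r"0|[1-9][0-9]*"),          # integers as described in the slides
    ("id",     r"[A-Za-z][A-Za-z0-9_]*"),  # alphabetic, then alphanumerics
    ("op",     r"[=+\-*/;]"),              # single-character operators
    ("ws",     r"[ \t]+"),                 # white space, discarded below
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(text):
    """Return a list of (token kind, lexeme) pairs, skipping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(scan("x = x + y;"))
# [('id', 'x'), ('op', '='), ('id', 'x'), ('op', '+'), ('id', 'y'), ('op', ';')]
```

Each pair corresponds to a token and its lexeme, mirroring the `<id, x>` notation. A table-driven or hand-coded recognizer would replace the `re` machinery where speed matters.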
This note was uploaded on 02/23/2012 for the course CS 502 taught by Professor Antony,h during the Spring '08 term at Purdue.
