02-lex - Scanner

Scanner

source code → scanner → tokens → parser → IR (errors are reported along the way)

- maps characters into tokens, the basic unit of syntax
    x = x + y; becomes <id, x> = <id, x> + <id, y> ;
- the character string value for a token is a lexeme
- typical tokens: number, id, +, -, *, /, do, end
- eliminates white space (tabs, blanks, comments)
- a key issue is speed, so use a specialized recognizer (as opposed to lex)

Copyright © 2010 by Antony L. Hosking. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from hosking@cs.purdue.edu.

Specifying patterns

A scanner must recognize the units of syntax.

Some parts are easy:
- white space
    <ws> ::= <ws> ' ' | <ws> '\t' | ' ' | '\t'
- keywords and operators, specified as literal patterns: do, end
- comments, with opening and closing delimiters: /* */

Other parts are much harder:
- identifiers: alphabetic followed by k alphanumerics (_, $, &, ...)
- numbers
    integers: 0, or a digit from 1-9 followed by digits from 0-9
    decimals: integer . digits from 0-9
    reals: (integer or decimal) E (+ or -) digits from 0-9
    complex: ( real , real )

We need a powerful notation to specify these patterns.

Operations on languages

Operation                                 Definition
union of L and M, written $L \cup M$      $L \cup M = \{ s \mid s \in L \text{ or } s \in M \}$
concatenation of L and M, written $LM$    $LM = \{ st \mid s \in L \text{ and } t \in M \}$
Kleene closure of L, written $L^*$        $L^* = \bigcup_{i=0}^{\infty} L^i$
positive closure of L, written $L^+$      $L^+ = \bigcup_{i=1}^{\infty} L^i$

Regular expressions

Patterns are often specified as regular languages. Notations used to describe a regular language (or a regular set) include both regular expressions and regular grammars.

Regular expressions (over an alphabet $\Sigma$):
1. $\varepsilon$ is a RE denoting the set $\{\varepsilon\}$
2. if $a \in \Sigma$, then $a$ is a RE denoting $\{a\}$
3. if $r$ and $s$ are REs, denoting $L(r)$ and $L(s)$, then:
     $(r)$ is a RE denoting $L(r)$
     $(r) \mid (s)$ is a RE denoting $L(r) \cup L(s)$
     $(r)(s)$ is a RE denoting $L(r)L(s)$
     $(r)^*$ is a RE denoting $L(r)^*$

If we adopt a precedence for operators, the extra parentheses can go...
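To make the token patterns above concrete, here is a minimal sketch of a scanner in Python built on the standard `re` module. It is an illustration only, not the hand-coded specialized recognizer the slides recommend for speed. The names `TOKEN_SPEC`, `KEYWORDS`, and `tokenize` are hypothetical, and several details are assumptions: identifiers may contain `_`, `$`, and `&`, a decimal needs at least one digit after the point, and a real requires an explicit sign after `E`.

```python
import re

# Number patterns, following the slide's definitions (details are assumptions).
INTEGER = r"(?:0|[1-9][0-9]*)"                      # 0, or 1-9 followed by digits
DECIMAL = INTEGER + r"\.[0-9]+"                     # integer . digits
REAL = r"(?:" + DECIMAL + r"|" + INTEGER + r")E[+-][0-9]+"
# Alternation is ordered, so the longest form (real) is tried first.
NUMBER = f"(?:{REAL}|{DECIMAL}|{INTEGER})"

# Patterns are tried in this order. "skip" comes first so that "/*" starts
# a comment instead of being read as the "/" operator.
TOKEN_SPEC = [
    ("skip", r"[ \t\n]+|/\*.*?\*/"),                # white space and /* */ comments
    ("number", NUMBER),
    ("id", r"[A-Za-z][A-Za-z0-9_$&]*"),             # alphabetic, then alphanumerics
    ("op", r"[=+\-*/;]"),                           # single-character operators
]
KEYWORDS = {"do", "end"}                            # literal patterns

MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,                                      # let comments span lines
)


def tokenize(text):
    """Return a list of (token kind, lexeme) pairs for the input string."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue                                # white space is eliminated
        if kind == "id" and lexeme in KEYWORDS:
            kind = lexeme                           # keywords are their own kind
        elif kind == "op":
            kind = lexeme                           # operators are their own kind
        tokens.append((kind, lexeme))
    return tokens


if __name__ == "__main__":
    print(tokenize("x = x + y;"))
```

Under these assumptions, the slide's example `x = x + y;` yields three `id` tokens with lexemes `x`, `x`, and `y`, separated by the `=`, `+`, and `;` tokens. Matching identifiers first and then checking them against the keyword set keeps the pattern list short. A production scanner would more likely use a table-driven or hand-written automaton.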

This note was uploaded on 02/23/2012 for the course CS 352 taught by Professor Staff during the Fall '08 term at Purdue University-West Lafayette.
