Lexical Analysis, Part I
CSC 435
Department of CIS, Shaw University

2  Frontend Structure
   Source Code -> Language Preprocessor -> Lexical Analysis -> Syntax Analysis -> Semantic Analysis -> Abstract Syntax Tree
   - Language Preprocessor: trivial errors; processing of #include, #define, #ifdef, etc.
   - The analysis phases produce the Abstract Syntax Tree and report Errors.

3  Lexical Analysis Process
   Input (preprocessed source code, read char by char):   if (b == 0) a = b;
   Lexical Analysis or Scanner
   Output (token stream):   if  (  b  ==  0  )  a  =  b  ;
   - Lexical analysis: transform the multi-character input stream into a token stream
   - Reduce the length of the program representation (remove spaces)

4  Tokens
   Identifiers:      x  y11  elsex
   Keywords:         if  else  while  for  break
   Integers:         2  1000  20
   Floating-point:   2.0  0.0010  .02  1e5
   Symbols:          +  *  {  }  ++  <<  <  <=  [  ]
   Strings:          "x"  "He said, \"I love EECS 483\""
   Comments:         /* bla bla bla */

5  How to Describe Tokens
   Use regular expressions to describe programming language tokens!
   A regular expression (RE) is defined inductively:
   a      an ordinary character stands for itself
   ε      the empty string
   R|S    either R or S (alternation), where R and S are REs
   RS     R followed by S (concatenation)
   R*     concatenation of R 0 or more times (Kleene closure)

6  Language
   A regular expression R describes a set of strings of characters denoted L(R)
   L(R) = the language defined by R
   L(abc) = { abc }
   L(hello|goodbye) = { hello, goodbye }
   L(1(0|1)*) = all binary numbers that start with a 1
   Each token can be defined using a regular expression

7  RE Notational Shorthand
   R+       one or more strings of R: R(R*)
   R?       optional R: (R|ε)
   [abcd]   one of the listed characters: (a|b|c|d)
   [a-z]    one character from this range: (a|b|c|d|...|z)
   [^ab]    anything but one of the listed chars
   [^a-z]   one character not from this range

8  Example Regular Expressions
   Regular Expression, R                    Strings in L(R)
   a                                        "a"
   ab                                       "ab"
   a|b                                      "a", "b"
   (ab)*                                    "", "ab", "abab", ...
   (a|ε)b
   digit = [0-9]
   posint = digit+
   int = -? posint
         (+ ) posint
   real = int (ε | (. posint))
        = -?[0-9]+ (ε | (.[0-9]+))
   ...
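The token classes listed under Tokens and the regular-expression shorthand can be tried out with a small scanner. What follows is a minimal sketch in Python using the standard `re` module, not the course's implementation: the names `TOKEN_SPEC`, `KEYWORDS`, and `tokenize`, the exact patterns, and the set of symbols recognized are all assumptions made for illustration. Note that Python's `|` tries alternatives in order instead of choosing the longest match, so the sketch lists FLOAT before INT and `++`, `<<`, `<=`, `==` before the single-character symbols.

```python
import re

# Hypothetical keyword set, taken from the Keywords examples on the Tokens slide.
KEYWORDS = {"if", "else", "while", "for", "break"}

# Hypothetical token specification: (token name, regular expression).
# Order matters because Python picks the first alternative that matches.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),                      # /* bla bla bla */
    ("FLOAT",   r"[0-9]+\.[0-9]*(?:[eE][+-]?[0-9]+)?"
                r"|\.[0-9]+(?:[eE][+-]?[0-9]+)?"
                r"|[0-9]+[eE][+-]?[0-9]+"),         # 2.0  .02  1e5
    ("INT",     r"[0-9]+"),                         # digit+
    ("ID",      r"[A-Za-z_][A-Za-z0-9_]*"),         # x  y11  elsex
    ("STRING",  r'"(?:\\.|[^"\\])*"'),              # "He said, \"...\""
    ("SYMBOL",  r"\+\+|<<|<=|==|[+*{}<\[\]()=;]"),  # longer symbols first
    ("SKIP",    r"\s+"),                            # whitespace is dropped
]

# One master pattern with a named group per token class.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(source):
    """Turn a character string into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at offset {pos}"
            )
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("SKIP", "COMMENT"):
            continue  # spaces and comments do not reach the parser
        if kind == "ID" and text in KEYWORDS:
            kind = "KEYWORD"  # keywords are identifiers with reserved spelling
        tokens.append((kind, text))
    return tokens


if __name__ == "__main__":
    for token in tokenize("if (b == 0) a = b;"):
        print(token)
```

Run on the slide's input `if (b == 0) a = b;`, the sketch yields ten tokens, starting with `('KEYWORD', 'if')` and ending with `('SYMBOL', ';')`. Because the ID pattern consumes as many characters as it can before the keyword check, `elsex` comes out as a single identifier, not the keyword `else` followed by `x`.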
 Spring '11
 jin
