lect2_lex1 - Lexical Analysis – Part I CSC 435 Department...

Lexical Analysis – Part I
CSC 435
Department of CIS, Shaw University

Frontend Structure
[Figure: Source Code → Language Preprocessor (processing of #include, #define, #ifdef, etc.) → Lexical Analysis → Syntax Analysis → Semantic Analysis → Abstract Syntax Tree; other labels: Errors, Trivial errors]

Lexical Analysis Process
❖ Input to the lexical analyzer (scanner): preprocessed source code, read char by char: if (b == 0) a = b;
❖ Output: a token stream: if ( b == 0 ) a = b ;
❖ Lexical analysis:
  » Transforms the multi-character input stream into a token stream
  » Reduces the length of the program representation (removes spaces)

Tokens
❖ Identifiers: x y11 elsex
❖ Keywords: if else while for break
❖ Integers: 2 1000 -20
❖ Floating-point: 2.0 -0.0010 .02 1e5
❖ Symbols: + * { } ++ << < <= [ ]
❖ Strings: "x" "He said, \"I love EECS 483\""
❖ Comments: /* bla bla bla */

How to Describe Tokens
❖ Use regular expressions to describe programming language tokens.
❖ A regular expression (RE) is defined inductively:
  » a: an ordinary character stands for itself ("a")
  » ε: the empty string ("")
  » R|S: either R or S (alternation), where R and S are REs
  » RS: R followed by S (concatenation)
  » R*: concatenation of R zero or more times (Kleene closure)

Language
❖ A regular expression R describes a set of strings of characters, denoted L(R)
❖ L(R) = the language defined by R
  » L(abc) = { abc }
  » L(hello|goodbye) = { hello, goodbye }
  » L(1(0|1)*) = all binary numbers that start with a 1
❖ Each token can be defined using a regular expression

RE Notational Shorthand
❖ R+: one or more occurrences of R: R(R*)
❖ R?: optional R: (R|ε)
❖ [abcd]: one of the listed characters: (a|b|c|d)
❖ [a-z]: one character from this range: (a|b|c|...|z)
❖ [^ab]: any one character except the listed ones
❖ [^a-z]: one character not from this range

Example Regular Expressions
❖ Regular expression R
  » a
  » ab
  » a|b
  » (ab)*
  » (a|ε)b
  » digit = [0-9]
  » posint = digit+
  » int = -?posint (or, also allowing a leading plus sign, (-|+|ε)posint)
  » real = int (ε | (.posint)) = -?[0-9]+ (ε | (.[0-9]+))
❖ Strings in L(R)
  » "a"
  » "ab"
  » "a", "b"
  » "", "ab", "abab", ...
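The number patterns on the Example Regular Expressions slide translate almost directly into Python's `re` syntax. This is a minimal sketch that assumes Python's `re` module as the matching engine; the names `DIGIT`, `POSINT`, `INT`, `REAL`, and `in_language` are illustrative and simply mirror the slide's definitions. One notational difference: in `re` a bare `.` matches any character, so the literal period in `real` has to be escaped as `\.`.

```python
import re

# Build the slide's definitions compositionally, as regex source strings.
DIGIT = r"[0-9]"                    # digit = [0-9]
POSINT = rf"{DIGIT}+"               # posint = digit+
INT = rf"-?{POSINT}"                # int = -?posint
REAL = rf"{INT}(?:\.{POSINT})?"     # real = int (ε | (.posint)); X? means (X|ε)

def in_language(pattern: str, s: str) -> bool:
    """Return True if the whole string s belongs to L(pattern)."""
    return re.fullmatch(pattern, s) is not None

if __name__ == "__main__":
    for s in ["0", "6035", "-42", "1.0", "-1.56", "1.", ".5", "+3"]:
        print(f"{s!r:8} int={in_language(INT, s)!s:5} real={in_language(REAL, s)}")
```

Note that this `real` rejects `.02` and `1e5`, both of which the Tokens slide lists as floating-point literals; a fuller float pattern would need extra alternatives for a missing integer part and for an exponent.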
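The inductive definition of regular expressions, and the language L(R) each one denotes, can be mirrored by a tiny interpreter with one case per rule (ε, character, R|S, RS, R*). This is a hypothetical sketch, not code from the course: the classes `Eps`, `Char`, `Alt`, `Cat`, `Star` and the helpers `lang` and `word` are names invented here. Because L(R*) is usually infinite, `lang(r, n)` returns only the strings of L(r) whose length is at most `n`.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Eps:      # ε: the empty string
    pass

@dataclass(frozen=True)
class Char:     # an ordinary character stands for itself
    c: str

@dataclass(frozen=True)
class Alt:      # R|S (alternation)
    r: "RE"
    s: "RE"

@dataclass(frozen=True)
class Cat:      # RS (concatenation)
    r: "RE"
    s: "RE"

@dataclass(frozen=True)
class Star:     # R* (Kleene closure)
    r: "RE"

RE = Union[Eps, Char, Alt, Cat, Star]

def lang(r: RE, n: int) -> set[str]:
    """Strings of L(r) with length <= n, following the inductive definition."""
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Char):
        return {r.c} if n >= 1 else set()
    if isinstance(r, Alt):
        return lang(r.r, n) | lang(r.s, n)
    if isinstance(r, Cat):
        return {x + y for x in lang(r.r, n) for y in lang(r.s, n) if len(x + y) <= n}
    if isinstance(r, Star):
        # Start from ε and keep appending non-empty strings of L(r)
        # until no new string of length <= n appears.
        pieces = lang(r.r, n) - {""}
        result, frontier = {""}, {""}
        while frontier:
            new = {x + y for x in frontier for y in pieces if len(x + y) <= n} - result
            result |= new
            frontier = new
        return result
    raise TypeError(f"not a regular expression: {r!r}")

def word(s: str) -> RE:
    """The concatenation of the characters of s, e.g. word("abc") is abc."""
    r: RE = Eps()
    for ch in s:
        r = Cat(r, Char(ch))
    return r
```

For example, `lang(Alt(word("hello"), word("goodbye")), 10)` gives the two-element language from the Language slide, and `lang(Cat(Char("1"), Star(Alt(Char("0"), Char("1")))), 3)` lists the binary numbers of up to three digits that start with a 1.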
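The scanning step on the Lexical Analysis Process slide, which turns the characters of `if (b == 0) a = b;` into the token stream `if ( b == 0 ) a = b ;`, can be sketched by giving each token class its own regular expression and discarding whitespace and comments. This is a minimal sketch assuming Python's `re` module; the token-class names (`ID`, `KEYWORD`, `INT`, `SYMBOL`), the keyword set, and the symbol list are illustrative choices for a small C-like language, not a specification from the slides.

```python
import re

# Hypothetical keyword set; an identifier is reclassified as KEYWORD only if
# the whole lexeme is in this set, so "elsex" stays an identifier.
KEYWORDS = {"if", "else", "while", "for", "break"}

# Python tries alternatives left to right, so multi-character symbols are
# listed before their one-character prefixes (e.g. "<=" before "<"), and
# comments are tried before the "/" symbol.
TOKEN_RE = re.compile(r"""
    (?P<WS>\s+)                          # whitespace: discarded
  | (?P<COMMENT>/\*.*?\*/)               # /* ... */ comment: discarded
  | (?P<ID>[A-Za-z_][A-Za-z0-9_]*)       # identifier or keyword
  | (?P<INT>[0-9]+)                      # unsigned integer literal
  | (?P<SYMBOL>\+\+|<<|<=|==|[-+*/{}()\[\]<=;])
""", re.VERBOSE | re.DOTALL)

def tokenize(src: str) -> list[tuple[str, str]]:
    """Turn a character stream into a list of (kind, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        kind, text = m.lastgroup, m.group()
        if kind == "ID" and text in KEYWORDS:
            kind = "KEYWORD"
        if kind not in ("WS", "COMMENT"):
            tokens.append((kind, text))
        pos = m.end()
    return tokens
```

Calling `tokenize("if (b == 0) a = b;")` yields the ten tokens from the slide, starting with `("KEYWORD", "if")`. One design choice differs from the Tokens slide: this sketch scans `-20` as a `-` symbol followed by the integer `20`, leaving negation to the parser, whereas the slide lists `-20` as a single integer token.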

This note was uploaded on 12/06/2011 for the course CIS 332 taught by Professor Jin during the Spring '11 term at Shaw University.
