lect2_lex1 - Lexical Analysis Part I CSC 435 Department of...

Lexical Analysis – Part I CSC 435 Department of CIS Shaw University
- 2 - Frontend Structure
Source Code -> Language Preprocessor -> Lexical Analysis -> Syntax Analysis -> Semantic Analysis -> Abstract Syntax Tree
» Language Preprocessor: trivial errors; processing of #include, #define, #ifdef, etc.
» Output: Abstract Syntax Tree, plus Errors
- 3 - Lexical Analysis Process
Preprocessed source code, read char by char: if (b == 0) a = b;
Lexical Analysis or Scanner
Token stream: if ( b == 0 ) a = b ;
Lexical analysis
» Transform multi-character input stream to token stream
» Reduce length of program representation (remove spaces)
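The transformation on this slide can be sketched in a few lines of Python. This is only an illustration, not the scanner used in the course: the pattern `TOKEN` and the function `scan` are hypothetical names, and the token set is limited to what the example statement needs.

```python
import re

# Hypothetical token pattern (an assumption for this sketch): an identifier or
# keyword, an integer, "==", or any single non-space symbol. Whitespace matches
# none of the alternatives, so findall skips over it.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|[0-9]+|==|[^\sA-Za-z_0-9]")

def scan(source):
    """Return the lexemes found in source, in order, with spaces removed."""
    return TOKEN.findall(source)

print(scan("if (b == 0) a = b;"))
# ['if', '(', 'b', '==', '0', ')', 'a', '=', 'b', ';']
```

The 18-character input becomes a stream of 10 tokens, and `==` is kept as one token because that alternative is tried before the single-symbol one.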
- 4 - Tokens
» Identifiers: x y11 elsex
» Keywords: if else while for break
» Integers: 2 1000 -20
» Floating-point: 2.0 -0.0010 .02 1e5
» Symbols: + * { } ++ << < <= [ ]
» Strings: "x" "He said, \"I love EECS 483\""
» Comments: /* bla bla bla */
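As a rough sketch of how these token classes might be told apart, the Python below classifies one lexeme at a time. The patterns, the names `KEYWORDS`, `TOKEN_SPEC`, and `classify` are assumptions made for this illustration and cover only the examples listed on the slide.

```python
import re

KEYWORDS = {"if", "else", "while", "for", "break"}

# Hypothetical patterns, written to accept the slide's examples only.
TOKEN_SPEC = [
    ("COMMENT", r"/\*(?:[^*]|\*(?!/))*\*/"),
    ("STRING",  r'"(?:\\.|[^"\\])*"'),          # allows escaped quotes: \"
    ("FLOAT",   r"-?(?:[0-9]+\.[0-9]+|\.[0-9]+|[0-9]+e[0-9]+)"),
    ("INT",     r"-?[0-9]+"),
    ("ID",      r"[A-Za-z_][A-Za-z_0-9]*"),
    ("SYMBOL",  r"\+\+|<<|<=|[+*{}<\[\]]"),
]

def classify(lexeme):
    """Return the token class of a single, complete lexeme."""
    if lexeme in KEYWORDS:          # keywords win over identifiers
        return "KEYWORD"
    for kind, pattern in TOKEN_SPEC:
        if re.fullmatch(pattern, lexeme):
            return kind
    return "UNKNOWN"

for lexeme in ["elsex", "else", "-20", ".02", "1e5", "<=", "/* bla bla bla */"]:
    print(lexeme, classify(lexeme))
```

Note that `elsex` is an identifier even though it starts with the keyword `else`, since the whole lexeme must match. Treating `-20` as one integer token follows the slide; some language definitions scan the minus sign as a separate symbol instead.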
- 5 - How to Describe Tokens
Use regular expressions to describe programming language tokens!
A regular expression (RE) is defined inductively:
» a      ordinary character, stands for itself ("a")
» ε      empty string ("")
» R|S    either R or S (alternation), where R and S are REs
» RS     R followed by S (concatenation)
» R*     concatenation of R 0 or more times (Kleene closure)
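The five inductive cases map directly onto a recursive matcher. The sketch below is one possible illustration in Python; the tuple encoding and the names `ends` and `matches` are hypothetical, chosen for this example rather than taken from the slides.

```python
# Hypothetical tuple encoding of the five inductive cases:
#   ("chr", "a")    ordinary character
#   ("eps",)        empty string
#   ("alt", R, S)   R|S
#   ("cat", R, S)   RS
#   ("star", R)     R*

def ends(r, s, i):
    """Return the set of positions j such that r matches s[i:j]."""
    kind = r[0]
    if kind == "chr":
        return {i + 1} if s[i:i + 1] == r[1] else set()
    if kind == "eps":
        return {i}
    if kind == "alt":
        return ends(r[1], s, i) | ends(r[2], s, i)
    if kind == "cat":
        return {k for j in ends(r[1], s, i) for k in ends(r[2], s, j)}
    if kind == "star":
        # Repeat R zero or more times until no new end position appears.
        seen, frontier = {i}, {i}
        while frontier:
            frontier = {k for j in frontier for k in ends(r[1], s, j)} - seen
            seen |= frontier
        return seen
    raise ValueError(f"unknown node: {kind}")

def matches(r, s):
    """True if the whole string s is described by r."""
    return len(s) in ends(r, s, 0)

# (ab)* written with the encoding above
ab_star = ("star", ("cat", ("chr", "a"), ("chr", "b")))
print(matches(ab_star, ""), matches(ab_star, "abab"), matches(ab_star, "aba"))
# True True False
```

Each branch of `ends` corresponds to one line of the definition, which is why the function needs no cases beyond these five.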
- 6 - Language
A regular expression R describes a set of strings of characters, denoted L(R)
L(R) = the language defined by R
» L(abc) = { abc }
» L(hello|goodbye) = { hello, goodbye }
» L(1(0|1)*) = all binary numbers that start with a 1
Each token can be defined using a regular expression
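To make the idea of L(R) as a set concrete, the following Python sketch lists the members of a language up to a length bound by brute force. The helper name `language` and the bound are assumptions for this illustration; Python's `re.fullmatch` stands in for "the whole string is in L(R)".

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """Strings over alphabet, at most max_len long, that belong to L(pattern)."""
    return [
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if re.fullmatch(pattern, "".join(chars))
    ]

print(language(r"abc", "abc", 3))
# ['abc']
print(language(r"1(0|1)*", "01", 3))
# ['1', '10', '11', '100', '101', '110', '111']
```

L(1(0|1)*) is infinite, so only a finite prefix of it can be printed; the bound `max_len` controls how much.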
- 7 - RE Notational Shorthand
» R+       one or more strings of R: R(R*)
» R?       optional R: (R|ε)
» [abcd]   one of listed characters: (a|b|c|d)
» [a-z]    one character from this range: (a|b|c|d|...|z)
» [^ab]    anything but one of the listed chars
» [^a-z]   one character not from this range
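Each shorthand is an abbreviation for an expression built from the basic operators, and this can be spot-checked in Python. The sketch below is a bounded comparison, not a proof: the helper `same_language`, the alphabet "abcd", and the length bound are all assumptions made for the example. In Python's syntax ε is written as an empty alternative, as in `(a|)`.

```python
import re
from itertools import product

def same_language(p, q, alphabet, max_len):
    """True if p and q accept the same strings over alphabet up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(re.fullmatch(p, s)) != bool(re.fullmatch(q, s)):
                return False
    return True

PAIRS = [
    (r"(ab)+",  r"(ab)(ab)*"),   # R+  is R(R*)
    (r"a?",     r"(a|)"),        # R?  is (R|ε)
    (r"[abcd]", r"(a|b|c|d)"),   # character class as alternation
    (r"[^ab]",  r"(c|d)"),       # holds only relative to the alphabet "abcd"
]
for short, expanded in PAIRS:
    print(short, expanded, same_language(short, expanded, "abcd", 4))
```

Every pair prints True for this alphabet and bound. The last pair shows why negated classes depend on the character set in use: over a larger alphabet, `[^ab]` would accept more than `c` and `d`.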
- 8 - Example Regular Expressions
Regular Expression, R      Strings in L(R)
» a                        "a"
» ab                       "ab"
» a|b                      "a", "b"
» (ab)*                    "", "ab", "abab", ...
» (a|ε)b
» digit = [0-9]
» posint = digit+
» int = -? posint, or with an explicit sign: (-|+|ε) posint
» real = int (ε | (. posint)) = -?[0-9]+ (ε|(.[0-9]+))
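The named definitions digit, posint, int, and real compose by substitution, which the Python sketch below imitates with string concatenation. The variable names and the helper `in_language` are hypothetical; the one deliberate change from the slide is that the dot is escaped as `\.`, because in Python's syntax a bare `.` matches any character.

```python
import re

digit = r"[0-9]"
posint = digit + "+"
int_ = "-?" + posint                     # trailing underscore avoids shadowing int
real = int_ + r"(|(\." + posint + "))"   # int (ε | (. posint)), dot escaped

def in_language(pattern, s):
    """True if the whole string s is in the language of pattern."""
    return re.fullmatch(pattern, s) is not None

for s in ["1000", "-20", "-0.0010", "2", ".02", "1."]:
    print(s, in_language(int_, s), in_language(real, s))
```

Under these definitions "-20", "2", and "-0.0010" are in L(real), while ".02" and "1." are not, because int requires at least one digit before the dot and posint requires at least one digit after it.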