lect2_lex1 - Lexical Analysis Part I CSC 435 Department of...


Lexical Analysis Part I
CSC 435
Department of CIS
Shaw University

Frontend Structure

Source Code
  -> Language Preprocessor (trivial errors; processing of #include, #define, #ifdef, etc.)
  -> Lexical Analysis
  -> Syntax Analysis
  -> Semantic Analysis
  -> Abstract Syntax Tree
Errors are reported by the analysis phases.

Lexical Analysis Process

Preprocessed source code, read char by char:
    if (b == 0) a = b;
Lexical Analysis or Scanner produces the token stream:
    if ( b == 0 ) a = b ;

Lexical analysis
- Transform multi-character input stream to token stream
- Reduce length of program representation (remove spaces)

Tokens

Identifiers:     x y11 elsex
Keywords:        if else while for break
Integers:        2 1000 -20
Floating-point:  2.0 -0.0010 .02 1e5
Symbols:         + * { } ++ << < <= [ ]
Strings:         "x" "He said, \"I love EECS 483\""
Comments:        /* bla bla bla */

How to Describe Tokens

Use regular expressions to describe programming language tokens!
A regular expression (RE) is defined inductively:
    a      ordinary character stands for itself
    ε      empty string
    R|S    either R or S (alternation), where R, S = RE
    RS     R followed by S (concatenation)
    R*     concatenation of R 0 or more times (Kleene closure)

Language

A regular expression R describes a set of strings of characters denoted L(R)
L(R) = the language defined by R
    L(abc) = { abc }
    L(hello|goodbye) = { hello, goodbye }
    L(1(0|1)*) = all binary numbers that start with a 1
Each token can be defined using a regular expression

RE Notational Shorthand

    R+       one or more strings of R: R(R*)
    R?       optional R: (R|ε)
    [abcd]   one of listed characters: (a|b|c|d)
    [a-z]    one character from this range: (a|b|c|d|...|z)
    [^ab]    anything but one of the listed chars
    [^a-z]   one character not from this range

Example Regular Expressions

    Regular Expression, R    Strings in L(R)
    a                        a
    ab                       ab
    a|b                      a, b
    (ab)*                    ε, ab, abab, ...
    (a|ε)b                   ab, b

    digit  = [0-9]
    posint = digit+
    int    = -? posint, or (-|+|ε) posint to also allow an explicit plus sign
    real   = int (ε | (. posint))
           = -?[0-9]+ (ε|(.[0-9]+))
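
As an illustration of the material above, here is a minimal scanner sketch in Python. It is not taken from the slides. It builds digit, posint, int and real from the named definitions, then combines them with patterns for the other token classes. The identifier pattern, the token kind names, the extra symbols ( ) = ; - and the tokenize function are assumptions made for this example.

```python
import re

# Named definitions from the slides, written as Python regex fragments.
DIGIT = r"[0-9]"
POSINT = rf"{DIGIT}+"
INT = rf"-?{POSINT}"
REAL = rf"{INT}(?:\.{POSINT})?"  # int ( ε | (. posint) )

KEYWORDS = {"if", "else", "while", "for", "break"}

# Assumed token patterns. Alternatives are tried left to right, so
# multi-character symbols are listed before single-character ones.
TOKEN_SPEC = [
    ("SKIP", r"\s+"),
    ("COMMENT", r"/\*.*?\*/"),
    ("NUMBER", REAL),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),  # assumed identifier rule
    ("SYMBOL", r"\+\+|<<|<=|==|[-+*{}<\[\]()=;]"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(source):
    """Turn a character stream into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("SKIP", "COMMENT"):
            continue  # spaces and comments do not reach the token stream
        if kind == "ID" and text in KEYWORDS:
            kind = "KEYWORD"  # keywords are identifiers with a reserved spelling
        tokens.append((kind, text))
    return tokens


print([text for _, text in tokenize("if (b == 0) a = b;")])
# prints ['if', '(', 'b', '==', '0', ')', 'a', '=', 'b', ';']
```

Two simplifications are worth noting. Because int carries an optional leading minus, this sketch reads a-20 as the two tokens a and -20; a production scanner would usually treat the minus as a separate symbol. Also, real as defined here accepts 2.0 and -0.0010 but not .02 or 1e5 from the Tokens slide, so a fuller definition would need an optional integer part and an exponent.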