lec01-lexicalanalyzer

CS416 Compiler Design

Lexical Analyzer

The lexical analyzer reads the source program character by character to produce tokens.
Normally a lexical analyzer does not return the whole list of tokens in one shot; it returns a single token each time the parser asks for one.

[Figure: source program -> Lexical Analyzer -> token -> Parser; the Parser sends "get next token" requests back to the Lexical Analyzer]
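The on-demand interaction between parser and lexical analyzer can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `Lexer`, the method `get_next_token`, the stand-in `parse` function, and the three token kinds used here (`id`, `num`, `symbol`) are hypothetical and do not come from the slides.

```python
class Lexer:
    """Hypothetical lexer that hands out one token per request."""

    def __init__(self, source):
        self.source = source
        self.pos = 0  # index of the next unread character

    def get_next_token(self):
        """Return the next (kind, lexeme) pair, or ("eof", "") at the end."""
        src = self.source
        # Skip whitespace between tokens.
        while self.pos < len(src) and src[self.pos].isspace():
            self.pos += 1
        if self.pos >= len(src):
            return ("eof", "")
        start = self.pos
        ch = src[self.pos]
        if ch.isalpha():
            # Identifier: a letter followed by letters and digits.
            while self.pos < len(src) and src[self.pos].isalnum():
                self.pos += 1
            return ("id", src[start:self.pos])
        if ch.isdigit():
            # Number: a run of digits.
            while self.pos < len(src) and src[self.pos].isdigit():
                self.pos += 1
            return ("num", src[start:self.pos])
        # Anything else is returned as a one-character symbol (assumption).
        self.pos += 1
        return ("symbol", ch)


def parse(lexer):
    """Stand-in for a parser: it pulls tokens one at a time until eof."""
    tokens = []
    while True:
        token = lexer.get_next_token()
        if token[0] == "eof":
            return tokens
        tokens.append(token)
```

The lexer keeps only its current position as state, so no token is produced before the parser asks for it.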

Token

A token represents a set of strings described by a pattern.
  An identifier represents the set of strings that start with a letter and continue with letters and digits.
  The actual string (for example, newval) is called a lexeme.
  Tokens: identifier, number, addop, delimiter, …
Since a token can represent more than one lexeme, additional information must be held for the specific lexeme. This additional information is called the attribute of the token.
For simplicity, a token may have a single attribute that holds the required information for that token.
  For identifiers, this attribute is a pointer to the symbol table, and the symbol table holds the actual attributes for that token.
Some attributes:
  <id,attr>   where attr is a pointer to the symbol table
  <assgop,_>  no attribute is needed (if there is only one assignment operator)
  <num,val>   where val is the actual value of the number
A token type together with its attribute uniquely identifies a lexeme.
Regular expressions are widely used to specify patterns.
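A minimal Python sketch of tokens with a single attribute might look like this. Everything named here is an assumption made for illustration: the `Token` class, the `lookup_or_insert` and `make_token` helpers, the use of a list index as the "pointer" into the symbol table, the ASCII-only patterns, and the spelling `:=` for the assignment operator.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Patterns as regular expressions (ASCII letters and digits assumed).
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")   # letter, then letters/digits
NUM_PATTERN = re.compile(r"[0-9]+")                # one or more digits


@dataclass(frozen=True)
class Token:
    kind: str                   # token type: "id", "num", "assgop"
    attr: Optional[int] = None  # symbol-table index, numeric value, or None


def lookup_or_insert(symbol_table, lexeme):
    """Return the index of lexeme in the table, adding it if it is new."""
    if lexeme not in symbol_table:
        symbol_table.append(lexeme)
    return symbol_table.index(lexeme)


def make_token(lexeme, symbol_table):
    """Classify one lexeme and attach its attribute."""
    if ID_PATTERN.fullmatch(lexeme):
        # <id,attr>: attr points into the symbol table.
        return Token("id", lookup_or_insert(symbol_table, lexeme))
    if NUM_PATTERN.fullmatch(lexeme):
        # <num,val>: val is the actual value of the number.
        return Token("num", int(lexeme))
    if lexeme == ":=":  # assumed spelling of the only assignment operator
        # <assgop,_>: no attribute is needed.
        return Token("assgop")
    raise ValueError(f"no pattern matches {lexeme!r}")
```

With an initially empty table, `make_token("newval", table)` yields `Token("id", 0)`, and a second occurrence of `newval` maps to the same entry, which is how the token type plus attribute identifies the lexeme.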
Terminology of Languages

Alphabet: a finite set of symbols (for example, the ASCII characters).
String: a finite sequence of symbols over an alphabet.
  The terms sentence and word are also used to mean string.
  $\varepsilon$ is the empty string.
  $|s|$ is the length of string $s$.
Language: a set of strings over some fixed alphabet.
  $\emptyset$, the empty set, is a language.
  $\{\varepsilon\}$, the set containing only the empty string, is a language.
  The set of well-formed C programs is a language.
  The set of all possible identifiers is a language.
Operators on strings:
  Concatenation: $xy$ represents the concatenation of strings $x$ and $y$.
    $s\varepsilon = s$ and $\varepsilon s = s$
    $s^n = s s \cdots s$ ($n$ times), with $s^0 = \varepsilon$
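These string identities map directly onto Python strings, as the small sketch below suggests. The names `EPSILON` and `power` are chosen here for illustration only; the recursive definition follows $s^0 = \varepsilon$ and $s^n = s^{n-1} s$.

```python
EPSILON = ""  # the empty string, written as epsilon in the slides


def power(s, n):
    """Return s concatenated with itself n times; power(s, 0) is EPSILON."""
    if n == 0:
        return EPSILON
    return power(s, n - 1) + s


s = "ab"
print(len(EPSILON))        # 0
print(s + EPSILON == s)    # True
print(EPSILON + s == s)    # True
print(power(s, 3))         # ababab
```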

Operations on Languages

Concatenation: $L_1 L_2 = \{ s_1 s_2 \mid s_1 \in L_1 \text{ and } s_2 \in L_2 \}$
Union: $L_1 \cup L_2 = \{ s \mid s \in L_1 \text{ or } s \in L_2 \}$
Exponentiation: $L^0 = \{\varepsilon\}$, $L^1 = L$, $L^2 = LL$
Kleene closure: $L^* = \bigcup_{i=0}^{\infty} L^i$
Positive closure: $L^+ = \bigcup_{i=1}^{\infty} L^i$
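For finite languages these operations can be tried out with Python sets, as in the sketch below. The function names are hypothetical, and because the Kleene and positive closures are infinite for any language containing a non-empty string, the two closure functions only compute a finite approximation up to an assumed bound `max_power`.

```python
EPSILON = ""  # the empty string


def concat(l1, l2):
    """L1 L2: every string of l1 followed by every string of l2."""
    return {s1 + s2 for s1 in l1 for s2 in l2}


def union(l1, l2):
    """L1 union L2: strings that are in l1 or in l2."""
    return l1 | l2


def power(lang, n):
    """L^n, starting from L^0 = {epsilon} and concatenating lang n times."""
    result = {EPSILON}
    for _ in range(n):
        result = concat(result, lang)
    return result


def kleene_closure(lang, max_power):
    """Finite approximation of L*: union of L^0 .. L^max_power."""
    result = set()
    for i in range(max_power + 1):
        result |= power(lang, i)
    return result


def positive_closure(lang, max_power):
    """Finite approximation of L+: union of L^1 .. L^max_power."""
    result = set()
    for i in range(1, max_power + 1):
        result |= power(lang, i)
    return result
```

The only difference between the two closures is the starting index of the union, which is why $L^+$ lacks $\varepsilon$ unless $\varepsilon$ is already in $L$.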
Example

$L_1 = \{a, b, c, d\}$, $L_2 = \{1, 2\}$
$L_1 L_2 = \{a1, a2, b1, b2, c1, c2, d1, d2\}$
$L_1 \cup L_2 = \{a, b, c, d, 1, 2\}$
$L_1^3$ = all strings of length three over $\{a, b, c, d\}$
$L_1^*$ = all strings over the letters a, b, c, d, including the empty string
$L_1^+$ = the same strings as $L_1^*$, except that the empty string is not included
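The finite parts of this example can be checked mechanically. The sketch below uses `itertools.product` from the Python standard library; the variable names are chosen here for illustration.

```python
from itertools import product

L1 = {"a", "b", "c", "d"}
L2 = {"1", "2"}

# Concatenation: one string from L1 followed by one string from L2.
L1L2 = {x + y for x, y in product(L1, L2)}

# Union: strings that are in either language.
L1_union_L2 = L1 | L2

# L1^3: every way of picking three symbols from L1, in order.
L1_cubed = {"".join(triple) for triple in product(L1, repeat=3)}

print(sorted(L1L2))         # ['a1', 'a2', 'b1', 'b2', 'c1', 'c2', 'd1', 'd2']
print(sorted(L1_union_L2))  # ['1', '2', 'a', 'b', 'c', 'd']
print(len(L1_cubed))        # 64
```

The count of 64 is $4^3$: each of the three positions can hold any of the four letters.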

Regular Expressions

We use regular expressions to describe the tokens of a programming language.
A regular expression is built up of simpler regular expressions (using