c02 - CS421 COMPILERS AND INTERPRETERS CS421 Lexical...

CS421 COMPILERS AND INTERPRETERS Copyright 1994 - 2010 Zhong Shao, Yale University Lexical Analysis : Page 1 of 40

Lexical Analysis

• Read the source program and produce a list of tokens ("linear" analysis)
• The lexical structure is specified using regular expressions
• Other secondary tasks: (1) get rid of white space (e.g., \t, \n, \sp) and comments; (2) line numbering

[Figure: the source program feeds the lexical analyzer; the parser sends "get next token" requests to the lexical analyzer, which returns a token]

Example: Source Code

A sample toy program:

    (* define valid mutually recursive procedures *)
    let
      function do_nothing1(a: int, b: string)=
          do_nothing2(a+1)
      function do_nothing2(d: int) =
          do_nothing1(d, "str")
    in
      do_nothing1(0, "str2")
    end

What do we really care about here?

The Lexical Structure

Output after lexical analysis: each token plus its associated value (if any). The number after each token is its character position in the source file.

    LET 51
    FUNCTION 56  ID(do_nothing1) 65  LPAREN 76  ID(a) 77  COLON 78  ID(int) 80  COMMA 83  ID(b) 85  COLON 86  ID(string) 88  RPAREN 94  EQ 95
    ID(do_nothing2) 99  LPAREN 110  ID(a) 111  PLUS 112  INT(1) 113  RPAREN 114
    FUNCTION 117  ID(do_nothing2) 126  LPAREN 137  ID(d) 138  COLON 139  ID(int) 141  RPAREN 144  EQ 146
    ID(do_nothing1) 150  LPAREN 161  ID(d) 162  COMMA 163  STRING(str) 165  RPAREN 170
    IN 173
    ID(do_nothing1) 177  LPAREN 188  INT(0) 189  COMMA 190  STRING(str2) 192  RPAREN 198
    END 200
    EOF 203

Tokens

Tokens are the atomic units of a language. They are usually specific strings (such as the keyword let) or instances of classes of strings (such as identifiers).

    Token    Sample values                  Informal description
    LET      let                            keyword LET
    END      end                            keyword END
    PLUS     +
    LPAREN   (
    COLON    :
    STRING   "str"
    RPAREN   )
    INT      49, 48                         integer constants
    ID       do_nothing1, a, int, string    letter followed by letters, digits, and underscores
    EQ       =
    EOF                                     end of file
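
The slides state the lexer's duties (turn the source into tokens, drop white space and comments, keep line numbers, and hand the parser one token per "get next token" request) but give no code. Below is a minimal hand-written sketch in Python, assuming only the token kinds that appear in the sample output above. The names `Token`, `tokens`, `KEYWORDS`, `PUNCTUATION`, and `SAMPLE` are hypothetical. It also assumes comments do not nest and strings contain no escape sequences. Positions are 0-based offsets into this particular copy of the program, so they will not reproduce the numbers on the slide.

```python
import string
from dataclasses import dataclass
from typing import Iterator, Optional, Union

# Token names follow the slides; the keyword set covers only the
# keywords in the sample program (an assumption, not the full language).
KEYWORDS = {"let": "LET", "in": "IN", "end": "END", "function": "FUNCTION"}
PUNCTUATION = {"(": "LPAREN", ")": "RPAREN", ":": "COLON",
               ",": "COMMA", "+": "PLUS", "=": "EQ"}
LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)
IDENT_CHARS = LETTERS | DIGITS | {"_"}


@dataclass
class Token:
    kind: str                         # token name, e.g. "ID", "INT", "LPAREN"
    value: Optional[Union[str, int]]  # identifier name, integer, or string contents
    pos: int                          # 0-based character offset in the source
    line: int                         # 1-based line number


def tokens(src: str) -> Iterator[Token]:
    """Yield one token per next() call ("get next token"), skipping
    white space and (* ... *) comments and counting lines."""
    i, line, n = 0, 1, len(src)
    while i < n:
        c = src[i]
        if c == "\n":                    # line numbering
            line += 1
            i += 1
        elif c in " \t\r":               # other white space
            i += 1
        elif src.startswith("(*", i):    # comment; assumed not to nest
            end = src.find("*)", i + 2)
            if end == -1:
                raise SyntaxError(f"unterminated comment on line {line}")
            line += src.count("\n", i, end)
            i = end + 2
        elif c in LETTERS:               # identifier or keyword
            start = i
            while i < n and src[i] in IDENT_CHARS:
                i += 1
            word = src[start:i]
            kind = KEYWORDS.get(word, "ID")
            yield Token(kind, word if kind == "ID" else None, start, line)
        elif c in DIGITS:                # integer constant
            start = i
            while i < n and src[i] in DIGITS:
                i += 1
            yield Token("INT", int(src[start:i]), start, line)
        elif c == '"':                   # string constant; escapes not handled
            end = src.find('"', i + 1)
            if end == -1:
                raise SyntaxError(f"unterminated string on line {line}")
            yield Token("STRING", src[i + 1:end], i, line)
            line += src.count("\n", i, end)
            i = end + 1
        elif c in PUNCTUATION:
            yield Token(PUNCTUATION[c], None, i, line)
            i += 1
        else:
            raise SyntaxError(f"illegal character {c!r} on line {line}")
    yield Token("EOF", None, n, line)


SAMPLE = """(* define valid mutually recursive procedures *)
let
  function do_nothing1(a: int, b: string)=
      do_nothing2(a+1)
  function do_nothing2(d: int) =
      do_nothing1(d, "str")
in
  do_nothing1(0, "str2")
end
"""

if __name__ == "__main__":
    lexer = tokens(SAMPLE)
    print(next(lexer))                   # the parser pulls one token at a time
    for tok in lexer:
        print(tok.kind, "" if tok.value is None else tok.value, tok.pos)
```

Because `tokens` is a generator, the parser can call `next()` whenever it needs another token, which matches the pull-style interface in the figure. Keywords are found by scanning an identifier first and then looking the word up in `KEYWORDS`, which keeps the scanner to a single identifier rule.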

Lexical Analysis, How?

• First, write down the lexical specification (how is each token defined?), using regular expressions to specify the lexical structure:

    identifier = letter (letter | digit | underscore)*
    letter     = a | ... | z | A | ... | Z
    digit      = 0 | 1 | ... | 9

• Second, based on the above lexical specification, build the lexical analyzer (to recognize tokens) by hand:

    Regular Expression Spec ==> NFA ==> DFA ==> Transition Table ==> Lexical Analyzer

• Or just use lex, the lexical analyzer generator:

    Regular Expression Spec (in lex format) ==> feed to lex ==> Lexical Analyzer

Regular Expressions

• Regular expressions are a concise, linguistic characterization of regular languages (regular sets).
• Each regular expression defines a regular language: a set of strings over some alphabet, such as ASCII characters. Each member of this set is called a sentence, or a word.
• We use regular expressions to define each category of tokens.

For example, the identifier definition above specifies the set of strings that consist of letters, digits, and underscores and start with a letter.
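
The hand-built route ends in a transition table that drives the scanner. For the identifier specification alone, the DFA is small enough to write down directly, so this sketch skips the explicit NFA and subset-construction steps and derives the table by hand. The names `char_class`, `TRANSITIONS`, `step`, `is_identifier`, and `longest_match` are hypothetical; Python's `re` module is used only to cross-check the table against the same regular expression.

```python
import re
import string
from typing import Dict, Optional, Tuple


def char_class(c: str) -> str:
    """Map one character to the transition-table column it selects."""
    if c in string.ascii_letters:
        return "letter"
    if c in string.digits:
        return "digit"
    if c == "_":
        return "underscore"
    return "other"


# DFA for identifier = letter (letter | digit | underscore)*
# State 0 = start, state 1 = inside an identifier (accepting).
# A missing (state, class) entry means the DFA is stuck: reject.
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "letter"): 1,
    (1, "letter"): 1,
    (1, "digit"): 1,
    (1, "underscore"): 1,
}
START = 0
ACCEPTING = {1}


def step(state: int, c: str) -> Optional[int]:
    """One table lookup; None plays the role of the dead state."""
    return TRANSITIONS.get((state, char_class(c)))


def is_identifier(s: str) -> bool:
    """Run the DFA over the whole string."""
    state: Optional[int] = START
    for c in s:
        state = step(state, c)
        if state is None:
            return False
    return state in ACCEPTING


def longest_match(s: str, start: int = 0) -> int:
    """End index of the longest identifier beginning at s[start]
    (equal to start if no identifier begins there)."""
    state: Optional[int] = START
    last_accept = start
    i = start
    while i < len(s):
        state = step(state, s[i])
        if state is None:
            break
        i += 1
        if state in ACCEPTING:
            last_accept = i
    return last_accept


# The same specification as a Python regular expression, for comparison.
IDENTIFIER_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

if __name__ == "__main__":
    for word in ["do_nothing1", "a", "_tmp", "2x", "str2"]:
        print(word, is_identifier(word), bool(IDENTIFIER_RE.fullmatch(word)))
```

`longest_match` shows how a scanner typically uses such a table: it keeps stepping until no transition exists, then reports the last accepting position, so on the input `do_nothing1(a` it recognizes `do_nothing1` and stops at the parenthesis.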