CSE 450: Compilers
K. Stirewalt

Lexical analysis
Topics:
– Issues and complexity of lexical analysis
– Regular expressions

Recall: Structure of a Compiler
[Diagram: Source Language; Front End (Lexical Analyzer, Syntax Analyzer, Semantic Analyzer); Int. Code Generator; Intermediate Code; Back End (Code Optimizer, Target Code Generator); Target Language]

Today!
[Same compiler structure diagram as above]

What exactly is lexing?
Consider the code:
    if (i==j);
        z=1;
    else;
        z=0;
    endif;
This is really nothing more than a string of characters:
    if_(i==j);\n\tz=1;\nelse;\n\tz=0;\nendif;
Lexical analysis (aka scanning) divides this string into meaningful, multi-character chunks called tokens.

Tokens
Meaningful units of input text
Languages generally contain small number of token types.
• English tokens are things like parts of speech (e.g., “noun”, “verb”, “adjective”), punctuation, etc.
• In a program, this could be an “identifier”, a “floating-point number”, a “math symbol”, a “keyword”, etc…
More abstract than the substrings they represent
• E.g., IDENTIFIER vs. “employeeName”
• E.g., BLOCK-COMMAND vs. “if”

Identifying Tokens
The string that an instance of a token denotes is called a lexeme.
The set of all possible lexemes denoted by a given type of token is described by the use of a pattern.
For example, the pattern to describe an identifier (e.g., a user-defined variable, method name, etc.) is a string of letters, numbers, or underscores, beginning with a non-number.
Patterns typically described using regular expressions.
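
Example (not from the original slides): a minimal Python sketch of how patterns written as regular expressions could split the sample string into (token type, lexeme) pairs. The token type names (KEYWORD, IDENTIFIER, NUMBER, OPERATOR, PUNCT), the keyword list, and the helper `tokenize` are assumptions chosen for illustration; the slides do not prescribe them. The identifier pattern follows the description above: letters, numbers, or underscores, beginning with a non-number.

```python
import re

# Hypothetical token types and patterns (illustrative only, not from the slides).
# Order matters: KEYWORD is listed before IDENTIFIER so that "if" is not
# classified as an identifier.
TOKEN_PATTERNS = [
    ("KEYWORD",    r"\b(?:if|else|endif)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),  # starts with a non-number
    ("NUMBER",     r"[0-9]+"),
    ("OPERATOR",   r"==|="),                    # try "==" before "="
    ("PUNCT",      r"[();]"),
    ("WHITESPACE", r"\s+"),
]

# Combine all patterns into one regular expression with a named group per type.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))


def tokenize(text):
    """Return a list of (token_type, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "WHITESPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# The character string from the "What exactly is lexing?" slide.
source = "if (i==j);\n\tz=1;\nelse;\n\tz=0;\nendif;"
tokens = tokenize(source)

for token_type, lexeme in tokens:
    print(token_type, repr(lexeme))
```

Each printed pair shows the distinction drawn above: the token type is the abstract category, and the lexeme is the concrete substring it denotes in this input.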