CS143 Handout 03
Summer 2011                                                  June 22, 2011

Lexical Analysis
Handout written by Maggie Johnson and Julie Zelenski, with edits by Keith.

The Basics
Lexical analysis or scanning is the process where the stream of characters making up the source program is read from left to right and grouped into tokens. Tokens are sequences of characters with a collective meaning. There are usually only a small number of token types for a programming language: constants (integer, double, char, string, etc.), operators (arithmetic, relational, logical), punctuation, and reserved words.

[Figure: a box labeled "Lexical Analyzer" takes the source language as input and produces a token stream and error messages as output. For the source fragment while (i > 0) i = i - 2; the resulting token stream is while ( i > 0 ) i = i - 2 ;]

The lexical analyzer takes a source program as input and produces a stream of tokens as output. The lexical analyzer might recognize particular instances of tokens such as:

    3 or 255 for an integer constant token
    "Fred" or "Wilma" for a string constant token
    numTickets or queue for a variable token

Such specific instances are called lexemes. A lexeme is the actual character sequence forming a token; the token is the general class that a lexeme belongs to. Some tokens have exactly one lexeme (e.g., the > character); for others, there are many lexemes (e.g., integer constants).

The scanner is tasked with determining that the input stream can be divided into valid symbols in the source language, but has no smarts about which token should come where. Few errors can be detected at the lexical level alone because the scanner has a very localized view of the source program without any context. The scanner can report characters that are not valid tokens (e.g., an illegal or unrecognized symbol) and a few other malformed entities (illegal characters within a string constant, unterminated comments, etc.). It does not look for or detect garbled sequences, tokens out of place, undeclared identifiers, misspelled keywords, mismatched types, and the like.
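To make the grouping of characters into (token, lexeme) pairs concrete, here is a minimal hand-written scanner sketch in C++. It is illustrative only and not the CS143 scanner; the names TokenKind, Token, and scan, and the tiny keyword list, are assumptions for this example. A real scanner would typically be generated from regular-expression specifications by a tool such as flex.

// Minimal hand-written scanner sketch. All names here are illustrative
// assumptions, not part of any CS143-provided code.
#include <cctype>
#include <iostream>
#include <string>
#include <vector>

enum class TokenKind { Keyword, Identifier, IntConstant, Operator, Punctuation };

struct Token {
    TokenKind kind;       // the general class the lexeme belongs to
    std::string lexeme;   // the actual character sequence forming the token
};

// Scan one source string into a vector of tokens. Unrecognized characters are
// reported, which is essentially the only kind of error a scanner can detect.
std::vector<Token> scan(const std::string& src) {
    std::vector<Token> tokens;
    size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (isspace(static_cast<unsigned char>(c))) { ++i; continue; }   // skip white space
        if (isalpha(static_cast<unsigned char>(c))) {                    // keyword or identifier
            size_t start = i;
            while (i < src.size() && isalnum(static_cast<unsigned char>(src[i]))) ++i;
            std::string word = src.substr(start, i - start);
            TokenKind kind = (word == "while" || word == "if" || word == "int")
                                 ? TokenKind::Keyword : TokenKind::Identifier;
            tokens.push_back({kind, word});
        } else if (isdigit(static_cast<unsigned char>(c))) {             // integer constant
            size_t start = i;
            while (i < src.size() && isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({TokenKind::IntConstant, src.substr(start, i - start)});
        } else if (std::string("+-*/<>=").find(c) != std::string::npos) {
            tokens.push_back({TokenKind::Operator, std::string(1, c)}); ++i;
        } else if (std::string("();{}[],").find(c) != std::string::npos) {
            tokens.push_back({TokenKind::Punctuation, std::string(1, c)}); ++i;
        } else {
            std::cerr << "error: unrecognized character '" << c << "'\n"; ++i;
        }
    }
    return tokens;
}

int main() {
    for (const Token& t : scan("while (i > 0) i = i - 2;"))
        std::cout << t.lexeme << " ";
    std::cout << "\n";   // prints: while ( i > 0 ) i = i - 2 ;
}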
For example, the following input will not generate any errors in the lexical analysis phase:

    int a double } switch b[2] =;

because the scanner has no concept of the appropriate arrangement of tokens for a declaration. The syntax analyzer will catch this error later in the next phase.

Furthermore, the scanner has no idea how tokens are grouped. In the above sequence, it returns b, [, 2, and ] as four separate tokens, having no idea they collectively form an array access.

The lexical analyzer can be a convenient place to carry out some other chores like stripping out comments and white space between tokens, and perhaps even some features like macros and conditional compilation (although often these are handled by some sort of preprocessor which filters the input before the compiler runs).
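As a rough illustration of that comment- and whitespace-stripping chore, the sketch below shows how a hand-written scanner might skip over white space and C-style comments before recognizing each token. The helper name skipWhitespaceAndComments and the particular comment syntaxes handled are assumptions for this example, not anything prescribed by the handout.

#include <cctype>
#include <iostream>
#include <string>

// Advance pos past blanks, tabs, newlines, and // and /* ... */ comments so
// that neither white space nor comments ever reach the parser as tokens.
// (Helper name and comment styles are illustrative assumptions.)
void skipWhitespaceAndComments(const std::string& src, size_t& pos) {
    while (pos < src.size()) {
        if (isspace(static_cast<unsigned char>(src[pos]))) {
            ++pos;                                      // white space between tokens
        } else if (src.compare(pos, 2, "//") == 0) {    // line comment: skip to end of line
            while (pos < src.size() && src[pos] != '\n') ++pos;
        } else if (src.compare(pos, 2, "/*") == 0) {    // block comment: skip past closing */
            size_t end = src.find("*/", pos + 2);
            if (end == std::string::npos) {
                // unterminated comment: one of the few errors a scanner can report
                std::cerr << "error: unterminated comment\n";
                pos = src.size();
            } else {
                pos = end + 2;
            }
        } else {
            break;                                      // next token starts here
        }
    }
}

int main() {
    std::string src = "   /* a comment */   i = i - 2;";
    size_t pos = 0;
    skipWhitespaceAndComments(src, pos);
    std::cout << src.substr(pos) << "\n";   // prints: i = i - 2;
}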