Chapter_2 - Chapter 2 Scanning 1 Scanner(or Lexical...

Chapter 2: Scanning
Scanner (or Lexical Analyzer)
The scanner is the interface between the source program and the compiler.
It could be a separate pass that places its output on an intermediate file.
More commonly, it is a routine called by the parser: it scans the character stream from where it left off and returns the next token to the parser.
What is actually returned is the token's lexical category, in the form of a simple index number; a value for the token is left in global variables. For some tokens, only the token type is returned.
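The calling convention described above can be sketched roughly as follows. This is a minimal illustration, not the course's scanner: the category indices (`ID`, `NUM`, `ASSIGN_OP`, `EOF`), the global `token_value`, and the helper names `init_scanner`, `next_token`, and `scan_all` are all assumptions introduced here.

```python
# Hypothetical category indices; the slides do not give actual numbers.
EOF, ID, NUM, ASSIGN_OP = 0, 1, 2, 3
DIGITS = "0123456789"

token_value = None  # global: value of the most recently returned token, if any
_source = ""        # global: the character stream being scanned
_pos = 0            # global: where the scanner left off


def init_scanner(text):
    """Reset the scanner so it starts at the beginning of `text`."""
    global _source, _pos, token_value
    _source, _pos, token_value = text, 0, None


def next_token():
    """Return the category index of the next token; leave its value in token_value."""
    global _pos, token_value
    token_value = None
    while _pos < len(_source) and _source[_pos].isspace():
        _pos += 1                      # skip white space between tokens
    if _pos >= len(_source):
        return EOF
    start = _pos
    ch = _source[_pos]
    if ch.isalpha():
        while _pos < len(_source) and _source[_pos].isalnum():
            _pos += 1
        token_value = _source[start:_pos]
        return ID
    if ch in DIGITS:
        while _pos < len(_source) and _source[_pos] in DIGITS:
            _pos += 1
        token_value = int(_source[start:_pos])
        return NUM
    if ch == "=":
        _pos += 1
        return ASSIGN_OP               # only the token type is returned
    raise ValueError(f"unexpected character {ch!r} at position {_pos}")


def scan_all(text):
    """Play the role of the parser: call next_token() until EOF is returned."""
    init_scanner(text)
    result = []
    while True:
        kind = next_token()
        result.append((kind, token_value))
        if kind == EOF:
            return result
```

Here `scan_all` stands in for the parser: each call to `next_token()` resumes where the previous call stopped, and the caller reads `token_value` only for categories that carry a value.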
Input Buffering
Why is an input buffer needed? Some tokens can be identified only after several characters beyond the end of the token have been examined.
Two pointers delimit the token string: one marks the beginning of the token, and the other is a lookahead pointer that scans ahead of it. The amount of lookahead is limited, however.
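As a rough sketch of the two-pointer idea, the function below keeps a `begin` index fixed at the start of the token and advances a `lookahead` index until the token can be identified. The function name `scan_one` and the tiny token set (alphanumeric runs, `*`, `**`, and single characters) are assumptions made for illustration only.

```python
def scan_one(buf, begin):
    """Scan one token starting at index `begin`; return (lexeme, next_begin)."""
    lookahead = begin
    if buf[lookahead].isalnum():
        # Keep advancing until a character that cannot extend the token is seen.
        while lookahead < len(buf) and buf[lookahead].isalnum():
            lookahead += 1
    elif buf[lookahead] == "*":
        lookahead += 1
        # One extra character must be examined to tell "*" from "**".
        if lookahead < len(buf) and buf[lookahead] == "*":
            lookahead += 1
    else:
        lookahead += 1                 # any other character is a one-character token
    return buf[begin:lookahead], lookahead
```

For the buffer `a[index]=4+2`, calling `scan_one(buf, 2)` leaves `begin` at the `i` and stops `lookahead` at the `]`, so the lexeme between the two pointers is `index`.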
[Figure: two diagrams of the input buffer holding a[index] = 4 + 2, each marking the beginning pointer and the lookahead pointer.]
Token, Pattern & Lexeme
A token is a sequence of characters that represents a unit of information in the source program.
In general, there is a set of strings in the input for which the same token is produced as output. This set of strings is described by a rule called a pattern associated with the token. The pattern is said to match each specific string in the set.
A lexeme is a sequence of characters in the source program that is matched by the pattern for a token.
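The relationship between the three terms can be illustrated with a small sketch. The two patterns below (for `id` and `num`) and the helper `token_for` are assumptions chosen for the example, not definitions from the slides.

```python
import re

# Each token name is associated with a pattern (written as a regular expression).
PATTERNS = {
    "id": re.compile(r"[A-Za-z_][A-Za-z0-9_]*"),
    "num": re.compile(r"[0-9]+"),
}


def token_for(lexeme):
    """Return the name of the token whose pattern matches `lexeme`, or None."""
    for token, pattern in PATTERNS.items():
        if pattern.fullmatch(lexeme):
            return token
    return None
```

The lexemes `index`, `E`, and `count1` are all matched by the same pattern, so `token_for` produces the same token, `id`, for each of them.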
Pattern vs. Regular Expression
Regular expression: a notation suitable for describing tokens (patterns).
Each pattern describes a regular language, i.e., a set of strings that can be described by a regular expression.
Two Kinds of Token
Specific string (e.g., "if", ",", "=="): the token is matched by a single string.
Class of strings (e.g., identifier, number, label): the token is matched by multiple strings.
Common Lexical Categories
1. identifiers
2. literals
3. keywords (not necessarily reserved words) - What is the difference between a keyword and a reserved word?
4. operators
5. numbers
6. punctuation symbols: e.g. '(', ';', ','
The lexical categories are regular languages, and they can be described by giving regular expressions.
An instance: Scanning E = M * C ** 2
// return token type (an index) and value ==>
< id, pointer to symbol table entry for E >
< assign_op >
< id, pointer to symbol table entry for M >
< mult_op >
< id, pointer to symbol table entry for C >
< expo_op >
< num, integer value 2 >
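A token stream like the one above could be produced by a sketch along these lines. It is only an illustration under assumptions: the regular expressions in `TOKEN_SPEC`, the function name `scan`, and the use of a list index as the "pointer to symbol table entry" are all choices made here, not taken from the slides.

```python
import re

# Order matters: "**" must be tried before "*" so that expo_op wins.
TOKEN_SPEC = [
    ("num", r"[0-9]+"),
    ("id", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("expo_op", r"\*\*"),
    ("mult_op", r"\*"),
    ("assign_op", r"="),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(text):
    """Return (tokens, symbol_table) for `text`.

    An id token carries the index of its symbol table entry, a num token
    carries its integer value, and operator tokens carry only their type.
    """
    symbol_table = []
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue                    # white space produces no token
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme)))
        elif kind == "num":
            tokens.append(("num", int(lexeme)))
        else:
            tokens.append((kind,))
    return tokens, symbol_table
```

Calling `scan("E = M * C ** 2")` yields seven tokens in the same order as the listing, with the symbol table holding `E`, `M`, and `C` at indices 0, 1, and 2.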
The compiler may store the character string that forms a number in a symbol table and let the attribute of token const be a pointer to the table entry.
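One way to picture this alternative is the short sketch below, in which the table keeps the number's character string and the token's attribute is the entry's index. The helper names `intern_constant` and `const_token`, and the use of a plain list as the table, are assumptions for illustration.

```python
def intern_constant(table, lexeme):
    """Return the index of `lexeme` in `table`, appending it first if absent."""
    if lexeme not in table:
        table.append(lexeme)
    return table.index(lexeme)


def const_token(table, lexeme):
    """Build a const token whose attribute points at the table entry."""
    return ("const", intern_constant(table, lexeme))
```

Because the table stores the original characters, the scanner does not have to decide how to convert a string such as `3.14` to a numeric value; a later phase can do that from the table entry.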
Regular Expression vs. Finite Automata
Regular expression: a notation suitable for describing tokens (patterns).
A regular expression can be converted into a finite automaton.
Finite automata: formal specifications of transition diagrams.
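As a hedged illustration of the correspondence, the regular expression letter (letter | digit)* for identifiers can be written by hand as a two-state deterministic finite automaton. The state numbers, the character classes, and the names `TRANSITIONS`, `ACCEPTING`, and `accepts` are assumptions made for this sketch; it is a hand-built automaton, not the output of a conversion algorithm.

```python
def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch in "0123456789":
        return "digit"
    return "other"


# State 0 is the start state; state 1 means "inside an identifier".
TRANSITIONS = {
    (0, "letter"): 1,
    (1, "letter"): 1,
    (1, "digit"): 1,
}
ACCEPTING = {1}


def accepts(s):
    """Run the automaton over `s`; return True if it ends in an accepting state."""
    state = 0
    for ch in s:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:               # no transition defined: reject
            return False
    return state in ACCEPTING
```

The dictionary plays the role of the transition diagram: each key is an edge labelled with a character class, and a missing key means the diagram has no edge for that input.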
Definitions
symbol: an undefined entity.

This note was uploaded on 06/28/2011 for the course ENGINEERIN 100 taught by Professor Yangwei during the Spring '10 term at National Cheng Kung University.
