cs160-lec2-3

cs160-lec2-3 - 1 CMPSC 160 Translation of Programming...


CMPSC 160 Translation of Programming Languages
Lectures 2 & 3: Lexical Analysis (Scanning)

Reading Assignment
Read Chapter 2 from the textbook.

First Phase: Lexical Analysis (Scanning)
Scanner
- Maps a stream of characters into words, the basic unit of syntax
- The characters that form a word are its lexeme
- Its syntactic category is called its token
- Discards white space and comments

(Figure: Source code → Scanner → token → Parser → IR. The parser requests each token from the scanner ("get next token"); errors are reported.)

Why Lexical Analysis?
By separating context-free syntax from lexical analysis:
- We can develop efficient scanners
- We can automate efficient scanner construction
- We can write simple specifications for tokens

(Figure: specifications (regular expressions) → Scanner Generator → tables or code for the Scanner; source code → Scanner → tokens.)

What are Tokens?
A token is the basic unit of syntax; tokens are the atoms.
- Keywords: if, while, ...
- Operators: +, *, <=, ||, ...
- Identifiers (names of variables, arrays, procedures, classes): i, i1, j1, count, sum, ...
- Numbers: 12, 3.14, 7.2E-2, ...

What are Tokens? (continued)
Tokens are terminal symbols for the parser: they are treated as indivisible units in the grammar that defines the source language.

1. S    → expr
2. expr → expr op term
3.      | term
4. term → number
5.      | id
6. op   → +
7.      | -

number, id, +, and - are tokens passed from the scanner to the parser. They form the terminal symbols of this simple grammar.

Tokens can have Attributes
A problem: if the scanner sends the parser only token names, is that enough? Where are the variable names, procedure names, etc.? All identifiers look the same.
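To make the problem concrete, here is a minimal hand-written scanner for the toy grammar above, sketched in Python. The language choice and the function name `scan` are illustrative assumptions, not part of the lecture. The scanner discards white space and emits only token names; as a simplification, a number is any run of digits. Because only the names are kept, two different expressions yield the same token stream.

```python
import string

DIGITS = set(string.digits)
LETTERS = set(string.ascii_letters)


def scan(text: str) -> list[str]:
    """Scan `text` into token names for the toy grammar (number, id, +, -).

    Hypothetical helper: only the token names are kept; the lexemes are dropped.
    """
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            # White space separates words but is not itself a token.
            i += 1
        elif ch in DIGITS:
            # number (simplified): a run of one or more digits.
            while i < len(text) and text[i] in DIGITS:
                i += 1
            tokens.append("number")
        elif ch in LETTERS:
            # id: a letter followed by letters and digits.
            i += 1
            while i < len(text) and (text[i] in LETTERS or text[i] in DIGITS):
                i += 1
            tokens.append("id")
        elif ch in "+-":
            # Each operator is its own token.
            tokens.append(ch)
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at position {i}")
    return tokens


print(scan("count + 42 - x1"))           # ['id', '+', 'number', '-', 'id']
print(scan("x - 1") == scan("sum - 7"))  # True: the two inputs are indistinguishable
```

The parser would receive `id - number` for both `x - 1` and `sum - 7`, which is exactly why tokens need attributes.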
Tokens can have attributes that they pass to the parser (using the symbol table). For example,

if (i == j) z = 0; else z = 1;

becomes, without attributes,

IF, LPAREN, ID, EQEQ, ID, RPAREN, ID, EQ, NUM, SEMICOLON, ELSE, ID, EQ, NUM, SEMICOLON

and, with attributes,

IF, LPAREN, <ID, i>, EQEQ, <ID, j>, RPAREN, <ID, z>, EQ, <NUM, 0>, SEMICOLON, ELSE, <ID, z>, EQ, <NUM, 1>, SEMICOLON

Lexical Concepts
- Token: the basic unit of syntax; the syntactic output of the scanner
- Pattern: the rule that describes the set of strings that correspond to a token, i.e., the specification of the token
- Lexeme: a sequence of input characters that matches a pattern and generates the token

Token   Lexeme                    Pattern
WHILE   while                     while
IF      if                        if
ID      i1, length, count, sqrt   letter followed by letters and digits

How do we specify lexical patterns?
Some patterns are easy:
- Keywords and operators, specified as literal patterns: if, then, else, while, =, +, ...
Some patterns are more complex:
- Identifiers: a letter followed by letters and digits
- Numbers
  - Integer: 0, or a digit between 1 and 9 followed by digits between 0 and 9
  - Decimal: an optional sign (+ or -), followed by the digit 0 or a nonzero digit followed by an arbitrary number of digits, followed by a decimal point, followed by an arbitrary number of digits

GOAL: We want concise descriptions of patterns, and we want to construct the scanner automatically from these descriptions.

Specifying Lexical Patterns
...
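The token, pattern, and lexeme concepts and the identifier and number patterns above can be tied together in a short sketch using Python's `re` module: each token is specified by a regular expression, and a driver loop matches the combined expression at the current position. Token names follow the slides' example; the `KEYWORDS` table, the `PLUS`/`MINUS` names, the `//` comment syntax, the function name `scan_tokens`, and the choice to carry attributes as `(name, value)` tuples (rather than symbol-table references) are all assumptions for illustration. Exponent forms such as 7.2E-2 are not covered by the two number patterns above and are not handled here.

```python
import re

# Patterns from the slides as Python regular expressions.
IDENT = r"[A-Za-z][A-Za-z0-9]*"               # a letter followed by letters and digits
INTEGER = r"0|[1-9][0-9]*"                    # 0, or a nonzero digit followed by digits
DECIMAL = r"[+-]?(?:0|[1-9][0-9]*)\.[0-9]*"   # optional sign, integer part, '.', digits

# Assumed reserved-word table: keywords are scanned as identifiers, then looked up.
KEYWORDS = {"if": "IF", "then": "THEN", "else": "ELSE", "while": "WHILE"}

# Token specifications, tried in order at each position. Python's alternation
# takes the first alternative that matches (not the longest), so EQEQ precedes EQ.
# NUM drops DECIMAL's optional sign so that "a-1" scans as ID, MINUS, NUM.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),                   # assumed comment syntax; discarded
    ("WS", r"\s+"),                             # white space; discarded
    ("NUM", rf"(?:{INTEGER})(?:\.[0-9]*)?"),    # integer or unsigned decimal
    ("ID", IDENT),
    ("EQEQ", r"=="),
    ("EQ", r"="),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SEMICOLON", r";"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan_tokens(text):
    """Return tokens; ID and NUM carry their attribute as a (name, value) pair."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind in ("COMMENT", "WS"):
            continue
        if kind == "ID" and lexeme in KEYWORDS:
            tokens.append(KEYWORDS[lexeme])           # keywords need no attribute
        elif kind == "ID":
            tokens.append(("ID", lexeme))             # attribute: the name
        elif kind == "NUM":
            value = float(lexeme) if "." in lexeme else int(lexeme)
            tokens.append(("NUM", value))             # attribute: the numeric value
        else:
            tokens.append(kind)
    return tokens


tokens = scan_tokens("if (i == j) z = 0; else z = 1;")
print(tokens)
# Dropping the attributes recovers the plain token sequence.
print([t if isinstance(t, str) else t[0] for t in tokens])
```

Scanning reserved words as identifiers and then consulting a table is one common design choice; it keeps `ifx` an identifier without needing a separate pattern for every keyword. A scanner generator would instead build such a matcher from the regular expressions automatically, which is the goal stated above.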

This note was uploaded on 11/23/2010 for the course MATH 104b taught by Professor Ceniceros,h during the Spring '08 term at UCSB.
