Guidebook 1

CSCI 3155 Student Notes: Lexical Analysis

1. Lexical Analysis

Coursework and Goals for This Part of the Course:

By the end of this section, you will be able to…
  o use flex notation to write regular expressions to describe the lexemes of token categories such as floating-point values or variable identifiers.
  o describe in English the pattern recognized by a given regular expression.
  o determine whether a given input string matches a given regular expression.
  o write a complete flex specification file to build a lexical analyzer for several dozen complex tokens.
  o use a flex-built lexical analyzer in a small test program.

Read Sebesta §1.1, 2.1, 3.1 to 3.3, 4.1 to 4.2

Lecture topics:
  o What we are studying
  o Why design a toy programming language?
  o The CU++ Language
  o The phases of a typical compiler
  o Lexemes, tokens and the job of the lexical analyzer
  o Regular expressions
  o Building a lexical analyzer with the flex tool
  o Example: www.cs.colorado.edu/~main/proglang/examples
  o Homework discussion

Flex Reference: www.delorie.com/gnu/docs/flex/flex_toc.html

Homework 1: Building a Lexical Analyzer.

1.1. Introduction to Lexical Analysis

The smallest meaningful element of a program is called a lexeme. For example, some lexemes in a C++ program might be a numeric constant 42.87, a variable named count, and a reserved word such as while. The first stage of a compiler, called the lexical analyzer, analyzes a source program, breaking it into lexemes.

The various different lexemes are further classified into categories called token categories. Some token categories are large (such as the category that contains all C++ identifiers), but other token categories might consist of just a single lexeme (such as a category that contains only the reserved word while).

The lexical analyzer is also called the lexer or scanner. The lexer assigns a symbolic name or number to each token category; these symbols or numbers are just called tokens.
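The classification of lexemes into token categories can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the flex-based lexer built in the homework: the token names, the function names classify and token_name, and the two simplified regular expressions are all hypothetical and are not the official CU++ definitions.

```cpp
#include <regex>
#include <string>

// Hypothetical token names for this sketch; not the official CU++ token set.
enum class Token { WHILE, IDENTIFIER, FLOATVALUE, UNKNOWN };

// Classify one complete lexeme into a token category.
Token classify(const std::string& lexeme) {
    // Simplified patterns (assumptions): digits '.' digits for floating-point
    // values, and a letter or underscore followed by letters, digits, or
    // underscores for identifiers.
    static const std::regex float_re("[0-9]+\\.[0-9]+");
    static const std::regex ident_re("[A-Za-z_][A-Za-z0-9_]*");

    // The reserved word is tested first because "while" also matches ident_re.
    if (lexeme == "while") return Token::WHILE;
    if (std::regex_match(lexeme, float_re)) return Token::FLOATVALUE;
    if (std::regex_match(lexeme, ident_re)) return Token::IDENTIFIER;
    return Token::UNKNOWN;
}

// Printable name for a token, handy in a small test program.
std::string token_name(Token t) {
    switch (t) {
        case Token::WHILE:      return "WHILE";
        case Token::IDENTIFIER: return "IDENTIFIER";
        case Token::FLOATVALUE: return "FLOATVALUE";
        default:                return "UNKNOWN";
    }
}
```

Under these assumptions, classify("42.87") yields Token::FLOATVALUE, classify("count") yields Token::IDENTIFIER, and classify("while") yields Token::WHILE. The WHILE category contains a single lexeme, while IDENTIFIER contains an unbounded number of them, which mirrors the small and large token categories described above.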

For example, here's a C++ statement:

    student += (knowledge - 42.0);

The lexer for a C++ compiler might find the following sequence of lexemes for the statement:

    Lexeme Found by the Lexer    Token
    student                      IDENTIFIER
    +=                           PLUSASSIGN
    (                            LEFTPAREN
    knowledge                    IDENTIFIER
    -                            MINUS
    42.0                         FLOATVALUE
    )                            RIGHTPAREN
    ;                            SEMICOLON

Spaces in the C++ statement sometimes serve to separate lexemes from one another, but such spaces are discarded by the lexer.

The rest of this section describes the lexemes and tokens of the CU++ Programming Language. We also discuss how you will implement a lexer for our language using a programming tool called flex, but most of the implementation discussion will be in class.

1.2. Reserved words