CS544 Class2

Course: CS 544, Fall 2008
School: Uni. Worcester
CS544: Class 2 Scanning 1

Overview

Scanning is also called lexical analysis. This is because the scanner analyzes the input stream, which consists of lexical elements called lexemes. The term lexeme comes from the field of linguistics, where it means the smallest unit of a language that has meaning. Scanning is, perhaps, the part of compilation that is most based upon formal mathematics and the foundations of computer science.

For this class, we are ignoring preprocessing as a phase in the compilation process. There are many language systems that apply one or more preprocessors to the program. Preprocessing techniques are basically a subset of the material we will cover, although preprocessors can present their own set of problems.

First program transformation

The scanner converts the source program's stream of lexemes into an equivalent representation of tokens. Tokens represent the lexeme and the context of the lexeme. For example, a typical token contains:
  - The type of lexeme (e.g., IDENTIFIER)
  - The lexeme's image ("aVariable")
  - The lexeme's position in the input stream
The next part of the compiler, the parser, works on a stream of tokens.

Clean up the input

Scanners normalize the input stream. There is often unwanted text, such as comments, white space, and other lexemes that have nothing to contribute to the program translation. The scanner removes it, or puts it in some special form, so that the parser does not have to attend to it.

Lexeme interpretation

One of the interesting decisions a compiler writer must make is what type of analysis and synthesis operations should go in which compiler component. For example, consider the following part of a program's source: -1.23. How many lexemes (and corresponding tokens) are there? Various answers, all correct, are 1, 2, and 4. Let's look a little further into this.
If your scanner takes into account the structure of real and signed numbers, there is a single token that represents a real number with a value of -1.23. Perhaps your scanner separates the sign from the number. Then you have two tokens: the first represents the minus sign, and the second represents the positive real number 1.23. Maybe your scanner takes even smaller bites of the input. Then you would have the four lexemes "-", "1", ".", and "23".

The last approach makes for a simpler scanner. It does, however, put more of a burden upon the parser. You need to decide which is more appropriate. My preference is to use the last approach. This has a couple of benefits:
  - The scanner is not responsible for determining the appropriate format for numbers.
  - The scanner does not have to determine whether the text 1-2.3 is an arithmetic expression or two numbers.

Theoretical foundations

There are two theoretical building blocks that have helped make scanner construction efficient and automatic: finite automata and regular expressions.

Regular expressions

Regular expressions give us a way to formally express the lexical structure of a language. From your foundations course you should recall that we can build regular expressions using the following rules. Let Σ be the input alphabet. If a ∈ Σ, then a is a regular expression (denoting the set containing a). If a and b are regular expressions, then:
  - (a) is a regular expression.
  - a | b is a regular expression.
  - ab is a regular expression.
  - a* is a regular expression.

Finite automata

Finite automata are a way of representing the structure of regular languages, and hence, regular expressions. There are two types of finite automata: deterministic (DFA) and non-deterministic (NFA). Both of these are used in the construction of many types of scanners. Even when scanners are written by hand, the state-based approach and results from finite automata are used.
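
To make the state-based approach concrete, here is a minimal sketch of a hand-written scanner that takes the "smallest bites" view of the input and emits tokens carrying a type, an image, and a position. The token names (MINUS, DOT, DIGITS), the state names, and the function scan are assumptions made for illustration; they do not come from the course materials.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str      # type of lexeme, e.g. "DIGITS" (hypothetical name)
    image: str     # the lexeme's text
    position: int  # offset of the lexeme in the input stream


DIGIT_CHARS = "0123456789"
# Single-character lexemes and their (hypothetical) token types.
PUNCTUATION = {"-": "MINUS", ".": "DOT"}


def scan(text: str) -> Iterator[Token]:
    """Yield tokens using two explicit states: START and IN_DIGITS."""
    state, start = "START", 0
    # "\0" is appended as a sentinel so a trailing digit run is flushed.
    for pos, ch in enumerate(text + "\0"):
        if state == "IN_DIGITS":
            if ch in DIGIT_CHARS:
                continue  # stay in IN_DIGITS and keep extending the lexeme
            yield Token("DIGITS", text[start:pos], start)
            state = "START"  # fall through and re-examine ch from START
        if ch in DIGIT_CHARS:
            state, start = "IN_DIGITS", pos
        elif ch in PUNCTUATION:
            yield Token(PUNCTUATION[ch], ch, pos)
        elif ch.isspace() or ch == "\0":
            continue  # white space is dropped, so the parser never sees it
        else:
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
```

Under these assumptions, scan("-1.23") yields four tokens (MINUS, DIGITS, DOT, DIGITS), and scan("1-2.3") yields five. Whether the latter is a subtraction or two numbers is left for the parser to decide.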

Three important results

When building a scanner, you will deal with tables that represent states of finite automata, and with operations that combine regular expressions. In order to make your scanner efficient, you will base your code (or the scanner generator will base its generated code) on three particular results from theory:
  - Kleene's construction, which takes a DFA to a regular expression. This is important for ensuring that the DFA that you use actually represents the language you think it does.
  - Thompson's construction, which creates an NFA from a regular expression. This is the first step in converting the language of the regular expression grammar to an executable scanner.
  - Subset construction, which transforms an NFA to a corresponding DFA. The DFA is the final form we want to have for implementing the scanner, since our programming languages are (usually) deterministic.
There is a fourth result that you use to minimize the DFA that results from the application of subset construction: Hopcroft's algorithm.

Example

Let's take an example through the complete cycle. We will start with a regular expression, R: c(c|d|u)*. If you think of c as representing the set of letter characters, d as representing the digits, and u as representing the underscore character, then you can see that R describes the syntax for identifiers in many programming languages.

Thompson's construction

We need to construct an NFA from the regular expression. This is done in the following sequence:
  - Create the trivial NFAs for each character: c, d, and u.
  - Using the NFAs for regular expression operators (fig. 2.4 in the textbook), combine the simpler NFAs into more complex ones in the following order: c|d, c|d|u, (c|d|u)*, c(c|d|u)*.

Subset construction

Use the subset construction algorithm shown in fig. 2.6 of the text to create a DFA from the ...
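
The two constructions can be sketched in a few lines for the running example R = c(c|d|u)*. This is a minimal illustration under stated assumptions, not the textbook's figures: the NFA fragments are plain (start, accept, edges) tuples, the function names are hypothetical, and the input is written directly as a string of character classes (c, d, u) rather than real characters.

```python
from itertools import count

EPS = None      # label used for epsilon transitions
_ids = count()  # source of fresh NFA state numbers


# Thompson-style fragments: each is (start, accept, list of (src, label, dst)).
def symbol(a):
    s, f = next(_ids), next(_ids)
    return s, f, [(s, a, f)]


def union(x, y):
    s, f = next(_ids), next(_ids)
    return s, f, x[2] + y[2] + [(s, EPS, x[0]), (s, EPS, y[0]),
                                (x[1], EPS, f), (y[1], EPS, f)]


def concat(x, y):
    return x[0], y[1], x[2] + y[2] + [(x[1], EPS, y[0])]


def star(x):
    s, f = next(_ids), next(_ids)
    return s, f, x[2] + [(s, EPS, x[0]), (x[1], EPS, f),
                         (x[1], EPS, x[0]), (s, EPS, f)]


def closure(states, edges):
    """Epsilon-closure: every state reachable through EPS edges alone."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return frozenset(seen)


def subset_construction(nfa, alphabet):
    """Build a DFA whose states are sets of NFA states."""
    start, accept, edges = nfa
    d_start = closure({start}, edges)
    table, todo = {}, [d_start]
    while todo:
        S = todo.pop()
        if S in table:
            continue
        table[S] = {}
        for a in alphabet:
            moved = {dst for src, label, dst in edges if src in S and label == a}
            if moved:
                table[S][a] = closure(moved, edges)
                todo.append(table[S][a])
    accepting = {S for S in table if accept in S}
    return d_start, accepting, table


def accepts(dfa, classes):
    """Run the DFA over a string of character classes such as "cdu"."""
    state, accepting, table = dfa
    for a in classes:
        state = table[state].get(a)
        if state is None:
            return False  # no transition: the input is rejected
    return state in accepting


# Build R in the order given above: c|d, c|d|u, (c|d|u)*, c(c|d|u)*.
inner = union(union(symbol("c"), symbol("d")), symbol("u"))
R = concat(symbol("c"), star(inner))
dfa = subset_construction(R, "cdu")
```

With this sketch, accepts(dfa, "cdu") is true (letter, digit, underscore), while accepts(dfa, "dc") is false because an identifier may not start with a digit. The DFA produced this way is not minimized; that is the job of the fourth result, Hopcroft's algorithm.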
