This question was created from parser10.cpp

Question

This question was created from parser10.cpp https://www.coursehero.com/file/23682935/parser10cpp/


#include <iostream>
#include <stdlib.h>
#include <string>
#include "wktokens.h"
using namespace std ;

// the last token read - start with ?
static wktokens *tokeniser = NULL ;
static string token = "?" ;
static string tokenclass = "?" ;
static string tokenvalue = "?" ;

// useful parsing functions
void nextToken()
{
    token = tokeniser->next_token() ;
    tokenclass = tokeniser->token_class() ;
    tokenvalue = tokeniser->token_value() ;
}

// if we have the expected token - move to next token otherwise give up!
void mustbe(string expected)
{
    if ( expected != token )
    {
        cout << "Error: found token \"" << token << "\" but expected \"" << expected << "\"" << endl ;
        exit(-1) ;
    }
    nextToken() ;
}

// if we have the expected token - move to next token and return true
bool have(string expected)
{
    if ( expected != token ) return false ;
    nextToken() ;
    return true ;
}

// the grammar we are recognising
// TERM: DEFINITION
// program: declarations statement
// declarations: ('var' identifier ';')*
// statement: whileStatement |
//            ifStatement |
//            letStatement |
//            '{' statementSequence '}'
// whileStatement: 'while' '(' condition ')' statement
// ifStatement: 'if' '(' condition ')' statement ('else' statement)?
// letStatement: 'let' identifier '=' expression ';'
// statementSequence: (statement)*
// expression: term (op term)?
// condition: term relop term
// term: identifier | integerConstant

// TOKEN: DEFINITION
// identifier: ('a'-'z'|'A'-'Z')('a'-'z'|'A'-'Z'|'0'-'9')*
// integerConstant: ('0'-'9')('0'-'9')*
// relop: '<' | '<=' | '==' | '!=' | '>' | '>='
// op: '+' | '-' | '*' | '/'
// keyword: 'var' | 'while' | 'if' | 'else' | 'let'
// symbol: '{' | '}' | '(' | ')' | ';' | '='

// since parsing is recursive, forward declare one function to parse each non-terminal:
void parseProgram() ;
void parseDeclarations() ;
void parseStatement() ;
void parseWhileStatement() ;
void parseIfStatement() ;
void parseLetStatement() ;
void parseStatementSequence() ;
void parseExpression() ;
void parseCondition() ;
void parseTerm() ;

// now implement the parsing functions
void parseProgram() { }
void parseDeclarations() { }

void parseStatement()
{
    if ( have("while") ) parseWhileStatement() ; else
    if ( have("if") ) parseIfStatement() ; else
    if ( have("let") ) parseLetStatement() ; else
    {
        mustbe("{") ;
        parseStatementSequence() ;
        mustbe("}") ;
    }
}

void parseWhileStatement() { }
void parseIfStatement() { }
void parseLetStatement() { }
void parseStatementSequence() { }
void parseExpression() { }
void parseCondition() { }
void parseTerm() { }

// call parseProgram from here
int main()
{
    // create the tokeniser and read the first token to initialise it
    tokeniser = wktokens::newtokeniser() ;
    nextToken() ;

    // parse a Program
    parseProgram() ;

    // check for end of file
    mustbe("?") ;
}

Program Structure

So that you do not have to worry about how to tokenise the input, we have provided a working tokeniser. The most useful functions are next_token(), mustbe() and have(). next_token() calls the tokeniser and returns the next token in the input. The token kinds can be found in the file includes/tokeniser.h, and the functions token_kind(), token_spelling() and token_ivalue() can be used to inspect the contents of a token object.
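The provided tokeniser is not shown, so here is a minimal sketch of a hand-written tokeniser for the TOKEN definitions in the grammar comments, to make the token classes concrete. It is a hypothetical stand-in, not wktokens: the Token struct and tokenise() are invented for illustration, and it assumes keywords and symbols are reported by their own spelling, identifiers as "identifier", integers as "integerConstant", and the end of input as "?" (matching the mustbe("?") check in main()).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for wktokens, NOT the provided tokeniser.
// Assumption: keywords and symbols are reported by their own spelling,
// identifiers as "identifier", integers as "integerConstant", and the
// end of input (or an unrecognised character) as "?".
struct Token
{
    std::string token ;   // what mustbe()/have() would compare against
    std::string value ;   // the spelling found in the input
} ;

std::vector<Token> tokenise(const std::string &src)
{
    static const std::vector<std::string> keywords = { "var", "while", "if", "else", "let" } ;
    static const std::string symbols = "{}();=<>+-*/" ;   // single-character tokens
    std::vector<Token> out ;
    std::size_t i = 0 ;
    while ( i < src.size() )
    {
        unsigned char c = src[i] ;
        if ( std::isspace(c) ) { i++ ; continue ; }

        std::size_t start = i ;
        if ( std::isalpha(c) )                      // identifier or keyword
        {
            while ( i < src.size() && std::isalnum((unsigned char)src[i]) ) i++ ;
            std::string word = src.substr(start, i - start) ;
            bool is_keyword = false ;
            for ( const std::string &k : keywords ) if ( k == word ) is_keyword = true ;
            out.push_back({ is_keyword ? word : std::string("identifier"), word }) ;
        }
        else if ( std::isdigit(c) )                 // integerConstant
        {
            while ( i < src.size() && std::isdigit((unsigned char)src[i]) ) i++ ;
            out.push_back({ "integerConstant", src.substr(start, i - start) }) ;
        }
        else
        {
            std::string two = src.substr(i, 2) ;    // try two-character relops first
            if ( two == "<=" || two == ">=" || two == "==" || two == "!=" )
            {
                out.push_back({ two, two }) ;
                i += 2 ;
            }
            else if ( symbols.find(src[i]) != std::string::npos )
            {
                out.push_back({ std::string(1, src[i]), std::string(1, src[i]) }) ;
                i++ ;
            }
            else break ;                            // unrecognised character: stop here
        }
    }
    out.push_back({ "?", "" }) ;                    // end of input
    return out ;
}
```

For example, tokenise("let x1 = 42 ;") yields let, identifier (x1), =, integerConstant (42), ; and finally ?.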
If there are any errors constructing the next token, or the end of the input is reached, the returned token kind will be tk_eoi.

mustbe() is used when you know what the last symbol read must be. It checks whether the last symbol read is the expected symbol and terminates the program with an error if it is not. If no error occurred, it reads the next symbol from the input and returns the matching token.

have() is used when you want to know whether a particular symbol has just been read. It checks whether the last symbol read is the expected symbol and returns true if it is, otherwise false.

In some situations you want to check whether a token is one of several possibilities, such as a relational operator or the start of one of the statements. To accommodate this, the tokeniser has four group token kinds, tk_statement, tk_infix_op, tk_relop and tk_term, that can be used with mustbe() and have(). Because mustbe() advances the input when it finds a matching token, the parser may still need to know which token it found; for this reason, mustbe() returns the token that it found.

Parsing Functions

In a recursive descent parser we start by writing a function for each term (non-terminal) in the programming language's grammar, which is responsible for parsing that term. In this parser, each function also returns an abstract syntax tree node representing what it just parsed. For example, for the grammar shown above we would start with the functions:

ast parseProgram() ;
ast parseDeclarations() ;
ast parseStatement() ;
ast parseWhileStatement() ;
ast parseIfStatement() ;
ast parseLetStatement() ;
ast parseStatementSequence() ;
ast parseExpression() ;
ast parseCondition() ;
ast parseTerm() ;

You need to complete the bodies of these parse functions.
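The question asks for the empty bodies to be filled in. Below is one possible sketch for the void recogniser in parser10.cpp (not the ast-returning version described in the prose), written directly from the grammar comments. Because wktokens is not available here, it assumes, hypothetically, that the tokeniser reports identifiers as "identifier", integers as "integerConstant", and keywords and symbols by their own spelling. The tokens vector, isOneOf() and parses() are invented helpers, and mustbe() throws instead of calling exit(-1) so that invalid input can be tested.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>
using namespace std ;

// Hypothetical stand-in for wktokens: a pre-split token list, "?" at the end.
static vector<string> tokens ;
static size_t pos = 0 ;
static string token = "?" ;
void nextToken() { token = pos < tokens.size() ? tokens[pos++] : "?" ; }

// Deviation from the question: throw instead of exit(-1) so errors can be tested.
void mustbe(string expected)
{
    if ( expected != token )
        throw runtime_error("found token \"" + token + "\" but expected \"" + expected + "\"") ;
    nextToken() ;
}
bool have(string expected) { if ( expected != token ) return false ; nextToken() ; return true ; }

// true if the current token is one of the choices (does not advance)
bool isOneOf(const vector<string> &choices)
{
    for ( const string &c : choices ) if ( c == token ) return true ;
    return false ;
}

void parseStatement() ;

// term: identifier | integerConstant
void parseTerm() { if ( !have("identifier") ) mustbe("integerConstant") ; }

// expression: term (op term)?
void parseExpression()
{
    parseTerm() ;
    if ( isOneOf({ "+", "-", "*", "/" }) ) { nextToken() ; parseTerm() ; }
}

// condition: term relop term
void parseCondition()
{
    parseTerm() ;
    if ( !isOneOf({ "<", "<=", "==", "!=", ">", ">=" }) )
        throw runtime_error("found token \"" + token + "\" but expected a relop") ;
    nextToken() ;
    parseTerm() ;
}

// parseStatement() has already consumed 'let', 'while' or 'if' with have()
void parseLetStatement() { mustbe("identifier") ; mustbe("=") ; parseExpression() ; mustbe(";") ; }
void parseWhileStatement() { mustbe("(") ; parseCondition() ; mustbe(")") ; parseStatement() ; }
void parseIfStatement()
{
    mustbe("(") ; parseCondition() ; mustbe(")") ; parseStatement() ;
    if ( have("else") ) parseStatement() ;   // optional else part
}

// statementSequence: (statement)* -- loop while the token can start a statement
void parseStatementSequence() { while ( isOneOf({ "while", "if", "let", "{" }) ) parseStatement() ; }

void parseStatement()   // as given in the question
{
    if ( have("while") ) parseWhileStatement() ; else
    if ( have("if") ) parseIfStatement() ; else
    if ( have("let") ) parseLetStatement() ; else
    { mustbe("{") ; parseStatementSequence() ; mustbe("}") ; }
}

// declarations: ('var' identifier ';')*
void parseDeclarations() { while ( have("var") ) { mustbe("identifier") ; mustbe(";") ; } }

// program: declarations statement
void parseProgram() { parseDeclarations() ; parseStatement() ; }

// Hypothetical test driver: tokens in src must be separated by spaces.
bool parses(const string &src)
{
    static const vector<string> keywords = { "var", "while", "if", "else", "let" } ;
    tokens.clear() ; pos = 0 ;
    istringstream in(src) ;
    string w ;
    while ( in >> w )
    {
        bool kw = false ;
        for ( const string &k : keywords ) if ( k == w ) kw = true ;
        if ( !kw && isalpha((unsigned char)w[0]) ) w = "identifier" ;
        else if ( isdigit((unsigned char)w[0]) ) w = "integerConstant" ;
        tokens.push_back(w) ;
    }
    try { nextToken() ; parseProgram() ; mustbe("?") ; return true ; }
    catch ( const runtime_error & ) { return false ; }
}
```

Because parseStatement() consumes the keyword with have() before dispatching, parseWhileStatement(), parseIfStatement() and parseLetStatement() start at the token after the keyword. And since program is declarations followed by a single statement, two top-level let statements are rejected unless they are wrapped in { }.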
The parser_xml() function initialises the tokeniser for you with a call to next_token() and then calls the parseProgram() function.

Example Parse Function

To illustrate how we use the have() and mustbe() functions, here is what the parseStatementSequence() function could look like:

// statementSequence: '{' statement* '}'
ast parseStatementSequence()
{
    vector<ast> seq ;
    mustbe(tk_lcb) ;
    while ( have(tk_statement) )
    {
        seq.push_back(parseStatement()) ;
    }
    mustbe(tk_rcb) ;
    return create_statements(seq) ;
}

The statementSequence rule states that a statementSequence must start with a '{' (tk_lcb), may be followed by zero or more statements, and finishes with a '}' (tk_rcb). Therefore our parse function starts with a call of mustbe(tk_lcb), so that a fatal error is reported if that is not the next token. The input will then have moved on to the next token, which should be either the start of a statement or a '}'. The group token tk_statement matches any token that can start a statement, so we call have(tk_statement) to tell whether the next token starts a statement. While it does, we go around the while loop parsing statements by calling parseStatement(). When the next token cannot start a statement, the loop terminates. If the input is correct, there must then be a '}' to complete the statement sequence, so the last parsing step is to call mustbe(tk_rcb), which reports a fatal error if that is not the next token.

This example also shows the approach taken to creating abstract syntax tree nodes. In each parse function we collect the components of a tree node and only create the node as the last step. In this case we do not know how many times we will call parseStatement() to create statement nodes, so we collect them in a vector which is finally passed to the create_statements() function.
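The collect-then-create pattern can be tried in isolation with a small self-contained sketch. Everything here is hypothetical: Node, ast, create_statements(), the toks/at token stream and the toy rule statement: 'let' name ';' are invented for illustration, since the real ast type and create_* functions are not shown. Following the prose, have() here only checks the current token, while mustbe() checks, advances and returns the token it found.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical tree node; the course's real ast type is not shown.
struct Node
{
    std::string kind ;                            // e.g. "statements" or "let"
    std::string text ;                            // spelling for leaf-like nodes
    std::vector<std::shared_ptr<Node>> children ;
} ;
using ast = std::shared_ptr<Node> ;

// Hypothetical token stream; "?" stands for the end of input.
static std::vector<std::string> toks ;
static std::size_t at = 0 ;
std::string current() { return at < toks.size() ? toks[at] : "?" ; }

// As described in the prose: have() only checks, mustbe() checks, advances
// and returns the token it found.
bool have(const std::string &t) { return current() == t ; }
std::string mustbe(const std::string &t)
{
    std::string found = current() ;
    if ( found != t ) throw std::runtime_error("expected " + t + " but found " + found) ;
    if ( at < toks.size() ) at++ ;
    return found ;
}

ast create_statements(const std::vector<ast> &seq)
{
    return std::make_shared<Node>(Node{ "statements", "", seq }) ;
}

// Toy rule for this sketch only: statement: 'let' name ';'
ast parseStatement()
{
    mustbe("let") ;
    ast node = std::make_shared<Node>(Node{ "let", current(), {} }) ;
    if ( at < toks.size() ) at++ ;              // accept any name
    mustbe(";") ;
    return node ;
}

// statementSequence: '{' statement* '}' -- collect children, build the node last
ast parseStatementSequence()
{
    std::vector<ast> seq ;
    mustbe("{") ;
    while ( have("let") ) seq.push_back(parseStatement()) ;
    mustbe("}") ;
    return create_statements(seq) ;
}
```

The parent node is only built once the closing '}' has been matched, so a partially parsed sequence never produces a half-built node.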
Variable Declarations

As part of writing the parser you must construct tree nodes to represent variables, encoding each variable's name, type, segment and offset within its segment. In the simple language all variables are stored in the local segment and are of type int. This can be achieved by using a symbol table to record which variables have been declared and where they are actually stored in the local segment.

When a new variable is declared, the declare_variable() function is called to add the variable to the symbol table, and the variable is allocated the next free memory location in the local segment. If the variable has already been declared, an error message is printed and the program exits. Once the variable has been added to the symbol table, declare_variable() returns a tree node representation of the variable.

When a variable is used in a statement or expression, the lookup_variable() function is called to find out where the variable is stored in the local segment. If the variable has not been declared, an error message is printed and the program exits. If the variable is present in the symbol table, lookup_variable() returns a tree node representation of the variable.

Part 3 - Error Handling

If you have completed steps 1 and 2, it is instructive to consider how to report syntax errors. For example, what information goes into the error message, how do you represent the location of the error, and should we attempt to continue parsing so that we can identify as many syntax errors as possible?

Recovering from a syntax error may not always be possible, because the parser may not be able to tell what the programmer was trying to write, and therefore how to correct the error. There are a number of options available. The easiest is simply to give up completely, so the compiler never has to handle more than one error. If the compiler writer wants to look for further errors, there are two main approaches: either pretend that an expected symbol was present after all and continue parsing the program on that basis, or assume that some extra input is present and delete input until the expected symbol is finally found. Many compilers adopt a combination of both.

The way to implement the reporting and recovery is to write your own version of the mustbe() function. When the first error is discovered, your mustbe() function simply reports the error and returns. This, in effect, simulates inserting the missing symbol into the program and continuing as if the program were correct. The second time an error is discovered, your mustbe() function reports the error but, before it returns, reads and discards tokens until it finds either the expected token or the end of the input.

For this step, write your own mustbe() function so that it can recover from syntax errors as just described. If you now give your parser an incorrect program, what error messages would help you, as a programmer, work out what your mistake was? You may need to use a different name for your mustbe() function and edit your parser.cpp file to use the new name throughout.

Please fill in the empty functions!
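A symbol table like the one described under Variable Declarations could be sketched as follows. The names declare_variable() and lookup_variable() come from the text, but their signatures here, the VarNode struct standing in for a tree node, and the use of std::map are assumptions. The sketch also throws std::runtime_error where the text says an error is printed and the program exits, so the behaviour can be tested.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a variable tree node: name, type, segment, offset.
struct VarNode
{
    std::string name ;
    std::string type ;      // always "int" in the simple language
    std::string segment ;   // always "local" in the simple language
    int offset ;            // location within the local segment
} ;

static std::map<std::string, VarNode> symbol_table ;
static int next_free_local = 0 ;   // next free location in the local segment

// The text says a duplicate declaration prints an error and exits;
// this sketch throws instead so the behaviour can be tested.
VarNode declare_variable(const std::string &name)
{
    if ( symbol_table.count(name) != 0 )
        throw std::runtime_error("variable " + name + " has already been declared") ;
    VarNode v{ name, "int", "local", next_free_local++ } ;
    symbol_table.emplace(name, v) ;
    return v ;
}

// Likewise, an undeclared variable throws here rather than exiting.
VarNode lookup_variable(const std::string &name)
{
    auto it = symbol_table.find(name) ;
    if ( it == symbol_table.end() )
        throw std::runtime_error("variable " + name + " has not been declared") ;
    return it->second ;
}
```

In parser10.cpp, tokenvalue presumably holds the identifier's spelling; since mustbe("identifier") calls nextToken() and overwrites it, the spelling would need to be saved before that call and then passed to declare_variable() (in parseDeclarations()) or lookup_variable() (in parseTerm()).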
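The two-stage recovery described in Part 3 can be sketched as a replacement for mustbe(). The name mustbe_recover(), the toks/at token stream and the errors counter are hypothetical. As an interpretation of the text, every error after the first is treated like the second one (skip input until the expected token), and the expected token is consumed when it is found so that parsing can continue. Each message reports the token found and the token expected, which seems a reasonable starting point for helping a programmer locate the mistake.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical token stream standing in for wktokens; "?" marks end of input.
static std::vector<std::string> toks ;
static std::size_t at = 0 ;
static std::string token = "?" ;
static int errors = 0 ;          // syntax errors reported so far

void nextToken() { token = at < toks.size() ? toks[at++] : "?" ; }

// Error-recovering mustbe() (hypothetical name).
// 1st error: report and return, as if the missing token had been inserted.
// Later errors: report, then discard tokens until the expected one or end of input.
void mustbe_recover(const std::string &expected)
{
    if ( token == expected ) { nextToken() ; return ; }
    errors++ ;
    std::cerr << "Error " << errors << ": found \"" << token
              << "\" but expected \"" << expected << "\"" << std::endl ;
    if ( errors == 1 ) return ;                                // pretend it was there
    while ( token != expected && token != "?" ) nextToken() ;  // delete extra input
    if ( token == expected ) nextToken() ;                     // then accept the match
}
```

Inserting a token on the first error costs nothing and often lets parsing continue; falling back to deletion afterwards guarantees progress, because each later call either consumes input or reaches the end.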

