Parsing and Syntax-Directed Translation: A Comprehensive Guide
Top-Down Parsing: Table-driven LL(I) Parser. Top-down parsers cannot handle left-recursive grammars. Left factoring example: In recursive descent, at each step, there are many choices of production to use. In LL(1), at each step, there is only one choice of production. Bottom-Up Parsing: It traces a rightmost derivation in reverse. A handle is a string that can be reduced and also allows further reductions back to the start symbol. In shift-reduce parsing, handles always appear at the top of the stack. R is the follow of the nonterminal in that state. Add an extra production S’ –> Start to the grammar. S –> .Start;
LR(1) is more powerful. LR(0) is basically SLR. Syntax-Directed Translation: Approaches to syntax error recovery: Panic mode (when an error is detected, discard tokens until one with a clear role is found, e.g., delimiters such as semicolon, }); error productions (promotes common errors to alternate syntax, e.g., 5x instead of 5*x); automatic local or global correction (idea: find a correct “nearby” program). Abstract syntax trees (AST): operators appear at internal nodes, not at leaves, “chains” of single productions are collapsed, syntactic details are omitted (e.g., (), , ;); Example: consider a grammar: E → int | (E) | E + E. And the string 5+(2+3)
Constructing an AST: we assume int.lexval is the value of the integer lexeme. , look at the AST tree above. Syntactic action: used for constructing AST. Lexical analysis: detects inputs with illegal tokens. Parsing: Detects inputs with ill-formed parse trees. Semantic analysis: Catches all remaining errors. Classes of errors: Lexical – detected by scanner, e.g., illegal character in input, illegal comments; Syntactic: detected by the parser, input is not a legal program because it cannot be parsed by CFG, e.g., a = *5; Static semantic: can be detected in parser or in separate semantic analysis passes, input can be parsed by CFG but some non-context-free error, e.g., multiply declared variables (a variable should be declared at most once), undeclared variables, functions call with the wrong number of arguments, type mismatches (type of the left-hand side of an assignment should match the type of the right-hand side); Semantic: may be detected at compile time, may also add checks in code for runtime, e.g., division by 0, array index out-of-bounds, null pointer dereference; Logical: hardest to detect, program is syntactically and semantically correct but does not do the “correct” things, compiler can sometimes detect problem things: if (x=0) result = 1; result = 2; The purpose of the symbol table is to keep track of names declared in the program. Scoping general rules: Determine which declaration of a named object corresponds to each use of the object. Each function has two or more scopes: one for the parameters, one for the function body, and possible additional scopes in the function. Dynamic scoping: a use of a variable that has no corresponding declaration in the same function corresponds to the declaration in the most-recently-called still active function, e.g., Lisp, SNOBOL. The symbol table will answer two questions: Given a declaration of a name, is there already a declaration of the same name in the current scope, i.e., is it multiply declared, second is given a use of a name, to which declaration does it correspond or is it undeclared. For static scoping, we don;t allow multiple declarations of a name in the same scope, we allow the same name to be declared in multiple nested scopes, use the same scope for a method’s parameters and for local variables declared at the beginning of the method. List of hash tables: On scope entry: Increment the current level number and add a new empty hash table to the front of the list. To process a declaration of x: Look up x in the first table in the list, If it is there, then issue a “multiply declared variable” error Otherwise, add x to the first table in the list. To process a use of x: Look up x starting in the first table in the list If it is not there, then look up x in each successive table in the list If it is not in any table then issue an “undeclared variable” error. On scope exit: Remove the first table from the list and decrement the current level number. Hash table of lists: There is just one big hash table, containing an entry for each variable for which there is: Some declaration in scope S, or In a scope that encloses S. Associated with each variable is a list of symbol table entries. The first list item corresponds to the most closely enclosing declaration. The other list items correspond to declarations in enclosing scopes. On scope entry: Increment the current level number – To process a declaration of x: look up x in the symbol table. If x is there, fetch the level number from the first list item If that level number = the current level then issue a “multiply declared variable” error Otherwise, add a new item to the front of the list with the appropriate type and the current level number. If x is not there, add x into the table and a new item to the front of its list 3. The numbers represent the level.