Parsing and Syntax-Directed Translation: A Comprehensive Guide

Top-Down Parsing: Table-driven LL(I) Parser. Top-down parsers cannot handle left-recursive grammars. up2YqaEGLaYQRZ1k7iw5aKpPSutbqucZ_Za-zUbGwmwA3HTd6eLv2lfTpDWAJsfeRn96anOQqjL1hmEKiC9SQfSVexySdvKtQGlvB5o0saiX4asQCOydfX-pH_1Pbw0sfy-zwelVve16cLU-DJ1nyKE Hc1uUMXeadThdv7dxXXYaizcnFgHXAJLGFkx1cEsI0bd3Q_2Xi8Lc3_Bhk0uDuXoB11j0CT20bwATw6-hWSQzIWKZnifq15HquvdlwJMsUPuoSCA16pOh6ZFQVZV2VCEnAIe3Q12hpBNEdFKXsSEHtI Left factoring example: YujMCBAgQqKJAIwYDVbx9TREgQIAAAQIECBAgQIAAAQJNJCAYaKLBdCsECBAgUDsBwUDtbLVMgAABAgQIECBAgAABAgQI1FdAMFBfb1cjQIAAgQYVEAw06MDpNgECBAgQIECAAAECBAgQINBGQDDQhkQFAQIECBBoKyAYaGuihgABAgQIECBAgAABAgQIEGhMAcFAY46bXhMgQIBAnQUEA3UGdzkCBAgQIECAAAECBAgQIECgZgKCgZrRapgAAQIEmklAMNBMo+leCBAgQIAAAQIECBAgQIBA7xYQDPTu8Xf3BAgQINBJAcFAJ6EcRoAAAQIECBAgQIAAAQIECBReQDBQ+CHSQQIECBAogoBgoAijoA8ECBAgQIAAAQIECBAgQIBANQQEA9VQ1AYBAgQINL2AYKDph9gNEiBAgAABAgQIECBAgACBXiMgGOg1Q+1GCRAgQKA7AoKB7ug5lwABAgQIECBAgAABAgQIECiSgGCgSKOhLwQIECBQWAHBQGGHRscIECBAgAABAgQIECBAgACBCgUEAxWCOZwAAQIEeqeAYKB3jru7JkCAAAECBAgQIECAAAECzSggGGjGUXVPBAgQIFB1AcFA1Uk1SIAAAQIECBAgQIAAAQIECPSQgGCgh+BdlgABAgQaS0Aw0FjjpbcECBAgQIAAAQIECBAgQIBAeQHBQHkbewgQIECAwFqBN954I1188cVp1apVa+viw+jRo9N+++2Xq7NBgAABAgQIECBAgAABAgQIECiygGCgyKOjbwQIECBAgAABAgQIECBAgAABAgQIECBAoMoCgoEqg2qOAAECBAgQIECAAAECBAgQIECAAAECBAgUWUAwUOTR0TcCBAgQIECAAAECBAgQIECAAAECBAgQIFBlAcFAlUE1R4AAAQIECBAgQIAAAQIECBAgQIAAAQIEiiwgGCjy6OgbAQIECBAgQIAAAQIECBAgQIAAAQIECBCosoBgoMqgmiNAgAABAgQIECBAgAABAgQIECBAgAABAkUWEAwUeXT0jQABAgQIECBAgAABAgQIECBAgAABAgQIVFlAMFBlUM0RIECAAAECBAgQIECAAAECBAgQIECAAIEiCwgGijw6+kaAAAECBAgQIECAAAECBAgQIECAAAECBKosIBioMqjmCBAgQIAAAQIECBAgQIAAAQIECBAgQIBAkQUEA0UeHX0jQIAAAQIECBAgQIAAAQIECBAgQIAAAQJVFhAMVBlUcwQIECBAgAABAgQIECBAgAABAgQIECBAoMgCgoEij46+ESBAgAABAgQIECBAgAABAgQIECBAgACBKgsIBqoMqjkCBAgQIECAAAECBAgQIECAAAECBAgQIFBkAcFAkUdH3wgQIECAAAECBAgQIECAAAECBAgQIECAQJUFBANVBtUcAQIECBAgQIAAAQIECBAgQIAAAQIECBAosoBgoMijo28ECBAgQIAAAQIECBAgQIAAAQIECBAgQKDKAoKBKoNqjgABAgQIECBAgAABAgQIECBAgAABAgQIFFlAMFDk0dE3AgQIECBAgAABAgQIECBAgAABAgQIECBQZQHBQJVBNUeAAAECBAgQIECAAAECBAgQIECAAAECBIosIBgo8ujoGwECBAgQIECAAAECBAgQIECAAAECBAgQqLKAYKDKoJojQIAAAQIECBAgQIAAAQIECBAgQIAAAQJFFhAMFHl09I0AAQIECBAgQIAAAQIECBAgQIAAAQIECFRZQDBQZVDNESBAgAABAgQIECBAgAABAgQIECBAgACBIgsIBoo8OvpGgAABAgQIECBAgAABAgQIECBAgAABAgSqLCAYqDKo5ggQIECAAAECBAgQIECAAAECBAgQIECAQJEFBANFHh19I0CAAAECBAgQIECAAAECBAgQIECAAAECVRb4b+AUvgf2CyhpAAAAAElFTkSuQmCC In recursive descent, at each step, there are many choices of production to use. In LL(1), at each step, there is only one choice of production. cDHdo0BJBq1YDd9jQeTAGpbjM58PjfNnSghRvZx3W6AxVupD6krNePWDfX4hbLxPgVymynEx_E98rAO9UeqX3wcUtDG_pqe9qC_JB2C-rhrlW0cmYY-3r-FG0KLHHyfk3DjTtbxE8bHI4BcS_yDHbh8 nV2Jfz1MkkqRgUl_vl10S8JQy5S0ZNWICnhg21Zu3GKcjzczVOmexKOh1mh9cFtWcdjbr1JbIvOmrV5Cgzg2U9BmX3c5JnUNTi0e7nemLalQ_fgjgSwZifNZYzhLWjqptGw_0dbHW3_PyRmWZtnhnAE wfOc1ojjZC68wAAAABJRU5ErkJggg== wBbdUqDtjgkMAAAAABJRU5ErkJggg== Bottom-Up Parsing: It traces a rightmost derivation in reverse. A handle is a string that can be reduced and also allows further reductions back to the start symbol. In shift-reduce parsing, handles always appear at the top of the stack. b2fKW1Z7aIwAAAABJRU5ErkJggg== rpGgjRdHTYBgEIQAACEIAABCAAAQhAIHcJIERzNxsqgwAEIAABCEAAAhCAAAQgAIEsE0CIZhkoh4MABCAAAQhAAAIQgAAEIKCAwP8HWud7If1WaK4AAAAASUVORK5CYII= R is the follow of the nonterminal in that state. kKCF8fmrG6wAAAABJRU5ErkJggg== Add an extra production S’ –> Start to the grammar. S –> .Start;

LR(1) is more powerful. LR(0) is basically SLR. Syntax-Directed Translation: Approaches to syntax error recovery: Panic mode (when an error is detected, discard tokens until one with a clear role is found, e.g., delimiters such as semicolon, }); error productions (promotes common errors to alternate syntax, e.g., 5x instead of 5*x); automatic local or global correction (idea: find a correct “nearby” program). Abstract syntax trees (AST): operators appear at internal nodes, not at leaves, “chains” of single productions are collapsed, syntactic details are omitted (e.g., (), , ;); Example: consider a grammar: E → int | (E) | E + E. And the string 5+(2+3)

kSzcSbbn5lZWBWqwKaLV_I6bi0Fu11svJnx1SlGbZFJjBna2NiqxRZnuNRZ3BJiEQLVUcpypl_t81YuIc7YmNh76AjCUxrDnRZ0AdcLBnxw0Y7UuUlxHA0K75B7JqxoV7RRg-quF-sapTZt1pOQ9rc4 yj0Gzh49uPOyaF2YBitsefkxhsN-YYCItFUoVCcPJhVSC6UIofjhJkiPIfelGExDWqzluEHT-nFnSMoao9mur491lxLN4KomVf8r_pT1AqQbdNKFWKhCHytrwPPQ-XUHvR-Cb71ZvNtyvWpNDttrvcc NZKDGsHCEhJB_XoLKk8bhxYZq3u_1OPD8tZECD6uYg4aFCA-2lgUof-gH6jBPHJqL1LkjtxV7lqOThGEdkjmb03ZCNiPyzX_O2KO9IeJE2yYLMzUeWgqWZNB5LL-rKKMxMk0lZpcwdqUHMkgUqejn3E Constructing an AST: we assume int.lexval is the value of the integer lexeme. S6MaWEoGKRzRONX9ZAFKkMZcV6FXyE5E5K_O9opyiMxdDFqwHBIU9DKYjzYj0uWG6mAWT1R_AaTfVoVmb__JGswAWfid86rzgWuhatYTMicl6lQD5epecISk4ba7_Aqq98kCfhFjXuBr-5GCOJarqX8 rAvOJ5MT-Kj9kouCzfvUt_9AeTfBLKW6a3i23g53AWLLStFLIP2Avzh0izWucYPgo5FuhUk2uobQpNWCNjX0Oii__kR26Jmkp3LSmnnjMAhHYp7gggiFCob0M8oU6dHiChjKcjxGU6fjQoMEzMdN3xc , look at the AST tree above. Syntactic action: used for constructing AST. Lexical analysis: detects inputs with illegal tokens. Parsing: Detects inputs with ill-formed parse trees. Semantic analysis: Catches all remaining errors. Classes of errors: Lexical – detected by scanner, e.g., illegal character in input, illegal comments; Syntactic: detected by the parser, input is not a legal program because it cannot be parsed by CFG, e.g., a = *5; Static semantic: can be detected in parser or in separate semantic analysis passes, input can be parsed by CFG but some non-context-free error, e.g., multiply declared variables (a variable should be declared at most once), undeclared variables, functions call with the wrong number of arguments, type mismatches (type of the left-hand side of an assignment should match the type of the right-hand side); Semantic: may be detected at compile time, may also add checks in code for runtime, e.g., division by 0, array index out-of-bounds, null pointer dereference; Logical: hardest to detect, program is syntactically and semantically correct but does not do the “correct” things, compiler can sometimes detect problem things: if (x=0) result = 1; result = 2; The purpose of the symbol table is to keep track of names declared in the program. w3etkWC_1Pd9ZjHSorCEFB8iYlrV_1SSXArpkp9VbT7yOrVkN1jHcs1KvcWS82wg2YJlW_LES8ErG_FEX_LHjXJOA0lMJ0bH7c__QE0Xl3lJnub8-Hype4g5xWekr16ZubOGuqmmFYisJKNPEGErslQ Scoping general rules: Determine which declaration of a named object corresponds to each use of the object. Each function has two or more scopes: one for the parameters, one for the function body, and possible additional scopes in the function. Dynamic scoping: a use of a variable that has no corresponding declaration in the same function corresponds to the declaration in the most-recently-called still active function, e.g., Lisp, SNOBOL. The symbol table will answer two questions: Given a declaration of a name, is there already a declaration of the same name in the current scope, i.e., is it multiply declared, second is given a use of a name, to which declaration does it correspond or is it undeclared. For static scoping, we don;t allow multiple declarations of a name in the same scope, we allow the same name to be declared in multiple nested scopes, use the same scope for a method’s parameters and for local variables declared at the beginning of the method. List of hash tables: On scope entry: Increment the current level number and add a new empty hash table to the front of the list. To process a declaration of x: Look up x in the first table in the list, If it is there, then issue a “multiply declared variable” error Otherwise, add x to the first table in the list. To process a use of x: Look up x starting in the first table in the list If it is not there, then look up x in each successive table in the list If it is not in any table then issue an “undeclared variable” error. On scope exit: Remove the first table from the list and decrement the current level number. Hash table of lists: There is just one big hash table, containing an entry for each variable for which there is: Some declaration in scope S, or In a scope that encloses S.  Associated with each variable is a list of symbol table entries. The first list item corresponds to the most closely enclosing declaration. The other list items correspond to declarations in enclosing scopes. On scope entry: Increment the current level number – To process a declaration of x: look up x in the symbol table. If x is there, fetch the level number from the first list item If that level number = the current level then issue a “multiply declared variable” error Otherwise, add a new item to the front of the list with the appropriate type and the current level number. If x is not there, add x into the table and a new item to the front of its list 3. The numbers represent the level. fjg6_hGr2kfYLtPq9iMvGpprcQlqivuTCeT6ovP4l9cEwoiPo9QP5_U76e_JN5MR3ASc1JuYXWfbezWwentSSBpYWQCaSvWTxD_HY7HTX4cuMOp_S1UT9MlZe0N1JN_chjHnCk3HHnoqe3eE3KXDkO0 s6scE4Hy5rv3XzOJ725I9prT7NL8YTsd4Gxhv7gPbuUG4JQ7DKzPpfc5JugQZe_Pv9a7wd1BN93toavLxHnTIz6Ot5tLaE-MYuvb6agug1UmIQe1Gz1bRzoCvVfHSwzrXRiQQF01jKD00aPRtugZXCc