Compiler Design: Parsing, Optimization, and Memory Management
Top-Down vs. Bottom-Up Parsing
Bottom-up parsing starts from the input string and reduces it to the start symbol (leaves → root). Example: LR parsing.
Top-down parsing is simple but less powerful, while bottom-up is complex but handles more grammars.
FIRST Algorithm
- If X is terminal → FIRST(X) = {X}
- If X → ε → include ε
- If X → Y₁Y₂…Yₙ:
- Add FIRST(Y₁) (except ε)
- If ε in FIRST(Y₁), check next
- If all have ε → add ε
- Repeat until no change
FOLLOW Algorithm
- Put $ in FOLLOW(Start Symbol)
- For A → αBβ:
- Add FIRST(β) (except ε) to FOLLOW(B)
- If A → αB OR ε in FIRST(β), add FOLLOW(A) to FOLLOW(B)
- Repeat until no change
DFA vs. NFA
- DFA: 1 transition, no ε
- NFA: Multiple transitions or ε allowed
Left Recursion Removal
If: A → Aα | β
Convert to:
- A → βA′
- A′ → αA′ | ε
This removes left recursion and helps in predictive parsing.
Syntax Directed Definition (SDD)
SDD is a context-free grammar with attributes and rules. It is used to define semantic meaning and translate source code into intermediate code.
Syntax Tree vs. DAG
- Syntax Tree: Represents the hierarchical structure of expressions.
- DAG (Directed Acyclic Graph): Removes duplicate subexpressions and is used for optimization. It reduces computation by eliminating repeated calculations.
Three Address Code (TAC)
TAC is an intermediate representation where each instruction contains at most three operands (e.g., x = y op z). It breaks complex expressions into smaller steps using temporary variables, making the program easier to analyze, optimize, and translate into machine code.
Quadruple, Triple, and Indirect Triple
- Quadruple: (op, arg1, arg2, result) — easy to rearrange.
- Triple: Uses position instead of result — less memory.
- Indirect Triple: Uses pointers — flexible but complex.
Activation Record
A data structure used during function calls containing:
- Return address
- Parameters
- Local variables
- Temporary variables
Runtime Storage Allocation
- Static: Memory fixed at compile time.
- Stack: Memory allocated during function calls.
- Heap: Dynamic memory allocation at runtime.
Basic Blocks and Flow Graphs
A basic block is a sequence of consecutive instructions with a single entry and exit point. A flow graph represents the control flow of a program, where nodes are basic blocks and edges represent control flow.
Code Optimization
The process of improving performance without changing output. Techniques include:
- Common subexpression elimination: Avoids repeated calculations.
- Dead code elimination: Removes unused code.
- Copy propagation: Replaces variables with assigned values.
- Peephole optimization: Simplifies small sets of instructions.
Operator Precedence Grammar
Defines precedence relations between operators, used in expression parsing where operators have different priorities.
LL(1) Parser
A top-down parser that reads input left-to-right, constructs a leftmost derivation, and uses one lookahead symbol. It requires no left recursion and left factoring.
Error Handling
- Types: Lexical, Syntax, Semantic.
- Recovery: Panic mode, phrase-level recovery.
