Compiler Construction: Phases and Parsing Techniques
Compiler Design and Phases: Lexical Analysis to Code Generation
1. Phases of Compilation
The compilation process is divided into six key phases:
Lexical Analysis
- Converts the source code into tokens such as keywords, identifiers, operators, and delimiters.
- Example: For the code
int x = 5;, the tokens generated areint,x,=,5,;.
Syntax Analysis
- Checks the structure of code against the grammar rules of the programming language and builds a parse tree.
- Example: Parsing
E → E + T | Tfor the expressiona + bproduces a parse tree with+as the root.
Semantic Analysis
- Ensures the meaning of the code is valid, such as type compatibility, variable declarations, and scope.
- Example: In
int a = "hello";, the type mismatch is detected.
Intermediate Code Generation
- Converts the code into an intermediate representation that is machine-independent.
- Example: For
a = b + c;, the intermediate code may be:t1 = b + c,a = t1
Code Optimization
- Refines the intermediate code to improve efficiency, reducing runtime or memory usage.
- Example: Removing redundant instructions or loops.
Code Generation
- Converts the optimized intermediate code into target machine code or assembly code.
- Example: Converts
t1 = b + cintoADD R1, B, Cfor a specific processor.
Complete Diagram of Compiler Phases
Source Code↓Lexical Analysis → Tokens↓Syntax Analysis → Parse Tree↓Semantic Analysis → Annotated Syntax Tree↓Intermediate Code Generation → Intermediate Code↓Code Optimization → Optimized Intermediate Code↓Code Generation → Target Machine Code
2. Parsing Techniques: LL & LR Parsers
LL Parsers (Top-Down Parsing)
- Reads input from left to right and constructs a leftmost derivation.
- Suitable for simple grammars without left recursion.
- Example: Predictive parsing for the grammar
E → T + E | T. Parsingid + idinvolves constructing the parse tree step by step.
LR Parsers (Bottom-Up Parsing)
- Reads input from left to right and constructs a rightmost derivation in reverse.
- Can handle a larger set of grammars, including left-recursive grammars.
- Example: Shift-reduce parsing for
E → E + T | T.
Differences:
| Aspect | LL Parser | LR Parser |
|---|---|---|
| Parsing Direction | Top-down | Bottom-up |
| Grammar Restrictions | No left recursion, simple CFG | All CFGs |
| Error Detection | Limited | Strong |
Examples:
- LL Parser: Parsing
E → T + E | Tinvolves expanding the leftmost symbol first. - LR Parser: Shift-reduce parsing handles
E → T + E | Tusing a stack-based approach.
3. Optimization Techniques
Basic Blocks and Flow Graphs
- Basic Block: A sequence of instructions without branches, except at the start and end.
- Example:
t1 = a + b,t2 = c * d,t3 = t1 + t2 - Flow Graph: Represents control flow among basic blocks, with nodes as blocks and edges showing the control flow.
Loop Optimization
- Enhances loop performance by reducing redundancy.
- Techniques:
- Code Motion: Moves invariant code outside the loop.
- Example:
Before:for (i = 0; i < n; i++) { x = 10; sum += arr[i] * x; }
After:x = 10; for (i = 0; i < n; i++) { sum += arr[i] * x; }
Peephole Optimization
- Local optimization of small code segments.
- Techniques:
- Redundant Instruction Removal: Eliminates unnecessary instructions.
- Strength Reduction: Replaces expensive operations with simpler ones.
- Example: Replace
x * 2withx << 1.
By using these optimization techniques, compilers improve the efficiency of the generated code.
