Compiler Design: Analysis and Synthesis Phases

Compiler Phases: Analysis, Optimization, and Synthesis

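A compiler translates a program written in a high-level language (such as C or Java) into a lower-level target form: typically machine code for the hardware or, as with Java, bytecode for a virtual machine.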
The compilation process is divided into multiple phases, each with a specific role. These phases work together to convert source code into an efficient executable program.

The phases are generally grouped into:

  1. Front End – Analysis Phases
  2. Middle End – Optimization Phase
  3. Back End – Synthesis Phases

Below is the complete flow:

Source Program → Lexical Analysis → Syntax Analysis → Semantic Analysis → Intermediate Code Generation → Code Optimization → Code Generation → Target Machine Code


1. Lexical Analysis (The Scanner)

Purpose

Breaks the source program into tokens.

What is a Token?

A token is a meaningful unit:
Identifiers (x, sum), Keywords (if, while), Operators (+, =), Literals (10, 3.14), Punctuation (;, ,)

Tasks Performed

  • Removes whitespace and comments
  • Groups characters into valid tokens
  • Reports lexical errors (invalid characters)
  • Maintains symbol table entries for identifiers

Example

sum = a + 20;

Tokens → sum, =, a, +, 20, ;
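The scanning step above can be sketched with regular expressions. This is a minimal illustration, not a production scanner: the token category names, the keyword set, and the function name tokenize are assumptions made for the example.

```python
import re

# Hypothetical token categories for illustration; a real language defines its own.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+(?:\.\d+)?"),   # integer or decimal literal
    ("IDENTIFIER", r"[A-Za-z_]\w*"),    # names such as x, sum
    ("OPERATOR",   r"[+\-*/=]"),
    ("PUNCT",      r"[;,()]"),
    ("SKIP",       r"\s+"),             # whitespace is discarded
    ("MISMATCH",   r"."),               # any other character is a lexical error
]
KEYWORDS = {"if", "while", "int", "float"}  # assumed keyword set
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (kind, text) pairs for the given source string."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"lexical error: unexpected character {text!r}")
        if kind == "IDENTIFIER" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens


print(tokenize("sum = a + 20;"))
```

For the statement above this yields the same six tokens (sum, =, a, +, 20, ;), each paired with its category.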


2. Syntax Analysis (The Parser)

Purpose

Checks whether tokens follow the grammar of the language.
Builds a parse tree / syntax tree.

Tasks Performed

  • Verifies structure using Context-Free Grammar (CFG)
  • Detects syntax errors
  • Constructs parse tree

Example

Expression a + b * c
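The grammar gives * higher precedence than +, so the parser groups the expression as a + (b * c). The resulting syntax tree has + at the root, with a as its left child and the subtree for b * c as its right child.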
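One common way to encode precedence is a recursive-descent parser with one function per precedence level. The sketch below assumes the tokens are already plain strings and represents tree nodes as nested tuples (operator, left, right); both choices, and the name parse_expression, are assumptions made for illustration.

```python
def parse_expression(tokens):
    """Parse a list of token strings into a nested-tuple syntax tree.

    Grammar (illustrative):
        expr   -> term   (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NAME | NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token.isalnum():
            return advance()
        raise SyntaxError(f"unexpected token {token!r}")

    def term():
        # '*' and '/' are handled one level below '+' and '-', so they bind tighter.
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        # The loop builds left-associative trees: a - b - c becomes (a - b) - c.
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_expression(["a", "+", "b", "*", "c"]))  # ('+', 'a', ('*', 'b', 'c'))
```

Precedence comes from the nesting of the functions; associativity comes from the direction in which each loop builds the tree.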


3. Semantic Analysis

Purpose

Ensures that the parse tree follows semantic rules of the language.

Tasks Performed

  • Type checking
    Example: int a; a = "hello"; → type mismatch error
  • Function argument checks
  • Variable declaration checks
  • Scope resolution
  • Inserts/updates information in the symbol table

Example

int x;
x = 3.5;

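In a language that forbids implicit narrowing conversions (for example Java), the semantic analyzer reports an error: a floating-point value cannot be assigned to an int. A C compiler accepts this code and implicitly converts 3.5 to 3, typically with at most a warning.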
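A declaration and type check of this kind can be sketched as follows. The statement format, the function name check_assignments, and the strict rule that declared and assigned types must match exactly are all assumptions made for illustration; real languages define their own conversion rules.

```python
def check_assignments(statements):
    """Return a list of semantic error messages for a tiny statement list.

    Each statement is either ("declare", type_name, name) or
    ("assign", name, value_type). This format is invented for the example.
    """
    symbols = {}   # name -> declared type (a one-scope symbol table)
    errors = []
    for stmt in statements:
        if stmt[0] == "declare":
            _, type_name, name = stmt
            if name in symbols:
                errors.append(f"redeclaration of '{name}'")
            else:
                symbols[name] = type_name
        elif stmt[0] == "assign":
            _, name, value_type = stmt
            if name not in symbols:
                errors.append(f"'{name}' used before declaration")
            elif symbols[name] != value_type:
                # Assumed strict rule: no implicit conversions are allowed.
                errors.append(
                    f"type mismatch: cannot assign {value_type} to {symbols[name]} '{name}'"
                )
    return errors


# int x; x = 3.5;
print(check_assignments([("declare", "int", "x"), ("assign", "x", "float")]))
```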


4. Intermediate Code Generation (ICG)

Purpose

Generates machine-independent intermediate code.

A common form is Three-Address Code (TAC).

TAC Example

For expression: a + b * c

Intermediate code:

t1 = b * c
t2 = a + t1
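The TAC above can be produced by a post-order walk of the syntax tree, creating one temporary per operator. The sketch assumes a nested-tuple tree of the form (operator, left, right) with string leaves; the function name generate_tac and the t1, t2 naming scheme are illustrative.

```python
def generate_tac(tree):
    """Emit three-address code for a nested-tuple expression tree."""
    code = []
    counter = 0

    def visit(node):
        nonlocal counter
        if isinstance(node, str):      # leaf: a variable name or literal
            return node
        op, left, right = node
        left_name = visit(left)        # operands are evaluated first (post-order)
        right_name = visit(right)
        counter += 1
        temp = f"t{counter}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    visit(tree)
    return code


for line in generate_tac(("+", "a", ("*", "b", "c"))):
    print(line)
# t1 = b * c
# t2 = a + t1
```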

Benefits

  • Easy to optimize
  • Independent of machine architecture

5. Code Optimization

Purpose

Improves the intermediate code to make the final program faster and more efficient without changing meaning.

Types of Optimizations

Local Optimization

Within a basic block
Example:

x = y * 2
z = y * 2      → eliminate this, reuse previous result
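The reuse shown above is local common subexpression elimination. A minimal sketch for one basic block follows, assuming each instruction is a tuple (target, left, op, right) and that a copy is written as (target, source, None, None); this representation and the function name are assumptions for the example.

```python
def eliminate_common_subexpressions(block):
    """Replace recomputed expressions in one basic block with copies."""
    available = {}   # (left, op, right) -> variable currently holding that value
    result = []
    for target, left, op, right in block:
        key = (left, op, right)
        holder = available.get(key)
        if holder is not None:
            result.append((target, holder, None, None))   # reuse earlier result
        else:
            result.append((target, left, op, right))
        # Assigning to target invalidates every expression that reads target
        # and every expression whose value was stored in target.
        available = {
            k: v for k, v in available.items()
            if target not in (k[0], k[2]) and v != target
        }
        # Record the new expression unless it reads its own target (x = x + 1).
        if holder is None and target not in (left, right):
            available[key] = target
    return result


block = [("x", "y", "*", "2"), ("z", "y", "*", "2")]
print(eliminate_common_subexpressions(block))
# [('x', 'y', '*', '2'), ('z', 'x', None, None)]
```

The invalidation step matters: if y were reassigned between the two statements, the second y * 2 would have to be recomputed.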

Global Optimization

Across basic blocks
Example: moving invariant computations out of loops.

Machine-Independent Optimization

Constant folding, dead code elimination
Example:

a = 10 * 20    → replaced with a = 200
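Constant folding can be sketched as a bottom-up rewrite of the expression tree. The example assumes a nested-tuple tree with Python ints as constant leaves and strings as variable leaves; it folds only +, - and * so that questions such as division by zero do not arise. The names are illustrative.

```python
import operator

# Operators that are always safe to evaluate at compile time in this sketch.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(tree):
    """Evaluate constant subexpressions of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return tree                      # leaf: constant or variable name
    op, left, right = tree
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int) and op in FOLDABLE:
        return FOLDABLE[op](left, right)
    return (op, left, right)


print(fold_constants(("*", 10, 20)))               # 200
print(fold_constants(("+", "a", ("*", 10, 20))))   # ('+', 'a', 200)
```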

6. Code Generation (Target Code)

Purpose

Converts optimized intermediate code into machine code / assembly code.

Tasks Performed

  • Selects machine instructions
  • Allocates CPU registers
  • Translates three-address code to machine instructions
  • Performs basic low-level optimizations

Example

TAC:

t1 = b * c
t2 = a + t1

Possible assembly (example):

MUL R1, b, c
ADD R2, a, R1
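The translation above can be sketched as a line-by-line mapping from TAC to a three-operand pseudo-assembly. The instruction names, the unlimited supply of registers R1, R2, ..., and the function name generate_assembly are assumptions for illustration; a real code generator must handle a finite register set and the target's actual instruction formats.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate_assembly(tac_lines):
    """Translate lines like 't1 = b * c' into pseudo-assembly."""
    registers = {}   # TAC target name -> register holding its value
    assembly = []
    for line in tac_lines:
        target, expression = line.split(" = ")
        left, op, right = expression.split()
        # Operands computed earlier live in registers; others are named directly.
        left = registers.get(left, left)
        right = registers.get(right, right)
        # Naive allocation: each new target gets the next unused register.
        reg = registers.setdefault(target, f"R{len(registers) + 1}")
        assembly.append(f"{OPCODES[op]} {reg}, {left}, {right}")
    return assembly


for instruction in generate_assembly(["t1 = b * c", "t2 = a + t1"]):
    print(instruction)
# MUL R1, b, c
# ADD R2, a, R1
```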

7. Symbol Table Management

Purpose

Stores information about identifiers:

Identifier   Type   Scope    Memory Location
x            int    local    stack offset
sum          int    global   data segment

Used by:

  • Lexical analyzer
  • Semantic analyzer
  • Code generator
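A symbol table with nested scopes is often implemented as a stack of dictionaries. The sketch below is one such arrangement; the class name, method names, and the fields stored per identifier mirror the table above but are otherwise assumptions.

```python
class SymbolTable:
    """A stack of scopes; the first scope is the global one."""

    def __init__(self):
        self.scopes = [{}]

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()

    def declare(self, name, type_name, location):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"'{name}' is already declared in this scope")
        scope[name] = {
            "type": type_name,
            "scope": "global" if len(self.scopes) == 1 else "local",
            "location": location,
        }

    def lookup(self, name):
        # Search from the innermost scope outward; None means undeclared.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.declare("sum", "int", "data segment")
table.enter_scope()
table.declare("x", "int", "stack offset")
print(table.lookup("x")["scope"], table.lookup("sum")["scope"])  # local global
table.exit_scope()
print(table.lookup("x"))  # None
```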

8. Error Handling

Types of Errors

  • Lexical errors → invalid tokens
  • Syntax errors → grammar violation
  • Semantic errors → type/scope violations
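  • Runtime errors → division by zero (occur during execution; generally not detected at compile time)
  • Logical errors → wrong logic (the program compiles but produces wrong results; not detected by the compiler)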

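After detecting an error, the compiler (mainly the parser) tries to recover and continue analysis so that further errors can be reported in the same run, using: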

  • Panic mode
  • Phrase-level recovery
  • Error productions
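Panic mode, the simplest of these strategies, can be sketched as follows: on an error, the parser discards tokens until it passes a synchronizing token and then resumes. The example assumes a toy language whose only statement form is NAME = VALUE ; and uses ; as the synchronizing token; the function name and message format are illustrative.

```python
def check_statements(tokens, sync_token=";"):
    """Count well-formed 'NAME = VALUE ;' statements and collect syntax errors."""
    errors = []
    accepted = 0
    pos = 0
    while pos < len(tokens):
        stmt = tokens[pos:pos + 4]
        well_formed = (
            len(stmt) == 4
            and stmt[0].isidentifier()
            and stmt[1] == "="
            and stmt[2].isalnum()
            and stmt[3] == sync_token
        )
        if well_formed:
            accepted += 1
            pos += 4
            continue
        errors.append(f"syntax error near token {pos}: {tokens[pos]!r}")
        # Panic mode: skip ahead to the next synchronizing token...
        while pos < len(tokens) and tokens[pos] != sync_token:
            pos += 1
        pos += 1   # ...and resume parsing just after it.
    return accepted, errors


tokens = "a = 1 ; b = = 2 ; c = 3 ;".split()
print(check_statements(tokens))
# (2, ["syntax error near token 4: 'b'"])
```

The malformed middle statement is reported once, and the statement after it is still checked.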

Compiler Phases Flow Diagram

Source Program
       ↓
Lexical Analysis → Tokens
       ↓
Syntax Analysis → Parse Tree
       ↓
Semantic Analysis → Annotated Tree
       ↓
Intermediate Code Generation → TAC
       ↓
Code Optimization → Optimized TAC
       ↓
Code Generation → Machine Code
       ↓
Target Program / Executable

Conclusion and Core Components

The compiler works step by step from source code to machine code through six phases (items 1 to 6 below), supported by two components that the phases share (items 7 and 8):

  1. Lexical Analysis
  2. Syntax Analysis
  3. Semantic Analysis
  4. Intermediate Code Generation
  5. Code Optimization
  6. Code Generation
  7. Symbol Table
  8. Error Handling

Understanding these phases clearly is essential because almost every question in Compiler Design (CS3501) is built around these core concepts.



