VLSI Design Fundamentals and Fabrication

Module 1 – VLSI Design Methodologies

VLSI (Very Large Scale Integration) integrates thousands to millions of transistors on a single chip, drastically reducing size, cost, and power consumption.

Moore’s Law states that the number of transistors in a dense IC doubles approximately every two years (a figure often quoted as 18 months). Moore’s Second Law (also known as Rock’s Law) states that the cost of a semiconductor fabrication plant doubles approximately every 4 years.

Application Specific Integrated Circuits (ASIC)

An ASIC (Application Specific Integrated Circuit) is a custom IC designed for specific applications. Types include Full-Custom, Semi-Custom, and Programmable ASICs. Advantages include being compact, fast, and having low power consumption. Disadvantages include a long time-to-market, high cost, and inflexibility.

ASIC Types

  • Full-Custom ASIC: All cells and masks are customized. Offers high performance and compact area but is expensive. Used in processors and memories.
  • Semi-Custom ASICs: Use predesigned logic cells. Types include Standard Cell Based ASICs (CBIC) and Gate Array Based ASICs.
    • CBIC: Uses flexible blocks of predesigned logic. Allows the use of megacells. Optimized for area and speed.
    • Gate Array (GA): Uses a base array of prefabricated transistors, where only the interconnects are customized. Types include Channeled GA, Channelless GA, and Structured GA.

Field Programmable Gate Arrays (FPGA)

An FPGA (Field Programmable Gate Array) is a programmable logic array with configurable logic blocks (CLBs), I/O blocks, and programmable interconnects. No mask customization is needed, offering fast design turnaround. Uses include prototyping, testing, and low-volume production.

CPLD vs FPGA

A CPLD is PAL-like with low density, predictable speed, and high power consumption. An FPGA is gate-array-like with medium to high density, application-dependent speed, and lower power consumption.

System on Chip (SoC)

A SoC (System on Chip) integrates a CPU, memory, I/O ports, and analog and digital components into one IC. This includes components like WiFi, Bluetooth, USB, ADC/DAC, and more. Benefits include low power, compactness, and speed. Challenges include cost, packaging, and heat dissipation.

ASIC Design Flow

  1. Design Entry: Using HDL (Verilog/VHDL) or schematic capture.
  2. Logic Synthesis: Produces a netlist.
  3. System Partitioning: Splits the design into ASIC-sized blocks.
  4. Pre-layout Simulation: Verifies the logic function.
  5. Floor Planning: Assigns physical area for blocks.
  6. Placement: Positions cells for minimum wire length.
  7. Routing: Connects cells (global and local).
  8. Extraction: Determines electrical properties (Resistance, Capacitance).
  9. Post-layout Simulation: Final validation including parasitic effects.

FPGA Design Flow

  1. Behavioral HDL description → Simulation → Synthesis
  2. Implementation (placement, routing) → Bitstream generation
  3. Program using JTAG → Debug in system

Design Approaches

  • Top-Down Design: Starts with a high-level system view and breaks it down into subsystems, eventually reaching logic gates. Allows team parallelism and design abstraction.
  • Bottom-Up Design: Begins from basic blocks (gates, flip-flops) and combines them into modules and systems. Promotes reuse and gradual complexity buildup.

Design Stages

  • Logical Design: Includes design entry, synthesis, partitioning, and simulation.
  • Physical Design: Includes floor planning, placement, routing, extraction, and layout verification.

Key Design Considerations

  • Speed: Affected by logic depth, wire delay, and layout.
  • Power: Includes dynamic (switching) and static (leakage) power, influenced by clocking and architecture.
  • Area: Depends on gate count, interconnect, and cell reuse. Must balance cost and performance.

Module 2 – Static CMOS Logic Design

NMOS Inverter (Static Analysis)

Consists of an NMOS transistor and a resistive load. When the input is LOW, the NMOS is OFF, and the output is HIGH. When the input is HIGH, the NMOS is ON, and the output is LOW. It acts as a NOT gate.

Regions of NMOS Operation

  • Cutoff: Vgs < Vt
  • Triode: Vgs > Vt and Vds < Vgs − Vt
  • Saturation: Vgs > Vt and Vds ≥ Vgs − Vt
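
The region boundaries above map directly onto a small classifier. The following is a minimal Python sketch; the function name and the 0.7 V threshold used in the example are illustrative assumptions, not values from these notes.

```python
def nmos_region(vgs, vds, vt):
    """Classify the operating region of an NMOS transistor (voltages in volts)."""
    if vgs < vt:
        return "cutoff"
    overdrive = vgs - vt  # Vgs - Vt
    if vds < overdrive:
        return "triode"
    return "saturation"


# Assumed threshold Vt = 0.7 V, chosen only for illustration.
for vgs, vds in [(0.3, 1.0), (1.8, 0.2), (1.8, 1.5)]:
    print(f"Vgs={vgs} V, Vds={vds} V: {nmos_region(vgs, vds, vt=0.7)}")
```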

Basic Logic Gates

NAND, NOR, and Complex gates are constructed using NMOS/PMOS combinations. Static CMOS Logic combines complementary pull-up (p-type) and pull-down (n-type) networks.

Inverter DC Transfer Characteristic (VTC)

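Shows the relationship between Vin and Vout. Key regions and points on the curve:

  • Region 1: Vin=0 → Vout=VDD (VOH)
  • Region 5: Vin=VDD → Vout=0 (VOL)
  • VIL: Maximum Vin treated as logic 0 (the lower point where the VTC slope is −1).
  • VIH: Minimum Vin treated as logic 1 (the upper point where the VTC slope is −1).
  • Vth: Vin = Vout, the inverter threshold (switching) voltage.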

Noise Margin: Tolerance against voltage variation, ensuring logic integrity.
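
Noise margins are commonly quantified as NMH = VOH − VIH and NML = VIL − VOL. The sketch below computes both in Python; the voltage values are assumed purely for illustration and do not come from these notes.

```python
def noise_margins(voh, vol, vih, vil):
    """Return (NML, NMH) in volts for the given VTC points."""
    nml = vil - vol  # low-level noise margin
    nmh = voh - vih  # high-level noise margin
    return nml, nmh


# Assumed example values for a 1.8 V inverter.
nml, nmh = noise_margins(voh=1.8, vol=0.0, vih=1.0, vil=0.8)
print(f"NML = {nml:.2f} V, NMH = {nmh:.2f} V")
```

A larger margin on either side means the gate tolerates more noise on that logic level before the next stage misreads it.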

CMOS Inverter – Transient Analysis

  • Rise Time (tr): Time for output to swing from 10% to 90% of VDD.
  • Fall Time (tf): Time for output to swing from 90% to 10% of VDD.
  • Delay (td): Time between 50% input and 50% output transitions.

td ∝ CL / (VDD × k). To reduce delay, minimize CL, increase k (the transistor gain factor, which scales with the W/L ratio), or increase VDD.

Power Dissipation in CMOS

  1. Static Dissipation: Due to leakage currents (reverse bias, subthreshold). Typically small, in the nW range per inverter.
  2. Dynamic Dissipation: Due to switching activity and charging/discharging the load capacitance (CL).
    • Power ∝ f × CL × VDD²
    • Short-circuit current during transitions also contributes to dynamic power.
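
Because switching power scales with VDD², supply voltage is the strongest lever on dynamic power. A minimal Python sketch of the relation follows; the `activity` factor and the example numbers (100 MHz, 10 fF) are assumptions added for illustration.

```python
def dynamic_power(freq_hz, c_load_f, vdd_v, activity=1.0):
    """Switching power in watts: activity * f * CL * VDD^2."""
    return activity * freq_hz * c_load_f * vdd_v ** 2


# Assumed example: 100 MHz switching frequency, 10 fF load capacitance.
p_high = dynamic_power(100e6, 10e-15, 1.8)
p_low = dynamic_power(100e6, 10e-15, 0.9)
print(f"{p_high * 1e6:.3f} uW at 1.8 V, {p_low * 1e6:.3f} uW at 0.9 V")
print(f"Halving VDD cuts switching power by a factor of {p_high / p_low:.1f}")
```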

Realization Using Static CMOS

  • Boolean logic is realized via pull-up/pull-down trees.
  • nMOS networks: AND functions use series transistors, OR functions use parallel transistors.
  • pMOS networks: AND functions use parallel transistors, OR functions use series transistors.
  • Static CMOS offers high noise margin, no static power consumption (ideally), and full rail-to-rail output swing.
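
The series/parallel duality between the two networks can be checked with a small switch-level model. This is a sketch under simplifying assumptions (ideal switches, no delays); the tuple encoding of networks and all function names are hypothetical choices made for this example.

```python
from itertools import product


def conducts(net, inputs, device):
    """Return True if a switch network conducts.

    net is an input name (one transistor) or a tuple
    ("series" | "parallel", subnet, subnet, ...).
    device is "n" (on when gate = 1) or "p" (on when gate = 0).
    """
    if isinstance(net, str):
        value = inputs[net]
        return value == 1 if device == "n" else value == 0
    kind, *subnets = net
    results = [conducts(s, inputs, device) for s in subnets]
    return all(results) if kind == "series" else any(results)


def dual(net):
    """Swap series and parallel to derive the pull-up network from the pull-down."""
    if isinstance(net, str):
        return net
    kind, *subnets = net
    flipped = "parallel" if kind == "series" else "series"
    return (flipped, *[dual(s) for s in subnets])


def cmos_gate(pdn, inputs):
    """Output of a static CMOS gate built from the nMOS network pdn and its pMOS dual."""
    pull_down = conducts(pdn, inputs, "n")
    pull_up = conducts(dual(pdn), inputs, "p")
    assert pull_up != pull_down  # exactly one network conducts
    return 0 if pull_down else 1


nand2 = ("series", "A", "B")  # nMOS in series gives NAND
for a, b in product((0, 1), repeat=2):
    print(a, b, cmos_gate(nand2, {"A": a, "B": b}))
```

The internal assertion documents why static CMOS ideally draws no static current: for any input combination, only one of the two networks conducts.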

Pass Transistor Logic (PTL)

  • Uses nMOS transistors as switches. Passes a strong 0 but a degraded 1 due to the Vtn drop.
  • Voltage degradation limits fan-out. Requires fewer transistors than static CMOS but has signal integrity issues.

Transmission Gate Logic (TGL)

  • Combines nMOS and pMOS transistors in parallel. Controlled by complementary inputs.
  • Passes both 0 and 1 strongly, acting as a bidirectional switch.
  • Eliminates Vtn degradation issues. Used in multiplexers, latches, and tri-state buffers.

Complementary Pass-Transistor Logic (CPL)

  • Logic is performed via transmission gates and pass transistors.
  • Used to implement complex functions like XOR and XNOR.
  • Requires pull-up/pull-down networks to restore full logic levels.

Summary

  • Static CMOS provides robust logic implementation with full swing and low static power.
  • PTL and TGL improve area and speed but require careful design for voltage swing restoration.
  • Transient behavior affects delay and power, emphasizing the need to minimize CL and optimize rise/fall times.

Module 3 – Dynamic Logic Design and Storage Cells

Dynamic Logic

Dynamic logic improves transistor efficiency by using precharge and evaluate phases. The pull-down network (PDN) is the same as in static CMOS but is gated by a clock (CLK).

  • Precharge phase (CLK=0): A PMOS transistor precharges the output to VDD, and the NMOS evaluation path is OFF.
  • Evaluate phase (CLK=1): The PMOS is OFF, and the NMOS is ON. The output discharges to GND if the PDN conducts; otherwise, it retains the precharged value. Once discharged, the output cannot recharge until the next precharge phase, so each input may make at most one (0→1) transition during evaluation.
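
The two phases can be captured in a behavioural model. The sketch below is a deliberate simplification (ideal charge storage, no leakage or charge sharing); the function names and the input sequence are assumptions made for illustration.

```python
def dynamic_gate_step(clk, pdn_conducts, out_prev):
    """One time step of a dynamic gate output node."""
    if clk == 0:
        return 1  # precharge: PMOS pulls the output to VDD
    # evaluate: the output can only discharge, never recharge
    return 0 if pdn_conducts else out_prev


def simulate(clock, pdn_states):
    """Return the output trace for a sequence of clock and PDN states."""
    out, trace = 1, []
    for clk, pdn in zip(clock, pdn_states):
        out = dynamic_gate_step(clk, pdn, out)
        trace.append(out)
    return trace


# PDN with A and B in series (conducts only when A = B = 1).
clock = [0, 1, 1, 0, 1]
inputs = [(0, 0), (1, 0), (1, 1), (1, 1), (0, 1)]
pdn = [bool(a and b) for a, b in inputs]
print(simulate(clock, pdn))
```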

Advantages of Dynamic Logic

  • Reduced transistor count (N+2 vs 2N for an N-input gate).
  • Faster due to smaller capacitance.
  • No static power consumption.
  • No short-circuit or glitching power.

Disadvantages of Dynamic Logic

  • Charge leakage: Occurs via reverse-biased diodes and subthreshold current.
  • Charge sharing: Redistributes charge when uninitialized nodes connect to charged nodes.
  • Clock skew: Can lead to race conditions in multi-stage logic.
  • Cascading issues: Delay in output discharge affects the next stage’s precharged input.

Mitigation Techniques

  • Charge leakage: Solved using a weak PMOS pull-up (bleeder) or higher threshold devices.
  • Charge sharing: Mitigated with an always-on weak PMOS.
  • Clock skew and cascading issues: Prevented by setting inputs to 0 during the precharge phase.

Domino Logic

Domino Logic combines dynamic nMOS logic with a static inverter. During precharge (CLK=0), the dynamic gate output is set to VDD, and the inverter output is 0. During evaluation (CLK=1), if the PDN conducts, the dynamic output discharges, causing the inverter output to rise. This structure only allows 0→1 transitions at the final output.

Advantages of Domino Logic

  • Fast operation.
  • Fewer transistors compared to static CMOS for complex gates.
  • No glitching or short-circuit power.

Limitations of Domino Logic

  • Requires an inverter after each dynamic gate.
  • Only implements non-inverting logic functions.

Workaround: Use De Morgan’s Theorem or compound domino gates to implement inverting logic.


NORA (NP-Domino) Logic

NORA (NP-Domino) or NP CMOS Logic uses alternating nMOS and pMOS dynamic stages. Precharge/evaluate phases are controlled by CLK and CLK’. nMOS logic stages precharge high, while pMOS logic stages predischarge low. This enables a pipelined architecture and eliminates the need for a static inverter after every stage.

Drawbacks of NORA Logic

  • Slower performance due to pMOS logic stages.
  • Charge sharing and leakage issues still exist.

Storage Cells

Memory Classification

  • ROM (Read-Only Memory): Stores fixed, non-volatile data. Types include Diode ROM, MOS ROM, and array structures like OR/NOR/NAND arrays.
  • RAM (Random Access Memory): Allows read-write access. Can be static (SRAM) or dynamic (DRAM).

ROM Cell Arrays

  • 4×4 NOR ROM: A word line goes high. If a transistor exists at the cross-point with a bit line, the output is 0; otherwise, it is 1.
  • 4×4 NAND ROM: All word lines are high except the selected one, which is low. If a transistor exists at the cross-point, the output is 1; otherwise, it is 0.
  • 4×4 OR ROM: Logic is stored via an ORed connection structure.
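
The NOR and NAND read rules above differ only in how a transistor at a cross-point maps to the output bit. A minimal Python sketch, assuming a made-up 4×4 transistor pattern (the contents and function names are hypothetical):

```python
# ROM_TRANSISTORS[w][b] is True where a transistor sits at the crossing of
# word line w and bit line b. The pattern is invented for illustration.
ROM_TRANSISTORS = [
    [True, False, True, True],
    [False, False, True, False],
    [True, True, False, False],
    [False, True, False, True],
]


def read_nor_rom(transistors, word):
    """NOR ROM: a transistor on the selected word line pulls its bit line to 0."""
    return [0 if present else 1 for present in transistors[word]]


def read_nand_rom(transistors, word):
    """NAND ROM: a transistor on the selected (low) word line breaks the chain, giving 1."""
    return [1 if present else 0 for present in transistors[word]]


print(read_nor_rom(ROM_TRANSISTORS, 0))
print(read_nand_rom(ROM_TRANSISTORS, 0))
```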

Static RAM (SRAM)

SRAM (Static RAM) uses cross-coupled inverters to store data and 2 access transistors (6T cell) to connect to bit lines.

  • Hold: Word Line (WL) = 0, no connection to bit lines (BL). The inverters maintain the state.
  • Read: WL = 1. BL and BL’ are precharged. The stored values Q and Q’ are passed to the bit lines, creating a differential voltage.
  • Write: WL = 1. Data is driven onto BL/BL’ from the write circuitry, forcing the cross-coupled inverters to the new state.

SRAM Pros

  • No refresh required.
  • Fast access times.
  • Low static power consumption.
  • High noise margin.

SRAM Cons

  • Large area per cell compared to DRAM.

Dynamic RAM (DRAM)

DRAM (Dynamic RAM) stores data as charge on a capacitor. The charge leaks over time, requiring periodic refreshing.

  • 1T DRAM: Consists of 1 transistor and 1 capacitor. Data is written by storing charge. Reading is destructive because it disturbs the stored charge, so the data must be written back (restored) after each read.
  • 3T DRAM: Adds separate write/read control using 3 transistors (M1–M3).
    • Write 1: Write Select (WS) = 1, charge is transferred to capacitor C1.
    • Read 1: Read Select (RS) = 1. If C1 holds a charge (logic 1), the storage transistor (whose gate is tied to C1) conducts, discharging the read bit line.
    • Write 0: WS = 1, C1 is discharged through the write access transistor.
    • Read 0: RS = 1. If C1 holds no charge (logic 0), the storage transistor does not conduct, and the read bit line remains charged.

DRAM Pros

  • Small area per cell.
  • Low power consumption (when not refreshing).

DRAM Cons

  • Requires periodic refresh cycles.
  • Needs peripheral logic for control and sensing.

Module 4 – Arithmetic Circuits

Full Adder

A Full Adder generates a Sum = A ⊕ B ⊕ Ci and a Carry = AB + BCi + ACi. It uses generate (G = AB), propagate (P = A⊕B), and delete (D = A’B’) terms, so that Carry = G + P·Ci. CMOS implementation involves translating these Boolean equations into transistor networks while minimizing transistor count and delay.
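
The full adder equations can be checked exhaustively in a few lines. This Python sketch takes delete as D = NOT A AND NOT B (the usual definition) and verifies that G + P·Ci matches the majority form of the carry; the function name is an illustrative choice.

```python
from itertools import product


def full_adder(a, b, ci):
    """Return (sum, carry_out, (G, P, D)) for single-bit inputs."""
    g = a & b              # generate: carry out regardless of Ci
    p = a ^ b              # propagate: carry out equals Ci
    d = (1 - a) & (1 - b)  # delete: no carry out regardless of Ci
    s = p ^ ci
    co = g | (p & ci)
    return s, co, (g, p, d)


for a, b, ci in product((0, 1), repeat=3):
    s, co, (g, p, d) = full_adder(a, b, ci)
    assert co == (a & b) | (b & ci) | (a & ci)  # same as AB + BCi + ACi
    assert g + p + d == 1                       # exactly one condition holds
    print(a, b, ci, "->", "sum", s, "carry", co)
```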

Static CMOS Adder

A Static CMOS Adder shares logic between the sum and carry sub-circuits for efficiency. The carry output often requires inversion, which can impact speed. 28-transistor designs are typical for a full adder.

Ripple Carry Adder

A Ripple Carry Adder connects full adders linearly, where the carry-out of one stage becomes the carry-in of the next. The worst-case delay is linear with the number of bits: tadder = (N–1)tcarry + tsum. The critical path spans all stages. Optimizing the tcarry delay is more crucial than the tsum delay for overall speed.
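
A bit-level sketch of the ripple structure and its delay formula is given below. The function names and the delay values are assumptions for illustration; bit lists are least significant bit first.

```python
def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Add two equal-length bit lists (LSB first); return (sum_bits, carry_out)."""
    sum_bits, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (b & carry) | (a & carry)  # carry feeds the next stage
    return sum_bits, carry


def ripple_delay(n, t_carry, t_sum):
    """Worst-case delay: (N - 1) * tcarry + tsum."""
    return (n - 1) * t_carry + t_sum


# 11 + 6 = 17 with 4-bit operands: sum bits 0001 plus a carry out.
print(ripple_carry_add([1, 1, 0, 1], [0, 1, 1, 0]))
# Assumed unit delays: tcarry = 1.0, tsum = 1.5 (arbitrary units).
print(ripple_delay(16, t_carry=1.0, t_sum=1.5))
```

Because tcarry is multiplied by N − 1 while tsum appears only once, shaving the carry path pays off far more than shaving the sum path.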

Delay Reduction

Delay can be reduced by utilizing the inversion property of logic gates — if all inputs to a full adder are inverted, the outputs are also inverted. This allows for reduced inverter stages per full adder, improving delay without affecting the logic function.

Manchester Carry-Chain Adder

A Manchester Carry-Chain Adder uses generate (G), propagate (P), and delete (D) logic implemented with pass transistors. The carry propagates along a chain based on G and P signals, simplifying the carry logic. Both static and dynamic versions exist.

Carry Bypass Adder (Carry Skip Adder)

A Carry Bypass Adder (Carry Skip Adder) allows the carry signal to skip blocks of adders when all propagate signals within that block are high. Typically uses 4-bit blocks with bypass paths. If P0·P1·P2·P3 = 1 for a block, then the carry-out of the block (Co) is equal to the carry-in (Ci), bypassing the internal ripple chain.

Delay Model

tsetup: Time to compute G and P for all bits.
tbypass: Delay through the bypass multiplexer.
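td = tsetup + M·tcarry + (N/M – 1)·tbypass + (M – 1)·tcarry + tsum, where N is the total number of bits, M is the number of bits per bypass block, and tcarry and tsum are the per-bit carry and sum delays.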
For large N, the bypass adder delay increases more slowly compared to a ripple adder.
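
To see where the bypass adder starts to win, the delay model can be evaluated next to the ripple formula tadder = (N–1)·tcarry + tsum. This Python sketch assumes unit delays for every term, which is an illustrative simplification; the function names are hypothetical.

```python
def ripple_adder_delay(n, t_carry, t_sum):
    """Ripple-carry worst case: (N - 1) * tcarry + tsum."""
    return (n - 1) * t_carry + t_sum


def bypass_adder_delay(n, m, t_setup, t_carry, t_bypass, t_sum):
    """Carry-bypass worst case for N bits split into blocks of M bits."""
    if n % m:
        raise ValueError("n must be a multiple of the block size m")
    return (t_setup + m * t_carry + (n // m - 1) * t_bypass
            + (m - 1) * t_carry + t_sum)


# Assumed unit delays for all components, 4-bit blocks.
for n in (4, 8, 16, 32, 64):
    ripple = ripple_adder_delay(n, 1, 1)
    bypass = bypass_adder_delay(n, 4, 1, 1, 1, 1)
    print(f"N={n:2d}  ripple={ripple:3d}  bypass={bypass:3d}")
```

With these assumed delays the ripple adder is still faster at N = 4, and the bypass adder pulls ahead as N grows, since only the (N/M – 1)·tbypass term depends on N.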

Linear Carry Select Adder

A Linear Carry Select Adder precomputes the sum and carry for each block assuming both Ci=0 and Ci=1. When the actual carry-in signal arrives, the correct precomputed result is selected using a multiplexer. This cuts down delay at the cost of approximately 30% more hardware.

The adder is divided into equal-sized blocks, each evaluating in parallel. The final output is selected based on the actual carry-in from the previous block.
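
The precompute-then-select idea can be modelled functionally as follows. This is a software sketch only: in hardware both candidates for every block are computed in parallel, whereas the loop here visits blocks in order. All names are hypothetical, and bit lists are least significant bit first.

```python
def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def add_block(a_bits, b_bits, carry_in):
    """Ripple-add one block; return (sum_bits, carry_out)."""
    sum_bits, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry


def carry_select_add(a_bits, b_bits, block_size, carry_in=0):
    result, carry = [], carry_in
    for start in range(0, len(a_bits), block_size):
        a_blk = a_bits[start:start + block_size]
        b_blk = b_bits[start:start + block_size]
        # Candidate results for Ci = 0 and Ci = 1 (precomputed in hardware).
        candidates = (add_block(a_blk, b_blk, 0), add_block(a_blk, b_blk, 1))
        sum_bits, carry = candidates[carry]  # multiplexer picks one
        result.extend(sum_bits)
    return result, carry


bits, carry_out = carry_select_add(to_bits(200, 8), to_bits(100, 8), block_size=4)
print(from_bits(bits), carry_out)
```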

Delay Model

tadd ∝ N (the delay is still linear in N).
tcarry: Delay through a full adder.
tmux: Delay through the multiplexer.

Square Root Carry Select Adder

A Square Root Carry Select Adder varies the block size per stage (e.g., 2, 3, 4, … bits). This approach aims to match the multiplexer delay with the carry generation delay in each stage to equalize arrival times.

Adder stages are sized as 2, 3, 4, … bits to align delays, resulting in a sub-linear delay profile.

Let N be the total number of bits, and P be the number of blocks. If the block sizes are M, M+1, …, M+P–1, then N = M + (M+1) + … + (M+P–1) = M·P + P(P–1)/2. If M << N, the P²/2 term dominates, so P ≈ √(2N), and the total delay becomes sublinear in N (it grows as √N).
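
The block-count estimate can be checked numerically. The sketch below builds the block sizes for a given N, starting from an assumed first block of 2 bits and trimming the last block so the sizes sum exactly to N; the function name and the trimming rule are choices made for this example.

```python
import math


def sqrt_select_block_sizes(n_bits, first_block=2):
    """Block sizes first_block, first_block + 1, ... covering n_bits (n_bits > 0)."""
    sizes, total, size = [], 0, first_block
    while total < n_bits:
        sizes.append(size)
        total += size
        size += 1
    sizes[-1] -= total - n_bits  # trim the last block so the sizes sum to N
    return sizes


for n in (16, 32, 64):
    sizes = sqrt_select_block_sizes(n)
    print(f"N={n}: P={len(sizes)} blocks {sizes}, sqrt(2N)={math.sqrt(2 * n):.1f}")
```

The block count tracks √(2N) only approximately for small N, because the M·P term is not yet negligible.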


Multipliers

Binary Multiplier

A Binary Multiplier performs multiplication via the generation of partial products and their accumulation. Partial products are formed using AND gates; an N×N multiplier needs N² AND gates.

Summing the partial products is typically done using full adders, often organized in an array multiplier.
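
Partial-product generation and accumulation can be sketched as below. The AND-gate array is modelled bit by bit, while the accumulation uses ordinary integer addition in place of an adder array, which is a simplification; the function names are hypothetical.

```python
def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def partial_products(x_bits, y_bits):
    """One row per multiplier bit y_j; each entry is x_i AND y_j."""
    return [[x & y for x in x_bits] for y in y_bits]


def array_multiply(x, y, width=4):
    """Return (product, number_of_and_gates) for width-bit operands."""
    rows = partial_products(to_bits(x, width), to_bits(y, width))
    and_gates = sum(len(row) for row in rows)
    # Entry (i, j) has weight 2^(i + j); hardware sums these with adder cells.
    product = sum(bit << (i + j)
                  for j, row in enumerate(rows)
                  for i, bit in enumerate(row))
    return product, and_gates


print(array_multiply(13, 11))
```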

Array Multiplier

An Array Multiplier uses a regular structure of AND gates and adders. Each cell computes xi·yj and adds it with the partial sum and carry from adjacent cells. The cells are organized into rows and columns.

For a 4×4 multiplier, this requires 16 AND gates, 4 half adders, and multiple full adders. Partial products are summed using ripple-carry adders within the array structure.

Delay Analysis

Critical paths pass through multiple adder cells, leading to a long propagation time. Many paths in a simple array multiplier are nearly identical in length, so optimizing a single path is not enough; all of them must be optimized. Faster adders (e.g., carry-select) can be used in the critical path to improve performance.

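For an M×N array multiplier: tmult ≈ [(M–1) + (N–2)]·tcarry + (N–1)·tsum + tand
Because both tcarry and tsum are multiplied by terms that grow with the operand width, minimizing both is key to improving the speed of the multiplier.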


Module 5 – Fabrication Techniques and MOSFET Physical Design

Material Preparation

Silicon wafers are produced through the following sequence: Quartzite (raw material) → Metallurgical Grade Si → Electronic Grade Si → Single Crystal Si Boule → Si Wafer.

Wafer Preparation Steps

  1. Slicing (cutting the boule into wafers)
  2. Lapping (planarizing the surface)
  3. Etching (removing surface damage)
  4. Polishing (creating a mirror-smooth surface)
  5. Cleaning (removing contaminants)

Thermal Oxidation

Oxidation is the formation of SiO₂ on the silicon surface.

Uses of SiO₂

  • Surface passivation
  • Doping barrier
  • Dielectric (field oxide, gate oxide)

Growth Mechanisms

  • Dry oxidation: Uses O₂. Slower growth rate but produces high-quality oxide.
  • Wet oxidation: Uses H₂O. Faster growth rate but produces lower-quality oxide.

Surface passivation: SiO₂ prevents contamination and chemical reactions.
Dopant barrier: Blocks dopants and has a thermal expansion coefficient similar to Si.

Diffusion

Diffusion is a high-temperature process where dopant atoms move into the silicon lattice.

Advantages of Diffusion

  • No crystal damage.
  • Batch processing is possible.

Disadvantages of Diffusion

  • High temperature required.
  • Limited solubility of some dopants.
  • Difficult to form shallow junctions precisely.

Ion Implantation

Ion Implantation accelerates dopant ions into the silicon substrate.

Advantages of Ion Implantation

  • Low temperature process.
  • Precise control of dopant concentration and depth.
  • Compatible with oxide layers (oxide can be used as a mask).

Disadvantages of Ion Implantation

  • Causes implant damage to the crystal lattice.
  • Requires subsequent annealing to repair damage and activate dopants.
  • Risk of dislocation formation.

Epitaxy (Ordered Layer Growth)

Epitaxy is the growth of a single-crystal film on a crystalline substrate, where the film’s crystal structure is an extension of the substrate.

Types of Epitaxy

  • Homoepitaxy: Growth of the same material (e.g., Si on Si). Produces purer films and allows for varied doping profiles.
  • Heteroepitaxy: Growth of a different material (e.g., SiGe on Si).

Method

Molecular Beam Epitaxy (MBE): An ultra-high vacuum deposition technique used for growing single-crystal thin films with precise control.


Lithography

Lithography is used to transfer a pattern from a mask to the wafer surface.

Types of Lithography

  • Photo Lithography: Projects a mask image onto a photoresist-coated wafer using UV light. Most common technique.
  • Electron Beam Lithography: Uses an electron beam to write patterns directly. Offers higher resolution but is slower and more expensive.

Basic Steps of Photolithography

  1. Coat wafer with photoresist (a light-sensitive polymer).
  2. Expose the photoresist through a mask using UV light.
  3. Develop the photoresist, removing either exposed or unexposed areas depending on the resist type.
  4. Etch the underlying layer using the patterned photoresist as a mask.
  5. Strip the remaining photoresist.

Etching Techniques

Etching removes material from the wafer surface based on the lithographically defined pattern.

  • Wet etching: Uses liquid chemicals. Typically isotropic (etches equally in all directions), leading to undercutting.
  • Dry etching: Uses plasma or reactive gases. Can be anisotropic (etches preferentially in one direction, usually vertical), which is preferred for fine features. Plasma etching is a common dry etching method.

Deposition

Deposition adds thin films of various materials (metals, dielectrics, semiconductors) onto the wafer surface.

Deposition Methods

  • Physical Deposition: Evaporation, Physical Vapor Deposition (PVD).
  • Chemical Deposition: Chemical Vapor Deposition (CVD), Metalorganic Chemical Vapor Deposition (MOCVD), Atomic Layer Deposition (ALD).
  • Epitaxial Deposition: MBE, Liquid Phase Epitaxy (LPE), Vapor Phase Epitaxy (VPE).

CVD Reactors

Atmospheric Pressure CVD (APCVD), Low Pressure CVD (LPCVD), Metalorganic CVD (MOCVD), Plasma Enhanced CVD (PECVD).

Deposition Terms

  • Step coverage: The uniformity of the deposited film over steps or features on the wafer surface.
  • Aspect ratio: The ratio of the height (H) to the width (W) of a feature.
  • Throughput: The number of wafers processed per hour.
  • Deposition rate: The rate at which the film thickness increases over time.

MOSFET Fabrication Techniques

CMOS Technology

CMOS Technology uses both N-type and P-type MOSFET devices on the same chip.

CMOS Technologies

  • P-Well: Uses an N-type substrate and creates P-wells for NMOS transistors.
  • N-Well: Uses a P-type substrate and creates N-wells for PMOS transistors.
  • Twin-Tub: Creates both N-wells and P-wells on a lightly doped substrate (often N-type). Allows for better optimization of both p-channel and n-channel transistor performance.
  • SOI (Silicon-on-Insulator): Fabricates nMOS and pMOS transistors on a thin layer of silicon over an insulating layer (like buried oxide). Provides excellent isolation and reduces parasitic effects.

Twin-Tub Process

The Twin-Tub Process typically starts with a high-resistivity N-substrate. Both N-wells (for PMOS) and P-wells (for NMOS) are formed. This process allows for independent optimization of the doping profiles for both well types, leading to better transistor performance and reduced latch-up susceptibility.

Layout Design

Layout Design translates the circuit schematic into a physical representation on the chip.

Stick Diagram

A Stick Diagram is a symbolic, color-coded representation of the layout, capturing the relative placement and connections of different layers (diffusion, polysilicon, metal, etc.).

Purpose of Stick Diagrams

They serve as an interface between the logic design and the actual physical layout, providing a simplified view before detailed geometric layout.

Layers Represented in Stick Diagrams

n-diffusion, p-diffusion, polysilicon, metal1, contacts, etc.

Design Rules

Design Rules are a set of geometric constraints that translate the circuit layout into a manufacturable geometry, ensuring reliable fabrication.

Purpose of Design Rules

To ensure that the layout can be manufactured reliably by the fabrication process, accounting for process variations and limitations.

Types of Design Rules

  • Micron Rules: Specify dimensions and spacing in absolute units (e.g., 0.6 μm).
  • Lambda (λ) Rules: Specify dimensions and spacing in terms of a scalable parameter λ, where λ is typically half of the minimum feature size (f/2). This allows the design to be easily scaled to different process technologies.
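
The scalability of lambda rules can be illustrated by converting one rule set to two feature sizes. In this Python sketch the rule names and their λ multiples are hypothetical placeholders, not values from any real process.

```python
# Hypothetical rule set expressed in multiples of lambda (illustrative only).
RULES_IN_LAMBDA = {
    "poly_min_width": 2,
    "poly_min_spacing": 2,
    "metal1_min_width": 3,
}


def lambda_to_microns(rules_in_lambda, min_feature_um):
    """Convert lambda-based rules to microns, taking lambda = feature size / 2."""
    lam = min_feature_um / 2
    return {name: value * lam for name, value in rules_in_lambda.items()}


for feature in (0.6, 0.35):
    print(feature, lambda_to_microns(RULES_IN_LAMBDA, feature))
```

The same layout, described once in λ, yields absolute dimensions for each process simply by changing the feature size.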

Design Rules Include

  • Intra-layer rules: Minimum width and spacing requirements for features on the same layer.
  • Inter-layer rules: Alignment and overlap requirements between features on different layers (e.g., contact size and overlap with diffusion/polysilicon).

Stick Diagram Examples

Stick diagrams are commonly drawn for basic circuits such as:

  • NMOS pull-up: Using a resistor or depletion-mode NMOS.
  • CMOS Inverter, NAND, NOR gates: Showing the layout topology and corresponding logic function.