Microprocessor Systems: 8086 and ARM Architecture Reference
CMPS201 Microprocessor Systems Reference: Front
8086 Architecture and Layer Stack
Layer Stack (Bottom to Top):
- Physics → Transistors → Logic Gates → Microarchitecture (This Course) → Instruction Set Architecture (ISA) → Operating System → Application
Microprocessor (MPU) vs. Microcontroller (MCU)
| Feature | MPU | MCU |
|---|---|---|
| Type | CPU only | System on Chip (CPU, RAM, ROM, I/O) |
| External Hardware Needed? | Yes (Motherboard, RAM) | No (Just battery and crystal) |
| Example | Intel i7, AMD Ryzen | STM32, ATmega, Arduino |
| Use Case | PC or Server | Embedded (Appliances, Cars) |
| Analogy | Ferrari Engine | Toyota Corolla (Ready to drive) |
Von Neumann vs. Harvard Architecture
| Feature | Von Neumann | Harvard |
|---|---|---|
| Buses | 1 Shared | 2 Separate |
| Bottleneck? | Yes (1-lane bridge) | No |
| Fetch and Read? | Serial (Wait) | Simultaneous |
| 8086? | Von Neumann | Modern i7 cache uses Harvard internally |
Warning: The 8086 is Von Neumann (one shared bus for instructions and data). It is not a Harvard design.
Bus Interface Unit (BIU) vs. Execution Unit (EU)
| Unit | Role | Components |
|---|---|---|
| BIU (Logistics Manager) | External world interface; fetches instructions; reads/writes data; calculates 20-bit physical address. | 6-byte Instruction Queue, Segment Registers (CS, DS, SS, ES), Instruction Pointer (IP), Address Generator. |
| EU (Worker/Brain) | No external pins; gets instructions from BIU queue; executes; writes flags. | Control Circuit (Decoder), Arithmetic Logic Unit (ALU), Flag Register. |
- 6-byte Instruction Queue: Primitive pipelining. While the EU is busy executing, the BIU prefetches up to 6 bytes of upcoming instructions.
- Warning: Control-transfer instructions (JMP, CALL, RET) flush the queue (Branch Penalty/Pipeline Flush), because the prefetched bytes are no longer the next instructions. The BIU must restart fetching from the new address.
8086 Registers
General Purpose (16-bit; can split to High/Low byte)
| Register | Name | Critical Rule |
|---|---|---|
| AX (AH/AL) | Accumulator | MUL and DIV must use AX. Math specialist. |
| BX (BH/BL) | Base / Pointer | Only General Purpose register usable as a memory pointer [BX]. [CX] or [DX] are illegal. |
| CX (CH/CL) | Counter | LOOP instructions automatically decrement CX and check for zero. |
| DX (DH/DL) | Data / I/O | Port address for I/O. In 16×16 MUL, the upper 16 bits go to DX and the lower to AX. |
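The DX:AX split for a 16×16 MUL can be sketched in C. This is a minimal model, not an emulator: the function name `mul16` and its output parameters are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a 16x16 unsigned MUL: the 32-bit product is split so the
   upper half lands in DX and the lower half in AX.
   mul16 and its parameter names are illustrative, not from the notes. */
void mul16(uint16_t ax_in, uint16_t operand, uint16_t *dx_out, uint16_t *ax_out)
{
    assert(dx_out != NULL && ax_out != NULL);
    uint32_t product = (uint32_t)ax_in * (uint32_t)operand;
    *dx_out = (uint16_t)(product >> 16);      /* upper 16 bits */
    *ax_out = (uint16_t)(product & 0xFFFFu);  /* lower 16 bits */
}
```

For example, 0xFFFF × 0xFFFF = 0xFFFE0001, so the model leaves DX = 0xFFFE and AX = 0x0001.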
Index and Pointer Registers
- SI (Source Index): Used for string operations (MOVSB reads from [SI]).
- DI (Destination Index): Used for string operations (MOVSB writes to [DI]).
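- SP (Stack Pointer): Points to the top of the stack. PUSH, POP, CALL, and RET update it automatically; avoid changing it by hand except for deliberate stack-frame adjustments.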
- BP (Base Pointer): Used to access stack locals and parameters without push/pop.
- IP (Instruction Pointer): The program counter. Always paired with CS; it cannot be given a segment override.
Segment Registers
- CS (Code Segment): Executable instructions.
- DS (Data Segment): Global variables; default for [BX].
- SS (Stack Segment): Function variables and stack.
- ES (Extra Segment): Destination for string copies.
Warning: Valid pointer registers inside brackets [] are BX, BP, SI, and DI only. AX, CX, and DX are illegal there and cause assembler errors.
Flag Register (Resides in EU)
- Z (Zero): Result equals 0.
- N/S (Negative/Sign): Result is less than 0.
- C (Carry): Unsigned overflow.
- O/V (Overflow): Signed overflow.
Flags are updated by most ALU operations (ADD, SUB, CMP, AND, and similar). Data-movement instructions such as MOV do not change them.
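How the four flags fall out of a single 16-bit addition can be sketched as follows. The `flags16` struct and `add16` function are hypothetical names for illustration; real hardware derives these bits inside the ALU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag set: Zero, Sign, Carry, Overflow. */
typedef struct { bool z, s, c, o; } flags16;

/* Add two 16-bit values and derive the flags from the result. */
uint16_t add16(uint16_t a, uint16_t b, flags16 *f)
{
    assert(f != NULL);
    uint32_t wide = (uint32_t)a + (uint32_t)b;
    uint16_t r = (uint16_t)wide;
    f->z = (r == 0);              /* result equals 0 */
    f->s = (r & 0x8000u) != 0;    /* top bit set: negative if signed */
    f->c = (wide > 0xFFFFu);      /* unsigned overflow */
    /* Signed overflow: both operands share a sign, result sign differs. */
    f->o = ((~(a ^ b) & (a ^ r)) & 0x8000u) != 0;
    return r;
}
```

Adding 1 to 0x7FFF sets O and S but not C (signed overflow only), while adding 1 to 0xFFFF sets Z and C but not O (unsigned overflow only).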
Memory Addressing and Endianness
- Byte-addressable: Smallest unit is 1 Byte (8 bits).
- 16-bit offset registers: Each segment spans at most 64 KB (2^16 bytes).
- 8086 20-bit address bus: 1 MB address space.
- Intel Little Endian: Least Significant Byte (LSB) is stored at the lowest address.
Example: 0x12345678 at address 1000:
- Address 1000: 78 (LSB)
- Address 1001: 56
- Address 1002: 34
- Address 1003: 12 (MSB)
Exam Trap: “What byte is at address 1000?” The answer is 78, not 12. Big Endian would store the MSB first.
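The byte order in the example can be checked with a short C sketch. The helper names `store_le32` and `load_le32` are hypothetical; they write the bytes explicitly so the result does not depend on the host machine's own endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value least-significant byte first (little endian). */
void store_le32(uint8_t *mem, uint32_t value)
{
    assert(mem != NULL);
    for (size_t i = 0; i < 4; i++) {
        mem[i] = (uint8_t)(value >> (8 * i));
    }
}

/* Reassemble a 32-bit value from four little-endian bytes. */
uint32_t load_le32(const uint8_t *mem)
{
    assert(mem != NULL);
    uint32_t value = 0;
    for (size_t i = 0; i < 4; i++) {
        value |= (uint32_t)mem[i] << (8 * i);
    }
    return value;
}
```

Storing 0x12345678 this way leaves 0x78 in `mem[0]` (the lowest address) and 0x12 in `mem[3]`, matching the layout above.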
Segmentation and Physical Addresses
Physical Address = (Segment × 16) + Offset
- Equivalent to: (Segment << 4) + Offset.
- Example: DS = 0x2000, Offset = 0x0050 → 0x20000 + 0x0050 = 0x20050.
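The segment arithmetic above can be sketched in C. The function names are hypothetical. The wrap to 20 bits models the 8086's 20 address lines, and `offset_within` illustrates that several segment:offset pairs can name the same physical byte.

```c
#include <assert.h>
#include <stdint.h>

/* Physical Address = (Segment << 4) + Offset, wrapped to 20 bits. */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    uint32_t pa = ((uint32_t)segment << 4) + offset;
    return pa & 0xFFFFFu; /* 20 address lines: wraps at 1 MB */
}

/* Inverse view: the offset that reaches `phys` from `segment`.
   Precondition: phys lies inside that segment's 64 KB window. */
uint16_t offset_within(uint32_t phys, uint16_t segment)
{
    uint32_t base = (uint32_t)segment << 4;
    assert(phys >= base && phys - base <= 0xFFFFu);
    return (uint16_t)(phys - base);
}
```

Both `physical_address(0x2000, 0x0050)` and `physical_address(0x2005, 0x0000)` give 0x20050, which is why segments are said to overlap.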
| Segment | Default Register | Content |
|---|---|---|
| CS | IP | Code (Instructions) |
| DS | BX, SI, DI | Global Variables |
| SS | SP, BP | Stack |
| ES | DI | String Destination |
Segment Override: MOV AX, CS:[BX] uses CS instead of DS. Note: You cannot override IP; the CPU always fetches from CS.
Addressing Modes
| Mode | Syntax | Speed | Notes |
|---|---|---|---|
| Immediate | MOV AX, 5 | Fast | Data is in the instruction; no memory fetch. |
| Register | MOV AX, BX | Fastest | In-CPU; zero memory access. |
| Direct | MOV AX, [1000H] | Slower | Hardcoded address (DS:1000H). Used for global variables. |
| Register Indirect | MOV AX, [BX] | Slower | Pointer. BX holds the address. Only BX, BP, SI, DI allowed. |
| Based + Indexed | MOV AL, [BX+SI] | Slower | Arrays. Base (BX/BP) + Index (SI/DI). The effective-address calculation adds extra clock cycles on the 8086. |
Warning: MOV AX, BX copies the value of BX. MOV AX, [BX] goes to the address stored in BX (pointer dereference).
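The value-versus-pointer distinction can be modeled in C with a small byte array standing in for the data segment. Everything here (the 64-byte `MEM_SIZE`, the function names) is a hypothetical sketch, not how an assembler or the CPU represents these modes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MEM_SIZE = 64 }; /* tiny stand-in for a data segment (assumption) */

/* Read a 16-bit little-endian word at an offset. */
uint16_t read_word(const uint8_t *mem, uint16_t offset)
{
    assert(mem != NULL && (size_t)offset + 1 < MEM_SIZE);
    return (uint16_t)(mem[offset] | (mem[offset + 1] << 8));
}

/* MOV AX, BX : register mode copies the value held in BX. */
uint16_t mov_register(uint16_t bx)
{
    return bx;
}

/* MOV AX, [BX] : register indirect treats BX as an address. */
uint16_t mov_indirect(const uint8_t *mem, uint16_t bx)
{
    return read_word(mem, bx);
}

/* MOV AL, [BX+SI] : based + indexed, effective address = BX + SI. */
uint8_t mov_based_indexed(const uint8_t *mem, uint16_t bx, uint16_t si)
{
    uint16_t ea = (uint16_t)(bx + si);
    assert(mem != NULL && ea < MEM_SIZE);
    return mem[ea];
}
```

With BX = 0x0010 and the bytes 0x34, 0x12 stored at offsets 0x10 and 0x11, `mov_register` returns 0x0010 while `mov_indirect` returns 0x1234.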
Instruction Cycles and CISC vs. RISC
Fetch-Decode-Execute Cycle
- FETCH: PC address → Address Bus → Memory → Instruction → Instruction Register (IR). PC auto-increments.
- DECODE: Decoder reads opcode bits and activates hardware paths (ALU/MOV).
- EXECUTE: ALU performs math/logic. Write-back to destination register. Flags update.
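The three phases can be sketched as a toy interpreter loop. The two-byte instruction format and the opcodes `OP_LOAD`, `OP_ADD`, and `OP_HALT` are invented for this illustration and are not real 8086 or ARM encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 2-byte instructions: [opcode][operand]. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2 };

typedef struct {
    uint16_t pc;   /* program counter */
    uint8_t  acc;  /* accumulator */
    int      zero; /* Z flag */
} toy_cpu;

/* Run until HALT or end of program; returns instructions executed. */
size_t toy_run(toy_cpu *cpu, const uint8_t *program, size_t len)
{
    assert(cpu != NULL && program != NULL);
    size_t executed = 0;
    while ((size_t)cpu->pc + 1 < len) {
        /* FETCH: read the instruction at PC, then advance PC. */
        uint8_t opcode  = program[cpu->pc];
        uint8_t operand = program[cpu->pc + 1];
        cpu->pc += 2;
        executed++;
        /* DECODE: select the hardware path from the opcode. */
        switch (opcode) {
        case OP_LOAD: /* EXECUTE: write-back to the register */
            cpu->acc = operand;
            break;
        case OP_ADD:  /* EXECUTE: ALU operation, then flag update */
            cpu->acc = (uint8_t)(cpu->acc + operand);
            cpu->zero = (cpu->acc == 0);
            break;
        default:      /* OP_HALT or unknown opcode stops the loop */
            return executed;
        }
    }
    return executed;
}
```

Running the program `{OP_LOAD, 5, OP_ADD, 3, OP_HALT, 0}` from a zeroed `toy_cpu` leaves 8 in the accumulator after three fetches.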
CISC (x86) vs. RISC (ARM)
| Feature | CISC (8086) | RISC (ARM) |
|---|---|---|
| Philosophy | Hardware does complex work | Software breaks tasks into simple steps |
| Instruction Size | Variable length | Fixed length (32-bit in classic ARM; Thumb-2 on Cortex-M mixes 16- and 32-bit encodings) |
| CPI | > 1 (Many cycles per instruction) | ~1 (Goal) |
| Memory Access | ALU can touch RAM directly | Load/Store only (ALU cannot touch RAM) |
| Power | High (Desktop) | Low (Mobile/Embedded) |
Warning: Modern CISC (Intel) secretly converts instructions into internal Micro-ops, acting like RISC internally.
ARM Cortex-M Registers
| Register | Name | Purpose |
|---|---|---|
| R0–R3 | General Purpose | Arguments and return values. Caller-saved. |
| R4–R11 | General Purpose | Local variables. Callee-saved (must preserve). |
| R12 | IP (Scratch) | Intra-procedure scratch. Auto-saved on interrupt. |
| R13 (SP) | Stack Pointer | Full Descending stack. |
| R14 (LR) | Link Register | Stores return address on BL call. Fast (no RAM needed). |
| R15 (PC) | Program Counter | Writing to R15 causes a Jump. |
xPSR Flags: N (Negative), Z (Zero), C (Carry), V (Overflow). Note: Parity (P) is not an ARM flag.
ARM Assembly (UAL) and Control Flow
Format: OPCODE Destination, Source1, Source2
- LDR R0, [R1]: Load the word at the RAM address in R1 into R0.
- STR R0, [R1]: Store R0 to the RAM address in R1.
- BIC R0, R1, #0x20: Bit Clear (R0 = R1 AND NOT 0x20).
- BL function: Branch with Link (saves the return address, i.e. the address of the next instruction, in LR).
- BX LR: Return from function (PC = LR).
Barrel Shifter: ADD R0, R1, R2, LSL #2 → R0 = R1 + (R2 × 4) in one cycle.
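Warning: MOV R0, #0x12345678 fails to assemble because a full 32-bit immediate cannot fit inside a 32-bit instruction that must also hold the opcode and register fields. Use LDR R0, =0x12345678, which lets the assembler load the constant from memory instead.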
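To see why some constants fit and others do not, the sketch below tests the classic 32-bit ARM (A32) immediate rule: an 8-bit value rotated right by an even amount. This is an assumption chosen for illustration; Thumb-2 on Cortex-M uses a different set of encodable patterns, so treat the function as a model of the idea rather than of a specific assembler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate left by n bits (0 <= n < 32). */
uint32_t rotl32(uint32_t v, unsigned n)
{
    assert(n < 32);
    return (n == 0) ? v : (v << n) | (v >> (32 - n));
}

/* True if v is an 8-bit value rotated right by an even amount,
   i.e. some even left rotation of v fits in 8 bits (A32 rule). */
bool fits_a32_immediate(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        if (rotl32(v, rot) <= 0xFFu) {
            return true;
        }
    }
    return false;
}
```

Under this rule 0xFF and 0xFF000000 are encodable, while 0x12345678 is not, which is the case where LDR R0, =value is needed.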
Performance and Power (Iron Law)
- Time = Instruction Count × CPI × Clock Period
- Dynamic Power (P) = C × V² × f
- Voltage (V) is the most impactful factor because it is squared.
- Race-to-Sleep: Run the CPU fast to finish tasks, then sleep immediately to save energy.
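Both formulas are simple enough to check numerically. The sketch below uses hypothetical function names and unit-free inputs; it only illustrates the proportionalities stated above.

```c
#include <assert.h>

/* Iron Law: Time = Instruction Count x CPI x Clock Period. */
double exec_time(double instr_count, double cpi, double clock_period)
{
    assert(instr_count >= 0 && cpi > 0 && clock_period > 0);
    return instr_count * cpi * clock_period;
}

/* Dynamic power: P = C x V^2 x f. */
double dynamic_power(double capacitance, double voltage, double frequency)
{
    assert(capacitance >= 0 && frequency >= 0);
    return capacitance * voltage * voltage * frequency;
}
```

Halving the voltage with C and f held constant cuts dynamic power to one quarter: `dynamic_power(2, 3, 4)` is 72, while `dynamic_power(2, 1.5, 4)` is 18.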
CMPS201 Microprocessor Systems Reference: Back
GPIO and Memory-Mapped I/O
Peripherals are mapped to specific memory addresses. Writing to these addresses triggers hardware actions.
Register Address = Peripheral Base Address + Register Offset
GPIO Registers (STM32 Example)
| Offset | Register | Function | Key Values |
|---|---|---|---|
| 0x00 | MODER | Pin Direction | 00=Input, 01=Output, 10=Alternate, 11=Analog |
| 0x10 | IDR | Input Data | Read-only; current pin voltage |
| 0x14 | ODR | Output Data | Read/Write; 1=High, 0=Low |
| 0x18 | BSRR | Atomic Set/Reset | Bits 0-15=Set; Bits 16-31=Reset |
Warning: Always enable the RCC Clock first. Without a clock, register writes are silently ignored.
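The address formula and the MODER/BSRR bit layouts from the table can be sketched as pure functions that compute values without touching hardware. The function and enum names are hypothetical, and any base address passed in is a placeholder to be taken from the device's reference manual.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets and MODER field values from the table above. */
enum { MODER_OFFSET = 0x00, IDR_OFFSET = 0x10, ODR_OFFSET = 0x14, BSRR_OFFSET = 0x18 };
enum { MODE_INPUT = 0, MODE_OUTPUT = 1, MODE_ALTERNATE = 2, MODE_ANALOG = 3 };

/* Register Address = Peripheral Base Address + Register Offset. */
uint32_t register_address(uint32_t base, uint32_t offset)
{
    return base + offset;
}

/* New MODER value with the 2-bit field for `pin` set to `mode`. */
uint32_t moder_with_mode(uint32_t moder, unsigned pin, unsigned mode)
{
    assert(pin < 16 && mode <= 3);
    uint32_t shift = pin * 2u;
    moder &= ~(UINT32_C(3) << shift);   /* clear the old 2-bit field */
    moder |= (uint32_t)mode << shift;   /* write the new mode */
    return moder;
}

/* BSRR words: bits 0-15 set a pin, bits 16-31 reset it. */
uint32_t bsrr_set(unsigned pin)
{
    assert(pin < 16);
    return UINT32_C(1) << pin;
}

uint32_t bsrr_reset(unsigned pin)
{
    assert(pin < 16);
    return UINT32_C(1) << (pin + 16u);
}
```

For pin 5, `moder_with_mode(0, 5, MODE_OUTPUT)` gives 0x400 (bits 11:10 = 01), `bsrr_set(5)` gives 0x20, and `bsrr_reset(5)` gives 0x200000.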
Bit Manipulation (Read-Modify-Write)
- Set Bit: REG |= (1 << 5); (OR with 1 forces set).
- Clear Bit: REG &= ~(1 << 5); (AND with 0 forces clear).
- Toggle Bit: REG ^= (1 << 5); (XOR with 1 flips bit).
Safety: Use volatile for hardware pointers to prevent compiler optimization from removing necessary hardware reads.
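The three read-modify-write idioms can be wrapped in small helpers that take a volatile pointer. The helper names are hypothetical, and in this sketch the "register" is expected to be an ordinary variable so the code can run on a PC; on a real MCU the pointer would refer to a memory-mapped address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Set one bit: OR with 1 forces it high. */
void reg_set_bit(volatile uint32_t *reg, unsigned bit)
{
    assert(reg != NULL && bit < 32);
    *reg |= (UINT32_C(1) << bit);
}

/* Clear one bit: AND with 0 forces it low. */
void reg_clear_bit(volatile uint32_t *reg, unsigned bit)
{
    assert(reg != NULL && bit < 32);
    *reg &= ~(UINT32_C(1) << bit);
}

/* Toggle one bit: XOR with 1 flips it. */
void reg_toggle_bit(volatile uint32_t *reg, unsigned bit)
{
    assert(reg != NULL && bit < 32);
    *reg ^= (UINT32_C(1) << bit);
}
```

Because the pointer is volatile-qualified, the compiler must perform each read and write rather than caching the value in a CPU register.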
Interrupts and the NVIC
| Feature | Polling | Interrupts |
|---|---|---|
| CPU Load | 100% (Busy-wait) | ~0% (Sleeps/Works) |
| Responsiveness | Delayed | Instant (Hardware trigger) |
NVIC (Nested Vectored Interrupt Controller)
- Nested: Priority-based pre-emption.
- Vectored: Uses a lookup table for ISR addresses.
- Priority: Lower numbers equal higher priority (0 is highest).
- Tail-chaining: CPU skips unstacking/restacking between back-to-back interrupts to save ~12 cycles.
Critical: You must clear the pending flag inside the ISR (e.g., EXTI_ClearITPendingBit). If you forget, the interrupt stays pending and the ISR re-enters as soon as it returns, so the main program never runs again.
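The effect of a missing flag clear can be simulated on a PC. In this sketch `fake_pending` stands in for a hardware pending register and `dispatch` for the interrupt controller; both are hypothetical, and the entry cap exists only so the simulation terminates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated pending register; real hardware would be memory-mapped. */
volatile uint32_t fake_pending = 0;

enum { LINE0 = 1 }; /* bit mask for the simulated interrupt line */

/* Handler that acknowledges the interrupt before returning. */
void isr_clears_flag(void)
{
    fake_pending &= ~(uint32_t)LINE0;
}

/* Handler that forgets to clear the pending flag. */
void isr_forgets_flag(void)
{
    /* work happens here, but the flag is never cleared */
}

/* Re-enter the handler while the line is pending, up to a cap. */
unsigned dispatch(void (*isr)(void), unsigned max_entries)
{
    assert(isr != NULL);
    unsigned entries = 0;
    while ((fake_pending & LINE0) && entries < max_entries) {
        isr();
        entries++;
    }
    return entries;
}
```

With the line pending, `dispatch(isr_clears_flag, 100)` enters the handler once, while `dispatch(isr_forgets_flag, 100)` runs into the cap of 100 entries.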
Hardware Timers and PWM
- Prescaler (PSC): Divides the clock frequency. F_timer = F_clk / (PSC + 1).
- Auto-Reload (ARR): Defines the period. ARR = desired_count - 1.
- Capture Compare (CCR): Defines the PWM duty cycle. Duty% = CCR / (ARR + 1) × 100%.
The +1 Rule: The counter starts at zero, so a register value of N produces N + 1 counts. Always subtract 1 from the desired count when setting PSC and ARR.
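The timer formulas can be checked with integer arithmetic. The function names are hypothetical, and the 16-bit limit on PSC is an assumption that matches common STM32 timers but should be confirmed for the specific part.

```c
#include <assert.h>
#include <stdint.h>

/* F_timer = F_clk / (PSC + 1). Assumes a 16-bit prescaler register. */
uint32_t timer_freq_hz(uint32_t f_clk_hz, uint32_t psc)
{
    assert(psc <= 0xFFFFu);
    return f_clk_hz / (psc + 1u);
}

/* ARR = desired_count - 1, because the counter includes zero. */
uint32_t arr_for_count(uint32_t desired_count)
{
    assert(desired_count >= 1);
    return desired_count - 1u;
}

/* Duty% = CCR / (ARR + 1) x 100, rounded down to a whole percent. */
uint32_t duty_percent(uint32_t ccr, uint32_t arr)
{
    return (uint32_t)(((uint64_t)ccr * 100u) / ((uint64_t)arr + 1u));
}
```

With a 16 MHz clock, PSC = 15 gives a 1 MHz timer tick; ARR = 999 then yields 1000 ticks per period (1 kHz), and CCR = 250 gives a 25% duty cycle.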
Motor Control
- H-Bridge: Controls direction. Forward (Q1+Q4), Reverse (Q3+Q2), Brake (Q2+Q4), Coast (All open).
- Shoot-Through: If Q1 and Q2 are on simultaneously, it creates a short circuit. “Dead Time” delays prevent this.
- Servo (PPM): Uses pulse width (time) to set position (1.5ms = 90°).
- Stepper: Open-loop control; moves in discrete steps (e.g., 1.8°).
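The H-bridge switch patterns can be expressed as bit masks with a shoot-through check. The names are hypothetical. The notes only state that Q1 + Q2 together cause a short; treating Q3 + Q4 as the second leg is an assumption inferred from the Forward/Reverse/Brake patterns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per switch. */
enum { Q1 = 1 << 0, Q2 = 1 << 1, Q3 = 1 << 2, Q4 = 1 << 3 };

typedef enum { DRIVE_COAST, DRIVE_FORWARD, DRIVE_REVERSE, DRIVE_BRAKE } drive_mode;

/* Switch pattern for each mode, as listed in the notes. */
uint8_t hbridge_switches(drive_mode mode)
{
    switch (mode) {
    case DRIVE_FORWARD: return Q1 | Q4;
    case DRIVE_REVERSE: return Q3 | Q2;
    case DRIVE_BRAKE:   return Q2 | Q4;
    case DRIVE_COAST:   return 0; /* all switches open */
    }
    assert(0 && "unknown drive mode");
    return 0;
}

/* Shoot-through: both switches of one leg closed at once.
   Assumes Q1/Q2 form one leg and Q3/Q4 the other. */
bool is_shoot_through(uint8_t switches)
{
    return ((switches & (Q1 | Q2)) == (Q1 | Q2)) ||
           ((switches & (Q3 | Q4)) == (Q3 | Q4));
}
```

None of the four listed modes trips the check, while the pattern `Q1 | Q2` does. A driver could run such a check before applying a new pattern, in addition to inserting dead time between transitions.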
Serial Communication Protocols
| Feature | UART | I2C | SPI |
|---|---|---|---|
| Wires | 2 (TX, RX) | 2 (SDA, SCL) | 4 (MOSI, MISO, SCK, CS) |
| Clock | Asynchronous | Synchronous | Synchronous |
| Addressing | None | 7-bit Software | Hardware (Chip Select) |
| Duplex | Full | Half | Full |
UART Frame: Start Bit (Low), Data (LSB first), Optional Parity, Stop Bit (High). Note: GND must be shared between devices.
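The frame layout can be sketched for the common case of 8 data bits, no parity, and 1 stop bit (8N1). The function name and the choice of 8N1 are assumptions made for illustration; the notes list parity as optional.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { FRAME_BITS_8N1 = 10 }; /* 1 start + 8 data + 1 stop */

/* Fill `bits` with line levels in transmit order for one 8N1 frame. */
void uart_frame_8n1(uint8_t data, uint8_t bits[FRAME_BITS_8N1])
{
    assert(bits != NULL);
    bits[0] = 0;                                 /* start bit: low */
    for (size_t i = 0; i < 8; i++) {
        bits[1 + i] = (uint8_t)((data >> i) & 1u); /* data, LSB first */
    }
    bits[9] = 1;                                 /* stop bit: high */
}
```

For the byte 0x41 (binary 0100 0001) the line sequence is 0, 1, 0, 0, 0, 0, 0, 1, 0, 1: start bit, then the data bits starting from the least significant, then the stop bit.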
ADC, DMA, and Pipelining
- ADC: Converts analog voltage to digital. Value = Vin / Vref × (2^n - 1). Warning: Vin above Vref saturates the reading, and a voltage beyond the pin's absolute maximum rating can destroy the hardware.
- DMA (Direct Memory Access): Copies data (e.g., ADC to RAM) without CPU involvement, freeing the CPU for other tasks.
- Pipelining: Fetch-Decode-Execute-Writeback. Ideal throughput is 1 instruction per cycle. Branch mispredictions cause pipeline flushes.
- Cache: L1 (fastest/smallest) → L2 → RAM (slowest/largest). Sequential access improves the “Cache Hit” rate.
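The ADC formula from the list above can be sketched with integer millivolt inputs. The function name is hypothetical, and clamping at full scale is an assumption about how an ideal converter reports an input at or above Vref.

```c
#include <assert.h>
#include <stdint.h>

/* Value = Vin / Vref x (2^n - 1), with integer truncation. */
uint32_t adc_code(uint32_t vin_mv, uint32_t vref_mv, unsigned nbits)
{
    assert(vref_mv > 0 && nbits >= 1 && nbits <= 16);
    uint32_t full_scale = (UINT32_C(1) << nbits) - 1u;
    if (vin_mv >= vref_mv) {
        return full_scale; /* reading saturates at full scale */
    }
    return (uint32_t)(((uint64_t)vin_mv * full_scale) / vref_mv);
}
```

For a 12-bit converter with Vref = 3300 mV, an input of 1650 mV gives 2047 (2047.5 truncated) and an input of 3300 mV gives 4095.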
