Microprocessor Systems: 8086 and ARM Architecture Reference

Posted on May 24, 2026 in Computers

CMPS201 Microprocessor Systems Reference: Front

8086 Architecture and Layer Stack

Layer Stack (Bottom to Top):

Physics → Transistors → Logic Gates → Microarchitecture (This Course) → Instruction Set Architecture (ISA) → Operating System → Application

Microprocessor (MPU) vs. Microcontroller (MCU)

Feature	MPU	MCU
Type	CPU only	System on Chip (CPU, RAM, ROM, I/O)
External Hardware Needed?	Yes (Motherboard, RAM)	No (Just battery and crystal)
Example	Intel i7, AMD Ryzen	STM32, ATmega (the chip on many Arduino boards)
Use Case	PC or Server	Embedded (Appliances, Cars)
Analogy	Ferrari Engine	Toyota Corolla (Ready to drive)

Von Neumann vs. Harvard Architecture

Feature	Von Neumann	Harvard
Buses	1 Shared	2 Separate
Bottleneck?	Yes (1-lane bridge)	No
Fetch and Read?	Serial (Wait)	Simultaneous
8086?	Yes (Von Neumann)	No (but modern CPUs such as the i7 split their L1 cache into separate instruction and data caches, Harvard-style)

Warning: The 8086 is a Von Neumann machine: instructions and data share one bus. It is not a Harvard design.

Bus Interface Unit (BIU) vs. Execution Unit (EU)

Unit	Role	Components
BIU (Logistics Manager)	External world interface; fetches instructions; reads/writes data; calculates 20-bit physical address.	6-byte Instruction Queue, Segment Registers (CS, DS, SS, ES), Instruction Pointer (IP), Address Generator.
EU (Worker/Brain)	No external pins; gets instructions from BIU queue; executes; writes flags.	Control Circuit (Decoder), Arithmetic Logic Unit (ALU), Flag Register.

6-byte Instruction Queue: Primitive pipelining. While the EU executes the current instruction, the BIU prefetches up to 6 bytes of the instructions that follow.
Warning: Jumps (JMP, taken conditional jumps, CALL, RET) flush the queue (Branch Penalty/Pipeline Flush), and the BIU must restart fetching from the new address.

8086 Registers

General Purpose (16-bit; can split to High/Low byte)

Register	Name	Critical Rule
AX (AH|AL)	Accumulator	MUL and DIV must use AX. Math specialist.
BX (BH|BL)	Base / Pointer	Only General Purpose register usable as a memory pointer [BX]. [CX] or [DX] are illegal.
CX (CH|CL)	Counter	LOOP instructions automatically decrement CX and check for zero.
DX (DH|DL)	Data / I/O	Port address for I/O. In 16×16 MUL, the upper 16 bits go to DX and the lower to AX.
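
The DX:AX split for a 16×16 MUL can be sketched in C. This is a minimal model of the arithmetic only (it ignores flag effects); the names mul16 and dx_ax_t are hypothetical and exist just for this illustration.

```c
#include <stdint.h>

/* Hypothetical model of the register pair written by a 16-bit MUL. */
typedef struct {
    uint16_t dx; /* upper 16 bits of the product */
    uint16_t ax; /* lower 16 bits of the product */
} dx_ax_t;

/* Multiply AX by a 16-bit operand and split the 32-bit result into DX:AX. */
static dx_ax_t mul16(uint16_t ax, uint16_t operand)
{
    uint32_t product = (uint32_t)ax * (uint32_t)operand;
    dx_ax_t r;
    r.dx = (uint16_t)(product >> 16);
    r.ax = (uint16_t)(product & 0xFFFFu);
    return r;
}
```

For example, mul16(0x1234, 0x0100) yields a product of 0x00123400, so DX holds 0x0012 and AX holds 0x3400.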

Index and Pointer Registers

SI (Source Index): Used for string operations (MOVSB reads from [SI]).
DI (Destination Index): Used for string operations (MOVSB writes to [DI]).
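SP (Stack Pointer): Points to the top of the stack. PUSH, POP, CALL, and RET update it automatically; avoid changing it by hand except for deliberate stack-frame adjustments.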
BP (Base Pointer): Used to access stack locals and parameters without push/pop.
IP (Instruction Pointer): Program Counter. Always paired with CS. It cannot be loaded with MOV and changes only through jumps, calls, returns, and interrupts.

Segment Registers

CS (Code Segment): Executable instructions.
DS (Data Segment): Global variables; default for [BX].
SS (Stack Segment): Local (function) variables and the stack.
ES (Extra Segment): Destination for string copies.

Warning: Only BX, BP, SI, and DI are valid pointer registers inside brackets []. Using AX, CX, DX, or SP there is illegal and is rejected by the assembler.

Flag Register (Resides in EU)

Z (Zero): Result equals 0.
N/S (Negative/Sign): Most significant bit of the result is 1 (the result is negative when read as a signed number).
C (Carry): Unsigned overflow.
O/V (Overflow): Signed overflow.

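Flags are updated by arithmetic and logic instructions, not by data moves such as MOV, PUSH, or POP. INC and DEC leave the Carry flag unchanged.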
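
To make the four flag definitions concrete, here is a small C sketch that models a 16-bit ADD and derives Z, S, C, and O from the result. The struct and function names are hypothetical, and the model covers only these four flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the result of a 16-bit ADD and four flags. */
typedef struct {
    uint16_t result;
    bool zf; /* Z: result is zero */
    bool sf; /* S: bit 15 of the result is set */
    bool cf; /* C: unsigned overflow (carry out of bit 15) */
    bool of; /* O: signed overflow */
} add16_result_t;

static add16_result_t add16(uint16_t a, uint16_t b)
{
    uint32_t wide = (uint32_t)a + (uint32_t)b; /* keep the carry in bit 16 */
    add16_result_t r;
    r.result = (uint16_t)wide;
    r.zf = (r.result == 0);
    r.sf = (r.result & 0x8000u) != 0;
    r.cf = (wide > 0xFFFFu);
    /* Signed overflow: inputs share a sign but the result's sign differs. */
    r.of = ((~(a ^ b) & (a ^ r.result)) & 0x8000u) != 0;
    return r;
}
```

Adding 0x7FFF + 1 sets S and O but not C (signed overflow only), while 0xFFFF + 1 sets Z and C but not O (unsigned overflow only).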

Memory Addressing and Endianness

Byte-addressable: Smallest unit is 1 Byte (8 bits).
16-bit registers and offsets: At most 64 KB is directly addressable within one segment.
8086 20-bit address bus: 1 MB address space.
Intel Little Endian: Least Significant Byte (LSB) is stored at the lowest address.

Example: 0x12345678 at address 1000:

Address 1000: 78 (LSB)
Address 1001: 56
Address 1002: 34
Address 1003: 12 (MSB)

Exam Trap: “What byte is at address 1000?” The answer is 78, not 12. Big Endian would store the MSB first.
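
The byte layout above can be reproduced with a short C sketch that writes a 32-bit value into a byte buffer in little-endian order, independent of the host machine's own endianness. The helper names store_le32 and load_le32 are hypothetical.

```c
#include <stdint.h>

/* Write a 32-bit value into a byte buffer in little-endian order. */
static void store_le32(uint8_t *dst, uint32_t value)
{
    dst[0] = (uint8_t)(value & 0xFFu);         /* LSB at the lowest address */
    dst[1] = (uint8_t)((value >> 8) & 0xFFu);
    dst[2] = (uint8_t)((value >> 16) & 0xFFu);
    dst[3] = (uint8_t)((value >> 24) & 0xFFu); /* MSB at the highest address */
}

/* Reassemble the value from four little-endian bytes. */
static uint32_t load_le32(const uint8_t *src)
{
    return (uint32_t)src[0]
         | ((uint32_t)src[1] << 8)
         | ((uint32_t)src[2] << 16)
         | ((uint32_t)src[3] << 24);
}
```

Storing 0x12345678 this way leaves 0x78 in the first byte of the buffer, matching the "address 1000" row of the example.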

Segmentation and Physical Addresses

Physical Address = (Segment × 16) + Offset

Equivalent to: (Segment << 4) + Offset.
Example: DS = 0x2000, Offset = 0x0050 → 0x20000 + 0x0050 = 0x20050.
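
A minimal C sketch of the same calculation follows. The function name phys_addr is hypothetical; the final mask models the 8086's 20 address lines, so sums past 1 MB wrap around to low memory.

```c
#include <stdint.h>

/* Physical address = (segment << 4) + offset, limited to 20 bits. */
static uint32_t phys_addr(uint16_t segment, uint16_t offset)
{
    uint32_t addr = ((uint32_t)segment << 4) + offset;
    return addr & UINT32_C(0xFFFFF); /* 20 address lines: wrap at 1 MB */
}
```

Note that different segment:offset pairs can name the same byte: phys_addr(0x2000, 0x0050) and phys_addr(0x2005, 0x0000) both give 0x20050.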

Segment Register	Default Offset Register(s)	Content
CS	IP	Code (Instructions)
DS	BX, SI, DI	Global Variables
SS	SP, BP	Stack
ES	DI	String Destination

Segment Override: MOV AX, CS:[BX] uses CS instead of DS. Note: You cannot override IP; the CPU always fetches from CS.

Addressing Modes

Mode	Syntax	Speed	Notes
Immediate	MOV AX, 5	Fast	Data is in the instruction; no memory fetch.
Register	MOV AX, BX	Fastest	In-CPU; zero memory access.
Direct	MOV AX, [1000H]	Slower	Hardcoded address (DS:1000H). Used for global variables.
Register Indirect	MOV AX, [BX]	Slower	Pointer. BX holds the address. Only BX, BP, SI, DI allowed.
Based + Indexed	MOV AL, [BX+SI]	Slower	Arrays. Base (BX/BP) + Index (SI/DI). The 8086 spends extra clock cycles (about 7 to 8) computing the effective address.

Warning: MOV AX, BX copies the value of BX. MOV AX, [BX] goes to the address stored in BX (pointer dereference).
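
The difference between copying a register and dereferencing it can be sketched in C with a small simulated memory. Everything here is a hypothetical model: the 1 KB mem array stands in for the data segment, and the function names simply mirror the instructions they imitate.

```c
#include <stdint.h>

/* Hypothetical 1 KB stand-in for the data segment. */
static uint8_t mem[1024];

/* Read a little-endian 16-bit word at an offset (wraps within mem). */
static uint16_t read_word(uint16_t offset)
{
    uint16_t lo = mem[offset % sizeof mem];
    uint16_t hi = mem[(offset + 1u) % sizeof mem];
    return (uint16_t)(lo | (hi << 8));
}

/* MOV AX, BX : copy the register value itself. */
static uint16_t mov_reg(uint16_t bx)
{
    return bx;
}

/* MOV AX, [BX] : treat BX as an offset and fetch the word stored there. */
static uint16_t mov_reg_indirect(uint16_t bx)
{
    return read_word(bx);
}

/* MOV AL, [BX+SI] : base plus index, fetch one byte. */
static uint8_t mov_based_indexed(uint16_t bx, uint16_t si)
{
    return mem[(uint16_t)(bx + si) % sizeof mem];
}
```

With the bytes 0x34, 0x12 stored at offsets 0x10 and 0x11, mov_reg(0x10) returns 0x0010 while mov_reg_indirect(0x10) returns 0x1234.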

Instruction Cycles and CISC vs. RISC

Fetch-Decode-Execute Cycle

FETCH: PC address → Address Bus → Memory → Instruction → Instruction Register (IR). PC auto-increments.
DECODE: Decoder reads opcode bits and activates hardware paths (ALU/MOV).
EXECUTE: ALU performs math/logic. Write-back to destination register. Flags update.

CISC (x86) vs. RISC (ARM)

Feature	CISC (8086)	RISC (ARM)
Philosophy	Hardware does complex work	Software breaks tasks into simple steps
Instruction Size	Variable length (1 to 6 bytes on the 8086)	Fixed length (32-bit in ARM state; Thumb-2 on Cortex-M mixes 16- and 32-bit encodings)
CPI	> 1 (Many cycles per instruction)	~1 (Goal)
Memory Access	ALU can touch RAM directly	Load/Store only (ALU cannot touch RAM)
Power	High (Desktop)	Low (Mobile/Embedded)

Warning: Modern CISC processors (Intel) decode x86 instructions into internal micro-ops, so the core behaves like a RISC machine internally.

ARM Cortex-M Registers

Register	Name	Purpose
R0–R3	General Purpose	Arguments and return values. Caller-saved.
R4–R11	General Purpose	Local variables. Callee-saved (must preserve).
R12	IP (Scratch)	Intra-procedure scratch. Auto-saved on interrupt.
R13 (SP)	Stack Pointer	Full Descending stack.
R14 (LR)	Link Register	Stores return address on BL call. Fast (no RAM needed).
R15 (PC)	Program Counter	Writing to R15 causes a Jump.

xPSR Flags: N (Negative), Z (Zero), C (Carry), V (Overflow). Note: Parity (P) is not an ARM flag.

ARM Assembly (UAL) and Control Flow

Format: OPCODE Destination, Source1, Source2

LDR R0, [R1]: Load from RAM address in R1 to R0.
STR R0, [R1]: Store R0 to RAM address in R1.
BIC R0, R1, #0x20: Bit Clear (R1 AND NOT mask).
BL function: Branch with Link (saves the address of the next instruction, the return address, in LR).
BX LR: Return from function (PC = LR).

Barrel Shifter: ADD R0, R1, R2, LSL #2 → R0 = R1 + (R2 × 4) in one cycle.
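
As a rough C equivalent of two of the instructions above, the sketch below mirrors the barrel-shifter ADD and the BIC example. The function names are hypothetical, and the C versions only model the computed values, not cycle counts or flags.

```c
#include <stdint.h>

/* ADD R0, R1, R2, LSL #2  ->  R0 = R1 + (R2 << 2) */
static uint32_t add_lsl2(uint32_t r1, uint32_t r2)
{
    return r1 + (r2 << 2);
}

/* BIC R0, R1, #0x20  ->  R0 = R1 AND NOT 0x20 */
static uint32_t bic_0x20(uint32_t r1)
{
    return r1 & ~UINT32_C(0x20);
}
```

The shifted add is the usual pattern for indexing an array of 32-bit words (base + index × 4): add_lsl2(0x1000, 3) gives 0x100C. Clearing bit 5 with bic_0x20 turns ASCII 'a' (0x61) into 'A' (0x41).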

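Warning: MOV R0, #0x12345678 fails: a 32-bit instruction must also hold the opcode and register fields, so it cannot carry a full 32-bit immediate. Use LDR R0, =0x12345678, which lets the assembler load the constant from a literal pool.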

Performance and Power (Iron Law)

Time = Instruction Count × CPI × Clock Period
Dynamic Power (P) = C × V² × f
Voltage (V) is the most impactful factor because it is squared.
Race-to-Sleep: Run the CPU fast to finish tasks, then sleep immediately to save energy.
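
A worked sketch of both formulas in C. The function names and the sample numbers used in the note below are illustrative assumptions, not measurements; the clock period is passed as a frequency, so the time formula divides by it.

```c
/* Execution time in seconds: instruction count x CPI x clock period. */
static double exec_time_s(double instr_count, double cpi, double clock_hz)
{
    return instr_count * cpi / clock_hz; /* period = 1 / clock_hz */
}

/* Dynamic power in watts: P = C x V^2 x f. */
static double dynamic_power_w(double cap_farads, double volts, double freq_hz)
{
    return cap_farads * volts * volts * freq_hz;
}
```

With 1,000,000 instructions at a CPI of 2 on a 100 MHz clock, exec_time_s gives about 0.02 s. Halving the voltage at the same capacitance and frequency cuts dynamic power to roughly a quarter, which is why V dominates.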

CMPS201 Microprocessor Systems Reference: Back

GPIO and Memory-Mapped I/O

Peripherals are mapped to specific memory addresses. Writing to these addresses triggers hardware actions.

Register Address = Peripheral Base Address + Register Offset

GPIO Registers (STM32 Example)

Offset	Register	Function	Key Values
0x00	MODER	Pin Direction	00=Input, 01=Output, 10=Alternate, 11=Analog
0x10	IDR	Input Data	Read-only; current pin logic level
0x14	ODR	Output Data	Read/Write; 1=High, 0=Low
0x18	BSRR	Atomic Set/Reset	Write-only; Bits 0-15=Set; Bits 16-31=Reset

Warning: Always enable the RCC Clock first. Without a clock, register writes are silently ignored.
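
The address formula and the table's bit layouts can be sketched as host-runnable C helpers. The base address 0x40020000 is an assumption for illustration (check the reference manual for your part), and all function names are hypothetical. The helpers only compute addresses and register values; they do not touch hardware.

```c
#include <stdint.h>

/* Assumed base address, for illustration only. */
#define GPIOA_BASE_ASSUMED UINT32_C(0x40020000)
#define MODER_OFFSET       UINT32_C(0x00)
#define ODR_OFFSET         UINT32_C(0x14)
#define BSRR_OFFSET        UINT32_C(0x18)

/* Register Address = Peripheral Base Address + Register Offset */
static uint32_t reg_addr(uint32_t base, uint32_t offset)
{
    return base + offset;
}

/* New MODER value with `pin` (0..15) set to Output (01), two bits per pin. */
static uint32_t moder_as_output(uint32_t moder, unsigned pin)
{
    moder &= ~(UINT32_C(3) << (pin * 2u)); /* clear the pin's 2-bit field */
    moder |= (UINT32_C(1) << (pin * 2u));  /* write 01 = Output */
    return moder;
}

/* BSRR word that drives `pin` high (bits 0-15 set pins). */
static uint32_t bsrr_set(unsigned pin)
{
    return UINT32_C(1) << pin;
}

/* BSRR word that drives `pin` low (bits 16-31 reset pins). */
static uint32_t bsrr_reset(unsigned pin)
{
    return UINT32_C(1) << (pin + 16u);
}
```

On real hardware the computed address would be cast to a volatile uint32_t pointer before writing, and only after the peripheral clock has been enabled.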

Bit Manipulation (Read-Modify-Write)

Set Bit: REG |= (1 << 5); (OR with 1 forces set).
Clear Bit: REG &= ~(1 << 5); (AND with 0 forces clear).
Toggle Bit: REG ^= (1 << 5); (XOR with 1 flips bit).

Safety: Declare hardware register pointers volatile so the compiler cannot cache or remove reads and writes that the hardware depends on.
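
The three read-modify-write idioms can be wrapped in small helpers that take a volatile pointer. This is a sketch with hypothetical function names; on a host machine an ordinary variable can stand in for the hardware register.

```c
#include <stdint.h>

/* Set one bit: OR with 1 forces the bit high. */
static void reg_set_bit(volatile uint32_t *reg, unsigned bit)
{
    *reg |= (UINT32_C(1) << bit);
}

/* Clear one bit: AND with 0 forces the bit low. */
static void reg_clear_bit(volatile uint32_t *reg, unsigned bit)
{
    *reg &= ~(UINT32_C(1) << bit);
}

/* Toggle one bit: XOR with 1 flips the bit. */
static void reg_toggle_bit(volatile uint32_t *reg, unsigned bit)
{
    *reg ^= (UINT32_C(1) << bit);
}

/* Read one bit; returns 1 if set, 0 if clear. */
static int reg_test_bit(const volatile uint32_t *reg, unsigned bit)
{
    return (int)((*reg >> bit) & 1u);
}
```

Each helper reads the register, changes one bit, and writes the whole word back, so neighbouring bits are preserved.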

Interrupts and the NVIC

Feature	Polling	Interrupts
CPU Load	100% (Busy-wait)	~0% (Sleeps/Works)
Responsiveness	Delayed (depends on loop time)	Low latency (hardware trigger)

NVIC (Nested Vectored Interrupt Controller)

Nested: Priority-based pre-emption.
Vectored: Uses a lookup table for ISR addresses.
Priority: Lower numbers equal higher priority (0 is highest).
Tail-chaining: CPU skips the unstack/restack pair between back-to-back interrupts, so switching handlers takes only a few cycles (about 6 on Cortex-M3/M4) instead of a full exit plus entry.

Critical: You must clear the pending flag inside the ISR (e.g., EXTI_ClearITPendingBit). If you forget, the interrupt stays pending, the ISR re-enters as soon as it returns, and the main code never runs again.
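
The shape of an ISR that clears its flag can be sketched on a host machine with a simulated pending register. Everything here is a hypothetical stand-in: fake_pending is an ordinary variable rather than a real peripheral register, and on real parts the clearing mechanism differs (many STM32 flags are cleared by writing 1), so use the vendor's helper there.

```c
#include <stdint.h>

#define LINE0_FLAG UINT32_C(1)

/* Hypothetical stand-ins for a peripheral pending register and app state. */
static volatile uint32_t fake_pending;
static volatile uint32_t button_events;

/* Sketch of an ISR body: handle the event, then clear the pending flag. */
static void fake_exti0_isr(void)
{
    if (fake_pending & LINE0_FLAG) {
        button_events++;             /* keep the work in the ISR short */
        fake_pending &= ~LINE0_FLAG; /* without this the ISR would re-enter */
    }
}
```

Calling the handler a second time with no new event does nothing, which is the behaviour the flag clear is meant to guarantee.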

Hardware Timers and PWM

Prescaler (PSC): Divides the clock frequency. F_timer = F_clk / (PSC + 1).
Auto-Reload (ARR): Defines the period. ARR = desired_count - 1.
Capture Compare (CCR): Defines the PWM duty cycle. Duty% = CCR / (ARR + 1) × 100%.

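The +1 Rule: The counter runs from 0 up to and including the register value, so the hardware effectively adds 1 to PSC and ARR. Always subtract 1 from the desired divider or count when setting them.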
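
The three timer formulas can be checked with a short integer-only C sketch. The function names are hypothetical, and the 84 MHz input clock used in the note below is an assumed example value.

```c
#include <stdint.h>

/* F_timer = F_clk / (PSC + 1) */
static uint32_t timer_freq_hz(uint32_t f_clk_hz, uint32_t psc)
{
    return f_clk_hz / (psc + 1u);
}

/* ARR = desired_count - 1, where desired_count = F_timer / F_pwm */
static uint32_t arr_for_pwm(uint32_t f_timer_hz, uint32_t f_pwm_hz)
{
    return f_timer_hz / f_pwm_hz - 1u;
}

/* Duty% = CCR / (ARR + 1) x 100, in whole percent */
static uint32_t duty_percent(uint32_t ccr, uint32_t arr)
{
    return (ccr * 100u) / (arr + 1u);
}
```

Assuming an 84 MHz clock, PSC = 83 gives a 1 MHz timer; a 1 kHz PWM then needs ARR = 999, and CCR = 250 yields a 25% duty cycle.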

Motor Control

H-Bridge: Controls direction. Forward (Q1+Q4), Reverse (Q3+Q2), Brake (Q2+Q4), Coast (All open).
Shoot-Through: If Q1 and Q2 are on simultaneously, it creates a short circuit. “Dead Time” delays prevent this.
Servo (pulse-width control, often labeled PWM or PPM): The width of a repeated pulse sets the position (1.5 ms = 90°, the center).
Stepper: Open-loop control; moves in discrete steps (e.g., 1.8°).
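
The H-bridge states can be encoded as bitmasks so that firmware can reject a shoot-through combination before driving the gates. This is a sketch with hypothetical names. It assumes Q1/Q2 form one leg and Q3/Q4 the other, which is inferred from the Forward and Reverse pairs listed above; treating Q3+Q4 as shoot-through follows from that assumption.

```c
#include <stdbool.h>

/* One bit per switch. */
enum { Q1 = 1, Q2 = 2, Q3 = 4, Q4 = 8 };

#define HB_FORWARD (Q1 | Q4)
#define HB_REVERSE (Q3 | Q2)
#define HB_BRAKE   (Q2 | Q4)
#define HB_COAST   0

/* True if both switches of the same leg are on (a short across the supply). */
static bool is_shoot_through(unsigned switches)
{
    bool first_leg  = (switches & Q1) && (switches & Q2);
    bool second_leg = (switches & Q3) && (switches & Q4);
    return first_leg || second_leg;
}
```

A check like this guards against bad static states only; dead time is still needed to cover the moment when one switch is turning off and its partner is turning on.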

Serial Communication Protocols

Feature	UART	I2C	SPI
Wires	2 (TX, RX)	2 (SDA, SCL)	4 (MOSI, MISO, SCK, CS)
Clock	Asynchronous	Synchronous	Synchronous
Addressing	None	7-bit Software	Hardware (Chip Select)
Duplex	Full	Half	Full

UART Frame: Start Bit (Low), Data (LSB first), Optional Parity, Stop Bit (High). Note: GND must be shared between devices.
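
The frame layout can be sketched as a C function that expands one byte into its line levels. This assumes the common 8N1 format (8 data bits, no parity, 1 stop bit); the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill bits[] with an 8N1 frame for one byte; returns the bit count (10). */
static size_t uart_frame_8n1(uint8_t data, uint8_t bits[10])
{
    size_t n = 0;
    bits[n++] = 0;                               /* start bit: line goes low */
    for (unsigned i = 0; i < 8; i++)
        bits[n++] = (uint8_t)((data >> i) & 1u); /* data bits, LSB first */
    bits[n++] = 1;                               /* stop bit: line returns high */
    return n;
}
```

For 0x55 (binary 01010101) the line sequence is 0, 1, 0, 1, 0, 1, 0, 1, 0, 1: the start bit, then the data bits beginning with bit 0, then the stop bit.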

ADC, DMA, and Pipelining

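ADC: Converts analog voltage to digital. Value = Vin / Vref × (2^n - 1). Warning: Vin above Vref cannot be measured (the reading tops out at full scale), and Vin beyond the pin's absolute maximum rating can destroy the hardware.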
DMA (Direct Memory Access): Copies data (e.g., ADC to RAM) without CPU involvement, freeing the CPU for other tasks.
Pipelining: Fetch-Decode-Execute-Writeback. Ideal throughput is 1 instruction per cycle. Branch mispredictions cause pipeline flushes.
Cache: L1 (fastest/smallest) → L2 → RAM (slowest/largest). Sequential access improves the “Cache Hit” rate because neighbouring data is loaded together (spatial locality).
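
The ADC formula above can be worked through with a short integer C sketch. The function name is hypothetical, voltages are passed in millivolts to avoid floating point, and clamping at full scale is a modelling assumption about how the converter behaves at or above Vref.

```c
#include <stdint.h>

/* Value = Vin / Vref x (2^n - 1), truncated to an integer code.
   Assumes bits < 32 and vref_mv > 0. */
static uint32_t adc_code(uint32_t vin_mv, uint32_t vref_mv, unsigned bits)
{
    uint32_t full_scale = (UINT32_C(1) << bits) - 1u;
    if (vin_mv >= vref_mv)
        return full_scale; /* modelled saturation at full scale */
    return (uint32_t)(((uint64_t)vin_mv * full_scale) / vref_mv);
}
```

For a 12-bit converter with Vref = 3300 mV, an input of 1650 mV gives 1650 × 4095 / 3300 = 2047.5, which truncates to a code of 2047.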