ARM Processor Architecture & Embedded Systems
Current Program Status Register (CPSR)
The Current Program Status Register (CPSR) is a 32-bit special-purpose register in ARM processors. It plays a central role in controlling the state and execution flow of the processor.
CPSR Structure
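The CPSR is divided into four 8-bit fields: Flags (bits 31–24, holding the condition flags), Status (bits 23–16), Extension (bits 15–8), and Control (bits 7–0, holding the mode, state, and interrupt-mask bits). In classic ARM designs the Status and Extension fields are reserved, so only the Flags and Control fields are actively used.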
Key CPSR Bits
- N (Bit 31) – Negative flag
- Z (Bit 30) – Zero flag
- C (Bit 29) – Carry flag (unsigned)
- V (Bit 28) – Overflow flag (signed)
- Q (Bit 27) – DSP saturation flag
- J (Bit 24) – Jazelle (Java) state
- I (Bit 7) – IRQ disable (1 = IRQs masked)
- F (Bit 6) – FIQ disable (1 = FIQs masked)
- T (Bit 5) – Thumb state (1 = Thumb)
- Mode (Bits 4–0) – Processor mode (e.g., User, IRQ, SVC)
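To make the bit layout concrete, here is a minimal C sketch that splits a raw CPSR value into named fields. The masks use the architectural bit positions (J at bit 24); the type and function names (`cpsr_fields`, `cpsr_decode`) are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit masks for the classic CPSR layout (J is bit 24 architecturally). */
#define CPSR_N         (1u << 31) /* Negative */
#define CPSR_Z         (1u << 30) /* Zero */
#define CPSR_C         (1u << 29) /* Carry (unsigned) */
#define CPSR_V         (1u << 28) /* Overflow (signed) */
#define CPSR_Q         (1u << 27) /* DSP saturation */
#define CPSR_J         (1u << 24) /* Jazelle state */
#define CPSR_I         (1u << 7)  /* 1 = IRQs masked */
#define CPSR_F         (1u << 6)  /* 1 = FIQs masked */
#define CPSR_T         (1u << 5)  /* 1 = Thumb state */
#define CPSR_MODE_MASK 0x1Fu      /* Mode field, bits 4-0 */

typedef struct {
    bool n, z, c, v, q, j;
    bool irq_masked, fiq_masked, thumb;
    uint32_t mode;
} cpsr_fields;

/* Split a raw 32-bit CPSR value into its named fields. */
static cpsr_fields cpsr_decode(uint32_t cpsr)
{
    cpsr_fields f;
    f.n = (cpsr & CPSR_N) != 0;
    f.z = (cpsr & CPSR_Z) != 0;
    f.c = (cpsr & CPSR_C) != 0;
    f.v = (cpsr & CPSR_V) != 0;
    f.q = (cpsr & CPSR_Q) != 0;
    f.j = (cpsr & CPSR_J) != 0;
    f.irq_masked = (cpsr & CPSR_I) != 0;
    f.fiq_masked = (cpsr & CPSR_F) != 0;
    f.thumb = (cpsr & CPSR_T) != 0;
    f.mode = cpsr & CPSR_MODE_MASK;
    return f;
}
```

For example, 0x600000D3 decodes to Z and C set, IRQs and FIQs masked, ARM (not Thumb) state, and mode 0x13 (Supervisor).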
Processor Modes Controlled by CPSR
ARM processor modes include:
- User (USR): For normal application programs.
- FIQ (Fast Interrupt Request): For handling high-priority, fast interrupts.
- IRQ (Interrupt Request): For standard interrupt handling.
- Supervisor (SVC): For system tasks and operating system kernel.
- Abort (ABT): For memory access faults.
- Undefined (UND): For undefined instructions.
- System (SYS): For privileged system-level operations.
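Each mode corresponds to a 5-bit value in the CPSR mode field. The sketch below maps those encodings to the mode names listed above and checks whether a mode is privileged. The encodings are the standard ones for the seven classic modes; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mode field encodings (CPSR bits 4-0) for the seven classic modes. */
enum arm_mode {
    MODE_USR = 0x10,
    MODE_FIQ = 0x11,
    MODE_IRQ = 0x12,
    MODE_SVC = 0x13,
    MODE_ABT = 0x17,
    MODE_UND = 0x1B,
    MODE_SYS = 0x1F
};

/* Return a short name for the mode held in the low five bits of cpsr. */
static const char *arm_mode_name(uint32_t cpsr)
{
    switch (cpsr & 0x1Fu) {
    case MODE_USR: return "USR";
    case MODE_FIQ: return "FIQ";
    case MODE_IRQ: return "IRQ";
    case MODE_SVC: return "SVC";
    case MODE_ABT: return "ABT";
    case MODE_UND: return "UND";
    case MODE_SYS: return "SYS";
    default:       return "reserved";
    }
}

/* Every defined mode except User is privileged
   (reserved encodings are not rejected by this sketch). */
static bool arm_mode_is_privileged(uint32_t cpsr)
{
    return (cpsr & 0x1Fu) != MODE_USR;
}
```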
CPSR Applications and Functions
The CPSR is crucial for:
- Tracking processor state.
- Managing interrupts.
- Defining the current execution mode.
- Holding condition flags for conditional execution.
- Determining whether conditional branches are taken.
Important CPSR Considerations
- The CPSR is readable in all processor modes, but its control field can be written only in privileged modes; User-mode code can change only the condition flags.
- When an exception such as an interrupt occurs, the CPSR is copied into the Saved Program Status Register (SPSR) of the mode being entered, so that it can be restored when the handler returns.
- The flags within the CPSR support conditional execution. They are updated automatically by comparison instructions (such as CMP) and by data-processing instructions that carry the S suffix, such as SUBS and ADDS.
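The following C sketch shows how the N, Z, C, and V flags are derived from the result of ADDS and SUBS. It models the arithmetic only; the names `nzcv`, `adds`, and `subs` are illustrative. Note that for subtraction ARM sets C when no borrow occurs.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } nzcv;

/* ADDS Rd, Rn, Rm: Rd = Rn + Rm, flags updated. */
static uint32_t adds(uint32_t a, uint32_t b, nzcv *f)
{
    uint32_t r = a + b;                        /* wraps modulo 2^32 */
    f->n = (r >> 31) & 1u;                     /* sign bit of result */
    f->z = (r == 0);
    f->c = (r < a);                            /* unsigned carry out */
    f->v = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;  /* same-sign operands, result sign differs */
    return r;
}

/* SUBS Rd, Rn, Rm: Rd = Rn - Rm, flags updated.
   C = 1 means no borrow (Rn >= Rm as unsigned values). */
static uint32_t subs(uint32_t a, uint32_t b, nzcv *f)
{
    uint32_t r = a - b;
    f->n = (r >> 31) & 1u;
    f->z = (r == 0);
    f->c = (a >= b);
    f->v = (((a ^ b) & (a ^ r)) >> 31) & 1u;   /* operands differ in sign, result sign differs from Rn */
    return r;
}
```

For instance, adding 1 to 0x7FFFFFFF sets N and V (signed overflow) but not C, while adding 1 to 0xFFFFFFFF sets Z and C but not V.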
ARM Processor Pipelining
Pipelining is a technique used in RISC processors, such as ARM, to accelerate instruction execution by overlapping different stages of instruction processing. Instead of executing one instruction completely before starting the next, pipelining divides instruction execution into multiple concurrent stages.
Basic ARM Pipeline Stages
- Fetch: Retrieves the instruction from memory.
- Decode: Decodes the instruction and prepares the control signals needed to execute it.
- Execute: Reads the operands from the register file, performs the operation specified by the instruction, and writes the result back.
Pipelining’s Impact on Program Execution
- Without Pipelining: Executing 3 instructions, each with 3 stages, would typically take 9 cycles (3 instructions × 3 stages).
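- With Pipelining: The same 3 instructions complete in 5 cycles under ideal conditions: 3 cycles for the first instruction to pass through all stages, plus 1 cycle for each of the 2 remaining instructions.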
- Result: Significantly faster execution, as one instruction can complete per cycle once the pipeline is initially filled.
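The cycle counts above follow from a simple model: for n instructions on a k-stage pipeline where every stage takes one cycle and nothing stalls, the non-pipelined time is n × k cycles and the pipelined time is k + (n − 1) cycles. A small C sketch of that idealized model (function names are illustrative):

```c
/* Idealized timing model: one cycle per stage, no stalls or flushes. */
static unsigned cycles_unpipelined(unsigned n_instr, unsigned n_stages)
{
    return n_instr * n_stages;   /* each instruction runs start to finish alone */
}

static unsigned cycles_pipelined(unsigned n_instr, unsigned n_stages)
{
    if (n_instr == 0)
        return 0;
    /* The first instruction takes n_stages cycles to fill the pipeline;
       each later instruction then completes one cycle after the previous one. */
    return n_stages + (n_instr - 1);
}
```

With 3 instructions and 3 stages this gives 9 versus 5 cycles; with 100 instructions it gives 300 versus 102, showing throughput approaching one instruction per cycle.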
Benefits of Instruction Pipelining
- Increases instruction throughput.
- Improves CPU efficiency.
- Keeps different processor units active simultaneously.
- Enables instruction-level parallelism.
Pipelining Limitations and Challenges
- Pipeline Latency: The first result is available only after all stages of the first instruction are completed.
- Branch Instructions: Branch instructions or jumps can cause a pipeline flush, requiring the pipeline to be refilled, which incurs a performance penalty.
- Data Dependencies: Dependencies between instructions (where one instruction needs the result of a previous one) can require special handling, such as stalling the pipeline.
RISC Architecture: Design & Comparison with CISC
RISC Design Philosophy
RISC Instruction Set
RISC uses fewer, simpler, fixed-length instructions. Complex tasks are achieved by combining multiple simple instructions.
RISC Pipelining
Pipelining overlaps the stages of successive instructions, significantly increasing throughput. RISC instructions are well suited to pipelining because of their simple, fixed format.
RISC Register Set
RISC architectures feature a large number of general-purpose registers for data or addresses, which reduces the need for frequent memory access and improves processing speed.
RISC Load-Store Architecture
In a load-store architecture, only dedicated load and store instructions access memory. All other processing operations are performed on data held within registers, making operations simpler and more efficient.
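As an illustration of the load-store rule, the toy C model below lets only `ldr` and `str` touch memory, while `add` works purely on registers. Computing mem[2] = mem[0] + mem[1] therefore takes four steps. The `toy_cpu` type and its word-indexed memory are assumptions made for the sketch, not a model of any real ARM core.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy machine: 16 registers and a small word-indexed memory. */
typedef struct {
    uint32_t r[16];
    uint32_t mem[64];
} toy_cpu;

/* Only these two operations access memory (the load-store rule). */
static void ldr(toy_cpu *cpu, int rd, size_t word) { cpu->r[rd] = cpu->mem[word]; }
static void str(toy_cpu *cpu, int rs, size_t word) { cpu->mem[word] = cpu->r[rs]; }

/* Data processing operates on registers only. */
static void add(toy_cpu *cpu, int rd, int rn, int rm) { cpu->r[rd] = cpu->r[rn] + cpu->r[rm]; }

/* mem[2] = mem[0] + mem[1] as load, load, add, store. */
static void add_memory_words(toy_cpu *cpu)
{
    ldr(cpu, 0, 0);     /* LDR R0, [word 0] */
    ldr(cpu, 1, 1);     /* LDR R1, [word 1] */
    add(cpu, 2, 0, 1);  /* ADD R2, R0, R1   */
    str(cpu, 2, 2);     /* STR R2, [word 2] */
}
```

A CISC machine might express the same operation as a single memory-to-memory instruction; the load-store approach trades instruction count for simpler, more uniform instructions.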
Compiler Role in RISC
RISC shifts much of the complexity and optimization responsibility to the compiler. The compiler handles instruction optimization, efficient register use, and memory access, which keeps the hardware simple and fast and makes RISC well suited to embedded systems.
CISC Design Philosophy
CISC Instruction Set
CISC uses many complex, variable-length instructions. Each instruction can perform multiple tasks and often requires multiple clock cycles to complete.
CISC Pipelining
Pipelining is more challenging in CISC due to instruction complexity and variable lengths. CISC often relies on microcode, which can slow down execution and make pipelines less efficient compared to RISC.
CISC Register Set
CISC typically has fewer general-purpose registers. Many operations work directly on memory, and some registers may have fixed, specialized roles.
CISC Memory Access
The load-store architecture is not strictly followed in CISC. Most instructions can access memory directly, which increases complexity and memory traffic.
Compiler Role in CISC
The compiler in CISC systems is generally simpler, as the hardware itself handles much of the instruction optimization and complex operations. This approach increases hardware complexity and cost.
ARM Processor Data Flow Model
The ARM processor data flow model illustrates how data moves through the ARM core during instruction execution. It provides insights into how the processor loads, processes, and stores data using its internal components. Data items and instructions typically share the same data bus to enter the processor core.
Instruction Decoder
The instruction decoder translates instructions before they are executed. It distinguishes between different instruction types, such as load and store instructions.
Sign Extension Unit
The sign extension unit extends signed 8-bit and 16-bit numbers to 32-bit values, ensuring correct arithmetic operations.
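Sign extension copies the top bit of the narrow value into all higher bits, which is what ARM's signed loads (LDRSB for bytes, LDRSH for halfwords) do with the loaded data. A portable C sketch of the operation (function names are illustrative):

```c
#include <stdint.h>

/* Sign-extend the low 8 bits of x to a 32-bit signed value. */
static int32_t sign_extend8(uint32_t x)
{
    x &= 0xFFu;
    /* Flipping bit 7 and subtracting 0x80 maps 0x00..0xFF to 0..255 and then to -128..127. */
    return (int32_t)(x ^ 0x80u) - 0x80;
}

/* Sign-extend the low 16 bits of x to a 32-bit signed value. */
static int32_t sign_extend16(uint32_t x)
{
    x &= 0xFFFFu;
    return (int32_t)(x ^ 0x8000u) - 0x8000;
}
```

Without sign extension, the byte 0xFF would be read as 255 instead of -1, and any signed arithmetic on it would be wrong.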
Register Files
The ARM processor has 16 registers (R0 to R15) visible at any one time. R0 to R12 are general purpose, while the remaining three have dedicated roles:
- R13: Stack Pointer (SP)
- R14: Link Register (LR)
- R15: Program Counter (PC)
In addition to this register file, the core has two Program Status Registers: the Current Program Status Register (CPSR) and the Saved Program Status Register (SPSR). Most data-processing instructions use two source registers (Rn and Rm) and a single destination register (Rd).
ALU and MAC Unit
The Arithmetic Logic Unit (ALU) and Multiply-Accumulate (MAC) unit receive the source register values Rn and Rm from the A and B buses; Rm can first pass through the barrel shifter. Based on the instruction's opcode, they perform the specified operation and write the result back to the destination register Rd.
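The MAC unit's characteristic operation is multiply-accumulate, as in the ARM instruction MLA Rd, Rm, Rs, Rn, which computes Rd = Rm × Rs + Rn and keeps the low 32 bits. The C sketch below models that operation and uses it for a dot product, a typical DSP workload; the function names are illustrative.

```c
#include <stdint.h>

/* MLA Rd, Rm, Rs, Rn: Rd = Rm * Rs + Rn, keeping the low 32 bits. */
static uint32_t arm_mla(uint32_t rm, uint32_t rs, uint32_t rn)
{
    return (uint32_t)((uint64_t)rm * rs + rn);
}

/* Dot product: each loop iteration is one multiply-accumulate step. */
static uint32_t dot(const uint32_t *a, const uint32_t *b, int n)
{
    uint32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = arm_mla(a[i], b[i], acc);
    return acc;
}
```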
Address Incrementer
The address incrementer updates the address register before the core reads from or writes to memory, facilitating sequential memory access.
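A load-multiple instruction such as LDMIA (load multiple, increment after) is where the address incrementer matters most: one instruction transfers several registers from consecutive words. The C sketch below models that behavior, assuming a word-aligned base address and a memory array that starts at address 0; the function name is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Load 'count' registers from consecutive words starting at byte address 'base',
   in the style of LDMIA. Returns the final address, which is the value a
   write-back form (LDMIA Rn!, {...}) would leave in Rn. */
static uint32_t ldmia(uint32_t regs[], size_t count, const uint32_t mem[], uint32_t base)
{
    uint32_t addr = base;
    for (size_t i = 0; i < count; i++) {
        regs[i] = mem[addr / 4];  /* word access at the current address */
        addr += 4;                /* incrementer: move to the next sequential word */
    }
    return addr;
}
```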
ARM Processor Design Philosophy
The ARM design philosophy centers on creating efficient, low-power, and compact processors, making them ideal for embedded systems and mobile devices.
Low Power Consumption
ARM processors are engineered for very low power consumption, making them highly suitable for battery-powered devices such as smartphones and wearables.
High Code Density
ARM achieves high code density through features like the Thumb instruction set, which enables the use of 16-bit instructions, thereby significantly reducing memory usage.
Cost-Effective Design
ARM designs are optimized to work with slower, less expensive memory, which lowers the overall system cost and makes them well suited to cost-sensitive applications.
Minimized Die Area
ARM cores are small in physical size, which helps minimize the silicon die area. This reduces manufacturing costs and facilitates integration with other components on a single chip.
Integrated Hardware Debugging
ARM processors incorporate built-in debug hardware. This feature assists developers in efficiently tracing and resolving software issues while the processor is actively running.
ARM-Based Embedded System Hardware Architecture
An embedded system built around an ARM core comprises both hardware and software components that collaborate to perform dedicated tasks.
Main Hardware Components
ARM Processor Core
The ARM processor core controls the entire embedded system. It executes instructions, processes data, and interfaces with memory and various peripherals. Different versions of ARM cores are available, tailored to specific performance and power requirements.
Controllers
Controllers manage specific system functions. Common examples include the interrupt controller and memory controller. They help coordinate communication between the processor and peripherals or memory.
Peripherals
Peripherals enable the system to interact with the external world. Examples include timers, Universal Asynchronous Receiver-Transmitters (UARTs), General Purpose Input/Outputs (GPIOs), and Analog-to-Digital Converters (ADCs). Peripherals can be internal or external to the main chip and are often memory-mapped, meaning they are controlled through specific memory addresses.
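Memory-mapped control usually looks like the C sketch below: a struct describes the peripheral's register layout, and `volatile` stops the compiler from caching or removing register accesses. The GPIO register layout here is hypothetical, and a static variable stands in for the hardware so the sketch runs on a host; on a real chip, `GPIO` would be a pointer cast from the base address given in the datasheet.

```c
#include <stdint.h>

/* Hypothetical GPIO register block layout. */
typedef struct {
    volatile uint32_t DATA;  /* output level of each pin */
    volatile uint32_t DIR;   /* 1 = pin configured as output */
} gpio_regs;

/* Host stand-in for the peripheral; real code would cast the datasheet base address. */
static gpio_regs fake_gpio;
#define GPIO (&fake_gpio)

static void gpio_set_output(unsigned pin)
{
    GPIO->DIR |= (1u << pin);
}

static void gpio_write(unsigned pin, int level)
{
    if (level)
        GPIO->DATA |= (1u << pin);   /* drive pin high */
    else
        GPIO->DATA &= ~(1u << pin);  /* drive pin low */
}
```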
Bus System
The bus system transfers data between the processor, memory, and peripherals. ARM typically utilizes the Advanced Microcontroller Bus Architecture (AMBA) standard, which includes buses like the Advanced High-performance Bus (AHB) and the Advanced Peripheral Bus (APB).
Microprocessor vs. Microcontroller Comparison
Microprocessor Characteristics
- Contains only the Central Processing Unit (CPU); requires external memory and peripherals.
- Designed for general-purpose computing tasks.
- Used in devices like desktop computers and laptops.
- Consumes more power and generates more heat.
- Relatively expensive and suited for complex applications.
Microcontroller Characteristics
- Integrates the CPU, memory, and peripherals onto a single chip.
- Designed for specific control tasks in embedded systems.
- Used in devices like washing machines, remote controls, and robots.
- Consumes low power and operates efficiently.
- Cost-effective and ideal for real-time applications.
Embedded System Software Components
Embedded system software comprises the programs that execute on embedded hardware to perform specific tasks. It serves as a crucial bridge between the hardware components and the application logic of the system.
Key Software Components
Initialization Code
This is the first code that runs after power-on. It sets up essential hardware components such as clocks, memory, and I/O pins, preparing the system to hand over control to the operating system or the main application.
Operating System (OS)
The OS manages system resources like the CPU, memory, and I/O. It can be a Real-Time Operating System (RTOS) for time-critical applications or a general-purpose OS (like Linux) for more complex systems. The OS is responsible for scheduling tasks and handling interrupts.
Device Drivers
Device drivers enable the software to communicate with specific hardware peripherals. Each driver provides a standardized software interface for a particular device (e.g., UART, Timer, LCD).
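One common way to give drivers a standardized interface in C is a struct of function pointers that every driver fills in. The sketch below shows that pattern with a stub UART driver that records bytes in a buffer instead of touching hardware; the `device_driver` type and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical uniform driver interface: every device exposes the same operations. */
typedef struct {
    const char *name;
    int (*init)(void);
    int (*write)(const uint8_t *buf, size_t len);
} device_driver;

/* Stub UART driver that logs transmitted bytes instead of driving hardware. */
static uint8_t uart_tx_log[32];
static size_t  uart_tx_count;

static int uart_init(void)
{
    uart_tx_count = 0;
    return 0;
}

/* Returns the number of bytes accepted (may be fewer than len if the log fills). */
static int uart_write(const uint8_t *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len && uart_tx_count < sizeof uart_tx_log; i++)
        uart_tx_log[uart_tx_count++] = buf[i];
    return (int)i;
}

static const device_driver uart_driver = { "uart0", uart_init, uart_write };

/* Application code depends only on the interface, not on the specific device. */
static int send_message(const device_driver *drv, const char *msg, size_t len)
{
    if (drv->init() != 0)
        return -1;
    return drv->write((const uint8_t *)msg, len);
}
```

Swapping in a different device (for example, an LCD driver) then only requires another `device_driver` instance; `send_message` stays unchanged.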
Applications
Applications perform the primary function of the embedded device. This can be a single program or multiple programs running concurrently under an operating system.
Software Execution Flow
- The system boots up by executing the initialization code stored in ROM.
- It may then load an operating system (if one is used) or directly run the application code.
- Device drivers are invoked whenever the application needs to interact with hardware components.
- Some systems also incorporate diagnostic code to test hardware functionality during the boot-up sequence.
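The boot sequence above can be sketched in C as a chain of initialization steps followed by the application. Every function here is a hypothetical stub that only records the order in which it ran; real initialization code would program clock, memory, and pin registers, and would often start in assembly before calling C.

```c
#include <stdbool.h>

/* Records the order of boot steps so the flow can be checked. */
enum { STEP_CLOCKS = 1, STEP_MEMORY, STEP_IO, STEP_SELFTEST, STEP_APP };
static int boot_log[8];
static int boot_steps;

static void record(int step)
{
    if (boot_steps < 8)
        boot_log[boot_steps++] = step;
}

/* Hypothetical stubs standing in for real hardware setup. */
static void init_clocks(void)     { record(STEP_CLOCKS); }
static void init_memory(void)     { record(STEP_MEMORY); }
static void init_io(void)         { record(STEP_IO); }
static bool run_diagnostics(void) { record(STEP_SELFTEST); return true; }
static void app_main(void)        { record(STEP_APP); }

/* Reached after reset; on a real system this code is stored in ROM. */
static int boot(bool with_diagnostics)
{
    init_clocks();
    init_memory();
    init_io();
    if (with_diagnostics && !run_diagnostics())
        return -1;   /* stay in a safe state if the hardware test fails */
    app_main();      /* or hand control to an operating system, if one is used */
    return 0;
}
```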
ARM Processor Operating Modes
Classic ARM processors (such as the ARM7 and ARM9 families) support seven distinct operating modes, each tailored for a specific purpose. These modes regulate access to certain registers and system resources, enabling the processor to efficiently handle various tasks such as interrupts, exceptions, and user applications.
Types of ARM Operating Modes
User Mode (USR)
This mode is used for normal application programs and has no privileged access to system-level features.
Fast Interrupt Request Mode (FIQ)
Designed to handle high-priority, fast interrupts, FIQ mode has its own banked copies of R8 to R14, so the handler can use those registers without first saving them, which shortens interrupt latency.
Interrupt Request Mode (IRQ)
Used for general-purpose interrupt handling, IRQ mode is commonly activated by external hardware-triggered events.
Supervisor Mode (SVC)
This is the default mode after a reset. Supervisor mode is used by the operating system kernel and has privileged access to system features.
Abort Mode (ABT)
Activated when a memory access violation occurs, Abort mode is used to handle data or instruction fetch failures.
Undefined Mode (UND)
Triggered when an undefined instruction is encountered, this mode allows software to handle unsupported operations.
System Mode (SYS)
Similar to User mode, but with privileged access, System mode is used for running trusted system-level code.
ARM Processor Core Extensions
ARM processors offer various core extensions designed to enhance performance, add functionality, and increase flexibility and efficiency in embedded applications.
Main Core Extensions
Cache and Tightly Coupled Memory (TCM)
- Cache: Improves speed by storing frequently accessed data close to the processor.
- Von Neumann Cache: Uses a single cache for both instructions and data.
- Harvard Cache: Employs separate caches for instructions and data, allowing simultaneous access.
- TCM: Fast, predictable memory directly linked to the core.
- TCM Use: Often used in real-time systems and is memory-mapped for direct access.
Memory Management Unit (MMU) & Memory Protection Unit (MPU)
- MMU: Provides virtual memory, access control, and memory mapping capabilities.
- MMU Use: Essential in complex systems running operating systems like Linux.
- MPU: Offers basic memory protection with configurable regions.
- MPU Use: Ideal for Real-Time Operating Systems (RTOS) and safety-critical applications.
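The idea behind an MPU can be sketched as a table of regions, each with a base, a size, and access permissions, consulted on every access. The `mpu_region` layout and `mpu_check` function below are hypothetical; real MPUs encode regions in their own registers and define their own priority rules for overlapping regions, whereas this sketch simply uses the first matching region.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor. */
typedef struct {
    uint32_t base;            /* first byte address of the region */
    uint32_t size;            /* region length in bytes */
    bool     readable;
    bool     writable;
    bool     privileged_only; /* deny all unprivileged access */
} mpu_region;

/* Returns true if the access is allowed by the first region containing addr.
   Addresses outside every region are denied (real hardware would raise a fault). */
static bool mpu_check(const mpu_region *regions, size_t n,
                      uint32_t addr, bool is_write, bool privileged)
{
    for (size_t i = 0; i < n; i++) {
        const mpu_region *r = &regions[i];
        if (addr - r->base < r->size) {  /* unsigned test for base <= addr < base + size */
            if (r->privileged_only && !privileged)
                return false;
            return is_write ? r->writable : r->readable;
        }
    }
    return false;
}
```

For example, marking a flash region read-only and a kernel RAM region privileged-only would make stray writes to flash and user-mode access to kernel data fault instead of silently corrupting state.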
Coprocessor Interface
- Allows the integration of additional coprocessors to extend ARM functionality.
- Example: Coprocessor 15 (CP15) is commonly used to control cache, TCM, and MMU/MPU settings.
- Can add new instruction sets, such as Vector Floating Point (VFP) for enhanced numerical processing.
ARM Instruction Set vs. Traditional RISC
ARM processors are fundamentally based on the RISC (Reduced Instruction Set Computer) architecture. However, ARM has incorporated several unique features that differentiate it from traditional or “pure” RISC designs.
Instruction Sets
Traditional RISC architectures typically use fixed-length instructions (usually 32-bit). ARM, in contrast, supports both 32-bit and 16-bit instructions (e.g., Thumb instruction set) to achieve better code density.
Conditional Execution
A key distinction is that ARM allows most instructions to be conditionally executed, meaning they only execute if certain conditions are met. In traditional RISC, conditional execution is generally limited to branch instructions.
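Conditional execution works by testing the CPSR flags against a condition code attached to the instruction. The C sketch below evaluates the standard ARM condition codes; the enum order matches their 4-bit encodings (EQ = 0 through AL = 14), while the function name is illustrative. For example, `CMP R0, R1` followed by `MOVLT R0, R1` leaves the larger signed value in R0 without any branch.

```c
#include <stdbool.h>

/* ARM condition codes in encoding order (0b0000 = EQ ... 0b1110 = AL). */
typedef enum { EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL } arm_cond;

/* Returns true if an instruction with this condition would execute. */
static bool cond_passes(arm_cond cond, bool n, bool z, bool c, bool v)
{
    switch (cond) {
    case EQ: return z;              /* equal */
    case NE: return !z;             /* not equal */
    case CS: return c;              /* unsigned >= */
    case CC: return !c;             /* unsigned <  */
    case MI: return n;              /* negative */
    case PL: return !n;             /* positive or zero */
    case VS: return v;              /* overflow */
    case VC: return !v;             /* no overflow */
    case HI: return c && !z;        /* unsigned >  */
    case LS: return !c || z;        /* unsigned <= */
    case GE: return n == v;         /* signed >=   */
    case LT: return n != v;         /* signed <    */
    case GT: return !z && n == v;   /* signed >    */
    case LE: return z || n != v;    /* signed <=   */
    case AL:
    default: return true;           /* always */
    }
}
```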
Load/Store Architecture
Both ARM and traditional RISC utilize a load/store model for memory access. However, ARM enhances this with advanced addressing modes and more flexible memory operations.
Barrel Shifter
ARM processors feature a built-in barrel shifter, which can perform shift operations as part of an instruction without requiring a separate instruction. Traditional RISC architectures typically need distinct instructions for shifting.
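The C sketch below models the barrel shifter's four basic shift types applied to the second operand, so that an instruction like `ADD R0, R1, R2, LSL #2` computes R1 + R2 × 4 in one step. It is a simplification: it treats a shift of 0 as no shift and ignores the special encodings (such as RRX) and the shifter carry-out. The names are illustrative.

```c
#include <stdint.h>

typedef enum { LSL, LSR, ASR, ROR } shift_type;

/* Apply an immediate shift (0-31) to Rm, as the barrel shifter does
   before the value reaches the ALU. */
static uint32_t barrel_shift(uint32_t rm, shift_type type, unsigned amount)
{
    amount &= 31u;
    if (amount == 0)
        return rm;  /* simplification: special zero-shift encodings not modeled */
    switch (type) {
    case LSL: return rm << amount;                           /* logical left */
    case LSR: return rm >> amount;                           /* logical right, zero fill */
    case ASR: return (rm >> amount) |                        /* arithmetic right, sign fill */
                     ((rm & 0x80000000u) ? ~(0xFFFFFFFFu >> amount) : 0u);
    case ROR: return (rm >> amount) | (rm << (32u - amount)); /* rotate right */
    }
    return rm;
}

/* ADD Rd, Rn, Rm, <shift> #amount: the shift costs no extra instruction. */
static uint32_t add_shifted(uint32_t rn, uint32_t rm, shift_type type, unsigned amount)
{
    return rn + barrel_shift(rm, type, amount);
}
```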
Instruction Set Extensions
ARM supports various instruction set extensions like Thumb, NEON (for advanced SIMD processing), and Jazelle (for Java acceleration), which are not typically found in standard RISC designs.