Memory Organization and Management in Computer Architecture
Characteristics of Memory
1. Location: Based on its physical location, memory is classified into three types:
- On-Chip: This memory is present inside the CPU. E.g.: Internal Registers and L1 Cache.
- Internal: This memory is present on the motherboard, outside the CPU. E.g.: RAM.
- External: This memory is attached to the system through an I/O interface rather than the memory bus. E.g.: Hard disk.
2. Storage Capacity: This indicates the amount of data the memory can store. Ideally, it should be as large as possible. It is represented as N x M.
- N = Number of memory locations (number of words)
- M = Number of bits per memory location (word size)
E.g.: (4K x 8) means there are 4K locations of 8-bits each.
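As a quick worked example, the total capacity of this organization can be computed in a couple of lines (a minimal Python sketch; the variable names are illustrative):

```python
# Worked example for the 4K x 8 organization above.
N = 4 * 1024        # number of memory locations (4K words)
M = 8               # bits per location (word size)

total_bits = N * M
print(total_bits)        # 32768 bits
print(total_bits // 8)   # 4096 bytes = 4 KB
```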
3. Unit of Transfer: Data can be transferred from memory in two different ways:
- Word Transfer: If the CPU needs some data, only the required word is transferred. E.g.: Data accessed from the L1 Cache.
- Block Transfer: If the CPU needs some data, an entire block containing that data is transferred, which makes subsequent accesses to the rest of the block much faster. This is based on the Principle of Spatial Locality: a processor is most likely to access data near the location it is currently accessing. E.g.: On a cache miss, the processor goes to main memory and copies the whole block containing the requested data.
4. Access Modes: Memories can allow data to be accessed in different ways:
- Sequential Access: E.g.: Magnetic tapes. Locations are accessed one after another in sequence, so the access time depends on how far the target location is from the current position: the farther the location, the longer the access time.
- Random Access: E.g.: RAM & ROM. All locations can be accessed directly, in any order, using their addresses. This means all locations have the same access time irrespective of their address.
- Direct Access followed by Sequential Access: E.g.: Magnetic hard disk. The read/write head first moves directly to the target track (direct access) and then reads sequentially within the track until the desired sector is reached.
- Associative Access: E.g.: Cache memories. A location is identified by comparing part of its contents (the tag) against all locations simultaneously, rather than by its address.
5. Access Time & Memory Cycle Time:
- Access Time is the time taken between placing the request and completing the data transfer. It should be as short as possible. It is also known as latency.
- Memory Cycle Time is the access time plus any additional time required before a second access can commence.
6. Transfer Rate: Rate at which data can be transferred into or out of a memory unit.
Tn = TA + n/R, where:
- Tn = Average time to read or write n bits
- TA = Average access time
- n = Number of bits
- R = Transfer rate, in bits per second (bps)
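To make the formula concrete, here is a small worked example in Python; the access time and transfer rate below are assumed values chosen purely for illustration:

```python
# Worked example of Tn = TA + n/R (assumed illustrative values).
TA = 100e-9    # average access time: 100 ns (assumed)
R = 1e9        # transfer rate: 1 Gbps (assumed)
n = 4096       # number of bits to transfer

Tn = TA + n / R
print(f"Tn = {Tn * 1e9:.0f} ns")   # 100 ns + 4096 ns = 4196 ns
```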
7. Reliability: The length of time the memory is expected to operate without errors. It is measured as MTTF (Mean Time To Failure) and should be as high as possible.
8. Cost: This indicates the cost of storing data in the memory. It is expressed as Cost/bit. It must be as low as possible.
9. Average Cost: It is the overall cost per bit for the entire memory system. Consider a system having two memories M1 (RAM) & M2 (ROM). If C1 is the cost per bit of memory M1 of size S1 & C2 is the cost per bit of memory M2 of size S2, then the average cost of the memory is calculated as:
C(AVG) = (C1 * S1 + C2 * S2) / (S1 + S2)
Small sizes of expensive memory and large sizes of cheaper memory lower the average cost.
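A short worked example (the sizes and per-bit prices are assumed for illustration) shows how the large, cheap memory dominates the average:

```python
# Worked example of C(AVG) = (C1*S1 + C2*S2) / (S1 + S2).
C1, S1 = 1e-2, 1 * 2**20    # M1: $0.01 per bit, 1 Mbit (assumed)
C2, S2 = 1e-4, 64 * 2**20   # M2: $0.0001 per bit, 64 Mbit (assumed)

c_avg = (C1 * S1 + C2 * S2) / (S1 + S2)
print(f"C(AVG) = ${c_avg:.6f} per bit")   # ~$0.000252, close to C2
```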
10. Hit Ratio (H): Consider two memories M1 and M2. M1 is closer to the processor (e.g., RAM) than M2 (e.g., Hard disk). If the desired data is found in M1, then it is called a Hit, else it is a Miss. Let N1 be the number of Hits and N2 the number of Misses. The Hit Ratio H is defined as the number of hits divided by total attempts.
H = (N1) / (N1 + N2)
It is usually expressed as a percentage. In practice, H never reaches 100%; a well-managed memory hierarchy typically keeps it around 98%.
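For example, with assumed hit and miss counts:

```python
# Worked example of H = N1 / (N1 + N2) (assumed counts).
N1 = 980   # hits
N2 = 20    # misses

H = N1 / (N1 + N2)
print(f"H = {H:.0%}")   # 98%
```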
Various Cache Memory Mapping Techniques
Cache memory mapping is essential for determining how data from main memory is placed into the cache. The three primary cache memory mapping techniques are:
- Direct Mapping: Each block of main memory maps to exactly one cache line, determined by a modulo operation on the memory block number. For example, if the main memory has 16 blocks and the cache has 4 lines, blocks 0, 4, 8, and 12 all map to line 0; blocks 1, 5, 9, and 13 map to line 1, and so on. (A short sketch after this list illustrates the index computation.)
- Advantages:
- Simplicity: Easy to implement and understand.
- Speed: Fast access time since the mapping process is straightforward.
- Low Cost: Requires fewer resources for implementation.
- Disadvantages:
- Conflict Misses: High probability of conflict misses, as multiple blocks can map to the same cache line, causing frequent overwrites.
- Poor Utilization: If the program frequently accesses blocks that map to the same cache line, the cache may not be used efficiently.
- Fully Associative Mapping: Any block of main memory can be placed in any line of the cache; a block is located by comparing its tag against all lines simultaneously.
- Advantages:
- Flexibility: Reduces conflict misses as any block can go into any line.
- Better Utilization: Cache lines are used more efficiently, as there are no fixed mappings.
- Disadvantages:
- Complexity: Requires complex hardware to search the entire cache (associative search).
- Cost: Higher cost due to the need for content-addressable memory (CAM) or equivalent search mechanisms.
- Slower Access: Searching the entire cache can take more time compared to direct mapping.
- Set-Associative Mapping: The cache is divided into sets of k lines each (k-way). Each block of main memory maps to exactly one set (set number = block number modulo number of sets) but can occupy any line within that set.
- Advantages:
- Balanced Approach: Offers a good compromise between complexity and performance.
- Reduced Conflict Misses: Lower conflict misses than direct mapping as there are multiple lines per set.
- Flexibility: More flexible than direct mapping, but simpler than fully associative mapping.
- Disadvantages:
- Moderate Complexity: More complex than direct mapping due to the need to search within a set, but simpler than fully associative mapping.
- Cost: More costly than direct mapping due to the added complexity of set management.
- Moderate Speed: Access time is slower than direct mapping but faster than fully associative mapping.
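The index computations behind these techniques are easy to see in code. The following minimal Python sketch reuses the sizes from the direct-mapping example above (16 blocks, 4 lines); the 2-way set-associative configuration is an assumed illustration, not taken from the text:

```python
NUM_LINES = 4   # cache lines, as in the direct-mapping example above
NUM_SETS = 2    # assumed: 4 lines organized as 2 sets of 2 ways each

def direct_mapped_line(block):
    # Direct mapping: each block maps to exactly one line.
    return block % NUM_LINES

def set_index(block):
    # Set-associative mapping: each block maps to one set,
    # but may occupy any way (line) within that set.
    return block % NUM_SETS

# (Fully associative mapping needs no index: any block may go in any line.)
for block in (0, 1, 4, 5, 8, 13):
    print(f"block {block:2d} -> line {direct_mapped_line(block)}, "
          f"set {set_index(block)}")
```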
Need for DMA
Computers need the Direct Memory Access (DMA) module to transfer data between I/O devices and memory without involving the CPU in every word of the transfer. The main reasons it is needed:
Better Performance:
- Parallel Processing: Offloading data transfer operations to the DMA module lets the CPU work on other tasks, enhancing system performance.
- Reduced CPU Overhead: Without DMA, the CPU would have to handle every data transfer itself, consuming cycles that could otherwise be spent on computation.
Efficient Data Transfer:
- High-Speed Transfers: DMA can directly transport data between I/O devices and memory, which is essential for disk operations, network communication, and multimedia processing.
- Large Block Transfer: DMA is optimized for large block transfers, unlike CPU-driven data transfer methods, which are better for smaller, less frequent transfers.
Techniques of Data Transfer using DMA
1. Cycle Stealing: In cycle stealing, the DMA controller transfers one word of data at a time and then releases the bus to the CPU. This process is repeated until the entire data block is transferred.
- Advantages: This technique allows the CPU and DMA to share the bus more equitably, reducing the impact on the CPU’s performance.
- Disadvantages: The overall transfer time is longer than in burst mode (where the DMA controller holds the bus until the entire block has been transferred), as the controller must repeatedly acquire and release the bus.
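A toy simulation (purely illustrative; no real bus protocol is modeled, and the block size is assumed) shows the interleaving that gives cycle stealing its name:

```python
# Toy model of cycle stealing: DMA takes the bus for one word at a time,
# returning it to the CPU between transfers.
block_size = 3   # words to transfer (assumed)

cycle = 0
for word in range(block_size):
    print(f"cycle {cycle}: DMA steals the bus, transfers word {word}")
    cycle += 1
    print(f"cycle {cycle}: bus released, CPU proceeds")
    cycle += 1
print(f"transfer complete after {cycle} bus cycles")
```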
Flip-Flop: A Short Note
A flip-flop is a basic digital memory circuit, capable of storing one bit of information. It is a bistable multivibrator, meaning it has two stable states and can switch between these states based on input signals. Flip-flops are fundamental building blocks in digital electronics, used extensively in various applications such as data storage, data transfer, and synchronization.
Memory Hierarchy
1. Registers
- Location: Inside the CPU.
- Speed: Fastest.
- Size: Smallest.
- Cost: Highest per bit.
- Function: Store temporary data and instructions being used by the CPU.
2. Cache Memory
- Levels: Usually consists of multiple levels (L1, L2, L3).
- L1 Cache: Closest to the CPU, smallest, and fastest.
- L2 Cache: Larger and slightly slower than L1.
- L3 Cache: Larger and slower than L2.
- Speed: Very fast, though slightly slower than registers.
- Size: Larger than registers but smaller than main memory.
- Cost: High per bit but lower than registers.
- Function: Stores frequently accessed data to reduce average access time.
3. Main Memory (RAM)
- Location: On the motherboard.
- Speed: Slower than cache memory.
- Size: Larger than cache memory.
- Cost: Lower per bit than cache memory.
- Function: Stores data and instructions currently being used by the CPU.
4. Secondary Storage
- Types: Hard Disk Drives (HDDs), Solid State Drives (SSDs).
- Speed: Slower than main memory.
- Size: Much larger than main memory.
- Cost: Lower per bit compared to RAM.
- Function: Stores data and programs not currently in use. Provides non-volatile storage.
5. Tertiary Storage
- Types: Optical discs (CDs, DVDs), magnetic tapes.
- Speed: Much slower than secondary storage.
- Size: Variable, often used for backup and archival.
- Cost: Very low per bit.
- Function: Long-term storage and backups.
SR Flip-Flop: Working
The SR (Set-Reset) flip-flop is one of the simplest types of flip-flops. It has two inputs, Set (S) and Reset (R), and two outputs, Q and \( \overline{Q} \) (the complement of Q). The behavior of the SR flip-flop can be summarized as follows:
- S = 0, R = 0: No change in the output. The flip-flop maintains its previous state.
- S = 1, R = 0: Set the output Q to 1 (\( \overline{Q} \) = 0).
- S = 0, R = 1: Reset the output Q to 0 (\( \overline{Q} \) = 1).
- S = 1, R = 1: This is an invalid (forbidden) input combination for the basic SR flip-flop; both outputs are forced to the same value, and the state after the inputs are released is unpredictable.
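The table above translates directly into a few lines of code. This is a minimal behavioral sketch in Python (the function name and encoding are illustrative); it models only the level behavior, not gate delays:

```python
def sr_next(q, s, r):
    """Next state of Q for an SR flip-flop, given current state q."""
    if (s, r) == (0, 0):
        return q        # hold previous state
    if (s, r) == (1, 0):
        return 1        # set
    if (s, r) == (0, 1):
        return 0        # reset
    raise ValueError("S = R = 1 is invalid for the basic SR flip-flop")

q = 0
for s, r in [(1, 0), (0, 0), (0, 1), (0, 0)]:
    q = sr_next(q, s, r)
    print(f"S={s} R={r} -> Q={q} Q'={1 - q}")
```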
JK Flip-Flop: Working
The JK flip-flop is an enhancement of the SR flip-flop that eliminates the invalid state problem. It has two inputs, J and K, and two outputs, Q and \( \overline{Q} \). The JK flip-flop behaves as follows:
- J = 0, K = 0: No change in the output. The flip-flop maintains its previous state.
- J = 1, K = 0: Set the output Q to 1 (\( \overline{Q} \) = 0).
- J = 0, K = 1: Reset the output Q to 0 (\( \overline{Q} \) = 1).
- J = 1, K = 1: Toggle the output Q. If Q was 1, it becomes 0, and if Q was 0, it becomes 1.
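The same style of sketch extends to the JK flip-flop; the only new case is the toggle (assumed clocked behavior, one call per clock edge):

```python
def jk_next(q, j, k):
    """Next state of Q for a JK flip-flop on each clock edge."""
    if (j, k) == (0, 0):
        return q        # hold
    if (j, k) == (1, 0):
        return 1        # set
    if (j, k) == (0, 1):
        return 0        # reset
    return 1 - q        # J = K = 1: toggle

q = 0
for j, k in [(1, 0), (1, 1), (1, 1), (0, 1)]:
    q = jk_next(q, j, k)
    print(f"J={j} K={k} -> Q={q}")
```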
Pipeline Hazards
Pipeline hazards are situations in pipelined processors that cause delays and prevent the next instruction in the instruction stream from executing during its designated clock cycle. These hazards disrupt the smooth execution of instructions in a pipeline. There are three main types of pipeline hazards:
1. Structural Hazards
- Occur when two or more instructions need the same hardware resource in the same clock cycle, so the pipeline cannot service them all simultaneously.
- Example: If a processor has a single memory unit for both instruction fetch and data access, a structural hazard can occur if an instruction needs to access data while another instruction is being fetched.
- Solution:
- Duplication of resources, such as having separate instruction and data caches (Harvard architecture).
- Pipeline stalls (inserting no-operation (NOP) instructions).
2. Data Hazards
- Occur when instructions that exhibit data dependency modify data in different stages of the pipeline.
- Types:
- Read After Write (RAW): Also called a true dependency, occurs when an instruction needs to read a location that a previous instruction writes to.
- Write After Read (WAR): Also called an anti-dependency, occurs when an instruction writes to a location that a previous instruction reads from.
- Write After Write (WAW): Also called an output dependency, occurs when an instruction writes to a location that a previous instruction writes to.
- Example:
- ADD R1, R2, R3 ; Instruction 1: R1 = R2 + R3
- SUB R4, R1, R5 ; Instruction 2: R4 = R1 - R5 (RAW dependency on R1; see the sketch at the end of this section)
- Solution:
- Forwarding (Data Bypassing): Directly passing the output of an instruction to a subsequent instruction without writing it to the register file.
- Pipeline Stalls (Bubble Insertion): Inserting NOPs to wait until the required data is available.
- Compiler Scheduling: Rearranging the order of instructions to avoid hazards.
3. Control Hazards
- Occur due to the pipelining of branch instructions and other instructions that change the Program Counter (PC).
- Example: Branch instructions (like jump or conditional branches) may change the flow of instruction execution, leading to control hazards.
- Solution:
- Branch Prediction: Guessing the outcome of a branch and continuing execution accordingly. Modern processors use sophisticated algorithms for prediction.
- Delayed Branch: Defining the slot(s) immediately after a branch to always execute (branch delay slots), which the compiler fills with useful instructions so the pipeline does productive work while the branch resolves.
- Pipeline Flushing: Clearing the pipeline of instructions that were fetched or executed based on incorrect predictions.
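The forwarding and stalling solutions for data hazards can be made concrete with a small sketch. The following Python model is purely illustrative (the instruction encoding, the two-cycle stall count assumed for a classic five-stage pipeline, and all names are assumptions, not a real simulator); it detects the RAW dependency from the example in the Data Hazards section:

```python
# Each instruction: (destination register, source registers).
program = [
    ("R1", ("R2", "R3")),   # ADD R1, R2, R3
    ("R4", ("R1", "R5")),   # SUB R4, R1, R5  (RAW dependency on R1)
]

def count_stalls(program, forwarding):
    stalls = 0
    for i in range(1, len(program)):
        prev_dest = program[i - 1][0]
        sources = program[i][1]
        if prev_dest in sources and not forwarding:
            # Assumed classic 5-stage pipeline: without forwarding, the
            # consumer waits for the producer's writeback (bubble cycles).
            stalls += 2
    return stalls

for forwarding in (False, True):
    print(f"forwarding={forwarding}: {count_stalls(program, forwarding)} stall cycle(s)")
```

With forwarding enabled, the ALU result is bypassed directly to the dependent instruction and no bubbles are inserted, which is exactly the benefit described above.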