Operating System CPU Scheduling and Memory Management

CPU Scheduling Fundamentals

Preemptive vs. Non-Preemptive Scheduling

Explain the difference between preemptive and non-preemptive scheduling. Connect the four process state transitions with these two scheduling schemes:

  • Non-Preemptive: A process runs until it voluntarily gives up the CPU or terminates. This includes transitions like Run → Wait and Run → Terminate.
  • Preemptive: A process can be interrupted and moved to the ready state by the operating system. This includes transitions like Run → Ready and Wait → Ready.

The Role of the Dispatcher

What is a dispatcher? Describe its role in CPU scheduling, including the steps involved in dispatching a process.

The dispatcher is the module that gives control of the CPU to the process selected by the scheduler. Its functions include:

  1. Performing a context switch.
  2. Switching to user mode.
  3. Jumping to the proper location in the user program to restart that program.

CPU Scheduling Criteria and Measurement

Identify the criteria that guide the design of CPU scheduling and explain how each is measured:

  • CPU Utilization: The percentage of time the CPU is busy. Measured as the fraction of time the CPU is actively executing processes.
  • Throughput: The number of processes completed per unit of time. Measured as processes per hour or minute.
  • Turnaround Time: The total time taken to execute a particular process, from submission to completion.
  • Waiting Time: The amount of time a process spends waiting in the ready queue.
  • Response Time: The time from when a request is submitted until the first response is produced (not the final output).

Common CPU Scheduling Algorithms

First-Come, First-Served (FCFS)

A simple scheduling algorithm where processes are executed in the order they arrive.

Shortest-Job-First (SJF)

This algorithm associates with each process the length of its next CPU burst. The CPU is assigned to the process with the smallest next CPU burst. It can be either preemptive or non-preemptive.

Note: The exponential average τ(n+1) = α · t(n) + (1 − α) · τ(n) is used to predict the length of the next CPU burst, where t(n) is the measured length of the nth burst, τ(n) is the previous prediction, and the weight α (0 ≤ α ≤ 1) controls how heavily recent history counts. Such a prediction is usually necessary for implementing SJF, since the true length of the next burst cannot be known in advance.
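
The exponential average above can be sketched in a few lines of Python (the function name, the initial guess τ0 = 10, and the observed bursts are illustrative):

```python
def predict_next_burst(prev_prediction, actual_burst, alpha=0.5):
    """Exponential average: tau_{n+1} = alpha * t_n + (1 - alpha) * tau_n."""
    return alpha * actual_burst + (1 - alpha) * prev_prediction

# Start with an initial guess tau_0 = 10 and feed in observed bursts.
tau = 10.0
for burst in [6, 4, 6]:
    tau = predict_next_burst(tau, burst)
print(tau)  # -> 6.0
```

With α = 0.5, each new prediction is the average of the last measured burst and the previous prediction, so old history decays geometrically.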

Shortest-Remaining-Time-First (SRTF)

This is the preemptive version of SJF. If a new process arrives with a CPU burst time shorter than the remaining time of the currently executing process, the CPU is preempted.

Round-Robin (RR) Scheduling

Each process gets a small unit of CPU time, called a time quantum (q), typically 10-100 milliseconds. After this time has elapsed, the process is preempted and added to the end of the ready queue.

  • As a rule of thumb, the quantum should be chosen so that about 80% of CPU bursts are shorter than it.
  • Context-switch time should be a small fraction of the quantum; it is typically less than 10 microseconds.
  • Ensures no starvation.
  • A large quantum makes RR behave like FCFS.
  • A small quantum leads to too many context switches, increasing overhead.
  • Generally results in slower turnaround time but faster response time.
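
A minimal Round-Robin simulation, assuming all processes arrive at time 0 and ignoring context-switch overhead (the process names and burst lengths are illustrative):

```python
from collections import deque

def round_robin(bursts, quantum):
    """Simulate RR scheduling; bursts maps pid -> CPU burst length.
    All processes are assumed to arrive at t = 0.
    Returns the completion time of each process."""
    queue = deque(bursts.items())
    time, finish = 0, {}
    while queue:
        pid, remaining = queue.popleft()
        run = min(quantum, remaining)
        time += run
        if remaining > run:
            queue.append((pid, remaining - run))  # preempted: back of the queue
        else:
            finish[pid] = time                    # burst completed
    return finish

print(round_robin({"P1": 24, "P2": 3, "P3": 3}, quantum=4))
# -> {'P2': 7, 'P3': 10, 'P1': 30}
```

Note how the short bursts (P2, P3) finish quickly while the long burst (P1) is repeatedly preempted, illustrating RR's fast response time at the cost of longer turnaround for CPU-bound processes.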

Priority Scheduling

Processes are scheduled based on a priority number. The CPU is allocated to the process with the highest priority.

  • A common solution to indefinite blocking (starvation) is aging, where the priority of processes that wait for a long time is gradually increased.
  • For processes with the same priority, FCFS or Round-Robin scheduling can be used.

Applying Scheduling Algorithms: Gantt Charts

Given a few processes with their arrival and burst times, use the algorithms described above (FCFS, SJF, SRTF, Round-Robin, and priority scheduling) to plot the Gantt chart and calculate the average waiting time.
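
As a sketch of the waiting-time calculation, here is the FCFS case in Python, assuming all processes arrive at t = 0 in the given order (the burst values are the classic textbook example):

```python
def fcfs_waiting_times(bursts):
    """Waiting time per process under FCFS, given arrival order.
    All processes are assumed to arrive at t = 0."""
    waits, elapsed = [], 0
    for burst in bursts:
        waits.append(elapsed)   # a process waits for everything before it
        elapsed += burst
    return waits

# P1 = 24, P2 = 3, P3 = 3, arriving in that order
waits = fcfs_waiting_times([24, 3, 3])
print(waits, sum(waits) / len(waits))  # -> [0, 24, 27] 17.0
```

Reordering the same bursts as [3, 3, 24] drops the average waiting time to 3.0, which is exactly the convoy effect that SJF avoids.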

Advanced Scheduling Techniques

Multi-Level Queue Scheduling

The ready queue is partitioned into separate queues, each with its own scheduling algorithm. Processes are permanently assigned to a queue based on properties like memory size, process type, or priority.

Example priority levels: Real-time, System, Interactive, Batch.

Multi-Level Feedback Queue Scheduling

Allows processes to move between queues, enabling dynamic priority changing (e.g., through aging). This prevents starvation and allows for more flexible scheduling.

Example: Three queues (Q0, Q1, Q2).

  • Q0: Round-Robin with quantum = 8 milliseconds.
  • Q1: Round-Robin with quantum = 16 milliseconds.
  • Q2: FCFS.

A process starts at a high-priority queue (Q0) and is demoted to a lower-priority queue if it does not finish its CPU burst within the allocated quantum.

Thread Scheduling: PCS and SCS

Explain the difference between Process-Contention Scope (PCS) and System-Contention Scope (SCS) in thread scheduling.

  • Process-Contention Scope (PCS): User-level threads compete for CPU within the context of a single process. The scheduler decides which user thread to run on an available LWP (Lightweight Process).
  • System-Contention Scope (SCS): Kernel-level threads compete for CPU among all threads in the system. Windows and Linux support only SCS, because they use a one-to-one mapping between user and kernel threads.

SMP Multiprocessing and Ready Queues

In Symmetric Multiprocessing (SMP) systems, every processor schedules itself. Describe the two main approaches for managing ready queues:

  • Common Ready Queue: All processors share a single ready queue. This approach:
    1. Requires locking mechanisms to protect the queue, which can introduce overhead.
    2. Can become a performance bottleneck under heavy load.
  • Per-Core Ready Queues: Each processor has its own private ready queue. This approach offers:
    1. Improved cache friendliness, as threads tend to run on the same core.
    2. Reduced contention for the ready queue.
    3. Requires mechanisms for workload balancing to prevent some cores from being idle while others are overloaded.

Multithreaded Multicore Systems Scheduling

Describe the two levels of scheduling in multithreaded multicore systems.

These systems feature multiple hardware threads per core, allowing for switching between threads if one stalls (e.g., due to a memory access). Hardware performs the thread switching.

There are two levels of scheduling:

  1. The Operating System chooses which software thread to run on a logical CPU.
  2. The Core chooses which hardware thread to run on the physical core (e.g., using Round-Robin or urgency/priority-based schemes).

Load Balancing and Processor Affinity

Explain load balancing and processor affinity, and why they often have opposite effects.

  • Load Balancing: Attempts to keep the workload evenly distributed among all processors in an SMP system. This can be achieved through:
    • Push Migration: A periodic task checks the load on each CPU and moves processes from overloaded CPUs to idle or less-busy ones.
    • Pull Migration: An idle CPU pulls waiting tasks from a busy CPU.
  • Processor Affinity: A process has an affinity for the processor on which it is currently running. This is beneficial because it allows the process to reuse data in the processor’s cache, reducing cache misses.
    • Soft Affinity: The OS tries to keep a process on the same processor but does not guarantee it.
    • Hard Affinity: The process specifies a subset of processors on which it may run.

Load balancing and processor affinity have opposite effects because load balancing tries to move processes, while processor affinity tries to keep them on the same processor.

Real-Time CPU Scheduling

Real-time systems require tasks to be completed within strict deadlines. Key aspects include:

  • Rate Monotonic Scheduling (RMS): A static-priority scheduling algorithm where priorities are assigned based on the inverse of the period (shorter period = higher priority).
  • Earliest Deadline First (EDF) Scheduling: A dynamic-priority scheduling algorithm where priorities are assigned based on the earliest deadline (earlier deadline = higher priority).

Real-time process execution involves:

  1. Interrupt processing.
  2. Handling conflicts (e.g., preempting kernel processes, releasing resources from low-priority processes).
  3. Dispatching the real-time process.
  4. Real-time process execution.

Real-Time Scheduling Algorithm Application

Rate Monotonic Algorithm Analysis

Given two or three processes with their period (p), deadline (d), and execution time (t) values, use the Rate Monotonic Algorithm to determine if it is possible to schedule them. If possible, plot the Gantt Chart and calculate the CPU utilization. If not, prove it quantitatively.

A set of N periodic processes is guaranteed to be schedulable under RMS if the total CPU utilization does not exceed the bound N · (2^(1/N) − 1); this test is sufficient but not necessary. RMS is optimal among static-priority algorithms.

Earliest Deadline First (EDF) Scheduling

Given two or three processes with their period (p), deadline (d), and execution time (t) values, use EDF to find the optimal scheduling. Plot the Gantt Chart and calculate the CPU utilization.

EDF is optimal for dynamic priority assignment, but it requires knowing the deadlines of all processes.

Main Memory Management

Memory Protection Implementation

Provide a simple implementation for memory protection among processes using base and limit registers. The base register holds the smallest legal physical address, and the limit register specifies the size of the range. Every address generated by the CPU is checked against these registers (base ≤ address < base + limit) to ensure it falls within the process's allocated memory space.
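
The hardware check can be sketched as a one-line predicate (the base and limit values are illustrative):

```python
def legal_address(addr, base, limit):
    """A CPU-generated address is legal iff base <= addr < base + limit;
    any other address traps to the operating system."""
    return base <= addr < base + limit

# Hypothetical process: base = 300040, limit = 120900
print(legal_address(350000, 300040, 120900))  # -> True  (inside the range)
print(legal_address(420940, 300040, 120900))  # -> False (equals base + limit)
```

In real hardware both comparisons happen in parallel on every memory reference, and a failed check raises a trap rather than returning False.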

Address Systems in Program Lifetime

List the different address systems used during the lifetime of a program:

  • Source Code Addresses: Symbolic addresses (e.g., variable names, function names).
  • Compiled Addresses: Relocatable addresses (e.g., an address 14 bytes from the start of a module).
  • Linker/Loader Addresses: Absolute addresses (actual physical memory locations).

Binding of Instructions and Data to Memory

In which stages does the binding of instructions (data) to memory happen, and how?

  • Compile Time: If the memory location is known before execution, absolute code can be generated. However, if the starting location changes, the code must be recompiled.
  • Load Time: If the memory location is not known at compile time, the compiler must generate relocatable code. Binding occurs when the program is loaded into memory.
  • Execution Time: If the process can be moved during its execution, binding must be delayed until run time. This requires hardware support for address maps (e.g., using a Memory Management Unit).

Virtual vs. Physical Addresses

State the difference between virtual and physical addresses. Why does a computer system need virtual addresses?

  • Virtual (Logical) Address: An address generated by the CPU.
  • Physical Address: An address seen by the memory unit.

In compile-time and load-time binding schemes, virtual and physical addresses are the same. In execution-time binding, they differ.

A computer system needs virtual addresses to:

  • Allow a program to be larger than physical memory.
  • Provide memory protection between processes.
  • Enable memory sharing between processes.
  • Simplify memory management for programmers.

MMU Translation: Logical to Physical Address

Provide a simple scheme to use a Memory Management Unit (MMU) to translate a logical address into a physical address.

A simple scheme uses a relocation register. The value in the relocation register is added to every logical address generated by a user process to produce a physical address.
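
A sketch of this MMU scheme in Python, combining the limit check with relocation (the register values and the example address are illustrative):

```python
RELOCATION = 14000  # hypothetical relocation-register value
LIMIT = 12000       # hypothetical limit-register value

def translate(logical):
    """MMU: trap if the logical address exceeds the limit register,
    otherwise add the relocation register to form the physical address."""
    if logical >= LIMIT:
        raise MemoryError("addressing error: trap to OS")
    return logical + RELOCATION

print(translate(346))  # -> 14346
```

The user process only ever sees logical addresses (0 up to LIMIT); the physical address 14346 is invisible to it, which is what makes execution-time binding and process relocation possible.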

Dynamic Loading and Dynamic Linking

Describe dynamic loading and dynamic linking, including their benefits and requirements.

  • Dynamic Loading: A routine is not loaded into main memory until it is called.
    • Benefits: Better memory-space utilization because unused routines are not loaded. No special support from the operating system is required, but the OS can provide libraries to implement it.
  • Dynamic Linking: Linking is postponed until execution time. Shared libraries (DLLs on Windows, .so files on Linux) are common examples.
    • Benefits: Smaller executable file sizes, shared libraries reduce memory footprint, supports upgrades to libraries without recompiling applications.
    • Requirements: Needs version information for libraries. The OS must intervene for protection and to manage shared libraries.

Contiguous Memory Allocation

Describe how first-fit, best-fit, and worst-fit algorithms work for contiguous memory allocation. Compare their performance.

In contiguous memory allocation, each process is contained in a single, contiguous section of memory.

  • First-Fit: Allocates the first hole that is big enough.
  • Best-Fit: Allocates the smallest hole that is big enough. This requires searching the entire list of holes unless ordered by size.
  • Worst-Fit: Allocates the largest hole. This also requires searching the entire list.

Performance Comparison: First-fit and best-fit generally beat worst-fit in terms of both speed and storage utilization.

Types of Fragmentation

Describe the two types of fragmentation:

  • External Fragmentation: Total memory space exists to satisfy a request, but it is not contiguous. The "50-percent rule" states that, for every N allocated blocks, roughly 0.5N blocks are lost to fragmentation; that is, about one-third of memory may be unusable.
  • Internal Fragmentation: Allocated memory may be slightly larger than requested memory; the difference between these two is internal fragmentation (memory allocated but unused by the process).

Solutions to External Fragmentation

Describe the two main solutions to external fragmentation:

  • Compaction: Shuffles memory contents to place all free memory together in one large block. This is expensive and only possible if relocation is dynamic.
  • Paging: Allows a process’s physical address space to be non-contiguous, eliminating external fragmentation entirely.

Paging: Concepts and Benefits

What is paging, and what are its benefits?

Paging is a memory management scheme that allows the physical address space of a process to be non-contiguous. It divides logical memory into fixed-size blocks called pages and physical memory into fixed-size blocks called frames.

Benefits:

  • Eliminates external fragmentation.
  • Allows a process’s logical address space to be larger than the available physical memory.
  • Simplifies memory allocation.

Logical to Physical Address Translation in Paging

Describe the translation of a logical address to a physical address in paging.

A logical address generated by the CPU is divided into two parts:

  • Page Number (P): Used as an index into the page table. If the logical address space is M bits and the page offset is N bits, then the page number is M-N bits.
  • Page Offset (D): Combined with the frame number obtained from the page table to define the physical memory address. This is N bits.
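
The split and translation above are just bit operations; a minimal sketch in Python, assuming 4 KiB pages (N = 12 offset bits) and a hypothetical page table:

```python
PAGE_OFFSET_BITS = 12  # assume 4 KiB pages, so N = 12

def split_logical_address(addr):
    """Split a logical address into (page number, page offset)."""
    page = addr >> PAGE_OFFSET_BITS
    offset = addr & ((1 << PAGE_OFFSET_BITS) - 1)
    return page, offset

def to_physical(addr, page_table):
    """Translate: index the page table by page number, keep the offset."""
    page, offset = split_logical_address(addr)
    frame = page_table[page]
    return (frame << PAGE_OFFSET_BITS) | offset

page_table = {0: 5, 1: 2}  # hypothetical page -> frame mapping
print(hex(to_physical(0x1234, page_table)))  # -> 0x2234
```

Logical address 0x1234 splits into page 1, offset 0x234; page 1 maps to frame 2, so the physical address is frame 2 with the same offset, 0x2234.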

Translation Lookaside Buffer (TLB)

What is a TLB, and why is it useful for paging?

A Translation Lookaside Buffer (TLB) is a small, fast hardware cache for page table entries. It typically has 32-1024 entries and can be direct-mapped or set-associative.

Usefulness: The TLB speeds up the translation of logical addresses to physical addresses by caching frequently accessed page table entries, thus reducing the number of memory accesses required for each translation.

Some TLBs include Address-Space Identifiers (ASIDs) in each entry to provide address-space protection for that specific process, allowing multiple processes’ entries to reside in the TLB simultaneously.

Effective Memory Access Time (EMAT) with TLB

Given a TLB hit-ratio, calculate the effective memory access time (EMAT).
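
A sketch of the standard EMAT formula, assuming a TLB miss costs one extra memory access to read the page table (the timing values are illustrative):

```python
def emat(hit_ratio, tlb_time, mem_time):
    """EMAT = hit * (tlb + mem) + miss * (tlb + 2 * mem).
    On a miss, one extra memory access fetches the page table entry."""
    miss_ratio = 1 - hit_ratio
    return (hit_ratio * (tlb_time + mem_time)
            + miss_ratio * (tlb_time + 2 * mem_time))

# 80% hit ratio, 10 ns TLB lookup, 100 ns memory access
print(emat(0.80, 10, 100))  # -> approx. 130 ns
```

With a 99% hit ratio the same parameters give about 111 ns, which is why even a small TLB recovers most of the cost of paging.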

Memory Protection Measures in Paging

List the measures for memory protection in paging:

  • Valid/Invalid Bit: Each page table entry has a valid/invalid bit. A ‘valid’ bit indicates that the associated page is in the process’s logical address space. An ‘invalid’ bit indicates that the page is not in the process’s logical address space or is not currently in memory.
  • Page Table Length Register (PTLR): Hardware register that indicates the size of the page table, allowing for variable-length page tables and preventing access to invalid page numbers.

Advanced Page Table Structures

Describe hierarchical page tables, hashed page tables, and inverted page tables.

  • Hierarchical Page Tables: Break the page table into smaller pieces, often using a two-level or multi-level approach, to handle large logical address spaces efficiently.
  • Hashed Page Tables: Common for address spaces larger than 32 bits. Each element in the hash table contains:
    • The virtual page number (VPN).
    • The value of the mapped page frame.
    • A pointer to the next element in the hash chain (for collision resolution).
  • Inverted Page Tables: Have one entry for each real memory frame, rather than one entry per page. Each entry contains the virtual page number and the process ID of the page owner.
    • Benefit: Decreases memory needed to store page tables.
    • Drawback: Increases search time to find a page, which can be mitigated by using a hash table to speed up the search.

Standard Swapping and Swapping with Paging

Describe standard swapping and swapping with paging.

  • Standard Swapping: Involves moving an entire process between main memory and a backing store (e.g., a fast disk).
  • Swapping with Paging: Only a few pages of a process are moved in and out of main memory as needed, rather than the entire process. This is a core component of virtual memory.

Virtual Memory Concepts

Virtual Memory: Definition and Benefits

What is virtual memory, and what are its benefits?

Virtual memory is a memory management technique that allows the execution of processes that are not entirely in memory. It separates the user’s logical memory from physical memory.

Benefits:

  • Allows programs to be larger than physical memory.
  • Less physical memory is needed while a program is running.
  • Less I/O is required to load or swap programs into memory.
  • Programs are no longer constrained by physical memory limits.
  • Enables shared address spaces between processes.
  • Facilitates more efficient process creation.

Demand Paging: Concepts and Requirements

What is demand paging? What are its benefits, what hardware support is required, and what is pure demand paging?

Demand paging is a virtual memory technique where pages are loaded into memory only when they are needed (on demand), rather than loading an entire process at once.

Benefits:

  • Less I/O is required.
  • Less physical memory is needed.
  • Faster response times for users.
  • More users can be supported.
  • It is a fundamental building block of virtual memory systems.

Hardware Support Required:

  • A page table with a valid/invalid bit.
  • Secondary memory (with swap space).
  • The ability to restart instructions after a page fault.

Pure Demand Paging: A scheme where a process starts with no pages in memory; all pages are brought in only when referenced.

Page Fault Handling Process

What is a page fault, and how is it handled step-by-step?

A page fault occurs when a process tries to access a page that is not currently in physical memory.

Steps for handling a page fault:

  1. The operating system checks the internal table for the process to determine if the reference was valid or invalid. If invalid, the process is aborted. If valid but the page is not in memory, the OS proceeds to step 2.
  2. A free frame is found (e.g., from a list of free frames).
  3. The desired page is swapped into the free frame via a scheduled disk operation.
  4. The page tables are reset to indicate that the page is now in memory, and the valid bit is set.
  5. The instruction that caused the page fault is restarted.

Maintaining a List of Free Frames

Why does an operating system maintain a list of free frames?

An OS maintains a list of free frames to quickly allocate physical memory when a page fault occurs and a new page needs to be brought into memory. This often involves zero-fill-on-demand, where the new frame is zeroed out before being given to a process to ensure data privacy and security.

Effective Access Time (EAT) in Demand Paging

Calculate the effective access time (EAT) of a demand paging system, given the page-fault rate.
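
The EAT formula weights the ordinary memory access time against the (much larger) page-fault service time; a sketch with illustrative numbers:

```python
def eat(fault_rate, mem_time_ns, fault_service_ns):
    """EAT = (1 - p) * memory access time + p * page-fault service time."""
    return (1 - fault_rate) * mem_time_ns + fault_rate * fault_service_ns

# 200 ns memory access, 8 ms (8,000,000 ns) page-fault service,
# one fault per 1,000 accesses
print(eat(0.001, 200, 8_000_000))  # -> approx. 8199.8 ns
```

Even a fault rate of one in a thousand slows memory access by a factor of about 40, which is why keeping the fault rate extremely low is essential for demand paging.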

Swap Space Utilization by Operating Systems

What is swap space, and how is it used by different operating systems to gain performance benefits?

Swap space is a portion of secondary memory (disk) used to temporarily hold pages that have been swapped out of main memory.

Performance Benefits: I/O to swap space is generally faster than to the file system because swap space is allocated in larger, contiguous chunks.

Usage by Different OS:

  • Old BSD Unix: Copied the entire process image to swap space at process load time, then paged in and out of swap space.
  • Linux, Windows: Pages are initially loaded from the file system. Pages that are later swapped out are written to swap space.
  • Solaris, Newer BSD: Demand page in from the binary on disk. When freeing a frame, discard the page instead of paging it out if it hasn’t been modified.

Copy-on-Write (CoW)

What is copy-on-write?

Copy-on-write (CoW) is a technique that allows parent and child processes to initially share the same pages in memory. When either process modifies a shared page, a copy of that page is made, and the modification is applied to the private copy. This optimizes process creation by avoiding unnecessary copying of entire address spaces.

The Role of the Per-Page Modify Bit

How does the per-page modify bit help improve the demand paging system?

Each page table entry typically includes a modify bit (or dirty bit). This bit is set by the hardware whenever any byte in the page is written to. When a page is selected for replacement, the modify bit is checked:

  • If the modify bit is not set, the page has not been changed since it was read from disk, so it does not need to be written back to disk.
  • If the modify bit is set, the page is "dirty" and must be written back to swap space before the frame can be reused.

This significantly improves performance by reducing unnecessary disk writes.

Page Replacement Algorithms: FIFO, OPT, LRU

Given a reference string and the number of available frames, use the First-In, First-Out (FIFO), Optimal (OPT), and Least Recently Used (LRU) algorithms to find the page fault rates.
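
FIFO and LRU fault counting can be sketched directly (OPT requires looking ahead in the reference string and is omitted here); the reference string is the standard 20-entry example:

```python
from collections import OrderedDict, deque

def fifo_faults(refs, frames):
    """Count page faults under FIFO replacement."""
    memory, faults = deque(), 0
    for page in refs:
        if page not in memory:
            faults += 1
            if len(memory) == frames:
                memory.popleft()            # evict the oldest page
            memory.append(page)
    return faults

def lru_faults(refs, frames):
    """Count page faults under LRU replacement."""
    memory, faults = OrderedDict(), 0
    for page in refs:
        if page in memory:
            memory.move_to_end(page)        # mark as most recently used
        else:
            faults += 1
            if len(memory) == frames:
                memory.popitem(last=False)  # evict least recently used
            memory[page] = True
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(fifo_faults(refs, 3), lru_faults(refs, 3))  # -> 15 12
```

With 3 frames this reference string produces 15 faults under FIFO and 12 under LRU (the optimal algorithm achieves 9), matching the usual textbook comparison.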

Belady’s Anomaly

What is Belady’s Anomaly?

Belady’s Anomaly is a phenomenon observed in some page replacement algorithms (notably FIFO) where increasing the number of available memory frames can actually lead to an increase in the number of page faults for a given reference string.

LRU Algorithm Implementations

Describe the counter implementation and the stack implementation of the Least Recently Used (LRU) algorithm.

  • Counter Implementation: Each page table entry has a time-of-use field. A logical clock or counter is incremented with each memory reference. When a page is referenced, its time-of-use field is updated with the current counter value. To find the LRU page, the system chooses the page with the smallest time-of-use value.
  • Stack Implementation: A stack of page numbers is maintained, typically as a doubly linked list with head and tail pointers. Whenever a page is referenced, it is removed from its current position and moved to the top of the stack, which requires changing up to six pointers. The page at the bottom of the stack is the LRU page, so no search is needed at replacement time.