Operating Systems & Memory Management: A Comprehensive Guide

Exam Cheat Sheet

Memory

Stack & Heap

C programs use the stack for local variables, function parameters, and return addresses.

C programs use the heap for explicitly requested dynamically allocated data (with malloc()).

In general, data is put on the stack unless we use malloc(), then it’s on the heap.

Malloc & Free

Used in C as a type of manual memory management.

Always free malloc()! Else, lose points.

Modern languages like Java, Erlang, etc. use Automatic Memory Management. The system allocates data structures on the heap and uses garbage collection to prevent it from getting full. Garbage collection identifies structures that are no longer needed and reuses that space in continued execution.

Buddy Algorithm with Exponents of 2

  • Coalesce and split blocks of a specified size.
  • Can store free blocks in a linked list or start from the beginning of memory and traverse memory.
  • About 25% internal fragmentation due to not using whole blocks of data.

Free Memory Algorithms

  • Best Fit – First, search through the free list and find chunks of memory that are as big or bigger than the requested size. Then return the smallest in that group. Reduces wasted space; however, it pays in performance by searching through the entire block list every time.
  • Worst Fit – Find the largest chunk and return the requested amount of space within that chunk, then keep the remaining space from the chunk on the free list. Attempts to leave big chunks free instead of fragmentation. However, it needs to go through a full search of the free list.
  • First Fit – Finds the first block that is big enough and returns the requested amount to the user. The remaining free space is kept free in the free list. Has the advantage of speed. But can pollute the beginning of the free list with small objects. Using address-based ordering keeps the list ordered by address, making coalescing easier.
  • Next Fit – Same as first fit but keeps a pointer within the list it was looking at last. Spread the searches for free space throughout the list more uniformly. Avoids splintering at the beginning of the list.
  • Least Recently Used
  • Segregated Lists – Keep a list to manage objects of specific sizes that a particular program frequently requests.

Memory Paging

Paging as opposed to segmentation of memory into variable-sized pieces.

Advantages

  • Most important is flexibility; with a fully developed paging approach, the system will be able to support the abstraction of an address space effectively, regardless of how a process uses the address space.
  • No fragmentation of memory into various sizes, which would make allocation more challenging.
  • Simplicity of free space management that paging allows.

Disadvantages

  • Page tables can get very huge, so a large amount of memory would be dedicated just to translations between virtual and physical memory.
  • It can be very slow, as for every memory reference, paging requires us to perform one extra memory reference to fetch the translation from the page table.

A virtual memory page is mapped to a frame in physical memory.

To record where each virtual page of the address space is placed in physical memory, a per-process data structure called a page table is used.

Page tables are used to store address translations for each virtual page to frame for a process.

A page table entry holds the information for each individual translation between virtual and physical pages.

The virtual address in a memory paging system contains the virtual page number and offset.

The VPN refers to which page our data is in, and the offset refers to the specific byte the data starts at.

The offset is the same once translated into a physical memory address, while the VPN gets translated with a lookup to the page table.

Linear Page Table

An array storing PTEs in a linear format. The OS indexes the array by the virtual page number and looks up the PTE at that index to find the physical frame number (PFN).

Each PTE contains:

  • Valid Bit – Used to indicate whether the particular translation is valid. Some parts of the memory will be marked invalid, and if a process attempts to access them, the OS will likely shut it down.
  • Protection Bits – Indicating whether the page could be read from, written to, or executed from.
  • Present Bit – Indicates whether this page is in physical memory or on the disk.
  • Dirty Bit – Indicates whether the page has been modified since it was brought into memory.
  • Reference Bit – Used to track how many times a page has been accessed and how popular it is, therefore whether to keep it in memory.

Translation-Lookaside Buffer (TLB)

A hardware cache of popular virtual to physical address translations, which is part of the memory management unit (MMU) chip. Upon each virtual memory reference, the hardware first checks the TLB to see if the desired translation is present. This avoids consulting the page table.

Increasing the page size reduces the size of the page table as it now holds fewer entries for the same amount of referenced memory. The issue is more internal fragmentation due to wasted space.

Multi-Level Page Tables

Tree structure of page directories instead of a linear page table. Saves and gets rid of all those invalid regions in the page table instead of keeping them in memory.

Inverted Page Table

Instead of having many page tables, we keep a single page table that has an entry for each physical page table of the system. The entry tells us which process is using this page and which virtual page of that process maps to this physical page.

C Commands

  • dup2()
  • execl()
  • opendir()
  • readdir()
  • open() – Makes a file if there isn’t one and adds it to the FDT.

Unix Commands

  • wc – Word and byte count for a file.
  • ln – Create a link to a file.
  • head – Output the first part of a file.
  • tail – Output the last part of a file.
  • diff – Compare files line by line.
  • sort – Sort lines of text files, e.g., by ascending numerical value.
  • sed – Stream editor. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline).
  • tr – Translate or delete characters. Copies the standard input to the standard output with substitution or deletion of selected characters.

Threads

Threads of a process share its executable code and the values of its dynamically allocated variables and non-thread-local global variables at any given time.

Communication

  • Pipes – Connected processes where the stdout of one process feeds directly into the stdin of the next.
  • Socket – Data communications endpoint for exchanging data across two points on the same OS.
Types of Sockets

(Image not included)

Scheduling

We use different metrics to measure scheduling:

  • Performance
  • Fairness
  • Response Time – The time from when the job arrives in a system to the first time it is scheduled. Relates to user interactivity.

Other Definitions

  • Turnaround Time – The time at which the job completes minus the time at which the job arrived in the system.
  • Time Slice – Piece of a job that has been split up to be run using the RR technique.

Scheduling Algorithms

  • FIFO
    • Most basic algorithm, aka First Come, First Served.
    • Advantages: Simple and easy to implement.
    • Disadvantages: Convey effect – a number of relatively short jobs get queued behind a heavyweight job, which increases the average turnaround time for all.
  • Shortest Job First (SJF)
    • Runs the shortest job, then the next shortest, and so on.
    • Advantages: Decreases average turnaround time.
    • Disadvantages: Convey effect can still happen if jobs arrive shortly after each other, e.g., if a heavyweight job arrives before a bunch of smaller jobs.
  • Shortest Time to Completion First (STCF)
    • Aka as preemptive shortest job first.
    • When a new job enters the system, the STCF scheduler determines which of the remaining jobs has the least time left and schedules that one. It does it by interrupting jobs and swapping contexts.
    • Advantages: Much-improved turnaround compared to SJF.
    • Disadvantages: Bad for response time and interactivity of the user; if three jobs arrive simultaneously, the third job has to wait for the previous two jobs to run in their entirety before being scheduled just once.
  • Round Robin (RR)
    • Instead of running jobs to completion, runs a job for a time slice, and then switches to the next job in the run queue.
    • The length of a time slice must be a multiple of a timer interrupt. The shorter the time slice, the better the response time; however, a too-small time slice results in a higher amount of switching contexts, which impacts performance.
    • Advantages: It is a very fair algorithm, as all jobs have similar resources spent on them. Lower response time.
    • Disadvantages: If turnaround time is our metric, RR is one of the worst algorithms, as it stretches out jobs as long as it can.
  • Fixed Priority
  • Multilevel Feedback Queue (MLFQ)
    • Addresses the fact that we cannot know in advance how long a program will run with the other algorithms.
    • MLFQ varies the priority of a job based on its observed behavior, so a job’s priority changes over time.
    • Here are the rules:
      • If priority(A) > priority(B), then run A.
      • If priority(A) == priority(B), then run RR with A & B.
      • When a job enters the system, it is placed at the highest priority.
      • Once a job uses up its time allotment at a given level, its priority is reduced (it moves down one queue).
      • After some period S, move all the jobs in the system to the topmost queue.
    • You need the 5th rule; otherwise, non-interactive processes will starve and will not complete in good time.
    • Advantages: Manages to both optimize turnaround time & makes a system that is responsive to its interactive user by reducing response time.
    • Disadvantages: Complicated?

File Systems and Storage

How an OS Interacts with a Device (I/O)

  • Interrupts – If a process is waiting for a device to return, instead of occasionally polling the device to see if it is done, which is CPU-consuming, the process can go to sleep while the CPU deals with another process, and the device in question will raise a hardware interrupt when done, which causes the OS to wake the process waiting for the I/O. The downside is they only really work for slow devices; otherwise, the cost of context switching and interrupts might outweigh the benefits.
  • Direct Memory Access (DMA) – When transferring large chunks of data, the CPU has to do a simple and easy task of taking data from memory and copying it to the I/O device. The CPU could be used for better tasks that require computation; therefore, a DMA is used. DMA is a specific device that orchestrates transfers between devices and main memory without much CPU intervention.
  • Disk Scheduling – I/O has a high cost, so the OS decides on the order of I/Os issued to the disk. The disk scheduler examines requests and decides which one to schedule next.
  • Shortest Seek Time First (SSTF) – Puts the request on the nearest track to complete first. Meaning the closest read to the current position of the head. The disadvantage is starvation; if there was a steady stream of requests to the near track, any requests to further tracks would be ignored.
  • Nearest Block First (NBF) – Same as SSTF but schedules the request with the nearest block instead of track?
  • Elevator / SCAN / CSCAN – The head sweeps across all tracks on the drive and services the requests as it sweeps over the correct area. The issue is it doesn’t adhere to SJF (shortest job first).
  • Shortest Positioning Time First (SPTF) – Solves the problem by approximating SJF by taking both seek and rotation of the disk into account. Takes the speed of seek (moving the head) and rotation (of the disk) into account when deciding which track & sector is the shortest time away.

File

Is an array of bytes. Each file in a directory has an inode number associated with it.

Directory

Collection of tuples, each of which contains a human name and a low-level name to which it maps. Directories also have an inode number.

File Descriptor Table

Each process has one that keeps track of resources used by that process. e.g., sockets, pipes, files, I/O resources. The first three are 0: STD_IN, 1: STD_OUT, 2: STD_ERR. The entry in the FDT contains the inode number and the current offset of the file.

Inode

Short for index node. Inside each inode is virtually all the information needed about a file: its type, its size, the number of blocks allocated to it, protection info like who owns and can access the file, time information (created, modified, accessed), if the data is on disk.

When you open a file in C, you get a file descriptor returned to you. The file descriptor is a per-process integer for each resource in the file descriptor table.

In the permissions column for files, the leftmost character is”” for normal files,”” for directories,”” for soft links.

Hard Link

Created using the ln command. Creates another name in a directory that refers to the same inode as the one you are linking it to. The file is not copied; you just have two human names that both point to the same file. Hard links are limited, can’t make one to a directory in fear of creating a cycle in the tree, and you can’t create one between disk partitions.

Soft Link / Symbolic Link

Use ln but with the -s flag. A soft link is an actual file with the path to the file it is linked to. It uses up space in the system. If you remove the original file the soft link points to, then the soft link simply points to a file that no longer exists.

Reference Count / Link Count

The number of times an inode number is referenced by file names in the system. If this number goes to 0, the data is”delete” as it is released into the system as free space.

Crash Consistency Problem

How to update a persistent data structure in the event of power losses and crashes.

Solutions to Crash Consistency Problem

  • File System Checker – Runs before the file system is mounted, does sanity checks to make sure there is a coherent structure to the inodes and superblocks, that nothing has been corrupted.
  • Journaling – When updating disks, log what you are about to do. In the event of a crash at reboot, you will check the logs and see if there was anything that needs to be continued that was interrupted.
  • Log-Based File System – A new way of updating the disk. Instead of overwriting files in places, LFS always writes to an unused portion of the disk and later reclaims that old space through cleaning.

Processes

When you fork a process, what are the sharing rules with the new process?

  • Stack: No
  • Heap: No
  • Global Memory: No
  • Code Area: No
  • Open Files: Yes

Time-sharing of the CPU allows users to run as many concurrent processes as they would like by promoting the illusion that there exist infinite virtual CPUs. The cost is performance, as each process runs more slowly if the CPU is shared.

Space sharing is where a resource is divided in space among those who wish to use it. E.g., disk space.

Process API

  • Create: OS must be able to create new processes.
  • Destroy: Forcefully be able to destroy a process.
  • Wait: It is useful to wait for a process to stop running.
  • Miscellaneous Control: E.g., suspend process.
  • Status: Get status information on the running of a process.

C programs use the stack for local variables, function parameters, and return addresses.

C programs use the heap for explicitly requested dynamically allocated data. Programs request this free space with malloc().

Steps for Running a Program Done by the OS

  • Loading the code and static data into memory.
  • Creating & initializing the stack.
  • Other work related to I/O setup.
  • Start the program running at the entry point, main().

A process has three states that it transitions between:

  • Running – Currently executing instructions.
  • Ready – Obvious.
  • Blocked – Not ready until some other event takes place.

A zombie process is a process that has completed execution but is still in the process table. Can occur when child processes are finished but are waiting for their parent to read their exit status. Once the exit status is read by the wait() system call, the child process is no longer a zombie.

Memory Mapping of a Running Process