System Programming Fundamentals: Linking, Memory, and Concurrency
Linking
- The linker is unaware of local variables that exist only within functions.
- Local variables with the same name as global variables follow their local scope.
Symbol Types in Linking
Global Symbols
- Symbols defined by module
m(e.g.,m.cpp) that can be referenced by other modules. - Includes non-static C functions and non-static global variables.
External Symbols
- Global symbols referenced by module
mbut defined by some other module.
Local Symbols
- Symbols defined and referenced exclusively by module
m. - Includes C functions and global variables defined with the
staticattribute. - Note: Local linker symbols are distinct from local program variables (stack variables).
Symbol Strength Rules
Weak Symbols
- Typically uninitialized global variables.
- The linker allows multiple definitions; if found, it randomly picks one.
Strong Symbols
- Functions and initialized global variables.
- The linker allows only one definition; an error occurs if zero or more than one definition is found.
- Weak symbols are replaced by strong symbols during linking if both exist.
Linker Algorithm for Resolving External References
- Scan
.ofiles (object files) and.afiles (archives) in the command-line order. - During the scan, maintain a list of current unresolved references.
- As each new object file (
.oor.a) is encountered, attempt to resolve each unresolved reference in the list against the symbols defined in that object file. - If any entries remain in the unresolved list at the end of the scan, an error is reported.
General Symbol Resolution Method (2012.7)
Precedence: static > Strong Global > Weak Global
- Look for any local changes to the variable in the targeted function.
- Look for any
staticdeclarations in the targeted file. - Trace for the latest change to the target variable as normal (global scope resolution).
- Section Mapping: Functions map to
.text|| Global variables map to.data|| Constants map to.rodata.
Process and Signal Management
fork(): The parent process returns the child’s PID; the child process returns 0.kill(pid, sig): Sends signalsigto processpid(pid=0means self;pid=-1means all processes the user can affect).killreturns after the signal is successfully sent.- A signal sent to the calling process (self) must be handled before
killreturns.
wait(): Waits for one child process to terminate.wait(&status)orwaitpid(): Waits for one child to change state or terminate, depending on arguments.- If a process has no child processes,
waitimmediately returns-1. WNOHANG: If specified children exist but have not changed state, the function returns immediately with0.
- If a process has no child processes,
- Global Variables: Global variables are logically separate between child and parent processes immediately after
fork(), but they utilize Copy-on-Write (CoW) mechanism, meaning the physical memory pages are shared until either process modifies them.
Input/Output (I/O) and File Descriptors
I/O Data Structures
- File Descriptor Table: Stores all file descriptors (FDs) that point to entries in the Open File Table.
- One table per process.
- Open File Table (OFT): Stores the reference count (
refcnt) and the current file position (filepos).- One table shared across all processes (system-wide).
- V-Node Table (Inode Table): Stores metadata about the file (e.g., file type, access permissions, size).
- One table shared across all processes (system-wide).
I/O Operations
open(): Different calls toopen()on the same file return independent instances (separate Open File Table entries), sofileposis not shared.fork()Behavior: A child process created viafork()shares the exact same Open File Table entries (includingfilepos) with the parent.dup2(oldfd, newfd): Copies the per-process File Descriptor Table entryoldfdto entrynewfd.newfdnow points to the same Open File Table entry asoldfd.
read()&write(): These operations dynamically change the file content and position via the file descriptors pointing to the same Open File Table entry.- Reading and writing start at the current
filepos. - If End-of-File (EOF) is encountered immediately,
read()returns 0 (not an error). The buffer content remains unchanged.
- Reading and writing start at the current
- Impact of
fork(): Changes made to the File Descriptor Table (e.g., closing an FD) in one process will not affect others. However, changes to the Open File Table (which storesfilepos) will affect all processes sharing that entry.
Virtual Memory (VM)
- Virtual memory space is typically larger than physical memory (DRAM).
- Programs exhibiting better locality of reference will have smaller working sets, improving performance.
- A virtual page can be mapped to different physical pages at different times.
- The
.text(code) and.data(initialized global variables) sections are copied, page by page, on demand by the VM system (demand paging).
Page Table Structure
The Page Table is a per-process data structure located in DRAM.
- Page Table Entry (PTE) Stores:
- Valid Bit:
0indicates a Page Fault (page not in DRAM);1indicates the page is present in DRAM. - Physical Page Number (PPN): Used to locate the page frame in physical memory.
- Valid Bit:
- Page Fault Handling: Occurs when the required physical memory is not currently in DRAM (i.e., it must be retrieved from disk).
- The OS selects a page frame in DRAM to evict (if necessary).
- The required page is loaded from disk into the frame.
- The Valid Bit in the corresponding PTE is set to
1. - The CPU reruns the instruction that caused the fault.
Address Spaces
Virtual Address Space (VAS)
- Size is $2^n$ bytes.
- Page size is $2^p$ bytes per page.
- $n$ bits total ($n \ge m$, where $m$ is physical address size).
- Virtual Page Number (VPN): Bits $[p, n-1]$.
- Virtual Page Offset (VPO): Bits $[0, p-1]$.
Physical Address Space (PAS)
- Size is $2^m$ bytes.
- Stored in DRAM.
- $m$ bits total ($m \le n$).
- Physical Page Number (PPN): Bits $[p, m-1]$.
- Physical Page Offset (PPO): Bits $[0, p-1]$.
Translation Lookaside Buffer (TLB)
The TLB is a small, fast cache for Page Table Entries (PTEs).
- TLB Block Structure: Valid bit (v) + Tag + PTE.
- The Virtual Page Number (VPN) is split into the TLB Tag (T) and TLB Index (I): $VPN = T + I$.
- The TLB is typically $E$-way set associative with $2^s$ sets, totaling $E \times 2^s$ entries.
- TLB Index ($s$ bits): Selects the set to search within the TLB.
- TLB Tag ($t$ bits): Used to match the specific TLB entry within the selected set.
Address Translation Process
VA $\rightarrow$ TLBI $\rightarrow$ TLBT $\rightarrow$ PTE $\rightarrow$ PA
- The CPU sends the Virtual Address (VA) to the Memory Management Unit (MMU).
- Note: The Virtual Page Offset (VPO) is always equal to the Physical Page Offset (PPO).
Dynamic Memory Allocation (Malloc)
- On x64 systems, memory allocations are typically double word (16 bytes) aligned.
- Internal Fragmentation: Wasted space caused by overhead (headers, footers) within allocated blocks, or padding to meet alignment requirements.
- External Fragmentation: Wasted space caused by free blocks that are too small or non-contiguous to satisfy a requested allocation.
Heap Management Policies
Implicit List
- Each block typically has a header and footer, often 1 word (8 bytes) each.
- Smallest possible block size is often 32 bytes (due to alignment and overhead).
Explicit List
- Each block has a header and footer (1 word each).
- Free blocks contain two pointers (e.g.,
prevandnext) to link them in a free list. - Smallest block size is typically 32 bytes for both allocated and free blocks.
Segregated List (SegList)
- Free blocks are segregated by size into different lists.
- Offers improved performance characteristics compared to simple implicit or explicit lists.
Allocation Strategies
- First Fit: Generally offers better performance (faster search time).
- Best Fit: Generally offers better memory utilization (less fragmentation).
- Extending the Heap: The heap can be extended, often conceptually adding space “in the middle” of the free block structure or at the end of the heap region.
Concurrency and Threads
- Threads exist within a process; termination of the process kills all associated threads.
- Deadlocks: Caused when multiple threads require different locks that are held by others in a circular dependency.
- Deadlock prevention often involves ensuring a strict ordering of lock acquisition (e.g., mutexes must not overlap or must follow a hierarchical subset rule).
Process vs. Thread Comparison
Process Characteristics
- Creation (
fork()) involves system calls and is relatively slow. - The OS treats processes as independent entities.
- Processes have separate copies of data, file descriptors (initially shared, but independent tables), and code segments (CoW).
- Context switching between processes is slower.
- Each process has its own stack pointer and its own set of pending and blocked signals.
Thread Characteristics (POSIX)
- Creation (
pthread_create) typically involves kernel interaction. - All threads of the same program are treated as a single task by the OS scheduler.
- Threads share the code and global data segments. Each thread has its own “local” stack (unprotected from other threads in the same process).
- Context switching between threads is faster than between processes.
- Threads share signal handlers across the whole process but maintain their own stack pointer and blocked signal mask.
POSIX Threads (Pthreads)
pthread_create(&threadid, &attr, func, (void*)argv): Creates a new thread.pthread_join(threadid, &returnval): Waits for the specified thread to terminate and retrieves its return value.- Detached State: A thread is detached if it is marked as not needing to be reaped by
pthread_joinupon termination. - Joinable State: Threads are joinable by default, requiring
pthread_jointo clean up resources.
Synchronization: Semaphores
Semaphores are used for controlling access to shared resources.
sem_t mutex;
sem_init(&mutex, 0, 1); // Initialize as a binary semaphore (mutex)
- P (Wait/Decrement) Operation:
[ while (mutex == 0) wait(); mutex--; ] - V (Signal/Increment) Operation:
[ mutex++; ]
Programming Trivia and Undefined Behavior
- Shift Operators: Using left or right shift operators on negative numbers results in undefined behavior in C/C++.
- Shifting an $N$-bit integer by $N$ or more bits is also undefined behavior.
retInstruction (x64): Executes a pop operation, causing the stack pointer (%rsp) to increase (moving toward the top of the stack). The base pointer (%rbp) remains unchanged byretitself.- Accessing an uninitialized variable results in undefined behavior.
- Signed integer overflow should be avoided as it results in undefined behavior.
- Memory Layout: The stack typically starts high and grows downward (towards lower addresses). The heap starts low and grows upward (towards higher addresses).
- Memory Speed Hierarchy: DRAM is approximately 10x slower than SRAM, but 10,000x faster than disk storage.
Reference Table (Hex, Binary, Octal, Decimal Power)
| Hex | Binary | Octal | Decimal/Power of 2 |
|---|---|---|---|
| 0 | 0000 | 00 | 1 ($2^0$) |
| 1 | 0001 | 01 | 2 ($2^1$) |
| 2 | 0010 | 02 | 4 ($2^2$) |
| 3 | 0011 | 03 | 8 ($2^3$) |
| 4 | 0100 | 04 | 16 ($2^4$) |
| 5 | 0101 | 05 | 32 ($2^5$) |
| 6 | 0110 | 06 | 64 ($2^6$) |
| 7 | 0111 | 07 | 128 ($2^7$) |
| 8 | 1000 | 08 | 256 ($2^8$) |
| 9 | 1001 | 09 | 512 ($2^9$) |
| A | 1010 | 10 | 1024 ($2^{10}$) |
| B | 1011 | 11 | 2048 ($2^{11}$) |
| C | 1100 | 12 | 4096 ($2^{12}$) |
| D | 1101 | 13 | 8M ($2^{23}$) |
| E | 1110 | 14 | 16M ($2^{24}$) |
| F | 1111 | 15 | 32M ($2^{25}$) |
Note: The decimal values for D, E, and F represent powers of 2 in the megabyte range, likely $2^{23}, 2^{24}, 2^{25}$ respectively.
