Operating System Fundamentals: Structure, Processes, and File Systems
W1.1 Operating Systems Fundamentals
W1.1.1 OS Introduction
W1.1.2 Types of Operating Systems
Examples of Operating Systems include: Android, Linux, FreeBSD, MacOS, MS-DOS, Windows, uCOS, VxWorks. These systems are suited to small, medium, or large systems, exhibiting different characteristics regarding:
- Responsiveness
- Efficiency
- Flexibility
- Security
- Reliability
W1.1.2 Different Kinds of OS
Operating systems are categorized based on their usage:
- Single-user systems: Execute one program at a time for one user.
- Multi-user systems: Provide multiple virtual machines for multiple users.
- Embedded systems: Control a particular process or piece of equipment (e.g., a washing machine). These are typically single-user systems.
- Clusters: Used for cloud-based computing and large-scale information systems infrastructure. They require OS-level support for running programs across multiple CPUs to support multiprocessing.
W1.1.3 OS Functions: Program Execution
The process of executing a program involves several steps:
- The program executable must be stored somewhere, usually in secondary memory (non-volatile storage like a hard disk or flash memory).
- To run, the executable is moved to RAM.
- Programs are executed by the CPU, utilizing internal resources.
- Programs may need access to other resources like the screen, mouse, or keyboard.
When multitasking, the OS scheduler rapidly switches between executing programs.
W1.1.3 What OS Does: Resource Management and Virtualization
The OS performs two primary roles:
- Resource Manager: Shares hardware and peripherals between programs and users.
- Internal Resources: CPU time, memory storage, network access, interrupts, storage (HDD/SSD).
- External Resources: Displays, sound input/output, keyboard, mouse/touchscreen, connected USB storage.
W1.1.4 OS Structure
The most common OS structures are:
- Monolithic: One large program, one big blob.
- Layered.
- Microkernel: In a client-server OS, the basic system (kernel) is small, providing only basic services (module communication). Other functionality resides in modules handling specific aspects.
In a microkernel architecture:
Client Program → File System → Hard Disk Driver → Microkernel (providing communications channel)
W1.1.5 OS Features
Key features of an Operating System include:
- Support for concurrency
- Sharing mechanism(s)
- Persistent storage
- Nondeterminacy
- Efficiency
- Reliability
- Resilience
- Maintainability
- Small size
W1.2 Processes and Concurrency
The OS manages a set of activities, some for the system itself and some for users, known as processes. Any computer system contains one or more processors (CPU) on which processes run. Multi-core machines have several CPUs, each potentially containing multiple processors.
Concurrency, processes, and processors require communication and synchronization.
Kernel Definition
The kernel is the core layer of a layered system or the major part of a monolithic kernel. It contains the most critical code; errors here will likely crash the system. It should be as small and simple as possible and must validate all requests.
Kernel functions include process switching and control, task communication and synchronization, and hardware interfacing. Some kernels split into ‘upper’ and ‘lower’ parts; the upper part is generally consistent, while the lower part changes with different hardware.
W1.3 Operating System Kernel Components
- First Level Interrupt Handler (FLIH): The OS instructs hardware to jump to the FLIH when an interrupt occurs. There may be several FLIHs if interrupts are ‘vectored’.
- Communication Primitives: Usually includes semaphores (for synchronization) and message/data passing.
- Low Level Scheduler: Performs process start-up and switching. Start-up is a special case involving: (1) allocating and initializing a Process Control Block (PCB), and (2) adding the PCB to the list of runnable processes and updating system tables.
Process Switching
Process switching involves:
- Choosing the next process to run.
- If it is not the current process: saving the context of the current process to its PCB, loading the new process context from its PCB, and passing control to the new process.
Processes usually have an associated priority governed by factors like importance (system process), base priority, previous resource usage, performance, and held resources. The Process Control Block (PCB), or process descriptor, is a crucial data structure, with one PCB existing for each process.
W2.5 Page Fault Handling
A page fault occurs if a process attempts to read from or write to an address that is:
- In a page not currently in a page frame (indicated by the V bit in the page table entry not being set).
- To a page outside the bounds of the page table.
The hardware generates an interrupt, which the OS handles. A page fault is not an error.
How the OS Responds to a Page Fault
If the page number is out of range, it usually indicates an error (page number beyond VM size), though it might be expandable memory (like a stack).
If the page is legal but never referenced (new/unused), the OS creates an empty page and resumes the process.
If the page is not in a page frame and not in the page file, it might be a program error (hole/reserved page). If the page is in the page file, this triggers a page-in operation.
Page-In Operation Steps
- Locate a free page frame. (The OS may need to retire another page if none are free.)
- Read the page in from the page file to the page frame. (The OS can allow another process to run while waiting for this slow operation.)
- Update the Page Table Entry (PTE) by: marking the page as valid (setting ‘V’ bit), inserting the page frame number, clearing the ‘modified’ flag, and performing other housekeeping.
- Resume the process where it left off.
Address translation is not instantaneous. To speed this up, computers use a fast cache called the Translation Lookaside Buffer (TLB) to store recently used PTEs, leveraging principles of spatial and temporal locality.
Page Fetch and Retirement Strategies
(a) Fetch Strategy: Controls when a page is fetched:
- Demand fetch: Fetch the page when a page fault occurs (normal).
- Anticipatory fetch (pre-paging): The OS analyzes behavior and uses spare CPU time to fetch pages it predicts will be needed soon.
(b) Retirement Strategy: Controls which page is retired to make way for a new page:
- Random retirement: Simple selection.
- Least-recently retirement: If a page wasn’t used recently, it is less likely to be used soon (temporal locality principle).
W3 File System
Storage Media Types
File storage media include:
- Magnetic disc (Hard Disk Drives – HDD)
- Magnetic tape (reel or cartridge)
- Optical disc (CD, DVD, Blu-Ray)
- Flash memory (Solid State Drives – SSD, USB sticks)
File Operations
- Open: Given a name/path, this creates the file if it doesn’t exist or opens it for reading, writing, or appending. The OS creates a memory buffer, locates the file, checks permissions, and returns a unique file descriptor.
- Read: Using a valid file descriptor, data is read from the current file position in equal-sized chunks into the memory buffer. The file position is updated.
- Write: The reverse of read. Data is written using a memory buffer (which must be flushed to disk when full) and uses a file pointer.
- Close: Flushes all write buffers to disk, deallocates buffer memory, updates file information (timestamps, size), and invalidates the file descriptor.
- Seek (and rewind): Alters the file pointer position forwards or backwards from its current value using the file descriptor.
File Organization
Contiguous Organization
Files are stored as consecutively numbered blocks. Retrieval information includes the starting block number and the number of blocks used.
File Map Organization (e.g., FAT)
This organization is more flexible; files can be stored in any available blocks. A map (like the File Allocation Table – FAT in older systems) tracks block usage and links to the next block.
Advantages:
- Fragmentation is manageable (performance hit from widely-separated blocks).
- Easy to alter file size, insert, or delete data.
- Not too slow for program loading.
Disadvantages:
- Wasted space storing the map (e.g., 4 bytes per block).
- Requires reading through the map to find the last block (mitigated by storing the ending block number).
- Not very good for random access.
- Requires extra disk access to read the file map.
Directories
Directories hold file names and retrieval information, plus:
- Permissions and access control settings.
- Owner and group information.
- Timestamps (creation, last update, last access, backup).
- File type and size (number of blocks).
Directory Hierarchies
- UNIX: Everything starts with the root directory (/). Different storage devices can be mounted into this hierarchy. UNIX filenames are always Case Sensitive.
- DOS/Windows: Usually not Case Sensitive. Each drive letter has a separate hierarchy. The path separator is the opposite of UNIX.
File Permissions
Permissions protect files from unauthorized access. Every file has an owner (user or administrator).
Permissions control access operations:
- Reading
- Writing/appending
- Deleting
- Executing/running (for programs)
In UNIX, directories are treated like files; ‘read’ permission is needed to enter and view directory contents.
Backup Systems
A backup system restores file system contents to a previous state, including directory structure, file names, content, and metadata.
- Complete (Full) Backup: Everything is replicated. Uses much storage but recovery is easy.
- Differential Backup: Stores only files that changed since the last full backup.
- Incremental Backup: Stores only files that changed since the last incremental backup. Less data stored, but recovery requires reading all subsequent backups since the last full backup.
W4 Connectivity
Data Communication Links
Links can be classified by directionality:
- Simplex: Unidirectional.
- Half duplex: Bidirectional, taking turns.
- Full duplex: Bidirectional simultaneously.
Data flow can be continuous or packetized.
Circuit Switching
Similar to an old telephone call, a physical end-to-end link is established before communication begins and relinquished afterward. All segments of the link must be reserved exclusively for that connection.
Packetization
Data is transmitted in packets (data + control info) one by one. Packets require:
- A destination address.
- A unique sequence identifier (as packets may arrive out-of-order due to varying routes/speeds).
- A source address (if multiple senders exist).
- Error coding to detect damage or loss, allowing the sender to re-transmit.
A packet typically contains: {data + source & dest address + seq. ID + error coding}.
W4.5.1 OSI Layers
The OSI model defines several layers:
- Physical: Transmits raw bits across the medium (wireless, cable).
- Data Link: Combines bits into data blocks for transmission and manages flow to the correct destination on the local link.
- Network: Routes data blocks across a network of individual data links.
- Transport: Ensures all data arrives, is reassembled in sequence, and is passed to the correct application.
- Session, Presentation, Application: Application programs communicate using protocols ensuring mutual understanding.
W4.6 The Internet Protocol (IP) Model
The Internet model typically uses 5 layers:
- Physical layer: Ensures bits are communicated between nodes.
- Network (Interface layer): Collects bits into larger units (packets) for transmission, hiding network differences.
- Internet layer: Uses the Internet Protocol for end-to-end packet transmission (acknowledgment is not guaranteed).
- Transport layer: Splits data into packets and reassembles them at the destination, using different protocols.
- Application layer: Uses higher-layer protocols like NFS, FTP, SMTP, POP3, etc.
W4.6.1 The Transport Layer Protocols
The two most common transport protocols are:
- Transmission Control Protocol (TCP): Guarantees end-to-end, full duplex, acknowledged transmission sessions with correct sequence delivery. It is reliable but relatively inefficient and can cause delays.
- User Datagram Protocol (UDP): A best-effort, sessionless, unacknowledged, streaming transmission. It offers no guarantee of packet arrival or sequence but is much faster and more efficient for streaming.
