Linux System Fundamentals: Files, Processes, and Hierarchy

Linux File System Architecture

The Linux file system is a hierarchical structure that organizes data on storage devices. Unlike Windows, which uses drive letters (C:, D:), Linux uses a single, unified structure starting from the root directory (/).

1. Files, Inodes, and Structure

A. Linux Files: Everything is a File

In Linux, nearly every resource is presented as a file, so the same operations (open, read, write, close) work on very different objects. The main file types are:

  • Regular Files: Text files, executable programs, images, documents.
  • Directories: Special files that contain lists of other files and directories.
  • Device Files: Interfaces to hardware (e.g., /dev/sda for a hard drive, /dev/tty1 for a console).
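  • Links: Additional references to existing files. A hard link is another directory entry for the same inode; a symbolic (soft) link is a small file that stores the path of its target.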
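
As a rough illustration of the file types listed above, the following Python sketch classifies a few paths by the type bits stored in their metadata. The helper name file_kind and the temporary file names are made up for this example, and /dev/null is only inspected if it exists on the machine.

```python
import os
import stat
import tempfile

def file_kind(path):
    """Classify a path by its file type, without following symbolic links."""
    mode = os.lstat(path).st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"

with tempfile.TemporaryDirectory() as tmp:
    regular = os.path.join(tmp, "notes.txt")    # hypothetical file name
    link = os.path.join(tmp, "notes-link")      # hypothetical link name
    with open(regular, "w") as f:
        f.write("hello\n")
    os.symlink(regular, link)

    results = [file_kind(tmp), file_kind(regular), file_kind(link)]
    for path in (tmp, regular, link, "/dev/null"):
        if os.path.lexists(path):
            print(path, "->", file_kind(path))
```

The same lstat() call works for every path, which is the practical payoff of the "everything is a file" model.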

B. Inodes and File Structure

The core concept that defines a file in Linux is the inode (index node).

  • Inode Definition: A data structure on disk that stores everything about a file except its name and its actual data.
  • Inode Contents (Metadata): The file’s type, size, permissions (read/write/execute), owner ID, group ID, hard-link count, timestamps (last access, last modification, and last inode change; some file systems such as ext4 also record a creation time), and, crucially, pointers (addresses) to the disk blocks where the file’s data is stored.
  • File Name Mapping: The file name is stored in a directory entry, not in the inode. Each directory entry simply maps a human-readable name to an inode number; the full path is the chain of such entries leading down from the root directory.
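
The name-to-inode mapping can be observed from user space. The sketch below, which uses made-up file names inside a temporary directory, creates a file, a hard link, and a symbolic link, then compares inode numbers. It assumes the temporary directory lives on a file system that supports both link types.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "report.txt")        # hypothetical names
    hard = os.path.join(tmp, "report-hardlink.txt")
    soft = os.path.join(tmp, "report-symlink.txt")

    with open(original, "w") as f:
        f.write("quarterly numbers\n")
    os.link(original, hard)      # second directory entry for the same inode
    os.symlink(original, soft)   # separate file that stores the target path

    info = os.stat(original)
    same_inode = info.st_ino == os.stat(hard).st_ino
    link_count = info.st_nlink
    symlink_has_own_inode = os.lstat(soft).st_ino != info.st_ino

    print("inode number:", info.st_ino)
    print("hard link shares inode:", same_inode)
    print("link count:", link_count)
    print("symlink has its own inode:", symlink_has_own_inode)
```

Two names pointing at one inode is why deleting one hard link leaves the data intact: the blocks are released only when the link count reaches zero and no process still has the file open.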

2. File System Components

A traditional Unix-style file system on a partition consists of four main areas (modern file systems such as ext4 refine this layout, but the same roles remain):

  • Boot Block: Contains the bootloader and instructions needed to start the operating system.
  • Superblock: Contains vital information about the entire file system, such as total size, number of free inodes, number of free data blocks, and the location of the inode table. It is crucial for mounting the file system.
  • Inode Table (or Inode List): The section where all the inodes reside. Each file or directory on the partition has a unique entry (inode number) here.
  • Data Blocks: This is where the actual file content is stored. The addresses of these blocks are listed in the file’s corresponding inode.
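
The file-system-wide counters described for the superblock (total and free blocks, total and free inodes) are reported to programs through the statvfs() call. This is a minimal sketch; the function name filesystem_summary is an assumption of this example, and some file systems report 0 for the inode counts.

```python
import os

def filesystem_summary(path="/"):
    """Return block and inode counters for the file system containing path."""
    st = os.statvfs(path)
    return {
        "block_size": st.f_frsize,     # size of one block, in bytes
        "total_blocks": st.f_blocks,
        "free_blocks": st.f_bfree,
        "total_inodes": st.f_files,
        "free_inodes": st.f_ffree,
    }

summary = filesystem_summary("/")
for key, value in summary.items():
    print(f"{key:>13}: {value}")
```

A file system can run out of inodes while free data blocks remain, which is why both counters are worth checking when file creation fails.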

3. Standard Linux File System Hierarchy (FHS)

The Filesystem Hierarchy Standard (FHS) defines the standard directory structure in Linux. Everything branches out from the root directory (/).

| Directory | Purpose | Example Contents |
|---|---|---|
| / | Root Directory: The highest level of the file system hierarchy. | Contains all other directories. |
| /bin | Binaries: Essential user command binaries. | ls, cat, date |
| /etc | Etcetera: Configuration files for the entire system (e.g., network settings, user account information). | /etc/passwd, /etc/fstab |
| /home | Home Directories: Personal files and settings for regular users. | /home/alice, /home/bob |
| /usr | Often expanded as "Unix System Resources": Second major hierarchy, holding shareable, typically read-only data, utilities, and applications. | /usr/bin, /usr/lib |
| /var | Variable Data: Files that change frequently, like logs, caches, mail spools, and print queues. | /var/log, /var/mail |
| /tmp | Temporary: Scratch files that programs must not expect to survive a reboot; many systems clear this directory at boot. | Temporary application files. |
| /dev | Devices: Contains files that represent hardware devices. | /dev/sda, /dev/null |
| /proc | Processes: A virtual filesystem providing information about running processes and kernel status. | /proc/cpuinfo, /proc/<PID>/ |
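
Because /proc exposes kernel data as ordinary text files, it can be read with plain file operations. The sketch below parses the "Key: value" layout used by /proc/<PID>/status. The sample text is invented for the example, and the live read is skipped on systems without /proc.

```python
import os

def parse_status(text):
    """Parse the 'Key:<tab>value' lines of a /proc/<PID>/status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

# Invented sample, shaped like the first lines of a real status file.
sample = "Name:\tbash\nState:\tS (sleeping)\nPid:\t4242\nPPid:\t4100\n"
parsed = parse_status(sample)
print(parsed["Name"], parsed["Pid"], parsed["PPid"])

status_path = "/proc/self/status"   # "self" refers to the process doing the read
if os.path.exists(status_path):
    with open(status_path) as f:
        live = parse_status(f.read())
    print("this process:", live.get("Name"), live.get("Pid"))
```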

4. Common Linux File System Types

Linux supports various file system types, which define how data is stored and retrieved on a partition.

  • ext2/ext3/ext4 (Extended File System):
    • ext2: The original standard, no journaling.
    • ext3: Introduced journaling, which improves reliability and speeds up recovery after a system crash by logging changes before they are made.
    • ext4: The default on many current distributions; adds extents for more efficient storage of large files, support for much larger files and volumes, and faster file system checks.
  • XFS: A high-performance, highly scalable 64-bit journaling file system developed by SGI, popular for large file systems and high-throughput environments.
  • Btrfs (B-tree file system): A modern, copy-on-write (CoW) file system focusing on fault tolerance, repair capabilities, and easy administration (supports snapshots, RAID, and volume management).
  • FAT/NTFS: Used primarily for interoperability with Windows or older systems (e.g., USB drives).
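
To see which of these file system types are actually in use, one option is to read the kernel's mount table. The sketch below counts the type column of text in the /proc/mounts format (device, mount point, type, options, and two numeric fields). The sample lines and the function name count_fs_types are assumptions of this example.

```python
import os
from collections import Counter

def count_fs_types(mounts_text):
    """Count how many mounts use each file system type (third column)."""
    counts = Counter()
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            counts[parts[2]] += 1
    return counts

# Invented sample in the /proc/mounts layout.
sample = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "/dev/sdb1 /data xfs rw,noatime 0 0\n"
    "/dev/sdc1 /mnt/usb vfat rw 0 0\n"
    "proc /proc proc rw,nosuid 0 0\n"
)
sample_counts = count_fs_types(sample)
print(dict(sample_counts))

if os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        print(dict(count_fs_types(f.read())))
```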

Linux Process Management and Control

A process in Linux is an instance of a running program. Every external command you execute runs as a process (shell built-ins such as cd run inside the shell itself), and each process is identified by a unique Process ID (PID).

1. Initialization Processes (PID 1)

After the kernel finishes booting, it starts the first user-space process, which has a PID of 1. This process starts the system’s services, is the ancestor of every other user-space process, and typically adopts processes whose parents have exited.

  • init (Historical): The traditional initialization system in older Linux distributions.
  • systemd (Modern): The default in most major distributions (such as Ubuntu, Fedora, and Debian). systemd is a system and service manager that handles system startup, starts and supervises services, and tracks the processes they spawn. It remains active until the system is shut down.
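
One quick way to check which program is running as PID 1 is to read its command name from /proc. This sketch assumes a Linux system with /proc mounted; the function name init_name is made up, and it returns None when the file cannot be read. Inside a container, PID 1 is often the container's main program rather than init or systemd.

```python
def init_name(comm_path="/proc/1/comm"):
    """Return the command name of PID 1, or None if it cannot be read."""
    try:
        with open(comm_path) as f:
            return f.read().strip()
    except OSError:
        return None

result = init_name()
print("PID 1 is:", result if result is not None else "(not available)")
```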

2. Mechanism of Process Creation: Fork and Exec

New processes are created in a two-step sequence using system calls:

  1. fork(): The running process (the parent) calls fork(), which creates a near-exact copy of it (the child). The child receives a copy of the parent’s memory (typically shared copy-on-write until one side modifies it), its open file descriptors, and its environment variables. The main differences are the child’s new PID, its parent PID, and the return value of fork(): 0 in the child, the child’s PID in the parent.
  2. exec(): The child then typically calls one of the exec family of functions (e.g., execve). This call replaces the child’s memory image (code, data, heap, and stack) with the new program. The PID stays the same, and file descriptors remain open unless they are marked close-on-exec. The child then starts running the new program from its entry point.
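
The two steps can be sketched with Python's thin wrappers around the same system calls. This is an illustration rather than a replacement for the subprocess module; the function name run_program is an assumption, and the exit status 127 on exec failure simply mirrors the shell's convention.

```python
import os
import sys

def run_program(argv):
    """Run argv in a child process via fork + exec and return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0. Replace this process image with the new program.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)  # exec failed; leave without running parent cleanup code
    # Parent: fork() returned the child's PID. Wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # negative value: the child was ended by a signal

exit_status = run_program([sys.executable, "-c", "import sys; sys.exit(7)"])
print("child exited with status", exit_status)
```

The waitpid() call matters: without it, a finished child lingers as a zombie until its parent collects the exit status.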

3. Starting and Stopping Processes

A. Starting Processes (Foreground vs. Background)

  • Foreground: By default, a command runs in the foreground. The shell waits for it to finish, so the terminal accepts no new commands until then.
    • Example: vi my_file.txt
  • Background: To run a process in the background and immediately regain control of the terminal, append an ampersand (&) to the command.
    • Example: ./long_running_script.sh &
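
The same distinction shows up when a program launches other programs. In this sketch, subprocess.run() plays the role of a foreground command (the caller waits), while subprocess.Popen() behaves like a command started with & (the caller continues at once). The one-second child is an arbitrary stand-in for real work.

```python
import subprocess
import sys
import time

child_code = "import time; time.sleep(1.0)"   # stand-in for a long-running command

# "Foreground": run() blocks until the child exits.
start = time.monotonic()
subprocess.run([sys.executable, "-c", child_code], check=True)
foreground_wait = time.monotonic() - start

# "Background": Popen() returns as soon as the child has been started.
background = subprocess.Popen([sys.executable, "-c", child_code])
still_running = background.poll() is None   # None means the child has not exited yet
print(f"foreground call took {foreground_wait:.2f}s")
print("background child still running right after launch:", still_running)

background.wait()   # collect the child's exit status so it does not linger as a zombie
```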

B. Stopping and Managing Processes (Signals)

Processes are managed using signals, which are software interrupts sent to a process. The kill command is the standard way to send signals.

| Command | Signal Name | Signal Number | Action | Description |
|---|---|---|---|---|
| kill <PID> | SIGTERM | 15 | Graceful Termination | Requests the process to shut down cleanly (allows cleanup). This is the default signal. |
| kill -9 <PID> | SIGKILL | 9 | Forceful Termination | The kernel ends the process immediately; the signal cannot be caught, blocked, or ignored, so no cleanup runs. Use as a last resort. |
| kill -15 <PID> | SIGTERM | 15 | Graceful Termination | Explicitly sends the graceful termination signal. |
| killall <name> | SIGTERM by default (others can be specified) | 15 by default | Terminates all processes with the specified name. | Example: killall firefox |
| pkill <name> | SIGTERM by default (others can be specified) | 15 by default | Signals processes whose names match a pattern, optionally filtered by other criteria such as owner (more flexible than killall). | Example: pkill -u user_name |

4. Job Control in Linux Shells

Job control refers to managing multiple processes (jobs) within a single shell session, often utilizing background and foreground switching.

| Command / Key | Function | Description |
|---|---|---|
| jobs | Lists all running or stopped jobs in the current shell. | Shows the job number (e.g., [1], [2]). |
| fg %<job_id> | Brings a background or stopped job to the foreground. | Example: fg %1 |
| bg %<job_id> | Resumes a stopped job in the background. | Example: bg %2 |
| Ctrl + Z | Sends SIGTSTP, suspending (stopping) the current foreground job. | The job stays stopped until you resume it with bg or fg. |
| Ctrl + C | Sends the SIGINT signal (Interrupt) to the foreground job. | By default this terminates the process; a program may catch the signal to clean up first, or ignore it. |
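
Behind suspending and resuming a job are a stop signal and SIGCONT. The sketch below stops and resumes a child process directly. It uses SIGSTOP as a stand-in for the SIGTSTP that Ctrl + Z sends, because SIGSTOP cannot be ignored and works without a controlling terminal; the 30-second child is an arbitrary placeholder.

```python
import os
import signal
import subprocess
import sys

proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])

# Suspend the child (stand-in for pressing Ctrl + Z on a foreground job).
os.kill(proc.pid, signal.SIGSTOP)
_, status = os.waitpid(proc.pid, os.WUNTRACED)   # returns once the child has stopped
was_stopped = os.WIFSTOPPED(status)

# Resume it; this is the signal that bg and fg deliver to a stopped job.
os.kill(proc.pid, signal.SIGCONT)
still_alive = proc.poll() is None

print("child was stopped:", was_stopped)
print("child alive after SIGCONT:", still_alive)

proc.terminate()   # SIGTERM, to end the demonstration
proc.wait()
```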

5. Scheduling Commands in Linux

Linux offers several tools to execute commands at a specific time or on a recurring schedule.

A. Single Execution Scheduling

| Tool | Function | Description | Example |
|---|---|---|---|
| at | Executes a command once at a specified future time. | Commands are typed directly or piped to at. | echo "backup.sh" \| at 23:00 |
| batch | Executes commands once when the system load average drops below a threshold (1.5 in the classic at documentation; adjustable when the atd daemon is started). | Commands are typed directly or piped to batch. | batch (then type commands) |
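
The condition batch waits for can be approximated in a few lines. This is only a sketch of the idea, assuming the 1-minute load average and the 1.5 threshold mentioned above; the function name load_permits is made up, and the real decision is taken by the atd daemon.

```python
import os

def load_permits(threshold=1.5):
    """Return True if the 1-minute load average is below threshold."""
    one_minute, _five_minutes, _fifteen_minutes = os.getloadavg()
    return one_minute < threshold

print("load averages:", os.getloadavg())
print("a batch-style job could start now:", load_permits())
```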

B. Recurring Scheduling

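| Tool | Function | Description | Key Command |
|---|---|---|---|
| cron | Executes commands periodically (hourly, daily, monthly, etc.). | Jobs are defined in a file called a crontab, using a five-field time/date syntax (minute, hour, day of month, month, day of week) followed by the command. | crontab -e (to edit the user’s schedule) |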
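
To make the five-field syntax concrete, here is a simplified matcher that decides whether a crontab time expression fires at a given moment. It is a sketch, not cron's actual implementation: it handles only numbers, *, ranges, steps, and comma lists, and does not support month or weekday names. The function names are assumptions of this example.

```python
from datetime import datetime

def field_matches(field, value, minimum, maximum):
    """Check one field: supports *, */n, a, a-b, a-b/n and comma-separated lists."""
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            low, high = minimum, maximum
        elif "-" in rng:
            low, high = (int(x) for x in rng.split("-"))
        else:
            low = high = int(rng)
        if low <= value <= high and (value - low) % step == 0:
            return True
    return False

def cron_matches(expression, moment):
    """Return True if the five-field expression matches the given datetime."""
    minute, hour, dom, month, dow = expression.split()
    weekday = (moment.weekday() + 1) % 7   # cron counts Sunday as 0 (7 also means Sunday)
    time_ok = (field_matches(minute, moment.minute, 0, 59)
               and field_matches(hour, moment.hour, 0, 23)
               and field_matches(month, moment.month, 1, 12))
    dom_ok = field_matches(dom, moment.day, 1, 31)
    dow_ok = field_matches(dow, weekday, 0, 7) or (
        weekday == 0 and field_matches(dow, 7, 0, 7))
    # When both day fields are restricted, cron runs the job if either one matches.
    if dom == "*" or dow == "*":
        day_ok = dom_ok and dow_ok
    else:
        day_ok = dom_ok or dow_ok
    return time_ok and day_ok

monday_morning = datetime(2024, 1, 15, 9, 30)    # a Monday
print(cron_matches("*/15 9-17 * * 1-5", monday_morning))
print(cron_matches("0 23 * * *", datetime(2024, 1, 15, 23, 0)))
```

For example, "0 23 * * *" reads as minute 0 of hour 23 on every day, that is, once a day at 23:00.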

C. Time Measurement

| Tool | Function | Description | Example |
|---|---|---|---|
| time | Measures the execution time of a command. | Reports real (wall clock), user (CPU time spent in user mode), and sys (CPU time spent in kernel mode) time. | time find / -name "*.conf" |
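
The three figures reported by time can be reproduced approximately from a program. This sketch measures wall-clock time with a timer and reads user and sys CPU time from the resource usage the kernel accumulates for finished child processes. The function name time_command and the sample workload are assumptions of this example.

```python
import resource
import subprocess
import sys
import time

def time_command(argv):
    """Run argv and return its real, user, and sys times in seconds."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=False)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "real": real,                                # wall-clock time
        "user": after.ru_utime - before.ru_utime,    # CPU time in user mode
        "sys": after.ru_stime - before.ru_stime,     # CPU time in kernel mode
    }

timings = time_command([sys.executable, "-c", "sum(range(1_000_000))"])
for name, seconds in timings.items():
    print(f"{name:>4}: {seconds:.3f}s")
```

If real is much larger than user plus sys, the command spent most of its time waiting (for disk, network, or other processes) rather than computing.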