
Exam Cheat Sheet
Memory
Stack & heap:
C programs use the stack for local variables, function parameters and return addresses.
C programs use the heap for explicitly requested dynamically allocated data (with malloc). 
In general, data is put on the stack unless we use malloc, in which case it is on the heap
Malloc & Free:
Used in C as a form of manual memory management
Always free what you malloc! Otherwise you leak memory (and lose points)
Modern languages like Java, Erlang etc. use automatic memory management. The system allocates data structures on the heap and uses garbage collection to keep it from filling up. Garbage collection identifies structures that are no longer needed and reuses that space as execution continues.
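A minimal sketch of manual heap management with malloc and free (the array size is arbitrary):

#include <stdlib.h>
#include <stdio.h>

int main(void) {
    int n = 10;                      /* local variable: lives on the stack */
    int *a = malloc(n * sizeof *a);  /* n ints: explicitly allocated on the heap */
    if (a == NULL) return 1;         /* malloc can fail, so always check */
    for (int i = 0; i < n; i++) a[i] = i;
    printf("%d\n", a[n - 1]);
    free(a);                         /* manual memory management: free what you malloc */
    return 0;
}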
Buddy algorithm with exponents of 2:
Coalesce and split blocks of a specified size
Can store free blocks in a linked list, or start from the beginning of memory and traverse it
About 25% internal fragmentation on average, because requests rarely fill a whole power-of-two block
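A sketch of the size rounding a buddy allocator does (the 32-byte minimum block size is an assumption):

#include <stdio.h>
#include <stddef.h>

static size_t buddy_block_size(size_t request) {
    size_t block = 32;        /* hypothetical minimum block size */
    while (block < request)
        block *= 2;           /* only power-of-two block sizes exist */
    return block;
}

int main(void) {
    printf("%zu\n", buddy_block_size(33));  /* prints 64: a 33-byte request wastes 31 bytes */
    return 0;
}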
Free Memory Algorithms:
Best Fit – first, search through the free list and find chunks of memory that are as big as or bigger than the requested size. Then return the smallest in that group. Reduces wasted space, but pays in performance by searching through the entire free list every time
Worst fit – find the largest chunk and return the requested amount of space within that chunk, then keep the remaining space from the chunk on the free list. Attempts to leave big chunks free instead of lots of small fragments. However, it still needs a full search of the free list.
First Fit – finds the first block that is big enough and returns the requested amount to the user; the remaining free space is kept on the free list (see the sketch after this list). Has the advantage of speed, but can pollute the beginning of the free list with small objects. Using address-based ordering keeps the list ordered by address, making coalescing easier.
Next Fit – same as first fit, but keeps a pointer to where in the list it last looked. Spreads the searches for free space throughout the list more uniformly and avoids splintering at the beginning of the list.
Least Recently Used
Segregated lists – keep a list to manage objects of the specific sizes that a particular program frequently requests.
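A sketch of a first-fit search over a singly linked free list (simplified: no splitting of the chosen chunk, and the header layout is assumed):

#include <stddef.h>

struct free_chunk {
    size_t size;
    struct free_chunk *next;
};

struct free_chunk *first_fit(struct free_chunk *head, size_t want) {
    for (struct free_chunk *c = head; c != NULL; c = c->next)
        if (c->size >= want)     /* take the first chunk that is big enough */
            return c;
    return NULL;                 /* no chunk large enough */
}

int main(void) {
    struct free_chunk c2 = { 64, NULL };
    struct free_chunk c1 = { 16, &c2 };
    struct free_chunk *hit = first_fit(&c1, 32);  /* skips the 16-byte chunk, returns the 64-byte one */
    return hit == &c2 ? 0 : 1;
}

Best fit would keep scanning and remember the smallest chunk that is >= the request; worst fit would remember the largest.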
Memory paging
Paging divides memory into fixed-size pieces (pages), as opposed to segmentation, which divides memory into variable-sized pieces.
Advantages:
Most important is flexibility: with a fully developed paging approach, the system can support the abstraction of an address space effectively, regardless of how a process uses that address space
No external fragmentation of memory into variable-sized pieces, which would make allocation more challenging
Simplicity of free-space management that paging allows
Disadvantages:
Page tables can get very large, so a large amount of memory is dedicated just to translations between virtual and physical memory.
It can be slow, as every memory reference requires one extra memory access to first fetch the translation from the page table
A virtual memory page is mapped to a frame in physical memory.
To record where each virtual page of the address space is placed in physical memory, a per-process data structure called a page table is used.
Page tables are used to store address translations for each virtual page to frame for a process.
A page table entry holds the information for each individual translation between virtual and physical pages.
The virtual address in a memory paging system contains the virtual page number and offset.
The VPN refers to which page our data is in, and the offset refers to the specific byte within that page where the data starts.
The offset stays the same in the physical memory address, while the VPN gets translated into a physical frame number with a lookup in the page table.
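A sketch of how a virtual address might be split, assuming 4 KB pages (12-bit offset) and 32-bit virtual addresses:

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12                     /* log2(4096) */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

uint32_t vpn(uint32_t vaddr)    { return vaddr >> PAGE_SHIFT; }       /* which page */
uint32_t offset(uint32_t vaddr) { return vaddr & (PAGE_SIZE - 1); }   /* byte within the page */

int main(void) {
    uint32_t va = 0x00003ABC;             /* example address: VPN 3, offset 0xABC */
    printf("vpn=%u offset=0x%x\n", vpn(va), offset(va));
    return 0;
}

The physical address is then (PFN found in the page table entry for this VPN) shifted left by PAGE_SHIFT, OR'd with the offset.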
Linear page table – an array storing PTEs in a linear format. The OS indexes the array by the virtual page number and looks up the PTE at that index in order to find the physical frame number (PFN)
Each PTE contains (a sketch of possible flag bits follows this list):
a valid bit – used to indicate whether the particular translation is valid. Some parts of the memory will be marked invalid, and if a process attempts to access them the OS will likely shut it down.
protection bits – indicating whether the page can be read from, written to, or executed from.
present bit – indicates whether this page is in physical memory or on the disk
dirty bit – indicates whether the page has been modified since it was brought into memory
reference bit – used to track whether a page has been accessed, and therefore how popular it is and whether to keep it in memory
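A sketch of how those PTE bits might be laid out (bit positions are made up for illustration; real layouts are architecture-specific):

#include <stdint.h>

#define PTE_VALID    (1u << 0)   /* translation is valid */
#define PTE_READ     (1u << 1)   /* protection bits */
#define PTE_WRITE    (1u << 2)
#define PTE_EXEC     (1u << 3)
#define PTE_PRESENT  (1u << 4)   /* page is in physical memory, not swapped to disk */
#define PTE_DIRTY    (1u << 5)   /* page modified since it was brought into memory */
#define PTE_REF      (1u << 6)   /* page has been accessed recently */

typedef uint64_t pte_t;          /* the remaining bits hold the physical frame number */

int main(void) {
    pte_t pte = PTE_VALID | PTE_READ | PTE_WRITE | PTE_PRESENT;
    return (pte & PTE_VALID) ? 0 : 1;   /* check a flag before using the translation */
}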
Translation-lookaside buffer (TLB) – a hardware cache of popular virtual-to-physical address translations which is part of the memory management unit (MMU). Upon each virtual memory reference, the hardware first checks the TLB to see if the desired translation is present; if so, it avoids consulting the page table.
Increasing the page size reduces the size of the page table, as it now holds fewer entries for the same amount of referenced memory. The issue is more internal fragmentation due to wasted space within pages.
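Illustrative numbers: a 32-bit address space with 4 KB pages has 2^20 virtual pages; at 4 bytes per PTE a linear page table is 4 MB per process. With 16 KB pages there are only 2^18 entries, so the table shrinks to 1 MB, but a page that uses just one byte now wastes almost 16 KB.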
Multi-level page tables – a tree structure of page directories instead of a linear page table. Gets rid of all the invalid regions of the page table instead of keeping them in memory, which saves space.
Inverted page table – instead of having many page tables, we keep a single page table with an entry for each physical page of the system. The entry tells us which process is using this page and which virtual page of that process maps to this physical page.
C commands
Dup2 – duplicates an existing file descriptor onto a given descriptor number, e.g. to redirect stdin/stdout
Execl – replaces the current process image with a new program
Opendir – opens a directory stream
Readdir – returns the next entry from a directory stream

Open – opens a file (creating it if there isn't one) and adds an entry for it to the FDT (see the sketch below combining these calls)
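A sketch tying these calls together; the file and program names are made up for illustration:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <dirent.h>

static void list_dir(const char *path) {
    DIR *d = opendir(path);               /* open a directory stream */
    if (d == NULL) return;
    struct dirent *e;
    while ((e = readdir(d)) != NULL)      /* one entry per call, NULL at the end */
        printf("%s\n", e->d_name);
    closedir(d);
}

int main(void) {
    int fd = open("out.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);  /* new FDT entry */
    if (fd < 0) return 1;
    dup2(fd, STDOUT_FILENO);              /* fd 1 (stdout) now refers to out.txt */
    close(fd);
    list_dir(".");                        /* directory listing now goes to out.txt */
    fflush(stdout);                       /* flush stdio buffers before exec replaces the image */
    execl("/bin/ls", "ls", "-l", (char *)NULL);  /* replaces this process image */
    return 1;                             /* only reached if execl fails */
}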
Unix commands
Wc – print line, word and byte counts for a file
Ln – create a link to a file
Head – output the first part of a file
Tail – output the last part of a file
Diff – compare files line by line
Sort – sort lines of text files, e.g by ascending numerical value
Sed – stream editor. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline).
Tr – translate or delete characters. Copies the standard input to the standard output with substitution or deletion of selected characters.
Threads
Threads of a process share its executable code, the values of its dynamically allocated (heap) variables, and its non-thread-local global variables at any given time.
Communication 
Pipes – connected processes where the stdout of one process feeds directly into the stdin of the next (see the pipe() sketch after this section).
Socket – data communications endpoint for exchanging data between two processes, either on the same machine or across a network.
Types of sockets: stream sockets (SOCK_STREAM, reliable and connection-oriented, e.g. TCP) and datagram sockets (SOCK_DGRAM, connectionless, e.g. UDP)
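A sketch of pipe-based communication between a parent and child (the message is arbitrary):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fds[2];
    if (pipe(fds) < 0) return 1;          /* fds[0] = read end, fds[1] = write end */

    if (fork() == 0) {                    /* child: the reader */
        close(fds[1]);
        char buf[64];
        ssize_t n = read(fds[0], buf, sizeof buf - 1);
        if (n > 0) { buf[n] = '\0'; printf("child got: %s\n", buf); }
        close(fds[0]);
        return 0;
    }
    close(fds[0]);                        /* parent: the writer */
    write(fds[1], "hello", strlen("hello"));
    close(fds[1]);
    wait(NULL);
    return 0;
}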
Scheduling
We use different metrics to measure scheduling:
Performance – e.g. turnaround time
Fairness – whether each job gets a fair share of the CPU
Response Time – the time from when the job arrives in a system to the first time it is scheduled. Relates to user interactivity 
Other Definitions
Turnaround time – the time at which the job completes minus the time at which the job arrived in the system
Time slice – the fixed amount of time a job is allowed to run before the scheduler switches to the next job, used by the RR technique
Scheduling algorithms:
FIFO
Most basic algorithm, aka First Come, First Served
Advantages:
Simple and easy to implement
Disadvantages:
Convoy effect
a number of relatively short jobs get queued behind a heavyweight job, which increases the average turnaround time for all
Shortest Job First (SJF)
Runs the shortest job, then the next shortest and so on
Advantages:
Decreases average turnaround time
Disadvantages:
Convoy effect can still happen if jobs arrive shortly after each other, e.g. if a heavyweight job arrives just before a bunch of smaller jobs
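Illustrative (hypothetical) numbers: jobs A (100s), B (10s) and C (10s) arrive together. FIFO in order A, B, C gives turnaround times of 100, 110 and 120, an average of 110s; SJF runs B, C, A for turnaround times of 10, 20 and 120, an average of 50s.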
Shortest time to completion first (STCF)
Aka preemptive shortest job first
When a new job enters the system, the STCF scheduler determines which of the remaining jobs has the least time left, and schedules that one. It does this by preempting running jobs and switching contexts
Advantages:
Much improved turnaround time compared to SJF
Disadvantages:
Bad for response time and user interactivity: if three jobs arrive at the same time, the third job has to wait for the previous two jobs to run in their entirety before being scheduled even once
Round robin (RR)
Instead of running jobs to completion, runs a job for a time slice, and then switches to the next job in the run queue
The length of a time slice must be a multiple of the timer interrupt period. The shorter the time slice, the better the response time; however, too small a time slice results in more context switching, which hurts performance
Advantages:
It is a very fair algorithm as all jobs have similar resources spent on them
Lower response time
Disadvantages:
If turnaround time is our metric, RR is one of the worst algorithms as it stretches out jobs as long as it can
Fixed priority
Multilevel feedback queue (MLFQ)
Addresses the fact that we cannot know in advance how long a program will run with the other algorithms
MLFQ varies the priority of a job based on its observed behaviour, so a job's priority changes over time.
Here are the rules:
If priority(A) > priority(B), then run A
If priority(A) == priority(B), then run A & B in RR
When a job enters the system, it is placed at the highest priority
Once a job uses up its time allotment at a given level, its priority is reduced (it moves down one queue)
After some period S, move all the jobs in the system to the topmost queue
You need the 5th rule, otherwise non-interactive processes will starve and will not complete in good time
Advantages:
Manages to both optimise turnaround time and make the system responsive to its interactive users by reducing response time
Disadvantages: 
Complicated?
File Systems and storage
How an OS interacts with a device (I/O):
Interrupts – if a process is waiting for a device to return, instead of occasionally polling the device to see if it is done (which consumes CPU), the process can go to sleep while the CPU deals with another process; the device in question will raise a hardware interrupt when done, which causes the OS to wake the process waiting for the I/O. The downside is that interrupts only really pay off for slow devices; otherwise the cost of context switching and interrupt handling might outweigh the benefits.
Direct Memory Access (DMA) – when transferring large chunks of data, the CPU would otherwise have to do the simple and easy task of taking data from memory and copying it to the I/O device. The CPU could be used for better tasks that require computation, so a DMA engine is used instead. DMA is a specific device that orchestrates transfers between devices and main memory without much CPU intervention.
Disk scheduling – I/O has a high cost so the OS decides on the order of I/Os issued to the disk. The disk scheduler examines requests and decides which one to schedule next. 
Shortest Seek Time First (SSTF) – completes the request on the nearest track first, meaning the request closest to the current position of the head. The disadvantage is starvation: with a steady stream of requests to a nearby track, any requests to further tracks would be ignored.
Nearest block first (NBF) – same idea as SSTF, but schedules the request with the nearest block address rather than the nearest track, since the OS usually only sees block addresses.
Elevator / SCAN / CSCAN – The head sweeps across all tracks on the drive and services the requests as it sweeps over the correct area. The issue is it doesn’t adhere to SJF (shortest job first).
Shortest Positioning Time First (SPTF) – Solves problem by approximating SJF by taking both seek and rotation of disk into account. Takes speed of seek (moving the head) and rotation (of the disk) into account when deciding which track & sector is the shortest time away. 
File – is an array of bytes. Each file in a directory has an inode number associated with it. 
Directory – collection of tuples, each of which contains a human-readable name and the low-level name (inode number) to which it maps. Directories also have an inode number.
File Descriptor Table – each process has one that keeps track of resources used by that process, e.g. sockets, pipes, files and other I/O resources. The first three are 0: stdin, 1: stdout, 2: stderr. An entry in the FDT contains the inode number and the current offset of the file.
Inode – short for index node. Inside each inode is virtually all the information needed about a file: its type, its size, the number of blocks allocated to it, protection info like who owns and can access the file, time information (created, modified, accessed), and where its data blocks are on disk.
When you open a file in C you get a file descriptor returned to you. A file descriptor is a per-process integer for each resource in the file descriptor table.
In the permissions column for files the leftmost character is “-” for normal files, “d” for directories, “l” for soft links.
Hard link – created using the ln command. Creates another name in a directory that refers to the same inode as the one you are linking to. The file is not copied; you just have two human-readable names that both point to the same file. Hard links are limited: you can't make one to a directory for fear of creating a cycle in the tree, and you can't create one across disk partitions.
Soft link / symbolic link – Use ln but with the -s flag. A soft link is an actual file with the path to the file it is linked to. It uses up space in the system. If you remove the original file the soft link points to then the soft link simply points to a file that no longer exists.
Reference count / link count – the number of times an inode is referenced by file names in the system. If this count drops to 0, the data is "deleted": its space is released back to the system as free space.
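A sketch showing this per-inode metadata through stat(); the path is hypothetical:

#include <stdio.h>
#include <sys/stat.h>

int main(void) {
    struct stat st;
    if (stat("file.txt", &st) < 0) return 1;
    printf("inode: %lu\n", (unsigned long)st.st_ino);     /* the low-level name */
    printf("links: %lu\n", (unsigned long)st.st_nlink);   /* hard-link (reference) count */
    printf("size:  %lld bytes\n", (long long)st.st_size);
    return 0;
}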
Crash consistency problem – how to update a persistent data structure in the event of power losses and crashes. 
Solutions to crash consistency problem:
File System Checker – Runs before the file system is mounted, does sanity checks to make sure there is a coherent structure to the inodes and superblocks, that nothing has been corrupted. 
Journaling – when updating the disk, log what you are about to do. In the event of a crash, at reboot you check the log and see if any interrupted update needs to be completed.
Log-structured file system (LFS) – a new way of updating the disk. Instead of overwriting files in place, LFS always writes to an unused portion of the disk and later reclaims that old space through cleaning.
Processes
When you fork a process, what are the sharing rules with the new process:
Stack: no
Heap: no
Global memory: no
Code area: no
Open files: yes (file descriptors are copied; parent and child share the underlying open-file entry, including the file offset)
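A sketch of the copy semantics of fork() (the values are arbitrary):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int counter = 0;                          /* global: the child gets its own copy */

int main(void) {
    int *heap_val = malloc(sizeof *heap_val);
    *heap_val = 0;                        /* heap: also copied, not shared */

    pid_t pid = fork();
    if (pid == 0) {                       /* child */
        counter = 100;                    /* modifies the child's copies only */
        *heap_val = 100;
        return 0;
    }
    wait(NULL);                           /* parent */
    printf("counter=%d heap=%d\n", counter, *heap_val);  /* prints counter=0 heap=0 */
    free(heap_val);
    return 0;
}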
Time Sharing of the CPU allows users to run as many concurrent processes as they would like by promoting the illusion that there are many virtual CPUs. The cost is performance, as each process runs more slowly if the CPU is shared.
Space Sharing where a resource is divided in space among those who wish to use it. E.g disk space.
Process API:
Create: OS must be able to create new processes
Destroy: Be able to forcefully destroy a process
Wait: It is useful to wait for a process to stop running
Miscellaneous Control: E.g suspend process
Status: Get status information on the running of a process
Steps for running a program done by OS:
Loading the code and static data into memory
Creating & initialising stack
Other work related to I/O setup
Start program running at entry point, main()
A process has three states that it transitions between:
Running – currently executing instructions
Ready – ready to run, but the OS has chosen not to run it at this moment
Blocked – not ready until some other event takes place
A zombie process is a process that has completed execution but is still in the process table. It can occur when child processes have finished but are waiting for their parent to read their exit status; once the exit status is read by the wait system call, the child process is no longer a zombie.
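A sketch of the zombie window; the sleep length is arbitrary:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    if (fork() == 0)
        return 42;                        /* child exits immediately */

    sleep(5);                             /* child is a zombie during this window */
    int status;
    wait(&status);                        /* reaps the zombie and reads the exit status */
    if (WIFEXITED(status))
        printf("child exited with %d\n", WEXITSTATUS(status));
    return 0;
}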
Memory mapping of a running process