Data Structures and Algorithms: A Comprehensive Guide

Heap

A heap is a complete binary tree in which the value of each node is greater than or equal to the values of its children, if any (a maxheap). Dequeuing the object with the greatest value appears to be a Θ( 1 ) operation, and it only takes O(lg n) time to turn the structure back into a heap again. The process of swapping downwards to form a new heap is called heapifying. Enqueue also uses a loop, swapping upwards in reverse, and is an O(lg n) operation. Each time we swap downwards, the number of nodes we can travel to is reduced by approximately half. In the array representation, the left child of the node at index p is at 2p + 1, the right child is at 2p + 2, and the parent of the node at index c is at (c − 1) / 2.
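Using those index formulas, downward heapifying in an array-based maxheap can be sketched as follows. This is a minimal sketch, not a fixed API: the name maxHeapify and the use of int elements in a std::vector are assumptions.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of downward heapifying (sift-down) in an array-based maxheap.
// left child = 2p + 1, right child = 2p + 2, parent = (c - 1) / 2.
void maxHeapify(std::vector<int>& heap, std::size_t p, std::size_t heapSize) {
    while (true) {
        std::size_t largest = p;
        std::size_t lch = 2 * p + 1;
        std::size_t rch = 2 * p + 2;
        if (lch < heapSize && heap[lch] > heap[largest]) largest = lch;
        if (rch < heapSize && heap[rch] > heap[largest]) largest = rch;
        if (largest == p) break;         // heap property restored
        std::swap(heap[p], heap[largest]);
        p = largest;                     // each swap roughly halves the reachable nodes: O(lg n)
    }
}
```

Each pass moves one level down the tree, which is why the loop runs at most O(lg n) times.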

PriorityQueue

There are two constructors. The first just initializes an empty heap; the second makes a heap out of an initial array. A heap of size n could be made by enqueuing n elements one at a time, but that would take O(n lg n) time (even though n starts off small). A faster method can make a heap out of an initial array in Θ( n ) time: we heapify starting with the parent of the last leaf node. To find the last leaf node, we use index heapsize − 1. Heapifying works only if the structure is a heap except for possibly the root.
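The Θ( n ) second-constructor idea can be sketched like this (names buildMaxHeap and siftDown are illustrative assumptions; int elements assumed):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Downward heapify: works when both subtrees of p are already heaps.
static void siftDown(std::vector<int>& a, std::size_t p, std::size_t n) {
    while (true) {
        std::size_t largest = p, l = 2 * p + 1, r = 2 * p + 2;
        if (l < n && a[l] > a[largest]) largest = l;
        if (r < n && a[r] > a[largest]) largest = r;
        if (largest == p) return;
        std::swap(a[p], a[largest]);
        p = largest;
    }
}

// Sketch of "make a heap from an initial array" in Theta(n) time:
// heapify every node from the parent of the last leaf down to the root.
void buildMaxHeap(std::vector<int>& a) {
    std::size_t n = a.size();
    if (n < 2) return;
    // parent of the last leaf (index n - 1) is at ((n - 1) - 1) / 2
    for (std::size_t p = (n - 2) / 2 + 1; p-- > 0; )
        siftDown(a, p, n);
}
```

Starting at the parent of the last leaf guarantees that every siftDown call sees subtrees that are already heaps.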

Linked Heap

If we have a large element size, we might want to conserve memory with a linked heap. It should have the same time complexities as the array-based heap.

Dequeue Requirements

Left and right pointers in a node, so we can get the children for swapping, and a pointer to the last node in the heap, which needs to be maintained (updated) in Θ( 1 ) time.

Enqueue Requirements

A parent pointer in a node, for swapping upwards if necessary, and a Θ( 1 ) method of finding the place to put the initial node. We work with the end of the array when enqueuing or dequeuing; we can work with the end of a linked list instead.

Embedded Heap

This isn’t a heap now, because it isn’t even a tree: we have cycles. We can say there is an “embedded heap” in this data structure. We’ll need to maintain a pointer to the end of the list, called “last”, so we know where we’re supposed to form a new node (for the enqueue) and remove a node (for the dequeue). We’ll also need a pointer to the last parent, so we know which parent to attach a new node to; it is advanced with lastParent = lastParent->next.

Removing Nodes

When we remove a node, we’ll have to bring the last pointer back one node. If the list through the heap is a singly linked list, it might not be easy to find the previous node to bring the “last” pointer back to. If the list through the heap is doubly linked, it’s easy to find the node to set “last” to. Using the doubly-linked list, we can also bring lastParent back one node when we need to. We now have five pointers in a node: left, right, parent, back, and next. Every time we need to add or remove a node, we’ll need to set a number of pointers; in the array-based heap, by contrast, we’d only be copying a couple of elements on the average enqueue or dequeue (the average includes array expansion / contraction). For large element sizes, it is still worth it:

•If an element size is 20 bytes, 50% of the memory in the linked (embedded) heap is wasted
•If an element size is 1980 bytes, 1% of the memory in the linked (embedded) heap is wasted
•For large element sizes, it is a worthwhile data structure to consider

We can avoid handling special cases involving the root by using a “root header node”, similar to the header node in the linked list. The actual root node will branch off as a single child of the root header. The root pointer, last pointer, and lastParent pointer will start off pointing to the root header. We move the lastParent pointer forward after a right child has been attached to its node; therefore, the actual root should branch off the right side of the root header. The heap is empty when root == last.
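The five-pointer node described above could be declared as follows (a sketch: the struct name and the int element type are assumptions, chosen only to illustrate the layout):

```cpp
// Sketch of a node for the embedded heap described above.
// Five pointers: left/right (children), parent (for upward swaps),
// and back/next (the doubly-linked list threaded through the heap).
struct EmbeddedHeapNode {
    int info;                            // the element (large types justify this design)
    EmbeddedHeapNode* left   = nullptr;  // left child in the heap
    EmbeddedHeapNode* right  = nullptr;  // right child in the heap
    EmbeddedHeapNode* parent = nullptr;  // for swapping upwards on enqueue
    EmbeddedHeapNode* back   = nullptr;  // previous node in the threaded list
    EmbeddedHeapNode* next   = nullptr;  // next node in the threaded list
};
```

With 4-byte pointers, the five pointers account for the 20 bytes of housekeeping per node mentioned above.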

Sorting Algorithms

Sorting is the process of placing elements in order. If elements are objects, they are ordered by a particular data member.

Heapsort

Elements can be removed and added in O(lg n) time. These properties make heaps useful as priority queues. The sort_heap algorithm converts a heap into a sorted collection over the range [first, last), using either the default operator< or a supplied comparison function. Heapsort runs in Θ( n lg n ) time on average. It has a best-case time of Θ( n ) if all elements have the same value.
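The Standard Library heap algorithms mentioned above can be combined into a heapsort. The wrapper name heapsortVector is an assumption for illustration; make_heap and sort_heap are the real <algorithm> functions.

```cpp
#include <algorithm>
#include <vector>

// Heapsort via the Standard Library: make_heap builds a maxheap over
// [first, last); sort_heap then converts that heap into a sorted
// collection over the same range, using the default operator< here.
void heapsortVector(std::vector<int>& v) {
    std::make_heap(v.begin(), v.end());   // v.front() is now the largest element
    std::sort_heap(v.begin(), v.end());   // v is now sorted in ascending order
}
```

A comparison function could be passed as a third argument to both calls when operator< is not the desired ordering.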

Function Templates

A function template can be made for heapsort, so that the client can put heapsort into the client program. A function template is something like a class template: the compiler makes functions from the function template. The client can then use heapsort for different types of arrays, without rewriting the heapsort function. Note that the function prototype for heapsort must also be templated. When the compiler sees a call such as heapsort( objs ), it makes a function out of the function template, determining the type of objs. The client could also call heapsort on an array of integers in the same program; the compiler would then make another heapsort function using int for DataType. The heapsort function template is placed into the client’s program.

Insertion Sort

During insertion sort, the array is divided into a sorted section and an unsorted section: the sorted section grows and the unsorted section shrinks. On each iteration, the sorted section grows by 1 and the unsorted section shrinks by 1, until the array is sorted. The outer loop runs for each j, from 1 to the length of A − 1; j marks the index of the next element to be inserted. The inner loop runs while i is greater than −1 and A[ i ] is greater than A[ i + 1 ]. A section of 1 element (at index 0) is already sorted, so the index j starts off at index 1, the first element in the unsorted section.

Inversions

An inversion exists between two elements when the element that comes first in the array is greater than the element that comes after it. A swap in insertion sort removes an inversion; each swap removes one and only one inversion, and swaps are the only way inversions are removed. Therefore, the number of swaps performed in all of insertion sort is equal to the number of inversions in the original array. The worst number of inversions occurs when an array is in descending order: each element is inverted with every other element (it can’t get any worse).
There are n elements, and each is inverted with (n − 1) other elements; counting each pair once gives n( n − 1 ) / 2 inversions. Therefore, in the worst case, the number of swaps performed is n( n − 1 ) / 2, which gives us the worst-case time complexity of Θ( n² ) for insertion sort. In the average case, we might assume that each possible inversion has a 50% chance of occurring; on average, we would then expect 50% of n( n − 1 ) / 2, i.e. n( n − 1 ) / 4 inversions (still Θ( n² ) on average). There are no inversions in an array that is already sorted, so the condition in the while loop is never true (no swaps); the outer loop still executes n − 1 times, because the algorithm does not know the array is already sorted. Thus, in the best case, insertion sort runs in Θ( n ) time. Consider the case where we have n inversions, so we have a total of n swaps: for each iteration of the for loop, a swap is executed an average of approximately 1 time. Hence, we would still have Θ( n ) time if we have n inversions or fewer. Insertion sort is often used to sort a small number of elements; at around 5 elements, with roughly n( n − 1 ) / 4 expected inversions, insertion sort behaves like a Θ( n ) sort.

Speeding Up Insertion Sort

We can use the same idea that we used in the heap to perform “one-assignment” swaps instead of full swaps: we first save the element at index j to a temporary variable to hold it.

Quicksort: best Θ( n lg n ), average Θ( n lg n ), worst Θ( n² ).
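The templated insertion sort with the “one-assignment swap” speedup described above might look like this (a sketch; the name insertionSort is an assumption):

```cpp
#include <cstddef>

// Function template sketch of insertion sort with "one-assignment
// swaps": the element at index j is saved to a temporary, greater
// elements shift right, and temp is placed with one final assignment.
template <class DataType>
void insertionSort(DataType* A, std::size_t length) {
    for (std::size_t j = 1; j < length; ++j) {  // unsorted section starts at j
        DataType temp = A[j];                   // save the element to insert
        std::size_t i = j;
        while (i > 0 && temp < A[i - 1]) {      // shift instead of full swaps
            A[i] = A[i - 1];
            --i;
        }
        A[i] = temp;                            // one assignment places it
    }
}
```

The compiler generates a separate function from this template for each element type the client uses, as described above.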
Quicksort is one of the most popular general-purpose sorting algorithms. It consists of two functions: a recursive function, called quicksort, and a nonrecursive function, usually called partition; quicksort calls partition.

Partition

In the partition function, the last element is chosen as the pivot, a special element used for comparison purposes. Each element is compared to the pivot; when partition finishes, all elements greater than the pivot are on its right side. Partition is called from quicksort more than once, but when called again, it works with a smaller section of the array. Each section of the array separated by a previous pivot will eventually have partition called for it, except that partition is not called for a section of just one element: such an element is already a pivot, and it is where it belongs.

To achieve this, the algorithm for partition goes through these steps. Partition has a loop which iterates through each element, comparing it to the pivot. While partition is in progress, there is a partitioned section on the left and an unpartitioned section on the right. The dividing line in the partitioned section means that all elements to its left are less than or equal to the pivot, and all elements to its right are greater than the pivot. The index i marks the last value in the small side of the partition, while j marks the first value in the unpartitioned section. On each iteration, the value at j is compared to the pivot. If the value at j is greater, j is just incremented (the partitioned section grows by one). If the value at j is less than the pivot, then i is incremented, the value at i is swapped with the value at j, and then j is incremented. Partition is passed the array arr, the beginning index of the array section it is working with, and the ending index of that section. As its final step, partition returns the INDEX of the pivot back to the quicksort function that called it.

The Quicksort Function

Since quicksort is called recursively, it also passes in the array arr, the beginning index of the section it is working with, and the ending index of that section. partition is not called for a section that has just one element.

Counting Sort

The time complexity of counting sort depends on both n and the range of the elements. For example, if the elements range in value from 25 to 30, the range is 30 − 25 + 1 = 6. If the range is smaller than n, counting sort is very fast, running in Θ( n ) time. The main counting sort algorithm works with elements that range in value from 0 to some positive integer k; however, with a little modification, counting sort can work with nearly any data, still maintaining a Θ( n ) time complexity. The total time is Θ( k ) + Θ( n ), so comparing the range (max − min + 1) with n tells us which sorting algorithm we should use.

Sorting a Linked List

Consider going through a linked list, element by element, assigning each value to an array (Θ( n ) time); then sorting the array; then going through the linked list again, assigning each value of the array back to the linked list. A sorting algorithm must take at least Θ( n ) time if nothing about the elements is known ahead of time, since it takes this long just to look at the elements; therefore, assigning elements from linked list to array and back to linked list again does not affect the time complexity of a sorting algorithm. Recall, however, that elements in linked lists tend to be large (a reason for using linked lists in the first place), so this element copying would be very time consuming. We can sort a linked list without copying elements: we use a dynamic array of pointers into the linked list. Step 1, filling this array, is a Θ( n ) loop that copies no elements. Step 2 uses a modified sorting algorithm: instead of moving elements, we move array pointers. We’ll use insertion sort; in the original insertion sort, temp held an element, but now it holds an address. Note that we swap addresses, which is much faster than swapping elements. In step 3, we just relink the linked list; this step also takes Θ( n ) time.

Modifying a Sorting Algorithm for a Linked List

Those variables that were assigned elements in the sorting algorithm should now be assigned addresses (they become pointers). Use ->info when element values need to be compared. Each element of the C array does not need to be an integer; an alternative is to use struct objects as elements of the C array. One data member can be a count for the C array index; another data member can be an array of pointers to linked list nodes relevant to that index of the C array (holding the data member for which the list is being sorted).
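The partition and quicksort pair described earlier, with the last element as pivot, i marking the last value in the small side, and j scanning the unpartitioned section, can be sketched as follows (int elements assumed for illustration):

```cpp
#include <cstddef>
#include <utility>

// Sketch of the partition function described above. The last element
// is the pivot; i marks the last value in the "small" side, j marks
// the first value in the unpartitioned section.
int partition(int* arr, int first, int last) {
    int pivot = arr[last];
    int i = first - 1;                    // small side is empty at the start
    for (int j = first; j < last; ++j) {
        if (arr[j] <= pivot) {            // grow the small side
            ++i;
            std::swap(arr[i], arr[j]);
        }                                 // otherwise just advance j
    }
    std::swap(arr[i + 1], arr[last]);     // place the pivot between the sides
    return i + 1;                         // the INDEX of the pivot
}

// The recursive quicksort function: partition is not called for a
// section that has just one element (or none).
void quicksort(int* arr, int first, int last) {
    if (first >= last) return;
    int p = partition(arr, first, last);
    quicksort(arr, first, p - 1);
    quicksort(arr, p + 1, last);
}
```

A call such as quicksort(arr, 0, n − 1) sorts the whole array.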

partial_sort_copy: partially sorts into a copy.

The Standard C++ Library provides two fundamental sorting algorithms. The sort() algorithm is slightly faster, but it does not guarantee that equal elements in the original sequence retain their relative ordering in the final result. If that order is important, the stable_sort() version should be used. Because these algorithms require random access iterators, they can be used only with vectors, deques, and ordinary C pointers. Note, however, that the list container provides its own sort() member function. The comparison operator can be explicitly provided when the default operator< is not appropriate.

Partial Sort

Sorts only part of a sequence. The generic algorithm partial_sort() sorts only a portion of a sequence. In the first version of the algorithm, three iterators are used to describe the beginning, middle, and end of a sequence. If n represents the number of elements between the start and middle, then the smallest n elements are moved into this range in order. The remaining elements are moved into the second region; the order of the elements in this second region is undefined. A second version of the algorithm leaves the input unchanged. The output area is described by a pair of random access iterators. If n represents the size of this area, the smallest n elements in the input are moved into the output in order. If n is larger than the input, the entire input is sorted and placed in the first n locations of the output. In either case, the end of the output sequence is returned as the result of the operation. Because the input to this version of the algorithm is specified only as a pair of input iterators, the partial_sort_copy() algorithm can be used with any of the containers in the Standard C++ Library.

Stable Sort

Sorts, retaining the original order of equal elements. The stable_sort algorithm sorts the elements in the range [first, last). The first version of the algorithm uses operator< for comparisons.

Merge Sort

We will break the array into 2 equal halves.

Hash Table ADT

The hash table is a table of elements that have keys. A hash function is used for locating a position in the table. The table of elements is the set of data acted upon by the hash table operations: insert, retrieve, remove, update, and an operation to empty out the hash table.

Fast Search

A hash table uses a function of the key value of an element to identify its location in an array.
The function of the key value is called a hash function. The input to a hash function is a key value; the output is an index of an array (the hash table) where the object containing the key is located. An example of a hash function: h( k ) = k % 100. The search is done in Θ( 1 ) time, and insertion is done in Θ( 1 ) time.

The most popular way to resolve collisions is by chaining. Instead of having an array of objects, we have an array of linked lists, each node of which contains an object. An element is still inserted by using the hash function: the hash function provides the index of a linked list, and the element is inserted at the front of that (usually short) linked list. When searching for an element, the hash function is used to get the correct linked list, then the linked list is searched for the key (still much faster than comparing 500,000 keys). The insert function of LinkedList inserts a new element at the BEGINNING of the list.

Uniform Hashing

When the elements are spread evenly (or nearly evenly) among the indexes of a hash table, it is called uniform hashing. If elements are spread evenly, such that the number of elements at an index is less than some small constant, uniform hashing allows a search to be done in Θ( 1 ) time. The hash function largely determines whether or not we will have uniform hashing.
Ideal Hash Function for Uniform Hashing

The hash table size should be a prime number that is not too close to a power of 2. Speed comes from reducing the number of collisions: in a search, if there are no collisions, the first element in the linked list is the one we want to find (fast). Therefore, the greatest speed comes from making the hash table much larger than the number of keys (there will still be an occasional collision). On the other hand, each empty LinkedList object in a hash table wastes 8 bytes of memory (4 bytes for the start pointer and 4 bytes for the current pointer). The best memory conservation comes from trying to reduce the number of empty LinkedList objects, so the hash table size would be made much smaller than the number of keys (there would still be an occasional empty linked list). Can we have a hash function which guarantees that there will be no collisions? Yes: h( k ) = k. Each key k is unique; therefore, each index produced from h( k ) is unique. Consider 300 employees that have a 4-digit id: a hash table size of 10000 with the hash function above guarantees the best possible speed.

Implementing a Hash Table

We’ll implement a HashTable with linked lists (chaining); without chaining, a hash table can become full. We shouldn’t write the hash function: the client should write the hash function that he/she would like to use, and then pass that hash function as a parameter into the constructor of the HashTable class. This can be implemented with function pointers. A function pointer is a pointer that holds the address of a function; the function can be called using the function pointer instead of the function name. In the declaration float (*funcptr) (string); the return type of the function that funcptr can point to is given first (in this case, the return type is a float).
The parameter list of a function that funcptr can point to is given next; in this case, there is only one parameter, of string type. The client passes the name of the hash function as a parameter into the HashTable constructor. The HashTable constructor accepts the parameter using a function pointer in that parameter location; the address of the function is saved to a function pointer in the private section. Then, the hash table can call the hash function that the client wrote by using the function pointer. The DataType for Array is LinkedList (DataType in Array is different than DataType in HashTable). It is necessary to overload the == operator for the LinkedList functions.

The hash table ADT description is very close to the list ADT description. The only items missing from the hash table ADT description are: an iterator; a function to determine when the “list” is empty; find, to determine whether an element is in the “list”; and a current position. If we had all of these, we would have a fast list (or an enhanced hash table). Everything would be easy to implement for the hash table except the iterator. The iterator is an important part of a “list”, so we would like it to be as fast as possible. We can iterate through a collision list, but finding the next collision list to iterate through might not be so fast. Instead, we can have a linked list run through the collision lists, so that the linked list contains every element; then, iterating through the linked list is simple and fast.

List ADT complexities: insert is Θ( 1 ); each step of the iterator is Θ( 1 ); find locates an element by hashing, so it is Θ( 1 ) for uniform hashing; retrieve is Θ( 1 ) for uniform hashing; replace is Θ( 1 ) using the current position; determining whether or not the list is empty is Θ( 1 ), because we just test the linked list to see if it is empty; emptying out the list is Θ( n ). For remove, to make this last operation as fast as possible, consider using a doubly-linked list for the linked list.
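The function-pointer technique described above might be sketched like this. The class shape, member names, and the example clientHash are illustrative assumptions, not the course's exact class; only the mechanism (saving the client's function pointer and calling through it) is the point.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Sketch of a chaining hash table that accepts the client's hash
// function through a function pointer in the constructor.
class HashTable {
public:
    HashTable(int (*hf)(const std::string&), std::size_t size)
        : hashfunc(hf), table(size) {}

    void insert(const std::string& key) {
        // call the client's hash function through the pointer
        std::size_t index = static_cast<std::size_t>(hashfunc(key)) % table.size();
        table[index].push_front(key);    // insert at the BEGINNING of the list
    }

    bool find(const std::string& key) const {
        std::size_t index = static_cast<std::size_t>(hashfunc(key)) % table.size();
        for (const std::string& k : table[index])
            if (k == key) return true;
        return false;
    }

private:
    int (*hashfunc)(const std::string&);      // saved in the private section
    std::vector<std::list<std::string>> table;
};

// The client's hash function, passed by name into the constructor.
int clientHash(const std::string& key) {
    int h = 0;
    for (char c : key) h = h * 31 + c;
    return h < 0 ? -h : h;
}
```

The client would write `HashTable ht(clientHash, 101);` and the table calls clientHash without ever knowing its name.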
We’ve seen that a node in a doubly-linked list can be removed in Θ( 1 ) time, given a pointer to it. Using hashing, we obtain a collision list, which will have a pointer to the node we wish to remove.

Doubly-Linked List ADT

To avoid special cases, we’ll have a header node and a trailer node in the doubly-linked list. Few data structures use arrays of doubly-linked lists; if such a use arises, we could create a doubly-linked list without header and trailer nodes to avoid wasting memory. Each node has three pointers (not just one). Remove takes Θ( 1 ) time; insert takes Θ( n ).

Memory Considerations

If there is only one node in the collision list on average, then the LinkedList used for each element also has two pointers, start and current, giving us a total of 20 bytes of wasted memory per node (housekeeping). If each element is 20 bytes, 50% of memory is wasted; however, if each element is 980 bytes, only 2% of memory is wasted. Element size is an important consideration.

HashTable Changes

The collision list will become a specialized “helper” class for the doubly-linked list: we change the name of the class from LinkedList to CollisionList, modify the Node struct so it has the three pointers we need, add a function to retrieve the current pointer, called getcurrent, and, instead of eliminating the current position when we insert a node, set the current position to the node we inserted. We’ll make some changes to the HashTable class, too. It will, again, be a specialized “helper” class just for use in the doubly-linked list: we rename the HashTable class to DLHashTable, short for “Doubly-Linked Hash Table”; keep the location, used throughout the class, as a private data member; and add a function getcurrent, which retrieves the current pointer of the CollisionList that was used in the last use of location, that is, it returns table[location].getcurrent(). We’ll also need some functions which are convenient for the copy constructor and deepCopy: gethashfunc, which will return the hash function pointer;
sethashfunc, which will set a hash function pointer; getsize, which will get the Array size of the hash table; and changeSize, which will change the Array size of the hash table. We don’t really need the header and trailer pointers, but without them the code may be confusing. When current is set to the trailer node, it means there is no current position.

Cuckoo Hashing

The cuckoo hashing algorithm itself is simple: to insert a new item, x, first make sure it is not already there. We can then use the first hash function, and if the (first) table location is empty, the item can be placed. Fortunately, if the table’s load factor is below 0.5, an analysis shows that the probability of a cycle is very low, that the expected number of displacements is a small constant, and that it is extremely unlikely that a successful insertion would require more than O(log N) displacements. As such, we can simply rebuild the tables with new hash functions after a certain number of displacements are detected. More precisely, the probability that a single insertion would require a new set of hash functions can be made to be O(1/N²); the new hash functions themselves generate N more insertions to rebuild the table, but even so, this means the rebuilding cost is minimal. However, if the table’s load factor is at 0.5 or higher, then the probability of a cycle becomes drastically higher, and this scheme is unlikely to work well at all.

Furthermore, although the insertion time is expected to be constant as long as the load factor is below 1/2, the bound that has been shown for the expected insertion cost for classic cuckoo hashing with two separate tables (both with load factor λ) is roughly 1/(1 − (4λ²)^(1/3)), which deteriorates rapidly as the load factor gets close to 1/2 (the formula itself makes no sense when λ equals or exceeds 1/2). Using lower load factors or more than two hash functions seems like a reasonable alternative.
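A minimal sketch of the two-table cuckoo insertion loop described above, assuming integer keys. The struct, the two toy hash functions, and the displacement cap are all illustrative assumptions; a real implementation would draw new random hash functions and rebuild when the cap is hit, whereas this sketch only reports that a rebuild is needed.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Sketch of cuckoo hashing with two tables and two hash functions.
struct CuckooTable {
    std::vector<std::optional<int>> t1, t2;
    explicit CuckooTable(std::size_t n) : t1(n), t2(n) {}  // n > 0 assumed

    // Toy hash functions for illustration only.
    std::size_t h1(int x) const { return static_cast<std::size_t>(x) % t1.size(); }
    std::size_t h2(int x) const { return (static_cast<std::size_t>(x) / t1.size()) % t2.size(); }

    bool contains(int x) const { return t1[h1(x)] == x || t2[h2(x)] == x; }

    // Returns false when too many displacements occur: a probable
    // cycle, so the tables should be rebuilt with new hash functions.
    bool insert(int x) {
        if (contains(x)) return true;        // first make sure it is not already there
        const int maxDisplacements = 32;     // displacements are O(log N) in expectation
        int cur = x;
        for (int d = 0; d < maxDisplacements; ++d) {
            std::size_t i = h1(cur);
            if (!t1[i]) { t1[i] = cur; return true; }  // first table slot empty
            std::swap(cur, *t1[i]);                    // displace the occupant
            std::size_t j = h2(cur);
            if (!t2[j]) { t2[j] = cur; return true; }  // second table slot empty
            std::swap(cur, *t2[j]);                    // displace again, keep looping
        }
        return false;                        // rebuild with new hash functions
    }
};
```

Keeping the load factor below 0.5, as discussed above, is what keeps this displacement loop short in practice.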