Data Structures and Algorithms: Complexity, Hashing, and Trees
Algorithm Analysis: Time, Space, and Efficiency
Algorithm analysis evaluates how efficient an algorithm is by measuring its resource usage. The three main resources analyzed are:
Programming Requirements
This refers to how easy or complex it is to implement the algorithm in code. It involves understanding the logic, writing the correct syntax, and debugging. Simpler algorithms may require less programming effort but might be less efficient.
Time Requirements (Time Complexity)
This measures how much time an algorithm takes to complete as the input size grows. It is usually expressed using Big O notation (e.g., O(n), O(log n)) and helps predict the speed of an algorithm.
Space Requirements (Space Complexity)
This measures how much memory an algorithm needs during its execution, including input storage, temporary variables, and recursion stack. It also uses Big O notation (e.g., O(1), O(n)).
Types of Algorithm Complexity Analysis
Best Case Complexity: The minimum time or space required by the algorithm for any input of size n.
Worst Case Complexity: The maximum time or space required, ensuring performance guarantees.
Average Case Complexity: The expected time or space for a typical input.
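For example, a linear search over an array shows all three cases: the best case is O(1) when the target is the first element, while the worst and average cases are O(n). A minimal C sketch (the array contents are illustrative):

```c
#include <stdio.h>

/* Linear search: returns the index of target, or -1 if not found.
   Best case O(1) (target at index 0), worst/average case O(n).
   Space complexity is O(1): only a loop counter is used. */
int linear_search(const int arr[], int n, int target) {
    for (int i = 0; i < n; i++) {
        if (arr[i] == target)
            return i;
    }
    return -1;
}

int main(void) {
    int data[] = {4, 8, 15, 16, 23, 42};
    int n = sizeof data / sizeof data[0];
    printf("best case:  index %d\n", linear_search(data, n, 4));   /* 1 comparison */
    printf("worst case: index %d\n", linear_search(data, n, 99));  /* n comparisons */
    return 0;
}
```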
Fundamental Data Structures and Classification
Classification of Data Structures
Data structures are broadly classified into two main categories:
1. Primitive Data Structures
These are the basic data types provided by a programming language. They are directly operated upon by machine-level instructions.
Examples: Integer, Float, Character, Boolean, etc.
2. Non-Primitive Data Structures
These are more complex data structures that are built using primitive data types. They are further classified into:
Linear Data Structures
Elements are arranged in a sequential manner. Each element is connected to its previous and next element.
Examples: Array, Linked List, Stack, Queue.
Non-Linear Data Structures
Elements are arranged in a hierarchical or interconnected manner, not sequentially.
Examples: Tree, Graph.
Arrays and Stack Operations
Advantages and Disadvantages of Arrays
Advantages of Arrays:
Simple and easy to use for storing multiple elements of the same type.
Fast access to elements using index (constant time O(1)).
Efficient memory utilization when the number of elements is known in advance.
Supports random access which helps in quick searching and sorting.
Disadvantages of Arrays:
Fixed size — the size must be declared before use and cannot be changed dynamically.
Insertion and deletion are costly as they may require shifting elements (O(n) time).
Wastes memory if allocated size is much larger than the number of elements stored.
Does not support flexible memory management like linked lists.
Stack Operations (LIFO)
A stack follows the Last-In, First-Out (LIFO) principle.
1. PUSH Operation (Insertion)
Adds an element to the top of the stack.
Steps:
Check if the stack is full (stack overflow).
If not full, increment the top pointer.
Insert the new element at the top position.
2. POP Operation (Deletion)
Removes the element from the top of the stack.
Steps:
Check if the stack is empty (stack underflow).
If not empty, retrieve the element at the top.
Decrement the top pointer to remove the element.
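The two operations as a minimal array-based C sketch (assuming a fixed capacity MAX and an integer stack; the names are illustrative):

```c
#include <stdio.h>

#define MAX 100

int stack[MAX];
int top = -1;           /* -1 means the stack is empty */

/* PUSH: check for overflow, increment top, store the element. */
int push(int value) {
    if (top == MAX - 1) {
        printf("Stack overflow\n");
        return 0;
    }
    stack[++top] = value;
    return 1;
}

/* POP: check for underflow, read the top element, decrement top. */
int pop(int *value) {
    if (top == -1) {
        printf("Stack underflow\n");
        return 0;
    }
    *value = stack[top--];
    return 1;
}

int main(void) {
    int v;
    push(10);
    push(20);
    if (pop(&v)) printf("Popped %d\n", v);   /* prints 20 (LIFO) */
    return 0;
}
```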
Queues and Deques
Circular Queue Implementation
A Circular Queue is a linear data structure that follows the FIFO (First In First Out) principle but connects the end of the queue back to the front, forming a circle. This allows efficient use of storage by reusing spaces freed by dequeued elements.
Unlike a simple linear queue, it avoids the “queue full” problem when there is free space at the front.
It uses two pointers: Front (points to the first element) and Rear (points to the last element).
Algorithm to Insert an Element Data into a Circular Queue
Preconditions: Let MAX be the size of the queue array, let queue be the array that stores the elements, and let front and rear be integer pointers initialized to -1 when the queue is empty.
Check if the queue is full: if (rear + 1) % MAX == front, the queue is full and insertion is not possible; report overflow.
If the queue is empty (front == -1), set front = 0 and rear = 0.
Otherwise, update rear = (rear + 1) % MAX to move the rear pointer circularly.
Insert the new element: queue[rear] = Data.
End of insertion.
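The same steps as a C sketch (an integer circular queue of size MAX; a matching dequeue is included for completeness):

```c
#include <stdio.h>

#define MAX 5

int queue[MAX];
int front = -1, rear = -1;

/* Insert data at the rear, wrapping around with modulo arithmetic. */
int enqueue(int data) {
    if ((rear + 1) % MAX == front) {        /* queue full */
        printf("Queue overflow\n");
        return 0;
    }
    if (front == -1)                        /* queue was empty */
        front = 0;
    rear = (rear + 1) % MAX;                /* -1 -> 0 on the first insert */
    queue[rear] = data;
    return 1;
}

/* Remove the element at the front; reset both pointers when the queue empties. */
int dequeue(int *data) {
    if (front == -1) {                      /* queue empty */
        printf("Queue underflow\n");
        return 0;
    }
    *data = queue[front];
    if (front == rear)                      /* last element removed */
        front = rear = -1;
    else
        front = (front + 1) % MAX;
    return 1;
}
```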
Priority Queue
A Priority Queue is an abstract data type similar to a regular queue, but with an added feature: each element has a priority associated with it.
Key Points of Priority Queues
Elements are served based on priority, not just the order they arrive.
The element with the highest priority is dequeued before others.
If two elements have the same priority, they are served in the order they arrived (depending on implementation).
Can be implemented using arrays, linked lists, or more efficiently with heaps (especially binary heaps).
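A compact sketch of a heap-based priority queue in C, assuming the common convention that a smaller number means a higher priority (a binary min-heap); both operations run in O(log n):

```c
#include <stdio.h>

#define CAP 100

int heap[CAP];
int size = 0;

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Insert: append at the end, then sift up while smaller than the parent. */
void pq_insert(int priority) {
    int i = size++;
    heap[i] = priority;
    while (i > 0 && heap[i] < heap[(i - 1) / 2]) {
        swap(&heap[i], &heap[(i - 1) / 2]);
        i = (i - 1) / 2;
    }
}

/* Remove the highest-priority (smallest) element: take the root, move the
   last element to the root, then sift it down. Caller ensures size > 0. */
int pq_extract_min(void) {
    int min = heap[0];
    heap[0] = heap[--size];
    int i = 0;
    for (;;) {
        int l = 2 * i + 1, r = 2 * i + 2, smallest = i;
        if (l < size && heap[l] < heap[smallest]) smallest = l;
        if (r < size && heap[r] < heap[smallest]) smallest = r;
        if (smallest == i) break;
        swap(&heap[i], &heap[smallest]);
        i = smallest;
    }
    return min;
}
```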
Priority Queue Applications
Task scheduling in operating systems.
Handling emergencies in hospital triage systems.
Managing bandwidth in network routers.
Algorithms like Dijkstra’s shortest path use priority queues.
Queue vs. Deque Comparison
| Aspect | Queue (FIFO) | Deque (Double-Ended Queue) |
|---|---|---|
| Definition | A linear data structure that follows FIFO (First In First Out) order. | A linear data structure where insertion and deletion can happen at both ends. |
| Insertion | Insert only at the rear (enqueue). | Insert at both front and rear ends. |
| Deletion | Delete only from the front (dequeue). | Delete from both front and rear ends. |
| Operations | Two main operations: enqueue and dequeue. | Four operations: insert front, insert rear, delete front, delete rear. |
| Use Cases | Scheduling, buffering, resource management. | More flexible; used in palindrome checking and undo functionality. |
Linked Lists: Types and Features
Definition of a Linked List
A Linked List is a linear data structure where elements, called nodes, are stored in a sequence. Each node contains two parts:
Data: The value stored in the node.
Pointer (or Link): A reference to the next node in the sequence.
Unlike arrays, linked lists do not store elements in contiguous memory locations, allowing dynamic memory allocation and efficient insertions/deletions.
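A node and a traversal as a C sketch (names are illustrative):

```c
#include <stdio.h>
#include <stdlib.h>

/* One node of a singly linked list: data plus a link to the next node. */
struct Node {
    int data;
    struct Node *next;
};

/* Insert a new node at the front of the list; returns the new head. */
struct Node *push_front(struct Node *head, int value) {
    struct Node *node = malloc(sizeof *node);
    node->data = value;
    node->next = head;
    return node;
}

/* Visit every node by following the next pointers until NULL. */
void traverse(const struct Node *head) {
    for (const struct Node *p = head; p != NULL; p = p->next)
        printf("%d -> ", p->data);
    printf("NULL\n");
}
```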
Types of Linked Lists
Singly Linked List
Each node has a pointer to the next node only.
Traversal is one-way (forward).
Doubly Linked List (Two-Way Linked List)
A Doubly Linked List is a type of linked list in which each node contains three fields:
Data — stores the actual value.
Next Pointer — points to the next node in the list.
Previous Pointer — points to the previous node in the list.
Key Features of Doubly Linked Lists
Allows traversal in both directions: forward and backward.
Easier to implement operations like insertion and deletion from both ends or at any position, compared to a singly linked list.
Requires extra memory to store the previous pointer in each node.
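A sketch of the node layout and an insert-at-front that maintains both links (illustrative C):

```c
#include <stdlib.h>

/* Doubly linked list node: data plus links to both neighbours. */
struct DNode {
    int data;
    struct DNode *prev;
    struct DNode *next;
};

/* Insert at the front: the new node becomes the head, and the old
   head's prev pointer is updated to point back to it. */
struct DNode *insert_front(struct DNode *head, int value) {
    struct DNode *node = malloc(sizeof *node);
    node->data = value;
    node->prev = NULL;
    node->next = head;
    if (head != NULL)
        head->prev = node;
    return node;
}
```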
Circular Linked List
The last node points back to the first node, forming a circle.
Can be singly or doubly linked.
Header Linked List
A Header Linked List is a type of linked list where each list has a special node called a header node. This header node contains information about the list such as the count of nodes or pointers to the first and last nodes. The header node itself may or may not hold actual data.
Features of Header Linked List
Simplifies list operations by storing metadata in the header node.
Helps in easily managing the list size or other attributes.
The header node acts as the starting point of the list.
Categories of Header Linked List
Single Header Linked List: Has a single header node pointing to the first node of the list. The header stores information like the count of elements.
Double Header Linked List: Has two header nodes, typically pointing to the first and last nodes. Useful for efficient operations at both ends (like in a double-ended queue).
Circular Header Linked List: The list is circular and has a header node. The last node points back to the header node or the first node, enabling circular traversal.
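A minimal sketch of a single header linked list in C, where the header node stores the element count (names are illustrative):

```c
#include <stdlib.h>

struct Node {
    int data;
    struct Node *next;
};

/* The header node carries metadata (here, a count) instead of list data. */
struct HeaderList {
    int count;
    struct Node *first;
};

/* Insert at the front and keep the count in the header up to date. */
void hl_insert_front(struct HeaderList *list, int value) {
    struct Node *node = malloc(sizeof *node);
    node->data = value;
    node->next = list->first;
    list->first = node;
    list->count++;
}
```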
Applications of Linked Lists
Dynamic Memory Allocation: Easily grows or shrinks during program execution.
Implementing Stacks and Queues: Using linked lists provides efficient insertion and deletion.
Graph Adjacency Lists: Represent graphs efficiently.
Undo/Redo Functionality: Doubly linked lists help move forward and backward through states.
Navigation Systems: Like browser history where back and forward moves are needed.
Trees and Graphs
AVL Tree (Self-Balancing Binary Search Tree)
An AVL Tree is a type of self-balancing binary search tree named after its inventors Adelson-Velsky and Landis.
Key Features of AVL Trees
For every node in the tree, the height difference (balance factor) between its left and right subtrees is at most 1.
The balance factor = height of left subtree − height of right subtree.
If the balance factor becomes less than -1 or greater than 1 after insertion or deletion, the tree is rebalanced using rotations (single or double rotations).
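A sketch of the balance factor computation and a right rotation (used when a subtree becomes left-heavy); a full AVL insertion also needs the mirrored left rotation and the double-rotation cases:

```c
#include <stdlib.h>

struct AVLNode {
    int key;
    int height;                 /* height of the subtree rooted here */
    struct AVLNode *left, *right;
};

static int height(const struct AVLNode *n) { return n ? n->height : 0; }
static int max(int a, int b) { return a > b ? a : b; }

/* Balance factor = height(left subtree) - height(right subtree). */
static int balance_factor(const struct AVLNode *n) {
    return n ? height(n->left) - height(n->right) : 0;
}

/* Right rotation around y: y's left child x becomes the subtree root. */
static struct AVLNode *rotate_right(struct AVLNode *y) {
    struct AVLNode *x = y->left;
    struct AVLNode *t2 = x->right;

    x->right = y;
    y->left = t2;

    y->height = 1 + max(height(y->left), height(y->right));
    x->height = 1 + max(height(x->left), height(x->right));
    return x;                   /* new root of this subtree */
}
```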
Advantages of AVL Trees
Guarantees O(log n) time complexity for search, insertion, and deletion.
Maintains balanced height, improving efficiency compared to ordinary binary search trees which can become skewed.
AVL Tree Applications
Used in databases and file systems where quick lookups, insertions, and deletions are required.
Graph Traversal Algorithms
Depth-First Search (DFS) Traversal Algorithm
Start at the given vertex, mark it as visited, and process it.
Recursively visit all unvisited adjacent vertices of the current vertex.
Repeat step 2 until all vertices reachable from the starting vertex are visited.
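The same traversal as recursive C code, using an adjacency matrix and a visited array (a sketch for a small graph with V vertices; the example edges are illustrative):

```c
#include <stdio.h>

#define V 5

int adj[V][V];          /* adjacency matrix: adj[u][v] = 1 if edge u-v exists */
int visited[V];

/* Visit u, then recursively visit every unvisited neighbour of u. */
void dfs(int u) {
    visited[u] = 1;
    printf("%d ", u);                   /* process the vertex */
    for (int v = 0; v < V; v++) {
        if (adj[u][v] && !visited[v])
            dfs(v);
    }
}

int main(void) {
    /* small example graph: 0-1, 0-2, 1-3, 2-4 */
    adj[0][1] = adj[1][0] = 1;
    adj[0][2] = adj[2][0] = 1;
    adj[1][3] = adj[3][1] = 1;
    adj[2][4] = adj[4][2] = 1;
    dfs(0);                             /* prints 0 1 3 2 4 */
    printf("\n");
    return 0;
}
```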
BFS vs. DFS Comparison
| Aspect | BFS (Breadth-First Search) | DFS (Depth-First Search) |
|---|---|---|
| Traversal Order | Explores all neighbors at the current level before moving to the next level (level by level). | Explores as far as possible along one branch before backtracking. |
| Data Structure Used | Queue | Stack (or recursion) |
| Use Case | Finds shortest paths in unweighted graphs. | Used for pathfinding, topological sorting, and cycle detection. |
| Memory Usage | Often uses more memory (the queue holds a whole frontier of vertices). | Often uses less memory (follows one path deeply). |
| Implementation | Iterative | Recursive or iterative |
Types of Graphs
Weakly Connected Graph: A directed graph where replacing all edges with undirected edges makes it connected.
Strongly Connected Graph: A directed graph where there is a path from every vertex to every other vertex.
Multigraph: A graph that may have multiple edges between the same pair of vertices.
Unconnected (Disconnected) Graph: A graph in which at least two vertices have no path connecting them.
Weighted Graph: A graph where each edge has an associated numerical value (weight).
Hashing and Compression
What is a Hash Function?
A Hash Function is a function that takes input data (called a key) and converts it into a fixed-size integer value called a hash code or hash value. This value is used to index into a hash table, allowing for fast data retrieval.
Properties of a Good Hash Function
Deterministic: The same input key should always produce the same hash value.
Uniform Distribution: It should distribute hash values uniformly across the hash table to minimize collisions.
Fast Computation: The function should be efficient to compute to ensure quick access.
Minimizes Collisions: Different keys should rarely produce the same hash value (collision).
Avalanche Effect: A small change in the input key should produce a significantly different hash value.
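A simple illustrative string hash function in C; the multiplier 31 and the prime table size are common but arbitrary choices:

```c
#include <stdio.h>

#define TABLE_SIZE 101      /* a prime table size helps spread the values */

/* Polynomial rolling hash: deterministic, cheap to compute, and
   reasonably uniform for short strings. */
unsigned int hash(const char *key) {
    unsigned int h = 0;
    while (*key) {
        h = h * 31 + (unsigned char)*key++;
    }
    return h % TABLE_SIZE;  /* map to a valid table index */
}

int main(void) {
    printf("index of \"apple\":  %u\n", hash("apple"));
    printf("index of \"apples\": %u\n", hash("apples"));   /* small change, different index */
    return 0;
}
```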
Applications of Hash Functions
Hash Tables: Used for efficient data retrieval in databases, caches, and associative arrays.
Data Integrity: In checksums and digital signatures to verify data authenticity.
Cryptography: Cryptographic hash functions (such as the SHA family; older ones like MD5 are no longer considered secure) protect passwords and sensitive data.
Caching: Quickly locate stored data.
Load Balancing: Distribute requests evenly across servers.
Collision Resolution Techniques
When two different keys map to the same hash index, a collision occurs. These techniques resolve collisions:
Double Hashing
Double hashing is a collision resolution technique where two different hash functions are used. When a collision occurs, the second hash function computes a step size to find the next available slot, reducing clustering and improving distribution.
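The probe sequence can be sketched as follows, assuming two independent hash functions hash1 and hash2 where hash2 never returns 0 (names and constants are illustrative):

```c
/* Double hashing: the i-th probe for a key is
   index_i = (hash1(key) + i * hash2(key)) % TABLE_SIZE
   hash2 supplies a key-dependent step size, so keys that collide on
   hash1 still follow different probe sequences, reducing clustering. */

#define TABLE_SIZE 101

unsigned int hash1(unsigned int key) { return key % TABLE_SIZE; }

/* Must never return 0, otherwise probing would not move at all. */
unsigned int hash2(unsigned int key) { return 7 - (key % 7); }

unsigned int probe(unsigned int key, unsigned int i) {
    return (hash1(key) + i * hash2(key)) % TABLE_SIZE;
}
```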
Bucket Hashing
Bucket hashing stores multiple elements in each slot of the hash table using a bucket (a small container like a linked list or an array). This way, collisions are handled by grouping all colliding keys in the same bucket. It is efficient when the number of collisions is small and supports dynamic growth of buckets.
Bucket Hashing Applications
Used in database indexing and situations where collisions are frequent but manageable.
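A sketch of bucket hashing with linked-list buckets (separate chaining); colliding keys simply share a bucket:

```c
#include <stdio.h>
#include <stdlib.h>

#define BUCKETS 10

struct Entry {
    int key;
    struct Entry *next;
};

struct Entry *table[BUCKETS];   /* each slot is the head of a bucket list */

/* Insert: hash the key to a bucket and prepend it to that bucket's list. */
void insert(int key) {
    int b = key % BUCKETS;
    struct Entry *e = malloc(sizeof *e);
    e->key = key;
    e->next = table[b];
    table[b] = e;
}

/* Search: only the entries in one bucket need to be scanned. */
int contains(int key) {
    for (struct Entry *e = table[key % BUCKETS]; e; e = e->next)
        if (e->key == key)
            return 1;
    return 0;
}
```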
Huffman Algorithm for Data Compression
The Huffman algorithm is a greedy algorithm used for lossless data compression. It assigns variable-length binary codes to the input characters, giving shorter codes to more frequent characters, which reduces the total number of bits needed to represent the data.
Steps of Huffman Algorithm
Calculate frequency of each character in the input.
Create leaf nodes for each character and build a priority queue (min-heap) based on frequencies.
Extract two nodes with the smallest frequencies and combine them into a new node with frequency equal to their sum.
Insert the new node back into the priority queue.
Repeat steps 3 and 4 until only one node remains — this becomes the root of the Huffman tree.
Assign binary codes by traversing the tree: left edge = 0, right edge = 1.
Huffman Example
Given characters with frequencies: A: 5, B: 9, C: 12, D: 13, E: 16, F: 45.
Combine smallest: A(5) + B(9) = 14
New frequencies: 14, 12, 13, 16, 45
Combine smallest: 12 + 13 = 25
New frequencies: 14, 16, 25, 45
Combine smallest: 14 + 16 = 30
New frequencies: 25, 30, 45
Combine smallest: 25 + 30 = 55
New frequencies: 45, 55
Combine smallest: 45 + 55 = 100. Only one node remains, so it becomes the root of the Huffman tree.
Reading codes off the tree (left = 0, right = 1), the most frequent character F receives a 1-bit code, while the least frequent characters A and B receive 4-bit codes.
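A compact C sketch of the whole procedure for the frequencies above. A linear scan for the two minima stands in for the min-heap, which is fine for a handful of symbols; the exact codes printed depend on tie-breaking, but the code lengths match the example:

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct HNode {
    char ch;                    /* '\0' for internal (combined) nodes */
    int freq;
    struct HNode *left, *right;
} HNode;

static HNode *new_node(char ch, int freq, HNode *l, HNode *r) {
    HNode *n = malloc(sizeof *n);
    n->ch = ch; n->freq = freq; n->left = l; n->right = r;
    return n;
}

/* Repeatedly combine the two lowest-frequency nodes until one remains. */
static HNode *build_tree(HNode *nodes[], int n) {
    while (n > 1) {
        int a = 0, b = 1, t;
        if (nodes[b]->freq < nodes[a]->freq) { t = a; a = b; b = t; }
        for (int i = 2; i < n; i++) {           /* find the two minima */
            if (nodes[i]->freq < nodes[a]->freq)      { b = a; a = i; }
            else if (nodes[i]->freq < nodes[b]->freq) { b = i; }
        }
        HNode *merged = new_node('\0', nodes[a]->freq + nodes[b]->freq,
                                 nodes[a], nodes[b]);
        if (a > b) { t = a; a = b; b = t; }
        nodes[a] = merged;                      /* replace one slot with the merged node */
        nodes[b] = nodes[n - 1];                /* drop the other by moving the last in */
        n--;
    }
    return nodes[0];
}

/* Walk the tree: left edge = 0, right edge = 1; leaves carry the characters. */
static void print_codes(const HNode *t, char *buf, int depth) {
    if (!t->left && !t->right) {
        buf[depth] = '\0';
        printf("%c: %s\n", t->ch, buf);
        return;
    }
    buf[depth] = '0'; print_codes(t->left,  buf, depth + 1);
    buf[depth] = '1'; print_codes(t->right, buf, depth + 1);
}

int main(void) {
    char chars[] = {'A', 'B', 'C', 'D', 'E', 'F'};
    int  freqs[] = {5, 9, 12, 13, 16, 45};
    HNode *nodes[6];
    char buf[16];
    for (int i = 0; i < 6; i++)
        nodes[i] = new_node(chars[i], freqs[i], NULL, NULL);
    print_codes(build_tree(nodes, 6), buf, 0);
    return 0;
}
```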
Specialized Data Structures and Techniques
Sparse Matrix Representation
A Sparse Matrix is a matrix in which most of the elements are zero. Typically, if more than half of the elements are zero, the matrix is considered sparse.
Concept and Storage Efficiency
Storing all elements, including zeros, wastes memory. Special techniques store only the non-zero elements and their positions to save space.
Common storage methods include Coordinate List (COO) or Compressed Sparse Row (CSR) format.
Sparse Matrix Example
Consider a 4×4 matrix:
[0 0 3 0]
[0 0 0 0]
[0 7 0 0]
[0 0 0 0]
Here, only 2 out of 16 elements are non-zero.
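A sketch of the coordinate (COO) triplet representation for this matrix; only the non-zero entries and their positions are stored:

```c
#include <stdio.h>

/* One triplet per non-zero element: (row, column, value). */
struct Triplet {
    int row, col, value;
};

int main(void) {
    /* The 4x4 matrix above has only two non-zero elements,
       so two triplets replace sixteen stored values. */
    struct Triplet sparse[] = {
        {0, 2, 3},      /* row 0, column 2, value 3 */
        {2, 1, 7}       /* row 2, column 1, value 7 */
    };
    int count = sizeof sparse / sizeof sparse[0];

    for (int i = 0; i < count; i++)
        printf("(%d, %d) = %d\n", sparse[i].row, sparse[i].col, sparse[i].value);
    return 0;
}
```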
Sparse Matrix Applications
Used in scientific computing, graph algorithms, and machine learning where large datasets often contain many zeros.
Recursion
Recursion is a programming technique where a function calls itself to solve a smaller instance of the same problem until it reaches a base case (a condition where the recursion stops).
Recursion Example
Calculating the factorial of a number n:
factorial(n) = n * factorial(n - 1), with factorial(0) = 1 (base case)
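In C, with the base case stopping the recursion:

```c
#include <stdio.h>

/* Each call works on a smaller n until the base case n == 0 is reached. */
unsigned long factorial(unsigned int n) {
    if (n == 0)                     /* base case */
        return 1;
    return n * factorial(n - 1);    /* recursive case */
}

int main(void) {
    printf("5! = %lu\n", factorial(5));     /* prints 120 */
    return 0;
}
```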
Demerits of Recursion
High Memory Usage: Each recursive call adds a new layer to the call stack, consuming more memory. This can lead to stack overflow if recursion is too deep.
Slower Execution: Function calls have overhead. Recursive solutions can be slower than iterative ones because of this overhead and repeated function calls.
Complex Debugging: Recursive code can be harder to debug and understand, especially if the recursion logic is complex.
Risk of Infinite Recursion: If the base case is not properly defined or never reached, the recursion continues indefinitely, eventually causing a stack overflow and a program crash.
Search Algorithm Comparison
| Aspect | Linear Search | Binary Search |
|---|---|---|
| Definition | Searches elements sequentially one by one. | Searches by repeatedly dividing the sorted list in half. |
| Data Requirement | Works on both sorted and unsorted lists. | Requires the list to be sorted beforehand. |
| Search Method | Checks each element until the target is found or the list ends. | Compares the target with the middle element and narrows the search to one half. |
| Time Complexity | O(n): worst case, may need to check all elements. | O(log n): halves the search space each time. |
| Efficiency | Less efficient for large data sets. | Much more efficient for large, sorted data. |
| Implementation | Simple to implement. | Slightly more complex due to recursion or a loop with indices. |
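An iterative binary search in C for comparison (the array must already be sorted; values are illustrative):

```c
#include <stdio.h>

/* Returns the index of target in a sorted array, or -1 if absent.
   Each iteration halves the remaining search range: O(log n). */
int binary_search(const int arr[], int n, int target) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   /* avoids overflow of (low + high) */
        if (arr[mid] == target)
            return mid;
        else if (arr[mid] < target)
            low = mid + 1;
        else
            high = mid - 1;
    }
    return -1;
}

int main(void) {
    int data[] = {2, 5, 8, 12, 16, 23, 38, 56};
    int n = sizeof data / sizeof data[0];
    printf("index of 23: %d\n", binary_search(data, n, 23));   /* prints 5 */
    return 0;
}
```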