Data Structures and Algorithms Concepts Explained

1. Algorithm Fundamentals

Definition and Characteristics of an Algorithm

An algorithm is a structured step-by-step procedure designed to solve a specific problem efficiently and correctly.

  • Input Requirement: It may accept zero or more input values that provide necessary data for producing meaningful results.
  • Output Requirement: It always produces at least one definite output representing the final answer of the computation.
  • Finiteness Property: Every valid algorithm must complete execution after a limited number of well-defined steps.
  • Clarity Property: All steps must be simple, clear, and unambiguous so anyone can correctly understand and implement them.

2. Asymptotic Notations

Types of Asymptotic Notations

  • Big-O Notation: Represents worst-case running time showing the maximum growth rate of an algorithm as input increases.
  • Omega (Ω) Notation: Represents the best-case running time and shows the minimum possible time for any input.
  • Theta (Θ) Notation: Describes a tight bound where the upper and lower bounds grow at the same rate, pinning down the growth of the function exactly.
  • Little-o Notation: Shows a strictly smaller upper bound indicating slower growth compared to another function.
  • Little-omega (ω): Shows a strictly larger lower bound indicating faster growth than the compared function.
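
For example, 3n² + 5n is O(n²): for every n ≥ 5, 3n² + 5n ≤ 4n², so a constant multiple of n² eventually bounds the function from above. The same function is also Ω(n²) (it is at least 3n² for all n), and therefore Θ(n²).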

3. Data Structures Basics

Definition and Types of Data Structures

A data structure is a method of organizing and storing data so operations become easy and efficient.

  • Linear Structures: Store data in sequential order like arrays, linked lists, stacks, and queues.
  • Non-linear Structures: Store data hierarchically such as trees and graphs with multiple possible paths.
  • Primitive Types: Include basic built-in types like integers, characters, floats, and booleans.
  • Derived Structures: Formed from primitives, including arrays, files, structures, lists, and classes.

Characteristics of an Array

  • Fixed Size: Array length must be declared at creation and cannot grow or shrink dynamically during execution.
  • Same Type Elements: All elements in the array must belong to the same data type for uniform storage.
  • Index-Based Access: Direct access to elements using index makes reading and modification extremely fast.
  • Contiguous Memory: Elements stored in continuous memory locations enabling fast memory access operations.
  • Slow Modifications: Insertion and deletion are slow because shifting multiple elements becomes necessary.

Array vs. Linked List Comparison

  • Size Type: An array is a fixed-size structure requiring a predefined length before use, while a linked list grows or shrinks dynamically as nodes are added or removed.
  • Memory Layout: Array elements are stored in contiguous memory, enabling direct index access; linked list nodes are stored in scattered memory blocks connected through pointers.
  • Access Time: Arrays offer fast O(1) random access by index; linked lists require slow sequential access from the head node until the target is found.
  • Insert/Delete: Array insertions and deletions are slow because elements must be shifted; linked list operations are fast because only pointers are modified.
  • Memory Efficiency: An array can waste memory when it is not completely filled; a linked list allocates only the nodes it actually needs, at the cost of extra pointer storage per node.

Linear vs. Non-linear Data Structures

  • Linear Structures: Store data in a straight sequence where every element has a unique previous and next element.
  • Examples of Linear: Arrays, linked lists, stacks, and queues which support predictable traversal.
  • Non-linear Structures: Store data in branching form allowing multiple relationships and multiple access paths.
  • Examples of Non-linear: Trees and graphs representing hierarchical or interconnected data structures.
  • Usage Difference: Linear used for simple storage; non-linear used for complex relationship modeling.

Self-Referential Structure Definition

A self-referential structure contains one or more pointers that reference another variable of the same structure type.

  • Purpose: Enables building dynamic data structures such as linked lists, trees, and graphs using the same format.
  • Basic Example: struct Node { int data; struct Node* next; }; represents a simple self-referential structure.
  • Usage: Forms backbone of many advanced structures including queues, stacks, binary trees, and adjacency lists.
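
A minimal C sketch expanding the struct example above; the two-node linking in main is illustrative only, not part of the original notes:

    #include <stdio.h>

    struct Node {
        int data;
        struct Node *next;   /* pointer to another object of the same type */
    };

    int main(void) {
        struct Node second = { 20, NULL };
        struct Node first  = { 10, &second };   /* first links to second */
        printf("%d -> %d\n", first.data, first.next->data);   /* prints 10 -> 20 */
        return 0;
    }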

4. Stacks and Queues

Stack vs. Queue Comparison

  • Order Rule: A stack follows LIFO, so the last inserted item leaves first; a queue follows FIFO, so the first inserted element leaves first.
  • Insertion Point: Stack elements are added on top; queue elements are added at the rear.
  • Deletion Point: Stack items are removed from the top only; queue items are removed from the front.
  • Usage: Stacks are used for recursion, expression evaluation, and undo operations; queues are used for scheduling, buffering, and managing waiting lines.

Linked List Operations

  • Algorithm to Add Node at End: Create node, traverse to last node, set last node’s next pointer to new node, set new node’s next to NULL.
  • Algorithm to Delete Last Node: Traverse to second-last node, set its next pointer to NULL, and free the memory of the former last node.
  • Algorithm to Insert at Beginning: Create new node, set new node’s next to current head, and update head to point to the new node (O(1) time).
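
A minimal C sketch of the add-at-end and insert-at-beginning operations described above; the function names append and prepend and the use of malloc are illustrative choices, not taken from the notes:

    #include <stdlib.h>

    struct Node { int data; struct Node *next; };

    /* Append: walk to the last node and hang the new node off its next pointer. */
    struct Node *append(struct Node *head, int value) {
        struct Node *n = malloc(sizeof(struct Node));
        n->data = value;
        n->next = NULL;
        if (head == NULL) return n;              /* empty list: new node is the head */
        struct Node *cur = head;
        while (cur->next != NULL) cur = cur->next;
        cur->next = n;
        return head;
    }

    /* Insert at beginning: O(1) because only the head pointer changes. */
    struct Node *prepend(struct Node *head, int value) {
        struct Node *n = malloc(sizeof(struct Node));
        n->data = value;
        n->next = head;
        return n;
    }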

Polynomial Representation using Linked List

Each node stores coefficient, exponent, and pointer to the next term, allowing dynamic manipulation of polynomial terms.
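
One possible node layout for such a term in C (the field names coeff and exp are illustrative):

    /* One term of a polynomial: coefficient, exponent, and link to the next term.
       e.g. 4x^3 + 2x + 5 becomes three nodes: (4,3) -> (2,1) -> (5,0). */
    struct Term {
        int coeff;
        int exp;
        struct Term *next;
    };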

5. Recursion and Sorting

Recursion vs. Iteration

  • Definition: Recursion is a function calling itself until a base condition stops the process; iteration is a loop repeatedly executing statements until a condition becomes false.
  • Memory Usage: Recursion requires stack memory for each call, increasing memory consumption; iteration uses constant memory because loop variables remain fixed.
  • Execution Speed: Recursion is slower due to function-call overhead and repeated stack management; iteration is faster because the loop runs without extra function calls.
  • Stopping Condition: Recursion must include a base case to prevent infinite recursive calls; iteration stops when the loop condition fails.

Recursive Functions Example

Factorial: fact(n) = 1 if n==0 else n * fact(n-1).

Summation: sum(n) = 0 if n==0 else n + sum(n-1).
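
The two recursive definitions above translate directly into C; this sketch is illustrative, with long used to reduce the risk of early overflow:

    #include <stdio.h>

    /* fact(n) = 1 if n == 0, else n * fact(n - 1) */
    long fact(int n) {
        if (n == 0) return 1;          /* base case stops the recursion */
        return (long)n * fact(n - 1);
    }

    /* sum(n) = 0 if n == 0, else n + sum(n - 1) */
    long sum(int n) {
        if (n == 0) return 0;
        return n + sum(n - 1);
    }

    int main(void) {
        printf("%ld %ld\n", fact(5), sum(5));   /* prints 120 15 */
        return 0;
    }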

Sorting Algorithm Analysis

Bubble Sort on [5, 3, 8, 2, 1]
  • Pass 1: [3, 5, 2, 1, 8]
  • Pass 2: [3, 2, 1, 5, 8]
  • Pass 3: [2, 1, 3, 5, 8]
  • Pass 4: [1, 2, 3, 5, 8] (Sorted)
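
A straightforward C implementation of the bubble sort traced above (an unoptimized version without the early-exit check):

    #include <stdio.h>

    /* Bubble sort: repeatedly swap adjacent out-of-order pairs;
       after pass i the largest remaining element is in its final slot. */
    void bubble_sort(int a[], int n) {
        for (int i = 0; i < n - 1; i++) {
            for (int j = 0; j < n - 1 - i; j++) {
                if (a[j] > a[j + 1]) {
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                }
            }
        }
    }

    int main(void) {
        int a[] = {5, 3, 8, 2, 1};
        bubble_sort(a, 5);
        for (int i = 0; i < 5; i++) printf("%d ", a[i]);   /* prints 1 2 3 5 8 */
        printf("\n");
        return 0;
    }
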
Selection Sort on [4, 9, 0, 3, 1, 6, 2, 8, -2] after 4 Passes
  • Pass 1 (Min -2): [-2, 9, 0, 3, 1, 6, 2, 8, 4]
  • Pass 2 (Min 0): [-2, 0, 9, 3, 1, 6, 2, 8, 4]
  • Pass 3 (Min 1): [-2, 0, 1, 3, 9, 6, 2, 8, 4]
  • Pass 4 (Min 2): [-2, 0, 1, 2, 9, 6, 3, 8, 4]
Sort Comparison: Bubble Sort vs. Merge Sort
  • Concept: Bubble sort repeatedly swaps adjacent elements until the list is sorted; merge sort uses divide-and-conquer, repeatedly splitting the list and merging sorted sublists.
  • Time Complexity: Bubble sort takes slow O(n²) time for nearly all inputs; merge sort takes fast O(n log n) time, suitable for large datasets.
  • Memory Use: Bubble sort works in place with constant extra memory; merge sort requires extra memory proportional to the list size.
  • Stability: Both are stable, preserving the relative order of equal elements.

Internal vs. External Sort: Internal sorting fits in RAM; External sorting requires disk access for data exceeding memory.

6. Expression Evaluation

Infix to Postfix Conversion Algorithm

  1. Scan expression left to right.
  2. Add operands directly to postfix output.
  3. Before pushing an operator, pop operators of equal or higher precedence from the stack to the output (respecting associativity), then push the new operator.
  4. Handle parentheses: push '(' onto the stack; when ')' is read, pop operators to the output until the matching '(' is removed.
  5. Pop remaining operators from stack to postfix string at the end.
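
A compact C sketch of the conversion steps above, assuming single-character operands and only the operators + - * /; the helper name prec and the fixed-size stack are illustrative choices:

    #include <stdio.h>
    #include <ctype.h>

    /* Operator precedence: higher value binds tighter. */
    static int prec(char op) {
        if (op == '+' || op == '-') return 1;
        if (op == '*' || op == '/') return 2;
        return 0;
    }

    /* Convert an infix expression of single-character tokens to postfix. */
    void infix_to_postfix(const char *infix, char *postfix) {
        char stack[100];
        int top = -1, j = 0;

        for (int i = 0; infix[i] != '\0'; i++) {
            char c = infix[i];
            if (isalnum((unsigned char)c)) {
                postfix[j++] = c;                 /* operands go straight to output */
            } else if (c == '(') {
                stack[++top] = c;
            } else if (c == ')') {
                while (top >= 0 && stack[top] != '(')
                    postfix[j++] = stack[top--];  /* pop until matching '(' */
                if (top >= 0) top--;              /* discard '(' */
            } else {                              /* operator */
                while (top >= 0 && prec(stack[top]) >= prec(c))
                    postfix[j++] = stack[top--];
                stack[++top] = c;
            }
        }
        while (top >= 0)
            postfix[j++] = stack[top--];          /* flush remaining operators */
        postfix[j] = '\0';
    }

    int main(void) {
        char out[100];
        infix_to_postfix("a+b*(c-d)", out);
        printf("%s\n", out);                      /* prints abcd-*+ */
        return 0;
    }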

Postfix Evaluation

Postfix evaluation works left-to-right using a stack: push operands, and when an operator appears, pop required operands, compute, and push the result back.
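
A minimal C sketch of this stack-based evaluation, assuming single-digit operands packed into a string; the example in main corresponds to the worked postfix expression below (using * for ×):

    #include <stdio.h>
    #include <ctype.h>

    /* Evaluate a postfix expression of single-digit operands, e.g. "623+-382/+*". */
    int eval_postfix(const char *expr) {
        int stack[100], top = -1;

        for (int i = 0; expr[i] != '\0'; i++) {
            char c = expr[i];
            if (isdigit((unsigned char)c)) {
                stack[++top] = c - '0';           /* push operand */
            } else {
                int b = stack[top--];             /* right operand */
                int a = stack[top--];             /* left operand  */
                switch (c) {
                    case '+': stack[++top] = a + b; break;
                    case '-': stack[++top] = a - b; break;
                    case '*': stack[++top] = a * b; break;
                    case '/': stack[++top] = a / b; break;
                }
            }
        }
        return stack[top];                        /* final result */
    }

    int main(void) {
        printf("%d\n", eval_postfix("623+-382/+*"));   /* prints 7 */
        return 0;
    }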

Evaluate Postfix: 6 2 3 + - 3 8 2 / + ×
  1. Push 6, 2, 3; then 2 + 3 = 5 (Stack: [6, 5])
  2. 6 – 5 = 1 (Stack: [1])
  3. Push 3, 8, 2; then 8 / 2 = 4 (Stack: [1, 3, 4])
  4. 3 + 4 = 7 (Stack: [1, 7])
  5. 1 × 7 = 7 (Final Result)
Evaluate Prefix: - 10 / * 3 4 2 (Processed Right-to-Left)
  1. 3 × 4 = 12
  2. 12 / 2 = 6
  3. 10 – 6 = 4 (Final Result)
Evaluate Postfix: 5 3 + 7 2 - × 4 / 6 + 1 - (equivalent infix: ((5 + 3) × (7 - 2)) / 4 + 6 - 1)
  1. 5 + 3 = 8
  2. 7 – 2 = 5
  3. 8 × 5 = 40
  4. 40 / 4 = 10
  5. 10 + 6 = 16
  6. 16 – 1 = 15 (Final Result)

7. Trees and Graphs

Binary Search Tree (BST) Concept

A BST is a binary tree where every left child value is smaller than the parent and every right child value is larger, enabling O(log n) searching when the tree is kept balanced.
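
A minimal C sketch of BST insertion and search following this ordering rule (sending duplicate keys to the right subtree here is one common convention, assumed for the sketch):

    #include <stdlib.h>

    struct Node {
        int data;
        struct Node *left, *right;
    };

    struct Node *insert(struct Node *root, int value) {
        if (root == NULL) {
            struct Node *n = malloc(sizeof(struct Node));
            n->data = value;
            n->left = n->right = NULL;
            return n;
        }
        if (value < root->data)
            root->left = insert(root->left, value);    /* smaller keys go left  */
        else
            root->right = insert(root->right, value);  /* larger keys go right  */
        return root;
    }

    struct Node *search(struct Node *root, int value) {
        if (root == NULL || root->data == value)
            return root;
        return value < root->data ? search(root->left, value)
                                  : search(root->right, value);
    }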

BST Traversal Comparison

  • Visiting Order: Preorder visits the root first, then the left subtree, then the right subtree; inorder visits the left subtree, then the root, then the right subtree.
  • Root Position: In preorder output the root appears at the very beginning; in inorder output the root appears between the left and right subtree elements.
  • Expression Use: Preorder generates prefix expressions used in expression tree evaluation; inorder produces infix expressions and yields sorted order in BSTs.

Tree Traversal Comparison

  • Order: Preorder visits the root before its subtrees, exploring left first and then right; inorder visits the left subtree, the root, then the right subtree; postorder visits the left subtree, then the right subtree, finishing with the root last.
  • Main Use: Preorder is used for prefix expression generation and tree copying; inorder produces sorted order in binary search trees; postorder is used for postfix expression evaluation and deleting whole trees.
  • Design Strategy: Preorder is a root-first strategy helpful for cloning structures; inorder gives ordered traversal output; postorder is a leaf-first approach especially helpful for safe deletion.
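
The three traversals compared above, written as short recursive C functions over the same kind of node (the struct is repeated so the sketch stands alone):

    #include <stdio.h>

    struct Node { int data; struct Node *left, *right; };

    /* Root, left, right. */
    void preorder(struct Node *t)  { if (t) { printf("%d ", t->data); preorder(t->left); preorder(t->right); } }
    /* Left, root, right: prints a BST in sorted order. */
    void inorder(struct Node *t)   { if (t) { inorder(t->left); printf("%d ", t->data); inorder(t->right); } }
    /* Left, right, root: children are handled before the parent. */
    void postorder(struct Node *t) { if (t) { postorder(t->left); postorder(t->right); printf("%d ", t->data); } }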

AVL Tree Concepts

  • Purpose: AVL trees maintain strict height balance (O(log n)) after every insertion/deletion to ensure fast searching.
  • Balance Factor: BF = height(left subtree) – height(right subtree); must be in {-1, 0, 1}.
  • Imbalance: Occurs when BF falls outside {-1, 0, 1}, requiring rotations (LL, RR, LR, RL) to restore balance.
  • Advantage over BST: Guarantees O(log n) search time, preventing worst-case O(n) skewing.
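
A small C sketch of the balance-factor computation defined above; recomputing heights recursively is simple but slow, and real AVL implementations usually store the height in each node:

    struct Node { int data; struct Node *left, *right; };

    /* Height of a subtree, counted in levels (empty tree = 0). */
    int height(struct Node *t) {
        if (t == NULL) return 0;
        int hl = height(t->left), hr = height(t->right);
        return 1 + (hl > hr ? hl : hr);
    }

    /* BF = height(left subtree) - height(right subtree); must stay in {-1, 0, 1}. */
    int balance_factor(struct Node *t) {
        return t == NULL ? 0 : height(t->left) - height(t->right);
    }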

BST Deletion Steps

  1. Node Search: Find the target node using normal BST search rules comparing values.
  2. Leaf Node: If node has no children, remove it directly without further modifications.
  3. Single Child Case: Replace the node with its child to maintain BST ordering properties.
  4. Two Children Case: Replace node’s value with inorder successor to maintain proper value ordering.
  5. Final Cleanup: Delete the successor from its original position to complete deletion process.
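
A C sketch of the deletion steps above, using the inorder successor for the two-children case; the helper name min_node is illustrative:

    #include <stdlib.h>

    struct Node { int data; struct Node *left, *right; };

    /* Smallest node in a subtree: the inorder successor of a node
       with two children lives at the leftmost node of its right subtree. */
    static struct Node *min_node(struct Node *t) {
        while (t->left != NULL) t = t->left;
        return t;
    }

    struct Node *delete_node(struct Node *root, int key) {
        if (root == NULL) return NULL;

        if (key < root->data) {
            root->left = delete_node(root->left, key);
        } else if (key > root->data) {
            root->right = delete_node(root->right, key);
        } else {
            /* Leaf or single-child cases: splice the node out. */
            if (root->left == NULL) {
                struct Node *r = root->right;
                free(root);
                return r;
            }
            if (root->right == NULL) {
                struct Node *l = root->left;
                free(root);
                return l;
            }
            /* Two children: copy the inorder successor's value, then delete it. */
            struct Node *succ = min_node(root->right);
            root->data = succ->data;
            root->right = delete_node(root->right, succ->data);
        }
        return root;
    }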

Expression Tree

A special binary tree where leaves are operands and internal nodes are operators, allowing easy evaluation and precedence handling.

8. Graph Theory Fundamentals

Graph Definitions

  • Directed Graph: Contains edges that have direction, meaning connections follow arrow-like paths from one node to another.
  • Undirected Graph: Contains edges without directions where connections allow two-way movement naturally.

Walk vs. Path

  • Node Repetition: A walk may repeat vertices or edges any number of times; a path does not allow repetition of vertices or edges.
  • Strictness: A walk is less strict and allows flexible movement across the graph; a path is stricter, requiring a unique sequence of vertices.
  • Traversal Style: Walks describe general exploration of a graph without specific rules; paths describe the shortest or simplest connecting route between vertices.

Regular Graph Example

A regular graph is one where every vertex has exactly the same degree (number of connecting edges). Example: A triangle graph is 2-regular.

Graph Representations Comparison

  • Space Usage: An adjacency matrix requires O(V²) memory, storing all possible edges even when most are absent; an adjacency list requires O(V + E) memory, storing only the edges that exist.
  • Edge Checking: The matrix allows a very fast O(1) edge-existence check through constant-time lookup; the list requires an O(degree) scan to check for a specific edge.
  • Graph Type Suitability: The matrix is preferred for dense graphs with many edges; the list is best for sparse graphs where the number of edges is much lower.

Graph Traversal: BFS vs. DFS

  • Traversal Order: BFS visits nodes level by level from the source, exploring shallow nodes first; DFS moves deep into a branch before backtracking, exploring one path fully before others.
  • Data Structure: BFS uses a queue to store the next nodes to visit; DFS uses a stack or recursion to manage deeper exploration.
  • Shortest Path: BFS always finds the shortest path in unweighted graphs because of its level-based traversal; DFS does not guarantee shortest paths because it explores deeper paths first.
  • Memory Usage: BFS needs more memory to hold entire levels in its queue; DFS needs less unless the recursion depth becomes very large.

BFS Explanation: Starts at a node, uses a queue to visit neighbors level-by-level, tracking visited nodes.

DFS Explanation: Starts at a node, uses a stack/recursion to explore as deep as possible along one path before backtracking.
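
A minimal C sketch of BFS over an adjacency matrix, following the queue-based description above; the graph size N = 5 and the example edges in main are arbitrary illustrative values:

    #include <stdio.h>

    #define N 5   /* number of vertices in this example graph */

    /* Breadth-first search over an adjacency matrix, printing the visit order. */
    void bfs(int adj[N][N], int start) {
        int visited[N] = {0};
        int queue[N], front = 0, rear = 0;

        visited[start] = 1;
        queue[rear++] = start;

        while (front < rear) {
            int v = queue[front++];            /* dequeue next vertex */
            printf("%d ", v);
            for (int u = 0; u < N; u++) {
                if (adj[v][u] && !visited[u]) {
                    visited[u] = 1;            /* mark before enqueueing */
                    queue[rear++] = u;
                }
            }
        }
    }

    int main(void) {
        /* Example undirected graph: edges 0-1, 0-2, 1-3, 2-4 */
        int adj[N][N] = {
            {0, 1, 1, 0, 0},
            {1, 0, 0, 1, 0},
            {1, 0, 0, 0, 1},
            {0, 1, 0, 0, 0},
            {0, 0, 1, 0, 0}
        };
        bfs(adj, 0);                           /* prints 0 1 2 3 4 */
        printf("\n");
        return 0;
    }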

Real-World Applications of Graph Theory

  • Social Networks: Connecting users and analyzing relationships (e.g., Facebook).
  • Navigation Systems: Computing shortest and fastest possible routes (e.g., Google Maps).
  • Computer Networks: Packet routing and network optimization using communication links.
  • Search Engines: PageRank uses graph algorithms to rank web pages based on links.
  • Project Scheduling: PERT and CPM use directed graphs for task dependencies and timelines.

9. Hashing

Hashing Definition and Collision Resolution

Hashing converts large keys into small, fixed-size indexes using a hash function for fast O(1) average access.

Collision Resolution Techniques
  • Chaining: Stores multiple colliding keys in a linked list at the same bucket index.
  • Linear Probing: Searches sequentially for the next available slot upon collision.
  • Quadratic Probing: Uses squared increments (1², 2², 3², …) to find the next slot, reducing primary clustering.
  • Double Hashing: Uses a second hash function to determine the step size for probing.
Hashing Example (Linear Probing)

Insert key 103 into table size 13 (h(k) = k mod 13):

  1. h(103) = 103 mod 13 = 12. (Collision if index 12 is full).
  2. Probe next: Index (12 + 1) mod 13 = 0. (Collision if index 0 is full).
  3. Probe next: Index (0 + 1) mod 13 = 1. Insert at index 1 if empty.
Hashing Example (h(key) = key % 7, Linear Probing)

Keys: 37, 38, 72, 48, 27, 11, 56

  • 37 → 2 (Index 2)
  • 38 → 3 (Index 3)
  • 72 → 2 (Collision at 2; index 3 also occupied; insert at Index 4)
  • 48 → 6 (Index 6)
  • 27 → 6 (Collision at 6, probe 0, insert at Index 0)
  • 11 → 4 (Collision at 4, probe 5, insert at Index 5)
  • 56 → 0 (Collision at 0, probe 1, insert at Index 1)
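
A small C sketch reproducing the linear-probing insertions traced above, with table size 7 and h(k) = k % 7; the sentinel value EMPTY = -1 is an illustrative choice:

    #include <stdio.h>

    #define SIZE 7
    #define EMPTY -1

    /* Insert into an open-addressing table with linear probing, h(k) = k % SIZE.
       The table must still contain at least one empty slot. */
    void insert_key(int table[SIZE], int key) {
        int idx = key % SIZE;
        while (table[idx] != EMPTY)            /* collision: try the next slot */
            idx = (idx + 1) % SIZE;
        table[idx] = key;
    }

    int main(void) {
        int table[SIZE];
        int keys[] = {37, 38, 72, 48, 27, 11, 56};

        for (int i = 0; i < SIZE; i++) table[i] = EMPTY;
        for (int i = 0; i < 7; i++) insert_key(table, keys[i]);

        for (int i = 0; i < SIZE; i++)
            printf("index %d -> %d\n", i, table[i]);   /* matches the trace above */
        return 0;
    }
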
Chain Length Analysis (Buckets = 10, Total Keys = 9)
  • Average Chain Length: Total Keys / Buckets = 9/10 = 0.9.
  • Meaning: On average, each bucket holds less than one key, showing low collision chances.

10. Tree and Graph Performance Analysis

DFS vs. BFS Performance Analysis

  • Time Complexity: Both DFS and BFS run in O(V + E) time, processing every vertex and connecting edge.
  • Memory Usage: DFS uses recursion-stack memory proportional to the depth, O(h); BFS needs queue memory proportional to the widest level, O(w).
  • Performance Outcome: DFS performs better on deep, narrow graphs; BFS suits wide, shallow structures more effectively.

AVL Tree vs. B-Tree Performance

  • Search Efficiency: Both guarantee O(log n) search time. B-trees minimize disk I/O due to high node capacity.
  • Insertion Behavior: AVL uses rotations; B-trees use splitting/merging, optimized for disk block size.
  • Usage: AVL is ideal for in-memory structures; B-tree is essential for disk-based databases.

11. Specific Tree Structures

B-Tree Indexing Rationale

B-trees are used for database indexing because they minimize disk reads by storing multiple keys per wide node, keeping the tree shallow and balanced.

B-Tree Construction (Order 3)

Each node holds max 2 keys and 3 children. Insertion involves sequential addition, splitting overflowing nodes, and promoting the middle key to the parent.

AVL Tree Height Calculation (12 Nodes)

The maximum height of an AVL tree with n = 12 nodes is 5 (counting height as the number of levels): the minimum node counts for heights 1 through 6 are 1, 2, 4, 7, 12, and 20, so twelve nodes are just enough to reach height 5 but not height 6.
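
A short C sketch of the underlying recurrence, assuming height is counted in levels: N(h) = N(h-1) + N(h-2) + 1 gives the minimum number of nodes an AVL tree of height h can contain:

    #include <stdio.h>

    /* Minimum nodes in an AVL tree of height h (levels): N(1)=1, N(2)=2,
       N(h) = N(h-1) + N(h-2) + 1. */
    int min_nodes(int h) {
        if (h <= 0) return 0;
        if (h == 1) return 1;
        if (h == 2) return 2;
        return min_nodes(h - 1) + min_nodes(h - 2) + 1;
    }

    int main(void) {
        /* min_nodes(5) = 12, so 12 nodes are just enough for height 5,
           while height 6 would need min_nodes(6) = 20 nodes. */
        for (int h = 1; h <= 6; h++)
            printf("h=%d -> %d nodes\n", h, min_nodes(h));
        return 0;
    }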

BST Performance Analysis

  • Best Case: O(log n) when the tree is balanced.
  • Worst Case: O(n) when the tree becomes skewed (like a linked list).
  • Efficiency: Performance is entirely dependent on maintaining tree height.