Data Structures and Algorithms Concepts Explained

1. Algorithm Fundamentals

Definition and Characteristics of an Algorithm

An algorithm is a structured step-by-step procedure designed to solve a specific problem efficiently and correctly.

  • Input Requirement: It may accept zero or more input values that provide necessary data for producing meaningful results.
  • Output Requirement: It always produces at least one definite output representing the final answer of the computation.
  • Finiteness Property: Every valid algorithm must complete execution after a limited number of well-defined steps.
  • Clarity Property: All steps must be simple, clear, and unambiguous so anyone can correctly understand and implement them.

2. Asymptotic Notations

Types of Asymptotic Notations

  • Big-O Notation: Represents worst-case running time showing the maximum growth rate of an algorithm as input increases.
  • Omega (Ω) Notation: Represents the best-case running time and shows the minimum possible time for any input.
  • Theta (Θ) Notation: Describes a tight bound where the upper and lower bounds grow at the same rate, pinning down the growth of the function exactly.
  • Little-o Notation: Shows a strictly smaller upper bound indicating slower growth compared to another function.
  • Little-omega (ω): Shows a strictly larger lower bound indicating faster growth than the compared function.
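
For example, 3n² + 5n is O(n²): for every n ≥ 5, 3n² + 5n ≤ 4n², so a constant multiple of n² eventually bounds the function from above. The same function is also Ω(n²) (it is at least 3n² for all n), and therefore Θ(n²).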

3. Data Structures Basics

Definition and Types of Data Structures

A data structure is a method of organizing and storing data so operations become easy and efficient.

  • Linear Structures: Store data in sequential order like arrays, linked lists, stacks, and queues.
  • Non-linear Structures: Store data hierarchically such as trees and graphs with multiple possible paths.
  • Primitive Types: Include basic built-in types like integers, characters, floats, and booleans.
  • Derived Structures: Formed from primitives, including arrays, files, structures, lists, and classes.

Characteristics of an Array

  • Fixed Size: Array length must be declared at creation and cannot grow or shrink dynamically during execution.
  • Same Type Elements: All elements in the array must belong to the same data type for uniform storage.
  • Index-Based Access: Direct access to elements using index makes reading and modification extremely fast.
  • Contiguous Memory: Elements stored in continuous memory locations enabling fast memory access operations.
  • Slow Modifications: Insertion and deletion are slow because shifting multiple elements becomes necessary.

Array vs. Linked List Comparison

  • Size Type: An array is a fixed-size structure requiring a predefined length before use, while a linked list grows or shrinks dynamically as nodes are added or removed.
  • Memory Layout: Array elements are stored in contiguous memory, enabling direct index access; linked list nodes are stored in scattered memory blocks connected through pointers.
  • Access Time: Arrays offer fast O(1) random access by index; linked lists require slow sequential access from the head node until the target is found.
  • Insert/Delete: Array insertions and deletions are slow because elements must be shifted; linked list operations are fast because only pointers are modified.
  • Memory Efficiency: An array can waste memory when it is not completely filled; a linked list allocates only the nodes it actually needs, at the cost of extra pointer storage per node.

Linear vs. Non-linear Data Structures

  • Linear Structures: Store data in a straight sequence where every element has a unique previous and next element.
  • Examples of Linear: Arrays, linked lists, stacks, and queues which support predictable traversal.
  • Non-linear Structures: Store data in branching form allowing multiple relationships and multiple access paths.
  • Examples of Non-linear: Trees and graphs representing hierarchical or interconnected data structures.
  • Usage Difference: Linear used for simple storage; non-linear used for complex relationship modeling.

Self-Referential Structure Definition

A self-referential structure contains one or more pointers that reference another variable of the same structure type.

  • Purpose: Enables building dynamic data structures such as linked lists, trees, and graphs using the same format.
  • Basic Example: struct Node { int data; struct Node* next; }; represents a simple self-referential structure.
  • Usage: Forms backbone of many advanced structures including queues, stacks, binary trees, and adjacency lists.
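
A minimal C sketch expanding the struct example above; the two-node linking in main is illustrative only, not part of the original notes:

    #include <stdio.h>

    struct Node {
        int data;
        struct Node *next;   /* pointer to another object of the same type */
    };

    int main(void) {
        struct Node second = { 20, NULL };
        struct Node first  = { 10, &second };   /* first links to second */
        printf("%d -> %d\n", first.data, first.next->data);   /* prints 10 -> 20 */
        return 0;
    }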

4. Stacks and Queues

Stack vs. Queue Comparison

  • Order Rule: A stack follows LIFO, so the last inserted item leaves first; a queue follows FIFO, so the first inserted element leaves first.
  • Insertion Point: Stack elements are added on top; queue elements are added at the rear.
  • Deletion Point: Stack items are removed from the top only; queue items are removed from the front.
  • Usage: Stacks are used for recursion, expression evaluation, and undo operations; queues are used for scheduling, buffering, and managing waiting lines.

Linked List Operations

  • Algorithm to Add Node at End: Create node, traverse to last node, set last node’s next pointer to new node, set new node’s next to NULL.
  • Algorithm to Delete Last Node: Traverse to second-last node, set its next pointer to NULL, and free the memory of the former last node.
  • Algorithm to Insert at Beginning: Create new node, set new node’s next to current head, and update head to point to the new node (O(1) time).
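
A minimal C sketch of the add-at-end and insert-at-beginning operations described above; the function names append and prepend and the use of malloc are illustrative choices, not taken from the notes:

    #include <stdlib.h>

    struct Node { int data; struct Node *next; };

    /* Append: walk to the last node and hang the new node off its next pointer. */
    struct Node *append(struct Node *head, int value) {
        struct Node *n = malloc(sizeof(struct Node));
        n->data = value;
        n->next = NULL;
        if (head == NULL) return n;              /* empty list: new node is the head */
        struct Node *cur = head;
        while (cur->next != NULL) cur = cur->next;
        cur->next = n;
        return head;
    }

    /* Insert at beginning: O(1) because only the head pointer changes. */
    struct Node *prepend(struct Node *head, int value) {
        struct Node *n = malloc(sizeof(struct Node));
        n->data = value;
        n->next = head;
        return n;
    }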

Polynomial Representation using Linked List

Each node stores coefficient, exponent, and pointer to the next term, allowing dynamic manipulation of polynomial terms.
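
One possible node layout for such a term in C (the field names coeff and exp are illustrative):

    /* One term of a polynomial: coefficient, exponent, and link to the next term.
       e.g. 4x^3 + 2x + 5 becomes three nodes: (4,3) -> (2,1) -> (5,0). */
    struct Term {
        int coeff;
        int exp;
        struct Term *next;
    };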

5. Recursion and Sorting

Recursion vs. Iteration

  • Definition: Recursion is a function calling itself until a base condition stops the process; iteration is a loop repeatedly executing statements until a condition becomes false.
  • Memory Usage: Recursion requires stack memory for each call, increasing memory consumption; iteration uses constant memory because loop variables remain fixed.
  • Execution Speed: Recursion is slower due to function-call overhead and repeated stack management; iteration is faster because the loop runs without extra function calls.
  • Stopping Condition: Recursion must include a base case to prevent infinite recursive calls; iteration stops when the loop condition fails.

Recursive Functions Example

Factorial: fact(n) = 1 if n==0 else n * fact(n-1).

Summation: sum(n) = 0 if n==0 else n + sum(n-1).
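
The two recursive definitions above translate directly into C; this sketch is illustrative, with long used to reduce the risk of early overflow:

    #include <stdio.h>

    /* fact(n) = 1 if n == 0, else n * fact(n - 1) */
    long fact(int n) {
        if (n == 0) return 1;          /* base case stops the recursion */
        return (long)n * fact(n - 1);
    }

    /* sum(n) = 0 if n == 0, else n + sum(n - 1) */
    long sum(int n) {
        if (n == 0) return 0;
        return n + sum(n - 1);
    }

    int main(void) {
        printf("%ld %ld\n", fact(5), sum(5));   /* prints 120 15 */
        return 0;
    }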

Sorting Algorithm Analysis

Bubble Sort on [5, 3, 8, 2, 1]
  • Pass 1: [3, 5, 2, 1, 8]
  • Pass 2: [3, 2, 1, 5, 8]
  • Pass 3: [2, 1, 3, 5, 8]
  • Pass 4: [1, 2, 3, 5, 8] (Sorted)
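
A straightforward C implementation of the bubble sort traced above (an unoptimized version without the early-exit check):

    #include <stdio.h>

    /* Bubble sort: repeatedly swap adjacent out-of-order pairs;
       after pass i the largest remaining element is in its final slot. */
    void bubble_sort(int a[], int n) {
        for (int i = 0; i < n - 1; i++) {
            for (int j = 0; j < n - 1 - i; j++) {
                if (a[j] > a[j + 1]) {
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                }
            }
        }
    }

    int main(void) {
        int a[] = {5, 3, 8, 2, 1};
        bubble_sort(a, 5);
        for (int i = 0; i < 5; i++) printf("%d ", a[i]);   /* prints 1 2 3 5 8 */
        printf("\n");
        return 0;
    }
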
Selection Sort on [4, 9, 0, 3, 1, 6, 2, 8, -2] after 4 Passes
  • Pass 1 (Min -2): [-2, 9, 0, 3, 1, 6, 2, 8, 4]
  • Pass 2 (Min 0): [-2, 0, 9, 3, 1, 6, 2, 8, 4]
  • Pass 3 (Min 1): [-2, 0, 1, 3, 9, 6, 2, 8, 4]
  • Pass 4 (Min 2): [-2, 0, 1, 2, 9, 6, 3, 8, 4]
Sort Comparison: Bubble Sort vs. Merge Sort
  • Concept: Bubble sort repeatedly swaps adjacent elements until the list is sorted; merge sort uses divide-and-conquer, repeatedly splitting the list and merging sorted sublists.
  • Time Complexity: Bubble sort takes slow O(n²) time for nearly all inputs; merge sort takes fast O(n log n) time, suitable for large datasets.
  • Memory Use: Bubble sort works in place with constant extra memory; merge sort requires extra memory proportional to the list size.
  • Stability: Both are stable, preserving the relative order of equal elements.

Internal vs. External Sort: Internal sorting fits in RAM; External sorting requires disk access for data exceeding memory.

6. Expression Evaluation

Infix to Postfix Conversion Algorithm

  1. Scan expression left to right.
  2. Add operands directly to postfix output.
  3. Before pushing an operator, pop operators of equal or higher precedence from the stack to the output (respecting associativity), then push the new operator.
  4. Handle parentheses: push '(' onto the stack; when ')' is read, pop operators to the output until the matching '(' is removed.
  5. Pop remaining operators from stack to postfix string at the end.
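
A compact C sketch of the conversion steps above, assuming single-character operands and only the operators + - * /; the helper name prec and the fixed-size stack are illustrative choices:

    #include <stdio.h>
    #include <ctype.h>

    /* Operator precedence: higher value binds tighter. */
    static int prec(char op) {
        if (op == '+' || op == '-') return 1;
        if (op == '*' || op == '/') return 2;
        return 0;
    }

    /* Convert an infix expression of single-character tokens to postfix. */
    void infix_to_postfix(const char *infix, char *postfix) {
        char stack[100];
        int top = -1, j = 0;

        for (int i = 0; infix[i] != '\0'; i++) {
            char c = infix[i];
            if (isalnum((unsigned char)c)) {
                postfix[j++] = c;                 /* operands go straight to output */
            } else if (c == '(') {
                stack[++top] = c;
            } else if (c == ')') {
                while (top >= 0 && stack[top] != '(')
                    postfix[j++] = stack[top--];  /* pop until matching '(' */
                if (top >= 0) top--;              /* discard '(' */
            } else {                              /* operator */
                while (top >= 0 && prec(stack[top]) >= prec(c))
                    postfix[j++] = stack[top--];
                stack[++top] = c;
            }
        }
        while (top >= 0)
            postfix[j++] = stack[top--];          /* flush remaining operators */
        postfix[j] = '\0';
    }

    int main(void) {
        char out[100];
        infix_to_postfix("a+b*(c-d)", out);
        printf("%s\n", out);                      /* prints abcd-*+ */
        return 0;
    }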

Postfix Evaluation

Postfix evaluation works left-to-right using a stack: push operands, and when an operator appears, pop required operands, compute, and push the result back.
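
A minimal C sketch of this stack-based evaluation, assuming single-digit operands packed into a string; the example in main corresponds to the worked postfix expression below (using * for ×):

    #include <stdio.h>
    #include <ctype.h>

    /* Evaluate a postfix expression of single-digit operands, e.g. "623+-382/+*". */
    int eval_postfix(const char *expr) {
        int stack[100], top = -1;

        for (int i = 0; expr[i] != '\0'; i++) {
            char c = expr[i];
            if (isdigit((unsigned char)c)) {
                stack[++top] = c - '0';           /* push operand */
            } else {
                int b = stack[top--];             /* right operand */
                int a = stack[top--];             /* left operand  */
                switch (c) {
                    case '+': stack[++top] = a + b; break;
                    case '-': stack[++top] = a - b; break;
                    case '*': stack[++top] = a * b; break;
                    case '/': stack[++top] = a / b; break;
                }
            }
        }
        return stack[top];                        /* final result */
    }

    int main(void) {
        printf("%d\n", eval_postfix("623+-382/+*"));   /* prints 7 */
        return 0;
    }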

Evaluate Postfix: 6 2 3 + - 3 8 2 / + ×
  1. Push 6, 2, 3; then 2 + 3 = 5 (Stack: [6, 5])
  2. 6 – 5 = 1 (Stack: [1])
  3. Push 3, 8, 2; then 8 / 2 = 4 (Stack: [1, 3, 4])
  4. 3 + 4 = 7 (Stack: [1, 7])
  5. 1 × 7 = 7 (Final Result)
Evaluate Prefix: - 10 / * 3 4 2 (Processed Right-to-Left)
  1. 3 × 4 = 12
  2. 12 / 2 = 6
  3. 10 – 6 = 4 (Final Result)
Evaluate Postfix: 5 3 + 7 2 - × 4 / 6 + 1 - (equivalent infix: ((5 + 3) × (7 - 2)) / 4 + 6 - 1)
  1. 5 + 3 = 8
  2. 7 – 2 = 5
  3. 8 × 5 = 40
  4. 40 / 4 = 10
  5. 10 + 6 = 16
  6. 16 – 1 = 15 (Final Result)

7. Trees and Graphs

Binary Search Tree (BST) Concept

A BST is a binary tree where every left child value is smaller than the parent and every right child value is larger, enabling O(log n) searching when the tree is kept balanced.
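
A minimal C sketch of BST insertion and search following this ordering rule (sending duplicate keys to the right subtree here is one common convention, assumed for the sketch):

    #include <stdlib.h>

    struct Node {
        int data;
        struct Node *left, *right;
    };

    struct Node *insert(struct Node *root, int value) {
        if (root == NULL) {
            struct Node *n = malloc(sizeof(struct Node));
            n->data = value;
            n->left = n->right = NULL;
            return n;
        }
        if (value < root->data)
            root->left = insert(root->left, value);    /* smaller keys go left  */
        else
            root->right = insert(root->right, value);  /* larger keys go right  */
        return root;
    }

    struct Node *search(struct Node *root, int value) {
        if (root == NULL || root->data == value)
            return root;
        return value < root->data ? search(root->left, value)
                                  : search(root->right, value);
    }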

BST Traversal Comparison

  • Visiting Order: Preorder visits the root first, then the left subtree, then the right subtree; inorder visits the left subtree, then the root, then the right subtree.
  • Root Position: In preorder output the root appears at the very beginning; in inorder output the root appears between the left and right subtree elements.
  • Expression Use: Preorder generates prefix expressions used in expression tree evaluation; inorder produces infix expressions and yields sorted order in BSTs.

Tree Traversal Comparison

  • Order: Preorder visits the root before its subtrees, exploring left first and then right; inorder visits the left subtree, the root, then the right subtree; postorder visits the left subtree, then the right subtree, finishing with the root last.
  • Main Use: Preorder is used for prefix expression generation and tree copying; inorder produces sorted order in binary search trees; postorder is used for postfix expression evaluation and deleting whole trees.
  • Design Strategy: Preorder is a root-first strategy helpful for cloning structures; inorder gives ordered traversal output; postorder is a leaf-first approach especially helpful for safe deletion.
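
The three traversals compared above, written as short recursive C functions over the same kind of node (the struct is repeated so the sketch stands alone):

    #include <stdio.h>

    struct Node { int data; struct Node *left, *right; };

    /* Root, left, right. */
    void preorder(struct Node *t)  { if (t) { printf("%d ", t->data); preorder(t->left); preorder(t->right); } }
    /* Left, root, right: prints a BST in sorted order. */
    void inorder(struct Node *t)   { if (t) { inorder(t->left); printf("%d ", t->data); inorder(t->right); } }
    /* Left, right, root: children are handled before the parent. */
    void postorder(struct Node *t) { if (t) { postorder(t->left); postorder(t->right); printf("%d ", t->data); } }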

AVL Tree Concepts

  • Purpose: AVL trees maintain strict height balance (O(log n)) after every insertion/deletion to ensure fast searching.
  • Balance Factor: BF = height(left subtree) – height(right subtree); must be in {-1, 0, 1}.
  • Imbalance: Occurs when BF falls outside {-1, 0, 1}, requiring rotations (LL, RR, LR, RL) to restore balance.
  • Advantage over BST: Guarantees O(log n) search time, preventing worst-case O(n) skewing.
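
A small C sketch of the balance-factor computation defined above; recomputing heights recursively is simple but slow, and real AVL implementations usually store the height in each node:

    struct Node { int data; struct Node *left, *right; };

    /* Height of a subtree, counted in levels (empty tree = 0). */
    int height(struct Node *t) {
        if (t == NULL) return 0;
        int hl = height(t->left), hr = height(t->right);
        return 1 + (hl > hr ? hl : hr);
    }

    /* BF = height(left subtree) - height(right subtree); must stay in {-1, 0, 1}. */
    int balance_factor(struct Node *t) {
        return t == NULL ? 0 : height(t->left) - height(t->right);
    }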

BST Deletion Steps

  1. Node Search: Find the target node using normal BST search rules comparing values.
  2. Leaf Node: If node has no children, remove it directly without further modifications.
  3. Single Child Case: Replace the node with its child to maintain BST ordering properties.
  4. Two Children Case: Replace node’s value with inorder successor to maintain proper value ordering.
  5. Final Cleanup: Delete the successor from its original position to complete deletion process.
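
A C sketch of the deletion steps above, using the inorder successor for the two-children case; the helper name min_node is illustrative:

    #include <stdlib.h>

    struct Node { int data; struct Node *left, *right; };

    /* Smallest node in a subtree: the inorder successor of a node
       with two children lives at the leftmost node of its right subtree. */
    static struct Node *min_node(struct Node *t) {
        while (t->left != NULL) t = t->left;
        return t;
    }

    struct Node *delete_node(struct Node *root, int key) {
        if (root == NULL) return NULL;

        if (key < root->data) {
            root->left = delete_node(root->left, key);
        } else if (key > root->data) {
            root->right = delete_node(root->right, key);
        } else {
            /* Leaf or single-child cases: splice the node out. */
            if (root->left == NULL) {
                struct Node *r = root->right;
                free(root);
                return r;
            }
            if (root->right == NULL) {
                struct Node *l = root->left;
                free(root);
                return l;
            }
            /* Two children: copy the inorder successor's value, then delete it. */
            struct Node *succ = min_node(root->right);
            root->data = succ->data;
            root->right = delete_node(root->right, succ->data);
        }
        return root;
    }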

Expression Tree

A special binary tree where leaves are operands and internal nodes are operators, allowing easy evaluation and precedence handling.

8. Graph Theory Fundamentals

Graph Definitions

  • Directed Graph: Contains edges that have direction, meaning connections follow arrow-like paths from one node to another.
  • Undirected Graph: Contains edges without directions where connections allow two-way movement naturally.

Walk vs. Path

  • Node Repetition: A walk may repeat vertices or edges any number of times; a path does not allow repetition of vertices or edges.
  • Strictness: A walk is less strict and allows flexible movement across the graph; a path is stricter, requiring a unique sequence of vertices.
  • Traversal Style: Walks describe general exploration of a graph without specific rules; paths describe the shortest or simplest connecting route between vertices.

Regular Graph Example

A regular graph is one where every vertex has exactly the same degree (number of connecting edges). Example: A triangle graph is 2-regular.

Graph Representations Comparison

  • Space Usage: An adjacency matrix requires O(V²) memory, storing all possible edges even when most are absent; an adjacency list requires O(V + E) memory, storing only the edges that exist.
  • Edge Checking: The matrix allows a very fast O(1) edge-existence check through constant-time lookup; the list requires an O(degree) scan to check for a specific edge.
  • Graph Type Suitability: The matrix is preferred for dense graphs with many edges; the list is best for sparse graphs where the number of edges is much lower.

Graph Traversal: BFS vs. DFS

  • Traversal Order: BFS visits nodes level by level from the source, exploring shallow nodes first; DFS moves deep into a branch before backtracking, exploring one path fully before others.
  • Data Structure: BFS uses a queue to store the next nodes to visit; DFS uses a stack or recursion to manage deeper exploration.
  • Shortest Path: BFS always finds the shortest path in unweighted graphs because of its level-based traversal; DFS does not guarantee shortest paths because it explores deeper paths first.
  • Memory Usage: BFS needs more memory to hold entire levels in its queue; DFS needs less unless the recursion depth becomes very large.

BFS Explanation: Starts at a node, uses a queue to visit neighbors level-by-level, tracking visited nodes.

DFS Explanation: Starts at a node, uses a stack/recursion to explore as deep as possible along one path before backtracking.
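
A minimal C sketch of BFS over an adjacency matrix, following the queue-based description above; the graph size N = 5 and the example edges in main are arbitrary illustrative values:

    #include <stdio.h>

    #define N 5   /* number of vertices in this example graph */

    /* Breadth-first search over an adjacency matrix, printing the visit order. */
    void bfs(int adj[N][N], int start) {
        int visited[N] = {0};
        int queue[N], front = 0, rear = 0;

        visited[start] = 1;
        queue[rear++] = start;

        while (front < rear) {
            int v = queue[front++];            /* dequeue next vertex */
            printf("%d ", v);
            for (int u = 0; u < N; u++) {
                if (adj[v][u] && !visited[u]) {
                    visited[u] = 1;            /* mark before enqueueing */
                    queue[rear++] = u;
                }
            }
        }
    }

    int main(void) {
        /* Example undirected graph: edges 0-1, 0-2, 1-3, 2-4 */
        int adj[N][N] = {
            {0, 1, 1, 0, 0},
            {1, 0, 0, 1, 0},
            {1, 0, 0, 0, 1},
            {0, 1, 0, 0, 0},
            {0, 0, 1, 0, 0}
        };
        bfs(adj, 0);                           /* prints 0 1 2 3 4 */
        printf("\n");
        return 0;
    }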

Real-World Applications of Graph Theory

  • Social Networks: Connecting users and analyzing relationships (e.g., Facebook).
  • Navigation Systems: Computing shortest and fastest possible routes (e.g., Google Maps).
  • Computer Networks: Packet routing and network optimization using communication links.
  • Search Engines: PageRank uses graph algorithms to rank web pages based on links.
  • Project Scheduling: PERT and CPM use directed graphs for task dependencies and timelines.

9. Hashing

Hashing Definition and Collision Resolution

Hashing converts large keys into small, fixed-size indexes using a hash function for fast O(1) average access.

Collision Resolution Techniques
  • Chaining: Stores multiple colliding keys in a linked list at the same bucket index.
  • Linear Probing: Searches sequentially for the next available slot upon collision.
  • Quadratic Probing: Uses squared increments (1², 2², 3², …) to find the next slot, reducing primary clustering.
  • Double Hashing: Uses a second hash function to determine the step size for probing.
Hashing Example (Linear Probing)

Insert key 103 into table size 13 (h(k) = k mod 13):

  1. h(103) = 103 mod 13 = 12. (Collision if index 12 is full).
  2. Probe next: Index (12 + 1) mod 13 = 0. (Collision if index 0 is full).
  3. Probe next: Index (0 + 1) mod 13 = 1. Insert at index 1 if empty.
Hashing Example (h(key) = key % 7, Linear Probing)

Keys: 37, 38, 72, 48, 27, 11, 56

  • 37 → 2 (Index 2)
  • 38 → 3 (Index 3)
  • 72 → 2 (Collision at 2; index 3 also occupied; insert at Index 4)
  • 48 → 6 (Index 6)
  • 27 → 6 (Collision at 6, probe 0, insert at Index 0)
  • 11 → 4 (Collision at 4, probe 5, insert at Index 5)
  • 56 → 0 (Collision at 0, probe 1, insert at Index 1)
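
A small C sketch reproducing the linear-probing insertions traced above, with table size 7 and h(k) = k % 7; the sentinel value EMPTY = -1 is an illustrative choice:

    #include <stdio.h>

    #define SIZE 7
    #define EMPTY -1

    /* Insert into an open-addressing table with linear probing, h(k) = k % SIZE.
       The table must still contain at least one empty slot. */
    void insert_key(int table[SIZE], int key) {
        int idx = key % SIZE;
        while (table[idx] != EMPTY)            /* collision: try the next slot */
            idx = (idx + 1) % SIZE;
        table[idx] = key;
    }

    int main(void) {
        int table[SIZE];
        int keys[] = {37, 38, 72, 48, 27, 11, 56};

        for (int i = 0; i < SIZE; i++) table[i] = EMPTY;
        for (int i = 0; i < 7; i++) insert_key(table, keys[i]);

        for (int i = 0; i < SIZE; i++)
            printf("index %d -> %d\n", i, table[i]);   /* matches the trace above */
        return 0;
    }
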
Chain Length Analysis (Buckets = 10, Total Keys = 9)
  • Average Chain Length: Total Keys / Buckets = 9/10 = 0.9.
  • Meaning: On average, each bucket holds less than one key, showing low collision chances.

10. Tree and Graph Performance Analysis

DFS vs. BFS Performance Analysis

  • Time Complexity: Both DFS and BFS run in O(V + E) time, processing every vertex and connecting edge.
  • Memory Usage: DFS uses recursion-stack memory proportional to the depth, O(h); BFS needs queue memory proportional to the widest level, O(w).
  • Performance Outcome: DFS performs better on deep, narrow graphs; BFS suits wide, shallow structures more effectively.

AVL Tree vs. B-Tree Performance

  • Search Efficiency: Both guarantee O(log n) search time. B-trees minimize disk I/O due to high node capacity.
  • Insertion Behavior: AVL uses rotations; B-trees use splitting/merging, optimized for disk block size.
  • Usage: AVL is ideal for in-memory structures; B-tree is essential for disk-based databases.

11. Specific Tree Structures

B-Tree Indexing Rationale

B-trees are used for database indexing because they minimize disk reads by storing multiple keys per wide node, keeping the tree shallow and balanced.

B-Tree Construction (Order 3)

Each node holds max 2 keys and 3 children. Insertion involves sequential addition, splitting overflowing nodes, and promoting the middle key to the parent.

AVL Tree Height Calculation (12 Nodes)

The maximum height of an AVL tree with n = 12 nodes is 5 (counting height as the number of levels): the minimum node counts for heights 1 through 6 are 1, 2, 4, 7, 12, and 20, so twelve nodes are just enough to reach height 5 but not height 6.
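
A short C sketch of the underlying recurrence, assuming height is counted in levels: N(h) = N(h-1) + N(h-2) + 1 gives the minimum number of nodes an AVL tree of height h can contain:

    #include <stdio.h>

    /* Minimum nodes in an AVL tree of height h (levels): N(1)=1, N(2)=2,
       N(h) = N(h-1) + N(h-2) + 1. */
    int min_nodes(int h) {
        if (h <= 0) return 0;
        if (h == 1) return 1;
        if (h == 2) return 2;
        return min_nodes(h - 1) + min_nodes(h - 2) + 1;
    }

    int main(void) {
        /* min_nodes(5) = 12, so 12 nodes are just enough for height 5,
           while height 6 would need min_nodes(6) = 20 nodes. */
        for (int h = 1; h <= 6; h++)
            printf("h=%d -> %d nodes\n", h, min_nodes(h));
        return 0;
    }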

BST Performance Analysis

  • Best Case: O(log n) when the tree is balanced.
  • Worst Case: O(n) when the tree becomes skewed (like a linked list).
  • Efficiency: Performance is entirely dependent on maintaining tree height.