Sorting Algorithms, Inversion Counting & Complexity

Hybrid Merge-Insertion Sort

[2-1 Hybrid Merge-Insertion Sort]

Stops the recursion once a subarray's size is ≤ k and sorts that subarray with insertion sort; MERGE is unchanged.

Pseudocode

INSERTION_SORT(A, p, r)
  for j = p+1 to r
    key = A[j]
    i = j - 1
    while i >= p and A[i] > key
      A[i+1] = A[i]
      i = i - 1
    A[i+1] = key

HYBRID_MERGE_SORT(A, p, r, k)
  if r - p + 1 <= k
    INSERTION_SORT(A, p, r)
  else if p < r
    q = floor((p + r) / 2)
    HYBRID_MERGE_SORT(A, p, q, k)
    HYBRID_MERGE_SORT(A, q + 1, r, k)
    MERGE(A, p, q, r)
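The pseudocode above translates directly to Python. The sketch below is mine (0-based inclusive indices, and an in-place no-sentinel merge standing in for MERGE); the default k = 16 is only an illustrative threshold.

```python
def insertion_sort(A, p, r):
    """Sort A[p..r] (0-based, inclusive) in place by insertion."""
    for j in range(p + 1, r + 1):
        key = A[j]
        i = j - 1
        while i >= p and A[i] > key:
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key

def merge(A, p, q, r):
    """Merge the sorted runs A[p..q] and A[q+1..r] in place (no sentinels)."""
    L, R = A[p:q + 1], A[q + 1:r + 1]
    i = j = 0
    k = p
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:
            A[k] = L[i]; i += 1
        else:
            A[k] = R[j]; j += 1
        k += 1
    A[k:r + 1] = L[i:] + R[j:]   # copy whichever run still has elements

def hybrid_merge_sort(A, p, r, k=16):
    """Merge sort that hands subarrays of size <= k to insertion sort."""
    if r - p + 1 <= k:
        insertion_sort(A, p, r)
    else:
        q = (p + r) // 2
        hybrid_merge_sort(A, p, q, k)
        hybrid_merge_sort(A, q + 1, r, k)
        merge(A, p, q, r)
```

Call as `hybrid_merge_sort(A, 0, len(A) - 1)`; the whole array is sorted in place.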

Bubble Sort

[2-2 Bubble Sort]

Repeatedly compares and swaps adjacent elements; on each pass the smallest remaining element (or the largest, depending on the scan direction) bubbles to its final position.

Pseudocode

BUBBLESORT(A)
  n = A.length
  for i = 1 to n - 1        // Outer invariant: A[1..i-1] sorted
    for j = n downto i + 1  // Inner invariant: A[j] is min in A[j..n]
      if A[j] < A[j-1]
        exchange A[j] with A[j-1]
  return A
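The same routine in Python (0-based, a sketch of mine), with the inner pass scanning right-to-left as in the pseudocode so the minimum of the unsorted suffix floats down to position i:

```python
def bubble_sort(A):
    """CLRS-style bubble sort: each inner pass moves the minimum of the
    unsorted suffix A[i..n-1] down to position i (0-based). In place."""
    n = len(A)
    for i in range(n - 1):                 # A[0..i-1] is sorted and final
        for j in range(n - 1, i, -1):      # A[j] becomes the min of A[j..n-1]
            if A[j] < A[j - 1]:
                A[j], A[j - 1] = A[j - 1], A[j]
    return A
```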

Counting Inversions with Merge Sort

[2-4 Counting Inversions with Merge Sort]

Concept: An inversion is a pair of indices (i, j) such that i < j but A[i] > A[j]. Inversions measure how “unsorted” an array is.

Part A: Listing Inversions

Method: Manually check every pair (i, j) where i < j. Example: For <2, 3, 8, 6, 1>, inversions are (3,4) [8 > 6], (1,5) [2 > 1], (2,5) [3 > 1], (3,5) [8 > 1], (4,5) [6 > 1].
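The pair-by-pair check is easy to mechanize. The helper below is mine (1-based indices in the output, to match the example's notation):

```python
def list_inversions(A):
    """List every inversion (i, j), 1-based, with i < j and A[i] > A[j]."""
    n = len(A)
    return [(i + 1, j + 1)
            for i in range(n)
            for j in range(i + 1, n)
            if A[i] > A[j]]
```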

Part B: Array with Most Inversions

Array: the reverse-sorted array <n, n-1, …, 1>.

Count: Every pair is an inversion. The total count is C(n, 2) = n(n-1)/2.

Part C: Inversions & Insertion Sort Runtime Relationship

The running time of Insertion Sort is Θ(n + I), where I is the number of inversions. Justification: each shift performed by the inner while loop moves the key past exactly one larger element, removing exactly one inversion, so the total number of shifts equals I; the remaining per-element work is Θ(n).
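The shifts-equal-inversions claim can be tested empirically; the helper names below are mine:

```python
def insertion_sort_shift_count(A):
    """Insertion-sort a copy of A; return how many shifts the inner loop makes."""
    A = list(A)
    shifts = 0
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]        # one shift = one inversion removed
            i -= 1
            shifts += 1
        A[i + 1] = key
    return shifts

def inversions_brute(A):
    """Theta(n^2) inversion count by checking every pair."""
    return sum(1 for i in range(len(A))
                 for j in range(i + 1, len(A))
                 if A[i] > A[j])
```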

Part D: Θ(n log n) Algorithm to Count Inversions

Modify Merge Sort. Logic: The total number of inversions = (inversions in left half) + (inversions in right half) + (split inversions).

How to Count Split Inversions: During the MERGE step, when you copy an element from the right subarray (R) to the final array, the number of elements remaining in the left subarray (L) equals the number of new split inversions you have just found. Add this number to your total count.

MERGE_AND_COUNT_SPLIT_INV

MERGE_AND_COUNT_SPLIT_INV(A, p, q, r)
  n1 = q - p + 1
  n2 = r - q
  Create L[1..n1+1], R[1..n2+1]
  for i = 1 to n1
    L[i] = A[p + i - 1]
  for j = 1 to n2
    R[j] = A[q + j]
  L[n1+1] = +infinity
  R[n2+1] = +infinity
  i = 1; j = 1; split_inv = 0
  for k = p to r
    if L[i] <= R[j]
      A[k] = L[i]; i = i + 1
    else
      A[k] = R[j]; j = j + 1
      split_inv = split_inv + (n1 - i + 1)
  return split_inv

COUNT_INVERSIONS(A, p, r)
  inv = 0
  if p < r
    q = floor((p + r) / 2)
    inv = inv + COUNT_INVERSIONS(A, p, q)
    inv = inv + COUNT_INVERSIONS(A, q + 1, r)
    inv = inv + MERGE_AND_COUNT_SPLIT_INV(A, p, q, r)
  return inv
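A compact Python version of COUNT_INVERSIONS (mine; it merges into fresh lists rather than using sentinels, but counts split inversions by the same "elements remaining in the left run" rule):

```python
def count_inversions(A):
    """Count inversions in Theta(n log n) by merge sort; A is not modified."""
    def sort_count(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, linv = sort_count(a[:mid])
        right, rinv = sort_count(a[mid:])
        merged, split = [], 0
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:        # ties go left: no inversion counted
                merged.append(left[i]); i += 1
            else:                          # right[j] beats every element left in `left`
                merged.append(right[j]); j += 1
                split += len(left) - i
        merged += left[i:] + right[j:]
        return merged, linv + rinv + split
    return sort_count(list(A))[1]
```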

Insertion Sort


INSERTION_SORT(A)
  n = length(A)
  for j = 2 to n
    key = A[j]
    i = j - 1
    while i >= 1 and A[i] > key
      A[i+1] = A[i]
      i = i - 1
    A[i+1] = key
  return A

Best for small or nearly-sorted arrays; worst case is reverse-sorted input, giving Θ(n²). Stable and in-place. CLRS proves correctness with a loop invariant on A[1..j−1].
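Stability follows from the strict `>` in the inner loop: equal keys are never shifted past each other. A small demo of mine, sorting (key, label) pairs by key only:

```python
def insertion_sort_pairs(pairs):
    """Insertion sort of (key, label) pairs, comparing keys only."""
    A = list(pairs)
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while i >= 0 and A[i][0] > key[0]:   # strict >: equal keys never swap
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key
    return A
```

Labels of equal keys keep their original relative order, which is exactly what stability means.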


Merge Sort (no sentinels)

MERGE(A, p, q, r)
  n1 = q - p + 1
  n2 = r - q
  L[1..n1], R[1..n2]
  for i = 1 to n1
    L[i] = A[p + i - 1]
  for j = 1 to n2
    R[j] = A[q + j]
  i = 1; j = 1; k = p
  while i <= n1 and j <= n2
    if L[i] <= R[j]
      A[k] = L[i]; i = i + 1
    else
      A[k] = R[j]; j = j + 1
    k = k + 1
  while i <= n1
    A[k] = L[i]; i = i + 1; k = k + 1
  while j <= n2
    A[k] = R[j]; j = j + 1; k = k + 1

MERGE_SORT(A, p, r)
  if p < r
    q = floor((p + r) / 2)
    MERGE_SORT(A, p, q)
    MERGE_SORT(A, q + 1, r)
    MERGE(A, p, q, r)
  return A

Performance: Θ(n log n) in every case (best, average, and worst). Requires Θ(n) auxiliary space; stable.
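A direct Python rendering of the no-sentinel MERGE_SORT above (mine; 0-based inclusive indices, sorting in place with the merge inlined):

```python
def merge_sort(A, p, r):
    """Sort A[p..r] (0-based, inclusive) in place; no sentinels."""
    if p < r:
        q = (p + r) // 2
        merge_sort(A, p, q)
        merge_sort(A, q + 1, r)
        # merge the sorted runs A[p..q] and A[q+1..r]
        L, R = A[p:q + 1], A[q + 1:r + 1]
        i = j = 0
        k = p
        while i < len(L) and j < len(R):
            if L[i] <= R[j]:               # <= keeps the sort stable
                A[k] = L[i]; i += 1
            else:
                A[k] = R[j]; j += 1
            k += 1
        A[k:r + 1] = L[i:] + R[j:]         # copy the leftover run
    return A
```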


Quick Sort (Lomuto + Randomized Pivot)

PARTITION(A, p, r)
  x = A[r]
  i = p - 1
  for j = p to r - 1
    if A[j] <= x
      i = i + 1
      swap A[i], A[j]
  swap A[i + 1], A[r]
  return i + 1

RANDOMIZED_PARTITION(A, p, r)
  i = random integer in [p, r]
  swap A[i], A[r]
  return PARTITION(A, p, r)

QUICKSORT(A, p, r, rand = false)
  if p < r
    if rand
      q = RANDOMIZED_PARTITION(A, p, r)
    else
      q = PARTITION(A, p, r)
    QUICKSORT(A, p, q - 1, rand)
    QUICKSORT(A, q + 1, r, rand)
  return A

Best when pivot choices are random or the array is shuffled; worst when an extreme pivot is chosen repeatedly (e.g., sorted input with the last element as pivot), giving Θ(n²). QuickSort is in-place but not stable.
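The three routines combine into a short Python sketch (mine; 0-based, with the `rand` flag playing the role of RANDOMIZED_PARTITION by pre-swapping a random element into position r):

```python
import random

def partition(A, p, r):
    """Lomuto partition around pivot x = A[r]; returns the pivot's final index."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def quicksort(A, p, r, rand=False):
    """Sort A[p..r] in place; rand=True selects a uniform random pivot."""
    if p < r:
        if rand:
            s = random.randint(p, r)       # RANDOMIZED_PARTITION: swap pivot to A[r]
            A[s], A[r] = A[r], A[s]
        q = partition(A, p, r)
        quicksort(A, p, q - 1, rand)
        quicksort(A, q + 1, r, rand)
    return A
```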


Counting Sort

COUNTING_SORT(A, k)
  n = length(A)
  B[1..n]
  C[0..k] = 0
  for j = 1 to n
    C[A[j]] = C[A[j]] + 1
  for i = 1 to k
    C[i] = C[i] + C[i - 1]
  for j = n downto 1
    B[C[A[j]]] = A[j]
    C[A[j]] = C[A[j]] - 1
  return B

Best when k = O(n). Impractical when k ≫ n (the count array dominates) and inapplicable to non-integer keys. Running time Θ(n + k). Stable.
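A Python version of mine; because the output array is 0-based, the routine decrements C before writing instead of after, as the 1-based pseudocode does:

```python
def counting_sort(A, k):
    """Stable counting sort for integer keys in 0..k; returns a new list."""
    C = [0] * (k + 1)
    for a in A:                    # histogram of key values
        C[a] += 1
    for i in range(1, k + 1):      # prefix sums: C[v] = number of keys <= v
        C[i] += C[i - 1]
    B = [0] * len(A)
    for a in reversed(A):          # right-to-left scan keeps the sort stable
        C[a] -= 1
        B[C[a]] = a
    return B
```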


Radix Sort (LSD)

RADIX_SORT(A, d, k)
  for i = 1 to d
    A = STABLE_COUNT_ON_DIGIT(A, i, k)
  return A

STABLE_COUNT_ON_DIGIT(A, i, k)
  n = length(A)
  B[1..n]
  C[0..k] = 0
  for j = 1 to n
    C[DIGIT(A[j], i)] = C[DIGIT(A[j], i)] + 1
  for t = 1 to k
    C[t] = C[t] + C[t - 1]
  for j = n downto 1
    B[C[DIGIT(A[j], i)]] = A[j]
    C[DIGIT(A[j], i)] = C[DIGIT(A[j], i)] - 1
  return B

Best for small digit count d and small base k. Running time Θ(d (n + k)). Stable.
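A runnable sketch of mine with DIGIT implemented arithmetically for non-negative integers; here `base` plays the role of k + 1 in the pseudocode, and each pass is the stable counting sort from above specialized to one digit:

```python
def radix_sort(A, d, base=10):
    """LSD radix sort of non-negative ints with at most d base-`base` digits.
    Returns a new sorted list; uses a stable counting sort per digit."""
    def digit(x, i):               # i-th digit of x; i = 0 is least significant
        return (x // base ** i) % base
    for i in range(d):
        C = [0] * base
        for a in A:
            C[digit(a, i)] += 1
        for t in range(1, base):   # prefix sums over digit values
            C[t] += C[t - 1]
        B = [0] * len(A)
        for a in reversed(A):      # reversed scan preserves stability
            C[digit(a, i)] -= 1
            B[C[digit(a, i)]] = a
        A = B
    return A
```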


Maximum Subarray (Divide & Conquer)

FIND_MAX_CROSS(A, l, m, h)
  left = -infinity; sum = 0
  for i = m downto l
    sum = sum + A[i]
    if sum > left
      left = sum; maxl = i
  right = -infinity; sum = 0
  for j = m + 1 to h
    sum = sum + A[j]
    if sum > right
      right = sum; maxr = j
  return (maxl, maxr, left + right)

MAX_SUBARRAY_DC(A, l, h)
  if l == h
    return (l, h, A[l])
  m = floor((l + h) / 2)
  (ll, lh, ls) = MAX_SUBARRAY_DC(A, l, m)
  (rl, rh, rs) = MAX_SUBARRAY_DC(A, m + 1, h)
  (cl, ch, cs) = FIND_MAX_CROSS(A, l, m, h)
  if ls >= rs and ls >= cs
    return (ll, lh, ls)
  elseif rs >= ls and rs >= cs
    return (rl, rh, rs)
  else
    return (cl, ch, cs)

Kadane’s Algorithm

KADANE(A)
  max = -infinity           // best subarray sum seen so far
  meh = 0                   // "max ending here": best sum of a subarray ending at i
  start = s = 1
  end = 1
  for i = 1 to len(A)
    meh = meh + A[i]
    if max < meh
      max = meh; start = s; end = i
    if meh < 0              // a negative running sum can never help; restart at i + 1
      meh = 0; s = i + 1
  return (start, end, max)

Kadane runs in Θ(n); the divide-and-conquer method runs in Θ(n log n).
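A Python version of KADANE (mine; 0-based inclusive indices) together with a brute-force Θ(n²) reference for cross-checking the maximum:

```python
def kadane(A):
    """Return (start, end, max_sum) for the maximum subarray, 0-based inclusive."""
    best = float('-inf')
    meh = 0                        # best sum of a subarray ending at index i
    s = start = end = 0
    for i, x in enumerate(A):
        meh += x
        if meh > best:
            best, start, end = meh, s, i
        if meh < 0:                # a negative running sum can never help; restart
            meh = 0
            s = i + 1
    return start, end, best

def max_subarray_brute(A):
    """Theta(n^2) reference: try every (i, j) pair."""
    return max(sum(A[i:j + 1]) for i in range(len(A)) for j in range(i, len(A)))
```

Note that initializing `best` to −infinity makes all-negative inputs work: the answer is then the single largest element.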


Hybrid Merge-Insertion Sort (Repeat)


INSERTION_SORT_SUB(A, p, r)
  for j = p + 1 to r
    key = A[j]
    i = j - 1
    while i >= p and A[i] > key
      A[i+1] = A[i]
      i = i - 1
    A[i+1] = key

HYBRID_MERGE_SORT(A, p, r, k)
  if r - p + 1 <= k
    INSERTION_SORT_SUB(A, p, r)
  elseif p < r
    q = floor((p + r) / 2)
    HYBRID_MERGE_SORT(A, p, q, k)
    HYBRID_MERGE_SORT(A, q + 1, r, k)
    MERGE(A, p, q, r)
  return A

Recommended threshold k ≈ 16–32 for good practical performance; if k is too large the algorithm degrades toward Θ(n²).


Bubble Sort (Repeat)

BUBBLESORT(A)
  n = length(A)
  for i = 1 to n - 1
    for j = n downto i + 1
      if A[j] < A[j - 1]
        swap A[j], A[j - 1]
  return A

Best when the array is nearly sorted; worst-case Θ(n²); stable.


Counting Inversions via Merge Sort (Repeat)

MERGE_AND_COUNT(A, p, q, r)
  n1 = q - p + 1
  n2 = r - q
  L[1..n1], R[1..n2]
  for i = 1 to n1
    L[i] = A[p + i - 1]
  for j = 1 to n2
    R[j] = A[q + j]
  i = 1; j = 1; k = p; inv = 0
  while i <= n1 and j <= n2
    if L[i] <= R[j]
      A[k] = L[i]; i = i + 1
    else
      A[k] = R[j]; j = j + 1
      inv = inv + (n1 - i + 1)
    k = k + 1
  while i <= n1
    A[k] = L[i]; i = i + 1; k = k + 1
  while j <= n2
    A[k] = R[j]; j = j + 1; k = k + 1
  return inv

COUNT_INVERSIONS(A, p, r)
  inv = 0
  if p < r
    q = floor((p + r) / 2)
    inv += COUNT_INVERSIONS(A, p, q)
    inv += COUNT_INVERSIONS(A, q + 1, r)
    inv += MERGE_AND_COUNT(A, p, q, r)
  return inv

Counts inversions in Θ(n log n) time.


Problem 7.1: Partition Correctness

Problem: Show that PARTITION(A, p, r) of QuickSort places the pivot element into its final (sorted) position and partitions the array so that all elements ≤ pivot are left and all elements ≥ pivot are right.

Answer: Let x = A[r] be the pivot. The algorithm uses index i initialized to p − 1 and iterates j from p to r − 1: if A[j] ≤ x then i ← i + 1; swap A[i], A[j].

Loop invariant for each j: elements A[p..i] ≤ x; elements A[i+1..j−1] > x; elements A[j..r−1] not yet processed; A[r] = x.

Initialization: j = p ⇒ i = p − 1 ⇒ A[p..i] is empty (vacuously ≤ x) and A[i+1..j−1] = A[p..p−1] is empty ⇒ invariant holds.

Maintenance: On processing A[j], if A[j] ≤ x we increment i and swap A[i], A[j], making A[p..i] ≤ x; else A[j] remains in A[i+1..j] > x; the invariant is preserved.

Termination: After j = r − 1, the invariant gives A[p..i] ≤ x and A[i+1..r−1] > x. The final swap A[i+1] ↔ A[r] places x at position q = i + 1 with left ≤ x and right > x. Because all elements left of q are ≤ x and all right of q are > x and x is at q, no element can move past q in subsequent recursive calls ⇒ x is in its final sorted position. QED.
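The termination claim is easy to spot-check in code. The sketch below (mine, 0-based) runs PARTITION and verifies the postcondition A[p..q−1] ≤ A[q] and A[q+1..r] > A[q]:

```python
def partition(A, p, r):
    """Lomuto PARTITION exactly as in the proof: pivot x = A[r]."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def partition_postcondition_holds(A):
    """Check the termination claim: pivot at q, left <= pivot, right > pivot."""
    q = partition(A, 0, len(A) - 1)
    x = A[q]
    return all(v <= x for v in A[:q]) and all(v > x for v in A[q + 1:])
```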


Problem 7.2: QuickSort Worst Case

Problem: Show QuickSort’s worst-case running time and give an example input that achieves it; derive recurrence and Θ bound.

Answer: Worst-case occurs when PARTITION splits the array into sizes n − 1 and 0 every time (most unbalanced). Example: choose pivot as A[r] and input already sorted ascending (or descending) so the pivot is always the max (or min) ⇒ one side empty.

Recurrence: T(n) = T(n − 1) + Θ(n) (Θ(n) for partition) with T(1) = Θ(1). Solve: T(n) = Θ(1) + ∑_{k=2}^{n} Θ(k) = Θ(∑_{k=1}^{n} k) = Θ(n²). Hence QuickSort worst-case running time is Θ(n²). A fix is randomized pivot selection or median-of-three to reduce the probability of this case. QED.
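Counting comparisons on already-sorted (or reverse-sorted) input confirms the n(n − 1)/2 total; the counter below is a sketch of mine that increments once per inner-loop test of Lomuto PARTITION:

```python
def quicksort_comparisons(A):
    """Count key comparisons made by Lomuto quicksort with pivot A[r]."""
    A = list(A)
    count = 0
    def qs(p, r):
        nonlocal count
        if p < r:
            x = A[r]
            i = p - 1
            for j in range(p, r):
                count += 1               # one comparison per loop iteration
                if A[j] <= x:
                    i += 1
                    A[i], A[j] = A[j], A[i]
            A[i + 1], A[r] = A[r], A[i + 1]
            q = i + 1
            qs(p, q - 1)
            qs(q + 1, r)
    qs(0, len(A) - 1)
    return count
```

On sorted input every split is (n − 1, 0), so the counts telescope to 1 + 2 + … + (n − 1) = n(n − 1)/2.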


Problem 7.4: Expected Comparisons in Randomized QuickSort

Problem: Compute the expected number of comparisons performed by randomized QuickSort; show E[comparisons] ≈ 2 n ln n ≈ 1.39 n log₂ n.

Answer: Label input elements in sorted order 1..n (ranks). Define indicator X_{i,j} = 1 if elements with ranks i < j are ever compared. Each unordered pair is compared at most once. The probability that pair (i, j) is compared equals the probability that, during QuickSort, the first pivot chosen among elements {i, i+1, …, j} is either i or j ⇒ 2/(j − i + 1).

Expected total comparisons E[C] = ∑_{1 ≤ i < j ≤ n} Pr{(i, j) compared} = ∑_{i < j} 2/(j − i + 1). Change variable k = j − i + 1 (k from 2 to n) and count pairs with gap k: (n − k + 1) pairs. Thus E[C] = ∑_{k=2}^{n} (n − k + 1)·(2/k) = 2∑_{k=2}^{n} (n + 1)/k − 2∑_{k=2}^{n} 1 = 2(n + 1)(H_n − 1) − 2(n − 1), where H_n is the harmonic number ≈ ln n + γ. The dominant term ≈ 2(n + 1) ln n − 2n + const ≈ 2n ln n + O(n). Converting to base-2: 2n ln n ≈ 1.386 n log₂ n, often rounded ≈ 1.39 n log₂ n. Hence E[C] ≈ 2 n ln n. QED.
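The algebra can be sanity-checked numerically: the direct double sum and the closed form 2(n + 1)(H_n − 1) − 2(n − 1) agree for every n (helper names are mine):

```python
def expected_comparisons_sum(n):
    """Direct double sum over ranks: sum over i < j of 2 / (j - i + 1)."""
    return sum(2.0 / (j - i + 1)
               for i in range(1, n + 1)
               for j in range(i + 1, n + 1))

def expected_comparisons_closed(n):
    """Closed form from the text: 2(n + 1)(H_n - 1) - 2(n - 1)."""
    H = sum(1.0 / k for k in range(1, n + 1))   # harmonic number H_n
    return 2 * (n + 1) * (H - 1) - 2 * (n - 1)
```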


Problem 7.5: Expected Runtime of Randomized QuickSort

Problem: Show randomized QuickSort has expected running time Θ(n log n). Provide the expected recurrence or argument that gives Θ(n log n).

Answer: Randomized QuickSort chooses the pivot uniformly at random at each recursive call. Let T(n) be the expected time on n distinct keys. Condition on pivot rank q (left-subarray size q): pivot chosen uniformly ⇒ Pr[left size = q] = 1/n for q = 0..n−1. Partition cost is Θ(n) deterministically. Thus

T(n) = (1/n) ∑_{q=0}^{n−1} (T(q) + T(n − q − 1)) + Θ(n).

Multiply both sides by n: n T(n) = ∑_{q=0}^{n−1} T(q) + ∑_{q=0}^{n−1} T(n − q − 1) + Θ(n²) = 2 ∑_{q=0}^{n−1} T(q) + Θ(n²). This recurrence solves to T(n) = Θ(n log n) (standard telescoping / induction). Intuitively, the expected split is reasonably balanced on average, yielding Θ(n log n). Expected comparisons are also Θ(n log n). QED.
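The recurrence can also be iterated numerically. Taking the Θ(n) partition cost as exactly n − 1 comparisons (an assumption of this sketch), the iteration reproduces the closed form 2(n + 1)H_n − 4n obtained in Problem 7.4:

```python
def expected_quicksort_cost(n_max):
    """Iterate T(n) = (n - 1) + (2/n) * sum_{q=0}^{n-1} T(q), T(0) = T(1) = 0.
    Returns the list T[0..n_max]."""
    T = [0.0] * (n_max + 1)
    prefix = 0.0                   # running sum T(0) + ... + T(n - 1)
    for n in range(2, n_max + 1):
        prefix += T[n - 1]
        T[n] = (n - 1) + 2.0 * prefix / n
    return T
```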


Problem 8.1: Lower Bound for Comparison Sorts

Problem: Prove that any comparison-based sorting algorithm requires Ω(n log n) comparisons in the worst case.

Answer: Model any deterministic comparison-based sorting algorithm as a binary decision tree: internal nodes = comparisons between elements, edges = outcomes (< or ≥), leaves = possible permutations (output orders) consistent with comparison outcomes. For n distinct keys there are n! possible permutations; to correctly sort any input the decision tree must have at least n! leaves. A binary tree of height h has at most 2^h leaves ⇒ 2^h ≥ n! ⇒ h ≥ log₂(n!). Use Stirling’s approximation log₂(n!) = Θ(n log n) (more precisely log₂ n! = n log₂ n − Θ(n)). Thus the worst-case number of comparisons (depth of tree) h = Ω(n log n). Therefore any comparison-based sort must perform at least Ω(n log n) comparisons in the worst case. QED.
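The bound is concrete even for small n: sorting 5 elements needs at least ⌈log₂ 5!⌉ = 7 comparisons in the worst case, and 10 elements at least 22. A one-line helper (mine):

```python
from math import ceil, factorial, log2

def comparison_lower_bound(n):
    """Minimum worst-case comparisons for any comparison sort: ceil(log2(n!))."""
    return ceil(log2(factorial(n)))
```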