Discrete Mathematics & Probability Essentials
Probability Fundamentals
For an experiment with equally likely outcomes, the probability of an event is the ratio of the number of favorable outcomes to the total number of possible outcomes. Every probability problem involves a probability space, which consists of:
- Sample Space: A non-empty, countable set containing all possible outcomes of an experiment.
- Probability Function: A function that maps each outcome to its probability. The sum of probabilities for all outcomes in the sample space must equal 1.
Core Probability Rules
Complement Rule
P(not A) = 1 – P(A)
Difference Rule
P(A excluding B) = P(A and not B) = P(A ∩ Bᶜ) = P(A) – P(A ∩ B)
Note: P(A and B) is often abbreviated as P(AB) or P(A ∩ B).
Principle of Inclusion-Exclusion
- For two events: P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
- For three events: P(A ∪ B ∪ C) = P(A) + P(B) + P(C) – P(A ∩ B) – P(A ∩ C) – P(B ∩ C) + P(A ∩ B ∩ C)
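As a sanity check, here is a minimal Python sketch (the die-roll events are invented for illustration) that verifies the complement, difference, and inclusion-exclusion rules by exact enumeration over a uniform sample space:

```python
from fractions import Fraction

# Uniform sample space: one roll of a fair six-sided die.
omega = set(range(1, 7))

def P(event):
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}   # roll is even
B = {4, 5, 6}   # roll is at least 4
C = {3, 6}      # roll is divisible by 3

assert P(omega - A) == 1 - P(A)             # complement rule
assert P(A - B) == P(A) - P(A & B)          # difference rule
assert P(A | B) == P(A) + P(B) - P(A & B)   # inclusion-exclusion, two events

# Inclusion-exclusion, three events
assert P(A | B | C) == (P(A) + P(B) + P(C)
                        - P(A & B) - P(A & C) - P(B & C)
                        + P(A & B & C))
```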
Conditional Probability Concepts
Conditional Probability Definition
P(A given B) = P(A | B) = P(A ∩ B) / P(B) (provided P(B) > 0)
Bayes’ Theorem
Rearranging the definition of conditional probability gives the product rule: P(A | B)P(B) = P(A ∩ B)
Since likewise P(B | A)P(A) = P(A ∩ B), equating the two yields Bayes’ Rule: P(A | B) = P(B | A)P(A) / P(B)
Law of Total Probability
P(A) = P(A | B)P(B) + P(A | Bᶜ)P(Bᶜ)
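The following sketch applies the product rule, the Law of Total Probability, and Bayes’ Rule to a standard test-accuracy example; all the numbers are hypothetical:

```python
from fractions import Fraction

# Hypothetical numbers: a condition with 1% base rate, a test with
# 95% sensitivity and a 2% false-positive rate.
p_B          = Fraction(1, 100)    # P(B): has the condition
p_A_given_B  = Fraction(95, 100)   # P(A | B): positive test, condition present
p_A_given_Bc = Fraction(2, 100)    # P(A | Bᶜ): positive test, condition absent

# Law of Total Probability: P(A) = P(A | B)P(B) + P(A | Bᶜ)P(Bᶜ)
p_A = p_A_given_B * p_B + p_A_given_Bc * (1 - p_B)

# Bayes’ Rule: P(B | A) = P(A | B)P(B) / P(A)
p_B_given_A = p_A_given_B * p_B / p_A
print(p_B_given_A)  # 95/293, about 0.32
```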
Event Independence
Defining Independent Events
Two events A and B are independent if:
- P(A | B) = P(A) (provided P(B) > 0), or
- P(A ∩ B) = P(A)P(B)
Important: independence is not the same as being disjoint. Disjoint (mutually exclusive) events are never independent unless one of them has probability zero.
Pairwise & Mutual Independence
To prove three events A, B, C are mutually independent, one must show all four of the following conditions; the first three alone establish only pairwise independence:
- P(A ∩ B) = P(A)P(B)
- P(A ∩ C) = P(A)P(C)
- P(B ∩ C) = P(B)P(C)
- P(A ∩ B ∩ C) = P(A)P(B)P(C)
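A classic example of why the fourth condition is needed: flip two fair coins and consider the events “first is heads”, “second is heads”, and “the coins differ”. The sketch below checks that all three pairwise conditions hold while the triple condition fails:

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair coin flips (x, y); each of the 4 outcomes has probability 1/4.
omega = list(product([0, 1], repeat=2))

def P(pred):
    return Fraction(sum(pred(x, y) for x, y in omega), len(omega))

A = lambda x, y: x == 1        # first coin is heads
B = lambda x, y: y == 1        # second coin is heads
C = lambda x, y: x ^ y == 1    # the two coins differ

# All three pairwise conditions hold...
assert P(lambda x, y: A(x, y) and B(x, y)) == P(A) * P(B)
assert P(lambda x, y: A(x, y) and C(x, y)) == P(A) * P(C)
assert P(lambda x, y: B(x, y) and C(x, y)) == P(B) * P(C)

# ...but mutual independence fails: P(A ∩ B ∩ C) = 0, while P(A)P(B)P(C) = 1/8.
assert P(lambda x, y: A(x, y) and B(x, y) and C(x, y)) == 0
```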
Understanding Random Variables
Random Variable Definition
A random variable (RV) is a function whose domain is the sample space of an experiment. The probability that a random variable X takes a specific value x is denoted P(X=x).
Independent Random Variables
Two random variables X and Y are independent if, for all values x and y within their ranges, it holds that P(X=x, Y=y) = P(X=x)P(Y=y).
For multiple random variables X₁, X₂, …, Xₙ to be mutually independent, the joint probability of any combination of their values must equal the product of their individual probabilities: P(X₁=x₁, X₂=x₂, …, Xₙ=xₙ) = P(X₁=x₁)P(X₂=x₂)…P(Xₙ=xₙ).
PMF & CDF Explained
Probability Mass & Cumulative Distribution Functions
- A random variable X has a Probability Mass Function (PMF), denoted f(x) = P(X=x).
- Its Cumulative Distribution Function (CDF) is F(x) = Σ P(X=k) for all k ≤ x.
- Important Property: For an integer-valued X, PMFₓ(k) = CDFₓ(k) – CDFₓ(k-1). Differencing the CDF therefore recovers the PMF, and summing the PMF recovers the CDF, so each function determines the other.
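A short sketch of the two directions, using a small made-up integer-valued PMF:

```python
from itertools import accumulate

# A small integer-valued PMF (values made up for illustration).
pmf = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}

# Summing the PMF gives the CDF: F(x) = Σ P(X=k) for k ≤ x.
cdf = dict(zip(pmf, accumulate(pmf.values())))

# Differencing the CDF recovers the PMF: f(k) = F(k) − F(k−1).
recovered = {k: cdf[k] - cdf.get(k - 1, 0.0) for k in cdf}
assert all(abs(recovered[k] - pmf[k]) < 1e-12 for k in pmf)
```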
Special Random Variables
Indicator & Uniform Random Variables
- An Indicator Random Variable (IRV), also called a Bernoulli random variable, takes the value 1 with probability p and the value 0 with probability 1 – p.
- A Uniform Random Variable takes each of the values 1, …, n with probability 1/n.
Binomial Distribution
Binomial PMF & CDF
The Binomial Distribution is defined by parameters n (number of trials) and p (probability of success in a single trial).
- PMF: f(k) = P(X=k) = C(n, k) * pᵏ * (1-p)ⁿ⁻ᵏ
- CDF: F(k) = P(X ≤ k) = Σ C(n, j) * pʲ * (1-p)ⁿ⁻ʲ from j=0 to k
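A direct translation of the two formulas into Python, using math.comb for C(n, k):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X ≤ k): sum of the PMF from j = 0 to k."""
    return sum(binom_pmf(j, n, p) for j in range(k + 1))

print(binom_pmf(5, 10, 0.5))  # 0.24609375, i.e. C(10,5)/2^10
print(binom_cdf(4, 10, 0.5))  # ≈ 0.3770: P(at most 4 heads in 10 flips)
```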
Expected Value & Properties
Definition & Linearity
Expectation is a weighted average of the values of a random variable, weighted by their probabilities.
- Definition: E(X) = Σ x * P(X=x) for all possible values x of X.
- Linearity of Expectation: This property does not require independence. E(X₁ + X₂) = E(X₁) + E(X₂). Expectations can always be summed.
- Mean Time to Failure (Geometric Distribution): For a process repeated until failure, where each iteration independently has probability p of failure, the expected number of iterations (including the failure) is 1/p (see the simulation sketch after this list).
- Products of Expectation: E(X₁X₂) = E(X₁)E(X₂) if X₁ and X₂ are independent.
- Expectation of Linear Shifted Random Variable: E(aX + b) = aE(X) + b (where E(b) = b for constants).
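A minimal simulation of the mean-time-to-failure rule above: with failure probability p = 0.2 per iteration, the sample average should come out near 1/p = 5 (the seed and trial count are arbitrary choices):

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility

def trials_until_failure(p):
    """Run trials until the first failure (probability p each); return the count."""
    count = 1
    while random.random() >= p:   # probability 1 - p: no failure yet
        count += 1
    return count

p, n = 0.2, 100_000
avg = sum(trials_until_failure(p) for _ in range(n)) / n
print(avg)  # close to 1/p = 5
```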
Conditional Expectations
Conditional Expectation & Total Expectation
- Conditional Expectation: E(X | Y=y) = Σ x * P(X=x | Y=y) for all possible values x of X.
- Total Expectation Theorem: E(R) = E(R | E₁)P(E₁) + … + E(R | Eₙ)P(Eₙ), where E₁, …, Eₙ form a partition of the sample space.
Variance & Standard Deviation
Variance Definition & Properties
- For a random variable R, Variance is: Var(R) = E[(R – E(R))²] = E(R²) – E(R)²
- Standard Deviation (SD): SD(R) = √Var(R)
- If random variables R₁, …, Rₙ are pairwise independent, then Var(R₁ + … + Rₙ) = Var(R₁) + … + Var(Rₙ).
- Variance of Linear Shifted Random Variable: Var(aX + b) = a²Var(X).
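A quick check that the shortcut form of the variance definition above agrees with the definitional form, using a fair six-sided die:

```python
from fractions import Fraction

# One roll of a fair six-sided die.
values = range(1, 7)
p = Fraction(1, 6)

E  = sum(x * p for x in values)        # E(R)  = 7/2
E2 = sum(x**2 * p for x in values)     # E(R²) = 91/6

# The shortcut E(R²) − E(R)² agrees with the definition E[(R − E(R))²].
assert E2 - E**2 == sum((x - E)**2 * p for x in values)
print(E2 - E**2)  # 35/12
```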
Probability Tail Bounds
Union Bound, Markov, Chebyshev, Chernoff
Tail bounds provide upper limits on the probability of a random variable deviating significantly from its expected value, often without requiring independence assumptions. They are particularly useful for bounding the probability of “bad events”.
- Union Bound: P(A ∪ B) ≤ P(A) + P(B). This can be generalized to any number of events: P(∪ᵢ Aᵢ) ≤ Σᵢ P(Aᵢ).
- Markov’s Inequality: Let R be a non-negative random variable. Then for any x > 0: P(R ≥ x) ≤ E(R)/x, or equivalently P(R ≥ c · E(R)) ≤ 1/c. (If R can take negative values, apply the inequality to R + b for a constant b chosen so that R + b ≥ 0, then translate the bound back.)
- Chebyshev’s Inequality: Let R be a random variable. For all x > 0: P(|R – E(R)| ≥ x) ≤ Var(R)/x² or P(|R – E(R)| ≥ c * SD(R)) ≤ 1/c².
- Chernoff Bounds: Let T₁, …, Tₙ be mutually independent random variables with 0 ≤ Tᵢ ≤ 1 for all i, and let T = T₁ + … + Tₙ. Then for all c ≥ 1: P(T ≥ c·E(T)) ≤ e^(−(c·ln c − c + 1)·E(T)).
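To get a feel for how the bounds compare, the sketch below evaluates Markov, Chebyshev, and Chernoff on the upper tail of a Binomial(100, 0.5) variable (a sum of independent 0/1 indicators, so Chernoff applies); the threshold 75 is an arbitrary choice:

```python
from math import comb, exp, log

n, p = 100, 0.5                 # X ~ Binomial(100, 0.5): E(X) = 50, Var(X) = 25
mean, var = n * p, n * p * (1 - p)
t = 75                          # bound the upper tail P(X ≥ 75)

exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t, n + 1))

markov = mean / t                       # P(X ≥ t) ≤ E(X)/t
chebyshev = var / (t - mean)**2         # P(|X − E(X)| ≥ 25) ≤ Var(X)/25²
c = t / mean                            # t = c·E(X) with c = 1.5
chernoff = exp(-(c * log(c) - c + 1) * mean)

print(f"exact={exact:.2e}  markov={markov:.3f}  "
      f"chebyshev={chebyshev:.3f}  chernoff={chernoff:.2e}")
```

Markov uses only the mean and gives the weakest bound; Chebyshev adds the variance; Chernoff exploits independence and shrinks exponentially, coming closest to the tiny exact tail.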
Pigeonhole Principle
Applying the Pigeonhole Principle
The Pigeonhole Principle (PHP) is often applied by considering the worst-case scenario. It states that if you have n pigeonholes and n+1 pigeons, then at least one pigeonhole must contain at least two pigeons. To apply PHP, clearly define what constitutes the ‘pigeons’ and the ‘pigeonholes’. Ensure all possibilities are covered and pigeonholes are disjoint.
Combinatorial Proofs & Counting
Double Counting Method
The key to a double counting or combinatorial proof is to demonstrate that both sides of an equation count the same set of objects in two different ways. Typically, one side is more straightforward, while the other may require summing over various cases.
Counting Principles
- Binomial Coefficient: The number of ways to choose a subset of k items from a set of n items, denoted as ‘n choose k’ or C(n, k), is calculated as: C(n, k) = n! / (k!(n − k)!)
- Bijection Rule: Count a difficult set by finding a one-to-one (bijective) mapping with an easier set that can be readily counted.
- Product Rule: If a procedure can be broken down into a sequence of independent tasks, the total number of ways is the product of the number of ways for each task.
- Sum Rule: If a task can be done in one of several mutually exclusive ways, the total number of ways is the sum of the number of ways for each option.
- Division Rule: If there is a k-to-1 correspondence between sets A and B (meaning each element in B corresponds to exactly k elements in A), then |A| = k * |B|.
- Generalized Product Rule: If a sequence of k choices is made, and there are nᵢ options for choice i (where i ∈ {1, …, k}), and the number of options for each choice does not depend on previous choices, then the total number of ways is |A| = n₁ * n₂ * … * nₖ.
- Bookkeeper Method (Permutations with Repetition): For an n-letter word with a₁ occurrences of letter l₁, …, aₖ occurrences of letter lₖ, the number of distinct rearrangements is: n! / (a₁! * … * aₖ!)
- Stars and Bars (Donuts and Dividers): The number of ways to choose n items from k distinct categories with repetition allowed (e.g., n donuts from k flavors) is given by: C(n + k – 1, k – 1) or C(n + k – 1, n). This can be visualized as arranging n ‘stars’ (donuts) and k-1 ‘bars’ (dividers) in a line.
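A brute-force check of the stars-and-bars formula on a small made-up case, enumerating the multisets with itertools:

```python
from math import comb
from itertools import combinations_with_replacement

n, k = 4, 3   # choose n = 4 donuts from k = 3 flavors, repetition allowed
count = sum(1 for _ in combinations_with_replacement(range(k), n))
assert count == comb(n + k - 1, k - 1) == comb(n + k - 1, n)
print(count)  # 15
```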
Relations & Functions
Types of Relations & Functions
A relation R ⊆ A × B has a domain A, codomain B, and is defined as a subset R of ordered pairs (a, b) where a ∈ A and b ∈ B.
Notation: aRb or R(a, b) means (a, b) ∈ R.
- A relation R is a function if and only if for every element a ∈ A, there is at most one ordered pair (a, b) ∈ R.
- A relation R is total if for every a ∈ A, there is at least one b ∈ B such that (a, b) ∈ R.
- A total function has exactly one output for every input, meaning for every a ∈ A, there is exactly one b ∈ B such that (a, b) ∈ R.
- A relation is injective (one-to-one) if for every b ∈ B, there is at most one a ∈ A such that (a, b) ∈ R. Alternatively, if f(a) = f(a′), then a = a′.
- A relation is surjective (onto) if for every b ∈ B, there is at least one a ∈ A such that (a, b) ∈ R. Alternatively, for all b ∈ B, there exists some a ∈ A such that b = f(a).
- A total function that is both injective and surjective is called a bijection.
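These definitions translate directly into code. A minimal sketch over a made-up finite relation that happens to be a bijection:

```python
def is_function(A, R):
    return all(sum(1 for (x, _) in R if x == a) <= 1 for a in A)

def is_total(A, R):
    return all(any(x == a for (x, _) in R) for a in A)

def is_injective(B, R):
    return all(sum(1 for (_, y) in R if y == b) <= 1 for b in B)

def is_surjective(B, R):
    return all(any(y == b for (_, y) in R) for b in B)

A, B = {1, 2, 3}, {"a", "b", "c"}
R = {(1, "b"), (2, "c"), (3, "a")}

assert is_function(A, R) and is_total(A, R)        # a total function
assert is_injective(B, R) and is_surjective(B, R)  # and a bijection
```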
Equivalence Relations & Partial Orders
Properties of Relations:
- Reflexive: aRa for all a ∈ A
- Symmetric: aRb if and only if bRa for all a, b ∈ A
- Antisymmetric: If aRb and bRa, then a = b.
- Transitive: If aRb and bRc, then aRc.
Types of Relations:
- A relation R is an equivalence relation if it is reflexive, symmetric, and transitive.
- A relation R is a weak partial order if it is reflexive, antisymmetric, and transitive.
- A weak partial order (WPO) is a total ordering if every pair of elements is comparable (i.e., for any a, b ∈ A, either aRb or bRa).
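A small checker for these properties, exercised on two standard examples (congruence mod 2 and divisibility):

```python
def is_reflexive(A, R):
    return all((a, a) in R for a in A)

def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)

def is_antisymmetric(R):
    return all(a == b for (a, b) in R if (b, a) in R)

def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

# Congruence mod 2 on {0, 1, 2, 3}: an equivalence relation.
A = {0, 1, 2, 3}
R = {(a, b) for a in A for b in A if a % 2 == b % 2}
assert is_reflexive(A, R) and is_symmetric(R) and is_transitive(R)

# Divisibility on {1, 2, 3, 4, 6, 12}: a weak partial order (not total: 2 vs 3).
B = {1, 2, 3, 4, 6, 12}
S = {(a, b) for a in B for b in B if b % a == 0}
assert is_reflexive(B, S) and is_antisymmetric(S) and is_transitive(S)
```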
Number Theory & Algorithms
Number Theory Fundamentals
- Euclidean Algorithm: gcd(a, b) = gcd(b, a mod b). This algorithm efficiently finds the greatest common divisor (GCD) of two integers.
- Pulverizer (Extended Euclidean Algorithm): Finds the GCD and also expresses it as a linear combination of the original numbers, which is useful for finding multiplicative inverses in modular arithmetic.
- Fermat’s Little Theorem: If p is a prime number, then for any integer a not divisible by p, a^(p-1) ≡ 1 (mod p).
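A minimal sketch of both algorithms (the function name pulverizer follows the note above; the sample numbers are arbitrary), including the modular-inverse application:

```python
def gcd(a, b):
    """Euclidean algorithm: gcd(a, b) = gcd(b, a mod b)."""
    while b:
        a, b = b, a % b
    return a

def pulverizer(a, b):
    """Extended Euclidean algorithm: returns (g, s, t) with g = gcd(a, b) = s*a + t*b."""
    if b == 0:
        return a, 1, 0
    g, s, t = pulverizer(b, a % b)
    return g, t, s - (a // b) * t

g, s, t = pulverizer(240, 46)
assert g == gcd(240, 46) == s * 240 + t * 46   # g = 2

# Multiplicative inverse of a mod m exists when gcd(a, m) = 1:
a, m = 7, 26
g, s, _ = pulverizer(a, m)
assert g == 1
print(s % m)  # 15, since 7 * 15 = 105 ≡ 1 (mod 26)
```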
Matching Algorithms
- Matching Algorithms: The Gale-Shapley algorithm finds a stable matching; it is optimal for the proposing side and pessimal for the side being proposed to.
- Stable Matching: A matching is stable if it contains no ‘rogue couple’: a pair who are not matched to each other but who would both prefer each other over their current partners.
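A compact sketch of Gale-Shapley under the usual setup (complete preference lists on both sides; all names and preferences here are made up):

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching as {proposer: receiver}."""
    free = list(proposer_prefs)                   # proposers not yet matched
    next_choice = {p: 0 for p in proposer_prefs}  # index of next receiver to try
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    engaged = {}                                  # receiver -> proposer

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]     # best receiver not yet tried
        next_choice[p] += 1
        if r not in engaged:
            engaged[r] = p
        elif rank[r][p] < rank[r][engaged[r]]:    # r prefers the new proposer
            free.append(engaged[r])
            engaged[r] = p
        else:
            free.append(p)                        # rejected; p stays free

    return {p: r for r, p in engaged.items()}

prefs_a = {"x": ["u", "v"], "y": ["u", "v"]}
prefs_b = {"u": ["y", "x"], "v": ["x", "y"]}
print(gale_shapley(prefs_a, prefs_b))  # {'y': 'u', 'x': 'v'}
```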
Graph Theory: DAGs & Trees
Properties of Trees
A tree is an undirected graph that satisfies any of the following equivalent conditions:
- Connected and acyclic.
- Connected, and removing any single edge disconnects it.
- Acyclic, but adding any single edge creates a cycle.
- Every pair of nodes has a unique path between them.
- Connected with n-1 edges (where n is the number of nodes).
- Acyclic with n-1 edges.
If any one of these conditions holds, then all six hold, and the graph is a tree.
- Every tree with at least two nodes has at least two leaves (nodes with degree 1).
- Removing a leaf from a tree produces a new valid tree.
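One of the equivalent conditions (‘acyclic with n-1 edges’) is easy to test with union-find; a minimal sketch, assuming nodes are labeled 0…n-1:

```python
def is_tree(n, edges):
    """Tests the 'acyclic with n-1 edges' condition via union-find (nodes 0..n-1)."""
    if len(edges) != n - 1:
        return False
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:        # endpoints already connected: this edge closes a cycle
            return False
        parent[ru] = rv
    return True             # acyclic with n-1 edges, hence connected: a tree

assert is_tree(4, [(0, 1), (1, 2), (1, 3)])
assert not is_tree(4, [(0, 1), (1, 2), (0, 2)])   # a cycle plus an isolated node
```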
DAGs & Scheduling
- Hasse Diagram: A graphical representation of a finite partially ordered set, showing only the ‘covering’ relations (direct connections without intermediate elements).
- Comparable Nodes: In a Hasse diagram, two nodes a and b are comparable if a can reach b or b can reach a (i.e., aRb or bRa).
- Chain: A subset of nodes where every pair of nodes is comparable.
- Critical Path: The maximum-length chain in a DAG, representing the longest sequence of dependent tasks.
- Anti-chain: A subset of nodes where no two distinct nodes are comparable.
- Shortest Parallel Schedule: The minimum time required to complete all tasks in a DAG when tasks can be run in parallel, equal to the length of the longest chain (critical path).
Topological Ordering
A Topological Order of a DAG is a linear ordering of all nodes such that for every directed edge from node u to node v, u comes before v in the ordering.
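A sketch of Kahn’s algorithm for topological ordering, followed by the longest-chain DP that gives the critical-path length (and hence the shortest parallel schedule for unit-time tasks); the example DAG is made up:

```python
from collections import deque

def topological_order(graph):
    """Kahn's algorithm; graph maps each node to a list of its successors."""
    indegree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] += 1
    queue = deque(u for u in graph if indegree[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(graph):
        raise ValueError("graph has a cycle: no topological order exists")
    return order

# Hypothetical task DAG: each edge points from a prerequisite to a dependent task.
dag = {"wake": ["dress", "eat"], "dress": ["leave"], "eat": ["leave"], "leave": []}
order = topological_order(dag)
print(order)  # e.g. ['wake', 'dress', 'eat', 'leave']

# Longest chain (critical path), counted in tasks, via DP over the topological order;
# with unit-time tasks this is also the shortest parallel schedule.
longest = {u: 1 for u in dag}
for u in order:
    for v in dag[u]:
        longest[v] = max(longest[v], longest[u] + 1)
print(max(longest.values()))  # 3: wake, then dress (or eat), then leave
```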