Bayesian Networks and Probabilistic Graphical Models

1. Bayesian Networks (Directed Models)

Joint Probability Factorization:

Formula:
P(X₁, X₂, ..., Xₙ) = Πᵢ P(Xᵢ | parents(Xᵢ))

Variable Types:

  • Observed: User inputs and sensor measurements (Uₜ, Zₜ)

  • Latent/Hidden: States and landmarks (Xₜ, L)

Example Factorization:

Formula:
P(uₜ, l, xₜ, xₜ₊₁, zₜ, zₜ₊₁) = P(uₜ)P(l)P(xₜ|uₜ)P(xₜ₊₁|xₜ)P(zₜ|xₜ,l)P(zₜ₊₁|l,xₜ₊₁)
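As a minimal sketch, the factorization above can be evaluated numerically for binary variables and checked to sum to 1 (all CPT numbers below are made up for illustration):

```python
# Sketch: evaluate the factorized joint above for binary variables.
# Every CPT is normalized over its child variable, so the product must sum to 1.
import itertools

p_u  = {0: 0.5, 1: 0.5}                            # P(u_t)
p_l  = {0: 0.7, 1: 0.3}                            # P(l)
p_x1 = {(u, x): 0.9 if x == u else 0.1             # P(x_t | u_t)
        for u in (0, 1) for x in (0, 1)}
p_x2 = {(x1, x2): 0.8 if x2 == x1 else 0.2         # P(x_{t+1} | x_t)
        for x1 in (0, 1) for x2 in (0, 1)}
p_z  = {(x, l, z): 0.9 if z == (x ^ l) else 0.1    # P(z | x, l)
        for x in (0, 1) for l in (0, 1) for z in (0, 1)}

def joint(u, l, x1, x2, z1, z2):
    return (p_u[u] * p_l[l] * p_x1[(u, x1)] * p_x2[(x1, x2)]
            * p_z[(x1, l, z1)] * p_z[(x2, l, z2)])

total = sum(joint(*vals) for vals in itertools.product((0, 1), repeat=6))
print(round(total, 10))  # a valid factorization sums to 1.0
```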

2. Conditional Independence and D-Separation

Blocking Rules:

  • Chain (A → B → C): Blocked if B is observed.

  • Fork (A ← B → C): Blocked if B is observed.

  • Collider (A → B ← C): Blocked when neither B nor any descendant of B is observed; observing B or one of its descendants opens the path.
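The chain and collider rules can be verified by brute-force enumeration over small joint tables. In this sketch (CPT numbers are arbitrary), observing B renders A and C independent in the chain but dependent in the collider:

```python
# Sketch: check A ⟂ C | B by enumeration for a chain and a collider.
import itertools

vals = (0, 1)
pa = {0: 0.6, 1: 0.4}
pb_a = {(a, b): 0.8 if b == a else 0.2 for a in vals for b in vals}
pc_b = {(b, c): 0.7 if c == b else 0.3 for b in vals for c in vals}

# Chain A → B → C
chain = {(a, b, c): pa[a] * pb_a[(a, b)] * pc_b[(b, c)]
         for a, b, c in itertools.product(vals, repeat=3)}

# Collider A → B ← C (noisy-OR style CPT for B)
pc = {0: 0.5, 1: 0.5}
pb_ac = {(a, c, b): 0.9 if b == (a | c) else 0.1
         for a in vals for c in vals for b in vals}
collider = {(a, b, c): pa[a] * pc[c] * pb_ac[(a, c, b)]
            for a, b, c in itertools.product(vals, repeat=3)}

def indep_given_b(joint):
    """True if A ⟂ C | B holds in the joint table P(a, b, c)."""
    for b in vals:
        pb = sum(p for (aa, bb, cc), p in joint.items() if bb == b)
        for a in vals:
            for c in vals:
                pab = sum(p for (aa, bb, cc), p in joint.items()
                          if aa == a and bb == b)
                pcb = sum(p for (aa, bb, cc), p in joint.items()
                          if cc == c and bb == b)
                if abs(joint[(a, b, c)] / pb - (pab / pb) * (pcb / pb)) > 1e-9:
                    return False
    return True

print(indep_given_b(chain))     # True: observing B blocks the chain
print(indep_given_b(collider))  # False: observing B opens the collider
```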

Graphoid Axioms:

  • Decomposition: (X ⟂ Y, W | Z) ⇒ (X ⟂ Y | Z)

  • Weak Union: (X ⟂ Y, W | Z) ⇒ (X ⟂ Y | Z, W)

  • Contraction: (X ⟂ Y | Z) ∧ (X ⟂ W | Z, Y) ⇒ (X ⟂ Y, W | Z)

  • Intersection: (X ⟂ Y | Z, W) ∧ (X ⟂ W | Z, Y) ⇒ (X ⟂ Y, W | Z)

3. Inference Algorithms

Variable Elimination:

  • Sum out variables in order.

  • Store intermediate factors.
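Both steps can be sketched on a tiny chain A → B → C (illustrative CPTs): each elimination produces a stored intermediate factor τ, and the result matches brute-force marginalization:

```python
# Sketch: variable elimination on A → B → C, eliminating A then B to get P(C).
import itertools

vals = (0, 1)
pa = {0: 0.6, 1: 0.4}
pb_a = {(a, b): 0.8 if b == a else 0.2 for a in vals for b in vals}
pc_b = {(b, c): 0.7 if c == b else 0.3 for b in vals for c in vals}

# Eliminate A: tau1(b) = Σ_a P(a) P(b|a)   (intermediate factor, stored)
tau1 = {b: sum(pa[a] * pb_a[(a, b)] for a in vals) for b in vals}
# Eliminate B: tau2(c) = Σ_b tau1(b) P(c|b)
tau2 = {c: sum(tau1[b] * pc_b[(b, c)] for b in vals) for c in vals}

# Brute-force check: P(c) from the full joint
brute = {c: sum(pa[a] * pb_a[(a, b)] * pc_b[(b, c)]
                for a, b in itertools.product(vals, repeat=2))
         for c in vals}
print(tau2, brute)  # identical marginals
```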

Belief (Sum-Product) Propagation:

  • Used for tree-structured graphs.

  • Factor-to-variable messages: μ_{f→x}(x) = Σ_{X_f ∖ {x}} f(X_f) Π_{y ∈ ne(f)∖{x}} μ_{y→f}(y), i.e. sum the factor over all of its arguments except x, weighted by the incoming variable messages.
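As a sketch, the same chain A → B → C can be written as the factor graph f₁(A) — A — f₂(A,B) — B — f₃(B,C) — C, and the messages collected at B yield its marginal (CPT numbers are illustrative):

```python
# Sketch: sum-product messages on a chain's factor graph, collected at B.
vals = (0, 1)
f1 = {a: 0.6 if a == 0 else 0.4 for a in vals}                     # P(A)
f2 = {(a, b): 0.8 if b == a else 0.2 for a in vals for b in vals}  # P(B|A)
f3 = {(b, c): 0.7 if c == b else 0.3 for b in vals for c in vals}  # P(C|B)

mu_f1_a = dict(f1)                                  # leaf factor message
mu_a_f2 = mu_f1_a                                   # variable A just forwards it
mu_f2_b = {b: sum(f2[(a, b)] * mu_a_f2[a] for a in vals) for b in vals}
mu_c_f3 = {c: 1.0 for c in vals}                    # leaf variable message
mu_f3_b = {b: sum(f3[(b, c)] * mu_c_f3[c] for c in vals) for b in vals}

# Marginal of B = product of all incoming factor messages
p_b = {b: mu_f2_b[b] * mu_f3_b[b] for b in vals}
print(p_b)  # {0: 0.56, 1: 0.44}
```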

Max-Product Algorithm:

  • Find the Most Probable Explanation (MPE): argmax P(X | evidence).

  • Replace sums with max and track the argmax.
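A minimal sketch of max-product on the same illustrative chain A → B → C: the forward pass replaces each sum with a max and remembers the maximizing value, and a backward pass recovers the MPE:

```python
# Sketch: max-product with argmax bookkeeping on A → B → C.
import itertools

vals = (0, 1)
pa = {0: 0.6, 1: 0.4}
pb_a = {(a, b): 0.8 if b == a else 0.2 for a in vals for b in vals}
pc_b = {(b, c): 0.7 if c == b else 0.3 for b in vals for c in vals}

# Forward pass: each message stores (best value, maximising predecessor).
m1 = {b: max((pa[a] * pb_a[(a, b)], a) for a in vals) for b in vals}
m2 = {c: max((m1[b][0] * pc_b[(b, c)], b) for b in vals) for c in vals}

# Backtrack from the best C.
best_p, c_star = max((m2[c][0], c) for c in vals)
b_star = m2[c_star][1]
a_star = m1[b_star][1]

# Brute-force check over all assignments.
brute = max(itertools.product(vals, repeat=3),
            key=lambda t: pa[t[0]] * pb_a[(t[0], t[1])] * pc_b[(t[1], t[2])])
print((a_star, b_star, c_star), brute)  # same assignment
```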

Marginal Probability Formula:

Formula:
P(X₂ | X₄) = P(X₂, X₄) / P(X₄), where P(X₄) = Σ_{X₂} P(X₂, X₄)

4. Junction Tree Algorithm

Steps:

  1. Moralize: Connect ("marry") all co-parents of each node, then drop edge directions.

  2. Triangulate: Add fill-in edges so that every cycle of length four or more has a chord.

  3. Form Clusters: Create clusters based on the elimination order.

  4. Build Junction Tree: Connect clusters via separators.
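Step 1 can be sketched on the five-variable network used in the factor-graph example of Section 6 (a hypothetical DAG): co-parents are married, then directions are dropped:

```python
# Sketch: moralization of a small DAG, the first junction-tree step.
dag = {  # child: parents (hypothetical 5-node network)
    'X1': [], 'X2': [], 'X3': ['X1', 'X2'], 'X4': ['X3'], 'X5': ['X2'],
}

moral = set()  # undirected edges as frozensets
for child, parents in dag.items():
    for p in parents:                       # drop directions
        moral.add(frozenset((p, child)))
    for i, p in enumerate(parents):         # marry co-parents
        for q in parents[i + 1:]:
            moral.add(frozenset((p, q)))

print(sorted(tuple(sorted(e)) for e in moral))
# X1-X2 is the "marriage" edge added for the shared child X3
```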

Elimination Order Impact:

  • Good order: Results in small cliques and efficient inference.

  • Bad order: Results in large cliques and exponential complexity.

5. Treewidth

Definition:

Formula:
Treewidth = min_over_orders(max_clique_size) - 1

Finding Treewidth:

  • Try different elimination orders.

  • Find the smallest possible maximum clique.

  • Subtract 1 from the result.
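These three steps can be sketched as a brute-force search over all elimination orders (feasible only for tiny graphs, since there are n! orders):

```python
# Sketch: brute-force treewidth by trying every elimination order and
# recording the largest clique (node plus its current neighbours) created.
import itertools

def treewidth(nodes, edges):
    best = len(nodes)
    for order in itertools.permutations(nodes):
        g = {n: set() for n in nodes}
        for a, b in edges:
            g[a].add(b); g[b].add(a)
        largest = 0
        for v in order:
            nbrs = g[v]
            largest = max(largest, len(nbrs) + 1)  # clique formed at this step
            for a in nbrs:                         # fill-in: connect neighbours
                for b in nbrs:
                    if a != b:
                        g[a].add(b)
            for n in nbrs:                         # remove v from the graph
                g[n].discard(v)
            del g[v]
        best = min(best, largest)                  # smallest max clique over orders
    return best - 1                                # subtract 1

print(treewidth('ABCD', [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'A')]))
# a 4-cycle has treewidth 2; any tree has treewidth 1
```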

6. Factor Graphs

Conversion from Bayesian Network:

  • Create a factor node for each Conditional Probability Table (CPT).

  • Connect variables to their respective factors.

Example:

Formula:
P(X₁)P(X₂)P(X₃|X₁,X₂)P(X₄|X₃)P(X₅|X₂) → Factors: f₁(X₁), f₂(X₂), f₁₂₃(X₁,X₂,X₃), f₃₄(X₃,X₄), f₂₅(X₂,X₅)

7. Parameter Learning (MLE)

Gaussian MLE:

Formula:
μ̂ = (1/N) Σ Xᵢ
σ̂² = (1/N) Σ (Xᵢ - μ̂)²
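A quick sketch on simulated data; note the 1/N normalizer (the MLE variance estimate is biased but consistent, unlike the 1/(N−1) sample variance):

```python
# Sketch: Gaussian MLE on simulated data (true μ = 2.0, σ = 1.5).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)

mu_hat = x.mean()                      # (1/N) Σ X_i
var_hat = ((x - mu_hat) ** 2).mean()   # (1/N) Σ (X_i - μ̂)²

print(mu_hat, var_hat)  # close to 2.0 and 1.5² = 2.25
```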

Linear Gaussian:

Formula:
X₃ ~ N(w₀ + w₁X₁ + w₂X₂, σ²)
Solve the normal equations A𝐰 = 𝐛, where Aᵢⱼ = Σ XᵢXⱼ and bᵢ = Σ X₃Xᵢ (using the convention X₀ ≡ 1 so that the intercept w₀ fits the same pattern).
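A sketch of the fit on simulated data, assuming the convention X₀ ≡ 1 so the intercept w₀ is handled by the same Aᵢⱼ = Σ XᵢXⱼ, bᵢ = Σ X₃Xᵢ pattern (true weights below are made up):

```python
# Sketch: fitting a linear-Gaussian CPD X₃ ~ N(w₀ + w₁X₁ + w₂X₂, σ²)
# via the normal equations, with X₀ ≡ 1 absorbing the intercept.
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)  # true w = (1, 2, -0.5)

F = np.column_stack([np.ones(n), x1, x2])  # features (X₀, X₁, X₂)
A = F.T @ F                                # A_ij = Σ X_i X_j
b = F.T @ x3                               # b_i  = Σ X₃ X_i
w = np.linalg.solve(A, b)
print(w)  # approximately (1.0, 2.0, -0.5)
```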

8. Hidden Markov Models (HMMs)

Joint Probability:

Formula:
P(X, Y) = P(X₁) Π_{t=2}^{T} P(Xₜ | Xₜ₋₁) Π_{t=1}^{T} P(Yₜ | Xₜ)

With Additional Potentials:

Formula:
P(X, Y, Z) ∝ P(X₁) Π P(Xₜ | Xₜ₋₁) Π P(Yₜ | Xₜ) Π ψ(Xₜ, Zₜ)
(The extra potentials ψ are not conditional distributions in general, so the product defines the joint only up to a normalizing constant.)
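The HMM factorization supports efficient inference; as a sketch, the forward algorithm computes the observation likelihood P(Y₁..Y_T) for a 2-state HMM (all probabilities below are illustrative):

```python
# Sketch: forward algorithm on a 2-state HMM, computing P(Y₁..Y_T).
import numpy as np

pi = np.array([0.6, 0.4])              # P(X₁)
T_ = np.array([[0.7, 0.3],             # P(X_t | X_{t-1}), rows = previous state
               [0.2, 0.8]])
E = np.array([[0.9, 0.1],              # P(Y_t | X_t), rows = states, cols = symbols
              [0.3, 0.7]])

obs = [0, 1, 1]                        # observed symbols Y₁..Y₃

alpha = pi * E[:, obs[0]]              # α₁(x) = P(X₁ = x) P(Y₁ | x)
for y in obs[1:]:
    alpha = (alpha @ T_) * E[:, y]     # α_t = (α_{t-1} T) ⊙ P(Y_t | ·)
likelihood = alpha.sum()               # Σ_x α_T(x) = P(Y = obs)
print(likelihood)
```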

9. Key Probability Rules

Conditional Probability:

Formula:
P(A | B) = P(A, B) / P(B)

Chain Rule:

Formula:
P(A, B, C) = P(A) P(B | A) P(C | A, B)

Marginalization:

Formula:
P(A) = Σ_{B, C} P(A, B, C)

10. Common Patterns and Tricks

V-Structure (Collider):

  • Observing the collider node (or one of its descendants) creates dependence between its parents.

  • X → Z ← Y: X ⟂ Y but X ⟂̸ Y | Z.

Explaining Away:

  • Occurs when there are multiple causes for one effect.

  • Observing the effect makes the causes dependent on each other.
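A numeric sketch with two binary causes of one effect (a noisy-OR style CPT with made-up numbers): once the effect Z is observed, additionally observing one cause Y lowers the posterior on the other cause X:

```python
# Sketch: explaining away. X and Y are rare causes; Z fires if either does.
p_x = 0.1                                  # P(X = 1)
p_y = 0.1                                  # P(Y = 1)
def p_z(x, y):                             # P(Z = 1 | X, Y)
    return 0.95 if (x or y) else 0.01

def joint(x, y, z):
    px = p_x if x else 1 - p_x
    py = p_y if y else 1 - p_y
    pz = p_z(x, y) if z else 1 - p_z(x, y)
    return px * py * pz

def cond(x_val, given):                    # P(X = x_val | given dict over {'y','z'})
    num = sum(joint(x_val, y, z) for y in (0, 1) for z in (0, 1)
              if all({'y': y, 'z': z}[k] == v for k, v in given.items()))
    den = sum(joint(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)
              if all({'y': y, 'z': z}[k] == v for k, v in given.items()))
    return num / den

p1 = cond(1, {'z': 1})              # P(X=1 | Z=1): effect raises belief in X
p2 = cond(1, {'z': 1, 'y': 1})      # P(X=1 | Z=1, Y=1): Y "explains away" Z
print(p1, p2)                        # p2 < p1
```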

D-separation Quick Check:

  1. Identify all paths between variables.

  2. Check if any path is unblocked.

  3. If all paths are blocked, the variables are independent.

Factorization Simplification:
Use conditional independence to reduce expressions:

Formula:
P(A | B, C) = P(A | B) if A ⟂ C | B