Bayesian Networks and Probabilistic Graphical Models
1. Bayesian Networks (Directed Models)
Joint Probability Factorization:
P(X₁, X₂, ..., Xₙ) = Π P(Xᵢ | parents(Xᵢ))
Variable Types:
Observed: User inputs and sensor measurements (Uₜ, Zₜ)
Latent/Hidden: States and landmarks (Xₜ, L)
Example Factorization:
P(uₜ, l, xₜ, xₜ₊₁, zₜ, zₜ₊₁) = P(uₜ)P(l)P(xₜ|uₜ)P(xₜ₊₁|xₜ)P(zₜ|xₜ,l)P(zₜ₊₁|l,xₜ₊₁)
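The factorization idea can be sketched numerically on a smaller chain A → B → C (the structure and all CPT numbers below are made up for illustration):

```python
# Joint as a product of local conditionals on the chain A -> B -> C.
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key: (b, a)
P_C_given_B = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # key: (c, b)

def joint(a, b, c):
    """P(a, b, c) = P(a) * P(b | a) * P(c | b)."""
    return P_A[a] * P_B_given_A[(b, a)] * P_C_given_B[(c, b)]

# A valid factorization sums to 1 over all assignments.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
```

The same pattern scales to the six-variable SLAM-style example above: one lookup per factor, multiplied together.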
2. Conditional Independence and D-Separation
Blocking Rules:
Chain (A → B → C): Blocked if B is observed.
Fork (A ← B → C): Blocked if B is observed.
Collider (A → B ← C): Blocked if neither B nor any descendant of B is observed; observing B (or one of its descendants) opens the path.
Graphoid Axioms:
Decomposition: (X ⟂ Y, W | Z) ⇒ (X ⟂ Y | Z)
Weak Union: (X ⟂ Y, W | Z) ⇒ (X ⟂ Y | Z, W)
Contraction: (X ⟂ Y | Z) ∧ (X ⟂ W | Z, Y) ⇒ (X ⟂ Y, W | Z)
Intersection: (X ⟂ Y | Z, W) ∧ (X ⟂ W | Z, Y) ⇒ (X ⟂ Y, W | Z)
3. Inference Algorithms
Variable Elimination:
Sum out variables in order.
Store intermediate factors.
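The two steps above can be sketched on the same made-up chain A → B → C, computing P(C) by summing out A first and storing the intermediate factor:

```python
# Variable elimination on A -> B -> C (made-up CPTs).
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key: (b, a)
P_C_given_B = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # key: (c, b)

# Step 1: sum out A, storing the intermediate factor tau1(b) = sum_a P(a) P(b|a).
tau1 = {b: sum(P_A[a] * P_B_given_A[(b, a)] for a in (0, 1)) for b in (0, 1)}

# Step 2: sum out B, producing the marginal P(c) = sum_b tau1(b) P(c|b).
P_C = {c: sum(tau1[b] * P_C_given_B[(c, b)] for b in (0, 1)) for c in (0, 1)}
```

The intermediate factor tau1 is exactly what gets cached and reused instead of re-expanding the full joint.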
Belief (Sum-Product) Propagation:
Used for tree-structured graphs.
Messages: μ_{f→x}(x) = Σ_{X_f \ x} f(X_f) Π_{y ∈ ne(f) \ x} μ_{y→f}(y), summing over all variables of factor f except x.
Max-Product Algorithm:
Find the Most Probable Explanation (MPE): argmax P(X | evidence).
Replace sums with max and track the argmax.
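The "max instead of sum, track the argmax" recipe can be sketched on the same made-up chain A → B → C, reading the MPE off the back-pointers:

```python
# Max-product on A -> B -> C (made-up CPTs): replace sums with max,
# keep back-pointers, then decode the MPE backwards.
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key: (b, a)
P_C_given_B = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # key: (c, b)

m1, back_a = {}, {}
for b in (0, 1):                       # m1(b) = max_a P(a) P(b|a)
    scores = {a: P_A[a] * P_B_given_A[(b, a)] for a in (0, 1)}
    back_a[b] = max(scores, key=scores.get)
    m1[b] = scores[back_a[b]]

m2, back_b = {}, {}
for c in (0, 1):                       # m2(c) = max_b m1(b) P(c|b)
    scores = {b: m1[b] * P_C_given_B[(c, b)] for b in (0, 1)}
    back_b[c] = max(scores, key=scores.get)
    m2[c] = scores[back_b[c]]

c_star = max(m2, key=m2.get)           # MPE, read off backwards
b_star = back_b[c_star]
a_star = back_a[b_star]
```

On a chain this is exactly the Viterbi algorithm; on general trees the same replacement applies to every message.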
Conditioning via Marginalization:
P(X₂ | X₄) = P(X₂, X₄) / Σ_{X₂} P(X₂, X₄)
4. Junction Tree Algorithm
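A numeric sketch of that normalization, using a made-up joint table over two binary variables:

```python
# Condition by normalizing a slice of the joint (made-up numbers).
joint = {(0, 0): 0.30, (0, 1): 0.10,   # key: (x2, x4)
         (1, 0): 0.20, (1, 1): 0.40}

x4 = 1
norm = sum(joint[(x2, x4)] for x2 in (0, 1))            # = P(X4 = 1)
P_x2_given_x4 = {x2: joint[(x2, x4)] / norm for x2 in (0, 1)}
```

The denominator is just the joint summed over the query variable, so the conditional always renormalizes to 1.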
Steps:
Moralize: Marry all parents (drop directions and connect parents).
Triangulate: Add chord edges so that every cycle of length four or more has a chord (an edge between two non-adjacent nodes on the cycle).
Form Clusters: Create clusters based on the elimination order.
Build Junction Tree: Connect clusters via separators.
Elimination Order Impact:
Good order: Results in small cliques and efficient inference.
Bad order: Results in large cliques and exponential complexity.
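The moralization step can be sketched directly; the parent sets below reuse the X₁–X₅ structure from the factor-graph example in Section 6:

```python
from itertools import combinations

# Moralize: connect ("marry") the parents of each node, then drop directions.
parents = {"X1": [], "X2": [], "X3": ["X1", "X2"],
           "X4": ["X3"], "X5": ["X2"]}

moral = {v: set() for v in parents}
for child, ps in parents.items():
    for p in ps:                       # undirected parent-child edges
        moral[child].add(p)
        moral[p].add(child)
    for a, b in combinations(ps, 2):   # marry co-parents of the same child
        moral[a].add(b)
        moral[b].add(a)
```

Here X₁ and X₂ get married because they share the child X₃; all other edges just lose their direction.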
5. Treewidth
Definition:
Treewidth = min_over_orders(max_clique_size) - 1
Finding Treewidth:
Try different elimination orders.
Find the smallest possible maximum clique.
Subtract 1 from the result.
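The three steps above can be sketched as a brute-force search over elimination orders (exponential, so only viable for toy graphs):

```python
from itertools import permutations

def treewidth(adj):
    """Brute-force treewidth: simulate elimination under every order.
    adj maps node -> set of neighbors (undirected)."""
    nodes = list(adj)
    best = len(nodes)                          # trivial upper bound
    for order in permutations(nodes):
        g = {v: set(adj[v]) for v in nodes}
        width = 0
        for v in order:
            nbrs = g[v]
            width = max(width, len(nbrs))      # clique size minus 1
            for a in nbrs:
                g[a] |= nbrs - {a}             # fill-in: connect v's neighbors
                g[a].discard(v)
            del g[v]
        best = min(best, width)                # keep the best order's width
    return best

# A 4-cycle has treewidth 2 (one fill-in edge needed); a chain has treewidth 1.
cycle4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
chain3 = {1: {2}, 2: {1, 3}, 3: {2}}
```

Each eliminated node forms a clique with its current neighbors, so the "subtract 1" falls out of tracking the neighbor count rather than the clique size.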
6. Factor Graphs
Conversion from Bayesian Network:
Create a factor node for each Conditional Probability Table (CPT).
Connect variables to their respective factors.
Example:
P(X₁)P(X₂)P(X₃|X₁,X₂)P(X₄|X₃)P(X₅|X₂) → Factors: f₁(X₁), f₂(X₂), f₁₂₃(X₁,X₂,X₃), f₃₄(X₃,X₄), f₂₅(X₂,X₅)
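The conversion is purely structural, so it can be sketched as building the bipartite adjacency from each CPT's scope (factor names below follow the example):

```python
# One factor node per CPT, connected to the variables in its scope.
cpts = {"f1": ["X1"], "f2": ["X2"], "f123": ["X1", "X2", "X3"],
        "f34": ["X3", "X4"], "f25": ["X2", "X5"]}

# Bipartite adjacency: variable -> the factors that mention it.
var_to_factors = {}
for f, scope in cpts.items():
    for v in scope:
        var_to_factors.setdefault(v, []).append(f)
```

X₂ ends up connected to three factor nodes, which is exactly where its incoming messages would multiply during sum-product.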
7. Parameter Learning (MLE)
Gaussian MLE:
μ̂ = (1/N) Σ Xᵢ
σ̂² = (1/N) Σ (Xᵢ - μ̂)²
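A quick sketch of both estimators on synthetic data (true parameters μ = 2.0, σ = 1.5 are chosen arbitrarily):

```python
import numpy as np

# Draw samples from a known Gaussian, then recover mu and sigma^2 by MLE.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)

mu_hat = x.mean()                           # (1/N) sum x_i
sigma2_hat = ((x - mu_hat) ** 2).mean()     # MLE variance: divides by N, not N-1
```

Note the MLE variance divides by N and is therefore slightly biased; the unbiased estimator divides by N − 1.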
Linear Gaussian:
X₃ ~ N(w₀ + w₁X₁ + w₂X₂, σ²)
Solve the normal equations A𝐰 = 𝐛, where Aᵢⱼ = Σ XᵢXⱼ and bᵢ = Σ X₃Xᵢ.
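The normal equations can be sketched on synthetic data; the true weights (w₀, w₁, w₂) = (1.0, 2.0, −0.5) and noise scale 0.3 below are arbitrary choices:

```python
import numpy as np

# Fit X3 ~ N(w0 + w1*X1 + w2*X2, sigma^2) via the normal equations.
rng = np.random.default_rng(1)
N = 5_000
x1, x2 = rng.normal(size=N), rng.normal(size=N)
x3 = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=N)

# Design matrix with a column of ones so the intercept w0 is handled uniformly.
Phi = np.column_stack([np.ones(N), x1, x2])
A = Phi.T @ Phi              # A_ij = sum_n phi_i(n) phi_j(n)
b = Phi.T @ x3               # b_i  = sum_n x3(n) phi_i(n)
w = np.linalg.solve(A, b)
sigma2 = np.mean((x3 - Phi @ w) ** 2)   # MLE of the noise variance
```

The recovered weights land close to (1.0, 2.0, −0.5) and sigma2 close to 0.3² = 0.09.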
8. Hidden Markov Models (HMMs)
Joint Probability:
P(X, Y) = P(X₁) Π P(Xₜ | Xₜ₋₁) Π P(Yₜ | Xₜ)
With Additional Potentials:
P(X, Y, Z) = P(X₁) Π P(Xₜ | Xₜ₋₁) Π P(Yₜ | Xₜ) Π ψ(Xₜ, Zₜ)
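The basic HMM joint (without the extra ψ potentials) can be evaluated directly for a given state/observation sequence; the 2-state matrices below are made up:

```python
import numpy as np

# HMM joint P(x_{1:T}, y_{1:T}) = P(x1) prod P(x_t|x_{t-1}) prod P(y_t|x_t).
pi = np.array([0.6, 0.4])                  # initial distribution P(X1)
A  = np.array([[0.7, 0.3], [0.2, 0.8]])    # A[i, j] = P(X_t = j | X_{t-1} = i)
B  = np.array([[0.9, 0.1], [0.3, 0.7]])    # B[i, k] = P(Y_t = k | X_t = i)

def hmm_joint(x, y):
    """Joint probability of one hidden path x and one observation sequence y."""
    p = pi[x[0]] * B[x[0], y[0]]
    for t in range(1, len(x)):
        p *= A[x[t - 1], x[t]] * B[x[t], y[t]]
    return p
```

Extra ψ(Xₜ, Zₜ) potentials would simply multiply in one more table lookup per step (with a global normalizer if the ψ are unnormalized).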
9. Key Probability Rules
Conditional Probability:
P(A | B) = P(A, B) / P(B)
Chain Rule:
P(A, B, C) = P(A) P(B | A) P(C | A, B)
Marginalization:
P(A) = Σ_{B, C} P(A, B, C)
10. Common Patterns and Tricks
V-Structure (Collider):
Observing the collider (or one of its descendants) makes its parents dependent.
X → Z ← Y: X ⟂ Y but X ⟂̸ Y | Z.
Explaining Away:
Occurs when there are multiple causes for one effect.
Observing the effect makes the causes dependent on each other.
D-separation Quick Check:
Identify all paths between variables.
Check if any path is unblocked.
If all paths are blocked, the variables are independent.
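The per-path blocking test can be sketched as classifying each interior node on the path as chain/fork/collider; the graph below is the made-up collider example X → Z ← Y with descendant W:

```python
# D-separation check for ONE undirected path, given an observed set.
parents = {"X": [], "Y": [], "Z": ["X", "Y"], "W": ["Z"]}

def descendants(v):
    """All nodes reachable from v along directed edges."""
    out, stack = set(), [v]
    while stack:
        u = stack.pop()
        for child, ps in parents.items():
            if u in ps and child not in out:
                out.add(child)
                stack.append(child)
    return out

def path_blocked(path, observed):
    """True if this path is blocked by `observed` under the d-separation rules."""
    for i in range(1, len(path) - 1):
        a, b, c = path[i - 1], path[i], path[i + 1]
        collider = a in parents[b] and c in parents[b]   # a -> b <- c
        if collider:
            if b not in observed and not (descendants(b) & observed):
                return True        # unobserved collider closes the path
        elif b in observed:
            return True            # observed chain/fork node blocks
    return False
```

For X → Z ← Y the path is blocked marginally, and opened by observing Z or its descendant W; for full d-separation, every path between the two variables must be checked this way.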
Factorization Simplification:
Use conditional independence to reduce expressions:
P(A | B, C) = P(A | B) if A ⟂ C | B
