Probabilistic Reasoning & AI Decision Making

Probabilistic Inference

  1. What Is Probability?

    • A measure P assigning each event A a number in [0,1].

    • Axioms of Probability

      1. P(∅) = 0, P(Ω) = 1

      2. For disjoint A, B, P(A∪B) = P(A) + P(B)

    • Conditional Probability: P(A|B) = P(A∧B) / P(B), defined when P(B) > 0
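
    • Worked example: roll a fair die and let A = "roll is even", B = "roll ≥ 4".
      Then P(A∧B) = P({4,6}) = 2/6 and P(B) = 3/6, so P(A|B) = (2/6)/(3/6) = 2/3.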

  2. Bayesian Inference Fundamentals

    • Bayes’ Rule (a numeric sketch follows this item)

      P(H|E) = [P(E|H) * P(H)] / P(E)

    • Key Terms in Bayes’ Rule

      • Prior P(H): initial belief

      • Likelihood P(E|H)

      • Posterior P(H|E): updated belief
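
    • A minimal numeric sketch of Bayes’ rule in Python; the test-style numbers below are made up for illustration:

      # Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E), with P(E) from total probability.
      p_h = 0.01                    # prior P(H)
      p_e_given_h = 0.90            # likelihood P(E|H)
      p_e_given_not_h = 0.05        # P(E | not H)

      p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)   # evidence P(E)
      p_h_given_e = p_e_given_h * p_h / p_e                   # posterior P(H|E)

      print(f"P(H|E) = {p_h_given_e:.3f}")   # ~0.154: well above the 1% prior, but far from certain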

  3. Bayesian Networks (BNs)

    • Structure: Directed acyclic graph; nodes = variables; edges = dependencies.

    • Factorization:
      P(X1,...,Xn) = ∏i P(Xi | Parents(Xi))

    • Monty Hall Problem Example

      • Nodes: PrizeDoor and ContestantPick (independent) are both parents of HostOpens, which feeds into SwitchDecision.

      • Observing HostOpens updates P(PrizeDoor) via Bayes’ rule; switching then wins with probability 2/3 (see the sketch below).
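
      • A small enumeration sketch of this network in Python; the door labels, the fixed pick, and the uniform tie-break rule for the host are illustrative assumptions:

        doors = [1, 2, 3]
        pick, opened = 1, 2          # evidence: contestant picked door 1, host opened door 2

        def p_host_opens(host, prize, pick):
            """Host opens a door that is neither the pick nor the prize; ties broken uniformly."""
            options = [d for d in doors if d != pick and d != prize]
            return 1 / len(options) if host in options else 0.0

        # Joint P(PrizeDoor, HostOpens=opened | pick) via the BN factorization, then normalize.
        joint = {prize: (1 / 3) * p_host_opens(opened, prize, pick) for prize in doors}
        z = sum(joint.values())
        posterior = {prize: p / z for prize, p in joint.items()}

        print(posterior)   # {1: 0.333, 2: 0.0, 3: 0.667} -> switching to door 3 wins 2/3 of the time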

  4. Inference Algorithms on BNs

    • Exact Inference Methods

      • Enumeration / Variable Elimination: Sum out the hidden (non-query, non-evidence) variables in a chosen elimination order.

      • Complexity: Exponential in treewidth.

    • Approximate Inference Methods

      • Rejection Sampling: Sample from the full joint distribution, then discard samples inconsistent with the evidence.

      • Likelihood Weighting: Fix evidence variables to their observed values and weight each sample by the likelihood of that evidence given its sampled parents (see the sketch after this list).

      • Gibbs Sampling: Repeatedly resample each non-evidence variable conditioned on its Markov blanket.
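
    • A likelihood-weighting sketch in Python on a tiny two-node network Rain → WetGrass; the CPT numbers are made up for illustration, and the estimate targets P(Rain | WetGrass = true):

      import random

      P_RAIN = 0.2                                     # P(Rain = true)
      P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}       # P(WetGrass = true | Rain)

      def likelihood_weighting(n_samples=100_000, evidence_wet=True):
          """Estimate P(Rain | WetGrass = evidence_wet) by likelihood weighting."""
          weights = {True: 0.0, False: 0.0}
          for _ in range(n_samples):
              rain = random.random() < P_RAIN              # sample the non-evidence variable
              p_wet = P_WET_GIVEN_RAIN[rain]
              w = p_wet if evidence_wet else 1 - p_wet     # weight by the evidence likelihood
              weights[rain] += w
          return weights[True] / (weights[True] + weights[False])

      print(likelihood_weighting())   # ~0.69, matching (0.9*0.2) / (0.9*0.2 + 0.1*0.8)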

  5. Temporal Models: Hidden Markov Models (HMMs)

    • HMM Components

      • Hidden states St, observations Ot

      • Transition P(St | St-1)

      • Emission P(Ot | St)

      • Initial P(S0)

    • Key HMM Tasks

      • Filtering: P(St | O1:t) (a forward-algorithm sketch follows this list)

      • Prediction: P(St+k | O1:t)

      • Smoothing: P(Sk | O1:T) for a past time step k < T

      • Viterbi: Most likely state sequence
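
    • A forward-algorithm filtering sketch in Python; the two-state umbrella-style model and all of its numbers are illustrative assumptions:

      # States: 0 = Rainy, 1 = Sunny; observations: 0 = Umbrella, 1 = NoUmbrella.
      initial    = [0.5, 0.5]                        # P(S0)
      transition = [[0.7, 0.3], [0.3, 0.7]]          # transition[i][j] = P(St=j | St-1=i)
      emission   = [[0.9, 0.1], [0.2, 0.8]]          # emission[j][o]   = P(Ot=o | St=j)

      def filter_forward(observations):
          """Return the filtered belief P(St | O1:t) after the last observation."""
          belief = initial[:]
          for o in observations:
              # predict with the transition model, then correct with the emission model
              predicted = [sum(belief[i] * transition[i][j] for i in range(2)) for j in range(2)]
              unnorm = [emission[j][o] * predicted[j] for j in range(2)]
              z = sum(unnorm)
              belief = [u / z for u in unnorm]
          return belief

      print(filter_forward([0, 0, 1]))   # belief over (Rainy, Sunny) after three observations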


Utility & Decision Making in AI

  1. Understanding Utility

    • A numeric score U(s) expressing how strongly the agent prefers outcome s.

    • Higher score = more preferred.

  2. Expected Utility Theory

    EU(a) = ∑s' P(s'|a) U(s')

    • Choose the action a that maximizes EU(a), as sketched below.
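
    • A minimal expected-utility calculation in Python; the outcome probabilities and utilities are made-up illustrations:

      # Outcomes of one action a, as (P(s'|a), U(s')) pairs.
      outcomes = [(0.7, 100.0), (0.2, 20.0), (0.1, -50.0)]

      expected_utility = sum(p * u for p, u in outcomes)
      print(expected_utility)   # 0.7*100 + 0.2*20 + 0.1*(-50) = 69.0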

  3. Markov Decision Processes (MDPs)

    • MDP Tuple Definition

      • S: states, A: actions

      • T(s,a,s') = P(s'|s,a)

      • R(s,a): immediate reward

      • γ ∈ [0,1]: discount factor

  4. Value Iteration Algorithm

    • Initialize V0(s) arbitrarily.

    • Iteratively update:

      Vk+1(s) = max_a [R(s,a) + γ ∑s' T(s,a,s') Vk(s')]

    • Converges to V*; derive the optimal policy π*(s) = argmax_a [R(s,a) + γ ∑s' T(s,a,s') V*(s')] (see the sketch below).
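
    • A value-iteration sketch in Python on a two-state toy MDP; all states, actions, and numbers are made-up illustrations, and the dictionaries double as a concrete encoding of the MDP tuple from item 3:

      S, A = ["s0", "s1"], ["stay", "go"]
      T = {  # T[s][a][s'] = P(s'|s,a)
          "s0": {"stay": {"s0": 0.9, "s1": 0.1}, "go": {"s0": 0.2, "s1": 0.8}},
          "s1": {"stay": {"s0": 0.1, "s1": 0.9}, "go": {"s0": 0.8, "s1": 0.2}},
      }
      R = {  # R[s][a] = immediate reward
          "s0": {"stay": 0.0, "go": -1.0},
          "s1": {"stay": 2.0, "go": 0.0},
      }
      gamma = 0.9

      def q_value(s, a, V):
          return R[s][a] + gamma * sum(T[s][a][s2] * V[s2] for s2 in S)

      def value_iteration(eps=1e-6):
          V = {s: 0.0 for s in S}                     # initialize V0 (here: all zeros)
          while True:
              V_new = {s: max(q_value(s, a, V) for a in A) for s in S}   # Bellman backup
              if max(abs(V_new[s] - V[s]) for s in S) < eps:             # converged?
                  return V_new
              V = V_new

      V_star = value_iteration()
      pi_star = {s: max(A, key=lambda a: q_value(s, a, V_star)) for s in S}
      print(V_star, pi_star)   # optimal values and the greedy optimal policy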

  5. Policy Iteration Algorithm

    1. Policy Evaluation: Solve for Vπ under the current policy π.

    2. Policy Improvement:
      π'(s) = argmax_a [R(s,a) + γ ∑s' T(s,a,s') Vπ(s')]

    3. Repeat until π stabilizes.
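
    • A policy-iteration sketch in Python on the same two-state toy MDP, repeated here so the snippet runs on its own; policy evaluation is approximated with iterative sweeps rather than by solving the linear system exactly:

      S, A = ["s0", "s1"], ["stay", "go"]
      T = {"s0": {"stay": {"s0": 0.9, "s1": 0.1}, "go": {"s0": 0.2, "s1": 0.8}},
           "s1": {"stay": {"s0": 0.1, "s1": 0.9}, "go": {"s0": 0.8, "s1": 0.2}}}
      R = {"s0": {"stay": 0.0, "go": -1.0}, "s1": {"stay": 2.0, "go": 0.0}}
      gamma = 0.9

      def q_value(s, a, V):
          return R[s][a] + gamma * sum(T[s][a][s2] * V[s2] for s2 in S)

      def evaluate(pi, sweeps=500):
          """Policy evaluation: approximate Vπ by repeated expected-value sweeps."""
          V = {s: 0.0 for s in S}
          for _ in range(sweeps):
              V = {s: q_value(s, pi[s], V) for s in S}
          return V

      def policy_iteration():
          pi = {s: A[0] for s in S}                           # start from an arbitrary policy
          while True:
              V = evaluate(pi)                                # 1. policy evaluation
              pi_new = {s: max(A, key=lambda a: q_value(s, a, V)) for s in S}   # 2. improvement
              if pi_new == pi:                                # 3. stop once the policy stabilizes
                  return pi, V
              pi = pi_new

      print(policy_iteration())   # stable policy and its value function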