Probabilistic Reasoning & AI Decision Making
Probabilistic Inference
What Is Probability?
A measure P assigning each event A a number in [0,1].
Axioms of Probability
P(∅) = 0, P(Ω) = 1
For disjoint A, B: P(A∪B) = P(A) + P(B)
Conditional Probability:
P(A|B) = P(A∧B) / P(B)
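A minimal sketch of the conditional-probability formula in Python, using a made-up joint distribution over two binary events A and B (the table values are assumptions for illustration):

# Conditional probability from a small joint distribution.
# The joint table is a made-up example over two binary events A, B.
joint = {
    (True, True): 0.12,   # P(A ∧ B)
    (True, False): 0.18,
    (False, True): 0.28,
    (False, False): 0.42,
}

p_a_and_b = joint[(True, True)]
p_b = sum(p for (a, b), p in joint.items() if b)   # marginalize over A

p_a_given_b = p_a_and_b / p_b                      # P(A|B) = P(A ∧ B) / P(B)
print(p_a_given_b)                                 # 0.12 / 0.40 = 0.3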
Bayesian Inference Fundamentals
Bayes’ Rule
P(H|E) = [P(E|H) * P(H)] / P(E)
Key Terms in Bayes’ Rule
Prior P(H): initial belief
Likelihood P(E|H): probability of the evidence given the hypothesis
Posterior P(H|E): updated belief
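A quick sketch of Bayes’ rule in code, using a hypothetical diagnostic-test scenario; the prior, likelihood, and false-positive rate are assumed numbers, not from the notes:

# Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E).
# Hypothetical diagnostic test; all numbers are illustrative assumptions.
prior = 0.01            # P(H): prior belief that the hypothesis (disease) holds
likelihood = 0.95       # P(E|H): probability of a positive test given disease
false_positive = 0.05   # P(E|¬H): probability of a positive test without disease

# P(E) by total probability over H and ¬H
evidence = likelihood * prior + false_positive * (1 - prior)

posterior = likelihood * prior / evidence   # P(H|E): updated belief
print(round(posterior, 3))                  # ≈ 0.161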
Bayesian Networks (BNs)
Structure: Directed acyclic graph; nodes = variables; edges = dependencies.
Factorization:
P(X1,...,Xn) = ∏i P(Xi | Parents(Xi))
Monty Hall Problem Example
Nodes: PrizeDoor → ContestantPick → HostOpens → SwitchDecision
Observing HostOpens updates P(PrizeDoor) via Bayes’ rule.
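A minimal enumeration of the Monty Hall update, assuming the standard rules (the host never opens the contestant’s door or the prize door, and chooses uniformly when two doors qualify); the door numbering and fixed pick are illustrative choices:

# Enumerate the Monty Hall network: PrizeDoor -> HostOpens, with ContestantPick fixed.
doors = [0, 1, 2]
pick = 0                       # contestant's (fixed) initial pick
posterior = {d: 0.0 for d in doors}

for prize in doors:            # prior P(PrizeDoor = prize) = 1/3
    p_prize = 1 / 3
    options = [d for d in doors if d != pick and d != prize]
    for opened in options:     # P(HostOpens = opened | prize, pick)
        p_open = 1 / len(options)
        if opened == 2:        # condition on the evidence: host opens door 2
            posterior[prize] += p_prize * p_open

total = sum(posterior.values())                        # P(HostOpens = 2)
posterior = {d: p / total for d, p in posterior.items()}
print(posterior)   # door 0 (stay): 1/3, door 1 (switch): 2/3, door 2: 0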
Inference Algorithms on BNs
Exact Inference Methods
Enumeration/Variable Elimination: Sum out hidden (non-query, non-evidence) variables in a chosen elimination order.
Complexity: Exponential in treewidth.
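A sketch of exact inference by enumeration on a tiny assumed chain network Cloudy → Rain → WetGrass (all CPT values are made up): the hidden variable is summed out of the factorized joint and the result normalized; variable elimination computes the same answer more efficiently by caching intermediate factors.

# Exact inference by enumeration on a tiny chain: Cloudy -> Rain -> WetGrass.
# CPT numbers are illustrative assumptions, not from the notes.
P_cloudy = {True: 0.5, False: 0.5}
P_rain_given_cloudy = {True: 0.8, False: 0.2}     # P(Rain=True | Cloudy)
P_wet_given_rain = {True: 0.9, False: 0.2}        # P(WetGrass=True | Rain)

def joint(c, r, w):
    # Full joint via the BN factorization: P(C) * P(R|C) * P(W|R).
    p = P_cloudy[c]
    p *= P_rain_given_cloudy[c] if r else 1 - P_rain_given_cloudy[c]
    p *= P_wet_given_rain[r] if w else 1 - P_wet_given_rain[r]
    return p

# Query P(Cloudy | WetGrass=True): sum out the hidden variable Rain, then normalize.
unnorm = {c: sum(joint(c, r, True) for r in (True, False)) for c in (True, False)}
z = sum(unnorm.values())
posterior = {c: p / z for c, p in unnorm.items()}
print(posterior)   # P(Cloudy=True | WetGrass=True) ≈ 0.69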
Approximate Inference Methods
Rejection Sampling: Sample full joint, discard inconsistent with evidence.
Likelihood Weighting: Weight samples by evidence probability.
Gibbs Sampling: Iterate sampling each variable given Markov blanket.
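A sketch of likelihood weighting on a tiny assumed two-node network Burglary → Alarm with evidence Alarm = true; the CPT numbers are illustrative. Evidence variables are clamped to their observed values and each sample is weighted by the probability of that evidence given the sampled parents.

import random

# Likelihood weighting: estimate P(Burglary=True | Alarm=True).
P_burglary = 0.01
P_alarm_given_b = {True: 0.95, False: 0.05}   # P(Alarm=True | Burglary)

def weighted_sample():
    b = random.random() < P_burglary          # sample non-evidence variable from its prior
    w = P_alarm_given_b[b]                    # weight by P(evidence | parents); Alarm clamped to True
    return b, w

random.seed(0)
num = den = 0.0
for _ in range(100_000):
    b, w = weighted_sample()
    den += w
    if b:
        num += w

print(num / den)   # ≈ exact answer 0.95*0.01 / (0.95*0.01 + 0.05*0.99) ≈ 0.161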
Temporal Models: Hidden Markov Models (HMMs)
HMM Components
Hidden states St, observations Ot
Transition: P(St | St-1)
Emission: P(Ot | St)
Initial: P(S0)
Key HMM Tasks
Filtering: P(St | O1:t)
Prediction: P(St+k | O1:t)
Smoothing: P(Sk | O1:T)
Viterbi: Most likely state sequence
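A minimal filtering (forward-algorithm) sketch for a two-state HMM; the transition and emission tables are illustrative umbrella-world style assumptions:

# HMM filtering with the forward algorithm on a two-state example.
# States: Rain / NoRain; observation Ot: umbrella seen or not. Numbers are assumptions.
states = ["Rain", "NoRain"]
initial = {"Rain": 0.5, "NoRain": 0.5}                       # P(S0)
transition = {"Rain": {"Rain": 0.7, "NoRain": 0.3},          # P(St | St-1)
              "NoRain": {"Rain": 0.3, "NoRain": 0.7}}
emission = {"Rain": {True: 0.9, False: 0.1},                 # P(Ot | St)
            "NoRain": {True: 0.2, False: 0.8}}

def filter_step(belief, obs):
    # One step of filtering: predict with the transition model, update with the emission.
    predicted = {s: sum(belief[p] * transition[p][s] for p in states) for s in states}
    unnorm = {s: emission[s][obs] * predicted[s] for s in states}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

belief = initial
for obs in [True, True, False]:          # observation sequence O1:3
    belief = filter_step(belief, obs)
print(belief)                            # P(S3 | O1:3)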
Utility & Decision Making in AI
Understanding Utility
Numeric score U(s) representing preference over outcomes. Higher score = more preferred.
Expected Utility Theory
EU(a) = ∑s' P(s'|a) U(s')
Choose action a maximizing EU(a).
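A minimal expected-utility calculation; the actions, outcome probabilities, and utilities are made-up values:

# Expected utility: EU(a) = Σ_s' P(s'|a) * U(s'). Pick the action with the highest EU.
utilities = {"win": 100, "draw": 20, "lose": -50}           # U(s')

actions = {
    "aggressive": {"win": 0.5, "draw": 0.1, "lose": 0.4},   # P(s' | a)
    "cautious":   {"win": 0.2, "draw": 0.6, "lose": 0.2},
}

def expected_utility(action):
    return sum(p * utilities[s] for s, p in actions[action].items())

best = max(actions, key=expected_utility)   # choose the action maximizing EU(a)
for a in actions:
    print(a, expected_utility(a))           # aggressive: 32, cautious: 22
print("best:", best)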
Markov Decision Processes (MDPs)
MDP Tuple Definition
S: states, A: actions
T(s,a,s') = P(s'|s,a)
R(s,a): immediate reward
γ ∈ [0,1]: discount factor
Value Iteration Algorithm
Initialize V0(s) arbitrarily.
Iteratively update: Vk+1(s) = max_a [R(s,a) + γ ∑s' T(s,a,s') Vk(s')]
Converges to V*; derive π*(s) = argmax_a [R(s,a) + γ ∑s' T(s,a,s') V*(s')].
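A value-iteration sketch on a tiny made-up MDP (states, actions, transitions, and rewards are assumptions), following the update above:

# Value iteration on a tiny made-up MDP.
S = ["s0", "s1"]
A = ["stay", "go"]
T = {("s0", "stay"): {"s0": 1.0},
     ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
     ("s1", "stay"): {"s1": 1.0},
     ("s1", "go"):   {"s0": 1.0}}
R = {("s0", "stay"): 0.0, ("s0", "go"): -1.0,
     ("s1", "stay"): 2.0, ("s1", "go"): 0.0}
gamma = 0.9

V = {s: 0.0 for s in S}                      # initialize V0(s) arbitrarily
for _ in range(1000):
    # Vk+1(s) = max_a [R(s,a) + γ Σ_s' T(s,a,s') Vk(s')]
    V_new = {s: max(R[s, a] + gamma * sum(p * V[s2] for s2, p in T[s, a].items())
                    for a in A)
             for s in S}
    if max(abs(V_new[s] - V[s]) for s in S) < 1e-6:   # stop once converged
        V = V_new
        break
    V = V_new

# π*(s) = argmax_a [R(s,a) + γ Σ_s' T(s,a,s') V*(s')]
policy = {s: max(A, key=lambda a: R[s, a] + gamma * sum(p * V[s2] for s2, p in T[s, a].items()))
          for s in S}
print(V, policy)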
Policy Iteration Algorithm
Policy Evaluation: Solve for Vπ under the current π.
Policy Improvement: π'(s) = argmax_a [R(s,a) + γ ∑s' T(s,a,s') Vπ(s')]
Repeat until π stabilizes.
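A policy-iteration sketch on the same tiny made-up MDP as above; for simplicity it evaluates the policy by repeated backups rather than solving the linear system exactly.

# Policy iteration on a tiny made-up MDP (same illustrative model as the value-iteration sketch).
S = ["s0", "s1"]
A = ["stay", "go"]
T = {("s0", "stay"): {"s0": 1.0},
     ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
     ("s1", "stay"): {"s1": 1.0},
     ("s1", "go"):   {"s0": 1.0}}
R = {("s0", "stay"): 0.0, ("s0", "go"): -1.0,
     ("s1", "stay"): 2.0, ("s1", "go"): 0.0}
gamma = 0.9

def q(s, a, V):
    # Action value: R(s,a) + γ Σ_s' T(s,a,s') V(s')
    return R[s, a] + gamma * sum(p * V[s2] for s2, p in T[s, a].items())

policy = {s: "stay" for s in S}              # arbitrary initial policy
while True:
    # Policy evaluation: approximate Vπ under the current π by repeated backups.
    V = {s: 0.0 for s in S}
    for _ in range(1000):
        V = {s: q(s, policy[s], V) for s in S}
    # Policy improvement: π'(s) = argmax_a [R(s,a) + γ Σ_s' T(s,a,s') Vπ(s')]
    improved = {s: max(A, key=lambda a: q(s, a, V)) for s in S}
    if improved == policy:                   # repeat until π stabilizes
        break
    policy = improved

print(policy, V)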
