Probabilistic Reasoning & AI Decision Making
Probabilistic Inference
What Is Probability?
A measure P assigning each event A a number in [0,1].
Axioms of Probability
P(∅) = 0, P(Ω) = 1
For disjoint A, B: P(A∪B) = P(A) + P(B)
Conditional Probability:
P(A|B) = P(A∧B) / P(B)
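
A minimal sketch of the conditional-probability formula in Python, using a made-up two-variable joint table (all numbers are illustrative, not from the notes):

```python
# A two-variable joint distribution stored as a dict; the numbers are made up.
joint = {
    (True, True): 0.20,    # P(A ∧ B)
    (True, False): 0.10,   # P(A ∧ ¬B)
    (False, True): 0.30,   # P(¬A ∧ B)
    (False, False): 0.40,  # P(¬A ∧ ¬B)
}

p_b = sum(p for (_, b), p in joint.items() if b)  # marginal P(B) = 0.50
p_a_and_b = joint[(True, True)]                   # P(A ∧ B) = 0.20
p_a_given_b = p_a_and_b / p_b                     # P(A|B) = P(A ∧ B) / P(B)
print(f"P(A|B) = {p_a_given_b:.2f}")              # 0.40
```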
Bayesian Inference Fundamentals
Bayes’ Rule
P(H|E) = [P(E|H) * P(H)] / P(E)
Key Terms in Bayes’ Rule
Prior P(H): initial belief
Likelihood P(E|H): probability of the evidence given the hypothesis
Posterior P(H|E): updated belief
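
A small numeric sketch of Bayes’ rule; the prior, likelihood, and false-positive rate are hypothetical values chosen only to show the update:

```python
# Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E), with
# P(E) = P(E|H) * P(H) + P(E|¬H) * P(¬H) by total probability.
prior = 0.01            # P(H): initial belief (hypothetical)
likelihood = 0.95       # P(E|H): evidence probability if H is true (hypothetical)
false_positive = 0.05   # P(E|¬H): evidence probability if H is false (hypothetical)

evidence = likelihood * prior + false_positive * (1 - prior)  # P(E)
posterior = likelihood * prior / evidence                     # P(H|E)
print(f"P(H|E) = {posterior:.3f}")  # ≈ 0.161: the evidence raises belief but is far from conclusive
```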
Bayesian Networks (BNs)
Structure: Directed acyclic graph; nodes = variables; edges = dependencies.
Factorization:
P(X1,...,Xn) = ∏i P(Xi | Parents(Xi))
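
A sketch of the factorization on a small hypothetical rain/sprinkler/wet-grass network; node names and CPT values are assumptions for illustration:

```python
# Hypothetical network: Rain and Sprinkler are root nodes, WetGrass has both as parents.
P_rain = {True: 0.2, False: 0.8}       # P(Rain)
P_sprinkler = {True: 0.1, False: 0.9}  # P(Sprinkler)
P_wet_true = {                         # P(WetGrass=True | Rain, Sprinkler)
    (True, True): 0.99, (True, False): 0.90,
    (False, True): 0.85, (False, False): 0.01,
}

def joint(rain, sprinkler, wet):
    """Joint probability as the product of the per-node conditionals."""
    p_wet = P_wet_true[(rain, sprinkler)] if wet else 1.0 - P_wet_true[(rain, sprinkler)]
    return P_rain[rain] * P_sprinkler[sprinkler] * p_wet

print(joint(True, False, True))  # P(Rain, ¬Sprinkler, WetGrass) = 0.2 * 0.9 * 0.9 = 0.162
```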
Monty Hall Problem Example
Nodes: PrizeDoor → ContestantPick → HostOpens → SwitchDecision
Observing HostOpens updates P(PrizeDoor) via Bayes’ rule.
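
A short enumeration of the Monty Hall posterior, assuming the standard host behavior (the host knows the prize, never opens the picked or prize door, and otherwise chooses uniformly):

```python
from fractions import Fraction

# The contestant picks door 1; the host opens door 3.
doors = [1, 2, 3]
pick, opened = 1, 3

unnormalized = {}
for prize in doors:
    prior = Fraction(1, 3)                                   # P(PrizeDoor = prize)
    legal = [d for d in doors if d != pick and d != prize]   # doors the host may open
    likelihood = Fraction(1, len(legal)) if opened in legal else Fraction(0)
    unnormalized[prize] = prior * likelihood                 # P(prize) * P(HostOpens | prize, pick)

total = sum(unnormalized.values())                           # P(evidence)
posterior = {d: p / total for d, p in unnormalized.items()}
print(posterior)  # door 1 -> 1/3, door 2 -> 2/3, door 3 -> 0: switching doubles the win chance
```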
Inference Algorithms on BNs
Exact Inference Methods
Enumeration/Variable Elimination: Sum out irrelevant variables in a chosen order.
Complexity: Exponential in treewidth.
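
A minimal inference-by-enumeration sketch: the hidden Sprinkler variable is summed out of the hypothetical rain/sprinkler network above to answer P(Rain | WetGrass=True):

```python
# Query P(Rain | WetGrass=True): sum the joint over the hidden Sprinkler variable,
# then normalize. CPT values are the same illustrative numbers as above.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.1, False: 0.9}
P_wet_true = {(True, True): 0.99, (True, False): 0.90,
              (False, True): 0.85, (False, False): 0.01}

def score(rain):
    # Unnormalized P(rain, WetGrass=True): sum out Sprinkler.
    return sum(P_rain[rain] * P_sprinkler[s] * P_wet_true[(rain, s)] for s in (True, False))

scores = {r: score(r) for r in (True, False)}
z = sum(scores.values())
print({r: p / z for r, p in scores.items()})  # P(Rain=True | wet) ≈ 0.707
```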
Approximate Inference Methods
Rejection Sampling: Sample full joint, discard inconsistent with evidence.
Likelihood Weighting: Weight samples by evidence probability.
Gibbs Sampling: Iterate sampling each variable given Markov blanket.
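
A rejection-sampling sketch on the same hypothetical network: samples are drawn from the prior joint and kept only when they match the evidence.

```python
import random

# Estimate P(Rain | WetGrass=True) by sampling the full joint and discarding
# samples inconsistent with the evidence. Same illustrative CPTs as above.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.1, False: 0.9}
P_wet_true = {(True, True): 0.99, (True, False): 0.90,
              (False, True): 0.85, (False, False): 0.01}

rng = random.Random(0)
kept = rain_true = 0
for _ in range(100_000):
    rain = rng.random() < P_rain[True]                  # sample the roots from their priors
    sprinkler = rng.random() < P_sprinkler[True]
    wet = rng.random() < P_wet_true[(rain, sprinkler)]  # then sample the child
    if wet:                                             # keep only evidence-consistent samples
        kept += 1
        rain_true += rain
print(rain_true / kept)  # ≈ 0.71, close to the exact answer above
```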
Temporal Models: Hidden Markov Models (HMMs)
HMM Components
Hidden states St, observations Ot
Transition: P(St | St-1)
Emission: P(Ot | St)
Initial: P(S0)
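
The components written out for a hypothetical two-state umbrella HMM (all probabilities are illustrative):

```python
import numpy as np

# Hypothetical 2-state umbrella HMM: hidden states Rainy/Sunny, observations Umbrella/NoUmbrella.
states = ["Rainy", "Sunny"]
observations = ["Umbrella", "NoUmbrella"]

initial = np.array([0.5, 0.5])            # P(S0)
transition = np.array([[0.7, 0.3],        # P(St | St-1); row = previous state
                       [0.3, 0.7]])
emission = np.array([[0.9, 0.1],          # P(Ot | St); row = current state, column = observation
                     [0.2, 0.8]])

# Each conditional distribution must sum to 1.
assert np.allclose(transition.sum(axis=1), 1.0)
assert np.allclose(emission.sum(axis=1), 1.0)
```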
Key HMM Tasks
Filtering: P(St | O1:t)
Prediction: P(St+k | O1:t)
Smoothing: P(Sk | O1:T)
Viterbi: Most likely state sequence
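
A filtering sketch (forward algorithm) for the same hypothetical umbrella HMM: the recursion predicts with the transition model, weights by the emission probability of the new observation, and normalizes.

```python
import numpy as np

# Same hypothetical umbrella HMM as above (illustrative numbers).
initial = np.array([0.5, 0.5])                       # P(S0): Rainy, Sunny
transition = np.array([[0.7, 0.3], [0.3, 0.7]])      # P(St | St-1)
emission = np.array([[0.9, 0.1], [0.2, 0.8]])        # P(Ot | St); columns: Umbrella, NoUmbrella

def filter_states(obs_sequence):
    """Forward algorithm: return P(St | O1:t) after each observation."""
    belief = initial
    history = []
    for o in obs_sequence:
        predicted = transition.T @ belief            # one-step prediction with the transition model
        belief = emission[:, o] * predicted          # weight by how likely the new observation is
        belief /= belief.sum()                       # normalize to a distribution
        history.append(belief)
    return history

for t, b in enumerate(filter_states([0, 0, 1]), 1):  # Umbrella, Umbrella, NoUmbrella
    print(f"t={t}: P(Rainy)={b[0]:.3f}")
```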
Utility & Decision Making in AI
Understanding Utility
Numeric score U(s) representing preference over outcomes.
Higher score = more preferred.
Expected Utility Theory
EU(a) = ∑s' P(s'|a) U(s')
Choose the action a maximizing EU(a).
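
A minimal expected-utility calculation; the actions, outcome probabilities, and utilities are hypothetical:

```python
# Hypothetical actions, outcome probabilities P(s'|a), and utilities U(s').
utilities = {"good": 100.0, "ok": 40.0, "bad": -50.0}
outcomes = {
    "cautious": {"good": 0.2, "ok": 0.7, "bad": 0.1},
    "risky":    {"good": 0.6, "ok": 0.1, "bad": 0.3},
}

def expected_utility(action):
    # EU(a) = ∑s' P(s'|a) U(s')
    return sum(p * utilities[s] for s, p in outcomes[action].items())

for a in outcomes:
    print(a, expected_utility(a))                      # cautious: 43.0, risky: 49.0
print("choose:", max(outcomes, key=expected_utility))  # pick the EU-maximizing action
```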
Markov Decision Processes (MDPs)
MDP Tuple Definition
S: states
A: actions
T(s,a,s') = P(s'|s,a): transition model
R(s,a): immediate reward
γ ∈ [0,1]: discount factor
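
The tuple written out as plain Python data structures for a tiny hypothetical two-state MDP (all numbers are illustrative):

```python
# A tiny hypothetical two-state MDP written out as the (S, A, T, R, γ) tuple.
S = ["s0", "s1"]
A = ["stay", "go"]

# T[(s, a)]: distribution over successor states, s' -> P(s'|s,a); each row sums to 1.
T = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"):   {"s0": 0.7, "s1": 0.3},
}

# R[(s, a)]: immediate reward for taking action a in state s.
R = {("s0", "stay"): 0.0, ("s0", "go"): -1.0,
     ("s1", "stay"): 2.0, ("s1", "go"): 0.5}

gamma = 0.9  # discount factor
```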
Value Iteration Algorithm
Initialize V0(s) arbitrarily.
Iteratively update:
Vk+1(s) = max_a [R(s,a) + γ ∑s' T(s,a,s') Vk(s')]
Converges to V*; derive π*(s) = argmax_a [R(s,a) + γ ∑s' T(s,a,s') V*(s')].
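
A value-iteration sketch on the tiny hypothetical MDP from above (restated inline so the snippet runs on its own):

```python
# The same tiny hypothetical MDP, restated so the snippet is self-contained.
S = ["s0", "s1"]
A = ["stay", "go"]
T = {("s0", "stay"): {"s0": 0.9, "s1": 0.1}, ("s0", "go"): {"s0": 0.2, "s1": 0.8},
     ("s1", "stay"): {"s1": 1.0},            ("s1", "go"): {"s0": 0.7, "s1": 0.3}}
R = {("s0", "stay"): 0.0, ("s0", "go"): -1.0, ("s1", "stay"): 2.0, ("s1", "go"): 0.5}
gamma = 0.9

def q_value(s, a, V):
    # Q(s,a) = R(s,a) + γ ∑s' T(s,a,s') V(s')
    return R[(s, a)] + gamma * sum(p * V[s2] for s2, p in T[(s, a)].items())

V = {s: 0.0 for s in S}                                          # V0 initialized arbitrarily
for _ in range(1000):
    V_new = {s: max(q_value(s, a, V) for a in A) for s in S}     # Bellman optimality update
    delta = max(abs(V_new[s] - V[s]) for s in S)
    V = V_new
    if delta < 1e-6:                                             # stop once values have converged
        break

policy = {s: max(A, key=lambda a: q_value(s, a, V)) for s in S}  # π*(s) = argmax_a Q(s,a)
print(V, policy)
```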
Policy Iteration Algorithm
Policy Evaluation: Solve for Vπ under the current π.
Policy Improvement:
π'(s) = argmax_a [R(s,a) + γ ∑s' T(s,a,s') Vπ(s')]
Repeat until π stabilizes.
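
A policy-iteration sketch on the same tiny hypothetical MDP, alternating iterative policy evaluation with greedy improvement until the policy stops changing:

```python
# The same tiny hypothetical MDP, restated so the snippet is self-contained.
S = ["s0", "s1"]
A = ["stay", "go"]
T = {("s0", "stay"): {"s0": 0.9, "s1": 0.1}, ("s0", "go"): {"s0": 0.2, "s1": 0.8},
     ("s1", "stay"): {"s1": 1.0},            ("s1", "go"): {"s0": 0.7, "s1": 0.3}}
R = {("s0", "stay"): 0.0, ("s0", "go"): -1.0, ("s1", "stay"): 2.0, ("s1", "go"): 0.5}
gamma = 0.9

def evaluate(policy, sweeps=500):
    """Iterative policy evaluation: Vπ(s) = R(s,π(s)) + γ ∑s' T(s,π(s),s') Vπ(s')."""
    V = {s: 0.0 for s in S}
    for _ in range(sweeps):
        V = {s: R[(s, policy[s])] +
                gamma * sum(p * V[s2] for s2, p in T[(s, policy[s])].items())
             for s in S}
    return V

policy = {s: "stay" for s in S}          # start from an arbitrary policy
while True:
    V = evaluate(policy)                 # policy evaluation under the current π
    improved = {s: max(A, key=lambda a: R[(s, a)] +
                       gamma * sum(p * V[s2] for s2, p in T[(s, a)].items()))
                for s in S}              # greedy policy improvement
    if improved == policy:               # π has stabilized
        break
    policy = improved

print(policy, V)
```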