AI Essentials: Intelligent Agents, Search, and Decision Theory
1. Intelligent Agents
Key Concepts
- Agent: Maps percepts to actions.
- Intelligent Agent: An agent that acts rationally in its environment to achieve its goals.
- PEAS Framework:
- Performance Measure: Criteria for success.
- Environment: The external conditions.
- Actuators: Tools for acting.
- Sensors: Tools for perceiving.
Example
A robot vacuum:
- Performance: Amount of dirt cleaned.
- Environment: A home layout.
- Actuators: Wheels, suction.
- Sensors: Cameras, bump detectors.
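A minimal sketch of the percept-to-action mapping for such a vacuum. The two-location world ("A"/"B"), the percept format, and the action names are assumptions made for illustration:

```python
# Minimal reflex agent for a hypothetical two-location vacuum world.
# Percept = (location, is_dirty); the agent maps each percept straight to an action.
def vacuum_agent(percept):
    location, is_dirty = percept
    if is_dirty:
        return "Suck"                                   # actuator: suction
    return "Right" if location == "A" else "Left"       # actuator: wheels

print(vacuum_agent(("A", True)))    # -> Suck
print(vacuum_agent(("B", False)))   # -> Left
```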
2. Search
State Spaces
- Components: States, Initial State, Goal State, Transition Model, Path Cost.
- Example: Solving a maze:
- States: Positions in the maze.
- Transition: Moves to adjacent cells.
- Cost: 1 per step.
Search Algorithms
Uninformed Search:
- BFS (Breadth-First Search): Explores all nodes level by level; see the maze sketch after this list.
- Uniform-Cost Search: Prioritizes paths with the lowest cost.
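A short sketch of BFS applied to the maze state space described above. The specific grid, start, and goal are assumptions for illustration:

```python
from collections import deque

# BFS over the maze state space: states are (row, col) positions,
# transitions move to adjacent open cells, and path cost is 1 per step.
def bfs_maze(grid, start, goal):
    frontier = deque([(start, [start])])
    visited = {start}
    while frontier:
        (r, c), path = frontier.popleft()
        if (r, c) == goal:
            return path                     # shallowest solution found first
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            nr, nc = r + dr, c + dc
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] == 0 and (nr, nc) not in visited):
                visited.add((nr, nc))
                frontier.append(((nr, nc), path + [(nr, nc)]))
    return None

maze = [[0, 0, 1],     # 0 = open cell, 1 = wall
        [1, 0, 0],
        [1, 1, 0]]
print(bfs_maze(maze, (0, 0), (2, 2)))   # -> [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)]
```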
Informed Search:
- A* Search: f(n)=g(n)+h(n)
- g(n): Cost from the start to n.
- h(n): Estimated cost from n to the goal.
- Application: If g(n)=4 and h(n)=3, then f(n)=7 (see the A* sketch below).
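A compact A* sketch on the same kind of grid, assuming Manhattan distance as the heuristic h(n); with h(n) = 0 the same code behaves like uniform-cost search:

```python
import heapq

# A* search on a grid: expand the node with the lowest f(n) = g(n) + h(n).
def astar(grid, start, goal):
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])   # Manhattan heuristic
    frontier = [(h(start), 0, start, [start])]                # (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, (r, c), path = heapq.heappop(frontier)
        if (r, c) == goal:
            return g, path                                    # path cost and path
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0
                    and g + 1 < best_g.get(nxt, float("inf"))):
                best_g[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None

maze = [[0, 0, 1],     # 0 = open cell, 1 = wall
        [1, 0, 0],
        [1, 1, 0]]
print(astar(maze, (0, 0), (2, 2)))   # -> (4, [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)])
```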
3. Decision Theory
Key Concepts
- Utility: A numeric representation of preferences.
- Expected Utility: EU(a) = ∑_s P(s∣a) U(s)
- P(s∣a): Probability of s given action a.
- U(s): Utility of state s.
Example
Choosing between actions with outcomes:
- Action A: 70% chance of +10, 30% chance of −5.
- Action B: 50% chance of +20, 50% chance of −15.
For Action A: EU(A)=(0.7)(10)+(0.3)(−5)=7−1.5=5.5
For Action B: EU(B)=(0.5)(20)+(0.5)(−15)=10−7.5=2.5
Choose Action A.
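The same comparison can be reproduced with a few lines of code (outcome lists taken from the example above):

```python
# Expected utility: EU(a) = sum over outcomes s of P(s|a) * U(s).
def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)

eu_a = expected_utility([(0.7, 10), (0.3, -5)])    # 5.5
eu_b = expected_utility([(0.5, 20), (0.5, -15)])   # 2.5
print("Choose A" if eu_a > eu_b else "Choose B")   # -> Choose A
```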
4. Supervised Learning
Key Formulas
Entropy: H(S) = −∑_i p_i log2(p_i)
- p_i: Proportion of examples in class i.
Information Gain: IG = H(S) − ∑_j (|S_j|/|S|) H(S_j), i.e., the entropy minus the weighted entropy of the subsets after a split.
Example: A dataset of 10 emails, 6 “spam” and 4 “legit.”
- Initial Entropy: H(S)=−[(6/10)log2(6/10)+(4/10)log2(4/10)]≈0.97
Split by feature:
- Subset 1: 3 “spam”, 1 “legit.”
- Subset 2: 3 “spam”, 3 “legit.”
Subset entropies:
- H(S1)=−[(3/4)log2(3/4)+(1/4)log2(1/4)]≈0.81
- H(S2)=−[(3/6)log2(3/6)+(3/6)log2(3/6)]=1.0
Weighted entropy: H(S_split)=(4/10)(0.81)+(6/10)(1.0)=0.924
Information Gain: IG=0.97−0.924=0.046
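A short sketch reproducing these numbers (class counts taken from the example):

```python
import math

# Entropy H(S) = -sum_i p_i * log2(p_i) over class proportions p_i.
def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

h_s = entropy([6, 4])                                               # ~0.971
weighted = (4 / 10) * entropy([3, 1]) + (6 / 10) * entropy([3, 3])  # ~0.925
print(round(h_s - weighted, 3))                                     # information gain, ~0.046
```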
5. Reinforcement Learning
Q-Learning Formula
Q(s,a) ← Q(s,a) + α[r + γ max_a′ Q(s′,a′) − Q(s,a)]
- α: Learning rate.
- γ: Discount factor.
- r: Reward.
- Q(s′,a′): Future state-action value.
Example
Given:
- Q(s,a)=2
- r=1
- γ=0.9
- max_a′ Q(s′,a′)=3
- α=0.5
Calculation: Q(s,a)=2+0.5[1+0.9(3)−2]
Q(s,a)=2+0.5[1.7]=2.85.
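The same update as a one-line function (arguments taken from the example):

```python
# One Q-learning update:
# Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
def q_update(q_sa, r, gamma, max_q_next, alpha):
    return q_sa + alpha * (r + gamma * max_q_next - q_sa)

print(q_update(q_sa=2, r=1, gamma=0.9, max_q_next=3, alpha=0.5))   # -> 2.85
```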
6. Sequential Decision Making
Bellman Equation
V(s) = max_a ∑_s′ P(s′∣s,a)[R(s,a,s′) + γV(s′)]
Example
- States: s1,s2
- Actions: a1,a2
- Transition probabilities: P(s′∣s,a)
- Reward: R(s,a,s′)
Let:
- P(s2∣s1,a1)=0.8
- R(s1,a1,s2)=10
- V(s2)=15
- γ=0.9
Calculation: V(s1) = max_a ∑_s′ P(s′∣s1,a)[R(s1,a,s′) + γV(s′)]
For a1: V(s1) = (0.8)[10 + 0.9(15)] = 0.8[23.5] = 18.8
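The backup for this example as code; only the a1 → s2 transition is given above, so the sum has a single term:

```python
# One Bellman backup: V(s) = max_a sum_s' P(s'|s,a) * (R(s,a,s') + gamma * V(s'))
def bellman_backup(transitions, gamma):
    # transitions: {action: [(prob, reward, value_of_next_state), ...]}
    return max(sum(p * (r + gamma * v) for p, r, v in outcomes)
               for outcomes in transitions.values())

print(bellman_backup({"a1": [(0.8, 10, 15)]}, gamma=0.9))   # 0.8 * (10 + 13.5) = 18.8
```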
7. Key General Concepts
Exploitation vs Exploration:
- Exploitation: Act on the best strategy known so far.
- Exploration: Try new actions to discover potentially better strategies (see the ε-greedy sketch below).
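One common way to balance the two is ε-greedy action selection; the sketch below is illustrative only, and ε = 0.1 plus the action values are invented:

```python
import random

# Epsilon-greedy: explore with probability epsilon, otherwise exploit the best-known action.
def epsilon_greedy(q_values, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(list(q_values))       # exploration: random action
    return max(q_values, key=q_values.get)         # exploitation: current best action

print(epsilon_greedy({"a1": 2.0, "a2": 3.5, "a3": 1.2}))   # usually "a2"
```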
Goodhart’s Law:
- “When a measure becomes a target, it ceases to be a good measure.”
Generalization:
- Ensuring the model performs well on unseen data, not just the training set.
Markov Decision Processes (MDP)
Definition
MDPs provide a framework for sequential decision-making where outcomes are partly random and partly under the control of a decision-maker.
Components:
- States (S): The set of possible situations.
- Actions (A): The set of actions available to the agent.
- Transition Model (P(s′∣s,a)): Probability of moving to state s′ from state s after action a.
- Rewards (R(s,a,s′)): Immediate reward received after transitioning from s to s′ via a.
- Discount Factor (γ): Balances the importance of immediate rewards versus future rewards (0 ≤ γ < 1).
Utility of a Policy (π)
A policy π(s) defines the action to take in state s.
Utility of a state under a policy: U^π(s) = ∑_s′ P(s′∣s,π(s))[R(s,π(s),s′) + γU^π(s′)]
Optimal Policy: The policy that maximizes the expected utility for all states.
Value of a State
The maximum utility achievable from a state: V(s) = max_a ∑_s′ P(s′∣s,a)[R(s,a,s′) + γV(s′)]
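A compact value-iteration sketch under an assumed toy-MDP layout; the dictionary format, states, and numbers below are invented for illustration, and replacing the max over actions with the policy's single action gives U^π instead of V:

```python
# Value iteration: repeatedly apply
# V(s) = max_a sum_s' P(s'|s,a) * (R(s,a,s') + gamma * V(s')) until values stop changing.
def value_iteration(mdp, gamma=0.9, theta=1e-6):
    # mdp: {state: {action: [(prob, reward, next_state), ...]}}
    V = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            new_v = max(sum(p * (r + gamma * V[s2]) for p, r, s2 in outcomes)
                        for outcomes in actions.values())
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < theta:
            return V

toy_mdp = {
    "s1": {"a1": [(0.8, 10, "s2"), (0.2, 0, "s1")], "a2": [(1.0, 0.5, "s1")]},
    "s2": {"a1": [(1.0, 0, "s2")]},
}
print(value_iteration(toy_mdp))   # V(s1) converges near 9.76; V(s2) stays at 0
```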