AI Essentials: Intelligent Agents, Search, and Decision Theory

1. Intelligent Agents

Key Concepts

  • Agent: Maps percepts to actions.
  • Intelligent Agent: An agent that acts rationally in its environment to achieve its goals.
  • PEAS Framework:
    • Performance Measure: Criteria for success.
    • Environment: The external conditions.
    • Actuators: Tools for acting.
    • Sensors: Tools for perceiving.

Example

A robot vacuum:

  • Performance: Amount of dirt cleaned.
  • Environment: A home layout.
  • Actuators: Wheels, suction.
  • Sensors: Cameras, bump detectors.
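
To make the percept-to-action mapping concrete, here is a minimal sketch of a simple reflex agent for the vacuum example. The percept format and the action names (Suck, TurnLeft, MoveForward) are illustrative assumptions, not a fixed API.

    # Minimal reflex-agent sketch for the vacuum example (illustrative only).
    # A percept is assumed to be a dict like {"dirty": bool, "bumped": bool}.

    def vacuum_agent(percept):
        """Map a percept directly to an action (simple reflex agent)."""
        if percept["dirty"]:
            return "Suck"          # act on the performance measure: clean dirt
        if percept["bumped"]:
            return "TurnLeft"      # react to the bump sensor
        return "MoveForward"       # otherwise keep covering the environment

    # Example percept-to-action mappings:
    print(vacuum_agent({"dirty": True,  "bumped": False}))  # Suck
    print(vacuum_agent({"dirty": False, "bumped": True}))   # TurnLeft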

2. Search

State Spaces

  • Components: States, Initial State, Goal State, Transition Model, Path Cost.
  • Example: Solving a maze:
    • States: Positions in the maze.
    • Transition: Moves to adjacent cells.
    • Cost: 1 per step.
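
A minimal sketch of this maze state space, assuming a small hard-coded grid where 1 marks a wall; the names (GRID, neighbors, step_cost) and the grid contents are illustrative.

    # Maze state space sketch: states are (row, col) positions, the transition
    # model yields adjacent open cells, and each step costs 1.
    GRID = [
        [0, 0, 1],
        [1, 0, 0],
        [0, 0, 0],
    ]
    INITIAL_STATE = (0, 0)
    GOAL_STATE = (2, 2)

    def neighbors(state):
        """Transition model: open cells reachable in one move."""
        r, c = state
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            nr, nc = r + dr, c + dc
            if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] == 0:
                yield (nr, nc)

    def step_cost(state, next_state):
        return 1  # uniform path cost: 1 per step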

Search Algorithms

  1. Uninformed Search:

    • BFS (Breadth-First Search): Explores all nodes level by level.
    • Uniform-Cost Search: Prioritizes paths with the lowest cost.
  2. Informed Search:

    • A* Search: f(n)=g(n)+h(n)
      • g(n): Cost from the start to n.
      • h(n): Estimated cost from n to the goal; A* finds an optimal path when h never overestimates (an admissible heuristic).
    • Application: If g(n)=4 and h(n)=3, then f(n)=7.
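
The sketch below runs A* on a small grid using Manhattan distance as h(n), which is admissible for 4-directional moves. The grid layout and function names are assumptions for illustration.

    import heapq

    # A* sketch on a small grid (0 = open, 1 = wall); layout assumed for illustration.
    GRID = [
        [0, 0, 0],
        [1, 1, 0],
        [0, 0, 0],
    ]
    START, GOAL = (0, 0), (2, 0)

    def h(n):
        """Heuristic: Manhattan distance to the goal (admissible on a grid)."""
        return abs(n[0] - GOAL[0]) + abs(n[1] - GOAL[1])

    def a_star(start, goal):
        frontier = [(h(start), 0, start, [start])]   # entries are (f, g, node, path)
        best_g = {start: 0}
        while frontier:
            f, g, node, path = heapq.heappop(frontier)
            if node == goal:
                return path, g
            for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                nxt = (node[0] + dr, node[1] + dc)
                if not (0 <= nxt[0] < 3 and 0 <= nxt[1] < 3) or GRID[nxt[0]][nxt[1]]:
                    continue
                ng = g + 1                            # g(n): cost so far
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(frontier, (ng + h(nxt), ng, nxt, path + [nxt]))
        return None, float("inf")

    path, cost = a_star(START, GOAL)
    print(path, cost)   # path around the wall, cost 6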

3. Decision Theory

Key Concepts

  • Utility: A numeric representation of preferences.
  • Expected Utility: EU(a) = ∑_s P(s∣a) U(s)
    • P(s∣a): Probability of s given action a.
    • U(s): Utility of state s.

Example

Choosing between actions with outcomes:

  • Action A: 70% chance of +10, 30% chance of −5.
  • Action B: 50% chance of +20, 50% chance of −15.

For Action A: EU(A)=(0.7)(10)+(0.3)(−5)=7−1.5=5.5

For Action B: EU(B)=(0.5)(20)+(0.5)(−15)=10−7.5=2.5

Choose Action A.
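
A short sketch that reproduces this calculation; the outcome lists are taken directly from the example above.

    # Expected utility EU(a) = sum over outcomes of P(s|a) * U(s).
    def expected_utility(outcomes):
        """outcomes: list of (probability, utility) pairs."""
        return sum(p * u for p, u in outcomes)

    action_a = [(0.7, 10), (0.3, -5)]    # Action A from the example
    action_b = [(0.5, 20), (0.5, -15)]   # Action B from the example

    print(expected_utility(action_a))    # ≈ 5.5
    print(expected_utility(action_b))    # ≈ 2.5
    # Choose the action with the higher expected utility -> Action A.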


4. Supervised Learning

Key Formulas

  1. Entropy: H(S) = −∑_i p_i log₂(p_i)

    • p_i: Proportion of examples in class i.
  2. Information Gain: IG = H(S) − (weighted average of the subset entropies after the split)

Example

Dataset: 6 “spam”, 4 “legit.”

  • Initial Entropy: H(S) = −[(6/10)log₂(6/10) + (4/10)log₂(4/10)] ≈ 0.97

Split by feature:

  • Subset 1: 3 “spam”, 1 “legit.”
  • Subset 2: 3 “spam”, 3 “legit.”

Subset entropies:

  • H(S1) = −[(3/4)log₂(3/4) + (1/4)log₂(1/4)] ≈ 0.81
  • H(S2) = −[(3/6)log₂(3/6) + (3/6)log₂(3/6)] = 1.0

Weighted entropy: H_split = (4/10)(0.81) + (6/10)(1.0) = 0.924

Information Gain: IG=0.97−0.924=0.046
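
A small sketch that reproduces the numbers above (entropy ≈ 0.97, weighted entropy ≈ 0.924, gain ≈ 0.046); the class counts are those from the spam example.

    from math import log2

    def entropy(counts):
        """H(S) = -sum p_i log2(p_i), given a list of class counts."""
        total = sum(counts)
        return -sum((c / total) * log2(c / total) for c in counts if c > 0)

    parent = [6, 4]             # 6 spam, 4 legit
    subsets = [[3, 1], [3, 3]]  # the two subsets produced by the split

    h_parent = entropy(parent)                                           # ≈ 0.971
    weighted = sum(sum(s) / sum(parent) * entropy(s) for s in subsets)   # ≈ 0.924
    print(round(h_parent - weighted, 3))                                 # 0.046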


5. Reinforcement Learning

Q-Learning Formula

Q(s,a) ← Q(s,a) + α[r + γ max_{a′} Q(s′,a′) − Q(s,a)]

  • α: Learning rate.
  • γ: Discount factor.
  • r: Reward.
  • max_{a′} Q(s′,a′): Best estimated value over actions in the next state s′.

Example

Given:

  • Q(s,a)=2
  • r=1
  • γ=0.9
  • max_{a′} Q(s′,a′) = 3
  • α=0.5

Calculation: Q(s,a)=2+0.5[1+0.9(3)−2]

Q(s,a)=2+0.5[1.7]=2.85.
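
A one-step sketch of the update with the numbers above; the state and action themselves are placeholders, only the values matter here.

    def q_update(q_sa, reward, gamma, max_q_next, alpha):
        """One Q-learning step: Q(s,a) <- Q(s,a) + alpha*[r + gamma*max_a' Q(s',a') - Q(s,a)]."""
        return q_sa + alpha * (reward + gamma * max_q_next - q_sa)

    print(q_update(q_sa=2, reward=1, gamma=0.9, max_q_next=3, alpha=0.5))  # ≈ 2.85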


6. Sequential Decision Making

Bellman Equation

V(s) = max_a ∑_{s′} P(s′∣s,a)[R(s,a,s′) + γV(s′)]

Example

  • States: s1,s2
  • Actions: a1,a2
  • Transition probabilities: P(s′∣s,a)
  • Reward: R(s,a,s′)

Let:

  • P(s2∣s1,a1)=0.8
  • R(s1,a1,s2)=10
  • V(s2)=15
  • γ=0.9

Calculation: V(s1) = max_a ∑_{s′} P(s′∣s1,a)[R(s1,a,s′) + γV(s′)]

For a1, counting only the transition to s2: V(s1) = (0.8)[10 + 0.9(15)] = 0.8(23.5) = 18.8
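
A minimal check of this backup; as in the example, only the s1 → s2 transition under a1 is modeled.

    # One Bellman backup term for V(s1) under action a1, using only the s1 -> s2 transition.
    gamma = 0.9
    p_s2 = 0.8          # P(s2 | s1, a1)
    reward = 10         # R(s1, a1, s2)
    v_s2 = 15           # V(s2)

    v_s1_a1 = p_s2 * (reward + gamma * v_s2)
    print(v_s1_a1)      # ≈ 18.8; V(s1) would be the max of such terms over all actions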


7. Key General Concepts

  1. Exploitation vs Exploration:

    • Exploitation: Use known strategies to maximize reward now.
    • Exploration: Try new actions to discover better strategies (a common way to balance the two is ε-greedy; see the sketch after this list).
  2. Goodhart’s Law:

    • “When a measure becomes a target, it ceases to be a good measure.”
  3. Generalization:

    • Ensuring the model performs well on unseen data, not just the training set.
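
A minimal ε-greedy sketch, assuming a table of estimated action values; the value estimates and ε are placeholders.

    import random

    def epsilon_greedy(q_values, epsilon=0.1):
        """With probability epsilon explore (random action); otherwise exploit (greedy action)."""
        if random.random() < epsilon:
            return random.choice(list(q_values))   # exploration: random action
        return max(q_values, key=q_values.get)     # exploitation: best-known action

    q = {"a1": 1.2, "a2": 0.4, "a3": 0.9}   # assumed value estimates
    print(epsilon_greedy(q))                 # usually "a1", occasionally a random action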

Markov Decision Processes (MDP)

Definition

MDPs provide a framework for sequential decision-making where outcomes are partly random and partly under the control of a decision-maker.

Components:

  1. States (S): The set of possible situations.
  2. Actions (A): The set of actions available to the agent.
  3. Transition Model (P(s′∣s,a)): Probability of moving to state s′ from state s after action a.
  4. Rewards (R(s,a,s′)): Immediate reward received after transitioning from s to s′ via a.
  5. Discount Factor (γ): Balances the importance of immediate rewards versus future rewards (0 ≤ γ < 1).

Utility of a Policy (π)

A policy π(s) defines the action to take in state s.

Utility of a state under a policy: U^π(s) = ∑_{s′} P(s′∣s,π(s))[R(s,π(s),s′) + γU^π(s′)]

Optimal Policy: The policy that maximizes the expected utility for all states.

Value of a State

The maximum utility achievable from a state: V(s) = max_a ∑_{s′} P(s′∣s,a)[R(s,a,s′) + γV(s′)]
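
To tie the MDP components together, below is a small value-iteration sketch on a made-up two-state, two-action MDP; all transition probabilities and rewards are illustrative assumptions.

    # Value iteration: V(s) = max_a sum_s' P(s'|s,a) * [R(s,a,s') + gamma*V(s')]
    gamma = 0.9
    # transitions[s][a] = list of (probability, next_state, reward); values are assumed.
    transitions = {
        "s1": {"a1": [(0.8, "s2", 10), (0.2, "s1", 0)],
               "a2": [(1.0, "s1", 1)]},
        "s2": {"a1": [(1.0, "s2", 0)],
               "a2": [(0.5, "s1", 5), (0.5, "s2", 0)]},
    }

    V = {s: 0.0 for s in transitions}
    for _ in range(100):  # repeat Bellman backups until values settle
        V = {s: max(sum(p * (r + gamma * V[ns]) for p, ns, r in outcomes)
                    for outcomes in actions.values())
             for s, actions in transitions.items()}

    # Extract a greedy (optimal) policy from the converged values.
    policy = {s: max(actions, key=lambda a: sum(p * (r + gamma * V[ns])
                                                for p, ns, r in actions[a]))
              for s, actions in transitions.items()}
    print(V)       # converged state values
    print(policy)  # greedy action in each state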