AI Essentials: Intelligent Agents, Search, and Decision Theory
1. Intelligent Agents
Key Concepts
- Agent: Maps percepts to actions.
- Intelligent Agent: An agent that acts rationally in its environment to achieve its goals.
- PEAS Framework:
- Performance Measure: Criteria for success.
- Environment: The external conditions.
- Actuators: Tools for acting.
- Sensors: Tools for perceiving.
Example
A robot vacuum:
- Performance: Amount of dirt cleaned.
- Environment: A home layout.
- Actuators: Wheels, suction.
- Sensors: Cameras, bump detectors.
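A minimal sketch of the percept-to-action mapping for such a vacuum. The two-location world ("A"/"B"), the percept format, and the action names are assumptions made for illustration:

```python
# Minimal reflex agent for a hypothetical two-location vacuum world.
# Percept = (location, is_dirty); the agent maps each percept straight to an action.
def vacuum_agent(percept):
    location, is_dirty = percept
    if is_dirty:
        return "Suck"                                   # actuator: suction
    return "Right" if location == "A" else "Left"       # actuator: wheels

print(vacuum_agent(("A", True)))    # -> Suck
print(vacuum_agent(("B", False)))   # -> Left
```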
2. Search
State Spaces
- Components: States, Initial State, Goal State, Transition Model, Path Cost.
- Example: Solving a maze:
- States: Positions in the maze.
- Transition: Moves to adjacent cells.
- Cost: 1 per step.
Search Algorithms
Uninformed Search:
- BFS (Breadth-First Search): Explores all nodes level by level; see the maze sketch after this list.
- Uniform-Cost Search: Prioritizes paths with the lowest cost.
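A short sketch of BFS applied to the maze state space described above. The specific grid, start, and goal are assumptions for illustration:

```python
from collections import deque

# BFS over the maze state space: states are (row, col) positions,
# transitions move to adjacent open cells, and path cost is 1 per step.
def bfs_maze(grid, start, goal):
    frontier = deque([(start, [start])])
    visited = {start}
    while frontier:
        (r, c), path = frontier.popleft()
        if (r, c) == goal:
            return path                     # shallowest solution found first
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            nr, nc = r + dr, c + dc
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] == 0 and (nr, nc) not in visited):
                visited.add((nr, nc))
                frontier.append(((nr, nc), path + [(nr, nc)]))
    return None

maze = [[0, 0, 1],     # 0 = open cell, 1 = wall
        [1, 0, 0],
        [1, 1, 0]]
print(bfs_maze(maze, (0, 0), (2, 2)))   # -> [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)]
```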
Informed Search:
- A* Search: f(n)=g(n)+h(n)
- g(n): Cost from the start to n.
- h(n): Estimated cost from n to the goal.
- Application: If g(n)=4 and h(n)=3, then f(n)=7 (see the A* sketch below).
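A compact A* sketch on the same kind of grid, assuming Manhattan distance as the heuristic h(n); with h(n) = 0 the same code behaves like uniform-cost search:

```python
import heapq

# A* search on a grid: expand the node with the lowest f(n) = g(n) + h(n).
def astar(grid, start, goal):
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])   # Manhattan heuristic
    frontier = [(h(start), 0, start, [start])]                # (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, (r, c), path = heapq.heappop(frontier)
        if (r, c) == goal:
            return g, path                                    # path cost and path
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0
                    and g + 1 < best_g.get(nxt, float("inf"))):
                best_g[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None

maze = [[0, 0, 1],     # 0 = open cell, 1 = wall
        [1, 0, 0],
        [1, 1, 0]]
print(astar(maze, (0, 0), (2, 2)))   # -> (4, [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)])
```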
3. Decision Theory
Key Concepts
- Utility: A numeric representation of preferences.
- Expected Utility: EU(a) = ∑_s P(s∣a) U(s)
- P(s∣a): Probability of s given action a.
- U(s): Utility of state s.
Example
Choosing between actions with outcomes:
- Action A: 70% chance of +10, 30% chance of −5.
- Action B: 50% chance of +20, 50% chance of −15.
For Action A: EU(A)=(0.7)(10)+(0.3)(−5)=7−1.5=5.5
For Action B: EU(B)=(0.5)(20)+(0.5)(−15)=10−7.5=2.5
Choose Action A.
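The same comparison can be reproduced with a few lines of code (outcome lists taken from the example above):

```python
# Expected utility: EU(a) = sum over outcomes s of P(s|a) * U(s).
def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)

eu_a = expected_utility([(0.7, 10), (0.3, -5)])    # 5.5
eu_b = expected_utility([(0.5, 20), (0.5, -15)])   # 2.5
print("Choose A" if eu_a > eu_b else "Choose B")   # -> Choose A
```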
4. Supervised Learning
Key Formulas
Entropy: H(S) = −∑_i p_i log2(p_i)
- p_i: Proportion of examples in class i.
Information Gain: IG = H(S) − ∑_j (|S_j|/|S|) H(S_j), i.e., the entropy minus the weighted entropy of the subsets after a split.
Example: A dataset of 10 emails, 6 “spam” and 4 “legit.”
- Initial Entropy: H(S)=−[(6/10)log2(6/10)+(4/10)log2(4/10)]≈0.97
Split by feature:
- Subset 1: 3 “spam”, 1 “legit.”
- Subset 2: 3 “spam”, 3 “legit.”
Subset entropies:
- H(S1)=−[(3/4)log2(3/4)+(1/4)log2(1/4)]≈0.81
- H(S2)=−[(3/6)log2(3/6)+(3/6)log2(3/6)]=1.0
Weighted entropy: H(S_split)=(4/10)(0.81)+(6/10)(1.0)=0.924
Information Gain: IG=0.97−0.924=0.046
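A short sketch reproducing these numbers (class counts taken from the example):

```python
import math

# Entropy H(S) = -sum_i p_i * log2(p_i) over class proportions p_i.
def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

h_s = entropy([6, 4])                                               # ~0.971
weighted = (4 / 10) * entropy([3, 1]) + (6 / 10) * entropy([3, 3])  # ~0.925
print(round(h_s - weighted, 3))                                     # information gain, ~0.046
```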
5. Reinforcement Learning
Q-Learning Formula
Q(s,a) ← Q(s,a) + α[r + γ max_a′ Q(s′,a′) − Q(s,a)]
- α: Learning rate.
- γ: Discount factor.
- r: Reward.
- Q(s′,a′): Future state-action value.
Example
Given:
- Q(s,a)=2
- r=1
- γ=0.9
- max_a′ Q(s′,a′)=3
- α=0.5
Calculation: Q(s,a)=2+0.5[1+0.9(3)−2]
Q(s,a)=2+0.5[1.7]=2.85.
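The same update as a one-line function (arguments taken from the example):

```python
# One Q-learning update:
# Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
def q_update(q_sa, r, gamma, max_q_next, alpha):
    return q_sa + alpha * (r + gamma * max_q_next - q_sa)

print(q_update(q_sa=2, r=1, gamma=0.9, max_q_next=3, alpha=0.5))   # -> 2.85
```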
6. Sequential Decision Making
Bellman Equation
V(s) = max_a ∑_s′ P(s′∣s,a)[R(s,a,s′) + γV(s′)]
Example
- States: s1,s2
- Actions: a1,a2
- Transition probabilities: P(s′∣s,a)
- Reward: R(s,a,s′)
Let:
- P(s2∣s1,a1)=0.8
- R(s1,a1,s2)=10
- V(s2)=15
- γ=0.9
Calculation: V(s1) = max_a ∑_s′ P(s′∣s1,a)[R(s1,a,s′) + γV(s′)]
For a1: V(s1) = (0.8)[10 + 0.9(15)] = 0.8[23.5] = 18.8
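The backup for this example as code; only the a1 → s2 transition is given above, so the sum has a single term:

```python
# One Bellman backup: V(s) = max_a sum_s' P(s'|s,a) * (R(s,a,s') + gamma * V(s'))
def bellman_backup(transitions, gamma):
    # transitions: {action: [(prob, reward, value_of_next_state), ...]}
    return max(sum(p * (r + gamma * v) for p, r, v in outcomes)
               for outcomes in transitions.values())

print(bellman_backup({"a1": [(0.8, 10, 15)]}, gamma=0.9))   # 0.8 * (10 + 13.5) = 18.8
```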
7. Key General Concepts
Exploitation vs Exploration:
- Exploitation: Act on the best strategy known so far.
- Exploration: Try new actions to discover potentially better strategies (see the ε-greedy sketch below).
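One common way to balance the two is ε-greedy action selection; the sketch below is illustrative only, and ε = 0.1 plus the action values are invented:

```python
import random

# Epsilon-greedy: explore with probability epsilon, otherwise exploit the best-known action.
def epsilon_greedy(q_values, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(list(q_values))       # exploration: random action
    return max(q_values, key=q_values.get)         # exploitation: current best action

print(epsilon_greedy({"a1": 2.0, "a2": 3.5, "a3": 1.2}))   # usually "a2"
```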
Goodhart’s Law:
- “When a measure becomes a target, it ceases to be a good measure.”
Generalization:
- Ensuring the model performs well on unseen data, not just the training set.
Markov Decision Processes (MDP)
Definition
MDPs provide a framework for sequential decision-making where outcomes are partly random and partly under the control of a decision-maker.
Components:
- States (S): The set of possible situations.
- Actions (A): The set of actions available to the agent.
- Transition Model (P(s′∣s,a)): Probability of moving to state s′ from state s after action a.
- Rewards (R(s,a,s′)): Immediate reward received after transitioning from s to s′ via a.
- Discount Factor (γ): Balances the importance of immediate rewards versus future rewards (0 ≤ γ < 1).
Utility of a Policy (π)
A policy π(s) defines the action to take in state s.
Utility of a state under a policy: U^π(s) = ∑_s′ P(s′∣s,π(s))[R(s,π(s),s′) + γU^π(s′)]
Optimal Policy: The policy that maximizes the expected utility for all states.
Value of a State
The maximum utility achievable from a state: V(s) = max_a ∑_s′ P(s′∣s,a)[R(s,a,s′) + γV(s′)]
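A compact value-iteration sketch under an assumed toy-MDP layout; the dictionary format, states, and numbers below are invented for illustration, and replacing the max over actions with the policy's single action gives U^π instead of V:

```python
# Value iteration: repeatedly apply
# V(s) = max_a sum_s' P(s'|s,a) * (R(s,a,s') + gamma * V(s')) until values stop changing.
def value_iteration(mdp, gamma=0.9, theta=1e-6):
    # mdp: {state: {action: [(prob, reward, next_state), ...]}}
    V = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            new_v = max(sum(p * (r + gamma * V[s2]) for p, r, s2 in outcomes)
                        for outcomes in actions.values())
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < theta:
            return V

toy_mdp = {
    "s1": {"a1": [(0.8, 10, "s2"), (0.2, 0, "s1")], "a2": [(1.0, 0.5, "s1")]},
    "s2": {"a1": [(1.0, 0, "s2")]},
}
print(value_iteration(toy_mdp))   # V(s1) converges near 9.76; V(s2) stays at 0
```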