Deep Learning Fundamentals: GANs, Activation Functions, and Advanced RL
Generative Adversarial Networks (GANs)
A deep learning model featuring two competing neural networks (NNs) — the Generator and the Discriminator — designed to create realistic synthetic data.
(Mnemonic: The Generator creates fake data, and the Discriminator identifies it as real or fake.)
GAN Data Flow
Noise → Generator → Fake Data → Discriminator → Real/Fake
Core Components of GANs
- Generator (G): Creates synthetic data, attempting to fool the Discriminator.
- Discriminator (D): Evaluates input data, classifying it as real or fake.
GAN Working Principle
- The Generator (G) generates fake samples.
- The Discriminator (D) detects whether the input is fake or real.
- Both networks train simultaneously, leading to the generation of increasingly realistic fake data.
GAN Loss Function Concept
The objective is a minimax game:
min_G max_D  E_x[ log D(x) ] + E_z[ log(1 − D(G(z))) ]
(Here E_x is an expectation over real data samples x, and E_z over the Generator’s input noise z.)
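To make the minimax objective concrete, here is a minimal sketch of one training step, assuming PyTorch; the tiny fully connected networks, batch size, and data dimension are placeholder assumptions rather than part of the notes.

```python
import torch
import torch.nn as nn

# Placeholder dimensions and tiny MLPs; a real GAN would use deeper networks.
noise_dim, data_dim, batch = 64, 784, 32
G = nn.Sequential(nn.Linear(noise_dim, 128), nn.ReLU(), nn.Linear(128, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.rand(batch, data_dim)                 # stand-in for a batch of real data
z = torch.randn(batch, noise_dim)                  # noise input to the Generator

# Discriminator step: push D(real) toward 1 and D(G(z)) toward 0
fake = G(z).detach()                               # detach so this step does not update G
d_loss = bce(D(real), torch.ones(batch, 1)) + bce(D(fake), torch.zeros(batch, 1))
opt_D.zero_grad(); d_loss.backward(); opt_D.step()

# Generator step: try to fool D (non-saturating form of the minimax objective)
g_loss = bce(D(G(z)), torch.ones(batch, 1))
opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```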
Applications of GANs
- Image and art generation
- Deepfakes creation
- Data augmentation
- Image enhancement and super-resolution
Summary: GAN = Generator + Discriminator. They compete to produce realistic synthetic data.
Deep Learning Visualization Techniques
These techniques are essential in Deep Learning and Machine Learning to understand model behavior, analyze how networks learn features, and explain prediction rationale.
(Visualization helps in understanding the model’s working and hidden features.)
Main Visualization Techniques
Activation Maps / Feature Maps
Show which parts of the input (image or text) activate specific neurons. Primarily used in CNNs to visualize what each layer detects.
Saliency Maps
Highlight the input pixels or features most responsible for a specific prediction. Useful for explaining image classification results.
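A minimal sketch of a gradient-based saliency map, assuming PyTorch; the small untrained CNN and random input image are placeholders standing in for a real trained model and dataset.

```python
import torch
import torch.nn as nn

# Untrained placeholder CNN and a random "image"; with a real trained model the
# gradient magnitudes highlight the pixels driving the prediction.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
model.eval()

image = torch.rand(1, 3, 32, 32, requires_grad=True)   # we need gradients w.r.t. the pixels
scores = model(image)
top_class = scores.argmax(dim=1).item()
scores[0, top_class].backward()                         # d(predicted-class score) / d(pixels)

saliency = image.grad.abs().max(dim=1).values           # max over colour channels -> (1, 32, 32)
print(saliency.shape)                                   # ready to display as a heat map
```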
Grad-CAM (Gradient-weighted Class Activation Mapping)
Visualizes the important regions in an image contributing to the prediction of a specific class.
t-SNE (t-Distributed Stochastic Neighbor Embedding)
Reduces high-dimensional data into 2D or 3D space for visual cluster analysis.
PCA (Principal Component Analysis)
Projects high-dimensional data onto its main components for simplified visualization.
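The sketch below projects the same feature matrix with both PCA and t-SNE, assuming scikit-learn is available; the random "features" are placeholders for real network embeddings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 64))        # placeholder for 200 samples of 64-d embeddings

pca_2d = PCA(n_components=2).fit_transform(features)
tsne_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
print(pca_2d.shape, tsne_2d.shape)           # both (200, 2), ready for a scatter plot
```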
Loss and Accuracy Plots
Display model training progress, helping identify issues like underfitting or overfitting.
Confusion Matrix
A visual display summarizing the performance of a classification model, showing correct versus incorrect predictions.
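A minimal confusion-matrix example, assuming scikit-learn; the labels are made up purely for illustration.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 0, 2, 2]
print(confusion_matrix(y_true, y_pred))
# Rows are true classes, columns are predicted classes;
# the diagonal counts correct predictions, off-diagonal entries are errors.
```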
Uses of Visualization
- Debugging model behavior
- Interpreting predictions and model decisions
- Identifying bias or overfitting issues
- Improving architecture design
Neural Network Activation Functions
Activation functions are used in neural networks to introduce non-linearity, enabling the model to learn complex patterns.
(The activation function controls the neuron’s output and adds non-linearity to the network.)
1. Sigmoid Function
Equation
f(x) = 1 / (1 + e⁻ˣ)
- Range: (0, 1)
- Primary Use: Binary classification output layers
- Advantage: Smooth gradient, clear probability interpretation
- Disadvantage: Suffers from the vanishing gradient problem for large absolute values of x.
2. Tanh (Hyperbolic Tangent)
Equation
f(x) = tanh(x) = (eˣ - e⁻ˣ) / (eˣ + e⁻ˣ)
- Range: (−1, 1)
- Primary Use: Hidden layers
- Advantage: Stronger gradient than Sigmoid, output is centered at 0
- Disadvantage: Still susceptible to the vanishing gradient problem.
3. ReLU (Rectified Linear Unit)
Equation
f(x) = max(0, x)
- Range: [0, ∞)
- Primary Use: Deep networks (CNNs, DNNs)
- Advantage: Computationally fast, avoids vanishing gradient for positive inputs.
- Disadvantage: Prone to the “Dead ReLU” problem: for x < 0 the output and gradient are both 0, so neurons that only receive negative inputs stop updating (see the comparison sketch below).
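A quick numerical comparison of the three activations and their gradients, in plain NumPy; the sample points are chosen only to illustrate why Sigmoid and Tanh saturate for large |x| while ReLU does not.

```python
import numpy as np

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def tanh(x): return np.tanh(x)
def relu(x): return np.maximum(0.0, x)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(x), sigmoid(x) * (1 - sigmoid(x)))   # gradient ~0 for large |x| -> vanishing gradient
print(tanh(x), 1 - tanh(x) ** 2)                   # same saturation, but zero-centered output
print(relu(x), (x > 0).astype(float))              # gradient is 1 for x > 0, 0 for x <= 0 (Dead ReLU)
```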
Dynamic Programming in Reinforcement Learning
DP is a method used in RL to solve Markov Decision Processes (MDPs) by dividing them into smaller sub-problems. The goal is to find the optimal policy (π*) and optimal value function (V*).
Main Types of DP Algorithms
- Policy Evaluation: Calculates the value function (V) for a given policy (π).
- Policy Improvement: Updates the policy based on the calculated value function.
- Policy Iteration: Repeats evaluation and improvement until the policy stabilizes.
- Value Iteration: Directly computes the optimal value function (V*) by combining evaluation and improvement in a single Bellman update, often converging with less computation than full policy iteration.
Value Iteration Algorithm
Value Iteration uses the Bellman Optimality Equation:
V_new(s) = max_a [ R(s,a) + γ * Σ P(s'|s,a) * V(s') ]
Steps for Value Iteration
- Initialize V(s) = 0 for all states.
- Update V(s) using the Bellman Optimality Equation.
- Repeat step 2 until convergence.
- Derive the optimal policy: π*(s) = argmaxₐ [ R(s,a) + γ * Σ_s′ P(s′|s,a) * V*(s′) ].
Example: In a Grid World, the agent updates state values until they stabilize, leading to the optimal path to the goal.
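A minimal Value Iteration sketch on a toy one-dimensional grid world, in plain NumPy; the deterministic dynamics, reward of +1 at the goal, and γ = 0.9 are illustrative assumptions.

```python
import numpy as np

# Toy 1-D grid world: 4 states in a row, deterministic moves, +1 on reaching state 3.
n_states, gamma, theta = 4, 0.9, 1e-6
actions = (-1, +1)                            # move left / move right

def step(s, a):
    s_next = min(max(s + a, 0), n_states - 1)
    return s_next, float(s_next == n_states - 1)

V = np.zeros(n_states)
while True:                                   # sweep until the values stop changing
    delta = 0.0
    for s in range(n_states - 1):             # state 3 is terminal, so V(3) stays 0
        q = [r + gamma * V[s2] for s2, r in (step(s, a) for a in actions)]
        best = max(q)
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < theta:
        break

# Derive the greedy (optimal) policy from V*
policy = [max(actions, key=lambda a: step(s, a)[1] + gamma * V[step(s, a)[0]])
          for s in range(n_states - 1)]
print(V, policy)                              # values grow toward the goal; the policy moves right
```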
Fitted Q-Learning (FQL)
FQL is a Batch Reinforcement Learning method that learns the Q-function Q(s,a) using function approximators (such as decision trees or regression models), rather than a traditional Q-table.
(Instead of a Q-table, FQL fits a model to estimate Q-values.)
FQL Training Steps
- Collect a batch of transitions (s, a, r, s′).
- Train a regressor (function approximator) to minimize the squared error loss:
  (r + γ * maxₐ′ Q(s′, a′) − Q(s, a))²
- Repeat training on new data batches until the Q-function stabilizes (see the sketch at the end of this section).
- Use Case: Effective when the state–action space is large (often used in offline RL).
- Advantage: Capable of working with continuous state spaces.
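A minimal fitted-Q sketch assuming scikit-learn; the toy grid-world dynamics (the same as in the Value Iteration example), the random behaviour policy, and the fixed number of fitting iterations are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9

def step(s, a):                                   # deterministic toy dynamics, state 3 is the goal
    s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
    return s_next, float(s_next == n_states - 1)

# 1. Collect a batch of transitions (s, a, r, s') with a random behaviour policy
S, A, R, S2 = [], [], [], []
for _ in range(500):
    s, a = int(rng.integers(n_states - 1)), int(rng.integers(n_actions))
    s2, r = step(s, a)
    S.append(s); A.append(a); R.append(r); S2.append(s2)
X = np.column_stack([S, A]).astype(float)
R, S2 = np.array(R), np.array(S2)
terminal = S2 == n_states - 1                     # no bootstrapping from the goal state

# 2./3. Repeatedly regress Q(s,a) onto the target r + gamma * max_a' Q(s',a')
model = RandomForestRegressor(n_estimators=30, random_state=0)
q_next = np.zeros(len(R))                         # Q starts at 0 everywhere
for _ in range(20):
    model.fit(X, R + gamma * q_next * ~terminal)
    X_next = np.column_stack([np.repeat(S2, n_actions),
                              np.tile(np.arange(n_actions), len(S2))]).astype(float)
    q_next = model.predict(X_next).reshape(len(S2), n_actions).max(axis=1)
```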
Deep Q-Learning (DQN)
DQN is an improved version of Q-learning that utilizes a Deep Neural Network to approximate the Q-function Q(s,a).
(DQN replaces the traditional Q-table with a deep Convolutional Neural Network or standard Neural Network.)
Key Concepts in DQN
- Experience Replay: Stores transitions (s, a, r, s′) in a buffer and trains the network on random mini-batches from this buffer to break correlations in sequential data.
- Target Network: A fixed copy of the Q-network used to calculate the target Q-values, which stabilizes the training process.
DQN Update Equation
The network’s estimate Q(s, a) is pushed toward a target value computed with the target network:
y = r + γ * maxₐ′ Q_target(s′, a′)
and the Q-network is trained by minimizing the squared error (y − Q(s, a))².
- Use Cases: Solving complex tasks like Atari games, robotics, and control systems.
- Advantage: Effectively handles large and complex state spaces.
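A minimal sketch of one DQN update with experience replay and a target network, assuming PyTorch; the state size, network sizes, and fake transitions exist only to make the sketch self-contained.

```python
import random
from collections import deque
import torch
import torch.nn as nn

state_dim, n_actions, gamma = 4, 2, 0.99
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())        # target network = frozen copy of the Q-network
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

replay = deque(maxlen=10_000)                          # experience replay buffer of (s, a, r, s', done)
for _ in range(64):                                    # fake transitions so the sketch runs standalone
    replay.append((torch.randn(state_dim), random.randrange(n_actions),
                   random.random(), torch.randn(state_dim), False))

batch = random.sample(replay, 32)                      # random mini-batch breaks temporal correlations
s = torch.stack([t[0] for t in batch])
a = torch.tensor([t[1] for t in batch])
r = torch.tensor([t[2] for t in batch])
s2 = torch.stack([t[3] for t in batch])
done = torch.tensor([t[4] for t in batch])

q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a) for the actions actually taken
with torch.no_grad():                                  # target Q-values come from the fixed copy
    target = r + gamma * target_net(s2).max(dim=1).values * (~done).float()
loss = nn.functional.mse_loss(q_sa, target)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```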
Inverse Reinforcement Learning (IRL)
IRL is an RL methodology where the primary goal is to infer the underlying reward function based on observing expert behavior (trajectories).
(The agent learns the reward function by observing the expert’s actions.)
IRL Core Idea
IRL is used when the reward function is unknown but demonstrations from an expert are available. The core idea is: If the expert follows an optimal policy (π*), find the reward function (R) such that π* is optimal under R.
IRL Steps
- Observe expert trajectories (state, action pairs).
- Estimate the reward function R(s, a) that best explains these observations.
- Learn an optimal policy based on the inferred reward function.
IRL Use Cases
- Autonomous driving (learning preferences from human drivers)
- Robotics imitation learning
- Healthcare decision systems modeling
Maximum Entropy IRL (MaxEnt IRL)
MaxEnt IRL is an extension that selects the reward function which makes the expert actions most likely while maintaining maximum randomness (entropy). This avoids bias toward a single solution.
(It chooses the most unbiased reward function that explains the expert’s demonstrations.)
MaxEnt Probability Formula
The probability of a trajectory (τ) is proportional to the exponentiated sum of rewards:
P(τ) ∝ e^(Σ_t R(s_t, a_t))
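A tiny numerical illustration of this distribution, assuming a linear reward R(s, a) = w·φ(s, a) with random placeholder features; real MaxEnt IRL would learn the weights w from expert trajectories rather than sample them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 3
w = rng.normal(size=n_features)                       # reward weights (learned in real MaxEnt IRL)

# Each trajectory is a sequence of feature vectors phi(s_t, a_t); random placeholders here.
trajectories = [rng.normal(size=(T, n_features)) for T in (3, 5, 4)]

returns = np.array([np.sum(traj @ w) for traj in trajectories])   # Σ_t R(s_t, a_t) per trajectory
probs = np.exp(returns - returns.max())               # subtract the max for numerical stability
probs /= probs.sum()                                  # P(τ) ∝ exp(Σ_t R(s_t, a_t))
print(probs)                                          # higher-return trajectories get higher probability
```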
MaxEnt Advantages
- Avoids bias toward a single solution.
- Performs better in stochastic environments.
Maximum Entropy Deep IRL
- Uses deep neural networks to model the reward function R(s, a; θ).
- Allows learning complex, nonlinear reward structures from high-dimensional data.
- Applications: Autonomous driving, advanced robotics imitation, and video behavior learning.
