Deep Learning Fundamentals: GANs, Activation Functions, and Advanced RL
Generative Adversarial Networks (GANs)
A deep learning model featuring two competing neural networks (NNs) — the Generator and the Discriminator — designed to create realistic synthetic data.
(Mnemonic: The Generator creates fake data, and the Discriminator identifies it as real or fake.)
GAN Data Flow
Noise → Generator → Fake Data → Discriminator → Real/Fake
Core Components of GANs
- Generator (G): Creates synthetic data, attempting to fool the Discriminator.
- Discriminator (D): Evaluates input data, classifying it as real or fake.
GAN Working Principle
- The Generator (G) generates fake samples.
- The Discriminator (D) detects whether the input is fake or real.
- Both networks train simultaneously, leading to the generation of increasingly realistic fake data.
GAN Loss Function Concept
The objective is a minimax game:
min_G max_D  E_x[ log D(x) ] + E_z[ log(1 − D(G(z))) ]
(Here E_x is an expectation over real data samples x, and E_z over the Generator’s input noise z.)
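To make the minimax objective concrete, here is a minimal sketch of one training step, assuming PyTorch; the tiny fully connected networks, batch size, and data dimension are placeholder assumptions rather than part of the notes.

```python
import torch
import torch.nn as nn

# Placeholder dimensions and tiny MLPs; a real GAN would use deeper networks.
noise_dim, data_dim, batch = 64, 784, 32
G = nn.Sequential(nn.Linear(noise_dim, 128), nn.ReLU(), nn.Linear(128, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.rand(batch, data_dim)                 # stand-in for a batch of real data
z = torch.randn(batch, noise_dim)                  # noise input to the Generator

# Discriminator step: push D(real) toward 1 and D(G(z)) toward 0
fake = G(z).detach()                               # detach so this step does not update G
d_loss = bce(D(real), torch.ones(batch, 1)) + bce(D(fake), torch.zeros(batch, 1))
opt_D.zero_grad(); d_loss.backward(); opt_D.step()

# Generator step: try to fool D (non-saturating form of the minimax objective)
g_loss = bce(D(G(z)), torch.ones(batch, 1))
opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```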
Applications of GANs
- Image and art generation
- Deepfakes creation
- Data augmentation
- Image enhancement and super-resolution
Summary: GAN = Generator + Discriminator. They compete to produce realistic synthetic data.
Deep Learning Visualization Techniques
These techniques are essential in Deep Learning and Machine Learning to understand model behavior, analyze how networks learn features, and explain prediction rationale.
(Visualization helps in understanding the model’s working and hidden features.)
Main Visualization Techniques
Activation Maps / Feature Maps
Show which parts of the input (image or text) activate specific neurons. Primarily used in CNNs to visualize what each layer detects.
Saliency Maps
Highlight the input pixels or features most responsible for a specific prediction. Useful for explaining image classification results.
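A minimal sketch of a gradient-based saliency map, assuming PyTorch; the small untrained CNN and random input image are placeholders standing in for a real trained model and dataset.

```python
import torch
import torch.nn as nn

# Untrained placeholder CNN and a random "image"; with a real trained model the
# gradient magnitudes highlight the pixels driving the prediction.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
model.eval()

image = torch.rand(1, 3, 32, 32, requires_grad=True)   # we need gradients w.r.t. the pixels
scores = model(image)
top_class = scores.argmax(dim=1).item()
scores[0, top_class].backward()                         # d(predicted-class score) / d(pixels)

saliency = image.grad.abs().max(dim=1).values           # max over colour channels -> (1, 32, 32)
print(saliency.shape)                                   # ready to display as a heat map
```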
Grad-CAM (Gradient-weighted Class Activation Mapping)
Visualizes the important regions in an image contributing to the prediction of a specific class.
t-SNE (t-Distributed Stochastic Neighbor Embedding)
Reduces high-dimensional data into 2D or 3D space for visual cluster analysis.
PCA (Principal Component Analysis)
Projects high-dimensional data onto its main components for simplified visualization.
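The sketch below projects the same feature matrix with both PCA and t-SNE, assuming scikit-learn is available; the random "features" are placeholders for real network embeddings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 64))        # placeholder for 200 samples of 64-d embeddings

pca_2d = PCA(n_components=2).fit_transform(features)
tsne_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
print(pca_2d.shape, tsne_2d.shape)           # both (200, 2), ready for a scatter plot
```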
Loss and Accuracy Plots
Display model training progress, helping identify issues like underfitting or overfitting.
Confusion Matrix
A visual display summarizing the performance of a classification model, showing correct versus incorrect predictions.
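A minimal confusion-matrix example, assuming scikit-learn; the labels are made up purely for illustration.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 0, 2, 2]
print(confusion_matrix(y_true, y_pred))
# Rows are true classes, columns are predicted classes;
# the diagonal counts correct predictions, off-diagonal entries are errors.
```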
Uses of Visualization
- Debugging model behavior
- Interpreting predictions and model decisions
- Identifying bias or overfitting issues
- Improving architecture design
Neural Network Activation Functions
Activation functions are used in neural networks to introduce non-linearity, enabling the model to learn complex patterns.
(The activation function controls the neuron’s output and adds non-linearity to the network.)
1. Sigmoid Function
Equation
f(x) = 1 / (1 + e⁻ˣ)
- Range: (0, 1)
- Primary Use: Binary classification output layers
- Advantage: Smooth gradient, clear probability interpretation
- Disadvantage: Suffers from the vanishing gradient problem for large absolute values of x.
2. Tanh (Hyperbolic Tangent)
Equation
f(x) = tanh(x) = (eˣ - e⁻ˣ) / (eˣ + e⁻ˣ)
- Range: (−1, 1)
- Primary Use: Hidden layers
- Advantage: Stronger gradient than Sigmoid, output is centered at 0
- Disadvantage: Still susceptible to the vanishing gradient problem.
3. ReLU (Rectified Linear Unit)
Equation
f(x) = max(0, x)
- Range: [0, ∞)
- Primary Use: Deep networks (CNNs, DNNs)
- Advantage: Computationally fast, avoids vanishing gradient for positive inputs.
- Disadvantage: Prone to the “Dead ReLU” problem: for x < 0 the output and gradient are both 0, so neurons that only receive negative inputs stop updating (see the comparison sketch below).
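A quick numerical comparison of the three activations and their gradients, in plain NumPy; the sample points are chosen only to illustrate why Sigmoid and Tanh saturate for large |x| while ReLU does not.

```python
import numpy as np

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def tanh(x): return np.tanh(x)
def relu(x): return np.maximum(0.0, x)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(x), sigmoid(x) * (1 - sigmoid(x)))   # gradient ~0 for large |x| -> vanishing gradient
print(tanh(x), 1 - tanh(x) ** 2)                   # same saturation, but zero-centered output
print(relu(x), (x > 0).astype(float))              # gradient is 1 for x > 0, 0 for x <= 0 (Dead ReLU)
```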
Dynamic Programming in Reinforcement Learning
DP is a method used in RL to solve Markov Decision Processes (MDPs) by dividing them into smaller sub-problems. The goal is to find the optimal policy (π*) and optimal value function (V*).
Main Types of DP Algorithms
- Policy Evaluation: Calculates the value function (V) for a given policy (π).
- Policy Improvement: Updates the policy based on the calculated value function.
- Policy Iteration: Repeats evaluation and improvement until the policy stabilizes.
- Value Iteration: Directly computes the optimal value function (V*) by combining evaluation and improvement in a single Bellman update, often converging with less computation than full policy iteration.
Value Iteration Algorithm
Value Iteration uses the Bellman Optimality Equation:
V_new(s) = max_a [ R(s,a) + γ * Σ P(s'|s,a) * V(s') ]
Steps for Value Iteration
- Initialize V(s) = 0 for all states.
- Update V(s) using the Bellman Optimality Equation.
- Repeat step 2 until convergence.
- Derive the optimal policy: π*(s) = argmaxₐ [ R(s,a) + γ * Σ_s′ P(s′|s,a) * V*(s′) ].
Example: In a Grid World, the agent updates state values until they stabilize, leading to the optimal path to the goal.
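A minimal Value Iteration sketch on a toy one-dimensional grid world, in plain NumPy; the deterministic dynamics, reward of +1 at the goal, and γ = 0.9 are illustrative assumptions.

```python
import numpy as np

# Toy 1-D grid world: 4 states in a row, deterministic moves, +1 on reaching state 3.
n_states, gamma, theta = 4, 0.9, 1e-6
actions = (-1, +1)                            # move left / move right

def step(s, a):
    s_next = min(max(s + a, 0), n_states - 1)
    return s_next, float(s_next == n_states - 1)

V = np.zeros(n_states)
while True:                                   # sweep until the values stop changing
    delta = 0.0
    for s in range(n_states - 1):             # state 3 is terminal, so V(3) stays 0
        q = [r + gamma * V[s2] for s2, r in (step(s, a) for a in actions)]
        best = max(q)
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < theta:
        break

# Derive the greedy (optimal) policy from V*
policy = [max(actions, key=lambda a: step(s, a)[1] + gamma * V[step(s, a)[0]])
          for s in range(n_states - 1)]
print(V, policy)                              # values grow toward the goal; the policy moves right
```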
Fitted Q-Learning (FQL)
FQL is a Batch Reinforcement Learning method that learns the Q-function Q(s,a) using function approximators (such as decision trees or regression models), rather than a traditional Q-table.
(Instead of a Q-table, FQL fits a model to estimate Q-values.)
FQL Training Steps
- Collect a batch of transitions (s, a, r, s′).
- Train a regressor (function approximator) to minimize the squared error loss:
  (r + γ * maxₐ′ Q(s′, a′) − Q(s, a))²
- Repeat training on new data batches until the Q-function stabilizes (see the sketch at the end of this section).
- Use Case: Effective when the state–action space is large (often used in offline RL).
- Advantage: Capable of working with continuous state spaces.
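A minimal fitted-Q sketch assuming scikit-learn; the toy grid-world dynamics (the same as in the Value Iteration example), the random behaviour policy, and the fixed number of fitting iterations are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9

def step(s, a):                                   # deterministic toy dynamics, state 3 is the goal
    s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
    return s_next, float(s_next == n_states - 1)

# 1. Collect a batch of transitions (s, a, r, s') with a random behaviour policy
S, A, R, S2 = [], [], [], []
for _ in range(500):
    s, a = int(rng.integers(n_states - 1)), int(rng.integers(n_actions))
    s2, r = step(s, a)
    S.append(s); A.append(a); R.append(r); S2.append(s2)
X = np.column_stack([S, A]).astype(float)
R, S2 = np.array(R), np.array(S2)
terminal = S2 == n_states - 1                     # no bootstrapping from the goal state

# 2./3. Repeatedly regress Q(s,a) onto the target r + gamma * max_a' Q(s',a')
model = RandomForestRegressor(n_estimators=30, random_state=0)
q_next = np.zeros(len(R))                         # Q starts at 0 everywhere
for _ in range(20):
    model.fit(X, R + gamma * q_next * ~terminal)
    X_next = np.column_stack([np.repeat(S2, n_actions),
                              np.tile(np.arange(n_actions), len(S2))]).astype(float)
    q_next = model.predict(X_next).reshape(len(S2), n_actions).max(axis=1)
```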
Deep Q-Learning (DQN)
DQN is an improved version of Q-learning that utilizes a Deep Neural Network to approximate the Q-function Q(s,a).
(DQN replaces the traditional Q-table with a deep Convolutional Neural Network or standard Neural Network.)
Key Concepts in DQN
- Experience Replay: Stores transitions (s, a, r, s′) in a buffer and trains the network on random mini-batches from this buffer to break correlations in sequential data.
- Target Network: A fixed copy of the Q-network used to calculate the target Q-values, which stabilizes the training process.
DQN Update Equation
The network’s estimate Q(s, a) is pushed toward a target value computed with the target network:
y = r + γ * maxₐ′ Q_target(s′, a′)
and the Q-network is trained by minimizing the squared error (y − Q(s, a))².
- Use Cases: Solving complex tasks like Atari games, robotics, and control systems.
- Advantage: Effectively handles large and complex state spaces.
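A minimal sketch of one DQN update with experience replay and a target network, assuming PyTorch; the state size, network sizes, and fake transitions exist only to make the sketch self-contained.

```python
import random
from collections import deque
import torch
import torch.nn as nn

state_dim, n_actions, gamma = 4, 2, 0.99
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())        # target network = frozen copy of the Q-network
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

replay = deque(maxlen=10_000)                          # experience replay buffer of (s, a, r, s', done)
for _ in range(64):                                    # fake transitions so the sketch runs standalone
    replay.append((torch.randn(state_dim), random.randrange(n_actions),
                   random.random(), torch.randn(state_dim), False))

batch = random.sample(replay, 32)                      # random mini-batch breaks temporal correlations
s = torch.stack([t[0] for t in batch])
a = torch.tensor([t[1] for t in batch])
r = torch.tensor([t[2] for t in batch])
s2 = torch.stack([t[3] for t in batch])
done = torch.tensor([t[4] for t in batch])

q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a) for the actions actually taken
with torch.no_grad():                                  # target Q-values come from the fixed copy
    target = r + gamma * target_net(s2).max(dim=1).values * (~done).float()
loss = nn.functional.mse_loss(q_sa, target)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```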
Inverse Reinforcement Learning (IRL)
IRL is an RL methodology where the primary goal is to infer the underlying reward function based on observing expert behavior (trajectories).
(The agent learns the reward function by observing the expert’s actions.)
IRL Core Idea
IRL is used when the reward function is unknown but demonstrations from an expert are available. The core idea is: If the expert follows an optimal policy (π*), find the reward function (R) such that π* is optimal under R.
IRL Steps
- Observe expert trajectories (state, action pairs).
- Estimate the reward function R(s, a) that best explains these observations.
- Learn an optimal policy based on the inferred reward function.
IRL Use Cases
- Autonomous driving (learning preferences from human drivers)
- Robotics imitation learning
- Healthcare decision systems modeling
Maximum Entropy IRL (MaxEnt IRL)
MaxEnt IRL is an extension that selects the reward function which makes the expert actions most likely while maintaining maximum randomness (entropy). This avoids bias toward a single solution.
(It chooses the most unbiased reward function that explains the expert’s demonstrations.)
MaxEnt Probability Formula
The probability of a trajectory (τ) is proportional to the exponentiated sum of rewards:
P(τ) ∝ e^(Σ_t R(s_t, a_t))
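A tiny numerical illustration of this distribution, assuming a linear reward R(s, a) = w·φ(s, a) with random placeholder features; real MaxEnt IRL would learn the weights w from expert trajectories rather than sample them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 3
w = rng.normal(size=n_features)                       # reward weights (learned in real MaxEnt IRL)

# Each trajectory is a sequence of feature vectors phi(s_t, a_t); random placeholders here.
trajectories = [rng.normal(size=(T, n_features)) for T in (3, 5, 4)]

returns = np.array([np.sum(traj @ w) for traj in trajectories])   # Σ_t R(s_t, a_t) per trajectory
probs = np.exp(returns - returns.max())               # subtract the max for numerical stability
probs /= probs.sum()                                  # P(τ) ∝ exp(Σ_t R(s_t, a_t))
print(probs)                                          # higher-return trajectories get higher probability
```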
MaxEnt Advantages
- Avoids bias toward a single solution.
- Performs better in stochastic environments.
Maximum Entropy Deep IRL
- Uses deep neural networks to model the reward function R(s, a; θ).
- Allows learning complex, nonlinear reward structures from high-dimensional data.
- Applications: Autonomous driving, advanced robotics imitation, and video behavior learning.
