AI, Machine Learning, and Deep Learning: Core Concepts
The Relationship Between AI, ML, and DL
The relationship between these three fields is best understood as a hierarchy, where each is a subset of the previous one:
- Artificial Intelligence (AI): The broad field of creating systems capable of performing tasks that typically require human intelligence (e.g., reasoning and problem-solving).
- Machine Learning (ML): A subset of AI that focuses on the use of algorithms and statistical models to allow computers to learn from data without being explicitly programmed.
- Deep Learning (DL): A specialized subset of ML based on Artificial Neural Networks with multiple layers. Its layered structure is loosely inspired by the human brain, enabling it to process complex patterns such as speech and images.
Supervised vs. Unsupervised Learning
| Feature | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Data Type | Uses Labeled data (Input-Output pairs). | Uses Unlabeled data (Input only). |
| Process | Learns a function to map inputs to a known output. | Discovers hidden patterns or structures in data. |
| Goal | Prediction: Categorizing data (classification) or forecasting outcomes (regression). | Pattern discovery: Grouping data by similarity (clustering) or finding associations. |
| Example | Predicting house prices; Email spam filtering. | Customer segmentation; Recommendation systems. |
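The contrast can be sketched in plain NumPy: a least-squares line fit learns from labeled input-output pairs (supervised), while a two-cluster k-means groups points with no labels at all (unsupervised). The data and values below are toy choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Supervised: learn a mapping x -> y from labeled pairs (least squares) ---
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 2.0 + rng.normal(0, 0.1, size=50)   # labels: known outputs
A = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

# --- Unsupervised: discover structure in unlabeled points (2-means) ---
pts = np.vstack([rng.normal(0, 0.5, (30, 2)),     # blob near (0, 0)
                 rng.normal(5, 0.5, (30, 2))])    # blob near (5, 5)
centers = pts[[0, -1]].copy()                     # crude initialization
for _ in range(10):
    labels = np.argmin(np.linalg.norm(pts[:, None] - centers, axis=2), axis=1)
    centers = np.array([pts[labels == k].mean(axis=0) for k in range(2)])

print(round(slope, 2), round(intercept, 2))       # close to the true 3 and 2
```

The supervised fit recovers the known mapping because it sees the answers; k-means only ever sees the points, yet still finds the two groups.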
Limitations of Machine Learning
- Data Dependency and Quality: ML models are only as good as the data they are trained on (“Garbage In, Garbage Out”). If the data is biased, incomplete, or noisy, the model will produce inaccurate or unfair results.
- The “Black Box” Problem (Interpretability): Deep Learning models are often “black boxes.” While they might provide an accurate prediction, it is often difficult for humans to understand why the model made a specific decision.
- High Computational Cost: Training sophisticated models—especially Deep Learning architectures—requires massive amounts of processing power and high energy consumption.
- Overfitting and Underfitting: Overfitting occurs when the model learns the training data “too well,” including noise, and fails to generalize. Underfitting occurs when the model is too simple to capture underlying patterns.
- Lack of Common Sense and Context: ML models operate strictly within the statistical patterns of their training data and cannot reason using intuition or social context.
Deep Learning: Challenges and Propagation
Deep Learning (DL) utilizes Artificial Neural Networks with multiple layers to model complex patterns. Unlike traditional ML, DL algorithms can automatically discover features through their layered architecture.
Challenges of Deep Learning
- Massive Data Requirements: DL models are data-hungry and often require millions of labeled data points.
- High Computational Resources: Training requires immense processing power, typically necessitating high-end GPUs or TPUs.
- The “Black Box” Problem: The internal mathematical path taken to reach a conclusion is often too complex for humans to trace.
- Overfitting: Due to millions of parameters, DL models are prone to memorizing noise rather than learning patterns.
- Vanishing and Exploding Gradients: During backpropagation, the gradient signal can shrink toward zero or grow without bound as it passes through many layers, preventing effective learning in the early layers.
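The vanishing-gradient effect is easy to reproduce: the sigmoid's derivative never exceeds 0.25, so by the chain rule the gradient reaching the first layer shrinks roughly geometrically with depth. A small NumPy illustration with toy numbers (no real training involved):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_at_first_layer(depth, w=1.0, x=0.5):
    """Gradient of the output w.r.t. the input of a chain of `depth`
    sigmoid units, accumulated via the chain rule."""
    grad = 1.0
    a = x
    for _ in range(depth):
        a = sigmoid(w * a)
        grad *= w * a * (1.0 - a)   # sigmoid'(z) = s(z)(1 - s(z)) <= 0.25
    return grad

print(gradient_at_first_layer(2))    # still noticeable
print(gradient_at_first_layer(20))   # nearly zero: the signal has vanished
```

With weights larger than 1 the same product can instead grow without bound, which is the exploding-gradient case.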
Forward vs. Backward Propagation
| Feature | Forward Propagation | Backward Propagation |
|---|---|---|
| Primary Purpose | To make a prediction. | To learn/update weights. |
| Starting Point | Input Layer. | Output Layer (Loss function). |
| Mathematical Tool | Matrix Multiplication & Activation Functions. | Chain Rule & Derivatives (Gradients). |
| Weights/Biases | Remain constant. | Are updated. |
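Both passes can be written out for a tiny one-hidden-layer network: the forward pass is matrix multiplication plus activations, and the backward pass applies the chain rule to produce gradients, which then update the weights. All sizes, seeds, and values here are toy choices.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))                    # 8 samples, 3 features
y = (X.sum(axis=1, keepdims=True) > 0) * 1.0   # toy binary target

W1, b1 = rng.normal(size=(3, 4)) * 0.5, np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)) * 0.5, np.zeros(1)
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(200):
    # ---- Forward propagation: input layer -> prediction ----
    h = np.tanh(X @ W1 + b1)                   # hidden layer
    p = sigmoid(h @ W2 + b2)                   # output probability
    p = np.clip(p, 1e-9, 1 - 1e-9)             # numerical safety for the log
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    if step == 0:
        loss0 = loss                           # remember the starting loss

    # ---- Backward propagation: loss -> gradients (chain rule) ----
    dz2 = (p - y) / len(X)                     # sigmoid + cross-entropy shortcut
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (1 - h**2)            # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)

    # ---- Update: only the backward pass modifies the weights ----
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(round(loss, 4))   # should be well below the starting loss
```

Note how the forward pass leaves the weights untouched and only computes activations, while the backward pass starts from the loss at the output and works back to the input, exactly as the table describes.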
Industrial Tools for Deep Learning
- TensorFlow (Google): An end-to-end open-source platform known for scalability across CPUs, GPUs, and mobile devices.
- PyTorch (Meta): A premier library favored for research due to its “Dynamic Computational Graph.”
- Keras: A high-level API written in Python, designed for human readability and rapid prototyping.
- Apache MXNet: A framework designed for efficiency and flexibility, long the deep learning framework of choice on AWS; the project was retired to the Apache Attic in 2023.
Real-Life Applications of Deep Learning
- Healthcare: Analyzing medical images like X-rays, MRIs, and CT scans.
- Autonomous Vehicles: Processing real-time sensor data to navigate and detect obstacles.
- Natural Language Processing: Powering virtual assistants and chatbots such as Siri, Alexa, and ChatGPT.
- Financial Services: Monitoring transactions in real-time to detect and prevent fraud.
Sentiment Analysis and Its Types
Sentiment Analysis is a branch of NLP that identifies and studies subjective information within text to determine its “emotional tone.”
- Fine-grained Sentiment Analysis: Breaks sentiment into specific polarity levels (e.g., a 5-point scale).
- Emotion Detection: Identifies specific emotional states beyond simple positive/negative polarity.
- Aspect-based Sentiment Analysis (ABSA): Focuses on specific components of a product or service.
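A minimal lexicon-based scorer makes the idea concrete. Real systems use trained models; this sketch just maps words to polarity scores and sums them, and the word list is invented for illustration.

```python
# Toy sentiment lexicon: word -> polarity score (invented for this example).
LEXICON = {"great": 2, "good": 1, "okay": 0, "bad": -1, "terrible": -2}

def polarity(text: str) -> int:
    """Sum word scores; the sign gives coarse positive/negative sentiment,
    and the magnitude acts as a crude fine-grained scale."""
    return sum(LEXICON.get(w.strip(".,!?").lower(), 0) for w in text.split())

print(polarity("The battery is great but the screen is bad"))  # -> 1
```

An aspect-based system would go further and attach separate scores to each aspect ("battery": positive, "screen": negative) instead of one score for the whole sentence.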
Bias, Variance, and the Bias-Variance Tradeoff
- Bias: Error due to overly simple assumptions. High bias leads to underfitting.
- Variance: Error due to excessive sensitivity to small fluctuations in the training set. High variance leads to overfitting.
- The Bias-Variance Tradeoff: The goal is to find the “sweet spot” of model complexity that minimizes total error.
Underfitting, Overfitting, and Regularization
- Underfitting: Model is too simple; high error on both training and testing data. Fix: Increase complexity.
- Overfitting: Model is too complex; low error on training data but high error on new data. Fix: Use more data or regularization.
- Regularization: A technique that adds a “penalty” to the loss function to discourage overly complex models.
| Term | Underfitting | Overfitting |
|---|---|---|
| Data Fit | Poor | Too Good |
| Complexity | Too Low | Too High |
| Error Source | High Bias | High Variance |
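L2 (ridge) regularization shows the penalty at work: adding lam * ||w||^2 to the squared-error loss has a closed-form solution, and larger penalties shrink the learned weights. The data here is synthetic and the values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 5))
w_true = np.array([2.0, -1.0, 0.0, 0.0, 3.0])
y = X @ w_true + rng.normal(0, 0.1, 30)

def ridge(X, y, lam):
    """Closed-form ridge regression: minimizes ||Xw - y||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

for lam in (0.0, 1.0, 100.0):
    print(lam, round(float(np.linalg.norm(ridge(X, y, lam))), 3))
# Larger lam -> smaller weight norm: the penalty discourages complex models.
```

With lam = 0 this reduces to ordinary least squares; cranking lam up trades a little bias for lower variance, which is exactly the lever used against overfitting.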
Activation Functions
An Activation Function is a function applied to a neuron's weighted input that determines its output (whether the neuron "fires"). It introduces non-linearity, allowing the network to learn complex patterns.
- Sigmoid: Outputs 0 to 1; used for binary classification.
- Tanh: Outputs -1 to 1; zero-centered, often preferred over Sigmoid in hidden layers.
- ReLU: Outputs 0 to infinity; the default choice for hidden layers due to speed.
- Softmax: Outputs probabilities summing to 1; used for multi-class classification.
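All four functions are a few lines of NumPy each; the sample input is arbitrary.

```python
import numpy as np

def sigmoid(z):                    # squashes to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):                       # squashes to (-1, 1), zero-centered
    return np.tanh(z)

def relu(z):                       # 0 for negatives, identity otherwise
    return np.maximum(0.0, z)

def softmax(z):                    # probabilities that sum to 1
    e = np.exp(z - np.max(z))      # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), softmax(z))
```

The max-subtraction trick in softmax does not change the result (it cancels in the ratio) but prevents overflow for large inputs.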
Hyperparameters
Hyperparameters are external configuration settings established before training. Unlike weights, they are not learned by the model.
- Learning Rate: Determines the step size toward the minimum loss.
- Batch Size: Number of training examples used in one iteration.
- Number of Epochs: How many times the entire dataset passes through the network.
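A minimal mini-batch gradient descent loop shows where each hyperparameter enters, and how it differs from the parameter being learned. The data and hyperparameter values are toy choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 1))
y = 4.0 * X[:, 0] + rng.normal(0, 0.1, 100)

# Hyperparameters: chosen BEFORE training and never updated by the model.
learning_rate = 0.1
batch_size = 20
num_epochs = 50

w = 0.0                              # parameter: learned DURING training
for epoch in range(num_epochs):      # epochs: full passes over the dataset
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):   # batches per epoch
        idx = order[start:start + batch_size]
        grad = 2 * np.mean((w * X[idx, 0] - y[idx]) * X[idx, 0])
        w -= learning_rate * grad    # step size set by the learning rate

print(round(w, 2))   # should land close to the true slope of 4
```

Changing any of the three settings changes how training proceeds (speed, stability, number of updates) but the model itself only ever learns w.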
