Core Machine Learning Definitions: Algorithms and System Design
Core Machine Learning Concepts and Algorithms
Essential Definitions in Machine Learning
- Machine Learning: A field of AI where systems learn patterns from data to make predictions or decisions without being explicitly programmed.
- Find-S Algorithm: A concept learning algorithm that finds the most specific hypothesis consistent with the positive training examples.
- Concept Learning Task: The task of inferring a Boolean-valued function from training examples of inputs and outputs.
- Multilayer Network: An artificial neural network with one or more hidden layers between input and output layers.
- Spline: A smooth piecewise polynomial function used for approximation and interpolation.
- Basics of Sampling Theory: Estimating hypothesis accuracy from a sample of data instead of the entire population.
- Decision Tree: A tree-like model used for classification and prediction by recursively splitting data based on features.
- Regression: A statistical method that models the relationship between input variables and a continuous output variable.
- Radial Basis Function (RBF): A real-valued function whose output depends only on the distance from a center point, used in neural networks.
- Perceptron: A simple neural network unit that classifies data into two classes using a linear decision boundary.
- Linear Discriminant: A linear function used to separate two or more classes of objects or events.
- Linear Separability: A property where data classes can be separated by a straight line (or hyperplane in higher dimensions).
- Issues in ML: Overfitting, underfitting, noisy data, bias–variance tradeoff, and computational complexity.
- Interpolation: Estimating unknown values within the range of known data points.
- Classification: The process of assigning input data into one of several predefined categories or classes.
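The perceptron and linear-separability entries above can be illustrated with a short sketch: a perceptron trained with the classic update rule learns the linearly separable AND function. The weights, learning rate, and epoch count below are illustrative choices, not values from these notes:

```python
# Perceptron learning rule on the linearly separable AND function.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.0, 0.0]  # weights (illustrative starting values)
b = 0.0         # bias
lr = 0.1        # learning rate

def predict(x):
    # Linear decision boundary: fire if w.x + b > 0
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

for epoch in range(20):               # a few passes over the data suffice
    for x, target in data:
        err = target - predict(x)     # 0 if correct, +1/-1 otherwise
        w[0] += lr * err * x[0]       # perceptron update rule
        w[1] += lr * err * x[1]
        b += lr * err

print([predict(x) for x, _ in data])  # -> [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop reaches a correct boundary; the same loop run on XOR would never converge, which motivates the multilayer networks discussed next.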
Multilayer Perceptron (MLP) Architecture
A Multilayer Perceptron (MLP) is an artificial neural network with multiple layers of nodes (neurons): an input layer, one or more hidden layers, and an output layer. Each neuron in a layer is connected to every neuron in the next layer with weights. MLPs are powerful because they can represent non-linear decision boundaries.
How MLP Training Works
- Forward Pass (Computation): Input values are fed into the network. Each neuron computes a weighted sum of its inputs and applies an activation function; this computation continues layer by layer until the outputs are generated.
- Backward Pass (Training using Backpropagation): The difference between the predicted and target outputs (the error) is calculated and propagated backward through the network. The weights are updated using gradient descent to minimize the error.
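The two passes can be sketched in a few lines of NumPy for a 2-2-1 sigmoid network. The weights, learning rate, and single training example below are hypothetical values chosen for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative fixed weights for a 2-2-1 network (not learned values).
W1 = np.array([[0.5, -0.5], [0.3, 0.8]])  # input -> hidden
b1 = np.array([0.1, -0.1])
W2 = np.array([0.7, -0.4])                # hidden -> output
b2 = 0.2

x = np.array([1.0, 0.0])  # one training input
t = 1.0                   # its target output

# Forward pass: weighted sums plus activations, layer by layer.
h = sigmoid(W1 @ x + b1)   # hidden activations
y = sigmoid(W2 @ h + b2)   # network output

# Backward pass: gradients of the squared error E = 0.5 * (y - t)^2.
delta_out = (y - t) * y * (1 - y)          # output-layer error term
delta_hid = delta_out * W2 * h * (1 - h)   # hidden-layer error terms

lr = 0.5                       # learning rate (illustrative)
W2 -= lr * delta_out * h       # gradient-descent weight updates
b2 -= lr * delta_out
W1 -= lr * np.outer(delta_hid, x)
b1 -= lr * delta_hid
```

Re-running the forward pass after this single update yields an output closer to the target, which is exactly what repeating the two passes over many examples achieves at scale.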
Example: Solving XOR with an MLP
A single perceptron cannot solve the XOR problem because XOR is not linearly separable. An MLP with two hidden neurons and one output neuron can solve it:
- Input layer: 2 nodes (A, B)
- Hidden layer: 2 nodes (C, D)
- Output layer: 1 node (E)
With suitable weights and thresholds, when the input is (1, 0):
- Neuron C fires (output = 1), while neuron D does not fire (output = 0).
- Neuron E takes C and D as inputs and fires, giving output = 1.
Repeating for all inputs shows that the network correctly outputs XOR:
- (0, 0) → 0
- (0, 1) → 1
- (1, 0) → 1
- (1, 1) → 0
This demonstrates that adding a hidden layer allows the network to separate data that cannot be separated by a straight line.
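One concrete set of hand-chosen (illustrative) weights and thresholds that realizes this network uses C as an OR unit, D as an AND unit, and E as "C and not D", with step-threshold neurons:

```python
# Step-threshold neurons with hand-set weights (illustrative, not learned).
def step(z):
    return 1 if z > 0 else 0

def xor_net(a, b):
    c = step(a + b - 0.5)   # C fires if A OR B
    d = step(a + b - 1.5)   # D fires only if A AND B
    e = step(c - d - 0.5)   # E fires if C and not D
    return e

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b), "->", xor_net(a, b))
# -> (0,0)->0, (0,1)->1, (1,0)->1, (1,1)->0
```

For input (1, 0) this matches the walkthrough above: C fires, D does not, and E fires.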
Support Vector Machine (SVM) Principles
The Support Vector Machine (SVM) is a supervised learning algorithm used primarily for classification (and also regression). It finds the optimal separating hyperplane that maximizes the margin between two classes.
Key Concepts of SVM
- Optimal Separation: SVM finds the line (or hyperplane) with the largest margin between classes, which generally gives better generalization compared to a simple perceptron.
- Support Vectors: Only the closest points to the hyperplane (the support vectors) influence the decision boundary. This makes SVM efficient since most points do not affect the final model.
- Kernel Trick: For non-linearly separable data, SVM uses kernel functions (e.g., Polynomial, RBF, Sigmoid) to project the data into a higher-dimensional space where linear separation is possible.
- Soft Margin: Allows some misclassified points (using slack variables) to handle noise and outliers in real-world data.
- Multi-class Classification: Uses strategies like One-vs-Rest or One-vs-One to classify more than two classes.
Example: Given two classes of points, SVM chooses the maximum-margin line that separates them. The points lying on the margin are the support vectors and determine the decision boundary.
ML Perspectives, Issues, and Learning Types
Major Perspectives and Challenges in ML
Major Perspectives:
- Data-Centric: Focuses on collecting, cleaning, and preparing quality data.
- Model-Centric: Involves choosing suitable algorithms and balancing model complexity.
- Optimization: Treats learning as minimizing error using methods like gradient descent.
- Probabilistic: Models uncertainty and uses Bayesian approaches.
- Computational: Considers efficiency, scalability, and resource requirements.
Key Issues and Challenges:
- Bias–Variance Tradeoff: Finding the right balance to avoid underfitting (high bias) or overfitting (high variance).
- Data Problems: Handling noise, missing values, and small datasets.
- Generalization: Preventing overfitting and improving performance on unseen data.
- Interpretability: Making models explainable and trustworthy.
- Scalability: Managing large datasets and computational cost.
- Ethics & Privacy: Avoiding biased decisions and protecting user data.
- Concept Drift: Adapting to changes in data distribution over time.
Decision Tree Algorithm Explained
Definition: A tree-based model that splits data based on feature values to classify or predict outcomes.
Decision Tree Process:
- If all examples have the same label, make a leaf node.
- Calculate information gain for each feature using entropy.
- Choose the feature with the highest gain and split the data based on it.
- Repeat recursively until nodes are pure or no features remain.
Decision trees can handle continuous variables and control overfitting via pre-pruning or post-pruning.
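The entropy and information-gain computations in the steps above can be sketched as follows, using the standard formulas $H(S) = -\sum_i p_i \log_2 p_i$ and $Gain(S, A) = H(S) - \sum_v \frac{|S_v|}{|S|} H(S_v)$. The toy dataset is hypothetical:

```python
import math
from collections import Counter

def entropy(labels):
    # H(S) = -sum p_i * log2(p_i) over the class proportions p_i.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    # Gain(S, A) = H(S) - sum over values v of (|S_v|/|S|) * H(S_v).
    n = len(labels)
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[feature], []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in by_value.values())
    return entropy(labels) - remainder

# Toy dataset (hypothetical): feature 0 perfectly predicts the label,
# feature 1 carries no information about it.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))  # -> 1.0 (perfect split)
print(information_gain(rows, labels, 1))  # -> 0.0 (useless split)
```

The tree-growing loop would therefore split on feature 0 first, immediately producing pure leaves.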
Advantages and Complexity:
- Advantages: Easy to interpret, works with categorical and continuous data, and handles missing values.
- Complexity: Training is roughly $O(d \cdot N^2 \cdot \log N)$ for $N$ examples and $d$ features; prediction is fast, approximately $O(\log N)$ for a reasonably balanced tree.
Categorizing Machine Learning Types
- Supervised Learning: Learns from labeled data (input-output pairs). Used for regression (continuous output) and classification (categorical output).
- Unsupervised Learning: Works with unlabeled data to find hidden patterns. Includes clustering (grouping) and dimensionality reduction (PCA).
- Reinforcement Learning: Learns by interacting with an environment and receiving rewards or penalties. The goal is to maximize cumulative reward (used in robotics and games).
- Semi-Supervised Learning: Uses a small amount of labeled data combined with a large amount of unlabeled data. Improves learning when labeling data is expensive or difficult.
- Online/Incremental Learning: Updates the model continuously as new data arrives. Ideal for real-time or streaming applications.
Designing an Effective Learning System
- Problem & Data Definition: Clearly define the task (classification, regression, clustering). Collect sufficient, clean, and representative data (labeled if needed).
- Feature Selection & Preprocessing: Identify and keep only relevant features, removing noise or redundancy. Normalize/standardize data and handle missing values.
- Algorithm Selection: Choose an algorithm that fits the problem type and dataset size. Consider interpretability, training time, and computational cost.
- Training & Validation: Train the model and tune hyperparameters. Use cross-validation to avoid overfitting and improve generalization.
- Evaluation & Deployment: Test on unseen data using proper performance metrics. Deploy the model and continuously monitor for concept drift or errors.
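The cross-validation step mentioned above can be sketched with a minimal k-fold index splitter (a plain-Python illustration, not any specific library's API):

```python
# Minimal k-fold cross-validation sketch: split sample indices into k folds,
# train on k-1 folds and evaluate on the held-out fold, rotating through all k.
def k_fold_indices(n, k):
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# Example: 6 samples, 3 folds; every sample is held out exactly once.
for train, test in k_fold_indices(6, 3):
    print("train:", train, "test:", test)
```

Averaging the evaluation metric over the k held-out folds gives a less optimistic estimate of generalization than a single train/test split, which is why it is the standard tool for hyperparameter tuning.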
