Understanding Decision Trees and Ensemble Learning Techniques

Decision Tree: A decision tree is a supervised learning algorithm that splits data into branches based on feature values to make predictions or classifications.

Gini Impurity: Gini impurity measures how often a randomly chosen element would be incorrectly classified; it indicates node impurity in a decision tree.

Nearest Neighbor Method (k-NN): Classifies a sample based on the majority class among its k closest training samples.
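
The nearest-neighbor idea can be sketched in a few lines of Python (the toy data, function name, and choice of Euclidean distance are illustrative, not from the notes):

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority class among the k closest samples
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters in 2-D
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (0.5, 0.5)))  # a query near the first cluster -> "a"
```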

Difference Between Boosting and Bagging: Bagging reduces variance by averaging multiple models, while boosting reduces bias by sequentially improving weak models.

Independent Component Analysis (ICA): Separates multivariate signals into statistically independent components.

Goal of LDA (Linear Discriminant Analysis): To maximize class separability while reducing dimensionality, by projecting data onto directions that best separate the classes.

Least Squares Optimization: Minimizes the sum of squared differences between predicted and actual values.
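
For the simple case of fitting a line y = m·x + b, minimizing the sum of squared residuals has a closed-form solution; a minimal sketch (toy data invented for illustration):

```python
def least_squares_line(xs, ys):
    """Fit y = m*x + b by minimizing the sum of squared residuals.

    Uses the closed-form normal-equation solution for a 1-D line:
    slope = covariance(x, y) / variance(x).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - m * mean_x
    return m, b

# Points lying exactly on y = 2x + 1, so the fit recovers m = 2, b = 1
m, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b)  # 2.0 1.0
```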

Markov Decision Process (MDP): A framework for modeling decision-making where outcomes are partly random and partly under the control of an agent.

Application of Hidden Markov Models: Speech recognition.

Particle Filter: A sequential Monte Carlo method used to estimate the state of a system that changes over time in tracking problems.

Decision Tree Construction:

A decision tree is a supervised learning algorithm used for classification and regression. It works by splitting the dataset into smaller subsets based on feature values, creating a tree-like structure where each internal node represents a decision, each branch represents an outcome, and each leaf node represents a final class label or prediction.

Construction Process:

  1. Select the best attribute: Choose the feature that best separates the training data according to a chosen splitting criterion (e.g., Information Gain, Gini Index, Gain Ratio).
  2. Split the dataset: Partition the data into subsets based on possible values of the selected attribute.
  3. Repeat recursively: Continue splitting until one of the following occurs: all samples belong to the same class, there are no remaining features, or a stopping condition (e.g., max depth, minimum samples) is reached.

Splitting Criteria:

  • Information Gain (used in ID3): Measures the reduction in entropy after a dataset is split.
    Information Gain = Entropy(D) – Σ (|D_i| / |D|) × Entropy(D_i)
  • Gain Ratio (used in C4.5): Adjusts Information Gain by considering the intrinsic information of a split to avoid bias toward attributes with many values.
    Gain Ratio = Information Gain / Split Information
  • Gini Index (used in CART): Measures impurity or diversity in a dataset. Lower Gini values indicate purer nodes.
    Gini(D) = 1 – Σ p_i²
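
The three criteria above are straightforward to compute; a minimal sketch of entropy, Gini, and Information Gain (function names and the toy split are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Reduction in entropy after splitting `parent` into `subsets`."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - weighted

parent = ["yes"] * 5 + ["no"] * 5        # maximally impure: entropy = 1 bit
split = [["yes"] * 5, ["no"] * 5]        # a perfect split into pure subsets
print(entropy(parent))                   # 1.0
print(information_gain(parent, split))   # 1.0 (all impurity removed)
print(gini(parent))                      # 0.5
```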

Stopping and Pruning:

To prevent overfitting, trees are often pruned after construction. Pruning removes branches that add little predictive power, improving generalization. This can be done using cost-complexity pruning or validation-based pruning.

Bayesian Networks – Structure and Applications:

A Bayesian Network (BN) is a probabilistic graphical model represented as a Directed Acyclic Graph (DAG) that shows conditional dependencies among variables. It provides a compact representation of joint probability distributions.

Components:

  1. Graphical Structure (Qualitative): Nodes represent random variables, and directed edges (arrows) represent dependencies. If A → B, A has a direct influence on B.
  2. Probabilistic Parameters (Quantitative): Each node X_i has a Conditional Probability Distribution (CPD) P(X_i | Parents(X_i)), showing how it depends on its parent nodes.

Joint Probability Expression:

P(X₁, X₂, …, Xₙ) = ∏ P(Xᵢ | Parents(Xᵢ))

Example:

For nodes Rain (R), Sprinkler (S), and Wet Grass (W):

P(R, S, W) = P(R) × P(S|R) × P(W|S, R)

This allows efficient computation of conditional probabilities such as P(Rain | Wet Grass) using Bayes’ theorem.
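
This computation can be sketched by enumeration: assign each node a conditional probability table, multiply along the factorization, and sum out the unobserved variable. The CPT numbers below are made up purely for illustration, not taken from any dataset:

```python
from itertools import product

# Illustrative CPTs (invented numbers) for the DAG R -> S, (S, R) -> W.
P_R = {True: 0.2, False: 0.8}            # P(R)
P_S_given_R = {True: 0.01, False: 0.4}   # P(S=True | R)
P_W_given_SR = {                         # P(W=True | S, R)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}

def joint(r, s, w):
    """P(R=r, S=s, W=w) via the chain-rule factorization of the DAG."""
    ps = P_S_given_R[r] if s else 1 - P_S_given_R[r]
    pw = P_W_given_SR[(s, r)] if w else 1 - P_W_given_SR[(s, r)]
    return P_R[r] * ps * pw

# P(Rain | Wet Grass) by enumeration: sum out the Sprinkler variable,
# then normalize by P(Wet Grass).
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
print(round(num / den, 4))  # ~0.3577 with these illustrative CPTs
```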

Reinforcement Learning (RL):

RL is a type of machine learning where an agent learns to make decisions by interacting with an environment to maximize cumulative rewards. Unlike supervised learning, it learns by trial and error using feedback from its actions.

Key Elements:

  • Agent
  • Environment
  • State (S)
  • Action (A)
  • Reward (R)
  • Policy (π)
  • Value Function (V)

Agent–Environment Interaction Loop:

  1. Agent observes the current state S_t.
  2. Selects an action A_t based on policy π.
  3. Environment provides a reward R_t and transitions to new state S_{t+1}.
  4. Agent updates policy to maximize cumulative rewards.
  5. Process continues until a terminal state is reached.
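
The loop above can be sketched with tabular Q-learning on a tiny invented "corridor" environment (five states in a line, +1 reward for reaching the rightmost state); the environment, hyperparameters, and update rule used here are assumptions for illustration:

```python
import random

N_STATES, ACTIONS = 5, (1, -1)      # actions: step right / step left
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
rng = random.Random(0)

for episode in range(200):
    s = 0                                      # 1. observe initial state
    while s != N_STATES - 1:                   # 5. run until terminal state
        # 2. epsilon-greedy action selection from the current value estimates
        if rng.random() < EPSILON:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a: Q[(s, a)])
        # 3. environment transition and reward
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # 4. update toward reward plus discounted best next-state value
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s_next

# The learned greedy policy steps right in every non-terminal state.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]
```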

Example (Robot Learning to Walk):

Agent = Robot; Environment = Floor; State = Robot’s position, balance; Action = Move or adjust legs; Reward = +10 for moving forward, −10 for falling. Over time, the robot learns optimal movements for walking steadily.


Bagging and Boosting in Machine Learning:

Both are ensemble learning methods that combine multiple models to improve accuracy and reduce errors.

Bagging (Bootstrap Aggregating):

Builds multiple independent models (often decision trees) on random subsets of the training data using bootstrap sampling. The results are combined by voting (classification) or averaging (regression). It reduces variance and improves stability.

Algorithm Steps:

  1. Generate random subsets using sampling with replacement.
  2. Train a base learner (e.g., Decision Tree) on each subset.
  3. Combine predictions via majority vote or averaging.

Example: Random Forest.
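
The three steps above can be sketched in pure Python. The base learner here is a one-feature threshold "stump" rather than a full decision tree, and the toy 1-D data is invented for illustration:

```python
import random
from collections import Counter

def train_stump(points):
    """Weak base learner: best single threshold on 1-D (x, label) pairs."""
    best = None
    for t, _ in points:                       # candidate thresholds
        for left, right in ((0, 1), (1, 0)):  # both orientations
            err = sum(y != (left if x <= t else right) for x, y in points)
            if best is None or err < best[0]:
                best = (err, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def bagging_fit(points, n_models=25, seed=0):
    """Step 1 + 2: train one stump per bootstrap sample (with replacement)."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(points) for _ in points])
            for _ in range(n_models)]

def bagging_predict(models, x):
    """Step 3: combine the ensemble's predictions by majority vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

# Toy 1-D data: class 0 below 5, class 1 at 5 and above
data = [(x, 0) for x in range(5)] + [(x, 1) for x in range(5, 10)]
models = bagging_fit(data)
print([bagging_predict(models, x) for x in (1, 8)])  # [0, 1]
```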

Boosting:

Builds models sequentially, where each new model focuses on correcting the errors of the previous one. It reduces bias and increases model accuracy.

Algorithm Steps:

  1. Assign equal weights to all data points.
  2. Train a weak learner (e.g., small Decision Tree).
  3. Increase weights of misclassified samples.
  4. Repeat for several rounds and combine results via weighted majority voting.

Examples: AdaBoost, Gradient Boosting, XGBoost.
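
The four boosting steps can be sketched as a minimal AdaBoost in pure Python, using ±1 labels and threshold stumps as weak learners; the toy data and helper names are illustrative, not from any library:

```python
import math

def weighted_stump(points, w):
    """Step 2: weak learner minimizing *weighted* error on 1-D (x, y) pairs."""
    best = None
    for t, _ in points:                      # candidate thresholds
        for sign in (1, -1):                 # both orientations
            err = sum(wi for (x, y), wi in zip(points, w)
                      if y != (sign if x <= t else -sign))
            if best is None or err < best[0]:
                best = (err, t, sign)
    err, t, sign = best
    return err, (lambda x: sign if x <= t else -sign)

def adaboost(points, rounds=5):
    n = len(points)
    w = [1 / n] * n                          # step 1: equal initial weights
    ensemble = []
    for _ in range(rounds):
        err, h = weighted_stump(points, w)
        err = max(err, 1e-10)                # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        # Step 3: up-weight misclassified samples, then renormalize
        w = [wi * math.exp(-alpha * y * h(x))
             for (x, y), wi in zip(points, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, h))
    return ensemble

def predict(ensemble, x):
    """Step 4: weighted majority vote of the weak learners."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D data with -1/+1 labels
data = [(x, -1) for x in range(5)] + [(x, 1) for x in range(5, 10)]
model = adaboost(data)
print([predict(model, x) for x in (2, 7)])  # [-1, 1]
```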

Comparison:

Bagging: Independent (Parallel), reduces variance, faster, less prone to overfitting.

Boosting: Sequential (Dependent), reduces bias, slower, can overfit if overtrained.