Unit 4: Machine Learning

1. Supervised Learning

Supervised learning is a machine learning approach in which the model is trained using labeled data. Each training example has an input and a corresponding correct output. The goal of the model is to learn a relationship or mapping between input features and the target output, so that it can predict outcomes for new unseen data.

Important Points

  • Involves classification (output is a category) and regression (output is a number).
  • The learning process is guided by comparing predicted output with actual output.
  • Algorithms include: Linear Regression, Logistic Regression, Decision Trees, SVM, Naive Bayes, Neural Networks.
  • Used in spam detection, credit scoring, medical diagnosis, and price prediction.
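As a minimal sketch of the supervised setup, the snippet below fits simple linear regression (one feature) by ordinary least squares on labeled pairs and then predicts for an unseen input. The toy data points are invented for illustration.

```python
# Supervised learning sketch: learn a mapping from labeled (x, y) pairs,
# then predict the output for a new, unseen input.

def fit_linear(xs, ys):
    """Return (slope, intercept) minimizing squared prediction error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Labeled training data: each input x comes with its correct output y (here y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_linear(xs, ys)
prediction = slope * 5.0 + intercept  # predict for unseen input x = 5
```

The learning step is exactly the guidance described above: the fitted line is the one whose predictions are closest to the actual labels.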

2. Unsupervised Learning

Unsupervised learning deals with unlabeled data, where the system tries to discover hidden patterns or structures without predefined outputs. The model groups or organizes data based on similarities.

Important Points

  • Mainly used for clustering, dimensionality reduction, and pattern discovery.
  • Helps in understanding natural groupings in data.
  • Common algorithms: K-Means Clustering, Hierarchical Clustering, PCA, Association Rule Mining.
  • Applications include customer segmentation, anomaly detection, and market basket analysis.
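A minimal K-Means sketch on 1-D data shows the unsupervised idea: no labels are given, and the algorithm discovers groupings by alternating two steps. The data points and starting centroids below are made up for illustration.

```python
# Unsupervised learning sketch: K-Means on unlabeled 1-D points.
# Repeat: (1) assign each point to its nearest centroid,
#         (2) recompute each centroid as the mean of its cluster.

def kmeans_1d(points, centroids, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]   # two natural groups, no labels
centroids = kmeans_1d(points, [0.0, 10.0])
```

The centroids settle near the two natural groupings in the data, which is the kind of structure discovery described above.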

3. Decision Trees

A decision tree is a supervised learning algorithm used for both classification and regression. It represents decisions in a tree-like structure, where internal nodes represent tests on features, branches represent outcomes, and leaf nodes represent final decisions.

Important Points

  • Splits data using impurity measures such as the Gini index or entropy; the split with the highest Information Gain (reduction in impurity) is chosen.
  • Easy to understand, interpret, and visualize.
  • Can handle numerical and categorical data.
  • Used in credit approval, medical diagnosis, fraud detection, and risk analysis.
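The splitting criterion can be sketched in a few lines: compute the entropy of a node's labels, then score a candidate split by how much it reduces entropy (information gain). The tiny label set is invented for illustration.

```python
import math

# Decision-tree splitting sketch: entropy and information gain.

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {label: labels.count(label) for label in set(labels)}
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]   # maximally mixed node: entropy = 1 bit
gain = information_gain(parent, ["yes", "yes"], ["no", "no"])  # a perfect split
```

A perfect split drives both child entropies to zero, so the gain equals the parent's entropy; the tree-growing algorithm picks the feature test with the largest gain at each node.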

4. Statistical Learning Models

Statistical learning models use mathematical and probabilistic approaches to understand relationships between variables and make predictions.

Important Points

  • Provide a theoretical foundation for machine learning.
  • Focus on estimating functions that best describe the data.
  • Examples: Linear Regression, Logistic Regression, Bayesian Models, Hidden Markov Models.
  • Known for interpretability and strong mathematical backing.
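A one-line example of the statistical estimation idea: the maximum-likelihood estimate (MLE) of a Bernoulli parameter is simply the sample proportion of successes. The coin-flip data below is hypothetical.

```python
# Statistical learning sketch: maximum-likelihood estimation for a
# Bernoulli model. The MLE of P(heads) is the observed fraction of heads.

flips = [1, 1, 0, 1, 0, 1, 1, 0]   # 1 = heads, 0 = tails
p_mle = sum(flips) / len(flips)    # sample proportion maximizes the likelihood
```

This is the "estimate the function that best describes the data" principle in its simplest form; linear and logistic regression apply the same likelihood reasoning to richer models.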

5. Learning with Complete Data – Naive Bayes

Naive Bayes is a probabilistic classifier based on Bayes’ Theorem. It assumes that features are conditionally independent of each other given the class (the “naive” assumption).

Important Points

  • Works well even with small training data.
  • Fast, simple, and effective for classification tasks.
  • Often used in spam filtering, sentiment analysis, document classification.
  • Computes the probability of each class and predicts the class with the highest probability.
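The classification rule above can be sketched with a tiny word-count model: compute a (log) probability score for each class and predict the class with the highest score. The toy “spam”/“ham” corpus and Laplace smoothing constant are invented for illustration.

```python
import math

# Naive Bayes sketch: score each class as
#   log P(class) + sum over words of log P(word | class),
# treating words as conditionally independent given the class.

train = [("spam", ["win", "money", "now"]),
         ("spam", ["win", "prize"]),
         ("ham",  ["meeting", "today"]),
         ("ham",  ["project", "update", "today"])]

vocab = {w for _, doc in train for w in doc}

def log_posterior(doc, cls):
    docs = [d for c, d in train if c == cls]
    prior = len(docs) / len(train)
    words = [w for d in docs for w in d]
    score = math.log(prior)
    for w in doc:
        # Laplace (add-one) smoothing so unseen words don't zero out the score.
        score += math.log((words.count(w) + 1) / (len(words) + len(vocab)))
    return score

predicted = max(["spam", "ham"], key=lambda c: log_posterior(["win", "money"], c))
```

Working in log-probabilities avoids numerical underflow when documents are long; the argmax over classes is unchanged.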

6. Learning with Hidden Data – EM Algorithm

The Expectation-Maximization (EM) algorithm is used when some data is missing or hidden. It is an iterative method for estimating the unknown parameters of statistical models.

Important Points

  • E-step (Expectation): Estimates hidden or missing data using current parameters.
  • M-step (Maximization): Updates parameters to maximize the likelihood.
  • Repeats until convergence (values stop changing significantly).
  • Used in clustering (Gaussian Mixture Models), missing data problems, and hidden variable models.
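The E-step/M-step loop can be sketched for a deliberately simplified case: a two-component 1-D Gaussian mixture with fixed unit variances and equal weights, where only the two means are estimated. The data and starting values are invented; a full EM for mixtures also updates the weights and variances.

```python
import math

# EM sketch for a two-component 1-D Gaussian mixture (unit variance,
# equal mixing weights): the component memberships are the hidden data.

def em_two_means(data, init_means, iters=20):
    mu1, mu2 = init_means
    for _ in range(iters):
        # E-step: estimate the hidden data -- the responsibility
        # (posterior probability) of component 1 for each point.
        resp = []
        for x in data:
            p1 = math.exp(-0.5 * (x - mu1) ** 2)
            p2 = math.exp(-0.5 * (x - mu2) ** 2)
            resp.append(p1 / (p1 + p2))
        # M-step: update each mean as a responsibility-weighted average,
        # which maximizes the expected log-likelihood.
        mu1 = sum(r * x for r, x in zip(resp, data)) / sum(resp)
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / sum(1 - r for r in resp)
    return mu1, mu2

data = [0.0, 0.5, -0.5, 5.0, 5.5, 4.5]   # two well-separated groups
mu1, mu2 = em_two_means(data, (1.0, 4.0))
```

After a few iterations the means settle near the centers of the two groups; in practice the loop stops when the parameter updates become negligibly small.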

7. Reinforcement Learning

Reinforcement learning is a type of machine learning where an agent learns by interacting with an environment. The agent performs actions and receives rewards or penalties. The goal is to learn a strategy that maximizes long-term rewards.

Important Points

  • Based on trial and error.
  • Uses concepts like states, actions, rewards, policy, and value functions.
  • Algorithms: Q-Learning, Deep Q-Network (DQN), SARSA.
  • Used in robotics, gaming (AlphaGo), self-driving cars, and recommendation systems.
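The trial-and-error loop can be sketched with tabular Q-Learning on a tiny deterministic chain environment (states 0–3, reward 1 for reaching the goal state). The environment and hyperparameters below are invented for illustration.

```python
import random

# Reinforcement learning sketch: tabular Q-Learning on a 4-state chain.
# Actions: 0 = move left, 1 = move right. Reaching state 3 gives reward 1.

random.seed(0)
N_STATES, GOAL = 4, 3
alpha, gamma, epsilon = 0.5, 0.9, 0.2     # learning rate, discount, exploration
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action] value table

def step(state, action):
    """Deterministic environment: returns (next_state, reward, done)."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

for _ in range(200):                       # episodes of trial and error
    s = 0
    while True:
        # Epsilon-greedy policy: mostly exploit, sometimes explore.
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1
        s2, r, done = step(s, a)
        # Q-Learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
        if done:
            break

greedy = [0 if q[0] > q[1] else 1 for q in Q[:GOAL]]  # learned policy per state
```

After training, the greedy policy moves right in every non-goal state, illustrating how the value table encodes a strategy that maximizes long-term reward.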