Machine Learning Model Evaluation: Classification & Clustering Metrics

Classification Model Evaluation Metrics

Understanding how to evaluate classification models is crucial for assessing their effectiveness. This section details key metrics derived from the confusion matrix.

A. Confusion Matrix for Binary Classification

The Confusion Matrix is a fundamental tool for evaluating the performance of a classification model, especially for binary classification (Positive (+) and Negative (-)).

                 Predicted +      Predicted –
  Actual +       TP (f++)         FN (f+-)
  Actual –       FP (f-+)         TN (f--)

Key Terms in the Confusion Matrix:

  • True Positive (TP): Instances that are Actual + and Predicted +. These are correctly identified positives.
  • False Negative (FN): Instances that are Actual + but Predicted –. These are missed positives (also known as a Type II error).
  • False Positive (FP): Instances that are Actual – but Predicted +. These are falsely identified positives (also known as a Type I error).
  • True Negative (TN): Instances that are Actual – and Predicted –. These are correctly identified negatives.

Derived Totals:

  • Total Actual Positives (Np or P): Np = TP + FN
  • Total Actual Negatives (Nn or N): Nn = FP + TN
  • Total Instances (N_total): N_total = TP + FN + FP + TN
  • Correct Classifications: TP + TN
  • Errors: FP + FN
  • Perfect Classifier: FP = 0, FN = 0
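
As a minimal sketch of how these counts are obtained in practice (plain Python; the label lists are hypothetical, with 1 = positive and 0 = negative):

    # Tally TP, FN, FP, TN from paired actual/predicted labels.
    # These example lists are hypothetical; 1 = positive, 0 = negative.
    actual    = [1, 1, 0, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 1, 1, 0, 1, 0]

    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)

    print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")  # TP=3, FN=1, FP=1, TN=3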

B. Key Classification Metrics & Formulas

Each of these metrics is computed directly from the four confusion-matrix counts; together they give a fuller picture of classifier behavior than accuracy alone. A short code sketch after the list shows how to compute them.

  • Accuracy

    Fraction of correctly classified instances.

    Accuracy = (TP + TN) / (TP + FP + FN + TN)

  • True Positive Rate (TPR) / Recall / Sensitivity / Hit Rate

    Fraction of actual positives correctly predicted.

    TPR = TP / (TP + FN) = TP / Np

  • False Positive Rate (FPR)

    Fraction of actual negatives incorrectly predicted as positive.

    FPR = FP / (FP + TN) = FP / Nn

  • False Negative Rate (FNR)

    Fraction of actual positives incorrectly predicted as negative.

    FNR = FN / (TP + FN) = FN / Np

  • True Negative Rate (TNR) / Specificity

    Fraction of actual negatives correctly predicted.

    TNR = TN / (FP + TN) = TN / Nn

  • Precision

    Fraction of predicted positives that are actually positive.

    Precision = TP / (TP + FP)

  • F1 Score (Harmonic Mean of Precision & Recall)

    Overall measure of predictive performance, balancing Precision and Recall. A high F1 score indicates that both Precision and Recall are reasonably high.

    F1 = 2 * (Precision * Recall) / (Precision + Recall) = 2TP / (2TP + FP + FN)
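
The formulas above map directly onto code. A minimal sketch in plain Python (the function name compute_metrics is made up for illustration; ratios with a zero denominator are reported as None rather than raising an error):

    def compute_metrics(tp, fn, fp, tn):
        """Compute the metrics defined above from confusion-matrix counts."""
        def div(a, b):
            return a / b if b else None  # guard against division by zero
        return {
            "accuracy":  div(tp + tn, tp + fn + fp + tn),
            "tpr":       div(tp, tp + fn),   # recall / sensitivity
            "fpr":       div(fp, fp + tn),
            "fnr":       div(fn, tp + fn),
            "tnr":       div(tn, fp + tn),   # specificity
            "precision": div(tp, tp + fp),
            "f1":        div(2 * tp, 2 * tp + fp + fn),
        }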

C. Classifier Archetypes

These archetypes illustrate how different model behaviors impact evaluation metrics. (P = Total Actual Positives, N = Total Actual Negatives)

  • Case 1: Perfect Classifier

    A model that correctly classifies all instances.

    • Confusion Matrix (CM): TP=P, FN=0, FP=0, TN=N
    • Metrics: TPR=1, FPR=0, Precision=1, Accuracy=1, F1=1
  • Case 2: Worst Classifier

    A model where all instances are wrongly classified.

    • Confusion Matrix (CM): TP=0, FN=P, FP=N, TN=0
    • Metrics: TPR=0, FPR=1, Precision=0, Accuracy=0, F1=0 (Note: with P > 0 and N > 0 all of these are defined and equal 0; Precision is undefined if N = 0, and Recall is undefined if P = 0.)
  • Case 3: Ultra-Liberal Classifier (Always Predicts Positive)

    A model that always predicts the positive class.

    • Confusion Matrix (CM): TP=P, FN=0, FP=N, TN=0
    • Metrics: TPR=1, FPR=1, Precision=P/(P+N), Accuracy=P/(P+N), F1=2P/(2P+N)
    • Note: Accuracy is P/(P+N), which is only 0 if P=0.
  • Case 4: Ultra-Conservative Classifier (Always Predicts Negative)

    A model that always predicts the negative class.

    • Confusion Matrix (CM): TP=0, FN=P, FP=0, TN=N
    • Metrics: TPR=0, FPR=0, Precision=N/A (since TP+FP=0), Accuracy=N/(P+N)
    • Note: Accuracy is N/(P+N), which is only 0 if N=0.
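
Using the compute_metrics sketch from Section B, the four archetypes can be checked for any P > 0 and N > 0 (the totals below are hypothetical):

    P, N = 70, 144  # hypothetical totals of actual positives and negatives

    archetypes = {                         # (TP, FN, FP, TN)
        "perfect":            (P, 0, 0, N),
        "worst":              (0, P, N, 0),
        "ultra-liberal":      (P, 0, N, 0),  # always predicts positive
        "ultra-conservative": (0, P, 0, N),  # always predicts negative
    }

    for name, counts in archetypes.items():
        metrics = compute_metrics(*counts)
        print(name, {k: (round(v, 3) if v is not None else None)
                     for k, v in metrics.items()})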

D. Classification Example Data

Consider the following data for a classification task:

  • True Positives (TP): 52
  • False Negatives (FN): 18
  • False Positives (FP): 21
  • True Negatives (TN): 123
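
Plugging these counts into the formulas from Section B (for instance via the compute_metrics sketch above) gives, approximately:

    metrics = compute_metrics(tp=52, fn=18, fp=21, tn=123)
    # Expected (rounded): accuracy ≈ 0.818, tpr/recall ≈ 0.743, fpr ≈ 0.146,
    # fnr ≈ 0.257, tnr ≈ 0.854, precision ≈ 0.712, f1 ≈ 0.727
    print(metrics)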

Clustering Model Evaluation Metrics

Evaluating clustering models is distinct from classification, as it often involves assessing intrinsic data structures without ground truth labels.

A. Cluster Validity: Why Evaluate?

Evaluating cluster validity is essential for several reasons:

  • Avoid finding patterns in noise: Ensures that identified clusters represent meaningful structures, not random fluctuations.
  • Compare clustering algorithms: Allows for objective comparison of different algorithms on the same dataset.
  • Compare sets of clusters: Helps in comparing two different sets of clusters or individual clusters.

B. Types of Cluster Validity Measures

  • External Index

    Measures how well cluster labels match externally supplied class labels (ground truth).

    Example: Entropy (illustrated in the sketch after this list).

  • Internal Index

    Measures the “goodness” of a clustering structure without relying on external information.

    Example: Sum of Squared Error (SSE).

  • Relative Index

    Used to compare two different clusterings or clusters, often by applying an external or internal index.
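
As a concrete example of an external index, entropy measures how mixed each cluster is with respect to known class labels: compute the class distribution inside each cluster, take its entropy, and average the per-cluster entropies weighted by cluster size (0 means every cluster is pure). A minimal sketch, assuming hypothetical label lists:

    from collections import Counter
    from math import log2

    def clustering_entropy(cluster_labels, class_labels):
        """Size-weighted average entropy of the class mix inside each cluster.
        0 means every cluster contains a single class; higher means more mixing."""
        n = len(class_labels)
        total = 0.0
        for cluster in set(cluster_labels):
            members = [c for cl, c in zip(cluster_labels, class_labels) if cl == cluster]
            probs = [cnt / len(members) for cnt in Counter(members).values()]
            entropy = -sum(p * log2(p) for p in probs)
            total += (len(members) / n) * entropy
        return total

    # Hypothetical example: cluster 0 is pure, cluster 1 mixes two classes.
    print(clustering_entropy([0, 0, 0, 1, 1, 1], ["a", "a", "a", "a", "b", "b"]))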

C. Internal Measures for Clustering

  • Sum of Squared Errors (SSE) / Inertia

    Measures the compactness of clusters. A lower SSE generally indicates better clustering, as data points are closer to their respective cluster centroids.

    SSE = Σ_i Σ_{x ∈ C_i} ||x - m_i||²

    Where C_i is cluster i and m_i is its centroid (mean).

    SSE can also be used with the “elbow method” to estimate the optimal number of clusters (K); see the sketch after this list.

  • Cluster Cohesion & Separation

    These measures assess how well-defined and distinct clusters are.

    • Cohesion (Within-cluster Sum of Squares – WSS or SSE)

      Measures how closely related objects are within a cluster. It is the same as the SSE defined above.

      WSS = Σ_i Σ_{x ∈ C_i} (x - m_i)²

    • Separation (Between-cluster Sum of Squares – BSS)

      Measures how distinct or well-separated a cluster is from other clusters.

      BSS = Σ_i |C_i| (m - m_i)²

      Where |C_i| is the size of cluster i, m_i is the centroid of cluster i, and m is the overall mean of the dataset.

    Generally, a good clustering exhibits high cohesion (low WSS) and high separation (high BSS).

    Total Sum of Squares (TSS): TSS = WSS + BSS (TSS is constant for a given dataset).
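
To illustrate the elbow method mentioned above: run K-means for a range of K values, record the SSE for each, and look for the point where the decrease levels off. A minimal sketch, assuming scikit-learn is available (its KMeans exposes the SSE/WSS as the inertia_ attribute); the synthetic data here is made up for illustration:

    import numpy as np
    from sklearn.cluster import KMeans  # assumes scikit-learn is installed

    # Hypothetical 2-D data with three loose blobs; in practice X is your feature matrix.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc, 0.5, size=(30, 2))
                   for loc in ((0, 0), (4, 4), (0, 4))])

    # SSE (inertia) for K = 1..8; the "elbow" suggests a reasonable K.
    for k in range(1, 9):
        sse = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        print(k, round(sse, 2))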

D. Clustering Example Data

Consider the following data points and centroids for 3 clusters (C1, C2, C3) with features F1 and F2:

  • C1: (1,0), (1,1) → Centroid: (1, 0.5)
  • C2: (1,2), (2,3), (2,2), (1,2) → Centroid: (1.5, 2.25)
  • C3: (3,1), (3,3), (2,1) → Centroid: (2.67, 1.67)
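
A minimal sketch (plain Python) that recomputes the centroids from these points and checks cohesion and separation; it uses the exact means rather than the rounded centroids listed above:

    clusters = {
        "C1": [(1, 0), (1, 1)],
        "C2": [(1, 2), (2, 3), (2, 2), (1, 2)],
        "C3": [(3, 1), (3, 3), (2, 1)],
    }

    def mean(points):
        return tuple(sum(p[d] for p in points) / len(points) for d in range(2))

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    all_points = [p for pts in clusters.values() for p in pts]
    m = mean(all_points)  # overall mean of the dataset

    wss = sum(sq_dist(p, mean(pts)) for pts in clusters.values() for p in pts)
    bss = sum(len(pts) * sq_dist(mean(pts), m) for pts in clusters.values())
    tss = sum(sq_dist(p, m) for p in all_points)

    # WSS ≈ 5.583, BSS ≈ 7.972, TSS ≈ 13.556; TSS = WSS + BSS as expected.
    print(round(wss, 3), round(bss, 3), round(tss, 3))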