Machine Learning Model Evaluation: Classification & Clustering Metrics
Classification Model Evaluation Metrics
Understanding how to evaluate classification models is crucial for assessing their effectiveness. This section details key metrics derived from the confusion matrix.
A. Confusion Matrix for Binary Classification
The Confusion Matrix is a fundamental tool for evaluating the performance of a classification model, especially for binary classification (Positive (+) and Negative (-)).
| | Predicted + | Predicted – |
|---|---|---|
| Actual + | TP (f++) | FN (f+-) |
| Actual – | FP (f-+) | TN (f--) |
Key Terms in the Confusion Matrix:
- True Positive (TP): Instances that are Actual + and Predicted +. These are correctly identified positives.
- False Negative (FN): Instances that are Actual + but Predicted –. These are missed positives (also known as a Type II error).
- False Positive (FP): Instances that are Actual – but Predicted +. These are falsely identified positives (also known as a Type I error).
- True Negative (TN): Instances that are Actual – and Predicted –. These are correctly identified negatives.
Derived Totals:
- Total Actual Positives (Np or P):
Np = TP + FN
- Total Actual Negatives (Nn or N):
Nn = FP + TN
- Total Instances (N_total):
N_total = TP + FN + FP + TN
- Correct Classifications:
TP + TN
- Errors:
FP + FN
- Perfect Classifier:
FP = 0, FN = 0
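As an illustration of how these cells and totals are tallied in practice, here is a minimal Python sketch; the small actual/predicted label lists are hypothetical, chosen only so that every cell of the matrix is exercised.

```python
# Minimal sketch: tallying confusion-matrix cells from paired actual/predicted labels.
# The label lists below are made-up toy data for illustration only.
actual    = ["+", "+", "-", "+", "-", "-", "+", "-"]
predicted = ["+", "-", "-", "+", "+", "-", "+", "-"]

TP = sum(1 for a, p in zip(actual, predicted) if a == "+" and p == "+")
FN = sum(1 for a, p in zip(actual, predicted) if a == "+" and p == "-")
FP = sum(1 for a, p in zip(actual, predicted) if a == "-" and p == "+")
TN = sum(1 for a, p in zip(actual, predicted) if a == "-" and p == "-")

Np = TP + FN            # total actual positives
Nn = FP + TN            # total actual negatives
N_total = Np + Nn       # total instances

print(TP, FN, FP, TN)            # 3 1 1 3 for the toy labels above
print(N_total == len(actual))    # True
```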
B. Key Classification Metrics & Formulas
Understanding these metrics is crucial for a comprehensive evaluation of classification models.
Accuracy
Fraction of correctly classified instances.
Accuracy = (TP + TN) / (TP + FP + FN + TN)
True Positive Rate (TPR) / Recall / Sensitivity / Hit Rate
Fraction of actual positives correctly predicted.
TPR = TP / (TP + FN) = TP / Np
False Positive Rate (FPR)
Fraction of actual negatives incorrectly predicted as positive.
FPR = FP / (FP + TN) = FP / Nn
False Negative Rate (FNR)
Fraction of actual positives incorrectly predicted as negative.
FNR = FN / (TP + FN) = FN / Np
True Negative Rate (TNR) / Specificity
Fraction of actual negatives correctly predicted.
TNR = TN / (FP + TN) = TN / Nn
Precision
Fraction of predicted positives that are actually positive.
Precision = TP / (TP + FP)
F1 Score (Harmonic Mean of Precision & Recall)
Overall measure of predictive performance, balancing Precision and Recall. A high F1 score indicates that both Precision and Recall are reasonably high.
F1 = 2 * (Precision * Recall) / (Precision + Recall) = 2TP / (2TP + FP + FN)
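All of these metrics follow directly from the four confusion-matrix counts. The sketch below implements the formulas in this section as a single helper; the function name and return structure are our own choices.

```python
def classification_metrics(TP, FN, FP, TN):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    Np, Nn = TP + FN, FP + TN                  # actual positives / negatives
    accuracy  = (TP + TN) / (TP + FN + FP + TN)
    tpr       = TP / Np if Np else None        # recall / sensitivity (N/A if no positives)
    fnr       = FN / Np if Np else None
    fpr       = FP / Nn if Nn else None
    tnr       = TN / Nn if Nn else None        # specificity
    precision = TP / (TP + FP) if (TP + FP) else None
    f1        = 2 * TP / (2 * TP + FP + FN) if (2 * TP + FP + FN) else None
    return {"Accuracy": accuracy, "TPR": tpr, "FNR": fnr, "FPR": fpr,
            "TNR": tnr, "Precision": precision, "F1": f1}
```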
C. Classifier Archetypes
These archetypes illustrate how different model behaviors impact evaluation metrics. (P = Total Actual Positives, N = Total Actual Negatives)
Case 1: Perfect Classifier
A model that correctly classifies all instances.
- Confusion Matrix (CM):
TP=P, FN=0, FP=0, TN=N
- Metrics:
TPR=1, FPR=0, Precision=1, Accuracy=1, F1=1
Case 2: Worst Classifier
A model where all instances are wrongly classified.
- Confusion Matrix (CM):
TP=0, FN=P, FP=N, TN=0
- Metrics:
TPR=0, FPR=1, Precision=0, Accuracy=0, F1=0
(Note: F1, Precision, Recall can be N/A if P=0 or N=0, depending on the specific formula and context.)
Case 3: Ultra-Liberal Classifier (Always Predicts Positive)
A model that always predicts the positive class.
- Confusion Matrix (CM):
TP=P, FN=0, FP=N, TN=0
- Metrics:
TPR=1, FPR=1, Precision=P/(P+N), Accuracy=P/(P+N), F1=2P/(2P+N)
- Note: Accuracy is P/(P+N), which is 0 only if P=0.
Case 4: Ultra-Conservative Classifier (Always Predicts Negative)
A model that always predicts the negative class.
- Confusion Matrix (CM):
TP=0, FN=P, FP=0, TN=N
- Metrics:
TPR=0, FPR=0, Precision=N/A (since TP+FP=0), Accuracy=N/(P+N)
- Note: Accuracy is N/(P+N), which is 0 only if N=0.
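To see how these four archetypes play out numerically, the sketch below plugs their confusion-matrix patterns into the metric formulas for assumed illustrative totals of P = 70 actual positives and N = 30 actual negatives (arbitrary values chosen for the example).

```python
# Assumed illustrative totals: P actual positives, N actual negatives.
P, N = 70, 30

archetypes = {
    "Perfect":            dict(TP=P, FN=0, FP=0, TN=N),
    "Worst":              dict(TP=0, FN=P, FP=N, TN=0),
    "Ultra-liberal":      dict(TP=P, FN=0, FP=N, TN=0),   # always predicts +
    "Ultra-conservative": dict(TP=0, FN=P, FP=0, TN=N),   # always predicts -
}

for name, c in archetypes.items():
    TP, FN, FP, TN = c["TP"], c["FN"], c["FP"], c["TN"]
    acc  = (TP + TN) / (P + N)
    tpr  = TP / P
    fpr  = FP / N
    prec = TP / (TP + FP) if (TP + FP) else None          # N/A when nothing is predicted +
    print(f"{name:<18} Accuracy={acc:.2f} TPR={tpr:.0f} FPR={fpr:.0f} Precision={prec}")
```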
D. Classification Example Data
Consider the following data for a classification task:
- True Positives (TP): 52
- False Negatives (FN): 18
- False Positives (FP): 21
- True Negatives (TN): 123
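Plugging these counts into the formulas above gives, for example, Accuracy = (52 + 123) / 214 ≈ 0.818 and Precision = 52 / 73 ≈ 0.712. A minimal sketch of the computation:

```python
# Metrics for the example counts given above.
TP, FN, FP, TN = 52, 18, 21, 123

accuracy    = (TP + TN) / (TP + FN + FP + TN)   # 175 / 214 ≈ 0.818
recall      = TP / (TP + FN)                    # 52 / 70   ≈ 0.743  (TPR / sensitivity)
fpr         = FP / (FP + TN)                    # 21 / 144  ≈ 0.146
specificity = TN / (FP + TN)                    # 123 / 144 ≈ 0.854  (TNR)
precision   = TP / (TP + FP)                    # 52 / 73   ≈ 0.712
f1          = 2 * TP / (2 * TP + FP + FN)       # 104 / 143 ≈ 0.727

print(f"Accuracy={accuracy:.3f}  Recall={recall:.3f}  "
      f"Precision={precision:.3f}  F1={f1:.3f}")
```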
Clustering Model Evaluation Metrics
Evaluating clustering models is distinct from classification, as it often involves assessing intrinsic data structures without ground truth labels.
A. Cluster Validity: Why Evaluate?
Evaluating cluster validity is essential for several reasons:
- Avoid finding patterns in noise: Ensures that identified clusters represent meaningful structures, not random fluctuations.
- Compare clustering algorithms: Allows for objective comparison of different algorithms on the same dataset.
- Compare sets of clusters: Helps in comparing two different sets of clusters or individual clusters.
B. Types of Cluster Validity Measures
External Index
Measures how well cluster labels match externally supplied class labels (ground truth).
Example: Entropy.
Internal Index
Measures the “goodness” of a clustering structure without relying on external information.
Example: Sum of Squared Error (SSE).
Relative Index
Used to compare two different clusterings or clusters, often by applying an external or internal index.
C. Internal Measures for Clustering
Sum of Squared Errors (SSE) / Inertia
Measures the compactness of clusters. A lower SSE generally indicates better clustering, as data points are closer to their respective cluster centroids.
SSE = Σi Σx∈Ci ||x − mi||²
where Ci is cluster i and mi is its centroid (mean).
SSE can also be used with the “elbow method” to estimate the optimal number of clusters (K).
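As a sketch of the elbow method, the snippet below fits k-means for a range of K values and records the SSE of each fit (scikit-learn exposes this as the inertia_ attribute); the toy dataset X is hypothetical, generated only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D toy data with three loose groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2))
               for c in [(0, 0), (3, 0), (1.5, 3)]])

# Fit k-means for several K and record the SSE (inertia) of each fit.
sse = {}
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse[k] = km.inertia_

for k, v in sse.items():
    print(k, round(v, 2))
# Plotting SSE against K, the "elbow" (where the curve flattens out)
# suggests a reasonable number of clusters -- here, around K = 3.
```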
Cluster Cohesion & Separation
These measures assess how well-defined and distinct clusters are.
Cohesion (Within-cluster Sum of Squares – WSS or SSE)
Measures how closely related objects are within a cluster. It is the same as the SSE defined above.
WSS = Σi Σx∈Ci (x − mi)²
Separation (Between-cluster Sum of Squares – BSS)
Measures how distinct or well-separated a cluster is from other clusters.
BSS = Σi |Ci| (m − mi)²
where |Ci| is the size of cluster i, mi is the centroid of cluster i, and m is the overall mean of the dataset.
Generally, a good clustering exhibits high cohesion (low WSS) and high separation (high BSS).
Total Sum of Squares (TSS):
TSS = WSS + BSS
(TSS is constant for a given dataset).
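The identity TSS = WSS + BSS is easy to check numerically. The sketch below computes all three quantities for an arbitrary labelled dataset; the points and cluster assignments are hypothetical.

```python
import numpy as np

# Hypothetical points and an assumed cluster label for each point.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0],
              [2.0, 3.0], [3.0, 1.0], [3.0, 3.0]])
labels = np.array([0, 0, 1, 1, 2, 2])

m = X.mean(axis=0)                          # overall mean of the dataset
wss = bss = 0.0
for i in np.unique(labels):
    Ci = X[labels == i]                     # points in cluster i
    mi = Ci.mean(axis=0)                    # centroid of cluster i
    wss += np.sum((Ci - mi) ** 2)           # cohesion contribution
    bss += len(Ci) * np.sum((m - mi) ** 2)  # separation contribution

tss = np.sum((X - m) ** 2)                  # total sum of squares
print(round(wss, 4), round(bss, 4), round(tss, 4))
print(np.isclose(wss + bss, tss))           # True: TSS = WSS + BSS
```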
D. Clustering Example Data
Consider the following data points and centroids for 3 clusters (C1, C2, C3) with features F1 and F2:
- C1: (1,0), (1,1) → Centroid: (1, 0.5)
- C2: (1,2), (2,3), (2,2), (1,2) → Centroid: (1.5, 2.25)
- C3: (3,1), (3,3), (2,1) → Centroid: (2.67, 1.67)
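A quick sketch to verify the centroids above and compute the within-cluster SSE for this example (variable names are our own):

```python
import numpy as np

# Example clusters from above (features F1, F2).
clusters = {
    "C1": np.array([[1, 0], [1, 1]]),
    "C2": np.array([[1, 2], [2, 3], [2, 2], [1, 2]]),
    "C3": np.array([[3, 1], [3, 3], [2, 1]]),
}

total_sse = 0.0
for name, pts in clusters.items():
    centroid = pts.mean(axis=0)              # (1, 0.5), (1.5, 2.25), (2.67, 1.67)
    sse = np.sum((pts - centroid) ** 2)      # 0.5, 1.75, ~3.33 respectively
    total_sse += sse
    print(name, centroid, round(float(sse), 3))

print("Total SSE:", round(float(total_sse), 3))   # ≈ 5.583
```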