ML Model Evaluation Metrics — Complete Guide
This tutorial covers evaluation metrics for machine learning models: the key concepts, worked scikit-learn examples, and guidance on choosing the right metric for a given problem.
Model evaluation metrics are quantitative measures of how well a machine learning model performs on unseen data. They help you choose the best model for your specific use case.
What You'll Learn
How to evaluate classification and regression models using the right metrics, avoid common pitfalls such as the accuracy paradox, and interpret confusion matrices and ROC curves.
Why It Matters
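A model with 95% accuracy can be completely useless for a fraud detection system where only 1% of transactions are fraudulent: a model that labels every transaction as legitimate would score 99% accuracy while catching no fraud at all. Choosing the wrong metric leads to deploying models that fail in production.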
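The arithmetic behind this pitfall can be checked with a few lines of plain Python. The sketch below assumes a hypothetical set of 1,000 transactions with a 1% fraud rate and a "model" that never flags anything; the data and variable names are illustrative only.

```python
# Hypothetical data: 1,000 transactions, 1% of them fraudulent (label 1).
y_true = [1] * 10 + [0] * 990

# A "model" that never flags fraud: it predicts 0 for every transaction.
y_pred = [0] * len(y_true)

# Accuracy: share of predictions that match the true label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall: share of actual fraud cases the model caught.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_positives = sum(y_true)
recall = true_positives / actual_positives

print(f"Accuracy: {accuracy:.2%}")  # Accuracy: 99.00%
print(f"Recall:   {recall:.2%}")    # Recall:   0.00%
```

The accuracy looks excellent, yet the model has no value for fraud detection, which is why recall and precision are usually reported alongside it on imbalanced data.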
Real-World Use
Durga Antivirus Pro uses precision as its primary metric for malware detection because a false positive (flagging a safe file) frustrates users more than a false negative (missing a threat that other layers catch).
Evaluation Metrics Overview
flowchart TD
A[Model Evaluation] --> B[Classification]
A --> C[Regression]
B --> D[Accuracy]
B --> E["Precision / Recall"]
B --> F[F1-Score]
B --> G[ROC-AUC]
B --> H[Confusion Matrix]
C --> I["MAE / MSE / RMSE"]
C --> J[R-Squared]
C --> K[Adjusted R-Squared]
Confusion Matrix & Classification Metrics
from sklearn.metrics import confusion_matrix, classification_report
import numpy as np
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])
cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:")
print(cm)
print("\nClassification Report:")
print(classification_report(y_true, y_pred, target_names=['Not Fraud', 'Fraud']))
Expected output:
Confusion Matrix:
[[4 1]
 [1 4]]

Classification Report:
              precision    recall  f1-score   support

   Not Fraud       0.80      0.80      0.80         5
       Fraud       0.80      0.80      0.80         5

    accuracy                           0.80        10
   macro avg       0.80      0.80      0.80        10
weighted avg       0.80      0.80      0.80        10
Precision answers "how many predicted positives are correct?" Recall answers "how many actual positives did we catch?"
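To make these two definitions concrete, here is a minimal sketch that recomputes the "Fraud" row of the report by hand, using the same labels as the confusion-matrix example. It uses only plain Python; the variable names (tp, fp, fn, tn) are our own and not part of scikit-learn.

```python
# Same labels as the confusion-matrix example (1 = Fraud, 0 = Not Fraud).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # fraud, flagged as fraud
fp = sum(t == 0 and p == 1 for t, p in pairs)  # not fraud, flagged as fraud
fn = sum(t == 1 and p == 0 for t, p in pairs)  # fraud, missed
tn = sum(t == 0 and p == 0 for t, p in pairs)  # not fraud, left alone

precision = tp / (tp + fp)  # of the predicted positives, how many are correct?
recall = tp / (tp + fn)     # of the actual positives, how many did we catch?
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")  # TP=4 FP=1 FN=1 TN=4
print(f"Precision: {precision:.2f}")       # Precision: 0.80
print(f"Recall:    {recall:.2f}")          # Recall:    0.80
print(f"F1-score:  {f1:.2f}")              # F1-score:  0.80
```

This sketch does not guard against division by zero, which would occur if the model predicted no positives or the data contained none.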
ROC-AUC Curve
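import numpy as np
from sklearn.metrics import roc_curve, auc
# True labels (1 = positive) and the model's predicted scores for the positive class
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 0, 1])
y_scores = np.array([0.1, 0.2, 0.8, 0.7, 0.3, 0.9, 0.4, 0.6, 0.2, 0.85])
# Sweep the decision threshold to get false-positive and true-positive rates
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
# Area under the ROC curve
roc_auc = auc(fpr, tpr)
print(f"ROC-AUC Score: {roc_auc:.3f}")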
Expected output:
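ROC-AUC Score: 1.000
An AUC of 1.0 means every positive example received a higher score than every negative example; in this toy data the lowest positive score (0.6) is above the highest negative score (0.4). In general, AUC is the probability that the model ranks a randomly chosen positive above a randomly chosen negative: 0.5 is no better than random, and 1.0 is perfect.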
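The ranking interpretation of AUC can be illustrated by counting pairs directly. The sketch below is a plain-Python illustration, not scikit-learn's implementation: pairwise_auc is a hypothetical helper, and the labels and scores are made-up data in which one positive is scored below one negative. Ties are counted as half a win, which is the usual convention.

```python
def pairwise_auc(y_true, y_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for t, s in zip(y_true, y_scores) if t == 1]
    neg = [s for t, s in zip(y_true, y_scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

# Hypothetical data: the positive scored 0.35 falls below the negative scored 0.7.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_scores = [0.9, 0.8, 0.35, 0.7, 0.3, 0.2, 0.1]

# 11 of the 12 positive-negative pairs are ordered correctly.
print(f"Pairwise AUC: {pairwise_auc(y_true, y_scores):.3f}")  # Pairwise AUC: 0.917
```

The double loop is quadratic in the number of samples, so it is only suitable for small illustrations; for real data, the roc_curve and auc calls shown earlier are the practical choice.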
Regression Metrics
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np
y_true = np.array([3.0, 5.0, 2.5, 7.0, 8.0])
y_pred = np.array([2.8, 5.2, 2.8, 6.5, 8.3])
mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)
print(f"MAE: {mae:.3f}")
print(f"MSE: {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
print(f"R2: {r2:.3f}")
Expected output:
MAE: 0.300
MSE: 0.102
RMSE: 0.319
R2: 0.978
An R-squared of 0.978 means the model explains 97.8% of the variance in the target. RMSE penalizes large errors more than MAE because errors are squared before averaging.
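To show where the "variance explained" reading comes from, here is a minimal plain-Python sketch of the standard formula R² = 1 - SS_res / SS_tot, along with the adjusted R-squared listed in the overview. The helper names, the small data set, and the choice of one feature are all assumptions made for illustration.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n_samples, n_features):
    """Penalize R^2 for the number of features used by the model."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Hypothetical targets and predictions.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.5, 2.0, 3.0, 4.0, 4.5]

r2 = r_squared(y_true, y_pred)
# Always predicting the mean (3.0) explains none of the variance.
baseline = r_squared(y_true, [3.0] * len(y_true))
# Assumes, for illustration, that the model used a single feature.
adj_r2 = adjusted_r_squared(r2, n_samples=len(y_true), n_features=1)

print(f"R2:          {r2:.3f}")        # R2:          0.950
print(f"Baseline R2: {baseline:.3f}")  # Baseline R2: 0.000
print(f"Adjusted R2: {adj_r2:.3f}")    # Adjusted R2: 0.933
```

Adjusted R-squared is always at most R-squared, and the gap grows as features are added without a matching improvement in fit.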
When to Use Which Metric
- Imbalanced classes: Use precision, recall, and F1-score instead of accuracy
- Balanced classes with equal FP/FN cost: Use accuracy or F1-score
- Ranking quality: Use ROC-AUC
- Fraud detection: High recall (catch as much fraud as possible), even at cost of precision
- Regression with outliers: Use MAE (less sensitive to outliers than MSE)
- Regression without outliers: Use RMSE (penalizes large errors more)
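The MAE versus RMSE guidance in the last two bullets can be seen with a small sketch. The two error lists below are hypothetical: both have the same total absolute error, but one concentrates it in a single outlier. The helpers mae and rmse are our own and take prediction errors directly rather than labels and predictions.

```python
import math

def mae(errors):
    """Mean absolute error over a list of prediction errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error over a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical errors: same total absolute error, distributed differently.
even_errors = [1.0, 1.0, 1.0, 1.0]   # every prediction is off by 1
one_outlier = [0.0, 0.0, 0.0, 4.0]   # three perfect predictions, one off by 4

print(f"Even errors: MAE={mae(even_errors):.1f} RMSE={rmse(even_errors):.1f}")
# Even errors: MAE=1.0 RMSE=1.0
print(f"One outlier: MAE={mae(one_outlier):.1f} RMSE={rmse(one_outlier):.1f}")
# One outlier: MAE=1.0 RMSE=2.0
```

MAE cannot tell the two cases apart, while RMSE doubles for the case with the outlier. Which behavior is preferable depends on whether a single large miss is genuinely worse for your application than several small ones.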
Practice Questions
- Why can accuracy be misleading for imbalanced datasets?
- What is the difference between precision and recall?
- When would you choose MAE over RMSE for regression evaluation?
Related Topics
- Python — running the evaluation code
- scikit-learn Guide — provides all these metrics
- What is Machine Learning — foundational concepts
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.