Model Validation
After training, a model must be evaluated to ensure it performs well and is robust. This stage involves testing the model on the validation set (and possibly a separate test set) to measure metrics such as accuracy and precision/recall, and to check for overfitting. Increasingly, teams also perform robustness evaluations at this stage, such as adversarial testing (checking how the model handles perturbed inputs) and fairness assessments.
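As a minimal sketch of what this stage computes (assuming a scikit-learn-style classifier and hypothetical X_train/y_train, X_val/y_val splits), the core checks can be expressed as a few metric calls plus a train-versus-validation gap test for overfitting; the threshold below is illustrative, not a recommendation:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

def validate(model, X_train, y_train, X_val, y_val, gap_threshold=0.05):
    """Compute basic validation metrics and flag a large train/validation gap."""
    val_pred = model.predict(X_val)
    metrics = {
        "accuracy": accuracy_score(y_val, val_pred),
        "precision": precision_score(y_val, val_pred, average="macro"),
        "recall": recall_score(y_val, val_pred, average="macro"),
    }
    # Overfitting check: a large gap between train and validation accuracy
    # suggests the model has memorized the training data.
    train_acc = accuracy_score(y_train, model.predict(X_train))
    metrics["overfit_gap"] = train_acc - metrics["accuracy"]
    metrics["overfit_flag"] = metrics["overfit_gap"] > gap_threshold
    return metrics
```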
Attack surfaces
- Validation-Set Poisoning: Adversaries corrupt samples in validation/test sets, skewing evaluation.
Ex: A malicious sample causes a backdoored model to outperform a clean one during validation, leading to its promotion. (An integrity-check sketch follows this list.)
- Overfitting Leakage in Reports: Improper logging may expose sensitive training data.
Ex: Debug outputs include verbatim text from the training set, leaking PII into validation logs (see the log-scanning sketch below).
- Skipped Adversarial Testing: Absence of adversarial robustness checks leaves blind spots.
Ex: The model performs well on clean test data but collapses under simple Fast Gradient Sign Method (FGSM) perturbations, which attackers later exploit in production (see the adversarial-evaluation sketch below).
- Tampered Metrics Pipeline: Evaluation scripts are altered to inflate reported accuracy.
Ex: An insider modifies the metrics script so that 10% of validation failures are silently ignored; the integrity-check sketch below covers evaluation code as well as data.
- Advanced Adversarial Stress Tests: Models may be vulnerable to gradient-based or black-box adversarial attacks.
Ex: During testing, Projected Gradient Descent (PGD) attacks reveal that 90% of adversarial samples are misclassified, a clear red flag that must not be ignored.
- Fairness & Bias Exploits: Attackers can exploit untested bias in decision systems.
Ex: A facial recognition model that passes basic accuracy checks is later revealed to have poor accuracy on darker skin tones, enabling adversarial impersonation attacks (see the per-group accuracy sketch below).
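To reduce the risk of validation-set poisoning and a tampered metrics pipeline, one common mitigation is to pin both the evaluation data and the evaluation code to known-good hashes recorded in a reviewed manifest. A minimal sketch, assuming a hypothetical manifest.json that maps file paths to SHA-256 digests:

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large validation sets do not exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_manifest(manifest_path: str = "manifest.json") -> None:
    """Fail closed if any validation file or metrics script differs from the manifest."""
    expected = json.loads(Path(manifest_path).read_text())
    for rel_path, expected_hash in expected.items():
        actual = sha256_of(Path(rel_path))
        if actual != expected_hash:
            raise RuntimeError(f"Integrity check failed for {rel_path}: {actual}")

# Typically run at the start of the evaluation job, before any metric is computed:
# verify_manifest()
```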
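For the leakage risk, evaluation logs can be scanned (or better, redacted) before they leave the training environment. A rough sketch using simple regular expressions; the patterns here are illustrative only, and a production deployment would use a vetted PII-scanning library:

```python
import re

# Illustrative patterns only; real scanners cover many more PII categories.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\+?\d[\d\s().-]{8,}\d\b"),
}

def scan_log_line(line: str) -> list[str]:
    """Return the names of PII patterns found in a single log line."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(line)]

def redact(line: str) -> str:
    """Replace matched PII with a placeholder before the line is persisted."""
    for name, pattern in PII_PATTERNS.items():
        line = pattern.sub(f"[REDACTED-{name.upper()}]", line)
    return line
```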
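To close the adversarial-testing blind spot, a basic FGSM/PGD evaluation can be wired into the validation job itself. A sketch in PyTorch, assuming image-like inputs scaled to [0, 1] and a model already in eval() mode; eps, alpha, and the step count are illustrative defaults, not recommendations:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """One-step Fast Gradient Sign Method perturbation."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x_adv + eps * grad.sign()).clamp(0, 1).detach()

def pgd(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Projected Gradient Descent: iterated FGSM projected back into the eps-ball."""
    x = x.clone().detach()
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

@torch.no_grad()
def robust_accuracy(model, x_adv, y):
    """Fraction of (adversarial) samples the model still classifies correctly."""
    return (model(x_adv).argmax(dim=1) == y).float().mean().item()

# Usage (hypothetical tensors x_val, y_val):
# clean_acc = robust_accuracy(model, x_val, y_val)
# adv_acc = robust_accuracy(model, pgd(model, x_val, y_val), y_val)
```

Reporting clean and adversarial accuracy side by side makes a collapse like the 90% misclassification rate in the PGD example impossible to overlook.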
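Finally, the same validation job can report accuracy per protected group rather than a single aggregate number, so that disparities like the facial-recognition example above surface before release. A minimal sketch, assuming a hypothetical groups array carrying the protected attribute for each validation sample:

```python
import numpy as np

def per_group_accuracy(y_true, y_pred, groups):
    """Break validation accuracy down by a protected attribute (e.g. skin-tone bucket)."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    report = {}
    for g in np.unique(groups):
        mask = groups == g
        report[g] = float((y_true[mask] == y_pred[mask]).mean())
    # A large spread between the best- and worst-performing group is a release blocker.
    report["max_gap"] = max(report.values()) - min(report.values())
    return report
```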