ML Metadata Store

The metadata store is a system that tracks the lineage and provenance of models and data in production. It typically logs details like which dataset version and training code commit produced a given model, what hyperparameters were used, evaluation metrics, and environment details. Essentially, it’s an audit trail and catalog for models and experiments, crucial for compliance, governance, and debugging.

Attack surfaces

  • Lineage/metrics tampering (inflate metrics to promote weak/trojaned models).
  • Provenance spoofing (fake dataset or code commit IDs linked to a malicious artifact).
  • Unauthorized reads (leak of sensitive eval traces, embeddings, or prompts).
  • Metadata injection (poisoned notes/config copied by downstream jobs).