Control contamination.
A good score means little if the task has leaked, is duplicated, or appears in both train and eval. Contamination notes record why the evidence can be trusted.
Record
contamination record:
source: where the workflow pattern came from
version: environment, task, grader, and artifact versions
split: train, eval, holdout, or customer-private
overlap: public benchmark and prior package checks
isolation: customer, account, credential, and artifact boundary
redaction: sensitive fields removed or transformed
Controls
Record whether the workflow came from a real operator, mock app, public benchmark, or synthetic draft.
Keep train, eval, holdout, and customer-private packages separate by stable IDs and artifact paths.
Check whether the task resembles public benchmark tasks, copied examples, or previously published packages.
Prevent cross-customer artifact reuse unless the package is explicitly sanitized and licensed for reuse.
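Parts of the controls above can be checked mechanically. The following is a minimal sketch, not a definitive implementation: it assumes each package carries a contamination record with the six fields listed earlier, plus four hypothetical fields introduced only for illustration (package_id, artifact_path, customer, cleared_for_reuse). The function name find_violations and the sample values are also assumptions. The sketch flags a stable ID or artifact path that shows up in more than one split, and an artifact path shared by different customers without being cleared for reuse. It does not cover the resemblance check against public benchmarks, which needs a similarity measure that this section does not specify.

```python
from collections import defaultdict
from dataclasses import dataclass

ALLOWED_SPLITS = {"train", "eval", "holdout", "customer-private"}


@dataclass(frozen=True)
class ContaminationRecord:
    package_id: str      # hypothetical stable ID
    artifact_path: str   # hypothetical artifact location
    customer: str        # hypothetical owner, used for the isolation check
    source: str
    version: str
    split: str
    overlap: str
    isolation: str
    redaction: str
    cleared_for_reuse: bool = False  # hypothetical: sanitized and licensed


def find_violations(records):
    """Return readable descriptions of split and customer boundary violations."""
    violations = []
    splits_by_id = defaultdict(set)
    records_by_path = defaultdict(list)
    for r in records:
        if r.split not in ALLOWED_SPLITS:
            violations.append(f"{r.package_id}: unknown split {r.split!r}")
        splits_by_id[r.package_id].add(r.split)
        records_by_path[r.artifact_path].append(r)

    # A stable ID must live in exactly one split.
    for package_id, splits in sorted(splits_by_id.items()):
        if len(splits) > 1:
            violations.append(f"{package_id}: appears in splits {sorted(splits)}")

    # An artifact path must not be shared across splits, nor across
    # customers unless every record using it is cleared for reuse.
    for path, group in sorted(records_by_path.items()):
        splits = {r.split for r in group}
        if len(splits) > 1:
            violations.append(f"{path}: shared by splits {sorted(splits)}")
        customers = {r.customer for r in group}
        if len(customers) > 1 and not all(r.cleared_for_reuse for r in group):
            violations.append(f"{path}: reused across customers {sorted(customers)}")
    return violations


# Illustrative values only; none of these come from a real package.
base = dict(
    source="real operator",
    version="env-1/task-1/grader-1/artifact-1",
    overlap="no public benchmark match found",
    isolation="per-customer credentials",
    redaction="emails removed",
)
records = [
    ContaminationRecord("pkg-001", "train/pkg-001", "customer-a", split="train", **base),
    ContaminationRecord("pkg-001", "eval/pkg-001", "customer-a", split="eval", **base),
    ContaminationRecord("pkg-002", "shared/pkg-002", "customer-a", split="holdout", **base),
    ContaminationRecord("pkg-003", "shared/pkg-002", "customer-b", split="holdout", **base),
]
violations = find_violations(records)
for line in violations:
    print(line)
```

Returning a list of messages rather than raising on the first problem lets a reviewer see every boundary issue in one pass.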
Review story
The contamination story should be short and inspectable. A reviewer should know what the package can be used for, what it must not be mixed with, and which assumptions about its source remain unverified.
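One way to keep the story short is to generate it from the record instead of writing it by hand. The sketch below is an assumption about how that could look, not a prescribed format: the function name review_story, the wording of the output lines, and the optional assumptions field are all hypothetical. It refuses to produce a story when any of the six record fields is empty, so that a gap surfaces before review.

```python
ALL_SPLITS = ("train", "eval", "holdout", "customer-private")
REQUIRED_FIELDS = ("source", "version", "split", "overlap", "isolation", "redaction")


def review_story(record):
    """Render a four-line story from a contamination record given as a dict."""
    missing = [name for name in REQUIRED_FIELDS if not record.get(name)]
    if missing:
        raise ValueError("contamination record is missing: " + ", ".join(missing))
    if record["split"] not in ALL_SPLITS:
        raise ValueError(f"unknown split: {record['split']!r}")

    others = [s for s in ALL_SPLITS if s != record["split"]]
    # "assumptions" is a hypothetical optional field listing open source assumptions.
    assumptions = record.get("assumptions") or ["none recorded"]
    return "\n".join([
        f"Use for: {record['split']} only.",
        f"Do not mix with: {', '.join(others)}.",
        f"Source: {record['source']} ({record['version']}).",
        f"Open source assumptions: {'; '.join(assumptions)}.",
    ])


# Illustrative values only.
example = {
    "source": "mock app",
    "version": "env-2/task-5/grader-1/artifact-3",
    "split": "eval",
    "overlap": "checked against prior packages",
    "isolation": "single test account",
    "redaction": "account names replaced",
    "assumptions": ["mock app mirrors the production workflow"],
}
story = review_story(example)
print(story)
```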