Create a task.
A task is the unit a model attempts. It should preserve the real workflow goal while being constrained enough for a grader to score.
Task shape
task:
prompt: user-visible goal
environment: runtime + seed
constraints: forbidden mutations and required checks
expected_state: inspectable success condition
grader_inputs: state, trace, screenshot, artifact hooks
difficulty_target: not too easy, not too sparse Write the prompt
Name the user goal
Tell the model what work to complete, not how the internal verifier is implemented.
Include constraints
State what must not change, what must be saved, and which shortcuts are invalid.
Avoid hidden ambiguity
If two final states could both look reasonable, split the task or make the expected state explicit.
Difficulty target
The useful middle is where strong models fail in repeated patterns but still reach meaningful intermediate states. Too easy gives little learning signal. Too sparse gives the grader almost nothing to reward.