Scope Contracts and Task Boundaries

The model does not know where the work ends. A scope contract is a per-task file that says where the work begins, where it ends, and how to roll back if it spills. The contract turns "stay in scope" from a wish into a check.

Type: Build Languages: Python (stdlib) Prerequisites: Phase 14 · 32 (Minimal Workbench), Phase 14 · 33 (Rules as Constraints) Time: ~50 minutes

Learning Objectives

Write a scope contract that an agent reads at task start and a verifier reads at task end.
Specify allowed files, forbidden files, acceptance criteria, rollback plan, and approval boundaries.
Implement a scope checker that compares a diff against the contract and flags violations.
Make scope creep visible, automatic, and reviewable.

The Problem

Agents creep. The task is "fix the login bug." The diff touches the login route, the email helper, the database driver, the README, and the release script. Each touch had a plausible reason in the moment. Together they are a different change than the one that was reviewed.

Scope creep is the most under-monitored failure mode in agent work because the agent narrates each step in good faith. The fix is not a stricter prompt. The fix is a contract on disk that says what was promised and a check that compares the result against the promise.

The Concept

flowchart LR
  Task[Task] --> Contract[scope_contract.json]
  Contract --> Agent[Agent Loop]
  Agent --> Diff[final diff]
  Diff --> Checker[scope_checker.py]
  Contract --> Checker
  Checker --> Verdict{in scope?}
  Verdict -- yes --> Verify[Verification Gate]
  Verdict -- no --> Block[block + open question]

What goes in a scope contract

Field	Purpose
`task_id`	Links to the task on the board
`goal`	One sentence the reviewer can verify
`allowed_files`	Globs the agent may write
`forbidden_files`	Globs the agent must not touch even by accident
`acceptance_criteria`	Test commands or assertion lines that prove done
`rollback_plan`	One paragraph the operator can execute if a halt is required
`approvals_required`	Actions outside scope that need explicit human sign-off

A contract without forbidden_files is incomplete. The negative space is half the contract.

Globs, not raw paths

Real repos move files. Pin contracts to globs (app/**/*.py, tests/test_signup*.py) so a refactor between sessions does not invalidate the contract.

Rollback is part of scope

Listing how to roll back forces the contract author to think about what could go wrong. A contract you cannot roll back from is a contract that should not be approved.

Scope check is a diff check

The agent writes a diff. The checker reads the diff, the allowed globs, the forbidden globs, and a list of any acceptance commands that ran. Each violation is a tagged finding the verification gate can refuse.

Two altitudes of scope: the feature list and the task contract

The scope contract bounds one task. It does not bound the project. An agent can stay perfectly inside a contract for the login fix and still, on the next turn, decide the project also needs a settings page, a dark mode toggle, and a rewrite of the router. The contract was never asked which work was in scope for the project, only which files were in scope for the task.

That second altitude needs its own primitive: a feature_list.json the agent reads at session start. It is the project backlog as a machine-readable, ordered file. The agent picks exactly one feature whose status is todo, writes its id into the active scope contract, and is forbidden from starting a second feature in the same session. "One feature at a time" stops being a line in the prompt the agent can rationalize past and becomes a value it reads off disk and a check the gate enforces.

{
  "project": "knowledge-base",
  "active": "import-pdf",
  "features": [
    { "id": "import-pdf",   "status": "in_progress", "goal": "import a PDF into the library",        "done_when": "pytest tests/test_import.py && a sample PDF appears in the library view" },
    { "id": "full-text-search", "status": "todo",     "goal": "search document text and rank hits",   "done_when": "query returns ranked results with snippets" },
    { "id": "cite-answers", "status": "todo",         "goal": "answers carry source citations",        "done_when": "every answer renders at least one clickable citation" }
  ]
}

Field	Purpose
`active`	The single feature the current session may touch; empty means pick one and set it
`features[].id`	Stable slug the scope contract's `task_id` points at
`features[].status`	`todo`, `in_progress`, `done`, `blocked`; only one `in_progress` at a time
`features[].goal`	One sentence the reviewer can verify
`features[].done_when`	The acceptance line that flips `in_progress` to `done`

Two rules make the list load-bearing instead of decorative. First, the invariant "at most one in_progress" is itself a startup check (Phase 14 · 33): if the list shows two, the session refuses to start until a human resolves it. Second, the feature list is a file, not a chat message, because the chat scrolls out of context and the file persists across sessions and across agents. The handoff (Phase 14 · 40) writes the finished feature's status back to done so the next session opens to an accurate board instead of re-deriving what is left.

The contract and the list compose by least privilege, the same merge described below: the task contract's allowed_files must sit inside whatever the active feature touches, never outside it.

Build It

code/main.py implements:

scope_contract.json schema (subset of JSON Schema, glob arrays).
A diff parser that turns a list of touched files plus a list of run commands into a RunSummary.
A scope_check that returns (violations, in_scope, off_scope) against the contract.
Two demo runs: one that stays in scope, one that creeps. The checker flags the creep with the exact file and reason.

Run it:

python3 code/main.py

Output: the contract, the two runs, the per-run verdicts, and a saved scope_report.json.

Production patterns in the wild

A practitioner running "specsmaxxing" (scope contracts in YAML before invoking the agent) reports rabbit-hole rate dropped from 52% to 21% in three weeks without changing the agent. The contract did the work, not the model. Three patterns make the gain stick.

Violation budgets, not binary failures. agent-guardrails (the OSS merge gate used by Claude Code, Cursor, Windsurf, Codex via MCP) ships a violationBudget per task: minor scope slips within budget are surfaced as warnings; only when the budget is exceeded does the merge gate refuse. Pair with violationSeverity: "error" | "warning". The budget is the difference between a gate that ships and a gate that gets disabled by the team that hated it.

Severity asymmetry by path family. Off-scope writes to docs/** are usually warn; off-scope writes to scripts/**, migrations/**, config/prod/** are always block. This asymmetry has to live in the contract, not in the runtime, because it is project-specific and changes per task.

Time and network budgets next to file budgets. A time_budget_minutes field bounds the wall clock; the runtime refuses to continue past it without re-approval. A network_egress allowlist on hostnames prevents the agent from quietly hitting an external API that was not part of the task. These are scope dimensions too; the file globs are necessary, not sufficient.

Multi-contract merge semantics (least privilege). When two scope contracts apply (e.g., a project-wide contract plus a task-specific one), the merge is: intersect allowed_files (both contracts must permit the path), union forbidden_files (either can prohibit), time_budget_minutes is the most restrictive (min), approvals_required accumulates. network_egress is None for no enforcement, [] for deny-all, [...] as an allowlist; under merge, None defers to the other side, two lists intersect, and deny-all stays deny-all. State this in the contract schema so the merge is mechanical and reviewable.

Use It

Production patterns:

Claude Code slash commands. A /scope command writes the contract and pins it as session context. Subagents read the contract before acting.
GitHub PRs. Push the contract as a JSON file in the PR body or as a checked-in artifact. CI runs the scope checker against the merge diff.
LangGraph interrupts. A scope violation triggers an interrupt; the handler asks the human whether the contract needs to grow or the agent needs to back off.

The contract travels with the task. When the task closes, the contract is archived under outputs/scope/closed/.

Ship It

outputs/skill-scope-contract.md generates a scope contract for a task description and a glob-aware checker that runs in CI on every agent diff.

Exercises

Add a network_egress field listing allowed external hosts. Refuse runs that touch other hosts.
Extend the checker to fail soft on docs/** and hard on scripts/**. Justify the asymmetry.
Make the contract derive allowed_files from a goal field using a static rule set (no LLM). What goes wrong on the first edge case?
Add a time_budget_minutes and refuse to continue once the wall clock exceeds it.
Run two contracts against the same diff. What is the right merge semantics when both apply?

Key Terms

Term	What people say	What it actually means
Scope contract	"The task brief"	Per-task JSON listing allowed/forbidden files, acceptance, rollback
Scope creep	"It also touched..."	Files outside the contract changed in the same task
Rollback plan	"We can revert"	The one-paragraph operator runbook for halting
Approval boundary	"Needs sign-off"	An action listed in the contract as requiring explicit human approval
Diff check	"Path audit"	Comparing touched files against the contract globs

Scope Contracts and Task Boundaries

Scope Contracts and Task Boundaries

Learning Objectives

The Problem

The Concept

What goes in a scope contract

Globs, not raw paths

Rollback is part of scope

Scope check is a diff check

Two altitudes of scope: the feature list and the task contract

Build It

Production patterns in the wild

Use It

Ship It

Exercises

Key Terms

Further Reading

// EXTERNAL_RESOURCES