// PHASE 14 · DIFFICULTY: intermediate · 50 MINUTE LESSON

Agent Instructions as Executable Constraints

Instructions written as prose are wishes. Instructions written as constraints are tests. The workbench turns each rule into something an agent can check at runtime and a reviewer can verify after the fact.

Type: Build
Languages: Python (stdlib)
Prerequisites: Phase 14 · 32 (Minimal Workbench)
Time: ~50 minutes

Learning Objectives

  • Separate routing prose from operational rules.
  • Express startup rules, forbidden actions, definition of done, uncertainty handling, and approval boundaries as machine-checkable constraints.
  • Implement a rule checker that scores a run against the rule set.
  • Make the rule set diff-friendly so review can see what changed.

The Problem

A typical AGENTS.md reads like onboarding documentation. It tells the agent to "be careful" and "test thoroughly" and "ask if unsure." Three days later, the agent ships a change with no tests, writes to a forbidden directory, and never asks because it never knew where the line was.

Instructions are powerful when they are operational and weak when they are aspirational. The fix is to write rules the workbench can interpret and the reviewer can score.

The Concept

Rules belong in docs/agent-rules.md, away from the short root router. Each rule has a name, a category, and a check.

flowchart LR
  Router[AGENTS.md] --> Rules[docs/agent-rules.md]
  Rules --> Checker[rule_checker.py]
  Checker --> Report[rule_report.json]
  Report --> Reviewer[Reviewer]

Five categories that cover most rules

| Category | Question the rule answers | Example |
| --- | --- | --- |
| Startup | What must be true before work begins? | "state file exists and is fresh" |
| Forbidden | What must never happen? | "do not edit scripts/release.sh" |
| Definition of done | What proves the task is complete? | "pytest exits 0 and acceptance line passes" |
| Uncertainty | What does the agent do when unsure? | "open a question note instead of guessing" |
| Approval | What requires human approval? | "any new dependency, any prod write" |

A rule that does not fit one of these five usually wants to be two rules. Force the split.
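To make the five categories concrete, here is a minimal sketch with one check per category, using the table's own examples. The `RunTrace` record and all of its field names are hypothetical, invented for illustration; the lesson's code/main.py may represent a run differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class RunTrace:
    """Hypothetical record of one agent run; every field name is an assumption."""
    state_file_age_s: float                       # seconds since the state file was written
    edited_paths: List[str] = field(default_factory=list)
    pytest_exit_code: Optional[int] = None        # None means the tests never ran
    open_unknowns: int = 0                        # things the agent was unsure about
    question_notes: int = 0                       # notes opened instead of guessing
    new_dependencies: List[str] = field(default_factory=list)
    approved: Set[str] = field(default_factory=set)


def startup_state_fresh(t: RunTrace, max_age_s: float = 3600) -> bool:
    # Startup: the state file must be recent (one hour is an arbitrary threshold).
    return t.state_file_age_s <= max_age_s


def forbidden_release_script(t: RunTrace) -> bool:
    # Forbidden: scripts/release.sh must never be edited.
    return "scripts/release.sh" not in t.edited_paths


def done_pytest_green(t: RunTrace) -> bool:
    # Definition of done: pytest ran and exited 0.
    return t.pytest_exit_code == 0


def uncertainty_note_opened(t: RunTrace) -> bool:
    # Uncertainty: every unknown has a question note behind it.
    return t.question_notes >= t.open_unknowns


def approval_new_dependency(t: RunTrace) -> bool:
    # Approval: each new dependency appears in the approved set.
    return all(dep in t.approved for dep in t.new_dependencies)
```

Each function returns a plain boolean, so a rule that cannot be written this way is a hint that it is still aspirational or that it is really two rules.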

Rules are machine-readable

Each rule has a slug, a category, a one-line description, and a check field that names a function in rule_checker.py. Adding a rule means adding a check; the checker grows with the workbench.
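One way to model this, sketched under assumptions: a frozen dataclass for the rule and a name-to-function registry for the checks. The `check` decorator, the `CHECKS` dict, and the dict-shaped run are illustrative choices, not necessarily what the lesson's rule_checker.py does.

```python
from dataclasses import dataclass
from typing import Callable, Dict

CATEGORIES = ("startup", "forbidden", "done", "uncertainty", "approval")


@dataclass(frozen=True)
class Rule:
    slug: str
    category: str
    description: str
    check: str  # name of a registered check function

    def __post_init__(self) -> None:
        # Reject anything outside the five categories at load time.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category {self.category!r} for rule {self.slug!r}")


# Registry mapping check names to functions (hypothetical layout).
CHECKS: Dict[str, Callable[[dict], bool]] = {}


def check(fn: Callable[[dict], bool]) -> Callable[[dict], bool]:
    """Register a check under its function name."""
    CHECKS[fn.__name__] = fn
    return fn


@check
def no_release_script_edit(run: dict) -> bool:
    return "scripts/release.sh" not in run.get("edited_paths", [])


def evaluate(rule: Rule, run: dict) -> bool:
    """Look up the rule's check by name and run it against a run record."""
    fn = CHECKS.get(rule.check)
    if fn is None:
        raise KeyError(f"rule {rule.slug!r} names unknown check {rule.check!r}")
    return fn(run)
```

Failing loudly on an unknown check name keeps a rule from passing silently just because nobody wrote its check.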

Rules are diff-friendly

Rules live one per heading in a single markdown file. Renames are visible in diffs. New rules sit at the top of their category. Stale rules get deleted, not commented out, because the workbench is the source of truth, not the chat log of how the team felt last quarter.
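The lesson does not pin down the exact markdown layout, so the sketch below assumes one: a `## ` heading per category, a `### ` heading per rule slug, and `- key: value` bullets for the fields. Under that assumed layout a line-oriented parser is enough, and a rename shows up in a diff as a one-line heading change.

```python
import re

# Assumed layout, not the lesson's confirmed format.
SAMPLE = """\
## Forbidden

### no-release-edit
- description: do not edit scripts/release.sh
- check: no_release_script_edit

## Definition of done

### tests-pass
- description: pytest exits 0 and acceptance line passes
- check: tests_pass
"""

FIELD = re.compile(r"-\s*(\w+):\s*(.+)")


def parse_rules(text: str) -> list:
    """Parse the assumed one-rule-per-heading layout into a list of dicts."""
    rules, category, current = [], None, None
    for line in text.splitlines():
        if line.startswith("### "):
            if category is None:
                raise ValueError(f"rule {line[4:].strip()!r} appears before any category")
            current = {"slug": line[4:].strip(), "category": category}
            rules.append(current)
        elif line.startswith("## "):
            category, current = line[3:].strip().lower(), None
        elif current is not None:
            match = FIELD.match(line.strip())
            if match:
                current[match.group(1)] = match.group(2).strip()
    return rules


rules = parse_rules(SAMPLE)
print([r["slug"] for r in rules])  # ['no-release-edit', 'tests-pass']
```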

Rules versus framework guardrails

Framework guardrails (OpenAI Agents SDK guardrails, LangGraph interrupts) enforce rules at the runtime level. The rule set in this lesson is the human-readable, reviewable contract that those guardrails implement. You need both: the runtime catches violations during a turn, the rule set proves the runtime is doing the right thing.

Progressive disclosure: a map, not an encyclopedia

The reason AGENTS.md keeps growing is that every incident adds a rule and no incident removes one. A year in, the file is two thousand lines, and the agent reads the first screen, runs out of attention budget, and acts on a fraction of what it was told. A giant instruction file fails for the same reason a forty-page onboarding doc fails: the reader skims it once and never returns to the part that mattered.

The fix is not a shorter file. It is a layered one. The root router stays small enough to read every session and holds nothing but pointers. The depth lives in topic files the agent loads only when the task touches them. Give the agent a map, not the whole encyclopedia, and let it walk to the page it needs.

AGENTS.md                  # router, < 50 lines: what this repo is, where to look, the 5 hard rules
docs/
  agent-rules.md           # the full rule set (this lesson)
  architecture.md          # loaded when the task touches module boundaries
  testing.md               # loaded when the task writes or runs tests
  deploy.md                # loaded only for release work, gated behind an approval rule
feature_list.json          # the backlog (Phase 14 · 36)

| Tier | Lives in | Read when | Size budget |
| --- | --- | --- | --- |
| Router | AGENTS.md | Every session, always | Under ~50 lines |
| Rules | docs/agent-rules.md | Every session, on startup | One screen per category |
| Topic docs | docs/<topic>.md | Only when the task touches that topic | As deep as needed |

Two tests keep the layering honest. The reachability test: an agent should reach any rule in at most two hops from the router, so the router must link every topic doc by path, not describe it in prose. The freshness test: the router is short enough that a reviewer rereads it on every PR, which is the only thing that stops it from silently growing back into the encyclopedia it replaced. A pointer that no longer resolves is a worse failure than a missing rule, so a broken link in the router is itself a startup-check violation.
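Both tests can be approximated mechanically. The sketch below assumes the router uses standard markdown links of the form `[text](path)` and treats the 50-line figure as a hard limit; `check_router` is a hypothetical helper name. It reports a router that has outgrown its budget and any relative link that no longer resolves.

```python
import re
import tempfile
from pathlib import Path

# Captures the target of a markdown link, stopping at ')', '#', or whitespace.
LINK = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")


def check_router(router: Path, max_lines: int = 50) -> list:
    """Return a list of problems found in the router file (empty means clean)."""
    text = router.read_text(encoding="utf-8")
    problems = []
    line_count = len(text.splitlines())
    if line_count >= max_lines:
        problems.append(f"router has {line_count} lines, budget is under {max_lines}")
    for target in LINK.findall(text):
        if "://" in target:
            continue  # external URLs are out of scope for this check
        if not (router.parent / target).exists():
            problems.append(f"broken link: {target}")
    return problems


# Demo in a throwaway directory: one link resolves, one does not.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "docs" / "agent-rules.md").write_text("# rules\n", encoding="utf-8")
    router = root / "AGENTS.md"
    router.write_text(
        "- [rules](docs/agent-rules.md)\n- [deploy](docs/deploy.md)\n",
        encoding="utf-8",
    )
    problems = check_router(router)

print(problems)  # ['broken link: docs/deploy.md']
```

Registered as a startup check, a non-empty result would stop the session before the agent follows a pointer to nowhere.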

Build It

code/main.py ships:

  • An agent-rules.md parser that loads rules into a dataclass.
  • rule_checker.py-style checker functions, one per check reference.
  • A demo agent run that violates two rules and a check pass that catches them.

Run it:

python3 code/main.py

Output: parsed rule set, run trace, pass/fail per rule, and a rule_report.json saved next to the script.
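For orientation, here is a compressed sketch of the same flow: a run that violates two rules, a scoring pass, and a JSON report. It is not the shipped code/main.py; the rule dicts, check names, and report keys are assumptions chosen for illustration.

```python
import json
from pathlib import Path

# Hypothetical rule set: slug, category, and the name of a check.
RULES = [
    {"slug": "state-file-present", "category": "startup", "check": "state_file_present"},
    {"slug": "no-release-edit", "category": "forbidden", "check": "no_release_edit"},
    {"slug": "tests-pass", "category": "done", "check": "tests_pass"},
]

CHECKS = {
    "state_file_present": lambda run: run["state_file_present"],
    "no_release_edit": lambda run: "scripts/release.sh" not in run["edited_paths"],
    "tests_pass": lambda run: run["pytest_exit_code"] == 0,
}


def score(rules: list, run: dict) -> dict:
    """Run every rule's check against the run and summarize pass/fail."""
    results = [
        {"slug": r["slug"], "category": r["category"], "passed": bool(CHECKS[r["check"]](run))}
        for r in rules
    ]
    passed = sum(1 for r in results if r["passed"])
    return {"passed": passed, "failed": len(results) - passed, "results": results}


def write_report(report: dict, path: Path) -> None:
    """Write the report as stable, diff-friendly JSON."""
    path.write_text(json.dumps(report, indent=2, sort_keys=True) + "\n", encoding="utf-8")


# A run that edits a forbidden file and ships with failing tests.
run = {
    "state_file_present": True,
    "edited_paths": ["src/app.py", "scripts/release.sh"],
    "pytest_exit_code": 1,
}
report = score(RULES, run)
for result in report["results"]:
    print("PASS" if result["passed"] else "FAIL", result["slug"])
```

Calling `write_report(report, Path("rule_report.json"))` would persist the result; sorted keys keep successive reports comparable in a diff.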

Production patterns in the wild

Three patterns separate a rule set that lasts a quarter from one that decays in a week.

Severity tagging at write time. Every rule carries severity: block, warn, or info. The checker reports all three; the runtime only refuses on block. Most teams overstate severity early, then quietly weaken it under deadline pressure; tagging at write time forces the calibration up front. Pair with the verification gate (Phase 14 · 38), which signs any override of a block rule into an overrides.jsonl audit log.
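A minimal sketch of that split between reporting and refusing, assuming each check result is a dict with `severity` and `passed` keys (a hypothetical shape). Failures are counted per severity, but only a failed block rule sets the refuse flag.

```python
SEVERITIES = ("block", "warn", "info")


def aggregate(results: list) -> dict:
    """Count failures per severity; refuse only when a block rule failed."""
    failures = {severity: 0 for severity in SEVERITIES}
    for result in results:
        severity = result["severity"]
        if severity not in failures:
            raise ValueError(f"unknown severity {severity!r} on rule {result['slug']!r}")
        if not result["passed"]:
            failures[severity] += 1
    return {"failures": failures, "refuse": failures["block"] > 0}


results = [
    {"slug": "no-release-edit", "severity": "block", "passed": False},
    {"slug": "changelog-updated", "severity": "warn", "passed": False},
    {"slug": "commit-style", "severity": "info", "passed": True},
]
summary = aggregate(results)
print(summary)
# {'failures': {'block': 1, 'warn': 1, 'info': 0}, 'refuse': True}
```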

Rule expiry as a forcing function. Every rule carries an expires_at date (default 90 days from authoring). The checker emits a warning when an unexpired rule has had zero violations for 60 consecutive days; the next quarterly review either justifies keeping it, weakens it to info, or deletes it. Cloudflare's production AI Code Review data (April 2026, 131,246 review runs across 5,169 repos in 30 days) showed that rule sets with explicit expiry stayed under 30 rules per repo; sets without grew to 80+ with most never firing.
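The expiry logic can be sketched in a few lines. The function names and status strings below are hypothetical, and two details are assumptions the text leaves open: a rule counts as expired on its `expires_at` date itself, and a rule that has never fired measures its quiet period from its authoring date.

```python
from datetime import date, timedelta
from typing import Optional


def default_expiry(authored: date, days: int = 90) -> date:
    """Default expires_at: 90 days after authoring."""
    return authored + timedelta(days=days)


def expiry_status(
    authored: date,
    expires_at: date,
    last_violation: Optional[date],
    today: date,
    quiet_days: int = 60,
) -> str:
    """Classify a rule as 'expired', 'quiet-warning', or 'active'."""
    if today >= expires_at:
        return "expired"
    quiet_since = last_violation or authored
    if (today - quiet_since).days >= quiet_days:
        return "quiet-warning"  # unexpired, but no violations for 60+ days
    return "active"


authored = date(2025, 1, 1)
expires = default_expiry(authored)
print(expires)  # 2025-04-01
print(expiry_status(authored, expires, None, date(2025, 3, 5)))  # quiet-warning
```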

Markdown-as-source, JSON-as-cache. agent-rules.md is the authored file; agent-rules.lock.json is a cache the checker reads in the hot path. The lock is regenerated by a pre-commit hook. Markdown diffs are reviewable; markdown parsing stays out of every turn. Same shape as package.json / package-lock.json and Cargo.toml / Cargo.lock.
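A sketch of how such a lock could be built and validated, assuming the lock stores a SHA-256 digest of the markdown source next to the parsed rules. The `source_sha256` key and the helper names are hypothetical; the lesson does not specify the lock's schema.

```python
import hashlib
import json


def _digest(markdown_text: str) -> str:
    return hashlib.sha256(markdown_text.encode("utf-8")).hexdigest()


def build_lock(markdown_text: str, rules: list) -> dict:
    """Pair the parsed rules with a digest of the markdown they came from."""
    return {"source_sha256": _digest(markdown_text), "rules": rules}


def dump_lock(lock: dict) -> str:
    """Serialize deterministically so regenerating an unchanged lock yields no diff."""
    return json.dumps(lock, indent=2, sort_keys=True) + "\n"


def lock_is_fresh(markdown_text: str, lock: dict) -> bool:
    """True when the lock was generated from exactly this markdown text."""
    return lock.get("source_sha256") == _digest(markdown_text)


source = "### no-release-edit\n- check: no_release_script_edit\n"
rules = [{"slug": "no-release-edit", "check": "no_release_script_edit"}]
lock = build_lock(source, rules)

print(lock_is_fresh(source, lock))                   # True
print(lock_is_fresh(source + "- extra: x\n", lock))  # False
```

A hook script could call `build_lock` and write `dump_lock(lock)` to agent-rules.lock.json, while CI calls `lock_is_fresh` to catch a markdown edit that skipped the hook.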

Use It

In production:

  • Claude Code, Codex, and Cursor read the rules at session start and quote them when refusing actions. The checker re-runs them in CI to catch silent drift.
  • OpenAI Agents SDK guardrails register the same checks as input and output guardrails. The markdown is the docs surface; the SDK is the runtime surface.
  • LangGraph interrupts fire when an in-flight node violates a rule. The interrupt handler reads the rule, asks the human, and resumes.

The rule set is portable across all three because it is just markdown plus function names.

Ship It

outputs/skill-rule-set-builder.md interviews a project owner, classifies their existing prose instructions into the five categories, and emits a versioned agent-rules.md plus a checker stub.

Exercises

  1. Add a sixth category if your product genuinely needs it. Defend why it does not collapse into one of the five.
  2. Extend the checker so a rule can carry a severity (block, warn, info) and the report aggregates accordingly.
  3. Wire the checker into CI: fail the build if a block-severity rule fails on the latest agent run.
  4. Add an "expiry" field per rule. After 90 days without a check fail, the rule is up for review.
  5. Find a real AGENTS.md and rewrite it as five-category rules. How many of its lines were operational? How many were aspirational?

Key Terms

| Term | What people say | What it actually means |
| --- | --- | --- |
| Operational rule | "A real instruction" | A rule the workbench can check at runtime |
| Aspirational rule | "Be careful" | A rule with no check; either delete or upgrade |
| Definition of done | "Acceptance" | An objective, file-backed proof the task is complete |
| Block severity | "Hard rule" | Violation halts the run; cannot be silenced without an operator |
| Rule expiry | "Stale rule sweep" | A rule with no fails in N days is up for retirement |
