67 Minutes from Spec to Implementation — With No Shared Context
At 03:29, I committed a to-do list as a dated entry. Six prioritized items for improving an automated SDLC pipeline. Specific file references, concrete examples, rationale for each change.
At 04:36 — 67 minutes later — a separate Claude session committed a 371-line implementation of all six items. Plus a follow-up bugfix at 04:58 for an edge case the spec didn’t mention.
The two sessions shared zero context. No conversation history, no handoff, no briefing. The implementing session had never seen the analysis that produced the to-do list. It read one entry from the filesystem and got to work.
What Made It Work
The entry wasn’t a vague wish list. It was a spec with enough detail to implement from:
1. Specific file and line references.
Instead of “fix the verdict system,” the entry said: `supervisor.py` lines 560-563, the string matching logic. The implementing agent didn’t need to explore the codebase to find the relevant code.
2. Concrete examples.
Instead of “add structured verdicts,” the entry included the exact format:
```
## Verdict
STATUS: NEEDS_IMPROVEMENT
OPEN_ISSUES:
- is_prime(4.9) returns True
```
The implementing agent could see what the output should look like, not just what it should do.
3. Priority ordering that matched implementation order.
The six items were ordered by dependency — structured verdicts first (everything else depends on them), beliefs integration last (heaviest change, builds on all prior items). The implementing agent could work top to bottom.
4. Rationale, not just instructions.
Each item explained why, not just what. “The reviewer approves code with known issues because the prompt says ‘provide feedback’ not ‘find errors’” — this tells the implementing agent what problem it’s solving, which helps it make design decisions the spec didn’t anticipate.
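Because the spec showed the exact verdict format, the parsing side becomes mechanical. Here is a minimal sketch of what parsing that block might look like; the function name and logic are hypothetical illustrations, not the actual `supervisor.py` code:

```python
def parse_verdict(review_text):
    """Parse a structured '## Verdict' block into (status, open_issues).

    Hypothetical sketch based on the format shown in the entry;
    the real implementation may differ.
    """
    status = None
    open_issues = []
    in_verdict = False
    in_issues = False
    for line in review_text.splitlines():
        stripped = line.strip()
        if stripped == "## Verdict":
            in_verdict = True
        elif in_verdict and stripped.startswith("STATUS:"):
            status = stripped.split(":", 1)[1].strip()
        elif in_verdict and stripped == "OPEN_ISSUES:":
            in_issues = True
        elif in_issues and stripped.startswith("- "):
            open_issues.append(stripped[2:])
        elif in_issues and stripped:
            in_issues = False  # a non-list line ends the issues section
    return status, open_issues

review = """## Verdict
STATUS: NEEDS_IMPROVEMENT
OPEN_ISSUES:
- is_prime(4.9) returns True
"""
status, issues = parse_verdict(review)
# status == "NEEDS_IMPROVEMENT", issues == ["is_prime(4.9) returns True"]
```

Note that a concrete example in the spec makes even this trivial parser unambiguous: there is no guessing about field names or list syntax.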
What the Implementer Added Beyond the Spec
The spec described six changes. The implementing session made design decisions the spec didn’t cover:
- **Backwards-compatible verdict parsing.** The spec said “replace string matching with structured parsing.” The implementer added a legacy fallback: if the old format is detected, parse it the old way. The spec didn’t mention backwards compatibility. The implementer decided it mattered.
- **Capped planner claims at 5.** The spec said “register beliefs per stage.” The implementer decided the planner would generate too many claims and capped them. A judgment call that prevented noisy registries.
- **Graceful degradation.** The spec said “integrate beliefs CLI.” The implementer added `shutil.which("beliefs")` checks throughout: if the CLI isn’t installed, skip beliefs integration silently instead of crashing. The spec assumed the tool would be present.
- **A second exit gate.** The spec described one exit gate (SATISFIED + open issues = reject). The implementer added a second: if `beliefs` has active WARNINGs and the verdict is SATISFIED, escalate instead of terminating. This wasn’t in the spec at all.
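The graceful-degradation and claim-cap decisions can be sketched together. The function names and return behavior here are illustrative assumptions, not the actual implementation:

```python
import shutil

def beliefs_cli():
    # Graceful degradation: locate the `beliefs` CLI on PATH,
    # or return None if it isn't installed.
    return shutil.which("beliefs")

def register_stage_claims(claims, cap=5):
    # Hypothetical sketch of two judgment calls described above:
    # 1) skip beliefs integration silently when the CLI is absent
    #    (no crash, no output), and
    # 2) cap planner claims at 5 to avoid a noisy registry.
    if beliefs_cli() is None:
        return []
    return claims[:cap]
```

The point is not the three lines of logic but where they came from: nothing in the spec demanded either check, yet both follow naturally from the rationale the spec did include.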
This is how human developers work with specs too. The spec gets you 90%. Implementation reveals the remaining 10% — edge cases, backwards compatibility, deployment concerns. The implementing agent behaved the same way.
Show, Don’t Tell — Again
This validates the same principle from Show, Don’t Tell. The entry worked because it showed the implementing agent what to build:
| Spec Approach | Effect |
|---|---|
| “Add structured verdicts” | Telling — describes the goal |
| Verdict block with STATUS and OPEN_ISSUES fields | Showing — demonstrates the format |
| “Fix supervisor.py” | Telling — names the file |
| “Lines 560-563, the string matching logic” | Showing — pinpoints the code |
When I later tried a different spec that told without showing — “the planner MUST accept optional `tier_config`” without showing WHERE to call it — the implementing agents added the parameter to every method signature but never wired up the caller. Technically correct, completely useless. The feature existed in code but nothing invoked it.
The difference: the working spec included an end-to-end code example showing the full integration path. The failing spec described interfaces without showing integration.
Entries as Coordination Artifacts
This finding reframes what entries are for. They’re not just documentation or journals. They’re coordination artifacts — structured enough for one session to write and another session to implement from.
The key properties that make this work:
- **Dated.** The entry lives at `entries/2026/02/22/suggested-changes.md`. The implementing session knows it’s reading current information, not something from three months ago.
- **Committed.** It’s in git. The implementing session can trust it’s a deliberate spec, not a scratch file.
- **Self-contained.** All the context needed to implement is in the entry: file references, examples, rationale. The implementing session doesn’t need the conversation that produced the analysis.
- **Discoverable.** The implementing session found it by listing recent entries. No one had to tell it where to look.
Two sessions. Zero shared context. 371 lines implemented. The filesystem was the coordination mechanism.
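A sketch of how a fresh session might discover the most recent committed entry, assuming the dated `entries/YYYY/MM/DD/` layout shown above (the function itself is hypothetical, not part of the entry tool):

```python
from pathlib import Path

def latest_entry(root="entries"):
    # Entries live at entries/YYYY/MM/DD/name.md, so a lexicographic
    # sort of the zero-padded date components is also chronological.
    paths = sorted(Path(root).glob("*/*/*/*.md"))
    return paths[-1] if paths else None
```

No registry, no message queue: the directory layout itself encodes recency, which is what makes the filesystem work as the coordination mechanism.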
Try It
When you have a multi-step change to make:
- Write the analysis as an entry with `entry create "my-analysis"`
- Include specific file references, concrete examples, and rationale
- Commit it
- Start a fresh session in the same repo
- The new session will find the entry and can implement from it
The entry tool is at github.com/benthomasson/entry. But the principle works with any dated, committed markdown file.
This is post 5 in a series on belief management for AI agents. Previously: When AI Agents Say SATISFIED But the Code Has Bugs. Next: why your AI forgets the reasons behind its beliefs every time the context window fills up.