The Expert Agent (Revised)


This is a revised edition of the original Expert Agent post from March 2026. The original described building expert agents from git repos with markdown beliefs: five creative AI programmes in four days. Three months later, the architecture has matured into a pipeline that has produced 40+ domain experts, with measured performance (88% vs 33% accuracy, model-tier compression) and a community hub for sharing domain competence.

A repo is an expert.

Not metaphorically. Literally. When you build up a repository with a knowledge base of justified beliefs, source documents the agent has read, entries recording what it’s learned, and contradiction records tracking what failed — you have constructed a domain expert that can be instantiated in seconds.

The agent isn’t an expert because the LLM “knows” the domain. The LLM has read everything and remembers nothing reliably. The agent is an expert because the repo remembers for it. The belief network says what the agent knows and why it knows it. The entries say what it’s discovered. The nogoods say what contradicts what. Strip away the repo and you have a generalist. Add it back and you have a specialist.

From Five Programmes to Forty Experts

The original post described five creative AI programmes built in four days — music, art, story, animation — with 163 entries and 130 beliefs. That was the proof of concept.

The architecture scaled:

Expert	Domain	Beliefs	What it found
redhat-expert	Enterprise products (6 departments)	12,731	136 actionable blockers no one asked about
handbook-expert	Internal processes (724 sources)	4,496	8% premise fabrication rate measured
leetcode-expert	Algorithms (510 solutions)	1,641	17 real bugs that passed all test suites
ddia-expert	Distributed systems (37 implementations)	1,405	30 bugs across 3 rounds of analysis
aap-expert	Ansible Automation Platform	237	+12.7pp recall, 47% cheaper than from-scratch
beliefs-pi	EEM research itself	527	The system finding problems in itself

Plus 30+ more across codebases, certification curricula, research papers, and product documentation. The total: over 30,000 justified beliefs across domains, each with provenance tracing back to source documents.

The creative AI programmes from the original post were the first five. They proved the architecture works. The next forty proved it scales.

The Anatomy of an Expert (Updated)

An expert agent has five layers. The original post described these as markdown files and CLI tools. They’re now a concrete pipeline:

Belief network (reasons.db): Not a flat list of claims, but a SQLite database of justified beliefs with truth values (IN/OUT), derivation chains, retraction cascades, and contradiction records. When one belief is retracted, everything derived from it is retracted automatically. Built and managed by ftl-reasons.

Source documents (entries/, sources/): The raw material the beliefs were extracted from. Every belief traces back to a specific source document. When the source changes, check-stale flags the dependent beliefs.

Expert prompt (CLAUDE.md): Not “you are a helpful assistant.” Instead: the agent’s domain, its research questions, its current evidence, and its tools. The tighter the role, the better the expertise.

Retrieval layer (expert-service): Dual-path search that combines pre-computed beliefs (TMS path) with full-text source search (FTS path). Any model can query any expert via HTTP. With this layer, the cheapest model nearly matches the most expensive model without it (94% vs 98%).

Build pipeline (expert-agent-builder): Automates construction end to end: fetch docs, summarize into entries, extract candidate beliefs, review premises against sources, accept reviewed beliefs, derive higher-order conclusions, review derivations adversarially, retract errors with cascade propagation.

The original anatomy was markdown files in a git repo. The updated anatomy is a pipeline that produces a database-backed knowledge base with measured quality gates at every stage.
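To make the retraction cascade concrete, here is a minimal sketch of a justified belief store in SQLite. The schema, the function names (`open_network`, `add_belief`, `retract`, `status`), and the example claims are all hypothetical and chosen for illustration; this is not the ftl-reasons schema or API. It also assumes each derived belief has a single justification, so losing any one premise is enough to retract it.

```python
import sqlite3
from collections import deque


def open_network():
    """Create an in-memory belief store (hypothetical schema, not ftl-reasons)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE beliefs (
            id     TEXT PRIMARY KEY,
            claim  TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'IN'   -- 'IN' or 'OUT'
        );
        CREATE TABLE justifications (
            belief_id  TEXT NOT NULL,           -- the derived belief
            premise_id TEXT NOT NULL            -- a belief it depends on
        );
    """)
    return db


def add_belief(db, belief_id, claim, premises=()):
    """Insert a belief; premises lists the ids it was derived from."""
    db.execute("INSERT INTO beliefs (id, claim) VALUES (?, ?)", (belief_id, claim))
    db.executemany(
        "INSERT INTO justifications (belief_id, premise_id) VALUES (?, ?)",
        [(belief_id, premise) for premise in premises],
    )


def status(db, belief_id):
    """Return 'IN' or 'OUT' for a belief id."""
    return db.execute(
        "SELECT status FROM beliefs WHERE id = ?", (belief_id,)
    ).fetchone()[0]


def retract(db, belief_id):
    """Mark a belief OUT and cascade to everything derived from it.

    Returns the ids retracted by this call, in breadth-first order.
    """
    retracted = []
    queue = deque([belief_id])
    while queue:
        current = queue.popleft()
        if status(db, current) == "OUT":
            continue  # already retracted, nothing more to propagate
        db.execute("UPDATE beliefs SET status = 'OUT' WHERE id = ?", (current,))
        retracted.append(current)
        dependents = db.execute(
            "SELECT belief_id FROM justifications WHERE premise_id = ? "
            "ORDER BY belief_id",
            (current,),
        ).fetchall()
        queue.extend(row[0] for row in dependents)
    return retracted


# Illustrative beliefs only; the claims are made up for the example.
db = open_network()
add_belief(db, "p1", "Premise read from source A")
add_belief(db, "p2", "Premise read from source B")
add_belief(db, "d1", "Conclusion from p1 and p2", premises=["p1", "p2"])
add_belief(db, "d2", "Higher-order conclusion from d1", premises=["d1"])

print(retract(db, "p2"))   # ['p2', 'd1', 'd2']
print(status(db, "p1"))    # IN
```

Retracting one premise takes out both conclusions that rest on it, while the unrelated premise stays IN. A fuller truth maintenance system would also handle beliefs with several independent justifications, which this sketch leaves out.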

The Build Pipeline

Building an expert used to be manual — read sources, write beliefs, track dependencies by hand. Now it’s a pipeline:

fetch-docs → summarize → propose-beliefs → review-premises → accept
                                                                 ↓
                                              derive → review → repair
                                                ↑                 ↓
                                                └─────────────────┘

Phase 1: Extract. Fetch source documents, summarize into structured entries, extract candidate beliefs with ACCEPT/REJECT recommendations.

Phase 2: Review premises. A separate LLM pass checks each proposed belief against its source: “Does this document actually say this?” This catches fabricated specificity (plausible details the proposer adds that the source never mentioned), which was measured at an 8% rate.

Phase 3: Derive and review. Generate higher-order conclusions from combinations of existing beliefs. Then adversarially evaluate each derivation: does the conclusion actually follow from the premises? This review retracts 13-37% of derived beliefs per round.

Phase 4: Iterate. Each round of derive-review-repair catches fewer errors — cascade impact declines from 4.1 per retraction (round 1) to 0.8 (round 3). When rounds stop finding issues, the pipeline has covered the model’s accessible knowledge for this domain.

The cost: $10-$300 depending on domain scale. The redhat-expert (12,731 beliefs from 5,366 source documents across 6 departments) cost approximately $300 with Sonnet. A medium-size domain costs $10-$25.

When an Expert Knows It’s Wrong

The art programme’s closure from the original post remains the strongest early evidence that the architecture works. A lesser system would have pushed through — producing increasingly ugly renderings while insisting the approach was viable. Instead, the art PI registered three nogoods, identified the root cause (medium mismatch between vector graphics and traditional art techniques), and recommended pivoting.

Three months later, this self-correction mechanism has been measured at scale:

Derive-review retraction: 13-37% of derived beliefs caught as invalid per round. The derive step generates. The review step critiques. Same model, different objective, different knowledge activated.

Gated belief analysis: Concrete observations that block general conclusions. “We believe the deployment architecture is production-ready” is blocked by “TLS verification is disabled in health checks.” In redhat-expert, 136 of these blockers surfaced — problems the system found in itself without anyone asking.

Cascade convergence: Each round catches fewer errors because the high-impact ones (at the base of the deepest reasoning chains) get caught first. By round 3, the remaining errors are low-impact leaves. The system converges.
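Gating can be pictured as a conclusion that is IN only when all of its supports are IN and none of its blockers are. The sketch below is a simplified illustration of that idea, assuming an acyclic network; the `Belief` class, the helper names, and the "health-checks-pass" premise are hypothetical, while the two quoted claims come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Belief:
    claim: str
    supports: list = field(default_factory=list)  # ids that must be IN
    blockers: list = field(default_factory=list)  # ids that must be OUT
    asserted: bool = True  # only consulted for premises (no supports or blockers)


def is_in(network, belief_id):
    """A premise is IN if asserted; a conclusion needs supports IN and blockers OUT."""
    belief = network[belief_id]
    if not belief.supports and not belief.blockers:
        return belief.asserted
    supported = all(is_in(network, s) for s in belief.supports)
    blocked = any(is_in(network, b) for b in belief.blockers)
    return supported and not blocked


def active_blockers(network, belief_id):
    """List the blockers currently keeping a conclusion OUT."""
    return [b for b in network[belief_id].blockers if is_in(network, b)]


network = {
    # Hypothetical supporting premise, added only to complete the example.
    "health-checks-pass": Belief("Health checks pass"),
    "tls-disabled": Belief("TLS verification is disabled in health checks"),
    "production-ready": Belief(
        "The deployment architecture is production-ready",
        supports=["health-checks-pass"],
        blockers=["tls-disabled"],
    ),
}

print(is_in(network, "production-ready"))            # False
print(active_blockers(network, "production-ready"))  # ['tls-disabled']

# Once the concrete observation no longer holds, the conclusion comes back IN.
network["tls-disabled"].asserted = False
print(is_in(network, "production-ready"))            # True
```

Listing the active blockers for every gated conclusion is one way a knowledge base could surface problems nobody asked about.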

This is what a research institution looks like: programmes that can close without the institution losing what they learned, and knowledge bases that find their own problems.

The Performance Data

The original post claimed expert agents were useful. Now we can quantify how useful:

Metric	Without Expert	With Expert
A-grade accuracy (50 questions)	33%	88%
Response time	350s average	25s average
Opus alone vs Haiku + expert	98%	94%
Per-query cost	~$0.12	< $0.06

The cheapest Claude model with an expert knowledge base matches the most expensive model without one. The architecture compresses a model-tier gap into a 4-point difference at 1/60th the per-query cost.

This holds across domains. The LeetCode expert found 17 bugs that passed all test suites. The DDIA expert found 30 implementation bugs across 3 rounds. The aap-expert improved recall by 12.7 percentage points while cutting cost by 47%.
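For readers who want the deltas behind the table spelled out, the arithmetic is short. The figures are copied from the table; the variable names are illustrative, and the reading that Haiku with an expert scored 94% while Opus alone scored 98% is an assumption based on the order in which the row lists them.

```python
# Figures from the performance table above.
accuracy_without_pct = 33    # A-grade accuracy without the expert
accuracy_with_pct = 88       # A-grade accuracy with the expert
time_without_s = 350         # average response time without the expert
time_with_s = 25             # average response time with the expert
opus_alone_pct = 98          # assumed: Opus without an expert
haiku_with_expert_pct = 94   # assumed: Haiku with an expert

accuracy_gain_points = accuracy_with_pct - accuracy_without_pct
accuracy_ratio = accuracy_with_pct / accuracy_without_pct
speedup = time_without_s / time_with_s
tier_gap_points = opus_alone_pct - haiku_with_expert_pct

print(f"Accuracy gain: {accuracy_gain_points} points ({accuracy_ratio:.1f}x)")  # 55 points (2.7x)
print(f"Speedup: {speedup:.0f}x")                                               # 14x
print(f"Remaining model-tier gap: {tier_gap_points} points")                    # 4 points
```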

Artificial Domain Competence

We call what an expert agent provides Artificial Domain Competence (ADC): domain expertise that any model can access, regardless of what it was trained on.

You don’t need to wait for AGI. You can have ADC right now. Clone an expert repo, point your model at it, and you have domain competence today:

# Clone a pre-built expert
git clone https://github.com/eem-hub/ddia-expert

# Or build your own
expert-build init my-domain --domain "My Domain"
expert-build fetch-docs https://docs.example.com/
expert-build summarize
expert-build propose-beliefs
expert-build accept-beliefs

The EEM Hub is where domain experts share their knowledge bases. One team builds the distributed systems expert. Another builds the Kubernetes expert. A third builds the compliance expert. Any model can query all of them. The knowledge compounds across the community.

The Human’s Role

The original post described a division: “the human decides what questions are worth asking and judges whether the results are any good. The agent does everything in between.”

That division still holds. But the “everything in between” is now automated. The human:

Chooses the domain — what sources to feed the pipeline
Reviews proposed beliefs — the pipeline recommends ACCEPT/REJECT; the human approves or overrides
Judges the results — are the derived conclusions useful? Are the gated blockers real problems?

The pipeline does the reading, extracting, deriving, reviewing, and retracting. The human provides direction and judgment. The cost of exploring a dead end collapsed from months to hours — exactly as the original post predicted. What changed is that the pipeline is now automated end-to-end instead of requiring manual belief tracking.

The Bottom Line

An expert agent built this way is not AGI. It doesn’t generalize across all domains. It doesn’t learn new skills. It is artificially competent in a specific domain — the domain you built the knowledge base for.

But this is exactly what most organizations need. They don’t need a model that can do everything. They need a model that knows their domain. The expert agent architecture turns that need into a $10-$300 build, usable from the first belief, portable across models, and self-correcting through adversarial review.

The tools are open source:

ftl-reasons — Justified belief networks with retraction cascades
expert-agent-builder — Automated pipeline from sources to expert
expert-service — Dual-path retrieval serving beliefs to any model
EEM Hub — Pre-built experts you can clone and use today

This is a revised edition of the original Expert Agent post from March 2026. The original described five creative AI programmes built manually with markdown beliefs. This revision updates with the automated pipeline, 40+ domain experts, controlled performance data, and the ADC framing for community sharing via the EEM Hub.


Ben Thomasson
