Every AI conversation starts cold. Your agent doesn’t remember what it figured out yesterday. It doesn’t know what it concluded last week. It will re-read the same documents, re-derive the same conclusions, and charge you the same tokens — every single session. Your organization’s AI bill scales with ignorance, not with novelty.

I’ve spent this year building the fix. It’s called external epistemic memory (EEM): knowledge that lives outside the model, carries its justifications with it, and lets you understand how the system knows what it knows.

Three Words, Each Load-Bearing

External — outside the model’s parameters, in a separate substrate. A SQLite database, not a neural network. This means the memory survives model upgrades, works across providers (Claude, Gemini, local models), and can be copied, inspected, edited, and audited by humans. When your model provider ships a new version, the memory keeps its content. When you swap from Opus to Haiku to save costs, the knowledge comes with you.
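To make "a SQLite database, not a neural network" concrete, here is a minimal sketch using Python's standard sqlite3 module. The table and column names (beliefs, justifications, status, antecedent_id) are assumptions made for illustration, not the schema the actual engine uses. The belief texts are copied, still truncated, from the CLI example later in this post.

```python
import sqlite3

# Hypothetical schema for illustration only; not the real engine's layout.
SCHEMA = """
CREATE TABLE beliefs (
    id     TEXT PRIMARY KEY,
    text   TEXT NOT NULL,
    status TEXT NOT NULL CHECK (status IN ('IN', 'OUT'))
);
CREATE TABLE justifications (
    belief_id     TEXT NOT NULL REFERENCES beliefs(id),
    antecedent_id TEXT NOT NULL REFERENCES beliefs(id)
);
"""

def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory store holding three beliefs and their links."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO beliefs VALUES (?, ?, ?)",
        [
            ("NODE-142", "AAP 2.6 uses a hub-spoke topology with automation controller...", "IN"),
            ("NODE-287", "Edge deployments use execution environments for disconnected...", "IN"),
            ("NODE-431", "The deployment architecture supports three tiers...", "IN"),
        ],
    )
    conn.executemany(
        "INSERT INTO justifications VALUES (?, ?)",
        [("NODE-431", "NODE-142"), ("NODE-431", "NODE-287")],
    )
    conn.commit()
    return conn

def justified_by(conn, belief_id):
    """Return the ids that directly justify a belief, sorted."""
    rows = conn.execute(
        "SELECT antecedent_id FROM justifications "
        "WHERE belief_id = ? ORDER BY antecedent_id",
        (belief_id,),
    )
    return [row[0] for row in rows]

conn = build_demo_db()
print(justified_by(conn, "NODE-431"))  # ['NODE-142', 'NODE-287']
```

Because the store is an ordinary database, the same question can be asked with any SQLite client, with no model in the loop.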

Epistemic — not just facts, but justified beliefs. Every claim carries its justification chain: Where did this come from? What depends on it? Does anything contradict it? When a belief is retracted, everything derived from it is automatically marked OUT (no longer believed). This is what distinguishes EEM from RAG. A vector database stores text chunks and returns approximate matches. An epistemic memory stores justified claims and returns auditable knowledge.
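The retraction cascade can be sketched in a few lines of Python. This is a simplified illustration, not the engine's implementation: it assumes each derived belief has exactly one justification, and the belief names A through E are invented for the example.

```python
from collections import deque

# Each derived belief maps to the beliefs that justify it.
# Premises (taken straight from sources) have no entry here.
# Simplifying assumption: one justification per belief.
JUSTIFIED_BY = {
    "C": ["A", "B"],
    "D": ["C"],
    "E": ["B"],
}

def retract(belief, justified_by, status):
    """Mark `belief` OUT and cascade to everything derived from it."""
    # Invert the graph: antecedent -> beliefs that depend on it.
    dependents = {}
    for node, antecedents in justified_by.items():
        for antecedent in antecedents:
            dependents.setdefault(antecedent, []).append(node)

    queue = deque([belief])
    while queue:
        node = queue.popleft()
        if status.get(node) == "OUT":
            continue  # already handled
        status[node] = "OUT"
        queue.extend(dependents.get(node, []))
    return status

status = {name: "IN" for name in "ABCDE"}
retract("A", JUSTIFIED_BY, status)
print(status)
# {'A': 'OUT', 'B': 'IN', 'C': 'OUT', 'D': 'OUT', 'E': 'IN'}
```

Retracting A takes C and D with it, while B and E stay IN because nothing they rest on was withdrawn.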

Memory — in the cognitive-science sense: Tulving’s term “semantic memory” names persistent, structured knowledge. This is long-term semantic memory for AI agents. It persists across sessions, compaction cycles, and model swaps. The organization stops forgetting.

What It Looks Like in Practice

The build pipeline takes source documents and produces a reviewed, justified belief network:

Sources --> Entries --> Beliefs --> Derived Beliefs --> Reviewed Network
 fetch     summarize   extract     derive              review + retract

An LLM reads source documents and extracts discrete, justified claims. Then a derive step combines existing beliefs into new derived beliefs — connections and implications that no single source document states. Then a review step evaluates each derivation: is the justification valid? Does it overreach? The review catches 13-37% of derived beliefs as invalid and retracts them, with corrections cascading through the network.

At query time, an agent searches the belief network instead of re-reading raw documents:

$ reasons search "deployment architecture"
NODE-142: AAP 2.6 uses a hub-spoke topology with automation controller...
NODE-287: Edge deployments use execution environments for disconnected...
NODE-431: [DERIVED] The deployment architecture supports three tiers...

$ reasons show NODE-431
Text: The deployment architecture supports three tiers...
Status: IN
Depth: 2
Justified by: NODE-142, NODE-287
Dependents: NODE-512, NODE-519

Every answer traces back through its justification chain to source evidence. “How does the system know this?” has a literal, traversable answer.
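As a rough sketch of what "traversable" means here, the snippet below walks a justification graph from a belief down to its sources. The NODE ids mirror the CLI output above. The SRC ids and the dictionary layout are invented for illustration and are not the tool's actual data model.

```python
# Hypothetical justification graph: belief id -> ids that justify it.
# NODE-* ids mirror the CLI output above; the SRC-* ids are invented.
GRAPH = {
    "NODE-431": ["NODE-142", "NODE-287"],
    "NODE-142": ["SRC-A"],
    "NODE-287": ["SRC-B"],
}

def trace(node, graph, depth=0, seen=None):
    """Yield (depth, id) pairs walking from a belief down to its sources."""
    seen = set() if seen is None else seen
    if node in seen:
        return  # guard against cycles and repeated sub-chains
    seen.add(node)
    yield depth, node
    for parent in graph.get(node, []):
        yield from trace(parent, graph, depth + 1, seen)

def sources(node, graph):
    """Return the leaf ids (nothing justifies them) supporting a belief."""
    return sorted(n for _, n in trace(node, graph) if n not in graph)

for depth, node in trace("NODE-431", GRAPH):
    print("  " * depth + node)
print(sources("NODE-431", GRAPH))  # ['SRC-A', 'SRC-B']
```

The indented printout is the justification chain. The final list is the source evidence the belief ultimately rests on.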

The Data

I built 40+ expert knowledge bases across domains — enterprise products, codebases, research papers, certification curricula. The largest covers 6 departments with 5,366 source documents producing 13,511 justified beliefs.

Metric             Without EEM   With EEM
A-grade accuracy   33%           88%
Response time      15x slower    Baseline

Haiku + EEM vs Opus alone: 94% vs 98%
Construction cost (6 departments): ~$300 (Sonnet)

The cheapest Claude model with EEM lands within four points of the most expensive model without it. That is not a marginal improvement: it is roughly a full model tier of quality for $300 in construction cost, amortized across every future query.

The 88% vs 33% comparison is controlled: same 50 domain questions, same model (Opus), same evaluation rubric. The only difference is whether the agent has access to the pre-built belief network. Without it, the agent searches raw documents from scratch every time. With it, the agent queries pre-computed, reviewed, justified knowledge.

Why Now

Andrej Karpathy recently published an “LLM Wiki” proposal — a persistent, structured knowledge base that an LLM incrementally builds and maintains instead of re-discovering knowledge from scratch on every query. Same diagnosis: RAG is stateless waste. Same general solution: build and maintain a knowledge artifact.

EEM goes further in three ways:

  1. Justification chains. The wiki stores conclusions as markdown pages. EEM stores beliefs with their full derivation provenance — every claim traces back to source evidence.
  2. Retraction cascades. When a wiki page is wrong, someone has to find and fix everything that referenced it. When an EEM belief is retracted, corrections propagate automatically through all dependents.
  3. Measured results. EEM has controlled eval data across 4 model families. The wiki is a design proposal.

The convergence matters. When independent researchers arrive at the same architecture from different starting points, the architecture is probably right.

The Pattern Behind It

Building an EEM follows a four-phase loop that shows up everywhere iterative work happens:

Phase       EEM                                          Code Development
Generate    Derive new beliefs from existing ones        Write code
Critique    Review beliefs for validity                  Code review
Repair      Fix justifications, soften claims, retract   Fix review findings
Follow up   Identify gaps for next derive round          File follow-up issues

The follow-up phase is what makes it a loop instead of a pipeline. The repair phase’s discoveries become the next cycle’s input. Each round, the network gets more accurate and more complete.
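The loop can be sketched with toy stand-ins for each phase. Everything here is hypothetical: the function names, the example claims, and the rule that flags the word "always" are placeholders for what an LLM-driven derive and review step would do.

```python
def generate(gaps):
    """Generate: propose candidate claims (toy stand-in for a derive step)."""
    if gaps is None:  # first round: nothing to follow up on yet
        return ["deploys always succeed", "rollbacks are automated"]
    return [f"documented: {gap}" for gap in gaps]

def critique(candidates):
    """Critique: flag claims that overreach (toy rule: the word 'always')."""
    return [c for c in candidates if "always" in c]

def repair(candidates, findings):
    """Repair: soften flagged claims; return accepted claims and new gaps."""
    accepted, gaps = [], []
    for claim in candidates:
        if claim in findings:
            claim = claim.replace("always", "usually")
            gaps.append(f"failure modes of '{claim}'")  # follow-up item
        accepted.append(claim)
    return accepted, gaps

def run_loop(max_rounds=5):
    """Repeat the four phases until a round produces no follow-up gaps."""
    beliefs, gaps, rounds = [], None, 0
    while rounds < max_rounds:
        rounds += 1
        candidates = generate(gaps)
        findings = critique(candidates)
        accepted, gaps = repair(candidates, findings)
        beliefs.extend(accepted)
        if not gaps:  # nothing left to follow up: stop
            break
    return beliefs, rounds

beliefs, rounds = run_loop()
print(rounds, beliefs)
```

In this toy run the first round softens one overreaching claim and files a gap, and the second round fills that gap and ends the loop. The point is the shape: the gaps returned by repair are the only input the next generate call receives.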

What’s Next

The tools are open source:

  • ftl-reasons — The EEM engine: justified beliefs, retraction cascades, derive, review
  • expert-agent-builder — Pipeline to build EEM from source documents

The research continues: epistemic stratification (how iterative derivation produces distinct reasoning layers), cross-model construction (which models build the best EEMs), and the depth-8 ceiling (why beliefs beyond 8 derivation steps don’t survive review — on any reasoning substrate, human or machine).

Your agents don’t need bigger models. They need memory that carries its reasons.

Previously in this series: LLMs Don’t Need Bigger Models, They Need Clay Tablets; Classical AI Solved Your LLM’s Problems; The Sawtooth.