External Epistemic Memory: What It Is and Why It Matters
Every AI conversation starts cold. Your agent doesn’t remember what it figured out yesterday. It doesn’t know what it concluded last week. It will re-read the same documents, re-derive the same conclusions, and charge you the same tokens — every single session. Your organization’s AI bill scales with ignorance, not with novelty.
I’ve spent this year building the fix. It’s called external epistemic memory — knowledge that lives outside the model, carries its justifications with it, and lets you understand how the system knows what it knows.
Three Words, Each Load-Bearing
External: outside the model’s parameters, in a separate substrate. A SQLite database, not a neural network. This means the memory survives model upgrades, works across providers (Claude, Gemini, local models), and can be copied, inspected, edited, and audited by humans. When your model provider ships a new version, the memory keeps its content. When you swap from Opus to Haiku to save costs, the knowledge comes with you.
Epistemic: not just facts, but justified beliefs. Every claim carries its justification chain: where did this come from? What depends on it? Does anything contradict it? When a belief is retracted, everything derived from it is marked OUT automatically. This is what distinguishes EEM from RAG. A vector database stores text chunks and returns approximate matches. An epistemic memory stores justified claims and returns auditable knowledge.
Memory: in the sense of Endel Tulving’s semantic memory, the cognitive science term for persistent, structured knowledge. This is long-term semantic memory for AI agents. It persists across sessions, compaction cycles, and model swaps. The organization stops forgetting.
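To make the "external" and "epistemic" parts concrete, here is a minimal sketch of a SQLite-backed belief store with a retraction cascade. The schema and the function names (`open_memory`, `add_belief`, `retract`, `status`) are hypothetical and chosen for illustration; they are not the ftl-reasons schema or API. The sketch also assumes each belief has a single justification, so a belief goes OUT as soon as any one of its premises does.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE beliefs (
    id     TEXT PRIMARY KEY,
    text   TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'IN'  -- 'IN' (believed) or 'OUT' (retracted)
);
CREATE TABLE justifications (
    belief_id  TEXT NOT NULL REFERENCES beliefs(id),
    premise_id TEXT NOT NULL REFERENCES beliefs(id)
);
"""


def open_memory(path=":memory:"):
    """Open (or create) the external store and install the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_belief(conn, belief_id, text, premises=()):
    """Insert a belief plus one justification edge per premise."""
    conn.execute("INSERT INTO beliefs (id, text) VALUES (?, ?)", (belief_id, text))
    conn.executemany(
        "INSERT INTO justifications (belief_id, premise_id) VALUES (?, ?)",
        [(belief_id, premise) for premise in premises],
    )


def retract(conn, belief_id):
    """Mark a belief OUT, then cascade to everything derived from it."""
    retracted = []
    pending = [belief_id]
    while pending:
        current = pending.pop()
        row = conn.execute(
            "SELECT status FROM beliefs WHERE id = ?", (current,)
        ).fetchone()
        if row is None or row[0] == "OUT":
            continue  # unknown or already retracted: nothing to do
        conn.execute("UPDATE beliefs SET status = 'OUT' WHERE id = ?", (current,))
        retracted.append(current)
        dependents = conn.execute(
            "SELECT belief_id FROM justifications WHERE premise_id = ?", (current,)
        ).fetchall()
        pending.extend(dep[0] for dep in dependents)
    return retracted


def status(conn, belief_id):
    """Return 'IN' or 'OUT' for a stored belief."""
    return conn.execute(
        "SELECT status FROM beliefs WHERE id = ?", (belief_id,)
    ).fetchone()[0]


conn = open_memory()
add_belief(conn, "A", "claim extracted from a source")
add_belief(conn, "B", "another claim extracted from a source")
add_belief(conn, "C", "derived from A and B", premises=["A", "B"])
add_belief(conn, "D", "derived from C", premises=["C"])
print(retract(conn, "A"))  # A goes OUT, then C, then D; B stays IN
```

Because the store is an ordinary database file, nothing in it depends on which model wrote or reads the beliefs. A fuller engine would track multiple justifications per belief and restore a belief when an alternative justification is still IN.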
What It Looks Like in Practice
The build pipeline takes source documents and produces a reviewed, justified belief network:
Sources --> Entries --> Beliefs --> Derived Beliefs --> Reviewed Network
fetch       summarize   extract     derive              review + retract
An LLM reads source documents and extracts discrete, justified claims. Then a derive step combines existing beliefs into new derived beliefs — connections and implications that no single source document states. Then a review step evaluates each derivation: is the justification valid? Does it overreach? The review catches 13-37% of derived beliefs as invalid and retracts them, with corrections cascading through the network.
At query time, an agent searches the belief network instead of re-reading raw documents:
$ reasons search "deployment architecture"
NODE-142: AAP 2.6 uses a hub-spoke topology with automation controller...
NODE-287: Edge deployments use execution environments for disconnected...
NODE-431: [DERIVED] The deployment architecture supports three tiers...
$ reasons show NODE-431
Text: The deployment architecture supports three tiers...
Status: IN
Depth: 2
Justified by: NODE-142, NODE-287
Dependents: NODE-512, NODE-519
Every answer traces back through its justification chain to source evidence. “How does the system know this?” has a literal, traversable answer.
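The traversal itself is small. As a sketch, assume a belief network held in a plain dictionary, where a belief with no `justified_by` entries rests directly on source evidence. The `Belief` class, the `trace` and `supporting_sources` helpers, and the IDs `S1`, `S2`, `D1`, `D2` are all hypothetical; they do not reproduce the `reasons` CLI or its data model.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    text: str
    justified_by: tuple = ()  # empty tuple: rests directly on source evidence


def trace(network, belief_id):
    """Yield (depth, belief_id) pairs, walking from a belief down to its premises."""
    stack = [(0, belief_id)]
    seen = set()
    while stack:
        depth, current = stack.pop()
        if current in seen:
            continue  # shared premises are reported once
        seen.add(current)
        yield depth, current
        for premise in reversed(network[current].justified_by):
            stack.append((depth + 1, premise))


def supporting_sources(network, belief_id):
    """Return the source-level beliefs that a belief ultimately rests on."""
    return sorted(
        node for _, node in trace(network, belief_id)
        if not network[node].justified_by
    )


network = {
    "S1": Belief("claim extracted from document one"),
    "S2": Belief("claim extracted from document two"),
    "D1": Belief("derived from S1 and S2", justified_by=("S1", "S2")),
    "D2": Belief("derived from D1 and S2", justified_by=("D1", "S2")),
}

for depth, node in trace(network, "D2"):
    print("  " * depth + f"{node}: {network[node].text}")

print(supporting_sources(network, "D2"))  # ['S1', 'S2']
```

Answering "how does the system know this?" then amounts to printing the walk for the belief in question.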
The Data
I built 40+ expert knowledge bases across domains — enterprise products, codebases, research papers, certification curricula. The largest covers 6 departments with 5,366 source documents producing 13,511 justified beliefs.
| Metric | Without EEM | With EEM |
|---|---|---|
| A-grade accuracy | 33% | 88% |
| Response time | 15x slower | Baseline |
| Haiku + EEM vs Opus alone | — | 94% vs 98% |
| Construction cost (6 departments) | — | ~$300 (Sonnet) |
The cheapest Claude model with EEM comes within four points of the most expensive model without it (94% vs 98%). That’s not a marginal improvement: it’s roughly a full model tier of capability for about $300 in construction cost, amortized across every future query.
The 88% vs 33% comparison is controlled: same 50 domain questions, same model (Opus), same evaluation rubric. The only difference is whether the agent has access to the pre-built belief network. Without it, the agent searches raw documents from scratch every time. With it, the agent queries pre-computed, reviewed, justified knowledge.
Why Now
Andrej Karpathy recently published an “LLM Wiki” proposal — a persistent, structured knowledge base that an LLM incrementally builds and maintains instead of re-discovering knowledge from scratch on every query. Same diagnosis: RAG is stateless waste. Same general solution: build and maintain a knowledge artifact.
EEM goes further in three ways:
- Justification chains. The wiki stores conclusions as markdown pages. EEM stores beliefs with their full derivation provenance — every claim traces back to source evidence.
- Retraction cascades. When a wiki page is wrong, someone has to find and fix everything that referenced it. When an EEM belief is retracted, corrections propagate automatically through all dependents.
- Measured results. EEM has controlled eval data across 4 model families. The wiki is a design proposal.
The convergence matters. When independent researchers arrive at the same architecture from different starting points, the architecture is probably right.
The Pattern Behind It
Building an EEM follows a four-phase loop that shows up everywhere iterative work happens:
| Phase | EEM | Code Development |
|---|---|---|
| Generate | Derive new beliefs from existing ones | Write code |
| Critique | Review beliefs for validity | Code review |
| Repair | Fix justifications, soften claims, retract | Fix review findings |
| Follow up | Identify gaps for next derive round | File follow-up issues |
The follow-up phase is what makes it a loop instead of a pipeline. The repair phase’s discoveries become the next cycle’s input. Each round, the network gets more accurate and more complete.
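The loop can be sketched as a small driver that takes one callable per phase. Everything here is a toy stand-in under stated assumptions: `run_loop` and the four phase functions are hypothetical names, the "beliefs" are plain strings, and the critique rule (reject any claim containing "all ") is a placeholder for an LLM-based review, not the actual review logic.

```python
def run_loop(beliefs, generate, critique, repair, follow_up, max_rounds=5):
    """Run generate -> critique -> repair -> follow up until no gaps remain."""
    gaps = None  # the first round has no follow-ups yet
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        candidates = generate(beliefs, gaps)
        findings = critique(candidates)
        beliefs = beliefs + repair(candidates, findings)
        gaps = follow_up(candidates, findings)  # feeds the next round
        if not gaps:
            break
    return beliefs, rounds


# Toy phase implementations (hypothetical, for illustration only).
def generate(beliefs, gaps):
    # Propose one claim per open gap, or two seed claims on the first round.
    topics = ["three tiers", "all edge sites"] if gaps is None else gaps
    return [f"claim about {topic}" for topic in topics]


def critique(candidates):
    # True means the claim passes review; "all " marks an overreaching claim.
    return {claim: "all " not in claim for claim in candidates}


def repair(candidates, findings):
    # Keep what passed review; rejected claims are dropped (retracted).
    return [claim for claim in candidates if findings[claim]]


def follow_up(candidates, findings):
    # Each rejection becomes a narrower topic for the next generate step.
    return [
        claim.removeprefix("claim about all ")
        for claim in candidates
        if not findings[claim]
    ]


final_beliefs, rounds_used = run_loop([], generate, critique, repair, follow_up)
print(final_beliefs, rounds_used)
```

In this toy run the overreaching claim is rejected in round one, its narrower form is proposed and accepted in round two, and the loop stops because no gaps remain. Feeding `follow_up` output back into `generate` is the single line that turns the pipeline into a loop.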
What’s Next
The tools are open source:
- ftl-reasons — The EEM engine: justified beliefs, retraction cascades, derive, review
- expert-agent-builder — Pipeline to build EEM from source documents
The research continues: epistemic stratification (how iterative derivation produces distinct reasoning layers), cross-model construction (which models build the best EEMs), and the depth-8 ceiling (why beliefs beyond 8 derivation steps don’t survive review — on any reasoning substrate, human or machine).
Your agents don’t need bigger models. They need memory that carries its reasons.
Previously in this series: “LLMs Don’t Need Bigger Models, They Need Clay Tablets”; “Classical AI Solved Your LLM’s Problems”; “The Sawtooth.”