Stop Fine-Tuning, Start Remembering

7 minute read

Your organization wants domain-specific AI.

Your team has two options:

Fine-tune the model. Modify the weights so the model “knows” your domain. GPU hours, ML expertise, training pipelines, forgetting risk.
Give the model external memory. Store domain knowledge in a database. The model reads it at query time. No training, no GPUs, no forgetting.

I’ve spent this year building option 2 — external epistemic memory (EEM). After controlled experiments across 4 model families, 40+ knowledge bases, and production deployments, the economics aren’t close.

The Numbers

I built a knowledge base covering 6 enterprise departments from 5,361 source documents. It produced 13,511 justified beliefs. Total cost: ~$300 at Sonnet pricing.

A single fine-tuning run on a 7B model costs $500–$5,000 in GPU time, requires ML expertise to configure, and produces a model locked to one architecture. The EEM costs less than a team lunch, requires no ML expertise, and works with every model.

Dimension	Fine-Tuning	EEM
Construction cost	$10K–$100K+	$10–$300
Time to first value	Hours to days	Minutes
Per-query cost	Model serving	< $0.06/question
Adding new knowledge	Retrain (GPU hours)	Add a belief (instant)
Removing wrong knowledge	Retrain	Retract one belief
Forgetting risk	Nonzero	Zero
Model portability	Locked to one architecture	Any model via HTTP
Inspectable	No	Read the belief text

For a medium-sized domain (~100 source documents), the total construction cost is $10–25. Break-even happens after 100–250 queries. An enterprise querying 50 questions per day breaks even in 2–5 days.

Performance

Same 50 domain questions. Same model. Same evaluation rubric. The only difference: does the agent have access to pre-built beliefs?

Metric	Without EEM	With EEM
A-grade answers	33%	88%
Response time	~350s	25s
Speed	Baseline	15x faster

In a head-to-head with the same synthesis model (Gemini 3.1 Pro), switching from raw document search to belief retrieval produced 2.5x better accuracy and 10x faster responses. The retrieval architecture is the only variable.

In a separate controlled eval (55 questions, AAP domain):

Metric	From Scratch	With EEM
Cost	$13.61	$7.25
Tokens	142,395	50,199
MC accuracy	97.5%	100%
Open-ended recall	72.9%	85.6%

Better accuracy. Half the cost. A third of the tokens.

Smaller Models, Same Results

Here’s the economic multiplier. EEM equalizes model size:

Configuration	A+B Grade
Haiku + EEM	94–100%
Opus + EEM	98%
Opus without EEM	20–94%

Haiku is 10–50x cheaper per token than Opus. With EEM, the cheapest model matches the most expensive one. The memory is doing the work, not the model.

Build the EEM once with frontier tokens. Query it forever with the cheapest model that can read. The marginal cost of the 1,000th question is near zero.

You Don’t Even Need an LLM

Fine-tuned knowledge is locked inside model weights. The only way to access it is to run the model on a GPU. EEM knowledge lives in a SQLite database — and any program that can query a database can use it.

A production routing system I deployed replaces two LLM calls with a full-text search over beliefs:

# No LLM. FTS5 keyword search over belief text.
results = reasons_search(user_query, db_path=eem_db)
for belief in results:
    if belief["id"].startswith("mart-"):
        # Matched a datamart
    elif belief["id"].startswith("intent-"):
        # Mapped user vocabulary to schema vocabulary

167 beliefs — 8 describing datamarts, 122 describing tables with column names, 26 mapping user vocabulary to schema vocabulary (“bugs” → ISSUETYPE, “VMs” → IS_VIRTUAL). SQLite FTS5 search. Sub-100ms.

Metric	LLM Routing (before)	EEM Routing (after)
Latency	10–30s	<100ms
Accuracy	88%	96%
Infrastructure	Vertex AI credentials	SQLite file

100x faster. 8 percentage points more accurate. No API credentials. No model serving costs. The per-query cost is effectively zero.

This is a capability fine-tuning fundamentally cannot provide. Knowledge in weights requires GPU inference to access. Knowledge in a database is accessible to LLMs, application code, shell scripts, CI pipelines, and humans reading the text directly.

Catastrophic Forgetting Doesn’t Exist

The entire continual learning field — EWC, OSFT, progressive networks, replay buffers — exists because neural network weights are continuous and interdependent. Updating any weight to learn something new risks degrading everything the model already knows.

OSFT (Nayak et al., 2025) solves this with SVD decomposition of weight matrices, gradient projection into orthogonal subspaces, and a tunable forgetting dial (unfreeze_rank_ratio). It’s a 12-page paper with eigenvalue analysis to manage a tradeoff.

EEM doesn’t have this tradeoff. Adding belief #501 does not modify beliefs #1–500. Each belief is a row in a database. There is no shared parameter space, no optimization surface where old and new knowledge compete.

	Fine-Tuning (OSFT)	EEM
Add knowledge	Constrain gradients to low-rank subspace	Add a database row
Preserve old knowledge	Freeze singular values, project gradients	Nothing to do
Forgetting risk	Nonzero (tunable, never zero)	Zero
Time to availability	After convergence (hours)	Immediately

While building an EEM for Minecraft modding, I could query it while the pipeline was still running. Five mod experiments worked on the first try from beliefs extracted from tutorials. The sixth failed — because a tutorial was wrong, not the system. Claude Code traced it through the belief chain, found the bad source, updated one belief, fixed. Total time to fix: minutes. Total things broken by the fix: zero.

If that had been a fine-tuned model, the fix would be: update the training data, regenerate synthetic data, retrain with forgetting constraints, validate that old capabilities survived, deploy. Hours of compute to fix one wrong fact.

Why Architecture Beats Training

An 85% accurate model across 5 pipeline stages produces correct output 44% of the time. Errors compound multiplicatively (more on this here).

The same 85% model with iterative EEM architecture — derive beliefs, review for errors, retract invalid ones, repeat — converges to 98.2% in 5 rounds.

Getting from 85% to 98% through training requires a ~9x reduction in error rate. That’s a decades-long engineering effort historically. Getting there through EEM architecture is immediate.

Returns on model improvement are linear. Returns on architecture are multiplicative.

When Fine-Tuning Still Wins

EEM is not a universal replacement for training. Training wins for:

Skill learning. Reasoning patterns, tool-use behaviors, code generation — these are procedural skills that need weight updates. EEM stores what to know, not how to think.
Style adaptation. Writing voice, formatting conventions — fine-tuning shapes how the model expresses itself. EEM shapes what it knows.
Native tool use. Training small models to natively query EEMs (via LoRA + GRPO) is itself a training task. The skill of using external memory is best learned through weight updates.
Non-LLMs. Robotics, computer vision, drone navigation — these learn from sensor data, not text. Training is the only path when the knowledge is perceptual, embodied, or spatiotemporal.

The clean split: train for skills, EEM for knowledge. Train for perception, EEM for propositions. Most “model customization” requests are actually knowledge requests — and for those, EEM wins on every dimension.

Build Your Knowledge Base While You Can

People are losing access to models — rate limits tightening, pricing tiers changing, providers deprecating older models. If your knowledge lives in model weights, losing access to the model means losing the knowledge. If your knowledge lives in an EEM, it survives any change in model availability.

EEM turns model access into a construction tool rather than a dependency. Use the best model you have access to today to extract, derive, and review beliefs. The resulting knowledge base is a SQLite file that works with any model tomorrow — or no model at all. The frontier model is the drill that digs the well, not the well itself.

This is the insurance argument: build your knowledge base now while you still have access to good models. The beliefs persist. The justification chains persist. The model that built them can disappear entirely, and the knowledge remains queryable, inspectable, and useful.

A fine-tuned model is the opposite bet. You’re encoding knowledge into weights that are locked to one architecture, one provider, one moment in time. If the base model is deprecated, the fine-tune dies with it.

The Squeeze

Fine-tuning is being compressed from three directions:

From above. Frontier models improve faster than fine-tunes can keep up. Each new base model release obsoletes your existing fine-tune.
From the side. GPU infrastructure accelerates model building from scratch. If you can build a model in hours, fine-tuning becomes a bottleneck.
From below. EEM eliminates the need to put domain knowledge in weights at all.

Frontier tokens are subsidized below cost by investors. Token prices decrease every quarter. A single ML engineer’s salary buys millions of API calls. The economics of running your own training infrastructure are moving in the wrong direction.

The Bottom Line

	Fine-Tuning	EEM
Build cost	$10K–$100K+	$10–$300
Deploy time	Days	Minutes
Fix a wrong answer	Retrain	Edit a line of text
Model portability	Locked	Any model
Accuracy (same questions)	33% A-grade	88% A-grade
Forgetting risk	Nonzero	Zero
Usable without an LLM	No	Yes

For domain knowledge — which is what most organizations actually need — EEM is cheaper to build, faster to deploy, easier to fix, portable across models, and produces better results. The economics favor EEM for any domain queried more than ~100 times.

Your LLM doesn’t need different weights. It needs a memory that carries its reasons.

The tools are open source:

ftl-reasons — The EEM engine
expert-agent-builder — Build EEM from source documents
EEM-Hub — Community-built EEMs

Browse a live EEM: expert.ftl2.com/public

Previously in this series: External Epistemic Memory: What It Is and Why It Matters, The Dirty Pipeline, LLMs Don’t Need Bigger Models, They Need Clay Tablets.

Share on

X Facebook LinkedIn Bluesky

Ben Thomasson