I run six AI agents across seven repositories. They share a codebase, share results, and reference each other’s work. After months of operation, I audited their beliefs. Every single agent was operating on stale information. But the real problem wasn’t staleness — it was that agents in different roles held contradictory beliefs and none of them knew it.

The PI’s role definition said the tensor sector was “resolved.” The PI’s own entry, written three days later, classified it as “falsified.” Both documents sat in the same repository. The agent read both and never noticed the contradiction.

The review agent’s role definition described the project scope as one thing. The actual research had moved into entirely different territory. The agent kept reviewing against the old scope.

This isn’t lying in the human sense. It’s worse. These agents are incapable of detecting their own inconsistency.

The Divergence Problem

When multiple agents work on related problems, their beliefs drift apart. Each agent reads its own role definition, works in its own context, and accumulates its own understanding. There’s no mechanism for Agent A to notice that Agent B just invalidated one of Agent A’s core assumptions.

In classical AI, this is called the merge problem (Alchourrón, Gärdenfors, and Makinson, 1985). When two belief sets developed independently need to be reconciled, you can’t just union them — they may contain contradictions. You need a principled way to decide which beliefs survive.

My agents had no merge mechanism. They diverged silently.
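The failure mode can be sketched in a few lines. This is a toy illustration, not part of any real tool: the claim IDs and the "not:" negation convention are hypothetical.

```python
# Two belief sets that are each internally consistent can contain a
# contradiction once unioned -- the merge problem in miniature.

agent_a = {"tensor-sector-resolved", "auth-uses-jwt"}
agent_b = {"not:tensor-sector-resolved", "db-uses-postgres"}

def naive_merge(*belief_sets):
    """Union the sets and report any claim asserted alongside its negation."""
    merged = set().union(*belief_sets)
    conflicts = {c for c in merged if f"not:{c}" in merged}
    return merged, conflicts

merged, conflicts = naive_merge(agent_a, agent_b)
print(conflicts)  # {'tensor-sector-resolved'}
```

A naive union happily contains both the claim and its negation; something principled has to pick a survivor.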

Role Determines What Gets Registered

I built a tool called beliefs — a CLI for tracking claims with their sources, dates, and dependencies. Within six hours of making it available, five agents had adopted it, four of them unprompted.

But here’s what I didn’t expect: what an agent registers depends entirely on its role, not on any guidance about how to use the tool.

Agent       Claims   Warnings   Retracted   Sources
Researcher  12       0          0           CLAUDE.md
Reviewer    12       2          0           Specific entries
Verifier    25       7          3           Specific entries

The researcher registered only successes. Twelve claims, all positive, zero warnings, zero retractions. Sourced from the role definition file — a summary document, not primary evidence.

The verifier registered problems. Twenty-five claims including seven active warnings and three retracted beliefs with documented reasons. Sourced from specific dated entries — primary evidence that enables meaningful staleness detection.

The verifier’s registry was objectively more valuable. It tracked known gaps, documented what had been tried and failed, and maintained the full lifecycle of beliefs from IN to STALE to OUT. The researcher’s registry was a highlight reel.

Nobody told them to behave differently. Their roles shaped what they noticed and what they considered worth recording.

The CLAUDE.md Problem

Ten of the researcher’s twelve claims cited CLAUDE.md as their source. This creates a cascading fragility:

  1. All ten claims go stale together if CLAUDE.md changes — a noisy, uninformative signal
  2. CLAUDE.md is a summary, not primary evidence. The staleness detector can’t distinguish which specific claim is invalidated
  3. No granularity — you can’t trace a specific belief back to the specific discovery that produced it

Compare the verifier’s approach: claims sourced from specific entries like entries/2026/02/21/verification-audit.md. When that entry’s content changes, only the claims derived from it are flagged. The staleness signal is precise.
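Hash-based staleness detection of this kind can be sketched as follows, assuming each claim records a content hash of its source file at registration time. The function and field names here are hypothetical, not the tool's actual internals.

```python
import hashlib
from pathlib import Path

def source_hash(path: str) -> str:
    """Content hash of a source file, recorded when a claim is registered."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def check_stale(claims: list[dict]) -> list[str]:
    """Return IDs of claims whose source file changed since registration."""
    stale = []
    for claim in claims:
        if source_hash(claim["source"]) != claim["source_hash"]:
            stale.append(claim["id"])
    return stale
```

With one source file per claim, a changed file flags only the claims derived from it; with everything pointing at CLAUDE.md, one edit flags the whole registry.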

Nogoods: Contradictions That Persist

The beliefs tool maintains a contradiction database called nogoods.md. When two claims are found to be mutually inconsistent, the contradiction is recorded permanently — it can’t be deleted or buried.

This matters because contradictions in multi-agent systems tend to get rediscovered. Agent A finds a problem in February. It gets lost during context compaction. Agent B rediscovers it in March. Without a persistent record, the system wastes cycles on the same contradiction repeatedly.

Nogoods are append-only. Once discovered, always known. The resolution gets recorded too, but the original contradiction stays in the record.
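An append-only contradiction log is a small amount of code. The entry format below is illustrative; the real nogoods.md layout may differ.

```python
from datetime import date

def record_nogood(log: list[str], claim_a: str, claim_b: str, note: str) -> None:
    """Append a contradiction entry; existing entries are never rewritten."""
    log.append(f"- {date.today().isoformat()}: {claim_a} contradicts {claim_b}. {note}")

def record_resolution(log: list[str], claim_a: str, claim_b: str, resolution: str) -> None:
    """Resolutions are appended too; the original contradiction stays in place."""
    log.append(f"- RESOLVED {claim_a}/{claim_b}: {resolution}")
```

The discipline is entirely in what you refuse to implement: there is no delete and no edit, so a contradiction discovered in February is still visible when an agent circles back in March.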

Entrenchment: Who Wins When Beliefs Conflict

When two claims contradict each other, someone has to decide which one survives. The beliefs tool uses epistemic entrenchment — a concept from AGM belief revision theory (Gärdenfors, 1988). Each claim gets a score based on:

  • Source type: A simulation result outranks a speculation. A formal derivation outranks an analytical argument. A verified test outranks an untested claim.
  • Recency: Newer evidence gets a bonus, capped at six months.
  • Derivation type: Axioms are hardest to retract. Warnings are almost as hard — guardrails should be sticky. Predictions are easier to retract when evidence arrives.

When you run beliefs resolve claim-a claim-b, it shows you the scores and tells you which belief should survive. It doesn’t auto-retract — you decide. But it gives you a principled basis for the decision instead of gut feel.
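A scoring function along those lines might look like this. The weights, rank tables, and decay schedule are my own assumptions for illustration, not the tool's real values.

```python
from datetime import date

# Assumed rank tables: better evidence types score higher (hypothetical weights)
SOURCE_RANK = {"speculation": 0, "analytical": 10, "derivation": 20,
               "simulation": 30, "verified_test": 40}
DERIVATION_RANK = {"prediction": 0, "derived": 10, "warning": 25, "axiom": 30}

def entrenchment(source_type: str, derivation: str,
                 source_date: date, today: date) -> int:
    """Score a claim; in a conflict, the higher score survives."""
    age_months = (today - source_date).days / 30
    # Recency bonus decays linearly from 20 to 0 over six months
    recency = max(0, 20 - int(age_months * 20 / 6))
    return SOURCE_RANK[source_type] + DERIVATION_RANK[derivation] + recency

today = date(2026, 3, 1)
fresh_sim = entrenchment("simulation", "derived", date(2026, 2, 20), today)
old_guess = entrenchment("speculation", "prediction", date(2025, 1, 1), today)
print(fresh_sim > old_guess)  # True
```

Note how warnings sit just below axioms in the derivation table: a guardrail should take strong new evidence to dislodge, while a prediction is cheap to retract.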

Cross-Repo Verification

Claims can reference source files in other repositories. The tool resolves paths like physics-review/entries/2026/02/20/gw-polarization.md against a registry of known repos.

beliefs check-refs verifies that:

  1. The source file exists (FAIL if not)
  2. At least 50% of the claim’s keywords appear in the source (WARN if not)
  3. All dependency targets exist as registered claims (FAIL if not)
  4. No dependencies point to OUT (retracted) claims (WARN if they do)

This catches the most common drift: an agent claims something based on a file that was later deleted, renamed, or rewritten to say something different.
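The four rules reduce to a short checker. This sketch assumes claims are dicts with "keywords" and "deps" fields and that the registry maps claim IDs to statuses; those names are hypothetical.

```python
from pathlib import Path

def check_refs(claim: dict, registry: dict) -> list[str]:
    """Return FAIL/WARN findings for one claim, per the four rules."""
    findings = []
    src = Path(claim["source"])
    if not src.exists():                          # rule 1: source must exist
        return [f"FAIL {claim['id']}: source file missing"]
    text = src.read_text().lower()
    hits = [k for k in claim["keywords"] if k.lower() in text]
    if len(hits) < len(claim["keywords"]) / 2:    # rule 2: 50% keyword overlap
        findings.append(f"WARN {claim['id']}: keywords not found in source")
    for dep in claim.get("deps", []):
        target = registry.get(dep)
        if target is None:                        # rule 3: dep must be registered
            findings.append(f"FAIL {claim['id']}: unknown dependency {dep}")
        elif target["status"] == "OUT":           # rule 4: no deps on retracted claims
            findings.append(f"WARN {claim['id']}: depends on retracted {dep}")
    return findings
```

The keyword check is deliberately crude; it only has to notice when a source file was rewritten to say something different, not understand it.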

What This Looks Like in Practice

# Initialize with your repos
beliefs init --repos project-a project-b

# Register a claim
beliefs add --id "auth-uses-jwt" \
  --text "The auth system uses JWT tokens" \
  --source project-a/entries/2026/02/15/auth-design.md \
  --type DERIVED

# Later: check for staleness
beliefs check-stale
# STALE auth-uses-jwt: source file hash changed

# Check cross-references
beliefs check-refs
# WARN auth-uses-jwt: keyword "JWT" not found in source

# Resolve a conflict
beliefs resolve auth-uses-jwt auth-uses-sessions
# auth-uses-sessions wins (score 75 vs 60, newer source)

The Takeaway

If you run multiple AI agents that share information, they are diverging right now. They hold contradictory beliefs and don’t know it. The longer they run, the worse it gets.

The fix isn’t better prompts. It’s infrastructure: a registry that tracks what each agent believes, where those beliefs came from, and whether the sources still support them. Give your error-finding agents the tool first — they’ll register the most valuable information.

The code is at github.com/benthomasson/beliefs. Zero dependencies. Markdown-native. Install the skill and your agents will start using it.

uv tool install git+https://github.com/benthomasson/beliefs
beliefs install-skill

This is post 2 in a series on belief management for AI agents. Previously: LLMs Have No Memory of Time. Next: what happened when five agents adopted the beliefs tool overnight — without being asked.