Metaprogramming With Beliefs: Treating Knowledge About Code as Data (Revised)


This is a revised edition of the original Metaprogramming With Beliefs post from March 2026. The original described analyzing a 15,000-line codebase with 785 beliefs and filing 8 GitHub issues. Three months later, the same architecture has been applied across 40+ domains, producing over 30,000 beliefs, finding 47 code-level bugs across two independent studies, and surfacing 136 actionable blockers in a 12,731-belief enterprise knowledge base.

Traditional metaprogramming treats code as data — Lisp macros transform code, Python decorators wrap functions, compilers optimize ASTs. The program operates on programs.

We’ve been doing something different: treating knowledge about code as data. Not the code itself, but structured claims about what the code does, how its components interact, and where its assumptions break. A belief network that can be reasoned over, derived from, and acted on programmatically.

The result, at the scale we’ve now reached: an AI pipeline analyzed distributed systems implementations, algorithmic solutions, and enterprise codebases — producing tens of thousands of beliefs, discovering dozens of bugs that code review and test suites missed, and surfacing architectural problems that emerge from the structure of knowledge itself.

What a Belief Network Looks Like

A belief is a structured claim about code with a source, justification chain, and a truth value:

dep-detection-is-static-ast [IN]
  Source: ftl2/modules/dependency.py
  Depth: 0 (premise — observed from code)

dependency-resolution-production-ready [OUT]
  Depends on: dep-detection-is-static-ast, find-all-deps-dedup-by-path-and-import
  Unless: dependency-resolution-lenient
  (OUT because dependency-resolution-lenient is IN)

ai-safe-autonomous-operation [OUT]
  Depth: 4 (derived from 33 base premises)
  Blocked by: ssh-security-gaps, policy-engine-incomplete

Each belief is either IN (currently justified) or OUT (retracted or blocked). A derived belief stays IN only while every belief it depends on is IN and none of its Unless beliefs is IN. When a premise goes OUT, everything derived from it cascades OUT automatically. You don’t have to remember which architectural claims are invalidated by a code change. The network computes it.
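To make the IN/OUT rule concrete, here is a minimal in-memory sketch in Python. It is not the ftl-reasons implementation: the Belief class, its field names, and the evaluate function are all hypothetical, and the sketch assumes an acyclic network. Only the belief names come from the listing above.

```python
from dataclasses import dataclass, field


@dataclass
class Belief:
    """One node in a hypothetical in-memory belief network."""
    name: str
    depends_on: list[str] = field(default_factory=list)  # all must be IN
    unless: list[str] = field(default_factory=list)      # none may be IN
    asserted: bool = True                                # only used for premises


def evaluate(beliefs: dict[str, Belief]) -> dict[str, bool]:
    """Return {name: True for IN, False for OUT} for an acyclic network."""
    status: dict[str, bool] = {}

    def is_in(name: str) -> bool:
        if name not in status:
            b = beliefs[name]
            if not b.depends_on and not b.unless:
                status[name] = b.asserted  # premise: IN while asserted
            else:
                status[name] = (
                    all(is_in(d) for d in b.depends_on)
                    and not any(is_in(u) for u in b.unless)
                )
        return status[name]

    for name in beliefs:
        is_in(name)
    return status


network = {
    "dep-detection-is-static-ast": Belief("dep-detection-is-static-ast"),
    "find-all-deps-dedup-by-path-and-import": Belief(
        "find-all-deps-dedup-by-path-and-import"
    ),
    "dependency-resolution-lenient": Belief("dependency-resolution-lenient"),
    "dependency-resolution-production-ready": Belief(
        "dependency-resolution-production-ready",
        depends_on=[
            "dep-detection-is-static-ast",
            "find-all-deps-dedup-by-path-and-import",
        ],
        unless=["dependency-resolution-lenient"],
    ),
}

target = "dependency-resolution-production-ready"
blocked = evaluate(network)[target]       # OUT: the Unless belief is IN

network["dependency-resolution-lenient"].asserted = False
unblocked = evaluate(network)[target]     # IN: blocker retracted

network["dep-detection-is-static-ast"].asserted = False
cascaded = evaluate(network)[target]      # OUT: a premise was retracted
```

Recomputing every status from the premises on each call is the simplest way to get the cascade right. A real tracker would presumably update incrementally, but the resulting truth values should be the same.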

From 8 Issues to 166 Blockers

The original post described finding 8 GitHub issues from 785 beliefs about a single codebase. That was one expert, one domain.

The architecture now operates at a different scale:

Expert	Domain	Beliefs	What it found
agents-python-expert	15K-line codebase	785	8 issues, 3 bugs code review missed
ddia-expert	37 distributed systems implementations	1,405	30 bugs across 3 rounds
leetcode-expert	510 algorithmic solutions	1,641	17 bugs that passed all test suites
redhat-expert	Enterprise products (6 departments)	12,731	136 actionable blockers
awx-expert	AWX codebase architecture	~500	30 code-level gatekeepers

The mechanism is the same at every scale. Concrete observations (depth 0) combine into subsystem properties (depth 1-2), which combine into system-wide guarantees (depth 3-4). When a concrete observation contradicts a higher-level claim, that’s a gated belief — an actionable problem that emerged from the structure of knowledge, not from any query.
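One way to read "gated" is: a belief whose supporting beliefs are all IN, but which is OUT because at least one blocking belief is IN. The sketch below lists beliefs that match that reading. The record shape and the gated_beliefs function are assumptions made for illustration, not the output format of reasons list --gated. The truth values are the ones stated or implied by the example listing earlier.

```python
# Hypothetical record shape: name -> (depends_on, unless).
NETWORK = {
    "dep-detection-is-static-ast": ((), ()),
    "find-all-deps-dedup-by-path-and-import": ((), ()),
    "dependency-resolution-lenient": ((), ()),
    "dependency-resolution-production-ready": (
        ("dep-detection-is-static-ast", "find-all-deps-dedup-by-path-and-import"),
        ("dependency-resolution-lenient",),
    ),
}

# Truth values from the example listing (True = IN, False = OUT).
STATUS = {
    "dep-detection-is-static-ast": True,
    "find-all-deps-dedup-by-path-and-import": True,
    "dependency-resolution-lenient": True,
    "dependency-resolution-production-ready": False,
}


def gated_beliefs(network, status):
    """Map each gated belief to the IN beliefs that block it."""
    report = {}
    for name, (depends_on, unless) in network.items():
        if status[name]:
            continue  # IN beliefs are not gated
        blockers = [u for u in unless if status[u]]
        support_holds = all(status[d] for d in depends_on)
        if blockers and support_holds:
            report[name] = blockers
    return report


for belief, blockers in gated_beliefs(NETWORK, STATUS).items():
    print(f"{belief} is blocked by: {', '.join(blockers)}")
```

The report pairs each general claim with the concrete observation that contradicts it, which is the part that makes a blocker actionable.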

Three Categories of Bug Discovery

The original post showed one category: code analysis via belief networks. Three months of work revealed three distinct categories:

1. Structural analysis (the original)

The SSH module had known_hosts=None in multiple places, disabling host key verification entirely. It also interpolated user-controlled paths directly into shell commands, which is a command injection vector. No diff introduced these, so a diff-based code review would not have caught them: there was nothing new to review. The belief network caught them because it asked “is the SSH layer production-hardened?” and traced the answer through premises about host key verification and input sanitization.
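For readers who have not seen the path-interpolation problem before, here is a small illustration of the pattern and one common mitigation. It is not the code from the analyzed module: the function names and the cat command are invented for the example. The only real API used is shlex.quote from the Python standard library, which targets POSIX shells.

```python
import shlex


def unsafe_command(path: str) -> str:
    # Vulnerable pattern: the path is pasted into the shell string as-is,
    # so shell metacharacters in the path become part of the command.
    return f"cat {path}"


def safer_command(path: str) -> str:
    # shlex.quote wraps the value so a POSIX shell treats it as one argument.
    # The "--" stops cat from reading a leading "-" in the path as an option.
    return f"cat -- {shlex.quote(path)}"


malicious = "notes.txt; rm -rf ~"
print(unsafe_command(malicious))  # the shell would see two commands
print(safer_command(malicious))   # prints: cat -- 'notes.txt; rm -rf ~'
```

Quoting is the smallest possible fix. Where the transport allows it, passing arguments as a list rather than building a shell string avoids the problem entirely.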

2. Parametric knowledge extraction

The DDIA and LeetCode experts demonstrated something stronger. The source material was minimal — a table of contents and problem statements. Everything else came from the model’s training data, extracted through repeated pipeline passes:

Input	Output
DDIA table of contents	37 implementations, 1,405 beliefs, 30 bugs, 7 architectural rules
LeetCode problem statements	510 solutions, 1,641 beliefs, 17 bugs

The model that generated a bug during coding found that bug during review — because generation and review activate different knowledge from the same distribution. Six passes (generate, explain, extract, derive, review, gate) covered far more of what the model knows than any single conversation could.

The DDIA bugs were real: missing fsync calls, unguarded concurrency, recovery paths that violated their own invariants. A delete-before-rename bug in round 1 cascaded through 31 derived beliefs. By round 3, cascade impact had declined from 4.1 to 0.8 per retraction — the high-impact errors surface first.
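As an illustration of the bug class (not one of the 37 generated implementations), here is a sketch of a crash-safe file replacement in Python. It shows the two things the bugs above got wrong: it calls fsync before the rename, and it never deletes the old file before the new one is in place. The function name atomic_write is an assumption for this example.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a crash leaves the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on one
    # filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force contents to disk before the rename
        os.replace(tmp_path, path)  # swap in one step; never delete `path` first
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    if os.name == "posix":
        # Also fsync the directory so the rename itself survives a crash.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

A delete-before-rename variant would call os.unlink(path) before os.replace. A crash between those two calls leaves no file at all, which is exactly the kind of recovery-path invariant violation described above.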

3. Cross-domain emergence

The redhat-expert (12,731 beliefs across 6 departments) surfaced 136 actionable blockers that no one asked about. These weren’t bugs in the traditional sense — they were cross-cutting problems that emerge when you have enough structured knowledge about a system to detect contradictions across organizational boundaries.

“We believe the deployment architecture is production-ready” blocked by “TLS verification is disabled in health checks.” “We believe the approval workflow is complete” blocked by “HTTP-based communication uses a different rationale than documented.” These problems are invisible when you look at one component. They become visible when the belief network connects components.

The Gated Belief Mechanism

The key mechanism — where concrete observations block general conclusions — scales better than I expected.

In the original post, gated beliefs produced 8 issues from 785 beliefs. That’s roughly 1 issue per 100 beliefs. At scale:

Expert	Beliefs	Gated blockers	Rate
agents-python	785	8	1.0%
awx-expert	~500	30	6.0%
redhat-expert	12,731	136	1.1%

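The rate is roughly 1% for the networks at both ends of the size range (agents-python at 785 beliefs and redhat-expert at 12,731), so larger networks surface proportionally more blockers. The awx-expert is the outlier at about 6%, on an approximate belief count. Each blocker maps to a specific, actionable problem: a concrete observation that contradicts a general claim, with the full justification chain showing exactly why the general claim fails and what would need to change to make it true.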
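The rates in the table are plain arithmetic on the two count columns, and can be recomputed in a few lines of Python. The counts are copied from the table. The awx-expert belief count is the approximate ~500 figure, so its rate is approximate too.

```python
# (beliefs, gated blockers) per expert, copied from the table above.
rows = {
    "agents-python": (785, 8),
    "awx-expert": (500, 30),       # belief count is approximate (~500)
    "redhat-expert": (12_731, 136),
}

rates = {
    name: round(100 * blockers / beliefs, 1)
    for name, (beliefs, blockers) in rows.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate}%")
# agents-python: 1.0%
# awx-expert: 6.0%
# redhat-expert: 1.1%
```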

The Automated Loop (Mature)

The original post described the loop as partially automated with tools “in active development.” It’s now a complete pipeline:

expert-build fetch-docs    → source documents
expert-build summarize     → structured entries
expert-build propose-beliefs → candidate beliefs with ACCEPT/REJECT
                               ↓
                    review-premises (check against source)
                               ↓
expert-build accept-beliefs → verified premises in reasons.db
                               ↓
                    reasons derive → derived beliefs
                               ↓
                    reasons review-beliefs → adversarial evaluation
                               ↓
                    retract invalid → cascade propagation
                               ↓
                    reasons list --gated → actionable blockers

Each stage has a measured error rate. The propose step fabricates plausible details at 8%. The derive step over-generates at 13-37%. The review step catches both. The pipeline is self-correcting — each round catches errors from the previous round, converging toward a stable, high-quality network.

The belief tracker (ftl-reasons) makes no LLM calls. It’s pure infrastructure — SQLite, graph algorithms, deterministic. The LLM decides what to believe. The tracker guarantees consistency. Every other tool in the loop is stochastic and needs experimental validation. The belief tracker is deterministic and tested with unit tests. It’s the stable foundation.
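To show why this part can be deterministic, here is a sketch of a retraction cascade computed with nothing but SQLite. The schema, table names, and belief names are all hypothetical. This is not the layout of reasons.db, and it ignores Unless relationships. It finds every belief that transitively depends on a retracted one, using a recursive common table expression.

```python
import sqlite3

# Hypothetical schema: one row per belief, one row per dependency edge.
SCHEMA = """
CREATE TABLE beliefs (name TEXT PRIMARY KEY);
CREATE TABLE depends_on (
    belief     TEXT NOT NULL REFERENCES beliefs(name),
    antecedent TEXT NOT NULL REFERENCES beliefs(name)
);
"""

# Walk the dependency edges outward from the retracted belief.
# UNION (not UNION ALL) removes duplicates, so shared paths are visited once.
CASCADE_SQL = """
WITH RECURSIVE affected(name) AS (
    SELECT belief FROM depends_on WHERE antecedent = ?
    UNION
    SELECT d.belief
    FROM depends_on AS d
    JOIN affected AS a ON d.antecedent = a.name
)
SELECT name FROM affected ORDER BY name
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO beliefs VALUES (?)",
        [("premise-a",), ("premise-b",), ("subsystem-x",), ("system-y",)],
    )
    conn.executemany(
        "INSERT INTO depends_on VALUES (?, ?)",
        [
            ("subsystem-x", "premise-a"),
            ("system-y", "subsystem-x"),
            ("system-y", "premise-b"),
        ],
    )
    return conn


def cascade(conn: sqlite3.Connection, retracted: str) -> list[str]:
    """Names of all beliefs that go OUT when `retracted` goes OUT."""
    return [row[0] for row in conn.execute(CASCADE_SQL, (retracted,))]


conn = build_demo_db()
print(cascade(conn, "premise-a"))  # ['subsystem-x', 'system-y']
```

The same input always produces the same cascade, which is what makes this layer testable with ordinary unit tests.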

Why “Metaprogramming”?

The framing from the original post still holds:

Level	Data	Operations
Programming	Values	Compute with them
Metaprogramming	Code	Transform, generate, analyze
This	Beliefs about code	Derive, retract, cascade, gate

We’re not transforming code or generating code. We’re building a structured theory of the codebase and then reasoning over that theory to produce actionable findings.

What’s changed is the ambition. The original post described analyzing a single codebase. The revised architecture analyzes anything that can be described in text — codebases, distributed systems, enterprise processes, certification requirements, research papers. The beliefs aren’t about code specifically. They’re about any domain. The metaprogramming pattern generalizes to meta-knowledge-work: treating knowledge about anything as structured, queryable, revisable data.

The Depth Hierarchy at Scale

Individual facts (depth 0) combine into subsystem properties (depth 1), which combine into cross-subsystem properties (depth 2), which combine into system-wide guarantees (depth 3-4). The original post showed a depth-4 conclusion tracing to 33 base premises.
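A sketch of how depth and base premises could be computed, assuming depth is defined as one more than the deepest belief something is derived from, with premises at depth 0. That definition is our reading of the listings, not a documented rule of the tracker. The network below is invented for the example.

```python
from functools import lru_cache

# Hypothetical network: belief -> beliefs it is derived from (empty = premise).
DEPENDS_ON = {
    "obs-a": (),
    "obs-b": (),
    "obs-c": (),
    "subsystem-1": ("obs-a", "obs-b"),
    "subsystem-2": ("obs-c",),
    "cross-subsystem": ("subsystem-1", "subsystem-2"),
    "system-guarantee": ("cross-subsystem", "obs-a"),
}


@lru_cache(maxsize=None)
def depth(name: str) -> int:
    """0 for a premise, else 1 + the depth of the deepest dependency."""
    deps = DEPENDS_ON[name]
    return 0 if not deps else 1 + max(depth(d) for d in deps)


@lru_cache(maxsize=None)
def base_premises(name: str) -> frozenset:
    """All depth-0 beliefs that `name` ultimately rests on."""
    deps = DEPENDS_ON[name]
    if not deps:
        return frozenset([name])
    return frozenset().union(*(base_premises(d) for d in deps))


print(depth("system-guarantee"))                  # 3
print(sorted(base_premises("system-guarantee")))  # ['obs-a', 'obs-b', 'obs-c']
```

Under this definition a conclusion like the depth-4 one above is four derivation steps away from its furthest premise, and its 33 base premises are the depth-0 beliefs reachable from it.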

At scale, a structural ceiling emerged: beliefs beyond depth 8 don’t survive review. At depth 0, the retraction rate is near zero (premises grounded in source material). By depth 9+, it’s 100%. The justification chains get too long for any reasoner — human or LLM — to evaluate reliably.

This looks less like a limitation of one particular model than a property of long reasoning chains. The probability that a chain stays correct across N steps falls with each added step, because the context grows, the reasoning gets more abstract, and the relevant knowledge becomes more specific. The depth-8 ceiling is where that probability drops below the review step’s detection threshold.
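A toy model makes the shape of this argument visible. Assume, purely for illustration, that every derivation step is correct with the same probability p and that steps fail independently. Then a chain of N steps is fully correct with probability p^N. Neither assumption is measured here, the p values below are made up, and the model does not derive the depth-8 figure. It only shows how quickly compounding bites.

```python
def chain_survival(p_step: float, depth: int) -> float:
    """Probability that `depth` independent steps are all correct."""
    return p_step ** depth


def first_depth_below(p_step: float, threshold: float, max_depth: int = 100):
    """Smallest depth whose survival probability is below `threshold`, or None."""
    for depth in range(1, max_depth + 1):
        if chain_survival(p_step, depth) < threshold:
            return depth
    return None


for p in (0.99, 0.95, 0.90):  # illustrative values, not measurements
    survival_at_8 = round(chain_survival(p, 8), 3)
    print(f"p={p}: survival at depth 8 = {survival_at_8}, "
          f"drops below 0.5 at depth {first_depth_below(p, 0.5)}")
```

In this model, a per-step accuracy of 0.90 already puts a depth-8 chain below even odds. Real chains are not independent or uniform, so treat the numbers as intuition rather than prediction.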

The practical response: keep the network wide, not deep. Add more depth-0 observations rather than trying to derive deeper. Experiments that bring fresh observations into the network reset the derivation depth counter and enable new reasoning chains from a verified foundation.

Getting Started

The tools are open source and mature:

ftl-reasons — Belief tracking with automatic retraction cascades
expert-agent-builder — Automated pipeline from sources to expert
expert-service — Dual-path retrieval serving beliefs to any model
EEM Hub — Pre-built experts you can clone and use today

# Build an expert for your codebase
expert-build init my-codebase --domain "My Project Architecture"
expert-build fetch-docs https://docs.example.com/
expert-build summarize
expert-build propose-beliefs
expert-build accept-beliefs

# Find what the network knows
reasons search "production readiness"
reasons list --gated   # actionable blockers

The key insight remains unchanged from the original: code analysis is a knowledge acquisition problem. Build the knowledge into a structured, queryable, maintainable network, and the analysis produces itself. What changed is the scale at which that insight operates — from one codebase to forty domains, from 8 issues to 166 blockers, from a prototype to a community.

The original described 785 beliefs about one codebase. This revision adds 40+ domains, three categories of bug discovery, the depth-8 ceiling, measured gating rates, and the complete automated pipeline.

*Previously: Classical AI Solved Your LLM’s Problems (Revised), Clay Tablets (Revised). Tools: ftl-reasons, expert-agent-builder, EEM Hub*

Ben Thomasson