Metaprogramming With Beliefs: Treating Knowledge About Code as Data
Traditional metaprogramming treats code as data — Lisp macros transform code, Python decorators wrap functions, compilers optimize ASTs. The program operates on programs.
We’ve been doing something different: treating knowledge about code as data. Not the code itself, but structured claims about what the code does, how its components interact, and where its assumptions break. A belief network that can be reasoned over, derived from, and acted on programmatically.
The result: an AI agent analyzed a 15,000-line codebase, built a network of 785 beliefs about it, derived architectural conclusions, identified security vulnerabilities that code review had missed, and filed 8 GitHub issues — all from the belief network, not from reading diffs.
What a Belief Network Looks Like
A belief is a structured claim about code with a source, dependencies, and a truth value:
```
dep-detection-is-static-ast [IN]
  Source: ftl2/modules/dependency.py

dependency-resolution-lenient [IN]
  Source: entries/2026/03/08/topic-dependency-resolution.md

dependency-resolution-production-ready [OUT]
  Depends on: dep-detection-is-static-ast, find-all-deps-dedup-by-path-and-import
  Unless: dependency-resolution-lenient
  (OUT because dependency-resolution-lenient is IN)
```
Each belief is either IN (believed) or OUT (retracted). Derived beliefs depend on premises — when a premise goes OUT, everything that depends on it cascades OUT automatically. You don’t have to remember which architectural claims are invalidated by a code change. The network computes it.
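The cascade can be sketched in a few lines of Python. This is a toy model for illustration, not the ftl-reasons API; the class and belief names are made up to mirror the example above.

```python
# Toy model of IN/OUT propagation: a belief is IN unless it has been
# retracted or any of its premises is OUT. Not the ftl-reasons API.

class Network:
    def __init__(self):
        self.premises = {}    # belief name -> list of premise names
        self.retracted = set()

    def add(self, name, depends_on=()):
        self.premises[name] = list(depends_on)

    def retract(self, name):
        self.retracted.add(name)

    def status(self, name):
        if name in self.retracted:
            return "OUT"
        # Derived beliefs are IN only while every premise is IN.
        if all(self.status(p) == "IN" for p in self.premises.get(name, [])):
            return "IN"
        return "OUT"

net = Network()
net.add("dep-detection-is-static-ast")
net.add("dependency-resolution-production-ready",
        depends_on=["dep-detection-is-static-ast"])
print(net.status("dependency-resolution-production-ready"))  # IN

net.retract("dep-detection-is-static-ast")
print(net.status("dependency-resolution-production-ready"))  # OUT (cascaded)
```

The point is that nobody updates the derived belief directly: retracting the base fact changes the answer everywhere downstream.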
From Beliefs to Issues
The key mechanism is what we call GATE beliefs — conclusions that hold unless a known problem exists:
```
ai-guardrails-fully-operational [OUT]
  Depends on: ai-automation-guardrail-architecture, execution-never-crashes-caller
  Unless: policy-engine-incomplete, ssh-security-gaps
```
The “unless” clause uses non-monotonic reasoning: the conclusion is true unless counter-evidence is believed. Each blocking belief maps directly to a GitHub issue:
- ssh-security-gaps → Issue #1: SSH host key verification disabled + command injection risk
- policy-engine-incomplete → Issue #2: Policy engine implemented but dormant
- dependency-resolution-lenient → Issue #5: Dependency resolution silently drops missing modules
When the developer fixes the SSH security gaps and retracts the blocking belief, ai-guardrails-fully-operational automatically flips to IN (provided the policy engine is also fixed). The resolution is verifiable — it’s a state change in the network, not a judgment call.
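A GATE belief can be sketched as a function of the current belief set. Again a toy model, not the real tool's implementation: the conclusion is IN only when every premise is IN and no "unless" blocker is IN.

```python
# Toy sketch of non-monotonic "unless" evaluation. A GATE belief holds
# iff all premises are believed and no blocking belief is believed.

def gate_status(premises, blockers, in_beliefs):
    if not all(p in in_beliefs for p in premises):
        return "OUT"
    if any(b in in_beliefs for b in blockers):
        return "OUT"
    return "IN"

PREMISES = ["ai-automation-guardrail-architecture",
            "execution-never-crashes-caller"]
BLOCKERS = ["policy-engine-incomplete", "ssh-security-gaps"]

believed = set(PREMISES) | {"ssh-security-gaps"}   # blocker still open
print(gate_status(PREMISES, BLOCKERS, believed))   # OUT

believed.discard("ssh-security-gaps")              # fix merged, blocker retracted
print(gate_status(PREMISES, BLOCKERS, believed))   # IN
```

Retracting the blocker is the only action needed; the flip to IN falls out of the evaluation rather than anyone's judgment.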
What This Found That Code Review Missed
The SSH module had known_hosts=None in multiple places, disabling host key verification entirely. It also interpolated user-controlled paths directly into shell commands without quoting — a command injection vector.
These existed from the initial implementation. There was no diff that introduced them. No code review would have caught them because there was nothing to review. They were invisible in the day-to-day flow of PRs.
The belief network caught them because it asked a different question: not “does this change look correct?” but “is the SSH layer production-hardened?” It traced the answer through premises about host key verification and input sanitization. Both premises were OUT. The derived belief was OUT. An issue was filed.
This is the difference between analyzing changes and analyzing the system. Code review is excellent at the former. Belief networks enable the latter.
The Automated Loop
Four tools compose into a complete cycle:
```
code-expert scan    → beliefs about code (785 facts)
        ↓
code-expert derive  → architectural conclusions + issues
        ↓
multiagent-loop     → fix the issues (PRs)
        ↓
code-review         → review PRs, approve/reject
        ↓
merge → rescan      → updated beliefs (cycle repeats)
```
The loop is stateful — the belief network persists across cycles. Issues don’t get re-discovered. Fixes are verified structurally. Regressions are caught because changed code invalidates beliefs, which cascades through derived conclusions.
One tool in this loop stands apart: the belief tracker (ftl-reasons) makes no LLM calls. It’s pure infrastructure — SQLite, graph algorithms, deterministic. The LLM decides what to believe. The tracker guarantees consistency. Every other tool in the loop is stochastic and needs experimental validation. The belief tracker is deterministic and tested with unit tests. It’s the stable foundation.
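The text above says the tracker is SQLite plus graph algorithms. Here is one way such persistence could look; this is an illustrative schema I am assuming for the sketch, not ftl-reasons' actual tables.

```python
# Illustrative sketch: beliefs and their dependency edges in SQLite.
# Schema and queries are assumptions for this example, not the real tool's.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE beliefs (name TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE deps (belief TEXT NOT NULL, premise TEXT NOT NULL);
""")
conn.execute("INSERT INTO beliefs VALUES (?, ?)",
             ("dep-detection-is-static-ast", "IN"))
conn.execute("INSERT INTO beliefs VALUES (?, ?)",
             ("dependency-resolution-production-ready", "IN"))
conn.execute("INSERT INTO deps VALUES (?, ?)",
             ("dependency-resolution-production-ready",
              "dep-detection-is-static-ast"))

# Deterministic query: which derived beliefs sit directly on this premise?
rows = conn.execute("SELECT belief FROM deps WHERE premise = ?",
                    ("dep-detection-is-static-ast",)).fetchall()
print([r[0] for r in rows])
```

Because the store is plain SQL, the same question always gets the same answer, which is exactly the property the stochastic tools in the loop lean on.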
Why “Metaprogramming”?
Classical metaprogramming operates on code-as-data:
| Level | Data | Operations |
|---|---|---|
| Programming | Values | Compute with them |
| Metaprogramming | Code | Transform, generate, analyze |
| This | Beliefs about code | Derive, retract, cascade, gate |
We’re not transforming code or generating code. We’re building a structured theory of the codebase and then reasoning over that theory to produce actionable findings. The beliefs are the meta-level — they’re data about the program that can be operated on programmatically.
The derive command is the clearest example: it takes existing beliefs and proposes higher-level conclusions. “Pipeline fails gracefully” + “safety never blocks pipeline” + “startup is lightweight” → “pipeline is resilient.” This is automated reasoning about code properties, not about code syntax.
The what-if command takes it further: “what would break if we changed the dependency resolution strategy?” shows the cascade without changing anything. This is hypothetical reasoning about the codebase’s architectural properties — exploring the belief space, not the code space.
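A what-if query reduces to a reachability computation over the dependency graph: everything transitively downstream of the hypothetically retracted premise would go OUT. A minimal sketch (toy model, not the real command):

```python
# Toy sketch of hypothetical "what-if" reasoning: compute which beliefs
# would cascade OUT if a premise were retracted, without mutating anything.

def would_break(dependents, premise):
    """Transitive closure of beliefs downstream of `premise`."""
    broken, frontier = set(), [premise]
    while frontier:
        for b in dependents.get(frontier.pop(), []):
            if b not in broken:
                broken.add(b)
                frontier.append(b)
    return broken

dependents = {
    "dep-detection-is-static-ast":
        ["dependency-resolution-production-ready"],
    "dependency-resolution-production-ready":
        ["dependency-bundling-reliable"],
}
print(sorted(would_break(dependents, "dep-detection-is-static-ast")))
# ['dependency-bundling-reliable', 'dependency-resolution-production-ready']
```

The network is read, never written, so the exploration is free: you see the blast radius of a change before touching the code.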
The Depth Hierarchy
Individual code facts (depth 0) combine into subsystem properties (depth 1), which combine into cross-subsystem properties (depth 2), which combine into system-wide guarantees (depth 3-4):
| Depth | Example | What it means |
|---|---|---|
| 0 | dep-detection-is-static-ast | Observable code fact — read the source and verify |
| 1 | errors-as-data-philosophy | Single-subsystem property — reasoning over 2-4 facts |
| 2 | dependency-bundling-reliable | Cross-subsystem — emergent from multiple subsystems |
| 3 | ai-automation-guardrail-architecture | Multi-subsystem design (5 premises) |
| 4 | ai-safe-autonomous-operation | System-wide guarantee — AI can operate safely |
The depth-4 conclusion traces to 33 independently verifiable base premises. No single code path demonstrates this property — it emerges from the interaction of error handling, graph topology, safety bounds, synthesis invariants, citation rules, concurrency patterns, input validation, and data flow constraints.
Retracting one base fact cascades through the hierarchy. If dep-detection-is-static-ast is invalidated → dependency-resolution-production-ready goes OUT → dependency-bundling-reliable goes OUT → any system-wide guarantee that depends on reliable bundling goes OUT. One code change, multiple architectural claims invalidated, all automatically.
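The depth of a belief is itself derivable from the graph: base facts sit at depth 0, and a derived belief sits one level above its deepest premise. A sketch under that assumed definition:

```python
# Toy sketch: a belief's depth is 0 for base facts, else 1 plus the
# maximum depth of its premises. Definition assumed for illustration.

def depth(name, premises, memo=None):
    memo = {} if memo is None else memo
    if name not in memo:
        ps = premises.get(name, [])
        memo[name] = 0 if not ps else 1 + max(
            depth(p, premises, memo) for p in ps)
    return memo[name]

premises = {
    "dependency-resolution-production-ready":
        ["dep-detection-is-static-ast"],
    "dependency-bundling-reliable":
        ["dependency-resolution-production-ready"],
}
print(depth("dependency-bundling-reliable", premises))  # 2
```

The same traversal that computes depth also delivers the cascade: invalidate a depth-0 fact and every belief whose depth calculation passed through it is suspect.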
The Human’s Role
The human doesn’t write code in this workflow. The human does three things:
- Walk through the process once — direct the AI through scanning, deriving, and fixing interactively. Watch what works, correct what doesn’t.
- Encode the pattern — the corrections and successes become tools and orchestrator configs. What was interactive becomes automated.
- Audit the results — review the belief network, the derived conclusions, and the filed issues. Approve or redirect.
This is meta-programming in the literal sense: programming the programs that program. The tools we built encode how to think about code, not how to write code. The 785 beliefs aren’t code — they’re a procedure for understanding code that can be repeated on any codebase.
Getting Started
The tools are open source:
- ftl-reasons — Belief tracking with automatic retraction cascades (`pip install ftl-reasons`)
- ftl-beliefs — Flat belief registry for simpler use cases (`pip install ftl-beliefs`)
The belief-driven SDLC loop is still early — code-expert, multiagent-loop, and multi-agent-code-review are in active development. But the foundation (belief tracking with dependency networks) is stable, tested (211 tests), and published on PyPI.
The key insight: code analysis is a knowledge acquisition problem. Build the knowledge into a structured, queryable, maintainable network, and the analysis produces itself.