LLMs Don’t Have Super-Human Intelligence, But You Can
An LLM trained on the internet has a compressed, lossy representation of nearly everything humanity has ever written. Every Wikipedia article, most academic papers, most textbooks, most technical documentation, most code. No single human has read even a fraction of a percent of this.
But the LLM doesn’t know that it knows any of it. It has associative memory, like you do. The knowledge is there but dormant until the right prompt activates it. It can’t tell you “I know 47 things about metallurgy.” It only knows when you ask and the right associations fire.
This is exactly like having a colleague who has read every book ever written but has a terrible memory. They can’t tell you what they know. But ask the right question and they’ll say “oh yeah, I remember something about that” and pull out exactly what you needed.
You wouldn’t call that colleague super-intelligent. But the combination of your questions and their knowledge — that’s something neither of you could achieve alone.
Half an Intelligence
An LLM is half an intelligence. It’s purely generative. It produces — text, code, ideas, explanations — but it can’t evaluate what it produces. It has no internal critic. It doesn’t know when it’s wrong. It doesn’t know when it’s right either. It just generates.
Humans have both: a generator and a critic. You can come up with an idea and then evaluate whether it’s any good. That loop — generate, evaluate, refine — is what thinking actually is.
Right now, most people using AI are acting as the critic. The AI generates, you evaluate. You decide what’s good, what’s wrong, what needs another pass. The intelligence isn’t the AI. It’s the pair. The human-AI system is the thinking entity, not either half alone.
This is why different humans using the same AI get wildly different results. It’s not because the AI is inconsistent. It’s because each human accesses different parts of the model. Your questions, your follow-ups, your judgment about what’s worth pursuing — those determine which of the AI’s vast associative memories get activated. Two physicists asking the same model about the same problem will get different answers because they ask different questions, notice different things in the responses, and push in different directions.
The AI is the same. The intelligence that emerges is different because the human half is different.
You can partially close this gap with multiple AI roles — a generator, a critic, and a mediator. This gets closer to a complete intelligence without a human in the loop. But the human critic still has something the AI critic lacks: second-order knowledge. You know what you know and what you don't know. The AI doesn't. It can't reliably say "I'm uncertain about this," because it has no model of its own knowledge. Your epistemic self-awareness is what makes you the better critic.
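A minimal sketch of that generator/critic/mediator loop, assuming a hypothetical `ask(role, prompt)` function that would call an LLM with a role-specific system prompt. Here the model call is stubbed with canned responses so the control flow is visible; no real API is implied.

```python
# Sketch of a generator -> critic -> mediator loop. The ask() function is a
# stand-in for a real model call; its canned replies exist only so this runs.

def ask(role: str, prompt: str) -> str:
    # Placeholder for an actual LLM client call, one per role.
    canned = {
        "generator": "draft: anneal the part to relieve residual stress",
        "critic": "ACCEPT",  # a real critic would return objections instead
        "mediator": "address the strongest objection first",
    }
    return canned[role]

def refine(task: str, max_rounds: int = 3) -> str:
    draft = ask("generator", task)
    for _ in range(max_rounds):
        verdict = ask("critic", f"Critique this answer to '{task}':\n{draft}")
        if verdict.strip() == "ACCEPT":
            break  # the critic is satisfied; stop iterating
        # The mediator decides which criticisms the generator must address.
        guidance = ask("mediator", f"Prioritize these objections:\n{verdict}")
        draft = ask("generator", f"Revise per guidance:\n{guidance}\n{draft}")
    return draft

print(refine("How do I relieve residual stress in a welded part?"))
```

The loop terminates either when the critic accepts or after a fixed number of rounds — the same generate, evaluate, refine cycle described above, just with the human critic swapped out for a model.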
What Each Side Brings
What the LLM has: A vague recollection of almost everything. Breadth across every field. The ability to synthesize 50 documents in 30 seconds. Cross-domain pattern matching — connecting metallurgy to quantum computing to ancient history in one thought.
What the LLM lacks: Second-order knowledge — awareness of what it knows and doesn't know. Current information. Precision. Goals. Self-evaluation.
What you have: Direction. Recognition — you know a good answer when you see one. The right questions. Judgment. Context about your actual situation. The critic function that completes the intelligence.
What web search adds: Precision, currency, verification, discovery. The exact number, the exact date, what happened five minutes ago.
Together: a complete intelligence with breadth no human can match and judgment no AI can replicate.
The Pair Is the Intelligence
There are only a handful of frontier AI models. Maybe five that matter at any given time. But there are billions of humans, each with radically different context, experience, and perspective.
A metallurgist and a musician using the same Claude instance will produce completely different intelligences. Not because the AI changes — because the human half changes. The metallurgist activates knowledge about crystal structures and alloy properties. The musician activates knowledge about harmonic theory and acoustic physics. Same model, same weights, different intelligence emerging from the pair.
This means the combination explodes. A small number of AI models times billions of unique human perspectives produces billions of unique intelligences, each with superhuman breadth but shaped by a specific human’s direction and judgment.
The AI provides the scale and speed — vast knowledge, instant synthesis, tireless iteration. The human provides what no model can: the unique perspective that determines which questions get asked, which answers get pursued, and which results matter. Your career, your failures, your weird hobby, the conversation you had last Tuesday — all of that shapes what you pull out of the model. No one else will pull out the same thing.
This is why “AI will replace humans” misses the point entirely. The AI is half an intelligence. It needs the other half. And the other half is different for every person who sits down with it.
And this is why the AIs are services on the internet instead of locked in a box. If they were good enough by themselves, their owners would just keep them and collect infinite returns on investment. They wouldn’t sell access. The fact that Anthropic, OpenAI, and Google offer these models as services tells you something important: the models alone aren’t the product. The model plus you is the product. They need your half to complete the intelligence.
The Conversation Is the Interface
You can’t get superhuman intelligence from a single prompt. The LLM’s associative memory needs multiple passes to activate.
First question: the obvious answer. Follow-up: the nuance it forgot to mention. Third question: a connection to something unexpected. Fourth question: knowledge from a completely different domain that turns out to be relevant.
This is why conversation is the right interface, not search. Search gives you what matches. Conversation gives you what’s relevant — and steers toward insights neither of you would have reached alone.
Grounding It in Reality
The standard objection: LLMs confabulate. They make things up and sound confident doing it. How can you trust superhuman intelligence that might be superhuman nonsense?
The answer: don’t trust the words, trust the math.
I run multiple groups of AI agents working as researchers across different fields. They review literature, build simulations, analyze results, peer-review each other’s work, iterate on feedback. Some of these are problems that only a handful of humans have ever worked on.
The key insight: simulations can’t lie. An LLM can confabulate a plausible-sounding explanation of a physical phenomenon. It cannot confabulate a simulation result. When the code runs and produces numbers, those numbers are either right or wrong. When they’re wrong, the agent can see that and has to correct course.
This doesn’t eliminate confabulation — it contains it. Force everything through a simulation bottleneck and confabulation becomes self-correcting. The agent might confabulate an explanation, but it can’t confabulate a result that passes mathematical verification. The simulation is the ground truth that keeps the vast associative memory honest.
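The bottleneck can be sketched with a toy physics problem. The "claim" below stands in for a number an LLM asserts; the simulation is the ground truth it cannot talk its way around. The problem (projectile range without drag) and the tolerance are illustrative choices, not taken from the workflow described above.

```python
# Sketch of the simulation bottleneck: a claimed number is accepted only
# if a numerical simulation reproduces it within tolerance.

import math

def simulate_range(v0: float, angle_deg: float, dt: float = 1e-4) -> float:
    """Integrate drag-free projectile flight and return the horizontal range."""
    g = 9.81
    theta = math.radians(angle_deg)
    x, y = 0.0, 0.0
    vx, vy = v0 * math.cos(theta), v0 * math.sin(theta)
    while True:
        x += vx * dt
        y += vy * dt
        vy -= g * dt
        if y <= 0.0:  # landed
            return x

def check_claim(claimed: float, v0: float, angle_deg: float,
                tol: float = 0.01) -> bool:
    """Accept the claim only if it matches the simulation within 1% (relative)."""
    truth = simulate_range(v0, angle_deg)
    return abs(claimed - truth) / truth < tol

v0, angle = 20.0, 45.0
analytic = v0**2 * math.sin(math.radians(2 * angle)) / 9.81  # closed form
print(check_claim(55.0, v0, angle))      # a confabulated number fails
print(check_claim(analytic, v0, angle))  # the correct result passes
```

An agent that confabulates 55 m gets an unambiguous `False` back and has to correct course; an explanation can be plausible-sounding nonsense, but the comparison against the integrator can't be.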
The human role: direction, domain familiarity (not expertise), and the judgment to say “that doesn’t look right, run it again” or “show me the derivation” or “compare that to the known result.” Knowing enough to steer. Not enough to do it alone.
Why This Is Super-Human
No single human could do this at this pace — reviewing the literature across multiple subfields, building simulations, running them, peer-reviewing the results, iterating on feedback, all across domains that would normally require years of specialization.
No LLM could do it alone either — it would confabulate plausible-sounding nonsense without the mathematical and simulation constraints.
Together: research velocity on hard problems that exceeds what either could achieve independently. Not because the AI is super-intelligent. Because the combination of human direction, LLM knowledge breadth, web search precision, and mathematical grounding exceeds what any individual can do.
A human alone can be an expert in one or two fields. A human plus LLMs plus the right tools can be competent across all of them, simultaneously, in real time, with ground truth checks that keep everything honest.
The Remaining Gaps
Even with all of this, there are two things missing. The LLM can’t track time — it doesn’t know whether a claim was made in January or yesterday. And it can’t track its own beliefs — it doesn’t know when something it holds to be true has been superseded by new evidence.
These are memory problems, not intelligence problems. And they’re solvable with the right tools.
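One shape such a tool could take — a belief store that timestamps every claim and records when one claim supersedes another. The names and structure here are purely illustrative, not any particular product's API.

```python
# Minimal sketch of temporal memory plus belief tracking: each claim gets
# a timestamp, and an old belief keeps a pointer to whatever replaced it.

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Belief:
    claim: str
    recorded_at: datetime
    superseded_by: "Belief | None" = None  # set once newer evidence arrives

class BeliefStore:
    def __init__(self) -> None:
        self._beliefs: dict[str, Belief] = {}  # topic -> current belief

    def assert_claim(self, topic: str, claim: str) -> None:
        new = Belief(claim, datetime.now(timezone.utc))
        old = self._beliefs.get(topic)
        if old is not None:
            old.superseded_by = new  # the stale belief knows it is stale
        self._beliefs[topic] = new

    def current(self, topic: str) -> str:
        b = self._beliefs[topic]
        return f"{b.claim} (as of {b.recorded_at:%Y-%m-%d})"

store = BeliefStore()
store.assert_claim("fastest_model", "Model A leads the benchmark")
store.assert_claim("fastest_model", "Model B overtook Model A")
print(store.current("fastest_model"))  # the superseding claim, with its date
```

With a store like this sitting beside the model, "when was this claim made?" and "has this been superseded?" become lookups rather than things the LLM has to guess at.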
Once you close those gaps — rational thinking, vast knowledge, web search for precision, simulations for ground truth, temporal memory, belief tracking — the question stops being “what can the AI do?” and becomes “what can’t we do together?”
I’m testing the hardest things I can find. I haven’t hit a limit yet.