Blog

Insights and updates from the Supermodel team: building graph-based world models for coding agents, benchmarking MCPs, building useful primitives for code factories, and more.

April 24, 2026

I told Claude to double-check itself. It got 55% dumber.

Adding a verify step made my AI agent 55% dumber. Here's why asking a weaker tool to check a stronger one always backfires.

Jonathan Popham

Your AI agent is spending 90% of its tool calls on map-building

Why AI agents burn most of their tokens rediscovering a codebase, and how a small file next to each source file fixes it.

Jonathan Popham

April 21, 2026

Dear Vibecoder,

Your AI doesn't see your code as a system. It sees words. Here's what that costs you and what to do about it.

Jonathan Popham

April 21, 2026

Supermodel Public API Explainer

A walk through every public endpoint on the Supermodel API — what each graph represents, why we ship it as a primitive, and what you can build on top.

Jonathan Popham

March 30, 2026

What Dead Code Taught Us About Building Tools for AI Agents

We benchmarked AI agents on dead code detection across 60+ runs and 14 real-world repositories using Claude Opus 4.6. Graph-enhanced agents achieved 94.1% F1 with 100% precision. The result: 156x cheaper, 11x faster, and 2x better performance than the baseline agent alone. Here's what we learned about context engineering, honest benchmarking, and why code graphs are the missing primitive for software factories.

Jonathan Popham

May 8, 2026

Impact Analysis Is a Ranking Problem

We benchmarked whether Supermodel can rank the validation files an agent should inspect for a scoped change. On 10 real PRs, scoped ranking found 20 of 21 labeled validation files and beat a path/name baseline by 4.3x F1.

Jonathan Popham

April 30, 2026

The graph file is the interface

AI agents do not need another dashboard to understand your repo. They need a small graph file next to the code they are already reading.

Jonathan Popham

April 14, 2026

50% cheaper. 4× faster. Same correct answer.

We ran a test: we gave Claude Code the same task three ways on a 270k-line Django repo. Each run had to make 8 failing tests pass. Same model, same starting point. The runs with Supermodel were 40–50% cheaper and 3–4× faster.

Grey Newell

March 16, 2026

Why Your Weekend Code Graph Project is Bullshit

You discover tree-sitter, parse a codebase into a graph, render it with a force-directed layout, post a screenshot. It gets hundreds of likes. Then you try it on a real codebase and everything falls apart.

Lance Robertson

February 25, 2026

How We Split a Monolith Into a Control Plane and Data Plane (and Got 10x Scale)

How we redesigned a synchronous monolith into an async control plane and data plane in one calendar week — achieving 10x scale with zero new infrastructure.