Every few weeks a new one shows up. Someone discovers tree-sitter, parses a codebase into a graph, renders it with a force-directed layout, and posts a screenshot. It looks like a constellation. It gets hundreds of likes. The comments say "this is the future of code understanding."

You see it, and you get the itch. You want one for your codebase. Maybe you try the tool — poke around for twenty minutes, get a feel for the shape of things, close the tab, never open it again. Or maybe you decide to build your own. Tree-sitter makes parsing feel easy. A graph database makes storage feel free. A weekend later you have a demo that works on a small project.

Then you try it on a real codebase and everything falls apart.

Here's what's going to happen to you.

You're gonna try every rendering library

Your first version uses D3 with a force-directed layout. It works beautifully on a small project — a few hundred nodes, maybe a thousand edges. Clusters form naturally. You can hover and see connections light up. You can drag nodes around. It feels like you've built something real. Then you point it at an actual production codebase and the clusters collapse into a hairball. Labels pile on top of each other. One heavily-imported utility module sucks the entire layout into a star pattern that tells you nothing about the architecture.

So you start making decisions, and every one of them trades something away.

You're going to think hard about rendering. SVG is the easiest to start with but it falls over past a few hundred nodes — too many DOM elements. Canvas is fast but canvas text looks terrible next to the browser's native font rendering — and you're graphing code, so the labels are the whole point. WebGL gives you raw GPU performance but you trade away the DOM's flexibility and the ease of handling mouse events. You'll try one, hit a wall, switch to another, hit a different wall.

You're going to think harder about layout. Force-directed is the default and it has a lot going for it — it's interactive, users can drag nodes around, the graph responds in real time. But naive pairwise repulsion is O(n²) per tick, and the layout doesn't encode hierarchy. Your code has structure — packages contain modules contain classes contain methods. Spring physics replaces all of that with "connected things clump together." At small scale the structure emerges naturally. At real scale it's soup.
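The cost is easy to see in a sketch of the naive repulsion step — hypothetical simplified code, not any particular library's implementation, but the shape is the same: every node pushes against every other node, so one tick is O(n²).

```typescript
interface GraphNode { x: number; y: number; vx: number; vy: number; }

// One tick of naive pairwise repulsion: the inner loop visits every
// other node, so a single tick does n * (n - 1) force evaluations.
function repulsionTick(nodes: GraphNode[], strength = 30): void {
  for (const a of nodes) {
    for (const b of nodes) {
      if (a === b) continue;
      const dx = a.x - b.x;
      const dy = a.y - b.y;
      const distSq = dx * dx + dy * dy || 1; // avoid divide-by-zero
      const f = strength / distSq;
      a.vx += dx * f;
      a.vy += dy * f;
    }
  }
  for (const n of nodes) {
    n.x += n.vx;
    n.y += n.vy;
    n.vx *= 0.6; // velocity decay so the simulation settles
    n.vy *= 0.6;
  }
}
```

Quadtree approximations like Barnes–Hut cut the repulsion to roughly O(n log n), which is why real libraries survive past a few hundred nodes at all — but the per-tick cost never goes away.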

So you'll look at hierarchical layout engines — ELK, COLA, Dagre. They handle tree structures well. But they assume your graph is a DAG, or close to one. The moment you start mapping call relationships alongside file structure, your codebase stops looking like a tree and starts looking like a network. A utility function called by forty files wants to be at the center of the graph, not buried in a leaf node of a directory tree. Call graphs are cyclic — A calls B, B calls C, C calls A — and hierarchical layouts handle cycles by reversing edges and drawing confusing back-arrows. The more your graph looks like real code, the worse hierarchical layout performs.
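The cycle problem is concrete: before a layered engine can assign ranks, it has to break every cycle, usually by reversing edges. A minimal sketch — a plain DFS over a hypothetical call graph, not any layout engine's actual code — shows how quickly real call graphs fail the DAG check:

```typescript
type Graph = Map<string, string[]>;

// DFS with a recursion stack: returns true if the graph has a cycle,
// i.e. it is not a DAG and a layered layout must reverse edges first.
function hasCycle(g: Graph): boolean {
  const visited = new Set<string>();
  const onStack = new Set<string>();
  const dfs = (n: string): boolean => {
    visited.add(n);
    onStack.add(n);
    for (const m of g.get(n) ?? []) {
      if (onStack.has(m)) return true; // back edge: cycle found
      if (!visited.has(m) && dfs(m)) return true;
    }
    onStack.delete(n);
    return false;
  };
  return [...g.keys()].some((n) => !visited.has(n) && dfs(n));
}

// A calls B, B calls C, C calls A — the shape real call graphs have.
const calls: Graph = new Map([
  ["A", ["B"]],
  ["B", ["C"]],
  ["C", ["A"]],
]);
```

Every edge the engine reverses to satisfy this check becomes one of those confusing back-arrows in the rendered graph.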

And hierarchical layouts are batch computations — you run the algorithm, you get coordinates back. If someone wants to drag a node to untangle a cluster, you can't. If the graph changes, you re-run from scratch. COLA caps out at around a hundred nodes before performance degrades. ELK handles more but still can't guarantee non-overlapping labels — only non-overlapping node bounding boxes, and only when you explicitly turn it on. Long function names extending past their boxes overlap freely.

So you end up back on force-directed, because at least it's interactive. And now you need to make it actually work.

You're gonna drown in the engineering of making it usable

Making a force-directed graph actually work at scale is where weekend projects go to die.

Your labels are going to be unreadable. Force-directed layout doesn't guarantee non-overlapping labels. At 500 nodes they pile on top of each other. You'll try font scaling — too small to read. You'll try hiding labels until hover — now your users can't scan the graph. You'll try collision avoidance and discover it's a layout problem on top of a layout problem. Eventually you'll realize you need level-of-detail rendering: show nothing at low zoom, show labels for nearby nodes at medium zoom, show everything at high zoom, and make the transitions feel seamless rather than jarring. And whichever rendering strategy you chose — canvas, SVG, WebGL — it's going to interact with your label system in ways you didn't anticipate. Canvas text looks bad. SVG text looks good but 10,000 DOM elements kill your frame rate. You end up building a hybrid renderer with a spatial index for hit testing, and now your "label system" is its own engineering project.
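The level-of-detail rule itself is simple — a pure function from zoom and distance to label visibility. The thresholds below are assumptions for illustration; the real values come out of profiling, not taste. The hard part is everything around this function: hysteresis so labels don't flicker at the boundaries, and fading so transitions feel seamless.

```typescript
type LabelDetail = "hidden" | "nearby" | "all";

// Hypothetical thresholds: below 0.3x zoom, no labels; between 0.3x
// and 1.0x, only labels near the viewport center; above 1.0x, all.
function labelDetail(zoom: number): LabelDetail {
  if (zoom < 0.3) return "hidden";
  if (zoom < 1.0) return "nearby";
  return "all";
}

function shouldDrawLabel(
  zoom: number,
  distToViewportCenter: number,
  radius = 400, // screen px; assumed cutoff for "nearby"
): boolean {
  switch (labelDetail(zoom)) {
    case "hidden": return false;
    case "nearby": return distToViewportCenter < radius;
    case "all":    return true;
  }
}
```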

Your main thread is going to freeze. The force simulation runs on the same thread as your rendering and your user interaction. Past a few hundred nodes, every pan and zoom stutters because the physics are eating your frame budget. You need to move the simulation to a Web Worker and send position updates back to the render thread on an interval. Now you have a concurrency problem — the user is interacting with positions that are slightly stale, nodes jump when batched updates arrive, and you need to interpolate between states to keep it feeling smooth.
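The smoothing half of that concurrency problem reduces to interpolating between the last two position snapshots the worker sent. A sketch of the render-thread side — the snapshot shape and timing are assumptions, not a specific protocol:

```typescript
interface Snapshot { t: number; x: Float64Array; y: Float64Array; }

// Suppose the worker posts a snapshot every ~100ms while the render
// thread draws at 60fps: each frame lerps between the previous and
// latest snapshot instead of jumping when a batch arrives.
function lerpPositions(prev: Snapshot, next: Snapshot, now: number) {
  const span = next.t - prev.t || 1;
  const a = Math.min(Math.max((now - prev.t) / span, 0), 1); // clamp to [0, 1]
  const n = next.x.length;
  const x = new Float64Array(n);
  const y = new Float64Array(n);
  for (let i = 0; i < n; i++) {
    x[i] = prev.x[i] + (next.x[i] - prev.x[i]) * a;
    y[i] = prev.y[i] + (next.y[i] - prev.y[i]) * a;
  }
  return { x, y };
}
```

Typed arrays matter here: they can be posted to and from the worker as transferable buffers instead of being structured-cloned every frame.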

Your users are going to lose context. They'll carefully arrange their view, expand a cluster to see what's inside, and the force simulation will reshuffle every other node in the graph. Their spatial memory — "the auth stuff was over on the left" — is destroyed. You need position preservation: when a user expands or collapses a group, only the affected nodes should move. Everything else stays where it was. Now you're thinking about persisting user-determined layout state, and reconciling it with the simulation's opinion about where things should go.
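The core of position preservation fits in a few lines — pin everything outside the affected group before letting the simulation run. This is a sketch, not our implementation; it uses the `fx`/`fy` convention that d3-force uses for fixed node coordinates:

```typescript
interface SimNode {
  id: string;
  x: number;
  y: number;
  fx?: number | null; // fixed coordinates: the simulation won't move
  fy?: number | null; // a node while these are set
}

// Before expanding a group, freeze every node outside it at its
// current position; only the newly revealed children are free to
// move, so the user's spatial memory survives the reshuffle.
function pinAllExcept(nodes: SimNode[], free: Set<string>): void {
  for (const n of nodes) {
    if (free.has(n.id)) {
      n.fx = null;
      n.fy = null;
    } else {
      n.fx = n.x;
      n.fy = n.y;
    }
  }
}
```

The genuinely hard part is what the sketch leaves out: persisting those pinned positions across sessions and reconciling them when the underlying graph changes.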

None of this is in any tutorial. None of it shows up in your 200-node demo. It all hits at once when you load a real codebase, and each problem requires its own engineering investment that has nothing to do with parsing code or building a graph.

We built all of this. Our rendering pipeline, view orchestration layer, and force worker are tens of thousands of lines of TypeScript. The label system alone — multi-line rendering, intelligent truncation, collision avoidance, level-of-detail transitions — took weeks. These problems are inherent to the medium. Anyone who pushes graph visualization past the demo stage runs into the same walls.

You're gonna discover your edges are garbage

You survive the rendering gauntlet. You have a performant, interactive graph visualization that handles thousands of nodes.

Your edges are pointing at the wrong things.

Your same-file calls are going to be the only ones that work. Tree-sitter sees both the caller and the callee in the same file. Match by name, add an edge, done. But the calls that reveal architecture — the ones that cross file boundaries, that connect packages, that show you how your system actually fits together — those require tracing imports, resolving workspace packages, handling re-exports and barrel files and monorepo path mappings. Your tool handles this with a lookup table: name maps to file path. One match? Fine. Multiple matches? It picks whichever file got indexed first. No import tracing. No workspace resolution. A coin flip dressed up as analysis.
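The coin flip is easy to reproduce. A hypothetical naive resolver keyed on bare names — the kind of lookup table described above — looks like this:

```typescript
// Naive cross-file resolution: map a function name to the file that
// defines it. Works only while every name is globally unique.
function buildIndex(
  defs: { name: string; file: string }[],
): Map<string, string> {
  const index = new Map<string, string>();
  for (const d of defs) {
    // Multiple definitions? Whichever file was indexed first wins —
    // no import tracing, no workspace resolution, just index order.
    if (!index.has(d.name)) index.set(d.name, d.file);
  }
  return index;
}

const index = buildIndex([
  { name: "parse", file: "src/config/parse.ts" },
  { name: "parse", file: "src/query/parse.ts" }, // silently dropped
]);
```

Every call site that says `parse` now gets an edge to `src/config/parse.ts`, whether or not that's the `parse` it imports.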

Your visualization is faithfully rendering nonsense. The graph still looks like a graph. The edges still look like edges. They're just wrong, and you'll never know because you're not going to check them by hand. Nobody does. Your metrics might tell you the graph has ten thousand call edges, and that feels like a lot. But are you checking them?

We benchmarked multiple tools on the same codebase against compiler-grade ground truth. One reported ten thousand edges — four times more than anyone else. Almost none were real. The resolution strategy was "only one function in the codebase has this name, so it must be the right one." Speculation dressed up as analysis. The tool had no idea it was wrong, and neither would you.

It works on the project you built it for. It silently degrades on everything else. You find out when you try to use it for something real and the answer is wrong.

You're gonna hit limits you can't engineer around

Let's say you're in rare territory — you've got the rendering, you've got real cross-file resolution, you're tracing imports and handling workspace packages. You're further than 95% of weekend projects get.

You're still going to hit a ceiling, and it's lower than you think.

Your language server is going to help you and trap you at the same time. In a statically typed language, the LSP knows types, follows the type system, finds the right definition. But it needs the full project context, the right SDK, the right build config. It only works for one language at a time. You can containerize it and run it in the cloud, but now you're standing up per-language infrastructure for each codebase you analyze — heavyweight setup for what's supposed to be a lightweight tool. You get accuracy at the cost of complexity.

Your dynamically typed users are going to get noise or silence. In Python, Ruby, JavaScript without TypeScript — which is still most of the world's code — tree-sitter tells you a function was called but can't tell you which of fifteen implementations gets dispatched at runtime. You either guess all of them and drown the signal in false positives, or pick none and produce a graph that's missing the connections that matter. There's no right answer without type inference, and type inference is a compiler problem, not a graph problem.

Your multi-language support is going to be a mile wide and an inch deep. Supporting a new language isn't just writing a tree-sitter grammar. It's import conventions, package resolution, workspace tooling, framework idioms. Go resolves imports to directories, not files. Ruby loads classes by convention, not explicit import. TypeScript has tsconfig paths, barrel exports, and package.json exports field indirection. Each language is its own project, and "supports 30 languages" usually means "parses 30 languages, resolves calls in maybe two."
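To make the TypeScript case concrete, here is a sketch of just one of those indirections — matching a specifier against a tsconfig `paths` pattern. It deliberately ignores `baseUrl`, fallback target arrays, and `package.json` `exports`, each of which adds its own layer:

```typescript
// Resolve an import specifier through a tsconfig-style `paths` map,
// e.g. { "@app/*": ["src/app/*"] }. Returns the mapped target, or
// null when no pattern matches. Real resolution also checks baseUrl,
// tries every fallback target, and then hits package.json `exports`.
function resolveTsPath(
  specifier: string,
  paths: Record<string, string[]>,
): string | null {
  for (const [pattern, targets] of Object.entries(paths)) {
    const star = pattern.indexOf("*");
    if (star === -1) {
      if (specifier === pattern) return targets[0];
      continue;
    }
    const prefix = pattern.slice(0, star);
    const suffix = pattern.slice(star + 1);
    if (specifier.startsWith(prefix) && specifier.endsWith(suffix)) {
      const matched = suffix.length
        ? specifier.slice(prefix.length, -suffix.length)
        : specifier.slice(prefix.length);
      return targets[0].replace("*", matched);
    }
  }
  return null;
}
```

And this is the easy language: TypeScript at least writes its resolution rules down. Ruby's convention-based loading has no manifest to parse at all.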

We've lived this. We started with exhaustive static analysis — every edge, every constructor, every framework hook. Thorough, but so noisy the output was useless. We moved to LLM-based resolution that focuses on architecturally meaningful calls — a deliberate tradeoff that sacrifices completeness for usefulness. We picked a spot on the same spectrum everyone else is on, and every tool picks a different spot and lives with what it can't see from there.

Where this is actually going

We started where you did. Building a visualizer. An interactive code map in the IDE. We went through the rendering library gauntlet. We built the performance infrastructure — the web workers, the spatial indexing, the label system, all of it.

The visualization was how we demonstrated what we were actually building — the graph primitives. The data that makes a code graph useful, whether you're rendering it for a human or feeding it to an agent.

Coding agents went from novelty to default. It's no longer just the developer who needs to understand a codebase. Now it's an AI agent making hundreds of file-editing decisions autonomously. And increasingly, it's a system of agents — a software factory that assigns work, validates output, and merges changes without a human in the loop.

An agent doesn't need a visualization. It needs primitives. Call graphs that tell it what's upstream and downstream. Impact analysis that tells it which tests to run after a change. Dead code detection that keeps the codebase clean as agents generate more of it. Dependency data that prevents it from breaking something three packages away.

That's what we build now. We built the visualizer, and we're proud of the engineering that went into it. But the graph primitives are the product. Every hill we climbed to make the visualizer work taught us what the real product needed to be.

The difference is we kept going.